Am I right in assuming that adding/removing elements to an std::map does not affect the other elements (i.e. cause them to be relocated in memory), so that the following is safe:
I looked at various sites with info on the container but only found out about the cases where iterators are invalidated, which I already know...
std::map<std::string,std::string> map;
PopulateMap(map);
std::string &a= map["x"];
AddMoreData(map);
RemoveRandomKeysExceptX(map);
map["x"] = "foo";
std::cout << a << " " << map["x"] << std::endl;//prints "foo foo"
a = "bar";
std::cout << a << " " << map["x"] << std::endl;//prints "bar bar"
I tested some similar code on VC9, which seems to work; however, that doesn't mean I didn't just get lucky or that it doesn't vary across compilers.
The Standard is clear on this in 23.1.2/8, about associative containers:
The insert members shall not affect the validity of iterators and references to the container, and the erase members shall invalidate only iterators and references to the erased elements.
Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements.
Quote taken from the SGI docs.
If the iterators are guaranteed not to change then the values they point at also cannot change.
naveen previously had an answer that was similar to this. Unless there is a mistake in my logic what you are doing is safe.
Edit 2:
See point 3 in sgi docs to see how getting a value from operator [] is the same as getting the value from an iterator.
Yes you can count on this.
// retrieve reference to string stored at "x"
// note that since [] returns a reference, it must insert an element at "x" if
// it doesn't exist (in this case an empty string)
std::string &a= map["x"];
// retrieve reference for "x" again and set value to "foo"
map["x"] = "foo";
// use already stored reference
a = "bar";
Related
While messing around with type-punning iterators, I came across the ability to do
std::vector<int> vec{ 3, 7, 1, 8, 4 };
int* begin_i = (int*)(void*)&*vec.begin();
std::cout << "1st: " << begin_i << " = " << *begin_i << std::endl;
begin_i++;
std::cout << "2nd: " << begin_i << " = " << *begin_i << std::endl;
Then I tried to do the same kind of thing with an std::unordered_set:
std::unordered_set<int> set{ 3, 7, 1, 8, 4 };
for (auto& el : set)
{ // Display the order the set is currently in
std::cout << el << ", ";
}
std::cout << '\n' <<std::endl;
int* begin_i = (int*)(void*)&*set.begin();
std::cout << "1st: " << begin_i << " = " << *begin_i << std::endl;
begin_i++;
std::cout << "2nd: " << begin_i << " = " << *begin_i << std::endl;
But the output I got was:
4, 8, 1, 7, 3,
1st: [address] = 4
2nd: [address] = 0
I'm supposing this is because an unordered set's elements are located in different parts of memory? I was confused here, considering that I also printed the order the elements were stored in using a range-based loop.
My question is how does an std::unordered_set store its elements in memory? What happens when an element is added to the set? Where does it go in memory and how is that kept track of if it's not stored in an array-like container where the elements are one-right-after-the-other?
An unordered_set is implemented as a hash table using external chaining.
That basically means you have an array of linked lists (which are usually called "buckets"). So, to add an item to an unordered_set you start by hashing the new item you're going to insert. You then take that hash and reduce it to the range of the current size of the array (which can/will expand as you add more items). You then add the new item at the tail of that linked list.
So, depending on the value produced by the hash, two consecutively inserted items may (and often will) be inserted in the linked lists at completely different parts of the table. Then the node in the linked list will typically be dynamically allocated, so even two consecutive items in the same linked list may be at completely unrelated addresses.
As I noted in an earlier answer, however, quite a bit more about this is actually specified in the standard than most people seem to realize. As I outlined there, it might be (barely) possible to violate the expectation and still (sort of) meet the requirements in the standard, but even at best, doing so would be quite difficult. For most practical purposes, you can assume it's something quite a bit like a vector of linked lists.
Most of the same things apply to an unordered_multiset--the only fundamental difference is that you can have multiple items with the same key instead of only one item with a particular key.
Likewise, there are also unordered_map and unordered_multimap, which are pretty similar again, except that they separate the things being stored into a key and a value associated with that key, and when they do hashing, they only look at the key part, not the value part.
Rather than directly answer the question, I would like to address the "type-punning" trick. (I put that in quotes because the provided code does not demonstrate type-punning. Perhaps the code was appropriately simplified for this question. In any event, *vec.begin() gives an int, so &*vec.begin() is an int*. Further casting to void* then back to int* is a net no-op.)
The property your code takes advantage of is
*(begin_i + 1) == *(vec.begin() + 1) // Using the initial value of begin_i
*(&*vec.begin() + 1) == *(vec.begin() + 1) // Without using an intermediary
This is a property of a contiguous iterator, which is associated with a contiguous container. These are the containers that store their elements in adjacent memory locations. The contiguous containers in the standard library are string, array, and vector; these are the only standard containers for which your trick is guaranteed to work. Trying it on a deque will probably seem to work at first, but the attempt will fail if enough is added to &*begin(). Other containers tend to dynamically allocate elements individually, so there need not be any relation between the addresses of elements; elements are linked together by pointers rather than by position/index.
So that I'm not ignoring the asked question:
An unordered set is merely required to organize elements into buckets. There are no requirements on how this is done, other than requiring that all elements with the same hash value are placed in the same bucket. (This does not imply that all elements in the same bucket have the same hash value.) In practice, each bucket is probably implemented as a list, and the container of buckets is probably a vector, simply because re-using code is cool. At the same time, this is an implementation detail, so it can vary from compiler to compiler, and even from compiler version to compiler version. There are no guarantees.
The way std::unordered_set stores its elements is left to the implementation. The standard doesn't care as long as the requirements are satisfied.
In the Visual Studio implementation, the elements are stored inside an std::list (fast access is provided by creating and managing additional data), so each element also has pointers to the previous and next elements and is allocated via new (at least that's what I remember from std::list).
I have two elements (6 and 747) that share their key ("eggs"). I want to find all the elements that share a key (let's say "eggs", but I would in real life do that for every key). How to do that?
There must be a way to get a container or something back from the data structure . . .
You're still mistaking the key's value for the key's hash. But to answer the question as asked: you can use unordered_map's bucket() member function with bucket iterators:
std::unordered_map<int,int,dumbest_hash> m;
m[0] = 42;
m[1] = 43;
size_t bucket = m.bucket(1);
for(auto it = m.begin(bucket), e = m.end(bucket); it != e; ++it) {
cout << "bucket " << bucket << ": " << it->first << " -> " << it->second << '\n';
}
In simple and mostly correct terms, unordered containers imitate their ordered counterparts in terms of interface. That means that if a map will not allow you to have duplicate keys, then neither will unordered_map.
Unordered containers do employ a hash function to speed up lookup, but if two keys have the same hash, they will not necessarily have the same value. To keep the behaviour similar to the ordered containers, unordered_set and unordered_map will only consider elements equal when they're actually equal (using operator== or a provided comparator), not when their hashed values collide.
To put things in perspective, let's assume that "eggs" and "chicken" have the same hash value and that there's no equality checking. Then the following code would be "correct":
unordered_map<string, int> m;
m["eggs"] = 42;
m.insert(make_pair("chicken", 0)); // not inserted, key already exists
assert(m["chicken"] == 42);
But if you want to allow duplicate keys in the same map, simply use unordered_multimap.
Unordered map does not have elements that share a key.
Unordered multi map does.
Use umm.equal_range(key) to get a pair of iterators describing the elements in the map that match a given key.
However, note that "collision" when talking about hashed containers usually refers to elements with the same hashed key, not the same key.
Also, consider using an unordered_map<key, std::vector<value>> instead of a multimap.
I have a std::vector of a custom class (using int in sample for simplicity). I would like to keep a reference/pointer/link/other to a member of the vector. However, the vector frequently has elements removed and added.
To illustrate my point, in the sample below I take either a reference or a pointer to the second element of the vector. I use the reference/pointer to increase the value of the chosen element. I then erase the first element, and use the ref/pointer to increment again.
Reference example:
std::vector<int> intVect = {1,1,1};
int& refI = intVect.at(1);
refI++;
intVect.erase(intVect.begin());
refI++;
Smart-Pointer example:
std::vector<int> intVect2 = {1,1,1};
std::shared_ptr<int> ptrI = std::make_shared<int>(intVect2.at(1)) ;
*ptrI = *ptrI +1;
intVect2.erase(intVect2.begin());
*ptrI = *ptrI +1;
What I would like to happen is to end up with the referenced element to have a value of 3, the final vector being composed of {3,1}. However, in the reference example, the final vector is {2,2}, and in the pointer example the final vector is {1,1}.
Understanding that the pointer is essentially a memory address, I can understand why this method might not be possible, but if it somehow is, let me know.
The more important question is then, what alternate approach or structure could be used that would allow for some form of ref/pointer/link/other to that element (be it a value or an object) that is viable after adding members to, or removing members from, the vector(or other structure) that contains it?
For extra credit:
The objects I am actually working with have a position property. I have a second structure that needs to keep track of the objects for quick lookup of which objects are at which positions. I am currently using a grid (vector of vectors) to represent possible positions, each holding indexes into the vector of objects for the objects currently at the position. However, when an object is deleted from the vector (which happens very frequently, up to hundreds of times per iteration), my current resort is to loop through every grid position and decrement any indexes greater than the deleted index, which is slow and clumsy. Additional thoughts in regards to this problem in context are much appreciated, but my key question concerns the above examples.
One possible option is to have the vector store std::shared_ptr objects, and issue std::weak_ptr or std::shared_ptr objects to refer to the object in question.
std::vector<std::shared_ptr<int>> ints;
for(size_t i = 0; i < 10000; i++) {
ints.emplace_back(std::make_shared<int>(int(i)));
}
std::weak_ptr<int> my_important_int = ints[6000];
{
auto lock = my_important_int.lock();
if(lock) std::cout << *lock << std::endl;
else std::cout << "index 6000 expired." << std::endl;
}
auto erase_it = std::remove_if(ints.begin(), ints.end(), [](auto & i) {return (*i) > 5000 && ((*i) % 4) != 0;});
ints.erase(erase_it, ints.end());
{
auto lock = my_important_int.lock();
if(lock) std::cout << *lock << std::endl;
else std::cout << "index 6000 expired." << std::endl;
}
ints.erase(ints.begin(), ints.end());
{
auto lock = my_important_int.lock();
if(lock) std::cout << *lock << std::endl;
else std::cout << "index 6000 expired." << std::endl;
}
Which should print out:
6000
6000
index 6000 expired.
A container that stores key/value pairs might work for you. For example, std::map or std::unordered_map.
When using these containers, you'd keep a reference to the desired object by storing the key. If you want to modify said object, just look it up in the container using the key. Now you can add/remove other objects as much as you want without affecting the object in question (assuming the added/removed objects have unique keys).
If there is a way for you to keep using a vector and change the way you manage your objects, do that; you won't get much better performance from anything else.
Otherwise, you can use a stable vector (here's the boost version). It is essentially a vector of pointers, which grants it iterator and reference stability. This means that iterators (pointers) and references to the elements are not invalidated by any operation other than removing the element itself.
Of course, there are some big drawbacks to this, mainly in performance. The two main performance issues are the fact that you go through a pointer every time you want to access an element, and the fact that the elements are not stored contiguously (which of course impacts the speed of iteration).
However, it also has advantages over other pointer-heavy data types (lists, sets, maps). Mainly, it performs lookup by index and push_back in constant time, even though it's slower than a normal vector.
Then again, if you really need performance, you might want to keep your vector and rethink your design around it.
I have a std::map associating const char* keys with int values:
std::map<const char*, int> myMap;
I initialize it with three keys, then check that it can find one of them:
myMap["zero"] = 0;
myMap["first"] = 1;
myMap["second"] = 2;
if (myMap.at("zero") != 0)
{
std::cerr << "We have a problem here..." << std::endl;
}
And nothing is printed. From here, everything looks ok.
But later in my code, without any alteration of this map, I try to find again a key:
int value = myMap.at("zero");
But the at function throws an std::out_of_range exception, which means it cannot find the element. myMap.find("zero") agrees, because it returns the end iterator of the map.
But the creepiest part is that the key is really in the map, if just before the call to the at function, I print the content of the map like this:
for (auto it = myMap.begin(); it != myMap.end(); it++)
{
std::cout << (*it).first << std::endl;
}
The output is as expected:
zero
first
second
How is it even possible? I don't use any beta-test library or anything supposed to be unstable.
You have a map of pointers to characters, not strings. The map lookup is based on the pointer value (address) and not the value of what's pointed at. In the first case, where "zero" is found in the map, your compiler has performed some string merging and is using one array of characters for both identical strings. This is not required by the language but is a common optimization. In the second case, when the string is not found, this merging has not been done (possibly your code here is in a different source module), so the address being used in the map is different from what was inserted and is then not found.
To fix this either store std::string objects in the map, or specify a comparison in your map declaration to order based on the strings and not the addresses.
The key of the map is a char*, so the map's comparison function compares raw pointer values rather than checking the C-style strings for equality. Declare the map with std::string as the key instead.
If you do not want to deal with std::string and still want the same functionality with improved time complexity, a more sophisticated data structure is the trie. Look at implementations such as Judy arrays.
This is a question that goes to how BOOST_FOREACH checks its loop termination
cout << "Testing BOOST_FOREACH" << endl;
vector<int> numbers; numbers.reserve(8);
numbers.push_back(1); numbers.push_back(2); numbers.push_back(3);
cout << "capacity = " << numbers.capacity() << endl;
BOOST_FOREACH(int elem, numbers)
{
cout << elem << endl;
if (elem == 2) numbers.push_back(4);
}
cout << "capacity = " << numbers.capacity() << endl;
gives the output
Testing BOOST_FOREACH
capacity = 8
1
2
3
capacity = 8
But what about the number 4 which was inserted half way through the loop? If I change the type to a list the newly inserted number will be iterated over. The vector push_back operation will invalidate any pointers IF a reallocation is required, however that is not happening in this example. So the question I guess is why does the end() iterator appear to only be evaluated once (before the loop) when using vector but has a more dynamic evaluation when using a list?
Under the covers, BOOST_FOREACH uses iterators to traverse the element sequence. Before the loop is executed, the end iterator is cached in a local variable. This is called hoisting, and it is an important optimization. It assumes, however, that the end iterator of the sequence is stable. It usually is, but if we modify the sequence by adding or removing elements while we are iterating over it, we may end up hoisting ourselves on our own petard.
http://www.boost.org/doc/libs/1_40_0/doc/html/foreach/pitfalls.html
If you don't want the end() iterator to change, use resize on the vector rather than reserve.
http://www.cplusplus.com/reference/stl/vector/resize/
Note that then you wouldn't want to push_back but use the operator[] instead. But be careful of going out of bounds.
The question was raised in the comments as to why the Microsoft debug runtime raises an assertion during iteration over the vector but not over the list. The reason is that insert is defined differently for list and vector (note that push_back is just an insert at the end of the sequence).
Per the C++ standard (ISO/IEC 14882:2003 23.2.4.3, vector modifiers):
[on insertion], if no reallocation happens, all the iterators and references before the insertion point remain valid.
(23.2.2.3, list modifiers):
[insert] does not affect the validity of iterators and references.
So, if you use push_back (and are sure that it's not going to cause a reallocation), it's okay with either container to continue using your iterator to iterate over the rest of the sequence.
In the case of the vector, however, it's undefined behavior to use the end iterator that you obtained before the push_back.
This is a roundabout answer to the question; it's a direct answer to the discussion in the question's comments.
Boost's foreach will terminate when its iterator == numbers.end()
Be careful though, calling push_back can/will invalidate any current iterators you have.