Questions about STL containers in C++


How often do std::multimap and std::unordered_multimap shuffle entries around? I'm asking because my code passes around references to distinguish between entries with the same hash, and I want to know when to run the reference redirection function on them.
What happens if I do this:
std::multimap atable; //Type specification stuff left out
//Code that puts in two entries with the same key, call that key foo
int bar = atable[foo];
Is the result different if it's an unordered_multimap instead?
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Do the entries move around if I remove one of the entries? (That's what a reading of the documentation for std::vector suggests.)

NO, no elements will be harmed during any operation.
As is explained in this famous Q&A, for associative containers there is no iterator invalidation upon insertion / erasure (except for the element being erased, of course). For unordered associative containers, there is iterator invalidation during rehashing, about which the Standard says (emphasis mine)
23.2.5 Unordered associative containers [unord.req]
9 The elements of an unordered associative container are organized into
buckets. Keys with the same hash code appear in the same bucket. The
number of buckets is automatically increased as elements are added to
an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements. For unordered_multiset and unordered_multimap,
rehashing preserves the relative ordering of equivalent elements.
Again, this does not entail the reshuffling of the actually stored elements (the Key and Value types in unordered_map<Key, Value>), because unordered maps have buckets that are organized as linked lists, and iterators to stored elements (key-value pairs) have both an element pointer and a bucket pointer. Rehashing shuffles buckets, which invalidates iterators (because their bucket pointer is invalidated) but not pointers or references to the elements themselves. This is explained in detail in another Q&A.
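A minimal sketch of that guarantee, assuming C++11 or later. The helper reference_survives_rehash is a made-up name for this example: it records the address of one "foo" entry in a std::unordered_multimap, triggers rehashing by inserting many elements and calling rehash(), and then checks that the same element is still found at the same address. Only the pointer is carried across the rehash; iterators are re-obtained afterwards.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Returns true if an element whose address was recorded before rehashing
// is still stored at that same address afterwards.
bool reference_survives_rehash() {
    std::unordered_multimap<std::string, int> table;
    table.emplace("foo", 1);
    table.emplace("foo", 2);

    // Record the address of one of the two "foo" entries.
    const std::pair<const std::string, int>* entry = &*table.find("foo");

    // Grow the table and force a large bucket count. Any iterator taken
    // before this point may now be invalid, but pointers/references are not.
    for (int i = 0; i < 1000; ++i) table.emplace(std::to_string(i), i);
    table.rehash(4096);

    // Re-obtain iterators and look for the recorded address.
    auto range = table.equal_range("foo");
    for (auto it = range.first; it != range.second; ++it) {
        if (&*it == entry) return true;
    }
    return false;
}
```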

How often do std::multimap and std::unordered_multimap shuffle entries around?
Never. The iterators that point to elements of any associative container (including sets, maps, and their unordered or "multi" versions) are never invalidated (unless the specific element they point to is deleted). In other words, the actual elements are never "shuffled around". These guarantees effectively require the containers to be implemented as linked (node-based) structures, e.g. a linked tree, meaning they can be restructured just by changing a few pointers, without having to physically move any element.
EDIT: Apparently (see TemplateRex's comment), this is not the case for unordered containers. There, iterators can be invalidated (by rehashing), but the elements themselves do not move around. These requirements imply an indirect container with no back-pointer, which I guess is a reasonable choice, but not one I would have expected.
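To illustrate the erase case from the question, here is a small sketch (survivor_stays_put is a hypothetical helper name): with std::multimap, erasing one of two entries that share a key leaves the other entry, and any iterator or pointer to it, untouched.

```cpp
#include <cassert>
#include <map>
#include <string>

// Erase one of two entries that share a key, then check that the surviving
// entry is still at its old address and still reachable via its old iterator.
bool survivor_stays_put() {
    std::multimap<std::string, int> table;
    auto first = table.emplace("foo", 1);
    auto second = table.emplace("foo", 2);
    const auto* address = &*second;

    table.erase(first);  // only iterators/references to `first` are invalidated

    return table.size() == 1 && &*table.find("foo") == address && second->second == 2;
}
```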
What happens if I do this: ... (get [] of multimap) ...
operator[] is not defined for std::multimap (or for its unordered version), so the code simply fails to compile.
Is the result different if it's an unordered_multimap instead?
No, it's the same: std::unordered_multimap has no operator[] either.
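Since neither container has operator[], the usual way to read "the value(s) for foo" is equal_range(), which yields every entry with that key. A hedged sketch, with values_for as a hypothetical helper name, that works for both std::multimap and std::unordered_multimap:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collects the mapped values of every entry whose key is equivalent to `key`.
// Works for std::multimap and std::unordered_multimap alike.
template <typename MultiMap>
std::vector<typename MultiMap::mapped_type>
values_for(const MultiMap& table, const typename MultiMap::key_type& key) {
    std::vector<typename MultiMap::mapped_type> values;
    auto range = table.equal_range(key);  // [first, second) holds all matches
    for (auto it = range.first; it != range.second; ++it) {
        values.push_back(it->second);
    }
    return values;
}
```

For std::multimap, entries with equivalent keys come back in insertion order (guaranteed since C++11); for std::unordered_multimap, their relative order is unspecified.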
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Yes, the recommended practice is to refer to elements of the map / set / whatever using iterators, not references. Iterators to elements are guaranteed to remain valid, they are copyable, and they carry the right const-ness protection, which makes them the perfect objects to "refer to an entry".
EDIT: As per the same comment, I would instead recommend using pointers to the elements when dealing with a hashed (unordered) container, because the standard does not guarantee that its iterators remain valid.
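A sketch of the pointer-based approach, assuming the handles only need to survive rehashing, not erasure of the element they point to. EntryHandle and handles_for are hypothetical names: each handle is simply the address of a stored key/value pair, so two entries with the same key get distinct handles.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using Table = std::unordered_multimap<std::string, int>;

// Hypothetical handle type: a pointer to the stored key/value pair.
// Stays valid across rehashing; dangles once that element is erased.
using EntryHandle = Table::value_type*;

// Collect one handle per entry stored under `key`.
std::vector<EntryHandle> handles_for(Table& table, const std::string& key) {
    std::vector<EntryHandle> handles;
    auto range = table.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        handles.push_back(&*it);
    }
    return handles;
}
```

Because rehashing preserves the relative order of equivalent elements in an unordered_multimap, calling handles_for again after a rehash should return the same handles in the same order.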

All of the associative containers in the C++ standard library are node based, i.e., their elements stay put. However, whether the hash is computed on the object after copying it or on a temporary object passed to the container isn't specified. I would guess that the hash is generally computed before the object is copied/moved.
To distinguish elements with the same hash you need an equality function anyway: if an object's location were what made it different, all objects would be different and you wouldn't be able to look any of them up. You need an equality function for the elements in an unordered container, which defines equivalence of keys. For the ordered associative containers, the equivalence classes are based on the strict weak ordering, i.e., on an expression like this (using < rather than a binary predicate for readability; any binary predicate defining a strict weak order would work, too):
bool equivalent = !(a < b) && !(b < a);
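As an illustration of equivalence under a strict weak ordering (the comparator is an example chosen here, not something the answer prescribes), a case-insensitive comparison makes "Foo" and "foo" equivalent keys, so a std::multimap using it files both under the same key. It assumes plain ASCII text, since std::tolower is applied per byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Strict weak ordering that ignores letter case (per byte, ASCII assumed).
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Two keys are equivalent when neither orders before the other.
bool equivalent(const std::string& a, const std::string& b) {
    CaseInsensitiveLess less;
    return !less(a, b) && !less(b, a);
}
```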

Related

Sequence containers - can elements be accessed sequentially only

It is stated everywhere that the common property of all sequential containers is that the elements can be accessed sequentially. But we know that std::array, std::vector and std::deque all support fast random access to the elements. std::list supports bidirectional iteration, whereas std::forward_list supports only unidirectional iteration.
So what does "accessed sequentially" actually mean here?

A Sequence Container has the requirement that its elements are stored in a well-defined, determined order, such that a function like front() or a reference to its nth element is meaningful. The fact that sequential access is permitted does not preclude that random access is also allowed.
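In contrast, an Associative Container does not store its elements in an order you choose: std::set keeps them sorted by its comparator, and the unordered containers arrange them by hash. So, for example, std::set has no front() member at all, and asking it for a positional "first inserted" element is meaningless.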

Sequential access has little to do with random access or iterators.
For example, std::set's iterator is a bidirectional iterator. You can iterate over elements of a std::set just like you would over elements of a std::vector.
Sequence containers have a front and a back, and all their elements are between those, in the order you inserted them in. Contrast this with a std::set, which conceptually does have a front and a back (minimum and maximum values), but that stores its elements in an order defined by a comparison function. Contrast this also with a std::unordered_set, which does not really have a front and a back, and stores its elements in an order determined by a hash function. Finally, contrast this with a std::stack, which only has a top (conceptually, a back, but no front).
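A quick way to see the difference: insert the same values into a sequence container and into a std::set and compare the traversal order. The helper names are hypothetical, and std::list stands in for any sequence container.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <vector>

// Traversal order of a sequence container: the order the elements were added.
std::vector<int> sequence_order(const std::vector<int>& input) {
    std::list<int> seq;
    for (int x : input) seq.push_back(x);
    return std::vector<int>(seq.begin(), seq.end());
}

// Traversal order of std::set: ascending, as defined by std::less<int>.
std::vector<int> set_order(const std::vector<int>& input) {
    std::set<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}
```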
The only other standard container that has a front and a back and stores its elements in the order you insert them in is std::queue. However, you cannot access any arbitrary element in a queue without accessing and removing all the elements in front of it.
So, if I had to give a definition of a sequence container, it would be one with sequential access, meaning access to any of its elements in the order you insert them in, without having to do anything other than iterate over it. As a result, you can sort a sequence container.
Not to be confused with contiguous (or random) access.
That said, it isn't a very useful categorization. More useful categories are those of iterators and of complexity of operations.

unordered_map pointer to element's value valid after resize?

If I have an unordered_map<key, someNiceObject>
(note someNiceObject is not a pointer)
I have an API which inserts a new element, then returns a pointer to someNiceObject now in the map.
If I perform further insertions into the map, there could be a capacity change. If that happens is the pointer still valid or not?
I tried reading
Basic questions: Pointers to objects in unordered_maps (C++),
std::unordered_map pointers/reference invalidation and
http://eel.is/c++draft/unord.req#9
and couldn't locate the necessary information
Thanks all
Edit: it seems that the pointer would remain valid (https://www.thecodingforums.com/threads/do-insert-erase-invalidate-pointers-to-elements-values-of-std-unordered_map.961062/), though I would appreciate a second confirmation from someone here on SO.

According to cppreference:
If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated.
Which implies that pointers aren't invalidated either. This is possible because std::unordered_map can conceptually be thought of as std::vector<std::forward_list<std::pair<Key, Value>>>. And since std::forward_list, like any other linked list, allocates each element separately, changes to the list don't affect the memory location of its elements.
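A minimal sketch of the API described in the question, assuming C++11. SomeNiceObject and add_object are hypothetical stand-ins. The pointer returned for the first insertion can then be checked against the element's current address after enough insertions to force the map to grow its bucket array.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the question's someNiceObject (stored by value).
struct SomeNiceObject {
    std::string name;
    int value;
};

// Hypothetical API: insert an object (or find the existing one for `key`)
// and hand back a pointer to the object stored inside the map's node.
SomeNiceObject* add_object(std::unordered_map<int, SomeNiceObject>& objects,
                           int key, std::string name) {
    auto result = objects.emplace(key, SomeNiceObject{std::move(name), key});
    return &result.first->second;
}
```

Usage: keep the pointer from the first call, insert many more objects so that bucket_count() grows, and the pointer still refers to the same object, which is exactly what [unord.req]/9 promises for pointers and references.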

std::unordered_map is unusual in that the iterator invalidation rules do NOT apply to references to elements (excluding removal, but what can you do when the item is gone?). A capacity change is not important in itself. The problem is when the unordered map rehashes: rehashing invalidates all iterators, but does not invalidate references.
From point 9 of [unord.req] in the C++ Standard (citing n4618 because that's what I have on hand at the moment),
The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets elements appear in, but does not invalidate pointers or references to elements. For unordered_multiset and unordered_multimap, rehashing preserves the relative ordering of equivalent elements.
emphasis mine

Does the C++ standard define the structure of a bucket for unordered_set?

When a hash value for an element in an unordered_set is computed, the element is placed in a "bucket" together with other, different, elements that have the same hash value.
My experience is that the elements in such a bucket are stored in a singly linked list, which means that searching inside a bucket gets very slow with a bad hash function.
Is the singly linked list a requirement by the standard or just one possible implementation? Could one implement unordered_set with sets as buckets?

The standard states the requirements and guarantees, but doesn't explicitly force the underlying data structures and algorithms.
N4140 §23.2.5 [unord.req]/1
Unordered associative containers provide an ability for fast retrieval
of data based on keys. The worst-case complexity for most operations is
linear, but the average case is much faster.
This is a little odd, because it states the worst-case complexity as a fact instead of merely allowing it.
N4140 §23.2.5 [unord.req]/9
The elements of an unordered associative container are organized into
buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to
an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements.
The above does seem to rule out std::set as the bucket type, but it should allow a set-like data structure, provided that structure can move elements between its instances without invalidating pointers or references.
That leaves one hurdle: sets would require a comparator/operator< to be defined (with strict weak ordering semantics), while unordered associative containers make no such requirement. In that case, though, you could simply fall back to a linked list when no ordering is defined.
So, as far as I can tell, you could replace the linked list with a set-like structure, if the aforementioned conditions were met. That being said, it does feel like a problem that you shouldn't have experienced in the first place, had you used a proper hashing algorithm.
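To see the "bad hash function" effect the question describes, a deliberately constant hash puts every element into one bucket, and the standard bucket interface (bucket(), bucket_size()) lets you observe it. ConstantHash and bucket_population are made-up names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Deliberately terrible hash: every key maps to the same value,
// so every element lands in the same bucket.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Number of elements sharing the bucket that holds `key`.
template <typename Set>
std::size_t bucket_population(const Set& s, const typename Set::key_type& key) {
    return s.bucket_size(s.bucket(key));
}
```

With ConstantHash, every lookup has to walk that single bucket, whatever data structure the implementation uses inside it.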

Determinism with insert in unordered containers

If I insert the same (size and value) elements in two unordered containers, will traversing the containers with two iterators always give the same element in the same position?
If yes, can a (single!) hash function be made to break this determinism?

It depends: if you insert the same elements in the same order into two different unordered containers, then the order should be the same across both containers, even though the order itself is unspecified.
The reasoning is a little convoluted: all operations like hash(k) and the reallocations are deterministic. There is no explicit quote in the Standard for this, but the ability to do a find() in O(1) after an insert() seems to rule out any kind of randomized or otherwise non-deterministic insertion.
However, if you change the order of insertion, then all bets are off, because internal rehashing can change the order of elements:
23.2.5 Unordered associative containers [unord.req]
9 The elements of an unordered associative container are organized
into buckets. Keys with the same hash code appear in the same bucket.
The number of buckets is automatically increased as elements are added
to an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements. For unordered_multiset and unordered_multimap,
rehashing preserves the relative ordering of equivalent elements.
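The following sketch (iteration_order is a hypothetical helper) builds two std::unordered_set objects from the same keys in the same order and compares their traversal. On a single standard library implementation the orders match in practice, which is what the answer argues; this is an observation about typical implementations rather than a guarantee the Standard spells out, and it says nothing about what happens when the insertion order differs.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Build an unordered_set by inserting `keys` in the given order and
// return the order in which iteration visits them.
std::vector<std::string> iteration_order(const std::vector<std::string>& keys) {
    std::unordered_set<std::string> s;
    for (const auto& k : keys) s.insert(k);
    return std::vector<std::string>(s.begin(), s.end());
}
```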

C++ map allocator stores items in a vector?

Here is the problem I would like to solve: in C++, iterators for map, multimap, etc. are missing two desirable features: (1) they can't be checked at run-time for validity, and (2) there is no operator< defined on them, which means that they can't be used as keys in another associative container. (I don't care whether the operator< has any relationship to key ordering; I just want there to be some < available at least for iterators to the same map.)
Here is a possible solution to this problem: convince map, multimap, etc to store their key/data pairs in a vector, and then have the iterators be a small struct that contain a pointer to the vector itself and a subscript index. Then two iterators, at least for the same container, could be compared (by comparing their subscript indices), and it would be possible to test at run time whether an iterator is valid.
Is this solution achievable in standard C++? In particular, could I define the 'Allocator' for the map class to actually put the items in a vector, and then define the Allocator::pointer type to be the small struct described in the last paragraph? How is the iterator for a map related to the Allocator::pointer type? Does the Allocator::pointer have to be an actual pointer, or can it be anything that supports a dereference operation?
UPDATE 2013-06-11: I don't understand the responses. If the (key, data) pairs are stored in a vector, then obtaining an item given its subscript is O(1), only slightly worse than having a direct pointer, so the asymptotics don't change. Why does a responder say map iterators are "not kept around"? The standard says that iterators remain valid as long as the item to which they refer is not deleted. As for the 'real problem': say I use a multimap for a symbol table (variable name -> storage location; it is a multimap rather than a map because variable names in an inner scope may shadow variables with the same name), and now I need a second data structure keyed by variables. The apparently easiest solution is to use, as the key for the second map, an iterator to the specific instance of the variable's name in the first map, which would work if only iterators had an operator<.

I think not.
If you were somehow able to "convince" map to store its pairs in a vector, you would fundamentally change certain (at least two) guarantees on the map:
insert, erase and find would no longer be logarithmic in complexity.
insert would no longer be able to guarantee the validity of unaffected iterators, as the underlying vector would sometimes need to be reallocated.
Taking a step back though, two things suggest to me that you are trying to "solve" the wrong problem.
First, it is unusual to need to have a vector of iterators.
Second, it is unusual to need to check an iterator for validity, as iterators are not generally kept around.
I wonder what the real problem is that you are trying to solve?
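For the symbol-table scenario in the update, one common workaround (not proposed in the answer above, and only a sketch) is to keep the multimap as-is and give the second map a comparator that orders iterators by the address of the element they refer to. std::less on pointers yields a total order, and multimap elements never move, so such a key stays stable until its element is erased. SymbolTable and IteratorAddressLess are hypothetical names; end() iterators must never be used as keys, because they cannot be dereferenced.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical symbol table: variable name -> storage slot.
using SymbolTable = std::multimap<std::string, int>;

// Orders iterators by the address of the element they refer to.
// std::less on pointers is a total order, so this is a valid strict weak
// ordering; it is unrelated to the key order of the symbol table.
struct IteratorAddressLess {
    bool operator()(SymbolTable::const_iterator a, SymbolTable::const_iterator b) const {
        return std::less<const SymbolTable::value_type*>()(&*a, &*b);
    }
};
```

Usage: declare the second structure as std::map<SymbolTable::const_iterator, Info, IteratorAddressLess>, where Info is whatever per-variable data you need; an outer x and a shadowing inner x then occupy two distinct keys.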