Determinism with insert in unordered containers - c++

If I insert the same (size and value) elements in two unordered containers, will traversing the containers with two iterators always give the same element in the same position?
If yes, can a (single!) hash function be made to break this determinism?

It depends: if you insert the same elements in the same order into two different unordered containers, then the order should be the same across both containers, even though the order itself is unspecified.
The reasoning is somewhat indirect: operations such as hash(k) and the internal reallocations are deterministic. The Standard has no explicit statement to this effect, but the ability to do a find() in O(1) on average after an insert() seems to rule out any kind of randomized or otherwise non-deterministic insertion.
However, if you change the order of insertion, then all bets are off because internal reallocations will change the order of elements:
23.2.5 Unordered associative containers [unord.req]
9 The elements of an unordered associative container are organized
into buckets. Keys with the same hash code appear in the same bucket.
The number of buckets is automatically increased as elements are added
to an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements. For unordered_multiset and unordered_multimap,
rehashing preserves the relative ordering of equivalent elements.
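To make the claim concrete, here is a minimal sketch that builds two sets from the same insertion sequence and compares their iteration order. The helper names (iteration_order, build, same_order_for_same_insertions) are made up for illustration, and the result is only what one would expect within a single program on a single standard library implementation with a deterministic hash; the Standard does not spell it out.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Collect the elements of a set in iteration order.
std::vector<int> iteration_order(const std::unordered_set<int>& s) {
    return std::vector<int>(s.begin(), s.end());
}

// Build a set by inserting `keys` one by one, in the given order.
std::unordered_set<int> build(const std::vector<int>& keys) {
    std::unordered_set<int> s;
    for (int k : keys) s.insert(k);
    return s;
}

// True if two sets built from the same insertion sequence iterate identically.
// Assumption: std::hash<int> and the rehash policy are deterministic.
bool same_order_for_same_insertions(const std::vector<int>& keys) {
    const std::unordered_set<int> a = build(keys);
    const std::unordered_set<int> b = build(keys);
    assert(a == b);  // same contents, independent of order
    return iteration_order(a) == iteration_order(b);
}
```

Two sets built from a different insertion order still compare equal with operator==, since that comparison ignores iteration order, but their iteration_order results are allowed to differ.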

Related

unordered_map pointer to element's value valid after resize?

If I have an unordered_map<key, someNiceObject>
(note someNiceObject is not a pointer)
I have an API which inserts a new element, then returns a pointer to someNiceObject now in the map.
If I perform further insertions into the map, there could be a capacity change. If that happens is the pointer still valid or not?
I tried reading
Basic questions: Pointers to objects in unordered_maps (C++),
std::unordered_map pointers/reference invalidation and
http://eel.is/c++draft/unord.req#9
and couldn't locate the necessary information
Thanks all
edit: it seems that the pointer would be valid (https://www.thecodingforums.com/threads/do-insert-erase-invalidate-pointers-to-elements-values-of-std-unordered_map.961062/)
though would appreciate a second confirmation from someone here on SO.
According to cppreference:
If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated.
Which implies that pointers aren't invalidated either. This is possible because std::unordered_map can conceptually be thought of as std::vector<std::forward_list<std::pair<Key, Value>>>. And since std::forward_list, like any other linked list, allocates each element separately, changes to the list don't affect the memory location of its elements.
std::unordered_map is unusual in that the iterator invalidation rules do NOT apply to references to elements (excluding removal, but what can you do when the item is gone?). A capacity change is not important. The problem is when the unordered map rehashes. Rehashing will invalidate all iterators, but will not invalidate references.
From point 9 of [unord.req] in the C++ Standard (citing n4618 because that's what I have on hand at the moment),
The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets elements appear in, but does not invalidate pointers or references to elements. For unordered_multiset and unordered_multimap, rehashing preserves the relative ordering of equivalent elements.
emphasis mine
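A small sketch of the situation described in the question, assuming an API shaped like the one the asker mentions. insert_and_get and pointer_survives_rehash are hypothetical names; the only library facts relied on are that emplace returns an iterator/bool pair and that rehashing leaves pointers to elements valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical API: insert a value and hand back a pointer to the stored object.
std::string* insert_and_get(std::unordered_map<int, std::string>& m,
                            int key, std::string value) {
    auto result = m.emplace(key, std::move(value));
    return &result.first->second;  // points at the element owned by the map
}

// Insert one element, grow the map until it must rehash, and report whether
// the pointer obtained earlier still refers to the same element.
bool pointer_survives_rehash() {
    std::unordered_map<int, std::string> m;
    std::string* p = insert_and_get(m, 0, "first");
    const std::size_t buckets_before = m.bucket_count();

    for (int i = 1; i <= 10000; ++i) m.emplace(i, "filler");
    assert(m.bucket_count() > buckets_before);  // at least one rehash happened

    return p == &m.at(0) && *p == "first";
}
```

Iterators obtained before the loop would not be safe to use afterwards; only the pointer (or a reference) is.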

C++11: does unordered_map/set guarantee that traversal order matches insertion order?

I wrote some code like this:
unordered_map<int, int> uii;
uii.insert(make_pair(12,4));
uii.insert(make_pair(3,2));
uii.insert(make_pair(6,1));
uii.insert(make_pair(16,9));
....
When I use a for loop to visit this map, it prints the keys in the order I inserted them. I tested unordered_set, with the same result.
So my question is, does the C++ standard guarantee the visiting order as insert order, just like Java's LinkedHashMap?
No, it is unordered, there is no such guarantee.
Elements in an unordered associative container are organized into
buckets, keys with the same hash will end up in the same bucket. The
number of buckets is increased when the size of the container
increases to keep the average number of elements in each bucket under
a certain value.
Rehashing invalidates iterators and might cause the elements to be
re-arranged in different buckets but it doesn't invalidate references
to the elements.
This is valid for both unordered_map and unordered_set.
You might also want to check this question Keep the order of unordered_map as we insert a new key
However, an implementation of an unordered container might internally use a list or another ordered container to store the elements and keep only references to sublists in its buckets. That would make the iteration order coincide with the insertion order until enough elements are inserted to cause the list to be rearranged. That is the case with the VS implementation.
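If insertion order is actually needed, it has to be tracked separately. Below is a minimal sketch of one way to do that, not a standard facility: InsertionOrderedMap is a hypothetical class that pairs an unordered_map with a vector of keys. It deliberately leaves out erase, which would need extra bookkeeping.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical helper: an int -> int map that remembers first-insertion order.
class InsertionOrderedMap {
public:
    // Insert key/value if the key is absent; returns true if it was inserted.
    bool insert(int key, int value) {
        const bool inserted = index_.emplace(key, value).second;
        if (inserted) order_.push_back(key);
        assert(index_.size() == order_.size());  // invariant: one entry per key
        return inserted;
    }

    // Keys in the order they were first inserted.
    const std::vector<int>& keys() const { return order_; }

    // Value lookup; throws std::out_of_range if the key is missing.
    int at(int key) const { return index_.at(key); }

private:
    std::unordered_map<int, int> index_;  // fast lookup, unspecified order
    std::vector<int> order_;              // insertion order of the keys
};
```

Inserting 12, 3, 6, 16 as in the question and then walking keys() yields them in exactly that order, regardless of how the underlying unordered_map arranges its buckets.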

Does the C++ standard define the structure of a bucket for unordered_set?

When the hash value for an element of an unordered_set is computed, the element is placed in a "bucket" together with other, different elements that have the same hash value.
My experience is that the elements in such a bucket are stored in a singly linked list. This means searching inside a bucket gets very slow with a bad hash function.
Is the singly linked list a requirement by the standard or just one possible implementation? Could one implement unordered_set with sets as buckets?
The standard states the requirements and guarantees, but doesn't explicitly force the underlying data structures and algorithms.
N4140 §23.2.5 [unord.req]/1
Unordered associative containers provide an ability for fast retrieval
of data based on keys. The worst-case complexity for most operations is
linear, but the average case is much faster.
This is a little weird, because it states the worst case complexity as a fact, instead of just allowing it.
N4140 §23.2.5 [unord.req]/9
The elements of an unordered associative container are organized into
buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to
an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements.
The above does seem to invalidate std::set as a possible data type, but should allow a set-like data structure if it allowed moving elements between its instances without invalidating pointers or references.
That leaves one hurdle: sets would require a comparator/operator< to be defined (with strict weak ordering semantics), while unordered associative containers make no such requirement. In this case you could simply fall back to linked list if it isn't defined, though.
So, as far as I can tell, you could replace the linked list with a set-like structure, if the aforementioned conditions were met. That being said, it does feel like a problem that you shouldn't have experienced in the first place, had you used a proper hashing algorithm.
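Whatever the internal structure is, the bucket interface (bucket_count(), bucket(), bucket_size()) lets you observe how elements are distributed. The sketch below uses a deliberately bad hash to show the degenerate case from the question; ConstantHash, bucket_size_of and worst_case_bucket_size are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Deliberately bad hash: every key gets the same hash code.
struct ConstantHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

// Size of the bucket that holds (or would hold) `key`.
// Precondition: the container has at least one bucket.
template <class Set>
std::size_t bucket_size_of(const Set& s, const typename Set::key_type& key) {
    const std::size_t b = s.bucket(key);
    assert(b < s.bucket_count());
    return s.bucket_size(b);
}

// Insert n distinct keys (n >= 1) under ConstantHash and return the size of
// the bucket containing key 0. Keys with equal hash codes share a bucket,
// so all n elements end up together.
std::size_t worst_case_bucket_size(int n) {
    std::unordered_set<int, ConstantHash> s;
    for (int i = 0; i < n; ++i) s.insert(i);
    return bucket_size_of(s, 0);
}
```

With this hash, every lookup degenerates to a scan over one bucket, which is the linear worst case the Standard mentions.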

Is order of iteration over the elements of std::unordered_set guaranteed to be always the same?

If I iterate over the elements of std::unordered_set multiple times without changing the contents of the set (but potentially reading from it, calculating its size, etc.), is it guaranteed that the elements will be visited in the same order each time?
In the specific case you mention, yes, because the Standard is explicit about when rehashing (and therefore reordering) takes place.
It only happens when the container is modified: during an insert, or through an explicit call to rehash() or reserve(). Reads such as size(), find() or iteration do not trigger it.
§ 23.2.5 [unord.req]
9 The elements of an unordered associative container are organized into buckets. Keys with the same hash
code appear in the same bucket. The number of buckets is automatically increased as elements are added
to an unordered associative container, so that the average number of elements per bucket is kept below
a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets
elements appear in, but does not invalidate pointers or references to elements. For unordered_multiset
and unordered_multimap, rehashing preserves the relative ordering of equivalent elements.
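As a quick check of the read-only case, the following sketch takes two snapshots of the iteration order with only const operations in between. snapshot and stable_between_reads are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Copy the elements out in iteration order.
std::vector<std::string> snapshot(const std::unordered_set<std::string>& s) {
    return std::vector<std::string>(s.begin(), s.end());
}

// Iterate, perform some read-only operations, iterate again, and compare.
// The set is taken by const reference, so nothing here can insert or rehash.
bool stable_between_reads(const std::unordered_set<std::string>& s) {
    const std::vector<std::string> first = snapshot(s);

    (void)s.size();
    (void)s.count("x");
    (void)s.find("y");

    const std::vector<std::string> second = snapshot(s);
    assert(first.size() == s.size());
    return first == second;
}
```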

Questions about STL containers in C++

How often do std::multimap and std::unordered_multimap shuffle entries around? I'm asking because my code passes around references to distinguish between entries with the same hash, and I want to know when to run the reference redirection function on them.
What happens if I do this:
std::multimap atable; //Type specification stuff left out
//Code that puts in two entries with the same key, call that key foo
int bar = atable[foo];
Is the result different if it's an unordered_multimap instead?
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Do the entries move around if I remove one of the entries (That's what's suggested by a reading of the documentation for std::vector)?
NO, no elements will be harmed during any operation.
As is explained in this famous Q&A, for associative containers, there is no iterator invalidation upon insertion / erasure (except for the element being erased of course). For unordered associative containers, there is iterator invalidation during rehashing, about which the Standard says (emphasis mine)
23.2.5 Unordered associative containers [unord.req]
9 The elements of an unordered associative container are organized into
buckets. Keys with the same hash code appear in the same bucket. The
number of buckets is automatically increased as elements are added to
an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements. For unordered_multiset and unordered_multimap,
rehashing preserves the relative ordering of equivalent elements.
Again, this does not entail the reshuffling of the actually stored elements (the Key and Value types in unordered_map<Key, Value>), because unordered maps have buckets that are organized as linked lists, and iterators to stored elements (key-value pairs) have both an element pointer and a bucket pointer. The rehashing shuffles buckets, which invalidates the iterators (because their bucket pointer is invalidated) but not pointers or references to the elements themselves. This is explained in detail in another Q&A
How often do std::multimap and std::unordered_multimap shuffle entries around?
Never. The iterators that point to elements of any associative container (including sets, maps, and their unordered or "multi" versions) are never invalidated (unless the specific element they point to is deleted). In other words, the actual elements are never "shuffled around". These are required to be implemented as linked structures (e.g., linked-tree), meaning they can be re-structured just by changing a few pointers, without having to physically move any element.
EDIT: Apparently (see TemplateRex' comment), this is not the case for unordered containers. In that case, the iterators can get invalidated, but the elements themselves do not move around. These requirements imply an indirect container with no back-pointer, which I guess is a reasonable choice, but not one I would have expected.
What happens if I do this: ... (get [] of multimap) ...
The operator[] is not defined for std::multimap (or unordered version). So, what would happen? A compiler error would happen.
Is the result different if it's an unordered_multimap instead?
No, it's the same, the operator[] does not exist.
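Since operator[] is unavailable, the usual way to read the entries stored under one key is equal_range(), which exists on both std::multimap and std::unordered_multimap. A minimal sketch, with values_for as a made-up helper name:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collect all values stored under `key` in a multimap-like container.
// Works with std::multimap and std::unordered_multimap alike.
template <class MultiMap>
std::vector<typename MultiMap::mapped_type>
values_for(const MultiMap& m, const typename MultiMap::key_type& key) {
    std::vector<typename MultiMap::mapped_type> out;
    const auto range = m.equal_range(key);  // [first, second) holds all matches
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    assert(out.size() == m.count(key));
    return out;
}
```

For std::multimap, entries with equivalent keys come back in insertion order (guaranteed since C++11). For std::unordered_multimap, the order within the range is unspecified, so only the set of values should be relied on.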
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Yes, the recommended practice is to refer to elements of the map / set / whatever using iterators, not references. The iterators to elements are guaranteed to remain valid, and they are copyable and have the right const-ness protection on them, and that makes them the perfect objects to "refer to an entry".
EDIT: As per the same comment, I would have to recommend using pointers to the elements if dealing with a hashed container (unordered containers), because iterators are not guaranteed (by the standard) to remain valid.
All of the associative containers in the C++ standard library are node based, i.e., their elements stay put. However, whether the hash is computed on the object after copying it or on a temporary object passed to the container isn't specified. I would guess that the hash is generally computed before the object is copied/moved.
To distinguish elements with the same hash you need to have an equality function anyway: if the location of the object caused it to be different, it would mean that all objects are different and you wouldn't be able to look them up at all. You need to have an equality function for the elements in an unordered container which defines equivalence of keys. For the ordered associative containers, the equivalence class is based on the strict weak ordering, i.e., on an expression like this (using < rather than a binary predicate for readability; any binary predicate defining a strict weak order would work, too):
bool equivalent = !(a < b) && !(b < a);
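To illustrate how equivalence under an ordering can differ from equality, here is a sketch using a case-insensitive comparison. CaseInsensitiveLess, the function template equivalent and distinct_ignoring_case are all hypothetical names chosen for this example, and the comparison only handles single-byte characters via std::tolower.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical strict weak ordering: compare strings ignoring case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Equivalence induced by a strict weak ordering: neither is less than the other.
template <class T, class Less>
bool equivalent(const T& a, const T& b, Less less) {
    assert(!less(a, a));  // sanity check: a strict ordering is irreflexive
    return !less(a, b) && !less(b, a);
}

// std::set keeps one element per equivalence class of its comparator.
std::size_t distinct_ignoring_case(std::initializer_list<std::string> words) {
    std::set<std::string, CaseInsensitiveLess> s(words);
    return s.size();
}
```

Under this ordering "Foo" and "foo" are equivalent although they are not equal with operator==, so a std::set using it would store only one of them.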