unordered_map rehashing on erase()

I am not entirely clear on whether unordered_map is allowed to perform a rehash during an erase().
It is pretty clear that rehashing can happen during insert() thus invalidating all iterators and references:
http://en.cppreference.com/w/cpp/container/unordered_map/insert
But erase() seems to preserve all iterators and references except for the ones erased:
http://en.cppreference.com/w/cpp/container/unordered_map/erase
However, that last page and the Standard indicate that the worst-case execution time of erase() is O(size). What operation can take linear time to complete without modifying the container in a way that invalidates iterators?
This post suggests that iterators are invalidated during erasure:
http://kera.name/articles/2011/06/iterator-invalidation-rules-c0x/
I also read somewhere that a future proposal will allow rehashing on erase(). Is that true?
If rehashing does indeed occur, the old iterate-and-erase algorithm is wrong, right?

If you have a very bad hash, or pathological data, all of the elements can end up in one bucket, making it an O(n) traversal to locate/remove the element.
It is perfectly legal to implement a std::unordered_map using a (singly-)linked-list for the elements: we can see that its iterator type is only required to fulfil the ForwardIterator concept. This makes a linear traversal necessary to remove an element within a bucket, even when passing the iterator.
This is the operation that may take O(n) time, not rehashing.
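Since erase() only invalidates iterators to the erased element, the usual iterate-and-erase loop stays valid. A minimal sketch, assuming C++11 (where erase() returns the iterator following the erased element); the function name erase_odd_keys and the odd-key condition are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Erases every entry whose key is odd while iterating over the map.
// Returns the number of entries removed.
std::size_t erase_odd_keys(std::unordered_map<int, int>& m) {
    const std::size_t before = m.size();
    std::size_t removed = 0;
    for (auto it = m.begin(); it != m.end();) {
        if (it->first % 2 != 0) {
            // Only the erased iterator is invalidated; continue from the
            // iterator that erase() returns.
            it = m.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    assert(before - m.size() == removed);
    return removed;
}
```

Pointers and references to the surviving elements can be used after the loop, because erase() does not touch them.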

Related

unordered_map pointer to element's value valid after resize?

If I have an unordered_map<key, someNiceObject>
(note someNiceObject is not a pointer)
I have an API which inserts a new element, then returns a pointer to someNiceObject now in the map.
If I perform further insertions into the map, there could be a capacity change. If that happens is the pointer still valid or not?
I tried reading
Basic questions: Pointers to objects in unordered_maps (C++),
std::unordered_map pointers/reference invalidation and
http://eel.is/c++draft/unord.req#9
and couldn't locate the necessary information
Thanks all
edit: it seems that the pointer would be valid (https://www.thecodingforums.com/threads/do-insert-erase-invalidate-pointers-to-elements-values-of-std-unordered_map.961062/)
though would appreciate a second confirmation from someone here on SO.

According to cppreference:
If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated.
Which implies that pointers aren't invalidated either. This is possible because std::unordered_map can conceptually be thought of as std::vector<std::forward_list<std::pair<Key, Value>>>. And since std::forward_list, like any other linked list, allocates each element separately, changes to the list don't affect the memory location of its elements.
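The claim can be checked with a small experiment. This is only a sketch: GrowthResult and grow_and_check are hypothetical names, and the number of insertions needed to force a rehash depends on the implementation's initial bucket count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

struct GrowthResult {
    bool rehashed;       // the bucket count changed while inserting
    bool pointer_valid;  // the saved pointer still addresses the same element
};

// Takes a pointer to one mapped value, inserts `extra` more elements,
// then reports whether a rehash happened and whether the pointer survived.
GrowthResult grow_and_check(int extra) {
    std::unordered_map<int, std::string> m;
    m.emplace(0, "first");
    const std::string* p = &m.at(0);
    const std::size_t buckets_before = m.bucket_count();

    for (int i = 1; i <= extra; ++i) m.emplace(i, "filler");
    assert(m.size() == static_cast<std::size_t>(extra) + 1);

    GrowthResult r;
    r.rehashed = m.bucket_count() != buckets_before;
    r.pointer_valid = (p == &m.at(0)) && (*p == "first");
    return r;
}
```

With a few thousand insertions the bucket count has to grow, so both flags should come back true: the table was rehashed, yet the pointer still refers to the same element.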

std::unordered_map is unusual in that the iterator invalidation rules do NOT apply to references to elements (excluding removal, but what can you do when the item is gone?). A capacity change is not important. The problem is when the unordered_map rehashes. Rehashing will invalidate all iterators, but will not invalidate references.
From point 9 of [unord.req] in the C++ Standard (citing n4618 because that's what I have on hand at the moment),
The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets elements appear in, but does not invalidate pointers or references to elements. For unordered_multiset and unordered_multimap, rehashing preserves the relative ordering of equivalent elements.
emphasis mine

Is using std::find and C::insert() thread safe if iterators are not invalidated by insert

Suppose I have a container C whose iterators are not invalidated upon C.insert(), can I safely perform a std::find() on the container while a concurrent insert() is being performed? That is, am I guaranteed to find a matching element or C::end(), ignoring the fact that the inserted element may match but std::find() gives me C::end()?

No. Although iterators are not invalidated by your mutating operation, it is still a mutating operation, and none of the standard containers are defined to be safe to read in one thread while a mutating operation is taking place in another. Remember that there are still "innards" to your container, all manner of internal state, which may be involved in both operations.

Do the iterator invalidation rules mean thread safety?

This Stack Overflow answer lists the iterator invalidation rules for the standard containers in C++11.
In particular, for insertion:
[multi]{set,map}: all iterators and references unaffected [23.2.4/9]
unordered_[multi]{set,map}: all iterators invalidated when rehashing occurs, but references unaffected [23.2.5/8]. Rehashing does not occur if the insertion does not cause the container's size to exceed z * B where z is the maximum load factor and B the current number of buckets. [23.2.5/14]
erasure:
[multi]{set,map} and unordered_[multi]{set,map}: only iterators and references to the erased elements are invalidated
Do these rules mean I can safely do insertion and erasure in one thread while another thread accesses and looks up (using find) elements, as long as these elements are not the ones being inserted and erased in the first thread, and as long as I make sure that rehashing is not happening?
If not, what do these rules exactly mean?
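The z * B condition quoted above (it concerns iterator validity only, not thread safety) can be sketched in a single thread. The function name is hypothetical; the sketch relies on reserve(n) sizing the table so that n elements fit without exceeding max_load_factor() * bucket_count().

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Inserts n elements (n >= 1) after reserve(n) and reports whether an
// iterator obtained from the first insertion is still usable afterwards.
bool iterator_survives_after_reserve(std::size_t n) {
    std::unordered_map<std::size_t, int> m;
    m.reserve(n);  // enough buckets for n elements without a rehash
    const std::size_t buckets = m.bucket_count();

    auto it = m.emplace(0, 7).first;
    for (std::size_t i = 1; i < n; ++i) m.emplace(i, 0);

    // The size never exceeded z * B, so no rehash was required.
    assert(static_cast<double>(m.size()) <=
           static_cast<double>(m.max_load_factor()) * static_cast<double>(buckets));

    // Dereference `it` only if the bucket count is unchanged.
    return m.bucket_count() == buckets && it->second == 7;
}
```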

The fact that iterators to elements of the container are not invalidated in no way implies thread safety on the container itself. For example, the size member variable would need to be modified atomically which is a totally separate issue from iterators being invalidated (or not) on insertion/deletion.
tl;dr; No.
These rules simply tell you when an iterator to an element is invalidated by an operation. For example, when a vector resizes, the underlying array is reallocated elsewhere so if you had an iterator (or pointer) to an element, it would no longer be valid after the resize (because it would be pointing to deleted elements of the old array).

There are two kinds of operations on C++ std containers. Reader and Writer operations (these are not the terms the standard uses, but this reads easier). In addition, there are operations on elements in the container.
A const method is a Reader method, as are "lookup" functions that are only non-const because they return a non-const reference to an element (or similar). There is a complete list in the standard, but common sense should work. vector::operator[], begin(), map::find() are all "Readers". map::operator[] is not a "Reader".
You can have any number of threads engaging in Reader operations at the same time no problem.
If you have any thread engaged in a Writer operation, no other access can occur on the container.
You cannot safely have one Reader and one Writer at the same time. If you have any Writers, you must have excluded all other access.
You can safely have 1337 readers at once.
Operations on elements are somewhat similar, except that writing to an element need only exclude other access to that element. And you are responsible for making your const methods play nicely with each other. (The std guarantees that the const methods of the container will only call const methods of the elements.)
Note that changing the sort order of elements in a std:: associative container is UB, even if you never modify the container itself.
An iterator that is not invalidated, where you just operate on the element, will (I believe) count as operations on the element. Only synchronization on that element is required.
Note that std::vector<bool> does not follow the above rules.
If you violate the above rules, the C++ std library does not stop you. It just states there is a race condition -- aka, undefined behavior. Anything can happen. In C++ std library, if you don't need something, you don't pay for it: and a single-threaded use of those containers would not need the weight of synchronization. So you don't get it by default.
A std::shared_timed_mutex or std::experimental::shared_mutex can both be useful to guarantee the above holds: unique_lock for Writing, and shared_lock for Reading. Write access to an element has to hold a shared_lock on the container, and must somehow also be guarded against overlapping access to the same element without deadlock.
Iterator invalidation rules are relatively orthogonal to the thread-safety rules.
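The Reader/Writer locking scheme described above might look roughly like the following. This is a sketch, not a recommended design: GuardedMap and its members are hypothetical names, it uses std::shared_timed_mutex from C++14, and it copies values out instead of handing out references, so the per-element synchronization mentioned above is not needed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper; the class and member names are illustrative only.
class GuardedMap {
public:
    // Writer operation: excludes every other access to the container.
    void insert(int key, std::string value) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        map_[key] = std::move(value);
    }

    // Reader operation: any number of these may run at the same time.
    bool contains(int key) const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        return map_.find(key) != map_.end();
    }

    // Reader operation that copies the value out, so no reference to an
    // element escapes the lock.
    bool get(int key, std::string& out) const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        const auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::shared_timed_mutex mutex_;
    std::unordered_map<int, std::string> map_;
};

// Single-threaded walk through the interface.
void guarded_map_example() {
    GuardedMap gm;
    gm.insert(1, "one");
    std::string s;
    assert(gm.contains(1));
    assert(gm.get(1, s) && s == "one");
    assert(!gm.contains(2));
}
```

Because every access goes through the mutex, a rehash triggered by insert() can never overlap with a find() in another thread.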

Using find implies traversal, at least over a subset of the elements. insert and erase on [multi]{set,map} results in rebalancing the underlying tree, which impacts the links between the nodes. If a rebalance happens at the same time as a find, bad things will happen.
Similarly bad things will happen if you attempt a find during unordered_[multi]{set,map} insert or erase. insert can cause rehashing. And both insert and erase need to link/unlink elements from a list. If a find is searching over a list during a link/unlink, you lose.
[] on map and unordered_map is shorthand for "find and insert if not found". at is shorthand for find. So no, these are not safe to use either.
If you have an existing iterator into a [multi]{set,map}, you can continue to dereference (but not increment/decrement) that iterator while another element is inserted or erased. For unordered_[multi]{set,map}, this is true only if you can guarantee that rehashing won't happen under the insert (it never happens under the erase).

There are other answers here that go into the thread safety issue. So if these rules don't mean thread safety, where does that leave us?
If not, what do these rules exactly mean?
They tell you when you can't use an iterator anymore.
Let's take a (deceptively innocent) example:
auto v = std::vector<int>{....};
//v.reserve(...);
for (auto it = std::begin(v); it != std::end(v); ++it) {
    if (*it == ...)
        v.insert(it, ...);
}
Here you traverse a vector and for each element that tests a condition, you insert something into that position.
Now, is this code valid? The iterator invalidation rules tell you that if the vector's capacity is big enough, the insertion invalidates only the iterators at or after the insertion position; everything before it stays valid. If the capacity is not big enough, the insertion invalidates all the iterators of the vector. Proving that the reserve (commented line) is big enough therefore rules out the reallocation, but it still sits at the insertion position, so it is invalidated in both cases and you cannot use it anymore. You have to reacquire it:
auto idx = std::distance(std::begin(v), it);
v.insert(it, ...);
it = std::begin(v) + idx;
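An alternative to recomputing the index is to use the iterator that insert() returns, which is always valid. A runnable sketch with concrete placeholder values; the function name, target and marker are assumptions made for illustration, since the original example elides them:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts `marker` in front of every element equal to `target`.
void insert_before_matches(std::vector<int>& v, int target, int marker) {
    const std::size_t old_size = v.size();
    const auto matches = std::count(v.begin(), v.end(), target);

    for (auto it = v.begin(); it != v.end(); ++it) {
        if (*it == target) {
            // insert() returns a valid iterator to the new element,
            // whether or not the vector had to reallocate.
            it = v.insert(it, marker);
            ++it;  // step onto the original match so it is not visited again
        }
    }
    assert(v.size() == old_size + static_cast<std::size_t>(matches));
}
```

Calling it on {1, 2, 1} with target 1 and marker 0 gives {0, 1, 2, 0, 1}. Note that v.end() is re-evaluated on every pass of the loop, which matters because insertion invalidates the past-the-end iterator as well.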

C++: how to track a pointer to a STL list element?

I would like to track a pointer to a list element for the next read access. This pointer would be advanced every time the list is read. Would it be bad practice to cache an iterator to the list as a member variable, assuming I take proper precautions when deleting list elements?

An iterator to a list element remains valid until the element it refers to is removed. This is guaranteed by the standard.
There is no problem with caching an iterator as long as you make sure you don't remove elements from the list or refresh the cached iterator when you do.
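A minimal sketch of such a cached iterator follows. ListReader and its members are hypothetical names; the point is only that the cursor is refreshed whenever the element it refers to is erased.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical reader that remembers its position between calls.
class ListReader {
public:
    ListReader() = default;
    // Copying would leave the cursor pointing into another object's list.
    ListReader(const ListReader&) = delete;
    ListReader& operator=(const ListReader&) = delete;

    void push(int value) {
        // push_back invalidates nothing, but a cursor parked at end()
        // must be moved onto the new element or it would be skipped.
        const bool at_end = (next_ == items_.end());
        items_.push_back(value);
        if (at_end) next_ = std::prev(items_.end());
    }

    // Stores the next unread value in `out`; returns false if none is left.
    bool read(int& out) {
        if (next_ == items_.end()) return false;
        out = *next_;
        ++next_;
        return true;
    }

    // Removes all elements equal to `value`, refreshing the cursor if the
    // element it refers to is erased.
    void remove(int value) {
        for (auto it = items_.begin(); it != items_.end();) {
            if (*it == value) {
                const bool was_cursor = (it == next_);
                it = items_.erase(it);
                if (was_cursor) next_ = it;
            } else {
                ++it;
            }
        }
    }

private:
    std::list<int> items_;
    std::list<int>::iterator next_ = items_.end();
};

// Walk through the interface: the cursor survives erasing its element.
void list_reader_example() {
    ListReader r;
    r.push(1);
    r.push(2);
    r.push(3);
    int x = 0;
    assert(r.read(x) && x == 1);
    r.remove(2);  // the cursor pointed at 2 and is refreshed to 3
    assert(r.read(x) && x == 3);
}
```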

Iterators are meant to be used. They aren't just for looping over each element in a for-loop. All you have to take into account is the rules for iterator invalidation.
std::vector iterators can be invalidated by any operation that inserts elements into the vector. The iterators for all elements at or beyond the point of insertion are invalidated, and all iterators are invalidated if the insertion causes an increase in capacity. An operation that removes an element from the vector invalidates every iterator at or after the point of removal.
std::deque iterators are invalidated by any operation that adds or removes elements from anywhere in the deque. So it's probably not a good idea to keep these around very long.
std::list, std::set, and std::map iterators are only invalidated by the specific removal of the particular element that the iterator refers to. These are the longest-lived of the iterator types.
As long as you keep these rules in mind, feel free to store these iterators all you want. It certainly isn't bad form to store std::list iterators, as long as you can be sure that that particular element isn't going anywhere.

The only way you're going to be able to properly advance through an STL std::list<T> in a platform- and compiler-independent way is through the use of a std::list<T>::iterator or std::list<T>::const_iterator, so that's really your only option unless you're planning on implementing your own linked list. Per the standard, as others here have posted, an iterator to a std::list element will remain valid until that element is removed from the list.

STL vector vs map erase

In the STL almost all containers have an erase function. In a vector, the erase function returns an iterator pointing to the next element in the vector. The map container does not do this; instead it returns void. Does anyone know why there is this inconsistency?

See http://www.sgi.com/tech/stl/Map.html
Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements. Erasing an element from a map also does not invalidate any iterators, except, of course, for iterators that actually point to the element that is being erased.
The reason for returning an iterator on erase is so that you can iterate over the list erasing elements as you go. If erasing an item doesn't invalidate existing iterators there is no need to do this.
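Because only the erased iterator is invalidated, the pre-C++11 idiom advances the iterator before the erase takes effect. A minimal sketch; the function name and the negative-value condition are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Removes every entry with a negative value, without relying on the
// return value of erase(). Returns the number of entries removed.
std::size_t erase_negative(std::map<int, int>& m) {
    const std::size_t before = m.size();
    for (std::map<int, int>::iterator it = m.begin(); it != m.end();) {
        if (it->second < 0) {
            // it++ moves `it` to the next element and hands the old
            // position to erase(); only that old position is invalidated.
            m.erase(it++);
        } else {
            ++it;
        }
    }
    assert(m.size() <= before);
    return before - m.size();
}
```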

erase returns an iterator in C++11. This is due to defect report 130:
Table 67 (23.1.1) says that container::erase(iterator) returns an iterator. Table 69 (23.1.2) says that in addition to this requirement, associative containers also say that container::erase(iterator) returns void. That's not an addition; it's a change to the requirements, which has the effect of making associative containers fail to meet the requirements for containers.
The standards committee accepted this:
the LWG agrees the return type should be iterator, not void. (Alex Stepanov agrees too.)
(LWG = Library Working Group).

The inconsistency is due to use. vector is a sequence having an ordering over the elements. While it's true that the elements in a map are also ordered according to some comparison criterion, this ordering is non-evident from the structure. There is no efficient way to get from one element to the next (efficient = constant time). In fact, to iterate over the map is quite expensive; either the creation of the iterator or the iterator itself involves a walk over the complete tree. This cannot be done in O(n), unless a stack is used, in which case the space required is no longer constant.
All in all, there simply is no cheap way of returning the “next” element after erasing. For sequences, there is a way.
Additionally, Rob is right. There's no need for the Map to return an iterator.

Just as an aside, the STL shipped with MS Visual Studio C++ (Dinkumware IIRC) provides a map implementation with an erase function returning an iterator to the next element.
They do note it's not standards conforming.

I have no idea if this is the answer, but one reason might be with the cost of locating the next element. Iterating through a map is inherently "slow".
