Can a reference to an element inside an std::map be invalidated? - c++

I have a multi-threaded application and a shared resource std::map<KeyType, ElementType>. I use a mutex to protect inserts, gets and removes.
My get method returns a reference to the stored element (unlocks on return), and then I do some work with that element.
Question: Is it possible that while working with the stored element reference, another thread may change the std::map so the element will be moved to a different address and the reference will no longer be valid? (I know there are certain ADT implementations which do rearrangement of the ADT on resize).

The iterator invalidation rule for associative containers (which std::map is) says at [associative.reqmts]/9:
The insert and emplace members shall not affect the validity of
iterators and references to the container, and the erase members shall
invalidate only iterators and references to the erased elements.
So if one thread inserts an element, it won't affect any references to existing elements. But if it removes something, other threads may be borked. Some form of element-wise locking is in order, I'd say.
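The quoted rule can be checked in a single thread. The following is a minimal sketch, not a threading test: the function names are made up for illustration, and it only shows that inserts and unrelated erases leave the address of an existing element unchanged.

```cpp
#include <map>
#include <string>

// Returns true if the element with key 1 is still at the same address
// after many further insertions into the same std::map.
bool reference_survives_inserts() {
    std::map<int, std::string> m;
    m[1] = "first";
    std::string& ref = m.at(1);          // reference to a stored element
    const std::string* before = &ref;

    for (int k = 2; k < 1000; ++k) {
        m.emplace(k, "filler");          // inserts do not move existing elements
    }

    return before == &m.at(1) && ref == "first";
}

// Erasing a *different* key leaves the reference valid as well.
bool reference_survives_unrelated_erase() {
    std::map<int, std::string> m{{1, "keep"}, {2, "drop"}};
    std::string& ref = m.at(1);
    m.erase(2);                          // only references to key 2 are invalidated
    return ref == "keep" && &ref == &m.at(1);
}
```

Both functions return true. Neither says anything about what happens when the insert or erase runs concurrently with the access; that still needs the mutex.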

Another thread may erase the element, or destroy the map, which would of course also invalidate the element.
Erasing an element only invalidates iterators and references to this element. Insertion does not invalidate iterators or references into the map.
(that's what the second-hand documentation says, at least - and it's an assumption I hold that never was invalidated, if anecdotal evidence counts.)
Another problem remains: Manipulation of the element through the returned reference is not thread safe. You need to sync e.g. per element - and make sure you don't violate lock hierarchy.
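One possible shape for per-element synchronization is sketched below. It is an assumption, not something the question's code does: Registry, Entry and append are hypothetical names, and the map stores std::shared_ptr so that a concurrent remove cannot leave a caller holding a dangling reference.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical element type: the payload plus its own mutex.
struct Entry {
    std::mutex m;
    std::string value;
};

// Hypothetical wrapper; the map mutex is held only for map operations.
class Registry {
public:
    void put(int key, std::string v) {
        auto e = std::make_shared<Entry>();
        e->value = std::move(v);
        std::lock_guard<std::mutex> lock(map_mutex_);
        entries_[key] = std::move(e);
    }

    // Returns shared ownership (or nullptr if the key is absent), so the
    // element outlives a concurrent remove() for as long as the caller uses it.
    std::shared_ptr<Entry> get(int key) {
        std::lock_guard<std::mutex> lock(map_mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return nullptr;
        return it->second;
    }

    bool remove(int key) {
        std::lock_guard<std::mutex> lock(map_mutex_);
        return entries_.erase(key) > 0;
    }

private:
    std::mutex map_mutex_;
    std::map<int, std::shared_ptr<Entry>> entries_;
};

// Works on one element under its own lock, after the map lock was released,
// so the two locks are never held at the same time.
void append(Registry& r, int key, const std::string& suffix) {
    if (auto e = r.get(key)) {
        std::lock_guard<std::mutex> lock(e->m);
        e->value += suffix;
    }
}
```

Because get releases the map mutex before the element mutex is taken, there is no lock ordering to get wrong in this sketch. The price is that a thread may still be working on an element that another thread has already removed from the map.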

Related

Are iterators still valid when the underlying elements have been moved?

If I have an iterator pointing to an element in an STL container, and I moved the element with the iterator, does the standard guarantee that the iterator is still valid? Can I use it with container's method, e.g. container::erase?
Also, does it matter whether the container is a contiguous one, e.g. vector, or a non-contiguous one, e.g. list?
std::list<std::string> l{"a", "b", "c"};
auto iter = l.begin();
auto s = std::move(*iter);
l.erase(iter); // <----- is it valid to erase it, now that its underlying element has been moved from?
Yes, you've modified the object in the container. You've not modified the container itself, so the iterator is still valid.
"Moving" an underlying element may not be the best name to use in this context. The name of this operation expresses the intention behind it but not how it really works.
In fact, the move operation is a form of copy operation with one difference: it is allowed to change the state of the "copied" object if that speeds up execution. In the case of std::string this means that the internal buffer containing the characters may not be deep-copied but just copied by address. The original object then has to be set to an empty state, to tell it not to use this buffer anymore. (Emptying the source string is not guaranteed. Optimizations of std::string are more complicated than I described.)
The important thing is that after the move operation, the original object is still there. It's just not guaranteed to have any specific state.
In this particular case you've done nothing to the iterator, only to the object it refers to, so yes: the iterator remains valid.
But if you look at std::list::erase, it sports a line such as "References and iterators to the erased elements are invalidated. Other references and iterators are not affected."
So if you tried to do *iter after the erase, that would be undefined behavior and could cause your program to fail.
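The move-then-erase sequence from the question can be wrapped in a small function. This is only a sketch: take_front is a made-up name, and it assumes the list is not empty.

```cpp
#include <list>
#include <string>
#include <utility>

// Moves the first string out of the list, then erases the moved-from
// element through the same iterator. Returns the moved-out string.
// Precondition (assumed): l is not empty.
std::string take_front(std::list<std::string>& l) {
    auto iter = l.begin();
    std::string s = std::move(*iter);  // the element stays in the list, in an unspecified state
    l.erase(iter);                     // fine: the move did not invalidate iter
    // From here on iter is invalid and must not be dereferenced.
    return s;
}
```

Called on a list holding "a", "b", "c", it returns "a" and leaves "b", "c" in the list.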
This may seem obvious for erase, but there are other operations where it is not as obvious.
For std::list for example, the reference page says:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
For std::vector on the other hand, the reference for the push_back method says:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
That means, unlike with std::list, it is not generally safe to keep an iterator to an element around, if the vector grows (because the underlying storage location of the item changes).
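The contrast between the two containers can be sketched as follows. The function names are illustrative; the vector half remembers an index instead of an iterator, which is one common way to survive a reallocation.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// A list iterator keeps pointing at the same element while the list grows.
bool list_iterator_stays_valid() {
    std::list<int> l{1, 2, 3};
    auto it = l.begin();
    for (int i = 0; i < 1000; ++i) l.push_back(i);
    l.push_front(0);
    return *it == 1;
}

// Growing a vector past capacity() reallocates, so this sketch keeps an
// index and looks the element up again instead of holding an iterator.
int vector_element_after_growth() {
    std::vector<int> v{1, 2, 3};
    const std::size_t idx = 1;                        // position of the value 2
    const std::size_t old_cap = v.capacity();
    while (v.capacity() == old_cap) v.push_back(0);   // force a reallocation
    return v[idx];                                    // still the value 2
}
```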

std::unordered_map insert invalidates only iterators but not references and pointers to the element node

Can somebody explain why insertion into a std::unordered_map container only invalidates iterators but not references and pointers? Also, I am not able to understand what the statement below from https://en.cppreference.com/w/cpp/container/unordered_map/insert means:
If the insertion is successful, pointers and references to the element obtained while it is held in the node handle are invalidated, and pointers and references obtained to that element before it was extracted become valid.
Insertion into unordered_map doesn't invalidate references because it doesn't move the data; however, the underlying data structure might change rather significantly. Details of exactly how it is implemented aren't specified, and different compilers do it differently. For instance, MSVC has a linked list for data storage and, I believe, a vector for the look-up. And insert can cause a rehash, meaning the look-up gets changed completely and the linked list gets reordered significantly - but the original data is not moved.
The iterators refer to this underlying structure and any changes to it can cause iterators to invalidate. Basically, they contain more info than pointers and references and subsequently can get invalidated.
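A small single-threaded sketch of this behavior: it inserts until bucket_count() changes, which means a rehash has happened, and then compares the element's address with the one taken before. The function name is made up for illustration.

```cpp
#include <string>
#include <unordered_map>

// Returns true if a pointer to an element is still good after the
// container has rehashed at least once.
bool reference_survives_rehash() {
    std::unordered_map<int, std::string> m;
    m.emplace(1, "first");
    std::string* p = &m.at(1);
    const auto buckets_before = m.bucket_count();

    int k = 2;
    while (m.bucket_count() == buckets_before) {
        m.emplace(k++, "filler");      // eventually exceeds the max load factor
    }

    // Any iterator taken before the loop must be treated as invalid now;
    // the pointer still refers to the same element.
    return p == &m.at(1) && *p == "first";
}
```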
The confusing passage is about insert of a node_type - nodes that were previously extracted. Check out the method unordered_map::extract
https://en.cppreference.com/w/cpp/container/unordered_map/extract
For some reason it is forbidden to use pointers/references while the node is extracted. After inserting it is allowed to use them again. I don't know why it is so.
In terms of the second part of the question, it is referring to the Node handle introduced in C++17. It is a move-only type, that has direct ownership of the underlying key and value. It can be used to change the key of an element without reallocation and transfer element ownership without copy or move.
Since it allows changing const-like data (such as the key), I personally think it makes sense to only allow such an edit when the element is isolated from the container, i.e. when it is in node form; which is why pointers and references to its underlying data obtained in that form are invalidated once the node is inserted back into the container.
Similarly, since the insertion does not incur any reallocation, once the node is inserted back into the container, pointers and references that pointed to the data before it was extracted become valid again.
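As a rough illustration of the node-handle round trip (C++17), the sketch below changes a key through extract and insert and checks that the mapped value is still at the address taken before extraction. The function name is hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Re-keys an element via a node handle. Returns true if the mapped value
// lives at the same address afterwards.
bool rekey_keeps_address() {
    std::unordered_map<int, std::string> m{{1, "payload"}};
    std::string* before = &m.at(1);   // obtained while the element is in the map

    auto node = m.extract(1);         // the element is now owned by the node handle
    node.key() = 2;                   // the key is only mutable in node form
    m.insert(std::move(node));        // relinks the node; the element is not copied or moved

    // Pointers obtained before the extraction are usable again here.
    return m.count(1) == 0 && &m.at(2) == before && *before == "payload";
}
```

The pointer is deliberately not used between extract and insert, which is the window the cppreference passage is about.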

Is using std::find and C::insert() thread safe if iterators are not invalidated by insert

Suppose I have a container C whose iterators are not invalidated upon C.insert(), can I safely perform a std::find() on the container while a concurrent insert() is being performed? That is, am I guaranteed to find a matching element or C::end(), ignoring the fact that the inserted element may match but std::find() gives me C::end()?
No. Although iterators are not invalidated by your mutating operation, it is still a mutating operation, and none of the standard containers are defined to be safe to read in one thread while a mutating operation is taking place in another. Remember that there are still "innards" to your container, all manner of internal state, which may be involved in both operations.

Do the iterator invalidation rules mean thread safety?

This Stack Overflow answer lists the iterator invalidation rules for the standard containers in C++11.
Particularly, there are for insertion:
[multi]{set,map}: all iterators and references unaffected [23.2.4/9]
unordered_[multi]{set,map}: all iterators invalidated when rehashing occurs, but references unaffected [23.2.5/8]. Rehashing does not occur if the insertion does not cause the container's size to exceed z * B where z is the maximum load factor and B the current number of buckets. [23.2.5/14]
erasure:
[multi]{set,map} and unordered_[multi]{set,map}: only iterators and references to the erased elements are invalidated
Do these rules mean I can safely do insertion and erasure in one thread, and safely in another thread access, look for (using find) elements as long as these elements are not the ones being inserted and erased in the first thread, and make sure that rehashing is not happening?
If not, what do these rules exactly mean?
The fact that iterators to elements of the container are not invalidated in no way implies thread safety on the container itself. For example, the size member variable would need to be modified atomically which is a totally separate issue from iterators being invalidated (or not) on insertion/deletion.
tl;dr; No.
These rules simply tell you when an iterator to an element is invalidated by an operation. For example, when a vector resizes, the underlying array is reallocated elsewhere so if you had an iterator (or pointer) to an element, it would no longer be valid after the resize (because it would be pointing to deleted elements of the old array).
There are two kinds of operations on C++ std containers. Reader and Writer operations (these are not the terms the standard uses, but this reads easier). In addition, there are operations on elements in the container.
A const method is a Reader method, as are "lookup" functions that are only non-const because they return a non-const reference to an element (or similar). There is a complete list in the standard, but common sense should work. vector::operator[], begin(), map::find() are all "Readers". map::operator[] is not a "Reader".
You can have any number of threads engaging in Reader operations at the same time no problem.
If you have any thread engaged in a Writer operation, no other access can occur on the container.
You cannot safely have one Reader and one Writer at the same time. If you have any Writers, you must have excluded all other access.
You can safely have 1337 readers at once.
Operations on elements are somewhat similar, except that Writing to an element need only exclude other access to that element. And you are responsible for making your const methods play nice with each other. (The std guarantees that the const methods of the container will only call const methods of the elements.)
Note that changing the sorting order of elements in a std:: associative container is UB, even though it does not modify the container itself.
An iterator that is not invalidated, where you just operate on the element, will (I believe) count as operations on the element. Only synchronization on that element is required.
Note that std::vector<bool> does not follow the above rules.
If you violate the above rules, the C++ std library does not stop you. It just states there is a race condition -- aka, undefined behavior. Anything can happen. In C++ std library, if you don't need something, you don't pay for it: and a single-threaded use of those containers would not need the weight of synchronization. So you don't get it by default.
A std::shared_timed_mutex or std::experimental::shared_mutex can both be useful to guarantee the above holds: unique_lock for Writing, and shared_lock for Reading. Write access to an element has to hold a shared_lock on the container, and must additionally be guarded against overlapping access to the same element without deadlocking.
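The Reader/Writer split described above could look roughly like this. GuardedMap is a hypothetical wrapper, not a standard class; it sidesteps element-level locking by copying values out while the shared lock is held.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical wrapper: Reader operations take a shared lock,
// Writer operations take an exclusive lock.
class GuardedMap {
public:
    // Reader: any number of threads may run this concurrently.
    bool lookup(int key, std::string& out) const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;              // copy out while the lock is still held
        return true;
    }

    // Writer: excludes all Readers and all other Writers.
    void insert(int key, std::string value) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        map_[key] = std::move(value);
    }

    // Writer as well, even though it invalidates only one element.
    bool erase(int key) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        return map_.erase(key) > 0;
    }

private:
    mutable std::shared_timed_mutex mutex_;
    std::map<int, std::string> map_;
};
```

Returning a copy rather than a reference is a design choice of this sketch: no reference escapes the lock, so nothing outside the class depends on the invalidation rules at all.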
Iterator invalidation rules are relatively orthogonal to the thread-safety rules.
Using find implies traversal, at least over a subset of the elements. insert and erase on [multi]{set,map} results in rebalancing the underlying tree, which impacts the links between the nodes. If a rebalance happens at the same time as a find, bad things will happen.
Similarly bad things will happen if you attempt a find during unordered_[multi]{set,map} insert or erase. insert can cause rehashing. And both insert and erase need to link/unlink elements from a list. If a find is searching over a list during a link/unlink, you lose.
[] on [unordered_]map is shorthand for "find and insert if not found". at is shorthand for find. So no, these are not safe to use either.
If you have an existing iterator into a [multi]{set,map}, you can continue to dereference (but not increment/decrement) that iterator while another element is inserted or erased. For unordered_[multi]{set,map}, this is true only if you can guarantee that rehashing won't happen under the insert (it never happens under the erase).
There are other answers here that go into the thread safety issue. So if these rules don't mean thread safety, where does that leave us?
If not, what do these rules exactly mean?
They tell you when you can't use an iterator anymore.
Let's take a (deceptively innocent) example:
auto v = std::vector<int>{....};
//v.reserve(...);
for (auto it = std::begin(v); it != std::end(v); ++it) {
    if (*it == ...)
        v.insert(it, ...);
}
Here you traverse a vector and for each element that tests a condition, you insert something into that position.
Now, is this code valid? The iterator invalidation rules tell you that if the vector's capacity is big enough, the insertion invalidates only the iterators at or after the insert position. So if you can prove that the reserve (commented line) is big enough, no reallocation happens and it still refers to the same slot (strictly speaking, the standard counts it as invalidated too, since it sits at the insertion point). If not, the code is invalid: the insert invalidates all the iterators of the vector, including it, which means that you cannot use it anymore. You'd have to reacquire it:
auto idx = std::distance(std::begin(v), it);
v.insert(it, ...);
it = std::begin(v) + idx;
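Filled in with concrete (made-up) details, the reacquire pattern could look like the sketch below. The condition "element is even" and the name mark_evens are assumptions for illustration; the extra + 1 steps over the freshly inserted marker so the loop does not revisit the same element forever.

```cpp
#include <iterator>
#include <vector>

// Inserts `marker` in front of every even element, re-deriving the
// iterator from an index after each insert.
void mark_evens(std::vector<int>& v, int marker) {
    for (auto it = v.begin(); it != v.end(); ++it) {
        if (*it % 2 == 0) {
            auto idx = std::distance(v.begin(), it);
            v.insert(it, marker);        // may reallocate; `it` must not be used after this
            it = v.begin() + idx + 1;    // skip the marker, back on the original element
        }
    }
}
```

With v = {1, 2, 3, 4} and marker = -1 the result is {1, -1, 2, 3, -1, 4}. Since vector::insert returns an iterator to the inserted element, `it = v.insert(it, marker) + 1;` would be an equivalent way to reacquire it.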

Storing iterators inside containers

I am building a DLL that another application would use. I want to store the current state of some data globally in the DLL's memory before returning from the function call so that I could reuse state on the next call to the function.
For doing this, I'm having to save some iterators. I'm using a std::stack to store all other data, but I wasn't sure if I could do that with the iterators also.
Is it safe to put list iterators inside container classes? If not, could you suggest a way to store a pointer to an element in a list so that I can use it later?
I know using a vector to store my data instead of a list would have allowed me to store the subscript and reuse it very easily, but unfortunately I'm having to use only an std::list.
Iterators to list are invalidated only if the list is destroyed or the "pointed" element is removed from the list.
Yes, it'll work fine.
Since so many other answers go on about this being a special quality of list iterators, I have to point out that it'd work with any iterators, including vector ones. The fact that vector iterators get invalidated if the vector is modified is hardly relevant to a question of whether it is legal to store iterators in another container -- it is. Of course the iterator can get invalidated if you do anything that invalidates it, but that has nothing to do with whether or not the iterator is stored in a stack (or any other data structure).
It should be no problem to store the iterators, just make sure you don't use them on a copy of the list -- an iterator is bound to one instance of the list, and cannot be used on a copy.
That is, if you do:
std::list<int>::iterator it = myList.begin ();
std::list<int> c = myList;
c.insert (it, ...); // Error
As noted by others: Of course, you should also not invalidate the iterator by removing the pointed-to element.
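If the same position is needed in a copy, one option is to translate the iterator through its offset. This is a sketch with a hypothetical helper name; it assumes the copy has at least as many elements in front of that position as the original.

```cpp
#include <iterator>
#include <list>

// Translates an iterator into `original` to the matching position in `copy`.
// Assumes `copy` is at least as long as the offset of `it` in `original`.
std::list<int>::iterator same_position(std::list<int>& original,
                                       std::list<int>::iterator it,
                                       std::list<int>& copy) {
    auto offset = std::distance(original.begin(), it);  // linear time for a list
    return std::next(copy.begin(), offset);
}
```

The returned iterator belongs to the copy, so passing it to c.insert is fine, whereas passing the original iterator is not.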
This might be offtopic, but just a hint...
Be aware that your function(s)/data structure would probably be thread-unsafe for read operations. There is a kind of basic thread safety where read operations do not require synchronization. If you are going to store the state of how much the caller has read from your structure, it will make the whole concept thread-unsafe and a bit unnatural to use, because nobody assumes a read to be a stateful operation.
If two threads are going to call it they will either need to synchronize the calls or your data structure might end-up in a race condition. The problem in such a design is that both threads must have access to a common synchronization variable.
I would suggest making two overloaded functions. Both are stateless, but one of them should accept a hint iterator, where to start next read/search/retrieval etc. This is e.g. how Allocator in STL is implemented. You can pass to allocator a hint pointer (default 0) so that it quicker finds a new memory chunk.
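The two-overload idea might be sketched like this. find_from and the Iter alias are made-up names, and searching a std::list<int> for a value is only a stand-in for whatever the real read operation is.

```cpp
#include <algorithm>
#include <iterator>
#include <list>

using Iter = std::list<int>::const_iterator;

// Stateless search that starts at a caller-supplied hint.
Iter find_from(const std::list<int>& data, int value, Iter hint) {
    return std::find(hint, data.end(), value);
}

// Convenience overload: search from the beginning.
Iter find_from(const std::list<int>& data, int value) {
    return find_from(data, value, data.begin());
}
```

The caller keeps the returned iterator and passes std::next of it as the hint for the next call, so the function itself stores nothing between calls and each thread can carry its own position.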
Regards,
Ovanes
Storing the iterator for the list should be fine. It will not get invalidated unless you remove the same element from the list for which you have stored the iterator. Following quote from SGI site:
Lists have the important property that
insertion and splicing do not
invalidate iterators to list elements,
and that even removal invalidates only
the iterators that point to the
elements that are removed
However, note that the previous and next element of the stored iterator may change. But the iterator itself will remain valid.
The same rule applies to an iterator stored in a local variable as in a longer lived data structure: it will stay valid as long as the container allows.
For a list, this means: as long as the node it points to is not deleted, the iterator stays valid. Obviously the node gets deleted when the list is destructed...
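Putting the pieces together, saved list iterators on a std::stack could be handled roughly as follows. Cursor and the two functions are hypothetical names, and both functions assume the list (and for the second one, the stack) is not empty.

```cpp
#include <iterator>
#include <list>
#include <stack>
#include <string>

// Hypothetical saved state: positions in a list, kept on a std::stack.
struct Cursor {
    std::list<std::string> items;
    std::stack<std::list<std::string>::iterator> saved;
};

// Remembers the position of the current last element.
// Precondition (assumed): c.items is not empty.
void remember_last(Cursor& c) {
    c.saved.push(std::prev(c.items.end()));
}

// Erases the most recently remembered element and drops its iterator,
// because erase invalidates exactly that iterator.
// Precondition (assumed): c.saved is not empty.
void erase_last_remembered(Cursor& c) {
    c.items.erase(c.saved.top());
    c.saved.pop();
}
```

Inserting at either end of items between the two calls does not disturb the saved iterators. Copying a Cursor would, since the copied iterators would still point into the original list.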