Do the iterator invalidation rules mean thread safety? - c++

Here in this Stack Overflow answer it is listed the iterator invalidation rules for the standard containers in C++11.
Particularly, there are for insertion:
[multi]{set,map}: all iterators and references unaffected [23.2.4/9]
unordered_[multi]{set,map}: all iterators invalidated when rehashing occurs, but references unaffected [23.2.5/8]. Rehashing does not occur if the insertion does not cause the container's size to exceed z * B where z is the maximum load factor and B the current number of buckets. [23.2.5/14]
erasure:
[multi]{set,map} and unordered_[multi]{set,map}: only iterators and references to the erased elements are invalidated
Do these rules mean I can safely do insertion and erasure in one thread, and safely in another thread access, look for (using find) elements as long as these elements are not the ones being inserted and erased in the first thread, and make sure that rehashing is not happening?
If not, what do these rules exactly mean?

The fact that iterators to elements of the container are not invalidated in no way implies thread safety on the container itself. For example, the size member variable would need to be modified atomically which is a totally separate issue from iterators being invalidated (or not) on insertion/deletion.
tl;dr; No.
These rules simply tell you when an iterator to an element is invalidated by an operation. For example, when a vector resizes, the underlying array is reallocated elsewhere so if you had an iterator (or pointer) to an element, it would no longer be valid after the resize (because it would be pointing to deleted elements of the old array).

There are two kinds of operations on C++ std containers. Reader and Writer operations (these are not the terms the standard uses, but this reads easier). In addition, there are operations on elements in the container.
A const method is a Reader method, as are "lookup" functions that are only non-const because they return a non-const reference to an element (or similar). There is a complete list in the standard, but common sense should work. vector::operator[], begin(), map::find() are all "Readers". map::operator[] is not a "Reader".
You can have any number of threads engaging in Reader operations at the same time no problem.
If you have any thread engaged in a Writer operation, no other access can occur on the container.
You cannot safely have one Reader and one Writer at the same time. If you have any Writers, you must have excluded all other access.
You can safely have 1337 readers at once.
Operations on elements is somewhat similar, except that Writing to an element need only exclude other access to that element. And you are responsible for making your const method play nice with each other. (the std guarantees that the const methods of the container will only call const methods of the elements)
Note that changing sorting order in a std:: associative container is UB without modifying the container.
An iterator that is not invalidated, where you just operate on the element, will (I believe) count as operations on the element. Only synchronization on that element is required.
Note that std::vector<bool> does not follow the above rules.
If you violate the above rules, the C++ std library does not stop you. It just states there is a race condition -- aka, undefined behavior. Anything can happen. In C++ std library, if you don't need something, you don't pay for it: and a single-threaded use of those containers would not need the weight of synchronization. So you don't get it by default.
A std::shared_timed_mutex or std::experimental::shared_mutex can both be useful to guarantee the above holds. unique_lock for Writing, and shared_lock for Reading. Write access to elements has to be shared_locked on the container guarded, and somehow guarded against overlapping access to the same element without deadlock.
Iterator invalidation rules are relatively orthogonal to the thread-safety rules.

Using find implies traversal, at least over a subset of the elements. insert and erase on [multi]{set,map} results in rebalancing the underlying tree, which impacts the links between the nodes. If a rebalance happens at the same time as a find, bad things will happen.
Similarly bad things will happen if you attempt a find during unordered_[multi]{set,map} insert or erase. insert can cause rehashing. And both insert and erase need to link/unlink elements from a list. If a find is searching over a list during a link/unlink, you lose.
[] on [unordered][multi]{set,map} is shorthand for "find and insert if not found". at is shorthand for find. So no, these are not safe to use either.
If you have an existing iterator into a [multi]{set,map}, you can continue to dereference (but not increment/decrement) that iterator while another element is inserted or erased. For unordered_[multi]{set,map}, this is true only if you can guarantee that rehashing won't happen under the insert (it never happens under the erase).

There are other answers here who go into the thread safety issue. So if these rules don't mean thread safety where does that leaves us?
If not, what do these rules exactly mean?
They tell you when you can't use an iterator anymore.
Lets take a (deceptive innocent) example:
auto v = std::vector<int>{....};
//v.reserve(...);
for (auto it = std::begin(v); it != std::end(v); ++it) {
if (*it == ...)
std::insert(it, ...);
}
Here you traverse a vector and for each element that tests a condition, you insert something into that position.
Now is this code valid? The iterator invalidation rules tells you that if the vector's capacity is big enough the insertion invalidates only iterator after the insert position. So if you can prove that the reserve (commented line) is big enough, then yes, the code is valid. If not, the code is invalid, as the insert invalidates all the iterators of the vector, which means that it is invalidated, which means that you cannot use it anymore. You'd have to have to reacquire it:
auto idx = std::distance(std::begin(v), it);
std::insert(it, ...);
it = std::begin(v) + idx;

Related

STL iterator revalidation for end (past-the-end) iterator?

See related questions on past-the-end iterator invalidation:
this, this.
This is more a question of design, namely, is there (in STL or elsewhere) such concept as past-the-end iterator "revalidation"?
What I mean by this, and use case: suppose an algorithm needs to "tail" a container (such as a queue). It traverses the container until end() is reached, then pauses; independently from this, another part of the program enqueues more items in the queue. How is it possible for the algorithm to (EDIT) efficiently tell, "have more items been enqueued" while holding the previously past-the-end iterator (call it tailIt)? (this would imply it is able to check if tailIt == container.end() still, and if that is false, conclude tailIt is now valid and points to the first element that was inserted).
Please don't dismiss the question as "no, there isn't" - I'm looking to form judgment around how to design some logic in an idiomatic way, and have many options (in fact the iterators in question are to a hand-built data structure for which I can provide this property - end() revalidation - but I would like to judge if it is a good idea).
EDIT: made it clear we have the iterator tailIt and a reference to container. A trivial workaround for what I'm trying to do is, also remember count := how many items you processed, and then check is container.size() == count still, and if not, seek to container[count] and continue processing from there. This comes with many disadvantages (extra state, assumption container doesn't pop from the front (!), random-access for efficient seeking).
Not in general. Here are some issues with your idea:
Some past-the-end iterators don't "point" to the data block at all; in fact this will be true of any iterator except a vector iterator. So, overall, an extant end-iterator just is never going to become a valid iterator to data;
Iterators often become invalidated when the container changes — while this isn't always true, it also precludes a general solution that relies on dereferencing some iterator from before the mutation;
Iterator validity is non-observable — you already need to know, before you dereference an iterator, whether or not it is valid. This is information that comes from elsewhere, usually your brain… by that I mean the developer must read the code and make a determination based on its structure and flow.
Put all these together and it is clear that the end iterator simply cannot be used this way as the iterator interface is currently designed. Iterators refer to data in a range, not to a container; it stands to reason, then, that they hold no information about a container, and if the container causes the range to change there's no entity that the iterator knows about that it can ask to find this out.
Is the described logic possible to create? Certainly! But with a different iterator interface (and support from the container). You could wrap the container in your own class type to do this. However, I advise against making things that look like standard iterators but behave differently; this will be very confusing.
Instead, encapsulate the container and provide your own wrapper function that can directly perform whatever post-enqueuement action you feel you need. You shouldn't need to watch the state of the end iterator to achieve your goal.
In the case for a std::queue, no there isn't (heh). Not because the iterators for a queue get invalidated once something is pushed, but because a queue doesn't have any iterators at all.
As for other iterator types, most (or any of them) of them don't require a reference to the container holder (the managing object containing all the info about the underlying data). Which is an trade-off for efficiency over flexibility. (I quickly checked the implementation of gcc's std::vector::iterator)It is possible to write an implementation for an iterator type that keeps a reference to the holder during its lifetime, that way the iterators never have to be invalidated! (unless the holder is std::move'd)
Now to throw in my professional opinion, I wouldn't mind seeing a safe_iterator/flex_iterator for cases where the iterator normally would be invalidated during iterations.
Possible user interface:
for (auto v : make_flex_iterator(my_vector)) {
if (some_outside_condition()) {
// Normally the vector would be invalidated at this point
// (only if resized, but you should always assume a resize)
my_vector.push_back("hello world!");
}
}
Literally revalidating iterators might be too complex to build for it's use case (I wouldn't know where to begin), but designing an iterator which simply never invalidates is quite trivial, with only as much overhead as a for (size_t i = 0; i < c.size(); i++); loop.But with that said, I cannot assure you how well the compiler will optimize, like unrolling loops, with these iterators. I do assume it will still do quite a good job.

Is using std::find and C::insert() thread safe if iterators are not invalidated by insert

Suppose I have a container C whose iterators are not invalidated upon C.insert(), can I safely perform a std::find() on the container while a concurrent insert() is being performed? That is, am I guaranteed to find a matching element or C::end(), ignoring the fact that the inserted element may match but std::find() gives me C::end()?
No. Although iterators are not invalidated by your mutating operation, it is still a mutating operation, and none of the standard containers are defined to be safe to read in one thread while a mutating operation is taking place in another. Remember that there are still "innards" to your container, all manner of internal state, which may be involved in both operations.

Can a reference to an element inside an std::map be invalidated?

I have a multi-threaded application and a shared resource std::map<KeyType, ElementType>. I use a mutex to protect inserts, gets and removes.
My get method returns a reference to the stored element (unlocks on return), and then I do some work with that element.
Question: Is it possible that while working with the stored element reference, another thread may change the std::map so the element will be moved to a different address and the reference will no longer be valid? (I know there are certain ADT implementations which do rearrangement of the ADT on resize).
The iterator invalidation rule for associative containers (which std::map is) says at [associative.reqmts]/9:
The insert and emplace members shall not affect the validity of
iterators and references to the container, and the erase members shall
invalidate only iterators and references to the erased elements.
So if one thread inserts an element, it won't affect any references to existing elements. But if it removes something, other threads may be borked. Some form of element-wise locking is in order, I'd say.
Another thread may erase the element, or destroy the map, which would of course also invalidate the element.
Erasing an element only invalidates iterators and refewrences to this element. Insertion does not invalidate iterators or references into the map.
(that's what the second-hand documentation says, at least - and it's an assumption I hold that never was invalidated, if anecdotal evidence counts.)
Another problem remains: Manipulation of the element through the returned reference is not thread safe. You need to sync e.g. per element - and make sure you don't violate lock hierarchy.

C++: how to track a pointer to a STL list element?

I would like to track a pointer to a list element for the next read access. This pointer would be advanced every time the list is read. Would it be bad practice to cache an iterator to the list as a member variable, assuming I take proper precautions when deleting list elements?
An iterator to a list element remains valid until the element it refers to is removed. This is guaranteed by the standard.
There is no problem with caching an iterator as long as you make sure you don't remove elements from the list or refresh the cached iterator when you do.
Iterators are meant to be used. They aren't just for looping over each element in a for-loop. All you have to take into account is the rules for iterator invalidation.
std::vector iterators can be invalidated by any operation that inserts elements into the list. The iterators for all elements beyond the point of insertion are invalidated, and all iterators are invalidated if the insertion operation causes an increase in capacity. An operation that removes an element from the vector will invalidate any iterator after the point of removal.
std::deque iterators are invalidated by any operation that adds or removes elements from anywhere in the deque. So it's probably not a good idea to keep these around very long.
std::list, std::set, and std::map iterators are only invalidated by the specific removal of the particular element that the iterator refers to. These are the longest-lived of the iterator types.
As long as you keep these rules in mind, feel free to store these iterators all you want. It certainly isn't bad form to store std::list iterators, as long as you can be sure that that particular element isn't going anywhere.
The only way you're going to be able to properly advance though an STL std::list<T> in a platform and compiler independent way is through the use of a std::list<T>::iterator or std::list<T>::const_iterator, so that's really your only option unless you're planning on implementing your own linked-list. Per the standard, as others here have posted, an iterator to a std::list element will remain valid until that element is removed from the list.

What does enabling STL iterator debugging really do?

I've enabled iterator debugging in an application by defining
_HAS_ITERATOR_DEBUGGING = 1
I was expecting this to really just check vector bounds, but I have a feeling it's doing a lot more than that. What checks, etc are actually being performed?
Dinkumware STL, by the way.
There is a number of operations with iterators which lead to undefined behavior, the goal of this trigger is to activate runtime checks to prevent it from occurring (using asserts).
The issue
The obvious operation is to use an invalid iterator, but this invalidity may arise from various reasons:
Uninitialized iterator
Iterator to an element that has been erased
Iterator to an element which physical location has changed (reallocation for a vector)
Iterator outside of [begin, end)
The standard specifies in excruciating details for each container which operation invalidates which iterator.
There is also a somehow less obvious reason that people tend to forget: mixing iterators to different containers:
std::vector<Animal> cats, dogs;
for_each(cats.begin(), dogs.end(), /**/); // obvious bug
This pertain to a more general issue: the validity of ranges passed to the algorithms.
[cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
[cats.end(), cats.begin()) is invalid (unless cats is empty ??)
The solution
The solution consists in adding information to the iterators so that their validity and the validity of the ranges they defined can be asserted during execution thus preventing undefined behavior to occur.
The _HAS_ITERATOR_DEBUGGING symbol serves as a trigger to this capability, because it unfortunately slows down the program. It's quite simple in theory: each iterator is made an Observer of the container it's issued from and is thus notified of the modification.
In Dinkumware this is achieved by two additions:
Each iterator carries a pointer to its related container
Each container holds a linked list of the iterators it created
And this neatly solves our problems:
An uninitialized iterator does not have a parent container, most operations (apart from assignment and destruction) will trigger an assertion
An iterator to an erased or moved element has been notified (thanks to the list) and know of its invalidity
On incrementing and decrementing an iterator it can checks it stays within the bounds
Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (linear operation for those containers which are not randomly accessible, thus most of them)
The cost
The cost is heavy, but does correctness have a price? We can break down the cost:
extra memory allocation (the extra list of iterators maintained): O(NbIterators)
notification process on mutating operations: O(NbIterators) (Note that push_back or insert do not necessarily invalidate all iterators, but erase does)
range validity check: O( min(last-first, container.end()-first) )
Most of the library algorithms have of course been implemented for maximum efficiency, typically the check is done once and for all at the beginning of the algorithm, then an unchecked version is run. Yet the speed might severely slow down, especially with hand-written loops:
for (iterator_t it = vec.begin();
it != vec.end(); // Oops
++it)
// body
We know the Oops line is bad taste, but here it's even worse: at each run of the loop, we create a new iterator then destroy it which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating/deallocating memory in a tight loop ?
Of course, a for_each would not encounter such an issue, which is yet another compelling case toward the use of STL algorithms instead of hand-coded versions.
As far as I understand:
_HAS_ITERATOR_DEBUGGING will display a dialog box at run time to assert any incorrect iterator use including:
1) Iterators used in a container after an element is erased
2) Iterators used in vectors after a .push() or .insert() function is called
According to http://msdn.microsoft.com/en-us/library/aa985982%28v=VS.80%29.aspx
The C++ standard describes which member functions cause iterators to a container to become invalid. Two examples are:
Erasing an element from a container causes iterators to the element to become invalid.
Increasing the size of a vector (push or insert) causes iterators into the vector container become invalid.