Difference between `iterators` and `references to elements` - c++

I read the following post, which describes how iterators behave after non-const operations on a container:
Iterator invalidation rules
I have trouble understanding the difference between a reference and an iterator. Here is one of the listed rules, as an example:
deque: all iterators and references are
invalidated, unless the inserted member is at an end (front or back)
of the deque (in which case all iterators are invalidated, but
references to elements are unaffected) [23.2.1.3/1]
Here is the sample code that is based on the reference.
std::deque<int> mydeque;
mydeque.push_back(1);
mydeque.push_back(2);
mydeque.push_back(3);
const int& int3 = mydeque.back(); // reference to 3
std::deque<int>::iterator itBack = mydeque.end() - 1; // pointing to 3
mydeque.push_back(4);
Question> If my vague understanding is correct, then the following statements are true
after the call to `mydeque.push_back(4)`:
`int3` is still valid because it is a reference to element.
`itBack` is invalidated because it is an iterator.
Thank you

Yes, that sounds correct. The reason the iterator gets invalidated but the reference does not is that the iterator needs to be able to do ++ and -- correctly, but pushing something onto a deque might cause it to rearrange its structures for tracking that. But, deque guarantees it won't move the elements themselves, even if it has to restructure the container around them.
This suggests that the implementation of deque has an additional level of indirection built into its iterators that it hides from you. But that's pretty much the entire point of iterators, and why they're distinct from references.
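As a minimal sketch of the rule quoted in the question, the following checks that a reference taken before `push_back` still names the same element afterwards. The function name `back_reference_survives_push_back` is made up for illustration, and the element is re-fetched by index rather than through an old iterator, because iterators are the part that gets invalidated.

```cpp
#include <cassert>
#include <deque>

// Returns true if a reference taken to the last element before a series of
// push_back calls still names the same object afterwards.
bool back_reference_survives_push_back(int extra_pushes) {
    assert(extra_pushes >= 0);
    std::deque<int> d{1, 2, 3};
    int& last = d.back();        // reference to the element holding 3
    const int* address = &last;  // remember where that element lives

    for (int i = 0; i < extra_pushes; ++i) {
        d.push_back(100 + i);    // invalidates iterators, but not references
    }

    // Re-fetch the element by index; do not reuse an iterator obtained earlier.
    return &d[2] == address && last == 3;
}
```

Calling it with a large `extra_pushes` forces the deque to grow its internal bookkeeping many times, yet the element itself is never moved.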

Related

Are iterators still valid when the underlying elements have been moved?

If I have an iterator pointing to an element in an STL container, and I move from that element through the iterator, does the standard guarantee that the iterator is still valid? Can I use it with the container's methods, e.g. container::erase?
Also, does it matter whether the container is a contiguous one, e.g. vector, or a non-contiguous one, e.g. list?
std::list<std::string> l{"a", "b", "c"};
auto iter = l.begin();
auto s = std::move(*iter);
l.erase(iter); // <----- is it valid to erase it when its underlying element has been moved from?
Yes, you've modified the object in the container. You've not modified the container itself, so the iterator is still valid.
"Moving" an underlying element may not be the best name to use in this context. The name of this operation expresses the intention behind it but not how it really works.
In fact, the move operation is a form of copy operation with one difference: it is allowed to change the state of the "copied" object if it speeds up the execution. In the case of std::string this means that the internal buffer containing characters may not be deep-copied but just copied by address. The original object then has to be set to an empty state, to tell it not to use this buffer anymore. (Emptying the source string is not guaranteed. Optimizations of std::string are more complicated than I described.)
The important thing is that after the move operation, the original object is still there. It's just not guaranteed to have any specific state.
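To make that concrete, here is a small sketch built on the code from the question. The helper name `take_front` is an assumption of this example, not something from the original post.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Moves the first string out of the list, then erases its node through the
// same iterator. Returns the string that was moved out.
std::string take_front(std::list<std::string>& l) {
    assert(!l.empty());
    auto iter = l.begin();
    std::string s = std::move(*iter);  // *iter is left valid but unspecified
    l.erase(iter);                     // iter still designates the node
    return s;
}
```

The move only changes the state of the string object; the list node that `iter` designates is untouched until `erase` removes it.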
In this particular case you've done nothing to the iterator, but much rather to the object within it, so yes: The iterator remains valid.
But if you look at std::list::erase, it sports a line such as "References and iterators to the erased elements are invalidated. Other references and iterators are not affected."
So if you tried to do *iter after erase, it would cause your program to fail.
This may seem obvious for erase, but there are other operations where it is not as obvious.
For std::list for example, the reference page says:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
For std::vector on the other hand, the reference for the push_back method says:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
That means, unlike with std::list, it is not generally safe to keep an iterator to an element around, if the vector grows (because the underlying storage location of the item changes).
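The contrast between the two containers can be sketched as follows. Both function names are hypothetical. The first relies only on the quoted rule: a `push_back` reallocates exactly when the new size would exceed `capacity()`.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Appends `value` and reports whether the vector had to reallocate, which is
// the case in which every iterator and reference is invalidated.
bool push_back_reallocates(std::vector<int>& v, int value) {
    const bool will_grow = v.size() == v.capacity();
    v.push_back(value);
    return will_grow;
}

// A list iterator keeps designating the same element while the list grows.
int first_after_growth(std::list<int>& l, int extra) {
    assert(!l.empty());
    auto first = l.begin();
    for (int i = 0; i < extra; ++i) {
        l.push_back(i);  // never invalidates list iterators
    }
    return *first;
}
```

When `push_back_reallocates` returns false, iterators to existing elements are still usable and only the past-the-end iterator has to be fetched again.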

Can a reference to an element inside an std::map be invalidated?

I have a multi-threaded application and a shared resource std::map<KeyType, ElementType>. I use a mutex to protect inserts, gets and removes.
My get method returns a reference to the stored element (unlocks on return), and then I do some work with that element.
Question: Is it possible that while working with the stored element reference, another thread may change the std::map so the element will be moved to a different address and the reference will no longer be valid? (I know there are certain ADT implementations which do rearrangement of the ADT on resize).
The iterator invalidation rule for associative containers (which std::map is) says at [associative.reqmts]/9:
The insert and emplace members shall not affect the validity of
iterators and references to the container, and the erase members shall
invalidate only iterators and references to the erased elements.
So if one thread inserts an element, it won't affect any references to existing elements. But if it removes something, other threads may be borked. Some form of element-wise locking is in order, I'd say.
Another thread may erase the element, or destroy the map, which would of course also invalidate the element.
Erasing an element only invalidates iterators and references to this element. Insertion does not invalidate iterators or references into the map.
(that's what the second-hand documentation says, at least - and it's an assumption I hold that never was invalidated, if anecdotal evidence counts.)
Another problem remains: Manipulation of the element through the returned reference is not thread safe. You need to sync e.g. per element - and make sure you don't violate lock hierarchy.
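One way to sidestep the dangling-reference problem, sketched here as an assumption rather than as what the original poster did, is to hand out a copy while the lock is held, so that no reference outlives the mutex. `GuardedMap` and its members are hypothetical names, and the key and value types are picked for illustration. This trades a copy per access for simplicity and is not the element-wise locking suggested above.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical wrapper: every access happens under the mutex, and get()
// hands back a copy so that no reference outlives the lock.
class GuardedMap {
public:
    void insert(int key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = value;
    }

    // Copies the element into `out` and returns true if the key exists.
    bool get(int key, std::string& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

    void erase(int key) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_.erase(key);
    }

private:
    mutable std::mutex mutex_;
    std::map<int, std::string> map_;
};
```

If the elements are expensive to copy, storing `std::shared_ptr` values and copying the pointer under the lock is a common alternative. Access to the pointed-to object would still need its own synchronization.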

Do the iterator invalidation rules mean thread safety?

This Stack Overflow answer lists the iterator invalidation rules for the standard containers in C++11.
In particular, for insertion:
[multi]{set,map}: all iterators and references unaffected [23.2.4/9]
unordered_[multi]{set,map}: all iterators invalidated when rehashing occurs, but references unaffected [23.2.5/8]. Rehashing does not occur if the insertion does not cause the container's size to exceed z * B where z is the maximum load factor and B the current number of buckets. [23.2.5/14]
erasure:
[multi]{set,map} and unordered_[multi]{set,map}: only iterators and references to the erased elements are invalidated
Do these rules mean I can safely do insertion and erasure in one thread, and safely in another thread access, look for (using find) elements as long as these elements are not the ones being inserted and erased in the first thread, and make sure that rehashing is not happening?
If not, what do these rules exactly mean?
The fact that iterators to elements of the container are not invalidated in no way implies thread safety on the container itself. For example, the size member variable would need to be modified atomically which is a totally separate issue from iterators being invalidated (or not) on insertion/deletion.
tl;dr; No.
These rules simply tell you when an iterator to an element is invalidated by an operation. For example, when a vector resizes, the underlying array is reallocated elsewhere so if you had an iterator (or pointer) to an element, it would no longer be valid after the resize (because it would be pointing to deleted elements of the old array).
There are two kinds of operations on C++ std containers. Reader and Writer operations (these are not the terms the standard uses, but this reads easier). In addition, there are operations on elements in the container.
A const method is a Reader method, as are "lookup" functions that are only non-const because they return a non-const reference to an element (or similar). There is a complete list in the standard, but common sense should work. vector::operator[], begin(), map::find() are all "Readers". map::operator[] is not a "Reader".
You can have any number of threads engaging in Reader operations at the same time no problem.
If you have any thread engaged in a Writer operation, no other access can occur on the container.
You cannot safely have one Reader and one Writer at the same time. If you have any Writers, you must have excluded all other access.
You can safely have 1337 readers at once.
Operations on elements are somewhat similar, except that Writing to an element need only exclude other access to that element. And you are responsible for making your const methods play nice with each other. (The std guarantees that the const methods of the container will only call const methods of the elements.)
Note that changing an element so that its sort order within a std:: associative container changes is UB, even though the container itself is not modified.
An iterator that is not invalidated, where you just operate on the element, will (I believe) count as operations on the element. Only synchronization on that element is required.
Note that std::vector<bool> does not follow the above rules.
If you violate the above rules, the C++ std library does not stop you. It just states there is a race condition -- aka, undefined behavior. Anything can happen. In C++ std library, if you don't need something, you don't pay for it: and a single-threaded use of those containers would not need the weight of synchronization. So you don't get it by default.
A std::shared_timed_mutex or std::experimental::shared_mutex can both be useful to guarantee the above holds: unique_lock for Writing, and shared_lock for Reading. Write access to an element has to hold a shared_lock on the container, and must additionally be guarded against overlapping access to the same element, without deadlocking.
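A minimal sketch of the Reader/Writer split described above, assuming C++14 and `std::shared_timed_mutex`. The class `SharedVector` is a hypothetical wrapper invented for this example, and it covers container-level access only. Per-element write locking is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical wrapper: many concurrent Readers, or exactly one Writer.
class SharedVector {
public:
    // Writer operation: excludes all other access to the container.
    void push_back(int value) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        data_.push_back(value);
    }

    // Reader operation: may run alongside other Readers.
    std::size_t size() const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        return data_.size();
    }

    // Reader operation returning a copy, so nothing refers into the vector
    // once the shared lock is released.
    int at(std::size_t index) const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        return data_.at(index);
    }

private:
    mutable std::shared_timed_mutex mutex_;
    std::vector<int> data_;
};
```

Returning by value from `at` is a deliberate choice here: a returned reference could be invalidated by a later `push_back` on another thread after the shared lock is gone.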
Iterator invalidation rules are relatively orthogonal to the thread-safety rules.
Using find implies traversal, at least over a subset of the elements. insert and erase on [multi]{set,map} results in rebalancing the underlying tree, which impacts the links between the nodes. If a rebalance happens at the same time as a find, bad things will happen.
Similarly bad things will happen if you attempt a find during unordered_[multi]{set,map} insert or erase. insert can cause rehashing. And both insert and erase need to link/unlink elements from a list. If a find is searching over a list during a link/unlink, you lose.
[] on [unordered][multi]{set,map} is shorthand for "find and insert if not found". at is shorthand for find. So no, these are not safe to use either.
If you have an existing iterator into a [multi]{set,map}, you can continue to dereference (but not increment/decrement) that iterator while another element is inserted or erased. For unordered_[multi]{set,map}, this is true only if you can guarantee that rehashing won't happen under the insert (it never happens under the erase).
Other answers here go into the thread-safety issue. So if these rules don't mean thread safety, where does that leave us?
If not, what do these rules exactly mean?
They tell you when you can't use an iterator anymore.
Let's take a (deceptively innocent) example:
auto v = std::vector<int>{....};
//v.reserve(...);
for (auto it = std::begin(v); it != std::end(v); ++it) {
if (*it == ...)
v.insert(it, ...);
}
Here you traverse a vector and for each element that tests a condition, you insert something into that position.
Now, is this code valid? The iterator invalidation rules tell you that if the vector's capacity is big enough, the insertion invalidates only the iterators at or after the insert position. Since `it` is the insert position, it is invalidated even if you can prove that the reserve (commented line) is big enough. If the capacity is not big enough, the insert invalidates all the iterators of the vector, which again means that `it` is invalidated and you cannot use it anymore. Either way, you'd have to reacquire it:
auto idx = std::distance(std::begin(v), it);
v.insert(it, ...);
it = std::begin(v) + idx;
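An alternative to recomputing the index is to use the iterator that `vector::insert` returns, which designates the newly inserted element and is valid whether or not a reallocation happened. The sketch below assumes a concrete condition (equality with `target`) in place of the elided one, and `insert_before_matches` is a made-up name.

```cpp
#include <vector>

// Inserts `extra` in front of every element equal to `target`.
void insert_before_matches(std::vector<int>& v, int target, int extra) {
    for (auto it = v.begin(); it != v.end(); ++it) {
        if (*it == target) {
            it = v.insert(it, extra);  // now designates the inserted element
            ++it;                      // step forward onto the original match
        }
    }
}
```

Note that `v.end()` is re-evaluated on every iteration. Caching it before the loop would leave a stale past-the-end iterator after the first insertion.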

Why does vector::pop_back invalidate the iterator (end() - 1)?

Note: The question applies to erase, too. See bottom.
What's the reason behind the fact that the end() - 1 iterator is invalidated after pop_back is called on a vector?
To clarify, I'm referring to this situation:
std::vector<int> v;
v.push_back(1);
v.push_back(2);
std::vector<int>::iterator i1 = v.begin(), i2 = v.end() - 1, i3 = v.begin() + 1;
v.pop_back();
// i1 is still valid
// i2 is now invalid
// i3 is now invalid too
std::vector<int>::iterator i4 = v.end();
assert(i2 == i4); // undefined behavior (but why should it be?!)
assert(i3 == i4); // undefined behavior (but why should it be?!)
Why does this happen? (i.e. when would this invalidation ever prove beneficial for the implementation?)
(Note that this isn't just a theoretical problem. Visual C++ 2013 -- and probably 2012 too -- display an error if you try to do this in debug mode, if you have _ITERATOR_DEBUG_LEVEL set to 2.)
Regarding erase:
Note that the same question applies to erase:
Why does erase(end() - 1, end()) invalidate end() - 1?
(So please don't say, "pop_back invalidates end() - 1 because it is equivalent to calling erase(end() - 1, end())"; that's just begging the question.)
The interesting question is really what does it mean for an iterator to be invalidated. And I truly don't have a good answer from the standard. What I do know is that to some extent the standard considers an iterator not as a pointer to a location inside the container, but rather as a proxy to a particular element that lives within the container.
With that in mind, after erasing of a single element in the middle of a vector, all iterators after the point of removal become invalidated as they no longer refer to the same element that they referred to before.
Support for this line of reasoning comes from the iterator invalidation clauses of other operations in the container. For example, on insert, the standard guarantees that if there is no reallocation the iterators before the point of insertion remain valid. Exceptio probat regulam in casibus non exceptis, it invalidates all iterators after the point of insertion.
If the validity of iterators was only related to the fact that there is an element of the container where the iterator points, then none of the iterators would be invalidated with that operation (again, in the absence of reallocations).
Going even further in that line of reasoning, if you consider iterator validity as pointer validity, then none of the iterators into a vector would be invalidated during an erase operation. The end()-1 iterator would become non-dereferenceable, but it could remain valid, which is not the case.
pop_back is usually defined in terms of erase(end()-1, end()).
When you erase a range of iterators from a vector, all iterators from the first erased and forward are invalidated. This includes a range of one.
In general, a valid dereferenceable non-input iterator always refers to the same data 'location' until it becomes invalidated. In the case of erase, all iterators that remain valid afterwards still have the same value and location as before.
Both of the above rules would have to be amended to get the behavior you want. The first to something like 'unless you erase the last element in so doing, in which case the first erased element becomes the one-past-the-end iterator'. And the still-valid iterator would become non-dereferenceable, a possibly unique state change for an iterator that is, as far as I know, without precedent in C++.
The cost is both extra tricky verbiage in the standard to cover your requested behavior and sanity checks for strict iterators. The benefit -- well, I do not see one: in every situation you will have to know exactly what just happened (a very particular iterator just became one past the end instead of being invalidated), and if you know that is the case you could just talk about end.
And words are required. When you call erase( a, b ), every iterator from a on will have its state changed in some way (*a will not return the same value, and how that changes must be specified). C++ takes the easy way, and simply states that every iterator whose state is changed by erase becomes invalid, and using it is undefined behavior: this allows the maximum latitude to the implementer.
In theory, it can also allow for optimizations. If you dereference an iterator before and after an erase operation, the value you get can be considered the same! (assuming no possible indirect modification of the container within object destructors, which can be proven as well).
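If code really needs an iterator at the position of the removed last element, the portable route is to take the one that `erase` hands back instead of reusing the old one. A minimal sketch, with `erase_last` as a hypothetical helper name:

```cpp
#include <cassert>
#include <vector>

// Erases the last element and returns the iterator produced by erase(),
// which for the last element is the new end().
std::vector<int>::iterator erase_last(std::vector<int>& v) {
    assert(!v.empty());
    return v.erase(v.end() - 1);
}
```

The returned iterator is freshly obtained after the operation, so comparing it with `v.end()` is well defined, unlike the comparisons with `i2` and `i3` in the question.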

C++ and iterator invalidation

So I'm going through Accelerated C++ and am somewhat unsure about iterator invalidation in C++. Maybe the problem is that it is never explained how these iterators are constructed.
Here is one example:
Vector with {1,2,3}
If my iterator is on {2} and I call an erase on {2}, my iterator is invalid. Why? In my head, {3} is shifted down to the memory location where {2} was, so the iterator is still pointing to a valid element. The only way I would see this as being not true is if iterators were made beforehand for each element and each iterator had some type of field containing the address of the following element in that container.
My other question has to do with the statement such as "invalidates all other iterators". Erm, when I loop through my vector container, I am using one iterator. Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
In my head, {3} is shifted down to the memory location where {2} was, so the iterator is still pointing to a valid element.
That may be the case. But it’s equally valid that the whole vector is relocated in memory, thus making all iterators point to now-defunct memory locations. C++ simply makes no guarantees either way. (See comments for discussion.)
Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
You’re merely missing the fact that you may have other iterators referencing the same vector besides your loop variable. For example, the following loop is an idiomatic style that caches the end iterator of the vector to avoid redundant calls:
vector<int> vec;
// …
for (vector<int>::iterator i(vec.begin()), end(vec.end()); i != end; ++i) {
if (some_condition)
vec.erase(i); // invalidates `i` and `end`.
}
(Never mind the fact that this copy of the end iterator is in fact unnecessary with the STL on modern compilers.)
The following C++ defect report (fixed in C++0x) contains a brief discussion of the meaning of "invalidate":
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#414
int A[8] = { 1,3,5,7,9,8,4,2 };
std::vector<int> v(A, A+8);
std::vector<int>::iterator i1 = v.begin() + 3;
std::vector<int>::iterator i2 = v.begin() + 4;
v.erase(i1);
Which iterators are invalidated by v.erase(i1): i1, i2, both, or neither?
On all existing implementations that I know of, the status of i1 and i2 is the same: both of them will be iterators that point to some elements of the vector (albeit not the same elements they did before). You won't get a crash if you use them. Depending on exactly what you mean by "invalidate", you might say that neither one has been invalidated because they still point to something, or you might say that both have been invalidated because in both cases the elements they point to have been changed out from under the iterator.
It seems that the specification is "playing safe" regarding iterator and reference invalidation. It says that they're invalidated even though, as you and Matt Austern both noted, there's still a vector element at the same address. It just has a different value.
So, those of us following the standard must program as if that iterator can't be used any more, even though no implementation is likely to do anything that would actually stop them working, except perhaps a debugging iterator that could do extra work to let us know we're off-road.
In fact that defect report relates to exactly the case you're talking about. As far as the C++03 standard actually says, at least in that clause, your iterator isn't invalidated. But that was considered an error.
An iterator basically wraps a pointer. Some operations on containers have the effect of reallocating some or all of the data behind the scenes. In that case, all current pointers/iterators are left pointing to the wrong memory locations.
The image "in your mind" is an implementation detail, and it could be that your iterator isn't implemented that way. Likely it is, but it could be that it isn't.
The "invalidates all other iterators" language is their way of saying that the implementation is allowed the freedom to do anything its coders' skeevy hearts feel like to the container when you perform that operation, including things that require internal changes to iterators. Since the only iterator it has access to is the one you passed in, that's the only one that it can fix up if need be.
If you want the behavior in your head for a vector, it is easy to get. Just use an index into the vector instead of an iterator. Then it works just like you think.
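A short sketch of that suggestion, using the {1,2,3} example from the question. The helper name `erase_and_peek` is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Erases the element at `index` and returns the value that slid into that
// slot. Precondition: the erased element is not the last one.
int erase_and_peek(std::vector<int>& v, std::size_t index) {
    assert(index + 1 < v.size());
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(index));
    return v[index];  // the index now names the element that followed
}
```

An index is just a number, so there is nothing to invalidate. It only has to be checked against the new `size()` before it is used again.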
Chances are that your iterator is actually pointing at the 3 -- but it's not certain.
The general idea is to allow vector to allocate new storage and move your data from one block of storage to another when/if it sees fit to do so. As such, when you insert or delete data, the data might move to some other part of memory entirely.
At least that was sort of the intent. It turns out that other rules probably prevent it from moving the data when you delete -- but the iterator is invalidated anyway, probably because somebody didn't quite understand all the implications of those other rules when this one was made.
From SGI http://www.sgi.com/tech/stl/Vector.html
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
So you can erase starting from the end:
std::vector<int> v;
// start at the last valid index, not at size()
for (int i = static_cast<int>(v.size()) - 1; i >= 0; i--)
{
    if (v[i])
        v.erase(v.begin() + i);
}
Or use the iterator returned from vector::erase():
std::vector<int> v;
for (std::vector<int>::iterator it = v.begin(); it != v.end(); )
it = v.erase(it);
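The loop above erases every element. When only some elements should go, the same pattern needs an `else` branch so that the iterator advances only when nothing was erased. A minimal sketch, assuming "non-zero" as the condition and `erase_nonzero` as a made-up name:

```cpp
#include <vector>

// Removes every non-zero element from the vector.
void erase_nonzero(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it != 0) {
            it = v.erase(it);  // erase returns the iterator after the erased one
        } else {
            ++it;              // nothing erased, so move on manually
        }
    }
}
```

For large vectors, the erase-remove idiom with `std::remove_if` does the same job in a single pass instead of shifting the tail on every erase.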