C++ and iterator invalidation - c++

So I'm going through Accelerated C++ and am somewhat unsure about iterator invalidation in C++. Maybe the problem is that the book never explains how these iterators are constructed.
Here is one example:
Vector with {1,2,3}
If my iterator is on {2} and I call erase on {2}, my iterator is invalid. Why? In my head, {3} is shifted down to the memory location where {2} was, so the iterator is still pointing to a valid element. The only way I could see this not being true is if iterators were made beforehand for each element and each iterator had some kind of field containing the address of the following element in that container.
My other question has to do with the statement such as "invalidates all other iterators". Erm, when I loop through my vector container, I am using one iterator. Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?

In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element.
That may be the case. But it’s equally valid that the whole vector is relocated in memory, thus making all iterators point to now-defunct memory locations. C++ simply makes no guarantees either way. (See comments for discussion.)
Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
You’re merely missing the fact that you may have other iterators referencing the same vector besides your loop variable. For example, the following loop is an idiomatic style that caches the end iterator of the vector to avoid redundant calls:
vector<int> vec;
// …
for (vector<int>::iterator i(vec.begin()), end(vec.end()); i != end; ++i) {
if (some_condition)
vec.erase(i); // invalidates `i` and `end`.
}
(Never mind the fact that this copy of the end iterator is in fact unnecessary with the STL on modern compilers.)
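A minimal sketch of how the loop above could be written so that no stale iterator is ever used: take the iterator returned by erase() and re-read vec.end() on every pass. The name erase_matching and the "is even" test are assumptions for illustration; they stand in for the unspecified some_condition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: removes every even value from vec.
// "Even" stands in for the original's unspecified some_condition.
void erase_matching(std::vector<int>& vec) {
    for (std::vector<int>::iterator i = vec.begin(); i != vec.end(); /* advanced below */) {
        if (*i % 2 == 0) {
            // erase() returns a valid iterator to the element after the erased one.
            i = vec.erase(i);
        } else {
            ++i;
        }
    }
    // Sanity check (debug builds only): nothing even is left.
    for (int value : vec) assert(value % 2 != 0);
}
```

Calling erase_matching on {1, 2, 3, 4, 6, 7} should leave {1, 3, 7}. The end iterator is deliberately not cached, because every erase() invalidates it.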

The following C++ defect report (fixed in C++0x) contains a brief discussion of the meaning of "invalidate":
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#414
int A[8] = { 1,3,5,7,9,8,4,2 };
std::vector<int> v(A, A+8);
std::vector<int>::iterator i1 = v.begin() + 3;
std::vector<int>::iterator i2 = v.begin() + 4;
v.erase(i1);
Which iterators are invalidated by
v.erase(i1): i1, i2, both, or neither?
On all existing implementations that I
know of, the status of i1 and i2 is
the same: both of them will be
iterators that point to some elements
of the vector (albeit not the same
elements they did before). You won't
get a crash if you use them. Depending
on exactly what you mean by
"invalidate", you might say that
neither one has been invalidated
because they still point to something,
or you might say that both have been
invalidated because in both cases the
elements they point to have been
changed out from under the iterator.
It seems that the specification is "playing safe" regarding iterator and reference invalidation. It says that they're invalidated even though, as you and Matt Austern both noted, there's still a vector element at the same address. It just has a different value.
So, those of us following the standard must program as if that iterator can't be used any more, even though no implementation is likely to do anything that would actually stop them working, except perhaps a debugging iterator that could do extra work to let us know we're off-road.
In fact that defect report relates to exactly the case you're talking about. As far as the C++03 standard actually says, at least in that clause, your iterator isn't invalidated. But that was considered an error.

An iterator basically wraps a pointer. Some operations on containers have the effect of reallocating some or all of the data behind the scenes. In that case, all current pointers/iterators are left pointing to the wrong memory locations.
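To make the reallocation point concrete, here is a small sketch that records where a vector's storage lives before and after a push_back. The helper name storage_moved_on_push_back is hypothetical, and the addresses are kept as integers so that no stale pointer is ever dereferenced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: reports whether one push_back moved the vector's storage.
bool storage_moved_on_push_back(std::vector<int>& v, int value) {
    // Remember the buffer address as a plain integer before the call.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    const bool had_room = v.size() < v.capacity();

    v.push_back(value);

    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(v.data());
    // With spare capacity the standard guarantees that no reallocation happens.
    if (had_room) assert(before == after);
    return before != after;
}
```

When the function returns true, every iterator, pointer, and reference obtained before the call refers to the old buffer and must not be used.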

The image "in your mind" is an implementation detail, and it could be that your iterator isn't implemented that way. Likely it is, but it could be that it isn't.
The "invalidates all other iterators" language is their way of saying that the implementation is allowed the freedom to do anything its coders' skeevy hearts feel like to the container when you perform that operation, including things that require internal changes to iterators. Since the only iterator it has access to is the one you passed in, that's the only one that it can fix up if need be.
If you want the behavior in your head for a vector, it is easy to get. Just use an index into the vector instead of an iterator. Then it works just like you think.
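A short sketch of the index approach, using the {1,2,3} example from the question. The function name is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the question: a vector {1,2,3}, a position on the 2, then erase the 2.
int element_at_index_after_erase() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);

    std::size_t pos = 1;                                   // "points" at the 2
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(pos)); // the 3 shifts down into slot 1

    assert(pos < v.size());  // an index can be range-checked against the container
    return v[pos];           // reads the 3
}
```

Unlike an iterator, the index carries no hidden state that the container could invalidate; it only has to be checked against size() before use.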

Chances are that your iterator is actually pointing at the 3 -- but it's not certain.
The general idea is to allow vector to allocate new storage and move your data from one block of storage to another when/if it sees fit to do so. As such, when you insert or delete data, the data might move to some other part of memory entirely.
At least that was sort of the intent. It turns out that other rules probably prevent it from moving the data when you delete -- but the iterator is invalidated anyway, probably because somebody didn't quite understand all the implications of those other rules when this one was made.

From SGI http://www.sgi.com/tech/stl/Vector.html
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
So you can erase starting from end
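std::vector<int> v;
// Walk backwards so that erasing never shifts an element we have yet to visit.
for (int i = static_cast<int>(v.size()) - 1; i >= 0; i--)
{
if (v[i])
v.erase(v.begin() + i);
}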
OR use iterator returned from vector erase()
std::vector<int> v;
for (std::vector<int>::iterator it = v.begin(); it != v.end(); )
it = v.erase(it);
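The reserve() remark in the SGI note can be sketched as well. The following is an illustration under the assumption that the final size is known up front; first_value_after_appends is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: appends `count` values while holding an iterator to the
// first element, then reads through that iterator.
int first_value_after_appends(int count) {
    std::vector<int> v;
    // Room for everything up front, so push_back never has to reallocate.
    v.reserve(static_cast<std::vector<int>::size_type>(count) + 1);
    v.push_back(42);
    std::vector<int>::iterator first = v.begin();

    for (int i = 0; i < count; ++i) {
        assert(v.size() < v.capacity());  // spare capacity: no reallocation possible
        v.push_back(i);
    }
    return *first;  // still valid; only the past-the-end iterator was invalidated
}
```

Without the reserve() call, the same code would read through a possibly dangling iterator as soon as the vector grew past its capacity.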

Related

Are iterators still valid when the underlying elements have been moved?

If I have an iterator pointing to an element in an STL container, and I moved the element with the iterator, does the standard guarantee that the iterator is still valid? Can I use it with container's method, e.g. container::erase?
Also, does it matter whether the container is a contiguous one, e.g. vector, or a non-contiguous one, e.g. list?
std::list<std::string> l{"a", "b", "c"};
auto iter = l.begin();
auto s = std::move(*iter);
l.erase(iter); // <----- is it valid to erase it, when its underlying element has been moved from?
Yes, you've modified the object in the container. You've not modified the container itself so the iterator is still valid
"Moving" an underlying element may not be the best name to use in this context. The name of this operation expresses the intention behind it but not how it really works.
In fact, the move operation is a form of copy operation with one difference: it is allowed to change the state of the "copied" object if it speeds up the execution. In the case of std::string this means that the internal buffer containing the characters may not be deep-copied but just copied by address. The original object then has to be set to an empty state, to tell it not to use this buffer anymore. (Emptying the source string is not guaranteed. Optimizations of std::string are more complicated than I described.)
The important thing is that after the move operation, the original object is still there. It's just not guaranteed to have any specific state.
In this particular case you've done nothing to the iterator, but much rather to the object within it, so yes: The iterator remains valid.
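As a sketch of that sequence wrapped in a function (take_front is a hypothetical name, not a standard facility):

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical helper: moves the first string out of the list, then erases that node.
std::string take_front(std::list<std::string>& l) {
    assert(!l.empty());
    std::list<std::string>::iterator iter = l.begin();
    // Moving changes the element's value, not the structure of the list.
    std::string s = std::move(*iter);
    // iter is still valid at this point; it becomes invalid only through erase().
    l.erase(iter);
    return s;
}
```

For a list holding "a", "b", "c" this should return "a" and leave "b", "c" in the list.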
But if you look at std::list::erase, it sports a line such as "References and iterators to the erased elements are invalidated. Other references and iterators are not affected."
So if you tried to do *iter after erase, it would cause your program to fail.
This may seem obvious for erase, but there are other operations where it is not as obvious.
For std::list for example, the reference page says:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
For std::vector on the other hand, the reference for the push_back method says:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
That means, unlike with std::list, it is not generally safe to keep an iterator to an element around, if the vector grows (because the underlying storage location of the item changes).
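The contrast can be sketched with two small functions. Both names are hypothetical; the vector version shows one possible workaround (keeping an index), not the only one.

```cpp
#include <cassert>
#include <list>
#include <vector>

// std::list: an iterator survives insertions and erasures of other elements.
int list_iterator_survives() {
    std::list<int> l;
    l.push_back(10);
    l.push_back(20);
    l.push_back(30);
    std::list<int>::iterator mid = l.begin();
    ++mid;                                          // refers to 20
    l.push_front(5);
    l.erase(l.begin());                             // erases the 5, a different element
    for (int i = 0; i < 1000; ++i) l.push_back(i);
    return *mid;                                    // still refers to 20
}

// std::vector: keep an index and rebuild the iterator after the vector has grown.
int vector_needs_reacquire() {
    std::vector<int> v;
    v.push_back(10);
    v.push_back(20);
    v.push_back(30);
    const std::vector<int>::difference_type pos = 1;   // position of 20
    for (int i = 0; i < 1000; ++i) v.push_back(i);     // may reallocate several times
    std::vector<int>::iterator mid = v.begin() + pos;  // fresh iterator
    assert(mid < v.end());
    return *mid;
}
```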

Preventing getting the address of items in a container in C++

I'm writing a C++ array-like container that can dynamically grow and shrink. I'd like to prevent users of this container from taking the address of its items, because they might be relocated when the container needs to reallocate itself. The only correct way of using this container will be by keeping track of the address of the container and the index of each item (yes, I'll be specifying it in the documentation, but it would be better if I could make the compiler trigger an error if a user of the container tries to get the address of an item).
Can this be done somehow? I searched and found some questions about making the "address of operator" private, but it doesn't seem to be guaranteed to work, nor is it a recommended practice. So, I wonder if there could be any alternative technique for preventing access to pointers to items...
In C++ all memory allocated to the program is more or less fair game. So there is no real way to prevent users of your array type to obtain or calculate addresses within your array.
The STL even makes a lot of effort to guarantee that iterators do not suddenly become invalid because of this.
In the real world you write in the API description that it is very unwise to work with addresses within your container because they could become invalid at any time, and then it is the responsibility of the users to follow that rule, IMHO.
What you want to do is not possible easily (if at all).
If your container invalidates pointers to elements in certain situations, that does not make your container "special". Let's consider what other containers do to mitigate this problem.
std::vector:
// THIS CODE IS BROKEN !! DO NOT WRITE CODE LIKE THIS !!
std::vector<int> x(24,0);
int* ptr = &x[0];
auto it = x.begin();
x.push_back(42);
x.insert(it, 100); // *
*ptr = 5;
*it = 7;
Starting from the line marked with * everything here uses an invalid pointer or iterator. From cppreference, push_back:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
Actually x.insert(it, 100); might be using a valid iterator, but the code does not check whether the push_back had to increase capacity, so one has to assume that it and ptr are invalid after the call to push_back.
insert:
Causes reallocation if the new size() is greater than the old capacity(). If the new size() is greater than capacity(), all iterators and references are invalidated. Otherwise, only the iterators and references before the insertion point remain valid. The past-the-end iterator is also invalidated.
Users of standard containers must be aware of iterator invalidation rules (see Iterator invalidation rules) or they will write horribly broken code like the one above.
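For comparison, here is one way the broken snippet could be rewritten so that nothing obtained before a modification is used after it. This is a sketch, not the answerer's code; the function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rewrite: iterators and pointers are taken only after the last
// operation that could invalidate them.
std::vector<int> corrected_sequence() {
    std::vector<int> x(24, 0);
    x.push_back(42);  // may reallocate

    // begin() is evaluated after push_back, and insert() returns a valid
    // iterator to the newly inserted element.
    std::vector<int>::iterator it = x.insert(x.begin(), 100);

    int* ptr = &x[0];     // pointer taken after the last modification
    assert(ptr == &*it);  // both refer to the first element
    *ptr = 5;
    *it = 7;
    return x;
}
```

The result should hold 26 elements, starting with 7 and ending with 42.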
In general, you cannot protect yourself from all mistakes a user can possibly make. Document pre- and post-conditions and if a user ignores them they just get what they deserve.
Note that you could try to overload the & operator, but there is no way you can prevent someone from getting the address via std::addressof, which is made exactly for that: getting the address of an object even when the object tries to prevent it by overloading &.
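A minimal sketch of that loophole, assuming a made-up element type Guarded whose operator& is deleted:

```cpp
#include <cassert>
#include <memory>

// Hypothetical element type that tries to forbid taking its address.
struct Guarded {
    int value;
    Guarded* operator&() = delete;  // makes the expression `&g` a compile error
};

// std::addressof does not go through the user-declared operator&.
int* address_of_value(Guarded& g) {
    Guarded* p = std::addressof(g);
    assert(p != nullptr);
    return &p->value;  // plain int member, so the built-in & applies
}
```

Writing through the returned pointer changes g.value, so the deleted operator only stops the most obvious spelling.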

Why does vector::pop_back invalidate the iterator (end() - 1)?

Note: The question applies to erase, too. See bottom.
What's the reason behind the fact that the end() - 1 iterator is invalidated after pop_back is called on a vector?
To clarify, I'm referring to this situation:
std::vector<int> v;
v.push_back(1);
v.push_back(2);
std::vector<int>::iterator i1 = v.begin(), i2 = v.end() - 1, i3 = v.begin() + 1;
v.pop_back();
// i1 is still valid
// i2 is now invalid
// i3 is now invalid too
std::vector<int>::iterator i4 = v.end();
assert(i2 == i4); // undefined behavior (but why should it be?!)
assert(i3 == i4); // undefined behavior (but why should it be?!)
Why does this happen? (i.e. when would this invalidation ever prove beneficial for the implementation?)
(Note that this isn't just a theoretical problem. Visual C++ 2013 -- and probably 2012 too -- display an error if you try to do this in debug mode, if you have _ITERATOR_DEBUG_LEVEL set to 2.)
Regarding erase:
Note that the same question applies to erase:
Why does erase(end() - 1, end()) invalidate end() - 1?
(So please don't say, "pop_back invalidates end() - 1 because it is equivalent to calling erase(end() - 1, end())"; that's just begging the question.)
The interesting question is really what does it mean for an iterator to be invalidated. And I truly don't have a good answer from the standard. What I do know is that to some extent the standard considers an iterator not as a pointer to a location inside the container, but rather as a proxy to a particular element that lives within the container.
With that in mind, after erasing of a single element in the middle of a vector, all iterators after the point of removal become invalidated as they no longer refer to the same element that they referred to before.
Support for this line of reasoning comes from the iterator invalidation clauses of other operations in the container. For example, on insert, the standard guarantees that if there is no reallocation the iterators before the point of insertion remain valid. Exceptio probat regulam in casibus non exceptis, it invalidates all iterators after the point of insertion.
If the validity of iterators was only related to the fact that there is an element of the container where the iterator points, then none of the iterators would be invalidated with that operation (again, in the absence of reallocations).
Going even further in that line of reasoning, if you consider iterator validity as pointer validity, then none of the iterators into a vector would be invalidated during an erase operation. The end()-1 iterator would become non-dereferenceable, but it could remain valid, which is not the case.
pop_back is usually defined in terms of erase(end()-1, end()).
When you erase a range of iterators from a vector, all iterators from the first erased and forward are invalidated. This includes a range of one.
In general, a valid dereferenceable non-input iterator always refers to the same data 'location' until it becomes invalidated. In the case of erase, all iterators that remain valid afterwards still have the same value and location.
Both of the above rules would have to be amended to get the behavior you want. The first to something like 'unless you erase the last element in so doing, in which case the first erased element becomes the one-past-the-end iterator'. And the still-valid iterator would become non-dereferenceable, a possibly unique state change for an iterator that is, as far as I know, without precedent in C++.
The cost is both extra tricky verbiage in the standard to cover your requested behavior and sanity checks for strict iterators. The benefit -- well, I do not see one: in every situation you will have to know exactly what just happened (a very particular iterator just became one past the end instead of being invalidated), and if you know that is the case you could just talk about end.
And words are required. When you call erase( a, b ), every iterator from a on will have its state changed in some way (*a will not return the same value, and how that changes must be specified). C++ takes the easy way, and simply states that every iterator whose state is changed by erase becomes invalid, and using it is undefined behavior: this allows the maximum latitude to the implementer.
In theory, it can also allow for optimizations. If you dereference an iterator before and after an erase operation, the value you get can be considered the same! (assuming no possible indirect modification of the container within object destructors, which can be proven as well).
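In practice the rule is easy to live with: iterators before the erased element stay usable, and anything at or after it is obtained again. A sketch of the question's setup written that way (the function name is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the question's setup without touching a stale iterator.
bool end_reacquired_after_pop_back() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    std::vector<int>::iterator i1 = v.begin();         // before the erased element: stays valid
    const std::vector<int>::difference_type last = 1;  // index of the element about to go

    v.pop_back();  // invalidates iterators to the erased element and end()

    std::vector<int>::iterator i4 = v.end();           // fresh end iterator
    assert(*i1 == 1);
    return v.begin() + last == i4;                     // position rebuilt from the index
}
```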

invalid iterator with random access iterators with deque

I am reading Effective STL by Scott Meyers. In Item 1 the author discusses how to choose among the various containers, and below is a text snippet that I am having difficulty understanding.
Would it be helpful to have a sequence container with random access
iterators where pointers and references to the data are not
invalidated as long as nothing is erased and insertions take place
only at the ends of the container? This is a very special case, but if
it’s your case, deque is the container of your dreams.
(Interestingly, deque’s iterators may be invalidated when insertions
are made only at the ends of the container. deque is the only standard
STL container whose iterators may be invalidated without also
invalidating its pointers and references.)
My questions on the above text:
What does the author mean by pointers and references in the above context, and how are they different from iterators?
How can deque's iterators be invalidated when insertions are made only at the ends while pointers and references stay valid?
Please answer the above two questions with a simple example.
Thanks for your time and help.
For the first part, what's meant is this:
deque<int> foo(10, 1); // a deque with ten elements with value of 1
int& bar = foo.front(); // reference
int* baz = &foo.front(); // pointer
deque<int>::iterator buz = foo.begin(); // iterator
foo.push_front(0);
// At this point bar and baz are still valid, but buz may have been invalidated
For the second part it's been covered in the detail here:
Why does push_back or push_front invalidate a deque's iterators?
An iterator is often used to "cycle through" the elements of a standard-library container, much like you would do with an array index, e.g. in a for loop.
Iterators can be invalid for many reasons. One common case where this happens is when you use a for loop such as the following:
std::deque<int> c;
for(std::deque<int>::iterator i = c.begin(); i != c.end(); ++i) {
// do some stuff to the deque's elements here
}
At the end of the above loop, the iterator i will point to an "element" one block after the last real element in the deque. If you tried to do something like
*i = 88;
right after the end of the above for loop that would be a problem because the container does not "own" the memory i "points" to.
But what Meyers is likely talking about is that the Standard leaves much of the implementation of a deque open to the designer. Deques are usually implemented as linked-lists of blocks of memory holding several elements, so unlike vectors there is no guarantee that elements will be next to each other in memory. Furthermore, iterators necessarily contain information about these "blocks" so that they can traverse them smoothly (i.e. iterators are not simply pointers).
For example, if I push_back() a new element, but there is no more room in the "last" chunk of memory, then deque will need to allocate a new block of memory for the new element (and future elements added to the end). Since an iterator I was using previously might not "know" about this new chunk of memory, it could be invalid.
References and actual pointers, on the other hand, would be used in this context to refer/point to individual objects in the container. If I write
int& j = *c.begin();
then j is a reference to the first element of c. If I then do
c.push_front(74);
j still references that previous first element, even though it is no longer at the front of the deque.
However, if you insert something in the middle of the deque, then chances are you are effectively splitting one of those contiguous chunks of memory and trying to squeeze your new element in there. To make room, elements on one side or the other must be shuffled around in memory (and possibly new memory needs to be allocated). This would by necessity invalidate pointers/references to elements on that "side" of the insertion. Since it is up to the implementer how exactly room is made for an inserted element, all bets are off with respect to any pointer/reference, no matter where it is with respect to the insertion.
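The guarantee for insertions at the ends can be checked with a short sketch. The function name is hypothetical, and the iterator is deliberately created only after all insertions are done.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: checks that a reference and a pointer to an element
// keep working across insertions at both ends of a deque.
bool reference_survives_end_insertions(std::size_t extra) {
    std::deque<int> c(1, 7);
    int& j = c.front();   // reference to the only element
    int* p = &c.front();  // pointer to the same element

    for (std::size_t i = 0; i < extra; ++i) {
        c.push_front(-1);
        c.push_back(-2);
    }
    assert(c.size() == 2 * extra + 1);

    // Iterators taken before the loop would have to be treated as invalid,
    // so the position is recomputed: the original element now sits at index `extra`.
    std::deque<int>::iterator it =
        c.begin() + static_cast<std::deque<int>::difference_type>(extra);
    return j == 7 && &j == p && p == &*it;
}
```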

Why does push_back or push_front invalidate a deque's iterators?

As the title asks.
My understanding of a deque was that it allocated "blocks". I don't see how allocating more space invalidates iterators, and if anything, one would think that a deque's iterators would have more guarantees than a vector's, not less.
The C++ standard doesn't specify how deque is implemented. It isn't required to allocate new space by allocating a new chunk and chaining it on to the previous ones, all that's required is that insertion at each end be amortized constant time.
So, while it's easy to see how to implement deque such that it gives the guarantee you want[*], that's not the only way to do it.
[*] Iterators have a reference to an element, plus a reference to the block it's in so that they can continue forward/back off the ends of the block when they reach them. Plus I suppose a reference to the deque itself, so that operator+ can be constant-time as expected for random-access iterators -- following a chain of links from block to block isn't good enough.
What's more interesting is that push_back and push_front will not invalidate any references to a deque's elements. Only iterators are to be assumed invalid.
The standard, to my knowledge, doesn't state why. However if an iterator were implemented that was aware of its immediate neighbors - as a list is - that iterator would become invalid if it pointed to an element that was both at the edge of the deque and the edge of a block.
My guess. push_back/push_front can allocate a new memory block. A deque iterator must know when increment/decrement operator should jump into the next block. The implementation may store that information in iterator itself. Incrementing/decrementing an old iterator after push_back/push_front may not work as intended.
This code may or may not fail with a run-time error. On my Visual Studio it failed in debug mode but ran to completion in release mode. On Linux it caused a segmentation fault.
#include <iostream>
#include <deque>
int main() {
std::deque<int> x(1), y(1);
std::deque<int>::iterator iterx = x.begin();
std::deque<int>::iterator itery = y.begin();
for (int i=1; i<1000000; ++i) {
x.push_back(i);
y.push_back(i);
++iterx;
++itery;
if(*iterx != *itery) {
std::cout << "increment failed at " << i << '\n';
break;
}
}
}
The key thing is not to make any assumptions: just treat the iterator as if it will be invalidated.
Even if it works fine now, a later version of the compiler or the compiler for a different platform might come along and break your code. Alternatively, a colleague might come along and decide to turn your deque into a vector or linked list.
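Following that advice, the test loop could be written with an index, creating an iterator only after the push_back that might invalidate it. This is a sketch with a hypothetical function name, using a single deque for brevity.

```cpp
#include <cassert>
#include <deque>

// Hypothetical rewrite of the test loop: the position is an index, and an
// iterator is formed only after the container has been modified.
bool walk_while_growing(int count) {
    std::deque<int> x(1);  // one value-initialized element: 0
    std::deque<int>::size_type pos = 0;
    for (int i = 1; i < count; ++i) {
        x.push_back(i);  // invalidates all existing iterators into x
        ++pos;
        assert(pos < x.size());
        std::deque<int>::iterator it =
            x.begin() + static_cast<std::deque<int>::difference_type>(pos);
        if (*it != i) return false;
    }
    return true;
}
```

This version keeps working if the deque is later swapped for a vector, which is the maintenance scenario mentioned above.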
An iterator is not just a reference to the data. It must know how to increment, etc.
In order to support random access, implementations will have a dynamic array of pointers to the chunks. The deque iterator will point into this dynamic array. When the deque grows, a new chunk might need to be allocated. The dynamic array will grow, invalidating its iterators and, consequently, the deque's iterators.
So it is not that chunks are reallocated, but the array of pointers to these chunks can be. Indeed, as Johannes Schaub noted, references are not invalidated.
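A toy model may help to picture this. The class below is purely illustrative and is not how any particular standard library implements deque: it keeps fixed-size chunks and a growable table of pointers to them, and all names (ToyDeque, kChunk, table_) are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy container: fixed-size chunks plus a growable table of chunk pointers.
class ToyDeque {
public:
    static constexpr std::size_t kChunk = 4;

    void push_back(int value) {
        if (size_ % kChunk == 0) {
            // Growing the table may reallocate the table itself. An iterator
            // holding a pointer into the table would then dangle, but the
            // chunks, and so the elements, never move.
            table_.push_back(std::make_unique<int[]>(kChunk));
        }
        table_[size_ / kChunk][size_ % kChunk] = value;
        ++size_;
    }

    int& operator[](std::size_t i) { return table_[i / kChunk][i % kChunk]; }
    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<int[]>> table_;
    std::size_t size_ = 0;
};

// The address of an element stays the same however often the table grows.
bool element_address_survives_growth() {
    ToyDeque d;
    d.push_back(1);
    int* first = &d[0];
    for (int i = 0; i < 1000; ++i) d.push_back(i);
    assert(d.size() == 1001);
    return first == &d[0] && *first == 1;
}
```

In this model, pointers and references to elements correspond to addresses inside a chunk, while an iterator that wants constant-time arithmetic would need to refer to the table, which is the part that gets reallocated.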
Also note that the deque's iterator guarantees are not less than the vector's, which are also invalidated when the container grows.
Even when you are allocating in chunks, an insert will cause that particular chunk to be reallocated if there isn't enough space (as is the case with vectors).
Because the standard says it can. It does not mandate that deque be implemented as a list of chunks. It mandates a particular interface with particular pre and post conditions and particular algorithmic complexity minimums.
Implementors are free to implement the thing in whatever way they choose, so long as it meets all of those requirements. A sensible implementation might use lists of chunks, or it might use some other technique with different trade-offs.
It's probably impossible to say that one technique is strictly better than another for all users in all situations. Which is why the standard gives implementors some freedom to choose.