Why does vector::pop_back invalidate the iterator (end() - 1)? - c++

Note: The question applies to erase, too. See bottom.
What's the reason behind the fact that the end() - 1 iterator is invalidated after pop_back is called on a vector?
To clarify, I'm referring to this situation:
std::vector<int> v;
v.push_back(1);
v.push_back(2);
std::vector<int>::iterator i1 = v.begin(), i2 = v.end() - 1, i3 = v.begin() + 1;
v.pop_back();
// i1 is still valid
// i2 is now invalid
// i3 is now invalid too
std::vector<int>::iterator i4 = v.end();
assert(i2 == i4); // undefined behavior (but why should it be?!)
assert(i3 == i4); // undefined behavior (but why should it be?!)
Why does this happen? (i.e. when would this invalidation ever prove beneficial for the implementation?)
(Note that this isn't just a theoretical problem. Visual C++ 2013 -- and probably 2012 too -- display an error if you try to do this in debug mode, if you have _ITERATOR_DEBUG_LEVEL set to 2.)
Regarding erase:
Note that the same question applies to erase:
Why does erase(end() - 1, end()) invalidate end() - 1?
(So please don't say, "pop_back invalidates end() - 1 because it is equivalent to calling erase(end() - 1, end())"; that's just begging the question.)

The interesting question is really what does it mean for an iterator to be invalidated. And I truly don't have a good answer from the standard. What I do know is that to some extent the standard considers an iterator not as a pointer to a location inside the container, but rather as a proxy to a particular element that lives within the container.
With that in mind, after erasing of a single element in the middle of a vector, all iterators after the point of removal become invalidated as they no longer refer to the same element that they referred to before.
Support for this line of reasoning comes from the iterator invalidation clauses of other operations in the container. For example, on insert, the standard guarantees that if there is no reallocation the iterators before the point of insertion remain valid. Exceptio probat regulam in casibus non exceptis, it invalidates all iterators after the point of insertion.
If the validity of iterators was only related to the fact that there is an element of the container where the iterator points, then none of the iterators would be invalidated with that operation (again, in the absence of reallocations).
Going even further in that line of reasoning, if you consider iterator validity as pointer validity, then none of the iterators into a vector would be invalidated during an erase operation. The end()-1 iterator would become non-dereferencable, but it could remain valid, which is not the case.

pop_back is usually defined in terms of erase(end()-1, end()).
When you erase a range of iterators from a vector, all iterators from the first erased and forward are invalidated. This includes a range of one.
In general, a valid derefencable non input iterator always refers to the same data 'location' until it becomes invalidated. In the case of erase, all non-invalid iterators afterwards have the same value and location currently.
Both of the above rules would have to be amended to get the behavior you want. The first to something like 'unless you erase the last element in so doing, in which case the first erased element becomes the one-past-the-end iterator'. And the still valid iterator will become not dereferencable, a possibly unique state change for an iterator that is, as far as I know, without precedent in C++.
The cost is both extra tricky verbage in the standard to cover your requested behavior and sanity checks for strict iterators. The benefit -- well, I do not see one: in every situation you will have to know exactly what just happened (a very particular iterator just became one past the end instead of being invalidated), and if you know that is the case you could just talk about end.
And words are required. When you call erase( a, b ), every iterator from a on will have its state changed in some way (*a will not return the same value, and how that changes must be specified). C++ takes the easy way, and simply states that every iterator whose state is changed by erase becomes invalid, and using it is undefined behavior: this allows the maximium latitude to the implementer.
In theory, it can also allow for optimizations. If you dereference an iterator before and after an erase operation, the value you get can be considered the same! (assuming no possible indirect modification of the container within object destructors, which can be proven as well).

Related

Are iterators still valid when the underlying elements have been moved?

If I have an iterator pointing to an element in an STL container, and I moved the element with the iterator, does the standard guarantee that the iterator is still valid? Can I use it with container's method, e.g. container::erase?
Also does it matter, if the container is a continuous one, e.g. vector, or non-continuous one, e.g. list?
std::list<std::string> l{"a", "b", "c"};
auto iter = l.begin();
auto s = std::move(*iter);
l.erase(iter); // <----- is it valid to erase it, whose underlying element has been removed?
Yes, you've modified the object in the container. You've not modified the container itself so the iterator is still valid
"Moving" an underlying element may not be the best name to use in this context. The name of this operation express the intention behind it but not how it really works.
In fact, the move operation is a form of copy operation with one difference: it is allowed to change the state of the "copied" object if it speeds up the execution. In case of the std::string this means that the internal buffer containing characters may be not deep-copied but just copied by address. The original object has to be then set to an empty state, to tell it to not use this buffer anymore. (Emptying the source string is not guaranteed. Optimizations of std::string are more complicated than I described.)
The important thing is that after the move operation, the original object is still there. It's just not guaranteed to have any specific state.
In this particular case you've done nothing to the iterator, but much rather to the object within it, so yes: The iterator remains valid.
But if you look at std::list::erase, it sports a line such as "References and iterators to the erased elements are invalidated. Other references and iterators are not affected."
So if you tried to do *iter after erase, it would cause your program to fail.
This may seem obvious for erase, but there are other operations where it is not as obvious.
For std::list for example, the reference page says:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
For std::vector on the other hand, the reference for the push_back method says:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
That means, unlike with std::list, it is not generally safe to keep an iterator to an element around, if the vector grows (because the underlying storage location of the item changes).

Does std::vector::erase() invalidate the iterator at the point of erase?

C++03 Standard § 23.2.4.3/3 describes std::vector::erase(iterator position) and says specifically
Invalidates all the iterators and references after the point of the erase.
Is the iterator at the point of the erase not invalidated? Specifically if I have a vector with a single element and I copy begin() iterator into a local variable and then call
vec.erase(vec.begin())
Will that iterator I have in a local variable get invalidated or not?
Will the iterators be invalidated after the point of erasure or after and including the point of erasure?
I'd say that your example with erasing the only element in the vector shows that the iterator at the insertion point must be invalidated.
Anyway, in C++11, the wording has been changed (23.3.6.5/3):
Effects: Invalidates iterators and references at or after the point of the erase.
The complexity of vector::erase() says: Linear on the number of elements erased (destructions) plus the number of elements after the last element deleted (moving).
It seems to me that it is implemented as a lazy grow/shrink array. The iterator, a pointer to the data, when erased, the following data will copied into its place. And thus the iterator you keep, will point to some data else, provided that the data you erase is not the last one.
In fact, it may depend on implementation. Yet I think a lazy grow/shrink array is the best-fit implementation for vector::erase(). [Since it may depends on implementation, don't count on anything like invalidate...]

Difference between `iterators` and `references to elements`

I followed the following post that illustrates the scenario how iterator behaves after some non-const operations.
Iterator invalidation rules
I have problems to understand the difference between reference and iterator. Here is one of the rule listed as an example for clarification:
deque: all iterators and references are
invalidated, unless the inserted member is at an end (front or back)
of the deque (in which case all iterators are invalidated, but
references to elements are unaffected) [23.2.1.3/1]
Here is the sample code that is based on the reference.
std::deque<int> mydeque;
mydeque.push_back(1);
mydeque.push_back(2);
mydeque.push_back(3);
const int& int3 = mydeque.back(); // reference to 3
int& int3 = mydeque.back();
std::deque<int>::iterator itBack = mydeque.crbegin(); // pointing to 3
mydeque.push_back(4);
Question> If my vague understanding is correct, then the following statement is true:
After the calling of the line of `mydeque.push_back(4)`
`int3` is still valid because it is a reference to element.
`itBack` is invalidated because it is an iterator.
Thank you
Yes, that sounds correct. The reason the iterator gets invalidated but the reference does not is that the iterator needs to be able to do ++ and -- correctly, but pushing something onto a deque might cause it to rearrange its structures for tracking that. But, deque guarantees it won't move the elements themselves, even if it has to restructure the container around them.
This suggests that the implementation of deque has an additional level of indirection built into its iterators that it hides from you. But, that's pretty much the entire the point of iterators, and why they're distinct from references.

C++ and iterator invalidation

So I'm going through Accelerated C++ and am somewhat unsure about iterator invalidation in C++. Maybe it's the fact that it is never explained how these iterators are constructed is the problem.
Here is one example:
Vector with {1,2,3}
If my iterator is on {2} and I call an erase on {2} my iterator is invalid. Why? In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element. The only way I would see this as being not true is if iterators were made before hand for each element and each iterator had some type of field containing the address of the following element in that container.
My other question has to do with the statement such as "invalidates all other iterators". Erm, when I loop through my vector container, I am using one iterator. Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
In my head, {3} is shifted down so the memory location of where {2} was so the iterator is still pointing to a valid element.
That may be the case. But it’s equally valid that the whole vector is relocated in memory, thus making all iterators point to now-defunct memory locations. C++ simply makes no guarantees either way. (See comments for discussion.)
Do all those elements in the vector implicitly have their own iterator associated with them or am I missing something?
You’re merely missing the fact that you may have other iterators referencing the same vector besides your loop variable. For example, the following loop is an idiomatic style that caches the end iterator of the vector to avoid redundant calls:
vector<int> vec;
// …
for (vector<int>::iterator i(vec.begin()), end(vec.end()); i != end; ++i) {
if (some_condition)
vec.erase(i); // invalidates `i` and `end`.
}
(Nevermind the fact that this copy of the end iterator is in fact unnecessary with the STL on modern compilers.)
The following C++ defect report (fixed in C++0x) contains a brief discussion of the meaning of "invalidate":
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#414
int A[8] = { 1,3,5,7,9,8,4,2 };
std::vector<int> v(A, A+8);
std::vector<int>::iterator i1 = v.begin() + 3;
std::vector<int>::iterator i2 = v.begin() + 4;
v.erase(i1);
Which iterators are invalidated by
v.erase(i1): i1, i2, both, or neither?
On all existing implementations that I
know of, the status of i1 and i2 is
the same: both of them will be
iterators that point to some elements
of the vector (albeit not the same
elements they did before). You won't
get a crash if you use them. Depending
on exactly what you mean by
"invalidate", you might say that
neither one has been invalidated
because they still point to something,
or you might say that both have been
invalidated because in both cases the
elements they point to have been
changed out from under the iterator.
It seems that the specification is "playing safe" regarding iterator and reference invalidation. It says that they're invalidated even though, as you and Matt Austern both noted, there's still a vector element at the same address. It just has a different value.
So, those of us following the standard must program as if that iterator can't be used any more, even though no implementation is likely to do anything that would actually stop them working, except perhaps a debugging iterator that could do extra work to let us know we're off-road.
In fact that defect report relates to exactly the case you're talking about. As far as the C++03 standard actually says, at least in that clause, your iterator isn't invalidated. But that was considered an error.
An iterator basically wraps a pointer. Some operations on containers have the effect of reallocating some or all of the data behind the scenes. In that case, all current pointers/iterators are left pointing to the wrong memory locations.
The image "in your mind" is an implementation detail, and it could be that your iterator isn't implemented that way. Likely it is, but it could be that it isn't.
The "ivalidates all other iterators" language is their way of saying that the implemenation is allowed the freedom to do anything its coders' skeevie hearts feel like to the contaier when you perform that operation, including things that require internal changes to iterators. Since the only iterator it has access to is the one you passed in, that's the only one that it can fix up if need be.
If you want the behavior in your head for a vector, it is easy to get. Just use an index into the vector instead of an iterator. Then it works just like you think.
Chances are that your iterator is actually pointing at the 3 -- but it's not certain.
The general idea is to allow vector to allocate new storage and move your data from one block of storage to another when/if it sees fit to do so. As such, when you insert or delete data, the data might move to some other part of memory entirely.
At least that was sort of the intent. It turns out that other rules probably prevent it from moving the data when you delete -- but the iterator is invalidated anyway, probably because somebody didn't quite understand all the implications of those other rules when this one was made.
From SGI http://www.sgi.com/tech/stl/Vector.html
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
So you can erase starting from end
int i;
vector v;
for ( i = v.size(), i >=0, i--)
{
if (v[i])
v.erase(v.begin() + i);
}
OR use iterator returned from vector erase()
std::vector<int> v;
for (std::vector<int>::iterator it = v.begin(); it != v.end(); )
it = v.erase(it);

Is end() required to be constant in an STL map/set?

§23.1.2.8 in the standard states that insertion/deletion operations on a set/map will not invalidate any iterators to those objects (except iterators pointing to a deleted element).
Now, consider the following situation: you want to implement a graph with uniquely numbered nodes, where every node has a fixed number (let's say 4) of neighbors. Taking advantage of the above rule, you do it like this:
class Node {
private:
// iterators to neighboring nodes
std::map<int, Node>::iterator neighbors[4];
friend class Graph;
};
class Graph {
private:
std::map<int, Node> nodes;
};
(EDIT: Not literally like this due to the incompleteness of Node in line 4 (see responses/comments), but along these lines anyway)
This is good, because this way you can insert and delete nodes without invalidating the consistency of the structure (assuming you keep track of deletions and remove the deleted iterator from every node's array).
But let's say you also want to be able to store an "invalid" or "nonexistent" neighbor value. Not to worry, we can just use nodes.end()... or can we? Is there some sort of guarantee that nodes.end() at 8 AM will be the same as nodes.end() at 10 PM after a zillion insertions/deletions? That is, can I safely == compare an iterator received as a parameter to nodes.end() in some method of Graph?
And if not, would this work?
class Graph {
private:
std::map<int, Node> nodes;
std::map<int, Node>::iterator _INVALID;
public:
Graph() { _INVALID = nodes.end(); }
};
That is, can I store nodes.end() in a variable upon construction, and then use this variable whenever I want to set a neighbor to invalid state, or to compare it against a parameter in a method? Or is it possible that somewhere down the line a valid iterator pointing to an existing object will compare equal to _INVALID?
And if this doesn't work either, what can I do to leave room for an invalid neighbor value?
You write (emphasis by me):
§23.1.2.8 in the standard states that insertion/deletion operations on a set/map will not invalidate any iterators to those objects (except iterators pointing to a deleted element).
Actually, the text of 23.1.2/8 is a bit different (again, emphasis by me):
The insert members shall not affect the validity of iterators and references to the container, and the erase members shall invalidate only iterators and references to the erased elements.
I read this as: If you have a map, and somehow obtain an iterator into this map (again: it doesn't say to an object in the map), this iterator will stay valid despite insertion and removal of elements. Assuming std::map<K,V>::end() obtains an "iterator into the map", it should not be invalidated by insertion/removal.
This, of course, leaves the question whether "not invalidated" means it will always have the same value. My personal assumption is that this is not specified. However, in order for the "not invalidated" phrase to make sense, all results of std::map<K,V>::end() for the same map must always compare equal even in the face of insertions/removal:
my_map_t::iterator old_end = my_map.end();
// wildly change my_map
assert( old_end == my_map.end() );
My interpretation is that, if old_end remains "valid" throughout changes to the map (as the standard promisses), then that assertion should pass.
Disclaimer: I am not a native speaker and have a very hard time digesting that dreaded legaleze of the Holy PDF. In fact, in general I avoid it like the plague.
Oh, and my first thought also was: The question is interesting from an academic POV, but why doesn't he simply store keys instead of iterators?
23.1/7 says that end() returns an iterator that
is the past-the-end value for the container.
First, it confirms that what end() returns is the iterator. Second, it says that the iterator doesn't point to a particular element. Since deletion can only invalidate iterators that point somewhere (to the element being deleted), deletions can't invalidate end().
Well, there's nothing preventing particular collection implementation from having end() depend on the instance of collection and time of day, as long as comparisons and such work. Which means, that, perhaps, end() value may change, but old_end == end() comparison should still yield true. (edit: although after reading the comment from j_random_hacker, I doubt this paragraph itself evaluates to true ;-), not universally — see the discussion below )
I also doubt you can use std::map<int,Node>::iterator in the Node class due to the type being incomplete, yet (not sure, though).
Also, since your nodes are uniquely numbered, you can use int for keying them and reserve some value for invalid.
Iterators in (multi)sets and (multi)maps won't be invalidated in insertions and deletions and thus comparing .end() against previous stored values of .end() will always yield true.
Take as an example GNU libstdc++ implementation where .end() in maps returns the default intialized value of Rb_tree_node
From stl_tree.h:
_M_initialize()
{
this->_M_header._M_color = _S_red;
this->_M_header._M_parent = 0;
this->_M_header._M_left = &this->_M_header;
this->_M_header._M_right = &this->_M_header;
}
Assuming that (1) map implemented with red-black tree (2) you use same instance "after a zillion insertions/deletions"- answer "Yes".
Relative implmentation I can tell that all incarnation of stl I ever know use the tree algorithm.
A couple points:
1) end() references an element that is past the end of the container. It doesn't change when inserts or deletes change the container because it's not pointing to an element.
2) I think perhaps your idea of storing an array of 4 iterators in the Node could be changed to make the entire problem make more sense. What you want is to add a new iterator type to the Graph object that is capable of iterating over a single node's neighbours. The implementation of this iterator will need to access the members of the map, which possibly leads you down the path of making the Graph class extend the map collection. With the Graph class being an extended std::map, then the language changes, and you no longer need to store an invalid iterator, but instead simply need to write the algorithm to determine who is the 'next neighbour' in the map.
I think it is clear:
end() returns an iterator to the element one past the end.
Insertion/Deletion do not affect existing iterators so the returned values are always valid (unless you try to delete the element one past the end (but that would result in undefined behavior anyway)).
Thus any new iterator generated by end() (would be different but) when compared with the original using operator== would return true.
Also any intermediate values generated using the assignment operator= have a post condition that they compare equal with operator== and operator== is transitive for iterators.
So yes, it is valid to store the iterator returned by end() (but only because of the guarantees with associative containers, therefor it would not be valid for vector etc).
Remember the iterator is not necessarily a pointer. It can potentially be an object where the designer of the container has defined all the operations on the class.
I believe that this depends entirely on what type of iterator is being used.
In a vector, end() is the one past the end pointer and it will obviously change as elements are inserted and removed.
In another kind of container, the end() iterator might be a special value like NULL or a default constructed element. In this case it doesn't change because it doesn't point at anything. Instead of being a pointer-like thing, end() is just a value to compare against.
I believe that set and map iterators are the second kind, but I don't know of anything that requires them to be implemented in that way.
C++ Standard states that iterators should stay valid. And it is. Standard clearly states that in 23.1.2/8:
The insert members shall not affect the validity of iterators and references to the container, and the erase members shall invalidate only iterators and references to the erased elements.
And in 21.1/7:
end() returns an iterator which is the past-the-end value for the container.
So iterators old_end and new_end will be valid. That means that we could get --old_end (call it it1) and --new_end (call it it2) and it will be the-end value iterators (from definition of what end() returns), since iterator of an associative container is of the bidirectional iterator category (according to 23.1.2/6) and according to definition of --r operation (Table 75).
Now it1 should be equal it2 since it gives the-end value, which is only one (23.1.2/9). Then from 24.1.3 follows that: The condition that a == b implies ++a == ++b. And ++it1 and ++it2 will give old_end and new_end iterators (from definition of ++r operation Table 74). Now we get that old_end and new_end should be equal.
I had a similar question recently, but I was wondering if calling end() to retrieve an iterator for comparison purposes could possibly have race conditions.
According to the standard, two iterators are considered equivalent if both can be dereferenced and &*a == &*b or if neither can be dereferenced. Finding the bolded statement took a while and is very relevant here.
Because an std::map::iterator cannot be invalidated unless the element it points to has been removed, you're guaranteed that two iterators returned by end, regardless of what the state of the map was when they were obtained, will always compare to each other as true.