Invalidated iterator in a vector - c++

I'm aware erasing will invalidate iterators at and after the point of the erase. Consider:
std::vector<int> vec = {1, 2, 3, 4, 5};
std::vector<int>::iterator it = vec.end() - 1; //last element
vec.erase(vec.begin()); //shift everything one to the left, 'it' should be the new 'end()' ?
std::cout << (it == vec.end()); //not dereferencing 'it', just comparing, UB ?
Is it undefined behavior to compare (not dereference) an invalidated iterator (it in this case)? If not, is it == vec.end() guaranteed to hold true?
Edit: from the top answer it looks like this is UB if only it is a singular value. But from What is singular and non-singular values in the context of STL iterators? it seems like it is (or was) associated with the container hence making it non-singular.
I'd appreciate further analysis on this, thank you.

Once your iterator has been invalidated, it may be UB to even compare it to something else:
[C++14: 24.2.1/10]: An invalid iterator is an iterator that may be singular.
[C++14: 24.2.1/5]: [..] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible requirements, using a value-initialized iterator as the source of a copy or move operation. [..]
Note that this means you also can't compare a default-constructed iterator to any .end().
Contrary to popular belief that "pointers are just memory addresses", these rules are also largely true for pointers. Indeed, the rules for iterators are a generalisation of the rules for pointers.

Formally any iterator pointing to an element on or after the erased element is invalidated. So
Yes this is UB (despite it being a pointer under the hood.)
UB again, despite the obvious plausibility.

Related

std::list - are the iterators invalidated on move?

std::list iterators have some very nice properties - they remain valid when any other element is removed, when a new element is added and even when 2 lists are swapped (Iterator invalidation rules)!
Considering following code behaviour and that the iterators are implement by a form of pointer to the actual node which doesn't change when the list is moved, my guess is that the iterators are still valid in the new container when a std::list is moved, but also I can be in the UB area here by accessing invalid memory which actually has the "expected" value.
std::list<int> l1{3, 2, 1};
std::list<int> l2;
auto it = std::prev(l1.end());
std::cout<<l1.size()<<" "<<l2.size()<<" "<<*it<<std::endl;
l2 = std::move(l1);
std::cout<<l2.size()<<" "<<*it<<std::endl;
3 0 1
3 1
Is it guaranteed by the standard if the iterators remain valid when std::list is moved? What about other containers?
For containers in general, only swap guarantees that iterators remain valid (and point into the swapped containers).
For std::list, the special member function splice() guarantees that iterators retain their expected meaning.
In general, constructing a container from an rvalue doesn't make guarantees about iterators; the only general requirement is that the new container has the "same value" as the container it was constructed from had originally.
(You can imagine debug iterator implementations that store a reference to the container, and that reference would become dangling after a move.)

vector::erase and reverse_iterator

I have a collection of elements in a std::vector that are sorted in a descending order starting from the first element. I have to use a vector because I need to have the elements in a contiguous chunk of memory. And I have a collection holding many instances of vectors with the described characteristics (always sorted in a descending order).
Now, sometimes, when I find out that I have too many elements in the greater collection (the one that holds these vectors), I discard the smallest elements from these vectors some way similar to this pseudo-code:
grand_collection: collection that holds these vectors
T: type argument of my vector
C: the type that is a member of T, that participates in the < comparison (this is what sorts data before they hit any of the vectors).
std::map<C, std::pair<T::const_reverse_iterator, std::vector<T>&>> what_to_delete;
iterate(it = grand_collection.begin() -> grand_collection.end())
{
iterate(vect_rit = it->rbegin() -> it->rend())
{
// ...
what_to_delete <- (vect_rit->C, pair(vect_rit, *it))
if (what_to_delete.size() > threshold)
what_to_delete.erase(what_to_delete.begin());
// ...
}
}
Now, after running this code, in what_to_delete I have a collection of iterators pointing to the original vectors that I want to remove from these vectors (overall smallest values). Remember, the original vectors are sorted before they hit this code, which means that for any what_to_delete[0 - n] there is no way that an iterator on position n - m would point to an element further from the beginning of the same vector than n, where m > 0.
When erasing elements from the original vectors, I have to convert a reverse_iterator to iterator. To do this, I rely on C++11's §24.4.1/1:
The relationship between reverse_iterator and iterator is
&*(reverse_iterator(i)) == &*(i- 1)
Which means that to delete a vect_rit, I use:
vector.erase(--vect_rit.base());
Now, according to C++11 standard §23.3.6.5/3:
iterator erase(const_iterator position); Effects: Invalidates
iterators and references at or after the point of the erase.
How does this work with reverse_iterators? Are reverse_iterators internally implemented with a reference to a vector's real beginning (vector[0]) and transforming that vect_rit to a classic iterator so then erasing would be safe? Or does reverse_iterator use rbegin() (which is vector[vector.size()]) as a reference point and deleting anything that is further from vector's 0-index would still invalidate my reverse iterator?
Edit:
Looks like reverse_iterator uses rbegin() as its reference point. Erasing elements the way I described was giving me errors about a non-deferenceable iterator after the first element was deleted. Whereas when storing classic iterators (converting to const_iterator) while inserting to what_to_delete worked correctly.
Now, for future reference, does The Standard specify what should be treated as a reference point in case of a random-access reverse_iterator? Or this is an implementation detail?
Thanks!
In the question you have already quoted exactly what the standard says a reverse_iterator is:
The relationship between reverse_iterator and iterator is &*(reverse_iterator(i)) == &*(i- 1)
Remember that a reverse_iterator is just an 'adaptor' on top of the underlying iterator (reverse_iterator::current). The 'reference point', as you put it, for a reverse_iterator is that wrapped iterator, current. All operations on the reverse_iterator really occur on that underlying iterator. You can obtain that iterator using the reverse_iterator::base() function.
If you erase --vect_rit.base(), you are in effect erasing --current, so current will be invalidated.
As a side note, the expression --vect_rit.base() might not always compile. If the iterator is actually just a raw pointer (as might be the case for a vector), then vect_rit.base() returns an rvalue (a prvalue in C++11 terms), so the pre-decrement operator won't work on it since that operator needs a modifiable lvalue. See "Item 28: Understand how to use a reverse_iterator's base iterator" in "Effective STL" by Scott Meyers. (an early version of the item can be found online in "Guideline 3" of http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406).
You can use the even uglier expression, (++vect_rit).base(), to avoid that problem. Or since you're dealing with a vector and random access iterators: vect_rit.base() - 1
Either way, vect_rit is invalidated by the erase because vect_rit.current is invalidated.
However, remember that vector::erase() returns a valid iterator to the new location of the element that followed the one that was just erased. You can use that to 're-synchronize' vect_rit:
vect_rit = vector_type::reverse_iterator( vector.erase(vect_rit.base() - 1));
From a standardese point of view (and I'll admit, I'm not an expert on the standard): From §24.5.1.1:
namespace std {
template <class Iterator>
class reverse_iterator ...
{
...
Iterator base() const; // explicit
...
protected:
Iterator current;
...
};
}
And from §24.5.1.3.3:
Iterator base() const; // explicit
Returns: current.
Thus it seems to me that so long as you don't erase anything in the vector before what one of your reverse_iterators points to, said reverse_iterator should remain valid.
Of course, given your description, there is one catch: if you have two contiguous elements in your vector that you end up wanting to delete, the fact that you vector.erase(--vector_rit.base()) means that you've invalidated the reverse_iterator "pointing" to the immediately preceeding element, and so your next vector.erase(...) is undefined behavior.
Just in case that's clear as mud, let me say that differently:
std::vector<T> v=...;
...
// it_1 and it_2 are contiguous
std::vector<T>::reverse_iterator it_1=v.rend();
std::vector<T>::reverse_iterator it_2=it_1;
--it_2;
// Erase everything after it_1's pointee:
// convert from reverse_iterator to iterator
std::vector<T>::iterator tmp_it=it_1.base();
// but that points one too far in, so decrement;
--tmp_it;
// of course, now tmp_it points at it_2's base:
assert(tmp_it == it_2.base());
// perform erasure
v.erase(tmp_it); // invalidates all iterators pointing at or past *tmp_it
// (like, say it_2.base()...)
// now delete it_2's pointee:
std::vector<T>::iterator tmp_it_2=it_2.base(); // note, invalid iterator!
// undefined behavior:
--tmp_it_2;
v.erase(tmp_it_2);
In practice, I suspect that you'll run into two possible implementations: more commonly, the underlying iterator will be little more than a (suitably wrapped) raw pointer, and so everything will work perfectly happily. Less commonly, the iterator might actually try to track invalidations/perform bounds checking (didn't Dinkumware STL do such things when compiled in debug mode at one point?), and just might yell at you.
The reverse_iterator, just like the normal iterator, points to a certain position in the vector. Implementation details are irrelevant, but if you must know, they both are (in a typical implementation) just plain old pointers inside. The difference is the direction. The reverse iterator has its + and - reversed w.r.t. the regular iterator (and also ++ and --, > and < etc).
This is interesting to know, but doesn't really imply an answer to the main question.
If you read the language carefully, it says:
Invalidates iterators and references at or after the point of the erase.
References do not have a built-in sense of direction. Hence, the language clearly refers to the container's own sense of direction. Positions after the point of the erase are those with higher indices. Hence, the iterator's direction is irrelevant here.

Assigning boost::iterator_range to singular range

I'm using Boost.Range to pass around some data and a container class for this data. The data is loaded in a different thread and may in some cases not be ready yet. In this case the container is initialized with the default iterator_range, hence containing singular iterators. I'm doing assignments and copying of the data containers (hence the iterator_ranges). However, the iterator_range copy constructor calls begin() and end() which will assert when the original is singular. Therefor, it is not possible to copy an empty data container.
Is there any way to avoid this limitation?
Why is this limitation been implemented? The following works just fine, shouldn't ranges behave similarly?
typedef std::vector<int>::iterator iterator;
iterator it; // Singular
iterator it2 = it; // Works
boost::iterator_range<iterator> range; // Singular
boost::iterator_range<iterator> range2 = range; // Asserts in debug, but why?
If by "works", you mean "it does not blow up with my current compiler version and invocation options", then yes, assigning a singular iterator might "work".
Actually, the code
typedef std::vector<int>::iterator iterator;
iterator it; // Singular
iterator it2 = it; // Works
results in undefined behaviour, so you are up to the whims of your compiler for what may happen.
The C++ standard has this to say about it (section [lib.iterator.requirements]/5):
[...] Results of most expressions are undefined for singular values; the only exception is an assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable and past-the-end values are always non-singular.
So, in the end ranges do work similar to single iterators. It just doesn't work the way you would like.
I think the best way is to use an empty range (explicitly constructed with to non-singular iterators) when the data is not yet ready, instead of a singular range.

Why is comparing against "end()" iterator legal?

According to C++ standard (3.7.3.2/4) using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior (in case of doubt also see this question). Now the typical code to traverse an STL containter looks like this:
std::vector<int> toTraverse;
//populate the vector
for( std::vector<int>::iterator it = toTraverse.begin(); it != toTraverse.end(); ++it ) {
//process( *it );
}
std::vector::end() is an iterator onto the hypothetic element beyond the last element of the containter. There's no element there, therefore using a pointer through that iterator is undefined behavior.
Now how does the != end() work then? I mean in order to do the comparison an iterator needs to be constructed wrapping an invalid address and then that invalid address will have to be used in a comparison which again is undefined behavior. Is such comparison legal and why?
The only requirement for end() is that ++(--end()) == end(). The end() could simply be a special state the iterator is in. There is no reason the end() iterator has to correspond to a pointer of any kind.
Besides, even if it were a pointer, comparing two pointers doesn't require any sort of dereference anyway. Consider the following:
char[5] a = {'a', 'b', 'c', 'd', 'e'};
char* end = a+5;
for (char* it = a; it != a+5; ++it);
That code will work just fine, and it mirrors your vector code.
You're right that an invalid pointer can't be used, but you're wrong that a pointer to an element one past the last element in an array is an invalid pointer - it's valid.
The C standard, section 6.5.6.8 says that it's well defined and valid:
...if the expression P points to the
last element of an array object, the
expression (P)+1 points one past the
last element of the array object...
but cannot be dereferenced:
...if the result points one past the
last element of the array object, it
shall not be used as the operand of a
unary * operator that is evaluated...
One past the end is not an invalid value (neither with regular arrays or iterators). You can't dereference it but it can be used for comparisons.
std::vector<X>::iterator it;
This is a singular iterator. You can only assign a valid iterator to it.
std::vector<X>::iterator it = vec.end();
This is a perfectly valid iterator. You can't dereference it but you can use it for comparisons and decrement it (assuming the container has a sufficient size).
Huh? There's no rule that says that iterators need to be implemented using nothing but a pointer.
It could have a boolean flag in there, which gets set when the increment operation sees that it passes the end of the valid data, for instance.
The implementation of a standard library's container's end() iterator is, well, implementation-defined, so the implementation can play tricks it knows the platform to support.
If you implemented your own iterators, you can do whatever you want - so long as it is standard-conform. For example, your iterator, if storing a pointer, could store a NULL pointer to indicate an end iterator. Or it could contain a boolean flag or whatnot.
I answer here since other answers are now out-of-date; nevertheless, they were not quite right to the question.
First, C++14 has changed the rules mentioned in the question. Indirection through an invalid pointer value or passing an invalid pointer value to a deallocation function are still undefined, but other operations are now implemenatation-defined, see Documentation of "invalid pointer value" conversion in C++ implementations.
Second, words matter. You can't bypass the definitions while applying the rules. The key point here is the definition of "invalid". For iterators, this is defined in [iterator.requirements]. Though pointers are iterators, meanings of "invalid" to them are subtly different. Rules for pointers render "invalid" as "don't indirect through invalid value", which is a special case of "not dereferenceable" to iterators; however, "not deferenceable" is not implying "invalid" for iterators. "Invalid" is explicitly defined as "may be singular", while "singular" value is defined as "not associated with any sequence" (in the same paragraph of definition of "dereferenceable"). That paragraph even explicitly defined "past-the-end values".
From the text of the standard in [iterator.requirements], it is clear that:
Past-the-end values are not assumed to be dereferenceable (at least by the standard library), as the standard states.
Dereferenceable values are not singular, since they are associated with sequence.
Past-the-end values are not singular, since they are associated with sequence.
An iterator is not invalid if it is definitely not singular (by negation on definition of "invalid iterator"). In other words, if an iterator is associated to a sequence, it is not invalid.
Value of end() is a past-the-end value, which is associated with a sequence before it is invalidated. So it is actually valid by definition. Even with misconception on "invalid" literally, the rules of pointers are not applicable here.
The rules allowing == comparison on such values are in input iterator requirements, which is inherited by some other category of iterators (forward, bidirectional, etc). More specifically, valid iterators are required to be comparable in the domain of the iterator in such way (==). Further, forward iterator requirements specifies the domain is over the underlying sequence. And container requirements specifies the iterator and const_iterator member types in any iterator category meets forward iterator requirements. Thus, == on end() and iterator over same container is required to be well-defined. As a standard container, vector<int> also obey the requirements. That's the whole story.
Third, even when end() is a pointer value (this is likely to happen with optimized implementation of iterator of vector instance), the rules in the question are still not applicable. The reason is mentioned above (and in some other answers): "invalid" is concerned with *(indirect through), not comparison. One-past-end value is explicitly allowed to be compared in specified ways by the standard. Also note ISO C++ is not ISO C, they also subtly mismatches (e.g. for < on pointer values not in the same array, unspecified vs. undefined), though they have similar rules here.
Simple. Iterators aren't (necessarily) pointers.
They have some similarities (i.e. you can dereference them), but that's about it.
Besides what was already said (iterators need not be pointers), I'd like to point out the rule you cite
According to C++ standard (3.7.3.2/4)
using (not only dereferencing, but
also copying, casting, whatever else)
an invalid pointer is undefined
behavior
wouldn't apply to end() iterator anyway. Basically, when you have an array, all the pointers to its elements, plus one pointer past-the-end, plus one pointer before the start of the array, are valid. That means:
int arr[5];
int *p=0;
p==arr+4; // OK
p==arr+5; // past-the-end, but OK
p==arr-1; // also OK
p==arr+123456; // not OK, according to your rule

Comparing default-constructed iterators with operator==

Does the C++ Standard say I should be able to compare two default-constructed STL iterators for equality? Are default-constructed iterators equality-comparable?
I want the following, using std::list for example:
void foo(const std::list<int>::iterator iter) {
if (iter == std::list<int>::iterator()) {
// Something
}
}
std::list<int>::iterator i;
foo(i);
What I want here is something like a NULL value for iterators, but I'm not sure if it's legal. In the STL implementation included with Visual Studio 2008, they include assertions in std::list's operator==() that preclude this usage. (They check that each iterator is "owned" by the same container and default-constructed iterators have no container.) This would hint that it's not legal, or perhaps that they're being over-zealous.
OK, I'll take a stab. The C++ Standard, Section 24.1/5:
Iterators can also have singular
values that are not associated with
any container. [Example: After the
declaration of an uninitialized
pointer x (as with int* x;), x must
always be assumed to have a singular
value of a pointer. ] Results of most
expressions are undefined for singular
values; the only excep- tion is an
assignment of a non-singular value to
an iterator that holds a singular
value.
So, no, they can't be compared.
This is going to change in C++14. [forward.iterators] 24.2.5p2 of N3936 says
However, value-initialized iterators may be compared and shall compare
equal to other value-initialized iterators of the same type.
I believe you should pass a range to the function.
void fun(std::list<int>::iterator beg, std::list<int>::iterator end)
{
while(beg != end)
{
// do what you want here.
beg++;
}
}
Specification says that the postcondition of default constructor is that iterator is singular. The comparison for equality are undefined, so it may be different in some implementation.