Does std::vector::swap invalidate iterators? - c++

If I swap two vectors, will their iterators remain valid, now just pointing to the "other" container, or will the iterator be invalidated?
That is, given:
using namespace std;
vector<int> x(42, 42);
vector<int> y;
vector<int>::iterator a = x.begin();
vector<int>::iterator b = x.end();
x.swap(y);
// a and b still valid? Pointing to x or y?
It seems the std mentions nothing about this:
[n3092 - 23.3.6.2]
void swap(vector<T,Allocator>& x);
Effects:
Exchanges the contents and capacity()
of *this with that of x.
Note that since I'm on VS 2005 I'm also interested in the effects of iterator debug checks etc. (_SECURE_SCL)

The behavior of swap has been clarified considerably in C++11, in large part to permit the Standard Library algorithms to use argument dependent lookup (ADL) to find swap functions for user-defined types. C++11 adds a swappable concept (C++11 §17.6.3.2[swappable.requirements]) to make this legal (and required).
The text in the C++11 language standard that addresses your question is the following text from the container requirements (§23.2.1[container.requirements.general]/8), which defines the behavior of the swap member function of a container:
Every iterator referring to an element in one container before the swap shall refer to the same element in the other container after the swap.
It is unspecified whether an iterator with value a.end() before the swap will have value b.end() after the swap.
In your example, a is guaranteed to be valid after the swap, but b is not because it is an end iterator. The reason end iterators are not guaranteed to be valid is explained in a note at §23.2.1/10:
[Note: the end() iterator does not refer to any element, so it may be
invalidated. --end note]
This is the same behavior that is defined in C++03, just substantially clarified. The original language from C++03 is at C++03 §23.1/10:
no swap() function invalidates any references, pointers, or iterators referring to the elements of the containers being swapped.
It's not immediately obvious in the original text, but the phrase "to the elements of the containers" is extremely important, because end() iterators do not point to elements.

Swapping two vectors does not invalidate the iterators, pointers, and references to its elements (C++03, 23.1.11).
Typically the iterator would contain knowledge of its container, and the swap operation maintains this for a given iterator.
In VC++ 10 the vector container is managed using this structure in <xutility>, for example:
struct _Container_proxy
{ // store head of iterator chain and back pointer
_Container_proxy()
: _Mycont(0), _Myfirstiter(0)
{ // construct from pointers
}
const _Container_base12 *_Mycont;
_Iterator_base12 *_Myfirstiter;
};

All iterators that refer to the elements of the containers remain valid

As for Visual Studio 2005, I have just tested it.
I think it should always work, as the vector::swap function even contains an explicit step to swap everything:
// vector-header
void swap(_Myt& _Right)
{ // exchange contents with _Right
if (this->_Alval == _Right._Alval)
{ // same allocator, swap control information
#if _HAS_ITERATOR_DEBUGGING
this->_Swap_all(_Right);
#endif /* _HAS_ITERATOR_DEBUGGING */
...
The iterators point to their original elements in the now-swapped vector object. (I.e. w/rg to the OP, they first pointed to elements in x, after the swap they point to elements in y.)
Note that in the n3092 draft the requirement is laid out in §23.2.1/9 :
Every iterator referring to an
element in one container before the
swap shall refer to the same element
in the other container after the swap.

Related

Is it safe to push_back an element from the same vector?

vector<int> v;
v.push_back(1);
v.push_back(v[0]);
If the second push_back causes a reallocation, the reference to the first integer in the vector will no longer be valid. So this isn't safe?
vector<int> v;
v.push_back(1);
v.reserve(v.size() + 1);
v.push_back(v[0]);
This makes it safe?
It looks like http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#526 addressed this problem (or something very similar to it) as a potential defect in the standard:
1) Parameters taken by const reference can be changed during execution
of the function
Examples:
Given std::vector v:
v.insert(v.begin(), v[2]);
v[2] can be changed by moving elements of vector
The proposed resolution was that this was not a defect:
vector::insert(iter, value) is required to work because the standard
doesn't give permission for it not to work.
Yes, it's safe, and standard library implementations jump through hoops to make it so.
I believe implementers trace this requirement back to 23.2/11 somehow, but I can't figure out how, and I can't find something more concrete either. The best I can find is this article:
http://www.drdobbs.com/cpp/copying-container-elements-from-the-c-li/240155771
Inspection of libc++'s and libstdc++'s implementations shows that they are also safe.
The standard guarantees even your first example to be safe. Quoting C++11
[sequence.reqmts]
3 In Tables 100 and 101 ... X denotes a sequence container class, a denotes a value of X containing elements of type T, ... t denotes an lvalue or a const rvalue of X::value_type
16 Table 101 ...
Expression a.push_back(t) Return type void Operational semantics Appends a copy of t. Requires: T shall be CopyInsertable into X. Container basic_string, deque, list, vector
So even though it's not exactly trivial, the implementation must guarantee it will not invalidate the reference when doing the push_back.
It is not obvious that the first example is safe, because the simplest implementation of push_back would be to first reallocate the vector, if needed, and then copy the reference.
But at least it seems to be safe with Visual Studio 2010. Its implementation of push_back does special handling of the case when you push back an element in the vector.
The code is structured as follows:
void push_back(const _Ty& _Val)
{ // insert element at end
if (_Inside(_STD addressof(_Val)))
{ // push back an element
...
}
else
{ // push back a non-element
...
}
}
This isn't a guarantee from the standard, but as another data point, v.push_back(v[0]) is safe for LLVM's libc++.
libc++'s std::vector::push_back calls __push_back_slow_path when it needs to reallocate memory:
void __push_back_slow_path(_Up& __x) {
allocator_type& __a = this->__alloc();
__split_buffer<value_type, allocator_type&> __v(__recommend(size() + 1),
size(),
__a);
// Note that we construct a copy of __x before deallocating
// the existing storage or moving existing elements.
__alloc_traits::construct(__a,
_VSTD::__to_raw_pointer(__v.__end_),
_VSTD::forward<_Up>(__x));
__v.__end_++;
// Moving existing elements happens here:
__swap_out_circular_buffer(__v);
// When __v goes out of scope, __x will be invalid.
}
The first version is definitely NOT safe:
Operations on iterators obtained by calling a standard library container or string member function may access the underlying container, but shall not modify it. [ Note: In particular, container operations that invalidate iterators conflict with operations on iterators associated with that container. — end note ]
from section 17.6.5.9
Note that this is the section on data races, which people normally think of in conjunction with threading... but the actual definition involves "happens before" relationships, and I don't see any ordering relationship between the multiple side-effects of push_back in play here, namely the reference invalidation seems not to be defined as ordered with respect to copy-constructing the new tail element.
It is completely safe.
In your second example you have
v.reserve(v.size() + 1);
which is not needed because if vector goes out of its size, it will imply the reserve.
Vector is responsible for this stuff, not you.
Both are safe since push_back will copy the value, not the reference. If you are storing pointers, that is still safe as far as the vector is concerned, but just know that you'll have two elements of your vector pointing to the same data.
Section 23.2.1 General Container Requirements
16
a.push_back(t) Appends a copy of t. Requires: T shall be CopyInsertable into X.
a.push_back(rv) Appends a copy of rv. Requires: T shall be MoveInsertable into X.
Implementations of push_back must therefore ensure that a copy of v[0] is inserted. By counter example, assuming an implementation that would reallocate before copying, it would not assuredly append a copy of v[0] and as such violate the specs.
From 23.3.6.5/1: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens, all the iterators and references before the insertion point remain valid.
Since we're inserting at the end, no references will be invalidated if the vector isn't resized. So if the vector's capacity() > size() then it's guaranteed to work, otherwise it's guaranteed to be undefined behavior.

Does incrementing a mutable input iterator invalidate old iterator values?

Iterators that further satisfy the requirements of output iterators are called mutable iterators. Nonmutable iterators are referred to as constant iterators. [24.2.1:4]
This suggests you could have a mutable input iterator, which meets the requirements of both input and output iterators.
After incrementing an input iterator, copies of its old value need not be dereferenceable [24.2.3]. However, the standard does not say the same for output iterators; in fact, the operational semantics for postfix increment are given as { X tmp = r; ++r; return tmp; }, suggesting that output iterators may not invalidate (copies of) old iterator values.
So, can incrementing a mutable input iterator invalidate old iterator copies?
If so, how would you support code like X a(r++); *a = t or X::reference p(*r++); p = t with (e.g.) a proxy object?
If not, then why does boost::iterator claim it needs a proxy object? (Link is code; scroll down to read the comments on structs writable_postfix_increment_proxy and postfix_increment_result). That is, if you can return a (dereferenceable) copy of the old iterator value, why would you need to wrap this copy in a proxy?
The explanation if found in the next section, [24.2.5] Forward iterators, where it is stated how these differ from input and output iterators:
Two dereferenceable iterators a and b of type X offer the multi-pass guarantee if:
— a == b implies ++a == ++b and
— X is a pointer type or the expression (void)++X(a), *a is equivalent to the expression *a.
[ Note: The requirement that a == b implies ++a == ++b (which is not true for input and output iterators) and the removal of the restrictions on the number of the assignments through a mutable iterator (which applies to output iterators) allows the use of multi-pass one-directional algorithms with forward iterators.
—end note ]
Unfortunately, the standard must be read as a whole, and the explanation is not always where you expect it to be.
Input and output iterators are basically designed to allow single-pass traversal: to describe sequences where each element can only be visited once.
Streams are a great example. If you read from stdin or a socket, or write to a file, then there is only the stream's current position. All other iterators pointing to the same underlying sequence are invalidated when you increment an iterator.
Forward iterators allow multi-pass traversal, the additional guarantee you need: they ensure that you can copy your iterator, increment the original, and the copy will still point to the old position, so you can iterate from there.

vector::erase and reverse_iterator

I have a collection of elements in a std::vector that are sorted in a descending order starting from the first element. I have to use a vector because I need to have the elements in a contiguous chunk of memory. And I have a collection holding many instances of vectors with the described characteristics (always sorted in a descending order).
Now, sometimes, when I find out that I have too many elements in the greater collection (the one that holds these vectors), I discard the smallest elements from these vectors some way similar to this pseudo-code:
grand_collection: collection that holds these vectors
T: type argument of my vector
C: the type that is a member of T, that participates in the < comparison (this is what sorts data before they hit any of the vectors).
std::map<C, std::pair<T::const_reverse_iterator, std::vector<T>&>> what_to_delete;
iterate(it = grand_collection.begin() -> grand_collection.end())
{
iterate(vect_rit = it->rbegin() -> it->rend())
{
// ...
what_to_delete <- (vect_rit->C, pair(vect_rit, *it))
if (what_to_delete.size() > threshold)
what_to_delete.erase(what_to_delete.begin());
// ...
}
}
Now, after running this code, in what_to_delete I have a collection of iterators pointing to the original vectors that I want to remove from these vectors (overall smallest values). Remember, the original vectors are sorted before they hit this code, which means that for any what_to_delete[0 - n] there is no way that an iterator on position n - m would point to an element further from the beginning of the same vector than n, where m > 0.
When erasing elements from the original vectors, I have to convert a reverse_iterator to iterator. To do this, I rely on C++11's §24.4.1/1:
The relationship between reverse_iterator and iterator is
&*(reverse_iterator(i)) == &*(i- 1)
Which means that to delete a vect_rit, I use:
vector.erase(--vect_rit.base());
Now, according to C++11 standard §23.3.6.5/3:
iterator erase(const_iterator position); Effects: Invalidates
iterators and references at or after the point of the erase.
How does this work with reverse_iterators? Are reverse_iterators internally implemented with a reference to a vector's real beginning (vector[0]) and transforming that vect_rit to a classic iterator so then erasing would be safe? Or does reverse_iterator use rbegin() (which is vector[vector.size()]) as a reference point and deleting anything that is further from vector's 0-index would still invalidate my reverse iterator?
Edit:
Looks like reverse_iterator uses rbegin() as its reference point. Erasing elements the way I described was giving me errors about a non-deferenceable iterator after the first element was deleted. Whereas when storing classic iterators (converting to const_iterator) while inserting to what_to_delete worked correctly.
Now, for future reference, does The Standard specify what should be treated as a reference point in case of a random-access reverse_iterator? Or this is an implementation detail?
Thanks!
In the question you have already quoted exactly what the standard says a reverse_iterator is:
The relationship between reverse_iterator and iterator is &*(reverse_iterator(i)) == &*(i- 1)
Remember that a reverse_iterator is just an 'adaptor' on top of the underlying iterator (reverse_iterator::current). The 'reference point', as you put it, for a reverse_iterator is that wrapped iterator, current. All operations on the reverse_iterator really occur on that underlying iterator. You can obtain that iterator using the reverse_iterator::base() function.
If you erase --vect_rit.base(), you are in effect erasing --current, so current will be invalidated.
As a side note, the expression --vect_rit.base() might not always compile. If the iterator is actually just a raw pointer (as might be the case for a vector), then vect_rit.base() returns an rvalue (a prvalue in C++11 terms), so the pre-decrement operator won't work on it since that operator needs a modifiable lvalue. See "Item 28: Understand how to use a reverse_iterator's base iterator" in "Effective STL" by Scott Meyers. (an early version of the item can be found online in "Guideline 3" of http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406).
You can use the even uglier expression, (++vect_rit).base(), to avoid that problem. Or since you're dealing with a vector and random access iterators: vect_rit.base() - 1
Either way, vect_rit is invalidated by the erase because vect_rit.current is invalidated.
However, remember that vector::erase() returns a valid iterator to the new location of the element that followed the one that was just erased. You can use that to 're-synchronize' vect_rit:
vect_rit = vector_type::reverse_iterator( vector.erase(vect_rit.base() - 1));
From a standardese point of view (and I'll admit, I'm not an expert on the standard): From §24.5.1.1:
namespace std {
template <class Iterator>
class reverse_iterator ...
{
...
Iterator base() const; // explicit
...
protected:
Iterator current;
...
};
}
And from §24.5.1.3.3:
Iterator base() const; // explicit
Returns: current.
Thus it seems to me that so long as you don't erase anything in the vector before what one of your reverse_iterators points to, said reverse_iterator should remain valid.
Of course, given your description, there is one catch: if you have two contiguous elements in your vector that you end up wanting to delete, the fact that you vector.erase(--vector_rit.base()) means that you've invalidated the reverse_iterator "pointing" to the immediately preceeding element, and so your next vector.erase(...) is undefined behavior.
Just in case that's clear as mud, let me say that differently:
std::vector<T> v=...;
...
// it_1 and it_2 are contiguous
std::vector<T>::reverse_iterator it_1=v.rend();
std::vector<T>::reverse_iterator it_2=it_1;
--it_2;
// Erase everything after it_1's pointee:
// convert from reverse_iterator to iterator
std::vector<T>::iterator tmp_it=it_1.base();
// but that points one too far in, so decrement;
--tmp_it;
// of course, now tmp_it points at it_2's base:
assert(tmp_it == it_2.base());
// perform erasure
v.erase(tmp_it); // invalidates all iterators pointing at or past *tmp_it
// (like, say it_2.base()...)
// now delete it_2's pointee:
std::vector<T>::iterator tmp_it_2=it_2.base(); // note, invalid iterator!
// undefined behavior:
--tmp_it_2;
v.erase(tmp_it_2);
In practice, I suspect that you'll run into two possible implementations: more commonly, the underlying iterator will be little more than a (suitably wrapped) raw pointer, and so everything will work perfectly happily. Less commonly, the iterator might actually try to track invalidations/perform bounds checking (didn't Dinkumware STL do such things when compiled in debug mode at one point?), and just might yell at you.
The reverse_iterator, just like the normal iterator, points to a certain position in the vector. Implementation details are irrelevant, but if you must know, they both are (in a typical implementation) just plain old pointers inside. The difference is the direction. The reverse iterator has its + and - reversed w.r.t. the regular iterator (and also ++ and --, > and < etc).
This is interesting to know, but doesn't really imply an answer to the main question.
If you read the language carefully, it says:
Invalidates iterators and references at or after the point of the erase.
References do not have a built-in sense of direction. Hence, the language clearly refers to the container's own sense of direction. Positions after the point of the erase are those with higher indices. Hence, the iterator's direction is irrelevant here.

Comparing default-constructed iterators with operator==

Does the C++ Standard say I should be able to compare two default-constructed STL iterators for equality? Are default-constructed iterators equality-comparable?
I want the following, using std::list for example:
void foo(const std::list<int>::iterator iter) {
if (iter == std::list<int>::iterator()) {
// Something
}
}
std::list<int>::iterator i;
foo(i);
What I want here is something like a NULL value for iterators, but I'm not sure if it's legal. In the STL implementation included with Visual Studio 2008, they include assertions in std::list's operator==() that preclude this usage. (They check that each iterator is "owned" by the same container and default-constructed iterators have no container.) This would hint that it's not legal, or perhaps that they're being over-zealous.
OK, I'll take a stab. The C++ Standard, Section 24.1/5:
Iterators can also have singular
values that are not associated with
any container. [Example: After the
declaration of an uninitialized
pointer x (as with int* x;), x must
always be assumed to have a singular
value of a pointer. ] Results of most
expressions are undefined for singular
values; the only excep- tion is an
assignment of a non-singular value to
an iterator that holds a singular
value.
So, no, they can't be compared.
This is going to change in C++14. [forward.iterators] 24.2.5p2 of N3936 says
However, value-initialized iterators may be compared and shall compare
equal to other value-initialized iterators of the same type.
I believe you should pass a range to the function.
void fun(std::list<int>::iterator beg, std::list<int>::iterator end)
{
while(beg != end)
{
// do what you want here.
beg++;
}
}
Specification says that the postcondition of default constructor is that iterator is singular. The comparison for equality are undefined, so it may be different in some implementation.

Does pop_back() really invalidate *all* iterators on an std::vector?

std::vector<int> ints;
// ... fill ints with random values
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
{
*it = ints.back();
ints.pop_back();
continue;
}
it++;
}
This code is not working because when pop_back() is called, it is invalidated. But I don't find any doc talking about invalidation of iterators in std::vector::pop_back().
Do you have some links about that?
The call to pop_back() removes the last element in the vector and so the iterator to that element is invalidated. The pop_back() call does not invalidate iterators to items before the last element, only reallocation will do that. From Josuttis' "C++ Standard Library Reference":
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
Here is your answer, directly from The Holy Standard:
23.2.4.2 A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in 23.1) and of a sequence, including most of the optional sequence requirements (23.1.1).
23.1.1.12 Table 68
expressiona.pop_back()
return typevoid
operational semanticsa.erase(--a.end())
containervector, list, deque
Notice that a.pop_back is equivalent to a.erase(--a.end()). Looking at vector's specifics on erase:
23.2.4.3.3 - iterator erase(iterator position) - effects - Invalidates all the iterators and references after the point of the erase
Therefore, once you call pop_back, any iterators to the previously final element (which now no longer exists) are invalidated.
Looking at your code, the problem is that when you remove the final element and the list becomes empty, you still increment it and walk off the end of the list.
(I use the numbering scheme as used in the C++0x working draft, obtainable here
Table 94 at page 732 says that pop_back (if it exists in a sequence container) has the following effect:
{ iterator tmp = a.end();
--tmp;
a.erase(tmp); }
23.1.1, point 12 states that:
Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container
member function or passing a container as an argument to a library function shall not invalidate iterators to, or change
the values of, objects within that container.
Both accessing end() as applying prefix-- have no such effect, erase() however:
23.2.6.4 (concerning vector.erase() point 4):
Effects: Invalidates iterators and references at or after the point of the erase.
So in conclusion: pop_back() will only invalidate an iterator to the last element, per the standard.
Here is a quote from SGI's STL documentation (http://www.sgi.com/tech/stl/Vector.html):
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
I think it follows that pop_back only invalidates the iterator pointing at the last element and the end() iterator. We really need to see the data for which the code fails, as well as the manner in which it fails to decide what's going on. As far as I can tell, the code should work - the usual problem in such code is that removal of element and ++ on iterator happen in the same iteration, the way #mikhaild points out. However, in this code it's not the case: it++ does not happen when pop_back is called.
Something bad may still happen when it is pointing to the last element, and the last element is less than 10. We're now comparing an invalidated it and end(). It may still work, but no guarantees can be made.
Iterators are only invalidated on reallocation of storage. Google is your friend: see footnote 5.
Your code is not working for other reasons.
pop_back() invalidates only iterators that point to the last element. From C++ Standard Library Reference:
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
So to answer your question, no it does not invalidate all iterators.
However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.
If iterators are implemented as pure pointers (as they would in probably all non-debug STL vector cases), your code should just work. If iterators are more than pointers, then your code does not handle this case of removing the last element correctly.
Error is that when "it" points to the last element of vector and if this element is less than 10, this last element is removed. And now "it" points to ints.end(), next "it++" moves pointer to ints.end()+1, so now "it" running away from ints.end(), and you got infinite loop scanning all your memory :).
The "official specification" is the C++ Standard. If you don't have access to a copy of C++03, you can get the latest draft of C++0x from the Committee's website: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2723.pdf
The "Operational Semantics" section of container requirements specifies that pop_back() is equivalent to { iterator i = end(); --i; erase(i); }. the [vector.modifiers] section for erase says "Effects: Invalidates iterators and references at or after the point of the erase."
If you want the intuition argument, pop_back is no-fail (since destruction of value_types in standard containers are not allowed to throw exceptions), so it cannot do any copy or allocation (since they can throw), which means that you can guess that the iterator to the erased element and the end iterator are invalidated, but the remainder are not.
pop_back() will only invalidate it if it was pointing to the last item in the vector. Your code will therefore fail whenever the last int in the vector is less than 10, as follows:
*it = ints.back(); // Set *it to the value it already has
ints.pop_back(); // Invalidate the iterator
continue; // Loop round and access the invalid iterator
You might want to consider using the return value of erase instead of swapping the back element to the deleted position an popping back. For sequences erase returns an iterator pointing the the element one beyond the element being deleted. Note that this method may cause more copying than your original algorithm.
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
it = ints.erase( it );
else
++it;
}
std::remove_if could also be an alternative solution.
struct LessThanTen { bool operator()( int n ) { return n < 10; } };
ints.erase( std::remove_if( ints.begin(), ints.end(), LessThanTen() ), ints.end() );
std::remove_if is (like my first algorithm) stable, so it may not be the most efficient way of doing this, but it is succinct.
Check out the information here (cplusplus.com):
Delete last element
Removes the last element in the vector, effectively reducing the vector size by one and invalidating all iterators and references to it.