STL list::splice - iterator validity

STL list::splice - iterator validity - c++

I'm reading how "list::splice" works and I don't understand something:
mylist1.splice (it, mylist2); // mylist1: 1 10 20 30 2 3 4
// mylist2 (empty)
// "it" still points to 2 (the 5th element)
mylist2.splice (mylist2.begin(),mylist1, it);
// mylist1: 1 10 20 30 3 4
// mylist2: 2
// "it" is now invalid.
it = mylist1.begin();
std::advance(it,3); // "it" points now to 30
mylist1.splice ( mylist1.begin(), mylist1, it, mylist1.end());
// mylist1: 30 3 4 1 10 20
in the first and third splice the it iterator is still valid, but why isn't it in the second splice?
According to the documentation:
Iterator validity
No changes on the iterators, pointers and references
related to the container before the call. The iterators, pointers and
references that referred to transferred elements keep referring to
those same elements, but iterators now iterate into the container the
elements have been transferred to.
thus it should still be valid

It's only a guess, but they might have written that to imply that it is now "invalid" in the sense that it is no longer a valid iterator of mylist1, but instead becomes a valid iterator of mylist2.
But still, and I guess you already knew that, it is a valid iterator, so the wording is misleading. You need to be careful, though, as it means that after the second splice-operation, for example, you can no longer do:
std::distance( mylist1.begin(), it );
but need to use
std::distance( mylist2.begin(), it );
as the first would be illegal.
The standard clearly defines it that way in:
23.3.5.5 list operations [list.ops]
void splice(const_iterator position, list& x, const_iterator i);
void splice(const_iterator position, list&& x, const_iterator i);
7 Effects: Inserts an element pointed to by i from list x before position and removes the element from x. The result is unchanged if position == i or position == ++i. Pointers and references to *i continue to refer to this same element but as a member of *this. Iterators to *i (including i itself) continue to refer to the same element, but now behave as iterators into *this, not into x.
So, if your compiler/STL invalidates the iterator, this is clearly a bug.

Apparently (since I'm using MSVC2012) the behavior is different:
http://msdn.microsoft.com/en-us/library/72fb8wzd.aspx
In all cases, only iterators or references that point at spliced
elements become invalid.
Thus when I have iterators to elements that get moved from one container to another, these iterators become invalid.
I'd be interested in knowing if this behavior is the standard one, though.

Related

Guarantees of reordering of a vector

Say I have this code:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> vec {10, 15, 20};
auto itr = vec.begin();
vec.erase(itr);
for(const auto& element : vec)
{
std::cout << element << " ";
}
return 0;
}
This gives me 15 20 as expected. Now, cppreference says this about erase():
Invalidates iterators and references at or after the point of the
erase, including the end() iterator
Fair enough, but is that the only guarantee the standard gives about vector::erase()?
Is a vector allowed to reorder it's element after the erased iterator?
For example, are these conditions guaranteed to hold after the erase which would mean all elements after the erase() iterator shifted 1 to the left:
vec[0] == 15
vec[1] == 20
Or are implementations allowed to move values around as they see fit, and thus, create scenarios where vec[0] == 20 etc?
I would like a quote of the relevant part of the standard.

Let's start at the beginning:
23.2.3 Sequence containers
A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement.
The library provides four basic kinds of sequence containers: vector,
forward_list, list, and deque.
Emphasis on "a strictly linear arrangement". This is unambiguous.
This definition is followed by a table called "sequence container requirements", which describes erase() thusly:
a.erase(q) [ ... ]
Effects: Erases the element pointed to by q
Combined, this leaves no wiggle room for interpretation. The elements in a vector are always in "a strict linear arrangement", so when one of them is erase()d, there's only one possible outcome.

Technically, no, the standard doesn't write out a promise that it won't re-order elements when you least expect.
Practically, obviously it's not going to do that. That would be ridiculous.
Legally, you can probably take the "Effects" clause:
Erases the element pointed to by q
as having no other effects unless stated elsewhere (e.g. iterator invalidation, which follows from the erasure effect).

The two statements I found that I think guarantee it would be:
C++11 Standard
23.2.1
11 Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.
If you can't "change the values of" then you can't arbitrarily re-order elements (like swapping the end value with the erased one).
23.2.3
12 The iterator returned from a.erase(q) points to the element immediately following q prior to the element being erased. If no such element exists, a.end() is returned.
This implies that the conceptual erasure of an element is implemented by closing the physical gap from the right. Given the previous rule, conceptually closing the gap, can not be seen as conceptually changing their values. This means the only implementation would be to shift the values in order.
By means of explanation.
The Standard is dealing with the abstract concept not the actual implementation, although its statements do impact the implementation.
Conceptually erasing an element simply removes it and nothing more. So given the sequence:
3 5 7 4 2 9 (6 values)
If we erase the 3rd element what does that conceptually give us?
3 5 4 2 9 (5 values)
This must be true because of the first statement above:
23.2.1
11 Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.
If the implementation reordered the elements, say by swapping the erased element with the end element that rule would be broken because we would end up with this:
3 5 9 4 2
Conceptually the resulting value to the right of the erased element has changed from a 4 to a 9, thus breaking the rule.

What effect does "insert" have on references to deques?

I'm using a deque in one of my C++ programs, and was reading the documentation for insert on cppreference.com. Most of it made sense except this bit:
All iterators, including the past-the-end iterator, are invalidated.
References are invalidated too, unless pos == begin() or pos == end(),
in which case they are not invalidated.
What does this mean? Is this saying that references to the deque itself are invalidated, or references to its elements, or references to the iterators? Or something else entirely?
Here's a link the doc in question: http://en.cppreference.com/w/cpp/container/deque/insert

A deque is an object. It is a container, therefore it contains other objects within it's internal storage. Those are elements stored in a deque.
You can access those elements. Accessing an element is basically getting a reference back from the container. If you check it, all of the methods under element access section return a reference type.
You can make a copy of accessed element, but you can store the reference itself. T foo = d.front(); vs T& bar = d.front();. (Let d be some std::deque<T>)
A reference to a deque would be auto& ref_d = &d;. This is something else.
So:
1. What effect does “insert” have on references to deques?
None. references to d are fine.
2. What does this mean?
A deque is designed in such a way that inserting at the beginning or at the end of it does not invalidate the references to the elements which you might have already stored. Though if you insert in the middle, the elements might move in memory. Note that the bar is not touched. Precisely because it cannot be, it gets invalidated. Previously obtained reference (or iterator) doesn't point to anything meaningful anymore, thus dereferencing it is illegal.
3. Is this saying that references to the deque itself are invalidated?
Nope, as in 1.
4. or references to its elements [are invalidated]?
Yes, as in 2.
5. or references to the iterators [are invalidated]?
You again seem to confuse what is what. A reference to an iterator would be std::deque<T>::iterator& iter_ref;, if you obtain an iterator from a deque. E.g. auto iter = d.begin(); and make a reference to it iter_ref = &iter;, an insert doesn't make *iter_ref illegal, it invalidates the iterator, so *iter is illegal (or **ref_iter).
Note: I am not saying that something like std::deque<T>& ref_d or std::deque<T>::iterator& iter_ref make sense, but this is semantical meaning of "reference to a deque" and "reference to an interator".

Erase by iterator on a C++ STL map

I'm curious about the rationale behind the following code. For a given map, I can delete a range up to, but not including, end() (obviously,) using the following code:
map<string, int> myMap;
myMap["one"] = 1;
myMap["two"] = 2;
myMap["three"] = 3;
map<string, int>::iterator it = myMap.find("two");
myMap.erase( it, myMap.end() );
This erases the last two items using the range. However, if I used the single iterator version of erase, I half expected passing myMap.end() to result in no action as the iterator was clearly at the end of the collection. This is as distinct from a corrupt or invalid iterator which would clearly lead to undefined behaviour.
However, when I do this:
myMap.erase( myMap.end() );
I simply get a segmentation fault. I wouldn't have thought it difficult for map to check whether the iterator equalled end() and not take action in that case. Is there some subtle reason for this that I'm missing? I noticed that even this works:
myMap.erase( myMap.end(), myMap.end() );
(i.e. does nothing)
The reason I ask is that I have some code which receives a valid iterator to the collection (but which could be end()) and I wanted to simply pass this into erase rather than having to check first like this:
if ( it != myMap.end() )
myMap.erase( it );
which seems a bit clunky to me. The alternative is to re code so I can use the by-key-type erase overload but I'd rather not re-write too much if I can help it.

The key is that in the standard library ranges determined by two iterators are half-opened ranges. In math notation [a,b) They include the first but not the last iterator (if both are the same, the range is empty). At the same time, end() returns an iterator that is one beyond the last element, which perfectly matches the half-open range notation.
When you use the range version of erase it will never try to delete the element referenced by the last iterator. Consider a modified example:
map<int,int> m;
for (int i = 0; i < 5; ++i)
m[i] = i;
m.erase( m.find(1), m.find(4) );
At the end of the execution the map will hold two keys 0 and 4. Note that the element referred by the second iterator was not erased from the container.
On the other hand, the single iterator operation will erase the element referenced by the iterator. If the code above was changed to:
for (int i = 1; i <= 4; ++i )
m.erase( m.find(i) );
The element with key 4 will be deleted. In your case you will attempt to delete the end iterator that does not refer to a valid object.
I wouldn't have thought it difficult for map to check whether the iterator equalled end() and not take action in that case.
No, it is not hard to do, but the function was designed with a different contract in mind: the caller must pass in an iterator into an element in the container. Part of the reason for this is that in C++ most of the features are designed so that the incur the minimum cost possible, allowing the user to balance the safety/performance on their side. The user can test the iterator before calling erase, but if that test was inside the library then the user would not be able to opt out of testing when she knows that the iterator is valid.

n3337 23.2.4 Table 102
a.erase( q1, q2)
erases all the elements in the range [q1,q2). Returns q2.
So, iterator returning from map::end() is not in range in case of myMap.erase(myMap.end(), myMap.end());
a.erase(q)
erases the element pointed to by q. Returns an iterator pointing to the element immediately following q prior to the element being erased. If no such element exists, returns a.end().
I wouldn't have thought it difficult for map to check whether the
iterator equalled end() and not take action in that case. Is there
some subtle reason for this that I'm missing?
Reason is same, that std::vector::operator[] can don't check, that index is in range, of course.

When you use two iterators to specify a range, the range consists of the elements from the element that the first iterator points to up to but not including the element that the second iterator points to. So erase(it, myMap.end()) says to erase everything from it up to but not including end(). You could equally well pass an iterator that points to a "real" element as the second one, and the element that that iterator points to would not be erased.
When you use erase(it) it says to erase the element that it points to. The end() iterator does not point to a valid element, so erase(end()) doesn't do anything sensible. It would be possible for the library to diagnose this situation, and a debugging library will do that, but it imposes a cost on every call to erase to check what the iterator points to. The standard library doesn't impose that cost on users. You're on your own. <g>

vector::erase and reverse_iterator

I have a collection of elements in a std::vector that are sorted in a descending order starting from the first element. I have to use a vector because I need to have the elements in a contiguous chunk of memory. And I have a collection holding many instances of vectors with the described characteristics (always sorted in a descending order).
Now, sometimes, when I find out that I have too many elements in the greater collection (the one that holds these vectors), I discard the smallest elements from these vectors some way similar to this pseudo-code:
grand_collection: collection that holds these vectors
T: type argument of my vector
C: the type that is a member of T, that participates in the < comparison (this is what sorts data before they hit any of the vectors).
std::map<C, std::pair<T::const_reverse_iterator, std::vector<T>&>> what_to_delete;
iterate(it = grand_collection.begin() -> grand_collection.end())
{
iterate(vect_rit = it->rbegin() -> it->rend())
{
// ...
what_to_delete <- (vect_rit->C, pair(vect_rit, *it))
if (what_to_delete.size() > threshold)
what_to_delete.erase(what_to_delete.begin());
// ...
}
}
Now, after running this code, in what_to_delete I have a collection of iterators pointing to the original vectors that I want to remove from these vectors (overall smallest values). Remember, the original vectors are sorted before they hit this code, which means that for any what_to_delete[0 - n] there is no way that an iterator on position n - m would point to an element further from the beginning of the same vector than n, where m > 0.
When erasing elements from the original vectors, I have to convert a reverse_iterator to iterator. To do this, I rely on C++11's §24.4.1/1:
The relationship between reverse_iterator and iterator is
&*(reverse_iterator(i)) == &*(i- 1)
Which means that to delete a vect_rit, I use:
vector.erase(--vect_rit.base());
Now, according to C++11 standard §23.3.6.5/3:
iterator erase(const_iterator position); Effects: Invalidates
iterators and references at or after the point of the erase.
How does this work with reverse_iterators? Are reverse_iterators internally implemented with a reference to a vector's real beginning (vector[0]) and transforming that vect_rit to a classic iterator so then erasing would be safe? Or does reverse_iterator use rbegin() (which is vector[vector.size()]) as a reference point and deleting anything that is further from vector's 0-index would still invalidate my reverse iterator?
Edit:
Looks like reverse_iterator uses rbegin() as its reference point. Erasing elements the way I described was giving me errors about a non-deferenceable iterator after the first element was deleted. Whereas when storing classic iterators (converting to const_iterator) while inserting to what_to_delete worked correctly.
Now, for future reference, does The Standard specify what should be treated as a reference point in case of a random-access reverse_iterator? Or this is an implementation detail?
Thanks!

In the question you have already quoted exactly what the standard says a reverse_iterator is:
The relationship between reverse_iterator and iterator is &*(reverse_iterator(i)) == &*(i- 1)
Remember that a reverse_iterator is just an 'adaptor' on top of the underlying iterator (reverse_iterator::current). The 'reference point', as you put it, for a reverse_iterator is that wrapped iterator, current. All operations on the reverse_iterator really occur on that underlying iterator. You can obtain that iterator using the reverse_iterator::base() function.
If you erase --vect_rit.base(), you are in effect erasing --current, so current will be invalidated.
As a side note, the expression --vect_rit.base() might not always compile. If the iterator is actually just a raw pointer (as might be the case for a vector), then vect_rit.base() returns an rvalue (a prvalue in C++11 terms), so the pre-decrement operator won't work on it since that operator needs a modifiable lvalue. See "Item 28: Understand how to use a reverse_iterator's base iterator" in "Effective STL" by Scott Meyers. (an early version of the item can be found online in "Guideline 3" of http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406).
You can use the even uglier expression, (++vect_rit).base(), to avoid that problem. Or since you're dealing with a vector and random access iterators: vect_rit.base() - 1
Either way, vect_rit is invalidated by the erase because vect_rit.current is invalidated.
However, remember that vector::erase() returns a valid iterator to the new location of the element that followed the one that was just erased. You can use that to 're-synchronize' vect_rit:
vect_rit = vector_type::reverse_iterator( vector.erase(vect_rit.base() - 1));

From a standardese point of view (and I'll admit, I'm not an expert on the standard): From §24.5.1.1:
namespace std {
template <class Iterator>
class reverse_iterator ...
{
...
Iterator base() const; // explicit
...
protected:
Iterator current;
...
};
}
And from §24.5.1.3.3:
Iterator base() const; // explicit
Returns: current.
Thus it seems to me that so long as you don't erase anything in the vector before what one of your reverse_iterators points to, said reverse_iterator should remain valid.
Of course, given your description, there is one catch: if you have two contiguous elements in your vector that you end up wanting to delete, the fact that you vector.erase(--vector_rit.base()) means that you've invalidated the reverse_iterator "pointing" to the immediately preceeding element, and so your next vector.erase(...) is undefined behavior.
Just in case that's clear as mud, let me say that differently:
std::vector<T> v=...;
...
// it_1 and it_2 are contiguous
std::vector<T>::reverse_iterator it_1=v.rend();
std::vector<T>::reverse_iterator it_2=it_1;
--it_2;
// Erase everything after it_1's pointee:
// convert from reverse_iterator to iterator
std::vector<T>::iterator tmp_it=it_1.base();
// but that points one too far in, so decrement;
--tmp_it;
// of course, now tmp_it points at it_2's base:
assert(tmp_it == it_2.base());
// perform erasure
v.erase(tmp_it); // invalidates all iterators pointing at or past *tmp_it
// (like, say it_2.base()...)
// now delete it_2's pointee:
std::vector<T>::iterator tmp_it_2=it_2.base(); // note, invalid iterator!
// undefined behavior:
--tmp_it_2;
v.erase(tmp_it_2);
In practice, I suspect that you'll run into two possible implementations: more commonly, the underlying iterator will be little more than a (suitably wrapped) raw pointer, and so everything will work perfectly happily. Less commonly, the iterator might actually try to track invalidations/perform bounds checking (didn't Dinkumware STL do such things when compiled in debug mode at one point?), and just might yell at you.

The reverse_iterator, just like the normal iterator, points to a certain position in the vector. Implementation details are irrelevant, but if you must know, they both are (in a typical implementation) just plain old pointers inside. The difference is the direction. The reverse iterator has its + and - reversed w.r.t. the regular iterator (and also ++ and --, > and < etc).
This is interesting to know, but doesn't really imply an answer to the main question.
If you read the language carefully, it says:
Invalidates iterators and references at or after the point of the erase.
References do not have a built-in sense of direction. Hence, the language clearly refers to the container's own sense of direction. Positions after the point of the erase are those with higher indices. Hence, the iterator's direction is irrelevant here.

Does pop_back() really invalidate all iterators on an std::vector?

std::vector<int> ints;
// ... fill ints with random values
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
{
*it = ints.back();
ints.pop_back();
continue;
}
it++;
}
This code is not working because when pop_back() is called, it is invalidated. But I don't find any doc talking about invalidation of iterators in std::vector::pop_back().
Do you have some links about that?

The call to pop_back() removes the last element in the vector and so the iterator to that element is invalidated. The pop_back() call does not invalidate iterators to items before the last element, only reallocation will do that. From Josuttis' "C++ Standard Library Reference":
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.

Here is your answer, directly from The Holy Standard:
23.2.4.2 A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in 23.1) and of a sequence, including most of the optional sequence requirements (23.1.1).
23.1.1.12 Table 68
expressiona.pop_back()
return typevoid
operational semanticsa.erase(--a.end())
containervector, list, deque
Notice that a.pop_back is equivalent to a.erase(--a.end()). Looking at vector's specifics on erase:
23.2.4.3.3 - iterator erase(iterator position) - effects - Invalidates all the iterators and references after the point of the erase
Therefore, once you call pop_back, any iterators to the previously final element (which now no longer exists) are invalidated.
Looking at your code, the problem is that when you remove the final element and the list becomes empty, you still increment it and walk off the end of the list.

(I use the numbering scheme as used in the C++0x working draft, obtainable here
Table 94 at page 732 says that pop_back (if it exists in a sequence container) has the following effect:
{ iterator tmp = a.end();
--tmp;
a.erase(tmp); }
23.1.1, point 12 states that:
Unless otherwise speciﬁed (either explicitly or by deﬁning a function in terms of other functions), invoking a container
member function or passing a container as an argument to a library function shall not invalidate iterators to, or change
the values of, objects within that container.
Both accessing end() as applying prefix-- have no such effect, erase() however:
23.2.6.4 (concerning vector.erase() point 4):
Effects: Invalidates iterators and references at or after the point of the erase.
So in conclusion: pop_back() will only invalidate an iterator to the last element, per the standard.

Here is a quote from SGI's STL documentation (http://www.sgi.com/tech/stl/Vector.html):
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
I think it follows that pop_back only invalidates the iterator pointing at the last element and the end() iterator. We really need to see the data for which the code fails, as well as the manner in which it fails to decide what's going on. As far as I can tell, the code should work - the usual problem in such code is that removal of element and ++ on iterator happen in the same iteration, the way #mikhaild points out. However, in this code it's not the case: it++ does not happen when pop_back is called.
Something bad may still happen when it is pointing to the last element, and the last element is less than 10. We're now comparing an invalidated it and end(). It may still work, but no guarantees can be made.

Iterators are only invalidated on reallocation of storage. Google is your friend: see footnote 5.
Your code is not working for other reasons.

pop_back() invalidates only iterators that point to the last element. From C++ Standard Library Reference:
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
So to answer your question, no it does not invalidate all iterators.
However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.
If iterators are implemented as pure pointers (as they would in probably all non-debug STL vector cases), your code should just work. If iterators are more than pointers, then your code does not handle this case of removing the last element correctly.

Error is that when "it" points to the last element of vector and if this element is less than 10, this last element is removed. And now "it" points to ints.end(), next "it++" moves pointer to ints.end()+1, so now "it" running away from ints.end(), and you got infinite loop scanning all your memory :).

The "official specification" is the C++ Standard. If you don't have access to a copy of C++03, you can get the latest draft of C++0x from the Committee's website: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2723.pdf
The "Operational Semantics" section of container requirements specifies that pop_back() is equivalent to { iterator i = end(); --i; erase(i); }. the [vector.modifiers] section for erase says "Effects: Invalidates iterators and references at or after the point of the erase."
If you want the intuition argument, pop_back is no-fail (since destruction of value_types in standard containers are not allowed to throw exceptions), so it cannot do any copy or allocation (since they can throw), which means that you can guess that the iterator to the erased element and the end iterator are invalidated, but the remainder are not.

pop_back() will only invalidate it if it was pointing to the last item in the vector. Your code will therefore fail whenever the last int in the vector is less than 10, as follows:
*it = ints.back(); // Set *it to the value it already has
ints.pop_back(); // Invalidate the iterator
continue; // Loop round and access the invalid iterator

You might want to consider using the return value of erase instead of swapping the back element to the deleted position an popping back. For sequences erase returns an iterator pointing the the element one beyond the element being deleted. Note that this method may cause more copying than your original algorithm.
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
it = ints.erase( it );
else
++it;
}
std::remove_if could also be an alternative solution.
struct LessThanTen { bool operator()( int n ) { return n < 10; } };
ints.erase( std::remove_if( ints.begin(), ints.end(), LessThanTen() ), ints.end() );
std::remove_if is (like my first algorithm) stable, so it may not be the most efficient way of doing this, but it is succinct.

Check out the information here (cplusplus.com):
Delete last element
Removes the last element in the vector, effectively reducing the vector size by one and invalidating all iterators and references to it.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

STL list::splice - iterator validity - c++

Related

Guarantees of reordering of a vector

What effect does "insert" have on references to deques?

Erase by iterator on a C++ STL map

vector::erase and reverse_iterator

Does pop_back() really invalidate all iterators on an std::vector?

Categories

Resources

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

STL list::splice - iterator validity - c++

Related

Guarantees of reordering of a vector

What effect does "insert" have on references to deques?

Erase by iterator on a C++ STL map

vector::erase and reverse_iterator

Does pop_back() really invalidate *all* iterators on an std::vector?

Categories

Resources

Does pop_back() really invalidate all iterators on an std::vector?