Where does the C++ standard declare that the pair of iterators passed to std::vector::insert must not overlap the original sequence?
Edit: To elaborate, I'm pretty sure that the standard does not require the standard library to handle situations like this:
std::vector<int> v(10);
std::vector<int>::iterator first = v.begin() + 5;
std::vector<int>::iterator last = v.begin() + 8;
v.insert(v.begin() + 2, first, last);
However, I was unable to find anything in the standard, that would prohibit the ranges [first, last) and [v.begin(), v.end()) to overlap.
23.1.1/4 Sequence requirements has:
expression: a.insert(p,i,j)
return type: void
precondition: i,j are not iterators into a. inserts copies of elements in[i,j) before p.
So i and j cannot be iterators into your vector.
It makes sense, as during the insert operation, the vector may need to resize itself, and so the existing elements may first be copied to a new memory location (there by invalidating the current iterators).
Consider the behavior if it was allowed. Every insert into the vector would both increase the distance between the start and end iterator by one and move the start iterator up one. Therefore the start iterator would never reach the end iterator and the algorithm would execute until an out of memory exception occurred.
Related
What will happen when erasing a range whose first iterator is after the last iterator?
std::vector<int> v{1,2};
v.erase(
v.begin() + 1,
v.begin()
);
What about other containers?
It's undefined behavior to call erase with an invalid range on any container. In practice, it will generally crash the program if you are lucky, or smash adjacent memory if you're unlucky.
This should be true for just about any API that accepts an iterator range. If a range is invalid, there is no way for the underlying code/algorithm to be aware of what the stopping condition actually is.
Iterator ranges delimit the start and end of a range for any input or algorithm. The end iterator is always used to indicate the completion of that range, and must always be reachable by repeatedly incrementing the first iterator (e.g. by calling operator++).
Most algorithms make use of operator!= to detect the completion of the range, from the LegacyInputIterator requirement. Some ranges may optionally make use of the distance between iterators if that that range is a LegacyRandomAccessIterator.
In either case, this detection requires that the first iterator be before the last, otherwise code like:
for (auto it = first; first != last; ++first) { ... }
will never reach the end of the range, and similarly:
auto distance = last - first;
will return an incorrect value for the distance between iterators.
I'm sure that I'm not alone in expecting that I could add several elements in some order to a vector or list, and then could use an iterator to retrieve those elements in the same order. For example, in:
#include <vector>
#include <cassert>
int main(int argc, char **argv)
{
using namespace std;
vector<int> v;
v.push_back(4);
v.push_back(10);
v.push_back(100);
auto i = v.begin();
assert(*i++ == 4);
assert(*i++ == 10);
assert(*i++ == 100);
return 0;
}
... all assertions should pass and the program should terminate normally (assuming that no std::bad_alloc exception is thrown during construction of the vector or adding the elements to it).
However, I'm having trouble reconciling this with any requirement in the C++ standard (I'm looking at C++11, but would like answers for other standards also if they are markedly different).
The requirement for begin() is just (23.2.1 para 6):
begin() returns an iterator referring to the first element in the container.
What I'm looking for is the requirement, or combination of requirements that in turn logically requires, that if i = v.begin(), then ++i shall refer to the second element in the vector (assuming that such an element exists) - or indeed, even the requirement that successive increments of an iterator will return each of the elements in the vector.
Edit:
A more general question is, what (if any) text in the standard requires that successfully incrementing an iterator obtained by calling begin() on a sequence (ordered or unordered) actually visits every element of the sequence?
There's isn't in the standard something straightforward to state that
if i = v.begin(), then ++i shall refer to the second element in the
vector.
However, for vector's iterators why can imply it from the following wording in the draft standard N4527 24.2.1/p5 In general [iterator.requirements.general]:
Iterators that further satisfy the requirement that, for integral
values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous
iterators.
Now, std::vector's iterator satisfy this requirement, consequently we can imply that ++i is equivalent to i + 1 and thus to addressof(*i) + 1. Which indeed is the second element in the vector due to its contiguous nature.
Edit:
There was indeed a turbidness on the matter about random access iterators and contiguous storage containers in C++11 and C++14 standards. Thus, the commity decided to refine them by putting an extra group of iterators named contiguous iterators. You can find more info in the relative proposal N3884.
It looks to me like we need to put two separate parts of the standard together to get a solid requirement here. We can start with table 101, which requires that a[n] be equivalent to *(a.begin() + n) for sequence containers (specifically, basic_string, array, deqeue and vector) (and the same requirement for a.at(n), for the same containers).
Then we look at table 111 in [random.access.iterators], where it requires that the expression r += n be equivalent to:
{
difference_type m = n;
if (m >= 0)
while (m--)
++r;
else
while (m++)
--r;
return r;
}
[indentation added]
Between the two, these imply that for any n, *(begin() + n) refers to the nth item in the vector. Just in case you want to cover the last base I see open, let's cover the requirement that push_back actually append to the collection. That's also in table 101: a.push_back(t) "Appends a copy of t" (again for basic_string, string, deque, list, and vector).
[C++14: 23.2.3/1]: A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. [..]
I don't know how else you'd interpret it.
The specification isn't just in the iterators. It is also in the specification of the containers, and the operations that modify those containers.
The thing is, you are not going to find a single clause that says "incrementing begin() repeatedly will access all elements of a vector in order". You need to look at the specification of every operation on every container (since these define an order of elements in the container) and the specification of iterators (and operations on them) which is essentially that "incrementing moves to the next element in the order that operations on the container defined, until we pass the end". It is the combination of numerous clauses in the standard that give the end effect.
The general concepts, however, are ....
All containers maintain some range of zero or more elements. That range has three key properties: a beginning (corresponding to the first element in an order that is meaningful to the container), and an end (corresponding to the last element), and an order (which determines the sequence in which elements will be retrieved one after the other - i.e. defines the meaning of "next").
An iterator is an object that either references an element in a range, or has a "past the end" value. An iterator that references an element in the range other than the end (last), when incremented, will reference the next element. An iterator that references the end (last) element in the range, when incremented, will be an end (past the end) iterator.
The begin() method returns an iterator that references (or points to) the first in the range (or an end iterator if the range has zero elements). The end() method returns an end iterator - one that corresponds to "one past the the end of the range". That means, if an iterator is initialised using the begin(), incrementing it repeatedly will move sequentially through the range until the end iterator is reached.
Then it is necessary to look at the specification for the various modifiers of the container - the member functions that add or remove elements. For example, push_back() is specified as adding an element to the end of the existing range for that container. It extends the range by adding an element to the end.
It is that combination of specifications - of iterators and of operations that modify containers - that guarantees the order. The net effect is that, if elements are added to a container in some order, then a iterator initialised using begin() will - when incremented repeatedly - reference the elements in the order in which they were placed in the container.
Obviously, some container modifiers are a bit more complicated - for example, std::vector's insert() is given an iterator, and adds elements there, shuffling subsequent elements to make room. However, the key point is that the modifiers place elements into the container in a defined order (or remove, in the case of operations like std::vector::erase()) and iterators will access elements in that defined order.
I'm curious about the rationale behind the following code. For a given map, I can delete a range up to, but not including, end() (obviously,) using the following code:
map<string, int> myMap;
myMap["one"] = 1;
myMap["two"] = 2;
myMap["three"] = 3;
map<string, int>::iterator it = myMap.find("two");
myMap.erase( it, myMap.end() );
This erases the last two items using the range. However, if I used the single iterator version of erase, I half expected passing myMap.end() to result in no action as the iterator was clearly at the end of the collection. This is as distinct from a corrupt or invalid iterator which would clearly lead to undefined behaviour.
However, when I do this:
myMap.erase( myMap.end() );
I simply get a segmentation fault. I wouldn't have thought it difficult for map to check whether the iterator equalled end() and not take action in that case. Is there some subtle reason for this that I'm missing? I noticed that even this works:
myMap.erase( myMap.end(), myMap.end() );
(i.e. does nothing)
The reason I ask is that I have some code which receives a valid iterator to the collection (but which could be end()) and I wanted to simply pass this into erase rather than having to check first like this:
if ( it != myMap.end() )
myMap.erase( it );
which seems a bit clunky to me. The alternative is to re code so I can use the by-key-type erase overload but I'd rather not re-write too much if I can help it.
The key is that in the standard library ranges determined by two iterators are half-opened ranges. In math notation [a,b) They include the first but not the last iterator (if both are the same, the range is empty). At the same time, end() returns an iterator that is one beyond the last element, which perfectly matches the half-open range notation.
When you use the range version of erase it will never try to delete the element referenced by the last iterator. Consider a modified example:
map<int,int> m;
for (int i = 0; i < 5; ++i)
m[i] = i;
m.erase( m.find(1), m.find(4) );
At the end of the execution the map will hold two keys 0 and 4. Note that the element referred by the second iterator was not erased from the container.
On the other hand, the single iterator operation will erase the element referenced by the iterator. If the code above was changed to:
for (int i = 1; i <= 4; ++i )
m.erase( m.find(i) );
The element with key 4 will be deleted. In your case you will attempt to delete the end iterator that does not refer to a valid object.
I wouldn't have thought it difficult for map to check whether the iterator equalled end() and not take action in that case.
No, it is not hard to do, but the function was designed with a different contract in mind: the caller must pass in an iterator into an element in the container. Part of the reason for this is that in C++ most of the features are designed so that the incur the minimum cost possible, allowing the user to balance the safety/performance on their side. The user can test the iterator before calling erase, but if that test was inside the library then the user would not be able to opt out of testing when she knows that the iterator is valid.
n3337 23.2.4 Table 102
a.erase( q1, q2)
erases all the elements in the range [q1,q2). Returns q2.
So, iterator returning from map::end() is not in range in case of myMap.erase(myMap.end(), myMap.end());
a.erase(q)
erases the element pointed to by q. Returns an iterator pointing to the element immediately following q prior to the element being erased. If no such element exists, returns a.end().
I wouldn't have thought it difficult for map to check whether the
iterator equalled end() and not take action in that case. Is there
some subtle reason for this that I'm missing?
Reason is same, that std::vector::operator[] can don't check, that index is in range, of course.
When you use two iterators to specify a range, the range consists of the elements from the element that the first iterator points to up to but not including the element that the second iterator points to. So erase(it, myMap.end()) says to erase everything from it up to but not including end(). You could equally well pass an iterator that points to a "real" element as the second one, and the element that that iterator points to would not be erased.
When you use erase(it) it says to erase the element that it points to. The end() iterator does not point to a valid element, so erase(end()) doesn't do anything sensible. It would be possible for the library to diagnose this situation, and a debugging library will do that, but it imposes a cost on every call to erase to check what the iterator points to. The standard library doesn't impose that cost on users. You're on your own. <g>
Is there a defined behavior for container.erase(first,last) when first == last in the STL, or is it undefined?
Example:
std::vector<int> v(1,1);
v.erase(v.begin(),v.begin());
std::cout << v.size(); // 1 or 0?
If there is a Standard Library specification document that has this information I would appreciate a reference to it.
The behavior is well defined.
It is a No-op(No-Operation). It does not perform any erase operation on the container as end is same as begin.
The relevant Quote from the Standard are as follows:
C++03 Standard: 24.1 Iterator requirements and
C++11 Standard: 24.2.1 Iterator requirements
Para 6 & 7 for both:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container.
Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges.A range is a pair of iterators that designate the beginning and end of the computation. A range [i, i) is an empty range; in general, a range [i, j) refers to the elements in the data structure starting with the one pointed to by i and up to but not including the one pointed to by j. Range [i, j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
That would erase nothing at all, just like other algorithms that operate on [, ) ranges.
Even if the container is empty I think that would still work because begin() == end().
Conceptually, there is an ordinary loop from begin to end, with a simple loop condition that checks if the iterator is end already, like this:
void erase (iterator from, iterator to) {
...
while (from != to) erase (from++);
...
}
(however, implementations may vary). As you see, if from==to, then there is no single iteration of the loop body.
It is perfectly defined. It removes all elements from first to last, including first and excluding last. If there are no elements in this range (when first == last), then how much are removed? You guessed it, none.
Though I'm not sure what happens if first comes after last, I suppose this will invoke undefined behaviour.
std::vector<int> ints;
// ... fill ints with random values
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
{
*it = ints.back();
ints.pop_back();
continue;
}
it++;
}
This code is not working because when pop_back() is called, it is invalidated. But I don't find any doc talking about invalidation of iterators in std::vector::pop_back().
Do you have some links about that?
The call to pop_back() removes the last element in the vector and so the iterator to that element is invalidated. The pop_back() call does not invalidate iterators to items before the last element, only reallocation will do that. From Josuttis' "C++ Standard Library Reference":
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
Here is your answer, directly from The Holy Standard:
23.2.4.2 A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in 23.1) and of a sequence, including most of the optional sequence requirements (23.1.1).
23.1.1.12 Table 68
expressiona.pop_back()
return typevoid
operational semanticsa.erase(--a.end())
containervector, list, deque
Notice that a.pop_back is equivalent to a.erase(--a.end()). Looking at vector's specifics on erase:
23.2.4.3.3 - iterator erase(iterator position) - effects - Invalidates all the iterators and references after the point of the erase
Therefore, once you call pop_back, any iterators to the previously final element (which now no longer exists) are invalidated.
Looking at your code, the problem is that when you remove the final element and the list becomes empty, you still increment it and walk off the end of the list.
(I use the numbering scheme as used in the C++0x working draft, obtainable here
Table 94 at page 732 says that pop_back (if it exists in a sequence container) has the following effect:
{ iterator tmp = a.end();
--tmp;
a.erase(tmp); }
23.1.1, point 12 states that:
Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container
member function or passing a container as an argument to a library function shall not invalidate iterators to, or change
the values of, objects within that container.
Both accessing end() as applying prefix-- have no such effect, erase() however:
23.2.6.4 (concerning vector.erase() point 4):
Effects: Invalidates iterators and references at or after the point of the erase.
So in conclusion: pop_back() will only invalidate an iterator to the last element, per the standard.
Here is a quote from SGI's STL documentation (http://www.sgi.com/tech/stl/Vector.html):
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
I think it follows that pop_back only invalidates the iterator pointing at the last element and the end() iterator. We really need to see the data for which the code fails, as well as the manner in which it fails to decide what's going on. As far as I can tell, the code should work - the usual problem in such code is that removal of element and ++ on iterator happen in the same iteration, the way #mikhaild points out. However, in this code it's not the case: it++ does not happen when pop_back is called.
Something bad may still happen when it is pointing to the last element, and the last element is less than 10. We're now comparing an invalidated it and end(). It may still work, but no guarantees can be made.
Iterators are only invalidated on reallocation of storage. Google is your friend: see footnote 5.
Your code is not working for other reasons.
pop_back() invalidates only iterators that point to the last element. From C++ Standard Library Reference:
Inserting or removing elements
invalidates references, pointers, and
iterators that refer to the following
element. If an insertion causes
reallocation, it invalidates all
references, iterators, and pointers.
So to answer your question, no it does not invalidate all iterators.
However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.
If iterators are implemented as pure pointers (as they would in probably all non-debug STL vector cases), your code should just work. If iterators are more than pointers, then your code does not handle this case of removing the last element correctly.
Error is that when "it" points to the last element of vector and if this element is less than 10, this last element is removed. And now "it" points to ints.end(), next "it++" moves pointer to ints.end()+1, so now "it" running away from ints.end(), and you got infinite loop scanning all your memory :).
The "official specification" is the C++ Standard. If you don't have access to a copy of C++03, you can get the latest draft of C++0x from the Committee's website: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2723.pdf
The "Operational Semantics" section of container requirements specifies that pop_back() is equivalent to { iterator i = end(); --i; erase(i); }. the [vector.modifiers] section for erase says "Effects: Invalidates iterators and references at or after the point of the erase."
If you want the intuition argument, pop_back is no-fail (since destruction of value_types in standard containers are not allowed to throw exceptions), so it cannot do any copy or allocation (since they can throw), which means that you can guess that the iterator to the erased element and the end iterator are invalidated, but the remainder are not.
pop_back() will only invalidate it if it was pointing to the last item in the vector. Your code will therefore fail whenever the last int in the vector is less than 10, as follows:
*it = ints.back(); // Set *it to the value it already has
ints.pop_back(); // Invalidate the iterator
continue; // Loop round and access the invalid iterator
You might want to consider using the return value of erase instead of swapping the back element to the deleted position an popping back. For sequences erase returns an iterator pointing the the element one beyond the element being deleted. Note that this method may cause more copying than your original algorithm.
for(std::vector<int>::iterator it = ints.begin(); it != ints.end(); )
{
if(*it < 10)
it = ints.erase( it );
else
++it;
}
std::remove_if could also be an alternative solution.
struct LessThanTen { bool operator()( int n ) { return n < 10; } };
ints.erase( std::remove_if( ints.begin(), ints.end(), LessThanTen() ), ints.end() );
std::remove_if is (like my first algorithm) stable, so it may not be the most efficient way of doing this, but it is succinct.
Check out the information here (cplusplus.com):
Delete last element
Removes the last element in the vector, effectively reducing the vector size by one and invalidating all iterators and references to it.