Is there a defined behavior for container.erase(first,last) when first == last in the STL, or is it undefined?
Example:
std::vector<int> v(1,1);
v.erase(v.begin(),v.begin());
std::cout << v.size(); // 1 or 0?
If there is a Standard Library specification document that has this information I would appreciate a reference to it.
The behavior is well defined.
It is a no-op (no operation). Nothing is erased from the container, because the end of the range is the same as its beginning.
The relevant quotes from the Standard are as follows:
C++03 Standard: 24.1 Iterator requirements and
C++11 Standard: 24.2.1 Iterator requirements
Para 6 & 7 for both:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container.
Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges. A range is a pair of iterators that designate the beginning and end of the computation. A range [i, i) is an empty range; in general, a range [i, j) refers to the elements in the data structure starting with the one pointed to by i and up to but not including the one pointed to by j. Range [i, j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
That would erase nothing at all, just like other algorithms that operate on [, ) ranges.
Even if the container is empty I think that would still work because begin() == end().
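A minimal sketch of the empty-range case described above, covering both a one-element vector and an empty one. The helper name size_after_empty_erase is made up for illustration and is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: erases the empty range [begin, begin) from a copy
// of the vector and reports the resulting size.
std::size_t size_after_empty_erase(std::vector<int> v) {
    const std::size_t before = v.size();
    v.erase(v.begin(), v.begin());  // [i, i) is a valid, empty range
    assert(v.size() == before);     // nothing was removed
    return v.size();
}
```

Calling it with std::vector<int>(1, 1) should give 1, and calling it with an empty vector should give 0, since begin() == end() there and the range is still empty.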
Conceptually, there is an ordinary loop from begin to end, with a simple loop condition that checks if the iterator is end already, like this:
void erase (iterator from, iterator to) {
...
while (from != to) erase (from++);
...
}
(however, implementations may vary). As you can see, if from == to, the loop body is never executed, not even once.
It is perfectly defined. It removes all elements from first to last, including first and excluding last. If there are no elements in this range (when first == last), then how many are removed? You guessed it, none.
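The conceptual loop above can be sketched as a free function over std::list<int>. This is only an illustration of the idea, not how any particular library implements range erase; erase_range is a hypothetical name, and it reassigns the iterator from the return value of erase rather than using from++.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Illustrative stand-in for a range erase on std::list<int>.
// Returns how many times the loop body ran.
std::size_t erase_range(std::list<int>& c,
                        std::list<int>::iterator from,
                        std::list<int>::iterator to) {
    const std::size_t before = c.size();
    std::size_t iterations = 0;
    while (from != to) {
        from = c.erase(from);  // erase returns the iterator following the removed element
        ++iterations;
    }
    assert(c.size() + iterations == before);  // one element removed per iteration
    return iterations;
}
```

With from == to the function returns 0 and the list is left untouched.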
Though I'm not sure what happens if first comes after last, I suppose this will invoke undefined behaviour.
I'm used to writing loops like this:
for (std::size_t index = 0; index < foo.size(); index++)
{
// Do stuff with foo[index].
}
But when I see iterator loops in others' code, they look like this:
for (Foo::Iterator iterator = foo.begin(); iterator != foo.end(); iterator++)
{
// Do stuff with *iterator.
}
I find the iterator != foo.end() to be offputting. It can also be dangerous if iterator is incremented by more than one.
It seems more "correct" to use iterator < foo.end(), but I never see that in real code. Why not?
All iterators are equality comparable. Only random access iterators are relationally comparable. Input iterators, forward iterators, and bidirectional iterators are not relationally comparable.
Thus, the comparison using != is more generic and flexible than the comparison using <.
There are different categories of iterators because not all ranges of elements have the same access properties. For example,
if you have iterators into an array (a contiguous sequence of elements), it's trivial to relationally compare them; you just have to compare the indices of the pointed-to elements (or the pointers to them, since the iterators likely just contain pointers to the elements);
if you have iterators into a linked list and you want to test whether one iterator is "less than" another iterator, you have to walk the nodes of the linked list from the one iterator until either you reach the other iterator or you reach the end of the list.
The rule is that all operations on an iterator should have constant time complexity (or, at a minimum, sublinear time complexity). You can always perform an equality comparison in constant time since you just have to compare whether the iterators point to the same object. So, all iterators are equality comparable.
Further, you aren't allowed to increment an iterator past the end of the range into which it points. So, if you end up in a scenario where it != foo.end() does not do the same thing as it < foo.end(), you already have undefined behavior because you've iterated past the end of the range.
The same is true for pointers into an array: you aren't allowed to increment a pointer beyond one-past-the-end of the array; a program that does so exhibits undefined behavior. (The same is obviously not true for indices, since indices are just integers.)
Some Standard Library implementations (like the Visual C++ Standard Library implementation) have helpful debug code that will raise an assertion when you do something illegal with an iterator like this.
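To illustrate why != is the more generic comparison, here is a small sketch in which one function template walks both a std::vector and a std::list. The names sum_all and sum_vector_and_list are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Sums any container of int by walking [begin, end) with !=.
template <typename Container>
int sum_all(const Container& c) {
    int total = 0;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it) {
        total += *it;
    }
    return total;
}

// The same template serves a random access container and a bidirectional one.
int sum_vector_and_list() {
    std::vector<int> v(3, 2);  // 2, 2, 2
    std::list<int> l(2, 5);    // 5, 5
    assert(sum_all(v) == 6);
    return sum_all(v) + sum_all(l);
}
```

If the loop condition were written as it < c.end(), the std::list instantiation would not compile, because list iterators are only bidirectional and provide no operator<.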
Short answer: Because Iterator is not a number, it's an object.
Longer answer: There are more collections than linear arrays. Trees and hashes, for example, don't really lend themselves to "this index is before this other index". For a tree, two indices that live on separate branches, for example. Or, any two indices in a hash -- they have no order at all, so any order you impose on them is arbitrary.
You don't have to worry about "missing" End(). It is also not a number, it is an object that represents the end of the collection. It doesn't make sense to have an iterator that goes past it, and indeed it cannot.
I'm sure that I'm not alone in expecting that I could add several elements in some order to a vector or list, and then could use an iterator to retrieve those elements in the same order. For example, in:
#include <vector>
#include <cassert>
int main(int argc, char **argv)
{
using namespace std;
vector<int> v;
v.push_back(4);
v.push_back(10);
v.push_back(100);
auto i = v.begin();
assert(*i++ == 4);
assert(*i++ == 10);
assert(*i++ == 100);
return 0;
}
... all assertions should pass and the program should terminate normally (assuming that no std::bad_alloc exception is thrown during construction of the vector or adding the elements to it).
However, I'm having trouble reconciling this with any requirement in the C++ standard (I'm looking at C++11, but would like answers for other standards also if they are markedly different).
The requirement for begin() is just (23.2.1 para 6):
begin() returns an iterator referring to the first element in the container.
What I'm looking for is the requirement, or combination of requirements that in turn logically requires, that if i = v.begin(), then ++i shall refer to the second element in the vector (assuming that such an element exists) - or indeed, even the requirement that successive increments of an iterator will return each of the elements in the vector.
Edit:
A more general question is, what (if any) text in the standard requires that successfully incrementing an iterator obtained by calling begin() on a sequence (ordered or unordered) actually visits every element of the sequence?
There isn't anything in the standard that straightforwardly states that
if i = v.begin(), then ++i shall refer to the second element in the
vector.
However, for vector's iterators we can infer it from the following wording in the draft standard N4527 24.2.1/p5 In general [iterator.requirements.general]:
Iterators that further satisfy the requirement that, for integral
values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous
iterators.
Now, std::vector's iterators satisfy this requirement; consequently we can infer that ++i is equivalent to i + 1 and thus to addressof(*i) + 1, which is indeed the second element in the vector due to its contiguous nature.
Edit:
There was indeed some murkiness about random access iterators and contiguous storage containers in the C++11 and C++14 standards. Thus, the committee decided to refine them by adding an extra group of iterators named contiguous iterators. You can find more information in the relevant proposal, N3884.
It looks to me like we need to put two separate parts of the standard together to get a solid requirement here. We can start with table 101, which requires that a[n] be equivalent to *(a.begin() + n) for sequence containers (specifically, basic_string, array, deque and vector) (and the same requirement for a.at(n), for the same containers).
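The contiguous-iterator wording quoted in this answer can be checked empirically for std::vector<int>. The sketch below is only a spot check on one implementation, not a proof of the requirement; check_contiguous is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>   // std::addressof
#include <vector>

// Walks the vector with ++ and verifies that the n-th iterator refers to
// the object at addressof(*begin) + n. Returns the number of positions checked.
std::size_t check_contiguous(const std::vector<int>& v) {
    if (v.empty()) return 0;  // no dereferenceable iterator to start from
    const int* base = std::addressof(*v.begin());
    std::size_t checked = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it, ++checked) {
        // *(a + n) must be the same object as *(addressof(*a) + n)
        assert(std::addressof(*it) == base + checked);
    }
    return checked;
}
```

For the vector from the question (4, 10, 100) it should return 3 without tripping the assertion.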
Then we look at table 111 in [random.access.iterators], where it requires that the expression r += n be equivalent to:
{
difference_type m = n;
if (m >= 0)
while (m--)
++r;
else
while (m++)
--r;
return r;
}
[indentation added]
Between the two, these imply that for any n, *(begin() + n) refers to the nth item in the vector. Just in case you want to cover the last base I see open, let's cover the requirement that push_back actually append to the collection. That's also in table 101: a.push_back(t) "Appends a copy of t" (again for basic_string, string, deque, list, and vector).
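As a rough illustration of how the two tables fit together, the sketch below re-implements the forward branch of r += n as repeated ++r and compares the result with a[n] and *(a.begin() + n). Both step_forward and nth_element_agrees are hypothetical names introduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the table's definition of r += n for n >= 0.
std::vector<int>::const_iterator step_forward(std::vector<int>::const_iterator r,
                                              std::vector<int>::difference_type n) {
    assert(n >= 0);  // this sketch only covers the forward branch
    while (n--) ++r;
    return r;
}

// True if a[n], *(a.begin() + n) and n single increments all name the same object.
bool nth_element_agrees(const std::vector<int>& a, std::vector<int>::size_type n) {
    assert(n < a.size());
    typedef std::vector<int>::difference_type diff;
    return &a[n] == &*(a.begin() + static_cast<diff>(n)) &&
           &a[n] == &*step_forward(a.begin(), static_cast<diff>(n));
}
```

For a vector built with push_back(4), push_back(10), push_back(100), nth_element_agrees should hold for n = 0, 1 and 2.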
[C++14: 23.2.3/1]: A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. [..]
I don't know how else you'd interpret it.
The specification isn't just in the iterators. It is also in the specification of the containers, and the operations that modify those containers.
The thing is, you are not going to find a single clause that says "incrementing begin() repeatedly will access all elements of a vector in order". You need to look at the specification of every operation on every container (since these define an order of elements in the container) and the specification of iterators (and operations on them) which is essentially that "incrementing moves to the next element in the order that operations on the container defined, until we pass the end". It is the combination of numerous clauses in the standard that give the end effect.
The general concepts, however, are ....
All containers maintain some range of zero or more elements. That range has three key properties: a beginning (corresponding to the first element in an order that is meaningful to the container), an end (corresponding to the last element), and an order (which determines the sequence in which elements will be retrieved one after the other - i.e. defines the meaning of "next").
An iterator is an object that either references an element in a range, or has a "past the end" value. An iterator that references an element in the range other than the end (last), when incremented, will reference the next element. An iterator that references the end (last) element in the range, when incremented, will be an end (past the end) iterator.
The begin() method returns an iterator that references (or points to) the first element in the range (or an end iterator if the range has zero elements). The end() method returns an end iterator - one that corresponds to "one past the end of the range". That means, if an iterator is initialised using begin(), incrementing it repeatedly will move sequentially through the range until the end iterator is reached.
Then it is necessary to look at the specification for the various modifiers of the container - the member functions that add or remove elements. For example, push_back() is specified as adding an element to the end of the existing range for that container. It extends the range by adding an element to the end.
It is that combination of specifications - of iterators and of operations that modify containers - that guarantees the order. The net effect is that, if elements are added to a container in some order, then an iterator initialised using begin() will - when incremented repeatedly - reference the elements in the order in which they were placed in the container.
Obviously, some container modifiers are a bit more complicated - for example, std::vector's insert() is given an iterator, and adds elements there, shuffling subsequent elements to make room. However, the key point is that the modifiers place elements into the container in a defined order (or remove, in the case of operations like std::vector::erase()) and iterators will access elements in that defined order.
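A small sketch of that combination, assuming std::list<int> as the container: the modifiers define the order, and an iterator started at begin() visits the elements in that order. The names visit_in_order and build_and_visit are made up for illustration.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Walks [begin, end) and copies what the iterator visits, in visiting order.
template <typename Container>
std::vector<int> visit_in_order(const Container& c) {
    std::vector<int> seen;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it) {
        seen.push_back(*it);
    }
    assert(seen.size() == c.size());  // one visit per element
    return seen;
}

// push_back appends; insert places the new element before the given position.
std::vector<int> build_and_visit() {
    std::list<int> c;
    c.push_back(4);
    c.push_back(100);
    std::list<int>::iterator pos = c.begin();
    ++pos;              // refers to 100
    c.insert(pos, 10);  // sequence is now 4, 10, 100
    return visit_in_order(c);
}
```

The result of build_and_visit should be 4, 10, 100, which is the order defined by the modifiers rather than the order of the calls.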
In this question it was explained that std::for_each has undefined behavior when given an invalid iterator range [first, last) (i.e. when last is not reachable by incrementing first).
Presumably this is because a general loop for(auto it = first; it != last; ++it) would run forever on invalid ranges. But for random access iterators this seems an unnecessary restriction because random access iterators have a comparison operator and one could write explicit loops as for(auto it = first; it < last; ++it). This would turn a loop over an invalid range into a no-op.
So my question is: why doesn't the standard allow std::for_each to have well-defined behavior on invalid random access iterator ranges? It would simplify several algorithms which only make sense on multi-element containers (sorting e.g.). Is there a performance penalty for using operator<() instead of operator!=() ?
This would turn a loop over an invalid range into a no-op.
That's not necessarily the case.
One example of an invalid range is when first and last refer to different containers. Comparing such iterators would result in undefined behaviour in at least some cases.
This would turn a loop over an invalid range into a no-op.
You seem to be saying that operator< should always return false for two random-access iterators that are not part of the same range. That's the only way your specified loop would be a no-op.
It doesn't make sense for the standard to specify this. Remember that pointers are random-access iterators. Think about the implementation burden for pointer operations, and the general confusion caused to readers, if it were defined that the following code print "two":
int a[5];
int b[5]; // neither [a,b) nor [b,a) is a valid range
if ((a < b) || (b < a)) {
std::cout << "one\n";
} else {
std::cout << "two\n";
}
Instead, it is left undefined so that people won't write it in the first place.
Because that's the general policy. All using < would allow is things
like:
std::for_each( v.begin() + 20, v.begin() + 10, op );
Even with <, passing an invalid iterator, or iterators from different
containers, is undefined behavior.
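To make that point concrete, here is a sketch of a loop written with < instead of !=, restricted to std::vector<int> and assuming C++11 for the lambda. The names for_each_lt and calls_for_reversed_range are hypothetical. It only shows that a reversed range inside one vector becomes a no-op; both iterators must still belong to the same vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical variant of a for_each loop that uses < instead of !=.
// Returns the number of times op was called.
template <typename Op>
int for_each_lt(std::vector<int>::const_iterator first,
                std::vector<int>::const_iterator last, Op op) {
    int calls = 0;
    for (; first < last; ++first) {
        op(*first);
        ++calls;
    }
    return calls;
}

// Runs the loop over a reversed range inside a single vector.
int calls_for_reversed_range() {
    std::vector<int> v(30, 1);
    int total = 0;
    const int calls = for_each_lt(v.begin() + 20, v.begin() + 10,
                                  [&total](int x) { total += x; });
    assert(total == 0);  // op never ran
    return calls;
}
```

Nothing here helps with iterators from different containers or with invalidated iterators; those cases remain undefined regardless of which comparison the loop uses.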
What is the behavior of std::advance when you have say:
std::vector<int> foo(10,10);
auto i = foo.begin();
std::advance(i, 20);
What is the value of i? Is it foo.end()?
The standard defines std::advance() in terms of the types of iterator it's being used on (24.3.4 "Iterator operations"):
These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
The requirements for these operations on various iterator types are also outlined in the standard (in Tables 72, 74, 75 and 76):
For an input or forward iterator
++r precondition: r is dereferenceable
for a bidirectional iterator:
--r precondition: there exists s such that r == ++s
For random access iterators, the +, +=, -, and -= operations are defined in terms of the bidirectional & forward iterator prefix ++ and -- operations, so the same preconditions hold.
So advancing an iterator beyond the 'past-the-end' value (as might be returned by the end() function on containers) or advancing before the first dereferenceable element of an iterator's valid range (as might be returned by begin() on a container) is undefined behavior since you're violating the preconditions of the ++ or -- operation.
Since it's undefined behavior you can't 'expect' anything in particular. But you'll likely crash at some point (hopefully sooner rather than later, so you can fix the bug).
According to the C++ Standard §24.3.4, std::advance(i, 20) has the same effect as for ( int n=0; n < 20; ++n ) ++i; for positive n. On the other hand (§24.1.3), if i is past-the-end, then the ++i operation is undefined. So the result of std::advance(i, 20) is undefined.
You are going past the size of foo by advancing to the 20th position. That is definitely not the end of the vector. It should invoke undefined behavior on dereferencing, AFAIK.
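If the intent is "advance by n, but stop at the end", the bound has to be checked explicitly. The following is a minimal sketch under that assumption; advance_at_most is a hypothetical helper, not a standard function (C++20's std::ranges::advance has an overload taking a bound that follows a similar idea).

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: moves it forward by at most n steps, stopping at end.
// Returns the number of steps that could not be taken.
template <typename Iterator>
typename std::iterator_traits<Iterator>::difference_type
advance_at_most(Iterator& it, Iterator end,
                typename std::iterator_traits<Iterator>::difference_type n) {
    assert(n >= 0);  // this sketch only handles forward movement
    while (n > 0 && it != end) {
        ++it;  // never executed once it == end, so the ++ precondition holds
        --n;
    }
    return n;
}
```

With the vector from the question (ten elements) and n = 20, the iterator should end up equal to foo.end() and the function should report 10 steps left over.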
Edit 1:
#include <iterator>
#include <vector>
#include <iostream>
int main()
{
std::vector<int> foo(10,10) ;
std::vector<int>::iterator iter = foo.begin() ;
std::advance(iter,20);
std::cout << *iter << "\n" ;
return 0;
}
Output: 0
If it is the vector's last element, then it should have given 10 on iterator dereferencing. So, it is UB.
That is probably undefined behavior. The only thing the standard says is:
Since only random access iterators provide + and - operators, the library provides two function templates advance and distance. These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
template <class InputIterator, class Distance>
void advance(InputIterator& i, Distance n);
Requires: n shall be negative only for bidirectional and random access iterators. Effects: Increments (or decrements for negative n) iterator reference i by n.
From the SGI page for std::advance:
Every iterator between i and i+n
(inclusive) is nonsingular.
Therefore i is not foo.end() and dereferencing will result in undefined behavior.
Notes:
See this question for more details about what (non)singular means when referring to iterators.
I know that the SGI page is not the de-facto standard but pretty much all STL implementations follow those guidelines.
Where does the C++ standard declare that the pair of iterators passed to std::vector::insert must not overlap the original sequence?
Edit: To elaborate, I'm pretty sure that the standard does not require the standard library to handle situations like this:
std::vector<int> v(10);
std::vector<int>::iterator first = v.begin() + 5;
std::vector<int>::iterator last = v.begin() + 8;
v.insert(v.begin() + 2, first, last);
However, I was unable to find anything in the standard, that would prohibit the ranges [first, last) and [v.begin(), v.end()) to overlap.
23.1.1/4 Sequence requirements has:
expression: a.insert(p,i,j)
return type: void
precondition: i,j are not iterators into a. Inserts copies of elements in [i,j) before p.
So i and j cannot be iterators into your vector.
It makes sense, as during the insert operation the vector may need to resize itself, and so the existing elements may first be copied to a new memory location (thereby invalidating the current iterators).
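One way to stay within that precondition is to copy the source range into a temporary first, so that the iterators passed to insert do not point into the vector being modified. This is a sketch under that assumption; insert_own_range is a hypothetical helper, and it takes indices rather than iterators so that reallocation cannot invalidate its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inserts copies of v[first_idx, last_idx) before
// position pos_idx, going through a temporary vector.
void insert_own_range(std::vector<int>& v, std::size_t pos_idx,
                      std::size_t first_idx, std::size_t last_idx) {
    assert(first_idx <= last_idx && last_idx <= v.size() && pos_idx <= v.size());
    typedef std::vector<int>::difference_type diff;
    // Copy the source elements out of v before modifying it.
    const std::vector<int> tmp(v.begin() + static_cast<diff>(first_idx),
                               v.begin() + static_cast<diff>(last_idx));
    // tmp's iterators are not iterators into v, so the precondition is met.
    v.insert(v.begin() + static_cast<diff>(pos_idx), tmp.begin(), tmp.end());
}
```

For the example in the question, insert_own_range(v, 2, 5, 8) would stand in for v.insert(v.begin() + 2, first, last), at the cost of one extra copy of the inserted elements.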
Consider the behavior if it was allowed. Every insert into the vector would both increase the distance between the start and end iterator by one and move the start iterator up one. Therefore the start iterator would never reach the end iterator and the algorithm would execute until an out of memory exception occurred.