std::advance behavior when advancing beyond end of container [duplicate] - c++

This question already has answers here:
What happens if you increment an iterator that is equal to the end iterator of an STL container
(8 answers)
Closed 5 years ago.
What is the behavior of std::advance when you have say:
std::vector<int> foo(10,10);
auto i = foo.begin();
std::advance(i, 20);
What is the value of i? Is it foo.end()?

The standard defines std::advance() in terms of the types of iterator it's being used on (24.3.4 "Iterator operations"):
These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
The requirements for these operations on various iterator types are also outlined in the standard (in Tables 72, 74, 75 and 76):
For an input or forward iterator
++r precondition: r is dereferenceable
for a bidirectional iterator:
--r precondition: there exists s such that r == ++s
For random access iterators, the +, +=, -, and -= operations are defined in terms of the bidirectional & forward iterator prefix ++ and -- operations, so the same preconditions hold.
So advancing an iterator beyond the 'past-the-end' value (as might be returned by the end() function on containers) or advancing before the first dereferenceable element of an iterator's valid range (as might be returned by begin() on a container) is undefined behavior since you're violating the preconditions of the ++ or -- operation.
Since it's undefined behavior you can't 'expect' anything in particular. But you'll likely crash at some point (hopefully sooner rather than later, so you can fix the bug).

According to the C++ Standard §24.3.4 std::advance(i, 20) has the same effect as for ( int n=0; n < 20; ++n ) ++i; for positive n. From the other side (§24.1.3) if i is past-the-end, then ++i operation is undefined. So the result of std::advance(i, 20) is undefined.

You are passing the foo size by advancing to 20th position. Definitely it is not end of the vector. It should invoke undefined behavior on dereferencing, AFAIK.
Edit 1:
#include <algorithm>
#include <vector>
#include <iostream>
int main()
{
std::vector<int> foo(10,10) ;
std::vector<int>::iterator iter = foo.begin() ;
std::advance(iter,20);
std::cout << *iter << "\n" ;
return 0;
}
Output: 0
If it is the vector's last element, then it should have given 10 on iterator dereferencing. So, it is UB.
IdeOne Results

That is probably undefined behavior. The only thing the standard says is:
Since only random access iterators provide + and - operators, the library provides two function templates advance and distance. These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
template <class InputIterator, class Distance>
void advance(InputIterator& i, Distance n);
Requires: n shall be negative only for bidirectional and random access iterators. Effects: Increments (or decrements for negative n) iterator reference i by n.

From the SGI page for std::advance:
Every iterator between i and i+n
(inclusive) is nonsingular.
Therefore i is not foo.end() and dereferencing will result in undefined behavior.
Notes:
See this question for more details about what (non)singular means when referring to iterators.
I know that the SGI page is not the de-facto standard but pretty much all STL implementations follow those guidelines.

Related

std::list sentinel node from standard point of view

Code example:
list<int> mylist{10, 20, 30, 40};
auto p = mylist.end();
while (true)
{
p++;
if (p == mylist.end()) // skip sentinel
continue;
cout << *p << endl;
}
I wonder, how much this code is legal from standard (C++17, n4810) point of view?
I looking for bidirectional iterators requirements related to example above, but no luck.
My question is:
Ability to pass through end(), it is implementation details or it is standard requirements?
Quoting from the latest draft available online.
[iterator.requirements.general]/7
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable.
I believe that this applies not just to the end() but what comes after that as well. Note that the standard does not clearly state that end() should never be dereferenced.
And Cpp17Iterator requirements table states that for expression *r, r should be dereferenceable:
past-the-end iterator is considered a non-incrementable iterator and incrementing it (as you are doing at the beginning of the while loop) results in undefined behavior.
Something like what you are trying to do can also happen when using std::advance.
The book "The C++ Standard Library: A Tutorial and Reference" by Nicolai Josuttis has this quote:
Note that advance() does not check whether it crosses the end() of a sequence (it can't check because iterators in general do not know the containers on which they operate). Thus, calling this function might result in undefined behavior because calling operator ++ for the end of a sequence is not defined.
You code is illegal. You first initialized p to be the past-the-end iterator.
auto p = mylist.end();
Now you p++. Per Table 76,
the operational semantics of r++ is:
{ X tmp = r;
++r;
return tmp; }
And per [Table 74],
++r
Expects: r is dereferenceable.
And per [iterator.requirements.general]/7,
The library never assumes that past-the-end values are
dereferenceable.
In other words, incrementing a past-the-end iterator as you did is undefined behavior.

C++ formal requirement of behaviour of iterators over a container

I'm sure that I'm not alone in expecting that I could add several elements in some order to a vector or list, and then could use an iterator to retrieve those elements in the same order. For example, in:
#include <vector>
#include <cassert>
int main(int argc, char **argv)
{
using namespace std;
vector<int> v;
v.push_back(4);
v.push_back(10);
v.push_back(100);
auto i = v.begin();
assert(*i++ == 4);
assert(*i++ == 10);
assert(*i++ == 100);
return 0;
}
... all assertions should pass and the program should terminate normally (assuming that no std::bad_alloc exception is thrown during construction of the vector or adding the elements to it).
However, I'm having trouble reconciling this with any requirement in the C++ standard (I'm looking at C++11, but would like answers for other standards also if they are markedly different).
The requirement for begin() is just (23.2.1 para 6):
begin() returns an iterator referring to the first element in the container.
What I'm looking for is the requirement, or combination of requirements that in turn logically requires, that if i = v.begin(), then ++i shall refer to the second element in the vector (assuming that such an element exists) - or indeed, even the requirement that successive increments of an iterator will return each of the elements in the vector.
Edit:
A more general question is, what (if any) text in the standard requires that successfully incrementing an iterator obtained by calling begin() on a sequence (ordered or unordered) actually visits every element of the sequence?
There's isn't in the standard something straightforward to state that
if i = v.begin(), then ++i shall refer to the second element in the
vector.
However, for vector's iterators why can imply it from the following wording in the draft standard N4527 24.2.1/p5 In general [iterator.requirements.general]:
Iterators that further satisfy the requirement that, for integral
values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous
iterators.
Now, std::vector's iterator satisfy this requirement, consequently we can imply that ++i is equivalent to i + 1 and thus to addressof(*i) + 1. Which indeed is the second element in the vector due to its contiguous nature.
Edit:
There was indeed a turbidness on the matter about random access iterators and contiguous storage containers in C++11 and C++14 standards. Thus, the commity decided to refine them by putting an extra group of iterators named contiguous iterators. You can find more info in the relative proposal N3884.
It looks to me like we need to put two separate parts of the standard together to get a solid requirement here. We can start with table 101, which requires that a[n] be equivalent to *(a.begin() + n) for sequence containers (specifically, basic_string, array, deqeue and vector) (and the same requirement for a.at(n), for the same containers).
Then we look at table 111 in [random.access.iterators], where it requires that the expression r += n be equivalent to:
{
difference_type m = n;
if (m >= 0)
while (m--)
++r;
else
while (m++)
--r;
return r;
}
[indentation added]
Between the two, these imply that for any n, *(begin() + n) refers to the nth item in the vector. Just in case you want to cover the last base I see open, let's cover the requirement that push_back actually append to the collection. That's also in table 101: a.push_back(t) "Appends a copy of t" (again for basic_string, string, deque, list, and vector).
[C++14: 23.2.3/1]: A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. [..]
I don't know how else you'd interpret it.
The specification isn't just in the iterators. It is also in the specification of the containers, and the operations that modify those containers.
The thing is, you are not going to find a single clause that says "incrementing begin() repeatedly will access all elements of a vector in order". You need to look at the specification of every operation on every container (since these define an order of elements in the container) and the specification of iterators (and operations on them) which is essentially that "incrementing moves to the next element in the order that operations on the container defined, until we pass the end". It is the combination of numerous clauses in the standard that give the end effect.
The general concepts, however, are ....
All containers maintain some range of zero or more elements. That range has three key properties: a beginning (corresponding to the first element in an order that is meaningful to the container), and an end (corresponding to the last element), and an order (which determines the sequence in which elements will be retrieved one after the other - i.e. defines the meaning of "next").
An iterator is an object that either references an element in a range, or has a "past the end" value. An iterator that references an element in the range other than the end (last), when incremented, will reference the next element. An iterator that references the end (last) element in the range, when incremented, will be an end (past the end) iterator.
The begin() method returns an iterator that references (or points to) the first in the range (or an end iterator if the range has zero elements). The end() method returns an end iterator - one that corresponds to "one past the the end of the range". That means, if an iterator is initialised using the begin(), incrementing it repeatedly will move sequentially through the range until the end iterator is reached.
Then it is necessary to look at the specification for the various modifiers of the container - the member functions that add or remove elements. For example, push_back() is specified as adding an element to the end of the existing range for that container. It extends the range by adding an element to the end.
It is that combination of specifications - of iterators and of operations that modify containers - that guarantees the order. The net effect is that, if elements are added to a container in some order, then a iterator initialised using begin() will - when incremented repeatedly - reference the elements in the order in which they were placed in the container.
Obviously, some container modifiers are a bit more complicated - for example, std::vector's insert() is given an iterator, and adds elements there, shuffling subsequent elements to make room. However, the key point is that the modifiers place elements into the container in a defined order (or remove, in the case of operations like std::vector::erase()) and iterators will access elements in that defined order.

Are end+1 iterators for std::string allowed?

Is it valid to create an iterator to end(str)+1 for std::string?
And if it isn't, why isn't it?
This question is restricted to C++11 and later, because while pre-C++11 the data was already stored in a continuous block in any but rare POC toy-implementations, the data didn't have to be stored that way.
And I think that might make all the difference.
The significant difference between std::string and any other standard container I speculate on is that it always contains one element more than its size, the zero-terminator, to fulfill the requirements of .c_str().
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.
Still, even though it should imho guarantee that said expression is valid, for consistency and interoperability with zero-terminated strings if nothing else, the only paragraph I found casts doubt on that:
21.4.1 basic_string general requirements [string.require]
4 The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
(All quotes are from C++14 final draft (n3936).)
Related: Legal to overwrite std::string's null terminator?
TL;DR: s.end() + 1 is undefined behavior.
std::string is a strange beast, mainly for historical reasons:
It attempts to bring C compatibility, where it is known that an additional \0 character exists beyond the length reported by strlen.
It was designed with an index-based interface.
As an after thought, when merged in the Standard library with the rest of the STL code, an iterator-based interface was added.
This led std::string, in C++03, to number 103 member functions, and since then a few were added.
Therefore, discrepancies between the different methods should be expected.
Already in the index-based interface discrepancies appear:
§21.4.5 [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1/ Requires: pos <= size()
const_reference at(size_type pos) const;
reference at(size_type pos);
5/ Throws: out_of_range if pos >= size()
Yes, you read this right, s[s.size()] returns a reference to a NUL character while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] by at because they are safer, beware the string trap...
So, what about iterators?
§21.4.3 [string.iterators]
iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;
2/ Returns: An iterator which is the past-the-end value.
Wonderfully bland.
So we have to refer to other paragraphs. A pointer is offered by
§21.4 [basic.string]
3/ The iterators supported by basic_string are random access iterators (24.2.7).
while §17.6 [requirements] seems devoid of anything related. Thus, strings iterators are just plain old iterators (you can probably sense where this is going... but since we came this far let's go all the way).
This leads us to:
24.2.1 [iterator.requirements.general]
5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]
So, *s.end() is ill-formed.
24.2.3 [input.iterators]
2/ Table 107 -- Input iterator requirements (in addition to Iterator)
List for pre-condition to ++r and r++ that r be dereferencable.
Neither the Forward iterators, Bidirectional iterators nor Random iterator lift this restriction (and all indicate they inherit the restrictions of their predecessor).
Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:
r += n is equivalent to [inc|dec]rememting r n times
a + n and n + a are equivalent to copying a and then applying += n to the copy
and similarly for -= n and - n.
Thus s.end() + 1 is undefined behavior.
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
std::string::operator[](size_type i) is specified to return "a reference to an object of type charT with value charT() when i == size(), so we know that that pointer points to an object.
5.7 states that "For the purposes of [operators + and -], a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."
So we have a non-array object and the spec guarantees that a pointer one past it will be representable. So we know std::addressof(*end(str)) + 1 has to be representable.
However that's not a guarantee on std::string::iterator, and there is no such guarantee anywhere in the spec, which makes it undefined behavior.
(Note that this is not the same as 'ill-formed'. *end(str) + 1 is in fact well-formed.)
Iterators can and do implement checking logic that produce various errors when you do things like increment the end() iterator. This is in fact what Visual Studios debug iterators do with end(str) + 1.
#define _ITERATOR_DEBUG_LEVEL 2
#include <string>
#include <iterator>
int main() {
std::string s = "ssssssss";
auto x = std::end(s) + 1; // produces debug dialog, aborts program if skipped
}
And if it isn't, why isn't it?
for consistency and interoperability with zero-terminated strings if nothing else
C++ specifies some specific things for compatibility with C, but such backwards compatibility is limited to supporting things that can actually be written in C. C++ doesn't necessarily try to take C's semantics and make new constructs behave in some analogous way. Should std::vector decay to an iterator just to be consistent with C's array decay behavior?
I'd say end(std) + 1 is left as undefined behavior because there's no value in trying to constrain std::string iterators this way. There's no legacy C code that does this that C++ needs to be compatible with and new code should be prevented from doing it.
New code should be prevented from relying on it... why? [...] What does not allowing it buy you in theory, and how does that look in practice?
Not allowing it means implementations don't have to support the added complexity, complexity which provides zero demonstrated value.
In fact it seems to me that supporting end(str) + 1 has negative value since code that tries to use it will essentially be creating the same problem as C code which can't figure out when to account for the null terminator or not. C has enough off by one buffer size errors for both languages.
A std::basic_string<???> is a container over its elements. Its elements do not include the trailing null that is implicitly added (it can include embedded nulls).
This makes lots of sense -- "for each character in this string" probably shouldn't return the trailing '\0', as that is really an implementation detail for compatibility with C style APIs.
The iterator rules for containers were based off of containers that don't shove an extra element at the end. Modifying them for std::basic_string<???> without motivation is questionable; one should only break a working pattern if there is a payoff.
There is every reason to think that pointers to .data() and .data() + .size() + 1 are allowed (I could imagine a twisted interpretation of the standard that would make it not allowed). So if you really need read-only iterators into the contents of a std::string, you can use pointer-to-const-elements (which are, after all, a kind of iterator).
If you want editable ones, then no, there is no way to get a valid iterator to one-past-the-end. Neither can you get a non-const reference to the trailing null legally. In fact, such access is clearly a bad idea; if you change the value of that element, you break the std::basic_string's invariant null-termination.
For there to be an iterator to one-past-the-end, the const and non-const iterators to the container would have to have a different valid range, or a non-const iterator to the last element that can be dereferenced but not written to must exist.
I shudder at making such standard wording watertight.
std::basic_string is already a mess. Making it even stranger would lead to standard bugs and would have a non-trivial cost. The benefit is really low; in the few cases where you want access to said trailing null in an iterator range, you can use .data() and use the resulting pointers as iterators.
I can't find a definitive answer, but indirect evidence points at end()+1 being undefined.
[string.insert]/15
constexpr iterator insert(const_iterator p, charT c);
Preconditions: p is a valid iterator on *this.
It would be unreasonable to expect this to work with end()+1 as the iterator, and it indeed causes a crash on both libstdc++ and libc++.
This means end()+1 is not a valid iterator, meaning end() is not incrementable.

Why doesn't the standard allow std::for_each to have well-defined behavior on invalid random access iterator ranges?

In this question it was explained that std::for_each has undefined behavior when given an invalid iterator range [first, last) (i.e. when last is not reachable by incrementing first).
Presumably this is because a general loop for(auto it = first; it != last; ++it) would run forever on invalid ranges. But for random access iterators this seems an unnecessary restriction because random access iterators have a comparison operator and one could write explicit loops as for(auto it = first; it < last; ++it). This would turn a loop over an invalid range into a no-op.
So my question is: why doesn't the standard allow std::for_each to have well-defined behavior on invalid random access iterator ranges? It would simplify several algorithms which only make sense on multi-element containers (sorting e.g.). Is there a performance penalty for using operator<() instead of operator!=() ?
This would turn a loop over an invalid range into a no-op.
That's not necessarily the case.
One example of an invalid range is when first and last refer to different containers. Comparing such iterators would result in undefined behaviour in at least some cases.
This would turn a loop over an invalid range into a no-op.
You seem to be saying that operator< should always return false for two random-access iterators that are not part of the same range. That's the only way your specified loop would be a no-op.
It doesn't make sense for the standard to specify this. Remember that pointers are random-access iterators. Think about the implementation burden for pointer operations, and the general confusion caused to readers, if it were defined that the following code print "two":
int a[5];
int b[5]; // neither [a,b) nor [b,a) is a valid range
if ((a < b) || (b < a)) {
std::cout << "one\n";
} else {
std::cout << "two\n";
}
Instead, it is left undefined so that people won't write it in the first place.
Because that's the general policy. All using < would allow is things
like:
std::for_each( v.begin() + 20, v.begin() + 10, op );
Even with <, passing an invalid iterator, or iterators from different
containers, is undefined behavior.

container.erase(first,last) where first == last in STL containers

Is there a defined behavior for container.erase(first,last) when first == last in the STL, or is it undefined?
Example:
std::vector<int> v(1,1);
v.erase(v.begin(),v.begin());
std::cout << v.size(); // 1 or 0?
If there is a Standard Library specification document that has this information I would appreciate a reference to it.
The behavior is well defined.
It is a No-op(No-Operation). It does not perform any erase operation on the container as end is same as begin.
The relevant Quote from the Standard are as follows:
C++03 Standard: 24.1 Iterator requirements and
C++11 Standard: 24.2.1 Iterator requirements
Para 6 & 7 for both:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container.
Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges.A range is a pair of iterators that designate the beginning and end of the computation. A range [i, i) is an empty range; in general, a range [i, j) refers to the elements in the data structure starting with the one pointed to by i and up to but not including the one pointed to by j. Range [i, j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
That would erase nothing at all, just like other algorithms that operate on [, ) ranges.
Even if the container is empty I think that would still work because begin() == end().
Conceptually, there is an ordinary loop from begin to end, with a simple loop condition that checks if the iterator is end already, like this:
void erase (iterator from, iterator to) {
...
while (from != to) erase (from++);
...
}
(however, implementations may vary). As you see, if from==to, then there is no single iteration of the loop body.
It is perfectly defined. It removes all elements from first to last, including first and excluding last. If there are no elements in this range (when first == last), then how much are removed? You guessed it, none.
Though I'm not sure what happens if first comes after last, I suppose this will invoke undefined behaviour.