std::list sentinel node from standard point of view - c++

Code example:
list<int> mylist{10, 20, 30, 40};
auto p = mylist.end();
while (true)
{
p++;
if (p == mylist.end()) // skip sentinel
continue;
cout << *p << endl;
}
I wonder, how much this code is legal from standard (C++17, n4810) point of view?
I looking for bidirectional iterators requirements related to example above, but no luck.
My question is:
Ability to pass through end(), it is implementation details or it is standard requirements?

Quoting from the latest draft available online.
[iterator.requirements.general]/7
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable.
I believe that this applies not just to the end() but what comes after that as well. Note that the standard does not clearly state that end() should never be dereferenced.
And Cpp17Iterator requirements table states that for expression *r, r should be dereferenceable:
past-the-end iterator is considered a non-incrementable iterator and incrementing it (as you are doing at the beginning of the while loop) results in undefined behavior.
Something like what you are trying to do can also happen when using std::advance.
The book "The C++ Standard Library: A Tutorial and Reference" by Nicolai Josuttis has this quote:
Note that advance() does not check whether it crosses the end() of a sequence (it can't check because iterators in general do not know the containers on which they operate). Thus, calling this function might result in undefined behavior because calling operator ++ for the end of a sequence is not defined.

You code is illegal. You first initialized p to be the past-the-end iterator.
auto p = mylist.end();
Now you p++. Per Table 76,
the operational semantics of r++ is:
{ X tmp = r;
++r;
return tmp; }
And per [Table 74],
++r
Expects: r is dereferenceable.
And per [iterator.requirements.general]/7,
The library never assumes that past-the-end values are
dereferenceable.
In other words, incrementing a past-the-end iterator as you did is undefined behavior.

Related

Invalidated iterator in a vector

I'm aware erasing will invalidate iterators at and after the point of the erase. Consider:
std::vector<int> vec = {1, 2, 3, 4, 5};
std::vector<int>::iterator it = vec.end() - 1; //last element
vec.erase(vec.begin()); //shift everything one to the left, 'it' should be the new 'end()' ?
std::cout << (it == vec.end()); //not dereferencing 'it', just comparing, UB ?
Is it undefined behavior to compare (not dereference) an invalidated iterator (it in this case)? If not, is it == vec.end() guaranteed to hold true?
Edit: from the top answer it looks like this is UB if only it is a singular value. But from What is singular and non-singular values in the context of STL iterators? it seems like it is (or was) associated with the container hence making it non-singular.
I'd appreciate further analysis on this, thank you.
Once your iterator has been invalidated, it may be UB to even compare it to something else:
[C++14: 24.2.1/10]: An invalid iterator is an iterator that may be singular.
[C++14: 24.2.1/5]: [..] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible requirements, using a value-initialized iterator as the source of a copy or move operation. [..]
Note that this means you also can't compare a default-constructed iterator to any .end().
Contrary to popular belief that "pointers are just memory addresses", these rules are also largely true for pointers. Indeed, the rules for iterators are a generalisation of the rules for pointers.
Formally any iterator pointing to an element on or after the erased element is invalidated. So
Yes this is UB (despite it being a pointer under the hood.)
UB again, despite the obvious plausibility.

In the computation of iterators, if I use an iterator which is before begin() or after end() as a operand, is its behavior defined?

I tried an experiment:
#include <iostream>
#include <vector>
int main(void) {
std::vector<int> a{1, 2, 3};
std::vector<int>::iterator b = a.begin();
std::vector<int>::iterator c = a.end();
std::vector<int>::iterator d = b - 1;
std::vector<int>::iterator e = c + 1;
std::cout << true << std::endl;
std::cout << (d < b) << std::endl;
std::cout << (e > c) << std::endl;
return 0;
}
It outputs:
1
1
1
But someone told me that it is behavior undefined for deque, so what do you think? Thank you!
No.
Both d and e are "singular iterators" because they point neither to an element in a sequence, or to the "one-past-the-end" pseudo-element.
And you can barely do anything with singular iterators:
[C++11: 24.2.1/5]: Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any sequence. [ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible requirements, using a value-initialized iterator as the source of a copy or move operation.
Note that, in particular, you cannot perform arbitrary comparisons on them.
I was pretty sure that even evaluating a.begin() - 1 or a.end() + 1 were UB, but I can't find any evidence of that right now.
Not just for deque but this is undefined for any container (only random access containers will actually allow this to compile though).
The behaviour on using either d or e is undefined, as you suspected.
You are allowed to test c for equality with a.end(), but note that the behaviour on any dereferencing it is undefined.

Vector`s end iterator content

I have 2 questions regarding following examples:
1)
std::vector<int> v(5,1);
cout << *v.end();
Is a printed result is undefined (depends on compiler)
2)
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << *pv.end();
Is a printed result is undefined (depends on compiler) or NULL
You have no item at end(), it's an iterator right after the last valid item in your vector.
*v.end();
It's undefined behavior. You can use end() for comparing an iterator whether it's pointing to the item after last item or not.
Easy way to access the value of last item is back(), for example:
cout << v.back();
The end() iterator points to a position that is one element after the last element of the container. Accessing the data that it points to will invoke undefined behavior and this is the case in both your examples.
Dereferencing past the end is will probably end badly but it looks like it is implementation defined, if we look at the draft C++ standard section 24.21 Iterator requirements and then to 24.2.1 In general paragraph 5 says (emphasis mine):
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any sequence. [ Example: After the declaration of an uninitialized pointer x (as with int x;), x must always be assumed to have a singular value of a pointer. —end example ] Results of most expressions are undefined for singular values; [...] Dereferenceable values are always non-singular.
Firstly, in both cases the behavior is undefined. Note, that is not "the printed result" that is undefined. You code does not even get a chance to print anything. A mere application of * operator to end iterator already causes undefined behavior. E.g. this alone
*v.end();
is already undefined behavior.
Secondly, undefined in this case does not mean "depends on the compiler". Implementation-defined behavior depends on the compiler. Undefined means "completely unpredictable", even if you are using the same compiler.
P.S. There's seems to be a bit of ongoing work in the standard commitee with reagard to some closely related issues.
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#208
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1213
Hopefully it will result in a clearer specification of what is legal and what is not for the past-the-end iterators. But it is clear that in general case past-the-end iterator can legally be a singular iterator, meaning that in general case it can be non-dereferenceable.
Yes, both of those are undefined.
vector::end - Return iterator to end (public member function )
You can read more here.
Your first example:
std::vector<int> v(5,1);
cout << *(v.end()-1);
It's undefined (look at the picture), v.end() is pointing to the address after the last element and if the container is empty, this function returns the same as v.begin().
And your second example:
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << **(pv.end()-1);

Is ->second defined for iterator my_map.end()?

I'm working with a std::map<std::string, MyClass* >.
I want to test if my_map.find(key) returned a specific pointer.
Right now I'm doing;
auto iter = my_map.find(key);
if ((iter != my_map.end()) && (iter->second == expected)) {
// Something wonderful has happened
}
However, the operator * of the iterator is required to return a reference. Intuitively I'm assuming it to be valid and fully initialized? If so, my_map.end()->second would be NULL, and (since NULL is never expected), I could reduce my if statement to:
if (iter->second == expected)
Is this valid according to specification? Does anyone have practical experience with the implementations of this? IMHO, the code becomes clearer, and possibly a tiny performance improvement could be achieved.
Intuitively I'm assuming it to be valid and fully initialized?
You cannot assume an iterator to an element past-the-end of a container to be dereferenceable. Per paragraph 24.2.1/5 of the C++11 Standard:
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element
of the array, so for any iterator type there is an iterator value that points past the last element of a
corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the
expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are
dereferenceable. [...]
However, the operator *of the iterator is required to return a reference. Intuitively I'm assuming it to be valid and fully initialized?
Your assumption is wrong, dereferencing iterator that points outside of container will lead to UB.
24.2 Iterator requirements [iterator.requirements]
24.2.1 In general [iterator.requirements.general]
7 Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges.
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an
empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element
pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if
j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
Even without checking the specs, you can easily see that dereferencing an iterator at end has to be invalid.
A perfectly natural implementation (the de-factor standard implementation for vector<>) is for end() to be literally a memory pointer that has a value of ptr_last_element + 1, that is, the pointer value that would point to the next element - if there was a next element.
You cannot possibly be allowed to dereference the end iterator because it could be a pointer that would end up pointing to either the next object in the heap, or perhaps an overflow guard area (so you would dereference random memory), or past the end of the heap, and possibly outside of the memory space of the process, in which case you might get an Access Violation exception when dereferencing).
If iter == my_map.end (), then dereferencing it is undefined behavior; but you're not doing that here.
auto iter = my_map.find(key);
if ((iter != my_map.end()) && (iter->second == expected)) {
// Something wonderful has happened
}
If iter != my_map.end() is false, then the second half of the expression (iter->second == expected) will not be exectuted.
Read up on "short-circut evaluation".
Analogous valid code for pointers:
if ( p != NULL && *p == 4 ) {}

std::advance behavior when advancing beyond end of container [duplicate]

This question already has answers here:
What happens if you increment an iterator that is equal to the end iterator of an STL container
(8 answers)
Closed 5 years ago.
What is the behavior of std::advance when you have say:
std::vector<int> foo(10,10);
auto i = foo.begin();
std::advance(i, 20);
What is the value of i? Is it foo.end()?
The standard defines std::advance() in terms of the types of iterator it's being used on (24.3.4 "Iterator operations"):
These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
The requirements for these operations on various iterator types are also outlined in the standard (in Tables 72, 74, 75 and 76):
For an input or forward iterator
++r precondition: r is dereferenceable
for a bidirectional iterator:
--r precondition: there exists s such that r == ++s
For random access iterators, the +, +=, -, and -= operations are defined in terms of the bidirectional & forward iterator prefix ++ and -- operations, so the same preconditions hold.
So advancing an iterator beyond the 'past-the-end' value (as might be returned by the end() function on containers) or advancing before the first dereferenceable element of an iterator's valid range (as might be returned by begin() on a container) is undefined behavior since you're violating the preconditions of the ++ or -- operation.
Since it's undefined behavior you can't 'expect' anything in particular. But you'll likely crash at some point (hopefully sooner rather than later, so you can fix the bug).
According to the C++ Standard §24.3.4 std::advance(i, 20) has the same effect as for ( int n=0; n < 20; ++n ) ++i; for positive n. From the other side (§24.1.3) if i is past-the-end, then ++i operation is undefined. So the result of std::advance(i, 20) is undefined.
You are passing the foo size by advancing to 20th position. Definitely it is not end of the vector. It should invoke undefined behavior on dereferencing, AFAIK.
Edit 1:
#include <algorithm>
#include <vector>
#include <iostream>
int main()
{
std::vector<int> foo(10,10) ;
std::vector<int>::iterator iter = foo.begin() ;
std::advance(iter,20);
std::cout << *iter << "\n" ;
return 0;
}
Output: 0
If it is the vector's last element, then it should have given 10 on iterator dereferencing. So, it is UB.
IdeOne Results
That is probably undefined behavior. The only thing the standard says is:
Since only random access iterators provide + and - operators, the library provides two function templates advance and distance. These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
template <class InputIterator, class Distance>
void advance(InputIterator& i, Distance n);
Requires: n shall be negative only for bidirectional and random access iterators. Effects: Increments (or decrements for negative n) iterator reference i by n.
From the SGI page for std::advance:
Every iterator between i and i+n
(inclusive) is nonsingular.
Therefore i is not foo.end() and dereferencing will result in undefined behavior.
Notes:
See this question for more details about what (non)singular means when referring to iterators.
I know that the SGI page is not the de-facto standard but pretty much all STL implementations follow those guidelines.