I have 2 questions regarding following examples:
1)
std::vector<int> v(5,1);
cout << *v.end();
Is a printed result is undefined (depends on compiler)
2)
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << *pv.end();
Is a printed result is undefined (depends on compiler) or NULL
You have no item at end(), it's an iterator right after the last valid item in your vector.
*v.end();
It's undefined behavior. You can use end() for comparing an iterator whether it's pointing to the item after last item or not.
Easy way to access the value of last item is back(), for example:
cout << v.back();
The end() iterator points to a position that is one element after the last element of the container. Accessing the data that it points to will invoke undefined behavior and this is the case in both your examples.
Dereferencing past the end is will probably end badly but it looks like it is implementation defined, if we look at the draft C++ standard section 24.21 Iterator requirements and then to 24.2.1 In general paragraph 5 says (emphasis mine):
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any sequence. [ Example: After the declaration of an uninitialized pointer x (as with int x;), x must always be assumed to have a singular value of a pointer. —end example ] Results of most expressions are undefined for singular values; [...] Dereferenceable values are always non-singular.
Firstly, in both cases the behavior is undefined. Note, that is not "the printed result" that is undefined. You code does not even get a chance to print anything. A mere application of * operator to end iterator already causes undefined behavior. E.g. this alone
*v.end();
is already undefined behavior.
Secondly, undefined in this case does not mean "depends on the compiler". Implementation-defined behavior depends on the compiler. Undefined means "completely unpredictable", even if you are using the same compiler.
P.S. There's seems to be a bit of ongoing work in the standard commitee with reagard to some closely related issues.
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#208
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1213
Hopefully it will result in a clearer specification of what is legal and what is not for the past-the-end iterators. But it is clear that in general case past-the-end iterator can legally be a singular iterator, meaning that in general case it can be non-dereferenceable.
Yes, both of those are undefined.
vector::end - Return iterator to end (public member function )
You can read more here.
Your first example:
std::vector<int> v(5,1);
cout << *(v.end()-1);
It's undefined (look at the picture), v.end() is pointing to the address after the last element and if the container is empty, this function returns the same as v.begin().
And your second example:
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << **(pv.end()-1);
Related
Code example:
list<int> mylist{10, 20, 30, 40};
auto p = mylist.end();
while (true)
{
p++;
if (p == mylist.end()) // skip sentinel
continue;
cout << *p << endl;
}
I wonder, how much this code is legal from standard (C++17, n4810) point of view?
I looking for bidirectional iterators requirements related to example above, but no luck.
My question is:
Ability to pass through end(), it is implementation details or it is standard requirements?
Quoting from the latest draft available online.
[iterator.requirements.general]/7
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable.
I believe that this applies not just to the end() but what comes after that as well. Note that the standard does not clearly state that end() should never be dereferenced.
And Cpp17Iterator requirements table states that for expression *r, r should be dereferenceable:
past-the-end iterator is considered a non-incrementable iterator and incrementing it (as you are doing at the beginning of the while loop) results in undefined behavior.
Something like what you are trying to do can also happen when using std::advance.
The book "The C++ Standard Library: A Tutorial and Reference" by Nicolai Josuttis has this quote:
Note that advance() does not check whether it crosses the end() of a sequence (it can't check because iterators in general do not know the containers on which they operate). Thus, calling this function might result in undefined behavior because calling operator ++ for the end of a sequence is not defined.
You code is illegal. You first initialized p to be the past-the-end iterator.
auto p = mylist.end();
Now you p++. Per Table 76,
the operational semantics of r++ is:
{ X tmp = r;
++r;
return tmp; }
And per [Table 74],
++r
Expects: r is dereferenceable.
And per [iterator.requirements.general]/7,
The library never assumes that past-the-end values are
dereferenceable.
In other words, incrementing a past-the-end iterator as you did is undefined behavior.
I'm aware erasing will invalidate iterators at and after the point of the erase. Consider:
std::vector<int> vec = {1, 2, 3, 4, 5};
std::vector<int>::iterator it = vec.end() - 1; //last element
vec.erase(vec.begin()); //shift everything one to the left, 'it' should be the new 'end()' ?
std::cout << (it == vec.end()); //not dereferencing 'it', just comparing, UB ?
Is it undefined behavior to compare (not dereference) an invalidated iterator (it in this case)? If not, is it == vec.end() guaranteed to hold true?
Edit: from the top answer it looks like this is UB if only it is a singular value. But from What is singular and non-singular values in the context of STL iterators? it seems like it is (or was) associated with the container hence making it non-singular.
I'd appreciate further analysis on this, thank you.
Once your iterator has been invalidated, it may be UB to even compare it to something else:
[C++14: 24.2.1/10]: An invalid iterator is an iterator that may be singular.
[C++14: 24.2.1/5]: [..] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible requirements, using a value-initialized iterator as the source of a copy or move operation. [..]
Note that this means you also can't compare a default-constructed iterator to any .end().
Contrary to popular belief that "pointers are just memory addresses", these rules are also largely true for pointers. Indeed, the rules for iterators are a generalisation of the rules for pointers.
Formally any iterator pointing to an element on or after the erased element is invalidated. So
Yes this is UB (despite it being a pointer under the hood.)
UB again, despite the obvious plausibility.
Is it valid to create an iterator to end(str)+1 for std::string?
And if it isn't, why isn't it?
This question is restricted to C++11 and later, because while pre-C++11 the data was already stored in a continuous block in any but rare POC toy-implementations, the data didn't have to be stored that way.
And I think that might make all the difference.
The significant difference between std::string and any other standard container I speculate on is that it always contains one element more than its size, the zero-terminator, to fulfill the requirements of .c_str().
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.
Still, even though it should imho guarantee that said expression is valid, for consistency and interoperability with zero-terminated strings if nothing else, the only paragraph I found casts doubt on that:
21.4.1 basic_string general requirements [string.require]
4 The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
(All quotes are from C++14 final draft (n3936).)
Related: Legal to overwrite std::string's null terminator?
TL;DR: s.end() + 1 is undefined behavior.
std::string is a strange beast, mainly for historical reasons:
It attempts to bring C compatibility, where it is known that an additional \0 character exists beyond the length reported by strlen.
It was designed with an index-based interface.
As an after thought, when merged in the Standard library with the rest of the STL code, an iterator-based interface was added.
This led std::string, in C++03, to number 103 member functions, and since then a few were added.
Therefore, discrepancies between the different methods should be expected.
Already in the index-based interface discrepancies appear:
§21.4.5 [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1/ Requires: pos <= size()
const_reference at(size_type pos) const;
reference at(size_type pos);
5/ Throws: out_of_range if pos >= size()
Yes, you read this right, s[s.size()] returns a reference to a NUL character while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] by at because they are safer, beware the string trap...
So, what about iterators?
§21.4.3 [string.iterators]
iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;
2/ Returns: An iterator which is the past-the-end value.
Wonderfully bland.
So we have to refer to other paragraphs. A pointer is offered by
§21.4 [basic.string]
3/ The iterators supported by basic_string are random access iterators (24.2.7).
while §17.6 [requirements] seems devoid of anything related. Thus, strings iterators are just plain old iterators (you can probably sense where this is going... but since we came this far let's go all the way).
This leads us to:
24.2.1 [iterator.requirements.general]
5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]
So, *s.end() is ill-formed.
24.2.3 [input.iterators]
2/ Table 107 -- Input iterator requirements (in addition to Iterator)
List for pre-condition to ++r and r++ that r be dereferencable.
Neither the Forward iterators, Bidirectional iterators nor Random iterator lift this restriction (and all indicate they inherit the restrictions of their predecessor).
Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:
r += n is equivalent to [inc|dec]rememting r n times
a + n and n + a are equivalent to copying a and then applying += n to the copy
and similarly for -= n and - n.
Thus s.end() + 1 is undefined behavior.
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
std::string::operator[](size_type i) is specified to return "a reference to an object of type charT with value charT() when i == size(), so we know that that pointer points to an object.
5.7 states that "For the purposes of [operators + and -], a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."
So we have a non-array object and the spec guarantees that a pointer one past it will be representable. So we know std::addressof(*end(str)) + 1 has to be representable.
However that's not a guarantee on std::string::iterator, and there is no such guarantee anywhere in the spec, which makes it undefined behavior.
(Note that this is not the same as 'ill-formed'. *end(str) + 1 is in fact well-formed.)
Iterators can and do implement checking logic that produce various errors when you do things like increment the end() iterator. This is in fact what Visual Studios debug iterators do with end(str) + 1.
#define _ITERATOR_DEBUG_LEVEL 2
#include <string>
#include <iterator>
int main() {
std::string s = "ssssssss";
auto x = std::end(s) + 1; // produces debug dialog, aborts program if skipped
}
And if it isn't, why isn't it?
for consistency and interoperability with zero-terminated strings if nothing else
C++ specifies some specific things for compatibility with C, but such backwards compatibility is limited to supporting things that can actually be written in C. C++ doesn't necessarily try to take C's semantics and make new constructs behave in some analogous way. Should std::vector decay to an iterator just to be consistent with C's array decay behavior?
I'd say end(std) + 1 is left as undefined behavior because there's no value in trying to constrain std::string iterators this way. There's no legacy C code that does this that C++ needs to be compatible with and new code should be prevented from doing it.
New code should be prevented from relying on it... why? [...] What does not allowing it buy you in theory, and how does that look in practice?
Not allowing it means implementations don't have to support the added complexity, complexity which provides zero demonstrated value.
In fact it seems to me that supporting end(str) + 1 has negative value since code that tries to use it will essentially be creating the same problem as C code which can't figure out when to account for the null terminator or not. C has enough off by one buffer size errors for both languages.
A std::basic_string<???> is a container over its elements. Its elements do not include the trailing null that is implicitly added (it can include embedded nulls).
This makes lots of sense -- "for each character in this string" probably shouldn't return the trailing '\0', as that is really an implementation detail for compatibility with C style APIs.
The iterator rules for containers were based off of containers that don't shove an extra element at the end. Modifying them for std::basic_string<???> without motivation is questionable; one should only break a working pattern if there is a payoff.
There is every reason to think that pointers to .data() and .data() + .size() + 1 are allowed (I could imagine a twisted interpretation of the standard that would make it not allowed). So if you really need read-only iterators into the contents of a std::string, you can use pointer-to-const-elements (which are, after all, a kind of iterator).
If you want editable ones, then no, there is no way to get a valid iterator to one-past-the-end. Neither can you get a non-const reference to the trailing null legally. In fact, such access is clearly a bad idea; if you change the value of that element, you break the std::basic_string's invariant null-termination.
For there to be an iterator to one-past-the-end, the const and non-const iterators to the container would have to have a different valid range, or a non-const iterator to the last element that can be dereferenced but not written to must exist.
I shudder at making such standard wording watertight.
std::basic_string is already a mess. Making it even stranger would lead to standard bugs and would have a non-trivial cost. The benefit is really low; in the few cases where you want access to said trailing null in an iterator range, you can use .data() and use the resulting pointers as iterators.
I can't find a definitive answer, but indirect evidence points at end()+1 being undefined.
[string.insert]/15
constexpr iterator insert(const_iterator p, charT c);
Preconditions: p is a valid iterator on *this.
It would be unreasonable to expect this to work with end()+1 as the iterator, and it indeed causes a crash on both libstdc++ and libc++.
This means end()+1 is not a valid iterator, meaning end() is not incrementable.
I'm working with a std::map<std::string, MyClass* >.
I want to test if my_map.find(key) returned a specific pointer.
Right now I'm doing;
auto iter = my_map.find(key);
if ((iter != my_map.end()) && (iter->second == expected)) {
// Something wonderful has happened
}
However, the operator * of the iterator is required to return a reference. Intuitively I'm assuming it to be valid and fully initialized? If so, my_map.end()->second would be NULL, and (since NULL is never expected), I could reduce my if statement to:
if (iter->second == expected)
Is this valid according to specification? Does anyone have practical experience with the implementations of this? IMHO, the code becomes clearer, and possibly a tiny performance improvement could be achieved.
Intuitively I'm assuming it to be valid and fully initialized?
You cannot assume an iterator to an element past-the-end of a container to be dereferenceable. Per paragraph 24.2.1/5 of the C++11 Standard:
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element
of the array, so for any iterator type there is an iterator value that points past the last element of a
corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the
expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are
dereferenceable. [...]
However, the operator *of the iterator is required to return a reference. Intuitively I'm assuming it to be valid and fully initialized?
Your assumption is wrong, dereferencing iterator that points outside of container will lead to UB.
24.2 Iterator requirements [iterator.requirements]
24.2.1 In general [iterator.requirements.general]
7 Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges.
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an
empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element
pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if
j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
Even without checking the specs, you can easily see that dereferencing an iterator at end has to be invalid.
A perfectly natural implementation (the de-factor standard implementation for vector<>) is for end() to be literally a memory pointer that has a value of ptr_last_element + 1, that is, the pointer value that would point to the next element - if there was a next element.
You cannot possibly be allowed to dereference the end iterator because it could be a pointer that would end up pointing to either the next object in the heap, or perhaps an overflow guard area (so you would dereference random memory), or past the end of the heap, and possibly outside of the memory space of the process, in which case you might get an Access Violation exception when dereferencing).
If iter == my_map.end (), then dereferencing it is undefined behavior; but you're not doing that here.
auto iter = my_map.find(key);
if ((iter != my_map.end()) && (iter->second == expected)) {
// Something wonderful has happened
}
If iter != my_map.end() is false, then the second half of the expression (iter->second == expected) will not be exectuted.
Read up on "short-circut evaluation".
Analogous valid code for pointers:
if ( p != NULL && *p == 4 ) {}
According to C++ standard (3.7.3.2/4) using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior (in case of doubt also see this question). Now the typical code to traverse an STL containter looks like this:
std::vector<int> toTraverse;
//populate the vector
for( std::vector<int>::iterator it = toTraverse.begin(); it != toTraverse.end(); ++it ) {
//process( *it );
}
std::vector::end() is an iterator onto the hypothetic element beyond the last element of the containter. There's no element there, therefore using a pointer through that iterator is undefined behavior.
Now how does the != end() work then? I mean in order to do the comparison an iterator needs to be constructed wrapping an invalid address and then that invalid address will have to be used in a comparison which again is undefined behavior. Is such comparison legal and why?
The only requirement for end() is that ++(--end()) == end(). The end() could simply be a special state the iterator is in. There is no reason the end() iterator has to correspond to a pointer of any kind.
Besides, even if it were a pointer, comparing two pointers doesn't require any sort of dereference anyway. Consider the following:
char[5] a = {'a', 'b', 'c', 'd', 'e'};
char* end = a+5;
for (char* it = a; it != a+5; ++it);
That code will work just fine, and it mirrors your vector code.
You're right that an invalid pointer can't be used, but you're wrong that a pointer to an element one past the last element in an array is an invalid pointer - it's valid.
The C standard, section 6.5.6.8 says that it's well defined and valid:
...if the expression P points to the
last element of an array object, the
expression (P)+1 points one past the
last element of the array object...
but cannot be dereferenced:
...if the result points one past the
last element of the array object, it
shall not be used as the operand of a
unary * operator that is evaluated...
One past the end is not an invalid value (neither with regular arrays or iterators). You can't dereference it but it can be used for comparisons.
std::vector<X>::iterator it;
This is a singular iterator. You can only assign a valid iterator to it.
std::vector<X>::iterator it = vec.end();
This is a perfectly valid iterator. You can't dereference it but you can use it for comparisons and decrement it (assuming the container has a sufficient size).
Huh? There's no rule that says that iterators need to be implemented using nothing but a pointer.
It could have a boolean flag in there, which gets set when the increment operation sees that it passes the end of the valid data, for instance.
The implementation of a standard library's container's end() iterator is, well, implementation-defined, so the implementation can play tricks it knows the platform to support.
If you implemented your own iterators, you can do whatever you want - so long as it is standard-conform. For example, your iterator, if storing a pointer, could store a NULL pointer to indicate an end iterator. Or it could contain a boolean flag or whatnot.
I answer here since other answers are now out-of-date; nevertheless, they were not quite right to the question.
First, C++14 has changed the rules mentioned in the question. Indirection through an invalid pointer value or passing an invalid pointer value to a deallocation function are still undefined, but other operations are now implemenatation-defined, see Documentation of "invalid pointer value" conversion in C++ implementations.
Second, words matter. You can't bypass the definitions while applying the rules. The key point here is the definition of "invalid". For iterators, this is defined in [iterator.requirements]. Though pointers are iterators, meanings of "invalid" to them are subtly different. Rules for pointers render "invalid" as "don't indirect through invalid value", which is a special case of "not dereferenceable" to iterators; however, "not deferenceable" is not implying "invalid" for iterators. "Invalid" is explicitly defined as "may be singular", while "singular" value is defined as "not associated with any sequence" (in the same paragraph of definition of "dereferenceable"). That paragraph even explicitly defined "past-the-end values".
From the text of the standard in [iterator.requirements], it is clear that:
Past-the-end values are not assumed to be dereferenceable (at least by the standard library), as the standard states.
Dereferenceable values are not singular, since they are associated with sequence.
Past-the-end values are not singular, since they are associated with sequence.
An iterator is not invalid if it is definitely not singular (by negation on definition of "invalid iterator"). In other words, if an iterator is associated to a sequence, it is not invalid.
Value of end() is a past-the-end value, which is associated with a sequence before it is invalidated. So it is actually valid by definition. Even with misconception on "invalid" literally, the rules of pointers are not applicable here.
The rules allowing == comparison on such values are in input iterator requirements, which is inherited by some other category of iterators (forward, bidirectional, etc). More specifically, valid iterators are required to be comparable in the domain of the iterator in such way (==). Further, forward iterator requirements specifies the domain is over the underlying sequence. And container requirements specifies the iterator and const_iterator member types in any iterator category meets forward iterator requirements. Thus, == on end() and iterator over same container is required to be well-defined. As a standard container, vector<int> also obey the requirements. That's the whole story.
Third, even when end() is a pointer value (this is likely to happen with optimized implementation of iterator of vector instance), the rules in the question are still not applicable. The reason is mentioned above (and in some other answers): "invalid" is concerned with *(indirect through), not comparison. One-past-end value is explicitly allowed to be compared in specified ways by the standard. Also note ISO C++ is not ISO C, they also subtly mismatches (e.g. for < on pointer values not in the same array, unspecified vs. undefined), though they have similar rules here.
Simple. Iterators aren't (necessarily) pointers.
They have some similarities (i.e. you can dereference them), but that's about it.
Besides what was already said (iterators need not be pointers), I'd like to point out the rule you cite
According to C++ standard (3.7.3.2/4)
using (not only dereferencing, but
also copying, casting, whatever else)
an invalid pointer is undefined
behavior
wouldn't apply to end() iterator anyway. Basically, when you have an array, all the pointers to its elements, plus one pointer past-the-end, plus one pointer before the start of the array, are valid. That means:
int arr[5];
int *p=0;
p==arr+4; // OK
p==arr+5; // past-the-end, but OK
p==arr-1; // also OK
p==arr+123456; // not OK, according to your rule