int main()
{
string s("some string");
if (s.begin() != s.end())
auto it = s.begin();
*it = toupper (*it) ; // Error ; the identifier "it" is undefined
}
Why is it undefined? And why do we need to dereference the iterator?
Short
The code doesn't work because it is defined in the scope of the if branch.
Dereferencing is required to access the actual "content" that the iterator refers to.
Explanation
1. Why is it undefined after the selection statement?
The standard says (emphasis mine) in § 6.4 / 1:
The substatement in a selection-statement (each substatement, in the else form of the if statement) implicitly defines a block scope (3.3). If the substatement in a selection-statement is a single statement and not a compound-statement, it is as if it was rewritten to be a compound-statement containing the original substatement.
The example given in the standard matches the case at hand.
Example:
if (x)
int i;
can be equivalently rewritten as
if (x)
{
int i;
}
Now we need to know how the (implicit) block scope affects the "visibility" of the variable by looking at the referenced § 3.3 (again emphasis mine):
A name declared in a block (6.3) is local to that block; it has block scope. Its potential scope begins at its point of declaration (3.3.2) and ends at the end of its block. A variable declared at block scope is a local variable.
=>There you go: Your variable it is in a block and the scope of it ends at the end of the block. Thus it is undefined after that block.
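If the iterator is needed after the check, one option is to declare it before the if statement, so that its scope is the enclosing block. A minimal sketch, assuming a hypothetical helper named capitalize_first that is not part of the original question:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases the first character of s, if there is one.
std::string capitalize_first(std::string s)
{
    // Declared in the function's block, so it stays visible after the if statement.
    auto it = s.begin();
    if (it != s.end())
    {
        // The unsigned char cast avoids passing a negative value to std::toupper.
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
    // it is still in scope here (though only dereferenceable if s is non-empty).
    return s;
}
```

Calling capitalize_first("some string") yields a string that starts with an upper-case S, and an empty string is returned unchanged because the iterator is never dereferenced in that case.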
2. Why is an iterator to be dereferenced to access the content?
The concept of iterators is built to abstract pointers, as described in chapter 24 (Iterators library) of the standard. Thus, an iterator refers to something in the same way a pointer refers to a value.
Dereferencing is the C++ way of accessing the actual referenced value.
Dereferencing a pointer means "give me the value stored at the memory address held by the pointer", whereas dereferencing an iterator means "give me the value stored at the position the iterator logic refers to".
(Note: A pointer is just a special kind of iterator.)
The standard requires every iterator type to define the dereferencing operation.
§ 24.2.1 / 1
All input iterators i support the expression *i, resulting in a value of some object type T, called the value type of the iterator. All output iterators support the expression *i = o where o is a value of some type that is in the set of types that are writable to the particular iterator type of i.
Every iterator type satisfies at least the input or the output iterator requirements.
§ 24.2.1 / 2: Types of iterators
This International Standard defines five categories of iterators, according to the operations defined on them: input iterators, output iterators, forward iterators, bidirectional iterators and random access iterators[.]
§ 24.2.1 / 3: Correlation of iterators types
Forward iterators satisfy all the requirements of input iterators and can be used whenever an input iterator is specified; Bidirectional iterators also satisfy all the requirements of forward iterators and can be used whenever a forward iterator is specified; Random access iterators also satisfy all the requirements of bidirectional iterators and can be used whenever a bidirectional iterator is specified.
=> Iterators provide indirection (in a more generalized fashion than pointers do) and need to be dereferenced to follow the indirection, accessing their values.
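As an illustration of that generalization, the sketch below sums a range using nothing but !=, ++ and *. The names sum_range, sum_array and sum_list are made up for this example; the same template works for a raw pointer and for a std::list iterator:

```cpp
#include <iterator>
#include <list>

// Sums [first, last) using only the operations every input iterator supports.
template <typename InputIt>
typename std::iterator_traits<InputIt>::value_type sum_range(InputIt first, InputIt last)
{
    typename std::iterator_traits<InputIt>::value_type total{};
    for (; first != last; ++first)
        total += *first;   // dereference to follow the indirection
    return total;
}

// Pointers into an array act as iterators.
int sum_array()
{
    int values[] = {1, 2, 3};
    return sum_range(values, values + 3);
}

// std::list iterators are not pointers, but support the same expressions.
int sum_list()
{
    std::list<int> values{10, 20, 30};
    return sum_range(values.begin(), values.end());
}
```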
3. Working example
#include <string>
#include <cctype>
int main()
{
std::string s("some string");
if (s.begin() != s.end())
{
auto it = s.begin();
*it = std::toupper(*it);
}
// it is not defined here
return 0;
}
Related
Code example:
list<int> mylist{10, 20, 30, 40};
auto p = mylist.end();
while (true)
{
p++;
if (p == mylist.end()) // skip sentinel
continue;
cout << *p << endl;
}
I wonder how legal this code is from the standard's (C++17, n4810) point of view.
I am looking for the bidirectional iterator requirements that apply to the example above, but with no luck.
My question is:
Is the ability to step past end() an implementation detail or a standard requirement?
Quoting from the latest draft available online.
[iterator.requirements.general]/7
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable.
I believe that this applies not just to the end() but what comes after that as well. Note that the standard does not clearly state that end() should never be dereferenced.
The Cpp17Iterator requirements table also states that for the expression *r, r must be dereferenceable.
A past-the-end iterator is considered a non-incrementable iterator, and incrementing it (as you do at the beginning of the while loop) results in undefined behavior.
Something like what you are trying to do can also happen when using std::advance.
The book "The C++ Standard Library: A Tutorial and Reference" by Nicolai Josuttis has this quote:
Note that advance() does not check whether it crosses the end() of a sequence (it can't check because iterators in general do not know the containers on which they operate). Thus, calling this function might result in undefined behavior because calling operator ++ for the end of a sequence is not defined.
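One way to avoid that pitfall is to pass the end of the range along and stop there. A sketch, assuming a hypothetical helper advance_bounded (not a standard function) that reports how many steps were actually taken:

```cpp
#include <cstddef>
#include <list>

// Hypothetical helper: advances it by at most n steps, never moving past last.
// Returns the number of increments actually performed.
template <typename InputIt>
std::size_t advance_bounded(InputIt& it, InputIt last, std::size_t n)
{
    std::size_t moved = 0;
    while (moved < n && it != last)
    {
        ++it;      // safe: it != last, so it is incrementable
        ++moved;
    }
    return moved;
}

// Example use on a four-element list.
std::size_t steps_taken(std::size_t requested)
{
    std::list<int> values{10, 20, 30, 40};
    auto it = values.begin();
    return advance_bounded(it, values.end(), requested);
}
```

Asking for more steps than the list has elements simply stops at end(), instead of relying on what ++ does to a past-the-end iterator.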
Your code is illegal. You first initialize p to the past-the-end iterator.
auto p = mylist.end();
Now you evaluate p++. Per Table 76,
the operational semantics of r++ is:
{ X tmp = r;
++r;
return tmp; }
And per [Table 74],
++r
Expects: r is dereferenceable.
And per [iterator.requirements.general]/7,
The library never assumes that past-the-end values are
dereferenceable.
In other words, incrementing a past-the-end iterator as you did is undefined behavior.
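If the goal is to cycle through the list repeatedly, a sketch like the following avoids ever incrementing the past-the-end iterator by wrapping around to begin() instead. The helpers cyclic_next and collect_cyclic are hypothetical names introduced for this example:

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper: moves to the next element, wrapping to begin().
// Precondition: c is not empty and it is dereferenceable (it != c.end()).
std::list<int>::const_iterator cyclic_next(const std::list<int>& c,
                                           std::list<int>::const_iterator it)
{
    ++it;                                    // fine: it was dereferenceable
    return it == c.end() ? c.begin() : it;   // never step beyond end()
}

// Visits `steps` elements in cyclic order and returns what was seen.
std::vector<int> collect_cyclic(const std::list<int>& c, std::size_t steps)
{
    std::vector<int> seen;
    if (c.empty())
        return seen;                         // nothing to dereference
    auto it = c.begin();
    for (std::size_t i = 0; i < steps; ++i)
    {
        seen.push_back(*it);
        it = cyclic_next(c, it);
    }
    return seen;
}
```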
Does the following program invoke undefined behavior?
#include <iostream>
#include <iterator>
#include <string>
int main(int argc, char* argv[])
{
for (auto it = std::istream_iterator<std::string>(std::cin);
it != std::istream_iterator<std::string>();
++it)
{
std::cout << *it << " ";
}
return 0;
}
This 4 year old question says that they can't be compared:
Iterators can also have singular values that are not associated with
any container. [Example: After the declaration of an uninitialized
pointer x (as with int* x;), x must always be assumed to have a
singular value of a pointer. ] Results of most expressions are
undefined for singular values; the only exception is an assignment
of a non-singular value to an iterator that holds a singular value.
But another answer says, for the C++14 standard:
However, value-initialized iterators may be compared and shall compare
equal to other value-initialized iterators of the same type.
You are conflating two different issues.
istream_iterator is an input iterator, not a forward iterator, so the C++14 change you cited doesn't apply to it at all. You are allowed to compare istream_iterators in that manner because they are explicitly specified to allow such comparisons. The standard says that (§24.6.1 [istream.iterator])
The constructor with no arguments istream_iterator() always
constructs an end-of-stream input iterator object, which is the only
legitimate iterator to be used for the end condition. [...]
Two end-of-stream iterators are always equal. An end-of-stream
iterator is not equal to a non-end-of-stream iterator. Two
non-end-of-stream iterators are equal when they are constructed from
the same stream.
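The end-of-stream comparison can be sketched as follows. split_words is a hypothetical helper, and a std::istringstream stands in for std::cin so that the example does not depend on console input:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated words from text.
std::vector<std::string> split_words(const std::string& text)
{
    std::istringstream in(text);
    std::istream_iterator<std::string> it(in);
    // Default-constructed: the end-of-stream iterator.
    const std::istream_iterator<std::string> end_of_stream;

    std::vector<std::string> words;
    for (; it != end_of_stream; ++it)   // comparison explicitly allowed for istream_iterator
        words.push_back(*it);
    return words;
}
```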
For forward iterators (which also includes bidirectional and random access ones) in general, value-initialized iterators are made comparable to each other in C++14. If your standard library implements it, then you can compare two value-initialized iterators. This allows you to create an empty range without an underlying container. However, you are still not allowed to compare a non-singular iterator to a value-initialized iterator. The following code has undefined behavior even in C++14:
std::list<int> l;
if(l.begin() == std::list<int>::iterator())
foo();
else
bar();
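By contrast, comparing two value-initialized iterators of the same type is allowed from C++14 on. A minimal sketch, assuming a standard library that implements this rule (the function name is made up for the example):

```cpp
#include <list>

// Compares two value-initialized iterators of the same type.
// Assumes a C++14 (or later) standard library.
bool value_initialized_compare_equal()
{
    std::list<int>::iterator a{};   // value-initialized
    std::list<int>::iterator b{};   // value-initialized
    return a == b;                  // C++14: shall compare equal
}
```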
I have two questions regarding the following examples:
1)
std::vector<int> v(5,1);
cout << *v.end();
Is the printed result undefined (does it depend on the compiler)?
2)
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << *pv.end();
Is the printed result undefined (does it depend on the compiler), or is it NULL?
There is no item at end(); it is an iterator to the position right after the last valid item in your vector.
*v.end();
It's undefined behavior. You can use end() only for comparison, to check whether another iterator has moved past the last item.
An easy way to access the value of the last item is back(), for example:
cout << v.back();
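Note that back() itself requires a non-empty vector. A small sketch, assuming a hypothetical helper last_or that returns a fallback value for an empty vector:

```cpp
#include <vector>

// Hypothetical helper: last element of v, or fallback if v is empty.
int last_or(const std::vector<int>& v, int fallback)
{
    // back() on an empty vector is undefined behavior, so check first.
    return v.empty() ? fallback : v.back();
}
```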
The end() iterator points to a position that is one element after the last element of the container. Accessing the data that it points to will invoke undefined behavior and this is the case in both your examples.
Dereferencing past the end will probably end badly, but it looks like it is implementation-defined. In the draft C++ standard, section 24.2 Iterator requirements, subsection 24.2.1 In general, paragraph 5 says (emphasis mine):
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any sequence. [ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] Results of most expressions are undefined for singular values; [...] Dereferenceable values are always non-singular.
Firstly, in both cases the behavior is undefined. Note that it is not "the printed result" that is undefined. Your code does not even get a chance to print anything. Merely applying the * operator to the end iterator already causes undefined behavior. E.g. this alone
*v.end();
is already undefined behavior.
Secondly, undefined in this case does not mean "depends on the compiler". Implementation-defined behavior depends on the compiler. Undefined means "completely unpredictable", even if you are using the same compiler.
P.S. There seems to be a bit of ongoing work in the standard committee with regard to some closely related issues.
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#208
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1213
Hopefully it will result in a clearer specification of what is legal and what is not for past-the-end iterators. But it is clear that in the general case a past-the-end iterator can legally be a singular iterator, meaning that in the general case it can be non-dereferenceable.
Yes, both of those are undefined.
vector::end - Return iterator to end (public member function)
You can read more here.
Your first example:
std::vector<int> v(5,1);
cout << *(v.end()-1);
Dereferencing v.end() itself is undefined: v.end() points to the position after the last element, and if the container is empty, it returns the same as v.begin(). The last element is at v.end()-1, as used above.
And your second example:
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << **(pv.end()-1);
According to the C++ standard (3.7.3.2/4), using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior (in case of doubt also see this question). Now the typical code to traverse an STL container looks like this:
std::vector<int> toTraverse;
//populate the vector
for( std::vector<int>::iterator it = toTraverse.begin(); it != toTraverse.end(); ++it ) {
//process( *it );
}
std::vector::end() is an iterator to the hypothetical element beyond the last element of the container. There's no element there, therefore using a pointer obtained through that iterator is undefined behavior.
Now how does the != end() comparison work then? I mean, in order to do the comparison, an iterator needs to be constructed wrapping an invalid address, and then that invalid address has to be used in a comparison, which again is undefined behavior. Is such a comparison legal, and why?
The only requirement for end() is that ++(--end()) == end(). The end() could simply be a special state the iterator is in. There is no reason the end() iterator has to correspond to a pointer of any kind.
Besides, even if it were a pointer, comparing two pointers doesn't require any sort of dereference anyway. Consider the following:
char a[5] = {'a', 'b', 'c', 'd', 'e'};
char* end = a + 5; // one past the last element: valid to form and compare
for (char* it = a; it != end; ++it);
That code will work just fine, and it mirrors your vector code.
You're right that an invalid pointer can't be used, but you're wrong that a pointer to an element one past the last element in an array is an invalid pointer - it's valid.
The C standard, section 6.5.6/8, says that it's well defined and valid:
...if the expression P points to the
last element of an array object, the
expression (P)+1 points one past the
last element of the array object...
but cannot be dereferenced:
...if the result points one past the
last element of the array object, it
shall not be used as the operand of a
unary * operator that is evaluated...
One past the end is not an invalid value (with neither regular arrays nor iterators). You can't dereference it, but it can be used for comparisons.
std::vector<X>::iterator it;
This is a singular iterator. You can only assign a valid iterator to it.
std::vector<X>::iterator it = vec.end();
This is a perfectly valid iterator. You can't dereference it but you can use it for comparisons and decrement it (assuming the container has a sufficient size).
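As a sketch of the "compare and decrement" usage, the hypothetical helper reversed_copy below starts at end() and walks backwards. It only dereferences after decrementing, and the comparison against begin() guards the empty case:

```cpp
#include <vector>

// Hypothetical helper: returns the elements of v in reverse order.
std::vector<int> reversed_copy(const std::vector<int>& v)
{
    std::vector<int> out;
    auto it = v.end();            // valid, but not dereferenceable
    while (it != v.begin())       // comparing the end iterator is fine
    {
        --it;                     // now it refers to an element
        out.push_back(*it);
    }
    return out;
}
```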
Huh? There's no rule that says that iterators need to be implemented using nothing but a pointer.
It could have a boolean flag in there, which gets set when the increment operation sees that it passes the end of the valid data, for instance.
The implementation of a standard library's container's end() iterator is, well, implementation-defined, so the implementation can play tricks it knows the platform to support.
If you implemented your own iterators, you can do whatever you want, so long as it is standard-conforming. For example, your iterator, if storing a pointer, could store a NULL pointer to indicate an end iterator. Or it could contain a boolean flag or whatnot.
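A minimal sketch of the null-pointer variant, for a hand-written singly linked list. Node, NodeIterator and sum_three_nodes are hypothetical names for this illustration and do not describe how any standard library implements its iterators:

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical singly linked node.
struct Node
{
    int value;
    Node* next;
};

// Forward iterator whose end state is simply a null node pointer.
class NodeIterator
{
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int*;
    using reference = int&;

    NodeIterator() = default;                         // end iterator
    explicit NodeIterator(Node* node) : node_(node) {}

    reference operator*() const { return node_->value; }
    pointer operator->() const { return &node_->value; }
    NodeIterator& operator++() { node_ = node_->next; return *this; }
    NodeIterator operator++(int) { NodeIterator tmp = *this; ++*this; return tmp; }

    // Comparison only looks at the stored pointer; nothing is dereferenced.
    friend bool operator==(const NodeIterator& a, const NodeIterator& b) { return a.node_ == b.node_; }
    friend bool operator!=(const NodeIterator& a, const NodeIterator& b) { return !(a == b); }

private:
    Node* node_ = nullptr;
};

// Builds the list 1 -> 2 -> 3 and sums it with the iterator.
int sum_three_nodes()
{
    Node c{3, nullptr};
    Node b{2, &c};
    Node a{1, &b};
    int total = 0;
    for (NodeIterator it(&a), end; it != end; ++it)
        total += *it;
    return total;
}
```

Here the end iterator does not wrap any address at all, which is one way to see why comparing against end() need not involve an invalid pointer.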
I answer here since the other answers are now out of date; besides, they did not quite address the question.
First, C++14 has changed the rules mentioned in the question. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function are still undefined, but other operations are now implementation-defined; see Documentation of "invalid pointer value" conversion in C++ implementations.
Second, words matter. You can't bypass the definitions while applying the rules. The key point here is the definition of "invalid". For iterators, this is defined in [iterator.requirements]. Though pointers are iterators, the meaning of "invalid" is subtly different for them. The rules for pointers render "invalid" as "don't indirect through an invalid value", which is a special case of "not dereferenceable" for iterators; however, "not dereferenceable" does not imply "invalid" for iterators. "Invalid" is explicitly defined as "may be singular", while a "singular" value is defined as "not associated with any sequence" (in the same paragraph as the definition of "dereferenceable"). That paragraph even explicitly defines "past-the-end values".
From the text of the standard in [iterator.requirements], it is clear that:
Past-the-end values are not assumed to be dereferenceable (at least by the standard library), as the standard states.
Dereferenceable values are not singular, since they are associated with a sequence.
Past-the-end values are not singular, since they are associated with a sequence.
An iterator is not invalid if it is definitely not singular (by negating the definition of "invalid iterator"). In other words, if an iterator is associated with a sequence, it is not invalid.
The value of end() is a past-the-end value, which is associated with a sequence until it is invalidated. So it is actually valid by definition. Even under a literal misreading of "invalid", the rules for pointers are not applicable here.
The rules allowing == comparison on such values are in the input iterator requirements, which are inherited by the other iterator categories (forward, bidirectional, etc.). More specifically, valid iterators are required to be comparable with == within the domain of the iterator. Further, the forward iterator requirements specify that the domain is the underlying sequence. And the container requirements specify that the iterator and const_iterator member types, whatever their iterator category, meet the forward iterator requirements. Thus, == on end() and an iterator over the same container is required to be well-defined. As a standard container, vector<int> also obeys these requirements. That's the whole story.
Third, even when end() is a pointer value (this is likely with an optimized implementation of the vector iterator), the rules in the question are still not applicable. The reason is mentioned above (and in some other answers): "invalid" is concerned with * (indirection), not comparison. A one-past-the-end value is explicitly allowed by the standard to be compared in the specified ways. Also note that ISO C++ is not ISO C; the two subtly differ (e.g. for < on pointer values not in the same array: unspecified vs. undefined), though they have similar rules here.
Simple. Iterators aren't (necessarily) pointers.
They have some similarities (i.e. you can dereference them), but that's about it.
Besides what was already said (iterators need not be pointers), I'd like to point out the rule you cite
According to C++ standard (3.7.3.2/4)
using (not only dereferencing, but
also copying, casting, whatever else)
an invalid pointer is undefined
behavior
wouldn't apply to the end() iterator anyway. Basically, when you have an array, all the pointers to its elements, plus one pointer past-the-end, are valid. A pointer before the start of the array is not. That means:
int arr[5];
int *p=0;
p==arr+4; // OK
p==arr+5; // past-the-end, but OK
p==arr-1; // not OK: there is no valid pointer before the start of the array
p==arr+123456; // not OK, according to your rule
Does the C++ Standard say I should be able to compare two default-constructed STL iterators for equality? Are default-constructed iterators equality-comparable?
I want the following, using std::list for example:
void foo(const std::list<int>::iterator iter) {
if (iter == std::list<int>::iterator()) {
// Something
}
}
std::list<int>::iterator i;
foo(i);
What I want here is something like a NULL value for iterators, but I'm not sure if it's legal. In the STL implementation included with Visual Studio 2008, they include assertions in std::list's operator==() that preclude this usage. (They check that each iterator is "owned" by the same container and default-constructed iterators have no container.) This would hint that it's not legal, or perhaps that they're being over-zealous.
OK, I'll take a stab. The C++ Standard, Section 24.1/5:
Iterators can also have singular
values that are not associated with
any container. [Example: After the
declaration of an uninitialized
pointer x (as with int* x;), x must
always be assumed to have a singular
value of a pointer. ] Results of most
expressions are undefined for singular
values; the only exception is an
assignment of a non-singular value to
an iterator that holds a singular
value.
So, no, they can't be compared.
This is going to change in C++14. [forward.iterators] 24.2.5p2 of N3936 says
However, value-initialized iterators may be compared and shall compare
equal to other value-initialized iterators of the same type.
I believe you should pass a range to the function.
void fun(std::list<int>::iterator beg, std::list<int>::iterator end)
{
while(beg != end)
{
// do what you want here.
beg++;
}
}
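If the goal is a "null" value for iterators, a common alternative is to let end() of the same container play that role, which keeps every comparison within one sequence. A sketch with a hypothetical helper find_value_or:

```cpp
#include <algorithm>
#include <list>

// Hypothetical helper: returns the first element equal to wanted,
// or fallback if there is none. end() serves as the "no element" marker.
int find_value_or(const std::list<int>& values, int wanted, int fallback)
{
    const auto it = std::find(values.begin(), values.end(), wanted);
    if (it == values.end())   // both iterators belong to the same container
        return fallback;
    return *it;               // dereferenceable: it refers to an element
}
```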
The specification says that the postcondition of the default constructor is that the iterator is singular. Comparing singular iterators for equality is undefined, so the result may differ between implementations.