I'm working with a std::map<std::string, MyClass* >.
I want to test if my_map.find(key) returned a specific pointer.
Right now I'm doing:
auto iter = my_map.find(key);
if ((iter != my_map.end()) && (iter->second == expected)) {
// Something wonderful has happened
}
However, the operator * of the iterator is required to return a reference. Intuitively I'm assuming it to be valid and fully initialized? If so, my_map.end()->second would be NULL, and (since NULL is never expected), I could reduce my if statement to:
if (iter->second == expected)
Is this valid according to specification? Does anyone have practical experience with the implementations of this? IMHO, the code becomes clearer, and possibly a tiny performance improvement could be achieved.
Intuitively I'm assuming it to be valid and fully initialized?
You cannot assume an iterator to an element past-the-end of a container to be dereferenceable. Per paragraph 24.2.1/5 of the C++11 Standard:
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element
of the array, so for any iterator type there is an iterator value that points past the last element of a
corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the
expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are
dereferenceable. [...]
However, the operator *of the iterator is required to return a reference. Intuitively I'm assuming it to be valid and fully initialized?
Your assumption is wrong: dereferencing an iterator that points outside the container leads to undefined behavior.
24.2 Iterator requirements [iterator.requirements]
24.2.1 In general [iterator.requirements.general]
7 Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges.
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an
empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element
pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if
j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
Even without checking the specs, you can easily see that dereferencing an iterator at end has to be invalid.
A perfectly natural implementation (the de facto standard implementation for vector<>) is for end() to be literally a pointer whose value is ptr_last_element + 1, that is, the pointer value that would point to the next element if there were one.
You cannot possibly be allowed to dereference the end iterator, because it could be a pointer that ends up pointing to the next object in the heap, or to an overflow guard area (so you would dereference random memory), or past the end of the heap and possibly outside the memory space of the process (in which case you might get an Access Violation exception when dereferencing).
If iter == my_map.end(), then dereferencing it is undefined behavior; but you're not doing that here.
auto iter = my_map.find(key);
if ((iter != my_map.end()) && (iter->second == expected)) {
// Something wonderful has happened
}
If iter != my_map.end() is false, then the second half of the expression (iter->second == expected) will not be executed.
Read up on "short-circuit evaluation".
Analogous valid code for pointers:
if ( p != NULL && *p == 4 ) {}
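If the two-part condition feels noisy at the call site, one option is to wrap it in a small helper. The following is only a sketch: maps_to is a hypothetical name, and MyClass is an empty stand-in for the class in the question.

```cpp
#include <cassert>
#include <map>
#include <string>

struct MyClass {};  // stand-in for the class from the question

// Hypothetical helper: true only if `key` is present and maps to `expected`.
// The end() test runs first, so iter->second is never evaluated for a
// past-the-end iterator (short-circuit evaluation of &&).
bool maps_to(const std::map<std::string, MyClass*>& my_map,
             const std::string& key, const MyClass* expected) {
    auto iter = my_map.find(key);
    return iter != my_map.end() && iter->second == expected;
}

// Usage sketch: call this from main() to exercise the helper.
void usage_example() {
    MyClass a, other;
    std::map<std::string, MyClass*> my_map{{"a", &a}};
    assert(maps_to(my_map, "a", &a));
    assert(!maps_to(my_map, "a", &other));
    assert(!maps_to(my_map, "missing", &a));  // safe: end() is never dereferenced
}
```

The call site then reads if (maps_to(my_map, key, expected)), which keeps the clarity the question was after without relying on end()->second.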
I want to remove elements from a container (for now it is an unordered_set) that meet a certain condition:
for (auto it = windows.begin(); it != windows.end(); ) {
if ((*it)->closed() == 0)
it = windows.erase(it);
else
++it;
}
I know that erase(it) returns the position immediately following the last of the elements erased, but:
Does the standard guarantee that erase will not rearrange the remaining elements while I am iterating? Is it always safe for all containers and all platforms? Perhaps there is some magic implementation for a certain type of container on a certain platform.
The C++ standard requires that unordered_set::erase preserve the order of remaining elements, and return an iterator immediately following those being erased. Therefore, the loop you show is well-defined.
[unord.req]/14 ... The erase members shall invalidate only iterators and references to the erased elements, and preserve the relative order of the elements that are not erased.
[unord.req]/11, Table 91 a.erase(q) Erases the element pointed to by q. Returns the iterator immediately following q prior to the erasure.
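As a self-contained illustration of the same erase-while-iterating pattern, here is a minimal sketch on an unordered_set<int>. The function name erase_even and the "remove even values" condition are made up for the example; they are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Removes every even value from `s` and returns how many were removed.
std::size_t erase_even(std::unordered_set<int>& s) {
    std::size_t removed = 0;
    for (auto it = s.begin(); it != s.end(); ) {
        if (*it % 2 == 0) {
            it = s.erase(it);  // continue from the iterator that erase returns
            ++removed;
        } else {
            ++it;              // only advance manually when nothing was erased
        }
    }
    return removed;
}

// Usage sketch: call this from main() to exercise the loop.
void usage_example() {
    std::unordered_set<int> s{1, 2, 3, 4, 5, 6};
    assert(erase_even(s) == 3);
    assert(s.size() == 3);
    assert(s.count(2) == 0 && s.count(1) == 1);
}
```

The important detail is that the erased iterator is never used again: the loop always continues from the value returned by erase.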
Code example:
list<int> mylist{10, 20, 30, 40};
auto p = mylist.end();
while (true)
{
p++;
if (p == mylist.end()) // skip sentinel
continue;
cout << *p << endl;
}
I wonder how legal this code is from the standard's (C++17, n4810) point of view.
I have been looking for bidirectional iterator requirements related to the example above, but have had no luck.
My question is:
Is the ability to pass through end() an implementation detail or a standard requirement?
Quoting from the latest draft available online.
[iterator.requirements.general]/7
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable.
I believe that this applies not just to the end() but what comes after that as well. Note that the standard does not clearly state that end() should never be dereferenced.
And the Cpp17Iterator requirements table states that for the expression *r, r should be dereferenceable.
A past-the-end iterator is considered a non-incrementable iterator, and incrementing it (as you are doing at the beginning of the while loop) results in undefined behavior.
Something like what you are trying to do can also happen when using std::advance.
The book "The C++ Standard Library: A Tutorial and Reference" by Nicolai Josuttis has this quote:
Note that advance() does not check whether it crosses the end() of a sequence (it can't check because iterators in general do not know the containers on which they operate). Thus, calling this function might result in undefined behavior because calling operator ++ for the end of a sequence is not defined.
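Because advance() cannot check the bound itself, the caller has to supply it. A minimal sketch of a bounded advance follows; advance_bounded is a hypothetical helper written for this example, not a standard function (C++20 offers std::ranges::advance with a bound argument for the same purpose).

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Advances `it` by at most `n` steps (n >= 0) but never past `last`.
// Returns the number of steps actually taken.
template <class It>
typename std::iterator_traits<It>::difference_type
advance_bounded(It& it, It last,
                typename std::iterator_traits<It>::difference_type n) {
    typename std::iterator_traits<It>::difference_type taken = 0;
    while (taken < n && it != last) {  // compare against the bound before every ++
        ++it;
        ++taken;
    }
    return taken;
}

// Usage sketch: call this from main() to exercise the helper.
void usage_example() {
    std::list<int> mylist{10, 20, 30, 40};
    auto p = mylist.begin();
    assert(advance_bounded(p, mylist.end(), 10) == 4);  // stops at end()
    assert(p == mylist.end());
    assert(advance_bounded(p, mylist.end(), 1) == 0);   // end() is never incremented
}
```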
Your code is illegal. You first initialized p to be the past-the-end iterator.
auto p = mylist.end();
Now you do p++. Per Table 76,
the operational semantics of r++ is:
{ X tmp = r;
++r;
return tmp; }
And per [Table 74],
++r
Expects: r is dereferenceable.
And per [iterator.requirements.general]/7,
The library never assumes that past-the-end values are
dereferenceable.
In other words, incrementing a past-the-end iterator as you did is undefined behavior.
I have 2 questions regarding following examples:
1)
std::vector<int> v(5,1);
cout << *v.end();
Is the printed result undefined (i.e. does it depend on the compiler)?
2)
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << *pv.end();
Is the printed result undefined (i.e. does it depend on the compiler), or is it NULL?
You have no item at end(), it's an iterator right after the last valid item in your vector.
*v.end();
It's undefined behavior. You can use end() only to check, by comparison, whether an iterator has reached the position after the last item.
An easy way to access the value of the last item is back(), for example:
cout << v.back();
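Note that back() itself has a precondition: calling it on an empty vector is also undefined behavior. A small sketch of a guarded accessor is shown below; last_or is a hypothetical helper name introduced for this example.

```cpp
#include <cassert>
#include <vector>

// Returns a copy of the last element of `v`, or `fallback` if `v` is empty.
template <class T>
T last_or(const std::vector<T>& v, const T& fallback) {
    return v.empty() ? fallback : v.back();  // back() is only called when non-empty
}

// Usage sketch: call this from main() to exercise the helper.
void usage_example() {
    std::vector<int> v(5, 1);
    assert(last_or(v, -1) == 1);

    std::vector<int> empty;
    assert(last_or(empty, -1) == -1);
}
```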
The end() iterator points to a position that is one element after the last element of the container. Accessing the data that it points to will invoke undefined behavior and this is the case in both your examples.
Dereferencing past the end will probably end badly, but it looks like it is implementation-defined. If we look at the draft C++ standard, section 24.2 Iterator requirements, 24.2.1 In general, paragraph 5 says (emphasis mine):
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any sequence. [ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] Results of most expressions are undefined for singular values; [...] Dereferenceable values are always non-singular.
Firstly, in both cases the behavior is undefined. Note that it is not "the printed result" that is undefined. Your code does not even get a chance to print anything. Merely applying the * operator to the end iterator already causes undefined behavior. E.g. this alone
*v.end();
is already undefined behavior.
Secondly, undefined in this case does not mean "depends on the compiler". Implementation-defined behavior depends on the compiler. Undefined means "completely unpredictable", even if you are using the same compiler.
P.S. There seems to be a bit of ongoing work in the standard committee with regard to some closely related issues.
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#208
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1213
Hopefully it will result in a clearer specification of what is legal and what is not for past-the-end iterators. But it is clear that in the general case a past-the-end iterator can legally be a singular iterator, meaning that in the general case it can be non-dereferenceable.
Yes, both of those are undefined.
vector::end - Return iterator to end (public member function )
You can read more here.
Your first example:
std::vector<int> v(5,1);
cout << *(v.end()-1);
Dereferencing v.end() is undefined: v.end() points to the address after the last element, and if the container is empty, this function returns the same as v.begin().
And your second example:
int x = 5,y = 6;
std::vector<int*> pv;
pv.push_back(&x);
pv.push_back(&y);
cout << **(pv.end()-1);
I'm curious about the rationale behind the following code. For a given map, I can delete a range up to, but not including, end() (obviously,) using the following code:
map<string, int> myMap;
myMap["one"] = 1;
myMap["two"] = 2;
myMap["three"] = 3;
map<string, int>::iterator it = myMap.find("two");
myMap.erase( it, myMap.end() );
This erases the last two items using the range. However, if I used the single iterator version of erase, I half expected passing myMap.end() to result in no action as the iterator was clearly at the end of the collection. This is as distinct from a corrupt or invalid iterator which would clearly lead to undefined behaviour.
However, when I do this:
myMap.erase( myMap.end() );
I simply get a segmentation fault. I wouldn't have thought it difficult for map to check whether the iterator equalled end() and not take action in that case. Is there some subtle reason for this that I'm missing? I noticed that even this works:
myMap.erase( myMap.end(), myMap.end() );
(i.e. does nothing)
The reason I ask is that I have some code which receives a valid iterator to the collection (but which could be end()) and I wanted to simply pass this into erase rather than having to check first like this:
if ( it != myMap.end() )
myMap.erase( it );
which seems a bit clunky to me. The alternative is to re code so I can use the by-key-type erase overload but I'd rather not re-write too much if I can help it.
The key is that in the standard library, ranges determined by two iterators are half-open ranges, written [a,b) in math notation. They include the element at the first iterator but not the one at the last (if both are the same, the range is empty). At the same time, end() returns an iterator that is one beyond the last element, which perfectly matches the half-open range notation.
When you use the range version of erase it will never try to delete the element referenced by the last iterator. Consider a modified example:
map<int,int> m;
for (int i = 0; i < 5; ++i)
m[i] = i;
m.erase( m.find(1), m.find(4) );
At the end of the execution the map will hold two keys, 0 and 4. Note that the element referred to by the second iterator was not erased from the container.
On the other hand, the single iterator operation will erase the element referenced by the iterator. If the code above was changed to:
for (int i = 1; i <= 4; ++i )
m.erase( m.find(i) );
The element with key 4 will be deleted. In your case you attempt to erase the element at the end iterator, which does not refer to a valid object.
I wouldn't have thought it difficult for map to check whether the iterator equalled end() and not take action in that case.
No, it is not hard to do, but the function was designed with a different contract in mind: the caller must pass in an iterator to an element in the container. Part of the reason for this is that in C++ most features are designed so that they incur the minimum cost possible, allowing users to balance safety and performance on their side. The user can test the iterator before calling erase, but if that test were inside the library, the user would not be able to opt out of it when she knows that the iterator is valid.
n3337 23.2.4 Table 102
a.erase( q1, q2)
erases all the elements in the range [q1,q2). Returns q2.
So the iterator returned by map::end() is not part of the erased range in the case of myMap.erase(myMap.end(), myMap.end());
a.erase(q)
erases the element pointed to by q. Returns an iterator pointing to the element immediately following q prior to the element being erased. If no such element exists, returns a.end().
I wouldn't have thought it difficult for map to check whether the
iterator equalled end() and not take action in that case. Is there
some subtle reason for this that I'm missing?
The reason is the same as why std::vector::operator[] does not check that the index is in range.
When you use two iterators to specify a range, the range consists of the elements from the element that the first iterator points to up to but not including the element that the second iterator points to. So erase(it, myMap.end()) says to erase everything from it up to but not including end(). You could equally well pass an iterator that points to a "real" element as the second one, and the element that that iterator points to would not be erased.
When you use erase(it) it says to erase the element that it points to. The end() iterator does not point to a valid element, so erase(end()) doesn't do anything sensible. It would be possible for the library to diagnose this situation, and a debugging library will do that, but it imposes a cost on every call to erase to check what the iterator points to. The standard library doesn't impose that cost on users. You're on your own. <g>
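Since the library leaves the check to the caller, the "clunky" test from the question can be written once and reused. The following is a sketch only; erase_if_not_end is a hypothetical helper name, and it assumes the iterator passed in belongs to the same container.

```cpp
#include <cassert>
#include <map>
#include <string>

// Erases the element at `it` unless `it` is the end iterator of `m`.
// Returns true if an element was erased.
// Assumption: `it` is a valid iterator into `m` (possibly m.end()).
template <class Map>
bool erase_if_not_end(Map& m, typename Map::iterator it) {
    if (it == m.end()) return false;  // nothing to erase, and erase(end()) would be undefined
    m.erase(it);
    return true;
}

// Usage sketch: call this from main() to exercise the helper.
void usage_example() {
    std::map<std::string, int> myMap{{"one", 1}, {"two", 2}, {"three", 3}};
    assert(erase_if_not_end(myMap, myMap.find("two")));
    assert(!erase_if_not_end(myMap, myMap.find("two")));  // already gone: find returns end()
    assert(myMap.size() == 2);
}
```

This keeps the cost of the check visible at the call sites that need it, while code that already knows its iterator is valid can still call erase directly.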
According to the C++ standard (3.7.3.2/4), using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior (in case of doubt also see this question). Now the typical code to traverse an STL container looks like this:
std::vector<int> toTraverse;
//populate the vector
for( std::vector<int>::iterator it = toTraverse.begin(); it != toTraverse.end(); ++it ) {
//process( *it );
}
std::vector::end() is an iterator to the hypothetical element beyond the last element of the container. There's no element there, therefore using a pointer through that iterator is undefined behavior.
Now how does the != end() work then? I mean in order to do the comparison an iterator needs to be constructed wrapping an invalid address and then that invalid address will have to be used in a comparison which again is undefined behavior. Is such comparison legal and why?
The only requirement for end() is that ++(--end()) == end(). The end() could simply be a special state the iterator is in. There is no reason the end() iterator has to correspond to a pointer of any kind.
Besides, even if it were a pointer, comparing two pointers doesn't require any sort of dereference anyway. Consider the following:
char a[5] = {'a', 'b', 'c', 'd', 'e'};
char* end = a + 5;
for (char* it = a; it != end; ++it);
That code will work just fine, and it mirrors your vector code.
You're right that an invalid pointer can't be used, but you're wrong that a pointer to an element one past the last element in an array is an invalid pointer - it's valid.
The C standard, section 6.5.6.8 says that it's well defined and valid:
...if the expression P points to the
last element of an array object, the
expression (P)+1 points one past the
last element of the array object...
but cannot be dereferenced:
...if the result points one past the
last element of the array object, it
shall not be used as the operand of a
unary * operator that is evaluated...
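The two quoted rules can be seen side by side in a plain pointer loop: the one-past-the-end pointer is formed and compared, but never used as the operand of unary *. This is a minimal sketch; count_in is a hypothetical function written for this example.

```cpp
#include <cassert>
#include <cstddef>

// Counts occurrences of `target` in the half-open range [first, last).
// `last` is only compared against; it is never dereferenced.
std::size_t count_in(const char* first, const char* last, char target) {
    std::size_t n = 0;
    for (const char* it = first; it != last; ++it) {
        if (*it == target) ++n;  // *it is evaluated only while it != last
    }
    return n;
}

// Usage sketch: call this from main() to exercise the function.
void usage_example() {
    const char a[5] = {'a', 'b', 'c', 'a', 'e'};
    const char* end = a + 5;              // valid one-past-the-end pointer
    assert(count_in(a, end, 'a') == 2);
    assert(count_in(end, end, 'a') == 0); // empty range: nothing is dereferenced
}
```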
One past the end is not an invalid value (neither with regular arrays nor with iterators). You can't dereference it but it can be used for comparisons.
std::vector<X>::iterator it;
This is a singular iterator. You can only assign a valid iterator to it.
std::vector<X>::iterator it = vec.end();
This is a perfectly valid iterator. You can't dereference it but you can use it for comparisons and decrement it (assuming the container has a sufficient size).
Huh? There's no rule that says that iterators need to be implemented using nothing but a pointer.
It could have a boolean flag in there, which gets set when the increment operation sees that it passes the end of the valid data, for instance.
The implementation of a standard library's container's end() iterator is, well, implementation-defined, so the implementation can play tricks it knows the platform to support.
If you implement your own iterators, you can do whatever you want, so long as it is standard-conforming. For example, your iterator, if storing a pointer, could store a NULL pointer to indicate an end iterator. Or it could contain a boolean flag or whatnot.
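To make the NULL-pointer variant concrete, here is a minimal sketch of a forward iterator over a hand-rolled linked chain, where the end iterator is simply the state "node pointer is null". Node, NodeIterator and sum are all names invented for this example; a production iterator would need more care (const variants, for instance).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

struct Node {
    int value;
    Node* next;  // nullptr marks the last node
};

// Forward iterator over a null-terminated chain of Node objects.
// The end iterator does not point at any memory past the data;
// it is just the state node_ == nullptr.
class NodeIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int*;
    using reference = int&;

    NodeIterator() : node_(nullptr) {}           // default-constructed == end
    explicit NodeIterator(Node* n) : node_(n) {}

    reference operator*() const {
        assert(node_ != nullptr);                // end is not dereferenceable
        return node_->value;
    }
    NodeIterator& operator++() {
        assert(node_ != nullptr);                // end is not incrementable
        node_ = node_->next;
        return *this;
    }
    NodeIterator operator++(int) { NodeIterator tmp = *this; ++*this; return tmp; }

    friend bool operator==(const NodeIterator& a, const NodeIterator& b) {
        return a.node_ == b.node_;               // comparing never dereferences
    }
    friend bool operator!=(const NodeIterator& a, const NodeIterator& b) {
        return !(a == b);
    }

private:
    Node* node_;
};

// Sums the values in [first, last) without ever dereferencing `last`.
int sum(NodeIterator first, NodeIterator last) {
    int total = 0;
    for (; first != last; ++first) total += *first;
    return total;
}
```

Comparing against the end state works here for the same reason it works for vector: the comparison looks only at the iterator's own representation, never at the element it would refer to.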
I am answering here because the other answers are now out of date; besides, they did not quite address the question.
First, C++14 has changed the rules mentioned in the question. Indirection through an invalid pointer value or passing an invalid pointer value to a deallocation function is still undefined, but other operations are now implementation-defined, see Documentation of "invalid pointer value" conversion in C++ implementations.
Second, words matter. You can't bypass the definitions while applying the rules. The key point here is the definition of "invalid". For iterators, this is defined in [iterator.requirements]. Though pointers are iterators, the meaning of "invalid" for them is subtly different. The rules for pointers render "invalid" as "don't indirect through an invalid value", which is a special case of "not dereferenceable" for iterators; however, "not dereferenceable" does not imply "invalid" for iterators. "Invalid" is explicitly defined as "may be singular", while a "singular" value is defined as "not associated with any sequence" (in the same paragraph as the definition of "dereferenceable"). That paragraph even explicitly defines "past-the-end values".
From the text of the standard in [iterator.requirements], it is clear that:
Past-the-end values are not assumed to be dereferenceable (at least by the standard library), as the standard states.
Dereferenceable values are not singular, since they are associated with sequence.
Past-the-end values are not singular, since they are associated with sequence.
An iterator is not invalid if it is definitely not singular (by negating the definition of "invalid iterator"). In other words, if an iterator is associated with a sequence, it is not invalid.
The value of end() is a past-the-end value, which is associated with a sequence until it is invalidated. So it is actually valid by definition. Even if "invalid" is misread literally, the rules for pointers do not apply here.
The rules allowing == comparison on such values are in the input iterator requirements, which are inherited by the other iterator categories (forward, bidirectional, etc.). More specifically, valid iterators are required to be comparable with == within the domain of the iterator. Further, the forward iterator requirements specify that the domain is the underlying sequence. And the container requirements specify that the iterator and const_iterator member types, whatever their category, meet the forward iterator requirements. Thus, == on end() and an iterator over the same container is required to be well-defined. As a standard container, vector<int> also obeys these requirements. That's the whole story.
Third, even when end() is a pointer value (which is likely with an optimized implementation of the vector iterator), the rules in the question still do not apply. The reason is given above (and in some other answers): "invalid" concerns * (indirection through the value), not comparison. A one-past-the-end value is explicitly allowed by the standard to be compared in specified ways. Also note that ISO C++ is not ISO C; the two differ subtly (e.g. for < on pointer values not in the same array, unspecified vs. undefined), though they have similar rules here.
Simple. Iterators aren't (necessarily) pointers.
They have some similarities (i.e. you can dereference them), but that's about it.
Besides what was already said (iterators need not be pointers), I'd like to point out the rule you cite
According to C++ standard (3.7.3.2/4)
using (not only dereferencing, but
also copying, casting, whatever else)
an invalid pointer is undefined
behavior
wouldn't apply to the end() iterator anyway. Basically, when you have an array, all the pointers to its elements, plus one pointer past-the-end, are valid. A pointer before the start of the array is not: merely forming it is already undefined behavior. That means:
int arr[5];
int *p=0;
p==arr+4; // OK
p==arr+5; // past-the-end, but OK
p==arr-1; // not OK: forming arr-1 is already undefined
p==arr+123456; // not OK, according to your rule