Why using `std::reverse_iterator` doesn't invoke UB? - c++

I was working with std::reverse_iterator today and was thinking about how it works with values created by calling begin on a container. According to cppreference, if I have reverse_iterator r constructed from iterator i, the following has to hold &*r == &*(i-1).
However, this would mean that if I write this
std::vector<int> vec = {1, 2, 3, 4, 5};
auto iter = std::make_reverse_iterator(begin(vec));
iter now points to piece of memory that is placed before begin(vec), which is out of bounds. By strict interpretation of C++ standard, this invokes UB.
(There is specific provision for pointer/iterator to element 1-past-the-end of the array, but as far as I know, none for pointer/iterator to element 1-ahead-of-the-start of an array.)
So, am I reading the link wrong, or is there a specific provision in the standard for this case, or is it that when using reverse_iterator, the whole array is taken as reversed and as such, pointer to ahead of the array is actually pointer past the end?

Yes, you are reading it wrong.
There is no need for reverse-iterators to store pointers pointing before the start of an element.
To illustrate, take an array of 2 elements:
int a[2];
These are the forward-iterators:
a+0 a+1 a+2 // The last one is not dereferenceable
The reverse-iterators would be represented with these exact same values, in reverse order:
a+2 a+1 a+0 // The last one cannot be dereferenced
So, while dereferencing a normal iterator is really straightforward, a reverse-iterator-dereference is slightly more complicated: pointer[-1] (That's for random-access iterators, the others are worse: It copy = pointer; --copy; return *copy;).
Be aware that using forward-iterators is far more common than reverse-iterators, thus the former are more likely to have optimized code for them than the latter. Generic code which does not hit that corner is about as likely to run better with either type though, due to all the transformations a decent optimizing compiler does.

std::make_reverse_iterator(begin(vec)) is not dereferenceable, in the same way that end(vec) is not dereferenceable. It doesn't "point" to any valid object, and that's OK.

Related

Getting a Raw Pointer to the end of a Container

If I have the end iterator to a container, but I want to get a raw pointer to that is there a way to accomplish this?
Say I have a container: foo. I cannot for example do this: &*foo.end() because it yields the runtime error:
Vector iterator not dereferencable
I can do this but I was hoping for a cleaner way to get there: &*foo.begin() + foo.size().
EDIT:
This is not a question about how to convert an iterator to a pointer in general (obviously that's in the question), but how to specifically convert the end iterator to a pointer. The answers in the "duplicate" question actually suggest dereferencing the iterator. The end iterator cannot be dereferenced without seg-faulting.
The correct way to access the end of storage is:
v.data() + v.size()
This is because *v.begin() is invalid when v is empty.
The member function data is provided for all contiguous containers (vector, string and array).
From C++17 you will also be able to use the non-member functions:
data(v) + size(v)
This works on raw arrays as well.
In general? No.
And the fact that you're asking indicates that something is wrong with your overall design.
For vectors, arrays, strings? Sure… but why?
Just get a pointer to a valid element, and advance it:
std::vector<T> foo;
const T* ptr = foo.data() + foo.size();
As long as you don't dereference such a pointer (which is almost equivalent to dereferencing the iterator, as you did in your attempt) it is valid to obtain and hold such a pointer, because it points to the special one-past-the-end location.
Note that &foo[0] + foo.size() has undefined behaviour if the vector is empty, because &foo[0] is &*(foo.data() + 0) is &*foo.data(), and (just like in your attempt) *foo.data() is disallowed if there's nothing there. So we avoid all dereferencing and simply advance foo.data() itself.
Anyway, this only works for the case of vectors1, arrays and strings, though. Other containers do not guarantee (or can be reasonably expected to provide) storage contiguity; their end pointers could be almost anything, e.g. a "sentinel" null pointer, which is unlikely to be of any use to you.
That is why the iterator abstraction is there in the first place. Stick to it if you can, instead of delving into raw pointer usage.
1. Excepting std::vector<bool>.

How to understand std::distance in C++?

The code is as follow:
int B[] = {3,5};
int C[] = {4,5};
cout << distance(B,C);
The output is:
-4
Can anyone explain why is this?
The distance(first, last) function tells you how many items are between the iterator at first and last. Note that pointers are iterators, random-access iterators to be specific. So the distance between one pointer and another is their difference, as defined by operator-.
So your question boils down to "How many ints are there between the int pointed to by B and the int pointed to by C?
distance dutifully subtracts the pointers and tells you.
The trick is that distance is supposed to be applied to iterators from the same container. Your code does not live up to that promise. The compiler is free to place the B and C arrays wherever it pleases, hence the result you see is meaningless. Like many things in C++, it's up to you to ensure that you're using distance properly. If you don't, you'll get undefined behavior, where the language makes no guarantees what will happen.
std::distance(__first, __last) is designed to generalize pointer arithmetic, it returns a value n such that __first + n = __last. for your case, the arguments are pointers of int*, in terms of iteration, they are random accessed iterators. the implementation simply returns a value of __last - __first: simply (int*)C - (int*)B.

Is it!=container.end() design mistake/feature or just necessity?

recently I was thinking about how it would be nice if iterators implicitly converted to bool so you could do
auto it = find(begin(x),end(x), 42);
if (it) //not it!=x.end();
{
}
but thinking about it I realized that this would mean that either it would had to be set to "NULL", so that you couldnt use the it directly if you want to do something with it (you would have to use x.end()) or you could use it but iter size would have to be bigger(to store if what it points to was .end() or not).
So my questions are:
Is the syntax in my example achievable without breaking current code and without increasing the sizeof iterator?
would implicitly conversion to bool cause some problems?
You are working on the assumption that iterators are a way of accessing a container. They allow you to do that, but they also allow many more things that would clearly not fit your intended operations:
auto it = std::find(std::begin(x), std::next(std::begin(x),10), 42 );
// Is 42 among the first 10 elements of 'x'?
auto it = std::find(std::istream_iterator<int>(std::cout),
std::istream_iterator<int>(), 42 );
// Is 42 one of the numbers from standard input?
In the first case the iterator does refer to a container, but the range where you are finding does not enclose the whole container, so it cannot be tested against end(x). In the second case there is no container at all.
Note that an efficient implementation of an iterator for many containers holds just a pointer, so any other state would increase the size of the iterator.
Regarding conversions to any-type or bool, they do cause many problems, but they can be circumvented in C++11 by means of explicit conversions, or in C++03 by using the safe-bool idiom.
You are probably more interested on a different concept: ranges. There are multiple approaches to ranges, so it is not so clear what the precise semantics should be. The first two that come to mind are Boost.Iterator and an article by Alexandrescu I read recently called On Iteration.
Two reasons this wouldn't work:
First, it's possible to use raw pointers as iterators (usually into an array):
int data[] = { 50, 42, 37, 5 };
auto it = find(begin(data), end(data), 42);
Second, you don't have to pass the actual end of the container to find; to e.g. find the first space character before a period:
auto sentence = "Hello, world.";
auto it1 = find(begin(sentence), end(sentence), '.');
auto it2 = find(begin(sentence), it1, ' ');
There's very little that can be done with a single iterator. A pair of iterators defines a sequence consisting of the elements; the first iterator points to the first element and the second iterator points one past the end of the last element. There's no way, in general, for the first iterator to know when it's been incremented to match the second iterator. Algorithms do that, because they have both iterators and can tell when the work is done. For example:
std::vector<int> vec;
vec.push_back(1);
vec.push_back(2);
vec.push_back(3);
// copy the contents of the vector:
std::copy(somewhere, vec.begin(), vec.end());
// copy the first two elements of the vector:
std::copy(somewhere, vec.begin(), vec.begin() + 2);
In both calls to copy, vec.begin() is the same iterator; the algorithm does different things because it got the second iterator that tells it when to stop.
Granted, it's possible to design a different kind of iterator that contains both the beginning and the end of the sequence (as Java does), but that's not how C++ iterators are designed. There's discussion of standardizing the notion of a "range" that holds two iterators (the new range-based for loop is a first step toward that).
Well, you probably don't want an implicit conversion, but requiring two
separate objects to determine when iteration is done is clearly a design
error. It's not so much because of the if or the for (although
using a single iterator would make these clearer as well); it's really
because it makes functional decomposition and filtering iterators
extreamly difficult, if not impossible.
Fundamentally, STL iterators are closer to smart pointers than they are
to iterators. There are times when such pointers are appropriate, but
they aren't a good replacement for iterators.

How to correctly (yet efficiently) implement something like "vector::insert"? (Pointer aliasing)

Consider this hypothetical implementation of vector:
template<class T> // ignore the allocator
struct vector
{
typedef T* iterator;
typedef const T* const_iterator;
template<class It>
void insert(iterator where, It begin, It end)
{
...
}
...
}
Problem
There is a subtle problem we face here:
There is the possibility that begin and end refer to items in the same vector, after where.
For example, if the user says:
vector<int> items;
for (int i = 0; i < 1000; i++)
items.push_back(i);
items.insert(items.begin(), items.end() - 2, items.end() - 1);
If It is not a pointer type, then we're fine.
But we don't know, so we must check that [begin, end) does not refer to a range already inside the vector.
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
So the compiler could falsely tell us that the items don't alias, when in fact they do, giving us unnecessary O(n) slowdown.
Potential solution & caveat
One solution is to copy the entire vector every time, to include the new items, and then throw away the old copy.
But that's very slow in scenarios such as in the example above, where we'd be copying 1000 items just to insert 1 item, even though we might clearly already have enough capacity.
Is there a generic way to (correctly) solve this problem efficiently, i.e. without suffering from O(n) slowdown in cases where nothing is aliasing?
You can use the predicates std::less etc, which are guaranteed to give a total order, even when the raw pointer comparisons do not.
From the standard [comparisons]/8:
For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
Wrong. The pointer comparisons are unspecified, not undefined. From C++03 §5.9/2 [expr.rel]:
[...] Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
[...]
-Other pointer comparisons are unspecified.
So it's safe to test if there is an overlap before doing the expensive-but-correct copy.
Interestingly, C99 differs from C++ in this, in that pointer comparisons between unrelated objects is undefined behavior. From C99 §6.5.8/5:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. [...] In all other cases, the behavior is undefined.
Actually, this would be true even if they were regular iterators. There's nothing stopping anyone doing
std::vector<int> v;
// fill v
v.insert(v.end() - 3, v.begin(), v.end());
Determining if they alias is a problem for any implementation of iterators.
However, the thing you're missing is that you're the implementation, you don't have to use portable code. As the implementation, you can do whatever you want. You could say "Well, in my implementation, I follow x86 and < and > are fine to use for any pointers.". And that would be fine.

Why is comparing against "end()" iterator legal?

According to C++ standard (3.7.3.2/4) using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior (in case of doubt also see this question). Now the typical code to traverse an STL containter looks like this:
std::vector<int> toTraverse;
//populate the vector
for( std::vector<int>::iterator it = toTraverse.begin(); it != toTraverse.end(); ++it ) {
//process( *it );
}
std::vector::end() is an iterator onto the hypothetic element beyond the last element of the containter. There's no element there, therefore using a pointer through that iterator is undefined behavior.
Now how does the != end() work then? I mean in order to do the comparison an iterator needs to be constructed wrapping an invalid address and then that invalid address will have to be used in a comparison which again is undefined behavior. Is such comparison legal and why?
The only requirement for end() is that ++(--end()) == end(). The end() could simply be a special state the iterator is in. There is no reason the end() iterator has to correspond to a pointer of any kind.
Besides, even if it were a pointer, comparing two pointers doesn't require any sort of dereference anyway. Consider the following:
char[5] a = {'a', 'b', 'c', 'd', 'e'};
char* end = a+5;
for (char* it = a; it != a+5; ++it);
That code will work just fine, and it mirrors your vector code.
You're right that an invalid pointer can't be used, but you're wrong that a pointer to an element one past the last element in an array is an invalid pointer - it's valid.
The C standard, section 6.5.6.8 says that it's well defined and valid:
...if the expression P points to the
last element of an array object, the
expression (P)+1 points one past the
last element of the array object...
but cannot be dereferenced:
...if the result points one past the
last element of the array object, it
shall not be used as the operand of a
unary * operator that is evaluated...
One past the end is not an invalid value (neither with regular arrays or iterators). You can't dereference it but it can be used for comparisons.
std::vector<X>::iterator it;
This is a singular iterator. You can only assign a valid iterator to it.
std::vector<X>::iterator it = vec.end();
This is a perfectly valid iterator. You can't dereference it but you can use it for comparisons and decrement it (assuming the container has a sufficient size).
Huh? There's no rule that says that iterators need to be implemented using nothing but a pointer.
It could have a boolean flag in there, which gets set when the increment operation sees that it passes the end of the valid data, for instance.
The implementation of a standard library's container's end() iterator is, well, implementation-defined, so the implementation can play tricks it knows the platform to support.
If you implemented your own iterators, you can do whatever you want - so long as it is standard-conform. For example, your iterator, if storing a pointer, could store a NULL pointer to indicate an end iterator. Or it could contain a boolean flag or whatnot.
I answer here since other answers are now out-of-date; nevertheless, they were not quite right to the question.
First, C++14 has changed the rules mentioned in the question. Indirection through an invalid pointer value or passing an invalid pointer value to a deallocation function are still undefined, but other operations are now implemenatation-defined, see Documentation of "invalid pointer value" conversion in C++ implementations.
Second, words matter. You can't bypass the definitions while applying the rules. The key point here is the definition of "invalid". For iterators, this is defined in [iterator.requirements]. Though pointers are iterators, meanings of "invalid" to them are subtly different. Rules for pointers render "invalid" as "don't indirect through invalid value", which is a special case of "not dereferenceable" to iterators; however, "not deferenceable" is not implying "invalid" for iterators. "Invalid" is explicitly defined as "may be singular", while "singular" value is defined as "not associated with any sequence" (in the same paragraph of definition of "dereferenceable"). That paragraph even explicitly defined "past-the-end values".
From the text of the standard in [iterator.requirements], it is clear that:
Past-the-end values are not assumed to be dereferenceable (at least by the standard library), as the standard states.
Dereferenceable values are not singular, since they are associated with sequence.
Past-the-end values are not singular, since they are associated with sequence.
An iterator is not invalid if it is definitely not singular (by negation on definition of "invalid iterator"). In other words, if an iterator is associated to a sequence, it is not invalid.
Value of end() is a past-the-end value, which is associated with a sequence before it is invalidated. So it is actually valid by definition. Even with misconception on "invalid" literally, the rules of pointers are not applicable here.
The rules allowing == comparison on such values are in input iterator requirements, which is inherited by some other category of iterators (forward, bidirectional, etc). More specifically, valid iterators are required to be comparable in the domain of the iterator in such way (==). Further, forward iterator requirements specifies the domain is over the underlying sequence. And container requirements specifies the iterator and const_iterator member types in any iterator category meets forward iterator requirements. Thus, == on end() and iterator over same container is required to be well-defined. As a standard container, vector<int> also obey the requirements. That's the whole story.
Third, even when end() is a pointer value (this is likely to happen with optimized implementation of iterator of vector instance), the rules in the question are still not applicable. The reason is mentioned above (and in some other answers): "invalid" is concerned with *(indirect through), not comparison. One-past-end value is explicitly allowed to be compared in specified ways by the standard. Also note ISO C++ is not ISO C, they also subtly mismatches (e.g. for < on pointer values not in the same array, unspecified vs. undefined), though they have similar rules here.
Simple. Iterators aren't (necessarily) pointers.
They have some similarities (i.e. you can dereference them), but that's about it.
Besides what was already said (iterators need not be pointers), I'd like to point out the rule you cite
According to C++ standard (3.7.3.2/4)
using (not only dereferencing, but
also copying, casting, whatever else)
an invalid pointer is undefined
behavior
wouldn't apply to end() iterator anyway. Basically, when you have an array, all the pointers to its elements, plus one pointer past-the-end, plus one pointer before the start of the array, are valid. That means:
int arr[5];
int *p=0;
p==arr+4; // OK
p==arr+5; // past-the-end, but OK
p==arr-1; // also OK
p==arr+123456; // not OK, according to your rule