How to understand std::distance in C++?

The code is as follows:
int B[] = {3,5};
int C[] = {4,5};
cout << distance(B,C);
The output is:
-4
Can anyone explain why this is?

The distance(first, last) function tells you how many items are between the iterator at first and last. Note that pointers are iterators, random-access iterators to be specific. So the distance between one pointer and another is their difference, as defined by operator-.
So your question boils down to "How many ints are there between the int pointed to by B and the int pointed to by C?"
distance dutifully subtracts the pointers and tells you.
The trick is that distance is supposed to be applied to iterators from the same container. Your code does not live up to that promise. The compiler is free to place the B and C arrays wherever it pleases, hence the result you see is meaningless. Like many things in C++, it's up to you to ensure that you're using distance properly. If you don't, you'll get undefined behavior, where the language makes no guarantees what will happen.

std::distance(__first, __last) is designed to generalize pointer arithmetic: it returns a value n such that __first + n == __last. In your case the arguments are of type int*, which, as iterators, are random-access iterators. The implementation therefore simply returns __last - __first, that is, (int*)C - (int*)B.
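As a hedged illustration of the answers above, the sketch below applies std::distance only to iterators that belong to the same range, which is the case where the result is meaningful. The helper names offset_in_array and count_nodes are made up for this example.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: offset of `p` from the start of `arr`.
// Well defined only because both pointers refer to the same array
// (or to one past its end).
template <typename T, std::size_t N>
std::ptrdiff_t offset_in_array(const T (&arr)[N], const T* p) {
    return std::distance(std::begin(arr), p);
}

// std::distance also accepts non-random-access iterators. For a
// std::list it has to walk from first to last, so it costs O(n).
inline std::ptrdiff_t count_nodes(const std::list<int>& l) {
    return std::distance(l.begin(), l.end());
}
```

With int B[] = {3, 5, 8}, a call such as offset_in_array(B, B + 2) yields 2, whereas passing a pointer into a different array would reproduce the problem from the question.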

Related

Constant time `contains` for `std::vector`? [duplicate]

This question already has an answer here:
How to correcly check whether a pointer belongs within an allocated block?
I am working with some code that checks if std::vector contains a given element in constant time by comparing its address to those describing the extent of the vector's data. However I suspect that, although it works, it relies on undefined behaviour. If the element is not contained by the vector then the pointer comparisons are not permitted.
template <typename T>
bool contains(const std::vector<T>& v, const T& a) {
    return (v.data() <= &a) && (&a < v.data() + v.size());
}
Am I right in believing it is undefined behaviour? If so, is there any way to do the same thing without drastically changing the time complexity of the code?
You can use std::less
A specialization of std::less for any pointer type yields the implementation-defined strict total order, even if the built-in < operator does not.
Update:
The standard doesn't guarantee that this will actually work for contains though. If you have say two vectors a and b, the total order is permitted to be &a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ..., i.e., with the elements interleaved.
As pointed out in the comments, the standard only guarantees that std::less yields an implementation-defined strict total order that is consistent with the partial order imposed by the built-in operators. However, the standard doesn't guarantee the order of pointers pointing to different objects or arrays. Related: https://devblogs.microsoft.com/oldnewthing/20170927-00/?p=97095
One interesting thing is that there's a similar usage in Herb Sutter's gcpp library. A comment there says that it is portable, although the library is experimental.
// Return whether p points into this page's storage and is allocated.
//
inline
bool gpage::contains(gsl::not_null<const byte*> p) const noexcept {
    // Use std::less<> to compare (possibly unrelated) pointers portably
    auto const cmp = std::less<>{};
    auto const ext = extent();
    return !cmp(p, ext.data()) && cmp(p, ext.data() + ext.size());
}
Yes, the comparisons as written are not permitted if the reference doesn't reference something that is already an element of the vector.
You can make the behavior defined by casting all pointers to uintptr_t and comparing those. This will work on all architectures with continuous memory (i.e. possibly not old 16-bit x86), although I don't know if the specific semantics are guaranteed.
As a side note, I would always interpret the name contains to be about the value, and thus be very surprised if the semantics are anything other than std::find(v.begin(), v.end(), a) != v.end(). Consider using a more expressive name.
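To make the last two paragraphs concrete, here is a minimal sketch of both readings of "contains". The names refers_to_element_of and contains_value are hypothetical, and the address-based version assumes that converting pointers to std::uintptr_t preserves their order, which holds on common flat-address-space platforms but is not guaranteed by the standard.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Address-based check: does `a` refer to an object stored inside `v`?
// Assumption: integer values obtained from the pointers compare in the
// same order as the addresses (platform-dependent, not guaranteed).
template <typename T>
bool refers_to_element_of(const std::vector<T>& v, const T& a) {
    const auto first = reinterpret_cast<std::uintptr_t>(v.data());
    const auto last  = reinterpret_cast<std::uintptr_t>(v.data() + v.size());
    const auto addr  = reinterpret_cast<std::uintptr_t>(std::addressof(a));
    return first <= addr && addr < last;
}

// Value-based check, which is what a function named `contains`
// is usually expected to do. Linear time.
template <typename T>
bool contains_value(const std::vector<T>& v, const T& a) {
    return std::find(v.begin(), v.end(), a) != v.end();
}
```

Given std::vector<int> v{1, 2, 3} and a separate int outside = 2, refers_to_element_of(v, v[1]) is true and refers_to_element_of(v, outside) is false, while contains_value(v, outside) is true because it compares values rather than addresses.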

How can I get the penultimate element in a list?

I have a std::list<double> foo;
I'm using
if (foo.size() >= 2) {
    double penultimate = *(--foo.rbegin());
}
but this always gives me an arbitrary value of penultimate.
What am I doing wrong?
Rather than decrementing rbegin, you should increment it, as shown here (see note 1):
double penultimate = *++foo.rbegin();
rbegin() returns a reverse iterator, so ++ is the operator that moves backwards through the container. Note that I've also dropped the superfluous parentheses; that's not to everyone's taste.
Currently the behaviour of your program is undefined since you are actually moving to end(), and you are not allowed to dereference that. The arbitrary nature of the output is a manifestation of that undefined behaviour.
1. Do retain the minimum size check that you currently have.
The clearest way, in my mind, is to use the construct designed for this purpose (C++11):
double penultimate = *std::prev(foo.end(), 2);
I would just do *--(--foo.end()); no need for reverse iterators. It's less confusing too.
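The approaches from the answers can be put side by side in a small sketch. The function names penultimate and penultimate_reverse are hypothetical; both keep the size check from the question as an assertion.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical helper: second-to-last element of a list.
// Precondition: the list holds at least two elements.
inline double penultimate(const std::list<double>& foo) {
    assert(foo.size() >= 2);
    return *std::prev(foo.end(), 2);
}

// Same result via a reverse iterator: rbegin() refers to the last
// element, and advancing a reverse iterator moves towards the front.
inline double penultimate_reverse(const std::list<double>& foo) {
    assert(foo.size() >= 2);
    return *std::next(foo.rbegin());
}
```

For a list holding 1.0, 2.0, 3.0 both functions return 2.0.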

Getting a Raw Pointer to the end of a Container

If I have the end iterator of a container but want a raw pointer to that position, is there a way to accomplish this?
Say I have a container: foo. I cannot for example do this: &*foo.end() because it yields the runtime error:
Vector iterator not dereferencable
I can do this but I was hoping for a cleaner way to get there: &*foo.begin() + foo.size().
EDIT:
This is not a question about how to convert an iterator to a pointer in general (obviously that's in the question), but how to specifically convert the end iterator to a pointer. The answers in the "duplicate" question actually suggest dereferencing the iterator. The end iterator cannot be dereferenced without seg-faulting.
The correct way to access the end of storage is:
v.data() + v.size()
This is because *v.begin() is invalid when v is empty.
The member function data is provided for all contiguous containers (vector, string and array).
From C++17 you will also be able to use the non-member functions:
data(v) + size(v)
This works on raw arrays as well.
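A minimal sketch of the C++17 form, assuming a compiler with C++17 support. The helper name end_pointer is made up for this example, and it only applies to types that std::data and std::size accept.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: one-past-the-end pointer for a contiguous
// container or raw array (C++17, via std::data and std::size).
// The result may be compared and subtracted, but never dereferenced.
template <typename C>
auto end_pointer(C& c) {
    return std::data(c) + std::size(c);
}
```

Because nothing is dereferenced, this is also fine for an empty vector, where end_pointer(v) simply equals v.data().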
In general? No.
And the fact that you're asking indicates that something is wrong with your overall design.
For vectors, arrays, strings? Sure… but why?
Just get a pointer to a valid element, and advance it:
std::vector<T> foo;
const T* ptr = foo.data() + foo.size();
As long as you don't dereference such a pointer (which is almost equivalent to dereferencing the iterator, as you did in your attempt) it is valid to obtain and hold such a pointer, because it points to the special one-past-the-end location.
Note that &foo[0] + foo.size() has undefined behaviour if the vector is empty, because &foo[0] is &*(foo.data() + 0) is &*foo.data(), and (just like in your attempt) *foo.data() is disallowed if there's nothing there. So we avoid all dereferencing and simply advance foo.data() itself.
Anyway, this only works for vectors (see note 1), arrays and strings. Other containers do not guarantee (and cannot reasonably be expected to provide) storage contiguity; their end pointers could be almost anything, e.g. a "sentinel" null pointer, which is unlikely to be of any use to you.
That is why the iterator abstraction is there in the first place. Stick to it if you can, instead of delving into raw pointer usage.
1. Excepting std::vector<bool>.

Why using `std::reverse_iterator` doesn't invoke UB?

I was working with std::reverse_iterator today and was thinking about how it works with values created by calling begin on a container. According to cppreference, if I have reverse_iterator r constructed from iterator i, the following has to hold &*r == &*(i-1).
However, this would mean that if I write this
std::vector<int> vec = {1, 2, 3, 4, 5};
auto iter = std::make_reverse_iterator(begin(vec));
iter now points to a piece of memory that is placed before begin(vec), which is out of bounds. By a strict interpretation of the C++ standard, this invokes UB.
(There is specific provision for pointer/iterator to element 1-past-the-end of the array, but as far as I know, none for pointer/iterator to element 1-ahead-of-the-start of an array.)
So, am I reading the link wrong, or is there a specific provision in the standard for this case, or is it that when using reverse_iterator, the whole array is taken as reversed and as such, pointer to ahead of the array is actually pointer past the end?
Yes, you are reading it wrong.
There is no need for reverse-iterators to store pointers pointing before the start of an element.
To illustrate, take an array of 2 elements:
int a[2];
These are the forward-iterators:
a+0 a+1 a+2 // The last one is not dereferenceable
The reverse-iterators would be represented with these exact same values, in reverse order:
a+2 a+1 a+0 // The last one cannot be dereferenced
So, while dereferencing a normal iterator is really straightforward, a reverse-iterator-dereference is slightly more complicated: pointer[-1] (That's for random-access iterators, the others are worse: It copy = pointer; --copy; return *copy;).
Be aware that using forward-iterators is far more common than reverse-iterators, thus the former are more likely to have optimized code for them than the latter. Generic code which does not hit that corner is about as likely to run better with either type though, due to all the transformations a decent optimizing compiler does.
std::make_reverse_iterator(begin(vec)) is not dereferenceable, in the same way that end(vec) is not dereferenceable. It doesn't "point" to any valid object, and that's OK.
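The relationship described in these answers can be checked with a short sketch (std::make_reverse_iterator needs C++14). The function name reverse_iterator_demo is hypothetical; it only inspects the stored iterator through base() and never dereferences the reverse iterator built from begin().

```cpp
#include <iterator>
#include <vector>

// A reverse iterator stores an ordinary iterator (exposed by base())
// and dereferences the element just before it.
inline bool reverse_iterator_demo() {
    std::vector<int> vec = {1, 2, 3, 4, 5};

    // Built from end(): stores end(), dereferences to the last element.
    auto r_first = std::make_reverse_iterator(vec.end());
    bool ok = (r_first.base() == vec.end()) && (&*r_first == &vec.back());

    // Built from begin(): stores begin(), a perfectly valid iterator.
    // It equals rend() and, like end(), must not be dereferenced.
    auto r_last = std::make_reverse_iterator(vec.begin());
    ok = ok && (r_last.base() == vec.begin()) && (r_last == vec.rend());

    // Walking the reverse range visits 5, 4, 3, 2, 1.
    int expected = 5;
    for (auto it = r_first; it != r_last; ++it) {
        ok = ok && (*it == expected--);
    }
    return ok && expected == 0;
}
```

The function returns true, which reflects that no pointer before the start of the storage is ever formed.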

How to correctly (yet efficiently) implement something like "vector::insert"? (Pointer aliasing)

Consider this hypothetical implementation of vector:
template<class T> // ignore the allocator
struct vector
{
    typedef T* iterator;
    typedef const T* const_iterator;
    template<class It>
    void insert(iterator where, It begin, It end)
    {
        ...
    }
    ...
};
Problem
There is a subtle problem we face here:
There is the possibility that begin and end refer to items in the same vector, after where.
For example, if the user says:
vector<int> items;
for (int i = 0; i < 1000; i++)
    items.push_back(i);
items.insert(items.begin(), items.end() - 2, items.end() - 1);
If It is not a pointer type, then we're fine.
But we don't know, so we must check that [begin, end) does not refer to a range already inside the vector.
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
So the compiler could falsely tell us that the items don't alias, when in fact they do, giving us unnecessary O(n) slowdown.
Potential solution & caveat
One solution is to copy the entire vector every time, to include the new items, and then throw away the old copy.
But that's very slow in scenarios such as in the example above, where we'd be copying 1000 items just to insert 1 item, even though we might clearly already have enough capacity.
Is there a generic way to (correctly) solve this problem efficiently, i.e. without suffering from O(n) slowdown in cases where nothing is aliasing?
You can use the predicates std::less etc, which are guaranteed to give a total order, even when the raw pointer comparisons do not.
From the standard [comparisons]/8:
For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.
"But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!"
Wrong. The pointer comparisons are unspecified, not undefined. From C++03 §5.9/2 [expr.rel]:
[...] Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
[...]
-Other pointer comparisons are unspecified.
So it's safe to test if there is an overlap before doing the expensive-but-correct copy.
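Such an overlap test might look like the sketch below, written with std::less as suggested in the earlier answer. The name points_into is hypothetical, and the check assumes that the implementation's total order keeps the addresses of one array together, which holds on common platforms but is not spelled out by the standard.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper for a pointer-based vector: does `p` fall inside
// [first, first + n)? std::less yields a total order over pointers even
// where the built-in < gives an unspecified result.
// Assumption: that order does not interleave unrelated objects with
// the elements of the array being tested.
template <typename T>
bool points_into(const T* p, const T* first, std::size_t n) {
    std::less<const T*> before;
    return !before(p, first) && before(p, first + n);
}
```

An insert implementation could call this on the first element of the source range and fall back to the copying path only when it returns true.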
Interestingly, C99 differs from C++ in this, in that pointer comparisons between unrelated objects are undefined behavior. From C99 §6.5.8/5:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. [...] In all other cases, the behavior is undefined.
Actually, this would be true even if they were regular iterators. There's nothing stopping anyone doing
std::vector<int> v;
// fill v
v.insert(v.end() - 3, v.begin(), v.end());
Determining if they alias is a problem for any implementation of iterators.
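For user code that calls std::vector::insert, whose range overload requires that the source iterators do not point into the vector itself, one conservative option is to copy the source range first. The sketch below assumes index-based arguments, and the name insert_copy_of_own_range is made up; it always pays for the temporary copy, so it does not address the efficiency concern from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: insert a copy of v[from, to) at index `pos`
// of the same vector. The range is copied into a temporary first, so
// the iterators passed to insert never refer to `v` itself.
template <typename T>
void insert_copy_of_own_range(std::vector<T>& v, std::size_t pos,
                              std::size_t from, std::size_t to) {
    assert(from <= to && to <= v.size() && pos <= v.size());
    using diff = typename std::vector<T>::difference_type;
    const std::vector<T> tmp(v.begin() + static_cast<diff>(from),
                             v.begin() + static_cast<diff>(to));
    v.insert(v.begin() + static_cast<diff>(pos), tmp.begin(), tmp.end());
}
```

Starting from {0, 1, 2, 3}, the call insert_copy_of_own_range(v, 0, 2, 3) leaves v as {2, 0, 1, 2, 3}.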
However, the thing you're missing is that you're the implementation, so you don't have to write portable code. As the implementation, you can do whatever you want. You could say "Well, in my implementation, I follow x86, and < and > are fine to use for any pointers." And that would be fine.