Can I check whether or not a given pointer points to an object within an array, specified by its bounds?
template <typename T>
bool points_within_array(T* p, T* begin, T* end)
{
return begin <= p && p < end;
}
Or do the pointer comparisons invoke undefined behavior if p points outside the bounds of the array? In that case, how do I solve the problem? Does it work with void pointers? Or is it impossible to solve?
Although the comparison is valid only for pointers within the array and "one past the end", it is valid to use a set or map with a pointer as the key, which uses std::less<T*>
There was a big discussion on this way back in 1996 on comp.std.c++
Straight from the MSDN documentation:
Two pointers of different types cannot be compared unless:
One type is a class type derived from the other type.
At least one of the pointers is explicitly converted (cast) to type void *. (The other pointer is implicitly converted to type void * for the conversion.)
So a void* can be compared to anything else (including another void*). But will the comparison produce meaningful results?
If two pointers point to elements of
the same array or to the element one
beyond the end of the array, the
pointer to the object with the higher
subscript compares higher. Comparison
of pointers is guaranteed valid only
when the pointers refer to objects in
the same array or to the location one
past the end of the array.
Looks like not. If you don't already know that you are comparing items inside the array (or just past it), then the comparison is not guaranteed to be meaningful.
There is, however, a solution: The STL provides std::less<> and std::greater<>, which will work with any pointer type and will produce valid results in all cases:
if (std::less<T*>()(p, begin)) {
// p is out of bounds
}
Update:
The answer to this question gives the same suggestion (std::less) and also quotes the standard.
The only correct way to do this is an approach like this.
template <typename T>
bool points_within_array(T* p, T* begin, T* end)
{
for (; begin != end; ++begin)
{
if (p == begin)
return true;
}
return false;
}
Fairly obviously, this doesn't work if T == void. I'm not sure whether two void* technically define a range or not. Certainly if you had Derived[n], it would be incorrect to say that (Base*)Derived, (Base*)(Derived + n) defined a valid range so I can't see it being valid to define a range with anything other than a pointer to the actual array element type.
The method below fails because it is unspecified what < returns if the two operands don't point to members of the same object or elements of the same array. (5.9 [expr.rel] / 2)
template <typename T>
bool points_within_array(T* p, T* begin, T* end)
{
return !(p < begin) && (p < end);
}
The method below fails because it is also unspecified what std::less<T*>::operator() returns if the two operands don't point to members of the same object or elements of the same array.
It is true that a std::less must be specialized for any pointer type to yield a total order if the built in < does not but this is only useful for uses such as providing a key for a set or map. It is not guaranteed that the total order won't interleave distinct arrays or objects together.
For example, on a segmented memory architecture the object offset could be used for < and as the most significant differentiator for std::less<T*> with the segment index being used to break ties. In such a system an element of one array could be ordered between the bounds of a second distinct array.
template <typename T>
bool points_within_array(T* p, T* begin, T* end)
{
return !(std::less<T*>()(p, begin)) && (std::less<T*>()(p, end));
}
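To make the interleaving concrete, here is a toy model of the segmented case described above. Everything in it is hypothetical: `FarPtr`, `builtin_less` and `total_less` are illustrative names, not real APIs, and no actual implementation is claimed to order pointers this way. It only shows that a total order which agrees with `<` inside one array can still place an element of another array between `begin` and `end`.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical model of a far pointer on a segmented machine (not a real API).
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Stand-in for the built-in "<": compares offsets only, which is sufficient
// for two pointers into the same array (they share a segment).
inline bool builtin_less(FarPtr a, FarPtr b) {
    return a.offset < b.offset;
}

// Stand-in for std::less: a total order with the offset as the most
// significant key and the segment used only to break ties.
inline bool total_less(FarPtr a, FarPtr b) {
    return std::tie(a.offset, a.segment) < std::tie(b.offset, b.segment);
}

// The range check from the answer, written against the total order.
inline bool points_within_array(FarPtr p, FarPtr begin, FarPtr end) {
    return !total_less(p, begin) && total_less(p, end);
}
```

With an array occupying offsets 0x10 to 0x20 of segment 1, a pointer at offset 0x18 of segment 2 passes this check even though it belongs to a different array.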
The C++ standard does not specify what happens when you compare pointers to objects that do not reside in the same array, hence undefined behaviour. However, the C++ standard is not the only standard your platform must conform to. Other standards, such as POSIX, specify things that the C++ standard leaves as undefined behaviour.
On platforms with virtual address space like Linux and Win32/64 you can compare any pointers without causing any undefined behaviour.
Comparisons on pointer types don't necessarily result in a total order. std::less/std::greater_equal do, however. So ...
template <typename T>
bool points_within_array(T* p, T* begin, T* end)
{
return std::greater_equal<T*>()(p, begin) && std::less<T*>()(p, end);
}
will work.
Could you not do this with std::distance, i.e. your problem effectively boils down to:
return distance(begin, p) >= 0 && distance(begin, p) < distance(begin, end);
Given this random access iterator (pointer) is being passed in, it should boil down to some pointer arithmetic rather than pointer comparisons? (I'm assuming end really is end and not the last item in the array, if the last then change the less than to <=).
I could be way off the mark...
Related
I came across the following code:
for (int i = 0; i < subspan.size(); i++) {
...
int size = size_table[&(subspan[i]) - fullspan.begin()];
...
}
subspan and fullspan are both of type std::span (actually absl::Span from Google's Abseil library, but they seem to be pretty much the same as std::span) and are views into the same data array (with fullspan spanning the entire array).
Is this valid and well defined code? It seems to depend on the iterator being converted to the corresponding pointer value when the - operator is applied together with a lhs pointer.
Is it valid to subtract an iterator from an element pointer to get a valid index?
It could be, depending on how the iterator is defined. For example, it works if the iterator is a pointer of the same type, and points to an element of the same array.
However, no generic iterator concept specifies such operation, and so such operation isn't guaranteed to work with any standard iterator. Hence, it's not a portable assumption that it would work in generic code.
Is this valid and well defined code?
The iterator type in question is defined to be the pointer type, so that condition is satisfied. Abseil is neither thoroughly documented nor specified, so it's hard to say whether that's an intentional feature or an incidental implementation detail. If it's the latter, then the code may break in future versions of Abseil.
Reading the implementation of absl::Span, we have:
template <typename T>
class Span {
...
public:
using element_type = T;
using pointer = T*;
using const_pointer = const T*;
using reference = T&;
...
using iterator = pointer;
...
constexpr iterator begin() const noexcept { return data(); }
constexpr reference operator[](size_type i) const noexcept { return *(data() + i); }
...
};
So your expression boils down to plain pointer arithmetic.
Note that there is no check on whether both spans refer to the same underlying array, but you asserted that they do.
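As a rough illustration of why the subtraction reduces to pointer arithmetic, here is a stripped-down stand-in for the span type. `MiniSpan` and `index_in_full` are hypothetical names made up for this sketch; it mirrors only the parts of the excerpt above that matter (a raw pointer as the iterator) and is not Abseil's actual class.

```cpp
#include <cstddef>

// Hypothetical, minimal stand-in for absl::Span: the iterator is a raw pointer.
template <typename T>
class MiniSpan {
public:
    using iterator = T*;

    MiniSpan(T* data, std::size_t size) : data_(data), size_(size) {}

    iterator begin() const { return data_; }
    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) const { return data_[i]; }

    // View of len elements starting at pos (no bounds checking in this sketch).
    MiniSpan subspan(std::size_t pos, std::size_t len) const {
        return MiniSpan(data_ + pos, len);
    }

private:
    T* data_;
    std::size_t size_;
};

// Index of sub[i] within full. The subtraction is only defined when both
// spans view the same underlying array; nothing here checks that.
template <typename T>
std::ptrdiff_t index_in_full(MiniSpan<T> full, MiniSpan<T> sub, std::size_t i) {
    return &sub[i] - full.begin();
}
```

Because `begin()` returns `T*`, the expression `&sub[i] - full.begin()` is an ordinary pointer difference. With an iterator type that is a class, the same expression would only compile if that class happened to support it.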
I am working with some code that checks if std::vector contains a given element in constant time by comparing its address to those describing the extent of the vector's data. However I suspect that, although it works, it relies on undefined behaviour. If the element is not contained by the vector then the pointer comparisons are not permitted.
template <typename T>
bool contains(const std::vector<T>& v, const T& a) {
return (v.data() <= &a) && (&a < v.data() + v.size());
}
Am I right in believing it is undefined behaviour? If so, is there any way to do the same thing without drastically changing the time complexity of the code?
You can use std::less
A specialization of std::less for any pointer type yields the implementation-defined strict total order, even if the built-in < operator does not.
Update:
The standard doesn't guarantee that this will actually work for contains though. If you have say two vectors a and b, the total order is permitted to be &a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ..., i.e., with the elements interleaved.
As pointed out in the comments, the standard only guarantees that std::less yields the implementation-defined strict total order, which is consistent with the partial order imposed by the built-in operators. However, the standard doesn't guarantee the order of pointers pointing to different objects or arrays. Related: https://devblogs.microsoft.com/oldnewthing/20170927-00/?p=97095
One interesting thing is that there's a similar usage in Herb Sutter's gcpp library. A comment there says that it is portable, though the library is experimental.
// Return whether p points into this page's storage and is allocated.
//
inline
bool gpage::contains(gsl::not_null<const byte*> p) const noexcept {
// Use std::less<> to compare (possibly unrelated) pointers portably
auto const cmp = std::less<>{};
auto const ext = extent();
return !cmp(p, ext.data()) && cmp(p, ext.data() + ext.size());
}
Yes, the comparisons as written are not permitted if the reference doesn't reference something that is already an element of the vector.
You can make the behavior defined by casting all pointers to uintptr_t and comparing those. This will work on all architectures with a flat address space (i.e. possibly not old 16-bit x86), although the pointer-to-integer mapping is implementation-defined, so the standard itself does not guarantee these specific semantics.
As a side note, I would always interpret the name contains to be about the value, and thus be very surprised if the semantics are anything other than std::find(v.begin(), v.end(), a) != v.end(). Consider using a more expressive name.
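A minimal sketch of the uintptr_t approach, assuming a platform with a flat address space where converting a pointer to an integer yields its linear address (the standard only says the mapping is implementation-defined). The name `is_element_of` is a hypothetical choice made here to signal that the check is about identity, not value.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Returns true if `a` is stored inside v's buffer (identity, not value).
// Assumption: integer values of pointers follow the linear address order.
template <typename T>
bool is_element_of(const std::vector<T>& v, const T& a) {
    const auto lo   = reinterpret_cast<std::uintptr_t>(v.data());
    const auto hi   = reinterpret_cast<std::uintptr_t>(v.data() + v.size());
    const auto addr = reinterpret_cast<std::uintptr_t>(std::addressof(a));
    // Plain integer comparisons: always well defined, whatever the pointers were.
    return lo <= addr && addr < hi;
}
```

The comparisons themselves are on integers, so they cannot be undefined; what remains non-portable is the meaning of the integer values.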
Intuitively, to check whether a pointer p lies in [a,b), one would write
a<=p && p<b
However, comparing pointers from two arrays results in unspecified behavior and thus we cannot safely say p is in [a,b) from this comparison.
Is there any way one can check for this with certainty?
(It would be better if it can be done for std::vector<T>::const_iterator, but I don't think it's feasible.)
Here's a partial solution. You can leverage the fact that the comparison would invoke unspecified behavior, and the fact that a core-constant-expression can't perform this operation:
template<typename T>
constexpr bool check(T *p, T *a, T *b)
{
return a <= p and p < b;
}
Now this function can be used like this:
int main()
{
int arr[5];
int arr_2[5];
constexpr bool b1 = check(arr + 1, arr, arr + 3); // ok
constexpr bool b2 = check(arr_2 + 1, arr, arr + 3); // error
}
Here's a demo.
This obviously works only if the pointer values are known at compile time. At run-time, there is no efficient way of doing this check.
The solution for pointers is to use the comparison objects defined in <functional>, like less/less_equal, etc.
From [comparisons] in the C++ standard (§23.14.7/2 in C++17) [1]:
For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.
So the solution for pointers would be:
template<typename T>
bool check(T *p, T *a, T *b)
{
return std::less_equal<T*>{}(a,p) && std::less<T*>{}(p,b);
}
Here's a working example using pointers.
There is no such strict guarantee for iterators; however this can be worked around in c++20, since it provides std::to_address which can convert pointable objects to pointers. Note, however, that the behavior of doing this for the purpose of comparisons is only really well defined for contiguous iterators.
Since we know that std::vector iterators cover a contiguous range, we can use this to retrieve the underlying pointer (note: not dereference it, as this would be undefined behavior for the past-the-end pointer).
So for a std::vector<T>::iterator, a solution might look like:
// T appears only in a non-deduced context, so call this as check<int>(p, a, b).
template <typename T>
bool check(typename std::vector<T>::const_iterator p,
           typename std::vector<T>::const_iterator a,
           typename std::vector<T>::const_iterator b)
{
// Delegate to the pointer check version defined above, for brevity
return check(std::to_address(p), std::to_address(a), std::to_address(b));
}
Here's a working example using iterators.
[1] This same note exists all the way back to C++11, where it is §20.8.5/8, with similar wording.
If I understand you correctly, you want to check if vector iterator is between two other vector iterators.
Then you may use std::distance to compute the distance from vector.begin() to each of a, p and b, and then simply compare the integers that std::distance returns.
std::distance(first, last) from C++17 can be used for both, but the result is undefined if last is unreachable from first (e.g. different range or invalid iterator).
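A minimal sketch of the index-based idea, under the stated precondition that all three iterators are valid iterators into the same vector. The function name `in_range` is made up for this example. It does not detect foreign iterators; it only avoids comparing them directly.

```cpp
#include <iterator>
#include <vector>

// Precondition: p, a and b are valid iterators into v (end() is allowed).
// Passing an iterator from another container is undefined behaviour.
template <typename T>
bool in_range(const std::vector<T>& v,
              typename std::vector<T>::const_iterator p,
              typename std::vector<T>::const_iterator a,
              typename std::vector<T>::const_iterator b) {
    // Turn each iterator into an index relative to the start of v.
    const auto ip = std::distance(v.begin(), p);
    const auto ia = std::distance(v.begin(), a);
    const auto ib = std::distance(v.begin(), b);
    // Compare plain integers: p is in the half-open range [a, b).
    return ia <= ip && ip < ib;
}
```

For example, `in_range(v, v.begin() + 2, v.begin() + 1, v.begin() + 4)` asks whether the third element lies in the range covering elements 1 to 3.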
When I'm making a procedure with pointer arithmetic and !=, such as
template <typename T> void reverse_array ( T * arr, size_t n )
{
T * end = arr + n;
while (arr != end && arr != --end)
{
swap(*arr, *end); // swap the elements, not the local pointers
++arr;
}
}
I always take a lot of caution because if I write my procedure wrong then in a corner case the first pointer might "jump over" the second one. But, if arrays are such that
&arr[0] < &arr[1] < ... < &arr[n]
for any array arr of length n, then can't I just do something like
template <typename T> void reverse_array ( T * arr, size_t n )
{
T * end = arr + n;
if (arr == end) return; // empty array: nothing to reverse
--end;
while (arr < end)
{
swap(*arr, *end); // swap the elements, not the local pointers
++arr; --end;
}
}
since it's more readable? Or is there a danger looming? Aren't memory addresses just integral types and thus comparable with < ?
The relational operators are defined to work correctly when comparing addresses within the same array (in fact, also objects of class type, where there are some guarantees about memory layout also) including the one-past-the-end pointer.
However, if you "jump over" the end-of-array pointer, you are no longer comparing two addresses within the same array, and the behavior is undefined. (One cause is that you might in fact experience wraparound when you do pointer arithmetic outside objects, but UB is not restricted).
Your case is perfectly fine concerning jump-over, because your end pointer isn't the one-past-the-end of the array, since you always do at least one --end. An empty array, where --end moves outside the array, would be an issue, but you test for that separately.
Conclusion: your second code is perfectly valid.
For C (since you tagged both), yes, they can be compared, within the same array:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
-- C11 6.5.8, "Relational operators".
But it's not because they're "just integral types", which they aren't (and aren't guaranteed to be represented as in memory) - it's because they also have the behaviour defined for them.
Consider this hypothetical implementation of vector:
template<class T> // ignore the allocator
struct vector
{
typedef T* iterator;
typedef const T* const_iterator;
template<class It>
void insert(iterator where, It begin, It end)
{
...
}
...
};
Problem
There is a subtle problem we face here:
There is the possibility that begin and end refer to items in the same vector, after where.
For example, if the user says:
vector<int> items;
for (int i = 0; i < 1000; i++)
items.push_back(i);
items.insert(items.begin(), items.end() - 2, items.end() - 1);
If It is not a pointer type, then we're fine.
But we don't know, so we must check that [begin, end) does not refer to a range already inside the vector.
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
So the compiler could falsely tell us that the items don't alias, when in fact they do, giving us unnecessary O(n) slowdown.
Potential solution & caveat
One solution is to copy the entire vector every time, to include the new items, and then throw away the old copy.
But that's very slow in scenarios such as in the example above, where we'd be copying 1000 items just to insert 1 item, even though we might clearly already have enough capacity.
Is there a generic way to (correctly) solve this problem efficiently, i.e. without suffering from O(n) slowdown in cases where nothing is aliasing?
You can use the predicates std::less etc, which are guaranteed to give a total order, even when the raw pointer comparisons do not.
From the standard [comparisons]/8:
For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.
But how do we do this? According to C++, if they don't refer to the same array, then pointer comparisons would be undefined!
Wrong. The pointer comparisons are unspecified, not undefined. From C++03 §5.9/2 [expr.rel]:
[...] Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
[...]
-Other pointer comparisons are unspecified.
So it's safe to test if there is an overlap before doing the expensive-but-correct copy.
Interestingly, C99 differs from C++ in this, in that pointer comparisons between unrelated objects are undefined behavior. From C99 §6.5.8/5:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. [...] In all other cases, the behavior is undefined.
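The "test for overlap first, copy only if needed" idea can be sketched as follows. This is an illustration under assumptions, not the way any particular standard library does it: `may_point_into` and `insert_range` are hypothetical helpers, std::vector stands in for the hand-written container, and std::less is used instead of the raw operators. If `first` really points into the buffer, the comparison is well defined and returns true, so a real alias is never missed. For unrelated pointers the result may be a false positive, which only costs an unnecessary copy.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// True if `first` may point into v's current elements. Never false when it
// actually does; possibly true for unrelated pointers (order is unspecified).
template <typename T>
bool may_point_into(const std::vector<T>& v, const T* first) {
    const T* lo = v.data();
    const T* hi = v.data() + v.size();
    return std::less_equal<const T*>()(lo, first) &&
           std::less<const T*>()(first, hi);
}

// Inserts [first, last) before index pos, tolerating ranges inside v itself.
template <typename T>
void insert_range(std::vector<T>& v, std::size_t pos,
                  const T* first, const T* last) {
    const auto where = v.begin() + static_cast<std::ptrdiff_t>(pos);
    if (may_point_into(v, first)) {
        // Possible aliasing: take the slow but correct path via a temporary.
        const std::vector<T> tmp(first, last);
        v.insert(where, tmp.begin(), tmp.end());
    } else {
        // No aliasing reported: insert directly from the source range.
        v.insert(where, first, last);
    }
}
```

Only the aliasing case pays for the extra copy of the inserted range; the common case of inserting from an unrelated buffer goes straight through.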
Actually, this would be true even if they were regular iterators. There's nothing stopping anyone doing
std::vector<int> v;
// fill v
v.insert(v.end() - 3, v.begin(), v.end());
Determining if they alias is a problem for any implementation of iterators.
However, the thing you're missing is that you're the implementation, you don't have to use portable code. As the implementation, you can do whatever you want. You could say "Well, in my implementation, I follow x86 and < and > are fine to use for any pointers.". And that would be fine.