Just for fun, I had a std::list of const char*, each element pointing to a null-terminated text string, and ran a std::list::sort() on it. As it happens, it sort of (no pun intended) did not sort the strings. Considering that it was working on pointers, that makes sense.
According to the documentation of std::list::sort(), it (by default) uses the operator < between the elements to compare.
Forgetting about the list for a moment, my actual question is: How do these (>, <, >=, <=) operators work on pointers in C++ and C? Do they simply compare the actual memory addresses?
char* p1 = (char*) 0xDAB0BC47;
char* p2 = (char*) 0xBABEC475;
e.g. on a 32-bit, little-endian system, p1 > p2 because 0xDAB0BC47 > 0xBABEC475?
Testing seems to confirm this, but I thought it'd be good to put it on StackOverflow for future reference. C and C++ both do some weird things to pointers, so you never really know...
In C++, you can't compare just any pointers using the relational operators. You can only compare two pointers that point to elements in the same array or two pointers that point to members of the same object. (You can also compare a pointer with itself, of course.)
You can, however, use std::less and the other relational comparison function objects to compare any two pointers. The results are implementation-defined, but it is guaranteed that there is a total ordering.
If you have a flat address space, it's likely that pointer comparisons just compare addresses as if they are integers.
(I believe the rules are the same in C, without the comparison function objects, but someone will have to confirm that; I'm not nearly as familiar with C as I am with C++.)
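For the list-sorting problem in the question, a comparator that looks at the characters rather than the addresses is probably what was wanted. The following is a minimal sketch; the names `cstr_less` and `sort_by_content` are made up for illustration and are not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <list>

// Orders two C strings by their characters, not by their addresses.
inline bool cstr_less(const char* lhs, const char* rhs) {
    return std::strcmp(lhs, rhs) < 0;
}

// Sorts a list of C strings lexicographically by passing the comparator
// to std::list::sort instead of relying on operator< for pointers.
inline void sort_by_content(std::list<const char*>& words) {
    words.sort(cstr_less);
    assert(std::is_sorted(words.begin(), words.end(), cstr_less));
}
```

If the goal is only a consistent order of the pointers themselves (for example to use them as keys), `std::less<const char*>` can be passed instead of `cstr_less`.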
This is just a supplement to the other answers.
In C++ 20.3.3/8:
For templates greater, less,
greater_equal, and less_equal, the
specializations for any pointer type
yield a total order, even if the
built-in operators <, >, <=, >= do
not.
In C 6.5.8/5:
If two pointers to object or
incomplete types both point to the
same object, or both point one past
the last element of the same array
object, they compare equal. If the
objects pointed to are members of the
same aggregate object, pointers to
structure members declared later
compare greater than pointers to
members declared earlier in the
structure, and pointers to array
elements with larger subscript values
compare greater than pointers to
elements of the same array with lower
subscript values. All pointers to
members of the same union object
compare equal. If the expression P
points to an element of an array
object and the expression Q points to
the last element of the same array
object, the pointer expression Q+1
compares greater than P. In all other
cases, the behavior is undefined.
So, I think comparing char const* pointers that belong to two different '\0'-terminated strings, as in the question, is undefined behavior (in C).
Yes, they just compare memory addresses.
Assume we have an array that contains N elements of type T.
T a[N];
According to the C++14 Standard, under which conditions do we have a guarantee that
(char*)(void*)&a[0] + n*sizeof(T) == (char*)(void*)&a[n], (0<=n<N) ?
While this is true for many types and implementations, the standard mentions it in a footnote, and in an ambiguous way:
§5.7.6, footnote 85) Another way to approach pointer arithmetic ...
There is little indication that this other way was thought of being equivalent to the standard's way. It might rather be a hint for implementers that suggests one of many conforming implementations.
Edits:
People have underestimated the difficulty of this question.
This question is not about what you can read in textbooks; it is about what you can deduce from the C++14 Standard through the use of logic and reason.
If you use 'contiguous' or 'contiguously', please also say what is being contiguous.
While T[] and T* are closely related, they are abstractions, and the addition on T* x N may be defined by the implementation in any consistent way.
The equation was rearranged using pointer addition. If p points to a char, p+1 is always defined using (§5.7 (4)) or unary addition, so we don't run into UB. The original included a pointer subtraction, which might have caused UB early on. (The char pointers are only compared, not dereferenced).
In [dcl.array]:
An object of array type contains a contiguously allocated non-empty
set of N subobjects of type T.
Contiguous implies that the offset between any consecutive subobjects of type T is sizeof(T), which implies that the offset of the nth subobject is n*sizeof(T).
The upper bound of n < N comes from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements,
the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j < n; otherwise, the behavior is undefined.
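The equation from the question can at least be checked on the implementation at hand. The sketch below does that for any built-in array; the helper name `elements_are_evenly_spaced` is hypothetical, and a passing check only shows what one compiler does, not what the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if, for every index n with 0 <= n < N, the byte address of
// a[n] equals the byte address of a[0] advanced by n * sizeof(T).
template <typename T, std::size_t N>
bool elements_are_evenly_spaced(T (&a)[N]) {
    const char* base = reinterpret_cast<const char*>(&a[0]);
    for (std::size_t n = 0; n < N; ++n) {
        const char* nth = reinterpret_cast<const char*>(&a[n]);
        if (base + n * sizeof(T) != nth) {
            return false;
        }
    }
    return true;
}

// Example use with an array of ints.
inline void example_evenly_spaced() {
    int values[8] = {};
    assert(elements_are_evenly_spaced(values));
}
```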
It's always true, but instead of looking at the rules for pointer arithmetic you must rely on the semantics given for the sizeof operator (5.3.3 [expr.sizeof]):
When applied to a reference or a reference type, the result is the size of the referenced type. When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. The size of a most derived class shall be greater than zero.
The result of applying sizeof to a base class subobject is the size of the base class type. When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.
It should be clear that there's only one packing that puts n non-overlapping elements in space of n * sizeof(element), namely that they are regularly spaced sizeof (element) bytes apart. And only one ordering is allowed by the pointer comparison rules found under the relational operator section (5.9 [expr.rel]):
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
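The two ingredients of this argument can be written down directly. This is a sketch only: `Padded` is a hypothetical element type chosen because it usually contains internal padding, and `subscripts_order_pointers` is an illustrative helper name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type; most ABIs insert padding after `tag`.
struct Padded {
    char tag;
    double value;
};

// [expr.sizeof]: the size of an array of n elements is n times the size of
// one element, so any padding is already counted in sizeof(Padded).
static_assert(sizeof(Padded[5]) == 5 * sizeof(Padded),
              "array size is n * element size");

// [expr.rel]: within one array, the element with the higher subscript
// compares greater. Returns true if that holds for all neighbours.
template <typename T, std::size_t N>
bool subscripts_order_pointers(T (&a)[N]) {
    for (std::size_t i = 0; i + 1 < N; ++i) {
        if (!(&a[i] < &a[i + 1])) {
            return false;
        }
    }
    return true;
}

// Example use with an array of the padded type.
inline void example_ordering() {
    Padded items[5] = {};
    assert(subscripts_order_pointers(items));
}
```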
The declaration in the first line is also a definition. (§3.1(2))
It creates the array object. (§1.8(1))
An object can be accessed via multiple lvalues
due to the aliasing rules. (§3.10(10)) In particular, the objects on the
right hand side may be legally accessed (aliased) through char pointers.
Let's look at a sentence in the array definition and then disambiguate 'contiguous'.
"An object of array type contains a contiguously allocated non-empty set
of N subobjects of type T." [dcl.array] §8.3.4.
Disambiguation
We start from the binary symmetric relation 'contiguous' for char objects, which should be obvious. ('iff' is short for 'if and only if', sets and sequences are mathematical ones, not C++ containers) If you can
link to a better or more acknowledged definition, comment.
A sequence x_1 ... x_N of char objects is contiguous iff
x_i and x_{i+1} are contiguous in memory for all i=1...N-1.
A set M of char objects is contiguous iff the objects in
M can be numbered, x_1 ...x_N, say, such that the sequence (x_i)_i is contiguous.
That is, iff M is the image of a contiguous, injective sequence.
Two sets M_1, M_2 of char objects are contiguous iff there
exist x_1 in M_1 and x_2 in M_2 such that x_1 and x_2 are contiguous.
A sequence M_1 ... M_N of sets of char objects is contiguous iff
M_i and M_{i+1} are contiguous for all i=1...N-1.
A set of sets of char objects is contiguous iff it is the image of
a contiguous, injective sequence of sets of char objects.
Now which version of 'contiguous' to apply? Linguistic overload resolution:
1) 'contiguous' may refer to 'allocation'. As an allocation function call provides a
subset of the available char objects, this would invoke the set-of-chars variant. That is,
the set of all char objects that occur in any of the N subobjects would be meant to be contiguous.
2) 'contiguous' may refer to 'set'. This would invoke the set-of-sets-of-chars variant with every subobject considered as a set of char objects.
What does this mean? First, while the authors numbered the array subobjects a[0] ... a[N-1], they chose not to say anything about the
order of subobjects in memory: they used 'set' instead of 'sequence'.
They described the allocation as contiguous, but they do not say that
a[j] and a[j+1] are contiguous in memory. Also, they chose not to write down the
straightforward formula involving (char*) pointers and sizeof(). While it looks like they
deliberately separated contiguity from ordering concerns,
§5.9 (3) requires one and the same ordering for array subobjects of all types.
If pointers point to two different elements of the same array, or a subobject thereof, the pointer
to the element with the higher subscript compares greater.
Now do the bytes that make up the array subobjects qualify as
subobjects in the sense of the above quote? Reading §1.8(2) and the related question "Complete object or subobject?",
the answer is: No, at least not for arrays whose elements don't contain subobjects and are no arrays of chars, e.g. arrays of ints. So we may find examples where no particular ordering is imposed on the array elements.
But for the moment let's assume that our array subobjects are populated with chars only.
What does this mean considering the two possible interpretations of 'contiguous'?
1) We have a contiguous set of bytes that coincides with an ordered set of subobjects.
Then the claim in the OP is unconditionally true.
2) We have a contiguous sequence of subobjects, each of which may be non-contiguous individually.
This may happen in two ways: either the subobjects may have gaps, that is, they
contain two char objects at distance greater than sizeof(subobject)-1. Or the
subobjects may be distributed among different sequences of contiguous bytes.
In case 2) there is no guarantee that the claim in the OP is true.
Therefore, it is important to be clear about what 'contiguous' means.
Finally, here's an example of an implementation where no obvious ordering is imposed on the array subobjects by §5.9 because the array subobjects don't have subobjects themselves. Readers raised concerns that this would contradict the standard in other places, but no definite contradiction has been demonstrated yet.
Assume T is int, and we have one particular conforming implementation that behaves as expected naively with one exception:
It allocates arrays of ints in reversed memory order,
putting the array's first element at the high-memory-address end of the object:
a[N-1], a[N-2], ... a[0]
instead of
a[0], a[1], ... a[N-1]
This implementation satisfies any reasonable contiguity
requirement, so we don't have to agree on a single interpretation of
'contiguous' to proceed with the argument.
Then if p points to a, mapping p to &a[0] (invoking [conv.array]) would make the pointer jump near the high memory end of a.
As array arithmetic has to be compatible with pointer arithmetic, we'd also have
int * p= &intVariable;
(char*)(p+1) + sizeof(int) == (char*)p
and
int a[N];
(char*)(void*)&a[n] + n*sizeof(int)==(char*)(void*)&a[0], (0<=n<N)
Then, for T=int, there is no guarantee that the claim in the original post is true.
edit history: removed and reintroduced in modified form a possibly erroneous shortcut that was due to not applying a relevant part of the pointer < relation specification. It has not been determined yet whether this was justified or not, but the main argument about contiguity comes through anyway.
There are so many questions on comparing two pointers, but I found none on whether the two types are such that the pointers can be compared. Given
A* a;
B* b;
I want to know if expression a # b is valid, where # is one of ==,!=,<,<=,>,>= (I don't mind about nullptr_t or any other type that can be implicitly converted to a pointer). Is it when A, B are
equal?
equal except for cv-qualification?
in the same class hierarchy?
...?
I didn't find anything in std::type_traits. I could always do my own SFINAE test, but I am looking for the rules to apply them directly. I guess that will be easier for the compiler, right?
EDIT To clarify again: I am comparing pointers, not objects pointed to. I want to know in advance when a # b will give a compiler error, not what will be its value (true or false or unspecified).
C++ Standard
5.9 Relational operators
Pointers to objects or functions of the same type (after pointer conversions) can be compared,
with a result defined as follows:
— If two pointers p and q of the same type point to the same object or function, or both point one past
the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q both
yield false.
— If two pointers p and q of the same type point to different objects that are not members of the same
object or elements of the same array or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
— If two pointers point to non-static data members of the same object, or to subobjects or array elements
of such members, recursively, the pointer to the later declared member compares greater provided the
two members have the same access control (Clause 11) and provided their class is not a union.
— If two pointers point to non-static data members of the same object with different access control
(Clause 11) the result is unspecified.
— If two pointers point to non-static data members of the same union object, they compare equal (after
conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond
the end of the array, the pointer to the object with the higher subscript compares higher.
— Other pointer comparisons are unspecified.
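The "same type after pointer conversions" condition can be probed at compile time with the kind of SFINAE test the question mentions. The sketch below is one way to write it; the trait name `pointers_are_comparable` and the types `Base`, `Derived` and `Unrelated` are made up for illustration, and the trait only tests the `<` operator.

```cpp
#include <type_traits>
#include <utility>

// Primary template: assume that `A* < B*` does not compile.
template <typename A, typename B, typename = void>
struct pointers_are_comparable : std::false_type {};

// Partial specialization, selected only when the expression
// `std::declval<A*>() < std::declval<B*>()` is well-formed.
template <typename A, typename B>
struct pointers_are_comparable<
    A, B, decltype(void(std::declval<A*>() < std::declval<B*>()))>
    : std::true_type {};

// Hypothetical types used to exercise the trait.
struct Base {};
struct Derived : Base {};
struct Unrelated {};

static_assert(pointers_are_comparable<int, int>::value,
              "identical types");
static_assert(pointers_are_comparable<int, const int>::value,
              "types differing only in cv-qualification");
static_assert(pointers_are_comparable<Derived, Base>::value,
              "derived-to-base pointer conversion");
static_assert(!pointers_are_comparable<int, double>::value,
              "no common pointer type");
static_assert(!pointers_are_comparable<Base, Unrelated>::value,
              "unrelated classes");
```

The same pattern works for the other five operators by swapping the operator inside `decltype`.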
Those comparison operators are always 'valid', in that you can always use them. But the results usually will not be meaningful or useful.
== and != will essentially tell you whether or not a and b refer to the same object. If a == b, then changing *a will affect *b and vice-versa.
>, <, >=, and <= will tell you where the memory addresses are, relative to one another. Usually this information will be irrelevant to the functionality of your program, and in most cases it will be unpredictable. However, one case where you can use it is when you know that a and b point to elements of the same array. In that case a < b will tell you whether or not the object *a comes before *b in the array.
Today I wrote something which looked like this:
void foo(std::vector<char>&v){
v.push_back('a');
char*front=&v.front();
char*back=&v.back();
size_t n1=back-front+1;
v.push_back('b');//This could reallocate the vector elements
size_t n2=back-front+1;//Is this line valid or Undefined Behavior ?
}
If a reallocation occurs when I push 'b' back, may I still compute the difference of my two pointers?
After reading the relevant passage of the standard a few times, I still cannot make up my mind on this point.
C++11 5.7.6:
When two pointers to elements of the same array object are subtracted, the result is the difference of the
subscripts of the two array elements. The type of the result is an implementation-defined signed integral
type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2). As
with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined.
In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object,
the expression (P)-(Q) has the value i − j provided the value fits in an object of type std::ptrdiff_t.
Moreover, if the expression P points either to an element of an array object or one past the last element of
an array object, and the expression Q points to the last element of the same array object, the expression
((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the
expression P points one past the last element of the array object, even though the expression (Q)+1 does not
point to an element of the array object. Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
Of course I know that it works, I just wonder if it is legal.
Pointers to deleted objects are toxic: don't touch them for anything other than giving them a new value. A memory tracking system may trap any use of a reclaimed pointer value. I'm not aware of any such system in existence, however.
The relevant quote is 3.7.4.2 [basic.stc.dynamic.deallocation] paragraph 4:
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value, the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined.
When resizing a std::vector<...> it jumps through a number of hoops (allocators) and, by default, eventually calls a deallocation function.
Strictly speaking, it's UB. But you can always convert your char * pointers to uintptr_t (provided it is present) and then safely subtract the resulting integers.
void foo(std::vector<char>&v){
v.push_back('a');
auto front= uintptr_t (&v.front());
auto back = uintptr_t (&v.back());
size_t n1=back-front+1;
v.push_back('b');//This could reallocate the vector elements
size_t n2=back-front+1;
}
This particular case is safe but ugly and misleading.
The line v.push_back('b'); can cause your container to reallocate. In that case the next line uses front and back pointers that no longer refer to live elements. Computing the difference of two addresses is safe even if they are dangling pointers. What is not safe is dereferencing them.
The correct solution is to use the vector::size() function, which is always in sync. If you (for some reason) don't want to call vector::size(), you should at least use ++n1.
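A version that never touches a possibly invalidated pointer could look like the sketch below. The names `Counts` and `push_and_count` are hypothetical; the point is only that the count comes from `size()` and that any pointers are taken again after the last `push_back`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element counts observed after the first and the second push_back.
struct Counts {
    std::size_t after_first;
    std::size_t after_second;
};

// Appends 'a' and 'b' to v. Pointers into the vector are obtained only
// after the final push_back, so a reallocation cannot leave them dangling.
inline Counts push_and_count(std::vector<char>& v) {
    v.push_back('a');
    const std::size_t n1 = v.size();

    v.push_back('b');  // may reallocate; earlier pointers would be invalid

    const char* front = &v.front();
    const char* back = &v.back();
    const std::size_t n2 = static_cast<std::size_t>(back - front) + 1;
    assert(n2 == v.size());  // both pointers refer to the current storage

    return Counts{n1, n2};
}
```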
char** buffer{ /* some buffer */ };
char* ptr1{buffer[0]};
char* ptr2{buffer[10]};
assert(ptr1 < ptr2);
If two pointers point to different locations in the same buffer, is it safe to compare them?
I want to know if a range of pointers is valid by comparing: assert(rangeBeginPtr < rangeEndPtr).
You can compare pointers with the relational operators (<, >, <= and >=) provided they both point to an element of the same array, or one past that array. Anything else is unspecified behaviour as per C++11 5.9 Relational operators. So, given:
char xyzzy[10];
char plugh[10];
All these are specified to function correctly:
assert(&(xyzzy[1]) < &(xyzzy[4]));
assert(&(xyzzy[9]) < &(xyzzy[10])); // even though [10] isn't there.
but these are not:
assert(&(xyzzy[1]) < &(xyzzy[15]));
assert(&(xyzzy[9]) < &(plugh[3]));
The type doesn't come into it except that it has to be the same type if you're comparing two elements in the same array. If you have two char * variables, that's unspecified if they point to different arrays even though they have the same type.
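For the range check in the question, one option is to go through the standard comparison function objects, which give a total order for pointers and agree with the built-in operators inside one array. A minimal sketch, with `is_forward_range` and `range_length` as made-up helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// True if begin does not come after end. The answer is only meaningful as a
// range check when both pointers refer to the same array (or one past it).
inline bool is_forward_range(const char* begin, const char* end) {
    return std::less_equal<const char*>()(begin, end);
}

// Length of [begin, end); asserts that the range is not reversed.
// Precondition: both pointers refer to the same array.
inline std::size_t range_length(const char* begin, const char* end) {
    assert(is_forward_range(begin, end));
    return static_cast<std::size_t>(end - begin);
}
```

Note that such a check cannot detect that two pointers come from different arrays; it only rules out a reversed range.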
You can determine the order of pointers only within one array object and if they are non-void. However, within one array object the comparison is well defined. The relevant clause in the standard is 5.9 [expr.rel] paragraph 2:
[...] Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
If two pointers p and q of the same type point to the same object or function, or both point one past the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q both yield false.
If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.
If two pointers point to non-static data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control (Clause 11) and provided their class is not a union.
If two pointers point to non-static data members of the same object with different access control (Clause 11) the result is unspecified.
If two pointers point to non-static data members of the same union object, they compare equal (after conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond the end of the array, the pointer to the object with the higher subscript compares higher.
Other pointer comparisons are unspecified.
== and != are valid and well-defined for all pointers of the same type. <, <=, >, and >= are only meaningful for pointers that point to objects in the same array or one-past-the-end of the array. They are also meaningful for pointers to sub-objects of a class object if the sub-objects have the same type and the same access specifier. If those conditions aren't met, the result is unspecified; one immediate consequence is that a<b and b<c does not imply that a<c, so you cannot use <, etc. as the comparator for a sort function.
std::less, std::less_equal, std::greater, and std::greater_equal for pointer types all define a total ordering; they can be used for sorting.
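As a sketch of that last point, pointers to unrelated objects can be sorted by handing `std::less` to `std::sort`. The function name `sort_by_address` is hypothetical, and the resulting order is implementation-defined, only guaranteed to be consistent.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sorts pointers by the total order that std::less provides for pointer
// types. The pointers may refer to completely unrelated objects.
inline void sort_by_address(std::vector<const int*>& ptrs) {
    const std::less<const int*> by_address;
    std::sort(ptrs.begin(), ptrs.end(), by_address);
    assert(std::is_sorted(ptrs.begin(), ptrs.end(), by_address));
}
```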
Inspired by this answer about dynamic cast to void*:
...
bool eqdc(B* b1, B *b2) {
return dynamic_cast<void*>(b1) == dynamic_cast<void*>(b2);
}
...
int main() {
DD *dd = new DD();
D1 *d1 = dynamic_cast<D1*>(dd);
D2 *d2 = dynamic_cast<D2*>(dd);
... eqdc(d1, d2) ...
I am wondering if it is fully defined behaviour in C++ (according to the 03 or 11 standard) to compare two void pointers for (in)equality that point to valid, but different objects.
More generally, but possibly not as relevant, is comparing (==or !=) two values of type void* always defined, or is it required that they hold a pointer to a valid object/memory area?
C says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the
same object (including a pointer to an object and a subobject at its beginning) or function,
both are pointers to one past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to the start of a different
array object that happens to immediately follow the first array object in the address
space.
C++ says:
Two pointers of the same type compare equal if
and only if they are both null, both point to the same function, or both represent the same address.
Hence it would mean that:
a)
it is fully defined behaviour in C++ (according to the 03 or 11 standard) to compare two void pointers for (in)equality that point to valid, but different objects.
So yes, in both C and C++. You can compare them and in this case they shall compare as true iff they point to the same object. That's simple.
b)
is comparing (==or !=) two values of type void* always defined, or is it required that they hold a pointer to a valid object/memory area?
Again, the comparison is well-defined (standard says "if and only if" so every comparison of two pointers is well-defined). But then...
C++ talks in terms of "address", so I think this means that the standard requires this to work "as we'd expect",
C, however, requires both the pointers to be either null, or point to an object or function, or one element past an array object. This, if my reading skills aren't off, means that if on a given platform you have two pointers with the same value, but not pointing to a valid object (e.g. misaligned), comparing them shall be well-defined and yield false.
This is surprising!
Indeed that's not how GCC works:
#include <stdio.h>

int main() {
void* a = (void*)1; // misaligned, can't point to a valid object
void* b = a;
printf((a == b) ? "equal" : "not equal");
return 0;
}
result:
equal
Maybe it's UB in C to have a pointer which isn't a null pointer and doesn't point to an object, subobject or one past the last object in an array? Hm... This was my guess, but then we have that:
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
So I can only interpret it that the above program is well-defined and the C standard expects it to print "not equal", while GCC doesn't really obey the standard but gives a more intuitive result.
C++11, 5.10/1:
Pointers of the same type (after pointer conversions) can be compared
for equality. Two pointers of the same type compare equal if and only
if they are both null, both point to the same function, or both
represent the same address
So yes, the specific comparison is OK.
In general it is undefined behavior to attempt to create a pointer value that isn't a valid address - for example using pointer arithmetic to go before the beginning or after the one-after-the-end of an array - let alone use them. The result of stuff like (void*)23 is implementation-defined, so barring specific permission from the implementation it is in effect undefined behavior to compare those too, since the implementation might define that the result is a trap value of void*.
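To make the dynamic_cast comparison from the question concrete, here is a self-contained sketch. The class hierarchy is an assumption, since the question elides it: a non-virtual diamond in which DD contains two distinct B subobjects.

```cpp
#include <cassert>

// Assumed hierarchy: B is polymorphic, DD inherits B twice (non-virtually).
struct B { virtual ~B() {} };
struct D1 : B {};
struct D2 : B {};
struct DD : D1, D2 {};

// dynamic_cast<void*> yields a pointer to the most derived object, so the
// result is true when both B subobjects belong to the same complete object.
inline bool eqdc(B* b1, B* b2) {
    return dynamic_cast<void*>(b1) == dynamic_cast<void*>(b2);
}

// Example use: two different B subobjects of one DD object.
inline void example_eqdc() {
    DD dd;
    B* via_d1 = static_cast<D1*>(&dd);
    B* via_d2 = static_cast<D2*>(&dd);
    assert(via_d1 != via_d2);      // distinct B subobjects
    assert(eqdc(via_d1, via_d2));  // same most derived object
}
```

Both void* values point to live objects here, so the equality comparison falls under the 5.10/1 wording quoted above.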