Inspired by this answer about dynamic cast to void*:
...
bool eqdc(B* b1, B *b2) {
return dynamic_cast<void*>(b1) == dynamic_cast<void*>(b2);
}
...
int main() {
DD *dd = new DD();
D1 *d1 = dynamic_cast<D1*>(dd);
D2 *d2 = dynamic_cast<D2*>(dd);
... eqdc(d1, d2) ...
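Below is a self-contained version of the snippet above. The class bodies are assumptions because the original elides them. It assumes a hypothetical hierarchy in which B is polymorphic, which dynamic_cast<void*> requires, and DD derives non-virtually from both D1 and D2. That layout gives DD two distinct B subobjects.

```cpp
// Assumed hierarchy (not from the original): B must be polymorphic,
// otherwise dynamic_cast<void*> does not compile.
struct B { virtual ~B() {} int b = 0; };
struct D1 : B { int d1 = 1; };
struct D2 : B { int d2 = 2; };
struct DD : D1, D2 { int dd = 3; };  // two separate B subobjects

// dynamic_cast<void*> yields a pointer to the most derived object, so two
// different base subobjects of the same DD compare equal after the cast.
bool eqdc(B* b1, B* b2) {
    return dynamic_cast<void*>(b1) == dynamic_cast<void*>(b2);
}
```

With d1 and d2 obtained from the same DD as in the question, eqdc(d1, d2) returns true even though the two B* arguments hold different addresses.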
I am wondering if it is fully defined behaviour in C++ (according to the C++03 or C++11 standard) to compare two void pointers for (in)equality that point to valid but different objects.
More generally, but possibly not as relevant, is comparing (== or !=) two values of type void* always defined, or is it required that they hold a pointer to a valid object/memory area?
C says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the
same object (including a pointer to an object and a subobject at its beginning) or function,
both are pointers to one past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to the start of a different
array object that happens to immediately follow the first array object in the address
space.
C++ says:
Two pointers of the same type compare equal if
and only if they are both null, both point to the same function, or both represent the same address.
Hence it would mean that:
a)
it is fully defined behaviour in C++ (according to the 03 or 11 standard) to compare two void pointers for (in)equality that point to valid, but different objects.
So yes, in both C and C++. You can compare them and in this case they shall compare as true iff they point to the same object. That's simple.
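Here is a small sketch of the rule quoted above. void* pointers to distinct objects compare unequal. A pointer to a standard-layout struct and a pointer to its first member compare equal, which is the "subobject at its beginning" case in the C wording. The helper same_address and the struct S are illustrative names, not from the original.

```cpp
// Hypothetical standard-layout struct: its first member shares its address.
struct S { int first; int second; };

// True if and only if both pointers represent the same address.
bool same_address(const void* p, const void* q) { return p == q; }
```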
b)
is comparing (== or !=) two values of type void* always defined, or is it required that they hold a pointer to a valid object/memory area?
Again, the comparison is well-defined (standard says "if and only if" so every comparison of two pointers is well-defined). But then...
C++ talks in terms of "address", so I think this means that the standard requires this to work "as we'd expect".
C, however, requires both the pointers to be either null, or point to an object or function, or one element past an array object. This, if my reading skills aren't off, means that if on a given platform you have two pointers with the same value, but not pointing to a valid object (e.g. misaligned), comparing them shall be well-defined and yield false.
This is surprising!
Indeed that's not how GCC works:
#include <stdio.h>
int main() {
    void* a = (void*)1; // misaligned, can't point to a valid object
    void* b = a;
    printf("%s", (a == b) ? "equal" : "not equal");
    return 0;
}
result:
equal
Maybe it's UB in C to have a pointer which isn't a null pointer and doesn't point to an object, subobject or one past the last object in an array? Hm... This was my guess, but then we have that:
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
So I can only interpret this to mean that the above program is well-defined and the C standard expects it to print "not equal", while GCC doesn't really obey the standard but gives a more intuitive result.
C++11, 5.10/1:
Pointers of the same type (after pointer conversions) can be compared
for equality. Two pointers of the same type compare equal if and only
if they are both null, both point to the same function, or both
represent the same address
So yes, the specific comparison is OK.
In general it is undefined behavior to attempt to create a pointer value that isn't a valid address - for example using pointer arithmetic to go before the beginning or after the one-after-the-end of an array - let alone use them. The result of stuff like (void*)23 is implementation-defined, so barring specific permission from the implementation it is in effect undefined behavior to compare those too, since the implementation might define that the result is a trap value of void*.
If you know two pieces of information:
A memory address.
The type of the object stored in that address.
Then you logically have all you need to reference that object:
#include <iostream>
using namespace std;
int main()
{
int x = 1, y = 2;
int* p = (&x) + 1;
if ((long)&y == (long)p)
cout << "p now contains &y\n";
if (*p == y)
cout << "it also dereference to y\n";
}
However, this isn't legal per the C++ standard. It works in several compilers I tried, but it's Undefined Behavior.
The question is: why?
It wreaks havoc with optimizations.
void f(int* x);
int g() {
int x = 1, y = 2;
f(&x);
return y;
}
If you can validly "guess" the address of y from x's address, then the call to f may modify y and so the return statement must reload the value of y from memory.
Now consider a typical function with more local variables and more calls to other functions, where you'd have to save the value of every variable to memory before each call (because the called function may inspect them) and reload them after each call (because the called function may have modified them).
If you want to treat pointers as a numeric type, firstly you need to use std::uintptr_t, not long. That's the first undefined behavior, but not the one you're talking about.
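If you do need a pointer as an integer, here is a minimal sketch. It assumes the implementation provides the optional std::uintptr_t. Converting a valid pointer (via void*) to std::uintptr_t and back is guaranteed to give back the original pointer. Arithmetic on the integer, as in the question, is not guaranteed to produce a usable pointer. The name round_trip is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: pointer -> std::uintptr_t -> pointer, going through
// void* because that is the conversion std::uintptr_t is specified for.
// Only the unmodified round trip is guaranteed to restore the pointer.
int* round_trip(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(static_cast<void*>(p));
    return static_cast<int*>(reinterpret_cast<void*>(bits));
}
```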
It works in several compilers I tried, but it's Undefined Behavior.
The question is: why?
Okay, so the comments section went off when I called this undefined behavior. It's actually unspecified behavior: the comparison itself is permitted, but its result is not specified.
You are trying to compare two distinctly unrelated pointers:
&x + 1
&y
The pointer &x+1 is a one-past-the-end pointer. The standard allows you to have such a pointer, but the behavior is only defined when you use it to compare against pointers based on x. The behavior is not specified if you compare it with anything else: [expr.eq § 3.1]
The compiler is free to put y anywhere it chooses, including in a register. As such, there is no guarantee that &y and &x+1 are related.
As an exercise for someone who wants to show whether this is in fact undefined behavior or not, perhaps start here:
[basic.stc.dynamic.safety § 3.4]:
An integer value is an integer representation of a safely-derived pointer only if its type is at least as large as std::intptr_t and it is one of the following: ...
3.4 the result of an additive or bitwise operation, one of whose operands is an integer representation of a safely-derived pointer value P, if that result converted by reinterpret_cast<void*> would compare equal to a safely-derived pointer computable from reinterpret_cast<void*>(P).
[basic.compound § 3.4]:
Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address
If you know the address and type of an object and your implementation has relaxed pointer safety [basic.stc.dynamic.safety §4], then I think it should be legal to just access the object at that address through an appropriate lvalue.
The problem is that the standard does not guarantee that local variables of the same type are allocated contiguously with addresses increasing in order of declaration. So you cannot derive the address of y based on that computation you do with the address of x. Apart from that, pointer arithmetic would lead to undefined behavior if you go more than one element past an object ([expr.add]). So while (&x) + 1 is not undefined behavior yet, merely computing (&x) + 2 would be.
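Suppose the goal is for one pointer to legitimately reach the other value. A minimal sketch is to make both values elements of the same array, because pointer arithmetic within an array, including the one-past-the-end position, is well-defined. The names xy and second_via_first are illustrative.

```cpp
// Elements of one array are contiguous, so &xy[0] + 1 is guaranteed to
// point to xy[1], unlike &x + 1 for two separate local variables.
int second_via_first(int (&xy)[2]) {
    int* p = &xy[0] + 1;  // well-defined: still inside the array
    return *p;            // reads xy[1]
}
```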
The code is legal per the C++ standard (i.e. should compile), but as you already noted the behaviour is undefined. This is because the order of variable declaration does not imply that they will be arranged in memory in the same way.
The current draft standard (and presumably C++17) says in [basic.compound/4]:
[ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
So a pointer to an object cannot be reinterpret_cast'd to get its enclosing array pointer.
Now, there is std::launder, [ptr.launder/1]:
template<class T> [[nodiscard]] constexpr T* launder(T* p) noexcept;
Requires: p represents the address A of a byte in memory. An object X that is within its lifetime and whose type is similar to T is located at the address A. All bytes of storage that would be reachable through the result are reachable through p (see below).
And the definition of reachable is in [ptr.launder/3]:
Remarks: An invocation of this function may be used in a core constant expression whenever the value of its argument may be used in a core constant expression. A byte of storage is reachable through a pointer value that points to an object Y if it is within the storage occupied by Y, an object that is pointer-interconvertible with Y, or the immediately-enclosing array object if Y is an array element. The program is ill-formed if T is a function type or cv void.
Now, at first sight, it seems that std::launder can be used to do the aforementioned conversion, because of the part I've emphasized ("the immediately-enclosing array object if Y is an array element").
But if p points to an element of an array, the bytes of the whole array are reachable according to this definition (even though p is not pointer-interconvertible with a pointer to the array), just as they are through the result of launder. So it seems that the definition doesn't say anything about this issue.
So, can std::launder be used to convert an object pointer to its enclosing array pointer?
This depends on whether the enclosing array object is a complete object, and if not, whether you can validly access more bytes through a pointer to that enclosing array object (e.g., because it's an array element itself, or pointer-interconvertible with a larger object, or pointer-interconvertible with an object that's an array element). The "reachable" requirement means that you cannot use launder to obtain a pointer that would allow you to access more bytes than the source pointer value allows, on pain of undefined behavior. This ensures that the possibility that some unknown code may call launder does not affect the compiler's escape analysis.
I suppose some examples could help. Each example below reinterpret_casts an int* pointing to the first element of an array of 10 ints into an int(*)[10]. Since they are not pointer-interconvertible, the reinterpret_cast does not change the pointer value, and you get an int(*)[10] with the value of "pointer to the first element of (whatever the array is)". Each example then attempts to obtain a pointer to the entire array by calling std::launder on the cast pointer.
int x[10];
auto p = std::launder(reinterpret_cast<int(*)[10]>(&x[0]));
This is OK; you can access all elements of x through the source pointer, and the result of the launder doesn't allow you to access anything else.
int x2[2][10];
auto p2 = std::launder(reinterpret_cast<int(*)[10]>(&x2[0][0]));
This is undefined. You can only access elements of x2[0] through the source pointer, but the result (which would be a pointer to x2[0]) would have allowed you to access x2[1], which you can't through the source.
struct X { int a[10]; } x3, x4[2]; // assume no padding
auto p3 = std::launder(reinterpret_cast<int(*)[10]>(&x3.a[0])); // OK
This is OK. Again, you can't access through a pointer to x3.a any byte you can't access already.
auto p4 = std::launder(reinterpret_cast<int(*)[10]>(&x4[0].a[0]));
This is (intended to be) undefined. You would have been able to reach x4[1] from the result because x4[0].a is pointer-interconvertible with x4[0], so a pointer to the former can be reinterpret_cast to yield a pointer to the latter, which then can be used for pointer arithmetic. See https://wg21.link/LWG2859.
struct Y { int a[10]; double y; } x5;
auto p5 = std::launder(reinterpret_cast<int(*)[10]>(&x5.a[0]));
And this is again undefined, because you would have been able to reach x5.y from the resulting pointer (by reinterpret_cast to a Y*) but the source pointer can't be used to access it.
Remark: any non-schizophrenic compiler will probably gladly accept that, as it would accept a C-style cast or a reinterpret_cast, so "just try it and see" is not an option.
But IMHO, the answer to your question is no. The emphasized part, "the immediately-enclosing array object if Y is an array element", lies in a Remarks paragraph, not in the Requires one. That means the Remarks only apply provided the Requires section is respected. As an array and its element type are not similar types, the requirement is not satisfied and std::launder cannot be used.
What follows is more of a general (philosophical?) interpretation. At the time of K&R C (in the 1970s), C was intended to be able to replace assembly language. For that reason the rule was: the compiler must obey the programmer, provided the source code can be translated. So there was no strict aliasing rule, and a pointer was no more than an address with additional arithmetic rules. This changed significantly with C99 and C++03 (not to mention C++11 and later). Programmers are now supposed to use C++ as a high-level language. That means that a pointer is just an object that allows access to another object of a given type, and an array and its element type are totally different types. Memory addresses are now little more than implementation details. So trying to convert a pointer to an array to a pointer to its first element is against the philosophy of the language and could bite the programmer in a later version of the compiler. Of course, real-life compilers still accept it for compatibility reasons, but we should not even try to use it in modern programs.
There are so many questions on comparing two pointers, but I found none on whether the two types are such that the pointers can be compared. Given
A* a;
B* b;
I want to know if the expression a # b is valid, where # is one of ==, !=, <, <=, >, >= (I don't care about nullptr_t or any other type that can be implicitly converted to a pointer). Is it when A, B are
equal?
equal except for cv-qualification?
in the same class hierarchy?
...?
I didn't find anything in std::type_traits. I could always do my own SFINAE test, but I am looking for the rules to apply them directly. I guess that will be easier for the compiler, right?
EDIT: To clarify again: I am comparing pointers, not objects pointed to. I want to know in advance when a # b will give a compiler error, not what will be its value (true or false or unspecified).
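The SFINAE test the question mentions can be sketched as follows, with hypothetical trait names. Rather than restating the rules, it asks the compiler whether the comparison is well-formed. Roughly, that requires a composite pointer type to exist: the same type up to cv-qualification, a base/derived pair, or one side being void*.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: true if `a == b` is well-formed for values of types T and U.
// decltype(void(expr)) is void when expr compiles; otherwise the partial
// specialization is discarded (SFINAE) and the primary template is used.
template <typename T, typename U, typename = void>
struct is_eq_comparable : std::false_type {};

template <typename T, typename U>
struct is_eq_comparable<T, U, decltype(void(std::declval<T>() == std::declval<U>()))>
    : std::true_type {};

// Same pattern for `<`; the other relational operators work the same way.
template <typename T, typename U, typename = void>
struct is_less_comparable : std::false_type {};

template <typename T, typename U>
struct is_less_comparable<T, U, decltype(void(std::declval<T>() < std::declval<U>()))>
    : std::true_type {};
```

With these, "equal except for cv-qualification?" or "in the same class hierarchy?" can be checked directly with static_assert for concrete types.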
C++ Standard
5.9 Relational operators
Pointers to objects or functions of the same type (after pointer conversions) can be compared,
with a result defined as follows:
— If two pointers p and q of the same type point to the same object or function, or both point one past
the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q both
yield false.
— If two pointers p and q of the same type point to different objects that are not members of the same
object or elements of the same array or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
— If two pointers point to non-static data members of the same object, or to subobjects or array elements
of such members, recursively, the pointer to the later declared member compares greater provided the
two members have the same access control (Clause 11) and provided their class is not a union.
— If two pointers point to non-static data members of the same object with different access control
(Clause 11) the result is unspecified.
— If two pointers point to non-static data members of the same union object, they compare equal (after
conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond
the end of the array, the pointer to the object with the higher subscript compares higher.
— Other pointer comparisons are unspecified.
Those comparison operators are always 'valid', in that you can always use them. But the results usually will not be meaningful or useful.
== and != will essentially tell you whether or not a and b refer to the same object. If a == b, then changing *a will affect *b and vice-versa.
>, <, >=, and <= will tell you where the memory addresses are relative to one another. Usually this information is irrelevant to the functionality of your program, and in most cases it is unpredictable. However, one case where you might be able to use it is when you know that a and b point to members of the same array, in which case a < b tells you whether the object *a comes before *b in the array.
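The sketch below shows the same-array case described above, with illustrative helper names. It also adds one point the answer does not make: when the pointers may be unrelated, the standard guarantees that std::less<T*> yields a strict total order, even where the built-in < is unspecified.

```cpp
#include <functional>

// Built-in < is meaningful for pointers into the same array
// (including the one-past-the-end pointer).
bool comes_before(const int* a, const int* b) { return a < b; }

// std::less<const int*> gives a strict total order over all pointers of that
// type, so it is usable for sorting or as a std::map key even for unrelated objects.
bool total_order_before(const int* a, const int* b) {
    return std::less<const int*>()(a, b);
}
```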
In C and C++, it is often useful to use a past-the-end pointer to write functions that can operate on arbitrarily large arrays. C++ gives a std::end overload to make this easier. In C, on the other hand, I've found it's not uncommon to see a macro defined and used like this:
#define ARRAYLEN(array) (sizeof(array)/sizeof(array[0]))
// ...
int a [42];
do_something (a, a + ARRAYLEN (a));
I've also seen a pointer arithmetic trick used to let such functions operate on single objects:
int b;
do_something (&b, &b + 1);
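Here is a minimal sketch of both idioms. It uses a hypothetical do_something that sums a half-open range. The one-past-the-end pointer is only compared against and never dereferenced. The C++ variant sum_all, also hypothetical, uses std::begin/std::end in place of the macro.

```cpp
#include <cstddef>
#include <iterator>

#define ARRAYLEN(array) (sizeof(array) / sizeof((array)[0]))

// Hypothetical do_something: sums the half-open range [first, last).
// `last` is only compared against, never dereferenced.
int do_something(const int* first, const int* last) {
    int sum = 0;
    for (const int* p = first; p != last; ++p) {
        sum += *p;
    }
    return sum;
}

// C++ alternative to the macro: std::end yields the past-the-end pointer.
template <std::size_t N>
int sum_all(const int (&arr)[N]) {
    return do_something(std::begin(arr), std::end(arr));
}
```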
It occurred to me that something similar could be done with arrays, since they are considered by C (and, I believe, C++) to be "complete objects." Given an array, we can derive a pointer to an array immediately after it, dereference that pointer, and use array-to-pointer conversion on the resulting reference to an array to get a past-the-end pointer for the original array:
#define END(array) (*(&array + 1))
// ...
int a [42];
do_something (a, END (a));
My question is this: In dereferencing a pointer to a non-existent array object, does this code exhibit undefined behaviour? I'm interested in what the most recent revisions of both C and C++ have to say about this code (not because I intend to use it, as there are better ways of achieving the same result, but because it's an interesting question).
I've used that in my own code, as (&arr)[1].
I'm quite sure it is safe. Array to pointer decay is not "lvalue-to-rvalue conversion", although it starts with an lvalue and ends with an rvalue.
It is undefined behaviour.
a is of type array of 42 int.
&a is of type pointer to array of 42 int. (Note this is not an array-to-pointer conversion)
&a + 1 is also of type pointer to array of 42 int
5.7p5 states:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and [...] otherwise, the behavior is undefined
The pointer does not point to an element of an array object. It points to an array object. So the "otherwise, the behaviour is undefined" is true. Behaviour is undefined.
It is undefined behavior in C: dereferencing a pointer that points beyond an existing object always is, unless that object is itself part of a bigger object that contains more elements.
But the basic idea of using &array + 1 is correct, whenever array is an lvalue. (There are cases where arrays aren't lvalues.) In that case that is a valid pointer operation. Now to obtain a pointer to the first element you just have to cast that back to the base type. In your case that would be
(int*)(&array + 1)
The value of a pointer to array is guaranteed to be the same value as a pointer to its first element, only the types differ.
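That guarantee can be illustrated with a hypothetical helper, same_start. The two pointers have different types, int(*)[42] versus int*, but they compare equal once both are converted to const void*.

```cpp
#include <cstddef>

// A pointer to an array and a pointer to its first element differ in type
// but represent the same address.
template <typename T, std::size_t N>
bool same_start(T (&array)[N]) {
    return static_cast<const void*>(&array) == static_cast<const void*>(&array[0]);
}
```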
Unfortunately I don't see a way to make such an expression type agnostic such that you could put this in a generic macro, unless you cast to void*. (With the gcc typeof extension you could do, e.g)
So you'd better stick to the portable (array)+ARRAYLEN(array), that one should work in all cases.
In a weird corner case, an array that is a member of a struct returned as an rvalue from a function is not an lvalue. I think that the standard allows pointer arithmetic here, too, but I never understood that construction completely, so I am not sure that it will work in that case.
Is this statement correct? Can any "TYPE" of pointer point to any other type?
I believe so, but I still have doubts.
Why are pointers declared for definite types? E.g. int or char?
The one explanation I could get was: if an int pointer was pointing to a char array, then when the pointer is incremented, it would jump from position 0 to position 2, skipping position 1 in between (because the size of int is 2).
And maybe because a pointer just holds the address of a value, not the value itself, i.e. the int or double.
Am I wrong? Was that statement correct?
Pointers may be interchangeable, but are not required to be.
In particular, on some platforms, certain types need to be aligned to certain byte-boundaries.
So while a char may be anywhere in memory, an int may need to be on a 4-byte boundary.
Another important potential difference is with function-pointers.
Pointers to functions may not be interchangeable with pointers to data-types on many platforms.
It bears repeating: This is platform-specific.
I believe Intel x86 architectures treat all pointers the same.
But you may well encounter other platforms where this is not true.
Every pointer is of some specific type. There's a special generic pointer type void* that can point to any object type, but you have to convert a void* to some specific pointer type before you can dereference it. (I'm ignoring function pointer types.)
You can convert a pointer value from one pointer type to another. In most cases, converting a pointer from foo* to bar* and back to foo* will yield the original value -- but that's not actually guaranteed in all cases.
You can cause a pointer of type foo* to point to an object of type bar, but (a) it's usually a bad idea, and (b) in some cases, it may not work (say, if the target types foo and bar have different sizes or alignment requirements).
You can get away with things like:
int n = 42;
char *p = (char*)&n;
which causes p to point to n -- but then *p doesn't give you the value of n, it gives you the value of the first byte of n as a char.
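A minimal sketch of this, with illustrative helper names, reads the first byte of an int through a character pointer. Which byte comes first depends on the platform's byte order. Assuming sizeof(int) > 1, first_byte(42) is 42 on a little-endian machine and 0 on a big-endian one.

```cpp
// Reads the first byte of an int's object representation through an
// unsigned char pointer; character types may be used to inspect any object.
unsigned char first_byte(const int& n) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&n);
    return *p;
}

// Hypothetical helper: on a little-endian machine the lowest-order byte is stored first.
bool is_little_endian() {
    int one = 1;
    return first_byte(one) == 1;
}
```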
The differing behavior of pointer arithmetic is only part of the reason for having different pointer types. It's mostly about type safety. If you have a pointer of type int*, you can be reasonably sure (unless you've done something unsafe) that it actually points to an int object. And if you try to treat it as an object of a different type, the compiler will likely complain about it.
Basically, we have distinct pointer types for the same reasons we have other distinct types: so we can keep track of what kind of value is stored in each object, with help from the compiler.
(There have been languages that only have untyped generic pointers. In such a language, it's more difficult to avoid type errors, such as storing a value of one type and accidentally accessing it as if it were of another type.)
Any pointer can refer to any location in memory, so technically the statement is correct. With that said, you need to be careful when reinterpreting pointer types.
A pointer basically has two pieces of information: a memory location, and the type it expects to find there. The memory location could be anything. It could be the location where an object or value is stored; it could be in the middle of a string of text; or it could just be an arbitrary block of uninitialised memory.
The type information in a pointer is important though. The array and pointer arithmetic explanation in your question is correct -- if you try to iterate over data in memory using a pointer, then the type needs to be correct, otherwise you may not iterate correctly. This is because different types have different sizes, and may be aligned differently.
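To see that stride directly, here is a hypothetical helper that measures, in bytes, how far a T* moves on ++. It views two neighbouring elements of one array through char pointers. For char the result is 1. For int it is sizeof(int), which was 2 on the platform in the question and is commonly 4 today.

```cpp
#include <cstddef>

// Distance in bytes covered by one increment of a T*, measured by viewing
// the addresses of two adjacent array elements as char pointers.
template <typename T>
std::ptrdiff_t stride_in_bytes() {
    T arr[2] = {};
    const char* start = reinterpret_cast<const char*>(&arr[0]);
    const char* next = reinterpret_cast<const char*>(&arr[0] + 1);
    return next - start;
}
```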
The type is also important in terms of how data is handled in your program. For example, if you have an int stored in memory, but you access it by dereferencing a float* pointer, then you'll probably get useless results (unless you've programmed it that way for a specific reason). This is because an int is stored in memory differently from the way a float is stored.
Can any "TYPE" of pointer point to any other type?
Generally no. The types have to be related.
It is possible to use reinterpret_cast to cast a pointer from one type to another, but unless the types are related (so that a static_cast would also be valid), using the resulting pointer to access the object is generally undefined behavior. Hence you can't meaningfully do Foo* foo = ...; Bar* bar = (Bar*)foo; unless Foo and Bar are actually related.
You can also use reinterpret_cast to cast from an object pointer to a void* and vice versa, and in that sense a void* can point to anything -- but that's not what you seem to be asking about.
Further you can reinterpret_cast from object pointer to integral value and vice versa, but again, not what you appear to be asking.
Finally, a special exception is made for char*. You can initialize a char* variable with the address of an object of any other type (with a cast), and perform pointer math on the resulting pointer. Dereferencing it gives you the individual bytes of the object, not its value as a whole; the pointer can then be cast back to the actual type and used that way.
Also keep in mind that every time you use reinterpret_cast in any context, you are dancing on the precipice of a cliff. Dereferencing a pointer to a Foo when the thing it actually points to is a Bar yields Undefined Behavior when the types are not related. You would do well to avoid these types of casts at all costs.
Some pointers are more equal than others...
First of all, not all pointers are necessarily the same thing. Function pointers can be something very different from data pointers, for instance.
Aside: Function pointers on PPC
On the PPC platform, this was quite obvious: A function pointer was actually two pointers under the hood, so there was simply no way to meaningfully cast a function pointer to a data pointer or back. I.e. the following would hold:
int* dataP;
int (*functionP)(int);
assert(sizeof(dataP) == 4);
assert(sizeof(functionP) == 8);
assert(sizeof(dataP) != sizeof(functionP));
//impossible:
//dataP = (int*)functionP; //would lose information
//functionP = (int (*)(int))dataP; //part of the resulting pointer would be garbage
Alignment
Furthermore, there are problems with alignment: depending on the platform, some data types may need to be aligned in memory. This is especially common with vector data types, but could apply to any type larger than a byte. For instance, if an int must be 4-byte aligned, the following code might crash:
char a[4];
int* alias = (int*)a;
//int foo = *alias; //may crash because alias is not aligned properly
This is not an issue if the pointer comes from a malloc() call, as that is guaranteed to return sufficiently aligned pointers for all types:
char* a = malloc(sizeof(int));
int* alias = (int*)a;
*alias = 0; //perfectly legal, the pointer is aligned
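The same ideas can be sketched in C++. alignas requests int alignment for a char buffer, memory from std::malloc is suitably aligned, and std::memcpy avoids the alignment question entirely. The modulo check assumes the common flat mapping from pointers to std::uintptr_t, which the standard leaves implementation-defined. All helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// True if p is suitably aligned for an int (assumes the usual flat
// pointer-to-integer mapping, which is implementation-defined).
bool aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

// Memory from std::malloc is suitably aligned for any fundamental type;
// placement new starts the lifetime of the int in that storage.
int* allocate_int(int value) {
    void* mem = std::malloc(sizeof(int));
    return mem ? new (mem) int(value) : nullptr;
}

// Copying bytes works at any address, aligned or not.
void store_int(char* buffer, int value) { std::memcpy(buffer, &value, sizeof value); }

int load_int(const char* buffer) {
    int value;
    std::memcpy(&value, buffer, sizeof value);
    return value;
}
```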
Strict aliasing and type punning
Finally, there are strict aliasing rules: You must not access an object of one type through a pointer to another type. Type punning is forbidden:
assert(sizeof(float) == sizeof(uint32_t));
float foo = 42;
//uint32_t bits = *(uint32_t*)&foo; //type punning is illegal
If you absolutely must reinterpret a bit pattern as another type, you must use memcpy():
assert(sizeof(float) == sizeof(uint32_t));
float foo = 42;
uint32_t bits;
memcpy(&bits, &foo, sizeof(bits)); //bit pattern reinterpretation is legal when copying the data
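Here is a self-contained C++ version of the memcpy approach. The hypothetical helper is float_bits, and the size check moves from a runtime assert to a static_assert. It also asserts IEEE-754 floats, an assumption the original leaves implicit. Under that assumption, 42.0f has the bit pattern 0x42280000.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 floats");

// Copy the object representation of a float into a uint32_t; no pointer
// of the wrong type is ever dereferenced.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

In C++20, std::bit_cast<std::uint32_t>(f) expresses the same copy directly.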
To allow memcpy() and friends to actually be implementable, the C/C++ language standards provide for an exception for char types: You can cast any pointer to a char*, copy the char data over to another buffer, and then access that other buffer as some other type. The results are implementation defined, but the standards allow it. Use cases are mostly general data manipulation routines like I/O, etc.
TL;DR:
Pointers are much less interchangeable than you think. Don't reinterpret pointers in any other way than to/from char* (check alignment in the "from" case). And even that does not work for function pointers.