Pointer values are different but they compare equal. Why?

A short example outputs a weird result!
#include <iostream>
using namespace std;

struct A { int a; };
struct B { int b; };
struct C : A, B
{
    int c;
};

int main()
{
    C* c = new C;
    B* b = c;
    cout << "The address of b is 0x" << hex << b << endl;
    cout << "The address of c is 0x" << hex << c << endl;
    if (b == c)
    {
        cout << "b is equal to c" << endl;
    }
    else
    {
        cout << "b is not equal to c" << endl;
    }
}
To my surprise, the output is as follows:
The address of b is 0x003E9A9C
The address of c is 0x003E9A98
b is equal to c
What puzzles me is that 0x003E9A9C is not equal to 0x003E9A98, yet the output is "b is equal to c".

A C object contains two sub-objects, of types A and B. Obviously, these must have different addresses since two separate objects can't have the same address; so at most one of these can have the same address as the C object. That is why printing the pointers gives different values.
Comparing the pointers doesn't simply compare their numeric values. Only pointers of the same type can be compared, so first one must be converted to match the other. In this case, c is converted to B*. This is exactly the same conversion used to initialise b in the first place: it adjusts the pointer value so that it points to the B sub-object rather than the C object, and the two pointers now compare equal.

The memory layout of an object of type C will look something like this:

|  <---------  C  --------->  |
|-- A: a --|-- B: b --|-- c --|
0          4          8       12

I added the offsets in bytes from the address of the object (on a platform like yours, with sizeof(int) == 4).
In your main you have two pointers; I'll rename them to pb and pc for clarity. pc points to the start of the whole C object, while pb points to the start of the B subobject:

|  <---------  C  --------->  |
|-- A: a --|-- B: b --|-- c --|
0          4          8       12
^pc        ^pb
That is why their values are different: 0x3E9A98 + 4 is 0x3E9A9C.
If you now compare those two pointers, the compiler will see a comparison between a B* and a C*, which are different types. So it has to apply an implicit conversion, if there is one. pb cannot be converted into a C*, but the other way round is possible - it converts pc into a B*. That conversion will give a pointer that points to the B subobject of wherever pc points to - it is the same implicit conversion used when you defined B* pb = pc;. The result is equal to pb, obviously:
|  <---------  C  --------->  |
|-- A: a --|-- B: b --|-- c --|
0          4          8       12
^pc        ^pb
           ^(B*)pc
So when comparing the two pointers, the compiler in fact compares the converted pointers, which are equal.
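To make the offset concrete, here is a small sketch (not from the original answers) that measures the distance between the two pointer values in bytes and contrasts the typed comparison with a raw void* comparison. The value 4 assumes a layout like the diagram above, with sizeof(int) == 4 and no vtable pointers, which is typical here but not guaranteed by the standard:

#include <cstddef>
#include <iostream>

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

int main()
{
    C* pc = new C;
    B* pb = pc;   // implicit conversion adjusts the pointer to the B subobject

    // Raw distance between the two pointer values, in bytes.
    std::ptrdiff_t offset =
        reinterpret_cast<char*>(pb) - reinterpret_cast<char*>(pc);
    std::cout << "offset of B subobject: " << offset << " bytes\n";   // typically 4 here

    std::cout << std::boolalpha
              << "pb == pc:               " << (pb == pc) << "\n"                  // true: pc is converted to B*
              << "(void*)pb == (void*)pc: " << ((void*)pb == (void*)pc) << "\n";   // false: raw values differ
    delete pc;
}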

I know there is already an answer, but maybe this will be more straightforward and backed up by an example.
There is an implicit conversion from C* to B* applied to the c operand in if (b == c).
If you go with this code:
#include <iostream>
using namespace std;

struct A { int a; };
struct B { int b; };
struct C : A, B
{
    int c;
};

int main()
{
    C* c = new C;
    B* b = c;
    cout << "The address of b is 0x" << hex << b << endl;
    cout << "The address of c is 0x" << hex << c << endl;
    cout << "The address of (B*)c is 0x" << hex << (B*)c << endl;
    if (b == c)
    {
        cout << "b is equal to c" << endl;
    }
    else
    {
        cout << "b is not equal to c" << endl;
    }
}
You get:
The address of b is 0x0x88f900c
The address of c is 0x0x88f9008
The address of (B*)c is 0x0x88f900c
b is equal to c
So c cast to B* has the same address as b, as expected.

If I may add to Mike's excellent answer, if you cast them as void* then you will get your expected behaviour:
if ((void*)(b) == (void*)(c))
    ^^^^^^^^^^    ^^^^^^^^^^
prints
b is not equal to c
Doing something similar in C (the language) actually makes the compiler complain about comparing pointers of distinct types.
I got:
warning: comparison of distinct pointer types lacks a cast [enabled by default]

In computing (or, rather, we should say in mathematics) there can be many notions of equality. Any relation that is symmetric, reflexive and transitive can be employed as equality.
In your program, you are examining two somewhat different notions of equality: bitwise implementation identity (two pointers being to exactly the same address) versus another kind of equality based on object identity, which allows two views on the same object, through references of different static type, to be properly regarded as referencing the same object.
These differently typed views use pointers which do not have the same address value, because they latch on to different parts of the object. The compiler knows this and so it generates the correct code for the equality comparison which takes into account this offset.
It is the structure of objects brought about by inheritance which makes it necessary to have these offsets. When there are multiple bases (thanks to multiple inheritance), only one of those bases can be at the low address of the object, so that the pointer to the base part is the same as the pointer to the derived object. The other base parts are elsewhere in the object.
So, naive, bitwise comparison of pointers would not yield the correct results according to the object-oriented view of the object.

Some good answers here, but there's a short version. "Two objects are the same" does not mean they have the same address. It means putting data into them and taking data out of them is equivalent.

Related

Is reinterpret_cast only made for type punning? [duplicate]


Confusing about char* in c++

test code:
#include <iostream>
using namespace std;

int main()
{
    const char* b = "str";
    cout << b << endl;
    cout << *b << endl;
    cout << &b << endl;
    cout << *(&b) << endl;
    return 0;
}
result:
str
s
0x7ffdf39c27f0
str
I ran my code on the runoob online compiler.
Why do I get these results? I looked at some questions about char*, but they were not enough for me to understand. Can someone explain this to me? Pictures are best.
I would also like recommendations for books or blogs where I can learn more about this.
By the way, using char b[] instead of const char*, I get the same results.
Thanks a lot, everyone.
I just want to know why a char pointer's value is not printed as an address.
I think an address looks like 0x7ffdf39c27f0, a memory address.
But with const char* b = "str", b prints as just str.
And I found that *b is the same as *("str").
So I want to know what happens in memory. Why is a char pointer's value not printed as an address?
To understand what the code outputs, you need to understand that C++ output streams (objects with a type such as std::ostream) and therefore objects (such as std::cout) have a number of overloads of operator<<(). The overload that is called depends on the type of argument provided.
I'll explain your second example, but the explanation for the first example is almost identical.
const char* b="st\0r";
cout << b << endl;
cout << *b << endl;
cout << &b << endl;
cout << *(&b) << endl;
cout << b expands to cout.operator<<(b) where b has type const char *. That overload of the operator function ASSUMES the argument points to (the first character of) a nul terminated string, which is represented in memory as an array of char that ends with a char with value '\0' (zero). The operator function outputs each character it finds until it reaches a '\0' character. The first '\0' found is the one YOU explicitly inserted after the 't', so the output st is produced. The fact that your string has a second '\0' after the 'r' is irrelevant, since the operator function stops at the first one it finds.
cout << *b expands to a call of a different overload of operator<<() that accepts a single char, and outputs that char. *b is the value of the first character in the string represented by b. So the output s is produced.
In cout << &b, &b has type const char ** or (equivalently) char const **. There is no overload of an output stream's operator<<() that accepts a const char **, but there is an overload that accepts a const void *. Since any pointer (other than pointer-to-member or pointers to functions) can be implicitly converted to void *, that conversion is performed (by the compiler), so the overload matches, and is called. That particular overload of the operator<<() prints the address in memory.
The implicit conversion in the third case doesn't happen in the first two cases, since a call that doesn't require an implicit conversion is a better match than a call which does.
In the last statement *(&b) is equivalent to b. This is the case because & is the address-of operator in this code, and the * is the dereference operator (which is the inverse of the address-of operator). So the last statement produces the same output as cout << b.
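If the goal is to print the address stored in b rather than the string it points to, a common approach (a sketch, not part of the original answers) is to convert the pointer to const void* yourself, so that the pointer overload of operator<<() is selected instead of the string overload:

#include <iostream>

int main()
{
    const char* b = "str";

    std::cout << b << "\n";                            // string overload: prints str
    std::cout << static_cast<const void*>(b) << "\n";  // pointer overload: prints the address stored in b
    std::cout << &b << "\n";                           // address of the pointer variable b itself
}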
cout << b << endl;
You are printing the string b
cout << *b << endl;
You are printing the character that b points to, i.e. the first character of the string, so it is the same as:
cout << b[0] << endl;
cout << &b << endl;
&b is the address of the pointer variable b itself, i.e. where the computer stores b. Here b is stored at address 0x7ffdf39c27f0, so that is what you get.
cout << *(&b) << endl;
You are dereferencing a pointer to b, so you get the value stored at b's address, which is b itself. It is the same as:
cout << b << endl;
Edit: A pointer contains an address that (usually; it could point at a function, for example) represents the location of an object, and printing a pointer (usually) prints that address value. Because char * is intimately linked with null-terminated strings, there is a special overload for pointers to characters that prints the pointed-at string instead.
A pointer variable is still a variable and has an address of its own, so &b results in a pointer to a pointer, a const char ** in this case. Because that is no longer a char *, cout << &b; prints the address of b, not the address stored in b or the string pointed at by b.

C++ type casting / type convention

Can anyone explain what lines 5 & 7 mean?
int a;
double b = 2.3;
a = b;
cout << a << endl;
a = int(b); // <-- here
cout << a << endl;
a = (int)b; // <-- here
cout << a << endl;
Both lines convert the double b to an int. a = (int)b is a C-style cast and a = int(b) is the equivalent functional-style cast; neither is recommended in C++ because such casts are easy to overlook and can silently lose information. What happens here is that the fractional part of the double is discarded (the value is truncated toward zero) and only the whole part is assigned to a. So, for example, if you have b = 2.9; and you want only the whole part of the number, such a cast gives you 2. Since you asked about C++-style casting, for such cases I recommend a = static_cast<int>(b);
But be cautious when doing a narrowing cast (casting from a larger type to a narrower one): you can lose precision.
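A minimal illustration of that truncation (a sketch; the values are just examples):

#include <iostream>

int main()
{
    double b = 2.9;
    int a = static_cast<int>(b);                 // fractional part discarded, truncates toward zero
    std::cout << a << "\n";                      // prints 2

    b = -2.9;
    std::cout << static_cast<int>(b) << "\n";    // prints -2, not -3
}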

C++ object memory layout

I am trying to understand the object layout for C++ in multiple inheritance.
For this purpose, I have two superclasses A, B and one subclass C.
What I expected to see when trying dumping it was:
vfptr | fields of A | vfptr | fields of B | fields of C.
I get this model but with some zeros that I don't understand.
Here is the code I am trying
#include <iostream>
using namespace std;

class A{
public:
    int a;
    A(){ a = 5; }
    virtual void foo(){ }
};

class B{
public:
    int b;
    B(){ b = 10; }
    virtual void foo2(){ }
};

class C : public A, public B{
public:
    int c;
    C(){ c = 15; a = 20; }
    virtual void foo2(){ cout << "Heeello!\n"; }
};

int main()
{
    C c;
    int *ptr;
    ptr = (int *)&c;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    ptr++;
    cout << *ptr << endl;
    return 0;
}
And here is the output I get:
4198384 //vfptr
0
20 // value of a
0
4198416 //vfptr
0
10 // value of b
15 // value of c
What is the meaning of the zeros in between?
Thanks in advance!
That depends upon your compiler. With clang-500, I get:
191787296
1
20
0
191787328
1
10
15
1785512560
I am sure there's a GDB way too, but this is what I get if I dump pointer-sized words with LLDB at the address of the class object:
0x7fff5fbff9d0: 0x0000000100002120 vtable for C + 16
0x7fff5fbff9d8: 0x0000000000000014
0x7fff5fbff9e0: 0x0000000100002140 vtable for C + 48
0x7fff5fbff9e8: 0x0000000f0000000a
This layout seems sensible, right? Just what you expect.
The reason this doesn't show up as cleanly in your program as it does in the debugger is that you are dumping int-sized words. On a 64-bit system, sizeof(int) == 4 but sizeof(void*) == 8.
So you see your pointers split into (int, int) pairs. On Linux your pointers don't have any bits set beyond the low 32; on OS X mine do - hence the 0 vs. 1 disparity.
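If you want the same pointer-sized view from inside the program, one option (a sketch, not from this answer) is to copy the object's bytes into std::uintptr_t words with memcpy; what those words contain is entirely implementation-specific and is only useful for exploration:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>

struct A { int a = 5;  virtual void foo()  {} };
struct B { int b = 10; virtual void foo2() {} };
struct C : A, B { int c = 15; void foo2() override {} };

int main()
{
    C obj;
    const unsigned char* raw = reinterpret_cast<const unsigned char*>(&obj);

    // Dump the object in pointer-sized words instead of int-sized ones.
    for (std::size_t off = 0; off + sizeof(std::uintptr_t) <= sizeof obj;
         off += sizeof(std::uintptr_t))
    {
        std::uintptr_t word;
        std::memcpy(&word, raw + off, sizeof word);   // avoids alignment and aliasing issues
        std::cout << "+" << std::setw(2) << off << ": 0x"
                  << std::hex << word << std::dec << "\n";
    }
}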
This is hugely architecture- and compiler-dependent. Possibly, for you, the size of a pointer is not the size of an int. What architecture/compiler are you using?
If you're working on a 64-bit system, then:
The first zero is the 4 most-significant-bytes of the first vfptr.
The second zero is padding, so that the second vfptr will be aligned to an 8-byte address.
The third zero is the 4 most-significant-bytes of the second vfptr.
You can check if sizeof(void*) == 8 in order to assert that.
Hard to tell without knowing your platform and compiler, but this might be an alignment issue. In effect, the compiler might attempt to align class data along 8-byte boundaries, with zeroes used for padding.
Without the above details, this is merely speculation.
This is completely dependent on your compiler, system, and bitness.
The virtual table pointer has the size of a pointer, which depends on whether you are compiling as 32-bit or 64-bit. Pointers are also aligned at addresses that are a multiple of their size (as most types typically are); this is probably why you see the 0 padding after the 20.
The integers have whatever size int has on your specific system, which is usually 32 bits. Note that if this isn't the case on your machine, you will get unexpected results, because the pointer arithmetic advances ptr by sizeof(int) each time.
If you use MSVC, you can dump the memory layout of every class in your solution with -d1reportAllClassLayout, like this:
cl -d1reportAllClassLayout main.cpp
Hope that helps.

When to use reinterpret_cast?

I am a little confused about the applicability of reinterpret_cast vs static_cast. From what I have read, the general rule is to use static_cast when the types can be interpreted at compile time, hence the word static. This is also the cast the C++ compiler uses internally for implicit casts.
reinterpret_casts are applicable in two scenarios:
convert integer types to pointer types and vice versa
convert one pointer type to another. The general idea I get is that this is unportable and should be avoided.
Where I am a little confused is one usage I need: I am calling C++ from C, and the C code needs to hold on to the C++ object, so basically it holds a void*. What cast should be used to convert between the void* and the class type?
I have seen usage of both static_cast and reinterpret_cast. From what I have been reading, it appears static_cast is better, as the cast can happen at compile time. But it also says to use reinterpret_cast to convert from one pointer type to another?
The C++ standard guarantees the following:
static_casting a pointer to and from void* preserves the address. That is, in the following, a, b and c all point to the same address:
int* a = new int();
void* b = static_cast<void*>(a);
int* c = static_cast<int*>(b);
reinterpret_cast only guarantees that if you cast a pointer to a different type, and then reinterpret_cast it back to the original type, you get the original value. So in the following:
int* a = new int();
void* b = reinterpret_cast<void*>(a);
int* c = reinterpret_cast<int*>(b);
a and c contain the same value, but the value of b is unspecified. (In practice it will typically contain the same address as a and c, but that is not specified in the standard, and it may not be true on machines with more complex memory systems.)
For casting to and from void*, static_cast should be preferred.
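Applied to the use case in the question (handing a C++ object to C code through a void*), a minimal sketch might look like this; the c_set_user_data/c_get_user_data functions are made-up stand-ins for whatever the real C API provides:

#include <iostream>

struct Widget { int value = 42; };

// Hypothetical stand-ins for a C API that stores an opaque user pointer.
static void* g_user_data = nullptr;
void c_set_user_data(void* p) { g_user_data = p; }
void* c_get_user_data()       { return g_user_data; }

int main()
{
    Widget w;

    // C++ object -> void*: the conversion is implicit (static_cast also works).
    c_set_user_data(&w);

    // void* -> original type: static_cast recovers the original pointer value.
    Widget* back = static_cast<Widget*>(c_get_user_data());
    std::cout << back->value << "\n";   // prints 42
}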
One case when reinterpret_cast is necessary is when interfacing with opaque data types. This occurs frequently in vendor APIs over which the programmer has no control. Here's a contrived example where a vendor provides an API for storing and retrieving arbitrary global data:
// vendor.hpp
typedef struct _Opaque * VendorGlobalUserData;
void VendorSetUserData(VendorGlobalUserData p);
VendorGlobalUserData VendorGetUserData();
To use this API, the programmer must cast their data to VendorGlobalUserData and back again. static_cast won't work; one must use reinterpret_cast:
// main.cpp
#include "vendor.hpp"
#include <iostream>
using namespace std;

struct MyUserData {
    MyUserData() : m(42) {}
    int m;
};

int main() {
    MyUserData u;

    // store global data
    VendorGlobalUserData d1;
    // d1 = &u;                                       // compile error
    // d1 = static_cast<VendorGlobalUserData>(&u);    // compile error
    d1 = reinterpret_cast<VendorGlobalUserData>(&u);  // ok
    VendorSetUserData(d1);

    // do other stuff...

    // retrieve global data
    VendorGlobalUserData d2 = VendorGetUserData();
    MyUserData * p = 0;
    // p = d2;                                // compile error
    // p = static_cast<MyUserData *>(d2);     // compile error
    p = reinterpret_cast<MyUserData *>(d2);   // ok
    if (p) { cout << p->m << endl; }
    return 0;
}
Below is a contrived implementation of the sample API:
// vendor.cpp
static VendorGlobalUserData g = 0;
void VendorSetUserData(VendorGlobalUserData p) { g = p; }
VendorGlobalUserData VendorGetUserData() { return g; }
The short answer:
If you don't know what reinterpret_cast stands for, don't use it. If you will need it in the future, you will know.
Full answer:
Let's consider basic number types.
When you convert, for example, int(12) to float (12.0f), your processor needs to perform some computation, as the two numbers have different bit representations. This is what static_cast is for.
On the other hand, when you call reinterpret_cast the CPU does not perform any computation. It just treats a set of bits in memory as if it had another type. So when you convert an int* to a float* with this keyword, the new value (after dereferencing the pointer) has nothing to do with the old value in the mathematical sense (ignoring the fact that it is undefined behavior to read this value).
Be aware that reading or modifying values after reinterpret_cast'ing is very often undefined behavior. If you want to access the bit representation of some data, in most cases you should use a pointer or reference to std::byte (available since C++17); that is almost always a legal operation. Other "safe" types are char and unsigned char, but I would say they shouldn't be used for that purpose in modern C++, as std::byte has better semantics.
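For completeness, a sketch of two well-defined ways to look at a float's bit pattern without reinterpret_cast: std::memcpy works in any standard C++, and std::bit_cast expresses the same intent directly in C++20:

#include <bit>        // std::bit_cast (C++20)
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    float f = 1.0f;

    // Option 1: copy the bytes into an integer of the same size.
    std::uint32_t bits_memcpy;
    std::memcpy(&bits_memcpy, &f, sizeof bits_memcpy);

    // Option 2: std::bit_cast does the same thing in one expression (C++20).
    std::uint32_t bits_bitcast = std::bit_cast<std::uint32_t>(f);

    std::cout << std::hex << bits_memcpy << " " << bits_bitcast << "\n";   // 3f800000 3f800000
}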
Example: It is true that reinterpret_cast is not portable for one reason: byte order (endianness). But this is often, surprisingly, the best reason to use it. Imagine this example: you have to read a binary 32-bit number from a file, and you know it is big endian. Your code has to be generic and work properly on both big-endian (e.g. some ARM) and little-endian (e.g. x86) systems. So you have to check the byte order. It is known at compile time, so you can write a function to check it:
/*constexpr*/ bool is_little_endian() {
    std::uint16_t x = 0x0001;
    auto p = reinterpret_cast<std::uint8_t*>(&x);
    return *p != 0;
}
Explanation: the binary representation of x in memory could be 0000'0000'0000'0001 (big endian) or 0000'0001'0000'0000 (little endian). After the reinterpret-cast, the byte that p points to could be, respectively, 0000'0000 or 0000'0001. If you used a static cast, it would always be 0000'0001, no matter which endianness is in use.
EDIT:
In the first version I made the example function is_little_endian constexpr. It compiles fine on the newest gcc (8.3.0), but the standard says it is illegal. The clang compiler refuses to compile it (which is correct).
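Since C++20 you can also ask the implementation directly via std::endian, which sidesteps both the reinterpret_cast and the constexpr problem; a sketch:

#include <bit>        // std::endian (C++20)
#include <iostream>

constexpr bool is_little_endian_v = (std::endian::native == std::endian::little);

int main()
{
    if constexpr (is_little_endian_v)
        std::cout << "little endian\n";
    else
        std::cout << "big endian (or mixed)\n";
}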
The meaning of reinterpret_cast is not defined by the C++ standard. Hence, in theory a reinterpret_cast could crash your program. In practice compilers try to do what you expect, which is to interpret the bits of what you are passing in as if they were the type you are casting to. If you know what the compilers you are going to use do with reinterpret_cast you can use it, but to say that it is portable would be lying.
For the case you describe, and pretty much any case where you might consider reinterpret_cast, you can use static_cast or some other alternative instead. Among other things the standard has this to say about what you can expect of static_cast (§5.2.9):
An rvalue of type “pointer to cv void” can be explicitly converted to a pointer to object type. A value of type pointer to object converted to “pointer to cv void” and back to the original pointer type will have its original value.
So for your use case, it seems fairly clear that the standardization committee intended for you to use static_cast.
One use of reinterpret_cast is if you want to apply bitwise operations to (IEEE 754) floats. One example of this was the Fast Inverse Square-Root trick:
https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
It treats the binary representation of the float as an integer, shifts it right and subtracts it from a constant, thereby halving and negating the exponent. After converting back to a float, it's subjected to a Newton-Raphson iteration to make this approximation more exact:
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                        // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );                // what the deuce?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed
    return y;
}
This was originally written in C, so it uses C casts, but the analogous C++ cast is reinterpret_cast.
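For what it's worth, the same trick can be written in C++20 without any pointer casts by using std::bit_cast; a sketch, not the original code:

#include <bit>        // std::bit_cast (C++20)
#include <cstdint>
#include <iostream>

float Q_rsqrt_bitcast(float number)
{
    const float x2 = number * 0.5F;
    const float threehalfs = 1.5F;

    std::uint32_t i = std::bit_cast<std::uint32_t>(number);   // float bits as an integer
    i = 0x5f3759df - (i >> 1);                                 // the magic constant trick
    float y = std::bit_cast<float>(i);                         // back to float

    y = y * (threehalfs - (x2 * y * y));                       // one Newton-Raphson step
    return y;
}

int main()
{
    std::cout << Q_rsqrt_bitcast(4.0f) << "\n";   // approximately 0.5
}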
Here is a variant of Avi Ginsburg's program which clearly illustrates the property of reinterpret_cast mentioned by Chris Luengo, flodin, and cmdLP: that the compiler treats the pointed-to memory location as if it were an object of the new type:
#include <iostream>
#include <string>
#include <iomanip>
using namespace std;

class A
{
public:
    int i;
};

class B : public A
{
public:
    virtual void f() {}
};

int main()
{
    string s;
    B b;
    b.i = 0;

    A* as = static_cast<A*>(&b);
    A* ar = reinterpret_cast<A*>(&b);
    B* c = reinterpret_cast<B*>(ar);

    cout << "as->i = " << hex << setfill('0') << as->i << "\n";
    cout << "ar->i = " << ar->i << "\n";
    cout << "b.i = " << b.i << "\n";
    cout << "c->i = " << c->i << "\n";
    cout << "\n";
    cout << "&(as->i) = " << &(as->i) << "\n";
    cout << "&(ar->i) = " << &(ar->i) << "\n";
    cout << "&(b.i) = " << &(b.i) << "\n";
    cout << "&(c->i) = " << &(c->i) << "\n";
    cout << "\n";
    cout << "&b = " << &b << "\n";
    cout << "as = " << as << "\n";
    cout << "ar = " << ar << "\n";
    cout << "c = " << c << "\n";
    cout << "Press ENTER to exit.\n";
    getline(cin, s);
}
Which results in output like this:
as->i = 0
ar->i = 50ee64
b.i = 0
c->i = 0
&(as->i) = 00EFF978
&(ar->i) = 00EFF974
&(b.i) = 00EFF978
&(c->i) = 00EFF978
&b = 00EFF974
as = 00EFF978
ar = 00EFF974
c = 00EFF974
Press ENTER to exit.
It can be seen that the B object is built in memory as B-specific data first, followed by the embedded A object. The static_cast correctly returns the address of the embedded A object, and the pointer created by static_cast correctly gives the value of the data field. The pointer generated by reinterpret_cast treats b's memory location as if it were a plain A object, and so when the pointer tries to get the data field it returns some B-specific data as if it were the contents of this field.
One use of reinterpret_cast is to convert a pointer to an integer; the integer type must be wide enough to hold the pointer value, e.g. std::uintptr_t from <cstdint>:
#include <cstdint>

int i;
std::uintptr_t u = reinterpret_cast<std::uintptr_t>(&i);
You could use reinterpret_cast to check inheritance at compile time.
Look here:
Using reinterpret_cast to check inheritance at compile time
template <class outType, class inType>
outType safe_cast(inType pointer)
{
    void* temp = static_cast<void*>(pointer);
    return static_cast<outType>(temp);
}
I tried to sum this up and wrote a simple safe cast using templates.
Note that this solution does not support casting pointers to functions.
First you have some data of a specific type, like an int here:
int x = 0x7fffffff; // this bit pattern is a NaN when interpreted as a float
Then you want to access the same variable as another type, like float.
You can decide between
float y = reinterpret_cast<float&>(x);
// this can only be used in C++; it looks like a function with template parameters
or
float y = *(float*)&(x);
// this can be used in both C and C++
BRIEF: it means that the same memory is used as a different type. So you can reinterpret the binary representation of an int, like above, as a float. 0x80000000, for example, is -0 (the mantissa and exponent are zero, but the sign bit, the MSB, is one). This also works for doubles and long doubles.
OPTIMIZE: I think reinterpret_cast will be optimized in many compilers, while the C-style cast is done via pointer arithmetic (the value must be copied to memory, because pointers can't point at CPU registers).
NOTE: In both cases you should save the value to be cast in a variable before casting! This macro could help:
#define asvar(x) ({decltype(x) __tmp__ = (x); __tmp__; })
Quick answer: use static_cast if it compiles, otherwise resort to reinterpret_cast.
Read the FAQ! Holding C++ data in C can be risky.
In C++, a pointer to an object can be converted to void * without any casts. But it's not true the other way round. You'd need a static_cast to get the original pointer back.