interpreting object addresses with reinterpret_cast - c++

The following code gives the output as 136. But I could not understand how the first two address comparisons are equal. Appreciate any help to understand this. Thank you.
#include <iostream>
class A
{
public:
A() : m_i(0){ }
protected:
int m_i;
};
class B
{
public:
B() : m_d(0.0) { }
protected:
double m_d;
};
class C : public A, public B
{
public:
C() : m_c('a') { }
private:
char m_c;
};
int main( )
{
C d;
A *b1 = &d;
B *b2 = &d;
const int a = (reinterpret_cast<char *>(b1) == reinterpret_cast<char *>(&d)) ? 1 : 2;
const int b = (b2 == &d) ? 3 : 4;
const int c = (reinterpret_cast<char *>(b1) == reinterpret_cast<char *>(b2)) ? 5 : 6;
std::cout << a << b << c << std::endl;
return 0;
}

When you use multiple inheritance as in your example, the first base class and the derived class typically share the same base address. Additional base classes are usually arranged in declaration order, each at an offset determined by the size (and alignment) of the preceding ones. The first comparison is true because d and b1 have the same base address.
In your case, the size of A is 4 bytes, but B contains a double, so B typically starts at the base address of A plus 4 bytes rounded up to B's alignment (8 bytes on common 64-bit platforms). When you do B *b2 = &d; the compiler calculates this offset and adjusts the pointer value accordingly.
When you do b2 == &d, an implicit conversion from C * to B * is applied to &d before the comparison is done. This conversion adjusts the pointer value by the same offset, just as it would in an assignment.
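A minimal sketch that measures the adjustment described above, assuming a mainstream compiler and ABI (the standard does not fix these offsets). The classes are copied from the question; the helper names offsetOfBInC and questionOutput are made up for illustration.

```cpp
#include <cstddef>

// Same hierarchy as in the question: two non-virtual bases, no virtual functions.
class A { public: A() : m_i(0) {} protected: int m_i; };
class B { public: B() : m_d(0.0) {} protected: double m_d; };
class C : public A, public B { public: C() : m_c('a') {} private: char m_c; };

// Returns how many bytes the derived-to-base conversion C* -> B* moved the pointer.
// reinterpret_cast<const char*> is only used to read raw addresses; it does no adjustment.
std::ptrdiff_t offsetOfBInC(const C& d)
{
    const B* b2 = &d;  // implicit conversion: adjusted to the B subobject
    return reinterpret_cast<const char*>(b2) - reinterpret_cast<const char*>(&d);
}

// Reproduces the three comparisons from the question as one three-digit number.
int questionOutput()
{
    C d;
    A* b1 = &d;
    B* b2 = &d;
    const int a = (reinterpret_cast<char*>(b1) == reinterpret_cast<char*>(&d)) ? 1 : 2;
    const int b = (b2 == &d) ? 3 : 4;  // &d is converted to B* before comparing
    const int c = (reinterpret_cast<char*>(b1) == reinterpret_cast<char*>(b2)) ? 5 : 6;
    return a * 100 + b * 10 + c;
}
```

On a typical x86-64 build, questionOutput() should return 136, and offsetOfBInC should report 8 rather than 4, because the double inside B is 8-byte aligned.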

It's pretty typical for a derived class (like C here) to be laid out in memory so that it starts with its two base classes (A and B), so the address of an instance of type C would be identical to the address of its first base-class subobject (i.e. A).

In this kind of inheritance (when virtual is not involved), each instance of C will typically have the following layout:
First, all members of A (which is just m_i, a 4-byte integer)
Second, all members of B (which is just m_d, an 8-byte double), possibly preceded by padding so that the double is properly aligned
Last, all members of C itself, which is just a character (1 byte, m_c), followed by any trailing padding
When you convert a pointer to an instance of C to A *, no address adjustment takes place because A is the first base, and the numerical value of the two pointers will be the same. This is why the first comparison evaluates to true. (Note that doing a reinterpret_cast<char *>() on a pointer never causes adjustment, so it always gives the numerical value of the pointer. Casting to void * would have the same effect and is probably safer for comparison.)
Converting a pointer to an instance of C to B * will cause a pointer adjustment (by the offset of the B subobject, which is 8 bytes on typical 64-bit platforms), which means that the numerical value of b2 will not be equal to that of &d. However, when you directly compare b2 and &d, the compiler automatically converts &d to B *, which adjusts its numerical value by the same offset. This is the reason that the second comparison also evaluates to true.
The third comparison returns false because, as said before, converting a pointer to an instance of C to A * or to B * gives different results (converting to A * doesn't adjust, while converting to B * does).
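A sketch contrasting the casts, assuming the same layout. The members are made public and trimmed for brevity (an assumption; the adjustment only depends on having two non-empty bases), and the function names are hypothetical. static_cast undoes the adjustment when going back to C *, while reinterpret_cast and a cast to void * only pass the raw address through. For the same reason, reinterpret_cast<C *>(b2) would not point at d, so it is not used for the downcast here.

```cpp
// Trimmed, public-member version of the question's hierarchy.
struct A { int m_i = 0; };
struct B { double m_d = 0.0; };
struct C : A, B { char m_c = 'a'; };

// static_cast applies the same adjustment as the implicit conversion, in both directions.
bool staticCastRoundTrips(C& d)
{
    B* b2 = static_cast<B*>(&d);    // adjusted forward to the B subobject
    C* back = static_cast<C*>(b2);  // adjusted back to the start of d
    return back == &d;
}

// A cast to void* does not adjust, so the two base pointers keep distinct addresses.
bool voidPointersDiffer(C& d)
{
    A* b1 = &d;
    B* b2 = &d;
    return static_cast<void*>(b1) != static_cast<void*>(b2);
}

// The raw address in b2 differs from the raw address of d, even though b2 == &d is true.
bool rawAddressOfBDiffersFromC(C& d)
{
    B* b2 = &d;
    return reinterpret_cast<char*>(b2) != reinterpret_cast<char*>(&d);
}
```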

Related

Value cast vs Reference cast

What is the difference between a value cast and a reference cast? Why does one of them invoke a conversion (i.e. create a new object) while the other doesn't? What are the caveats of casting on the rhs?
Assume this is a Derived. Why won't these actually cast to Base?
*this = (Base&) rhs
(Base)* this = rhs
Could you please show on simple examples?
Value cast creates a new value from an existing one; reference cast creates a new reference to the same existing value.
Reference cast neither changes the content of an existing object nor creates a new one; it is restricted to changing the interpretation of the value that is already there. Value casting, on the other hand, can make a new object from an existing one, so it has fewer restrictions.
For example, if you have an unsigned char and you want a value or a reference of type int, value cast is going to work, while reference casting is going to fail:
unsigned char orig = 'x';
int v(orig); // Works
int &r(orig); // Does not work
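To see the "new object vs. same object" difference directly, here is a small sketch (the names Snapshot and valueVersusReference are hypothetical). It also shows a related case: a const int& can bind to an unsigned char, but only by creating a temporary int, so it behaves like the value cast rather than like a true reference to orig.

```cpp
// Records what each variable sees after the original object is modified.
struct Snapshot { int byValue; int byRef; int byConstRef; };

Snapshot valueVersusReference()
{
    unsigned char orig = 'x';
    int v(orig);                 // value conversion: v is a separate int
    unsigned char& same = orig;  // same type: binds directly to orig, no new object
    const int& cr = orig;        // different type: a temporary int is created, cr binds to it
    orig = 'y';                  // modify the original object
    return { v, same, cr };      // v and cr still hold 'x'; same sees 'y'
}
```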
rhs is a Derived; I want to assign all inherited and non-inherited stuff from rhs into Base
Then you need to cast both sides to Base, or add an assignment operator to Derived that takes a const Base& as an argument. Here is an example of casting on both sides (it may be hard for other programmers reading your code to understand):
struct Base {
int x;
Base(int x) : x(x) {}
};
struct Derived1 : public Base {
Derived1(int x) : Base(x) {}
};
struct Derived2 : public Base {
Derived2(int x) : Base(x) {}
};
Running the code below:
Derived1 d1(5);
Derived2 d2(10);
std::cout << d1.x << " " << d2.x << std::endl;
((Base&)d1) = (Base&)d2;
std::cout << d1.x << " " << d2.x << std::endl;
produces the following printout:
5 10
10 10
As you can see, the assignment ((Base&)d1) = (Base&)d2 copied the content of d2's Base portion into d1's Base portion.
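The other option mentioned above, an assignment operator on the derived class, might look like the following. This is a sketch, assuming it is acceptable for Derived1 to accept any Base on the right-hand side; the classes follow the answer's example. With it, d1 = d2; compiles without casts and copies only the Base portion, just as the cast version does.

```cpp
struct Base {
    int x;
    Base(int x) : x(x) {}
};

// Alternative to casting both sides: Derived1 accepts any Base on the right-hand
// side and copies only the inherited members.
struct Derived1 : public Base {
    Derived1(int x) : Base(x) {}
    Derived1& operator=(const Base& other)
    {
        Base::operator=(other);  // copy the Base portion only
        return *this;
    }
};

struct Derived2 : public Base {
    Derived2(int x) : Base(x) {}
};
```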
What is the difference between value cast and reference cast?
Value casts convert an object to a value:
char i = 'a';
int k = static_cast<int>(i); // Prefer C++ casts to C casts
Reference casts convert an object to a reference:
int i = 'a'; // must be an int: a char cannot be bound to an int&
int &k = static_cast<int&>(i);
Just because the cast can be done implicitly, as in int &k = i, doesn't mean it doesn't happen.
Why one of them invokes conversion (aka creating new object) and other doesn't?
If you write int &x = static_cast<int&>(i), there are 2 things that can happen:
1) A pointer to i is created (references are typically implemented as hidden pointers). This hidden pointer is stored in x, and x behaves as a reference to i.
2) Usually, the compiler optimizes away this reference and simply treats x as an alias of i. Therefore no variable is instantiated.
In the former case, a new object (the hidden pointer, not a new int) is created.
However, if you write:
char c = 'a';
int i = static_cast<int> (c);
there is no extra object besides i itself: the value of c is simply converted (widened) to int and stored in i.
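Why won't these actually cast to Base?
*this = (Base&) rhs
You cannot assign a base object to a derived object, only the opposite: the implicitly declared Derived::operator= expects a Derived, and a Base& cannot be implicitly converted to one (unless Derived has a constructor taking a Base). Assigning in the opposite direction, a derived object to a base, would overwrite only the base object's fields with the derived object's.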
(Base)* this = rhs
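There is no point in casting an lvalue to a value here. (Base)*this creates a temporary Base copy of *this; rhs is assigned to that temporary, which is then discarded, so *this itself is not modified.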
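A sketch that checks this claim, using hypothetical Base and Derived classes with public members. The value cast assigns to a temporary copy and leaves the object untouched, while a reference cast (the same idea as the (Base&) casts in the earlier answer) changes the Base part of *this.

```cpp
struct Base { int x = 0; };

struct Derived : Base {
    int y = 0;

    // Value cast: (Base)*this makes a temporary Base copy; the assignment changes
    // that temporary, which is destroyed at the end of the statement.
    void assignThroughValueCast(const Base& rhs) { (Base)*this = rhs; }

    // Reference cast: static_cast<Base&>(*this) names the Base subobject of *this
    // itself, so the assignment really changes x.
    void assignThroughReferenceCast(const Base& rhs) { static_cast<Base&>(*this) = rhs; }
};
```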
What are the caveats of casting on the rhs?
I don't think there is anything wrong with casting, as long as the casts do not decrease readability.

C++ pointer covariance

It never occurred to me that C++ has pointer covariance and therefore lets you shoot yourself in the foot like this:
#include <iostream>
struct Base
{
Base() : a(5) {}
int a;
};
struct Child1 : public Base
{
Child1() : b(7) {}
int b;
int bar() { return b;}
};
struct Child2 : public Base
{
Child2(): c(8) {}
int c;
};
int main()
{
Child1 children1[2];
Base * b = children1;
Child2 child2;
b[1] = child2; // <------- now the first element of Child1 array was assigned a value of type Child2
std::cout << children1[0].bar() << children1[1].bar(); // prints 57
}
Is it undefined behaviour? Is there any way to prevent it, or at least get a warning from the compiler?
Yes, this is undefined behavior.
And no, a typical C++ compiler is currently unlikely to identify anything here that merits a diagnostic. But C++ compilers get smarter with each passing year; who knows what the state of affairs will be years from now...
However, a minor quibble:
b[1] = child2; // <------- now the first element of Child1 array was assigned...
No. From b's point of view that's not the first element, it's the second element: b[0] would be the first element. Furthermore, b is not an array, it's a pointer. And it's a pointer to a single element, not a pointer to a two-element array.
And that is where the undefined behavior comes from.
The reason it's not an array is because:
Base * b = children1;
children1 decays to a Child1 *. If that were where the affair ended, you could say that b would be a pointer to a two-element array.
But that's not where things ended. The decayed pointer was then converted to a Base *. You can implicitly convert a pointer to a subclass to a pointer to a superclass. But (loosely speaking now) you cannot convert a pointer to an array of subclasses to a pointer to an array of superclasses. Hence, b is, strictly, a pointer to a single element, and b[1] is undefined behavior.
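The pointer-arithmetic mismatch behind this can be made visible without invoking the undefined behavior, assuming a typical layout in which the Base subobject sits at offset 0 of Child1. The sketch below only computes b + 1 (a valid one-past-the-end pointer for the single Base object b points to) and never dereferences it. b + 1 advances by sizeof(Base), whereas the array elements are sizeof(Child1) apart; with 4-byte ints, b[1] therefore lands on children1[0].b, which is consistent with the printed 57. The names Strides and compareStrides are made up for illustration.

```cpp
#include <cstddef>

struct Base { Base() : a(5) {} int a; };
struct Child1 : public Base { Child1() : b(7) {} int b; };

// Byte offsets from the start of the array: where b + 1 points versus where
// children1[1] actually starts.
struct Strides { std::ptrdiff_t viaBasePointer; std::ptrdiff_t viaChildArray; };

Strides compareStrides()
{
    Child1 children1[2];
    Base* b = children1;  // points to the Base subobject of children1[0] only
    const char* start = reinterpret_cast<const char*>(children1);
    return {
        reinterpret_cast<const char*>(b + 1) - start,          // steps by sizeof(Base)
        reinterpret_cast<const char*>(&children1[1]) - start   // steps by sizeof(Child1)
    };
}
```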

How are pointer comparisons and reinterpret_cast evaluated?

I have the following code that I run in Visual Studio. The address of c is the same as the address pa points to, but not the same as the address pb points to. Yet both of the first two ternary operators evaluate to true, which is what I would have expected from only reading the code, without seeing the addresses of pa and pb in the debugger.
The third ternary operator evaluates to false.
#include <iostream>
class A
{
public:
A() : m_i(0) {}
protected:
int m_i;
};
class B
{
public:
B() : m_d(0.0) {}
protected:
double m_d;
};
class C
: public A
, public B
{
public:
C() : m_c('a') {}
private:
char m_c;
};
int main()
{
C c;
A *pa = &c;
B *pb = &c;
const int x = (pa == &c) ? 1 : 2;
const int y = (pb == &c) ? 3 : 4;
const int z = (reinterpret_cast<char*>(pa) == reinterpret_cast<char*>(pb)) ? 5 : 6;
std::cout << x << y << z << std::endl;
return 0;
}
How does this work?
pa and pb are actually different. One way to test that is:
reinterpret_cast<char*>(pa) == reinterpret_cast<char*>(pb)
pa == &c and pb == &c both return true, but that does not mean the above must be true. &c is converted to the appropriate pointer type (A* or B*) via an implicit pointer conversion. This conversion changes the pointer's value to the address of the respective base class subobject of the object pointed to by &c.
From cppreference:
A prvalue pointer to a (optionally cv-qualified) derived class type can be converted to a prvalue pointer to its accessible, unambiguous (identically cv-qualified) base class. The result of the conversion is a pointer to the base class subobject within the pointed-to object. The null pointer value is converted to the null pointer value of the destination type.
A is the first non-virtual base class of C, so in practice it is placed directly at the beginning of C's memory space, i.e.:
reinterpret_cast<char*>(pa) == reinterpret_cast<char*>(&c)
is true. But the B subobject is laid out after A, so it cannot possibly satisfy the above condition. Both the implicit conversion and static_cast then give you the right address of the base subobject.
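The last sentence of the quoted rule is easy to miss. A minimal sketch, with the question's classes trimmed to public members (an assumption for brevity) and hypothetical function names, showing that a null C * converts to a null B * rather than to an address offset by the B subobject's position, and that the implicit conversion and static_cast<B*> agree.

```cpp
struct A { int m_i = 0; };
struct B { double m_d = 0.0; };
struct C : A, B { char m_c = 'a'; };

// A null C* converts to a null B*: the subobject offset is not added to a null
// pointer (compilers typically emit a null check for this when the offset is non-zero).
bool nullStaysNull()
{
    C* none = nullptr;
    B* pb = none;  // implicit derived-to-base conversion
    return pb == nullptr;
}

// The implicit conversion and static_cast<B*> yield the same adjusted address.
bool implicitMatchesStaticCast(C& c)
{
    B* implicitPb = &c;
    B* castPb = static_cast<B*>(&c);
    return implicitPb == castPb;
}
```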
A C instance has an A subobject and a B subobject.
Something like this:
   |---------|
   |---------|
   |    A    |
   |---------|
C: |---------|
   |    B    |
   |---------|
   |---------|
Now,
A *pa = &c;
makes pa point to the location of the A subobject, and
B *pb = &c;
makes pb point to the location of the B subobject.
   |---------|
   |---------| <------ pa
   |    A    |
   |---------|
C: |---------| <------ pb
   |    B    |
   |---------|
   |---------|
When you compare pa and pb to &c, the same thing happens: in the first case, &c is converted to the location of the A subobject, and in the second it is converted to the location of the B subobject.
So the reason that they both compare equal to &c is that the expression &c actually has different values (and different types) in the comparisons.
When you reinterpret_cast, no adjustment takes place - it means "take the representation of this value and interpret it as representing a value of a different type".
Since the subobjects are in different locations, the results of reinterpreting them as locations of a char are also different.
If you add some extra output, you can see what is going on; I added the following line:
std::cout << "pa: " << pa << "; pb: " << pb << "; c: " << &c << std::endl;
The output of this will vary of course, since I am printing the values of the pointers, but it will look like:
pa: 0x1000; pb: 0x1008; c: 0x1000
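The pb pointer is in fact pointing 8 bytes past pa: A only holds a 4-byte int, but on my 64-bit machine the double in B needs 8-byte alignment, so the B subobject starts at offset 8. This offset is applied because when you do: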
B *pb = &c;
the compiler converts the C* to a B*, which gives you the address of the B subobject. The confusing part is that your second ternary operator shows true. This is because in pb == &c, &c is itself implicitly converted to B* before the comparison, so both sides hold the address of the B subobject.
In the third comparison you're comparing the addresses pa and pb point to directly. They're different because A and B are both base classes of C: pa points to the A base class subobject of c and pb points to the B base class subobject of c, so the actual memory addresses are different. They cannot point to the same memory address.
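The 8-byte distance in that output can be predicted from sizeof and alignof, assuming a common ABI that places B right after A and rounds its offset up to B's alignment (the C++ standard itself does not mandate this). The function names are hypothetical; the classes are copied from the question.

```cpp
#include <cstddef>

class A { public: A() : m_i(0) {} protected: int m_i; };
class B { public: B() : m_d(0.0) {} protected: double m_d; };
class C : public A, public B { public: C() : m_c('a') {} private: char m_c; };

// Distance in bytes between pa and pb for the same C object (what the printout showed).
std::ptrdiff_t bSubobjectOffset(const C& c)
{
    const A* pa = &c;
    const B* pb = &c;
    return reinterpret_cast<const char*>(pb) - reinterpret_cast<const char*>(pa);
}

// Where a typical ABI is expected to put B: just after A, rounded up to alignof(B).
// This is an assumption about common layouts, not a rule of the language.
std::size_t expectedOffset()
{
    return (sizeof(A) + alignof(B) - 1) / alignof(B) * alignof(B);
}
```

With a 4-byte A and an 8-byte-aligned B this gives (4 + 7) / 8 * 8 = 8, matching the 0x1008 above.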

Address of upcast object

Suppose B is a base class of D (maybe virtual, maybe multiple inheritance, need not be a direct base class).
Let obj be an object of type D (not of a subclass of D -- exactly D).
Let
D * d = std::addressof(obj);
B * b = d;
Can we safely assume that
(char*) d <= (char*) b && (char*) b < (char*) d + sizeof(D)
?
Background: This is to become a step in a routine determining whether some object has been created by placement new in a particular aligned_storage. I need to be sure that, if yes, all pointers to base objects of this object point to some address within the aligned_storage.
I am pretty sure that your assumption is safe given D is the final type of the object. Otherwise it would be treacherous to use placement new in the first place.
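#include <cstdlib>
#include <new>
struct B { int i; };
struct D : virtual B { int j; };
int
main()
{
// Raw storage with room for exactly one D
auto const storage = std::malloc(sizeof(D));
if (!storage) return 1;
// The standard placement operator new returns its argument unchanged
D* d = new (storage) D();
d->~D(); // end the object's lifetime before releasing the storage
std::free(storage);
return 0;
}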
If the B subobject were located before d, then placement new would need to return a pointer adjusted based on the layout of D, but "the standard allocation function void* operator new(std::size_t, void*) ... simply returns its second argument unchanged." (http://en.cppreference.com/w/cpp/language/new) Likewise, the B subobject cannot be situated such that it extends beyond (char*)d + sizeof(D), because it would overrun the allocated memory.
Thanks for sharing an interesting question. Perhaps since asking the question you have already found a more satisfactory answer. I would be interested in reading a more concrete proof why the assumption holds or does not.
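Not a proof, but a quick empirical check of the inequality on one compiler, using a hypothetical diamond hierarchy (M1, M2 and this D are made up) so that the virtual and indirect base cases are both exercised. std::less is used instead of the built-in <= and < because it is guaranteed to give a total order over pointers.

```cpp
#include <functional>
#include <memory>

// Hypothetical diamond: B is a virtual, indirect base of D.
struct B { int i = 0; };
struct M1 : virtual B { int m1 = 0; };
struct M2 : virtual B { int m2 = 0; };
struct D : M1, M2 { int j = 0; };

// True if base points into [address of obj, address of obj + sizeof(D)).
template <class Base>
bool baseWithinDerived(const D& obj, const Base* base)
{
    const auto* lo = reinterpret_cast<const unsigned char*>(std::addressof(obj));
    const auto* p = reinterpret_cast<const unsigned char*>(base);
    std::less<const unsigned char*> before;  // total order on pointers
    return !before(p, lo) && before(p, lo + sizeof(D));
}
```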

Are the address of an object and a pointer to the object the same thing for an object of a polymorphic class?

I was trying to solve a C++ test and saw this question.
#include <iostream>
class A
{
public:
A() : m_i(0) { }
protected:
int m_i;
};
class B
{
public:
B() : m_d(0.0) { }
protected:
double m_d;
};
class C
: public A
, public B
{
public:
C() : m_c('a') { }
private:
char m_c;
};
int main()
{
C c;
A *pa = &c;
B *pb = &c;
const int x = (pa == &c) ? 1 : 2;
const int y = (pb == &c) ? 3 : 4;
const int z = (reinterpret_cast<char*>(pa) == reinterpret_cast<char*>(pb)) ? 5 : 6;
std::cout << x << y << z << std::endl;
return 0;
}
Output :
136
Can anyone explain its output? I thought a base pointer points to the base part of the object, so it's not the real address of the object.
Thanks.
pa points to A subobject of c. pb points to B subobject of c. Obviously, they point to different locations in memory (so 6 in the output).
But when they are compared to &c, &c is again converted to A* and B* respectively, thus pointing to the same A and B subobjects.
Here's for illustration the likely layout of c in memory:
+------------------------+-------------+-------------------+
|      A subobject       | B subobject |  Remainder of C   |
+------------------------+-------------+-------------------+
^ &c is here             ^ pb points here
^ pa also points here
Background
An object of type C looks something like this in memory:
----------- <----- Start of the object
|    A    |
|---------| <----- Beginning of B implementation
|    B    |
|---------|
|    C    |
|_________| <----- End of the object
When you take a pointer to a base class from a derived class (e.g. A* pa = &c), the pointer points to the beginning of that base class's part of the object.
So A* will point to the beginning of A (which happens to be the beginning of the object) and B* will point to the beginning of B. Note that C* will not point to the beginning of the C-specific part: because C is derived from A and B, it points to the beginning of the whole object.
Why?
Because when you access a member through pb (e.g. pb->someFunction()), the compiler takes the address in pb and adds a precalculated offset relative to the start of B. If pb were pointing to the beginning of A, it would end up inside A. The precalculated offset is necessary because you have no idea what pb actually points to (is it a C, a "D", or just a plain old B?). This approach allows the compiler to always rely on the same offset to find the member.
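The fixed-offset argument can be illustrated with a hypothetical readD function and public-member versions of the classes (an assumption made so that the member can be accessed directly). readD is compiled once and reads m_d at the same offset from whatever B * it receives, which only works because a B * always points at the start of a B subobject.

```cpp
// Public-member versions of the question's classes (assumption for illustration).
struct A { int m_i = 0; };
struct B { double m_d = 0.0; };
struct C : A, B { char m_c = 'a'; };

// Reads m_d at a fixed offset from the start of the B it is given.
double readD(const B* pb) { return pb->m_d; }

double readFromStandaloneB()
{
    B b;
    b.m_d = 1.5;
    return readD(&b);  // pb points to a complete B object
}

double readFromBInsideC()
{
    C c;
    c.m_d = 2.5;
    return readD(&c);  // &c is adjusted to the B subobject before the call
}
```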
Here's what your code is really doing
((A*)pa == (A*)&c) // Obviously true, since we defined it as such above.
((B*)pb == (B*)&c) // Obviously true, since we defined it as such above.
(reinterpret_cast<char*>(pa) == reinterpret_cast<char*>(pb)) // We know pa and pb point to different places in memory. If we cast them both to char*, they will obviously not be equivalent.
An interesting thing to try is
if (pa == pb)
This will give you a compilation error because A* and B* are unrelated pointer types; you need to convert both pointers to a common type first.
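Which common type you pick changes the answer. A sketch with hypothetical helper names and public-member versions of the classes: converting both pointers back to C * undoes each adjustment and yields the same object, while converting both to void * compares the adjusted raw addresses. The static_cast<C*> downcast is only valid because pa and pb are known to point into a C.

```cpp
struct A { int m_i = 0; };
struct B { double m_d = 0.0; };
struct C : A, B { char m_c = 'a'; };

// Converting back to the derived type undoes each adjustment.
bool sameCompleteObject(A* pa, B* pb)
{
    return static_cast<C*>(pa) == static_cast<C*>(pb);
}

// Converting to void* keeps each pointer's adjusted address.
bool sameAddress(A* pa, B* pb)
{
    return static_cast<void*>(pa) == static_cast<void*>(pb);
}
```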