C++: Pointer contains different address after being passed

So I have some code like this:
#include <cstdlib>   // for system()
#include <iostream>
using namespace std;
class Base1 {};
class Base2 {};
class A
{
public:
    A() {}
    void foo(Base2* ptr) { cout << "This is A. B is at the address " << ptr << endl; }
};
A *global_a;
class B : public Base1, public Base2
{
public:
    B() {}
    void bar()
    {
        cout << "This is B. I am at the address " << this << endl;
        global_a->foo(this);   // implicit conversion from B* to Base2*
    }
};
int main()
{
    global_a = new A();
    B *b = new B();
    b->bar();
    system("pause");   // Windows-only: waits for a key press
    return 0;
}
But this is the output I get after compiling with Visual Studio 2013:
This is B. I am at the address 0045F0B8
This is A. B is at the address 0045F0B9
Press any key to continue . . .
Can somebody please explain why the addresses are different?

0x0045F0B8 is the address of the complete B object. 0x0045F0B9 is the address of the Base2 subobject of the B object.
In general the address of the complete object might not be the same as the address of a base class subobject. In your case the B object is probably laid out as follows:
+---+-------+
|   | Base1 | <-- 0x0045F0B8
| B |-------+
|   | Base2 | <-- 0x0045F0B9
+---+-------+
Each base class occupies one byte and Base1 is laid out before Base2. The pointer to the complete B points to the beginning, which is at 0x0045F0B8, but the pointer to the Base2 points to the address inside the complete B object at which the Base2 subobject starts, which is 0x0045F0B9.
However, when I compile your program on my system using g++ 4.8, I get the same address printed in both lines. This is presumably because the implementation is allowed to allocate no space at all for empty base classes (the so-called empty base class optimization), so the two base class subobjects Base1 and Base2 are both located at the very beginning of the B object, taking no space and sharing their address with B.

B derives from Base1 and Base2, so it consists of all the data that Base1 and Base2 contain, and all of the data that B adds on top of them.
B::bar() is passing a pointer to the Base2 portion of itself to A::foo(), not the B portion of itself. B::bar() is printing the root address of the B portion, whereas A::foo() is printing the root address of the Base2 portion instead. You are passing around the same object, but they are different addresses within that object.
If B does not add any new data, its base address might be the same as the root address of its nearest ancestor (due to empty base optimization).
Don't rely on that. A compiler might, for instance, add padding between the classes.
Always treat the various sections as independent (because they logically are). Just because B derives from Base2 does not guarantee that a B* pointer and a Base2* pointer, both pointing at the same object, will point at the same memory address.
If you have a Base2* pointer and need to access its B data, use dynamic_cast (or static_cast if you know for sure the object is a B) to obtain a proper B* pointer. Note that dynamic_cast requires Base2 to be polymorphic, i.e. to have at least one virtual function. You can upcast from B* to Base2* without casting (which is why B::bar() is able to pass this, a B*, to A::foo() when it is expecting a Base2* as input). Given a B* pointer, you can always access its Base2 data directly.

Related

Why doesn't a pointer to derived and a pointer to base point to the same address if abstract classes are involved?

I came across a problem which can be reduced to the following example:
#include <iostream>
struct A {
char a;
};
struct B : A {
virtual void f() = 0;
};
struct C : B {
void f() override {}
};
void f(A* fpa) {
std::cout << fpa << '\n' << reinterpret_cast<C*>(fpa) << std::endl;
}
int main() {
C c {};
A* pa {&c};
std::cout << &c << '\n' << pa << '\n' << reinterpret_cast<C*>(pa) << std::endl;
f(&c);
}
Neither pa nor fpa keep pointing to the address of c, although both are being initialized with &c. All addresses printed after that of &c directly are offset by +8 (tested with g++ and clang++). Removing either A::a or B::f() and C::f() or initializing pa and fpa with reinterpret_cast<A*>(&c) instead of just &c fixes the addresses.
But why do I have to do that? Shouldn't any A* be able to hold the address to any A, B, or C in this case since all inheritance is public? Why does the value change implicitly? And are there warning flags I can pass to g++ or clang++ that warn about this kind of behavior?
or initializing pa and fpa with reinterpret_cast<A*>(&c) instead of just &c fixes the addresses.
That doesn't "fix" the address. That breaks the address. It yields an invalid pointer.
But why do I have to do that?
You don't have to do that. The offset address is the correct address of the base sub object.
Why doesn't a pointer to derived and a pointer to base point to the same address if abstract classes are involved?
Because there is something stored in the object before the base sub object.
Shouldn't any A* be able to hold the address to any A, B, or C
No. The address of a valid pointer to A is always the address of an A object. If the dynamic type is derived, then that A object is a base sub object. The base can be stored at an offset from the beginning of the derived class.
since all inheritance is public
Accessibility of the inheritance is irrelevant in this regard.
And are there warning flags I can pass to g++ or clang++ that warn about this kind of behavior?
I highly doubt that there would be. I also don't see why you'd want a warning in such case.

Subclass address equal to virtual base class address?

We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class. Multiple inheritance makes that untrue.
Does virtual inheritance also make that untrue? In other words, is the following code correct:
struct A {};
struct B : virtual A
{
int i;
};
int main()
{
A* a = new B; // implicit upcast
B* b = reinterpret_cast<B*>(a); // fishy?
b->i = 0;
return 0;
}
We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class.
I think the claim is not true. In the below code, we have a simple (not virtual) single (non multiple) inheritance, but the addresses are different.
#include <iostream>
class A
{
public:
int getX()
{
return 0;
}
};
class B : public A
{
public:
virtual int getY()
{
return 0;
}
};
int main()
{
B b;
B* pB = &b;
A* pA = static_cast<A*>(pB);
std::cout << "The address of pA is: " << pA << std::endl;
std::cout << "The address of pB is: " << pB << std::endl;
return 0;
}
and the output for VS2015 is:
The address of pA is: 006FF8F0
The address of pB is: 006FF8EC
Does virtual inheritance also make that untrue?
If you change the inheritance in the above code to virtual, the result will be the same. So, even in the case of virtual inheritance, the addresses of base and derived objects can be different.
The result of reinterpret_cast<B*>(a); is only guaranteed to point to the enclosing B object of a if the a subobject and the enclosing B object are pointer-interconvertible, see [expr.static.cast]/3 of the C++17 standard.
The derived class object is pointer-interconvertible with the base class object only if the derived object is standard-layout, does not have direct non-static data members and the base class object is its first base class subobject. [basic.compound]/4.3
Having a virtual base class disqualifies a class from being standard-layout. [class]/7.2.
Therefore, because B has a virtual base class and a non-static data member, b will not point to the enclosing B object, but instead b's pointer value will remain unchanged from a's.
Accessing the i member as if it was pointing to the B object then has undefined behavior.
Any other guarantees would come from your specific ABI or other specification.
Multiple inheritance makes that untrue.
That is not entirely correct. Consider this example:
struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};
When creating an instance of D, B and C are instantiated, each with its own instance of A. However, there would be no problem if the instance of D had the same address as its instance of B and that B's instance of A. Although not required, this is exactly what happens when compiling with clang 11 and gcc 10:
D: 0x7fffe08b4758 // address of instance of D
B: 0x7fffe08b4758 and A: 0x7fffe08b4758 // same address for B and A
C: 0x7fffe08b4760 and A: 0x7fffe08b4760 // other address for C and A
Does virtual inheritance also make that untrue?
Let's consider a modified version of the above example:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Virtual inheritance is typically used to avoid duplicate base subobjects (and the ambiguous member access they cause). With virtual inheritance, the B and C subobjects share a single, common A subobject. When instantiating D, we get the following addresses:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // again, address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // A has the same address as before (common instance)
Is the following code correct?
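There is no reason to use reinterpret_cast here; moreover, it results in undefined behavior. Use static_cast instead:
A* pA = static_cast<A*>(pB);
The two casts behave differently in this example. The reinterpret_cast will reinterpret pB as a pointer to A, but the correct A* may point to a different address, as in the above example (C vs A). The pointer will be upcast correctly if you use static_cast. Note that the direction in the question, from A* back to B*, cannot use static_cast at all, because A is a virtual base of B; that downcast requires dynamic_cast, which in turn requires A to be polymorphic.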
The reason a and b are different in your case is that A has no virtual methods, so an A object carries no vtable pointer. On the other hand, B does have a vtable pointer.
When you upcast to A, the compiler skips over B's vtable pointer to reach the A subobject, hence the difference in addresses. You should not reinterpret_cast back to B; it wouldn't work.
To verify my claim, try adding a virtual method, say virtual void foo() {}, to class A. Now A will also have a vtable pointer and typically starts at the same address as B, so the downcast (reinterpret_cast) to B happens to give you back the original b, although the standard still does not guarantee this.

What is the underlying mechanism of a base class pointer assigned to derived class

Similar questions I found were more about what this does. I understand that when a derived class object is assigned to a base class pointer, e.g. Base* obj = new Derived(), the right side gets upcast to a Base* type, but I would like to understand the mechanism for how this happens and how it allows virtual functions to call derived class methods. From searching online, someone equated the above code to Base* obj = new (Base*)Derived, which is what led to this confusion. If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions (the functions of the Derived class)? Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj? Thanks, and apologies for the simplicity of the question. I'd like to understand what causes this behavior.
Note: for clarity, in my example, Derived publicly inherits from Base.
In a strict sense, the answer to "how does inheritance work at runtime?" is "however the compiler-writer designed it". I.e., the language specification only describes the behavior to be achieved, not the mechanism to achieve it.
In that light, the following should be seen as analogy. Compilers will do something analogous to the following:
Given a class Base:
class Base
{
int a;
int b;
public:
Base()
: a(5),
b(3)
{ }
virtual void foo() {}
virtual void bar() {}
};
The compiler will define two structures: one we'll call the "storage layout" -- this defines the relative locations of member variables and other book-keeping info for an object of the class; the second structure is the "virtual dispatch table" (or vtable). This is a structure of pointers to the implementations of the virtual methods for the class.
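For an object of type Base, the storage layout holds the vtable pointer followed by a and b, and the vtable holds pointers to Base::foo() and Base::bar(), in that order.
Now let's look at the equivalent structure for a derived class, Derived: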
class Derived : public Base
{
int c;
public:
Derived()
: Base(),
c(4)
{ }
virtual void bar() //Override
{
c = a*5 + b*3;
}
};
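For an object of type Derived, we have a similar structure: the storage layout holds the vtable pointer, then a and b, then c; its vtable holds Base::foo() in the first slot and Derived::bar() in the second.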
The important observation is that the in-memory representation of both the member-variable storage and the vtable entries, for members a and b, and methods foo and bar, are identical between the base class and subclass. So a pointer of type Base * that happens to point to an object of type Derived will still implement an access to the variable a as a reference to the first storage offset after the vtable pointer. Likewise, calling ptr->bar() passes control to the method in the second slot of the vtable. If the object is of type Base, this is Base::bar(); if the object is of type Derived, this is Derived::bar().
In this analogy, the this pointer points to the member storage block. Hence, the implementation of Derived::bar() can access the member variable c by fetching the 3rd storage slot after the vtable pointer, relative to this. Note that this storage slot exists whenever Derived::bar() sits in the second vtable slot...i.e., when the object really is of type Derived.
A brief aside on the debugging insanity that can arise from corrupting the vtable pointer for compilers that use a literal vtable pointer at offset 0 from this:
#include <iostream>
class A
{
public:
virtual void foo()
{
std::cout << "A::foo()" << std::endl;
}
};
class B
{
public:
virtual void bar()
{
std::cout << "B::bar()" << std::endl;
}
};
int main(int argc, char *argv[])
{
A *a = new A();
B *b = new B();
std::cout << "A: ";
a->foo();
std::cout << "B: ";
b->bar();
//Frankenobject
*((void **)a) = *((void **)b); //Overwrite a's vtable ptr with b's.
std::cout << "Franken-AB: ";
a->foo();
}
Yields:
$ ./a.out
A: A::foo()
B: B::bar()
Franken-AB: B::bar()
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
...note the lack of an inheritance relationship between A and B... :scream:
Whoever says
Base* obj = new Derived();
is equivalent to
Base* obj = new (Base*)Derived;
is ignorant of the subject matter.
It's more like:
Derived* temp = new Derived;
Base* obj = temp;
The explicit cast is not necessary. The language permits a derived class pointer to be assigned to a base class pointer.
Most of the time the numerical values of the two pointers are the same, but they can differ, for example when multiple inheritance or virtual inheritance is involved.
It's the compiler's responsibility to make sure that the numerical value of the pointer is offset properly when converting a derived class pointer to a base class pointer. The compiler is able to do that since it makes the decision about the layout of the derived class and the base class sub-objects in the derived class object.
If this type-casting is going on at compile-time, why and how can virtual functions access the correct functions
There is no type casting. There is a type conversion. Regarding the virtual functions, please see How are virtual functions and vtable implemented?.
Further, if this casting happens in the way I read it, why do we get errors when we assign a non-inheriting class to Base* obj?
This is moot since it does not happen the way you thought it did.

Why are pA,pB,pC not equal?

Consider the following program
#include<iostream>
using namespace std;
class ClassA
{
public:
virtual ~ClassA(){};
virtual void FunctionA(){};
};
class ClassB
{
public:
virtual void FunctionB(){};
};
class ClassC : public ClassA,public ClassB
{
};
int main()
{
ClassC aObject;
ClassA* pA = &aObject;
ClassB* pB = &aObject;
ClassC* pC = &aObject;
cout<<"pA = "<<pA<<endl;
cout<<"pB = "<<pB<<endl;
cout<<"pC = "<<pC<<endl;
}
I expected pA, pB, and pC to be equal, but the result is
pA = 0031FD90
pB = 0031FD94
pC = 0031FD90
Why is pB = pA + 4?
And when I change
class ClassA
{
public:
virtual ~ClassA(){};
virtual void FunctionA(){};
};
class ClassB
{
public:
virtual void FunctionB(){};
};
to
class ClassA
{
};
class ClassB
{
};
the result is
pA = 0030FAA3
pB = 0030FAA4
pC = 0030FAA3
Now pB = pA + 1?
The multiply inherited object has two merged sub-objects. I would guess the compiler is pointing one of the pointers to an internal object.
C has two inherited subobjects, so it is the concatenation of an A object and a B object. When you have an object C, it is composed of an object A followed by an object B. They're not located at the same address; that's why. All three pointers point to the same object, but as different superclasses. The compiler makes the shift for you, so you don't have to worry about that.
Now, why is there a difference of 4 in one case and 1 in the other? In the first case, you have virtual functions in both A and B, therefore each subobject has to have a pointer to its vtable (the table containing the addresses of the resolved virtual function calls). So in this case, sizeof(A) is 4 (the size of a pointer on this 32-bit build). In the second case, you have no virtual functions, so no vtable. But each subobject must be addressable independently, so the compiler still allocates different addresses for the subobject of class A and the subobject of class B. The minimum difference between two addresses is 1. But I wonder whether EBO (empty base-class optimization) should have kicked in in this case.
That's an implementation detail of the compiler.
You hit this case because you have multiple inheritance (MI) in your code.
Think about how the compiled code accesses a member of ClassB: it uses a fixed offset from the pointer. Say ClassB has two int members; the compiler would access the second one with something like the following:
*((int*)pb + 1) // roughly what the compiler-generated code does
But if pb pointed to the start of aObject, this would no longer work, so the compiler would need to generate several versions of the code to access the same member depending on the inheritance structure, with a run-time cost.
That's why the compiler adjusts pb so that it is not equal to pa: it makes the code above work, and it is the simplest and most efficient way to implement it.
And that also explains why pa == pc but pa != pb.

About dynamic cast and address of base and derived objects

I was reading through some of Effective C++ and I realized I may be incorrect in my thinking along the way.
#include <iostream>
using namespace std;
class A
{
public:
void laka()
{
const void * raw = dynamic_cast<const void*>(this);
cout << raw << endl;
}
virtual ~A() = 0;
};
A::~A() {}
class B : public A
{
public:
void ditka() {}
};
int main()
{
B b;
cout << &b << endl;
b.laka();
return 0;
}
The book stated that by using dynamic_cast with void*, I would get the starting address of an object; however, all of the addresses output are the same.
When I just output the address of the plain old &b above, is the address displayed the starting address of the derived object or the base object within b?
If I was wrong about #1, how would I get the starting addresses of each subobject within b? Do I have to apply the offsets manually, and how does dynamic_cast work with this? Or could someone just clarify what the author meant?
Most implementations of inheritance put the first base class subobject at the beginning of the derived class, so you really need two base classes, both with data members, to be able to see this. Consider:
#include <iostream>
struct B1 {
int x;
virtual ~B1() { }
};
struct B2 {
int y;
virtual ~B2() { }
};
struct D : B1, B2 { };
int main() {
D x;
B1* b1_ptr = &x;
B2* b2_ptr = &x;
std::cout << "original address: " << &x << "\n";
std::cout << "b1_ptr: " << b1_ptr << "\n";
std::cout << "dynamic_cast b1_ptr: " << dynamic_cast<void*>(b1_ptr) << "\n";
std::cout << "b2_ptr: " << b2_ptr << "\n";
std::cout << "dynamic_cast b2_ptr: " << dynamic_cast<void*>(b2_ptr) << "\n";
}
Example output (from my machine; your results will be similar):
original address: 0030FB88
b1_ptr: 0030FB88
dynamic_cast b1_ptr: 0030FB88
b2_ptr: 0030FB90
dynamic_cast b2_ptr: 0030FB88
This tells us that the B1 subobject of D is located at the beginning, so it has the same address as the D object of which it is a subobject.
The B2 subobject is located at a different address, but when you use dynamic_cast<void*> on the pointer to the B2 subobject, it gives you the address of the D object of which it is a subobject.
The book was correct, a dynamic_cast to cv-qualified void* converts the pointer to a pointer to the most derived object pointed to by the pointer that you supply, so you get the starting address of the derived object. Both your output statements should print the same address (assuming there isn't a specific std::ostream and B* overload for operator<<) as b is the most derived object.
There is no reason a base class subobject can't have the same starting address as a derived object, and this is what often happens in many implementations, at least for the first base class subobject in a derived class.
This is all compiler and implementation dependent. In your case a B is an A + something, so it stores the A and then the B-specific members, so the address of &b and the one displayed by your dynamic_cast ought to be the same.
When I just output the address of the plain old &b above, is the address displayed the starting address of the derived object or the base object within b?
You could say "yes", it's the starting address of the base object of class A within b (which is the same as the starting address of the derived object b itself), but the derived object is not really a "separate" object from the base object. The derived object is also not something that will necessarily start at a fixed offset from the base object, especially if it's a non-POD (plain-old-data) class with virtual functions, since the first thing in both the base and derived objects is then typically a pointer to a v-table specific to either the base or derived object. So you can't really "slice" a derived object apart into a "base object" and a derived object, other than the fact that for most compilers, the derived object's non-static data members come at an offset after the non-static data members of the base object. But again, arbitrary "slicing" will cause issues with the v-table pointer, and for non-POD classes, private non-static member objects may be allocated in an "optimized" fashion that makes the memory layout between the base and derived objects something that is not exactly a clean "slice".