How dose base pointer address in derived class? - c++

As I know that we could use base class pointer to address and call virtual functions in derived classes, because base class pointer has a more limited pointer scope. But I just want to know, how does base class pointer know where to start in derived class?
For example, for the record A, B and C all HAS there own data members, we can discuss this issue in two different categories 1. A, B & C all have their virtual functions; 2. A, B & C only have their own data members without any virtual functions
class A {...};
class B {...};
class C : public A, public B {...};
C c;
B* b = &c
Inside C, A should be placed over the top of B, so how does pointer b know where to start addressing in C?

The compiler knows the layout, so it will emit code, which will calculate the address of the B subobject in C.
Let's see a simple actual example:
class A { int x; };
class B { int y; };
class C: public A, public B { };
B *get(C &c) {
return &c;
}
In get, the compiler must calculate where B resides in C. Here's an example of the compiled code (godbolt):
get(C&): # #get(C&)
lea rax, [rdi + 4]
ret
This means that the compiler will add 4 bytes to the address of the input argument (c), and returns that (it adds 4, because sizeof(A) is 4, and the compiler decided to not add any additional padding).
Note, that if A is empty, then it is likely that the address of B is the same as C. And if A is not empty (has non-static data members, or has virtual functions), then B's address will likely differ. But all these are implementation details, depends on the platforms ABI (Application Binary Interface).

Related

Clarification on "this" pointer with multiple inheritance in C++

I am a bit confused as to how the "this" pointer works with base and derived classes. Consider this code:
class A {
//some code
};
class B: public A {
//some code
};
Is it correct to assume that the "this" pointer invoked from class B will be pointing to the same address as a "this" pointer invoked from some code in the base class A?
What about multiple inheritance?
class A {
//some code
};
class B {
//some code
};
class C: public A, public B {
//some code
};
Is it correct to assume that the "this" pointer invoked from class C will be pointing to the same address as a "this" pointer invoked from some code in the base classes A or B?
Is it correct to assume that the "this" pointer invoked from class B will be pointing to the same address as a "this" pointer invoked from some code in the base class A?
No, you cannot assume this. However, although not required, many implementations will use the same address (tested with clang 11 and gcc 10).
With multiple inheritance, in your example, clang 11 and gcc 10 use the same address for an instance of class C and it's base A. The base B will have another address.
Another example is multiple inheritance with a common base instance:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Instantiating D, we get with gcc 10 and clang 11:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // different address for C
There are no rules specifying the relations between addresses of instances of a derived and a base class. This is implementation dependent.

Subclass address equal to virtual base class address?

We all know that when using simple single inheritance, the address of a derived class is the same as the address of the base class. Multiple inheritance makes that untrue.
Does virtual inheritance also make that untrue? In other words, is the following code correct:
struct A {};
struct B : virtual A
{
int i;
};
int main()
{
A* a = new B; // implicit upcast
B* b = reinterpret_cast<B*>(a); // fishy?
b->i = 0;
return 0;
}
We all know that when using simple single inheritance, the address of
a derived class is the same as the address of the base class.
I think the claim is not true. In the below code, we have a simple (not virtual) single (non multiple) inheritance, but the addresses are different.
class A
{
public:
int getX()
{
return 0;
}
};
class B : public A
{
public:
virtual int getY()
{
return 0;
}
};
int main()
{
B b;
B* pB = &b;
A* pA = static_cast<A*>(pB);
std::cout << "The address of pA is: " << pA << std::endl;
std::cout << "The address of pB is: " << pB << std::endl;
return 0;
}
and the output for VS2015 is:
The address of pA is: 006FF8F0
The address of pB is: 006FF8EC
Does virtual inheritance also make that untrue?
If you change the inheritance in the above code into virtual, the result will be the same. so, even in the case of virtual inheritance, the addresses of base and derived objects can be different.
The result of reinterpret_cast<B*>(a); is only guaranteed to point to the enclosing B object of a if the a subobject and the enclosing B object are pointer-interconvertible, see [expr.static.cast]/3 of the C++17 standard.
The derived class object is pointer-interconvertible with the base class object only if the derived object is standard-layout, does not have direct non-static data members and the base class object is its first base class subobject. [basic.compound]/4.3
Having a virtual base class disqualifies a class from being standard-layout. [class]/7.2.
Therefore, because B has a virtual base class and a non-static data member, b will not point to the enclosing B object, but instead b's pointer value will remain unchanged from a's.
Accessing the i member as if it was pointing to the B object then has undefined behavior.
Any other guarantees would come from your specific ABI or other specification.
Multiple inheritance makes that untrue.
That is not entirely correct. Consider this example:
struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};
When creating an instance of D, B and C are instantiated each with their respective instance of A. However, there would be no problem if the instance of D had the same address of its instance of B and its respective instance of A. Although not required, this is exactly what happens when compiling with clang 11 and gcc 10:
D: 0x7fffe08b4758 // address of instance of D
B: 0x7fffe08b4758 and A: 0x7fffe08b4758 // same address for B and A
C: 0x7fffe08b4760 and A: 0x7fffe08b4760 // other address for C and A
Does virtual inheritance also make that untrue
Let's consider a modified version of the above example:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Using the virtual function specifier is typically used to avoid ambiguous function calls. Therefore, when using virtual inheritance, both B and C instances must create a common A instance. When instantiating D, we get the following addresses:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // again, address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // A has the same address as before (common instance)
Is the following code correct
There is no reason here to use reinterpret_cast, even more, it results in undefined behavior. Use static_cast instead:
A* pA = static_cast<A*>(pB);
Both casts behave differently in this example. The reinterpret_cast will reinterpret pB as a pointer to A, but the pointer pA may point to a different address, as in the above example (C vs A). The pointer will be upcasted correctly if you use static_cast.
The reason a and b are different in your case is because, since A is not having any virtual method, A is not maintaining a vtable. On the other hand, B does maintain a vtable.
When you upcast to A, the compiler is smart enough to skip the vtable meant for B. And hence the difference in addresses. You should not reinterpret_cast back to B, it wouldn't work.
To verify my claim, try adding a virtual method, say virtual void foo() {} in class A. Now A will also maintain a vtable. Thus downcast(reinterpret_cast) to B will give you back the original b.

Itanium C++ ABI primary virtual bases

I was reading here about how primary bases are chosen:
"...2. If C is a dynamic class type:
a. Identify all virtual base classes, direct or indirect, that are primary base classes for some other direct or indirect base class. Call these indirect primary base classes.
b. If C has a dynamic base class, attempt to choose a primary base class B. It is the first (in direct base class order) non-virtual dynamic base class, if one exists. Otherwise, it is a nearly empty virtual base class, the first one in (preorder) inheritance graph order which is not an indirect primary base class if any exist, or just the first one if they are all indirect primaries..."
And after there is this correction:
"Case (2b) above is now considered to be an error in the design. The use of the first indirect primary base class as the derived class' primary base does not save any space in the object, and will cause some duplication of virtual function pointers in the additional copy of the base classes virtual table.
The benefit is that using the derived class virtual pointer as the base class virtual pointer will often save a load, and no adjustment to the this pointer will be required for calls to its virtual functions.
It was thought that 2b would allow the compiler to avoid adjusting this in some cases, but this was incorrect, as the virtual function call algorithm requires that the function be looked up through a pointer to a class that defines the function, not one that just inherits it. Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to. For instance, consider this example:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
When B and C are declared, A is a primary base in each case, so although vcall offsets are allocated in the A-in-B and A-in-C vtables, no this adjustment is required and no thunk is generated. However, inside D objects, A is no longer a primary base of C, so if we allowed calls to C::f() to use the copy of A's vtable in the C subobject, we would need to adjust this from C* to B::A*, which would require a third-party thunk. Since we require that a call to C::f() first convert to A*, C-in-D's copy of A's vtable is never referenced, so this is not necessary."
Could you please explain with an example what this refers to: "Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to"?
Also, what are third-party thunks?
I do not understand either what the quoted example tries to show.
A is a nearly empty class, one that contains only a vptr and no visible data members:
struct A { virtual void f(); };
The layout of A is:
A_vtable *vptr
B has a single nearly empty base class used as a "primary":
struct B : virtual public A { int i; };
It means that the layout of B begins with the layout of an A, so that a pointer to a B is a pointer to an A (in assembly language). Layout of B subobject:
B_vtable *A_vptr
int i
A_vptr will point to a B vtable obviously, which is binary compatible with A vtable.
The B_vtable extends the A_vtable, adding all necessary information to navigate to the virtual base class A.
Layout of B complete object:
A base_subobject
int i
And same for C:
C_vtable *A_vptr
int j
Layout of C complete object:
A base_subobject
int j
In D obviously there is only an A subobject, so the layout of a complete object is:
A base_subobject
int i
not(A) not(base_subobject) aka (C::A)_vptr
int j
not(A) is the representation of an A nearly empty base class, that is, a vptr for A, but not a true A subobject: it looks like an A but the visible A is two words above. It's a ghost A!
(C::A)_vptr is the vptr to vtable with layout vtable for C (so also one with layout vtable for A), but for a C subobject where A is not finally a primary base: the C subobject has lost the privilege to host the A base class. So obviously the virtual calls through (C::A)_vptr to virtual functions defined A (there is only one: A::f()) need a this ajustement, with a thunk "C::A::f()" that receives a pointer to not(base_subobject) and adjusts it to the real base_subobject of type A that is the (two words above in the example). (Or if there is an overrider in D, so the D object that is at the exact same address, two words above in the example.)
So given these definitions:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
should the use of a ghost lvalue of a non existant A primary base work?
D d;
C *volatile cp = &d;
A *volatile ghost_ap = reinterpret_cast<A*> (cp);
ghost_ap->f(); // use the vptr of C::A: safe?
(volatile used here to avoid propagation of type knowledge by the compiler)
If for lvalues of C, for a virtual functions that is inherited from A, the call is done via the the C vptr that is also a C::A vptr because A is a "static" primary base of C, then the code should work, because a thunk has been generated that goes from C to D.
In practice it doesn't seem to work with GCC, but if you add an overrider in C:
struct C : virtual public A {
int j;
virtual void f()
{
std::cout << "C:f() \n";
}
};
it works because such concrete function is in the vtable of vtable of C::A.
Even with just a pure virtual overrider:
struct C : virtual public A {
int j;
virtual void f() = 0;
};
and a concrete overrider in D, it also works: the pure virtual override is enough to have the proper entry in the vtable of C::A.
Test code: http://codepad.org/AzmN2Xeh

How to align pointers when dealing with multiple-inheritance?

Say we have a concrete class A, and an abstract class B.
Consider a concrete C, that inherits from both A and B, and implements B:
class C : public A, public B
{
/* implementation of B and specific stuff that belongs to C */
};
Now I define a function which signature is void foo(B* b);
This is my code, I can assume that every pointers to B are both A and B.
In foo's definition, how to get a pointer to A?
A nasty but working trick is to align back pointers like so:
void foo(B* b)
{
A* a = reinterpret_cast<A*>(reinterpret_cast<char*>(b) - sizeof(A));
// now I can use all the stuff from A
}
Keep in mind that C does not have a super type and actually, there are many classes akin to C which only are A and B. Feel free to question both my logic and this sample of design as well but the question is only concerning pointers alignment.
void foo(B* b)
{
//A* a = reinterpret_cast<A*>(reinterpret_cast<char*>(b) - sizeof(A)); // undefined behaviour!!!!
A* a = dynamic_cast<A*>(b);
if (a)
{
// now I can use all the stuff from A
}
else
{
// that was something else, not descended from A
}
}
Forgot to say: in order to make work dynamic cast both A and B should have virtual function(s) or at least virtual destructors. Otherwise there is no legal way to do that type conversion.
Having a huge set of unrelated classes that both derive from A and B is a very strange design. If there's something that makes A and B always be "used together" you could either merge them or introduce a shim class that only derives from them and then only derive from that class:
class Shim : A, B {};
class DerivedX : Shim {};
and in the latter case you just use static_cast to first downcast from A or B to Shim* and then C++ it will implicitly convert the Shim* pointer to the other class.
If you want to use the functionality of both class A and Class B in your function then you should modify the function to receive C pointers:
void foo(C* c);
And in general you are wrong with you assumption that "every B is an A as well". You could create classes derived from you B interface and not derived from Class A, that's why the compiler won't know that in your specific case "every B is an A".
Expanding on sharptooth's answer (and entering it as an answer, because I can't get formatted code into a comment), you can still use the shim:
class Shim : public virtual A, public virtual B {};
Then:
class Derived1 : public Shim, public virtual A, public virtual B1
{
};
class Derived2 : public Shim, public virtual A, public virtual B2
{
};
B1 and B2 must derive virtually from B.
But I suspect that if you always need to implement both A and
B, you should create a single interface with both, either by
inheriting, or coalising both into a single class; your B1 and
B2 would inherit from that. (The solution with
dynamic_cast, of course, is for the case where the derived
class of B may or may not also derived from A.)

What's the difference between a derived object and a base object in c++?

What's the difference between a derived object and a base object in c++,
especially, when there is a virtual function in the class.
Does the derived object maintain additional tables to hold the pointers
to functions?
The derived object inherits all the data and member functions of the base class. Depending on the nature of the inheritance (public, private or protected), this will affect the visibility of these data and member functions to clients (users) of your class.
Say, you inherited B from A privately, like this:
class A
{
public:
void MyPublicFunction();
};
class B : private A
{
public:
void MyOtherPublicFunction();
};
Even though A has a public function, it won't be visible to users of B, so for example:
B* pB = new B();
pB->MyPublicFunction(); // This will not compile
pB->MyOtherPublicFunction(); // This is OK
Because of the private inheritance, all data and member functions of A, although available to the B class within the B class, will not be available to code that simply uses an instance of a B class.
If you used public inheritance, i.e.:
class B : public A
{
...
};
then all of A's data and members will be visible to users of the B class. This access is still restricted by A's original access modifiers, i.e. a private function in A will never be accessible to users of B (or, come to that, code for the B class itself). Also, B may redeclare functions of the same name as those in A, thus 'hiding' these functions from users of the B class.
As for virtual functions, that depends on whether A has virtual functions or not.
For example:
class A
{
public:
int MyFn() { return 42; }
};
class B : public A
{
public:
virtual int MyFn() { return 13; }
};
If you try to call MyFn() on a B object through a pointer of type A*, then the virtual function will not be called.
For example:
A* pB = new B();
pB->MyFn(); // Will return 42, because A::MyFn() is called.
but let's say we change A to this:
class A
{
public:
virtual void MyFn() { return 42; }
};
(Notice A now declares MyFn() as virtual)
then this results:
A* pB = new B();
pB->MyFn(); // Will return 13, because B::MyFn() is called.
Here, the B version of MyFn() is called because the class A has declared MyFn() as virtual, so the compiler knows that it must look up the function pointer in the object when calling MyFn() on an A object. Or an object it thinks it is an A, as in this case, even though we've created a B object.
So to your final question, where are the virtual functions stored?
This is compiler/system dependent, but the most common method used is that for an instance of a class that has any virtual functions (whether declared directly, or inherited from a base class), the first piece of data in such an object is a 'special' pointer. This special pointer points to a 'virtual function pointer table', or commonly shortened to 'vtable'.
The compiler creates vtables for every class it compiles that has virtual functions. So for our last example, the compiler will generate two vtables - one for class A and one for class B. There are single instances of these tables - the constructor for an object will set up the vtable-pointer in each newly created object to point to the correct vtable block.
Remember that the first piece of data in an object with virtual functions is this pointer to the vtable, so the compiler always knows how to find the vtable, given an object that needs to call a virtual function. All the compiler has to do is look at the first memory slot in any given object, and it has a pointer to the correct vtable for that object's class.
Our case is very simple - each vtable is one entry long, so they look like this:
vtable for A class:
+---------+--------------+
| 0: MyFn | -> A::MyFn() |
+---------+--------------+
vtable for B class:
+---------+--------------+
| 0: MyFn | -> B::MyFn() |
+---------+--------------+
Notice that for the vtable for the B class, the entry for MyFn has been overwritten with a pointer to B::MyFn() - this ensures that when we call the virtual function MyFn() even on an object pointer of type A*, the B version of MyFn() is correctly called, instead of the A::MyFn().
The '0' number is indicating the entry position in the table. In this simple case, we only have one entry in each vtable, so each entry is at index 0.
So, to call MyFn() on an object (either of type A or B), the compiler will generate some code like this:
pB->__vtable[0]();
(NB. this won't compile; it's just an explanation of the code the compiler will generate.)
To make it more obvious, let's say A declares another function, MyAFn(), which is virtual, which B does not over-ride/re-implement.
So the code would be:
class A
{
public:
virtual void MyAFn() { return 17; }
virtual void MyFn() { return 42; }
};
class B : public A
{
public:
virtual void MyFn() { return 13; }
};
then B will have the functions MyAFn() and MyFn() in its interface, and the vtables will now look like this:
vtable for A class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> A::MyFn() |
+----------+---------------+
vtable for B class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> B::MyFn() |
+----------+---------------+
So in this case, to call MyFn(), the compiler will generate code like this:
pB->__vtable[1]();
Because MyFn() is second in the table (and so at index 1).
Obviously, calling MyAFn() will cause code like this:
pB->__vtable[0]();
because MyAFn() is at index 0.
It should be emphasised that this is compiler-dependent, and iirc, the compiler is under no obligation to order the functions in the vtable in the order they are declared - it's just up to the compiler to make it all work under the hood.
In practice, this scheme is widely used, and function ordering in vtables is fairly deterministic, so ABI between code generated by different C++ compilers is maintained, and allows COM interoperation and similar mechanisms to work across boundaries of code generated by different compilers. This is in no way guaranteed.
Luckily, you'll never have to worry much about vtables, but it's definitely useful to get your mental model of what is going on to make sense and not store up any surprises for you in the future.
More theoretically, if you derive one class from another, you have a base class and a derived class. If you create an object of a derived class, you have a derived object. In C++, you can inherit from the same class multiple times. Consider:
struct A { };
struct B : A { };
struct C : A { };
struct D : B, C { };
D d;
In the d object, you have two A objects within each D objects, which are called "base-class sub-objects". If you try to convert D to A, then the compiler will tell you the conversion is ambiguous, because it doesn't know to which A object you want to convert:
A &a = d; // error: A object in B or A object in C?
Same goes if you name a non-static member of A: The compiler will tell you about an ambiguity. You can circumvent it in this case by converting to B or C first:
A &a = static_cast<B&>(d); // A object in B
The object d is called the "most derived object", because it's not a sub-object of another object of class type. To avoid the ambiguity above, you can inherit virtually
struct A { };
struct B : virtual A { };
struct C : virtual A { };
struct D : B, C { };
Now, there is only one subobject of type A, even though you have two subobject that this one object is contained in: subobject B and sub-object C. Converting a D object to A is now non-ambiguous, because conversion over both the B and the C path will yield the same A sub-object.
Here comes a complication of the above: Theoretically, even without looking at any implementation technique, either or both of the B and C sub-objects are now not contiguous anymore. Both contain the same A object, but both doesn't contain each other either. This means that one or both of those must be "split up" and merely reference the A object of the other, so that both B and C objects can have different addresses. In linear memory, this may look like (let's assume all objecs have size of 1 byte)
C: [1 byte [A: refer to 0xABC [B: 1byte [A: one byte at 0xABC]]]]
[CCCCCCC[ [BBBBBBBBBBCBCBCBCBCBCBCBCBCBCB]]]]
CB is what both the C and the B sub-object contains. Now, as you see, the C sub-object would be split up, and there is no way without, because B is not contained in C, and neither the other way around. The compiler, to access some member using code in a function of C, can't just use an offset, because the code in a function of C doesn't know whether it's contained as a sub-object, or - when it's not abstract - whether it's a most derived object and thus has the A object directly next to it.
a public colon. ( I told you C++ was nasty )
class base { }
class derived : public base { }
let's have:
class Base {
virtual void f();
};
class Derived : public Base {
void f();
}
without f being virtual (as implemented in pseudo "c"):
struct {
BaseAttributes;
} Base;
struct {
BaseAttributes;
DerivedAttributes;
} Derived;
with virtual functions:
struct {
vfptr = Base_vfptr,
BaseAttributes;
} Base;
struct {
vfptr = Derived_vfptr,
BaseAttributes;
DerivedAttributes;
} Derived;
struct {
&Base::f
} Base_vfptr
struct {
&Derived::f
} Base_vfptr
For multiple inheritance, things get more complicated :o)
Derived is Base, but Base is not a Derived
base- is the object you are deriving from.
derived - is the object the inherits his father's public (and protected) members.
a derived object can override (or in some cases must override) some of his father's methods, thus creating a different behavior
A base object is one from which others are derived. Typically it'll have some virtual methods (or even pure virtual) that subclasses can override to specialize.
A subclass of a base object is known as a derived object.
The derived object is derived from its base object(s).
Are you asking about the respective objects' representation in memory?
Both the base class and the derived class will have a table of pointers to their virtual functions. Depending on which functions have been overridden, the value of entries in that table will change.
If B adds more virtual functions that aren't in the base class, B's table of virtual methods will be larger (or there may be a separate table, depending on compiler implementation).
What's the difference between a derived object and a base object in c++,
A derived object can be used in place of a base object; it has all the members of the base object, and maybe some more of its own. So, given a function taking a reference (or pointer) to the base class:
void Function(Base &);
You can pass a reference to an instance of the derived class:
class Derived : public Base {};
Derived derived;
Function(derived);
especially, when there is a virtual function in the class.
If the derived class overrides a virtual function, then the overridden function will always be called on objects of that class, even through a reference to the base class.
class Base
{
public:
virtual void Virtual() {cout << "Base::Virtual" << endl;}
void NonVirtual() {cout << "Base::NonVirtual" << endl;}
};
class Derived : public Base
{
public:
virtual void Virtual() {cout << "Derived::Virtual" << endl;}
void NonVirtual() {cout << "Derived::NonVirtual" << endl;}
};
Derived derived;
Base &base = derived;
base.Virtual(); // prints "Derived::Virtual"
base.NonVirtual(); // prints "Base::NonVirtual"
derived.Virtual(); // prints "Derived::Virtual"
derived.NonVirtual();// prints "Derived::NonVirtual"
Does the derived object maintain additional tables to hold the pointers to functions?
Yes - both classes will contain a pointer to a table of virtual functions (known as a "vtable"), so that the correct function can be found at runtime. You can't access this directly, but it does affect the size and layout of the data in memory.