I am a bit confused as to how the "this" pointer works with base and derived classes. Consider this code:
class A {
//some code
};
class B: public A {
//some code
};
Is it correct to assume that the "this" pointer invoked from class B will be pointing to the same address as a "this" pointer invoked from some code in the base class A?
What about multiple inheritance?
class A {
//some code
};
class B {
//some code
};
class C: public A, public B {
//some code
};
Is it correct to assume that the "this" pointer invoked from class C will be pointing to the same address as a "this" pointer invoked from some code in the base classes A or B?
Is it correct to assume that the "this" pointer invoked from class B will be pointing to the same address as a "this" pointer invoked from some code in the base class A?
No, you cannot assume this. However, although not required, many implementations will use the same address (tested with clang 11 and gcc 10).
With multiple inheritance, in your example, clang 11 and gcc 10 use the same address for an instance of class C and it's base A. The base B will have another address.
Another example is multiple inheritance with a common base instance:
struct A {};
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
Instantiating D, we get with gcc 10 and clang 11:
D: 0x7ffc164eefd0
B: 0x7ffc164eefd0 and A: 0x7ffc164eefd0 // address of A and B = address of D
C: 0x7ffc164eefd8 and A: 0x7ffc164eefd0 // different address for C
There are no rules specifying the relations between addresses of instances of a derived and a base class. This is implementation dependent.
I was reading here about how primary bases are chosen:
"...2. If C is a dynamic class type:
a. Identify all virtual base classes, direct or indirect, that are primary base classes for some other direct or indirect base class. Call these indirect primary base classes.
b. If C has a dynamic base class, attempt to choose a primary base class B. It is the first (in direct base class order) non-virtual dynamic base class, if one exists. Otherwise, it is a nearly empty virtual base class, the first one in (preorder) inheritance graph order which is not an indirect primary base class if any exist, or just the first one if they are all indirect primaries..."
And after there is this correction:
"Case (2b) above is now considered to be an error in the design. The use of the first indirect primary base class as the derived class' primary base does not save any space in the object, and will cause some duplication of virtual function pointers in the additional copy of the base classes virtual table.
The benefit is that using the derived class virtual pointer as the base class virtual pointer will often save a load, and no adjustment to the this pointer will be required for calls to its virtual functions.
It was thought that 2b would allow the compiler to avoid adjusting this in some cases, but this was incorrect, as the virtual function call algorithm requires that the function be looked up through a pointer to a class that defines the function, not one that just inherits it. Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to. For instance, consider this example:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
When B and C are declared, A is a primary base in each case, so although vcall offsets are allocated in the A-in-B and A-in-C vtables, no this adjustment is required and no thunk is generated. However, inside D objects, A is no longer a primary base of C, so if we allowed calls to C::f() to use the copy of A's vtable in the C subobject, we would need to adjust this from C* to B::A*, which would require a third-party thunk. Since we require that a call to C::f() first convert to A*, C-in-D's copy of A's vtable is never referenced, so this is not necessary."
Could you please explain with an example what this refers to: "Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to"?
Also, what are third-party thunks?
I do not understand either what the quoted example tries to show.
A is a nearly empty class, one that contains only a vptr and no visible data members:
struct A { virtual void f(); };
The layout of A is:
A_vtable *vptr
B has a single nearly empty base class used as a "primary":
struct B : virtual public A { int i; };
It means that the layout of B begins with the layout of an A, so that a pointer to a B is a pointer to an A (in assembly language). Layout of B subobject:
B_vtable *A_vptr
int i
A_vptr will point to a B vtable obviously, which is binary compatible with A vtable.
The B_vtable extends the A_vtable, adding all necessary information to navigate to the virtual base class A.
Layout of B complete object:
A base_subobject
int i
And same for C:
C_vtable *A_vptr
int j
Layout of C complete object:
A base_subobject
int j
In D obviously there is only an A subobject, so the layout of a complete object is:
A base_subobject
int i
not(A) not(base_subobject) aka (C::A)_vptr
int j
not(A) is the representation of an A nearly empty base class, that is, a vptr for A, but not a true A subobject: it looks like an A but the visible A is two words above. It's a ghost A!
(C::A)_vptr is the vptr to vtable with layout vtable for C (so also one with layout vtable for A), but for a C subobject where A is not finally a primary base: the C subobject has lost the privilege to host the A base class. So obviously the virtual calls through (C::A)_vptr to virtual functions defined A (there is only one: A::f()) need a this ajustement, with a thunk "C::A::f()" that receives a pointer to not(base_subobject) and adjusts it to the real base_subobject of type A that is the (two words above in the example). (Or if there is an overrider in D, so the D object that is at the exact same address, two words above in the example.)
So given these definitions:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
should the use of a ghost lvalue of a non existant A primary base work?
D d;
C *volatile cp = &d;
A *volatile ghost_ap = reinterpret_cast<A*> (cp);
ghost_ap->f(); // use the vptr of C::A: safe?
(volatile used here to avoid propagation of type knowledge by the compiler)
If for lvalues of C, for a virtual functions that is inherited from A, the call is done via the the C vptr that is also a C::A vptr because A is a "static" primary base of C, then the code should work, because a thunk has been generated that goes from C to D.
In practice it doesn't seem to work with GCC, but if you add an overrider in C:
struct C : virtual public A {
int j;
virtual void f()
{
std::cout << "C:f() \n";
}
};
it works because such concrete function is in the vtable of vtable of C::A.
Even with just a pure virtual overrider:
struct C : virtual public A {
int j;
virtual void f() = 0;
};
and a concrete overrider in D, it also works: the pure virtual override is enough to have the proper entry in the vtable of C::A.
Test code: http://codepad.org/AzmN2Xeh
So i have some code like this:
#include <iostream>
using namespace std;
class Base1 {};
class Base2 {};
class A
{
public:
A() {}
void foo(Base2* ptr) { cout << "This is A. B is at the address " << ptr << endl; }
};
A *global_a;
class B : public Base1, public Base2
{
public:
B() {}
void bar()
{
cout << "This is B. I am at the address " << this << endl;
global_a->foo(this);
}
};
int main()
{
global_a = new A();
B *b = new B();
b->bar();
system("pause");
return 0;
}
But this is the output i get after compiling with Visual Studio 2013:
This is B. I am at the address 0045F0B8
This is A. B is at the address 0045F0B9
Press any key to continue . . .
Can somebody please explain why the addresses are different?
0x0045F0B8 is the address of the complete B object. 0x0045F0B9 is the address of the Base2 subobject of the B object.
In general the address of the complete object might not be the same as the address of a base class subobject. In your case the B object is probably laid out as follows:
+---+-------+
| | Base1 | <-- 0x0045F0B8
| B |-------+
| | Base2 | <-- 0x0045F0B9
+---+-------+
Each base class occupies one byte and Base1 is laid out before Base2. The pointer to the complete B points to the beginning, which is at 0x0045F0B8, but the pointer to the Base2 points to the address inside the complete B object at which the Base2 subobject starts, which is 0x0045F0B9.
However when I compile your program on my system using g++ 4.8, I get the same address printed in both lines. This is presumably because the implementation is allowed to allocate no space at all for empty base classes (the so-called empty base class optimization) and the two base class subobjects Base1 and Base2 are both located at the very beginning of the B object, taking no space, and sharing their address with B.
B derives from Base1 and Base2, so it consists of all the data that Base1 and Base2 contain, and all of the data that B adds on top of them.
B::bar() is passing a pointer to the Base2 portion of itself to A::for(), not the B portion of itself. B::bar() is printing the root address of the B portion, whereas A::foo() is printing the root address of the Base2 portion instead. You are passing around the same object, but they are different addresses within that object:
If B does not add any new data, its base address might be the same as the root address of its nearest ancestor (due to empty base optimization):
Don't rely on that. A compiler might add padding between the classes, for instance:
Always treat the various sections as independent (because they logically are). Just because B derives from Base2 does not guarantee that a B* pointer and a Base2* pointer, both pointing at the same object, will point at the same memory address.
If you have a Base2* pointer and need to access its B data, use dynamic_cast (or static_cast if you know for sure the object is a B) to ensure a proper B* pointer. You can downcast from B* to Base2* without casting (which is why B::bar() is able to pass this - a B* - to A::foo() when it is expecting a Base2* as input). Given a B* pointer, you can always access its Base2 data directly.
Consider the following program
#include<iostream>
using namespace std;
class ClassA
{
public:
virtual ~ClassA(){};
virtual void FunctionA(){};
};
class ClassB
{
public:
virtual void FunctionB(){};
};
class ClassC : public ClassA,public ClassB
{
};
void main()
{
ClassC aObject;
ClassA* pA = &aObject;
ClassB* pB = &aObject;
ClassC* pC = &aObject;
cout<<"pA = "<<pA<<endl;
cout<<"pB = "<<pB<<endl;
cout<<"pC = "<<pC<<endl;
}
pA,pB,pC are supposed to equal,but the result is
pA = 0031FD90
pB = 0031FD94
pC = 0031FD90
why pB = pA + 4?
and when i change
class ClassA
{
public:
virtual ~ClassA(){};
virtual void FunctionA(){};
};
class ClassB
{
public:
virtual void FunctionB(){};
};
to
class ClassA
{
};
class ClassB
{
};
the result is
pA = 0030FAA3
pB = 0030FAA4
pC = 0030FAA3
pB = pA + 1?
The multiply inherited object has two merged sub-objects. I would guess the compiler is pointing one of the pointers to an internal object.
C has two inherited subobjects, therefore is the concatenation of a A object and a B object. When you have an object C, it is composed of an object A followed by an object B. They're not located at the same address, that's why. All three pointers point to the same object, but as different superclasses. The compiler makes the shift for you, so you don't have to worry about that.
Now. Why is there a difference of 4 in one case and 1 in another? Well, in the first case, you have virtual functions for both A and B, therefore each subobject has to have a pointer to its vtable (the table containing the addresses of the resolved virtual function calls). So in this case, sizeof(A) is 4. In the second case, you have no virtual functions, so no vtable. But each subobject must be addressable independently, so the compiler still has to allocate for a different addresses for subobject of class A and subobject of class B. The minimum of difference between two addresses is 1. But I wonder if EBO (empty base-class optimization) should not have kicked in in this case.
That's the implementation detail of compiler.
The reason you hit this case is because you have MI in your code.
Think about how the computer access the member in ClassB, it using the offset to access the member. So let's say you have two int in class B, it using following statement to access the second int member.
*((int*)pb + 1) // this actually will be assembly generate by compiler
But if the pb point to the start of the aObject in your class, this will not work anymore, so the compiler need to generate multiple assembly version to access the same member base on the class inherit structure, and have run-time cost.
That's why the compiler adjust the pb not equal as pa, which will make the above code works, it is the most simply and effect way to implement.
And that's also explain why pa == pc but not equals with pb.
What's the difference between a derived object and a base object in c++,
especially, when there is a virtual function in the class.
Does the derived object maintain additional tables to hold the pointers
to functions?
The derived object inherits all the data and member functions of the base class. Depending on the nature of the inheritance (public, private or protected), this will affect the visibility of these data and member functions to clients (users) of your class.
Say, you inherited B from A privately, like this:
class A
{
public:
void MyPublicFunction();
};
class B : private A
{
public:
void MyOtherPublicFunction();
};
Even though A has a public function, it won't be visible to users of B, so for example:
B* pB = new B();
pB->MyPublicFunction(); // This will not compile
pB->MyOtherPublicFunction(); // This is OK
Because of the private inheritance, all data and member functions of A, although available to the B class within the B class, will not be available to code that simply uses an instance of a B class.
If you used public inheritance, i.e.:
class B : public A
{
...
};
then all of A's data and members will be visible to users of the B class. This access is still restricted by A's original access modifiers, i.e. a private function in A will never be accessible to users of B (or, come to that, code for the B class itself). Also, B may redeclare functions of the same name as those in A, thus 'hiding' these functions from users of the B class.
As for virtual functions, that depends on whether A has virtual functions or not.
For example:
class A
{
public:
int MyFn() { return 42; }
};
class B : public A
{
public:
virtual int MyFn() { return 13; }
};
If you try to call MyFn() on a B object through a pointer of type A*, then the virtual function will not be called.
For example:
A* pB = new B();
pB->MyFn(); // Will return 42, because A::MyFn() is called.
but let's say we change A to this:
class A
{
public:
virtual void MyFn() { return 42; }
};
(Notice A now declares MyFn() as virtual)
then this results:
A* pB = new B();
pB->MyFn(); // Will return 13, because B::MyFn() is called.
Here, the B version of MyFn() is called because the class A has declared MyFn() as virtual, so the compiler knows that it must look up the function pointer in the object when calling MyFn() on an A object. Or an object it thinks it is an A, as in this case, even though we've created a B object.
So to your final question, where are the virtual functions stored?
This is compiler/system dependent, but the most common method used is that for an instance of a class that has any virtual functions (whether declared directly, or inherited from a base class), the first piece of data in such an object is a 'special' pointer. This special pointer points to a 'virtual function pointer table', or commonly shortened to 'vtable'.
The compiler creates vtables for every class it compiles that has virtual functions. So for our last example, the compiler will generate two vtables - one for class A and one for class B. There are single instances of these tables - the constructor for an object will set up the vtable-pointer in each newly created object to point to the correct vtable block.
Remember that the first piece of data in an object with virtual functions is this pointer to the vtable, so the compiler always knows how to find the vtable, given an object that needs to call a virtual function. All the compiler has to do is look at the first memory slot in any given object, and it has a pointer to the correct vtable for that object's class.
Our case is very simple - each vtable is one entry long, so they look like this:
vtable for A class:
+---------+--------------+
| 0: MyFn | -> A::MyFn() |
+---------+--------------+
vtable for B class:
+---------+--------------+
| 0: MyFn | -> B::MyFn() |
+---------+--------------+
Notice that for the vtable for the B class, the entry for MyFn has been overwritten with a pointer to B::MyFn() - this ensures that when we call the virtual function MyFn() even on an object pointer of type A*, the B version of MyFn() is correctly called, instead of the A::MyFn().
The '0' number is indicating the entry position in the table. In this simple case, we only have one entry in each vtable, so each entry is at index 0.
So, to call MyFn() on an object (either of type A or B), the compiler will generate some code like this:
pB->__vtable[0]();
(NB. this won't compile; it's just an explanation of the code the compiler will generate.)
To make it more obvious, let's say A declares another function, MyAFn(), which is virtual, which B does not over-ride/re-implement.
So the code would be:
class A
{
public:
virtual void MyAFn() { return 17; }
virtual void MyFn() { return 42; }
};
class B : public A
{
public:
virtual void MyFn() { return 13; }
};
then B will have the functions MyAFn() and MyFn() in its interface, and the vtables will now look like this:
vtable for A class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> A::MyFn() |
+----------+---------------+
vtable for B class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> B::MyFn() |
+----------+---------------+
So in this case, to call MyFn(), the compiler will generate code like this:
pB->__vtable[1]();
Because MyFn() is second in the table (and so at index 1).
Obviously, calling MyAFn() will cause code like this:
pB->__vtable[0]();
because MyAFn() is at index 0.
It should be emphasised that this is compiler-dependent, and iirc, the compiler is under no obligation to order the functions in the vtable in the order they are declared - it's just up to the compiler to make it all work under the hood.
In practice, this scheme is widely used, and function ordering in vtables is fairly deterministic, so ABI between code generated by different C++ compilers is maintained, and allows COM interoperation and similar mechanisms to work across boundaries of code generated by different compilers. This is in no way guaranteed.
Luckily, you'll never have to worry much about vtables, but it's definitely useful to get your mental model of what is going on to make sense and not store up any surprises for you in the future.
More theoretically, if you derive one class from another, you have a base class and a derived class. If you create an object of a derived class, you have a derived object. In C++, you can inherit from the same class multiple times. Consider:
struct A { };
struct B : A { };
struct C : A { };
struct D : B, C { };
D d;
In the d object, you have two A objects within each D objects, which are called "base-class sub-objects". If you try to convert D to A, then the compiler will tell you the conversion is ambiguous, because it doesn't know to which A object you want to convert:
A &a = d; // error: A object in B or A object in C?
Same goes if you name a non-static member of A: The compiler will tell you about an ambiguity. You can circumvent it in this case by converting to B or C first:
A &a = static_cast<B&>(d); // A object in B
The object d is called the "most derived object", because it's not a sub-object of another object of class type. To avoid the ambiguity above, you can inherit virtually
struct A { };
struct B : virtual A { };
struct C : virtual A { };
struct D : B, C { };
Now, there is only one subobject of type A, even though you have two subobject that this one object is contained in: subobject B and sub-object C. Converting a D object to A is now non-ambiguous, because conversion over both the B and the C path will yield the same A sub-object.
Here comes a complication of the above: Theoretically, even without looking at any implementation technique, either or both of the B and C sub-objects are now not contiguous anymore. Both contain the same A object, but both doesn't contain each other either. This means that one or both of those must be "split up" and merely reference the A object of the other, so that both B and C objects can have different addresses. In linear memory, this may look like (let's assume all objecs have size of 1 byte)
C: [1 byte [A: refer to 0xABC [B: 1byte [A: one byte at 0xABC]]]]
[CCCCCCC[ [BBBBBBBBBBCBCBCBCBCBCBCBCBCBCB]]]]
CB is what both the C and the B sub-object contains. Now, as you see, the C sub-object would be split up, and there is no way without, because B is not contained in C, and neither the other way around. The compiler, to access some member using code in a function of C, can't just use an offset, because the code in a function of C doesn't know whether it's contained as a sub-object, or - when it's not abstract - whether it's a most derived object and thus has the A object directly next to it.
a public colon. ( I told you C++ was nasty )
class base { }
class derived : public base { }
let's have:
class Base {
virtual void f();
};
class Derived : public Base {
void f();
}
without f being virtual (as implemented in pseudo "c"):
struct {
BaseAttributes;
} Base;
struct {
BaseAttributes;
DerivedAttributes;
} Derived;
with virtual functions:
struct {
vfptr = Base_vfptr,
BaseAttributes;
} Base;
struct {
vfptr = Derived_vfptr,
BaseAttributes;
DerivedAttributes;
} Derived;
struct {
&Base::f
} Base_vfptr
struct {
&Derived::f
} Base_vfptr
For multiple inheritance, things get more complicated :o)
Derived is Base, but Base is not a Derived
base- is the object you are deriving from.
derived - is the object the inherits his father's public (and protected) members.
a derived object can override (or in some cases must override) some of his father's methods, thus creating a different behavior
A base object is one from which others are derived. Typically it'll have some virtual methods (or even pure virtual) that subclasses can override to specialize.
A subclass of a base object is known as a derived object.
The derived object is derived from its base object(s).
Are you asking about the respective objects' representation in memory?
Both the base class and the derived class will have a table of pointers to their virtual functions. Depending on which functions have been overridden, the value of entries in that table will change.
If B adds more virtual functions that aren't in the base class, B's table of virtual methods will be larger (or there may be a separate table, depending on compiler implementation).
What's the difference between a derived object and a base object in c++,
A derived object can be used in place of a base object; it has all the members of the base object, and maybe some more of its own. So, given a function taking a reference (or pointer) to the base class:
void Function(Base &);
You can pass a reference to an instance of the derived class:
class Derived : public Base {};
Derived derived;
Function(derived);
especially, when there is a virtual function in the class.
If the derived class overrides a virtual function, then the overridden function will always be called on objects of that class, even through a reference to the base class.
class Base
{
public:
virtual void Virtual() {cout << "Base::Virtual" << endl;}
void NonVirtual() {cout << "Base::NonVirtual" << endl;}
};
class Derived : public Base
{
public:
virtual void Virtual() {cout << "Derived::Virtual" << endl;}
void NonVirtual() {cout << "Derived::NonVirtual" << endl;}
};
Derived derived;
Base &base = derived;
base.Virtual(); // prints "Derived::Virtual"
base.NonVirtual(); // prints "Base::NonVirtual"
derived.Virtual(); // prints "Derived::Virtual"
derived.NonVirtual();// prints "Derived::NonVirtual"
Does the derived object maintain additional tables to hold the pointers to functions?
Yes - both classes will contain a pointer to a table of virtual functions (known as a "vtable"), so that the correct function can be found at runtime. You can't access this directly, but it does affect the size and layout of the data in memory.