I am trying to make sense of the statement in book effective c++. Following is the inheritance diagram for multiple inheritance.
Now the book says separate memory in each class is required for vptr. Also it makes following statement
An oddity in the above diagram is that there are only three vptrs even though four classes are involved. Implementations are free to generate four vptrs if they like, but three suffice (it turns out that B and D can share a vptr), and most implementations take advantage of this opportunity to reduce the compiler-generated overhead.
I could not see any reason why there is requirement of separate memory in each class for vptr. I had an understanding that vptr is inherited from base class whatever may be the inheritance type. If we assume that it shown resultant memory structure with inherited vptr how can they make the statement that
B and D can share a vptr
Can somebody please clarify a bit about vptr in multiple inheritance?
Do we need separate vptr in each class ?
Also if above is true why B and D can share vptr ?
Your question is interesting, however I fear that you are aiming too big as a first question, so I will answer in several steps, if you don't mind :)
Disclaimer: I am no compiler writer, and though I have certainly studied the subject, my word should be taken with caution. There will me inaccuracies. And I am not that well versed in RTTI. Also, since this is not standard, what I describe are possibilities.
1. How to implement inheritance ?
Note: I will leave out alignment issues, they just mean that some padding could be included between the blocks
Let's leave it out virtual methods, for now, and concentrate on how inheritance is implemented, down below.
The truth is that inheritance and composition share a lot:
struct B { int t; int u; };
struct C { B b; int v; int w; };
struct D: B { int v; int w; };
Are going to look like:
B:
+-----+-----+
| t | u |
+-----+-----+
C:
+-----+-----+-----+-----+
| B | v | w |
+-----+-----+-----+-----+
D:
+-----+-----+-----+-----+
| B | v | w |
+-----+-----+-----+-----+
Shocking isn't it :) ?
This means, however, than multiple inheritance is quite simple to figure out:
struct A { int r; int s; };
struct M: A, B { int v; int w; };
M:
+-----+-----+-----+-----+-----+-----+
| A | B | v | w |
+-----+-----+-----+-----+-----+-----+
Using these diagrams, let's see what happens when casting a derived pointer to a base pointer:
M* pm = new M();
A* pa = pm; // points to the A subpart of M
B* pb = pm; // points to the B subpart of M
Using our previous diagram:
M:
+-----+-----+-----+-----+-----+-----+
| A | B | v | w |
+-----+-----+-----+-----+-----+-----+
^ ^
pm pb
pa
The fact that the address of pb is slightly different from that of pm is handled through pointer arithmetic automatically for you by the compiler.
2. How to implement virtual inheritance ?
Virtual inheritance is tricky: you need to ensure that a single V (for virtual) object will be shared by all the other subobjects. Let's define a simple diamond inheritance.
struct V { int t; };
struct B: virtual V { int u; };
struct C: virtual V { int v; };
struct D: B, C { int w; };
I'll leave out the representation, and concentrate on ensuring that in a D object, both the B and C subparts share the same subobject. How can it be done ?
Remember that a class size should be constant
Remember that when designed, neither B nor C can foresee whether they will be used together or not
The solution that has been found is therefore simple: B and C only reserve space for a pointer to V, and:
if you build a stand-alone B, the constructor will allocate a V on the heap, which will be handled automatically
if you build B as part of a D, the B subpart will expect the D constructor to pass the pointer to the location of V
And idem for C, obviously.
In D, an optimization allow the constructor to reserve space for V right in the object, because D does not inherit virtually from either B or C, giving the diagram you have shown (though we don't have yet virtual methods).
B: (and C is similar)
+-----+-----+
| V* | u |
+-----+-----+
D:
+-----+-----+-----+-----+-----+-----+
| B | C | w | A |
+-----+-----+-----+-----+-----+-----+
Remark now that casting from B to A is slightly trickier than simple pointer arithmetic: you need follow the pointer in B rather than simple pointer arithmetic.
There is a worse case though, up-casting. If I give you a pointer to A how do you know how to get back to B ?
In this case, the magic is performed by dynamic_cast, but this require some support (ie, information) stored somewhere. This is the so called RTTI (Run-Time Type Information). dynamic_cast will first determine that A is part of a D through some magic, then query D's runtime information to know where within D the B subobject is stored.
If we were in case where there is no B subobject, it would either return 0 (pointer form) or throw a bad_cast exception (reference form).
3. How to implement virtual methods ?
In general virtual methods are implemented through a v-table (ie, a table of pointer to functions) per class, and v-ptr to this table per-object. This is not the sole possible implementation, and it has been demonstrated that others could be faster, however it is both simple and with a predictable overhead (both in term of memory and dispatch speed).
If we take a simple base class object, with a virtual method:
struct B { virtual foo(); };
For the computer, there is no such things as member methods, so in fact you have:
struct B { VTable* vptr; };
void Bfoo(B* b);
struct BVTable { RTTI* rtti; void (*foo)(B*); };
When you derive from B:
struct D: B { virtual foo(); virtual bar(); };
You now have two virtual methods, one overrides B::foo, the other is brand new. The computer representation is akin to:
struct D { VTable* vptr; }; // single table, even for two methods
void Dfoo(D* d); void Dbar(D* d);
struct DVTable { RTTI* rtti; void (*foo)(D*); void (*foo)(B*); };
Note how BVTable and DVTable are so similar (since we put foo before bar) ? It's important!
D* d = /**/;
B* b = d; // noop, no needfor arithmetic
b->foo();
Let's translate the call to foo in machine language (somewhat):
// 1. get the vptr
void* vptr = b; // noop, it's stored at the first byte of B
// 2. get the pointer to foo function
void (*foo)(B*) = vptr[1]; // 0 is for RTTI
// 3. apply foo
(*foo)(b);
Those vptrs are initialized by the constructors of the objects, when executing the constructor of D, here is what happened:
D::D() calls B::B() first and foremost, to initiliaze its subparts
B::B() initialize vptr to point to its vtable, then returns
D::D() initialize vptr to point to its vtable, overriding B's
Therefore, vptr here pointed to D's vtable, and thus the foo applied was D's. For B it was completely transparent.
Here B and D share the same vptr!
4. Virtual tables in multi-inheritance
Unfortunately this sharing is not always possible.
First, as we have seen, in the case of virtual inheritance, the "shared" item is positionned oddly in the final complete object. It therefore has its own vptr. That's 1.
Second, in case of multi-inheritance, the first base is aligned with the complete object, but the second base cannot be (they both need space for their data), therefore it cannot share its vptr. That's 2.
Third, the first base is aligned with the complete object, thus offering us the same layout that in the case of simple inheritance (the same optimization opportunity). That's 3.
Quite simple, no ?
If a class has virtual members, one need to way to find their address. Those are collected in a constant table (the vtbl) whose address is stored in an hidden field for each object (vptr). A call to a virtual member is essentially:
obj->_vptr[member_idx](obj, params...);
A derived class which add virtual members to his base class also need a place for them. Thus a new vtbl and a new vptr for them. A call to an inherited virtual member is still
obj->_vptr[member_idx](obj, params...);
and a call to new virtual member is:
obj->_vptr2[member_idx](obj, params...);
If the base is not virtual, one can arrange for the second vtbl to be put immediately after the first one, effectively increasing the size of the vtbl. And the _vptr2 is no more needed. A call to a new virtual member is thus:
obj->_vptr[member_idx+num_inherited_members](obj, params...);
In the case of (non virtual) multiple inheritance, one inherit two vtbl and two vptr. They can't be merged, and calls must pay attention to add an offset to the object (in order for the inherited data members to be found at the correct place). Calls to the first base class members will be
obj->_vptr_base1[member_idx](obj, params...);
and for the second
obj->_vptr_base2[member_idx](obj+offset, params...);
New virtual members can again either be put in a new vtbl, or appended to the vtbl of the first base (so that no offsets are added in future calls).
If a base is virtual, one can not append the new vtbl to the inherited one as it could leads to conflicts (in the example you gave, if both B and C append their virtual functions, how D be able to build its version?).
Thus, A needs a vtbl. B and C need a vtbl and it can't be appended to A's one because A is a virtual base of both. D needs a vtbl but it can be appended to B one as B is not a virtual base class of D.
It all has to do with how compiler figures out the actual addresses of method functions. The compiler assumes that virtual table pointer is located at a known offset from the base of the object (typically at offset 0). The compiler also needs to know the structure of the virtual table for each class - in other words, how to lookup pointers to functions in the virtual table.
Class B and class C will have completely different structures of Virtual Tables since they have different methods. Virtual table for class D can look like a virtual table for class B followed by additional data for methods of class C.
When you generate an object of class D, you can cast it as a pointer to B or as a pointer to C or even as a pointer to class A. You may pass these pointers to modules that are not even aware of existence of class D, but can call methods of class B or C or A. These modules need to know how to locate the pointer to the virtual table of the class and they need to know how to locate pointers to methods of class B/C/A in the virtual table. That's why you need to have separate VPTRs for each class.
Class D is well aware of existence of class B and the structure of its virtual table and therefore can extend its structure and reuse the VPTR from object B.
When you cast a pointer to object D to a pointer to object B or C or A, it will actually update the pointer by some offset, so that it starts from vptr corresponding to that specific base class.
I could not see any reason why there
is requirement of separate memory in
each class for vptr
At runtime, when you invoke a (virtual) method via a pointer, the CPU has no knowledge about the actual object on which the method is dispatched. If you have B* b = ...; b->some_method(); then the variable b can potentially point at an object created via new B() or via new D() or
even new E() where E is some other class that inherits from (either) B or D. Each of these classes can supply its own implementation (override) for some_method(). Thus, the call b->some_method() should dispatch the implementation from either B, D or E depending on the object on which b is pointing.
The vptr of an object allows the CPU to find the address of the implementation of some_method that is in effect for that object. Each class defines it own vtbl (containing addresses of all virtual methods) and each object of the class starts with a vptr that points at that vtbl.
I think D needs 2 or 3 vptrs.
Here A may or may not require a vptr.
B needs one that should not be shared with A (because A is virtually inherited).
C needs one that should not be shared with A (ditto).
D can use B or C's vftable for its new virtual functions (if any), so it can share B's or C's.
My old paper "C++: Under the Hood" explains the Microsoft C++ implementation of virtual base classes. http://www.openrce.org/articles/files/jangrayhood.pdf
And (MS C++) you can compile with cl /d1reportAllClassLayout to get a text report of class memory layouts.
Happy hacking!
Related
I was reading here about how primary bases are chosen:
"...2. If C is a dynamic class type:
a. Identify all virtual base classes, direct or indirect, that are primary base classes for some other direct or indirect base class. Call these indirect primary base classes.
b. If C has a dynamic base class, attempt to choose a primary base class B. It is the first (in direct base class order) non-virtual dynamic base class, if one exists. Otherwise, it is a nearly empty virtual base class, the first one in (preorder) inheritance graph order which is not an indirect primary base class if any exist, or just the first one if they are all indirect primaries..."
And after there is this correction:
"Case (2b) above is now considered to be an error in the design. The use of the first indirect primary base class as the derived class' primary base does not save any space in the object, and will cause some duplication of virtual function pointers in the additional copy of the base classes virtual table.
The benefit is that using the derived class virtual pointer as the base class virtual pointer will often save a load, and no adjustment to the this pointer will be required for calls to its virtual functions.
It was thought that 2b would allow the compiler to avoid adjusting this in some cases, but this was incorrect, as the virtual function call algorithm requires that the function be looked up through a pointer to a class that defines the function, not one that just inherits it. Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to. For instance, consider this example:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
When B and C are declared, A is a primary base in each case, so although vcall offsets are allocated in the A-in-B and A-in-C vtables, no this adjustment is required and no thunk is generated. However, inside D objects, A is no longer a primary base of C, so if we allowed calls to C::f() to use the copy of A's vtable in the C subobject, we would need to adjust this from C* to B::A*, which would require a third-party thunk. Since we require that a call to C::f() first convert to A*, C-in-D's copy of A's vtable is never referenced, so this is not necessary."
Could you please explain with an example what this refers to: "Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to"?
Also, what are third-party thunks?
I do not understand either what the quoted example tries to show.
A is a nearly empty class, one that contains only a vptr and no visible data members:
struct A { virtual void f(); };
The layout of A is:
A_vtable *vptr
B has a single nearly empty base class used as a "primary":
struct B : virtual public A { int i; };
It means that the layout of B begins with the layout of an A, so that a pointer to a B is a pointer to an A (in assembly language). Layout of B subobject:
B_vtable *A_vptr
int i
A_vptr will point to a B vtable obviously, which is binary compatible with A vtable.
The B_vtable extends the A_vtable, adding all necessary information to navigate to the virtual base class A.
Layout of B complete object:
A base_subobject
int i
And same for C:
C_vtable *A_vptr
int j
Layout of C complete object:
A base_subobject
int j
In D obviously there is only an A subobject, so the layout of a complete object is:
A base_subobject
int i
not(A) not(base_subobject) aka (C::A)_vptr
int j
not(A) is the representation of an A nearly empty base class, that is, a vptr for A, but not a true A subobject: it looks like an A but the visible A is two words above. It's a ghost A!
(C::A)_vptr is the vptr to vtable with layout vtable for C (so also one with layout vtable for A), but for a C subobject where A is not finally a primary base: the C subobject has lost the privilege to host the A base class. So obviously the virtual calls through (C::A)_vptr to virtual functions defined A (there is only one: A::f()) need a this ajustement, with a thunk "C::A::f()" that receives a pointer to not(base_subobject) and adjusts it to the real base_subobject of type A that is the (two words above in the example). (Or if there is an overrider in D, so the D object that is at the exact same address, two words above in the example.)
So given these definitions:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
should the use of a ghost lvalue of a non existant A primary base work?
D d;
C *volatile cp = &d;
A *volatile ghost_ap = reinterpret_cast<A*> (cp);
ghost_ap->f(); // use the vptr of C::A: safe?
(volatile used here to avoid propagation of type knowledge by the compiler)
If for lvalues of C, for a virtual functions that is inherited from A, the call is done via the the C vptr that is also a C::A vptr because A is a "static" primary base of C, then the code should work, because a thunk has been generated that goes from C to D.
In practice it doesn't seem to work with GCC, but if you add an overrider in C:
struct C : virtual public A {
int j;
virtual void f()
{
std::cout << "C:f() \n";
}
};
it works because such concrete function is in the vtable of vtable of C::A.
Even with just a pure virtual overrider:
struct C : virtual public A {
int j;
virtual void f() = 0;
};
and a concrete overrider in D, it also works: the pure virtual override is enough to have the proper entry in the vtable of C::A.
Test code: http://codepad.org/AzmN2Xeh
I have been learning some more "indepth" things about virtual tables recently and this question came to my mind.
Suppose we have this sample:
class A {
virtual void foo();
}
class B : public A {
void foo();
}
In this case from what I know there will be a vtable present for each class and the dispatch would be quite simple.
Now suppose we change the B class to something like this:
class B : public C, public A {
void foo();
}
If the class C has some virtual methods the dispatch mechanism for B will be more complicated. There will probably be 2 vtables for both inheritance paths B-C, B-A etc.
From what I've learned so far it seems that if there would be somewhere else in the codebase function like this:
void bar(A * a) {
a->foo();
}
It would need to compile now with the more complicated dispatch mechanism because at compile time we do not know if "a" is pointer to A or B.
Now to the question. Suppose we added the new class B to our codebase. It doesn't seem likely to me that it would require to recompile the code everywhere where the pointer to A is used.
From what I know the vtables are created by the compiler. However is it possible that this fixup is solved by the linker possibly during relocation? It does seem likely to me I just can not find any evidence to be sure and therefore go to sleep right now :)
Inside void bar(A * a), the pointer is definitely to an A object. That A object may be a subobject of something else like a B, but that's irrelevant. The A is self-contained and has its own vtable pointer which links to foo.
When the conversion from B * to A * occurs, such as when bar is called with a B *, a constant offset may be added to the B * to make it point to the A subobject. If the first thing in every object is the vtable, then this will also set the pointer to the A vtable as well. For single inheritance, the adjustment is unnecessary.
Here is what memory looks like for a typical implementation:
| ptr to B vt | members of C | members of B | ptr to AB vt | members of A |
B vt: | ptrs to methods of C (with B overrides) | ptrs to methods of B |
AB vt: | ptrs to methods of A (with B overrides) |
(Note that typically the AB vt is really still part of the B vt; they would be contiguous in memory. And ptrs to methods of B could then go after ptrs to methods of A. I just wrote it this way for formatting clarity.)
When you convert a B * to an A *, you go from this:
| ptr to B vt | members of C | members of B | ptr to AB vt | members of A |
^ your B * pointer value
to this:
| ptr to B vt | members of C | members of B | ptr to AB vt | members of A |
^ your A * pointer value
Using a static_cast from A * to B * will move the pointer backwards, in the other direction.
No, there's no need to recompile the code that depends only on A.
This is actually immediately follows from the principle of "independent translation", which typical modern C++ compilers adhere to. You can write all code dependent only on A in such a way, that it will never include any definitions that mention B. That means that any changes in B will not trigger the recompilation of any A-specific code. That in turn means that a C++ compiler that follows the principle of "independent translation" has to implement simple inheritance, multiple inheritance, virtual dispatch, hierarchical conversions etc. in such a way that any changes in B-specific code do not require recompilation of any A-specific code.
If that wasn't the case, the architecture of a typical C++ compiler would have to be significantly different.
There is this code:
#include <iostream>
class Base
{
public:
Base() {
std::cout << "Base: " << this << std::endl;
}
int x;
int y;
int z;
};
class Derived : Base
{
public:
Derived() {
std::cout << "Derived: " << this << std::endl;
}
void fun(){}
};
int main() {
Derived d;
return 0;
}
The output:
Base: 0xbfdb81d4
Derived: 0xbfdb81d4
However when function 'fun' is changed to virtual in Derived class:
virtual void fun(){} // changed in Derived
Then address of 'this' is not the same in both constructors:
Base: 0xbf93d6a4
Derived: 0xbf93d6a0
The other thing is if class Base is polymorphic, for example I added there some other virtual function:
virtual void funOther(){} // added to Base
then addresses of both 'this' match again:
Base: 0xbfcceda0
Derived: 0xbfcceda0
The question is - why 'this' address is different in Base and Derived class when Base class is not polymorphic and Derived class is?
When you have a polymorphic single-inheritance hierarchy of classes, the typical convention followed by most (if not all) compilers is that each object in that hierarchy has to begin with a VMT pointer (a pointer to Virtual Method Table). In such case the VMT pointer is introduced into the object memory layout early: by the root class of the polymorphic hierarchy, while all lower classes simply inherit it and set it to point to their proper VMT. In such case all nested subobjects within any derived object have the same this value. That way by reading a memory location at *this the compiler has immediate access to VMT pointer regardless of the actual subobject type. This is exactly what happens in your last experiment. When you make the root class polymorphic, all this values match.
However, when the base class in the hierarchy is not polymorphic, it does not introduce a VMT pointer. The VMT pointer will be introduced by the very first polymorphic class somewhere lower in the hierarchy. In such case a popular implementational approach is to insert the VMT pointer before the data introduced by the non-polymorphic (upper) part of the hierarchy. This is what you see in your second experiment. The memory layout for Derived looks as follows
+------------------------------------+ <---- `this` value for `Derived` and below
| VMT pointer introduced by Derived |
+------------------------------------+ <---- `this` value for `Base` and above
| Base data |
+------------------------------------+
| Derived data |
+------------------------------------+
Meanwhile, all classes in the non-polymorphic (upper) part of the hierarchy should know nothing about any VMT pointers. Objects of Base type must begin with data field Base::x. At the same time all classes in the polymorphic (lower) part of the hierarchy must begin with VMT pointer. In order to satisfy both of these requirements, the compiler is forced to adjust the object pointer value as it is converted up and down the hierarchy from one nested base subobject to another. That immediately means that pointer conversion across the polymorphic/non-polymorphic boundary is no longer conceptual: the compiler has to add or subtract some offset.
The subobjects from non-polymorphic part of the hierarchy will share their this value, while subobjects from the polymorphic part of hierarchy will share their own, different this value.
Having to add or subtract some offset when converting pointer values along the hierarchy is not unusual: the compiler has to do it all the time when dealing with multiple-inheritance hierarchies. However, you example shows how it can be achieved in single-inheritance hierarchy as well.
The addition/subtraction effect will also be revealed in a pointer conversion
Derived *pd = new Derived;
Base *pb = pd;
// Numerical values of `pb` and `pd` are different if `Base` is non-polymorphic
// and `Derived` is polymorphic
Derived *pd2 = static_cast<Derived *>(pb);
// Numerical values of `pd` and `pd2` are the same
This looks like behavior of a typical implementation of polymorphism with a v-table pointer in the object. The Base class doesn't require such a pointer since it doesn't have any virtual methods. Which saves 4 bytes in the object size on a 32-bit machine. A typical layout is:
+------+------+------+
| x | y | z |
+------+------+------+
^
| this
The Derived class however does require the v-table pointer. Typically stored at offset 0 in the object layout.
+------+------+------+------+
| vptr | x | y | z |
+------+------+------+------+
^
| this
So to make the Base class methods see the same layout of the object, the code generator adds 4 to the this pointer before calling a method of the Base class. The constructor sees:
+------+------+------+------+
| vptr | x | y | z |
+------+------+------+------+
^
| this
Which explains why you see 4 added to the this pointer value in the Base constructor.
Technically speaking, this is exactly what happens.
However it must be noted that according to the language specification, the implementation of polymorphism does not necessarily relate to vtables: this is what the spec. defines as "implementation detail", that is out of the specs scope.
All what we can say is that this has a type, and points to what is accessible through its type.
How the dereferencing into members happens, again, is an implementation detail.
The fact that a pointer to something when converted into a pointer to something else, either by implicit, static or dynamic conversion, has to be changed to accommodate what is around must be considered the rule, not the exception.
By the way C++ is defined, the question is meaningless, as are the answers, since they assume implicitly that the implementation is based on the supposed layouts.
The fact that, under given circumstances, two object sub-components share a same origin, is just a (very common) particular case.
The exception is "reinterpreting": when you "blind" the type system, and just say "look this bunch of bytes as they are an instance of this type": that's the only case you have to expect no address change (and no responsibility from the compiler about the meaningfulness of such a conversion).
How many vptrs are usually needed for a object whose clas( child ) has single inheritance with a base class which multiple inherits base1 and base2. What is the strategy for identifying how many vptrs a object has provided it has couple of single inheritance and multiple inheritance. Though standard doesn't specify about vptrs but I just want to know how an implementation does virtual function implementation.
Why do you care? The simple answer is enough, but I guess you want something more complete.
This is not part of the standard, so any implementation is free to do as they wish, but a general rule of thumb is that in an implementation that uses virtual table pointers, as a zeroth approximation, for the dynamic dispatch you need at most as many pointers to virtual tables as there are classes that add a new virtual method to the hierarchy. (In some cases the virtual table can be extended, and the base and derived types share a single vptr)
// some examples:
struct a { void foo(); }; // no need for virtual table
struct b : a { virtual foo1(); }; // need vtable, and vptr
struct c : b { void bar(); }; // no extra virtual table, 1 vptr (b) suffices
struct d : b { virtual bar(); }; // 1 vtable, d extends b's vtable
struct e : d, b {}; // 2 vptr, 1 for the d and 1 for b
struct f : virtual b {}; // 1 vptr, f reuse b's vptr to locate subobject b
struct g : virtual b {}; // 1 vptr, g reuse b's vptr to locate subobject b
struct h : f, g {}; // 2 vptr, 1 for f, 1 for g
// h can locate subobject b using f's vptr
Basically each subobject of a type that requires its own dynamic dispatch (cannot directly reuse the parents) would need its own virtual table and vptr.
In reality compilers merge different vtables into a single vtable. When d adds a new virtual function over the set of functions in b, the compiler will merge the potential two tables into a single one by appending the new slots to the end of the vtable, so the vtable for d will be a extended version of the vtable for b with extra elements at the end maintaining binary compatibility (i.e. the d vtable can be interpreted as a b vtable to access the methods available in b), and the d object will have a single vptr.
In the case of multiple inheritance things become a bit more complicated as each base needs to have the same layout as a subobject of the complete object than if it was a separate object, so there will be extra vptrs pointing to different regions in the complete object's vtable.
Finally in the case of virtual inheritance things become even more complicated, and there might be multiple vtables for the same complete object with the vptr's being updated as construction/destruction evolves (vptr's are always updated as construction/destruction evolves, but without virtual inheritance the vptr will point to the base's vtables, while in the case of virtual inheritance there will be multiple vtables for the same type)
The fine print
Anything regarding vptr/vtable is not specified, so this is going to be compiler dependent for the fine details, but the simple cases are handled the same by almost every modern compiler (I write "almost" just in case).
You have been warned.
Object layout: non-virtual inheritance
If you inherit from base classes, and they have a vptr, you naturally have as many inherited vptr in your class.
The question is: When will the compiler add a vptr to a class that already has an inherited vptr?
The compiler will try to avoid adding redundant vptr:
struct B {
virtual ~B();
};
struct D : B {
virtual void foo();
};
Here B has a vptr, so D does not get its own vptr, it reuses the existing vptr; the vtable of B is extended with an entry for foo(). The vtable for D is "derived" from the vtable for B, pseudo-code:
struct B_vtable {
typeinfo *info; // for typeid, dynamic_cast
void (*destructor)(B*);
};
struct D_vtable : B_vtable {
void (*foo)(D*);
};
The fine print, again: this is a simplification of a real vtable, to get the idea.
Virtual inheritance
For non virtual single inheritance, there is almost no room for variation between implementations. For virtual inheritance, there are a lot more variations between compilers.
struct B2 : virtual A {
};
There is a conversion from B2* to A*, so a B2 object must provide this functionality:
either with a A* member
either with an int member: offset_of_A_from_B2
either using its vptr, by storing offset_of_A_from_B2 in the vtable
In general, a class will not reuse the vptr of its virtual base class (but it can in a very special case).
What's the difference between a derived object and a base object in c++,
especially, when there is a virtual function in the class.
Does the derived object maintain additional tables to hold the pointers
to functions?
The derived object inherits all the data and member functions of the base class. Depending on the nature of the inheritance (public, private or protected), this will affect the visibility of these data and member functions to clients (users) of your class.
Say, you inherited B from A privately, like this:
class A
{
public:
void MyPublicFunction();
};
class B : private A
{
public:
void MyOtherPublicFunction();
};
Even though A has a public function, it won't be visible to users of B, so for example:
B* pB = new B();
pB->MyPublicFunction(); // This will not compile
pB->MyOtherPublicFunction(); // This is OK
Because of the private inheritance, all data and member functions of A, although available to the B class within the B class, will not be available to code that simply uses an instance of a B class.
If you used public inheritance, i.e.:
class B : public A
{
...
};
then all of A's data and members will be visible to users of the B class. This access is still restricted by A's original access modifiers, i.e. a private function in A will never be accessible to users of B (or, come to that, code for the B class itself). Also, B may redeclare functions of the same name as those in A, thus 'hiding' these functions from users of the B class.
As for virtual functions, that depends on whether A has virtual functions or not.
For example:
class A
{
public:
int MyFn() { return 42; }
};
class B : public A
{
public:
virtual int MyFn() { return 13; }
};
If you try to call MyFn() on a B object through a pointer of type A*, then the virtual function will not be called.
For example:
A* pB = new B();
pB->MyFn(); // Will return 42, because A::MyFn() is called.
but let's say we change A to this:
class A
{
public:
virtual void MyFn() { return 42; }
};
(Notice A now declares MyFn() as virtual)
then this results:
A* pB = new B();
pB->MyFn(); // Will return 13, because B::MyFn() is called.
Here, the B version of MyFn() is called because the class A has declared MyFn() as virtual, so the compiler knows that it must look up the function pointer in the object when calling MyFn() on an A object. Or an object it thinks it is an A, as in this case, even though we've created a B object.
So to your final question, where are the virtual functions stored?
This is compiler/system dependent, but the most common method used is that for an instance of a class that has any virtual functions (whether declared directly, or inherited from a base class), the first piece of data in such an object is a 'special' pointer. This special pointer points to a 'virtual function pointer table', or commonly shortened to 'vtable'.
The compiler creates vtables for every class it compiles that has virtual functions. So for our last example, the compiler will generate two vtables - one for class A and one for class B. There are single instances of these tables - the constructor for an object will set up the vtable-pointer in each newly created object to point to the correct vtable block.
Remember that the first piece of data in an object with virtual functions is this pointer to the vtable, so the compiler always knows how to find the vtable, given an object that needs to call a virtual function. All the compiler has to do is look at the first memory slot in any given object, and it has a pointer to the correct vtable for that object's class.
Our case is very simple - each vtable is one entry long, so they look like this:
vtable for A class:
+---------+--------------+
| 0: MyFn | -> A::MyFn() |
+---------+--------------+
vtable for B class:
+---------+--------------+
| 0: MyFn | -> B::MyFn() |
+---------+--------------+
Notice that for the vtable for the B class, the entry for MyFn has been overwritten with a pointer to B::MyFn() - this ensures that when we call the virtual function MyFn() even on an object pointer of type A*, the B version of MyFn() is correctly called, instead of the A::MyFn().
The '0' number is indicating the entry position in the table. In this simple case, we only have one entry in each vtable, so each entry is at index 0.
So, to call MyFn() on an object (either of type A or B), the compiler will generate some code like this:
pB->__vtable[0]();
(NB. this won't compile; it's just an explanation of the code the compiler will generate.)
To make it more obvious, let's say A declares another function, MyAFn(), which is virtual, which B does not over-ride/re-implement.
So the code would be:
class A
{
public:
virtual void MyAFn() { return 17; }
virtual void MyFn() { return 42; }
};
class B : public A
{
public:
virtual void MyFn() { return 13; }
};
then B will have the functions MyAFn() and MyFn() in its interface, and the vtables will now look like this:
vtable for A class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> A::MyFn() |
+----------+---------------+
vtable for B class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> B::MyFn() |
+----------+---------------+
So in this case, to call MyFn(), the compiler will generate code like this:
pB->__vtable[1]();
Because MyFn() is second in the table (and so at index 1).
Obviously, calling MyAFn() will cause code like this:
pB->__vtable[0]();
because MyAFn() is at index 0.
It should be emphasised that this is compiler-dependent, and iirc, the compiler is under no obligation to order the functions in the vtable in the order they are declared - it's just up to the compiler to make it all work under the hood.
In practice, this scheme is widely used, and function ordering in vtables is fairly deterministic, so ABI between code generated by different C++ compilers is maintained, and allows COM interoperation and similar mechanisms to work across boundaries of code generated by different compilers. This is in no way guaranteed.
Luckily, you'll never have to worry much about vtables, but it's definitely useful to get your mental model of what is going on to make sense and not store up any surprises for you in the future.
More theoretically, if you derive one class from another, you have a base class and a derived class. If you create an object of a derived class, you have a derived object. In C++, you can inherit from the same class multiple times. Consider:
struct A { };
struct B : A { };
struct C : A { };
struct D : B, C { };
D d;
In the d object, you have two A objects within each D objects, which are called "base-class sub-objects". If you try to convert D to A, then the compiler will tell you the conversion is ambiguous, because it doesn't know to which A object you want to convert:
A &a = d; // error: A object in B or A object in C?
Same goes if you name a non-static member of A: The compiler will tell you about an ambiguity. You can circumvent it in this case by converting to B or C first:
A &a = static_cast<B&>(d); // A object in B
The object d is called the "most derived object", because it's not a sub-object of another object of class type. To avoid the ambiguity above, you can inherit virtually
struct A { };
struct B : virtual A { };
struct C : virtual A { };
struct D : B, C { };
Now, there is only one subobject of type A, even though you have two subobject that this one object is contained in: subobject B and sub-object C. Converting a D object to A is now non-ambiguous, because conversion over both the B and the C path will yield the same A sub-object.
Here comes a complication of the above: Theoretically, even without looking at any implementation technique, either or both of the B and C sub-objects are now not contiguous anymore. Both contain the same A object, but both doesn't contain each other either. This means that one or both of those must be "split up" and merely reference the A object of the other, so that both B and C objects can have different addresses. In linear memory, this may look like (let's assume all objecs have size of 1 byte)
C: [1 byte [A: refer to 0xABC [B: 1byte [A: one byte at 0xABC]]]]
[CCCCCCC[ [BBBBBBBBBBCBCBCBCBCBCBCBCBCBCB]]]]
CB is what both the C and the B sub-object contains. Now, as you see, the C sub-object would be split up, and there is no way without, because B is not contained in C, and neither the other way around. The compiler, to access some member using code in a function of C, can't just use an offset, because the code in a function of C doesn't know whether it's contained as a sub-object, or - when it's not abstract - whether it's a most derived object and thus has the A object directly next to it.
a public colon. ( I told you C++ was nasty )
class base { }
class derived : public base { }
let's have:
class Base {
virtual void f();
};
class Derived : public Base {
void f();
}
without f being virtual (as implemented in pseudo "c"):
struct {
BaseAttributes;
} Base;
struct {
BaseAttributes;
DerivedAttributes;
} Derived;
with virtual functions:
struct {
vfptr = Base_vfptr,
BaseAttributes;
} Base;
struct {
vfptr = Derived_vfptr,
BaseAttributes;
DerivedAttributes;
} Derived;
struct {
&Base::f
} Base_vfptr
struct {
&Derived::f
} Base_vfptr
For multiple inheritance, things get more complicated :o)
Derived is Base, but Base is not a Derived
base- is the object you are deriving from.
derived - is the object the inherits his father's public (and protected) members.
a derived object can override (or in some cases must override) some of his father's methods, thus creating a different behavior
A base object is one from which others are derived. Typically it'll have some virtual methods (or even pure virtual) that subclasses can override to specialize.
A subclass of a base object is known as a derived object.
The derived object is derived from its base object(s).
Are you asking about the respective objects' representation in memory?
Both the base class and the derived class will have a table of pointers to their virtual functions. Depending on which functions have been overridden, the value of entries in that table will change.
If B adds more virtual functions that aren't in the base class, B's table of virtual methods will be larger (or there may be a separate table, depending on compiler implementation).
What's the difference between a derived object and a base object in c++,
A derived object can be used in place of a base object; it has all the members of the base object, and maybe some more of its own. So, given a function taking a reference (or pointer) to the base class:
void Function(Base &);
You can pass a reference to an instance of the derived class:
class Derived : public Base {};
Derived derived;
Function(derived);
especially, when there is a virtual function in the class.
If the derived class overrides a virtual function, then the overridden function will always be called on objects of that class, even through a reference to the base class.
class Base
{
public:
virtual void Virtual() {cout << "Base::Virtual" << endl;}
void NonVirtual() {cout << "Base::NonVirtual" << endl;}
};
class Derived : public Base
{
public:
virtual void Virtual() {cout << "Derived::Virtual" << endl;}
void NonVirtual() {cout << "Derived::NonVirtual" << endl;}
};
Derived derived;
Base &base = derived;
base.Virtual(); // prints "Derived::Virtual"
base.NonVirtual(); // prints "Base::NonVirtual"
derived.Virtual(); // prints "Derived::Virtual"
derived.NonVirtual();// prints "Derived::NonVirtual"
Does the derived object maintain additional tables to hold the pointers to functions?
Yes - both classes will contain a pointer to a table of virtual functions (known as a "vtable"), so that the correct function can be found at runtime. You can't access this directly, but it does affect the size and layout of the data in memory.