I was reading here about how primary bases are chosen:
"...2. If C is a dynamic class type:
a. Identify all virtual base classes, direct or indirect, that are primary base classes for some other direct or indirect base class. Call these indirect primary base classes.
b. If C has a dynamic base class, attempt to choose a primary base class B. It is the first (in direct base class order) non-virtual dynamic base class, if one exists. Otherwise, it is a nearly empty virtual base class, the first one in (preorder) inheritance graph order which is not an indirect primary base class if any exist, or just the first one if they are all indirect primaries..."
And after there is this correction:
"Case (2b) above is now considered to be an error in the design. The use of the first indirect primary base class as the derived class' primary base does not save any space in the object, and will cause some duplication of virtual function pointers in the additional copy of the base classes virtual table.
The benefit is that using the derived class virtual pointer as the base class virtual pointer will often save a load, and no adjustment to the this pointer will be required for calls to its virtual functions.
It was thought that 2b would allow the compiler to avoid adjusting this in some cases, but this was incorrect, as the virtual function call algorithm requires that the function be looked up through a pointer to a class that defines the function, not one that just inherits it. Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to. For instance, consider this example:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
When B and C are declared, A is a primary base in each case, so although vcall offsets are allocated in the A-in-B and A-in-C vtables, no this adjustment is required and no thunk is generated. However, inside D objects, A is no longer a primary base of C, so if we allowed calls to C::f() to use the copy of A's vtable in the C subobject, we would need to adjust this from C* to B::A*, which would require a third-party thunk. Since we require that a call to C::f() first convert to A*, C-in-D's copy of A's vtable is never referenced, so this is not necessary."
Could you please explain with an example what this refers to: "Removing that requirement would not be a good idea, as there would then no longer be a way to emit all thunks with the functions they jump to"?
Also, what are third-party thunks?
I do not understand either what the quoted example tries to show.
A is a nearly empty class, one that contains only a vptr and no visible data members:
struct A { virtual void f(); };
The layout of A is:
A_vtable *vptr
B has a single nearly empty base class used as a "primary":
struct B : virtual public A { int i; };
It means that the layout of B begins with the layout of an A, so that a pointer to a B is a pointer to an A (in assembly language). Layout of B subobject:
B_vtable *A_vptr
int i
A_vptr will point to a B vtable obviously, which is binary compatible with A vtable.
The B_vtable extends the A_vtable, adding all necessary information to navigate to the virtual base class A.
Layout of B complete object:
A base_subobject
int i
And same for C:
C_vtable *A_vptr
int j
Layout of C complete object:
A base_subobject
int j
In D obviously there is only an A subobject, so the layout of a complete object is:
A base_subobject
int i
not(A) not(base_subobject) aka (C::A)_vptr
int j
not(A) is the representation of an A nearly empty base class, that is, a vptr for A, but not a true A subobject: it looks like an A but the visible A is two words above. It's a ghost A!
(C::A)_vptr is the vptr to vtable with layout vtable for C (so also one with layout vtable for A), but for a C subobject where A is not finally a primary base: the C subobject has lost the privilege to host the A base class. So obviously the virtual calls through (C::A)_vptr to virtual functions defined A (there is only one: A::f()) need a this ajustement, with a thunk "C::A::f()" that receives a pointer to not(base_subobject) and adjusts it to the real base_subobject of type A that is the (two words above in the example). (Or if there is an overrider in D, so the D object that is at the exact same address, two words above in the example.)
So given these definitions:
struct A { virtual void f(); };
struct B : virtual public A { int i; };
struct C : virtual public A { int j; };
struct D : public B, public C {};
should the use of a ghost lvalue of a non existant A primary base work?
D d;
C *volatile cp = &d;
A *volatile ghost_ap = reinterpret_cast<A*> (cp);
ghost_ap->f(); // use the vptr of C::A: safe?
(volatile used here to avoid propagation of type knowledge by the compiler)
If for lvalues of C, for a virtual functions that is inherited from A, the call is done via the the C vptr that is also a C::A vptr because A is a "static" primary base of C, then the code should work, because a thunk has been generated that goes from C to D.
In practice it doesn't seem to work with GCC, but if you add an overrider in C:
struct C : virtual public A {
int j;
virtual void f()
{
std::cout << "C:f() \n";
}
};
it works because such concrete function is in the vtable of vtable of C::A.
Even with just a pure virtual overrider:
struct C : virtual public A {
int j;
virtual void f() = 0;
};
and a concrete overrider in D, it also works: the pure virtual override is enough to have the proper entry in the vtable of C::A.
Test code: http://codepad.org/AzmN2Xeh
Related
I'm curious if marking an existing derived C++ class as final to allow for de-virtualisation optimisations will change ABI when using C++11. My expectation is that it should have no effect as I see this as primarily a hint to the compiler about how it can optimise virtual functions and as such I can't see any way it would change the size of the struct or the vtable, but perhaps I'm missing something?
I'm aware this changes API here so that code that further derives from this derived class will no longer work, but I'm only concerned about ABI in this particular case.
Final on a function declaration X::f() implies that the declaration cannot be overridden, so all calls that name that declaration can be bound early (not those calls that name a declaration in a base class): if a virtual function is final in the ABI, the produced vtables can be incompatible with the one produced almost same class without final: calls to virtual functions that name declarations marked final can be assumed to be direct: trying to use a vtable entry (that should exist in the final-less ABI) is illegal.
The compiler could use the final guarantee to cut on the size of vtables (that can sometime grow a lot) by not adding a new entry that would be usually be added and that must be according to the ABI for non final declaration.
Entries are added for a declaration overriding a function not a (inherently, always) primary base or for a non trivially covariant return type (a return type covariant on a non primary base).
Inherently primary base class: the simplest case of polymorphic inheritance
The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, everything is simple.
These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)
Considering the case where the return type isn't covariant; or:
Trivial covariance
An example: the case where it's covariant with the same type as *this; as in:
struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance
Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.
Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.
Conclusion
In these cases the type of the declaration of the overriding function is trivially different from the type of the base:
all parameters are almost the same (with only a trivial difference on the type of this)
the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)
(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically
So no vtable entry is added for the derived declaration.
(So making the class final wouldn't be vtable simplification.)
Never primary base
Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.
(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.
The conceptually simplest interesting example of a non primary, and non trivial pointer conversion is:
struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };
Each base has its own vptr scalar member, and these vptr have different purposes:
B1::vptr points to a B1_vtable structure
B2::vptr points to a B2_vtable structure
and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because
The vtables have distinct entries:
B1_vtable.f_ptr points to the final overrider for B1::f()
B2_vtable.f_ptr points to the final overrider for B2::f()
B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)
The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)
(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)
(**) A callable runtime function is any entry point: any address that can be called/jumped at; it can be a normal function code, a thunk/trampoline, a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".
(***) That they sometimes have the same final overrider in a further derived class:
struct DD : D { void f(); }
isn't useful for the purpose of defining the ABI of D.
So we see that D provably needs a non primary polymorphic base; by convention it will be D2; the first nominated polymorphic base (B1) gets to be primary.
So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.
So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:
D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).
the vtable entry of a virtual member of B2::g that is overridden in D needs two entries, one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.
Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.
Note that in theory is would be possible to omit the entry for D::g() in D_B1_vtable if the burden of making all virtual calls of D::g() via the B2 base, which as far as no non trivial covariance is used(#), is also a possibility.
(#) or if non trivial covariance occurs, "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used
Not inherently primary base
Regular (non virtual) inheritance is simple like membership:
a non virtual base subobject is a direct base of exactly one object (which implies that there always exactly one final overrider of any virtual function when virtual inheritance isn't used);
the placement of a non virtual base is fixed;
base subobject that don't have virtual base subobjects, just like data member, are constructed exactly like complete objects (they have exactly one runtime constructor function code for every defined C++ constructor).
A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in a arbitrary derived class object (that may or may not be a most derived object) it is a value computed at runtime.
That offset can never be known because C++ supports both unifying and duplicating inheritance:
virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;
non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):
struct B { virtual void f(); };
struct D1 : B { virtual void f(); }; // final overrider
struct D2 : B { virtual void f(); }; // final overrider
struct DD : D1, D2 { };
Here DD has two distinct final overriders of B::f():
DD::D1::f() is final overrider for DD::D1::B::f()
DD::D2::f() is final overrider for DD::D2::B::f()
in two distinct vtable entries.
Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).
Not only C++ supports both, but the fact combinations are allowed: duplicating inheritance of a class that uses unifying inheritance:
struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };
There is only one DDD::VB but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantic) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be at the same as the address of DDD::DD2::D.
So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member as it is a type "invariant" (not const) shared member: it is fixed on the constructor (invariant after complete construction) and its semantic is shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, that in a DDD the VB subobject can be an overlay over DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.
This however cannot be the case for virtual bases that have non invariant scalar members, that is regular data members with an identity, that is members occupying a distinct range of bytes: these "real" data members cannot be overlayed on anything else. So a virtual base subobject with data members (members with with an address guaranteed to be distinct by C++ or any other the distinct C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non trivial offsets.
(##) with potentially a very narrow special case with a derived class with no data member with a virtual base with some data members
So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual base are candidate for overlaying on derived classes, they are potential primaries but not inherent primaries:
the offset at which they reside will only be determined in the most derived class;
the offset might or might not be zero;
a nul offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;
a non nul offset implies non trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except when overlaid obviously as it wouldn't be necessary not possible).
This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.
A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...
...a non-morally virtual base is inherently reversible in every case:
there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);
the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;
(in any C++ like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in common case with simple hierarchies). Note that:
static_cast<D*>((B*)d) is ill formed;
dynamic_cast<D*>((B*)d) will work for the simple cases.
So let's called virtual covariance the case where the covariance of the return type is based on morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:
struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary
struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};
Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.
When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.
Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:
there are additional guarantees/invariants on that scalar member, and the vtable is extended;
no new vptr is needed for Da.
So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():
in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.
in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which by is inherently the same as final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.
Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.
Note that making the virtual function body inline and a visible by the ABI (so that the ABI by classes with different inline function definitions could be made incompatible, allowing more information to inform memory layout) wouldn't possibly help, as inline code would only define what a call to a non overridden virtual function does: one cannot based the ABI decisions on choices that can be overridden in derived classes.
[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:
struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};
Note that in that code an incorrect code generation for a virtual call by a buggy compiler that would use the Ba_vtable entry to call g() would actually work because covariance ends up being trivial, as VB is primary in complete D.
The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.
--end example]
But if Da::g() is final in the ABI, only virtual calls can be made via the VB * g(); declaration: covariance is made purely static, the derived to base conversion is be done at compile time as the last step of the virtual thunk, as if virtual covariance was never used.
Possible extension of final
There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (match by class name). If final stops overriding a virtual function, could it be applied to base classes in a C++-like language?
First we need to define what is overriding a virtual base inheritance:
An "almost direct" subobject relation means that a indirect subobject is controlled almost as a direct subobject:
an almost direct subobject can be initialized like a direct subobject;
access control is never a really obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).
Virtual inheritance provides almost direct access:
constructor for each virtual bases must be called by ctor-init-list of the constructor of the most derived class;
when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.
A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:
struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};
Now C++ variants that support both forms of inheritance don't have to have C++ semantic of almost direct access in all derived classes:
struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};
Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.
struct DDD : DD {
DD () :
VB() // error: not an almost direct subobject
{ }
};
struct DD2 : D, virtual final VB {
// DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{ // for ": virtual VB"
};
The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB but virtual-ness of VB requires unification which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:
struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};
Here UselessAbstract is abstract and no derived class are too, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.
-- end analog/example]
That would provide a way to freeze virtual inheritance, to provide meaningful private inheritance of classes with virtual base (without it derived classes can usurp the relationship between a class and its private base class).
Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of virtual base isn't fixed.
I believe that adding the final keyword should not be ABI breaking, however removing it from an existing class might render some optimizations invalid. For example, consider this:
// in car.h
struct Vehicle { virtual void honk() { } };
struct Car final : Vehicle { void honk() override { } };
// in car.cpp
// Here, the compiler can assume that no derived class of Car can be passed,
// and so `honk()` can be devirtualized. However, if Car is not final
// anymore, this optimization is invalid.
void foo(Car* car) { car->honk(); }
If foo is compiled separately and e.g. shipped in a shared library, removing final (and hence making it possible for users to derive from Car) could render the optimization invalid.
I'm not 100% sure about this though, some of it is speculation.
If you do not introduce new virtual methods in your final class (only override methods of parent class) you should be ok (the virtual table is going to be the same as the parent object, because it must be able to be called with a pointer to parent), if you introduce virtual methods the compiler can indeed ignore the virtual specifier and only generate standard methods, e.g:
class A {
virtual void f();
};
class B final : public A {
virtual void f(); // <- should be ok
virtual void g(); // <- not ok
};
The idea is that every time in C++ that you can invoke the method g() you have a pointer/reference whose static and dynamic type is B: static because the method does not exist except for B and his children, dynamic because final ensures that B has no children. For this reason you never need to do virtual dispatch to call the right g() implementation (because there can be only one), and the compiler might (and should) not add it to the virtual table for B - while it is forced to do so if the method could be overridden. This is basically the whole point for which the final keyword exist as far as I understand
Consider the following code:
#include <iostream>
class A
{
public:
virtual void f() = 0;
virtual void g() = 0;
};
class B : virtual public A
{
public:
virtual void f()
{
g();
}
};
class C : virtual public A
{
public:
virtual void g()
{
std::cout << "C::g" << std::endl;
}
};
class D : public C, public B
{
};
int main()
{
B* b = new D;
b->f();
}
The output of the following program is C::g.
How does the compiler invoke a function of a sister class of class B??
N3337 10.3/9
[ Note: The interpretation of the call of a virtual function depends on the type of the object for which it is
called (the dynamic type), whereas the interpretation of a call of a non-virtual member function depends
only on the type of the pointer or reference denoting that object (the static type) (5.2.2). — end note ]
The dynamic type is type to which pointer really points, not type that was declared as pointed type.
Therefore:
D d;
d.g(); //this results in C::g as expected
is same as:
B* b = new D;
b->g();
And because inside your B::f call to g() is (implicitly) called on this pointer whose dynamic type is D, call resolves to D::f, which is C::f.
If you look closely, it's the (exactly) same behaviour as shown in code above, only that b is now implicit this instead.
That's the whole point of virtual functions.
It's the behavior of virtual: B call g through f but g is resolve at runtime (like f). So, at runtime, the only available override of g for D is the one implemented in C
g is resolved at runtime, like all virtual functions. Because of the way D is defined it's resolved into whatever C implements.
If you don't want this behaviour you should either call a non-virtual implementation of g (you can delegate to that function from the virtual one as well), or explicitly call B's implementation using B::g().
Though if you do this your design will be a lot more complicated than it probably needs to be so try to find a solution that doesn't rely on all these tricks.
Virtual function calls are resolved at runtime, by reference to the instance's VTable. A distinct VTable exists for any virtual class (so in above each of A, B, C and D have a VTable). Every runtime instance has a pointer to one of these tables, determined by its dynamic type.
The VTable lists every virtual function in the class, mapping it to the actual function that should be called at runtime. For normal inheritance these are listed in order of declaration, so a base class can use the VTable of a derived class to resolve virtual functions that it declares (because the derived class, listing the functions in declaration order, will have all the base class's functions listed first in excatly the same order the base class's own VTable). For virtual inheritance (as above), this is slightly more complicated, but essentially the base class within the derived class still has its own VTable pointer that points to the relevant section inside the derived's VTable.
In practice this means your D class has a VTable entry for g that points to C's implementation. Even when accessed through the B static type it will still refer to this same VTable to resolve g.
In C++, there is no class representation at run-time but I can always call an overridden virtual method in the derived class. where is that overridden method saved in the vtable? here's a piece of code to demonstrate:
struct B1 {
virtual void f() { ... }
};
struct B2 {
virtual void f() { ... }
virtual void g() { ... }
};
struct D : B1, B2 {
void f() { ... }
virtual void h() { ... }
};
What's the memory layout for an object of class D ? Where are B1::f and B2::f saved in that memory layout (if they're saved at all) ?
An object d of Class D will have only a pointer to the VMT of class D, which will contain a pointer to D::f.
Since B1:f and B2::f can be called only statically from the scope of D class, there is no need for object d to keep a dynamic pointer to those overridden methods.
This of cause is not defined in the standard, this is just the usual/logical implementation of the compiler.
In fact the picture is more complicated, since the VMT of class D incorporates the VMTs of classes B1 and B2. But anyway, there is no need to dynamically call B1::f until an object of class B1 is created.
When compiler uses vtable method of virtual dispatch*, the address of the overriden member function is stored in the vtable of the base class in which the function is defined.
Each class has access to vtables of all of its base classes. These vtables are stored outside of the memory layout for the class itself. Each class with virtual member functions, declared or inherited, has a single pointer to its own vtable. When you call an overriden member function, you supply the name of the base class whose member function you wish to call. The compiler knows about vtables of all classes, to it knows how to locate the vtable of your base class, does the lookup at compile time, and calls the member function directly.
Here is a short example:
struct A {
virtual void foo() { cout << "A"; }
};
struct B : public A { }; // No overrides
struct C : public B {
virtual void foo() { cout << "C"; }
void bar() { B::foo(); }
};
Demo.
In the example above the compiler needs to look up B::foo, which is not defined in class B. The compiler consults its symbol table to find out that B::foo is implemented in A, and generates the call to A::foo inside C::bar.
* vtables is not the only method of implementing virtual dispatch. C++ standard does not require vtables to be used.
Although nothing is mandated in the C++ standard, every known C++ implementation uses the same approach: every class with at least a virtual function has a vptr (pointer to vtable).
You didn't mention virtual inheritance which is a different, more subtle inheritance relation; non-virtual inheritance is a simple exclusive relation between a base class subobject and a derived class. I will assume all inheritance relations are not virtual in this answer.
Here I assume we derive from classes with at least a virtual function.
In case of single inheritance, the vptr from the base class is reused. (Not reusing it just wastes space and run time.) The base class is called "primary base class".
In case of multiple inheritance, the layout of the derived class contains the layout of every base class, just like the layout of a struct in C contains the layout of every member. The layout of D is B1 then B2 (in any order actually, but the source code order is usually kept).
The first class is the primary base class: in D the vptr from B1 points to a complete vtable for D, the vtable with all the virtual functions of D. Each vptr from a non-primary base class points to a secondary vtable of D: a vtable with only the virtual functions from this secondary base class.
The constructor of D must initialize every vptr of the class instance to point to the appropriate vtable of D.
How many vptrs are usually needed for a object whose clas( child ) has single inheritance with a base class which multiple inherits base1 and base2. What is the strategy for identifying how many vptrs a object has provided it has couple of single inheritance and multiple inheritance. Though standard doesn't specify about vptrs but I just want to know how an implementation does virtual function implementation.
Why do you care? The simple answer is enough, but I guess you want something more complete.
This is not part of the standard, so any implementation is free to do as they wish, but a general rule of thumb is that in an implementation that uses virtual table pointers, as a zeroth approximation, for the dynamic dispatch you need at most as many pointers to virtual tables as there are classes that add a new virtual method to the hierarchy. (In some cases the virtual table can be extended, and the base and derived types share a single vptr)
// some examples:
struct a { void foo(); }; // no need for virtual table
struct b : a { virtual foo1(); }; // need vtable, and vptr
struct c : b { void bar(); }; // no extra virtual table, 1 vptr (b) suffices
struct d : b { virtual bar(); }; // 1 vtable, d extends b's vtable
struct e : d, b {}; // 2 vptr, 1 for the d and 1 for b
struct f : virtual b {}; // 1 vptr, f reuse b's vptr to locate subobject b
struct g : virtual b {}; // 1 vptr, g reuse b's vptr to locate subobject b
struct h : f, g {}; // 2 vptr, 1 for f, 1 for g
// h can locate subobject b using f's vptr
Basically each subobject of a type that requires its own dynamic dispatch (cannot directly reuse the parents) would need its own virtual table and vptr.
In reality compilers merge different vtables into a single vtable. When d adds a new virtual function over the set of functions in b, the compiler will merge the potential two tables into a single one by appending the new slots to the end of the vtable, so the vtable for d will be a extended version of the vtable for b with extra elements at the end maintaining binary compatibility (i.e. the d vtable can be interpreted as a b vtable to access the methods available in b), and the d object will have a single vptr.
In the case of multiple inheritance things become a bit more complicated as each base needs to have the same layout as a subobject of the complete object than if it was a separate object, so there will be extra vptrs pointing to different regions in the complete object's vtable.
Finally in the case of virtual inheritance things become even more complicated, and there might be multiple vtables for the same complete object with the vptr's being updated as construction/destruction evolves (vptr's are always updated as construction/destruction evolves, but without virtual inheritance the vptr will point to the base's vtables, while in the case of virtual inheritance there will be multiple vtables for the same type)
The fine print
Anything regarding vptr/vtable is not specified, so this is going to be compiler dependent for the fine details, but the simple cases are handled the same by almost every modern compiler (I write "almost" just in case).
You have been warned.
Object layout: non-virtual inheritance
If you inherit from base classes, and they have a vptr, you naturally have as many inherited vptr in your class.
The question is: When will the compiler add a vptr to a class that already has an inherited vptr?
The compiler will try to avoid adding redundant vptr:
struct B {
virtual ~B();
};
struct D : B {
virtual void foo();
};
Here B has a vptr, so D does not get its own vptr, it reuses the existing vptr; the vtable of B is extended with an entry for foo(). The vtable for D is "derived" from the vtable for B, pseudo-code:
struct B_vtable {
typeinfo *info; // for typeid, dynamic_cast
void (*destructor)(B*);
};
struct D_vtable : B_vtable {
void (*foo)(D*);
};
The fine print, again: this is a simplification of a real vtable, to get the idea.
Virtual inheritance
For non virtual single inheritance, there is almost no room for variation between implementations. For virtual inheritance, there are a lot more variations between compilers.
struct B2 : virtual A {
};
There is a conversion from B2* to A*, so a B2 object must provide this functionality:
either with a A* member
either with an int member: offset_of_A_from_B2
either using its vptr, by storing offset_of_A_from_B2 in the vtable
In general, a class will not reuse the vptr of its virtual base class (but it can in a very special case).
What's the difference between a derived object and a base object in c++,
especially, when there is a virtual function in the class.
Does the derived object maintain additional tables to hold the pointers
to functions?
The derived object inherits all the data and member functions of the base class. Depending on the nature of the inheritance (public, private or protected), this will affect the visibility of these data and member functions to clients (users) of your class.
Say, you inherited B from A privately, like this:
class A
{
public:
void MyPublicFunction();
};
class B : private A
{
public:
void MyOtherPublicFunction();
};
Even though A has a public function, it won't be visible to users of B, so for example:
B* pB = new B();
pB->MyPublicFunction(); // This will not compile
pB->MyOtherPublicFunction(); // This is OK
Because of the private inheritance, all data and member functions of A, although available to the B class within the B class, will not be available to code that simply uses an instance of a B class.
If you used public inheritance, i.e.:
class B : public A
{
...
};
then all of A's data and members will be visible to users of the B class. This access is still restricted by A's original access modifiers, i.e. a private function in A will never be accessible to users of B (or, come to that, code for the B class itself). Also, B may redeclare functions of the same name as those in A, thus 'hiding' these functions from users of the B class.
As for virtual functions, that depends on whether A has virtual functions or not.
For example:
class A
{
public:
int MyFn() { return 42; }
};
class B : public A
{
public:
virtual int MyFn() { return 13; }
};
If you try to call MyFn() on a B object through a pointer of type A*, then the virtual function will not be called.
For example:
A* pB = new B();
pB->MyFn(); // Will return 42, because A::MyFn() is called.
but let's say we change A to this:
class A
{
public:
virtual void MyFn() { return 42; }
};
(Notice A now declares MyFn() as virtual)
then this results:
A* pB = new B();
pB->MyFn(); // Will return 13, because B::MyFn() is called.
Here, the B version of MyFn() is called because the class A has declared MyFn() as virtual, so the compiler knows that it must look up the function pointer in the object when calling MyFn() on an A object. Or an object it thinks it is an A, as in this case, even though we've created a B object.
So to your final question, where are the virtual functions stored?
This is compiler/system dependent, but the most common method used is that for an instance of a class that has any virtual functions (whether declared directly, or inherited from a base class), the first piece of data in such an object is a 'special' pointer. This special pointer points to a 'virtual function pointer table', or commonly shortened to 'vtable'.
The compiler creates vtables for every class it compiles that has virtual functions. So for our last example, the compiler will generate two vtables - one for class A and one for class B. There are single instances of these tables - the constructor for an object will set up the vtable-pointer in each newly created object to point to the correct vtable block.
Remember that the first piece of data in an object with virtual functions is this pointer to the vtable, so the compiler always knows how to find the vtable, given an object that needs to call a virtual function. All the compiler has to do is look at the first memory slot in any given object, and it has a pointer to the correct vtable for that object's class.
Our case is very simple - each vtable is one entry long, so they look like this:
vtable for A class:
+---------+--------------+
| 0: MyFn | -> A::MyFn() |
+---------+--------------+
vtable for B class:
+---------+--------------+
| 0: MyFn | -> B::MyFn() |
+---------+--------------+
Notice that for the vtable for the B class, the entry for MyFn has been overwritten with a pointer to B::MyFn() - this ensures that when we call the virtual function MyFn() even on an object pointer of type A*, the B version of MyFn() is correctly called, instead of the A::MyFn().
The '0' number is indicating the entry position in the table. In this simple case, we only have one entry in each vtable, so each entry is at index 0.
So, to call MyFn() on an object (either of type A or B), the compiler will generate some code like this:
pB->__vtable[0]();
(NB. this won't compile; it's just an explanation of the code the compiler will generate.)
To make it more obvious, let's say A declares another function, MyAFn(), which is virtual, which B does not over-ride/re-implement.
So the code would be:
class A
{
public:
virtual void MyAFn() { return 17; }
virtual void MyFn() { return 42; }
};
class B : public A
{
public:
virtual void MyFn() { return 13; }
};
then B will have the functions MyAFn() and MyFn() in its interface, and the vtables will now look like this:
vtable for A class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> A::MyFn() |
+----------+---------------+
vtable for B class:
+----------+---------------+
| 0: MyAFn | -> A::MyAFn() |
+----------+---------------+
| 1: MyFn | -> B::MyFn() |
+----------+---------------+
So in this case, to call MyFn(), the compiler will generate code like this:
pB->__vtable[1]();
Because MyFn() is second in the table (and so at index 1).
Obviously, calling MyAFn() will cause code like this:
pB->__vtable[0]();
because MyAFn() is at index 0.
It should be emphasised that this is compiler-dependent, and iirc, the compiler is under no obligation to order the functions in the vtable in the order they are declared - it's just up to the compiler to make it all work under the hood.
In practice, this scheme is widely used, and function ordering in vtables is fairly deterministic, so ABI between code generated by different C++ compilers is maintained, and allows COM interoperation and similar mechanisms to work across boundaries of code generated by different compilers. This is in no way guaranteed.
Luckily, you'll never have to worry much about vtables, but it's definitely useful to get your mental model of what is going on to make sense and not store up any surprises for you in the future.
More theoretically, if you derive one class from another, you have a base class and a derived class. If you create an object of a derived class, you have a derived object. In C++, you can inherit from the same class multiple times. Consider:
struct A { };
struct B : A { };
struct C : A { };
struct D : B, C { };
D d;
In the d object, you have two A objects within each D objects, which are called "base-class sub-objects". If you try to convert D to A, then the compiler will tell you the conversion is ambiguous, because it doesn't know to which A object you want to convert:
A &a = d; // error: A object in B or A object in C?
Same goes if you name a non-static member of A: The compiler will tell you about an ambiguity. You can circumvent it in this case by converting to B or C first:
A &a = static_cast<B&>(d); // A object in B
The object d is called the "most derived object", because it's not a sub-object of another object of class type. To avoid the ambiguity above, you can inherit virtually
struct A { };
struct B : virtual A { };
struct C : virtual A { };
struct D : B, C { };
Now, there is only one subobject of type A, even though you have two subobject that this one object is contained in: subobject B and sub-object C. Converting a D object to A is now non-ambiguous, because conversion over both the B and the C path will yield the same A sub-object.
Here comes a complication of the above: Theoretically, even without looking at any implementation technique, either or both of the B and C sub-objects are now not contiguous anymore. Both contain the same A object, but both doesn't contain each other either. This means that one or both of those must be "split up" and merely reference the A object of the other, so that both B and C objects can have different addresses. In linear memory, this may look like (let's assume all objecs have size of 1 byte)
C: [1 byte [A: refer to 0xABC [B: 1byte [A: one byte at 0xABC]]]]
[CCCCCCC[ [BBBBBBBBBBCBCBCBCBCBCBCBCBCBCB]]]]
CB is what both the C and the B sub-object contains. Now, as you see, the C sub-object would be split up, and there is no way without, because B is not contained in C, and neither the other way around. The compiler, to access some member using code in a function of C, can't just use an offset, because the code in a function of C doesn't know whether it's contained as a sub-object, or - when it's not abstract - whether it's a most derived object and thus has the A object directly next to it.
a public colon. ( I told you C++ was nasty )
class base { }
class derived : public base { }
let's have:
class Base {
virtual void f();
};
class Derived : public Base {
void f();
}
without f being virtual (as implemented in pseudo "c"):
struct {
BaseAttributes;
} Base;
struct {
BaseAttributes;
DerivedAttributes;
} Derived;
with virtual functions:
struct {
vfptr = Base_vfptr,
BaseAttributes;
} Base;
struct {
vfptr = Derived_vfptr,
BaseAttributes;
DerivedAttributes;
} Derived;
struct {
&Base::f
} Base_vfptr
struct {
&Derived::f
} Base_vfptr
For multiple inheritance, things get more complicated :o)
Derived is Base, but Base is not a Derived
base- is the object you are deriving from.
derived - is the object the inherits his father's public (and protected) members.
a derived object can override (or in some cases must override) some of his father's methods, thus creating a different behavior
A base object is one from which others are derived. Typically it'll have some virtual methods (or even pure virtual) that subclasses can override to specialize.
A subclass of a base object is known as a derived object.
The derived object is derived from its base object(s).
Are you asking about the respective objects' representation in memory?
Both the base class and the derived class will have a table of pointers to their virtual functions. Depending on which functions have been overridden, the value of entries in that table will change.
If B adds more virtual functions that aren't in the base class, B's table of virtual methods will be larger (or there may be a separate table, depending on compiler implementation).
What's the difference between a derived object and a base object in c++,
A derived object can be used in place of a base object; it has all the members of the base object, and maybe some more of its own. So, given a function taking a reference (or pointer) to the base class:
void Function(Base &);
You can pass a reference to an instance of the derived class:
class Derived : public Base {};
Derived derived;
Function(derived);
especially, when there is a virtual function in the class.
If the derived class overrides a virtual function, then the overridden function will always be called on objects of that class, even through a reference to the base class.
class Base
{
public:
virtual void Virtual() {cout << "Base::Virtual" << endl;}
void NonVirtual() {cout << "Base::NonVirtual" << endl;}
};
class Derived : public Base
{
public:
virtual void Virtual() {cout << "Derived::Virtual" << endl;}
void NonVirtual() {cout << "Derived::NonVirtual" << endl;}
};
Derived derived;
Base &base = derived;
base.Virtual(); // prints "Derived::Virtual"
base.NonVirtual(); // prints "Base::NonVirtual"
derived.Virtual(); // prints "Derived::Virtual"
derived.NonVirtual();// prints "Derived::NonVirtual"
Does the derived object maintain additional tables to hold the pointers to functions?
Yes - both classes will contain a pointer to a table of virtual functions (known as a "vtable"), so that the correct function can be found at runtime. You can't access this directly, but it does affect the size and layout of the data in memory.