Virtual multiple inheritance - final overrider - c++

While trying to analyse the C++ inheritance mechanism in greater depth, I stumbled upon the following example:
#include<iostream>
using namespace std;
class Base {
public:
virtual void f(){
cout << "Base.f" << endl;
}
};
class Left : public virtual Base {
};
class Right : public virtual Base{
public:
virtual void f(){
cout << "Right.f" << endl;
}
};
class Bottom : public Left, public Right{
};
int main(int argc,char **argv)
{
Bottom* b = new Bottom();
b->f();
}
The above somehow compiles and calls Right::f(). I can see what might be going on in the compiler: it understands that there is one shared Base object and that Right overrides f(). But in my understanding there should be two methods: Left::f() (inherited from Base::f()) and Right::f(), which overrides Base::f(). Given that Bottom inherits two separate methods with the same signature, I would expect a clash.
Could anyone explain which specification detail of C++ deals with this case and how it does it from the low-level perspective?

In the dreaded diamond there is a single base, from which two intermediate types derive; a fourth type then closes the diamond by inheriting from both intermediate types.
Your question seems to be: how many f functions are declared in the example above? The answer is one.
Let's start with the simpler example of a linear hierarchy with just a base and a derived class:
struct base {
virtual void f() {}
};
struct derived : base {
virtual void f() {}
};
In this example there is a single f declared, for which there are two overriders, base::f and derived::f. In an object of type derived, the final overrider is derived::f. It is important to note that both f functions represent a single function that has multiple implementations.
Now, going back to the original example: on the right-hand side of the diamond, Base::f and Right::f are, in the same way, one function that is overridden, so for a complete object of type Right the final overrider is Right::f. For a complete object of type Left, the final overrider is Base::f, as Left does not override the function.
When the diamond is closed, because the inheritance is virtual, there is a single Base object, which declares a single f function. In the second level of inheritance, Right overrides that function with its own implementation, and that is the final overrider for the most derived type Bottom.
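A minimal sketch of that rule, assuming the question's hierarchy with f() changed to return a string (so the result can be checked rather than printed); the call_via_* helpers are hypothetical names. Whichever static type is used to reach the Bottom object, the call lands in the single final overrider, Right::f.

```cpp
#include <cassert>
#include <string>

// The question's hierarchy; f() returns its name instead of printing it.
struct Base {
    virtual ~Base() = default;
    virtual std::string f() { return "Base.f"; }
};
struct Left : virtual Base {};  // does not override f
struct Right : virtual Base {
    std::string f() override { return "Right.f"; }
};
struct Bottom : Left, Right {};  // one shared Base, one f, final overrider Right::f

// Hypothetical helpers: call f() through each static type that can refer to a Bottom.
std::string call_via_base(Base& b) { return b.f(); }
std::string call_via_left(Left& l) { return l.f(); }    // names Base::f, dispatches to Right::f
std::string call_via_right(Right& r) { return r.f(); }
```

For a complete Left object, call_via_left returns "Base.f", because in a complete Left the final overrider is still Base::f.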
You might want to look at this from outside the standard and see how compilers actually implement it. When creating the Base object, the compiler adds a hidden pointer, the vptr, to the virtual table. The virtual table holds pointers to thunks (for simplicity, just assume that the table holds pointers to the function's final overriders [1]). In this case, the Base object contains no member data, just a pointer to a table that holds a pointer to the function Base::f.
When Left extends Base, a new vtable is created for Left, and the pointer in that vtable is set to the final overrider of f at this level, which happens to be Base::f, so the pointers in both vtables (ignoring the trampoline) jump to the same actual implementation. When an object of type Left is being constructed, the Base subobject is initialized first, and then, before the members of Left (if there were any) are initialized, the Base::vptr pointer is updated to refer to Left::vtable (i.e. the pointer stored in Base refers to the table defined for Left).
On the other side of the diamond, the vtable created for Right contains a single thunk that ends up calling Right::f. If an object of type Right were created, the same initialization process would happen, and the Base::vptr would point to Right's vtable, whose entry for f leads to Right::f.
Now we get to the final object Bottom. Again, a vtable is generated for the type Bottom and that vtable, as is the case in all others, contains a single entry that represents f. The compiler analyzes the hierarchy of inheritance and determines that Right::f overrides Base::f, and there is no equivalent override on the left branch, so in Bottom's vtable the pointer representing f refers to Right::f. Again, during construction of the Bottom object, the Base::vptr is updated to refer to Bottom's vtable.
As you see, all four vtables have a single entry for f, there is a single f in the program, even if the value stored in each vtable is different (the final overriders differ).
[1] The thunk is a small piece of code that adapts the this pointer if needed (multiple inheritance usually implies it is needed) and then forwards the call to the actual override. In the event of single inheritance, the this pointer does not need to be updated and the thunk disappears, with the entry in the vtable pointing directly to the actual function.
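The need for a this adjustment can be observed without reading generated code. A sketch, assuming the same hierarchy with a data member added to each intermediate class (left_data, right_data and the helper functions are illustrative names): both paths reach one Base subobject, while Left and Right sit at different addresses. So when Right::f is reached through the shared Base (or Left) part of a Bottom, this must be adjusted to point at the Right subobject; in typical implementations that adjustment is what the thunk does.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int f() { return 0; }
};
struct Left : virtual Base { int left_data = 1; };
struct Right : virtual Base {
    int right_data = 2;
    int f() override { return right_data; }  // reads a member of Right, so 'this' must point at Right
};
struct Bottom : Left, Right {};

// Virtual inheritance: both paths lead to the same Base subobject.
bool single_base_subobject(Bottom& b) {
    Base* via_left = static_cast<Left*>(&b);
    Base* via_right = static_cast<Right*>(&b);
    return via_left == via_right;
}

// Left and Right are distinct non-empty subobjects, so their addresses differ.
bool left_and_right_differ(Bottom& b) {
    const void* left = static_cast<Left*>(&b);
    const void* right = static_cast<Right*>(&b);
    return left != right;
}

// Calls f() through a Left reference; the result still comes from Right::f.
int call_through_left(Bottom& b) { return static_cast<Left&>(b).f(); }
```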

Related

Does making a derived C++ class "final" change the ABI?

I'm curious if marking an existing derived C++ class as final to allow for de-virtualisation optimisations will change ABI when using C++11. My expectation is that it should have no effect as I see this as primarily a hint to the compiler about how it can optimise virtual functions and as such I can't see any way it would change the size of the struct or the vtable, but perhaps I'm missing something?
I'm aware this changes API here so that code that further derives from this derived class will no longer work, but I'm only concerned about ABI in this particular case.
Marking a function declaration X::f() final implies that the declaration cannot be overridden, so all calls that name that declaration can be bound early (but not calls that name a declaration in a base class). If a virtual function is final in the ABI, the vtables produced can be incompatible with the ones produced for an almost identical class without final: calls to virtual functions that name declarations marked final can be assumed to be direct, so relying on a vtable entry (one that should exist in the final-less ABI) is invalid.
The compiler could use the final guarantee to cut the size of vtables (which can sometimes grow large) by not adding a new entry that would usually be added, and that must be added according to the ABI for a non-final declaration.
Entries are added for a declaration that overrides a function from a base that is not (inherently, always) primary, or that has a non-trivially covariant return type (a return type covariant on a non-primary base).
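A small sketch of the distinction between calls that name a final declaration and calls that name a base declaration; Base, X, Y and the helper functions are hypothetical names. Whether a particular compiler actually devirtualizes the first call is an optimization choice that the code cannot observe; the sketch only shows which targets are possible.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string f() { return "Base::f"; }
};
struct X : Base {
    std::string f() final { return "X::f"; }  // no class derived from X may override f
};
struct Y : X {};  // still allowed: only the function is final, not the class

// Names X::f: every possible target is X::f, so the call can be bound early.
std::string call_through_x(X& x) { return x.f(); }
// Names Base::f: the dynamic type might not derive from X, so dispatch stays virtual.
std::string call_through_base(Base& b) { return b.f(); }
```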
Inherently primary base class: the simplest case of polymorphic inheritance
The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, everything is simple.
These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)
Considering the case where the return type isn't covariant; or:
Trivial covariance
An example: the case where it's covariant with the same type as *this; as in:
struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance
Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.
Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.
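A sketch of the trivial-covariance case, using the B and D names from the text with assumed function bodies (each f returns this). D::f overrides B::f with a narrower return type; when the call goes through B, the returned D* is converted to B* by the ordinary derived-to-base conversion, which in common ABIs such as the Itanium C++ ABI (where B is the primary base at offset zero) costs nothing.

```cpp
#include <cassert>

struct B {
    virtual ~B() = default;
    virtual B* f() { return this; }
};
struct D : B {
    D* f() override { return this; }  // trivially covariant: D* instead of B*
};

// Calling through D: the static result type is D*, no cast needed.
D* call_on_derived(D& d) { return d.f(); }
// Calling through B: the same final overrider (D::f) runs; the result is typed B*.
B* call_on_base(B& b) { return b.f(); }
```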
Conclusion
In these cases the type of the declaration of the overriding function is trivially different from the type of the base:
all parameters are almost the same (with only a trivial difference on the type of this)
the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)
(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically
So no vtable entry is added for the derived declaration.
(So making the class final wouldn't simplify the vtable.)
Never primary base
Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.
(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.
The conceptually simplest interesting example of a non primary, and non trivial pointer conversion is:
struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };
Each base has its own vptr scalar member, and these vptr have different purposes:
B1::vptr points to a B1_vtable structure
B2::vptr points to a B2_vtable structure
and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because
The vtables have distinct entries:
B1_vtable.f_ptr points to the final overrider for B1::f()
B2_vtable.f_ptr points to the final overrider for B2::f()
B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)
The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)
(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)
(**) A callable runtime function is any entry point: any address that can be called/jumped at; it can be a normal function code, a thunk/trampoline, a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".
(***) That they sometimes have the same final overrider in a further derived class:
struct DD : D { void f(); }
isn't useful for the purpose of defining the ABI of D.
So we see that D provably needs a non-primary polymorphic base; by convention it will be B2: the first nominated polymorphic base (B1) gets to be primary.
So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.
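The B1/B2 example can be made runnable, assuming return values 1, 2 and 3 so each final overrider can be identified; the helper names are illustrative. In D the two bases keep separate final overriders, in DD a single declaration overrides both, and the two base subobjects necessarily sit at different addresses.

```cpp
#include <cassert>

struct B1 {
    virtual ~B1() = default;
    virtual int f() { return 1; }
};
struct B2 {
    virtual ~B2() = default;
    virtual int f() { return 2; }
};
struct D : B1, B2 {};  // no override: B1::f and B2::f keep separate final overriders
struct DD : D {
    int f() override { return 3; }  // one declaration overrides both
};

int call_via_b1(B1& b) { return b.f(); }
int call_via_b2(B2& b) { return b.f(); }

// B1 and B2 are distinct non-empty subobjects, so they cannot share an address:
// at least one derived-to-base conversion must add a non-zero offset.
bool bases_at_distinct_addresses(D& d) {
    const void* p1 = static_cast<B1*>(&d);
    const void* p2 = static_cast<B2*>(&d);
    return p1 != p2;
}
```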
So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:
D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).
a virtual member B2::g that is overridden in D needs two vtable entries: one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable, which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.
Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.
Note that in theory it would be possible to omit the entry for D::g() in D_B1_vtable by making all virtual calls of D::g() go through the B2 base; this is a possibility as long as no non trivial covariance is used(#).
(#) or if non trivial covariance occurs, "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used
Not inherently primary base
Regular (non virtual) inheritance is simple like membership:
a non virtual base subobject is a direct base of exactly one object (which implies that there is always exactly one final overrider of any virtual function when virtual inheritance isn't used);
the placement of a non virtual base is fixed;
base subobjects that don't have virtual base subobjects are, just like data members, constructed exactly like complete objects (they have exactly one runtime constructor function for every defined C++ constructor).
A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in an arbitrary derived class object (which may or may not be a most derived object) it is a value computed at runtime.
That offset cannot be known in general, because C++ supports both unifying and duplicating inheritance:
virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;
non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):
struct B { virtual void f(); };
struct D1 : B { virtual void f(); }; // final overrider
struct D2 : B { virtual void f(); }; // final overrider
struct DD : D1, D2 { };
Here DD has two distinct final overriders of B::f():
DD::D1::f() is final overrider for DD::D1::B::f()
DD::D2::f() is final overrider for DD::D2::B::f()
in two distinct vtable entries.
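A runnable version of the duplicating example, assuming f() returns an int (instead of void) so the two final overriders can be told apart; the helper names are hypothetical. A DD* cannot be converted straight to B* because there are two B subobjects, so each helper picks one by going through D1 or D2 first.

```cpp
#include <cassert>

struct B {
    virtual ~B() = default;
    virtual int f() { return 0; }
};
struct D1 : B { int f() override { return 1; } };  // final overrider for DD::D1::B::f
struct D2 : B { int f() override { return 2; } };  // final overrider for DD::D2::B::f
struct DD : D1, D2 {};  // two B subobjects; converting DD* directly to B* is ambiguous

// Select a B subobject explicitly through D1 or D2.
int call_via_d1_base(DD& dd) { B& b = static_cast<D1&>(dd); return b.f(); }
int call_via_d2_base(DD& dd) { B& b = static_cast<D2&>(dd); return b.f(); }

// The two B subobjects are distinct objects, hence distinct addresses.
bool two_b_subobjects(DD& dd) {
    B* b1 = static_cast<D1*>(&dd);
    B* b2 = static_cast<D2*>(&dd);
    return b1 != b2;
}
```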
Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).
Not only does C++ support both, it also allows combining them: duplicating inheritance of a class that uses unifying inheritance:
struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };
There is only one DDD::VB but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantics) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be the same as the address of DDD::DD2::D.
So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
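The DDD example can be checked directly. A sketch, assuming g() returns an int so the two final overriders can be told apart; vb_offset_in and the g_via_* helpers are hypothetical. There is one VB and two D subobjects at different addresses, so the D-to-VB distance differs between the two D subobjects of the same object, which is why that offset cannot be a compile-time constant for D. The offset is computed through the implementation-defined pointer-to-integer mapping, which is fine for illustration on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>

struct VB {
    virtual ~VB() = default;
    virtual void f() {}
};
struct D : virtual VB {
    virtual int g() { return 0; }
    int dummy = 0;
};
struct DD1 : D { int g() override { return 1; } };
struct DD2 : D { int g() override { return 2; } };
struct DDD : DD1, DD2 {};  // one VB, two D subobjects

// Byte distance from a D subobject to its VB subobject (illustrative only:
// relies on the implementation-defined pointer-to-integer mapping).
std::intptr_t vb_offset_in(D& d) {
    auto d_addr = reinterpret_cast<std::intptr_t>(&d);
    auto vb_addr = reinterpret_cast<std::intptr_t>(static_cast<VB*>(&d));
    return vb_addr - d_addr;
}

int g_via_dd1(DDD& x) { D& d = static_cast<DD1&>(x); return d.g(); }
int g_via_dd2(DDD& x) { D& d = static_cast<DD2&>(x); return d.g(); }
```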
In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member, as it is a type "invariant" (not const) shared member: it is fixed in the constructor (invariant after complete construction) and its semantics are shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, in a DDD the VB subobject can be overlaid on DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.
This however cannot be the case for virtual bases that have non invariant scalar members, that is, regular data members with an identity, occupying a distinct range of bytes: these "real" data members cannot be overlaid on anything else. So a virtual base subobject with data members (members with an address guaranteed to be distinct by C++, or by any other C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non trivial offsets.
(##) with potentially a very narrow special case: a derived class with no data members that has a virtual base with some data members
So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual bases are candidates for overlaying on derived classes; they are potential primaries but not inherent primaries:
the offset at which they reside will only be determined in the most derived class;
the offset might or might not be zero;
a zero offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;
a non-zero offset implies non trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except when overlaid, obviously, as it would be neither necessary nor possible).
This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.
A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...
...a non-morally virtual base is inherently reversible in every case:
there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);
the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;
(in any C++ like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in common case with simple hierarchies). Note that:
static_cast<D*>((B*)d) is ill formed;
dynamic_cast<D*>((B*)d) will work for the simple cases.
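The asymmetry can be shown with the D and VB names from the surrounding text, plus a hypothetical DDD that contains two D subobjects sharing one VB; back_to_d is an illustrative helper. dynamic_cast succeeds in the simple case and returns a null pointer when the VB subobject is shared by two D subobjects, because there is no single D to go back to.

```cpp
#include <cassert>

struct VB { virtual ~VB() = default; };
struct D : virtual VB {};
struct DD1 : D {};
struct DD2 : D {};
struct DDD : DD1, DD2 {};  // one VB shared by two D subobjects

// static_cast<D*>(vb) would be ill-formed here because VB is a virtual base of D;
// dynamic_cast consults the dynamic type at runtime instead.
D* back_to_d(VB* vb) { return dynamic_cast<D*>(vb); }
```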
So let's call virtual covariance the case where the covariance of the return type is based on a morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:
struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary
struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};
Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.
When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.
Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:
there are additional guarantees/invariants on that scalar member, and the vtable is extended;
no new vptr is needed for Da.
So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():
in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.
in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which is inherently the same as the final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.
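A runnable version of the Ba/Da example, with assumed bodies: Da::g returns the address of a hypothetical member D object, and the g_via_* helpers are illustrative. Both calls reach the same final overrider, but the result comes back as a VB* through Ba and as a D* through Da. In typical implementations the VB* result is produced by a return-adjusting thunk, which is the second vtable entry described above.

```cpp
#include <cassert>

struct VB {
    virtual ~VB() = default;
    virtual void f() {}
};
struct D : virtual VB {};  // VB is a virtual base of D
struct Ba {
    virtual ~Ba() = default;
    virtual VB* g() { return nullptr; }
};
struct Da : Ba {
    D* g() override { return &member; }  // virtually covariant: D -> VB is morally virtual
    D member;                            // hypothetical object to return
};

// Call through the Ba declaration: the result is converted to VB*.
VB* g_via_ba(Ba& b) { return b.g(); }
// Call through the Da declaration: the result stays a D*.
D* g_via_da(Da& d) { return d.g(); }
```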
Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.
Note that making the virtual function body inline and visible to the ABI (so that classes with different inline function definitions could have incompatible ABIs, allowing more information to inform memory layout) wouldn't help, as inline code would only define what a call to a non overridden virtual function does: one cannot base ABI decisions on choices that can be overridden in derived classes.
[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:
struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};
Note that in that code an incorrect code generation for a virtual call by a buggy compiler that would use the Ba_vtable entry to call g() would actually work because covariance ends up being trivial, as VB is primary in complete D.
The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.
--end example]
But if Da::g() is final in the ABI, virtual calls can only be made via the VB * g(); declaration: covariance is made purely static, and the derived to base conversion is done at compile time as the last step of the virtual thunk, as if virtual covariance were never used.
Possible extension of final
There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (matched by class name). If final stops the overriding of a virtual function, could it be applied to base classes in a C++-like language?
First we need to define what is overriding a virtual base inheritance:
An "almost direct" subobject relation means that an indirect subobject is controlled almost as a direct subobject:
an almost direct subobject can be initialized like a direct subobject;
access control is never really an obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).
Virtual inheritance provides almost direct access:
each virtual base is initialized by the constructor of the most derived class (explicitly in its ctor-init-list, or implicitly by default construction);
when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.
A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:
struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};
Now C++ variants that support both forms of inheritance don't have to have C++ semantic of almost direct access in all derived classes:
struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};
Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.
struct DDD : DD {
DDD () :
VB() // error: not an almost direct subobject
{ }
};
struct DD2 : D, virtual final VB {
// DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{ // for ": virtual VB"
};
The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB but virtual-ness of VB requires unification which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:
struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};
Here UselessAbstract is abstract and so is every derived class, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.
-- end analog/example]
That would provide a way to freeze virtual inheritance, to provide meaningful private inheritance of classes with virtual base (without it derived classes can usurp the relationship between a class and its private base class).
Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of virtual base isn't fixed.
I believe that adding the final keyword should not be ABI breaking, however removing it from an existing class might render some optimizations invalid. For example, consider this:
// in car.h
struct Vehicle { virtual void honk() { } };
struct Car final : Vehicle { void honk() override { } };
// in car.cpp
// Here, the compiler can assume that no derived class of Car can be passed,
// and so `honk()` can be devirtualized. However, if Car is not final
// anymore, this optimization is invalid.
void foo(Car* car) { car->honk(); }
If foo is compiled separately and e.g. shipped in a shared library, removing final (and hence making it possible for users to derive from Car) could render the optimization invalid.
I'm not 100% sure about this though, some of it is speculation.
If you do not introduce new virtual methods in your final class (only override methods of the parent class) you should be OK (the virtual table will have the same layout as the parent's, because the object must be usable through a pointer to the parent). If you do introduce virtual methods, the compiler can indeed ignore the virtual specifier and only generate standard methods, e.g.:
class A {
virtual void f();
};
class B final : public A {
virtual void f(); // <- should be ok
virtual void g(); // <- not ok
};
The idea is that every time in C++ you can invoke the method g(), you have a pointer/reference whose static and dynamic type is B: static because the method does not exist except in B and its children, dynamic because final ensures that B has no children. For this reason you never need virtual dispatch to call the right g() implementation (because there can be only one), and the compiler might (and should) not add it to the virtual table for B, while it is forced to do so if the method could be overridden. As far as I understand, this is basically the whole reason the final keyword exists.

Function resolution from vtable in C++

I am confused about vtables after reading more about name mangling.
For example:
class Base
{
public:
virtual void print()
{
}
};
class A : public Base
{
public:
void hello()
{
....
}
void print()
{
}
};
A obj;
obj.hello();
Base* test = new A();
test->print();
As I understand it, after name mangling the obj.hello() call will be converted to something like _ZASDhellov(&obj). Now, how will these virtual functions be invoked from the vtable?
My wild guess: is test->__vtable[_ZASDprintv](&test(dynamic cast to derived???)) correct?
How are function names resolved from the vtable?
Firstly, vtables are not in any way part of the C++ language, but rather an implementation detail used by particular compilers. Below I describe one way it is commonly used as such.
Second, your function hello is not virtual. To make it virtual, you would simply prepend virtual to the declaration.
Assuming it is now virtual: Your guess is quite close. In fact, the vtable (to which a pointer is stored with every instance of a virtual class) is an array of function pointers. The way that a particular function is looked up in it is by its ordinal. The first declared virtual function in A is the first entry in its vtable, the second one is the second entry and so on. If A had a base class, the index of A's first (non-override) virtual function in the table would be n+1, where n is the index of the last virtual function of its base class. If A has more than one base class, their entries precede A's entries in order of their declaration as base classes of A.
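The ordinal lookup can be sketched by hand. This is a hypothetical model, not what any particular compiler emits: Object, Slot, the *_vtable arrays and the k* indices are illustrative names, and it assumes hello has been made virtual as suggested above. The call site never uses a (mangled) name; it only uses the slot index.

```cpp
#include <cassert>
#include <string>

struct Object;                               // forward declaration
using Slot = std::string (*)(Object*);       // one vtable entry

struct Object {
    const Slot* vptr;  // hidden pointer, set by the "constructor"
};

std::string base_print(Object*) { return "Base::print"; }
std::string a_print(Object*) { return "A::print"; }
std::string a_hello(Object*) { return "A::hello"; }

// Slot 0 holds print, the only virtual function Base declares, so A's
// first new virtual function, hello, goes into the next slot (index 1).
const Slot base_vtable[] = {base_print};
const Slot a_vtable[] = {a_print, a_hello};

enum : int { kPrint = 0, kHello = 1 };

// Roughly what p->print() becomes: load the vptr, index by ordinal, call with p as 'this'.
std::string call_print(Object* p) { return p->vptr[kPrint](p); }
```

Dispatch is just an array index, so the mangled name of print never has to exist at runtime.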
If A uses virtual inheritance, the picture is a bit more complicated than that, I won't elaborate unless you're specifically interested.
UPDATE: I'll add a very brief description for the virtual inheritance case as requested. If A had Base as a virtual base class, A's vtable would store at the very beginning (before the function addresses) the byte offset of where Base's data starts within the A object. This is necessary because, unlike in normal inheritance, a base class does not have its data precede the derived class's data - instead it follows it. So in effect, any function call to a virtual function defined in Base has to have its this pointer offset by that amount. Additionally, Base would have to have its own vtable pointer, right at the beginning of its data where it expects to find it. Thus the full A object would contain two vtable pointers instead of one. The actual vtable pointed to by this second pointer would be the same one as the first vtable pointer, except advanced to skip the offset entry described above (so that any Base code using the vtable would find the first virtual function at the beginning where it is expected). Apart from these differences, the vtable itself is the same as before.

How do upcasting and vtables work together to ensure correct dynamic binding?

So, vtable is a table maintained by the compiler which contains function pointers that point to the virtual functions in that class.
and
Assigning a derived class's object to an ancestor class's object is called up-casting.
Up-casting is handling a derived class instance/object using a base class pointer or reference; the objects are not "assigned to", which implies an overwriting of value ala operator= invocation.
(Thanks to: Tony D)
Now, how it is known at run time "which" class's virtual function is supposed to be called?
Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?
You can imagine (although the C++ specification doesn't say this) that the vtable is an identifier (or some other metadata that can be used to "find more information" about the class itself) and a list of functions.
So, if we have a class like this:
class Base
{
public:
virtual void func1();
virtual void func2(int x);
virtual std::string func3();
virtual ~Base();
... some other stuff we don't care about ...
};
The compiler will then produce a VTable something like this:
struct VTable_Base
{
int identifier;
void (*func1)(Base* this);
void (*func2)(Base* this, int x);
std::string (*func3)(Base* this);
~Base(Base *this);
};
The compiler will then create an internal structure something like this (this cannot be compiled as C++; it's just to show what the compiler conceptually does, and I call it SBase to differentiate it from the actual class Base):
struct SBase
{
VTable_Base* vtable;
inline void func1(Base* this) { vtable->func1(this); }
inline void func2(Base* this, int x) { vtable->func2(this, x); }
inline std::string func3(Base* this) { return vtable->func3(this); }
inline ~Base(Base* this) { vtable->~Base(this); }
};
It also builds the real vtable:
VTable_Base vtable_base =
{
1234567, &Base::func1, &Base::func2, &Base::func3, &Base::~Base
};
And in the constructor for Base, it will set vtable = &vtable_base;.
When we then add a derived class, where we override one function (and by default, the destructor, even if we don't declare one) :
class Derived : public Base
{
virtual void func2(int x) override;
};
The compiler will now make this structure:
struct VTable_Derived
{
int identifier;
void (*func1)(Base* this);
void (*func2)(Base* this, int x);
std::string (*func3)(Base* this);
~Base(Derived *this);
};
and then does the same "structure" building:
struct SDerived
{
VTable_Derived* vtable;
inline void func1(Base* this) { vtable->func1(this); }
inline void func2(Base* this, int x) { vtable->func2(this, x); }
inline std::string func3(Base* this) { return vtable->func3(this); }
inline ~Derived(Derived* this) { vtable->~Derived(this); }
};
We need this structure for when we are using Derived directly rather than through the Base class.
(We rely on the compiler chaining ~Derived to call ~Base too, just as it does for ordinary destructors of derived classes.)
And finally, we build an actual vtable:
VTable_Derived vtable_derived =
{
7654339, &Base::func1, &Derived::func2, &Base::func3, &Derived::~Derived
};
And again, the Derived constructor will set vtable = &vtable_derived for all instances.
Edit to answer question in comments: The compiler has to carefully place the various components in both VTable_Derived and SDerived so that they match VTable_Base and SBase; then, when we have a pointer to Base, Base::vtable and Base::funcN() line up with Derived::vtable and Derived::funcN. If they don't line up, inheritance won't work.
If new virtual functions are added to Derived, they must then be placed after the ones inherited from Base.
End Edit.
So, when we do:
Base* p = new Derived;
p->func2(42);
the code will look up SBase::func2, which will call the correct Derived::func2, because the vtable that p->vtable actually points to is vtable_derived (installed by the Derived constructor, which runs as part of new Derived).
I'll take a different route from the other answers and try to fill just the specific gaps in your knowledge, without going very much into the details. I'll address the mechanics just enough to help your understanding.
So, vtable is a table maintained by the compiler which contains function pointers that point to the virtual functions in that class.
The more precise way to say this is as follows:
Every class with virtual methods, including every class that inherits from a class with virtual methods, has its own virtual table. The virtual table of a class points to the virtual methods specific to that class, i.e. either inherited methods, overridden methods or newly added methods. Every instance of such a class contains a pointer to the virtual table that matches the class.
Up-casting is handling a derived class instance/object using a base class pointer or reference; (...)
Perhaps more enlightening:
Up-casting means that a pointer or reference to an instance of class Derived is treated as if it were a pointer or reference to an instance of class Base. The instance itself, however, is still purely an instance of Derived.
(When a pointer is "treated as a pointer to Base", that means that the compiler generates code for dealing with a pointer to Base. In other words, the compiler and the generated code know no better than that they are dealing with a pointer to Base. Hence, a pointer that is "treated as" will have to point to an object that offers at least the same interface as instances of Base. This happens to be the case for Derived because of inheritance. We'll see how this works out below.)
At this point we can answer the first version of your question.
Now, how it is known at run time "which" class's virtual function is supposed to be called?
Suppose we have a pointer to an instance of Derived. First we upcast it, so it is treated as a pointer to an instance of Base. Then we call a virtual method upon our upcasted pointer. Since the compiler knows that the method is virtual, it knows to look for the virtual table pointer in the instance. While we are treating the pointer as if it points to an instance of Base, the actual object has not changed value and the virtual table pointer within it is still pointing to the virtual table of Derived. So at runtime, the address of the method is taken from the virtual table of Derived.
Now, the particular method may be inherited from Base or it might be overridden in Derived. It does not matter; if inherited, the method pointer in the virtual table of Derived simply contains the same address as the corresponding method pointer in the virtual table of Base. In other words, both tables are pointing to the same method implementation for that particular method. If overridden, the method pointer in the virtual table of Derived differs from the corresponding method pointer in the virtual table of Base, so method lookups on instances of Derived will find the overridden method while lookups on instances of Base will find the original version of the method — regardless of whether a pointer to the instance is treated as a pointer to Base or a pointer to Derived.
Finally, it should now be straightforward to explain why the second version of your question is a bit misguided:
Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?
This question presupposes that vtable lookups are first by method and then by class. It is the other way round: first, the vtable pointer in the instance is used to find the vtable for the right class. Then, the vtable for that class is used to find the right method.
Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?
None. It is not an entry in the vtable but the vtable pointer, which is part of each and every object instance, that determines which set of virtual functions is correct for that particular object. This way, depending on the actual vtable pointed to, invoking the "first virtual method" from the vtable may call different functions for objects of different types in the same polymorphic hierarchy.
Implementations may vary, but what I personally consider the most logical and efficient approach is to make the vtable pointer the first element in the class layout. That way you can dereference the very address of the object to determine its type from the value of the pointer stored there: all objects of a given type have that pointer pointing to the same vtable, which is created once for every class that has virtual methods, and that is what enables features such as overriding certain virtual methods.
How do upcasting and vtables work together to ensure correct dynamic binding?
Upcasting itself isn't strictly needed, and neither is downcasting. Remember that the object is already allocated in memory, and its vtable pointer is already set to the correct vtable for its type, which is what ensures correct binding. Up- and down-casting don't change the vtable of the object; they only change the pointer you operate through.
Downcasting is needed when you want to access functionality that is not available in the base class but is declared in the derived class. Before you try to do that, you must be sure that the particular object is of, or inherits from, the type that declares that functionality. That is where dynamic_cast comes in: for a dynamic_cast the compiler generates a runtime check that uses the object's vtable pointer to find its type information (generated at compile time) and checks whether that type is, or inherits from, the requested type. If so the cast succeeds; otherwise it fails, yielding nullptr for a pointer cast or throwing std::bad_cast for a reference cast.
The pointer you access the object through doesn't determine which set of virtual functions is called; it merely serves as a gauge of which functions in the vtable you, the developer, can refer to. That is why it is safe to upcast using a C-style or static cast, which performs no runtime check: you only narrow your gauge to the functions available in the base class, which are also available in the derived class, so there is no room for error or harm. And that is why you must use a dynamic cast (or some other technique still based on virtual dispatch) when you downcast, unless you already know the object's actual type: you have to be sure that the object's vtable really does contain the extra functionality you may invoke.
Otherwise you get undefined behavior, and of the "bad kind" at that: something fatal will most likely happen, since interpreting arbitrary data as the address of a function with a particular signature and calling it is a very big no-no.
Also note that in a static context, i.e. when the type is known at compile time, the compiler will most likely not use the vtable to call virtual functions but will use direct static calls or even inline certain functions, which makes them that much faster. In such cases, upcasting and using a base class pointer instead of the actual object can prevent that optimization.
Polymorphism and Dynamic Dispatch (hyper-abridged version)
Note: I was not able to fit enough information about multiple inheritance with virtual bases, as there is not much of anything simple about it, and the details would clutter the exposition (further). This answer demonstrates the mechanisms used to implement dynamic dispatch assuming only single inheritance.
Interpreting abstract types and their behaviors visible across module boundaries requires a common Application Binary Interface (ABI). The C++ standard, of course, does not require the implementation of any particular ABI.
An ABI would describe:
The layout of virtual method dispatch tables (vtables)
The metadata required for runtime type checks and cast operations
Name decoration (a.k.a. mangling), calling conventions, and many other things.
Both modules in the following example, external.so and main.o, are assumed to have been linked to the same runtime. Static and dynamic binding give preference to symbols located within the calling module.
An external library
external.h (distributed to users):
class Base
{
__vfptr_t __vfptr; // For exposition
public:
__attribute__((dllimport)) virtual int Helpful();
__attribute__((dllimport)) virtual ~Base();
};
class Derived : public Base
{
public:
__attribute__((dllimport)) virtual int Helpful() override;
~Derived()
{
// Visible destructor logic here.
// Note: This is in the header!
// __vft#Base gets treated like any other imported symbol:
// The address is resolved at load time.
//
this->__vfptr = &__vft#Base;
static_cast<Base *>(this)->~Base();
}
};
__attribute__((dllimport)) Derived *ReticulateSplines();
external.cpp:
#include "external.h" // the version in which the attributes are dllexport
__attribute__((dllexport)) int Base::Helpful()
{
return 47;
}
__attribute__((dllexport)) Base::~Base()
{
}
__attribute__((dllexport)) int Derived::Helpful()
{
return 4449;
}
__attribute__((dllexport)) Derived *ReticulateSplines()
{
return new Derived(); // __vfptr = &__vft#Derived in external.so
}
external.so (not a real binary layout):
__vft#Base:
[offset to __type_info#Base] <-- in external.so
[offset to Base::~Base] <------- in external.so
[offset to Base::Helpful] <----- in external.so
__vft#Derived:
[offset to __type_info#Derived] <-- in external.so
[offset to Derived::~Derived] <---- in external.so
[offset to Derived::Helpful] <----- in external.so
Etc...
__type_info#Base:
[null base offset field]
[offset to mangled name]
__type_info#Derived:
[offset to __type_info#Base]
[offset to mangled name]
Etc...
An application using the external library
special.hpp:
#include <iostream>
#include <stdexcept>
#include "external.h"
class Special : public Base
{
public:
int Helpful() override
{
return 55;
}
virtual void NotHelpful()
{
throw std::runtime_error{"derp"};
}
};
class MoreDerived : public Derived
{
public:
int Helpful() override
{
return 21;
}
~MoreDerived()
{
// Visible destructor logic here
this->__vfptr = &__vft#Derived; // <- the version in main.o
static_cast<Derived *>(this)->~Derived();
}
};
class Related : public Base
{
public:
virtual void AlsoHelpful() = 0;
};
class RelatedImpl : public Related
{
public:
void AlsoHelpful() override
{
using namespace std;
cout << "The time for action... Is now!" << endl;
}
};
main.cpp:
#include "special.hpp"
int main(int argc, char **argv)
{
Base *ptr = new Base(); // ptr->__vfptr = &__vft#Base (in external.so)
auto r = ptr->Helpful(); // calls "Base::Helpful" in external.so
// r = 47
delete ptr; // calls "Base::~Base" in external.so
ptr = new Derived(); // ptr->__vfptr = &__vft#Derived (in main.o)
r = ptr->Helpful(); // calls "Derived::Helpful" in external.so
// r = 4449
delete ptr; // calls "Derived::~Derived" in main.o
ptr = ReticulateSplines(); // ptr->__vfptr = &__vft#Derived (in external.so)
r = ptr->Helpful(); // calls "Derived::Helpful" in external.so
// r = 4449
delete ptr; // calls "Derived::~Derived" in external.so
ptr = new Special(); // ptr->__vfptr = &__vft#Special (in main.o)
r = ptr->Helpful(); // calls "Special::Helpful" in main.o
// r = 55
delete ptr; // calls "Base::~Base" in external.so
ptr = new MoreDerived(); // ptr->__vfptr = & __vft#MoreDerived (in main.o)
r = ptr->Helpful(); // calls "MoreDerived::Helpful" in main.o
// r = 21
delete ptr; // calls "MoreDerived::~MoreDerived" in main.o
return 0;
}
main.o:
__vft#Derived:
[offset to __type_info#Derived] <-- in main.o
[offset to Derived::~Derived] <--- in main.o
[offset to Derived::Helpful] <---- stub that jumps to import table
__vft#Special:
[offset to __type_info#Special] <-- in main.o
[offset to Base::~Base] <---------- stub that jumps to import table
[offset to Special::Helpful] <----- in main.o
[offset to Special::NotHelpful] <-- in main.o
__vft#MoreDerived:
[offset to __type_info#MoreDerived] <---- in main.o
[offset to MoreDerived::~MoreDerived] <-- in main.o
[offset to MoreDerived::Helpful] <------- in main.o
__vft#Related:
[offset to __type_info#Related] <------ in main.o
[offset to Base::~Base] <-------------- stub that jumps to import table
[offset to Base::Helpful] <------------ stub that jumps to import table
[offset to Related::AlsoHelpful] <----- stub that throws PV exception
__vft#RelatedImpl:
[offset to __type_info#RelatedImpl] <--- in main.o
[offset to Base::~Base] <--------------- stub that jumps to import table
[offset to Base::Helpful] <------------- stub that jumps to import table
[offset to RelatedImpl::AlsoHelpful] <-- in main.o
Etc...
__type_info#Base:
[null base offset field]
[offset to mangled name]
__type_info#Derived:
[offset to __type_info#Base]
[offset to mangled name]
__type_info#Special:
[offset to __type_info#Base]
[offset to mangled name]
__type_info#MoreDerived:
[offset to __type_info#Derived]
[offset to mangled name]
__type_info#Related:
[offset to __type_info#Base]
[offset to mangled name]
__type_info#RelatedImpl:
[offset to __type_info#Related]
[offset to mangled name]
Etc...
Invocation is (or might not be) Magic!
Depending on the method and what can be proven at the binding side, a virtual method call may be bound statically or dynamically.
A dynamic virtual method call will read the target function's address from the vtable pointed to by a __vfptr member.
The ABI describes how functions are ordered in vtables. For example: They might be ordered by class, then lexicographically by mangled name (which includes information about const-ness, parameters, etc...). For single inheritance, this approach guarantees that a function's virtual dispatch index will always be the same, regardless of how many distinct implementations there are.
In the examples given here, destructors are placed at the beginning of each vtable, if applicable. If the destructor is trivial and non-virtual (not defined or does nothing), the compiler may elide it entirely, and not allocate a vtable entry for it.
Base *ptr = new Special{};
MoreDerived *md_ptr = new MoreDerived{};
// The cast below is checked statically, which would
// be a problem if "ptr" weren't pointing to a Special.
//
Special *sptr = static_cast<Special *>(ptr);
// In this case, it is possible to
// prove that "ptr" could point only to
// a Special, binding statically.
//
ptr->Helpful();
// Due to the cast above, a compiler might not
// care to prove that the pointed-to type
// cannot be anything but a Special.
//
// The call below might proceed as follows:
//
// reg = sptr->__vfptr[__index_of#Base::Helpful] = &Special::Helpful in main.o
//
// push sptr
// call reg
// pop
//
// This will indirectly call Special::Helpful.
//
sptr->Helpful();
// No cast required: LSP is satisfied.
ptr = md_ptr;
// Once again:
//
// reg = ptr->__vfptr[__index_of#Base::Helpful] = &MoreDerived::Helpful in main.o
//
// push ptr
// call reg
// pop
//
// This will indirectly call MoreDerived::Helpful
//
ptr->Helpful();
The logic above is the same for any invocation site that requires dynamic binding. In the example above, it doesn't matter exactly what type ptr or sptr point to; the code will just load a pointer at a known offset, then blindly call it.
Type casting: Ups and Downs
All information about a type hierarchy must be available to the compiler when translating a cast or function call expression. Symbolically, casting is just a matter of traversing a directed graph.
Up-casting in this simple ABI can be performed entirely at compile time. The compiler needs only to examine the type hierarchy to determine if the source and target types are related (there is a path from the source to the target in the type graph). By the substitution principle, a pointer to a MoreDerived also points to a Base and can be interpreted as such. The __vfptr member is at the same offset for all types in this hierarchy, so RTTI logic doesn't need to handle any special cases (in certain implementations of VMI, it would need to grab another offset from a type thunk to grab another vptr and so on...).
Down-casting, however, is different. Since casting from a base type to a derived type involves determining if the pointed-to object has a compatible binary layout, it is necessary to perform an explicit type check (conceptually, this is "proving" that the extra information exists beyond the end of the structure assumed at compile time).
Note that there are multiple vtable instances for the Derived type: One in external.so and one in main.o. This is because a virtual method defined for Derived (its destructor) appears in every translation unit that includes external.h.
Even though the logic is identical in both cases, both images in this example need to have their own copy. This is why type checking cannot be performed using addresses alone.
A down-cast is then performed by walking a type graph (copied in both images) starting from the source type decoded at runtime, comparing mangled names until the compile-time target is matched.
For example:
Base *ptr = new MoreDerived();
// ptr->__vfptr = &__vft#MoreDerived in main.o
//
// This provides the code below with a starting point
// for dynamic cast graph traversals.
// All searches start with the type graph in the current image,
// then all other linked images, and so on...
// This example is not exhaustive!
// Starts by grabbing &__type_info#MoreDerived
// using the offset within __vft#MoreDerived resolved
// at load time.
//
// This is similar to a virtual method call: Just grab
// a pointer from a known offset within the table.
//
// Search path:
// __type_info#MoreDerived (match!)
//
auto *md_ptr = dynamic_cast<MoreDerived *>(ptr);
// Search path:
// __type_info#MoreDerived ->
// __type_info#Derived (match!)
//
auto *d_ptr = dynamic_cast<Derived *>(ptr);
// Search path:
// __type_info#MoreDerived ->
// __type_info#Derived ->
// __type_info#Base (no match)
//
// Did not find a path connecting RelatedImpl to MoreDerived.
//
// rptr will be nullptr
//
auto *rptr = dynamic_cast<RelatedImpl *>(ptr);
At no point in the code above did ptr->__vfptr need to change. The static nature of type deduction in C++ requires the implementation to satisfy the substitution principle at compile time, meaning that the actual type of an object cannot change at runtime.
Summary
I've understood this question as one about the mechanisms behind dynamic dispatch.
To me, "Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?", is asking how a vtable works.
This answer is intended to demonstrate that type casting affects only the view of an object's data, and that the implementation of dynamic dispatch in these examples operates independently of it. However, type casting does affect dynamic dispatch in the case of multiple inheritance, where determining which vtable to use may require multiple steps (an instance of a type with multiple bases may have multiple vptrs).
casting
Casting is a concept associated with variables. Any variable can be cast, and it can be cast up or down.
char charVariable = 'A';
int intFromChar = charVariable; // upcasting (char widened to int)
int intVariable = 20;
char charFromInt = intVariable; // downcasting (int narrowed to char)
For built-in data types, whether a conversion is an upcast or a downcast depends mainly on how much memory the compiler allocates for the source and the target type.
Assigning a variable to a type that occupies more memory than the variable itself is called an upcast.
Assigning a variable to a type that occupies less memory than the variable itself is called a downcast.
A downcast causes problems when the value being converted can't fit into the smaller memory area.
Upcasting at class level
Just as with built-in data types, we can have objects of a base class and of a derived class. Converting a derived type to a base type is known as upcasting. It can be achieved by making a pointer to the base class point to an object of the derived class.
#include <iostream>
using namespace std;
class Base{
public:
void display(){
cout<<"Inside Base::display()"<<endl;
}
};
class Derived:public Base{
public:
void display(){
cout<<"Inside Derived::display()"<<endl;
}
};
int main(){
Base *baseTypePointer = new Derived(); // Upcasting
baseTypePointer->display(); // because we have upcasted, we want Derived::display() as the output
}
output
Inside Base::display()
Expected
Inside Derived::display()
In the above scenario the output wasn't as expected. Because display() is not virtual, the object has no v-table or vptr (virtual pointer), so the call is bound at compile time from the pointer's type, and the base pointer calls Base::display() even though it points to a Derived object.
To solve this problem, C++ gives us virtual functions: the base class display function needs to be declared virtual.
virtual void display()
The full code is:
#include <iostream>
using namespace std;
class Base{
public:
virtual void display(){
cout<<"Inside Base::display()"<<endl;
}
};
class Derived:public Base{
public:
void display(){
cout<<"Inside Derived::display()"<<endl;
}
};
int main(){
Base *baseTypePointer = new Derived(); // Upcasting
baseTypePointer->display(); // because we have upcasted, we want Derived::display() as the output
}
output
Inside Derived::display()
Expected
Inside Derived::display()
To understand this, we need to understand the v-table and vptr:
Whenever the compiler finds a virtual function in a class, it generates a virtual table for that class and for each of its derived classes.
If a virtual function is present, every object contains a vptr (virtual pointer) pointing to its class's vtable, and that vtable contains pointers to the class's versions of the virtual functions. When you call the function through the vptr, the virtual mechanism invokes the respective class's function, and we get the required output.
I believe this is best explained by implementing polymorphism in C. Given these two C++ classes:
class Foo {
virtual void foo(int);
};
class Bar : public Foo {
virtual void foo(int);
virtual void bar(double);
};
the C structure definitions (i. e. the header file) would look like this:
//For class Foo
typedef struct Foo_vtable {
void (*foo)(int);
} Foo_vtable;
typedef struct Foo {
Foo_vtable* vtable;
} Foo;
//For class Bar
typedef struct Bar_vtable {
Foo_vtable super;
void (*bar)(double);
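} Bar_vtable;
typedef struct Bar {
Foo super;
} Bar;
//Constructors, defined in foo.c and bar.c
void Foo_construct(Foo* me);
void Bar_construct(Bar* me);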
As you see, there are two structure definitions for each class, one for the vtable and one for the class itself. Note also that both structures for class Bar include a base class object as their first member which allows us upcasting: both (Foo*)myBarPointer and (Foo_vtable*)myBar_vtablePointer are valid. As such, given a Foo*, it is safe to find the location of the foo() member by doing
Foo* basePointer = ...;
(basePointer->vtable->foo)(7);
Now, let's take a look at how we can actually fill the vtables. For that we write some constructors that use some statically defined vtable instances. This is what the foo.c file could look like:
#include <stdio.h>
#include "..."
static void foo(int x) {
(void)x; //unused in this example
printf("Foo::foo() called\n");
}
static Foo_vtable vtable = {
.foo = &foo,
};
void Foo_construct(Foo* me) {
me->vtable = &vtable;
}
This makes sure that it is possible to execute (basePointer->vtable->foo)(7) on every object that has been passed to Foo_construct(). Now, the code for Bar is quite similar:
#include <stdio.h>
#include "..."
static void foo(int x) {
(void)x; //unused in this example
printf("Bar::foo() called\n");
}
static void bar(double x) {
(void)x; //unused in this example
printf("Bar::bar() called\n");
}
static Bar_vtable vtable = {
.super = {
.foo = &foo
},
.bar = &bar
};
void Bar_construct(Bar* me) {
Foo_construct(&me->super); //construct the base class.
(me->super.vtable->foo)(7); //This will print Foo::foo()
me->super.vtable = &vtable.super; //switch to Bar's vtable
(me->super.vtable->foo)(7); //This will print Bar::foo()
}
I have used static declarations for the member functions to avoid having to invent a new name for each implementation: static restricts the visibility of a function to its source file. However, it can still be called from other files through a function pointer.
Usage of these classes could look like this:
#include "..."
int main() {
//First construct two objects.
Foo myFoo;
Foo_construct(&myFoo);
Bar myBar;
Bar_construct(&myBar);
//Now make some pointers.
Foo* pointer1 = &myFoo;
Foo* pointer2 = (Foo*)&myBar;
Bar* pointer3 = &myBar;
//And the calls:
(pointer1->vtable->foo)(7); //prints Foo::foo()
(pointer2->vtable->foo)(7); //prints Bar::foo()
(pointer3->super.vtable->foo)(7); //prints Bar::foo()
(((Bar_vtable*)pointer3->super.vtable)->bar)(7.0); //prints Bar::bar()
}
Once you know how this works, you know how C++ vtables work. The only difference is that in C++ the compiler does the work that I did myself in the code above.
Let me try to explain it with some examples:
class Base
{
public:
virtual void function1() {cout<<"Base :: function1()\n";};
virtual void function2() {cout<<"Base :: function2()\n";};
virtual ~Base(){};
};
class D1: public Base
{
public:
~D1(){};
virtual void function1() { cout<<"D1 :: function1()\n";};
};
class D2: public Base
{
public:
~D2(){};
virtual void function2() { cout<< "D2 :: function2\n";};
};
So the compiler would generate three vtables, one for each class, since these classes have virtual functions (although this is compiler-dependent).
NOTE: vtables contain only pointers to virtual functions. Non-virtual functions are still resolved at compile time.
You are right in saying that vtables are essentially just pointers to functions. The vtables for these classes would look something like this:
vtable for Base:-
&Base::function1 ();
&Base::function2 ();
&Base::~Base ();
vtable for D1:-
&D1::function1 ();
&Base::function2 ();
&D1::~D1();
vtable for D2:-
&Base::function1 ();
&D2::function2 ();
&D2::~D2 ();
vptr is a pointer that is used to look up entries in this table. Each object of a polymorphic class has extra space allocated for the vptr (although where the vptr sits in the object is entirely implementation-dependent). Generally the vptr is at the beginning of the object.
Taking all this into account, if I call func, the code generated by the compiler checks at run time what b actually points to:
void func ( Base* b )
{
b->function1 ();
b->function2 ();
}
Let's say we pass an object of D1 to func. The calls would be resolved in the following manner:
First, the code fetches the vptr from the object and then uses it to get the correct address of the function to call. In this case the vptr gives access to D1's vtable, so when it looks up function1 it gets the address of D1::function1, because D1 overrides it. For the call to function2, it gets the address of Base::function2, which D1 inherits.
Hope I have clarified your doubts to your satisfaction...
The implementation is compiler-specific. Here I am going to present some thoughts that have NOTHING TO DO WITH ANY ACTUAL KNOWLEDGE of how exactly it is done in compilers, only with the minimal requirements needed for it to work as required. Keep in mind that each instance of a class with virtual methods knows at run time which class it belongs to.
Let's suppose we have a chain of base and derived classes of length 10 (so the most derived class has a great-great-...-grandfather).
We may call these classes base0 base1 ... base9 where base9 derives from base8 etc.
Each of these classes define a method as: virtual void doit(){ ... }
Let's suppose that in the base class we use that method inside a method called "dowith_doit", which is not overridden in any derived class.
The semantics of C++ imply that, depending on the actual class of the instance at hand, we must apply to that instance the "doit" defined in (or inherited by) that class.
Essentially we have two possible ways of doing it:
a) Assign to any such virtual method a number that must be different for each method defined in the chain of derived classes. In that case the number could be also a hash of the name of the method.
Each class defines a table with 2 columns, where the first column holds the number of the method and the second column the address of the function. In that case each class will have a vtable with as many rows as the number of virtual methods defined inside the class.
A method is executed by searching the class for the method under consideration. That search may be done linearly (slow) or by bisection (when the rows are ordered by method number).
b) Assign to every such method a progressively increasing integer (one for each distinct method in the chain of classes), and for each class define a table with only one column. For virtual methods defined inside the class, the function address will be in the row given by the number of the method. There will be many rows with null pointers, because a class doesn't always override the methods of its ancestors.
To improve efficiency, the implementation may choose to fill the null rows with the addresses held in the ancestor class of the class under consideration.
Essentially, no other simple ways exist to work with virtual methods efficiently.
I suppose that only the second solution (b) is used in actual implementations, because the trade-off between the space overhead of rows for non-overridden methods and the execution efficiency favors case (b) (taking into consideration, too, that methods are limited in number: maybe 10, 20 or 50, but not 5000).
Every object of a class with at least one virtual function gets a hidden member, here called vTable, that points to the class's virtual dispatch table (VDT).
class Base {
hidden: // not part of the language, just to illustrate.
static VDT baseVDT; // per class VDT for base
VDT *vTable; // per object instance
private:
...
public:
virtual int base1();
virtual int base2();
...
};
The VDT contains pointers to all virtual functions in Base.
As a hidden part of Base's constructor vTable gets assigned to baseVDT.
VDT Base::baseVDT[] = {
Base::base1,
Base::base2
};
class Derived : public Base {
hidden:
static VDT derivedVDT; // per class VDT for derived
private:
...
public:
virtual int base2();
virtual int derived3();
...
};
The VDT for Derived contains pointers to all virtual functions defined in Base, followed by the virtual functions first defined in Derived. When an object of type Derived gets constructed, its vTable gets set to derivedVDT.
VDT Derived::derivedVDT[] = {
// functions first defined in Base
Base::base1,
Derived::base2, // override
// functions first defined in Derived are appended
Derived::derived3
}; // function 2 has an override in derived.
Now if we have
Base *bd = new Derived;
Derived *dd = new Derived;
Base *bb = new Base;
bd points to an object of type Derived whose vTable points to derivedVDT.
So the function calls
x = bd->base2();
y = bb->base2();
actually become
// "base2" here is the index into vTable for base2.
x = bd->vTable["base2"](); // vTable points to derivedVDT
y = bb->vTable["base2"](); // vTable points to baseVDT
The index is the same in both due to the construction of the VDT. This also means the compiler knows the index at the moment of compilation.
This could also be implemented as
// call absolute address to virtual dispatch function which calls the right base2.
x = Base::base2Dispatch(bd->vTable["base2"]);
inline Base::base2Dispatch(void *call) {
return call(); // call through function pointer.
}
With -O2 or -O3 this compiles to the same code.
There are some special cases:
If dd points to a Derived (or more derived) object and base2 is declared final in Derived, then
z = dd->base2();
can actually be compiled as
z = Derived::base2(); // absolute call to final method.
If dd pointed to a Base object or anything else, you're in undefined-behaviour land, and the compiler can still do this.
The other case: if the compiler sees that there are only a few classes derived from Base, it could generate a guarded ("oracle") dispatch function for base2 that tests for the likely targets first. [Loosely after a talk by an MS or Intel compiler engineer at a C++ conference in 2012 or 2013(?), showing that (~500%?) more code gives a (2+ times?) speedup on average.]
inline Base::base2Dispatch(void *call) {
if (call == Derived::base2) // most likely from compilers static analysis or profiling.
return Derived::base2(); // call absolute address
if (call == Base::base2)
return Base::base2(); // call absolute address
// Backup catch all solution in case of more derived classes
return call(); // call through function pointer.
}
Why on earth would a compiler want to do this? More code is bad, and unneeded branches diminish performance!
Because calling through a function pointer is very slow on many architectures. An optimistic example:
Get the address from memory: 3+ cycles.
Pipeline delayed while waiting for the instruction-pointer value: 10 cycles, on some processors 19+ cycles.
If the most complex modern CPUs can predict the actual jump target [BTB] as well as they predict branches, this might be a loss. Otherwise the ~8 extra instructions easily save the 4*(3+10) instructions lost to pipeline stalls (if the misprediction rate is below 10-20%).
If the branches in the two ifs are both predicted taken (i.e. the conditions evaluate to false), the ~2 cycles lost are nicely covered by the memory latency of fetching the call address, and we are no worse off.
If one of the ifs is mispredicted, the BTB will most likely also be wrong. Then the cost of the misprediction is around 8 cycles, of which 3 are hidden by the memory latency; the correctly predicted not-taken branch or the second if might save the day, or else we pay the full 10+ cycle pipeline stall.
If only those 2 possibilities exist, one of them will be taken, so we save the pipeline stall of the function-pointer call and get at most one misprediction, resulting in performance no (significantly) worse than a direct call.
If the memory delay is longer and the result is correctly predicted the effect is much larger.

Clarification Needed on C++ Virtual Call Implementation

I have some doubts regarding virtual functions, or, better said, run-time polymorphism. I assumed it works as follows:
A Virtual Table (V-Table) is created for every class that has at least one virtual member function. I believe this is a static table, so it is created once per class and not for every object. Please correct me if I am wrong here.
This V-Table holds the addresses of the virtual functions. If the class has 4 virtual functions, the table has 4 entries pointing to the corresponding 4 functions.
The compiler adds a virtual pointer (V-Ptr) as a hidden member of the class. This virtual pointer points to the starting address of the virtual table.
Assume I have a program like this:
class Base
{
public:
virtual void F1() {}
virtual void F2() {}
virtual void F3() {}
virtual void F4() {}
};
class Der1 : public Base //Overrides only the first 2 functions of Base
{
public:
void F1() {} //Overrides Base::F1()
void F2() {} //Overrides Base::F2()
};
class Der2 : public Base //Overrides the remaining functions of Base
{
public:
void F3() {} //Overrides Base::F3()
void F4() {} //Overrides Base::F4()
};
int main()
{
Base* p1 = new Der1; //I believe the Vtable is populated at compile time itself
Base* p2 = new Der2;
p1->F1(); //how does it call Der1::F1()
p2->F3(); //how does it call Der2::F3()
}
If the V-Table is populated at compile time, why is it called run-time polymorphism? Please explain how many vtables and vptrs there are and how it works, using the above example. I think there will be 3 vtables, for Base, Der1 and Der2. Der1's vtable holds the addresses of its own F1() and F2(), whereas for F3() and F4() the addresses point to the Base class versions. Also, 3 vptrs will be added as hidden members in Base, Der1 and Der2. If everything is decided at compile time, what exactly happens at run time? Please correct me if I am wrong in the concept.
It's obviously implementation defined, but most implementations
are fairly similar, more or less along the lines you describe.
This is correct.
vtables contain more than just pointers to functions.
There's usually an entry pointing to the RTTI information, and
often some information concerning how to fix up the this pointer
when calling the function (although this can also be done using
trampolines). In the case of virtual bases, there could also be
an offset to the virtual base.
This is also correct. Note that during construction and
destruction, the compiler will change the vptr as the dynamic
type of the object changes, and that in the case of multiple
inheritance (with or without virtual bases), there will be more
than one vptr. (The vptr is at a fixed offset with
respect to the base address of the class, and in the case of
multiple inheritance, not all classes can have the same base
address.)
As to your final remarks: the vtables are populated at compile
time, and are static. But the vptr's are set at runtime,
according to the dynamic type, and the function call uses it to
find the vtable and dispatch the call.
In your (very simple) example, there are three vtables, one for
each class. Because only simple inheritance is involved, there
is only one vptr per instance, shared between Base and the
derived class. The vtable for Base will contain four slots,
pointing to Base::F1, Base::F2, Base::F3 and Base::F4.
The vtable for Der1 will also contain four slots, pointing to
Der1::F1, Der1::F2, Base::F3 and Base::F4. The vtable
for Der2 will point to Base::F1, Base::F2, Der2::F3 and
Der2::F4. The constructor for Base will set the vptr to the
table of Base; the constructor for the derived classes will
first call the constructor for the base class, then set the vptr
to the vtable corresponding to its type. (In practice, in such
simple cases, the compiler is probably capable of determining
that the vptr is never used in the constructor to Base, and so
skip setting it. In more complicated cases, where the compiler
cannot see all of the behavior of the base class constructor,
however, this is not the case.)
As to why it is called runtime polymorphism, consider
a function:
void f(Base* p)
{
p->F1();
}
The function actually called will be different, depending on
whether p points to a Der1 or a Der2. In other words, it
will be determined at runtime.
The C++ standard doesn't specify how virtual function calls have to be implemented, but here's a simplified example of the approach that practically all implementations use.
From a high-level perspective, the v-tables would look like this:
Base:
Index | Function Address
------|------------------
0 | Base::F1
1 | Base::F2
2 | Base::F3
3 | Base::F4
Der1:
Index | Function Address
------|------------------
0 | Der1::F1
1 | Der1::F2
2 | Base::F3
3 | Base::F4
Der2:
Index | Function Address
------|------------------
0 | Base::F1
1 | Base::F2
2 | Der2::F3
3 | Der2::F4
When you create p1 and p2, each object gets a hidden pointer (vptr) that points to Der1's vtable and Der2's vtable, respectively.
The call to p1->F1 basically means "call function 0 on p1's virtual table".
vptr[0] is Der1::F1, so it gets called.
It's called run-time polymorphism because the function that will be called for a specific object is determined at run-time (by making a look-up in the object's vtable).
It's implementation defined. When programming in C++, the only thing that should concern you is that if you declare a method virtual, the run-time contents of the object behind the pointer or reference will decide what code will be called.
Perhaps you should read about that topic first. Here is the C++ specific stuff.
I'm not going to go through four virtual functions and three types. Suffice it to say: for the ultimate base class, the vtable has pointers to the base class's version of every virtual function. For a derived class, the vtable has one pointer per virtual function: when the derived class overrides a base class function, the pointer for that function points to the derived class's version; when the derived class inherits a virtual function, the pointer points to the inherited function.

When is a vtable created in C++?

When exactly does the compiler create a virtual function table?
1) when the class contains at least one virtual function.
OR
2) when the immediate base class contains at least one virtual function.
OR
3) when any parent class at any level of the hierarchy contains at least one virtual function.
A related question to this:
Is it possible to give up dynamic dispatch in a C++ hierarchy?
e.g. consider the following example.
#include <iostream>
using namespace std;
class A {
public:
virtual void f();
};
class B: public A {
public:
void f();
};
class C: public B {
public:
void f();
};
Which classes will contain a V-Table?
Since B does not declare f() as virtual, does class C get dynamic polymorphism?
Beyond "vtables are implementation-specific" (which they are), if a vtable is used: there will be unique vtables for each of your classes. Even though B::f and C::f are not declared virtual, because there is a matching signature on a virtual method from a base class (A in your code), B::f and C::f are both implicitly virtual. Because each class has at least one unique virtual method (B::f overrides A::f for B instances and C::f similarly for C instances), you need three vtables.
You generally shouldn't worry about such details. What matters is whether you have virtual dispatch or not. You don't have to use virtual dispatch, by explicitly specifying which function to call, but this is generally only useful when implementing a virtual method (such as to call the base's method). Example:
struct B {
virtual void f() {}
virtual void g() {}
};
struct D : B {
virtual void f() { // would be implicitly virtual even if not declared virtual
B::f();
// do D-specific stuff
}
virtual void g() {}
};
int main() {
{
B b; b.g(); b.B::g(); // both call B::g
}
{
D d;
B& b = d;
b.g(); // calls D::g
b.B::g(); // calls B::g
// b.D::g(); // not allowed: D is not a base of B, so this line would not compile
d.D::g(); // calls D::g
void (B::*p)() = &B::g;
(b.*p)(); // calls D::g
// calls through a pointer to member function always use virtual dispatch
// (if the pointed-to function is virtual)
}
return 0;
}
Some concrete rules that may help; but don't quote me on these, I've likely missed some edge cases:
If a class has virtual methods or virtual bases, even if inherited, then instances must have a vtable pointer.
If a class declares non-inherited virtual methods (such as when it doesn't have a base class), then it must have its own vtable.
If a class has a different set of overriding methods than its first base class, then it must have its own vtable, and cannot reuse the base's. (Destructors commonly require this.)
If a class has multiple base classes, with the second or later base having virtual methods:
If no earlier bases have virtual methods and the Empty Base Optimization was applied to all earlier bases, then treat this base as the first base class.
Otherwise, the class must have its own vtable.
If a class has any virtual base classes, it must have its own vtable.
Remember that a vtable is similar to a static data member of a class, and instances have only pointers to these.
Also see the comprehensive article C++: Under the Hood (March 1994) by Jan Gray.
Example of reusing a vtable:
struct B {
virtual void f();
};
struct D : B {
// does not override B::f
// does not have other virtuals of its own
void g(); // still might have its own non-virtuals
int n; // and data members
};
In particular, notice that B's destructor isn't virtual (which is likely a mistake in real code), but in this example D instances could point to the same vtable as B instances.
The answer is, 'it depends'. It depends on what you mean by 'contain a vtbl' and it depends on the decisions made by the implementor of the particular compiler.
Strictly speaking, no 'class' ever contains a virtual function table. Some instances of some classes contain pointers to virtual function tables. However, that's just one possible implementation of the semantics.
In the extreme, a compiler could hypothetically put a unique number into the instance that indexed into a data structure used for selecting the appropriate virtual function instance.
If you ask, 'What does GCC do?' or 'What does Visual C++ do?' then you could get a concrete answer.
Hassan Syed's answer is probably closer to what you were asking about, but it is really important to keep the concepts straight here.
There is behavior (dynamic dispatch based on what class was new'ed) and there's implementation. Your question used implementation terminology, though I suspect you were looking for a behavioral answer.
The behavioral answer is this: any class that declares or inherits a virtual function will exhibit dynamic behavior on calls to that function. Any class that does not, will not.
Implementation-wise, the compiler is allowed to do whatever it wants to accomplish that result.
Answer
A vtable is created when a class declaration contains a virtual function. A vtable is introduced when a parent, anywhere in the hierarchy, has a virtual function; let's call this parent Y. Any parent of Y WILL NOT have a vtable (unless it has a virtual function of its own somewhere in its hierarchy).
Read on for discussion and tests
-- explanation --
When you declare a member function virtual, you may want to use subclasses polymorphically at run time through a base-class pointer or reference. In keeping with C++'s priority of performance over language convenience, the designers chose the lightest possible implementation strategy: one level of indirection, and only when a class might be used polymorphically at run time, which the programmer signals by declaring at least one function virtual.
You do not incur the cost of the vtable if you avoid the virtual keyword.
-- edit : to reflect your edit --
Only when a base class contains a virtual function do any other sub-classes contain a vtable. The parents of said base class do not have a vtable.
In your example all three classes will have a vtable, because you can try to use all three classes via an A*.
--test - GCC 4+ --
#include <iostream>
class test_base
{
public:
void x(){std::cout << "test_base" << "\n"; };
};
class test_sub : public test_base
{
public:
virtual void x(){std::cout << "test_sub" << "\n"; } ;
};
class test_subby : public test_sub
{
public:
void x() { std::cout << "test_subby" << "\n"; }
};
int main()
{
test_sub sub;
test_base base;
test_subby subby;
test_sub * psub;
test_base *pbase;
test_subby * psubby;
pbase = &sub;
pbase->x();
psub = &subby;
psub->x();
return 0;
}
output
test_base
test_subby
test_base does not have a virtual table, therefore anything cast to it will use test_base's x(). test_sub, on the other hand, changes the nature of x(): calls through a test_sub pointer go through a vtable, as shown by test_subby's x() being executed.
So a vtable is only introduced into the hierarchy where the keyword virtual is used. Older ancestors do not have a vtable, and if an upcast to one of them occurs, the call will be hardwired to that ancestor's function.
You made an effort to make your question very clear and precise, but there's still a bit of information missing. You probably know that in implementations that use V-Tables, the table itself is normally an independent data structure, stored outside the polymorphic objects, while the objects themselves only store an implicit pointer to the table. So, what is it you are asking about? It could be:
When does an object get an implicit pointer to V-Table inserted into it?
or
When is a dedicated, individual V-Table created for a given type in the hierarchy?
The answer to the first question is: an object gets an implicit pointer to V-Table inserted into it when the object is of polymorphic class type. The class type is polymorphic if it contains at least one virtual function, or any of its direct or indirect parents are polymorphic (this is answer 3 from your set). Note also, that in case of multiple inheritance, an object might (and will) end up containing multiple V-Table pointers embedded into it.
The answer to the second question could be the same as to the first (option 3), with a possible exception. If some polymorphic class in a single-inheritance hierarchy has no virtual functions of its own (no new virtual functions, no overrides of parent virtual functions), the implementation might decide not to create an individual V-Table for this class, but instead reuse its immediate parent's V-Table (since it is going to be the same anyway). I.e. in this case both objects of the parent type and objects of the derived type will store the same value in their embedded V-Table pointers. This is, of course, highly dependent on the implementation. I checked GCC and MS VS 2005 and they don't act that way: they both create an individual V-Table for the derived class in this situation, but I seem to recall hearing about implementations that do reuse the parent's table.
The C++ standard doesn't mandate V-Tables for implementing polymorphic classes, but most implementations use V-Tables to store the extra information needed. In short, this extra information is added when you have at least one virtual function.
The behavior is defined in chapter 10.3, paragraph 2 of the C++ language specification:
If a virtual member function vf is
declared in a class Base and in a
class Derived, derived directly or
indirectly from Base, a member
function vf with the same name and
same parameter list as Base::vf is
declared, then Derived::vf is also
virtual ( whether or not it is so
declared ) and it overrides Base::vf.
The relevant phrase is "whether or not it is so declared". Thus, if your compiler creates v-tables in the usual sense, all classes will have a v-table, since all their f() methods are virtual.