When exactly does the compiler create a virtual function table?
1) when the class contains at least one virtual function.
OR
2) when the immediate base class contains at least one virtual function.
OR
3) when any parent class at any level of the hierarchy contains at least one virtual function.
A related question to this:
Is it possible to give up dynamic dispatch in a C++ hierarchy?
e.g. consider the following example.
#include <iostream>
using namespace std;
class A {
public:
virtual void f();
};
class B: public A {
public:
void f();
};
class C: public B {
public:
void f();
};
Which classes will contain a V-Table?
Since B does not declare f() as virtual, does class C get dynamic polymorphism?
Beyond "vtables are implementation-specific" (which they are): if a vtable is used, there will be a unique vtable for each of your classes. Even though B::f and C::f are not declared virtual, they match the signature of a virtual method in a base class (A in your code), so B::f and C::f are both implicitly virtual. Because each class has at least one unique virtual method (B::f overrides A::f for B instances, and C::f does likewise for C instances), you need three vtables.
You generally shouldn't worry about such details. What matters is whether you have virtual dispatch or not. You can bypass virtual dispatch by explicitly specifying which function to call, but this is generally only useful when implementing a virtual method (such as to call the base's method). Example:
struct B {
virtual void f() {}
virtual void g() {}
};
struct D : B {
virtual void f() { // would be implicitly virtual even if not declared virtual
B::f();
// do D-specific stuff
}
virtual void g() {}
};
int main() {
{
B b; b.g(); b.B::g(); // both call B::g
}
{
D d;
B& b = d;
b.g(); // calls D::g
b.B::g(); // calls B::g
// b.D::g(); // not allowed (D is not a base of B), so it is commented out here
d.D::g(); // calls D::g
void (B::*p)() = &B::g;
(b.*p)(); // calls D::g
// calls through a pointer to member function always use virtual dispatch
// (if the pointed-to function is virtual)
}
return 0;
}
Some concrete rules that may help (but don't quote me on these; I've likely missed some edge cases):
If a class has virtual methods or virtual bases, even if inherited, then instances must have a vtable pointer.
If a class declares non-inherited virtual methods (such as when it doesn't have a base class), then it must have its own vtable.
If a class has a different set of overriding methods than its first base class, then it must have its own vtable, and cannot reuse the base's. (Destructors commonly require this.)
If a class has multiple base classes, with the second or later base having virtual methods:
If no earlier bases have virtual methods and the Empty Base Optimization was applied to all earlier bases, then treat this base as the first base class.
Otherwise, the class must have its own vtable.
If a class has any virtual base classes, it must have its own vtable.
Remember that a vtable is similar to a static data member of a class, and instances have only pointers to these.
Also see the comprehensive article "C++: Under the Hood" (March 1994) by Jan Gray.
Example of reusing a vtable:
struct B {
virtual void f();
};
struct D : B {
// does not override B::f
// does not have other virtuals of its own
void g(); // still might have its own non-virtuals
int n; // and data members
};
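In particular, notice that B's destructor isn't virtual (which is likely a mistake in real code; a virtual destructor would be overridden by D's implicit destructor, so D would need its own vtable). In this example, D instances could point to the same vtable as B instances.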
The answer is, 'it depends'. It depends on what you mean by 'contain a vtbl' and it depends on the decisions made by the implementor of the particular compiler.
Strictly speaking, no 'class' ever contains a virtual function table. Some instances of some classes contain pointers to virtual function tables. However, that's just one possible implementation of the semantics.
In the extreme, a compiler could hypothetically put a unique number into the instance that indexes into a data structure used for selecting the appropriate virtual function instance.
If you ask, 'What does GCC do?' or 'What does Visual C++ do?' then you could get a concrete answer.
Hassan Syed's answer is probably closer to what you were asking about, but it is really important to keep the concepts straight here.
There is behavior (dynamic dispatch based on what class was new'ed) and there's implementation. Your question used implementation terminology, though I suspect you were looking for a behavioral answer.
The behavioral answer is this: any class that declares or inherits a virtual function will exhibit dynamic behavior on calls to that function. Any class that does not, will not.
Implementation-wise, the compiler is allowed to do whatever it wants to accomplish that result.
Answer
A vtable is created when a class declaration contains a virtual function. A vtable is introduced when a parent, anywhere in the hierarchy, has a virtual function; let's call this parent Y. Any parent of Y WILL NOT have a vtable (unless it has a virtual for some other function in its hierarchy).
Read on for discussion and tests
-- explanation --
When you specify a member function as virtual, there is a chance that you may try to use sub-classes via a base class polymorphically at run-time. To maintain C++'s guarantee of performance over language design, the designers offered the lightest possible implementation strategy: one level of indirection, used only when a class might be used polymorphically at runtime, which the programmer signals by declaring at least one function virtual.
You do not incur the cost of the vtable if you avoid the virtual keyword.
-- edit : to reflect your edit --
Only when a base class contains a virtual function do its sub-classes contain a vtable. The parents of said base class do not have a vtable.
In your example all three classes will have a vtable, because you can try to use all three classes via an A*.
--test - GCC 4+ --
#include <iostream>
class test_base
{
public:
void x(){std::cout << "test_base" << "\n"; };
};
class test_sub : public test_base
{
public:
virtual void x(){std::cout << "test_sub" << "\n"; } ;
};
class test_subby : public test_sub
{
public:
void x() { std::cout << "test_subby" << "\n"; }
};
int main()
{
test_sub sub;
test_base base;
test_subby subby;
test_sub * psub;
test_base *pbase;
test_subby * psubby;
pbase = &sub;
pbase->x();
psub = &subby;
psub->x();
return 0;
}
output
test_base
test_subby
test_base does not have a virtual table, therefore any call through a test_base pointer uses the x() from test_base. test_sub, on the other hand, changes the nature of x(): calls through a test_sub pointer indirect through a vtable, as shown by test_subby's x() being executed.
So, a vtable is only introduced in the hierarchy when the keyword virtual is used. Older ancestors do not have a vtable, and if an object is accessed through an ancestor's pointer (an upcast), the call will be hardwired to the ancestor's function.
You made an effort to make your question very clear and precise, but there's still a bit of information missing. You probably know that in implementations that use V-Tables, the table itself is normally an independent data structure, stored outside the polymorphic objects, while the objects themselves only store an implicit pointer to the table. So, what is it you are asking about? Could be:
When does an object get an implicit pointer to a V-Table inserted into it?
or
When is a dedicated, individual V-Table created for a given type in the hierarchy?
The answer to the first question is: an object gets an implicit pointer to a V-Table inserted into it when the object is of polymorphic class type. The class type is polymorphic if it contains at least one virtual function, or if any of its direct or indirect parents are polymorphic (this is answer 3 from your set). Note also that in the case of multiple inheritance, an object can (and in practice will) end up containing multiple V-Table pointers embedded into it.
The answer to the second question could be the same as to the first (option 3), with a possible exception. If some polymorphic class in a single inheritance hierarchy has no virtual functions of its own (no new virtual functions, no overrides for parent virtual functions), the implementation might decide not to create an individual V-Table for this class, but instead use its immediate parent's V-Table for this class as well (since it is going to be the same anyway). I.e. in this case both objects of parent type and objects of derived type will store the same value in their embedded V-Table pointers. This is, of course, highly dependent on implementation. I checked GCC and MS VS 2005 and they don't act that way. They both do create an individual V-Table for the derived class in this situation, but I seem to recall hearing about implementations that don't.
The C++ standard doesn't mandate using V-Tables to create the illusion of polymorphic classes. Most of the time, implementations use V-Tables to store the extra information needed. In short, these extra pieces of information are added when you have at least one virtual function.
The behavior is defined in chapter 10.3, paragraph 2 of the C++ language specification:
If a virtual member function vf is declared in a class Base and in a class Derived, derived directly or indirectly from Base, a member function vf with the same name and same parameter list as Base::vf is declared, then Derived::vf is also virtual (whether or not it is so declared) and it overrides Base::vf.
The relevant phrase is "whether or not it is so declared". Thus, if your compiler creates v-tables in the usual sense, then all classes will have a v-table, since all their f() methods are virtual.
Related
I'm curious if marking an existing derived C++ class as final to allow for de-virtualisation optimisations will change ABI when using C++11. My expectation is that it should have no effect as I see this as primarily a hint to the compiler about how it can optimise virtual functions and as such I can't see any way it would change the size of the struct or the vtable, but perhaps I'm missing something?
I'm aware this changes API here so that code that further derives from this derived class will no longer work, but I'm only concerned about ABI in this particular case.
Final on a function declaration X::f() implies that the declaration cannot be overridden, so all calls that name that declaration can be bound early (but not calls that name a declaration in a base class). If a virtual function is final in the ABI, the produced vtables can be incompatible with the ones produced for the almost identical class without final: calls to virtual functions that name declarations marked final can be assumed to be direct, so trying to use a vtable entry (one that should exist in the final-less ABI) is illegal.
The compiler could use the final guarantee to cut down the size of vtables (which can sometimes grow a lot) by not adding a new entry that would usually be added, and that must be added according to the ABI for a non-final declaration.
Entries are added for a declaration overriding a function in a base that is not an (inherently, always) primary base, or for a non-trivially covariant return type (a return type covariant on a non-primary base).
Inherently primary base class: the simplest case of polymorphic inheritance
The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, everything is simple.
These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)
Considering the case where the return type isn't covariant; or:
Trivial covariance
An example: the case where it's covariant with the same type as *this; as in:
struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance
Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.
Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.
Conclusion
In these cases the type of the declaration of the overriding function is trivially different from the type of the base:
all parameters are almost the same (with only a trivial difference on the type of this)
the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)
(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically
So no vtable entry is added for the derived declaration.
(So making the class final wouldn't simplify the vtable.)
Never primary base
Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.
(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.
The conceptually simplest interesting example of a non primary, and non trivial pointer conversion is:
struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };
Each base has its own vptr scalar member, and these vptr have different purposes:
B1::vptr points to a B1_vtable structure
B2::vptr points to a B2_vtable structure
and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because
The vtables have distinct entries:
B1_vtable.f_ptr points to the final overrider for B1::f()
B2_vtable.f_ptr points to the final overrider for B2::f()
B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)
The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)
(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)
(**) A callable runtime function is any entry point: any address that can be called or jumped to; it can be normal function code, a thunk/trampoline, or a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".
(***) That they sometimes have the same final overrider in a further derived class:
struct DD : D { void f(); };
isn't useful for the purpose of defining the ABI of D.
So we see that D provably needs a non primary polymorphic base; by convention it will be B2; the first nominated polymorphic base (B1) gets to be primary.
So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.
So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:
D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).
a virtual member B2::g that is overridden in D needs two vtable entries, one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable, which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.
Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.
Note that in theory it would be possible to omit the entry for D::g() in D_B1_vtable by putting the burden of making all virtual calls of D::g() on the B2 base, which is also a possibility as long as no non-trivial covariance is used(#).
(#) or if non trivial covariance occurs, "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used
Not inherently primary base
Regular (non virtual) inheritance is simple like membership:
a non virtual base subobject is a direct base of exactly one object (which implies that there is always exactly one final overrider of any virtual function when virtual inheritance isn't used);
the placement of a non virtual base is fixed;
base subobjects that don't have virtual base subobjects are, just like data members, constructed exactly like complete objects (they have exactly one runtime constructor function code for every defined C++ constructor).
A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in an arbitrary derived class object (that may or may not be a most derived object) it is a value computed at runtime.
That offset can't be a fixed constant for a given class, because C++ supports both unifying and duplicating inheritance:
virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;
non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):
struct B { virtual void f(); };
struct D1 : B { virtual void f(); }; // final overrider
struct D2 : B { virtual void f(); }; // final overrider
struct DD : D1, D2 { };
Here DD has two distinct final overriders of B::f():
DD::D1::f() is final overrider for DD::D1::B::f()
DD::D2::f() is final overrider for DD::D2::B::f()
in two distinct vtable entries.
Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).
Not only does C++ support both, but combinations are also allowed: duplicating inheritance of a class that uses unifying inheritance:
struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };
There is only one DDD::VB, but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantics) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be the same as the address of DDD::DD2::D.
So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member, as it is a type "invariant" (not const) shared member: it is fixed by the constructor (invariant after complete construction) and its semantics are shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, in a DDD the VB subobject can be overlaid on DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.
This however cannot be the case for virtual bases that have non-invariant scalar members, that is, regular data members with an identity, members occupying a distinct range of bytes: these "real" data members cannot be overlaid on anything else. So a virtual base subobject with data members (members with an address guaranteed to be distinct by C++, or by any other distinct C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non-trivial offsets.
(##) with potentially a very narrow special case with a derived class with no data member with a virtual base with some data members
So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual bases are candidates for overlaying on derived classes; they are potential primaries but not inherent primaries:
the offset at which they reside will only be determined in the most derived class;
the offset might or might not be zero;
a zero offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;
a non-zero offset implies non-trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except when overlaid, obviously, as it would be neither necessary nor possible).
This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.
A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...
...a non-morally virtual base is inherently reversible in every case:
there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);
the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;
(in any C++ like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in common case with simple hierarchies). Note that:
static_cast<D*>((B*)d) is ill formed;
dynamic_cast<D*>((B*)d) will work for the simple cases.
So let's call virtual covariance the case where the covariance of the return type is based on a morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:
struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary
struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};
Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.
When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.
Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:
there are additional guarantees/invariants on that scalar member, and the vtable is extended;
no new vptr is needed for Da.
So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():
in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.
in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which is inherently the same as the final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.
Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.
Note that making the virtual function body inline and visible to the ABI (so that classes with different inline function definitions could be made ABI-incompatible, allowing more information to inform memory layout) wouldn't help, as inline code would only define what a call to a non-overridden virtual function does: one cannot base ABI decisions on choices that can be overridden in derived classes.
[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:
struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};
Note that with that code, incorrect code generation by a buggy compiler that used the Ba_vtable entry to call g() would actually work, because covariance ends up being trivial, as VB is primary in a complete D.
The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.
--end example]
But if Da::g() is final in the ABI, virtual calls can only be made via the VB * g(); declaration: covariance is made purely static, and the derived to base conversion is done at compile time as the last step of the virtual thunk, as if virtual covariance was never used.
Possible extension of final
There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (matched by class name). If final stops overriding a virtual function, could it be applied to base classes in a C++-like language?
First we need to define what is overriding a virtual base inheritance:
An "almost direct" subobject relation means that an indirect subobject is controlled almost as a direct subobject:
an almost direct subobject can be initialized like a direct subobject;
access control is never really an obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).
Virtual inheritance provides almost direct access:
constructor for each virtual bases must be called by ctor-init-list of the constructor of the most derived class;
when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.
A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:
struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};
Now C++ variants that support both forms of inheritance don't have to have the C++ semantics of almost direct access in all derived classes:
struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};
Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.
struct DDD : DD {
DDD () :
VB() // error: not an almost direct subobject
{ }
};
struct DD2 : D, virtual final VB {
// DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{ // for ": virtual VB"
};
The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB but virtual-ness of VB requires unification which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:
struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};
Here UselessAbstract is abstract, and so are all its derived classes, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.
-- end analog/example]
That would provide a way to freeze virtual inheritance, to provide meaningful private inheritance of classes with virtual base (without it derived classes can usurp the relationship between a class and its private base class).
Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of virtual base isn't fixed.
I believe that adding the final keyword should not be ABI breaking, but removing it from an existing class might render some optimizations invalid. For example, consider this:
// in car.h
struct Vehicle { virtual void honk() { } };
struct Car final : Vehicle { void honk() override { } };
// in car.cpp
// Here, the compiler can assume that no derived class of Car can be passed,
// and so `honk()` can be devirtualized. However, if Car is not final
// anymore, this optimization is invalid.
void foo(Car* car) { car->honk(); }
If foo is compiled separately and e.g. shipped in a shared library, removing final (and hence making it possible for users to derive from Car) could render the optimization invalid.
I'm not 100% sure about this though, some of it is speculation.
If you do not introduce new virtual methods in your final class (only override methods of the parent class), you should be ok: the virtual table is going to have the same layout as the parent's, because it must be possible to call the methods with a pointer to the parent. If you do introduce virtual methods, the compiler can indeed ignore the virtual specifier and only generate standard methods, e.g.:
class A {
virtual void f();
};
class B final : public A {
virtual void f(); // <- should be ok
virtual void g(); // <- not ok
};
The idea is that every time in C++ you can invoke the method g(), you have a pointer/reference whose static and dynamic type is B: static because the method does not exist except for B and its children, dynamic because final ensures that B has no children. For this reason you never need virtual dispatch to call the right g() implementation (because there can be only one), and the compiler might (and should) not add it to the virtual table for B, while it is forced to do so if the method could be overridden. As far as I understand, this is basically the whole point of the final keyword.
With the struct definition given below...
struct A {
virtual void hello() = 0;
};
Approach #1:
struct B : public A {
virtual void hello() { ... }
};
Approach #2:
struct B : public A {
void hello() { ... }
};
Is there any difference between these two ways to override the hello function?
They are exactly the same. There is no difference between them other than that the first approach requires more typing and is potentially clearer.
The 'virtualness' of a function is propagated implicitly, however at least one compiler I use will generate a warning if the virtual keyword is not used explicitly, so you may want to use it if only to keep the compiler quiet.
From a purely stylistic point of view, including the virtual keyword clearly 'advertises' to the user that the function is virtual. This matters to anyone further subclassing B, who then does not have to check A's definition. For deep class hierarchies, this becomes especially important.
The virtual keyword is not necessary in the derived class. Here's the supporting documentation, from the C++ Draft Standard (N3337) (emphasis mine):
10.3 Virtual functions
2 If a virtual member function vf is declared in a class Base and in a class Derived, derived directly or indirectly from Base, a member function vf with the same name, parameter-type-list (8.3.5), cv-qualification, and ref-qualifier (or absence of same) as Base::vf is declared, then Derived::vf is also virtual (whether or not it is so declared) and it overrides Base::vf.
No, the virtual keyword on derived classes' virtual function overrides is not required. But it is worth mentioning a related pitfall: a failure to override a virtual function.
The failure to override occurs if you intend to override a virtual function in a derived class, but make an error in the signature so that it declares a new and different virtual function. This function may be an overload of the base class function, or it might differ in name. Whether or not you use the virtual keyword in the derived class function declaration, the compiler would not be able to tell that you intended to override a function from a base class.
This pitfall is, however, thankfully addressed by the C++11 explicit override language feature, which allows the source code to clearly specify that a member function is intended to override a base class function:
struct Base {
virtual void some_func(float);
};
struct Derived : Base {
virtual void some_func(int) override; // ill-formed - doesn't override a base class method
};
The compiler will issue a compile-time error and the programming error will be immediately obvious (perhaps the function in Derived should have taken a float as the argument).
Refer to WP:C++11.
Adding the "virtual" keyword is good practice as it improves readability, but it is not necessary. Functions declared virtual in the base class and having the same signature in the derived classes are considered "virtual" by default.
There is no difference for the compiler, when you write the virtual in the derived class or omit it.
But you need to look at the base class to get this information. Therefore I would recommend adding the virtual keyword in the derived class as well, if you want to make it obvious to human readers that this function is virtual.
The virtual keyword should be added to functions of a base class to make them overridable. In your example, struct A is the base class. Repeating virtual in a derived class changes nothing about how those functions are called. However, if you want your derived class to also be a base class itself, with that function overridable, you should put the virtual there as well: the function is already implicitly virtual, but writing it makes that visible to anyone deriving from your class.
struct B : public A {
virtual void hello() { ... }
};
struct C : public B {
void hello() { ... }
};
Here C inherits from B, so B is both a derived class (of A) and a base class (of C), and C is the most derived class.
The inheritance diagram looks like this:
A
^
|
B
^
|
C
So you should put virtual in front of functions inside potential base classes which may have children. virtual allows your children to override your functions. There is nothing wrong with putting virtual in front of functions inside the derived classes, but it is not required, because an override of a virtual function is virtual anyway. It is recommended, though, because anyone who wants to inherit from your derived class can then see which methods are meant to be overridden without digging through its base classes.
So put virtual in front of functions in all classes involved in inheritance, unless you know for sure that the class will not have any children who would need to override the functions of the base class. It is good practice.
There's a considerable difference when you have templates and start taking base class(es) as template parameter(s):
struct None {};
template<typename... Interfaces>
struct B : public Interfaces...
{
void hello() const { /* ... */ }
};
struct A {
virtual void hello() const = 0;
};
template<typename... Interfaces>
void t_hello(const B<Interfaces...>& b) // different code generated for each set of interfaces (a vtable-based clever compiler might reduce this to 2); both t_hello and b.hello() might be inlined properly
{
b.hello(); // non-virtual for B<None>; for B<A> it overrides A::hello, so it is virtual unless devirtualized
}
void hello(const A& a)
{
a.hello(); // indirect virtual call, inlining is impossible in general
}
int main()
{
B<None> b; // Ok, no vtable generated, empty base class optimization works, sizeof(b) == 1 usually
B<None>* pb = &b;
B<None>& rb = b;
b.hello(); // direct call
pb->hello(); // non-virtual call through pb (1 indirection)
rb.hello(); // non-virtual call through rb (1 indirection unless optimized out)
t_hello(b); // works as expected, one indirection
// hello(b); // compile-time error: B<None> is not an A
B<A> ba; // Ok, vtable generated, sizeof(ba) >= sizeof(void*)
B<A>* pba = &ba;
B<A>& rba = ba;
ba.hello(); // still can be a direct call, exact type of ba is deducible
pba->hello(); // virtual call through pba (usually 3 indirections)
rba.hello(); // virtual call through rba (usually 3 indirections unless optimized out to 2)
// t_hello(static_cast<const A&>(ba)); // compile-time error (unless you add support for const A& in t_hello as well)
hello(ba);
}
The fun part is that you can now define interface and non-interface functions after defining the classes. That is useful for interworking interfaces between libraries (don't rely on this as a standard design process within a single library). It costs you nothing to allow this for all of your classes; you might even typedef B to something if you'd like.
Note that, if you do this, you might want to declare copy/move constructors as templates, too: allowing construction from different interfaces lets you 'cast' between different B<> types.
It's questionable whether you should add support for const A& in t_hello(). The usual reason for this rewrite is to move away from inheritance-based specialization to template-based specialization, mostly for performance reasons. If you continue to support the old interface, you can hardly detect (or deter) old usage.
I would certainly include the virtual keyword for the child class, because:
i. Readability.
ii. This child class may itself be derived from further down, and you don't want the constructor of the further derived class to call this virtual function.
I just read about this in the C++ FAQ Lite
[25.10] What does it mean to "delegate to a sister class" via virtual inheritance?
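#include <iostream>
class Base {
public:
virtual void foo() = 0;
virtual void bar() = 0;
};
class Der1 : public virtual Base {
public:
virtual void foo();
};
void Der1::foo()
{ bar(); }
class Der2 : public virtual Base {
public:
virtual void bar();
};
void Der2::bar()
{ std::cout << "Der2::bar\n"; }
class Join : public Der1, public Der2 {
public:
// ...
};
int main()
{
Join* p1 = new Join();
Der1* p2 = p1;
Base* p3 = p1;
p1->foo(); // each of these three calls prints "Der2::bar"
p2->foo();
p3->foo();
delete p1; // deleted through its own type, so no virtual destructor is needed
}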
"Believe it or not, when Der1::foo() calls this->bar(), it ends up calling Der2::bar(). Yes, that's right: a class that Der1 knows nothing about will supply the override of a virtual function invoked by Der1::foo(). This "cross delegation" can be a powerful technique for customizing the behavior of polymorphic classes. "
My question is:
What is happening behind the scenes?
If I add a Der3 (virtually inheriting from Base), what will happen? (I don't have a compiler here, so I couldn't test it right now.)
What is happening behind the scenes?
The simple explanation is that, because inheritance from Base is virtual in both Der1 and Der2, there is a single Base subobject in the most derived object Join. At compile time, assuming virtual tables as the dispatch mechanism (the common case), the compiler turns the call to bar() inside Der1::foo into a dispatch through the vtable.
Now the question is how the compiler generates the vtables for each of the classes: the vtable for Base will contain two null pointers, the vtable for Der1 will contain Der1::foo and a null pointer, and the vtable for Der2 will contain a null pointer and Der2::bar [*].
Now, because of the virtual inheritance at the previous level, when the compiler processes Join it will create a single Base subobject, and thus a single vtable for the Base subobject of Join. It effectively merges the vtables of Der1 and Der2 and produces a vtable that contains pointers to Der1::foo and Der2::bar.
So the code in Der1::foo will dispatch through Join's vtable to the final overrider, which in this case is in a different branch of the virtual inheritance hierarchy.
If you add a Der3 class that also inherits virtually from Base, make Join derive from it too, and have it define either of the virtual functions, the compiler will not be able to cleanly merge the three vtables and will complain with an error about the ambiguity of the multiply defined method (none of the overriders can be considered the final overrider). If you add the same method to Join, the ambiguity is no longer a problem, since the final overrider will be the member function defined in Join, so the compiler is able to generate the virtual table.
[*] Most compilers will not write null pointers here, but rather a pointer to a generic function that will print an error message and terminate the application, allowing for better diagnostics than a plain segmentation fault.
If you add a Der3 what will happen depends on which class it inherits from.
As you know, instantiating a class is only possible when none of its virtual functions is still pure (each has a non-pure final overrider); otherwise you can only make pointers or references to it. This is to prevent constructing partially defined objects.
In your example you cannot instantiate Der1 nor Der2 directly because in Der1, bar() is still pure virtual and in Der2, foo() is pure virtual.
Your Join class can be instantiated because it inherits from both and therefore has no pure virtual functions left.
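Once you have made an instance of a class, you can refer to it through pointers to the non-instantiable base classes (the implicit derived-to-base conversion is enough; dynamic_cast is only needed to go back down or across the hierarchy).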
From the moment an object has been created, the virtual function mechanism, which works with a table of pointers to functions, calls the final overriders that were fixed when the object was constructed, whichever base class pointer you use.
So the key here is that when you create your object, you create an instance of Join. Its virtual functions are defined because you are able to create the object. From that moment, you can call the virtual functions with any pointer to a base class.
I see why this is interesting to explore. In real code, however, this would probably be of little use. As others pointed out, virtual inheritance is more of a fix-this-bad-design-to-work-somehow tool than a valid design tool.
Your code produces warnings in VS2010: the compiler is letting you know that inheritance via dominance is being used. Of course that's not a showstopper, but it is another discouragement from using this.
If you introduce Der3 like this
class Der3 : public virtual Base {
public:
void bar() {}
};
class Join : public Der1, public Der2, public Der3 {};
the code fails to compile because of ambiguous inheritance of 'void Base::bar(void)'
One point is missing in the discussion (nonetheless this is quite informative, and thanks to all).
When you 'virtually inherit' a class, most compilers keep a pointer to the virtual base class in the derived object (this can be implemented in different ways by different compilers). So if you take the size of Der1 and Der2, it will be at least 4 bytes on 32-bit and 8 bytes on 64-bit platforms. Because they reach Base through that pointer, there is no ambiguity. That is why, when you create a Join object, the virtual base class is constructed first (strictly speaking, Join's constructor also initializes the virtual-base pointers that came to it through Der1 and Der2). In Join, the compiler can check the pointer name/type and make sure that only one virtual base subobject is shared by Der1 and Der2; you can check this with the sizeof operator as well. Since the compiler silently inserts these calls into the constructor, it constructs the virtual base classes first, in depth-first order (you can check this by making all the base classes virtual). The rest has already been explained.
This is a pretty stupid example imo and a perfect example of academics making themselves look clever. If this situation ever came up, it would almost CERTAINLY be because of a bug, specifically forgetting to make Der1::foo() virtual.
Edit:
I misread the class definitions. Which is exactly the problem with this type of design. It takes a lot of thought to determine exactly what would happen in each of these cases, which is bad. Making your code readable is by far better than being "clever" like this.
Just what the topic asks. I also want to know why none of the usual examples of CRTP mention a virtual destructor.
EDIT:
Guys, please post about the CRTP problem as well, thanks.
Only virtual functions require dynamic dispatch (and hence vtable lookups) and not even in all cases. If the compiler is able to determine at compile time what is the final overrider for a method call, it can elide performing the dispatch at runtime. User code can also disable the dynamic dispatch if it so desires:
#include <iostream>
struct base {
virtual void foo() const { std::cout << "base" << std::endl; }
void bar() const { std::cout << "bar" << std::endl; }
};
struct derived : base {
virtual void foo() const { std::cout << "derived" << std::endl; }
};
void test( base const & b ) {
b.foo(); // requires runtime dispatch, the type of the referred
// object is unknown at compile time.
b.base::foo();// runtime dispatch manually disabled: output will be "base"
b.bar(); // non-virtual, no runtime dispatch
}
int main() {
derived d;
d.foo(); // the type of the object is known, the compiler can substitute
// the call with d.derived::foo()
test( d );
}
On whether you should provide virtual destructors in all cases of inheritance, the answer is no, not necessarily. The virtual destructor is required only if code deletes objects of the derived type held through pointers to the base type. The common rule is that you should
provide a public virtual destructor or a protected non-virtual destructor
The second part of the rule ensures that user code cannot delete your object through a pointer to the base, which implies that the destructor need not be virtual. The advantage is that if your class does not contain any virtual method, this will not change any of the properties of your class (the memory layout of the class changes when the first virtual method is added), and you will save the vtable pointer in each instance. Of the two reasons, the first is the more important one.
#include <memory>
struct base1 {};
struct base2 {
virtual ~base2() {}
};
struct base3 {
protected:
~base3() {}
};
typedef base1 base;
struct derived : base { int x; };
struct other { int y; };
int main() {
std::unique_ptr<derived> d( new derived() ); // ok: deleting at the right level
std::unique_ptr<base> b( new derived() ); // undefined behavior: deleting through a base
// pointer with non-virtual destructor
}
The problem in the last line of main can be resolved in two different ways. If the typedef is changed to base2, then the destructor will correctly be dispatched to the derived object and the code will not cause undefined behavior. The cost is that derived now requires a virtual table and each instance requires a pointer to it. More importantly, derived is no longer layout compatible with other. The other solution is changing the typedef to base3, in which case the problem is solved by having the compiler yell at that line. The shortcoming is that you cannot delete through pointers to base; the advantage is that the compiler can statically ensure that there will be no undefined behavior.
In the particular case of the CRTP pattern (excuse the redundant pattern), most authors do not even care to make the destructor protected, as the intention is not to hold objects of the derived type by references to the base (templated) type. To be on the safe side, they should mark the destructor as protected, but that is rarely an issue.
Very unlikely indeed. There's nothing in the standard to stop compilers doing whole classes of stupidly inefficient things, but a non-virtual call is still a non-virtual call, regardless of whether the class has virtual functions too. It has to call the version of the function corresponding to the static type, not the dynamic type:
#include <iostream>
struct Foo {
void foo() { std::cout << "Foo\n"; }
virtual void virtfoo() { std::cout << "Foo\n"; }
};
struct Bar : public Foo {
void foo() { std::cout << "Bar\n"; }
void virtfoo() { std::cout << "Bar\n"; }
};
int main() {
Bar b;
Foo *pf = &b; // static type of *pf is Foo, dynamic type is Bar
pf->foo(); // MUST print "Foo"
pf->virtfoo(); // MUST print "Bar"
}
So there's absolutely no need for the implementation to put non-virtual functions in the vtable, and indeed in the vtable for Bar you'd need two different slots in this example for Foo::foo() and Bar::foo(). That means it would be a special-case use of the vtable even if the implementation wanted to do it. In practice implementations don't do it, and it wouldn't make sense to, so don't worry about it.
CRTP base classes really ought to have destructors that are non-virtual and protected.
A virtual destructor is required if the user of the class might take a pointer to the object, cast it to the base class pointer type, then delete it. A virtual destructor means this will work. A protected destructor in the base class stops them trying it (the delete won't compile since there's no accessible destructor). So either one of virtual or protected solves the problem of the user accidentally provoking undefined behavior.
See guideline #4 here, and note that "recently" in this article means nearly 10 years ago:
http://www.gotw.ca/publications/mill18.htm
No user will create a Base<Derived> object of their own, that isn't a Derived object, since that's not what the CRTP base class is for. They just don't need to be able to access the destructor - so you can leave it out of the public interface, or to save a line of code you can leave it public and rely on the user not doing something silly.
The reason it's undesirable for it to be virtual, given that it doesn't need to be, is just that there's no point giving a class virtual functions if it doesn't need them. Some day it might cost something, in terms of object size, code complexity or even (unlikely) speed, so it's a premature pessimization to make things virtual always. The preferred approach among the kind of C++ programmer who uses CRTP, is to be absolutely clear what classes are for, whether they are designed to be base classes at all, and if so whether they are designed to be used as polymorphic bases. CRTP base classes aren't.
The reason that the user has no business casting to the CRTP base class, even if it's public, is that it doesn't really provide a "better" interface. The CRTP base class depends on the derived class, so it's not as if you're switching to a more general interface if you cast Derived* to Base<Derived>*. No other class will ever have Base<Derived> as a base class, unless it also has Derived as a base class. It's just not useful as a polymorphic base, so don't make it one.
The answer to your first question: No. Only calls to virtual functions will cause an indirection via the virtual table at runtime.
The answer to your second question: The Curiously recurring template pattern is commonly implemented using private inheritance. You don't model an 'IS-A' relationship and hence you don't pass around pointers to the base class.
For instance, in
template <class Derived> class Base
{
};
class Derived : Base<Derived>
{
};
You don't have code which takes a Base<Derived>* and then goes on to call delete on it. So you never attempt to delete an object of a derived class through a pointer to the base class. Hence, the destructor doesn't need to be virtual.
Firstly, I think the OP's question has been answered quite well: that's a solid NO.
But, is it just me going insane or is something going seriously wrong in the community? I felt a bit scared to see so many people suggesting that it's useless/rare to hold a pointer/reference to Base. Some of the popular answers above suggest that we don't model IS-A relationship with CRTP, and I completely disagree with those opinions.
It's widely known that C++ has no interface construct as such. So to write testable/mockable code, a lot of people use abstract base classes (ABCs) as "interfaces". For example, you have a function void MyFunc(Base* ptr) and you can use it this way: MyFunc(ptr_derived). This is the conventional way to model an IS-A relationship, and it requires vtable lookups when you call any virtual functions in MyFunc. So this is pattern one to model an IS-A relationship.
In some domains where performance is critical, there is another way (pattern two) to model an IS-A relationship in a testable/mockable manner: via CRTP. And the performance boost can really be impressive (600% in the article) in some cases, see this link. So MyFunc will look like this: template<typename Derived> void MyFunc(Base<Derived> *ptr). When you use MyFunc, you write MyFunc(ptr_derived); the compiler deduces Derived from the type of ptr_derived and generates a copy of MyFunc() for it, MyFunc(Base<Derived> *ptr). Inside MyFunc, we may well assume that some function defined by the interface is called, and pointers are statically cast at compile time (check out the impl() function in the link), so there's no overhead for vtable lookups.
Now, can someone please tell me either I am talking insane nonsense or the answers above simply did not consider the second pattern to model IS-A relationship with CRTP?
I read a lot of people writing "a virtual table exists for a class that has a virtual function declared in it".
My question is, does a vtable exists only for a class that has a virtual function or does it also exist for classes derived from that class.
e.g
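#include <iostream>
using namespace std;
class Base{
public:
virtual ~Base() = default; // added so that deleting through a Base* below is well-defined
virtual void print(){cout<<"Base Print\n";}
};
class Derived:public Base{
public:
void print(){cout<<"Derived print\n";}
};
int main(){ // from main.cpp
Base* b = new Derived;
b->print();
delete b;
}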
Question: Had there been no vtable for class derived then the output would not have been "derived print". So IMO there exists a vtable for any class that has virtual function declared and also in classes inheriting from that class. Is this correct ?
As far as only virtual-function-specific functionality is considered, in a traditional approach to vtable implementation a derived class would need a separate version of the vtable if and only if that derived class overrides at least one virtual function. In your example, Derived overrides the virtual function print. Since Derived has its own version of print, the corresponding entry in the Derived vtable is different from that in the Base vtable. This would normally necessitate a separate vtable for Derived.
If Derived didn't override anything at all, formally it still would be a separate polymorphic class, but in order to make its virtual functions work properly we could have simply reused Base vtable for Derived as well. So, technically there wouldn't be any need for a separate vtable for Derived.
However, in practical implementations, the data structure that we usually refer to as the "vtable" often holds some additional class-specific information as well. That extra information is so class-specific that most of the time it becomes impossible to share vtables between different classes in a hierarchy, even if they use the same set of virtual functions. For example, in some implementations the vtable pointer stored in each polymorphic object points to a data structure that also stores the so-called RTTI information about the class. For this reason, in most (if not all) practical implementations each polymorphic class gets its own vtable, even if the virtual function pointers stored in those tables happen to be the same.
Yes, your understanding is correct. Any class that has a base with any virtual functions has a vtable.
Yes, it's true. Actually, given base's definition:
class derived:public base{
public:
void print(){cout<<"derived print\n";}
};
is completely equivalent to:
class derived:public base{
public:
virtual void print(){cout<<"derived print\n";}
};
... because you already defined print as virtual in base.
I wish the compiler would enforce that...
Yes, that's true. A class inherits all data members from its base class, including the hidden vtable pointer. However, the vtable entries are adjusted accordingly (for example, if the class overrides a base class virtual method, the corresponding entry in the vtable must point to its own implementation).
But keep in mind that the 'vtable' is a common technique used by virtually every compiler; it is neither mandated nor standardized.