C++ supports inheritance.
But how is it implemented in the compiler?
Does the compiler copy and paste all the implementation from parent to child?
EXTREMELY simplified, if we are talking about something like this:
class A
{
public:
int func1() { do something; }
int func2() { do something; }
};
class B : public A
{
public:
int func2() { do something else; }
};
B b;
b.func1();
then what happens inside the compiler will be this (remember, this is VERY simplified, and the real compiler code will be a lot more complex, I'm sure):
... fname = "func1" from the source code ...
... object = "b";
function fn;
while (object && !(fn = find_func(object, fname)))
object = parent_object(object);
if (fn)
produce_call(fn);
else
print_error_not_found(fname);
If we are talking about virtual functions, then the compiler will produce a table which holds the address of each virtual function. One table is generated per class, based on a similar principle: "find the function that exists in this class or one of its parents".
[In the above, I've ignored the fact that one class can have more than one "parent" class. It doesn't change how things work, just that the code has to maintain a list or array of "more classes at the same level".]
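The lookup loop above can be mimicked with a toy model. This is only a sketch under loud assumptions: `ClassInfo`, `ClassTable`, `find_func` and the string-based tables are hypothetical stand-ins invented for illustration, not how any real compiler stores its symbol tables, and only single inheritance is modelled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, heavily simplified "symbol table" entry for one class.
struct ClassInfo {
    std::string parent;               // empty string means "no base class"
    std::set<std::string> functions;  // member functions declared in this class
};

using ClassTable = std::map<std::string, ClassInfo>;

// Walk from `cls` up through its parents until a class declaring `fname`
// is found. Returns "Class::fname", or an empty string if lookup fails.
std::string find_func(const ClassTable& classes, std::string cls,
                      const std::string& fname) {
    while (!cls.empty()) {
        auto it = classes.find(cls);
        if (it == classes.end()) break;          // unknown class name
        if (it->second.functions.count(fname))   // declared here: done
            return cls + "::" + fname;
        cls = it->second.parent;                 // otherwise try the parent
    }
    return "";
}

// The A/B example from the question, expressed as data.
ClassTable example_classes() {
    return {
        {"A", {"", {"func1", "func2"}}},
        {"B", {"A", {"func2"}}},
    };
}

// Mirrors `B b; b.func1();` and the error path.
void check_lookup() {
    ClassTable classes = example_classes();
    assert(find_func(classes, "B", "func1") == "A::func1");  // inherited
    assert(find_func(classes, "B", "func2") == "B::func2");  // redefined in B
    assert(find_func(classes, "B", "func3").empty());        // not found
}
```

Calling `check_lookup()` from `main` exercises both the inherited and the redefined case. Nothing is copied from `A` into `B`; the call is simply resolved to the function that already exists.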
Just like member variables, base classes cause a subobject to be embedded inside all instances of the derived class. Member functions of the base class are not duplicated for the derived class, instead they are called on this subobject corresponding to the base class.
The compiler knows where this subobject is located relative to the full object, and will insert pointer arithmetic everywhere there is a cast (possibly implicit) between pointer (or reference) to derived and to base. This includes the hidden this-pointer arguments passed to member functions of the base type.
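The pointer adjustment can be observed directly. The following is a minimal sketch; `Base1`, `Base2`, `Derived` and `base2_offset` are names made up for this illustration, and the actual offset value is ABI-specific, so the checks only assert what the language guarantees.

```cpp
#include <cassert>
#include <cstddef>

struct Base1 { int a = 1; };
struct Base2 { int b = 2; int get_b() const { return b; } };
struct Derived : Base1, Base2 { int c = 3; };

// Distance in bytes between the start of the full object and its Base2
// subobject. The exact value is ABI-specific; what matters is that the
// compiler knows it and applies it on every Derived* -> Base2* conversion.
std::ptrdiff_t base2_offset(const Derived& d) {
    const Base2* as_base2 = &d;  // implicit conversion, pointer is adjusted
    return reinterpret_cast<const char*>(as_base2) -
           reinterpret_cast<const char*>(&d);
}

void check_adjustment() {
    Derived d;
    Base1* p1 = &d;
    Base2* p2 = &d;
    // Two non-empty base subobjects cannot share an address.
    assert(static_cast<void*>(p1) != static_cast<void*>(p2));
    // Casting back to the derived type undoes the adjustment.
    assert(static_cast<Derived*>(p2) == &d);
    // Calling a Base2 member on d passes the adjusted pointer as `this`.
    assert(d.get_b() == 2);
}
```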
Virtual inheritance is a little bit tricky, because the offset can be different depending on the most-derived type. In that case, the compiler needs to store the offset as a variable inside the class instances so it can be looked up at runtime (just like pointers to virtual member functions, there might be another layer of indirection involved to save space).
I have three classes of objects:
class Foo: has a mesh, and I need to get that mesh;
class Bar: is a Foo, but has some further capabilities which Foo doesn't have;
class Baz: is a Foo, but has another completely independent set of capabilities which neither Foo nor Bar have.
All three classes need to have a way to give me their mesh which, however, can be implemented in many ways, of which I need (at the moment I can't see another way) to use at least 2 different ones, which are MeshTypeA and MeshTypeB.
I would like to have a common interface for different implementations of the same concept (getMesh); however, I can't use auto in a virtual method, and I don't know which language facility would let me express this. I would like to have:
class Foo
{
public:
virtual ~Foo() = 0;
virtual auto getMesh() const = 0; // doesn't compile
};
class Bar : public Foo
{
public:
virtual ~Bar() = 0;
virtual auto getMesh() const = 0; // doesn't compile
// other virtual methods
};
class ConcreteFooWhichUsesA : public Foo
{
public:
ConcreteFooWhichUsesA();
~ConcreteFooWhichUsesA();
auto getMesh() const override {return mesh_;};
private:
MeshTypeA mesh_;
};
class ConcreteBarWhichUsesB : public Bar
{
public:
ConcreteBarWhichUsesB();
~ConcreteBarWhichUsesB();
auto getMesh() const override {return mesh_;};
// other implementations of virtual methods
private:
MeshTypeB mesh_;
};
MeshTypeA and MeshTypeB are not exclusive to Foo, Bar, or Baz, which is to say all three could have both types of mesh. However I really don't care for which MeshType I get when I later use it.
Do I need to wrap MeshTypeA and MeshTypeB in my own MeshType? Is it a matter of templating the MeshType? I believe there is a way, however related questions aren't helping or I can't formulate my question in a meaningful enough way.
I have also found this where the author uses a Builder class and decltype, but I don't have such a class. Maybe that would be it? Do I need a MeshLoader sort of class as an indirection level?
If your MeshTypes all have a common (abstract) base class, then you can just return (a pointer or reference to) that in the virtual function definitions, and the derived classes can then return their concrete mesh types, and all will be well. If you have code that can work on any mesh type, it is going to need that abstract base anyway.
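A minimal sketch of that first option follows. It assumes you are free to give both mesh types a common base; the `Mesh` interface, its `vertexCount()` member, the vertex counts and the `countVertices` helper are all hypothetical and only stand in for whatever mesh-agnostic operations the real code needs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical common interface for all mesh types.
struct Mesh {
    virtual ~Mesh() = default;
    virtual std::size_t vertexCount() const = 0;
};

struct MeshTypeA : Mesh {
    std::size_t vertexCount() const override { return 3; }  // placeholder value
};

struct MeshTypeB : Mesh {
    std::size_t vertexCount() const override { return 8; }  // placeholder value
};

class Foo {
public:
    virtual ~Foo() = default;
    virtual const Mesh& getMesh() const = 0;  // no auto needed
};

class ConcreteFooWhichUsesA : public Foo {
public:
    // Covariant return type: allowed because MeshTypeA derives from Mesh.
    const MeshTypeA& getMesh() const override { return mesh_; }
private:
    MeshTypeA mesh_;
};

class ConcreteFooWhichUsesB : public Foo {
public:
    const MeshTypeB& getMesh() const override { return mesh_; }
private:
    MeshTypeB mesh_;
};

// Code written against Foo never learns which mesh type it received.
std::size_t countVertices(const Foo& foo) {
    return foo.getMesh().vertexCount();
}

void check_meshes() {
    ConcreteFooWhichUsesA a;
    ConcreteFooWhichUsesB b;
    assert(countVertices(a) == 3);
    assert(countVertices(b) == 8);
}
```

Code that holds a `ConcreteFooWhichUsesA` directly still gets the precise `MeshTypeA` back, while code that only knows `Foo` works with the `Mesh` interface.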
If your MeshTypes do not all have a common base class, why even have a getMesh method in Foo at all? Remove it and give each of the concrete classes its own getMesh method that doesn't override anything (and has nothing in particular to do with the meshes in any other concrete class).
A function's return type is part of its interface. You can't just change it willy-nilly. More specifically, you cannot have a base class virtual method return one thing while an overridden version returns another. OK, you can, but only if the return types are covariant: both are pointers or references to classes, and the class returned by the derived version is the same as, or derived from, the class returned by the base version (in which case, calling through the base class function will convert the overriding method's result to the base class return type).
C++ is a statically typed language; the compiler must know what an expression evaluates to at compile time. Since polymorphic inheritance is a runtime property (that is, the compiler is not guaranteed to be able to know which override will be called through a base class pointer/reference), you cannot have polymorphic inheritance change compile-time constructs, like the type of a function call expression. If you call a virtual method of a base class instance, the compiler will expect this expression to evaluate to what that base class's method returns.
Remember: the point of polymorphic inheritance is that you can write code that doesn't know about the derived classes and have it still work with them. What you're trying to do violates that.
I'm currently trying to make a pair of classes which depend on each other. Essentially, objects of class B create objects of class A. However, I am also using an inheritance hierarchy, so all derivatives of class B must also be able to create derivatives of class A (each derivative of B corresponds to a derivative of A, so DerB1 makes DerA1 objects, and DerB2 makes DerA2 objects).
I'm having problems with my implementation, and it may be silly, but I would like to see if anyone knows what to do. My code is below (I HATE reading other people's code, so I tried to make it as easy to read as possible...only a few important bits, which I commented to explain)
class BaseB {} // Declare BaseB early to use in BaseA constructor
class BaseA
{
public:
BaseA(BaseB* b) {}; // Declare the BaseA constructor (callable by all B classes, which pass a pointer to themselves to the constructor so the A objects can keep track of their parent)
}
class DerA:public BaseA
{
DerA(BaseB* b):BaseA(b) {}; // Inherit the BaseA constructor, and use initialization list
}
class BaseB
{
public:
virtual BaseA createA() = 0; // Virtual function, representing method to create A objects
}
class DerB:public BaseB
{
BaseA createA() {
DerA* a = new DerA(this); // Definition of createA to make a new A object, specifically one of type DerA (Error1: No instance of constructor "DerA::DerA" matches the argument list)
return a; // Error2: Cannot return DerA for BaseA function
}
}
So, I have two main problems: one is practical (Error1, as I seem to simply be calling the function wrong, even if I try to typecast this), and one is philosophical (Error2, as I don't know how to implement the features I want). If anyone could point out why Error1 is occurring, that would be wonderful! Error2, however, requires some explanation.
I would like my user (programmer) to interact with all A objects the same way. They will have the same exact public functions, but each will have VERY different implementations of these functions. Some will be using different data types (and so will require function contracts), but many will have the same data types just with different algorithms that they use on them. I would like some piece of code to work exactly the same way if one class A derivative is used or another is. However, in my current implementation, it seems that I need to return a DerA object instead of a BaseA object (at the site of Error2). This means that I will need to write a segment of main code SPECIFICALLY for a DerA object, instead of any arbitrary A object. I would like something like:
BaseB b = new DerB(); // Declare which derivative of BaseB I want to use
BaseA a = b->createA(b); // Call the createA function in that derivative, which will automatically make a corresponding A object
This way, I can simply choose which type of B object I would like in the first line (by my choice of B constructor, or tag, or template, or something), and the rest of the code will look the same for any type of object B (as each has the same public member functions, even though each object will perform those functions differently).
Would I be better off using templates or some other method instead of inheritance? (I apologize for being intentionally vague, but I hope my class A/B example should mostly explain what I need).
Thank you for any help. I apologize for asking two questions in one post and for being long-winded, but I am trying to learn the best way to approach a rather large redesign of some software.
You have several syntactical issues to fix before the errors go away:
Add the ; after each class definition.
The first line should be a forward declaration: class BaseB /*{} NO!!*/ ;
Add public: to make the constructor of DerA accessible to DerB.
BaseA createA() should return a value, not a pointer (according to its signature): return *a;
There is another potential hidden issue, slicing, as createA() returns a value and not a pointer. This means that your returned object (here *a) would be copied, but as a real BaseA object. So only the BaseA part of the object will be copied, not the derived part. This could lead to some unexpected surprises.
In order to avoid slicing, consider returning a pointer, changing the signature of createA() accordingly. The object pointed to would then keep the right type without losing anything.
If you would later need to copy the object, you could use a static cast if you are absolutely sure of the real type of the object pointed to:
BaseA *pba = pdb->createA(); // get pointer returned
DerA da = *static_cast<DerA*>(pba); // static cast with pointer
If you need to copy pointed-to BaseA objects without knowing their real type for sure, you could implement a virtual clone function in DerA (e.g. the prototype design pattern).
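Putting these points together, a corrected version of the hierarchy might look like the sketch below. It is one possible arrangement, not the only one: it uses `std::unique_ptr` instead of a raw pointer so ownership is explicit, and the `name()`, `creator()` and `clone()` members are assumptions added here purely to make the behaviour observable.

```cpp
#include <cassert>
#include <memory>
#include <string>

class BaseB;  // forward declaration only, no body

class BaseA {
public:
    explicit BaseA(BaseB* creator) : creator_(creator) {}
    virtual ~BaseA() = default;
    virtual std::string name() const { return "BaseA"; }
    // Prototype pattern: each class copies itself with its real type.
    virtual std::unique_ptr<BaseA> clone() const {
        return std::unique_ptr<BaseA>(new BaseA(*this));
    }
    BaseB* creator() const { return creator_; }
private:
    BaseB* creator_;  // non-owning back-pointer to the B that made this A
};

class DerA : public BaseA {
public:
    explicit DerA(BaseB* creator) : BaseA(creator) {}
    std::string name() const override { return "DerA"; }
    std::unique_ptr<BaseA> clone() const override {
        return std::unique_ptr<BaseA>(new DerA(*this));
    }
};

class BaseB {
public:
    virtual ~BaseB() = default;
    // Returning a pointer (here an owning one) avoids slicing.
    virtual std::unique_ptr<BaseA> createA() = 0;
};

class DerB : public BaseB {
public:
    std::unique_ptr<BaseA> createA() override {
        return std::unique_ptr<BaseA>(new DerA(this));
    }
};

void check_factory() {
    std::unique_ptr<BaseB> b(new DerB);
    std::unique_ptr<BaseA> a = b->createA();
    assert(a->name() == "DerA");      // dynamic type survived the return
    assert(a->creator() == b.get());
    std::unique_ptr<BaseA> copy = a->clone();
    assert(copy->name() == "DerA");   // the copy is not sliced either
}
```

The calling code only mentions `BaseB` and `BaseA`; switching to another derivative means changing the single line that constructs the B object.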
Note: This question is purely about asm.js not about C++ nor any other programming language.
As the title already says:
How should a function pointer be implemented in an efficient way?
I couldn't find anything on the web, so I figured asking it here.
Edit:
I would like to implement virtual functions in the compiler I'm working on.
In C++ I would do something like this to generate a vtable:
#include <iostream>
class Base {
public:
virtual void doSomething() = 0;
};
class Derived : public Base {
public:
void doSomething() {
std::cout << "I'm doing something..." << std::endl;
}
};
int main()
{
Base* instance = new Derived();
instance->doSomething();
return 0;
}
To be more precise; how can I generate a vtable in asm.js without the need of plain JavaScript?
In any case, I would like the "near native" capabilities of asm.js while using function pointers.
The solution may be suitable for computer generated code only.
Looking over how asm.js works, I believe your best bet would be to use the method the original CFront compiler used: compile the virtual methods down to functions that take a this pointer, and use thunks to correct the this pointer before passing it. I'll go through it step by step:
No Inheritance
Reduce methods to special functions:
void ExampleObject::foo( void );
would be transformed into
void exampleobject_foo( ExampleObject* this );
This works fine for non-inheritance based objects.
Single Inheritance
We can easily add support for an arbitrarily deep chain of single inheritance through a simple trick: always store the object in memory base first:
class A : public B
would become, in memory:
[[ B ] A ]
Getting closer!
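The two steps so far can be sketched in plain C++ structs, as a stand-in for what a compiler would emit. The names `B_lowered`, `A_lowered`, `b_bump` and `a_as_b` are hypothetical, and `self` is used instead of `this` because `this` is a keyword in real C++.

```cpp
#include <cassert>

// What `class B { int x; void bump(); };` might be lowered to.
struct B_lowered {
    int x;
};

// Method B::bump() becomes a free function with an explicit self pointer.
void b_bump(B_lowered* self) { self->x += 1; }

// `class A : public B { int y; };` lowered with the base stored first.
struct A_lowered {
    B_lowered base;  // [[ B ] A ]: the B part sits at offset 0
    int y;
};

// Upcast A* -> B*. Because the base is the first member of a
// standard-layout struct, both pointers hold the same address.
B_lowered* a_as_b(A_lowered* self) { return &self->base; }

void check_single_inheritance() {
    A_lowered a = {{41}, 7};
    b_bump(a_as_b(&a));  // the "inherited" call a.bump()
    assert(a.base.x == 42);
    assert(static_cast<void*>(a_as_b(&a)) == static_cast<void*>(&a));
}
```

Since the upcast does not change the address, functions written for the base work on derived objects without any adjustment.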
Multiple Inheritance
Now, multiple inheritance makes this much harder to work with:
class A : public B, public C
It's impossible for both B and C to be at the start of A; they simply cannot co-exist. There are two choices:
Store an explicit offset (known as delta) for each call to base.
Do not allow calls through A to B or C
The second choice is much preferable for a variety of reasons; if you are calling a base class member function, it's rare you would want to do it through a derived class. Instead, you could simply go C::conlyfunc, which could then do the adjustment to your pointer for you at no cost. Allowing A::conlyfunc removes important information that the compiler could have used, at very little benefit.
The first choice is used in C++; all multiple inheritance objects call a thunk before each call to a base class, which adjusts the this pointer to point to the subobject inside it. In a simple example:
class ExampleBaseClass
{
void foo( void );
}
class ExampleDerivedClass : public ExampleBaseClass, private IrrelevantBaseClass
{
void bar( void );
}
would then become
void examplebaseclass_foo( ExampleBaseClass* this );
void examplederivedclass_bar( ExampleDerivedClass* this);
void examplederivedclass_thunk_foo( ExampleDerivedClass* this)
{
examplebaseclass_foo( this + delta );
}
This could be inlined in many situations, so it's not too big of overhead. However, if you could never refer to ExampleBaseClass::foo as ExampleDerivedClass::foo, these thunks wouldn't be needed, as the delta would be easily discernible from the call itself.
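A compilable sketch of such a thunk is shown below. Note the assumption: the derived struct is laid out by hand with the irrelevant base first, purely so that delta is non-zero and the adjustment is visible; the `calls` and `padding` members are invented for the demonstration.

```cpp
#include <cassert>
#include <cstddef>

struct IrrelevantBaseClass { int padding; };
struct ExampleBaseClass { int calls; };

// Hand-lowered ExampleDerivedClass: base subobjects become members.
struct ExampleDerivedClass {
    IrrelevantBaseClass irrelevant;
    ExampleBaseClass base;
    int own;
};

void examplebaseclass_foo(ExampleBaseClass* self) { ++self->calls; }

// Thunk: shift the pointer by delta, then forward to the base function.
void examplederivedclass_thunk_foo(ExampleDerivedClass* self) {
    const std::size_t delta = offsetof(ExampleDerivedClass, base);
    char* raw = reinterpret_cast<char*>(self) + delta;
    examplebaseclass_foo(reinterpret_cast<ExampleBaseClass*>(raw));
}

void check_thunk() {
    ExampleDerivedClass d = {{0}, {0}, 0};
    examplederivedclass_thunk_foo(&d);
    assert(d.base.calls == 1);                        // the base part was updated
    assert(offsetof(ExampleDerivedClass, base) > 0);  // delta is non-zero here
}
```

In hand-written code one would simply pass `&self->base`; the explicit byte offset is spelled out only to mirror the `this + delta` step.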
Virtual functions
Virtual functions add a whole new layer of complexity. In the multiple inheritance example, the thunks had fixed addresses to call; we were just adjusting the this before passing it to an already known function. With virtual functions, the function we're calling is unknown; we could be overridden by a derived object we have no possibility of knowing about at compile time, due to it being in another translation unit or a library, etc.
This means we need some form of dynamic dispatch for each object that has a virtually overridable function; many methods are possible, but C++ implementations tend to use a simple array of function pointers, or a vtable. To each object that has virtual functions, we add a pointer to an array as a hidden member, usually at the front:
class A
{
hidden:
void* vtable;
public:
virtual void foo( void );
}
We add thunk functions which redirect to the vtable
void a_foo( A* this )
{
int vindex = 0;
this->vtable[vindex](this);
}
The vtable is then populated with pointers to the functions we actually want to call:
vtable[0] = &A::foo_default; // our base class implementation of foo
In a derived class, if we wish to override this virtual function, all we need to do is change the vtable in our own object, to point to the new function, and it will override in the base class as well:
class B: public A
{
virtual void foo( void );
}
will then do this in the constructor:
((A*)this)->vtable[0] = &B::foo;
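The pieces above can be assembled into a small hand-rolled example. This is a sketch with a few deliberate deviations that should be read as assumptions: each class owns one shared table and the constructor swaps the pointer (instead of writing into a slot), the functions return `int` so the dispatch can be observed, and `self` replaces `this`.

```cpp
#include <cassert>

struct A;                        // lowered class, defined below
typedef int (*FooFn)(A* self);   // type of the single virtual slot

struct VTable { FooFn foo; };    // slot 0 is foo

struct A {
    const VTable* vtable;        // hidden member, stored first
};

int a_foo_default(A*) { return 1; }  // base implementation of foo
int b_foo(A*) { return 2; }          // override supplied by B

const VTable a_vtable = {&a_foo_default};
const VTable b_vtable = {&b_foo};

// Thunk used by callers: look up the slot, then call through it.
int a_foo(A* self) { return self->vtable->foo(self); }

// Lowered constructors. B first runs A's constructor, then repoints vtable.
void a_construct(A* self) { self->vtable = &a_vtable; }

struct B { A base; };            // single inheritance: base stored first

void b_construct(B* self) {
    a_construct(&self->base);
    self->base.vtable = &b_vtable;
}

void check_vtable() {
    A a;
    a_construct(&a);
    B b;
    b_construct(&b);
    assert(a_foo(&a) == 1);
    assert(a_foo(&b.base) == 2);  // called through the base, B's version runs
}
```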
Finally, we have support for all forms of inheritance!
Almost.
Virtual Inheritance
There is one final caveat with this implementation: if you continue to allow Derived::foo to be used when what you really mean is Base::foo, you get the diamond problem:
class A : public B, public C;
class B : public D;
class C : public D;
A::DFunc(); // Which D?
This problem can also occur when you use base classes as stateful classes, or when you model a relationship as is-a where it should be has-a; generally, it's a sign that a restructure is needed. But not always.
In C++, this has a solution that is not very elegant, but works:
class A : public B, public C;
class B : virtual D;
class C : virtual D;
This requires those who implement such classes and hierarchies to think ahead and intentionally make their classes a little slower, to support a possible future usage. But it does solve the problem.
How can we implement this solution?
[ [ D ] [ B ] [ Dptr ] [ C ] [ Dptr ] A ]
Rather than use the base class directly, as in normal inheritance, with virtual inheritance we push all usages of D through a pointer, adding an indirection, while stomping the multiple instantiations into a single one. Notice that both B and C have their own pointer, that both point to the same D; this is because B and C don't know if they are free floating copies or bound in derived objects. The same calls need to be used for both, or virtual functions won't work as expected.
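The effect of sharing one D can be checked with ordinary C++ classes. In this sketch the class names carry a 1 or 2 suffix (an addition for this example) so that the non-virtual and virtual hierarchies can sit side by side; how the compiler finds the shared D internally remains implementation-specific.

```cpp
#include <cassert>

struct D { int value = 0; };

// Ordinary inheritance: the most-derived object gets two D subobjects.
struct B1 : D {};
struct C1 : D {};
struct A1 : B1, C1 {};

// Virtual inheritance: B2 and C2 share one D inside A2.
struct B2 : virtual D {};
struct C2 : virtual D {};
struct A2 : B2, C2 {};

void check_diamond() {
    A1 a1;
    D* via_b1 = static_cast<B1*>(&a1);
    D* via_c1 = static_cast<C1*>(&a1);
    assert(via_b1 != via_c1);   // two separate copies of D

    A2 a2;
    D* via_b2 = static_cast<B2*>(&a2);
    D* via_c2 = static_cast<C2*>(&a2);
    assert(via_b2 == via_c2);   // one shared D
    a2.value = 5;               // unambiguous: only one D::value exists
    assert(via_b2->value == 5);
}
```

With the non-virtual hierarchy, `a1.value` would not compile because the compiler cannot tell which D is meant.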
Summary
Transform method calls into function calls with a special this parameter in base classes
Structure objects in memory so single inheritance is no different from no inheritance
Add thunks to adjust this pointers then call base classes for multiple inheritance
Add vtables to classes with virtual methods, and make all calls to methods go through vtable to method (thunk -> vtable -> method)
Deal with virtual inheritance through a pointer-to-baseobject rather than derive object calls
All of this is straightforward in asm.js.
Tim,
I'm by no means an asm.js expert but your question intrigues me. It goes to the heart of object-oriented language design. It also seems ironic that you are recreating machine-level problems in the JavaScript domain.
The solution to your question, it seems to me, is that you will need to set up an accounting of defined types and functions. In Java this is typically done by decorating bytecode with identifiers that represent the correct class->function mapping of any given object. If you use an Int32 identifier for each class that is defined and an additional Int32 identifier for each function defined, you can then store these in the object representations on the heap. Your vtable is then no more than the mapping of these combinations to specific functions.
I hope this helps you.
I am not very familiar with the exact syntax of asm.js, but here is how I implemented a vtable in a compiler of mine (for x86):
Every object is derived from a struct like this:
struct Object {
VTable *vtable;
};
Then the other types I use will look something like this in c++-syntax:
struct MyInt : Object {
int value;
};
which is (in this case) equivalent to:
struct MyInt {
VTable *vtable;
int value;
};
So the final layout of every object is such that at offset 0 I know I have a pointer to a vtable. The vtable I use is simply an array of function pointers; again in C syntax, the type VTable could be defined as follows:
typedef Function *VTable;
In C I would use void * instead of Function *, since the actual types of the functions will vary. What is left for the compiler to do is:
1: For each type containing virtual functions, create a global vtable and populate it with function pointers to the overridden functions.
2: When an object is created, set the vtable member of the object (at offset 0) to point to the global vtable.
Then when you want to call virtual functions you can do something like this:
(*myObject->vtable[1])(1);
to call the function your compiler has assigned the ID 1 in the vtable (methodB in the example below).
A final example: Let's say we have the following two classes:
class A {
public:
virtual int methodA(int) { ... }
virtual int methodB(int) { ... }
virtual int methodC(int) { ... }
};
class B : public A {
public:
virtual int methodA(int) { ... }
virtual int methodB(int) { ... }
};
The VTable for class A and B can look like this:
A: B:
0: &A::methodA 0: &B::methodA
1: &A::methodB 1: &B::methodB
2: &A::methodC 2: &A::methodC
By using this logic, we know that when we are calling methodB on any type derived from A, we shall call whatever function is located at index 1 in the vtable of that object.
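The table above can be written out by hand to see the indexing at work. This is a sketch, not the output of any particular compiler: the `Slot` enum, the `call_virtual` helper and the method bodies are assumptions chosen so each function returns a distinguishable value.

```cpp
#include <cassert>

struct Object;  // every object starts with a vtable pointer
typedef int (*Function)(Object* self, int arg);
typedef const Function* VTable;  // a vtable is an array of function pointers

struct Object { VTable vtable; };

enum Slot { METHOD_A = 0, METHOD_B = 1, METHOD_C = 2 };

int A_methodA(Object*, int x) { return x + 1; }
int A_methodB(Object*, int x) { return x + 2; }
int A_methodC(Object*, int x) { return x + 3; }
int B_methodA(Object*, int x) { return x * 10; }
int B_methodB(Object*, int x) { return x * 20; }

// One global table per type; B reuses A's entry for the slot it does not override.
const Function vtable_A[] = {&A_methodA, &A_methodB, &A_methodC};
const Function vtable_B[] = {&B_methodA, &B_methodB, &A_methodC};

// Virtual call: fetch the function at a fixed index and call it.
int call_virtual(Object* obj, Slot slot, int arg) {
    return obj->vtable[slot](obj, arg);
}

void check_slots() {
    Object a = {vtable_A};
    Object b = {vtable_B};
    assert(call_virtual(&a, METHOD_B, 1) == 3);
    assert(call_virtual(&b, METHOD_B, 1) == 20);
    assert(call_virtual(&b, METHOD_C, 1) == 4);  // inherited from A
}
```

The caller only needs the slot index, which is fixed when the base class is compiled; which function sits in that slot is decided by the object's own table.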
Of course, this solution does not work right away if you want to allow for multiple inheritance, but I am fairly sure it can be extended to do so. After some debugging with Visual Studio 2008, it seems like this is more or less how the vtables are implemented there (of course there it is extended to handle multiple inheritance; I have not tried to figure that out yet).
I hope you get some ideas that can be applied in asm.js at least. Like I said, I do not know exactly how asm.js works, but I have managed to implement this system in x86 assembly and I do not see any issues with implementing it in JavaScript either, so I hope it can be used in asm.js as well.
class base {
public:
    .....
    virtual int function1();
    virtual int function2();
};
class derived : public base {
public:
    int function1();
    int function2();
};
int main()
{
    derived d;
    base *b = &d;
    int k = b->function1(); // Why use this instead of the following line?
    int k2 = d.function1(); // With this, the need for virtual functions is gone, right?
}
I am not a CompSci engineer and I would like to know this. Why use virtual functions if we can avoid base class pointers?
The power of polymorphism isn't really apparent in your simple example, but if you extend it a bit it might become clearer.
class vehicle {
public:
    .....
    virtual int getEmission();
};
class car : public vehicle {
public:
    int getEmission();
};
class bus : public vehicle {
public:
    int getEmission();
};
int main()
{
    car a;
    car b;
    car c;
    bus d;
    bus e;
    vehicle *traffic[] = {&a, &b, &c, &d, &e};
    int totalEmission = 0;
    for (int i = 0; i < 5; i++)
    {
        totalEmission += traffic[i]->getEmission();
    }
}
This lets you iterate through a list of pointers and have different methods get called depending on the underlying type. Basically it lets you write code where you don't need to know what the child type is at compile time, but the code will perform the right function anyway.
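For readers who want to run it, here is a self-contained variant of the same idea. The emission figures (120 and 800) are made up for the example, and the free function `totalEmission` is an assumption introduced so the loop can be tested in isolation.

```cpp
#include <cassert>
#include <cstddef>

class vehicle {
public:
    virtual ~vehicle() = default;
    virtual int getEmission() const = 0;
};

class car : public vehicle {
public:
    int getEmission() const override { return 120; }  // made-up figure
};

class bus : public vehicle {
public:
    int getEmission() const override { return 800; }  // made-up figure
};

// Sums emissions without knowing which concrete types are in the array.
int totalEmission(const vehicle* const* traffic, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += traffic[i]->getEmission();
    return total;
}

void check_traffic() {
    car a, b, c;
    bus d, e;
    const vehicle* traffic[] = {&a, &b, &c, &d, &e};
    assert(totalEmission(traffic, 5) == 3 * 120 + 2 * 800);
}
```

Adding a new vehicle type later requires no change to `totalEmission`.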
You're correct, if you have an object you don't need to refer to it via a pointer. You also don't need a virtual destructor when the object will be destroyed as the type it was created.
The utility comes when you get a pointer to an object from another piece of code, and you don't really know what the most derived type is. You can have two or more derived types built on the same base, and have a function that returns a pointer to the base type. Virtual functions will allow you to use the pointer without worrying about which derived type you're using, until it's time to destroy the object. The virtual destructor will destroy the object without you knowing which derived class it corresponds to.
Here's the simplest example of using virtual functions:
base *b = new derived;
b->function1();
delete b;
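A slightly fuller sketch of those three lines, showing that the virtual destructor runs the derived part first. The global counters and the `make_object` factory are assumptions added only to make the destructor calls observable.

```cpp
#include <cassert>

int destroyed_derived = 0;  // counters used only to observe destructor calls
int destroyed_base = 0;

class base {
public:
    virtual ~base() { ++destroyed_base; }
    virtual int function1() { return 1; }
};

class derived : public base {
public:
    ~derived() override { ++destroyed_derived; }
    int function1() override { return 2; }
};

// Returns a base pointer; the caller never sees the concrete type.
base* make_object() { return new derived; }

void check_virtual_destructor() {
    destroyed_base = destroyed_derived = 0;
    base* b = make_object();
    assert(b->function1() == 2);  // derived version runs
    delete b;                     // virtual destructor: ~derived, then ~base
    assert(destroyed_derived == 1);
    assert(destroyed_base == 1);
}
```

If `~base()` were not virtual, `delete b` on a `derived` object would be undefined behaviour.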
It's to implement polymorphism. Unless you have a base class pointer
pointing to a derived object, you cannot have polymorphism here.
One of the key features of derived classes is that a pointer to a
derived class is type-compatible with a pointer to its base class.
Polymorphism is the art of taking advantage of this simple but
powerful and versatile feature, that brings Object Oriented
Methodologies to its full potential.
In C++, a special type/subtype relationship exists in which a base
class pointer or a reference can address any of its derived class
subtypes without programmer intervention. This ability to manipulate
more than one type with a pointer or a reference to a base class is
spoken of as polymorphism.
Subtype polymorphism allows us to write the kernel of our application
independent of the individual types we wish to manipulate. Rather, we
program the public interface of the base class of our abstraction
through base class pointers and references. At run-time, the actual
type being referenced is resolved and the appropriate instance of the
public interface is invoked. The run-time resolution of the
appropriate function to invoke is termed dynamic binding (by default,
functions are resolved statically at compile-time). In C++, dynamic
binding is supported through a mechanism referred to as class virtual
functions. Subtype polymorphism through inheritance and dynamic
binding provide the foundation for object-oriented programming.
The primary benefit of an inheritance hierarchy is that we can program
to the public interface of the abstract base class rather than to the
individual types that form its inheritance hierarchy, in this way
shielding our code from changes in that hierarchy. We define eval(),
for example, as a public virtual function of the abstract Query base
class. By writing code such as
_rop->eval();
user code is shielded from the variety and volatility of our query language. This not only allows for the addition, revision,
or removal of types without requiring changes to user programs, but
frees the provider of a new query type from having to recode behavior
or actions common to all types in the hierarchy itself. This is
supported by two special characteristics of inheritance: polymorphism
and dynamic binding. When we speak of polymorphism within C++, we
primarily mean the ability of a pointer or a reference of a base class
to address any of its derived classes. For example, if we define a
nonmember function eval() as follows,
// pquery can address any of the classes derived from Query
void eval( const Query *pquery ) { pquery->eval(); }
we can invoke it legally, passing in the address of an object of any of the
four query types:
int main()
{
AndQuery aq;
NotQuery notq;
OrQuery *oq = new OrQuery;
NameQuery nq( "Botticelli" );
// ok: each is derived from Query
// compiler converts to base class automatically
eval( &aq );
eval( &notq );
eval( oq );
eval( &nq );
}
whereas an attempt to invoke eval() with the address of an object not derived from Query
results in a compile-time error:
int main()
{
string name( "Scooby-Doo" );
// error: string is not derived from Query
eval( &name );
}
Within eval(), the execution of pquery->eval(); must invoke the
appropriate eval() virtual member function based on the actual class
object pquery addresses. In the previous example, pquery in turn
addresses an AndQuery object, a NotQuery object, an OrQuery object,
and a NameQuery object. At each invocation point during the execution
of our program, the actual class type addressed by pquery is
determined, and the appropriate eval() instance is called. Dynamic
binding is the mechanism through which this is accomplished.
In the object-oriented paradigm, the programmer manipulates an unknown instance of a bound but infinite set of types. (The set of
types is bound by its inheritance hierarchy. In theory, however, there
is no limit to the depth and breadth of that hierarchy.) In C++ this
is achieved through the manipulation of objects through base class
pointers and references only. In the object-based paradigm, the
programmer
manipulates an instance of a fixed, singular type that is completely defined at the point of compilation. Although the
polymorphic manipulation of an object requires that the object be
accessed either through a pointer or a reference, the manipulation of
a pointer or a reference in C++ does not in itself necessarily result
in polymorphism. For example, consider
// no polymorphism
int *pi;
// no language-supported polymorphism
void *pvi;
// ok: pquery may address any Query derivation
Query *pquery;
In C++, polymorphism
exists only within individual class hierarchies. Pointers of type
void* can be described as polymorphic, but they are without explicit
language support — that is, they must be managed by the programmer
through explicit casts and some form of discriminant that keeps track
of the actual type being addressed.
You seem to have asked two questions (in the title and in the end):
Why use base class pointers for derived classes?
This is the very use of polymorphism. It allows you to treat objects uniformly while allowing you to have specific implementation. If this bothers you, then I assume you should ask: Why polymorphism?
Why use virtual destructors if we can avoid base class pointers?
The problem here is you cannot always avoid base class pointers to exploit the strength of polymorphism.
When exactly does the compiler create a virtual function table?
1) when the class contains at least one virtual function.
OR
2) when the immediate base class contains at least one virtual function.
OR
3) when any parent class at any level of the hierarchy contains at least one virtual function.
A related question to this:
Is it possible to give up dynamic dispatch in a C++ hierarchy?
e.g. consider the following example.
#include <iostream>
using namespace std;
class A {
public:
virtual void f();
};
class B: public A {
public:
void f();
};
class C: public B {
public:
void f();
};
Which classes will contain a V-Table?
Since B does not declare f() as virtual, does class C get dynamic polymorphism?
Beyond "vtables are implementation-specific" (which they are), if a vtable is used: there will be unique vtables for each of your classes. Even though B::f and C::f are not declared virtual, because there is a matching signature on a virtual method from a base class (A in your code), B::f and C::f are both implicitly virtual. Because each class has at least one unique virtual method (B::f overrides A::f for B instances and C::f similarly for C instances), you need three vtables.
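That implicit virtuality is easy to confirm. The sketch below changes `f()` to return a string (an assumption made so the result can be checked) and adds two hypothetical helper functions; otherwise it follows the A/B/C hierarchy from the question.

```cpp
#include <cassert>
#include <string>

class A {
public:
    virtual ~A() = default;
    virtual std::string f() { return "A::f"; }
};

class B : public A {
public:
    std::string f() { return "B::f"; }  // no `virtual`, still overrides A::f
};

class C : public B {
public:
    std::string f() { return "C::f"; }  // overrides as well
};

std::string call_through_a(A& a) { return a.f(); }
std::string call_through_b(B& b) { return b.f(); }

void check_implicit_virtual() {
    B b;
    C c;
    assert(call_through_a(b) == "B::f");
    assert(call_through_a(c) == "C::f");
    assert(call_through_b(c) == "C::f");  // dispatch is still dynamic via B&
}
```

So omitting the keyword in B does not turn off dynamic dispatch for C.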
You generally shouldn't worry about such details. What matters is whether you have virtual dispatch or not. You can bypass virtual dispatch by explicitly qualifying which function to call, but this is generally only useful when implementing a virtual method (such as to call the base's method). Example:
struct B {
virtual void f() {}
virtual void g() {}
};
struct D : B {
virtual void f() { // would be implicitly virtual even if not declared virtual
B::f();
// do D-specific stuff
}
virtual void g() {}
};
int main() {
{
B b; b.g(); b.B::g(); // both call B::g
}
{
D d;
B& b = d;
b.g(); // calls D::g
b.B::g(); // calls B::g
// b.D::g(); // not allowed: D is not a base of B
d.D::g(); // calls D::g
void (B::*p)() = &B::g;
(b.*p)(); // calls D::g
// calls through a pointer to member function always use virtual dispatch
// (if the pointed-to function is virtual)
}
return 0;
}
Some concrete rules that may help; but don't quote me on these, I've likely missed some edge cases:
If a class has virtual methods or virtual bases, even if inherited, then instances must have a vtable pointer.
If a class declares non-inherited virtual methods (such as when it doesn't have a base class), then it must have its own vtable.
If a class has a different set of overriding methods than its first base class, then it must have its own vtable, and cannot reuse the base's. (Destructors commonly require this.)
If a class has multiple base classes, with the second or later base having virtual methods:
If no earlier bases have virtual methods and the Empty Base Optimization was applied to all earlier bases, then treat this base as the first base class.
Otherwise, the class must have its own vtable.
If a class has any virtual base classes, it must have its own vtable.
Remember that a vtable is similar to a static data member of a class, and instances have only pointers to these.
Also see the comprehensive article C++: Under the Hood (March 1994) by Jan Gray. (Try Google if that link dies.)
Example of reusing a vtable:
struct B {
virtual void f();
};
struct D : B {
// does not override B::f
// does not have other virtuals of its own
void g(); // still might have its own non-virtuals
int n; // and data members
};
In particular, notice B's dtor isn't virtual (and this is likely a mistake in real code), but in this example, D instances will point to the same vtable as B instances.
The answer is, 'it depends'. It depends on what you mean by 'contain a vtbl' and it depends on the decisions made by the implementor of the particular compiler.
Strictly speaking, no 'class' ever contains a virtual function table. Some instances of some classes contain pointers to virtual function tables. However, that's just one possible implementation of the semantics.
In the extreme, a compiler could hypothetically put a unique number into the instance that indexed into a data structure used for selecting the appropriate virtual function instance.
If you ask, 'What does GCC do?' or 'What does Visual C++ do?' then you could get a concrete answer.
@Hassan Syed's answer is probably closer to what you were asking about, but it is really important to keep the concepts straight here.
There is behavior (dynamic dispatch based on what class was new'ed) and there's implementation. Your question used implementation terminology, though I suspect you were looking for a behavioral answer.
The behavioral answer is this: any class that declares or inherits a virtual function will exhibit dynamic behavior on calls to that function. Any class that does not, will not.
Implementation-wise, the compiler is allowed to do whatever it wants to accomplish that result.
Answer
A vtable is created when a class declaration contains a virtual function. A vtable is introduced when a parent, anywhere in the hierarchy, has a virtual function; let's call this parent Y. Any parent of Y WILL NOT have a vtable (unless it has a virtual function of its own somewhere in its hierarchy).
Read on for discussion and tests
-- explanation --
When you declare a member function virtual, you may later use subclasses polymorphically through a base class at run time. In keeping with C++'s priority on performance, implementations use the lightest possible strategy: one level of indirection, and only when a class might be used polymorphically at run time, which the programmer signals by making at least one function virtual.
You do not incur the cost of the vtable if you avoid the virtual keyword.
-- edit : to reflect your edit --
Subclasses contain a vtable only when a base class contains a virtual function. The parents of said base class do not have a vtable.
In your example all three classes will have a vtable, because you can try to use all three classes via an A*.
--test - GCC 4+ --
#include <iostream>
class test_base
{
public:
void x(){std::cout << "test_base" << "\n"; };
};
class test_sub : public test_base
{
public:
virtual void x(){std::cout << "test_sub" << "\n"; } ;
};
class test_subby : public test_sub
{
public:
void x() { std::cout << "test_subby" << "\n"; }
};
int main()
{
test_sub sub;
test_base base;
test_subby subby;
test_sub * psub;
test_base *pbase;
test_subby * psubby;
pbase = &sub;
pbase->x();
psub = &subby;
psub->x();
return 0;
}
output
test_base
test_subby
test_base does not have a virtual table, so any call made through a test_base pointer uses the x() from test_base. test_sub, on the other hand, makes x() virtual, so calls through a test_sub pointer go through a vtable, as shown by test_subby's x() being executed.
So, a vtable is only introduced in the hierarchy at the point where the keyword virtual is used. Older ancestors do not have a vtable, and if a pointer is converted to an ancestor type (an upcast), calls through it are hardwired to the ancestor's functions.
You made an effort to make your question very clear and precise, but there's still a bit of information missing. You probably know that in implementations that use a V-Table, the table itself is normally an independent data structure stored outside the polymorphic objects, while the objects themselves only store an implicit pointer to the table. So, what is it you are asking about? Could be:
When does an object get an implicit pointer to V-Table inserted into it?
or
When is a dedicated, individual V-Table created for a given type in the hierarchy?
The answer to the first question is: an object gets an implicit pointer to a V-Table inserted into it when the object is of polymorphic class type. A class type is polymorphic if it contains at least one virtual function, or if any of its direct or indirect parents is polymorphic (this is answer 3 from your set). Note also that, in the case of multiple inheritance, an object might (and will) end up containing multiple V-Table pointers embedded into it.
The answer to the second question could be the same as to the first (option 3), with a possible exception. If some polymorphic class in a single-inheritance hierarchy has no virtual functions of its own (no new virtual functions, no overrides of parent virtual functions), an implementation might decide not to create an individual V-Table for this class, but instead use its immediate parent's V-Table for this class as well (since it is going to be the same anyway). I.e., in this case both objects of the parent type and objects of the derived type will store the same value in their embedded V-Table pointers. This is, of course, highly dependent on the implementation. I checked GCC and MS VS 2005 and they don't act that way. They both create an individual V-Table for the derived class in this situation, but I seem to recall hearing about implementations that don't.
The C++ standard doesn't mandate using V-Tables to create the illusion of polymorphic classes. Most implementations do use V-Tables to store the extra information needed. In short, a class gets this extra information when it has at least one virtual function.
The behavior is defined in chapter 10.3, paragraph 2 of the C++ language specification:
If a virtual member function vf is declared in a class Base and in a class Derived, derived directly or indirectly from Base, a member function vf with the same name and same parameter list as Base::vf is declared, then Derived::vf is also virtual (whether or not it is so declared) and it overrides Base::vf.
I italicized the relevant phrase. Thus, if your compiler creates v-tables in the usual sense then all classes will have a v-table since all their f() methods are virtual.