Difference in VTBL in single inheritance and multiple inheritance - c++

I was taught in class that in the case of single inheritance the VTBL includes all of the virtual functions the class can respond to. The following image should illustrate this.
In multiple inheritance I was taught that the VTBL includes only the virtual functions that were first defined in that class or that have been overridden in that class. This means that at run time you've got to search for the right method implementation using the dispatch algorithm.
I'm not entirely sure why this difference exists. Why couldn't the VTBL in the case of multiple inheritance consist of all the virtual functions that the class can respond to (just like in the case of single inheritance)? This should speed up the process since we don't have to look for the method implementation at run time throughout the whole inheritance hierarchy.
Can anyone clarify this for me?
Edit: When I refer to the dispatch algorithm for multiple inheritance I'm referring to the following:
Just to clarify: notice how we've got to traverse the hierarchy to search for the implementation rather than just going to the current class's VTBL and jumping to the method.

Here's a translated example from published German notes by Scott Meyers. Consider
class B1 {
public:
    virtual void mf(); // may be overridden in derived classes
};
class B2 {
public:
    virtual void mf(); // may be overridden in derived classes
};
class D: public B1, public B2 {};
void g(B2 *pb2)
{
    pb2->mf(); // requires offset adjustment before calling mf?
}
An offset adjustment of the pointer passed to g() is needed only if D overrides mf and pb2 really points to a D. What should a compiler do? When generating code for the call,
It may not know that D exists (that's the point of dynamic polymorphism: to be able to call future code without recompiling).
It can’t know whether pb2 points to a D (that is known only at runtime).
Because polymorphic classes need to remain flexible against the unbounded set of possible future further derivations, the problem is typically solved by
Creating special vtbls that handle offset adjustments.
For derived class objects, adding new vptrs to these vtbls, one additional vptr for each base class after the first one.
Merging all the virtual functions into a single table would destroy that flexibility. Note that multiple "parallel" inheritance D: B1, B2 {}; is different from "stacked" inheritance D: M: B {};. The latter requires a single substitution chain, the former has two such chains and incompatible B1 and B2.
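A minimal sketch of why the adjustment exists, assuming a mainstream compiler that places B1 and B2 as separate subobjects inside D. The function bodies, the override in D, and the helper names are made up for illustration; they are not from Meyers' notes.

```cpp
#include <cassert>

class B1 {
public:
    virtual ~B1() = default;
    virtual int mf() { return 1; }
};

class B2 {
public:
    virtual ~B2() = default;
    virtual int mf() { return 2; }
};

class D : public B1, public B2 {
public:
    int mf() override { return 3; }  // overrides both B1::mf and B2::mf
};

// g() is compiled knowing only B2; it cannot tell whether pb2 points into a D.
int g(B2* pb2) { return pb2->mf(); }

// Hypothetical helper: the two base subobjects cannot both start where the
// D object starts, so at least one D* -> base* conversion shifts the address.
bool bases_at_different_addresses(D& d) {
    void* as_b1 = static_cast<B1*>(&d);
    void* as_b2 = static_cast<B2*>(&d);
    return as_b1 != as_b2;
}

// dynamic_cast<void*> recovers the start of the complete object from a base view.
void* most_derived(B2* pb2) { return dynamic_cast<void*>(pb2); }

// Usage: the same g() reaches D::mf through a B2*, and B2::mf for a plain B2.
void demo_offset() {
    D d;
    B2 plain;
    assert(g(&d) == 3);
    assert(g(&plain) == 2);
    assert(most_derived(static_cast<B2*>(&d)) == static_cast<void*>(&d));
}
```

When g() ends up in D::mf, the this pointer it hands over still points at the B2 part, so something (a thunk or an adjusted vtable entry, depending on the ABI) has to move it back to the start of the D object.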

If you have two base classes A and B of your multiply inherited object D, each has its own vtable layout, and D needs to provide vtables which match the vtables of both A and B. Further, if another class derives from D and possibly from another similarly multiply inherited class, the same thing happens again, i.e., multiple vtables are needed. They can't simply be merged. As a result, multiply inherited objects typically carry multiple vtable pointers, and the compiler inserts code to first determine the function's correct vtable and then call it. I think the code determining the correct vtable based on a pointer to an object with multiple bases is just a simple addition or subtraction if the virtual function is not in a virtual base class, and a look-up of the location of the virtual base class otherwise, i.e., there isn't anything really expensive being done, but more than just an indirect call is needed.

Related

C++ class hierarchy in which to add functionality with an interface

C++. Imagine the following situation.
There's a class hierarchy of classes deriving from some base class A.
We cannot modify A because it is outside of our scope.
(Provided by a library, it is an MFC CView class, but that shouldn't matter here)
So there are A1, A2 etc which are different classes somehow derived from A and providing specific functionality.
Now imagine we define some new interface I to provide some new functionality.
Classes for concrete objects of the application will inherit from both one of the As and I.
Let's call them Bs. (There are again several of them, like B1 derived from A1 and I, B2 derived from A2 and I etc.)
Now it happens that to implement the interface of I, there is a lot of common code that needs functionality from A.
How can we organize the class hierarchy without repeating ourselves too much?
So for instance if there is a function I::f that needs to call A::f, for all derived classes Bn.
It seems like waste to re-implement I::f for every Bn.
But obviously, we cannot call A::f directly from I::f, as they aren't related.
I hope you get the point.
What is the pattern that can help us here?
The immediate solution to "call A::f from I::f without overhauling everything" would be dynamic_cast:
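struct I {
    virtual ~I() = default; // I must be polymorphic for dynamic_cast to compile
    void f() {
        // Cross-cast to the A base of the same object; null if there is none.
        if (A *a = dynamic_cast<A *>(this))
            a->f();
    }
};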
Note that this performs a full-fledged RTTI graph traversal to perform the cross-cast through the unknown Ax dynamic type of the object, so it might be on the slow side of things.
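Here is a self-contained sketch of that cross-cast. A and A1 are stand-ins for the library classes, the std::string return values exist only so the result can be checked, and Lonely is a hypothetical class added to show the failure case.

```cpp
#include <cassert>
#include <string>

// Stand-in for the library class that cannot be modified (assumption).
struct A {
    virtual ~A() = default;
    virtual std::string f() { return "A::f"; }
};
struct A1 : A {
    std::string f() override { return "A1::f"; }
};

// The new interface. It needs at least one virtual function so that
// dynamic_cast can inspect the dynamic type behind `this`.
struct I {
    virtual ~I() = default;
    // Shared implementation: cross-cast to the A part of the same object.
    std::string f() {
        A* a = dynamic_cast<A*>(this);
        return a ? a->f() : "no A base";  // null if the object has no A base
    }
};

struct B1 : A1, I {};   // concrete application class
struct Lonely : I {};   // hypothetical: implements I without any A

// Usage: call through the I view, because B1 sees both A::f and I::f.
void demo_cross_cast() {
    B1 b;
    I& as_i = b;
    assert(as_i.f() == "A1::f");
}
```

Calling b.f() directly on a B1 would be ambiguous, since both bases declare f; going through an I reference (or pointer) avoids that.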

Diamond inheritance

Assume classes D and E and F all inherit from base class B, and that class C inherits from D and E.
(i) How many copies of class B appear in class C?
(ii) How would using virtual inheritance change this scenario? Explain your answer.
(iii) How does Java avoid the need for multiple inheritance for many of the situations where multiple inheritance might be used in C++?
Here are some of my current ideas, but I'm by no means an expert on C++!
(i) If C inherits from D and E which are subclasses of B, then would D and E technically be copies of their super class? Then if C inherits from D and E that would mean there are 2 copies of B in C.
(ii) Using virtual is somewhat similar to using Abstract in Java (I think). Now given this, it would mean that there would not be multiple copies of B in C, as the instantiation would be cascaded down to the level it is needed. I am not sure how to word my explanation but say B has a function called print() which prints "i am B" and C overrides this function but prints "i am C". If you called print() on C without virtual you end up printing "i am B", using virtual would mean that it would print "i am C".
(iii) My idea here is that Java can use interfaces to avoid the use of multiple inheritance. You can implement multiple interfaces but you can only extend one Class. I'm not sure what else to add here, so any input or relevant resources would be helpful.
(i) and (iii) are right. In my experience anyway, most of the time in C++ when I've used multiple inheritance it's been because the bases were interfaces (a concept which doesn't have keyword support in C++, but it is a concept you can execute anyway).
In (ii), you are right that virtual inheritance leaves only one copy of B in C. However, the rest of your answer is talking about virtual functions, which are completely different from virtual inheritance. Virtual inheritance means that there is only one copy of B, and that D and E both have that same copy as their base. There is no difference in terms of functions; the difference comes in terms of member variables (and base classes) of B.
If there is a function that prints out B's member variable foo; then in case (ii) this function always prints the same value because there is only one foo, but in case (i) calling that function from the D base class may print a different value to calling it from the E base class.
The term "diamond inheritance" wraps all this up in two words that serve as a good mnemonic :)
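To make the foo example concrete, here is a small sketch. The namespaces plain and shared and the helper write_via_d_read_via_e are invented for this illustration; F is left out because it plays no part in C.

```cpp
#include <cassert>

// Non-virtual diamond: C contains two separate B subobjects.
namespace plain {
struct B { int foo = 0; };
struct D : B {};
struct E : B {};
struct C : D, E {};
}

// Virtual diamond: D and E share one B subobject inside C.
namespace shared {
struct B { int foo = 0; };
struct D : virtual B {};
struct E : virtual B {};
struct C : D, E {};
}

// Write foo through the D path, then read it back through the E path.
template <class TC, class TD, class TE>
int write_via_d_read_via_e() {
    TC c;
    static_cast<TD&>(c).foo = 42;
    return static_cast<TE&>(c).foo;
}

// Usage: two copies of foo without virtual inheritance, one copy with it.
void demo_diamond() {
    assert((write_via_d_read_via_e<plain::C, plain::D, plain::E>() == 0));
    assert((write_via_d_read_via_e<shared::C, shared::D, shared::E>() == 42));
}
```

In the plain case, writing c.foo without the cast would not even compile, because the name is ambiguous between the two B subobjects.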
You seem to have mostly arrived at the right answers, though the reasoning needs work. The key issue at play here is the question of "how to lay out the memory of an instance of C if it inherits the same base class twice?"
i) There are 2 copies of the base class B in the memory layout for an object of type C. The example provided is a case of "diamond inheritance", because when you draw out the inheritance graph, you essentially draw a diamond. The "problem" with diamond inheritance is how to lay the object out in memory. C++ offers two approaches: a fast one (the default, non-virtual inheritance, which duplicates the base's data members) and a slower one, "virtual inheritance". The non-virtual approach is reasonable when the shared base has no data members (what would be an interface in Java), because then there is nothing to duplicate (see my note at the bottom). It is also the one to use if you only plan to use single inheritance.
ii) If D and E inherit from B virtually (class D : public virtual B, and likewise for E), you are asking the compiler to perform acts of heroism to ensure that only one copy of B exists in the memory layout of any class derived from both; I believe this also incurs a slight performance hit. Any use of a B member from a C instance then always refers to the same place in memory. Note that virtual inheritance has no bearing on whether your functions are virtual.
Aside: This is also completely unrelated to the concept of a class being abstract. To make a class abstract in C++, declare any virtual member function pure by appending = 0, as in virtual void foo() = 0;. Doing so for any one virtual function (including the destructor) is sufficient to make the entire class abstract.
iii) Java outright forbids multiple inheritance of classes. In Java there is only single inheritance plus the ability to implement any number of interfaces. While interfaces do grant you the "is-a" relationship and the ability to have virtual functions, they implicitly avoid the issues that C++ has with data layouts and diamond inheritance, as an interface cannot add any instance data members; ipso facto, there is no confusion about how to resolve any data member's location.
An important extension to iii is to realize that virtual function call dispatch is not impacted at all if you happen to "implement the same interface twice". The reason is that the method will always do the same thing, even if there were multiple copies of it in your virtual table; it only acts on the data of your class, it does not itself contain data that needs to be disambiguated.
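A short sketch of the abstract-class aside combined with the "interfaces have no data" point. Printable, Tag and Label are hypothetical names; the static_asserts use std::is_abstract from the standard library to confirm what the compiler thinks.

```cpp
#include <cassert>
#include <type_traits>

// One pure virtual function is enough to make a class abstract.
struct Printable {
    virtual ~Printable() = default;
    virtual int print() const = 0;
};

// A pure virtual destructor also works, but it still needs a definition,
// because every derived destructor calls it.
struct Tag {
    virtual ~Tag() = 0;
};
inline Tag::~Tag() {}

// Implementing two data-free "interfaces" at once, Java style.
struct Label : Printable, Tag {
    int print() const override { return 7; }
};

static_assert(std::is_abstract<Printable>::value, "pure virtual member function");
static_assert(std::is_abstract<Tag>::value, "pure virtual destructor");
static_assert(!std::is_abstract<Label>::value, "every pure virtual is overridden");

// Usage: Label can be instantiated and used through either interface.
void demo_abstract() {
    Label l;
    const Printable& p = l;
    assert(p.print() == 7);
}
```

Neither base has data members, so inheriting from both raises no layout question beyond where each vptr goes.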

When I add an extra virtual function to a derived class, what is the overhead?

I'm going through Marshall Cline's C++ Faq - specifically this link about how virtual functions are implemented in the compiler.
It seems to be saying that the vptr for a derived class, exists in the base class portion of the object. And when an instance of a derived class is created, another vptr is not created in the derived class part - simply the vptr that already exists in the base class part is initialised to point to the correct vtable.
My question is: what if I declare a virtual function in a derived class, that is not in the base class, what is the overhead? Is there an extra vptr created in the derived class part - or is it still done the same way, i.e. the vptr in the base class part is assigned to point to a particular vtable?
So - to make my question a bit more concrete - in the following example, does the compiler give Apple class an extra vtable, because it added peel_me() virtual function? (I'm assuming the answer must be yes.) If so, does the compiler give instances of Apple another vptr (i.e. on top of the one in its Fruit base class part)?
#include <iostream>

class Fruit {
public:
    virtual void display();
};
class Apple : public Fruit {
public:
    virtual void display() { std::cout << "I'm an apple!\n"; }
    virtual void peel_me(); // extra virtual function, that is not in the base class
};
I can't seem to find the answer to this anywhere.
Typically:
Apple needs its own vtable regardless of whether you add peel_me or not, because it needs its override of display to be found by virtual calls on instances of Apple.
Adding peel_me makes that vtable one entry bigger than it otherwise would be -- this additional entry may or may not occupy a significant amount of space compared to the code for peel_me, but you'd expect probably not. The vtables of all derived classes of Apple are also one entry larger than they would be if there was no peel_me.
Instances of Apple have a single vptr. It points to the vtable for Apple. This table contains entries for all virtual member functions of Apple, including those inherited from Fruit, and including any that aren't overridden in Apple (in which case the vtable entry in Apple refers to the implementation in Fruit).
If the base class already has a virtual function, none.
Only the first virtual function declared in a hierarchy chain affects the size of the object, because a pointer to the virtual table is added. Subsequent ones don't.
This is of course implementation-dependent, but most behave like this.
The compiler usually creates a vtable to store pointers to an object's virtual functions. So adding one usually costs (assuming the base class already has virtual functions so the overhead of storing a vtable has already been paid) the storage of one more function pointer and a speed impact of dereferencing a pointer for every call to that virtual function.
There's no absolute rule, and it depends on the implementation, but... If your derived class already overrides at least one of the virtual functions in the base class, it already has its own vtable; adding a virtual function will add one more entry to the vtable (typically either 4 or 8 bytes). But since there should be only one instance of the vtable per process, that's one pointer in the entire program, which on a modern machine can effectively be considered nothing.
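A quick way to check the per-object cost on your own compiler, as a sketch. The return types are changed to int here purely so the calls can be checked, and the size equality is what mainstream ABIs (Itanium, MSVC) give; the standard itself promises nothing about it.

```cpp
#include <cassert>
#include <cstddef>

class Fruit {
public:
    virtual ~Fruit() = default;
    virtual int display() { return 0; }
};

class Apple : public Fruit {
public:
    int display() override { return 1; }
    virtual int peel_me() { return 2; }  // new virtual, not present in Fruit
};

// No data members anywhere, so on typical ABIs each object is just one vptr.
constexpr std::size_t fruit_size = sizeof(Fruit);
constexpr std::size_t apple_size = sizeof(Apple);

// Usage: the extra virtual function costs a vtable slot, not object bytes.
void demo_overhead() {
    assert(apple_size == fruit_size);  // implementation-dependent, see above
    Apple a;
    Fruit& f = a;
    assert(f.display() == 1);          // dispatches through Apple's vtable
    assert(a.peel_me() == 2);
}
```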

Use of making the base class polymorphic?

I know the keyword virtual makes the base class polymorphic, and that if I create an object and call a virtual function, the corresponding function is called based on the run-time type of the object. But why should I create an object and refer to it through a pointer of a different type? I mean
Base *ptr = new Derived;
ptr->virtualfunction(); // calls the function implemented in the Derived class.
If I create an object so that
Derived *ptr = new Derived;
ptr->virtualfunction(); // which does the same without the need of making the function virtual.
Because you might want to store objects of different types together:
std::vector<std::unique_ptr<Base>> v;
v.push_back(std::make_unique<DerivedA>());
v.push_back(std::make_unique<DerivedB>());
v.push_back(std::make_unique<DerivedC>());
Now, if you go over that vector:
for (auto& p : v) {
    p->foo();
}
It will call foo() of DerivedA, B, and C appropriately.
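A complete version of that idea, as a sketch. The class bodies, the std::string return type and the helpers make_zoo and call_all are assumptions added so the result can be checked; std::make_unique needs C++14.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Base {
    virtual ~Base() = default;  // needed to destroy elements through Base
    virtual std::string foo() const { return "Base"; }
};
struct DerivedA : Base { std::string foo() const override { return "A"; } };
struct DerivedB : Base { std::string foo() const override { return "B"; } };
struct DerivedC : Base { std::string foo() const override { return "C"; } };

// Hypothetical helper: build a container holding three different types.
std::vector<std::unique_ptr<Base>> make_zoo() {
    std::vector<std::unique_ptr<Base>> v;
    v.push_back(std::make_unique<DerivedA>());
    v.push_back(std::make_unique<DerivedB>());
    v.push_back(std::make_unique<DerivedC>());
    return v;
}

// Hypothetical helper: call foo() on every element and join the results.
// This function never mentions DerivedA, DerivedB or DerivedC.
std::string call_all(const std::vector<std::unique_ptr<Base>>& v) {
    std::string out;
    for (const auto& p : v) out += p->foo();
    return out;
}

// Usage: each element answers with its own override.
void demo_container() {
    assert(call_all(make_zoo()) == "ABC");
}
```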
Let's go with a simple example : Let's say you have
class Base {};
class Derived1 : public Base {};
class Derived2 : public Base {};
Now, let's say you want to be able to store in a vector (or any container) both Derived1 and Derived2 instances.
You have to use the base class in that case.
std::vector<Base*>
// or std::vector<std::unique_ptr<Base>>
The need for polymorphism is the need of processing different data in the same manner. Rather than reimplementing the same algorithm over and over for datasets with different shapes, wouldn't it be much easier to have only one implementation of that algorithm, and parameterize it with different operators?
That's the essence of polymorphism. You start with an algorithm, establish the interface it must interact with, and then build implementations of that interface. In C++ the notion of interface is implicit in every class. Any class exposes one interface (though it may support many interfaces through its ancestors), and its descendants implement it as well. By making certain methods virtual, the descendants may override and adapt them to their own internal structures, without modifying how the object is manipulated from the outside.
So polymorphism is really that: values which may adopt different shapes, and the means to access and manipulate them uniformly. The key point in answering your question is perhaps that the algorithm does not know which implementation it is manipulating. You provide a trivial example where the code knows that it works with an instance of Derived, and thus may call its methods directly. In generic code, or code referring to an interface (so to speak), that knowledge does not exist, which forces the code to rely on the base class methods (and requires the programmer to ensure that the classes he plans to use with that code are well defined, i.e. virtual, where needed).
There are many useful applications of polymorphism, but they all derive from the above principle:
heterogeneous dataset (as illustrated by other answers),
injection (in which different implementations of the same interface may be swapped one for another at runtime),
testing (and more specifically mocking, in which classes which interact with a given class C are replaced by dummies which help test the correct behaviour of C),
to name a few. Note that compile time polymorphism (templates), and runtime polymorphism (virtual methods and inheritance) both achieve that goal, albeit in a different way, and with different pros and cons.
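The injection and mocking cases can be sketched like this. Logger, RecordingLogger and the two add_logged functions are hypothetical names invented for the example; the second function shows the template flavour of the same idea.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface that the code under test depends on.
struct Logger {
    virtual ~Logger() = default;
    virtual void log(const std::string& msg) = 0;
};

// Test double: records messages instead of writing them anywhere.
struct RecordingLogger : Logger {
    std::vector<std::string> lines;
    void log(const std::string& msg) override { lines.push_back(msg); }
};

// Runtime polymorphism: accepts any Logger, chosen while the program runs.
int add_logged(int a, int b, Logger& logger) {
    logger.log("add");
    return a + b;
}

// Compile-time polymorphism: accepts any type that has a suitable log().
template <class L>
int add_logged_static(int a, int b, L& logger) {
    logger.log("add");
    return a + b;
}

// Usage: the same mock serves both flavours.
void demo_injection() {
    RecordingLogger mock;
    assert(add_logged(1, 2, mock) == 3);
    assert(add_logged_static(3, 4, mock) == 7);
    assert(mock.lines.size() == 2);
}
```

The virtual version compiles once and can be handed new Logger types later; the template version is instantiated per type and lets the compiler inline the call.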

Virtual dispatch implementation details

First of all, I want to make myself clear that I do understand that there is no notion of vtables and vptrs in the C++ standard. However I think that virtually all implementations implement the virtual dispatch mechanism in pretty much the same way (correct me if I am wrong, but this isn't the main question). Also, I believe I know how virtual functions work, that is, I can always tell which function will be called, I just need the implementation details.
Suppose someone asked me the following:
"You have base class B with virtual functions v1, v2, v3 and derived class D:B which overrides functions v1 and v3 and adds a virtual function v4. Explain how virtual dispatch works".
I would answer like this:
For each class with virtual functions (in this case B and D) we have a separate array of pointers-to-functions called vtable.
The vtable for B would contain
&B::v1
&B::v2
&B::v3
The vtable for D would contain
&D::v1
&B::v2
&D::v3
&D::v4
Now the class B contains a member pointer vptr. D naturally inherits it and therefore contains it too. In the constructor and destructor of B, B sets vptr to point to B's vtable. In the constructor and destructor of D, D sets it to point to D's vtable.
Any call to a virtual function f on an object x of polymorphic class X is interpreted as a call to x.vptr[f's position in vtables]
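The description above can be emulated by hand, which may help to see the moving parts. This is only a model with plain function pointers, not what any particular compiler emits; Obj, Fn, Slot and the free functions are invented names, and every function has the same signature to keep the table homogeneous.

```cpp
#include <cassert>

struct Obj;                      // forward declaration
using Fn = int (*)(Obj*);        // every slot has the same shape in this model

struct Obj {
    const Fn* vptr;              // points at the table for the dynamic type
};

// Slot indices are fixed when B is laid out; D appends its new slot after them.
enum Slot { V1 = 0, V2 = 1, V3 = 2, V4 = 3 };

int B_v1(Obj*) { return 11; }
int B_v2(Obj*) { return 12; }
int B_v3(Obj*) { return 13; }
int D_v1(Obj*) { return 21; }
int D_v3(Obj*) { return 23; }
int D_v4(Obj*) { return 24; }

const Fn B_vtable[] = { B_v1, B_v2, B_v3 };
const Fn D_vtable[] = { D_v1, B_v2, D_v3, D_v4 };  // B_v2 is inherited

// "Constructors": store the vptr for the type being created.
Obj make_B() { return Obj{ B_vtable }; }
Obj make_D() { return Obj{ D_vtable }; }

// x.vptr[f's position]: load the table, index the slot, call through it.
int call(Obj& x, Slot s) { return x.vptr[s](&x); }

// Usage: same call site, different target depending on the stored vptr.
void demo_manual_vtable() {
    Obj b = make_B();
    Obj d = make_D();
    assert(call(b, V1) == 11);
    assert(call(d, V1) == 21);
    assert(call(d, V2) == 12);   // not overridden, so B's function
    assert(call(d, V4) == 24);   // slot that exists only in D's table
}
```

Indexing slot V4 on an object made by make_B() would read past the end of B_vtable; a real compiler prevents this statically, because v4 is not a member of B.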
The questions are:
1. Do I have any errors in the above description?
2. How does the compiler know f's position in vtable (in detail, please)
3. Does this mean that if a class has two bases then it has two vptrs? What is happening in this case? (try to describe in a similar manner as I did, in as much detail as possible)
4. What's happening in a diamond hierarchy with A on top B,C in the middle and D at the bottom? (A is a virtual base class of B and C)
Thanks in advance.
1. Do I have any errors in the above description?
All good. :-)
2. How does the compiler know f's position in vtable
Each vendor will have their own way of doing this, but I always think of the vtable as a map of the member function signature to memory offset. So the compiler just maintains this list.
3. Does this mean that if a class has two bases then it has two vptrs? What is happening in this case?
Typically, compilers compose a new vtable which consists of all the vtables of the virtual bases appended together in the order they were specified, along with the vtable pointer of the virtual base. They follow this with the vtable functions of the deriving class. This is extremely vendor-specific, but for class D : B1, B2, you typically see D._vptr[0] == B1._vptr.
That image is actually for composing the member fields of an object, but vtables can be composed by the compiler in the exact same way (as far as I understand it).
4. What's happening in a diamond hierarchy with A on top B,C in the middle and D at the bottom? (A is a virtual base class of B and C)
The short answer? Absolute hell. Did you virtually inherit both the bases? Just one of them? Neither of them? Ultimately, the same techniques of composing a vtable for the class are used, but how this is done varies way too wildly, since how it should be done is not at all set in stone. There is a decent explanation of solving the diamond-hierarchy problem here, but, like most of this, it is quite vendor-specific.
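For question 3, one way to observe the effect of two polymorphic bases without relying on vendor internals is to look at sizes and subobject addresses, as in this sketch. The member functions f and g, the helper b2_offset and the return values are assumptions for illustration, and the exact numbers you see depend on the ABI.

```cpp
#include <cassert>
#include <cstddef>

struct B1 { virtual ~B1() = default; virtual int f() { return 1; } };
struct B2 { virtual ~B2() = default; virtual int g() { return 2; } };

struct D : B1, B2 {
    int f() override { return 10; }
    int g() override { return 20; }
};

// Byte distance from the start of a D object to its B2 subobject.
std::ptrdiff_t b2_offset(D& d) {
    char* whole = reinterpret_cast<char*>(&d);
    char* part  = reinterpret_cast<char*>(static_cast<B2*>(&d));
    return part - whole;
}

// Usage: D holds a full B1 and a full B2, each with its own hidden pointer
// on typical implementations, and each view still reaches D's overrides.
void demo_two_bases() {
    D d;
    B1& as_b1 = d;
    B2& as_b2 = d;
    assert(as_b1.f() == 10);
    assert(as_b2.g() == 20);
    assert(sizeof(D) >= sizeof(B1) + sizeof(B2));
}
```

On a typical 64-bit build sizeof(B1) and sizeof(B2) are both one pointer and sizeof(D) is two, which is the "two vptrs" the question asks about.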
1. Looks good to me.
2. Implementation specific, but most are just in source code order -- meaning the order they appear in the class -- starting with the base class, then adding on new virtual functions from the derived. As long as the compiler has a deterministic way of doing this, then anything it wants to do is fine. However, on Windows, to create COM compatible V-Tables, it has to be in source order.
3. (not sure)
4. (guess) A diamond just means that you could have two copies of a base class B. Virtual inheritance will merge them into one instance. So if you set a member via D1, you can read it via D2. (with C derived from D1, D2, each of them derived from B). I believe that in both cases, the vtables would be identical, as the function pointers are the same -- the memory for data members is what is merged.
Comments:
I don't think destructors come into it!
A call such as D d; d.v1(); will probably not be implemented via the vtable, as the compiler can resolve the function address at compile/link-time.
The compiler knows f's position because it put it there!
Yes, a class with multiple base classes will typically have multiple vptrs (assuming virtual functions in each base class).
Scott Meyers' "Effective C++" books explain multiple inheritance and diamonds better than I can; I'd recommend reading them for this (and many other) reasons. Consider them essential reading!
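On the constructor/destructor point from the question: whatever the implementation does with the vptr, the language rule is that a virtual call made inside B's constructor or destructor resolves to B's version, which is what you would get if the vptr pointed at B's vtable during those phases. A small sketch, with the global trace string and the function trace_lifetime as illustration-only names:

```cpp
#include <cassert>
#include <string>

std::string trace;  // hypothetical global log, only for this sketch

struct B {
    B() { trace += v1(); }            // D part not constructed yet: B::v1 runs
    virtual ~B() { trace += v1(); }   // D part already destroyed: B::v1 runs
    virtual std::string v1() { return "B"; }
};

struct D : B {
    std::string v1() override { return "D"; }
};

// Construct a D, call v1 on the finished object, then let it be destroyed.
std::string trace_lifetime() {
    trace.clear();
    {
        D d;
        B& b = d;
        trace += b.v1();              // fully constructed: D::v1 runs
    }
    return trace;
}

// Usage: B during construction, D while alive, B again during destruction.
void demo_ctor_dtor() {
    assert(trace_lifetime() == "BDB");
}
```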