Virtual dispatch implementation details - c++

First of all, I want to make it clear that I do understand that there is no notion of vtables and vptrs in the C++ standard. However, I think that virtually all implementations implement the virtual dispatch mechanism in pretty much the same way (correct me if I am wrong, but this isn't the main question). Also, I believe I know how virtual functions work, that is, I can always tell which function will be called; I just need the implementation details.
Suppose someone asked me the following:
"You have base class B with virtual functions v1, v2, v3 and derived class D:B which overrides functions v1 and v3 and adds a virtual function v4. Explain how virtual dispatch works".
I would answer like this:
For each class with virtual functions (in this case B and D) we have a separate array of pointers to functions, called a vtable.
The vtable for B would contain
&B::v1
&B::v2
&B::v3
The vtable for D would contain
&D::v1
&B::v2
&D::v3
&D::v4
Now class B contains a hidden pointer member, vptr. D inherits it and therefore contains it too. In B's constructor and destructor, B sets vptr to point to B's vtable; in D's constructor and destructor, D sets it to point to D's vtable.
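The constructor/destructor part of this description can be checked from portable code, because the standard requires a virtual call made during B's constructor or destructor to resolve to B's own version, whatever mechanism is used. Below is a minimal sketch; the log vector g_log and the helper run_construction_demo are hypothetical names added only to make the order of calls observable. On vtable-based implementations, the vptr switch described above is how this behaviour typically comes about.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log, used only to observe which override runs.
std::vector<std::string> g_log;

struct B {
    B() { g_log.push_back("B ctor calls " + v1()); }
    virtual ~B() { g_log.push_back("B dtor calls " + v1()); }
    virtual std::string v1() const { return "B::v1"; }
};

struct D : B {
    D() { g_log.push_back("D ctor calls " + v1()); }
    ~D() override { g_log.push_back("D dtor calls " + v1()); }
    std::string v1() const override { return "D::v1"; }
};

// Builds and destroys one D, returning the calls in the order they happened.
std::vector<std::string> run_construction_demo() {
    g_log.clear();
    { D d; }
    return g_log;
}
```

While the B part is being built (and again while it is being torn down), v1() resolves to B::v1 even though the complete object is a D.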
Any call to a virtual function f on an object x of polymorphic class X is translated into a call through x.vptr[f's position in the vtable].
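To make the "x.vptr[f's position]" step concrete, here is a hand-rolled emulation in plain C++ of the layout described above. It is only a model: the names Obj, Slot, SlotIndex and call are invented for the sketch, and real compiler-generated vtables typically also carry extra entries such as RTTI and offset information.

```cpp
#include <cassert>
#include <string>

// Hand-rolled model of the vtable/vptr scheme; all names are illustrative.
struct Obj;
using Slot = std::string (*)(const Obj&);  // each slot receives the object ("this")

struct Obj {
    const Slot* vptr;  // a compiler would add this pointer behind the scenes
};

std::string B_v1(const Obj&) { return "B::v1"; }
std::string B_v2(const Obj&) { return "B::v2"; }
std::string B_v3(const Obj&) { return "B::v3"; }
std::string D_v1(const Obj&) { return "D::v1"; }
std::string D_v3(const Obj&) { return "D::v3"; }
std::string D_v4(const Obj&) { return "D::v4"; }

// In this model, slot numbers follow B's declaration order; D keeps them and appends v4.
enum SlotIndex { kV1 = 0, kV2 = 1, kV3 = 2, kV4 = 3 };

const Slot B_vtable[] = {B_v1, B_v2, B_v3};
const Slot D_vtable[] = {D_v1, B_v2, D_v3, D_v4};

// "x.f()" becomes: load x.vptr, index it with f's slot, call through the pointer.
std::string call(const Obj& x, SlotIndex i) { return x.vptr[i](x); }
```

The point of the sketch is that B's slot numbers are fixed when B is compiled, so code that only knows about B can still reach D's overrides through the same indices.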
The questions are:
1. Do I have any errors in the above description?
2. How does the compiler know f's position in the vtable (in detail, please)?
3. Does this mean that if a class has two bases then it has two vptrs? What is happening in this case? (try to describe in a similar manner as I did, in as much detail as possible)
4. What's happening in a diamond hierarchy with A on top B,C in the middle and D at the bottom? (A is a virtual base class of B and C)
Thanks in advance.

1. Do I have any errors in the above description?
All good. :-)
2. How does the compiler know f's position in the vtable?
Each vendor will have their own way of doing this, but I always think of the vtable as a map from the member function signature to a memory offset. So the compiler just maintains this list.
3. Does this mean that if a class has two bases then it has two vptrs? What is happening in this case?
Typically, compilers compose a new vtable which consists of all the vtables of the virtual bases appended together in the order they were specified, along with the vtable pointer of the virtual base. They follow this with the vtable functions of the deriving class. This is extremely vendor-specific, but for class D : B1, B2, you typically see D._vptr[0] == B1._vptr.
That image is actually for composing the member fields of an object, but vtables can be composed by the compiler in the exact same way (as far as I understand it).
4. What's happening in a diamond hierarchy with A on top B,C in the middle and D at the bottom? (A is a virtual base class of B and C)
The short answer? Absolute hell. Did you virtually inherit both the bases? Just one of them? Neither of them? Ultimately, the same techniques of composing a vtable for the class are used, but how this is done varies way too wildly, since how it should be done is not at all set in stone. There is a decent explanation of solving the diamond-hierarchy problem here, but, like most of this, it is quite vendor-specific.

Looks good to me
Implementation-specific, but most are just in source-code order (meaning the order they appear in the class), starting with the base class, then adding on new virtual functions from the derived class. As long as the compiler has a deterministic way of doing this, anything it wants to do is fine. However, on Windows, to create COM-compatible vtables, it has to be in source order.
(not sure)
(guess) A diamond just means that you could have two copies of a base class B. Virtual inheritance will merge them into one instance. So if you set a member via D1, you can read it via D2. (with C derived from D1, D2, each of them derived from B). I believe that in both cases, the vtables would be identical, as the function pointers are the same -- the memory for data members is what is merged.

Comments:
I don't think destructors come into it!
A call such as e.g. D d; d.v1(); will probably not be implemented via the vtable, as the compiler can resolve the function address at compile/link-time.
The compiler knows f's position because it put it there!
Yes, a class with multiple base classes will typically have multiple vptrs (assuming virtual functions in each base class).
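A small sketch of the two-base case, using hypothetical classes B1, B2 and D. What the standard does guarantee is that the two polymorphic (and therefore non-empty) base subobjects sit at distinct addresses; on common ABIs, such as the Itanium C++ ABI used by GCC and Clang, and on MSVC, each of them typically begins with its own vptr, which is why a D usually carries two.

```cpp
#include <cassert>

struct B1 {
    virtual ~B1() = default;
    virtual int f() const { return 1; }
};
struct B2 {
    virtual ~B2() = default;
    virtual int g() const { return 2; }
};

// D overrides one virtual function from each base.
struct D : B1, B2 {
    int f() const override { return 10; }
    int g() const override { return 20; }
};

// Guaranteed by the standard: two non-empty base subobjects of different
// types cannot share an address. On common ABIs each starts with a vptr.
bool base_subobjects_are_distinct(const D& d) {
    const void* p1 = static_cast<const B1*>(&d);
    const void* p2 = static_cast<const B2*>(&d);
    return p1 != p2;
}

// Callers that only see one base still reach D's overrides.
int call_through_b1(const B1& b) { return b.f(); }
int call_through_b2(const B2& b) { return b.g(); }
```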
Scott Meyers' "Effective C++" books explain multiple inheritance and diamonds better than I can; I'd recommend reading them for this (and many other) reasons. Consider them essential reading!

Related

Diamond inheritance

Assume classes D and E and F all inherit from base class B, and that
class C inherits from D and E.
(i) How many copies of class B appear in class C?
(ii) How would using virtual inheritance change this scenario? Explain your answer.
(iii) How does Java avoid the need for multiple inheritance for many of the situations where multiple inheritance might be used in C++?
Here are some of my current ideas, but I'm by no means an expert on C++!
(i) If C inherits from D and E which are subclasses of B, then would D and E technically be copies of their super class? Then if C inherits from D and E that would mean there are 2 copies of B in C.
(ii) Using virtual is somewhat similar to using abstract in Java (I think). Given this, it would mean that there would not be multiple copies of B in C, as the instantiation would be cascaded down to the level it is needed. I am not sure how to word my explanation, but say B has a function called print() which prints "i am B" and C overrides this function but prints "i am C". If you called print() on C without virtual you would end up printing "i am B"; using virtual would mean that it would print "i am C".
(iii) My idea here is that Java can use interfaces to avoid the use of multiple inheritance. You can implement multiple interfaces but you can only extend one Class. I'm not sure what else to add here, so any input or relevant resources would be helpful.
(i) and (iii) are right. In my experience anyway, most of the time in C++ when I've used multiple inheritance it's been because the bases were interfaces (a concept which doesn't have keyword support in C++, but it is a concept you can execute anyway).
The first sentence of (ii) is right, however your second sentence is talking about virtual functions, which is completely different to virtual inheritance. Virtual inheritance means that there is only one copy of B, and the D and E both have that same copy as their base. There is no difference in terms of functions, but the difference comes in terms of member variables (and base classes) of B.
If there is a function that prints out B's member variable foo, then in case (ii) this function always prints the same value because there is only one foo, but in case (i) calling that function from the D base class may print a different value to calling it from the E base class.
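The difference between (i) and (ii) can be shown with the class names from the question plus a data member foo. In this sketch the namespaces plain and shared and the two roundtrip helpers are made up for the example: each helper writes foo through the D path and reads it back through the E path.

```cpp
#include <cassert>

// Non-virtual diamond: C contains two separate B subobjects.
namespace plain {
struct B { int foo = 0; };
struct D : B {};
struct E : B {};
struct C : D, E {};
}

// Virtual diamond: D and E share a single B subobject inside C.
namespace shared {
struct B { int foo = 0; };
struct D : virtual B {};
struct E : virtual B {};
struct C : D, E {};
}

int plain_roundtrip() {
    plain::C c;
    static_cast<plain::D&>(c).foo = 42;
    return static_cast<plain::E&>(c).foo;   // the other copy of foo: still 0
}

int shared_roundtrip() {
    shared::C c;
    static_cast<shared::D&>(c).foo = 42;
    return static_cast<shared::E&>(c).foo;  // the same foo: 42
}
```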
The term "diamond inheritance" wraps all this up in two words that serve as a good mnemonic :)
You seem to have mostly arrived at the right answers, though the reasoning needs work. The key issue at play here is the question of "how to lay out the memory of an instance of C if it inherits the same base class twice?"
i) There are 2 copies of the base class B in the memory layout of an object of type C. The example provided is a case of "diamond inheritance", because when you draw out the dependency/inheritance tree, you essentially draw a diamond. The "problem" with diamond inheritance is essentially the question of how to lay the object out in memory. C++ went with two approaches: a fast one (this one), which duplicates the data members, and a slower one, "virtual inheritance". The reason to take the non-virtual approach is that if you inherit a class that has no data members (what would be an interface in Java), then there is no problem with "duplicating the data members", because they do not exist (see my note at the bottom). It is also advisable to use non-virtual inheritance if you only plan to use single inheritance.
ii) If D and E inherit from B virtually (class D : virtual public B), that is the way of saying in the C++ language that you would like the compiler to perform acts of heroism to ensure that only one copy of B exists in the memory layout of any class derived from both, such as C; I believe this also incurs a slight performance hit. If you use any B members from a C instance now, they will always refer to the same place in memory. Note that virtual inheritance has no bearing on whether your functions are virtual.
Aside: This is also completely unrelated to the concept of a class being abstract. To make a class abstract in C++, declare any virtual member function pure by appending = 0, as in virtual void foo() = 0; doing so for any virtual function (including the destructor) is sufficient to make the entire class abstract.
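The remark about the destructor deserves a short example, because a pure virtual destructor behaves slightly differently from other pure functions: it still needs a definition. Marker, Tagged and make_and_destroy are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <type_traits>

// A pure virtual destructor is enough to make a class abstract...
struct Marker {
    virtual ~Marker() = 0;
};
// ...but it still needs a definition, because every derived
// class's destructor calls it.
Marker::~Marker() {}

// The implicitly declared destructor overrides the pure one,
// so Tagged is concrete without writing anything extra.
struct Tagged : Marker {
    int value = 1;
};

static_assert(std::is_abstract<Marker>::value, "pure destructor => abstract");
static_assert(!std::is_abstract<Tagged>::value, "implicit destructor overrides it");

int make_and_destroy() {
    Tagged t;  // destroying t runs ~Tagged and then Marker::~Marker
    return t.value;
}
```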
iii) Java outright forbids it. In Java there is only single inheritance plus the ability to implement any number of interfaces. While interfaces do grant you the "is-a" relationship and the ability to have virtual functions, they implicitly avoid the issues that C++ has with data layouts and diamond inheritance, because an interface cannot add any instance data members; ipso facto, there is no confusion about how to resolve any data member's location.
An important extension to iii is to realize that virtual function call dispatch is not impacted at all if you happen to "implement the same interface twice". The reason is that the method will always do the same thing, even if there were multiple copies of it in your virtual table; it only acts on the data of your class, it does not itself contain data that needs to be disambiguated.
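A minimal sketch of "implementing the same interface twice", using hypothetical classes Printable, Left, Right and Both: Both contains two Printable subobjects, yet its single id() override is what both of them dispatch to.

```cpp
#include <cassert>

// A Java-style "interface": only pure virtual functions, no data.
struct Printable {
    virtual ~Printable() = default;
    virtual int id() const = 0;
};

// Each path inherits Printable non-virtually,
// so Both ends up with two Printable subobjects.
struct Left : Printable {};
struct Right : Printable {};

// One override is the final overrider in both subobjects.
struct Both : Left, Right {
    int id() const override { return 7; }
};

const Printable& via_left(const Both& b)  { return static_cast<const Left&>(b); }
const Printable& via_right(const Both& b) { return static_cast<const Right&>(b); }

// True if the two Printable subobjects are distinct objects.
bool has_two_printables() {
    Both b;
    return &via_left(b) != &via_right(b);
}
```

Both copies of the interface hold nothing but a vptr, so having two of them changes nothing about which function runs.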

Difference in VTBL in single inheritance and multiple inheritance

I was taught in class that in the case of single inheritance the VTBL includes all of the virtual functions the class can respond to. The following image should illustrate this.
In multiple inheritance I was taught that the VTBL includes all of the virtual functions that were first defined in that class or the ones which have been overridden in this class. This means that at run time you've got to search for the right method implementation using the dispatch algorithm.
I'm not entirely sure why this difference exists. Why couldn't the VTBL in the case of multiple inheritance consist of all the virtual functions that the class can respond to (just like in the case of single inheritance)? This should speed up the process since we don't have to look for the method implementation at run time throughout the whole inheritance hierarchy.
Can anyone clarify this for me?
Edit: When I refer to the dispatch algorithm for multiple inheritance I'm referring to the following:
Just to clarify: notice how we've got to traverse the hierarchy to search for the implementation rather than just going to the current class's VTBL and jumping to the method.
Here's a translated example from published German notes by Scott Meyers. Consider
class B1 {
public:
    virtual void mf(); // may be overridden in derived classes
};
class B2 {
public:
    virtual void mf(); // may be overridden in derived classes
};
class D: public B1, public B2 {};
void g(B2 *pb2)
{
    pb2->mf(); // requires offset adjustment before calling mf?
}
The pointer passed to g() needs an offset adjustment only if D overrides mf and pb2 really points to a D. What should a compiler do? When generating code for the call:
It may not know that D exists (that's the point of dynamic polymorphism: to be able to call future code without recompiling).
It can't know whether pb2 points to a D (it only learns that at runtime).
Because polymorphic classes need to remain flexible against the unbounded set of possible future derivations, the problem is typically solved by
Creating special vtbls that handle offset adjustments.
For derived class objects, adding new vptrs to these vtbls: one additional vptr for each base class after the first one.
Merging all the virtual functions into a single table would destroy that flexibility. Note that multiple "parallel" inheritance D: B1, B2 {}; is different from "stacked" inheritance D: M: B {};. The latter requires a single substitution chain, the former has two such chains and incompatible B1 and B2.
If you have two base classes A and B of your multiply inherited object D, these have their own vtable layouts, and D needs to provide vtables which match the vtables of both A and B. Further, if another class derives from D and possibly from another similarly multiply inherited class, the same thing happens again, i.e., multiple vtables are needed. They can't simply be merged. As a result, multiply inherited objects typically have multiple vtables around, and the compiler inserts code to first determine the function's correct vtable and then call through it. I think the code determining the correct vtable based on a pointer to an object with multiple bases is just a simple addition or subtraction if the virtual function is not in a virtual base class, and a look-up of the location of the virtual base class otherwise; i.e., there isn't anything really expensive being done, but more than just an indirect call is needed.
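Here is a runnable variation of the Meyers example above in which D does override mf, so the offset adjustment is actually needed. The names follow that example, but mf now returns this so the adjustment can be observed, and this_is_adjusted is a hypothetical helper. How the adjustment is performed (usually a small "thunk" that fixes up this before jumping to D::mf) is implementation-specific; only the result is guaranteed.

```cpp
#include <cassert>

struct B1 {
    virtual ~B1() = default;
    virtual const void* mf() const { return this; }
};
struct B2 {
    virtual ~B2() = default;
    virtual const void* mf() const { return this; }
};

// This D overrides mf; the override returns its own "this",
// which must be the address of the whole D object.
struct D : B1, B2 {
    const void* mf() const override { return this; }
};

// Same shape as g() above: it only knows it has a B2*.
const void* g(const B2* pb2) { return pb2->mf(); }

bool this_is_adjusted() {
    D d;
    const void* whole = &d;
    const B1* pb1 = &d;
    const B2* pb2 = &d;
    // The B1 and B2 subobjects live at different addresses, yet both calls
    // land in D::mf with "this" pointing at the whole D.
    return static_cast<const void*>(pb1) != static_cast<const void*>(pb2)
        && pb1->mf() == whole
        && g(pb2) == whole;
}
```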

Why does virtual inheritance need to be specified in the middle of a diamond hierarchy?

I have a diamond hierarchy of classes:
A
/ \
B C
\ /
D
To avoid two copies of A in D, we need to use virtual inheritance at B and C.
class A { };
class B: virtual public A {};
class C: virtual public A { };
class D: public B, public C { };
Question: Why does virtual inheritance need to be performed at B and C, even though the ambiguity is at D? It would have been more intuitive if it were at D.
Why did the standards committee design this feature like this?
What can we do if the B and C classes come from a 3rd-party library?
EDIT: My answer was that it tells classes B and C not to invoke A's constructor whenever an object derived from them is created, as A's constructor will be invoked by D.
I'm not sure of the exact reason they chose to design virtual inheritance this way, but I believe the reason has to do with object layout.
Suppose that C++ was designed in a way where to resolve the diamond problem, you would virtually inherit B and C in D rather than virtually inheriting A in B and C. Now, what would the object layout for B and C be? Well, if no one ever tries to virtually inherit from them, then they'd each have their own copy of A and could use the standard, optimized layout where B and C each have an A at their base. However, if someone does virtually inherit from either B or C, then the object layout would have to be different because the two would have to share their copy of A.
The problem with this is that when the compiler first sees B and C, it can't know if anyone is going to be inheriting from them. Consequently, the compiler would have to fall back on the slower version of inheritance used in virtual inheritance rather than the more optimized version of inheritance that is turned on by default. This violates the C++ principle of "don't pay for what you don't use" (the zero-overhead principle), where you only pay for language features you explicitly use.
Why does virtual inheritance need to be performed at B and C, even though the ambiguity is at D? It would have been more intuitive if it were at D.
In your example, B and C are using virtual specifically to ask the compiler to ensure there's only one copy of A involved. If they didn't do this, they're effectively saying "I need my own A base class, I'm not expecting to share it with any other derived object". This could be crucial.
Example of not wanting to share a virtual base class
If A was some kind of container, B was derived from it and stored some particular type of object - say "Bat", while C stores "Cat". If D expects to have B and C independently providing information on a population of Bats and Cats they'd be very surprised if a C operation did something to/with the Bats, or a B operation did something to/with the Cats.
Example of wanting to share a virtual base class
Say D needs to provide access to some functions or data members that are in A, say "A::x"... if A is inherited independently (non-virtually) by B and C, then the compiler can't resolve D::x to B::x or C::x without the programmer having to explicitly disambiguate it. This means D can't be used as an A despite having not one but two "is-a" relationships implied by the derivation chain (i.e. if B "is a" A, and D "is a" B, then the user may expect/need to use D as if D "is a" A).
Why did the standards committee design this feature like this?
Virtual inheritance exists because it's sometimes useful. It's specified by B and C, rather than D, because it's an intrusive concept in terms of the design of B and C, and it also has implications for the encapsulation, memory layout, construction and destruction, and function dispatch of B and C.
What can we do if the B and C classes come from a 3rd-party library?
If D needs to inherit from both and provide access to an A, but B and C weren't designed to use virtual inheritance and can't be changed, then D must take responsibility for forwarding any requests matching the A API to either B and/or C and/or optionally another A it directly inherits from (if it needs a visible "is A" relationship). That might be practical if the calling code knows it's dealing with a D (even if via templating), but operations on the object via pointers to the base classes will not know about the management D is attempting to perform, and the whole thing may be very tricky to get right. But it's a bit like saying "what if I need a vector and I've only got a list", "a saw and not a screwdriver"... well, make the most of it or get what you really need.
EDIT: My answer was that it tells classes B and C not to invoke A's constructor whenever an object derived from them is created, as A's constructor will be invoked by D.
That's an important aspect of this, yes.
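The constructor rule can be demonstrated with the A/B/C/D names from the question. In this sketch A is given a hypothetical origin member and no default constructor, so every class has to say how A is built; when a D is created, only D's initializer for A takes effect.

```cpp
#include <cassert>

// Hypothetical A with no default constructor, so every initializer is explicit.
struct A {
    int origin;
    explicit A(int o) : origin(o) {}
};

// On their own, B and C each construct A themselves...
struct B : virtual A { B() : A(1) {} };
struct C : virtual A { C() : A(2) {} };

// ...but in a D, the most-derived class constructs the shared A, and the
// A(1) and A(2) initializers in B's and C's constructors are not used.
struct D : B, C { D() : A(3), B(), C() {} };

int origin_of_b() { return B().origin; }
int origin_of_c() { return C().origin; }
int origin_of_d() { return D().origin; }
```

If D's constructor left out A(3), the code would not compile, because A has no default constructor and D, as the most-derived class, is the one responsible for constructing it.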
In addition to templatetypedef's answer, it may be pointed out that you can also wrap A into
class AVirt:virtual public A{};
and inherit other classes from it. You will not need to mark the other inheritances as virtual explicitly in this case.
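A sketch of that wrapper approach, with a hypothetical member x and helper single_shared_a. B and C inherit AVirt in the ordinary way, yet a D still ends up with a single A.

```cpp
#include <cassert>

struct A { int x = 0; };

// The wrapper carries the "virtual" once...
struct AVirt : virtual A {};

// ...so these derivations can be written as ordinary public inheritance.
struct B : AVirt {};
struct C : AVirt {};
struct D : B, C {};

// Both paths reach the same A, so converting D& to A& is unambiguous.
bool single_shared_a() {
    D d;
    static_cast<B&>(d).x = 5;
    A& a = d;  // would not compile if D had two A subobjects
    return a.x == 5 && &static_cast<C&>(d).x == &a.x;
}
```

Note that D still contains two (empty) AVirt subobjects; only A is shared, which is what matters when A is the class holding the data.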
Question: Why does virtual inheritance need to be performed at B and C, even though the ambiguity is at D?
Because B's and C's methods must know they might have to work on objects whose layout is very different from B's and C's own layouts. With single inheritance this is not a problem, because derived classes just append their attributes after the parent's original layout.
With multiple inheritance you cannot do that, because there's no single parent layout in the first place. Moreover (if you want to avoid A's duplication), the parents' layouts need to overlap on A's attributes. Multiple inheritance in C++ hides quite a lot of complexity.
As A is the multiply-inherited class, it is the classes that derive from it directly that have to do so virtually.
If you have a situation where B and C both derive from A and you want both in D and you can't use the diamond, then D can derive from just one of B and C, and have an instance of the other, through which it can forward functions.
A workaround could look something like this:
class A {}; // the shared base
class B : public A {}; // not your class, cannot change
class C : public A {}; // not your class, cannot change
class D2 : public C {}; // your class, implement the functions of C
class D : public B // your class, implement the functions of B
{
    D2 d2; // D forwards C's functionality to this member
};

C++ vtable resolving with virtual inheritance

I was curious about C++ and virtual inheritance - in particular, the way that vtable conflicts are resolved between base and child classes. I won't pretend to understand the specifics of how they work, but what I've gleaned so far is that there is a small delay caused by using virtual functions due to that resolution. My question then is: if the base class is blank, i.e., its virtual functions are defined as:
virtual void doStuff() = 0;
Does this mean that the resolution is not necessary, because there's only one set of functions to pick from?
Forgive me if this is a stupid question - as I said, I don't understand how vtables work, so I don't really know any better.
EDIT
So if I have an abstract class with two separate child classes:
A
/ \
/ \
B C
There is no performance hit when calling functions from the child classes compared to say, just a single inheritance free class?
There is no hit for calling nonvirtual functions in the child class. If you're calling an overridden version of your pure virtual function as in your example, then the virtual penalty may still exist. In general it's difficult for compilers to optimize away the use of the virtual table except under very specific circumstances, where it knows the exact by-value type of the object in question (from context).
But seriously don't worry about the overhead. It's going to be so little that in practice you will almost certainly never encounter a situation where it's the part of code causing performance bottlenecks. Use virtual functions where they make sense for your design and don't worry about the (tiny) performance penalty.
I don't know what "one set of functions" you are talking about. You have two derived classes - B and C - with each having its own set of virtual functions. So, you have at least two sets, even if all functions in A are pure.
Virtual dispatch occurs when the compiler does not know the dynamic type of the object it is working with. For example, if you have a pointer A *p, it can point to an object of type B or type C. If the compiler does not know the actual type of the object p is pointing to, it will have to use virtual dispatch in order to call virtual functions through p.
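A small sketch of the situation described above, reusing the question's names A, B and C with a hypothetical doStuff() that returns a character, plus hypothetical helpers dispatch, direct and make. B is marked final here, which is one case where a compiler is allowed to skip the vtable lookup when the static type is B; whether it actually does so is an optimization, not a guarantee.

```cpp
#include <cassert>
#include <memory>

struct A {
    virtual ~A() = default;
    virtual char doStuff() const = 0;  // pure: A only defines the interface
};
struct B final : A { char doStuff() const override { return 'B'; } };
struct C : A { char doStuff() const override { return 'C'; } };

// The compiler cannot know whether p points to a B or a C, so this call
// typically goes through the vtable.
char dispatch(const A* p) { return p->doStuff(); }

// B is final, so the dynamic type of b must be exactly B; a compiler may
// call B::doStuff directly here (allowed, not required).
char direct(const B& b) { return b.doStuff(); }

std::unique_ptr<A> make(bool wantB) {
    if (wantB) return std::make_unique<B>();
    return std::make_unique<C>();
}
```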
P.S. There's no "virtual inheritance" in your example. The term virtual inheritance in C++ has its own meaning. And you are not talking about virtual inheritance here.
The virtual dispatch overhead only occurs when the method is virtual. If the derived method is not virtual, there is no performance hit.

Does an abstract classes have a VTABLE?

Do we have virtual table for an abstract class?
First of all, usage of vtables is implementation defined and not mandated by the standard.
For implementations that use vtable, the answer is: Yes, usually. You might think that vtable isn't required for abstract classes because the derived class will have its own vtable, but it is needed during construction: While the base class is being constructed, it sets the vtable pointer to its own vtable. Later when the derived class constructor is entered, it will use its own vtable instead.
That said, in some cases this isn't needed and the vtable can be optimized away. For example, MS Visual C++ provides the __declspec(novtable) flag to disable vtable generation on pure interface classes.
There seems to be a common misconception here, and I think traces of its sources can still be found online. Paul DiLascia wrote sometime in 2000 that:
...see that the compiler still generates a vtable all of whose entries are NULL and still generates code to initialize the vtable in the constructor or destructor for A.
That may actually have been true then, but certainly isn't now.
Yes, abstract classes do have vtables, also with pure abstract methods (these can actually be implemented and called), and yes, their constructor does initialize the pure entries to a specified value. For VC++ at least, that value is the address of the CRT function _purecall. You can in fact control what happens on such a call, either by supplying your own _purecall or by using _set_purecall_handler.
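The parenthetical point that pure functions "can actually be implemented and called" is easy to miss, so here is a minimal sketch with hypothetical classes Shape and Triangle: Shape stays abstract even though its pure function has a body, and a derived class can call that body with a qualified name, which bypasses the vtable.

```cpp
#include <cassert>
#include <type_traits>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;      // pure: Shape is abstract...
};
int Shape::sides() const { return 0; }  // ...yet the function has a body

struct Triangle : Shape {
    // A qualified call is not dispatched virtually, so this runs the body above.
    int sides() const override { return Shape::sides() + 3; }
};

static_assert(std::is_abstract<Shape>::value, "still abstract despite the body");
```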
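We have a virtual table for a class that has at least one virtual function.
That virtual function can also be pure.
This means an abstract class can have a vtable.
In the case of abstract classes, the entry for a pure virtual function is a placeholder: depending on the implementation it may be NULL or, as on current compilers, the address of a runtime error handler (__cxa_pure_virtual with GCC and Clang, _purecall with MSVC).
The vtable is not consulted when you try to instantiate an abstract class: the compiler sees from the class definition that a pure virtual function has no final overrider and rejects the code with an error at compile time.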