Diamond inheritance - c++

Assume classes D and E and F all inherit from base class B, and that class C inherits from D and E.
(i) How many copies of class B appear in class C?
(ii) How would using virtual inheritance change this scenario? Explain your answer.
(iii) How does Java avoid the need for multiple inheritance for many of the situations where multiple inheritance might be used in C++?
Here are some of my current ideas, but I'm by no means an expert on C++!
(i) If C inherits from D and E, which are subclasses of B, then would D and E technically be copies of their superclass? If so, since C inherits from D and E, that would mean there are 2 copies of B in C.
(ii) Using virtual is somewhat similar to using abstract in Java (I think). Given this, there would not be multiple copies of B in C, as the instantiation would be cascaded down to the level where it is needed. I am not sure how to word my explanation, but say B has a function called print() which prints "i am B", and C overrides this function but prints "i am C". If you called print() on a C object through a B pointer without virtual, you would end up printing "i am B"; using virtual would mean that it prints "i am C".
(iii) My idea here is that Java can use interfaces to avoid the use of multiple inheritance. You can implement multiple interfaces but you can only extend one Class. I'm not sure what else to add here, so any input or relevant resources would be helpful.

(i) and (iii) are right. In my experience anyway, most of the time in C++ when I've used multiple inheritance it's been because the bases were interfaces (a concept which doesn't have keyword support in C++, but it is a concept you can execute anyway).
The first sentence of (ii) is right; however, your second sentence is talking about virtual functions, which are completely different from virtual inheritance. Virtual inheritance means that there is only one copy of B, and D and E both have that same copy as their base. It does not change how virtual functions are overridden; the difference comes in terms of the member variables (and base classes) of B.
If there is a function that prints out B's member variable foo, then in case (ii) this function always prints the same value because there is only one foo, but in case (i) calling that function through the D base class may print a different value from calling it through the E base class.
The term "diamond inheritance" wraps all this up in two words that serve as a good mnemonic :)
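A minimal sketch of the foo scenario, reusing the question's class names and assuming B has a single int member foo (the two namespaces exist only so both variants fit in one file). The value is written through the D base and read back through the E base:

```cpp
// Case (i): non-virtual diamond, so D and E each carry their own B subobject.
namespace plain {
struct B { int foo = 0; };
struct D : B {};
struct E : B {};
struct C : D, E {};
}

// Case (ii): virtual diamond, so D and E share a single B subobject inside C.
namespace shared {
struct B { int foo = 0; };
struct D : virtual B {};
struct E : virtual B {};
struct C : D, E {};
}

// Writes foo through the D base, then reads it back through the E base.
int read_via_e_after_write_via_d_plain() {
    plain::C c;
    static_cast<plain::D&>(c).foo = 42;    // modifies D's copy of B
    return static_cast<plain::E&>(c).foo;  // E's copy is untouched, still 0
}

int read_via_e_after_write_via_d_shared() {
    shared::C c;
    static_cast<shared::D&>(c).foo = 42;    // modifies the one shared B
    return static_cast<shared::E&>(c).foo;  // same B, so this reads 42
}
```

In the non-virtual version, writing c.foo directly would not even compile, because the name is ambiguous between the two B subobjects; in the virtual version c.foo is fine.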

You seem to have mostly arrived at the right answers, though the reasoning needs work. The key issue at play here is the question of "how to lay out the memory of an instance of C if it inherits the same base class twice?"
i) There are 2 copies of the base class B in the memory layout of an object of type C. The example provided is a case of "diamond inheritance", because when you draw out the dependency/inheritance tree, you essentially draw a diamond. The "problem" with diamond inheritance is essentially the question of how to lay the object out in memory. C++ offers two approaches: a fast one, this one, which duplicates the data members, and a slower one, "virtual inheritance". The reason to take the non-virtual approach is that if you inherit a class that has no data members (what would be an interface in Java), then there is no problem with "duplicating the data members", because they do not exist (see my note at the bottom). It is also advisable to use non-virtual inheritance if you only plan to use single inheritance.
ii) If D and E inherit virtually from B (class D : virtual public B), that is the way of saying in the C++ language that you would like the compiler to perform acts of heroism to ensure that only one copy of B exists in the memory layout of your derived class C; I believe this also incurs a slight performance hit. If you use any B members from a C instance now, they will always refer to the same place in memory. Note that virtual inheritance has no bearing on whether your functions are virtual.
Aside: This is also completely unrelated to the concept of a class being abstract. To make a class abstract in C++, declare any virtual member function as pure by appending = 0, as in virtual void foo() = 0;. Doing so for any virtual member function (including the destructor) is sufficient to make the entire class abstract, although a pure virtual destructor still needs a definition.
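A small sketch of the pure virtual destructor case; the names Shape, Triangle and sides_via_base are hypothetical. The out-of-class definition of ~Shape is required because every derived destructor calls it:

```cpp
#include <type_traits>

struct Shape {
    virtual ~Shape() = 0;                    // pure: this alone makes Shape abstract
    virtual int sides() const { return 0; }  // an ordinary virtual function
};

// Even though it is pure, the destructor must be defined, since
// Triangle's destructor will call it.
Shape::~Shape() {}

// Triangle's implicitly declared destructor overrides the pure one,
// so Triangle is concrete.
struct Triangle : Shape {
    int sides() const override { return 3; }
};

int sides_via_base(const Shape& s) { return s.sides(); }

// Shape s;  // would not compile: Shape is abstract
```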
iii) Java outright forbids multiple inheritance of classes. In Java there is only single inheritance plus the ability to implement any number of interfaces. While interfaces do grant you the "is-a" relationship and the ability to have virtual functions, they implicitly avoid the issues that C++ has with data layouts and diamond inheritance, because an interface cannot add any instance data members; ipso facto, there is no confusion about how to resolve any data member's location.
An important extension to iii is to realize that virtual function call dispatch is not impacted at all if you happen to "implement the same interface twice". The reason is that the method will always do the same thing, even if there were multiple copies of it in your virtual table; it only acts on the data of your class, it does not itself contain data that needs to be disambiguated.
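To illustrate that point in C++ terms, here is a sketch (all names hypothetical) where a data-free "interface" is inherited twice through two non-virtual paths. A single override in the final class serves both copies, so dispatch through either path reaches the same function:

```cpp
#include <string>

// A Java-style interface: only pure virtual functions, no data members.
struct Printable {
    virtual ~Printable() = default;
    virtual std::string print() const = 0;
};

// Two intermediate classes, each (non-virtually) deriving from Printable.
struct Left  : Printable {};
struct Right : Printable {};

// Both contains two Printable subobjects, but this one function
// overrides print() in both of them.
struct Both : Left, Right {
    std::string print() const override { return "Both"; }
};

std::string print_via_left(const Both& b)  { return static_cast<const Left&>(b).print(); }
std::string print_via_right(const Both& b) { return static_cast<const Right&>(b).print(); }
```

Converting a Both directly to Printable& would still be ambiguous, which is why the sketch goes through Left and Right; but since Printable holds no data, having two copies of it costs nothing semantically.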

Related

Example for non-virtual multiple inheritance

Is there a real-world example where non-virtual multiple inheritance is being used? I'd like to have one mostly for didactic reasons. Slapping around classes named A, B, C, and D, where B and C inherit from A and D inherits from B and C is perfectly fine for explaining the question "Does/Should a D object have one or two A sub-objects?", but bears no weight about why we even have both options. Many examples care about why we do want virtual inheritance, but why would we not want virtual inheritance?
I know what virtual base classes are and how to express that stuff in code. I know about diamond inheritance and examples of multiple inheritance with a virtual base class are abundant.
The best I could find is vehicles. The base class is Vehicle which is inherited by Car and Boat. Among other things, a Vehicle has occupants() and a max_speed(). So an Amphibian that inherits from both Car and Boat inherits different max_speed() on land and water – and that makes sense –, but also different occupants() – and that does not make sense. So the Vehicle sub-objects aren't really independent; that is another problem which might be interesting to solve, but this is not the question.
Is there an example, that makes sense as a real-world model, where the two sub-objects are really independent?
You're thinking like an OOP programmer, trying to design abstract models of things. C++ multiple inheritance, like many things in C++, is a tool that has a particular effect. Whether it maps onto some OOP model is irrelevant next to the utility of the tool itself. To put it another way, you don't need a "real-world model" to justify non-virtual inheritance; you just need a real-world use case.
Because a derived class inherits the members of a base class, inheritance often is used in C++ as a means of collecting a set of common functionality together, sometimes with minimal interaction from the derived class, and injecting this functionality directly into the derived class.
The Curiously Recurring Template Pattern and other mixin-like constructs are mechanisms for doing this. The idea is that you have a base class that is a template, and its template parameter is the derived class that uses it. This allows the base class to have some access to the derived class itself without virtual functions.
The simplest example I can think of in C++ is enable_shared_from_this, which allows an object whose lifetime is currently managed by a shared_ptr to retrieve a shared_ptr to itself from just a pointer or reference to that object. It uses CRTP to add to the derived class the members and interfaces needed to make shared_from_this possible. And since the inheritance is public, it also allows shared_ptr's various functions that "enable shared_from_this" to detect that a particular type has the shared_from_this machinery in it and to initialize it properly.
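A minimal usage sketch of std::enable_shared_from_this (Widget, self and use_count_after_self are hypothetical names). The class passes itself as the template argument, and make_shared detects the base and wires it up:

```cpp
#include <memory>

// Public, non-virtual CRTP inheritance from enable_shared_from_this.
class Widget : public std::enable_shared_from_this<Widget> {
public:
    // Returns a shared_ptr that shares ownership with the existing owner(s).
    std::shared_ptr<Widget> self() { return shared_from_this(); }
};

long use_count_after_self() {
    auto owner = std::make_shared<Widget>();     // make_shared sees the base
    Widget& ref = *owner;                         // only a plain reference here...
    std::shared_ptr<Widget> again = ref.self();   // ...yet we get a shared_ptr back
    return owner.use_count();                     // owner and again share ownership
}
```

Calling shared_from_this() on an object that is not owned by any shared_ptr is an error (since C++17 it throws std::bad_weak_ptr), so this only works for objects already managed by a shared_ptr.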
enable_shared_from_this doesn't need virtual inheritance, and indeed would probably not work very well with it.
Now imagine that I have some other CRTP class that injects some other functionality into an object. This functionality has nothing to do with shared_ptr, but it uses CRTP and inheritance.
Well, if I now write some type that wants to inherit from both enable_shared_from_this and this other functionality, well, that works just fine. There is no need for virtual inheritance, and in fact doing so would only make composition that much harder.
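A sketch of that composition, assuming a hypothetical CRTP mixin InstanceCounter that counts live objects of each derived type. Session inherits from it and from enable_shared_from_this with plain, non-virtual multiple inheritance; the two bases are unrelated, so there is nothing to share:

```cpp
#include <memory>

// Hypothetical CRTP mixin: counts live instances of each Derived type.
template <typename Derived>
class InstanceCounter {
public:
    static int live() { return count_; }
protected:
    InstanceCounter() { ++count_; }
    InstanceCounter(const InstanceCounter&) { ++count_; }
    ~InstanceCounter() { --count_; }
private:
    static inline int count_ = 0;  // C++17 inline static member
};

// Ordinary (non-virtual) multiple inheritance from two unrelated mixins.
class Session : public std::enable_shared_from_this<Session>,
                public InstanceCounter<Session> {
public:
    std::shared_ptr<Session> self() { return shared_from_this(); }
};

// Creates two Sessions (plus a second handle to the first one) and reports
// how many are alive while they are still in scope.
int sessions_alive_in_scope() {
    auto a = std::make_shared<Session>();
    [[maybe_unused]] auto b = a->self();  // a second owner, not a new Session
    auto c = std::make_shared<Session>();
    return InstanceCounter<Session>::live();
}
```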
Virtual inheritance is not free. It fundamentally changes a number of things about how a type relates to its base classes. If you inherit from such a type, the constructor of the most-derived class has to initialize any virtual base classes directly. A type with a virtual base class is not a standard-layout type, and its actual layout is left entirely to the implementation. And there are various other consequences. C++ tries not to make programmers pay for functionality they don't use, so if you don't need the special properties of virtual inheritance, you shouldn't be using it.
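The standard-layout point can be checked at compile time with std::is_standard_layout; the class names below are hypothetical:

```cpp
#include <type_traits>

struct Base { int x; };

struct Plain   : Base {};          // ordinary inheritance
struct Virtual : virtual Base {};  // virtual inheritance

// Plain derivation keeps the standard-layout property;
// having any virtual base class removes it.
static_assert(std::is_standard_layout<Plain>::value, "plain derivation is standard-layout");
static_assert(!std::is_standard_layout<Virtual>::value, "a virtual base is never standard-layout");
```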
It's the same reason C++ has non-virtual methods: the implementation is simpler and more efficient if you use non-virtual inheritance, so you need to explicitly ask for virtual inheritance if you want it. Since you don't need it if your classes never use multiple inheritance, non-virtual inheritance is the default.

How could I avoid diamond inheritance?

I am currently working on a C++ design where I have this inheritance structure:
  A
 / \
B   C
Class A does the computations that are common to both classes B and C, and classes B and C are two different ways of initializing A.
I'd like to add some sort of hybrid initialization, i.e. a class D that would use methods from B and C.
However, I'd need to use diamond inheritance to be able to access B::init() and C::init() to set up the attributes of D.
I know I can avoid diamond problems using virtual inheritance, but I get runtime errors that I don't have when I copy manually the methods. Moreover, I have problems when trying to instantiate the classes B and C, and a colleague advised me to never use diamond inheritance in my designs.
Therefore, I'd like to find some kind of "clean" workaround, which I have not been able to do.
I could put all the initialization routines in class A, but for the moment they are separated nicely and I'd like to avoid having one big class where I can't really separate the distinct groups of functions of the classes B and C. EDIT after answer: This is what I chose, using different cpp files to split my "big" class into logical groups of methods.
I could also remove the inheritance links and replace them with friendship, where the methods of B and C are static and work on a pointer of type A*. This way, I could call B::init(A* a) and C::init(A* a) from D::init(A* a). However, I'd have to replace all the uses of _fooAttribute by a->_fooAttribute, which is a bit cumbersome and does not seem right.
What would you recommend?
If your design calls for diamond inheritance, then that is what you need to do. People treat it as a "must not use" feature of C++, but the truth of the matter is that it is there, it is fully defined (if somewhat complex to understand), and if your problem space calls for it, you should use it.
In particular, I was not able to work out whether this is, indeed, diamond inheritance. Does it make sense for the A inside B and the A inside C to be the same instance of A? From your question it would appear that it does not: B and C each have their own, different way of initializing A. If that is the case, this is not diamond inheritance. Just make sure that B and C inherit A non-virtually.
With that said, make sure this is, indeed, what your design calls for. Can you honestly say that B is an A? That C is an A? Can you honestly say that D is both a B and a C? If not, maybe making A a member of B, C or both, or making B or C members of D, would make more sense.
If the only reason you are inheriting from A is as a way to extend A's provided methods, then consider simply making those methods members of A. As stated above, while reducing code duplication is a worthy cause, the design should make sure that an inheritance relationship is an "is a" relationship. Deviating from that is asking for trouble.
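One way to sketch the composition alternative, assuming A holds some values and does the shared computation. The names init_like_b, init_like_c and the sum() computation are hypothetical stand-ins for the question's init() methods; B, C and the hybrid D each have an A instead of being one:

```cpp
#include <vector>

// A: holds the state and does the computations common to every variant.
class A {
public:
    std::vector<double> values;
    double sum() const {
        double s = 0;
        for (double v : values) s += v;
        return s;
    }
};

// The two initialization strategies become free functions working on an A,
// rather than subclasses of A.
void init_like_b(A& a) { a.values.assign({1.0, 2.0}); }
void init_like_c(A& a) { a.values.push_back(10.0); }

// B, C and the hybrid D each *have* an A; D simply calls both strategies.
class B { public: A a; B() { init_like_b(a); } };
class C { public: A a; C() { init_like_c(a); } };
class D { public: A a; D() { init_like_b(a); init_like_c(a); } };
```

No diamond appears at all, and the hybrid initialization is just a sequence of calls; whether this fits depends on whether B and C genuinely need to be usable as an A.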
Inheritance is an "is a" relationship. If B is an A, then you're good. The same applies to C. From your description, you do not have this relationship. Instead, you have a utility class (A) that does computations. You might want to make this have static methods as it shouldn't need to store any data in itself, if it's truly a utility. There's nothing wrong with passing A an instance of B or C and having it access the properties that it needs using B->fooAttribute. However, you will probably want both B and C to implement a common interface so you don't have to know which one you're looking at.

Difference in VTBL in single inheritance and multiple inheritance

I was taught in class that in the case of single inheritance, the VTBL includes all of the virtual functions the class can respond to.
In multiple inheritance, I was taught that the VTBL includes all of the virtual functions that were first defined in that class or that have been overridden in this class. This means that at run time you have to search for the right method implementation using the dispatch algorithm.
I'm not entirely sure why this difference exists. Why couldn't the VTBL in the case of multiple inheritance consist of all the virtual functions that the class can respond to (just like in the case of single inheritance)? This should speed up the process since we don't have to look for the method implementation at run time throughout the whole inheritance hierarchy.
Can anyone clarify this for me?
Edit: When I refer to the dispatch algorithm for multiple inheritance, I mean one where we have to traverse the hierarchy to search for the implementation, rather than just going to the current class's VTBL and jumping to the method.
Here's a translated example from published German notes by Scott Meyers. Consider
class B1 {
public:
    virtual void mf() {} // may be overridden in derived classes
};
class B2 {
public:
    virtual void mf() {} // may be overridden in derived classes
};
class D : public B1, public B2 {};
void g(B2* pb2)
{
    pb2->mf(); // requires offset adjustment before calling mf?
}
The pointer passed to g() needs an offset adjustment only if D overrides mf and pb2 really points to a D. What should a compiler do? When generating code for the call:
It may not know that D exists (that's the point of dynamic polymorphism: to be able to call future code without recompiling).
It can’t know whether pb2 points to a D (that is known only at runtime).
Because polymorphic classes need to remain flexible against the unbounded set of possible future derivations, the problem is typically solved by:
Creating special vtbls that handle offset adjustments.
For derived class objects, adding new vptrs to these vtbls, one additional vptr for each base class after the first one.
Merging all the virtual functions into a single table would destroy that flexibility. Note that multiple "parallel" inheritance D: B1, B2 {}; is different from "stacked" inheritance D: M: B {};. The latter requires a single substitution chain, the former has two such chains and incompatible B1 and B2.
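The adjustment can be made observable with a variant of the example above. As an assumption for illustration only, mf is changed to return the this pointer it was called with, and D now overrides it. The B2 subobject sits at a different address from the B1 subobject, yet a call through a B2* still arrives in D::mf with this pointing at the whole D, so the compiler must have adjusted the pointer somewhere (typically in a thunk):

```cpp
struct B1 {
    virtual ~B1() = default;
    virtual const void* mf() const { return this; }
};
struct B2 {
    virtual ~B2() = default;
    virtual const void* mf() const { return this; }
};

// D overrides mf from both bases; it reports the `this` it received.
struct D : B1, B2 {
    const void* mf() const override { return this; }
};

// Like g() above: it only knows about B2.
const void* g(const B2* pb2) { return pb2->mf(); }
```

Given D d; const B2* pb2 = &d;, the two base subobjects cannot share an address (both are non-empty), but g(pb2) returns the address of d itself.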
If you have two base classes A and B of your multiply inherited object D, these have their own vtable layouts, and D needs to provide vtables which match the vtables of both A and B. Further, if another class derives from D and possibly from another similarly multiply inherited class, the same thing happens again, i.e., multiple vtables are needed. They can't simply be merged. As a result, multiply inherited objects typically have multiple vtables around, and the compiler inserts code to first determine the function's correct vtable and then call it. I think the code determining the correct vtable based on a pointer to an object with multiple bases is just a simple addition or subtraction if the virtual function is not in a virtual base class, and a look-up of the location of the virtual base class otherwise, i.e., there isn't anything really expensive being done, but more than just an indirect call is needed.

Why does virtual inheritance need to be specified in the middle of a diamond hierarchy?

I have diamond hierarchy of classes:
  A
 / \
B   C
 \ /
  D
To avoid two copies of A in D, we need to use virtual inheritance at B and C.
class A { };
class B: virtual public A {};
class C: virtual public A { };
class D: public B, public C { };
Question: Why does virtual inheritance need to be performed at B and C, even though the ambiguity is at D? It would have been more intuitive if it were at D.
Why is this feature designed like this by standards committee?
What can we do if B and C classes are coming from a 3rd party library?
EDIT: My answer was that it indicates to B and C that they should not invoke A's constructor whenever a derived object gets created, as it will be invoked by D.
I'm not sure of the exact reason they chose to design virtual inheritance this way, but I believe the reason has to do with object layout.
Suppose that C++ was designed in a way where to resolve the diamond problem, you would virtually inherit B and C in D rather than virtually inheriting A in B and C. Now, what would the object layout for B and C be? Well, if no one ever tries to virtually inherit from them, then they'd each have their own copy of A and could use the standard, optimized layout where B and C each have an A at their base. However, if someone does virtually inherit from either B or C, then the object layout would have to be different because the two would have to share their copy of A.
The problem with this is that when the compiler first sees B and C, it can't know if anyone is going to be inheriting from them. Consequently, the compiler would have to fall back on the slower version of inheritance used in virtual inheritance rather than the more optimized version of inheritance that is used by default. This violates the C++ principle of "don't pay for what you don't use" (the zero-overhead principle), where you only pay for language features you explicitly use.
Why does virtual inheritance need to be performed at B and C, even though the ambiguity is at D? It would have been more intuitive if it were at D.
In your example, B and C are using virtual specifically to ask the compiler to ensure there's only one copy of A involved. If they didn't do this, they're effectively saying "I need my own A base class, I'm not expecting to share it with any other derived object". This could be crucial.
Example of not wanting to share a virtual base class
Suppose A is some kind of container, B is derived from it and stores some particular type of object, say "Bat", while C stores "Cat". If D expects B and C to independently provide information on a population of Bats and Cats, its author would be very surprised if a C operation did something to/with the Bats, or a B operation did something to/with the Cats.
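A sketch of the Bats-and-Cats case with ordinary, non-virtual inheritance; Population, Bats, Cats and Zoo are hypothetical names standing in for A, B, C and D. Because each intermediate class keeps its own Population, the two counts stay independent:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A: a simple container of names.
class Population {
public:
    void add(const std::string& name) { members_.push_back(name); }
    std::size_t count() const { return members_.size(); }
private:
    std::vector<std::string> members_;
};

// B and C inherit non-virtually, so each gets its own Population.
class Bats : public Population {
public:
    void add_bat(const std::string& n) { add(n); }
    std::size_t bat_count() const { return count(); }
};
class Cats : public Population {
public:
    void add_cat(const std::string& n) { add(n); }
    std::size_t cat_count() const { return count(); }
};

// D wants two independent populations, so it must NOT share the base.
class Zoo : public Bats, public Cats {};
```

Calling z.count() on a Zoo would be ambiguous, which here is exactly right: there is no single "count" that makes sense for the zoo.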
Example of wanting to share a virtual base class
Say D needs to provide access to some functions or data members that are in A, say "A::x"... if A is inherited independently (non-virtually) by B and C, then the compiler can't resolve D::x to B::x or C::x without the programmer having to explicitly disambiguate it. This means D can't be used as an A despite having not one but two "is-a" relationships implied by the derivation chain (i.e. if B "is a" A, and D "is a" B, then the user may expect/need to use D as if D "is a" A).
Why is this feature designed like this by standards committee?
Virtual inheritance exists because it's sometimes useful. It's specified by B and C, rather than D, because it's an intrusive concept in terms of the design of B and C, and also has implications for the encapsulation, memory layout, construction and destruction, and function dispatch of B and C.
What can we do if B and C classes are coming from a 3rd party library?
If D needs to inherit from both and provide access to an A, but B and C weren't designed to use virtual inheritance and can't be changed, then D must take responsibility for forwarding any requests matching the A API to either B and/or C and/or optionally another A it directly inherits from (if it needs a visible "is A" relationship). That might be practical if the calling code knows it's dealing with a D (even if via templating), but operations on the object via pointers to the base classes will not know about the management D is attempting to perform, and the whole thing may be very tricky to get right. But it's a bit like saying "what if I need a vector and I've only got a list", "a saw and not a screwdriver"... well, make the most of it or get what you really need.
EDIT: My answer was that it indicates to B and C that they should not invoke A's constructor whenever a derived object gets created, as it will be invoked by D.
That's an important aspect of this, yes.
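A small sketch of that constructor rule (the origin member is a hypothetical addition so the effect can be observed). B and C each name an initializer for their virtual base A, but when D is the most-derived class only D's initializer runs:

```cpp
// A records which constructor argument it was built with.
struct A {
    int origin;
    explicit A(int o) : origin(o) {}
};

// B and C each try to initialize their virtual A...
struct B : virtual A { B() : A(1) {} };
struct C : virtual A { C() : A(2) {} };

// ...but when D is the most-derived class, D alone initializes A;
// the A(1) and A(2) in B's and C's constructors are skipped.
struct D : B, C { D() : A(3) {} };
```

A standalone B() still gets origin == 1, while D() gets origin == 3. Since A has no default constructor here, omitting A(3) from D's constructor would not compile.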
In addition to templatetypedef's answer, it may be pointed out that you can also wrap A into
class AVirt : virtual public A {};
and inherit the other classes from it. You will not need to explicitly mark the other inheritances as virtual in this case.
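A sketch of the AVirt wrapper, assuming A has an int member x so sharing can be observed. B and C use plain inheritance from AVirt, but because A is a virtual base of AVirt, a D built from B and C still contains exactly one A:

```cpp
struct A { int x = 0; };

// The wrapper: A becomes a virtual base of AVirt.
struct AVirt : virtual public A {};

// B and C derive from AVirt without writing `virtual` themselves...
struct B : public AVirt {};
struct C : public AVirt {};

// ...yet D contains only one A, shared by both paths.
struct D : public B, public C {};

int write_via_b_read_via_c() {
    D d;
    static_cast<B&>(d).x = 7;     // writes the single shared A::x
    return static_cast<C&>(d).x;  // reads the same A::x back
}
```

Note that D still has two AVirt subobjects, so converting a D to AVirt& is ambiguous; and the trick only helps when you can change B and C to derive from AVirt.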
Question: Why does virtual inheritance need to be performed at B and C, even though the ambiguity is at D?
Because B's and C's methods must know they might have to work on objects whose layout is much different from B's and C's own layouts. With single inheritance it is not a problem, because derived classes just append their attributes after parent's original layout.
With multiple inheritance you cannot do that, because there's no single parent's layout in the first place. Moreover (if you want to avoid A's duplication), the parents' layouts need to overlap on A's attributes. Multiple inheritance in C++ hides quite a lot of complexity.
As A is the multiply-inherited class, it is the classes that derive from it directly that have to do so virtually.
If you have a situation where B and C both derive from A and you want both in D and you can't use the diamond, then D can derive from just one of B and C, and have an instance of the other, through which it can forward functions.
A workaround could look something like this:
class A {};
class B : public A {};  // not your class, cannot change
class C : public A {};  // not your class, cannot change
class D2 : public C {}; // your class: implement the functions of C here
class D : public B      // your class: implement the functions of B
{
    D2 d2;              // the C part lives in a member; D forwards C's functions to it
};

Virtual dispatch implementation details

First of all, I want to make myself clear that I do understand that there is no notion of vtables and vptrs in the C++ standard. However I think that virtually all implementations implement the virtual dispatch mechanism in pretty much the same way (correct me if I am wrong, but this isn't the main question). Also, I believe I know how virtual functions work, that is, I can always tell which function will be called, I just need the implementation details.
Suppose someone asked me the following:
"You have base class B with virtual functions v1, v2, v3 and derived class D:B which overrides functions v1 and v3 and adds a virtual function v4. Explain how virtual dispatch works".
I would answer like this:
For each class with virtual functions (in this case B and D) we have a separate array of pointers-to-functions called a vtable.
The vtable for B would contain
&B::v1
&B::v2
&B::v3
The vtable for D would contain
&D::v1
&B::v2
&D::v3
&D::v4
Now the class B contains a member pointer vptr. D naturally inherits it and therefore contains it too. In the constructor and destructor of B, vptr is set to point to B's vtable. In the constructor and destructor of D, it is set to point to D's vtable.
Any call to a virtual function f on an object x of polymorphic class X is interpreted as a call to x.vptr[f's position in vtables]
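The description above can be emulated by hand to make it concrete. This is a hypothetical sketch, not what any compiler literally emits: Obj stands for any object with a vptr, each "vtable" is an array of function pointers, and a virtual call is just an indexed load followed by an indirect call:

```cpp
#include <string>

// Hand-rolled emulation of the vtables described above (hypothetical names).
struct Obj;                                // forward declaration
using Slot = std::string (*)(const Obj&);  // one vtable entry

struct Obj {
    const Slot* vptr;                      // points at the class's vtable
};

std::string B_v1(const Obj&) { return "B::v1"; }
std::string B_v2(const Obj&) { return "B::v2"; }
std::string B_v3(const Obj&) { return "B::v3"; }
std::string D_v1(const Obj&) { return "D::v1"; }
std::string D_v3(const Obj&) { return "D::v3"; }
std::string D_v4(const Obj&) { return "D::v4"; }

// Same slot order in both tables: v1, v2, v3, then D's new v4 appended.
const Slot B_vtable[] = {B_v1, B_v2, B_v3};
const Slot D_vtable[] = {D_v1, B_v2, D_v3, D_v4};

// "x.vptr[f's position in vtables]": the caller only needs the slot index.
enum SlotIndex { V1 = 0, V2 = 1, V3 = 2, V4 = 3 };
std::string call(const Obj& x, SlotIndex i) { return x.vptr[i](x); }
```

Because D keeps B's slot order and only appends v4, code compiled against B can call slot V1 on any object and still reach the right override.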
The questions are:
1. Do I have any errors in the above description?
2. How does the compiler know f's position in vtable (in detail, please)
3. Does this mean that if a class has two bases then it has two vptrs? What is happening in this case? (try to describe in a similar manner as I did, in as much detail as possible)
4. What's happening in a diamond hierarchy with A on top B,C in the middle and D at the bottom? (A is a virtual base class of B and C)
Thanks in advance.
1. Do I have any errors in the above description?
All good. :-)
2. How does the compiler know f's position in vtable
Each vendor will have their own way of doing this, but I always think of the vtable as a map of member function signatures to memory offsets. So the compiler just maintains this list.
3. Does this mean that if a class has two bases then it has two vptrs? What is happening in this case?
Typically, compilers compose a new vtable which consists of all the vtables of the virtual bases appended together in the order they were specified, along with the vtable pointer of the virtual base. They follow this with the vtable functions of the deriving class. This is extremely vendor-specific, but for class D : B1, B2, you typically see D._vptr[0] == B1._vptr.
The same kind of composition is used for the member fields of an object, and, as far as I understand it, vtables can be composed by the compiler in exactly the same way.
4. What's happening in a diamond hierarchy with A on top B,C in the middle and D at the bottom? (A is a virtual base class of B and C)
The short answer? Absolute hell. Did you virtually inherit both the bases? Just one of them? Neither of them? Ultimately, the same techniques of composing a vtable for the class are used, but how this is done varies far too wildly, since how it should be done is not at all set in stone. There is a decent explanation of solving the diamond-hierarchy problem here, but, like most of this, it is quite vendor-specific.
Looks good to me
Implementation-specific, but most are just in source code order, meaning the order they appear in the class, starting with the base class, then adding on new virtual functions from the derived class. As long as the compiler has a deterministic way of doing this, anything it wants to do is fine. However, on Windows, to create COM-compatible vtables, it has to be in source order.
(not sure)
(guess) A diamond just means that you could have two copies of a base class B. Virtual inheritance will merge them into one instance. So if you set a member via D1, you can read it via D2. (with C derived from D1, D2, each of them derived from B). I believe that in both cases, the vtables would be identical, as the function pointers are the same -- the memory for data members is what is merged.
Comments:
I don't think destructors come into it!
A call such as e.g. D d; d.v1(); will probably not be implemented via the vtable, as the compiler can resolve the function address at compile/link-time.
The compiler knows f's position because it put it there!
Yes, a class with multiple base classes will typically have multiple vptrs (assuming virtual functions in each base class).
Scott Meyers' "Effective C++" books explain multiple inheritance and diamonds better than I can; I'd recommend reading them for this (and many other) reasons. Consider them essential reading!