C++ class hierarchy in which to add functionality with an interface

C++. Imagine the following situation.
There's a class hierarchy of classes deriving from some base class A.
We cannot modify A because it is outside of our scope.
(It is provided by a library; in our case it is an MFC CView class, but that shouldn't matter here.)
So there are A1, A2, etc., which are different classes derived from A, each providing specific functionality.
Now imagine we define some new interface I to provide some new functionality.
Classes for concrete objects of the application will inherit from both one of the As and I.
Let's call them Bs. (There are again several of them, like B1 derived from A1 and I, B2 derived from A2 and I etc.)
Now it happens that to implement the interface of I, there is a lot of common code that needs functionality from A.
How can we organize the class hierarchy without repeating ourselves too much?
For instance, suppose there is a function I::f that needs to call A::f, and this holds for all derived classes Bn.
It seems like a waste to re-implement I::f for every Bn.
But obviously, we cannot call A::f directly from I::f, as they aren't related.
I hope you get the point.
What is the pattern that can help us here?

The immediate solution to "call A::f from I::f without overhauling everything" would be dynamic_cast:
struct I {
    virtual ~I() = default;  // I must be polymorphic for the cross-cast to compile

    void f() {
        dynamic_cast<A *>(this)->f();  // cross-cast to the A subobject, then call A::f
    }
};
Note that this performs a full-fledged RTTI graph traversal to perform the cross-cast through the unknown Ax dynamic type of the object, so it might be on the slow side of things.
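For context, here is a minimal self-contained sketch of the whole arrangement the question describes (class names follow the question; the bodies are illustrative placeholders):

#include <cassert>
#include <iostream>

struct A {                       // stands in for the library base class (e.g. CView)
    virtual ~A() = default;
    virtual void f() { std::cout << "A::f\n"; }
};

struct A1 : A {};                // one of the library-style derived classes

struct I {                       // the new interface with the shared implementation
    virtual ~I() = default;
    void f() {
        A *a = dynamic_cast<A *>(this);  // cross-cast succeeds because the most
        assert(a != nullptr);            // derived object (some Bn) inherits from both
        a->f();
    }
};

struct B1 : A1, I {};            // a concrete class: one of the Bn

int main() {
    B1 b;
    static_cast<I &>(b).f();     // runs the common I::f, which forwards to A::f
}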

Related

Inheritance and member functions

Consider I have a series of derived classes, for example as listed below:
class A
{
    ...
};

class B1 : public A    // there may be many B's here, say B2, B3, etc.
{
    ...
};

class C1 : public B1   // there may be more C's as well
{
    ...
};
I would like to put all of the objects in a single container, so they would all be held as type A (pointers to A).
Suppose I would like to add a function to class C1; what would be the best way to achieve this? My options would be introducing it in the base class A and writing the needed implementation in C1, or introducing it in C1 and doing a dynamic cast to access it. Which one is preferred? Is dynamic casting too expensive? (My main constraint is run time. I have a flag in the base class to indicate what type of derived object it is, so I do not have to dynamic_cast every object in the container. Can adding unnecessary functions to the base class result in bad instruction cache use?)
You don't tell us the purpose of the new function in C1, and this does affect the answer, but as rough guidelines:
If the new function is a general behavior that you may need on any object and C1 happens to be the first user, definitely just add the interface to A.
If the new function is specific to the C series of classes but it can follow some general pattern (for example post-processing), add a post_process method to A, override it in C1, and have that method call private implementation methods of C1 to do the actual specific post-processing task.
If neither of these is the case, you may wish to reconsider your use of inheritance, as it's possible you're using it to represent a relationship other than substitution.
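A minimal sketch of that second guideline, the post_process hook (the method names are placeholders following the example above):

class A {
public:
    virtual ~A() = default;
    virtual void post_process() {}   // general hook; the default does nothing
};

class B1 : public A { /* ... */ };

class C1 : public B1 {
public:
    void post_process() override { specific_post_processing(); }
private:
    void specific_post_processing() { /* the C1-specific task */ }
};

// Calling code only ever sees A:
//   for (A *p : container) p->post_process();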
Adding a virtual function to your base class A is better because:
You should avoid dynamic_cast, especially in performance-sensitive code. Please see "Performance of dynamic_cast?".
You should avoid having conditions that examine the object type (e.g. is it A, B1, or C1?) before performing a type-specific operation. Not only because it's slow, but also because if you do so, every time you add a new object type (e.g. C2) you will need to check all those conditions to see if they need to be updated.

Difference in VTBL in single inheritance and multiple inheritance

I was taught in class that in the case of single inheritance the VTBL includes all of the virtual functions the class can respond to.
In multiple inheritance I was taught that the VTBL includes all of the virtual functions that were first defined in that class or that have been overridden in that class. This means that at run time you've got to search for the right method implementation using the dispatch algorithm.
I'm not entirely sure why this difference exists. Why couldn't the VTBL in the case of multiple inheritance consist of all the virtual functions that the class can respond to (just like in the case of single inheritance)? This should speed up the process since we don't have to look for the method implementation at run time throughout the whole inheritance hierarchy.
Can anyone clarify this for me?
Edit: When I refer to the dispatch algorithm for multiple inheritance, I mean that we've got to traverse the hierarchy to search for the implementation rather than just going to the current class's VTBL and jumping to the method.
Here's a translated example from published German notes by Scott Meyers. Consider
class B1 {
public:
    virtual void mf(); // may be overridden in derived classes
};

class B2 {
public:
    virtual void mf(); // may be overridden in derived classes
};

class D : public B1, public B2 {};

void g(B2 *pb2)
{
    pb2->mf(); // requires offset adjustment before calling mf?
}
The pointer argument passed to g() needs an offset adjustment only if D overrides mf and pb2 really points to a D. What should a compiler do? When generating code for the call,
it may not know that D exists (that's the point of dynamic polymorphism: being able to call future code without recompiling);
it can't know whether pb2 points to a D (it only knows that at runtime).
Because polymorphic classes need to remain flexible against the unbounded set of possible future derivations, the problem is typically solved by
creating special vtbls that handle offset adjustments, and
for derived-class objects, adding new vptrs pointing to these vtbls, one additional vptr for each base class after the first one.
Merging all the virtual functions into a single table would destroy that flexibility. Note that "parallel" multiple inheritance (D derives from both B1 and B2) is different from "stacked" single inheritance (D derives from M, which derives from B). The latter requires a single substitution chain; the former has two such chains and incompatible B1 and B2.
If you have two base classes A and B of your multiply inherited object D, these have their own vtable layouts and D needs to provide vtables which match the vtables of both A and B. Further, if another class derives from D, and possibly from another similarly multiply inherited class, the same thing happens again, i.e., multiple vtables are needed. They can't simply be merged. As a result, multiply inherited objects typically have multiple vtables around, and the compiler inserts code to first determine the function's correct vtable and then call it. I think the code determining the correct vtable based on a pointer to an object with multiple bases is just a simple addition or subtraction if the virtual function is not in a virtual base class, and a look-up of the location of the virtual base class otherwise, i.e., there isn't anything really expensive being done, but more than just an indirect call is needed.
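A small illustrative sketch (not from the answers above) that makes those extra subobjects visible: converting a D* to its second base typically changes the address, which is exactly the adjustment the generated call code has to account for.

#include <iostream>

struct B1 { virtual void mf() {} };
struct B2 { virtual void mf() {} };
struct D : B1, B2 { void mf() override {} };

int main() {
    D d;
    std::cout << "D*  : " << static_cast<void *>(&d) << '\n';
    std::cout << "B1* : " << static_cast<void *>(static_cast<B1 *>(&d)) << '\n';
    std::cout << "B2* : " << static_cast<void *>(static_cast<B2 *>(&d)) << '\n';
    // On typical implementations the B2* value differs from the D* value by the
    // size of the B1 subobject: the pointer (and the vptr it uses) is adjusted.
}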

Use of making the base class polymorphic?

I know the keyword virtual makes the base class polymorphic, and if I create an object and call a virtual function, the corresponding function will be called based on the runtime type of the object. But why should I create an object through a pointer of a different type? I mean:
Base *ptr = new Derived;
ptr->virtualfunction(); // calls the function implemented in the Derived class
If I create the object like this:
Derived *ptr = new Derived;
ptr->virtualfunction(); // does the same without the need to make the function virtual
Because you might want to store objects of different types together:
std::vector<std::unique_ptr<Base>> v;
v.push_back(std::make_unique<DerivedA>());
v.push_back(std::make_unique<DerivedB>());
v.push_back(std::make_unique<DerivedC>());
Now, if you go over that vector:
for (auto& p : v) {
    p->foo();
}
It will call foo() of DerivedA, B, and C appropriately.
Let's go with a simple example. Let's say you have:
class Base {};
class Derived1 : public Base {};
class Derived2 : public Base {};
Now, let's say you want to be able to store in a vector (or any container) both Derived1 and Derived2 instances.
You have to use the base class in that case.
std::vector<Base*>
// or std::vector<std::unique_ptr<Base>>
The need for polymorphism is the need to process different data in the same manner. Rather than reimplementing the same algorithm over and over for datasets with different shapes, wouldn't it be much easier to have only one implementation of that algorithm, and parameterize it with different operators?
That's the essence of polymorphism. You start with an algorithm, establish the interface it must interact with, and then build implementations of that interface. In C++ the notion of interface is implicit in every class. Any class exposes one interface (though it may support many interfaces through its ancestors), and its descendants implement it as well. By making certain methods virtual, the descendants may override and adapt them to their own internal structures, without modifying how the object is manipulated from the outside.
So polymorphism is really that: values which may adopt different shapes, and the means to access and manipulate them uniformly. The key point in answering your question is perhaps that the algorithm does not know which implementation it is manipulating. You provide a trivial example where the code knows that it works with an instance of Derived, and thus may call its methods directly. In generic code, or code referring to an interface (so to speak), that knowledge does not exist, which forces the code to rely on the base class methods (and requires the programmer to ensure that the classes he plans to use with that code are well defined, i.e. virtual where needed).
There are many useful applications of polymorphism, but they all derive from the above principle:
heterogeneous dataset (as illustrated by other answers),
injection (in which different implementations of the same interface may be swapped one for another at runtime),
testing (and more specifically mocking, in which classes which interact with a given class C are replaced by dummies which help test the correct behaviour of C),
to name a few. Note that compile-time polymorphism (templates) and runtime polymorphism (virtual methods and inheritance) both achieve that goal, albeit in different ways and with different pros and cons.
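As a concrete illustration of "one algorithm, many shapes of data" (the Shape/area names below are made up for the example), the same traversal works for any future shape without changing the loop:

#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;   // the interface the algorithm relies on
};

struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};

// Written once; it never needs to know which concrete shapes exist.
double total_area(const std::vector<std::unique_ptr<Shape>> &shapes) {
    double sum = 0.0;
    for (const auto &p : shapes) sum += p->area();
    return sum;
}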

Is there any way to avoid declaring virtual methods when storing (children) pointers?

I have run into an annoying problem lately, and I am not satisfied with my own workaround: I have a program that maintains a vector of pointers to a base class, and I am storing there all kinds of child object pointers. Now, each child class has methods of its own, and the main program may or may not call these methods, depending on the type of object (note though that they all heavily use common methods of the base class, so this justifies inheritance).
I have found it useful to have an "object identifier" to check the class type (and then either call the method or not), which is already not very beautiful, but this is not the main inconvenience. The main inconvenience is that, if I want to actually be able to call a derived class method using the base class pointer (or even just store the pointer in the pointer array), then one needs to declare the derived methods as virtual in the base class.
That makes sense from the C++ coding point of view... but it is not practical in my case (from the development point of view), because I am planning to create many different child classes in different files, perhaps written by different people, and I don't want to tweak/maintain the base class each time to add virtual methods!
How to do this? Essentially, what I am asking (I guess) is how to implement something like Objective-C NSArrays - if you send a message to an object that does not implement the method, well, nothing happens.
Instead of this:
// variant A: declare everything in the base class
void DoStuff_A(Base* b) {
    if (b->TypeId() == DERIVED_1)
        b->DoDerived1Stuff();
    else if (b->TypeId() == DERIVED_2)
        b->DoDerived2Stuff();
}
or this:
// variant B: declare nothing in the base class
void DoStuff_B(Base* b) {
    if (b->TypeId() == DERIVED_1)
        dynamic_cast<Derived1*>(b)->DoDerived1Stuff();
    else if (b->TypeId() == DERIVED_2)
        dynamic_cast<Derived2*>(b)->DoDerived2Stuff();
}
do this:
// variant C: declare the right thing in the base class
b->DoStuff();
Note there's a single virtual function in the base per stuff that has to be done.
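For completeness, a minimal sketch of the declarations variant C relies on (names follow the variants above):

class Base {
public:
    virtual ~Base() = default;
    virtual void DoStuff() = 0;           // one virtual per "stuff" to be done
};

class Derived1 : public Base {
public:
    void DoStuff() override { /* Derived1-specific work */ }
};

class Derived2 : public Base {
public:
    void DoStuff() override { /* Derived2-specific work */ }
};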
If you find yourself in a situation where you are more comfortable with variants A or B than with variant C, stop and rethink your design. You are coupling components too tightly and in the end it will backfire.
I am planning to create many different children classes in different
files, perhaps made by different people, and I don't want to
tweak/maintain the base class each time, to add virtual methods!
You are OK with tweaking DoStuff each time a derived class is added, but tweaking Base is a no-no. May I ask why?
If your design does not fit in either A, B or C pattern, show what you have, for clairvoyance is a rare feat these days.
You can do what you describe in C++, but not using functions. It is, by the way, kind of horrible but I suppose there might be cases in which it's a legitimate approach.
First way of doing this:
Define a function with a signature something like boost::variant parseMessage(std::string, std::vector<boost::variant>); and perhaps a set of convenience functions with common signatures on the base class, and include a message lookup table on the base class which takes functors. In each class constructor, add its messages to the message table; the parseMessage function then parcels off each message to the right function on the class.
It's ugly and slow but it should work.
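A stripped-down sketch of that idea using only the standard library (std::function in place of the boost machinery mentioned above; all names are illustrative):

#include <functional>
#include <iostream>
#include <map>
#include <string>

class Base {
public:
    virtual ~Base() = default;

    // Dispatch by name; silently ignore unknown messages (NSArray-style).
    void parseMessage(const std::string &msg) {
        auto it = handlers_.find(msg);
        if (it != handlers_.end()) it->second();
    }

protected:
    std::map<std::string, std::function<void()>> handlers_;
};

class Child : public Base {
public:
    Child() {
        handlers_["wiggle"] = [this] { wiggle(); };  // register in the constructor
    }
private:
    void wiggle() { std::cout << "wiggling\n"; }
};

int main() {
    Child c;
    Base *b = &c;
    b->parseMessage("wiggle");   // calls Child::wiggle
    b->parseMessage("unknown");  // does nothing
}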
Second way of doing this:
Define the virtual functions further down the hierarchy: if you want to add int foo(bar*); you first add a class that defines it as virtual, and then ensure every class that wants to define int foo(bar*); inherits from it. You can then use dynamic_cast to check that the pointer you are looking at inherits from this class before trying to call int foo(bar*);. Possibly these interface-adding classes could be pure virtual so they can be mixed in at various points using multiple inheritance, but that may have its own problems.
This is less flexible than the first way and requires the classes that implement a function to be linked to each other. Oh, and it's still ugly.
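A minimal sketch of that second way (HasFoo and Bar are made-up names for illustration):

#include <vector>

struct Bar {};

class Base {
public:
    virtual ~Base() = default;
};

// Interface-adding mixin: only classes that support foo() inherit from it.
class HasFoo {
public:
    virtual ~HasFoo() = default;
    virtual int foo(Bar *) = 0;
};

class Child1 : public Base, public HasFoo {
public:
    int foo(Bar *) override { return 1; }
};

class Child2 : public Base {};   // does not support foo()

void process(const std::vector<Base *> &objects, Bar *bar) {
    for (Base *b : objects) {
        if (auto *f = dynamic_cast<HasFoo *>(b))  // only call where supported
            f->foo(bar);
    }
}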
But mostly I suggest you try and write C++ code like C++ code not Objective-C code.
This can be solved by adding some sort of introspection capability and a meta-object system. The talk "Metadata and reflection in C++" by Jeff Tucker demonstrates how to do this using C++ template metaprogramming.
If you don't want to go to the trouble of implementing one yourself, it would be easier to use an existing one such as Qt's meta-object system. Note that this solution does not work with multiple inheritance due to limitations in the meta-object compiler: QObject Multiple Inheritance.
With that installed, you can query for the presence of methods and call them. This is quite tedious to do by hand, so the easiest way to call such methods is via the signal and slot mechanism.
There is also GObject, which is quite similar, and there are others.
If you are planning to create many different child classes in different files, perhaps written by different people, then I would guess you also don't want to change your main code for every child class. In that case, I think what you need to do in your base class is define several (not too many) virtual functions with empty implementations, BUT those functions should mark a point in the logic where they are called, like "AfterInsert" or "BeforeSorting", etc.
Usually there are not too many places in the logic where you wish the derived classes to perform their own logic.
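A small sketch of those empty hook methods (the hook names mirror the examples above; the overrides are illustrative):

class Base {
public:
    virtual ~Base() = default;

    // Empty hooks, called by the main code at fixed points in its logic.
    virtual void AfterInsert() {}
    virtual void BeforeSorting() {}
};

class Child : public Base {
public:
    void AfterInsert() override { /* Child-specific reaction */ }
    // BeforeSorting() intentionally not overridden: the empty default runs.
};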

What are the disadvantages of "upcasting"?

The purpose of an abstract class is not to let the developers create an object of the base class and then upcast it, AFAIK.
Now, even if the upcasting is not required, and I still use it, does it prove to be "disadvantageous" in some way?
More clarification:
From Thinking in C++:
Often in a design, you want the base class to present only an
interface for its derived classes. That is, you don’t want anyone to
actually create an object of the base class, only to upcast to it so that
its interface can be used. This is accomplished by making that class
abstract,
By upcasting, I meant: baseClass *obj = new derived ();
Upcasting can be disadvantageous for non-polymorphic classes. For example:
class Fruit { ... }; // doesn't contain any virtual method
class Apple : public Fruit { ... };
class Blackberry : public Fruit { ... };
upcast it somewhere,
Fruit *p = new Apple; // oops, information gone
Now, you will never know (without any manual mechanism) whether *p is an instance of an Apple or a Blackberry.
[Note that dynamic_cast<> is not allowed for non-polymorphic classes.]
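A small sketch of that note (illustrative only): with the non-polymorphic Fruit above, dynamic_cast would not even compile; giving Fruit a virtual destructor makes the checked downcast possible.

#include <iostream>

struct Fruit {
    virtual ~Fruit() = default;  // makes Fruit polymorphic; without any virtual
                                 // member, the dynamic_casts below would not compile
};
struct Apple : Fruit {};
struct Blackberry : Fruit {};

int main() {
    Fruit *p = new Apple;
    if (dynamic_cast<Apple *>(p))        std::cout << "it's an Apple\n";
    if (!dynamic_cast<Blackberry *>(p))  std::cout << "not a Blackberry\n";
    delete p;
}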
Abstract classes are used to express concepts that are common to a set of (sub-)classes, but for which it is not sensible to create instances.
Consider a class Animal. It does not make sense to create an instance of that class, because there is no thing that is just an animal. There are ducks, dogs and elephants, each of which is a subclass of animal. By formally declaring the class animal you can capture the similarities of all types of animals, and by making it abstract you can express that it cannot be instantiated.
Upcasting is required to make use of polymorphism in statically typed languages. This is, as Jigar Joshi pointed out in a comment, called the Liskov Substitution Principle.
Edit: Upcasting is not disadvantageous. In fact, you should use it whenever possible, making your code depend on super-classes (interfaces) instead of concrete derived classes (implementations). This enables you to later switch implementations without having to change your code.
Upcasting is a technical tool.
Like every tool it is useful when used correctly and dangerous / disadvantageous if used inconsistently.
It can be good or bad depending on how "pure" you want your code to be in respect to a given programming paradigm.
Now, C++ is not necessarily "pure OOP", not necessarily "pure generic", not necessarily "pure functional". And since C++ is a "pragmatic language", it is not in general an advantage to force it to fit one and only one paradigm.
The only thing that can be said, in technical terms, is that,
A derived class is a base class plus something more
Referring to a derived object through a base pointer makes that "something more" inaccessible, unless there is a mechanism in the base that lets you jump into the derived scope.
The mechanism C++ offers for that implicit jump is virtual functions.
The mechanism C++ offers for an explicit jump is dynamic_cast (used in downcasting).
For non-polymorphic objects (that don't have any virtual method), static_cast (to downcast) is still available, but with no runtime check.
Advantages and disadvantages derive from consistent and inconsistent use of all of those points together. It is not a matter related to downcasting only.
One disadvantage would be the obvious loss of new functionality introduced in the derived class:
class A
{
public:
    void foo();
};

class B : public A
{
public:
    void foo2();
};

A* b = new B;
b->foo2(); // error - foo2 is not visible through A*
I'm talking here about non-virtual functions.
Also, if you forget to make your destructors virtual, you might get some memory leaks when deleting a derived object via a pointer to a base object.
However, all these can be avoided with a good architecture.
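To illustrate the destructor point above, a minimal sketch (names are made up): deleting a derived object through a base pointer is only well defined if the base destructor is virtual; otherwise ~Derived never runs and, here, the buffer leaks.

#include <memory>

struct Base {
    virtual ~Base() = default;   // without 'virtual' here, the delete below would be
                                 // undefined behaviour and ~Derived would not run
};

struct Derived : Base {
    std::unique_ptr<int[]> buffer{new int[1024]};  // released by ~Derived
};

int main() {
    Base *p = new Derived;
    delete p;   // calls ~Derived then ~Base, because ~Base is virtual
}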