Dynamic binding or switch/case? - C++

A scenario like this:
I have different kinds of objects that perform similar operations, each implemented in its respective func().
There are two kinds of solutions for func_manager() to call func() according to the object:
Solution 1: Use the virtual function mechanism of C++. func_manager() works differently according to the object pointer passed in.
class Object {
public:
    virtual void func() = 0;
};
class Object_A : public Object {
public:
    void func() {}
};
class Object_B : public Object {
public:
    void func() {}
};
void func_manager(Object* a)
{
    a->func();
}
Solution 2: Use a plain switch/case. func_manager() works differently according to the type passed in.
typedef enum _type_t
{
TYPE_A,
TYPE_B
}type_t;
void func_by_a()
{
    // do as func() in Object_A
}
void func_by_b()
{
    // do as func() in Object_B
}
void func_manager(type_t type)
{
    switch (type) {
    case TYPE_A:
        func_by_a();
        break;
    case TYPE_B:
        func_by_b();
        break;
    default:
        break;
    }
}
My questions are two:
1. From the viewpoint of DESIGN PATTERNS, which one is better?
2. From the viewpoint of RUNTIME EFFICIENCY, which one is better? Especially as the number of Object kinds increases, maybe up to 10-15 in total, whose overhead overtakes the other's? I don't know how switch/case is implemented internally; is it just a bunch of if/else?
Thanks very much!

From the viewpoint of DESIGN PATTERNS, which one is better?
Using polymorphism (Solution 1) is better.
Just one data point: Imagine you have a huge system built around either of the two and then suddenly comes the requirement to add another type. With solution one, you add one derived class, make sure it's instantiated where required, and you're done. With solution 2 you have thousands of switch statements smeared all over the system and it is more or less impossible to guarantee you found all the places where you have to modify them for the new type.
From the viewpoint of RUNTIME EFFICIENCY, which one is better? Especially as the number of Object kinds increases...
That's hard to say.
I remember a footnote in Stanley Lippmann's Inside the C++ Object Model, where he says that studies have shown that virtual functions might have a small advantage against switches over types. I would be hard-pressed, however, to cite chapter and verse, and, IIRC, the advantage didn't seem big enough to make the decision dependent on it.

The first solution is better if only because it's shorter in code. It is also easier to maintain and faster to compile: if you want to add types you need only add new types as headers and compilation units, with no need to change and recompile the code that is responsible for the type mapping. In fact, this code is generated by the compiler and is likely to be as efficient or more efficient than anything you can write on your own.
Virtual functions at the lowest level cost no more than an extra dereference in a table (array). But never mind that; this sort of performance nitpicking really doesn't matter at these microscopic scales. Your code is simpler, and that's what matters. The runtime dispatch that C++ gives you is there for a reason. Use it.

I would say the first one is better. In solution 2, func_manager will have to know about all types and be updated every time you add a new type. If you go with solution 1 you can later add a new type and func_manager will just work.
In this simple case I would actually guess that solution 1 will be faster, since it can directly look up the function address in the vtable. If you have 15 different types the switch statement will likely not end up as a jump table but, as you say, basically a huge if/else chain.

From the design point of view the first one is definitely better, as that's what inheritance was intended for: to make different objects behave homogeneously.
From the efficiency point of view, both alternatives produce more or less the same generated code; somewhere there must be the choice-making code. The difference is that inheritance handles it for you automatically in the first one and you do it manually in the second one.

Using "dynamic dispatch" or "virtual dispatch" (i.e. invoking a virtual function) is better both with respect to design (keeping changes in one place, extensibility) and with respect to runtime efficiency (simple dereference vs. a simple dereference where a jump table is used or an if...else ladder which is really slow). As a slight aside, "dynamic binding" doesn't mean what you think... "dynamic binding" refers to resolving a variable's value based on its most recent declaration, as opposed to "static binding" or "lexical binding", which refers to resolving a variable by the current inner-most scope in which it is declared.
Design
If another programmer comes along who doesn't have access to your source code and wants to create an implementation of Object, then that programmer is stuck... the only way to extend functionality is by adding yet another case to a very long switch statement. Whereas with virtual functions, the programmer only needs to inherit from your interface and provide definitions for the virtual methods and, voilà, it works. Also, those switch statements end up all over the place, and so adding new implementations almost always requires modifying many switch statements everywhere, while inheritance keeps the changes localized to one class.
Efficiency
Dynamic dispatch simply looks up a function in the object's virtual table and then jumps to that location. It is incredibly fast. If the switch statement uses a jump table, it will be roughly the same speed; however, if there are very few implementations, some programmer is going to be tempted to use an if...else ladder instead of a switch statement, which generally is not able to take advantage of jump tables and is, therefore, slower.

Why does nobody suggest function objects? I think kingkai is interested in solving the problem, not only in those two solutions.
I'm not experienced with them, but they do their job:
#include <iostream>
#include <string>

struct Helloer {
    std::string operator() (void) {
        return std::string("Hello world!");
    }
};
struct Byer {
    std::string operator() (void) {
        return std::string("Good bye world!");
    }
};
template< class T >
void say( T speaker ) {
    std::cout << speaker() << std::endl;
}
int main()
{
    say( Helloer() );
    say( Byer() );
}
Edit: In my opinion this is a more "right" approach than classes with a single method (which is not the function call operator). Actually, I think this operator overload was added to C++ to avoid such classes.
Also, function objects are more convenient to use, even if you don't want templates - just like usual functions.
In the end, consider the STL - it uses function objects everywhere and looks pretty natural. And I don't even mention Boost.
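For instance, here is a minimal sketch of passing a function object to a standard algorithm (the comparator class is made up for illustration):
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

// A function object: the comparison logic lives in operator().
struct ByAbsoluteValue {
    bool operator()(int a, int b) const {
        return std::abs(a) < std::abs(b);
    }
};

int main() {
    std::vector<int> v = { -7, 2, -1, 5 };
    std::sort(v.begin(), v.end(), ByAbsoluteValue()); // functor passed like a plain function
    for (int x : v) std::cout << x << ' ';            // prints: -1 2 5 -7
    std::cout << '\n';
}
Unlike a plain function pointer, the functor could also carry state between calls.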

Related

Alternatives to passing a flag into a method?

Sometimes when fixing a defect in an existing code base I might (often out of laziness) decide to change a method from:
void
MyClass::foo(uint32_t aBar)
{
// Do something with aBar...
}
to:
void
MyClass::foo(uint32_t aBar, bool aSomeCondition)
{
if (aSomeCondition)
{
// Do something with aBar...
}
}
During a code review a colleague mentioned that a better approach would be to sub-class MyClass to provide this specialized functionality.
However, I would argue that as long as aSomeCondition doesn't violate the purpose or cohesion of MyClass it is an acceptable pattern to use. Only if the code became infiltrated with flags and if statements would inheritance be a better option; otherwise we would potentially be entering architecture-astronaut territory.
What's the tipping point here?
Note: I just saw this related answer which suggests that an enum may be a better
choice than a bool, but I think my question still applies in this case.
There is more than one solution for this kind of problem.
A boolean carries very little semantic meaning. If you want to add a new condition in the future you will have to add a new parameter...
After four years of maintenance your method may have half a dozen parameters; if these parameters are all booleans, it is a very nice trap for maintainers.
An enum is a good choice if the cases are exclusive.
Enums can easily be migrated to a bit mask or a context object.
Bit mask: C++ includes the C language, so you can use some plain old practices. Sometimes a bit mask on an unsigned int is a good choice (but you lose type checking and can pass an incorrect mask by mistake). It is a convenient way to move smoothly from a boolean or an enum argument to this kind of pattern (see the sketch below).
A bit mask can be migrated with some effort to a context object. You may have to implement some kind of bitwise arithmetic, such as operator| and operator&, if you have to keep build-time compatibility.
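A minimal sketch of the bit-mask variant (all flag names here are invented for illustration):
#include <cstdint>
#include <iostream>

// Hypothetical option flags; each occupies one bit so they can be combined.
enum FooOptions : std::uint32_t {
    FOO_NONE    = 0,
    FOO_SETUP   = 1u << 0,
    FOO_VERBOSE = 1u << 1,
    FOO_DRY_RUN = 1u << 2
};

void foo(std::uint32_t aBar, std::uint32_t options) {
    if (options & FOO_SETUP) {
        // perform the setup variant
    }
    if (options & FOO_DRY_RUN) {
        // skip the side effects
    }
    std::cout << "aBar = " << aBar << '\n'; // ... do something with aBar ...
}

int main() {
    foo(42, FOO_SETUP | FOO_VERBOSE); // flags are combined with bitwise OR
}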
Inheritance is sometimes a good choice if the split in behavior is big and this behavior IS RELATED to the lifecycle of the instance. Note that you also have to use polymorphism, and this may slow down the method if it is heavily used.
And finally, inheritance induces changes in all your factory code... And what will you do if you have several methods to change in an exclusive fashion? You will clutter your code with specific classes...
In fact, I think this is generally not a very good idea.
Method split: another solution is sometimes to split the method into several private methods and provide two or more public methods.
Context object: the lack of named parameters in C++ and C can be worked around by adding a context parameter. I use this pattern very often, especially when I have to pass a lot of data across the levels of a complex framework.
class Context {
public:
    // usually not a good idea to add public data members, but in my opinion this is an exception
    bool setup:1;
    bool foo:1;
    bool bar:1;
    ...
    Context() : setup(0), foo(0), bar(0) ... {}
};
...
Context ctx;
ctx.setup = true; ...
MyObj.foo(ctx);
Note:
This is also useful to minimize access to (or use of) static data, queries to singleton objects, TLS...
A context object can also contain a lot more cached data related to an algorithm.
...
I'll let your imagination run free...
Anti-patterns
I add here several anti-patterns (to prevent some changes of signature):
NEVER DO THIS: use a static int/bool for argument passing (some people do that, and it is a nightmare to remove this kind of stuff; it breaks multithreading, at the very least...).
NEVER DO THIS: add a data member just to pass a parameter to a method.
Unfortunately, I don't think there is a clear answer to the problem (and it's one I encounter quite frequently in my own code). With the boolean:
foo( x, true );
the call is hard to understand.
With an enum:
foo( x, UseHigherAccuracy );
it is easy to understand but you tend to end up with code like this:
foo( x, something == someval ? UseHigherAccuracy : UseLowerAccuracy );
which is hardly an improvement. And with multiple functions:
if ( something == someval ) {
AccurateFoo( x );
}
else {
InaccurateFoo( x );
}
you end up with a lot more code. But I guess this is the easiest to read, and what I'd tend to use, but I still don't completely like it :-(
One thing I definitely would NOT do however, is subclass. Inheritance should be the last tool you ever reach for.
The primary question is whether the flag affects the behaviour of the class, or of that one function. Function-local changes should be parameters, not subclasses. Run-time inheritance should be one of the last tools reached for.
The general guideline I use is: if aSomeCondition changes the nature of the function in a major way, then I consider subclassing.
Subclassing is a relatively large effort compared to adding a flag that has only a minor effect.
Some examples:
if it's a flag that changes the direction in which a sorted collection is returned to the caller, that's a minor change in nature (flag).
if it's a one-shot flag (something that affects the current call rather than a persistent change to the object), it should probably not be a subclass (since going down that track is likely to lead to a massive number of classes).
if it's an enumeration that changes the underlying data structure of your class from array to linked list or balanced tree, that's a complex change (subclass).
Of course, that last one may be better handled by totally hiding the underlying data structure but I'm assuming here that you want to be able to select one of many, for reasons such as performance.
IMHO, the aSomeCondition flag changes or depends on the state of the current instance; therefore, under certain conditions this class should change its state and handle the mentioned operation differently. In this case, I can suggest the use of the State pattern. Hope it helps.
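For illustration only, a bare-bones sketch of that idea (all class and method names here are invented):
#include <cstdint>
#include <memory>

class FooState {
public:
    virtual ~FooState() = default;
    virtual void foo(std::uint32_t aBar) = 0;
};

class NormalState : public FooState {
public:
    void foo(std::uint32_t aBar) override { (void)aBar; /* Do something with aBar... */ }
};

class SpecialState : public FooState {
public:
    void foo(std::uint32_t aBar) override { (void)aBar; /* Do the conditional variant... */ }
};

class MyClass {
public:
    MyClass() : state_(std::make_unique<NormalState>()) {}
    void enterSpecialMode() { state_ = std::make_unique<SpecialState>(); }
    void foo(std::uint32_t aBar) { state_->foo(aBar); } // behavior follows the current state
private:
    std::unique_ptr<FooState> state_;
};

int main() {
    MyClass obj;
    obj.foo(1);             // handled by NormalState
    obj.enterSpecialMode();
    obj.foo(2);             // handled by SpecialState
}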
I would just change code:
void MyClass::foo(uint32_t aBar, bool aSomeCondition)
{
if (aSomeCondition)
{
// Do something with aBar...
}
}
to:
void MyClass::foo(uint32_t aBar)
{
if (this->aSomeCondition)
{
// Do something with aBar...
}
}
I always omit bools as function parameters and prefer to put them into the struct, even if I would have to call
myClass->enableCondition();

Why bother with virtual functions in C++?

This is not a question about how they work or are declared; that, I think, is pretty clear to me. The question is about why one would use them.
I suppose the practical reason is to simplify a bunch of other code by letting it declare its variables as the base type, so it can handle objects and their specific methods from many other subclasses?
Could this be done by templating and type checking, like I do it in Objective-C? If so, which is more efficient? I find it confusing to declare an object as one class and instantiate it as another, even if it is its child.
Sorry for the stupid questions, but I haven't done any real projects in C++ yet, and since I am an active Objective-C developer (it is a much smaller language, thus relying heavily on the SDK's functionality, like OS X and iOS), I need to have a clear view of the parallel ways of the two cousins.
Yes, this can be done with templates, but then the caller must know what the actual type of the object is (the concrete class) and this increases coupling.
With virtual functions the caller doesn't need to know the actual class - it operates through a pointer to a base class, so you can compile the client once and the implementor can change the actual implementation as much as it wants and the client doesn't have to know about that as long as the interface is unchanged.
Virtual functions implement polymorphism. I don't know Obj-C, so I cannot compare both, but the motivating use case is that you can use derived objects in place of base objects and the code will work. If you have a compiled and working function foo that operates on a reference to base you need not modify it to have it work with an instance of derived.
You could do that (assuming that you had runtime type information) by obtaining the real type of the argument and then dispatching directly to the appropriate function with a switch of sorts, but that would require either manually modifying the switch for each new type (high maintenance cost) or having reflection (unavailable in C++) to obtain the method pointer. Even then, after obtaining a method pointer you would have to call it, which is as expensive as the virtual call.
As to the cost associated with a virtual call: basically (in all implementations with a virtual method table) a call to a virtual function foo applied on object o, o.foo(), is translated to o.vptr[ 3 ](), where 3 is the position of foo in the virtual table, and that is a compile-time constant. This basically is a double indirection:
From the object o obtain the pointer to the vtable, index that table to obtain the pointer to the function, and then call it. The extra cost compared with a direct non-polymorphic call is just the table lookup. (In fact there can be other hidden costs when using multiple inheritance, as the implicit this pointer might have to be shifted.) But the cost of the virtual dispatch is very small.
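Roughly speaking (this is a hand-written approximation, not what any particular compiler literally emits), that mechanism boils down to something like:
#include <cstdio>

// A hand-rolled approximation of what the compiler generates for virtual dispatch.
struct Base;
typedef void (*FooFn)(Base*);

struct VTable { FooFn foo; };            // one slot per virtual function

struct Base { const VTable* vptr; };     // every object carries a pointer to its class's table

void derived_foo(Base*) { std::puts("Derived::foo"); }

const VTable derived_vtable = { &derived_foo };

struct Derived : Base {
    Derived() { vptr = &derived_vtable; } // the constructor installs the table
};

int main() {
    Derived d;
    Base* o = &d;
    o->vptr->foo(o);   // "o.vptr[slot]()": one table load, one indirect call
}
That single indirect call through a table is what you would otherwise be re-implementing by hand with a switch.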
I don't know the first thing about Objective-C, but here's why you want to "declare an object as one class and instantiate it as another": the Liskov Substitution Principle.
Since a PDF is a document, and an OpenOffice.org document is a document, and a Word Document is a document, it's quite natural to write
Document *d;
if (ends_with(filename, ".pdf"))
d = new PdfDocument(filename);
else if (ends_with(filename, ".doc"))
d = new WordDocument(filename);
else
{ /* ... you get the point */ }
d->print();
Now, for this to work, print would have to be virtual, or be implemented using virtual functions, or be implemented using a crude hack that reinvents the virtual wheel. The program needs to know at runtime which of the various print methods to apply.
Templating solves a different problem, where you determine at compile time which of the various containers you're going to use (for example) when you want to store a bunch of elements. If you operate on those containers with template functions, then you don't need to rewrite them when you switch containers, or add another container to your program.
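For a toy illustration of that compile-time flavor (the containers and element types are picked arbitrarily):
#include <iostream>
#include <list>
#include <vector>

// One generic function; the container type is fixed at compile time per call.
template <class Container>
void print_all(const Container& c) {
    for (const auto& elem : c)
        std::cout << elem << ' ';
    std::cout << '\n';
}

int main() {
    std::vector<int> v = { 1, 2, 3 };
    std::list<double> l = { 1.5, 2.5 };
    print_all(v);   // instantiated for std::vector<int>
    print_all(l);   // instantiated for std::list<double>
}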
A virtual function is important in inheritance. Think of an example where you have a CMonster class and then a CRaidBoss and CBoss class that inherit from CMonster.
Both need to be drawn. A CMonster has a Draw() function, but the way a CRaidBoss and a CBoss are drawn is different. Thus, the implementation is left to them by utilizing the virtual function Draw.
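A minimal sketch of that setup (the drawing details are invented):
#include <iostream>
#include <memory>
#include <vector>

class CMonster {
public:
    virtual ~CMonster() = default;
    virtual void Draw() const { std::cout << "generic monster\n"; }
};

class CBoss : public CMonster {
public:
    void Draw() const override { std::cout << "boss with health bar\n"; }
};

class CRaidBoss : public CMonster {
public:
    void Draw() const override { std::cout << "raid boss with enrage timer\n"; }
};

int main() {
    std::vector<std::unique_ptr<CMonster>> monsters;
    monsters.push_back(std::make_unique<CBoss>());
    monsters.push_back(std::make_unique<CRaidBoss>());
    for (const auto& m : monsters)
        m->Draw(); // each object picks its own Draw at run time
}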
Well, the idea is simply to allow the compiler to perform checks for you.
It's like a lot of features: ways to hide what you don't want to have to do yourself. That's abstraction.
Inheritance, interfaces, etc. allow you to provide an interface to the compiler for the implementation code to match.
If you didn't have the virtual function mechanism, you would have to write:
class A
{
public:
    virtual ~A() {} // a polymorphic base is needed for dynamic_cast to work
    void do_something();
};
class B : public A
{
public:
    void do_something(); // this one "hides" A::do_something(), it replaces it
};
void DoSomething( A* object )
{
    // calling object->do_something() will ALWAYS call A::do_something()
    // that's not what you want if object is a B...
    // so we have to check manually:
    B* b_object = dynamic_cast<B*>( object );
    if( b_object != NULL ) // ok, it's a B object, call B::do_something()
    {
        b_object->do_something();
    }
    else
    {
        object->do_something(); // that's an A, call A::do_something()
    }
}
Here there are several problems:
you have to write this for each function redefined in a class hierarchy.
you have one additional if for each child class.
you have to touch this function again each time you add a definition to the whole hierarchy.
it's visible code; you can get it wrong easily, each time.
So, marking functions virtual does this correctly in an implicit way, rerouting the function call automatically, in a dynamic way, to the correct implementation, depending on the final type of the object.
You don't have to write any logic, so you can't get errors in this code, and you have one less thing to worry about.
It's the kind of thing you don't want to bother with, as it can be done by the compiler/runtime.
The use of templates is also technically known to theorists as polymorphism (static polymorphism). Yep, both are valid approaches to the problem. The implementation techniques employed explain the better or worse performance of each.
For example, Java implements generics through type erasure. This means that it only appears to be using templates; under the surface it is plain old polymorphism.
C++ has very powerful templates. The use of templates makes code quicker, though each use of a template instantiates it for the given type. This means that if you use a std::vector for ints, doubles and strings, you'll have three different vector classes, so the size of the executable will suffer.

C++ design pattern to get rid of if-then-else

I have the following piece of code:
if (book.type == A) do_something();
else if (book.type == B) do_something_else();
....
else do_some_default_thing();
This code will need to be modified whenever there is a new book type
or when a book type is removed. I know that I can use enums and use a switch
statement. Is there a design pattern that removes this if-then-else?
What are the advantages of such a pattern over using a switch statement?
You could make a different class for each type of book. Each class could implement the same interface, and overload a method to perform the necessary class-specific logic.
I'm not saying that's necessarily better, but it is an option.
As others have pointed out, a virtual function should probably be your first choice.
If, for some reason, that doesn't make sense/work well for your design, another possibility would be to use an std::map using book.type as a key and a pointer to function (or functor, etc.) as the associated value, so you just lookup the action to take for a particular type (which is pretty much how many OO languages implement their equivalent of virtual functions, under the hood).
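A rough sketch of that lookup-table idea (the book types and handler functions here are invented):
#include <functional>
#include <iostream>
#include <map>

enum class BookType { A, B };

void do_something()      { std::cout << "handling type A\n"; }
void do_something_else() { std::cout << "handling type B\n"; }
void do_default()        { std::cout << "default handling\n"; }

int main() {
    // Dispatch table: book type -> action.
    const std::map<BookType, std::function<void()>> actions = {
        { BookType::A, do_something },
        { BookType::B, do_something_else },
    };

    BookType type = BookType::B;
    auto it = actions.find(type);
    if (it != actions.end())
        it->second();    // found a handler for this type
    else
        do_default();    // fall back to the default behavior
}
Adding a book type then means adding one map entry rather than editing every call site.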
Each different type of book is a different sub-class of the parent class, and each class implements a method do_some_action() with the same interface. You invoke the method when you want the action to take place.
Yes, it's called looping:
struct BookType {
    char type;
    void (*action)(Book&);   // "do" is a reserved word in C++, so the member is named action here
};
BookType types[] = {{A, do_something}, {B, do_something_else}, ...};
for (int i = 0; i < types_length; i++) {
    if (book.type == types[i].type) types[i].action(book);
}
For a better approach though, it's even more preferable if do_something, do_something_else, etc. are methods of Book, so:
struct Book {
    virtual void action() = 0;   // again, "do" is reserved, so the method is called action()
};
struct A : Book {
    void action() {
        // ... do_something
    }
};
struct B : Book {
    void action() {
        // ... do_something_else
    }
};
so you only need to do:
book.action();
Those if-then-else-if constructs are one of my most acute pet peeves. I find it difficult to conjure up a less imaginative design choice. But enough of that. On to what can be done about it.
I've used several design approaches depending on the exact nature of the action to be taken.
If the number of possibilities is small and future expansion is unlikely I may just use a switch statement. But I'm sure you didn't come all the way to SOF to hear something that boring.
If the action is the assignment of a value then a table-driven approach allows future growth without actually making code changes. Simply add and remove table entries.
If the action involves complex method invocations then I tend to use the Chain of Responsibility design pattern. I'll build a list of objects that each knows how to handle the actions for a particular case.
You hand the item to be processed to the first handler object. If it knows what to do with the item it performs the action. If it doesn't, it passes the item off to the next handler in the list. This continues until the item is processed or it falls into the default handler that cleans up or prints an error or whatever. Maintenance is simple -- you add or remove handler objects from the list.
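A condensed sketch of such a handler chain (the handlers and the item type are invented for illustration):
#include <iostream>
#include <string>

class Handler {
public:
    virtual ~Handler() = default;
    void setNext(Handler* next) { next_ = next; }
    void handle(const std::string& item) {
        if (canHandle(item))
            process(item);
        else if (next_)
            next_->handle(item);                       // pass the item down the chain
        else
            std::cout << "no handler for " << item << '\n'; // default handler behavior
    }
protected:
    virtual bool canHandle(const std::string& item) const = 0;
    virtual void process(const std::string& item) = 0;
private:
    Handler* next_ = nullptr;
};

class NovelHandler : public Handler {
protected:
    bool canHandle(const std::string& item) const override { return item == "novel"; }
    void process(const std::string&) override { std::cout << "processing a novel\n"; }
};

class TextbookHandler : public Handler {
protected:
    bool canHandle(const std::string& item) const override { return item == "textbook"; }
    void process(const std::string&) override { std::cout << "processing a textbook\n"; }
};

int main() {
    NovelHandler novels;
    TextbookHandler textbooks;
    novels.setNext(&textbooks);   // chain: novels -> textbooks -> (default message)
    novels.handle("textbook");    // handled by the second link
    novels.handle("comic");       // falls off the end of the chain
}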
You could define a subclass for each book type, and define a virtual function do_something. Each subclass A, B, etc would have its own version of do_something that it calls into, and do_some_default_thing then just becomes the do_something method in the base class.
Anyway, just one possible approach. You would have to evaluate whether it really makes things easier for you...
The Strategy design pattern is what I think you need.
As an alternative to having a different class for each book, consider having a map from book types to function pointers. Then your code would look like this (sorry for pseudocode, C++ isn't at the tip of my fingers these days):
if book.type in booktypemap:
booktypemap[book.type]();
else
defaultfunc();

Are type fields pure evil?

As discussed in The C++ Programming Language, 3rd Edition, section 12.2.5, type fields tend to create code that is less versatile, error-prone, less intuitive, and less maintainable than the equivalent code that uses virtual functions and polymorphism.
As a short example, here is how a type field would be used:
void print(const Shape &s)
{
    switch(s.type)
    {
    case Shape::TRIANGLE:
        cout << "Triangle" << endl;
        break;
    case Shape::SQUARE:
        cout << "Square" << endl;
        break;
    default:
        cout << "None" << endl;
    }
}
Clearly, this is a nightmare as adding a new type of shape to this and a dozen similar functions would be error-prone and taxing.
Despite these shortcomings and those described in TC++PL, are there any examples where such an implementation (using a type field) is a better solution than utilizing the language feature of virtual functions? Or should this practice be blacklisted as pure evil?
Realistic examples would be preferred over contrived ones, but I'd still be interested in contrived examples. Also, have you ever seen this in production code (even though virtual functions would have been easier)?
When you "know" you have a very specific, small, constant set of types, it can be easier to hardcode them like this. Of course, constants aren't and variables don't, so at some point you might have to rewrite the whole thing anyway.
This is, more or less, the technique used for discriminated unions in several of Alexandrescu's articles.
For example, if I was implementing a JSON library, I'd know each Value can only be an Object, Array, String, Integer, Boolean, or Null—the spec doesn't allow any others.
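A stripped-down sketch of such a discriminated union (nowhere near a full JSON value; the string and aggregate cases are omitted to keep the union trivial):
#include <iostream>

// A (very) reduced tagged union: the enum says which member of the union is live.
struct Value {
    enum Type { Null, Boolean, Integer } type = Null;
    union {
        bool b;
        long long i;
    };
};

void print(const Value& v) {
    switch (v.type) {
    case Value::Null:    std::cout << "null\n"; break;
    case Value::Boolean: std::cout << (v.b ? "true" : "false") << '\n'; break;
    case Value::Integer: std::cout << v.i << '\n'; break;
    }
}

int main() {
    Value v;
    v.type = Value::Integer;
    v.i = 42;
    print(v);   // prints 42
}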
A type enum can be serialized via memcpy, a v-table can't. A similar feature is that corruption of a type enum value is easy to handle, corruption of the v-table pointer means instant badness. There's no portable way to even test a v-table pointer for validity, calling dynamic_cast or typeinfo to do RTTI checks on an invalid object is undefined behavior.
For example, one instance where I choose to use a type hierarchy with static dispatch controlled by a discriminator and not dynamic dispatch is when passing a pointer to a structure through a Windows message queue. This gives me some protection against other software that may have allocated broadcast messages from a range I'm using (it's supposed to be reserved for app-local messages, do not pass GO if you think that rule is actually respected).
The following guideline is from Clean Code by Robert C. Martin.
"My general rule for switch statements is that they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them".
The rationale is this: if you expose type fields to the rest of your code, you'll get multiple instances of the above switch statement. That is a clear violation of DRY. When you add a type, all these switches need to change (or, even worse, they become inconsistent without breaking your build).
My take is: It depends.
A parameterized Factory Method design pattern relies on this technique.
class Creator {
public:
virtual Product* Create(ProductId);
};
Product* Creator::Create (ProductId id) {
if (id == MINE) return new MyProduct;
if (id == YOURS) return new YourProduct;
// repeat for remaining products...
return 0;
}
So, is this bad? I don't think so, as we do not have any other alternative at this stage. This is a place where it is absolutely necessary, as it involves the creation of an object. The type of the object is yet to be known.
The example in the OP, however, surely needs refactoring. Here we are already dealing with an existing object/type (passed as an argument to the function).
As Herb Sutter mentions:
"Switch off: Avoid switching on the type of an object to customize behavior. Use templates and virtual functions to let types (not their calling code) decide their behavior."
Aren't there costs associated with virtual functions and polymorphism? Like maintaining a vtable per class, an increase in each object's size by one pointer (4 or 8 bytes), and runtime slowness (I have never measured it, though) for resolving the virtual function appropriately. So, for simple situations, using a type field appears acceptable.
I think that if the type field corresponds precisely to the implied classes, then a type field is wrong. Where it gets complicated is where the type does not quite match, or it's not so cut and dried.
Taking your example, what if the type was Red, Green, Blue? Those are types of shapes too. You could even have a color class as a mixin, but it's probably too much.
I am thinking of using a type field to solve the problem of vector slicing. That is, I want a vector of hierarchical objects. For example, I want my vector to be a vector of shapes, but I want to store circles, rectangles, triangles, etc.
You can't do that in the most obvious simple way because of slicing. So the normal solution is to have a vector of pointers or smart pointers instead. But I think there are cases where using a type field will be a simpler solution (it avoids new/delete or alternative lifecycle techniques).
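For context, the usual pointer-based workaround looks roughly like this (the shape classes are invented for the sketch):
#include <iostream>
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265 * r * r; }
};
struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};

int main() {
    // std::vector<Shape> would slice the derived parts away;
    // a vector of (smart) pointers keeps the full objects alive.
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>(1.0));
    shapes.push_back(std::make_unique<Square>(2.0));
    for (const auto& s : shapes)
        std::cout << s->area() << '\n';
}
The type-field alternative would store plain structs by value and switch on the tag instead, avoiding the heap allocations.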
The best example I can think of (and the one I've run into before), is when your set of types is fixed and the set of functions you want to do (that depend on those types) is fluid. That way, when you add a new function, you modify a single place (adding a single switch) rather than adding a new base virtual function with the real implementation scattered all across the classes in your type hierarchy.
I don't know of any realistic examples. Contrived ones would depend on there being some good reason that virtual methods cannot be used.

Do polymorphism or conditionals promote better design?

I recently stumbled across this entry in the google testing blog about guidelines for writing more testable code. I was in agreement with the author until this point:
Favor polymorphism over conditionals: If you see a switch statement you should think polymorphisms. If you see the same if condition repeated in many places in your class you should again think polymorphism. Polymorphism will break your complex class into several smaller simpler classes which clearly define which pieces of the code are related and execute together. This helps testing since simpler/smaller class is easier to test.
I simply cannot wrap my head around that. I can understand using polymorphism instead of RTTI (or DIY-RTTI, as the case may be), but that seems like such a broad statement that I can't imagine it actually being used effectively in production code. It seems to me, rather, that it would be easier to add additional test cases for methods which have switch statements, rather than breaking down the code into dozens of separate classes.
Also, I was under the impression that polymorphism can lead to all sorts of other subtle bugs and design issues, so I'm curious to know if the tradeoff here would be worth it. Can someone explain to me exactly what is meant by this testing guideline?
Actually this makes testing and code easier to write.
If you have one switch statement based on an internal field you probably have the same switch in multiple places doing slightly different things. This causes problems when you add a new case as you have to update all the switch statements (if you can find them).
By using polymorphism you can use virtual functions to get the same functionality, and because a new case is a new class you don't have to search your code for things that need to be checked; it is all isolated within each class.
class Animal
{
public:
Noise warningNoise();
Noise pleasureNoise();
private:
AnimalType type;
};
Noise Animal::warningNoise()
{
switch(type)
{
case Cat: return Hiss;
case Dog: return Bark;
}
}
Noise Animal::pleasureNoise()
{
switch(type)
{
case Cat: return Purr;
case Dog: return Bark;
}
}
In this simple case every new animal requires both switch statements to be updated.
You forget one? What is the default? BANG!!
Using polymorphism
class Animal
{
public:
virtual Noise warningNoise() = 0;
virtual Noise pleasureNoise() = 0;
};
class Cat: public Animal
{
// Compiler forces you to define both methods.
// Otherwise you can't have a Cat object
// All code local to the cat belongs to the cat.
};
By using polymorphism you can test the Animal class.
Then test each of the derived classes separately.
Also this allows you to ship the Animal class (Closed for alteration) as part of your binary library. But people can still add new Animals (Open for extension) by deriving new classes from the Animal header. If all this functionality had been captured inside the Animal class then all animals would need to be defined before shipping (Closed/Closed).
Do not fear...
I guess your problem lies with familiarity, not technology. Familiarize yourself with C++ OOP.
C++ is an OOP language
Among its multiple paradigms, it has OOP features and is more than able to bear comparison with most pure OO languages.
Don't let the "C part inside C++" make you believe C++ can't deal with other paradigms. C++ can handle a lot of programming paradigms quite graciously. And among them, OOP C++ is the most mature of C++ paradigms after procedural paradigm (i.e. the aforementioned "C part").
Polymorphism is Ok for production
There is no "subtle bugs" or "not suitable for production code" thing. There are developers who remain set in their ways, and developers who'll learn how to use tools and use the best tools for each task.
switch and polymorphism are [almost] similar...
... But polymorphism removed most errors.
The difference is that you must handle the switches manually, whereas polymorphism is more natural once you get used to inheritance and method overriding.
With switches, you'll have to compare a type variable with different types, and handle the differences. With polymorphism, the variable itself knows how to behave. You only have to organize the variables in logical ways, and override the right methods.
But in the end, if you forget to handle a case in switch, the compiler won't tell you, whereas you'll be told if you derive from a class without overriding its pure virtual methods. Thus most switch-errors are avoided.
All in all, the two features are about making choices. But polymorphism enables you to make more complex and, at the same time, more natural and thus easier choices.
Avoid using RTTI to find an object's type
RTTI is an interesting concept, and can be useful. But most of the time (i.e. 95% of the time), method overriding and inheritance will be more than enough, and most of your code should not even know the exact type of the object handled, but trust it to do the right thing.
If you use RTTI as a glorified switch, you're missing the point.
(Disclaimer: I am a great fan of the RTTI concept and of dynamic_casts. But one must use the right tool for the task at hand, and most of the time RTTI is used as a glorified switch, which is wrong)
Compare dynamic vs. static polymorphism
If your code does not know the exact type of an object at compile time, then use dynamic polymorphism (i.e. classic inheritance, virtual methods overriding, etc.)
If your code knows the type at compile time, then perhaps you could use static polymorphism, i.e. the CRTP pattern http://en.wikipedia.org/wiki/Curiously_Recurring_Template_Pattern
The CRTP will enable you to have code that smells like dynamic polymorphism, but whose every method call will be resolved statically, which is ideal for some very critical code.
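A minimal CRTP sketch (the types are invented purely for illustration):
#include <iostream>

// The base is templated on the derived class, so the "virtual-looking"
// call can be resolved at compile time.
template <class Derived>
class Shape {
public:
    double area() const {
        return static_cast<const Derived*>(this)->areaImpl(); // static dispatch
    }
};

class Square : public Shape<Square> {
public:
    explicit Square(double s) : s_(s) {}
    double areaImpl() const { return s_ * s_; }
private:
    double s_;
};

class Circle : public Shape<Circle> {
public:
    explicit Circle(double r) : r_(r) {}
    double areaImpl() const { return 3.14159265 * r_ * r_; }
private:
    double r_;
};

// Generic code still works across shapes, but the exact type is known at compile time.
template <class T>
void report(const Shape<T>& s) { std::cout << s.area() << '\n'; }

int main() {
    report(Square(2.0));
    report(Circle(1.0));
}
Here report() is still generic, but every area() call is resolved statically, so the compiler is free to inline it.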
Production code example
A code similar to this one (from memory) is used in production.
The easier solution revolved around the procedure called by the message loop (a WinProc in Win32, but I wrote a simpler version for simplicity's sake). To summarize, it was something like:
void MyProcedure(int p_iCommand, void *p_vParam)
{
// A LOT OF CODE ???
// each case has a lot of code, with both similarities
// and differences, and of course, casting p_vParam
// into something depending on the command, hoping no one
// made a mistake associating the wrong command with
// the wrong data type in p_vParam
switch(p_iCommand)
{
case COMMAND_AAA: { /* A LOT OF CODE (see above) */ } break ;
case COMMAND_BBB: { /* A LOT OF CODE (see above) */ } break ;
// etc.
case COMMAND_XXX: { /* A LOT OF CODE (see above) */ } break ;
case COMMAND_ZZZ: { /* A LOT OF CODE (see above) */ } break ;
default: { /* call default procedure */} break ;
}
}
Each added command added a case.
The problem is that some commands were similar, and partly shared their implementation.
So mixing the cases was a risk for evolution.
I resolved the problem by using the Command pattern, that is, creating a base Command object, with one process() method.
So I re-wrote the message procedure, minimizing the dangerous code (i.e. playing with void *, etc.) to a minimum, and wrote it to be sure I would never need to touch it again:
void MyProcedure(int p_iCommand, void *p_vParam)
{
switch(p_iCommand)
{
// Only one case. Isn't it cool?
case COMMAND:
{
Command * c = static_cast<Command *>(p_vParam) ;
c->process() ;
}
break ;
default: { /* call default procedure */} break ;
}
}
And then, for each possible command, instead of adding code in the procedure, and mixing (or worse, copy/pasting) the code from similar commands, I created a new command, and derived it either from the Command object, or one of its derived objects:
This led to the hierarchy (represented as a tree):
[+] Command
|
+--[+] CommandServer
| |
| +--[+] CommandServerInitialize
| |
| +--[+] CommandServerInsert
| |
| +--[+] CommandServerUpdate
| |
| +--[+] CommandServerDelete
|
+--[+] CommandAction
| |
| +--[+] CommandActionStart
| |
| +--[+] CommandActionPause
| |
| +--[+] CommandActionEnd
|
+--[+] CommandMessage
Now, all I needed to do was to override process for each object.
Simple, and easy to extend.
For example, say the CommandAction was supposed to do its process in three phases: "before", "while" and "after". Its code would be something like:
class CommandAction : public Command
{
// etc.
virtual void process() // overriding Command::process pure virtual method
{
this->processBefore() ;
this->processWhile() ;
this->processAfter() ;
}
virtual void processBefore() = 0 ; // To be overridden
virtual void processWhile()
{
// Do something common for all CommandAction objects
}
virtual void processAfter() = 0 ; // To be overridden
} ;
And, for example, CommandActionStart could be coded as:
class CommandActionStart : public CommandAction
{
// etc.
virtual void processBefore()
{
// Do something common for all CommandActionStart objects
}
virtual void processAfter()
{
// Do something common for all CommandActionStart objects
}
} ;
As I said: Easy to understand (if commented properly), and very easy to extend.
The switch is reduced to its bare minimum (i.e. if-like, because we still needed to delegate Windows commands to Windows default procedure), and no need for RTTI (or worse, in-house RTTI).
The same code inside a switch would be quite amusing, I guess (if only judging by the amount of "historical" code I saw in our app at work).
Unit testing an OO program means testing each class as a unit. A principle that you want to learn is "Open to extension, closed to modification". I got that from Head First Design Patterns. But it basically says that you want to have the ability to easily extend your code without modifying existing tested code.
Polymorphism makes this possible by eliminating those conditional statements. Consider this example:
Suppose you have a Character object that carries a Weapon. You can write an attack method like this:
If (weapon is a rifle) then
    // code to attack with rifle
else if (weapon is a plasma gun)
    // code to attack with plasma gun
etc.
With polymorphism the Character does not have to "know" the type of weapon, simply
weapon.attack()
would work. What happens if a new weapon was invented? Without polymorphism you will have to modify your conditional statement. With polymorphism you will have to add a new class and leave the tested Character class alone.
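Sketched in C++ (all the classes here are hypothetical):
#include <iostream>
#include <memory>

class Weapon {
public:
    virtual ~Weapon() = default;
    virtual void attack() const = 0;
};

class Rifle : public Weapon {
public:
    void attack() const override { std::cout << "firing rifle\n"; }
};

class PlasmaGun : public Weapon {
public:
    void attack() const override { std::cout << "firing plasma gun\n"; }
};

class Character {
public:
    explicit Character(std::unique_ptr<Weapon> w) : weapon_(std::move(w)) {}
    void attack() const { weapon_->attack(); } // Character never inspects the weapon's type
private:
    std::unique_ptr<Weapon> weapon_;
};

int main() {
    Character soldier(std::make_unique<Rifle>());
    soldier.attack();
    // A newly invented weapon is just another Weapon subclass;
    // Character and its tests stay untouched.
}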
I'm a bit of a skeptic: I believe inheritance often adds more complexity than it removes.
I think you are asking a good question, though, and one thing I consider is this:
Are you splitting into multiple classes because you are dealing with different things? Or is it really the same thing, acting in a different way?
If it's really a new type, then go ahead and create a new class. But if it's just an option, I generally keep it in the same class.
I believe the default solution is the single-class one, and the onus is on the programmer proposing inheritance to prove their case.
Not an expert in the implications for test cases, but from a software development perspective:
Open-closed principle -- Classes should be closed to alteration, but open to extension. If you manage conditional operations via a conditional construct, then if a new condition is added, your class needs to change. If you use polymorphism, the base class need not change.
Don't repeat yourself -- An important part of the guideline is the "same if condition." That indicates that your class has some distinct modes of operation that can be factored into a class. Then, that condition appears in one place in your code -- when you instantiate the object for that mode. And again, if a new one comes along, you only need to change one piece of code.
Polymorphism is one of the cornerstones of OO and certainly is very useful.
By dividing concerns over multiple classes you create isolated and testable units.
So instead of doing a switch...case where you call methods on several different types or implementations, you create a unified interface having multiple implementations.
When you need to add an implementation, you do not need to modify the clients, as is the case with switch...case. Very important as this helps to avoid regression.
You can also simplify your client algorithm by dealing with just one type : the interface.
Very important to me is that polymorphism is best used with a pure interface/implementation pattern (like the venerable Shape <- Circle, etc.).
You can also have polymorphism in concrete classes with template-methods ( aka hooks ), but its effectiveness decreases as complexity increases.
Polymorphism is the foundation on which our company's codebase is built, so I consider it very practical.
Switches and polymorphism do the same thing.
In polymorphism (and in class-based programming in general) you group the functions by their type. When using switches you group the types by function. Decide which view is good for you.
So if your interface is fixed and you only add new types, polymorphism is your friend.
But if you add new functions to your interface you will need to update all implementations.
In certain cases, you may have a fixed set of types, and new functions may keep coming; then switches are better. But adding new types makes you update every switch.
With switches you are duplicating sub-type lists. With polymorphism you are duplicating operation lists. You traded one problem for a different one. This is the so-called expression problem, which is not solved by any programming paradigm I know. The root of the problem is the one-dimensional nature of the text used to represent the code.
Since pro-polymorphism points are well discussed here, let me provide a pro-switch point.
OOP has design patterns to avoid common pitfalls. Procedural programming has design patterns too (but no one has written them down yet AFAIK; we need another new Gang of N to make a bestseller book of those...). One design pattern could be: always include a default case.
Switches can be done right:
switch (type)
{
case T_FOO: doFoo(); break;
case T_BAR: doBar(); break;
default:
fprintf(stderr, "You, who are reading this, add a new case for %d to the FooBar function ASAP!\n", type);
assert(0);
}
This code will point your favorite debugger to the location where you forgot to handle a case. A compiler can force you to implement your interface, but this forces you to test your code thoroughly (at least to see the new case is noticed).
Of course, if a particular switch would be used in more than one place, it gets factored out into a function (don't repeat yourself).
If you want to extend these switches just do a grep -rn 'case[ ]*T_BAR' . (on Linux) and it will spit out the locations worth looking at. Since you need to look at the code, you will see some context, which helps you add the new case correctly. When you use polymorphism the call sites are hidden inside the system, and you depend on the correctness of the documentation, if it exists at all.
Extending switches does not break the OCP either, since you do not alter the existing cases, just add a new one.
Switches also help the next guy trying to get accustomed to and understand the code:
The possible cases are before your eyes. That's a good thing when reading code (less jumping around).
But virtual method calls are just like normal method calls. One can never know if a call is virtual or normal (without looking up the class). That's bad.
But if the call is virtual, possible cases are not obvious (without finding all derived classes). That's also bad.
When you provide an interface to a third party, so they can add behavior and user data to a system, then that's a different matter. (They can set callbacks and pointers to user data, and you give them handles.)
Further debate can be found here: http://c2.com/cgi/wiki?SwitchStatementsSmell
I'm afraid my "C-hacker's syndrome" and anti-OOPism will eventually burn all my reputation here. But whenever I needed or had to hack or bolt something into a procedural C system, I found it quite easy, the lack of constraints, forced encapsulation and less abstraction layers makes me "just do it". But in a C++/C#/Java system where tens of abstraction layers stacked on the top of each other in the software's lifetime, I need to spend many hours sometimes days to find out how to correctly work around all the constraints and limitations that other programmers built into their system to avoid others "messing with their class".
This is mainly to do with encapsulation of knowledge. Let's start with a really obvious example - toString(). This is Java, but easily transfers to C++. Suppose you want to print a human friendly version of an object for debugging purposes. You could do:
switch (obj.type) {
case 1: cout << "Type 1" << obj.foo << ...; break;
case 2: cout << "Type 2" << ...
This would, however, clearly be silly. Why should one method somewhere know how to print everything? It will often be better for the object itself to know how to print itself, e.g.:
cout << object.toString();
That way the toString() can access member fields without needing casts. They can be tested independently. They can be changed easily.
You could argue however, that how an object prints shouldn't be associated with an object, it should be associated with the print method. In this case, another design pattern comes in helpful, which is the Visitor pattern, used to fake Double Dispatch. Describing it fully is too long for this answer, but you can read a good description here.
If you are using switch statements everywhere, you run into the possibility that when upgrading you miss one place that needs an update.
It works very well if you understand it.
There are also 2 flavors of polymorphism. The first is very easy to understand in java-esque:
interface A{
int foo();
}
final class B implements A{
int foo(){ print("B"); }
}
final class C implements A{
int foo(){ print("C"); }
}
B and C share a common interface. B and C in this case can't be extended, so you're always sure which foo() you're calling. Same goes for C++, just make A::foo pure virtual.
Second, and trickier is run-time polymorphism. It doesn't look too bad in pseudo-code.
class A{
int foo(){print("A");}
}
class B extends A{
int foo(){print("B");}
}
class C extends B{
int foo(){print("C");}
}
...
class Z extends Y{
int foo(){print("Z");
}
main(){
F* f = new Z();
A* a = f;
a->foo();
f->foo();
}
But it is a lot trickier. Especially if you're working in C++ where some of the foo declarations may be virtual, and some of the inheritance might be virtual. Also the answer to this:
A* a = new Z;
A a2 = *a;
a->foo();
a2.foo();
might not be what you expect.
Just keep keenly aware of what you do and don't know if you're using run-time polymorphism. Don't get overconfident, and if you're not sure what something is going to do at run-time, then test it.
I must reiterate that finding all switch statements can be a non-trivial process in a mature code base. If you miss any, then the application is likely to crash because of an unmatched case statement, unless you have a default set.
Also check out "Martin Fowlers" book on "Refactoring"
Using a switch instead of polymorphism is a code smell.
It really depends on your style of programming. While this may be correct in Java or C#, I don't agree that automatically deciding to use polymorphism is correct. You can split your code into lots of little functions and perform an array lookup with function pointers (initialized at compile time), for instance. In C++, polymorphism and classes are often overused - probably the biggest design mistake made by people coming from strong OOP languages into C++ is that everything goes into a class - this is not true. A class should only contain the minimal set of things that make it work as a whole. If a subclass or friend is necessary, so be it, but they shouldn't be the norm. Any other operations on the class should be free functions in the same namespace; ADL will allow these functions to be found without explicit qualification.
C++ is not an OOP language, don't make it one. It's as bad as programming C in C++.