Are type fields pure evil? - c++

As discusses in The c++ Programming Language 3rd Edition in section 12.2.5, type fields tend to create code that is less versatile, error-prone, less intuitive, and less maintainable than the equivalent code that uses virtual functions and polymorphism.
As a short example, here is how a type field would be used:
void print(const Shape &s)
{
switch(s.type)
{
case Shape::TRIANGE:
cout << "Triangle" << endl;
case Shape::SQUARE:
cout << "Square" << endl;
default:
cout << "None" << endl;
}
}
Clearly, this is a nightmare as adding a new type of shape to this and a dozen similar functions would be error-prone and taxing.
Despite these shortcomings and those described in TC++PL, are there any examples where such an implementation (using a type field) is a better solution than utilizing the language features of virtual functions? Or should this practice be black listed as pure evil?
Realistic examples would be preferred over contrived ones, but I'd still be interested in contrived examples. Also, have you ever seen this in production code (even though virtual functions would have been easier)?

When you "know" you have a very specific, small, constant set of types, it can be easier to hardcode them like this. Of course, constants aren't and variables don't, so at some point you might have to rewrite the whole thing anyway.
This is, more or less, the technique used for discriminated unions in several of Alexandrescu's articles.
For example, if I was implementing a JSON library, I'd know each Value can only be an Object, Array, String, Integer, Boolean, or Null—the spec doesn't allow any others.

A type enum can be serialized via memcpy, a v-table can't. A similar feature is that corruption of a type enum value is easy to handle, corruption of the v-table pointer means instant badness. There's no portable way to even test a v-table pointer for validity, calling dynamic_cast or typeinfo to do RTTI checks on an invalid object is undefined behavior.
For example, one instance where I choose to use a type hierarchy with static dispatch controlled by a discriminator and not dynamic dispatch is when passing a pointer to a structure through a Windows message queue. This gives me some protection against other software that may have allocated broadcast messages from a range I'm using (it's supposed to be reserved for app-local messages, do not pass GO if you think that rule is actually respected).

The following guideline is from Clean Code by Robert C. Martin.
"My general rule for switch statements is that they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them".
The rationale is this: if you expose type fields to the rest of your code, you'll get multiple instances of the above switch statement. That is a clear violation of DRY. When you add a type, all these switches need to change (or, even worse, they become inconsistent without breaking your build).

My take is: It depends.
A parameterized Factory Method design pattern relies on this technique.
class Creator {
public:
virtual Product* Create(ProductId);
};
Product* Creator::Create (ProductId id) {
if (id == MINE) return new MyProduct;
if (id == YOURS) return new YourProduct;
// repeat for remaining products...
return 0;
}
So, is this bad. I don't think so as we do not have any other alternative at this stage. This is a place where it is absolutely necessary as it involves creation of an object. The type of the object is yet to be known.
The example in OP is however an example which sure needs refactoring. Here we are already dealing with an existing object/type (passed as argument to function).
As Herb Sutter mentions -
"Switch off: Avoid switching on the
type of an object to customize
behavior. Use templates and virtual
functions to let types (not their
calling code) decide their behavior."

Aren't there costs associated to virtual functions and polymorphism? Like maintaining a vtable per class, increase of each class object size by 4 byptes, runtime slowness (I have never measured it though) for resolving the virtual function appropriately. So, for simple situations, using a type field appears acceptable.

I think that if the type corresponds precisely to the implied classes then type is wrong. Where it gets complicated is where the type does not quite match or its not so cut and dried.
Taking your example what if type was Red, Green, Blue. Those are types of shapes. You could even have a color class as a mixin; but its probably too much.

I am thinking of using a type field to solve the problem of vector slicing. That is, I want a vector of hierarchical objects. For example I want my vector to be a vector of shapes, but I want to store circles, rectangles, triangles etc.
You can't do that in the most obvious simple way because of slicing. So the normal solution is to have a vector of pointers or smart pointers instead. But I think there are cases where using a type field will be a simpler solution, (avoids new/delete or alternative lifecycle techniques).

The best example I can think of (and the one I've run into before), is when your set of types is fixed and the set of functions you want to do (that depend on those types) is fluid. That way, when you add a new function, you modify a single place (adding a single switch) rather than adding a new base virtual function with the real implementation scattered all across the classes in your type hierarchy.

I don't know of any realistic examples. Contrived ones would depend on there being some good reason that virtual methods cannot be used.

Related

Text Adventure Game - How to Tell One Item Type from Another and How to Structure the Item Classes/Subclasses?

I'm a beginner programmer (who has a bunch of design-related scripting experience for video games but very little programming experience - so just basic stuff like loops, flow control, etc. - although I do have a C++ fundamentals and C++ data structures and algorithm's course under my belt). I'm working on a text-adventure personal project (I actually already wrote it in Python ages ago before I learned how classes work - everything is a dictionary - so it's shameful). I'm "remaking" it in C++ with classes to get out of the rut of having only done homework assignments.
I've written my player and room classes (which were simple since I only need one class for each). I'm onto item classes (an item being anything in a room, such as a torch, a fire, a sign, a container, etc.). I'm unsure how to approach the item base class and derived classes. Here are the problems I'm having.
How do I tell whether an item is of a certain type in a non-shit way (there's a good chance I'm overthinking this)?
For example, I set up my print room info function so that in addition to whatever else it might do, it prints the name of every object in its inventory (i.e. inside of it) and I want it to print something special for a container object (the contents of its inventory for example).
The first part's easy because every item has a name since the name attribute is part of the base item class. The container has an inventory though, which is an attribute unique to the container subclass.
It's my understanding that it's bad form to execute conditional logic based on the object's class type (because one's classes should be polymorphic) and I'm assuming (perhaps incorrectly) that it'd be weird and wrong to put a getHasInventory accessor virtual function in the item base class (my assumption here is based on thinking it'd be crazy to put virtual functions for every derived class in the base class - I have about a dozen derived classes - a couple of which are derived classes of derived classes).
If that's all correct, what's an acceptable way to do this? One obvious thing is to add an itemType attribute to the base and then do conditional logic but this strikes me as wrong since it seems to just be a re-skinning of the checking class type solution. I'm unsure whether the above-mentioned assumptions are correct and what a good solution might be.
How should I structure my base class/classes and my derived classes?
I originally wrote them such that the item class was the base class and most other classes used single inheritance (except for a couple which had multi-level).
This seemed to present some awkwardness and repeating myself though. For example, I want a sign and a letter. A sign is a Readable Item > Untakeable Item > Item. A letter is a Readable Item > Takeable Item > Item. Because they all use single inheritance I need two different Readable Items, one that's takeable and one that's not (I know I could just make takeable and untakeable into attributes of the base in this instance and I did but this works as an example because I still have similar issues with other classes).
That seems icky to me so I took another stab at it and implemented them all using multiple inheritance & virtual inheritance. In my case that seems more flexible because I can compose classes of multiple classes and create a kind of component system for my classes.
Is one of these ways better than the other? Is there some third way that's better?
One possible way to solve your problem is polymorphism. By using polymorphism you can (for example) have a single describe function which when invoked leads the item to describe itself to the player. You can do the same for use, and other common verbs.
Another way is to implement a more advanced input parser, which can recognize objects and pass on the verbs to some (polymorphic) function of the items for themselves to handle. For example each item could have a function returning a list of available verbs, together with a function returning a list of "names" for the items:
struct item
{
// Return a list of verbs this item reacts to
virtual std::vector<std::string> get_verbs() = 0;
// Return a list of name aliases for this item
virtual std::vector<std::string> get_names() = 0;
// Describe this items to the player
virtual void describe(player*) = 0;
// Perform a specific verb, input is the full input line
virtual void perform_verb(std::string verb, std::string input) = 0;
};
class base_torch : public item
{
public:
std::vector<std::string> get_verbs() override
{
return { "light", "extinguish" };
}
// Return true if the torch is lit, false otherwise
bool is_lit();
void perform_verb(std::string verb, std::string) override
{
if (verb == "light")
{
// TODO: Make the torch "lit"
}
else
{
// TODO: Make the torch "extinguished"
}
}
};
class long_brown_torch : public base_torch
{
std::vector<std::string> get_names() override
{
return { "long brown torch", "long torch", "brown torch", "torch" };
}
void describe(player* p) override
{
p->write("This is a long brown torch.");
if (is_lit())
p->write("The torch is burning.");
}
};
Then if the player input e.g. light brown torch the parser looks through all available items (the ones in the players inventory followed by the items in the room), get each items name-list (call the items get_names() function) and compare it to the brown torch. If a match is found the parser calls the items perform_verb function passing the appropriate arguments (item->perform_verb("light", "light brown torch")).
You can even modify the parser (and the items) to handle adjectives separately, or even articles like the, or save the last used item so it can be referenced by using it.
Constructing the different rooms and items is tedious but still trivial once a good design has been made (and you really should spend some time creating requirement, analysis of the requirements, and creating a design). The really hard part is writing a decent parser.
Note that this is only two possible ways to handle items and verbs in such a game. There are many other ways, to many to list them all.
You are asking some excellent questions reg. how to design, structure and implement the program, as well as how to model the problem domain.
OOP, 'methods' and approaches
The questions you ask indicate that you have learned about OOP (object-oriented programming). In a lot of introductory material on OOP, it is common to encourage modelling the problem domain directly through objects and subtyping and implementing functionality by adding methods to them. A classical example is modelling animals, with for instance an Animal type and two sub-types, Duck and Cat, and implementing functionality, for instance walk, quack and mew.
Modelling the problem domain directly with objects and subtyping can make sense, but it can also very much be overkill and bothersome compared to simply having a single or a few types with different fields describing what it is. In your case, I do believe a more complex modelling like you have with objects and subtypes or alternative approaches can make sense, since among other aspects you have functionality that varies depending on the type as well as somewhat complex data (like a container with an inventory). But it is something to keep in mind - there are different trade-offs, and sometimes, having a single type with multiple different fields for modelling the domain can make more sense overall.
Implementing the desired functionality through methods on a base class and subtypes likewise have different trade-offs, and it is not always a good approach for the given case. For one of your questions, you could do something like adding a print method or similar to the base type and each subtype, but this is not always that nice in practice (a simple example is that of a calculator application where simplifying the arithmetic expression the user enters (like (3*x)*4/2) might be bothersome to implement if one uses the approach of adding methods to the base class).
Alternative approach - Tagged unions/sum types
There is a very nice fundamental abstraction known as "tagged union" (it is also known by the names "disjoint union" and "sum type"). The main idea about the tagged union is that you have a union of several different sets of instances, where which set the given instance belongs to matters. They are a superset of the feature in C++ known as enum. Regrettably, C++ does not currently support tagged unions, though there are research into it (for instance https://www.stroustrup.com/OpenPatternMatching.pdf , though this may be somewhat beyond you if you are a beginner programmer). As far as I can see, this fits very well with the example you have given here. An example in Scala would be (many other languages support tagged unions as well, such as Rust, Kotlin, Typescript, the ML-languages, Haskell, etc.):
sealed trait Item {
val name: String
}
case class Book(val name: String) extends Item
case object Fire extends Item {
val name = "Fire"
}
case class Container(val name: String, val inventory: List[Item]) extends Item
This describes your different kinds of items very well as far as I can see. Do note that Scala is a bit special in this regard, since it implements tagged unions through subtyping.
If you then wanted to implement some print functionality, you could then use "pattern matching" to match which item you have and do functionality specific to that item. In languages that support pattern matching, this is convenient and non-fragile, since the pattern matching checks that you have covered each possible case (similar to switch in C++ over enums checking that you have covered each possible case). For instance in Scala:
def getDescription(item: Item): String = {
item match {
case Book(_) | Fire => item.name
case Container(name, inventory) =>
name + " contains: (" +
inventory
.map(getDescription(_))
.mkString(", ") +
")"
}
}
val description = getDescription(
Container("Bag", List(Book("On Spelunking"), Fire))
)
println(description)
You can copy-paste the two snippets in here and try to run them: https://scalafiddle.io/ .
This kind of modelling works very well with what one might call "data types", where you have no or very little functionality in the classes themselves, and where the fields inside the classes basically are part of their interface ("interface" in the sense that you would like to change the implementations that uses the types if you ever add to, remove or change the fields of the types).
Conversely, I find a more conventional subtyping modelling and approach more convenient when the implementation inside of a class is not part of its interface, for instance if I have a base type that describes a collision system interface, and each of its subtypes have different performance characteristics, handy for different situations. Hiding and protecting the implementation since it is not part of the interface makes a lot of sense and fits very well with what one might call "mini-modules".
In C++ (and C), sometimes people do use tagged unions despite the lack of language support, in various ways. One way that I have seen being used in C is to make a C union (though do be careful reg. aspects such as memory and semantics) where an enum tag was used to differentiate between the different cases. This is error-prone, since you might easily end up accessing a field in one enum case that is not valid for that enum case.
You could also model your command input as a tagged union. That said, parsing can be somewhat challenging, and parsing libraries may be a bit involved if you are a beginner programmer; keeping the parsing somewhat simple might be a good idea.
Side-notes
C++ is a special languages - I do not quite like it for cases where I do not care much about resource usage or runtime performance and the like for multiple different reasons, since it can be annoying and not that flexible to develop in. And it can be challenging to develop in, because you must always take great care to avoid undefined behaviour. That said, if resource usage or runtime performance do matter, C++ can, depending on case, be a very good option. There are also a number of very useful and important insights in the C++ language and its community, such as RAII, ownership and lifetimes. My recommendation is that learning C++ is a good idea, but that you should also learn other languages, maybe for instance a statically-typed functional programming language. FP (functional programming) and languages supporting FP, has a number of advantages and drawbacks, but some of their advantages are very, very nice, especially reg. immutability as well as side-effects.
Of these languages, Rust may be the closest to C++ in certain regards, though I don't have experience with Rust and cannot therefore vouch for either the language or its community.
As a side-note, you may be interested in this Wikipedia-page: https://en.wikipedia.org/wiki/Expression_problem .

Why should one not derive from c++ std string class?

I wanted to ask about a specific point made in Effective C++.
It says:
A destructor should be made virtual if a class needs to act like a polymorphic class. It further adds that since std::string does not have a virtual destructor, one should never derive from it. Also std::string is not even designed to be a base class, forget polymorphic base class.
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
Is the only reason that I should not derive from std::string class is it does not have a virtual destructor? For reusability purpose a base class can be defined and multiple derived class can inherit from it. So what makes std::string not even eligible as a base class?
Also, if there is a base class purely defined for reusability purpose and there are many derived types, is there any way to prevent client from doing Base* p = new Derived() because the classes are not meant to be used polymorphically?
I think this statement reflects the confusion here (emphasis mine):
I do not understand what specifically is required in a class to be eligible for being a base clas (not a polymorphic one)?
In idiomatic C++, there are two uses for deriving from a class:
private inheritance, used for mixins and aspect oriented programming using templates.
public inheritance, used for polymorphic situations only. EDIT: Okay, I guess this could be used in a few mixin scenarios too -- such as boost::iterator_facade -- which show up when the CRTP is in use.
There is absolutely no reason to publicly derive a class in C++ if you're not trying to do something polymorphic. The language comes with free functions as a standard feature of the language, and free functions are what you should be using here.
Think of it this way -- do you really want to force clients of your code to convert to using some proprietary string class simply because you want to tack on a few methods? Because unlike in Java or C# (or most similar object oriented languages), when you derive a class in C++ most users of the base class need to know about that kind of a change. In Java/C#, classes are usually accessed through references, which are similar to C++'s pointers. Therefore, there's a level of indirection involved which decouples the clients of your class, allowing you to substitute a derived class without other clients knowing.
However, in C++, classes are value types -- unlike in most other OO languages. The easiest way to see this is what's known as the slicing problem. Basically, consider:
int StringToNumber(std::string copyMeByValue)
{
std::istringstream converter(copyMeByValue);
int result;
if (converter >> result)
{
return result;
}
throw std::logic_error("That is not a number.");
}
If you pass your own string to this method, the copy constructor for std::string will be called to make a copy, not the copy constructor for your derived object -- no matter what child class of std::string is passed. This can lead to inconsistency between your methods and anything attached to the string. The function StringToNumber cannot simply take whatever your derived object is and copy that, simply because your derived object probably has a different size than a std::string -- but this function was compiled to reserve only the space for a std::string in automatic storage. In Java and C# this is not a problem because the only thing like automatic storage involved are reference types, and the references are always the same size. Not so in C++.
Long story short -- don't use inheritance to tack on methods in C++. That's not idiomatic and results in problems with the language. Use non-friend, non-member functions where possible, followed by composition. Don't use inheritance unless you're template metaprogramming or want polymorphic behavior. For more information, see Scott Meyers' Effective C++ Item 23: Prefer non-member non-friend functions to member functions.
EDIT: Here's a more complete example showing the slicing problem. You can see it's output on codepad.org
#include <ostream>
#include <iomanip>
struct Base
{
int aMemberForASize;
Base() { std::cout << "Constructing a base." << std::endl; }
Base(const Base&) { std::cout << "Copying a base." << std::endl; }
~Base() { std::cout << "Destroying a base." << std::endl; }
};
struct Derived : public Base
{
int aMemberThatMakesMeBiggerThanBase;
Derived() { std::cout << "Constructing a derived." << std::endl; }
Derived(const Derived&) : Base() { std::cout << "Copying a derived." << std::endl; }
~Derived() { std::cout << "Destroying a derived." << std::endl; }
};
int SomeThirdPartyMethod(Base /* SomeBase */)
{
return 42;
}
int main()
{
Derived derivedObject;
{
//Scope to show the copy behavior of copying a derived.
Derived aCopy(derivedObject);
}
SomeThirdPartyMethod(derivedObject);
}
To offer the counter side to the general advice (which is sound when there are no particular verbosity/productivity issues evident)...
Scenario for reasonable use
There is at least one scenario where public derivation from bases without virtual destructors can be a good decision:
you want some of the type-safety and code-readability benefits provided by dedicated user-defined types (classes)
an existing base is ideal for storing the data, and allows low-level operations that client code would also want to use
you want the convenience of reusing functions supporting that base class
you understand that any any additional invariants your data logically needs can only be enforced in code explicitly accessing the data as the derived type, and depending on the extent to which that will "naturally" happen in your design, and how much you can trust client code to understand and cooperate with the logically-ideal invariants, you may want members functions of the derived class to reverify expectations (and throw or whatever)
the derived class adds some highly type-specific convenience functions operating over the data, such as custom searches, data filtering / modifications, streaming, statistical analysis, (alternative) iterators
coupling of client code to the base is more appropriate than coupling to the derived class (as the base is either stable or changes to it reflect improvements to functionality also core to the derived class)
put another way: you want the derived class to continue to expose the same API as the base class, even if that means the client code is forced to change, rather than insulating it in some way that allows the base and derived APIs to grow out of sync
you're not going to be mixing pointers to base and derived objects in parts of the code responsible for deleting them
This may sound quite restrictive, but there are plenty of cases in real world programs matching this scenario.
Background discussion: relative merits
Programming is about compromises. Before you write a more conceptually "correct" program:
consider whether it requires added complexity and code that obfuscates the real program logic, and is therefore more error prone overall despite handling one specific issue more robustly,
weigh the practical costs against the probability and consequences of issues, and
consider "return on investment" and what else you could be doing with your time.
If the potential problems involve usage of the objects that you just can't imagine anyone attempting given your insights into their accessibility, scope and nature of usage in the program, or you can generate compile-time errors for dangerous use (e.g. an assertion that derived class size matches the base's, which would prevent adding new data members), then anything else may be premature over-engineering. Take the easy win in clean, intuitive, concise design and code.
Reasons to consider derivation sans virtual destructor
Say you have a class D publicly derived from B. With no effort, the operations on B are possible on D (with the exception of construction, but even if there are a lot of constructors you can often provide effective forwarding by having one template for each distinct number of constructor arguments: e.g. template <typename T1, typename T2> D(const T1& x1, const T2& t2) : B(t1, t2) { }. Better generalised solution in C++0x variadic templates.)
Further, if B changes then by default D exposes those changes - staying in sync - but someone may need to review extended functionality introduced in D to see if it remains valid, and the client usage.
Rephrasing this: there is reduced explicit coupling between base and derived class, but increased coupling between base and client.
This is often NOT what you want, but sometimes it is ideal, and other times a non issue (see next paragraph). Changes to the base force more client code changes in places distributed throughout the code base, and sometimes the people changing the base may not even have access to the client code to review or update it correspondingly. Sometimes it is better though: if you as the derived class provider - the "man in the middle" - want base class changes to feed through to clients, and you generally want clients to be able - sometimes forced - to update their code when the base class changes without you needing to be constantly involved, then public derivation may be ideal. This is common when your class is not so much an independent entity in its own right, but a thin value-add to the base.
Other times the base class interface is so stable that the coupling may be deemed a non issue. This is especially true of classes like Standard containers.
Summarily, public derivation is a quick way to get or approximate the ideal, familiar base class interface for the derived class - in a way that's concise and self-evidently correct to both the maintainer and client coder - with additional functionality available as member functions (which IMHO - which obviously differs with Sutter, Alexandrescu etc - can aid usability, readability and assist productivity-enhancing tools including IDEs)
C++ Coding Standards - Sutter & Alexandrescu - cons examined
Item 35 of C++ Coding Standards lists issues with the scenario of deriving from std::string. As scenarios go, it's good that it illustrates the burden of exposing a large but useful API, but both good and bad as the base API is remarkably stable - being part of the Standard Library. A stable base is a common situation, but no more common than a volatile one and a good analysis should relate to both cases. While considering the book's list of issues, I'll specifically contrast the issues' applicability to the cases of say:
a) class Issue_Id : public std::string { ...handy stuff... }; <-- public derivation, our controversial usage
b) class Issue_Id : public string_with_virtual_destructor { ...handy stuff... }; <- safer OO derivation
c) class Issue_Id { public: ...handy stuff... private: std::string id_; }; <-- a compositional approach
d) using std::string everywhere, with freestanding support functions
(Hopefully we can agree the composition is acceptable practice, as it provides encapsulation, type safety as well as a potentially enriched API over and above that of std::string.)
So, say you're writing some new code and start thinking about the conceptual entities in an OO sense. Maybe in a bug tracking system (I'm thinking of JIRA), one of them is say an Issue_Id. Data content is textual - consisting of an alphabetic project id, a hyphen, and an incrementing issue number: e.g. "MYAPP-1234". Issue ids can be stored in a std::string, and there will be lots of fiddly little text searches and manipulation operations needed on issue ids - a large subset of those already provided on std::string and a few more for good measure (e.g. getting the project id component, providing the next possible issue id (MYAPP-1235)).
On to Sutter and Alexandrescu's list of issues...
Nonmember functions work well within existing code that already manipulates strings. If instead you supply a super_string, you force changes through your code base to change types and function signatures to super_string.
The fundamental mistake with this claim (and most of the ones below) is that it promotes the convenience of using only a few types, ignoring the benefits of type safety. It's expressing a preference for d) above, rather than insight into c) or b) as alternatives to a). The art of programming involves balancing the pros and cons of distinct types to achieve reasonable reuse, performance, convenience and safety. The paragraphs below elaborate on this.
Using public derivation, the existing code can implicitly access the base class string as a string, and continue to behave as it always has. There's no specific reason to think that the existing code would want to use any additional functionality from super_string (in our case Issue_Id)... in fact it's often lower-level support code pre-existing the application for which you're creating the super_string, and therefore oblivious to the needs provided for by the extended functions. For example, say there's a non-member function to_upper(std::string&, std::string::size_type from, std::string::size_type to) - it could still be applied to an Issue_Id.
So, unless the non-member support function is being cleaned up or extended at the deliberate cost of tightly coupling it to the new code, then it needn't be touched. If it is being overhauled to support issue ids (for example, using the insight into the data content format to upper-case only leading alpha characters), then it's probably a good thing to ensure it really is being passed an Issue_Id by creating an overload ala to_upper(Issue_Id&) and sticking to either the derivation or compositional approaches allowing type safety. Whether super_string or composition is used makes no difference to effort or maintainability. A to_upper_leading_alpha_only(std::string&) reusable free-standing support function isn't likely to be of much use - I can't recall the last time I wanted such a function.
The impulse to use std::string everywhere isn't qualitatively different to accepting all your arguments as containers of variants or void*s so you don't have to change your interfaces to accept arbitrary data, but it makes for error prone implementation and less self-documenting and compiler-verifiable code.
Interface functions that take a string now need to: a) stay away from super_string's added functionality (unuseful); b) copy their argument to a super_string (wasteful); or c) cast the string reference to a super_string reference (awkward and potentially illegal).
This seems to be revisiting the first point - old code that needs to be refactored to use the new functionality, albeit this time client code rather than support code. If the function wants to start treating its argument as an entity for which the new operations are relevant, then it should start taking its arguments as that type and the clients should generate them and accept them using that type. The exact same issues exists for composition. Otherwise, c) can be practical and safe if the guidelines I list below are followed, though it is ugly.
super_string's member functions don't have any more access to string's internals than nonmember functions because string probably doesn't have protected members (remember, it wasn't meant to be derived from in the first place)
True, but sometimes that's a good thing. A lot of base classes have no protected data. The public string interface is all that's needed to manipulate the contents, and useful functionality (e.g. get_project_id() postulated above) can be elegantly expressed in terms of those operations. Conceptually, many times I've derived from Standard containers, I've wanted not to extend or customise their functionality along the existing lines - they're already "perfect" containers - rather I've wanted to add another dimension of behaviour that's specific to my application, and requires no private access. It's because they're already good containers that they're good to reuse.
If super_string hides some of string's functions (and redefining a nonvirtual function in a derived class is not overriding, it's just hiding), that could cause widespread confusion in code that manipulates strings that started their life converted automatically from super_strings.
True for composition too - and more likely to happen as the code doesn't default to passing things through and hence staying in sync, and also true in some situations with run-time polymorphic hierarchies as well. Samed named functions that behave differently in classes that initial appear interchangeable - just nasty. This is effectively the usual caution for correct OO programming, and again not a sufficient reason to abandon the benefits in type safety etc..
What if super_string wants to inherit from string to add more state [explanation of slicing]
Agreed - not a good situation, and somewhere I personally tend to draw the line as it often moves the problems of deletion through a pointer to base from the realm of theory to the very practical - destructors aren't invoked for additional members. Still, slicing can often do what's wanted - given the approach of deriving super_string not to change its inherited functionality, but to add another "dimension" of application-specific functionality....
Admittedly, it's tedious to have to write passthrough functions for the member functions you want to keep, but such an implementation is vastly better and safer than using public or nonpublic inheritance.
Well, certainly agree about the tedium....
Guidelines for successful derivation sans virtual destructor
ideally, avoid adding data members in derived class: variants of slicing can accidentally remove data members, corrupt them, fail to initialise them...
even more so - avoid non-POD data members: deletion via base-class pointer is technically undefined behaviour anyway, but with non-POD types failing to run their destructors is more likely to have non-theoretical problems with resource leaks, bad reference counts etc.
honour the Liskov Substitution Principal / you can't robustly maintain new invariants
for example, in deriving from std::string you can't intercept a few functions and expect your objects to remain uppercase: any code that accesses them via a std::string& or ...* can use std::string's original function implementations to change the value)
derive to model a higher level entity in your application, to extend the inherited functionality with some functionality that uses but doesn't conflict with the base; do not expect or try to change the basic operations - and access to those operations - granted by the base type
be aware of the coupling: base class can't be removed without affecting client code even if the base class evolves to have inappropriate functionality, i.e. your derived class's usability depends on the ongoing appropriateness of the base
sometimes even if you use composition you'll need to expose the data member due to performance, thread safety issues or lack of value semantics - so the loss of encapsulation from public derivation isn't tangibly worse
the more likely people using the potentially-derived class will be unaware of its implementation compromises, the less you can afford to make them dangerous
therefore, low-level widely deployed libraries with many ad-hoc casual users should be more wary of dangerous derivation than localised use by programmers routinely using the functionality at application level and/or in "private" implementation / libraries
Summary
Such derivation is not without issues so don't consider it unless the end result justifies the means. That said, I flatly reject any claim that this can't be used safely and appropriately in particular cases - it's just a matter of where to draw the line.
Personal experience
I do sometimes derive from std::map<>, std::vector<>, std::string etc - I've never been burnt by the slicing or delete-via-base-class-pointer issues, and I've saved a lot of time and energy for more important things. I don't store such objects in heterogeneous polymorphic containers. But, you need to consider whether all the programmers using the object are aware of the issues and likely to program accordingly. I personally like to write my code to use heap and run-time polymorphism only when needed, while some people (due to Java backgrounds, their prefered approach to managing recompilation dependencies or switching between runtime behaviours, testing facilities etc.) use them habitually and therefore need to be more concerned about safe operations via base class pointers.
If you really want to derive from it (not discussing why you want to do it) I think you can prevent Derived class direct heap instantiation by making it's operator new private:
class StringDerived : public std::string {
//...
private:
static void* operator new(size_t size);
static void operator delete(void *ptr);
};
But this way you restrict yourself from any dynamic StringDerived objects.
Not only is the destructor not virtual, std::string contains no virtual functions at all, and no protected members. That makes it very hard for the derived class to modify its functionality.
Then why would you derive from it?
Another problem with being non-polymorphic is that if you pass your derived class to a function expecting a string parameter, your extra functionality will just be sliced off and the object will be seen as a plain string again.
Why should one not derive from c++ std string class?
Because it is not necessary. If you want to use DerivedString for functionality extension; I don't see any problem in deriving std::string. The only thing is, you should not interact between both classes (i.e. don't use string as a receiver for DerivedString).
Is there any way to prevent client from doing Base* p = new Derived()
Yes. Make sure that you provide inline wrappers around Base methods inside Derived class. e.g.
class Derived : protected Base { // 'protected' to avoid Base* p = new Derived
const char* c_str () const { return Base::c_str(); }
//...
};
There are two simple reasons for not deriving from a non-polymorphic class:
Technical: it introduces slicing bugs (because in C++ we pass by value unless otherwise specified)
Functional: if it is non-polymorphic, you can achieve the same effect with composition and some function forwarding
If you wish to add new functionalities to std::string, then first consider using free functions (possibly templates), like the Boost String Algorithm library does.
If you wish to add new data members, then properly wrap the class access by embedding it (Composition) inside a class of your own design.
EDIT:
#Tony noticed rightly that the Functional reason I cited was probably meaningless to most people. There is a simple rule of thumb, in good design, that says that when you can pick a solution among several, you should consider the one with the weaker coupling. Composition has weaker coupling that Inheritance, and thus should be preferred, when possible.
Also, composition gives you the opportunity to nicely wrap the original's class method. This is not possible if you pick inheritance (public) and the methods are not virtual (which is the case here).
The C++ standard states that If Base class destructor is not virtual and you delete an object of Base class that points to the object of an derived class then it causes an undefined Behavior.
C++ standard section 5.3.5/3:
if the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand’s dynamic type and the static type shall have a virtual destructor or the behavior is undefined.
To be clear on the Non-polymorphic class & need of virtual destructor
The purpose of making a destructor virtual is to facilitate the polymorphic deletion of objects through delete-expression. If there is no polymorphic deletion of objects, then you don't need virtual destructor's.
Why not to derive from String Class?
One should generally avoid deriving from any standard container class because of the very reason that they don' have virtual destructors, which make it impossible to delete objects polymorphically.
As for the string class, the string class doesn't have any virtual functions so there is nothing that you can possibly override. The best you can do is hide something.
If at all you want to have a string like functionality you should write a class of your own rather than inherit from std::string.
As soon as you add any member (variable) into your derived std::string class, will you systematically screw the stack if you attempt to use the std goodies with an instance of your derived std::string class? Because the stdc++ functions/members have their stack pointers[indexes] fixed [and adjusted] to the size/boundary of the (base std::string) instance size.
Right?
Please, correct me if I am wrong.

Alternatives to passing a flag into a method?

Sometimes when fixing a defect in an existing code base I might (often out of laziness) decide to change a method from:
void
MyClass::foo(uint32_t aBar)
{
// Do something with aBar...
}
to:
void
MyClass::foo(uint32_t aBar, bool aSomeCondition)
{
if (aSomeCondition)
{
// Do something with aBar...
}
}
During a code review a colleague mentioned that a better approach would be to sub-class MyClass to provide this specialized functionality.
However, I would argue that as long as aSomeCondition doesn't violate the purpose or cohesion of MyClass it is an acceptable pattern to use. Only if the code became infiltrated with flags and if statements would inheritance be a better option, otherwise we would be potentially be entering architecture astronaut territory.
What's the tipping point here?
Note: I just saw this related answer which suggests that an enum may be a better
choice than a bool, but I think my question still applies in this case.
There is not only one solution for this kind of problem.
Boolean has a very low semantic. If you want to add in the future a new condition you will have to add a new parameter...
After four years of maintenance your method may have half a dozen of parameters, if these parameters are all boolean it is very nice trap for maintainers.
Enum is a good choice if cases are exclusive.
Enums can be easily migrated to a bit-mask or a context object.
Bit mask : C++ includes C language, you can use some plain old practices. Sometime a bit mask on an unsigned int is a good choice (but you loose type checking) and you can pass by mistake an incorrect mask. It is a convenient way to move smoothly from a boolean or an enum argument to this kind of pattern.
Bit mask can be migrated with some effort to a context-object. You may have to implement some kind of bitwise arithmetics such as operator | and operator & if you have to keep a buildtime compatibility.
Inheritence is sometime a good choice if the split of behavior is big and this behavior IS RELATED to the lifecycle of the instance. Note that you also have to use polymorphism and this is may slow down the method if this method is heavily used.
And finally inheritence induce change in all your factory code... And what will you do if you have several methods to change in an exclusive fashion ? You will clutter your code of specific classes...
In fact, I think that this generally not a very good idea.
Method split : Another solution is sometime to split the method in several private and provide two or more public methods.
Context object : C++ and C lack of named parameter can be bypassed by adding a context parameter. I use this pattern very often, especially when I have to pass many data across level of a complex framework.
class Context{
public:
// usually not a good idea to add public data member but to my opinion this is an exception
bool setup:1;
bool foo:1;
bool bar:1;
...
Context() : setup(0), foo(0), bar(0) ... {}
};
...
Context ctx;
ctx.setup = true; ...
MyObj.foo(ctx);
Note:
That this is also useful to minimize access (or use) of static data or query to singleton object, TLS ...
Context object can contain a lot more of caching data related to an algorithm.
...
I let your imagination run free...
Anti patterns
I add here several anti pattern (to prevent some change of signature):
*NEVER DO THIS *
*NEVER DO THIS * use a static int/bool for argument passing (some people that do that, and this is a nightmare to remove this kind of stuff). Break at least multithreading...
*NEVER DO THIS * add a data member to pass parameter to method.
Unfortunately, I don't think there is a clear answer to the problem (and it's one I encounter quite frequently in my own code). With the boolean:
foo( x, true );
the call is hard to understand .
With an enum:
foo( x, UseHigherAccuracy );
it is easy to understand but you tend to end up with code like this:
foo( x, something == someval ? UseHigherAccuracy : UseLowerAccuracy );
which is hardly an improvement. And with multiple functions:
if ( something == someval ) {
AccurateFoo( x );
}
else {
InaccurateFoo( x );
}
you end up with a lot more code. But I guess this is the easiest to read, and what I'd tend to use, but I still don't completely like it :-(
One thing I definitely would NOT do however, is subclass. Inheritance should be the last tool you ever reach for.
The primary question is if the flag affects the behaviour of the class, or of that one function. Function-local changes should be parameters, not subclasses. Run-time inheritance should be one of the last tools reached for.
The general guideline I use is: if aSomeCondition changes the nature of the function in a major way, then I consider subclassing.
Subclassing is a relatively large effort compared to adding a flag that has only a minor effect.
Some examples:
if it's a flag that changes the direction in which a sorted collection is returned to the caller, that's a minor change in nature (flag).
if it's a one-shot flag (something that affects the current call rather than a persistent change to the object), it should probably not be a subclass (since going down that track is likely to lead to a massive number of classes).
if it's a enumeration that changes the underlying data structure of your class from array to linked list or balanced tree, that's a complex change (subclass).
Of course, that last one may be better handled by totally hiding the underlying data structure but I'm assuming here that you want to be able to select one of many, for reasons such as performance.
IMHO, aSomeCondition flag changes or depends on the state of current instance, therefore, under certain conditions this class should change its state and handle mentioned operation differently. In this case, I can suggest the usage of State Pattern. Hope it helps.
I would just change code:
void MyClass::foo(uint32_t aBar, bool aSomeCondition)
{
if (aSomeCondition)
{
// Do something with aBar...
}
}
to:
void MyClass::foo(uint32_t aBar)
{
if (this->aSomeCondition)
{
// Do something with aBar...
}
}
I always omit bool as function parameter and prefer to put into struct, even if I would have to call
myClass->enableCondition();

dynamical binding or switch/case?

A scene like this:
I've different of objects do the similar operation as respective func() implements.
There're 2 kinds of solution for func_manager() to call func() according to different objects
Solution 1: Use virtual function character specified in c++. func_manager works differently accroding to different object point pass in.
class Object{
virtual void func() = 0;
}
class Object_A : public Object{
void func() {};
}
class Object_B : public Object{
void func() {};
}
void func_manager(Object* a)
{
a->func();
}
Solution 2: Use plain switch/case. func_manager works differently accroding to different type pass in
typedef enum _type_t
{
TYPE_A,
TYPE_B
}type_t;
void func_by_a()
{
// do as func() in Object_A
}
void func_by_b()
{
// do as func() in Object_A
}
void func_manager(type_t type)
{
switch(type){
case TYPE_A:
func_by_a();
break;
case TYPE_B:
func_by_b();
default:
break;
}
}
My Question are 2:
1. at the view point of DESIGN PATTERN, which one is better?
2. at the view point of RUNTIME EFFCIENCE, which one is better? Especailly as the kinds of Object increases, may be up to 10-15 total, which one's overhead oversteps the other? I don't know how switch/case implements innerly, just a bunch of if/else?
Thanks very much!
from the view point of DESIGN PATTERN, which one is better?
Using polymorphism (Solution 1) is better.
Just one data point: Imagine you have a huge system built around either of the two and then suddenly comes the requirement to add another type. With solution one, you add one derived class, make sure it's instantiated where required, and you're done. With solution 2 you have thousands of switch statements smeared all over the system and it is more or less impossible to guarantee you found all the places where you have to modify them for the new type.
from the view point of RUNTIME EFFCIENCE, which one is better? Especailly as the kinds of Object
That's hard to say.
I remember a footnote in Stanley Lippmann's Inside the C++ Object Model, where he says that studies have shown that virtual functions might have a small advantage against switches over types. I would be hard-pressed, however, to cite chapter and verse, and, IIRC, the advantage didn't seem big enough to make the decision dependent on it.
The first solution is better if only because it's shorter in code. It is also easier to maintain and faster to compile: if you want to add types you need only add new types as headers and compilation units, with no need to change and recompile the code that is responsible for the type mapping. In fact, this code is generated by the compiler and is likely to be as efficient or more efficient than anything you can write on your own.
Virtual functions at the lowest level cost no more than an extra dereference in a table (array). But never mind that, this sort of performance nitpicking really doesn't matter at this microscopic numbers. Your code is simpler, and that's what matters. The runtime dispatch that C++ gives you is there for a reason. Use it.
I would say the first one is better. In solution 2 func_manager will have to know about all types and be updated everytime you add a new type. If you go with solution 1 you can later add a new type and func_manager will just work.
In this simple case I would actually guess that solution 1 will be faster since it can directly look up the function address in the vtable. If you have 15 different types the switch statement will likely not end up as a jump table but basically as you say a huge if/else statement.
From the design point of view the first one is definitely better as thats what inheritance was intended for, to make different objects behave homogeneously.
From the efficiency point of view in both alternatives you have more or less the same generated code, somewhere there must be the choice making code. Difference is that inheritance handles it for you automatically in the 1st one and you do it manually in the 2nd one.
Using "dynamic dispatch" or "virtual dispatch" (i.e. invoking a virtual function) is better both with respect to design (keeping changes in one place, extensibility) and with respect to runtime efficiency (simple dereference vs. a simple dereference where a jump table is used or an if...else ladder which is really slow). As a slight aside, "dynamic binding" doesn't mean what you think... "dynamic binding" refers to resolving a variable's value based on its most recent declaration, as opposed to "static binding" or "lexical binding", which refers to resolving a variable by the current inner-most scope in which it is declared.
Design
If another programmer comes along who doesn't have access to your source code and wants to create an implementation of Object, then that programmer is stuck... the only way to extend functionality is by adding yet another case in a very long switch statement. Whereas with virtual functions, the programmer only needs to inherit from your interface and provide definitions for the virtual methods and, walla, it works. Also, those switch statements end up all over the place, and so adding new implementations almost always requires modifying many switch statements everywhere, while inheritance keeps the changes localized to one class.
Efficiency
Dynamic dispatch simply looks up a function in the object's virtual table and then jumps to that location. It is incredibly fast. If the switch statement uses a jump table, it will be roughly the same speed; however, if there are very few implementations, some programmer is going to be tempted to use an if...else ladder instead of a switch statement, which generally is not able to take advantage of jump tables and is, therefore, slower.
Why nobody suggests function objects? I think kingkai interested in solving the problem, not only that two solutions.
I'm not experienced with them, but they do their job:
struct Helloer{
std::string operator() (void){
return std::string("Hello world!");
}
};
struct Byer{
std::string operator() (void){
return std::string("Good bye world!");
}
};
template< class T >
void say( T speaker){
std::cout << speaker() << std::endl;
}
int main()
{
say( Helloer() );
say( Byer() );
}
Edit: In my opinion this is more "right" approach, than classes with single method (which is not function call operator). Actually, I think this overloading was added to C++ to avoid such classes.
Also, function objects are more convenient to use, even if you dont want templates - just like usual functions.
In the end consider STL - it uses func objects everywhere and looks pretty natural. And I dont even mention Boost

What are some 'good use' examples of dynamic casting?

We often hear/read that one should avoid dynamic casting. I was wondering what would be 'good use' examples of it, according to you?
Edit:
Yes, I'm aware of that other thread: it is indeed when reading one of the first answers there that I asked my question!
This recent thread gives an example of where it comes in handy. There is a base Shape class and classes Circle and Rectangle derived from it. In testing for equality, it is obvious that a Circle cannot be equal to a Rectangle and it would be a disaster to try to compare them. While iterating through a collection of pointers to Shapes, dynamic_cast does double duty, telling you if the shapes are comparable and giving you the proper objects to do the comparison on.
Vector iterator not dereferencable
Here's something I do often, it's not pretty, but it's simple and useful.
I often work with template containers that implement an interface,
imagine something like
template<class T>
class MyVector : public ContainerInterface
...
Where ContainerInterface has basic useful stuff, but that's all. If I want a specific algorithm on vectors of integers without exposing my template implementation, it is useful to accept the interface objects and dynamic_cast it down to MyVector in the implementation. Example:
// function prototype (public API, in the header file)
void ProcessVector( ContainerInterface& vecIfce );
// function implementation (private, in the .cpp file)
void ProcessVector( ContainerInterface& vecIfce)
{
MyVector<int>& vecInt = dynamic_cast<MyVector<int> >(vecIfce);
// the cast throws bad_cast in case of error but you could use a
// more complex method to choose which low-level implementation
// to use, basically rolling by hand your own polymorphism.
// Process a vector of integers
...
}
I could add a Process() method to the ContainerInterface that would be polymorphically resolved, it would be a nicer OOP method, but I sometimes prefer to do it this way. When you have simple containers, a lot of algorithms and you want to keep your implementation hidden, dynamic_cast offers an easy and ugly solution.
You could also look at double-dispatch techniques.
HTH
My current toy project uses dynamic_cast twice; once to work around the lack of multiple dispatch in C++ (it's a visitor-style system that could use multiple dispatch instead of the dynamic_casts), and once to special-case a specific subtype.
Both of these are acceptable, in my view, though the former at least stems from a language deficit. I think this may be a common situation, in fact; most dynamic_casts (and a great many "design patterns" in general) are workarounds for specific language flaws rather than something that aim for.
It can be used for a bit of run-time type-safety when exposing handles to objects though a C interface. Have all the exposed classes inherit from a common base class. When accepting a handle to a function, first cast to the base class, then dynamic cast to the class you're expecting. If they passed in a non-sensical handle, you'll get an exception when the run-time can't find the rtti. If they passed in a valid handle of the wrong type, you get a NULL pointer and can throw your own exception. If they passed in the correct pointer, you're good to go.
This isn't fool-proof, but it is certainly better at catching mistaken calls to the libraries than a straight reinterpret cast from a handle, and waiting until some data gets mysteriously corrupted when you pass the wrong handle in.
Well it would really be nice with extension methods in C#.
For example let's say I have a list of objects and I want to get a list of all ids from them. I can step through them all and pull them out but I would like to segment out that code for reuse.
so something like
List<myObject> myObjectList = getMyObjects();
List<string> ids = myObjectList.PropertyList("id");
would be cool except on the extension method you won't know the type that is coming in.
So
public static List<string> PropertyList(this object objList, string propName) {
var genList = (objList.GetType())objList;
}
would be awesome.
It is very useful, however, most of the times it is too useful: if for getting the job done the easiest way is to do a dynamic_cast, it's more often than not a symptom of bad OO design, what in turn might lead to trouble in the future in unforeseen ways.