C++ Inheritance & Virtual Functions Where Base Parameters require Replacement with Derived Ones - c++

I have looked high and low for answers to this question - here on this forum and on the general internet. While I have found posts discussing similar topics, I am at a point where I need to make some design choices and am wondering if I am going about it the right way, which is as follows:
In C++ I have created 3 data structures: A linked list, a binary tree and a grid. I want to be able to store different classes in these data structures - one may be a class to manipulate strings, another numbers, etc. Now, each of these classes, assigned to the nodes, has the ability to perform and handle comparison operations for the standard inequality operators.
I thought C++ inheritance would provide the perfect solution to the matter - it would allow for a base "data class" (the abstract class) and all the other data classes, such as JString, to inherit from it. So the data class would have the following inequality method:
virtual bool isGreaterThan(const dataStructure & otherData) const = 0;
Then, JString will inherit from dataStructure and the desire would be to override this method, since isGreaterThan will obviously have a different meaning depending on the class. However, what I need is this:
virtual bool isGreaterThan(const JString & otherData) const;
Which, I know will not work since the parameters are of a different data type and C++ requires this for the overriding of virtual methods. The only solution I could see is doing something like this in JString:
virtual bool isGreaterThan(const dataStructure & otherData);
{
this->isGreaterThanJString(dynamic_cast<const JString&>(theSourceData));
};
virtual bool isGreaterThanJString(const JString & otherData) const;
In other words, the overriding method just calls the JString equivalent, down-casting otherData to a JString object, since this will always be true and if not, it should fail regardless.
My question is this: Does this seem like an acceptable strategy or am I missing some ability in C++. I have used templates as well, but I am trying to avoid this as I find debugging becomes very difficult. The other option would be to try a void* that can accept any data type, but this comes with issues as well and shifts the burden onto the code resulting in lengthier classes.

The LSP means operations on a reference to base class must work and have the same semantics as operations on both base and derived class instances when those operations are referentially polymorphic.
Your example fails this test. The base isGreaterThan claims to work on all dataStructure, but it does not.
I would make the dataStructure argument types templates in your containers. Then you know the concrete type of the stored data.
Look at std list for an idea of what a linked list template might look like.
I will now go onto complex additional steps you can do in the 0.1% of cases where the above advice is not correct.
If this causes issues, because of template bloat, you could create a polymorphic container that enforces the type of the stored data, either with a thin template wrapper or runtime tests. Once stored, you blindly cast to the known stored type, and store how to copy/compare/etc said type either in a C or C++ style polymorphic method.
Here is an 8 year old fun talk about this approach: https://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil

Related

Abstract class, abstract method parameters and inheritance

I joined a research project in which the OOQP library is used. It uses an abstract class OoqpVector (inherited by several subclasses) and a number of methods, such as:
virtual double dotProductWith(OoqpVector& v) = 0
This method is implemented by subclasses SimpleVector and DistributedVector. Because of genericity, OoqpVector::dotProductWith allows all possible combinations:
SimpleVector.dotProductWith(SimpleVector)
DistributedVector.dotProductWith(DistributedVector)
SimpleVector.dotProductWith(DistributedVector)
DistributedVector.dotProductWith(SimpleVector)
However in practice, the method is called on vectors of the same type only (combinations 1 and 2). Therefore, the authors implemented SimpleVector::dotProductWith(OoqpVector&) and DistributedVector::dotProductWith(OoqpVector&) by dynamically casting the parameter to the current type.
Although this works, I find it a bit ugly and wonder if we can implement the same behavior without casts.
An option would be to get rid of OoqpVector::dotProductWith, declare the specialized methods SimpleVector::dotProductWith(SimpleVector&) and DistributedVector::dotProductWith(DistributedVector&), and force the calling methods to impose the two vector types to be equal. However, our code uses a bunch of OoqpVectors everywhere and is relatively generic, so this might be difficult to achieve.
Do you see a way to fix that?
Best,
Charlie

C++ Design issues: Map with various abstract base classes

I'm facing design problems and could do with some external input. I am trying to avoid abstract base class casting (Since I've heard that's bad).
The issues are down to this structure:
class entity... (base with pure virtual functions)
class hostile : public entity... (base with pure virtual functions)
class friendly : public entity... (base with pure virtual functions)
// Then further derived classes using these last base classes...
Initially I thought I'd get away with:
const enum class FactionType : unsigned int
{ ... };
std::unordered_map<FactionType, std::vector<std::unique_ptr<CEntity>>> m_entitys;
And... I did but this causes me problems because I need to access "unique" function from say hostile or friendly specifically.
I have disgracefully tried (worked but don't like it nor does it feel safe):
// For-Each Loop: const auto& friendly : m_entitys[FactionType::FRIENDLY]
CFriendly* castFriendly = static_cast<CFriendly*>(&*friendly);
I was hoping/trying to maintain the unordered_map design that uses FactionType as a key for the base abstract class type... Anyway, input is greatly appreciated.
If there are any syntactical errors, I apologise.
About casting I agree with with #rufflewind. The casts mean different thing and are useful at different times.
To coerce a region of memory at compile time (the decision of typing happen at compile time anyway) use static_cast. The amount of memory on the other end of the T* equal to sizeof(T) will be interpreted as a T regardless of correct behavior.
The decisions for dynamic_cast are made entirely at runtime, sometimes requiring RTTI (Run Time Type Information). It makes a decision and it will either return a null pointer or a valid pointer to a T if one can be made.
The decision goes further than just the types of casts though. Using a data structure to look up types and methods (member functions) imposes the time constraints that would not otherwise exist when compared to the relatively fast and mandatory casts. There is a way to skip the data structures, but not the casting without major refactoring (with major refactoring you can do anything).
You can move the casts into the entity class, get them done right and just leave them encapsulated there.
class entity
{
// Previous code
public:
// This will be overridden in hostiles to return a valid
// pointer and nullptr or 0 in other types of entities
virtual hostile* cast_to_hostile() = 0
virtual const hostile* cast_to_hostile() const = 0
// This will be overridden in friendlies to return a valid
// pointer and nullptr or 0 in other types of entities
virtual friendly* cast_to_friendly() = 0
virtual const friendly* cast_to_friendly() const = 0
// The following helper methods are optional but
// can make it easier to write streamlined code in
// calling classes with a little trouble.
// Hostile and friendly can each implement this to return
// The appropriate enum member. This would useful for making
// decision about friendlies and hostiles
virtual FactionType entity_type() const = 0;
// These two method delegate storage of the knowledge
// of hostility or friendliness to the derived classes.
// These are implemented here as non-virtual functions
// because they shouldn't need to be overridden, but
// could be made virtual at the cost of a pointer
// indirection and sometimes, but not often a cache miss.
bool is_friendly() const
{
return entity_type() == FactionType_friendly;
}
bool is_hostile() const
{
return entity_type() == FactionType_hostile;
}
}
This strategy is good and bad for a variety of reasons.
Pros:
It is conceptually simple. This is easy to understand quickly if you understand polymorphism.
It seems similar to your existing code seems superficially similar to your existing code making migration easier. There is a reason hostility and friendliness is encoded in your types, this preserves that reason.
You can use static_casts safely because all the casts exist in the class they are used in, and therefor won't normally get called unless valid.
You can return shared_ptr or other custom smart pointers instead of raw pointers. And you probably should.
This avoids a potentially costly refactor that completely avoids casting. Casting is there to be used as a tool.
Cons:
It is conceptually simple. This does not provide a strong set of vocabulary (methods, classes and patterns) for building a smart set of tools for building advanced type mechanics.
Likely whether or not something is hostile should be a data member or implemented as series of methods controlling instance behavior.
Someone might think that the pointers this returns convey ownership and delete them.
Every caller must check pointers for validity prior to use. Or you can add methods to check, but then callers will need to call methods to check before the cast. Checks like these are surprising for users of the class and make it harder to use correctly.
It is polymorphism dense. This will perplex people who are uncomfortable with polymorphism. Even today there are many who are not comfortable with polymorphism.
A refactor that completely avoids casting is possible. Casting is dangerous and not a tool to use lightly.

Vector of pointers to base type, find all instances of a given derived type stored in a base type

Suppose you have a base class inside of a library:
class A {};
and derived classes
class B: public A {};
class C: public A {};
Now Instances of B and C are stored in a std::vector of boost::shared_ptr<A>:
std::vector<boost::shared_ptr<A> > A_vec;
A_vec.push_back(boost::shared_ptr<B>(new B()));
A_vec.push_back(boost::shared_ptr<C>(new C()));
Adding instances of B and C is done by a user, and there is no way to determine in advance the order, in which they will be added.
However, inside of the library, there may be a need to perform specific actions on B and C, so the pointer to the base class needs to be casted to B and C.
I can of course do "trial and error" conversions, i.e. try to cast to Band C(and any other derivative of the base class), until I find a conversion that doesn't throw. However, this method seems very crude and error-prone, and I'm looking for a more elegant (and better performing) way.
I am looking for a solution that will also work with C++98, but may involve boost functionality.
Any ideas ?
EDIT:
O.k., thanks for all the answers so far!
I'd like to give some more details regarding the use-case. All of this happens in the context of parametric optimization.
Users define the optimization problem by:
Specifying the parameters, i.e. their types (e.g. "constrained double", "constrained integer", "unconstrained double", "boolean", etc.) and initial values
Specifying the evaluation function, which assigns one or more evaluations (double values) to a given parameter set
Different optimization algorithms then act on the problem definitions, including their parameters.
There is a number of predefined parameter objects for common cases, but users may also create their own parameter objects, by deriving from one of my base classes. So from a library perspective, apart from the fact that the parameter objects need to comply with a given (base-class) API, I cannot assume much about parameter objects.
The problem definition is a user-defined C++-class, derived from a base-class with a std::vector interface. The user adds his (predefined or home-grown) parameter objects and overloads a fitness-function.
Access to the parameter objects may happen
from within the optimization algorithms (usually o.k., even for home-grown parameter objects, as derived parameter objects need to provide access functions for their values).
from within the user-supplied fitness function (usually o.k., as the user knows where to find which parameter object in the collection and its value can be accessed easily)
This works fine.
There may however be special cases where
a user wants to access specifics of his home-grown parameter types
a third party has supplied the parameter structure (this is an Open Source library, others may add code for specific optimization problems)
the parameter structure (i.e. which parameters are where in the vector) may be modified as part of the optimization problem --> example: training of the architecture of a neural network
Under these circumstances it would be great to have an easy method to access all parameter objects of a given derived type inside of the collection of base types.
I already have a templated "conversion_iterator". It iterates over the vector of base objects and skips those that do not comply with the desired target type. However, this is based on "trial and error" conversion (i.e. I check whether the converted smart pointer is NULL), which I find very unelegant and error-prone.
I'd love to have a better solution.
NB: The optimization library is targetted at use-cases, where the evaluation step for a given parameter set may last arbitrarily long (usually seconds, possibly hours or longer). So speed of access to parameter types is not much of an issue. But stability and maintainability is ...
There’s no better general solution than trying to cast and seeing whether it succeeds. You can alternatively derive the dynamic typeid and compare it to all types in turn, but that is effectively the same amount of work.
More fundamentally, your need to do this hints at a design problem: the whole purpose of a base class is to be able to treat children as if they were parents. There are certain situations where this is necessary though, in which case you’d use a visitor to dispatch them.
If possible, add virtual methods to class A to do the "specific actions on B and C".
If that's not possible or not reasonable, use the pointer form of dynamic_cast, so there are no exceptions involved.
for (boost::shared_ptr<A> a : A_vec)
{
if (B* b = dynamic_cast<B*>(a.get()))
{
b->do_something();
}
else if (C* c = dynamic_cast<C*>(a.get()))
{
something_else(*c);
}
}
Adding instances of B and C is done by a user, and there is no way to determine in advance the order, in which they will be added.
Okay, so just put them in two different containers?
std::vector<boost::shared_ptr<A> > A_vec;
std::vector<boost::shared_ptr<B> > B_vec;
std::vector<boost::shared_ptr<C> > C_vec;
void add(B * p)
{
B_vec.push_back(boost::shared_ptr<B>(p));
A_vec.push_back(b.back());
}
void add(C * p)
{
C_vec.push_back(boost::shared_ptr<C>(p));
A_vec.push_back(c.back());
}
Then you can iterate over the Bs or Cs to your hearts content.
I would suggest to implement a method in the base class (e.g. TypeOf()), which will return the type of the particular object. Make sure you define that method as virtual and abstract so that you will be enforced to implement in the derived types. As for the type itself, you can define an enum for each type (e.g. class).
enum class ClassType { ClassA, ClassB, ClassC };
This answer might interest you: Generating an interface without virtual functions?
This shows you both approaches
variant w/visitor in a single collection
separate collections,
as have been suggested by others (Fred and Konrad, notably). The latter is more efficient for iteration, the former could well be more pure and maintainable. It could even be more efficient too, depending on the usage patterns.

Is there any way to avoid declaring virtual methods when storing (children) pointers?

I have run into an annoying problem lately, and I am not satisfied with my own workaround: I have a program that maintains a vector of pointers to a base class, and I am storing there all kind of children object-pointers. Now, each child class has methods of their own, and the main program may or not may call these methods, depending on the type of object (note though that they all heavily use common methods of the base class, so this justify inheritance).
I have found useful to have an "object identifier" to check the class type (and then either call the method or not), which is already not very beautiful, but this is not the main inconvenience. The main inconvenience is that, if I want to actually be able to call a derived class method using the base class pointer (or even just store the pointer in the pointer array), then one need to declare the derived methods as virtual in the base class.
Make sense from the C++ coding point of view.. but this is not practical in my case (from the development point of view), because I am planning to create many different children classes in different files, perhaps made by different people, and I don't want to tweak/maintain the base class each time, to add virtual methods!
How to do this? Essentially, what I am asking (I guess) is how to implement something like Objective-C NSArrays - if you send a message to an object that does not implement the method, well, nothing happens.
regards
Instead of this:
// variant A: declare everything in the base class
void DoStuff_A(Base* b) {
if (b->TypeId() == DERIVED_1)
b->DoDerived1Stuff();
else if if (b->TypeId() == DERIVED_2)
b->DoDerived12Stuff();
}
or this:
// variant B: declare nothing in the base class
void DoStuff_B(Base* b) {
if (b->TypeId() == DERIVED_1)
(dynamic_cast<Derived1*>(b))->DoDerived1Stuff();
else if if (b->TypeId() == DERIVED_2)
(dynamic_cast<Derived2*>(b))->DoDerived12Stuff();
}
do this:
// variant C: declare the right thing in the base class
b->DoStuff();
Note there's a single virtual function in the base per stuff that has to be done.
If you find yourself in a situation where you are more comfortable with variants A or B then with variant C, stop and rethink your design. You are coupling components too tightly and in the end it will backfire.
I am planning to create many different children classes in different
files, perhaps made by different people, and I don't want to
tweak/maintain the base class each time, to add virtual methods!
You are OK with tweaking DoStuff each time a derived class is added, but tweaking Base is a no-no. May I ask why?
If your design does not fit in either A, B or C pattern, show what you have, for clairvoyance is a rare feat these days.
You can do what you describe in C++, but not using functions. It is, by the way, kind of horrible but I suppose there might be cases in which it's a legitimate approach.
First way of doing this:
Define a function with a signature something like boost::variant parseMessage(std::string, std::vector<boost::variant>); and perhaps a string of convenience functions with common signatures on the base class and include a message lookup table on the base class which takes functors. In each class constructor add its messages to the message table and the parseMessage function then parcels off each message to the right function on the class.
It's ugly and slow but it should work.
Second way of doing this:
Define the virtual functions further down the hierarchy so if you want to add int foo(bar*); you first add a class that defines it as virtual and then ensure every class that wants to define int foo(bar*); inherit from it. You can then use dynamic_cast to ensure that the pointer you are looking at inherits from this class before trying to call int foo(bar*);. Possible these interface adding classes could be pure virtual so they can be mixed in to various points using multiple inheritance, but that may have its own problems.
This is less flexible than the first way and requires the classes that implement a function to be linked to each other. Oh, and it's still ugly.
But mostly I suggest you try and write C++ code like C++ code not Objective-C code.
This can be solved by adding some sort of introspection capabilities and meta object system. This talk Metadata and reflection in C++ — Jeff Tucker demonstrates how to do this using c++'s template meta programming.
If you don't want to go to the trouble of implementing one yourself, then it would be easier to use an existing one such as Qt's meta object system. Note that this solution does not work with multiple inheritance due to limitations in the meta object compiler: QObject Multiple Inheritance.
With that installed, you can query for the presence of methods and call them. This is quite tedious to do by hand, so the easiest way to call such a methods is using the signal and slot mechanism.
There is also GObject which is quite simmilar and there are others.
If you are planning to create many different children classes in different files, perhaps made by different people, and also I would guess you don't want to change your main code for every child class. Then I think what you need to do in your base class is to define several (not to many) virtual functions (with empty implementation) BUT those functions should be used to mark a time in the logic where they are called like "AfterInseart" or "BeforeSorting", Etc.
Usually there are not to many places in the logic you wish a derived classes to perform there own logic.

Why should one not derive from c++ std string class?

I wanted to ask about a specific point made in Effective C++.
It says:
A destructor should be made virtual if a class needs to act like a polymorphic class. It further adds that since std::string does not have a virtual destructor, one should never derive from it. Also std::string is not even designed to be a base class, forget polymorphic base class.
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
Is the only reason that I should not derive from std::string class is it does not have a virtual destructor? For reusability purpose a base class can be defined and multiple derived class can inherit from it. So what makes std::string not even eligible as a base class?
Also, if there is a base class purely defined for reusability purpose and there are many derived types, is there any way to prevent client from doing Base* p = new Derived() because the classes are not meant to be used polymorphically?
I think this statement reflects the confusion here (emphasis mine):
I do not understand what specifically is required in a class to be eligible for being a base clas (not a polymorphic one)?
In idiomatic C++, there are two uses for deriving from a class:
private inheritance, used for mixins and aspect oriented programming using templates.
public inheritance, used for polymorphic situations only. EDIT: Okay, I guess this could be used in a few mixin scenarios too -- such as boost::iterator_facade -- which show up when the CRTP is in use.
There is absolutely no reason to publicly derive a class in C++ if you're not trying to do something polymorphic. The language comes with free functions as a standard feature of the language, and free functions are what you should be using here.
Think of it this way -- do you really want to force clients of your code to convert to using some proprietary string class simply because you want to tack on a few methods? Because unlike in Java or C# (or most similar object oriented languages), when you derive a class in C++ most users of the base class need to know about that kind of a change. In Java/C#, classes are usually accessed through references, which are similar to C++'s pointers. Therefore, there's a level of indirection involved which decouples the clients of your class, allowing you to substitute a derived class without other clients knowing.
However, in C++, classes are value types -- unlike in most other OO languages. The easiest way to see this is what's known as the slicing problem. Basically, consider:
int StringToNumber(std::string copyMeByValue)
{
std::istringstream converter(copyMeByValue);
int result;
if (converter >> result)
{
return result;
}
throw std::logic_error("That is not a number.");
}
If you pass your own string to this method, the copy constructor for std::string will be called to make a copy, not the copy constructor for your derived object -- no matter what child class of std::string is passed. This can lead to inconsistency between your methods and anything attached to the string. The function StringToNumber cannot simply take whatever your derived object is and copy that, simply because your derived object probably has a different size than a std::string -- but this function was compiled to reserve only the space for a std::string in automatic storage. In Java and C# this is not a problem because the only thing like automatic storage involved are reference types, and the references are always the same size. Not so in C++.
Long story short -- don't use inheritance to tack on methods in C++. That's not idiomatic and results in problems with the language. Use non-friend, non-member functions where possible, followed by composition. Don't use inheritance unless you're template metaprogramming or want polymorphic behavior. For more information, see Scott Meyers' Effective C++ Item 23: Prefer non-member non-friend functions to member functions.
EDIT: Here's a more complete example showing the slicing problem. You can see it's output on codepad.org
#include <ostream>
#include <iomanip>
struct Base
{
int aMemberForASize;
Base() { std::cout << "Constructing a base." << std::endl; }
Base(const Base&) { std::cout << "Copying a base." << std::endl; }
~Base() { std::cout << "Destroying a base." << std::endl; }
};
struct Derived : public Base
{
int aMemberThatMakesMeBiggerThanBase;
Derived() { std::cout << "Constructing a derived." << std::endl; }
Derived(const Derived&) : Base() { std::cout << "Copying a derived." << std::endl; }
~Derived() { std::cout << "Destroying a derived." << std::endl; }
};
int SomeThirdPartyMethod(Base /* SomeBase */)
{
return 42;
}
int main()
{
Derived derivedObject;
{
//Scope to show the copy behavior of copying a derived.
Derived aCopy(derivedObject);
}
SomeThirdPartyMethod(derivedObject);
}
To offer the counter side to the general advice (which is sound when there are no particular verbosity/productivity issues evident)...
Scenario for reasonable use
There is at least one scenario where public derivation from bases without virtual destructors can be a good decision:
you want some of the type-safety and code-readability benefits provided by dedicated user-defined types (classes)
an existing base is ideal for storing the data, and allows low-level operations that client code would also want to use
you want the convenience of reusing functions supporting that base class
you understand that any any additional invariants your data logically needs can only be enforced in code explicitly accessing the data as the derived type, and depending on the extent to which that will "naturally" happen in your design, and how much you can trust client code to understand and cooperate with the logically-ideal invariants, you may want members functions of the derived class to reverify expectations (and throw or whatever)
the derived class adds some highly type-specific convenience functions operating over the data, such as custom searches, data filtering / modifications, streaming, statistical analysis, (alternative) iterators
coupling of client code to the base is more appropriate than coupling to the derived class (as the base is either stable or changes to it reflect improvements to functionality also core to the derived class)
put another way: you want the derived class to continue to expose the same API as the base class, even if that means the client code is forced to change, rather than insulating it in some way that allows the base and derived APIs to grow out of sync
you're not going to be mixing pointers to base and derived objects in parts of the code responsible for deleting them
This may sound quite restrictive, but there are plenty of cases in real world programs matching this scenario.
Background discussion: relative merits
Programming is about compromises. Before you write a more conceptually "correct" program:
consider whether it requires added complexity and code that obfuscates the real program logic, and is therefore more error prone overall despite handling one specific issue more robustly,
weigh the practical costs against the probability and consequences of issues, and
consider "return on investment" and what else you could be doing with your time.
If the potential problems involve usage of the objects that you just can't imagine anyone attempting given your insights into their accessibility, scope and nature of usage in the program, or you can generate compile-time errors for dangerous use (e.g. an assertion that derived class size matches the base's, which would prevent adding new data members), then anything else may be premature over-engineering. Take the easy win in clean, intuitive, concise design and code.
Reasons to consider derivation sans virtual destructor
Say you have a class D publicly derived from B. With no effort, the operations on B are possible on D (with the exception of construction, but even if there are a lot of constructors you can often provide effective forwarding by having one template for each distinct number of constructor arguments: e.g. template <typename T1, typename T2> D(const T1& x1, const T2& t2) : B(t1, t2) { }. Better generalised solution in C++0x variadic templates.)
Further, if B changes then by default D exposes those changes - staying in sync - but someone may need to review extended functionality introduced in D to see if it remains valid, and the client usage.
Rephrasing this: there is reduced explicit coupling between base and derived class, but increased coupling between base and client.
This is often NOT what you want, but sometimes it is ideal, and other times a non issue (see next paragraph). Changes to the base force more client code changes in places distributed throughout the code base, and sometimes the people changing the base may not even have access to the client code to review or update it correspondingly. Sometimes it is better though: if you as the derived class provider - the "man in the middle" - want base class changes to feed through to clients, and you generally want clients to be able - sometimes forced - to update their code when the base class changes without you needing to be constantly involved, then public derivation may be ideal. This is common when your class is not so much an independent entity in its own right, but a thin value-add to the base.
Other times the base class interface is so stable that the coupling may be deemed a non issue. This is especially true of classes like Standard containers.
Summarily, public derivation is a quick way to get or approximate the ideal, familiar base class interface for the derived class - in a way that's concise and self-evidently correct to both the maintainer and client coder - with additional functionality available as member functions (which IMHO - which obviously differs with Sutter, Alexandrescu etc - can aid usability, readability and assist productivity-enhancing tools including IDEs)
C++ Coding Standards - Sutter & Alexandrescu - cons examined
Item 35 of C++ Coding Standards lists issues with the scenario of deriving from std::string. As scenarios go, it's good that it illustrates the burden of exposing a large but useful API, but both good and bad as the base API is remarkably stable - being part of the Standard Library. A stable base is a common situation, but no more common than a volatile one and a good analysis should relate to both cases. While considering the book's list of issues, I'll specifically contrast the issues' applicability to the cases of say:
a) class Issue_Id : public std::string { ...handy stuff... }; <-- public derivation, our controversial usage
b) class Issue_Id : public string_with_virtual_destructor { ...handy stuff... }; <- safer OO derivation
c) class Issue_Id { public: ...handy stuff... private: std::string id_; }; <-- a compositional approach
d) using std::string everywhere, with freestanding support functions
(Hopefully we can agree the composition is acceptable practice, as it provides encapsulation, type safety as well as a potentially enriched API over and above that of std::string.)
So, say you're writing some new code and start thinking about the conceptual entities in an OO sense. Maybe in a bug tracking system (I'm thinking of JIRA), one of them is say an Issue_Id. Data content is textual - consisting of an alphabetic project id, a hyphen, and an incrementing issue number: e.g. "MYAPP-1234". Issue ids can be stored in a std::string, and there will be lots of fiddly little text searches and manipulation operations needed on issue ids - a large subset of those already provided on std::string and a few more for good measure (e.g. getting the project id component, providing the next possible issue id (MYAPP-1235)).
On to Sutter and Alexandrescu's list of issues...
Nonmember functions work well within existing code that already manipulates strings. If instead you supply a super_string, you force changes through your code base to change types and function signatures to super_string.
The fundamental mistake with this claim (and most of the ones below) is that it promotes the convenience of using only a few types, ignoring the benefits of type safety. It's expressing a preference for d) above, rather than insight into c) or b) as alternatives to a). The art of programming involves balancing the pros and cons of distinct types to achieve reasonable reuse, performance, convenience and safety. The paragraphs below elaborate on this.
Using public derivation, the existing code can implicitly access the base class string as a string, and continue to behave as it always has. There's no specific reason to think that the existing code would want to use any additional functionality from super_string (in our case Issue_Id)... in fact it's often lower-level support code pre-existing the application for which you're creating the super_string, and therefore oblivious to the needs provided for by the extended functions. For example, say there's a non-member function to_upper(std::string&, std::string::size_type from, std::string::size_type to) - it could still be applied to an Issue_Id.
So, unless the non-member support function is being cleaned up or extended at the deliberate cost of tightly coupling it to the new code, then it needn't be touched. If it is being overhauled to support issue ids (for example, using the insight into the data content format to upper-case only leading alpha characters), then it's probably a good thing to ensure it really is being passed an Issue_Id by creating an overload ala to_upper(Issue_Id&) and sticking to either the derivation or compositional approaches allowing type safety. Whether super_string or composition is used makes no difference to effort or maintainability. A to_upper_leading_alpha_only(std::string&) reusable free-standing support function isn't likely to be of much use - I can't recall the last time I wanted such a function.
The impulse to use std::string everywhere isn't qualitatively different to accepting all your arguments as containers of variants or void*s so you don't have to change your interfaces to accept arbitrary data, but it makes for error prone implementation and less self-documenting and compiler-verifiable code.
Interface functions that take a string now need to: a) stay away from super_string's added functionality (unuseful); b) copy their argument to a super_string (wasteful); or c) cast the string reference to a super_string reference (awkward and potentially illegal).
This seems to be revisiting the first point - old code that needs to be refactored to use the new functionality, albeit this time client code rather than support code. If the function wants to start treating its argument as an entity for which the new operations are relevant, then it should start taking its arguments as that type and the clients should generate them and accept them using that type. The exact same issues exists for composition. Otherwise, c) can be practical and safe if the guidelines I list below are followed, though it is ugly.
super_string's member functions don't have any more access to string's internals than nonmember functions because string probably doesn't have protected members (remember, it wasn't meant to be derived from in the first place)
True, but sometimes that's a good thing. A lot of base classes have no protected data. The public string interface is all that's needed to manipulate the contents, and useful functionality (e.g. get_project_id() postulated above) can be elegantly expressed in terms of those operations. Conceptually, many times I've derived from Standard containers, I've wanted not to extend or customise their functionality along the existing lines - they're already "perfect" containers - rather I've wanted to add another dimension of behaviour that's specific to my application, and requires no private access. It's because they're already good containers that they're good to reuse.
If super_string hides some of string's functions (and redefining a nonvirtual function in a derived class is not overriding, it's just hiding), that could cause widespread confusion in code that manipulates strings that started their life converted automatically from super_strings.
True for composition too - and more likely to happen as the code doesn't default to passing things through and hence staying in sync, and also true in some situations with run-time polymorphic hierarchies as well. Samed named functions that behave differently in classes that initial appear interchangeable - just nasty. This is effectively the usual caution for correct OO programming, and again not a sufficient reason to abandon the benefits in type safety etc..
What if super_string wants to inherit from string to add more state [explanation of slicing]
Agreed - not a good situation, and somewhere I personally tend to draw the line as it often moves the problems of deletion through a pointer to base from the realm of theory to the very practical - destructors aren't invoked for additional members. Still, slicing can often do what's wanted - given the approach of deriving super_string not to change its inherited functionality, but to add another "dimension" of application-specific functionality....
Admittedly, it's tedious to have to write passthrough functions for the member functions you want to keep, but such an implementation is vastly better and safer than using public or nonpublic inheritance.
Well, certainly agree about the tedium....
Guidelines for successful derivation sans virtual destructor
ideally, avoid adding data members in derived class: variants of slicing can accidentally remove data members, corrupt them, fail to initialise them...
even more so - avoid non-POD data members: deletion via base-class pointer is technically undefined behaviour anyway, but with non-POD types failing to run their destructors is more likely to have non-theoretical problems with resource leaks, bad reference counts etc.
honour the Liskov Substitution Principal / you can't robustly maintain new invariants
for example, in deriving from std::string you can't intercept a few functions and expect your objects to remain uppercase: any code that accesses them via a std::string& or ...* can use std::string's original function implementations to change the value)
derive to model a higher level entity in your application, to extend the inherited functionality with some functionality that uses but doesn't conflict with the base; do not expect or try to change the basic operations - and access to those operations - granted by the base type
be aware of the coupling: base class can't be removed without affecting client code even if the base class evolves to have inappropriate functionality, i.e. your derived class's usability depends on the ongoing appropriateness of the base
sometimes even if you use composition you'll need to expose the data member due to performance, thread safety issues or lack of value semantics - so the loss of encapsulation from public derivation isn't tangibly worse
the more likely people using the potentially-derived class will be unaware of its implementation compromises, the less you can afford to make them dangerous
therefore, low-level widely deployed libraries with many ad-hoc casual users should be more wary of dangerous derivation than localised use by programmers routinely using the functionality at application level and/or in "private" implementation / libraries
Summary
Such derivation is not without issues so don't consider it unless the end result justifies the means. That said, I flatly reject any claim that this can't be used safely and appropriately in particular cases - it's just a matter of where to draw the line.
Personal experience
I do sometimes derive from std::map<>, std::vector<>, std::string etc - I've never been burnt by the slicing or delete-via-base-class-pointer issues, and I've saved a lot of time and energy for more important things. I don't store such objects in heterogeneous polymorphic containers. But, you need to consider whether all the programmers using the object are aware of the issues and likely to program accordingly. I personally like to write my code to use heap and run-time polymorphism only when needed, while some people (due to Java backgrounds, their prefered approach to managing recompilation dependencies or switching between runtime behaviours, testing facilities etc.) use them habitually and therefore need to be more concerned about safe operations via base class pointers.
If you really want to derive from it (not discussing why you want to do it) I think you can prevent Derived class direct heap instantiation by making it's operator new private:
class StringDerived : public std::string {
//...
private:
static void* operator new(size_t size);
static void operator delete(void *ptr);
};
But this way you restrict yourself from any dynamic StringDerived objects.
Not only is the destructor not virtual, std::string contains no virtual functions at all, and no protected members. That makes it very hard for the derived class to modify its functionality.
Then why would you derive from it?
Another problem with being non-polymorphic is that if you pass your derived class to a function expecting a string parameter, your extra functionality will just be sliced off and the object will be seen as a plain string again.
Why should one not derive from c++ std string class?
Because it is not necessary. If you want to use DerivedString for functionality extension; I don't see any problem in deriving std::string. The only thing is, you should not interact between both classes (i.e. don't use string as a receiver for DerivedString).
Is there any way to prevent client from doing Base* p = new Derived()
Yes. Make sure that you provide inline wrappers around Base methods inside Derived class. e.g.
class Derived : protected Base { // 'protected' to avoid Base* p = new Derived
const char* c_str () const { return Base::c_str(); }
//...
};
There are two simple reasons for not deriving from a non-polymorphic class:
Technical: it introduces slicing bugs (because in C++ we pass by value unless otherwise specified)
Functional: if it is non-polymorphic, you can achieve the same effect with composition and some function forwarding
If you wish to add new functionalities to std::string, then first consider using free functions (possibly templates), like the Boost String Algorithm library does.
If you wish to add new data members, then properly wrap the class access by embedding it (Composition) inside a class of your own design.
EDIT:
#Tony noticed rightly that the Functional reason I cited was probably meaningless to most people. There is a simple rule of thumb, in good design, that says that when you can pick a solution among several, you should consider the one with the weaker coupling. Composition has weaker coupling that Inheritance, and thus should be preferred, when possible.
Also, composition gives you the opportunity to nicely wrap the original's class method. This is not possible if you pick inheritance (public) and the methods are not virtual (which is the case here).
The C++ standard states that If Base class destructor is not virtual and you delete an object of Base class that points to the object of an derived class then it causes an undefined Behavior.
C++ standard section 5.3.5/3:
if the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand’s dynamic type and the static type shall have a virtual destructor or the behavior is undefined.
To be clear on the Non-polymorphic class & need of virtual destructor
The purpose of making a destructor virtual is to facilitate the polymorphic deletion of objects through delete-expression. If there is no polymorphic deletion of objects, then you don't need virtual destructor's.
Why not to derive from String Class?
One should generally avoid deriving from any standard container class because of the very reason that they don' have virtual destructors, which make it impossible to delete objects polymorphically.
As for the string class, the string class doesn't have any virtual functions so there is nothing that you can possibly override. The best you can do is hide something.
If at all you want to have a string like functionality you should write a class of your own rather than inherit from std::string.
As soon as you add any member (variable) into your derived std::string class, will you systematically screw the stack if you attempt to use the std goodies with an instance of your derived std::string class? Because the stdc++ functions/members have their stack pointers[indexes] fixed [and adjusted] to the size/boundary of the (base std::string) instance size.
Right?
Please, correct me if I am wrong.