C++ typedef versus unelaborated inheritance - c++

I have a data structure made of nested STL containers:
typedef std::map<Solver::EnumValue, double> SmValueProb;
typedef std::map<Solver::VariableReference, Solver::EnumValue> SmGuard;
typedef std::map<SmGuard, SmValueProb> SmTransitions;
typedef std::map<Solver::EnumValue, SmTransitions> SmMachine;
This form of the data is only used briefly in my program, and there's not much behavior that makes sense to attach to these types besides simply storing their data. However, the compiler (VC++2010) complains that the resulting names are too long.
Redefining the types as subclasses of the STL containers with no further elaboration seems to work:
typedef std::map<Solver::EnumValue, double> SmValueProb;
class SmGuard : public std::map<Solver::VariableReference, Solver::EnumValue> { };
class SmTransitions : public std::map<SmGuard, SmValueProb> { };
class SmMachine : public std::map<Solver::EnumValue, SmTransitions> { };
Recognizing that the STL containers aren't intended to be used as a base class, is there actually any hazard in this scenario?

There is one hazard: if you call delete on a pointer to a base class with no virtual destructor, you have Undefined Behavior. Otherwise, you are fine.
At least that's the theory. In practice, under the MSVC ABI and the Itanium ABI (gcc, Clang, icc, ...), calling delete through a pointer to a base class with no virtual destructor only causes a problem if your derived class adds non-static data members with a non-trivial destructor (e.g. a std::string). gcc and Clang warn about such deletes with -Wdelete-non-virtual-dtor, provided the class has virtual methods.
In your specific case, this seems fine... but...
... you might still want to encapsulate (using Composition) and expose meaningful (business-oriented) methods. Not only will it be less hazardous, it will also be easier to understand than it->second.find('x')->begin()...
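A minimal sketch of what the composition alternative could look like. The names ValueProb, set, probabilityOf and contains are made up for illustration, and int stands in for Solver::EnumValue, which the question does not show:

```cpp
#include <map>

// Hypothetical stand-in for Solver::EnumValue; the real type is not shown in the question.
typedef int EnumValue;

// Wraps the map instead of inheriting from it, so no std::map pointer
// can ever refer to (or delete) a ValueProb.
class ValueProb {
public:
    void set(EnumValue value, double probability) { probs_[value] = probability; }

    // Returns the stored probability, or 0.0 when the value is unknown.
    double probabilityOf(EnumValue value) const {
        std::map<EnumValue, double>::const_iterator it = probs_.find(value);
        return it == probs_.end() ? 0.0 : it->second;
    }

    bool contains(EnumValue value) const { return probs_.count(value) != 0; }

private:
    std::map<EnumValue, double> probs_;
};
```

Callers then write vp.probabilityOf(x) instead of chaining find and ->second, and the class name is short, which also sidesteps the long-name complaint from the compiler.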

Yes there is:
std::map<Solver::VariableReference, Solver::EnumValue>* x = new SmGuard;
delete x;
results in undefined behavior.

This is one of the controversial points of C++ versus "inheritance-based classical OOP".
There are two aspects that must be taken into consideration:
a typedef introduces another name for the same type: std::map<Solver::EnumValue, double> and SmValueProb are, to all effects, exactly the same thing and can be used interchangeably.
a class introduces a new type that is (in principle) unrelated to anything else.
Class relations are defined by the way the class is "made up" and by what makes implicit operations and conversions possible with other types.
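The difference between the two aspects can be checked at compile time. This is only a sketch with made-up names (AliasMap, DerivedMap, describe), and it assumes a compiler with static_assert and <type_traits> (C++11, or VC++2010 and later):

```cpp
#include <map>
#include <string>
#include <type_traits>

typedef std::map<int, double> AliasMap;               // another name for the same type
class DerivedMap : public std::map<int, double> {};   // a new, distinct type

// Overload resolution can tell DerivedMap apart from std::map, but not AliasMap.
inline std::string describe(const std::map<int, double>&) { return "std::map"; }
inline std::string describe(const DerivedMap&)            { return "DerivedMap"; }

static_assert(std::is_same<AliasMap, std::map<int, double> >::value,
              "a typedef does not create a new type");
static_assert(!std::is_same<DerivedMap, std::map<int, double> >::value,
              "a derived class is a different type");
```

A DerivedMap still converts implicitly to std::map<int, double>&, which is exactly the "network of possible operations" between the two types.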
Outside of specific programming paradigms (like OOP, which associates inheritance with the "is-a" relation), inheritance, implicit constructors, implicit casts and so on all do the same thing: they let a type be used across the interface of another type, thus defining a network of possible operations across different types. This is (generally speaking) "polymorphism".
Various programming paradigms exist that say how such a network should be structured, each attempting to optimize a specific aspect of programming: the representation of runtime-replaceable objects (classical OOP), the representation of compile-time-replaceable objects (CRTP), the use of generic algorithmic functions for different types (generic programming), the use of "pure functions" to express algorithm composition (functional programming and lambda "captures").
All of them dictate some "rules" about how language "features" must be used since, C++ being multiparadigm, none of its features alone satisfies the requirements of a paradigm, which leaves some rough edges.
As Luchian said, inheriting from std::map will not produce a pure OOP replaceable type, since a delete through a base pointer will not know how to destroy the derived part: the destructor is non-virtual by design.
But in fact this is just a particular case: pbase->find will not call your own find method either, should you define one, because std::map::find is not virtual. (But this is not undefined: it is very well defined, just most likely not what you intend.)
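To make the find case concrete, here is a small sketch. CountingMap and containsKey are hypothetical names invented for this illustration:

```cpp
#include <map>

// Hypothetical derived map whose find() counts lookups.
class CountingMap : public std::map<int, int> {
public:
    CountingMap() : lookups(0) {}

    // Hides std::map::find; it does not override it, because find is not virtual.
    iterator find(const int& key) {
        ++lookups;
        return std::map<int, int>::find(key);
    }

    int lookups;
};

// Sees only the static type std::map, so it always calls std::map::find.
inline bool containsKey(std::map<int, int>& m, int key) {
    return m.find(key) != m.end();
}
```

Calling m.find(1) on a CountingMap bumps the counter, while containsKey(m, 1) finds the same element without touching it. Nothing is undefined here; the derived function is simply never reached through the base interface.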
The real question is a different one: is the "classic OOP substitution principle" important in your design or not?
In other words, are you going to use your classes and their bases interchangeably, with functions that just take a std::map* or std::map& parameter, expecting the calls those functions make to std::map functions to end up in your own methods?
If yes, inheritance is NOT THE WAY TO GO. There are no virtual methods in std::map, hence runtime polymorphism will not work.
If no, that is, you're just writing your own class that reuses both std::map behavior and interface, with no intention of interchanging their usage (in particular, you are not allocating your own classes with new and deleting them with delete applied to a std::map pointer), and providing just a set of functions taking yourclass& or yourclass* as parameters, then that's perfectly fine. It may even be better than a typedef, since your functions can no longer be used with a plain std::map, thus separating the functionalities.
The alternative can be "encapsulation": make the map an explicit member of your class, either leaving the map accessible as a public member, or making it a private member with an accessor function, or rewriting the map interface yourself in your class. You finally get an unrelated type with the same interface and its own behavior, at the cost of rewriting the entire interface of something that may have hundreds of methods.
NOTE:
To anyone thinking about the danger of the missing virtual dtor, note that encapsulating with public visibility won't solve the problem:
class myclass: public std::map<something...>
{};
std::map<something...>* p = new myclass;
delete p;
is UB exactly like
class myclass
{
public:
std::map<something...> mp;
};
std::map<something...>* p = &((new myclass)->mp);
delete p;
The second sample has the same mistake as the first, it is just less common: both try to use a pointer to a partial object to operate on the entire one, with nothing in the partial object that lets you know what the "containing one" is.

Related

Is Abstract class an example of Abstract data type?

I'm getting confused by these two. What I learned is that Abstract data type is a mathematical model for data type, where it specifies the objects and the methods to manipulate these objects without specifying the details about the implementation of the objects and methods. Ex: an abstract stack model defines a stack with push and pop operations to insert and delete items to and from the stack. We can implement this in many ways, by using linked lists, arrays or classes.
Now, coming to the definition of abstract class, its a parent class which has one or more methods that doesn't have definition(implementation?) and cannot be instantiated (much like we can't implement an abstract stack as it is, without defining the stack's underlying mechanism through one of the concrete data structures). For ex: if we have an abstract class called Mammal which includes a function called eat(), we don't know how a mammal eats because a mammal is abstract. Although we can define eat() for a cow which is a derived class of mammal. Does this mean that mammal serves as an adt and cow class is an implementation of the mammal adt?
Correct me if I'm wrong in any way. Any kind of help would be really appreciated.
Abstract data type is a mathematical model for data type...
Now, coming to the definition of abstract class...
You need to distinguish between theoretical mathematical models and practical implementation techniques.
Models are created by people in order to reason about problems easily, in some comprehensible, generalized way.
Meanwhile, the actual code is written in order to work and get the job done.
"Abstract data type" is a model. "Abstract class" is a programming technique which some programming languages (C++, C#, Java) support on the language level.
"Abstract data type" lets you think and talk about the solution of a problem, without overloading your brain with (at this moment) unnecessary implementation details. When you need a LIFO data structure, you just say "stack", not "a doubly-linked list with the pointer to the head node and the ability to...".
"Abstract class" lets you write the code once and then reuse it later (because that is the point of OOP - code reuse). When you see that several types have a common interface and functionality, you may create "an abstract class" and put the intersection of their functionality inside, while still being able to rely on as yet unimplemented functions, which will be implemented by some concrete type later. This way, you write the code once, and when you need to change it later, there is only one place to make the change.
Note:
Although, in C++ ISO Standard (at least in the draft) there is a note:
Note: The abstract class mechanism supports the notion of a general concept,
such as a shape, of which only more concrete variants, such as circle
and square, can actually be used.
but it is just a note. The real definition is:
A class is abstract if it has at least one pure (aka unimplemented) virtual function.
which leads to the obvious constraint:
no objects of an abstract class can be created except as subobjects of
a class derived from it
Personally, I like that C++ (unlike C# and Java) doesn't have the keyword "abstract". It only has type inheritance and virtual functions (which may remain unimplemented). This helps you focus on a practical matter: inherit where needed, override where necessary.
In a nutshell, using OOP - be pragmatic.
The term "abstract data type" is not directly related to anything in C++. An abstract class is one of the potential strategies for implementing abstract data types in the language, but there are many more techniques to do that.
So abstract base classes allow you to define a set of derived classes and give you the guarantee that every declared interface function also has an implementation; if one is missing, the compiler reports an error, because a class with an unimplemented pure virtual function cannot be instantiated.
But you can also use compile-time polymorphism and related techniques like CRTP to obtain abstract data types.
So you have to decide which features you need and what price you want to pay for them. Runtime polymorphism comes with the extra cost of the vtable and vtable dispatching, but with the benefit of late binding. Compile-time polymorphism comes with the benefit of code that is much easier to optimize, typically with faster execution and smaller code size. Both give you errors if an interface is not implemented, at the latest at the linker stage.
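As a rough illustration of the two options, here is the same stack interface once as an abstract class and once via CRTP. All names (IntStackInterface, IntStackCrtp, VectorBackedStack, doPush, doPop, pushTwoPopOne) are invented for this sketch:

```cpp
#include <vector>

// Run-time polymorphism: the interface is an abstract class, calls go through the vtable.
class IntStackInterface {
public:
    virtual ~IntStackInterface() {}
    virtual void push(int value) = 0;
    virtual int pop() = 0;
};

// Compile-time polymorphism (CRTP): the base forwards to the derived class
// via static_cast, so no virtual functions are involved.
template <class Derived>
class IntStackCrtp {
public:
    void push(int value) { static_cast<Derived*>(this)->doPush(value); }
    int pop() { return static_cast<Derived*>(this)->doPop(); }
};

class VectorBackedStack : public IntStackCrtp<VectorBackedStack> {
public:
    void doPush(int value) { data_.push_back(value); }
    int doPop() {
        int top = data_.back();   // precondition: the stack is not empty
        data_.pop_back();
        return top;
    }
private:
    std::vector<int> data_;
};

// Generic code written against the CRTP interface; resolved entirely at compile time.
template <class S>
int pushTwoPopOne(IntStackCrtp<S>& stack) {
    stack.push(1);
    stack.push(2);
    return stack.pop();
}
```

If VectorBackedStack forgot to provide doPop, the error would show up when pushTwoPopOne is instantiated, whereas a missing override of IntStackInterface::pop would show up when someone tries to create an object of the derived class.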
But the relation between abstract data types and polymorphism, whether runtime or compile time, is not 1:1. Abstraction can also be achieved by simply defining an interface which must be fulfilled somewhere.
In short: abstract data types are not directly represented in C++, while the abstract base class is a C++ technique.
Is Abstract class an example of Abstract data type?
Yes, but in C++, abstract classes have become an increasingly rare example of abstract data types, because generic programming is often a superior alternative.
Ex: an abstract stack model defines a stack with push and pop
operations to insert and delete items to and from the stack. We can
implement this in many ways, by using linked lists, arrays or classes.
The C++ std::stack class template more or less works like this. It has member functions push and pop, and it's implemented in terms of the Container type parameter, which defaults to std::deque.
For an implementation with a linked list, you'd type std::stack<int, std::list<int>>. However, arrays cannot be used to implement a stack, because a stack can grow and shrink, and arrays have a fixed size.
It's very important to understand that the std::stack has absolutely nothing to do with abstract classes or runtime polymorphism. There's not a single virtual function involved.
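A short sketch of how the Container parameter swaps the implementation without any virtual function. The helper name pushAndDrain is made up for this example:

```cpp
#include <list>
#include <stack>
#include <vector>

// Pushes 1..3 and returns the sum of the elements as they are popped.
// Works with any std::stack specialisation holding ints.
template <class Stack>
int pushAndDrain(Stack& s) {
    for (int i = 1; i <= 3; ++i) s.push(i);
    int sum = 0;
    while (!s.empty()) {
        sum += s.top();   // std::stack::pop() returns void, so read top() first
        s.pop();
    }
    return sum;
}
```

The same function works with std::stack<int>, std::stack<int, std::list<int> > and std::stack<int, std::vector<int> >; the choice of container is made at compile time.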
Now, coming to the definition of abstract class, its a parent class
which has one or more methods that doesn't have
definition(implementation?) and cannot be instantiated
Yes, that's precisely the definition of an abstract class in C++.
In theory, such a stack class could look like this:
template <class T>
class Stack
{
public:
    virtual ~Stack() {}  // a pure virtual destructor would still need a definition
    virtual void push(T const& value) = 0;
    virtual T pop() = 0;
};
In this example, the element type is still generic, but the implementation of the container is meant to be provided by a concrete derived class. Such container designs are idiomatic in other languages, but not in C++.
much like we can't implement an abstract stack as it is, without defining the stack's underlying mechanism through one of the concrete data structures
Yes, you couldn't use std::stack without providing a container type parameter (but that's impossible anyway, because there's the default std::deque parameter), and you cannot instantiate a Stack<int> my_stack; either.
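For completeness, a sketch of how a concrete class could supply the implementation for such an abstract Stack. VectorStack is a hypothetical name, and the destructor is given an empty body here so that derived classes link:

```cpp
#include <vector>

template <class T>
class Stack {
public:
    virtual ~Stack() {}
    virtual void push(T const& value) = 0;
    virtual T pop() = 0;
};

// Hypothetical concrete implementation backed by std::vector.
template <class T>
class VectorStack : public Stack<T> {
public:
    virtual void push(T const& value) { items_.push_back(value); }
    virtual T pop() {
        T top = items_.back();   // precondition: the stack is not empty
        items_.pop_back();
        return top;
    }
private:
    std::vector<T> items_;
};
```

Code that takes a Stack<int>& can then be handed a VectorStack<int>, and the container choice is made at run time rather than through a template parameter.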

Vector of pointers to base type, find all instances of a given derived type stored in a base type

Suppose you have a base class inside of a library:
class A {};
and derived classes
class B: public A {};
class C: public A {};
Now Instances of B and C are stored in a std::vector of boost::shared_ptr<A>:
std::vector<boost::shared_ptr<A> > A_vec;
A_vec.push_back(boost::shared_ptr<B>(new B()));
A_vec.push_back(boost::shared_ptr<C>(new C()));
Adding instances of B and C is done by a user, and there is no way to determine in advance the order, in which they will be added.
However, inside of the library, there may be a need to perform specific actions on B and C, so the pointer to the base class needs to be cast to B and C.
I can of course do "trial and error" conversions, i.e. try to cast to B and C (and any other derivative of the base class), until I find a conversion that doesn't throw. However, this method seems very crude and error-prone, and I'm looking for a more elegant (and better performing) way.
I am looking for a solution that will also work with C++98, but may involve boost functionality.
Any ideas ?
EDIT:
O.k., thanks for all the answers so far!
I'd like to give some more details regarding the use-case. All of this happens in the context of parametric optimization.
Users define the optimization problem by:
Specifying the parameters, i.e. their types (e.g. "constrained double", "constrained integer", "unconstrained double", "boolean", etc.) and initial values
Specifying the evaluation function, which assigns one or more evaluations (double values) to a given parameter set
Different optimization algorithms then act on the problem definitions, including their parameters.
There is a number of predefined parameter objects for common cases, but users may also create their own parameter objects, by deriving from one of my base classes. So from a library perspective, apart from the fact that the parameter objects need to comply with a given (base-class) API, I cannot assume much about parameter objects.
The problem definition is a user-defined C++-class, derived from a base-class with a std::vector interface. The user adds his (predefined or home-grown) parameter objects and overloads a fitness-function.
Access to the parameter objects may happen
from within the optimization algorithms (usually o.k., even for home-grown parameter objects, as derived parameter objects need to provide access functions for their values).
from within the user-supplied fitness function (usually o.k., as the user knows where to find which parameter object in the collection and its value can be accessed easily)
This works fine.
There may however be special cases where
a user wants to access specifics of his home-grown parameter types
a third party has supplied the parameter structure (this is an Open Source library, others may add code for specific optimization problems)
the parameter structure (i.e. which parameters are where in the vector) may be modified as part of the optimization problem --> example: training of the architecture of a neural network
Under these circumstances it would be great to have an easy method to access all parameter objects of a given derived type inside of the collection of base types.
I already have a templated "conversion_iterator". It iterates over the vector of base objects and skips those that do not comply with the desired target type. However, this is based on "trial and error" conversion (i.e. I check whether the converted smart pointer is NULL), which I find very inelegant and error-prone.
I'd love to have a better solution.
NB: The optimization library is targeted at use-cases where the evaluation step for a given parameter set may last arbitrarily long (usually seconds, possibly hours or longer). So speed of access to parameter types is not much of an issue. But stability and maintainability are ...
There’s no better general solution than trying to cast and seeing whether it succeeds. You can alternatively derive the dynamic typeid and compare it to all types in turn, but that is effectively the same amount of work.
More fundamentally, your need to do this hints at a design problem: the whole purpose of a base class is to be able to treat children as if they were parents. There are certain situations where this is necessary though, in which case you’d use a visitor to dispatch them.
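A minimal sketch of such a visitor, assuming A can be given a virtual accept function. Visitor, CountingVisitor and visitAll are made-up names, and std::shared_ptr is used in place of boost::shared_ptr only to keep the example free of external dependencies:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

class B;
class C;

// One visit() overload per derived type known to the library.
class Visitor {
public:
    virtual ~Visitor() {}
    virtual void visit(B& b) = 0;
    virtual void visit(C& c) = 0;
};

class A {
public:
    virtual ~A() {}
    virtual void accept(Visitor& v) = 0;
};

class B : public A {
public:
    virtual void accept(Visitor& v) { v.visit(*this); }
};

class C : public A {
public:
    virtual void accept(Visitor& v) { v.visit(*this); }
};

// Example visitor: counts how many Bs and Cs it is shown.
class CountingVisitor : public Visitor {
public:
    CountingVisitor() : bCount(0), cCount(0) {}
    virtual void visit(B&) { ++bCount; }
    virtual void visit(C&) { ++cCount; }
    int bCount;
    int cCount;
};

inline void visitAll(const std::vector<std::shared_ptr<A> >& items, Visitor& v) {
    for (std::size_t i = 0; i < items.size(); ++i) items[i]->accept(v);
}
```

The catch is that the Visitor interface has to list every derived type up front, so this fits a closed set of types better than a library where users add their own derived classes.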
If possible, add virtual methods to class A to do the "specific actions on B and C".
If that's not possible or not reasonable, use the pointer form of dynamic_cast, so there are no exceptions involved.
for (boost::shared_ptr<A> a : A_vec)
{
if (B* b = dynamic_cast<B*>(a.get()))
{
b->do_something();
}
else if (C* c = dynamic_cast<C*>(a.get()))
{
something_else(*c);
}
}
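The same pointer-style cast can be wrapped once in a helper that gathers all elements of one derived type. This is only a sketch: collect is a made-up name, std::shared_ptr and std::dynamic_pointer_cast stand in for their Boost counterparts (boost::shared_ptr and boost::dynamic_pointer_cast), and A is assumed to have a virtual destructor, since dynamic_cast needs a polymorphic base:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

class A { public: virtual ~A() {} };   // dynamic_cast needs a polymorphic base
class B : public A {};
class C : public A {};

// Returns every element whose dynamic type is Target or derived from Target.
template <class Target>
std::vector<std::shared_ptr<Target> >
collect(const std::vector<std::shared_ptr<A> >& items) {
    std::vector<std::shared_ptr<Target> > matches;
    for (std::size_t i = 0; i < items.size(); ++i) {
        // Yields an empty pointer, not an exception, when the cast fails.
        std::shared_ptr<Target> p = std::dynamic_pointer_cast<Target>(items[i]);
        if (p) matches.push_back(p);
    }
    return matches;
}
```

A call such as collect<B>(A_vec) then returns only the B instances, and it also works for derived types the library has never heard of.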
Adding instances of B and C is done by a user, and there is no way to determine in advance the order, in which they will be added.
Okay, so just put them in two different containers?
std::vector<boost::shared_ptr<A> > A_vec;
std::vector<boost::shared_ptr<B> > B_vec;
std::vector<boost::shared_ptr<C> > C_vec;
void add(B * p)
{
    B_vec.push_back(boost::shared_ptr<B>(p));
    A_vec.push_back(B_vec.back());  // share ownership with the entry just added
}
void add(C * p)
{
    C_vec.push_back(boost::shared_ptr<C>(p));
    A_vec.push_back(C_vec.back());
}
Then you can iterate over the Bs or Cs to your heart's content.
I would suggest implementing a method in the base class (e.g. TypeOf()) which returns the type of the particular object. Make sure you define that method as pure virtual so that you are forced to implement it in the derived types. As for the type itself, you can define an enum with one value per type (e.g. per class).
enum class ClassType { ClassA, ClassB, ClassC };
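A sketch of how such a type tag might be used. The members onlyInB and onlyInC and the function specificAction are invented for illustration, and enum class requires C++11, so a plain enum would be needed for C++98:

```cpp
enum class ClassType { ClassA, ClassB, ClassC };

class A {
public:
    virtual ~A() {}
    virtual ClassType TypeOf() const = 0;   // every derived class must answer
};

class B : public A {
public:
    virtual ClassType TypeOf() const { return ClassType::ClassB; }
    int onlyInB() const { return 1; }        // hypothetical B-specific member
};

class C : public A {
public:
    virtual ClassType TypeOf() const { return ClassType::ClassC; }
    int onlyInC() const { return 2; }        // hypothetical C-specific member
};

// Dispatches on the tag, then uses static_cast, which is safe only as long as
// each class reports its own tag correctly.
inline int specificAction(const A& a) {
    switch (a.TypeOf()) {
        case ClassType::ClassB: return static_cast<const B&>(a).onlyInB();
        case ClassType::ClassC: return static_cast<const C&>(a).onlyInC();
        default:                return 0;
    }
}
```

Note that the enum has to be extended for every new derived class, which may be awkward when users of the library add their own types.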
This answer might interest you: Generating an interface without virtual functions?
This shows you both approaches
variant w/visitor in a single collection
separate collections,
as have been suggested by others (Fred and Konrad, notably). The latter is more efficient for iteration, the former could well be more pure and maintainable. It could even be more efficient too, depending on the usage patterns.

Why does C++ not let baseclasses implement a derived class' inherited interface?

Here is what I am talking about
// some guy wrote this, used as a Policy with templates
struct MyWriter {
void write(std::vector<char> const& data) {
// ...
}
};
In some existing code, the people did not use templates, but interfaces+type-erasure
class IWriter {
public:
virtual ~IWriter() {}
public:
virtual void write(std::vector<char> const& data) = 0;
};
Someone else wanted to be usable with both approaches and writes
class MyOwnClass: private MyWriter, public IWriter {
// other stuff
};
MyOwnClass is implemented-in-terms-of MyWriter. Why doesn't MyOwnClass' inherited member functions implement the interface of IWriter automatically? Instead the user has to write forwarding functions that do nothing but call the base class versions, as in
class MyOwnClass: private MyWriter, public IWriter {
public:
void write(std::vector<char> const& data) {
MyWriter::write(data);
}
};
I know that in Java when you have a class that implements an interface and derives from a class that happens to have suitable methods, that base class automatically implements the interface for the derived class.
Why doesn't C++ do that? It seems like a natural thing to have.
This is multiple inheritance, and there are two inherited functions with the same signature, both of which have implementation. That's where C++ is different from Java.
Calling write on an expression whose static type is MyOwnClass would therefore be ambiguous as to which of the inherited functions was desired.
If write is only called through base class pointers, then defining write in the derived class is NOT necessary, contrary to the claim in the question. Now that the question changed to include a pure specifier, implementing that function in the derived class is necessary to make the class concrete and instantiable.
MyWriter::write cannot be used for the virtual call mechanism of MyOwnClass, because the virtual call mechanism requires a function that accepts an implicit IWriter* const this, and MyWriter::write accepts an implicit MyWriter* const this. A new function is required, which must take into account the address difference between the IWriter subobject and the MyWriter subobject.
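The address difference can be observed directly. In this sketch MyWriter is given a hypothetical data member (bytesWritten) so that neither base subobject is empty, and the accessors writerAddress, interfaceAddress and written exist only for the demonstration:

```cpp
#include <cstddef>
#include <vector>

struct MyWriter {
    MyWriter() : bytesWritten(0) {}
    void write(std::vector<char> const& data) { bytesWritten += data.size(); }
    std::size_t bytesWritten;   // hypothetical member so the subobject is not empty
};

class IWriter {
public:
    virtual ~IWriter() {}
    virtual void write(std::vector<char> const& data) = 0;
};

class MyOwnClass : private MyWriter, public IWriter {
public:
    // The forwarding function: it converts this to MyWriter*
    // (adjusting the address if needed) before making the call.
    virtual void write(std::vector<char> const& data) { MyWriter::write(data); }

    const void* writerAddress() const { return static_cast<const MyWriter*>(this); }
    const void* interfaceAddress() const { return static_cast<const IWriter*>(this); }
    std::size_t written() const { return bytesWritten; }
};
```

The two addresses differ because both base subobjects occupy storage inside the same MyOwnClass object, which is why a function expecting a MyWriter* cannot be called directly with the IWriter* that the virtual call supplies.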
It would be theoretically possible for the compiler to create this new function automatically, but it would be fragile, since a change in a base class could suddenly cause a new function to be chosen for forwarding. It's less fragile in Java, where only single inheritance is possible (there's only one choice for what function to forward to), but in C++, which supports full multiple inheritance, the choice is ambiguous, and we haven't even started on diamond inheritance or virtual inheritance yet.
Actually, this problem (difference between subobject addresses) is solved for virtual inheritance. But it requires additional overhead that's not necessary most of the time, and a C++ guiding principle is "you don't pay for what you don't use".
Why doesn't C++ do that? It seems like a natural thing to have.
Actually, no, it is an extremely unnatural thing to have.
Please note that my reasoning is based on my own understanding of "common sense" and can be fundamentally flawed as a result.
You see, you have two different methods, first one in MyWriter, which is non virtual and second one in IWriter which is virtual. They are completely different despite "looking" similar.
I suggest checking this question. The good thing about non-virtual methods is that no matter what you do, as long as they don't call virtual methods, their behavior will never change. I.e. somebody deriving from your class with non-virtual methods will not break existing methods by masking them. Virtual methods are designed to be overridden. The price of that is that it is possible to break underlying logic by improperly overriding a virtual method. And this is the root of your problem.
Let's say what you propose is allowed (automatic conversion to virtual with multiple inheritance). There are two possible solutions:
Solution #1
MyWriter::write becomes virtual. Consequences: All existing C++ code in the world becomes easy to break via typo or name clash. The MyWriter method was not supposed to be overridden initially, so suddenly turning it into a virtual one will (Murphy's law) break the underlying logic of the MyWriter class when somebody derives from MyOwnClass. Which means that suddenly making MyWriter::write virtual is a bad idea.
Solution #2
MyWriter::write remains non-virtual BUUUT it is included temporarily as a virtual method into IWriter, until overridden. At first glance there's nothing to worry about, but let's think about it. IWriter implements some kind of concept you had in mind, and it is supposed to do something. MyWriter implements another concept. To assign MyWriter::write as the IWriter::write method you need two guarantees:
Compiler must ensure that MyWriter::write does what IWriter::write() is supposed to do.
Compiler must ensure that calling MyWriter::write from IWriter will not break existing functionality in MyWriter code programmer expects to use elsewhere.
So, the thing is that the compiler cannot guarantee that. The functions have a similar name and argument list, but by Murphy's law that means they're probably doing completely different things (sinf and cosf have the same argument list, for example), and it is unlikely that the compiler will be able to predict the future and make sure that at no point in development MyWriter will be changed in such a way that it becomes incompatible with IWriter. So, since the machine can't make a reasonable decision by itself (no AI for that), it has to ask YOU, the programmer: "What is it you wish to do?". And you say "redirect the virtual method into MyWriter::write(). It totally won't break anything. I think.".
And that's why you must specify which method you want to use manually....
Doing it automatically would be unintuitive and surprising. C++ does not assume that multiple base classes are related to each other, and protects the user against name collisions between their members by defining nested name specifiers for nonstatic members. Adding implicit declarations to MyOwnClass where signatures from IWriter and MyWriter collide would be antithetical to protecting names.
However, C++11 extensions do bring us closer. Consider this hypothetical syntax, which is not valid C++11 but builds on its final specifier:
class MyOwnClass: private MyWriter, public IWriter {
public:
void write(std::vector<char> const& data) final = MyWriter::write;
};
This mechanism would be safe because it expresses that MyWriter doesn't expect any further overrides, and convenient because it names the function signature that will be "joined" but nothing more. Also, final would be ill-formed if the function weren't implicitly virtual, so it checks that the signature matches the virtual interface.
On one hand, most interfaces don't just happen to match up this way. Defining this feature to work only with identical signatures would be safe but rarely useful. Defining it as a shortcut to a delegating function body would be useful but fragile. So it might not really be a good feature.
On the other hand, this is a good design pattern to provide functionality which isn't virtual when you don't need it to be. So given this idiom, we might use it to write good code, even if it doesn't match up well with current practices.
Why doesn't C++ do that?
I'm not sure what you're asking here. Could C++ be rewritten to allow this? Yes, but to what end?
Because MyWriter and IWriter are completely different classes, it is illegal in C++ to call a member of MyWriter through an instance of IWriter. The member pointers have completely different types. And just as a MyWriter* is not convertible to an IWriter*, neither is a void (MyWriter::*)(const std::vector<char>&) convertible to a void (IWriter::*)(const std::vector<char>&).
The rules of C++ don't change just because there could be a third class that combines the two. Neither class is a direct parent/child relative of one another. Therefore, they are treated as entirely distinct classes.
Remember: member functions always take an additional parameter: a this pointer to the object that they operate on. You cannot call void (MyWriter::*)(const std::vector<char>&) on an IWriter*. The third class can have a method that casts itself into the proper base class, but it must actually have this method. So either you or the C++ compiler must create it. The rules of C++ require this.
Consider what would have to happen to make this work without a derived-class method.
A function gets an IWriter*. The user calls the write member of it, using nothing more than the IWriter* pointer. So... exactly how can the compiler generate the code to call MyWriter::write? Remember: MyWriter::write needs a MyWriter instance. And there is no relationship between IWriter and MyWriter.
So how exactly could the compiler do the type coercion locally? The compiler would have to check the virtual function to see if the actual function to be called takes IWriter or some other type. If it takes another type, it would have to convert the pointer to its true type, then do another conversion to the type needed by the virtual function. After doing all of that, it would then be able to make the call.
All of this overhead would affect every virtual call. All of them would have to at least check which type the function actually being called expects. Every call would also have to generate the code to do the type conversions, just in case.
Every virtual function call would have a "get type" and conditional branch in it. Even if it is never possible to trigger that branch. So you would be paying for something regardless of whether you use it or not. That's not the C++ way.
Even worse, a straight v-table implementation of virtual calls is no longer possible. The fastest method of doing virtual dispatch would not be a conforming implementation. The C++ committee is not going to make any change that would make such implementations impossible.
Again, to what end? Just so that you don't have to write a simple forwarding function?
Just make MyWriter derive from IWriter, eliminate the IWriter derivation in MyOwnClass, and move on with life. This should resolve the problem and should not interfere with the template code.
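A sketch of what that restructuring might look like. It assumes MyOwnClass may inherit publicly from MyWriter (needed for the conversion to IWriter&), and bytesWritten, writeGreeting and writeThroughInterface are invented for the example:

```cpp
#include <cstddef>
#include <vector>

class IWriter {
public:
    virtual ~IWriter() {}
    virtual void write(std::vector<char> const& data) = 0;
};

// MyWriter now implements the interface itself; it still works as a template policy.
struct MyWriter : public IWriter {
    MyWriter() : bytesWritten(0) {}
    virtual void write(std::vector<char> const& data) { bytesWritten += data.size(); }
    std::size_t bytesWritten;   // hypothetical state, only here to make the calls observable
};

// No forwarding function needed: MyWriter::write is inherited as the final overrider.
// Assumption: the inheritance is public, so MyOwnClass converts to IWriter&.
class MyOwnClass : public MyWriter {
    // other stuff
};

// Template-based client, using the writer as a policy.
template <class Writer>
void writeGreeting(Writer& w) {
    std::vector<char> data(5, 'x');
    w.write(data);
}

// Interface-based client.
inline void writeThroughInterface(IWriter& w) {
    std::vector<char> data(2, 'y');
    w.write(data);
}
```

The price is that MyWriter gains a vtable pointer and its write becomes a virtual call even for template users, which may or may not matter for the policy-based code.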

Why should one not derive from c++ std string class?

I wanted to ask about a specific point made in Effective C++.
It says:
A destructor should be made virtual if a class needs to act like a polymorphic class. It further adds that since std::string does not have a virtual destructor, one should never derive from it. Also std::string is not even designed to be a base class, forget polymorphic base class.
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
Is the only reason that I should not derive from std::string class is it does not have a virtual destructor? For reusability purpose a base class can be defined and multiple derived class can inherit from it. So what makes std::string not even eligible as a base class?
Also, if there is a base class purely defined for reusability purpose and there are many derived types, is there any way to prevent client from doing Base* p = new Derived() because the classes are not meant to be used polymorphically?
I think this statement reflects the confusion here (emphasis mine):
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
In idiomatic C++, there are two uses for deriving from a class:
private inheritance, used for mixins and aspect oriented programming using templates.
public inheritance, used for polymorphic situations only. EDIT: Okay, I guess this could be used in a few mixin scenarios too -- such as boost::iterator_facade -- which show up when the CRTP is in use.
There is absolutely no reason to publicly derive a class in C++ if you're not trying to do something polymorphic. The language comes with free functions as a standard feature of the language, and free functions are what you should be using here.
Think of it this way -- do you really want to force clients of your code to convert to using some proprietary string class simply because you want to tack on a few methods? Because unlike in Java or C# (or most similar object oriented languages), when you derive a class in C++ most users of the base class need to know about that kind of a change. In Java/C#, classes are usually accessed through references, which are similar to C++'s pointers. Therefore, there's a level of indirection involved which decouples the clients of your class, allowing you to substitute a derived class without other clients knowing.
However, in C++, classes are value types -- unlike in most other OO languages. The easiest way to see this is what's known as the slicing problem. Basically, consider:
int StringToNumber(std::string copyMeByValue)
{
std::istringstream converter(copyMeByValue);
int result;
if (converter >> result)
{
return result;
}
throw std::logic_error("That is not a number.");
}
If you pass your own string to this method, the copy constructor for std::string will be called to make a copy, not the copy constructor for your derived object -- no matter what child class of std::string is passed. This can lead to inconsistency between your methods and anything attached to the string. The function StringToNumber cannot simply take whatever your derived object is and copy that, simply because your derived object probably has a different size than a std::string -- but this function was compiled to reserve only the space for a std::string in automatic storage. In Java and C# this is not a problem because the only thing like automatic storage involved are reference types, and the references are always the same size. Not so in C++.
Long story short -- don't use inheritance to tack on methods in C++. That's not idiomatic and results in problems with the language. Use non-friend, non-member functions where possible, followed by composition. Don't use inheritance unless you're template metaprogramming or want polymorphic behavior. For more information, see Scott Meyers' Effective C++ Item 23: Prefer non-member non-friend functions to member functions.
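As a minimal sketch of the non-member approach recommended above: the namespace text_util and the function starts_with are names made up for illustration, not part of any library discussed here. The point is only that the helper works on every std::string a client already has, with no new type to convert to.

```cpp
#include <string>

namespace text_util {  // hypothetical namespace, for illustration only

// Returns true when `text` begins with `prefix`.
// Uses only the public std::string interface, so no derived class is needed.
inline bool starts_with(const std::string& text, const std::string& prefix)
{
    return text.size() >= prefix.size()
        && text.compare(0, prefix.size(), prefix) == 0;
}

}  // namespace text_util
```

A caller would write text_util::starts_with(someString, "MYAPP") on any existing string, including one returned by third-party code.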
EDIT: Here's a more complete example showing the slicing problem. You can see its output on codepad.org
#include <iostream> // std::cout and std::endl
struct Base
{
int aMemberForASize;
Base() { std::cout << "Constructing a base." << std::endl; }
Base(const Base&) { std::cout << "Copying a base." << std::endl; }
~Base() { std::cout << "Destroying a base." << std::endl; }
};
struct Derived : public Base
{
int aMemberThatMakesMeBiggerThanBase;
Derived() { std::cout << "Constructing a derived." << std::endl; }
Derived(const Derived&) : Base() { std::cout << "Copying a derived." << std::endl; }
~Derived() { std::cout << "Destroying a derived." << std::endl; }
};
int SomeThirdPartyMethod(Base /* SomeBase */)
{
return 42;
}
int main()
{
Derived derivedObject;
{
//Scope to show the copy behavior of copying a derived.
Derived aCopy(derivedObject);
}
SomeThirdPartyMethod(derivedObject);
}
To offer the counter side to the general advice (which is sound when there are no particular verbosity/productivity issues evident)...
Scenario for reasonable use
There is at least one scenario where public derivation from bases without virtual destructors can be a good decision:
you want some of the type-safety and code-readability benefits provided by dedicated user-defined types (classes)
an existing base is ideal for storing the data, and allows low-level operations that client code would also want to use
you want the convenience of reusing functions supporting that base class
you understand that any additional invariants your data logically needs can only be enforced in code explicitly accessing the data as the derived type, and depending on the extent to which that will "naturally" happen in your design, and how much you can trust client code to understand and cooperate with the logically-ideal invariants, you may want member functions of the derived class to reverify expectations (and throw or whatever)
the derived class adds some highly type-specific convenience functions operating over the data, such as custom searches, data filtering / modifications, streaming, statistical analysis, (alternative) iterators
coupling of client code to the base is more appropriate than coupling to the derived class (as the base is either stable or changes to it reflect improvements to functionality also core to the derived class)
put another way: you want the derived class to continue to expose the same API as the base class, even if that means the client code is forced to change, rather than insulating it in some way that allows the base and derived APIs to grow out of sync
you're not going to be mixing pointers to base and derived objects in parts of the code responsible for deleting them
This may sound quite restrictive, but there are plenty of cases in real world programs matching this scenario.
Background discussion: relative merits
Programming is about compromises. Before you write a more conceptually "correct" program:
consider whether it requires added complexity and code that obfuscates the real program logic, and is therefore more error prone overall despite handling one specific issue more robustly,
weigh the practical costs against the probability and consequences of issues, and
consider "return on investment" and what else you could be doing with your time.
If the potential problems involve usage of the objects that you just can't imagine anyone attempting given your insights into their accessibility, scope and nature of usage in the program, or you can generate compile-time errors for dangerous use (e.g. an assertion that derived class size matches the base's, which would prevent adding new data members), then anything else may be premature over-engineering. Take the easy win in clean, intuitive, concise design and code.
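The compile-time check mentioned above could look roughly like this. NameList and contains are hypothetical names, and the sketch assumes C++11 for static_assert; it also assumes the usual case where a derived class without data members has the same size as its base, which holds on mainstream compilers but is not something the answer itself states.

```cpp
#include <string>
#include <vector>

// Hypothetical thin wrapper: adds member functions but no data.
class NameList : public std::vector<std::string>
{
public:
    // Illustrative convenience function built on the public vector API.
    bool contains(const std::string& name) const
    {
        for (const std::string& item : *this)
            if (item == name) return true;
        return false;
    }
};

// Stops compiling if someone later adds a data member to NameList,
// which is the change that would make slicing lose information.
static_assert(sizeof(NameList) == sizeof(std::vector<std::string>),
              "NameList must not add data members to its base");
```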
Reasons to consider derivation sans virtual destructor
Say you have a class D publicly derived from B. With no effort, the operations on B are possible on D (with the exception of construction, but even if there are a lot of constructors you can often provide effective forwarding by having one template for each distinct number of constructor arguments: e.g. template <typename T1, typename T2> D(const T1& t1, const T2& t2) : B(t1, t2) { }. C++0x variadic templates offer a better generalised solution.)
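For what it's worth, here is a sketch of how the constructor forwarding might look with C++11 features. B, D and D2 are placeholder types invented for the example; the first variant uses one variadic template, the second simply inherits the base constructors.

```cpp
#include <string>
#include <utility>

// Hypothetical base with several constructors.
struct B
{
    B() : text(), count(0) {}
    explicit B(const std::string& t) : text(t), count(1) {}
    B(const std::string& t, int c) : text(t), count(c) {}

    std::string text;
    int count;
};

// Option 1: a single variadic template forwards any argument list to B.
struct D : B
{
    template <typename... Args>
    explicit D(Args&&... args) : B(std::forward<Args>(args)...) {}
};

// Option 2: inherit the base class constructors directly.
struct D2 : B
{
    using B::B;
};
```

The inheriting-constructor form is usually the less surprising of the two, since the catch-all template in D also takes part in overload resolution when a D is copied.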
Further, if B changes then by default D exposes those changes - staying in sync - but someone may need to review extended functionality introduced in D to see if it remains valid, and the client usage.
Rephrasing this: there is reduced explicit coupling between base and derived class, but increased coupling between base and client.
This is often NOT what you want, but sometimes it is ideal, and other times a non issue (see next paragraph). Changes to the base force more client code changes in places distributed throughout the code base, and sometimes the people changing the base may not even have access to the client code to review or update it correspondingly. Sometimes it is better though: if you as the derived class provider - the "man in the middle" - want base class changes to feed through to clients, and you generally want clients to be able - sometimes forced - to update their code when the base class changes without you needing to be constantly involved, then public derivation may be ideal. This is common when your class is not so much an independent entity in its own right, but a thin value-add to the base.
Other times the base class interface is so stable that the coupling may be deemed a non issue. This is especially true of classes like Standard containers.
Summarily, public derivation is a quick way to get or approximate the ideal, familiar base class interface for the derived class - in a way that's concise and self-evidently correct to both the maintainer and client coder - with additional functionality available as member functions (which IMHO - which obviously differs with Sutter, Alexandrescu etc - can aid usability, readability and assist productivity-enhancing tools including IDEs)
C++ Coding Standards - Sutter & Alexandrescu - cons examined
Item 35 of C++ Coding Standards lists issues with the scenario of deriving from std::string. As scenarios go, it's good that it illustrates the burden of exposing a large but useful API, but both good and bad as the base API is remarkably stable - being part of the Standard Library. A stable base is a common situation, but no more common than a volatile one and a good analysis should relate to both cases. While considering the book's list of issues, I'll specifically contrast the issues' applicability to the cases of say:
a) class Issue_Id : public std::string { ...handy stuff... }; <-- public derivation, our controversial usage
b) class Issue_Id : public string_with_virtual_destructor { ...handy stuff... }; <- safer OO derivation
c) class Issue_Id { public: ...handy stuff... private: std::string id_; }; <-- a compositional approach
d) using std::string everywhere, with freestanding support functions
(Hopefully we can agree the composition is acceptable practice, as it provides encapsulation, type safety as well as a potentially enriched API over and above that of std::string.)
So, say you're writing some new code and start thinking about the conceptual entities in an OO sense. Maybe in a bug tracking system (I'm thinking of JIRA), one of them is say an Issue_Id. Data content is textual - consisting of an alphabetic project id, a hyphen, and an incrementing issue number: e.g. "MYAPP-1234". Issue ids can be stored in a std::string, and there will be lots of fiddly little text searches and manipulation operations needed on issue ids - a large subset of those already provided on std::string and a few more for good measure (e.g. getting the project id component, providing the next possible issue id (MYAPP-1235)).
On to Sutter and Alexandrescu's list of issues...
Nonmember functions work well within existing code that already manipulates strings. If instead you supply a super_string, you force changes through your code base to change types and function signatures to super_string.
The fundamental mistake with this claim (and most of the ones below) is that it promotes the convenience of using only a few types, ignoring the benefits of type safety. It's expressing a preference for d) above, rather than insight into c) or b) as alternatives to a). The art of programming involves balancing the pros and cons of distinct types to achieve reasonable reuse, performance, convenience and safety. The paragraphs below elaborate on this.
Using public derivation, the existing code can implicitly access the base class string as a string, and continue to behave as it always has. There's no specific reason to think that the existing code would want to use any additional functionality from super_string (in our case Issue_Id)... in fact it's often lower-level support code pre-existing the application for which you're creating the super_string, and therefore oblivious to the needs provided for by the extended functions. For example, say there's a non-member function to_upper(std::string&, std::string::size_type from, std::string::size_type to) - it could still be applied to an Issue_Id.
So, unless the non-member support function is being cleaned up or extended at the deliberate cost of tightly coupling it to the new code, then it needn't be touched. If it is being overhauled to support issue ids (for example, using the insight into the data content format to upper-case only leading alpha characters), then it's probably a good thing to ensure it really is being passed an Issue_Id by creating an overload ala to_upper(Issue_Id&) and sticking to either the derivation or compositional approaches allowing type safety. Whether super_string or composition is used makes no difference to effort or maintainability. A to_upper_leading_alpha_only(std::string&) reusable free-standing support function isn't likely to be of much use - I can't recall the last time I wanted such a function.
The impulse to use std::string everywhere isn't qualitatively different to accepting all your arguments as containers of variants or void*s so you don't have to change your interfaces to accept arbitrary data, but it makes for error prone implementation and less self-documenting and compiler-verifiable code.
Interface functions that take a string now need to: a) stay away from super_string's added functionality (unuseful); b) copy their argument to a super_string (wasteful); or c) cast the string reference to a super_string reference (awkward and potentially illegal).
This seems to be revisiting the first point - old code that needs to be refactored to use the new functionality, albeit this time client code rather than support code. If the function wants to start treating its argument as an entity for which the new operations are relevant, then it should start taking its arguments as that type and the clients should generate them and accept them using that type. The exact same issues exist for composition. Otherwise, c) can be practical and safe if the guidelines I list below are followed, though it is ugly.
super_string's member functions don't have any more access to string's internals than nonmember functions because string probably doesn't have protected members (remember, it wasn't meant to be derived from in the first place)
True, but sometimes that's a good thing. A lot of base classes have no protected data. The public string interface is all that's needed to manipulate the contents, and useful functionality (e.g. get_project_id() postulated above) can be elegantly expressed in terms of those operations. Conceptually, many times I've derived from Standard containers, I've wanted not to extend or customise their functionality along the existing lines - they're already "perfect" containers - rather I've wanted to add another dimension of behaviour that's specific to my application, and requires no private access. It's because they're already good containers that they're good to reuse.
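To make that concrete, here is one way the get_project_id() postulated above might be written using nothing but the public std::string interface. This is only a sketch of option a) from the list; the constructor and the behaviour for ids without a hyphen are assumptions made for the example.

```cpp
#include <string>

// Hypothetical Issue_Id using public derivation (option a above).
class Issue_Id : public std::string
{
public:
    explicit Issue_Id(const std::string& text) : std::string(text) {}

    // Project part before the hyphen, e.g. "MYAPP" for "MYAPP-1234".
    // Assumption: if there is no hyphen, the whole id is returned.
    std::string get_project_id() const
    {
        return substr(0, find('-'));
    }
};
```

Everything std::string offers (size(), find(), comparison and so on) stays available on an Issue_Id without any pass-through code.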
If super_string hides some of string's functions (and redefining a nonvirtual function in a derived class is not overriding, it's just hiding), that could cause widespread confusion in code that manipulates strings that started their life converted automatically from super_strings.
True for composition too - and more likely to happen as the code doesn't default to passing things through and hence staying in sync, and also true in some situations with run-time polymorphic hierarchies as well. Same-named functions that behave differently in classes that initially appear interchangeable - just nasty. This is effectively the usual caution for correct OO programming, and again not a sufficient reason to abandon the benefits in type safety etc.
What if super_string wants to inherit from string to add more state [explanation of slicing]
Agreed - not a good situation, and somewhere I personally tend to draw the line as it often moves the problems of deletion through a pointer to base from the realm of theory to the very practical - destructors aren't invoked for additional members. Still, slicing can often do what's wanted - given the approach of deriving super_string not to change its inherited functionality, but to add another "dimension" of application-specific functionality....
Admittedly, it's tedious to have to write passthrough functions for the member functions you want to keep, but such an implementation is vastly better and safer than using public or nonpublic inheritance.
Well, certainly agree about the tedium....
Guidelines for successful derivation sans virtual destructor
ideally, avoid adding data members in derived class: variants of slicing can accidentally remove data members, corrupt them, fail to initialise them...
even more so - avoid non-POD data members: deletion via base-class pointer is technically undefined behaviour anyway, but with non-POD types failing to run their destructors is more likely to have non-theoretical problems with resource leaks, bad reference counts etc.
honour the Liskov Substitution Principle / you can't robustly maintain new invariants
for example, in deriving from std::string you can't intercept a few functions and expect your objects to remain uppercase: any code that accesses them via a std::string& or ...* can use std::string's original function implementations to change the value
derive to model a higher level entity in your application, to extend the inherited functionality with some functionality that uses but doesn't conflict with the base; do not expect or try to change the basic operations - and access to those operations - granted by the base type
be aware of the coupling: base class can't be removed without affecting client code even if the base class evolves to have inappropriate functionality, i.e. your derived class's usability depends on the ongoing appropriateness of the base
sometimes even if you use composition you'll need to expose the data member due to performance, thread safety issues or lack of value semantics - so the loss of encapsulation from public derivation isn't tangibly worse
the more likely people using the potentially-derived class will be unaware of its implementation compromises, the less you can afford to make them dangerous
therefore, low-level widely deployed libraries with many ad-hoc casual users should be more wary of dangerous derivation than localised use by programmers routinely using the functionality at application level and/or in "private" implementation / libraries
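The uppercase guideline above can be demonstrated in a few lines. UpperString and append_suffix are invented for this sketch; the free function stands in for any existing code that takes a std::string& and knows nothing about the derived type.

```cpp
#include <cctype>
#include <string>

// Hypothetical class that tries to keep its contents uppercase.
class UpperString : public std::string
{
public:
    explicit UpperString(const std::string& text) { assign_upper(text); }

    // Replaces the contents with an uppercased copy of `text`.
    void assign_upper(const std::string& text)
    {
        clear();
        for (char c : text)
            push_back(static_cast<char>(
                std::toupper(static_cast<unsigned char>(c))));
    }
};

// Ordinary code written against std::string; it bypasses the invariant
// because it calls std::string's own operator+= on the base subobject.
inline void append_suffix(std::string& s) { s += "-draft"; }
```

After append_suffix is applied to an UpperString holding "ABC", the object contains lowercase characters, and nothing in the derived class could have prevented it.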
Summary
Such derivation is not without issues so don't consider it unless the end result justifies the means. That said, I flatly reject any claim that this can't be used safely and appropriately in particular cases - it's just a matter of where to draw the line.
Personal experience
I do sometimes derive from std::map<>, std::vector<>, std::string etc - I've never been burnt by the slicing or delete-via-base-class-pointer issues, and I've saved a lot of time and energy for more important things. I don't store such objects in heterogeneous polymorphic containers. But, you need to consider whether all the programmers using the object are aware of the issues and likely to program accordingly. I personally like to write my code to use heap and run-time polymorphism only when needed, while some people (due to Java backgrounds, their preferred approach to managing recompilation dependencies or switching between runtime behaviours, testing facilities etc.) use them habitually and therefore need to be more concerned about safe operations via base class pointers.
If you really want to derive from it (not discussing why you want to do it) I think you can prevent direct heap instantiation of the Derived class by making its operator new private:
class StringDerived : public std::string {
//...
private:
static void* operator new(size_t size);
static void operator delete(void *ptr);
};
But this way you restrict yourself from any dynamic StringDerived objects.
Not only is the destructor not virtual, std::string contains no virtual functions at all, and no protected members. That makes it very hard for the derived class to modify its functionality.
Then why would you derive from it?
Another problem with being non-polymorphic is that if you pass your derived class to a function expecting a string parameter, your extra functionality will just be sliced off and the object will be seen as a plain string again.
Why should one not derive from c++ std string class?
Because it is not necessary. If you want to use DerivedString for functionality extension; I don't see any problem in deriving std::string. The only thing is, you should not interact between both classes (i.e. don't use string as a receiver for DerivedString).
Is there any way to prevent client from doing Base* p = new Derived()
Yes. Make sure that you provide inline wrappers around Base methods inside Derived class. e.g.
class Derived : protected Base { // 'protected' to avoid Base* p = new Derived
public: // the wrappers must be public; members of a class default to private
const char* c_str () const { return Base::c_str(); }
//...
};
There are two simple reasons for not deriving from a non-polymorphic class:
Technical: it introduces slicing bugs (because in C++ we pass by value unless otherwise specified)
Functional: if it is non-polymorphic, you can achieve the same effect with composition and some function forwarding
If you wish to add new functionalities to std::string, then first consider using free functions (possibly templates), like the Boost String Algorithm library does.
If you wish to add new data members, then properly wrap the class access by embedding it (Composition) inside a class of your own design.
EDIT:
@Tony noticed rightly that the Functional reason I cited was probably meaningless to most people. There is a simple rule of thumb, in good design, that says that when you can pick a solution among several, you should consider the one with the weaker coupling. Composition has weaker coupling than Inheritance, and thus should be preferred, when possible.
Also, composition gives you the opportunity to nicely wrap the original's class method. This is not possible if you pick inheritance (public) and the methods are not virtual (which is the case here).
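A small sketch of what "nicely wrapping" can mean with composition: because the std::string is a private member, every modification has to go through the wrapper, so the wrapper can enforce whatever rule it likes. UpperText and its member names are hypothetical, chosen only for this illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical composed type: the string is hidden, so clients cannot
// reach std::string's mutating functions directly.
class UpperText
{
public:
    explicit UpperText(const std::string& text) { assign(text); }

    // The only way to change the value; always stores an uppercased copy.
    void assign(const std::string& text)
    {
        value_.clear();
        for (char c : text)
            value_.push_back(static_cast<char>(
                std::toupper(static_cast<unsigned char>(c))));
    }

    // Read-only access to the underlying string.
    const std::string& str() const { return value_; }

private:
    std::string value_;
};
```

The cost is the one mentioned elsewhere in this thread: each std::string operation that clients need must be forwarded by hand.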
The C++ standard states that if the base class destructor is not virtual and you delete a derived-class object through a pointer to the base class, the behavior is undefined.
C++ standard section 5.3.5/3:
if the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand’s dynamic type and the static type shall have a virtual destructor or the behavior is undefined.
To be clear on the Non-polymorphic class & need of virtual destructor
The purpose of making a destructor virtual is to facilitate the polymorphic deletion of objects through a delete-expression. If there is no polymorphic deletion of objects, then you don't need virtual destructors.
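For contrast, this is roughly what polymorphic deletion looks like when the base class is designed for it. Shape and Circle are made-up types, and the bool pointer exists only so the effect of the destructor can be observed from outside.

```cpp
#include <string>

// Hypothetical polymorphic base: the virtual destructor makes
// `delete` through a Shape* well defined.
struct Shape
{
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};

struct Circle : Shape
{
    // `destroyed` is set to true when the Circle destructor runs.
    explicit Circle(bool* destroyed) : destroyed_(destroyed) {}
    ~Circle() override { *destroyed_ = true; }
    std::string name() const override { return "circle"; }

private:
    bool* destroyed_;
};
```

With this design, deleting a Circle through a Shape* runs ~Circle() first and then ~Shape(). std::string offers no such guarantee because its destructor is not virtual.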
Why not to derive from String Class?
One should generally avoid deriving from any standard container class, because they don't have virtual destructors, which makes it impossible to delete objects polymorphically.
As for the string class, the string class doesn't have any virtual functions so there is nothing that you can possibly override. The best you can do is hide something.
If at all you want to have a string like functionality you should write a class of your own rather than inherit from std::string.
As soon as you add any member (variable) into your derived std::string class, will you systematically screw the stack if you attempt to use the std goodies with an instance of your derived std::string class? Because the stdc++ functions/members have their stack pointers[indexes] fixed [and adjusted] to the size/boundary of the (base std::string) instance size.
Right?
Please, correct me if I am wrong.

Pimpl idiom vs Pure virtual class interface

I was wondering what would make a programmer choose either the Pimpl idiom or a pure virtual class and inheritance.
I understand that the pimpl idiom comes with one explicit extra indirection for each public method, plus the object creation overhead.
The pure virtual class, on the other hand, comes with an implicit indirection (vtable) for the inheriting implementation and, as I understand it, no object creation overhead.
EDIT: But you'd need a factory if you create the object from the outside
What makes the pure virtual class less desirable than the pimpl idiom?
When writing a C++ class, it's appropriate to think about whether it's going to be
A Value Type
Copy by value, identity is never important. It's appropriate for it to be a key in a std::map. Example, a "string" class, or a "date" class, or a "complex number" class. To "copy" instances of such a class makes sense.
An Entity type
Identity is important. Always passed by reference, never by "value". Often, doesn't make sense to "copy" instances of the class at all. When it does make sense, a polymorphic "Clone" method is usually more appropriate. Examples: A Socket class, a Database class, a "policy" class, anything that would be a "closure" in a functional language.
Both pImpl and pure abstract base class are techniques to reduce compile time dependencies.
However, I only ever use pImpl to implement Value types (type 1), and only sometimes when I really want to minimize coupling and compile-time dependencies. Often, it's not worth the bother. As you rightly point out, there's more syntactic overhead because you have to write forwarding methods for all of the public methods. For type 2 classes, I always use a pure abstract base class with associated factory method(s).
Pointer to implementation is usually about hiding structural implementation details. Interfaces are about instancing different implementations. They really serve two different purposes.
The pimpl idiom helps you reduce build dependencies and times especially in large applications, and minimizes header exposure of the implementation details of your class to one compilation unit. The users of your class should not even need to be aware of the existence of a pimple (except as a cryptic pointer to which they are not privy!).
Abstract classes (pure virtuals) are something of which your clients must be aware: if you try to use them to reduce coupling and circular references, you need to add some way of allowing them to create your objects (e.g. through factory methods or classes, dependency injection or other mechanisms).
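A rough sketch of the interface-plus-factory arrangement described above, assuming C++11. Greeter, PoliteGreeter and create are hypothetical names; in a real project the second half would live in a .cpp file that clients never include.

```cpp
#include <memory>
#include <string>

// --- public header part: clients see only this interface ---
class Greeter
{
public:
    virtual ~Greeter() = default;
    virtual std::string greet(const std::string& who) const = 0;

    // Factory method: the only way for clients to obtain an instance.
    static std::unique_ptr<Greeter> create();
};

// --- implementation part: hidden from clients ---
namespace {

class PoliteGreeter : public Greeter
{
public:
    std::string greet(const std::string& who) const override
    {
        return "Hello, " + who;
    }
};

}  // unnamed namespace

std::unique_ptr<Greeter> Greeter::create()
{
    return std::unique_ptr<Greeter>(new PoliteGreeter);
}
```

Clients hold the result through a pointer (here a std::unique_ptr<Greeter>), which is the awareness cost the paragraph above refers to.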
I was searching an answer for the same question.
After reading some articles and some practice I prefer using "Pure virtual class interfaces".
They are more straightforward (this is a subjective opinion). The pimpl idiom makes me feel I'm writing code "for the compiler", not for the "next developer" who will read my code.
Some testing frameworks have direct support for Mocking pure virtual classes
It's true that you need a factory to be accessible from the outside.
But if you want to leverage polymorphism, that's also a "pro", not a "con" ...and a simple factory method does not really hurt so much
The only drawback (I'm trying to investigate on this) is that pimpl idiom could be faster
when the proxy-calls are inlined, while inheriting necessarily need an extra access to the object VTABLE at runtime
the memory footprint the pimpl public-proxy-class is smaller (you can do easily optimizations for faster swaps and other similar optimizations)
I hate pimples! They make the class ugly and unreadable. All methods are redirected to the pimple. You never see in the headers what functionality the class has, so you cannot refactor it (e.g. simply change the visibility of a method). The class feels "pregnant". I think using interfaces is better and really enough to hide the implementation from the client. You can even let one class implement several interfaces to keep them thin. One should prefer interfaces!
Note: You do not necessarily need the factory class. What matters is that the class's clients communicate with its instances via the appropriate interface.
I find the hiding of private methods a strange paranoia and do not see a reason for it, since we have interfaces.
There's a very real problem with shared libraries that the pimpl idiom circumvents neatly that pure virtuals can't: you cannot safely modify/remove data members of a class without forcing users of the class to recompile their code. That may be acceptable under some circumstances, but not e.g. for system libraries.
To explain the problem in detail, consider the following code in your shared library/header:
// header
struct A
{
public:
A();
// more public interface, some of which uses the int below
private:
int a;
};
// library
A::A()
: a(0)
{}
The compiler emits code in the shared library that calculates the address of the integer to be initialized to be a certain offset (probably zero in this case, because it's the only member) from the pointer to the A object it knows to be this.
On the user side of the code, a new A will first allocate sizeof(A) bytes of memory, then hand a pointer to that memory to the A::A() constructor as this.
If in a later revision of your library you decide to drop the integer, make it larger, smaller, or add members, there'll be a mismatch between the amount of memory user's code allocates, and the offsets the constructor code expects. The likely result is a crash, if you're lucky - if you're less lucky, your software behaves oddly.
By pimpl'ing, you can safely add and remove data members to the inner class, as the memory allocation and constructor call happen in the shared library:
// header
struct A
{
public:
A();
// more public interface, all of which delegates to the impl
private:
void * impl;
};
// library
A::A()
: impl(new A_impl())
{}
All you need to do now is keep your public interface free of data members other than the pointer to the implementation object, and you're safe from this class of errors.
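A slightly fuller sketch of the same idea, assuming C++11 and using std::unique_ptr with a forward-declared nested type instead of the void* above. Counter, Impl and the member names are made up for the example; the split into a header part and a library part mirrors the snippets above.

```cpp
#include <memory>

// --- header part: the only data member is the pointer ---
class Counter
{
public:
    Counter();
    ~Counter();                               // defined where Impl is complete
    Counter(const Counter&) = delete;         // copying left out to keep the sketch short
    Counter& operator=(const Counter&) = delete;

    void increment();
    int value() const;

private:
    struct Impl;                              // layout is invisible to clients
    std::unique_ptr<Impl> impl_;
};

// --- library part: free to add or remove members in later revisions ---
struct Counter::Impl
{
    int count = 0;
};

Counter::Counter() : impl_(new Impl) {}
Counter::~Counter() = default;

void Counter::increment() { ++impl_->count; }
int Counter::value() const { return impl_->count; }
```

Adding another field to Counter::Impl changes nothing that client code was compiled against, since both the allocation and every member access happen inside the library.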
Edit: I should maybe add that the only reason I'm talking about the constructor here is that I didn't want to provide more code - the same argumentation applies to all functions that access data members.
We must not forget that inheritance is a stronger, closer coupling than delegation. I would also take into account all the issues raised in the answers given when deciding what design idioms to employ in solving a particular problem.
Although broadly covered in the other answers maybe I can be a bit more explicit about one benefit of pimpl over virtual base classes:
A pimpl approach is transparent from the user's viewpoint, meaning you can e.g. create objects of the class on the stack and use them directly in containers. If you try to hide the implementation using an abstract virtual base class, you will need to return a shared pointer to the base class from a factory, complicating its use. Consider the following equivalent client code:
// Pimpl
Object pi_obj(10);
std::cout << pi_obj.SomeFun1();
std::vector<Object> objs;
objs.emplace_back(3);
objs.emplace_back(4);
objs.emplace_back(5);
for (auto& o : objs)
std::cout << o.SomeFun1();
// Abstract Base Class
auto abc_obj = ObjectABC::CreateObject(20);
std::cout << abc_obj->SomeFun1();
std::vector<std::shared_ptr<ObjectABC>> objs2;
objs2.push_back(ObjectABC::CreateObject(13));
objs2.push_back(ObjectABC::CreateObject(14));
objs2.push_back(ObjectABC::CreateObject(15));
for (auto& o : objs2)
std::cout << o->SomeFun1();
In my understanding these two things serve completely different purposes. The purpose of the pimpl idiom is basically to give you a handle to your implementation so you can do things like fast swaps for a sort.
The purpose of virtual classes is more along the lines of allowing polymorphism, i.e. you have an unknown pointer to an object of a derived type and when you call function x you always get the right function for whatever class the base pointer actually points to.
Apples and oranges really.
The most annoying problem with the pimpl idiom is that it makes it extremely hard to maintain and analyse existing code. So with pimpl you pay in developer time and frustration only to "reduce build dependencies and times and minimize header exposure of the implementation details". Decide for yourself whether it is really worth it.
Especially "build times" is a problem you can solve by better hardware or using tools like Incredibuild ( www.incredibuild.com, also already included in Visual Studio 2017 ), thus not affecting your software design. Software design should be generally independent of the way the software is built.