Dependency injection ; good practices to reduce boilerplate code - c++

I have a simple question, and I'm not even sure it has an answer but let's try.
I'm coding in C++, and using dependency injection to avoid global state. This works quite well, and I don't run in unexpected/undefined behaviours very often.
However I realise that, as my project grows I'm writing a lot of code which I consider boilerplate. Worse : the fact there is more boilerplate code, than actual code makes it sometimes hard to understand.
Nothing beats a good example so let's go :
I have a class called TimeFactory which creates Time objects.
For more details (not sure it's relevant) : Time objects are quite complex because the Time can have different formats, and conversion between them is neither linear, nor straightforward. Each "Time" contains a Synchronizer to handle conversions, and to make sure they have the same, properly initialized, synchronizer, I use a TimeFactory. The TimeFactory has only one instance and is application wide, so it would qualify for singleton but, because it's mutable, I don't want to make it a singleton
In my app, a lot of classes need to create Time objects. Sometimes those classes are deeply nested.
Let's say I have a class A which contains instances of class B, and so on up to class D. Class D need to create Time objects.
In my naive implementation, I pass the TimeFactory to the constructor of class A, which passes it to the constructor of class B and so on until class D.
Now, imagine I have a couple of classes like TimeFactory and a couple of class hierarchies like the one above : I loose all the flexibility and readability I'm suppose to get using dependency injection.
I'm starting to wonder if there isn't a major design flaw in my app ...
Or is this a necessary evil of using dependency injection ?
What do you think ?

In my naive implementation, I pass the TimeFactory to the constructor
of class A, which passes it to the constructor of class B and so on
until class D.
This is a common misapplication of dependency injection. Unless class A directly uses the TimeFactory, it should not ever see, know about, or have access to the TimeFactory. The D instance should be constructed with the TimeFactory. Then the C instance should be constructed with the D instance you just constructed. Then the B with the C, and finally the A with the B. Now you have an A instance which, indirectly, owns a D instance with access to a TimeFactory, and the A instance never saw the TimeFactory directly passed to it.
Miško Hevery talks about this in this video.

Global state isn't always evil, if it's truly global. There are often engineering trade-offs, and your use of dependency injection already introduces more coupling than using a singleton interface or a global variable would: class A now knows than class B requires a TimeFactory, which is often more detail about class B than class A requires. The same goes for classes B and C, and for classes C and D.
Consider the following solution that uses a singleton pattern:
Have the TimeFactory (abstract) base class provide singleton access to the application's `TimeFactory:
Set that singleton once, to a concrete subclass of TimeFactory
Have all of your accesses to TimeFactory use that singleton.
This creates global state, but decouples clients of that global state from any knowledge of its implementation.
Here's a sketch of a potential implementation:
class TimeFactory
{
public:
// ...
static TimeFactory* getSingleton(void) { return singleton; }
// ...
protected:
void setAsSingleton(void)
{
if (singleton != NULL) {
// handle case where multiple TimeFactory implementations are created
throw std::exception(); // possibly by throwing
}
singleton = this;
}
private:
static TimeFactory* singleton = NULL;
};
Each time a subclass of TimeFactory is instantiated, you can have it call setAsSingleton, either in its constructor or elsewhere.

Related

Should you NEVER create a new object of any other class in another class' method?

I have been trying to understand Dependency injection and Ioc and all related concepts to it, but have not made much progress.
What I have understood is to make my class (say A) testable (my class which uses object of class B), I should have the object injected into my class' constructor like this:
class A{
B class_B_Object;
A(B class_B_Object){
this.class_B_Object = class_B_Object;
}
}
So, my question here is, is it safe to say that I should never have something like var classBObject = new B() in any of my class A's methods?
Never is a very strong word, and you should use it very carefully. Depending on what A and B are, it may be perfectly normal for A to create (and usually return) an instance of B. The main point of testability here is that if A uses an instance of B and relies on its logic, you should probably have an easy way to inject a mock instance of B where this logic is controlled (i.e. - mocked away), so when you write a unit test for A you're just testing its logic, not the underlying logic of B.
The first step to making your code easily unit testable in isolation is by using a weak coupling between your layers. This means that your A class constructor should not take B class but rather IB interface that some potential B class might implement. This would allow you to substitute this interface in your unit test with something you have control over - a mock because you are testing the A class, not B.
As far as your question about var classBObject = new B() is concerned, it would really depend on the situation but in general, it is better to have this instance provided from the outside.

Should members of a class be turned static when more than one object creation is not needed?

class AA
{
public:
AA ();
static void a1 ();
static std :: string b1 ();
static std :: string c1 (unsigned short x);
};
My class won't have different objects interacting with among themselves or with others.
Of course I need to have at least one object to call the functions of this class, so I thought of making the members static so that unnecessary object creation can be avoided.
What are the pros and cons of this design? What will be a better design?
To access to static members, you don't even need an object, just call
AA::a1()
This patterns is called "Monostate", the alternative being Singleton, where you actually create an object, but make sure it's only done once, there are tons of tutorials on how to do that, just google it.
Should members of a class be turned static when more than one object creation is not needed?
You make members of the class static when you need only one instance of the member for all objects of your class. When you declare a class member static the member becomes per class instead of per object.
When you say you need only one object of your class, You are probably pointing towards the singleton design pattern. Note that pattern is widely considered an anti pattern and its usefulness if any is dependent to specific situations.
The reason you mention in Q is no way related to whether you should make a member static or not.
Your class has no data members, so I don't see any good reason to use a class at all:
namespace AA
{
void a1 ();
std :: string b1 ();
std :: string c1 (unsigned short x);
};
Of course I need to have at least one object to call the functions of this class
That's not true. You can call static member functions without an instance of the class.
A note on the Singleton pattern: it has a bad reputation, it is often mis-used, and in my experience it is only very rarely useful. What it does is enforce that there can only be one instance of the class, and that this instance is globally accessible.
People often think, "I only need one instance, therefore I should use Singleton", especially when Singleton is the first Capitalized Design Pattern they're introduced to. This is wrong -- if you only need one instance, create one instance and use it. Don't unnecessarily limit all future users of the class to only create one instance. Don't unnecessarily create shared global state. Both things make your code less flexible, harder to use in different ways and therefore in particular harder to test. Some people would argue that for these reasons, Singleton is strictly never useful.
In this case you don't seem to need even one instance. If that's the case I'd use free functions as above.

Dependency inversion and pervasive dependencies

I'm trying to get dependency inversion, or at least understand how to apply it, but the problem I have at the moment is how to deal with dependencies that are pervasive. The classic example of this is trace logging, but in my application I have many services that most if not all code will depend on (trace logging, string manipulation, user message logging etc).
None of the solutions to this would appear to be particularly palatable:
Using constructor dependency injection would mean that most of the constructors would have several, many, standard injected dependencies because most classes explicitly require those dependencies (they are not just passing them down to objects that they construct).
Service locator pattern just drives the dependencies underground, removing them from the constructor but hiding them so that it's not even explicit that the dependencies are required
Singleton services are, well, Singletons, and also serve to hide the dependencies
Lumping all those common services together into a single CommonServices interface and injecting that aswell a) violates the Law of Demeter and b) is really just another name for a Service Locator, albeit a specific rather than a generic one.
Does anyone have any other suggestions for how to structure these kinds of dependencies, or indeed any experience of any of the above solutions?
Note that I don't have a particular DI framework in mind, in fact we're programming in C++ and would be doing any injection manually (if indeed dependencies are injected).
Service locator pattern just drives the dependencies underground,
Singleton services are, well, Singletons, and also serve to hide the
dependencies
This is a good observation. Hiding the dependencies doesn't remove them. Instead you should address the number of dependencies a class needs.
Using constructor dependency injection would mean that most of the
constructors would have several, many, standard injected dependencies
because most classes explicitly require those dependencies
If this is the case, you are probably violating the Single Responsibility Principle. In other words, those classes are probably too big and do too much. Since you are talking about logging and tracing, you should ask yourself if you aren't logging too much. But in general, logging and tracing are cross-cutting concerns and you should not have to add them to many classes in the system. If you correctly apply the SOLID principles, this problem goes away (as explained here).
The Dependency Inversion principle is part of the SOLID Principles and is an important principle for among other things, to promote testability and reuse of the higher-level algorithm.
Background:
As indicated on Uncle Bob's web page, Dependency Inversion is about depend on abstractions, not on concretions.
In practice, what happens is that some places where your class instantiates another class directly, need to be changed such that the implementation of the inner class can be specified by the caller.
For instance, if I have a Model class, I should not hard code it to use a specific database class. If I do that, I cannot use the Model class to use a different database implementation. This might be useful if you have a different database provider, or you may want to replace the database provider with a fake database for testing purposes.
Rather than the Model doing a "new" on the Database class, it will simply use an IDatabase interface that the Database class implements. The Model never refers to a concrete Database class. But then who instantiates the Database class? One solution is Constructor Injection (part of Dependency Injection). For this example, the Model class is given a new constructor that takes an IDatabase instance which it is to use, rather than instantiate one itself.
This solves the original problem of the Model no longer references the concrete Database class and uses the database through the IDatabase abstraction. But it introduces the problem mentioned in the Question, which is that it goes against Law of Demeter. That is, in this case, the caller of Model now has to know about IDatabase, when previously it did not. The Model is now exposing to its clients some detail about how it gets its job done.
Even if you were okay with this, there's another issue that seems to confuse a lot of people, including some trainers. There's as an assumption that any time a class, such as Model, instantiates another class concretely, then it's breaking the Dependency Inversion principle and therefore it is bad. But in practice, you can't follow these types of hard-and-fast rules. There are times when you need to use concrete classes. For instance, if you're going to throw an exception you have to "new it up" (eg. threw new BadArgumentException(...)). Or use classes from the base system such as strings, dictionaries, etc.
There's no simple rule that works in all cases. You have to understand what it is that you're trying to accomplish. If you're after testability, then the fact that the Model classes references the Database class directly is not itself a problem. The problem is the fact that the Model class has no other means of using another Database class. You solve this problem by implementing the Model class such that it uses IDatabase, and allows a client to specify an IDatabase implementation. If one is not specified by the client, the Model can then use a concrete implementation.
This is similar to the design of the many libraries, including C++ Standard Library. For instance, looking at the declaration std::set container:
template < class T, // set::key_type/value_type
class Compare = less<T>, // set::key_compare/value_compare
class Alloc = allocator<T> > // set::allocator_type
> class set;
You can see that it allows you to specify a comparer and an allocator, but most of the time, you take the default, especially the allocator. The STL has many such facets, especially in the IO library where detailed aspects of streaming can be augmented for localization, endianness, locales, etc.
In addition to testability, this allows the reuse of the higher-level algorithm with entirely different implementation of the classes that the algorithm internally uses.
And finally, back to the assertion I made previously with regard to scenarios where you would not want to invert the dependency. That is, there are times when you need to instantiate a concrete class, such as when instantiating the exception class, BadArgumentException. But, if you're after testability, you can also make the argument that you do, in fact, want to invert dependency of this as well. You may want to design the Model class such that all instantiations of exceptions are delegated to a class and invoked through an abstract interface. That way, code that tests the Model class can provide its own exception class whose usage the test can then monitor.
I've had colleagues give me examples where they abstract instantiation of even system calls, such as "getsystemtime" simply so they can test daylight savings and time-zone scenarios through their unit-testing.
Follow the YAGNI principle -- don't add abstractions simply because you think you might need it. If you're practicing test-first development, the right abstractions becomes apparent and only just enough abstraction is implemented to pass the test.
class Base {
public:
void doX() {
doA();
doB();
}
virtual void doA() {/*does A*/}
virtual void doB() {/*does B*/}
};
class LoggedBase public : Base {
public:
LoggedBase(Logger& logger) : l(logger) {}
virtual void doA() {l.log("start A"); Base::doA(); l.log("Stop A");}
virtual void doB() {l.log("start B"); Base::doB(); l.log("Stop B");}
private:
Logger& l;
};
Now you can create the LoggedBase using an abstract factory that knows about the logger. Nobody else has to know about the logger, nor do they need to know about LoggedBase.
class BaseFactory {
public:
virtual Base& makeBase() = 0;
};
class BaseFactoryImp public : BaseFactory {
public:
BaseFactoryImp(Logger& logger) : l(logger) {}
virtual Base& makeBase() {return *(new LoggedBase(l));}
};
The factory implementation is held in a global variable:
BaseFactory* baseFactory;
And is initialized to an instance of BaseFactoryImp by 'main' or some function close to main. Only that function knows about BaseFactoryImp and LoggedBase. Everyone else is blissfully ignorant of them all.

Why should one not derive from c++ std string class?

I wanted to ask about a specific point made in Effective C++.
It says:
A destructor should be made virtual if a class needs to act like a polymorphic class. It further adds that since std::string does not have a virtual destructor, one should never derive from it. Also std::string is not even designed to be a base class, forget polymorphic base class.
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
Is the only reason that I should not derive from std::string class is it does not have a virtual destructor? For reusability purpose a base class can be defined and multiple derived class can inherit from it. So what makes std::string not even eligible as a base class?
Also, if there is a base class purely defined for reusability purpose and there are many derived types, is there any way to prevent client from doing Base* p = new Derived() because the classes are not meant to be used polymorphically?
I think this statement reflects the confusion here (emphasis mine):
I do not understand what specifically is required in a class to be eligible for being a base clas (not a polymorphic one)?
In idiomatic C++, there are two uses for deriving from a class:
private inheritance, used for mixins and aspect oriented programming using templates.
public inheritance, used for polymorphic situations only. EDIT: Okay, I guess this could be used in a few mixin scenarios too -- such as boost::iterator_facade -- which show up when the CRTP is in use.
There is absolutely no reason to publicly derive a class in C++ if you're not trying to do something polymorphic. The language comes with free functions as a standard feature of the language, and free functions are what you should be using here.
Think of it this way -- do you really want to force clients of your code to convert to using some proprietary string class simply because you want to tack on a few methods? Because unlike in Java or C# (or most similar object oriented languages), when you derive a class in C++ most users of the base class need to know about that kind of a change. In Java/C#, classes are usually accessed through references, which are similar to C++'s pointers. Therefore, there's a level of indirection involved which decouples the clients of your class, allowing you to substitute a derived class without other clients knowing.
However, in C++, classes are value types -- unlike in most other OO languages. The easiest way to see this is what's known as the slicing problem. Basically, consider:
int StringToNumber(std::string copyMeByValue)
{
std::istringstream converter(copyMeByValue);
int result;
if (converter >> result)
{
return result;
}
throw std::logic_error("That is not a number.");
}
If you pass your own string to this method, the copy constructor for std::string will be called to make a copy, not the copy constructor for your derived object -- no matter what child class of std::string is passed. This can lead to inconsistency between your methods and anything attached to the string. The function StringToNumber cannot simply take whatever your derived object is and copy that, simply because your derived object probably has a different size than a std::string -- but this function was compiled to reserve only the space for a std::string in automatic storage. In Java and C# this is not a problem because the only thing like automatic storage involved are reference types, and the references are always the same size. Not so in C++.
Long story short -- don't use inheritance to tack on methods in C++. That's not idiomatic and results in problems with the language. Use non-friend, non-member functions where possible, followed by composition. Don't use inheritance unless you're template metaprogramming or want polymorphic behavior. For more information, see Scott Meyers' Effective C++ Item 23: Prefer non-member non-friend functions to member functions.
EDIT: Here's a more complete example showing the slicing problem. You can see it's output on codepad.org
#include <ostream>
#include <iomanip>
struct Base
{
int aMemberForASize;
Base() { std::cout << "Constructing a base." << std::endl; }
Base(const Base&) { std::cout << "Copying a base." << std::endl; }
~Base() { std::cout << "Destroying a base." << std::endl; }
};
struct Derived : public Base
{
int aMemberThatMakesMeBiggerThanBase;
Derived() { std::cout << "Constructing a derived." << std::endl; }
Derived(const Derived&) : Base() { std::cout << "Copying a derived." << std::endl; }
~Derived() { std::cout << "Destroying a derived." << std::endl; }
};
int SomeThirdPartyMethod(Base /* SomeBase */)
{
return 42;
}
int main()
{
Derived derivedObject;
{
//Scope to show the copy behavior of copying a derived.
Derived aCopy(derivedObject);
}
SomeThirdPartyMethod(derivedObject);
}
To offer the counter side to the general advice (which is sound when there are no particular verbosity/productivity issues evident)...
Scenario for reasonable use
There is at least one scenario where public derivation from bases without virtual destructors can be a good decision:
you want some of the type-safety and code-readability benefits provided by dedicated user-defined types (classes)
an existing base is ideal for storing the data, and allows low-level operations that client code would also want to use
you want the convenience of reusing functions supporting that base class
you understand that any any additional invariants your data logically needs can only be enforced in code explicitly accessing the data as the derived type, and depending on the extent to which that will "naturally" happen in your design, and how much you can trust client code to understand and cooperate with the logically-ideal invariants, you may want members functions of the derived class to reverify expectations (and throw or whatever)
the derived class adds some highly type-specific convenience functions operating over the data, such as custom searches, data filtering / modifications, streaming, statistical analysis, (alternative) iterators
coupling of client code to the base is more appropriate than coupling to the derived class (as the base is either stable or changes to it reflect improvements to functionality also core to the derived class)
put another way: you want the derived class to continue to expose the same API as the base class, even if that means the client code is forced to change, rather than insulating it in some way that allows the base and derived APIs to grow out of sync
you're not going to be mixing pointers to base and derived objects in parts of the code responsible for deleting them
This may sound quite restrictive, but there are plenty of cases in real world programs matching this scenario.
Background discussion: relative merits
Programming is about compromises. Before you write a more conceptually "correct" program:
consider whether it requires added complexity and code that obfuscates the real program logic, and is therefore more error prone overall despite handling one specific issue more robustly,
weigh the practical costs against the probability and consequences of issues, and
consider "return on investment" and what else you could be doing with your time.
If the potential problems involve usage of the objects that you just can't imagine anyone attempting given your insights into their accessibility, scope and nature of usage in the program, or you can generate compile-time errors for dangerous use (e.g. an assertion that derived class size matches the base's, which would prevent adding new data members), then anything else may be premature over-engineering. Take the easy win in clean, intuitive, concise design and code.
Reasons to consider derivation sans virtual destructor
Say you have a class D publicly derived from B. With no effort, the operations on B are possible on D (with the exception of construction, but even if there are a lot of constructors you can often provide effective forwarding by having one template for each distinct number of constructor arguments: e.g. template <typename T1, typename T2> D(const T1& x1, const T2& t2) : B(t1, t2) { }. Better generalised solution in C++0x variadic templates.)
Further, if B changes then by default D exposes those changes - staying in sync - but someone may need to review extended functionality introduced in D to see if it remains valid, and the client usage.
Rephrasing this: there is reduced explicit coupling between base and derived class, but increased coupling between base and client.
This is often NOT what you want, but sometimes it is ideal, and other times a non issue (see next paragraph). Changes to the base force more client code changes in places distributed throughout the code base, and sometimes the people changing the base may not even have access to the client code to review or update it correspondingly. Sometimes it is better though: if you as the derived class provider - the "man in the middle" - want base class changes to feed through to clients, and you generally want clients to be able - sometimes forced - to update their code when the base class changes without you needing to be constantly involved, then public derivation may be ideal. This is common when your class is not so much an independent entity in its own right, but a thin value-add to the base.
Other times the base class interface is so stable that the coupling may be deemed a non issue. This is especially true of classes like Standard containers.
Summarily, public derivation is a quick way to get or approximate the ideal, familiar base class interface for the derived class - in a way that's concise and self-evidently correct to both the maintainer and client coder - with additional functionality available as member functions (which IMHO - which obviously differs with Sutter, Alexandrescu etc - can aid usability, readability and assist productivity-enhancing tools including IDEs)
C++ Coding Standards - Sutter & Alexandrescu - cons examined
Item 35 of C++ Coding Standards lists issues with the scenario of deriving from std::string. As scenarios go, it's good that it illustrates the burden of exposing a large but useful API, but both good and bad as the base API is remarkably stable - being part of the Standard Library. A stable base is a common situation, but no more common than a volatile one and a good analysis should relate to both cases. While considering the book's list of issues, I'll specifically contrast the issues' applicability to the cases of say:
a) class Issue_Id : public std::string { ...handy stuff... }; <-- public derivation, our controversial usage
b) class Issue_Id : public string_with_virtual_destructor { ...handy stuff... }; <- safer OO derivation
c) class Issue_Id { public: ...handy stuff... private: std::string id_; }; <-- a compositional approach
d) using std::string everywhere, with freestanding support functions
(Hopefully we can agree the composition is acceptable practice, as it provides encapsulation, type safety as well as a potentially enriched API over and above that of std::string.)
So, say you're writing some new code and start thinking about the conceptual entities in an OO sense. Maybe in a bug tracking system (I'm thinking of JIRA), one of them is say an Issue_Id. Data content is textual - consisting of an alphabetic project id, a hyphen, and an incrementing issue number: e.g. "MYAPP-1234". Issue ids can be stored in a std::string, and there will be lots of fiddly little text searches and manipulation operations needed on issue ids - a large subset of those already provided on std::string and a few more for good measure (e.g. getting the project id component, providing the next possible issue id (MYAPP-1235)).
On to Sutter and Alexandrescu's list of issues...
Nonmember functions work well within existing code that already manipulates strings. If instead you supply a super_string, you force changes through your code base to change types and function signatures to super_string.
The fundamental mistake with this claim (and most of the ones below) is that it promotes the convenience of using only a few types, ignoring the benefits of type safety. It's expressing a preference for d) above, rather than insight into c) or b) as alternatives to a). The art of programming involves balancing the pros and cons of distinct types to achieve reasonable reuse, performance, convenience and safety. The paragraphs below elaborate on this.
Using public derivation, the existing code can implicitly access the base class string as a string, and continue to behave as it always has. There's no specific reason to think that the existing code would want to use any additional functionality from super_string (in our case Issue_Id)... in fact it's often lower-level support code pre-existing the application for which you're creating the super_string, and therefore oblivious to the needs provided for by the extended functions. For example, say there's a non-member function to_upper(std::string&, std::string::size_type from, std::string::size_type to) - it could still be applied to an Issue_Id.
So, unless the non-member support function is being cleaned up or extended at the deliberate cost of tightly coupling it to the new code, then it needn't be touched. If it is being overhauled to support issue ids (for example, using the insight into the data content format to upper-case only leading alpha characters), then it's probably a good thing to ensure it really is being passed an Issue_Id by creating an overload ala to_upper(Issue_Id&) and sticking to either the derivation or compositional approaches allowing type safety. Whether super_string or composition is used makes no difference to effort or maintainability. A to_upper_leading_alpha_only(std::string&) reusable free-standing support function isn't likely to be of much use - I can't recall the last time I wanted such a function.
The impulse to use std::string everywhere isn't qualitatively different to accepting all your arguments as containers of variants or void*s so you don't have to change your interfaces to accept arbitrary data, but it makes for error prone implementation and less self-documenting and compiler-verifiable code.
Interface functions that take a string now need to: a) stay away from super_string's added functionality (unuseful); b) copy their argument to a super_string (wasteful); or c) cast the string reference to a super_string reference (awkward and potentially illegal).
This seems to be revisiting the first point - old code that needs to be refactored to use the new functionality, albeit this time client code rather than support code. If the function wants to start treating its argument as an entity for which the new operations are relevant, then it should start taking its arguments as that type and the clients should generate them and accept them using that type. The exact same issues exists for composition. Otherwise, c) can be practical and safe if the guidelines I list below are followed, though it is ugly.
super_string's member functions don't have any more access to string's internals than nonmember functions because string probably doesn't have protected members (remember, it wasn't meant to be derived from in the first place)
True, but sometimes that's a good thing. A lot of base classes have no protected data. The public string interface is all that's needed to manipulate the contents, and useful functionality (e.g. get_project_id() postulated above) can be elegantly expressed in terms of those operations. Conceptually, many times I've derived from Standard containers, I've wanted not to extend or customise their functionality along the existing lines - they're already "perfect" containers - rather I've wanted to add another dimension of behaviour that's specific to my application, and requires no private access. It's because they're already good containers that they're good to reuse.
If super_string hides some of string's functions (and redefining a nonvirtual function in a derived class is not overriding, it's just hiding), that could cause widespread confusion in code that manipulates strings that started their life converted automatically from super_strings.
True for composition too - and more likely to happen as the code doesn't default to passing things through and hence staying in sync, and also true in some situations with run-time polymorphic hierarchies as well. Samed named functions that behave differently in classes that initial appear interchangeable - just nasty. This is effectively the usual caution for correct OO programming, and again not a sufficient reason to abandon the benefits in type safety etc..
What if super_string wants to inherit from string to add more state [explanation of slicing]
Agreed - not a good situation, and somewhere I personally tend to draw the line as it often moves the problems of deletion through a pointer to base from the realm of theory to the very practical - destructors aren't invoked for additional members. Still, slicing can often do what's wanted - given the approach of deriving super_string not to change its inherited functionality, but to add another "dimension" of application-specific functionality....
Admittedly, it's tedious to have to write passthrough functions for the member functions you want to keep, but such an implementation is vastly better and safer than using public or nonpublic inheritance.
Well, certainly agree about the tedium....
Guidelines for successful derivation sans virtual destructor
ideally, avoid adding data members in derived class: variants of slicing can accidentally remove data members, corrupt them, fail to initialise them...
even more so - avoid non-POD data members: deletion via base-class pointer is technically undefined behaviour anyway, but with non-POD types failing to run their destructors is more likely to have non-theoretical problems with resource leaks, bad reference counts etc.
honour the Liskov Substitution Principal / you can't robustly maintain new invariants
for example, in deriving from std::string you can't intercept a few functions and expect your objects to remain uppercase: any code that accesses them via a std::string& or ...* can use std::string's original function implementations to change the value)
derive to model a higher level entity in your application, to extend the inherited functionality with some functionality that uses but doesn't conflict with the base; do not expect or try to change the basic operations - and access to those operations - granted by the base type
be aware of the coupling: base class can't be removed without affecting client code even if the base class evolves to have inappropriate functionality, i.e. your derived class's usability depends on the ongoing appropriateness of the base
sometimes even if you use composition you'll need to expose the data member due to performance, thread safety issues or lack of value semantics - so the loss of encapsulation from public derivation isn't tangibly worse
the more likely people using the potentially-derived class will be unaware of its implementation compromises, the less you can afford to make them dangerous
therefore, low-level widely deployed libraries with many ad-hoc casual users should be more wary of dangerous derivation than localised use by programmers routinely using the functionality at application level and/or in "private" implementation / libraries
Summary
Such derivation is not without issues so don't consider it unless the end result justifies the means. That said, I flatly reject any claim that this can't be used safely and appropriately in particular cases - it's just a matter of where to draw the line.
Personal experience
I do sometimes derive from std::map<>, std::vector<>, std::string etc - I've never been burnt by the slicing or delete-via-base-class-pointer issues, and I've saved a lot of time and energy for more important things. I don't store such objects in heterogeneous polymorphic containers. But, you need to consider whether all the programmers using the object are aware of the issues and likely to program accordingly. I personally like to write my code to use heap and run-time polymorphism only when needed, while some people (due to Java backgrounds, their prefered approach to managing recompilation dependencies or switching between runtime behaviours, testing facilities etc.) use them habitually and therefore need to be more concerned about safe operations via base class pointers.
If you really want to derive from it (not discussing why you want to do it) I think you can prevent Derived class direct heap instantiation by making it's operator new private:
class StringDerived : public std::string {
//...
private:
static void* operator new(size_t size);
static void operator delete(void *ptr);
};
But this way you restrict yourself from any dynamic StringDerived objects.
Not only is the destructor not virtual, std::string contains no virtual functions at all, and no protected members. That makes it very hard for the derived class to modify its functionality.
Then why would you derive from it?
Another problem with being non-polymorphic is that if you pass your derived class to a function expecting a string parameter, your extra functionality will just be sliced off and the object will be seen as a plain string again.
Why should one not derive from c++ std string class?
Because it is not necessary. If you want to use DerivedString for functionality extension; I don't see any problem in deriving std::string. The only thing is, you should not interact between both classes (i.e. don't use string as a receiver for DerivedString).
Is there any way to prevent client from doing Base* p = new Derived()
Yes. Make sure that you provide inline wrappers around Base methods inside Derived class. e.g.
class Derived : protected Base { // 'protected' to avoid Base* p = new Derived
const char* c_str () const { return Base::c_str(); }
//...
};
There are two simple reasons for not deriving from a non-polymorphic class:
Technical: it introduces slicing bugs (because in C++ we pass by value unless otherwise specified)
Functional: if it is non-polymorphic, you can achieve the same effect with composition and some function forwarding
If you wish to add new functionalities to std::string, then first consider using free functions (possibly templates), like the Boost String Algorithm library does.
If you wish to add new data members, then properly wrap the class access by embedding it (Composition) inside a class of your own design.
EDIT:
#Tony noticed rightly that the Functional reason I cited was probably meaningless to most people. There is a simple rule of thumb, in good design, that says that when you can pick a solution among several, you should consider the one with the weaker coupling. Composition has weaker coupling that Inheritance, and thus should be preferred, when possible.
Also, composition gives you the opportunity to nicely wrap the original's class method. This is not possible if you pick inheritance (public) and the methods are not virtual (which is the case here).
The C++ standard states that If Base class destructor is not virtual and you delete an object of Base class that points to the object of an derived class then it causes an undefined Behavior.
C++ standard section 5.3.5/3:
if the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand’s dynamic type and the static type shall have a virtual destructor or the behavior is undefined.
To be clear on the Non-polymorphic class & need of virtual destructor
The purpose of making a destructor virtual is to facilitate the polymorphic deletion of objects through delete-expression. If there is no polymorphic deletion of objects, then you don't need virtual destructor's.
Why not to derive from String Class?
One should generally avoid deriving from any standard container class because of the very reason that they don' have virtual destructors, which make it impossible to delete objects polymorphically.
As for the string class, the string class doesn't have any virtual functions so there is nothing that you can possibly override. The best you can do is hide something.
If at all you want to have a string like functionality you should write a class of your own rather than inherit from std::string.
As soon as you add any member (variable) into your derived std::string class, will you systematically screw the stack if you attempt to use the std goodies with an instance of your derived std::string class? Because the stdc++ functions/members have their stack pointers[indexes] fixed [and adjusted] to the size/boundary of the (base std::string) instance size.
Right?
Please, correct me if I am wrong.

Factory Pattern in C++ -- doing this correctly?

I am relatively new to "design patterns" as they are referred to in a formal sense. I've not been a professional for very long, so I'm pretty new to this.
We've got a pure virtual interface base class. This interface class is obviously to provide the definition of what functionality its derived children are supposed to do. The current use and situation in the software dictates what type of derived child we want to use, so I recommended creating a wrapper that will communicate which type of derived child we want and return a Base pointer that points to a new derived object. This wrapper, to my understanding, is a factory.
Well, a colleague of mine created a static function in the Base class to act as the factory. This causes me trouble for two reasons. First, it seems to break the interface nature of the Base class. It feels wrong to me that the interface would itself need to have knowledge of the children derived from it.
Secondly, it causes more problems when I try to re-use the Base class across two different Qt projects. One project is where I am implementing the first (and probably only real implementation for this one class... though i want to use the same method for two other features that will have several different derived classes) derived class and the second is the actual application where my code will eventually be used. My colleague has created a derived class to act as a tester for the real application while I code my part. This means that I've got to add his headers and cpp files to my project, and that just seems wrong since I'm not even using his code for the project while I implement my part (but he will use mine when it is finished).
Am I correct in thinking that the factory really needs to be a wrapper around the Base class rather than the Base acting as the factory?
You do NOT want to use your interface class as the factory class. For one, if it is a true interface class, there is no implementation. Second, if the interface class does have some implementation defined (in addition to the pure virtual functions), making a static factory method now forces the base class to be recompiled every time you add a child class implementation.
The best way to implement the factory pattern is to have your interface class separate from your factory.
A very simple (and incomplete) example is below:
class MyInterface
{
public:
virtual void MyFunc() = 0;
};
class MyImplementation : public MyInterface
{
public:
virtual void MyFunc() {}
};
class MyFactory
{
public:
static MyInterface* CreateImplementation(...);
};
I'd have to agree with you. Probably one of the most important principles of object oriented programming is to have a single responsibility for the scope of a piece of code (whether it's a method, class or namespace). In your case, your base class serves the purpose of defining an interface. Adding a factory method to that class, violates that principle, opening the door to a world of shi... trouble.
Yes, a static factory method in the interface (base class) requires it to have knowledge of all possible instantiations. That way, you don't get any of the flexibility the Factory Method pattern is intended to bring.
The Factory should be an independent piece of code, used by client code to create instances. You have to decide somewhere in your program what concrete instance to create. Factory Method allows you to avoid having the same decision spread out through your client code. If later you want to change the implementation (or e.g. for testing), you have just one place to edit: this may be e.g. a simple global change, through conditional compilation (usually for tests), or even via a dependency injection configuration file.
Be careful about how client code communicates what kind of implementation it wants: that's not an uncommon way of reintroducing the dependencies factories are meant to hide.
It's not uncommon to see factory member functions in a class, but it makes my eyes bleed. Often their use have been mixed up with the functionality of the named constructor idiom. Moving the creation function(s) to a separate factory class will buy you more flexibility also to swap factories during testing.
When the interface is just for hiding the implementation details and there will be only one implementation of the Base interface ever, it could be ok to couple them. In that case, the factory function is just a new name for the constructor of the actual implementation.
However, that case is rare. Except when explicit designed having only one implementation ever, you are better off to assume that multiple implementations will exist at some point in time, if only for testing (as you discovered).
So usually it is better to split the Factory part into a separate class.