pattern to avoid dynamic_cast - c++

I have a class:
class A
{
public:
virtual void func() {…}
virtual void func2() {…}
};
And some classes derived from it, let's say B, C, D... In 95% of the cases, I want to go through all objects and call func() or func2(), so I keep them in a vector, like:
std::vector<std::shared_ptr<A> > myVec;
…
for (auto it = myVec.begin(); it != myVec.end(); ++it)
(*it)->func();
However, in the remaining 5% of the cases I want to do something different with the objects depending on their subclass. And I mean totally different, like calling functions that take other parameters, or not calling functions at all for some subclasses. I have thought of some options to solve this, none of which I really like:
Use dynamic_cast to determine the subclass. Not good: too slow, as I make these calls very often and on limited hardware.
Use a flag in each subclass, like an enum {IS_SUBCLASS_B, IS_SUBCLASS_C}. Not good, as it doesn't feel OO.
Also put the objects in other vectors, each for its specific task. This doesn't feel really OO either, but maybe I'm wrong here. Like:
std::vector<std::shared_ptr<B> > vecForDoingSpecificOperation;
std::vector<std::shared_ptr<C> > vecForDoingAnotherSpecificOperation;
So, can someone suggest a style/pattern that achieves what I want?

Someone intelligent (unfortunately I forgot who) once said about OOP in C++: The only reason for switching over types (which is what all your suggestions propose) is fear of virtual functions. (That's para-paraphrasing.) Add virtual functions to your base class which derived classes can override, and you're set.
Now, I know there are cases where this is hard or unwieldy. For that we have the visitor pattern.
There are cases where one is better, and cases where the other is. Usually, the rule of thumb goes like this:
If you have a rather fixed set of operations, but keep adding types, use virtual functions.
Operations are hard to add to/remove from a big inheritance hierarchy, but new types are easy to add by simply having them override the appropriate virtual functions.
If you have a rather fixed set of types, but keep adding operations, use the visitor pattern.
Adding new types to a large set of visitors is a serious pain in the neck, but adding a new visitor to a fixed set of types is easy.
(If both change, you're doomed either way.)
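A minimal sketch of the visitor approach applied to the question's hierarchy, assuming the set of subclasses (B, C, D) is fairly fixed. The names AVisitor, SpecialOperation, specialB() and specialC() are hypothetical, chosen only for illustration. Each subclass forwards accept() to the matching visit() overload, so the "5 %" code gets the concrete type without dynamic_cast or a type flag, at the cost of one extra virtual call.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class B; class C; class D;

// One visit overload per concrete type; the set of types is assumed to be fixed.
struct AVisitor {
    virtual ~AVisitor() = default;
    virtual void visit(B&) = 0;
    virtual void visit(C&) = 0;
    virtual void visit(D&) = 0;
};

class A {
public:
    virtual ~A() = default;
    virtual void func() {}                   // the common "95 %" operation
    virtual void accept(AVisitor& v) = 0;    // double-dispatch hook
};

class B : public A {
public:
    void accept(AVisitor& v) override { v.visit(*this); }
    int specialB(int x) { return x * 2; }    // hypothetical B-only operation
};

class C : public A {
public:
    void accept(AVisitor& v) override { v.visit(*this); }
    std::string specialC() { return "C"; }   // hypothetical C-only operation
};

class D : public A {
public:
    void accept(AVisitor& v) override { v.visit(*this); }
};

// The "5 %" operation: completely different handling per subclass, no casts.
struct SpecialOperation : AVisitor {
    int total = 0;
    std::string log;
    void visit(B& b) override { total += b.specialB(10); }
    void visit(C& c) override { log += c.specialC(); }
    void visit(D&) override {}               // deliberately does nothing for D
};
```

With myVec holding a B, a C and a D, calling p->accept(op) for each element runs the B-, C- and D-specific branches of SpecialOperation in turn, while the 95% path keeps calling func() directly.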

According to your comments, what you have stumbled upon is known (dubiously) as the Expression Problem, as expressed by Philip Wadler:
The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).
That is, extending both "vertically" (adding types to the hierarchy) and "horizontally" (adding functions to be overridden to the base class) is hard on the programmer.
There was a long (as always) discussion about it on Reddit in which I proposed a solution in C++.
It is a bridge between OO (great at adding new types) and generic programming (great at adding new functions). The idea is to have a hierarchy of pure interfaces and a set of non-polymorphic types. Free functions are defined on the concrete types as needed, and the bridge to the pure interfaces is provided by a single template class for each interface (supplemented by a template function for automatic deduction).
I have found a single limitation to date: if a function returns a Base interface, the object may have been created with only that interface, even though the wrapped type now supports more operations. This is typical of a modular design (the new functions were not available at the call site). I think it illustrates a clean design; however, I understand one could want to "recast" it to a richer interface. Go can do this, with language support (basically, runtime introspection of the available methods). I don't want to code this in C++.
As I already explained on Reddit, I'll just reproduce and tweak the code I submitted there.
So, let's start with 2 types and a single operation.
struct Square { double side; };
double area(Square const s);
struct Circle { double radius; };
double area(Circle const c);
Now, let's make a Shape interface:
class Shape {
public:
virtual ~Shape() {}
virtual double area() const = 0;
protected:
Shape() {}
Shape(Shape const&) {}
Shape& operator=(Shape const&) { return *this; }
};
typedef std::unique_ptr<Shape> ShapePtr;
template <typename T>
class ShapeT: public Shape {
public:
explicit ShapeT(T const t): _shape(t) {}
// The block-scope using-declaration stops the member area() from hiding the free area() overloads.
virtual double area() const { using ::area; return area(_shape); }
private:
T _shape;
};
template <typename T>
ShapePtr newShape(T t) { return ShapePtr(new ShapeT<T>(t)); }
Okay, C++ is verbose. Let's check the use immediately:
double totalArea(std::vector<ShapePtr> const& shapes) {
double total = 0.0;
for (ShapePtr const& s: shapes) { total += s->area(); }
return total;
}
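int main() {
// unique_ptr is move-only, so it cannot be copied out of an initializer_list; push_back instead.
std::vector<ShapePtr> shapes;
shapes.push_back(newShape<Square>({5.0}));
shapes.push_back(newShape<Circle>({3.0}));
std::cout << totalArea(shapes) << "\n";
}
So, first exercise, let's add a shape (yes, that's all):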
struct Rectangle { double length, height; };
double area(Rectangle const r);
Okay, so far so good, let's add a new function. We have two options.
The first is to modify Shape if it is in our power. This is source compatible, but not binary compatible.
// 1. We need to extend Shape:
virtual double perimeter() const = 0;
// 2. And its adapter, ShapeT (at least one free perimeter() must be declared before ShapeT for the using-declaration):
virtual double perimeter() const { using ::perimeter; return perimeter(_shape); }
// 3. And provide the function for each shape (obviously):
double perimeter(Square const s);
double perimeter(Circle const c);
double perimeter(Rectangle const r);
It may seem that we fall into the Expression Problem here, but we don't. We needed to add the perimeter for each (already known) class because there is no way to automatically infer it; however it did not require editing each class either!
Therefore, the combination of an external interface and free functions lets us neatly (well, it is C++...) sidestep the issue.
sodraz noticed in the comments that adding a function touched the original interface, which may need to be frozen (because it is provided by a third party, or for binary compatibility reasons).
The second option is therefore non-intrusive, at the cost of being slightly more verbose:
class ExtendedShape: public Shape {
public:
virtual double perimeter() const = 0;
protected:
ExtendedShape() {}
ExtendedShape(ExtendedShape const&) {}
ExtendedShape& operator=(ExtendedShape const&) { return *this; }
};
typedef std::unique_ptr<ExtendedShape> ExtendedShapePtr;
template <typename T>
class ExtendedShapeT: public ExtendedShape {
public:
explicit ExtendedShapeT(T const t): _data(t) {}
virtual double area() const { using ::area; return area(_data); }
virtual double perimeter() const { using ::perimeter; return perimeter(_data); }
private:
T _data;
};
template <typename T>
ExtendedShapePtr newExtendedShape(T t) { return ExtendedShapePtr(new ExtendedShapeT<T>(t)); }
And then, define the perimeter function for all the shapes we would like to use with ExtendedShape.
The old code, compiled to work against Shape, still works. It does not need the new function anyway.
The new code can make use of the new functionality, and still interface painlessly with the old code. (*)
There is only one slight issue: if the old code returns a ShapePtr, we do not know whether the shape actually has a perimeter function (note: if the pointer is created inside the old code, it was not created with the newExtendedShape mechanism). This is the limitation of the design mentioned at the beginning. Oops :)
(*) Note: painlessly implies that you know who the owner is. A std::unique_ptr<Derived>& and a std::unique_ptr<Base>& are not compatible; however, a std::unique_ptr<Base> can be built (by moving) from a std::unique_ptr<Derived>, and a Base* from a Derived*, so make sure your functions are clean ownership-wise and you're golden.
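Putting the pieces together, here is a compilable sketch of the non-intrusive variant. The area() and perimeter() bodies are illustrative, Circle is left out for brevity, and the block-scope using-declarations are an addition of mine so that the adapter's member functions do not hide the free functions they forward to.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Concrete, non-polymorphic types plus free functions (illustrative bodies).
struct Square { double side; };
struct Rectangle { double length, height; };
double area(Square const s) { return s.side * s.side; }
double area(Rectangle const r) { return r.length * r.height; }
double perimeter(Square const s) { return 4 * s.side; }
double perimeter(Rectangle const r) { return 2 * (r.length + r.height); }

// The original, frozen interface.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
protected:
    Shape() {}
    Shape(Shape const&) {}
    Shape& operator=(Shape const&) { return *this; }
};
typedef std::unique_ptr<Shape> ShapePtr;

// The non-intrusive extension.
class ExtendedShape : public Shape {
public:
    virtual double perimeter() const = 0;
protected:
    ExtendedShape() {}
    ExtendedShape(ExtendedShape const&) {}
    ExtendedShape& operator=(ExtendedShape const&) { return *this; }
};
typedef std::unique_ptr<ExtendedShape> ExtendedShapePtr;

template <typename T>
class ExtendedShapeT : public ExtendedShape {
public:
    explicit ExtendedShapeT(T const t) : _data(t) {}
    // Block-scope using-declarations make the free functions visible again.
    double area() const override { using ::area; return area(_data); }
    double perimeter() const override { using ::perimeter; return perimeter(_data); }
private:
    T _data;
};

template <typename T>
ExtendedShapePtr newExtendedShape(T t) { return ExtendedShapePtr(new ExtendedShapeT<T>(t)); }

// "Old" code, compiled against Shape only.
double totalArea(std::vector<ShapePtr> const& shapes) {
    double total = 0.0;
    for (ShapePtr const& s : shapes) { total += s->area(); }
    return total;
}
```

New code builds ExtendedShapePtrs and may call perimeter(); when it hands them to old code that expects ShapePtrs, it moves them, which transfers ownership and leaves the original pointers empty.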

Related

What is a good design to use external class on member functions?

I have the following design problem and am looking for the most elegant and, even more importantly, the most efficient solution, as this problem comes from a context where performance is an issue.
Simply put, I have a class "Function_processor" that does some calculations for real functions (e.g. calculates the roots of a real function), and another class "A" that has several such functions and needs to use the Function_processor to perform calculations on them.
The Function_processor should be as generic as possible (e.g. not provide interfaces for all sorts of different objects), but merely stick to its own task (doing calculations for any function).
#include "function_processor.h"
class A {
double a;
public:
A(double a) : a(a) {}
double function1(double x) {
return a*x;
}
double function2(double x){
return a*x*x;
}
double calculate_sth() {
Function_processor function_processor(3*a+1, 7);
return function_processor.do_sth(&function1);
}
};
class Function_processor {
double p1, p2;
public:
Function_processor(double parameter1, double parameter2);
double do_sth(double (*function)(double));
double do_sth_else(double (*function)(double));
};
Clearly I cannot pass the member functions A::function1/2 as in the example above (I know that, but this is roughly what I would consider readable code).
Also I cannot make function1/2 static because they use the non-static member a.
I am sure I could use something like std::bind or templates (even though I have hardly any experience with these things), but then I am mostly concerned about the performance I would get.
What is the best (nice code and fast performance) solution to my problem ?
Thanks for your help !
This is not really the best way to do this, either from a pure OO point of view or from a functional or procedural one. First of all, your class A is really nothing more than a namespace that has to be instantiated. Personally, I'd just make its functions free-floating C-style ones, maybe in a namespace somewhere so that you get some kind of classification.
Here's how you'd do it in pure OO:
class Function
{
public:
virtual ~Function() {}
virtual double Execute(double value) = 0;
};
class Function1 : public Function
{
public:
virtual double Execute(double value) { ... }
};
class FunctionProcessor
{
public:
void Process(Function & f)
{
...
}
};
This way, you could instantiate Function1 and FunctionProcessor and send the Function1 object to the Process method. You could derive anything from Function and pass it to Process.
A similar, but more generic way to do it is to use templates:
template <class T>
class FunctionProcessor
{
public:
explicit FunctionProcessor(T f) : function(f) {}
void Process()
{
... // calls function(x) as needed
}
private:
T function;
};
You can pass anything callable as T, but in this case T becomes a compile-time dependency, so the function type has to be fixed in the code. No dynamic stuff allowed here!
Here's another templated mechanism, this time using simple functions instead of classes:
template <class T>
void Process(T function) // by value, so temporaries such as function2() can be passed
{
...
double v1 = function(x1);
double v2 = function(x2);
...
}
You can call this thing like this:
double function1(double val)
{
return blah;
}
struct function2
{
double operator()(double val) const { return blah; }
};
// somewhere else
Process(function1);
Process(function2());
You can use this approach with anything that can be called with the right signature: plain functions, static methods in classes, functors (like struct function2 above), std::mem_fun objects, newfangled C++11 lambdas... And if you use functors, you can pass them parameters in the constructor, just like any object.
That last approach is probably what I'd use: it's the fastest if you know what you're calling at compile time, and the simplest to read in the client code. If it has to be extremely loosely coupled for some reason, I'd go with the first, class-based approach. I personally think that circumstance is quite rare, especially as you describe the problem.
If you still want to use your class A, make all the functions static if they don't need member access. Otherwise, look at std::mem_fun. I still discourage this approach.
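A minimal sketch of the template approach applied to the question's class A, assuming C++11: do_sth becomes a member template that accepts any callable, and calculate_sth passes a lambda that captures this. The body of do_sth (summing f(p1) and f(p2)) is a hypothetical placeholder, since the real calculation is not shown in the question.

```cpp
#include <cassert>

class Function_processor {
    double p1, p2;
public:
    Function_processor(double parameter1, double parameter2)
        : p1(parameter1), p2(parameter2) {}

    // Accepts any callable taking a double. The callable's type is a template
    // parameter, so the compiler sees the concrete call and can inline it.
    template <typename F>
    double do_sth(F function) const {
        // Placeholder for the real algorithm (e.g. root finding).
        return function(p1) + function(p2);
    }
};

class A {
    double a;
public:
    A(double a) : a(a) {}
    double function1(double x) const { return a * x; }
    double function2(double x) const { return a * x * x; }
    double calculate_sth() const {
        Function_processor function_processor(3 * a + 1, 7);
        // Capturing this gives the lambda access to the non-static member a.
        return function_processor.do_sth([this](double x) { return function1(x); });
    }
};
```

Function_processor stays generic (it knows nothing about A), and no std::bind or function pointer is involved.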
If I understood correctly, what you're looking for seems to be a pointer to member function:
double do_sth(double (A::*function)(double));
To call it, however, you would also need an object of class A. You could pass that into function_processor in the constructor.
I'm not sure about the performance of this, though.
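For comparison, a sketch of the pointer-to-member approach. Passing the object as an extra argument to do_sth is an assumption (the object could equally be stored via the constructor), and the body is again a hypothetical placeholder. Note that this ties Function_processor to A, which the question wanted to avoid.

```cpp
#include <cassert>

class A {
    double a;
public:
    explicit A(double a) : a(a) {}
    double function1(double x) { return a * x; }
    double function2(double x) { return a * x * x; }
};

class Function_processor {
    double p1, p2;
public:
    Function_processor(double parameter1, double parameter2)
        : p1(parameter1), p2(parameter2) {}

    // Takes the object plus a pointer to one of its member functions;
    // (obj.*function)(x) invokes that member on obj.
    double do_sth(A& obj, double (A::*function)(double)) const {
        // Placeholder computation standing in for the real algorithm.
        return (obj.*function)(p1) + (obj.*function)(p2);
    }
};
```

A call through a member-function pointer is an indirect call, much like a call through a plain function pointer.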

Dynamically construct function

I fear something like this is answered somewhere on this site, but I can't find it because I don't even know how to formulate the question. So here's the problem:
I have a voxel drawing function. First I calculate offsets, angles and so on, and then I do the drawing. But I write several versions of every function, because sometimes I want to copy a pixel, sometimes blit, sometimes blit a 3*3 square for every pixel for a smoothing effect, and sometimes just copy a pixel to n*n pixels on the screen if the object is resized. And there are tons of versions of that small part in the center of the function.
What can I do instead of writing 10 copies of the same function that differ only in the central part of the code? For performance reasons, passing a function pointer as an argument is not an option. I'm not sure making them inline will do the trick, because the arguments I send differ: sometimes I calculate volume (Z value), sometimes I know pixels are drawn from bottom to top.
I assume there's some way of doing this stuff in C++ everybody knows about.
Please tell me what I need to learn to do this. Thanks.
The traditional OO approaches to this are the template method pattern and the strategy pattern.
Template Method
The first is an extension of the technique described in Vincenzo's answer: instead of writing a simple non-virtual wrapper, you write a non-virtual function containing the whole algorithm. The parts that might vary are virtual function calls.
The specific arguments needed for a given implementation are stored in the derived class object that provides that implementation.
E.g.:
class VoxelDrawer {
protected:
virtual void copy(Coord from, Coord to) = 0;
// any other functions you might want to change
public:
virtual ~VoxelDrawer() {}
void draw(arg) {
for (;;) {
// implement full algorithm
copy(a,b);
}
}
};
class SmoothedVoxelDrawer: public VoxelDrawer {
int radius; // algorithm-specific argument
void copy(Coord from, Coord to) {
blit(from.dx(-radius).dy(-radius),
to.dx(-radius).dy(-radius),
2*radius, 2*radius);
}
public:
SmoothedVoxelDrawer(int r) : radius(r) {}
};
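The template method above can be made compilable with stand-ins: Coord, the int count argument and the placeholder geometry are hypothetical, and instead of blitting, the derived class records the top-left corner of each block it would draw.

```cpp
#include <cassert>
#include <vector>

struct Coord { int x, y; };  // hypothetical stand-in for the question's coordinate type

class VoxelDrawer {
protected:
    virtual void copy(Coord from, Coord to) = 0;  // the step that varies
public:
    virtual ~VoxelDrawer() {}
    // Non-virtual skeleton shared by all drawers.
    void draw(int count) {
        for (int i = 0; i < count; ++i) {
            Coord from{i, 0};
            Coord to{i, i};  // placeholder for the real offset/angle computation
            copy(from, to);
        }
    }
};

class SmoothedVoxelDrawer : public VoxelDrawer {
    int radius;  // algorithm-specific argument, stored in the object
protected:
    void copy(Coord /*from*/, Coord to) override {
        // A real version would blit a 2*radius square; here we only record its corner.
        written.push_back(Coord{to.x - radius, to.y - radius});
    }
public:
    explicit SmoothedVoxelDrawer(int r) : radius(r) {}
    std::vector<Coord> written;
};
```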
Strategy
This is similar, but instead of using inheritance, you pass a polymorphic Copier object as an argument to your function. It's more flexible in that it decouples your various copying strategies from the specific function, and you can reuse your copying strategies in other functions.
struct VoxelCopier {
virtual ~VoxelCopier() {}
virtual void operator()(Coord from, Coord to) = 0;
};
struct SmoothedVoxelCopier: public VoxelCopier {
// etc. as for SmoothedVoxelDrawer
};
void draw_voxels(arguments, VoxelCopier &copy) {
for (;;) {
// implement full algorithm
copy(a,b);
}
}
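A compilable sketch of the strategy version under the same assumptions: Coord, the count argument and the placeholder geometry are hypothetical, and the copier records corners instead of blitting. The algorithm is a free function, and the varying step arrives as a reference to the abstract VoxelCopier.

```cpp
#include <cassert>
#include <vector>

struct Coord { int x, y; };  // hypothetical

struct VoxelCopier {
    virtual ~VoxelCopier() {}
    virtual void operator()(Coord from, Coord to) = 0;
};

struct SmoothedVoxelCopier : VoxelCopier {
    int radius;
    std::vector<Coord> written;  // records what would have been blitted
    explicit SmoothedVoxelCopier(int r) : radius(r) {}
    void operator()(Coord /*from*/, Coord to) override {
        written.push_back(Coord{to.x - radius, to.y - radius});
    }
};

// The same algorithm can be reused with any VoxelCopier implementation.
void draw_voxels(int count, VoxelCopier& copy) {
    for (int i = 0; i < count; ++i) {
        copy(Coord{i, 0}, Coord{i, i});  // placeholder geometry; virtual call
    }
}
```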
Although tidier than passing in a function pointer, neither the template method nor the strategy are likely to have better performance than just passing a function pointer: runtime polymorphism is still an indirect function call.
Policy
The modern C++ equivalent of the strategy pattern is the policy pattern. This simply replaces run-time polymorphism with compile-time polymorphism to avoid the indirect function call and enable inlining.
// you don't need a common base class for policies,
// since templates use duck typing
struct SmoothedVoxelCopier {
int radius;
void copy(Coord from, Coord to) { ... }
};
template <typename CopyPolicy>
void draw_voxels(arguments, CopyPolicy cp) {
for (;;) {
// implement full algorithm
cp.copy(a,b);
}
}
Because of type deduction, you can simply call
draw_voxels(arguments, SmoothedVoxelCopier{radius}); // aggregate, so brace-initialize
draw_voxels(arguments, OtherVoxelCopier(whatever));
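A compilable sketch of the policy version, with hypothetical CountingCopier and SumCopier policies and the same placeholder Coord and geometry. The policy is still taken by value as in the answer; returning it afterwards (the way std::for_each returns its function object) is an addition that lets the caller inspect any state the policy collected.

```cpp
#include <cassert>

struct Coord { int x, y; };  // hypothetical

// No common base class: any type with a suitable copy() member will do.
struct CountingCopier {
    int calls = 0;
    void copy(Coord, Coord) { ++calls; }
};

struct SumCopier {
    int sum = 0;
    void copy(Coord, Coord to) { sum += to.x + to.y; }
};

template <typename CopyPolicy>
CopyPolicy draw_voxels(int count, CopyPolicy cp) {
    for (int i = 0; i < count; ++i) {
        cp.copy(Coord{i, 0}, Coord{i, i});  // resolved at compile time, can be inlined
    }
    return cp;
}
```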
NB. I've been slightly inconsistent here: I used operator() to make my strategy call look like a regular function, but a normal method for my policy. So long as you choose one and stick with it, this is just a matter of taste.
CRTP Template Method
There's one final mechanism, which is the compile-time polymorphism version of the template method, and uses the Curiously Recurring Template Pattern.
template <typename Impl>
class VoxelDrawerBase {
protected:
Impl& impl() { return *static_cast<Impl*>(this); }
void copy(Coord from, Coord to) {...}
// *optional* default implementation, is *not* virtual
public:
void draw(arg) {
for (;;) {
// implement full algorithm
impl().copy(a,b);
}
}
};
class SmoothedVoxelDrawer: public VoxelDrawerBase<SmoothedVoxelDrawer> {
friend class VoxelDrawerBase<SmoothedVoxelDrawer>; // lets the base call the private copy()
int radius; // algorithm-specific argument
void copy(Coord from, Coord to) {
blit(from.dx(-radius).dy(-radius),
to.dx(-radius).dy(-radius),
2*radius, 2*radius);
}
public:
SmoothedVoxelDrawer(int r) : radius(r) {}
};
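A compilable sketch of the CRTP version, again with a hypothetical Coord, count argument and placeholder geometry. SmoothedVoxelDrawer supplies its own copy(); the hypothetical PlainVoxelDrawer supplies none and falls back to the base class default. There are no virtual functions, so impl().copy(...) is bound at compile time.

```cpp
#include <cassert>
#include <vector>

struct Coord { int x, y; };  // hypothetical

template <typename Impl>
class VoxelDrawerBase {
    Impl& impl() { return *static_cast<Impl*>(this); }
protected:
    // Optional default step; not virtual. A derived copy() hides it.
    void copy(Coord, Coord) {}
public:
    void draw(int count) {
        for (int i = 0; i < count; ++i) {
            impl().copy(Coord{i, 0}, Coord{i, i});  // static dispatch, no vtable
        }
    }
};

class SmoothedVoxelDrawer : public VoxelDrawerBase<SmoothedVoxelDrawer> {
    friend class VoxelDrawerBase<SmoothedVoxelDrawer>;  // base may call private copy()
    int radius;
    void copy(Coord, Coord to) { written.push_back(Coord{to.x - radius, to.y - radius}); }
public:
    explicit SmoothedVoxelDrawer(int r) : radius(r) {}
    std::vector<Coord> written;
};

// Uses the base class's default copy().
class PlainVoxelDrawer : public VoxelDrawerBase<PlainVoxelDrawer> {};
```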
Summary
In general I'd prefer the strategy/policy patterns for their lower coupling and better reuse, and choose the template method pattern only where the top-level algorithm you're parameterizing is genuinely set in stone (i.e., when you're either refactoring existing code or are really sure of your analysis of the points of variation) and reuse is genuinely not an issue.
It's also really painful to use the template method if there is more than one axis of variation (that is, you have multiple methods like copy, and want to vary their implementations independently). You either end up with code duplication or mixin inheritance.
I suggest using the NVI (Non-Virtual Interface) idiom.
You have your public method which calls a private function that implements the logic that must differ from case to case.
Derived classes will have to provide an implementation of that private function that specializes them for their particular task.
Example:
class A {
public:
void do_base() {
// [pre]
specialized_do();
// [post]
}
private:
virtual void specialized_do() = 0;
};
class B : public A {
private:
void specialized_do() {
// [implementation]
}
};
The advantage is that you can keep a common implementation in the base class and refine it as required in each subclass (each of which just needs to reimplement the specialized_do method).
The disadvantage is that you need a different type for each implementation, but if your use case is drawing different UI elements, this is the way to go.
You could simply use the strategy pattern.
So, instead of something like
void do_something_one_way(...)
{
//blah
//blah
//blah
one_way();
//blah
//blah
}
void do_something_another_way(...)
{
//blah
//blah
//blah
another_way();
//blah
//blah
}
You will have
void do_something(...)
{
//blah
//blah
//blah
any_which_way();
//blah
//blah
}
any_which_way could be a lambda, a functor, a virtual member function of a strategy class passed in. There are many options.
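As a sketch of the lambda option, assuming the varying step can be expressed as a callable: do_something, its int parameter and the string-building bodies are hypothetical placeholders for the //blah code. Because the step is a template parameter rather than a function pointer, the compiler can inline it.

```cpp
#include <cassert>
#include <string>

// Shared steps are written once; the varying middle step is a template
// parameter, so a lambda or functor can be passed and inlined.
template <typename Step>
std::string do_something(int n, Step any_which_way) {
    std::string result = "begin:";  // stands in for the common code before the step
    result += any_which_way(n);     // the part that varies
    result += ":end";               // stands in for the common code after the step
    return result;
}
```

For example, do_something(3, [](int n) { return std::to_string(n); }) and do_something(2, [](int n) { return std::string(n, '*'); }) share all of the surrounding code and differ only in the middle step.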
Are you sure that "passing a function pointer as an argument is not an option"? Does it really slow things down?
You could use higher order functions, if your 'central part' can be parameterized nicely.
Here is a simple example of a function that returns a function which adds n to its argument:
#include <iostream>
#include<functional>
std::function<int(int)> n_adder(int n)
{
return [=](int x){return x+n;};
}
int main()
{
auto add_one = n_adder(1);
std::cout<<add_one(5);
}
You can use either Template Method pattern or Strategy pattern.
The Template Method pattern is usually used in white-box frameworks, where you need to know about the internal structure of a framework to correctly subclass a class.
The Strategy pattern is usually used in black-box frameworks, where you don't need to know about the implementation of the framework, since you only need to understand the contract of the methods you implement.
For performance reasons, passing a function pointer as an argument is not an option.
Are you sure that passing one additional parameter will cause performance problems? In that case you may have similar performance penalties if you use OOP techniques like Template Method or Strategy. But it is usually necessary to use a profiler to determine the source of the performance degradation. Virtual calls, passing additional parameters and calling a function through a pointer are usually very cheap compared to complex algorithms. You may find that these techniques consume an insignificant percentage of CPU time compared to the rest of the code.
I'm not sure making them inline will do the trick, because arguments I send differ: sometimes I calculate volume(Z value), sometimes I know pixels are drawn from bottom to top.
You could pass all the parameters required for drawing in all cases. Alternatively, if you use the Template Method pattern, a base class could provide methods that return the data that might be required for drawing in different cases. With the Strategy pattern, you could pass a Strategy implementation an object that provides this kind of data.

several classes implement parent class with varying api

I have a class Feature with a pure virtual method.
class Feature {
public:
virtual ~Feature() {}
virtual const float getValue(const vector<int>& v) const = 0;
};
This class is implemented by several classes, for example FeatureA and FeatureB.
A separate class Computer (simplified) uses the getValue method to do some computation.
class Computer {
public:
const float compute(const vector<Feature*>& features, const vector<int>& v) {
float res = 0;
for (int i = 0; i < features.size(); ++i) {
res += features[i]->getValue(v);
}
return res;
}
};
Now, I would like to implement FeatureC, but I realize that I need additional information in the getValue method. The method in FeatureC looks like
const float getValue(const vector<int>& v, const vector<int>& additionalInfo) const;
I can of course modify the signature of getValue in Feature, FeatureA, FeatureB to take additionalInfo as a parameter and also add additionalInfo as a parameter in the compute method. But then I may have to modify all those signatures again later if I want to implement FeatureD that needs even more additional info. I wonder if there is a more elegant solution to this or if there is a known design pattern that you can point me to for further reading.
You have at least two options:
Instead of passing the single vector to getValue(), pass a struct. In this struct you can put the vector today, and more data tomorrow. Of course, if some runs of your program don't need the extra fields, computing them might be wasteful. But it will impose no performance penalty if you always need to compute all the data anyway (i.e. if there will always be a FeatureC).
Pass to getValue() a reference to an object having methods to get the necessary data. This object could be the Computer itself, or some simpler proxy. Then the getValue() implementations can request exactly what they need, and it can be lazily computed. The laziness will eliminate wasted computations in some cases, but the overall structure of doing it this way will impose some small constant overhead due to having to call (possibly virtual) functions to get the various data.
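A minimal sketch of the first option, assuming a hypothetical FeatureInput struct; the FeatureA and FeatureC bodies are made up for illustration. The pointless const on the returned float is dropped. Adding a field for a future FeatureD would change only the struct, not any getValue signature.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bundle of everything a feature might need.
struct FeatureInput {
    std::vector<int> v;
    std::vector<int> additionalInfo;  // only FeatureC reads this today
};

class Feature {
public:
    virtual ~Feature() {}
    virtual float getValue(const FeatureInput& in) const = 0;
};

class FeatureA : public Feature {
public:
    float getValue(const FeatureInput& in) const override {
        return static_cast<float>(in.v.size());  // ignores additionalInfo
    }
};

class FeatureC : public Feature {
public:
    float getValue(const FeatureInput& in) const override {
        float sum = 0;
        for (int x : in.additionalInfo) sum += x;
        return sum;
    }
};

float compute(const std::vector<Feature*>& features, const FeatureInput& in) {
    float res = 0;
    for (const Feature* f : features) res += f->getValue(in);
    return res;
}
```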
Requiring the user of your Feature class hierarchy to call different methods based on class defeats polymorphism. Once you start doing dynamic_cast<>() you know you should be rethinking your design.
If a subclass requires information that it can only get from its caller, you should change the getValue() method to take an additionalInfo argument, and simply ignore that information in classes where it doesn't matter.
If FeatureC can get additionalInfo by calling another class or function, that's usually a better approach, as it limits the number of classes that need to know about it. Perhaps the data is available from an object which FeatureC is given access to via its constructor, or from a singleton object, or it can be calculated by calling a function. Finding the best approach requires a bit more knowledge about the case.
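A sketch of the constructor route, assuming additionalInfo can be supplied when FeatureC is created. The dot-product formula is hypothetical, and holding a reference means the referenced vector must outlive the FeatureC object. getValue keeps the common signature, so Computer::compute does not change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Feature {
public:
    virtual ~Feature() {}
    virtual float getValue(const std::vector<int>& v) const = 0;
};

// FeatureC is handed its extra data once, at construction.
class FeatureC : public Feature {
    const std::vector<int>& additionalInfo;  // must outlive this object
public:
    explicit FeatureC(const std::vector<int>& info) : additionalInfo(info) {}
    float getValue(const std::vector<int>& v) const override {
        // Hypothetical formula: dot product of v and additionalInfo.
        float res = 0;
        for (std::size_t i = 0; i < v.size() && i < additionalInfo.size(); ++i)
            res += static_cast<float>(v[i] * additionalInfo[i]);
        return res;
    }
};
```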
This problem is addressed in item 39 of C++ Coding Standards (Sutter, Alexandrescu), which is titled "Consider making virtual functions nonpublic, and public functions nonvirtual."
In particular, one of the motivations for following the Non-Virtual-Interface design pattern (this is what the item is all about) is stated as
Each interface can take its natural shape: When we separate the public interface from the customization interface, each can easily take the form it naturally wants to take instead of trying to find a compromise that forces them to look identical. Often, the two interfaces want different numbers of functions and/or different parameters; [...]
This is particularly useful in base classes with a high cost of change.
Another design pattern which is very useful in this case is the Visitor pattern. As with NVI, it applies when base classes (as well as the whole hierarchy) have a high cost of change. You can find plenty of discussion about this design pattern; I suggest you read the related chapter in Modern C++ Design (Alexandrescu), which (on the side) gives you great insight into how to use the (very easy to use) Visitor facilities in Loki.
I suggest you read all of this material and then edit the question so that we can give you a better answer. We can come up with all sorts of solutions (e.g. an additional method which gives the class the additional parameters, if needed) which might well not suit your case.
Try to address the following questions:
would a template-based solution fit the problem?
would it be feasible to add a new layer of indirection when calling the function?
would a "push argument"-"push argument"-...-"push argument"-"call function" method be of help? (this might seem very odd at first, but think of something like "cout << arg << arg << arg << endl", where "endl" is the "call function")
how do you intend to distinguish how to call the function in Computer::compute?
Now that we had some "theory", let's aim for the practice using the Visitor pattern:
#include <iostream>
using namespace std;
class FeatureA;
class FeatureB;
class Computer{
public:
int visitA(FeatureA& f);
int visitB(FeatureB& f);
};
class Feature {
public:
virtual ~Feature() {}
virtual int accept(Computer&) = 0;
};
class FeatureA : public Feature {
public:
int accept(Computer& c) override {
return c.visitA(*this);
}
int compute(int a){
return a+1;
}
};
class FeatureB : public Feature {
public:
int accept(Computer& c) override {
return c.visitB(*this);
}
int compute(int a, int b){
return a+b;
}
};
int Computer::visitA(FeatureA& f){
return f.compute(1);
}
int Computer::visitB(FeatureB& f){
return f.compute(1, 2);
}
int main()
{
FeatureA a;
FeatureB b;
Computer c;
cout << a.accept(c) << '\t' << b.accept(c) << endl;
}
This is a rough implementation of the Visitor pattern which, as you can see, solves your problem. I strongly advise you not to implement it this way: there are obvious dependency problems, which can be solved by a refinement called the Acyclic Visitor. It is already implemented in Loki, so there is no need to worry about implementing it.
Apart from implementation, as you can see you are not relying on type switches (which, as somebody else pointed out, you should avoid whenever possible) and you are not requiring the classes to have any particular interface (e.g. one argument for the compute function). Moreover, if the visitor class is a hierarchy (make Computer a base class in the example), you won't need to add any new function to the hierarchy when you want to add functionalities of this sort.
If you don't like the visitA, visitB, ... "pattern", worry not: this is just a trivial implementation and you don't need that. Basically, in a real implementation you use template specialization of a visit function.
Hope this helps; I put a lot of effort into it :)
Virtual functions, to work correctly, need to have exactly the same "signature" (same parameters and same return type). Otherwise, you just get a "new member function", which isn't what you want.
The real question here is "how does the calling code know it needs the extra information".
You can solve this in a few different ways - the first one is to always pass in const vector <int>& additionalInfo, whether it's needed or not.
If that's not possible, because there isn't any additionalInfo except in the case of FeatureC, you could have an "optional" parameter, which means using a pointer to vector (vector<int>* additionalInfo) that is NULL when the value is not available.
Of course if additionalInfo is a value that is something that can be stored in the FeatureC class, then that would also work.
Another option is to extend the base class Feature with two more member functions:
class Feature {
public:
virtual ~Feature() {}
virtual const float getValue(const vector<int>& v) const = 0;
virtual const float getValue(const vector<int>& v, const vector<int>& additionalInfo) { return -1.0; };
virtual bool useAdditionalInfo() { return false; }
};
and then make your loop something like this:
for (int i = 0; i < features.size(); ++i) {
if (features[i]->useAdditionalInfo())
{
res += features[i]->getValue(v, additionalInfo);
}
else
{
res += features[i]->getValue(v);
}
}

Why doesn't a const method override a non-const method in C++?

Consider this simple program:
class Shape
{
public:
virtual double getArea() = 0;
};
class Rectangle : public Shape
{
int width;
int height;
public:
Rectangle( int w , int h ) :width(w) , height(h) {}
double getArea() const
{
return width * height;
}
};
int main() {
Rectangle* r = new Rectangle(4,2);
}
Trying to compile this program gives me:
'Rectangle' : cannot instantiate abstract class
Why is this not allowed in C++ when covariant return types are? Of course I can fix the program by making Rectangle::getArea to be a non-const function but I am curious as to why the language designers decided otherwise.
EDIT
Lots of people have mentioned in their answers how the signature is different. But so is
class Shape
{
public:
virtual BaseArea* getArea() = 0;
};
class Rectangle : public Shape
{
public:
virtual RectangleArea* getArea();
};
but C++ goes out of its way to allow it, when C# doesn't.
C++ supports covariant return types because if I expect an interface to return a BaseArea* and an implemention returns a RectangleArea*, it is ok as long as RectangleArea derives from BaseArea because my contract is met.
On the same lines, isn't an implementation that provides a non-mutating function satisfying an interface that only asks for a mutating function?
What would happen in this case:
struct base
{
virtual void foo(); // May implement copy-on write
virtual void foo() const;
};
struct derived : base
{
// Only specialize the const version, the non const
// default suits me well. How would I specify that I don't
// want the non-const version to be overridden?
void foo() const;
};
Because you failed to override
virtual double getArea() = 0;
That is not the same as
virtual double getArea() const = 0;
It's not the same function; both could coexist within a single class definition.
It is because a const method has a different signature from a non-const method. So the compiler is looking for a method that is not implemented. Presumably, the non-const version could do something quite different from the const version. However, if that is the semantics you want, you can easily provide it:
class Shape
{
public:
virtual double getArea() {
return static_cast<const Shape *>(this)->getArea();
}
virtual double getArea() const = 0;
};
In your edit, you provide an example of covariant return types:
C++ supports covariant return types because if I expect an interface to return a BaseArea* and an implementation returns a RectangleArea*, it is OK as long as RectangleArea derives from BaseArea, because my contract is met.
The return value of a method or function has never been part of its signature. The const-ness of a method has no bearing on the return value. It affects the object the method is called on, essentially making the implicit first argument of the function a pointer to a const object. And the function and method arguments control its signature (among other things, like its name, whether it's static, etc.).
You asked more specifically:
Along the same lines, isn't an implementation that provides a non-mutating function satisfying an interface that only asks for a mutating function?
Even if the language could (and I think it in fact does in some circumstances) resolve to the const version, it still requires that an implementation be provided for the non-const version, since that is the method that was declared pure virtual.
For a case of calling the const version when the non const is also available, consider this:
class Shape {
public:
virtual double getArea() {
std::cout << "non-const getArea" << std::endl;
return static_cast<const Shape *>(this)->getArea();
}
virtual double getArea() const = 0;
};
Combine this with the Rectangle from your example. Then:
Rectangle *r = new Rectangle(4,2);
Shape *s = r;
r->getArea(); // calls const version
s->getArea(); // calls non-const version
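Here is a self-contained version of that setup, assuming a Rectangle that stores its width and height and implements only the const getArea(). The nonConstCalls counter is a hypothetical addition that replaces the std::cout line, so the two call paths can be checked:

```cpp
#include <cassert>

class Shape {
public:
    virtual ~Shape() = default;
    // Non-const overload forwards to the const one, so derived classes
    // only have to implement the const version.
    virtual double getArea() {
        ++nonConstCalls;  // hypothetical counter, stands in for the std::cout line
        return static_cast<const Shape*>(this)->getArea();
    }
    virtual double getArea() const = 0;
    int nonConstCalls = 0;
};

class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : width(w), height(h) {}
    // Overrides the const version only; it also hides Shape::getArea()
    // for lookups done through a Rectangle.
    double getArea() const override { return width * height; }
private:
    double width, height;
};
```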
Most of the answers explain why it's not allowed under the current rules of the language, rather than why the rules were written that way. I'll try to answer why the rules couldn't have been the way you suggest.
Stroustrup's book The Design and Evolution of C++ (D&E) describes why the overriding rules were relaxed to allow covariant returns, which weren't always allowed in C++. So one answer to your question is that overrides originally had to match the signature exactly, and an exception was made for "compatible" return types that don't weaken the contract of a virtual function. It's possible the rules just weren't relaxed further because no one thought of it or no one suggested it. D&E does mention other possible relaxations of the overriding rules but says "We felt that the benefits from allowing such conversions through overriding would not outweigh the implementation cost and the potential for confusing users." That's relevant because I think your idea has plenty of potential for confusing users and can actually cause safety problems: specifically, it weakens the type system.
Consider:
class Square : public Rectangle
{
public:
explicit Square(int side) : Rectangle(side, side) { }
virtual double getArea() // N.B. non-const, overrides Shape::getArea
{
// class author decides this would be a sensible "sanity check"
// (I'm not suggesting this is a good implementation)
if (height != width)
height = width;
return Rectangle::getArea();
}
};
const Square s(2);
int main()
{
double (Rectangle::*area)() const = &Rectangle::getArea;
double d = (s.*area)();
}
I believe your idea would make this code valid: a const member function is invoked on a const object, but because it is a virtual function, the call dispatches to Square::getArea(), which is non-const and so tries to modify a const object. That object could be stored in read-only memory, so the write could result in a segfault.
This is just one example of how allowing your overriding relaxation in the Shape example could result in undefined behaviour; I'm sure more realistic code could have bigger, maybe subtler, problems.
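For contrast, a minimal sketch of what the same snippet does under the rules C++ actually has, assuming (as in the question) that Shape declares only a non-const pure virtual getArea() and Rectangle declares a non-virtual getArea() const. Because Rectangle::getArea() const overrides nothing, the pointer-to-member call is an ordinary non-virtual call and never reaches the mutating Square::getArea():

```cpp
#include <cassert>

class Shape {
public:
    virtual ~Shape() = default;
    virtual double getArea() = 0;  // non-const pure virtual, as in the question
};

class Rectangle : public Shape {
public:
    Rectangle(int h, int w) : height(h), width(w) {}
    // Not virtual and not an override: it only hides Shape::getArea().
    double getArea() const { return static_cast<double>(height) * width; }
protected:
    int height, width;
};

class Square : public Rectangle {
public:
    explicit Square(int side) : Rectangle(side, side) {}
    // Overrides Shape::getArea() (non-const), as in the answer.
    double getArea() override {
        if (height != width)
            height = width;
        return Rectangle::getArea();
    }
};
```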
You could argue that the compiler should not allow a non-const function to override Rectangle::getArea and so should reject Square::getArea ("once a virtual function has gone const it can't go back"), but that would make hierarchies very fragile. Adding or removing intermediate base classes with getArea functions of different constness would change whether Square::getArea() is an override or an overload. There is already some fragility like this with virtual functions, especially covariant returns, but according to D&E Stroustrup considered covariant returns useful because "the relaxation allows people to do something important within the type system instead of using casts." I don't think allowing const functions to override non-const ones fits nicely within the type system: it doesn't allow doing anything important, and it doesn't get rid of casts in a way that enables new (safe) techniques.
A method can be overloaded to have both a version that is const and one that is not. (Since the return type is not part of the unique signature of a method, and a method might accept no arguments, const can be the only way to distinguish between something that returns, say, const X* or X*.)
If the compiler did not require you to be precise about this, someone could add a non-const version to the base class and suddenly you'd be overriding a different method.
The const part has already been covered, so I'll skip it.
Let's assume you consider your base class a contract you want to be sure to fulfill.
Since const and non-const versions of a function can and should coexist, you should be told if you do not fulfill the given contract.
Imagine a base class that has a const and non const method, e.g. a container with operator[] in both flavors. Now you inherit but do not provide both functions.
Your child class does not fulfill the contract and does not provide the required functionality, so you should get an error, since your child class might not be usable via polymorphism.
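A sketch of such a contract, using a hypothetical IntContainer interface backed by std::vector. Which overload is called depends on the constness of the object (or reference) it is called through, which is why a derived class has to supply both:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: the "contract" has both flavours of operator[].
class IntContainer {
public:
    virtual ~IntContainer() = default;
    virtual int& operator[](std::size_t i) = 0;              // mutable access
    virtual const int& operator[](std::size_t i) const = 0;  // read-only access
};

class VectorContainer : public IntContainer {
public:
    explicit VectorContainer(std::size_t n) : data_(n) {}
    int& operator[](std::size_t i) override { return data_[i]; }
    const int& operator[](std::size_t i) const override { return data_[i]; }
private:
    std::vector<int> data_;
};
```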

How to implement the factory method pattern in C++ correctly

There's this one thing in C++ which has been making me feel uncomfortable for quite a long time, because I honestly don't know how to do it, even though it sounds simple:
How do I implement Factory Method in C++ correctly?
Goal: to make it possible to allow the client to instantiate some object using factory methods instead of the object's constructors, without unacceptable consequences and a performance hit.
By "Factory method pattern", I mean both static factory methods inside an object or methods defined in another class, or global functions. Just generally "the concept of redirecting the normal way of instantiation of class X to anywhere else than the constructor".
Let me skim through some possible answers which I have thought of.
0) Don't make factories, make constructors.
This sounds nice (and is indeed often the best solution), but it is not a general remedy. First of all, there are cases when object construction is a task complex enough to justify its extraction to another class. But even putting that aside, even for simple objects, using just constructors often won't do.
The simplest example I know is a 2-D Vector class. So simple, yet tricky. I want to be able to construct it from both Cartesian and polar coordinates. Obviously, I cannot do:
struct Vec2 {
Vec2(float x, float y);
Vec2(float angle, float magnitude); // not a valid overload!
// ...
};
My natural way of thinking is then:
struct Vec2 {
static Vec2 fromLinear(float x, float y);
static Vec2 fromPolar(float angle, float magnitude);
// ...
};
Which, instead of constructors, leads me to usage of static factory methods... which essentially means that I'm implementing the factory pattern, in some way ("the class becomes its own factory"). This looks nice (and would suit this particular case), but fails in some cases, which I'm going to describe in point 2. Do read on.
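For reference, the static factory methods might be filled in like this. This is a sketch, assuming the angle is in radians and that the constructor is made private so the named factories are the only way in:

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
    float x, y;

    static Vec2 fromLinear(float x, float y) { return Vec2(x, y); }

    // Assumes the angle is given in radians.
    static Vec2 fromPolar(float angle, float magnitude) {
        return Vec2(magnitude * std::cos(angle), magnitude * std::sin(angle));
    }

private:
    // Private, so the named factories are the only way to build a Vec2
    // (copying an existing Vec2 is still allowed).
    Vec2(float x, float y) : x(x), y(y) {}
};
```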
Another case: trying to overload on two opaque typedefs from some API (such as GUIDs of unrelated domains, or a GUID and a bitfield). The types are semantically totally different (so, in theory, valid overloads) but actually turn out to be the same thing, like unsigned ints or void pointers.
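One common workaround for that case, shown here as a sketch, is to wrap each typedef in its own tiny struct so the overloads really are distinct types. RawGuid, RawBitfield, Guid, Bitfield and describe are hypothetical names:

```cpp
#include <cassert>
#include <string>

// Hypothetical API typedefs: both are plain unsigned int, so
// overloading on RawGuid and RawBitfield would be a redefinition.
using RawGuid = unsigned int;
using RawBitfield = unsigned int;

// Thin wrapper structs turn them into distinct types for overloading.
struct Guid { RawGuid value; };
struct Bitfield { RawBitfield value; };

std::string describe(Guid g) { return "guid " + std::to_string(g.value); }
std::string describe(Bitfield b) { return "bits " + std::to_string(b.value); }
```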
1) The Java Way
Java has it simple, as it only has dynamically allocated objects. Making a factory is as trivial as:
class FooFactory {
public Foo createFooInSomeWay() {
// can be a static method as well,
// if we don't need the factory to provide its own object semantics
// and just serve as a group of methods
return new Foo(some, args);
}
}
In C++, this translates to:
class FooFactory {
public:
Foo* createFooInSomeWay() {
return new Foo(some, args);
}
};
Cool? Often, indeed. But this forces the user to use only dynamic allocation. Static allocation is what makes C++ complex, but it is also what often makes it powerful. Also, I believe there are some targets (keyword: embedded) which don't allow dynamic allocation. And that doesn't imply that the users of those platforms like to write clean OOP.
Anyway, philosophy aside: In the general case, I don't want to force the users of the factory to be restrained to dynamic allocation.
2) Return-by-value
OK, so we know that 1) is cool when we want dynamic allocation. Why won't we add static allocation on top of that?
class FooFactory {
public:
Foo* createFooInSomeWay() {
return new Foo(some, args);
}
Foo createFooInSomeWay() {
return Foo(some, args);
}
};
What? We can't overload by the return type? Oh, of course we can't. So let's change the method names to reflect that. And yes, I've written the invalid code example above just to stress how much I dislike the need to change the method name, for example because we cannot implement a language-agnostic factory design properly now, since we have to change names - and every user of this code will need to remember that difference of the implementation from the specification.
class FooFactory {
public:
Foo* createDynamicFooInSomeWay() {
return new Foo(some, args);
}
Foo createFooObjectInSomeWay() {
return Foo(some, args);
}
};
OK... there we have it. It's ugly, as we need to change the method name. It's imperfect, since we need to write the same code twice. But once done, it works. Right?
Well, usually. But sometimes it does not. When creating Foo, we actually depend on the compiler to do the return value optimisation for us, because the C++ standard is benevolent enough to compiler vendors not to specify when the object will be created in place and when it will be copied when returning a temporary object by value. So if Foo is expensive to copy, this approach is risky.
And what if Foo is not copyable at all? Well, doh. (Note that in C++17, with guaranteed copy elision, not being copyable is no longer a problem for the code above.)
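To illustrate that note, a minimal sketch assuming a C++17 compiler: Foo below can be neither copied nor moved, yet returning it by value from the factory compiles, because a prvalue return initialises the caller's object directly. Foo's constructor arguments and sum() are hypothetical:

```cpp
#include <cassert>

// Requires C++17 (guaranteed copy elision for prvalues).
class Foo {
public:
    Foo(int a, int b) : sum_(a + b) {}
    Foo(const Foo&) = delete;             // not copyable
    Foo& operator=(const Foo&) = delete;
    Foo(Foo&&) = delete;                  // not movable either
    int sum() const { return sum_; }
private:
    int sum_;
};

class FooFactory {
public:
    // Returns a prvalue: no copy or move constructor is needed in C++17.
    static Foo createFooInSomeWay(int a, int b) { return Foo(a, b); }
};
```

The same prvalue can also initialise a heap object via new Foo(FooFactory::createFooInSomeWay(1, 2)); std::make_unique would not work here, since it forwards its argument as an rvalue reference and therefore needs a move constructor.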
Conclusion: Making a factory by returning an object is indeed a solution for some cases (such as the 2-D vector previously mentioned), but still not a general replacement for constructors.
3) Two-phase construction
Another thing that someone would probably come up with is separating the issue of object allocation and its initialisation. This usually results in code like this:
#include <memory>
class Foo {
public:
Foo() {
// empty or almost empty
}
// ...
};
class FooFactory {
public:
void createFooInSomeWay(Foo& foo /*, some, args */);
};
void clientCode() {
Foo staticFoo;
std::unique_ptr<Foo> dynamicFoo(new Foo()); // auto_ptr was deprecated and removed in C++17
FooFactory factory;
factory.createFooInSomeWay(staticFoo); // the parameter is a reference, so pass the object itself
factory.createFooInSomeWay(*dynamicFoo);
// ...
}
One may think it works like a charm. The only price we pay in our code...
Since I've written all of this and left this as the last, I must dislike it too. :) Why?
First of all... I sincerely dislike the concept of two-phase construction and I feel guilty when I use it. If I design my objects with the assertion that "if it exists, it is in valid state", I feel that my code is safer and less error-prone. I like it that way.
Having to drop that convention AND change the design of my object just for the purpose of making a factory for it is... well, unwieldy.
I know that the above won't convince many people, so let me give some more solid arguments. Using two-phase construction, you cannot:
initialise const or reference member variables,
pass arguments to base class constructors and member object constructors.
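A short sketch of the first bullet point, using a hypothetical Logger class: the reference member and the const member can only be set in the constructor's initialiser list, so a separate init() step has no way to set them up:

```cpp
#include <cassert>
#include <string>

class Logger {
public:
    Logger(std::string& sink, int level) : sink_(sink), level_(level) {}

    // A later init() could not do this job: "sink_ = s" would copy into the
    // already-bound string instead of rebinding, and "level_ = l" won't compile.

    void log(const std::string& msg) const {
        if (level_ > 0)
            sink_ += msg;  // the referenced string itself is not const
    }

private:
    std::string& sink_;  // reference member: bound once, never rebound
    const int level_;    // const member: cannot be assigned after construction
};
```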
And probably there could be some more drawbacks which I can't think of right now, and I don't even feel particularly obliged to since the above bullet points convince me already.
So: not even close to a good general solution for implementing a factory.
Conclusions:
We want to have a way of object instantiation which would:
allow for uniform instantiation regardless of allocation,
give different, meaningful names to construction methods (thus not relying on by-argument overloading),
not introduce a significant performance hit and, preferably, a significant code bloat hit, especially at client side,
be general, as in: possible to be introduced for any class.
I believe I have proven that the ways I have mentioned don't fulfil those requirements.
Any hints? Please provide me with a solution, I don't want to think that this language won't allow me to properly implement such a trivial concept.
First of all, there are cases when object construction is a task complex enough to justify its extraction to another class.
I believe this point is incorrect. The complexity doesn't really matter. The relevance is what does. If an object can be constructed in one step (not like in the builder pattern), the constructor is the right place to do it. If you really need another class to perform the job, then it should be a helper class that is used from the constructor anyway.
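A sketch of that last suggestion, with hypothetical Endpoint and EndpointParser classes: the parsing work lives in a helper, but Endpoint is still created by a constructor, so its const members can be initialised directly (a C++11 delegating constructor passes the helper along):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper that does the "complex" part of construction.
class EndpointParser {
public:
    explicit EndpointParser(const std::string& text) {
        auto colon = text.find(':');
        host_ = text.substr(0, colon);  // whole string if there is no colon
        port_ = colon == std::string::npos ? 80 : std::stoi(text.substr(colon + 1));
    }
    const std::string& host() const { return host_; }
    int port() const { return port_; }
private:
    std::string host_;
    int port_;
};

class Endpoint {
public:
    explicit Endpoint(const std::string& text) : Endpoint(EndpointParser(text)) {}
    const std::string host;
    const int port;
private:
    // Member-wise initialisation from the helper's results.
    explicit Endpoint(const EndpointParser& p) : host(p.host()), port(p.port()) {}
};
```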
Vec2(float x, float y);
Vec2(float angle, float magnitude); // not a valid overload!
There is an easy workaround for this:
struct Vec2 {
// nested inside Vec2, so callers write Vec2::Cartesian and Vec2::Polar
struct Cartesian {
Cartesian(float x, float y): x(x), y(y) {}
float x, y;
};
struct Polar {
Polar(float angle, float magnitude): angle(angle), magnitude(magnitude) {}
float angle, magnitude;
};
Vec2(const Cartesian &cartesian);
Vec2(const Polar &polar);
};
The only disadvantage is that it looks a bit verbose:
Vec2 v2(Vec2::Cartesian(3.0f, 4.0f));
But the good thing is that you can immediately see what coordinate type you're using, and at the same time you don't have to worry about copying. If you want copying, and it's expensive (as proven by profiling, of course), you may wish to use something like Qt's shared classes to avoid copying overhead.
As for the allocation type, the main reason to use the factory pattern is usually polymorphism. Constructors can't be virtual, and even if they could, it wouldn't make much sense. When using static or stack allocation, you can't create objects polymorphically because the compiler needs to know the exact size. So it works only with pointers and references. And returning a reference from a factory doesn't work either, because while an object technically can be deleted through a reference, doing so is rather confusing and bug-prone; see "Is the practice of returning a C++ reference variable evil?" for example. So pointers are the only thing left, and that includes smart pointers. In other words, factories are most useful when used with dynamic allocation, so you can do things like this:
class Abstract {
public:
virtual ~Abstract() {}
virtual void doSomething() = 0; // "do" is a C++ keyword, so it can't be a method name
};
class Factory {
public:
Abstract *create();
};
Factory f;
Abstract *a = f.create();
a->doSomething();
In other cases, factories just help to solve minor problems like those with overloads you have mentioned. It would be nice if it was possible to use them in a uniform way, but it doesn't hurt much that it is probably impossible.
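A runnable variant of the Abstract/Factory snippet, as a sketch: the selector enum and the two concrete classes are hypothetical, and std::unique_ptr replaces the raw pointer so the caller cannot forget the delete. Because the concrete type is decided at run time, the result can only come back through a pointer:

```cpp
#include <cassert>
#include <memory>
#include <string>

class Abstract {
public:
    virtual ~Abstract() = default;
    virtual std::string name() const = 0;
};

class Circle : public Abstract {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Abstract {
public:
    std::string name() const override { return "square"; }
};

enum class ShapeKind { Circle, Square };  // hypothetical run-time selector

// The size of the returned object is unknown at compile time,
// so it has to live on the heap and be returned via a pointer.
std::unique_ptr<Abstract> create(ShapeKind kind) {
    switch (kind) {
        case ShapeKind::Circle: return std::unique_ptr<Abstract>(new Circle());
        case ShapeKind::Square: return std::unique_ptr<Abstract>(new Square());
    }
    return nullptr;
}
```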
Simple Factory Example:
// Factory returns the object and transfers ownership.
// The unique_ptr deletes it automatically when the caller is done with it.
#include <memory>
class FactoryReleaseOwnership{
public:
std::unique_ptr<Foo> createFooInSomeWay(){
return std::unique_ptr<Foo>(new Foo(some, args));
}
};
// Factory retains object ownership
// Thus returning a reference.
#include <boost/ptr_container/ptr_vector.hpp>
class FactoryRetainOwnership{
boost::ptr_vector<Foo> myFoo;
public:
Foo& createFooInSomeWay(){
// Must take care that the factory outlives all returned references.
// Could make myFoo static so it lasts as long as the application.
myFoo.push_back(new Foo(some, args));
return myFoo.back();
}
};
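A self-contained sketch of both ownership policies. Foo and its (int, int) constructor are hypothetical stand-ins for Foo(some, args), and std::vector<std::unique_ptr<Foo>> is swapped in for boost::ptr_vector so the example needs only the standard library:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Foo {
    Foo(int a, int b) : sum(a + b) {}
    int sum;
};

// Caller receives ownership; the unique_ptr deletes the Foo automatically.
class FactoryReleaseOwnership {
public:
    std::unique_ptr<Foo> createFooInSomeWay(int a, int b) {
        return std::unique_ptr<Foo>(new Foo(a, b));
    }
};

// Factory keeps ownership and hands out references.
class FactoryRetainOwnership {
    std::vector<std::unique_ptr<Foo>> myFoo;
public:
    Foo& createFooInSomeWay(int a, int b) {
        myFoo.push_back(std::unique_ptr<Foo>(new Foo(a, b)));
        return *myFoo.back();
    }
};
```

References returned by FactoryRetainOwnership survive later insertions, because the vector only reallocates its unique_ptrs, not the Foo objects they point to.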
Have you thought about not using a factory at all, and instead making nice use of the type system? I can think of two different approaches which do this sort of thing:
Option 1:
struct linear {
linear(float x, float y) : x_(x), y_(y){}
float x_;
float y_;
};
struct polar {
polar(float angle, float magnitude) : angle_(angle), magnitude_(magnitude) {}
float angle_;
float magnitude_;
};
struct Vec2 {
explicit Vec2(const linear &l) { /* ... */ }
explicit Vec2(const polar &p) { /* ... */ }
};
Which lets you write things like:
Vec2 v(linear(1.0, 2.0));
Option 2:
You can use "tags", like the STL does with iterators and such. For example:
struct linear_coord_tag {} linear_coord; // declares the type and a global instance
struct polar_coord_tag {} polar_coord;
struct Vec2 {
Vec2(float x, float y, const linear_coord_tag &) { /* ... */ }
Vec2(float angle, float magnitude, const polar_coord_tag &) { /* ... */ }
};
This second approach lets you write code which looks like this:
Vec2 v(1.0, 2.0, linear_coord);
which is also nice and expressive while allowing you to have unique prototypes for each constructor.
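A compilable sketch of the tag approach. The tag names follow the answer; the conversion in the polar constructor is our own filling-in and assumes the angle is in radians:

```cpp
#include <cassert>
#include <cmath>

// Empty tag types, plus one global instance of each.
struct linear_coord_tag {};
struct polar_coord_tag {};
constexpr linear_coord_tag linear_coord{};
constexpr polar_coord_tag polar_coord{};

struct Vec2 {
    float x, y;
    // The unnamed tag parameter exists only to select the overload.
    Vec2(float x, float y, const linear_coord_tag&) : x(x), y(y) {}
    Vec2(float angle, float magnitude, const polar_coord_tag&)
        : x(magnitude * std::cos(angle)), y(magnitude * std::sin(angle)) {}
};
```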
You can read a very good solution in: http://www.codeproject.com/Articles/363338/Factory-Pattern-in-Cplusplus
The best solution is in the "Comments and Discussions" section; see the comment "No need for static Create methods".
From this idea, I've done a factory. Note that I'm using Qt, but you can change QMap and QString for std equivalents.
#ifndef FACTORY_H
#define FACTORY_H
#include <QMap>
#include <QString>
#include <type_traits>
template <typename T>
class Factory
{
public:
template <typename TDerived>
void registerType(QString name)
{
static_assert(std::is_base_of<T, TDerived>::value, "Factory::registerType doesn't accept this type because it doesn't derive from the base class");
_createFuncs[name] = &createFunc<TDerived>;
}
T* create(QString name) {
// constFind/constEnd give iterators of the same (const) type to compare
auto it = _createFuncs.constFind(name);
if (it != _createFuncs.constEnd()) {
return it.value()();
}
return nullptr;
}
private:
template <typename TDerived>
static T* createFunc()
{
return new TDerived();
}
typedef T* (*PCreateFunc)();
QMap<QString,PCreateFunc> _createFuncs;
};
#endif // FACTORY_H
Sample usage:
Factory<BaseClass> f;
f.registerType<Descendant1>("Descendant1");
f.registerType<Descendant2>("Descendant2");
Descendant1* d1 = static_cast<Descendant1*>(f.create("Descendant1"));
Descendant2* d2 = static_cast<Descendant2*>(f.create("Descendant2"));
BaseClass *b1 = f.create("Descendant1");
BaseClass *b2 = f.create("Descendant2");
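Following the answer's hint to swap the Qt types for std equivalents, a sketch might look like this. std::map and std::string replace QMap and QString, and create() returns std::unique_ptr rather than a raw pointer, which is our change, not the answer's. BaseClass, Descendant1 and Descendant2 get hypothetical id() members so the result can be checked:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <type_traits>

template <typename T>
class Factory {
public:
    template <typename TDerived>
    void registerType(const std::string& name) {
        static_assert(std::is_base_of<T, TDerived>::value,
                      "Factory::registerType: TDerived must derive from T");
        createFuncs_[name] = &createFunc<TDerived>;
    }

    // Returns nullptr for names that were never registered.
    std::unique_ptr<T> create(const std::string& name) const {
        auto it = createFuncs_.find(name);
        if (it == createFuncs_.end()) return nullptr;
        return std::unique_ptr<T>(it->second());
    }

private:
    template <typename TDerived>
    static T* createFunc() { return new TDerived(); }

    typedef T* (*PCreateFunc)();
    std::map<std::string, PCreateFunc> createFuncs_;
};

struct BaseClass { virtual ~BaseClass() = default; virtual int id() const = 0; };
struct Descendant1 : BaseClass { int id() const override { return 1; } };
struct Descendant2 : BaseClass { int id() const override { return 2; } };
```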
I mostly agree with the accepted answer, but there is a C++11 option that has not been covered in existing answers:
Return factory method results by value, and
Provide a cheap move constructor.
Example:
struct sandwich {
// Factory methods.
static sandwich ham();
static sandwich spam();
// Move constructor.
sandwich(sandwich &&);
// etc.
};
Then you can construct objects on the stack:
sandwich mine{sandwich::ham()};
As subobjects of other things:
auto lunch = std::make_pair(sandwich::spam(), apple{});
Or dynamically allocated:
auto ptr = std::make_shared<sandwich>(sandwich::ham());
When might I use this?
If, on a public constructor, it is not possible to give meaningful initialisers for all class members without some preliminary calculation, then I might convert that constructor to a static method. The static method performs the preliminary calculations, then returns a value result via a private constructor which just does a member-wise initialisation.
I say 'might' because it depends on which approach gives the clearest code without being unnecessarily inefficient.
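A sketch of that idea, filling in the sandwich example: the static factories compute the layers, then hand them to a private constructor that only does member-wise initialisation, and the move constructor is cheap because it just moves the vector. The layers member and its contents are hypothetical:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct sandwich {
    // Factory methods: do the "preliminary calculation", then delegate.
    static sandwich ham()  { return sandwich(std::vector<std::string>{"bread", "ham", "bread"}); }
    static sandwich spam() { return sandwich(std::vector<std::string>{"bread", "spam", "bread"}); }

    // Move constructor: cheap, it only steals the vector's buffer.
    sandwich(sandwich&& other) noexcept : layers(std::move(other.layers)) {}

    std::vector<std::string> layers;

private:
    // Member-wise initialisation only.
    explicit sandwich(std::vector<std::string> l) : layers(std::move(l)) {}
};
```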
Loki has both a Factory Method and an Abstract Factory. Both are documented (extensively) in Modern C++ Design, by Andrei Alexandrescu. The factory method is probably closer to what you seem to be after, though it's still a bit different (at least if memory serves, it requires you to register a type before the factory can create objects of that type).
I won't try to answer all of your questions, as I believe the topic is too broad. Just a couple of notes:
there are cases when object construction is a task complex enough to justify its extraction to another class.
That class is in fact a Builder, rather than a Factory.
In the general case, I don't want to force the users of the factory to be restrained to dynamic allocation.
Then you could have your factory encapsulate it in a smart pointer. I believe this way you can have your cake and eat it too.
This also eliminates the issues related to return-by-value.
Conclusion: Making a factory by returning an object is indeed a solution for some cases (such as the 2-D vector previously mentioned), but still not a general replacement for constructors.
Indeed. All design patterns have their (language specific) constraints and drawbacks. It is recommended to use them only when they help you solve your problem, not for their own sake.
If you are after the "perfect" factory implementation, well, good luck.
This is my C++11-style solution. The template parameter 'base' is the base class of all subclasses. The creators are std::function objects that create subclass instances; each might be a binding to your subclass's static member function 'create(some args)'. This may not be perfect, but it works for me, and it is a fairly general solution.
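#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>
namespace common {
template <class base, class... params> class factory {
public:
// std::hash stands in for the your_hash_func placeholder
using key_t = std::size_t;
factory() {}
factory(const factory &) = delete;
factory &operator=(const factory &) = delete;
// explicit return types keep this valid C++11 (auto deduction needs C++14)
std::unique_ptr<base> create(const std::string &name, params... args) {
return create(std::hash<std::string>{}(name), args...);
}
std::unique_ptr<base> create(key_t key, params... args) {
auto it = creators_.find(key);
if (it == creators_.end()) return nullptr; // unknown name
return std::unique_ptr<base>(it->second(args...));
}
void register_creator(const std::string &name,
std::function<base *(params...)> creator) {
creators_[std::hash<std::string>{}(name)] = std::move(creator);
}
protected:
std::unordered_map<key_t, std::function<base *(params...)>> creators_;
};
} // namespace common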
An example of usage:
class base {
public:
base(int val) : val_(val) {}
virtual ~base() { std::cout << "base destroyed\n"; }
protected:
int val_ = 0;
};
class foo : public base {
public:
foo(int val) : base(val) { std::cout << "foo " << val << " \n"; }
static foo *create(int val) { return new foo(val); }
virtual ~foo() { std::cout << "foo destroyed\n"; }
};
class bar : public base {
public:
bar(int val) : base(val) { std::cout << "bar " << val << "\n"; }
static bar *create(int val) { return new bar(val); }
virtual ~bar() { std::cout << "bar destroyed\n"; }
};
int main() {
common::factory<base, int> factory;
auto foo_creator = std::bind(&foo::create, std::placeholders::_1);
auto bar_creator = std::bind(&bar::create, std::placeholders::_1);
factory.register_creator("foo", foo_creator);
factory.register_creator("bar", bar_creator);
{
auto foo_obj = factory.create("foo", 80);
foo_obj.reset();
}
{
auto bar_obj = factory.create("bar", 90);
bar_obj.reset();
}
}
Factory Pattern
class Point
{
public:
static Point Cartesian(double x, double y);
private:
};
And if your compiler does not support Return Value Optimization, ditch it; it probably does not do much optimization at all...
extern std::pair<std::string_view, Base*(*)()> const factories[2];
decltype(factories) factories{
{"blah", []() -> Base*{return new Blah;}},
{"foo", []() -> Base*{return new Foo;}}
};
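That snippet only shows the table; a minimal sketch of using it might look like the following. Base, Blah, Foo and the make() lookup helper are hypothetical, and the code needs C++17 for std::string_view:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

struct Base { virtual ~Base() = default; virtual std::string name() const = 0; };
struct Blah : Base { std::string name() const override { return "Blah"; } };
struct Foo : Base { std::string name() const override { return "Foo"; } };

// Same shape as the answer's table: name -> creation function.
// Captureless lambdas convert implicitly to Base* (*)().
extern std::pair<std::string_view, Base* (*)()> const factories[2];
decltype(factories) factories{
    {"blah", []() -> Base* { return new Blah; }},
    {"foo", []() -> Base* { return new Foo; }},
};

// Hypothetical helper: linear search by name, nullptr if absent.
std::unique_ptr<Base> make(std::string_view name) {
    for (const auto& entry : factories)
        if (entry.first == name) return std::unique_ptr<Base>(entry.second());
    return nullptr;
}
```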
I know this question was answered 3 years ago, but this may be what you were looking for.
A couple of weeks ago, Google released a library allowing easy and flexible dynamic object allocation. Here it is: http://google-opensource.blogspot.fr/2014/01/introducing-infact-library.html