Flexible design despite strongly dependent classes - c++

I'm working on a code which needs to be extremely flexible in nature, i.e. especially very easy to extend later also by other people. But I'm facing a problem now about which I do not even know in principal how to properly deal with:
I'm having a rather complex Algorithm, which at some point is supposed to converge. But due to its complexity there are several different criteria to check for convergence, and depending on the circumstances (or input) I would want to have different convergence criteria activated. Also it should easily be possible to create new convergence criteria without having to touch the algorithm itself. So ideally I would like to have an abstract ConvergenceChecker class from which I can inherit and let the algorithm have a vector of those, e.g. like this:
//Algorithm.h (with include guards of course)
class Algorithm {
//...
vector<ConvergenceChecker*> _convChecker;
}
//Algorithm.cpp
void runAlgorithm() {
bool converged=false;
while(true){
//Algorithm performs a cycle
for (unsigned i=0; i<_convChecker.size(); i++) {
// Check for convergence with each criterion
converged=_convChecker[i]->isConverged();
// If this criterion is not satisfied, forget about the following ones
if (!converged) { break; }
}
// If all are converged, break out of the while loop
if (converged) { break; }
}
}
The problem with this is that each ConvergenceChecker needs to know something about the currently running Algorithm, but each one might need to know totally different things from the algorithm. Say the Algorithm changes _foo _bar and _fooBar during each cycle, but one possible ConvergenceChecker only needs to know _foo, another one _foo and _bar, and it might be that some day a ConvergenceChecker needing _fooBar will be implemented. Here are some ways I already tried to solve this:
Give the function isConverged() a large argument list (containing _foo, _bar, and _fooBar). Disadvantages: Most of the variables used as arguments will not be used in most cases, and if the Algorithm would be extended by another variable (or a similar algorithm inherits from it and adds some variables) quite some code would have to be modified. -> possible, but ugly
Give the function isConverged() the Algorithm itself (or a pointer to it) as an argument. Problem: Circular dependency.
Declare isConverged() as a friend function. Problem (among others): Cannot be defined as a member function of different ConvergenceCheckers.
Use an array of function pointers. Does not solve the problem at all, and also: where to define them?
(Just came up with this while writing this question) Use a different class which holds the data, say AlgorithmData having Algorithm as a friend class, then provide the AlgorithmData as a function argument. So, like 2. but maybe getting around circular dependency problems. (Did not test this yet.)
I'd be happy to hear your solutions about this (and problems you see with 5.).
Further notes:
Question title: I'm aware that 'strongly dependent classes' already says that most probably one is doing something very wrong with designing the code, still I guess a lot of people might end up with having that problem and would like to hear possibilities to avoid it, so I'd rather keep that ugly expression.
Too easy?: Actually the problem I presented here was not complete. There will be a lot of different Algorithms in the code inheriting from each other, and the ConvergenceCheckers should of course ideally work in appropriate cases without any further modification even if new Algorithms come up. Feel free to comment on this as well.
Question style: I hope the question is neither too abstract nor too special, and I hope it did not get too long and is understandable. So please also don't hesitate to comment on the way I ask this question so that I can improve on that.

Actually, your solution 5 sounds good.
When in danger of introducing circular dependencies, the best remedy usually is to extract the part that both need, and moving it to a separate entity; exactly as extracting the data used by the algorithm into a separate class/struct would do in your case!

Another solution would be passing your checker an object that provides the current algorithm state in response to parameter names expressed as string names. This makes it possible to separately compile your conversion strategies, because the interface of this "callback" interface stays the same even if you add more parameters to your algorithm:
struct AbstractAlgorithmState {
virtual double getDoubleByName(const string& name) = 0;
virtual int getIntByName(const string& name) = 0;
};
struct ConvergenceChecker {
virtual bool converged(const AbstractAlgorithmState& state) = 0;
};
That is all the implementers of the convergence checker need to see: they implement the checker, and get the state.
You can now build a class that is tightly coupled with your algorithm implementation to implement AbstractAlgorithmState and get the parameter based on its name. This tightly coupled class is private to your implementation, though: the callers see only its interface, which never changes:
class PrivateAlgorithmState : public AbstractAlgorithmState {
private:
const Algorithm &algorithm;
public:
PrivateAlgorithmState(const Algorithm &alg) : algorithm(alg) {}
...
// Implement getters here
}
void runAlgorithm() {
PrivateAlgorithmState state(*this);
...
converged=_convChecker[i]->converged(state);
}

Using a separate data/state structure seems easy enough - just pass it to the checker as a const reference for read-only access.
class Algorithm {
public:
struct State {
double foo_;
double bar_;
double foobar_;
};
struct ConvergenceChecker {
virtual ~ConvergenceChecker();
virtual bool isConverged(State const &) = 0;
}
void addChecker(std::unique_ptr<ConvergenceChecker>);
private:
std::vector<std::unique_ptr<ConvergenceChecker>> checkers_;
State state_;
bool isConverged() {
const State& csr = state_;
return std::all_of(checkers_.begin(),
checkers_.end(),
[csr](std::unique_ptr<ConvergenceChecker> &cc) {
return cc->isConverged(csr);
});
}
};

Maybe the decorator pattern can help in simplifying an (unknown) set of convergence checks. This way you can keep the algorithm itself agnostic to what convergence checks may occur and you don't require a container for all the checks.
You would get something along these lines:
class ConvergenceCheck {
private:
ConvergenceCheck *check;
protected:
ConvergenceCheck(ConvergenceCheck *check):check(check){}
public:
bool converged() const{
if(check && check->converged()) return true;
return thisCheck();
}
virtual bool thisCheck() const=0;
virtual ~ConvergenceCheck(){ delete check; }
};
struct Check1 : ConvergenceCheck {
public:
Check1(ConvergenceCheck* check):ConvergenceCheck(check) {}
bool thisCheck() const{ /* whatever logic you like */ }
};
You can then make arbitrary complex combinations of convergence checks while only keeping one ConvergenceCheck* member in Algorithm. For example, if you want to check two criteria (implemented in Check1 and Check2):
ConvergenceCheck *complex=new Check2(new Check1(nullptr));
The code is not complete, but you get the idea. Additionally, if you are a performance fanatic and are afraid of the virtual function call (thisCheck), you can apply the curiously returning template pattern to eliminate that.
Here is a complete example of decorators to check constraints on an int, to give an idea of how it works:
#include <iostream>
class Check {
private:
Check *check_;
protected:
Check(Check *check):check_(check){}
public:
bool check(int test) const{
if(check_ && !check_->check(test)) return false;
return thisCheck(test);
}
virtual bool thisCheck(int test) const=0;
virtual ~Check(){ delete check_; }
};
class LessThan5 : public Check {
public:
LessThan5():Check(NULL){};
LessThan5(Check* check):Check(check) {};
bool thisCheck(int test) const{ return test < 5; }
};
class MoreThan3 : public Check{
public:
MoreThan3():Check(NULL){}
MoreThan3(Check* check):Check(check) {}
bool thisCheck(int test) const{ return test > 3; }
};
int main(){
Check *morethan3 = new MoreThan3();
Check *lessthan5 = new LessThan5();
Check *both = new LessThan5(new MoreThan3());
std::cout << morethan3->check(3) << " " << morethan3->check(4) << " " << morethan3->check(5) << std::endl;
std::cout << lessthan5->check(3) << " " << lessthan5->check(4) << " " << lessthan5->check(5) << std::endl;
std::cout << both->check(3) << " " << both->check(4) << " " << both->check(5);
}
Output:
0 1 1
1 1 0
0 1 0

Related

Mechanism for Save/Load+Undo/Redo with minimum boilerplate

I want to make an app where a user can edit a diagram (for example), which would provide the standard mechanisms of: Save, Load, Undo, and Redo.
A simple way to do it is to have classes for the diagram and for the various shapes in it, which implement serialization via save and load methods, and where all methods to edit them return UndoableActions that can be added to an UndoManager which calls their perform method and adds them to an undo stack.
The problem with the simple way described above is that it requires a lot of error-prone boilerplate work.
I know that the serialization (save/load) part of the work can be solved by using something like Google's Protocol Buffers or Apache Thrift, which generates the boiler-plate serialization code for you, but it doesn't solve the undo+redo problem. I know that for Objective C and Swift, Apple provides Core Data which does solve serialization + undo, but I'm not familiar with anything similar for C++.
Is there a good way non-error-prone to solve save+load+undo+redo with little boilerplate?
The problem with the simple way described above is that it requires a lot of error-prone boilerplate work.
I am not convinced that this is the case. Your approach sounds reasonable and using Modern C++ features and abstractions you can implement a safe and elegant interface for it.
For starters, you could use std::variant as a sum type for "undoable actions" - this will give you a type-safe tagged union for every action. (Consider using boost::variant or other implementations that can be easily found on Google if you do not have access to C++17). Example:
namespace action
{
// User dragged the shape to a separate position.
struct move_shape
{
shape_id _id;
offset _offset;
};
// User changed the color of a shape.
struct change_shape_color
{
shape_id _id;
color _previous;
color _new;
};
// ...more actions...
}
using undoable_action = std::variant<
action::move_shape,
action::change_shape_color,
// ...
>;
Now that you have a sum type for all your possible "undoable actions", you can define undo behavior by using pattern matching. I wrote two articles on variant "pattern matching" by overloading lambdas that you could find interesting:
"visiting variants using lambdas - part 1"
"visiting variants using lambdas - part 2"
Here's an example of how your undo function could look like:
void undo()
{
auto action = undo_stack.pop_and_get();
match(action, [&shapes](const move_shape& y)
{
// Revert shape movement.
shapes[y._id].move(-y._offset);
},
[&shapes](const change_shape_color& y)
{
// Revert shape color change.
shapes[y._id].set_color(y._previous);
},
[](auto)
{
// Produce a compile-time error.
struct undo_not_implemented;
undo_not_implemented{};
});
}
If every branch of match gets large, it could be moved to its own function for readability. Trying to instantiate undo_not_implemented or using a dependent static_assert is also a good idea: a compile-time error will be produced if you forget to implement behavior for a specific "undoable action".
That's pretty much it! If you want to save the undo_stack so that the history of actions is preserved in saved documents, you can implement a auto serialize(const undoable_action&) that, again, uses pattern matching to serialize the various actions. You could then implement a deserialize function that repopulates the undo_stack on file load.
If you find implementing serialization/deserialization for every action too tedious, consider using BOOST_HANA_DEFINE_STRUCT or similar solutions to automatically generate serialization/deserialization code.
Since you're concerned about battery and performance, I would also like to mention that using std::variant or similar tagged union constructs is on average faster and more lightweight compared to polymorphic hierarchies, as heap allocation is not required and as there is no run-time virtual dispatch.
About redo functionality: you could have a redo_stack and implement an auto invert(const undoable_action&) function that inverts the behavior of an action. Example:
void undo()
{
auto action = undo_stack.pop_and_get();
match(action, [&](const move_shape& y)
{
// Revert shape movement.
shapes[y._id].move(-y._offset);
redo_stack.push(invert(y));
},
// ...
auto invert(const undoable_action& x)
{
return match(x, [&](move_shape y)
{
y._offset *= -1;
return y;
},
// ...
If you follow this pattern, you can implement redo in terms of undo! Simply call undo by popping from the redo_stack instead of the undo_stack: since you "inverted" the actions it will perform the desired operation.
EDIT: here's a minimal wandbox example that implements a match function that takes in a variant and returns a variant.
The example uses boost::hana::overload to generate the visitor.
The visitor is wrapped in a lambda f that unifies the return type to the type of the variant: this is necessary as std::visit requires that the visitor always returns the same type.
If returning a type which is different from the variant is desirable, std::common_type_t could be used, otherwise the user could explicitly specify it as the first template parameter of match.
Two reasonable approaches to this problem are implemented in the frameworks Flip and ODB.
Code-generation / ODB
With ODB you need to add #pragma declarations to your code, and have it's tool generate methods that you use for save/load and for editing the model, like so:
#pragma db object
class person
{
public:
void setName (string);
string getName();
...
private:
friend class odb::access;
person () {}
#pragma db id
string email_;
string name_;
};
Where the accessors declared in the class are auto-generated by ODB so that all changes to the model can get captured and undo-transactions may be made for them.
Reflection with minimal boilerplate / Flip
Unlike ODB, Flip doesn't generate C++ code for you, but rather it requires your program to call Model::declare to re-declare your structures like so:
class Song : public flip::Object
{
public:
static void declare ();
flip::Float tempo;
flip::Array <Track> tracks;
};
void Song::declare ()
{
Model::declare <Song> ()
.name ("acme.product.Song")
.member <flip::Float, &Song::tempo> ("tempo");
.member <flip::Array <Track>, &Song::tracks> ("tracks");
}
int main()
{
Song::declare();
...
}
With the structured declared like so, flip::Object's constructor can initialize all the fields so that they can point to the undo stack, and all the edits on them are recorded. It also has a list of all the members so that flip::Object can implement the serialization for you.
The problem with the simple way described above is that it requires a lot of error-prone boilerplate work.
I would say that the actual problem is that your undo/redo logic is part of a component, that should ship only a bunch of data as a position, a content and so on.
A common OOP way to decouple the undo/redo logic from the data is the command design pattern.
The basic idea is that all the user interactions are converted to commands and those commands are executed on the diagram itself. They contain all the information required to perform an operation and to rollback it, as long as you maintain a sorted list of commands and undo/redo them in order (that is usually the user expectation).
Another common OOP pattern that can help you either to design a custom serialization utility or to use the most common ones is the visitor design pattern.
Here the basic idea is that your diagram should not care about the kind of components it contains. Whenever you want to serialize it, you provide a serializer and the components promote themselves to the right type when queried (see double dispatching for further details on this technique).
That being said, a minimal example is worth more than a thousand words:
#include <memory>
#include <stack>
#include <vector>
#include <utility>
#include <iostream>
#include <algorithm>
#include <string>
struct Serializer;
struct Part {
virtual void accept(Serializer &) = 0;
virtual void draw() = 0;
};
struct Node: Part {
void accept(Serializer &serializer) override;
void draw() override;
std::string label;
unsigned int x;
unsigned int y;
};
struct Link: Part {
void accept(Serializer &serializer) override;
void draw() override;
std::weak_ptr<Node> from;
std::weak_ptr<Node> to;
};
struct Serializer {
void visit(Node &node) {
std::cout << "serializing node " << node.label << " - x: " << node.x << ", y: " << node.y << std::endl;
}
void visit(Link &link) {
auto pfrom = link.from.lock();
auto pto = link.to.lock();
std::cout << "serializing link between " << (pfrom ? pfrom->label : "-none-") << " and " << (pto ? pto->label : "-none-") << std::endl;
}
};
void Node::accept(Serializer &serializer) {
serializer.visit(*this);
}
void Node::draw() {
std::cout << "drawing node " << label << " - x: " << x << ", y: " << y << std::endl;
}
void Link::accept(Serializer &serializer) {
serializer.visit(*this);
}
void Link::draw() {
auto pfrom = from.lock();
auto pto = to.lock();
std::cout << "drawing link between " << (pfrom ? pfrom->label : "-none-") << " and " << (pto ? pto->label : "-none-") << std::endl;
}
struct TreeDiagram;
struct Command {
virtual void execute(TreeDiagram &) = 0;
virtual void undo(TreeDiagram &) = 0;
};
struct TreeDiagram {
std::vector<std::shared_ptr<Part>> parts;
std::stack<std::unique_ptr<Command>> commands;
void execute(std::unique_ptr<Command> command) {
command->execute(*this);
commands.push(std::move(command));
}
void undo() {
if(!commands.empty()) {
commands.top()->undo(*this);
commands.pop();
}
}
void draw() {
std::cout << "draw..." << std::endl;
for(auto &part: parts) {
part->draw();
}
}
void serialize(Serializer &serializer) {
std::cout << "serialize..." << std::endl;
for(auto &part: parts) {
part->accept(serializer);
}
}
};
struct AddNode: Command {
AddNode(std::string label, unsigned int x, unsigned int y):
label{label}, x{x}, y{y}, node{std::make_shared<Node>()}
{
node->label = label;
node->x = x;
node->y = y;
}
void execute(TreeDiagram &diagram) override {
diagram.parts.push_back(node);
}
void undo(TreeDiagram &diagram) override {
auto &parts = diagram.parts;
parts.erase(std::remove(parts.begin(), parts.end(), node), parts.end());
}
std::string label;
unsigned int x;
unsigned int y;
std::shared_ptr<Node> node;
};
struct AddLink: Command {
AddLink(std::shared_ptr<Node> from, std::shared_ptr<Node> to):
link{std::make_shared<Link>()}
{
link->from = from;
link->to = to;
}
void execute(TreeDiagram &diagram) override {
diagram.parts.push_back(link);
}
void undo(TreeDiagram &diagram) override {
auto &parts = diagram.parts;
parts.erase(std::remove(parts.begin(), parts.end(), link), parts.end());
}
std::shared_ptr<Link> link;
};
struct MoveNode: Command {
MoveNode(unsigned int x, unsigned int y, std::shared_ptr<Node> node):
px{node->x}, py{node->y}, x{x}, y{y}, node{node}
{}
void execute(TreeDiagram &) override {
node->x = x;
node->y = y;
}
void undo(TreeDiagram &) override {
node->x = px;
node->y = py;
}
unsigned int px;
unsigned int py;
unsigned int x;
unsigned int y;
std::shared_ptr<Node> node;
};
int main() {
TreeDiagram diagram;
Serializer serializer;
auto addNode1 = std::make_unique<AddNode>("foo", 0, 0);
auto addNode2 = std::make_unique<AddNode>("bar", 100, 50);
auto moveNode2 = std::make_unique<MoveNode>(10, 10, addNode2->node);
auto addLink = std::make_unique<AddLink>(addNode1->node, addNode2->node);
diagram.serialize(serializer);
diagram.execute(std::move(addNode1));
diagram.execute(std::move(addNode2));
diagram.execute(std::move(addLink));
diagram.serialize(serializer);
diagram.execute(std::move(moveNode2));
diagram.draw();
diagram.undo();
diagram.undo();
diagram.serialize(serializer);
}
I've not implemented the redo action and the code is far from being a production-ready piece of software, but it acts quite well as a starting point from which to create something more complex.
As you can see, the goal is to create a tree diagram that contains both nodes an links. A component contains a bunch of data and knows how to draw itself. Moreover, as anticipated, a component accepts a serializer in case you want to write it down on a file or whatever.
All the logic is contained in the so called commands. In the example there are three commands: add node, add link and move node. Neither the diagram nor the components know anything about what's going on under the hood. All what the diagram knows is that it's executing a set of commands and those commands can be executed back a step at the time.
A more complex undo/redo system can contain a circular buffer of commands and a few indexes that indicate the one to substitute with the next one, the one valid when going forth and the one valid when going back.
It's quite easy to implement indeed.
This approach will help you decoupling the logic from the data and it's quite common when dealing with user interfaces.
To be honest, it's not something that came up suddenly to my mind. I found something similar while looking at how open-source software solved the issue and I've used it a few years ago in a software of mine. The resulting code is really easy to maintain.
Another approach you might want to consider is working with inmutable data structures and objects. Then, the undo/redo stack can be implemented as a stack of versions of the scene/diagram/document. Undo() replaces the current version with an older version from the stack, and so on. Because all data is inmutable, you can keep references instead of copies, so it is fast and (relatively) cheap.
Pros:
simple undo/redo
multithread-friendly
clean separation of "structure" and transient state (e.g. current selection)
may simplify serialization
caching/memoization/precomputation-friendly (e.g. bounding-box, gpu buffers)
Cons:
consumes a bit more memory
forces separation of "structure" and transient state
probably more difficult: for example, for a typical tree-like scenegraph, to change a node you would also need to change all the nodes along the path to the root; the old and new versions can share the rest of the nodes
Assuming that you're calling save() on a temporary file for each edit of the diagram (even if user doesn't explicitly call the save action) and that you undo only the latest action, you can do as follows:
LastDiagram load(const std::string &path)
{
/* Check for valid path (e.g. boost::filesystem here) */
if(!found)
{
throw std::runtime_exception{"No diagram found"};
}
//read LastDiagram
return LastDiagram;
}
LastDiagram undoLastAction()
{
return loadLastDiagram("/tmp/tmp_diagram_file");
}
and in your main app you handle the exception if thrown. In case you want to allow more undos, then you should think to a solution like sqlite or a tmp file with more entries.
If performance in time and space are issues due large diagrams, think to implement some strategy like keeping an incremental difference for each element of a diagram in a std::vector (limit it to 3/5 if objects are big) and call the renderer with the current statuses. I'm not an OpenGL expert, but I think it's the way it's done there. Actually you could 'steal' this strategy from game development best practices, or generally graphics related ones.
One of those strategies could be something like this:
A structure for efficient update, incremental redisplay and undo in graphical editors

Advice for best approach at extending class capability

I want to extend a class to include extra data and capabilities (I want polymorphic behavior). It seemed obvious to use inheritance and multiple inheritance.
Having read various posts that inheritance (and especially multiple inheritance) can be problematic, I've begun looking into other options:
Put all data and functions in one class and not use inheritance
Composite pattern
mixin
Is there a suggested approach for the following inheritance example? Is this a case where inheritance is reasonable? (but I don't like having to put default functions in the base-class)
#include <iostream>
//================================
class B {
public:
virtual ~B() { }
void setVal(int val) { val_ = val; }
// I'd rather not have these at base class level but want to use
// polymorphism on type B:
virtual void setColor(int val) { std::cout << "setColor not implemented" << std::endl; }
virtual void setLength(int val) { std::cout << "setLength not implemented" << std::endl; }
private:
int val_;
};
//================================
class D1 : virtual public B {
public:
void setColor(int color) {
std::cout << "D1::setColor to " << color << std::endl;
color_ = color;
}
private:
int color_;
};
//================================
class D2 : virtual public B {
public:
void setLength(int length) {
std::cout << "D2::setLength to " << length << std::endl;
length_ = length;
}
private:
int length_;
};
//================================
// multi-inheritance diamond - have fiddled with mixin
// but haven't solved using type B polymorphically with mixins
class M1 : public D1, public D2 {
};
//================================
int main() {
B* d1 = new D1;
d1->setVal(3);
d1->setColor(1);
B* m1 = new M1;
m1->setVal(4);
m1->setLength(2);
m1->setColor(4);
return 0;
}
Suspected problems with the original example code
There are a number of issues with your example.
In the first place, you don't have to supply function bodies in the base class. Use pure virtual functions instead.
Secondly, both your classes D1 and D2 miss functionality, so they should be abstract (which will prevent you from creating deprived objects from them). This second issue will become clear if you indeed use pure virtual functions for your base class. The compiler will start to issue warnings then.
Instantiating D1 as you do with new D1, is bad design, because D1 has no truly functional implementation of the setLength method, even if you give it a 'dummy' body. Giving it a 'dummy' body (one that doesn't do anything useful) so masks your design error.
So your remark (but I don't like having to put default functions in the base-class) testifies of a proper intuition. Having to do that signals flawed design. A D1 object cannot understand setLength, while its inherited public interface promises it can.
And: There's nothing wrong with multiple inheritance, if used correctly. It is very powerful and elegant. But you have to use it where appropiate. D1 and D2 are partial implementations of B, so abstract, and inheriting from both will indeed give you a complete implementation, so concrete.
Maybe a good rule to start with is: Use multiple inheritance only if you see a compelling need for it. But if you do, as said, it's very useful. It can prevent quite some ugly asymmetry and code duplication, compared to e.g. a language like Java, that has banned it.
I am not a tree doctor. When I use a chainsaw I endanger my leg. But that is not to say chainsaws ain't useful.
Where to put the dummy: Nowhere please, do not disinherit...
[EDIT after first comment of OP]
If you derive a class D1 from B that would print 'setLength not implemented' if you call its setLength method, how should the caller react? It shouldn't have called it in the first place, which the caller could have known if D1 did not derive from a B that has this methods, pure virtual or not. Then it would have been clear that it just doesn't support this method. Having the B baseclass makes D1 feel at home in a polymorphic datastructure who'se element type, B* or B&, promises its users that its objects properly support getLength, which they don't.
While this is not the case in your example (but maybe you left things out), there may of course be a good reason to derive D1 and D2 from B. B could hold a part of the eventual interface or implementation of its derived classes that both D1 an D2 need.
Suppose B had a method setAny (key, value) (setting a value in a dictionary), which D1 and D2 both use, D1 calls it in setColor and D2 calls it in setLength.
In that case use of a common base class is justified. In that case B should not have virtual methods setColor or setLength at all, neither dummies nor pure. You should just have a setColor in your D1 class and a setLength in your D2 class, but neither of both in your B class.
There's a basic rule in Object Oriented Design:
Do not disinherit
By introducing the concept of a "method that's not applicable" in a concrete class that's just what you're doing. Now rules like this aren't dogma's. But violating this rule almost always points to a design flaw.
All B's in one datastructure is only useful to have them do a trick that they all understand...
[EDIT2 after second coment of OP]
OP wants to have a map that can hold objects of any class derived from B.
This is exactly where the problem starts. To find out how to store pointers and references to our objects, we have to ask: what is the storage used for. If a map, say mapB is used to store pointers to B, there must be some point in that. With data storage the fun is in retrieving the data and doing something useful with it.
Let's make this a bit simpler by working with lists from everyday life. Suppose I have a personList of say 1000 persons, each with their fullName and phoneNumber. And now say I have a problem with the kitchen sink. I could in fact read through the list, call every single Person on it and ask: can you repair my kitchen sink. In other words: do you support the repairKitchenSink method. Or: are you by any chance an instance of class Plumber (are you a Plumber). But then I spend quite some time calling, and maybe after 500 calls or so, I'll be lucky.
Now all 1000 persons on my personList do support the talkToMe method. So whenever I feel lonely I can call any person from that list and invokate that Person's talkToMe method. But they should not all have a repairKitchenSink method, even not a pure virtual or a dummy variation that does something else, because if I would call this method for a person of class Burglar, he'd probably respond to the call, but in an unexpected way.
So class Person shouldn't contain a method repairKitchenSink, even not a pure virtual one. Because it should never called as part of iteration of personList. It should be called when iterating plumberList. This lists only holds objects that support the repairKitchenSink method.
Use pure virtual functions only where appropriate
They may support it in different ways though. In other words, in class Plumber, method repairKitchenSink can e.g. be pure virtual. There may e.g. be 2 derived classes, PvcPlumber and CopperPlumber. CopperPlumber would implement (code) the repairKitchenSink method by calling lightFlame, followed by a call to solderDrainToSink whereas PvcPlumber would implement it as successive calls to applyGlueToPvcTube and glueTubeToSinkOutlet. But both plumber subclasses implement repairKitchenSink, only in different ways. This and only this justifies having the pure virtual function repairKitchenSink in their base class Plumber. Now of course a class may be derived from Plumber that doesn't implement that method, say class WannabePlumber. But since it will be abstract, you cannot instantiate objects from it, which is good, unless you want wet feet.
There may be many different subclasses of Person. They e.g. represent different professions, or different political preferences, or different religions. If a Person is a Democrat Budhist Plumber, than he (M/F) may be in a derived class that inherits from classes Democrat, Budhist and Plumber. Using inheritance or even typing for something so volatile as political preferences or religious beliefs, or even profession and the endless amount of combinations of those, would not be handy in practice, but it's just an example. In reality profession, religion and politicalPreference would probably be attributes. But that doesn't change the point that matters here. IF something is of a class does not support a certain operation, THEN it shouldn't be in a datastructure that suggests it does.
By, besides personList, having plumberList, animistList and democratList, you're sure to call a person that understands your call to method inviteBillToPlayInMyJazzBand, or worshipTheTreeInMyBackyard.
Lists don't contain objects, they only contain pointers or references to objects. So there's nothing wrong with our Democratic Budhist Plumber being contained in personList, democratList, budhistList and plumberList. Lists are like database indexes. The don't contain the records, they just refer to them. You can have many indexes on one table, and you should, because indexes are small and make your database fast.
The same holds for polymorphic datastructures. At the moment that even personList, democratList, budhistList and plumberList become so large that you're running out of memory, the solution is generally NOT to only have a personList. Because then you exchange your memory problem for a perfomance problem and a code complexity problem that, in general, is far worse.
So, back to your comment: You say you want all your derived classes to be in a list of B's. Fine, but still the interface of a B should only contain methods that are implemented for everything in the list, so no dummy methods. That would be like going through the library and going through all books, in search for one that supports the teachMeAboutTheLovelifeOfGoldfishes method.
To be honest, in telling you all this, I've been committing a capital sin. I've been selling general truths. But in software design these don't exist. I've been trying to sell them to you because I've been teaching OO design for some 30 years now, and I think I recognize the point where your stuck. But to every rule there are many exceptions. Still, if I've properly fathomed your problem, in this case I think you should go for separate datastructures, each holding only references or pointers to objects that really can do trick that you were after when you iterated through that particular datastructure.
A point is a square circle
Part of the confusion in properly using polymorphic datastructures (datastructures holding pointers or references to different object types) comes for the world of relational databases. RDB's work with tables of flat records, each record having the same fields. Since some fields may not apply, something called 'constraint' was invented. In C++ class Point would contain field x and y. Class Circle could inherit from it and additionally contain field 'radius'. Class Square could also inherit from Point, but contain field 'side' in addition to x and y. In the RDB world constraints, not fields, are inherited. So a Circle would have constraint radius == 0. And a Square would have constraint side == 0. And a Point would inherit both constraints, so it would meet the conditions for both being a square and a circle: A point is a square circle, which in mathematics indeed is the case. Note that the constraint inheritance hierarchy is 'upside down', compared to C++. Which can be confusing.
What doesn't help either is the generally held belief that inheritance goes hand in hand with specialization. While this is often the case it isn't always. In many cases in C++ inheritance is extension rather than specialization. The two often coincide, but the Point, Square, Circle example shows that this isn't a general truth.
If inheritance is used, in C++ Circle should derive from Point, since it has extra fields. But a Circle certainly isn't a special type of Point, it's the other way round. In many practical libraries, by the way, Circle will contain an object of class Point, holding x and y, rather than inherit from it, bypassing the whole problem.
Welcome to the world of design choices
What you bumped into is a real design choice, and an important one. Thinking very carefully about things like this, as you are doing, and trying them all in practice, including the allegedly 'wrong' ones, will make you a programmer, rather than a coder.
Let me first state that what you are trying to do is a design smell: Most probably what you are actually trying to achieve could be achieved in a better way. Unfortunately we can't know what it is you actually want to achieve since you only told us how you want to achieve it.
But anyway, your implementation is bad, as the methods report "not implemented" to the users of the program, rather than to the caller. There is no way for the caller to react on the method not doing what is intended. Even worse, you don't even output it to the error stream, but to the regular output stream, so if you use that class in any program that produces regular output, that output will be interrupted by your error message, possibly confusing a program further on in a pipeline).
Here's a better way to do it:
#include <iostream>
#include <cstdlib> // for EXIT_FAILURE
//================================
class B {
public:
virtual ~B() { }
void setVal(int val) { val_ = val; }
// note: No implementation of methods not making sense to a B
private:
int val_;
};
//================================
class D1 : virtual public B {
public:
void setColor(int color) {
std::cout << "D1::setColor to " << color << std::endl;
color_ = color;
}
private:
int color_;
};
//================================
class D2 : virtual public B {
public:
void setLength(int length) {
std::cout << "D2::setLength to " << length << std::endl;
length_ = length;
}
private:
int length_;
};
class M1 : public virtual D1, public virtual D2 {
};
//================================
int main() {
B* d1 = new D1;
p->setVal(3);
if (D1* p = dynamic_cast<D1*>(d1))
{
p->setColor(1);
}
else
{
// note: Use std::cerr, not std::cout, for error messages
std::cerr << "Oops, this wasn't a D1!\n";
// Since this should not have happened to begin with,
// better exit immediately; *reporting* the failure
return EXIT_FAILURE;
}
B* m1 = new M1;
m1->setVal(4);
if (D2* p = dynamic_cast<D2*>(m1))
{
p->setLength(2);
}
else
{
// note: Use std::cerr, not std::cout, for error messages
std::cerr << "Oops, this wasn't a D1!\n";
// Since this should not have happened to begin with,
// better exit immediately; *reporting* the failure
return EXIT_FAILURE;
}
if (D1* p = dynamic_cast<D1*>(m1))
{
p->setColor(4);
}
else
{
// note: Use std::cerr, not std::cout, for error messages
std::cerr << "Oops, this wasn't a D1!\n";
// Since this should not have happened to begin with,
// better exit immediately; *reporting* the failure
return EXIT_FAILURE;
}
return 0;
}
Alternatively, you could make use of the fact that your methods share some uniformity, and use a common method to set all:
#include <iostream>
#include <stdexcept> // for std::logic_error
#include <cstdlib>
#include <string>
enum properties { propValue, propColour, propLength };
std::string property_name(property p)
{
switch(p)
{
case propValue: return "Value";
case propColour: return "Colour";
case propLength: return "Length";
default: return "<invalid property>";
}
}
class B
{
public:
virtual ~B() {}
// allow the caller to determine which properties are supported
virtual bool supportsProperty(property p)
{
return p == propValue;
}
void setProperty(property p, int v)
{
bool succeeded = do_set_property(p,v);
// report problems to the _caller_
if (!succeeded)
throw std::logic_error(property_name(p)+" not supported.");
}
private:
virtual bool do_set_property(property p)
{
if (p == propValue)
{
value = v;
return true;
}
else
return false;
}
int value;
};
class D1: public virtual B
{
public:
virtual bool supportsProperty(property p)
{
return p == propColour || B::supportsProperty(p);
}
private:
virtual bool do_set_property(property p, int v)
{
if (p == propColour)
{
colour = v;
return true;
}
else
return B::do_set_property(p, v);
}
int colour;
};
class D2: public virtual B
{
public:
virtual bool supportsProperty(property p)
{
return p == propLength || B::supportsProperty(p);
}
private:
virtual bool do_set_property(property p, int v)
{
if (p == propLength)
{
length = v;
return true;
}
else
return B::do_set_property(p, v);
}
int length;
};
class M1: public virtual D1, public virtual D2
{
public:
virtual bool supportsProperty(property p)
{
return D1::supportsProperty(p) || D2::supportsProperty(p);
}
private:
bool do_set_property(property p, int v)
{
return D1::do_set_property(p, v) || D2::do_set_property(p, v);
}
};

Practical use of dynamic_cast?

I have a pretty simple question about the dynamic_cast operator. I know this is used for run time type identification, i.e., to know about the object type at run time. But from your programming experience, can you please give a real scenario where you had to use this operator? What were the difficulties without using it?
Toy example
Noah's ark shall function as a container for different types of animals. As the ark itself is not concerned about the difference between monkeys, penguins, and mosquitoes, you define a class Animal, derive the classes Monkey, Penguin, and Mosquito from it, and store each of them as an Animal in the ark.
Once the flood is over, Noah wants to distribute animals across earth to the places where they belong and hence needs additional knowledge about the generic animals stored in his ark. As one example, he can now try to dynamic_cast<> each animal to a Penguin in order to figure out which of the animals are penguins to be released in the Antarctic and which are not.
Real life example
We implemented an event monitoring framework, where an application would store runtime-generated events in a list. Event monitors would go through this list and examine those specific events they were interested in. Event types were OS-level things such as SYSCALL, FUNCTIONCALL, and INTERRUPT.
Here, we stored all our specific events in a generic list of Event instances. Monitors would then iterate over this list and dynamic_cast<> the events they saw to those types they were interested in. All others (those that raise an exception) are ignored.
Question: Why can't you have a separate list for each event type?
Answer: You can do this, but it makes extending the system with new events as well as new monitors (aggregating multiple event types) harder, because everyone needs to be aware of the respective lists to check for.
A typical use case is the visitor pattern:
struct Element
{
virtual ~Element() { }
void accept(Visitor & v)
{
v.visit(this);
}
};
struct Visitor
{
virtual void visit(Element * e) = 0;
virtual ~Visitor() { }
};
struct RedElement : Element { };
struct BlueElement : Element { };
struct FifthElement : Element { };
struct MyVisitor : Visitor
{
virtual void visit(Element * e)
{
if (RedElement * p = dynamic_cast<RedElement*>(e))
{
// do things specific to Red
}
else if (BlueElement * p = dynamic_cast<BlueElement*>(e))
{
// do things specific to Blue
}
else
{
// error: visitor doesn't know what to do with this element
}
}
};
Now if you have some Element & e;, you can make MyVisitor v; and say e.accept(v).
The key design feature is that if you modify your Element hierarchy, you only have to edit your visitors. The pattern is still fairly complex, and only recommended if you have a very stable class hierarchy of Elements.
Imagine this situation: You have a C++ program that reads and displays HTML. You have a base class HTMLElement which has a pure virtual method displayOnScreen. You also have a function called renderHTMLToBitmap, which draws the HTML to a bitmap. If each HTMLElement has a vector<HTMLElement*> children;, you can just pass the HTMLElement representing the element <html>. But what if a few of the subclasses need special treatment, like <link> for adding CSS. You need a way to know if an element is a LinkElement so you can give it to the CSS functions. To find that out, you'd use dynamic_cast.
The problem with dynamic_cast and polymorphism in general is that it's not terribly efficient. When you add vtables into the mix, it only get's worse.
When you add virtual functions to a base class, when they are called, you end up actually going through quite a few layers of function pointers and memory areas. That will never be more efficient than something like the ASM call instruction.
Edit: In response to Andrew's comment bellow, here's a new approach: Instead of dynamic casting to the specific element type (LinkElement), instead you have another abstract subclass of HTMLElement called ActionElement that overrides displayOnScreen with a function that displays nothing, and creates a new pure virtual function: virtual void doAction() const = 0. The dynamic_cast is changed to test for ActionElement and just calls doAction(). You'd have the same kind of subclass for GraphicalElement with a virtual method displayOnScreen().
Edit 2: Here's what a "rendering" method might look like:
void render(HTMLElement root) {
for(vector<HTLMElement*>::iterator i = root.children.begin(); i != root.children.end(); i++) {
if(dynamic_cast<ActionElement*>(*i) != NULL) //Is an ActionElement
{
ActionElement* ae = dynamic_cast<ActionElement*>(*i);
ae->doAction();
render(ae);
}
else if(dynamic_cast<GraphicalElement*>(*i) != NULL) //Is a GraphicalElement
{
GraphicalElement* ge = dynamic_cast<GraphicalElement*>(*i);
ge->displayToScreen();
render(ge);
}
else
{
//Error
}
}
}
Operator dynamic_cast solves the same problem as dynamic dispatch (virtual functions, visitor pattern, etc): it allows you to perform different actions based on the runtime type of an object.
However, you should always prefer dynamic dispatch, except perhaps when the number of dynamic_cast you'd need will never grow.
Eg. you should never do:
if (auto v = dynamic_cast<Dog*>(animal)) { ... }
else if (auto v = dynamic_cast<Cat*>(animal)) { ... }
...
for maintainability and performance reasons, but you can do eg.
for (MenuItem* item: items)
{
if (auto submenu = dynamic_cast<Submenu*>(item))
{
auto items = submenu->items();
draw(context, items, position); // Recursion
...
}
else
{
item->draw_icon();
item->setup_accelerator();
...
}
}
which I've found quite useful in this exact situation: you have one very particular subhierarchy that must be handled separately, this is where dynamic_cast shines. But real world examples are quite rare (the menu example is something I had to deal with).
dynamic_cast is not intended as an alternative to virtual functions.
dynamic_cast has a non-trivial performance overhead (or so I think) since the whole class hierarchy has to be walked through.
dynamic_cast is similar to the 'is' operator of C# and the QueryInterface of good old COM.
So far I have found one real use of dynamic_cast:
(*) You have multiple inheritance and to locate the target of the cast the compiler has to walk the class hierarchy up and down to locate the target (or down and up if you prefer). This means that the target of the cast is in a parallel branch in relation to where the source of the cast is in the hierarchy. I think there is NO other way to do such a cast.
In all other cases, you just use some base class virtual to tell you what type of object you have and ONLY THEN you dynamic_cast it to the target class so you can use some of it's non-virtual functionality. Ideally there should be no non-virtual functionality, but what the heck, we live in the real world.
Doing things like:
if (v = dynamic_cast(...)){} else if (v = dynamic_cast(...)){} else if ...
is a performance waste.
Casting should be avoided when possible, because it is basically saying to the compiler that you know better and it is usually a sign of some weaker design decission.
However, you might come in situations where the abstraction level was a bit too high for 1 or 2 sub-classes, where you have the choice to change your design or solve it by checking the subclass with dynamic_cast and handle it in a seperate branch. The trade-of is between adding extra time and risk now against extra maintenance issues later.
In most situations where you are writing code in which you know the type of the entity you're working with, you just use static_cast as it's more efficient.
Situations where you need dynamic cast typically arrive (in my experience) from lack of foresight in design - typically where the designer fails to provide an enumeration or id that allows you to determine the type later in the code.
For example, I've seen this situation in more than one project already:
You may use a factory where the internal logic decides which derived class the user wants rather than the user explicitly selecting one. That factory, in a perfect world, returns an enumeration which will help you identify the type of returned object, but if it doesn't you may need to test what type of object it gave you with a dynamic_cast.
Your follow-up question would obviously be: Why would you need to know the type of object that you're using in code using a factory?
In a perfect world, you wouldn't - the interface provided by the base class would be sufficient for managing all of the factories' returned objects to all required extents. People don't design perfectly though. For example, if your factory creates abstract connection objects, you may suddenly realize that you need to access the UseSSL flag on your socket connection object, but the factory base doesn't support that and it's not relevant to any of the other classes using the interface. So, maybe you would check to see if you're using that type of derived class in your logic, and cast/set the flag directly if you are.
It's ugly, but it's not a perfect world, and sometimes you don't have time to refactor an imperfect design fully in the real world under work pressure.
The dynamic_cast operator is very useful to me.
I especially use it with the Observer pattern for event management:
#include <vector>
#include <iostream>
using namespace std;
class Subject; class Observer; class Event;
class Event { public: virtual ~Event() {}; };
class Observer { public: virtual void onEvent(Subject& s, const Event& e) = 0; };
class Subject {
private:
vector<Observer*> m_obs;
public:
void attach(Observer& obs) { m_obs.push_back(& obs); }
public:
void notifyEvent(const Event& evt) {
for (vector<Observer*>::iterator it = m_obs.begin(); it != m_obs.end(); it++) {
if (Observer* const obs = *it) {
obs->onEvent(*this, evt);
}
}
}
};
// Define a model with events that contain data.
class MyModel : public Subject {
public:
class Evt1 : public Event { public: int a; string s; };
class Evt2 : public Event { public: float f; };
};
// Define a first service that processes both events with their data.
class MyService1 : public Observer {
public:
virtual void onEvent(Subject& s, const Event& e) {
if (const MyModel::Evt1* const e1 = dynamic_cast<const MyModel::Evt1*>(& e)) {
cout << "Service1 - event Evt1 received: a = " << e1->a << ", s = " << e1->s << endl;
}
if (const MyModel::Evt2* const e2 = dynamic_cast<const MyModel::Evt2*>(& e)) {
cout << "Service1 - event Evt2 received: f = " << e2->f << endl;
}
}
};
// Define a second service that only deals with the second event.
class MyService2 : public Observer {
public:
virtual void onEvent(Subject& s, const Event& e) {
// Nothing to do with Evt1 in Service2
if (const MyModel::Evt2* const e2 = dynamic_cast<const MyModel::Evt2*>(& e)) {
cout << "Service2 - event Evt2 received: f = " << e2->f << endl;
}
}
};
int main(void) {
MyModel m; MyService1 s1; MyService2 s2;
m.attach(s1); m.attach(s2);
MyModel::Evt1 e1; e1.a = 2; e1.s = "two"; m.notifyEvent(e1);
MyModel::Evt2 e2; e2.f = .2f; m.notifyEvent(e2);
}
Contract Programming and RTTI shows how you can use dynamic_cast to allow objects to advertise what interfaces they implement. We used it in my shop to replace a rather opaque metaobject system. Now we can clearly describe the functionality of objects, even if the objects are introduced by a new module several weeks/months after the platform was 'baked' (though of course the contracts need to have been decided on up front).

What is the practical use of pointers to member functions?

I've read through this article, and what I take from it is that when you want to call a pointer to a member function, you need an instance (either a pointer to one or a stack-reference) and call it so:
(instance.*mem_func_ptr)(..)
or
(instance->*mem_func_ptr)(..)
My question is based on this: since you have the instance, why not call the member function directly, like so:
instance.mem_func(..) //or: instance->mem_func(..)
What is the rational/practical use of pointers to member functions?
[edit]
I'm playing with X-development & reached the stage where I am implementing widgets; the event-loop-thread for translating the X-events to my classes & widgets needs to start threads for each widget/window when an event for them arrives; to do this properly I thought I needed function-pointers to the event-handlers in my classes.
Not so: what I did discover was that I could do the same thing in a much clearer & neater way by simply using a virtual base class. No need whatsoever for pointers to member-functions. It was while developing the above that the doubt about the practical usability/meaning of pointers to member-functions arose.
The simple fact that you need a reference to an instance in order to use the member-function-pointer, obsoletes the need for one.
[edit - #sbi & others]
Here is a sample program to illustrate my point:
(Note specifically 'Handle_THREE()')
#include <iostream>
#include <string>
#include <map>
//-----------------------------------------------------------------------------
class Base
{
public:
~Base() {}
virtual void Handler(std::string sItem) = 0;
};
//-----------------------------------------------------------------------------
typedef void (Base::*memfunc)(std::string);
//-----------------------------------------------------------------------------
class Paper : public Base
{
public:
Paper() {}
~Paper() {}
virtual void Handler(std::string sItem) { std::cout << "Handling paper\n"; }
};
//-----------------------------------------------------------------------------
class Wood : public Base
{
public:
Wood() {}
~Wood() {}
virtual void Handler(std::string sItem) { std::cout << "Handling wood\n"; }
};
//-----------------------------------------------------------------------------
class Glass : public Base
{
public:
Glass() {}
~Glass() {}
virtual void Handler(std::string sItem) { std::cout << "Handling glass\n"; }
};
//-----------------------------------------------------------------------------
std::map< std::string, memfunc > handlers;
void AddHandler(std::string sItem, memfunc f) { handlers[sItem] = f; }
//-----------------------------------------------------------------------------
std::map< Base*, memfunc > available_ONE;
void AddAvailable_ONE(Base *p, memfunc f) { available_ONE[p] = f; }
//-----------------------------------------------------------------------------
std::map< std::string, Base* > available_TWO;
void AddAvailable_TWO(std::string sItem, Base *p) { available_TWO[sItem] = p; }
//-----------------------------------------------------------------------------
void Handle_ONE(std::string sItem)
{
memfunc f = handlers[sItem];
if (f)
{
std::map< Base*, memfunc >::iterator it;
Base *inst = NULL;
for (it=available_ONE.begin(); ((it != available_ONE.end()) && (inst==NULL)); it++)
{
if (it->second == f) inst = it->first;
}
if (inst) (inst->*f)(sItem);
else std::cout << "No instance of handler for: " << sItem << "\n";
}
else std::cout << "No handler for: " << sItem << "\n";
}
//-----------------------------------------------------------------------------
void Handle_TWO(std::string sItem)
{
memfunc f = handlers[sItem];
if (f)
{
Base *inst = available_TWO[sItem];
if (inst) (inst->*f)(sItem);
else std::cout << "No instance of handler for: " << sItem << "\n";
}
else std::cout << "No handler for: " << sItem << "\n";
}
//-----------------------------------------------------------------------------
void Handle_THREE(std::string sItem)
{
Base *inst = available_TWO[sItem];
if (inst) inst->Handler(sItem);
else std::cout << "No handler for: " << sItem << "\n";
}
//-----------------------------------------------------------------------------
int main()
{
Paper p;
Wood w;
Glass g;
AddHandler("Paper", (memfunc)(&Paper::Handler));
AddHandler("Wood", (memfunc)(&Wood::Handler));
AddHandler("Glass", (memfunc)(&Glass::Handler));
AddAvailable_ONE(&p, (memfunc)(&Paper::Handler));
AddAvailable_ONE(&g, (memfunc)(&Glass::Handler));
AddAvailable_TWO("Paper", &p);
AddAvailable_TWO("Glass", &g);
std::cout << "\nONE: (bug due to member-function address being relative to instance address)\n";
Handle_ONE("Paper");
Handle_ONE("Wood");
Handle_ONE("Glass");
Handle_ONE("Iron");
std::cout << "\nTWO:\n";
Handle_TWO("Paper");
Handle_TWO("Wood");
Handle_TWO("Glass");
Handle_TWO("Iron");
std::cout << "\nTHREE:\n";
Handle_THREE("Paper");
Handle_THREE("Wood");
Handle_THREE("Glass");
Handle_THREE("Iron");
}
{edit] Potential problem with direct-call in above example:
In Handler_THREE() the name of the method must be hard-coded, forcing changes to be made anywhere that it is used, to apply any change to the method. Using a pointer to member-function the only additional change to be made is where the pointer is created.
[edit] Practical uses gleaned from the answers:
From answer by Chubsdad:
What: A dedicated 'Caller'-function is used to invoke the mem-func-ptr;Benefit: To protect code using function(s) provided by other objectsHow: If the particular function(s) are used in many places and the name and/or parameters change, then you only need to change the name where it is allocated as pointer, and adapt the call in the 'Caller'-function. (If the function is used as instance.function() then it must be changed everywhere.)
From answer by Matthew Flaschen:
What: Local specialization in a classBenefit: Makes the code much clearer,simpler and easier to use and maintainHow: Replaces code that would conventionally be implement using complex logic with (potentially) large switch()/if-then statements with direct pointers to the specialization; fairly similar to the 'Caller'-function above.
The same reason you use any function pointer: You can use arbitrary program logic to set the function pointer variable before calling it. You could use a switch, an if/else, pass it into a function, whatever.
EDIT:
The example in the question does show that you can sometimes use virtual functions as an alternative to pointers to member functions. This shouldn't be surprising, because there are usually multiple approaches in programming.
Here's an example of a case where virtual functions probably don't make sense. Like the code in the OP, this is meant to illustrate, not to be particularly realistic. It shows a class with public test functions. These use internal, private, functions. The internal functions can only be called after a setup, and a teardown must be called afterwards.
#include <iostream>
class MemberDemo;
typedef void (MemberDemo::*MemberDemoPtr)();
class MemberDemo
{
public:
void test1();
void test2();
private:
void test1_internal();
void test2_internal();
void do_with_setup_teardown(MemberDemoPtr p);
};
void MemberDemo::test1()
{
do_with_setup_teardown(&MemberDemo::test1_internal);
}
void MemberDemo::test2()
{
do_with_setup_teardown(&MemberDemo::test2_internal);
}
void MemberDemo::test1_internal()
{
std::cout << "Test1" << std::endl;
}
void MemberDemo::test2_internal()
{
std::cout << "Test2" << std::endl;
}
void MemberDemo::do_with_setup_teardown(MemberDemoPtr mem_ptr)
{
std::cout << "Setup" << std::endl;
(this->*mem_ptr)();
std::cout << "Teardown" << std::endl;
}
int main()
{
MemberDemo m;
m.test1();
m.test2();
}
My question is based on this: since you have the instance, why not call the member function directly[?]
Upfront: In more than 15 years of C++ programming, I have used members pointers maybe twice or thrice. With virtual functions being around, there's not all that much use for it.
You would use them if you want to call a certain member functions on an object (or many objects) and you have to decide which member function to call before you can find out for which object(s) to call it on. Here is an example of someone wanting to do this.
I find the real usefulness of pointers to member functions comes when you look at a higher level construct such as boost::bind(). This will let you wrap a function call as an object that can be bound to a specific object instance later on and then passed around as a copyable object. This is a really powerful idiom that allows for deferred callbacks, delegates and sophisticated predicate operations. See my previous post for some examples:
https://stackoverflow.com/questions/1596139/hidden-features-and-dark-corners-of-stl/1596626#1596626
Member functions, like many function pointers, act as callbacks. You could manage without them by creating some abstract class that calls your method, but this can be a lot of extra work.
One common use is algorithms. In std::for_each, we may want to call a member function of the class of each member of our collection. We also may want to call the member function of our own class on each member of the collection - the latter requires boost::bind to achieve, the former can be done with the STL mem_fun family of classes (if we don't have a collection of shared_ptr, in which case we need to boost::bind in this case too). We could also use a member function as a predicate in certain lookup or sort algorithms. (This removes our need to write a custom class that overloads operator() to call a member of our class, we just pass it in directly to boost::bind).
The other use, as I mentioned, are callbacks, often in event-driven code. When an operation has completed we want a method of our class called to handle the completion. This can often be wrapped into a boost::bind functor. In this case we have to be very careful to manage the lifetime of these objects correctly and their thread-safety (especially as it can be very hard to debug if something goes wrong). Still, it once again can save us from writing large amounts of "wrapper" code.
There are many practical uses. One that comes to my mind is as follows:
Assume a core function such as below (suitably defined myfoo and MFN)
void dosomething(myfoo &m, MFN f){ // m could also be passed by reference to
// const
m.*f();
}
Such a function in the presence of pointer to member functions, becomes open for extension and closed for modification (OCP)
Also refer to Safe bool idiom which smartly uses pointer to members.
The best use of pointers to member functions is to break dependencies.
Good example where pointer to member function is needed is Subscriber/Publisher pattern :
http://en.wikipedia.org/wiki/Publish/subscribe
In my opinion, member function pointers do are not terribly useful to the average programmer in their raw form. OTOH, constructs like ::std::tr1::function that wrap member function pointers together with a pointer to the object they're supposed to operate on are extremely useful.
Of course ::std::tr1::function is very complex. So I will give you a simple example that you wouldn't actually use in practice if you had ::std::tr1::function available:
// Button.hpp
#include <memory>
class Button {
public:
Button(/* stuff */) : hdlr_(0), myhandler_(false) { }
~Button() {
// stuff
if (myhandler_) {
delete hdlr_;
}
}
class PressedHandler {
public:
virtual ~PressedHandler() = 0;
virtual void buttonPushed(Button *button) = 0;
};
// ... lots of stuff
// This stores a pointer to the handler, but will not manage the
// storage. You are responsible for making sure the handler stays
// around as long as the Button object.
void setHandler(const PressedHandler &hdlr) {
hdlr_ = &hdlr;
myhandler_ = false;
}
// This stores a pointer to an object that Button does not manage. You
// are responsible for making sure this object stays around until Button
// goes away.
template <class T>
inline void setHandlerFunc(T &dest, void (T::*pushed)(Button *));
private:
const PressedHandler *hdlr_;
bool myhandler_;
template <class T>
class PressedHandlerT : public Button::PressedHandler {
public:
typedef void (T::*hdlrfuncptr_t)(Button *);
PressedHandlerT(T *ob, hdlrfuncptr_t hdlr) : ob_(ob), func_(hdlr) { }
virtual ~PressedHandlerT() {}
virtual void buttonPushed(Button *button) { (ob_->*func_)(button); }
private:
T * const ob_;
const hdlrfuncptr_t func_;
};
};
template <class T>
inline void Button::setHandlerFunc(T &dest, void (T::*pushed)(Button *))
{
PressedHandler *newhandler = new PressedHandlerT<T>(&dest, pushed);
if (myhandler_) {
delete hdlr_;
}
hdlr_ = newhandler;
myhandler_ = true;
}
// UseButton.cpp
#include "Button.hpp"
#include <memory>
class NoiseMaker {
public:
NoiseMaker();
void squee(Button *b);
void hiss(Button *b);
void boo(Button *b);
private:
typedef ::std::auto_ptr<Button> buttonptr_t;
const buttonptr_t squeebutton_, hissbutton_, boobutton_;
};
NoiseMaker::NoiseMaker()
: squeebutton_(new Button), hissbutton_(new Button), boobutton_(new Button)
{
squeebutton_->setHandlerFunc(*this, &NoiseMaker::squee);
hissbutton_->setHandlerFunc(*this, &NoiseMaker::hiss);
boobutton_->setHandlerFunc(*this, &NoiseMaker::boo);
}
Assuming Button is in a library and not alterable by you, I would enjoy seeing you implement that cleanly using a virtual base class without resorting to a switch or if else if construct somewhere.
The whole point of pointers of pointer-to-member function type is that they act as a run-time way to reference a specific method. When you use the "usual" syntax for method access
object.method();
pointer->method();
the method part is a fixed, compile-time specification of the method you want to call. It is hardcoded into your program. It can never change. But by using a pointer of pointer-to-member function type you can replace that fixed part with a variable, changeable at run-time specification of the method.
To better illustrate this, let me make the following simple analogy. Let's say you have an array
int a[100];
You can access its elements with fixed compile-time index
a[5]; a[8]; a[23];
In this case the specific indices are hardcoded into your program. But you can also access array's elements with a run-time index - an integer variable i
a[i];
the value of i is not fixed, it can change at run-time, thus allowing you to select different elements of the array at run-time. That is very similar to what pointers of pointer-to-member function type let you do.
The question you are asking ("since you have the instance, why not call the member function directly") can be translated into this array context. You are basically asking: "Why do we need a variable index access a[i], when we have direct compile-time constant access like a[1] and a[3]?" I hope you know the answer to this question and realize the value of run-time selection of specific array element.
The same applies to pointers of pointer-to-member function type: they, again, let you to perform run-time selection of a specific class method.
The use case is that you have several member methods with the same signature, and you want to build logic which one should be called under given circumstances. This can be helpful to implement state machine algorithms.
Not something you use everyday...
Imagine for a second you have a function that could call one of several different functions depending on parameters passed.
You could use a giant if/else if statement
You could use a switch statement
Or you could use a table of function pointers (a jump table)
If you have a lot of different options the jump table can be a much cleaner way of arranging your code ...
Its down to personal preference though. Switch statement and jump table correspond to more or less the same compiled code anyway :)
Member pointers + templates = pure win.
e.g. How to tell if class contains a certain member function in compile time
or
template<typename TContainer,
typename TProperty,
typename TElement = decltype(*Container().begin())>
TProperty grand_total(TContainer& items, TProperty (TElement::*property)() const)
{
TProperty accum = 0;
for( auto it = items.begin(), end = items.end(); it != end; ++it) {
accum += (it->*property)();
}
return accum;
}
auto ship_count = grand_total(invoice->lineItems, &LineItem::get_quantity);
auto sub_total = grand_total(invoice->lineItems, &LineItem::get_extended_total);
auto sales_tax = grand_total(invoice->lineItems, &LineItem::calculate_tax);
To invoke it, you need a reference to an instance, but then you can call the func direct & don't need a pointer to it.
This is completely missing the point. There are two indepedent concerns here:
what action to take at some later point in time
what object to perform that action on
Having a reference to an instance satisfies the second requirement. Pointers to member functions address the first: they are a very direct way to record - at one point in a program's execution - which action should be taken at some later stage of execution, possibly by another part of the program.
EXAMPLE
Say you have a monkey that can kiss people or tickle them. At 6pm, your program should set the monkey loose, and knows whom the monkey should visit, but around 3pm your user will type in which action should be taken.
A beginner's approach
So, at 3pm you could set a variable "enum Action { Kiss, Tickle } action;", then at 6pm you could do something like "if (action == Kiss) monkey->kiss(person); else monkey->tickle(person)".
Issues
But that introducing an extra level of encoding (the Action type's introduced to support this - built in types could be used but would be more error prone and less inherently meaningful). Then - after having worked out what action should be taken at 3pm, at 6pm you have to redundantly consult that encoded value to decide which action to take, which will require another if/else or switch upon the encoded value. It's all clumsy, verbose, slow and error prone.
Member function pointers
A better way is to use a more specialised varibale - a member function pointer - that directly records which action to perform at 6pm. That's what a member function pointer is. It's a kiss-or-tickle selector that's set earlier, creating a "state" for the monkey - is it a tickler or a kisser - which can be used later. The later code just invokes whatever function's been set without having to think about the possibilities or have any if/else-if or switch statements.
To invoke it, you need a reference to an instance, but then you can call the func direct & don't need a pointer to it.
Back to this. So, this is good if you make the decision about which action to take at compile time (i.e. a point X in your program, it'll definitely be a tickle). Function pointers are for when you're not sure, and want to decouple the setting of actions from the invocation of those actions.

Testing a c++ class for features

I have a set of classes that describe a set of logical boxes that can hold things and do things to them. I have
struct IBox // all boxes do these
{
....
}
struct IBoxCanDoX // the power to do X
{
void x();
}
struct IBoxCanDoY // the power to do Y
{
void y();
}
I wonder what is the 'best' or maybe its just 'favorite' idiom for a client of these classes to deal with these optional capabilities
a)
if(typeid(box) == typeid(IBoxCanDoX))
{
IBoxCanDoX *ix = static_cast<IBoxCanDoX*>(box);
ix->x();
}
b)
IBoxCanDoX *ix = dynamic_cast<IBoxCanDoX*>(box);
if(ix)
{
ix->x();
}
c)
if(box->canDoX())
{
IBoxCanDoX *ix = static_cast<IBoxCanDoX*>(box);
ix->x();
}
d) different class struct now
struct IBox
{
void x();
void y();
}
...
box->x(); /// ignored by implementations that dont do x
e) same except
box->x() // 'not implemented' exception thrown
f) explicit test function
if(box->canDoX())
{
box->x();
}
I am sure there are others too.
EDIT:
Just to make the use case clearer
I am exposing this stuff to end users via interactive ui. They can type 'make box do X'. I need to know if box can do x. Or I need to disable the 'make current box do X' command
EDIT2: Thx to all answerers
as Noah Roberts pointed out (a) doesnt work (explains some of my issues !).
I ended up doing (b) and a slight variant
template<class T>
T* GetCurrentBox()
{
if (!current_box)
throw "current box not set";
T* ret = dynamic_cast<T*>(current_box);
if(!ret)
throw "current box doesnt support requested operation";
return ret;
}
...
IBoxCanDoX *ix = GetCurrentBox<IBoxCanDoX>();
ix->x();
and let the UI plumbing deal nicely with the exceptions (I am not really throwing naked strings).
I also intend to explore Visitor
I suggest the Visitor pattern for double-dispatch problems like this in C++:
class IVisitor
{
public:
virtual void Visit(IBoxCanDoX *pBox) = 0;
virtual void Visit(IBoxCanDoY *pBox) = 0;
virtual void Visit(IBox* pBox) = 0;
};
class IBox // all boxes do these
{
public:
virtual void Accept(IVisitor *pVisitor)
{
pVisitor->Visit(this);
}
};
class BoxCanDoY : public IBox
{
public:
virtual void Accept(IVisitor *pVisitor)
{
pVisitor->Visit(this);
}
};
class TestVisitor : public IVisitor
{
public:
// override visit methods to do tests for each type.
};
void Main()
{
BoxCanDoY y;
TestVisitor v;
y.Accept(&v);
}
Of the options you've given, I'd say that b or d are "best". However, the need to do a lot of this sort of thing is often indicative of a poor design, or of a design that would be better implemented in a dynamically typed language rather than in C++.
If you are using the 'I' prefix to mean "interface" as it would mean in Java, which would be done with abstract bases in C++, then your first option will fail to work....so that one's out. I have used it for some things though.
Don't do 'd', it will pollute your hierarchy. Keep your interfaces clean, you'll be glad you did. Thus a Vehicle class doesn't have a pedal() function because only some vehicles can pedal. If a client needs the pedal() function then it really does need to know about those classes that can.
Stay way clear of 'e' for the same reason as 'd' PLUS that it violates the Liskov Substitution Principle. If a client needs to check that a class responds to pedal() before calling it so that it doesn't explode then the best way to do that is to attempt casting to an object that has that function. 'f' is just the same thing with the check.
'c' is superfluous. If you have your hierarchy set up the way it should be then casting to ICanDoX is sufficient to check if x can do X().
Thus 'b' becomes your answer from the options given. However, as Gladfelter demonstrates, there are options you haven't considered in your post.
Edit note: I did not notice that 'c' used a static_cast rather than dynamic. As I mention in an answer about that, the dynamic_cast version is cleaner and should be preferred unless specific situations dictate otherwise. It's similar to the following options in that it pollutes the base interface.
Edit 2: I should note that in regard to 'a', I have used it but I don't use types statically like you have in your post. Any time I've used typeid to split flow based on type it has always been based on something that is registered during runtime. For example, opening the correct dialog to edit some object of unknown type: the dialog governors are registered with a factory based on the type they edit. This keeps me from having to change any of the flow control code when I add/remove/change objects. I generally wouldn't use this option under different circumstances.
A and B require run time type identification(RTTI) and might be slower if you are doing a lot checks. Personally I don't like the solutions of "canDoX" methods, if situations like this arise the design probably needs an upgrade because you are exposing information that is not relevant to the class.
If you only need to execute X or Y, depending on the class, I would go for a virtual method in IBox which get overridden in subclasses.
class IBox{
virtual void doThing();
}
class IBoxCanDoX: public IBox{
void doThing() { doX(); }
void doX();
}
class IBoxCanDoY: public IBox{
void doThing() { doY(); }
void doY();
}
box->doThing();
If that solution is not applicable or you need more complex logic, then look at the Visitor design pattern. But keep in mind that the visitor pattern is not very flexible when you add new classes regularly or methods change/are added/are removed (but that also goes true for your proposed alternatives).
If you are trying to call either of these classes actions from contingent parts of code, you I would suggest you wrap that code in a template function and name each class's methods the same way to implement duck typing, thus your client code would look like this.
template<class box>
void box_do_xory(box BOX){
BOX.xory();
}
There is no general answer to your question. Everything depends. I can say only that:
- don't use a), use b) instead
- b) is nice, requires least code, no need for dummy methods, but dynamic_cast is a little slow
- c) is similar to b) but it is faster (no dynamic_cast) and requires more memory
- e) has no sense, you still need to discover if you can call the method so the exception is not thrown
- d) is better then f) (less code to write)
- d) e) and f) produce more garbage code then others, but are faster and less memory consuming
I assume that you will not only be working with one object of one type here.
I would lay out the data that you are working with and try to see how you can lay it out in memory in order to do data-driven programming. A good layout in memory should reflect the way that you store the data in your classes and how the classes are layed out in memory. Once you have that basic design structured (shouldn't take more than a napkin), I would begin organizing the objects into lists dependent on the operations that you plan to do on the data. If you plan to do X() on a collection of objects { Y } in the subset X, I would probably make sure to have a static array of Y that I create from the beginning. If you wish to access the entire of X occasionally, that can be arranged by collecting the lists into a dynamic list of pointers (using std::vector or your favorite choice).
I hope that makes sense, but once implemented it gives simple straight solutions that are easy to understand and easy to work with.
There is a generic way to test if a class supports a certain concept and then to execute the most appropriate code. It uses SFINAE hack. This example is inspired by Abrahams and Gurtovoy's "C++ Template Metaprogramming" book. The function doIt will use x method if it is present, otherwise it will use y method. You can extend CanDo structure to test for other methods as well. You can test as many methods as you wish, as long as the overloads of doIt can be resolved uniquely.
#include <iostream>
#include <boost/config.hpp>
#include <boost/utility/enable_if.hpp>
typedef char yes; // sizeof(yes) == 1
typedef char (&no)[2]; // sizeof(no) == 2
template<typename T>
struct CanDo {
template<typename U, void (U::*)()>
struct ptr_to_mem {};
template<typename U>
static yes testX(ptr_to_mem<U, &U::x>*);
template<typename U>
static no testX(...);
BOOST_STATIC_CONSTANT(bool, value = sizeof(testX<T>(0)) == sizeof(yes));
};
struct DoX {
void x() { std::cout << "doing x...\n"; }
};
struct DoAnotherX {
void x() { std::cout << "doing another x...\n"; }
};
struct DoY {
void y() { std::cout << "doing y...\n"; }
};
struct DoAnotherY {
void y() { std::cout << "doing another y...\n"; }
};
template <typename Action>
typename boost::enable_if<CanDo<Action> >::type
doIt(Action* a) {
a->x();
}
template <typename Action>
typename boost::disable_if<CanDo<Action> >::type
doIt(Action* a) {
a->y();
}
int main() {
DoX doX;
DoAnotherX doAnotherX;
DoY doY;
DoAnotherY doAnotherY;
doIt(&doX);
doIt(&doAnotherX);
doIt(&doY);
doIt(&doAnotherY);
}