I find myself using policies a lot in my code and usually I'm very happy with that.
But from time to time I find myself confronted with using that pattern in situations where the policies are selected at runtime, and I have developed habits to work around such situations. Usually I start with something like this:
class DrawArrays {
protected:
void sendDraw() const;
};
class DrawElements {
public:
void setIndices( GLenum mode, GLsizei count, GLenum type, const GLvoid *indices);
protected:
void sendDraw() const;
};
template<class Policy>
class Vertices : public Policy {
using Policy::sendDraw;
public:
void render() const;
};
When the policy is picked at runtime I have different choices of working around the situation.
Different code paths:
if(drawElements) {
Vertices<DrawElements> vertices;
} else {
Vertices<DrawArrays> vertices;
}
Inheritance and virtual calls:
class PureVertices {
public:
virtual void render() const = 0;
};
template<class Policy>
class Vertices : public PureVertices, public Policy {
//..
};
Both solutions feel wrong to me. The first creates an unmaintainable mess, and the second introduces the overhead of virtual calls that I tried to avoid by using policies in the first place.
Am I missing the proper solutions or do I use the wrong pattern to solve the problem?
Use the second version. Virtual calls are more expensive than static calls because they require an additional pointer lookup, but if "sendDraw" does any real drawing, you won't notice the difference. If you really have a performance problem later, use a profiler to find out where the problem is and fix it. In the (extremely unlikely) case that the virtual method call is actually a performance problem, you could try optimizing it using policies. Until then, write code that's most maintainable so you have development time left to optimize later.
Remember: premature optimization is the root of all evil!
In general, if you need behavior to vary at runtime you are going to have to pay some overhead cost for that, whether it be a switch/if statement or a virtual call. The question is how much runtime variance you need. If you're very confident you will only ever have a small number of types, then a switch statement may really be appropriate. Virtual calls give more flexibility for extending in the future, but you don't necessarily need that flexibility; it depends on the problem. That said, there are still a lot of ways to implement your 'switch statement' or your 'virtual call'. Instead of a switch/if you could use the Visitor Pattern (more maintainable), and instead of virtual calls you could use function pointers (when it doesn't make sense for the class itself to specify the behavior that is invoked at runtime). Also, although I don't agree with everything the author says (I think he artificially makes his idea and OOP mutually exclusive), you might be interested in Data Oriented Programming, especially if you're working on rendering as your class names suggest.
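One way to get the visitor-style dispatch mentioned above while keeping the policy calls static is to hold the possible instantiations in a std::variant and visit it. This is only a sketch under assumptions: it needs C++17, the strings stand in for the real GL calls, and makeVertices, render and AnyVertices are hypothetical names that do not appear in the question.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Stand-ins for the question's policies; the strings replace real GL calls.
class DrawArrays {
protected:
    std::string sendDraw() const { return "glDrawArrays"; }
};

class DrawElements {
protected:
    std::string sendDraw() const { return "glDrawElements"; }
};

template <class Policy>
class Vertices : public Policy {
public:
    // Statically bound call into the policy; no virtual dispatch here.
    std::string render() const { return Policy::sendDraw(); }
};

// One type that can hold either instantiation (hypothetical alias).
using AnyVertices = std::variant<Vertices<DrawArrays>, Vertices<DrawElements>>;

// The only hand-written runtime branch: pick the policy once, at construction.
inline AnyVertices makeVertices(bool drawElements) {
    if (drawElements) return Vertices<DrawElements>{};
    return Vertices<DrawArrays>{};
}

// std::visit selects the render() that matches the stored alternative.
inline std::string render(const AnyVertices& v) {
    assert(!v.valueless_by_exception());
    return std::visit([](const auto& vertices) { return vertices.render(); }, v);
}
```

The visit still costs a runtime dispatch per call, so this mainly trades the hand-written if/else at every use site for a single one at construction. Whether it beats a virtual call is something only a benchmark can tell.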
Why do you oppose virtual calls? Is the overhead really considerable for you? I think the code becomes more readable when you express what you want to do by writing an interface and different implementations instead of some unreadable templates.
Anyway, why do you inherit Vertices from Policy class? You already have it as a template argument. Looks like composition is more appropriate here. If you use inheritance, you can have just one non-template class Vertices and change its behaviour by passing different Policy objects - this is Strategy pattern.
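class Policy {
public:
virtual ~Policy() {}
virtual void sendDraw() const = 0;
};
class Vertices {
public:
explicit Vertices(Policy * policy)
: policy(policy)
{
}
void render() {
// Do something with policy->sendDraw();
}
private:
Policy * policy; // not owned; must outlive this object
};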
I don't see anything wrong with the first one - it doesn't look like an unmaintainable mess to me, although there's not enough code here to determine if there might be a better refactoring.
If you aren't putting the draw calls into a display list then the array data will have to be copied out when it's drawn. (Either the caller blocks until the GPU is done, or the driver copies it out of the app memory to somewhere safe.) So the virtual function won't be an issue. And if you ARE putting them in a display list, then the virtual function won't be an issue, because it's only being set up the once.
And in any event, PCs do virtual calls very quickly. They're not free, it's true, but if drawing (say) thousands of sets of vertices per frame then an extra virtual function call per draw is highly unlikely to break the bank. Out of all the things to think about ahead of time, avoiding uses of a virtual function in the very sorts of situation that virtual functions are designed for is probably one of the less important ones. Unnecessarily-virtual functions are worth avoiding; genuinely useful virtual functions are innocent until proven guilty...
(Drawing more vertices per call and changing shader, shader constants, vertex format and render state settings less frequently are likely to pay greater dividends.)
Sorry for the lack of a better title; I couldn't think of a better one.
I have a class hierarchy like the following:
class Simulator
{
public:
virtual void simulate(unsigned int num_steps);
};
class SpecializedSimulator1 : public Simulator
{
Heap state1; Tree state2; // whatever
public:
double speed() const;
void simulate(unsigned int num_steps) override;
};
class SpecializedSimulator2 : public Simulator
{
Stack state1; Graph state2; // whatever
public:
double step_size() const;
void simulate(unsigned int num_steps) override;
};
class SpecializedSubSimulator2 : public SpecializedSimulator2
{
// more state...
public:
// more parameters...
void simulate(unsigned int num_steps) override;
};
class Component
{
public:
virtual void receive(int port, string data);
virtual void react(Simulator &sim);
};
So far, so good.
Now it gets more complicated.
Components can support one or more types of simulation. (For example, a component that negates its input may support Boolean circuits as well as continuous-time simulation.) Every component "knows" what kinds of simulations it supports, and given a particular kind of simulator, it queries the simulator (via dynamic_cast or double dispatch or whatever means are appropriate) to find out how it needs to react.
Here's where it gets tricky:
Some Components (say, imagine a SimulatorComponent class) themselves need to run sub-simulations inside of them. Part of this involves inheriting some properties of outer simulations, but potentially changing a few of them. For example, a continuous-time sub-simulator might want to lower its step size for its internal components in order to get better accuracy, but otherwise keep everything else the same.
Ideally, SimulatorComponent would be able to inherit from a class (say, SpecializedSimulator2) and override some subset of its properties as desired. The trouble, though, is that it has no idea whether the outer simulator's most-derived type is a SpecializedSimulator2 -- it may very well be the case that SimulatorComponent is running inside a more specialized simulator than that, like a SpecializedSubSimulator2! In that case, sub-components of SimulatorComponent would need to be able to somehow get access to the properties of SpecializedSubSimulator2 that they might need to access, but SimulatorComponent itself would not (and should not) be aware of these properties.
So, we see we can't use inheritance here.
Since the only means in C++ for "discovering" sub-interfaces like this is dynamic_cast, that means the sub-components must be able to directly access the outer simulator themselves, in order to run dynamic_cast on them. But if they do this, then SimulatorComponent can't intercept any of the calls.
At this point, I'm not sure what to do. The problem isn't impossible to solve, obviously -- I can think of some solutions (e.g. a hierarchical key/value dictionary maintained at run-time) -- but the solutions involve some massive tradeoffs (e.g. less compile-time checking, performance loss, etc.) and make me wonder what I should be doing.
So, basically: how should I approach this problem? Is there a flaw in my design? Should I be solving this problem differently? Is there a design pattern for this that I'm just not aware of? Any tips?
I'll try to give some partial advice. For the situation in which you need a simulator that inherits its properties from some parent, a cloning function could be the solution. This way you can ignore what the original simulation actually was, and still end up with a new one with the same properties.
Adjusting the copy may only require some basic properties (like the simulation time step), which means you need to dynamic_cast to some intermediate class in your simulator hierarchy, but you do not have to identify the exact most-derived type.
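A minimal sketch of that cloning idea follows. All names here (clone, ContinuousSimulator, DampedSimulator, makeSubSimulator) are hypothetical and only loosely modelled on the hierarchy in the question. Halving the step size is just an example of "changing a few properties".

```cpp
#include <cassert>
#include <memory>

class Simulator {
public:
    virtual ~Simulator() = default;
    // Each concrete simulator returns a copy of its own most-derived type.
    virtual std::unique_ptr<Simulator> clone() const = 0;
};

// Intermediate class: everything continuous-time shares a step size.
class ContinuousSimulator : public Simulator {
public:
    explicit ContinuousSimulator(double step) : step_(step) { assert(step > 0.0); }
    double step_size() const { return step_; }
    void set_step_size(double step) { step_ = step; }
    std::unique_ptr<Simulator> clone() const override {
        return std::make_unique<ContinuousSimulator>(*this);
    }
private:
    double step_;
};

// A more specialized simulator that the component knows nothing about.
class DampedSimulator : public ContinuousSimulator {
public:
    DampedSimulator(double step, double damping)
        : ContinuousSimulator(step), damping_(damping) {}
    double damping() const { return damping_; }
    std::unique_ptr<Simulator> clone() const override {
        return std::make_unique<DampedSimulator>(*this);
    }
private:
    double damping_;
};

// Copy the outer simulator, then refine only the property we understand.
inline std::unique_ptr<Simulator> makeSubSimulator(const Simulator& outer) {
    std::unique_ptr<Simulator> inner = outer.clone();
    if (auto* c = dynamic_cast<ContinuousSimulator*>(inner.get())) {
        c->set_step_size(c->step_size() / 2.0);
    }
    return inner;
}
```

Sub-components can then dynamic_cast the inner simulator to whatever interface they need. Properties the wrapping component never heard of (damping here) survive the copy untouched.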
As part of a system design, we need to implement a factory pattern. In combination with the Factory pattern, we are also using CRTP, to provide a base set of functionality which can then be customized by the Derived classes.
Sample code below:
class FactoryInterface{
public:
virtual void doX() = 0;
};
//force all derived classes to implement custom_X_impl
template< typename Derived, typename Base = FactoryInterface>
class CRTP : public Base
{
public:
void doX(){
// do common processing..... then
static_cast<Derived*>(this)->custom_X_impl();
}
};
class Derived: public CRTP<Derived>
{
public:
void custom_X_impl(){
//do custom stuff
}
};
Although this design is convoluted, it does provide a few benefits. All the calls after the initial virtual function call can be inlined. The derived class custom_X_impl call is also made efficiently.
I wrote a comparison program to compare the behavior of a similar implementation (tight loop, repeated calls) using function pointers and virtual functions. This design came out on top for gcc/4.8 with O2 and O3.
A C++ guru, however, told me yesterday that any virtual function call in a large executing program can take a variable time because of cache misses, and that I could potentially achieve better performance using C-style function table look-ups and gcc hotlisting of functions. However, I still see 2x the cost for that approach in my sample program mentioned above.
My questions are as below:
1. Is the guru's assertion true? Either way, are there any links I can refer to?
2. Is there any low-latency implementation I can refer to that has a base class invoking a custom function in a derived class using function pointers?
3. Any suggestions on improving the design?
Any other feedback is always welcome.
Your guru refers to the hot attribute of the gcc compiler. The effect of this attribute is:
The function is optimized more aggressively and on many targets it is
placed into a special subsection of the text section so all hot
functions appear close together, improving locality.
So yes, in a very large code base, the hotlisted function may remain in cache ready to be executed without delay, because it avoids cache misses.
You can also use this attribute on member functions:
struct X {
__attribute__ ((hot)) void test() { std::cout << "hello, world !\n"; }
};
But...
When you use virtual functions the compiler generally generates a vtable that is shared between all objects of the class. This table is a table of pointers to functions. And indeed -- your guru is right -- nothing guarantees that this table remains in cached memory.
But, if you manually create a "C-style" table of function pointers, the problem is EXACTLY THE SAME. While the function may remain in cache, nothing ensures that your function table remains in cache as well.
The main difference between the two approaches is that:
in the case of virtual functions, the compiler knows that the virtual function is a hot spot, and could decide to make sure to keep the vtable in cache as well (I don't know if gcc can do this or if there are plans to do so).
in the case of the manual function pointer table, your compiler will not easily deduce that the table belongs to a hot spot. So this attempt of manual optimization might very well backfire.
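To make the comparison concrete, here is a small sketch that puts a compiler-generated vtable next to a hand-written function table. The Shape and CShape types and all function names are invented for illustration. The point is only that both call paths go through a table that lives in ordinary memory.

```cpp
#include <cassert>

// Virtual dispatch: the compiler builds and consults the table.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square : Shape { int sides() const override { return 4; } };

// Hand-rolled equivalent: one shared table of function pointers per "class".
struct CShape;
struct CShapeOps { int (*sides)(const CShape*); };
struct CShape { const CShapeOps* ops; };

inline int triangleSides(const CShape*) { return 3; }
inline int squareSides(const CShape*) { return 4; }

const CShapeOps kTriangleOps{&triangleSides};
const CShapeOps kSquareOps{&squareSides};

// Both calls read a table pointer from the object, then a function
// pointer from the table, then jump through it.
inline int sidesVirtual(const Shape& s) { return s.sides(); }
inline int sidesManual(const CShape& s) {
    assert(s.ops != nullptr && s.ops->sides != nullptr);
    return s.ops->sides(&s);
}
```

In both versions the table is just data, so it competes for cache lines like everything else. Marking the target functions hot does nothing for the table itself.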
My opinion: never try to optimize yourself what a compiler can do much better.
Conclusion
Trust your benchmarks. And trust your OS: if your function or your data is frequently accessed, there is a high chance that a modern OS will take this into account in its virtual memory management, whatever the compiler generates.
I know a lot of questions on these topics have been asked before, but I have a specific case where speed is (moderately) important and the speed increase when using function pointers rather than virtual functions is about 25%. I wondered (for mostly academic reasons) why?
To give more detail, I am writing a simulation which consists of a series of Cells. The Cells are connected by Links which dictate the interactions between the Cells. The Link class has a virtual function called update(), which causes the two Cells it is linking to interact. It is virtual so that I can create different types of Links to give different types of interactions. For example, at the moment I am simulating inviscid flow, but I may want a link which has viscosity or applies a boundary condition.
The second way I can achieve the same effect is to pass a function pointer to the Link class and make the target of the function pointer a friend. I can now have a non-virtual update() which uses the function pointer. Derived classes can use pointers to different functions, giving polymorphic behaviour.
When I build the two versions and profile with Very Sleepy, I find that the function pointer version is significantly faster than the virtual function version and that it appears that the Link class has been entirely optimised away - I just see calls from my main function to the functions pointed to.
I just wondered what made it easier for my compiler (MSVC++ 2012 Express) to optimise the function pointer case better than the virtual function case?
Some code below if it helps for the function pointer case, I'm sure it is obvious how the equivalent would be done with virtual functions.
void InviscidLinkUpdate( void * linkVoid )
{
InviscidLink * link=(InviscidLink*)linkVoid;
//do some stuff with the link
//e.g.
//link->param1=
}
void ViscousLinkUpdate( void * linkVoid )
{
ViscousLink * link=(ViscousLink*)linkVoid;
//do some stuff with the link
//e.g.
//link->param1=
}
class Link
{
public:
Link(Cell *cell1, Cell *cell2, float area, float timeStep, void (*updateFunc)( void * ))
:m_update(updateFunc), m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep)
{}
~Link(){}
void update() {m_update( this );}
protected:
void (*const m_update)( void * );
Cell *m_cell1;
Cell *m_cell2;
float m_area;
float m_timeStep;
//some other parameters I want to modify in update()
float param1;
float param2;
};
class InviscidLink : public Link
{
friend void InviscidLinkUpdate( void * linkVoid );
public:
InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
: Link(cell1, cell2, area, timeStep, InviscidLinkUpdate)
{}
};
class ViscousLink : public Link
{
friend void ViscousLinkUpdate( void * linkVoid );
public:
ViscousLink(Cell *cell1, Cell *cell2, float area, float timeStep)
: Link(cell1, cell2, area, timeStep, ViscousLinkUpdate)
{}
};
edit
I have now put the full source on GitHub - https://github.com/philrosenberg/ung
Compare commit 5ca899d39aa85fa3a86091c1202b2c4cd7147287 (the function pointer version) with commit aff249dbeb5dfdbdaefc8483becef53053c4646f (the virtual function version). Unfortunately I based the test project initially on a wxWidgets project in case I wanted to play with some graphical display so if you don't have wxWidgets then you will need to hack it into a command line project to compile it.
I used Very Sleepy to benchmark it
further edit:
milianw's comment about profile guided optimization turned out to be the solution, but as a comment I currently cannot mark it as the answer. Using the pro version of Visual Studio with the profile guided optimization gave similar runtimes as using inline functions. I guess this is the Virtual Call Speculation described at http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx. I still find it a bit odd that this code could be more easily optimized using function pointers instead of virtual functions, but I guess that is why everyone advises to TEST, rather than assume certain code is faster than another.
Two things I can think of that differ when using function pointers vs. virtual functions:
Your class won't have a vftable allocated, hence a smaller size, which is more cache friendly.
There's one less indirection with a function pointer (with virtual functions: object indirection, vftable indirection, virtual function indirection; with function pointers: object indirection, function pointer indirection, since your update function is not virtual and is resolved at compile time).
As requested, here my comment again as an answer:
Try to use profile-guided optimizations here. Then, the compiler can potentially apply devirtualization to speed up the code. Also, don't forget to mark your implementations as final, which can further help. See e.g. http://channel9.msdn.com/Shows/C9-GoingNative/C9GoingNative-12-C-at-BUILD-2012-Inside-Profile-Guided-Optimization or the excellent GCC article series over at http://hubicka.blogspot.de/search/label/devirtualization.
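As a rough illustration of the final hint, consider the sketch below. The Link interface is reduced to a single float for brevity, and stepGeneric and stepInviscid are hypothetical helper names. Whether a given compiler actually devirtualizes the second call is not guaranteed and is best checked in the generated code.

```cpp
#include <cassert>

class Link {
public:
    virtual ~Link() = default;
    virtual float update(float state) = 0;
};

// 'final' promises that no class overrides update() any further.
class InviscidLink final : public Link {
public:
    float update(float state) override { return state * 0.5f; }
};

// Through the base type the target is unknown here, so this generally
// stays an indirect call unless the optimizer can prove the dynamic type.
inline float stepGeneric(Link& link, float state) { return link.update(state); }

// Through the final type the target is known statically, so the compiler
// is allowed to call (and inline) InviscidLink::update directly.
inline float stepInviscid(InviscidLink& link, float state) { return link.update(state); }
```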
The actual cost of a virtual function call is normally insignificant. However, virtual functions may, as you observed, impact the code speed considerably. The main reason is that a virtual function call is normally a real function call, one that adds a frame to the stack. This is because virtual function calls are resolved at runtime.
If a function is not virtual, it is much easier for the C++ compiler to inline it. The call is resolved at compile time, so the compiler may replace the call with the body of the called function. This allows for much more aggressive optimisations, like doing some computations only once instead of on each loop iteration.
Based on the information provided here my best guess is that you're operating on a large number of objects, and that the one extra indirection induced by the virtual table is increasing cache misses to the point where the I/O to refetch from memory becomes measurable.
As another alternative have you considered using templates and either CRTP or a policy-based approach for the Link class? Depending on your needs it may be possible to entirely remove the dynamic dispatching.
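A possible shape for the CRTP variant is sketched below. Cell is reduced to a single value, and AveragingLink with its interact() function is an invented interaction, not the inviscid flow from the question.

```cpp
#include <cassert>
#include <vector>

struct Cell { float value; };

// CRTP base: update() forwards to the derived class at compile time.
template <class Derived>
class LinkBase {
public:
    LinkBase(Cell& a, Cell& b) : a_(a), b_(b) {}
    void update() { static_cast<Derived*>(this)->interact(a_, b_); }
private:
    Cell& a_;
    Cell& b_;
};

// Hypothetical interaction: move both cells to their mean value.
class AveragingLink : public LinkBase<AveragingLink> {
public:
    using LinkBase::LinkBase;
    void interact(Cell& a, Cell& b) {
        const float mean = 0.5f * (a.value + b.value);
        a.value = mean;
        b.value = mean;
    }
};

// Links of one kind live in one homogeneous container, so the loop
// body contains no indirect call.
template <class LinkT>
void updateAll(std::vector<LinkT>& links) {
    for (auto& link : links) link.update();
}
```

The price is that different link kinds can no longer share one container of base pointers. Each kind needs its own vector, or some other mechanism such as a variant.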
I have a chess variants engine that plays suicide chess and losers chess along with normal chess. I might, over time, add more variants to my engine. The engine is implemented completely in C++ with proper usage of OOP. My question is related to design of such a variant engine.
Initially the project started as a suicide-only engine while over time I added other flavors. For adding new variants, I experimented using polymorphism in C++ first. For instance, a MoveGenerator abstract class had two subclasses SuicideMoveGenerator and NormalMoveGenerator and depending on the type of game chosen by user, a factory would instantiate the right subclass. But I found this to be much slower - obviously because instantiating classes containing virtual functions and calling virtual functions in tight loops are both quite inefficient.
But then it occurred to me to use C++ templates with template specialization for separating the logic of different variants with maximum reuse of code. This also seemed very logical because dynamic binding is not really necessary in this context: once you choose the type of game, you basically stick with it until the end of the game. C++ template specialization provides exactly this: static polymorphism. The template parameter is either SUICIDE or LOSERS or NORMAL.
enum GameType { LOSERS, NORMAL, SUICIDE };
So once the user selects a game type, the appropriate game object is instantiated and everything called from there will be appropriately templatized. For instance, if the user selects suicide chess, let's say a
ComputerPlayer<SUICIDE>
object is instantiated and that instantiation basically is linked to the whole control flow statically. Functions in ComputerPlayer<SUICIDE> would work with MoveGenerator<SUICIDE>, Board<SUICIDE> and so on while corresponding NORMAL one will appropriately work.
On the whole, this lets me instantiate the right template specialization at the beginning, and the whole thing works perfectly without any other if conditions anywhere. The best thing is there is no performance penalty at all!
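The arrangement described above might look roughly like this. It is a sketch, not the engine's actual code: the kingIsRoyal trait and the kingIsRoyalFor function are invented for illustration, using the fact that in suicide chess the king can be captured like any other piece.

```cpp
#include <cassert>
#include <stdexcept>

enum GameType { LOSERS, NORMAL, SUICIDE };

// Primary template: rules shared by most variants.
template <GameType G>
struct MoveGenerator {
    static bool kingIsRoyal() { return true; }
};

// Full specialization: in suicide chess the king is an ordinary piece.
template <>
struct MoveGenerator<SUICIDE> {
    static bool kingIsRoyal() { return false; }
};

template <GameType G>
struct ComputerPlayer {
    // Resolved at compile time; no branch on the game type in here.
    bool kingIsRoyal() const { return MoveGenerator<G>::kingIsRoyal(); }
};

// The single runtime switch, taken once when the user picks a variant.
inline bool kingIsRoyalFor(GameType g) {
    switch (g) {
        case LOSERS:  return ComputerPlayer<LOSERS>{}.kingIsRoyal();
        case NORMAL:  return ComputerPlayer<NORMAL>{}.kingIsRoyal();
        case SUICIDE: return ComputerPlayer<SUICIDE>{}.kingIsRoyal();
    }
    throw std::invalid_argument("unknown game type");
}
```

Everything below the switch is instantiated per variant, which is where both the speed and the longer compile times come from.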
The main downside with this approach however is that using templates makes your code a bit harder to read. Also template specialization if not appropriately handled can lead to major bugs.
I wonder what other variant engine authors normally do for separation of logic (with good reuse of code)? I found C++ template programming quite suitable, but if there's anything better out there, I would be glad to embrace it. In particular, I checked the Fairymax engine by Dr. H G Muller, but that uses config files for defining game rules. I don't want to do that because many of my variants have different extensions, and by making it generic to the level of config files the engine might not grow strong. Another popular engine, Sjeng, is littered with if conditions everywhere and I personally feel that's not a good design.
Any new design insights would be very useful.
"Calling virtual functions in tight loops are inefficient"
I would actually be pretty surprised if this caused any real slowdown: if all the objects in the loop have the same dynamic type, then I would expect the CPU to fetch the corresponding instructions from its L1 cache and thus not suffer much.
However there is one part that worries me:
"obviously because instantiating classes containing virtual functions [is] quite inefficient"
Now... I am really surprised.
The cost of instantiating a class with virtual functions is nearly indistinguishable from the cost of instantiating a class without any virtual functions: it's one more pointer, and that's all (on popular compilers, this corresponds to the _vptr).
I surmise that your problem lies elsewhere. So I am going to take a wild guess:
do you, by any chance, have a lot of dynamic instantiation going on ? (calling new)
If that is the case, you would gain much by removing them.
There is a Design Pattern called Strategy which would be eminently suitable for your precise situation. The idea of this pattern is, in fact, akin to the use of virtual functions, but it externalizes those functions.
Here is a simple example:
class StrategyInterface
{
public:
Move GenerateMove(Player const& player) const;
private:
virtual Move GenerateMoveImpl(Player const& player) const = 0;
};
class SuicideChessStrategy: public StrategyInterface
{
Move GenerateMoveImpl(Player const& player) const override;
};
// Others
Once implemented, you need a function to get the right strategy:
StrategyInterface& GetStrategy(GameType gt)
{
static std::array<StrategyInterface*,3> strategies
= { new SuicideChessStrategy(), .... };
return *(strategies[gt]);
}
And finally, you can delegate the work without using inheritance for the other structures:
class Player
{
public:
Move GenerateMove() const { return GetStrategy(gt).GenerateMove(*this); }
private:
GameType gt;
};
The cost is pretty much similar to using virtual functions, however you do not need dynamically allocated memory for the basic objects of your game any longer, and stack allocation is a LOT faster.
I'm not quite sure if this is a fit but you may be able to achieve static polymorphism via the CRTP with some slight modifications to your original design.
It's my flight simulation application again. I am leaving the mere prototyping phase and starting to flesh out the software design. At least I try...
Each of the aircraft in the simulation has a flight plan associated with it, the exact nature of which is of no interest for this question. Suffice it to say that the operator may edit the flight plan while the simulation is running. The aircraft model most of the time only needs read access to the flight plan object, which at first thought calls for simply passing a const reference. But occasionally the aircraft will need to call AdvanceActiveWayPoint() to indicate that a way point has been reached. This will affect the iterator returned by the function Active(). This implies that the aircraft model indeed needs a non-const reference, which in turn would also expose functions like AppendWayPoint() to the aircraft model. I would like to avoid this because I would like to enforce the usage rule described above at compile time.
Note that class WayPointIter is equivalent to a STL const iterator, that is the way point can not be mutated by the iterator.
class FlightPlan
{
public:
void AppendWayPoint(const WayPointIter& at, WayPoint new_wp);
void ReplaceWayPoint(const WayPointIter& ar, WayPoint new_wp);
void RemoveWayPoint(WayPointIter at);
(...)
WayPointIter First() const;
WayPointIter Last() const;
WayPointIter Active() const;
void AdvanceActiveWayPoint();
(...)
};
My idea to overcome the issue is this: define an abstract interface class for each usage role and inherit FlightPlan from both. Each user then only gets passed a reference to the appropriate usage role.
class IFlightPlanActiveWayPoint
{
public:
virtual WayPointIter Active() const =0;
virtual void AdvanceActiveWayPoint() =0;
};
class IFlightPlanEditable
{
public:
virtual void AppendWayPoint(const WayPointIter& at, WayPoint new_wp) =0;
virtual void ReplaceWayPoint(const WayPointIter& at, WayPoint new_wp) =0;
virtual void RemoveWayPoint(WayPointIter at) =0;
(...)
};
Thus the declaration of FlightPlan would only need to be changed to:
class FlightPlan : public IFlightPlanActiveWayPoint, public IFlightPlanEditable
{
(...)
};
What do you think? Are there any cavecats I might be missing? Is this design clear or should I come up with something different for the sake of clarity?
Alternatively I could also define a special ActiveWayPoint class which would contain the function AdvanceActiveWayPoint() but feel that this might be unnecessary.
Thanks in advance!
From a strict design point of view, your idea is quite good indeed. It is equivalent to having a single object and several different 'views' over this object.
However there is a scaling issue here (relevant to the implementation). What if you then have another object Foo that needs access to the flight plan, you would add IFlightPlanFoo interface ?
There is a risk that you will soon face an imbroglio in the inheritance.
The traditional approach is to create another object, a Proxy, and use this object to adapt/restrict/control the usage. It's a design pattern: Proxy
Here you would create:
class FlightPlanActiveWayPoint
{
public:
explicit FlightPlanActiveWayPoint(FlightPlan& fp) : mFp(fp) {}
// forwarding
void foo() { mFp.foo(); }
private:
FlightPlan& mFp;
};
Give it the interface you planned for IFlightPlanActiveWayPoint, build it with a reference to an actual FlightPlan object, and forward the calls.
There are several advantages to this approach:
Dependency: it's unnecessary to edit flightPlan.h each time you have a new requirement, thus unnecessary to rebuild the whole application
It's faster because there is no virtual call any longer, and the functions can be inlined (thus amounting to almost nothing). Though I would recommend not to inline them to begin with (so you can modify them without recompiling everything).
It's easy to add checks / logging etc without modifying the base class (in case you have a problem in a particular scenario)
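Put together, the proxy could be used along these lines. This is a simplified sketch under assumptions: the active way point is tracked as an index rather than a WayPointIter, AppendWayPoint takes only the way point, and the Aircraft class with its OnWayPointReached() function is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct WayPoint { double lat; double lon; };

// Simplified stand-in for the question's FlightPlan.
class FlightPlan {
public:
    void AppendWayPoint(WayPoint wp) { wps_.push_back(wp); }
    std::size_t ActiveIndex() const { return active_; }
    // Moves to the next way point; stays put on the last one.
    void AdvanceActiveWayPoint() {
        if (active_ + 1 < wps_.size()) ++active_;
    }
private:
    std::vector<WayPoint> wps_;
    std::size_t active_ = 0;
};

// Proxy: exposes only what the aircraft model is allowed to do.
class FlightPlanActiveWayPoint {
public:
    explicit FlightPlanActiveWayPoint(FlightPlan& fp) : mFp(fp) {}
    std::size_t ActiveIndex() const { return mFp.ActiveIndex(); }
    void AdvanceActiveWayPoint() { mFp.AdvanceActiveWayPoint(); }
private:
    FlightPlan& mFp;
};

// The aircraft only ever sees the proxy, so AppendWayPoint() is out of
// reach and any attempt to call it fails to compile.
class Aircraft {
public:
    explicit Aircraft(FlightPlanActiveWayPoint plan) : plan_(plan) {}
    void OnWayPointReached() { plan_.AdvanceActiveWayPoint(); }
private:
    FlightPlanActiveWayPoint plan_;
};
```

The operator-facing code keeps working with the FlightPlan directly, while each aircraft is handed a proxy built from the same object.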
My 2 cents.
Not sure about "cavecats" ;-) but isn't the crew of the aircraft sometimes modifying the flight plan themselves in real life? E.g. if there is a bad storm ahead, or the destination airport is unavailable due to thick fog. In crisis situations, it is the right of the captain of the aircraft to make the final decision. Of course, you may decide not to include this in your model, but I thought it is worth mentioning.
An alternative to multiple inheritance could be composition, using a variation of the Pimpl idiom, in which the wrapper class would not expose the full interface of the internal class. As @Matthieu points out, this is also known as a variation of the Proxy design pattern.