Traits vs virtual overhead - c++

I've come across an Alexandrescu tutorial about traits and I have some thoughts to share. This is the code:
// Example 6: Reference counting traits
//
template <class T>
class RefCountingTraits
{
public: // the static functions must be accessible to users of the traits
static void Refer(T* p)
{
p->IncRef(); // assume RefCounted interface
}
static void Unrefer(T* p)
{
p->DecRef(); // assume RefCounted interface
}
};
template <>
class RefCountingTraits<Widget>
{
public:
static void Refer(Widget* p)
{
p->AddReference(); // use Widget interface
}
static void Unrefer(Widget* p)
{
// use Widget interface
if (p->RemoveReference() == 0)
delete p;
}
};
How much overhead do we have in this case compared to the standard virtual member function case? We are not accessing the object directly in this case either: we are still passing a pointer. Is the compiler able to optimize it in the same way?

At typical production optimisation levels (-O2 or /O2) you can expect all the code you've shown to be inlined and the bits without side-effects optimised away. That leaves the actual calls to IncRef or AddReference, plus the check for zero and the delete.
If virtual functions had been used, and if the reference counting code is trivial (e.g. not thread safe), it might have been about an order of magnitude slower due to a dispatch table lookup and an out-of-line function call, but that will vary a bit with compiler, exact optimisation settings, CPU, calling conventions, etc.
As always, when you have to care, profile and experiment.
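To make the comparison concrete, here is a minimal sketch that puts the two routes side by side. The Widget and VirtualWidget types and the refer_n helpers are hypothetical stand-ins, not part of the tutorial, and the Unrefer half is left out for brevity. Compiling it at -O2 and comparing the generated code for the two loops is one way to see what your own compiler does.

```cpp
// Hypothetical Widget for illustration; the tutorial's Widget is not shown.
struct Widget {
    int refs = 0;
    void AddReference() { ++refs; }
    int RemoveReference() { return --refs; }
};

// Traits route: the callee is fixed at compile time, so it can be inlined.
template <class T>
struct RefCountingTraits;  // primary template left undefined in this sketch

template <>
struct RefCountingTraits<Widget> {
    static void Refer(Widget* p) { p->AddReference(); }
};

template <class T>
void refer_n(T* p, int n) {
    for (int i = 0; i < n; ++i) RefCountingTraits<T>::Refer(p);
}

// Virtual route: the callee is looked up through the vtable at run time.
struct RefCounted {
    virtual ~RefCounted() = default;
    virtual void Refer() = 0;
};

struct VirtualWidget : RefCounted {
    int refs = 0;
    void Refer() override { ++refs; }
};

void refer_n_virtual(RefCounted* p, int n) {
    for (int i = 0; i < n; ++i) p->Refer();
}
```

Both loops do the same work; the difference, if any, shows up only in the machine code and in a profile.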

Related

Separate class ownership and use, generate optimal (fast) code

In general, my question is simple. I want to implement a design pattern that allows the following:
there is a predefined interface (the Interface class);
there is a class (Utilizer) that accepts another class implementing the predefined interface (via pointer, reference, smart pointer, or whatever else) and starts using it through that interface;
Utilizer should be able to own the object passed to it (which implements Interface) and delete it when Utilizer is destroyed.
In managed languages (like C#, Java) this is simple to implement: Utilizer accepts a reference to the base class (Interface), holds that reference, and uses the interface through it. When Utilizer is destroyed, the garbage collector may delete the object that implements Interface.
In C++ we have no garbage collector. We can use a smart pointer, but it may not be a generic one; it may have to be a smart pointer of some particular type (for example, unique_ptr with a user-specified deleter, because the class implementing Interface resides in shared memory and the regular operator delete() can't be applied to it).
The second nuisance is virtual functions. In managed languages you may not notice this, but if you make Interface an abstract base class (with the virtual keyword), you will see that in the test function (see the code below) the compiler performs indirect calls (via function pointers). This happens because the compiler needs to go through the virtual function table. The call via function pointer is not very expensive (a few processor ticks, or even tens of ticks), but the major issue is that the compiler doesn't see what happens next, after the indirection. The optimizer stops there. Functions can't be inlined anymore. So instead of optimal code that reduces to a few machine instructions (in the example, the test function reduces to loading two constants and calling printf), we get a suboptimal "generic" implementation, which effectively nullifies the benefits of C++.
The typical way to avoid suboptimal code is to avoid virtual functions (prefer the CRTP pattern instead) and to avoid type erasure (in the example, Utilizer could store not an Accessor but a std::function<Interface<T>&()>; this solution is nice, but the indirection in std::function again leads to suboptimal code).
So the essence of the question is: how can the logic described above (a class that owns another class, known only through an abstract interface rather than as a particular type, and uses it) be implemented efficiently in C++?
I'm not sure I expressed my thought clearly. Below is my implementation with comments. It generates optimal code (see the disassembly of the test function in the live demo); everything is inlined as expected. But the whole implementation looks cumbersome.
I would like to hear how I can improve the code.
#include <utility>
#include <type_traits> // std::remove_reference
#include <memory>
#include <functional>
#include <stdio.h>
#include <math.h>
// This type implements interface: later Utilizer class
// accept Accessor type, which was able to return reference
// to object of some type, which implements this interface,
// and Utilizer class uses returned object via this interface.
template <typename Impl> class Interface
{
public:
int oper(int arg) { return static_cast<Impl*>(this)->oper(arg); }
const char *name() const { return static_cast<const Impl*>(this)->name(); }
};
// Class which uses object, returned by Accessor class, via
// predefined interface of type Interface<Impl>.
// Utilizer class can perform operations on any class
// which inherited from Interface class, but Utilizer
// doesn't directly own a particular instance of the
// class implementing Interface: Accessor serves for
// getting of particular implementation of Interface
// from somewhere.
template <typename Accessor> class Utilizer
{
private:
typedef typename std::remove_reference<decltype(std::declval<Accessor>()())>::type Impl;
Accessor accessor;
// This static_cast allows only such Accessor types, for
// which operator() returns class inherited from Interface
Interface<Impl>& get() const { return static_cast<Interface<Impl>&>(accessor()); }
public:
template <typename...Args> Utilizer(Args&& ...args) : accessor(std::forward<Args>(args)...) {}
// Following functions is the public interface of Utilizer class
// (this interface have no relations with Interface class,
// except of the fact, that implementation uses Interface class):
double func(int a, int b)
{
if (a > 0) return sqrt(get().oper(a) + b);
else return get().oper(b) * a;
}
const char *text() const
{
const char *result = get().name();
if (result == nullptr) return "unknown";
return result;
}
};
// This is implementation of Interface<Impl> interface
// (program may have multiple similar classes and Utilizer
// can work with any of these classes).
struct Implementation : public Interface<Implementation>
{
Implementation() { puts("Implementation()"); }
Implementation(const Implementation&) { puts("copy Implementation"); }
~Implementation() { puts("~Implementation()"); }
// Following functions are implementation of functions
// defined in Interface<Impl>:
int oper(int arg) { return arg + 42; }
const char *name() const { return "implementation"; }
};
// This is class which owns some particular implementation
// of the class inherited from Interface. This class only
// owns the class which was given to it and allows accessing
// this class via operator(). This class is intended to be the
// template argument for Utilizer class.
template <typename SmartPointer> struct Owner
{
SmartPointer p;
Owner(Owner&& other) : p(std::move(other.p)) {}
template <typename... Args> Owner(Args&&...args) : p(std::forward<Args>(args)...) {}
Implementation& operator()() const { return *p; }
};
typedef std::unique_ptr<Implementation> PtrType;
typedef Utilizer<Owner<PtrType> > UtilType;
void test(UtilType& utilizer)
{
printf("%f %s\n", utilizer.func(1, 2), utilizer.text());
}
int main()
{
PtrType t(new Implementation);
UtilType utilizer(std::move(t));
test(utilizer);
return 0;
}
Your CPU is smarter than you think. Modern CPUs are absolutely capable of guessing the target of, and speculatively executing through, an indirect branch. The speed of the L1 cache, and register renaming, often remove most or all of the extra cost of a non-inlined call. And the 80/20 rule applies in spades: Your test code's bottleneck is the internal processing done by puts, not the late binding you're trying to avoid.
To answer your question, you could improve your code by removing all that template stuff: it would be just as fast, and more maintainable (hence more practical to do actual optimization). Optimization of algorithms and data structures should often be done up-front; optimization of low-level instruction streams should never, ever, ever be done except after analyzing profiling results.

What is the motivation behind static polymorphism in C++?

I understand the mechanics of static polymorphism using the Curiously Recurring Template Pattern. I just do not understand what it is good for.
The declared motivation is:
We sacrifice some flexibility of dynamic polymorphism for speed.
But why bother with something as complicated as:
template <class Derived>
class Base
{
public:
void interface()
{
// ...
static_cast<Derived*>(this)->implementation();
// ...
}
};
class Derived : Base<Derived>
{
private:
void implementation();
};
When you can just do:
class Base
{
public:
void interface();
};
class Derived : public Base
{
public:
void interface();
};
My best guess is that there is no semantic difference in the code and that it is just a matter of good C++ style.
Herb Sutter wrote in Exceptional C++ Style, Chapter 18:
Prefer to make virtual functions private.
Accompanied of course with a thorough explanation why this is good style.
In the context of this guideline the first example is good, because:
The void implementation() function in the example can pretend to be virtual, since it is here to perform customization of the class. It therefore should be private.
And the second example is bad, since:
We should not meddle with the public interface to perform customization.
My question is:
What am I missing about static polymorphism? Is it all about good C++ style?
When should it be used? What are some guidelines?
What am I missing about static polymorphism? Is it all about good C++ style?
Static polymorphism and runtime polymorphism are different things and accomplish different goals. They are both technically polymorphism, in that they decide which piece of code to execute based on the type of something. Runtime polymorphism defers binding the type of something (and thus the code that runs) until runtime, while static polymorphism is completely resolved at compile time.
This results in pros and cons for each. For instance, static polymorphism can check assumptions at compile time, or select among options which would not compile otherwise. It also provides tons of information to the compiler and optimizer, which can inline knowing fully the target of calls and other information. But static polymorphism requires that implementations be available for the compiler to inspect in each translation unit, can result in binary code size bloat (templates are fancy pants copy paste), and doesn't allow these determinations to occur at runtime.
For instance, consider something like std::advance:
template<typename Iterator>
void advance(Iterator& it, ptrdiff_t offset)
{
// If it is a random access iterator:
// it += offset;
// If it is a bidirectional iterator:
// for (; offset < 0; ++offset) --it;
// for (; offset > 0; --offset) ++it;
// Otherwise:
// for (; offset > 0; --offset) ++it;
}
There's no way to get this to compile using runtime polymorphism. You have to make the decision at compile time. Typically you would do this with tag dispatch, e.g.:
template<typename Iterator>
void advance_impl(Iterator& it, ptrdiff_t offset, random_access_iterator_tag)
{
// Won't compile for bidirectional iterators!
it += offset;
}
template<typename Iterator>
void advance_impl(Iterator& it, ptrdiff_t offset, bidirectional_iterator_tag)
{
// Works for random access, but slow
for (; offset < 0; ++offset) --it; // Won't compile for forward iterators
for (; offset > 0; --offset) ++it;
}
template<typename Iterator>
void advance_impl(Iterator& it, ptrdiff_t offset, forward_iterator_tag)
{
// Doesn't allow negative indices! But works for forward iterators...
for (; offset > 0; --offset) ++it;
}
template<typename Iterator>
void advance(Iterator& it, ptrdiff_t offset)
{
// Use overloading to select the right one!
advance_impl(it, offset, typename iterator_traits<Iterator>::iterator_category());
}
Similarly, there are cases where you really don't know the type at compile time. Consider:
void DoAndLog(std::ostream& out, int parameter)
{
out << "Logging!";
}
Here, DoAndLog doesn't know anything about the actual ostream implementation it gets -- and it may be impossible to statically determine what type will be passed in. Sure, this can be turned into a template:
template<typename StreamT>
void DoAndLog(StreamT& out, int parameter)
{
out << "Logging!";
}
But this forces DoAndLog to be implemented in a header file, which may be impractical. It also requires that all possible implementations of StreamT are visible at compile time, which may not be true -- runtime polymorphism can work (although this is not recommended) across DLL or SO boundaries.
When should it be used? What are some guidelines?
This is like someone coming to you and saying "when I'm writing a sentence, should I use compound sentences or simple sentences"? Or perhaps a painter saying "should I always use red paint or blue paint?" There is no right answer, and there is no set of rules that can be blindly followed here. You have to look at the pros and cons of each approach, and decide which best maps to your particular problem domain.
As for the CRTP, most use cases for that are to allow the base class to provide something in terms of the derived class; e.g. Boost's iterator_facade. The base class needs to have things like DerivedClass operator++() { /* Increment and return *this */ } inside -- specified in terms of derived in the member function signatures.
It can be used for polymorphic purposes, but I haven't seen too many of those.
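As a small illustration of "the base class provides something in terms of the derived class", here is a sketch in which a CRTP base supplies post-increment written in terms of the derived class's pre-increment. The names Incrementable and Counter are made up for this example and are not taken from Boost.

```cpp
// CRTP base: writes post-increment once, in terms of Derived's pre-increment.
template <typename Derived>
struct Incrementable {
    Derived operator++(int) {
        Derived& self = static_cast<Derived&>(*this);
        Derived old = self;  // copy of the state before incrementing
        ++self;              // calls Derived::operator++()
        return old;
    }
};

struct Counter : Incrementable<Counter> {
    int value = 0;
    Counter& operator++() { ++value; return *this; }
    // Declaring operator++ here would hide the base overload,
    // so bring the postfix version back into scope.
    using Incrementable<Counter>::operator++;
};
```

Note that the return type of the base's member function is Derived itself, which is something a virtual function in an ordinary base class could not express.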
The link you provide mentions Boost iterators as an example of static polymorphism. STL iterators also exhibit this pattern. Let's take a look at an example and consider why the authors of those types decided this pattern was appropriate:
#include <vector>
#include <iostream>
using namespace std;
void print_ints( vector<int> const& some_ints )
{
for( vector<int>::const_iterator i = some_ints.begin(), end = some_ints.end(); i != end; ++i )
{
cout << *i;
}
}
Now, how would we implement int vector<int>::const_iterator::operator*() const? Can we use polymorphism for this? Well, no. What would the signature of our virtual function be? void const* operator*() const? That's useless! The type has been erased (degraded from int to void*). Instead, the curiously recurring template pattern steps in to help us generate the iterator type. Here is a rough approximation of the iterator class we would need to implement the above:
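template<typename T, typename Value>
class const_iterator_base
{
public:
const_iterator_base() {}
// T is still incomplete while this base is instantiated, so the element
// type is passed as Value instead of being read from T::contained_type.
Value const& operator*() const { return *Ptr(); }
Value const* operator->() const { return Ptr(); }
// increment, decrement, etc, can be implemented and forwarded to T
// ....
private:
// forwards to the derived class, which must provide an accessible Ptr()
Value const* Ptr() const { return static_cast<T const*>(this)->Ptr(); }
};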
Traditional dynamic polymorphism could not provide the above implementation!
A related and important term is parametric polymorphism. It lets you implement, in say Python, APIs similar to those you can build with the curiously recurring template pattern in C++. Hope this is helpful!
I think it's worth taking a stab at the source of all this complexity, and why languages like Java and C# mostly try to avoid it: type erasure! In C++ there is no all-encompassing Object type carrying useful information. Instead we have void*, and once you have void* you truly have nothing! If you have an interface that decays to void*, the only way to recover is by making dangerous assumptions or keeping extra type information around.
While there may be cases where static polymorphism is useful (the other answers have listed a few), I would generally see it as a bad thing. Why? Because you cannot actually use a pointer to the base class anymore, you always have to provide a template argument providing the exact derived type. And in that case, you could just as well use the derived type directly. And, to put it bluntly, static polymorphism is not what object orientation is about.
The runtime difference between static and dynamic polymorphism is exactly two pointer dereferences (provided the compiler really inlines the dispatch method in the base class; if it doesn't for some reason, static polymorphism is slower). That's not really expensive, especially since the second lookup should virtually always hit the cache. All in all, those lookups are usually cheaper than the function call itself, and are certainly worth it to get the real flexibility provided by dynamic polymorphism.

Dynamically construct function

I fear something like this is answered somewhere on this site, but I can't find it because I don't even know how to formulate the question. So here's the problem:
I have a voxel drawing function. First I calculate offsets, angles and so on, and then I do the drawing. But I end up making several versions of every function, because sometimes I want to copy a pixel, sometimes blit, sometimes blit a 3*3 square for every pixel for a smoothing effect, sometimes just copy a pixel to n*n pixels on the screen if the object is resized. And there are tons of versions of that small part in the center of a function.
What can I do instead of writing 10 of same functions which differ only by central part of code? For performance reasons, passing a function pointer as an argument is not an option. I'm not sure making them inline will do the trick, because arguments I send differ: sometimes I calculate volume(Z value), sometimes I know pixels are drawn from bottom to top.
I assume there's some way of doing this stuff in C++ everybody knows about.
Please tell me what I need to learn to do this. Thanks.
The traditional OO approaches to this are the template method pattern and the strategy pattern.
Template Method
The first is an extension of the technique described in Vincenzo's answer: instead of writing a simple non-virtual wrapper, you write a non-virtual function containing the whole algorithm. Those parts that might vary, are virtual function calls.
The specific arguments needed for a given implementation, are stored in the derived class object that provides that implementation.
eg.
class VoxelDrawer {
protected:
virtual void copy(Coord from, Coord to) = 0;
// any other functions you might want to change
public:
virtual ~VoxelDrawer() {}
void draw(arg) {
for (;;) {
// implement full algorithm
copy(a,b);
}
}
};
class SmoothedVoxelDrawer: public VoxelDrawer {
int radius; // algorithm-specific argument
void copy(Coord from, Coord to) {
blit(from.dx(-radius).dy(-radius),
to.dx(-radius).dy(-radius),
2*radius, 2*radius);
}
public:
SmoothedVoxelDrawer(int r) : radius(r) {}
};
Strategy
This is similar, but instead of using inheritance you pass a polymorphic Copier object as an argument to your function. It's more flexible in that it decouples your various copying strategies from the specific function, and you can reuse your copying strategies in other functions.
struct VoxelCopier {
virtual void operator()(Coord from, Coord to) = 0;
};
struct SmoothedVoxelCopier: public VoxelCopier {
// etc. as for SmoothedVoxelDrawer
};
void draw_voxels(arguments, VoxelCopier &copy) {
for (;;) {
// implement full algorithm
copy(a,b);
}
}
Although tidier than passing in a function pointer, neither the template method nor the strategy are likely to have better performance than just passing a function pointer: runtime polymorphism is still an indirect function call.
Policy
The modern C++ equivalent of the strategy pattern is the policy pattern. This simply replaces run-time polymorphism with compile-time polymorphism to avoid the indirect function call and enable inlining:
// you don't need a common base class for policies,
// since templates use duck typing
struct SmoothedVoxelCopier {
int radius;
void copy(Coord from, Coord to) { ... }
};
template <typename CopyPolicy>
void draw_voxels(arguments, CopyPolicy cp) {
for (;;) {
// implement full algorithm
cp.copy(a,b);
}
}
Because of type deduction, you can simply call
draw_voxels(arguments, SmoothedVoxelCopier(radius));
draw_voxels(arguments, OtherVoxelCopier(whatever));
NB. I've been slightly inconsistent here: I used operator() to make my strategy call look like a regular function, but a normal method for my policy. So long as you choose one and stick with it, this is just a matter of taste.
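The policy snippets above are pseudocode (arguments, Coord and the copy bodies are elided). A self-contained sketch of the same idea follows; it swaps the voxel types for plain std::vector<int> buffers, and the CopyPixel and BlendPixel policies are hypothetical stand-ins for the question's copy and blit variants.

```cpp
#include <cstddef>
#include <vector>

// Policy 1: overwrite the destination pixel.
struct CopyPixel {
    void copy(const std::vector<int>& src, std::vector<int>& dst, std::size_t i) const {
        dst[i] = src[i];
    }
};

// Policy 2: average source and destination (a stand-in for blending).
struct BlendPixel {
    void copy(const std::vector<int>& src, std::vector<int>& dst, std::size_t i) const {
        dst[i] = (dst[i] + src[i]) / 2;
    }
};

// The shared algorithm is written once; the varying central part is a
// template parameter, so the call can be resolved and inlined at compile time.
template <typename CopyPolicy>
void draw_voxels(const std::vector<int>& src, std::vector<int>& dst, CopyPolicy cp)
{
    for (std::size_t i = 0; i < src.size() && i < dst.size(); ++i) {
        cp.copy(src, dst, i);
    }
}
```

A call such as draw_voxels(src, dst, BlendPixel()) instantiates a separate copy of the loop for each policy, which is the price paid for avoiding the indirect call.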
CRTP Template Method
There's one final mechanism, which is the compile-time polymorphism version of the template method, and uses the Curiously Recurring Template Pattern.
template <typename Impl>
class VoxelDrawerBase {
protected:
Impl& impl() { return *static_cast<Impl*>(this); }
void copy(Coord from, Coord to) {...}
// *optional* default implementation, is *not* virtual
public:
void draw(arg) {
for (;;) {
// implement full algorithm
impl().copy(a,b);
}
}
};
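class SmoothedVoxelDrawer: public VoxelDrawerBase<SmoothedVoxelDrawer> {
// the base calls our private copy() non-virtually, so it must be a friend
friend class VoxelDrawerBase<SmoothedVoxelDrawer>;
int radius; // algorithm-specific argument
void copy(Coord from, Coord to) {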
blit(from.dx(-radius).dy(-radius),
to.dx(-radius).dy(-radius),
2*radius, 2*radius);
}
public:
SmoothedVoxelDrawer(int r) : radius(r) {}
};
Summary
In general I'd prefer the strategy/policy patterns for their lower coupling and better reuse, and choose the template method pattern only where the top-level algorithm you're parameterizing is genuinely set in stone (ie, when you're either refactoring existing code or are really sure of your analysis of the points of variation) and reuse is genuinely not an issue.
It's also really painful to use the template method if there is more than one axis of variation (that is, you have multiple methods like copy, and want to vary their implementations independently). You either end up with code duplication or mixin inheritance.
I suggest using the NVI idiom.
You have your public method which calls a private function that implements the logic that must differ from case to case.
Derived classes will have to provide an implementation of that private function that specializes them for their particular task.
Example:
class A {
public:
void do_base() {
// [pre]
specialized_do();
// [post]
}
private:
virtual void specialized_do() = 0;
};
class B : public A {
private:
void specialized_do() {
// [implementation]
}
};
The advantage is that you can keep a common implementation in the base class and detail it as required for any subclass (which just needs to reimplement the specialized_do method).
The disadvantage is that you need a different type for each implementation, but if your use case is drawing different UI elements, this is the way to go.
You could simply use the strategy pattern.
So, instead of something like
void do_something_one_way(...)
{
//blah
//blah
//blah
one_way();
//blah
//blah
}
void do_something_another_way(...)
{
//blah
//blah
//blah
another_way();
//blah
//blah
}
You will have
void do_something(...)
{
//blah
//blah
//blah
any_which_way();
//blah
//blah
}
any_which_way could be a lambda, a functor, a virtual member function of a strategy class passed in. There are many options.
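As one possible reading of the lambda option, the sketch below passes the varying step as a template parameter, so each call site gets its own instantiation and the compiler can usually inline the lambda. The function names and the summing loop are invented for illustration only.

```cpp
#include <vector>

// The shared skeleton; Step is whatever callable the caller supplies.
template <typename Step>
int do_something(const std::vector<int>& input, Step any_which_way)
{
    int total = 0;
    for (int value : input) {
        total += any_which_way(value);  // the only part that varies
    }
    return total;
}

// Two "ways", each just a lambda handed to the same skeleton.
inline int sum_doubled(const std::vector<int>& input)
{
    return do_something(input, [](int x) { return 2 * x; });
}

inline int sum_squared(const std::vector<int>& input)
{
    return do_something(input, [](int x) { return x * x; });
}
```

Unlike a function pointer argument, the callable's type is part of the instantiation, so no indirect call is required.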
Are you sure that
"passing a function pointer as an argument is not an option"
Does it really slow it down?
You could use higher order functions, if your 'central part' can be parameterized nicely.
Here is a simple example of a function that returns a function which adds n to its argument:
#include <iostream>
#include<functional>
std::function<int(int)> n_adder(int n)
{
return [=](int x){return x+n;};
}
int main()
{
auto add_one = n_adder(1);
std::cout<<add_one(5);
}
You can use either Template Method pattern or Strategy pattern.
Usually Template method pattern is used in white-box frameworks, when you need to know about the internal structure of a framework to correctly subclass a class.
Strategy pattern is usually used in black-box frameworks, when you should not know about the implementation of the framework, since you only need to understand the contract of the methods you should implement.
For performance reasons, passing a function pointer as an argument is not an option.
Are you sure that passing one additional parameter will cause performance problems? If it does, you may see similar penalties with OOP techniques like Template Method or Strategy. But you usually need a profiler to determine the source of a performance degradation. Virtual calls, passing additional parameters, and calling a function through a pointer are usually very cheap compared to complex algorithms. You may find that these techniques consume an insignificant percentage of CPU resources compared to other code.
I'm not sure making them inline will do the trick, because arguments I send differ: sometimes I calculate volume(Z value), sometimes I know pixels are drawn from bottom to top.
You could pass all the parameters required for drawing in all cases. Alternatively, if you use the Template Method pattern, the base class could provide methods that return the data required for drawing in the different cases. With the Strategy pattern, you could pass the Strategy implementation an object that provides this kind of data.

c++ virtual function vs member function pointer (performance comparison)

Virtual function calls can be slow because they require an extra indexed dereference through the v-table, which can result in a data cache miss as well as an instruction cache miss... Not good for performance critical applications.
So I have been thinking of a way to overcome this performance issue of virtual functions yet still having some of the same functionality that virtual functions provide.
I am confident that this has been done before, but I devised a simple test that allows the base class to store a member function pointer that can be set by any derived class. And when I call Foo() on any derived class, it will call the appropriate member function without having to traverse the v-table...
I am just wondering if this method is a viable replacement for the virtual-call paradigm, if so, why is it not more ubiquitous?
Thanks in advance for your time! :)
class BaseClass
{
protected:
// member function pointer
typedef void(BaseClass::*FooMemFuncPtr)();
FooMemFuncPtr m_memfn_ptr_Foo;
void FooBaseClass()
{
printf("FooBaseClass() \n");
}
public:
BaseClass()
{
m_memfn_ptr_Foo = &BaseClass::FooBaseClass;
}
void Foo()
{
((*this).*m_memfn_ptr_Foo)();
}
};
class DerivedClass : public BaseClass
{
protected:
void FooDeriveddClass()
{
printf("FooDeriveddClass() \n");
}
public:
DerivedClass() : BaseClass()
{
m_memfn_ptr_Foo = (FooMemFuncPtr)&DerivedClass::FooDeriveddClass;
}
};
int main()
{
DerivedClass derived_inst;
derived_inst.Foo(); // "FooDeriveddClass()"
BaseClass base_inst;
base_inst.Foo(); // "FooBaseClass()"
BaseClass * derived_heap_inst = new DerivedClass;
derived_heap_inst->Foo();
return 0;
}
I did a test, and the version using virtual function calls was faster on my system with optimization.
$ time ./main 1
Using member pointer
real 0m3.343s
user 0m3.340s
sys 0m0.002s
$ time ./main 2
Using virtual function call
real 0m2.227s
user 0m2.219s
sys 0m0.006s
Here is the code:
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <stdio.h>
struct BaseClass
{
typedef void(BaseClass::*FooMemFuncPtr)();
FooMemFuncPtr m_memfn_ptr_Foo;
void FooBaseClass() { }
BaseClass()
{
m_memfn_ptr_Foo = &BaseClass::FooBaseClass;
}
void Foo()
{
((*this).*m_memfn_ptr_Foo)();
}
};
struct DerivedClass : public BaseClass
{
void FooDerivedClass() { }
DerivedClass() : BaseClass()
{
m_memfn_ptr_Foo = (FooMemFuncPtr)&DerivedClass::FooDerivedClass;
}
};
struct VBaseClass {
virtual void Foo() = 0;
};
struct VDerivedClass : VBaseClass {
virtual void Foo() { }
};
static const size_t count = 1000000000;
static void f1(BaseClass* bp)
{
for (size_t i=0; i!=count; ++i) {
bp->Foo();
}
}
static void f2(VBaseClass* bp)
{
for (size_t i=0; i!=count; ++i) {
bp->Foo();
}
}
int main(int argc, char** argv)
{
int test = atoi(argv[1]);
switch (test) {
case 1:
{
std::cerr << "Using member pointer\n";
DerivedClass d;
f1(&d);
break;
}
case 2:
{
std::cerr << "Using virtual function call\n";
VDerivedClass d;
f2(&d);
break;
}
}
return 0;
}
Compiled using:
g++ -O2 main.cpp -o main
with g++ 4.7.2.
Virtual function calls can be slow due to virtual calls having to traverse the v-table,
That's not quite correct. The vtable for each class is built ahead of time, with each virtual function slot set to the most specialized version in the hierarchy, and the object's vtable pointer is set on construction. Calling a virtual function does not iterate over pointers; it does something like *(vtbl_address + 8)(args);, which takes constant time.
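To show why the dispatch is constant time, here is a hand-rolled imitation of the mechanism. It is only a sketch: Shape, ShapeVTable and the area functions are hypothetical, and the layout a real compiler uses for vtables is implementation-defined.

```cpp
struct Shape;

// One table per "class", holding one function pointer per "virtual" function.
struct ShapeVTable {
    int (*area)(const Shape*);
};

// Each object stores a single pointer to its class's table.
struct Shape {
    const ShapeVTable* vptr;  // set once, when the object is created
    int w;
    int h;
};

int rect_area(const Shape* s) { return s->w * s->h; }
int tri_area(const Shape* s) { return s->w * s->h / 2; }

const ShapeVTable rect_vtable = { rect_area };
const ShapeVTable tri_vtable = { tri_area };

Shape make_rect(int w, int h) { return Shape{ &rect_vtable, w, h }; }
Shape make_tri(int w, int h) { return Shape{ &tri_vtable, w, h }; }

// "Virtual call": load the table pointer, load a fixed slot, call it.
// No searching or traversal is involved.
int area(const Shape& s) { return s.vptr->area(&s); }
```

The cost is two dependent loads plus an indirect call, regardless of how deep the class hierarchy is.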
which can result in a data cache miss as well as an instruction cache miss... Not good for performance critical applications.
Your solution is not good for performance critical applications (in general) either, because it is generic.
As a rule, performance critical applications are optimized on a per-case basis (measure, pick code with worst performance problems within module and optimize).
With this per-case approach, you will probably never have a case where your code is slow because the compiler has to traverse a vtbl. If that is the case, the slowness would probably come from calling functions through pointers instead of directly (i.e. the problem would be solved by inlining, not by adding an extra pointer in the base class).
All this is academic anyway, until you have a concrete case to optimize (and you have measured that your worst offender is virtual function calls).
Edit:
I am just wondering if this method is a viable replacement for the virtual-call paradigm, if so, why is it not more ubiquitous?
Because it looks like a generic solution (applying it ubiquitously would decrease performance instead of improving it), solving a non-existent problem (your application is generally not slowed down due to virtual function calls).
Virtual functions do not "traverse" the table; they just do a single fetch of a pointer from a known location and call that address. It is as if you had a manual implementation of a pointer-to-function and used that for the call instead of a direct one.
So your work is only good for obfuscation, and it sabotages the cases where the compiler can issue a non-virtual direct call.
Using a pointer-to-member-function is probably even worse than a plain pointer-to-function: it will likely use the same VMT structure for a similarly offset access, just with a variable offset instead of a fixed one.
Mostly because it doesn't work. Most modern CPUs are better at branch prediction and speculative execution than you think. However, I have yet to see a CPU that does speculative execution beyond a non-static branch.
Furthermore, on a modern CPU you are more likely to have a cache miss because a context switch just before the call let another program take over the cache than because of a v-table, and even this scenario is a very remote possibility.
Actually some compilers may use thunks, which translate to ordinary function pointers themselves, so basically the compiler does for you what you are trying to do manually (and probably confuse the hell out of people).
Also, with a pointer to the virtual function table, the per-object space cost of virtual functions is O(1) (just the pointer). On the other hand, if you store function pointers within the class, the cost is O(N) (your class now contains as many pointers as there are "virtual" functions). If there are many functions, you pay a toll for that: when pre-fetching your object, you load all the pointers into the cache line, instead of just a single pointer and the first few members which you are likely to need. That sounds like a waste.
The virtual function table, on the other hand, sits in one place for all the objects of one type and is likely never pushed out of the cache while your code calls some short virtual functions in a loop (which is presumably the problem when virtual function cost would become the bottleneck).
As to the branch prediction, in some cases a simple decision tree over object type and inlined functions for each particular type give good performance (then you store type information instead of a pointer). This is not applicable to all types of problems and would be mostly a premature optimization.
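A minimal sketch of the "decision tree over object type" idea, assuming a small closed set of types. The names (Kind, Shape, area) are made up for illustration, and pi is crudely approximated as 3 to keep the arithmetic in integers.

```cpp
#include <cassert>

// Hypothetical closed set of types, identified by a stored tag instead of a vptr.
enum class Kind { Circle, Square };

struct Shape {
    Kind kind;
    int size;  // radius for Circle, side length for Square
};

// Small per-type functions that the compiler is free to inline.
inline int circle_area(const Shape& s) { return 3 * s.size * s.size; }  // pi approximated as 3
inline int square_area(const Shape& s) { return s.size * s.size; }

// The "decision tree": a direct branch on the stored type, with no indirect call.
inline int area(const Shape& s) {
    switch (s.kind) {
        case Kind::Circle: return circle_area(s);
        case Kind::Square: return square_area(s);
    }
    return 0;  // not reached for valid tags
}

// Small usage example.
inline void example_tag_dispatch() {
    Shape c = { Kind::Circle, 2 };
    Shape q = { Kind::Square, 2 };
    assert(area(c) == 12);
    assert(area(q) == 4);
}
```

The price is that every new type means editing the switch, which is why this only suits problems where the set of types is fixed.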
As a rule of thumb, don't worry about the language constructs because they seem unfamiliar. Worry and optimize only after you have measured and identified where the bottleneck really is.

Virtual Methods or Function Pointers

When implementing polymorphic behavior in C++ one can either use a pure virtual method or one can use function pointers (or functors). For example an asynchronous callback can be implemented by:
Approach 1
class Callback
{
public:
Callback();
~Callback();
void go();
protected:
virtual void doGo() = 0;
};
//Constructor and Destructor
void Callback::go()
{
doGo();
}
So to use the callback here, you would need to override the doGo() method to call whatever function you want
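A minimal sketch of how Approach 1 might be used. CountingCallback and fire_twice are hypothetical names, and the destructor is made virtual here (an assumption; the question's declaration does not show that) so that deleting through a base pointer is safe.

```cpp
#include <cassert>

class Callback {
public:
    Callback() {}
    virtual ~Callback() {}    // virtual, so deleting through a base pointer is safe
    void go() { doGo(); }     // public non-virtual entry point
protected:
    virtual void doGo() = 0;  // customization point
};

// Hypothetical concrete callback: counts how often it was fired.
class CountingCallback : public Callback {
public:
    CountingCallback() : count_(0) {}
    int count() const { return count_; }
protected:
    void doGo() override { ++count_; }
private:
    int count_;
};

// Code that fires the callback only sees the base class.
inline void fire_twice(Callback& cb) {
    cb.go();
    cb.go();
}

// Small usage example.
inline void example_virtual_callback() {
    CountingCallback cb;
    fire_twice(cb);
    assert(cb.count() == 2);
}
```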
Approach 2
typedef void CallbackFunction(void*); // function type; CallbackFunction* is a pointer to it
class Callback
{
public:
Callback(CallbackFunction* func, void* param);
~Callback();
void go();
private:
CallbackFunction* iFunc;
void* iParam;
};
Callback::Callback(CallbackFunction* func, void* param) :
iFunc(func),
iParam(param)
{}
//Destructor
void Callback::go()
{
(*iFunc)(iParam);
}
To use the callback method here you will need to create a function pointer to be called by the Callback object.
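A minimal, compilable sketch of Approach 2 in use. The free function increment and the null check in go() are additions for illustration and are not part of the original question.

```cpp
#include <cassert>

// Function type taking an opaque user-data pointer (C-style callback signature).
typedef void CallbackFunction(void*);

class Callback {
public:
    Callback(CallbackFunction* func, void* param) : iFunc(func), iParam(param) {}
    void go() { if (iFunc) (*iFunc)(iParam); }  // guard against a null function pointer
private:
    CallbackFunction* iFunc;
    void* iParam;
};

// Hypothetical free function used as the callback target.
inline void increment(void* param) {
    // The cast is unchecked: passing anything but an int* here is undefined behaviour.
    ++*static_cast<int*>(param);
}

// Small usage example.
inline void example_function_pointer_callback() {
    int counter = 0;
    Callback cb(&increment, &counter);
    cb.go();
    cb.go();
    assert(counter == 2);
}
```

The void* parameter is what makes this easy to call from C and also what removes the compiler's ability to check the argument type.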
Approach 3
[This was added to the question by me (Andreas); it wasn't written by the original poster]
template <typename T>
class Callback
{
public:
Callback() {}
~Callback() {}
void go() {
T t; t();
}
};
class CallbackTest
{
public:
void operator()() { std::cout << "Test"; } // requires #include <iostream>
};
int main()
{
Callback<CallbackTest> test;
test.go();
}
What are the advantages and disadvantages of each implementation?
Approach 1 (Virtual Function)
"+" The "correct" way to do it in C++
"-" A new class must be created per callback
"-" Performance-wise an additional dereference through VF-Table compared to Function Pointer. Two indirect references compared to Functor solution.
Approach 2 (Class with Function Pointer)
"+" Can wrap a C-style function for C++ Callback Class
"+" Callback function can be changed after callback object is created
"-" Requires an indirect call. May be slower than functor method for callbacks that can be statically computed at compile-time.
Approach 3 (Class calling T functor)
"+" Possibly the fastest way to do it. No indirect call overhead and may be inlined completely.
"-" Requires an additional Functor class to be defined.
"-" Requires that callback is statically declared at compile-time.
FWIW, Function Pointers are not the same as Functors. Functors (in C++) are classes that are used to provide a function call which is typically operator().
Here is an example functor as well as a template function which utilizes a functor argument:
class TFunctor
{
public:
void operator()(const char *charstring)
{
printf("%s", charstring); // never pass caller-supplied text as the format string
}
};
template<class T> void CallFunctor(T& functor_arg,const char *charstring)
{
functor_arg(charstring);
}
int main()
{
TFunctor foo;
CallFunctor(foo,"hello world\n");
}
From a performance perspective, virtual functions and function pointers both result in an indirect function call (i.e. through a register), although virtual functions require an additional load of the vtable pointer before the function pointer is loaded. Functors (with a non-virtual call) passed as a template parameter are the highest-performing kind of callback, because they can be inlined and, even if not inlined, do not generate an indirect call.
Approach 1
Easier to read and understand
Less possibility of errors (iFunc cannot be NULL, you're not using a void *iParam, etc.)
C++ programmers will tell you that this is the "right" way to do it in C++
Approach 2
Slightly less typing to do
VERY slightly faster (calling a virtual method has some overhead, usually about the same as two simple arithmetic operations, so it most likely won't matter)
That's how you would do it in C
Approach 3
Probably the best way to do it when possible. It will have the best performance, it will be type safe, and it's easy to understand (it's the method used by the STL).
The primary problem with Approach 2 is that it simply doesn't scale. Consider the equivalent for 100 functions:
class MahClass {
// 100 pointers of various types
public:
MahClass() { /* set all 100 pointers */ }
MahClass(const MahClass& other) {
// copy all 100 function pointers
}
};
The size of MahClass has ballooned, and the time to construct it has also significantly increased. Virtual functions, however, add only O(1) to the size of the class and the time to construct it. With function pointers, you, the user, must also manually write all the callbacks for all the derived classes, each adjusting the pointer to become a pointer to derived, and must specify the function pointer types, which is a mess. Not to mention that you might forget one, or set it to NULL, or something equally stupid but totally going to happen because you're writing 30 classes this way and violating DRY like a parasitic wasp violates a caterpillar.
Approach 3 is only usable when the desired callback is statically knowable.
This leaves Approach 1 as the only usable approach when dynamic method invocation is required.
It's not clear from your example if you're creating a utility class or not. Is your Callback class intended to implement a closure or a more substantial object that you just didn't flesh out?
The first form:
Is easier to read and understand,
Is far easier to extend: try adding methods pause, resume and stop.
Is better at handling encapsulation (presuming doGo is defined in the class).
Is probably a better abstraction, so easier to maintain.
The second form:
Can be used with different methods for doGo, so it's more than just polymorphic.
Could allow (with additional methods) changing the doGo method at run-time, allowing the instances of the object to mutate their functionality after creation.
Ultimately, IMO, the first form is better for all normal cases. The second has some interesting capabilities, though -- but not ones you'll need often.
One major advantage of the first method is it has more type safety. The second method uses a void * for iParam so the compiler will not be able to diagnose type problems.
A minor advantage of the second method is that it would be less work to integrate with C. But if your code base is only C++, this advantage is moot.
Function pointers are more C-style I would say. Mainly because in order to use them you usually must define a flat function with the same exact signature as your pointer definition.
When I write C++ the only flat function I write is int main(). Everything else is a class object. Out of the two choices I would choose to define a class and override your virtual, but if all you want is to notify some code that some action happened in your class, neither of these choices would be the best solution.
I am unaware of your exact situation, but you might want to peruse design patterns.
I would suggest the observer pattern. It is what I use when I need to monitor a class or wait for some sort of notification.
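A minimal sketch of the observer pattern, with hypothetical names (Observer, Subject, LastEvent) and an int standing in for whatever event data the real class would publish.

```cpp
#include <cassert>
#include <vector>

// Hypothetical observer interface: anything that wants notifications implements it.
class Observer {
public:
    virtual ~Observer() {}
    virtual void notify(int event) = 0;
};

// The subject keeps non-owning pointers to its observers.
class Subject {
public:
    void attach(Observer* o) { observers_.push_back(o); }
    void emit(int event) {
        for (Observer* o : observers_) o->notify(event);
    }
private:
    std::vector<Observer*> observers_;
};

// Example observer that remembers the last event it saw.
class LastEvent : public Observer {
public:
    LastEvent() : last(-1) {}
    void notify(int event) override { last = event; }
    int last;
};

// Small usage example: one event reaches every attached observer.
inline void example_observer() {
    Subject subject;
    LastEvent a, b;
    subject.attach(&a);
    subject.attach(&b);
    subject.emit(7);
    assert(a.last == 7 && b.last == 7);
}
```

Because the subject does not own its observers, each observer must outlive the subject or be detached first; a real implementation would need a detach operation as well.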
For example, let us look at an interface for adding read functionality to a class:
struct Read_Via_Inheritance
{
virtual void read_members(void) = 0;
};
Any time I want to add another source of reading, I have to inherit from the class and add a specific method:
struct Read_Inherited_From_Cin
: public Read_Via_Inheritance
{
void read_members(void)
{
cin >> member;
}
};
If I want to read from a file, database, or USB, this requires 3 more separate classes. The combinations start to become very ugly with multiple objects and multiple sources.
If I use a functor, which happens to resemble the Visitor design pattern:
struct Reader_Visitor_Interface
{
virtual void read(unsigned int& member) = 0;
virtual void read(std::string& member) = 0;
};
struct Read_Client
{
void read_members(Reader_Visitor_Interface & reader)
{
reader.read(x);
reader.read(text);
return;
}
unsigned int x;
std::string text; // a value member; a reference member would have to be bound in a constructor
};
With the above foundation, objects can read from different sources just by supplying different readers to the read_members method:
struct Read_From_Cin
: Reader_Visitor_Interface
{
void read(unsigned int& value)
{
cin>>value;
}
void read(std::string& value)
{
getline(cin, value);
}
};
I don't have to change any of the object's code (a good thing because it is already working). I can also apply the reader to other objects.
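To illustrate adding a source without touching the client, here is a self-contained sketch with a second, hypothetical reader, Read_From_Stream, that pulls values from any std::istream. The virtual destructor and the default member initializer are additions that the original snippets do not show.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct Reader_Visitor_Interface {
    virtual ~Reader_Visitor_Interface() {}
    virtual void read(unsigned int& member) = 0;
    virtual void read(std::string& member) = 0;
};

struct Read_Client {
    void read_members(Reader_Visitor_Interface& reader) {
        reader.read(x);
        reader.read(text);
    }
    unsigned int x = 0;
    std::string text;
};

// Hypothetical second reader: works with a file, a string stream, or std::cin.
struct Read_From_Stream : Reader_Visitor_Interface {
    explicit Read_From_Stream(std::istream& in) : in_(in) {}
    void read(unsigned int& value) override { in_ >> value; }
    void read(std::string& value) override {
        in_ >> std::ws;  // skip the newline left behind by operator>>
        std::getline(in_, value);
    }
private:
    std::istream& in_;
};

// Small usage example: the client code is unchanged, only the reader differs.
inline void example_stream_reader() {
    std::istringstream in("42\nhello world\n");
    Read_From_Stream reader(in);
    Read_Client client;
    client.read_members(reader);
    assert(client.x == 42);
    assert(client.text == "hello world");
}
```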
Generally, I use inheritance when I am performing generic programming. For example, if I have a Field class, then I can create Field_Boolean, Field_Text and Field_Integer. I can put pointers to their instances into a vector<Field *> and call it a record. The record can perform generic operations on the fields, and doesn't care or know what kind of field is processed.
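The Field idea can be sketched as follows. The to_string member and the join_record helper are hypothetical, chosen only to give the record one generic operation to perform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Base class: the record only knows this interface.
struct Field {
    virtual ~Field() {}
    virtual std::string to_string() const = 0;
};

struct Field_Boolean : Field {
    explicit Field_Boolean(bool v) : value(v) {}
    std::string to_string() const override { return value ? "true" : "false"; }
    bool value;
};

struct Field_Integer : Field {
    explicit Field_Integer(int v) : value(v) {}
    std::string to_string() const override { return std::to_string(value); }
    int value;
};

struct Field_Text : Field {
    explicit Field_Text(const std::string& v) : value(v) {}
    std::string to_string() const override { return value; }
    std::string value;
};

// A "record" is a sequence of fields; this generic operation joins them with commas
// without knowing which concrete field types it holds.
inline std::string join_record(const std::vector<Field*>& record) {
    std::string out;
    for (std::size_t i = 0; i < record.size(); ++i) {
        if (i != 0) out += ",";
        out += record[i]->to_string();
    }
    return out;
}

// Small usage example.
inline void example_record() {
    Field_Boolean b(true);
    Field_Integer n(7);
    Field_Text t("abc");
    std::vector<Field*> record;
    record.push_back(&b);
    record.push_back(&n);
    record.push_back(&t);
    assert(join_record(record) == "true,7,abc");
}
```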
Change to pure virtual, first off. Then inline it. That should negate any method call overhead at all, so long as inlining doesn't fail (and it won't if you force it).
May as well use C, because this is the only real useful major feature of C++ compared to C. You will always call method and it can't be inlined, so it will be less efficient.