Virtual function calls can be slow because each call requires an extra indexed dereference through the v-table, which can result in a data cache miss as well as an instruction cache miss... Not good for performance-critical applications.
So I have been thinking of a way to overcome this performance issue of virtual functions while still keeping some of the functionality that virtual functions provide.
I am confident that this has been done before, but I devised a simple test that allows the base class to store a member function pointer that can be set by any derived class. When I call Foo() on any derived class, it calls the appropriate member function without having to go through the v-table...
I am just wondering whether this method is a viable replacement for the virtual-call paradigm. If so, why is it not more ubiquitous?
Thanks in advance for your time! :)
#include <cstdio>
class BaseClass
{
protected:
// member function pointer type, plus one such pointer stored in every object
typedef void(BaseClass::*FooMemFuncPtr)();
FooMemFuncPtr m_memfn_ptr_Foo;
void FooBaseClass()
{
printf("FooBaseClass() \n");
}
public:
BaseClass()
{
m_memfn_ptr_Foo = &BaseClass::FooBaseClass;
}
void Foo()
{
// call whichever member function the most-derived constructor stored
(this->*m_memfn_ptr_Foo)();
}
};
class DerivedClass : public BaseClass
{
protected:
void FooDerivedClass()
{
printf("FooDerivedClass() \n");
}
public:
DerivedClass() : BaseClass()
{
// derived-to-base member pointer conversion needs an explicit cast
m_memfn_ptr_Foo = static_cast<FooMemFuncPtr>(&DerivedClass::FooDerivedClass);
}
};
int main()
{
DerivedClass derived_inst;
derived_inst.Foo(); // "FooDerivedClass()"
BaseClass base_inst;
base_inst.Foo(); // "FooBaseClass()"
BaseClass * derived_heap_inst = new DerivedClass;
derived_heap_inst->Foo(); // "FooDerivedClass()"
return 0;
}
I did a test, and the version using virtual function calls was faster on my system with optimization.
$ time ./main 1
Using member pointer
real 0m3.343s
user 0m3.340s
sys 0m0.002s
$ time ./main 2
Using virtual function call
real 0m2.227s
user 0m2.219s
sys 0m0.006s
Here is the code:
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <stdio.h>
struct BaseClass
{
typedef void(BaseClass::*FooMemFuncPtr)();
FooMemFuncPtr m_memfn_ptr_Foo;
void FooBaseClass() { }
BaseClass()
{
m_memfn_ptr_Foo = &BaseClass::FooBaseClass;
}
void Foo()
{
((*this).*m_memfn_ptr_Foo)();
}
};
struct DerivedClass : public BaseClass
{
void FooDerivedClass() { }
DerivedClass() : BaseClass()
{
m_memfn_ptr_Foo = (FooMemFuncPtr)&DerivedClass::FooDerivedClass;
}
};
struct VBaseClass {
virtual void Foo() = 0;
};
struct VDerivedClass : VBaseClass {
virtual void Foo() { }
};
static const size_t count = 1000000000;
static void f1(BaseClass* bp)
{
for (size_t i=0; i!=count; ++i) {
bp->Foo();
}
}
static void f2(VBaseClass* bp)
{
for (size_t i=0; i!=count; ++i) {
bp->Foo();
}
}
int main(int argc, char** argv)
{
// expects "1" (member pointer) or "2" (virtual call) as the first argument
int test = argc > 1 ? atoi(argv[1]) : 0;
switch (test) {
case 1:
{
std::cerr << "Using member pointer\n";
DerivedClass d;
f1(&d);
break;
}
case 2:
{
std::cerr << "Using virtual function call\n";
VDerivedClass d;
f2(&d);
break;
}
}
return 0;
}
Compiled using:
g++ -O2 main.cpp -o main
with g++ 4.7.2.
Virtual function calls can be slow due to virtual calls having to traverse the v-table,
That's not quite correct. The vtable is fixed before the call: each object's vtable pointer is set during construction, and each slot points to the most specialized version of the function in the hierarchy. Calling a virtual function does not iterate over pointers; it does something like (*(vtbl_address + 8))(args), which takes constant time.
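To make the "constant time, no traversal" point concrete, here is a hand-rolled imitation of a vtable in plain C++. It is a sketch under stated assumptions: Shape, ShapeVtbl and the helper functions are hypothetical names, and real compilers lay out vtables according to their ABI rather than exactly like this. A "virtual" call is one load of the table pointer, one load of a slot at a fixed offset, and one indirect call.

```cpp
#include <cassert>

struct Shape;

// One table per "type", listing that type's implementations.
struct ShapeVtbl {
    int (*area)(const Shape*);   // slot 0
    int (*sides)(const Shape*);  // slot 1
};

// Every object carries a single pointer to its type's table.
struct Shape {
    const ShapeVtbl* vptr;
    int size;
};

static int square_area(const Shape* s) { return s->size * s->size; }
static int square_sides(const Shape*) { return 4; }
// Right triangle whose two legs both have length `size`.
static int triangle_area(const Shape* s) { return s->size * s->size / 2; }
static int triangle_sides(const Shape*) { return 3; }

// Tables are shared by all objects of the same "type".
static const ShapeVtbl square_vtbl = {square_area, square_sides};
static const ShapeVtbl triangle_vtbl = {triangle_area, triangle_sides};

// "Construction" sets the table pointer once.
inline Shape make_square(int n) { return Shape{&square_vtbl, n}; }
inline Shape make_triangle(int n) { return Shape{&triangle_vtbl, n}; }

// The "virtual call": load vptr, load a fixed slot, call through it.
// No loop and no search, whichever slot is used.
inline int call_area(const Shape* s) { return s->vptr->area(s); }
inline int call_sides(const Shape* s) { return s->vptr->sides(s); }
```

Adding more slots to ShapeVtbl would not make any individual call slower, because each call indexes one known slot.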
which can result in a data cache miss as well as an instruction cache miss... Not good for performance critical applications.
Your solution is not good for performance critical applications (in general) either, because it is generic.
As a rule, performance-critical applications are optimized on a per-case basis (measure, pick the code with the worst performance problems within a module, and optimize it).
With this per-case approach, you will probably never have a case where your code is slow because the compiler has to traverse a vtbl. If that is the case, the slowness would probably come from calling functions through pointers instead of directly (i.e. the problem would be solved by inlining, not by adding an extra pointer in the base class).
All this is academic anyway, until you have a concrete case to optimize (and you have measured that your worst offender is virtual function calls).
Edit:
I am just wondering if this method is a viable replacement for the virtual-call paradigm, if so, why is it not more ubiquitous?
Because it looks like a generic solution (applying it ubiquitously would decrease performance instead of improving it), solving a non-existent problem (your application is generally not slowed down due to virtual function calls).
Virtual functions do not "traverse" the table; they just do a single fetch of a pointer from a location and call that address. It is as if you had a manual implementation of a pointer-to-function and used that for the call instead of a direct one.
So your work is only good for obfuscation, and it sabotages the cases where the compiler could issue a non-virtual direct call.
Using a pointer-to-member-function is probably even worse than a plain pointer-to-function (PTF): for a virtual function it will likely go through the same VMT (virtual method table) with a similar offset-based access, just with a variable offset instead of a fixed one.
Mostly because it doesn't work. Most modern CPUs are better at branch prediction and speculative execution than you think. However, I have yet to see a CPU that does speculative execution beyond a non-static branch.
Furthermore, on a modern CPU you are more likely to get a cache miss because a context switch happened just before the call and another program took over the cache than because of a v-table, and even that scenario is a very remote possibility.
Actually, some compilers may use thunks, which translate to ordinary function pointers themselves, so the compiler basically does for you what you are trying to do manually (and your manual version will probably confuse the hell out of people).
Also, because objects hold only a pointer to the virtual function table, the per-object space cost of virtual functions is O(1) (just the pointer). If you store function pointers within the class instead, the cost is O(N): your class now contains as many pointers as there are "virtual" functions. If there are many functions, you pay a toll for that: when fetching your object into the cache, you load all those pointers into the cache line instead of just a single pointer and the first few members you are likely to need. That sounds like a waste.
The virtual function table, on the other hand, sits in one place for all the objects of one type and is likely never pushed out of the cache while your code calls some short virtual functions in a loop (which is presumably the problem when virtual function cost would become the bottleneck).
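A rough way to see the space argument is to compare an object that relies on the language's vtable with one that stores a function pointer per operation, in the style of m_memfn_ptr_Foo. The types below are hypothetical. Exact sizes are ABI-dependent (with GCC or Clang on x86-64 the first is typically 16 bytes and the second 40), so the checks only assume the ordering, not the numbers.

```cpp
#include <cassert>
#include <cstddef>

// (a) Language virtual functions: one hidden vtable pointer per object,
//     no matter how many virtual functions the class declares.
struct WithVtable
{
    virtual int op1() const { return 1; }
    virtual int op2() const { return 2; }
    virtual int op3() const { return 3; }
    virtual int op4() const { return 4; }
    int data = 0;
};

// (b) Per-object dispatch pointers: every object carries one pointer per
//     "virtual" operation, so the object grows with the number of operations.
struct WithPointers
{
    using Op = int (*)(const WithPointers&);
    Op op1 = nullptr;
    Op op2 = nullptr;
    Op op3 = nullptr;
    Op op4 = nullptr;
    int data = 0;
};
```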
As to branch prediction, in some cases a simple decision tree over the object type, with inlined functions for each particular type, gives good performance (you then store type information instead of a pointer). This is not applicable to all types of problems and would mostly be a premature optimization.
As a rule of thumb, don't worry about language constructs just because they seem unfamiliar. Worry and optimize only after you have measured and identified where the bottleneck really is.
I came across an Alexandrescu tutorial about traits, and I have some reflections to share. This is the code:
// Example 6: Reference counting traits
//
template <class T>
class RefCountingTraits
{
public:
static void Refer(T* p)
{
p->IncRef(); // assume RefCounted interface
}
static void Unrefer(T* p)
{
p->DecRef(); // assume RefCounted interface
}
};
// assumes Widget (with AddReference/RemoveReference) is defined above
template <>
class RefCountingTraits<Widget>
{
public:
static void Refer(Widget* p)
{
p->AddReference(); // use Widget interface
}
static void Unrefer(Widget* p)
{
// use Widget interface
if (p->RemoveReference() == 0)
delete p;
}
};
How much overhead do we have in this case compared to a standard virtual member function? We are not accessing the object directly in this case either: we are still passing a pointer. Is the compiler able to optimize it in the same way?
At typical production optimisation levels (-O2 or /O2) you can expect all the code you've shown to be inlined and the bits without side effects optimised away. That leaves the actual calls to IncRef or AddReference, and the check for zero followed by the deletion.
If virtual functions had been used, and if the reference counting code is trivial (e.g. not thread safe), it might have been about an order of magnitude slower due to a dispatch table lookup and an out-of-line function call, but that will vary a bit with compiler, exact optimisation settings, CPU, calling conventions, etc.
As always, when you have to care, profile and experiment.
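As an illustration of why the traits version optimises so well, here is a self-contained sketch. Counted, Widget and the RefPtr smart pointer are hypothetical stand-ins (the tutorial only says to "assume RefCounted interface"). The point is that Traits::Refer and Traits::Unrefer are chosen at compile time, so there is no dispatch table lookup and the compiler is free to inline them.

```cpp
#include <cassert>

// Hypothetical type following the "RefCounted interface".
struct Counted {
    int refs = 0;
    void IncRef() { ++refs; }
    void DecRef() { --refs; }   // never deletes: may live on the stack
};

// Hypothetical Widget with its own reference-counting vocabulary.
struct Widget {
    int refs = 0;
    void AddReference() { ++refs; }
    int RemoveReference() { return --refs; }
};

template <class T>
struct RefCountingTraits {
    static void Refer(T* p) { p->IncRef(); }
    static void Unrefer(T* p) { p->DecRef(); }
};

template <>
struct RefCountingTraits<Widget> {
    static void Refer(Widget* p) { p->AddReference(); }
    static void Unrefer(Widget* p) {
        if (p->RemoveReference() == 0)
            delete p;
    }
};

// Minimal smart pointer: which Refer/Unrefer runs is decided by the
// template argument, so every call is a direct, inlinable call.
template <class T, class Traits = RefCountingTraits<T>>
class RefPtr {
public:
    explicit RefPtr(T* p) : p_(p) { Traits::Refer(p_); }
    RefPtr(const RefPtr& other) : p_(other.p_) { Traits::Refer(p_); }
    RefPtr& operator=(const RefPtr&) = delete;  // kept minimal on purpose
    ~RefPtr() { Traits::Unrefer(p_); }
    T* get() const { return p_; }
private:
    T* p_;
};
```

With RefPtr<Widget>, the Widget specialization deletes the object when its count reaches zero; with RefPtr<Counted>, the generic traits only adjust the count.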
I am working on an embedded platform which doesn't cope very well with dynamic code (no speculative / OOO execution at all).
On this platform I call a virtual member function on the same object quite often, however the compiler fails to optimize the vtable-lookup away, as it doesn't seem to recognize the lookup is only required for the first invocation.
Therefore I wonder: Is there a manual way to devirtualize a virtual member function of a C++ class in order to get a function-pointer which points directly to the resolved address?
I had a look at C++ function pointers, but since they seem to require the type to be specified, I guess this won't work out.
Thank you in advance
There's no general, standard-C++-only way to find the address of a virtual function given only a reference to a base class object. Furthermore, there's no reasonable type for it, because the this pointer need not be passed as an ordinary argument following a general convention (e.g. it can be passed in a register, with the other arguments on the stack).
If you do not need portability, however, you can always do whatever works for your given compiler. E.g., with Microsoft's COM (I know, that's not your platform) there is a known memory layout with vtable pointers, so that the functionality can be accessed from C.
If you do need portability, then I suggest designing the optimization in. For example, instead of
class Foo_base
{
public:
virtual void bar() = 0;
};
do like
class Foo_base
{
public:
typedef void (*Bar_func)(Foo_base&);
virtual Bar_func bar_func() const = 0;
void bar() { bar_func()( *this ); }
};
supporting the same public interface as before, but now exposing the innards, so to speak, thus allowing manual optimization of repeated calls to bar.
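A minimal sketch of how that interface might be used, assuming a hypothetical derived class Counter whose bar implementation is a static function that takes the object explicitly. The repeated calls in call_bar_n_times still go through a function pointer, but the virtual lookup happens only once, before the loop.

```cpp
#include <cassert>

class Foo_base
{
public:
    typedef void (*Bar_func)(Foo_base&);
    virtual ~Foo_base() = default;
    virtual Bar_func bar_func() const = 0;
    void bar() { bar_func()(*this); }
};

// Hypothetical concrete class: "bar" is a plain static function.
class Counter : public Foo_base
{
public:
    int count = 0;
    Bar_func bar_func() const override { return &Counter::bar_impl; }
private:
    static void bar_impl(Foo_base& self)
    {
        // Safe: bar_impl is only handed out by Counter::bar_func().
        ++static_cast<Counter&>(self).count;
    }
};

// The manual optimization: one virtual call to fetch the function pointer,
// then n calls through that pointer with no per-call vtable lookup.
void call_bar_n_times(Foo_base& obj, int n)
{
    Foo_base::Bar_func f = obj.bar_func();
    for (int i = 0; i < n; ++i)
        f(obj);
}
```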
Regarding gcc, I have seen the following while debugging the compiled assembly code.
I have seen that a generic method pointer holds two pieces of data:
a) a "pointer" to the method
b) an offset to add to the class instance's starting address (the offset is used when multiple inheritance is involved, for methods of the second and further parent classes, whose data starts at a different position within the object).
The "pointer" to the method is as follows:
1) If the "pointer" is even, it is interpreted as a normal (non-virtual) function pointer.
2) If the "pointer" is odd, then 1 should be subtracted, and the remaining value is the byte offset of the slot in the vtable: 0, 4, 8, 12, ... (assuming a pointer size of 4 bytes).
This encoding obviously assumes that all normal methods start at even addresses (so the compiler must align them at even addresses).
That offset is the offset into the vtable from which to fetch the address of the "real" non-virtual method.
So the idea for devirtualizing the call is to convert a virtual method pointer into a non-virtual method pointer and then apply it to the "subject", that is, our class instance.
The code below does what is described.
#include <stdio.h>
#include <string.h>
#include <typeinfo>
#include <typeindex>
#include <cstdint>
struct Animal{
int weight=0x11111111;
virtual int mm(){printf("Animal1 mm\n");return 0x77;};
virtual int nn(){printf("Animal1 nn\n");return 0x99;};
};
struct Tiger:Animal{
int weight=0x22222222,height=0x33333333;
virtual int mm(){printf("Tigerxx\n");return 0xCC;}
virtual int nn(){printf("Tigerxx\n");return 0x99;};
};
typedef int (Animal::*methodPointerT)();
typedef struct {
void** functionPtr;
size_t offset;
} MP;
void devirtualize(methodPointerT& mp0,const Animal& a){
MP& t=*(MP*)&mp0;
if((intptr_t)t.functionPtr & 1){
size_t index=(t.functionPtr-(void**)1); // there is obviously a more
void** vTable=(void**)(*(void**)&a); // efficient way. Just for clarity!
t.functionPtr=(void**)vTable[index];
}
};
int main()
{
int (Animal::*mp1)()=&Animal::nn;
MP& mp1MP=*(MP*)&mp1;
Animal x;Tiger y;
(x.*mp1)();(y.*mp1)();
devirtualize(mp1,x);
(x.*mp1)();(y.*mp1)();
}
Yes, this is possible in a way that works at least with MSVC, GCC and Clang.
I was also looking for how to do this, and here is a blog post I found that explains it in detail: https://medium.com/#calebleak/fast-virtual-functions-hacking-the-vtable-for-fun-and-profit-25c36409c5e0
Taking the code from there, in short, this is what you need to do. This function works for all objects:
template <typename T>
void** GetVTable(T* obj) {
return *((void***)obj);
}
And then to get a direct function pointer to the first virtual function of the class, you do this:
typedef void(VoidMemberFn)(void*);
VoidMemberFn* fn = (VoidMemberFn*)GetVTable<BaseType>(my_obj_ptr)[0];
// ... sometime later
fn(my_obj_ptr);
So it's quite easy actually.
When implementing polymorphic behavior in C++ one can either use a pure virtual method or one can use function pointers (or functors). For example an asynchronous callback can be implemented by:
Approach 1
class Callback
{
public:
Callback();
~Callback();
void go();
protected:
virtual void doGo() = 0;
};
//Constructor and Destructor
void Callback::go()
{
doGo();
}
So to use the callback here, you would need to override the doGo() method to call whatever function you want.
Approach 2
typedef void CallbackFunction(void*);
class Callback
{
public:
Callback(CallbackFunction* func, void* param);
~Callback();
void go();
private:
CallbackFunction* iFunc;
void* iParam;
};
Callback::Callback(CallbackFunction* func, void* param) :
iFunc(func),
iParam(param)
{}
//Destructor
void Callback::go()
{
(*iFunc)(iParam);
}
To use the callback method here you will need to create a function pointer to be called by the Callback object.
Approach 3
[This was added to the question by me (Andreas); it wasn't written by the original poster]
#include <iostream>
template <typename T>
class Callback
{
public:
Callback() {}
~Callback() {}
void go() {
T t; t();
}
};
class CallbackTest
{
public:
void operator()() { std::cout << "Test"; }
};
int main()
{
Callback<CallbackTest> test;
test.go();
}
What are the advantages and disadvantages of each implementation?
Approach 1 (Virtual Function)
"+" The "correct" way to do it in C++
"-" A new class must be created per callback
"-" Performance-wise an additional dereference through VF-Table compared to Function Pointer. Two indirect references compared to Functor solution.
Approach 2 (Class with Function Pointer)
"+" Can wrap a C-style function for C++ Callback Class
"+" Callback function can be changed after callback object is created
"-" Requires an indirect call. May be slower than functor method for callbacks that can be statically computed at compile-time.
Approach 3 (Class calling T functor)
"+" Possibly the fastest way to do it. No indirect call overhead and may be inlined completely.
"-" Requires an additional Functor class to be defined.
"-" Requires that callback is statically declared at compile-time.
FWIW, Function Pointers are not the same as Functors. Functors (in C++) are classes that are used to provide a function call which is typically operator().
Here is an example functor as well as a template function which utilizes a functor argument:
#include <cstdio>
class TFunctor
{
public:
void operator()(const char *charstring)
{
printf("%s", charstring);
}
};
template<class T> void CallFunctor(T& functor_arg,const char *charstring)
{
functor_arg(charstring);
}
int main()
{
TFunctor foo;
CallFunctor(foo,"hello world\n");
}
From a performance perspective, virtual functions and function pointers both result in an indirect function call (i.e. through a register), although virtual functions require an additional load of the VFTABLE pointer before loading the function pointer. Using functors (with a non-virtual call) as callback parameters to template functions is the highest-performing method, because they can be inlined, and even if they are not inlined, they do not generate an indirect call.
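To compare the three approaches side by side, here is a compact sketch in which each one increments a counter. The class names are hypothetical. Approach 3 is shown as a variation that stores a functor instance (so it can carry state) rather than default-constructing T inside go() as in the question.

```cpp
#include <cassert>

// Approach 1: virtual function, dispatched through the vtable.
class VirtualCallback {
public:
    virtual ~VirtualCallback() = default;
    void go() { doGo(); }
protected:
    virtual void doGo() = 0;
};

class IncrementVirtual : public VirtualCallback {
public:
    explicit IncrementVirtual(int& target) : target_(target) {}
protected:
    void doGo() override { ++target_; }
private:
    int& target_;
};

// Approach 2: C-style function pointer plus an untyped parameter.
typedef void CallbackFunction(void*);

class PointerCallback {
public:
    PointerCallback(CallbackFunction* func, void* param) : func_(func), param_(param) {}
    void go() { func_(param_); }   // indirect call through func_
private:
    CallbackFunction* func_;
    void* param_;                  // no type checking on what is passed
};

inline void increment_int(void* p) { ++*static_cast<int*>(p); }

// Approach 3: functor type fixed at compile time; the call can be inlined.
template <typename F>
class FunctorCallback {
public:
    explicit FunctorCallback(F f) : f_(f) {}
    void go() { f_(); }            // direct call to F::operator()
private:
    F f_;
};

struct IncrementFunctor {
    int* target;
    void operator()() const { ++*target; }
};
```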
Approach 1
Easier to read and understand
Less possibility of errors (iFunc cannot be NULL, you're not using a void *iParam, etc.)
C++ programmers will tell you that this is the "right" way to do it in C++
Approach 2
Slightly less typing to do
VERY slightly faster (calling a virtual method has some overhead, usually about the same as two simple arithmetic operations, so it most likely won't matter)
That's how you would do it in C
Approach 3
Probably the best way to do it when possible. It will have the best performance, it will be type safe, and it's easy to understand (it's the method used by the STL).
The primary problem with Approach 2 is that it simply doesn't scale. Consider the equivalent for 100 functions:
class MahClass {
// 100 pointers of various types
public:
MahClass() { /* set all 100 pointers */ }
MahClass(const MahClass& other) {
// copy all 100 function pointers
}
};
The size of MahClass has ballooned, and the time to construct it has also increased significantly. Virtual functions, however, add only an O(1) increase to the size of the class and the time to construct it. Not to mention that you, the user, must manually write all the callbacks for all the derived classes, adjusting the pointer to become a pointer to derived, and must spell out the function pointer types. What a mess. And then there is the chance that you forget one, or set it to NULL, or something equally stupid but totally going to happen, because you're writing 30 classes this way and violating DRY like a parasitic wasp violates a caterpillar.
Approach 3 is only usable when the desired callback is statically knowable.
This leaves Approach 1 as the only usable approach when dynamic method invocation is required.
It's not clear from your example whether you're creating a utility class or not. Is your Callback class intended to implement a closure, or a more substantial object that you just didn't flesh out?
The first form:
Is easier to read and understand,
Is far easier to extend: try adding methods pause, resume and stop.
Is better at handling encapsulation (presuming doGo is defined in the class).
Is probably a better abstraction, so easier to maintain.
The second form:
Can be used with different methods for doGo, so it's more than just polymorphic.
Could allow (with additional methods) changing the doGo method at run-time, allowing the instances of the object to mutate their functionality after creation.
Ultimately, IMO, the first form is better for all normal cases. The second has some interesting capabilities, though -- but not ones you'll need often.
One major advantage of the first method is it has more type safety. The second method uses a void * for iParam so the compiler will not be able to diagnose type problems.
A minor advantage of the second method is that it would be less work to integrate with C. But if your code base is only C++, this advantage is moot.
Function pointers are more C-style I would say. Mainly because in order to use them you usually must define a flat function with the same exact signature as your pointer definition.
When I write C++, the only flat function I write is int main(). Everything else is a class object. Of the two choices, I would choose to define a class and override your virtual, but if all you want is to notify some code that some action happened in your class, neither of these choices would be the best solution.
I am unaware of your exact situation, but you might want to peruse design patterns.
I would suggest the observer pattern. It is what I use when I need to monitor a class or wait for some sort of notification.
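The observer pattern suggested above might look roughly like this. It is a minimal sketch with hypothetical names (Subject, Observer, EventLog); observers are held as non-owning raw pointers, so they must stay alive for as long as the subject may notify them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Anything that wants to hear about events implements this interface.
class Observer {
public:
    virtual ~Observer() = default;
    virtual void notify(const std::string& event) = 0;
};

// The monitored class keeps a list of observers and tells each of them
// when "some action happened".
class Subject {
public:
    void attach(Observer* o) { observers_.push_back(o); }
    void emit(const std::string& event)
    {
        for (Observer* o : observers_)
            o->notify(event);
    }
private:
    std::vector<Observer*> observers_;  // not owned
};

// Example observer that simply records what it was told.
class EventLog : public Observer {
public:
    void notify(const std::string& event) override { events.push_back(event); }
    std::vector<std::string> events;
};
```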
For example, let us look at an interface for adding read functionality to a class:
#include <iostream>
#include <limits>
#include <string>
struct Read_Via_Inheritance
{
virtual void read_members(void) = 0;
};
Any time I want to add another source of reading, I have to inherit from the class and add a specific method:
struct Read_Inherited_From_Cin
: public Read_Via_Inheritance
{
void read_members(void)
{
std::cin >> member;
}
unsigned int member;
};
If I want to read from a file, a database, or USB, this requires 3 more separate classes. The combinations start to become very ugly with multiple objects and multiple sources.
If I use a functor, which happens to resemble the Visitor design pattern:
struct Reader_Visitor_Interface
{
virtual void read(unsigned int& member) = 0;
virtual void read(std::string& member) = 0;
};
struct Read_Client
{
void read_members(Reader_Visitor_Interface & reader)
{
reader.read(x);
reader.read(text);
return;
}
unsigned int x;
std::string text;
};
With the above foundation, objects can read from different sources just by supplying different readers to the read_members method:
struct Read_From_Cin
: Reader_Visitor_Interface
{
void read(unsigned int& value)
{
std::cin >> value;
// skip the rest of the line so a following getline() reads the next line
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
void read(std::string& value)
{
std::getline(std::cin, value);
}
};
I don't have to change any of the object's code (a good thing because it is already working). I can also apply the reader to other objects.
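Because the readers only need a stream, one possible generalisation is a reader over any std::istream, which also makes the visitor easy to exercise without console input. Read_From_Stream is a hypothetical name introduced for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

struct Reader_Visitor_Interface
{
    virtual ~Reader_Visitor_Interface() = default;
    virtual void read(unsigned int& member) = 0;
    virtual void read(std::string& member) = 0;
};

struct Read_Client
{
    // The client only describes *what* to read; the reader decides *where from*.
    void read_members(Reader_Visitor_Interface& reader)
    {
        reader.read(x);
        reader.read(text);
    }
    unsigned int x = 0;
    std::string text;
};

// Hypothetical reader over any std::istream (std::cin, a file, a string...).
struct Read_From_Stream : Reader_Visitor_Interface
{
    explicit Read_From_Stream(std::istream& in) : in_(in) {}
    void read(unsigned int& value) override
    {
        in_ >> value;
        // drop the rest of the line so the next getline() starts fresh
        in_.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    void read(std::string& value) override { std::getline(in_, value); }
private:
    std::istream& in_;
};
```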
Generally, I use inheritance when I am performing generic programming. For example, if I have a Field class, then I can create Field_Boolean, Field_Text and Field_Integer. I can put pointers to their instances into a vector<Field *> and call it a record. The record can perform generic operations on the fields, and doesn't care or know what kind of field is processed.
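The Field example can be sketched as follows. The to_text() operation and the join_record() function are hypothetical stand-ins for "generic operations", and std::unique_ptr is used instead of a raw vector<Field *> so that the record owns and deletes its fields.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Field
{
    virtual ~Field() = default;
    virtual std::string to_text() const = 0;
};

struct Field_Boolean : Field
{
    explicit Field_Boolean(bool v) : value(v) {}
    std::string to_text() const override { return value ? "true" : "false"; }
    bool value;
};

struct Field_Text : Field
{
    explicit Field_Text(std::string v) : value(std::move(v)) {}
    std::string to_text() const override { return value; }
    std::string value;
};

struct Field_Integer : Field
{
    explicit Field_Integer(int v) : value(v) {}
    std::string to_text() const override { return std::to_string(value); }
    int value;
};

// A record is just a sequence of fields of mixed concrete types.
using Record = std::vector<std::unique_ptr<Field>>;

// Generic operation: it neither knows nor cares which field type it sees.
std::string join_record(const Record& record)
{
    std::string out;
    for (const auto& field : record)
    {
        if (!out.empty())
            out += ',';
        out += field->to_text();
    }
    return out;
}
```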
Change to pure virtual, first off. Then inline it. That should negate any method call overhead at all, as long as inlining doesn't fail (and it won't if you force it).
You may as well use C, because this is the only really useful major feature of C++ compared to C. You will always pay for the method call, and it can't be inlined, so it will be less efficient.
Having at least one virtual method in a C++ class (or any of its parent classes) means that the class will have a virtual table, and every instance will have a virtual pointer.
So the memory cost is quite clear. The most important part is the memory cost on the instances (especially if the instances are small, for example if they are just meant to contain an integer: in this case having a virtual pointer in every instance might double the size of the instances). As for the memory space used up by the virtual tables, I guess it is usually negligible compared to the space used up by the actual method code.
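The per-instance cost is easy to observe with sizeof. The two hypothetical types below hold the same int; the only difference is the virtual method. The exact sizes are implementation-defined (with GCC or Clang on x86-64 they are typically 4 and 16 bytes: an 8-byte vptr, the int, and alignment padding), so the check only assumes that the polymorphic type is larger.

```cpp
#include <cassert>

// Plain wrapper around an int: no hidden members.
struct PlainInt
{
    int value;
    int get() const { return value; }
};

// Same data, but a virtual method adds a hidden vtable pointer per instance.
struct VirtualInt
{
    virtual ~VirtualInt() = default;
    int value;
    virtual int get() const { return value; }
};
```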
This brings me to my question: is there a measurable performance cost (i.e. speed impact) for making a method virtual? There will be a lookup in the virtual table at runtime, upon every method call, so if there are very frequent calls to this method, and if this method is very short, then there might be a measurable performance hit? I guess it depends on the platform, but has anyone run some benchmarks?
The reason I am asking is that I came across a bug that happened to be due to a programmer forgetting to define a method virtual. This is not the first time I see this kind of mistake. And I thought: why do we add the virtual keyword when needed instead of removing the virtual keyword when we are absolutely sure that it is not needed? If the performance cost is low, I think I will simply recommend the following in my team: simply make every method virtual by default, including the destructor, in every class, and only remove it when you need to. Does that sound crazy to you?
I ran some timings on a 3 GHz in-order PowerPC processor. On that architecture, a virtual function call costs 7 nanoseconds longer than a direct (non-virtual) function call.
So, not really worth worrying about the cost unless the function is something like a trivial Get()/Set() accessor, in which anything other than inline is kind of wasteful. A 7ns overhead on a function that inlines to 0.5ns is severe; a 7ns overhead on a function that takes 500ms to execute is meaningless.
The big cost of virtual functions isn't really the lookup of a function pointer in the vtable (that's usually just a single cycle), but that the indirect jump usually cannot be branch-predicted. This can cause a large pipeline bubble as the processor cannot fetch any instructions until the indirect jump (the call through the function pointer) has retired and a new instruction pointer computed. So, the cost of a virtual function call is much bigger than it might seem from looking at the assembly... but still only 7 nanoseconds.
Edit: Andrew, Not Sure, and others also raise the very good point that a virtual function call may cause an instruction cache miss: if you jump to a code address that is not in cache then the whole program comes to a dead halt while the instructions are fetched from main memory. This is always a significant stall: on Xenon, about 650 cycles (by my tests).
However this isn't a problem specific to virtual functions because even a direct function call will cause a miss if you jump to instructions that aren't in cache. What matters is whether the function has been run before recently (making it more likely to be in cache), and whether your architecture can predict static (not virtual) branches and fetch those instructions into cache ahead of time. My PPC does not, but maybe Intel's most recent hardware does.
My timings control for the influence of icache misses on execution (deliberately, since I was trying to examine the CPU pipeline in isolation), so they discount that cost.
There is definitely measurable overhead when calling a virtual function: the call must use the vtable to resolve the address of the function for that type of object. The extra instructions are the least of your worries. Not only do vtables prevent many potential compiler optimizations (since the type is polymorphic, the compiler cannot know which function will be called), they can also thrash your I-cache.
Of course whether these penalties are significant or not depends on your application, how often those code paths are executed, and your inheritance patterns.
In my opinion though, having everything as virtual by default is a blanket solution to a problem you could solve in other ways.
Perhaps you could look at how classes are designed/documented/written. Generally the header for a class should make quite clear which functions can be overridden by derived classes and how they are called. Having programmers write this documentation is helpful in ensuring they are marked correctly as virtual.
I would also say that declaring every function as virtual could lead to more bugs than just forgetting to mark something as virtual. If all functions are virtual, everything can be replaced by derived classes: public, protected, private, everything becomes fair game. By accident or intention, subclasses could then change the behavior of functions in ways that cause problems when those functions are used by the base implementation.
It depends. :) (Had you expected anything else?)
Once a class gets a virtual function, it can no longer be a POD datatype, (it may not have been one before either, in which case this won't make a difference) and that makes a whole range of optimizations impossible.
std::copy() on plain POD types can resort to a simple memcpy routine, but non-POD types have to be handled more carefully.
Construction becomes a lot slower because the vtable pointer has to be initialized in every object. In the worst case, the difference in performance between POD and non-POD datatypes can be significant.
In the worst case, you may see 5x slower execution (that number is taken from a university project I did recently to reimplement a few standard library classes. Our container took roughly 5x as long to construct as soon as the data type it stored got a vtable)
Of course, in most cases, you're unlikely to see any measurable performance difference, this is simply to point out that in some border cases, it can be costly.
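The std::copy() point can be checked with type traits: a class with a virtual function is never trivially copyable, so a bulk memcpy is off the table for it. The sketch below is a hypothetical copy helper that picks memcpy or element-wise assignment by tag dispatch on std::is_trivially_copyable; standard library implementations make a similar choice internally, though not necessarily in this way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct PodPoint { int x, y; };

struct VirtualPoint
{
    virtual ~VirtualPoint() = default;
    int x = 0, y = 0;
};

// Trivially copyable: one bulk byte copy is allowed.
template <typename T>
void copy_array_impl(T* dst, const T* src, std::size_t n, std::true_type)
{
    std::memcpy(dst, src, n * sizeof(T));
}

// Not trivially copyable: must copy element by element via operator=.
template <typename T>
void copy_array_impl(T* dst, const T* src, std::size_t n, std::false_type)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

template <typename T>
void copy_array(T* dst, const T* src, std::size_t n)
{
    copy_array_impl(dst, src, n, std::is_trivially_copyable<T>{});
}
```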
However, performance shouldn't be your primary consideration here.
Making everything virtual is not a perfect solution for other reasons.
Allowing everything to be overridden in derived classes makes it much harder to maintain class invariants. How does a class guarantee that it stays in a consistent state when any one of its methods could be redefined at any time?
Making everything virtual may eliminate a few potential bugs, but it also introduces new ones.
If you need the functionality of virtual dispatch, you have to pay the price. The advantage of C++ is that you can use a very efficient implementation of virtual dispatch provided by the compiler, rather than a possibly inefficient version you implement yourself.
However, lumbering yourself with the overhead if you don't need it is possibly going a bit too far. And most classes are not designed to be inherited from: creating a good base class requires more than making its functions virtual.
Virtual dispatch is an order of magnitude slower than some alternatives - not due to indirection so much as the prevention of inlining. Below, I illustrate that by contrasting virtual dispatch with an implementation embedding a "type(-identifying) number" in the objects and using a switch statement to select the type-specific code. This avoids function call overhead completely - just doing a local jump. There is a potential cost to maintainability, recompilation dependencies etc through the forced localisation (in the switch) of the type-specific functionality.
IMPLEMENTATION
#include <iostream>
#include <time.h>
#include <vector>
// virtual dispatch model...
struct Base
{
virtual int f() const { return 1; }
};
struct Derived : Base
{
virtual int f() const { return 2; }
};
// alternative: member variable encodes runtime type...
struct Type
{
Type(int type) : type_(type) { }
int type_;
};
struct A : Type
{
A() : Type(1) { }
int f() const { return 1; }
};
struct B : Type
{
B() : Type(2) { }
int f() const { return 2; }
};
struct Timer
{
Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
struct timespec from;
double elapsed() const
{
struct timespec to;
clock_gettime(CLOCK_MONOTONIC, &to);
return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
}
};
int main()
{
for (int j = 0; j < 3; ++j)
{
typedef std::vector<Base*> V;
V v;
for (int i = 0; i < 1000; ++i)
v.push_back(i % 2 ? new Base : (Base*)new Derived);
int total = 0;
Timer tv;
for (int i = 0; i < 100000; ++i)
for (V::const_iterator i = v.begin(); i != v.end(); ++i)
total += (*i)->f();
double tve = tv.elapsed();
std::cout << "virtual dispatch: " << total << ' ' << tve << '\n';
// ----------------------------
typedef std::vector<Type*> W;
W w;
for (int i = 0; i < 1000; ++i)
w.push_back(i % 2 ? (Type*)new A : (Type*)new B);
total = 0;
Timer tw;
for (int i = 0; i < 100000; ++i)
for (W::const_iterator i = w.begin(); i != w.end(); ++i)
{
if ((*i)->type_ == 1)
total += ((A*)(*i))->f();
else
total += ((B*)(*i))->f();
}
double twe = tw.elapsed();
std::cout << "switched: " << total << ' ' << twe << '\n';
// ----------------------------
total = 0;
Timer tw2;
for (int i = 0; i < 100000; ++i)
for (W::const_iterator i = w.begin(); i != w.end(); ++i)
total += (*i)->type_;
double tw2e = tw2.elapsed();
std::cout << "overhead: " << total << ' ' << tw2e << '\n';
}
}
PERFORMANCE RESULTS
On my Linux system:
~/dev g++ -O2 -o vdt vdt.cc -lrt
~/dev ./vdt
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726
This suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast. Of course, that's specific to the exact system tested / compiler flags & version etc., but generally indicative.
COMMENTS RE VIRTUAL DISPATCH
It must be said, though, that virtual function call overheads are rarely significant, and then only for oft-called trivial functions (like getters and setters). Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost. People worry about virtual dispatch way too much, so do profile before looking for awkward alternatives. The main issue with virtual functions is that they perform an out-of-line function call, though they also delocalise the executed code, which changes cache utilisation patterns (for better or, more often, worse).
The extra cost is virtually nothing in most scenarios. (pardon the pun). ejac has already posted sensible relative measures.
The biggest thing you give up is possible optimizations due to inlining. They can be especially good if the function is called with constant parameters. This rarely makes a real difference, but in a few cases, this can be huge.
Regarding optimizations:
It is important to know and consider the relative cost of your language's constructs. Big O notation is only half of the story: it tells you how your application scales. The other half is the constant factor in front of it.
As a rule of thumb, I wouldn't go out of my way to avoid virtual functions unless there are clear and specific indications that they are a bottleneck. A clean design always comes first, but it is only one stakeholder, and it should not unduly hurt the others.
Contrived Example: An empty virtual destructor on an array of one million small elements may plow through at least 4MB of data, thrashing your cache. If that destructor can be inlined away, the data won't be touched.
When writing library code, such considerations are far from premature. You never know how many loops will be put around your function.
While everyone else is correct about the performance of virtual methods and such, I think the real problem is whether the team knows about the definition of the virtual keyword in C++.
Consider this code. What is the output?
#include <stdio.h>
class A
{
public:
void Foo()
{
printf("A::Foo()\n");
}
};
class B : public A
{
public:
void Foo()
{
printf("B::Foo()\n");
}
};
int main(int argc, char** argv)
{
A* a = new A();
a->Foo();
B* b = new B();
b->Foo();
A* a2 = new B();
a2->Foo();
return 0;
}
Nothing surprising here:
A::Foo()
B::Foo()
A::Foo()
This is because nothing is virtual. If the virtual keyword is added in front of Foo in both classes A and B, we get this output:
A::Foo()
B::Foo()
B::Foo()
Pretty much what everyone expects.
Now, you mentioned that there are bugs because someone forgot to add the virtual keyword. So consider this code, where the virtual keyword is added to class A but not to class B. What is the output then?
#include <stdio.h>
class A
{
public:
virtual void Foo()
{
printf("A::Foo()\n");
}
};
class B : public A
{
public:
void Foo()
{
printf("B::Foo()\n");
}
};
int main(int argc, char** argv)
{
A* a = new A();
a->Foo();
B* b = new B();
b->Foo();
A* a2 = new B();
a2->Foo();
return 0;
}
Answer: the same as if the virtual keyword were added to B. The reason is that the signature of B::Foo() exactly matches that of A::Foo(), and because A's Foo is virtual, so is B's.
Now consider the case where B's Foo is virtual and A's is not. What is the output then? In this case, the output is:
A::Foo()
B::Foo()
A::Foo()
The virtual keyword works downwards in the hierarchy, not upwards: it never makes base class methods virtual. Polymorphism begins at the first class in the hierarchy that declares the method virtual, and there is no way for a later class to make an earlier class's methods virtual.
Don't forget that virtual methods mean that this class is giving future classes the ability to override/change some of its behaviors.
So if you have a rule to remove the virtual keyword, it may not have the intended effect.
The virtual keyword in C++ is a powerful concept. You should make sure each member of the team really knows this concept so that it can be used as designed.
Depending on your platform, the overhead of a virtual call can be very undesirable. By declaring every function virtual you're essentially calling them all through a function pointer. At the very least this is an extra dereference, but on some PPC platforms it will use microcoded or otherwise slow instructions to accomplish this.
I'd recommend against your suggestion for this reason, but if it helps you prevent bugs then it may be worth the trade off. I can't help but think that there must be some middle ground that is worth finding, though.
Calling a virtual method requires just a couple of extra assembly instructions.
But you probably don't worry that fun(int a, int b) needs a couple of extra 'push' instructions compared to fun(). So don't worry about virtual calls either, until you are in a special situation and see that they really lead to problems.
P.S. If you have a virtual method, make sure you also have a virtual destructor. Otherwise, deleting a derived object through a base-class pointer is undefined behavior.
In response to the comments by 'xtofl' and 'Tom', I did small tests with 3 functions:
Virtual
Normal
Normal with 3 int parameters
My test was a simple iteration:
for(int it = 0; it < 100000000; it ++) {
test.Method();
}
And here are the results:
Virtual: 3.913 sec
Normal: 3.873 sec
Normal with 3 int parameters: 3.970 sec
It was compiled by VC++ in debug mode. I did only 5 tests per method and computed the mean value (so the results may be pretty inaccurate). Anyway, the values are almost equal over 100 million calls, and the method with 3 extra push/pop was the slowest.
The main point is this: if you don't like the push/pop analogy, think of an extra if/else in your code. Do you think about the CPU pipeline when you add an extra if/else? ;-) Also, you never know which CPU the code will run on. A compiler can generate code that is more optimal for one CPU and less optimal for another (e.g. the Intel C++ Compiler).
Profiling my C++ code with gprof, I discovered that a significant portion of my time is spent calling one virtual method over and over. The method itself is short and could probably be inlined if it wasn't virtual.
What are some ways I could speed this up short of rewriting it all to not be virtual?
Are you sure the time is all call-related? Could the cost be in the function itself? If so, simply inlining things might make the function vanish from your profiler, but you won't see much speed-up.
Assuming it really is the overhead of making so many virtual calls, there's a limit to what you can do without making things non-virtual.
If the call has early-outs for things like time/flags then I'll often use a two-level approach. The checking is inlined with a non-virtual call, with the class-specific behavior only called if necessary.
E.g.
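class Foo
{
public:
virtual ~Foo() {}
// Non-virtual and inlinable: the cheap check costs no virtual call
inline void update( void )
{
if (can_early_out)
return;
updateImpl(); // virtual call only when there is real work to do
}
protected:
Foo() : can_early_out(false) {}
// hypothetical flag, e.g. "nothing changed since the last update"
bool can_early_out;
virtual void updateImpl( void ) = 0;
};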
If the virtual calling really is the bottleneck, give CRTP (the Curiously Recurring Template Pattern) a try.
Is the time being spent in the actual function call, or in the function itself?
A virtual function call is noticeably slower than a non-virtual call, because the virtual call requires an extra dereference. (Google for 'vtable' if you want to read all the hairy details.) (Update: It turns out the Wikipedia article isn't bad on this.)
"Noticeably" here, though, means a couple of instructions. If it's consuming a significant part of the total computation, including time spent in the called function, that sounds like a marvelous place to consider unvirtualizing and inlining.
But in something close to 20 years of C++, I don't think I've ever seen that really happen. I'd love to see the code.
Please be aware that "virtual" and "inline" are not opposites -- a method can be both. The compiler will happily inline a virtual function if it can determine the type of the object at compile time:
#include <cstdlib>
#include <iostream>
using std::cout;
using std::endl;
struct B {
virtual int f() { return 42; }
};
struct D : public B {
virtual int f() { return 43; }
};
int main(int argc, char **argv) {
B b;
cout << b.f() << endl; // This call will be inlined
D d;
cout << d.f() << endl; // This call will be inlined
B& rb = rand() ? b : d;
cout << rb.f() << endl; // Must use virtual dispatch (i.e. NOT inlined)
return 0;
}
[UPDATE: Made certain rb's true dynamic object type cannot be known at compile time -- thanks to MSalters]
If the type of the object can be determined at compile time but the function is not inlineable (e.g. it is large or is defined outside of the class definition), it will be called non-virtually.
It's sometimes instructive to consider how you'd write the code in good old 'C' if you didn't have C++'s syntactic sugar available. Sometimes the answer isn't using an indirect call. See this answer for an example.
You might be able get a little better performance from the virtual call by changing the calling convention. The old Borland compiler had a __fastcall convention which passed arguments in cpu registers instead of on the stack.
If you're stuck with the virtual call and those few operations really count, then check your compiler documentation for supported calling conventions.
Here is one possible way to do it using RTTI.