I read about function pointers in C, and everyone said they will make my program run slow.
Is that true?
I made a program to check it by measuring the time, and I got the same results in both cases.
So, is it bad to use function pointers?
Thanks in advance.
To respond to some of the comments: by 'run slow' I mean the time I compared on a loop like this:
int end = 1000;
int i = 0;
while (i < end) {
    fp = func;
    fp();
    i++;
}
When I execute this, I get the same time as when I execute this:
i = 0;
while (i < end) {
    func();
    i++;
}
So I think that a function pointer makes no difference in time, and it doesn't make a program run slow as many people said.
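For reference, a fuller version of that timing test might look like the sketch below (the volatile sink variable is an assumption added so the compiler cannot remove the loops entirely; a real compiler may still inline the direct call, which is exactly the effect the answers below discuss):

#include <chrono>
#include <cstdio>

volatile int sink;           // prevents the loops from being optimized away

void func() { sink = sink + 1; }

int main() {
    const int end = 100000000;
    void (*fp)() = func;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < end; ++i)
        func();              // direct call
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < end; ++i)
        fp();                // call through the pointer
    auto t2 = std::chrono::steady_clock::now();

    std::printf("direct:   %ld ms\n",
        (long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
    std::printf("indirect: %ld ms\n",
        (long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());
}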
You see, in situations that actually matter from the performance point of view, like calling the function repeatedly many times in a cycle, the performance might not be different at all.
This might sound strange to people, who are used to thinking about C code as something executed by an abstract C machine whose "machine language" closely mirrors the C language itself. In such context, "by default" an indirect call to a function is indeed slower than a direct one, because it formally involves an extra memory access in order to determine the target of the call.
However, in real life the code is executed by a real machine and compiled by an optimizing compiler that has a pretty good knowledge of the underlying machine architecture, which helps it to generate the most optimal code for that specific machine. And on many platforms it might turn out that the most efficient way to perform a function call from a cycle actually results in identical code for both direct and indirect call, leading to the identical performance of the two.
Consider, for example, the x86 platform. If we "literally" translate a direct and an indirect call into machine code, we might end up with something like this:
// Direct call
do-it-many-times
call 0x12345678
// Indirect call
do-it-many-times
call dword ptr [0x67890ABC]
The former uses an immediate operand in the machine instruction and is indeed normally faster than the latter, which has to read the data from some independent memory location.
At this point let's remember that x86 architecture actually has one more way to supply an operand to the call instruction. It is supplying the target address in a register. And a very important thing about this format is that it is normally faster than both of the above. What does this mean for us? This means that a good optimizing compiler must and will take advantage of that fact. In order to implement the above cycle, the compiler will try to use a call through a register in both cases. If it succeeds, the final code might look as follows
// Direct call
mov eax, 0x12345678
do-it-many-times
call eax
// Indirect call
mov eax, dword ptr [0x67890ABC]
do-it-many-times
call eax
Note that the part that matters - the actual call in the cycle body - is now exactly the same in both cases. Needless to say, the performance is going to be virtually identical.
One might even say, however strange it might sound, that on this platform a direct call (a call with an immediate operand in call) is slower than an indirect call as long as the operand of the indirect call is supplied in a register (as opposed to being stored in memory).
Of course, the whole thing is not as easy in the general case. The compiler has to deal with limited availability of registers, aliasing issues etc. But in such simplistic cases as the one in your example (and even in much more complicated ones) the above optimization will be carried out by a good compiler and will completely eliminate any difference in performance between a cyclic direct call and a cyclic indirect call. This optimization works especially well in C++ when calling a virtual function, since in a typical implementation the pointers involved are fully controlled by the compiler, giving it full knowledge of the aliasing picture and other relevant stuff.
Of course, there's always a question of whether your compiler is smart enough to optimize things like that...
I think when people say this they're referring to the fact that using function pointers may prevent compiler optimizations (inlining) and processor optimizations (branch prediction). However, if function pointers are an effective way to accomplish something that you're trying to do, chances are that any other method of doing it would have the same drawbacks.
And unless your function pointers are being used in tight loops in a performance critical application or on a very slow embedded system, chances are the difference is negligible anyway.
And everyone said they will make my program run slow. Is that true?
Most likely this claim is false. For one, if the alternative to using function pointers is something like
if (condition1) {
    func1();
} else if (condition2) {
    func2();
} else if (condition3) {
    func3();
} else {
    func4();
}
then this is most likely much slower than just calling through a single function pointer. While calling a function through a pointer does have some (typically negligible) overhead, it is normally not the direct-call versus through-pointer-call difference that is relevant to compare.
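For contrast, a sketch of the function-pointer alternative to a chain like that might look as follows (the handler names and the integer index are made up for illustration; it assumes the conditions can be reduced to an index):

#include <cstdio>

void func1() { std::puts("1"); }
void func2() { std::puts("2"); }
void func3() { std::puts("3"); }
void func4() { std::puts("4"); }

// table indexed by whatever the conditions above boil down to
void (*handlers[])() = { func1, func2, func3, func4 };

void dispatch(int which) {
    handlers[which]();   // one indirect call instead of a cascade of branches
}

int main() {
    dispatch(2);
}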
And secondly, never optimize for performance without measurements. Knowing where the bottlenecks are is very difficult (read: impossible) without measuring, and the answers can be quite non-intuitive (for instance, the Linux kernel developers have started removing the inline keyword from functions because it actually hurt performance).
A lot of people have put in some good answers, but I still think there's a point being missed. Function pointers do add an extra dereference, which makes them several cycles slower, and that number can increase with poor branch prediction (which incidentally has almost nothing to do with the function pointer itself). Additionally, functions called via a pointer cannot be inlined. But what people are missing is that most people use function pointers as an optimization.
The most common place you will find function pointers in C/C++ APIs is as callback functions. The reason so many APIs do this is that writing a system which invokes a function pointer whenever events occur is much more efficient than other methods like message passing. Personally I've also used function pointers as part of a more-complex input processing system, where each key on the keyboard has a function pointer mapped to it via a jump table. This allowed me to remove any branching or logic from the input system and merely handle the key press coming in.
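A rough sketch of that keyboard idea (all names here are hypothetical, not from the original code): each key code indexes straight into a table of handlers, so the input loop itself contains no branching on the key value.

#include <cstdio>

typedef void (*KeyHandler)();

void onEscape()  { std::puts("escape"); }
void onSpace()   { std::puts("space"); }
void onNothing() {}                      // default for unbound keys

KeyHandler keyTable[256];

void initKeyTable() {
    for (int i = 0; i < 256; ++i) keyTable[i] = onNothing;
    keyTable[27]  = onEscape;            // ESC
    keyTable[' '] = onSpace;
}

void handleKey(unsigned char key) {
    keyTable[key]();                     // straight table lookup, no if/switch chain
}

int main() {
    initKeyTable();
    handleKey(27);
    handleKey(' ');
}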
Calling a function via a function pointer is somewhat slower than a static function call, since the former call includes an extra pointer dereferencing. But AFAIK this difference is negligible on most modern machines (except maybe some special platforms with very limited resources).
Function pointers are used because they can make the program much simpler, cleaner and easier to maintain (when used properly, of course). This more than makes up for the possible very minor speed difference.
A lot of good points in earlier replies.
However, take a look at the C qsort comparison function. Because the comparison function cannot be inlined and needs to follow standard stack-based calling conventions, the total running time for the sort can be an order of magnitude (more exactly 3-10x) slower for integer keys than otherwise identical code with a direct, inlineable call.
A typical inlined comparison would be a sequence of simple CMP and possibly CMOV/SET instructions. A function call instead incurs the overhead of a CALL, setting up the stack frame, doing the comparison, tearing down the stack frame and returning the result. Note that the stack operations can cause pipeline stalls due to the CPU pipeline length and virtual registers: for example, if the value of eax is needed before the instruction that last modified eax has finished executing (which typically takes about 12 clock cycles on the newest processors), and the CPU cannot find other instructions to execute out of order in the meantime, a pipeline stall will occur.
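To make the comparison concrete, here is a minimal sketch of the two styles being contrasted: qsort must reach the comparator through a function pointer on every comparison, while std::sort sees the comparison and can usually inline it (the container sizes and names are arbitrary, and the actual speedup has to be measured):

#include <algorithm>
#include <cstdlib>
#include <vector>

int cmpInt(const void *a, const void *b) {
    int lhs = *static_cast<const int *>(a);
    int rhs = *static_cast<const int *>(b);
    return (lhs > rhs) - (lhs < rhs);
}

int main() {
    std::vector<int> v1(1000000, 0), v2(1000000, 0);

    // every comparison goes through the function pointer, so cmpInt cannot be inlined
    std::qsort(v1.data(), v1.size(), sizeof(int), cmpInt);

    // the comparison is visible to the compiler and can be inlined into the sort
    std::sort(v2.begin(), v2.end(), [](int a, int b) { return a < b; });
}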
Using a function pointer is slower than just calling a function, as it is another layer of indirection (the pointer needs to be dereferenced to get the address of the function). While it is slower, compared to everything else your program may do (read a file, write to the console) it is negligible.
If you need to use function pointers, use them, because anything that tries to do the same thing while avoiding them will be slower and less maintainable than using function pointers.
Possibly.
The answer depends on what the function pointer is being used for and hence what the alternatives are. Comparing function pointer calls to direct function calls is misleading if a function pointer is being used to implement a choice that's part of our program logic and which can't simply be removed. I'll go ahead and nonetheless show that comparison and come back to this thought afterwards.
Function pointer calls have the most opportunity to degrade performance compared to direct function calls when they inhibit inlining. Because inlining is a gateway optimization, we can craft wildly pathological cases where function pointers are made arbitrarily slower than the equivalent direct function call:
void foo(int* x) {
*x = 0;
}
void (*foo_ptr)(int*) = foo;
int call_foo(int *p, int size) {
int r = 0;
for (int i = 0; i != size; ++i)
r += p[i];
foo(&r);
return r;
}
int call_foo_ptr(int *p, int size) {
int r = 0;
for (int i = 0; i != size; ++i)
r += p[i];
foo_ptr(&r);
return r;
}
Code generated for call_foo():
call_foo(int*, int):
xor eax, eax
ret
Nice. foo() has not only been inlined, but doing so has allowed the compiler to eliminate the entire preceding loop! The generated code simply zeroes out the return register by XORing the register with itself and then returns. On the other hand, compilers will have to generate code for the loop in call_foo_ptr() (100+ lines with gcc 7.3) and most of that code effectively does nothing (so long as foo_ptr still points to foo()). (In more typical scenarios, you can expect that inlining a small function into a hot inner loop might reduce execution time by up to about an order of magnitude.)
So in a worst case scenario, a function pointer call is arbitrarily slower than a direct function call, but this is misleading. It turns out that if foo_ptr had been const, then call_foo() and call_foo_ptr() would have generated the same code. However, this would require us to give up the opportunity for indirection provided by foo_ptr. Is it "fair" for foo_ptr to be const? If we're interested in the indirection provided by foo_ptr, then no, but if that's the case, then a direct function call is not a valid option either.
If a function pointer is being used to provide useful indirection, then we can move the indirection around or in some cases swap out function pointers for conditionals or even macros, but we can't simply remove it. If we've decided that function pointers are a good approach but performance is a concern, then we typically want to pull indirection up the call stack so that we pay the cost of indirection in an outer loop. For example, in the common case where a function takes a callback and calls it in a loop, we might try moving the innermost loop into the callback (and changing the responsibility of each callback invocation accordingly).
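A minimal sketch of that "pull the indirection up" idea (the names are illustrative): instead of paying an indirect call per element, the callback receives the whole range and the indirect call is paid once per batch.

#include <cstddef>

typedef void (*PerElementFn)(int &);
typedef void (*PerRangeFn)(int *, std::size_t);

// the indirect call sits inside the hot loop
void forEach(int *data, std::size_t n, PerElementFn f) {
    for (std::size_t i = 0; i < n; ++i)
        f(data[i]);
}

// the indirect call is paid once; the callback owns the loop
void forRange(int *data, std::size_t n, PerRangeFn f) {
    f(data, n);
}

void zeroOne(int &x) { x = 0; }
void zeroAll(int *p, std::size_t n) { for (std::size_t i = 0; i < n; ++i) p[i] = 0; }

int main() {
    int data[256] = {};
    forEach(data, 256, zeroOne);    // 256 indirect calls
    forRange(data, 256, zeroAll);   // 1 indirect call
}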
I come from Python and am still new to C++.
Now I wonder if calling a function is slower in performance than executing the code of the function itself?
An example:
struct mynum {
public:
    int m_value = 0;

    constexpr int value() const { return m_value; }

    // Say we would create a function here
    // that wants to use the value of "m_value".
    // Is it slower to use "value()" instead of "m_value",
    // even if the difference is very small?
    // Or is there indeed no difference because everything gets compiled?
    void somefunc() {
        if (value() == 0) { /* ... */ }
    }
};
If the function body is available at the time it is called, there is a good chance the compiler will either automatically inline it (the "inline" keyword is just a hint) or leave it as a function call. In both cases you are probably on the best path, as compilers are pretty good at this kind of decision - better than us, even.
If only the function prototype (the declaration) is known by the compiler and the body is defined in another compilation unit (*.cpp file) then there are a couple of hits you might take:
The processor pipeline (and speculative execution) might stall, which may cost you a few cycles, although processors have become extremely efficient at these things in the past 10 years or so. Even dynamic branch prediction has become so good that there is no point rearranging the order of if/else like we used to do 20 years ago (it is still necessary on simpler embedded processors though).
Register allocation will see a clean cut, which primarily affects intensive calculations. Basically, the compiler decides which registers the variables in use will reside in. When you make a call, only some of those registers are guaranteed to be preserved; all the others will need to be reloaded when the function returns. If the number of live variables is large, that spill/reload can affect performance, but that is really rare.
If the function is a virtual method, the indirect lookup through the virtual table might add up to ten cycles. The compiler might devirtualize a call if it knows exactly which class will be called, however, so this cost might actually be the same as a normal function. In more complex cases, with several layers of polymorphism, virtual calls might take up to 20 cycles. In my tests with 2 layers the cost is on average 5-7 cycles on an AMD Zen 3 (Threadripper).
But overall, if the function call is not virtual, the cost will be really negligible. There are programmers who swear by inlining everything, but if my experience is worth noting, I have programmatically generated code 100% inlined and the same code compiled separately, and the performance was largely the same.
There is some function call overhead in C++, but a simple function like this that just returns a known variable will probably be compiled out and replaced with a reference to that variable.
Quick question, and I apologize if it sounds naive.
What is faster in C++? Code like this:
ProgramsManager::CurrentProgram->Uniforms->Set(n1);
ProgramsManager::CurrentProgram->Uniforms->Set(n2);
ProgramsManager::CurrentProgram->Uniforms->Set(n3);
ProgramsManager::CurrentProgram->Uniforms->Set(...);
Or this one?
Uniforms* u = ProgramsManager::CurrentProgram->Uniforms;
u->Set(n1);
u->Set(n2);
u->Set(n3);
u->Set(...);
I know the second piece of code is faster in interpreted languages, but I feel like it makes no difference in compiled languages. Am I right?
Thank you in advance
The second might be faster, but it won't be faster by a lot.
The reason it might be faster is if the compiler cannot prove to itself that ProgramsManager::CurrentProgram->Uniforms is not changed by the calls to ...->Set. If it can't prove this, it will have to re-evaluate the expression ProgramsManager::CurrentProgram->Uniforms for each line.
However, modern CPUs are usually fairly quick at this kind of thing, and compilers are getting better.
There are 3 choices here, not 2.
Call a single parameter function.
Call one function with many parameters.
Call a single function with container, like struct or vector.
Fundamental Overhead
When calling a function there is an overhead of instructions. Usually this involves placing values in registers or on the stack or something else.
At a lower level, there may be the possibility of the processor having to reload its instruction cache / pipeline.
Optimizing The Function Call
For optimizing function calls, the best method is to avoid the call by pasting the code (a.k.a. inlining). This removes the overhead.
The next best is to reduce the number of function calls. For example, passing more parameters per call means fewer function calls and less overhead.
Many Parameters versus One Container
The optimal function call passes values in registers. Extra parameters, beyond the available registers, result in using stack memory. This means that the function will need code to retrieve the values from the stack.
Passing many parameters using the stack incurs an overhead. Also, the function signature will need to change if more parameters are added or removed.
Placing variables into a container reduces the overhead. Only a pointer (or reference) to the container needs to be passed. This usually involves only a register since pointers usually fit into a register (many compilers pass structures by reference using pointers).
Another benefit to the container is that the container can change without having to change the function signature.
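A small sketch of the two signatures being compared (the names and fields are made up): once the parameter count exceeds what fits in registers, the struct-by-reference form passes a single pointer-sized argument instead of spilling arguments to the stack.

struct RenderParams {
    int x, y, width, height;
    float scale, rotation, alpha;
    bool wireframe;
};

// many scalar parameters: anything beyond the register budget spills to the stack
void drawMany(int x, int y, int width, int height,
              float scale, float rotation, float alpha, bool wireframe) { /* ... */ }

// one pointer-sized argument, and the signature survives adding new fields
void drawPacked(const RenderParams &params) { /* ... */ }

int main() {
    RenderParams p = { 0, 0, 640, 480, 1.0f, 0.0f, 1.0f, false };
    drawMany(0, 0, 640, 480, 1.0f, 0.0f, 1.0f, false);
    drawPacked(p);
}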
While implementing a menu on an embedded system in C(++) (AVR-GCC), I ended up with void function pointers that take arguments and usually make use of them.
// pointer to a void function that takes a char * argument
void (*auxFunc)(char *);
In some cases (in fact quite a few), the function actually doesn't need the argument, so I would do something like:
if (something) doAuxFunc(NULL);
I know I could just overload to a different function type, but I'm actually trying not to do this as I am instantiating multiple objects and want to keep them light.
Is calling multiple functions with NULL pointers (when they are intended for an actual pointer) worse than implementing many more function prototypes?
Checking for NULLs is a very small overhead even on a microcontroller - comparison against 0 is supposed to be lightning fast. If you overload several functions, you'll crucify readability for (a very slight) improvement in performance. Just let GCC's optimizer do its stuff, it's pretty good at it :)
Look at the disassembly: it should be generating a null (zero) to pass as the first argument, which burns either a register or a stack location. If it burns a register, then it may cost you a push and a pop if the calling function is starved for registers. (Just using a function call at all may cost you pushes and pops if the function is starved for registers, in order to implement the calling convention.)
So there is likely a cost, but it may not be enough of a cost to change the way you do things.
Checking for 0 is really cheap; overloading is even cheaper, since it is decided at compile time which function to choose.
But if you think that your interfaces get too complicated with overloading and your function is small, you should declare it inline and put it in a header. Checking for 0 can then easily be optimized away by any decent modern compiler.
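A minimal sketch of that suggestion (the handler name and body are hypothetical): the null check lives in an inline function in a header, so a call site that passes a literal NULL lets the compiler drop the branch entirely.

#include <cstdio>

// in a header: the argument check sits inside the inline function
inline void auxHandler(const char *label) {
    if (label != NULL) {
        std::printf("%s\n", label);   // only when a label was supplied
    }
    // ...the work that does not need the argument goes here...
}

int main() {
    auxHandler(NULL);         // branch can be optimized away after inlining
    auxHandler("menu item");  // branch taken
}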
I think the "tradeoff" is ridiculously low for each approach but this is the time to do benchmarks for yourself. If you do so, please post some results :)
I have the following situation:
class A
{
public:
A(int whichFoo);
int foo1();
int foo2();
int foo3();
int callFoo(); // calls one of the foos depending on the value of whichFoo
};
In my current implementation I save the value of whichFoo in a data member in the constructor and use a switch in callFoo() to decide which of the foo's to call. Alternatively, I can use a switch in the constructor to save a pointer to the right fooN() to be called in callFoo().
My question is which way is more efficient if an object of class A is only constructed once, while callFoo() is called a very large number of times. So in the first case we have multiple executions of a switch statement, while in the second there is only one switch, and multiple calls of a member function using the pointer to it. I know that calling a member function using a pointer is slower than just calling it directly. Does anybody know if this overhead is more or less than the cost of a switch?
Clarification: I realize that you never really know which approach gives better performance until you try it and time it. However, in this case I already have approach 1 implemented, and I wanted to find out if approach 2 can be more efficient at least in principle. It appears that it can be, and now it makes sense for me to bother to implement it and try it.
Oh, and I also like approach 2 better for aesthetic reasons. I guess I am looking for a justification to implement it. :)
How sure are you that calling a member function via a pointer is slower than just calling it directly? Can you measure the difference?
In general, you should not rely on your intuition when making performance evaluations. Sit down with your compiler and a timing function, and actually measure the different choices. You may be surprised!
More info: There is an excellent article Member Function Pointers and the Fastest Possible C++ Delegates which goes into very deep detail about the implementation of member function pointers.
You can write this:
#include <cassert>
#include <iostream>

class Foo {
public:
    // pointer-to-member-function type used for the dispatch table
    typedef void (Foo::*FooCall)(int);

    Foo() {
        calls[0] = &Foo::call0;
        calls[1] = &Foo::call1;
        calls[2] = &Foo::call2;
        calls[3] = &Foo::call3;
    }
    void call(int number, int arg) {
        assert(number < 4);
        (this->*(calls[number]))(arg);   // dispatch through the member pointer
    }
    void call0(int arg) {
        std::cout << "call0(" << arg << ")\n";
    }
    void call1(int arg) {
        std::cout << "call1(" << arg << ")\n";
    }
    void call2(int arg) {
        std::cout << "call2(" << arg << ")\n";
    }
    void call3(int arg) {
        std::cout << "call3(" << arg << ")\n";
    }
private:
    FooCall calls[4];
};
The computation of the actual function pointer is linear and fast:
(this->*(calls[number]))(arg);
004142E7 mov esi,esp
004142E9 mov eax,dword ptr [arg]
004142EC push eax
004142ED mov edx,dword ptr [number]
004142F0 mov eax,dword ptr [this]
004142F3 mov ecx,dword ptr [this]
004142F6 mov edx,dword ptr [eax+edx*4]
004142F9 call edx
Note that you don't even have to fix the actual function number in the constructor.
I've compared this code to the asm generated by a switch. The switch version doesn't provide any performance increase.
To answer the asked question: at the finest-grained level, the pointer to the member function will perform better.
To address the unasked question: what does "better" mean here? In most cases I would expect the difference to be negligible. Depending on what the class is doing, however, the difference may be significant. Performance testing before worrying about the difference is obviously the right first step.
If you are going to keep using a switch, which is perfectly fine, then you probably should put the logic in a helper method and call it from the constructor. Alternatively, this is a classic case of the Strategy pattern. You could create an interface (or abstract class) named IFoo which has one method with foo's signature. You would have the constructor take in an instance of IFoo (constructor dependency injection) that implements the foo method you want. You would have a private IFoo that would be set by this constructor, and every time you wanted to call foo you would call your IFoo's version.
Note: I haven't worked with C++ since college, so my lingo might be off here, but the general ideas hold for most OO languages.
If your example is real code, then I think you should revisit your class design. Passing in a value to the constructor and using that to change behaviour is really equivalent to creating a subclass. Consider refactoring to make it more explicit. The effect of doing so is that your code will end up using a function pointer anyway (all virtual methods really are function pointers in jump tables).
If, however your code was just a simplified example to ask whether, in general, jump tables are faster than switch statements, then my intuition would say that jump tables are quicker, but you are dependent on the compiler's optimisation step. But if performance is really such a concern, never rely on intuition - knock up a test program and test it, or look at the generated assembler.
One thing is certain: a switch statement will never be faster than a jump table, because the best a compiler's optimiser can do is turn a series of conditional tests (i.e. a switch) into a jump table. So if you really want to be certain, take the compiler out of the decision process and use a jump table.
Sounds like you should make callFoo a pure virtual function and create some subclasses of A.
Unless you really need the speed, have done extensive profiling and instrumenting, and determined that the calls to callFoo are really the bottleneck. Have you?
Function pointers are almost always better than chained ifs. They make cleaner code, and are nearly always faster (except perhaps in a case where it's only a choice between two functions that is always correctly predicted).
I should think that the pointer would be faster.
Modern CPUs prefetch instructions; a mis-predicted branch flushes the pipeline, which means the CPU stalls while it refills. A pointer doesn't do that.
Of course, you should measure both.
Optimize only when needed
First: Most of the time you most likely do not care, the difference will be very small. Make sure optimizing this call really makes sense first. Only if your measurements show there is really significant time spent in the call overhead, proceed to optimizing it (shameless plug - Cf. How to optimize an application to make it faster?) If the optimization is not significant, prefer the more readable code.
Indirect call cost depends on target platform
Once you have determined it is worth to apply low-level optimization, then it is a time to understand your target platform. The cost you can avoid here is the branch misprediction penalty. On modern x86/x64 CPU this misprediction is likely to be very small (they can predict indirect calls quite well most of the time), but when targeting PowerPC or other RISC platforms, the indirect calls/jumps are often not predicted at all and avoiding them can cause significant performance gain. See also Virtual call cost depends on platform.
Compiler can implement switch using jump table as well
One gotcha: Switch can sometimes be implemented as an indirect call (using a table) as well, especially when switching between many possible values. Such switch exhibits the same misprediction as a virtual function. To make this optimization reliable, one would probably prefer using if instead of switch for the most common case.
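A small sketch of that last suggestion (the opcode names are invented): peel the most common value off with a plain, well-predicted if and let the switch handle the rest.

enum Op { OP_ADD, OP_SUB, OP_MUL, OP_RARE1, OP_RARE2 };

int execute(Op op, int a, int b) {
    if (op == OP_ADD)          // hot path: cheap, predictable branch
        return a + b;
    switch (op) {              // cold paths: may compile to a jump table
        case OP_SUB:   return a - b;
        case OP_MUL:   return a * b;
        case OP_RARE1: return a ^ b;
        case OP_RARE2: return a | b;
        default:       return 0;
    }
}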
Use timers to see which is quicker. Although unless this code is going to be run over and over, it's unlikely that you'll notice any difference.
Be sure that if you are running code from the constructor and the construction fails, you won't leak memory.
This technique is used heavily with Symbian OS:
http://www.titu.jyu.fi/modpa/Patterns/pattern-TwoPhaseConstruction.html
If you are only calling callFoo() once, then most likely the function pointer will be slower by an insignificant amount. If you are calling it many times, then most likely the function pointer will be faster by an insignificant amount (because it doesn't need to keep going through the switch).
Either way look at the assembled code to find out for sure it is doing what you think it is doing.
One often overlooked advantage to switch (even over sorting and indexing) is if you know that a particular value is used in the vast majority of cases.
It's easy to order the switch so that the most common values are checked first.
ps. To reinforce greg's answer, if you care about speed - measure.
Looking at assembler doesn't help much when CPUs have prefetching, predictive branching, pipeline stalls, etc.
In an AI application I am writing in C++,
there is not much numerical computation
there are a lot of structures for which run-time polymorphism is needed
very often, several polymorphic structures interact during computation
In such a situation, are there any optimization techniques? While I don't plan to optimize the application just now, one aspect of selecting C++ over Java for the project was to get more leverage to optimize and to be able to use non-object-oriented methods (templates, procedures, overloading).
In particular, what are the optimization techniques related to virtual functions? Virtual functions are implemented through virtual tables in memory. Is there some way to pre-fetch these virtual tables onto L2 cache (the cost of fetching from memory/L2 cache is increasing)?
Apart from this, are there good references for data locality techniques in C++? These techniques would reduce the wait time for data fetch into L2 cache needed for computation.
Update: Also see the following related forums: Performance Penalty for Interface, Several Levels of Base Classes
Virtual functions are very efficient. Assuming 32 bit pointers the memory layout is approximately:
classptr -> [vtable:4][classdata:x]
vtable -> [first:4][second:4][third:4][fourth:4][...]
first -> [code:x]
second -> [code:x]
...
The classptr points to memory that is typically on the heap, occasionally on the stack, and starts with a four-byte pointer to the vtable for that class. But the important thing to remember is that the vtable itself is not per-object allocated memory. It's a static resource, and all objects of the same class type will point to exactly the same memory location for their vtable array. Calling on different instances won't pull different memory locations into the L2 cache.
This example from msdn shows the vtable for class A with virtual func1, func2, and func3. Nothing more than 12 bytes. There is a good chance the vtables of different classes will also be physically adjacent in the compiled library (you'll want to verify this if you're especially concerned), which could increase cache efficiency microscopically.
CONST SEGMENT
??_7A@@6B@
    DD FLAT:?func1@A@@UAEXXZ
    DD FLAT:?func2@A@@UAEXXZ
    DD FLAT:?func3@A@@UAEXXZ
CONST ENDS
The other performance concern would be instruction overhead of calling through a vtable function. This is also very efficient. Nearly identical to calling a non-virtual function. Again from the example from msdn:
; A* pa;
; pa->func3();
mov eax, DWORD PTR _pa$[ebp]
mov edx, DWORD PTR [eax]
mov ecx, DWORD PTR _pa$[ebp]
call DWORD PTR [edx+8]
In this example ebp, the stack frame base pointer, has the variable A* pa at zero offset. The register eax is loaded with the value at location [ebp], so it has the A*, and edx is loaded with the value at location [eax], so it has class A vtable. Then ecx is loaded with [ebp], because ecx represents "this" it now holds the A*, and finally the call is made to the value at location [edx+8] which is the third function address in the vtable.
If this function call was not virtual the mov eax and mov edx would not be needed, but the difference in performance would be immeasurably small.
Section 5.3.3 of the draft Technical Report on C++ Performance is entirely devoted to the overhead of virtual functions.
Have you actually profiled and found where, and what needs optimization?
Work on actually optimizing virtual function calls when you have found they actually are the bottleneck.
The only optimization I can think of is Java's JIT compiler. If I understand it correctly, it monitors the calls as the code runs, and if most calls go to a particular implementation only, it inserts a conditional jump to that implementation when the class is right. This way, most of the time, there is no vtable lookup. Of course, for the rare case when we pass a different class, the vtable is still used.
I am not aware of any C++ compiler/runtime that uses this technique.
Virtual functions tend to be a lookup plus an indirect function call. On some platforms this is fast. On others, e.g. one popular PPC architecture used in consoles, this isn't so fast.
Optimizations usually revolve around expressing variability higher up in the callstack so that you don't need to invoke a virtual function multiple times within hotspots.
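As a sketch of "expressing variability higher up the callstack" (class and member names are illustrative), the virtual call is made once per homogeneous batch rather than once per element, so the hot inner loop is non-virtual and can be inlined:

#include <cstddef>
#include <vector>

struct Particle { float x, vx; };

struct Integrator {
    virtual ~Integrator() {}
    // one indirect call for the whole batch; the loop body can be optimized freely
    virtual void step(std::vector<Particle> &batch, float dt) = 0;
};

struct EulerIntegrator : Integrator {
    void step(std::vector<Particle> &batch, float dt) override {
        for (std::size_t i = 0; i < batch.size(); ++i)
            batch[i].x += batch[i].vx * dt;
    }
};

int main() {
    std::vector<Particle> batch(1024, Particle{0.0f, 1.0f});
    EulerIntegrator integ;
    Integrator &i = integ;
    i.step(batch, 0.016f);   // one virtual call for 1024 particles
}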
You can implement polymorphism at run time using virtual functions and at compile time using templates. You can replace virtual functions with templates. Take a look at this article for more information - http://www.codeproject.com/KB/cpp/SimulationofVirtualFunc.aspx
An alternative to dynamic polymorphism could be static polymorphism, usable if your types are known at compile time: the CRTP (Curiously Recurring Template Pattern).
http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern
The explanation on Wikipedia is clear enough, and perhaps it could help you if you have really determined that virtual method calls are a source of performance bottlenecks.
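For illustration, a minimal CRTP sketch (the type names are invented): the dispatch is resolved at compile time, so the call can be inlined like any ordinary function.

#include <cstdio>

template <typename Derived>
struct AgentBase {
    void act() {
        // static dispatch to the derived implementation, no vtable involved
        static_cast<Derived *>(this)->doAct();
    }
};

struct Explorer : AgentBase<Explorer> {
    void doAct() { std::puts("exploring"); }
};

struct Gatherer : AgentBase<Gatherer> {
    void doAct() { std::puts("gathering"); }
};

int main() {
    Explorer e;
    Gatherer g;
    e.act();
    g.act();
}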
As already stated by the other answers, the actual overhead of a virtual function call is fairly small. It may make a difference in a tight loop where it is called millions of times per second, but it's rarely a big deal.
However, it may still have a bigger impact in that it's harder for the compiler to optimize. It can't inline the function call, because it doesn't know at compile-time which function will be called. That also makes some global optimizations harder. And how much performance does this cost you? It depends. It is usually nothing to worry about, but there are cases where it may mean a significant performance hit.
And of course it also depends on the CPU architecture. On some, it can become quite expensive.
But it's worth keeping in mind that any kind of runtime polymorphism carries more or less the same overhead. Implementing the same functionality via switch statements or similar, to select between a number of possible functions may not be cheaper.
The only reliable way to optimize this would be if you could move some of the work to compile-time. If it is possible to implement part of it as static polymorphism, some speedup may be possible.
But first, make sure you have a problem. Is the code actually too slow to be acceptable?
Second, find out what makes it slow through a profiler.
And third, fix it.
I'm reinforcing all answers that say in effect:
If you don't actually know it's a problem, any concern about fixing it is probably misplaced.
What you want to know is:
What fraction of execution time (when it's actually running) is spent in the process of invoking methods, and in particular, which methods are the most costly (by this measure).
Some profilers can give you this information indirectly. They need to summarize at the statement level, but exclusive of the time spent in the method itself.
My favorite technique is to just pause it a number of times under a debugger.
If the time spent in the process of virtual function invocations is significant, like say 20%, then on the average 1 out of 5 samples will show, at the bottom of the call stack, in the disassembly window, the instructions for following the virtual function pointer.
If you don't actually see that, it is not a problem.
In the process, you will probably see other things higher up the call stack, that actually are not needed and could save you a lot of time.
Static polymorphism, as some users answered here. For example, WTL uses this method. A clear explanation of the WTL implementation can be found at http://www.codeproject.com/KB/wtl/wtl4mfc1.aspx#atltemplates
Virtual calls do not have much greater overhead than normal functions. The greatest loss, though, is that a virtual function, when called polymorphically, cannot be inlined, and inlining will in a lot of situations represent a real gain in performance.
Something you can do to avoid wasting that facility in some situations is to declare the function inline virtual.
class A {
public:
    inline virtual int foo() { /* ... */ return 0; }
};
And when you are at a point of code you are SURE about the type of the object being called, you may make an inline call that will avoid the polymorphic system and enable inlining by the compiler.
class B : public A {
public:
    inline virtual int foo()
    {
        // ...do something different
        return 1;
    }
    void bar()
    {
        // logic...
        B::foo();   // qualified call: bypasses the vtable, can be inlined
        // more logic
    }
};
In this example, the call to foo() will be made non-polymorphic and bound to B's implementation of foo(). But do it only when you know for sure what the instance type is, because the automatic polymorphism feature will be gone, and this is not very obvious to later code readers.
You rarely have to worry about cache in regards to such commonly used items, since they're fetched once and kept there.
Cache is only generally an issue when dealing with large data structures that either:
Are large enough and used for a very long time by a single function so that function can push everything else you need out of the cache, or
Are randomly accessed enough that the data structures themselves aren't necessarily in cache when you load from them.
Things like Vtables are generally not going to be a performance/cache/memory issue; usually there's only one Vtable per object type, and the object contains a pointer to the Vtable instead of the Vtable itself. So unless you have a few thousand types of objects, I don't think Vtables are going to thrash your cache.
1), by the way, is why functions like memcpy use cache-bypassing streaming instructions like movnt(dq|q) for extremely large (multi-megabyte) data inputs.
On recent CPUs the cost is more or less the same as a normal function call nowadays, but virtual functions can't be inlined. If you call the function millions of times, the impact can be significant (try calling the same function millions of times, for example, once with inlining and once without, and you will see it can be twice as slow if the function itself does something simple; this is not a theoretical case: it is quite common in a lot of numerical computation).
With modern, ahead-looking, multiple-dispatching CPUs the overhead for a virtual function might well be zero. Nada. Zip.
If an AI application does not require a great deal of number crunching, I wouldn't worry about the performance disadvantage of virtual functions. There will be a marginal performance hit only if they appear in complex computations that are evaluated repeatedly. I don't think you can force the virtual table to stay in the L2 cache either.
There are a couple of optimizations available for virtual functions:
People have written compilers that resort to code analysis and transformation of the program, but these aren't production-grade compilers.
You could replace all virtual functions with equivalent "switch...case" blocks that call the appropriate function based on the type in the hierarchy. This way you get rid of the compiler-managed virtual table and have your own virtual table in the form of a switch...case block. Now the chances of your own virtual table being in the L2 cache are high, as it is in the code path. Remember, you'll need RTTI or your own "typeof" function to achieve this.
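A rough sketch of that switch-based dispatch (all names are made up): a hand-rolled type tag replaces the compiler-managed vtable, and the dispatch is a switch that sits directly in the code path.

#include <cstdio>

enum NodeKind { KIND_LEAF, KIND_BRANCH };

struct Node {
    NodeKind kind;   // the hand-rolled "typeof" tag
};

void evalLeaf(Node &)   { std::puts("leaf"); }
void evalBranch(Node &) { std::puts("branch"); }

void eval(Node &n) {
    switch (n.kind) {          // plays the role of the vtable lookup
        case KIND_LEAF:   evalLeaf(n);   break;
        case KIND_BRANCH: evalBranch(n); break;
    }
}

int main() {
    Node a = { KIND_LEAF };
    Node b = { KIND_BRANCH };
    eval(a);
    eval(b);
}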