Faster way to call a method in c++

Faster way to call a method in c++ - c++

Quick question and I apologize if it sounds naive.
What is faster in c++. A code like this:
ProgramsManager::CurrentProgram->Uniforms->Set(n1);
ProgramsManager::CurrentProgram->Uniforms->Set(n2);
ProgramsManager::CurrentProgram->Uniforms->Set(n3);
ProgramsManager::CurrentProgram->Uniforms->Set(...);
Or this one?
Uniforms* u = ProgramsManager::CurrentProgram->Uniforms;
u->Set(n1);
u->Set(n2);
u->Set(n3);
u->Set(...);
I know the second piece of code is faster in interpreted languages, but I feel like it makes no difference in compiled languages. Am I right?
Thank you in advance

The second might be faster, but it won't be faster by a lot.
The reason it might be faster is if the compiler cannot prove to itself that ProgramsManager::CurrentProgram->Uniforms could be changed by the calls to ...->Set. If it can't prove this, it will have to re-evaluate the expression ProgramsManager::CurrentProgram->Uniforms for each line.
However, modern CPUs are usually fairly quick at this kind of thing, and compilers are getting better.

There are 3 choices here, not 2.
Call a single parameter function.
Call one function with many parameters.
Call a single function with container, like struct or vector.
Fundamental Overhead
When calling a function there is an overhead of instructions. Usually this involves placing values in registers or on the stack or something else.
Lower level, there may be the possibility of the processor having to reload it's instruction cache / pipe line.
Optimizing The Function Call
For optimizing function calls, the best method is to avoid the call by pasting the code (a.k.a. inlining). This removes the overhead.
The next best is to reduce the number of function calls. For example, passing more parameters will use less function calls and less overhead.
Many Parameters versus One Container
The optimal function call passes values by registers. Extra parameters, more than the available registers, results in using the stack memory. This means that the function will need code to retrieve the values from the stack memory.
Passing many parameters using the stack incurs an overhead. Also, the function signature will need to change if more parameters are added or removed.
Placing variables into a container reduces the overhead. Only a pointer (or reference) to the container needs to be passed. This usually involves only a register since pointers usually fit into a register (many compilers pass structures by reference using pointers).
Another benefit to the container is that the container can change without having to change the function signature.

Related

C++, do functions know previously computed function values or do they have to recompute them?

Lets say I have a very costly function that checks if an object has a certain property. Another function would then, depending on whether the object has the property, do different things.
If I have previously checked for the property, would it be recomputed by the second function, or is it known?
I'm thinking of something like:
bool check_property(object){
// very costly operations...
}
void do_something(object){
if(check_property) {do thing}
else {do different thing}
}
Would the if in do_something recompute check_property?

There are several factors that have to come together for the compiler to avoid recomputing the function's result:
The compiler has to know which input values the function's result depends on. This knowledge is very difficult to extract from the code in general case. In some implementations you can help the compiler by using compiler-specific means to declare your function as "pure" or "const" (GCC function attributes)
The compiler has to make sure that the above input values did not change since the previous call to the same function. This might be very easy in some specific case, but is also very difficult in general case.
The compiler has to have the result of previous computation readily available. Normally, compilers do not deliberately "cache" such results in some dedicated storage for future reuse. The optimization in question is typically applied only when you make multiple calls to the same function in "close proximity" to each other, meaning that the previous result is easy to keep till the moment of the next call.
So, the optimization in question is certainly possible. But it is something you should expect to see in simple and very localized cases, like calling sqrt(x) several times in a row for the same value of x (in the same expression, in the same cycle and such). But for more complicated functions it is typically going to be your responsibility to either somehow avoid making multiple calls to the same expensive function, or maybe memoize the results if you believe it can benefit your code.

Unless the compiler can prove that check_property has no side effects and that all the data it depends from is the same, it is not allowed to remove the call; for all practical purposes, unless your function body is known in the current TU, it is pretty much trivial and the multiple calls happen in the same function, calling again will execute its code again. I don't know of any compiler that establish automatically a cross-call cache, because it's not trivial at all.
If you need to cache the computed values, in general you will have to do it yourself; keep in mind that it's not always trivial - generally the ugly beasts to tackle are cache invalidation (how do I know that the data used to calculate the value didn't change from the last time I calculated it? how do I avoid the cache size getting out of hand?) and multithreading concerns (is this code going to be called from multiple threads? if so, I have to synchronize the access to the cache, possibly adding coupling between unrelated threads and, in extreme cases, killing the efficiency of the cache itself).

To answer your question, yes. It will rerun it. If you want to make sure that the code doesn't run it again every time you call do_something, try adding a variable in your class that will tell you if you already ran it:
bool check_property(object){
// very costly operations...
return true;
}
void do_something(object,bool has_run){
if(has_run) {do thing}
else {do different thing}
}
void main() {
bool has_run = false;
has_run = check_property(object);
do_something(object,has_run);
}
There are of course multiple ways of doing this, and this might not fit your criteria, but it is a possible way of doing it!
I just realized that this isn't really how C++ works since everything is not in classes unlike Java. Instead you can just pass the value as an argument to the function itself. So, I have edited my code.

c++: Overhead using a function via function ptr [duplicate]

I read about function pointers in C.
And everyone said that will make my program run slow.
Is it true?
I made a program to check it.
And I got the same results on both cases. (measure the time.)
So, is it bad to use function pointer?
Thanks in advance.
To response for some guys.
I said 'run slow' for the time that I have compared on a loop.
like this:
int end = 1000;
int i = 0;
while (i < end) {
fp = func;
fp ();
}
When you execute this, i got the same time if I execute this.
while (i < end) {
func ();
}
So I think that function pointer have no difference of time
and it don't make a program run slow as many people said.

You see, in situations that actually matter from the performance point of view, like calling the function repeatedly many times in a cycle, the performance might not be different at all.
This might sound strange to people, who are used to thinking about C code as something executed by an abstract C machine whose "machine language" closely mirrors the C language itself. In such context, "by default" an indirect call to a function is indeed slower than a direct one, because it formally involves an extra memory access in order to determine the target of the call.
However, in real life the code is executed by a real machine and compiled by an optimizing compiler that has a pretty good knowledge of the underlying machine architecture, which helps it to generate the most optimal code for that specific machine. And on many platforms it might turn out that the most efficient way to perform a function call from a cycle actually results in identical code for both direct and indirect call, leading to the identical performance of the two.
Consider, for example, the x86 platform. If we "literally" translate a direct and indirect call into machine code, we might end up with something like this
// Direct call
do-it-many-times
call 0x12345678
// Indirect call
do-it-many-times
call dword ptr [0x67890ABC]
The former uses an immediate operand in the machine instruction and is indeed normally faster than the latter, which has to read the data from some independent memory location.
At this point let's remember that x86 architecture actually has one more way to supply an operand to the call instruction. It is supplying the target address in a register. And a very important thing about this format is that it is normally faster than both of the above. What does this mean for us? This means that a good optimizing compiler must and will take advantage of that fact. In order to implement the above cycle, the compiler will try to use a call through a register in both cases. If it succeeds, the final code might look as follows
// Direct call
mov eax, 0x12345678
do-it-many-times
call eax
// Indirect call
mov eax, dword ptr [0x67890ABC]
do-it-many-times
call eax
Note, that now the part that matters - the actual call in the cycle body - is exactly and precisely the same in both cases. Needless to say, the performance is going to be virtually identical.
One might even say, however strange it might sound, that on this platform a direct call (a call with an immediate operand in call) is slower than an indirect call as long as the operand of the indirect call is supplied in a register (as opposed to being stored in memory).
Of course, the whole thing is not as easy in general case. The compiler has to deal with limited availability of registers, aliasing issues etc. But is such simplistic cases as the one in your example (and even in much more complicated ones) the above optimization will be carried out by a good compiler and will completely eliminate any difference in performance between a cyclic direct call and a cyclic indirect call. This optimization works especially well in C++, when calling a virtual function, since in a typical implementation the pointers involved are fully controlled by the compiler, giving it full knowledge of the aliasing picture and other relevant stuff.
Of course, there's always a question of whether your compiler is smart enough to optimize things like that...

I think when people say this they're referring to the fact that using function pointers may prevent compiler optimizations (inlining) and processor optimizations (branch prediction). However, if function pointers are an effective way to accomplish something that you're trying to do, chances are that any other method of doing it would have the same drawbacks.
And unless your function pointers are being used in tight loops in a performance critical application or on a very slow embedded system, chances are the difference is negligible anyway.

And everyone said that will make my
program run slow. Is it true?
Most likely this claim is false. For one, if the alternative to using function pointers are something like
if (condition1) {
func1();
} else if (condition2)
func2();
} else if (condition3)
func3();
} else {
func4();
}
this is most likely relatively much slower than just using a single function pointer. While calling a function through a pointer does have some (typically neglectable) overhead, it is normally not the direct-function-call versus through-pointer-call difference that is relevant to compare.
And secondly, never optimize for performance without any measurements. Knowing where the bottlenecks are is very difficult (read impossible) to know and sometimes this can be quite non-intuitively (for instance the linux kernel developers have started removing the inline keyword from functions because it actually hurt performance).

A lot of people have put in some good answers, but I still think there's a point being missed. Function pointers do add an extra dereference which makes them several cycles slower, that number can increase based on poor branch prediction (which incidentally has almost nothing to do with the function pointer itself). Additionally functions called via a pointer cannot be inlined. But what people are missing is that most people use function pointers as an optimization.
The most common place you will find function pointers in c/c++ APIs is as callback functions. The reason so many APIs do this is because writing a system that invokes a function pointer whenever events occur is much more efficient than other methods like message passing. Personally I've also used function pointers as part of a more-complex input processing system, where each key on the keyboard has a function pointer mapped to it via a jump table. This allowed me to remove any branching or logic from the input system and merely handle the key press coming in.

Calling a function via a function pointer is somewhat slower than a static function call, since the former call includes an extra pointer dereferencing. But AFAIK this difference is negligible on most modern machines (except maybe some special platforms with very limited resources).
Function pointers are used because they can make the program much simpler, cleaner and easier to maintain (when used properly, of course). This more than makes up for the possible very minor speed difference.

A lot of good points in earlier replies.
However take a look at C qsort comparison function. Because the comparison function cannot be inlined and needs to follow standard stack based calling conventions, the total running time for the sort can be an order of magnitude (more exactly 3-10x) slower for integer keys, than otherwise same code with a direct, inlineable, call.
A typical inlined comparison would be a sequence of simple CMP and possibly CMOV/SET instruction. A function call also incurs the overhead of a CALL, setting up stack frame, doing the comparison, tearing down stack frame and returning the result. Note, that the stack operations can cause pipeline stalls due to CPU pipeline length and virtual registers. For example if value of say eax is needed before the instruction that last modified eax has finished executing (which typically takes about 12 clock cycles on the newest processors). Unless the CPU can execute other instructions out of order to wait for that, a pipeline stall will occur.

Using a function pointer is slower that just calling a function as it is another layer of indirection. (The pointer needs to be dereferenced to get the memory address of the function). While it is slower, compared to everything else your program may do (Read a file, write to the console) it is negligible.
If you need to use function pointers, use them because anything that tries to do the same thing but avoids using them will be slower and less maintainable that using function pointers.

Possibly.
The answer depends on what the function pointer is being used for and hence what the alternatives are. Comparing function pointer calls to direct function calls is misleading if a function pointer is being used to implement a choice that's part of our program logic and which can't simply be removed. I'll go ahead and nonetheless show that comparison and come back to this thought afterwards.
Function pointer calls have the most opportunity to degrade performance compared to direct function calls when they inhibit inlining. Because inlining is a gateway optimization, we can craft wildly pathological cases where function pointers are made arbitrarily slower than the equivalent direct function call:
void foo(int* x) {
*x = 0;
}
void (*foo_ptr)(int*) = foo;
int call_foo(int *p, int size) {
int r = 0;
for (int i = 0; i != size; ++i)
r += p[i];
foo(&r);
return r;
}
int call_foo_ptr(int *p, int size) {
int r = 0;
for (int i = 0; i != size; ++i)
r += p[i];
foo_ptr(&r);
return r;
}
Code generated for call_foo():
call_foo(int*, int):
xor eax, eax
ret
Nice. foo() has not only been inlined, but doing so has allowed the compiler to eliminate the entire preceding loop! The generated code simply zeroes out the return register by XORing the register with itself and then returns. On the other hand, compilers will have to generate code for the loop in call_foo_ptr() (100+ lines with gcc 7.3) and most of that code effectively does nothing (so long as foo_ptr still points to foo()). (In more typical scenarios, you can expect that inlining a small function into a hot inner loop might reduce execution time by up to about an order of magnitude.)
So in a worst case scenario, a function pointer call is arbitrarily slower than a direct function call, but this is misleading. It turns out that if foo_ptr had been const, then call_foo() and call_foo_ptr() would have generated the same code. However, this would require us to give up the opportunity for indirection provided by foo_ptr. Is it "fair" for foo_ptr to be const? If we're interested in the indirection provided by foo_ptr, then no, but if that's the case, then a direct function call is not a valid option either.
If a function pointer is being used to provide useful indirection, then we can move the indirection around or in some cases swap out function pointers for conditionals or even macros, but we can't simply remove it. If we've decided that function pointers are a good approach but performance is a concern, then we typically want to pull indirection up the call stack so that we pay the cost of indirection in an outer loop. For example, in the common case where a function takes a callback and calls it in a loop, we might try moving the innermost loop into the callback (and changing the responsibility of each callback invocation accordingly).

Using inline for a void function that takes no parameters

Is there any potential performance gain in replacing
void foo(void){/*some statement*/}
with
inline void foo(void){/*some statement*/}

There is no function call and return overhead. It probably will prevent instruction cache from reloading. Compiler will be allowed to perform far more optimizations once function body will be inlined.
Some more explanations:
CPU will load instructions when it will see it needs them, so if function will be inlined then CPU can load whole code in one read causing less CPU stalls. But if this function is actually quite large and is not executed very often then inlining it might actually cause more harm because CPU will likely load more cache lines that it is necessary. Below is example:
if ( condition ) {
// do some logic here
}
else {
foo();
}
now if condition is mostly true, then its better if foo() is not inlined, if condition is mostly false, then its better if it is inlined. So to make your code more cache friendly you should actually find a most common path of execution and make it work with as little if-s and possibly little function calls.
Function call overhead in this case is caused by the need to save registers on the stack (how many depends on actuall code), incrementing stack pointer, and jumping to funciton code. After function is done CPU needs to restore stack and registers to its previous state. This is obviously a lot of work especially if function is called inside tight loop.
Finally its important to remember that inline is only a hint to compiler. As a programmer you have knowledge of how your code is executed, and you should use this knowledge to structure your code to make it more cache friendly.

Yes (apart from the small but non-zero cost of the call): the optimiser might be able to do more to optimise the inlined code than it could with a function call.

Not-inlined function call incurs some minor penality even if you do not pass any arguments back and forth. For example, the address where the function should return after finishing its execution needs to be stored somewhere.
If I were you, I would make no particular distinction between void-void functions and other type of functions when deciding if to inline it or not. There are other more significant aspects which decide if inline is helpful or not, for example:
frequency of the calls (higher number favors inlining)
size of the function (bigger size favors not inlining)

inline is completely orthogonal to parameter and return type.

The overhead cost of a function call -- creating a stack for the function, pushing the arguments to the stack, managing return value from the stack, and deleting the stack -- are avoided when you have an inline function. That would improve performance.
In your case, you don't have any input arguments and a return value. So the cost of a function call will be reduced a little bit. It will be still more expensive than an inline function.

How expensive are NULL pointer arguments?

In implementing a menu on an embedded system in C(++) (AVR-Gcc), I ended up with void function pointer that take arguments, and usually make use of them.
// void function prototype
void (*auxFunc)(char *);
In some cases (in fact quite a few), the function actually doesn't need the argument, so I would do something like:
if (something) doAuxFunc(NULL);
I know I could just overload to a different function type, but I'm actually trying not to do this as I am instantiating multiple objects and want to keep them light.
Is calling multiple functions with NULL pointers (when they are intended for an actual pointer) worse than implementing many more function prototypes?

Checking for NULLs is a very small overhead even on a microcontroller - comparison against 0 is supposed to be lightning fast. If you overload several functions, you'll crucify readability for (a very slight) improvement in performance. Just let GCC's optimizer do its stuff, it's pretty good at it :)

Look at the disassembly, it should be generating a null (zero) to pass as the first argument, which either burns a register or a stack location, if it burns a register then it may cost you a push and pop if the calling function is starving for registers. (just using a function call may cost you pushes and pops if the function is starving for registers in order to implement the calling convention).
So there is likely a cost, but it may not be enough of a cost to change the way you do things.

Checking for 0 is really cheap, overloading is even cheaper, since it is decided at compile time which function to chose.
But if you think that your interfaces get too complicated with overloading and your function is small you should declare it inline and put it in a header. Checkig for 0 can then easily be optimized away by any decent modern compiler.

I think the "tradeoff" is ridiculously low for each approach but this is the time to do benchmarks for yourself. If you do so, please post some results :)

Performance when accessing class members

I'm writing something performance-critical and wanted to know if it could make a difference if I use:
int test( int a, int b, int c )
{
// Do millions of calculations with a, b, c
}
or
class myStorage
{
public:
int a, b, c;
};
int test( myStorage values )
{
// Do millions of calculations with values.a, values.b, values.c
}
Does this basically result in similar code? Is there an extra overhead of accessing the class members?
I'm sure that this is clear to an expert in C++ so I won't try and write an unrealistic benchmark for it right now

The compiler will probably equalize them. If it has any brains at all, it will copy values.a, values.b, and values.c into local variables or registers, which is also what happens in the simple case.
The relevant maxims:
Premature optimization is the root of much evil.
Write it so you can read it at 1am six months from now and still understand what you were trying to do.
Most of the time significant optimization comes from restructuring your algorithm, not small changes in how variables are accessed. Yes, I know there are exceptions, but this probably isn't one of them.

This sounds like premature optimization.
That being said, there are some differences and opportunities but they will affect multiple calls to the function rather than performance in the function.
First of all, in the second option you may want to pass MyStorage as a constant reference.
As a result of that, your compiled code will likely be pushing a single value into the stack (to allow you to access the container), rather than pushing three separate values. If you have additional fields (in addition to a-c), sending MyStorage not as a reference might actually cost you more because you will be invoking a copy constructor and essentially copying all the additional fields. All of this would be costs per-call, not within the function.
If you are doing tons of calculations with a b and c within the function, then it really doesn't matter how you transfer or access them. If you passed by reference, the initial cost might be slightly more (since your object, if passed by reference, could be on the heap rather than the stack), but once accessed for the first time, caching and registers on your machine will probably mean low-cost access. If you have passed your object by value, then it really doesn't matter, since even initially, the values will be nearby on the stack.
For the code you provided, if these are the only fields, there will likely not be a difference. the "values.variable" is merely interpreted as an offset in the stack, not as "lookup one object, then access another address".
Of course, if you don't buy these arguments, just define local variables as the first step in your function, copy the values from the object, and then use these variables. If you realy use them multiple times, the initial cost of this copy wouldn't matter :)

No, your cpu would cache the variables you use over and over again.

I think there are some overhead, but may not be much. Because the memory address of the object will be stored in the stack, which points to the heap memory object, then you access the instance variable.
If you store the variable int in stack, it would be really faster, because the value is already in stack and the machine just go to stack to get it out to calculate:).
It also depends on if you store the class's instance variable value on stack or not. If inside the test(), you do like:
int a = objA.a;
int b = objA.b;
int c = objA.c;
I think it would be almost the same performance

If you're really writing performance critical code and you think one version should be faster than the other one, write both versions and test the timing (with the code compiled with right optimization switch). You may even want to see the generated assembly codes. A lot of things can affect the speed of a code snippets that are quite subtle, like register spilling, etc.

you can also start your function with
int & a = values.a;
int & b = values.b;
although the compiler should be smart enough to do that for you behind the scenes. In general I prefer to pass around structures or classes, this makes it often clearer what the function is meant to do, plus you don't have to change the signatures every time you want to take another parameter into account.

As with your previous, similar question: it depends on the compiler and platform. If there is any difference at all, it will be very small.
Both values on the stack and values in an object are commonly accessed using a pointer (the stack pointer, or the this pointer) and some offset (the location in the function's stack frame, or the location inside the class).
Here are some cases where it might make a difference:
Depending on your platform, the stack pointer might be held in a CPU register, whereas the this pointer might not. If this is the case, accessing this (which is presumably on the stack) would require an extra memory lookup.
Memory locality might be different. If the object in memory is larger than one cache line, the fields are spread out over multiple cache lines. Bringing only the relevant values together in a stack frame might improve cache efficiency.
Do note, however, how often I used the word "might" here. The only way to be sure is to measure it.

If you can't profile the program, print out the assembly language for the code fragments.
In general, less assembly code means less instructions to execute which speeds up performance. This is a technique for getting a rough estimate of performance when a profiler is not available.
An assembly language listing will allow you to see differences, if any, between implementations.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js