Performance of function calls per frame - C++

I'm a game developer, so performance is really important to me.
My simple question:
I have a lot of checks (button clicks, collisions, whatever) running per frame, but I don't want to put everything in one function, so I would like to split them into separate functions and just call them:
void Tick()
{
//Check 1 ..... lots of code
//Check 2 ...... lots of code
//Check 3 ..... lots of code
}
to
void Tick()
{
funcCheck1();
funcCheck2();
funcCheck3();
}
void funcCheck1()
{
//check1 lots of code
}
void funcCheck2()
{
//check2 lots of code
}
void funcCheck3()
{
//check3 lots of code
}
Does the extra function call per frame have any performance impact (assuming the functions are not inlined)?
Clearly the second version is much more readable.

If you don't pass any complex objects by value, the overhead of calling several functions instead of putting all the code in one function should be negligible. A call typically only has to place the function parameters on the stack (or in registers), reserve space for the return value, and jump to the beginning of the called function's code.

You cannot say for sure, especially since the compiler may inline small functions automatically. The only way to be sure is to use a profiler and compare the two scenarios.
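As a rough illustration of comparing the two scenarios, here is a minimal timing sketch. It is not a substitute for a real profiler, and the check bodies, `TickMonolithic`, `TickSplit` and `TimeFrames` are hypothetical names invented for this example. The arithmetic merely stands in for "lots of code".

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical per-frame checks; each stands in for "lots of code".
static std::uint64_t funcCheck1(std::uint64_t frame) { return frame * 3u + 1u; }
static std::uint64_t funcCheck2(std::uint64_t frame) { return frame ^ 0x5bd1e995u; }
static std::uint64_t funcCheck3(std::uint64_t frame) { return frame % 7u; }

// Version A: everything written directly in the tick function.
std::uint64_t TickMonolithic(std::uint64_t frame)
{
    std::uint64_t result = 0;
    result += frame * 3u + 1u;      // check 1
    result += frame ^ 0x5bd1e995u;  // check 2
    result += frame % 7u;           // check 3
    return result;
}

// Version B: the same work split into separate functions.
std::uint64_t TickSplit(std::uint64_t frame)
{
    return funcCheck1(frame) + funcCheck2(frame) + funcCheck3(frame);
}

// Runs `tick` for `frames` frames and returns the elapsed nanoseconds.
// The checksum is written to `sink` so the optimizer cannot drop the loop.
template <typename TickFn>
long long TimeFrames(TickFn tick, std::uint64_t frames, std::uint64_t& sink)
{
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t f = 0; f < frames; ++f) sum += tick(f);
    const auto stop = std::chrono::steady_clock::now();
    sink = sum;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `TimeFrames(TickMonolithic, n, sink)` and `TimeFrames(TickSplit, n, sink)` with the same `n` gives two durations to compare. With optimizations enabled the compiler is likely to inline the small functions, in which case the two numbers should be close.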

Related

How many times can C++ pass by reference?

I am writing a small game on my phone in SDL2. In main I have a while loop and basic game control conditions which are bools. I pass 'initialise', 'update', 'quit' and the 'renderer' to a game function by reference that deals with the game logic. Now it's getting more complicated I want to separate certain logic to outside of game, and to do that I have to pass the references from main, to game, to more functions outside of game. Main would pass to game, game would pass to func2, and possibly func2 needs to pass to func3.
Do the C++ standards/specification limit the use of pass by reference? You could have a chain of 10+ functions passing down quit, update, etc.
//
// extra functions here
// which break up game
// each need quit, initialise, update and renderer
// so I pass by reference
//
// void func2(&rend, &quit, &update, &initialise)
void game(SDL_Renderer *rend, bool &initialise, bool &quit, bool &update)
{
static Table t{};
if (initialise)
{
// setup game
func2(rend, initialise, quit, update);
}
if(update)
{
// refresh screen
}
while (PollTouchEvent)
{
// touching screen
// can quit here
}
}
int main(int argc, char *argv[])
{
// initialise SDL
// ...
// game stuff
bool quit = false;
bool update = true;
bool initialise = true;
while (!quit)
{
game(renderer, initialise, quit, update);
}
// quit SDL
return 0;
}
There is no limit to the number of times you can pass a variable by reference, other than perhaps the size of the stack — i.e. the address of a variable that is passed by reference may need to be placed on the stack, and in extreme cases (read: deeply recursive functions) that might exceed the stack’s capacity. Such a problem isn’t very common, though.
Whether it’s a good approach to be passing lots of variables by reference is a different question. In particular, if you find yourself passing the same set of references to many different functions, consider creating a struct or class-object containing all of that information as member variables and just passing around a single reference to that object instead; that will be more efficient and also much easier to maintain/update as your program changes.
Also note that modern C++ compilers can often optimize arguments passed by value more effectively than arguments passed by reference, since in the by-value case the optimizer doesn’t have to worry about the possibility of aliasing. Whether one approach or the other is faster would have to be measured on a case-by-case basis, though, as performance can depend a lot on the details of what is being done.
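The struct suggestion could look roughly like the sketch below. `GameState`, `func3`, `runLoop` and the `frame`/`lastFrame` parameters are hypothetical names for illustration. The renderer is stored as a `void*` only so that the sketch compiles without the SDL headers.

```cpp
#include <cassert>

// Hypothetical bundle of the flags from the question. The renderer is kept as
// an opaque pointer here so the sketch compiles without SDL headers.
struct GameState
{
    void* renderer = nullptr;   // would be SDL_Renderer* in the real program
    bool initialise = true;
    bool update = true;
    bool quit = false;
};

// Deepest helper: requests quit once the given frame number is reached.
void func3(GameState& state, int frame, int lastFrame)
{
    if (frame >= lastFrame) state.quit = true;
}

void func2(GameState& state, int frame, int lastFrame)
{
    state.update = true;             // ask for a redraw
    func3(state, frame, lastFrame);  // the same reference is simply forwarded
}

void game(GameState& state, int frame, int lastFrame)
{
    if (state.initialise)
    {
        state.initialise = false;    // one-time setup would go here
    }
    func2(state, frame, lastFrame);
}

// Runs the main loop and returns how many frames were processed.
int runLoop(GameState& state, int lastFrame)
{
    assert(lastFrame >= 0);
    int frame = 0;
    while (!state.quit)
    {
        game(state, frame, lastFrame);
        ++frame;
    }
    return frame;
}
```

Adding a new flag later only means adding a member to `GameState`; none of the function signatures in the chain have to change.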

Calling Recursive Function From Function Without Blocking parent Function C++ [duplicate]

I'm programming with Qt (c++) but my issue is universal in programming (most probably).
To simplify things, the function GetInput(string input) continuously scans for new input.
Depending on the input, the program exits or calls a recursive function.
The problem is, the RecursiveFunc() function blocks the GetInput() function, thus making it impossible to get further input (making it impossible to exit). Basically, the RecursiveFunc() function will call itself over and over, so the GetInput function never returns, making it impossible to get any more input.
My question: how can a function call a recursive function but still continue to run and return while the recursion is running?
//needs to constantly scan for input
void GetInput(string input)
{
if (input == "exit")
{
//terminate program
//this point is never reached after calling RecursiveFunc()
}
else if (input == "program1")
{
//Code executions gets stuck here because of recursion
RecursiveFunc();
int i = 0; //this statement is never reached, for example
}
}
void RecursiveFunc()
{
//update some values
//do some more things
//sleep a little, then call the function again
RecursiveFunc();
}
I'm thinking something similar to a fire-and-forget mechanism would be needed, but I can't quite figure it out. I could probably use threads, but I'm trying to avoid that (as the program should stay as simple as possible). As stated, I'm using Qt. So, what options do I have? What's the best solution in terms of simplicity?
Your options are threads, coroutines, or a message loop with timers.
Qt already has a message loop, so changing your architecture to use it is the simplest option.
Coroutines had no language support before C++20, but there are myriad implementations people have hacked together.
Threading is complex to get right, but keeps each piece of code looking mostly linear.
Conclusion: rewrite your code to be message-loop based. Instead of recursing and sleeping, post a delayed message to do the work later.
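To make the message-loop idea concrete, here is a minimal sketch in plain C++. It deliberately does not use Qt: `MessageLoop`, `periodicWork` and `handleInput` are hypothetical names, and the delay is omitted for brevity. In a Qt program the same pattern would normally be built on Qt's own event loop, for example by scheduling the next step with `QTimer::singleShot`.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Minimal single-threaded message loop (hypothetical, not Qt's).
class MessageLoop
{
public:
    void post(std::function<void()> task) { tasks_.push(std::move(task)); }
    void quit() { running_ = false; }

    // Runs queued tasks one at a time until quit() is called or the queue is empty.
    void run()
    {
        running_ = true;
        while (running_ && !tasks_.empty())
        {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            task();
        }
    }

private:
    std::queue<std::function<void()>> tasks_;
    bool running_ = false;
};

// Replacement for RecursiveFunc(): does one step, then re-posts itself
// instead of calling itself, so control returns to the loop between steps.
void periodicWork(MessageLoop& loop, int& counter)
{
    ++counter;  // "update some values"
    loop.post([&loop, &counter] { periodicWork(loop, counter); });
}

// Replacement for GetInput(): handles one input and returns immediately.
void handleInput(MessageLoop& loop, const std::string& input, int& counter)
{
    if (input == "exit")
        loop.quit();
    else if (input == "program1")
        loop.post([&loop, &counter] { periodicWork(loop, counter); });
}
```

Because each step returns to the loop before the next one runs, an "exit" input that arrives in between is handled as soon as the loop reaches it, and the call stack never grows.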
All right,
I found a way to achieve what I wanted without any fancy message loops and without rewriting my whole code. Instead of calling RecursiveFunc() recursively, I'm now calling GetInput() recursively (via QObject meta calls).
Simplified, this is my hackish solution:
//needs to constantly scan for input
void GetInput(string input)
{
if (input == "x")
{
//terminate program
}
else if (input == "program1")
{
RecursiveFunc();
//sleep a little
GetInput(""); //calls GetInput() recursively
}
}
void RecursiveFunc()
{
//update some values
//do some more things
}
I'm not sure if this is a very good practice, but it works for now.

How to free resources owned by local variables?

Some of the functions in my program need to run for a long time, so the user may want to interrupt them. The structure is like this:
int MainWindow::someFunc1()
{
//VP is a class defined somewhere.
VP vp1;
//the for loop that needs time to execute.
return 0;
}
int MainWindow::someFunc2()
{
VP vp2;
//another loop that consumes time.
return 0;
}
If the user runs either of these functions (or both at the same time) and clicks the exit button at the top right, the program will still run in the background until the loop is finished. I tried to free the resources in void closeEvent(QCloseEvent *):
void MainWindow::closeEvent(QCloseEvent *)
{
vp.stopIt();
}
However since vp1 and vp2 are local variables, I don't know how to pass them into the closeEvent() function and free resources. Any suggestions will be appreciated.
Since the variables are created on the stack, they will be freed automatically at the end of their scope (at the closing } of the function in your case), so you don't have to worry about them.
If you want to free them before the function ends, you need to re-implement the functions and probably allocate and free the memory for those variables by yourself, outside of the function. The way you pass them to the functions (either passing them as function arguments, or including them into the class) depends on you.
You can't. You should declare vp1 and vp2 in MainWindow as member variables.
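A minimal sketch of the member-variable approach, without any Qt dependency, might look like this. `Window` stands in for MainWindow, the `VP` class here models only a stop flag, and the `pumpEvents` hook is a hypothetical stand-in for whatever lets the close request reach the object while the loop is running (a worker thread or event processing in the real program).

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Stand-in for the question's VP class; only the stop flag is modelled.
class VP
{
public:
    void stopIt() { stopRequested_.store(true); }
    bool stopped() const { return stopRequested_.load(); }

private:
    std::atomic<bool> stopRequested_{false};
};

// Stand-in for MainWindow without any Qt dependency.
class Window
{
public:
    // Long-running loop. `pumpEvents` is a hypothetical hook called once per
    // iteration with the number of iterations completed so far.
    int someFunc1(int maxIterations, const std::function<void(int)>& pumpEvents)
    {
        assert(maxIterations >= 0);
        int done = 0;
        while (done < maxIterations && !vp1_.stopped())
        {
            ++done;              // one unit of work
            pumpEvents(done);
        }
        return done;
    }

    // Counterpart of closeEvent(): vp1_ is a member, so it is reachable here.
    void closeEvent() { vp1_.stopIt(); }

private:
    VP vp1_;
};
```

The flag is a `std::atomic<bool>` so that the same class would remain safe if `closeEvent()` were called from a different thread than the loop.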
As far as I understand the OP's requirement, they are looking for a way to interrupt someFunc1 or someFunc2 when the main window is closed.
Those functions run in the GUI thread, so the following statement is a misunderstanding:
the program will still run in background until the loop is finished
What actually happens is that the program runs until the function is complete, and only then is the user action processed by the framework. Therefore, by the time MainWindow::closeEvent is executed, nothing is running in the background and the resources have already been freed.
OP should move someFunc1 and someFunc2 to a worker thread.
Theoretically, you might be able to do this using setjmp. Something along these lines:
#include <setjmp.h>
jmp_buf doNotAttempt;
jmp_buf badPractice;
int MainWindow::someFunc1()
{
VP vp1;
for (...) {
// do stuff
if (setjmp(doNotAttempt)) { /*free resources, then: */ longjmp(badPractice,1); }
}
return 0;
}
// [...]
void MainWindow::closeEvent(QCloseEvent *)
{
if (!setjmp(badPractice))
longjmp(doNotAttempt,1);
else
{
// do the same for your other loop
}
}
In practice, do not do this - it's a terrible idea for all kinds of reasons. As other folks have said, just declare vp1 and vp2 as member variables.

Why do we need to call functions at run time using function pointers when we could just call them directly?

Having read a bit about function pointers and callbacks, I fail to understand their basic purpose. To me it just looks like, instead of calling the function directly, we use a pointer to that function to invoke it. Can anybody please explain callbacks and function pointers to me? How does the callback take place when we use function pointers, given that we seem to just call a function through a pointer instead of calling it directly?
Thanks
ps: There have been some questions asked here regarding callbacks and function pointers but they do not sufficiently explain my problem.
What is a callback function?
In simple terms, a callback function is one that is not called explicitly by the programmer. Instead, there is some mechanism that continually waits for events to occur, and it will call selected functions in response to particular events.
This mechanism is typically used when an operation (function) can take a long time to execute and the caller does not want to wait until the operation is complete, but does wish to be informed of its outcome. Typically, callback functions help implement such an asynchronous mechanism: the caller registers to be notified about the result of the time-consuming processing and continues with other operations, and at a later point in time the caller is informed of the result.
A practical example:
Windows event processing:
Virtually all Windows programs set up an event loop that makes the program respond to particular events (e.g. button presses, selecting a check box, a window getting focus) by calling a function. The handy thing is that the programmer can specify what function gets called when (say) a particular button is pressed, even though it is not possible to specify when the button will be pressed. The function that is called is referred to as a callback.
A source code illustration:
//warning: compiled only in my head, intended to illustrate the mechanism
#include <iostream>
#include <map>
typedef void (*Callback)();
std::map<int, Callback> callback_map;
void RegisterCallback(int event, Callback function)
{
callback_map[event] = function;
}
bool finished = false;
int GetNextEvent()
{
static int i = 0;
++i;
if (i == 5) finished = true; // stop the event loop after the fifth event
return i;
}
void EventProcessor()
{
int event;
while (!finished)
{
event = GetNextEvent();
std::map<int, Callback>::const_iterator it = callback_map.find(event);
if (it != callback_map.end()) // if a callback is registered for event
{
Callback function = it->second; // the map stores (event, callback) pairs
if (function)
{
(*function)();
}
else
{
std::cout << "No callback found\n";
}
}
}
}
void Cat()
{
std::cout << "Cat\n";
}
void Dog()
{
std::cout << "Dog\n";
}
void Bird()
{
std::cout << "Bird\n";
}
int main()
{
RegisterCallback(1, Cat);
RegisterCallback(2, Dog);
RegisterCallback(3, Cat);
RegisterCallback(4, Bird);
RegisterCallback(5, Cat);
EventProcessor();
return 0;
}
The above would output the following:
Cat
Dog
Cat
Bird
Cat
Hope this helps!
Note: This is from one of my previous answers, here
One very striking reason for why we need function pointers is that they allow us to call a function that the author of the calling code (that's us) does not know! A call-back is a classic example; the author of qsort() doesn't know or care about how you compare elements, she just writes the generic algorithm, and it's up to you to provide the comparison function.
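The qsort() case can be sketched as follows. `std::qsort` and its comparison-callback signature are standard. `compareAscending`, `compareDescending` and `sortInts` are names made up for this example.

```cpp
#include <cassert>
#include <cstdlib>

// Comparison callbacks with the signature std::qsort requires.
int compareAscending(const void* a, const void* b)
{
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // negative, zero or positive
}

int compareDescending(const void* a, const void* b)
{
    return compareAscending(b, a);
}

// Sorts `count` ints using whichever comparison the caller supplies.
// The sorting code itself never needs to know which ordering is in use.
void sortInts(int* values, std::size_t count,
              int (*compare)(const void*, const void*))
{
    assert(values != nullptr || count == 0);
    std::qsort(values, count, sizeof(int), compare);
}
```

The same `sortInts` call sorts in either direction depending only on the function pointer passed in, which is exactly the decoupling that a direct call cannot provide.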
But for another important, widely used scenario, think about dynamic loading of libraries - by this I mean loading at run time. When you write your program, you have no idea which functions exist in some run-time loaded library. You might read a text string from the user input and then open a user-specified library and execute a user-specified function! The only way you could refer to such function is via a pointer.
Here's a simple example; I hope it convinces you that you could not do away with the pointers!
typedef int (*myfp)(); // function pointer type
const char * libname = get_library_name_from_user();
const char * funname = get_function_name_from_user();
void * libhandle = dlopen(libname, RTLD_NOW); // load the library
myfp fun = (myfp) dlsym(libhandle, funname); // get our mystery function...
const int result = fun(); // ... and call the function
// -- we have no idea which one!
printf("Your function \"%s:%s\" returns %i.\n", libname, funname, result);
It's for decoupling. Look at sqlite3_exec() - it accepts a callback pointer that is invoked for each row retrieved. SQLite doesn't care what your callback does; it only needs to know how to call it.
Now you don't need to recompile SQLite each time your callback changes. You may have SQLite compiled once and then just recompile your code and either relink statically or just restart and relink dynamically.
It also avoids name collisions. If you have two libraries that both do sorting, and both expect you to define a function called sort_criteria that they can call, how would you sort two different object types with the same function?
Following all the ifs and switches inside that sort_criteria function would quickly get complicated. With callbacks, you can pass your own functions (with nicely descriptive names) to those sort functions.

optimizing branching by re-ordering

I have this sort of C function -- that is being called a zillion times:
void foo ()
{
if (/*condition*/)
{
}
else if(/*another_condition*/)
{
}
else if (/*another_condition_2*/)
{
}
/*And so on, I have 4 of them, but we can generalize it*/
else
{
}
}
I have a good test-case that calls this function, causing certain if-branches to be called more than the others.
My goal is to figure the best way to arrange the if statements to minimize the branching.
The only way I can think of is to write to a file every time an if branch is taken, thereby creating a histogram. This seems tedious. Is there a better way, or are there better tools?
I am building it on AS3 Linux, using gcc 3.4; using oprofile (opcontrol) for profiling.
It's not portable, but many versions of GCC support a function called __builtin_expect() that can be used to tell the compiler what we expect a value to be:
if(__builtin_expect(condition, 0)) {
// We expect condition to be false (0), so we're less likely to get here
} else {
// We expect to get here more often, so GCC produces better code
}
The Linux kernel uses these as macros to make them more intuitive, cleaner, and more portable (i.e. redefine the macros on non-GCC systems):
#ifdef __GNUC__
# define likely(x) __builtin_expect((x), 1)
# define unlikely(x) __builtin_expect((x), 0)
#else
# define likely(x) (x)
# define unlikely(x) (x)
#endif
With this, we can rewrite the above:
if(unlikely(condition)) {
// we're less likely to get here
} else {
// we expect to get here more often
}
Of course, this is probably unnecessary unless you're aiming for raw speed and/or you've profiled and found that this is a problem.
Try a profiler (gprof?) - it will tell you how much time is spent. I don't recall if gprof counts branches, but if not, just call a separate empty method in each branch.
Running your program under Callgrind will give you branch information. I also hope you have profiled and actually determined that this piece of code is a problem, as this looks like a micro-optimization at best. If it is able to, the compiler will generate a branch table from the if/else if/else chain, which replaces the chain of conditional branches with a single indirect jump (whether it can do so depends on what the conditions are, obviously). Even failing that, the branch predictor on your processor (assuming this is not for embedded work; if it is, feel free to ignore me) is pretty good at determining the target of branches.
In my opinion, the order you arrange them in doesn't really matter. The branch predictor will learn the most common branch and take it automatically anyway.
That said, there is something you could try. You could maintain a set of job queues and, based on the if statements, assign each job to the correct queue, then execute the queues one after another at the end.
This could be further optimised by using conditional moves and so forth (this does require assembler though, AFAIK). On condition a, conditionally move a 1 into a register that was initialised to 0. Place the pointer value at the end of the queue, and then decide whether or not to increment the queue counter by adding that conditional 1 or 0 to the counter.
Suddenly you have eliminated all branches and it becomes immaterial how many branch mispredictions there are. Of course, as with any of these things, you are best off profiling because, though it seems like it would provide a win ... it may not.
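The queue trick can be approximated in plain C++ without assembler, as in the sketch below. `JobQueues` and `enqueue` are hypothetical names, only two queues are shown, and jobs are plain ints for simplicity. Whether the compiler really emits branch-free machine code for this is not guaranteed and would need to be checked in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two job queues, each with one spare slot for the speculative write.
struct JobQueues
{
    std::vector<int> a;
    std::vector<int> b;
    std::size_t countA = 0;
    std::size_t countB = 0;

    explicit JobQueues(std::size_t maxJobs) : a(maxJobs + 1), b(maxJobs + 1) {}
};

// Writes the job to the tail of both queues unconditionally, then advances
// each counter by 0 or 1. The write to the "wrong" queue is overwritten by
// the next job because that queue's counter did not move.
void enqueue(JobQueues& q, int job, bool conditionA)
{
    assert(q.countA < q.a.size() && q.countB < q.b.size());
    q.a[q.countA] = job;
    q.b[q.countB] = job;
    q.countA += static_cast<std::size_t>(conditionA);
    q.countB += static_cast<std::size_t>(!conditionA);
}
```

After all jobs are enqueued, the first `countA` entries of `a` and the first `countB` entries of `b` hold the jobs for each branch, and each queue can be processed in a tight loop with no per-job decision.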
We use a mechanism like this:
// pseudocode
class ProfileNode
{
public:
inline ProfileNode( const char * name ) : m_name(name)
{ }
inline ~ProfileNode()
{
s_ProfileDict.Find(m_name).Value() += 1; // as if Value returns a nonconst ref
}
static DictionaryOfNodesByName_t s_ProfileDict;
const char * m_name;
};
And then in your code
void foo ()
{
if (/*condition*/)
{
ProfileNode node("Condition A"); // named, so it lives until the end of the branch
// ...
}
else if(/*another_condition*/)
{
ProfileNode node("Condition B");
// ...
} // etc..
else
{
ProfileNode node("Condition C");
// ...
}
}
void dumpinfo()
{
ProfileNode::s_ProfileDict.PrintEverything();
}
And you can see how it's easy to put a stopwatch timer in those nodes too and see which branches are consuming the most time.
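For reference, a compilable version of the same idea is sketched below. It replaces the pseudocode's `DictionaryOfNodesByName_t` with a `std::map`, which is an assumption of this sketch, and `classify` is a hypothetical function with the same if/else-if shape as foo().

```cpp
#include <cassert>
#include <map>
#include <string>

// Counts how often each named scope is left. The std::map replaces the
// hypothetical DictionaryOfNodesByName_t from the pseudocode.
class ProfileNode
{
public:
    explicit ProfileNode(const char* name) : m_name(name) { assert(name != nullptr); }
    ~ProfileNode() { s_counts[m_name] += 1; }

    ProfileNode(const ProfileNode&) = delete;
    ProfileNode& operator=(const ProfileNode&) = delete;

    static std::map<std::string, int> s_counts;

private:
    const char* m_name;
};

std::map<std::string, int> ProfileNode::s_counts;

// Example function with the same if/else-if shape as foo() in the question.
void classify(int value)
{
    if (value < 0)
    {
        ProfileNode node("negative");
        // ...
    }
    else if (value == 0)
    {
        ProfileNode node("zero");
        // ...
    }
    else
    {
        ProfileNode node("positive");
        // ...
    }
}
```

After a test run, iterating over `ProfileNode::s_counts` gives the per-branch histogram. Note that the map lookup adds overhead of its own, so this is for counting branches and not for fine-grained timing.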
Some counters may help. Once you have seen the counters, and if there are large differences between them, you can sort the conditions in decreasing order of frequency.
static int cond_1, cond_2, cond_3, ...
void foo (){
if (condition){
cond_1 ++;
...
}
else if(/*another_condition*/){
cond_2 ++;
...
}
else if (/*another_condition_2*/){
cond_3 ++;
...
}
else{
cond_N ++;
...
}
}
EDIT: a "destructor" can print the counters at the end of a test run:
void cond_print(void) __attribute__((destructor));
void cond_print(void){
printf( "cond_1: %6i\n", cond_1 );
printf( "cond_2: %6i\n", cond_2 );
printf( "cond_3: %6i\n", cond_3 );
printf( "cond_4: %6i\n", cond_4 );
}
I think it is enough to modify only the file that contains the foo() function.
Wrap the code in each branch into a function and use a profiler to see how many times each function is called.
Line-by-line profiling gives you an idea which branches are called more often.
Using something like LLVM could make this optimization automatically.
As a profiling technique, this is what I rely on.
What you want to know is: Is the time spent in evaluating those conditions a significant fraction of execution time?
The samples will tell you that, and if not, it just doesn't matter.
If it does matter, for example if the conditions include function calls that are on the stack a significant part of the time, what you want to avoid is spending much time in comparisons that are false. The way you tell this is, if you often see it calling a comparison function from, say, the first or second if statement, then catch it in such a sample and step out of it to see if it returns false or true. If it typically returns false, it should probably go farther down the list.