My question is based on curiosity and not whether there is another approach to the problem or not. It is a strange/interesting question, so please read it with an open mind.
Let's assume there is a game loop that is being called every frame. The game loop in turn calls several functions through a myriad of if statements. For example, if the user has set GUI to false then don't refresh the GUI, otherwise call RefreshGui(). There are many other if statements in the loop, and they call their respective functions if they are true. Some are if/else-if.../else chains, which are more costly in the worst case. Even the functions that are called when an if statement is true have logic of their own. If the user wants raypicking on all objects, call FunctionA(); if the user wants raypicking on lights, call FunctionB(); ... ; else call all functions. Hopefully you get the idea.
My point is, that is a lot of redundant if statements. So I decided to use function pointers instead. Now my assumption is that a function pointer is always going to be faster than an if statement. It is a replacement for if/else. So if the user wants to switch between two different camera modes, he/she presses the C key to toggle between them. The callback function for the keyboard changes the function pointer to the correct UpdateCamera function (in this case, the function pointer can point to either UpdateCameraFps() or UpdateCameraArcBall() )... you get the gist of it.
Now to the question itself. What if I have several update functions all with the same signature (let's say void (*Update)(float time) ), so that a function pointer can potentially point to any one of them. Then, I have a vector which is used to store the pointers. Then in my main update loop, I go through the vector and call each update function. I can remove/add and even change the order of the updates, without changing the underlying code. In the best case, I might only be calling one update function or in the worst case all of them, all with a very clean while loop and no nasty (potentially nested) if statements. I have implemented this part and it works great. I am aware, that, with each iteration of the while loop responsible for iterating through the vector, I am checking whether the itrBegin == itrEnd. More specifically while (itrBegin != itrEnd). Is there any way to avoid the call to the if statements? Can I use branch prediction to my advantage (or am I taking advantage of it already without knowing)?
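For reference, a minimal sketch of the setup described above. Only the signature void (*Update)(float time) comes from the question; the counters, RunUpdates, and the exact parameter lists of UpdateCameraFps and RefreshGui are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Signature taken from the question; the alias name is hypothetical.
using UpdateFn = void (*)(float time);

// Counters stand in for real work so the effect of each call is observable.
int g_cameraUpdates = 0;
int g_guiUpdates = 0;

void UpdateCameraFps(float /*time*/) { ++g_cameraUpdates; }
void RefreshGui(float /*time*/) { ++g_guiUpdates; }

// Calls every registered update once, in the order stored in the vector.
void RunUpdates(const std::vector<UpdateFn>& updates, float time) {
    for (UpdateFn update : updates) {
        assert(update != nullptr);
        update(time);  // indirect call; the loop still tests for the end each iteration
    }
}
```

Adding, removing, or reordering entries in the vector changes what runs per frame without touching RunUpdates. The end-of-range comparison remains, but it is a single, highly predictable branch per iteration.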
Again, please take the question as-is, i.e. I am not looking for a different approach (although you are more than welcome to give one).
EDIT: A few replies state that this is an unneeded premature optimization and I should not be focusing on it and that the if-statement(s) cost is minuscule compared to the work done in all the separate update functions. Very true, and I completely agree, but that was not the point of the question and I apologize if I did not make the question clearer. I did learn quite a few new things with all the replies though!
there is a game loop that is being called every frame
That's a backwards way of describing it. A game loop doesn't run during a frame, a frame is handled in the body of the game loop.
my assumption is that a function pointer is always going to be faster than an if statement
Have you tested that? It's not likely to be true, especially if you're changing the pointer frequently (which really messes with the CPU's branch prediction).
Can I use branch prediction to my advantage (or am I taking advantage of it already without knowing)?
This is just wishful thinking. By having one indirect call inside your loop calling a bunch of different functions you are definitely working against the CPU branch prediction logic.
More specifically while (itrBegin != itrEnd). Is there any way to avoid the call to the if statements?
One thing you could do in order to avoid conditionals as you iterate the chain of functions is to use a linked list. Then each function can call the next one unconditionally, and you simply install your termination logic as the last function in the chain (longjmp or something). Or you could hopefully just never terminate, include glSwapBuffers (or the equivalent for your graphics API) in the list and just link it back to the beginning.
First, profile your code. Then optimize the parts that need it.
"if" statements are the least of your concerns. Typically, with optimization, you focus on loops, I/O operations, API calls (e.g. SQL), containers/algorithms that are inefficient and used frequently.
Using function pointers to try to optimize is typically the worst thing you can do. You kill any chance at code readability and work against the CPU and compiler. I recommend using polymorphism or just use the "if" statements.
To me, this is asking for an event-driven approach. Rather than checking every time if you need to do something, monitor for the incoming request to do something.
I don't know if you consider it a deviation from your approach, but it would reduce the number of if...then statements to 1.
while( active )
{
    // check message queue
    if( messages )
    {
        // act on each message and update flags accordingly
    }
    // draw based on flags (whether or not they changed is irrelevant)
}
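A slightly more concrete C++ sketch of the "act on each message and update flags" step, under stated assumptions: the Message enum, the Flags struct, and ApplyMessages are hypothetical names, and a std::queue stands in for whatever message source the real program has.

```cpp
#include <cassert>
#include <queue>

// Hypothetical message kinds; a real program would map key presses to these.
enum class Message { ToggleGui, ToggleCamera, Quit };

// Hypothetical flags consulted by the drawing code each frame.
struct Flags {
    bool guiVisible = true;
    bool arcBallCamera = false;
    bool active = true;
};

// Drains the queue and updates the flags; state only changes when a message arrives.
void ApplyMessages(std::queue<Message>& messages, Flags& flags) {
    while (!messages.empty()) {
        switch (messages.front()) {
            case Message::ToggleGui:    flags.guiVisible = !flags.guiVisible;       break;
            case Message::ToggleCamera: flags.arcBallCamera = !flags.arcBallCamera; break;
            case Message::Quit:         flags.active = false;                       break;
        }
        messages.pop();
    }
}
```

The drawing code still reads the flags every frame, but the decision logic runs only when something actually changed.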
EDIT: Also I agree with the poster who stated that the loop should not be based on frames; the frames should be based on the loop.
If the conditions checked by your ifs are not changing during the loop, you could check them all once, and set a function pointer to the function you'd like to call in that case. Then in the loop call the function the function pointer points to.
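A minimal sketch of that idea, assuming the two camera functions from the question take a float time argument. SelectCameraUpdate, RunFrames, and the counters are hypothetical names introduced here.

```cpp
#include <cassert>

using UpdateFn = void (*)(float time);  // hypothetical alias

int g_fpsCalls = 0;
int g_arcBallCalls = 0;

void UpdateCameraFps(float /*time*/) { ++g_fpsCalls; }
void UpdateCameraArcBall(float /*time*/) { ++g_arcBallCalls; }

// The condition is evaluated once, outside the loop.
UpdateFn SelectCameraUpdate(bool useArcBall) {
    return useArcBall ? UpdateCameraArcBall : UpdateCameraFps;
}

// The loop body contains no camera-mode test, only the indirect call.
void RunFrames(UpdateFn update, int frames, float dt) {
    assert(update != nullptr);
    for (int i = 0; i < frames; ++i) {
        update(dt);
    }
}
```

Whether this beats a plain if inside the loop is something to measure; a branch that always goes the same way is usually predicted well.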
Why does the Erlang if statement support only specific functions in its guard?
For example:
ok(A) ->
    if
        whereis(abc) =:= undefined ->
            register(abc, A);
        true ->
            exit(already_registered)
    end.
In this case we get an "illegal guard" error.
What would be the best practice to use function's return values as conditions?
Coming from other programming languages, Erlang's if seems strangely restrictive, and in fact, isn't used very much, with most people opting to use case instead. The distinction between the two is that while case can test any expression, if can only use valid Guard Expressions.
As explained in the above link, Guard Expressions are limited to known functions that are guaranteed to be free of side-effects. There are a number of reasons for this, most of which boil down to code predictability and inspectability. For instance, since matching is done top-down, guard expressions that don't match will be executed until one is found that does. If those expressions had side-effects, it could easily lead to unpredictable and confusing outcomes during debugging. While you can still accomplish that with case expressions, if you see an if you can know there are no side effects being introduced in the test without needing to check.
One last, but important thing, is that guards have to terminate. If they did not, the reduction of a function call could go on forever, and as the scheduler is based around reductions, that would be very bad indeed, with little to go on when things went badly.
As a counter-example, you can starve the scheduler in Go for exactly this reason. Because Go's scheduler (like all micro-process schedulers) is co-operatively multitasked, it has to wait for a goroutine to yield before it can schedule another one. Much like in Erlang, it waits for a function to finish what it's currently doing before it can move on. The difference is that Erlang has no loop-alike. To accomplish looping, you recurse, which necessitates a function call/reduction, and allows a point for the scheduler to intervene. In Go, you have C-style loops, which do not require a function call in their body, so code akin to for { i = i+1 } will starve the scheduler. Not that such loops without function calls in their body are super-common, but this issue does exist.
On the contrary, in Erlang it's extremely difficult to do something like this without setting out to do so explicitly. But if guards contained code that didn't terminate, it would become trivial.
Check this question: About the usage of "if" in Erlang language
In short:
Only a limited number of functions are allowed in guard sequences, and whereis is not one of them
Use case instead.
I have a function, which is executed hundreds of millions of times in a typical program run. This function performs a certain main task, but, if the user so desires, it should perform some slight variations of that main task. The obvious way to implement this would be something like this:
void f(bool do_option)
{
    // Do the first part
    if (do_option)
    {
        // Optional extra code
    }
    // Continue normal execution
}
However, this is not very elegant, since the value of do_option does not change during a program run. The if statement is unnecessarily being performed very often.
I solved it by turning do_option into a template parameter. I recompile the program every time I want to change it. Right now, this workflow is acceptable: I don't change these options very often and I am the sole user/developer. In the future though, both these things will change, so I want a single binary with command-line switches.
Question is: what is the best or most elegant way to deal with this situation? I don't mind having a large binary with many copies of f. I could create a map from a set of command-line parameters to a function to execute, or perhaps use a switch. But I don't want to maintain that map by hand -- there will probably be more than five such parameters.
By the way, I know somebody is going to say that I'm prematurely optimizing. That is correct, I tested it. In my specific case, the performance of runtime ifs is not much worse than my template construction. That doesn't mean I'm not interested if nicer solutions are possible.
On a modern (non-embedded) CPU, the branch predictor will be smart enough to recognize that the same path is taken every time, so an if statement is a perfectly acceptable (and readable) way of handling your situation.
On an embedded processor, compiler optimizations should be smart enough to get rid of most of the overhead of the if statement.
If you're really picky, you can use the template method that you mentioned earlier, and have an if statement select which version of the function to execute.
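A minimal sketch of that last suggestion for a single option. The arithmetic is a placeholder for the real work, and FFn, SelectF, and RunMany are hypothetical names; with several options the selection step grows, so this only shows the shape of the idea.

```cpp
#include <cassert>

// DoOption is a compile-time constant, so each instantiation contains
// only the code path it needs once the optimizer drops the dead branch.
template <bool DoOption>
long f(long x) {
    long result = x + 1;   // "do the first part" (placeholder work)
    if (DoOption) {
        result *= 2;       // "optional extra code" (placeholder work)
    }
    return result - 3;     // "continue normal execution" (placeholder work)
}

using FFn = long (*)(long);

// The one runtime if: pick the instantiation from the command-line switch.
FFn SelectF(bool do_option) {
    return do_option ? &f<true> : &f<false>;
}

// Hot loop: calls the pre-selected version, no per-call option test.
long RunMany(FFn fn, long count) {
    assert(fn != nullptr);
    long total = 0;
    for (long i = 0; i < count; ++i) {
        total += fn(i);
    }
    return total;
}
```

Calling through a function pointer can prevent inlining, so if f is small it may be better to pass the bool to a templated driver loop and select that instead.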
I'm calling a function fooA from main() that calls another function fooB that is recursive.
When I wish to return, I keep using exit(1) to halt execution. What is the right way to exit when the recursion tree is deep?
Returning through the recursion stack may not help, because returning usually clears the partial solution I have built, and I don't want that. I want to execute more code from main() afterwards.
I read Exceptions can be used, it would be nice if I can get a code snippet.
The goto statement won't work to hop from one function back to another; Nikos C. is correct that it wouldn't account for releasing the stack frames of each of the calls you've made, so when you got to the function you goto'ed to, the stack pointer would be pointing to the stack frame of the function you were just in... no, that just won't work. Similarly, you can't simply call (either directly, or indirectly via a function pointer) the function you want to end up in when your algorithm is done. You'd never get back to the context you were in prior to diving into your recursive algorithm. You could conceivably architect a system this way, but in essence each time you did this you'd "leak" what was currently on the stack (not quite the same as leaking heap memory, but a similar effect). And if you were deep into a highly recursive algorithm, that could be a lot of "leaked" stack space.
No, you need to somehow return back to the calling context. There are only three ways to do so in C++:
1. Exit each function in turn by returning from it to its caller, backing up through the call chain in an orderly fashion.
2. Throw an exception and catch it at the point right after you launched into your recursive algorithm (which automatically destroys any objects created by each function on the stack in an orderly fashion).
3. Use setjmp() & longjmp() to do something similar to throwing & catching an exception, but "throwing" a longjmp() will not destroy objects on the stack; if any such objects own heap allocations, those allocations will be leaked.
To do option 1, you have to write your recursive function such that once a solution is reached, it returns some sort of indication that it's complete to its caller (which may be the same function), and its caller sees that fact & relays that fact on to its caller by returning to it (which may be the same function), so on and so on, until finally all stack frames of the recursive algorithm are released and you return to whatever function called the first function in the recursive algorithm.
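A minimal sketch of option 1, using a root-to-node path search in a binary tree as a stand-in for the recursive algorithm. Node and FindPath are hypothetical names; the point is only that a true return value is relayed up the chain without undoing the partial solution.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int value;
    Node* left;
    Node* right;
};

// Returns true as soon as target is found; path then holds root..target.
// On failure the function removes what it added, so path is left unchanged.
bool FindPath(const Node* node, int target, std::vector<int>& path) {
    if (node == nullptr) {
        return false;
    }
    path.push_back(node->value);
    if (node->value == target) {
        return true;  // solution reached: tell the caller we are done
    }
    if (FindPath(node->left, target, path) || FindPath(node->right, target, path)) {
        return true;  // relay "done" upward without touching path
    }
    path.pop_back();  // only undo work when this subtree failed
    return false;
}
```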
To do option 2, you wrap the call to your recursive algorithm in a try{...} and immediately after it you catch(){...} the expected thrown object (which could conceivably be the result of the computation, or just some object that lets the caller know "hey, I'm done, you know where to find the result"). Example:
try
{
    callMyRecursiveFunction(someArg);
}
catch( whateverTypeYouWantToThrow& result )
{
    // ...do whatever you want to do with the result,
    // including copy it to somewhere else...
}
...and in your recursive function, when you finish the results, you simply:
throw(whateverTypeYouWantToThrow(anyArgsItsConstructorNeeds));
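Putting the two fragments together, a self-contained sketch might look like the following. SearchDone, DescendAndThrow, and RunWithException are hypothetical names, and counting depth stands in for the real recursive work.

```cpp
#include <cassert>

// Hypothetical payload type carrying the result out of the recursion.
struct SearchDone {
    int depthReached;
};

// Recurses until depth equals target (throws) or the limit is hit (returns normally).
void DescendAndThrow(int depth, int limit, int target) {
    if (depth == target) {
        throw SearchDone{depth};  // unwinds every frame in one step
    }
    if (depth >= limit) {
        return;                   // gave up: ordinary return path
    }
    DescendAndThrow(depth + 1, limit, target);
}

// Returns the depth at which the target was found, or -1 if it was not.
int RunWithException(int limit, int target) {
    try {
        DescendAndThrow(0, limit, target);
    } catch (const SearchDone& done) {
        return done.depthReached;
    }
    return -1;  // recursion returned normally without finding the target
}
```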
To do option 3...
#include <setjmp.h>
static jmp_buf jmp; // could be allocated other ways; the longjmp() user just needs to have access to it.
.
.
.
if (!setjmp(jmp)) // setjmp() returns zero 1st time, or whatever int value you send back to it with longjmp()
{
    callMyRecursiveFunction(someArg);
}
...and in your recursive function, when you finish the results, you simply:
longjmp(jmp, 1); // this passes 1 back to the setjmp(). If your result is an int, you
// could pass that back to setjmp(), but you can't pass zero back.
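A self-contained sketch of the same fragments, restricted to plain int data so that no destructors are skipped. g_jumpTarget, g_result, DescendAndJump, and RunWithLongjmp are hypothetical names; the result travels through a global because locals modified between setjmp() and longjmp() are not reliable.

```cpp
#include <cassert>
#include <csetjmp>

static std::jmp_buf g_jumpTarget;  // saved context to jump back to
static int g_result = 0;           // result handed back through static storage

// Recurses until depth reaches target, then jumps straight back to the setjmp() site.
void DescendAndJump(int depth, int target) {
    if (depth >= target) {
        g_result = depth;
        std::longjmp(g_jumpTarget, 1);  // second argument must be non-zero to be seen as such
    }
    DescendAndJump(depth + 1, target);
}

// Starts the recursion; control arrives after the if either way.
int RunWithLongjmp(int target) {
    if (setjmp(g_jumpTarget) == 0) {   // zero on the initial call
        DescendAndJump(0, target);     // never returns normally in this sketch
    }
    return g_result;
}
```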
The bad thing about using setjmp()/longjmp() is that if there are any stack-allocated objects still "alive" on the stack when you call longjmp(), execution will jump back to the setjmp() point, skipping the destructors for those objects. If your algorithm uses only POD types, that's not an issue. It's also not an issue if the non-POD types your algorithm uses do NOT contain any heap allocations (e.g. from malloc() or new). If your algorithm uses non-POD types that contain heap allocations, then you're only safe with options 1 & 2 above. But if your algorithm meets the criteria of being OK with setjmp()/longjmp(), and if your algorithm is buried under a ton of recursive calls at the point it finishes, setjmp()/longjmp() may be the fastest way back to the initial calling context. If that won't work, option 1 is probably your best bet in terms of speed. Option 2 may seem convenient (and would possibly eliminate a condition check at the start of each recursion call), but the overhead associated with the system automatically unwinding the callstack is somewhat significant.
It's typically said you should reserve exceptions for "exceptional events" (events expected to be very rare), and the overhead associated with unwinding the callstack is why. Older compilers used something akin to setjmp()/longjmp() to implement exceptions (setjmp() at the location of the try & catch, and longjmp() at the location of a throw), but there was of course extra overhead associated with determining what objects on the stack need to be destroyed, even if there are no such objects. Plus, every time you'd run across a try, it would have to save the context just in case there was a throw, and if exceptions are truly exceptional events, the time spent saving that context was simply wasted. Newer compilers are now more likely to use what are known as "Zero Cost Exceptions" (a.k.a. Table Based Exceptions), which sounds like it would solve all the world's problems, but it doesn't. It makes normal runtime faster because there is no longer a need to save the context every time you run across a try, but in the event that a throw executes, there is now even more overhead associated with decoding information stored in massive tables that the runtime has to process in order to figure out how to unwind the stack based on the location where the throw was encountered and the content of the runtime stack. So exceptions aren't free, even though they're very convenient. You'll find a lot of stuff on the internet where people make claims about how unreasonably expensive they are and how much they slow down your code, and you'll also find lots of stuff by people refuting those claims, with both sides presenting hard data to bolster their claims. What you should take away from the arguments is that using exceptions is great if you expect them to rarely occur, because they result in cleaner interfaces & logic that's free of a ton of condition checking for "badness" every time you make a function call.
But you shouldn't use exceptions as a means of normal communication between a caller and its callees, because that mode of communication is significantly more expensive than simply using return values.
This happened to me while finding the path from the root to a node of a binary tree. I was using a stack to store the nodes in preorder, and the recursion wouldn't stop until the last node returned NULL. I used a global variable, an integer i=1, and when I reached the node I was looking for I set that variable to 0 and used while(i==0) return stack; to let the program go back up the call stack without popping my nodes off.
This is a followup to Clojure: Compile time insertion of pre/post functions
My goal is to call a debug function instead of throwing an exception. I am looking for the best way to store a list of stack frames, function calls and their arguments, to accomplish this.
I want to have a function (my-uber-debug), so that when I call it (instead of throwing an exception), the following things happen:
a new Java window pops up
there is a record of the current clojure stack frame
for each stack frame, there is a record of the argument passed to the function
This is so that I can move up/down the stack frames, and examine the arguments passed to get to this current point. [If somehow, magically, we can get the variables defined in "let" environments, that'd be awesome too.]
Current Idea
I'm going to have a thread local variable uber-debug, which has type:
List of StackFrames
where StackFrame = function + arguments
At each function call, it's going to push (cons) the current function + arguments onto uber-debug; then, at the end of the function call, it's going to remove the first element from uber-debug.
Then, when I call (my-uber-debug), it just pops up a new java window, and lets me interact with uber-debug
Question
The ideas I've had so far are probably not ideal for setting this up. What is the right way to solve this problem?
Edit:
The question is NOT about the Swing/GUI part. It's about how to store the stack frames.
Thanks!
Your answer may depend on a lot of factors, so I am going to answer this by giving you my thoughts.
If you merely want to store function calls and their parameters when an exception occurs, then either write a macro or function as a wrapper to accomplish this. You would then have to pass all functions to be called to this wrapper. The wrapper would perform the try catch operation and whatever else you need.
You might also want to look into Clojure meta data in addition to writing the wrapper, because your running code could look at its meta-data and make some decisions based on that as well. I have never used meta data, but the information at the link looks promising.
As a final thought, it might be helpful for you to further delineate what you want to accomplish by doing this by editing your original post and putting the information there.
For example, are these stack traces for a library or a main program?
As to storing all this information, are multiple threads going to need it, or just one?
Can you get by storing the information in a let binding at the highest level of your program, or do you need something like a ref?
I'm trying to fine tune some benchmark code we are using and am wondering if there is a way to communicate to GCC explicitly how to order certain bits of code. For example, given these blocks of code:
1. Pre
2. Start-Timer
3. Body
4. Stop-Timer
5. Post
I wish to tell GCC that each block must be kept in the above order without any instruction leakage into the other blocks. Ideally the timer would measure only Step 3; however, for practical reasons, measuring at least Step 3 and at most Steps 2-4 will suffice. I just want to make sure I'm not measuring any part of Step 1 or 5.
Currently I use a __sync_synchronize in the Timer functions to issue a full memory fence. My hope is that, in addition to being a fence, this function is also marked to prevent reordering.
Is this call to __sync_synchronize sufficient? Also logically, would the C++11 fence commands also suffice according to the text of the standard?
If the Start-Timer is a function call and the Stop-Timer is another function call, the optimizer has little opportunity to move the Body around, or spill material from Pre or Post into Body.
All the side-effects from Pre must be complete before the Start-Timer function is called (there's a sequence point there). All the side effects of Stop-Timer must be complete before executing Post (there's a sequence point there, too). So the compiler would have to have the code for Start-Timer and Stop-Timer visible to monkey with the generated code, spilling material around, and I'm not convinced it could do so even then.
So, in summary, I don't think you have to worry about it if you use function calls to start and stop the timer.
Make two versions of the code: one with the real code you want to measure, one with stubs. Measure both. Subtract. Then, I think, you needn't care what GCC does.
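A rough sketch of the measure-both-and-subtract idea, under stated assumptions: RealBody, StubBody, TimeCall, and NetCost are hypothetical names, the loop is placeholder work, and the volatile sink is there so the work is not optimized away.

```cpp
#include <cassert>
#include <chrono>

// volatile so the loop's loads and stores must actually be performed.
volatile long long g_sink = 0;

// Stand-in for the code under test.
void RealBody() {
    for (long long i = 0; i < 100000; ++i) {
        g_sink = g_sink + i;
    }
}

// Same call shape, no work: captures timer and call overhead only.
void StubBody() {}

// Times one call through a function pointer using a monotonic clock.
std::chrono::nanoseconds TimeCall(void (*body)()) {
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Real measurement minus the overhead-only measurement.
std::chrono::nanoseconds NetCost() {
    return TimeCall(RealBody) - TimeCall(StubBody);
}
```

A single run is noisy, and the difference can even come out negative for very short bodies, so in practice you would repeat both measurements many times and compare averages or minimums.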