Performance question on nested if's

Performance question on nested if's - c++

is there be any performance effect on "Lines of code - (C)" running inside nested ifs?
if (condition_1)
{
/* Lines of code */ - (A)
if (condition_2)
{
/* Lines of code */ - (B)
if (condition_n)
{
/* Lines of code */ - (C)
}
}
}
Does that mean you can nest any number of if statements without effecting the execution time for the code enclosing at the end of last if statement?

Remember C and C++ are translated to their assembly equivalents. In most cases, this is likely to be via some form of compare (e.g. cmp) and some form of jmp instruction.
As such, whatever code is generated from (C) will still be the same. The if nesting has no bearing on the output. If the resultant code is to generate add eax, 1 no matter how many ifs precede that, it will still be the same thing.
The only performance penalty will be in the number of if statements you use and whether or not the resultant assembly (jxx) is expensive on your system. However, I doubt that repeated nested use of if is likely to be a performance bottleneck in your application. Usually, it is time required to process data and or time required to get data.

You won't affect the execution time of the indicated code itself, but if evaluating your conditions is complex, or affected by other factors, then it could potentially lengthen the total time of execution.

The code will run as fast as if it was outside.
Just remember that evaluating an expression (in a if statement) is not "free" and will take a bit of time (more if the condition is more complex), so if your code is deeply nested it will take more time to reach it.

Related

Performance impact of using 'break' inside 'for-loop'

I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related question refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break gets almost never called). And if it has, I would also like to know tentatively how big the penalization is.
I am quite suspicions that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:
Independently of the extra code for the conditional statements that
trigger the break (like an if), it necessarily ads additional
instructions to my loop.
Further, it probably also messes around when my compiler tries to
unfold the for-loop, as it no longer knows the number of iterations
that will run at compile time, effectively rendering it into a
while-loop.
Therefore, I suspect it does have a performance impact, which could be
considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I loose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
// do stuff
}
// USING FOR
for(int i=100; i; --i)
{
if(checkCondition()) {
// do stuff
} else {
break;
}
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -s (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense) as I am not sure if I got this completely right :)

The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.
The real answer is that CPU have powerful branch prediction circuits which empirically work really well.
What will happen is that your CPU will choose if the branch is going to be taken or not and execute the code as if the if condition is not even present. Of course this relies on multiple assumptions, like not having side effects on the condition calculation (so that part of the body loop depends on it) and that that condition will always evaluate to false up to a certain point in which it will become true and stop the loop.
Some compilers also allow you to specify the likeliness of an evaluation as a hint the branch predictor.
If you want to see the semantic difference between the two code versions just compile them with -S and examinate the generated asm code, there's no other magic way to do it.

The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
stop &= checkcondition(); // returns 0 or all-bits-set;
sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i<32; i++)
{
if (!checkcondition())
break;
sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.

Is it good practice to construct long circuit statements?

Question Context: [C++] I want to know what is theoretically the fastest, and what the compiler will do. I don't want to hear about premature optimization is the root of all evil, etc.
I was writing some code like this:
bool b0 = ...;
bool b1 = ...;
if (b0 && b1)
{
...
}
But then I was thinking: the code, as-is, will compile into two TEST instructions, if compiled without optimizations. This means two branches. So I was thinking that it might be better to write:
if (b0 & b1)
Which will produce only one TEST instruction, if no optimization is done by the compiler. But then I feel that this is against my code-style. I usually write && and ||.
Q: What will the compiler do if I turn on optimization flags (-O1, -O2, -O3, -Os and -Ofast). Will the compiler automatically compile it like &, even if I have used a && in the code? And what is theoretically faster? Does the behavior change if I do this:
if (b0 && b1)
{ ... }
else if (b0)
{ ... }
else if (b1)
{ ... }
else
{ ... }
Q: As I could have guessed, this is very depended on the situation, but is it a common trick for a compiler to replace a && with a &?

Q: What will the compiler do if I turn on optimization flags (-O1, -O2, -O3, -Os and -Ofast).
Most likely nothing more to increase the optimization.
As stated in my comments, you really can't optimize the evaluation any further than:
AND B0 WITH B1 (sets condition flags)
JUMP ZERO TO ...
Although, if you have a lot of simple boolean logic or data operations, some processors may conditionally execute them.
Will the compiler automatically compile it like &, even if I have used a && in the code?
And what is theoretically faster?
In most platforms, there is no difference in evaluation of A & B versus A && B.
In the final evaluation, either a compare or an AND instruction is executed, then a jump based on the status. Two instructions.
Most processors don't have Boolean registers. It's all numbers and bits.
Optimize By Boolean Logic
Your best option is to review the design and set up your algorithms to use Boolean algebra. You can than simplify the Boolean expressions.
Another option is to implement the code so that the compiler can generate conditional assembly instructions, if the platform supports them.
Optimize: Reduce jumps
Processors favor arithmetic and data transfers over jumps.
Many processors are always feeding an instruction pipeline. When it comes to a conditional branch instruction, the processor has to wait (suspend the instruction prefetching) until the condition status is determined. Then it can determine where the next instruction will be fetched.
If you can't remove the jumps, such as in a loop, make the ratio of data processing to jumping bigger in the data side. Search for "Loop Unrolling". Many compilers will perform this when optimization levels are increased.
Optimize: Data Cache
You may notice increased performance by organizing your data for best data cache usage.
For example, instead of 3 large arrays, use one array of a structure containing 3 elements. This allows the elements in use to be close to each other (and reduce the likelihood of accessing data outside of the cache).
Summary
The difference in evaluation of A && B versus A & B as conditional expressions is known as a micro-optimization. You will achieve improved performance by using Boolean algebra to reduce the quantity of conditional expressions. Jumps, or changes in execution path, slow down instruction execution. Fetching data outside of the data cache also slows down execution. You will most likely get better performance by redesigning your code and helping the compiler to reduce the branches and more effective use of the data cache.

If you care about what's fastest, why do you care what the compiler will do without optimisation?
Q: As I could have guessed, this is very depended on the situation, but is it a common trick for a compiler to replace a && with a &?
This question seems to assume that the compiler transforms C++ code into more C++ code. It doesn't. It transforms your code into machine instructions (including the assembler as part of the compiler for argument's sake). You should not assume there is a one-to-one mapping from a C++ operator like && or & to a particular instruction.
With optimisation the compiler will do whatever it thinks will be faster. If a single instruction would be faster the compiler will generate a single instruction for if (b0 && b1), you don't need to bugger up your code with micro-optimisations to help it make such a simple transformation.
The compiler knows the instruction set it's using, it knows the context the condition is in and whether it can be removed entirely as dead code, or moved elsewhere to help the pipeline, or simplified by constant propagation, etc. etc.
And if you really care about what's fastest, why would you compute b1 until you know it's actually needed? If obtaining the value of b1 has no side effects the compiler could even transform your code to:
bool b0 = ...;
if (b0)
{
bool b1 = ...;
if (b1)
{
Does that mean two if conditions are faster than a &?! Of course not.
In other words, the whole premise of the question is flawed. Do not compromise the readability and simplicity of your code in the misguided pursuit of the "theoretically fastest" micro-optimisation. Spend your time improving the algorithms and data structures used not trying to second guess which instructions the compiler will generate.

DSP performance, what should be avoided?

I am starting with dsp programming right now and am writing my first low level classes and functions.
Since I want the functions to be fast (or at last not inefficient), I often wonder what I should use and what I should avoid in functions which get called per sample.
I know that the speed of an instruction varies quite a bit but I think that some of you at least can share a rule of thumb or just experience. :)
conditional statements
If I have to use conditions, switch should be faster than an if / else if block, right?
Are there differences between using two if-statements or an if-else? Somewhere I read that else should be avoided but I don't know why.
Also, compared to a multiplication, is there a rude estimation how much more time an if-block takes? Because in some cases, using multiplications by zero could be used instead of if-statements:
//something could be an int either 1 or 0:
if(something) {
signal += something_else;
}
// or:
signa+ += something*something_else;
functions and function-pointers
Instead of using conditional statements, you could use function-pointer. Instead of using conditions in every call, the pointer could be redirected to a specific function. However, for every call, the pointer had to be interpreted in order to call the right function. So I don't know if this would help or not.
What I also wonder is if calling functions have an impact. If so, boxing functions should be avoided, right?
variables
I would think that defining and using many variables in a function doesn't realy have an impact, at least relative to calculations. Is this true? If not, reusing declared variables would be better than more declaration.
calculations
Is there an order of calculation-types in term of the time they take to execute? I am sure that this highly depends on the context but a rule of thumb would be nice. I often read that people only count the multiplication in an algorithm. Is this because additions are realtively fast?
Does it make a difference between multiplication and division? (*0.5 or /2.0)
I hope that you can share soem experience.
Cheers

here are part of the answers:
calculations (talking about native precision of the processor for example 32bits):
Most DSP microprocessors have single cycle multipliers, that means a
multiply costs exactly the same as an addition in term of cycles.
and multiplication it generally faster then division.
conditional statements:
if/else - when looking in the assembly code you can see that the memory of the if condition is usually loaded by default, so when using if else make sure that the condition that will happen more frequently will be in the if.
but generally if possible you should avoid if/else in a loop to improve the pipe lining.
good luck.

DSP compilers are typically good at optimizing for loops that do not contain function-calls.
Therefore, try to inline every function that you call from within a time-critical for loop.
If your DSP is a fixed-point processor, then floating-point operations are implemented by SW.
This means that every such operation is essentially replaced by the compiler with a library function.
So you should basically avoid performing floating-point operations inside time-critical for loops.
The preprocessor should provide a special #pragma for the number of iterations of a for loop:
Minimum number of iterations
Maximum number of iterations
Multiplicity of the number of iterations
Use this #pragma where possible, in order to help the compiler to perform loop-unrolling where possible.
Finally, DSPs usually support a set of unique operations for enhanced performance.
As an example, consider _dotpu4 on Texas Instruments C64xx, which computes the scalar-product of two integers src1 and src2: For each pair of 8-bit values in src1 and src2, the 8-bit value from src1 is multiplied with the 8-bit value from src2, and the four products are summed together.
Check the data-sheet of your DSP, and see if you can make use of any of these operations.
The compiler should generate an intermediate file, which you can explore in order to analyze the expected performance of each of the optimized for loops in your code.
Based on that, you can try different assembly operations that might yield better results.

Can all control flow graphs be translated back using if and while?

I was wondering if all control flow graphs obtained from a typical JVM bytecode (see how to) of a single method (no recursion allowed) could be translated back to equivalent ifs and whiles code.
If not, what is the smallest JVM bytecode sequence which cannot be translated back to ifs and whiles ?

There are several reasons why bytecode control flow may not be translatable back into Java without extreme measures.
JSR/RET - this instruction pairs has no equivalent in Java. The best you can do is inline it. However, this will lead to an exponential increase in code size if they are nested.
Irreducible loops - In Java, every loop has a single entry point which dominates the rest of the loop. An "irreducible" loop is one that has multiple distinct entry points, and hence no direct Java equivalent. There are several approaches. My preferred solution is to duplicate part of the loop body, though this can lead to exponential blow up in pathological cases as well. The other approach is to turn the method into a while-switch state machine, but this obscures the original control flow.
An example instruction sequence is
ifnull L3
L2: nop
L3: goto L2
This is the simplest possible irreducible loop. It is impossible to turn into Java without changing the structure or duplicating part of the code (though in this case, there are no actual statements so duplicating wouldn't be so bad).
The last part is exception handling. Java requires all exception handling to be done through structured try/catch blocks and it's variations while bytecode doesn't. At the bytecode level, exception handlers are basically another form of goto. In pathological cases, the best you can do is create a seperate try catch for every instruction that throws and repeat the process above.

I think a jump into the middle of a loop is not expressible in structured code:
JMP L1 // jump into the middle of a loop
L2:
IFCMP L3 // loop condition
// do something inside the loop
L1:
// do something else inside the loop
JMP L2
L3:
// exit the loop
Sorry, this is not exactly JVM bytecode, but you can get the idea.

Is while faster than for?

As in the topic, I learnt in school, that loop for is faster than loop while, but someone told me that while is faster.
I must optimize the program and I want to write while instead for, but I have a concern that it will be slower?
for example I can change for loop:
for (int i=0; i<x; i++)
{
cout<<"dcfvgbh"<<endl;
}
into while loop:
i=0;
while (i<x)
{
cout<<"dcfvgbh"<<endl;
i++;
}

The standard requires (§6.5.3/1) that:
The for statement
for ( for-init-statement conditionopt; expressionopt) statement
is equivalent to
{
for-init-statement
while ( condition ) {
statement
expression;
}
}
As such, you're unlikely to see much difference between them (even if execution time isn't necessarily part of the equivalence specified in the standard). There are a few exceptions listed to the equivalence as well (scopes of names, execution of the expression before evaluating the condition if you execute a continue). The latter could, at least theoretically, affect speed a little bit under some conditions, but probably not enough to notice or care about as a rule, and definitely not unless you actually used a continue inside the loop.

For all intents and purposes for is just a fancy way of writing while, so there is no performance advantage either way. The main reason to use one over the other is how the intent is translated so the reader understands better what the loop is actually doing.

No.
Nope, it's not.
It is not faster.

You cout will eat 99% of the clock cycles for this loop. Beware micro-optimization. At any rate, these two will give essentially identical code.
The only time when a for loop can be faster is when you have a known terminating condition - e.g.
for(ii = 0; ii < 24; ii++)
because some optimizing compilers will perform loop unrolling. This means they will not perform a test on every pass through the loop because they can "see" that just doing the thing inside the loop 24 times (or 6 times in blocks of 4, etc) will be a tiny bit more efficient. When the thing inside the loop is very small (e.g. jj += ii;), such optimization makes the for loop a bit faster than the while (which typically doesn't do "unrolling").
Otherwise - no difference.
update at the request of #zeroth
Source: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.9346&rep=rep1&type=pdf
Quote from source (my emphasis):
Unrolling a loop at the source- code level involves identification of
loop constructs (e.g., for, while, do-while, etc.), determination of
the loop count to ensure that it is a counting loop, replication of
the loop body and the adjustment of loop count of the unrolled loop. A
prologue or epilogue code may also be inserted. Using this approach,
it is difficult to unroll loops formed using a while and goto
statements since the loop count is not obvious. However, for all but
the simplest of loops, this approach is tedious and error prone.
The other alternative is to unroll loops automatically. Automatic
unrolling can be done early on source code, late on the unoptimized
intermediate representation, or very late on an optimized
representation of the program. If it is done at the source-code level,
then typically only counting loops formed using for statements are
unrolled. Unrolling loops formed using other control constructs is
difficult since the loop count is not obvious.

To the best of my knowledge swapping out for loops for while loops is not an established optimization technique.
Both your examples will be identical in performance, but as an exercise you could time them to confirm this for yourself.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js