I noticed the below and think it is inefficient. What am I missing? I imagine there must be speed advantages I am unaware of. For context, this is production code in a brokerage firm's API.
What I saw:
const unsigned MAX_ATTEMPTS = 50;
unsigned attempt = 0;
for (;;) {
++attempt;
// logic, functions, output
if( attempt >= MAX_ATTEMPTS) {
break;
}
}
What I expected:
const unsigned MAX_ATTEMPTS = 50;
for(unsigned attempt = 0; attempt < MAX_ATTEMPTS; ++attempt){
// logic, functions, output
}
*** Corrected typo
I noticed the below and think it is inefficient. What am I missing?
That it's pointless to speculate about efficiency unless you know
whether there's a measurable problem
how long it currently takes
how long it's desirable for it to take
how much effort it would cost to improve
So, if this loop is not speed-critical and is dominated by the logic, functions, output - which for the avoidance of doubt it absolutely is unless they have output orders of magnitude more efficient than anyone else - then there is no problem in the first place, and your speculation is unlikely to be productive.
If this loop is somehow speed critical (I emphasize again how unlikely this is), then you need to measure it - and you need to decide what result would be acceptable. Otherwise you're just wasting time rearranging deckchairs instead of doing anything valuable.
Finally, if you pass tests zero through two inclusive, you still need to judge whether any improvement is worth the effort required to implement, test, review and deploy it. If it turns out to be 1% below the optimum latency decided at step 2, and some other part of your codebase is currently burning money, then this still not likely to be top priority.
From a learning rather than a business point of view however - it's great to spot potential inefficiencies like these. That's not because they're important to fix, but because you're probably wrong, and the process of learning how to benchmark them - and of understanding why you were wrong - is good experience and will improve your intuition for next time.
The only differences are that in the original code:
You can access the last value of attempt after the loop
The loop will be executed at least once.
It offers no obvious benefits. And if you ask me, the original code is quite ugly. I would have done this instead:
unsigned attempt = 1;
do {
// Logic
} while(++attempt <= MAX_ATTEMPTS);
There is a chance that one of them gets compiled to faster code. In order to find out, you need to benchmark it. Which one is faster (if any) can vary from system to system.
I think that you cut too much, and I suppose that it was something like this.
for (;;) {
++attempt;
// logic, functions, output
result = somefunc();
if(result == SUCCESS) break;
if( attempt >= MAX_ATTEMPTS) {
break;
}
}
I do not like zillions breaks in the code. I prefer:
do {
// Logic
result = somefunc();
} while(result != SUCCESS && attempt++ < MAX_ATTEMPTS);
If I am right your version should look like this
result = FAILURE;
for(unsigned attempt = 1; result != SUCCESS && attempt <= MAX_ATTEMPTS; ++attempt){
result = somefunc();
// logic, functions, output
}
There will not be any difference in performance between those versions. It is a question of the personal preferences
Related
My prof once said, that if-statements are rather slow and should be avoided as much as possible. I'm making a game in OpenGL, where I need a lot of them.
In my tests replacing an if-statement with AND via short-circuiting worked, but is it faster?
bool doSomething();
int main()
{
int randomNumber = std::rand() % 10;
randomNumber == 5 && doSomething();
return 0;
}
bool doSomething()
{
std::cout << "function executed" << std::endl;
return true;
}
My intention is to use this inside the draw function of my renderer. My models are supposed to have flags, if a flag is true, a certain function should execute.
if-statements are rather slow and should be avoided as much as possible.
This is wrong and/or misleading. Most simplified statements about slowness of a program are wrong. There's probably something wrong with this answer too.
C++ statements don't have a speed that can be attributed to them. It's the speed of the compiled program that matters. And that consists of assembly language instructions; not of C++ statements.
What would probably be more correct is to say that branch instructions can be relatively slow (on modern, superscalar CPU architectures) (when the branch cannot be predicted well) (depending on what you are comparing to; there are many things that are much more expensive).
randomNumber == 5 && doSomething();
An if-statement is often compiled into a program that uses a branch instruction. A short-circuiting logical-and operation is also often compiled into a program that uses a branch instruction. Replacing if-statement with a logical-and operator is not a magic bullet that makes the program faster.
If you were to compare the program produced by the logical-and and the corresponding program where it is replaced with if (randomNumber == 5), you would find that the optimiser sees through your trick and produces the same assembly in both cases.
My models are supposed to have flags, if a flag is true, a certain function should execute.
In order to avoid the branch, you must change the premise. Instead of iterating through a sequence of all models, checking flag, and conditionally calling a function, you could create a sequence of all models for which the function should be called, iterate that, and call the function unconditionally -> no branching. Is this alternative faster? There is certainly some overhead of maintaining the data structure and the branch predictor may have made this unnecessary. Only way to know for sure is to measure the program.
I agree with the comments above that in almost all practical cases, it's OK to use ifs as much as you need without hesitation.
I also agree that it is not an issue important for a beginner to waste energy on optimizing, and that using logical operators will likely to emit code similar to ifs.
However - there is a valid issue here related to branching in general, so those who are interested are welcome to read on.
Modern CPUs use what we call Instruction pipelining.
Without getting too deap into the technical details:
Within each CPU core there is a level of parallelism.
Each assembly instruction is composed of several stages, and while the current instruction is executed, the next instructions are prepared to a certain degree.
This is called instruction pipelining.
This concept is broken with any kind of branching in general, and conditionals (ifs) in particular.
It's true that there is a mechanism of branch prediction, but it works only to some extent.
So although in most cases ifs are totally OK, there are cases it should be taken into account.
As always when it comes to optimizations, one should carefully profile.
Take the following piece of code as an example (similar things are common in image processing and other implementations):
unsigned char * pData = ...; // get data from somewhere
int dataSize = 100000000; // something big
bool cond = ...; // initialize some condition for relevant for all data
for (int i = 0; i < dataSize; ++i, ++pData)
{
if (cond)
{
*pData = 2; // imagine some small calculation
}
else
{
*pData = 3; // imagine some other small calculation
}
}
It might be better to do it like this (even though it contains duplication which is evil from software engineering point of view):
if (cond)
{
for (int i = 0; i < dataSize; ++i, ++pData)
{
*pData = 2; // imagine some small calculation
}
}
else
{
for (int i = 0; i < dataSize; ++i, ++pData)
{
*pData = 3; // imagine some other small calculation
}
}
We still have an if but it's causing to branch potentially only once.
In certain [rare] cases (requires profiling as mentioned above) it will be more efficient to do even something like this:
for (int i = 0; i < dataSize; ++i, ++pData)
{
*pData = (2 * cond + 3 * (!cond));
}
I know it's not common , but I encountered specific HW some years ago on which the cost of 2 multiplications and 1 addition with negation was less than the cost of branching (due to reset of instruction pipeline). Also this "trick" supports using different condition values for different parts of the data.
Bottom line: ifs are usually OK, but it's good to be aware that sometimes there is a cost.
Look at this function:
float process(float in) {
float out = in;
for (int i = 0; i < 31; ++i) {
if (biquads_[i]) {
out = biquads_[i]->filter(out);
}
}
return out;
}
biquads_ is a std::optional<Biquad>[31].
in this case i check for every optional to check if its not empty, and then call the filter function of biquad, if instead I unconditionally call filter function, changing it to multiply by 1 or simply return the input value, would be more efficient?
Most likely it won't make a shread of difference (guessing somewhat though since your question is not entirely clear). For two reasons: 1) unless the code is going to be used in a very hot path, it won't matter even if one way is a few nanoseconds faster than the other. 2) most likely your compilers optimizer will be clever enough to generate code that performs close-to (if not identical to) the same in both cases. Did you test it? Did you benchmark/profile it? If not; do so - with optimization enabled.
Strive to write clear, readable, maintainable code. Worry about micro-optimization later when you actually have a problem and your profiler points to your function as a hot-spot.
I have code in my C++ application that generally does this:
bool myFlag = false;
while (/*some finite condition unrelated to myFlag*/) {
if (...) {
// statements, unrelated to myFlag
} else {
// set myFlag to true, perhaps only if it was false before?
}
}
if (myFlag) {
// Do something...
}
The question I have pertains to the else statement of my code. Basically, my loop may set the value of myFlag from false to true, based on a certain condition not being met. Never will the flag be unset from true to false. I would like to know what statement makes more sense performance-wise and perhaps if this issue is actually a non-issue due to compiler optimization.
myFlag = true;
OR
if (!myFlag) myFlag = true;
I would normally choose the former because it requires writing less code. However, I began to wonder that maybe it involved needless writing to memory and therefore the latter would prevent needless writing if myFlag was already true. But, would using the latter take more time because there is a conditional statement and therefore compile code using more instructions?
Or maybe I am over-thinking this too much...
UPDATE 1
Just to clarify a bit...the purpose of my latter case is to not write to memory if the variable was already true. Thus, only write to memory if the variable is false.
You're almost certainly better off just using myFlag = true;.
About the best you can hope for from the if (!myFlag) myFlag = true; is that the compiler will notice that the if is irrelevant and optimize it away. In particular, the if statement needs to read the current value of myFlag. If the value isn't already in the cache, that means the instruction will stall while waiting for the data to be read from memory.
By contrast, if you just write (without testing first) the value can be written to a write queue, and then more instructions can execute immediately. You won't get a stall until/unless you read the value of myFlag (and assuming it's read reasonably soon after writing, it'll probably still be in the cache, so stalling will be minimal).
CPU-cycle wise, prefer myFlag = true; Think about it: even if the compiler makes no optimization (not really likely), just setting it takes one asm statement, and going through the if takes at least 1 asm statement.
So just go with the assignment.
And more importantly, don't try to make hypotheses on such low-level details, specific compiler optimizations can totally go against intuition.
You do realize that the check is moot right? If you blindly set it to true and it was not set, you are setting it. If it was already true, then there is no change and you are not setting it, so you effectively can implement it as:
myFlag = true;
Regarding the potential optimizations, to be able to test, the value must be in the cache, so most of the cost is already paid. On the other hand, the branch (if the compiler does not optimize the if away, which most will) can have a greater impact in performance.
You are most likely over-thinking the problem as others already mentioned, so let me do the same. The following might be faster if you can afford to double the statements unrelated to myFlag. In fact, you can get rid of myFlag. OK, here we go:
while (/*some finite condition*/) {
if (...) {
// statements
} else {
while (/*some finite condition*/) {
if (...) {
// statements, repeated
}
}
// Do something (as if myFlag was true in the OPs example)
break;
}
}
As with all performance optimization: Measure, measure, measure!
It is architecture specific whether if (!myFlag) myFlag = true; will take more time to execute than the simple myFlag = true; even without any optimization. There are architectures (e.g., https://developer.qualcomm.com/hexagon-processor) where both statements will take only one cycle each to execute.
The only way to figure out on your machine would be by measurement.
In any case myFlag = true will always be faster or have same execution time as if (!myFlag) myFlag = true;
This question gave me headache too so i simply tested it myself with the following code (C#):
System.Diagnostics.Stopwatch time = new System.Diagnostics.Stopwatch();
int i = 0;
int j = 1;
time.Start();
if (i != 0)
i = 0;
time.Stop();
Console.WriteLine("compare + set - {0} ticks", time.ElapsedTicks);
time.Reset();
time.Start();
if (j != 0)
j = 0;
time.Stop();
Console.WriteLine("compare - {0} ticks", time.ElapsedTicks);
time.Reset();
time.Start();
i = 0;
time.Stop();
Console.WriteLine("set - {0} ticks", time.ElapsedTicks);
Console.ReadLine();
result:
compare + set - 1 ticks
compare - 1 ticks
set - 0 ticks
while the time, used to set the value surely isn't zero, it shows that even a single query needed more time than just setting the variable.
The below is the code which I need to optimize and planned that it would be good to move to the switch construct. But I can compare in case. So I planned to make the comparison (len > 3) as the default case.
If I make the comparison part (len > 3) as the default case and add the default as the first in the switch, will it be faster?
Or how can I make the below code as a switch statement?
if ( len > 3 ) {
// Which will happen more often;
}
else if ( len == 3 ) {
// Next case which may occur often;
} else if ( len == 2 ) {
// The next priority case;
} else {
// And this case occurs rarely;
}
Probably not. Both if...else and switch...case are high-level constructs. What slows you down is the branch prediction. The better your prediction is the faster your code will run. You should put your most occurring case in first if, second in the else if and so on, just like you wrote. For switch the result depends on the internal compiler implementation which can reorder the cases despite your own order. The default should be actually reserved for the less occurring situations because rest of the conditions must be checked before falling back to default.
To conclude all this, performance-wise usage of if...else is optimal as long as you set your conditions in the correct order. Regarding switch...case it's compiler specific and depends on the applied optimizations.
Also note that switch...case is more limited than if...else since it supports only simple comparison of values.
Although you've accepted what is probably the best answer, I wanted to provide an alternative.
Note that the standard caveat applies - optimisation isn't optimisation unless you've profiled your code.
However if you are encountering poor performance relating to branches, you can reduce or eliminate them. That your code has one or more inequality comparisons is not an obstacle - you can reduce your cases down to a set of direct equalities, and if necessary use that to index a table, rather than branch at all.
void doSomething(int len)
{
static const char* str[] =
{ "%2d > 3\n",
"%2d < 2\n",
"%2d = 2\n",
"%2d = 3\n"
};
int m1 = (len-2)>>31;
int m2 = (len-4)>>31;
int r = (len & m2 & ~m1) + !!m1;
printf(str[r],len);
}
Note that this codes makes several assumptions which may not hold in practice, but as we're making the wild assumption that this even needs optimising in the first place...
Also, note that better optimisations may be possible with more knowledge about the actual range and type of the input parameter, and indeed what the actual actions taken need to be.
You can't move comparisons into a switch statement... it uses single checks for its selections.. i.e.,
switch (len) {
case 1:
// Do case 1 stuff here
break;
case 2:
// Do case 2 stuff here
break;
case 3:
// Do case 3 stuff here
break;
}
Use breaks to prevent the case statements from running into each other. Read more here.
Your code is as 'optimized' as it will get in its current state...
The only way you're going to know is to benchmark it with your
compiler. If performance is an issue, you should use the option
to provide the compiler with profiler output, and let it decide;
it will generally find the best solution. (Note that even on
a specific architecture, like Intel, the best solution in terms
of machine instructions may vary from one processor to the
next.)
In your case, the switch would probably look like:
switch ( len ) {
case 2:
// ...
break;
case 3:
// ...
break;
default:
if ( len > 3 ) {
// ...
} else {
// ...
}
}
With only two effective cases, the compiler doesn't have much to
work with. A typical implementation (without extreme
optimization) would do bounds checking, then a table lookup for
the two explicit cases. Any decent compiler will then pick up
that the comparison in your default case corresponds to one of
the bounds checks it has already done, and not duplicate it.
But with only two cases, the jump table will probably not make
a significant difference compared to the two comparisons,
especially as you'll be out of bounds in the most frequent case.
Until you have actual profiler information that this is
a bottleneck in your code, I wouldn't worry about it. Once you
have that information, you can profile different variants to see
which is faster, but I suspect that if you use maximum
optimization and feed profiling information back into the
compiler, there will be no difference.
If you are worried about speed, the truth is that your if...else or switch...case statements won't have a real impact of your application speed unless you have hundreds of them. The places where you lose speed is in your iterations or loops. To answer your question specifically, you cannot convert your if...else statement to a switch...case statement with the default appearing first; but with that said, if you did convert to a switch...case then you will that they run at the same speed (difference is too minute to be picked up by conventional benchmarking tools).
You can use a range in a case:
switch (len) {
case 3 ... INT_MAX:
// ...
break;
case 2:
// ...
break;
default:
// ...
break;
}
But that is an extension provided by GCC...
I just stumbled upon a change that seems to have counterintuitive performance ramifications. Can anyone provide a possible explanation for this behavior?
Original code:
for (int i = 0; i < ct; ++i) {
// do some stuff...
int iFreq = getFreq(i);
double dFreq = iFreq;
if (iFreq != 0) {
// do some stuff with iFreq...
// do some calculations with dFreq...
}
}
While cleaning up this code during a "performance pass," I decided to move the definition of dFreq inside the if block, as it was only used inside the if. There are several calculations involving dFreq so I didn't eliminate it entirely as it does save the cost of multiple run-time conversions from int to double. I expected no performance difference, or if any at all, a negligible improvement. However, the perfomance decreased by nearly 10%. I have measured this many times, and this is indeed the only change I've made. The code snippet shown above executes inside a couple other loops. I get very consistent timings across runs and can definitely confirm that the change I'm describing decreases performance by ~10%. I would expect performance to increase because the int to double conversion would only occur when iFreq != 0.
Chnaged code:
for (int i = 0; i < ct; ++i) {
// do some stuff...
int iFreq = getFreq(i);
if (iFreq != 0) {
// do some stuff with iFreq...
double dFreq = iFreq;
// do some stuff with dFreq...
}
}
Can anyone explain this? I am using VC++ 9.0 with /O2. I just want to understand what I'm not accounting for here.
You should put the conversion to dFreq immediately inside the if() before doing the calculations with iFreq. The conversion may execute in parallel with the integer calculations if the instruction is farther up in the code. A good compiler might be able to push it farther up, and a not-so-good one may just leave it where it falls. Since you moved it to after the integer calculations it may not get to run in parallel with integer code, leading to a slowdown. If it does run parallel, then there may be little to no improvement at all depending on the CPU (issuing an FP instruction whose result is never used will have little effect in the original version).
If you really want to improve performance, a number of people have done benchmarks and rank the following compilers in this order:
1) ICC - Intel compiler
2) GCC - A good second place
3) MSVC - generated code can be quite poor compared to the others.
You may also want to try -O3 if they have it.
Maybe the result of getFreq is kept inside a register in the first case and written to memory in the second case? It might also be, that the performance decrease has to do with CPU mechanisms as pipelining and/or branch prediction.
You could check the generated assembly code.
This looks to me like a pipeline stall
int iFreq = getFreq(i);
double dFreq = iFreq;
if (iFreq != 0) {
Allows the conversion to double to happen in parallel with other code
since dFreq is not being used immediately. it gives the compiler something
to do between storing iFreq and using it, so this conversion is most likely
"free".
But
int iFreq = getFreq(i);
if (iFreq != 0) {
// do some stuff with iFreq...
double dFreq = iFreq;
// do some stuff with dFreq...
}
Could be hitting a store/reference stall after the conversion to double since you begin using the double value right away.
Modern processors can do multiple things per clock cycle, but only when the things are independent. Two consecutive instructions that reference the same register often result in a stall. The actual conversion to double may take 3 clocks, but all but the first clock can be done in parallel with other work, provided you don't refer to the result of the conversion for an instruction or two.
C++ compilers are getting pretty good at re-ordering instructions to take advantage of this, it looks like your change defeated some nice optimization.
One other (less likely) possibility is that when the conversion to float was before the branch, the compiler was able remove the branch entirely. Branchless code is often a major performance win in modern processors.
It would be interesting to see what instructions the compiler actually emitted for these two cases.
Try moving the definition of dFreq outside of the for loop but keep the assignment inside the for loop/if block.
Perhaps the creation of dFreq on the stack every for loop, inside the if, is causing issue (although the compiler should take care of that). Perhaps a regression in the compiler, if the dFreq var is in the four loop its created once, inside the if inside the for its created every time.
double dFreq;
int iFreq;
for (int i = 0; i < ct; ++i)
{
// do some stuff...
iFreq = getFreq(i);
if (iFreq != 0)
{
// do some stuff with iFreq...
dFreq = iFreq;
// do some stuff with dFreq...
}
}
maybe the compiler is optimizing it taking the definition outside the for loop. when you put it in the if the compiler optimizations aren't doing that.
There's a likelihood that this changed caused your compiler to disable some optimizations. What happens if you move the declarations above the loop?
Once I've read a document about optimization that said that as defining variables just before their usage and not even before was a good practice, the compilers could optimize code following that advice.
This article (a bit old but quite valid) say (with statistics) something similar : http://www.tantalon.com/pete/cppopt/asyougo.htm#PostponeVariableDeclaration
It's easy enough to find out. Just take 20 stackshots of the slow version, and of the fast version. In the slow version you will see on roughly 2 of the shots what it is doing that it is not doing in the fast version. You will see a subtle difference in where it halts in the assembly language.