Comparison in loop (optimization) - c++

Let's consider the situation:
bool b = checking_some_condition();
for (int i = 0; i < 1000000; ++i)
{
    if (b)
        do_something(i);
    else
        do_something_else(i);
}
Can I assume that the compiler optimizes the code above into something like this?
if (b)
{
    for (int i = 0; i < 1000000; ++i)
        do_something(i);
}
else
{
    for (int i = 0; i < 1000000; ++i)
        do_something_else(i);
}
Of course, I am only giving an example to present the situation. I know that checking a bool value 1,000,000 times is hardly noticeable performance-wise, but if I had more complex comparisons with multiple paths the code inside the loop could take, the change in performance could be significant, especially if this code were inside a function that is called many times.

As was mentioned in the comments above, you can't really make a safe assumption about what the compiler will or won't optimize. It's the compiler's "freedom" to do these things or not.
If you want to get a feeling for what's going on, the best way is to look at the generated assembly, which gives you an objective way of arguing about what the compiler might have done. https://godbolt.org/z/W-5Hve shows the simple example you posted above.
However, please try to make the example in godbolt as realistic as possible and then check the assembly. Even if two snippets yield the same assembly in godbolt, to make sure the same happens in your codebase you need to check the assembly of your compiled implementation there as well.
Summarizing this, what I normally do is:
- try a realistic example in godbolt, play with different compilers/flags, and change the code until I think I know what's going on.
- compile my project and look at the assembly there to try and find the specific function again to make sure that the result in my code base is the same.
As a little extra: objdump -M intel -dC executable will show you the assembly of an executable.
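The transformation asked about is usually called loop unswitching. GCC can do it automatically with -funswitch-loops, which is enabled by default at -O3, but nothing obliges a compiler to do it. If you want the hoisting regardless of the optimizer, one option is to make the flag a template parameter so that each instantiation contains a branch-free loop. This is a minimal sketch assuming C++17 (for if constexpr); do_something and do_something_else are hypothetical stand-ins that return values so the effect of each loop is observable.

```cpp
#include <cstdint>

// Hypothetical stand-ins for do_something / do_something_else; they return
// values so the result of each loop is observable.
inline std::int64_t do_something(int i) { return i; }
inline std::int64_t do_something_else(int i) { return -i; }

// Manual loop unswitching: the flag is a template parameter, so each
// instantiation contains a loop with no per-iteration test (C++17 if constexpr).
template <bool B>
std::int64_t run_loop(int n) {
    std::int64_t total = 0;
    for (int i = 0; i < n; ++i) {
        if constexpr (B)
            total += do_something(i);
        else
            total += do_something_else(i);
    }
    return total;
}

// The condition is tested once, outside the loop.
std::int64_t run(bool b, int n) {
    return b ? run_loop<true>(n) : run_loop<false>(n);
}
```

The runtime bool is dispatched once in run(), which is exactly the shape of the second snippet in the question; checking the assembly is still the way to see whether the compiler would have done this on its own.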

Related

Will an operation done several times in sequence be simplified by the compiler?

I've had this question for a long time but never knew where to look. If a certain operation is written many times, will the compiler simplify it, or will it run the exact same operation and get the exact same answer each time?
For example, in the following C-like pseudo-code, (i%3)*10 is repeated many times.
for(int i=0; i<100; i++) {
    array[(i%3)*10] = someFunction((i%3)*10);
    int otherVar = (i%3)*10 + array[(i%3)*10];
    int lastVar = (i%3)*10 - otherVar;
    anotherFunction(lastVar);
}
I understand a variable would be better for visual purposes, but is it also faster? Is (i%3)*10 calculated 5 times per loop?
There are certain cases where I don't know if it's faster to use a variable or just leave the original operation.
Edit: using gcc (MinGW.org GCC-8.2.0-3) 8.2.0 on Windows 10
Which optimizations are done depends on the compiler, the compiler optimization flag(s) you specify, and the architecture.
Here are a few possible optimizations for your example:
Loop unrolling: this makes the binary larger and is thus a trade-off; for example, you may not want it on a tiny microprocessor with very little memory.
Common subexpression elimination (CSE): you can be fairly sure that your (i % 3) * 10 will only be computed once per loop iteration.
Regarding your concern about visual clarity vs. optimization: in a 'local situation' like yours, you should focus on code clarity.
Optimization gains are often to be made at a higher level; for example in the algorithm you use.
There's a lot to be said about optimization; the above are just a few opening remarks. It's great that you're interested in how things work, because this is important for a good (C/C++) programmer.
As a matter of course, you should remove the obfuscation present in your code:
for (int i = 0; i < 100; ++i) {
    int i30 = i % 3 * 10;
    int r = someFunction(i30);
    array[i30] = r;
    anotherFunction(-r);
}
Suddenly, it looks quite a lot simpler.
Leave it to the compiler (with appropriate options) to optimize your code unless you find you actually have to take a hand after measuring.
In this case, unrolling three times looks like a good idea for the compiler to pursue, though inlining might reveal even better options.
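To make the "unroll three times" remark concrete, here is a hand-unrolled sketch of the simplified loop. Within each group of three iterations, i % 3 takes the values 0, 1, 2, so the index becomes a constant and the modulo disappears; a remainder loop handles the leftover iteration (i == 99). someFunction, anotherFunction and the values array are hypothetical stand-ins (anotherFunction records its arguments so the two versions can be compared). In practice you would leave this transformation to the compiler.

```cpp
#include <vector>

// Hypothetical stand-ins for the question's functions and array.
int values[30];                              // plays the role of `array`
std::vector<int> calls;                      // records anotherFunction's arguments
int someFunction(int x) { return 2 * x + 1; }
void anotherFunction(int v) { calls.push_back(v); }

// Loop body shared by both versions.
void body(int i30) {
    int r = someFunction(i30);
    values[i30] = r;
    anotherFunction(-r);
}

// Straightforward version: a modulo and a multiply per iteration.
void simple() {
    for (int i = 0; i < 100; ++i)
        body(i % 3 * 10);
}

// Unrolled by three: i % 3 is 0, 1, 2 within each group of three,
// so the indices become the constants 0, 10, 20.
void unrolled() {
    int i = 0;
    for (; i + 2 < 100; i += 3) {
        body(0);
        body(10);
        body(20);
    }
    for (; i < 100; ++i)                     // remainder; here only i == 99
        body(i % 3 * 10);
}
```

Both functions make the same 100 calls in the same order; the unrolled one just never computes i % 3 inside the main loop.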
Yes, with optimization enabled, operations repeated like this will be optimized by the compiler.
To go into more detail, all major compilers (GCC, Clang, and MSVC) will, at their usual optimization levels, compute (i%3)*10 once, keep the value in a temporary (scratch) register, and reuse it wherever an equivalent expression appears again.
This optimization is called GCSE (global common subexpression elimination) in GCC, and more generally CSE.
In a tight loop, this can take a noticeable chunk out of the running time.
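One caveat: CSE only applies when the compiler can prove that nothing the expression reads has changed between the two evaluations. In the question, (i%3)*10 depends only on the local loop counter, so reuse is safe. The hypothetical sketch below (scale and touch are made-up names) shows a case where part of the expression must be recomputed because a call may modify a global it reads.

```cpp
// Hypothetical global and callee, for illustration only.
int scale = 10;
void touch() { scale = 20; }   // writes the global that the expression reads

int twice_scaled(int i) {
    int a = (i % 3) * scale;
    touch();
    // i % 3 only depends on the local i and may be reused, but scale may have
    // changed inside touch(), so the multiplication has to be redone.
    int b = (i % 3) * scale;
    return a + b;
}
```

Reusing a for b here would change the result, which is why a compiler must treat opaque calls and global memory conservatively.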

How can I make my compiler stricter (wrong index)?

I have encountered a horrible situation.
I usually use Visual Studio Code to edit my code, and also compile and run it there (F5).
But I found that VS Code seems too smart, or hides some warning messages from me: it outputs the right answer, and the code also works fine on Ideone. But in the Windows cmd or Dev-C++, my code doesn't output anything and just returns a big number.
I found that certain situations trigger the behavior I described above.
The code looks like this:
for (i = 0; i < g.size(); i++)
{
    int source;
    int dest;
    int minWeight = 999;
    for (j = 0; i < g[j].size(); j++)
    {
        // no edge, come to next condition
        if (!g[i][j])
            continue;
        if (g[i][j] < minWeight)
        {
            source = i;
            dest = j;
            minWeight = g[i][j];
        }
    }
    if
        updateGroup(index, index[source], index[dest]);
    else
        updateGroup(index, index[source], index[dest]);
}
You may have noticed that the second for loop has the wrong condition; it should be changed from
j = 0; i < g[j].size(); j++ to j = 0; j < g[i].size(); j++
So I would like to know:
Is there any way to make VS Code stricter?
Why does it still output the right answer in VS Code and Ideone?
How can I avoid this kind of silent error, or find where my code is wrong more easily?
I really hope someone can help me, and I appreciate all of your suggestions!
There is no way for the compiler or computer to read your mind and guess what you meant to write instead of what you actually wrote.
Even when this mistake results in a bug, it cannot know that you did not intend to write this, or that you meant to write some other specific thing instead.
Even when this bug results in your program having undefined behaviour, it is not possible to detect many cases of undefined behaviour, and it is not worthwhile for a compiler author to attempt to write code to do this, because it's too hard and not useful enough. Even if they did, the compiler could still not guess what you meant instead.
Remember, loops like this don't have to check or increment the same variable that you declared in the preamble; that's just a common pattern (now largely superseded by the safer range-based for statement). There's nothing inherently wrong with having a loop that increments i but checks j.
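As a sketch of two habits that make this kind of slip harder, the question's inner loop could tie its bound to the row it indexes, and use .at() while debugging so that a bad index throws std::out_of_range instead of silently reading out of bounds. Graph, Edge and lightest_edge are hypothetical names; the 999 sentinel mirrors the question, and source/dest start at -1 so that "no edge found" is detectable instead of leaving them uninitialized.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency matrix, 0 = no edge

struct Edge {
    int source = -1;   // -1 means "no edge found", instead of uninitialized
    int dest = -1;
    int weight = 999;  // same sentinel as the question
};

// Lightest edge leaving vertex i. The inner bound comes from the row being
// indexed, so the i/j mix-up from the question cannot occur here.
Edge lightest_edge(const Graph& g, std::size_t i) {
    Edge best;
    const std::vector<int>& row = g.at(i);  // .at() throws std::out_of_range on a bad i
    for (std::size_t j = 0; j < row.size(); ++j) {
        if (row[j] == 0)
            continue;                       // no edge
        if (row[j] < best.weight) {
            best.source = static_cast<int>(i);
            best.dest = static_cast<int>(j);
            best.weight = row[j];
        }
    }
    return best;
}
```

Checking best.source != -1 before using it as an index avoids the undefined behaviour the original code hits when a row has no edges.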
Ultimately, the solution to this problem is to write tests for your code, which is why many organisations have dedicated Quality Assurance teams to search for bugs, and why you should already be testing your code before committing it to your project.
Remember to concentrate and pay close attention and read your code, and eventually such typos will become less common in your work. Of course once in a while you will write a bug, and your tests will catch it. Sometimes your tests won't catch it, which is when your customers will eventually notice it and raise a complaint. Then, you release a new version that fixes the bug.
This is all totally normal software development practice. That's what makes it fun! 😊
1) Crank up your compiler warnings as high as they will go.
2) Use multiple different compilers (they all warn about different things).
3) Know the details of the language well (a multi year effort) and be really, really careful about the code you write.
4) Write (and regularly run) lots of tests.
5) Use tools like sanitizers, fuzzers, linters, static code analyzers etc. to help catch bugs.
6) Build and run your code on multiple platforms to keep it portable and find bugs exposed by different environments/implementations.

Performance impact of using 'break' inside 'for-loop'

I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related questions refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break almost never gets called). And if it has, I would also like to know roughly how big the penalty is.
I am quite suspicious that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:
- Independently of the extra code for the conditional statement that triggers the break (like an if), it necessarily adds instructions to my loop.
- Further, it probably also interferes when my compiler tries to unroll the for-loop, as it no longer knows at compile time how many iterations will run, effectively turning it into a while-loop.
Therefore, I suspect it does have a performance impact, which could be considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? For example, in the following snippet, assume that checkCondition() evaluates to true 99.9% of the time. Do I lose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
    // do stuff
}
// USING FOR
for(int i=100; i; --i)
{
    if(checkCondition()) {
        // do stuff
    } else {
        break;
    }
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -S (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense), as I am not sure whether I got this completely right :)
The principal answer is to avoid spending time on micro-optimizations like this until you have verified that evaluating the condition is a bottleneck.
The real answer is that CPUs have powerful branch prediction circuits which empirically work really well.
What will happen is that your CPU will predict whether the branch is going to be taken and speculatively execute the code as if the if condition were not even present. Of course, this relies on multiple assumptions, such as the condition calculation having no side effects that the loop body depends on, and the condition evaluating to false up to a certain point, at which it becomes true and stops the loop.
Some compilers also allow you to specify the likelihood of a condition as a hint to the compiler about which way the branch usually goes.
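Standard C++20 spells such a hint with the [[likely]] / [[unlikely]] attributes (GCC and Clang also offer __builtin_expect). This is a minimal sketch assuming C++20, with a hypothetical stop condition (a negative element) standing in for checkCondition(); the attribute may influence code layout but never changes what the loop computes.

```cpp
#include <vector>

// Sums elements until a (hypothetical) stop condition, a negative value, is met.
// [[unlikely]] (C++20) marks the break as the rare path; it may influence code
// layout but never changes the result.
long long sum_until_negative(const std::vector<int>& x) {
    long long sum = 0;
    for (int v : x) {
        if (v < 0) [[unlikely]]
            break;
        sum += v;
    }
    return sum;
}
```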
If you want to see the difference in generated code between the two versions, just compile them with -S and examine the generated asm; there's no other magic way to do it.
The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
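For the "measure it" advice, a minimal wall-clock helper built on std::chrono::steady_clock might look like this sketch; time_ms is a hypothetical name, not a library function. The volatile accumulator is only a crude guard: it keeps the repetition loop itself from being removed, but the compiler may still compute a predictable f() once, so the assembly remains the final check.

```cpp
#include <chrono>

// Runs f() `repetitions` times and returns the elapsed wall-clock time in ms.
template <class F>
double time_ms(F f, int repetitions) {
    volatile long long sink = 0;  // a volatile read/write per repetition keeps the loop alive
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < repetitions; ++k)
        sink = sink + f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Time both loop versions on the same input and repeat each measurement a few times; a single run on a busy machine is noisy.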
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
    stop &= checkcondition(); // returns 0 or all-bits-set;
    sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i<32; i++)
{
    if (!checkcondition())
        break;
    sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.
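To check that the two loops above compute the same thing, here is a self-contained sketch in which checkcondition() is replaced by a hypothetical array of flags (flags[i] == false plays the role of the condition failing). The mask 0u - flag is all bits set while the flag is true and 0 once it is false, and the &= keeps it at 0 from then on, which mirrors the break. Whether the branch-free form is actually faster still has to be measured.

```cpp
#include <array>

// flags[i] stands in for checkcondition() at iteration i (hypothetical).
using Values = std::array<unsigned, 32>;
using Flags = std::array<bool, 32>;

// Branchy version: stop at the first false flag.
unsigned sum_with_break(const Values& x, const Flags& flags) {
    unsigned sum = 0;
    for (int i = 0; i < 32; i++) {
        if (!flags[i])
            break;
        sum += x[i];
    }
    return sum;
}

// Branch-free version: `stop` is all bits set until the first false flag,
// then 0 for the rest of the loop, so later elements are masked out.
unsigned sum_with_mask(const Values& x, const Flags& flags) {
    unsigned sum = 0;
    unsigned stop = ~0u;
    for (int i = 0; i < 32; i++) {
        stop &= 0u - static_cast<unsigned>(flags[i]);  // true -> ~0u, false -> 0
        sum += stop & x[i];
    }
    return sum;
}
```

Note that the mask version always runs all 32 iterations; it trades an early exit for the absence of a data-dependent branch.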

Intel C++ Compiler understanding what optimization is performed

I have a code segment which is as simple as this:
for( int i = 0; i < n; ++i)
{
    if( data[i] > c && data[i] < r )
    {
        --data[i];
    }
}
It's part of a large function and project. This is actually a rewrite of a different loop, which proved to be time-consuming (long loops), but I was surprised by two things:
When data[i] was temporarily stored like this:
for( int i = 0; i < n; ++i)
{
    const int tmp = data[i];
    if( tmp > c && tmp < r )
    {
        --data[i];
    }
}
It became much slower. I don't claim this should be faster, but I cannot understand why it should be so much slower; the compiler should be able to figure out whether tmp should be used or not.
But more importantly when I moved the code segment into a separate function it became around four times slower. I wanted to understand what was going on, so I looked in the opt-report and in both cases the loop is vectorized and seem to do the same optimization.
So my question is what can make such a difference on a function which is not called a million times, but is time consuming in itself ? What to look for in the opt-report ?
I could avoid it by just keeping it inlined, but the why is bugging me.
UPDATE:
I should underline that my main concern is to understand, why it became slower, when moved to a separate function. The code example given with tmp variable, was just a strange example I encountered during the process.
You're probably register-starved, and the compiler is having to load and store. I'm pretty sure that native x86 assembly instructions can take memory addresses as operands, i.e., the compiler can keep those registers free. But by making it local, you may be changing the behaviour with respect to aliasing, and the compiler may not be able to prove that the faster version has the same semantics (especially if some form of multithreading is involved), which it needs before it can transform the code.
The function was likely slower when moved into a separate function because function calls not only can break the pipeline, but can also cause poor instruction cache performance (there's extra code for parameter push/pop, etc.).
Lesson: let the compiler do the optimizing; it's smarter than you. I don't mean that as an insult; it's smarter than me too. That goes especially for the Intel compiler: those people know what they're doing when targeting their own platform.
Edit: More importantly, you need to recognize that compilers are targeted at optimizing unoptimized code. They're not targeted at recognizing half-optimized code. Specifically, the compiler will have a set of triggers for each optimization, and if you happen to write your code in such a way that they're not hit, you can prevent optimizations from being performed even if the code is semantically identical.
And you also need to consider implementation cost. Not every function that is ideal for inlining can be inlined, simply because inlining that logic is too complex for the compiler to handle. I know that VC++ rarely inlines functions containing loops, even when inlining would be beneficial. You may be seeing the same thing in the Intel compiler: the compiler writers simply decided that it wasn't worth the time to implement.
I encountered this when dealing with loops in VC++: the compiler would produce different assembly for two loops written in slightly different forms, even though they both achieved the same result. Of course, their Standard Library used the ideal form. You may observe a speedup by using std::for_each and a function object.
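The question doesn't show the extracted function's signature, so the following is only an assumption about one possible cause, in line with the aliasing point above: if the bounds arrive by reference, a store to data[i] could modify c or r as far as the compiler knows, so it may have to reload them on every iteration and may vectorize less aggressively (or add runtime overlap checks). Passing them by value rules that out. Both function names are hypothetical; comparing the opt-report or assembly for the two is the way to confirm whether this matters in a given build.

```cpp
// Bounds passed by reference: as far as the compiler knows, c or r might
// refer into data, so a store to data[i] could change them and they may
// have to be reloaded on every iteration.
void decrement_in_range_ref(int* data, int n, const int& c, const int& r) {
    for (int i = 0; i < n; ++i)
        if (data[i] > c && data[i] < r)
            --data[i];
}

// Bounds passed by value: they are local copies that no store to data[i]
// can modify, so they can stay in registers for the whole loop.
void decrement_in_range(int* data, int n, int c, int r) {
    for (int i = 0; i < n; ++i)
        if (data[i] > c && data[i] < r)
            --data[i];
}
```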
You're right, the compiler should be able to identify that as unused code and remove it/not compile it. That doesn't mean it actually does identify it and remove it.
Your best bet is to look at the generated assembly and check to see exactly what is going on. Remember, just because a clever compiler could be able to figure out how to do an optimization, it doesn't mean it can.
If you do check, and see that the code is not removed, you might want to report that to the intel compiler team. It sounds like they might have a bug.

Is there any performance difference between for() and while()?

Or is it all about semantics?
Short answer: no, they are exactly the same.
Guess it could in theory depend on the compiler; a really broken one might do something slightly different but I'd be surprised.
Just for fun, here are two variants that compile down to exactly the same assembly code for me using x86 gcc version 4.3.3 as shipped with Ubuntu. You can check the assembly produced in the final binary with objdump on Linux.
#include <stdio.h>

int main()
{
#if 1
    int i = 10;
    do { printf("%d\n", i); } while(--i);
#else
    int i = 10;
    for (; i; --i) printf("%d\n", i);
#endif
}
EDIT: Here is an "oranges with oranges" while loop example that also compiles down to the same thing:
while(i) { printf("%d\n", i); --i; }
If your for and while loops do the same things, the machine code generated by the compiler should be (nearly) the same.
For instance in some testing I did a few years ago,
for (int i = 0; i < 10; i++)
{
    ...
}
and
int i = 0;
do
{
    ...
    i++;
}
while (i < 10);
would generate exactly the same code, or (as Neil pointed out in the comments) code with one extra jmp, which won't make a big enough difference in performance to worry about.
There is no semantic difference, and there need not be any difference in the compiled code. But it depends on the compiler. So I tried with g++ 4.3.2, CC 5.5, and xlc6.
g++, CC were identical, xlc WAS NOT
The difference in xlc was in the initial loop entry.
extern int doit( int );
void loop1( ) {
for ( int ii = 0; ii < 10; ii++ ) {
doit( ii );
}
}
void loop2() {
int ii = 0;
while ( ii < 10 ) {
doit( ii );
ii++;
}
}
XLC OUTPUT
.loop2: # 0x00000000 (H.10.NO_SYMBOL)
mfspr r0,LR
stu SP,-80(SP)
st r0,88(SP)
cal r3,0(r0)
st r3,64(SP)
l r3,64(SP) ### DIFFERENCE ###
cmpi 0,r3,10
bc BO_IF_NOT,CR0_LT,__L40
...
.loop1: # 0x0000006c (H.10.NO_SYMBOL+0x6c)
mfspr r0,LR
stu SP,-80(SP)
st r0,88(SP)
cal r3,0(r0)
cmpi 0,r3,10 ### DIFFERENCE ###
st r3,64(SP)
bc BO_IF_NOT,CR0_LT,__La8
...
The scope of the variable in the test of the while loop is wider than the scope of variables declared in the header of the for loop.
Therefore, if there are performance implications as a side-effect of keeping a variable alive longer, then there will be performance implications in choosing between a while and a for loop ( and not wrapping the while up in {} to reduce the scope of its variables ).
An example might be a concurrent collection which counts the number of iterators referring to it, and if more than one iterator exists, it applies locking to prevent concurrent modification, but as an optimisation elides the locking if only one iterator refers to it. If you then had two for loops in a function using differently named iterators on the same container, the fast path would be taken, but with two while loops the slow path would be taken. Similarly there may be performance implications if the objects are large (more cache traffic), or use system resources. But I can't think of a real example that I've ever seen where it would make a difference.
Compilers that optimize using loop unrolling might only do so in the for-loop case.
Both are equivalent. It's a matter of semantics.
The only difference may lie in the do ... while construct, where you postpone evaluating the condition until after the body, and thus may save one evaluation.
i = 1; do { ... i--; } while( i > 0 );
as opposed to
for( i = 1; i > 0; --i )
{ ....
}
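To make the "save one evaluation" point concrete, the sketch below counts how often the condition i > 0 is evaluated in each form (function names are hypothetical). Note that a do ... while body always runs at least once, so the two forms are only interchangeable when the loop is known to execute at least once, as with i = 1 here.

```cpp
// Counts how often the loop condition i > 0 is evaluated in a for loop.
int for_checks(int start) {
    int checks = 0;
    auto cond = [&checks](int i) { ++checks; return i > 0; };
    for (int i = start; cond(i); --i) {
        // body
    }
    return checks;
}

// do ... while skips the test before the first iteration. This is only
// equivalent to the for loop when the body is known to run at least once.
int do_while_checks(int start) {
    int checks = 0;
    auto cond = [&checks](int i) { ++checks; return i > 0; };
    int i = start;
    do {
        --i;  // body, then decrement, as in the do ... while above
    } while (cond(i));
    return checks;
}
```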
I write compilers. We compile all "structured" control flow (if, while, for, switch, do...while) into conditional and unconditional branches. Then we analyze the control-flow graph. Since a C compiler has to deal with general goto anyway, it is easiest to reduce everything to branch and conditional-branch instructions, then be sure to handle that case well. (A C compiler has to do a good job not just on handwritten code but also on automatically generated code, which may have many, many goto statements.)
No. If they're doing equivalent things, they'll compile to the same code - as you say, it's about semantics. Choose the one that best represents what you're trying to express.
Ideally it should be the same, but eventually it depends on your compiler/interpreter. To be sure, you must measure or examine the generated assembly code.
Proof that there may be a difference: These lines produce different assembly code using cc65.
for (; i < 1000; ++i);
while (i < 1000) ++i;
On Atmel ATmega, while() is faster than for(). The reason is explained in AVR035: Efficient C Coding for AVR.
P.S. Original platform was not mentioned in question.
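continue behaves differently in for and while: in a for loop, it still executes the increment expression (so the counter changes); in a while loop, it jumps straight to the condition, so an increment at the end of the body is skipped.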
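A small sketch of that difference (function names are hypothetical): both functions collect the odd numbers below n, but in the while version the increment has to happen before the continue, because continue in a while loop jumps straight back to the condition.

```cpp
#include <vector>

// for: `continue` jumps to the increment expression (++i), then to the test.
std::vector<int> odd_with_for(int n) {
    std::vector<int> out;
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0)
            continue;          // ++i still runs
        out.push_back(i);
    }
    return out;
}

// while: `continue` jumps straight back to the test, so an increment placed
// at the bottom of the body would be skipped and the loop would never end.
// Advancing i before the continue avoids that.
std::vector<int> odd_with_while(int n) {
    std::vector<int> out;
    int i = 0;
    while (i < n) {
        int cur = i++;         // advance first
        if (cur % 2 == 0)
            continue;
        out.push_back(cur);
    }
    return out;
}
```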
To add another answer: In my experience, optimizing software is like a big, bushy beard being shaved off a man.
First you lop it off in big chunks with scissors (prune whole limbs off the call tree).
Then you make it short with an electric clipper (tweak algorithms).
Finally you shave it with a razor to get rid of the last little bit (low-level optimization).
The last is where the difference between for() and while() might, but probably won't, make a difference.
P.S. The programmers I know (who are all very good, and I suspect are a representative sample) basically go at it from the other direction.
They are the same as far as performance goes. I tend to use while when waiting for a state change (such as waiting for a buffer to be filled) and for when processing a number of discrete objects (such as going through each item in a collection).
There is a difference in some cases.
If you are at the point where that difference matters, you either need to pick a better algorithm or begin coding in assembly language. Trust me, coding in assembly is preferable to fixing your compiler version.
Is while() faster/slower than for()? Let's review a few things about optimization:
Compiler writers work very hard to shave cycles by generating fewer jump, compare, increment, and other such instructions.
Function calls, on the other hand, can cost orders of magnitude more cycles, but the compiler is nearly powerless to remove most of them.
As programmers, we write lots of function calls, some because we mean to, some because we're lazy, and some because the compiler slips them in without being obvious.
Most of the time, it doesn't matter, because the hardware is so fast, and our jobs are so small, that the computer is like a beagle that wolfs her food and begs for more.
Sometimes, however, the job is big enough that performance is an issue.
What do we do then? Where's the bigger payoff?
Getting the compiler to shave a few cycles off loops & such?
Finding function calls that don't -really- need to be done so much?
The compiler can't do the latter. Only we the programmers can.
We need to learn or be taught how to do this. It doesn't come naturally.
We are congenitally inclined to make wrong guesses and then bet on them.
Getting better algorithms is a start, but only a start. Our teachers need to teach this, if indeed they know how.
Profilers are a start. I do this.
The apocryphal quote of Willie Sutton, when asked "Why do you rob banks?":
Because that's where the money is.
If you want to save cycles, find out where they are.
Probably only coding style.
for if you know the number of iterations.
while if you do not know the number of iterations.