Is while faster than for?

Is while faster than for? - c++

As in the topic, I learnt in school, that loop for is faster than loop while, but someone told me that while is faster.
I must optimize the program and I want to write while instead for, but I have a concern that it will be slower?
for example I can change for loop:
for (int i=0; i<x; i++)
{
cout<<"dcfvgbh"<<endl;
}
into while loop:
i=0;
while (i<x)
{
cout<<"dcfvgbh"<<endl;
i++;
}

The standard requires (§6.5.3/1) that:
The for statement
for ( for-init-statement conditionopt; expressionopt) statement
is equivalent to
{
for-init-statement
while ( condition ) {
statement
expression;
}
}
As such, you're unlikely to see much difference between them (even if execution time isn't necessarily part of the equivalence specified in the standard). There are a few exceptions listed to the equivalence as well (scopes of names, execution of the expression before evaluating the condition if you execute a continue). The latter could, at least theoretically, affect speed a little bit under some conditions, but probably not enough to notice or care about as a rule, and definitely not unless you actually used a continue inside the loop.

For all intents and purposes for is just a fancy way of writing while, so there is no performance advantage either way. The main reason to use one over the other is how the intent is translated so the reader understands better what the loop is actually doing.

No.
Nope, it's not.
It is not faster.

You cout will eat 99% of the clock cycles for this loop. Beware micro-optimization. At any rate, these two will give essentially identical code.
The only time when a for loop can be faster is when you have a known terminating condition - e.g.
for(ii = 0; ii < 24; ii++)
because some optimizing compilers will perform loop unrolling. This means they will not perform a test on every pass through the loop because they can "see" that just doing the thing inside the loop 24 times (or 6 times in blocks of 4, etc) will be a tiny bit more efficient. When the thing inside the loop is very small (e.g. jj += ii;), such optimization makes the for loop a bit faster than the while (which typically doesn't do "unrolling").
Otherwise - no difference.
update at the request of #zeroth
Source: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.9346&rep=rep1&type=pdf
Quote from source (my emphasis):
Unrolling a loop at the source- code level involves identification of
loop constructs (e.g., for, while, do-while, etc.), determination of
the loop count to ensure that it is a counting loop, replication of
the loop body and the adjustment of loop count of the unrolled loop. A
prologue or epilogue code may also be inserted. Using this approach,
it is difficult to unroll loops formed using a while and goto
statements since the loop count is not obvious. However, for all but
the simplest of loops, this approach is tedious and error prone.
The other alternative is to unroll loops automatically. Automatic
unrolling can be done early on source code, late on the unoptimized
intermediate representation, or very late on an optimized
representation of the program. If it is done at the source-code level,
then typically only counting loops formed using for statements are
unrolled. Unrolling loops formed using other control constructs is
difficult since the loop count is not obvious.

To the best of my knowledge swapping out for loops for while loops is not an established optimization technique.
Both your examples will be identical in performance, but as an exercise you could time them to confirm this for yourself.

Related

How can we elimate break statement in C/C++ programs to support compiler for auto vectorization?

There are some of the rules to be followed while programming, making the program easy for the compiler to auto vectorize.
Some of the rules include:
No 'break' statements(The loop should have single flow)
No 'if' statements(Only masked assignment allowed) and many other
It would be great to know what steps can be taken in order to replace these conditional break statements.
I would like to give a simple example to make it easy to understand but the program can contain many 'break' statements which are usually depending on rigorous condition and status flags.
Consider a simple program working on a 2D array:
void main()
{
int i,j,sum = 0;
int array[3][5] = {{1,3,6,8,9},{23,65,77,54,5},{2,5,-7,-89,-8}};
for(i=0;i<3;i++)
{
for(j=0;j<5;j++)
{
sum += a[i][j];
if(i==2 && j==2)
{
break;//breaks out of loop j
}
}
}
return 0;
}

It would be great to know what steps can be taken in order to replace these conditional break statements.
Generally speaking, you don't.
A break statement within a loop exists to terminate the loop, typically based on a condition other than the loop's normal termination condition. Loops usually have a terminal condition built into them, as well as a specific syntactic place where that condition is found. As such, if the code in question is decently written, a break statement will only appear in cases where the additional termination condition cannot reasonably be added to the loop termination condition.
So if we're talking about code that is reasonable (and if we're not, then vectorization is the least of your worries), then rewriting it to fold the break condition into the loop condition is either straight-up impossible or will lead to code that is exceedingly difficult to follow/maintain.
So in general, if break is being used because it's necessary, then it isn't going to be possible/reasonable to fold it into the loop condition. And even if it is, a complex loop termination condition can kill auto-vectorization just as effectively as a break statement.
Most code is not going to be amenable to vectorization. Your trivial example is non-vectorizable because it contains an OS system call as part of its operation. Vectorization typically applies to looping over sequences of values and performing fairly simple operations on them.

Performance impact of using 'break' inside 'for-loop'

I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related question refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break gets almost never called). And if it has, I would also like to know tentatively how big the penalization is.
I am quite suspicions that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:
Independently of the extra code for the conditional statements that
trigger the break (like an if), it necessarily ads additional
instructions to my loop.
Further, it probably also messes around when my compiler tries to
unfold the for-loop, as it no longer knows the number of iterations
that will run at compile time, effectively rendering it into a
while-loop.
Therefore, I suspect it does have a performance impact, which could be
considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I loose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
// do stuff
}
// USING FOR
for(int i=100; i; --i)
{
if(checkCondition()) {
// do stuff
} else {
break;
}
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -s (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense) as I am not sure if I got this completely right :)

The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.
The real answer is that CPU have powerful branch prediction circuits which empirically work really well.
What will happen is that your CPU will choose if the branch is going to be taken or not and execute the code as if the if condition is not even present. Of course this relies on multiple assumptions, like not having side effects on the condition calculation (so that part of the body loop depends on it) and that that condition will always evaluate to false up to a certain point in which it will become true and stop the loop.
Some compilers also allow you to specify the likeliness of an evaluation as a hint the branch predictor.
If you want to see the semantic difference between the two code versions just compile them with -S and examinate the generated asm code, there's no other magic way to do it.

The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
stop &= checkcondition(); // returns 0 or all-bits-set;
sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i<32; i++)
{
if (!checkcondition())
break;
sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.

Compilers, If statements and loops

This is a general efficiency question for c++. I am not familiar with the inner workings of compilers, so suppose I have several loops and a potential if statement inside, e.g.:
for(int i=0; ...)
{
for(int j=0; ...)
{
if( ... )
{
...
}
else
{
... (slightly different)
}
}
}
However, this if-statement is independent of the loops. Is there a significant speed difference if I instead define the if/else statement outside of the loops with the loops inside? E.g.:
if( ... )
{
for(int i=0; ...)
{
for(int j=0; ...)
{
...
}
}
}
else
{
for(int i=0; ...)
{
for(int j=0; ...)
{
... (slightly different)
}
}
}
If so, or if not so, why is that? I have some notion that a compiler will recognize the same if statement being done over and over, but this is quite unfamiliar territory to me.
I examined the response to this question:
Would compiler optimize conditional statement in loop by moving it ouside the loop?
and he discusses the different levels of optimization in gcc, and how -O3 (I think) would do that. But is anything done like this automatically? If not, how big of a cost is an if-statement like this inside of a loop?

The only real answer is maybe. If the condition is a loop
invariant, then the transposition you suggest is legal, and if
the compiler can recognize the loop invariance, then it can make
the transposition. Whether it does or not depends on the
compiler: g++ /O3 does, at least in 64 bit mode, cl /Ox /Os
doesn't, at least in 32 bit mode; g++ also unrolls the two loops.
In my tests, at least; I more or less guaranteed that the
compiler could determine that the condition was a loop invariant
by wrapping the loop in a function, with the condition
a function argument of type bool const; depending on the
condition, it may be more or less difficult for the compiler to
prove loop invariance. And of course, the fact that the
compiler has more registers to play with in 64 bit mode could
also affect its optimizations .
Also: although I'd instinctively expect the g++ version to be
faster, it is significantly larger; in some cases, this may
negatively affect the various memory caches, resulting in the
code actually running slower.
In the end, I'd write the first, always. If the profiler later
shows it to be a bottleneck, there's no issue about going back
and rewriting it along the lines of the second, then measuring
to see if it makes a difference, one way or the other, and how
much difference it makes. And be aware that the best results
may depend on the compiler and the architecture you are
targetting.

This question is impossible to say which is quicker.
Just remember the 80-20 rule (http://en.wikipedia.org/wiki/Pareto_principle#In_software) and find the bit of code by profiling.
Anyway just write the code readable and maintainable in the first place. If you have performance problems profile the code.

Depends on case but I would bet second option is faster. This is because no branching will happen and compiler has higher chance to replace lot of code with some MMX/SSE group of instructions.
Also at least theoretically in first case CPU has to solve same if() in each for() cycle. In second case if() is outside of look and should be faster. But again, modern compilers often can find this problem and solve it magically.
But yes, usually it is important to write code for readability unless performance is real big concern.

Constant embedded for loop condition optimization in C++ with gcc

Will a compiler optimize tihs:
bool someCondition = someVeryTimeConsumingTask(/* ... */);
for (int i=0; i<HUGE_INNER_LOOP; ++i)
{
if (someCondition)
doCondition(i);
else
bacon(i);
}
into:
bool someCondition = someVeryTimeConsumingTask(/* ... */);
if (someCondition)
for (int i=0; i<HUGE_INNER_LOOP; ++i)
doCondition(i);
else
for (int i=0; i<HUGE_INNER_LOOP; ++i)
bacon(i);
someCondition is trivially constant within the for loop.
This may seem obvious and that I should do this myself, but if you have more than one condition then you are dealing with permuatations of for loops, so the code would get quite a bit longer. I am deciding on whether to do it (I am already optimizing) or whether it will be a waste of my time.

It's possible that the compiler might write the code as you did, but I've never seen such optimization.
However there is something called branch prediction in modern CPU. In essence it means that when the processor is asked to execute a conditional jump, it'll start to execute what is judged to be the likeliest branch before evaluating the condition. This is done to keep the pipeline full of instructions.
In case the processor fails (and takes the bad branch) it cause a flush of the pipeline: it's called a misprediction.
A very common trait of this feature is that if the same test produce the same result several times in a row, then it'll be considered to produce the same result by the branch prediction algorithm... which is of course tailored for loops :)
It makes me smile because you are worrying about the if within the for body while the for itself causes a branch prediction >> the condition must be evaluated at each iteration to check whether or not to continue ;)
So, don't worry about it, it costs less than a cache miss.
Now, if you really are worried about this, there is always the functor approach.
typedef void (*functor_t)(int);
functor_t func = 0;
if (someCondition) func = &doCondition;
else func = &bacon;
for (int i=0; i<HUGE_INNER_LOOP; ++i) (*func)(i);
which sure looks much better, doesn't it ? The obvious drawback is the necessity for compatible signatures, but you can write wrappers around the functions for that. As long as you don't need to break/return, you'll be fine with this. Otherwise you would need a if in the loop body :D

It does not seem to do so with either -O2 or -O3 optimisations. This is something you can (and should, if you are concerned with optimisation) test for yourself - compile with the optimisation you are interested in and examine the emitted assembly language.

Have you profiled your app to find out where the slowdowns are? If not, why are you even thinking about optimization? Until you know which methods need to be optimized, you're wasting your time worrying about micro-optimizations like this.
Is this the location of the slowdown? If so, then what you're doing may be useful. Yes, the compiler may optimize this, but there's no guarantee that it does. If this isn't the location of the slowdown, then look elsewhere; the cost of one additional branch every time through the loop is probably trivial relative to all of the other work you're doing.

Is Loop Hoisting still a valid manual optimization for C code?

Using the latest gcc compiler, do I still have to think about these types of manual loop optimizations, or will the compiler take care of them for me well enough?

If your profiler tells you there is a problem with a loop, and only then, a thing to watch out for is a memory reference in the loop which you know is invariant across the loop but the compiler does not. Here's a contrived example, bubbling an element out to the end of an array:
for ( ; i < a->length - 1; i++)
swap_elements(a, i, i+1);
You may know that the call to swap_elements does not change the value of a->length, but if the definition of swap_elements is in another source file, it is quite likely that the compiler does not. Hence it can be worthwhile hoisting the computation of a->length out of the loop:
int n = a->length;
for ( ; i < n - 1; i++)
swap_elements(a, i, i+1);
On performance-critical inner loops, my students get measurable speedups with transformations like this one.
Note that there's no need to hoist the computation of n-1; any optimizing compiler is perfectly capable of discovering loop-invariant computations among local variables. It's memory references and function calls that may be more difficult. And the code with n-1 is more manifestly correct.
As others have noted, you have no business doing any of this until you've profiled and have discovered that the loop is a performance bottleneck that actually matters.

Write the code, profile it, and only think about optimising it when you have found something that is not fast enough, and you can't think of an alternative algorithm that will reduce/avoid the bottleneck in the first place.
With modern compilers, this advice is even more important - if you write simple clean code, the compiler's optimiser can often do a better job of optimising the code than it can if you try to give it snazzy "pre-optimised" code.

Check the generated assembly and see for yourself. See if the computation for the loop-invariant code is being done inside the loop or outside the loop in the assembly code that your compiler generates. If it's failing to do the loop hoisting, do the hoisting yourself.
But as others have said, you should always profile first to find your bottlenecks. Once you've determined that this is in fact a bottleneck, only then should you check to see if the compiler's performing loop hoisting (aka loop-invariant code motion) in the hot spots. If it's not, help it out.

Compilers generally do an excellent job with this type of optimization, but they do miss some cases. Generally, my advice is: write your code to be as readable as possible (which may mean that you hoist loop invariants -- I prefer to read code written that way), and if the compiler misses optimizations, file bugs to help fix the compiler. Only put the optimization into your source if you have a hard performance requirement that can't wait on a compiler fix, or the compiler writers tell you that they're not going to be able to address the issue.

Where they are likely to be important to performance, you still have to think about them.
Loop hoisting is most beneficial when the value being hoisted takes a lot of work to calculate. If it takes a lot of work to calculate, it's probably a call out of line. If it's a call out of line, the latest version of gcc is much less likely than you are to figure out that it will return the same value every time.
Sometimes people tell you to profile first. They don't really mean it, they just think that if you're smart enough to figure out when it's worth worrying about performance, then you're smart enough to ignore their rule of thumb. Obviously, the following code might as well be "prematurely optimized", whether you have profiled or not:
#include <iostream>
bool isPrime(int p) {
for (int i = 2; i*i <= p; ++i) {
if ((p % i) == 0) return false;
}
return true;
}
int countPrimesLessThan(int max) {
int count = 0;
for (int i = 2; i < max; ++i) {
if (isPrime(i)) ++count;
}
return count;
}
int main() {
for (int i = 0; i < 10; ++i) {
std::cout << "The number of primes less than 1 million is: ";
std::cout << countPrimesLessThan(1000*1000);
std::cout << std::endl;
}
}
It takes a "special" approach to software development not to manually hoist that call to countPrimesLessThan out of the loop, whether you've profiled or not.

Early optimizations are bad only if other aspects - like readability, clarity of intent, or structure - are negatively affected.
If you have to declare it anyway, loop hoisting can even improve clarity, and it explicitely documents your assumption "this value doesn't change".
As a rule of thumb I wouldn't hoist the count/end iterator for a std::vector, because it's a common scenario easily optimized. I wouldn't hoist anything that I can trust my optimizer to hoist, and I wouldn't hoist anything known to be not critical - e.g. when running through a list of dozen windows to respond to a button click. Even if it takes 50ms, it will still appear "instanteneous" to the user. (But even that is a dangerous assumption: if a new feature requires looping 20 times over this same code, it suddenly is slow). You should still hoist operations such as opening a file handle to append, etc.
In many cases - very well in loop hoisting - it helps a lot to consider relative cost: what is the cost of the hoisted calculation compared to the cost of running through the body?
As for optimizations in general, there are quite some cases where the profiler doesn't help. Code may have very different behavior depending on the call path. Library writers often don't know their call path otr frequency. Isolating a piece of code to make things comparable can already alter the behavior significantly. The profiler may tell you "Loop X is slow", but it won't tell you "Loop X is slow because call Y is thrashing the cache for everyone else". A profiler couldn't tell you "this code is fast because of your snarky CPU, but it will be slow on Steve's computer".

A good rule of thumb is usually that the compiler performs the optimizations it is able to.
Does the optimization require any knowledge about your code that isn't immediately obvious to the compiler? Then it is hard for the compiler to apply the optimization automatically, and you may want to do it yourself
In most cases, lop hoisting is a fully automatic process requiring no high-level knowledge of the code -- just a lot of lifetime and dependency analysis, which is what the compiler excels at in the first place.
It is possible to write code where the compiler is unable to determine whether something can be hoisted out safely though -- and in those cases, you may want to do it yourself, as it is a very efficient optimization.
As an example, take the snippet posted by Steve Jessop:
for (int i = 0; i < 10; ++i) {
std::cout << "The number of primes less than 1 billion is: ";
std::cout << countPrimesLessThan(1000*1000*1000);
std::cout << std::endl;
}
Is it safe to hoist out the call to countPrimesLessThan? That depends on how and where the function is defined. What if it has side effects? It may make an important difference whether it is called once or ten times, as well as when it is called. If we don't know how the function is defined, we can't move it outside the loop. And the same is true if the compiler is to perform the optimization.
Is the function definition visible to the compiler? And is the function short enough that we can trust the compiler to inline it, or at least analyze the function for side effects? If so, then yes, it will hoist it outside the loop.
If the definition is not visible, or if the function is very big and complicated, then the compiler will probably assume that the function call can not be moved safely, and then it won't automatically hoist it out.

Remember 80-20 Rule.(80% of execution time is spent on 20% critical code in the program)
There is no meaning in optimizing the code which have no significant effect on program's overall efficiency.
One should not bother about such kind of local optimization in the code.So the best approach is to profile the code to figure out the critical parts in the program which consumes heavy CPU cycles and try to optimize it.This kind of optimization will really makes some sense and will result in improved program efficiency.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js