C++ optimization: if statement performance

Consider these two forms of an if statement:
if( error ==0 )
{
// DO success stuff
}
else
{
// DO error handling stuff
}
and this one:
if( error != 0 )
{
// DO error handling stuff
}
else
{
// DO success stuff
}
Which one outperforms the other, given that most of the time execution takes the success code path?

Rather than worrying about this, which might be a performance issue only in the rarest of cases, you should ask yourself which is more readable. For error checks, you could use a guard clause, which avoids too many indentations/brackets:
if( error != 0 )
{
// DO error handling stuff
return;
}
// DO success stuff
If you know that one path is more likely than the other and you are sure that this is really performance critical, you could let the compiler know (example for GCC):
if( __builtin_expect (error == 0, 1) )
{
// DO success stuff
}
else
{
// DO error handling stuff
}
Of course, this makes the code harder to read - only use it if really necessary.

It depends.
When the code is run just one time, it would be statistically faster to put the more likely case first, if and only if the CPU's implementation of branch prediction is not "sharing counters" between many lines (e.g. every 16th statement shares the same counter). However, this is not how most code runs. It will run multiple times: a dozen, or a trillion (e.g. in a while loop).
Multiple runs
Neither will perform better than the other. The reason is branch prediction. Every time your program runs an if statement, the CPU counts up or down depending on whether the condition was true. This way it can predict the outcome with high accuracy the next time the code runs. If you test your code a billion times, you will see that it does not matter whether the if part or the else part gets executed. The CPU will optimize for what it thinks is the most likely case.
This is a simplified explanation, as CPU branch prediction is smart enough to also detect code that always flip-flops: true, false, true, false, true, false or even true, true, false, true, true, false...
You can learn a lot from the Wikipedia article about branch prediction.
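The "count up or down" idea can be sketched as a toy model. This is a minimal simulation of a single 2-bit saturating counter, assuming the simplest textbook scheme; real CPUs use far more elaborate predictors, and the names TwoBitPredictor and count_mispredictions are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy model (an assumption, not how any specific CPU works):
// one 2-bit saturating counter with states 0..3.
// States 2 and 3 predict "taken", states 0 and 1 predict "not taken".
struct TwoBitPredictor {
    int state = 1;  // start at "weakly not taken"

    bool predict() const { return state >= 2; }

    // Count up when the branch was taken, down when it was not.
    void update(bool taken) {
        if (taken) {
            if (state < 3) ++state;
        } else {
            if (state > 0) --state;
        }
    }
};

// Replays a sequence of branch outcomes and counts wrong predictions.
std::size_t count_mispredictions(const std::vector<bool>& outcomes) {
    TwoBitPredictor predictor;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (predictor.predict() != taken) ++misses;
        predictor.update(taken);
    }
    return misses;
}
```

With 1000 identical outcomes this model mispredicts at most once, whichever way round the condition is written. A single rare failure in the middle of a long run of successes costs one extra misprediction.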

GCC's default behavior is to optimize for the true case of the if statement. Based on that, it chooses whether je or jne should be used.
If you know which code path is more likely and want fine control over it, use the following macros.
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
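A minimal usage sketch for such macros. The uppercase names, the fallback branch for other compilers, and the process function are assumptions made for this illustration; only __builtin_expect itself comes from GCC (Clang supports it too).

```cpp
// Assumption: GCC or Clang. Other compilers fall back to the plain condition.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (!!(x))
#define UNLIKELY(x) (!!(x))
#endif

// Hypothetical worker: doubles the value on success, returns -1 on error.
int process(int error, int value) {
    if (UNLIKELY(error != 0)) {
        return -1;      // rare error path
    }
    return value * 2;   // common success path
}
```

The hint only affects how the compiler lays out the code. The result of process is the same with or without it.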

They will perform identically. At the assembly level you are just trading a jz instruction for a jnz. Neither executes more, or more complex, instructions than the other.

It is quite unlikely that you will notice much of a difference between the two pieces of code. Comparing with 0 is the same operation whether the code later jumps if it's "true" or "false". So, use the form that expresses your "meaning" in the code best, rather than trying to "outsmart the compiler" (unless you are REALLY good at it, you will probably just confuse things).
You have if ... else ... in both cases, so both need a branch either way. Most compilers make a single return point, so even if you have a return in the middle of the function, the compiler will still emit a branch from the return to the bottom of the function and, if the condition is false, a branch to jump over it.
The only really beneficial way to solve this is to use hints that the branch is/isn't taken, which at least on some processors can be beneficial (and the compiler can turn the branches around so that the less likely conditions make the most branches). But it's also rather unportable, since the C and C++ languages don't have any features to allow such feedback to the compiler. But some compilers do implement such things.
Of course, the effect/result of this is VERY dependent on what the actual processor is (modern x86 has hints to the processor that feed into the branch prediction unit if there is no "history" for this particular branch - older x86, as used in some embedded systems, etc, won't have that. Other processors may or may not have the same feature - I believe ARM has a couple of bits to say "this is likely taken/not taken" as well). Ideally, for this, you want "profile driven optimisation", so the compiler can instrument and organise the code based on the most likely variants.
Always use profiling and benchmarks to measure the results of any optimisation. It is often difficult to guess what is better just by looking at the code (even more so if you don't see the machine code the compiler generates).

Any compiler should optimize the difference. Proof below. If error is set at runtime, then . . .
Using g++4.8 with -O3
This
int main(int argc, char **argv) {
bool error=argv[1];
if( error ){
return 0;
}else{
return 1;
}
}
makes . .
main:
xorl %eax, %eax
cmpq $0, 8(%rsi)
setne %al
ret
and this...
int main(int argc, char **argv) {
bool error=argv[1];
if( !error ){
return 1;
}else{
return 0;
}
}
...makes...
main:
xorl %eax, %eax
cmpq $0, 8(%rsi)
setne %al
ret
Same stuff to the CPU. Use the machine code when in doubt. http://gcc.godbolt.org/

Related

Is comparing to zero faster than comparing to any other number?

Is
if(!test)
faster than
if(test==-1)
I can produce assembly but there is too much assembly produced and I can never locate the particulars I'm after. I was hoping someone just knows the answer. I would guess they are the same unless most CPU architectures have some sort of "compare to zero" short cut.
thanks for any help.
Typically, yes. In typical processors testing against zero, or testing sign (negative/positive) are simple condition code checks. This means that instructions can be re-ordered to omit a test instruction. In pseudo assembly, consider this:
Loop:
LOADCC r1, test // load test into register 1, and set condition codes
BCZS Loop // If zero was set, go to Loop
Now consider testing against 1:
Loop:
LOAD r1, test // load test into register 1
SUBT r1, 1 // Subtract Test instruction, with destination suppressed
BCNE Loop // If not equal to 1, go to Loop
Now for the usual pre-optimization disclaimer: Is your program too slow? Don't optimize, profile it.
It depends.
Of course it's going to depend: not all architectures are equal, not all µarchs are equal, and even compilers aren't equal, but I'll assume they compile this in a reasonable way.
Let's say the platform is 32bit x86, the assembly might look something like
test eax, eax
jnz skip
Vs:
cmp eax, -1
jnz skip
So what's the difference? Not much. The first snippet takes a byte less. The second snippet might be implemented with an inc to make it shorter, but that would make it destructive so it doesn't always apply, and anyway, it's probably slower (but again it depends).
Take any modern Intel CPU. They do "macro fusion", which means they take a comparison and a branch (subject to some limitations), and fuse them. The comparison becomes essentially free in most cases. The same goes for test. Not inc though, but the inc trick only really applied in the first place because we just happened to compare to -1.
Apart from any "weird effects" (due to changed alignment and whatnot), there should be absolutely no difference on that platform. Not even a small difference.
Even if you got lucky and got the test for free as a result of a previous arithmetic instruction, it still wouldn't be any better.
It'll be different on other platforms, of course.
On x86 there won't be any noticeable difference, unless you are doing some math at the same time (e.g. in while(--x) the result of --x automatically sets the condition codes, whereas while(x) ... necessitates some sort of test on the value in x before we know whether it's zero or not).
Many other processors do have "automatic updates of the condition codes on LOAD or MOVE instructions", which means that checking for "positive", "negative" and "zero" is "free" with every movement of data. Of course, you pay for that by not being able to backward propagate the compare instruction from the branch instruction, so if you have a comparison, the very next instruction MUST be a conditional branch - where an extra instruction between these would possibly help with alleviating any delay in the "result" from such an instruction.
In general, these sorts of micro-optimisations are best left to compilers, rather than the user - the compiler will quite often convert for(i = 0; i < 1000; i++) into for(i = 1000-1; i >= 0; i--) if it thinks that makes sense [and the order of the loop isn't important in the compiler's view]. Trying to be clever with these sorts of things tends to make the code unreadable, and performance can suffer badly on other systems (because when you start tweaking "natural" code to "unnatural", the compiler tends to think that you really meant what you wrote, and not optimise it the same way as the "natural" version).

What is faster: compare then change, or change immediately?

Let's say I'm running very fast loops and I have to be sure that at the end of each iteration the variable a is SOMEVALUE. Which will be faster?
if (a != SOMEVALUE) a = SOMEVALUE;
or just instantly do
a = SOMEVALUE;
Is it float/int/bool/language specific?
Update: a is a primitive type, not a class. And the possibility of TRUE comparison is 50%. I know that the algorithm is what makes a loop fast, so my question is also about the coding style.
Update2: thanks everyone for quick answers!
In almost all cases just setting the value will be faster.
It might not be faster when you have to deal with cache line sharing with other cpus or if 'a' is in some special type of memory, but it's safe to assume that a branch misprediction is probably a more common problem than cache sharing.
Also - smaller code is better, not just for the cache but also for making the code comprehensible.
If in doubt - profile.
The general answer to this kind of question is to profile. However, in this case a simple analysis is available:
Each test is a branch. Each branch incurs a slight performance penalty. However, we have branch prediction, and this penalty is somewhat amortized over time, depending on how many iterations your loop has and how many times the prediction was correct.
Translated into your case, if you have many changes to a during the loop it is very likely that the code using if will perform worse. On the other hand, if the value is updated very rarely there would be a vanishingly small difference between the two cases.
Still, change immediately is better and should be used, as long as you don't care about the previous value, as your snippets show.
Other reasons for an immediate change: it leads to smaller code and thus better cache locality, and thus better performance. It is a very rare situation in which updating a will invalidate a cache line and incur a performance hit. Still, if I remember correctly, this will bite you only in multiprocessor cases, and very rarely.
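Keep in mind that there are cases when the two are not equivalent. With floating point, comparisons have corner cases: NaN compares unequal to everything, including itself, and -0.0 compares equal to 0.0, so the conditional form can leave a different value in a than the plain assignment does.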
Also, this comment treats only the case of C. In C++ you can have classes where the assignment operator / copy constructor takes longer than testing for equality. In that case, you might want to test first.
Taking into account your update, it's better to simply use assignment, as long as you're sure you are not relying on floating-point corner cases. Coding-style wise it is also better: easier to read.
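For a floating-point a, the two forms can leave different values behind. A small sketch, assuming IEEE 754 doubles; assign_if_different and same_sign_bit are hypothetical helper names introduced only for this illustration.

```cpp
#include <cmath>

// The conditional form from the question, applied to a double.
inline void assign_if_different(double& a, double value) {
    if (a != value) {
        a = value;
    }
}

// True when both doubles have the same sign bit.
// This distinguishes -0.0 from 0.0, which operator== cannot do.
inline bool same_sign_bit(double x, double y) {
    return std::signbit(x) == std::signbit(y);
}
```

Starting from a = -0.0 and a target of 0.0, the conditional form leaves -0.0 in place because -0.0 == 0.0, while a plain assignment stores +0.0. If a holds a NaN, the comparison is always "different", so the conditional form always assigns.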
You should profile it.
My guess would be that there is little difference, depending on how often the test is true (this is due to branch-prediction).
Of course, just setting it has the smallest absolute code size, which frees up instruction cache for more interesting code.
But, again, you should profile it.
I would be surprised if the answer wasn't a = somevalue, but there is no generic answer to this question. Firstly, it depends on the speed of copy versus the speed of equality comparison. If the equality comparison is very fast then your first option may be better. Secondly, as always, it depends on your compiler/platform. The only way to answer such questions is to try both methods and time them.
As others have said, profiling it is going to be the easiest way to tell as it depends a lot on what kind of input you're throwing at it. However, if you think about the computational complexity of the two algorithms, the more input you throw at it, the smaller any possible difference of them becomes.
As you are asking this for a C++ program, I assume that you are compiling the code into native machine instructions.
Assigning the value directly without any comparison should be much faster in any case. To compare the values, both a and SOMEVALUE have to be transferred to registers and a compare instruction (cmp) has to be executed.
But in the latter case, where you assign directly, you just move one value from one memory location to another.
The only way the assignment can be slower is when memory writes are significantly costlier than memory reads. I don't see that happening.
Profile the code. Change accordingly.
For basic types, the no branch option should be faster. MSVS for example doesn't optimize the branch out.
That being said, here's an example of where the comparison version is faster:
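#include <iostream>
using std::cout;
struct X
{
bool comparisonDone;
X() : comparisonDone(false) {}
// Records that a comparison happened and always reports "different".
bool operator != (const X& other) { comparisonDone = true; return true; }
X& operator = (const X& other)
{
// Assignment is only expensive when no comparison was done first.
if ( !comparisonDone )
{
for ( int i = 0 ; i < 1000000 ; i++ )
cout << i;
}
return *this;
}
};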
int main()
{
X a;
X SOMEVALUE;
if (a != SOMEVALUE) a = SOMEVALUE;
a = SOMEVALUE;
}
Change immediately is usually faster, as it involves no branch in the code.
As commented below and answered by others, it really depends on many variables, but IMHO the real question is: do you care what the previous value was? If you do, you should check; otherwise, you shouldn't.
That if can actually be 'optimized away' by some compilers, basically turning the if into code noise (for the programmer who's reading it).
When I compile the following function with GCC for x86 (with -O1, which is a pretty reasonable optimization level):
int foo (int a)
{
int b;
if (b != a)
b = a;
b += 5;
return b;
}
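GCC just 'optimizes' the if and the assignment away, and simply uses the argument to do the addition (note that b is read uninitialized here, which is undefined behaviour and gives the compiler extra freedom):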
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
popl %ebp
addl $5, %eax
ret
.ident "GCC: (GNU) 4.4.3"
Having or not having the if generates exactly the same code.

Is x >= 0 more efficient than x > -1?

Doing a comparison in C++ with an int is x >= 0 more efficient than x > -1?
short answer: no.
longer answer to provide some educational insight: it depends entirely on your compiler, although I bet that every sane compiler creates identical code for the 2 expressions.
example code:
int func_ge0(int a) {
return a >= 0;
}
int func_gtm1(int a) {
return a > -1;
}
and then compile and compare the resulting assembler code:
% gcc -S -O2 -fomit-frame-pointer foo.cc
yields this:
_Z8func_ge0i:
.LFB0:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
movl 4(%esp), %eax
notl %eax
shrl $31, %eax
ret
.cfi_endproc
vs.
_Z9func_gtm1i:
.LFB1:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
movl 4(%esp), %eax
notl %eax
shrl $31, %eax
ret
.cfi_endproc
(compiler: g++-4.4)
conclusion: don't try to outsmart the compiler, concentrate on algorithms and data structures, benchmark and profile real bottlenecks, if in doubt: check the output of the compiler.
You can look at the resulting assembly code, which may differ from architecture to architecture, but I would bet that the resulting code for either would require exactly the same cycles.
And, as mentioned in the comments - better write what's most comprehensible, optimize when you have real measured bottlenecks, which you can identify with a profiler.
BTW: As rightly mentioned, x > -1 may cause problems if x is unsigned. The -1 is implicitly converted to unsigned (although you should get a warning about that), which would yield an incorrect result.
The last time I answered such a question I just wrote "measure", and filled out with periods until SO accepted it.
That answer was downvoted 3 times in a few minutes, and deleted (along with at least one other answer of the question) by an SO moderator.
Still, there is no alternative to measuring.
So it is the only possible answer.
And in order to go on and on about this in sufficient detail that the answer is not just downvoted and deleted, you need to keep in mind that what you're measuring is just that: that a single set of measurements does not necessarily tell you anything in general, but just a specific result. Of course it might sound patronizing to mention such obvious things. So, OK, let that be it: just measure.
Or, should I perhaps mention that most processors have a special instruction for comparing against zero, and yet that that does not allow one to conclude anything about performance of your code snippets?
Well, I think I stop there. Remember: measure. And don't optimize prematurely!
EDIT: an amendment with the points mentioned by @MooingDuck in the commentary.
The question:
Doing a comparison in C++ with an int is x >= 0 more efficient than x > -1?
What’s wrong with the question
Donald Knuth, author of the classic three volume work The Art of Computer Programming, once wrote[1],
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”
How efficient x >= 0 is compared to x > -1 is most often irrelevant. I.e. it’s most likely a wrong thing to focus on.
How clearly it expresses what you want to say, is much more important. Your time and the time of others maintaining this code is generally much more important than the execution time of the program. Focus on how well the code communicates to other programmers, i.e., focus on clarity.
Why the focus of the question is wrong
Clarity affects the chance of correctness. Any code can be made arbitrarily fast if it does not need to be correct. Correctness is therefore most important, and means that clarity is very important – much more important than shaving a nano-second of execution time…
And the two expressions are not equivalent wrt. clarity, and wrt. their chance of being correct.
If x is a signed integer, then x >= 0 means exactly the same as x > -1. But if x is an unsigned integer, e.g. of type unsigned, then x > -1 means x > static_cast<unsigned>(-1) (via implicit conversion), which in turn means x > std::numeric_limits<unsigned>::max(). Which is presumably not what the programmer meant to express!
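The signed/unsigned difference can be checked at compile time. This is a small sketch; the function names are made up for the illustration, and the unsigned case spells the conversion out explicitly instead of writing x > -1 directly, so that it compiles without sign-comparison warnings.

```cpp
#include <limits>

// Signed case: both spellings agree for every int.
constexpr bool ge_zero(int x)      { return x >= 0; }
constexpr bool gt_minus_one(int x) { return x > -1; }

// Unsigned case, written out explicitly: this is what `x > -1` becomes
// once -1 has been converted to unsigned.
constexpr bool gt_minus_one_unsigned(unsigned x) {
    return x > static_cast<unsigned>(-1);
}

static_assert(static_cast<unsigned>(-1) == std::numeric_limits<unsigned>::max(),
              "-1 converts to the largest unsigned value");
static_assert(ge_zero(0) == gt_minus_one(0), "signed forms agree at 0");
static_assert(ge_zero(-1) == gt_minus_one(-1), "signed forms agree at -1");
static_assert(!gt_minus_one_unsigned(0u), "never true for unsigned");
static_assert(!gt_minus_one_unsigned(std::numeric_limits<unsigned>::max()),
              "not even for the maximum value");
```

No unsigned value is greater than the maximum unsigned value, so the unsigned comparison is false for every x.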
Another reason why the focus is wrong (it’s on micro-efficiency, while it should be on clarity) is that the main impact on efficiency comes in general not from timings of individual operations (except in some cases from dynamic allocation and from the even slower disk and network operations), but from algorithmic efficiency. For example, writing …
string s = "";
for( int i = 0; i < n; ++i ) { s = s + "-"; }
is pretty inefficient, because it uses time proportional to the square of n, O(n^2), quadratic time.
But writing instead …
string s = "";
for( int i = 0; i < n; ++i ) { s += "-"; }
reduces the time to proportional to n, O(n), linear time.
With the focus on individual operation timings one could be thinking now about writing '-' instead of "-", and such silly details. Instead, with the focus on clarity, one would be focusing on making that code more clear than with a loop. E.g. by using the appropriate string constructor:
string s( n, '-' );
Wow!
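For reference, the three variants above can be put side by side as functions. This is only a sketch with hypothetical function names, assuming n is not negative; all three build the same string and differ only in how much work they do to get there.

```cpp
#include <cstddef>
#include <string>

// Rebuilds the whole string on every iteration (the quadratic variant above).
std::string dashes_concat(int n) {
    std::string s = "";
    for (int i = 0; i < n; ++i) { s = s + "-"; }
    return s;
}

// Appends in place (the linear variant above).
std::string dashes_append(int n) {
    std::string s = "";
    for (int i = 0; i < n; ++i) { s += "-"; }
    return s;
}

// Uses the fill constructor. Assumes n >= 0.
std::string dashes_ctor(int n) {
    return std::string(static_cast<std::size_t>(n), '-');
}
```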
Finally, a third reason to not sweat the small stuff is that in general it’s just a very small part of the code that contributes disproportionally to the execution time. And identifying that part (or parts) is not easy to do by just analyzing the code. Measurements are needed, and this kind of "where is it spending its time" measurement is called profiling.
How to figure out the answer to the question
Twenty or thirty years ago one could get a reasonable idea of efficiency of individual operations, by simply looking at the generated machine code.
For example, you can look at the machine code by running the program in a debugger, or by using the appropriate option to ask the compiler to generate an assembly language listing. Note for g++: the option -masm=intel is handy for telling the compiler not to generate ungrokkable AT&T syntax assembly, but instead Intel syntax assembly. E.g., Microsoft's assembler uses extended Intel syntax.
Today the computer's processor is smarter. It can execute instructions out of order and even before their effect is needed for the "current" point of execution. The compiler may be able to predict that (by incorporating effective knowledge gleaned from measurements), but a human has little chance.
The only recourse for the ordinary programmer is therefore to measure.
Measure, measure, measure!
And in general this involves doing the thing to be measured, a zillion times, and dividing by a zillion.
Otherwise the startup time and take-down time will dominate, and the result will be garbage.
Of course, if the generated machine code is the same, then measuring will not tell you anything useful about the relative difference. It can then only indicate something about how large the measurement error is. Because you know then that there should be zero difference.
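The "do it a zillion times and divide by a zillion" recipe can be sketched with std::chrono. average_ns is a hypothetical helper name, and this is a rough illustration rather than a rigorous benchmark harness: it does nothing about warm-up, frequency scaling, or the compiler removing work it considers dead.

```cpp
#include <chrono>
#include <cstdint>

// Runs body(i) `iterations` times and returns the average time per call
// in nanoseconds. Returns 0 when there is nothing to run.
template <typename F>
double average_ns(F&& body, std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        body(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

One assumed way to use it is to pass a lambda that writes the result of the expression under test into a volatile variable, so the optimizer cannot drop the work, and then to compare the averages of the two candidates over several runs.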
Why measuring is the right approach
Let’s say that theoretical considerations in an SO answer indicated that x > -1 will be slower than x >= 0.
The compiler can beat any such theoretical consideration by generating awful code for that x >= 0, perhaps due to a contextual "optimization" opportunity that it then (unfortunately!) recognizes.
The computer's processor can likewise make a mess out of the prediction.
So in any case you then have to measure.
Which means that the theoretical consideration has told you nothing useful: you’ll be doing the same anyway, namely, measuring.
Why this elaborated answer, while apparently helpful, is IMHO really not
Personally I would prefer the single word “measure” as an answer.
Because that’s what it boils down to.
Anything else the reader not only can figure out on his own, but will have to figure out the details of anyway – so that it’s just verbiage to try to describe it here, really.
References:
[1] Knuth, Donald. Structured Programming with go to Statements, ACM Computing Surveys, Vol. 6, No. 4, Dec. 1974, p. 268.
Your compiler is free to decide how to implement those (which assembly instructions to use). Because of that, there is no difference. One compiler could implement x > -1 as x >= 0 and another could implement x >= 0 as x > -1. If there is any difference (unlikely), your compiler will pick the better one.
They should be equivalent. Both will be translated by the compiler into a single assembly instruction (neglecting that both will need to load x into a register). On any modern day processor there is a 'greater-than' instruction and a 'greater-than-or-equal' instruction. And since you are comparing it to a constant value, it will take the same amount of time.
Don't fret over the minute details; find the big performance problems (like algorithm design) and attack those. Look at Amdahl's Law.
I doubt there is any measurable difference. The compiler should emit some assembly code with a jump instruction such as JGE (jump if greater or equal) or JG (jump if greater), which are the signed counterparts of JAE and JA. These instructions likely take the same number of cycles.
Ultimately, it doesn't matter. Just use what is more clear to a human reader of your code.

Is "for(;;)" faster than "while (true)"? If not, why do people use it?

for (;;) {
//Something to be done repeatedly
}
I have seen this sort of thing used a lot, but I think it is rather strange...
Wouldn't it be much clearer to say while(true), or something along those lines?
I'm guessing that (as is the reason for many-a-programmer to resort to cryptic code) this is a tiny margin faster?
Why, and is it really worth it? If so, why not just define it this way:
#define while(true) for(;;)
See also: Which is faster: while(1) or while(2)?
It's not faster.
If you really care, compile with assembler output for your platform and look to see.
It doesn't matter. This never matters. Write your infinite loops however you like.
I prefer for(;;) for two reasons.
One is that some compilers produce warnings on while(true) (something like "loop condition is constant"). Avoiding warnings is always a good thing to do.
Another is that I think for(;;) is clearer and more telling.
I want an infinite loop. It literally has no condition, it depends on nothing. I just want it to continue forever, until I do something to break out of it.
Whereas with while(true), well, what's true got to do with anything? I'm not interested in looping until true becomes false, which is what this form literally says (loop while true is true). I just want to loop.
And no, there is absolutely no performance difference.
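As a small illustration of "loop until I break out", here are both spellings of the same loop. The function names and the halving task are made up for this sketch; the two functions are meant to behave identically.

```cpp
// Counts how many times value can be halved before it reaches zero.
int count_halvings_for(unsigned value) {
    int steps = 0;
    for (;;) {                    // no condition at all
        if (value == 0) break;    // the only way out of the loop
        value /= 2;
        ++steps;
    }
    return steps;
}

// Same logic, spelled with a constant condition instead.
int count_halvings_while(unsigned value) {
    int steps = 0;
    while (true) {
        if (value == 0) break;
        value /= 2;
        ++steps;
    }
    return steps;
}
```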
Personally I use for (;;) because there aren't any numbers in it, it's just a keyword. I prefer it to while (true), while (1), while (42), while (!0) etc etc.
Because of Dennis Ritchie
I started using for (;;) because that's the way Dennis Ritchie does it in K&R, and when learning a new language I always try to imitate the smart guys.
This is idiomatic C/C++. It's probably better in the long run to get used to it if you plan on doing much in the C/C++ space.
Your #define won't work, since the thing being #define'd has to look like a C identifier.
All modern compilers will generate the same code for the two constructs.
I prefer for (;;) because it's the most consistent in different C-like languages.
In C++ while (true) is fine, but in C you depend on a header to define true, yet TRUE is a commonly used macro too. If you use while (1) it's correct in C and C++, and JavaScript, but not Java or C#, which require the loop condition to be a boolean, such as while (true) or while (1 == 1). In PHP, keywords are case-insensitive but the language prefers the capitalization TRUE.
However, for (;;) is always completely correct in all of those languages.
It's certainly not faster in any sane compiler. They will both be compiled into unconditional jumps. The for version is easier to type (as Neil said) and will be clear if you understand for loop syntax.
If you're curious, here is what gcc 4.4.1 gives me for x86. Both use the x86 JMP instruction.
void while_infinite()
{
while(1)
{
puts("while");
}
}
void for_infinite()
{
for(;;)
{
puts("for");
}
}
compiles to (in part):
.LC0:
.string "while"
.text
.globl while_infinite
.type while_infinite, @function
while_infinite:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
.L2:
movl $.LC0, (%esp)
call puts
jmp .L2
.size while_infinite, .-while_infinite
.section .rodata
.LC1:
.string "for"
.text
.globl for_infinite
.type for_infinite, @function
for_infinite:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
.L5:
movl $.LC1, (%esp)
call puts
jmp .L5
.size for_infinite, .-for_infinite
I personally prefer the for (;;) idiom (which will compile to the same code as while (TRUE)).
While using while (TRUE) may be more readable in one sense, I've decided to use the for (;;) idiom because it stands out.
An infinite loop construct should be easily noticed or called out in code, and I personally think the for (;;) style does this a bit better than while (TRUE) or while (1).
Also, I recall that some compilers issue warnings when the controlling expression of a while loop is a constant. I don't think that happens too much, but just the potential for spurious warnings is enough for me to want to avoid it.
I've seen some people prefer it because they have a #define somewhere like this:
#define EVER ;;
Which allows them to write this:
for (EVER)
{
/* blah */
}
What about (if your language supports it):
start:
/* BLAH */
goto start;
There's no difference in terms of the machine code that is generated.
However, just to buck the trend, I'd argue that the while(TRUE) form is much more readable and intuitive than for(;;), and that readability and clarity are much more important reasons for coding guidelines than any reasons I've heard for the for(;;) approach (I prefer to base my coding guidelines on solid reasoning and/or proof of effectiveness myself).
while(true)
generates a warning with Visual Studio (condition is constant). Most places I've worked compile production builds with warnings as errors. So
for(;;)
is preferred.
Not just a well-known pattern, but a standard idiom in C (and C++)
Both should be the same if your code is optimized by the compiler. To explain what I mean by optimization, here is some sample code compiled with MSVC 10:
int x = 0;
while(true) // for(;;)
{
x +=1;
printf("%d", x);
}
If you build it in Debug mode (without any optimization (/Od)), the disassembly shows a clear difference. There are extra instructions for the true condition inside the while.
while(true)
00D313A5 mov eax,1 //extra
00D313AA test eax,eax //extra
00D313AC je main+39h (0D313B9h) //extra
{
x +=1;
00D313AE mov eax,dword ptr [x]
00D313B1 add eax,1
00D313B4 mov dword ptr [x],eax
printf("%d", x);
...
}
00D313B7 jmp main+25h (0D313A5h)
for(;;)
{
x +=1;
00D213A5 mov eax,dword ptr [x]
00D213A8 add eax,1
00D213AB mov dword ptr [x],eax
printf("%d", x);
...
}
00D213AE jmp main+25h (0D213A5h)
However, if you build your code in Release mode (with the default Maximize Speed (/O2)) you get the same output for both. Both loops are reduced to one jump instruction.
for(;;)
{
x +=1;
01291010 inc esi
printf("%d", x);
...
}
0129101C jmp main+10h (1291010h)
while(true)
{
x +=1;
00311010 inc esi
printf("%d", x);
...
}
0031101C jmp main+10h (311010h)
Whichever you use does not matter for a decent compiler with speed optimization on.
It's a matter of personal preference which way is faster. Personally, I am a touchtypist and never look at my keyboard, during programming -- I can touchtype all 104 keys on my keyboard.
I find it faster to type "while (TRUE)".
I mentally added some finger movement measurements and totalled them up.
"for(;;)" has about 12 key-widths of movements back and forth (between home keys and the keys, and between home keys and SHIFT key)
"while (TRUE)" has about 14 key-widths of movements back and forth.
However, I am vastly less error-prone when typing the latter. I mentally think in words at a time, so I find it faster to type things like "nIndex" than acronyms such as "nIdx" because I have to actually mentally spell out the lettering rather than speak it inside my mind and let my fingers auto-type the word (like riding a bicycle)
(My TypingTest.com benchmark = 136 WPM)
I cannot imagine that a worthwhile compiler would generate any different code. Even if it did, there would be no way of determining without testing the particular compiler which was more efficient.
However I suggest you prefer for(;;) for the following reasons:
a number of compilers I have used will generate a constant expression warning for while(true) with appropriate warning level settings.
in your example the macro TRUE may not be defined as you expect
there are many possible variants of the infinite while loop such as while(1), while(true), while(1==1) etc.; so for(;;) is likely to result in greater consistency.
All good answers - behavior should be exactly the same.
HOWEVER - Just suppose it DID make a difference. Suppose one of them took 3 more instructions per iteration.
Should you care?
ONLY if what you do inside the loop is almost nothing, which is almost never the case.
My point is, there is micro-optimization and macro-optimization. Micro-optimization is like "getting a haircut to lose weight".
The "forever" loop is popular in embedded systems as a background loop. Some people implement it as:
for (; ;)
{
// Stuff done in background loop
}
And sometimes it is implemented as:
while (TRUE /* or use 1 */)
{
// Stuff done in background loop
}
And yet another implementation is:
do
{
// Stuff done in background loop
} while (1 /* or TRUE */);
An optimizing compiler should generate the same or similar assembly code for these fragments. One important note: the execution time for the loops is not a big concern since these loops go on forever, and more time is spent in the processing section.
for(;;Sleep(50))
{
// Lots of code
}
is clearer than:
while(true)
{
// Lots of code
Sleep(50);
}
Not that this applies if you aren't using Sleep().
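One behavioural difference between the two forms above is worth knowing: a `continue` inside the `for` version still runs the third expression, while a `continue` inside the `while` version jumps past a sleep placed at the end of the body. The sketch below uses a counter in place of `Sleep(50)` so it finishes instantly; the function names are hypothetical.

```cpp
// Counts how often the "sleep" step runs. A counter stands in for
// Sleep(50) so the example does not actually wait.

int sleeps_in_for_loop(int iterations)
{
    int sleeps = 0;
    int i = 0;
    for (;; ++sleeps)                  // ++sleeps plays the role of Sleep(50)
    {
        if (++i > iterations) break;   // break skips the increment step
        if (i % 2 == 0) continue;      // continue still runs ++sleeps
        // Lots of code
    }
    return sleeps;
}

int sleeps_in_while_loop(int iterations)
{
    int sleeps = 0;
    int i = 0;
    while (true)
    {
        if (++i > iterations) break;
        if (i % 2 == 0) continue;      // continue jumps past the sleep below
        // Lots of code
        ++sleeps;                      // plays the role of Sleep(50)
    }
    return sleeps;
}
```

With four iterations the `for` version "sleeps" four times and the `while` version only twice, so the two forms are interchangeable only when the body never uses `continue`.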
The most important reason to use "for(;;)" is the fear of using "while(TRUE)" when you do exploratory programming. It's easier to control the number of repetitions with "for", and also easier to convert the "for" loop into an infinite one.
For example, if you are constructing a recursive function, you can limit the number of calls to the function before converting the loop into an infinite one.
for(int i=0;i<1000;i++) recursiveFunction(oneParam);
When I'm sure of my function, then I convert it to an infinite loop:
for(;;) recursiveFunction(oneParam);
I assume while(true) is more readable than for(;;) -- the latter looks like the programmer forgot something in the for loop :)
As others have pointed out, it does not matter at all from a technical view. Some people think one is more readable than the other, and they have different opinions about which it is.
To me, that's basically just nitpicking, because any C programmer should be able to instantly recognize both while(1) and for(;;) as infinite loops.
Your define will not work. However, you CAN (but shouldn't) do this:
#define repeat for(;;)
int main(void)
{
repeat {
puts("Hello, World");
}
}
But really, DON'T do things like that...

Why is this code slower even if the function is inlined?

I have a method like this :
bool MyFunction(int& i)
{
switch(m_step)
{
case 1:
if (AComplexCondition)
{
i = m_i;
return true;
}
case 2:
// some code
case 3:
// some code
}
}
Since there are lots of case statements (more than 3) and the function is becoming large, I tried to extract the code in case 1 and put it in an inline function like this:
inline bool funct(int& i)
{
if (AComplexCondition)
{
i = m_i;
return true;
}
return false;
}
bool MyFunction(int& i)
{
switch(m_step)
{
case 1:
if (funct(i))
{
return true;
}
case 2:
// some code
case 3:
// some code
}
}
It seems this code is significantly slower than the original. I checked with -Winline and the function is inlined. Why is this code slower? I thought it would be equivalent. The only difference I see is there is one more conditional check in the second version, but I thought the compiler should be able to optimize it away. Right?
edit:
Some people suggested that I should use gdb to step over every assembly instruction in both versions to see the differences. I did this.
The first version looks like this:
mov
callq (Call to AComplexCondition())
test
je (doesn't jump)
mov (i = m_i)
movl (m_step = 1)
The second version, which is a bit slower, seems simpler.
movl (m_step = 1)
callq (Call to AComplexCondition())
test
je (doesn't jump)
mov (i = m_i)
xchg %ax,%ax (This is a nop I think)
These two versions seem to do the same thing, so I still don't know why the second version is slower.
Just step through it. Plant a breakpoint, go into the disassembly view, and start stepping.
All mysteries will vanish.
This is very hard to track down. One problem could be code bloat causing the majority of the loop to be pushed out of the (small) CPU cache... But that doesn't entirely make sense either now that I think of it..
What I suggest doing:
Isolate the code and condition as much as possible while still being able to observe the slowdown.
Then, go profile it. Does the profiling make sense? Now (assuming you're up for the adventure), disassemble the code and look at what g++ is doing differently. Report those results back here.
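As a starting point for that isolation step, here is a minimal sketch. `Stepper`, `complex_condition`, `direct`, `extracted` and `time_ns` are all hypothetical names, the condition is a trivial placeholder for the real one, and a `break` replaces the fall-through of the original switch to keep the sample small. An optimizer may remove work whose result is unused, so treat any timings as rough and confirm them with a real profiler.

```cpp
#include <chrono>

// Hypothetical stand-ins for the question's members and condition.
struct Stepper
{
    int m_step = 1;
    int m_i = 42;

    bool complex_condition() const { return m_i % 2 == 0; }

    // Version 1: the condition is written directly in the case.
    bool direct(int& i)
    {
        switch (m_step)
        {
        case 1:
            if (complex_condition())
            {
                i = m_i;
                return true;
            }
            break;
        }
        return false;
    }

    // Version 2: the condition is extracted into a helper.
    bool helper(int& i)
    {
        if (complex_condition())
        {
            i = m_i;
            return true;
        }
        return false;
    }

    bool extracted(int& i)
    {
        switch (m_step)
        {
        case 1:
            if (helper(i))
            {
                return true;
            }
            break;
        }
        return false;
    }
};

// Runs fn `runs` times and returns the elapsed time in nanoseconds.
template <typename Fn>
long long time_ns(Fn fn, int runs)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r) fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `time_ns([&] { s.direct(i); }, runs)` and the same for `s.extracted(i)` gives two numbers to compare before moving on to the disassembly.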
GMan is correct: inline doesn't guarantee that your function will be inlined. It is a hint to the compiler that inlining might be a good idea. If the compiler decides it is not wise to inline the function, you pay the overhead of a function call, which at the very least means two jumps being executed (one into the function and one back). The function's instructions are stored in a non-sequential location rather than directly after the call site, so execution moves to that new location, completes the function, and moves back to the instruction after the call.
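If the worry is that the compiler declines the hint, GCC and Clang provide an `always_inline` attribute and MSVC provides `__forceinline`, both of which ask for inlining more forcefully than plain `inline`. A minimal sketch follows; the `FORCE_INLINE` macro name, the extra parameters and the placeholder condition are assumptions made for illustration and are not taken from the question's code.

```cpp
// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline
#endif

// Hypothetical stand-in for the question's condition and assignment.
FORCE_INLINE bool funct(int& i, int m_i)
{
    if (m_i > 0)   // placeholder for AComplexCondition
    {
        i = m_i;
        return true;
    }
    return false;
}

bool my_function(int step, int m_i, int& i)
{
    switch (step)
    {
    case 1:
        if (funct(i, m_i))
        {
            return true;
        }
        break;
    default:
        break;
    }
    return false;
}
```

Forcing inlining does not by itself make the code faster; it only removes one variable when comparing the two versions.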
Without seeing the ComplexCondition part, it's hard to say. If that condition is sufficiently complex, the compiler won't be able to pipeline it properly and it will interfere with the branch prediction in the chip. Just a possibility.
Does the assembler tell you anything about what's happening? It might be easier to look at the disassembly than to have us guess, although I go along with iaimtomisbehave's jmp idea generally.
This is a good question. Let us know what you find. I do have a few thoughts mostly stemming from the compiler no longer being able to break up the code you have inlined, but no guaranteed answer.
Statement order. It makes sense that the compiler would put this statement, with its complex code, last. That means the other cases would be evaluated first and it would never get checked unless necessary. If you simplify the statement, the compiler might not do this, meaning your crazy conditional gets fully evaluated every time.
Creating extra cases. It should be possible to pull some of the conditionals out of the if statement and make an extra case statement in some circumstances. That could eliminate some checking.
Pipelining defeated. Even if the function is inlined, the compiler won't be able to break up the code inside the inlined body. This is the basic issue with all three of these, but with pipelining it obviously causes problems, since for pipelining you want to start executing before you get to the check itself.