Is x >= 0 more efficient than x > -1? - c++

Doing a comparison in C++ with an int is x >= 0 more efficient than x > -1?

Short answer: no.
Longer answer, to provide some educational insight: it depends entirely on your compiler, although I bet that every sane compiler generates identical code for the two expressions.
Example code:
int func_ge0(int a) {
    return a >= 0;
}

int func_gtm1(int a) {
    return a > -1;
}
and then compile and compare the resulting assembler code:
% gcc -S -O2 -fomit-frame-pointer foo.cc
yields this:
_Z8func_ge0i:
.LFB0:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
movl 4(%esp), %eax
notl %eax
shrl $31, %eax
ret
.cfi_endproc
vs.
_Z9func_gtm1i:
.LFB1:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
movl 4(%esp), %eax
notl %eax
shrl $31, %eax
ret
.cfi_endproc
(compiler: g++-4.4)
conclusion: don't try to outsmart the compiler; concentrate on algorithms and data structures, benchmark and profile real bottlenecks, and if in doubt, check the output of the compiler.

You can look at the resulting assembly code, which may differ from architecture to architecture, but I would bet that the resulting code for either would require exactly the same cycles.
And, as mentioned in the comments - better write what's most comprehensible, optimize when you have real measured bottlenecks, which you can identify with a profiler.
BTW: as rightly mentioned in the comments, x > -1 may cause problems if x is unsigned. In that case the -1 is implicitly converted to unsigned (you should get a warning on the comparison), which yields an incorrect result.

The last time I answered such a question I just wrote "measure", and filled out with periods until SO accepted it.
That answer was downvoted 3 times in a few minutes, and deleted (along with at least one other answer of the question) by an SO moderator.
Still, there is no alternative to measuring.
So it is the only possible answer.
And in order to go on and on about this in sufficient detail that the answer is not just downvoted and deleted, you need to keep in mind that what you're measuring is just that: that a single set of measurements does not necessarily tell you anything in general, but just a specific result. Of course it might sound patronizing to mention such obvious things. So, OK, let that be it: just measure.
Or, should I perhaps mention that most processors have a special instruction for comparing against zero, and yet that that does not allow one to conclude anything about performance of your code snippets?
Well, I think I stop there. Remember: measure. And don't optimize prematurely!
EDIT: an amendment with the points mentioned by @MooingDuck in the comments.
The question:
Doing a comparison in C++ with an int is x >= 0 more efficient than x > -1?
What’s wrong with the question
Donald Knuth, author of the classic three volume work The Art of Computer Programming, once wrote[1],
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”
How efficient x >= 0 is compared to x > -1 is most often irrelevant. I.e. it's most likely the wrong thing to focus on.
How clearly it expresses what you want to say, is much more important. Your time and the time of others maintaining this code is generally much more important than the execution time of the program. Focus on how well the code communicates to other programmers, i.e., focus on clarity.
Why the focus of the question is wrong
Clarity affects the chance of correctness. Any code can be made arbitrarily fast if it does not need to be correct. Correctness is therefore most important, which means that clarity is very important – much more important than shaving a nanosecond off the execution time…
And the two expressions are not equivalent wrt. clarity, or wrt. their chance of being correct.
If x is a signed integer, then x >= 0 means exactly the same as x > -1. But if x is an unsigned integer, e.g. of type unsigned, then x > -1 means x > static_cast<unsigned>(-1) (via implicit conversion of the -1), which in turn means x > std::numeric_limits<unsigned>::max(). Which is presumably not what the programmer meant to express!
Another reason why the focus is wrong (it’s on micro-efficiency, while it should be on clarity) is that the main impact on efficiency comes in general not from timings of individual operations (except in some cases from dynamic allocation and from the even slower disk and network operations), but from algorithmic efficiency. For example, writing …
string s = "";
for( int i = 0; i < n; ++i ) { s = s + "-"; }
is pretty inefficient, because it uses time proportional to the square of n, O(n²) – quadratic time.
But writing instead …
string s = "";
for( int i = 0; i < n; ++i ) { s += "-"; }
reduces the time to proportional to n, O(n), linear time.
With the focus on individual operation timings one could be thinking now about writing '-' instead of "-", and such silly details. Instead, with the focus on clarity, one would be focusing on making that code more clear than with a loop. E.g. by using the appropriate string constructor:
string s( n, '-' );
Wow!
Finally, a third reason to not sweat the small stuff is that in general it’s just a very small part of the code that contributes disproportionally to the execution time. And identifying that part (or parts) is not easy to do by just analyzing the code. Measurements are needed, and this kind of "where is it spending its time" measurement is called profiling.
How to figure out the answer to the question
Twenty or thirty years ago one could get a reasonable idea of efficiency of individual operations, by simply looking at the generated machine code.
For example, you can look at the machine code by running the program in a debugger, or you can use the appropriate option to ask the compiler to generate an assembly language listing. Note for g++: the option -masm=intel is handy for telling the compiler not to generate ungrokkable AT&T syntax assembly, but Intel syntax assembly instead. E.g., Microsoft's assembler uses extended Intel syntax.
Today the computer's processor is smarter. It can execute instructions out of order, and even before their effect is needed at the "current" point of execution. The compiler may be able to predict that (by incorporating effective knowledge gleaned from measurements), but a human has little chance.
The only recourse for the ordinary programmer is therefore to measure.
Measure, measure, measure!
And in general this involves doing the thing to be measured, a zillion times, and dividing by a zillion.
Otherwise the startup time and take-down time will dominate, and the result will be garbage.
Of course, if the generated machine code is the same, then measuring will not tell you anything useful about the relative difference. It can then only indicate something about how large the measurement error is. Because you know then that there should be zero difference.
Why measuring is the right approach
Let’s say that theoretical considerations in an SO answer indicated that x >= 0 will be slower than x > -1.
The compiler can beat any such theoretical consideration by generating awful code for that x > -1, perhaps due to a contextual "optimization" opportunity that it then (unfortunately!) recognizes.
The computer's processor can likewise make a mess out of the prediction.
So in any case you then have to measure.
Which means that the theoretical consideration has told you nothing useful: you’ll be doing the same anyway, namely, measuring.
Why this elaborated answer, while apparently helpful, is IMHO really not
Personally I would prefer the single word “measure” as an answer.
Because that’s what it boils down to.
Anything else the reader not only can figure out on his own, but will have to figure out the details of anyway – so that it’s just verbiage to try to describe it here, really.
References:
[1] Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.

Your compiler is free to decide how to implement those (which assembly instructions to use). Because of that, there is no difference. One compiler could implement x > -1 as x >= 0 and another could implement x >= 0 as x > -1. If there is any difference (unlikely), your compiler will pick the better one.

They should be equivalent. Both will be translated by the compiler into a single assembly instruction (neglecting that both will need to load x into a register). On any modern day processor there is a 'greater-than' instruction and a 'greater-than-or-equal' instruction. And since you are comparing it to a constant value, it will take the same amount of time.
Don't fret over the minute details; find the big performance problems (like algorithm design) and attack those. Look at Amdahl's Law.

I doubt there is any measurable difference. Since x is a signed int, the compiler should emit assembly with a conditional jump such as JGE (jump if greater or equal) or JG (jump if greater). These instructions take the same number of cycles.
Ultimately, it doesn't matter. Just use what is more clear to a human reader of your code.

Related

What is faster: compare then change, or change immediately?

Say I'm running very fast loops and I have to be sure that at the end of each loop the variable a equals SOMEVALUE. Which will be faster?
if (a != SOMEVALUE) a = SOMEVALUE;
or just instantly do
a = SOMEVALUE;
Is it float/int/bool/language specific?
Update: a is a primitive type, not a class. And the probability of the comparison being TRUE is 50%. I know that the algorithm is what makes a loop fast, so my question is also about coding style.
Update2: thanks everyone for quick answers!
In almost all cases just setting the value will be faster.
It might not be faster when you have to deal with cache-line sharing with other CPUs, or if a is in some special type of memory, but it's safe to assume that a branch misprediction is a more common problem than cache sharing.
Also - smaller code is better, not just for the cache but also for making the code comprehensible.
If in doubt - profile.
The general answer is to profile for such questions. However, in this case a simple analysis is available:
Each test is a branch. Each branch incurs a slight performance penalty. However, we have branch prediction and this penalty is somewhat amortized in time, depending how many iterations your loop has and how many times the prediction was correct.
Translated into your case, if you have many changes to a during the loop it is very likely that the code using if will be worse in performance. On the other hand, if the value is updated very rarely there would be an infinitely small difference between the two cases.
Still, change immediately is better and should be used, as long as you don't care about the previous value, as your snippets show.
Other reasons for an immediate change: it leads to smaller code, thus better cache locality and better code performance. It is a very rare situation in which updating a will invalidate a cache line and incur a performance hit; and if I remember correctly, this will bite you only in multi-processor cases, and very rarely.
Keep in mind that there are cases when the two are not similar. Comparing NaNs gives surprising results: a NaN compares unequal to everything, including itself, so with floats the != test can behave unexpectedly.
Also, this comment treats only the case of C. In C++ you can have classes where the assignment operator / copy constructor takes longer than testing for equality. In that case, you might want to test first.
Taking into account your update, it's better to simply use assignment, as long as you're sure you're not dealing with surprising comparison semantics (NaN floats). Coding-style wise it is also better: easier to read.
You should profile it.
My guess would be that there is little difference, depending on how often the test is true (this is due to branch-prediction).
Of course, just setting it has the smallest absolute code size, which frees up instruction cache for more interesting code.
But, again, you should profile it.
I would be surprised if the answer wasn't a = somevalue, but there is no generic answer to this question. Firstly, it depends on the speed of copying versus the speed of equality comparison. If the equality comparison is very fast then your first option may be better. Secondly, as always, it depends on your compiler/platform. The only way to answer such questions is to try both methods and time them.
As others have said, profiling it is going to be the easiest way to tell as it depends a lot on what kind of input you're throwing at it. However, if you think about the computational complexity of the two algorithms, the more input you throw at it, the smaller any possible difference of them becomes.
As you are asking this for a C++ program, I assume that you are compiling the code into native machine instructions.
Assigning the value directly without any comparison should be much faster in any case. To compare the values, both a and SOMEVALUE must be loaded into registers and a compare (cmp) instruction executed.
But in the latter case, where you assign directly, you just move one value from one memory location to another.
The only way the assignment can be slower is if memory writes are significantly costlier than memory reads. I don't see that happening.
Profile the code. Change accordingly.
For basic types, the no branch option should be faster. MSVS for example doesn't optimize the branch out.
That being said, here's an example of where the comparison version is faster:
#include <iostream>
using std::cout;

struct X
{
    bool comparisonDone;
    X() : comparisonDone(false) {}
    bool operator != (const X& other) { comparisonDone = true; return true; }
    X& operator = (const X& other)
    {
        if ( !comparisonDone )
        {
            for ( int i = 0 ; i < 1000000 ; i++ )
                cout << i;
        }
        return *this;
    }
};

int main()
{
    X a;
    X SOMEVALUE;
    if (a != SOMEVALUE) a = SOMEVALUE;
    a = SOMEVALUE;
}
Change immediately is usually faster, as it involves no branch in the code.
As commented below and answered by others, it really depends on many variables, but IMHO the real question is: do you care what was the previous value? If you are, you should check, otherwise, you shouldn't.
That if can actually be 'optimized away' by some compilers, basically turning the if into code noise (for the programmer who's reading it).
When I compile the following function with GCC for x86 (with -O1, which is a pretty reasonable optimization level):
int foo (int a)
{
    int b;
    if (b != a)
        b = a;
    b += 5;
    return b;
}
GCC just 'optimizes' the if and the assignment away, and simply uses the argument to do the addition:
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
popl %ebp
addl $5, %eax
ret
.ident "GCC: (GNU) 4.4.3"
Having or not having the if generates exact the same code.

Which is faster (mask >> i & 1) or (mask & 1 << i)?

In my code I must choose one of these two expressions (where mask and i are non-constant integers, -1 < i < (sizeof(int) << 3) + 1). I don't think this will make the performance of my program better or worse, but it is very interesting to me. Do you know which is better, and why?
First of all, whenever you find yourself asking "which is faster", your first reaction should be to profile, measure and find out for yourself.
Second of all, this is such a tiny calculation, that it almost certainly has no bearing on the performance of your application.
Third, the two are most likely identical in performance.
C expressions cannot be "faster" or "slower", because CPU cannot evaluate them directly.
Which one is "faster" depends on the machine code your compiler will be able to generate for these two expressions. If your compiler is smart enough to realize that in your context both do the same thing (e.g. you simply compare the result with zero), it will probably generate the same code for both variants, meaning that they will be equally fast. In such case it is quite possible that the generated machine code will not even remotely resemble the sequence of operations in the original expression (i.e. no shift and/or no bitwise-and). If what you are trying to do here is just test the value of one bit, then there are other ways to do it besides the shift-and-bitwise-and combination. And many of those "other ways" are not expressible in C. You can't use them in C, while the compiler can use them in machine code.
For example, the x86 CPU has a dedicated bit-test instruction BT that extracts the value of a specific bit by its number. So a smart compiler might simply generate something like
MOV eax, i
BT mask, eax
...
for both of your expressions (assuming it is more efficient, of which I'm not sure).
Use either one and let your compiler optimize it however it likes.
If "i" is a compile-time constant, then the second would execute fewer instructions -- the 1 << i would be computed at compile time. Otherwise I'd imagine they'd be the same.
Depends entirely on where the values mask and i come from, and the architecture on which the program is running. There's also nothing to stop the compiler from transforming one into the other in situations where they are actually equivalent.
In short, not worth worrying about unless you have a trace showing that this is an appreciable fraction of total execution time.
It is unlikely that either will be faster. If you are really curious, compile a simple program that does both, disassemble, and see what instructions are generated.
Here is how to do that:
gcc -O0 -g main.c -o main
objdump -d main | less
You could examine the assembly output and then look-up how many clock cycles each instruction takes.
But in 99.9999999 percent of programs, it won't make a lick of difference.
The two expressions are not logically equivalent, so performance is not your concern!
If performance was your concern, write a loop to do 10 million of each and measure.
EDIT: You edited the question after my response ... so please ignore my answer as the constraints change things.

rate ++a, a++, a=a+1 and a+=1 in terms of execution efficiency in C. Assume gcc to be the compiler [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Is there a performance difference between i++ and ++i in C++?
In terms of usage of the following, please rate in terms of execution time in C.
In some interviews I was asked which of these I should use, and why.
a++
++a
a=a+1
a+=1
Here is what g++ -S produces:
void irrelevant_low_level_worries()
{
    int a = 0;      // movl $0, -4(%ebp)
    a++;            // incl -4(%ebp)
    ++a;            // incl -4(%ebp)
    a = a + 1;      // incl -4(%ebp)
    a += 1;         // incl -4(%ebp)
}
So even without any optimizer switches, all four statements compile to the exact same machine code.
You can't rate the execution time in C, because it's not the C code that is executed. You have to profile the executable code compiled with a specific compiler running on a specific computer to get a rating.
Also, rating a single operation doesn't give you something that you can really use. Todays processors execute several instructions in parallel, so the efficiency of an operation relies very much on how well it can be paired with the instructions in the surrounding code.
So, if you really need to use the one that has the best performance, you have to profile the code. Otherwise (which is about 98% of the time) you should use the one that is most readable and best conveys what the code is doing.
The circumstances where these kinds of things actually matter are rare and far between. Most of the time, it doesn't matter at all. In fact, I'm willing to bet that this is the case for you.
What is true for one language/compiler/architecture may not be true for others. And really, the fact is irrelevant in the bigger picture anyway. Knowing these things does not make you a better programmer.
You should study algorithms, data structures, asymptotic analysis, clean and readable coding style, programming paradigms, etc. Those skills are a lot more important in producing performant and manageable code than knowing these kinds of low-level details.
Do not optimize prematurely, but also, do not micro-optimize. Look for the big picture optimizations.
This depends on the type of a as well as on the context of execution. If a is of a primitive type and if all four statements have the same identical effect then these should all be equivalent and identical in terms of efficiency. That is, the compiler should be smart enough to translate them into the same optimized machine code. Granted, that is not a requirement, but if it's not the case with your compiler then that is a good sign to start looking for a better compiler.
For most compilers it should compile to the same ASM code.
Same.
For more details see http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
I can't see why there should be any difference in execution time, but let's prove me wrong.
a++
and
++a
are not the same however, but this is not related to efficiency.
When it comes to performance of individual lines, context is always important, and guessing is not a good idea. Test and measure is better
In an interview, I would go with two answers:
At first glance, the generated code should be very similar, especially if a is an integer.
If execution time was definitely a known problem - you have to measure it using some kind of profiler.
Well, you could argue that a++ is short and to the point. It can only increment a by one, but the notation is very well understood. a=a+1 is a little more verbose (not a big deal, unless you have variablesWithGratuitouslyLongNames), but some might argue it's more "flexible" because you can replace the 1 or either of the a's to change the expression. a+=1 is maybe not as flexible as the other two but is a little more clear, in the sense that you can change the increment amount. ++a is different from a++ and some would argue against it because it's not always clear to people who don't use it often.
In terms of efficiency, I think most modern compilers will produce the same code for all of these but I could be mistaken. Really, you'd have to run your code with all variations and measure which performs best.
(assuming that a is an integer)
It depends on the context, and on whether we are in C or C++. In C, the code you posted (except for a-- :-) will cause a modern C compiler to produce exactly the same code. But very likely the expected answer is that a++ is the fastest one and a=a+1 the slowest, since ancient compilers relied on the user to perform such optimizations.
In C++ it depends on the type of a. When a is a numeric type, it acts the same way as in C, which means a++, a+=1 and a=a+1 generate the same code. When a is an object, it depends on whether any operator (++, + and =) is overloaded, since then the overloaded operator of the a object is called.
Also when you work in a field with very special compilers (like microcontrollers or embedded systems) these compilers can behave very differently on each of these input variations.

Performance of comparisons in C++ ( foo >= 0 vs. foo != 0 )

I've been working on a piece of code recently where performance is very important, and essentially I have the following situation:
int len = some_very_big_number;
int counter = some_rather_small_number;
for( int i = len; i >= 0; --i ){
    while( counter > 0 && costly other stuff here ){
        /* do stuff */
        --counter;
    }
    /* do more stuff */
}
So here I have a loop that runs very often and for a certain number of runs the while block will be executed as well until the variable counter is reduced to zero and then the while loop will not be called because the first expression will be false.
The question is now, if there is a difference in performance between using
counter > 0 and counter != 0?
I suspect there would be; does anyone know the specifics?
To measure is to know.
Do you think this will solve your problem? :D
if(x >= 0)
00CA1011 cmp dword ptr [esp],0
00CA1015 jl main+2Ch (0CA102Ch) <----
...
if(x != 0)
00CA1026 cmp dword ptr [esp],0
00CA102A je main+3Bh (0CA103Bh) <----
In programming, the following statement is the sign designating the road to Hell:
I've been working on a piece of code recently where performance is very important
Write your code in the cleanest, most easy to understand way. Period.
Once that is done, you can measure its runtime. If it takes too long, measure the bottlenecks, and speed up the biggest ones. Keep doing that until it is fast enough.
The list of projects that failed or suffered catastrophic loss due to a misguided emphasis on blind optimization is large and tragic. Don't join them.
I think you're spending time optimizing the wrong thing. "costly other stuff here", "do stuff" and "do more stuff" are more important to look at. That is where you'll make the big performance improvements I bet.
There will be a huge difference if the counter starts with a negative number. Otherwise, on every platform I'm familiar with, there won't be a difference.
Is there a difference between counter > 0 and counter != 0? It depends on the platform.
A very common type of CPU is the Intel family we have in our PCs. Both comparisons will map to a single instruction on that CPU, and I assume they will execute at the same speed. However, to be certain you will have to perform your own benchmark.
As Jim said, when in doubt see for yourself :
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
using namespace std;
int main()
{
    ptime Before = microsec_clock::universal_time(); // UTC now
    // do stuff here
    ptime After = microsec_clock::universal_time();  // UTC now
    time_duration delta_t = After - Before;          // how much time has passed?
    cout << delta_t.total_seconds() << endl;         // whole seconds elapsed
    cout << delta_t.fractional_seconds() << endl;    // fractional part, in ticks
}
Here's a pretty nifty way of measuring time. Hope that helps.
OK, you can measure this, sure. However, these sorts of comparisons are so fast that you are probably going to see more variation from process swapping and scheduling than from this single line of code.
This smells of unnecessary, and premature, optimization. Write your program, then optimize what you see. If you need more, profile, and then go from there.
I would add that the overwhelming performance aspects of this code on modern cpus will be dominated not by the comparison instruction but whether the comparison is well predicted since any mis-predict will waste many more cycles than any integral operation.
As such loop unrolling will probably be the biggest winner but measure, measure, measure.
Thinking that the type of comparison is going to make a difference, without knowing it, is the definition of guessing.
Don't guess.
In general, they should be equivalent (both are usually implemented in single-cycle instructions/micro-ops). Your compiler may do some strange special-case optimization that is difficult to reason about from the source level, which may make either one slightly faster. Also, equality testing is more energy-efficient than inequality testing (>), though the system-level effect is so small as to not merit discussion.
There may be no difference. You could try examining the assembly output for each.
That being said, the only way to tell if any difference is significant is to try it both ways and measure. I'd bet that the change makes no difference whatsoever with optimizations on.
Assuming you are developing for the x86 architecture, when you look at the assembly output it will come down to jns vs jne. jns will check the sign flag and jne will check the zero flag. Both operations, should as far as I know, be equally costly.
Clearly the solution is to use the correct data type.
Make counter an unsigned int. Then it can't be less than zero. Your compiler will obviously know this and be forced to choose the optimal solution.
Or you could just measure it.
You could also think about how it would be implemented...(here we go on a tangent)...
less than zero: the sign bit would be set, so need to check 1 bit
equal to zero : the whole value would be zero, so need to check all the bits
Of course, computers are funny things, and it may take longer to check a single bit than the whole value (however many bytes it is on your platform).
You could just measure it...
And you could find out that one it more optimal than another (under the conditions you measured it). But your program will still run like a dog because you spent all your time optimising the wrong part of your code.
The best solution is to use what many large software companies do - blame the hardware for not running fast enough and encourage your customer to upgrade their equipment (which is clearly inferior since your product doesn't run fast enough).
</rant>
I stumbled across this question just now, 3 years after it was asked, so I am not sure how useful the answer will still be... Still, I am surprised not to see it clearly stated that answering your question requires knowing two and only two things:
which processor you target
which compiler you work with
To the first point, each processor has different instructions for tests. On a given processor, two similar comparisons may turn out to take a different number of cycles. For example, you may have a 1-cycle instruction to do a gt (>), eq (==), or le (<=), but no 1-cycle instruction for other comparisons like ge (>=). Following a test, you may decide to execute conditional instructions, or, more often, as in your code example, take a jump. There again, jumps take a variable number of cycles on most high-end processors, depending on whether the conditional jump is taken or not taken, predicted or not predicted. When you write code in assembly and your code is time-critical, you can actually take quite a bit of time figuring out how to best arrange your code to minimize the overall cycle count, and may end up with a solution that has to be tuned based on the number of times a given comparison returns true or false.
Which leads me to the second point: compilers, like human coders, try to arrange the code to take into account the instructions available and their latencies. Their job is harder because some assumptions an assembly coder can make, like "counter is small", are hard (though not impossible) for a compiler to establish. For trivial cases like a loop counter, most modern compilers can at least recognize that the counter will always be positive and that != will behave the same as >, and thus generate the best code accordingly. But that, as many mentioned in the posts, you will only know if you either run measurements, or inspect your assembly code and convince yourself this is the best you could do in assembly. And when you upgrade to a new compiler, you may then get a different answer.

What is faster (x < 0) or (x == -1)?

Variable x is int with possible values: -1, 0, 1, 2, 3.
Which expression will be faster (in CPU ticks):
1. (x < 0)
2. (x == -1)
Language: C/C++, but I suppose all other languages will be the same.
P.S. I personally think that answer is (x < 0).
More widely for gurus: what if x from -1 to 2^30?
That depends entirely on the ISA you're compiling for, and the quality of your compiler's optimizer. Don't optimize prematurely: profile first to find your bottlenecks.
That said, on x86 you'll find that both are equally fast in most cases. In both cases, you'll have a comparison (cmp) and a conditional jump (jCC) instruction. However, for (x < 0), there may be some instances where the compiler can elide the cmp instruction, speeding up your code by one whole cycle.
Specifically, if the value x is stored in a register and was recently the result of an arithmetic operation (such as add, or sub, but there are many more possibilities) that sets the sign flag SF in the EFLAGS register, then there's no need for the cmp instruction, and the compiler can emit just a js instruction. There's no simple jCC instruction that jumps when the input was -1.
Try it and see! Do a million, or better, a billion of each and time them. I bet there is no statistical significance in your results, but who knows -- maybe on your platform and compiler, you might find a result.
This is a great experiment to convince yourself that premature optimization is probably not worth your time--and may well be "the root of all evil--at least in programming".
Both operations can be done in a single CPU step, so they should be the same performance wise.
x < 0 will be faster. If nothing else, it prevents fetching the constant -1 as an operand.
Most architectures have special instructions for comparing against zero, so that will help too.
It could be dependent on what operations precede or succeed the comparison. For example, if you assign a value to x just before doing the comparison, then it might be faster to check the sign flag than to compare to a specific value. Or the CPU's branch-prediction performance could be affected by which comparison you choose.
But, as others have said, this is dependent upon CPU architecture, memory architecture, compiler, and a lot of other things, so there is no general answer.
The important consideration, anyway, is which actually directs your program flow accurately, and which just happens to produce the same result?
If x is actually an index or a value in an enum, then will -1 always be what you want, or will any negative value work? Right now, -1 is the only negative value, but that could change.
You can't even answer this question out of context. If you try for a trivial microbenchmark, it's entirely possible that the optimizer will waft your code into the ether:
// Get time
int x = -1;
for (int i = 0; i < ONE_JILLION; i++) {
    int dummy = (x < 0); // Poof! Dummy is ignored.
}
// Compute time difference - in the presence of good optimization
// expect this time difference to be close to useless.
Same, both operations are usually done in 1 clock.
It depends on the architecture, but the x == -1 is more error-prone. x < 0 is the way to go.
As others have said there probably isn't any difference. Comparisons are such fundamental operations in a CPU that chip designers want to make them as fast as possible.
But there is something else you could consider. Analyze the frequencies of each value and have the comparisons in that order. This could save you quite a few cycles. Of course you still need to compile your code to asm to verify this.
I'm sure you're confident this is a real time-taker.
I would suppose asking the machine would give a more reliable answer than any of us could give.
I've found, even in code like you're talking about, my supposition that I knew where the time was going was not quite correct. For example, if this is in an inner loop, if there is any sort of function call, even an invisible one inserted by the compiler, the cost of that call will dominate by far.
Nikolay, you write:
It's actually the bottleneck operator in the high-load program. Performance in these 1-2 strings is much more valuable than readability...
All bottlenecks are usually this small, even in perfect design with perfect algorithms (though there is no such thing). I do high-load DNA processing and know my field and my algorithms quite well.
If so, why not do the following:
1. get a timer, set it to 0;
2. compile your high-load program with (x < 0);
3. start your program and the timer;
4. on program end, look at the timer and remember result1;
5. same as 1;
6. compile your high-load program with (x == -1);
7. same as 3;
8. on program end, look at the timer and remember result2;
9. compare result1 and result2.
You'll get the Answer.