Performance of comparisons in C++ (foo >= 0 vs. foo != 0)

I've been working on a piece of code recently where performance is very important, and essentially I have the following situation:
int len = some_very_big_number;
int counter = some_rather_small_number;
for( int i = len; i >= 0; --i ){
    while( counter > 0 && costly other stuff here ){
        /* do stuff */
        --counter;
    }
    /* do more stuff */
}
So here I have a loop that runs very often, and for a certain number of iterations the while block is executed as well, until counter is reduced to zero; after that the while body is no longer entered because its first condition is false.
The question now is whether there is a difference in performance between using
counter > 0 and counter != 0?
I suspect there might be; does anyone know the specifics?

To measure is to know.

Do you think that's what will solve your problem? :D
if(x >= 0)
00CA1011 cmp dword ptr [esp],0
00CA1015 jl main+2Ch (0CA102Ch) <----
...
if(x != 0)
00CA1026 cmp dword ptr [esp],0
00CA102A je main+3Bh (0CA103Bh) <----

In programming, the following statement is the sign designating the road to Hell:
I've been working on a piece of code recently where performance is very important
Write your code in the cleanest, most easy to understand way. Period.
Once that is done, you can measure its runtime. If it takes too long, measure the bottlenecks, and speed up the biggest ones. Keep doing that until it is fast enough.
The list of projects that failed or suffered catastrophic loss due to a misguided emphasis on blind optimization is large and tragic. Don't join them.

I think you're spending time optimizing the wrong thing. "costly other stuff here", "do stuff" and "do more stuff" are more important to look at. That is where you'll make the big performance improvements, I bet.

There will be a huge difference if the counter starts with a negative number. Otherwise, on every platform I'm familiar with, there won't be a difference.

Is there a difference between counter > 0 and counter != 0? It depends on the platform.
A very common type of CPU is the Intel family found in our PCs. Both comparisons map to a single instruction on that CPU, and I assume they will execute at the same speed. However, to be certain you will have to run your own benchmark.

As Jim said, when in doubt, see for yourself:
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
using namespace std;

int main()
{
    ptime Before = microsec_clock::universal_time(); // UTC now
    // do stuff here
    ptime After = microsec_clock::universal_time();  // UTC now

    time_duration delta_t = After - Before;          // how much time has passed?
    cout << delta_t.total_seconds() << endl;         // whole seconds elapsed
    cout << delta_t.fractional_seconds() << endl;    // fractional part of the last second, in microseconds
    return 0;
}
Here's a pretty nifty way of measuring time. Hope that helps.

OK, you can measure this, sure. However, these sorts of comparisons are so fast that you are probably going to see more variation from process swapping and scheduling than from this single line of code.
This smells of unnecessary, and premature, optimization. Write your program, optimize what you see. If you need more, profile, and then go from there.

I would add that on modern CPUs the performance of this code will be dominated not by the comparison instruction but by whether the comparison is well predicted, since any misprediction wastes many more cycles than any integer operation.
As such, loop unrolling will probably be the biggest winner, but measure, measure, measure.
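For illustration, here is a minimal sketch of what manually unrolling the outer loop might look like, with a hypothetical work() standing in for the loop body (both names are made up); modern compilers often do this themselves at -O2/-O3, so treat it as something to measure rather than a guaranteed win:
volatile long long sink = 0;                   // keeps the per-iteration work observable
inline void work(int i) { sink = sink + i; }   // hypothetical stand-in for "do stuff"

void process_unrolled(int len)
{
    int i = len;
    for (; i >= 3; i -= 4) {                   // four iterations per loop test
        work(i);
        work(i - 1);
        work(i - 2);
        work(i - 3);
    }
    for (; i >= 0; --i)                        // remainder: 0..3 iterations
        work(i);
}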

Thinking that the type of comparison is going to make a difference, without knowing it, is the definition of guessing.
Don't guess.

In general, they should be equivalent (both are usually implemented in single-cycle instructions/micro-ops). Your compiler may do some strange special-case optimization that is difficult to reason about from the source level, which may make either one slightly faster. Also, equality testing is more energy-efficient than inequality testing (>), though the system-level effect is so small as to not merit discussion.

There may be no difference. You could try examining the assembly output for each.
That being said, the only way to tell if any difference is significant is to try it both ways and measure. I'd bet that the change makes no difference whatsoever with optimizations on.

Assuming you are developing for the x86 architecture, when you look at the assembly output it will come down to jns vs jne. jns will check the sign flag and jne will check the zero flag. Both operations should, as far as I know, be equally costly.

Clearly the solution is to use the correct data type.
Make counter an unsigned int. Then it can't be less than zero. Your compiler will obviously know this and be forced to choose the optimal solution.
Or you could just measure it.
You could also think about how it would be implemented...(here we go on a tangent)...
less than zero: the sign bit would be set, so you need to check 1 bit
equal to zero: the whole value would be zero, so you need to check all the bits
Of course, computers are funny things, and it may take longer to check a single bit than the whole value (however many bytes it is on your platform).
You could just measure it...
And you could find out that one is more optimal than the other (under the conditions you measured). But your program will still run like a dog, because you spent all your time optimising the wrong part of your code.
The best solution is to do what many large software companies do - blame the hardware for not running fast enough and encourage your customer to upgrade their equipment (which is clearly inferior, since your product doesn't run fast enough).
</rant>

I stumbled across this question just now, 3 years after it was asked, so I am not sure how useful the answer will still be... Still, I am surprised not to see clearly stated that answering your question requires knowing two, and only two, things:
which processor you target
which compiler you work with
To the first point, each processor has different instructions for tests. On a given processor, two similar comparisons may turn out to take a different number of cycles. For example, you may have a 1-cycle instruction for a gt (>), eq (==), or le (<=), but no 1-cycle instruction for other comparisons such as ge (>=). Following a test, you may decide to execute conditional instructions or, more often, as in your code example, take a jump. There again, jumps take a variable number of cycles on most high-end processors, depending on whether the conditional jump is taken or not taken, predicted or not predicted. When you write code in assembly and your code is time-critical, you can spend quite a bit of time figuring out how best to arrange it to minimize the overall cycle count, and you may end up with a solution that has to be tuned based on the number of times a given comparison returns true or false.
Which leads me to the second point: compilers, like human coders, try to arrange the code to take into account the instructions available and their latencies. Their job is harder because some assumptions an assembly coder could make, such as "counter is small", are hard (though not impossible) for them to establish. For trivial cases like a loop counter, most modern compilers can at least recognize that the counter will always be positive, in which case a != is the same as a >, and generate the best code accordingly. But that, as many have mentioned in the posts, you will only know if you either run measurements or inspect your assembly code and convince yourself this is the best you could do in assembly. And when you upgrade to a new compiler, you may then get a different answer.
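If you want to see what your particular compiler does, a minimal pair like the following (hypothetical names) can be compiled with something like g++ -O2 -S and the two listings diffed; inside a real loop where the compiler can prove the counter is non-negative, both may well come out identical:
bool gt_zero(int counter) { return counter > 0;  }   // the >  form
bool ne_zero(int counter) { return counter != 0; }   // the != form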

Related

Optimize simple comparison with zero for performance

I have a bottleneck (about 20% CPU time) in my code which is in following if statement:
if (a == 0) { // here
...
}
where a is a uint8_t, so a number from 0 to 255.
Are there any low level optimizations to make it faster?
I thought about using bitwise NOR (~(a | 0)), but that would work only if a were 1-bit, right?
Just in case: I don't care about code readability in this particular case.
Unless your compiler is garbage, there is nothing you can do to speed up integer comparison.
However, it is possible that the bottleneck you observe is not really the comparison itself, but rather the result of unlucky branch prediction.
There are two ways of getting around this:
If "to branch or not to branch" follows a pattern, move this last second decision further up in your program logic where you can use the pattern, just don't branch in your hot function. This might require serious thinking. A hacky way to find out whether you have patterns: Print 1 if you branch and 0 else for enough calls, Zip is up and see whether the resulting archive gets much smaller (in bits) than the number of values you printed. (Of course there are also smart formulas for that if you like it more theoretical.)
If you choose one branch over the other most of the time, you can tell the compiler which branch is the likely one. With gcc, checkout __builtin_expect, for other compilers, read the manual.
Important for both solutions: You will need to measure whether that actually helped. Especially the second one will not be magically be better, it might even make things much worse.
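As a sketch of the second suggestion (GCC/Clang only; the function and counter names here are made up), telling the compiler that a == 0 is the unlikely case might look like this:
#include <cstdint>

long zero_count = 0, nonzero_count = 0;       // stand-ins for the real work

void hot_function(std::uint8_t a)
{
    // __builtin_expect(expr, c) evaluates to expr but hints that expr == c is the
    // common case; here we claim the a == 0 branch is rarely taken.
    if (__builtin_expect(a == 0, 0)) {
        ++zero_count;                         // rare path
    } else {
        ++nonzero_count;                      // common path
    }
}
Whether this helps or hurts is exactly what the measurement in the last paragraph is for.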

What is efficient to check: equal to or not equal to?

I was wondering: if we have an if-else condition, what is computationally more efficient to check, the equal-to operator or the not-equal-to operator? Is there any difference at all?
E.g., which one of the following is more efficient? Both cases below do the same thing, but which one is better (if there is any difference)?
Case 1:
if (a == x)
{
    // execute Set1 of statements
}
else
{
    // execute Set2 of statements
}
Case 2:
if (a != x)
{
    // execute Set2 of statements
}
else
{
    // execute Set1 of statements
}
The assumption here is that most of the time (say 90% of cases) a will be equal to x. Both a and x are of unsigned integer type.
Generally it shouldn't matter for performance which operator you use. However it is recommended for branching that the most likely outcome of the if-statement comes first.
Usually what you should consider is: what is the simplest and clearest way to write this code? IMHO, the first, positive form is the simplest (not requiring a !).
In terms of performance there is no difference, as the code is likely to compile to the same thing. (Certainly in the JIT for Java it should.)
For Java, the JIT can optimise the code so that the most common branch is preferred by the branch prediction.
In this simple case, it makes no difference. (assuming a and x are basic types) If they're class-types with overloaded operator == or operator != they might be different, but I wouldn't worry about it.
For chained tests like:
if ( c1 ) { }
else if ( c2 ) { }
else ...
the most likely condition should be put first, to prevent useless evaluations of the others. (again, not applicable here since you only have one else).
GCC provides a way to inform the compiler about the likely outcome of an expression:
if (__builtin_expect(expression, 1))
…
This built-in evaluates to the value of expression, but it informs the compiler that the likely result is 1 (true for Booleans). To use this, you should write expression as clearly as possible (for humans), then set the second parameter to whichever value is most likely to be the result.
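A common way to package this (GCC/Clang; the macro names are just a convention, not part of the compiler), applied to the a == x case from the question:
#define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

void dispatch(unsigned a, unsigned x)
{
    if (LIKELY(a == x)) {        // stated to hold ~90% of the time
        // execute Set1 of statements
    } else {
        // execute Set2 of statements
    }
}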
There is no difference.
The x86 CPU architecture has two relevant opcodes for these conditional jumps:
JNE (jump if not equal)
JE (jump if equal)
Usually they both take the same number of CPU cycles.
And even when they wouldn't, you could expect the compiler to do such trivial optimizations for you. Write what's most readable and what makes your intention more clear instead of worrying about microseconds.
If you ever manage to write a piece of Java code that can be proven to be significantly more efficient one way than the other, you should publish your result and raise an issue against whatever implementation you observed the difference on.
More to the point, just asking this kind of question should be a sign of something amiss: it is an indication that you are focusing your attention and efforts on a wrong aspect of your code. Real-life application performance always suffers from inadequate architecture; never from concerns such as this.
Premature optimization is the root of all evil.
Even for branch prediction, I think you should not care too much about this, until it is really necessary.
Just as Peter said, use the simplest way.
Let the compiler/optimizer do its work.
It's a general rule of thumb (for most people nowadays) that source code should express your intention in the most readable way. You are writing it for another human (not for the computer): your one-year-later self, or a teammate, who will need to understand your code with the least effort.
It shouldn't make any difference performance-wise, but you should consider what is easiest to read. Then when you are looking back on your code, or if someone else is looking at it, you want it to be easy to understand.
It has a small advantage (from a readability point of view) if the first condition is the one that is true in most cases.
Write the conditions the way you can read them best. You will not gain any speed by negating a condition.
Most processors use an electrical gate for equality/inequality checks; this means all bits are checked at once. Therefore it should make no difference, but if you want to truly optimise your code it is always better to benchmark things yourself and check the results.
If you are wondering whether it's worth it to optimise like that, imagine you would have this check multiple times for every pixel on your screen, or scenarios like that. Imho, it is always worth it to optimise, even if it's only to teach yourself good habits ;)
Only the non-negative approach, which you used first, seems to be the best.
The only way to know for sure is to code up both versions and measure their performance. If the difference is only a percent or so, use the version that more clearly conveys the intent.
It's very unlikely that you're going to see a significant difference between the two.
The performance difference between them is negligible, so just think about the readability of the code. For readability I prefer the one which has more lines of code in the if branch.
if (a == x) {
    // x lines of code
} else {
    // y lines of code, where y < x
}

Which is faster (mask >> i & 1) or (mask & 1 << i)?

In my code I must choose one of these two expressions (where mask and i are non-constant integers, -1 < i < (sizeof(int) << 3) + 1). I don't think this will make the performance of my program better or worse, but it is very interesting to me. Do you know which is better, and why?
First of all, whenever you find yourself asking "which is faster", your first reaction should be to profile, measure and find out for yourself.
Second of all, this is such a tiny calculation, that it almost certainly has no bearing on the performance of your application.
Third, the two are most likely identical in performance.
C expressions cannot be "faster" or "slower", because the CPU cannot evaluate them directly.
Which one is "faster" depends on the machine code your compiler is able to generate for these two expressions. If your compiler is smart enough to realize that in your context both do the same thing (e.g. you simply compare the result with zero), it will probably generate the same code for both variants, meaning that they will be equally fast. In such a case it is quite possible that the generated machine code will not even remotely resemble the sequence of operations in the original expression (i.e. no shift and/or no bitwise AND).
If what you are trying to do here is just test the value of one bit, there are other ways to do it besides the shift-and-bitwise-AND combination. And many of those "other ways" are not expressible in C: you can't use them in C, but the compiler can use them in machine code.
For example, the x86 CPU has a dedicated bit-test instruction BT that extracts the value of a specific bit by its number. So a smart compiler might simply generate something like
MOV eax, i
BT mask, eax
...
for both of your expressions (assuming it is more efficient, of which I'm not sure).
Use either one and let your compiler optimize it however it likes.
If "i" is a compile-time constant, then the second would execute fewer instructions -- the 1 << i would be computed at compile time. Otherwise I'd imagine they'd be the same.
Depends entirely on where the values mask and i come from, and the architecture on which the program is running. There's also nothing to stop the compiler from transforming one into the other in situations where they are actually equivalent.
In short, not worth worrying about unless you have a trace showing that this is an appreciable fraction of total execution time.
It is unlikely that either will be faster. If you are really curious, compile a simple program that does both, disassemble, and see what instructions are generated.
Here is how to do that:
gcc -O0 -g main.c -o main
objdump -d main | less
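A possible main.c for that experiment (hypothetical; the values are derived from argc just so the compiler cannot fold the expressions away at compile time):
#include <stdio.h>

int main(int argc, char **argv)
{
    (void)argv;                                 /* unused */
    int mask = argc * 31;                       /* not known at compile time */
    int i    = argc % (int)(sizeof(int) * 8);   /* 0 <= i < width of int */
    int a = mask >> i & 1;                      /* first form */
    int b = mask & 1 << i;                      /* second form */
    printf("%d %d\n", a, b);
    return 0;
}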
You could examine the assembly output and then look up how many clock cycles each instruction takes.
But in 99.9999999 percent of programs, it won't make a lick of difference.
The two expressions are not logically equivalent, so performance is not your concern!
If performance was your concern, write a loop to do 10 million of each and measure.
EDIT: You edited the question after my response ... so please ignore my answer as the constraints change things.

Best way to test code speed in C++ without profiler, or does it not make sense to try?

On SO, there are quite a few questions about performance profiling, but I don't seem to find the whole picture. There are quite a few issues involved, and most Q&As ignore all but a few at a time, or don't justify their proposals.
What I'm wondering about: if I have two functions that do the same thing, and I'm curious about the difference in speed, does it make sense to test this without external tools, with timers, or will this compiled-in testing affect the results too much?
I ask this because if it is sensible, as a C++ programmer I want to know how it should best be done, since timers are much simpler than using external tools. If it makes sense, let's proceed with all the possible pitfalls:
Consider this example. The following code shows two ways of doing the same thing:
#include <algorithm>
#include <ctime>
#include <iostream>

typedef unsigned char byte;

inline
void
swapBytes( byte* in, size_t n )     // byte*, not void*, so in[] can be indexed
{
    for( size_t lo=0, hi=n-1; hi>lo; ++lo, --hi )
        in[lo] ^= in[hi]
      , in[hi] ^= in[lo]
      , in[lo] ^= in[hi] ;
}

int
main()
{
    byte arr[9] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
    const int iterations = 100000000;

    clock_t begin = clock();
    for( int i=iterations; i!=0; --i )
        swapBytes( arr, 8 );
    clock_t middle = clock();
    for( int i=iterations; i!=0; --i )
        std::reverse( arr, arr+8 );
    clock_t end = clock();

    double secSwap = (double) ( middle-begin ) / CLOCKS_PER_SEC;
    double secReve = (double) ( end-middle ) / CLOCKS_PER_SEC;

    std::cout << "swapBytes, for: " << iterations << " times takes: " << middle-begin
              << " clock ticks, which is: " << secSwap << "sec." << std::endl;
    std::cout << "std::reverse, for: " << iterations << " times takes: " << end-middle
              << " clock ticks, which is: " << secReve << "sec." << std::endl;
    std::cin.get();
    return 0;
}
// Output:
// Release:
// swapBytes, for: 100000000 times takes: 3000 clock ticks, which is: 3sec.
// std::reverse, for: 100000000 times takes: 1437 clock ticks, which is: 1.437sec.
// Debug:
// swapBytes, for: 10000000 times takes: 1781 clock ticks, which is: 1.781sec.
// std::reverse, for: 10000000 times takes: 12781 clock ticks, which is: 12.781sec.
The issues:
Which timers to use and how get the cpu time actually consumed by the code under question?
What are the effects of compiler optimization (since these functions just swap bytes back and forth, the most efficient thing is obviously to do nothing at all)?
Considering the results presented here, do you think they are accurate (I can assure you that multiple runs give very similar results)? If yes, can you explain how std::reverse gets to be so fast, considering the simplicity of the custom function. I don't have the source code from the vc++ version that I used for this test, but here is the implementation from GNU. It boils down to the function iter_swap, which is completely incomprehensible for me. Would this also be expected to run twice as fast as that custom function, and if so, why?
Contemplations:
It seems two high-precision timers are being proposed: clock() and QueryPerformanceCounter (on Windows). Obviously we would like to measure the CPU time of our code and not the real time, but as far as I understand, these functions don't give that functionality, so other processes on the system would interfere with measurements. This page on the GNU C library seems to contradict that, but when I put a breakpoint in VC++, the debugged process gets a lot of clock ticks even though it was suspended (I have not tested under GNU). Am I missing alternative counters for this, or do we need special libraries or classes for this? If not, is clock good enough in this example, or would there be a reason to use QueryPerformanceCounter?
What can we know for certain without debugging, disassembling and profiling tools? Is anything actually happening? Is the function call being inlined or not? When checking in the debugger, the bytes do actually get swapped, but I'd rather know from theory why, than from testing.
Thanks for any directions.
update
Thanks to a hint from tojas, the swapBytes function now runs as fast as std::reverse. I had failed to realize that the temporary copy, in the case of a byte, needs only a register, and is thus very fast. Elegance can blind you.
inline
void
swapBytes( byte* in, size_t n )
{
    byte t;
    for( int i=0; i<7-i; ++i )      // note: hard-coded for 8 bytes; n is unused here
    {
        t = in[i];
        in[i] = in[7-i];
        in[7-i] = t;
    }
}
Thanks to a tip from ChrisW I have found that on Windows you can get the actual CPU time consumed by a (read: your) process through Windows Management Instrumentation. This definitely looks more interesting than the high-precision counter.
Obviously we would like to measure the cpu time of our code and not the real time, but as far as I understand, these functions don't give that functionality, so other processes on the system would interfere with measurements.
I do two things, to ensure that wall-clock time and CPU time are approximately the same thing:
Test for a significant length of time, i.e. several seconds (e.g. by testing a loop of however many thousands of iterations)
Test when the machine is more or less relatively idle except for whatever I'm testing.
Alternatively if you want to measure only/more exactly the CPU time per thread, that's available as a performance counter (see e.g. perfmon.exe).
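If you want that per-process CPU time programmatically rather than through perfmon, one option is GetProcessTimes; a minimal sketch (Windows only, error checks omitted):
#include <windows.h>
#include <iostream>

int main()
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    GetProcessTimes(GetCurrentProcess(), &creationTime, &exitTime, &kernelTime, &userTime);

    // FILETIME counts 100-nanosecond intervals.
    ULONGLONG user100ns = (ULONGLONG(userTime.dwHighDateTime) << 32) | userTime.dwLowDateTime;
    std::cout << "user CPU time: " << user100ns / 10000 << " ms" << std::endl;
    return 0;
}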
What can we know for certain without debugging, disassembling and profiling tools?
Nearly nothing (except that I/O tends to be relatively slow).
To answer your main question: the "reverse" algorithm just swaps elements of the array and does not operate on the elements themselves.
Use QueryPerformanceCounter on Windows if you need high-resolution timing. The counter accuracy depends on the CPU, but it can go up to per clock pulse. However, profiling real-world operations is always a better idea.
Is it safe to say you're asking two questions?
Which one is faster, and by how much?
And why is it faster?
For the first, you don't need high precision timers. All you need to do is run them "long enough" and measure with low precision timers. (I'm old-fashioned, my wristwatch has a stop-watch function, and it is entirely good enough.)
For the second, surely you can run the code under a debugger and single-step it at the instruction level. Since the basic operations are so simple, you will be able to easily see roughly how many instructions are required for the basic cycle.
Think simple. Performance is not a hard subject. Usually, people are trying to find problems, for which this is a simple approach.
(This answer is specific to Windows XP and the 32-bit VC++ compiler.)
The easiest thing for timing little bits of code is the time-stamp counter of the CPU. This is a 64-bit value, a count of the number of CPU cycles run so far, which is about as fine a resolution as you're going to get. The actual numbers you get aren't especially useful as they stand, but if you average out several runs of various competing approaches then you can compare them that way. The results are a bit noisy, but still valid for comparison purposes.
To read the time-stamp counter, use code like the following:
LARGE_INTEGER tsc;
__asm {
    cpuid
    rdtsc
    mov tsc.LowPart,eax
    mov tsc.HighPart,edx
}
(The cpuid instruction is there to ensure that there aren't any incomplete instructions waiting to complete.)
There are four things worth noting about this approach.
Firstly, because of the inline assembly language, it won't work as-is on MS's x64 compiler. (You'll have to create a .ASM file with a function in it. An exercise for the reader; I don't know the details.)
Secondly, to avoid problems with cycle counters not being in sync across different cores/threads/what have you, you may find it necessary to set your process's affinity so that it only runs on one specific execution unit. (Then again... you may not.)
Thirdly, you'll definitely want to check the generated assembly language to ensure that the compiler is generating roughly the code you expect. Watch out for bits of code being removed, functions being inlined, that sort of thing.
Finally, the results are rather noisy. The cycle counters count cycles spent on everything, including waiting for caches, time spent on running other processes, time spent in the OS itself, etc. Unfortunately, it's not possible (under Windows, at least) to time just your process. So, I suggest running the code under test a lot of times (several tens of thousands) and working out the average. This isn't very cunning, but it seems to have produced useful results for me at any rate.
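For what it's worth, the same idea can be written with compiler intrinsics instead of inline assembly; a sketch, which also sidesteps the x64 limitation mentioned above, since MSVC's <intrin.h> provides __cpuid and __rdtsc on both 32- and 64-bit:
#include <intrin.h>

unsigned __int64 read_tsc()
{
    int regs[4];
    __cpuid(regs, 0);      // serialize, so earlier instructions have completed
    return __rdtsc();      // 64-bit time-stamp counter
}

// usage: call around the code under test and average many runs
// unsigned __int64 start = read_tsc();
// /* code under test */
// unsigned __int64 cycles = read_tsc() - start;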
I would suppose that anyone competent enough to answer all your questions is going to be far too busy to answer all your questions. In practice it is probably more effective to ask a single, well-defined question. That way you may hope to get well-defined answers which you can collect and be on your way to wisdom.
So, anyway, perhaps I can answer your question about which clock to use on Windows.
clock() is not considered a high precision clock. If you look at the value of CLOCKS_PER_SEC you will see it has a resolution of 1 millisecond. This is only adequate if you are timing very long routines, or a loop with 10000's of iterations. As you point out, if you try and repeat a simple method 10000's of times in order to get a time that can be measured with clock() the compiler is liable to step in and optimize the whole thing away.
So, really, the only clock to use is QueryPerformanceCounter()
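A minimal QueryPerformanceCounter sketch (Windows; error checking omitted):
#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);    // counts per second
    QueryPerformanceCounter(&start);

    // ... code under test ...

    QueryPerformanceCounter(&stop);
    double seconds = double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    std::cout << seconds << " sec" << std::endl;
    return 0;
}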
Is there something you have against profilers? They help a ton. Since you are on WinXP, you should really give the trial version of VTune a try. Try a call-graph sampling test and look at self time and total time of the functions being called. There's no better way to tune your program so that it's the fastest possible without being an assembly genius (and a truly exceptional one).
Some people just seem to be allergic to profilers. I used to be one of those and thought I knew best about where my hotspots were. I was often correct about obvious algorithmic inefficiencies, but practically always incorrect about more micro-optimization cases. Just rewriting a function without changing any of the logic (ex: reordering things, putting exceptional case code in a separate, non-inlined function, etc) can make functions a dozen times faster and even the best disassembly experts usually can't predict that without the profiler.
As for relying on simplistic timing tests alone, they are extremely problematic. The current test is not so bad, but it's a very common mistake to write timing tests in ways in which the optimizer will optimize out dead code, so that you end up testing the time it takes to do essentially a nop, or even nothing at all. You should have some knowledge of how to interpret the disassembly to make sure the compiler isn't doing this.
Also, timing tests like this have a tendency to bias the results significantly, since a lot of them just involve running your code over and over in the same loop, which tends to simply test the effect of your code when all the memory is in the cache and the branch prediction is working perfectly for it. It's often just showing you best-case scenarios without showing you the average, real-world case.
Depending on real world timing tests is a little bit better; something closer to what your application will be doing at a high level. It won't give you specifics about what is taking what amount of time, but that's precisely what the profiler is meant to do.
Wha? How to measure speed without a profiler? The very act of measuring speed is profiling! The question amounts to, "how can I write my own profiler?" And the answer is clearly, "don't".
Besides, you should be using std::swap in the first place, which completely invalidates this whole pointless pursuit.
-1 for pointlessness.

What is faster (x < 0) or (x == -1)?

Variable x is int with possible values: -1, 0, 1, 2, 3.
Which expression will be faster (in CPU ticks):
1. (x < 0)
2. (x == -1)
Language: C/C++, but I suppose all other languages will have the same.
P.S. I personally think that answer is (x < 0).
More widely, for gurus: what if x ranges from -1 to 2^30?
That depends entirely on the ISA you're compiling for, and the quality of your compiler's optimizer. Don't optimize prematurely: profile first to find your bottlenecks.
That said, in x86, you'll find that both are equally fast in most cases. In both cases, you'll have a comparison (cmp) and a conditional jump (jCC) instructions. However, for (x < 0), there may be some instances where the compiler can elide the cmp instruction, speeding up your code by one whole cycle.
Specifically, if the value x is stored in a register and was recently the result of an arithmetic operation (such as add, or sub, but there are many more possibilities) that sets the sign flag SF in the EFLAGS register, then there's no need for the cmp instruction, and the compiler can emit just a js instruction. There's no simple jCC instruction that jumps when the input was -1.
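If you want to see this for your own compiler, a minimal pair like the following (hypothetical names) can be compiled with -O2 -S and the generated assembly compared:
bool is_negative(int x)  { return x < 0;   }   // typically a sign test
bool is_minus_one(int x) { return x == -1; }   // typically a compare against the constant -1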
Try it and see! Do a million, or better, a billion of each and time them. I bet there is no statistical significance in your results, but who knows -- maybe on your platform and compiler, you might find a result.
This is a great experiment to convince yourself that premature optimization is probably not worth your time--and may well be "the root of all evil--at least in programming".
Both operations can be done in a single CPU step, so they should be the same performance wise.
x < 0 will be faster. If nothing else, it prevents fetching the constant -1 as an operand.
Most architectures have special instructions for comparing against zero, so that will help too.
It could be dependent on what operations precede or succeed the comparison. For example, if you assign a value to x just before doing the comparison, then it might be faster to check the sign flag than to compare to a specific value. Or the CPU's branch-prediction performance could be affected by which comparison you choose.
But, as others have said, this is dependent upon CPU architecture, memory architecture, compiler, and a lot of other things, so there is no general answer.
The important consideration, anyway, is which one actually directs your program flow accurately, and which one just happens to produce the same result?
If x is actually an index or a value in an enum, will -1 always be what you want, or will any negative value work? Right now, -1 is the only negative value, but that could change.
You can't even answer this question out of context. If you try for a trivial microbenchmark, it's entirely possible that the optimizer will waft your code into the ether:
// Get time
int x = -1;
for (int i = 0; i < ONE_JILLION; i++) {
    int dummy = (x < 0); // Poof! Dummy is ignored.
}
// Compute time difference - in the presence of good optimization
// expect this time difference to be close to useless.
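One way to keep the optimizer from discarding the work is to make the input opaque and actually use the result; a sketch (the iteration count is arbitrary), though even this mostly measures the loop and the branch predictor rather than the comparison in isolation:
#include <cstdio>
#include <ctime>

int main()
{
    const long iterations = 100000000L;   // arbitrary
    volatile int x = -1;                  // volatile: the value is reloaded every pass
    long hits = 0;

    std::clock_t begin = std::clock();
    for (long i = 0; i < iterations; ++i)
        hits += (x < 0);                  // the result is used, so it cannot be elided
    std::clock_t end = std::clock();

    std::printf("%ld hits, %.3f sec\n", hits,
                double(end - begin) / CLOCKS_PER_SEC);
    return 0;
}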
Same, both operations are usually done in 1 clock.
It depends on the architecture, but the x == -1 is more error-prone. x < 0 is the way to go.
As others have said there probably isn't any difference. Comparisons are such fundamental operations in a CPU that chip designers want to make them as fast as possible.
But there is something else you could consider. Analyze the frequencies of each value and have the comparisons in that order. This could save you quite a few cycles. Of course you still need to compile your code to asm to verify this.
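For example, if profiling showed 2 to be by far the most common value of x, the chain could be ordered accordingly (a sketch; the frequencies here are invented):
void handle(int x)
{
    if (x == 2) {
        // most frequent value: tested first
    } else if (x == 0) {
        // next most frequent
    } else if (x < 0) {
        // the -1 case
    } else {
        // 1 and 3
    }
}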
I'm sure you're confident this is a real time-taker.
I would suppose asking the machine would give a more reliable answer than any of us could give.
I've found, even in code like you're talking about, my supposition that I knew where the time was going was not quite correct. For example, if this is in an inner loop, if there is any sort of function call, even an invisible one inserted by the compiler, the cost of that call will dominate by far.
Nikolay, you write:
"It's actually the bottleneck operator in the high-load program. Performance in these 1-2 strings is much more valuable than readability... All bottlenecks are usually this small, even in a perfect design with perfect algorithms (though there is no such thing). I do high-load DNA processing and know my field and my algorithms quite well."
If so, why not do the following:
1. Get a timer, set it to 0.
2. Compile your high-load program with (x < 0).
3. Start your program and the timer.
4. When the program ends, look at the timer and remember result1.
5. Same as 1.
6. Compile your high-load program with (x == -1).
7. Same as 3.
8. When the program ends, look at the timer and remember result2.
9. Compare result1 and result2.
You'll get the Answer.