Which operator is faster: != or > - c++

Which operator is faster: > or !=?
Example: I want to test a value (which can have a positive value or -1) against -1:
if(time > -1)
// or
if (time != -1)
time has type "int"

The standard doesn't say. So it's up to what opcodes the given compiler generates in its given version, and how fast a given CPU executes them.
I.e., implementation / platform defined.
You can find out for a specific compiler / platform combination by looking at / benchmarking the executable code.
But I seriously doubt it will make much of a difference; this is the kind of micro-optimization that is almost always dwarfed by higher-level architectural decisions.

It is platform-dependent. Generally though, those two operations will translate directly to the assembler instructions "branch if greater than" and "branch if not equal". It is unlikely that there is any performance difference between those two, and if there were, it would be insignificant.
The only branch instruction which is ever so slightly faster than the others is usually "branch if zero"/"branch if not zero".
(In the dark ages when compilers sucked, C programmers therefore liked to write loops as down-counting to zero, instead of up-counting, so that comparisons would be done against zero instead of a value, in order to gain a few nanoseconds. Modern compilers can do that optimization themselves, but you still see such loops now and then.)
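As a minimal sketch of that idiom (the data here is made up; modern compilers will usually perform this transformation themselves, and note that the down-counting form visits the elements in reverse order):
#include <cstdio>

int main() {
    int items[5] = {1, 2, 3, 4, 5};
    int n = 5;

    // Up-counting: each iteration compares i against the non-zero bound n.
    long sum_up = 0;
    for (int i = 0; i < n; ++i)
        sum_up += items[i];

    // Down-counting: the loop test is "has i reached zero?", which on many
    // architectures is a plain decrement-and-branch-if-not-zero.
    long sum_down = 0;
    for (int i = n; i-- > 0; )
        sum_down += items[i];

    std::printf("%ld %ld\n", sum_up, sum_down);  // both print 15
    return 0;
}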
In general, you shouldn't concern yourself with micro-management of performance. If you spend time pondering if > is faster than !=, instead of pondering about program design, readability and functionality, you need to set your priorities straight asap.

Semantically these conditions are different. The first one checks whether object time is positive or zero.
if(time > -1)
In this case it would be better to write
if( time >= 0 )
However some functions return either a non-negative value or -1. For example a search function can return -1 if it did not find an element in an array. Or -1 can signal an error state or an absence of a value.
In this case it is better to use condition
if ( time != -1 )
As for speed, the compiler can generate a single machine instruction for the comparison in both cases.
This is not a case where you should think about speed. You should think about which condition is more expressive and shows the intention of the programmer.
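For example, a hypothetical search helper that returns -1 as a "not found" sentinel is naturally paired with a != -1 check (a sketch; find_index is not from the question):
#include <cstddef>
#include <cstdio>

// Hypothetical helper: returns the index of value in arr, or -1 if absent.
int find_index(const int* arr, std::size_t n, int value) {
    for (std::size_t i = 0; i < n; ++i)
        if (arr[i] == value)
            return static_cast<int>(i);
    return -1;  // sentinel: not found
}

int main() {
    int data[] = {4, 8, 15, 16, 23, 42};
    int pos = find_index(data, 6, 23);
    if (pos != -1)                      // intent: "was it found?"
        std::printf("found at %d\n", pos);
    return 0;
}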

Related

Optimize simple comparison with zero for performance

I have a bottleneck (about 20% CPU time) in my code which is in following if statement:
if (a == 0) { // here
...
}
where a is a uint8_t, so a number from 0 to 255.
Are there any low level optimizations to make it faster?
I thought about using bitwise NOR (~(a | 0)), but that would work only if a were a single bit, right?
Just in case: I don't care about code readability in this particular case.
Unless your compiler is garbage, there is nothing you can do to speed up integer comparison.
However, it is possible that the bottleneck you observe is not really the comparison itself, but rather the result of unlucky branch prediction.
There are two ways of getting around this:
If "to branch or not to branch" follows a pattern, move this last second decision further up in your program logic where you can use the pattern, just don't branch in your hot function. This might require serious thinking. A hacky way to find out whether you have patterns: Print 1 if you branch and 0 else for enough calls, Zip is up and see whether the resulting archive gets much smaller (in bits) than the number of values you printed. (Of course there are also smart formulas for that if you like it more theoretical.)
If you choose one branch over the other most of the time, you can tell the compiler which branch is the likely one. With gcc, checkout __builtin_expect, for other compilers, read the manual.
Important for both solutions: You will need to measure whether that actually helped. Especially the second one will not magically be better; it might even make things much worse.
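A minimal sketch of the second suggestion, assuming GCC or Clang (process and the surrounding program are made up for illustration); the hint only influences code layout and static prediction, it does not make the comparison itself faster, so measure as noted above:
#include <cstdint>
#include <cstdio>

// Hypothetical hot function: the caller knows a is almost never zero.
void process(std::uint8_t a) {
    if (__builtin_expect(a == 0, 0)) {   // GCC/Clang: mark this branch unlikely
        std::puts("rare: a == 0");       // cold path, laid out away from the hot code
    } else {
        // common path: the real work would go here
    }
}

int main() {
    process(7);
    process(0);
    return 0;
}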

Is It More Efficient to Do a Less Than comparison OR a Less Than Or Equal To?

I'm wondering if it's more efficient to do a less-than-or-equal-to comparison in a loop or a less-than comparison. Does the <= operator instruct the computer to make two comparisons (is it less than, is it equal to), or does it simplify it? Take the following example. I want a loop that increments to 1000. Should I set the ceiling to 1001 and tell it that while i is < (or !=) 1001, i++? Or should I tell it that while i <= 1000, i++? Will the compiler (GCC) simplify it to the same basic instructions?
The machine-level architecture will have opcodes for both < and <= operations, and both comparisons can be made in one CPU cycle. Meaning it makes no difference.
It depends on the architecture.
The original von Neumann IAS architecture (1945) had only a >= comparison.
Intel 8086 can use the LOOP label paradigm, which corresponds to do { } while (--cx > 0);
In legacy architectures, LOOP was not only smaller but faster. In modern architectures LOOP is considered a complex operation, which is slower than dec ecx; jnz label. When optimizing for size (-Os) this can still be significant.
Further considerations are that some (RISC) architectures do not have explicit flag registers. There a comparison can't be had for free as a side effect of the loop decrement. Some RISC architectures also have a special 'zero' register, which means that comparison (and every other mathematical operation) against zero is always available. RISCs with jump delay slots may even benefit from using post-decrement: do { } while (a-- > 0);
An optimizing compiler should be able to convert a simple loop, regardless of the syntax, to the most optimized version for the given architecture. A complex loop would have a dependency on the iterator, side effects, or both: for (i=0;i<5;i++) func(i);
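As a quick check of the original question, the three loop headers below have identical trip counts, and an optimizing compiler can be expected (though not guaranteed) to emit the same machine code for all of them:
#include <cstdio>

int main() {
    long sum1 = 0, sum2 = 0, sum3 = 0;

    for (int i = 1; i <= 1000; ++i) sum1 += i;   // inclusive bound
    for (int i = 1; i <  1001; ++i) sum2 += i;   // exclusive bound
    for (int i = 1; i != 1001; ++i) sum3 += i;   // inequality test

    // All three loops run 1000 times and each sum is 500500.
    std::printf("%ld %ld %ld\n", sum1, sum2, sum3);
    return 0;
}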
Measure it. Only then can you be absolutely sure which is faster.
You may think a lot about all the parts that play a role here (compiler, optimisation, processor, etc.). But in the end it is faster if it takes less time. It's as simple as that.

Do compilers have certain optimization heuristics to support branch prediction? If not, why not?

This question is mostly a follow-up after reading this article by Aater Suleman on improving branch prediction from the software side. The author provides a way to "unroll" conditional statements so as to increase the probability of predicting a branch taken in the case of a 2-bit saturating counter scheme. Here's an excerpt:
Let me explain with an example. Let's suppose that X is a random variable between 0 and 99. I want to run the following code:
if (X > 5 && X < 95) //branch is taken 90% of the time
do_something();
However, if I write the code as:
if(X > 5) //branch is taken 95% of the time
if(X < 95) //branch is taken 95% of the time
do_something();
The branch predictor has a better shot at predicting both these branches more accurately which may lead to better performance because the counters assigned to both these branches are more likely to stay saturated at taken (because two not-taken are less likely).
In general, any time you are ANDing/ORing conditions in an if-statement, you should think if the combination will be more biased or less biased and pick the version which is more biased.
My question is: do compilers follow this heuristic all of the time? Would a compiler even have the jurisdiction to do something like this, since compilers exist at the scope of ISAs and architectures while branch prediction schemes exist at the scope of processors and more specific hardware implementations?
My intuition is that it would not hurt performance to expand control statements in such a way but at the same time I have not been able to find any evidence that compilers make such optimizations. If that is the case, why don't they? What am I missing in my reasoning and can someone please provide an example where such an optimization would be detrimental with respect to a certain architecture or prediction scheme?
Thanks.
It seems that Suleman is not aware that
if (X > 5 && X < 95)
do_something();
and
if(X > 5)
if(X < 95)
do_something();
are semantically equivalent in C and C++. The latest C standard states (6.5.13/4):
"Unlike the bitwise binary & operator, the && operator guarantees left-to-right evaluation; if the second operand is evaluated, there is a sequence point between the evaluations of the first and second operands. If the first operand compares equal to 0, the second operand is not evaluated."
A compiler may or may not generate the same code for both code samples. Rewriting your program to use one or the other form to improve performance is not a good idea. Results would depend heavily on the compiler version, ISA, and probably more variables. Leave this type of optimization to the compiler, but give the compiler the information it needs.
Some compilers like GCC and LLVM allow you to give explicit hints like this:
if (__builtin_expect(X > 5, 1)) {
// This block is likely to be taken.
}
if (__builtin_expect(X <= 5, 0)) {
// This block is unlikely to be taken.
}
Another way is to use profile guided optimization. The first step requires one or more test runs of your program to generate a database with statistical information about branch instructions. In a second step the compiler can then use this database to optimize your program. See your compiler manual for more details.
You can avoid the two branches by replacing this code with
if (((unsigned int) X) - 6 < 89)
do_something ();
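The trick relies on unsigned wrap-around: X must lie in [6, 94], so (unsigned)X - 6 must lie in [0, 88], i.e. be less than 89, while any X below 6 wraps to a huge unsigned value and fails the test. A small sketch that verifies the equivalence over the stated range of X:
#include <cassert>

int main() {
    for (int X = 0; X <= 99; ++X) {
        bool two_branches = (X > 5 && X < 95);
        bool one_compare  = (((unsigned int) X) - 6 < 89);  // wraps around for X < 6
        assert(two_branches == one_compare);
    }
    return 0;
}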
The __builtin_expect obviously cannot change whether a condition is true or false. It's mostly used for processors where a taken branch is slower than a branch that is not taken. So the compiler would compile
if (__builtin_expect (condition, 0))
statement;
morestuff ();
to this:
if (condition) goto elsewhere;
backhere: morestuff ();
....
elsewhere: statement; goto backhere;
so that usually no branch is taken, instead of the usual
if (! condition) goto nextstatement;
statement;
nextstatement: morestuff ();
But in the question, things don't work that way. if (cond1 && cond2) will not generate a single branch (only in special cases with a very clever compiler), so the comparison isn't valid. And if it were, two branches with 95% correct prediction are not better than one branch with 90%, because the number of mispredictions will be the same.

What is efficient to check: equal to or not equal to?

I was wondering: if we have an if-else condition, what is computationally more efficient to check, using the equal-to operator or the not-equal-to operator? Is there any difference at all?
E.g., which one of the following is computationally more efficient? Both cases below do the same thing, but which one is better (if there's any difference)?
Case1:
if (a == x)
{
// execute Set1 of statements
}
else
{
// execute Set2 of statements
}
Case 2:
if (a != x)
{
// execute Set2 of statements
}
else
{
// execute Set1 of statements
}
Here the assumption is that most of the time (say 90% of cases) a will be equal to x. Both a and x are of unsigned integer type.
Generally it shouldn't matter for performance which operator you use. However it is recommended for branching that the most likely outcome of the if-statement comes first.
Usually what you should consider is: what is the simplest and clearest way to write this code? IMHO, the first, positive form is the simplest (it does not require a !).
In terms of performance there is no difference, as the code is likely to compile to the same thing. (Certainly in the JIT for Java it should.)
For Java, the JIT can optimise the code so the most common branch is preferred by the branch prediction.
In this simple case, it makes no difference (assuming a and x are basic types). If they're class types with an overloaded operator== or operator!=, they might be different, but I wouldn't worry about it.
For chained conditions:
if ( c1 ) { }
else if ( c2 ) { }
else ...
the most likely condition should be put first, to prevent useless evaluations of the others. (again, not applicable here since you only have one else).
GCC provides a way to inform the compiler about the likely outcome of an expression:
if (__builtin_expect(expression, 1))
…
This built-in evaluates to the value of expression, but it informs the compiler that the likely result is 1 (true for Booleans). To use this, you should write expression as clearly as possible (for humans), then set the second parameter to whichever value is most likely to be the result.
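Applied to the question's scenario (a == x roughly 90% of the time), a sketch for GCC or Clang might look like the following; the dispatch function and the printed messages are made up for illustration. In C++20 the standard [[likely]] / [[unlikely]] attributes can express the same hint portably.
#include <cstdio>

// Hypothetical example: profiling showed a == x about 90% of the time.
void dispatch(unsigned a, unsigned x) {
    if (__builtin_expect(a == x, 1)) {   // hint: this branch is the likely one
        std::puts("Set1 of statements"); // common case on the fall-through path
    } else {
        std::puts("Set2 of statements"); // rare case
    }
}

int main() {
    dispatch(3, 3);
    dispatch(3, 4);
    return 0;
}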
There is no difference.
The x86 CPU architecture has two opcodes for conditional jumps
JNE (jump if not equal)
JE (jump if equal)
Usually they both take the same number of CPU cycles.
And even when they wouldn't, you could expect the compiler to do such trivial optimizations for you. Write what's most readable and what makes your intention more clear instead of worrying about microseconds.
If you ever manage to write a piece of Java code that can be proven to be significantly more efficient one way than the other, you should publish your result and raise an issue against whatever implementation you observed the difference on.
More to the point, just asking this kind of question should be a sign of something amiss: it is an indication that you are focusing your attention and efforts on a wrong aspect of your code. Real-life application performance always suffers from inadequate architecture; never from concerns such as this.
Premature optimization is the root of all evil.
Even for branch prediction, I think you should not care too much about this, until it is really necessary.
Just as Peter said, use the simplest way.
Let the compiler/optimizer do its work.
It's a general rule of thumb nowadays that source code should express your intention in the most readable way. You are writing it for another human, not for the computer: for yourself a year later, or for the teammate who will need to understand your code with the least effort.
It shouldn't make any difference performance-wise, but you should consider what is easiest to read. Then when you are looking back on your code, or if someone else is looking at it, you want it to be easy to understand.
It is a small readability advantage if the first condition is the one that is true in most cases.
Write the conditions in whichever way you can read them best. You will not gain any speed by negating a condition.
Most processors compare for equality/inequality with a single gate-level operation, which means all bits are checked at once. Therefore it should make no difference, but if you want to truly optimise your code it is always better to benchmark things yourself and check the results.
If you are wondering whether it's worth it to optimise like that, imagine you would have this check multiple times for every pixel on your screen, or scenarios like that. IMHO, it is always worth it to optimise, even if it's only to teach yourself good habits ;)
Only the non-negated approach which you used first (==) seems to be the best.
The only way to know for sure is to code up both versions and measure their performance. If the difference is only a percent or so, use the version that more clearly conveys the intent.
It's very unlikely that you're going to see a significant difference between the two.
Performance difference between them is negligible, so just think about readability of the code. For readability I prefer the one which has more lines of code in the if branch.
if (a == x) {
// x lines of code
} else {
// y lines of code where y < x
}

What is faster (x < 0) or (x == -1)?

Variable x is int with possible values: -1, 0, 1, 2, 3.
Which expression will be faster (in CPU ticks):
1. (x < 0)
2. (x == -1)
Language: C/C++, but I suppose all other languages will have the same.
P.S. I personally think that answer is (x < 0).
More widely, for gurus: what if x ranges from -1 to 2^30?
That depends entirely on the ISA you're compiling for, and the quality of your compiler's optimizer. Don't optimize prematurely: profile first to find your bottlenecks.
That said, on x86 you'll find that both are equally fast in most cases. In both cases, you'll have a comparison (cmp) and a conditional jump (jCC) instruction. However, for (x < 0), there may be some instances where the compiler can elide the cmp instruction, speeding up your code by one whole cycle.
Specifically, if the value x is stored in a register and was recently the result of an arithmetic operation (such as add, or sub, but there are many more possibilities) that sets the sign flag SF in the EFLAGS register, then there's no need for the cmp instruction, and the compiler can emit just a js instruction. There's no simple jCC instruction that jumps when the input was -1.
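A sketch of the kind of situation meant here (the function is hypothetical, and whether the cmp is actually elided depends on the compiler and target):
// If x was just produced by an arithmetic instruction that already set the
// flags, an x86 compiler may test the sign flag directly (js/jns) for x < 0
// instead of emitting a separate cmp.
int clamp_to_zero(int a, int b) {
    int x = a - b;      // the sub already sets SF
    if (x < 0)          // can become a bare js, with no preceding cmp
        return 0;
    return x;
}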
Try it and see! Do a million, or better, a billion of each and time them. I bet there is no statistical significance in your results, but who knows -- maybe on your platform and compiler, you might find a result.
This is a great experiment to convince yourself that premature optimization is probably not worth your time, and may well be "the root of all evil", at least in programming.
Both operations can be done in a single CPU step, so they should be the same performance wise.
x < 0 will be faster. If nothing else, it prevents fetching the constant -1 as an operand.
Most architectures have special instructions for comparing against zero, so that will help too.
It could be dependent on what operations precede or succeed the comparison. For example, if you assign a value to x just before doing the comparison, then it might be faster to check the sign flag than to compare to a specific value. Or the CPU's branch-prediction performance could be affected by which comparison you choose.
But, as others have said, this is dependent upon CPU architecture, memory architecture, compiler, and a lot of other things, so there is no general answer.
The important consideration, anyway, is which actually directs your program flow accurately, and which just happens to produce the same result?
If x is actually an index or a value in an enum, then will -1 always be what you want, or will any negative value work? Right now, -1 is the only negative value, but that could change.
You can't even answer this question out of context. If you try for a trivial microbenchmark, it's entirely possible that the optimizer will waft your code into the ether:
// Get time
int x = -1;
for (int i = 0; i < ONE_JILLION; i++) {
int dummy = (x < 0); // Poof! Dummy is ignored.
}
// Compute time difference - in the presence of good optimization
// expect this time difference to be close to useless.
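One way to keep the optimizer from discarding the work is to make both the input and the result observable, for example by reading x through a volatile and printing the accumulated result (a rough sketch using std::chrono; ONE_JILLION is the placeholder count from the snippet above, and the volatile load itself adds overhead to what is measured):
#include <chrono>
#include <cstdio>

int main() {
    const long ONE_JILLION = 1000000000L;   // placeholder iteration count
    volatile int x = -1;                    // volatile: the compare can't be folded away
    long sink = 0;                          // printing this keeps the loop body alive

    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < ONE_JILLION; ++i)
        sink += (x < 0);
    auto t1 = std::chrono::steady_clock::now();

    std::chrono::duration<double> dt = t1 - t0;
    std::printf("sink=%ld, %.3f s\n", sink, dt.count());
    return 0;
}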
Same, both operations are usually done in 1 clock.
It depends on the architecture, but the x == -1 is more error-prone. x < 0 is the way to go.
As others have said there probably isn't any difference. Comparisons are such fundamental operations in a CPU that chip designers want to make them as fast as possible.
But there is something else you could consider. Analyze the frequencies of each value and have the comparisons in that order. This could save you quite a few cycles. Of course you still need to compile your code to asm to verify this.
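For instance, if profiling showed that 0 is by far the most common of the five possible values, a chained test could check it first (the distribution and the classify function are hypothetical):
// Hypothetical distribution: x == 0 is most common, positives next, -1 rarest.
const char* classify(int x) {
    if (x == 0)  return "zero";        // most frequent case tested first
    if (x > 0)   return "positive";    // 1, 2 or 3
    return "minus one";                // x == -1
}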
I'm sure you're confident this is a real time-taker.
I would suppose asking the machine would give a more reliable answer than any of us could give.
I've found, even in code like you're talking about, my supposition that I knew where the time was going was not quite correct. For example, if this is in an inner loop, if there is any sort of function call, even an invisible one inserted by the compiler, the cost of that call will dominate by far.
Nikolay, you write:
"It's actually the bottleneck operation in the high-load program. Performance in these 1-2 lines is much more valuable than readability... All bottlenecks are usually this small, even in a perfect design with perfect algorithms (though there is no such thing). I do high-load DNA processing and know my field and my algorithms quite well."
If so, why not do the following:
1. Get a timer and set it to 0.
2. Compile your high-load program with (x < 0).
3. Start your program and the timer.
4. When the program ends, look at the timer and remember result1.
5. Same as 1.
6. Compile your high-load program with (x == -1).
7. Same as 3.
8. When the program ends, look at the timer and remember result2.
9. Compare result1 and result2.
You'll get the Answer.