Rate ++a, a++, a=a+1 and a+=1 in terms of execution efficiency in C. Assume gcc to be the compiler [duplicate] - c++

Possible Duplicate:
Is there a performance difference between i++ and ++i in C++?
Please rate the following in terms of execution time in C.
In some interviews I was asked which of these variations I should use and why.
a++
++a
a=a+1
a+=1

Here is what g++ -S produces:
void irrelevant_low_level_worries()
{
int a = 0;
// movl $0, -4(%ebp)
a++;
// incl -4(%ebp)
++a;
// incl -4(%ebp)
a = a + 1;
// incl -4(%ebp)
a += 1;
// incl -4(%ebp)
}
So even without any optimizer switches, all four statements compile to the exact same machine code.

You can't rate the execution time in C, because it's not the C code that is executed. You have to profile the executable code compiled with a specific compiler running on a specific computer to get a rating.
Also, rating a single operation doesn't give you something that you can really use. Today's processors execute several instructions in parallel, so the efficiency of an operation depends very much on how well it can be paired with the instructions in the surrounding code.
So, if you really need to use the one that has the best performance, you have to profile the code. Otherwise (which is about 98% of the time) you should use the one that is most readable and best conveys what the code is doing.
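For illustration, a toy measurement of that kind might look like the sketch below (the loop count is arbitrary; note that at higher optimization levels the compiler may fold the whole loop into a single constant, which is itself an answer):
#include <chrono>
#include <cstdio>
int main()
{
    int a = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 100000000; i++)
        ++a;                        // swap in a++, a = a + 1, or a += 1 to compare
    auto t1 = std::chrono::steady_clock::now();
    long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::printf("a=%d, %lld ns\n", a, ns);  // printing a keeps the loop from being discarded
}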

The circumstances where these kinds of things actually matter are rare and few and far between. Most of the time, it doesn't matter at all. In fact, I'm willing to bet that this is the case for you.
What is true for one language/compiler/architecture may not be true for others. And really, the fact is irrelevant in the bigger picture anyway. Knowing these things does not make you a better programmer.
You should study algorithms, data structures, asymptotic analysis, clean and readable coding style, programming paradigms, etc. Those skills are a lot more important in producing performant and manageable code than knowing these kinds of low-level details.
Do not optimize prematurely, but also, do not micro-optimize. Look for the big picture optimizations.

This depends on the type of a as well as on the context of execution. If a is of a primitive type and all four statements have the same effect, then they should all be equivalent and identical in terms of efficiency. That is, the compiler should be smart enough to translate them into the same optimized machine code. Granted, that is not a requirement, but if it's not the case with your compiler, then that is a good sign to start looking for a better compiler.

For most compilers it should compile to the same ASM code.

Same.
For more details see http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf

I can't see why there should be any difference in execution time, but I'm happy to be proven wrong.
a++
and
++a
are not the same however, but this is not related to efficiency.
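The difference is in the value of the expression, not in its cost; a minimal demonstration:
#include <cstdio>
int main()
{
    int a = 5;
    std::printf("%d\n", a++);  // prints 5: yields the old value, then increments
    std::printf("%d\n", a);    // prints 6
    int b = 5;
    std::printf("%d\n", ++b);  // prints 6: increments first, then yields the new value
}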
When it comes to the performance of individual lines, context is always important, and guessing is not a good idea. Testing and measuring is better.

In an interview, I would go with two answers:
At first glance, the generated code should be very similar, especially if a is an integer.
If execution time is definitely a known problem, you have to measure it using some kind of profiler.

Well, you could argue that a++ is short and to the point. It can only increment a by one, but the notation is very well understood. a=a+1 is a little more verbose (not a big deal, unless you have variablesWithGratuitouslyLongNames), but some might argue it's more "flexible" because you can replace the 1 or either of the a's to change the expression. a+=1 is maybe not as flexible as the other two, but it is a little clearer in the sense that you can change the increment amount. ++a is different from a++, and some would argue against it because it's not always clear to people who don't use it often.
In terms of efficiency, I think most modern compilers will produce the same code for all of these but I could be mistaken. Really, you'd have to run your code with all variations and measure which performs best.
(assuming that a is an integer)

It depends on the context, and on whether we are in C or C++. In C, the code you posted (except for a-- :-) will cause a modern C compiler to produce exactly the same code. But very likely the expected answer is that a++ is the fastest and a=a+1 the slowest, since ancient compilers relied on the user to perform such optimizations.
In C++ it depends on the type of a. When a is a numeric type, it behaves the same way as in C, meaning a++, a+=1 and a=a+1 generate the same code. When a is an object, it depends on whether any of the operators (++, + and =) is overloaded, since then the overloaded operator of the object is called.
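For example, the canonical increment overloads look like this for some hypothetical wrapper type (the class name is illustrative); the extra copy in the postfix form is why ++a is often preferred for objects:
struct Counter
{
    int value = 0;
    Counter& operator++()      // prefix ++a: increment in place, return *this
    {
        ++value;
        return *this;
    }
    Counter operator++(int)    // postfix a++: must copy the old state first
    {
        Counter old = *this;
        ++value;
        return old;            // this copy is what can make a++ slower
    }
};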
Also, when you work in a field with very special compilers (like those for microcontrollers or embedded systems), these compilers can behave very differently on each of these variations.

Related

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated on each iteration. Specifically, I'm curious whether the compiler would 'optimize away' any additional arithmetic that has to be done for each loop. In my case, would this code get compiled such that (size - 1) has to be evaluated on every loop iteration? Or is the compiler smart enough to realize that the size variable won't change, so it can compute the value once instead of recomputing it on each iteration?
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at it, why not list another example. What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Modern compilers are so good at optimising that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
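If you would rather not rely on that, hoisting the bound yourself makes the assumption explicit; a sketch using the names from the question:
#include <cstdio>
void process(int size)
{
    const int limit = size - 1;        // evaluated once, by construction
    for (int i = 0; i < limit; i++)
    {
        std::printf("%d\n", i);        // stand-in for "do whatever"
    }
}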
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), i.e. that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options.
Check
https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please
for some book recommendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.
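For example (the file names are just placeholders):
g++ -S -O0 main.cpp -o main_O0.s
g++ -S -O3 main.cpp -o main_O3.s
diff main_O0.s main_O3.s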

C++ compiler optimization for complex equations

I have some equations that involve multiple operations that I would like to run as fast as possible. Since the C++ compiler breaks it down into machine code anyway, does it matter if I break it up into multiple lines like
A=4*B+4*C;
D=3*E/F;
G=A*D;
vs
G=12*E*(B+C)/F;
My need is more complex than this, but I think it conveys the idea. Also, if this is in a function that gets called in a loop, does defining double A, D cost CPU time vs. making them class variables?
Using a modern compiler (Clang/GCC/VC++/Intel), it won't really matter. The best thing you can do is worry about how readable your code will be and turn on optimizations; compiler designers are well aware of issues like these and design their compilers to (for the most part) optimize accordingly.
If I had to say which would be slower, I would assume the first way, since there would be three mov instructions, but I could be wrong. Either way, this isn't something you should worry about too much.
If these variables are integers, that second code fragment is not a valid optimization of the first. For B=1, C=1, E=1, F=6, you have:
A=4*B+4*C; // 8
D=3*E/F; // 0
G=A*D; // 0
and
G=12*E*(B+C)/F; // 4
If floating point, then it really depends on what compiler, what compiler options, and what cpu you have.
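A quick way to see the integer discrepancy for yourself, using the values from this answer:
#include <cstdio>
int main()
{
    int B = 1, C = 1, E = 1, F = 6;
    int A = 4*B + 4*C;              // 8
    int D = 3*E/F;                  // 3/6 truncates to 0
    int G1 = A*D;                   // 0
    int G2 = 12*E*(B+C)/F;          // 24/6 = 4
    std::printf("%d %d\n", G1, G2); // prints "0 4": not the same for integers
}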

Is ++(a = b); faster than a = b + 1;?

Is it faster to use ++(a = b); instead of a = b + 1;?
To my understanding, the first approach consists of the operations:
move the value of b to a
increment a in memory
while the second approach does:
push b and 1 to the stack
call add
pop the result to a register
move the register to a
Does it actually take fewer cycles? Or does the compiler (gcc, for example) perform an optimization so that it makes no difference?
edit: TIL that ++(a=b) is undefined behaviour, at least in pre-C++11. Nevertheless, I'll discuss this assuming it's either legal or the compiler does what you expect.
Generally speaking, a = b + 1; is faster.
The optimizer will almost surely generate the same code for both. If not, it is more likely to optimize the second version, because it is a very common thing to write, and optimizers are more likely to recognize common patterns than weird corner cases.
Why do I say it should be the same after optimization, yet the second is faster? Because of your fellow developers. Everyone recognizes a = b + 1; immediately. No one really has to think about it. The other form is more likely to trigger a reaction along the lines of "wtf is he doing there, and why?". Many people will eventually figure out what you did there. Some will not. Some might even introduce bugs because of it. The few who do figure out why you did it will nevertheless stumble each time they have to read that line. Everyone will lose time wondering while reading that line. That's why a = b + 1; is faster.
Caveat: all this silently assumes that you are talking about builtin types, like ints or pointers. Your interpretation of what the two do supports that. If we're talking about UDTs, the two lines are not even guaranteed to do the same thing. It then depends completely on how operator=, operator++ and operator+ (and maybe the conversion from int) are implemented. Nevertheless, if the implementations make you consider writing ++(a=b), they are most likely bad implementations and should be improved rather than hacked around.
tl;dr: if I'd catch you doing ++(a=b) in any codebase I work on, we'd have to have a serious talk ;-)
There is no simple answer to this question. The question has been flagged with C++ so we have no way of knowing what this code is actually doing without knowing the precise type of all the operands. Also, the context within which the code appears will make a difference to the way the optimiser generates code - the compiler could alias the variables and move the increment into instructions further down the program, for example, into effective address calculations for the two variables.
But the real question is, why do you care? As Arne said above, readability is far more important and you've not posted a scenario whereby any difference would have a measurable effect.
Only worry about it if it is actually causing a problem.
With optimizations on, they generate exactly the same code for me so they will perform exactly the same. This shouldn't be a surprise as the effects of both statements are exactly the same.
++(a = b); is undefined behaviour because there are two unsequenced modifications to a.
Although the value computation of a in a = b is sequenced before the modification of a due to ++, the side effect of a = b (the store to a) is unsequenced relative to the side effect of ++ (again, a store to a).
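The sequencing problem disappears if the two stores are written as separate statements; a sketch:
int main()
{
    int a = 0, b = 41;
    // ++(a = b);  // two unsequenced stores to a: undefined behaviour (pre-C++11)
    a = b;         // the store from the assignment is sequenced first...
    ++a;           // ...then the increment; a is 42, fully defined
    a = b + 1;     // or simply the idiomatic form, which optimizers know best
    return a;
}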

What is efficient to check: equal to or not equal to?

I was wondering: if we have an if-else condition, what is computationally more efficient to check, the equal-to operator or the not-equal-to operator? Is there any difference at all?
E.g., which one of the following is computationally more efficient? Both cases below do the same thing, but which one is better (if there's any difference)?
Case1:
if (a == x)
{
// execute Set1 of statements
}
else
{
// execute Set2 of statements
}
Case 2:
if (a != x)
{
// execute Set2 of statements
}
else
{
// execute Set1 of statements
}
Here the assumption is that most of the time (say 90% of the cases) a will be equal to x. Both a and x are of unsigned integer type.
Generally it shouldn't matter for performance which operator you use. However, for branching it is recommended that the most likely outcome of the if statement come first.
Usually what you should consider is: what is the simplest and clearest way to write this code? IMHO, the first, positive form is the simplest (not requiring a !).
In terms of performance there is no difference, as the code is likely to compile to the same thing. (Certainly the JIT for Java should do so.)
For Java, the JIT can optimise the code so the most common branch is preferred by the branch prediction.
In this simple case, it makes no difference. (assuming a and x are basic types) If they're class-types with overloaded operator == or operator != they might be different, but I wouldn't worry about it.
For subsequent loops:
if ( c1 ) { }
else if ( c2 ) { }
else ...
the most likely condition should be put first, to prevent useless evaluations of the others. (again, not applicable here since you only have one else).
GCC provides a way to inform the compiler about the likely outcome of an expression:
if (__builtin_expect(expression, 1))
…
This built-in evaluates to the value of expression, but it informs the compiler that the likely result is 1 (true for Booleans). To use this, you should write expression as clearly as possible (for humans), then set the second parameter to whichever value is most likely to be the result.
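As a sketch, applied to the case from the question (the likely/unlikely macro names are a widespread convention, e.g. in the Linux kernel, not part of GCC itself):
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
int choose(unsigned a, unsigned x)
{
    if (likely(a == x))   // hint: equality is the ~90% case here
        return 1;         // Set1 of statements
    return 2;             // Set2 of statements
}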
There is no difference.
The x86 CPU architecture has two opcodes for conditional jumps
JNE (jump if not equal)
JE (jump if equal)
Usually they both take the same number of CPU cycles.
And even when they wouldn't, you could expect the compiler to do such trivial optimizations for you. Write what's most readable and what makes your intention more clear instead of worrying about microseconds.
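To illustrate, both forms lower to the same comparison with the jump condition flipped; a sketch (the instruction names in the comments are illustrative, not compiler output):
int set1() { return 1; }   // stand-in for Set1 of statements
int set2() { return 2; }   // stand-in for Set2 of statements
int check_eq(unsigned a, unsigned x)
{
    if (a == x)            // typically: cmp, then jne over the taken path
        return set1();
    return set2();
}
int check_ne(unsigned a, unsigned x)
{
    if (a != x)            // the same cmp, with je instead of jne
        return set2();
    return set1();
}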
If you ever manage to write a piece of Java code that can be proven to be significantly more efficient one way than the other, you should publish your result and raise an issue against whatever implementation you observed the difference on.
More to the point, just asking this kind of question should be a sign of something amiss: it is an indication that you are focusing your attention and efforts on a wrong aspect of your code. Real-life application performance always suffers from inadequate architecture; never from concerns such as this.
Premature optimization is the root of all evil.
Even for branch prediction, I think you should not care too much about this until it is really necessary.
Just as Peter said, use the simplest way.
Let the compiler/optimizer do its work.
It's a general rule of thumb (nowadays, at least) that source code should express your intention in the most readable way. You are writing it for another human (not for the computer): your one-year-later self, or the teammate who will need to understand your code with the least effort.
It shouldn't make any difference performance-wise, but you should consider what is easiest to read. Then, when you are looking back on your code, or if someone else is looking at it, you want it to be easy to understand.
It is a small advantage (in terms of readability) if the first condition is the one that is true in most cases.
Write the conditions in whatever way reads best. You will not gain speed by negating a condition.
Most processors perform equality and inequality checks with the same comparison hardware, checking all bits at once, so it should make no difference. But if you want to truly optimise your code, it is always better to benchmark things yourself and check the results.
If you are wondering whether it's worth optimising like that, imagine you had this check multiple times for every pixel on your screen, or scenarios like that. IMHO, it is always worth optimising, even if it's only to teach yourself good habits ;)
Only the non-negated approach, which you used first, seems best.
The only way to know for sure is to code up both versions and measure their performance. If the difference is only a percent or so, use the version that more clearly conveys the intent.
It's very unlikely that you're going to see a significant difference between the two.
The performance difference between them is negligible, so just think about the readability of the code. For readability, I prefer the one that has more lines of code inside the if branch:
if (a == x) {
// x lines of code
} else {
// y lines of code where y < x
}

Which is faster (mask >> i & 1) or (mask & 1 << i)?

In my code I must choose one of these two expressions (where mask and i are non-constant integers, with -1 < i < (sizeof(int) << 3) + 1). I don't think this will make the performance of my program better or worse, but it is very interesting to me. Do you know which is better, and why?
First of all, whenever you find yourself asking "which is faster", your first reaction should be to profile, measure and find out for yourself.
Second of all, this is such a tiny calculation, that it almost certainly has no bearing on the performance of your application.
Third, the two are most likely identical in performance.
C expressions cannot be "faster" or "slower", because the CPU cannot evaluate them directly.
Which one is "faster" depends on the machine code your compiler will be able to generate for these two expressions. If your compiler is smart enough to realize that in your context both do the same thing (e.g. you simply compare the result with zero), it will probably generate the same code for both variants, meaning that they will be equally fast. In such case it is quite possible that the generated machine code will not even remotely resemble the sequence of operations in the original expression (i.e. no shift and/or no bitwise-and). If what you are trying to do here is just test the value of one bit, then there are other ways to do it besides the shift-and-bitwise-and combination. And many of those "other ways" are not expressible in C. You can't use them in C, while the compiler can use them in machine code.
For example, the x86 CPU has a dedicated bit-test instruction BT that extracts the value of a specific bit by its number. So a smart compiler might simply generate something like
MOV eax, i
BT mask, eax
...
for both of your expressions (assuming it is more efficient, of which I'm not sure).
Use either one and let your compiler optimize it however it likes.
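If you want to compare what your compiler emits, here is a minimal pair to feed it (note that as conditions the two agree, but as raw values they differ: the first yields 0 or 1, the second 0 or 1 << i):
#include <cstdio>
int shift_then_mask(unsigned mask, unsigned i) { return (mask >> i) & 1; }
int mask_with_bit(unsigned mask, unsigned i)   { return mask & (1u << i); }
int main()
{
    std::printf("%d %d\n", shift_then_mask(8, 3), mask_with_bit(8, 3));  // prints "1 8"
}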
If "i" is a compile-time constant, then the second would execute fewer instructions -- the 1 << i would be computed at compile time. Otherwise I'd imagine they'd be the same.
Depends entirely on where the values mask and i come from, and the architecture on which the program is running. There's also nothing to stop the compiler from transforming one into the other in situations where they are actually equivalent.
In short, not worth worrying about unless you have a trace showing that this is an appreciable fraction of total execution time.
It is unlikely that either will be faster. If you are really curious, compile a simple program that does both, disassemble, and see what instructions are generated.
Here is how to do that:
gcc -O0 -g main.c -o main
objdump -d main | less
You could examine the assembly output and then look up how many clock cycles each instruction takes.
But in 99.9999999 percent of programs, it won't make a lick of difference.
The two expressions are not logically equivalent, so performance is not your concern!
If performance were your concern, write a loop to do 10 million of each and measure.
EDIT: You edited the question after my response ... so please ignore my answer as the constraints change things.