Compilation and Code Optimization - c++

I will state my problem in a very simplified form, which is:
If I type in C
void main(){
int a=3+2;
double b=7/2;
}
When will a and b, be assigned their values of 5 and 3.5 is it when I compile my code or is it when I run the code?
In other words, What will happen when I press compile? and how it is different from the case when I press run, in terms of assigning the values and doing the computations and how is that different from writing my code as:
void main(){
int a=5;
double b=3.5;
}
I am asking this because I have heard about compiler optimization but it is not really my area.
Any comments, reviews will be highly appreciated.
Thank you.

Since you are asking about "code optimization" - a good optimizing compiler will optimize this code down to void main(){}. a and b will be completely eliminated.
Also, 7/2 == 3, not 3.5

Compiling will translate the high-level language into the lower language, such as assembly. A good compiler may optimize, and this can be customizable (for example with -O2) option or so.
Regarding your code, double b=7/2; will yield 3.0 instead of 3.5, because you do an integer and integer operation. If you would like to have 3.5, you should do it like double b=7.0/2.0;. This is a quite common mistake that people do.

What will happen when I press compile?
Nobody knows. The compiler may optimize it to a constant, or it may not. It probably will, but it isn't required to.
You generally shouldn't worry or really even think about compiler optimization, unless you're in a position that absolutely needs it, which very few developers are. The compiler can usually do a better job than you can.

It's compiler-dependent, a good one will do CF and/or DCE

I don't know anything about optimization either, but I decided to give this a shot. Using, gcc -c -S test.c I got the assembly for the function. Here's what the line int a = 3 + 2 comes out as.
movl $5, -4(%rbp)
So for me, it's converting the value (3+2) to 5 at compile time, but it depends on the compiler and platform and whatever flags you pass it.
(Also, I made the function return a just so that it wouldn't optimize the code out entirely.)

Related

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated for each loop. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done for each loop. In my case, would this code get compiled such that (size - 1) would have to be evaluated for every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change, thus it could precalculate it for each loop iteration.
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at, why not list another example. What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Because modern compilers are so good at optimising, that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options
Check
https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please
for some book recomendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.

C++ compiler optimization for complex equations

I have some equations that involve multiple operations that I would like to run as fast as possible. Since the c++ compiler breaks it down in to machine code anyway does it matter if I break it up to multiple lines like
A=4*B+4*C;
D=3*E/F;
G=A*D;
vs
G=12*E*(B+C)/F;
My need is more complex than this but the i think it conveys the idea. Also if this is in a function that gets called is in a loop, does defining double A, D cost CPU time vs putting it in as a class variable?
Using a modern compiler, Clang/Gcc/VC++/Intel, it won't really matter, the best thing you should do is worry about how readable your code will be and turn on optimizations, compiler designers are well aware of issues like these and design their compilers to (for the most part) optimize according.
If I were to say which would be slower I would assume the first way since there would be 3 mov instructions, I could be wrong. but this isn't something you should worry about too much.
If these variables are integers, that second code fragment is not a valid optimization of the first. For B=1, C=1, E=1, F=6, you have:
A=4*B+4*C; // 8
D=3*E/F; // 0
G=A*D; // 0
and
G=12*E*(B+C)/F; // 4
If floating point, then it really depends on what compiler, what compiler options, and what cpu you have.

rate ++a,a++,a=a+1 and a+=1 in terms of execution efficiency in C.Assume gcc to be the compiler [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Is there a performance difference between i++ and ++i in C++?
In terms of usage of the following, please rate in terms of execution time in C.
In some interviews i was asked which shall i use among these variations and why.
a++
++a
a=a+1
a+=1
Here is what g++ -S produces:
void irrelevant_low_level_worries()
{
int a = 0;
// movl $0, -4(%ebp)
a++;
// incl -4(%ebp)
++a;
// incl -4(%ebp)
a = a + 1;
// incl -4(%ebp)
a += 1;
// incl -4(%ebp)
}
So even without any optimizer switches, all four statements compile to the exact same machine code.
You can't rate the execution time in C, because it's not the C code that is executed. You have to profile the executable code compiled with a specific compiler running on a specific computer to get a rating.
Also, rating a single operation doesn't give you something that you can really use. Todays processors execute several instructions in parallel, so the efficiency of an operation relies very much on how well it can be paired with the instructions in the surrounding code.
So, if you really need to use the one that has the best performance, you have to profile the code. Otherwise (which is about 98% of the time) you should use the one that is most readable and best conveys what the code is doing.
The circumstances where these kinds of things actually matter is very rare and few in between. Most of the time, it doesn't matter at all. In fact I'm willing to bet that this is the case for you.
What is true for one language/compiler/architecture may not be true for others. And really, the fact is irrelevant in the bigger picture anyway. Knowing these things do not make you a better programmer.
You should study algorithms, data structures, asymptotic analysis, clean and readable coding style, programming paradigms, etc. Those skills are a lot more important in producing performant and manageable code than knowing these kinds of low-level details.
Do not optimize prematurely, but also, do not micro-optimize. Look for the big picture optimizations.
This depends on the type of a as well as on the context of execution. If a is of a primitive type and if all four statements have the same identical effect then these should all be equivalent and identical in terms of efficiency. That is, the compiler should be smart enough to translate them into the same optimized machine code. Granted, that is not a requirement, but if it's not the case with your compiler then that is a good sign to start looking for a better compiler.
For most compilers it should compile to the same ASM code.
Same.
For more details see http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
I can't see why there should be any difference in execution time, but let's prove me wrong.
a++
and
++a
are not the same however, but this is not related to efficiency.
When it comes to performance of individual lines, context is always important, and guessing is not a good idea. Test and measure is better
In an interview, I would go with two answers:
At first glance, the generated code should be very similar, especially if a is an integer.
If execution time was definitely a known problem - you have to measure it using some kind of profiler.
Well, you could argue that a++ is short and to the point. It can only increment a by one, but the notation is very well understood. a=a+1 is a little more verbose (not a big deal, unless you have variablesWithGratuitouslyLongNames), but some might argue it's more "flexible" because you can replace the 1 or either of the a's to change the expression. a+=1 is maybe not as flexible as the other two but is a little more clear, in the sense that you can change the increment amount. ++a is different from a++ and some would argue against it because it's not always clear to people who don't use it often.
In terms of efficiency, I think most modern compilers will produce the same code for all of these but I could be mistaken. Really, you'd have to run your code with all variations and measure which performs best.
(assuming that a is an integer)
It depends on the context, and if we are in C or C++. In C the code you posted (except for a-- :-) will cause a modern C compiler to produce exactly the same code. But by a very high chance the expected answer is that a++ is the fastest one and a=a+1 the slowest, since ancient compilers relied on the user to perform such optimizations.
In C++ it depends of the type of a. When a is a numeric type, it acts the same way as in C, which means a++, a+=1 and a=a+1 generate the same code. When a is a object, it depends if any operator (++, + and =) is overloaded, since then the overloaded operator of the a object is called.
Also when you work in a field with very special compilers (like microcontrollers or embedded systems) these compilers can behave very differently on each of these input variations.

GCC: program doesn't work with compilation option -O3

I'm writing a C++ program that doesn't work (I get a segmentation fault) when I compile it with optimizations (options -O1, -O2, -O3, etc.), but it works just fine when I compile it without optimizations.
Is there any chance that the error is in my code? or should I assume that this is a bug in GCC?
My GCC version is 3.4.6.
Is there any known workaround for this kind of problem?
There is a big difference in speed between the optimized and unoptimized version of my program, so I really need to use optimizations.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.
Now that you posted the code fragment and a working workaround was found (#Windows programmer's answer), I can say that perhaps what you are looking for is -ffloat-store.
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
Source: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html
I would assume your code is wrong first.
Though it is hard to tell.
Does your code compile with 0 warnings?
g++ -Wall -Wextra -pedantic -ansi
Here's some code that seems to work, until you hit -O3...
#include <stdio.h>
int main()
{
int i = 0, j = 1, k = 2;
printf("%d %d %d\n", *(&j-1), *(&j), *(&j+1));
return 0;
}
Without optimisations, I get "2 1 0"; with optimisations I get "40 1 2293680". Why? Because i and k got optimised out!
But I was taking the address of j and going out of the memory region allocated to j. That's not allowed by the standard. It's most likely that your problem is caused by a similar deviation from the standard.
I find valgrind is often helpful at times like these.
EDIT: Some commenters are under the impression that the standard allows arbitrary pointer arithmetic. It does not. Remember that some architectures have funny addressing schemes, alignment may be important, and you may get problems if you overflow certain registers!
The words of the [draft] standard, on adding/subtracting an integer to/from a pointer (emphasis added):
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
Seeing as &j doesn't even point to an array object, &j-1 and &j+1 can hardly point to part of the same array object. So simply evaluating &j+1 (let alone dereferencing it) is undefined behaviour.
On x86 we can be pretty confident that adding one to a pointer is fairly safe and just takes us to the next memory location. In the code above, the problem occurs when we make assumptions about what that memory contains, which of course the standard doesn't go near.
As an experiment, try to see if this will force the compiler to round everything consistently.
volatile float d1=distance(point,p1) ;
volatile float d2=distance(point,p2) ;
return d1 < d2 ;
The error is in your code. It's likely you're doing something that invokes undefined behavior according to the C standard which just happens to work with no optimizations, but when GCC makes certain assumptions for performing its optimizations, the code breaks when those assumptions aren't true. Make sure to compile with the -Wall option, and the -Wextra might also be a good idea, and see if you get any warnings. You could also try -ansi or -pedantic, but those are likely to result in false positives.
You may be running into an aliasing problem (or it could be a million other things). Look up the -fstrict-aliasing option.
This kind of question is impossible to answer properly without more information.
It is very seldom the compiler fault, but compiler do have bugs in them, and them often manifest themselves at different optimization levels (if there is a bug in an optimization pass, for example).
In general when reporting programming problems: provide a minimal code sample to demonstrate the issue, such that people can just save the code to a file, compile and run it. Make it as easy as possible to reproduce your problem.
Also, try different versions of GCC (compiling your own GCC is very easy, especially on Linux). If possible, try with another compiler. Intel C has a compiler which is more or less GCC compatible (and free for non-commercial use, I think). This will help pinpointing the problem.
It's almost (almost) never the compiler.
First, make sure you're compiling warning-free, with -Wall.
If that didn't give you a "eureka" moment, attach a debugger to the least optimized version of your executable that crashes and see what it's doing and where it goes.
5 will get you 10 that you've fixed the problem by this point.
Ran into the same problem a few days ago, in my case it was aliasing. And GCC does it differently, but not wrongly, when compared to other compilers. GCC has become what some might call a rules-lawyer of the C++ standard, and their implementation is correct, but you also have to be really correct in you C++, or it'll over optimize somethings, which is a pain. But you get speed, so can't complain.
I expect to get some downvotes here after reading some of the comments, but in the console game programming world, it's rather common knowledge that the higher optimization levels can sometimes generate incorrect code in weird edge cases. It might very well be that edge cases can be fixed with subtle changes to the code, though.
Alright...
This is one of the weirdest problems I've ever had.
I dont think I have enough proof to state it's a GCC bug, but honestly... It really looks like one.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.
Wow, I didn't expect answers so quicly, and so many...
The error occurs upon sorting a std::vector of pointers using std::sort()
I provide the strict-weak-ordering functor.
But I know the functor I provide is correct because I've used it a lot and it works fine.
Plus, the error cannot be some invalid pointer in the vector becasue the error occurs just when I sort the vector. If I iterate through the vector without applying std::sort first, the program works fine.
I just used GDB to try to find out what's going on. The error occurs when std::sort invoke my functor. Aparently std::sort is passing an invalid pointer to my functor. (of course this happens with the optimized version only, any level of optimization -O, -O2, -O3)
as other have pointed out, probably strict aliasing.
turn it of in o3 and try again. My guess is that you are doing some pointer tricks in your functor (fast float as int compare? object type in lower 2 bits?) that fail across inlining template functions.
warnings do not help to catch this case. "if the compiler could detect all strict aliasing problems it could just as well avoid them" just changing an unrelated line of code may make the problem appear or go away as it changes register allocation.
As the updated question will show ;) , the problem exists with a std::vector<T*>. One common error with vectors is reserve()ing what should have been resize()d. As a result, you'd be writing outside array bounds. An optimizer may discard those writes.
post the code in distance! it probably does some pointer magic, see my previous post. doing an intermediate assignment just hides the bug in your code by changing register allocation. even more telling of this is the output changing things!
The true answer is hidden somewhere inside all the comments in this thread. First of all: it is not a bug in the compiler.
The problem has to do with floating point precision. distanceToPointSort should be a function that should never return true for both the arguments (a,b) and (b,a), but that is exactly what can happen when the compiler decides to use higher precision for some data paths. The problem is especially likely on, but by no means limited to, x86 without -mfpmath=sse. If the comparator behaves that way, the sort function can become confused, and the segmentation fault is not surprising.
I consider -ffloat-store the best solution here (already suggested by CesarB).

Which compiles to faster code: "n * 3" or "n+(n*2)"?

Which compiles to faster code: "ans = n * 3" or "ans = n+(n*2)"?
Assuming that n is either an int or a long, and it is is running on a modern Win32 Intel box.
Would this be different if there was some dereferencing involved, that is, which of these would be faster?
long a;
long *pn;
long ans;
...
*pn = some_number;
ans = *pn * 3;
Or
ans = *pn+(*pn*2);
Or, is it something one need not worry about as optimizing compilers are likely to account for this in any case?
IMO such micro-optimization is not necessary unless you work with some exotic compiler. I would put readability on the first place.
It doesn't matter. Modern processors can execute an integer MUL instruction in one clock cycle or less, unlike older processers which needed to perform a series of shifts and adds internally in order to perform the MUL, thereby using multiple cycles. I would bet that
MUL EAX,3
executes faster than
MOV EBX,EAX
SHL EAX,1
ADD EAX,EBX
The last processor where this sort of optimization might have been useful was probably the 486. (yes, this is biased to intel processors, but is probably representative of other architectures as well).
In any event, any reasonable compiler should be able to generate the smallest/fastest code. So always go with readability first.
As it's easy to measure it yourself, why don't do that? (Using gcc and time from cygwin)
/* test1.c */
int main()
{
int result = 0;
int times = 1000000000;
while (--times)
result = result * 3;
return result;
}
machine:~$ gcc -O2 test1.c -o test1
machine:~$ time ./test1.exe
real 0m0.673s
user 0m0.608s
sys 0m0.000s
Do the test for a couple of times and repeat for the other case.
If you want to peek at the assembly code, gcc -S -O2 test1.c
This would depend on the compiler, its configuration and the surrounding code.
You should not try and guess whether things are 'faster' without taking measurements.
In general you should not worry about this kind of nanoscale optimisation stuff nowadays - it's almost always a complete irrelevance, and if you were genuinely working in a domain where it mattered, you would already be using a profiler and looking at the assembly language output of the compiler.
It's not difficult to find out what the compiler is doing with your code (I'm using DevStudio 2005 here). Write a simple program with the following code:
int i = 45, j, k;
j = i * 3;
k = i + (i * 2);
Place a breakpoint on the middle line and run the code using the debugger. When the breakpoint is triggered, right click on the source file and select "Go To Disassembly". You will now have a window with the code the CPU is executing. You will notice in this case that the last two lines produce exactly the same instructions, namely, "lea eax,[ebx+ebx*2]" (not bit shifting and adding in this particular case). On a modern IA32 CPU, it's probably more efficient to do a straight MUL rather than bit shifting due to pipelineing nature of the CPU which incurs a penalty when using a modified value too soon.
This demonstrates what aku is talking about, namely, compilers are clever enough to pick the best instructions for your code.
It does depend on the compiler you are actually using, but very probably they translate to the same code.
You can check it by yourself by creating a small test program and checking its disassembly.
Most compilers are smart enough to decompose an integer multiplication into a series of bit shifts and adds. I don't know about Windows compilers, but at least with gcc you can get it to spit out the assembler, and if you look at that you can probably see identical assembler for both ways of writing it.
It doesn't care. I think that there are more important things to optimize. How much time have you invested thinking and writing that question instead of coding and testing by yourself?
:-)
As long as you're using a decent optimising compiler, just write code that's easy for the compiler to understand. This makes it easier for the compiler to perform clever optimisations.
You asking this question indicates that an optimising compiler knows more about optimisation than you do. So trust the compiler. Use n * 3.
Have a look at this answer as well.
Compilers are good at optimising code such as yours. Any modern compiler would produce the same code for both cases and additionally replace * 2 by a left shift.
Trust your compiler to optimize little pieces of code like that. Readability is much more important at the code level. True optimization should come at a higher level.