Is there a method to automatically find the best compiler options (on a given machine), which result in the fastest possible executable?
Naturally, I use g++ -O3, but there are additional flags that may make the code run faster, e.g. -ffast-math and others, some of which are hardware-dependent.
Does anyone know some code I can put in my configure.ac file (GNU autotools), so that the flags will be added to the Makefile automatically by the ./configure command?
In addition to automatically determining the best flags, I would be interested in some useful compiler flags that are good to use as a default for most optimized executables.
Update: Most people suggest just trying different flags and selecting the best ones empirically. For that method, I'd have a follow-up question: Is there a utility that lists all compiler flags that are possible for the machine I'm running on (e.g. tests whether SSE instructions are available, etc.)?
I don't think you can do this at configure-time, but there is at least one program which attempts to optimize gcc option flags given a particular executable and machine. See http://www.coyotegulch.com/products/acovea/ for example.
You might be able to use this with some knowledge of your target machine(s) to find a good set of options for your code.
Um - yes. This is possible. Look into profile-guided optimization.
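If you want to experiment with that, the usual GCC workflow looks roughly like this (a minimal sketch; myprog.cpp and the training input are placeholders):

g++ -O3 -fprofile-generate myprog.cpp -o myprog
./myprog typical-input.dat                    # run a representative workload; this writes *.gcda profile data
g++ -O3 -fprofile-use myprog.cpp -o myprog    # rebuild, letting gcc optimize using the collected profile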
Some compilers provide a "-fast" option to automatically select the most aggressive optimizations for the given compilation host, e.g. the Intel compiler: http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler
Unfortunately, g++ does not provide a similar flag.
As a follow-up to your next question: for g++ you can use the -mtune option together with -O3, which will give you reasonably fast defaults. The challenge then is to find the processor type of your compilation host. You may want to look at the autoconf macro archive to see whether somebody has written the necessary tests. Otherwise, assuming Linux, you have to parse /proc/cpuinfo to get the processor type.
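For instance, a rough sketch of that detection, assuming Linux (the cpuinfo field name varies by architecture), or simply letting a reasonably recent gcc detect the host itself:

grep 'model name' /proc/cpuinfo | head -n 1   # identify the host CPU, then map it to an -mtune value
g++ -O3 -mtune=native -o myprog myprog.cpp    # or let gcc do the detection for you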
After some googling, I found this script: gcccpuopt.
On one of my machines (32bit), it outputs:
-march=pentium4 -mfpmath=sse
On another machine (64bit) it outputs:
$ ./gcccpuopt
Warning: The optimum *32 bit* architecture is reported
-m32 -march=core2 -mfpmath=sse
So, it's not perfect, but might be helpful.
See also the -march=native/-mtune=native gcc options.
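Regarding the follow-up about listing what the current machine supports: you can ask gcc itself what the native options expand to (a sketch; the output format varies by gcc version):

gcc -march=native -Q --help=target                       # shows the -march/-mtune values and the SSE/AVX-style flags gcc picked
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1     # alternative: dump the options actually passed to cc1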
Is there a method to automatically find the best compiler options (on a given machine), which result in the fastest possible executable?
No.
You could compile your program with a large assortment of compiler options, then benchmark each and every version, then select the one that is "fastest," but that's hardly reliable and probably not useful for your program.
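If you decide to do it anyway despite those caveats, the mechanical part is simple enough. A crude sketch (the flag sets, myprog.cpp and the benchmark input are all placeholders):

#!/bin/sh
# build with each candidate flag set and time the same fixed benchmark
for FLAGS in "-O2" "-O3" "-O3 -ffast-math" "-O3 -march=native"; do
    g++ $FLAGS -o myprog myprog.cpp
    echo "== $FLAGS =="
    time ./myprog benchmark-input.dat
done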
This is a solution that works for me, but it does take a little while to set up. In "Python Scripting for Computational Science" by Hans Petter Langtangen (an excellent book in my opinion), an example is given of using a short python script to do numerical experiments to determine the best compiler options for your C/Fortran/... program. This is described in Chapter 1.1.11 on "Nested Heterogeneous Data Structures".
Source code for examples from the book are freely available at http://folk.uio.no/hpl/scripting/index.html (I'm not sure of the license, so will not reproduce any code here), and in particular you can find code for a similar numerical test in the code in TCSE3-3rd-examples.tar.gz in the file src/app/wavesim2D/F77/compile.py , which you could use as a base for writing a script which is appropriate for a particular system/language (C++ in your case).
Optimizing your app is mainly your job, not the compiler's: profile it and fix the hotspots in your own code first. Here's an example of what I'm talking about.
Once you've done that, IF your app is compute-bound, with hotspots in your code (not in library code), THEN the compiler optimizations for speed will make some difference, so you can try different flag combinations.
For gcc flags like --coverage, do they have any side effects, for example a run-time performance penalty?
If not, can I just leave them in the common makefile and use them to build production code?
The GCC documentation doesn't say anything about this.
This question
How Does The Debugging Option -g Change the Binary Executable?
explains -g, but how about --coverage? I guess the answer is yes; the added instrumentation code must cost quite a bit at runtime.
Transferring a succinct comment as an answer so maybe there can be closure on the question.
Coverage code costs space and time. You probably don't want it in production code.
Debug information isn't loaded normally; the cost is much smaller (basically, zero).
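If you want to put a number on it for your own code, the simplest check is to build it both ways and time the same run (a sketch; myprog.cpp and workload.txt are placeholders):

g++ -O2 -o plain myprog.cpp
g++ -O2 --coverage -o covered myprog.cpp   # adds the instrumentation and links in the gcov runtime support
time ./plain < workload.txt
time ./covered < workload.txt              # also writes .gcda coverage data files on exit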
I have a very simple question but I haven't been able to find the answer yet so here I go:
I am using a shared library and I'd like to know whether it was compiled with an optimization flag such as -O3 or not.
Is there a way to find that information?
Thank you very much
If you are using gcc 4.3 or later, check out -frecord-gcc-switches. After you build the binary file use readelf -n to read the notes section.
More info can be found here: Detect GCC compile-time flags of a binary.
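A sketch of what that looks like (libfoo.so and foo.cpp are placeholders; on my toolchains the switches end up in a .GCC.command.line string section, so readelf -p is worth trying as well):

g++ -O3 -frecord-gcc-switches -fPIC -shared -o libfoo.so foo.cpp
readelf -n libfoo.so                       # look at the notes, as suggested above
readelf -p .GCC.command.line libfoo.so     # or dump the section the switches are usually recorded in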
Unless whoever compiled the library in the first place used a compiler that saves these flags to the binary somehow (I think only recent GCC allows that, and probably clang), there's inherently no way to know exactly what flags were used. You can, of course, if you have a lot of experience looking at assembly, deduce a lot (for example "this looks like an automatically unrolled loop", "this looks like someone optimized for a processor where A xor A is faster than A := 0x0", etc.).
Generally, there's always different source code that can end up as the same compiled code, so in many cases there's no way to tell whether what was compiled had been optimized "by hand" in the first place or has seen compiler optimization.
Also, there are a lot of C++ compilers out there, a lot of versions of these and even more flags...
Now, your question comes out of somewhere; I'm guessing you're asking this because either
1. you want to know if there are debugging symbols in there, or
2. you want to make sure something isn't crashing because of incorrect optimization, or
3. you want to know whether there's potential for optimization.
Now, 1. is really rather independent of the level of optimization; of course, the more you optimize, the less your object code corresponds to "lines of source code", but you can still have debugging symbols.
The second point: I've learned the hard way that unless I've successfully excluded every other alternative, I'm the one to blame for bugs (and not my compiler).
The third point: There's always room for optimization, but that won't help you unless you're in a position to recompile the library yourself. If you recompile, you'll set the flags, so no need to find out if they were set in the first place. If you're not able to recompile: Knowing there is room won't help you. If you're just getting your library out of a complex build process: Most build systems leave you with a log that will include things like compiler flags.
I have a task to create optimized C++ source code and give it to a friend for compilation. That means I do not control the final compilation; I just write the source code of the C++ program.
I know that I can make optimizations during compilation with the -O1 (and -O2 and other) options of GCC. But how can I get this optimized source code instead of the compiled program? I am not able to configure the parameters of my friend's compiler, which is why I need to produce good source code on my side.
The optimizations performed by GCC are low level; that means you won't get C++ code back, but at best assembly code, and you won't be able to convert that back into C++.
In sum: Optimize the source code on code level, not on object level.
You could ask GCC to dump its internal (Gimple, ...) representations, at various "stages". The middle-end of GCC is made of hundreds of passes, and you could ask GCC to dump them, with arguments like -fdump-tree-all or -fdump-gimple-all; beware that you can get hundreds of dump files for a single compilation!
However, GCC internal representations are quite low level, and you should not expect to understand them without reading a lot of material.
The dump options I am mentioning are mostly useful to those working inside GCC, or extending it through plugins coded in C or extensions coded in MELT (a high-level domain-specific language to extend GCC). I am not sure they will be very useful to your friend. However, they can be useful to make you understand that optimization passes do a lot of complex processing.
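For completeness, the invocation itself is simple (a sketch; the dump file names and pass numbers differ between GCC versions):

g++ -O2 -fdump-tree-all -c myprog.cpp   # writes one dump file per pass next to the object file
ls myprog.cpp.*                         # e.g. myprog.cpp.004t.gimple, myprog.cpp.*.optimized, ...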
And don't forget that premature optimization is evil: you should first make your program run correctly, then benchmark and profile it, and only then optimize the few parts worth your effort. You probably won't be able to write correct & efficient programs without testing and running them yourself, before giving them to your friend.
Easy - choose the best algorithm possible, let the rest be handled by the optimizer.
Optimizing the source code is different than optimizing the binary. You optimize the source code, the compiler will optimize the binary.
For anything more than algorithm choice, you'll need to do some profiling. Sure, there are practices that can speed up the code, but some make it less readable. Only optimize when you have to, and only after you measure.
I just asked a question related to how the compiler optimizes certain C++ code, and I was looking around SO for any questions about how to verify that the compiler has performed certain optimizations. I was trying to look at the assembly listing generated with g++ (g++ -c -g -O2 -Wa,-ahl=file.s file.c) to possibly see what is going on under the hood, but the output is too cryptic to me. What techniques do people use to tackle this problem, and are there any good references on how to interpret the assembly listings of optimized code or articles specific to the GCC toolchain that talk about this problem?
GCC's optimization passes work on an intermediary representation of your code in a format called GIMPLE.
Using the -fdump-* family of options, you can ask GCC to output intermediary states of the tree.
For example, feed this to gcc -c -fdump-tree-all -O3
unsigned fib(unsigned n) {
if (n < 2) return n;
return fib(n - 2) + fib(n - 1);
}
and watch as it gradually transforms from a simple exponential algorithm into a complex polynomial algorithm. (Really!)
A useful technique is to run the code under a good sampling profiler, e.g. Zoom under Linux or Instruments (with Time Profiler instrument) under Mac OS X. These profilers not only show you the hotspots in your code but also map source code to disassembled object code. Highlighting a source line shows the (not necessarily contiguous) lines of generated code that map to the source line (and vice versa). Online opcode references and optimization tips are a nice bonus.
Instruments: developer.apple.com
Zoom: www.rotateright.com
Not gcc, but when debugging in Visual Studio you have the option to intersperse assembly and source, which gives a good idea of what has been generated for what statement. But sometimes it's not quite aligned correctly.
The output of the gcc tool chain and objdump -dS isn't at the same granularity. This article on getting gcc to output source and assembly has the same options as you are using.
Adding the assembler's -L option (e.g. -Wa,-L,-ahl=file.s, which keeps local symbols in the listing) may provide slightly more intelligible listings.
The equivalent MSVC option is /FAcs (and it's a little better because it intersperses the source, machine language, and binary, and includes some helpful comments).
About one third of my job consists of doing just what you're doing: juggling C code around and then looking at the assembly output to make sure it's been optimized correctly (which is preferred to just writing inline assembly all over the place).
Game-development blogs and articles can be a good resource for the topic since games are effectively real-time applications in constant memory -- I have some notes on it, so does Mike Acton, and others. I usually like to keep Intel's instruction set reference up in a window while going through listings.
The most helpful thing is to get a good ground-level understanding of assembly programming generally first -- not because you want to write assembly code, but because having done so makes reading disassembly much easier. I've had a hard time finding a good modern textbook though.
In order to output the optimizations applied you can use:
-fopt-info-optimized
To see those that have not been applied
-fopt-info-missed
Beware that the output is sent to the standard error stream, so to see it you may have to redirect it (hint: 2>&1).
Here is a nice example:
g++ -O3 -std=c++11 -march=native -mtune=native
-fopt-info-optimized h2d.cpp -o h2d 2>&1
h2d.cpp:225:3: note: loop vectorized
h2d.cpp:213:3: note: loop vectorized
h2d.cpp:198:3: note: loop vectorized
h2d.cpp:186:3: note: loop vectorized
You can check the interleaved output, having applied -g, with objdump -dS | c++filt, but that will not get you that far. Enjoy!
Zoom from RotateRight ( http://rotateright.com ) is mentioned in another answer, but to expand on that: it shows you the mapping of source to assembly in what they call the "code browser". It's incredibly handy even if you're not an asm expert because they have also integrated assembly documentation into the app. And the assembly listing is annotated with comments and timing for several CPU types.
You can just open your object or executable file with Zoom and take a look at what the compiler has done with your code.
Victor, in your case the optimization you are looking for is just a smaller allocation of local memory on the stack. You should see a smaller allocation at function entry and a smaller deallocation at function exit if the space used by the empty class is optimized away.
As for the general question, I've been reading (and writing) assembly language for more than (gulp!) 30 years and all I can say is that it takes practice, especially to read the output of a compiler.
Instead of trying to read through an assembler dump, run your program inside a debugger. You can pause execution, single-step through instructions, set breakpoints on the code you want to check, etc. Many debuggers can display your original C code alongside the generated assembly so you can more easily see what the compiler did to optimize your code.
Also, if you are trying to test a specific compiler optimization you can create a short dummy function that contains the type of code that fits the optimization you are interested in (and not much else, the simpler it is the easier the assembly is to read). Compile the program once with optimizations on and once with them off; comparing the generated assembly code for the dummy function between builds should show you what the compiler's optimizers did.
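As a sketch of that approach (the function below is a made-up example; any small piece of code that exercises the optimization you care about will do):

dummy.cpp:

// will the compiler collapse this loop into a single multiplication?
unsigned sum_n_times(unsigned x, unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; ++i)
        total += x;
    return total;
}

And then compare the two builds:

g++ -S -O0 -o dummy_O0.s dummy.cpp
g++ -S -O2 -o dummy_O2.s dummy.cpp
diff dummy_O0.s dummy_O2.s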
I have a proprietary application I would like to hand out to a few people for testing, except we do not want to reveal the source to them. The application is written in C++ for Linux. It links against readily available packages on the Fedora/Ubuntu repos.
Is there any way to process the source into something intermediate... then distribute it, and have the users do a final compile which actually compiles and links the intermediate code for their native platform?
I am trying to see if there is any alternative to distributing precompiled binaries. Thanks.
Just compile it to assembler. That can be done using the -S option.
helloworld.cpp:
#include <iostream>
using namespace std;
int main(void)
{
cout << "Hello World" << endl;
return 0;
}
And then do:
emil#lanfear /home/emil/dev/assemblertest $ g++ -S -o helloworld.s helloworld.cpp
emil#lanfear /home/emil/dev/assemblertest $ g++ -o helloworld helloworld.s
emil#lanfear /home/emil/dev/assemblertest $ ./helloworld
Hello World
Using this method you can distribute only the .s files, which will contain assembler that is very hard to read.
It's not a technical answer, but do you trust them enough to just ask for a signed NDA?
You can process the existing source code to "mangle" it; basically this consists of stripping out all comments, and changing variable names to be minimal and stripping out all source code formatting. The problem is, they can relatively easily change the variable names back and add formatting and comments; while they won't have the same level of information in the resultant source code as you do, they WILL have completely functional source code (because that's what you distributed to them). This is about the only way to do this sort of thing.
In short, no. By definition if they can compile it then they have your source. The best you can do is increase the pain of them trying to understand it.
I agree with John. If you have a small number of clients and trust them, an NDA would be a better route.
Another thing I just thought about... what about just running the preprocessor and compiler, but not the assembler and the linker? You would need a copy for each specific architecture's assembly language, but I assume that would be painful enough to dissuade editing while easy enough to compile.
You could divide your application into two parts: the first part would contain precompiled libraries with the OS-independent functionality, and the second would contain the small part of the sources that users compile themselves. That is how NVIDIA distributes their drivers.
You could obfuscate your C/C++ code. See my question for C/C++ obfuscation tools
The best solution I can come up with is to either limit the number of platforms you support and do your own compilations, or compile to a binary format that is hard to extract information from, but the user can compile this to their native format.
Personally I would go with option 1.
If it's C, you can use the clang compiler, then dump the LLVM intermediate representation. If it's C++, you might need to wait a while until clang's C++ support matures.
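A sketch of what that workflow looks like (file names are placeholders):

clang -O2 -S -emit-llvm -o myprog.ll myprog.c   # textual LLVM IR, readable but not easily reversible to C
clang -O2 -c -emit-llvm -o myprog.bc myprog.c   # LLVM bitcode you could ship instead of source
clang -O2 -o myprog myprog.bc                   # the recipient compiles the bitcode for their machine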