What extra optimisation does g++ do with -Ofast? - c++

In g++ 4.6 (or later), what extra optimisations does -Ofast enable other than -ffast-math?
The man page says this option "also enables optimizations that are not valid for all standard compliant programs". Where can I find more information about whether this might affect my program or not?

Here's a command for checking what options are enabled with -Ofast:
$ g++ -c -Q -Ofast --help=optimizers | grep enabled
Since I only have g++ 4.4 that doesn't support -Ofast, I can't show you the output.

The -Ofast options might silently enable the gcc C++ extensions. You should check your sources to see if you make any use of them. In addition, the compiler might turn off some obscure and rarely encountered syntax checking for digraphs and trigraphs (this only improves compiler performance, not the speed of the compiled code).
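As one concrete way to judge whether the -ffast-math part of -Ofast could affect a program, look for code that depends on exact IEEE rounding behaviour. The sketch below is a standard compensated (Kahan) summation; kahan_sum is an illustrative name, not something from the question. With value-changing floating-point rewrites allowed, the compiler may simplify the compensation term away, so the result can silently degrade to that of a plain sum.

```cpp
#include <cstddef>
#include <vector>

// Compensated (Kahan) summation: the rounding error of each addition is
// captured in `c` and fed back into the next step.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double c = 0.0;  // running compensation for lost low-order bits
    for (std::size_t i = 0; i < values.size(); ++i) {
        double y = values[i] - c;
        double t = sum + y;
        // Algebraically (t - sum) - y is zero; in IEEE arithmetic it is the
        // rounding error of the addition. A compiler that is allowed to
        // reassociate floating-point expressions may fold it to zero.
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

Comparing the output of such a routine built with -O3 and with -Ofast on representative data is a cheap way to see whether the difference matters in practice.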

Intel Pin with C++14

The Questions
I have a few questions surrounding usage of Intel Pin with C++14 or other C++ versions.
There are rarely any problems compiling older C++ code with a newer standard, but since Intel Pin manipulates code at the instruction level, are there any undesirable side effects that might arise if I compile my tool with C++11 or C++14?
If it's ok to compile with C++11 or C++14, how do I make a rule to enable a newer version of C++ for my tool only?
How do I set GCC/G++ default C++ version to latest, if possible, and what should I keep in mind when doing so?
Situation
I'm building a dynamic call graph pin tool. To make it understandable, I'm computing the depth of the call stack. For safety, I decided to guard the code that increments or decrements the depth with a std::mutex. This led to a problem: std::mutex is only available since C++11, which is not the default for Intel Pin on my machine.
$ g++ -v
[...]
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)
Compile command:
$ make obj-intel64/callgraph.so
[...]
error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support
[...]
EDIT
I managed to make a build rule that sets the standard to C++11, but the build breaks. The command sent to g++ through make was
g++ -DBIGARRAY_MULTIPLIER=1 -Wall -Werror -Wno-unknown-pragmas -D__PIN__=1
-DPIN_CRT=1 -fno-stack-protector -fno-exceptions -funwind-tables
-fasynchronous-unwind-tables -fno-rtti -DTARGET_IA32E -DHOST_IA32E -fPIC
-DTARGET_LINUX -fabi-version=2 -I/home/gabriel/Downloads/pin-3.0-76991-gcc-linux/source/include/pin
-I/home/gabriel/Downloads/pin-3.0-76991-gcc-linux/source/include/pin/gen
-isystem /home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/stlport/include
-isystem /home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/libstdc++/include
-isystem /home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/crt/include
-isystem /home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/crt/include/arch-x86_64
-isystem /home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/crt/include/kernel/uapi
-isystem /home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/crt/include/kernel/uapi/asm-x86
-I/home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/components/include
-I/home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/xed-intel64/include
-I/home/gabriel/Downloads/pin-3.0-76991-gcc-linux/source/tools/InstLib -O3
-fomit-frame-pointer -fno-strict-aliasing -std=c++11
-c -o obj-intel64/callgraph.o callgraph.cpp
This doesn't compile. Instead, it produces a huge error log from inside the STL headers. It appears that Pin comes with its own subset of the STL, which conflicts with C++11 and C++14. I've uploaded a paste of the g++ output. It filled 2331 lines, but I noticed something strange in the folders it visits. STL headers are included from 2 different directories:
/usr/include/c++/5/
/home/gabriel/Downloads/pin-3.0-76991-gcc-linux/extras/stlport/include/
Solving the errors one by one is infeasible, and deleting Pin's STLport is probably an even worse idea. If it's possible to use Pin with newer C++, I'd say simply passing -std=c++1y is not the way.
From the compiler options used to compile the pin tool, I presume you are using the latest version of Pin, namely 3.0. According to Intel, the CRT that ships with the framework doesn't support C++11 and later versions of the language. In particular, you will not be able to use any of the APIs supported in C++11 including std::mutex. If it's critical for you to use C++11 APIs then you should use the previous version of Pin, namely 2.14, which doesn't ship with a CRT and uses the CRT of your compiler.
However, if all you want is a mutex, you can use the OS-portable mutex that ships with Pin 3.0. For more information, refer to the documentation.
When using Pin 3.0 you are not allowed to use any header file or object file of your compiler (those from /usr/include/c++/5/). You can only use PinCRT and a few system header files.
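For reference, the depth counter described in the question could be sketched in plain C++11 as below. This is only a sketch for toolchains where <mutex> is available (for example a Pin 2.14 build that uses the compiler's own runtime, as suggested above); under Pin 3.0's PinCRT it will not compile, and the mutex that ships with Pin would have to be substituted. CallDepth and its member names are hypothetical and are not part of the Pin API.

```cpp
#include <mutex>

// Hypothetical call-depth tracker; names are illustrative, not Pin APIs.
class CallDepth {
public:
    CallDepth() : depth_(0) {}

    // Called on function entry: returns the depth after incrementing.
    int enter() {
        std::lock_guard<std::mutex> guard(mutex_);
        return ++depth_;
    }

    // Called on function return: returns the depth after decrementing.
    int leave() {
        std::lock_guard<std::mutex> guard(mutex_);
        return --depth_;
    }

    int current() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return depth_;
    }

private:
    mutable std::mutex mutex_;  // mutable so current() can lock while const
    int depth_;
};
```

The lock_guard releases the mutex on every exit path, which keeps the increment and decrement paths symmetric.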

Can I enable vectorization only for one part of the code?

Is there a way to enable vectorization only for some part of the code, like a pragma directive? Basically having as if the -ftree-vectorize is enabled only while compiling some part of the code? Pragma simd for example is not available with gcc...
The reason is that benchmarking showed the timings with -O3 (which enables vectorization) were worse than with -O2. But there are some parts of the code for which we would like the compiler to try vectorizing loops.
One solution I could use would be to restrict the compiler directive to one file.
Yes, this is possible. You can either disable it for the whole module or individual functions. You can't however do this for particular loops.
For individual functions use
__attribute__((optimize("no-tree-vectorize"))).
For whole modules -O3 automatically enables -ftree-vectorize. I'm not sure how to disable it once it's enabled but you can use -O2 instead. If you want to use all of -O3 except -ftree-vectorize then do this
gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
diff /tmp/O2-opts /tmp/O3-opts | grep enabled
And then include all the options except for -ftree-vectorize.
Edit: I don't see -fno-tree-vectorize in the man pages but it works anyway so you can do -O3 -fno-tree-vectorize.
Edit: The OP actually wants to enable vectorization for particular functions or whole modules. In that case for individual functions __attribute__((optimize("tree-vectorize"))) can be used and for whole modules -O2 -ftree-vectorize.
Edit (from Antonio): In theory there is a pragma directive to enable tree-vectorizing all functions that follow
#pragma GCC optimize("tree-vectorize")
But it seems not to work with my g++ compiler, maybe because of the bug mentioned here:
How to enable optimization in G++ with #pragma. On the other hand, the function attribute works.
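A minimal sketch of the per-function approach, assuming the file is otherwise built with -O2. VECTORIZE_HINT, add_arrays and sum_array are hypothetical names introduced for illustration; the macro only wraps the attribute so that compilers other than GCC can still build the file.

```cpp
#include <cstddef>

// GCC-specific: ask for loop vectorization in the annotated function only.
// On other compilers the macro expands to nothing.
#if defined(__GNUC__) && !defined(__clang__)
#define VECTORIZE_HINT __attribute__((optimize("tree-vectorize")))
#else
#define VECTORIZE_HINT
#endif

// Candidate for vectorization: independent element-wise work.
VECTORIZE_HINT
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// No attribute: compiled with whatever flags the rest of the file uses.
float sum_array(const float* a, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}
```

Whether the loop is actually vectorized still depends on the compiler version and target; on newer GCC releases, -fopt-info-vec reports the loops it vectorized.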

What is the -O5 flag for compiling gfortran .f90 files?

I see the flag in the documentation of how to compile some f90 code I have acquired (specifically, mpfi90 -O5 file.f90), but researching the -O5 flag turned up nothing in the gfortran docs, mpfi docs, or anywhere else. I assume it is an optimization flag like -O1, etc., but I'm not sure.
Thanks!
Source: http://publib.boulder.ibm.com/infocenter/comphelp/v7v91/index.jsp?topic=%2Fcom.ibm.xlf91a.doc%2Fxlfug%2Fhu00509.htm
The flag -O5 is an optimization level, like -O3 and -O2. The linked source says,
-qnoopt/-O0 Fast compilation, debuggable code, conserved program semantics.
-O2 (same as -O) Comprehensive low-level optimization; partial debugging support.
-O3 More extensive optimization; some precision trade-offs.
-O4 and -O5 Interprocedural optimization; loop optimization; automatic machine tuning.
With each higher number containing all the optimizations of the lower levels.

Being extremely pedantic with the way your code is compiled

I would like to find out which is the most extreme error checking flag combination for g++ (4.7). We are not using the new C++11 specification, since we need to cross compile the code with older compilers, and these older compilers (mostly g++ 4.0) often cause problems which simply are ignored by the g++4.7.
Right now we use the following set of flags:
-Wall -Wcomment -Wformat -Winit-self -ansi -pedantic-errors \
-Wno-long-long -Wmissing-include-dirs -Werror -Wextra
but this combination does not identify issues such as a double being passed in to a function which expects int, or comparison between signed and unsigned int and this causes the old compiler to choke on it.
I have read through the documentation and -Wsign-compare should be enabled by -Wextra, but in practice this does not seem to be the case, so I might have missed something...
-ansi is an alias for the default standard without GNU extensions. I'd suggest being explicit and using -std=c++98 instead, though that is what g++ -ansi selects anyway, so it is not really different.
But generally I've never seen anything that would be accepted by newer gcc and rejected by older gcc on the grounds of being invalid. I suspect any such problem is a bug in the older compiler or its standard library. Gcc does not have warnings for things that are correct but didn't work with older versions of it, so you don't have any other option than to test with the older version.
As for the specific issues you mention:
Passing a double to a function that expects an int is not an error. It might be undefined behaviour though, if the value does not fit in an int. -Wconversion should help.
Comparing signed with unsigned is also well defined and has always worked as defined. For equality comparisons the warning actually makes programmers write worse code (comparing an unsigned variable wider than int with -1 is not the same as comparing it with -1u), so I actually always compile with -Wno-sign-compare.
The compiler should not print warnings for headers found in directories given with -isystem instead of -I, so that should let you silence the warning for Qt headers and keep it enabled for your own code. So you should be able to use -Wconversion.
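The two cases above can be sketched as follows. The function names are hypothetical, and the example assumes a 32-bit unsigned int and uses unsigned long long (a C++11 type, available as an extension in older g++) to get a type wider than int on common platforms.

```cpp
int takes_int(int value) { return value; }

// Implicit double -> int conversion: the fractional part is discarded.
// This is the kind of call -Wconversion is meant to report.
int truncated_argument(double d) { return takes_int(d); }

// -1 is an int; it is converted to unsigned long long, giving the
// all-ones value of the wider type.
bool equals_minus_one(unsigned long long x) { return x == -1; }

// -1u is already an unsigned int (all ones in the narrower type); widening
// it zero-extends, so it does not match the all-ones wide value.
bool equals_minus_one_u(unsigned long long x) { return x == -1u; }
```

Both comparisons are well defined; they simply test for different values, which is why blindly replacing -1 with -1u to silence a sign-compare warning can change behaviour.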
Use lint or some other static analysis tool to check the code, in addition to compiler. On my Linux distro, apt-get install splint will get splint, maybe check if that has been packaged for your OS for easy installation.

What is the difference between -O0 ,-O1 and -g

I am wondering about the use of -O0,-O1 and -g for enabling debug symbols in a lib.
Some suggest to use -O0 to enable debug symbols and some suggest to use -g.
So what is the actual difference between -g and -O0, what is the difference between -O1 and -O0, and which is best to use?
-O0 is optimization level 0 (no optimization, same as omitting the -O argument)
-O1 is optimization level 1.
-g generates and embeds debugging symbols in the binaries.
See the gcc docs and manpages for further explanation.
For doing actual debugging, debuggers are usually not able to make sense of stuff that's been compiled with optimization, though debug symbols are useful for other things even with optimization, such as generating a stacktrace.
-OX specifies the optimisation level that the compiler will apply. -g is used to generate debug symbols.
From GCC manual
http://gcc.gnu.org/onlinedocs/
3.10 Options That Control Optimization
-O
-O1
Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone options.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.
-g
Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF 2). GDB can work with this debugging information.
-O0 doesn't enable debug symbols, it just disables optimizations in the generated code so debugging is easier (the assembly code follows the C code more or less directly). -g tells the compiler to produce symbols for debugging.
It's possible to generate symbols for optimized code (just continue to specify -g), but trying to step through code or set breakpoints may not work as you expect because the emitted code will likely not "follow along" with the original C source closely. So debugging in that situation can be considerably trickier.
-O1 (which is the same as -O) performs a minimal set of optimizations. -O0 essentially tells the compiler not to optimize. There are a slew of options that allow a very fine control over how you might want the compiler to perform: http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Optimize-Options.html#Optimize-Options
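To see from inside a program which side of the -O0 / -O1 line a build falls on, GCC predefines the __OPTIMIZE__ macro in optimizing compilations. The sketch below relies only on that macro; build_mode is a hypothetical helper name. Adding or removing -g should not change its result, since -g only controls debug information.

```cpp
#include <string>

// Reports whether this translation unit was compiled with optimization.
// __OPTIMIZE__ is predefined by GCC at -O1 and above, and absent at -O0.
std::string build_mode() {
#ifdef __OPTIMIZE__
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

Building the same file once with -O0 -g and once with -O1 -g and printing build_mode() makes the independence of the two options visible.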
As mentioned by others, the -O set of options indicates the level of optimization to be done by the compiler, whereas the -g option adds the debugging symbols.
For a more detailed understanding, please refer to the following links
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options