When to use -O2 flag for gcc? - c++

If I use "-O2" flag, the performance improves, but the compilation time gets longer.
How can I decide, whether to use it or not?
Maybe O2 makes the most difference in some certain types of code (e.g. math calculations?), and I should use it only for those parts of the project?
EDIT: I want to emphasize the fact that setting -O2 for all components of my project changes the total compilation time from 10 minutes to 30 minutes.

I would recommend using -O2 most of the time, benefits include:
Usually reduces size of generated code (unlike -O3).
More warnings (some warnings require analysis that is only done during optimization)
Often measurably improved performance (which may not matter).
If release-level code will have optimization enabled, it's best to have optimization enabled throughout the development/test cycle.
Source-level debugging is more difficult with optimizations enabled, occasionally it is helpful to disable optimization when debugging a problem.

I'm in bioinformatics so my advice may be biased. That said, I always use the -O3 switch (for release and test builds, that is; not usually for debugging). True, it has certain disadvantages, namely increasing compile-time and often the size of the executable.
However, the first factor can be partially mitigated by a good build strategy and other tricks reducing the overall build time. Also, since most of the compilation is really I/O bound, the increase of compile time is often not that pronounced.
The second disadvantage, the executable's size, often simply doesn't matter at all.

Never.
Use -O3 -Wall -Werror -std=[whatever your code base should follow]

Always, except when you're programming and just want to test something you just wrote.

We usually have our build environment set up so that we can build debug builds that use -O0 and release builds that use -O3 (the build enviroment preserves the objects and libraries of all configurations, so that one can switch easily between configurations). During development one mostly builds and runs the debug configuration for faster build speed (and more accurate debug information) and less frequently also builds and tests the release configuration.

Is the increased compilation time really noticable? I use -O2 all the time as the default, anything less just leaves a lot of "friction" in your code. Also note that the optimization levels of -O1, -O2 tends to be the best tested, as they are most interesting. -O0 tends to be more buggy, and you can debug pretty well at -O2 in my experience. Provided you have some idea about what a compiler can do in terms of code reordering, inlining, etc.
-Werror -Wall is necessary.

Related

C++ Debug and Release build configurations

Currently using VSCode, g++, C++20, Ubuntu 20.04 Lts.
What compiler flags can I use for release builds and debug builds separately? Do I turn off every optimization flag for debug builds? Or does it not really matter? I would appreciate any advice, recommendations, or feedback as I couldn't find much on my own.
Do I turn off every optimization flag for debug builds?
Yes, I would say that is the best way to go, and it does really matter! Depending on your code, your understanding of the compiler/debugger and level of optimisation chosen, the experience of debugging it will vary from mildly annoying to frustrating and almost useless. This answer gives a synopsis of the different levels for gcc and this question has several answers going into more detail about optimisations.
As a summary, the compiler is in general allowed to modify your code in any way it sees fit, as long as it still behaves as if all your statements were executed exactly as written. In practice, -O1 already enables dozens of techniques and -O2 and -O3 will probably leave almost nothing untouched, which makes it harder to pinpoint issues because:
Stepping through code may visit statements in a different order or skip them entirely, also hindering the use of breakpoints;
Function calls may disappear because they were inlined, and no longer be callable from the debugging prompt;
Local variables tend to have shorter lifetimes than in your source code, so you can't always query their values.
I personally build with CMake and primarily use two of its build types:
Debug (-g): No optimisations, compiles runtime assert statements;
RelWithDebInfo (-O2 -g -DNDEBUG): Fast code without these assertions that is harder to debug, but suitable for performance analysis once your program is working correctly.

Optimized Execution Time

Because of a school assignment I have to convert a C++ code to assembly(ARMv8). Then I have to compile the C++ code using GCC's -O0,-O1,-O2,-O3 and -Os optimizations, write down the time and compare with the execute time of my assembly code. As, I think I know -O3 have to be faster than -O1 and -O2. However, I get that -O2 is the fastest, then are -O1,-O3,-Os,-O0. Is that usual? (Calculated times are about 30 seconds).
Notice that GCC has many other optimization flags.
There is no guarantee that -O3 gives faster code than -O2; a compiler can apply more optimization passes, but they are all heuristics and might be unsuccessful (or even slow down slightly your particular code). Hence it does happen that -O3 gives some slightly slower code than -O2 (on some particular input source code).
You could try a more recent version of GCC (the latest -in November 2017- is GCC 7, GCC 8 will go out in few months). You could also try some better -march= or -mtune= option.
At last, with your GCC plugin, you might add your own optimization pass, or change the order (and the set) of applied optimization passes (there are several hundreds different optimization passes in GCC). But you'll need a lot of work (perhaps a year or two) to be able to extend GCC.
You could tune optimization parameters, and some project (MILEPOST) has even used machine learning techniques to improve them.
See also slides and references on my (old) GCC MELT documentation.
Yes, it is usual. Take the -Ox optimization as guide-lines. In average, they produce optimization that is advertise, but a lot depends on the style in which the code is written, memory layout, as well as the compiler itself.
Sometimes, you need to try and fail many times before getting the optimal code.
-O2 indeed gives the best optimization in most of the cases.

GCC handpicking Optimizations

I see this thread, and I had the same question, but this one isn't really answered: GCC standard optimizations behavior
I'm trying to figure out exactly what flag is causing an incredible boost in performance, in O1. I first found out which flags are set, using g++ -O1 -Q --help=optimizers and then got each of the enabled ones and used them to compile with g++. But the output results were different (the binary itself was of difference sizes).
How do I handpick optimizations for g++ or is this not possible?
Not all optimizations have individual flags, so no combination of them will generate the same code as using -O1 or any other of the general optimization enabling options (-Os, -O2, etc...). Also I imagine that a lot of the specific optimization options are ignored when you use -O0 (the default) because they require passes that are skipped if optimization hasn't generally enabled.
To try to narrow down your performance increase you can try using -O1 and then selectively disabling optimizations. For example:
g++ -O1 -fno-peephole -fno-tree-cselim -fno-var-tracking ...
You still might not have better luck this way though. It might be multiple optimizations in combination are producing your performance increase. It could also be the result of optimizations not covered by any specific flag.
I also doubt that better cache locality resulted in your "incredible boost in performance". If so it was likely a coincidence, especially at -O1. Big performance increases usually come about because GCC was able eliminate a chunk of your code either because it didn't actually have any net effect, always resulted in the same value being computed or it invoked undefined behaviour.

Do `-g -rdynamic` gcc flags slow down application execution (grow performance consumption) notably?

So I want to distribute my gcc application with backtrace logging for critical errors. Yet it is quite performance critical application so I wonder if -g -rdynamic gcc flags do slow down execution (especially if they do allot)? Also would like to give my users maximum performance so I do compile with optimization flags like "-flto" and "-mtune" and that makes me wonder if flags would conflict and inside baacktrace would be madness?
Although introducing debug symbols does not affect performance by itself, your application still end up far behind in terms of possible performance. What I mean by that is that it would be bad idea to use -g and -O3 simultaneously, in general. Therefore, if your application is performance critical, but at the same time severely needs to keep good level of debugging, then it would be reasonable to find some balance between these two. In the latest versions of GCC, we are provided with -Og flag:
Optimize debugging experience. -Og enables optimizations that do not
interfere with debugging. It should be the optimization level of
choice for the standard edit-compile-debug cycle, offering a
reasonable level of optimization while maintaining fast compilation
and a good debugging experience.
I think it would be good idea to test your application with this flag, to see whether the performance is indeed better than bare -g, but the debugging stays intact.
Once again, do not neglect reading official GCC documentation. LTO is relatively new feature in GCC, and, as a result, some of its parts are still experimental and are not meant for production. For example, this is the direct extract:
Link-time optimization does not work well with generation of debugging
information. Combining -flto with -g is currently experimental and
expected to produce wrong results.
Not so long ago I had mixed experience with LTO. Sometimes it works well, sometimes the project doesn't even compile, not to mention that there could also be subtle runtime issues. Summarizing all of it, I would not recommend using LTO, especially in your situation.
NOTE: Performance gain from LTO usually varies from 0% to 3%, and it heavily depends on the underlying application. Without profiling, you cannot tell whether it is even reasonable to employ LTO for your situation as it might deliver more troubles than benefits.
Flags like -march and -mtune usually do optimizations on a very low level - instruction level for the target processor architecture. Thus, I wouldn't expect them to interfere with debugging. Nevertheless, you are welcomed to test this yourself with your application.
-g has no impact whatsoever on performance. -rdynamic will increase the size of the dynamic symbol table in the main executable, which might slow down dynamic linking. My best guess is that the slow-down will be very small but possibly measurable (nonzero) with precise measurement/profiling tools.

-march and debug mode

I know too much optimization doesn't make much sense for debug code.
But what about using -march=native to make better use of the instruction set?
EDIT:
Let's reformulate this. I know enabling optimizations and debug mode at the same time might have disadvantages like:
GCC allows you to use -g with -O. The shortcuts taken by optimized
code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where
you did not expect it; some statements may not be executed because
they compute constant results or their values were already at hand;
some statements may execute in different places because they were
moved out of loops.
So my question is, does -march=native have similar side effects or is it sensible to use it in debug code as well?
The problem with optimization is aggressive optimization passes that alter control flow can confuse debuggers. -march=native may enable additional optimizations (cmov, for example) if those passes have been enabled with a -O option, but will not in itself confuse the debugger.