Should I use matching (gcc) compiler optimization flags when profiling the code? - c++

I am using -O3 when compiling the code, and now I need to profile it. For profiling, there are two main choices I came across: valgrind --tool=callgrind and gprof.
Valgrind (callgrind) docs state:
As with Cachegrind, you probably want to compile with debugging info (the -g option) and with optimization turned on.
However, in the C++ optimization book by Agner Fog, I have read the following:
Many optimization options are incompatible with debugging. A debugger can execute
code one line at a time and show the values of all variables. Obviously, this is not possible
when parts of the code have been reordered, inlined, or optimized away. It is common to
make two versions of a program executable: a debug version with full debugging support
which is used during program development, and a release version with all relevant
optimization options turned on. Most IDE's (Integrated Development Environments) have
facilities for making a debug version and a release version of object files and executables.
Make sure to distinguish these two versions and turn off debugging and profiling support in
the optimized version of the executable.
This seems to conflict with the callgrind instructions to compile the code with the debugging info flag -g. If I enable debugging in the following way:
-ggdb -DFULLDEBUG
am I not causing this option to conflict with the -O3 optimization flag? Using those two options together makes no sense to me after what I have read so far.
If I use, say, the -O3 optimization flag, can I compile the code with additional profiling info by using:
-pg
and still profile it with valgrind?
Does it ever make sense to profile code compiled with
-ggdb -DFULLDEBUG -O0
flags? It seems silly - skipping inlining and loop unrolling may shift the bottlenecks in the code, so this should be used for development only, to get the code to actually work properly.
Does it ever make sense to compile the code with one optimization flag, and profile the code compiled with another optimization flag?

Why are you profiling? Just to get measurements or to find speedups?
The common wisdom that you should only profile optimized code is based on the assumption that the code is nearly optimal to begin with; if there are significant speedups to be found, it is not.
You should treat finding speedups as if they were bugs; many people hunt for them exactly as they would hunt for bugs.
After you've removed needless computations, if you still have tight CPU loops - i.e. you're not spending all your time in system, library, or I/O routines the optimizer doesn't see - then turn on -O3 and let it do its magic.
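For reference, here is a minimal sketch of that workflow with callgrind, assuming a single-file program (the file name, program name and flags are placeholders, not taken from the question):
g++ -O3 -g myprog.cpp -o myprog
valgrind --tool=callgrind ./myprog
callgrind_annotate callgrind.out.<pid>
The -g flag only adds symbol and line information so that callgrind_annotate can map costs back to functions and source lines; it does not change the generated instructions, so you are still profiling the optimized code (<pid> is whatever process id the callgrind run reports).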

Related

Are there any downsides to compiling with the -g flag?

GDB documentation tells me that in order to compile for debugging, I need to ask my compiler to generate debugging symbols. This is done by specifying the '-g' flag.
Furthermore, the GDB documentation recommends always compiling with the '-g' flag. This sounds good, and I'd like to do that.
But first, I'd like to find out about the downsides. Are there any penalties involved with compiling for debugging in production code?
I am mostly interested in:
GCC as the compiler of choice
Red Hat Linux as target OS
C and C++ languages
(Although information about other environments is welcome as well)
Many thanks!
If you use -g (which on recent GCC or Clang can be used with optimization flags like -O2):
compilation is slower (and linking will use a lot more memory)
the executable is a bigger file (see elf(5) and use readelf(1)...)
the executable carries a lot of information about your source code.
you can use GDB easily
some interesting libraries, like Ian Taylor's libbacktrace, require DWARF debug information (i.e. code compiled with -g)
If you don't use -g, it will be harder (but still possible) to use the GDB debugger.
So if you transmit the binary executable to a partner who should not be able to understand how your source code was written, you need to avoid -g.
See also the strip(1) and strace(1) commands.
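As a rough illustration (the file names are placeholders), you can inspect how much space the debug information takes and remove it from the binary you ship:
g++ -O2 -g prog.cpp -o prog
readelf -S prog | grep debug                # list the .debug_* sections and their sizes
objcopy --only-keep-debug prog prog.debug   # keep the debug info in a separate file
strip -g prog                               # remove the debug sections from the shipped binary
With the separate prog.debug file you can still debug the stripped binary later, e.g. by loading it with GDB's symbol-file command.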
Notice that using the -g flag for debugging information is also valid for OCaml and Rust.
PS. Recent GCC (e.g. GCC 10 or GCC 11 in 2021) accepts several levels of debug flags. With -g3 your executable carries more debug information (e.g. descriptions of C++ macros and their expansion) than with -g or -g1. Of course, compilation time increases, and so does executable size. In principle, a GCC plugin (perhaps Bismon in 2021, or those inside the source code of the Linux kernel) could add even more debug information. In practice, you won't do that unless you can improve your debugger. However, a GCC plugin (or some #pragmas) can remove some debug information (e.g. drop the debug information for a selected set of functions).
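For instance, with a -g3 build GDB can look inside C/C++ macros, which a plain -g build usually cannot do (the program and macro names below, prog and MY_MAX, are made up for the example):
g++ -O0 -g3 prog.cpp -o prog
gdb ./prog
(gdb) start
(gdb) info macro MY_MAX
(gdb) macro expand MY_MAX(3, 4)
info macro shows where the macro was defined and macro expand shows the expanded text; both commands need the macro tables that only -g3 emits.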
Generally, adding debug information increases the size of the binary files (or creates extra files for the debug information). That's usually not a problem nowadays, unless you're distributing the binary over slow networks. And of course this debug information may help others analyze your code, if they want to do that. Typically, the -g flag is used together with -O0 (the default), which disables compiler optimization and generates code that is as close as possible to the source, so that debugging is easier. You can use debug information together with optimizations enabled, but this is really tricky, because variables may not exist or the sequence of instructions may differ from the source. It is generally only done when an error has to be analyzed that only shows up once optimizations are enabled. Of course, the downside of -O0 is poorer performance.
So as a conclusion: Typically one uses -g -O0 during development, and for distribution or production code just -O3.
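In concrete terms that usually boils down to two compile lines along these lines (file names are placeholders, and the exact flags are only a common starting point):
g++ -O0 -g foo.cpp -o foo_debug      # development build: easy to step through
g++ -O3 foo.cpp -o foo_release       # production build; add -g as well if you want usable core dumps
Keeping these as two separate build configurations, as most IDEs and build systems do, avoids accidentally shipping the slow debug build.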

Why is thrust sort so slow? [duplicate]

I am doing some tests and I realized that using the -G parameter when compiling gives me worse performance than compiling without it.
I have checked the NVIDIA documentation:
--device-debug (-G)
Generate debug information for device code.
But it does not help me understand why this gives such bad performance.
Where and when is this debug information generated, and what could be the cause of the bad performance?
Using the -G switch disables most compiler optimizations that nvcc might do in device code. The resulting code will often run slower than code that is not compiled with -G, for this reason.
This is pretty easy to see by running your executable in each case through cuobjdump -sass myexecutable and looking at the generated device code. You'll generally see less device code in the non -G case, and you can see the differences in specific optimizations as well.
One of the reasons for this is that highly optimized device code may eliminate actual lines of source code and actual source code variables. This can make it very difficult to debug code. Therefore to enable debugging, most optimizations are disabled with -G.
Also note that with Thrust, using the -G switch may result in unpredictable behavior. Newer versions of Thrust should behave better, but there may still be unexpected issues when compiling Thrust code with -G.
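As a hedged sketch of the comparison described above (the file names app.cu, app_debug and app_fast are placeholders):
nvcc -G -o app_debug app.cu          # device-debug build: most device optimizations disabled
nvcc    -o app_fast  app.cu          # normal build: device code optimized by default
cuobjdump -sass app_debug > debug.sass
cuobjdump -sass app_fast  > fast.sass
diff debug.sass fast.sass
Comparing the two SASS listings makes the extra instructions and missing optimizations in the -G build directly visible.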

Is a program compiled with -g gcc flag slower than the same program compiled without -g?

I'm compiling a program with -O3 for performance and -g for debug symbols (in case of a crash I can use the core dump). One thing bothers me a lot: does the -g option result in a performance penalty? When I look at the output of the compilation with and without -g, I see that the output without -g is 80% smaller than the output of the compilation with -g. If the extra space goes to the debug symbols, I don't care about it (I guess), since this part is not used during runtime. But if for each instruction in the compilation output without -g I need to execute 4 more instructions in the output with -g, then I certainly prefer to stop using the -g option, even at the cost of not being able to process core dumps.
How can I know the size of the debug symbols section inside the program, and, in general, does compiling with -g create a program that runs slower than the same code compiled without -g?
Citing from the gcc documentation:
GCC allows you to use -g with -O. The shortcuts taken by optimized
code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where
you did not expect it; some statements may not be executed because
they compute constant results or their values are already at hand;
some statements may execute in different places because they have been
moved out of loops.
that means:
I will insert debugging symbols for you, but I won't try to retain them if an optimization pass screws them up; you'll have to deal with that
Debugging symbols aren't written into the code but into separate sections (the .debug_* sections) which aren't even loaded at runtime (only by a debugger). That means: no code changes. You shouldn't notice any performance difference in code execution speed, but you might experience some slowness if the loader needs to deal with the larger binary or if it takes the increased binary size into account somehow. You will probably have to benchmark the app yourself to be 100% sure in your specific case.
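If you want to check that on your own binary (myprog is a placeholder name), readelf can show both the sections and their sizes:
readelf -S myprog                           # .text carries the A (alloc) flag; the .debug_* sections do not
readelf --debug-dump=info myprog | head     # peek at the DWARF data itself
The sections without the alloc flag are not mapped into memory when the program runs, and the section sizes printed by readelf -S also answer the first part of the question: they tell you how much of the file is debug information.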
Notice that there's also another option from gcc 4.8:
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
This flag will impact performance because it disables any optimization pass that would interfere with debugging info.
Finally, it might even happen that some optimizations are better suited to a specific architecture than to another one, and unless instructed to target your specific processor (see the -march/-mtune options for your architecture), at -O3 gcc will do its best for a generic architecture. That means you might even experience -O3 being slower than -O2 in some contrived scenarios. "Best effort" doesn't always mean "the best available".
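For example (placeholder file names; only do this if the binary will run on the same kind of CPU it was built on):
g++ -O3 -march=native foo.cpp -o foo     # use the instruction-set extensions of the build machine
g++ -O2 -mtune=generic foo.cpp -o foo    # conservative choice for binaries you distribute widely
-march=native can give real speedups, but it produces a binary that may fail with illegal-instruction errors on older CPUs, so it trades portability for speed.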

When debugging a C++ program with GDB the "next" command seems to skip source lines

When I debug my C++ program, I set a breakpoint on the main function. When the program starts running, it seems to have skipped several lines of source before the line at which it stops. What's the problem?
Your program is probably compiled with optimisation enabled, which means that the lines of source are not necessarily sequentially translated into machine code. Under optimisation, the execution of different parts of the source code can be re-ordered and interleaved - this is likely what you're seeing.
If you want to step through your source code in a simple, sequential line-by-line manner you will need to compile with no optimisation (-O0).
Alternatively, if you understand machine code you can use:
set disassemble-next-line on
which will show you the disassembly of the code that the debugger is stopped on alongside the source code line it belongs to.
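For example, in a session on an optimized binary (myprog is a placeholder) you might do:
gdb ./myprog
(gdb) set disassemble-next-line on
(gdb) break main
(gdb) run
(gdb) next
Each time the program stops, GDB now also prints the machine instructions for the line it is stopped at, which makes the reordering visible.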
You seem to have symbols for your program, as GDB happily reads them. However, do you have the source in the original place or are you perhaps debugging on a different machine?
What does:
info source
give you when you enter it on the command prompt? It should give you something along the lines of:
(gdb) info source
Current source file is hello.c
Compilation directory is /home/username/source
Located in /home/username/source/hello.c
Contains 7 lines.
Source language is c.
Compiled with DWARF 2 debugging format.
Includes preprocessor macro info.
if GDB has debug symbols and source available.
From the output, however, it looks like this part should be fine, so caf is likely right that this is about the optimization level of your compiler.
Keep in mind that this is the very reason for debug versus release settings. During development you'll perhaps want -O0 or -O1 combined with -ggdb -g3 if you're using GCC to compile. For other compilers the settings may be different. For a release you'll probably want to use the highest safe optimization value: -O2 for gcc, or -O3 if you are using one of the widely used architectures and aren't afraid of nasty surprises.
Either way if you are serious about software development and consequently debugging, you should learn the very basics of the assembly language for your target CPUs. Why? Because sometimes the optimizer, especially in GCC, goes haywire and does stupid things even when you tell it to not trust your code, such as with -fno-strict-aliasing. I've encountered cases where it would happily use instructions on a SPARC which are supposed to be used only on aligned data, but there was no guarantee that the data we gave it was aligned. Anyway, it's the very reason Gentoo recommends -O2 instead of any higher value for optimization. If you don't know why an assembly instruction does what it does or why your program does something silly and you can't take the magnifying glass and step down to the assembly level, you'll be lost.
How to see the assembly code in GDB
As pointed out by caf you can use set disassemble-next-line on to see the disassembly at the current program counter if you are using GDB 7.0 or newer. On older GDB versions you may resort to the trusty old display command:
disp/i $pc
which sets an automatic display for the program counter ($pc). Perhaps a better and visually more appealing alternative, especially if you have a lot of screen real estate, is to combine layout asm and layout regs in GDB's TUI mode, as sketched below.
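A minimal illustrative session (myprog is a placeholder binary built with -g):
gdb ./myprog
(gdb) start
(gdb) layout asm
(gdb) layout regs
(gdb) stepi
start runs to a temporary breakpoint at the beginning of main, layout asm and layout regs open the TUI windows with the disassembly and the registers, and stepi then advances one machine instruction at a time so you can watch both views update.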

gcc -O0 vs. -Og compilation time

The release notes for gcc were a little vague on -Og:
It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. Overall experience for development should be better than the default optimization level -O0.
Does "Overall experience for development" include compilation time? If I don't need debug symbols and am optimizing for compile time, should I be using -O0 or -Og?
Does "Overall experience for development" include compilation time?
I think it does, but not in this very specific case.
If I don't need debug symbols and am optimizing for compile time, should I be using -O0 or -Og?
-O0.
If I don't need debug symbols and am optimizing for compile time, should I be using -O0 or -Og?
If the presence or absence of debug symbols doesn't matter, time both options and see which one is faster.
With -Og the compiler has to construct and write out extra data (for debugging), so it will take longer. Just compile to assembler (with gcc -S -Og, etc.) and compare. But whatever difference there is between -O0 and -Og compilation times is probably dwarfed by the time needed to start gcc and its complete machinery.
If compile time is what you care about, perhaps you should consider tcc for C. Perhaps LLVM (clang) is faster for C++.
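If you want to measure it for your own code base, a simple check is to time both variants on a representative translation unit (big_file.cpp is a placeholder):
time g++ -O0 -c big_file.cpp
time g++ -Og -c big_file.cpp
time g++ -Og -g -c big_file.cpp      # with debug info as well, for comparison
Run each command a couple of times and ignore the first run so that file-cache effects don't skew the numbers.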