Debug symbols stability - C++

I am compiling an application with -g option:
gcc -g -o main1 main.c
then I strip debug object from it:
objcopy --strip-debug main1
Let's assume that my main1 application will crash and I would like to use a core dump coredump1 to debug the problem.
Could I rebuild the source code once more
gcc -g -o main2 main.c
and extract debug symbols
objcopy --only-keep-debug main2 main2.debug
and use main2.debug to debug the coredump1?
Can I trust that the debug symbols will always match? Is that guaranteed by the language standard or by a compiler requirement?
Will the debug symbols match if my source code contains strings based on macros like __DATE__ or __TIME__?
Will it work if I enable code optimization?

Will debug symbols match ...
Will it work if I enable code optimization?
As others have commented, you should not rely on this, and instead always build with -g and separate debug symbols out before shipping the "final product".
That said, in practice this works for GCC1 with or without optimization, but doesn't work at all for Clang/LLVM (which gives you a practical reason not to depend on this).
1 Or at least it did last time I tried this for several non-trivial binaries a few years ago.
Note that maintaining this property requires active effort from the compiler developers, so it can break from time to time as violations are introduced, then noticed and fixed.
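For completeness, here is a minimal sketch of that recommended workflow, assuming GNU binutils and reusing the file names from the question (main1, main1.debug, coredump1):
gcc -g -o main1 main.c                          # build once, with debug info
objcopy --only-keep-debug main1 main1.debug     # save the debug info from this exact build
objcopy --strip-debug main1                     # strip the binary you actually ship
objcopy --add-gnu-debuglink=main1.debug main1   # optional: record the debug file's name in the binary
Later, to analyse a crash:
gdb main1 coredump1                             # gdb picks up main1.debug via the debuglink
or point gdb at the symbols explicitly:
gdb -e main1 -s main1.debug -c coredump1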

Related

gcc -O2 output is smaller than gcc -O2 -g followed by strip --strip-all

I am building code that I want to produce release versions. However I also want to be able to debug cores if they crash.
So I read that you can build with debug symbols and then produce a copy of the binary that you run strip on. Then you can take the core produced by the stripped binary (the released/customer binary) and gdb it against your copy of the binary with debug symbols...
So step one for me was to generate the binary, I do:
gcc -O2 ... -o testbin_release_orig (original release bin without symbols)
gcc -O2 -g ... -o testbin_debug (full debug binary)
cp testbin_debug testbin_release
strip --strip-all testbin_release (stripped debug binary)
This produces three files with different sizes:
testbin_release_orig: ~1.7 MB
testbin_debug: ~13 MB
testbin_release: ~2.1 MB
My question is, why is testbin_release not exactly the same size as testbin_release_orig? I am guessing that strip can't strip all the debug symbols that gcc adds. But there is about 0.4 MB of "extra stuff" - what does that consist of?
The difference is from the debug code.
For a 1.7 MB executable you are probably using a library or two. Usually they have something like:
#ifdef _DEBUG
// some debug code
#endif
This is also common practice for big projects, so some of it may be your own code as well.
strip removes only the symbols. The debug code stays.
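One way to see where the extra ~0.4 MB actually lives is to compare the section sizes of the two binaries (binary names taken from the question; the exact breakdown will vary):
size testbin_release_orig testbin_release              # text/data/bss totals side by side
readelf -S testbin_release_orig > orig_sections.txt
readelf -S testbin_release > stripped_sections.txt
diff orig_sections.txt stripped_sections.txt           # shows which sections grew or were added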

g++ switch to not include own symbolic function names (and debugging data)?

Haven't found one, but is there a switch to exclude any debugging data as well as clear-text references to local (own) functions in generated code?
Simple example:
void setenv( char* in_str ) {
}
...gives me a readable "setenv" name in the executable, which is really not needed, unless it's an interpreted language.
Also in the executable - text names of variables, which is even stranger.
==========
EDIT:
So far I have tried Solaris strip, GNU strip, and the g++ -O0 and -s switches. The only way to remove the symbols in question was "strip --strip-all" on the object file (but not the executable), but then it won't link.
So it looks like Richard C is right, and this is indeed needed for lib* runtimes.
You can either use the GNU strip command-line tool, or link with the gcc -s flag. Note, though, that the only benefit will be decreased file size. This part of the binary is only loaded into memory if you run the app in a debugger or generate a stack trace. I prefer to use the strip command, because you can save the debug info separately and load it if you want to get a stack trace for some reason.
examples:
g++ -o myexecutable ...
strip --strip-unneeded myexecutable
or
g++ -s -o myexecutable a.o b.o c.o ...
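If you want to check which names actually remain readable in the result, nm is a quick before/after comparison (using the same hypothetical myexecutable):
nm --defined-only myexecutable      # before stripping: setenv and other internal names are listed
strip --strip-unneeded myexecutable
nm --defined-only myexecutable      # after stripping: typically reports "no symbols"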

Why does the order of clang compiler flags affect the resulting binary size?

Alternate title: Why does my dylib include extra exported symbols when compiled by Xcode vs Makefile?
My company builds a C++ dynamic library (dylib) on the Mac using clang, and we recently ported our hand-crafted Makefile to the CMake build system and are now using the generated Xcode projects. After ensuring that all the compiler/linker flags and environment variables matched exactly between the two systems, we noticed that the dylib created by CMake/Xcode was slightly larger.
Closer examination showed that it contained some additional exported symbols (from templated functions that were never referenced and therefore should never have been instantiated; the specific templates had their definitions and specializations in the source files, as we use explicit instantiation frequently, although in this case they were not explicitly instantiated). Examining the disassembly of some of the object files showed slight instruction differences as well.
The only thing that got the libraries to match in size and symbols exactly was to match the order of the compiler flags exactly. This appears to demonstrate some order-dependent interaction between compiler flags, which seems like a compiler bug or at least poorly documented behavior.
For this specific issue, these were the compiler invocations:
clang++ -fvisibility=hidden -fvisibility-ms-compat -c foo.cpp -o foo.o
clang++ -fvisibility-ms-compat -fvisibility=hidden -c foo.cpp -o foo.o
And this was the linker invocation:
clang++ -dynamiclib -o libfoo.dylib foo.o
Displaying the exported symbols with:
nm -g libfoo.dylib
showed the differences. I submitted this LLVM Bug.
Are there ever any valid situations where compiler flag ordering matters?
Microsoft's compilers and pretty much everyone else's have traditionally had very different models for symbol visibility in the object file. The former has for a long time used C and C++ language extensions to control symbol emission by the compiler, and by default does not export symbols.
It seems likely that -fvisibility=hidden and -fvisibility-ms-compat are mutually exclusive, and that the compiler honours the last one seen on its command line.
In all fairness, there is little documentation for -fvisibility-ms-compat to be had, other than the commit adding it to Clang.
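If you want to confirm the last-one-wins behaviour with your own sources, building the same file with both orderings and diffing the exported symbols is enough. A rough sketch based on the invocations above (the _a/_b output names are just placeholders):
clang++ -fvisibility=hidden -fvisibility-ms-compat -c foo.cpp -o foo_a.o
clang++ -fvisibility-ms-compat -fvisibility=hidden -c foo.cpp -o foo_b.o
clang++ -dynamiclib -o libfoo_a.dylib foo_a.o
clang++ -dynamiclib -o libfoo_b.dylib foo_b.o
nm -g libfoo_a.dylib > a.syms
nm -g libfoo_b.dylib > b.syms
diff a.syms b.syms        # any output means the flag order changed the exported symbols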

Why does a 2-stage command-line build with clang not generate a dSYM directory?

I have a simple project that I want to debug, and I want to produce a dSYM folder with debugging symbols.
Running:
clang++ -std=c++14 -stdlib=libc++ -g -o Lazy Lazy.cpp
Creates Lazy.dSYM as I expect.
However:
clang++ -std=c++14 -stdlib=libc++ -g -c Lazy.cpp
clang++ -stdlib=libc++ -g -o Lazy Lazy.o
Does not create Lazy.dSYM (it seems that the symbols are embedded in the binary).
Sadly, the 2-step build is what my modified Makefile does. How can I generate Lazy.dSYM from a 2-stage compile-and-link build?
I don't need a dSYM directory, just debugging symbols, but would like to understand when and why it is created.
The creation of the .dSYM bundle is done by a tool called dsymutil. When Apple added support for DWARF debugging information, they decided to separate "executable linking" from "debug information linking". As such, the debug information linking is not done by the normal linker, it's done by dsymutil.
As a convenience, when you build a program all in one step, the compiler invokes dsymutil on your behalf. That's because it knows it has all of the inputs. If you add the -v (a.k.a. --verbose) option to the compile command, you will see the invocation of dsymutil as the last step it does.
In other cases, though, it doesn't do that. It leaves the debug information linking step for the user to do manually. You can do it by simply issuing the command:
dsymutil <your_program>
Here's an article by an Apple engineer who helped design and implement Apple's support for DWARF explaining their thinking. He also answered a question here on Stack Overflow about this stuff.
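Applied to the two-step build from the question, that just means adding a dsymutil step after the link, roughly:
clang++ -std=c++14 -stdlib=libc++ -g -c Lazy.cpp
clang++ -stdlib=libc++ -g -o Lazy Lazy.o
dsymutil Lazy       # links the debug info and produces Lazy.dSYM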

What is the difference between -O0, -O1 and -g

I am wondering about the use of -O0, -O1 and -g for enabling debug symbols in a lib.
Some suggest using -O0 to enable debug symbols, and some suggest using -g.
So what is the actual difference between -g and -O0, what is the difference between -O1 and -O0, and which is best to use?
-O0 is optimization level 0 (no optimization, same as omitting the -O argument)
-O1 is optimization level 1.
-g generates and embeds debugging symbols in the binaries.
See the gcc docs and manpages for further explanation.
For doing actual debugging, debuggers are usually not able to make much sense of code that has been compiled with optimization, though debug symbols are still useful for other things even with optimization, such as generating a stack trace.
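As a side note, if you are unsure whether a given binary or library was actually built with -g, you can look for the DWARF sections directly; for example, with a GNU toolchain (mylib.so is just a placeholder name):
readelf -S mylib.so | grep debug     # .debug_* sections are present only if debug info was compiled in and not stripped
objdump -h mylib.so | grep debug     # alternative check with objdump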
-OX specifies the optimisation level that the compiler will perform. -g is used to generate debug symbols.
From GCC manual
http://gcc.gnu.org/onlinedocs/
3.10 Options That Control Optimization
-O
-O1
Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone options.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.
-g
Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF 2). GDB can work with this debugging information.
-O0 doesn't enable debug symbols, it just disables optimizations in the generated code so debugging is easier (the assembly code follows the C code more or less directly). -g tells the compiler to produce symbols for debugging.
It's possible to generate symbols for optimized code (just continue to specify -g), but trying to step through code or set breakpoints may not work as you expect because the emitted code will likely not "follow along" with the original C source closely. So debugging in that situation can be considerably trickier.
-O1 (which is the same as -O) performs a minimal set of optimizations. -O0 essentially tells the compiler not to optimize. There is a slew of options that allow very fine control over how the compiler optimizes: http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Optimize-Options.html#Optimize-Options
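To make the combinations concrete, here is a rough sketch of the three typical invocations (app.c is a placeholder file name):
gcc -O0 -o app_plain app.c      # no optimization, no debug info
gcc -O0 -g -o app_debug app.c   # no optimization, with debug info: easiest to step through
gcc -O2 -g -o app_fast app.c    # optimized, with debug info: fine for stack traces, but gdb may show variables as <optimized out>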
As mentioned by others, the -O options set the level of optimization to be performed by the compiler, whereas the -g option adds the debugging symbols.
For a more detailed understanding, please refer to the following links:
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options