gcc -O2 is smaller than gcc -O2 -g followed by strip --strip-all - c++

I am building code from which I want to produce release versions. However, I also want to be able to debug core dumps if a release crashes.
I read that you can build with debug symbols, make a copy of the binary, and run strip on the copy. You can then take the core produced by the stripped binary (the released/customer binary) and open it in gdb against your copy of the binary that still has debug symbols.
So step one for me was to generate the binaries. I do:
gcc -O2 ... -o testbin_release_orig (original release bin without symbols)
gcc -O2 -g ... -o testbin_debug (full debug binary)
cp testbin_debug testbin_release
strip --strip-all testbin_release (stripped debug binary)
This produces three files with different sizes:
testbin_release_orig: ~1.7 MB
testbin_debug: ~13 MB
testbin_release: ~2.1 MB
My question is, why is testbin_release not exactly the same size as testbin_release_orig? I am guessing that strip can't strip all the debug symbols that gcc adds. But there is about 0.4Mb of "extra stuff" - what does that consist of?

The difference is from the debug code.
For a 1.7 MB executable you are probably using a library or two. Usually they have something like:
#ifdef _DEBUG
// some debug code
#endif
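This is also common practice in big projects, so some of it may be in your own code as well.
strip removes only the symbols. The debug code stays. Note that this applies only if the -g build also defines such a macro (for example with -D_DEBUG), since -g by itself does not define one.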
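One way to see what the extra 0.4 MB actually consists of is to compare the two files section by section. This is a minimal sketch, assuming GNU binutils is installed. The function name compare_sections is made up for illustration, and the file names in the usage comment are the ones from the question.

```shell
# Hypothetical helper: print the per-section sizes of two binaries.
# Assumes GNU binutils `size`; -A selects the per-section (SysV) format.
compare_sections() {
    a=$1
    b=$2
    echo "== $a"
    size -A "$a"
    echo "== $b"
    size -A "$b"
}

# Usage (file names from the question):
#   compare_sections testbin_release_orig testbin_release
```

If .text or .rodata is larger in the stripped -g build, extra code or data was compiled in. If only other sections differ, the difference lies elsewhere.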

Related

Debug symbols stability

I am compiling an application with -g option:
gcc -g -o main1 main.c
then I strip debug object from it:
objcopy --strip-debug main1
Let's assume that my main1 application will crash and I would like to use a core dump coredump1 to debug the problem.
Could I rebuild the source code once more
gcc -g -o main2 main.c
and extract debug symbols
objcopy --only-keep-debug main2 main2.debug
and use main2.debug to debug the coredump1?
Can I trust that the debug symbols will always be aligned? Is that guaranteed by the language standard or by a compiler requirement?
Will the debug symbols match if my source code contains strings based on macros like __DATE__ or __TIME__?
Will it work if I enable code optimization?
Will debug symbols match ...
Will it work if I enable code optimization?
As others have commented, you should not rely on this, and instead always build with -g and separate debug symbols out before shipping the "final product".
That said, in practice this works for GCC [1] with or without optimization, but doesn't work at all for Clang/LLVM (which gives you a practical reason not to depend on this).
[1] Or at least it did last time I tried this for several non-trivial binaries a few years ago.
Note that maintaining this property requires active effort from the compiler developers and thus can be broken as violations are introduced, noticed and fixed.
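The recommended approach (build once with -g, then separate the debug symbols from that same build) could look roughly like this. It is a sketch assuming GNU binutils. split_debug is a made-up name, and the file names in the usage comment come from the question.

```shell
# Hypothetical helper: split one -g build into a shippable binary and a
# matching .debug file. Assumes GNU binutils (objcopy, strip).
split_debug() {
    bin=$1
    # 1. Copy only the debug sections into a separate file.
    objcopy --only-keep-debug "$bin" "$bin.debug"
    # 2. Remove the debug sections from the binary that ships.
    strip --strip-debug "$bin"
    # 3. Record the debug file's name and checksum in the shipped binary so
    #    that gdb can find it when it sits next to the binary.
    objcopy --add-gnu-debuglink="$bin.debug" "$bin"
}

# Usage:
#   gcc -g -o main1 main.c
#   split_debug main1        # leaves main1 (stripped) and main1.debug
#   gdb main1 coredump1      # gdb follows the debuglink to main1.debug
```

Because both files come from a single compilation, there is no need to rely on a second build producing identical output.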

g++ switch to not include own symbolic function names (and debugging data) -?

Haven't found one, but is there a switch to exclude any debugging data as well as clear-text references to local (own) functions in generated code?
Simple example:
void setenv( char* in_str ) {
}
...gives me a readable "setenv" name in the executable, which is really not needed, unless it's an interpretive language.
Also in the executable - text names of variables, which is even stranger.
==========
EDIT:
So far tried Solaris strip, GNU strip, g++ -O0 and -s switches. The only way to remove the symbols in question was "strip --strip-all" from the object file (but not the executable), but then it won't link.
So it looks like Richard C is right, and this is indeed needed for lib* runtimes.
You can either use the GNU strip command-line tool or link with the gcc -s flag. Note, though, that the only benefit will be decreased file size. This part of the binary is only loaded into memory if you run the app in a debugger or generate a stack trace. I prefer to use the strip command, because you can save the debug info separately and load it if you want to get a stack trace for some reason.
examples:
g++ -o myexecutable ...
strip --strip-unneeded myexecutable
or
g++ -s -o myexecutable a.o b.o c.o ...
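To check what strip actually removed, you can count the symbol table entries before and after. This is a rough sketch assuming GNU binutils. count_symbols is a made-up helper name.

```shell
# Hypothetical helper: count the entries in a binary's regular symbol table.
# Assumes GNU binutils `nm`; prints 0 for a fully stripped file.
count_symbols() {
    { nm "$1" 2>/dev/null || true; } | wc -l
}

# Usage:
#   g++ -o myexecutable a.o b.o c.o
#   count_symbols myexecutable      # some positive number
#   strip --strip-all myexecutable
#   count_symbols myexecutable      # 0
#   nm -D myexecutable              # dynamic symbols are still listed
```

The dynamic symbol table shown by nm -D is not removed by strip, because the runtime linker needs it. That is one reason some names remain readable in a stripped executable.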

LLDB not showing source code

I am trying to debug a C++ program I am writing, but when I run it in LLDB and stop the program, it only shows me the assembler, not the original source.
e.g. after the crash I’m trying to debug:
Process 86122 stopped
* thread #13: tid = 0x142181, 0x0000000100006ec1 debug_build`game::update() + 10961, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000100006ec1 debug_build`game::update() + 10961
debug_build`game::update:
-> 0x100006ec1 <+10961>: movq (%rdx), %rdx
0x100006ec4 <+10964>: movq %rax, -0xb28(%rbp)
0x100006ecb <+10971>: movq -0x1130(%rbp), %rax
0x100006ed2 <+10978>: movq 0x8(%rax), %rsi
I am compiling with -O0 -g. I see the same thing when running the debugger via Xcode (I’m on OSX) or from the command line.
What else might I need to do to get the source code to show up in LLDB?
Additional notes
Here is an example of a typical build command:
clang++ -std=c++1y -stdlib=libc++ -fexceptions -I/usr/local/include -c -O2 -Wall -ferror-limit=5 -g -O0 -ftrapv lib/format.cpp -o format.o
The earlier -O2 is there because that’s the default I’m using, but I believe the later -O0 overrides it, right?
What I’ve tried
I’ve recreated this problem with a simple ‘hello world’ program using the same build settings.
After some searching, I tried running dsymutil main.o which said warning: no debug symbols in executable (-arch x86_64), so perhaps the debug symbols are not being generated by my build commands?
I also tried adding -gsplit-dwarf to the build commands but with no effect.
Here is the link command from my ‘hello world’ version:
clang++ main.o -L/usr/local/lib -g -o hello
I ran dwarfdump (I read about it here) on the executable and object files. It looks to my untrained eye like the debug symbols are present in the object files, but not in the executable itself (unless dwarfdump only works on object files, which is possible). So maybe the linking stage is the issue. Or maybe there’s a problem with the DWARF.
I have now got this working in the ‘hello world’ program, through issuing build commands one-by-one in the terminal. I am therefore guessing this may be an issue with my build system (Tup), possibly running the commands with a different working directory so the paths get mangled or something.
When you add the -g command line option to clang, DWARF debug information is put in the .o file. When you link your object files (.o, ranlib archives aka static libraries aka .a files) into an executable/dylib/framework/bundle, "debug notes" are put in the executable to say (1) the location of the .o etc files with the debug information, and (2) the final addresses of the functions/variables in the executable binary. Optimization flags (-O0, -O2 etc) do not have an impact on debug information generation - although debugging code compiled with optimization is much more difficult than debugging code built at -O0.
If you run the debugger on that executable binary -- without any other modification -- the debugger will read the debug information from the .o etc files as long as they're still on the filesystem at the same file path as when you built the executable. This makes iterative development quick - no tool needs to read, update, and output the (large) debug information. You can see these "debug notes" in the executable by running nm -pa exename and looking for OSO entries (among others). These are stabs nlist entries and running strip(1) on your executable will remove them.
If you want to collect all of the debug information (in the .o files) into a standalone bundle, then you run dsymutil on the executable. This uses the debug notes (assumptions: (1) the .o files are still in their orig location, and (2) the executable has not been stripped) to create a "dSYM bundle". If the binary is exename, the dSYM bundle is exename.dSYM. When the debugger is run on exename, it will look next to that binary for the dSYM bundle. If not found there, it will do a Spotlight search to see if the dSYM is in a spotlight-indexed location on your computer.
You can run dwarfdump on .o files, or on the dSYM bundle -- they both have debug information in them. dwarfdump won't find any debug information in your output executable.
So, the normal workflow: Compile with -g. Link executable image. If iterative development, run debugger. If shipping/archiving the binary, create dSYM, strip executable.
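The shipping half of that workflow might be scripted as follows. It is a sketch for macOS only, assuming Apple's dsymutil and strip are on the PATH. archive_and_strip is a made-up name.

```shell
# Hypothetical helper for macOS: archive the debug info, then strip the binary.
# Assumes the .o files from the link step are still at their original paths.
archive_and_strip() {
    exe=$1
    # Collect the DWARF from the .o files into "$exe.dSYM". This has to run
    # before strip, because strip removes the debug notes dsymutil relies on.
    dsymutil "$exe" || return 1
    # Remove the debugging symbol entries from the binary that ships.
    strip -S "$exe"
}

# Usage:
#   clang++ -g -c main.cpp -o main.o
#   clang++ -g main.o -o hello
#   archive_and_strip hello      # leaves hello (stripped) and hello.dSYM
```

Keep the dSYM bundle next to the binary, or in a Spotlight-indexed location, so that the debugger can find it later.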
I solved it by adding the debug symbols from the a.out.dSYM directory, using the command (lldb) target symbols add a.out.dSYM.

Why does a 2-stage command-line build with clang not generate a dSYM directory?

I have a simple project I want to debug, and I want to produce a dSYM folder with debugging symbols.
Running:
clang++ -std=c++14 -stdlib=libc++ -g -o Lazy Lazy.cpp
Creates Lazy.dSYM as I expect.
However:
clang++ -std=c++14 -stdlib=libc++ -g -c Lazy.cpp
clang++ -stdlib=libc++ -g -o Lazy Lazy.o
Does not create Lazy.dSYM (It seems that the symbols are embedded in the binary).
Sadly the 2-step build is what my modified makefile does. How can I generate Lazy.dSYM from a 2-stage compile-and-link build?
I don't need a dSYM directory, just debugging symbols, but would like to understand when and why it is created.
The creation of the .dSYM bundle is done by a tool called dsymutil. When Apple added support for DWARF debugging information, they decided to separate "executable linking" from "debug information linking". As such, the debug information linking is not done by the normal linker, it's done by dsymutil.
As a convenience, when you build a program all in one step, the compiler invokes dsymutil on your behalf. That's because it knows it has all of the inputs. If you add the -v (a.k.a. --verbose) option to the compile command, you will see the invocation of dsymutil as the last step it does.
In other cases, though, it doesn't do that. It leaves the debug information linking step for the user to do manually. You can do it by simply issuing the command:
dsymutil <your_program>
Here's an article by an Apple engineer who helped design and implement Apple's support for DWARF explaining their thinking. He also answered a question here on Stack Overflow about this stuff.

Get the compiler options from a compiled executable?

Is there a way to see what compiler and flags were used to create an executable file in *nix? I have an old version of my code compiled and I would like to see whether it was compiled with or without optimization. Google was not too helpful, but I'm not sure I am using the correct keywords.
gcc has a -frecord-gcc-switches option for that:
-frecord-gcc-switches
This switch causes the command line that was used to invoke the compiler to
be recorded into the object file that is being created. This switch is only
implemented on some targets and the exact format of the recording is target
and binary file format dependent, but it usually takes the form of a section
containing ASCII text.
Afterwards, the ELF executables will contain .GCC.command.line section with that information.
$ gcc -O2 -frecord-gcc-switches a.c
$ readelf -p .GCC.command.line a.out
String dump of section '.GCC.command.line':
[ 0] a.c
[ 4] -mtune=generic
[ 13] -march=x86-64
[ 21] -O2
[ 25] -frecord-gcc-switches
Of course, it won't work for executables compiled without that option.
For the simple case of optimizations, you could try using a debugger if the file was compiled with debug info. If you step through it a little, you may notice that some variables were 'optimized out'. That suggests that optimization took place.
If you compile with the -frecord-gcc-switches flag, then the compiler's command-line options will be written into the binary, in the .GCC.command.line section for ELF. See also the docs.
Another option is -grecord-gcc-switches (note, not -f but -g). According to the gcc docs it puts the flags into the DWARF debug info. It appears to be enabled by default since gcc 4.8.
I've found the dwarfdump program useful for extracting those flags. Note that the strings program does not see them; it looks like the DWARF info is compressed.
As long as the executable was compiled by gcc with -g option, the following should do the trick:
readelf --debug-dump=info /path/to/executable | grep "DW_AT_producer"
For example:
% cat test.c
int main() {
return 42;
}
% gcc -g test.c -o test
% readelf --debug-dump=info ./test | grep "DW_AT_producer"
<c> DW_AT_producer : (indirect string, offset: 0x2a): GNU C17 10.2.0 -mtune=generic -march=x86-64 -g
Sadly, clang doesn't seem to record options in similar way, at least in version 10.
Of course, strings would turn this up too, but one has to have at least some idea of what to look for as inspecting all the strings in real-world binary with naked eyes is usually impractical. E.g. with the binary from above example:
% strings ./test | grep march
GNU C17 10.2.0 -mtune=generic -march=x86-64 -g
This is something that would require compiler support. You don't mention what compiler you are using, but since you tagged your question linux I will assume you are using gcc, which does not enable the feature you're asking about by default (but -frecord-gcc-switches is an option to turn it on).
If you want to inspect your binary, the strings command will show you everything that appears to be a readable character string within the file.
If you still have the compiler (same version) you used, and it is only one flag you're unsure about, you can try compiling your code again, once with and once without the flag. Then you can compare the executables. Your old one should be identical, or very similar, to one of the new ones.
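That rebuild-and-compare idea could be automated along these lines. It is only a sketch: guess_opt_level is a made-up name, and it assumes a single source file, the same gcc version, and no other flags in the original build.

```shell
# Hypothetical helper: rebuild one source file at several optimization levels
# and report which rebuilds are byte-identical to an existing binary.
guess_opt_level() {
    old_bin=$1
    src=$2
    for opt in -O0 -O1 -O2 -O3 -Os; do
        # cmp -s is silent and succeeds only if the two files are identical.
        if gcc "$opt" -o candidate.tmp "$src" && cmp -s "$old_bin" candidate.tmp; then
            echo "matches $opt"
        fi
    done
    rm -f candidate.tmp
}

# Usage:
#   guess_opt_level ./old_binary main.c
```

More than one level can match for small programs, since different levels sometimes generate the same code. If nothing matches exactly, comparing disassembly or section sizes is the fallback.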
I highly doubt it is possible:
int main()
{
}
When compiled with:
gcc -O3 -ffast-math -g main.c -o main
None of the parameters can be found in the generated binary (-e keeps grep from treating -O3 as an option):
strings main | grep -e -O3
(no output)