Last week I released the Linux and Windows versions of an application.
After the release we realized that the symbols had not been stripped, and my manager thinks (and I disagree) that this might allow users to understand our algorithm.
Anyway, I now have to strip the symbols and re-release the application.
My questions:
What is the best way to strip symbols in Linux?
What is the best way to strip symbols in Windows?
With Visual C++ (and other Microsoft compilers) on Windows, symbols aren't part of the binaries. Instead, they are stored in separate files called "Program Database" files (.pdb files). Just don't provide the .pdb files.
With the GNU toolchain you would use strip to remove symbols from the binaries.
For the GNU toolchain, there exists a cool sleight of hand:
objcopy --only-keep-debug yourprogram ../somepath/yourprogram.dbg
strip yourprogram
objcopy --add-gnu-debuglink=../somepath/yourprogram.dbg yourprogram
Now you can zip the folder your program is in (or pack it into an installer or whatever) and there will be no more debug symbols in it.
BUT: if you launch the debugger, or run a tool like addr2line or objdump, the tool will (thanks to the debuglink info) automagically know where to find the symbols and load them.
Which is awesome, because it means you have the benefits of having symbols on your end, without distributing them to the user.
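If you want this as a repeatable build step, the three commands above can be wrapped in a small script. What follows is only a sketch in Python: the function names split_debug_commands and split_debug are made up for illustration, and it assumes objcopy and strip from GNU binutils are on the PATH.

```python
import subprocess


def split_debug_commands(program, debug_file):
    """Build the three command lines from the recipe above (nothing is run here)."""
    return [
        # 1. Copy only the debug info into a separate file.
        ["objcopy", "--only-keep-debug", program, debug_file],
        # 2. Remove the symbols from the binary that will be shipped.
        ["strip", program],
        # 3. Record a link to the debug file inside the stripped binary.
        ["objcopy", "--add-gnu-debuglink=" + debug_file, program],
    ]


def split_debug(program, debug_file, dry_run=False):
    """Run the commands in order; with dry_run=True only print them."""
    commands = split_debug_commands(program, debug_file)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True aborts on the first failing step.
            subprocess.run(argv, check=True)
    return commands
```

Calling split_debug("yourprogram", "../somepath/yourprogram.dbg", dry_run=True) prints the commands without touching any file, which is a cheap way to check the paths before running it for real.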
While debugging a Qt 5 application, I am sometimes not interested in the internals of Qt 5 but in the structure of the application itself. Therefore I do not need to load all debugging symbols of the Qt 5 libraries since these take a few seconds to load.
Is it possible to prevent GDB from loading symbols for these Qt 5 libraries while keeping the debugging symbols for my application?
Yes.
As Richard Critten's comment mentions, setting auto-solib-add to 0 will prevent loading of symbols for all shared libraries, and you can then add files manually with the sharedlibrary command (which accepts a regex). If this regex is omitted, then all shared libraries are loaded.
That however would prevent auto-loading of all symbols (not just debug symbols), and would also prevent auto-loading of symbols for system libraries, which are often required to unwind the stack.
A better approach may be to save a copy of the Qt5 libraries with full debug info somewhere, e.g. ~/Qt5-debug/, then run strip -g on the original libraries. That way, you will get symbolic info for all libraries, and in the rare case when you actually need full debug info for Qt5, you can still load it with GDB's file command (file ~/Qt5-debug/libQt5Core.so.5.2) or similar commands.
The chapter GDB Files from the GDB manual has more documentation on using such separate debugging symbols.
If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual Studio creates them to store debugging information separately), you can use all kinds of programs to open them and see the source files and methods.
You can even open one with a text editor and you'll see, among the gibberish, a list of the functions and source files.
If you're using Linux (or GNU compilers in general), you can use gdb (again, only if debug symbols were enabled at compile time).
Run gdb on your executable, then run the command: info sources
That's an important reason why you should always remove the debug flag (-g with GCC) when going into production. You don't want clients to mess around with your sources, functions, and code.
You cannot do that, because the executable might have been built on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library is linked in, you have no reliable way to find out.
In practice, however, on Linux, using nm, objdump, or ldd on the executable will often (but not always) give you a good clue about the needed libraries.
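To make the ldd route concrete, here is a minimal Python sketch that runs ldd and collects the libraries it reports. The function names parse_ldd_output and needed_libraries are made up for illustration, and the parser assumes the usual glibc ldd output format ("name => path (address)"). It only sees libraries that are linked dynamically, so static libraries will not show up.

```python
import subprocess


def parse_ldd_output(text):
    """Map each name printed by ldd to its resolved path, or None if there is none."""
    libs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "statically linked" in line or "not a dynamic executable" in line:
            continue
        # Drop the trailing load address, e.g. "(0x00007f...)".
        if line.endswith(")") and "(" in line:
            line = line[: line.rindex("(")].strip()
        if "=>" in line:
            name, _, path = line.partition("=>")
            path = path.strip()
            libs[name.strip()] = None if path in ("", "not found") else path
        else:
            # No "=>": either an absolute path (the dynamic loader) or an
            # object without a file on disk, such as linux-vdso.so.1.
            libs[line] = line if line.startswith("/") else None
    return libs


def needed_libraries(executable):
    """Run ldd on the executable and parse what it prints."""
    # Caution: ldd may execute code from the file it inspects, so use this
    # only on executables you trust.
    result = subprocess.run(["ldd", executable], capture_output=True, text=True, check=True)
    return parse_ldd_output(result.stdout)
```

A value of None means ldd reported no file for that entry (a missing library or a kernel-provided object), which is often the first thing to look at.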
Also, some executables load plugins dynamically, e.g. using dlopen, so your question might not make sense (since such a plugin is known only at runtime).
Notice also that you might not know whether an executable was obtained by compiling C++ code (you might not be able to tell if it came from C, C++, D, OCaml, ... source code, or a mixture of them).
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer, if the executable contains debug information (e.g. in DWARF format), you should be able to get a lot more information.
I'm running gdb with set verbose on and I'm trying to understand one of the messages I am getting:
Reading symbols from system-supplied DSO at 0x7ffff7ffb000...(no debugging symbols found)...done.
What is the system-supplied DSO? After some searching I think that DSO might stand for "dynamic shared object". But I still don't understand exactly what gdb is doing here and how I might solve the problem with the debugging symbols not being found (or if it even matters).
Also, the program that I am debugging is being compiled with llvm-gcc and has an LLVM pass applied to it. I think that is affecting the behavior of gdb, but I'm not exactly sure how.
So essentially my question is what does the message that gdb prints mean, is it likely to cause a problem, and if so any suggestions on how I could help gdb find the debugging symbols.
According to this document a DSO is:
A dynamic shared object (DSO) is an object file that’s meant to be
used simultaneously— or shared—by multiple applications (a.out files)
while they’re executing.
I believe that a system-supplied DSO is just a DLL provided by the OS and loaded by the main executable. Since this is an external library, you don't have the debugging symbols for such an object unless you download them separately. Typically the release binaries are stripped of debugging symbols, but they can have a link to a separate file. A typical Linux distribution provides a package containing the debugging symbols of such binaries (like the xxx-debuginfo-xxx.rpm for Red Hat-based distributions).
In this context, system-supplied DSO means a shared library provided directly by the Linux kernel, such as the vDSO. Debuginfo is indeed available for it, but is packaged along with the kernel rather than userspace. Use debuginfod to fetch it automatically if your distro supports that.
I'm using Apple's GCC to compile a dylib that I'm going to redistribute. For various reasons I'm using some libraries, let's say libz to keep it simple.
Since this library is not typically found on a Mac system, I wish to statically link the used symbols into the dylib by passing the path to the .a file, to simplify deployment.
Now, the linker links all symbols from the lib into the resulting dylib, although I only reference a subset. On Linux I've never encountered this problem: the linker happily discards all unreferenced symbols and creates a very slim executable, so it should be possible. The dylib file I have now is ~10 times larger than it should be.
I've tried fiddling around with the -dead_code linker flag, but to no avail. Perhaps I just don't understand it?
Does anyone know the solution to this?
Try -Wl,--gc-sections.
As regards -dead_strip (what you probably meant by -dead_code):
Before turning on the -dead_strip
option your project will first have to
be "ported" to work with dead code
stripping. This will include changing
from -gused (the default for -g) to
-gfull and re-compiling all of the objects files being linked into your
program with the new compiler from the
Mac OS X June 2004 release. Also if
your building an executable that loads
plugins, which uses symbols from the
executable, you will have to make sure
the symbols the plugins use are not
stripped (by using
__attribute__((used)) or the -exported_symbols_list option). If you are using an export list and building
a shared library, or an executable
that will be used with ld(1)'s
-bundle_loader flag, you need to include the symbols for exception
frame information in the export list
for your exported C++ symbols. These
symbols end with .eh and can be seen
with the nm(1) tool.
and:
To enable dead-code stripping from the
command line, pass the -dead_strip
option to ld. You should also pass the
-gfull option to GCC to generate a complete set of debugging symbols for
your code. The linker uses this extra
debugging information to dead strip
the executable.
Hope this helps.
All content in this answer was located within the first few Google search results for "apple ld static link unused symbols". :)
I have some questions regarding debugging symbols and what can be done with them besides, well, debugging. I'm mostly interested in answers regarding GCC, but I'd also be happy to know what it looks like under other compilers, including MSVC.
First of all:
What are the common formats/types of debugging symbols?
How do they relate to compilers and platforms? Is it always the same format on GCC and MinGW among platforms?
Can I check at runtime whether the build has them and what format they are in?
And some more practical questions... How can I:
Check the current file and line number?
Obtain the (qualified) function name being executed?
Obtain a full current stack trace?
Let me emphasize that I'm talking about run-time checks. All of those can be read and pretty-printed by GDB, but I don't know how much info comes from the debugging symbols themselves and how much from the source code which GDB also has access to.
Maybe there's a library which is able to parse the debugging symbols and yield such information?
Are the debugging symbols standardised well enough that I can expect some degree of portability for such solutions?
What are the common formats/types of debugging symbols?
DWARF and STABS (those are embedded inside the executable, in special sections), and Program Database (PDB; an external file, used by MSVC).
How do they relate to compilers and platforms? Is it always the same format on GCC and MinGW among platforms?
GCC uses DWARF/STABS (I think it's a GCC compile-time option) both on Linux (ELF) and Windows (PE); I don't know about others. MSVC always uses PDB.
Can I check at runtime whether the build has them and what format they are in?
You can parse the executable image and see if there are sections with debugging info (see STABS documentation and DWARF specs). PDB files are distributed either with executables or via symbol servers (so if you don't want to go online, check if there is X.pdb for X.exe/X.dll).
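As a rough illustration of parsing the image, the following Python sketch lists the section names of an ELF file and picks out the ones that usually carry debugging info. It rests on several assumptions: it handles only 64-bit little-endian ELF, the function names are hypothetical, and the prefix list (.debug_, .zdebug_, .stab, .gnu_debuglink) is a heuristic rather than an exhaustive list.

```python
import struct

# Section-name prefixes that usually indicate debugging info (a heuristic).
DEBUG_PREFIXES = (".debug_", ".zdebug_", ".stab", ".gnu_debuglink")


def elf_section_names(data):
    """Return the section names of a 64-bit little-endian ELF image (bytes)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS must be ELFCLASS64, EI_DATA ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    # Fields of the ELF64 header: e_shoff at 0x28, then e_shentsize,
    # e_shnum and e_shstrndx starting at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shoff == 0 or e_shnum == 0:
        return []  # no section header table (extended numbering is ignored here)

    def header(index):
        # From each Elf64_Shdr we need sh_name (offset 0) and
        # sh_offset / sh_size (offsets 24 and 32).
        base = e_shoff + index * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, base)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 24)
        return sh_name, sh_offset, sh_size

    # The section-name string table is itself a section.
    _, str_offset, str_size = header(e_shstrndx)
    strtab = data[str_offset:str_offset + str_size]

    names = []
    for i in range(e_shnum):
        name_offset, _, _ = header(i)
        end = strtab.index(b"\0", name_offset)
        names.append(strtab[name_offset:end].decode("ascii", "replace"))
    return names


def debug_section_names(data):
    """Return only the section names that look like debugging info."""
    return [n for n in elf_section_names(data) if n.startswith(DEBUG_PREFIXES)]
```

Reading a file with open(path, "rb").read() and passing the bytes to debug_section_names should give an empty list for a binary that was stripped without a debuglink. PE executables with external PDB files are out of scope for this sketch.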
As for how to read and use those symbols: I don't know about DWARF/STABS (there's probably something around GNU binutils that can locate and extract those), but for PDB your best bet is to use dbghelp. Its usage is pretty well documented and there are a lot of examples available on the net. There's also the DIA SDK, which can be used to query PDB files.
Are the debugging symbols standardised well enough that I can expect some degree of portability for such solutions?
DWARF has a formal specification, and it's complicated as hell. PDB AFAIK is not documented, but dbghelp/DIA are, and are the recommended way.