How to get non-dynamic symbols in GCC backtrace? - c++

backtrace_symbols() (provided by glibc in <execinfo.h> rather than by GCC itself) only resolves dynamic symbols, since handling all types of symbols is something the maintainers do not want to get into.
How would I go about resolving non-dynamic symbols myself for the addresses returned by backtrace()?

Check out what addr2line does using bfd. That is one approach I have used successfully.
More specifically, backtracefilt gets you basically all the way there, you just need to adapt it to take the addresses from backtrace instead of parsing a file.

libdw, part of elfutils, can be used to read the DWARF debugging information that is present if you compiled with -g.
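As a rough sketch of the "feed the addresses to addr2line" idea: the code below captures a backtrace, uses glibc's dl_iterate_phdr() to find which loaded ELF object each address belongs to, and converts the address to an offset relative to that object's load base, which is the form addr2line expects for position-independent code. ResolvedFrame, capture_frames and addr2line_command are hypothetical names made up for this example, and it only builds the command line rather than linking against bfd or libdw.

```cpp
#include <execinfo.h>   // backtrace (glibc)
#include <link.h>       // dl_iterate_phdr, ElfW (glibc)
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper type: one captured frame mapped back to its module.
struct ResolvedFrame {
    std::string module;     // path of the ELF object, empty for the main executable
    std::uintptr_t offset;  // address relative to the object's load base
};

struct Lookup { std::uintptr_t addr; ResolvedFrame out; bool found; };

// Callback for dl_iterate_phdr: checks whether the address falls inside one
// of this object's PT_LOAD segments.
static int find_module(dl_phdr_info* info, size_t, void* data) {
    Lookup* q = static_cast<Lookup*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        std::uintptr_t lo = info->dlpi_addr + ph.p_vaddr;
        if (q->addr >= lo && q->addr < lo + ph.p_memsz) {
            q->out.module = info->dlpi_name;
            q->out.offset = q->addr - info->dlpi_addr;
            q->found = true;
            return 1;  // non-zero stops the iteration
        }
    }
    return 0;
}

std::vector<ResolvedFrame> capture_frames() {
    void* raw[64];
    int n = backtrace(raw, 64);
    assert(n >= 0);
    std::vector<ResolvedFrame> frames;
    for (int i = 0; i < n; ++i) {
        Lookup q{reinterpret_cast<std::uintptr_t>(raw[i]), {}, false};
        dl_iterate_phdr(find_module, &q);
        if (q.found) frames.push_back(q.out);
    }
    return frames;
}

// Builds (but does not run) an addr2line invocation for one frame.
std::string addr2line_command(const ResolvedFrame& f, const std::string& exe_path) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "0x%llx",
                  static_cast<unsigned long long>(f.offset));
    const std::string& file = f.module.empty() ? exe_path : f.module;
    return "addr2line -C -f -e " + file + " " + buf;
}
```

The resulting string can be passed to popen() or printed for later use. The addresses are return addresses, so the reported line may be the one after the call; file and line output also depends on the binary having been built with -g.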

Related

Is there any way to get the list of classes that exist in a C++ executable?

I have a C++ executable that was built from many source files, which define a number of classes. Is it possible to get a list of the classes, methods and properties in it? I might be asking too much, but can I also automatically generate a class diagram, which is my ultimate goal?
If the executable was compiled with debug symbols, you might have a chance to at least get the class names. On Linux, you'd do
nm -C <executable>
which should give you a list of symbols. You should read the documentation of nm, because it provides quite a bit of information. However, you won't get a class hierarchy. I even believe that would be rather hard. You could try checking which constructors are called by other constructors, maybe you'll get lucky, but that will be a mess.
On Linux, you can use nm to list the symbols in a library (classes, their methods, and functions).
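What nm -C does to each name can be sketched in C++ with abi::__cxa_demangle, which GCC and Clang ship in <cxxabi.h>. The enclosing_scope helper below is a hypothetical heuristic invented for this example: it treats whatever precedes the last "::" before the parameter list as a candidate class name. It cannot tell a class from a namespace, and it gets confused by demangled names that start with a return type, as template instantiations do.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cassert>
#include <cstdlib>    // std::free
#include <string>

// Demangle one symbol name; returns the input unchanged when demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string result(text);
    std::free(text);  // the buffer is malloc()-allocated by __cxa_demangle
    return result;
}

// Hypothetical helper: take the scope in front of the last "::" before the
// parameter list as a candidate class (or namespace) name.
std::string enclosing_scope(const std::string& demangled) {
    std::size_t paren = demangled.find('(');
    std::size_t sep = demangled.rfind("::", paren);
    return sep == std::string::npos ? std::string() : demangled.substr(0, sep);
}
```

For example, demangle("_ZN3Foo3barEv") gives "Foo::bar()", and enclosing_scope of that gives "Foo". Collecting the distinct scopes over all symbols printed by nm yields a rough class list, but no hierarchy.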

Strip symbols and RTTI text from GCC executable

My project uses template metaprogramming heavily. Most of the action happens inside recursive templates which produce objects and functions with very long (mangled) symbol names.
Despite the build time being only ~30 sec, the resulting executable is about a megabyte, and it's mostly symbol names.
On Linux, adding a -s argument to GCC brings the size down to ~300 KiB, but a quick look with a text editor shows there are still a lot of cumbersome names in there. I can't find how to strip anything properly on OS X… will just write that off for now.
I suspect that the vtable entries for providing typeid(x).name() are taking up a big chunk. Removing all use of the typeid operator did not cause anything more to be stripped on Linux. I think that the default exception handler uses the facility to report the type of an uncaught exception.
How might I maximize strippage and minimize these kilobyte-sized symbols in my executable?
Just run the strip program on the final executable. If you want to be fancier, you can use other tools to store the debug info separately, but for your stated purpose, plain strip a.out is fine. Maybe use the --strip-all option; I haven't tried that myself to see if it differs from the default behavior.
If you really want to try disabling RTTI, well, it's gcc -fno-rtti. But that may break your program badly; only one way to find out, I guess.
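To see where those leftover names come from, here is a minimal sketch with two made-up polymorphic types, Shape and Circle. Each polymorphic class gets a type_info object carrying a name string, and typeid on a reference looks it up through the vtable, so the string has to live in the binary's data and strip cannot drop it. As far as I know, GCC rejects typeid under -fno-rtti but still generates type information for exception handling as needed, so treat this as an illustration rather than a size measurement.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical example types: having a virtual function makes them
// polymorphic, so each one gets a vtable that refers to its type_info.
struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};

// Returns the implementation-defined name string stored in the type_info
// object of the dynamic type of `s`.
std::string rtti_name(const Shape& s) {
    return typeid(s).name();
}
```

With heavily templated code, each instantiated polymorphic type contributes one such string, and with GCC the string is the mangled type name, which is why the strings get so long.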

How much source information is stored in c++ executables

Some days ago I accidentally opened a C++ executable of a commercial application in Notepad++ and found out that there's quite a lot of information about the original source code stored in the executable.
Inside the executable I could find file names (app.c, dlgstat.c, ...), function names (GetTickCount, DispatchMessageA, ...) and small pieces of source code, mostly conditions (szChar != TEXT('\0'), iRow < XTGetRows( hwndList )). After that I checked another Qt executable and, yes, again found source file names and method signatures.
Because of that I am wondering how much source code information is really stored in a C/C++ executable (e.g., compiled using Qt or MinGW). Is this probably some kind of debug build still containing the original source? Is this information used for some reflection stuff? Is there any reason why publishers don't remove this stuff?
How much source code information is really stored in a C/C++ executable?
In practice, not much. The source code is not required at runtime. The strings you name come from two things:
The function names (e.g. GetTickCount) are the names of functions imported from other modules. The names are required at runtime because the functions are resolved dynamically (by calling GetProcAddress with the function name).
The conditions are likely assertions: the assert macro stringizes its argument so that when it fires you know what condition was not met.
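A minimal sketch of how the condition text ends up in the binary. CHECK is a hypothetical macro written for this example, not the real assert, but it uses the same preprocessor feature: #cond turns the expression into a string literal, and __FILE__ and __func__ add the source file and function names.

```cpp
#include <cassert>  // the standard assert macro, for comparison
#include <string>

// Hypothetical assert-like macro: yields an empty string when the condition
// holds, otherwise a message built from string literals that the compiler
// has to embed in the executable.
#define CHECK(cond) \
    ((cond) ? std::string() \
            : std::string(__FILE__) + ":" + std::to_string(__LINE__) + \
              ": " + __func__ + ": check '" #cond "' failed")

std::string check_row(int iRow) {
    return CHECK(iRow < 3);
}
```

Calling check_row(5) returns a message containing the literal text iRow < 3 and the function name check_row, so both strings are visible in the compiled file even though no other source code is.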
If you build a DLL, it will also contain the names of all of the functions it exports, so they can be resolved at runtime (the same is likely true for other shared object formats).
Debug symbols may also contain some of the original source code, though it depends on the format used by the debug symbols. These symbols may be contained either in the binary itself or in an auxiliary file (for example, .pdb files used on Windows).
Windows function names: they probably are there just because they are being accessed dynamically - somewhere in your program there's a GetProcAddress to get their address. Still, no reason to worry, every application uses WinAPIs, so there's not much to discover about your executable from that information.
Conditions: probably from some assert-like macro; they are included so that assert can print which condition failed. In any case, assertions are compiled out in release builds when NDEBUG is defined.
Source file names and method signatures: probably from uses of the __FILE__ macro and the __func__ identifier; probably, again, from assert.
Another source of information about the inner structure of your program is RTTI, which has to provide some representation for every type that typeid could be applied to. If you don't need its functionality, you can disable it (but I don't know if that is possible in Qt projects).
Mixed into the binary of a C++ app you will find the names of most global symbols (and debugging symbols if enabled in the compiler), but with extra 'decoration text' that encodes the calling signature of the symbol if it is a function or method. Likewise, character string literals are embedded in clear text. But nowhere will you find anything like the actual source code that the compiler used to create the binary executable. That information is lost during compilation, and the result is especially hard to reverse engineer if C++ templates are employed in the build.

Adding own symbols for file in gdb

OK, so I'm debugging on x86 with gdb.
The particular files in question are stripped so I have no symbols from the binary itself. I have no access to the source code, but a rough idea of what's happening under the hood.
My asm knowledge is just about good enough to work out the purpose of a function. Thus I can decide on appropriate names for functions after looking at them for a while, but I would like to be able to inject these as symbols so that, once decided upon, they can be used in later debugging.
Does anybody know how to load custom symbols into gdb?
I've considered recompiling gdb and adding an extra command to the UI to allow loading a symbol at an address. I was wondering if it would be possible to create a dummy object file with the symbols I've defined and then load it using add-symbol-file?
Or would it be possible to compile a C program with dummy functions, somehow force them to be the correct size and at the correct location, and then simply load that?
This sounds like it should be an easy task, but it turns out to be surprisingly annoying, mostly because ELF as a file format is annoying to generate, so most tools are content with parsing it.
As described here, GDB reads the symbol information from two places, first some minimal information from the symbols in the .symtab and/or .dynsym sections, and afterwards more detailed information from the .debug_info section if it is present.
This immediately suggests two possible ways to add the information, either add the symbol to .symtab or generate your own DWARF info including the symbol.
However, generating DWARF from scratch seems to be a really uncommon use case, so the only working approach I've found so far is to use objcopy to add the symbol to the binary itself:
objcopy a.out --add-symbol function_name=.text:0x900,function,global a.out2
Note that gdb doesn't like absolute symbols for functions; I had to specify the symbol as an offset into the .text section for it to be useful (i.e., to be able to set breakpoints on the function and have it appear in backtraces).
Also, I wasn't able to find any way to modify the "size" field of the symbol.
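If you end up naming many functions, the command line gets tedious to maintain by hand. A small sketch that generates it from a table, assuming objcopy accepts --add-symbol more than once in a single invocation (my reading of the binutils manual). NamedFunction and objcopy_command are hypothetical names for this example, and the symbol names are not shell-escaped.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record: a name chosen during reverse engineering plus the
// function's offset inside the .text section.
struct NamedFunction {
    std::string name;
    std::uint64_t text_offset;
};

// Builds (but does not run) an objcopy invocation in the same form as the
// command above, with one --add-symbol option per function.
std::string objcopy_command(const std::string& input, const std::string& output,
                            const std::vector<NamedFunction>& functions) {
    std::string cmd = "objcopy " + input;
    for (const NamedFunction& f : functions) {
        char offset[32];
        std::snprintf(offset, sizeof offset, "0x%llx",
                      static_cast<unsigned long long>(f.text_offset));
        cmd += " --add-symbol " + f.name + "=.text:" + offset + ",function,global";
    }
    return cmd + " " + output;
}
```

With a single entry {"function_name", 0x900}, input a.out and output a.out2, this produces exactly the command shown above.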
I wouldn't look for a solution in gdb. I would instead try to figure out how to put the symbols back into the binary. Logically, if it is possible to strip the symbols, then it must be possible to add them back. I'd expect the linker (ld) or some other tool to allow that.
I recommend checking all the tools in the binutils package (objdump, objcopy, nm, ld, ...); they are capable of many almost miraculous things!
Tomas

Getting pointer to bottom of the call stack and resolving symbol by address (like dladdr) under Windows?

I want to implement an analog of the backtrace utility under Windows in order to, for example, add this information to exceptions.
I need to capture return addresses and then translate them into symbol names.
I'm aware of StackWalk64 and of the StackWalker project, but unfortunately they have several important drawbacks:
StackWalk64 is known to be very slow, and I don't want to waste much time collecting a trace that basically can be done as fast as walking a linked list.
The function StackWalk64 is known not to be thread safe.
I want to support only x86 and possibly x86_64 architectures.
The basic idea I have is the following:
Walk the stack using the esp/ebp registers, similarly to what GCC's __builtin_return_address(x)/__builtin_frame_address(x) do, until I reach the bottom of the stack (this is what glibc does).
Translate addresses to symbols.
Demangle them.
Problems/Questions:
How do I know that I have reached the top of the stack? For example, the glibc implementation has __libc_stack_end, so it is easy to find where to stop. Is there any analog of such a thing under Windows? How can I get the stack bottom address?
What are the analogs of the dladdr functionality? I know that, unlike ELF platforms, which keep most symbol names, the PE format does not, so the debug information has to be read somehow. Any ideas?
Capturing Stack Trace: RtlCaptureStackBackTrace
Getting Symbols: Using DBG Help library (MSVC only). Key functions:
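// Initialization (link against dbghelp.lib, include <dbghelp.h>)
HANDLE hProcess = GetCurrentProcess();
SymSetOptions(SYMOPT_DEFERRED_LOADS);
SymInitialize(hProcess, NULL, TRUE);
// Fetching symbol (fills a SYMBOL_INFO structure for the given address)
SymFromAddr(...)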
Implementation can be found there
You use StackWalk but resolve symbols later.