I'm trying to debug a segfault in Android's surfaceflinger daemon on a custom-made ARM board. The process crashes before dumping the call stack and register content, including the program counter.
Normally I would've used objdump and searched for the program counter. The problem is that part of the call stack is in a shared library. Without using gdb, how can I correlate the program counter with a line in the source file? That is, can the addresses of shared library instructions be determined without running the program?
The simplest solution is to load the core dump into gdb and use info symbol <program counter address>, see https://stackoverflow.com/a/7648883/72178.
You can also use addr2line, but you will have to account for the library's load address: pass addr2line the program counter minus the library's start address rather than the raw program counter, see How to map function address to function in *.so files.
You need your program (and all the relevant shared libraries) to be compiled with debug information (in DWARF format), e.g. by passing some -g (or -g2 or -g3) flag to the GCC compiler when they are built. Notice that with GCC such a debugging option can be combined with optimization options like -O2.
Then you might use utilities like addr2line, or perhaps libraries like libbacktrace. FWIW, the GCC compiler itself (actually its cc1plus) uses that libbacktrace library to print a useful backtrace on SIGSEGV and other terminating signals (on compiler crashes).
BTW, you could (and probably should) enable core(5) dumping and do a post-mortem analysis of it with gdb.
Notice that due to ASLR, a shared library is loaded (actually mmap(2)-ed) at some "random" page.
Read Drepper's How to Write Shared Libraries paper.
I'm programming for an STM32 (Cortex-M3) with CodeSourcery G++ Lite (based on GCC 4.7.2), and I want the executables to be loaded dynamically.
I know I have two options available:
1. relocatable ELF, which needs an ELF parser.
2. position-independent code (PIC) with a global offset register
I prefer PIC with a global offset register, because it seems easier to implement and I'm not familiar with ELF or any ELF library. Also, it's easy to generate a .bin file from an ELF file with some tools.
I've tried building my program with the "-msingle-pic-base -fpic" compiler options and the "-pie" linker option, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the default standard C/C++ library can't be used with my options, and I would need to get the library's source and rebuild it for my own purpose.
So,
1. Could anyone provide any useful information or links on how to work with position-independent executables?
2. With the -msingle-pic-base option, I don't need to care too much about the GOT and the linker script anymore, right?
Note: Without the "-pie" linking option I can build the program, but it fails when calling a C++ virtual function (when I'm using the Keil IDE's simulator to debug my program). I don't understand what's going on or what I've been missing.
----------------------------------------------------------------------
-- added 20130314
with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
From my experiments, the register (r9 in my program) should point to the beginning of the .got.plt section. If I delete the "-pie" option, linking succeeds, and (with r9 properly set) the C++ virtual function is called successfully. However, I still think the "-pie" option is important, since it may ensure that the standard library in use is position independent. Could anyone explain this to me?
----------------------------------------------------------------------
-- added 20130315
I took a look at the ABI documents on ARM's website, but they were of little help because they don't target a specific platform. There seems to be a concept of EABI (I'm using Sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" on ARM's website, nor in Sourcery's or GCC's documentation. There is more than one implementation of PIC, so which one does Sourcery G++ use in the none-eabi case? I think the behaviors of the "-msingle-pic-base", "-fpie" and "-pie" options are very poorly documented!
-----------------------------------------------------------------------
From the disassembly, I just figured out that with "-msingle-pic-base", r9 should point to the base address of the ".got" section, the pointers in the .got section are absolute pointers, and variable addressing is similar to the description in the article Position Independent Code (PIC) in shared libraries. So I still need to modify the ".got" section at load time. I don't know what the ".got.plt" section is used for in my program. It seems that function calls use PC-relative addressing.
How to build with "-pie", or how to link against a standard library compiled with "-fpic", is still a problem for me.
The error message tells you to recompile the libstdc++ library, which is usually built together with the GCC compiler itself.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and all the rest) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of luck in the microcontroller world. If you build your compiler yourself (which, by the way, is not too difficult, but is an advanced/expert task), you are good to go.
It is also possible to compile your standard libraries yourself with the compiler you have. You will need the sources of the libraries, and you will have to figure out how the compiler package's build system builds them and mimic it. Perhaps some experts here can advise you on this route.
Eight years after the question was first asked, there is a nice blog post on this topic: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
- Set up GOT from linker-generated information
- Set up PLT from Program Header information
- Implement a binder based on the GOT entries
- Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC
- Set R9 accordingly
At a customer site, a third-party program has crashed. The process and the libraries are stripped (no symbols), and the call stack does not give any useful information. All I have are the registers, which may not be corrupted. The third-party code is written in C.
Until now I have used gdb only to debug simpler issues, but this one is a bit more complicated. I think the registers and the raw stack contents could be used to correlate where the crash occurred, and I need help with this.
It may not be possible to deploy a non-stripped binary at the customer site, nor to reproduce the crash in-house. Also, I am not familiar with this third-party code.
Also, I need pointers/sites/documents for the following:
1) ELF and various section headers.
2) How to create a symbol file (during compilation) for a library and a process.
3) How to tell gdb to read symbols from a symbol file.
One thing you should be able to do is open your core file against a non-stripped (with-symbols) version of your process. As long as the compilation process (compiler, optimization flags, etc.) is the same and you simply keep all the debugging information, GDB should be able to give you all the information you can expect from a core.
gdb [options] executable-file core-file
To compile your process with debugging information (symbols and DWARF for lines, types, ...), you need to add -g to your compiler flags. The same applies to your custom libraries.
For the system libraries it can sometimes (not always) be convenient: modern Linux distributions (at least Fedora) provide their debug information to gdb directly.
How do you usually get around this problem? Imagine that a thread crashes inside libc code (which is a system shared library) on Computer1 and then generates a coredump. But the Computer2 on which this coredump will be analysed might have a different version of libc.
So:
How important is it to have the same shared library on the remote computer? Will gdb correctly reconstruct the stack trace without exactly the same version of libc on Computer2?
How important is it to have the correct debug symbols for libc? Will gdb correctly reconstruct the stack trace without exactly the same debug symbols on Computer2?
And what is the "correct" way to avoid this debug-symbol mismatch problem for shared system libraries? It seems to me that there is no single solution that solves it elegantly. Can anyone share their experience?
It depends. On some processors, such as x86_64, correct unwind descriptors are required for GDB to properly unwind the stack. On such machines, analyzing a coredump with a non-matching libc will likely produce complete garbage.
You don't need debug symbols for libc to get the stack trace. You wouldn't get file and line numbers without debug symbols, but you should get correct function names (except when inlining has taken place).
The premise of your question is wrong -- debug symbols have nothing to do with this. The "correct" way to analyze coredump on C2, when that coredump was produced on C1, is to have a copy of C1's libraries (in e.g. /tmp/C1/lib/...) and direct GDB to use that copy instead of the C2's installed libc with
(gdb) set solib-absolute-prefix /tmp/C1
command.
Note: the above setting must be in effect before you load the core into GDB. This:
gdb exe core
(gdb) set solib-absolute-prefix /tmp/C1
will not work (core is read before the setting is in effect).
Here is the right way:
gdb exe
(gdb) set solib-absolute-prefix /tmp/C1
(gdb) core core
(I've tried to find a reference to this on the web, but didn't).
What are unwind descriptors?
Unwind descriptors are required when code is compiled without frame pointers (default for x86_64 in optimized mode). Such code does not save %rbp register, and so GDB needs to be told how to "step back" from current frame to the caller frame (this process is also known as stack unwinding).
Why isn't C1's libc.so included in the core?
The core file usually contains only the contents of writable segments of the program address space. The read-only segments (where executable code and unwind descriptors reside) are not usually necessary; you could just read them directly from libc.so on disk.
Except this doesn't work when you analyze C1's core on C2!
Some (but not all) operating systems allow one to configure "full coredumps", where the OS will dump read-only mappings as well, precisely so you can analyze core on any machine.
I've a program that implements a plugin system by dynamically loading a function from some plugin_name.so (as usual).
In turn, I have a static "helper" library (let's call it helper.a) whose functions are used both by the main program and by the main function in the plugin. They don't have to interoperate in any way; they are just helper functions for text manipulation and such.
This program, once started, cannot be reloaded or restarted; that's why I expect to get new "helper" functionality from the plugin, not from the main program.
So my question is: is it possible to force this "plugin function code" in the .so to use (statically link against?) a different (perhaps newer) version of "helper" than the main program?
How could this be done? Perhaps by statically linking or otherwise adding helper.a to plugin_name.so?
Nick Meyer's answer is correct on Windows and AIX, but is unlikely to be correct on every other UNIX platform by default.
On most UNIX platforms, the runtime loader maintains a single name space for all symbols, so if you define foo_helper in a.out, and also in plugin.so, and then call foo_helper from either, the first definition visible to the runtime loader (usually that from a.out) is used by default for both calls.
In addition, the picture is complicated by the fact that foo_helper may not be exported from a.out (and thus may be invisible to the runtime loader) unless you use the -rdynamic flag, or some other shared library references it. In other words, things may appear to work as Nick described them, then you add a shared library to the a.out link line, and they don't work that way anymore.
On ELF platforms (such as Linux), you have great control over symbol visibility and binding. See description of -fvisibility=hidden and -rdynamic in GCC man page, and also -Bsymbolic in linker man page.
Most other UNIX platforms have some way to control symbol bindings as well, but this is necessarily platform-specific.
If your main program and dynamic library both statically link to helper.a, then you shouldn't need to worry about mixing versions of helper.a (as long as you don't do things like pass pointers allocated in helper.a across the .exe/.so boundary).
The code required from the helper.a is inserted to the actual binary when you link against it. So when you call into helper.a from the .exe, you will be executing code from the code segment of your executable image, and when you call into helper.a from the .so, you will be executing code from the portion of the address space where the .so was loaded. Even if you're calling the same function inside helper.a, you're calling two different 'instances' of that function depending on whether the call was made from the .exe or the .so.
I think this question is the same as yours: How to force symbols from a static library to be included in a shared library build?
The --whole-archive linker option should do this (the archive's objects must have been compiled with -fPIC). You'd use it as e.g.
gcc -shared -o libmyshared.so foo.o -lanothersharedlib -Wl,--whole-archive -lmystaticlib -Wl,--no-whole-archive
and it works for me.
I'm trying to understand how a certain library works. I've compiled it with my added printfs and everything is great. Now I want to stop the example program at run time to look at the call stack, but I can't quite figure out how to do it with gdb. The function I want to break on is inside a shared library. I've reviewed a previous question here on SO, but the approach doesn't work for me. The language in question is C++. I've tried providing the file name and line number, but gdb refuses to understand that; it only lists the source files from the demo app.
Any suggestions?
You can do "break main" first. By the time you hit that, the shared library should be loaded, and you can then set a breakpoint in any of its routines.
There are two cases to consider (and your question doesn't make it clear which case you have):
- your executable is linked with the shared library directly:
this means that GDB will "see" the symbols (and sources) from shared library when you stop on main
- your executable dynamically loads the shared library (e.g. via dlopen):
in that case, GDB will not "see" your shared library until after dlopen completes.
Since you can't see the symbols when you stop at main, I am guessing you have the second case.
You can do "set stop-on-solib-events 1" at the (gdb) prompt, and GDB will stop every time a new shared library is loaded (or unloaded).
You can see which libraries GDB "knows" about via the info shared command.
Just wait until you see your target library in that list, before attempting to set breakpoints in it.
Check this out:
http://linux.die.net/man/1/ltrace
It will trace your library calls, which will probably be useful.
And strace does the same thing for system calls.
With that you should be able to find an entry point, and you could then set a breakpoint in GDB there (although I can't explain the details myself).