I have a backtrace in GDB that has frames pointing to a .so object which I compiled myself from source files. How can I make GDB frames point to source files instead of the .so object?
To illustrate, I would like this:
#0 FuncA (...) at path/to/fileA.c:741
#1 0x00000000004e01c6 in FuncB (...) from path/to/mylib.so
#2 0x0000000000436c75 in FuncC (...) from path/to/mylib.so
to turn to this:
#0 FuncA (...) at path/to/fileA.c:741
#1 0x00000000004e01c6 in FuncB (...) at path/to/fileB.c:123
#2 0x0000000000436c75 in FuncC (...) at path/to/fileC.c:234
I figured it out while typing the above question: I had to compile mylib.so with debug information, using the -ggdb3 flag (plain -g would also provide file and line information).
So I am experiencing this really weird behavior of gdb on Linux (KDE Neon 5.20.2):
I start gdb and load my executable using the file command:
(gdb) file mumble
Reading symbols from mumble...
As you can see it did find debug symbols. Then I start my program (using start) which causes gdb to pause at the entry to the main function. At this point I can also print out the back trace using bt and it works as expected.
If I now continue my program and interrupt it at any point during startup, I can still display the backtrace without issues. However, if I trigger something in my application that runs in a thread other than the startup thread (startup happens entirely in thread 1) and interrupt my program there, gdb can no longer display the stack trace properly. Instead it gives
(gdb) bt
#0 0x00007ffff5bedaff in ?? ()
#1 0x0000555556a863f0 in ?? ()
#2 0x0000555556a863f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x0000000100000001 in ?? ()
#5 0x00007fffec005000 in ?? ()
#6 0x00007ffff58a81ae in ?? ()
#7 0x0000000000000000 in ?? ()
which shows that it can't find the respective debug symbols.
I compiled my application with cmake (gcc) using -DCMAKE_BUILD_TYPE=Debug. I also ensured that a bunch of debug symbols are present in the binary using objdump --debug mumble (which also printed a few lines of objdump: Error: LEB value too large, but I'm not sure whether this is related to the problem I am seeing).
While playing around with gdb, I also encountered the error
Cannot find user-level thread for LWP <SomeNumber>: generic error
a few times, which makes me suspect that there is indeed some issue involving threads here...
Finally, I tried starting gdb and running set verbose on before loading my binary, which yields
(gdb) set verbose on
(gdb) file mumble
Reading symbols from mumble...
Reading in symbols for /home/user/Documents/Git/mumble/src/mumble/main.cpp...done.
This also looks suspicious to me, as only main.cpp is explicitly listed here (even though the project has many, many more source files). I should also note that all the successful backtraces I am able to produce (as described above) originate from main.cpp.
I am honestly a bit clueless as to what might be the root issue here. Does someone have an idea what could be going on? Or how I could investigate further?
Note: I also tried using clang as a compiler but the result was the same.
Used program versions:
cmake: 3.18.4
gcc: 9.3.0
clang: 10.0.0
make: 4.2.1
I'm debugging a SIGSEGV error on a huge application running on Yocto/ARM64 (iMX8QM).
If I run the application in GDB, I can get the backtrace:
Thread 1 "HmiAppCentral" received signal SIGSEGV, Segmentation fault.
0x0000000000b0a0d0 in kanzi::Node3D::~Node3D() ()
(gdb) bt
#0 0x0000000000b0a0d0 in kanzi::Node3D::~Node3D() ()
#1 0x0000000000cd4e44 in kanzi::Model3D::~Model3D() ()
#2 0x0000000000b09c38 in kanzi::Node3D::removeChild(unsigned long) ()
[...]
Then I export the core dump, quit GDB and restart it:
(gdb) generate-core-file
warning: target file /proc/2279/cmdline contained unexpected null characters
[...]
gdb -c core.2279
Then GDB is not able to print the backtrace anymore:
(gdb) bt full
#0 0x0000000000b0a0d0 in ?? ()
No symbol table info available.
#1 0x0000000000000001 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
The address of the first frame is correct (0x0000000000b0a0d0); however, GDB is not able to find the function name when reloading the core dump. Any hints?
Just as when the OS creates a core file, the original program executable is not included in the core file itself, and it is this executable that contains the debug information (or allows GDB to find the debug information).
This means that if you want to debug with the debug information, you need to give GDB both the executable and the core file, so something like:
gdb my_program.exe -c core.pid
I am having problems debugging a multi-threaded C++ application on ARMv7 targets. The issue shows up on two different ARM targets, and I use different toolchains for them:
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I've checked some threads, but (since I have the same issue with a minimalistic multithreading program) it seems that I don't have
* a corrupted stack
* any issues with virtual functions or function pointers
Mostly I'm using the target Toradex Colibri iMX6 which has an Angstrom Linux 2016.12 running on it.
Questions
Is there something wrong with how I build the program?
Is there something wrong with how I'm using gdbserver / gdb?
Which options do I have to fix the debugger output?
I debug via gdbserver on the target and the toolchain's arm-linux-gnueabihf-gdb on my host.
There's no native gdb for any of the targets.
I can build the application for Linux x86, but can't reproduce the bug so far on the PC.
SW-problem
It seems that two of the threads are getting stuck, maybe due to a deadlock of two mutexes, or a thread trying to get one mutex a second time
(although that seems unlikely, the bug showed up after I've configured a mutex as recursive; I'll have to check for a second mutex used in that thread).
All other threads seem to keep running fine.
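To illustrate the relocking hypothesis above: a recursive mutex lets the owning thread acquire it a second time, whereas a normal mutex does not. The following is only a minimal sketch and not the application's code. The function name relock_from_same_thread is hypothetical, and pthread_mutex_trylock is used for the second acquisition so that the demonstration reports an error code instead of hanging.

```c
#define _XOPEN_SOURCE 700  /* exposes PTHREAD_MUTEX_RECURSIVE under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: create a mutex of the given type, lock it, then try to
   acquire it a second time from the same thread.
   Returns 0 if the second acquisition succeeds, otherwise the error code
   reported by pthread_mutex_trylock (EBUSY when the mutex is already held). */
int relock_from_same_thread(int mutex_type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, mutex_type);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mutex);      /* first acquisition */
    assert(rc == 0);
    rc = pthread_mutex_trylock(&mutex);   /* second acquisition, never blocks */
    if (rc == 0)
        pthread_mutex_unlock(&mutex);     /* undo the second acquisition */
    pthread_mutex_unlock(&mutex);         /* undo the first acquisition */
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Called with PTHREAD_MUTEX_RECURSIVE the helper should return 0, and with PTHREAD_MUTEX_NORMAL it should return EBUSY. With a blocking pthread_mutex_lock in place of the trylock, the normal case would deadlock, which is the situation suspected here. Build with -pthread.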
SW-build and debug configuration
Build settings:
I'm using a toolchain provided by Toradex with arm-linux-gnueabihf-g++ and
-std=c++11 -Wall -Werror -Wextra -Wno-unused-result -Winit-self -Wmissing-include-dirs -Wpointer-arith -Wno-format-security -Wno-implicit-fallthrough -Wl,-Map=output.map -ggdb -g3 -fno-inline -O0
I pass the same program to the debuggers (i.e. to gdbserver on the target and to arm-linux-gnueabihf-gdb on the host)
(gdb) set sysroot </path/to/libs>
(gdb) file <binary>
(gdb) target remote IP:port
shared libraries:
For shared libraries, I've copied the /usr/lib and /lib from the target to the host. I've then downloaded the debug libraries which are available for the target/distribution and replaced the original shared libs with those.
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x76fcf800 0x76feaa70 Yes /path/to/libs/lib/ld-linux-armhf.so.3
0x76fb9700 0x76fbcd2c Yes /path/to/libs/lib/librt.so.1
0x76f940c0 0x76fa2e0c Yes /path/to/libs/lib/libpthread.so.0
0x76f01630 0x76f72a10 Yes (*) /path/to/libs/usr/lib/libstdc++.so.6
0x76e14d38 0x76e48028 Yes /path/to/libs/lib/libm.so.6
0x76e041b0 0x76e0e7ec Yes /path/to/libs/lib/libgcc_s.so.1
0x76cd1000 0x76dc2b10 Yes /path/to/libs/lib/libc.so.6
0x7449c96c 0x744a29e4 Yes /path/to/libs/lib/libnss_files.so.2
(*): Shared library is missing debugging information.
I could not find a debug library for libstdc++.so.6.
Debugging results
Debug simple single-threaded application with crash on target:
works, i.e. does not report the error message from above
Debug simple multi-threaded application, with or without deadlock, on target:
(gdb) bt
#0 0x76d6cd44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Debug the same simple multi-threaded application, with or without deadlock, on Linux-x86:
works
Debug buggy application on PC:
seems to work, but we cannot reproduce the bug so far
Debug the affected application on target:
Thread 1 received signal SIGINT, Interrupt.
0x76f9facc in __lll_robust_lock_wait (futex=0x257b94 <namespace1::function()::su_place+20>, private=0)
at /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c:46
46 /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c: No such file or directory.
(gdb) thread apply all bt
Thread 6 (Thread 6606.6630):
#0 0x76d832c8 in __setreuid (ruid=8, euid=0)
at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/i386/setreuid.c:29
#1 0x7efff06c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 5 (Thread 6606.6629):
#0 0x76d55d44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 4 (Thread 6606.6628):
#0 0x76d55d44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 6606.6627):
#0 0x76d55d44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 6606.6626):
#0 __lll_robust_lock_wait (
futex=0x25b950 <namespace_2::a_function()::a_static_member+152>, private=128)
at /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c:31
#1 0x00000080 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 6606.6606):
#0 0x76f9facc in __lll_robust_lock_wait (futex=0x257b94 <namespace1::function()::su_place+20>,
private=0) at /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c:46
#1 0x00000002 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Update
I was able to find the bug (mutex deadlock) using valgrind with the PC-build of the SW.
However, this question is about the problems with gdb, which I have not yet been able to understand or solve.
"I've then downloaded the debug libraries which are available for the target/distribution and replaced the original shared libs with those."
This is possibly the wrong thing to do (depending on what exactly you mean by "debug libraries"), and may be contributing to your problem. See this answer.
As a first step, I would use the exact same libraries that you are using on the target, and check whether that changes the behavior of GDB.
It seems that gdb fails to find the code position of an assertion failure after I recompile my code. More precisely, I expect the position where the signal is raised for the assertion failure to be reported as
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
while instead I obtain
0x00007ffff7a5ff00 in ?? ()
For instance, consider the following code
#include <assert.h>
int main()
{
assert(0);
return 0;
}
compiled with debug symbols and debugged with gdb.
> gcc -g main.c
> gdb a.out
On the first run of gdb, the position is found, and the backtrace is reported correctly:
GNU gdb (Gentoo 8.0.1 p1) 8.0.1
...
(gdb) r
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
#1 0x00007ffff7a61baa in abort () from /lib64/libc.so.6
#2 0x00007ffff7a57cb7 in ?? () from /lib64/libc.so.6
#3 0x00007ffff7a57d72 in __assert_fail () from /lib64/libc.so.6
#4 0x00005555555546b3 in main () at main.c:5
(gdb)
The problem comes when I recompile the code. After recompiling, I issue the run command in the same gdb instance. Gdb re-reads the symbols, starts the program from the beginning, but does not find the right position:
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
`/home/myself/a.out' has changed; re-reading symbols.
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in ?? ()
(gdb) bt
#0 0x00007ffff7a5ff00 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) up
Initial frame selected; you cannot go up.
(gdb) n
Cannot find bounds of current function
At this point the debugger is unusable. One can neither go up nor step forward.
As a workaround, I can manually reload the file, and positions are found again.
(gdb) file a.out
Load new symbol table from "a.out"? (y or n) y
Reading symbols from a.out...done.
(gdb) r
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
(gdb)
Unfortunately, after reloading the file this way, gdb fails to reset the breakpoints.
ERRATA CORRIGE: I was experiencing failure in resetting the breakpoints using gdb 7.12.1. After upgrading to 8.0.1 the problem vanished. Supposedly, this was related to the bugfix https://sourceware.org/bugzilla/show_bug.cgi?id=21555. However, code positions where assertions fail still cannot be found correctly.
Does anybody have any idea about what is going on here?
This has started happening after a system update. The system update recompiled all system libraries, including the glibc, as position independent code, i.e., compiled with -fPIC.
Also, the version of the gcc I am using is 6.4.0
Here is a workaround. Since file re-reads the symbols correctly while run does not, we can define a hook for the run command so that file is executed first:
define hook-run
pi gdb.execute("file %s" % gdb.current_progspace().filename)
end
After you change the source file and recompile, you are generating a different file from the one loaded into GDB.
You need to stop the running debug session and reload the file.
In my experience, you can't carry the previously defined breakpoints and watchpoints over to a changed source, since GDB implements breakpoints by patching instructions into the program's code at specific addresses, and those addresses shift when the code changes.
If you change the source, the behavior is undefined and you need to set those breakpoints again.
You can refer to the GDB manual on saving breakpoints to a file, as
Mark Plotnick suggested, but in my experience it won't work if you change the file:
https://sourceware.org/gdb/onlinedocs/gdb/Save-Breakpoints.html
I want to load a shared library (.so) from gdb. I found this command:
(gdb) call dlopen("path/to/lib.so",..)
But it doesn't work, even though I link my program with -ldl.
The error I get is:
No symbol "dlopen" in current context
What did I miss?
I found a command that solves half of the problem. To explain:
First, you should load the shared object into the process:
(gdb) set environment LD_PRELOAD /usr/lib/libdl.so
After that, we specify the file to debug:
(gdb) file pgm
For testing, we put a breakpoint in main, i.e.
(gdb) break main
Now, we run the program
(gdb) run
and we call dlopen
(gdb) call dlopen("/path/to/lib.so",2)
Up to this point it works, but when I enter the last command, I get:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7de7f09 in ?? () from /lib64/ld-linux-x86-64.so.2
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(_gdb_expr) will be abandoned.
When the function is done executing, GDB will silently stop.
Nothing changes when I set unwindonsignal to on or off.
What did I forget this time?
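For reference, the literal 2 passed to dlopen in the GDB call above is the value of RTLD_NOW on Linux with glibc (an assumption about the platform in use). A minimal sketch of the same call made from C follows. It does not explain the segfault; it only shows the flag by name and how dlerror reports a failed load. The helper name open_library is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a shared object with immediate symbol resolution,
   as the GDB expression dlopen("/path/to/lib.so", 2) does on glibc.
   Returns the handle, or NULL after printing the dlerror() message.
   Passing NULL as the path yields a handle for the main program. */
void *open_library(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return handle;
}
```

On older glibc versions this needs to be linked with -ldl, as in the question. A handle returned by open_library should eventually be released with dlclose.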