gdb only finds some debug symbols - c++

So I am experiencing this really weird behavior of gdb on Linux (KDE Neon 5.20.2):
I start gdb and load my executable using the file command:
(gdb) file mumble
Reading symbols from mumble...
As you can see it did find debug symbols. Then I start my program (using start) which causes gdb to pause at the entry to the main function. At this point I can also print out the back trace using bt and it works as expected.
If I now continue my program and interrupt it at any point during startup, I can still display the backtrace without issues. However if I do something in my application that happens in another thread than the startup (which all happens in thread 1) and interrupt my program there, gdb will no longer be able to display the stacktrace properly. Instead it gives
(gdb) bt
#0 0x00007ffff5bedaff in ?? ()
#1 0x0000555556a863f0 in ?? ()
#2 0x0000555556a863f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x0000000100000001 in ?? ()
#5 0x00007fffec005000 in ?? ()
#6 0x00007ffff58a81ae in ?? ()
#7 0x0000000000000000 in ?? ()
which shows that it can't find the respective debug symbols.
I compiled my application with cmake (gcc) using -DCMAKE_BUILD_TYPE=Debug. I also ensured that a bunch of debug symbols are present in the binary using objdump --debug mumble (Which also printed a few lines of objdump: Error: LEB value too large, but I'm not sure if this is related to the problem I am seeing).
While playing around with gdb, I also encountered the error
Cannot find user-level thread for LWP <SomeNumber>: generic error
a few times, which makes me suspect that there is indeed some issue involving threads here...
Finally, I tried running set verbose on in gdb before loading my binary, which yields
(gdb) set verbose on
(gdb) file mumble
Reading symbols from mumble...
Reading in symbols for /home/user/Documents/Git/mumble/src/mumble/main.cpp...done.
This also looks suspicious to me, as only main.cpp is explicitly listed here (even though the project has many, many more source files). I should also note that all the successful backtraces I am able to produce (as described above) originate from main.cpp.
I am honestly a bit clueless as to what might be the root issue here. Does someone have an idea what could be going on? Or how I could investigate further?
Note: I also tried using clang as a compiler but the result was the same.
Used program versions:
cmake: 3.18.4
gcc: 9.3.0
clang: 10.0.0
make: 4.2.1

Related

GDB is not able to read the core file it produced

I'm debugging a SIGSEGV error on a huge application running on Yocto/ARM64 (iMX8QM).
If I run the application in GDB, I can get the backtrace:
Thread 1 "HmiAppCentral" received signal SIGSEGV, Segmentation fault.
0x0000000000b0a0d0 in kanzi::Node3D::~Node3D() ()
(gdb) bt
#0 0x0000000000b0a0d0 in kanzi::Node3D::~Node3D() ()
#1 0x0000000000cd4e44 in kanzi::Model3D::~Model3D() ()
#2 0x0000000000b09c38 in kanzi::Node3D::removeChild(unsigned long) ()
[...]
Then I export the core dump, quit GDB and restart it:
(gdb) generate-core-file
warning: target file /proc/2279/cmdline contained unexpected null characters
[...]
gdb -c core.2279
Then GDB is not able to print the backtrace anymore:
(gdb) bt full
#0 0x0000000000b0a0d0 in ?? ()
No symbol table info available.
#1 0x0000000000000001 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
The address of the first frame is correct (0x0000000000b0a0d0), however GDB is not able to find the function name when reloading the core dump. Any hint?
Just like when the OS creates a core file, the original program executable is not included in the core file itself, and it is this executable that contains the debug information (or allows GDB to find the debug information).
What this means is, if you want to debug with the debug information then you need to provide both the executable and the core file, so something like:
gdb my_program.exe -c core.pid
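As a rough sketch of that invocation in script form, the hypothetical helper core_bt below checks that both files are readable before asking gdb for a backtrace in batch mode. The function name and the file names in the usage note are illustrative assumptions, not taken from the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdb): print a backtrace from a core file,
# giving gdb the executable as well so that it can find the symbols.
core_bt() {
    exe=$1
    core=$2
    # Refuse to start gdb if either file is missing or unreadable.
    [ -r "$exe" ] || { echo "cannot read executable: $exe" >&2; return 1; }
    [ -r "$core" ] || { echo "cannot read core file: $core" >&2; return 1; }
    # -batch exits after running the -ex commands; 'bt' prints the backtrace.
    gdb -batch -ex 'bt' "$exe" "$core"
}
```

It could be called as core_bt ./my_program core.2279, where ./my_program stands for the unstripped executable that produced the core.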

GDB reading symbols with "symbol-file" command on a core file

I am trying to analyze a segfault from a core file on Linux. I am not sure whether the following behavior is correct, so I deliberately caused a segfault using
#include <signal.h>
int main() {
raise(SIGSEGV);
}
The binary is built with debug info, i.e.
file mainTestFile
mainTestFile: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, with debug_info, not stripped
Notice how it says "with debug_info, not stripped" at the end.
When I execute the binary, a core file called core-mainTestFile.20474 is generated.
(In order to generate the core file I had to set my ulimit to unlimited, i.e.
ulimit -c unlimited
)
If I run only the binary under GDB and do a backtrace ("bt"), then I get the segfault and all the names of the functions involved are printed nicely. Notice how gdb says "reading symbols from ./mainTestFile...done." when starting:
gdb ./mainTestFile
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
....
reading symbols from ./mainTestFile...done.
(gdb) run
Starting program: /src/exe/mainTestFile
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
__GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000000000402dad in main (argc=1, argv=0x7fffffffda38) at /src/exe/main.cpp:53
(gdb)
However, if I try to analyze only the core file with gdb like this
gdb -c core-mainTestFile.20474
then when I execute "bt" I do not see the names of the methods; instead I get only question marks
(gdb) bt
#0 0x00007f34d8842e97 in ?? ()
#1 0x0000000000000000 in ?? ()
The only workaround I found is to supply the binary directly on the command line; then it all gets printed nicely.
Even if I try to tell GDB to use the symbols file and point it to the binary file, which does have the symbols,
i.e.
symbol-file /src/exe/mainTestFile
then GDB says
Reading symbols from /src/exe/mainTestFile...done
and when I execute bt I see the question marks again. Why is that? Is GDB not able to get the symbols out of the binary?
It only works if I supply the binary directly on the command line, like:
gdb /src/exe/mainTestFile -c core-mainTestFile.20474
My question is: should GDB be able to read the symbols of the binary when it is supplied via the "symbol-file" command or not? Why does this work when the binary is supplied directly on the command line? What is the difference?
should GDB be able to read the symbols of the binary when it is supplied via the "symbol-file" command or not?
In theory, using symbol-file and core-file commands in either order in GDB should be equivalent.
But there is a bug: symbol-file followed by core-file works, and the opposite order doesn't.
Since generally the end-user can always rearrange his commands into the order that works, this has never propagated to the top of any GDB developer's queue of things to fix.
Related bug (but not an exact duplicate).
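To make the working order explicit, here is a small sketch that writes the commands into a gdb command file, with symbol-file ahead of core-file as described in the answer above. The helper name write_core_script and the script file name in the usage note are made up for illustration; the paths are the ones from the question. Paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: emit gdb commands in the order reported to work,
# i.e. symbol-file first, core-file second, then the backtrace.
write_core_script() {
    exe=$1
    core=$2
    printf 'symbol-file %s\n' "$exe"
    printf 'core-file %s\n' "$core"
    printf 'bt\n'
}

# Show the generated commands for the paths used in the question.
write_core_script /src/exe/mainTestFile core-mainTestFile.20474
```

Redirecting the output to a file such as core.gdb and running gdb -batch -x core.gdb should then replay the commands in that order.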

Gdb cannot find assertion failure positions after recompiling

It seems that gdb fails to find the code position of an assertion failure after I recompile my code. More precisely, I expect the position of the signal raised by an assertion failure to be
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
while instead I obtain
0x00007ffff7a5ff00 in ?? ()
For instance, consider the following code
#include <assert.h>
int main()
{
assert(0);
return 0;
}
compiled with debug symbols and debugged with gdb.
> gcc -g main.c
> gdb a.out
On the first run of gdb, the position is found, and the backtrace is reported correctly:
GNU gdb (Gentoo 8.0.1 p1) 8.0.1
...
(gdb) r
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
#1 0x00007ffff7a61baa in abort () from /lib64/libc.so.6
#2 0x00007ffff7a57cb7 in ?? () from /lib64/libc.so.6
#3 0x00007ffff7a57d72 in __assert_fail () from /lib64/libc.so.6
#4 0x00005555555546b3 in main () at main.c:5
(gdb)
The problem comes when I recompile the code. After recompiling, I issue the run command in the same gdb instance. Gdb re-reads the symbols, starts the program from the beginning, but does not find the right position:
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
`/home/myself/a.out' has changed; re-reading symbols.
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in ?? ()
(gdb) bt
#0 0x00007ffff7a5ff00 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) up
Initial frame selected; you cannot go up.
(gdb) n
Cannot find bounds of current function
At this point the debugger is unusable. One cannot go up or step forward.
As a workaround, I can manually reload the file, and positions are found again.
(gdb) file a.out
Load new symbol table from "a.out"? (y or n) y
Reading symbols from a.out...done.
(gdb) r
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
(gdb)
Unfortunately, after reloading the file this way, gdb fails to reset the breakpoints.
ERRATA CORRIGE: I was experiencing failure in resetting the breakpoints using gdb 7.12.1. After upgrading to 8.0.1 the problem vanished. Supposedly, this was related to the bugfix https://sourceware.org/bugzilla/show_bug.cgi?id=21555. However, code positions where assertions fail still cannot be found correctly.
Does anybody have any idea about what is going on here?
This has started happening after a system update. The system update recompiled all system libraries, including the glibc, as position independent code, i.e., compiled with -fPIC.
Also, the version of the gcc I am using is 6.4.0
Here is a workaround. Since file re-reads the symbols correctly while run does not, we can define a hook for the run command so that file is executed first:
define hook-run
pi gdb.execute("file %s" % gdb.current_progspace().filename)
end
After you change the source file and recompile, you are generating a different file from the one loaded into GDB.
You need to stop the running debug session and reload the file.
You can't carry the previously defined breakpoints and watchpoints over to a changed source, since gdb actually inserts breakpoint instructions into the loaded program to support breakpoints and register handlers.
If you change the source, the behavior is undefined and you need to reset those breakpoints.
You can refer to the gdb manual regarding saving breakpoints in a file, as Mark Plotnick suggested, but it won't work if you change the file (in my experience):
https://sourceware.org/gdb/onlinedocs/gdb/Save-Breakpoints.html

How to set earliest possible breakpoint

I'm trying to stop right after the module is loaded in gdb. Let's assume that the binary is completely stripped of all symbol information, so there's no main.
Ideally I'd set the breakpoint on the entry point, but that idea breaks down due to relocations:
(gdb) info target
Symbols from "./application".
Local exec file:
`./application', file type elf64-x86-64.
Entry point: 0xc154
...
(gdb) break *0xc154
Breakpoint 1 at 0xc154
(gdb) r
Starting program: ./application
Warning:
Cannot insert breakpoint 1.
Error accessing memory address 0xc154: Input/output error.
(gdb) info target
Symbols from "./application".
Unix child process:
Using the running image of child process 22835.
While running this, GDB does not access memory from...
Local exec file:
`./application', file type elf64-x86-64.
Entry point: 0x555555560154
Even though that kind-of works (I could set a new breakpoint on the new address and disable the original), it cannot be easily executed via gdb script / batch mode, because it has a failing instruction in the middle.
Is there a way to do that? Ideally something like "run single instruction", rather than "run" would be useful.
Update:
GDB 8.1 implemented the starti command, which makes this very easy.
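A minimal sketch of how starti could be used from batch mode, assuming GDB 8.1 or later is installed as gdb. The wrapper name stop_at_first_insn is hypothetical; it stops at the first instruction of the new process and prints the target information while the process exists.

```shell
#!/bin/sh
# Hypothetical wrapper: stop at the very first instruction of the process
# (starti, GDB 8.1+) and print the target info while the process is alive.
stop_at_first_insn() {
    bin=$1
    # Bail out early instead of letting gdb fail on a bad path.
    [ -x "$bin" ] || { echo "not an executable file: $bin" >&2; return 1; }
    gdb -batch -ex 'starti' -ex 'info target' "$bin"
}
```

For the binary in the question this would be stop_at_first_insn ./application. Unlike the break-then-run attempt, no command in the sequence is expected to fail partway through.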
Entry point: 0xc154
This is a dynamically-linked, position-independent (PIE) binary.
You want to stop in the dynamic linker after that binary is loaded and relocated, but before it executed anything.
(gdb) set stop-on-solib-events 1
(gdb) run
Starting program: /tmp/a.out
Stopped due to shared library event (no libraries added or removed)
(gdb) info target
Symbols from "/tmp/a.out".
Unix child process:
Using the running image of child process 13746.
While running this, GDB does not access memory from...
Local exec file:
`/tmp/a.out', file type elf64-x86-64.
Entry point: 0x5555555545f0
...
(gdb) bt
#0 __GI__dl_debug_state () at dl-debug.c:77
#1 0x00007ffff7ddd488 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=0x7ffff7ffe870) at rtld.c:1678
#2 0x00007ffff7defb24 in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7ddc6e0 <dl_main>) at ../elf/dl-sysdep.c:244
#3 0x00007ffff7ddf365 in _dl_start_final (arg=0x7fffffffe440) at rtld.c:338
#4 _dl_start (arg=0x7fffffffe440) at rtld.c:564
#5 0x00007ffff7ddb6b8 in _start () from /lib64/ld-linux-x86-64.so.2
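Since the question asks for something that works from a gdb script or in batch mode, the interactive commands above can be sketched as a command file. The helper name write_solib_script and the file name solib.gdb in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit the gdb commands from the answer above so they
# can be replayed non-interactively with `gdb -batch -x FILE PROGRAM`.
write_solib_script() {
    # Stop whenever the dynamic linker reports a shared library event.
    printf 'set stop-on-solib-events 1\n'
    # Start the program; it stops at the first such event.
    printf 'run\n'
    # The process exists now, so the entry point of the running image is shown.
    printf 'info target\n'
}

write_solib_script
```

Saving the output as solib.gdb and running gdb -batch -x solib.gdb /tmp/a.out should reproduce the session shown above without a failing breakpoint insertion in the middle.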

gdb no symbol table loaded for core file

There was a core dump produced at the customer end for my application and while looking at the backtrace I don't have the symbols loaded...
(gdb) where
#0 0x000000364c032885 in ?? ()
#1 0x000000364c034065 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb) bt full
#0 0x000000364c032885 in ?? ()
No symbol table info available.
#1 0x000000364c034065 in ?? ()
No symbol table info available.
#2 0x0000000000000000 in ?? ()
No symbol table info available.
One thing I want to mention here is that the application is built with the -g option.
To me it seems that the required libraries are not being loaded. I tried to load the libraries manually using "symbol-file", but this doesn't help.
What could be the possible issue?
No symbol table info available.
Chances are you invoked GDB incorrectly. Don't do this:
gdb core
gdb -c core
Do this instead:
gdb exename core
Also see this answer for what you'll likely have to do to get meaningful crash stack trace for a core from customer's machine.
I was facing a similar issue and later found out that I was missing the -g option. Make sure you have compiled the binary with -g.
This happens when you run gdb with path to executable that does not correspond to the one that produced the core dump.
Make sure that you provide gdb with the correct path.
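One quick sanity check before blaming gdb is whether the binary carries debug information at all. The sketch below assumes the file utility prints "with debug_info" for such binaries, as in the output quoted in an earlier question on this page; the helper name reports_debug_info is hypothetical, and older versions of file may not print this marker.

```shell
#!/bin/sh
# Hypothetical helper: read `file` output on stdin and succeed only if it
# mentions debug info.
reports_debug_info() {
    grep -q 'with debug_info'
}

# Example with the output line quoted earlier on this page (shortened here).
if echo 'mainTestFile: ELF 64-bit LSB executable, x86-64, with debug_info, not stripped' | reports_debug_info; then
    echo 'debug info present'
else
    echo 'no debug info reported; rebuild with -g'
fi
```

In practice the check would be run as file ./exename | reports_debug_info against the exact executable that produced the core.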