Large core dump with no function names in GDB - c++

I'm fairly new to debugging core dumps on Linux, and I'm running into a weird issue. Hoping to get some suggestions.
We're getting occasional crashes on our game servers running on AWS Linux boxes. I set up the boxes to generate core dumps. Often, the dumps are around a few hundred MB -- roughly the size of the program in memory. These I'm able to load in gdb and seemingly get a valid backtrace.
But frequently, we're getting dumps that are multiple GB in size. Usually, when I load these core dumps in gdb, there's no usable info in the backtrace.
Here's an example output:
> gdb AAPGOrbis core.3871
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from AAPGOrbis...Reading symbols from <path to>/AAPGOrbis.dbg...done.
done.
[New LWP 3871]
[New LWP 3877]
[New LWP 6557]
[New LWP 3876]
[New LWP 6558]
[New LWP 6559]
warning: Error reading shared library list entry at 0x302e6f732e646165
warning: Error reading shared library list entry at 0x74756d5f64616572
Core was generated by `/opt/aapg/Binaries/Linux/AAPGOrbis aaentry?game=AAGame.AAGamePreGameLobbyDedica'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fed61d001f7 in ?? ()
(gdb) bt full
#0 0x00007fed61d001f7 in ?? ()
No symbol table info available.
#1 0x00007fed61d018e8 in ?? ()
No symbol table info available.
#2 0x0000000000000020 in ?? ()
No symbol table info available.
#3 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
Any ideas as to what might be causing this? I'm wondering if the size of the core dumps coupled with the lack of valid data is indicative of some really bad memory corruption.
Any insight would be greatly appreciated!

warning: Error reading shared library list entry at 0x302e6f732e646165
warning: Error reading shared library list entry at 0x74756d5f64616572
GDB is attempting to read the list of loaded shared libraries from a clearly bogus address (both of these addresses are ASCII strings ead.so.0read_mut).
The most frequent cause is that you have given GDB the wrong binary: the AAPGOrbis that you give GDB must be exactly the same binary as the one that crashed.
Another possibility is that the shared library list (which is in heap) has indeed been corrupted by the program running amok.

Related

Why is CrashDump.exe hanging? How to go about debugging?

I'm using the CrashCatcher/CrashDebug code from https://github.com/adamgreen/CrashDebug.
My platform is an STM32H753 running freertos. I've been able to generate a dump file based on the HexDump.c example in CrashCatcher. So far so good. The problem arises when I try to run arm-none-eabi-gdb.exe and connect it to CrashDebug.exe. I'm running on windows 10 and using the stm32 tool chain. I built CrashDebug.exe locally using mingw. I run the following command line:
C:\ST\STM32CubeIDE_1.6.1\STM32CubeIDE\plugins\com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.9-2020-q2-update.win32_2.0.0.202105311346\tools\bin\arm-none-eabi-gdb.exe "fw-10143-sensor-hub-stm32-firmware\Debug\THOR_1.5_Sensor_Hub_FW.elf" -ex "set target-charset ASCII" -ex "target remote | /Users/felix/source/Thor/CrashDebug.exe --elf C:\Users\felix\source\Thor\fw-10143-sensor-hub-stm32-firmware\Debug\THOR_1.5_Sensor_Hub_FW.elf --dump C:\Users\felix\source\Thor\crash.dmp"
This runs gdb but the command interpreter becomes unresponsive and never returns to the gdb prompt. I can not even ctrl-c to break. I have to kill the process. Here is what gdb outputs before the hang.
C:\ST\STM32CubeIDE_1.6.1\STM32CubeIDE\plugins\com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.9-2020-q2-update.win32_2.0.0.202105311346\tools\bin\arm-none-eabi-gdb.exe: warning: Couldn't determine a path for the index cache directory.
GNU gdb (GNU Tools for STM32 9-2020-q2-update.20201001-1621) 8.3.1.20191211-git
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-w64-mingw32 --target=arm-none-eabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from fw-10143-sensor-hub-stm32-firmware\Debug\THOR_1.5_Sensor_Hub_FW.elf...
Remote debugging using | /Users/felix/source/Thor/CrashDebug.exe --elf C:\Users\felix\source\Thor\fw-10143-sensor-hub-stm32-firmware\Debug\THOR_1.5_Sensor_Hub_FW.elf --dump C:\Users\felix\source\Thor\crash.dmp
I'd expect the gdb prompt to come back so I can get a stack trace out.
I tried running an instance of the mingw gdb and connecting to the running CrashDebug.exe process and halting it. When I do this I get the following stack trace.
(gdb) symbol-file c:/users/felix/source/Thor/CrashDebug/bins/win32/CrashDebug.exe
Load new symbol table from "c:\users\felix\source\Thor\CrashDebug\bins\win32\CrashDebug.exe"? (y or n) y
Reading symbols from c:\users\felix\source\Thor\CrashDebug\bins\win32\CrashDebug.exe...done.
(gdb) thread apply all bt
Thread 2 (Thread 3240.0x2804):
#0 0x77154d11 in ntdll!DbgBreakPoint () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x7718dca9 in ntdll!DbgUiRemoteBreakin () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2 0xaa43fb20 in ?? ()
#3 0x7718dc70 in ntdll!DbgUiIssueRemoteBreakin () from C:\WINDOWS\SYSTEM32\ntdll.dll
#4 0x7556fa29 in KERNEL32!BaseThreadInitThunk () from C:\WINDOWS\System32\kernel32.dll
#5 0x77147a7e in ntdll!RtlGetAppContainerNamedObjectPath () from C:\WINDOWS\SYSTEM32\ntdll.dll
#6 0x77147a4e in ntdll!RtlGetAppContainerNamedObjectPath () from C:\WINDOWS\SYSTEM32\ntdll.dll
#7 0x00000000 in ?? ()
Thread 1 (Thread 3240.0xfc4):
#0 0x77152a1c in ntdll!ZwWriteFile () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x7509f32c in WriteFile () from C:\WINDOWS\System32\KernelBase.dll
#2 0x00000134 in ?? ()
#3 0x00000000 in ?? ()
(gdb)
Not very helpful. I'm at a loss as to what to do next. Because of the way CrashDebug is launched I can not breakpoint and step through the code to see what is going wrong. Can anyone advise me as to how to get this working?
This issue has been resolved (very quickly) by the awesome author.
https://github.com/adamgreen/CrashDebug/issues/18

GDB with corefile on remote embedded device - How to get more information about backtrace?

I have a core dump from a C++ application running on an embedded imx6 board (yocto linux). I can put gdb on the box and run it in a terminal to examine the core file like so just fine:
gdb myApplication core.udpsrc256:src.1520419431.5526
I get extremely limited information, and really need to know more about what caused the core dump. All I have is a printout from the application:
(myApplication:5526): GLib-ERROR **: ../../glib-2.46.2/glib/gmem.c:100: failed to allocate 65611 bytes
./run-app.sh: line 8: 5526 Trace/breakpoint trap (core dumped) XDG_RUNTIME_DIR=/run/user/root ./myApplication
Also the core dump backtrace gives some useless stuff. I need to know more stuff up the stack that led to this frame:
#0 0x75ff1910 in raise () from /lib/libc.so.6
[Current thread is 1 (LWP 5533)]
(gdb)
(gdb)
(gdb) bt
#0 0x75ff1910 in raise () from /lib/libc.so.6
#1 0x6b169558 in g_logv () from /usr/lib/libglib-2.0.so.0
#2 0x6b169610 in g_log () from /usr/lib/libglib-2.0.so.0
#3 0x6b1681c4 in g_malloc () from /usr/lib/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Sidenote -- there is some warnings when I startup gdb:
GNU gdb (GDB) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-poky-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from qt5qmlvideo...done.
warning: exec file is newer than core file.
[New LWP 5533]
[New LWP 5526]
[New LWP 5531]
[New LWP 5528]
[New LWP 5534]
[New LWP 21064]
[New LWP 5536]
[New LWP 21065]
[New LWP 5532]
[New LWP 5527]
[New LWP 5530]
[New LWP 5537]
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `./qt5qmlvideo -platform wayland'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x75ff1910 in raise () from /lib/libc.so.6
[Current thread is 1 (LWP 5533)]
(gdb)
Can anyone help? Do I need some of the stuff gdb warns about... or can i rebuild my application and its dependencies in some other configuration that would give more output? Thank you!
Some more notes that may matter -
This is a multithreaded application running a gstreamer pipeline. Many gstreamer plugins generate their own threads, one of which in this pipeline is 'udpsrc'. I'm wondering if it's because this failure happens in one of those threads is the reason why I can't get details, but I want to know how to get it to show the details if possible!
(1)
The
Do you need "set solib-search-path" or "set sysroot"?
is a problem. Check the path (on your device) where linux-vdso.so.1 resides, and include that in the solib-search-path. Similarly for the other shared-object libraries that your program uses. E.g. if some shared-object libraries are in /lib, some are in /usr/adowdy/lib and some are in /usr/adowdy/arm/lib, you can say:
(gdb) set solib-search-path /lib:/usr/adowdy/lib:/usr/adowdy/arm/lib
(2) The
warning: Unable to find libthread_db matching inferior's thread
library, thread debugging will not be available.
is also a problem. See the answer to this question
(3) The
failed to allocate 65611 bytes
is a clue. Are you, by any chance, trying to allocate a negative number of bytes (maybe 65536 - 65611 = -75 bytes)?
Also the core dump backtrace gives some useless stuff.
It's not entirely useless. The stack trace, and the message from the application say the same thing: your application ran out of memory (malloc failed to allocate 65611 bytes).
While a more complete stack would tell you which particular call to g_malloc failed, it's very likely to not matter in practice -- if this g_malloc didn't fail, the next one would.
You likely have a memory leak, or are simply allocating too much memory for what your system allows.
You should look into many debugging tools built for solving this exact problem.

Opening core dump file with different executable but the same sources

I have a coredump file from a colleague's machine.
We both have the same sources for the program and the same third party *.so files (like libmysqlclient18 and several in-house ones).
The problem is that we both compile the software (from the same sources) independently and I want to use his core dump files for inspection with GDB on my machine.
When I try to load the core file and my executable into gdb I get:
user#ubuntu:/mnt/hgfs/share/dir$ gdb prog core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /mnt/hgfs/share/dir/prog...done.
warning: exec file is newer than core file.
[New LWP 4465]
[New LWP 4462]
[New LWP 4464]
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `./prog'.
Program terminated with signal 11, Segmentation fault.
#0 0x002d3706 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt full
#0 0x002d3706 in ?? () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#1 0x00000000 in ?? ()
No symbol table info available.
(gdb)
Is this possible for this scenario to work? (I compile the software with debugging symbols enabled, of course)
If not, what are the technical details I'm missing?
I know for sure that it wouldn't be possible if he or I would make modifications to the source, since then the executable would be different, but this is not the case, the third party *.so files and the sources, they all match.
UPDATE:
After installing libc6-dbg as user mkfs suggested in the comments, I get this in gdb:
user#ubuntu:/mnt/hgfs/share/dir$ gdb prog core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /mnt/hgfs/share/dir/prog...done.
warning: exec file is newer than core file.
[New LWP 4465]
[New LWP 4462]
[New LWP 4464]
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `./prog'.
Program terminated with signal 11, Segmentation fault.
#0 0x002d3706 in _IO_helper_overflow (s=0x0, c=0) at vfprintf.c:2188
2188 vfprintf.c: No such file or directory.
(gdb) bt
#0 0x002d3706 in _IO_helper_overflow (s=0x0, c=0) at vfprintf.c:2188
#1 0x00000000 in ?? ()
(gdb) bt full
#0 0x002d3706 in _IO_helper_overflow (s=0x0, c=0) at vfprintf.c:2188
written = 47
target = <optimized out>
used = -1226838776
#1 0x00000000 in ?? ()
No symbol table info available.
(gdb)
Is this possible for this scenario to work?
Yes, but you need to ensure that the symbol layout of the binary you build on both machines is identical (or at least close enough). This isn't necessarily trivial: things like local username, pathname for sources or installation directory, and hostname sometimes leak into the built object files, and may cause symbol mismatch.
To check whether the binaries are close, run diff <(nm a.out) <(nm b.out) -- there should only be few differences. If you see a lot of differences, your binaries aren't close enough.
I compile the software with debugging symbols enabled, of course
This may be your first mistake: if your coworker builds with -O2, and you build with -g (and implied -O0), the binary is guaranteed to not match.
You need to build with exactly the flags your coworker builds with (but you may add debugging symbols; e.g. if your coworker builds with -O2, you should build with -O2 -g together).
P.S. Note that you also need identical versions of system libraries on the two machines.

Compiling a Qt application in order to get better debuginfos (Linux)

I downloaded FabariaGest source code then to compile it you have to run:
on qt 4 systems:
cmake -DMAKE_INSTALL_PREFIX=directory -DWANT_QT4=ON -DWANT_QWT=ON
on qt5 systems:
cmake -DMAKE_INSTALL_PREFIX=directory -DWANT_QT5=ON -DWANT_QWTQT5=ON
When it finishes, you run
# make install
Since I get a segmentation fault when I start the software, I tried to run
cd /opt/fabaria_gest
gdb fabaria_gest
run
backtrace
but I only get
[user#localhost fabaria_gest]$ gdb fabaria_gest
GNU gdb (GDB) Fedora 7.9.1-19.fc22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from fabaria_gest...(no debugging symbols found)...done.
(gdb) run
Starting program: /opt/fabaria_gest/fabaria_gest
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Detaching after fork from child process 7246.
[New Thread 0x7fffe0798700 (LWP 7244)]
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f568c in QBasicAtomicInteger<int>::load() const ()
(gdb) backtrace
#0 0x00000000004f568c in QBasicAtomicInteger<int>::load() const ()
#1 0x0000000000502a92 in QtPrivate::RefCount::ref() ()
#2 0x0000000000502e37 in QString::QString(QString const&) ()
#3 0x00000000006b9709 in _ZL13getSystemInfov ()
#4 0x00000000006bc29f in main ()
(gdb)
#0 0x00
What can I do to recompile the software in order to get better debuginfos? Setting set follow-fork-mode in GDB did not help
Using
cmake -DCMAKE_BUILD_TYPE=Debug ...
will instruct cmake to generate debug information, which should make the backtrace easier to read.

Is it possible to debug a core file originally produced by an executable which had symbols stripped?

A core file was produced by a Release version of code (g++) which had symbols stripped.
Taking that same (SVN) version of code, I have modified the build options to include symbols.
Should I be able to debug that core file using the executable I've built which includes symbols? It doesn't appear I can, but just want to make sure it isn't something else I'm doing wrong.
bash-3.2$ gdb ./MyTest.53519 -c ~/public_html/core.20375.MyTest
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/me/tmp/MyTest.53519...done.
warning: core file may not match specified executable file.
[New Thread 20388]
[New Thread 20389]
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./MyTest -np1 -p1'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f99c19b39e0 in ?? ()
(gdb) bt
#0 0x00007f99c19b39e0 in ?? ()
#1 0x00007f99bc438650 in ?? ()
#2 0x00007f99c19b58cf in ?? ()
#3 0x00007f9994008880 in ?? ()
#4 0x00007f99b8013080 in ?? ()
#5 0x00007f99bc4386f0 in ?? ()
#6 0x00007f99bc438b00 in ?? ()
#7 0x00007f99bc438810 in ?? ()
#8 0x00007f99c19aa489 in ?? ()
In theory it should work to reproduce symbols by building the program again. But it requires that you have exactly the same version of compiler, with exactly the same settings, same version of OS and same version of libraries, system headers, etc, and the OS doesn't do address randomization [or the debugger understands the address shuffling].
It does appear, however, that SOMETHING is missing in that chain of "If ... and ... " that I listed above - which means your stack and it's content isn't particularly meaningful.
You may want to do x/1000zg $rsp and see if the stack content looks more sane that way than the backtrace. You can always load your debug-symbol'd executable in a separate gdb instance, and use disass 0x124213,+123 to disassemble what is at 0x123213 (and 123 bytes on) in that process.
Also, assuming you are not 100% certain of threads, try info thread to see what threads are about.
I don't envy you...
I have had to do this and let me tell you it is not easy.
The method I ended up using was much like yours. I built a copy of the library with all of the same options, plus debugging symbols.
Then I looked into the core dump to find the crash location. I used gdb disassembly view to find the machine instructions and backed up until I found the function start.
Then I went into objdump -dC on the original library without symbols to find the matching location by searching for matching series of machine instructions. That gave me the offsets in the library.
Then I was able to go into the library built with debugging symbols and find the same function. It was usually in the same general area, although small shifts could happen because of random numbers used by the compiler. (Note: I later started using GCC's -frandom-seed option to force use of the SAME random numbers on each compile run. Read the docs. Must be a different seed for each source file.)
From there it was a matter of reading the disassembly to determine what had gone wrong. If the core dump said it crashed from reading a NULL pointer in register $r12 I had to figure from the debug symbol version where $r12 got its value.
After doing this a few times I changed the build system to use the same random seeds for every build and to build a debug symbol version from the start, which is stripped to produce the final binaries. All versions are stuffed into an SVN server to be pulled out as needed later. Although since each build has unique directory names a simple NFS directory would work just as well.