Debugging a haxe based ndk app on android with gdb - c++

I am trying to debug an OpenFL/Haxe app on Android with gdb. Since Haxe/OpenFL compiles to C++, which is then compiled using the NDK, I am basically trying to debug an NDK app.
I got gdbserver attached to the app's process and can remote debug it using arm-linux-androideabi-gdb. But as soon as I try to get a backtrace (the app is running fine at this moment), I get this:
#0 0xb6e70b10 in ?? ()
#1 0xb6e4833c in ?? ()
#2 0xb6e4833c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I understand that this can happen when the stack has been corrupted by bad pointers, but since the app is running fine (I am actually looking for a runtime error, not a crash), that is not the case here.

Related

gdb only finds some debug symbols

So I am experiencing this really weird behavior of gdb on Linux (KDE Neon 5.20.2):
I start gdb and load my executable using the file command:
(gdb) file mumble
Reading symbols from mumble...
As you can see it did find debug symbols. Then I start my program (using start) which causes gdb to pause at the entry to the main function. At this point I can also print out the back trace using bt and it works as expected.
If I now continue my program and interrupt it at any point during startup, I can still display the backtrace without issues. However if I do something in my application that happens in another thread than the startup (which all happens in thread 1) and interrupt my program there, gdb will no longer be able to display the stacktrace properly. Instead it gives
(gdb) bt
#0 0x00007ffff5bedaff in ?? ()
#1 0x0000555556a863f0 in ?? ()
#2 0x0000555556a863f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x0000000100000001 in ?? ()
#5 0x00007fffec005000 in ?? ()
#6 0x00007ffff58a81ae in ?? ()
#7 0x0000000000000000 in ?? ()
which shows that it can't find the respective debug symbols.
I compiled my application with cmake (gcc) using -DCMAKE_BUILD_TYPE=Debug. I also ensured that a bunch of debug symbols are present in the binary using objdump --debug mumble (Which also printed a few lines of objdump: Error: LEB value too large, but I'm not sure if this is related to the problem I am seeing).
While playing around with gdb, I also encountered the error
Cannot find user-level thread for LWP <SomeNumber>: generic error
a few times, which makes me suspect that there is indeed some issue involving threads here...
Finally, I tried running set verbose on before loading my binary, which yields
(gdb) set verbose on
(gdb) file mumble
Reading symbols from mumble...
Reading in symbols for /home/user/Documents/Git/mumble/src/mumble/main.cpp...done.
This also looks suspicious to me, as only main.cpp is explicitly listed here (even though the project has many more source files). I should also note that all successful backtraces that I am able to produce (as described above) originate from main.cpp.
I am honestly a bit clueless as to what might be the root issue here. Does someone have an idea what could be going on? Or how I could investigate further?
Note: I also tried using clang as a compiler but the result was the same.
Used program versions:
cmake: 3.18.4
gcc: 9.3.0
clang: 10.0.0
make: 4.2.1

gdb fails to link to source

I'm having problems using gdb with Code::Blocks on Windows. Even with the default console app with no changes, right after a fresh install of Code::Blocks, it stops at breakpoints but does not highlight the current line. It doesn't seem to know where to look for the source.
Project path is C:\Users\username\Documents\codeblocks
Building with -g (default)
bt gives me this
#0 0x0009df38 in ?? ()
#1 0x00000000 in ?? ()

How to debug DPDK libraries to diagnose segmentation fault?

I am working with DPDK version 18.11.8 stable on Linux, using a gcc x64 build.
At runtime I get a segmentation fault. Running gdb on the core dump gives this backtrace:
#0 0x0000000000f65680 in rte_eth_devices ()
#1 0x000000000048a03a in rte_eth_rx_burst (nb_pkts=7, rx_pkts=0x7fab40620480, queue_id=0, port_id=<optimized out>)
at /opt/dpdk/dpdk-18.08/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3825
#2 Socket_poll (ucRxPortId=<optimized out>, ucRxQueId=ucRxQueId@entry=0 '\000', uiMaxNumOfRxFrm=uiMaxNumOfRxFrm@entry=7, pISocketListener=pISocketListener@entry=0xf635d0 <FH_gtFrontHaulObj+16>)
at /data/<snip>/SocketClass.c:2188
#3 0x000000000048b941 in FH_perform (args_ptr=<optimized out>)
at /data/<snip>/FrontHaul.c:281
#4 0x00000000005788e4 in eal_thread_loop ()
#5 0x00007fab419fae65 in start_thread () from /lib64/libpthread.so.0
#6 0x00007fab4172388d in clone () from /lib64/libc.so.6
So it seems that rte_eth_rx_burst() calls rte_eth_devices () and that function crashes, presumably because of an illegal memory access. Possibly a hugepages problem?
I want to enable more debug info in DPDK. I am building DPDK using:
usertools/dpdk-setup.sh
Am I correct in thinking that the build commands in that script use make and I should modify the appropriate:
config/defconfig_*
file (defconfig_x86_64-native-linuxapp-gcc in my case) ?
If so, would these values be appropriate?
CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=y
RTE_LOG_LEVEL=RTE_LOG_DEBUG
RTE_LIBRTE_ETHDEV_DEBUG=y
(not sure whether all values should be prefixed by 'CONFIG_'?)
I tried building DPDK using:
$ export EXTRA_CFLAGS='-O0 -g'
$ make install T=x86_64-native-linuxapp-gcc
but that gave no extra info in the backtrace.
EDIT: The error has been identified and fixed; the application now runs without crashing.
Using the chat room dpdk-debug, we were able to rebuild the libraries and the application with the proper CFLAGS. Using gdb, we identified the probable cause: rte_eth_rx_burst was not being passed a pointer array for the mbufs.
Based on the GDB details for frame 1, it looks like the application was not built with EXTRA_CFLAGS (assuming you are using the DPDK example Makefile). The right way to build a DPDK application for debugging is to follow these steps:
cd [dpdk target folder]
make clean
make EXTRA_CFLAGS='-O0 -ggdb'
cd [application folder]
make EXTRA_CFLAGS='-O0 -ggdb'
Then use GDB in TUI or non-TUI mode to analyze the error.
Notes:
One of the most common mistakes I make with rx_burst is passing *mbuf_array instead of **mbuf_array as the argument.
If a custom Makefile is used for the application, pass the flags as CFLAGS+="-O0 -ggdb" instead of EXTRA_CFLAGS.
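The rx_burst mistake mentioned in the notes can be illustrated without DPDK installed. The real rte_eth_rx_burst takes its receive buffer as struct rte_mbuf **rx_pkts, that is, an array of mbuf pointers that the driver fills in. The sketch below is only a stand-in under that assumption: MockMbuf, mock_rx_burst and receive_total_length are hypothetical names that are not part of DPDK, and the packet lengths are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DPDK's struct rte_mbuf (the real type is much larger).
struct MockMbuf {
    std::uint32_t pkt_len;
};

// Hypothetical stand-in for rte_eth_rx_burst(): like the real call, it receives
// an array of mbuf POINTERS and stores up to nb_pkts pointers into it.
std::uint16_t mock_rx_burst(MockMbuf **rx_pkts, std::uint16_t nb_pkts) {
    static MockMbuf pool[4] = {{64}, {128}, {256}, {512}};
    const std::uint16_t available = 4;
    const std::uint16_t n = nb_pkts < available ? nb_pkts : available;
    for (std::uint16_t i = 0; i < n; ++i) {
        rx_pkts[i] = &pool[i];  // writes one pointer into the caller's array
    }
    return n;
}

// Correct caller: room for max_burst pointers exists before the call.
std::uint32_t receive_total_length() {
    constexpr std::uint16_t max_burst = 7;
    MockMbuf *pkts[max_burst] = {nullptr};  // decays to MockMbuf ** at the call

    // Wrong (do not do this): `MockMbuf *one; mock_rx_burst(&one, max_burst);`
    // type-checks, but gives the callee room for a single pointer only, so any
    // further entries would be written past the end of `one`.
    const std::uint16_t got = mock_rx_burst(pkts, max_burst);
    assert(got <= max_burst);

    std::uint32_t total = 0;
    for (std::uint16_t i = 0; i < got; ++i) {
        total += pkts[i]->pkt_len;
    }
    return total;
}
```

With a real DPDK build the same shape applies: declare an array of struct rte_mbuf pointers sized to the burst and pass the array itself, not a dereferenced element.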

gdb error: Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I am having problems debugging a multi-threaded C++ application on ARMv7 targets. The issue shows up on two different ARM targets, for which I use different toolchains:
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I've checked some related threads, but (since I have the same issue with a minimal multithreading program) it seems that I
* don't have a corrupted stack
* don't have any issues with virtual functions or function pointers
I mostly use the Toradex Colibri iMX6 target, which runs Angstrom Linux 2016.12.
Questions
Is there something wrong with how I build the program?
Is there something wrong with how I'm using gdbserver / gdb?
What options do I have to fix the debugger output?
I debug via gdbserver on the target and the toolchain's arm-linux-gnueabihf-gdb on my host.
There's no native gdb for any of the targets.
I can build the application for Linux x86, but can't reproduce the bug so far on the PC.
SW-problem
It seems that two of the threads are getting stuck, maybe due to a deadlock between two mutexes, or because a thread tries to acquire one mutex a second time
(although that seems unlikely, the bug showed up after I configured a mutex as recursive; I'll have to check for a second mutex used in that thread).
All other threads seem to keep running fine.
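The code of the affected application is not shown, so the following is only a sketch of the first hypothesis above (two threads taking the same two mutexes in opposite order) and of one C++11 way to rule it out. Counters, move_a_to_b, move_b_to_a and run_opposing_threads are hypothetical names; std::lock and std::adopt_lock are standard library facilities.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state guarded by two separate mutexes.
struct Counters {
    std::mutex a_mutex;
    std::mutex b_mutex;
    int a = 0;
    int b = 0;
};

// Moves one unit from a to b. std::lock acquires both mutexes using a
// deadlock-avoidance algorithm, so the order of its arguments does not matter.
void move_a_to_b(Counters &c) {
    std::lock(c.a_mutex, c.b_mutex);
    std::lock_guard<std::mutex> hold_a(c.a_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> hold_b(c.b_mutex, std::adopt_lock);
    --c.a;
    ++c.b;
}

// Same two mutexes, named in the opposite order. With two plain lock() calls
// in this order, this function and move_a_to_b could block each other forever.
void move_b_to_a(Counters &c) {
    std::lock(c.b_mutex, c.a_mutex);
    std::lock_guard<std::mutex> hold_b(c.b_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> hold_a(c.a_mutex, std::adopt_lock);
    --c.b;
    ++c.a;
}

// Runs both directions concurrently; returns true if the counters balance out.
bool run_opposing_threads(int iterations) {
    Counters c;
    std::thread t1([&c, iterations] {
        for (int i = 0; i < iterations; ++i) move_a_to_b(c);
    });
    std::thread t2([&c, iterations] {
        for (int i = 0; i < iterations; ++i) move_b_to_a(c);
    });
    t1.join();
    t2.join();
    return c.a == 0 && c.b == 0;
}
```

This does not cover the second hypothesis (one thread acquiring the same non-recursive mutex twice). Some toolchains need -pthread when building code that uses std::thread.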
SW-build and debug configuration
Build settings:
I'm using a toolchain provided by Toradex with arm-linux-gnueabihf-g++ and
-std=c++11 -Wall -Werror -Wextra -Wno-unused-result -Winit-self -Wmissing-include-dirs -Wpointer-arith -Wno-format-security -Wno-implicit-fallthrough -Wl,-Map=output.map -ggdb -g3 -fno-inline -O0
I pass the same program to the debuggers (i.e. to gdbserver on the target and to arm-linux-gnueabihf-gdb on the host)
$ (gdb) set sysroot </path/to/libs>
$ (gdb) file <binary>
$ (gdb) target remote IP:port
shared libraries:
For shared libraries, I've copied the /usr/lib and /lib from the target to the host. I've then downloaded the debug libraries which are available for the target/distribution and replaced the original shared libs with those.
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x76fcf800 0x76feaa70 Yes /path/to/libs/lib/ld-linux-armhf.so.3
0x76fb9700 0x76fbcd2c Yes /path/to/libs/lib/librt.so.1
0x76f940c0 0x76fa2e0c Yes /path/to/libs/lib/libpthread.so.0
0x76f01630 0x76f72a10 Yes (*) /path/to/libs/usr/lib/libstdc++.so.6
0x76e14d38 0x76e48028 Yes /path/to/libs/lib/libm.so.6
0x76e041b0 0x76e0e7ec Yes /path/to/libs/lib/libgcc_s.so.1
0x76cd1000 0x76dc2b10 Yes /path/to/libs/lib/libc.so.6
0x7449c96c 0x744a29e4 Yes /path/to/libs/lib/libnss_files.so.2
(*): Shared library is missing debugging information.
I could not find a debug library for libstdc++.so.6.
Debugging results
Debug simple single-threaded application with crash on target:
works, i.e. does not report the error message from above
Debug simple multi-threaded application, with or without deadlock, on target:
(gdb) bt
#0 0x76d6cd44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Debug the same simple multi-threaded application, with or without deadlock, on Linux-x86:
works
Debug buggy application on PC:
seems to work, but we cannot reproduce the bug so far
Debug the affected application on target:
Thread 1 received signal SIGINT, Interrupt.
0x76f9facc in __lll_robust_lock_wait (futex=0x257b94 <namespace1::function()::su_place+20>, private=0)
at /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c:46
46 /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c: No such file or directory.
(gdb) thread apply all bt
Thread 6 (Thread 6606.6630):
#0 0x76d832c8 in __setreuid (ruid=8, euid=0)
at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/i386/setreuid.c:29
#1 0x7efff06c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 5 (Thread 6606.6629):
#0 0x76d55d44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 4 (Thread 6606.6628):
#0 0x76d55d44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 6606.6627):
#0 0x76d55d44 in uname () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 6606.6626):
#0 __lll_robust_lock_wait (
futex=0x25b950 <namespace_2::a_function()::a_static_member+152>, private=128)
at /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c:31
#1 0x00000080 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 6606.6606):
#0 0x76f9facc in __lll_robust_lock_wait (futex=0x257b94 <namespace1::function()::su_place+20>,
private=0) at /usr/src/debug/glibc/2.24-r0/git/nptl/lowlevelrobustlock.c:46
#1 0x00000002 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Update
I was able to find the bug (a mutex deadlock) using valgrind with the PC build of the software.
However, this question is about the problems with gdb, which I have not yet been able to understand or solve.
I've then downloaded the debug libraries which are available for the target/distribution and replaced the original shared libs with those.
This is possibly the wrong thing to do (depending on what exactly you mean by "debug libraries"), and may be contributing to your problem. See this answer.
As a first step, I would use the exact same libraries that you are using on the target, and check whether that changes the behavior of GDB.

freeglut segfaults, no useful info from gdb

In this program I'm writing, I've been using freeglut and, generally, it has been working. However, sometimes when there is some issue in the program that often has nothing to do with rendering at all, I get a segfault at glutInit() and no explanation from GDB.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7942409 in glutInit () from /usr/lib/x86_64-linux-gnu/libglut.so.3
Backtrace:
#0 0x00007ffff7942409 in glutInit () from /usr/lib/x86_64-linux-gnu/libglut.so.3
#1 0x0000000000415d4c in initGL () at ../gfx/render.cpp:62
#2 0x00000000004035f3 in main () at battle.cpp:49
Running with rendering disabled produces no errors.
So, I am wondering what I need to do to get more information on these failures. Can I get the backtrace to look inside libglut.so.3?
(As an aside, any recommendations for a more reliable toolkit than freeglut are appreciated.)
Can I get the backtrace to look inside libglut.so.3?
You already have a backtrace that is looking inside libglut.so.3.
You need to either
install debug symbols for it (sudo apt-get install freeglut3-dbg or some such), or
compile libglut.so from source, or
debug at assembly level: x/i $pc, disas, info registers, etc.