gdb: Cannot find new threads: generic error after system update - gdb

I am running an OpenEmbedded based Linux on an ARM board, where my application is running. I used to run kernel 2.6.35, gdb 6.8 and gcc 4.3. Lately I've updated the system to kernel 2.6.37, gdb 7.4 (also tried 7.3) and gcc 4.6.
Now, my application can not be debugged anymore (on the ARM board), everytime I try to run it in gdb I get the error "gdb: Cannot find new threads: generic error". The application makes use of pthreads and does link against pthreads (readelf lists libpthread.so.0 as a dependency). The suggested solutions I found so far all recommend linking to pthread which I am already doing. The other recommendation I found was to use LD_PRELOAD=/lib/libpthread.so.0 which does not make any difference for me.
Debugging the x86 builds of the application works without a problem.
EDIT: To answer the questions posed in the first answer, I am using gdb on the target (ARM), i.e. no cross-gdb. I also have not stripped libpthread.so.0 (/lib/libpthread-2.9.so: ELF 32-bit LSB shared object, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.16, not stripped). glibc remained at version 2.9, and the update involved recompiling the whole linux image
EDIT2: Removing /lib/libthread-db* allows debugging (with consequent warnings and obviously some features will not work)
EDIT3: Using set debug libthread-db 1 I get:
Starting program: /home/root/app
Trying host libthread_db library: libthread_db.so.1.
Host libthread_db.so.1 resolved to: /lib/libthread_db.so.1.
td_ta_new failed: application not linked with libthread
thread_db_load_search returning 0
Trying host libthread_db library: libthread_db.so.1.
Host libthread_db.so.1 resolved to: /lib/libthread_db.so.1.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: Unable to set global thread event mask: generic error
Warning: find_new_threads_once: find_new_threads_callback: cannot get thread info: generic error
Found 0 new threads in iteration 0.
Warning: find_new_threads_once: find_new_threads_callback: cannot get thread info: generic error
Found 0 new threads in iteration 1.
Warning: find_new_threads_once: find_new_threads_callback: cannot get thread info: generic error
Found 0 new threads in iteration 2.
Warning: find_new_threads_once: find_new_threads_callback: cannot get thread info: generic error
Found 0 new threads in iteration 3.
thread_db_load_search returning 1
Warning: find_new_threads_once: find_new_threads_callback: cannot get thread info: generic error
Found 0 new threads in iteration 0.
Cannot find new threads: generic error
(gdb) Write failed: Broken pipe

There are two common causes of this error:
You have a mis-match between libpthread.so.0 and libthread_db.so.1
You have stripped libpthread.so.0
Your message isn't entirely clear:
are you using cross GDB to debug the application running on ARM from an x86 host?
have you updated (or rebuilt) glibc in addition to updating the kernel, etc.
If you stripped libpthread.so.0, then don't do that -- libthread_db needs it to not be stripped.
If you are cross-debugging, make sure to rebuild libthread_db.so.1 on host to match glibc on target.
Update:
not cross-debugging
did not strip libpthread
So, something in your GDB or glibc appears to have been broken. You can try to see what that is by
Putting removed libthread_db back, and
(gdb) set debug libthread-db 1
(gdb) run
Update 2:
warning: Unable to set global thread event mask: generic error
This means that GDB was able to look up td_ta_set_event function in libthread_db, and called it, but the function returned an error. One way this could happen is if GDB was unable to find __nptl_threads_events function in libpthread.so.0. What does this command produce:
nm /lib/libpthread.so.0 | grep __nptl_threads_events
If that command produces output, e.g.:
000000000021c294 b __nptl_threads_events
then I am not sure what else is failing. You'll likely have to debug GDB itself to figure out what's happening.
If on the other hand the grep above produces no output, then it's a problem with your toolchain: you'll have to figure out why that variable doesn't appear in your rebuilt libpthread.so.0.

Related

wsl2, gdb and cross-debug 32bit i386 - failing to map addresses to symbols ( in ?? () )

I'm having issues when trying to debug a cross compiled binary on my WSL2 host and only end up with backtraces with addresses in ?? (), any hint's on what to verify and change are welcome!
file mybin shows:
ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=..., with debug_info, not stripped
The application is started in WSL2 via qemu-i386(based on output from ps)
NOTE: I was wondering a bit about this because in my prev dev env using vm-ware and ubuntu 18.04 i was not seeing qemu-i386 used but did not think more about it based on WSL2 issues regarding 32bit application support referring to qemu and binfmt solving it.
I'm running gdb-multiarch (GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2)
Loading the executable and listing symbols with info functions <a_regex> works fine but when attaching and breaking i get bt's like this (NOTE output below is taken from VSCode with a few logging flags enabled, hence the -exec bt thing for example):
-exec bt
1: (777701) <-1183-interpreter-exec console "bt"
1: (777704) ->~"#0 0x000000000047a4ea in ?? ()\n"
1: (777707) ->~"#1 0x00007ffd2dcdb1c0 in ?? ()\n"
1: (777709) ->~"#2 0x0000000000467efc in ?? ()\n"
1: (777711) ->~"#3 0x0000000000000000 in ?? ()\n"
NOTE: When attaching i get the following warning:
warning: Selected architecture i386 is not compatible with reported target architecture i386:x86-64
setting architecture to i386:x86-64 is accepted by gdb but makes no difference
Setting a breakpoint gives the following error:
1: (40020) ->&"Cannot insert breakpoint 1.\n"
1: (40020) ->&"Cannot access memory at address 0xbce346f\n"
1: (40020) ->&"\n"
1: (40023) ->^error,msg="Command aborted."
UPDATE: SOLVED
Thought installing gcc-multilib solved it but it seems more likely the issue was because of a bug in Docker Desktop which has been fixed in v3.2.2. See description in my own answer below.
The qemu-i386 thing was bugging me so I decided to try compiling a simple .c with -m32 flag to check if it also would trigger being run via qemu, got errors because I was missing gcc-multilib so I installed it.
Started the buildt binary and noticed that it did not run via qemu-i386.
Started my original application again and this time it did also not start via qemu.
Started gdb-multiarch, loaded the bin and attached to the process and now suddenly everything worked fine, got a nice proper backtrace!

Force gdb to use provided thread lib

I have an embedded ARM application which is bundled with all the so-libraries stripped, including the libpthread.so. Sometimes the application gets stuck in some part of code and I want to be able to attach to it with gdb and see what's going on. The problem is that gdb refuses to load the needed threading support library, with the following messages:
Trying host libthread_db library: /home/me/debug_libs/libthread_db.so.1.
td_ta_new failed: application not linked with libthread
thread_db_load_search returning 0
warning: Unable to find libthread_db matching inferior's thread
library, thread debugging will not be available.
Because of this I cannot debug the application, e.g. I cannot see current call stacks for all threads.
After some investigation I suspect that the td_ta_new failing with the application not linked with libthread is caused by the stripped version of libpthread, which lacks the nptl_version symbol. Is there any way to bypass the error?
The gdb is compiled for ARM and being run on the device itself. I have unstripped versions of the libraries, but the application is already running with the stripped libraries.
Is there any way to bypass the error?
A few ways that come to mind:
Use add-symbol-file to override the stripped libpthread.so.0 with un-stripped one:
(gdb) info shared libpthread.so
# shows the path and memory address where stripped libpthread.so.0 is loaded
(gdb) add-symbol-file /path/to/unstripped/libpthread.so.0 $address
# should override with new symbols, and attempt to re-load libthread_db.so.1
Run gdb -ex 'set sysroot /path/to/unstripped' ... where /path/to/unstripped is the path that mirrors installed tree (that is, if you are using /lib/libpthread.so.0, there should be /path/to/unstripped/lib/libpthread.so.0.
I have not tested this, but I believe it should work.
You could comment out the version check in GDB and rebuild it.

GDB record process does not support ioctl request on ARM

I compiled GDB 7.8 for native debugging on my Arndale 5250 board running, linaro 3.12(2013). The GDB was configured as “arm-linux-gnueabihf” and built using statically linked libraries.
It works fine on the board but in “record and replay mode”. It generates the following message when came across a printf statement:
"Process record and replay target doesn't support ioctl request 0x7efff06c ()
( null)Process record: inferior program stopped.”
[process 2169] #1 stopped.
0x76f0f704 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
When proceeded further it cannot debug any more.
(gdb) n
Cannot find bounds of current function
I believe its because of some missing libraries on the target platform.
Please note that when I built GDB I copied only its exe to the target Arndale board, not any libraries. I thought statically linking with libraries will do the task.
Any idea how I can run process record and replay on ARM architecture like I usually do on my x86 machine?
gdb's process record feature works by executing each assembly instruction and recording its effects. When a calling into the kernel, it also has to know the effects of the system call. ioctl presents a singular challenge here because there are many possible iocctl calls.
From the description it sounds as though your libc is using an ioctl that gdb does not already know. In this case there is no solution other than to implement support for that call in gdb.

gdb 7.0 warnings: Wrong size fpregset in core file

reWhen analyzing a core file, my gdb 7.0 outputs several warnings:
warning: Wrong size gregset in core file.
warning: Wrong size fpregset in core file.
warning: Wrong size gregset in core file.
warning: Wrong size fpregset in core file.
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
I am not sure if its related, but I am unable to get a backtrace:
(gdb) bt
#0 0x00000000 in ?? ()
OS architecture is SUN Solaris 10 SPARC.
Questions:
What is the reason/cause of these warnings?
Why can't I retrieve a backtrace?
How to fix these problems?
The problem can in gdb as well in your program.
I would recommend to update gdb to the most recent version (7.3.1). Also it could be helpful to create simple test program and analyze its core with gdb to be sure that your utility works fine.
"gregset" and other error indicate that gdb unable read the data from the core file. It can happen if your program gone wild and corrupt stack. gregset error means that gdb was unable to read general-purpose register set from a core file. fpregset is for floating-point register set. The expected register size is platform dependent.
bt would not work if you cant read core file properly.
I also had the fpregset warnings (and no stack trace) when I tried to work on a 64bit core dump with gdb 7.6.2 on Solaris 10. The cause seems to be, that the userspace applications of Solaris 10 are compiled with 32bits by default - and without support for 64bit core cumps.
The guys in GDB's IRC channel gave me the following parameter:
--enable-64-bit-bfd
I also compiled a 64bit version of gdb (-m64), but that shouldn't be necessary. Now gdb could work on the 64bit core dump and create the stack trace without any warnings.

GDB: error setting break point in template class functions in header files

I have used two different versions of GDB, both give problems in the following code:
Trimmed down code in MyFile.h:
template<class T>
struct ABC: PQR<T> {
void flow(PP pp) {
const QX qx = XYZ<Z>::foo(pp); // Trying to set a breakpoint here, line no. 2533
ASSERTp(qx >= last_qx());
}
}
GDB 7.1:
Reading symbols from /path_to_exec/exec...done.
(gdb) break MyFile.h:2533
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x156.
Note: breakpoint 1 also set at pc 0x156.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x121.
Note: breakpoint 1 also set at pc 0x156.
Note: breakpoint 1 also set at pc 0x156.
Note: breakpoint 1 also set at pc 0x121.
Breakpoint 1 at 0x44e5c4: file PacketEngine.h, line 2533. (23 locations)
(gdb) run
Starting program: /path_to_exec/exec -options
Warning:
Cannot insert breakpoint 1.
Error accessing memory address 0x121: Input/output error.
Cannot insert breakpoint 1.
Error accessing memory address 0x156: Input/output error.
Why is it trying to set 23 breakpoints for one? And further down, it is giving error on run
GDB 6.3:
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
(gdb) break MyFile.h:2533
No line 2533 in file "MyFile.h".
At start of the program, it doesn't even accept the breakpoint
If I break in function ASSERTp, it breaks. Then. if I go "UP", and type break, it successfully inserts breakpoint (break MyFile.h:2533). [thus somehow it finds the file/line after the program actually runs]. However, despite the breakpoint being set, on rerunning the program it does not stop at line 2533 but 2534 only (breakpoint in function ASSERTp).
My questions:
1) Can someone please help me solve this?
2) I have often had problems with template code and GDB. Is there any good & free C++ debugger for templates?
3) Not really important, but a side question if it matters: Which version is preferable? The 7.1 seems to be more buggy, but I remember on some runs, it gives less problems.
System info:
uname -a
Linux ... 2.6.9-67.ELsmp #1 SMP Fri Nov 16 12:49:06 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
file /usr/bin/gdb #### GDB 6.3
/usr/bin/gdb: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), stripped
file ~/local/bin/gdb #### GDB 7.1
/home/user/local/bin/gdb: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
file /path_to_exec/exec
/path_to_exec/exec: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
I am not of any other debugger for linux, but I never experienced such problems as you explained.
You formulated your question really nice (so you probably did), but did you compile your sources with debug symbols?
EDIT
btw I haven't tried gdb 7.1 - only 6.8 version. If you think it is very buggy, try using the last version of the 6 version.
I have seen something similar (using GDB 7.0) where a breakpoint set in a template function is never hit.
Our project is built using an old version of G++ (much older than the version shipped in my distro). I found that by building a version of GDB using the same compiler we develop with that the problem was solved.
gdb sets a different breakpoint for each instantiated template, i.e for each different type assumed by T (and perhaps Z) in your program. However, the addresses that it is trying to set breakpoints at 0x121 seem to be too low and probably correspond to some system locations. This is probably why gdb can't set the breakpoints.
You should try gdb 7.2, perhaps that will help.
Also, e2dbg is a different type of debugger for Linux, but it is not as mature as gdb.
http://www.eresi-project.org/wiki/TheEmbeddedELFDebugger