multithread - additional thread created by the system? - c++

I am attaching gdb to the running process (my multithreaded server). When I run info thread, I always see 1 (or 2?) additional thread(s) that I do not create in my code.
I created only:
4 workers (should be waiting in cond_wait())
1 signal thread (always in sigwait())
1 maintenance thread (runs every N seconds, otherwise waits in cond_wait())
1 thread that uses popen() (runs every N seconds, otherwise waits in cond_wait())
1 main() thread (in accept())
So I created 8 threads. Why does gdb report 9 or 10?
System is FreeBSD 6.4
Also, I keep having problems with this additional thread: it crashes my program, and it is always in the pthread_testcancel() state!
(Related question: c++ pthreads - crash while trying to lock mutex for reading)
It seems that thread number 10, marked with *, is the currently executing thread? Is it the same as thread 8, or do I have 2 additional threads? Is this normal? Thanks, and sorry for my bad English.
(gdb) info thread
* 10 LWP 100108 0x4865a79b in pthread_testcancel () from /lib/libpthread.so.2   (WHAT IS THIS? (1))
9 Thread 0x80d4000 (runnable) 0x486d7bd3 in accept () from /lib/libc.so.6
8 Thread 0x80d4a00 (LWP 100090) 0x4865a79b in pthread_testcancel () from /lib/libpthread.so.2   (WHAT IS THIS? (2))
7 Thread 0x80d4c00 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2
6 Thread 0x80d4e00 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2
5 Thread 0x868b000 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2
4 Thread 0x868b200 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2
3 Thread 0x868b400 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2
2 Thread 0x868b600 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2
1 Thread 0x868b800 (sleeping) 0x48651cb6 in pthread_mutexattr_init () from /lib/libpthread.so.2

The additional threads are the result of 3rd party libraries. A quick search through curl, ImageMagick, tinyxml2, and pcre's source code shows that curl and ImageMagick have pthread_create() calls.
With regard to debugging in gdb:
In info threads, the * indicates the thread currently being examined (the one that commands such as bt apply to). It does not indicate which thread is currently running.
In a backtrace, in ?? () can indicate either that the libraries were not built with debugging information (-g with gcc) or that the stack is corrupted. Generally, if the stack is corrupted, gdb will give an explicit indication.
Also, be certain to check ImageMagick's thread of execution documentation.
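To confirm that the extra threads come from libraries rather than from your own code, one option is to count the kernel tasks of the process and compare that number with the threads you started yourself. The sketch below assumes Linux, where every thread appears under /proc/self/task; FreeBSD 6.4 does not provide that directory, so there you would still rely on gdb's info threads. count_process_threads() and threads_seen_while_helper_runs() are illustrative names, and the helper thread merely stands in for a library (such as curl or ImageMagick) that starts a thread internally.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <dirent.h>
#include <thread>

// Linux-only: each thread of the process, whoever created it (your code or a
// library), shows up as one numeric entry in /proc/self/task.
std::size_t count_process_threads() {
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) return 0;            // not Linux, or /proc not mounted
    std::size_t n = 0;
    while (dirent* entry = readdir(dir)) {
        if (entry->d_name[0] != '.') ++n;    // skip "." and ".."
    }
    closedir(dir);
    return n;
}

// Stand-in for a library that quietly starts a helper thread: while the
// helper is alive, the count is one higher than before.
std::size_t threads_seen_while_helper_runs() {
    std::atomic<bool> stop{false};
    std::thread helper([&stop] {
        while (!stop.load()) std::this_thread::yield();
    });
    std::size_t n = count_process_threads();
    stop.store(true);
    helper.join();
    return n;
}
```

Calling count_process_threads() once your own 8 threads are running and subtracting 8 gives the number of threads something else has started.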

Related

gdb info thread print thread name [duplicate]

When using gdb to debug multithreaded code, I need to see the thread names in gdb so I can locate errors faster.
For example, the gdb command below only prints threads 1, 2, 3, etc., but I would like to see threads A, B, C..., which means I need the thread names listed.
Is this possible with a gdb command?
(gdb) info threads
15 Thread 8725 __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:13
14 Thread 8726 __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:13
13 Thread 8730 __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:13
12 Thread 13328 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
11 Thread 13330 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
10 Thread 13331 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
9 Thread 8711 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
8 Thread 13334 nanosleep () at bionic/libc/arch-arm/syscalls/nanosleep.S:13
7 Thread 8722 nanosleep () at bionic/libc/arch-arm/syscalls/nanosleep.S:13
6 Thread 8724 nanosleep () at bionic/libc/arch-arm/syscalls/nanosleep.S:13
5 Thread 8710 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
4 Thread 8712 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
3 Thread 8723 __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:13
2 Thread 8721 read () at bionic/libc/arch-arm/syscalls/read.S:14
* 1 Thread 8709 __futex_wait () at bionic/libc/arch-arm/bionic/futex_arm.S:51
You don't say what version of gdb you are using.
For native Linux (that is, not using gdbserver), printing of thread names was added in gdb 7.3. So, upgrade to at least that version and you should see it work.
Support for this for gdbserver is planned, but not yet implemented.
Support for other platforms depends on volunteers.
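For gdb to have a name to print, the thread has to be named in the first place. A minimal sketch, assuming Linux with glibc 2.12 or later (where pthread_setname_np() and pthread_getname_np() exist): the creating thread names a running worker and reads the name back, which is the same kernel-level name gdb 7.3+ shows for native Linux targets. name_worker_and_read_back() is a hypothetical helper, and the busy-waiting worker only exists so there is a live thread to name.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <string>
#include <thread>

// Name a running worker from the creating thread and read the name back.
// glibc allows at most 15 visible characters (16 bytes with the NUL) and
// returns ERANGE instead of truncating a longer name; *err receives the code.
std::string name_worker_and_read_back(const std::string& name, int* err) {
    std::atomic<bool> stop{false};
    std::thread worker([&stop] {
        while (!stop.load()) std::this_thread::yield();  // stay alive until named
    });
    *err = pthread_setname_np(worker.native_handle(), name.c_str());
    char buf[16] = {};                                    // 16 bytes is the kernel limit
    pthread_getname_np(worker.native_handle(), buf, sizeof buf);
    stop.store(true);
    worker.join();
    return std::string(buf);
}
```

With a name set this way, info threads in gdb 7.3 or later lists it next to the LWP number.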

Make gdb show thread names on 'apply all' operations

I'm debugging an app with many threads, so I've named them using prctl. This works great with gdb's info threads command, but it would be nice if thread apply all operations showed the names as well. Is there any way to coerce gdb to do this?
(gdb) info threads
Id Target Id Frame
...
3 Thread 0x7ffff6ffe700 (LWP 30048) "poll_uart_threa" 0x00007ffff78eb823 in select ()
at ../sysdeps/unix/syscall-template.S:82
2 Thread 0x7ffff77ff700 (LWP 30047) "signal hander" do_sigwait (set=<optimized out>,
sig=0x7ffff77feed8)
at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
* 1 Thread 0x7ffff7fcc700 (LWP 30046) "simulator" __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
That gives the pointer, the PID (well, the thread ID, but LWP threads == processes, ish), and the name.
(gdb) thread apply all bt
...
Thread 3 (Thread 0x7ffff6ffe700 (LWP 30048)):
#0 0x00007ffff78eb823 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000403bb3 in poll_uart_thread (unused=0x0) at uart.c:96
#2 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6ffe700) at pthread_create.c:308
#3 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7ffff77ff700 (LWP 30047)):
<call stack>
#2 0x0000000000417a89 in sig_thread (arg=0x7fffffffbb60) at simulator.c:879
#3 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff77ff700) at pthread_create.c:308
#4 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7ffff7fcc700 (LWP 30046)):
<call stack>
#9 0x00000000004182e3 in simulator (flash_file=0x7fffffffe0e4 "../programs/blink.bin")
at simulator.c:1005
#10 0x0000000000401f14 in main (argc=3, argv=0x7fffffffdd48) at cli.c:167
While I can find the name by hunting the call stack, it'd be nice / convenient / etc if it would print in the summary line, which here only has PID and pointer.
There's no easy way; you have to patch GDB. It's a simple patch; you can find it here.
it'd be nice / convenient / etc if it would print in the summary line, which here only has PID and pointer.
Please file an enhancement request in GDB bugzilla.
If you are using GDB with embedded python, you might be able to script "thread apply" to do what you want, but it really ought to do the right thing already.
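Regarding the info threads output in the question: the thread presumably named after its function, poll_uart_thread, is listed as "poll_uart_threa" because the kernel stores at most 16 bytes of thread name including the terminating NUL, and prctl(PR_SET_NAME) silently truncates anything longer. The Linux-only sketch below demonstrates this; name_self() and name_in_new_thread() are hypothetical helpers, and the naming happens on a fresh thread so the caller's own name is left alone.

```cpp
#include <cassert>
#include <string>
#include <sys/prctl.h>
#include <thread>

// PR_SET_NAME names the calling thread. The kernel keeps at most 16 bytes
// including the terminating NUL and silently truncates anything longer.
std::string name_self(const std::string& name) {
    prctl(PR_SET_NAME, name.c_str(), 0, 0, 0);
    char buf[16] = {};                 // PR_GET_NAME needs a 16-byte buffer
    prctl(PR_GET_NAME, buf, 0, 0, 0);
    return std::string(buf);
}

// Do the naming on a fresh thread so the caller's own name is untouched.
std::string name_in_new_thread(const std::string& name) {
    std::string result;
    std::thread t([&result, &name] { result = name_self(name); });
    t.join();
    return result;
}
```

Keeping thread names to 15 characters avoids the truncation entirely.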

infinite abort() in a backtrace of a c++ program core dump

I have a strange problem that I can't solve. Please help!
The program is a multithreaded C++ application that runs on an ARM Linux machine. Recently I began testing it in long runs, and sometimes it crashes after 1-2 days like so:
*** glibc detected ** /root/client/my_program: free(): invalid pointer: 0x002a9408 ***
When I open the core dump, I see that the main thread seems to have a corrupt stack: all I can see are endless abort() calls.
GNU gdb (GDB) 7.3
...
This GDB was configured as "--host=i686 --target=arm-linux".
[New LWP 706]
[New LWP 700]
[New LWP 702]
[New LWP 703]
[New LWP 704]
[New LWP 705]
Core was generated by `/root/client/my_program'.
Program terminated with signal 6, Aborted.
#0 0x001c44d4 in raise ()
(gdb) bt
#0 0x001c44d4 in raise ()
#1 0x001c47e0 in abort ()
#2 0x001c47e0 in abort ()
#3 0x001c47e0 in abort ()
#4 0x001c47e0 in abort ()
#5 0x001c47e0 in abort ()
#6 0x001c47e0 in abort ()
#7 0x001c47e0 in abort ()
#8 0x001c47e0 in abort ()
#9 0x001c47e0 in abort ()
#10 0x001c47e0 in abort ()
#11 0x001c47e0 in abort ()
And it goes on and on. I tried to get to the bottom of it by moving up the stack (frame 3000 or even more), but eventually the core dump runs out of frames and I still can't see why this happened.
When I examine the other threads everything seems normal there.
(gdb) info threads
Id Target Id Frame
6 LWP 705 0x00132f04 in nanosleep ()
5 LWP 704 0x001e7a70 in select ()
4 LWP 703 0x00132f04 in nanosleep ()
3 LWP 702 0x00132318 in sem_wait ()
2 LWP 700 0x00132f04 in nanosleep ()
* 1 LWP 706 0x001c44d4 in raise ()
(gdb) thread 5
[Switching to thread 5 (LWP 704)]
#0 0x001e7a70 in select ()
(gdb) bt
#0 0x001e7a70 in select ()
#1 0x00057ad4 in CSerialPort::read (this=0xbea7d98c, string_buffer=..., delimiter=..., timeout_ms=1000) at CSerialPort.cpp:202
#2 0x00070de4 in CScanner::readResponse (this=0xbea7d4cc, resp_recv=..., timeout=1000, delim=...) at PidScanner.cpp:657
#3 0x00071198 in CScanner::sendExpect (this=0xbea7d4cc, cmd=..., exp_str=..., rcv_str=..., timeout=1000) at PidScanner.cpp:604
#4 0x00071d48 in CScanner::pollPid (this=0xbea7d4cc, mode=1, pid=12, pid_str=...) at PidScanner.cpp:525
#5 0x00072ce0 in CScanner::poll1 (this=0xbea7d4cc)
#6 0x00074c78 in CScanner::Poll (this=0xbea7d4cc)
#7 0x00089edc in CThread5::Thread5Poll (this=0xbea7d360)
#8 0x0008c140 in CThread5::run (this=0xbea7d360)
#9 0x00088698 in CThread::threadFunc (p=0xbea7d360)
#10 0x0012e6a0 in start_thread ()
#11 0x001e90e8 in clone ()
#12 0x001e90e8 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(Class and function names are a bit weird because I changed them :-)
So thread #1 is where the stack is corrupt; the backtrace of every other thread (2-6) ends with
Backtrace stopped: previous frame identical to this frame (corrupt stack?).
This happens because threads 2-6 are created in thread #1.
The thing is that I can't run the program in gdb because it runs on an embedded system, and I can't use a remote gdb server. The only option is examining core dumps, which occur rarely.
Could you please suggest something that could move me forward with this? (Maybe something else I can extract from the core dump, or some way to add hooks in the code to catch the abort() call.)
UPDATE: Basile Starynkevitch suggested using Valgrind, but it turns out it has only been ported to ARMv7. I have an ARM 926, which is ARMv5, so this won't work for me. There are some efforts to compile Valgrind for ARMv5, though: Valgrind cross compilation for ARMv5tel, valgrind on the ARM9
UPDATE 2: I couldn't make Electric Fence work with my program. The program uses C++ and pthreads. The version of Efence I got, 2.1.13, crashed at an arbitrary place after I started a thread and tried to do something more or less complicated (for example, putting a value into an STL vector). I saw people mentioning some patches for Efence on the web but didn't have time to try them. I tried this on my Linux PC, not on the ARM, and other tools like Valgrind or Dmalloc don't report any problems with the code. So anyone using version 2.1.13 of Efence should be prepared for problems with pthreads (or maybe pthreads + C++ + STL, I don't know).
My guess for the "infinite" aborts is that either abort() causes a loop (e.g. abort -> signal handler -> abort -> ...) or that gdb can't correctly interpret the frames on the stack.
In either case I would suggest manually checking out the stack of the problematic thread. If abort causes a loop, you should see a pattern or at least the return address of abort repeating every so often. Perhaps you can then more easily find the root of the problem by manually skipping large parts of the (repeating) stack.
Otherwise, you should find that there is no repeating pattern and hopefully the return address of the failing function somewhere on the stack. In the worst case such addresses are overwritten due to a buffer overflow or such, but perhaps then you can still get lucky and recognise what it is overwritten with.
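As for hooking abort() in the code: one common approach, sketched here under assumptions, is a SIGABRT handler that writes the aborting thread's return addresses to a file descriptor before the process dies. It assumes glibc's <execinfo.h> (backtrace(), backtrace_symbols_fd()); on ARM the unwinder generally needs the code built with -funwind-tables (plus -rdynamic if you want symbol names rather than raw addresses), so check this on your toolchain. backtrace() is not on the POSIX async-signal-safe list, so treat this as a best-effort diagnostic; calling it once at install time avoids the lazy library loading that makes the first call riskiest. SA_RESETHAND restores the default action on entry, which keeps the handler itself from producing an abort -> signal handler -> abort loop. install_abort_hook() and abort_signal_of_child() are hypothetical names; the latter is only a desktop self-test.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_crash_fd = STDERR_FILENO;   // where the backtrace is written

// SIGABRT handler: write the aborting thread's return addresses to g_crash_fd,
// then re-raise so the default action still kills the process (and can dump core).
static void on_sigabrt(int sig) {
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, g_crash_fd);  // unlike backtrace_symbols(), no malloc
    raise(sig);  // SA_RESETHAND already restored SIG_DFL, so this cannot re-enter
}

void install_abort_hook(int fd) {
    g_crash_fd = fd;
    void* warmup[1];
    backtrace(warmup, 1);        // first call may load libgcc_s; do it here, not in the handler
    struct sigaction sa = {};
    sa.sa_handler = on_sigabrt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // one-shot: the handler runs at most once
    sigaction(SIGABRT, &sa, nullptr);
}

// Desktop self-test: fork a child that installs the hook and aborts, then
// report which signal terminated it (-1 on error, 0 if it was not a signal).
int abort_signal_of_child(int fd) {
    pid_t pid = fork();
    if (pid == 0) {
        install_abort_hook(fd);
        std::abort();
    }
    int status = 0;
    if (pid < 0 || waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On the device you would call install_abort_hook() early in main() with a descriptor for a log file opened at startup; the raw addresses can then be mapped back to source lines on the host with addr2line -e my_program.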
One possibility here is that something in that thread has very, very badly smashed the stack by vastly overwriting an on-stack data structure, destroying all the needed data on the stack in the process. That makes postmortem debugging very unpleasant.
If you can reproduce the problem at will, the right thing to do is to run the thread under gdb and watch what is going on precisely at the moment when the stack gets nuked. This may, in turn, require some sort of careful search to determine where exactly the error is happening.
If you cannot reproduce the problem at will, the best I can suggest is very carefully looking for clues in the thread local storage for that thread to see if it hints at where the thread was executing before death hit.
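If you can change the code, a related trick is to leave breadcrumbs in thread-local storage yourself, so that a core dump carries a hint of where each thread last was. The sketch below is hypothetical (Breadcrumb, g_breadcrumb and CHECKPOINT() are invented names). Whether gdb can print a thread_local variable from a core file depends on it resolving TLS for your target, which is not guaranteed on an embedded toolchain.

```cpp
#include <cassert>
#include <cstring>
#include <thread>

// Hypothetical breadcrumb: the last checkpoint the current thread passed.
struct Breadcrumb {
    const char* file;
    int line;
};

// One copy per thread; plain data, so it is present in a core dump.
thread_local Breadcrumb g_breadcrumb = {"<none>", 0};

// Record the current source location for the calling thread only.
#define CHECKPOINT() (g_breadcrumb = Breadcrumb{__FILE__, __LINE__})

// A freshly started thread has passed no checkpoint yet, whatever the caller recorded.
int line_seen_by_fresh_thread() {
    int line = -1;
    std::thread t([&line] { line = g_breadcrumb.line; });
    t.join();
    return line;
}
```

If TLS resolution works, selecting a thread in gdb and running print g_breadcrumb would show the file and line that thread last reached.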

Python PyGILState_{Ensure/Release} causes segfault while returning to C++ from Python code

UPDATE: Well, it looks like adding PyEval_InitThreads() before the call to PyGILState_Ensure() does the trick. In my haste to figure things out, I incorrectly attributed my "hanging" to PyEval_InitThreads().
However, after reading some Python documentation I am wondering if this is the correct solution.
It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock.
First of all, I am working on some modified GNU Radio code - particularly a modified gr_bin_statistics_f block. Now, there is a bug report (albeit an old one) which pretty much describes my exact situation.
http://gnuradio.org/redmine/issues/show/199
Now, usrp_spectrum_sense.py which is mentioned in the bug report calls gr_bin_statistics_f (C++) which then calls back to Python periodically to re-tune the USRP (radio).
Here is what happens when the Python code is called:
PyGILState_STATE d_gstate;
d_gstate = PyGILState_Ensure();
// call python code
PyGILState_Release(d_gstate);
So, once we return from the Python code a segmentation fault occurs when PyGILState_Release(d_gstate) is called. While there are differences between my code and the original gr_bin_statistics_f, nothing seems to be remotely related to this.
I read that calling PyEval_InitThreads() before PyGILState_Ensure() has solved the problem for some people, but it just causes my program to hang.
Can anyone shed light on this for me? Or is it simply time to send a message to the GNU Radio mailing list?
Using Python 2.7 on Fedora 14 x86_64.
Here is the GDB backtrace:
(gdb) c
Continuing.
[New Thread 0x7fabd3a8d700 (LWP 23969)]
[New Thread 0x7fabd328c700 (LWP 23970)]
[New Thread 0x7fabd2a8b700 (LWP 23971)]
[New Thread 0x7fabd228a700 (LWP 23972)]
[New Thread 0x7fabd1a89700 (LWP 23973)]
[New Thread 0x7fabd1288700 (LWP 23974)]
[New Thread 0x7fabd0a87700 (LWP 23975)]
[New Thread 0x7fabbbfff700 (LWP 23976)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fabbbfff700 (LWP 23976)]
0x00000036b3e0db00 in sem_post () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00000036b3e0db00 in sem_post () from /lib64/libpthread.so.0
#1 0x00000036c1317679 in PyThread_release_lock () from /usr/lib64/libpython2.7.so.1.0
#2 0x00007fabd6159c1f in ~ensure_py_gil_state (this=0x2dc6fc0, x=887000000)
at gnuradio_swig_py_general.cc:5593
#3 gr_py_feval_dd::calleval (this=0x2dc6fc0, x=887000000) at gnuradio_swig_py_general.cc:5605
#4 0x00007fabd77c4b6e in gr_noise_level_f::tune_window (this=0x2db3ca0,
target_freq=) at gr_noise_level_f.cc:97
#5 0x00007fabd77c554b in gr_noise_level_f::work (this=0x2db3ca0, noutput_items=7,
input_items=, output_items=)
at gr_noise_level_f.cc:115
#6 0x00007fabd7860714 in gr_sync_block::general_work (this=0x2db3ca0,
noutput_items=, ninput_items=,
input_items=, output_items=) at gr_sync_block.cc:64
#7 0x00007fabd7846ce4 in gr_block_executor::run_one_iteration (this=0x7fabbbffed90)
at gr_block_executor.cc:299
#8 0x00007fabd7864332 in gr_tpb_thread_body::gr_tpb_thread_body (this=0x7fabbbffed90, block=...)
at gr_tpb_thread_body.cc:49
#9 0x00007fabd785cce7 in operator() (function_obj_ptr=...) at gr_scheduler_tpb.cc:42
#10 operator() (function_obj_ptr=...)
at /home/tja/Research/energy/detector/gnuradio-3.3.0/gruel/src/include/gruel/thread_body_wrapper.h:49
#11 boost::detail::function::void_function_obj_invoker0, void>::invoke (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
#12 0x00007fabd74914ef in operator() (this=)
at /usr/include/boost/function/function_template.hpp:1013
#13 boost::detail::thread_data >::run (this=)
at /usr/include/boost/thread/detail/thread.hpp:61
#14 0x00007fabd725ca55 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.1.44.0
#15 0x00000036b3e06d5b in start_thread () from /lib64/libpthread.so.0
#16 0x00000036b3ae4a7d in clone () from /lib64/libc.so.6
(gdb)
Thanks for looking!
Python expects a certain amount of initialization to be done by the main thread before anything attempts to call back in from a subthread.
If the main thread is an application that is embedding Python, then it should call PyEval_InitThreads() immediately after calling Py_Initialize().
If the main thread is instead the Python interpreter itself (as seems to be the case here), then the module using the multithreaded extension module should include an "import threading" early to ensure that PyEval_InitThreads() is called correctly before any subthreads are spawned.
I ran into this exact problem as well. The documentation for anything relating to threads in CPython is unfortunately patchy at best.
Essentially, you need to do the following:
In your main thread, BEFORE any other threads are spawned, you need to call PyEval_InitThreads(). A good place to do this is right after you call Py_Initialize().
Now, PyEval_InitThreads() not only initializes the Python interpreter thread state, it also implicitly acquires the Global Interpreter Lock. This means you need to release the lock before you call PyGILState_Ensure() in some other thread; otherwise your program will hang. You can do this with the function PyEval_ReleaseLock().
So basically, in your main thread, before any other threads are launched, you want to say:
Py_Initialize();
PyEval_InitThreads();
PyEval_ReleaseLock();
Then, in any additional thread, anytime you use the Python API you need to say:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* ... some code that does things with Python ... */
PyGILState_Release(gstate);

Determining the correct thread to debug in GDB

I've run into some problems debugging a multithreaded process using GDB. My process splinters off into several (8 or 9) different threads, and I am trying to determine the contents of variables when the constructor for a class called XML_File_Data is called. However, after I apply the correct function breakpoint to all threads and it's apparent that one of the threads' breakpoints is being hit (the program temporarily halts execution), I'm not able to determine which thread hit the breakpoint. The command
(gdb) thread apply all where
is giving me shockingly useless information in the form:
#0 0x004ab410 in __kernel_vsyscall ()
#1 0x05268996 in nanosleep () from /lib/libc.so.6
#2 0x052a215c in usleep () from /lib/libc.so.6
#3 0x082ee313 in frame_clock_frame_end (clock=0xb4bfd2f8)
at frame_clock.c:143
#4 0x003a349a in ?? ()
#5 0x00b5cfde in thread_proxy ()
from /cets_development_libraries/install/lib/libboost_thread-gcc41-mt-1_38.so.1.38.0
#6 0x02c1f5ab in start_thread () from /lib/libpthread.so.0
#7 0x052a8cfe in clone () from /lib/libc.so.6
Of the 9 threads, 7 or so give me almost exactly that output, and the information about the last 2 isn't much more helpful (functions far down the call stack have recognizable names, but the most recent frames, #0-#4, aren't recognizable).
This is what I have so far:
$ gdb
(gdb) attach <processid>
(gdb) thread apply all 'XML_File_Data::XML_File_Data()'
and (after the breakpoint is hit)
(gdb) thread apply all where
Could any experienced debuggers offer me some hints on what I am doing wrong or what is normally done in this situation?
Cheers,
Charlie
EDIT: Fortunately, I was able to find out that the ??'s were caused by running optimized code through the debugger, and by not running the debugger in the directory of the executable. Still not much success with the debugging, though.
I find myself doing this all the time:
> t a a f
Short for:
> thread apply all frame
Of course, other variants are possible:
> t a a bt 3
Which prints the innermost 3 frames (#0-#2) of each thread's stack. (You can also use a negative number, e.g. bt -3, to get the outermost N frames of the stack.)
You can use the thread or info threads command to find out the current thread number after a breakpoint is hit:
(gdb) thread
[Current thread is 1 (Thread 0xb790d6c0 (LWP 2519))]
(gdb)
(gdb) info threads
17 Thread 0xb789cb90 (LWP 2536) 0xb7fc6402 in __kernel_vsyscall ()
16 Thread 0xb769bb90 (LWP 2537) 0xb7fc6402 in __kernel_vsyscall ()
15 Thread 0xb749ab90 (LWP 2543) 0xb7fc6402 in __kernel_vsyscall ()
14 Thread 0xb7282b90 (LWP 2544) 0xb7fc6402 in __kernel_vsyscall ()
13 Thread 0xb5827b90 (LWP 2707) 0xb7fc6402 in __kernel_vsyscall ()
12 Thread 0xb5626b90 (LWP 2708) 0xb7fc6402 in __kernel_vsyscall ()
11 Thread 0xb5425b90 (LWP 2709) 0xb7fc6402 in __kernel_vsyscall ()
10 Thread 0xb5161b90 (LWP 2713) 0xb7fc6402 in __kernel_vsyscall ()
9 Thread 0xb4ef9b90 (LWP 2715) 0xb7fc6402 in __kernel_vsyscall ()
8 Thread 0xb4af7b90 (LWP 2717) 0xb7fc6402 in __kernel_vsyscall ()
7 Thread 0xb46ffb90 (LWP 2718) 0xb7fc6402 in __kernel_vsyscall ()
6 Thread 0xb44feb90 (LWP 2726) 0xb7fc6402 in __kernel_vsyscall ()
5 Thread 0xb42fdb90 (LWP 2847) 0xb7fc6402 in __kernel_vsyscall ()
4 Thread 0xb40fcb90 (LWP 2848) 0xb7fc6402 in __kernel_vsyscall ()
3 Thread 0xb3efbb90 (LWP 2849) 0xb7fc6402 in __kernel_vsyscall ()
2 Thread 0xb3cfab90 (LWP 2850) 0xb7fc6402 in __kernel_vsyscall ()
* 1 Thread 0xb790d6c0 (LWP 2519) 0xb7fc6402 in __kernel_vsyscall ()
(gdb)
An asterisk `*' to the left of the gdb thread number indicates the current thread. See here.