What's with the extra threads reported by GDB? - c++

I have a C++ application that starts as single thread and processes some video frames. For each frame the application spawns 2 threads that join and this is done in a loop for each frame.
I'm trying to investigate whether there is another thread that I haven't detected. The application is quite complex and loads shared libraries that may spawn threads of their own.
I use gdb's info threads for that.
This is what I get:
Id Target Id Frame
7 Thread 0x7fffde7fc700 (LWP 16644) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
6 Thread 0x7fffdeffd700 (LWP 16643) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
5 Thread 0x7fffdf7fe700 (LWP 16642) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
4 Thread 0x7fffdffff700 (LWP 16641) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
3 Thread 0x7fffe4988700 (LWP 16640) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
2 Thread 0x7fffe5c0b700 (LWP 16639) "my_debugged_process" 0x00007ffff3dc812d in poll ()
at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fc2800 (LWP 16636) "my_debugged_process" TheApplication::SomeClass::processFrame (this=0x743530, srcI=...,
dstI=...) at TheApplication.cpp:315
So the question is:
What are the threads from 2 to 7? Are they somehow related to my process? I only recognize thread 1.
I see that they are all waiting for a semaphore so I'm inclined to say that they belong to the debugger.

First, what Jonathan said in the comments: on Linux, gdb does not create any threads in your process. gdb tries to have a reasonably minimal impact on your application -- it can't be zero but it is pretty close.
Second, what Jonathan said again: to try to understand the threads after they are running, get a backtrace and see if it makes any sense. For a single thread:
(gdb) thread 52 # e.g.
(gdb) bt
Or for all of them:
(gdb) thread apply all bt
Finally, to see threads when they are created, one way to try is to get a backtrace when the thread is started:
(gdb) break pthread_create
(gdb) commands
> bt
> cont
> end
This should print a stack trace when a thread is created. This won't necessarily work in all cases (some programs call clone directly) but it should be ok for well-behaved libraries.
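To try the breakpoint recipe above on something small, here is a minimal sketch of the pattern described in the question: two threads spawned and joined per frame. The names process_half and process_frames are hypothetical stand-ins, not taken from the asker's application.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the per-frame work done by each thread.
static void process_half(int /*frame*/, int /*half*/) {}

// Spawns two threads per frame and joins them, as the question describes.
// Returns how many threads were created in total.
int process_frames(int frame_count) {
    assert(frame_count >= 0);
    int created = 0;
    for (int frame = 0; frame < frame_count; ++frame) {
        std::thread top(process_half, frame, 0);
        std::thread bottom(process_half, frame, 1);
        created += 2;
        top.join();
        bottom.join();
    }
    return created;
}
```

Calling process_frames(3) from main under gdb with the pthread_create breakpoint set should print one backtrace per std::thread constructed, assuming a standard library that builds std::thread on pthread_create (as libstdc++ does on Linux). Any backtrace that does not pass through process_frames would point at a thread created elsewhere, for example by a shared library.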

Related

gdb show local file frames in thread list

I'm debugging C++ deadlocks with gdb. Inevitably all the deadlock frames are deep inside library functions. For example, __lll_lock_wait is found several layers under mutex::lock(). It would be really great if I could just glance at the thread list to deduce where the deadlock is happening instead of going through the backtrace for each thread.
Example
main.cpp
In this example I purposely introduce deadlock by trying to lock the same mutex twice.
#include <mutex>
#include <thread>
void lock_a(){
    std::mutex a;
    a.lock();
    // attempt to lock something that's already locked
    a.lock();
}
int main(){
    std::thread t_a(lock_a);
    t_a.join();
    return 0;
}
compilation & running
clang++ -g main.cpp -lpthread -o main.exe
gdb -ex run --args ./main.exe
gdb
Here I'm trying to figure out what caused the deadlock. If info threads could somehow prioritize "local" files I might find the cause much faster. At long last I'm able to see that I'm trying to lock a mutex in main.cpp
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7a41740 (LWP 2651239) "main.exe" __pthread_clockjoin_ex (threadid=140737348110080, thread_return=0x0,
clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
2 Thread 0x7ffff7a40700 (LWP 2651293) "main.exe" __lll_lock_wait (futex=futex@entry=0x7ffff7a3fe08, private=0) at lowlevellock.c:52
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff7a40700 (LWP 2651293))]
#0 __lll_lock_wait (futex=futex@entry=0x7ffff7a3fe08, private=0) at lowlevellock.c:52
52 lowlevellock.c: No such file or directory.
(gdb) f 4
#4 0x0000000000401223 in lock_a () at main.cpp:8
8 a.lock();
Is there some way to have gdb prioritize certain files when listing threads?

What does [Thread 0x7fffc57fa700 (LWP 31671) exited] mean in gdb?

I am developing and debugging a program on Ubuntu 18.04 in C and C++.
At some point my multithreaded program crashes. In gdb I also see:
[Thread 0x7fffc57fa700 (LWP 31671) exited]
What do 0x7fffc57fa700 and LWP 31671 mean? I guess they are something like thread IDs. I need to get them in C code. But when I tried std::this_thread::get_id(), it returned an int value, not 0x7fffc57fa700.
"LWP 31671" is the "Light Weight Process" ID of the thread that ended. It is how the Linux kernel identifies the thread. It's an implementation detail specific to Linux; you can usually ignore it.
"0x7fffc57fa700" is the hexadecimal representation of the thread's ID, namely what is returned by the thread class's member function get_id() for C++ threads or by pthread_self() for POSIX threads.
The address probably refers to either the pthread_t (which can be obtained with pthread_self) or the thread entry point, while the LWP (Light Weight Process) ID can be obtained with syscall(SYS_gettid).
Note that the value returned by this_thread::get_id() is not necessarily related to those system values; it is just an identifier that differs between all thread objects representing a running thread, so it can be used to tell threads apart.
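As a rough illustration of the two values discussed above, the following sketch prints them in the same shape gdb uses. It assumes Linux with glibc, where pthread_t is an unsigned long; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread id: the number gdb prints after "LWP".
long kernel_thread_id() {
    return static_cast<long>(syscall(SYS_gettid));
}

// pthread_t as an integer: the hex value gdb prints after "Thread".
// Assumption: pthread_t is an integer type, as it is on Linux/glibc.
unsigned long pthread_id() {
    return static_cast<unsigned long>(pthread_self());
}

// Prints the calling thread's ids in gdb's "Thread 0x... (LWP n)" shape.
void print_thread_ids() {
    const long tid = kernel_thread_id();
    assert(tid > 0);  // gettid does not fail for a live thread
    std::printf("Thread 0x%lx (LWP %ld)\n", pthread_id(), tid);
}
```

Calling print_thread_ids() from the thread of interest gives a line that can be compared directly with gdb's messages. In the main thread the LWP value equals the process ID.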

All threads in wait in core dump file, but someone triggered SIGABRT

I am supporting an application written in C++ over many years and as of late it has started to crash providing core dumps that we don't know how to handle.
It runs on an appliance on Ubuntu 14.04.5
When loading the core file in GDB it says that:
Program terminated with signal SIGABRT, Aborted
I can inspect 230 threads but they are all in wait() in the exact same memory position.
There is a thread with ID 1 that in theory could be the one responsible, but that thread is also in wait.
So I have two questions basically.
How does the id index of the threads work?
Is thread with GDB ID 1 the last active thread? or is that an arbitrary index and the failure can be in any of the other threads?
How can all threads be in wait() when a SIGABRT is triggered?
Shouldn't the instruction pointer be at the failing instruction when the OS decided to step in and halt the process? Or is it some sort of deadlock protection?
Any help much appreciated.
Backtrace of thread 1:
#0 0xf771dcd9 in ?? ()
#1 0xf74ad4ca in _int_free (av=0x38663364, p=<optimized out>,have_lock=-186161432) at malloc.c:3989
#2 0xf76b41ab in std::string::_Rep::_M_destroy(std::allocator<char> const&) () from /usr/lib32/libstdc++.so.6
#3 0xf764f82f in operator delete(void*) () from /usr/lib32/libstdc++.so.6
#4 0xf764f82f in operator delete(void*) () from /usr/lib32/libstdc++.so.6
#5 0x5685e8b4 in SlimStringMapper::~SlimStringMapper() ()
#6 0x567d6bc3 in destroy ()
#7 0x566a40b4 in HttpProxy::getLogonCredentials(HttpClient*, HttpServerTransaction*, std::string const&, std::string const&, std::string&, std::string&) ()
#8 0x566a5d04 in HttpProxy::add_authorization_header(HttpClient*, HttpServerTransaction*, Hosts::Host*) ()
#9 0x566af97c in HttpProxy::onClientRequest(HttpClient*, HttpServerTransaction*) ()
#10 0x566d597e in callOnClientRequest(HttpClient*, HttpServerTransaction*, FastHttpRequest*) ()
#11 0x566d169f in GateKeeper::onClientRequest(HttpClient*, HttpServerTransaction*) ()
#12 0x566a2291 in HttpClientThread::run() ()
#13 0x5682e37c in wa_run_thread ()
#14 0xf76f6f72 in start_thread (arg=0xec65ab40) at pthread_create.c:312
#15 0xf75282ae in query_module () at ../sysdeps/unix/syscall-template.S:82
#16 0xec65ab40 in ?? ()
Another thread that should be in wait:
#0 0xf771dcd9 in ?? ()
#1 0x5682e37c in wa_run_thread ()
#2 0xf76f6f72 in start_thread (arg=0xf33bdb40) at pthread_create.c:312
#3 0xf75282ae in query_module () at ../sysdeps/unix/syscall-template.S:82
#4 0xf33bdb40 in ?? ()
Best regards
Jon
How can all threads be in wait() when a SIGABRT is triggered?
Is wait the POSIX function, or something from the run-time environment? Are you looking at a higher-level backtrace?
Anyway, there is an easy explanation why this can happen: SIGABRT was sent to the process, and not generated by a thread in a synchronous fashion. Perhaps a coworker sent the signal to create the coredump, after observing the deadlock, to collect evidence for future analysis?
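The difference between the two cases can be sketched as follows, assuming a POSIX system. The function terminating_signal is hypothetical: it forks a child that either calls abort() itself or simply idles until the parent sends SIGABRT from outside.

```cpp
#include <csignal>
#include <cstdlib>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that dies from SIGABRT and returns the signal number the
// parent observes, or -1 on error. With raised_by_child == true the child
// calls abort() itself; otherwise it idles and the parent sends the signal.
int terminating_signal(bool raised_by_child) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);      // keep the demo from writing core files
        signal(SIGABRT, SIG_DFL);              // make sure the default action applies
        if (raised_by_child) std::abort();     // synchronous: raised by the running thread
        for (;;) pause();                      // idle until the external signal arrives
    }
    if (!raised_by_child) kill(child, SIGABRT);  // asynchronous: sent from outside
    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Both terminating_signal(true) and terminating_signal(false) report SIGABRT, so the termination message alone does not say whether a thread aborted or the signal came from elsewhere. In the externally signalled case every thread can legitimately be sitting in a wait when the core is written.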
How does the id index of the threads work? Is thread with GDB ID 1 the last active thread?
When the program is running under GDB, GDB numbers threads as it discovers them, so thread 1 is always the main thread.
But when loading a core dump, GDB discovers threads in the order in which the kernel saved them. The kernels that I have seen always save the thread which caused program termination first, so usually loading a core into GDB immediately gets you to the crash point without the need to switch threads.
How can all threads be in wait() when a SIGABRT is triggered?
One possibility is that you are not analyzing the core correctly. In particular, you need exact copies of the shared libraries that were in use when the core was produced, and that's unlikely to be the case when the application runs on an "appliance" and you are analyzing the core on your development machine. See this answer.
I just saw your question. First of all, my answer does not address your question directly; it suggests ways to handle this kind of situation. Multi-threading depends heavily on the hardware and operating system of the machine, especially memory and processors. More threads require more memory as well as more processor time slices. I doubt your machine has more than 100 processors to let 230 threads run concurrently at full performance. To avoid this situation, try the steps below, which may help you.
Control the creation of threads. Control the number of threads running concurrently.
Increase the memory available to your application. (Check compiler options to increase memory for the application at run time, or have the OS allocate enough memory.)
Set the grid size and stack size of each thread properly. (The calculation depends on what your application's threads do and is a bit complicated. Please read some documentation.)
Handle synchronized blocks properly to avoid any deadlock.
Where necessary, use condition variables and similar primitives.
As you said, most of your threads are in a wait condition. That means they are waiting for a lock to be released, which in turn means that one of the threads has already acquired the lock and is either still busy processing or possibly deadlocked.

How to tell what line of code created new thread (gdb)?

I'm attempting to debug a rather complicated program that is seg faulting. I've just learned about gdb and am trying to use it to find the problem. Currently, it shows
[New Thread 0x7fff4963700 (LWP 4768)]
[New Thread 0x7fff1faf700 (LWP 4769)]
[New Thread 0x7fff17ae700 (LWP 4768)]
very shortly after my program commences. That would be great if I had written multithreaded code, but I haven't. Is there a way to tell exactly what line of code is creating these new threads?
On Linux, catch syscall clone should break on every thread creation (and possibly on some process creations). Note that it breaks in the creator thread, that is, before the new thread has started.
Since you get the full backtrace that leads to the clone, if you need the new thread's entry point you should go up until you reach the pthread_create (or similar library function) stack frame and take it from its parameters (you can also check the parameters to clone directly, but I fear that the address there will be that of some pthread library stub).
Threads have their own call stacks. The only thing you can see is the value at the bottom of the stack. Select the thread with t <thread id> or thread <thread id> and get its call stack with bt or backtrace. You can obtain the thread ids by pausing your application in gdb and running info threads.
For example, my gdb session looks like this (I tried to keep it as clear as possible):
(gdb) t 23
[Switching to thread 23 (Thread 0x7fff8ffff700 (LWP 32334))]
#0 0x00007fffc0cb829e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
(gdb) bt
#0 0x00007fffc0cb829e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1 0x00007fffc0cb5bb0 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2 0x00007ffff52b10a5 in start_thread (arg=0x7fff8ffff700) at pthread_create.c:309
#3 0x00007ffff591a88d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Here gdb says that the innermost frames of the call stack are somewhere in libgomp.so (the OpenMP library). Next you can see pthread_create.c, which is the system-dependent code that starts the thread.

gdb nostop SIGSEGV on a specific thread

I have a program that purposely segfaults on one thread, but my problem is that the other thread is segfaulting too. I'd like to catch that with GDB. I saw that I can:
handle SIGSEGV nostop noprint
but I'd like to do that only on the thread that segfaults on purpose. Is that possible?
I'll explain:
I have 2 threads, one thread is segfaulting(and recovers(mprotect read only and then releasing memory)), that works fine, the other thread does something else, but sadly, there is a bug and it is segfaulting, I want to catch that segfault, and not the other ones that occur in the other thread.
As far as I know this depends on the OS. I assume Linux for my answer, and there the answer is 'NO'!
POSIX signals can have a signal mask per thread but only one handler per process. So it is not possible to set up different handling for each thread; sigaction sets the handler for the complete process. So I see no way for gdb to change this.
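The per-thread mask mentioned above can be checked with a small sketch, assuming a POSIX system. The function names are hypothetical; SIGUSR1 is used only because it is harmless to block.

```cpp
#include <pthread.h>
#include <signal.h>
#include <thread>

// Reports whether SIGUSR1 is blocked in the calling thread's signal mask.
static bool sigusr1_blocked_here() {
    sigset_t current;
    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, nullptr, &current);  // null set: query only
    return sigismember(&current, SIGUSR1) == 1;
}

// Blocks SIGUSR1 in a worker thread only, then checks that the calling
// thread's mask is unchanged. Returns true when the mask is per thread.
bool mask_is_per_thread() {
    const bool caller_before = sigusr1_blocked_here();
    bool worker_blocked = false;
    std::thread worker([&worker_blocked] {
        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &block, nullptr);  // affects this thread only
        worker_blocked = sigusr1_blocked_here();
    });
    worker.join();
    const bool caller_after = sigusr1_blocked_here();
    return worker_blocked && caller_after == caller_before;
}
```

The mask is set with pthread_sigmask and belongs to one thread, while a handler installed with sigaction applies to every thread in the process. That asymmetry is why the disposition of SIGSEGV cannot differ between the two threads in the question.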
I'll explain: I have 2 threads, one thread is segfaulting(and recovers(mprotect read only and then releasing memory)), that works fine, the other thread does something else, but sadly, there is a bug and it is segfaulting, I want to catch that segfault, and not the other ones that occur in the other thread
You have to tell gdb to ignore the first SIGSEGV signal. So after the first segfault, use the signal 0 command in this thread. Your program will resume execution under gdb, which is what you want. Then it will stop at the second segfault in your second thread, and this is the one you want to inspect.
(gdb) help signal
Continue program with the specified signal.
Usage: signal SIGNAL
The SIGNAL argument is processed the same as the handle command.
An argument of "0" means continue the program without sending it a signal.
This is useful in cases where the program stopped because of a signal,
and you want to resume the program while discarding the signal.
So:
Do not use handle SIGSEGV nostop noprint. Run your program under gdb.
When it segfaults in the first thread, do signal 0. Your program resumes execution.
Then it segfaults in another thread. Now use backtrace to see the problem.
Or, if your two threads do not depend on each other, you can make the thread that segfaulted first wait until the other segfault happens. Just do call sleep(60) in the first thread as soon as it causes a segfault and wait for the segfault in the other thread. Your first thread will wait:
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff7fde700 (LWP 25744)]
0x000000000040075d in my_thread_func1 (arg=0x0) at my_test_2.cpp:17
17 ptr1 = ptr1 / 0;
(gdb) call sleep(60)
Thread 140737343510272:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff75dd700 (LWP 25745)]
0x00000000004007a3 in my_thread_func2 (arg=0x0) at my_test_2.cpp:27
27 *ptr2 = *ptr2 + 2;
The program received a signal in another thread while
making a function call from GDB.
Evaluation of the expression containing the function
(sleep) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb)