How to tell what line of code created new thread (gdb)? - c++

I'm attempting to debug a rather complicated program that is seg faulting. I've just learned about gdb and am trying to use it to find the problem. Currently, it shows
[New Thread 0x7fff4963700 (LWP 4768)]
[New Thread 0x7fff1faf700 (LWP 4769)]
[New Thread 0x7fff17ae700 (LWP 4768)]
very shortly after my program commences. That would be great if I had written multithreaded code, but I haven't. Is there a way to tell exactly what line of code is creating these new threads?

On Linux, catch syscall clone should break on the creation of every thread (and possibly of some processes). Note that it breaks in the creating thread: at that point the new thread has not started yet.
Since you get the full backtrace that leads to the clone, if you need to extract the new thread's entry point you should do up until you reach the pthread_create (or similar library function) stack frame and take it from its parameters. (You could also check the parameters to clone directly, but the address there will likely be of some pthread library stub.)

Each thread has its own call stack; the bottom-most frames show where the thread was started. Switch to a thread with t <thread id> (short for thread <thread id>) and get its call stack using bt (backtrace). You can obtain the thread ids by pausing execution of your application in gdb and running info threads.
For example, my gdb session looks like this (trimmed to make it clearer for you):
(gdb) t 23
[Switching to thread 23 (Thread 0x7fff8ffff700 (LWP 32334))]
#0 0x00007fffc0cb829e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
(gdb) bt
#0 0x00007fffc0cb829e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1 0x00007fffc0cb5bb0 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2 0x00007ffff52b10a5 in start_thread (arg=0x7fff8ffff700) at pthread_create.c:309
#3 0x00007ffff591a88d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Here gdb says that the top of the call stack is somewhere in libgomp.so (the OpenMP runtime). Further down you can see pthread_create.c, the system-dependent machinery that started the thread.

Related

gdb show local file frames in thread list

I'm debugging C++ deadlocks with gdb. Inevitably all the deadlock frames are deep inside library functions. For example, __lll_lock_wait is found several layers under mutex::lock(). It would be really great if I could just glance at the thread list to deduce where the deadlock is happening instead of going through the backtrace for each thread.
Example
main.cpp
In this example I purposely introduce a deadlock by trying to lock the same mutex twice.
#include <mutex>
#include <thread>

void lock_a() {
    std::mutex a;
    a.lock();
    // attempt to lock something that's already locked
    a.lock();
}

int main() {
    std::thread t_a(lock_a);
    t_a.join();
    return 0;
}
compilation & running
clang++ -g main.cpp -lpthread -o main.exe
gdb -ex run --args ./main.exe
gdb
Here I'm trying to figure out what caused the deadlock. If info threads could somehow prioritize "local" files, I might find the cause much faster. Only after several steps am I able to see that I'm trying to lock a mutex in main.cpp:
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7a41740 (LWP 2651239) "main.exe" __pthread_clockjoin_ex (threadid=140737348110080, thread_return=0x0,
clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
2 Thread 0x7ffff7a40700 (LWP 2651293) "main.exe" __lll_lock_wait (futex=futex@entry=0x7ffff7a3fe08, private=0) at lowlevellock.c:52
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff7a40700 (LWP 2651293))]
#0 __lll_lock_wait (futex=futex@entry=0x7ffff7a3fe08, private=0) at lowlevellock.c:52
52 lowlevellock.c: No such file or directory.
(gdb) f 4
#4 0x0000000000401223 in lock_a () at main.cpp:8
8 a.lock();
Is there some way to have gdb prioritize certain files when listing threads?

Deadlock when throwing an exception in C++

I'm investigating a report of a deadlock that occurred within my library, which is generally multi-threaded and written in C++11. The stacktrace during the deadlock looks like this:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fb4049e250d in __lll_lock_wait () from /lib64/libpthread.so.0
Id Target Id Frame
* 1 Thread 0x7fb40533b740 (LWP 26259) "i-foca" 0x00007fb4049e250d in __lll_lock_wait () from /lib64/libpthread.so.0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fb4049e250d in __lll_lock_wait () from /lib64/libpthread.so.0
Thread 1 (Thread 0x7fb40533b740 (LWP 26259)):
#0 0x00007fb4049e250d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fb4049dde76 in _L_lock_941 () from /lib64/libpthread.so.0
#2 0x00007fb4049ddd6f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fb40403a0af in dl_iterate_phdr () from /lib64/libc.so.6
#4 0x00007fb3eb7f3bbf in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#5 0x00007fb3eb7f0d2c in ?? () from /lib64/libgcc_s.so.1
#6 0x00007fb3eb7f16ed in ?? () from /lib64/libgcc_s.so.1
#7 0x00007fb3eb7f1b7e in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
#8 0x00007fb3eba56986 in __cxa_throw () from /lib64/libstdc++.so.6
#9 0x00007fb3e7b3dd39 in <my library>
The code that causes the deadlock is basically throw NameError(...);, which is to say, a standard C++ construct which is supposed to be thread-safe. However, the code deadlocks nevertheless, trying to acquire a mutex in GLIBC's dl_iterate_phdr(). The following additional information is known about the environment:
Even though my library can spawn multiple threads, during the incident it ran in single-threaded mode, as evidenced by the stacktrace;
The program where my library is used does extensive forking-without-exec;
My library uses an at-fork handler in order to sanitize all its mutexes/threads when a fork occurs (however, I have no control over the mutexes in standard libraries). In particular, a fork cannot occur while an exception is being thrown.
I still don't understand how this deadlock could have occurred.
I'm considering the following scenarios, but not sure which one is possible and which one is not:
There are multiple child processes. One of them tries to throw an exception and crashes. If the mutex that GLIBC uses were somehow shared between child processes, one of the children could lock it and then fail to unlock it because of the crash. Is it possible for a mutex to be shared in such a way?
Another library that I'm not aware of also uses multiple threads, and the fork happens when that library throws an exception in its code, which leaves the exception mutex in locked state in the child process. My library is then merely unfortunate enough to walk into this trap.
Any other scenario?

Segmentation Fault in Multithreaded program and incomplete information on gdb backtrace

I am writing a program which uses both OS threads and user threads (fibers; I wrote this user-threading library myself, with the context switching done in assembly language). The problem is that the program sometimes ends with a segmentation fault, but other times it doesn't.
The problem is due to a function getting called with invalid arguments, a function which shouldn't get called at all. I think gdb's backtrace isn't giving the proper information. Here is the output of my gdb session:
#0 0x0000000000000000 in ?? ()
#1 0x0000555555555613 in thread_entry (fn=0x0, arg=0x0) at userThread2.cpp:243
#2 0x000055555555c791 in start_thread () at contextSwitch2.s:57
#3 0x0000000000000000 in ?? ()
fn is the function I want to run as a user thread, arg is the argument passed to that function.
I have a function Spawn in my user-threading library which pushes the two arguments (fn and arg) and the pointer to start_thread onto the stack; start_thread, an assembly function, then gets called, which calls the C++ function thread_entry, which in turn calls fn with argument arg.
I am not expecting a call to start_thread or thread_entry at the point of the error, so I am not sure how start_thread gets called. Even if it does get called, then Spawn() should have been the caller, as it is the only function which calls start_thread. But Spawn is not shown in the gdb backtrace.
Some online posts have mentioned stack corruption or something similar as a possible cause of this kind of error, and they prescribe the use of "record btrace pt". I have spent considerable time setting up Intel Processor Trace support in the kernel/gdb, but I was unable to get it working, so I am not going down that route.
Here is a link to my code with compilation instructions:
https://github.com/smartWaqar/userThreading
I set a breakpoint on thread_entry, and observed:
...
[Thread 0x7ffff7477700 (LWP 203995) exited]
parentId: 1
OST 1 Hello A0 on CPU 2
current_thread_num 0 next_thread_num 1
After Thread Exit
After changeOSThread
OST 1 Hello C1 on CPU 2 ---------------
Before changeOSThread
**************** In changeOSThread **************
current_thread_num 1 next_thread_num 2
Thread 3 "a.out" hit Breakpoint 1, thread_entry (fn=0x0, arg=0x0) at userThread2.cpp:243
243 fn(arg) ;
(gdb) bt
#0 thread_entry (fn=0x0, arg=0x0) at userThread2.cpp:243
#1 0x000055555555c181 in start_thread () at context.s:57
#2 0x0000000000000000 in ?? ()
Conclusions:
GDB is giving you the correct crash stack trace.
You do in fact call thread_entry with fn==0, which of course promptly crashes.
Something racy is going on, as this doesn't happen every time.
Even if it gets called then Spawn() should have called start_thread as it is the only function which calls start_thread
I've observed the following "call" to start_thread:
Thread 2 "a.out" hit Breakpoint 1, start_thread () at context.s:53
53 push %rbp
(gdb) bt
#0 start_thread () at context.s:53
#1 0x0000555555555e4f in changeOSThread (parentId=<error reading variable>) at t.cc:196
#2 0x0000000000000000 in ?? ()
So I think your mental model of who calls start_thread is wrong.
This is a bit too much code for me to look at. If you want additional help, please reduce the test case to bare minimum.

All threads in wait in core dump file, but someone triggered SIGABRT

I am supporting an application written in C++ over many years, and lately it has started to crash, producing core dumps that we don't know how to handle.
It runs on an appliance on Ubuntu 14.04.5
When loading the core file in GDB it says that:
Program terminated with signal SIGABRT, Aborted
I can inspect 230 threads, but they are all in wait() at the exact same memory address.
There is a thread with ID 1 that in theory could be the responsible one, but that thread is also in wait.
So I have two questions basically.
How does the id index of the threads work?
Is the thread with GDB ID 1 the last active thread, or is that an arbitrary index, meaning the failure could be in any of the other threads?
How can all threads be in wait() when a SIGABRT is triggered?
Shouldn't the instruction pointer be at the failing instruction when the OS decided to step in and halt the process? Or is it some sort of deadlock protection?
Any help much appreciated.
Backtrace of thread 1:
#0 0xf771dcd9 in ?? ()
#1 0xf74ad4ca in _int_free (av=0x38663364, p=<optimized out>,have_lock=-186161432) at malloc.c:3989
#2 0xf76b41ab in std::string::_Rep::_M_destroy(std::allocator<char> const&) () from /usr/lib32/libstdc++.so.6
#3 0xf764f82f in operator delete(void*) () from /usr/lib32/libstdc++.so.6
#4 0xf764f82f in operator delete(void*) () from /usr/lib32/libstdc++.so.6
#5 0x5685e8b4 in SlimStringMapper::~SlimStringMapper() ()
#6 0x567d6bc3 in destroy ()
#7 0x566a40b4 in HttpProxy::getLogonCredentials(HttpClient*, HttpServerTransaction*, std::string const&, std::string const&, std::string&, std::string&) ()
#8 0x566a5d04 in HttpProxy::add_authorization_header(HttpClient*, HttpServerTransaction*, Hosts::Host*) ()
#9 0x566af97c in HttpProxy::onClientRequest(HttpClient*, HttpServerTransaction*) ()
#10 0x566d597e in callOnClientRequest(HttpClient*, HttpServerTransaction*, FastHttpRequest*) ()
#11 0x566d169f in GateKeeper::onClientRequest(HttpClient*, HttpServerTransaction*) ()
#12 0x566a2291 in HttpClientThread::run() ()
#13 0x5682e37c in wa_run_thread ()
#14 0xf76f6f72 in start_thread (arg=0xec65ab40) at pthread_create.c:312
#15 0xf75282ae in query_module () at ../sysdeps/unix/syscall-template.S:82
#16 0xec65ab40 in ?? ()
Another thread that should be in wait:
#0 0xf771dcd9 in ?? ()
#1 0x5682e37c in wa_run_thread ()
#2 0xf76f6f72 in start_thread (arg=0xf33bdb40) at pthread_create.c:312
#3 0xf75282ae in query_module () at ../sysdeps/unix/syscall-template.S:82
#4 0xf33bdb40 in ?? ()
Best regards
Jon
How can all threads be in wait() when a SIGABRT is triggered?
Is wait the POSIX function, or something from the run-time environment? Are you looking at a higher-level backtrace?
Anyway, there is an easy explanation why this can happen: SIGABRT was sent to the process, and not generated by a thread in a synchronous fashion. Perhaps a coworker sent the signal to create the coredump, after observing the deadlock, to collect evidence for future analysis?
How does the id index of the threads work? Is thread with GDB ID 1 the last active thread?
When the program is running under GDB, GDB numbers threads as it discovers them, so thread 1 is always the main thread.
But when loading a core dump, GDB discovers threads in the order in which the kernel saved them. The kernels that I have seen always save the thread which caused program termination first, so usually loading a core into GDB immediately gets you to the crash point without the need to switch threads.
How can all threads be in wait() when a SIGABRT is triggered?
One possibility is that you are not analyzing the core correctly. In particular, you need exact copies of the shared libraries that were in use at the time the core was produced, and that's unlikely to be the case when the application runs on an appliance and you are analyzing the core on your development machine. See this answer.
I just saw your question. My answer is not specific to your direct question, but offers some ways to handle this kind of situation. Multi-threading depends entirely on the hardware and operating system of the machine, especially memory and processors. More threads mean more memory and more processor time slices. Your machine surely does not have the 100+ processors it would take to run 230 threads concurrently at full performance. To avoid this situation, the steps below may help:
Control the creation of threads; limit the number of threads running concurrently.
Increase the memory available to your application (check compiler options for run-time memory, or have the O/S allocate more).
Set the guard size and stack size of each thread properly. (The calculation depends on what your application's threads do and is a bit complicated; please read the relevant documentation.)
Handle synchronized blocks properly to avoid deadlock.
Where necessary, use condition variables, etc.
Since you say that most of your threads are in a wait state, they are waiting for a lock to be released for their turn; that means one thread has already acquired the lock and is still busy processing, or is probably in a deadlock.

What's with the extra threads reported by GDB?

I have a C++ application that starts as a single thread and processes video frames. For each frame the application spawns 2 threads that are then joined; this is done in a loop, once per frame.
I'm trying to investigate whether there is another thread that I haven't detected. The application is quite complex and loads shared libraries that may spawn threads of their own.
I use gdb's info threads for that.
This is what I get:
Id Target Id Frame
7 Thread 0x7fffde7fc700 (LWP 16644) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
6 Thread 0x7fffdeffd700 (LWP 16643) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
5 Thread 0x7fffdf7fe700 (LWP 16642) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
4 Thread 0x7fffdffff700 (LWP 16641) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
3 Thread 0x7fffe4988700 (LWP 16640) "my_debugged_process" sem_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
2 Thread 0x7fffe5c0b700 (LWP 16639) "my_debugged_process" 0x00007ffff3dc812d in poll ()
at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fc2800 (LWP 16636) "my_debugged_process" TheApplication::SomeClass::processFrame (this=0x743530, srcI=...,
dstI=...) at TheApplication.cpp:315
So the question is:
What are the threads from 2 to 7? Are they somehow related to my process? I only recognize thread 1.
I see that they are all waiting for a semaphore so I'm inclined to say that they belong to the debugger.
First, what Jonathan said in the comments: on Linux, gdb does not create any threads in your process. gdb tries to have a reasonably minimal impact on your application -- it can't be zero but it is pretty close.
Second, what Jonathan said again: to try to understand the threads after they are running, get a backtrace and see if it makes any sense. For a single thread:
(gdb) thread 52 # e.g.
(gdb) bt
Or for all of them:
(gdb) thread apply all bt
Finally, to see threads as they are created, one approach is to take a backtrace whenever a thread is started:
(gdb) break pthread_create
(gdb) commands
> bt
> cont
> end
This should print a stack trace when a thread is created. This won't necessarily work in all cases (some programs call clone directly) but it should be ok for well-behaved libraries.