Deadlock when spawning threads? - c++

I have an application whose main thread spawns a second thread, which in turn spawns a thread for each request received. I'm getting a core dump, probably due to a deadlock. In gdb I see the following:
__lll_lock_wait_private ();
_L_lock_4714 ();
start_thread ();
clone ();
This is generated from the following code sample:
do
{
pthread_t handle;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&handle, 0, run, msg);
pthread_detach(handle);
} while (!stop);
run is an extern function while the rest of the code is part of class methods.
void* run(void* arg)
{
Handler handler;
Msg* msg = static_cast<Msg*> (arg);
handler.handleMsg(msg);
return NULL;
}
The handleMsg method does some processing and then calls another application through a system() call:
...
system("AnotherApplication param1, param2 &");
...
Note the ampersand. It is on purpose, because I want the process to run asynchronously. The response comes back to the main thread through another type of communication.
This application has been running on Linux:
Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
I'm not ignoring any signals.
What could be the problem here?

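The pthread_detach manual says:
Attempting to detach an already detached thread results in unspecified behavior.
However, you set up your threads to be created detached from the start (PTHREAD_CREATE_DETACHED) and then call pthread_detach on them as well. Note also that the code shown passes 0 rather than &attr to pthread_create, so the attribute is never actually applied.
What results do you expect?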
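A minimal sketch of detaching a thread exactly once, assuming the attribute route is what was intended. The names spawn_detached and worker, and the semaphore used to observe that the thread ran, are illustrative and not taken from the question.

```cpp
#include <pthread.h>
#include <semaphore.h>

// Hypothetical worker standing in for the question's run(): it posts a
// semaphore so the caller can tell that the detached thread actually ran.
static void* worker(void* arg)
{
    sem_post(static_cast<sem_t*>(arg));
    return NULL;
}

// Creates one detached thread through the attribute object. Returns 0 on
// success, otherwise the error number reported by the failing pthread call.
int spawn_detached(sem_t* done)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) {
        pthread_t handle;
        // Pass &attr (not 0), and do not call pthread_detach() afterwards.
        rc = pthread_create(&handle, &attr, worker, done);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

The alternative is to pass no attribute at all and call pthread_detach(handle) once after a successful pthread_create; either way, only one of the two mechanisms should be used per thread.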

Related

Why does the program hang in WSL?

I have the following code in my program.
Thread* t = arg->thread;
//at this point, the new thread is being executed.
t->myId = TGetId();
void* (*functor)(void*) = t->functor;
void* fArg = arg->arg;
nfree(arg);
_INFO_PRINTF(1, "Launching thread with ID: %d", t->myId);
sigset_t mask;
sigfillset(&mask); //fill mask with all signals
sigdelset(&mask, SIGUSR1); // allow SIGUSR1 to get to the thread.
sigdelset(&mask, SIGUSR2); // allow SIGUSR2 to get to the thread.
pthread_sigmask(SIG_SETMASK, &mask, NULL); //block some sigs
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_handler = TSignalHandler;
act.sa_mask = mask;
if(sigaction(SIGUSR1, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
if(sigaction(SIGUSR2, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
void* ret = functor(fArg);
t->hasReturned = true;
return ret;
The thread that executes this code properly calls the signal handler on native Linux. The problem is that on Windows Subsystem for Linux, the program hangs when SIGUSR1 or SIGUSR2 is sent via pthread_kill, which sends a signal to a specific thread. Why does this work on native Ubuntu (via VMware Workstation 14), Debian, and Fedora, but NOT on WSL?
When you have a hanging bug that you cannot reproduce when running within the debugger, you can attach the debugger to the running process after you reproduce the hang. This won't let you observe the variables changing as you lead to the hang, but at least you get the stack trace of exactly where the hang is occurring.
Once you know the process id of the hung process (assume it's 12345), you can use:
$ gdb -p 12345
Or, you can kill the process with a signal that will cause a core to be generated. I like to use SIGTRAP, since it is easy to distinguish from a SIGSEGV.
$ kill -SIGTRAP 12345
And then you can use gdb to discover what the process was hanging on.
The advantage of attaching to the running process is that the process is still live. This allows you to call functions from the debugger, which may provide easier access to diagnostics built into your program. The core file preserves the error, which is beneficial if the hanging bug is difficult to reproduce.

Signal handling in multi-threaded process

I have a basic problem with handling a signal in a multi-threaded process.
In my code, I create one sub-thread from the main thread to listen for a SIGALRM that will later be triggered by the main thread (using another function such as timer_create gives me the same result, so please don't focus on this).
The problem is that, instead of the signal being caught, the whole process terminates with a strange "Alarm clock" output on the console.
This is my code:
#include <iostream>
#include <sys/time.h>
#include <unistd.h>
#include <csignal>
using namespace std;
void* run_something(void* args){
//unblock the SIGALRM to be catched
sigset_t sig;
sigemptyset(&sig);
sigaddset(&sig, SIGALRM);
sigprocmask(SIG_UNBLOCK, &sig, NULL); //tried with pthread_sigmask
//wait for SIGALRM
int catchedSig;
sigwait(&sig, &catchedSig);
cout<<"in sub-thread, SIGALRM catched, return"<<endl;
}
int main(int argc, char** argv){
//block SIGALRM in main thread
sigset_t sig;
sigemptyset(&sig);
sigaddset(&sig, SIGALRM);
sigprocmask(SIG_BLOCK, &sig, NULL);
//create new thread
pthread_t thread;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_create(&thread, &attr, run_something, NULL);
//trigger SIGARLM after 2s
alarm(2); //tried with timer_create/sigevent
//wait
cout<<"in main thread, waiting for sub-thread to terminate"<<endl;
pthread_join(thread, NULL);
cout<<"in main thread, terminating"<<endl;
return EXIT_SUCCESS;
}
Expected result
in main thread, waiting for sub-thread to terminate
in sub-thread, SIGALRM catched, return
in main thread, terminating
Observed result
in main thread, waiting for sub-thread to terminate
Alarm clock
Additional info:
I'm using g++ (Debian 5.4.0-4) 5.4.0 20160609.
Your run_something thread unblocks SIGALRM before calling sigwait for that signal, but this is undefined behavior: sigwait expects the signals in its set to be blocked, and it works by removing a signal from the set of pending (i.e., blocked but raised) signals.
Don't unblock in your thread and you'll see the behavior you expect.
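A minimal sketch of that pattern, assuming the signal is blocked before the thread is created so the new thread inherits the mask. The function names are illustrative, and kill(getpid(), SIGALRM) stands in for the question's alarm(2) so the example does not have to wait two seconds.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Thread body: waits synchronously for SIGALRM, which stays blocked here
// because the thread inherited its creator's signal mask.
static void* wait_for_alarm(void* arg)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);

    int caught = 0;
    if (sigwait(&set, &caught) == 0)
        *static_cast<int*>(arg) = caught;
    return NULL;
}

// Blocks SIGALRM, starts the waiting thread, raises the signal for the
// process, and returns the signal number the thread received (0 on failure).
int run_sigwait_demo()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    // Block before creating the thread so that every thread inherits the mask.
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return 0;

    int caught = 0;
    pthread_t thread;
    if (pthread_create(&thread, NULL, wait_for_alarm, &caught) != 0)
        return 0;

    // Process-directed signal, like the one alarm() would generate.
    kill(getpid(), SIGALRM);
    pthread_join(thread, NULL);
    return caught;
}
```

Because the signal is blocked in every thread, it simply stays pending until sigwait collects it, so the default action (terminating the process) never runs.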
The code shown does not set up any signal handler for SIGALRM.
Therefore, on signal reception the OS does as it ought to, namely invoke SIGALRM's default action, which is to terminate the process. Printing "Alarm clock" to the console is part of the default behaviour, BTW.
To fix this, set up a signal handler for SIGALRM. This can be done in a portable manner by using sigaction().
Also do not use sigprocmask() in a multi-threaded program, as its behaviour is unspecified. Use pthread_sigmask() instead.
Update:
I missed the code calls sigwait() ... :}
In that case, fixing the issue does not require setting up a signal handler (although that would also solve it and is valid); it is enough to do as proposed in pilcrow's answer, that is, leave the signal blocked before calling sigwait() (or sigwaitinfo()).
Additionally make sure to use pthread_sigmask() instead of sigprocmask() for the reason given above.
Unrelated to the question's issue:
I create one sub-thread from the main thread
There is no such concept as "sub"-threads. Once created, all of a process's threads are "siblings" on the same level. This includes the initial thread, the one that runs main(). The "main" thread is commonly called that just because of the name of its thread function: main

C++ & OpenSSL: SIGPIPE when writing in closed pipe

I'm coding a C++ SSL Server for TCP Connections on Linux.
When the program uses SSL_write() to write into a closed pipe, a SIGPIPE is raised, which causes the program to shut down. I know that this is normal behaviour, but the program should not die every time a peer fails to close the connection correctly.
I have already googled a lot and tried pretty much everything I found, but nothing seems to work for me. signal(SIGPIPE, SIG_IGN) does not work: the signal is still raised (same for signal(SIGPIPE, SomeKindOfHandler)).
The gdb output:
Program received signal SIGPIPE, Broken pipe.
0x00007ffff6b23ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0 0x00007ffff6b23ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff7883835 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#2 0x00007ffff7881687 in BIO_write () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3 0x00007ffff7b9d3e0 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#4 0x00007ffff7b9db04 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#5 0x000000000042266a in NetInterface::SendToSubscribers(bool) () at ../Bether/NetInterface.h:181
#6 0x0000000000425834 in main () at ../Bether/main.cpp:111
About the Code:
I'm using a thread which is waiting for new connections and accepting them. The thread then puts the connection information (BIO & SSL) into a static map inside the NetInterface class.
Every 5 seconds NetInterface::SendToSubscribers() is executed from main(). This function accesses the static map and sends data to every connection in there. This function is also where the SIGPIPE comes from.
I have used signal(SIGPIPE,SIG_IGN) in main() (obviously before the 5-seconds loop) and in NetInterface::SendToSubscribers(), but it is not working anywhere.
Thanks for your help!
Call sigaction to change this behavior, either to ignore SIGPIPE or to handle it with your own signal handler. Please don't use signal: its behavior varies across systems, and sigaction is the portable replacement.
http://man7.org/linux/man-pages/man2/sigaction.2.html
One way to do it (I haven't compiled this code but should be something like this):
#include <signal.h>

void sigpipe_handler(int signal)
{
    ...
}
int main()
{
    struct sigaction sh;
    struct sigaction osh;
    sh.sa_handler = &sigpipe_handler; // Can be set to SIG_IGN instead
    // Restart interrupted system calls
    sh.sa_flags = SA_RESTART;
    // Block no additional signals while the handler runs
    sigemptyset(&sh.sa_mask);
    if (sigaction(SIGPIPE, &sh, &osh) < 0)
    {
        return -1;
    }
    ...
}
If the program is multithreaded, it is a little different as you have less control on which thread will receive the signal. That depends on the type of signal. For SIGPIPE, it will be sent to the pthread that generated the signal. Nevertheless, sigaction should work OK.
It is possible to set the mask in the main thread and all subsequently created pthreads will inherit the signal mask. Otherwise, the signal mask can be set in each thread.
sigset_t blockedSignal;
sigemptyset(&blockedSignal);
sigaddset(&blockedSignal, SIGPIPE);
pthread_sigmask(SIG_BLOCK, &blockedSignal, NULL);
However, if you block the signal, it will be pending for the process and as soon as it is possible it will be delivered. For this case, use sigtimedwait at the end of the thread. sigaction set at the main thread or in the thread that generated SIGPIPE should work as well.
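To illustrate what ignoring the signal buys you, here is a small sketch that uses a plain pipe instead of an SSL connection (an assumption made only to keep the example self-contained; the function names are illustrative). Once SIGPIPE is ignored, the failed write() is reported through errno as EPIPE instead of killing the process.

```cpp
#include <cerrno>
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Ignores SIGPIPE for the whole process. Returns 0 on success, -1 on error.
int ignore_sigpipe()
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

// Writes to a pipe whose read end is already closed. Returns the errno that
// write() reports, 0 if the write unexpectedly succeeded, -1 if setup failed.
int write_to_closed_pipe()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);  // no reader is left

    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

With the signal out of the way, the caller can treat the error return like any other failed write and drop that connection.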
I've found the solution, it works with pthread_sigmask.
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGPIPE);
if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
return -1;
Thanks to everyone for the help!

C++: Thread synchronization scenario on Linux Platform

I am implementing multithreaded C++ program for Linux platform where I need a functionality similar to WaitForMultipleObjects().
While searching for a solution I found articles that describe, with examples, how to achieve WaitForMultipleObjects() functionality on Linux, but those examples do not cover the scenario that I have to support.
The scenario in my case is pretty simple. I have a daemon process in which the main thread exposes a method/callback to the outside world for example to a DLL. The code of the DLL is not under my control. The same main thread creates a new thread "Thread 1". Thread 1 has to execute kind of an infinite loop in which it would wait for a shutdown event (daemon shutdown) OR it would wait on the data available event being signaled through the exposed method/callback mentioned above.
In short the thread would be waiting on shutdown event and data available event where if shutdown event is signaled the wait would satisfy and the loop would be broken or if data available event is signaled then also wait would satisfy and thread would do business processing.
On Windows, this seems very straightforward. Below is MS Windows based pseudocode for my scenario.
//**Main thread**
//Load the DLL
LoadLibrary("some DLL")
//Create a new thread
hThread1 = _beginthreadex(..., &ThreadProc, ...)
//callback in main thread (mentioned in above description) which would be called by the DLL
void Callbackfunc(data)
{
qdata.push(data);
SetEvent(s_hDataAvailableEvent);
}
void OnShutdown()
{
SetEvent(g_hShutdownEvent);
WaitForSingleObject(hThread1,..., INFINITE);
//Cleanup here
}
//**Thread 1**
unsigned int WINAPI ThreadProc(void *pObject)
{
while (true)
{
HANDLE hEvents[2];
hEvents[0] = g_hShutdownEvent;
hEvents[1] = s_hDataAvailableEvent;
//3rd parameter is set to FALSE that means the wait should satisfy if state of any one of the objects is signaled.
dwEvent = WaitForMultipleObjects(2, hEvents, FALSE, INFINITE);
switch (dwEvent)
{
case WAIT_OBJECT_0 + 0:
// Shutdown event is set, break the loop
return 0;
case WAIT_OBJECT_0 + 1:
//do business processing here
break;
default:
// error handling
}
}
}
I want to implement the same for Linux. As I understand it, Linux has a totally different mechanism, in which we need to register for signals. If the termination signal arrives, the process learns that it is about to shut down, but before that it needs to wait for the running thread to shut down gracefully.
The correct way to do this in Linux would be using condition variables. While this is not the same as WaitForMultipleObjects in Windows, you will get the same functionality.
Use two bools to determine whether there is data available or a shutdown must occur.
Then have the shutdown function and the data function both set the bools accordingly, and signal the condition variable.
#include <pthread.h>
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_t hThread1; // this isn't a good name for it in linux, you'd be
// better with something like "tid1" but for
// comparison's sake, I've kept this
bool shutdown_signalled;
bool data_available;
void OnShutdown()
{
//...shutdown behavior...
pthread_mutex_lock(&mutex);
shutdown_signalled = true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cv);
}
void Callbackfunc(...)
{
// ... whatever needs to be done ...
pthread_mutex_lock(&mutex);
data_available = true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cv);
}
void *ThreadProc(void *args)
{
while(true){
pthread_mutex_lock(&mutex);
while (!(shutdown_signalled || data_available)){
// wait as long as there is no data available and a shutdown
// has not been signalled
pthread_cond_wait(&cv, &mutex);
}
if (data_available){
//process data
data_available = false;
}
if (shutdown_signalled){
//do the shutdown
pthread_mutex_unlock(&mutex);
return NULL;
}
pthread_mutex_unlock(&mutex); //you might be able to put the unlock
// before the ifs, idk the particulars of your code
}
}
int main(void)
{
shutdown_signalled = false;
data_available = false;
pthread_create(&hThread1, NULL, &ThreadProc, NULL);
pthread_join(hThread1, NULL);
//...
}
I know Windows has condition variables as well, so this shouldn't look too alien. I don't know what rules Windows has about them, but on a POSIX platform the wait needs to be inside a while loop because "spurious wakeups" can occur.
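A variation on the same idea, sketched under the assumption that each callback hands over one item that must not be lost: a single data_available flag collapses several callbacks into one, so this version keeps a queue under the same mutex. The Worker struct, the worker_* names, and the choice to drain the queue before honouring shutdown are mine, not part of the answer above.

```cpp
#include <pthread.h>
#include <queue>

// Hypothetical shared state; the names are illustrative only.
struct Worker {
    pthread_mutex_t mutex;
    pthread_cond_t cv;
    std::queue<int> pending;  // data handed over by the callback
    bool shutdown;
    int processed;            // number of items the thread has handled
    pthread_t tid;
};

static void* worker_main(void* arg)
{
    Worker* w = static_cast<Worker*>(arg);
    pthread_mutex_lock(&w->mutex);
    for (;;) {
        // Loop on the predicate to cope with spurious wakeups.
        while (w->pending.empty() && !w->shutdown)
            pthread_cond_wait(&w->cv, &w->mutex);
        if (w->pending.empty())
            break;  // shutdown requested and nothing left to process

        w->pending.pop();
        pthread_mutex_unlock(&w->mutex);
        // ... business processing would run here, outside the lock ...
        pthread_mutex_lock(&w->mutex);
        ++w->processed;
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

int worker_start(Worker* w)
{
    pthread_mutex_init(&w->mutex, NULL);
    pthread_cond_init(&w->cv, NULL);
    w->shutdown = false;
    w->processed = 0;
    return pthread_create(&w->tid, NULL, worker_main, w);
}

void worker_push(Worker* w, int item)  // plays the role of Callbackfunc
{
    pthread_mutex_lock(&w->mutex);
    w->pending.push(item);
    pthread_mutex_unlock(&w->mutex);
    pthread_cond_signal(&w->cv);
}

// Plays the role of OnShutdown: signals the thread, waits for it to finish,
// and returns how many items it processed.
int worker_shutdown(Worker* w)
{
    pthread_mutex_lock(&w->mutex);
    w->shutdown = true;
    pthread_mutex_unlock(&w->mutex);
    pthread_cond_signal(&w->cv);
    pthread_join(w->tid, NULL);
    pthread_cond_destroy(&w->cv);
    pthread_mutex_destroy(&w->mutex);
    return w->processed;
}
```

If shutdown should win over pending data, as index 0 does in the WaitForMultipleObjects call, check w->shutdown first after the wait and break immediately.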
If you wish to write Unix- or Linux-specific code, you have different APIs available:
pthread: provides threads, mutexes, condition variables
IPC (inter-process communication) mechanisms: mutexes, semaphores, shared memory
signals
For threads, the first library is mandatory (there are lower-level syscalls on Linux, but they are more tedious). For events, any of the three may be used.
The system shutdown event generates termination (SIGTERM) and kill (SIGKILL) signals, which are broadcast to all the relevant processes. Hence an individual daemon shutdown can also be initiated this way. The goal of the game is to catch the signal and initiate process shutdown (only SIGTERM can be caught; SIGKILL cannot be caught or ignored). The important points are:
the signal mechanism is designed so that it is not necessary to wait for signals
Simply install a so-called handler using sigaction, and the system will do the rest.
the signal is sent to the process, and any thread may intercept it (the handler may execute in any context)
You therefore need to install a signal handler (see sigaction(2)) and somehow pass the information to the other threads that the application must terminate.
The most convenient way is probably to have a global flag which all your threads consult regularly. The signal handler sets that flag to indicate shutdown. Because a handler cannot safely lock a mutex, make the flag a volatile sig_atomic_t (or a lock-free atomic) rather than protecting it with a mutex. For the worker thread, it means
telling the remote host that the server is closing down,
closing its socket for reading,
processing all the remaining received commands/data and sending answers,
closing the socket,
exiting.
For the main thread, this means initiating a join on the worker thread, then exiting.
This model should not interfere with the way data is normally processed: a blocking call to select or poll will return the error EINTR if a signal was caught, and for a non-blocking call, the thread is regularly checking the flag, so it works too.
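A minimal sketch of the handler-sets-a-flag approach described above. It uses a volatile sig_atomic_t flag rather than a mutex, because locking a mutex is not async-signal-safe; the function and variable names are illustrative.

```cpp
#include <signal.h>
#include <string.h>

// Written only by the handler. volatile sig_atomic_t is the type that may be
// safely assigned from inside a signal handler.
static volatile sig_atomic_t g_shutdown_requested = 0;

static void on_terminate(int)
{
    g_shutdown_requested = 1;  // nothing but async-signal-safe work in here
}

// Installs the handler for SIGTERM. Returns 0 on success, -1 on error.
int install_shutdown_handler()
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    // SA_RESTART is deliberately not set, so a blocking select() or poll()
    // fails with EINTR and the loop gets a chance to look at the flag.
    return sigaction(SIGTERM, &sa, NULL);
}

// What the worker loops would poll between units of work.
bool shutdown_requested()
{
    return g_shutdown_requested != 0;
}
```

For strict cross-thread visibility in C++11 and later, a lock-free std::atomic flag is the more rigorous choice; the structure of the code stays the same.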

pthread_create differences in linux kernel 2.4.20 and 2.4.36

I have some code on two systems running kernel 2.4.20 and kernel 2.4.36.
They both have gcc 3.2.2 and glibc 2.3.2.
Under kernel 2.4.36, the pthread_t handles aren't being reused. Under a heavy load test the application crashes once the handles reach 0xFFFFFFFF.
(I suspected this in the first place because the app crashes in deployments where IT uses a network port scanner; the threads are created for handling socket connections.)
This simple example recreates the problem:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
void* ThreadProc(void* param)
{
usleep(10000);
printf(" Thread 0x%x\n", (unsigned int)pthread_self());
usleep(10000);
return NULL;
}
int main(int argc, char* argv[])
{
pthread_t sThread;
while(1)
{
pthread_create(&sThread, NULL, ThreadProc, NULL);
printf("Created 0x%x\n", (unsigned int)sThread);
pthread_join(sThread, NULL);
};
return 0;
}
Under 2.4.20:
Created 0x40838cc0
Thread 0x40838cc0
Created 0x40838cc0
Thread 0x40838cc0
Created 0x40838cc0
Thread 0x40838cc0
...and on and on...
Under 2.4.36:
Created 0x4002
Thread 0x4002
Created 0x8002
Thread 0x8002
Created 0xc002
Thread 0xc002
...keeps growing...
How can I get kernel 2.4.36 to recycle handles? Unfortunately I can't change kernel easily.
Thanks!
If your observations are correct, only two possible solutions exist:
1. Upgrade the kernel. This may or may not be feasible for you.
2. Recycle threads within your application.
Option 2 is something you can do even if the kernel is misbehaving. You can hold a pool of threads that remain in a sleeping state when not being used. Thread pools are a widely known software engineering pattern (see http://en.wikipedia.org/wiki/Thread_pool_pattern). This is probably the better solution for you.
Turns out I wasn't joining my threads properly in the load test.
When I ran the load test again, the thread handles reached 0xFFFFF002 then rolled over to 0x1002 and carried on happily.
Moral of the story: Make dead sure your threads are joined or detached!