I'm coding a C++ SSL Server for TCP Connections on Linux.
When the program uses SSL_write() to write to a connection the peer has already closed, a SIGPIPE signal is raised, which causes the program to shut down. I know that this is normal behaviour, but the program should not die every time a peer fails to close the connection correctly.
I have already googled a lot and tried pretty much everything I found, but nothing seems to work for me. signal(SIGPIPE, SIG_IGN) does not work: the signal is still raised (same for signal(SIGPIPE, SomeKindOfHandler)).
The gdb output:
Program received signal SIGPIPE, Broken pipe.
0x00007ffff6b23ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0 0x00007ffff6b23ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff7883835 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#2 0x00007ffff7881687 in BIO_write () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3 0x00007ffff7b9d3e0 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#4 0x00007ffff7b9db04 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#5 0x000000000042266a in NetInterface::SendToSubscribers(bool) () at ../Bether/NetInterface.h:181
#6 0x0000000000425834 in main () at ../Bether/main.cpp:111
About the Code:
I'm using a thread which is waiting for new connections and accepting them. The thread then puts the connection information (BIO & SSL) into a static map inside the NetInterface class.
Every 5 seconds NetInterface::SendToSubscribers() is executed from main(). This function accesses the static map and sends data to every connection in there. This function is also where the SIGPIPE comes from.
I have used signal(SIGPIPE,SIG_IGN) in main() (obviously before the 5-seconds loop) and in NetInterface::SendToSubscribers(), but it is not working anywhere.
Thanks for your help!
Call sigaction to change this behavior, either to ignore SIGPIPE or to handle it with your own signal handler. Please don't use signal(): its behavior varies across systems, and sigaction is the portable replacement.
http://man7.org/linux/man-pages/man2/sigaction.2.html
One way to do it (I haven't compiled this code but should be something like this):
void sigpipe_handler(int signal)
{
...
}
int main()
{
struct sigaction sh;
struct sigaction osh;
sh.sa_handler = &sigpipe_handler; //Can set to SIG_IGN
// Restart interrupted system calls
sh.sa_flags = SA_RESTART;
// Do not block any additional signals while the handler runs
sigemptyset(&sh.sa_mask);
if (sigaction(SIGPIPE, &sh, &osh) < 0)
{
return -1;
}
...
}
If the program is multithreaded, it is a little different, as you have less control over which thread will receive the signal. That depends on the type of signal. SIGPIPE is sent to the pthread that generated it. Nevertheless, sigaction should work OK.
It is possible to set the mask in the main thread and all subsequently created pthreads will inherit the signal mask. Otherwise, the signal mask can be set in each thread.
sigset_t blockedSignal;
sigemptyset(&blockedSignal);
sigaddset(&blockedSignal, SIGPIPE);
pthread_sigmask(SIG_BLOCK, &blockedSignal, NULL);
However, a blocked signal is not discarded: it stays pending and is delivered as soon as it is unblocked. To avoid that, consume it with sigtimedwait before the thread unblocks the signal or exits. Setting the disposition with sigaction should work as well, and it can be done from any thread because signal dispositions are shared by the whole process.
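The block-then-consume approach described above can be sketched as follows. This is only an illustration, not code from the question: write_without_sigpipe is a hypothetical helper name, and the sketch assumes SIGPIPE was not already pending before the call.

```cpp
#include <cerrno>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: write to fd with SIGPIPE blocked in the calling thread.
// On a broken pipe it returns -1 with errno == EPIPE instead of killing the process.
ssize_t write_without_sigpipe(int fd, const void* buf, size_t len)
{
    sigset_t pipeSet, oldSet;
    sigemptyset(&pipeSet);
    sigaddset(&pipeSet, SIGPIPE);

    // Block SIGPIPE for this thread only and remember the previous mask.
    pthread_sigmask(SIG_BLOCK, &pipeSet, &oldSet);

    ssize_t written = write(fd, buf, len);
    int savedErrno = errno;

    if (written == -1 && savedErrno == EPIPE) {
        // Consume the pending SIGPIPE so it is not delivered when the mask is restored.
        const struct timespec noWait = {0, 0};
        while (sigtimedwait(&pipeSet, NULL, &noWait) == -1 && errno == EINTR) {
            // interrupted by another signal: try again
        }
    }

    // Restore the caller's signal mask and the errno value from write().
    pthread_sigmask(SIG_SETMASK, &oldSet, NULL);
    errno = savedErrno;
    return written;
}
```

The caller then treats EPIPE like any other write error, for example by dropping that connection from the subscriber map.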
I've found the solution, it works with pthread_sigmask.
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGPIPE);
if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
return -1;
Thanks to everyone for the help!
Related
I have the following code in my program.
Thread* t = arg->thread;
//at this point, the new thread is being executed.
t->myId = TGetId();
void* (*functor)(void*) = t->functor;
void* fArg = arg->arg;
nfree(arg);
_INFO_PRINTF(1, "Launching thread with ID: %d", t->myId);
sigset_t mask;
sigfillset(&mask); //fill mask with all signals
sigdelset(&mask, SIGUSR1); // allow SIGUSR1 to get to the thread.
sigdelset(&mask, SIGUSR2); // allow SIGUSR2 to get to the thread.
pthread_sigmask(SIG_SETMASK, &mask, NULL); //block some sigs
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_handler = TSignalHandler;
act.sa_mask = mask;
if(sigaction(SIGUSR1, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
if(sigaction(SIGUSR2, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
void* ret = functor(fArg);
t->hasReturned = true;
return ret;
The thread that executes this code properly calls the signal handler on native Linux. The problem is that on Windows Subsystem for Linux, the program hangs when SIGUSR1 or SIGUSR2 is sent via pthread_kill, which sends a signal to a specific thread. Why does this work on native Ubuntu (via VMware Workstation 14), Debian and Fedora, but not on WSL?
When you have a hanging bug that you cannot reproduce when running within the debugger, you can attach the debugger to the running process after you reproduce the hang. This won't let you observe the variables changing as you lead to the hang, but at least you get the stack trace of exactly where the hang is occurring.
Once you know the process id of the hung process (assume it's 12345), you can use:
$ gdb -p 12345
Or, you can kill the process with a signal that will cause a core to be generated. I like to use SIGTRAP, since it is easy to distinguish from a SIGSEGV.
$ kill -SIGTRAP 12345
And then you can use gdb to discover what the process was hanging on.
The advantage of attaching to the running process is that the process is still live. This allows you to call functions from the debugger, which may provide easier access to diagnostics built into your program. The core file preserves the error, which is beneficial if the hanging bug is difficult to reproduce.
I have a basic problem with handling signals in a multi-threaded process.
In my code, I create one sub-thread from the main thread to listen for a SIGALRM which will later be triggered by the main thread (using another function such as timer_create gives me the same result, so please don't focus on this).
The problem is that, instead of catching the signal, the whole process terminates with a strange "Alarm clock" output on the console.
This is my code:
#include <iostream>
#include <cstdlib>
#include <pthread.h>
#include <sys/time.h>
#include <unistd.h>
#include <csignal>
using namespace std;
void* run_something(void* args){
//unblock SIGALRM so it can be caught
sigset_t sig;
sigemptyset(&sig);
sigaddset(&sig, SIGALRM);
sigprocmask(SIG_UNBLOCK, &sig, NULL); //tried with pthread_sigmask
//wait for SIGALRM
int catchedSig;
sigwait(&sig, &catchedSig);
cout<<"in sub-thread, SIGALRM catched, return"<<endl;
return NULL;
}
int main(int argc, char** argv){
//block SIGALRM in main thread
sigset_t sig;
sigemptyset(&sig);
sigaddset(&sig, SIGALRM);
sigprocmask(SIG_BLOCK, &sig, NULL);
//create new thread
pthread_t thread;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_create(&thread, &attr, run_something, NULL);
//trigger SIGALRM after 2s
alarm(2); //tried with timer_create/sigevent
//wait
cout<<"in main thread, waiting for sub-thread to terminate"<<endl;
pthread_join(thread, NULL);
cout<<"in main thread, terminating"<<endl;
return EXIT_SUCCESS;
}
Expected result
in main thread, waiting for sub-thread to terminate
in sub-thread, SIGALRM catched, return
in main thread, terminating
Observed result
in main thread, waiting for sub-thread to terminate
Alarm clock
Additional info:
I'm using g++ (Debian 5.4.0-4) 5.4.0 20160609.
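Your run_something thread unblocks SIGALRM before calling sigwait for that signal, but this is undefined behavior: the signals passed to sigwait must be blocked when it is called. sigwait removes a signal from the set of pending signals, and a signal can only stay pending while it is blocked.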
Don't unblock in your thread and you'll see the behavior you expect.
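A minimal sketch of that fix, assuming the same one-waiter setup as in the question: SIGALRM is blocked with pthread_sigmask before the thread is created, the new thread inherits the mask, and it never unblocks the signal before sigwait. The names wait_for_alarm and run_sigwait_demo are made up for this illustration.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Thread function: waits for SIGALRM, which it inherited as blocked.
static void* wait_for_alarm(void* arg)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);

    int caught = 0;
    // SIGALRM stays blocked; sigwait() takes it from the pending set.
    if (sigwait(&set, &caught) == 0)
        *static_cast<int*>(arg) = caught;
    return NULL;
}

// Hypothetical driver: returns the signal number the waiting thread
// received, or -1 on error.
int run_sigwait_demo(unsigned seconds)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);

    // Block before creating the thread so the new thread inherits the mask.
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    int caught = -1;
    pthread_t thread;
    if (pthread_create(&thread, NULL, wait_for_alarm, &caught) != 0)
        return -1;

    alarm(seconds);              // process-directed SIGALRM after `seconds`
    pthread_join(thread, NULL);  // returns once the waiter has consumed the signal
    return caught;
}
```

Because every thread in the process has SIGALRM blocked, the signal stays pending until the sigwait call picks it up, and the default action ("Alarm clock") is never taken.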
The code shown does not set up any signal handler for SIGALRM.
Therefore on signal reception the OS does as it ought to, namely invoke SIGALRM's default action, that is to terminate the process. Printing "Alarm clock" to the console is part of the default behaviour, BTW.
To fix this set up a signal handler for SIGALRM. This can be done in a portable manner by using sigaction().
Also do not use sigprocmask() in a multi-threaded program, as its behaviour is unspecified. Use pthread_sigmask() instead.
Update:
I missed the code calls sigwait() ... :}
In that case, fixing the issue does not require a signal handler (although installing one would also solve it and is valid). Instead, do as proposed in pilcrow's answer: leave the signal blocked before calling sigwait() (or sigwaitinfo()).
Additionally make sure to use pthread_sigmask() instead of sigprocmask() for the reason given above.
Unrelated to the question's issue:
I create one sub-thread from the main thread
There is no such concept as "sub"-threads. Once created, all of a process's threads are "siblings" on the same level, including the initial thread that runs main(). That thread is commonly called the "main" thread only because of the name of its thread function: main.
I have an application that has one main thread which spawns another thread which spawns threads for each request received and I'm getting a core dump probably due to a deadlock. On gdb I see the following:
__lll_lock_wait_private ();
_L_lock_4714 ();
start_thread ();
clone ();
This is generated from the following code sample:
do
{
pthread_t handle;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&handle, 0, run, msg);
pthread_detach(handle);
} while (!stop);
run is an extern function while the rest of the code is part of class methods.
void* run(void* arg)
{
Handler handler;
Msg* msg = static_cast<Msg*> (arg);
handler.handleMsg(msg);
return NULL;
}
The handleMsg method does some processing and then calls another application through system():
...
system("AnotherApplication param1, param2 &");
...
Note the ampersand. It is on purpose because I want the process to run asynchronously. The response comes back to the main thread through a different communication channel.
This application has been running on Linux:
Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
I'm not ignoring any signals.
What could be the problem here?
pthread_detach manual informs us:
Attempting to detach an already detached thread results in unspecified behavior.
However, you set up your threads to be created detached from the start.
What results do you expect?
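A sketch of detaching exactly once, through the creation attribute, with no later pthread_detach call. Note that the attribute only takes effect if &attr is actually passed to pthread_create; the question's snippet passes 0 there. The names spawn_detached, handle_request, wait_handled and g_handled are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <pthread.h>
#include <unistd.h>

static std::atomic<bool> g_handled(false);  // demo flag set by the worker

// Stand-in for the per-request thread function.
static void* handle_request(void*)
{
    g_handled.store(true);
    return NULL;
}

// Creates a thread that is detached from the start.
// Returns 0 on success, otherwise the pthread error code.
int spawn_detached(void* (*fn)(void*), void* arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) {
        pthread_t handle;
        // Pass &attr (not 0) so the detached state actually applies.
        rc = pthread_create(&handle, &attr, fn, arg);
        // No pthread_detach() here: the thread is already detached.
    }
    pthread_attr_destroy(&attr);
    return rc;
}

// Demo helper: poll the flag for up to `ms` milliseconds.
bool wait_handled(int ms)
{
    for (int i = 0; i < ms && !g_handled.load(); ++i)
        usleep(1000);
    return g_handled.load();
}
```

The alternative is equally valid: create the thread with default attributes and call pthread_detach once. The point is to pick one of the two, not both.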
I'm doing some event handling with C++ and pthreads. I have a main thread that reads from event queue I defined, and a worker thread that fills the event queue. The queue is of course thread safe.
The worker thread has a list of file descriptors and creates an epoll instance to get events on those file descriptors. It uses epoll_wait to wait for events on the fd's.
Now the problem. Assuming I want to terminate my application cleanly, how can I cancel the worker thread properly? epoll_wait is not listed among the cancellation points in pthreads(7), so it cannot be relied on to react properly to pthread_cancel.
The worker thread main() looks like this
while(m_WorkerRunning) {
epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
//handle events and insert to queue
}
m_WorkerRunning is set to true when the thread starts, and it looks like I can interrupt the thread by setting m_WorkerRunning to false from the main thread. The problem is that epoll_wait can theoretically wait forever.
Another solution I thought about: instead of waiting forever (-1), I can wait with a timeout of X, handle the no-events case properly, and exit the loop to terminate the worker thread cleanly if m_WorkerRunning == false. The main thread then sets m_WorkerRunning to false and sleeps for X. However, I'm not sure about the performance of such an epoll_wait, nor what the correct X would be: 500ms? 1s? 10s?
I'd like to hear advice from experienced people!
More relevant information: the fd's I'm waiting events on, are devices in /dev/input so technically I'm doing some sort of input subsystem. The targeted OS is Linux (latest kernel) on ARM architecture.
Thanks!
alk's answer above is almost correct. The difference, however, is very dangerous.
If you are going to send a signal in order to wake up epoll_wait, never use epoll_wait. You must use epoll_pwait, or you might run into a race with your epoll never waking up.
Signals arrive asynchronously. If your SIGUSR1 arrives after you've checked your shutdown condition, but before your loop returns to the epoll_wait, then the signal will not interrupt the wait (as there is none yet), and the following epoll_wait will block with nothing left to wake it, so the program will not exit.
This might be very likely or extremely unlikely, depending on how long the loop takes in relation to how much time is spent in the wait, but it is a bug one way or the other.
Another problem with alk's answer is that it does not check why the wait was interrupted. It might be any number of reasons, some unrelated to your exit.
For more information, see the man page for pselect. epoll_pwait works in a similar way.
Also, never send signals to threads using kill. Use pthread_kill instead. kill directs the signal at the process as a whole, and it may be delivered to any thread that does not have it blocked. There is no guarantee that the correct thread will receive it, which might cause an unrelated system call to be interrupted, or nothing at all to happen.
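The points above can be put together in a minimal sketch. It assumes SIGUSR1 is reserved for waking the worker; g_running, on_wakeup, worker and run_and_stop_worker are hypothetical names, not taken from the question. SIGUSR1 stays blocked in the worker except while epoll_pwait is waiting, so a signal that arrives between the flag check and the wait remains pending and interrupts the wait immediately.

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

static std::atomic<bool> g_running(true);  // hypothetical shutdown flag

static void on_wakeup(int) { /* empty: it only has to interrupt epoll_pwait */ }

static void* worker(void* arg)
{
    int epfd = *static_cast<int*>(arg);

    // Mask used only while waiting: the current mask with SIGUSR1 unblocked.
    sigset_t waitMask;
    pthread_sigmask(SIG_SETMASK, NULL, &waitMask);
    sigdelset(&waitMask, SIGUSR1);

    struct epoll_event events[16];
    while (g_running.load()) {
        int n = epoll_pwait(epfd, events, 16, -1, &waitMask);
        if (n == -1) {
            if (errno == EINTR)
                continue;  // re-check the flag; the interruption may be unrelated
            break;         // real error
        }
        // Handle the n ready events and insert them into the queue here.
    }
    return NULL;
}

// Starts the worker, requests shutdown, and returns true if it exited cleanly.
bool run_and_stop_worker()
{
    struct sigaction sa = {};
    sa.sa_handler = on_wakeup;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return false;

    // Block SIGUSR1 before creating the worker so it inherits the blocked mask.
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, NULL);

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return false;

    g_running.store(true);
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &epfd) != 0) {
        close(epfd);
        return false;
    }

    g_running.store(false);      // 1. set the flag
    pthread_kill(tid, SIGUSR1);  // 2. wake that specific thread
    bool joined = pthread_join(tid, NULL) == 0;
    close(epfd);
    return joined;
}
```

On EINTR the loop re-checks the flag instead of exiting unconditionally, which covers the case where the wait was interrupted for a reason unrelated to shutdown.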
You could send the thread a signal, which would interrupt the blocking call to epoll_wait(). If you do so, modify your code like this:
while(m_WorkerRunning)
{
int result = epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
if (-1 == result)
{
if (EINTR == errno)
{
/* Handle shutdown request here. */
break;
}
else
{
/* Error handling goes here. */
}
}
/* Handle events and insert to queue. */
}
A way to add a signal handler:
#include <signal.h>
/* A generic signal handler doing nothing */
void signal_handler(int sig)
{
(void)sig; /* Tell the compiler the parameter is intentionally unused. */
}
/* Wrapper to set a signal handler */
int signal_handler_set(int sig, void (*sa_handler)(int))
{
struct sigaction sa = {0};
sa.sa_handler = sa_handler;
return sigaction(sig, &sa, NULL);
}
To set this handler for the signal SIGUSR1 do:
if (-1 == signal_handler_set(SIGUSR1, signal_handler))
{
perror("signal_handler_set() failed");
}
To send a signal SIGUSR1 from another process:
if (-1 == kill(<target process' pid>, SIGUSR1))
{
perror("kill() failed");
}
To have a process send a signal to itself:
if (-1 == raise(SIGUSR1))
{
perror("raise() failed");
}
I am working on a networking program using epoll on a Linux machine, and I got the following error message from gdb.
Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff609a700 (LWP 19788)]
0x00007ffff7bcdb2d in write () from /lib/libpthread.so.0
(gdb)
(gdb) backtrace
#0 0x00007ffff7bcdb2d in write () from /lib/libpthread.so.0
#1 0x0000000000416bc8 in WorkHandler::workLoop() ()
#2 0x0000000000416920 in WorkHandler::runWorkThread(void*) ()
#3 0x00007ffff7bc6971 in start_thread () from /lib/libpthread.so.0
#4 0x00007ffff718392d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()
My server does an O(n^2) calculation, and I tried to run it with 500 connected users. What might cause this error, and how do I fix it?
while(1){
if(remainLength >= MAX_LENGTH)
currentSentLength = write(client->getFd(), sBuffer, MAX_LENGTH);
else
currentSentLength = write(client->getFd(), sBuffer, remainLength);
if(currentSentLength == -1){
log("WorkHandler::workLoop, connection has been lost \n");
break;
}
sBuffer += currentSentLength;
remainLength -= currentSentLength;
if(remainLength == 0)
break;
}
When you write to a pipe or socket that has been closed by the remote end, your program receives this signal. For simple command-line filter programs, this is often an appropriate default action, since the default handler for SIGPIPE will terminate the program.
For a multithreaded program, the correct action is usually to ignore the SIGPIPE signal, so that writing to a closed socket will not terminate the program.
Note that you cannot successfully perform a check before writing, since the remote end may close the socket in between your check and your call to write().
See this question for more information on ignoring SIGPIPE: How to prevent SIGPIPEs (or handle them properly)
You're not catching SIGPIPE signals, but you're trying to write to a pipe that's been broken/closed.
Fairly self-explanatory.
It's usually sufficient to handle SIGPIPE signals as a no-op, and handle the error case around your write call in whatever application-specific manner you require... like this.
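A minimal sketch of that pattern, not the original poster's code: SIGPIPE is ignored process-wide via sigaction, and the write loop treats EPIPE as "connection lost". The helper names ignore_sigpipe and write_all are hypothetical.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Ignore SIGPIPE process-wide so that write() fails with EPIPE instead.
// Returns true on success.
bool ignore_sigpipe()
{
    struct sigaction sa = {};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL) == 0;
}

// Hypothetical helper: write all `len` bytes to fd.
// Returns true when everything was written; false with errno set otherwise.
bool write_all(int fd, const char* buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;   // interrupted before writing anything: retry
            return false;   // EPIPE here means the peer closed the connection
        }
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}
```

A caller would run ignore_sigpipe() once at startup and, when write_all() returns false with errno == EPIPE, log the lost connection and close that client's descriptor.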