Catching signals such as SIGSEGV and SIGFPE in multithreaded program - c++

I am trying to write a multithreaded logging system for a program running on linux.
Calls to the logging system in the main program threads push a data structure containing the data to be logged into a FIFO queue. A dedicated thread picks the data off the queue and outputs it, while the program's main thread continues with its task.
If the main program causes SIGSEGV or other signals to be raised I need to make sure that the queue is empty before terminating.
My plan is to block the signals using pthread_sigmask http://man7.org/linux/man-pages/man3/pthread_sigmask.3.html for all but one thread, but reading the list of signals on http://man7.org/linux/man-pages/man7/signal.7.html I noticed:
A signal may be generated (and thus pending) for a process as a whole (e.g., when sent using kill(2)) or for a specific thread (e.g., certain signals, such as SIGSEGV and SIGFPE, generated as a consequence of executing a specific machine-language instruction are thread directed, as are signals targeted at a specific thread using pthread_kill(3)).
If I block SIGSEGV on all threads but a thread dedicated to catching signals, will it then catch a SIGSEGV raised by a different thread?
I found the question Signal handling with multiple threads in Linux, but I am clueless as to which signals are thread specific and how to catch them.

I agree with the comments: in practice catching and handling SIGSEGV is often a bad thing.
And SIGSEGV is delivered to a specific thread (see this): the one running the machine instruction that accessed an illegal address.
So you cannot run a thread dedicated to catching SIGSEGV in other threads. And you probably could not easily use signalfd(2) for SIGSEGV...
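To illustrate the thread-directed delivery described above, here is a minimal sketch. It assumes Linux with POSIX threads, and it uses raise() as a stand-in for a real faulting instruction, since raise() also directs the signal at the calling thread. The names handled_on_this_thread, mark_current_thread and handler_runs_on_raising_thread are made up for this example.

```cpp
#include <cassert>
#include <csignal>
#include <thread>
#include <signal.h>

// Per-thread flag set by the handler (hypothetical name). Assumption: the
// thread_local lives in the main executable, so touching it in a handler
// does not allocate.
static thread_local volatile sig_atomic_t handled_on_this_thread = 0;

extern "C" void mark_current_thread(int) {
    handled_on_this_thread = 1;  // runs in the context of the receiving thread
}

// Returns true when the handler ran on the raising worker and not on the caller.
bool handler_runs_on_raising_thread() {
    struct sigaction sa {};
    sa.sa_handler = mark_current_thread;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, nullptr);
    assert(rc == 0);
    (void)rc;

    bool worker_saw_it = false;
    std::thread worker([&worker_saw_it] {
        // Stand-in for a real fault. Returning normally from the handler is
        // only safe because no faulting instruction is re-executed here.
        raise(SIGSEGV);
        worker_saw_it = (handled_on_this_thread == 1);
    });
    worker.join();

    bool caller_saw_it = (handled_on_this_thread == 1);
    std::signal(SIGSEGV, SIG_DFL);  // restore the default disposition
    return worker_saw_it && !caller_saw_it;
}
```

Calling handler_runs_on_raising_thread() from the main thread should return true: the worker sees its own flag set, while the caller's copy stays at zero.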
Catching SIGSEGV (and returning normally from its signal handler) is complex and processor-specific (it cannot be "portable C code"). You need to inspect and alter the machine state in the handler, that is, either modify the address space (by calling mmap(2) etc...) or modify the register state of the current thread. So use sigaction(2) with SA_SIGINFO and change the machine-specific state pointed to by the third argument (of type ucontext_t*) of the signal handler. Then dive into its processor-specific uc_mcontext field. Have fun changing individual registers, etc... If you don't alter the machine state of the faulting thread, execution resumes (after returning from your SIGSEGV handler) in the same situation as before, and another SIGSEGV signal is immediately sent.... Or simply don't return normally from a SIGSEGV handler (e.g. use siglongjmp(3) or abort(3) or _exit(2) ...).
Even if you happen to do all this, it is rumored that Linux kernels are not extremely efficient on such executions. So it is rumored that trying to mimic Hurd/Mach external pagers this way on Linux is not very efficient. See this answer...
Of course signal handlers should call only async-signal-safe functions (see signal(7) for more). In particular, you cannot in principle call fprintf from them (and you might not be able to reliably use your logging system from them, although it could work in most but not all cases).
What I said about SIGSEGV also holds for SIGBUS and SIGFPE (and other thread-specific synchronous signals, if they exist).
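As a rough sketch of the "don't return normally" option mentioned above, the following installs a handler with sigaction(2) and SA_SIGINFO and leaves it through siglongjmp(3). It assumes Linux/POSIX. The names recovery_point, on_fault, install_fault_handler, run_guarded and the two sample tasks are invented for this illustration, and raise() stands in for a genuine bad memory access. Jumping over C++ frames that have non-trivial destructors is undefined behaviour, so this is only plausible for very simple code paths.

```cpp
#include <cassert>
#include <csignal>
#include <setjmp.h>
#include <signal.h>

// One jump target per thread, because SIGSEGV is delivered to the faulting thread.
static thread_local sigjmp_buf recovery_point;

extern "C" void on_fault(int, siginfo_t*, void*) {
    // Do not return: jump back to the sigsetjmp() call in run_guarded().
    siglongjmp(recovery_point, 1);
}

bool install_fault_handler() {
    struct sigaction sa {};
    sa.sa_sigaction = on_fault;   // three-argument form, selected by SA_SIGINFO
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Runs task(); returns false if a SIGSEGV interrupted it.
bool run_guarded(void (*task)()) {
    assert(task != nullptr);
    // The second argument 1 saves the signal mask, so SIGSEGV is unblocked
    // again after the jump out of the handler.
    if (sigsetjmp(recovery_point, 1) != 0) {
        return false;
    }
    task();
    return true;
}

void well_behaved_task() {}
void faulting_task() { raise(SIGSEGV); }  // stand-in for a real fault
```

After install_fault_handler() succeeds, run_guarded(faulting_task) should report false and run_guarded(well_behaved_task) true. Whether the program state is still trustworthy after a real fault is a separate question that this sketch does not answer.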

Related

pthread_kill: What signal to use so it doesn't interfere with program execution?

I'm currently implementing a tracing system in our program that relies on sending signals to individual threads / lightweight processes using pthread_kill. My problem lies in deciding which signal to use so that it doesn't otherwise interfere with program execution. Basically, I have 3 requirements for the signal type:
It should be sendable to a single thread / lightweight process
It should not terminate the program
It should be catchable by common tracing tools like gdb or strace.
I used SIGUSR2 at the beginning, but I think there are some platforms that will call terminate if it occurs.
Is there a signal type that is guaranteed to be visible to the above-mentioned tracing tools and not terminate the process?
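For context, the default action of SIGUSR1 and SIGUSR2 is to terminate the process, so a handler has to be installed before any thread is signalled. The sketch below is one possible setup under that assumption, not a statement about which signal is best. It assumes Linux with libstdc++, where std::thread::native_handle() yields a pthread_t. The names trace_hits, on_trace_signal and signal_one_thread are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <thread>
#include <pthread.h>
#include <signal.h>

// Counts handler invocations (hypothetical name).
static volatile sig_atomic_t trace_hits = 0;

extern "C" void on_trace_signal(int) {
    trace_hits = trace_hits + 1;
}

// Installs a SIGUSR2 handler, signals exactly one worker thread and
// returns how many times the handler ran.
int signal_one_thread() {
    struct sigaction sa {};
    sa.sa_handler = on_trace_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // avoid EINTR in restartable system calls
    int rc = sigaction(SIGUSR2, &sa, nullptr);
    assert(rc == 0);

    trace_hits = 0;
    std::thread worker([] {
        // The handler runs on this thread, so the loop sees the update.
        while (trace_hits == 0) {
            std::this_thread::yield();
        }
    });

    // Direct the signal at the worker only, not at the process as a whole.
    rc = pthread_kill(worker.native_handle(), SIGUSR2);
    assert(rc == 0);
    (void)rc;

    worker.join();
    return trace_hits;
}
```

signal_one_thread() should return 1: the handler ran once, on the targeted worker, and the process kept running.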
Regards

signal() : any performance impact?

I need to catch SIGABRT, SIGSEGV and SIGILL to present the user a proper critical error message when something out of my control fails and the program needs to exit.
However my program does a lot of realtime computing, so performance is important.
Does signal() ( http://www.cplusplus.com/reference/csignal/signal/ ) cause any performance loss (some sort of constant monitoring?) or none at all (only triggered when an exception happens, with no performance lost otherwise)?
edit: My software runs on Windows (7 and higher) and OS X (10.7 and higher).
If your time-critical process catches signals, there is no "special" time wasting. The kernel does keep a table of signals and actions for your process, which it consults when a signal is sent. But every way of sending a message to a process or invoking a handler takes time. A message queue or waiting on a "flag" will have nearly the same "waste".
But using signals can have other implications which should be mentioned. Many blocking system calls can be interrupted when a signal arrives; the call then fails with errno set to EINTR. If you send many signals to your process, this may slow down your application a lot, because you always have to check for EINTR and re-enter the system call. And every system call is a bit expensive. So looping a lot over system calls that fail with EINTR can be a bad design.
But for your question, you only look for SIGABRT, SIGSEGV and SIGILL. These signals are typically only raised for rare exceptional conditions. So don't be afraid to use them as needed. But avoid using these signals frequently for your own IPC. That can be done but is very bad design. For user IPC there are better signal names and also better methods.
In short: if you only catch exception signals, you don't have any time-critical issues here.
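The EINTR check described in this answer is usually written as a small retry loop. The following is a minimal POSIX-only sketch (so it covers the OS X side of the question, not Windows). The helper names read_retrying and demo_read_from_pipe are made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Calls read(2) again whenever it was interrupted by a signal handler.
ssize_t read_retrying(int fd, void* buffer, std::size_t count) {
    assert(buffer != nullptr);
    for (;;) {
        ssize_t n = read(fd, buffer, count);
        if (n >= 0 || errno != EINTR) {
            return n;  // success, end of file, or a real error
        }
        // errno == EINTR: a signal arrived before any data was read; retry.
    }
}

// Small usage example: push two bytes through a pipe and read them back.
bool demo_read_from_pipe() {
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }
    bool ok = write(fds[1], "ok", 2) == 2;
    char buffer[4] = {0};
    ok = ok && read_retrying(fds[0], buffer, sizeof buffer) == 2;
    ok = ok && buffer[0] == 'o' && buffer[1] == 'k';
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Each pass through the loop costs one more system call, which is why a steady stream of signals can add up in the way the answer describes.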

Catch process kills in c++ under Unix

Is it possible for a C++ program to monitor which processes get killed (either by the user or by the OS), or which terminate for some other reason that is not a segmentation fault or illegal operation, and to perform some actions afterwards?
Short answer, yes it's possible.
Long answer:
You will need to implement signal handlers for the different signals that may kill a process. You can't necessarily catch EVERY type of signal (in particular, SIGKILL is not possible to catch since that would potentially make a process unkillable).
Use the sigaction function call to set up your signal handlers.
There is a decent list of which signals do what here (about 1/3 down from the top):
http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html
Edit: Sorry, I thought you meant within the process, not from outside of the process. If you "own" the process, you can use ptrace and its PTRACE_GETSIGINFO request to find out what the signal was.
To generally "find processes killed" would be quite difficult - or at least to tell the difference between processes just exiting on their own, as opposed to those that exit because they are killed for some other reason.
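For the narrower case where the monitored process is your own child, waitpid(2) already reports whether it exited on its own or was killed by a signal. This is a different technique from the ptrace route mentioned above, sketched here under the assumption of a POSIX system. ChildOutcome, run_and_classify and the two sample child bodies are names invented for this example.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ChildOutcome {
    bool killed_by_signal;
    int code;  // terminating signal number, or the exit status
};

// Forks, runs child_body() in the child and classifies how the child ended.
ChildOutcome run_and_classify(void (*child_body)()) {
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        child_body();
        _exit(0);  // reached only if child_body() returns
    }
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    (void)waited;
    if (WIFSIGNALED(status)) {
        return {true, WTERMSIG(status)};
    }
    return {false, WIFEXITED(status) ? WEXITSTATUS(status) : -1};
}

void exit_with_three() { _exit(3); }
void kill_self() { kill(getpid(), SIGKILL); }
```

With these helpers, run_and_classify(kill_self) reports a death by SIGKILL, while run_and_classify(exit_with_three) reports a normal exit with status 3. This only works for direct children of the monitoring process.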

How to monitor Unexpectedly exited Threads?

In multithreaded programming, what if one of the worker threads exits unexpectedly and the main thread needs to know whether that thread is still alive?
Is there any way to check this?
I was wondering if there is a typical signal that is raised when a worker thread exits.
(Linux)
Thank you
If threads are unexpectedly dying in your program, it is toast. If you want fault isolation with recovery, use multiple processes (with shared memory) instead of, or in addition to, threads. On POSIX (and Win32 also) you can detect if the owner of a process-shared mutex died while holding that mutex and implement some "fsck-like" check and repair of the shared data to try to restore its invariants. (Obviously it helps you if the data structure is designed with recoverable transactions in mind.)
On Win32 you can use Windows structured exception handling (SEH) to catch any kind of exception in a thread. (For instance access violation, division by zero, ...). Using the tool help API you can gain a list of the attached modules, and there are interfaces for reading the machine registers, faulting address, etc.
In POSIX you can do that with signal handling. Events like access violations and such deliver signals to the thread to which they pertain.
It doesn't seem realistic to code up these pieces into a recovery strategy that tries to keep a buggy program running.
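The dead-owner detection mentioned in this answer corresponds, on POSIX, to robust mutexes. The sketch below assumes Linux/glibc and, to stay short, uses a thread inside one process rather than a process-shared mutex in shared memory. The names shared_lock, init_robust_mutex, die_holding_lock and owner_died_detected are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

static pthread_mutex_t shared_lock;

// Creates the mutex with the robust attribute set.
bool init_robust_mutex() {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        return false;
    }
    int rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0) {
        rc = pthread_mutex_init(&shared_lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc == 0;
}

// Simulates an owner that dies: it locks and terminates without unlocking.
void* die_holding_lock(void*) {
    pthread_mutex_lock(&shared_lock);
    return nullptr;
}

// Returns true if the next locker was told that the previous owner died.
bool owner_died_detected() {
    pthread_t owner;
    int rc = pthread_create(&owner, nullptr, die_holding_lock, nullptr);
    assert(rc == 0);
    pthread_join(owner, nullptr);

    rc = pthread_mutex_lock(&shared_lock);
    if (rc == EOWNERDEAD) {
        // This is where the "fsck-like" repair of the shared data would go.
        pthread_mutex_consistent(&shared_lock);  // mark the state as repaired
        pthread_mutex_unlock(&shared_lock);
        return true;
    }
    if (rc == 0) {
        pthread_mutex_unlock(&shared_lock);
    }
    return false;
}
```

After init_robust_mutex() succeeds, owner_died_detected() should return true. For real fault isolation, the mutex would also need the PTHREAD_PROCESS_SHARED attribute and would have to live in shared memory.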

Boost.asio & UNIX signal handling

Preface
I have a multi-threaded application running via Boost.Asio. There is only one boost::asio::io_service for the whole application, and everything is done inside it by a group of threads. Sometimes child processes need to be spawned using fork and exec. When a child terminates I need to call waitpid on it to check its exit code and to reap the zombie. I used the recently added boost::asio::signal_set but ran into a problem on ancient systems with linux-2.4.* kernels (which are unfortunately still used by some customers). Under older linux kernels threads are actually a special case of processes, and therefore if a child was spawned by one thread, another thread is unable to wait for it using the waitpid family of system calls. Asio's signal_set posts the signal handler to the io_service, and any thread running this service can run this handler, which is inappropriate for my case. So I decided to handle signals in the good old signal/sigaction way: all threads have the same handler that calls waitpid. So there is another problem:
The problem
When the signal is caught by the handler and the child process has been successfully waited for, how can I "post" this to my io_service from the handler? As it seems to me, the obvious io_service::post() method is not an option because it can deadlock on the io_service's internal mutexes if the signal arrives at the wrong time. The only thing that came to my mind is to use a pipe or socketpair, write notifications to it, and async_wait on the other end, as is sometimes done to handle signals in poll() event loops.
Are there any better solutions?
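The pipe idea from the question is often called the self-pipe trick. Here is a minimal sketch of the signal side only, assuming a POSIX system and leaving Boost.Asio out. The read end could presumably be wrapped in a boost::asio::posix::stream_descriptor and read asynchronously, but that part is not shown or tested here. The names notify_pipe, forward_to_pipe, install_self_pipe and next_forwarded_signal are invented for this illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int notify_pipe[2] = {-1, -1};

// Handler: only calls write(2), which is async-signal-safe.
extern "C" void forward_to_pipe(int signo) {
    int saved_errno = errno;  // do not disturb the interrupted code
    unsigned char byte = static_cast<unsigned char>(signo);
    ssize_t ignored = write(notify_pipe[1], &byte, 1);
    (void)ignored;            // a full pipe just drops the notification
    errno = saved_errno;
}

bool install_self_pipe(int signo) {
    if (pipe(notify_pipe) != 0) {
        return false;
    }
    // Non-blocking write end, so the handler can never block on a full pipe.
    int flags = fcntl(notify_pipe[1], F_GETFL);
    if (flags == -1 || fcntl(notify_pipe[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        return false;
    }
    struct sigaction sa {};
    sa.sa_handler = forward_to_pipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Event-loop side: blocks until a signal number arrives through the pipe.
int next_forwarded_signal() {
    assert(notify_pipe[0] != -1);
    unsigned char byte = 0;
    ssize_t n;
    do {
        n = read(notify_pipe[0], &byte, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? byte : -1;
}
```

After install_self_pipe(SIGUSR1) and raise(SIGUSR1), next_forwarded_signal() should return SIGUSR1. The handler takes no locks, which is the property io_service::post() cannot offer here.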
I've not dealt with boost::asio but I have solved a similar problem. I believe my solution works for both LinuxThreads and the newer NPTL threads.
I'm assuming that the reason you want to "post" signals to your *io_service* is to interrupt a system call so the thread/program will exit cleanly. Is this correct? If not, maybe you can better describe your end goal.
I tried a lot of different solutions including some which required detecting which type of threads were being used. The thing that finally helped me solve this was the section titled Interruption of System Calls and Library Functions by Signal Handlers of man signal(7).
The key is to use sigaction() without SA_RESTART in your signal handling thread to create handlers for all the signals you want to catch, unmask these signals using pthread_sigmask(SIG_UNBLOCK, sig_set, 0) in the signal handling thread, and mask the same signal set in all other threads. The handler does not have to do anything. Just having a handler changes the behavior, and not setting SA_RESTART allows interruptible system calls (like write()) to be interrupted. Whereas if you use sigwait(), system calls in other threads are not interrupted.
To easily mask signals in all other threads, I start the signal handling thread first. Then I mask all the signals I want to handle in the main thread before starting any other threads. When the other threads are started, they copy the main thread's signal mask.
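The mask inheritance this relies on can be checked with a few lines. The sketch below shows only that part (blocking in the parent thread before spawning), not the signal handling thread itself. It assumes Linux with POSIX threads, and the helper names shutdown_signals, current_thread_blocks and workers_inherit_blocked_mask are made up for this example.

```cpp
#include <cassert>
#include <csignal>
#include <thread>
#include <pthread.h>
#include <signal.h>

// The signals the answer wants to handle in one dedicated thread.
sigset_t shutdown_signals() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGHUP);
    sigaddset(&set, SIGQUIT);
    return set;
}

// Queries the calling thread's mask without changing it.
bool current_thread_blocks(int signo) {
    sigset_t current;
    int rc = pthread_sigmask(SIG_SETMASK, nullptr, &current);
    assert(rc == 0);
    (void)rc;
    return sigismember(&current, signo) == 1;
}

// Blocks the set in the calling thread, then checks that a new thread
// starts with the same signals blocked.
bool workers_inherit_blocked_mask() {
    sigset_t set = shutdown_signals();
    sigset_t previous;
    pthread_sigmask(SIG_BLOCK, &set, &previous);

    bool blocked_in_worker = false;
    std::thread worker([&blocked_in_worker] {
        blocked_in_worker = current_thread_blocks(SIGTERM);
    });
    worker.join();

    pthread_sigmask(SIG_SETMASK, &previous, nullptr);  // restore the caller's mask
    return blocked_in_worker;
}
```

workers_inherit_blocked_mask() should return true, which is why masking in the main thread before creating the workers is enough.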
The point is if you do this then you may not need to post signals to your *io_service* because you can just check your system calls for interrupt return codes. I don't know how this works with boost::asio though.
So the end result of all this is that I can catch the signals I want, like SIGINT, SIGTERM, SIGHUP and SIGQUIT, in order to perform a clean shutdown, but my other threads still get their system calls interrupted and can also exit cleanly, without any communication between the signal thread and the rest of the system and without doing anything dangerous in the signal handler, and a single implementation works on both LinuxThreads and NPTL.
Maybe that wasn't the answer you were looking for but I hope it helps.
NOTE: If you want to figure out if the system is running LinuxThreads, you can do this by spawning a thread and then comparing its PID to the main thread's PID. If they differ, it's LinuxThreads. You can then choose the best solution for the thread type.
If you are already polling your IO, another possible solution that is very simple is to just use a boolean to signal the other threads. A boolean is always either zero or not, so there is no possibility of a partial update and a race condition. You can then just set this boolean flag, which the other threads read, without any mutexes. Tools like valgrind won't like it, but in practice it works.
If you want to be even more correct you can use gcc's atomics but this is compiler specific.
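Since C++11, std::atomic offers a portable alternative to compiler-specific atomics for such a flag. The following is a small sketch assuming a C++11 compiler. The names stop_requested, poll_until_stopped and stop_flag_reaches_worker are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Shared stop flag (hypothetical name); no mutex is needed to read or write it.
static std::atomic<bool> stop_requested{false};

// Worker loop: polls the flag between units of work and counts iterations.
long poll_until_stopped() {
    long iterations = 0;
    while (!stop_requested.load(std::memory_order_acquire)) {
        ++iterations;               // a unit of real work would go here
        std::this_thread::yield();
    }
    return iterations;
}

// Starts a worker, requests a stop, and reports whether the worker finished.
bool stop_flag_reaches_worker() {
    assert(stop_requested.is_lock_free());  // expected on mainstream platforms
    stop_requested.store(false);
    long seen = -1;
    std::thread worker([&seen] { seen = poll_until_stopped(); });
    stop_requested.store(true, std::memory_order_release);
    worker.join();
    return seen >= 0;
}
```

stop_flag_reaches_worker() should return true once the worker has observed the flag. Unlike a plain bool, this is well defined under the C++ memory model, so race detectors have nothing to report.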