Is it possible for a C++ program to monitor which processes get killed (either by the user or by the OS), or whether a process terminates for some other reason that is not a segmentation fault or an illegal operation, and perform some actions afterwards?
Short answer: yes, it's possible.
Long answer:
You will need to implement signal handlers for the different signals that may kill a process. You can't necessarily catch EVERY type of signal (in particular, SIGKILL cannot be caught, since allowing that would potentially make a process unkillable).
Use the sigaction function call to set up your signal handlers.
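For illustration, a minimal sketch of setting a handler up with sigaction; the handler only records the signal number, which is about all it can safely do:

    #include <signal.h>
    #include <unistd.h>
    #include <cstdio>

    // Written by the handler; volatile sig_atomic_t is the only type that is
    // guaranteed safe to modify from a signal handler.
    static volatile sig_atomic_t g_last_signal = 0;

    static void on_signal(int signo) {
        g_last_signal = signo;                 // just remember which signal arrived
    }

    int main() {
        struct sigaction sa{};
        sa.sa_handler = on_signal;
        sigemptyset(&sa.sa_mask);              // don't block extra signals in the handler
        sa.sa_flags = 0;

        // Install the same handler for a few catchable termination signals.
        sigaction(SIGTERM, &sa, nullptr);
        sigaction(SIGINT,  &sa, nullptr);
        sigaction(SIGHUP,  &sa, nullptr);

        while (g_last_signal == 0)
            pause();                           // real work would go here instead

        std::printf("caught signal %d, running cleanup\n", (int)g_last_signal);
        return 0;
    }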
There is a decent list of which signals do what here (about 1/3 down from the top):
http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html
Edit: Sorry, I thought you meant from within the process, not from outside it. If you "own" the process, you can use ptrace and its PTRACE_GETSIGINFO request to find out which signal was delivered.
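A rough sketch of the tracer side (error handling omitted; assumes you already know the target PID and are permitted to trace it):

    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        pid_t pid = (pid_t)std::atoi(argv[1]);          // target PID from the command line

        ptrace(PTRACE_ATTACH, pid, nullptr, nullptr);   // attach; the tracee stops
        int status = 0;
        waitpid(pid, &status, 0);                       // consume the attach stop
        ptrace(PTRACE_CONT, pid, nullptr, nullptr);     // let it run again

        for (;;) {
            waitpid(pid, &status, 0);                   // wait for the next stop or exit
            if (WIFEXITED(status) || WIFSIGNALED(status))
                break;                                  // tracee is gone

            siginfo_t si;
            ptrace(PTRACE_GETSIGINFO, pid, nullptr, &si);   // what stopped it?
            std::printf("tracee stopped by signal %d (si_code %d)\n",
                        si.si_signo, si.si_code);

            // Continue and re-deliver the signal so the tracee behaves normally.
            ptrace(PTRACE_CONT, pid, nullptr, (void*)(long)si.si_signo);
        }
        return 0;
    }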
Generally "finding processes that were killed" would be quite difficult - or at least, telling the difference between processes that simply exited on their own and processes that exited because they were killed for some other reason.
I am working on a project where we have used pthread_create to create several child threads.
The thread creation logic is not under my control, as it's implemented by another part of the project.
Each thread performs an operation that takes more than 30 seconds to complete.
Under normal conditions the program works perfectly fine.
But a problem occurs when the program terminates.
I need to exit from main as quickly as possible when I receive the SIGINT signal.
When I call exit() or return from main, the exit handlers and global objects' destructors are called. I believe these operations race with the still-running threads, and there are so many race conditions that it is hard to solve all of them.
The way I see it, there are two solutions:
1. Call _exit() and forget about de-allocating resources.
2. When SIGINT arrives, stop/kill all threads and then call exit() from the main thread, which will release resources.
I think the 1st option will work, but I do not want to terminate the process abruptly.
So I want to know whether it is possible to terminate all child threads as quickly as possible, so that the exit handlers and destructors can perform the required clean-up and the program can terminate.
I have gone through this post; let me know if you know of other ways: POSIX API call to list all the pthreads running in a process
Also, let me know if there is any other solution to this problem.
What is it that you need to do before the program quits? If the answer is 'deallocate resources', then you don't need to worry. If you call _exit then the program will exit immediately and the OS will clean up everything for you.
Be aware also that what you can safely do in a signal handler is extremely limited, so attempting to perform any cleanup yourself is not recommended. If you're interested, there's a list of what you can do here. But you can't flush a file to disk, for example (which is about the only thing I can think of that you might legitimately want to do here). That's off limits.
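For example, about the most a handler can safely do is set a flag and write() a fixed message (both are on that list); everything else has to wait for the main thread:

    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t g_got_sigint = 0;

    static void on_sigint(int) {
        g_got_sigint = 1;                               // flag checked by the main loop
        const char msg[] = "SIGINT received\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);      // write() is async-signal-safe
        // Do NOT call printf, malloc, or flush files here.
    }

    int main() {
        struct sigaction sa{};
        sa.sa_handler = on_sigint;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGINT, &sa, nullptr);

        while (!g_got_sigint)
            pause();                                    // real work would go here

        _exit(0);                                       // let the OS reclaim everything
    }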
I need to exit from main as quickly as possible when I receive the SIGINT signal.
How is that defined? There is no single way to "exit as quickly as possible" when you receive a signal like that.
You can either set flag(s), post to semaphore(s), or similar to set a state that tells other threads it's time to shut down, or you can kill the entire process.
If you elect to set flag(s) or similar to tell the other threads to shut down, you set those flags and return from your signal handler and hope the threads behave and the process shuts down cleanly.
If you elect to kill threads, there's effectively no difference in killing a thread, killing the process, or calling _exit(). You might as well just keep it simple and call _exit().
That's all you can choose between when you have to make your decision in a single signal handler call. Pick one.
A better solution is to use escalating signals. For example, when you get SIGQUIT or SIGINT, you set flag(s) or otherwise tell threads it's time to clean up and exit the process - or else. Then, say five seconds later whatever is shutting down your process sends SIGTERM and the "or else" happens. When you get SIGTERM, your signal handler simply calls _exit() - those threads had their chance and they messed it up and that's their fault. Or you can call abort() to generate a core file and maybe provide enough evidence to fix the miscreant threads that won't shut down.
And finally, five seconds later the managing process will nuke the process from orbit with SIGKILL just to be sure.
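If it helps, a minimal sketch of that escalation wiring (which signals count as "polite" and which as the deadline is up to whatever supervises your process):

    #include <signal.h>
    #include <unistd.h>

    // Set by the "polite" signals; worker threads are expected to poll it.
    static volatile sig_atomic_t g_shutdown_requested = 0;

    static void on_polite_signal(int) {
        g_shutdown_requested = 1;      // ask threads to wind down so main can exit cleanly
    }

    static void on_deadline_signal(int) {
        _exit(1);                      // threads had their chance; bail out immediately
        // abort() here instead would leave a core file for post-mortem debugging.
    }

    static void install_handlers() {
        struct sigaction polite{};
        polite.sa_handler = on_polite_signal;
        sigemptyset(&polite.sa_mask);
        sigaction(SIGINT,  &polite, nullptr);
        sigaction(SIGQUIT, &polite, nullptr);

        struct sigaction deadline{};
        deadline.sa_handler = on_deadline_signal;
        sigemptyset(&deadline.sa_mask);
        sigaction(SIGTERM, &deadline, nullptr);
    }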
I'm currently implementing a tracing system in our program that relies on sending signals to individual threads / lightweight processes using pthread_kill. My problem lies in deciding which signal to use so that it doesn't otherwise interfere with program execution. Basically, I have 3 requirements for the signal:
It should be sendable to a single thread / lightweight process
It should not terminate the program
It should be catchable by common tracing tools like gdb or strace.
I used SIGUSR2 at first, but I think there are platforms on which it will terminate the process if it occurs.
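Simplified, the tracing hook currently looks roughly like this (names are illustrative; TRACE_SIGNAL is the choice in question):

    #include <signal.h>
    #include <pthread.h>

    // The signal used for tracing; this is the value I'm trying to choose.
    #define TRACE_SIGNAL SIGUSR2

    static void trace_handler(int) {
        // In the real code the interrupted thread records tracing data here;
        // the handler body is not the issue.
    }

    static void install_trace_handler() {
        struct sigaction sa{};
        sa.sa_handler = trace_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(TRACE_SIGNAL, &sa, nullptr);   // must be installed, or the default
                                                 // action (terminate) applies
    }

    // Called from the tracing system to poke one specific thread.
    static void trace_thread(pthread_t target) {
        pthread_kill(target, TRACE_SIGNAL);      // deliver the signal to that thread only
    }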
Is there a signal type that is guaranteed to be visible to the above-mentioned tracing tools and will not terminate the process?
Regards
I need to catch SIGABRT, SIGSEGV and SIGILL to present the user with a proper critical error message when something out of my control fails and the program needs to exit.
However my program does a lot of realtime computing, so performance is important.
Does signal() (http://www.cplusplus.com/reference/csignal/signal/) cause any performance loss (some sort of constant monitoring?), or none at all (the handler is only triggered when an exception happens, with no performance cost otherwise)?
Edit: My software runs on Windows (7 and higher) and OS X (10.7 and higher).
If your time-critical process catches signals, there is no "special" time wasted. Indeed, the kernel holds a table of signals and actions for your process, which it has to walk through when a signal is sent. But every way of sending a message to a process or invoking a handler costs time; a message queue or waiting on a "flag" will have nearly the same "waste".
But using signals can have other implications which should be mentioned. Nearly every system call can be interrupted when a signal arrives: the call returns -1 with errno set to EINTR. If many signals are delivered to your process, this may slow down your application a lot, because you always have to check for EINTR and restart the system call, and every system call is somewhat expensive. So looping a lot over system calls that return EINTR can be a bad design.
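A typical retry wrapper for that case looks roughly like this (shown for read(2); the same pattern applies to any interruptible call):

    #include <cerrno>
    #include <unistd.h>

    // Keep retrying read() while it is interrupted by a signal.
    static ssize_t read_retry(int fd, void* buf, size_t count) {
        ssize_t n;
        do {
            n = read(fd, buf, count);
        } while (n == -1 && errno == EINTR);   // interrupted, not failed: try again
        return n;                              // real result or real error
    }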
But for your question, you only look for SIGABRT, SIGSEGV and SIGILL. These signals are typically raised only for rare exceptional conditions, so don't be afraid to use them as needed. Avoid using these signals frequently for your own IPC, though; that can be done, but it is very bad design. For user-level IPC there are better-suited signals and better mechanisms.
In short: for only catching exception signals, you don't have any time-critical issues here.
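For completeness, a minimal sketch of what the installation might look like with std::signal, which is available on both Windows and OS X; note that the fprintf in the handler is the usual pragmatic shortcut for a crash message rather than something strictly guaranteed to be safe:

    #include <csignal>
    #include <cstdio>
    #include <cstdlib>

    // Installed once at startup; costs nothing until one of the signals is raised.
    extern "C" void fatal_signal_handler(int signo) {
        std::fprintf(stderr, "Fatal signal %d caught, shutting down.\n", signo);
        std::_Exit(EXIT_FAILURE);      // terminate immediately, no destructors/atexit
    }

    int main() {
        std::signal(SIGABRT, fatal_signal_handler);
        std::signal(SIGSEGV, fatal_signal_handler);
        std::signal(SIGILL,  fatal_signal_handler);

        // ... real-time computation runs as usual; no overhead until a signal fires ...
        return 0;
    }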
How can signals be handled safely in an MPI application (for example SIGUSR1, which should tell the application that its runtime has expired and that it should terminate within the next 10 minutes)?
I have several constraints:
Finish all parallel/serial IO first before quitting the application!
In all other circumstances the application can exit without any problem.
How can this be achieved safely, with no deadlocks while trying to exit, properly leaving the current context, jumping back to main() and calling MPI_FINALIZE()?
Somehow the processes have to agree on exiting (I think this is the same in multithreaded applications), but how is that done efficiently without having to communicate too much? Is anybody aware of a standard way of doing this properly?
Below are some thoughts which might or might not work:
Idea 1:
Let's say each process catches the signal in a signal handler, pushes it onto an "unhandled signals stack" (USS), and simply returns from the signal handler routine. We then have certain termination points in our application, especially before and after IO operations, which handle all signals in the USS.
If there is a SIGUSR1 in the USS, for example, each process would then exit at a termination point.
This idea has the problem that there could still be deadlocks: process 1 catches a signal just before a termination point, while process 2 has already passed this point and is now starting parallel IO. Process 1 would exit, which results in a deadlock in process 2 (waiting in the IO for process 1, which has exited)...
Idea 2:
Only the master process 0 catches the signal in its signal handler and then broadcasts a message, "all processes exit!", at a specific point in the application. All processes receive the broadcast and throw an exception which is caught in main, and MPI_FINALIZE is called.
This way the exit happens safely, but at the cost of having to continuously receive a broadcast message to check whether we should exit or not.
Thanks a lot!
If your goal is to stop all processes at the same point, then there is no way around always synchronizing at the possible termination points. That is, a collective call at the termination points is required.
Of course, you can try to avoid an extra broadcast by using the synchronization of another collective call to ensure proper termination, or piggyback the termination information on an existing broadcast, but I don't think that's worth it. After all, you only need to synchronize before I/O and at least once per ten minutes. At such a frequency, even a broadcast is not a performance problem.
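For illustration, a minimal sketch of such a termination-point check, assuming the handler only sets a flag and the check itself is an MPI_Allreduce, so that every rank agrees before anyone leaves:

    #include <mpi.h>
    #include <signal.h>

    // Set asynchronously by the handler on whichever rank actually receives SIGUSR1.
    static volatile sig_atomic_t g_signal_seen = 0;

    static void on_sigusr1(int) { g_signal_seen = 1; }

    static void install_handler() {
        struct sigaction sa{};
        sa.sa_handler = on_sigusr1;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, nullptr);
    }

    // Called at every termination point (e.g. right before parallel IO).
    // Collective: returns true on ALL ranks if ANY rank has seen the signal.
    static bool should_terminate(MPI_Comm comm) {
        int local  = (g_signal_seen != 0) ? 1 : 0;
        int global = 0;
        MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, comm);
        return global != 0;
    }

    // Usage at a termination point:
    //   if (should_terminate(MPI_COMM_WORLD)) { /* skip IO, unwind to main, MPI_Finalize() */ }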
Using signals in your MPI application in general is not safe. Some implementations may support it and others may not.
For instance, in MPICH, SIGUSR1 is used by the process manager for internal notification of abnormal failures.
http://lists.mpich.org/pipermail/discuss/2014-October/003242.html
Open MPI, on the other hand, will forward SIGUSR1 and SIGUSR2 from mpiexec to the other processes.
http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect14
Other implementations will differ. So before you go too far down this route, make sure that the implementation you're using can deal with it.
I am trying to write a multithreaded logging system for a program running on Linux.
Calls to the logging system from the main program threads push a data structure containing the data to be logged onto a FIFO queue. A dedicated thread picks the data off the queue and outputs it, while the program's main threads continue with their tasks.
If the main program causes SIGSEGV or other signals to be raised I need to make sure that the queue is empty before terminating.
My plan is to block the signals using pthread_sigmask (http://man7.org/linux/man-pages/man3/pthread_sigmask.3.html) for all but one thread, but reading the list of signals at http://man7.org/linux/man-pages/man7/signal.7.html I noticed:
> A signal may be generated (and thus pending) for a process as a whole (e.g., when sent using kill(2)) or for a specific thread (e.g., certain signals, such as SIGSEGV and SIGFPE, generated as a consequence of executing a specific machine-language instruction are thread directed, as are signals targeted at a specific thread using pthread_kill(3)).
If I block SIGSEGV on all threads but a thread dedicated to catching signals, will it then catch a SIGSEGV raised by a different thread?
I found the question Signal handling with multiple threads in Linux, but I am clueless as to which signals are thread specific and how to catch them.
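For reference, this is roughly the masking I had in mind (simplified; the signal list is illustrative):

    #include <signal.h>
    #include <pthread.h>

    // Block these signals in worker threads so that only the dedicated
    // signal thread is supposed to receive them.
    static void block_signals_in_this_thread() {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);
        sigaddset(&set, SIGFPE);
        sigaddset(&set, SIGTERM);
        pthread_sigmask(SIG_BLOCK, &set, nullptr);
    }

    // The dedicated thread then waits for them synchronously.
    static void* signal_thread(void*) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);
        sigaddset(&set, SIGFPE);
        sigaddset(&set, SIGTERM);
        int signo = 0;
        sigwait(&set, &signo);     // on wake-up: flush the logging queue, then terminate
        return nullptr;
    }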
I agree with the comments: in practice catching and handling SIGSEGV is often a bad thing.
And SIGSEGV is delivered to a specific thread (see this), the one running the machine instruction which accessed some illegal address.
So you cannot run a thread dedicated to catching the SIGSEGVs raised in other threads. And you probably could not easily use signalfd(2) for SIGSEGV either...
Catching SIGSEGV (and returning normally from its signal handler) is a complex and processor-specific thing (it cannot be "portable C code"). You need to inspect and alter the machine state in the handler, that is, either modify the address space (by calling mmap(2) etc...) or modify the register state of the current thread. So use sigaction(2) with SA_SIGINFO and change the machine-specific state pointed to by the third argument (of type ucontext_t*) of the signal handler. Then dive into its processor-specific uc_mcontext field. Have fun changing individual registers, etc... If you don't alter the machine state of the faulty thread, execution resumes (after returning from your SIGSEGV handler) in the same situation as before, and another SIGSEGV is immediately raised... Or simply don't return normally from a SIGSEGV handler (e.g. use siglongjmp(3) or abort(3) or _exit(2) ...).
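For instance, a rough sketch of the "don't return normally" variant with sigsetjmp/siglongjmp (single-threaded, no per-thread jump buffers, many details omitted):

    #include <signal.h>
    #include <setjmp.h>
    #include <cstdio>

    static sigjmp_buf g_recover_point;   // in real code this should be per-thread

    static void on_sigsegv(int) {
        // Do not return normally: that would re-execute the faulting instruction.
        siglongjmp(g_recover_point, 1);
    }

    int main() {
        struct sigaction sa{};
        sa.sa_handler = on_sigsegv;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, nullptr);

        if (sigsetjmp(g_recover_point, 1) == 0) {   // 1: save/restore the signal mask
            volatile int* bad = nullptr;
            *bad = 42;                              // deliberately faults for demonstration
        } else {
            std::printf("recovered from SIGSEGV\n");
        }
        return 0;
    }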
Even if you happen to do all this, it is rumored that Linux kernels are not extremely efficient on such executions. So it is rumored that trying to mimic Hurd/Mach external pagers this way on Linux is not very efficient. See this answer...
Of course signal handlers should call only (see signal(7) for more) async-signal-safe functions. In particular, you cannot in principle call fprintf from them (and you might not be able to use reliably your logging system, but it could work in most but not all cases).
What I said about SIGSEGV also holds for SIGBUS and SIGFPE (and other thread-directed synchronous signals, if they exist).