I have a Linux app which must spawn a child process on request with one of the exec() functions.
If the child process has not finished before it's time to spawn it again, the app must kill the previous instance before starting the new one. It does this with
kill(pid, SIGTERM);
I'm keeping the pid of the previous instance and using
waitpid(pid, &status, WNOHANG)
to reap the process.
It seems that sometimes there's an extremely long time window (possibly into the hundreds of milliseconds) between issuing the kill call and being able to reap the process with waitpid.
What could cause this? I thought that unless the child process set a signal handler (this one doesn't) then it would be killed virtually immediately. This is on a 200MHz ARM9 but still ... seems odd to me.
The process may catch SIGTERM in order to perform its own cleanup before shutting down.
Even if it doesn't, you don't know how long it'll take the OS to shut down the process.
Trying to rely on things like this is a fool's errand.
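Since the delay between kill() and a successful waitpid() is not bounded, one option is to give the child a grace period and escalate afterwards. The following is only a sketch: the helper name terminate_child, the 10 ms polling step and the timeout parameter are assumptions for illustration, not part of the original code.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: ask `pid` to exit with SIGTERM, poll for up to
 * `timeout_ms`, then fall back to SIGKILL.
 * Returns 0 once the child has been reaped, -1 on error. */
static int terminate_child(pid_t pid, int timeout_ms, int *status)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int waited_ms = 0;

    if (kill(pid, SIGTERM) == -1 && errno != ESRCH)
        return -1;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;                 /* child reaped */
        if (r == -1 && errno != EINTR)
            return -1;                /* not our child, or already reaped */
        if (waited_ms >= timeout_ms)
            break;                    /* grace period is over */
        nanosleep(&step, NULL);
        waited_ms += 10;
    }

    /* SIGKILL cannot be caught or ignored, so a blocking wait is safe now. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    while (waitpid(pid, status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

Calling terminate_child(pid, 500, &status) before spawning the new instance would guarantee the old one is gone, at the cost of blocking the caller for up to the chosen timeout.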
In my process I need to start/restart another process.
Currently I use a thread with a tiny stack size and the following code:
void startAndMonitorA()
{
    while(true)
    {
        system("myProcess");
        LOG("myProcess crashed");
        usleep(1000 * 1000);
    }
}
I feel like that's not best practice. I have no idea about the resources the std::system() call is blocking or wasting. I'm on an embedded Linux - so in general I try to care about resources.
One problematic piece is restarting immediately: if the child process fails to start, the loop is going to cause 100% CPU usage. The failure may be a transient error in the child process (e.g. it cannot connect to a server). It is a good idea to pause for at least one second before trying to restart.
What the system() call does on Linux is:
1. Sets up signals SIGINT and SIGQUIT to be ignored.
2. Blocks signal SIGCHLD.
3. Calls fork().
4. The child process calls exec() to run the shell, passing the command line to the shell.
5. The parent process calls waitpid(), which blocks the thread until the child process terminates.
6. The parent process restores its signal dispositions.
If you were to re-implement the functionality of system you would probably omit step 5 (along with steps 1, 2 and 6) to avoid blocking the thread and rely on SIGCHLD to get notified when the child process has terminated and needs to be restarted.
In other words, the bare minimum would be to set up a signal handler for SIGCHLD and call fork and exec.
The code as shown would be adequate for most circumstances. If you really care about resource usage, you should be aware that you are starting (and keeping around) a thread for each process you are monitoring. If your program has an event loop anyway, that kind of thing can be avoided at the cost of some additional effort (and an increase in complexity).
Implementing this would entail the following:
Instead of calling system(), use fork() and exec() to start the external program. Store its PID in a global table.
Set a SIGCHLD handler that notifies the event loop of the exit of a child, e.g. by writing a byte to a pipe monitored by the event loop.
When a child exits, run waitpid with the WNOHANG flag in a loop that runs for as long as there are children to reap. waitpid() will return the PID of the child that exited, so that you know to remove its PID from the table, and to schedule a timeout that restarts it.
In Windows (7), in VC++ we can set the "process shutdown parameters" (in XP a parent process will automatically shut down before the child) to ensure a parent process is killed BEFORE a child process, like so:
GetProcessShutdownParameters(&shutdownlevel, &shutdownflags);
SetProcessShutdownParameters(shutdownlevel+1, SHUTDOWN_NORETRY);
How can I do this in C++ on Linux (gcc)? I find a lot of discussion in many forums on how to ensure a child process is killed in case a parent process dies (e.g. use of prctl on Linux), but I have found nothing on how to GUARANTEE that the parent process is killed by the OS before the child process, like the above for Windows. Maybe it is automatic in Linux?
System shutdown in the Unix world works a bit differently.
When the system is being shut down, the shutdown scripts are invoked first; they handle any complex or time-consuming tasks. Once the scripts have run, all remaining processes are sent a SIGTERM signal (which kills any process that doesn't have an explicit handler) and, a few seconds later, a SIGKILL signal (which kills the process and cannot be handled).
The order in which the last part happens is undefined.
In general, programs should be written so they can be shut down by simply sending SIGTERM.
I'm guessing that you want the parent stopped before the child because the parent would simply restart the child. The proper way to avoid that is to collect the child's exit status (which you are responsible for anyway), and avoid restarting when the exit status indicates that the process ended because of being sent SIGTERM.
(You still want to restart on SIGKILL, because that is what happens to the largest process when the system runs out of memory)
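The restart decision described above could be expressed as a small predicate on the status returned by waitpid(). The function name should_restart is hypothetical, and restarting after a normal exit is a policy assumption made only for this sketch.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical policy: returns 1 if the child should be restarted,
 * 0 if it should be left stopped. `status` comes from waitpid(). */
static int should_restart(int status)
{
    if (WIFSIGNALED(status)) {
        /* SIGTERM means someone (e.g. system shutdown) asked the child
         * to stop. Any other signal, including SIGKILL from the
         * out-of-memory killer, is treated as a crash. */
        return WTERMSIG(status) != SIGTERM;
    }
    /* Assumption: a child that exits by itself should run again. */
    return 1;
}
```

Note that a child which installs its own SIGTERM handler and then exits normally will not show up as WIFSIGNALED; in that case an agreed exit code would have to carry the same meaning.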
In my Qt C++ program I created a process as follows:
myProcess = new QProcess();
myProcess->start(programpath, arguments);
Terminating the process is handled as follows:
myProcess->terminate();
Terminating the child process with QProcess::kill(), terminate() or close() works, but I don't want to use them because they don't give the child process a chance to clean up before exiting.
Is there any other way to exit the process? Thanks!
The polite way would be for the parent process to politely ask the child process to go away. Then when the child process exits (of its own volition), the QProcess object will emit a finished(int, QProcess::ExitStatus) signal, and you can have a slot connected to that signal that will continue your process (e.g. by deleting the QProcess object at that time). (Or if you don't mind blocking your Qt event loop for a little while, you could just call waitForFinished() on the QProcess object after asking it to exit, and waitForFinished() won't return until the process has gone away or the timeout period has elapsed)
Of course for the above to work you need some way to ask the child process to exit. How you go about doing that will depend on what the child process is running. If you're lucky, you are in control of the child process's code, in which case you can modify it to exit in response to some action of the parent process -- for example, you could code the child process to exit when its stdin descriptor is closed, and have the parent process call closeWriteChannel() on the QProcess object to cause that to happen. Or if you're running under Linux/Unix you could send a SIGINT signal to the child process and the child process could set up a handler that would catch the signal and start an orderly shutdown. Or if you want something really stupid-quick and dirty, have the child process periodically check for the presence of a file at a well-known location (e.g. "/tmp/hey-child-process-PIDNUMBERHERE-go-away.txt" or something) and the parent process would create such a file when it wants the child to go away. Not that I'd recommend that last method as I don't think it would be very robust, except maybe as a proof of concept.
terminate() actually gives the process a chance to clean up; the program being terminated just has to take that chance. The system sends a SIGTERM to the application, which can catch it and exit cleanly on its own. If this is still not nice enough, then you have to implement your own way of asking the application to quit. Jeremy Friesner made some good suggestions. If the application code was not written by you, you'll have to read that program's documentation more closely; maybe it documents how to do that.
I am starting several processes from my main program and I can get the pids of those processes. Now I want to wait until all of these processes have finished and then clear the shared memory block from my parent process. Also, if any of the processes has not finished but has segfaulted, I want to kill that process. So how can I check, from the pid, in my parent process code whether a process finished without any error or broke down because of a runtime error or any other cause, so that I can kill that process?
Also, what if I want to see the status of some other process which is not a child process but whose pid is known?
Code is appreciated (I am not looking for a script but for code).
Look into waitpid(2) with WNOHANG option. Check the "fate" of the process with macros in the manual page, especially WIFSIGNALED().
Also, a segfaulted process is already dead (unless SIGSEGV is specifically handled by the process, which is usually not a good idea).
From your updates, it looks like you also want to check on other processes, which are not children of your current process.
You can look at /proc/{pid}/status to get an overview of what a process is currently doing. It's going to be one of:
Running
Stopped
Sleeping
Disk (D) sleep (I/O bound, uninterruptible)
Zombie
However, once a process dies (fully, unless zombied) so does its entry in /proc. There's no way to tell if it exited successfully, segfaulted, caught a signal that could not be handled, or failed to handle a signal that could be handled. Not unless its parent logs that information somewhere.
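Reading that state from C might look like the sketch below. It assumes the Linux layout of /proc/{pid}/status, where one line starts with "State:" followed by a single letter; proc_state is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/types.h>

/* Returns the state letter of `pid` (for example 'R', 'S', 'D', 'T'
 * or 'Z'), or 0 if the /proc entry cannot be read, which usually
 * means the process no longer exists. */
static char proc_state(pid_t pid)
{
    char path[64], line[256];
    char state = 0;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        /* The line looks like "State:\tS (sleeping)". */
        if (sscanf(line, "State: %c", &state) == 1)
            break;
    }
    fclose(f);
    return state;
}
```

A return value of 0 only says the entry is gone; as noted above, it cannot tell how the process ended.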
It sounds like you're writing a watchdog for other processes that you did not start, rather than keeping track of child processes.
If a program segfaults, you won't need to kill it. It's dead already.
Use the wait and waitpid calls to wait for children to finish and check the status for some idea of how they exited. See here for details on how to use these functions. Note especially the WIFSIGNALED and WTERMSIG macros.
Call waitpid() from a SIGCHLD handler to catch the moment when the application terminates by itself. Note that if you start multiple processes, you have to loop on waitpid() with WNOHANG until it returns 0 (or -1 once no children remain).
Use kill() with signal 0 to check whether the process is still running. IIRC zombies still qualify as processes, so you need a proper SIGCHLD handler for that to work.
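The signal-0 check can be wrapped in a small helper. This is a sketch with a made-up name, process_exists; treating EPERM as "exists" is based on kill(2) reporting that error when the target is there but belongs to another user.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if a process with this pid exists (zombies included),
 * 0 otherwise. No signal is actually delivered. */
static int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    /* EPERM: the process exists but we may not signal it.
     * ESRCH: there is no such process. */
    return errno == EPERM;
}
```

Because PIDs are recycled, a positive answer for a process you did not start only means that some process currently has that pid.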
I wrote a program that forks some processes with fork(). I want to kill all child- and the mother process if there is an error. If I use exit(EXIT_FAILURE) only the child process is killed.
I am thinking about a system("killall [program_name]") but there must be a better way...
Thank you all!
Lennart
Under UNIX, send SIGTERM, SIGABRT, SIGPIPE or something similar to the mother process. This signal will then be propagated to all children automatically, if they do not explicitly block or ignore it.
Use getppid() to get the PID to send the signal to, and kill() to send the signal.
getppid() returns the process ID of the parent of the calling process.
The kill() system call can be used to send any signal to any process group or process.
Remarks:
1. Using system is evil. Use internal functions to send signals.
2. killall would be even more evil. Consider several instances of your program running at once.
See How to make child process die after parent exits?
On Linux there's a prctl() call, prctl(PR_SET_PDEATHSIG, sig), which a child process can use to ask the kernel to send it a signal when its parent dies for whatever reason.
I need to check and can't do it where I am at the second, but I'm really not sure that ypnos' assertion about SIGPIPE, SIGTERM and SIGABRT being propagated to all children is correct.
However, if you use kill(-ppid, sig) (note the minus sign), then so long as the children are still in the parent process's process group, the kernel will deliver the signal to all of the children.
If your mother process is not started from the command line, for example when it runs as a daemon, it may not be the process group leader.
To ensure that your mother process is the process group leader, call setsid() during your process initialization.
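Then in your child process, if you want to cause all the processes to exit:
pgid = getpgid(0); /* 0 means "the calling process" */
kill(-pgid, SIGTERM); /* a negative pid addresses the whole process group */
You can also do tricks, like telling all your siblings to suspend:
kill(-pgid, SIGTSTP);
And resume:
kill(-pgid, SIGCONT);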
Consider a suicidal approach: set up an alarm() at the beginning of the process (both parent and child) with some positive number of seconds. If the computation completes within that time and "there is no error", call alarm(0) to cancel the timer; otherwise the SIGALRM will kill the process (assuming you're not explicitly catching or ignoring it).
Well, make a case against this instead of just down-voting :)