C++ fork() without wait() leaves the execl()ed command defunct

I'm looking to fork a process in C++ without hanging its parent process: the parent is a daemon and must keep running. If I wait() on the forked process, the execl()ed command doesn't become defunct, but the app hangs. Not waiting fixes the hang, but the command becomes defunct.
if ((pid = fork()) < 0)
    perror("Error with fork()");
else if (pid > 0) {
    // waiting here would block the parent (the daemon)
    // not waiting leaves the exec'd command defunct
    // ---- waitpid(pid, &status, 0);
    return "";
} else {
    struct rlimit rl;
    // rl must be filled in before it is read
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0 || rl.rlim_max == RLIM_INFINITY)
        rl.rlim_max = 1024;
    for (rlim_t i = 0; i < rl.rlim_max; i++)
        close((int) i); // close every inherited descriptor
    execl("/bin/bash", "/bin/bash", "-c", "whoami", (char*) 0);
    _exit(127); // only reached if execl() failed (stderr is closed, so perror() would print nothing)
}
How can I fork and execl() without calling wait(), so that the command doesn't become defunct?
UPDATE
Fixed by adding the following before the fork
signal(SIGCHLD, SIG_IGN);
Still working with my limited skills at a more compatible solution based on the accepted answer. Thanks!

By default, wait and friends block until a process has exited, then reap it. You can call waitpid with the WNOHANG flag to return immediately if no child has exited.
The defunct ("zombie") process will stick around until you wait on it. So if you run it in the background, you must arrange to reap it eventually, in any of several ways:
call waitpid with WNOHANG periodically: pid_t pid = waitpid(-1, &status, WNOHANG);
install a signal handler for SIGCHLD to be notified when it exits
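A minimal sketch of the SIGCHLD-handler option, assuming Linux/POSIX and C++11; the names reap_children and install_sigchld_reaper are hypothetical. The handler loops on waitpid(-1, ..., WNOHANG) because several child exits can be merged into a single SIGCHLD, and it saves and restores errno so the interrupted code is not disturbed.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Reap every child that has already exited; never blocks.
// waitpid() is async-signal-safe, so calling it from a handler is allowed.
extern "C" void reap_children(int)
{
    int saved_errno = errno;          // waitpid() may overwrite errno
    while (waitpid(-1, nullptr, WNOHANG) > 0)
        ;                             // keep going: exits can be coalesced
    errno = saved_errno;
}

// Install the handler once, before the daemon starts forking.
bool install_sigchld_reaper()
{
    struct sigaction sa {};
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    // SA_RESTART: restart interrupted slow syscalls where possible.
    // SA_NOCLDSTOP: no SIGCHLD for children that merely stop or continue.
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

After installing it, the parent can fork and exec without ever blocking in wait(); exited children are collected as their SIGCHLD arrives.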
Additionally, under POSIX.1-2001, you can use sigaction to set the SA_NOCLDWAIT flag on SIGCHLD, or set its action to SIG_IGN. Older systems (including Linux 2.4.x, but not 2.6.x or 3.x) don't support this.
Check your system man pages, or alternatively the wait page in the Single UNIX Specification, which also gives some helpful code examples. SA_NOCLDWAIT is documented under sigaction.
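A sketch of the SA_NOCLDWAIT variant, assuming Linux 2.6 or later (disable_zombies is a hypothetical name). With this flag set, exited children are discarded instead of becoming zombies, so there is nothing left to reap; the trade-off is that their exit status can no longer be collected.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Ask the kernel not to turn our children into zombies at all.
bool disable_zombies()
{
    struct sigaction sa {};
    sa.sa_handler = SIG_DFL;         // keep the default action for SIGCHLD
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;      // exited children are discarded immediately
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

On Linux, a later waitpid() on such a child blocks until the child terminates and then fails with ECHILD, since no status is kept for it.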

I think a signal handler would be the best way, as indicated. I would like to point out another way this could be handled: fork twice, and have the child exit immediately while the grandchild calls execl. The parent's wait for the child then returns right away, and the orphaned grandchild is adopted by the init process, which cleans it up when it exits.
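A minimal sketch of the double fork, assuming the command is run through /bin/sh (run_detached_double_fork is a hypothetical name). The parent still calls waitpid(), but only on the intermediate child, which exits immediately, so the wait does not hang the daemon.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run `cmd` without waiting for it and without leaving a zombie behind.
// Returns 0 on success, -1 if a fork or the reap of the intermediate child failed.
int run_detached_double_fork(const char* cmd)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char*) nullptr);
            _exit(127);                    // exec failed
        }
        _exit(grandchild < 0 ? 1 : 0);     // intermediate child exits at once
    }
    int status = 0;
    // Short wait: it only reaps the intermediate child.
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On newer Linux systems the orphan may be adopted by a "subreaper" process (see PR_SET_CHILD_SUBREAPER) rather than by init itself; either way it is reaped.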

As mentioned in the comments, a double fork keeps the process out of the defunct state.
What is the reason for performing a double fork when creating a daemon?

Related

Determine if a process has suspended

I am sending a SIGTSTP signal to a particular process, but how can I determine whether the process has actually been suspended, using C library functions or syscalls on Linux?
Read from /proc/[pid]/stat.
From the man page, you can get the status of a process from this file:
state %c
One character from the string "RSDZTW" where R is running, S is
sleeping in an interruptible wait, D is waiting in uninterruptible
disk sleep, Z is zombie, T is traced or stopped (on a signal), and W
is paging.
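A small sketch of reading that field, assuming Linux's /proc layout; process_state is a hypothetical helper. The command name in parentheses may itself contain spaces or ')', so the state letter is taken from just after the last ')'.

```cpp
#include <cassert>
#include <fstream>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Return the one-letter state from /proc/[pid]/stat (e.g. 'R', 'S', 'T'),
// or '\0' if the process does not exist or the line cannot be parsed.
char process_state(pid_t pid)
{
    std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(in, line))
        return '\0';
    // Format: "pid (comm) S ppid ...": the state follows the last ')'.
    std::string::size_type paren = line.rfind(')');
    if (paren == std::string::npos || paren + 2 >= line.size())
        return '\0';
    return line[paren + 2];
}
```

A stopped process (e.g. after SIGTSTP or SIGSTOP) reports 'T'; newer kernels use a lowercase 't' for a process stopped by a tracer.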
I know this is an old post, but this is for anyone who is as curious as I was!
The simple answer is that there is only one static, consistent way to check the status, which is /proc/[pid]/stat. BUT if you want as few architecture dependencies as possible and don't want to do that, you can watch for the signal instead (this only works if the process is your own child, since waitpid() only reports on children).
Each stop or continue notification can only be seen once, so you'll have to keep track of the state yourself, but waitpid can poll a process to see whether it has been stopped or continued since you last checked:
bool is_suspended = false; // keep this between calls: it only changes when waitpid() reports an event
int status;
pid_t result = waitpid(pid, &status, WNOHANG | WUNTRACED | WCONTINUED);
if (result > 0) { // a state change has been reported
    if (WIFSTOPPED(status)) {
        is_suspended = true;
    } else if (WIFCONTINUED(status)) {
        is_suspended = false;
    }
}

Way to force file descriptor to close so that pclose() will not block?

I am creating a pipe using popen() and the process is invoking a third party tool which in some rare cases I need to terminate.
::popen(thirdPartyCommand.c_str(), "w");
If I just throw an exception and unwind the stack, my unwinding code calls pclose() on the third party process whose results I no longer need. However, pclose() never returns, as it blocks with the following stack trace on CentOS 4:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00807dc3 in __waitpid_nocancel () from /lib/libc.so.6
#2 0x007d0abe in _IO_proc_close@@GLIBC_2.1 () from /lib/libc.so.6
#3 0x007daf38 in _IO_new_file_close_it () from /lib/libc.so.6
#4 0x007cec6e in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#5 0x007d6cfd in pclose@@GLIBC_2.1 () from /lib/libc.so.6
Is there any way to force pclose() to succeed, so that I can programmatically avoid having my process hang in pclose()? It never returns, because I've stopped supplying input to the popen()ed process and want to throw away its work.
Should I somehow write an end-of-file to the popen()ed file descriptor before trying to close it?
Note that the third party software is forking itself. At the point where pclose() has hung, there are four processes, one of which is defunct:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
abc 6870 0.0 0.0 8696 972 ? S 04:39 0:00 sh -c /usr/local/bin/third_party /home/arg1 /home/arg2 2>&1
abc 6871 0.0 0.0 10172 4296 ? S 04:39 0:00 /usr/local/bin/third_party /home/arg1 /home/arg2
abc 6874 99.8 0.0 10180 1604 ? R 04:39 141:44 /usr/local/bin/third_party /home/arg1 /home/arg2
abc 6875 0.0 0.0 0 0 ? Z 04:39 0:00 [third_party] <defunct>
I see two solutions here:
The neat one: call fork(), pipe() and execve() (or any other member of the exec family, of course) "manually". Then it is up to you to decide whether to let your children become zombies or not (i.e. whether to wait() for them or not).
The ugly one: if you're sure only one of these child processes is running at any given time, you could use sysctl() to check whether any process with this name is running before you call pclose()... yuck.
I strongly advise the neat way here, or you could just ask whoever is responsible to fix that infinite loop in your third party tool, haha.
Good luck!
EDIT:
For your first question: I don't know. Researching how to find processes by name using sysctl() should tell you what you need to know; I have never pushed it this far myself.
For your second and third questions: popen() is basically a wrapper around fork() + pipe() + dup2() + execl().
fork() duplicates the process, execl() replaces the duplicated process' image with a new one, pipe() handles inter process communication and dup2() is used to redirect the output... And then pclose() will wait() for the duplicated process to die, which is why we're here.
If you want to know more, you should check this answer where I've recently explained how to perform a simple fork with standard IPC. In this case, it's just a bit more complicated as you have to use dup2() to redirect the standard output to your pipe.
You should also take a look at the popen()/pclose() source code, as it is of course open source.
Finally, here's a brief example, I cannot make it clearer than that:
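int pipefd[2];
char buf;
pipe(pipefd);
pid_t pid = fork();
if (pid == 0) // I'm the child
{
    close(pipefd[0]); // I'm not going to read from this pipe
    dup2(pipefd[1], 1); // redirect standard output to the pipe
    close(pipefd[1]); // it has been duplicated, close it as we don't need it anymore
    execl("/bin/sh", "sh", "-c", "command", (char*) NULL); // execute the program you want ("command" is a placeholder)
    _exit(127); // only reached if the exec failed
}
else if (pid > 0) // I'm the parent
{
    close(pipefd[1]); // I'm not going to write to this pipe
    while (read(pipefd[0], &buf, 1) > 0) // read until EOF
        write(1, &buf, 1);
    close(pipefd[0]); // cleaning (the write end was already closed above)
    int status;
    waitpid(pid, &status, 0); // reap the child: you decide when, or whether to kill() it first
}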
And as always, remember to read the man pages and to check all your return values.
Again, good luck!
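Following the fork()/pipe()/dup2()/execl() approach above, a popen(cmd, "w") replacement can hand back the child's PID, so the caller can kill the child instead of blocking in pclose() forever. This is a sketch, not a drop-in API: spawn_writer and close_writer are hypothetical names, the command is run through /bin/sh, and a raw file descriptor is returned instead of a FILE* so that closing it after a kill never flushes buffered data into a dead pipe (which would raise SIGPIPE).

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Start `cmd` with a pipe connected to its stdin.
// Returns the write end of the pipe and stores the child's PID, or returns -1.
int spawn_writer(const char* cmd, pid_t* child_pid)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       // child: the pipe becomes its stdin
        close(fds[1]);
        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char*) nullptr);
        _exit(127);                       // exec failed
    }
    close(fds[0]);                        // parent keeps only the write end
    *child_pid = pid;
    return fds[1];
}

// pclose() counterpart. With kill_first, SIGKILL is sent before closing,
// so the final waitpid() cannot hang. Returns the raw wait status, or -1.
int close_writer(int fd, pid_t pid, bool kill_first)
{
    if (kill_first)
        kill(pid, SIGKILL);               // SIGKILL cannot be caught or ignored
    close(fd);                            // gives EOF to a child still reading
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

If the command forks helpers of its own, as the third party tool in the question does, killing this one PID does not reach them; the answer below about killing all children covers that case.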
Another solution is to kill all your children. If you know that the only child processes you have are processes that get started when you do popen(), then it's easy enough. Otherwise you may need some more work or use the fork() + execve() combo, in which case you will know the first child's PID.
Whenever you run a child process, its PPID (parent process ID) is your own PID. It is easy enough to read the list of currently running processes and gather those whose PPID equals getpid(). Repeat the loop looking for processes whose PPID equals one of your children's PIDs. In the end you build a whole tree of child processes.
Since your child processes may end up creating other child processes, to make it safe you will want to freeze those processes by sending them a SIGSTOP. That way they will stop creating new children. SIGSTOP cannot be caught, blocked, or ignored, so a process cannot prevent it from doing its deed.
The process is therefore:
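#include <algorithm>
#include <signal.h>
#include <unistd.h>
#include <vector>

void kill_all_children()
{
    std::vector<pid_t> me_and_children;
    me_and_children.push_back(getpid());
    auto known = [&](pid_t pid)
    {
        return std::find(me_and_children.begin(), me_and_children.end(), pid) != me_and_children.end();
    };
    bool found_child = false;
    do
    {
        found_child = false;
        std::vector<process> processes(get_processes()); // process and get_processes() are not shown (see the note below)
        for (auto const & p : processes)
        {
            // i.e. if p is the child of me or of one of my children,
            // and we have not recorded it yet (otherwise the loop never ends)
            if (known(p.ppid()) && !known(p.pid()))
            {
                kill(p.pid(), SIGSTOP); // freeze it so it cannot create new children
                me_and_children.push_back(p.pid());
                found_child = true;
            }
        }
    }
    while (found_child);
    for (auto c : me_and_children)
    {
        // ignore ourselves
        if (c == getpid())
        {
            continue;
        }
        kill(c, SIGTERM);
        kill(c, SIGCONT); // make sure it continues now so it can act on the SIGTERM
    }
}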
This is probably not the best way to close your pipe, though, since you probably need to give the command time to handle your data. So what you want is to execute that code only after a timeout. Your regular code could look something like this:
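void handle_alarm(int) // a signal handler receives the signal number
{
    // caution: kill_all_children() allocates memory and reads /proc, which is
    // not async-signal-safe; see the note below about using a thread instead
    kill_all_children();
}

void send_data(...)
{
    signal(SIGALRM, handle_alarm);
    FILE* f = popen("command", "w");
    // do some work...
    alarm(60); // give it a minute
    pclose(f);
    alarm(0); // remove alarm
}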
About the alarm(60): the location is up to you. It could also be placed before the popen() if you're afraid that popen() or the work after it could also hang (e.g. I've had problems where the pipe fills up and I never even reach the pclose(), because the child process loops forever).
Note that the alarm() may not be the best idea in the world. You may prefer using a thread with a sleep made of a poll() or select() on an fd which you can wake up as required. That way the thread would call the kill_all_children() function after the sleep, but you can send it a message to wake it up early and let it know that the pclose() happened as expected.
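A sketch of that thread-based watchdog, assuming C++11 and POSIX poll(); the Watchdog class is hypothetical. The thread sleeps in poll() on the read end of a pipe; cancel() writes one byte to wake it early, and only a real timeout (poll() returning 0) runs the callback, which therefore executes in an ordinary thread rather than in a signal handler.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <poll.h>
#include <thread>
#include <unistd.h>

class Watchdog {
public:
    Watchdog(int timeout_ms, std::function<void()> on_timeout)
    {
        if (pipe(fds_) != 0) {
            fds_[0] = fds_[1] = -1;       // no pipe, no watchdog
            return;
        }
        thread_ = std::thread([this, timeout_ms, on_timeout] {
            struct pollfd pfd = { fds_[0], POLLIN, 0 };
            // 0 means the timeout expired without cancel(); in this sketch an
            // interrupted poll() (-1, EINTR) is treated like a cancel.
            if (poll(&pfd, 1, timeout_ms) == 0)
                on_timeout();
        });
    }
    Watchdog(const Watchdog&) = delete;
    Watchdog& operator=(const Watchdog&) = delete;

    // Call this once pclose() has returned normally.
    void cancel()
    {
        if (fds_[1] >= 0) {
            char byte = 0;
            ssize_t ignored = write(fds_[1], &byte, 1);   // wakes up poll()
            (void) ignored;
        }
    }

    ~Watchdog()
    {
        cancel();                         // harmless if the timeout already fired
        if (thread_.joinable())
            thread_.join();
        if (fds_[0] >= 0) {
            close(fds_[0]);
            close(fds_[1]);
        }
    }

private:
    int fds_[2];
    std::thread thread_;
};
```

Typical use would be to create a Watchdog with kill_all_children() as the callback just before pclose(), and call cancel() right after pclose() returns.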
Note: I left the implementation of get_processes() out of this answer. You can read that information from /proc or with the libprocps library. I have such an implementation in my snapwebsites project; it's called process_list. You could just borrow that class.
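For completeness, a minimal /proc-based get_processes() could look like this. It is an assumption-laden sketch, not the process_list class mentioned above: the process type is a hypothetical two-field class providing the pid() and ppid() accessors that kill_all_children() uses, and the parent PID is read from the fourth field of /proc/[pid]/stat.

```cpp
#include <cassert>
#include <cctype>
#include <dirent.h>
#include <fstream>
#include <signal.h>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical minimal process record.
class process {
public:
    process(pid_t pid, pid_t ppid) : pid_(pid), ppid_(ppid) {}
    pid_t pid() const { return pid_; }
    pid_t ppid() const { return ppid_; }
private:
    pid_t pid_;
    pid_t ppid_;
};

// Scan the numeric directories of /proc; processes that exit while we
// scan are simply skipped.
std::vector<process> get_processes()
{
    std::vector<process> result;
    DIR* dir = opendir("/proc");
    if (!dir)
        return result;
    while (struct dirent* entry = readdir(dir)) {
        if (!isdigit(static_cast<unsigned char>(entry->d_name[0])))
            continue;                                   // not a PID directory
        std::ifstream in(std::string("/proc/") + entry->d_name + "/stat");
        std::string line;
        if (!std::getline(in, line))
            continue;
        std::string::size_type paren = line.rfind(')'); // comm may contain spaces
        if (paren == std::string::npos)
            continue;
        std::istringstream rest(line.substr(paren + 1));
        char state;
        long ppid;
        if (rest >> state >> ppid)
            result.emplace_back(static_cast<pid_t>(std::stol(entry->d_name)),
                                static_cast<pid_t>(ppid));
    }
    closedir(dir);
    return result;
}
```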
I'm using popen() to invoke a child process which doesn't need any stdin or stdout; it just runs for a short time to do its work, then stops by itself. (Arguably, this type of child process would be better invoked with system().) Anyway, pclose() is used afterwards to verify that the child process exited cleanly.
Under certain conditions, this child process keeps on running indefinitely. pclose() blocks forever, so then my parent process is also stuck. CPU usage runs to 100%, other executables get starved, and my whole embedded system crumbles. I came here looking for solutions.
Solution 1 by @cmc: decomposing popen() into fork(), pipe(), dup2() and execl().
It might just be a matter of personal taste, but I'm reluctant to rewrite perfectly fine library functions myself. I would just end up introducing new bugs.
Solution 2 by @cmc: verifying with sysctl() that the child process actually exists, to make sure that pclose() will return successfully. I find that this somehow sidesteps the problem of the OP, @WilliamKF: there is definitely a child process, it has just become unresponsive. Forgoing the pclose() call won't solve that. [As an aside, in the 7 years since @cmc wrote this answer, sysctl() seems to have become deprecated.]
Solution 3 by @Alexis Wilke: killing the child process. I like this approach best. It basically automates what I did when I stepped in manually to resuscitate my dying embedded system. The problem with my stubborn adherence to popen() is that I get no PID for the child process. I have been trying in vain with
waitid(P_PGID, getpgrp(), &child_info, WNOHANG);
but all I get on my Debian Linux 4.19 system is EINVAL.
So here's what I cobbled together. I'm searching for the child process by name; I can afford to take a few shortcuts, as I'm sure there will only be one process with this name. Ironically, the command-line utility ps is invoked by yet another popen(). This won't win any elegance prizes, but at least my embedded system stays afloat now.
FILE* child = popen("child", "r");
if (child)
{
    int nr_loops;
    int child_pid = 0;
    for (nr_loops = 10; nr_loops; nr_loops--)
    {
        FILE* ps = popen("ps | grep child | grep -v grep | grep -v \"sh -c \" | sed 's/^ *//' | sed 's/ .*$//'", "r");
        if (!ps)
            break; // could not run ps; stop checking rather than kill a wrong PID
        child_pid = 0;
        int found = fscanf(ps, "%d", &child_pid);
        pclose(ps);
        if (found != 1)
            // The child process is no longer running, no risk of blocking pclose()
            break;
        syslog(LOG_WARNING, "child running PID %d", child_pid);
        sleep(1); // 1 second (usleep() is not required to accept values >= 1000000)
    }
    if (!nr_loops && child_pid > 0)
    {
        // Time to kill this runaway child
        syslog(LOG_ERR, "killing PID %d", child_pid);
        kill(child_pid, SIGTERM);
    }
    pclose(child); // Even after it had to be killed
} /* if (child) */
I learned the hard way that I have to pair every popen() with a pclose(), otherwise zombie processes pile up. I find it remarkable that this is needed even after a direct kill; I figure that's because, according to the man page, popen() actually launches sh -c with the child process in it, and it's this surrounding sh that becomes a zombie. (In fact any child, killed or not, stays a zombie until its parent reaps it, and pclose() is what does the reaping here.)

Avoiding the production of zombie processes in C++

Very strange bug, perhaps someone will see something I'm missing.
I have a C++ program which forks off a bash shell, and then passes commands to it.
Periodically, the commands will contain nonsense and the bash process will hang. I detect this using sem_timedwait(), and then run a little function like this:
if (kill(*bash_pid, SIGKILL)) {
    cerr << "Error sending SIGKILL to the bash process!" << endl;
    exit(1);
} else {
    // collect exit status
    long counter = 0;
    do {
        pid = waitpid(*bash_pid, &status, WNOHANG);
        if (pid == 0) { // status not available yet
            sleep(1);
        }
        if (counter++ > 5) {
            cerr << "ERROR: Bash child process ignored SIGKILL >5 sec!" << endl;
        }
    } while (pid != *bash_pid && pid != -1);
    if (pid == -1) {
        cerr << "Failed to clean up zombie bash process!" << endl;
        exit(1);
    }
    // re-initialized bash process
    *bash_pid = init_bash();
}
Assuming I understand the workings of waitpid correctly, this should first send SIGKILL to the shell, and then essentially sit in a spinlock, trying to reap the resulting process. Eventually, it succeeds and then a new bash process is started with init_bash().
At least, that's what should happen. Instead, the child process's exit status is never collected, and it continues to exist as a zombie process. In spite of this, the parent does exit the loop and manages to restart the bash process, and continues with normal execution. Eventually too many zombies are generated and the system runs out of pids.
Additionally:
Fork is called in exactly one place in the program, inside init_bash.
Checks prevent init_bash from being called except once at the program's start and after a call to the function above.
Thoughts?
Articles that I have read indicate that a process becomes a zombie when it exits but its parent never collects its exit status.
This article provides several ways to kill a zombie process from the command line. One technique is to use signals other than SIGKILL, for instance SIGTERM. (Strictly speaking, a zombie has already exited, so no signal removes it; it disappears only when its parent reaps it, or when the parent itself exits and the zombie is adopted and reaped by init.)
This article has an answer which suggests SIGKILL should not be used.
One of the techniques is to kill the parent: its zombie children are then adopted by init, which reaps them. The author indicates that there appear to be child processes that just remain as zombies until the OS is restarted.
You do not mention the mechanism used to communicate the commands to the child process. However one option may be to turn the child process loose by disconnecting it from its parent similar to the way a child of a terminal process can be disconnected from the terminal session. That way the child will become its own process and if there is a problem may exit without becoming a zombie.
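As for the cleanup step in the question, a simpler sketch (kill_and_reap is a hypothetical name, and the PID is assumed to be a direct child of the caller): because SIGKILL cannot be caught or ignored, a blocking waitpid() right after kill() returns once the kernel has torn the process down, so no polling loop is needed.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Kill a child and reap it; returns true once no zombie is left behind.
bool kill_and_reap(pid_t child)
{
    if (kill(child, SIGKILL) != 0)
        return false;
    int status = 0;
    pid_t r;
    do {
        r = waitpid(child, &status, 0);   // blocks until the child is gone
    } while (r == -1 && errno == EINTR);  // retry if a signal handler interrupted us
    return r == child;
}
```

If the program sets SIGCHLD to SIG_IGN anywhere, waitpid() will instead fail with ECHILD, because the kernel then discards children automatically.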

C++ fork() & execl(): don't wait, detach completely

So I have a simple fork and exec program. It works pretty well, but I want to be able to detach the process that is started. I tried a fork with no wait:
if ((pid = fork()) < 0)
    perror("Error with fork()");
else if (pid > 0) {
    return "";
}
else {
    execl("/bin/bash", "/bin/bash", "-c", cmddo, (char*) 0);
    perror("execl()"); // only reached if execl() failed
    _exit(127);
}
It starts the process fine, but when my main app is closed, so is my forked process.
How do I keep the forked process running after the main proc (that started it) closes?
Thanks :D
Various things to do if you want to start a detached/daemon process:
fork again and exit the first child (so the second child process no longer has the original process as its parent pid)
call setsid(2) to get a new session and process group
reopen stdin/stdout/stderr so they no longer refer to the controlling tty, if there was one. Or, for example, you might have inherited a stdout pipe that will be broken and give you SIGPIPE if you try to write to it.
chdir to / to get away from the ancestor's current directory
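Those steps can be combined into one helper. This is a sketch under stated assumptions: the command is run through /bin/sh, spawn_detached is a hypothetical name, and setsid() is called in the first child so that the grandchild, which is not a session leader, cannot reacquire a controlling terminal.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Start `cmd` fully detached. Returns 0 if the intermediate child reported success.
int spawn_detached(const char* cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (setsid() < 0)                  // new session and process group
            _exit(1);
        pid_t grandchild = fork();
        if (grandchild != 0)
            _exit(grandchild < 0 ? 1 : 0); // first child exits; grandchild is orphaned
        if (chdir("/") < 0)                // don't pin the ancestor's directory
            _exit(1);
        int devnull = open("/dev/null", O_RDWR);
        if (devnull >= 0) {                // drop inherited tty/pipe descriptors
            dup2(devnull, STDIN_FILENO);
            dup2(devnull, STDOUT_FILENO);
            dup2(devnull, STDERR_FILENO);
            if (devnull > STDERR_FILENO)
                close(devnull);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char*) nullptr);
        _exit(127);
    }
    int status = 0;                        // reap the short-lived first child
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because of the chdir("/"), any paths in the command should be absolute.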
Probably all you really want is to ignore SIGHUP in your fork()ed process as this is normally the one which brings the program down. That is, what you need to do is
signal(SIGHUP, SIG_IGN);
Using nohup arranges for the output to go somewhere that stays available (nohup.out, when it would otherwise go to a terminal), which avoids possibly writing to a closed pipe or terminal. To avoid this without nohup, you could either arrange for the standard outputs not to be available or also ignore SIGPIPE. There are a number of signals that terminate your program when not ignored (see man signal; some signals can't be ignored), but the one that will be sent to the child here is SIGHUP.
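A minimal sketch of the SIGHUP approach (spawn_ignoring_sighup is a hypothetical name, and the command is run through /bin/sh). It relies on the fact that an ignored signal disposition is left unchanged by execve(), so the program started by execl() keeps ignoring SIGHUP, which is also how nohup works.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork and exec `cmd` with SIGHUP ignored; returns the child's PID, or -1.
pid_t spawn_ignoring_sighup(const char* cmd)
{
    pid_t pid = fork();
    if (pid == 0) {
        signal(SIGHUP, SIG_IGN);          // survives the exec below
        execl("/bin/sh", "sh", "-c", cmd, (char*) nullptr);
        _exit(127);                       // exec failed
    }
    return pid;
}
```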

Child process receives parent's SIGINT

I have one simple program that's using Qt Framework.
It uses QProcess to execute RAR and compress some files. In my program I am catching SIGINT and doing something in my code when it occurs:
signal(SIGINT, &unix_handler);
When SIGINT occurs, I check whether the RAR process is done, and if it isn't, I wait for it. The problem is that (I think) the RAR process also gets the SIGINT that was meant for my program, and it quits before it has compressed all the files.
Is there a way to run RAR process so that it doesn't receive SIGINT when my program receives it?
Thanks
If you are generating the SIGINT with Ctrl+C on a Unix system, then the signal is being sent to the entire process group.
You need to use setpgid or setsid to put the child process into a different process group so that it will not receive the signals generated by the controlling terminal.
[Edit:]
Be sure to read the RATIONALE section of the setpgid page carefully. It is a little tricky to plug all of the potential race conditions here.
To guarantee 100% that no SIGINT will be delivered to your child process, you need to do something like this:
#define CHECK(x) if (!(x)) { perror(#x " failed"); abort(); /* or whatever */ }
/* Block SIGINT. */
sigset_t mask, omask;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
CHECK(sigprocmask(SIG_BLOCK, &mask, &omask) == 0);
/* Spawn child. */
pid_t child_pid = fork();
CHECK(child_pid >= 0);
if (child_pid == 0) {
    /* Child */
    CHECK(setpgid(0, 0) == 0);
    execl(...);
    abort();
}
/* Parent */
if (setpgid(child_pid, child_pid) < 0 && errno != EACCES)
    abort(); /* or whatever */
/* Unblock SIGINT */
CHECK(sigprocmask(SIG_SETMASK, &omask, NULL) == 0);
Strictly speaking, every one of these steps is necessary. You have to block the signal in case the user hits Ctrl+C right after the call to fork. You have to call setpgid in the child in case the execl happens before the parent has time to do anything. You have to call setpgid in the parent in case the parent runs and someone hits Ctrl+C before the child has time to do anything.
The sequence above is clumsy, but it does handle 100% of the race conditions.
What are you doing in your handler? There are only certain Qt functions that you can safely call from a Unix signal handler. This page in the documentation identifies which ones they are.
The main problem is that the handler will execute outside of the main Qt event thread. That page also proposes a method to deal with this. I prefer getting the handler to "post" a custom event to the application and handle it that way. I posted an answer describing how to implement custom events here.
Just make the subprocess ignore SIGINT:
child_pid = fork();
if (child_pid == 0) {
    /* child process */
    signal(SIGINT, SIG_IGN);
    execl(...);
    _exit(127); /* only reached if execl() failed */
}
man sigaction:
During an execve(2), the dispositions of handled signals are reset to the default;
the dispositions of ignored signals are left unchanged.