I was going through W. Richard Stevens' "Advanced Programming in the UNIX Environment" and I found this topic.
8.13. system Function
Because system is implemented by calling fork, exec, and waitpid, there are three types of return values.
1. If either the fork fails or waitpid returns an error other than EINTR, system returns –1 with errno set to indicate the error.
2. If the exec fails, implying that the shell can't be executed, the return value is as if the shell had executed exit(127).
3. Otherwise, all three functions—fork, exec, and waitpid—succeed, and the return value from system is the termination status of the shell, in the format specified for waitpid.
As I understand it, we fork() a process for cmdstring, and exec() makes it separate from the parent process.
But I can't figure out how waitpid() comes to be part of the system() call.
The linked question, "ambiguous constructor call while object creation", didn't give me the right answer.
After you fork() off, your original process continues immediately, i.e. fork() returns at once. At that point, the new process is still running. Since system() is supposed to be synchronous, i.e. must only return after the executed program finishes, the original program now needs to call waitpid() on the PID of the new process to wait for its termination.
In a picture:
   [main process]
         .
         .
         .
       fork()        [new process]
         A
        / \
       |   \
       |    \___  exec()
   waitpid()         .
       z             .
       z             .  (running)
       z             .
       z           Done!
       z             |
       +----<----<---+
       |
       V
   (continue)
The system() call would, in a Unix environment, look something like this:
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int system(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
    {
        return -1; // Failed to fork!
    }
    if (pid == 0) // We are in the child process.
    {
        // Ok, so it's more complicated than this, but the essence is that
        // the command string is handed to a shell.
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127); // exec failed, return 127. [exec doesn't return unless it failed!]
    }
    // We are in the parent: block until the child terminates.
    int status;
    if (waitpid(pid, &status, 0) > 0)
    {
        return status;
    }
    return -1;
}
Please do note that this is SYMBOLICALLY what system does - it's a fair bit more complicated, because waitpid can give other values, and all sorts of other things that need checking.
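One of those "other things that need checking" is EINTR: a blocking waitpid() can return -1 with errno set to EINTR if a signal handler runs while it is waiting, and in that case the wait simply has to be retried. A minimal sketch of that retry loop follows; wait_for_child is a hypothetical helper name of mine, not part of any library.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for one specific child, retrying when the
   call is interrupted by a signal. Returns 0 and fills *status on
   success, -1 on any other error (for example ECHILD). */
int wait_for_child(pid_t pid, int *status)
{
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;   /* a real error, give up */
        /* EINTR: interrupted by a signal handler, just try again */
    }
    return 0;
}
```

After a successful call the status can be examined with WIFEXITED() and friends. Calling it a second time for the same PID fails with ECHILD, because the child has already been reaped.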
From the man pages:
system() executes a command specified in command by calling /bin/sh -c command, and returns after the command has been completed. During execution of the command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored.
system() presumably uses waitpid() to wait until the shell command finishes.
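To tie the three kinds of return value from the book excerpt to actual code, here is a small sketch that calls the real system() and decodes what comes back. run_and_decode is a hypothetical name, and reporting a signal death as 128 + signal number is only a convention borrowed from the shell, assumed here for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through system() and reduce the result
   to one number: the command's exit code, 128 + N if the shell was
   killed by signal N, or -1 if system() itself failed (case 1). */
int run_and_decode(const char *cmd)
{
    int status = system(cmd);
    if (status == -1)
        return -1;                      /* fork or waitpid failed */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* includes 127 when exec failed */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* assumption: shell-style encoding */
    return -1;
}
```

For example, run_and_decode("exit 3") gives 3, and a command the shell cannot find gives 127, which is the "as if the shell had executed exit(127)" case.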
How do I get the status of another process?
I want to know the execution status of another process.
I want to receive and handle the event the way inotify delivers events,
without scanning /proc periodically.
How can I be notified when another process changes state (running, killed)?
Systems: Linux, Solaris, AIX
Linux
Under Linux (and probably many other Unix systems) you can achieve this by using the ptrace call, then using waitpid to wait for status:
manpages:
ptrace call: http://man7.org/linux/man-pages/man2/ptrace.2.html
waitpid call: https://linux.die.net/man/2/waitpid
From the manpage:
Death under ptrace
When a (possibly multithreaded) process receives a killing signal
(one whose disposition is set to SIG_DFL and whose default action is
to kill the process), all threads exit. Tracees report their death
to their tracer(s). Notification of this event is delivered via
waitpid(2).
Beware that you will need special authorization in certain cases. Take a look at /proc/sys/kernel/yama/ptrace_scope. (If you can modify the target program, you can also change the behavior of ptrace by calling ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);)
To use ptrace, first you must get the target process's PID, then call PTRACE_ATTACH:
// error checking removed for the sake of clarity
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
pid_t child_pid;
// ... Get your child_pid somehow ...
// 1. attach to your process:
long err;
err = ptrace(PTRACE_ATTACH, child_pid, nullptr, nullptr);
// 2. wait for your process to stop:
int process_status;
err = waitpid(child_pid, &process_status, 0);
// 3. restart the process (continue)
ptrace(PTRACE_CONT, child_pid, nullptr, nullptr);
// 4. wait for any change in status:
err = waitpid(child_pid, &process_status, 0);
// while waiting, the process is running...
// by default waitpid blocks until the status changes, but you can
// make it return immediately with WNOHANG in the options.
if (WIFEXITED(process_status)) {
    // exited normally; WEXITSTATUS(process_status) is the exit code
}
if (WIFSIGNALED(process_status)) {
    // process was terminated by a signal
    // WTERMSIG(process_status) will get you the signal that was sent.
}
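A traced process does not only report its death: waitpid() also returns when the tracee merely stops, so it helps to classify the status before acting on it. The sketch below is only an illustration of the standard status macros; classify_status, print_status and the wait_kind enum are hypothetical names introduced here.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum wait_kind { WAIT_EXITED, WAIT_KILLED, WAIT_STOPPED, WAIT_OTHER };

/* Hypothetical helper: reduce a waitpid() status to one of four cases. */
enum wait_kind classify_status(int status)
{
    if (WIFEXITED(status))   return WAIT_EXITED;   /* called exit() or returned from main */
    if (WIFSIGNALED(status)) return WAIT_KILLED;   /* terminated by a signal */
    if (WIFSTOPPED(status))  return WAIT_STOPPED;  /* stopped, still alive */
    return WAIT_OTHER;
}

/* Hypothetical helper: print a one-line description of the status. */
void print_status(pid_t pid, int status)
{
    switch (classify_status(status)) {
    case WAIT_EXITED:
        printf("%ld exited with code %d\n", (long)pid, WEXITSTATUS(status));
        break;
    case WAIT_KILLED:
        printf("%ld killed by signal %d\n", (long)pid, WTERMSIG(status));
        break;
    case WAIT_STOPPED:
        printf("%ld stopped by signal %d\n", (long)pid, WSTOPSIG(status));
        break;
    default:
        printf("%ld: unrecognized status\n", (long)pid);
        break;
    }
}
```

Only the first two cases mean the process is gone. In the stopped case the tracer would normally resume the tracee with PTRACE_CONT and wait again.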
AIX:
The solution will need some adaptation to work with AIX, have a look at the doc there:
ptrace documentation: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.basetrf1/ptrace.htm
waitpid documentation: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.basetrf1/ptrace.htm
Solaris
As mentioned here, ptrace may not be available on your version of Solaris; you may have to resort to procfs there.
I have written a program where I create a thread in main and use system() to start another process from the thread. I also start the same process using system() in the main function. The process started from the thread seems to stay alive even when the parent process dies, but the one called from the main function dies with the parent. Any ideas why this is happening?
Please find the code structure below:
void *thread_func(void *arg)
{
    system(command.c_str());
    return NULL; // a void* function must return a value
}
int main()
{
    pthread_create(&thread_id, NULL, thread_func, NULL);
    ....
    system(command.c_str());
    while (true)
    {
        ....
    }
    pthread_join(thread_id, NULL);
    return 0;
}
My suggestion is: don't do what you're doing. If you want to create an independently running child process, research the fork and exec family of functions, which is what system uses "under the hood".
Threads aren't really independent the same way processes are. When your "main" process ends, all threads end as well. In your specific case the thread seems to continue to run while the main process seems to end, but that is because of the pthread_join call: the main thread simply waits there for the thread to exit. If you remove the join call the thread (and your "command") will be terminated.
There are ways to detach threads so they can run a little more independently (for example you don't have to join a detached thread) but the main process still can't end, instead you have to end the main thread, which will keep the process running for as long as there are detached threads running.
Using fork and exec is actually quite simple:
#include <cerrno>
#include <cstring>
#include <iostream>
#include <unistd.h>
pid_t pid = fork();
if (pid == 0)
{
    // We are in the child process, execute the command
    execl(command.c_str(), command.c_str(), nullptr);
    // If execl returns, there was an error
    // (std::cerr is unbuffered, so the message is not lost by _exit)
    std::cerr << "Exec error: " << errno << ", " << std::strerror(errno) << '\n';
    // Exit the child process without running the parent's atexit handlers
    _exit(1);
}
else if (pid > 0)
{
    // The parent process, do whatever is needed
    // The parent process can even exit while the child process is running, since it's independent
}
else
{
    // Error forking, still in the parent process (there is no child process at this point)
    std::cerr << "Fork error: " << errno << ", " << std::strerror(errno) << '\n';
}
The exact variant of exec to use depends on command. If it's a valid path (absolute or relative) to an executable program then execl works well. If it's a "command" in the PATH then use execlp.
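To make the execl/execlp distinction concrete, here is a minimal sketch that starts a program by name, letting execlp() search PATH. run_in_path is a hypothetical helper, and it waits for the child only so that the result can be observed; a parent that wants an independent child would skip the waitpid() and arrange for reaping some other way.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `file` (looked up in PATH, no arguments)
   and return its exit code, 127 if it could not be executed, or -1
   on fork/wait failure or abnormal termination. */
int run_in_path(const char *file)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* execlp searches PATH; execl would need a full or relative path */
        execlp(file, file, (char *)NULL);
        _exit(127);   /* only reached if the exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this, run_in_path("true") finds the program via PATH, whereas execl("true", ...) would fail unless a file named true existed in the current directory.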
There are two points here that I think you've missed:
First, system is a synchronous call. That means, your program (or, at least, the thread calling system) waits for the child to complete. So, if your command is long-running, both your main thread and your worker thread will be blocked until it completes.
Secondly, you are "joining" the worker thread at the end of main. This is the right thing to do, because unless you join or detach the thread you have undefined behaviour. However, it's not what you really intended to do. The end result is not that the child process continues after your main process ends... your main process is still alive! It is blocked on the pthread_join call, which is trying to wrap up the worker thread, which is still running command.
In general, assuming you wish to spawn a new process entirely unrelated to your main process, threads are not the way to do it. Even if you were to detach your thread, it still belongs to your process, and you are still required to let it finish before your process terminates. You can't detach from the process using threads.
Instead, you'll need OS features such as fork and exec (or a friendly C++ wrapper around this functionality, such as Boost.Process). This is the only way to truly spawn a new process from within your program.
But, you can cheat! If command is a shell command, and your shell supports background jobs, you could put & at the end of the command (this is an example for Bash syntax) to make the system call do the following:
1. Ask the shell to spin off a new process
2. Wait for it to do that
3. The new process will now continue to run in the background
For example:
const std::string command = "./myLongProgram &";
//                                           ^
However, again, this is kind of a hack and proper fork mechanisms that reside within your program's logic should be preferred for maximum portability and predictability.
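For completeness, the hack can be wrapped in a small function. This is only a sketch: run_in_background is a hypothetical name, the 1024-byte buffer is an arbitrary limit chosen for the example, and it assumes the command string does not already end in & or ;.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: append " &" to cmd and hand it to system().
   Returns what system() returns (the status of the shell, which exits
   as soon as the job has been started), or -1 if cmd is too long. */
int run_in_background(const char *cmd)
{
    char buf[1024];   /* arbitrary limit for this sketch */
    int n = snprintf(buf, sizeof buf, "%s &", cmd);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;    /* would have been truncated */
    return system(buf);
}
```

The return value only tells you that the shell managed to start the job; the exit status of the background command itself is lost, which is one more reason to prefer fork and exec.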
I am creating a pipe using popen() and the process is invoking a third party tool which in some rare cases I need to terminate.
::popen(thirdPartyCommand.c_str(), "w");
If I just throw an exception and unwind the stack, my unwind attempts to call pclose() on the third party process whose results I no longer need. However, pclose() never returns as it blocks with the following stack trace on CentOS 4:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00807dc3 in __waitpid_nocancel () from /lib/libc.so.6
#2 0x007d0abe in _IO_proc_close@@GLIBC_2.1 () from /lib/libc.so.6
#3 0x007daf38 in _IO_new_file_close_it () from /lib/libc.so.6
#4 0x007cec6e in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#5 0x007d6cfd in pclose@@GLIBC_2.1 () from /lib/libc.so.6
Is there any way to force the call to pclose() to be successful before calling it so I can programmatically avoid this situation of my process getting hung up waiting for pclose() to succeed when it never will because I've stopped supplying input to the popen()ed process and wish to throw away its work?
Should I write an end of file somehow to the popen()ed file descriptor before trying to close it?
Note that the third party software is forking itself. At the point where pclose() has hung, there are four processes, one of which is defunct:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
abc 6870 0.0 0.0 8696 972 ? S 04:39 0:00 sh -c /usr/local/bin/third_party /home/arg1 /home/arg2 2>&1
abc 6871 0.0 0.0 10172 4296 ? S 04:39 0:00 /usr/local/bin/third_party /home/arg1 /home/arg2
abc 6874 99.8 0.0 10180 1604 ? R 04:39 141:44 /usr/local/bin/third_party /home/arg1 /home/arg2
abc 6875 0.0 0.0 0 0 ? Z 04:39 0:00 [third_party] <defunct>
I see two solutions here:
The neat one: you fork(), pipe() and execve() (or anything in the exec family of course...) "manually", then it is going to be up to you to decide if you want to let your children become zombies or not. (i.e. to wait() for them or not)
The ugly one: if you're sure you only have one of this child process running at any given time, you could use sysctl() to check if there is any process running with this name before you call pclose()... yuk.
I strongly advise the neat way here, or you could just ask whoever is responsible to fix that infinite loop in your third party tool haha.
Good luck!
EDIT:
For your first question: I don't know. Doing some research on how to find processes by name using sysctl() should tell you what you need to know; I myself have never pushed it this far.
For your second and third question: popen() is basically a wrapper to fork() + pipe() + dup2() + execl().
fork() duplicates the process, execl() replaces the duplicated process' image with a new one, pipe() handles inter process communication and dup2() is used to redirect the output... And then pclose() will wait() for the duplicated process to die, which is why we're here.
If you want to know more, you should check this answer where I've recently explained how to perform a simple fork with standard IPC. In this case, it's just a bit more complicated as you have to use dup2() to redirect the standard output to your pipe.
You should also take a look at popen()/pclose() source codes, as they are of course open source.
Finally, here's a brief example, I cannot make it clearer than that:
#include <unistd.h>
char buf;
int pipefd[2];
pipe(pipefd);
if (fork() == 0) // I'm the child
{
    close(pipefd[0]); // I'm not going to read from this pipe
    dup2(pipefd[1], 1); // redirect standard output to the pipe
    close(pipefd[1]); // it has been duplicated, close it as we don't need it anymore
    execlp("ls", "ls", (char *)0); // execute the program you want ("ls" is only a placeholder)
    _exit(127); // only reached if the exec failed
}
else // I'm the parent
{
    close(pipefd[1]); // I'm not going to write to this pipe
    while (read(pipefd[0], &buf, 1) > 0) // read until EOF
        write(1, &buf, 1);
    close(pipefd[0]); // cleaning
}
And as always, remember to read the man pages and to check all your return values.
Again, good luck!
Another solution is to kill all your children. If you know that the only child processes you have are processes that get started when you do popen(), then it's easy enough. Otherwise you may need some more work or use the fork() + execve() combo, in which case you will know the first child's PID.
Whenever you run a child process, its PPID (parent process ID) is your own PID. It is easy enough to read the list of currently running processes and gather those that have their PPID = getpid(). Repeat the loop looking for processes that have their PPID equal to one of your children's PIDs. In the end you build a whole tree of child processes.
Since your child processes may end up creating other child processes, to make it safe, you will want to block those processes by sending a SIGSTOP. That way they will stop creating new children. As far as I know, you can't prevent the SIGSTOP from doing its deed.
The process is therefore:
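#include <algorithm>
#include <signal.h>
#include <unistd.h>
#include <vector>
void kill_all_children()
{
    std::vector<pid_t> me_and_children;
    me_and_children.push_back(getpid());
    bool found_child = false;
    do
    {
        found_child = false;
        std::vector<process> processes(get_processes());
        for(auto p : processes)
        {
            // i.e. if this process is the child of any one of those processes...
            bool const is_descendant = std::find(me_and_children.begin(),
                                                 me_and_children.end(),
                                                 p.ppid()) != me_and_children.end();
            // ...and we have not recorded it yet (otherwise the loop never ends)
            bool const is_known = std::find(me_and_children.begin(),
                                            me_and_children.end(),
                                            p.pid()) != me_and_children.end();
            if(is_descendant && !is_known)
            {
                kill(p.pid(), SIGSTOP);
                me_and_children.push_back(p.pid());
                found_child = true;
            }
        }
    }
    while(found_child);
    for(auto c : me_and_children)
    {
        // ignore ourselves
        if(c == getpid())
        {
            continue;
        }
        kill(c, SIGTERM);
        kill(c, SIGCONT); // make sure it continues now
    }
}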
This is probably not the best way to close your pipe, though, since you probably need to give the command time to handle your data. So what you want is to execute that code only after a timeout. Your regular code could look something like this:
void handle_alarm(int)
{
    kill_all_children();
}
void send_data(...)
{
    signal(SIGALRM, handle_alarm);
    FILE * f = popen("command", "w");
    // do some work...
    alarm(60); // give it a minute
    pclose(f);
    alarm(0); // remove alarm
}
About the alarm(60): its location is up to you. It could also be placed before the popen() if you're afraid that the popen() or the work after it could also hang (e.g. I've had problems where the pipe fills up and I never even reach the pclose() because the child process loops forever).
Note that the alarm() may not be the best idea in the world. You may prefer using a thread with a sleep made of a poll() or select() on an fd which you can wake up as required. That way the thread would call the kill_all_children() function after the sleep, but you can send it a message to wake it up early and let it know that the pclose() happened as expected.
Note: I left the implementation of get_processes() out of this answer. You can read that information from /proc or with the libprocps library. I have such an implementation in my snapwebsites project. It's called process_list. You could just reuse that class.
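A different technique, which I'm swapping in here rather than something taken from the answer above, avoids walking the process tree altogether: if you fork() the command yourself, you can put it in its own process group and later signal the whole group with a negative PID. The sketch below also replaces alarm() with a simple polling timeout. spawn_in_group and wait_or_kill_group are hypothetical names, and the approach assumes the descendants do not move themselves into another process group or session.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: start cmd via the shell in a new process group.
   Returns the child's PID, which is also the group ID, or -1. */
pid_t spawn_in_group(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);   /* become leader of a new process group */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    setpgid(pid, pid);   /* set it in the parent too, to avoid a race */
    return pid;
}

/* Hypothetical helper: poll for up to timeout_sec seconds; if the child
   is still running, SIGKILL the whole group and reap the child.
   Returns the wait status, or -1 on error. */
int wait_or_kill_group(pid_t pid, int timeout_sec)
{
    int status;
    struct timespec tick = {0, 100 * 1000 * 1000};   /* 100 ms */
    for (int i = 0; i < timeout_sec * 10; i++) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;   /* finished on its own */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    kill(-pid, SIGKILL);     /* negative PID: signal every group member */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Because every process forked by the command inherits the group, one kill(-pid, ...) reaches the grandchildren as well, without SIGSTOP tricks or reading /proc.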
I'm using popen() to invoke a child process which doesn't need any stdin or stdout, it just runs for a short time to do its work, then it stops all by itself. Arguably, invoking this type of child process should rather be done with system() ? Anyway, pclose() is used afterwards to verify that the child process exited cleanly.
Under certain conditions, this child process keeps on running indefinitely. pclose() blocks forever, so then my parent process is also stuck. CPU usage runs to 100%, other executables get starved, and my whole embedded system crumbles. I came here looking for solutions.
Solution 1 by @cmc : decomposing popen() into fork(), pipe(), dup2() and execl().
It might just be a matter of personal taste, but I'm reluctant to rewrite perfectly fine system calls myself. I would just end up introducing new bugs.
Solution 2 by @cmc : verifying that the child process actually exists with sysctl(), to make sure that pclose() will return successfully. I find that this somehow sidesteps the problem from the OP @WilliamKF - there is definitely a child process, it just has become unresponsive. Forgoing the pclose() call won't solve that. [As an aside, in the 7 years since @cmc wrote this answer, sysctl() seems to have become deprecated.]
Solution 3 by @Alexis Wilke : killing the child process. I like this approach best. It basically automates what I did when I stepped in manually to resuscitate my dying embedded system. The problem with my stubborn adherence to popen() is that I get no PID from the child process. I have been trying in vain with
waitid(P_PGID, getpgrp(), &child_info, WNOHANG);
but all I get on my Debian Linux 4.19 system is EINVAL.
So here's what I cobbled together. I'm searching for the child process by name; I can afford to take a few shortcuts, as I'm sure there will only be one process with this name. Ironically, commandline utility ps is invoked by yet another popen(). This won't win any elegance prizes, but at least my embedded system stays afloat now.
FILE* child = popen("child", "r");
if (child)
{
    int nr_loops;
    int child_pid;
    for (nr_loops=10; nr_loops; nr_loops--)
    {
        FILE* ps = popen("ps | grep child | grep -v grep | grep -v \"sh -c \" | sed \'s/^ *//\' | sed \'s/ .*$//\'", "r");
        child_pid = 0;
        int found = fscanf(ps, "%d", &child_pid);
        pclose(ps);
        if (found != 1)
            // The child process is no longer running, no risk of blocking pclose()
            break;
        syslog(LOG_WARNING, "child running PID %d", child_pid);
        usleep(1000000); // 1 second
    }
    if (!nr_loops)
    {
        // Time to kill this runaway child
        syslog(LOG_ERR, "killing PID %d", child_pid);
        kill(child_pid, SIGTERM);
    }
    pclose(child); // Even after it had to be killed
} /* if (child) */
I learned the hard way that I have to pair every popen() with a pclose(), otherwise the zombie processes pile up. I find it remarkable that this is needed after a direct kill; I figure that's because, according to the manpage, popen() actually launches sh -c with the child process in it, and it's this surrounding sh that becomes a zombie.
Very strange bug, perhaps someone will see something I'm missing.
I have a C++ program which forks off a bash shell, and then passes commands to it.
Periodically, the commands will contain nonsense and the bash process will hang. I detect this using sem_timedwait, and then run a little function like this:
if (kill(*bash_pid, SIGKILL)) {
    cerr << "Error sending SIGKILL to the bash process!" << endl;
    exit(1);
} else {
    // collect exit status
    long counter = 0;
    do {
        pid = waitpid(*bash_pid, &status, WNOHANG);
        if (pid == 0) { // status not available yet
            sleep(1);
        }
        if(counter++ > 5){
            cerr << "ERROR: Bash child process ignored SIGKILL >5 sec!" << endl;
        }
    } while (pid != *bash_pid && pid != -1);
    if(pid == -1){
        cerr << "Failed to clean up zombie bash process!" << endl;
        exit(1);
    }
    // re-initialized bash process
    *bash_pid = init_bash();
}
Assuming I understand the workings of waitpid correctly, this should first send SIGKILL to the shell, and then essentially sit in a spinlock, trying to reap the resulting process. Eventually, it succeeds and then a new bash process is started with init_bash().
At least, that's what should happen. Instead, the child process's exit status is never collected, and it continues to exist as a zombie process. In spite of this, the parent does exit the loop and manages to restart the bash process, and continues with normal execution. Eventually too many zombies are generated and the system runs out of pids.
Additionally:
Fork is called in exactly one place in the program, inside init_bash.
Checks prevent init_bash from being called except once at the program's start and after a call to the function above.
Thoughts?
Articles that I read indicate that a zombie process results when a child process exits but the parent never collects the child's exit status.
This article provides several ways to kill a zombie process from the command line. One technique is to use other signals besides SIGKILL for instance SIGTERM.
This article has an answer which suggests SIGKILL should not be used.
One of the techniques is to kill the parent thereby also killing its child processes including any zombies. The author indicates that there appear to be child processes that just remain as zombies until the OS is restarted.
You do not mention the mechanism used to communicate the commands to the child process. However one option may be to turn the child process loose by disconnecting it from its parent similar to the way a child of a terminal process can be disconnected from the terminal session. That way the child will become its own process and if there is a problem may exit without becoming a zombie.
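For reference, the usual pattern for getting rid of a direct child is to send the signal and then block in waitpid() instead of polling with WNOHANG. This is only a sketch of that pattern, not a diagnosis of the bug in the question; kill_and_reap is a hypothetical name and it assumes pid really is a direct child of the caller.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: SIGKILL a direct child and reap it.
   Returns 0 and fills *status on success, -1 on error. */
int kill_and_reap(pid_t pid, int *status)
{
    if (kill(pid, SIGKILL) < 0)
        return -1;
    /* SIGKILL cannot be caught or ignored, so a blocking wait is
       expected to return promptly; retry only if interrupted. */
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;   /* e.g. ECHILD: not our child, or already reaped */
    }
    return 0;
}
```

If this returns -1 with errno set to ECHILD, the PID being waited on is not a child of the calling process, which is worth ruling out when zombies keep accumulating.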
I'm looking to fork a process, in C++, that won't hang its parent process; the parent is a daemon and must remain running. If I wait() on the forked process, the forked execl won't go defunct, but it will also hang the app. Not waiting fixes the app hang, but the command becomes defunct.
if((pid = fork()) < 0)
    perror("Error with Fork()");
else if(pid > 0) {
    // waiting here hangs the parent until the execl'd command finishes
    // not waiting leaves the execl'd command defunct
    //---- wait(&pid);
    return "";
} else {
    struct rlimit rl;
    int i;
    // Assumed to be missing from the snippet: rl must be filled in before it is read
    getrlimit(RLIMIT_NOFILE, &rl);
    if (rl.rlim_max == RLIM_INFINITY)
        rl.rlim_max = 1024;
    for (i = 0; (unsigned) i < rl.rlim_max; i++)
        close(i);
    if(execl("/bin/bash", "/bin/bash", "-c", "whoami", (char*) 0) < 0) perror("execl()");
    exit(0);
}
How can I fork the execl without a wait(&pid), so that execl's command won't become defunct?
UPDATE
Fixed by adding the following before the fork
signal(SIGCHLD, SIG_IGN);
Still working with my limited skills at a more compatible solution based on the accepted answer. Thanks!
By default, wait and friends wait until a process has exited, then reap it. You can call waitpid with the WNOHANG option to return immediately if no child has exited.
The defunct/"zombie" process will sit around until you wait on it. So if you run it in the background, you must arrange to reap it eventually in one of several ways:
try waitpid with WNOHANG routinely: int pid = waitpid(-1, &status, WNOHANG)
install a signal handler for SIGCHLD to be notified when it exits
Additionally, under POSIX.1-2001, you can use sigaction to set the SA_NOCLDWAIT flag on SIGCHLD, or set its action to SIG_IGN. Older systems (including Linux 2.4.x, but not 2.6.x or 3.x) don't support this.
Check your system manpages, or alternatively the wait page in the Single Unix Specification. The Single Unix Spec also gives some helpful code examples. SA_NOCLDWAIT is documented in sigaction.
I think a signal handler would be the best way as indicated. I would like to point out another way this could be handled: Fork twice and have the child exit while the grandchild would call execl. The defunct process would then be cleaned up by the init process.
As said in comment, double fork saves process from defunct state.
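A minimal sketch of the double fork, assuming the command is run through /bin/sh; spawn_detached is a hypothetical name. The parent waits only for the short-lived intermediate child, so it is never blocked by the command itself.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmd in a grandchild that is not our child,
   so we never have to wait for it. Returns 0 if the grandchild was
   forked, -1 otherwise. */
int spawn_detached(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild > 0)
            _exit(0);   /* intermediate child exits at once */
        /* Grandchild: orphaned as soon as its parent exits, so it is
           adopted and later reaped by init (or a subreaper). */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    /* Reap the intermediate child; this returns almost immediately. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After spawn_detached() returns, the calling process has no children left to wait for, so nothing of its own can become defunct. The trade-off is that the exit status of the command is no longer available to the caller.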
What is the reason for performing a double fork when creating a daemon?