Boost.Process wait_for_exit(child): crash? - c++

I am using version 0.5 of Boost.Process. Documentation can be found here. I am using Mac OS X Yosemite.
My problem: I am launching a compilation as a child process. I want to wait for the process to finish.
When my child process compiles correctly, everything is ok.
But when my child process does not compile, my code seems to crash when calling boost::process::wait_for_exit.
My user code looks like this:
EDIT: Code has been edited to match latest, more correct version (still does not work).
s::error_code ec{};
bp::child child = bp::execute(bpi::set_args(compilationCommand),
                              bpi::bind_stderr(outErrLog_),
                              bpi::bind_stdout(outErrLog_),
                              bpi::inherit_env(),
                              bpi::set_on_error(ec));
bool compilationSuccessful = true;
if (!ec) {
    s::error_code ec2;
    bp::wait_for_exit(child, ec2);
    if (ec2)
        compilationSuccessful = false;
}
The internal implementation of bp::wait_for_exit:
template <class Process>
inline int wait_for_exit(const Process &p, boost::system::error_code &ec)
{
    pid_t ret;
    int status;

    do
    {
        ret = ::waitpid(p.pid, &status, 0);
    } while ((ret == -1 && errno == EINTR) || (ret != -1 && !WIFEXITED(status)));

    if (ret == -1) {
        BOOST_PROCESS_RETURN_LAST_SYSTEM_ERROR("waitpid(2) failed");
    }
    else
        ec.clear();

    return status;
}
The code after ::waitpid is never reached when my compilation command fails. The error shown is: "child has exited; pid: xxxx; uid: yyy; exit value: 1".
Questions:
Is this a bug, or am I misusing boost::process::wait_for_exit?
Is there a portable workaround to avoid the crash I am getting?

Just looking at your code, the first thing that strikes me is that you don't actually test the "ec" variable that says whether execute() succeeded or not until after you call wait_for_exit(). If you're calling wait_for_exit() with an invalid child process, it's quite understandable that it would crash.
Start by checking "ec" before calling wait_for_exit().
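A minimal sketch of that order, reusing the aliases from the question (and, since the wait_for_exit implementation shown above returns the raw waitpid status on POSIX, decoding it with WIFEXITED/WEXITSTATUS):

boost::system::error_code ec;
bp::child child = bp::execute(bpi::set_args(compilationCommand),
                              bpi::inherit_env(),
                              bpi::set_on_error(ec));

bool compilationSuccessful = false;
if (!ec) {   // only wait if the launch itself succeeded
    boost::system::error_code ec2;
    int status = bp::wait_for_exit(child, ec2);   // raw status from waitpid(2)
    compilationSuccessful = !ec2 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}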

So the problem was that Boost.Test modifies the signal stack in some way.
This signal-stack modification interacts with Boost.Process, and the code cannot be reliably tested, at least in the default Boost.Test configuration.
I rewrote the tests with a normal main and some functions and it did the job.

Related

How to use signals in C++ and How they react?

I am trying to learn about interactions with signals, and I discovered an interaction I can't understand.
Here's an abstract of the program: I'm instructed to run execvp in the grandchild, while the child needs to wait for the grandchild to finish. It runs correctly without any signal interactions.
#include <cstdio>
#include <cstdlib>
#include <csignal>
#include <iostream>
#include <unistd.h>
#include <sys/wait.h>

void say_Hi(int num) { printf("Finished\n"); }

int main() {
    int i = 2;
    char *command1[] = {"sleep", "5", NULL};
    char *command2[] = {"sleep", "10", NULL};
    signal(SIGCHLD, SIG_IGN);
    signal(SIGUSR1, say_Hi);
    while (i > 0) {
        pid_t pid = fork();
        if (pid == 0) {
            pid_t pidChild = fork();
            if (pidChild == 0) {
                if (i == 2) {
                    execvp(command1[0], command1);
                } else {
                    execvp(command2[0], command2);
                }
            } else if (pidChild > 0) {
                waitpid(pidChild, 0, 0);
                // kill(pid, SIGUSR1);
                printf("pid finished: %d\n", pidChild);
                exit(EXIT_FAILURE);
            }
            exit(EXIT_FAILURE);
        } else {
            // parent immediately goes to next loop
            i--;
        }
    }
    std::cin >> i; // just for me to pause and observe the answers above
    return 0;
}
As shown above, when kill(pid, SIGUSR1); is commented out, the program runs correctly.
Output:
pid finished: 638532 //after 5 sec
pid finished: 638533 //after 10 sec
However, when it is uncommented, the output becomes:
Finished
pid finished: 638610 //after 5 sec
Finished
Finished
Finished
Finished
pid finished: 638611 //after 5 sec too, why?
Finished
I would like to ask:
The whole program finishes at once after 5 seconds, and a total of 6 "Finished" lines are printed. Why is that?
Is there a way for me to modify it so that the say_Hi function runs only twice in total, at the correct time intervals?
Please forgive me if my code looks stupid or bulky; I'm a newbie in programming. Any comments about my code and help are appreciated!
void say_Hi(int num) { printf("Finished\n"); }
printf cannot be called in a signal handler. With very few exceptions, none of the C or C++ library functions can be called in a signal handler, because they are not signal-safe. You can't even allocate or free memory from a signal handler (using either the C or the C++ library), except via low-level OS calls like brk() or sbrk(). Only functions explicitly designated as "signal-safe" may be called from a signal handler, and almost nothing in the C or C++ libraries qualifies. The End.
The only things that can be called from a signal handler are low-level operating system calls, like read() and write(), that operate directly on file descriptors. They are, by definition, signal-safe.
For this simple reason, the shown code, when it comes to signals, is undefined behavior. Trying to analyze or figure out your program's behavior in that respect, such as why you do or do not see this message, is pointless: undefined behavior cannot be logically analyzed.
Answer:
Use kill(getpid(), SIGUSR1); instead. In the commented-out line, pid is 0 inside the child branch (that is the value fork() returned there), so kill(pid, SIGUSR1) is really kill(0, SIGUSR1), which sends the signal to every process in the caller's process group. That is why "Finished" is printed so many times, and why the sleep 10 grandchild also dies after about 5 seconds: execvp reset its SIGUSR1 disposition to the default action, which is to terminate.
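For reference, a minimal sketch of an async-signal-safe version of the handler (assuming POSIX; write(2) is on the async-signal-safe list, printf is not, and the flag name here is just illustrative):

static volatile sig_atomic_t got_usr1 = 0;

void say_Hi(int num)
{
    // Only async-signal-safe work belongs here: set a flag and/or call write(2) directly.
    got_usr1 = 1;
    const char msg[] = "Finished\n";
    (void)write(STDOUT_FILENO, msg, sizeof msg - 1);
}

// and in the child branch, signal only the current process:
//     kill(getpid(), SIGUSR1);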

C++ Timed Process

I'm trying to set up some test software for code that is already written (that I cannot change). The issue I'm having is that it is getting hung up on certain calls, so I want to try to implement something that will kill the process if it does not complete in x seconds.
The two methods I've tried to solve this problem were to use fork or pthread, but neither has worked for me so far. I'm not sure why pthread didn't work; I'm assuming it's because the static call I used to set up the thread had some issues with the memory needed to run the function I was calling (I continually got a segfault while the function I was testing was running). Fork worked initially, but the second time I would fork a process, it wouldn't be able to check whether the child had finished or not.
In terms of semi-pseudo code, this is what I've written
test_runner()
{
    bool result;
    testClass* myTestClass = new testClass();

    pid_t pID = fork();
    if (pID == 0) // Child
    {
        myTestClass->test_function(); // function in question being tested
    }
    else if (pID > 0) // Parent
    {
        int status;
        sleep(5);
        if (waitpid(0, &status, WNOHANG) == 0)
        {
            kill(pID, SIGKILL); // If child hasn't finished, kill process and fail test
            result = false;
        }
        else
            result = true;
    }
}
This method worked for the initial test, but then when I would go to test a second function, the if(waitpid(0,&status,WNOHANG) == 0) would return that the child had finished, even when it had not.
The pthread method looked along these lines
bool result;

test_runner()
{
    long thread = 1;
    pthread_t* thread_handle = (pthread_t*) malloc(sizeof(pthread_t));
    pthread_create(&thread_handle[thread], NULL, &funcTest, (void *)&thread); // Begin class that tests function in question
    sleep(10);
    if (pthread_cancel(thread_handle[thread] == 0))
        // Child process got stuck, deal with accordingly
    else
        // Child process did not get stuck, deal with accordingly
}

static void* funcTest(void*)
{
    result = false;
    testClass* myTestClass = new testClass();
    result = myTestClass->test_function();
}
Obviously there is a little more going on than what I've shown; I just wanted to put the general idea down. I guess what I'm looking for is whether there is a better way to go about handling a problem like this, or whether someone sees any blatant issues with what I'm trying to do (I'm relatively new to C++). Like I mentioned, I'm not allowed to go into the code that I'm setting up the test software for, which prevents me from putting signal handlers in the function I'm testing. I can only call the function and then deal with it from there.
If C++11 is an option, you could use std::future with wait_for for this purpose.
For example:
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main()
{
    std::future<int> future = std::async(std::launch::async, [](){
        std::this_thread::sleep_for(std::chrono::seconds(3));
        return 8;
    });

    std::future_status status = future.wait_for(std::chrono::seconds(5));
    if (status == std::future_status::timeout) {
        std::cout << "Timeout" << std::endl;
    } else {
        std::cout << "Success" << std::endl;
    } // will print Success

    std::future<int> future2 = std::async(std::launch::async, [](){
        std::this_thread::sleep_for(std::chrono::seconds(3));
        return 8;
    });

    std::future_status status2 = future2.wait_for(std::chrono::seconds(1));
    if (status2 == std::future_status::timeout) {
        std::cout << "Timeout" << std::endl;
    } else {
        std::cout << "Success" << std::endl;
    } // will print Timeout
}
Another thing: per the documentation, passing 0 as the first argument to waitpid means
"wait for any child process whose process group ID is equal to that of the calling process"
so pass the specific child's pid instead.
Also, avoid pthread_cancel; it's probably not a good idea.
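If fork is still preferred over std::async, a rough sketch of the same timeout pattern with the specific child pid passed to waitpid (run_test() is a placeholder for the function under test):

pid_t pid = fork();
if (pid == 0) {
    run_test();                               // placeholder for the code being tested
    _exit(0);
}

sleep(5);                                     // crude timeout
int status = 0;
if (waitpid(pid, &status, WNOHANG) == 0) {    // check *this* child, not any child in the group
    kill(pid, SIGKILL);                       // still running: kill it and fail the test
    waitpid(pid, &status, 0);                 // reap the killed child
    // result = false;
} else {
    // result = true;                         // child finished within the timeout
}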

waitpid/wexitstatus returning 0 instead of correct return code

I have the helper function below, used to execute a command and get the return value on POSIX systems. I used to use popen, but it is impossible to get the return code of an application with popen if it runs and exits before popen/pclose gets a chance to do its work.
The following helper function creates a process fork, uses execvp to run the desired external process, and then the parent uses waitpid to get the return code. I'm seeing odd cases where it's refusing to run.
When called with wait = true, waitpid should return the exit code of the application no matter what. However, I'm seeing stdout output that indicates the return code should be non-zero, yet the return code is zero. Testing the external process in a regular shell and then echoing $? returns non-zero, so it's not a problem with the external process not returning the right code. If it's of any help, the external process being run is mount(8) (yes, I know I can use mount(2), but that's beside the point).
I apologize in advance for a code dump. Most of it is debugging/logging:
inline int ForkAndRun(const std::string &command, const std::vector<std::string> &args, bool wait = false, std::string *output = NULL)
{
    std::string debug;
    std::vector<char*> argv;
    for (size_t i = 0; i < args.size(); ++i)
    {
        argv.push_back(const_cast<char*>(args[i].c_str()));
        debug += "\"";
        debug += args[i];
        debug += "\" ";
    }
    argv.push_back((char*)NULL);

    neosmart::logger.Debug("Executing %s", debug.c_str());

    int pipefd[2];
    if (pipe(pipefd) != 0)
    {
        neosmart::logger.Error("Failed to create pipe descriptor when trying to launch %s", debug.c_str());
        return EXIT_FAILURE;
    }

    pid_t pid = fork();
    if (pid == 0)
    {
        close(pipefd[STDIN_FILENO]); //child isn't going to be reading
        dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
        close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd
        dup2(pipefd[STDOUT_FILENO], STDERR_FILENO);
        if (execvp(command.c_str(), &argv[0]) != 0)
        {
            exit(EXIT_FAILURE);
        }
        return 0;
    }
    else if (pid < 0)
    {
        neosmart::logger.Error("Failed to fork when trying to launch %s", debug.c_str());
        return EXIT_FAILURE;
    }
    else
    {
        close(pipefd[STDOUT_FILENO]);

        int exitCode = 0;
        if (wait)
        {
            waitpid(pid, &exitCode, wait ? __WALL : (WNOHANG | WUNTRACED));

            std::string result;
            char buffer[128];
            ssize_t bytesRead;
            while ((bytesRead = read(pipefd[STDIN_FILENO], buffer, sizeof(buffer)-1)) != 0)
            {
                buffer[bytesRead] = '\0';
                result += buffer;
            }

            if (wait)
            {
                if ((WIFEXITED(exitCode)) == 0)
                {
                    neosmart::logger.Error("Failed to run command %s", debug.c_str());
                    neosmart::logger.Info("Output:\n%s", result.c_str());
                }
                else
                {
                    neosmart::logger.Debug("Output:\n%s", result.c_str());
                    exitCode = WEXITSTATUS(exitCode);
                    if (exitCode != 0)
                    {
                        neosmart::logger.Info("Return code %d", (exitCode));
                    }
                }
            }

            if (output)
            {
                result.swap(*output);
            }
        }

        close(pipefd[STDIN_FILENO]);
        return exitCode;
    }
}
Note that the command is run OK with the correct parameters, the function proceeds without any problems, and WIFEXITED returns TRUE. However, WEXITSTATUS returns 0, when it should be returning something else.
Probably isn't your main issue, but I think I see a small problem. In your child process, you have...
dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd
dup2(pipefd[STDOUT_FILENO], STDERR_FILENO); //but wait, this pipe is closed!
But I think what you want is:
dup2(pipefd[STDOUT_FILENO], STDOUT_FILENO);
dup2(pipefd[STDOUT_FILENO], STDERR_FILENO);
close(pipefd[STDOUT_FILENO]); //now that it's been dup2'd for both, can close
I don't have much experience with forks and pipes in Linux, but I did write a similar function pretty recently. You can take a look at the code to compare, if you'd like. I know that my function works.
execAndRedirect.cpp
I'm using the mongoose library, and grepping my code for SIGCHLD revealed that using mg_start from mongoose results in setting SIGCHLD to SIG_IGN.
From the waitpid man page, on Linux a SIGCHLD set to SIG_IGN will not create a zombie process, so waitpid will fail if the process has already successfully run and exited - but will run OK if it hasn't yet. This was the cause of the sporadic failure of my code.
Simply re-setting the SIGCHLD handler, after calling mg_start, to a void function that does absolutely nothing was enough to keep the zombie records from being immediately erased.
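A sketch of that workaround (the handler name is illustrative; the key point is replacing the SIG_IGN disposition installed via mg_start with a handler that does nothing, so children leave an exit status behind for waitpid to collect):

static void noop_sigchld(int) { /* intentionally empty */ }

// ... right after the existing mg_start(...) call:
struct sigaction sa;
memset(&sa, 0, sizeof sa);
sa.sa_handler = noop_sigchld;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sigaction(SIGCHLD, &sa, NULL);   // children are no longer auto-reaped behind waitpid's back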
Per @Geoff_Montee's advice, there was a bug in my redirect of STDERR, but this was not responsible for the problem, as execvp does not store the return value in STDERR or even STDOUT, but rather in the kernel object associated with the parent process (the zombie record).
@jilles' warning about non-contiguity of vector in C++ does not apply for C++03 and up (it was only a concern for C++98, though in practice most C++98 compilers used contiguous storage anyway) and was not related to this issue. However, the advice on reading from the pipe before blocking and checking the output of waitpid is spot-on.
I've found that pclose does NOT block and wait for the process to end, contrary to the documentation (this is on CentOS 6). I've found that I need to call pclose and then call waitpid(pid,&status,0); to get the true return value.

How to fork() and exec() in this one?

I'm writing my own shell, but fork() never gives me child_pid == 0...
What's wrong there?
while (true)
{
    read_command(command);

    if ((child_pid = fork()) == -1)
    {
        fprintf(stderr, "can't fork\n");
        exit(1);
    }
    else if (child_pid == 0) // child
    {
        status = execl("./myShell", command, (char *)NULL);
    }
    else
    {
        wait(&status); // parent
    }
}
I guess that the (child_pid == -1) branch is not entered... Is the parent (else) branch entered twice (by both processes), or what?
Anyway, I can't see a bug in that snippet of code. If you're sure your execution flow gets there and behaves unpredictably, it's because of a bug elsewhere.
I doubt glibc is buggy on your system: my best guess is that your program has a broken pointer that corrupted everything. That is the most common cause of this kind of really weird behavior.
Your code is OK. Add a debug print inside the if (child_pid == 0) branch and check whether it is actually reached. If fork cannot create a child, it sets errno to indicate the error.
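For instance, a quick sketch of those checks (the execl arguments and debug prints here are illustrative; perror reports the reason fork stored in errno):

pid_t child_pid = fork();
if (child_pid == -1) {
    perror("fork");                      // e.g. EAGAIN or ENOMEM
    exit(1);
} else if (child_pid == 0) {
    fprintf(stderr, "child branch entered, pid=%d\n", (int)getpid());
    execl("./myShell", "./myShell", (char *)NULL);
    _exit(127);                          // only reached if execl itself failed
} else {
    fprintf(stderr, "parent: forked child %d\n", (int)child_pid);
    int status;
    wait(&status);
}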

How to handle execvp(...) errors after fork()?

I do the regular thing:
fork()
execvp(cmd, ) in child
If execvp fails because cmd is not found, how can I notice this error in the parent process?
The well-known self-pipe trick can be adapted for this purpose.
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <sysexits.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int pipefds[2];
    int count, err;
    pid_t child;

    if (pipe(pipefds)) {
        perror("pipe");
        return EX_OSERR;
    }
    if (fcntl(pipefds[1], F_SETFD, fcntl(pipefds[1], F_GETFD) | FD_CLOEXEC)) {
        perror("fcntl");
        return EX_OSERR;
    }

    switch (child = fork()) {
    case -1:
        perror("fork");
        return EX_OSERR;
    case 0:
        close(pipefds[0]);
        execvp(argv[1], argv + 1);
        write(pipefds[1], &errno, sizeof(int));
        _exit(0);
    default:
        close(pipefds[1]);
        while ((count = read(pipefds[0], &err, sizeof(errno))) == -1)
            if (errno != EAGAIN && errno != EINTR) break;
        if (count) {
            fprintf(stderr, "child's execvp: %s\n", strerror(err));
            return EX_UNAVAILABLE;
        }
        close(pipefds[0]);
        puts("waiting for child...");
        while (waitpid(child, &err, 0) == -1)
            if (errno != EINTR) {
                perror("waitpid");
                return EX_SOFTWARE;
            }
        if (WIFEXITED(err))
            printf("child exited with %d\n", WEXITSTATUS(err));
        else if (WIFSIGNALED(err))
            printf("child killed by %d\n", WTERMSIG(err));
    }
    return err;
}
Here's a complete program.
$ ./a.out foo
child's execvp: No such file or directory
$ (sleep 1 && killall -QUIT sleep &); ./a.out sleep 60
waiting for child...
child killed by 3
$ ./a.out true
waiting for child...
child exited with 0
How this works:
Create a pipe, and make the write endpoint CLOEXEC: it auto-closes when an exec is successfully performed.
In the child, try to exec. If it succeeds, we no longer have control, but the pipe is closed. If it fails, write the failure code to the pipe and exit.
In the parent, try to read from the other pipe endpoint. If read returns zero, then the pipe was closed and the child must have exec'd successfully. If read returns data, it's the failure code that our child wrote.
You terminate the child (by calling _exit()) and then the parent can notice this (through e.g. waitpid()). For instance, your child could exit with an exit status of -1 to indicate failure to exec. One caveat with this is that it is impossible to tell from your parent whether the child in its original state (i.e. before exec) returned -1 or if it was the newly executed process.
As suggested in the comments below, using an "unusual" return code would be appropriate to make it easier to distinguish between your specific error and one from the exec()'ed program. Common ones are 1, 2, 3 etc. while higher numbers 99, 100, etc. are more unusual. You should keep your numbers below 255 (unsigned) or 127 (signed) to increase portability.
Since waitpid blocks your application (or rather, the thread calling it) you will either need to put it on a background thread or use the signalling mechanism in POSIX to get information about child process termination. See the SIGCHLD signal and the sigaction function to hook up a listener.
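A minimal sketch of that sigaction setup (the handler only reaps children with waitpid, which is async-signal-safe, and records the last status; a production handler should also save and restore errno):

#include <signal.h>
#include <sys/wait.h>

static volatile sig_atomic_t last_status = 0;

static void on_child_exit(int)
{
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0)   // reap every child that has already exited
        last_status = status;
}

// setup, e.g. early in main():
//     struct sigaction sa = {};
//     sa.sa_handler = on_child_exit;
//     sigemptyset(&sa.sa_mask);
//     sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
//     sigaction(SIGCHLD, &sa, NULL);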
You could also do some error checking before forking, such as making sure the executable exists.
If you use something like Glib, there are utility functions to do this, and they come with pretty good error reporting. Take a look at the "spawning processes" section of the manual.
1) Use _exit() not exit() - see http://opengroup.org/onlinepubs/007908775/xsh/vfork.html - NB: applies to fork() as well as vfork().
2) The problem with doing more complicated IPC than the exit status, is that you have a shared memory map, and it's possible to get some nasty state if you do anything too complicated - e.g. in multithreaded code, one of the killed threads (in the child) could have been holding a lock.
Not only should you wonder how you can notice the error in the parent process; you should also keep in mind that you must notice it there. That's especially true for multithreaded applications.
After execvp you must place a call to a function that terminates the process in any case. You should not call any complex functions that interact with the C library (such as stdio), since their effects may mingle with the pthreads or libc functionality of the parent process. So you can't print a message with printf() in the child process; you have to inform the parent about the error instead.
The easiest way, among others, is passing a return code. Supply a nonzero argument to the _exit() function you use to terminate the child (see note below), and then examine the return code in the parent. Here's an example:
int pid, stat;

pid = fork();
if (pid == 0) {
    // Child process
    execvp(cmd, args);    // cmd and its argument vector, as prepared by the caller
    if (errno == ENOENT)
        _exit(-1);
    _exit(-2);
}

wait(&stat);
if (!WIFEXITED(stat)) { // Error happened
    ...
}
Instead of _exit(), you might think of the exit() function, but that's incorrect, since exit() does part of the C library cleanup that should be done only when the parent process terminates. Instead, use _exit(), which doesn't do that cleanup.
Well, you could use the wait/waitpid functions in the parent process. You can specify a status variable that holds info about the status of the process that terminated. The downside is that the parent process is blocked until the child process finishes execution.
Anytime exec fails in a subprocess, you should use kill(getpid(),SIGKILL) and the parent should always have a signal handler for SIGCLD and tell the user of the program, in the appropriate way, that the process was not successfully started.