Linux: fork & execv, wait for child process hangs - c++

I wrote a helper function to start a process using fork() and execv() inspired by this answer. It is used to start e.g. mysqldump to make a database backup.
The code works totally fine in a couple of different locations with different programs.
Now I have hit one scenario where it fails:
It is a call to systemctl to stop a unit. Running systemctl works and the unit is stopped. But in the intermediate process, when wait()ing for the child process, wait() hangs until the timeout process ends.
If I check with kill() whether the worker process has finished, I can tell that it did.
Important: The program does not misbehave or segfault; the only problem is that wait() does not signal the end of the worker process!
Is there anything in my code (see below) that is incorrect that could trigger that behavior?
I've read Threads and fork(): think twice before mixing them but I cannot find anything in there that relates to my problem.
What's strange:
Deep, deep, deep in the program JSON-RPC is used. If I deactivate the code that uses JSON-RPC, everything works fine!?
Environment:
The program that uses the function is a multi-threaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().
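For context, this kind of setup is usually done by blocking the signals before any threads are created and then polling them in the main thread. A minimal, generic sketch (illustrative only, not taken from the real program):
#include <iostream>
#include <ctime>        // timespec
#include <signal.h>     // sigset_t, siginfo_t, sigtimedwait, pthread_sigmask
int main() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGINT);
    // Block before any threads start, so every thread inherits the mask.
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    // ... worker threads created here would inherit the blocked mask ...
    for (;;) {
        timespec timeout{};
        timeout.tv_sec = 1;                      // wake up once per second
        siginfo_t info{};
        const int sig = sigtimedwait(&set, &info, &timeout);
        if (sig == SIGTERM || sig == SIGINT) {
            std::cout << "Main thread: signal " << sig << " received, shutting down." << std::endl;
            break;
        }
        // sig == -1 usually just means the timeout expired (errno == EAGAIN).
    }
    return 0;
}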
Code (production code in which logging got traded for output via std::cout) with sample main function:
#include <iostream>
#include <cerrno>      // errno, ESRCH
#include <cstdlib>     // EXIT_SUCCESS, EXIT_FAILURE
#include <signal.h>    // kill, SIGKILL
#include <unistd.h>
#include <sys/wait.h>
namespace {
bool checkStatus(const int status) {
return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
}
}
bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
auto result = true;
const pid_t intermediatePid = fork();
if(intermediatePid == 0) {
// intermediate process
std::cout << "Intermediate process: Started (" << getpid() << ")." << std::endl;
const pid_t workerPid = fork();
if(workerPid == 0) {
// worker process
if(fileDescriptor) {
std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
if(-1 == dupResult) {
std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
_exit(EXIT_FAILURE);
}
}
execv(path, const_cast<char**>(argv));
std::cout << "Intermediate process: Worker failed!" << std::endl;
_exit(EXIT_FAILURE);
} else if(-1 == workerPid) {
std::cout << "Intermediate process: Starting worker failed!" << std::endl;
_exit(EXIT_FAILURE);
}
const pid_t timeoutPid = fork();
if(timeoutPid == 0) {
// timeout process
std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
sleep(timeoutInSeconds);
std::cout << "Timeout process: Finished." << std::endl;
_exit(EXIT_SUCCESS);
} else if(-1 == timeoutPid) {
std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
kill(workerPid, SIGKILL);
std::cout << "Intermediate process: Finished." << std::endl;
_exit(EXIT_FAILURE);
}
// ---------------------------------------
// This code is only used for double checking if the worker is still running.
// The if condition never evaluated to true in my tests.
const auto killResult = kill(workerPid, 0);
if((-1 == killResult) && (ESRCH == errno)) {
std::cout << "Intermediate process: Worker is not running." << std::endl;
}
// ---------------------------------------
std::cout << "Intermediate process: Waiting for child processes." << std::endl;
int status = -1;
const pid_t exitedPid = wait(&status);
// ---------------------------------------
// This code is only used for double checking if the worker is still running.
// The if condition evaluates to true in the case of an error.
const auto killResult2 = kill(workerPid, 0);
if((-1 == killResult2) && (ESRCH == errno)) {
std::cout << "Intermediate process: Worker is not running." << std::endl;
}
// ---------------------------------------
std::cout << "Intermediate process: Child process finished. Status: " << status << "." << std::endl;
if(exitedPid == workerPid) {
std::cout << "Intermediate process: Killing timeout process." << std::endl;
kill(timeoutPid, SIGKILL);
} else {
std::cout << "Intermediate process: Killing worker process." << std::endl;
kill(workerPid, SIGKILL);
std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
wait(nullptr);
std::cout << "Intermediate process: Finished." << std::endl;
_exit(EXIT_FAILURE);
}
std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
wait(nullptr);
std::cout << "Intermediate process: Finished." << std::endl;
_exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);
} else if(-1 == intermediatePid) {
// error
std::cout << "Parent process: Error starting intermediate process!" << std::endl;
result = false;
} else {
// parent process
std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
processId = intermediatePid;
}
return(result);
}
bool waitForProcess(const pid_t processId) {
int status = 0;
const auto waitResult = waitpid(processId, &status, 0);
auto result = false;
if(waitResult == processId) {
result = checkStatus(status);
}
return(result);
}
int main() {
pid_t pid = 0;
const char* const path = "/bin/ls";
const char* argv[] = { "/bin/ls", "--help", nullptr };
const unsigned int timeoutInS = 5;
const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
if(startResult) {
const auto waitResult = waitForProcess(pid);
std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
} else {
std::cout << "startProcess failed!" << std::endl;
}
}
Edit
The expected output should contain
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: 0.
Intermediate process: Killing timeout process.
In the case of error the output looks like this
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: -1
Intermediate process: Killing worker process.
When you run the sample code you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.

I found the problem:
Within the mongoose (JSON-RPC uses mongoose) sources in the function mg_start I found the following code
#if !defined(_WIN32) && !defined(__SYMBIAN32__)
// Ignore SIGPIPE signal, so if browser cancels the request, it
// won't kill the whole process.
(void) signal(SIGPIPE, SIG_IGN);
// Also ignoring SIGCHLD to let the OS to reap zombies properly.
(void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32
The line
(void) signal(SIGCHLD, SIG_IGN);
has the effect that
"if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."
as mentioned here in the section 5.5 Voodoo: wait and SIGCHLD.
This is also described in the man page for WAIT(2)
ERRORS [...]
ECHILD [...] (This can happen for
one's own child if the action for SIGCHLD is set to SIG_IGN.
See also the Linux Notes section about threads.)
Stupid on my part not to check the return value correctly.
Before trying
if(exitedPid == workerPid) {
I should have checked that exitedPid is != -1.
If I do so, errno gives me ECHILD. If I had known that in the first place, I would have read the man page and probably found the problem faster...
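To make the failure mode concrete, here is a minimal, self-contained sketch (not part of the original program) that reproduces the effect of mongoose's signal(SIGCHLD, SIG_IGN):
#include <cerrno>
#include <cstring>
#include <iostream>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
int main() {
    signal(SIGCHLD, SIG_IGN);   // what mongoose's mg_start does
    const pid_t pid = fork();
    if (pid == 0) {
        _exit(0);               // child exits immediately
    }
    int status = -1;
    // With SIGCHLD ignored, the child is reaped automatically and wait()
    // returns -1 with errno set to ECHILD once all children are gone.
    const pid_t exitedPid = wait(&status);
    if (exitedPid == -1) {
        std::cout << "wait() failed: " << std::strerror(errno)
                  << " (ECHILD is expected here)" << std::endl;
    } else {
        std::cout << "wait() reported pid " << exitedPid << " with status " << status << std::endl;
    }
    return 0;
}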
It is naughty of mongoose to mess with signal handling regardless of what the application wants to do about it. Additionally, mongoose does not revert this change to the signal handling when it is stopped with mg_stop.
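If a library you cannot avoid installs that disposition, one possible workaround (my assumption, not something the original post tried) is to restore the default SIGCHLD handling before forking:
// Sketch: restore the default SIGCHLD disposition so that wait()/waitpid()
// report child termination again. Assumes no other part of the program
// relies on SIGCHLD being ignored.
#include <signal.h>
void restoreSigchldDefault() {
    struct sigaction sa {};
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, nullptr);
}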
Additional info:
The code that caused this problem was changed in mongoose in September 2013 with this commit.

We faced a similar issue in our application: in an intense situation of repeated child-process fork()s, the child process never returned. One can monitor the PID of the child process and, if it does not return within an application-defined threshold, terminate it by sending a kill/TERM signal, as sketched below.
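A sketch of that idea, with illustrative names (childPid, timeoutInSeconds) and a one-second polling interval; SIGKILL is used here instead of SIGTERM so the final waitpid() cannot block:
#include <signal.h>     // kill, SIGKILL
#include <sys/wait.h>   // waitpid, WNOHANG
#include <unistd.h>     // sleep
// Sketch: poll the child with waitpid(WNOHANG); if it has not finished
// before the deadline, kill it and reap it.
bool waitWithDeadline(const pid_t childPid, const unsigned int timeoutInSeconds) {
    for (unsigned int elapsed = 0; elapsed < timeoutInSeconds; ++elapsed) {
        int status = 0;
        if (waitpid(childPid, &status, WNOHANG) == childPid) {
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;   // child finished in time
        }
        sleep(1);                                                   // still running, check again later
    }
    kill(childPid, SIGKILL);         // deadline passed: terminate the child
    waitpid(childPid, nullptr, 0);   // reap it so it does not become a zombie
    return false;
}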

Related

pool of processes using fork() and boost::asio

I am currently trying to implement a process pool that can be communicated with by the parent process. No child process should exit until the parent tells it to (likely using a signal). Up to now a couple of questions have arisen in my head, and I am happy to get some input using my MWE:
#include <iostream>
#include <vector>       // std::vector
#include <unistd.h>     // fork, getpid
#include <boost/thread.hpp>
#include <boost/process/async_pipe.hpp>
#include <boost/asio.hpp>
#include <boost/array.hpp>
static const std::size_t process_count = 3;
static void start_reading(boost::process::async_pipe& ap)
{
static boost::array<char, 256> buf;
ap.async_read_some(boost::asio::buffer(buf.data(), buf.size()), [](const boost::system::error_code& error, std::size_t bytes_transferred)
{
if(!error)
{
std::cout << "received " << bytes_transferred << " from pid " << getpid() << " " << buf[0] << "...." << std::endl;
// perform some heavy computations here..
}
});
}
static void start_writing(boost::process::async_pipe& ap)
{
boost::array<char, 256> buf;
buf.fill('A');
ap.async_write_some(boost::asio::buffer(buf.data(), buf.size()), [&ap](const boost::system::error_code& error, std::size_t bytes_transferred)
{
if(!error)
{
std::cout << "parent " << getpid() << " sent " << bytes_transferred << " to [" << ap.native_source() << "," << ap.native_sink() << "]" << std::endl;
}
});
}
int main()
{
try
{
boost::asio::io_service io_context;
// prevent the associated executor from stopping
boost::asio::executor_work_guard<boost::asio::io_context::executor_type> guard = boost::asio::make_work_guard(io_context);
pid_t main_process = getpid();
std::cout << "before forks " << main_process << std::endl;
std::vector<boost::process::async_pipe> pipes;
pipes.reserve(process_count);
for(std::size_t i = 0; i < process_count; i++)
{
pipes.emplace_back(io_context);
io_context.notify_fork(boost::asio::io_service::fork_prepare);
pid_t pid = fork();
if(pid == 0)
{
io_context.notify_fork(boost::asio::io_service::fork_child);
// perform some costly initialization here...
boost::process::async_pipe& ap = pipes[i];
std::cout << "child " << getpid() << " listening to [" << ap.native_source() << "," << ap.native_sink() << "]" << std::endl;
start_reading(ap);
io_context.run();
}
else if(pid > 0)
{
io_context.notify_fork(boost::asio::io_service::fork_parent);
}
else
{
std::cerr << "fork() failed" << std::endl;
}
}
// only parent gets there
start_writing(pipes[0]);
start_writing(pipes[0]);
start_writing(pipes[1]);
start_writing(pipes[2]);
io_context.run();
}
catch(const std::exception& e)
{
std::cerr << e.what() << std::endl;
}
return 1;
}
The program outputs
before forks 15603
child 15611 listening to [8,9]
child 15612 listening to [10,11]
parent 15603 sent 256 to [8,9]
parent 15603 sent 256 to [8,9]
parent 15603 sent 256 to [10,11]
parent 15603 sent 256 to [21,22]
received 256 from pid 15612 A....
received 256 from pid 15611 A....
child 15613 listening to [21,22]
received 256 from pid 15613 A....
My main concern at the moment is how to keep reading data indefinitely in the worker processes (the children) as long as the process is not already busy. As soon as the worker gets into the handler from async_read_some, it performs some computations as stated in the comment (which might take a few seconds). While doing this, the process should and will block; afterwards I want to notify my parent that the worker is ready again and can accept new reads over the pipe. So far I don't have any profound idea how to do this. Notifying the parent from the child is not necessary per se, but the parent needs to keep track of all idle child processes at all times, so it can send new input via the corresponding pipe.
Apart from that there is one thing I didn't get yet:
Notice that boost::array<char, 256> buf; is static in start_reading. If I remove the static modifier I never get into the completion handler of async_read_some, why is that?
EDIT:
Calling start_reading again in the completion handler continues reading. However, the parent process does not "know" about it.
EDIT2:
So far I have figured out one possible way (I guess there are several) that might work. I am not finished with the implementation, but the shared mutex works as expected. Here is some pseudo-code:
process_pool
{
    worker get_next_worker()
        ScopedLock(memory_mapped_mutex);
        free_worker = *available.rbegin()
        available.pop_back()
        return free_worker;

    memory_mapped_vec<worker> available;
};

server::completion_handler_async_connect()
    get_next_worker().socket().write(request)

worker::completion_handler_async_read()
    // do something very long before locking
    ScopedLock(memory_mapped_mutex);
    process_pool::available.push_back(self);
Apart from that there is one thing I didn't get yet: Notice that boost::array<char, 256> buf; is static in start_reading. If I remove the static modifier I never get into the completion handler of async_read_some, why is that?
That's because buf is a local variable, and it no longer exists after start_reading exits. However, async_read (or any other async_XXXX call) returns immediately, without waiting for the operation to complete. So if the buffer doesn't persist, you are writing into a dangling reference to unspecified stack space, leading to Undefined Behaviour.
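One common way to keep the buffer alive without making it static (a sketch, not from the original post; it additionally needs <memory> for std::make_shared) is to allocate it per read and capture the shared_ptr in the completion handler. Re-arming the read in the handler also covers the point from the first EDIT:
// Sketch: give each read its own heap-allocated buffer whose lifetime is
// tied to the completion handler via the captured shared_ptr.
static void start_reading(boost::process::async_pipe& ap)
{
    auto buf = std::make_shared<boost::array<char, 256>>();
    ap.async_read_some(boost::asio::buffer(buf->data(), buf->size()),
        [buf, &ap](const boost::system::error_code& error, std::size_t bytes_transferred)
        {
            if (!error)
            {
                std::cout << "received " << bytes_transferred << " from pid "
                          << getpid() << " " << (*buf)[0] << "...." << std::endl;
                start_reading(ap);  // re-arm the read so the pipe keeps being serviced
            }
        });
}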
As for communicating back and forth, that is unnecessarily complicated between processes. Is there any reason you can't use multi-threading? That way all workers can simply monitor a shared queue.
Of course you can set up the same with a queue shared between processes (in which case I would advise against doing it via pipes with Asio, and instead use message_queue).
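For what it's worth, a minimal sketch of a boost::interprocess::message_queue shared between a parent and a forked child (the queue name, sizes, and message content are arbitrary illustrative choices):
#include <boost/interprocess/ipc/message_queue.hpp>
#include <iostream>
#include <sys/wait.h>   // wait
#include <unistd.h>     // fork, getpid
namespace ipc = boost::interprocess;
int main()
{
    const char* queueName = "worker_queue";            // arbitrary name
    ipc::message_queue::remove(queueName);             // clean up leftovers from earlier runs
    ipc::message_queue mq(ipc::create_only, queueName,
                          /*max_num_msg=*/16, /*max_msg_size=*/256);
    if (fork() == 0)                                   // child: block until work arrives
    {
        char buf[256];
        ipc::message_queue::size_type received = 0;
        unsigned int priority = 0;
        mq.receive(buf, sizeof(buf), received, priority);
        std::cout << "child " << getpid() << " got " << received << " bytes" << std::endl;
        return 0;
    }
    const char msg[] = "do some work";                 // parent: enqueue one work item
    mq.send(msg, sizeof(msg), /*priority=*/0);
    wait(nullptr);                                     // reap the child
    ipc::message_queue::remove(queueName);
    return 0;
}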

Child Process runs even after parent process has exited?

I was writing code for a research program. I have the following requirements:
1. Main binary execution begins at main()
2. main() fork()
3. child process runs a linpack benchmark binary using execvp()
4. parent process runs some monitoring process and wait for child to exit.
The code is below:
main.cpp
#include <iostream>     // std::cout, std::cerr
#include <cstring>      // strerror
#include <cerrno>       // errno
#include <ctime>        // time, localtime
#include <unistd.h>     // fork, sleep
#include <sys/wait.h>   // waitpid, wait
using namespace std;    // the snippet uses unqualified cout/endl below
// (project-specific headers for ServerUncorePowerState, power, procstat, membandwidth_t are omitted in the question)
extern ServerUncorePowerState * BeforeStates ;
extern ServerUncorePowerState * AfterStates;
int main(int argc, char *argv[]) {
power pwr;
procstat st;
membandwidth_t data;
int sec_pause = 1; // sample every 1 second
pid_t child_pid = fork();
if (child_pid >= 0) { //fork successful
if (child_pid == 0) { // child process
int exec_status = execvp(argv[1], argv+1);
if (exec_status) {
std::cerr << "execv failed with error "
<< errno << " "
<< strerror(errno) << std::endl;
}
} else { // parent process
int status = 1;
waitpid(child_pid, &status, WNOHANG);
write_headers();
pwr.init();
st.init();
init_bandwidth();
while (status) {
cout << " Printing status Value: " << status << endl;
sleep (sec_pause);
time_t now;
time(&now);
struct tm *tinfo;
tinfo = localtime(&now);
pwr.loop();
st.loop();
data = getbandwidth();
write_samples(tinfo, pwr, st, data.read_bandwidth + data.write_bandwidth);
waitpid(child_pid, &status, WNOHANG);
}
wait(&status); // wait for child to exit, and store its status
//--------------------This code is not executed------------------------
std::cout << "PARENT: Child's exit code is: "
<< WEXITSTATUS(status)
<< std::endl;
delete[] BeforeStates;
delete[] AfterStates;
}
} else {
std::cerr << "fork failed" << std::endl;
return 1;
}
return 0;
}
The expectation is that the child exits first and then the parent exits, but for some unknown reason the parent exits after 16 minutes while the child is still running.
Normally it is said that when the parent exits, the child dies automatically.
What could be the reason for this strange behavior?
Normally it is said that when the parent exits, the child dies automatically.
Well, this is not always true; it depends on the system. When a parent process terminates, the child process is called an orphan process. On a Unix-like OS this is handled by re-parenting the orphan process to the init process; the re-parenting is done automatically by the OS. On other types of OS, orphan processes are automatically killed by the system. You can find more details here.
From the code snippet I would think that the issue is in the wait(&status) statement. The preceding loop ends (or is never entered) when status is 0, which the earlier waitpid(child_pid, &status, WNOHANG) calls can yield. This means that the wait(&status) statement could end up waiting on an already terminated process, which may cause some issues.
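A sketch of what the monitoring loop could look like if it keys off waitpid()'s return value instead of the status variable (variable names follow the question; the sampling code is elided):
#include <iostream>     // std::cout
#include <sys/wait.h>   // waitpid, WNOHANG, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // sleep
// Sketch: poll the child and stop when waitpid() itself reports the exit,
// instead of testing a status value that may not have been filled in yet.
void monitorChild(const pid_t child_pid, const unsigned int sec_pause) {
    int status = 0;
    pid_t waited = 0;
    do {
        // ... take one round of power/bandwidth samples here ...
        sleep(sec_pause);
        waited = waitpid(child_pid, &status, WNOHANG);   // 0 while the child is still running
    } while (waited == 0);
    if (waited == child_pid && WIFEXITED(status)) {
        std::cout << "PARENT: Child's exit code is: " << WEXITSTATUS(status) << std::endl;
    }
}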

Supervisor Program Forking to a Multi-threaded Child

First off, allow me to describe my scenario:
I developed a supervisory program on Linux that forks and then uses execv(), in the child process, to launch my multi-threaded application. The supervisory program acts as a watchdog for the multi-threaded application. If the multi-threaded application does not send a SIGUSR1 signal to the supervisor within a period of time, the supervisory program kills the child using the pid_t from the fork() call and repeats the process.
Here is the code for the Supervisory Program:
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>      // time, difftime, time_t
#include <iostream>
#include <cerrno>
time_t heartbeatTime;
void signalHandler(int sigNum)
{
//std::cout << "Signal (" << sigNum << ") received.\n";
time(&heartbeatTime);
}
int main(int argc, char *argv[])
{
pid_t cpid, ppid;
int result = 0;
bool programLaunched = false;
time_t now;
double timeDiff;
int error;
char ParentID[25];
char *myArgv[2];
// Get the Parent Process ID
ppid = ::getpid();
// Initialize the Child Process ID
cpid = 0;
// Copy the PID into the char array
sprintf(ParentID, "%i", ppid);
// Set up the array to pass to the Program
myArgv[0] = ParentID;
myArgv[1] = 0;
// Print out of the P PID
std::cout << "Parent ID: " << myArgv[0] << "\n";
// Register for the SIGUSR1 signal
signal(SIGUSR1, signalHandler);
// Register the SIGCHLD so the children processes exit fully
signal(SIGCHLD, SIG_IGN);
// Initialize the Heart Beat time
time(&heartbeatTime);
// Loop forever and ever, amen.
while (1)
{
// Check to see if the program has been launched
if (programLaunched == false)
{
std::cout << "Forking the process\n";
// Fork the process to launch the application
cpid = fork();
std::cout << "Child PID: " << cpid << "\n";
}
// Check if the fork was successful
if (cpid < 0)
{
std::cout << "Error in forking.\n";
// Error in forking
programLaunched = false;
}
else if (cpid == 0)
{
// Check if we need to launch the application
if (programLaunched == false)
{
// Send a message to the output
std::cout << "Launching Application...\n";
// Launch the Application
result = execv("./MyApp", myArgv);
std::cout << "execv result = " << result << "\n";
// Check if the program launched has failed
if (result != -1)
{
// Indicate the program has been launched
programLaunched = true;
// Exit the child process
return 0;
}
else
{
std::cout << "Child process terminated; bad execv\n";
// Flag that the program has not been launched
programLaunched = false;
// Exit the child process
return -1;
}
}
}
// In the Parent Process
else
{
// Get the current time
time(&now);
// Get the time difference between the program heartbeat time and current time
timeDiff = difftime(now, heartbeatTime);
// Check if we need to restart our application
if ((timeDiff > 60) && (programLaunched == true))
{
std::cout << "Killing the application\n";
// Kill the child process
kill(cpid, SIGINT);
// Indicate that the process was ended
programLaunched = false;
// Reset the Heart Beat time
time(&heartbeatTime);
return -1;
}
// Check to see if the child application is running
if (kill(cpid, 0) == -1)
{
// Get the Error
error = errno;
// Check if the process is running
if (error == ESRCH)
{
std::cout << "Process is not running; start it.\n";
// Process is not running.
programLaunched = false;
return -1;
}
}
else
{
// Child process is running
programLaunched = true;
}
}
// Give the process some time off.
sleep(5);
}
return 0;
}
This approach worked fairly well until I ran into a problem with the library I was using. It didn't like all of the killing, and it basically ended up tying up my Ethernet port in an endless loop, never releasing it - not good.
I then tried an alternative method. I modified the supervisory program to allow it to exit if it had to kill the multi-threaded application, and I created a script that launches the supervisor program from crontab. I used a shell script that I found on Stack Overflow.
#!/bin/bash
#make-run.sh
#make sure a process is always running.
export DISPLAY=:0 #needed if you are running a simple gui app.
process=YourProcessName
makerun="/usr/bin/program"
if ps ax | grep -v grep | grep $process > /dev/null
then
exit
else
$makerun &
fi
exit
I added it to crontab to run every minute. That was very helpful: it restarted the supervisory program, which in turn restarted the multi-threaded application. But I noticed a problem of multiple instances of the multi-threaded application being launched, and I'm not really sure why this was happening.
I know I'm really hacking this up but I'm backed into a corner with this implementation. I'm just trying to get it to work.
Suggestions?

fork() and waitpid() not waiting for child

I am having a bit of trouble getting waitpid to work. Could someone please explain what is wrong with this code?
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
using namespace std;
int main() {
string filename_memory;
decltype(fork()) pid;
if (!(pid = fork())) {
cout << "in child" << endl;
sleep(1);
}
else {
int status_child;
do {
waitpid(pid, &status_child, WNOHANG);
cout << "waiting for child to finish" << endl;
} while (!WIFEXITED(status_child));
cout << "child finished" << endl;
}
return 0;
}
If wait() or waitpid() returns because the status of a child process
is available, these functions shall return a value equal to the
process ID of the child process for which status is reported.
If waitpid() was invoked with WNOHANG set in options, it has at least
one child process specified by pid for which status is not available,
and status is not available for any process specified by pid, 0 is
returned. Otherwise, (pid_t)-1 shall be returned, and errno set to
indicate the error.
This means that the status_child variable has no meaning until waitpid returns the pid of the child.
You can fix this by applying these changes:
int ret;
do {
ret = waitpid(pid, &status_child, WNOHANG);
cout << "waiting for child to finish" << endl;
} while (ret != pid || !WIFEXITED(status_child));
cout << "child finished" << endl;

How to find out whether child process still is running?

I am spawning a process in my application:
int status = posix_spawnp(&m_iProcessHandle, (char*)strProgramFilepath.c_str(), NULL, NULL, argsWrapper.m_pBuffer, NULL);
When I want to see if the process is still running, I use kill:
int iReturn = kill(m_iProcessHandle,0);
But after the spawned process has finished its work, it hangs around. The return value on the kill command is always 0. Not -1. I am calling kill from within the code, but if I call it from the command line, there is no error - the spawned process still exists.
Only when my application exits does the command-line kill return "No such process".
I can change this behavior in my code with this:
int iResult = waitpid(m_iProcessHandle, &iStatus, 0);
The call to waitpid closes down the spawned process and I can call kill and get -1 back, but by then I already know the spawned process is dead.
And waitpid blocks my application!
How can I test a spawned processes to see if it is running, but without blocking my application?
UPDATE
Thanks for the help! I have implemented your advice and here is the result:
// background-task.cpp
//
#include <spawn.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <signal.h>
#include "background-task.h"
CBackgroundTask::CBackgroundTask()
{
// Initialize member variables
m_iProcessHandle = 0;
}
CBackgroundTask::~CBackgroundTask()
{
// Clean up (kill first)
_lowLevel_cleanup(true);
}
bool CBackgroundTask::IsRunning()
{
// Shortcuts
if (m_iProcessHandle == 0)
return false;
// Wait for the process to finish
int iStatus = 0;
int iResult = waitpid(m_iProcessHandle, &iStatus, WNOHANG);
return (iResult != -1);
}
void CBackgroundTask::Wait()
{
// Wait (clean up without killing)
_lowLevel_cleanup(false);
}
void CBackgroundTask::Stop()
{
// Stop (kill and clean up)
_lowLevel_cleanup(true);
}
void CBackgroundTask::_start(const string& strProgramFilepath, const string& strArgs, int iNice /*=0*/)
{
// Call pre-start
_preStart();
// Split the args and build array of char-strings
CCharStringAarray argsWrapper(strArgs,' ');
// Run the command
int status = posix_spawnp(&m_iProcessHandle, (char*)strProgramFilepath.c_str(), NULL, NULL, argsWrapper.m_pBuffer, NULL);
if (status == 0)
{
// Process created
cout << "posix_spawn process=" << m_iProcessHandle << " status=" << status << endl;
}
else
{
// Failed
cout << "posix_spawn: error=" << status << endl;
}
// If process created...
if(m_iProcessHandle != 0)
{
// If need to adjust nice...
if (iNice != 0)
{
// Change the nice
stringstream ss;
ss << "sudo renice -n " << iNice << " -p " << m_iProcessHandle;
_runCommand(ss.str());
}
}
else
{
// Call post-stop success=false
_postStop(false);
}
}
void CBackgroundTask::_runCommand(const string& strCommand)
{
// Diagnostics
cout << "Running command: " << COUT_GREEN << strCommand << endl << COUT_RESET;
// Run command
system(strCommand.c_str());
}
void CBackgroundTask::_lowLevel_cleanup(bool bKill)
{
// Shortcuts
if (m_iProcessHandle == 0)
return;
// Diagnostics
cout << "Cleaning up process " << m_iProcessHandle << endl;
// If killing...
if (bKill)
{
// Kill the process
kill(m_iProcessHandle, SIGKILL);
}
// Diagnostics
cout << "Waiting for process " << m_iProcessHandle << " to finish" << endl;
// Wait for the process to finish
int iStatus = 0;
int iResult = waitpid(m_iProcessHandle, &iStatus, 0);
// Diagnostics
cout << "waitpid: status=" << iStatus << " result=" << iResult << endl;
// Reset the process-handle
m_iProcessHandle = 0;
// Call post-stop with success
_postStop(true);
// Diagnostics
cout << "Process cleaned" << endl;
}
Until the parent process calls one of the wait() functions to get the exit status of a child, the child stays around as a zombie process. If you run ps during this time, you'll see that the process is still there in the Z state. So kill() returns 0 because the process exists.
If you don't need to get the child's status, see How can I prevent zombie child processes? for how you can make the child disappear immediately when it exits.
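If you want to both detect that the child has exited and avoid blocking, waitpid() with WNOHANG does it in one call. A sketch using the member names from the class above (an illustration, not the asker's final code; note it also reaps the child, so the handle is reset once the process is gone):
// Sketch: non-blocking liveness check. waitpid() with WNOHANG returns 0
// while the child is still running, the PID once it has exited (reaping
// the zombie in the same call), and -1 if there is no such child.
bool CBackgroundTask::IsRunning()
{
    if (m_iProcessHandle == 0)
        return false;

    int iStatus = 0;
    const pid_t iResult = waitpid(m_iProcessHandle, &iStatus, WNOHANG);
    if (iResult == 0)
        return true;              // child still running

    // Child exited (iResult == pid) or is gone (iResult == -1):
    // forget the handle so we don't wait on it again.
    m_iProcessHandle = 0;
    return false;
}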