I am spawning a process in my application:
int status = posix_spawnp(&m_iProcessHandle, (char*)strProgramFilepath.c_str(), NULL, NULL, argsWrapper.m_pBuffer, NULL);
When I want to see if the process is still running, I use kill:
int iReturn = kill(m_iProcessHandle,0);
But after the spawned process has finished its work, it hangs around: kill always returns 0, never -1. I am calling kill from within the code, but if I run kill from the command line there is no error either; the spawned process still exists.
Only when my application exits does the command-line kill return "No such process".
I can change this behavior in my code with this:
int iResult = waitpid(m_iProcessHandle, &iStatus, 0);
The call to waitpid cleans up the spawned process, and afterwards kill returns -1, but by then I already know the spawned process is dead.
And waitpid blocks my application!
How can I test whether a spawned process is still running without blocking my application?
UPDATE
Thanks for the help! I have implemented your advice and here is the result:
// background-task.cpp
//
#include <spawn.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <signal.h>
#include "background-task.h"
CBackgroundTask::CBackgroundTask()
{
// Initialize member variables
m_iProcessHandle = 0;
}
CBackgroundTask::~CBackgroundTask()
{
// Clean up (kill first)
_lowLevel_cleanup(true);
}
bool CBackgroundTask::IsRunning()
{
// Shortcuts
if (m_iProcessHandle == 0)
return false;
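// Poll without blocking: waitpid returns 0 while the child is still running,
// its PID once it has terminated (and been reaped), or -1 on error (e.g. already reaped)
int iStatus = 0;
int iResult = waitpid(m_iProcessHandle, &iStatus, WNOHANG);
return (iResult == 0);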
}
void CBackgroundTask::Wait()
{
// Wait (clean up without killing)
_lowLevel_cleanup(false);
}
void CBackgroundTask::Stop()
{
// Stop (kill and clean up)
_lowLevel_cleanup(true);
}
void CBackgroundTask::_start(const string& strProgramFilepath, const string& strArgs, int iNice /*=0*/)
{
// Call pre-start
_preStart();
// Split the args and build array of char-strings
CCharStringAarray argsWrapper(strArgs,' ');
// Run the command
int status = posix_spawnp(&m_iProcessHandle, (char*)strProgramFilepath.c_str(), NULL, NULL, argsWrapper.m_pBuffer, NULL);
if (status == 0)
{
// Process created
cout << "posix_spawn process=" << m_iProcessHandle << " status=" << status << endl;
}
else
{
// Failed: the PID stored by posix_spawnp is unspecified on error, so reset it
m_iProcessHandle = 0;
cout << "posix_spawn: error=" << status << endl;
}
// If process created...
if(m_iProcessHandle != 0)
{
// If need to adjust nice...
if (iNice != 0)
{
// Change the nice
stringstream ss;
ss << "sudo renice -n " << iNice << " -p " << m_iProcessHandle;
_runCommand(ss.str());
}
}
else
{
// Call post-stop success=false
_postStop(false);
}
}
void CBackgroundTask::_runCommand(const string& strCommand)
{
// Diagnostics
cout << "Running command: " << COUT_GREEN << strCommand << endl << COUT_RESET;
// Run command
system(strCommand.c_str());
}
void CBackgroundTask::_lowLevel_cleanup(bool bKill)
{
// Shortcuts
if (m_iProcessHandle == 0)
return;
// Diagnostics
cout << "Cleaning up process " << m_iProcessHandle << endl;
// If killing...
if (bKill)
{
// Kill the process
kill(m_iProcessHandle, SIGKILL);
}
// Diagnostics
cout << "Waiting for process " << m_iProcessHandle << " to finish" << endl;
// Wait for the process to finish
int iStatus = 0;
int iResult = waitpid(m_iProcessHandle, &iStatus, 0);
// Diagnostics
cout << "waitpid: status=" << iStatus << " result=" << iResult << endl;
// Reset the process-handle
m_iProcessHandle = 0;
// Call post-stop with success
_postStop(true);
// Diagnostics
cout << "Process cleaned" << endl;
}
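Until the parent process calls one of the wait() functions to get the exit status of a child, the child stays around as a zombie process. If you run ps during this time, you'll see that the process is still there in the Z state. So kill() returns 0 because the process exists. To check without blocking, call waitpid() with the WNOHANG option: it returns 0 while the child is still running and the child's PID once it has terminated, reaping it at the same time.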
If you don't need to get the child's status, see How can I prevent zombie child processes? for how you can make the child disappear immediately when it exits.
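A minimal sketch of that non-blocking check, assuming a POSIX system; `poll_child` and `ChildState` are hypothetical names, not part of the original code. It wraps waitpid() with WNOHANG and distinguishes all three results, which is where an IsRunning() check needs care: a return value equal to the PID means the child has just terminated and been reaped, not that it is still running.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of one non-blocking check on a child process.
enum class ChildState { Running, Terminated, Error };

// Hypothetical helper: checks child `pid` without blocking.
// If the child has terminated, waitpid() reaps it (so it no longer lingers
// as a zombie) and stores its wait status in *status.
ChildState poll_child(pid_t pid, int* status) {
    pid_t r;
    do {
        r = waitpid(pid, status, WNOHANG);
    } while (r == -1 && errno == EINTR);          // retry if interrupted by a signal

    if (r == 0) return ChildState::Running;       // child exists, no status yet
    if (r == pid) return ChildState::Terminated;  // status collected, zombie gone
    return ChildState::Error;  // e.g. ECHILD: not our child, or already reaped
}
```

Because a successful poll reaps the child, the caller should forget the PID afterwards (for example by resetting m_iProcessHandle to 0); sending kill() to a reaped PID could hit an unrelated process that has reused it.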
[Image: expected output]
My problem is that I need to write a program that accepts the names of 3 processes as command-line arguments. Each of these processes runs for (PID%10)*3+5 seconds and then terminates. After those 3 children have terminated, the parent process reschedules each child. When all children have been rescheduled 3 times, the parent terminates. I have used fork to create the three children but am struggling to make them exit according to that criterion.
#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <sys/wait.h>
using namespace std;
int main(){
int i;
int pid;
for(i=0;i<3;i++) // loop will run n times (n=3)
{
if(fork() == 0)
{
pid = getpid();
cout << "Process p" << i+1 << " pid:" << pid << " Started..." << endl;
exit(0);
}
}
for(int i=0;i<3;i++) // wait for each of the n=3 children
wait(NULL);
}
You can use sigtimedwait to wait for SIGCHLD or timeout.
Working example:
#include <cstdio>
#include <cstdlib>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
template<class... Args>
void start_child(unsigned max_runtime_sec, Args... args) {
// Block SIGCHLD so it stays pending until sigtimedwait() accepts it.
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGCHLD);
sigprocmask(SIG_BLOCK, &set, nullptr);
// Enable SIGCHLD with a no-op handler.
signal(SIGCHLD, [](int){});
pid_t child_pid = fork();
switch(child_pid) {
case -1:
std::abort();
case 0: {
// Child process.
execl(args..., static_cast<char*>(nullptr));
std::abort(); // only reached if execl() failed.
}
default: {
// Parent process.
timespec timeout = {};
timeout.tv_sec = max_runtime_sec;
siginfo_t info = {};
// Pass &info so that si_status is actually filled in.
int rc = sigtimedwait(&set, &info, &timeout);
if(SIGCHLD == rc) {
std::printf("child %u terminated in time with return code %d.\n", static_cast<unsigned>(child_pid), info.si_status);
}
else {
kill(child_pid, SIGTERM);
sigwaitinfo(&set, &info);
std::printf("child %u terminated on timeout with return code %d.\n", static_cast<unsigned>(child_pid), info.si_status);
}
// Accepting SIGCHLD does not reap the child; waitpid() does.
waitpid(child_pid, nullptr, 0);
}
}
}
int main() {
start_child(2, "/bin/sleep", "/bin/sleep", "10");
start_child(2, "/bin/sleep", "/bin/sleep", "1");
}
Output:
child 31548 terminated on timeout with return code 15.
child 31549 terminated in time with return code 0.
With these changes your program produces the desired output:
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <iostream>
using namespace std;
int main()
{
for (int round = 0; ++round <= 4; )
{
int i;
cout << "*** ROUND: " << round << " ***" << endl; // flush before fork() so children don't re-print buffered output
for (i=0; i<3; i++) // loop will run n times (n=3)
{
if (fork() == 0)
{
int pid = getpid();
cout << "Process p" << i+1 << " pid:" << pid << " started...\n";
unsigned int seconds = pid%10*3+5;
cout << "Process " << pid << " exiting after "
<< seconds-sleep(seconds) << " seconds\n";
exit(0);
}
}
while (i--) // loop will run n times (n=3)
{
int status;
cout << "Process " << wait(&status);
cout << " exited with status: " << status << endl;
}
}
}
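As Serge suggested, each child calls sleep() before exiting. sleep() pauses the process for the given number of seconds and returns the number of seconds left unslept (normally 0), so seconds-sleep(seconds) is the time actually slept.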
To get the actual status information, we call wait(&status) instead of wait(NULL).
We do all of this for the first scheduling round plus the 3 required rescheduling rounds.
I am having a bit of trouble getting waitpid to work. Could someone please explain what is wrong with this code?
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
using namespace std;
int main() {
string filename_memory;
decltype(fork()) pid;
if (!(pid = fork())) {
cout << "in child" << endl;
sleep(1);
}
else {
int status_child;
do {
waitpid(pid, &status_child, WNOHANG);
cout << "waiting for child to finish" << endl;
} while (!WIFEXITED(status_child));
cout << "child finished" << endl;
}
return 0;
}
The POSIX specification of wait() and waitpid() states:
If wait() or waitpid() returns because the status of a child process is available, these functions shall return a value equal to the process ID of the child process for which status is reported. If waitpid() was invoked with WNOHANG set in options, it has at least one child process specified by pid for which status is not available, and status is not available for any process specified by pid, 0 is returned. Otherwise, (pid_t)-1 shall be returned, and errno set to indicate the error.
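This means that the status_child variable has no meaning until waitpid returns the pid of the child. While the child is still running, waitpid(..., WNOHANG) returns 0 and does not store a status, so the loop tests WIFEXITED on an uninitialized value.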
You can fix this by applying these changes:
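int ret;
do {
ret = waitpid(pid, &status_child, WNOHANG); // 0 while the child is still running
if (ret == 0) {
cout << "waiting for child to finish" << endl;
usleep(100000); // sleep 100 ms instead of spinning
}
} while (ret == 0);
if (ret == pid && WIFEXITED(status_child))
cout << "child finished" << endl;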
I wrote a helper function to start a process using fork() and execv() inspired by this answer. It is used to start e.g. mysqldump to make a database backup.
The code works totally fine in a couple of different locations with different programs.
Now I have hit one configuration where it fails:
It is a call to systemctl to stop a unit. Running systemctl works, the unit is stopped. But in the intermediate process, when wait()ing for the child process, wait() hangs until the timeout process ends.
If I use kill() to check whether the worker process has finished, I can tell that it has.
Important: the program does not misbehave or segfault; the only problem is that wait() does not signal the end of the worker process!
Is there anything in my code (see below) that is incorrect that could trigger that behavior?
I've read Threads and fork(): think twice before mixing them but I cannot find anything in there that relates to my problem.
What's strange:
Deep, deep, deep in the program JSON-RPC is used. If I deactivate the code using the JSON-RPC everything works fine!?
Environment:
The program that uses the function is a multi-threaded application. Signals are blocked in all threads. The main thread handles signals via sigtimedwait().
Code (production code in which logging got traded for output via std::cout) with sample main function:
#include <iostream>
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <unistd.h>
#include <sys/wait.h>
namespace {
bool checkStatus(const int status) {
return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
}
}
bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
auto result = true;
const pid_t intermediatePid = fork();
if(intermediatePid == 0) {
// intermediate process
std::cout << "Intermediate process: Started (" << getpid() << ")." << std::endl;
const pid_t workerPid = fork();
if(workerPid == 0) {
// worker process
if(fileDescriptor) {
std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
if(-1 == dupResult) {
std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
_exit(EXIT_FAILURE);
}
}
execv(path, const_cast<char**>(argv));
std::cout << "Intermediate process: Worker failed!" << std::endl;
_exit(EXIT_FAILURE);
} else if(-1 == workerPid) {
std::cout << "Intermediate process: Starting worker failed!" << std::endl;
_exit(EXIT_FAILURE);
}
const pid_t timeoutPid = fork();
if(timeoutPid == 0) {
// timeout process
std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
sleep(timeoutInSeconds);
std::cout << "Timeout process: Finished." << std::endl;
_exit(EXIT_SUCCESS);
} else if(-1 == timeoutPid) {
std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
kill(workerPid, SIGKILL);
std::cout << "Intermediate process: Finished." << std::endl;
_exit(EXIT_FAILURE);
}
// ---------------------------------------
// This code is only used for double checking if the worker is still running.
// The if condition never evaluated to true in my tests.
const auto killResult = kill(workerPid, 0);
if((-1 == killResult) && (ESRCH == errno)) {
std::cout << "Intermediate process: Worker is not running." << std::endl;
}
// ---------------------------------------
std::cout << "Intermediate process: Waiting for child processes." << std::endl;
int status = -1;
const pid_t exitedPid = wait(&status);
// ---------------------------------------
// This code is only used for double checking if the worker is still running.
// The if condition evaluates to true in the case of an error.
const auto killResult2 = kill(workerPid, 0);
if((-1 == killResult2) && (ESRCH == errno)) {
std::cout << "Intermediate process: Worker is not running." << std::endl;
}
// ---------------------------------------
std::cout << "Intermediate process: Child process finished. Status: " << status << "." << std::endl;
if(exitedPid == workerPid) {
std::cout << "Intermediate process: Killing timeout process." << std::endl;
kill(timeoutPid, SIGKILL);
} else {
std::cout << "Intermediate process: Killing worker process." << std::endl;
kill(workerPid, SIGKILL);
std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
wait(nullptr);
std::cout << "Intermediate process: Finished." << std::endl;
_exit(EXIT_FAILURE);
}
std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
wait(nullptr);
std::cout << "Intermediate process: Finished." << std::endl;
_exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);
} else if(-1 == intermediatePid) {
// error
std::cout << "Parent process: Error starting intermediate process!" << std::endl;
result = false;
} else {
// parent process
std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
processId = intermediatePid;
}
return(result);
}
bool waitForProcess(const pid_t processId) {
int status = 0;
const auto waitResult = waitpid(processId, &status, 0);
auto result = false;
if(waitResult == processId) {
result = checkStatus(status);
}
return(result);
}
int main() {
pid_t pid = 0;
const char* const path = "/bin/ls";
const char* argv[] = { "/bin/ls", "--help", nullptr };
const unsigned int timeoutInS = 5;
const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
if(startResult) {
const auto waitResult = waitForProcess(pid);
std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
} else {
std::cout << "startProcess failed!" << std::endl;
}
}
Edit
The expected output should contain
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: 0.
Intermediate process: Killing timeout process.
In the case of error the output looks like this
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: -1.
Intermediate process: Killing worker process.
When you run the sample code you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.
I found the problem:
Within the mongoose sources (JSON-RPC uses mongoose), in the function mg_start, I found the following code:
#if !defined(_WIN32) && !defined(__SYMBIAN32__)
// Ignore SIGPIPE signal, so if browser cancels the request, it
// won't kill the whole process.
(void) signal(SIGPIPE, SIG_IGN);
// Also ignoring SIGCHLD to let the OS to reap zombies properly.
(void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32
(void) signal(SIGCHLD, SIG_IGN);
has the effect that "if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD", as mentioned here in section 5.5, "Voodoo: wait and SIGCHLD".
This is also described in the wait(2) man page:
ERRORS [...] ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)
Stupid on my part not to check the return value correctly.
Before trying
if(exitedPid == workerPid) {
I should have checked that exitedPid is != -1.
If I do so, errno gives me ECHILD. Had I known that in the first place, I would have read the man page and probably found the problem faster...
Naughty of mongoose to change signal handling regardless of what the application wants. Additionally, mongoose does not restore the original signal handling when it is stopped with mg_stop.
Additional info:
The code that caused this problem was changed in mongoose in September 2013 with this commit.
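A small sketch that reproduces the effect on Linux, not taken from the original post: `wait_errno_with_sigchld` is a hypothetical helper that sets the SIGCHLD disposition, forks a child that exits immediately, and reports what wait() did. With SIG_IGN the kernel reaps the child itself and wait() fails with ECHILD; with SIG_DFL it returns the child's PID. It also suggests one possible workaround, restoring SIG_DFL after a library has installed SIG_IGN, at the cost of that library no longer getting automatic zombie reaping.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: installs `action` (SIG_IGN or SIG_DFL) for SIGCHLD,
// forks a child that exits immediately and calls wait().
// Returns 0 if wait() reported the child, otherwise the errno from wait().
// The previous SIGCHLD disposition is restored before returning.
int wait_errno_with_sigchld(void (*action)(int)) {
    struct sigaction sa {}, old {};
    sa.sa_handler = action;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, &old);

    int result = 0;
    pid_t child = fork();
    if (child == 0) _exit(0);          // child: exit right away
    if (child == -1) {
        result = errno;                // fork failed
    } else {
        int status = 0;
        pid_t r = wait(&status);       // with SIG_IGN: -1 and ECHILD on Linux
        result = (r == child) ? 0 : errno;
    }

    sigaction(SIGCHLD, &old, nullptr); // restore the previous disposition
    return result;
}
```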
We faced a similar issue in our application: under heavy, repeated fork() calls, the child process never returned. You can monitor the PID of the child process and, if it does not return within an application-defined threshold, terminate it by sending it SIGKILL or SIGTERM.
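One way to implement that threshold, sketched under the assumption of a POSIX system; `wait_or_kill` is a hypothetical helper, not code from the original application. It polls with waitpid(WNOHANG) until a deadline, then sends SIGKILL (rather than SIGTERM, which the child could catch or ignore and leave the final blocking waitpid() hanging), and always reaps the child so no zombie is left behind.

```cpp
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: gives child `pid` up to `timeout` to terminate on its own.
// Returns true if it did; otherwise sends SIGKILL, reaps it, and returns false.
// Also returns false if waitpid() fails (e.g. the PID is not our child).
// When a wait status was collected, it is stored in *status.
bool wait_or_kill(pid_t pid, std::chrono::milliseconds timeout, int* status) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid) return true;                    // finished in time
        if (r == -1 && errno != EINTR) return false;  // not (or no longer) our child
        usleep(10000);                                // poll every 10 ms
    }
    kill(pid, SIGKILL);  // over the threshold; SIGKILL cannot be caught or ignored
    while (waitpid(pid, status, 0) == -1 && errno == EINTR) {
        // retry if interrupted; reaping prevents a zombie
    }
    return false;
}
```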
So, I have an application that I want to be notified of hotplug events on linux. Naturally, I looked at libudev and its API. I also found a useful tutorial on how to use select() with libudev. Following the tutorial and glancing at the API, I came up with this example program that waits for hotplug events and then outputs some basic information about the device that was just added or removed.
#include <poll.h>
#include <libudev.h>
#include <stdexcept>
#include <iostream>
udev* hotplug;
udev_monitor* hotplug_monitor;
void init()
{
// create the udev object
hotplug = udev_new();
if(!hotplug)
{
throw std::runtime_error("cannot create udev object");
}
// create the udev monitor
hotplug_monitor = udev_monitor_new_from_netlink(hotplug, "udev");
// start receiving hotplug events
udev_monitor_enable_receiving(hotplug_monitor);
}
void deinit()
{
// destroy the udev monitor
udev_monitor_unref(hotplug_monitor);
// destroy the udev object
udev_unref(hotplug);
}
void run()
{
// create the poll item
pollfd items[1];
items[0].fd = udev_monitor_get_fd(hotplug_monitor);
items[0].events = POLLIN;
items[0].revents = 0;
// while there are hotplug events to process
while(poll(items, 1, 50) > 0)
{
// XXX
std::cout << "hotplug[ " << items[0].revents << " ]" << std::endl;
// receive the relevant device
udev_device* dev = udev_monitor_receive_device(hotplug_monitor);
if(!dev)
{
// error receiving device, skip it
continue;
}
// XXX
std::cout << "hotplug[" << udev_device_get_action(dev) << "] ";
std::cout << udev_device_get_devnode(dev) << ",";
std::cout << udev_device_get_subsystem(dev) << ",";
std::cout << udev_device_get_devtype(dev) << std::endl;
// destroy the relevant device
udev_device_unref(dev);
// XXX
std::cout << "done" << std::endl;
// clear the revents
items[0].revents = 0;
}
}
int main(int args, char* argv[])
{
init();
while(true)
{
run();
}
deinit();
}
Well, it doesn't work. Here's the output I get when I plug in a USB mouse.
hotplug[ 1 ]
hotplug[add] /dev/bus/usb/008/002,usb,usb_device
done
hotplug[ 1 ]
hotplug[add]
At that point the program freezes and I have to stop it with Ctrl-C. What am I doing wrong?
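The program doesn't actually stop; it keeps running, but std::cout gets messed up when you try to print a NULL string (not all events have all properties). Inserting a null const char* into a stream is undefined behavior; with libstdc++ it sets badbit on std::cout, after which all further output is silently discarded. A fix is to make the three prints (devnode, subsystem, devtype) conditional.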
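A variant of that fix, assuming it is acceptable to print a placeholder instead of skipping the field: `or_placeholder` and `format_event` are hypothetical helpers that substitute "(none)" for any NULL that udev_device_get_devnode() and the other getters return. They take plain `const char*` arguments so the sketch compiles without libudev.

```cpp
#include <sstream>
#include <string>

// Returns s unchanged, or a placeholder when the udev getter returned NULL.
// Streaming a null const char* directly is undefined behavior.
const char* or_placeholder(const char* s, const char* placeholder = "(none)") {
    return s ? s : placeholder;
}

// Formats one event line like the question's run() loop does, but safely.
// In the real program the arguments would be udev_device_get_action(dev),
// udev_device_get_devnode(dev), udev_device_get_subsystem(dev) and
// udev_device_get_devtype(dev).
std::string format_event(const char* action, const char* devnode,
                         const char* subsystem, const char* devtype) {
    std::ostringstream out;
    out << "hotplug[" << or_placeholder(action) << "] "
        << or_placeholder(devnode) << ','
        << or_placeholder(subsystem) << ','
        << or_placeholder(devtype);
    return out.str();
}
```

In run(), the three std::cout lines would then become a single `std::cout << format_event(...) << std::endl;`.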
I am forking a number of processes and I want to measure how long it takes to complete the whole task, that is, until all forked processes have completed. How can I make the parent process wait until all child processes have terminated? I want to make sure that I stop the timer at the right moment.
Here is the code I use:
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>
using namespace std;
struct timeval first, second, lapsed;
struct timezone tzp;
int main(int argc, char* argv[])// query, file, num. of processes.
{
int pCount = 5; // process count
gettimeofday (&first, &tzp); //start time
pid_t* pID = new pid_t[pCount];
for(int indexOfProcess=0; indexOfProcess<pCount; indexOfProcess++)
{
pID[indexOfProcess]= fork();
if (pID[indexOfProcess] == 0) // child
{
// code only executed by child process
// magic here
// The End
exit(0);
}
else if (pID[indexOfProcess] < 0) // failed to fork
{
cerr << "Failed to fork" << endl;
exit(1);
}
else // parent
{
// if(indexOfProcess==pCount-1) and a loop with waitpid??
gettimeofday (&second, &tzp); //stop time
if (first.tv_usec > second.tv_usec)
{
second.tv_usec += 1000000;
second.tv_sec--;
}
lapsed.tv_usec = second.tv_usec - first.tv_usec;
lapsed.tv_sec = second.tv_sec - first.tv_sec;
cout << "Job performed in " <<lapsed.tv_sec << " sec and " << lapsed.tv_usec << " usec"<< endl << endl;
}
}//for
}//main
I'd move everything after the line "else //parent" down, outside the for loop. After the loop of forks, do another for loop with waitpid, then stop the clock and do the rest:
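for (int i = 0; i < pidCount; ++i) {
int status;
pid_t r;
// retry only if interrupted by a signal; any other error is reported below
while ((r = waitpid(pids[i], &status, 0)) == -1 && errno == EINTR);
if (r == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
cerr << "Process " << i << " (pid " << pids[i] << ") failed" << endl;
exit(1);
}
}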
gettimeofday (&second, &tzp); //stop time
I've assumed that if a child process fails to exit normally with a status of 0, then it didn't complete its work, and therefore the test has failed to produce valid timing data. Obviously, if the child processes are supposed to be killed by signals or to exit with non-zero statuses, you'll have to change the error check accordingly.
An alternative using wait:
while (true) {
int status;
pid_t done = wait(&status);
if (done == -1) {
if (errno == ECHILD) break; // no more child processes
} else {
if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
cerr << "pid " << done << " failed" << endl;
exit(1);
}
}
}
This one doesn't tell you which process in sequence failed, but if you care then you can add code to look it up in the pids array and get back the index.
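Putting that advice together, a self-contained sketch: fork all children first, then reap each one with waitpid(), and only then stop the clock. It swaps the question's gettimeofday() for std::chrono::steady_clock, a monotonic clock that is unaffected by changes to the system time; `child_work` is a hypothetical placeholder for the "magic here" part.

```cpp
#include <cerrno>
#include <chrono>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the "magic here" work done by each child.
void child_work(int index) {
    usleep(100000 + index * 10000);  // 100 ms plus 10 ms per index
}

// Forks `count` children, waits until every one of them has terminated,
// and returns the elapsed wall-clock time in seconds. Returns -1.0 if a
// fork failed or any child did not exit normally with status 0.
double time_children(int count) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<pid_t> pids;
    bool ok = true;

    for (int i = 0; i < count; ++i) {
        pid_t pid = fork();
        if (pid == 0) {   // child
            child_work(i);
            _exit(0);     // _exit: don't flush stdio buffers copied from the parent
        }
        if (pid < 0) { ok = false; break; }  // stop forking, still reap the rest
        pids.push_back(pid);
    }

    // The timer is stopped only after the last child has been reaped.
    for (pid_t pid : pids) {
        int status = 0;
        pid_t r;
        while ((r = waitpid(pid, &status, 0)) == -1 && errno == EINTR) {}
        if (r == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) ok = false;
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return ok ? elapsed.count() : -1.0;
}
```

The children run concurrently, so the result is roughly the duration of the slowest child rather than the sum of all of them.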
The simplest method is to do
while (wait(NULL) > 0) { /* no-op */ ; }
This loop also stops, silently, if wait() fails for some reason other than there being no children left (errno == ECHILD). So with some error checking, this becomes
pid_t pid;
[...]
do {
pid = wait(NULL);
if (pid == -1 && errno != ECHILD) {
perror("Error during wait()");
abort();
}
} while (pid > 0);
See also the manual page wait(2).
Call wait (or waitpid) in a loop until all children are accounted for.
In this case all processes are synchronizing anyway, but in general wait is preferred when more work can be done (e.g. a worker process pool), since it returns as soon as the first child changes state.
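A sketch of the worker-pool case, assuming a POSIX system; `run_pool` and `do_job` are hypothetical names. Because wait() returns as soon as any child terminates, a new job can start the moment a slot frees up, which a loop that waits for specific PIDs in a fixed order cannot do.

```cpp
#include <cerrno>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical job body standing in for the real per-process work.
static void do_job(int job) {
    usleep(20000 * (job % 3 + 1));  // 20, 40 or 60 ms
}

// Runs `jobs` jobs with at most `max_parallel` children at a time and
// returns how many of them exited normally with status 0.
int run_pool(int jobs, int max_parallel) {
    int started = 0, running = 0, succeeded = 0;
    while (started < jobs || running > 0) {
        // Start new children while there are free slots.
        while (started < jobs && running < max_parallel) {
            pid_t pid = fork();
            if (pid == 0) { do_job(started); _exit(0); }
            if (pid < 0) { jobs = started; break; }  // fork failed: start no more jobs
            ++started;
            ++running;
        }
        int status = 0;
        pid_t done = wait(&status);  // returns as soon as *any* child terminates
        if (done == -1) {
            if (errno == EINTR) continue;
            break;                   // ECHILD: nothing left to wait for
        }
        --running;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) ++succeeded;
    }
    return succeeded;
}
```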
I believe the wait system call will accomplish what you are looking for.
for (int i = 0; i < pidCount; i++) {
while (waitpid(pids[i], NULL, 0) > 0);
}
It waits in array order rather than in the order the children finish, but it still returns shortly after the last child dies.