I was writing code for a research program with the following requirements:
1. Execution of the main binary begins at main()
2. main() calls fork()
3. The child process runs a Linpack benchmark binary using execvp()
4. The parent process runs some monitoring code and waits for the child to exit.
The code is below:
main.cpp

extern ServerUncorePowerState * BeforeStates;
extern ServerUncorePowerState * AfterStates;

int main(int argc, char *argv[]) {
    power pwr;
    procstat st;
    membandwidth_t data;
    int sec_pause = 1; // sample every 1 second

    pid_t child_pid = fork();
    if (child_pid >= 0) { // fork successful
        if (child_pid == 0) { // child process
            int exec_status = execvp(argv[1], argv + 1);
            if (exec_status) {
                std::cerr << "execv failed with error "
                          << errno << " "
                          << strerror(errno) << std::endl;
            }
        } else { // parent process
            int status = 1;
            waitpid(child_pid, &status, WNOHANG);
            write_headers();
            pwr.init();
            st.init();
            init_bandwidth();
            while (status) {
                cout << " Printing status Value: " << status << endl;
                sleep(sec_pause);
                time_t now;
                time(&now);
                struct tm *tinfo;
                tinfo = localtime(&now);
                pwr.loop();
                st.loop();
                data = getbandwidth();
                write_samples(tinfo, pwr, st, data.read_bandwidth + data.write_bandwidth);
                waitpid(child_pid, &status, WNOHANG);
            }
            wait(&status); // wait for child to exit, and store its status
            //--------------------This code is not executed------------------------
            std::cout << "PARENT: Child's exit code is: "
                      << WEXITSTATUS(status)
                      << std::endl;
            delete[] BeforeStates;
            delete[] AfterStates;
        }
    } else {
        std::cerr << "fork failed" << std::endl;
        return 1;
    }
    return 0;
}
What I expect is that the child exits first and then the parent exits. Instead, for some unknown reason, the parent exits after about 16 minutes while the child is still running.
Normally it is said that when the parent exits, the child dies automatically.
What could be the reason for this strange behavior?
Normally it is said that when the parent exits, the child dies automatically.
Well, this is not always true; it depends on the system. When a parent process terminates, its child becomes an orphan process. On a Unix-like OS the orphan is given the init process as its new parent; this is called re-parenting and is handled automatically by the OS. On other types of OS, orphan processes are killed automatically by the system. You can find more details here.
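As a quick illustration of re-parenting (my own example, not code from the question): the parent below exits immediately, and the orphaned child then reports a new parent PID, typically 1 or the PID of a subreaper such as systemd:

    // reparent_demo.cpp -- my own minimal example, not code from the question.
    #include <iostream>
    #include <unistd.h>

    int main() {
        if (fork() == 0) {                  // child
            sleep(2);                       // give the parent time to exit
            std::cout << "child: my parent is now pid " << getppid()
                      << " (the original parent has exited)" << std::endl;
            return 0;
        }
        return 0;                           // parent exits immediately, orphaning the child
    }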
From the code snippet I would guess that the issue is in the wait(&status) statement. The loop ends (or is never entered) as soon as status becomes 0, which is what the waitpid(child_pid, &status, WNOHANG) calls store once the child exits with code 0. At that point the child has already been reaped, so the final wait(&status) waits on an already-terminated process, which may cause some issues.
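For comparison, a minimal sketch (my own suggestion, not the poster's code) of a monitoring loop driven by the return value of waitpid() rather than by status, with the child calling _exit() if execvp() fails. The sampling calls from the post (pwr.loop(), write_samples(), ...) are left as a comment because they belong to the poster's environment:

    // monitor_sketch.cpp -- a minimal sketch, not the poster's code.
    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <iostream>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char* argv[]) {
        if (argc < 2) { std::cerr << "usage: monitor <cmd> [args...]\n"; return 1; }

        const int sec_pause = 1;                       // sample every second
        pid_t child_pid = fork();
        if (child_pid == -1) { perror("fork"); return 1; }

        if (child_pid == 0) {                          // child: run the benchmark
            execvp(argv[1], argv + 1);
            std::cerr << "execvp failed: " << std::strerror(errno) << std::endl;
            _exit(127);                                // do NOT fall through into parent code
        }

        int status = 0;                                // parent: sample until the child exits
        for (;;) {
            pid_t r = waitpid(child_pid, &status, WNOHANG);
            if (r == child_pid) break;                 // child has exited: stop sampling
            if (r == -1) { perror("waitpid"); return 1; }
            // ... take one sample here: pwr.loop(); st.loop(); write_samples(...); ...
            sleep(sec_pause);
        }

        if (WIFEXITED(status))
            std::cout << "child exit code: " << WEXITSTATUS(status) << std::endl;
        else if (WIFSIGNALED(status))
            std::cout << "child killed by signal " << WTERMSIG(status) << std::endl;
        return 0;
    }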
Related
My program loops through a vector of strings and runs a program to do some work for each entry; each entry in the vector has its own associated program. The child processes are created in the loop using fork() and execv(), and the parent waits with waitpid() until each child has returned before continuing the loop. The child processes in my test environment (for now) each print a message, sleep(), and print another message.
The code works perfectly fine as long as execv() does not return -1 for any of the children (for example because the file wasn't found).
std::vector<std::string> files{ "foo", "bar", "foobar" };

for (size_t i = 0; i < files.size(); i++)
{
    pid_t pid_fork = fork();
    if (pid_fork == -1)
    {
        std::cout << "error: could not fork process" << std::endl;
    } else if (pid_fork > 0)
    {
        std::cout << "this is the parent" << std::endl;
        int pid_status;
        pid_t child_ret = waitpid(pid_fork, &pid_status, 0);
        std::cout << "child_ret: " << child_ret << std::endl;
        if (child_ret == -1)
        {
            std::cout << "error waiting for child " << pid_fork << std::endl;
        } else
        {
            if (WIFEXITED(pid_status))
            {
                std::cout << "child process exit status: " << WEXITSTATUS(pid_status) << std::endl;
                if (WEXITSTATUS(pid_status) == 0)
                {
                    std::cout << "updating db that file has been loaded: " << files[i] << std::endl;
                    /* some code to update a DB table */
                } else
                {
                    std::cout << "exit status = FAILED" << std::endl;
                }
            }
        }
    } else
    {
        std::cout << "this is the child" << std::endl;
        char *args[] = {NULL};
        if (execv(("./etl/etl_" + files[i]).c_str(), args) == -1)
        {
            std::cout << "could not load ./etl/etl_" << files[i] << std::endl;
            /* DB insert of failed "load" here */
            return EXIT_FAILURE;
        }
    }
}
/* some more code here writing stuff to a database before cleanup and returning from main */
Output:
this is the parent
this is the child
hello from etl_foo
etl_foo is done
child_ret: 77388
child process exit status: 0
this is the parent
this is the child
hello from etl_bar
etl_bar is done
child_ret: 77389
child process exit status: 0
this is the parent
this is the child
hello from etl_foobar
etl_foobar is done
child_ret: 77390
child process exit status: 0
If, however, I cause execv() to return -1 by deleting etl_foobar, the parent process seems to no longer wait for the child process to return:
this is the child
hello from etl_foo
etl_foo is done
child_ret: 77620
child process exit status: 0
this is the parent
this is the child
hello from etl_bar
etl_bar is done
child_ret: 77621
child process exit status: 0
this is the parent
this is the child
could not load ./etl_foobar
-> here the end of the parent code is reached, the DB is updated and the parent returns (?)
-> I expect the program to be done at this stage, however... this happens
child_ret: 77622
terminate called after throwing an instance of 'sql::SQLException'
what(): Lost connection to MySQL server during query
Aborted (core dumped)
It seems the code block after pid_t child_ret = waitpid(pid_fork, &pid_status, 0); is executed, which I don't understand. The parent has already returned, yet part of the parent's code is still executed and fails, because the DB connection object was deleted just before the parent returned.
The desired behavior is that upon discovering that execv() returned -1, the child process returns control to the waiting parent, which then finishes the remaining code and returns in an orderly manner, the same way it does when there is no error in execv(). Thank you!
Edit: User Sneftel pointed out that the child process did not actually exit in the failure case, which I have now changed. The parent process is therefore now waiting for all children to return, including those where execv fails.
Nevertheless, I still have the issue that whenever the child returns with EXIT_FAILURE, the following loop iteration runs up until the next DB insert is attempted, where I continue to get the "lost MySQL connection" error plus a core dump. I am not sure what the origin of this is.
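A likely explanation, offered as a suggestion rather than a confirmed diagnosis: when the child leaves main() with return EXIT_FAILURE, it runs static destructors and atexit handlers, including the destructor of the forked copy of the MySQL connection object, and tearing that down can break the connection the parent is still using. The usual pattern is to end a child whose exec failed with _exit(). A minimal sketch of the difference (not the actual program; Connection is a hypothetical stand-in for the real sql::Connection):

    // exit_vs_return.cpp -- minimal sketch (not the poster's program) showing why a child
    // that fails execv() should call _exit(): returning from main() runs the destructors
    // of objects the child inherited from the parent, e.g. a shared DB connection.
    #include <cstdlib>
    #include <iostream>
    #include <sys/wait.h>
    #include <unistd.h>

    struct Connection {                     // hypothetical stand-in for sql::Connection
        ~Connection() { std::cout << getpid() << ": closing connection\n"; }
    };

    Connection conn;                        // global, inherited by the forked child

    int main() {
        pid_t pid = fork();
        if (pid == 0) {
            char *args[] = { nullptr };
            execv("/nonexistent", args);    // fails on purpose
            std::cout << "child: execv failed\n";
            _exit(EXIT_FAILURE);            // with return EXIT_FAILURE; the child would also
                                            // print "closing connection", tearing down the
                                            // resource the parent still needs
        }
        waitpid(pid, nullptr, 0);
        std::cout << "parent: done\n";
        return 0;                           // the parent's destructor runs here, as intended
    }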
First off, allow me to describe my scenario:
I developed a supervisory program on Linux that forks and then, in the child process, uses execv() to launch my multi-threaded application. The supervisory program acts as a watchdog for the multi-threaded application: if the multi-threaded application does not send a SIGUSR1 signal to the supervisor within a period of time, the supervisory program kills the child using the pid_t from the fork() call and repeats the process.
Here is the code for the Supervisory Program:

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <iostream>
#include <cerrno>
#include <ctime>      // time_t, time(), difftime()

time_t heartbeatTime;

void signalHandler(int sigNum)
{
    //std::cout << "Signal (" << sigNum << ") received.\n";
    time(&heartbeatTime);
}

int main(int argc, char *argv[])
{
    pid_t cpid, ppid;
    int result = 0;
    bool programLaunched = false;
    time_t now;
    double timeDiff;
    int error;
    char ParentID[25];
    char *myArgv[2];

    // Get the Parent Process ID
    ppid = ::getpid();
    // Initialize the Child Process ID
    cpid = 0;
    // Copy the PID into the char array
    sprintf(ParentID, "%i", ppid);
    // Set up the array to pass to the Program
    myArgv[0] = ParentID;
    myArgv[1] = 0;
    // Print out the parent PID
    std::cout << "Parent ID: " << myArgv[0] << "\n";
    // Register for the SIGUSR1 signal
    signal(SIGUSR1, signalHandler);
    // Register the SIGCHLD so the child processes exit fully
    signal(SIGCHLD, SIG_IGN);
    // Initialize the Heart Beat time
    time(&heartbeatTime);

    // Loop forever and ever, amen.
    while (1)
    {
        // Check to see if the program has been launched
        if (programLaunched == false)
        {
            std::cout << "Forking the process\n";
            // Fork the process to launch the application
            cpid = fork();
            std::cout << "Child PID: " << cpid << "\n";
        }
        // Check if the fork was successful
        if (cpid < 0)
        {
            std::cout << "Error in forking.\n";
            // Error in forking
            programLaunched = false;
        }
        else if (cpid == 0)
        {
            // Check if we need to launch the application
            if (programLaunched == false)
            {
                // Send a message to the output
                std::cout << "Launching Application...\n";
                // Launch the Application
                result = execv("./MyApp", myArgv);
                std::cout << "execv result = " << result << "\n";
                // Check if the program launch has failed
                if (result != -1)
                {
                    // Indicate the program has been launched
                    programLaunched = true;
                    // Exit the child process
                    return 0;
                }
                else
                {
                    std::cout << "Child process terminated; bad execv\n";
                    // Flag that the program has not been launched
                    programLaunched = false;
                    // Exit the child process
                    return -1;
                }
            }
        }
        // In the Parent Process
        else
        {
            // Get the current time
            time(&now);
            // Get the time difference between the program heartbeat time and current time
            timeDiff = difftime(now, heartbeatTime);
            // Check if we need to restart our application
            if ((timeDiff > 60) && (programLaunched == true))
            {
                std::cout << "Killing the application\n";
                // Kill the child process
                kill(cpid, SIGINT);
                // Indicate that the process was ended
                programLaunched = false;
                // Reset the Heart Beat time
                time(&heartbeatTime);
                return -1;
            }
            // Check to see if the child application is running
            if (kill(cpid, 0) == -1)
            {
                // Get the Error
                error = errno;
                // Check if the process is running
                if (error == ESRCH)
                {
                    std::cout << "Process is not running; start it.\n";
                    // Process is not running.
                    programLaunched = false;
                    return -1;
                }
            }
            else
            {
                // Child process is running
                programLaunched = true;
            }
        }
        // Give the process some time off.
        sleep(5);
    }
    return 0;
}
This approach worked fairly well until I ran into a problem with the library I was using. It didn't like all of the killing and basically ended up tying up my Ethernet port in an endless loop, never releasing it - not good.
I then tried an alternative method: I modified the supervisory program so that it exits if it has to kill the multi-threaded application, and I created a script, run from crontab, that launches the supervisory program. I used a shell script that I found on Stack Overflow.
#!/bin/bash
#make-run.sh
#make sure a process is always running.

export DISPLAY=:0   #needed if you are running a simple gui app.

process=YourProcessName
makerun="/usr/bin/program"

if ps ax | grep -v grep | grep $process > /dev/null
then
    exit
else
    $makerun &
fi

exit
I added it to crontab to run every minute. That was very helpful: it restarted the supervisory program, which in turn restarted the multi-threaded application. However, I noticed a problem with multiple instances of the multi-threaded application being launched, and I'm not really sure why this was happening.
I know I'm really hacking this up but I'm backed into a corner with this implementation. I'm just trying to get it to work.
Suggestions?
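For reference, a minimal sketch of one way the supervisor loop could be structured (my own illustration, not code from the post; "./MyApp" and the 60-second heartbeat follow the original): instead of ignoring SIGCHLD and probing the child with kill(cpid, 0), the parent reaps the child with waitpid(..., WNOHANG) and re-forks only when the single known child has gone away, which also avoids ever having two instances running.

    // supervisor_sketch.cpp -- a minimal sketch, not the poster's code.
    #include <cstdio>      // perror
    #include <ctime>       // time, difftime
    #include <signal.h>    // signal, kill, SIGUSR1, SIGINT
    #include <sys/wait.h>  // waitpid
    #include <unistd.h>    // fork, execl, sleep, _exit

    static time_t heartbeatTime;

    static void onHeartbeat(int) { time(&heartbeatTime); }

    int main() {
        signal(SIGUSR1, onHeartbeat);                  // application heartbeats via SIGUSR1
        pid_t cpid = -1;

        for (;;) {
            if (cpid <= 0) {                           // (re)launch the application
                time(&heartbeatTime);
                cpid = fork();
                if (cpid == 0) {
                    execl("./MyApp", "./MyApp", (char*)NULL);
                    _exit(127);                        // exec failed
                }
                if (cpid == -1) { perror("fork"); sleep(5); continue; }
            }

            int status = 0;
            if (waitpid(cpid, &status, WNOHANG) == cpid) {   // child exited on its own
                cpid = -1;
            } else if (difftime(time(NULL), heartbeatTime) > 60) {
                kill(cpid, SIGINT);                    // heartbeat missed: restart it
                waitpid(cpid, &status, 0);             // reap it before re-forking
                cpid = -1;
            }
            sleep(5);
        }
    }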
I wrote a helper function to start a process using fork() and execv() inspired by this answer. It is used to start e.g. mysqldump to make a database backup.
The code works totally fine in a couple of different locations with different programs.
Now I have hit one configuration where it fails:
It is a call to systemctl to stop a unit. Running systemctl works and the unit is stopped. But in the intermediate process, when wait()ing for the child process, wait() hangs until the timeout process ends.
If I check with kill() whether the worker process has finished, I can tell that it has.
Important: the program does not misbehave or segfault, other than the fact that wait() does not signal the end of the worker process!
Is there anything in my code (see below) that is incorrect that could trigger that behavior?
I've read Threads and fork(): think twice before mixing them but I cannot find anything in there that relates to my problem.
What's strange:
Deep, deep, deep in the program, JSON-RPC is used. If I deactivate the code that uses JSON-RPC, everything works fine!?
Environment:
The program that uses the function is a multi-threaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().
Code (production code in which logging got traded for output via std::cout) with sample main function:
#include <cerrno>     // errno, ESRCH
#include <csignal>    // kill
#include <iostream>
#include <unistd.h>
#include <sys/wait.h>

namespace {
    bool checkStatus(const int status) {
        return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
    }
}

bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
    auto result = true;
    const pid_t intermediatePid = fork();
    if(intermediatePid == 0) {
        // intermediate process
        std::cout << "Intermediate process: Started (" << getpid() << ")." << std::endl;
        const pid_t workerPid = fork();
        if(workerPid == 0) {
            // worker process
            if(fileDescriptor) {
                std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
                const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
                if(-1 == dupResult) {
                    std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
                    _exit(EXIT_FAILURE);
                }
            }
            execv(path, const_cast<char**>(argv));
            std::cout << "Intermediate process: Worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        } else if(-1 == workerPid) {
            std::cout << "Intermediate process: Starting worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        }
        const pid_t timeoutPid = fork();
        if(timeoutPid == 0) {
            // timeout process
            std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
            sleep(timeoutInSeconds);
            std::cout << "Timeout process: Finished." << std::endl;
            _exit(EXIT_SUCCESS);
        } else if(-1 == timeoutPid) {
            std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition never evaluated to true in my tests.
        const auto killResult = kill(workerPid, 0);
        if((-1 == killResult) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------
        std::cout << "Intermediate process: Waiting for child processes." << std::endl;
        int status = -1;
        const pid_t exitedPid = wait(&status);
        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition evaluates to true in the case of an error.
        const auto killResult2 = kill(workerPid, 0);
        if((-1 == killResult2) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------
        std::cout << "Intermediate process: Child process finished. Status: " << status << "." << std::endl;
        if(exitedPid == workerPid) {
            std::cout << "Intermediate process: Killing timeout process." << std::endl;
            kill(timeoutPid, SIGKILL);
        } else {
            std::cout << "Intermediate process: Killing worker process." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
            wait(nullptr);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
        wait(nullptr);
        std::cout << "Intermediate process: Finished." << std::endl;
        _exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);
    } else if(-1 == intermediatePid) {
        // error
        std::cout << "Parent process: Error starting intermediate process!" << std::endl;
        result = false;
    } else {
        // parent process
        std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
        processId = intermediatePid;
    }
    return(result);
}

bool waitForProcess(const pid_t processId) {
    int status = 0;
    const auto waitResult = waitpid(processId, &status, 0);
    auto result = false;
    if(waitResult == processId) {
        result = checkStatus(status);
    }
    return(result);
}

int main() {
    pid_t pid = 0;
    const char* const path = "/bin/ls";
    const char* argv[] = { "/bin/ls", "--help", nullptr };
    const unsigned int timeoutInS = 5;
    const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
    if(startResult) {
        const auto waitResult = waitForProcess(pid);
        std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
    } else {
        std::cout << "startProcess failed!" << std::endl;
    }
}
Edit
The expected output should contain
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: 0.
Intermediate process: Killing timeout process.
In the case of the error, the output looks like this:
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: -1
Intermediate process: Killing worker process.
When you run the sample code you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.
I found the problem:
Within the mongoose sources (the JSON-RPC library uses mongoose), in the function mg_start, I found the following code:
#if !defined(_WIN32) && !defined(__SYMBIAN32__)
// Ignore SIGPIPE signal, so if browser cancels the request, it
// won't kill the whole process.
(void) signal(SIGPIPE, SIG_IGN);
// Also ignoring SIGCHLD to let the OS to reap zombies properly.
(void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32
The line
(void) signal(SIGCHLD, SIG_IGN);
causes the following behavior:
"if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."
as mentioned here in section 5.5, Voodoo: wait and SIGCHLD.
This is also described in the man page for WAIT(2):
ERRORS [...]
ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)
Stupid of me not to check the return value correctly. Before testing
if(exitedPid == workerPid) {
I should have checked that exitedPid != -1. If I do so, errno gives me ECHILD. Had I known that in the first place, I would have read the man page and probably found the problem faster...
It is naughty of mongoose to mess with signal handling regardless of what the application wants to do about it. Additionally, mongoose does not revert its changes to the signal handling when it is stopped with mg_stop.
Additional info:
The code that caused this problem was changed in mongoose in September 2013 with this commit.
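As an illustration of the two takeaways (my own sketch, not the poster's fix): always check wait()'s return value, and note that restoring the default SIGCHLD disposition with signal(SIGCHLD, SIG_DFL) before relying on wait() makes it behave normally again.

    // sigchld_sketch.cpp -- a minimal sketch (not the poster's code) of the ECHILD behavior
    // described above, and of the check that was missing.
    #include <cerrno>
    #include <cstring>
    #include <iostream>
    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        signal(SIGCHLD, SIG_IGN);         // what mongoose does behind the program's back
        // signal(SIGCHLD, SIG_DFL);      // restoring the default makes wait() behave normally

        if (fork() == 0) { _exit(0); }    // child exits immediately

        int status = 0;
        const pid_t exitedPid = wait(&status);
        if (exitedPid == -1) {
            // With SIGCHLD ignored, the kernel reaps children itself and wait() returns -1
            // with errno set to ECHILD once all children are gone.
            std::cout << "wait() failed: " << std::strerror(errno) << std::endl;
        } else {
            std::cout << "reaped pid " << exitedPid << " with status " << status << std::endl;
        }
    }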
We faced a similar issue in our application: under heavy load, with repeated child-process fork()s, a child process would sometimes never return. One can monitor the PID of the child process and, if it has not returned after a particular application-defined threshold, terminate that process by sending a kill/TERM signal.
I'm trying to compute the dot product of two vectors, with each process taking a separate range of start and end indices. What seems to be happening is that the code gets executed twice.
void DotProduct::MultiProcessDot()
{
    pid_t pID, w;
    int status;
    unsigned int index = mNumberOfValuesPerVector / 2;

    if((pID = fork()) < 0){
        cout << "fork error" << endl;
    }
    else if(pID == 0){ /* child */
        ProcessDotOperation(0, index);
        exit(EXIT_FAILURE);
    }
    else{ /* parent */
        ProcessDotOperation(index, mNumberOfValuesPerVector);
        w = waitpid(pID, &status, WNOHANG);
        if(w == 0){
            cout << "alive" << endl;
        }else if(w == -1){
            cout << "dead" << endl;
        }
    }
}
ProcessDotOperation calculates the dot product using shared memory with sem_wait() and sem_post(). What seems to be happening is this:
Parent runs ProcessDotOperation
"alive" is printed
Parent runs ProcessDotOperation
"alive" is printed
Program continues execution (going on to other functions)
Child runs ProcessDotOperation
Child runs ProcessDotOperation
Note: I may have a fundamental misunderstanding of what's happening, so by parent and child, I'm referring to the comments in the code as to which process I think is running.
How do I make it such that the child runs ProcessDotOperation once, the parent runs ProcessDotOperation once, and then the program continues operation?
Any help is appreciated.
Edit
If I print before the fork(), and change w = waitpid(pID, &status, WNOHANG); to w = waitpid(pID, &status, 0);, here's the output:
forking
parent
child
forking
parent
child
continued execution...
Here's the code of ProcessDotOperation:
void DotProduct::ProcessDotOperation(unsigned int startIndex, unsigned int endIndex)
{
    for(unsigned int i = startIndex; i < endIndex; i++){
        sem_wait(mSem);
        mShmProductId += mVectors[0][i] * mVectors[1][i];
        cout << startIndex << " " << endIndex << " " << i << endl;
        sem_post(mSem);
    }
}
Someone is calling MultiProcessDot a second time.
I think you need a loop around the waitpid(). As it is written, you wait once, without hanging around for a dead child, returning immediately if the child is not yet dead. This allows the parent to go on with other activities, of course.
I'm not sure it's a complete explanation of what you observe, but we can't see your trace code. Print things like the process's PID with each message.
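A minimal sketch of the parent side with a blocking wait (my own illustration of the suggestions above, assuming the poster's DotProduct class and the usual headers <sys/wait.h>, <unistd.h>, <cerrno>): getpid() is printed with each message, and waitpid() is called with 0 instead of WNOHANG, so the parent does not move on until the child's half of the dot product is done.

    // Sketch of MultiProcessDot with a blocking wait; DotProduct and its members are the
    // poster's, so this is a fragment rather than a standalone program.
    void DotProduct::MultiProcessDot()
    {
        unsigned int index = mNumberOfValuesPerVector / 2;
        pid_t pID = fork();
        if (pID < 0) {
            std::cout << getpid() << ": fork error" << std::endl;
        } else if (pID == 0) { /* child: lower half */
            ProcessDotOperation(0, index);
            _exit(EXIT_SUCCESS);              // leave immediately, don't return to the caller
        } else { /* parent: upper half */
            ProcessDotOperation(index, mNumberOfValuesPerVector);
            int status = 0;
            while (waitpid(pID, &status, 0) == -1 && errno == EINTR) {
                /* interrupted by a signal: retry */
            }
            std::cout << getpid() << ": child finished, status " << status << std::endl;
        }
    }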
I am forking a number of processes and I want to measure how long it takes to complete the whole task, that is, until all of the forked processes have completed. How can I make the parent process wait until all child processes have terminated? I want to make sure that I stop the timer at the right moment.
Here is the code I use:
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>     // exit
#include <unistd.h>    // fork
#include <sys/time.h>
#include <sys/wait.h>

using namespace std;

struct timeval first, second, lapsed;
struct timezone tzp;

int main(int argc, char* argv[]) // query, file, num. of processes.
{
    int pCount = 5; // process count
    gettimeofday(&first, &tzp); // start time
    pid_t* pID = new pid_t[pCount];

    for(int indexOfProcess=0; indexOfProcess<pCount; indexOfProcess++)
    {
        pID[indexOfProcess] = fork();

        if (pID[indexOfProcess] == 0) // child
        {
            // code only executed by child process
            // magic here
            // The End
            exit(0);
        }
        else if (pID[indexOfProcess] < 0) // failed to fork
        {
            cerr << "Failed to fork" << endl;
            exit(1);
        }
        else // parent
        {
            // if(indexOfProcess==pCount-1) and a loop with waitpid??
            gettimeofday(&second, &tzp); // stop time
            if (first.tv_usec > second.tv_usec)
            {
                second.tv_usec += 1000000;
                second.tv_sec--;
            }
            lapsed.tv_usec = second.tv_usec - first.tv_usec;
            lapsed.tv_sec = second.tv_sec - first.tv_sec;
            cout << "Job performed in " << lapsed.tv_sec << " sec and " << lapsed.tv_usec << " usec" << endl << endl;
        }
    } // for
} // main
I'd move everything after the line "else //parent" down, outside the for loop. After the loop of forks, do another for loop with waitpid, then stop the clock and do the rest:
for (int i = 0; i < pidCount; ++i) {
    int status;
    while (-1 == waitpid(pids[i], &status, 0));
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        cerr << "Process " << i << " (pid " << pids[i] << ") failed" << endl;
        exit(1);
    }
}
gettimeofday(&second, &tzp); // stop time
I've assumed that if the child process fails to exit normally with a status of 0, then it didn't complete its work, and therefore the test has failed to produce valid timing data. Obviously if the child processes are supposed to be killed by signals, or exit non-0 return statuses, then you'll have to change the error check accordingly.
An alternative using wait:
while (true) {
    int status;
    pid_t done = wait(&status);
    if (done == -1) {
        if (errno == ECHILD) break; // no more child processes
    } else {
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            cerr << "pid " << done << " failed" << endl;
            exit(1);
        }
    }
}
This one doesn't tell you which process in sequence failed, but if you care then you can add code to look it up in the pids array and get back the index.
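For completeness, a small sketch of the lookup mentioned above (my own addition; pids and pidCount follow the naming used in the snippets above):

    #include <algorithm>    // std::find
    #include <sys/types.h>  // pid_t

    // Returns the index of 'done' in pids[0..pidCount), or -1 if it isn't one of ours.
    int indexOfPid(const pid_t* pids, int pidCount, pid_t done) {
        const pid_t* end = pids + pidCount;
        const pid_t* where = std::find(pids, end, done);
        return (where == end) ? -1 : static_cast<int>(where - pids);
    }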
The simplest method is to do
while(wait(NULL) > 0) { /* no-op */ ; }
This will not work if wait() fails for some reason other than the fact that there are no children left. So with some error checking, this becomes
int status;
[...]
do {
    status = wait(NULL);
    if(status == -1 && errno != ECHILD) {
        perror("Error during wait()");
        abort();
    }
} while (status > 0);
See also the manual page wait(2).
Call wait (or waitpid) in a loop until all children are accounted for.
In this case all the processes are synchronizing anyway, but in general wait is preferred when more work can be done in the meantime (e.g. a worker process pool), since it returns as soon as any child's state changes.
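A small sketch of that pattern (my own illustration; the "jobs" are just children that sleep, the reaping loop is the point):

    // pool_sketch.cpp -- toy worker pool: wait() returns as soon as any child finishes,
    // so a new worker can be started immediately.
    #include <iostream>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        const int totalJobs = 8, maxWorkers = 3;
        int started = 0, active = 0;

        auto spawn = [&]() {                  // stand-in for fork()+execv() of a real worker
            pid_t pid = fork();
            if (pid == 0) { sleep(1); _exit(0); }
            if (pid > 0) { ++started; ++active; }
        };

        while (active > 0 || started < totalJobs) {
            while (active < maxWorkers && started < totalJobs) spawn();
            int status = 0;
            pid_t done = wait(&status);       // blocks until ANY child changes state
            if (done > 0) {
                --active;
                std::cout << "reaped pid " << done << std::endl;
            }
        }
    }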
I believe the wait system call will accomplish what you are looking for.
for (int i = 0; i < pidCount; i++) {
    while (waitpid(pids[i], NULL, 0) > 0);
}
It won't wait in the right order, but it will stop shortly after the last child dies.