I am using waitpid as follows:
waitpid(childPID, &status, WNOHANG);
This call sits inside an infinite loop in a program that forks when needed, and the parent waits for the child process to return. Recently I have run into a problem where the program exits after printing this to cerr:
waitpid: No child processes
This is always the last log line from the program before it crashes or exits. I know that it does not segfault, because I have a traceback function that prints the last 10 addresses the program accessed. So does this mean that the program exited the loop after finding that there is no child process? Or is there something sinister at work here?
I guess what is happening here is that the fork system call is failing due to a lack of available entries in the process table. Check the return value of fork and call perror when it is -1. I think the message should be "Resource temporarily unavailable" (errno EAGAIN).
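A minimal sketch of the check suggested above, assuming a POSIX system. The helper name run_child_and_poll is hypothetical and the child simply exits with a given code; the point is that both fork and waitpid report failure through -1 and errno, so perror shows whether fork failed (EAGAIN) or waitpid found no child (ECHILD).

```cpp
#include <cerrno>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits with `code`, then polls it
// with WNOHANG. Returns the child's exit code, or -1 if fork/waitpid failed.
int run_child_and_poll(int code) {
    pid_t child = fork();
    if (child == -1) {
        // fork failed, e.g. EAGAIN when a process limit has been reached.
        perror("fork");
        return -1;
    }
    if (child == 0) {
        _exit(code);  // child: terminate immediately with the requested code
    }

    int status = 0;
    for (;;) {
        pid_t r = waitpid(child, &status, WNOHANG);
        if (r == child) break;                    // child has been reaped
        if (r == 0) { usleep(1000); continue; }   // child is still running
        if (errno == EINTR) continue;             // interrupted, try again
        // ECHILD lands here: there is no such child left to wait for.
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling waitpid again on a child that has already been reaped is one way to get "waitpid: No child processes", so a loop like this should stop polling once waitpid has returned the child's PID.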
I need to terminate my child process when the parent exits abnormally. I found an answer to my requirement at the link below:
How can I cause a child process to exit when the parent does?
Answer from tomcri:
You can simply start a thread reading from System.in in the child process. If you are not writing to stdin of your child, nobody else will do and the thread will block "forever". But you will get an EOF (or an exception), if the parent process is killed or dies otherwise. So you can shutdown the child as well. (Child process started with java.lang.ProcessBuilder)
Actually, I tried to comment below this Answer but couldn't due to lack of enough reputation.
So my question is: how do I recognize that the character read from stdin is EOF? In my program, I also read data from stdin for other purposes, so I need to detect the EOF (or whatever characters arrive when the parent dies). What I see on stdin is the character ÿ (hex = 0xFFFF). I have read pages saying things like 'The highest possible valid code point in Unicode is U+10FFFF'. So what should I check for in order to exit?
My program is a Windows C++ program and supports Unicode characters. Is there any better way to understand this scenario?
I have a C++ process on Windows 2008 R2 with several threads in it. During the process's startup, there is a chance that one of the threads will exit. I have not found a way to detect what happens. Any suggestions?
Based on my investigation, the thread just exits without an exception. Accessing a null pointer can cause a similar issue, but I did not find such a place in the process. In fact, it would be better if the process just crashed, because then I could get a dump file; but nothing happens, just one thread exits.
I tried the User Mode Process Dumper tool, but it does not work on the Windows version that this process runs on.
I tried Process Monitor to check the thread exit event, but Process Monitor throws an exception when I try to reproduce the issue by starting the process again and again.
Thanks in advance.
Found the root cause at last: the string is accessed by more than one thread, and one thread just exits. String is not thread safe.
Process Monitor helped to get the thread exit call stack on a powerful host, which made the root cause clear.
Thanks all for your suggestions.
How can I measure the memory used by a child process after I call fork and exec? Basically I want to be able to write code that corresponds to the following:
if (!fork()) {
    // run child process
    exec();
} else {
    while (child active) {
        print memory used by child
    }
}
There are two things that I do not know here. How can I see if the child process has finished running? Will I have to use some sort of process-level mutual exclusion? If yes, what structure can I use? Can I just use the OS filesystem for this purpose?
Also, I was looking at the answer to "Differences between fork and exec". In paragraph 8 the author says copy-on-write is useful when a process calls fork without calling exec. But isn't this true more in the case when the parent calls fork and does not call exec? When the parent calls exec, the virtual address space of the child is wiped out and replaced with the one resulting from the new program loaded into memory, correct?
Thank you!
Regarding the above comment chain which I evidently can't reply to because I don't have 50 rep:
The return value of fork in the parent, if successful, is the PID of the child. You should save the return value so you can 1. wait on the correct child (if you have more than one), and 2. see whether fork failed (in which case you probably don't want to loop until the child exits).
You could also use signals to figure out when the child dies instead of continuously trying to wait with the WNOHANG option. The kernel sends SIGCHLD to the parent when a child terminates (or stops); if the child has died, you can then reap it with waitpid and stop your loop. See:
man 7 signal
man 2 sigaction
for more information on this.
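As a rough illustration of the signal-based approach, here is a sketch that installs a SIGCHLD handler with sigaction, sleeps in sigsuspend until the signal arrives, and then reaps the child. The names on_sigchld, g_child_exited and wait_with_sigchld are hypothetical, and the child just exits with a given code instead of calling exec.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Set by the handler; volatile sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_child_exited = 0;

static void on_sigchld(int) { g_child_exited = 1; }

// Hypothetical helper: fork a child that exits with `code`, sleep until
// SIGCHLD arrives, then reap it. Returns the exit code, or -1 on error.
int wait_with_sigchld(int code) {
    g_child_exited = 0;

    struct sigaction sa {};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;  // ignore stop notifications
    if (sigaction(SIGCHLD, &sa, nullptr) == -1) return -1;

    // Block SIGCHLD so it cannot slip in between the flag check and sigsuspend.
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t child = fork();
    if (child == -1) {
        sigprocmask(SIG_SETMASK, &old, nullptr);
        return -1;
    }
    if (child == 0) _exit(code);

    // Wait with SIGCHLD unblocked; sigsuspend swaps the mask atomically.
    sigset_t wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!g_child_exited) sigsuspend(&wait_mask);
    sigprocmask(SIG_SETMASK, &old, nullptr);

    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The handler only sets a flag and the actual waitpid call happens in normal code, which keeps the handler trivially async-signal-safe. In a real program the parent would do useful work instead of sleeping in sigsuspend.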
Regarding memory usage, it seems you want either /proc/[pid]/statm or /proc/[pid]/stat.
man 5 proc will give you all the information about what is in those files.
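For example, a small Linux-only sketch that reads the resident set size from /proc/[pid]/statm could look like this. The helper name resident_bytes is hypothetical; per man 5 proc the first two fields of statm are the total program size and the resident set size, both counted in pages.

```cpp
#include <fstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: resident set size of `pid` in bytes, or -1 if the
// /proc entry cannot be read (for example because the process is gone).
long resident_bytes(pid_t pid) {
    std::ifstream statm("/proc/" + std::to_string(pid) + "/statm");
    long total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages)) return -1;
    // statm counts pages, so convert using the system page size.
    return resident_pages * sysconf(_SC_PAGESIZE);
}
```

The parent could call resident_bytes(child_pid) inside its polling loop and stop once waitpid reports that the child has exited.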
I have been running a program I developed in C++ with OpenMPI version 1.6.5 in Ubuntu 14.04.
Everything was working fine (i.e. the program was executing as it was supposed to) until I quit it using Ctrl+C at a point, as I realised I ran it with a wrong input value and could not be bothered to wait for it to complete (big, rookie mistake!).
After I changed the variable value and recompiled the program (which went fine), I tried to run the program again with mpirun -np 8 program_name. However, OpenMPI returned the following error:
mpirun has exited due to process rank 5 with PID 3363 on
node Hal exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
I tried to recompile the program and run it multiple times, but always got the same error with a different process rank, depending on whichever was called first (I assume). Since I think the problem is related to my "not so nice" way of quitting the previous run, I restarted the computer, but the error is still there.
Is there a command to shut down all MPI runs or a file to clear to see if that was the problem?
Thank you very much in advance!
I am invoking several processes from my main and I can get the PIDs of those processes. Now I want to wait until all these processes have finished and then clear the shared memory block from my parent process. Also, if any of the processes has not finished and has segfaulted, I want to kill that process. So how can I check, from the PIDs in my parent process code, whether a process finished without any error or broke down because of a runtime error or any other cause, so that I can kill that process?
Also, what if I want to see the status of some other process which is not a child process but whose PID is known?
Code is appreciated (I am not looking for a script but code).
Look into waitpid(2) with WNOHANG option. Check the "fate" of the process with macros in the manual page, especially WIFSIGNALED().
Also, a segfaulted process is already dead (unless SIGSEGV is specifically handled by the process, which is usually not a good idea).
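A sketch of how the status macros from the waitpid(2) manual page can be used, assuming a POSIX system. The helpers describe_status and run_and_describe are hypothetical names; the demo child either exits normally or kills itself with a signal so that both branches can be observed.

```cpp
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: turn a waitpid() status into a readable description.
std::string describe_status(int status) {
    if (WIFEXITED(status))
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "stopped or continued";
}

// Hypothetical demo: the child raises `sig` (or exits with 0 when sig == 0),
// and the parent waits for it and describes what happened.
std::string run_and_describe(int sig) {
    pid_t child = fork();
    if (child == -1) return "fork failed";
    if (child == 0) {
        if (sig != 0) raise(sig);
        _exit(0);
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child) return "waitpid failed";
    return describe_status(status);
}
```

A child that segfaulted would show up in the WIFSIGNALED branch with WTERMSIG(status) equal to SIGSEGV, and by then there is nothing left to kill.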
From your updates, it looks like you also want to check on other processes, which are not children of your current process.
You can look at /proc/{pid}/status to get an overview of what a process is currently doing. Its state is going to be one of:
Running
Stopped
Sleeping
Disk (D) sleep (I/O bound, uninterruptible)
Zombie
However, once a process dies (fully, unless zombied) so does its entry in /proc. There's no way to tell if it exited successfully, segfaulted, caught a signal that could not be handled, or failed to handle a signal that could be handled. Not unless its parent logs that information somewhere.
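To illustrate, a Linux-only sketch that pulls the state letter out of /proc/{pid}/status might look like the following. The helper name process_state is hypothetical; it relies on the "State:" line that the file contains, whose first non-blank character is a letter such as R, S, D, T or Z.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns the state letter (R, S, D, T, Z, ...) from
// /proc/<pid>/status, or '?' if the entry does not exist or has no State line.
char process_state(pid_t pid) {
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("State:", 0) == 0) {  // line starts with "State:"
            std::size_t pos = line.find_first_not_of(" \t", 6);
            return pos == std::string::npos ? '?' : line[pos];
        }
    }
    return '?';
}
```

A '?' result only says that the entry is missing at the moment of the call; as noted above, it cannot tell you how the process ended.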
It sounds like you're writing a watchdog for other processes that you did not start, rather than keeping track of child processes.
If a program segfaults, you won't need to kill it. It's dead already.
Use the wait and waitpid calls to wait for children to finish and check the status for some idea of how they exited. See here for details on how to use these functions. Note especially the WIFSIGNALED and WTERMSIG macros.
Call waitpid() from a SIGCHLD handler to catch the moment when a child terminates. Note that if you start multiple processes, you have to loop on waitpid() with WNOHANG until it returns 0 (children remain but none has exited yet) or -1 (no children are left).
Use kill() with signal 0 to check whether a process is still running. IIRC zombies still count as processes, so you need a proper SIGCHLD handler for that to work.
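The signal 0 check can be sketched as below, assuming a POSIX system. The helper name process_exists is hypothetical. With signal 0, kill performs its existence and permission checks without delivering anything, so EPERM still means the process exists.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: true if a process with this PID currently exists.
// A zombie that has not been reaped yet also counts as existing.
bool process_exists(pid_t pid) {
    if (pid <= 0) return false;          // 0 and negatives address process groups
    if (kill(pid, 0) == 0) return true;  // exists and we may signal it
    return errno == EPERM;               // exists, but owned by someone else
}
```

Because PIDs can be reused, a true result only says that some process has that PID right now, not that it is the same process you looked at earlier.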