I am developing a program that performs various tasks using fork(). When I start the program, everything works fine. However, after some time (about a day) I get flooded with <defunct> processes, 600, 700 and more, even though the maximum number of forks is set to 500. This is the code:
int numforks = 0;
int maxf = 100;
// READ FROM FILE ...
while (fgets(nutt, 2048, fp))
{
    fflush(stdout);
    if (!(fork()))
    {
        some_time_intensive_function();
        exit(0);
    }
    else
    {
        numforks++;
        if (numforks >= maxf)
        {
            wait(NULL);
            numforks--;
        }
    }
}
// DON'T EXIT PROGRAM TILL ALL FORKS ARE FINISHED
while (numforks > 0)
{
    wait(NULL);
    numforks--;
}
// CLOSE READ FILE ...
This program keeps 500 forks open all the time, like a thread pool.
I don't really understand what <defunct> processes are, but I heard that they are not caused by errors in the child processes (such as a segfault) but rather by the parent process not waiting correctly.
I want to get rid of the <defunct>s. Any ideas on how to solve this?
To repeat: this only happens after some time, 1-2 days.
Thank you.
I think you have two problems:
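Firstly, wait can return for reasons other than a child process having terminated (for example, it fails with EINTR if a signal interrupts it), and if it does, numforks is decremented while the child is left behind as a defunct process. I think you need to check wait's return value (and pass in a non-null pointer if you want to inspect the returned wait status). Only decrement numforks if a child was actually reaped.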
Secondly, numforks doesn't (effectively) limit the total number of child processes. If the parent process launches two processes, they will inherit numforks values of 0 and 1 respectively. Then each of those child processes will launch 500 and 499 more subprocesses.
I think you need exit(0) (or break) after your time_consuming_process().
(I assume you are running on Linux, or some other POSIX system like MacOSX)
Beware of orphan processes.
Read Advanced Linux Programming which has several chapters related to your issue.
You'd better keep the result of fork (in some pid_t variable or field) and handle all three cases (>0: fork succeeded and you are in the parent; ==0: you are in the child process; <0: fork failed!). And you should probably call waitpid(2) appropriately. In the child process it is reasonable to call exit(3) (or execve(2)...).
Perhaps you should handle the SIGCHLD signal. Read signal(7) carefully.
(you don't show enough of your program, and an entire book is needed to explain all that)
As a rule of thumb, you don't want to have many runnable processes. On a typical laptop or desktop computer, you should not have more than a dozen runnable processes. Use top(1) or ps(1) to list your processes (and notably to see how many processes you have). Perhaps use (at least during debugging) the bash ulimit builtin (it calls setrlimit(2) from inside your shell) in your terminal, e.g. ulimit -u 50 to limit the number of processes to 50.
If coding in genuine C++11, you should consider using frameworks like Qt or POCO (both provide support for processes).
You should care about inter-process communication (perhaps with pipe(7)-s or socket(7)-s and some event loop, see poll(2) ...) and synchronization issues. Perhaps look into MPI or 0mq.
(you probably need to read a lot more)
Perhaps strace(1) might be helpful to debug your issues.
Don't forget to check every system call. See syscalls(2) & errno(3).
Related
I'm working on a large-scale application that spawns numerous processes for dealing with various tasks. In some situations, the OS will kill one of my processes because of memory pressure. That's ok, it's entirely expected, the parent process handles this gracefully.
What I'd like to do is find out why a process was killed. If it was killed because of memory pressure, I want to respawn the task a little later. If it was killed for any other reason (say, an assertion failure or an out-of-bounds memory access), I want to log and investigate.
So, here's my question: how do you find out that a child process was killed because the OS needed the memory?
Question applies to:
Windows;
MacOS;
Linux;
(for bonus points, I'm also interested in Android, but that's not my priority).
Processes are not running as root/admin.
On Linux, you can find out whether a process was killed by the OS by reading the syslog (/var/log/messages or /var/log/syslog, depending on the distribution) or via the dmesg command.
If you spawned the process, you can also detect that it was killed with the SIGKILL (9) signal, as opposed to SIGSEGV (11), which corresponds to the application crashing all by itself, or SIGINT (2)/SIGTERM (15), which mean that the application was asked to terminate gracefully.
Regarding Windows, I only know that this type of monitoring can be enabled via the Application Event Log. There's a GUI Application that can help you set it up.
When the OS intervenes in the execution of a process in order to kill it, it does so via signals.
What you can do (on *nix-like platforms) is run dmesg.
It prints the kernel's log messages.
From there, you can identify the signal that was sent to your process.
For example, this code:
#include <stdio.h>
int main (void)
{
char *p = NULL;
printf ("\n%c", *p);
return 0;
}
This produces the following entry in the dmesg output:
[8478285.606105] crash.out[16830]: segfault at 0 ip 0000000000400531 sp 00007fffc373b090 error 4 in crash.out[400000+1000]
I am trying to run multiple commands at the same time on Ubuntu from C++ code.
I used system() to run the commands, but the problem is that system() runs only one command at a time, and the remaining commands wait until it returns.
Below is my sample code, which shows what I am trying to do.
The main thing is that I want to run all these commands at once, not one by one. Please help me.
Thanks in advance.
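#include <cstdlib>
#include <string>

int main()
{
    std::string command[3];
    command[0] = "ls -l";
    command[1] = "ls";
    command[2] = "cat main.cpp";
    for (int i = 0; i < 3; i++) {
        std::system(command[i].c_str());  // blocks until this command finishes
    }
}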
You should read Advanced Linux Programming (a bit old, but freely available). You probably want (in the traditional way, like most shells do):
perhaps catch SIGCHLD (set the signal handler before fork, see signal(7) & signal-safety(7)...)
call fork(2) to create a new process. Be sure to check all three cases (failure with a negative returned pid_t, child with a 0 pid_t, parent with a positive pid_t). If you want to communicate with that process, use pipe(2) (read about pipe(7)...) before the fork.
in the child process, close some useless file descriptors, then run some exec function (or the underlying execve(2)) to run the needed program (e.g. /bin/ls)
call (in the parent, perhaps after having got a SIGCHLD) wait(2) or waitpid(2) or related functions.
This is a very common pattern. Several chapters of Advanced Linux Programming explain it better.
There is no need to use threads in your case.
However, notice that the role of ls and cat could be accomplished with various system calls (listed in syscalls(2)...), notably read(2) & stat(2). You might not even need to run other processes. See also opendir(3) & readdir(3)
Perhaps (notably if you communicate with several processes through several pipe(7)-s) you might want to have some event loop using poll(2) (or the older select(2)). Some libraries provide an event loop (notably all GUI widget libraries).
You have a few options (as always):
Use threads (the C++ standard library implementation is good) to spawn multiple threads, each of which calls system() and then terminates. Call join on each thread to wait for all of them to terminate.
Use the *NIX fork system call to spawn a new process, then within each child process use exec to execute the desired command (see here for an example of "getting the right string to the right child"). The parent process can use waitpid to determine when all children have finished running, in order to move on with the program.
Append "&" to each of your commands, which tells the shell to run each one in the background (specifically, system will start the process in the background and then return, without waiting for the result). I haven't tried this, so I don't know if it'll work. You can't then wait for the command to terminate, though (thanks PSkocik).
Just pointing out - if you run those 3 specific commands at the same time, you're unlikely to be able to read the output as they'll all print text to the terminal at the same time.
If you do require reading the output from within the program (though not mentioned in your question), this is relevant (although it doesn't use system).
I have an .exe program which launches some other programs during execution.
So at a given point, the process tree might look like this:
Main program
-Program 1
-Program 2
-Program 3
I have the PIDs of all these programs, so I am able to close them successfully. However, when a user force-closes the main program (i.e. closes it manually), I am unable to close these child programs. Is there a way to trigger the closing of the child programs before the main program actually exits? (Something similar is possible in an HTML page, e.g. asking the user whether they really want to leave the page.)
This matters because, when this situation occurs, on the next run the main program tries to start these child programs again, even though they are already running. (The settings of the main program are time-dependent and have to be passed to the child programs at start-up to work properly.)
Ideally, I would like to have a cross-platform solution, since I have to make the app available for Windows, Linux and MacOS.
Thanks for your answers.
This is an OS feature and each OS offers it in its own way. Keeping track of the PIDs does not work, for one thing for the reason you mention (your parent process may itself crash), and for another because the child process may spawn grandchild processes of its own that need to be tracked, and then great-grandchildren and so on.
On Windows this is handled by NT Job Objects by asking for the JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE:
Causes all processes associated with the job to terminate when the last handle to the job is closed.
The way to use it is to create the job object in the parent process, make the handle non-inheritable, and put the parent process into the job (AssignProcessToJobObject). Then any child process it creates becomes part of the job, but only one handle exists (the one owned by the parent). If the parent crashes, the handle is reclaimed by the OS, which closes the job object and kills all child processes as well as any grandchild or great-grandchild process.
On Linux (and OS X) the same functionality is achieved with process groups.
I am not aware of any cross-platform library that would abstract this into a coherent uniform API.
I have a program written in C++ intended to run on Linux. Ignoring much of the program, it boils down to this: it starts X executables after some amount of time (for simplicity's sake, let's use 5 seconds).
Currently, I'm using system(path/to/executable/executable_name) to do the actual starting of the executable(s) and that works just fine for getting the executable(s) to start.
I'm also trying to maintain a status for each executable (for simplicity's sake again, let's just say the status is either "UP" or "DOWN", i.e. running or not running). I have been able to accomplish this...somewhat...
Backing up just a tad, when my program is told to start the executable(s), the logic looks something like this:
pid = fork()
if (pid < 0) exit 0; //fork failed
if (pid == 0) {
    system(path/to/executable/executable_name)
    set executable's status to DOWN
} else {
    verify executable started
    set executable's status to UP
}
Herein lies my problem. fork() causes a child process to be spawned, which is what I thought I needed in order for the original process to continue starting additional executables. I don't want to wait for an executable to stop in order to start another.
However, the executable starts in another child process...which is separate from the parent process... and if I try to set the executable's status to DOWN in the child process when system returns, the parent process does not know about it...
I have a few ideas of what I might need to do:
use threads instead of fork: create a new thread to call system, but would the parent/main thread know about the new thread changing the status of the executable?
use fork and exec: but I'm not sure that would be any better than what I already have (I've read the man pages for fork and exec but I guess I'm still a little fuzzy on how to best utilize exec)
Any suggestions?
EDIT 1
I thought I'd better give a little more context for the logic:
void startAll() {
    for each 'executable'
        call startExecutable(executable_name)
}
...
void startExecutable (executable_name) {
    pid = fork()
    if (pid < 0) exit 0; //fork failed
    if (pid == 0) {
        system(path/to/executable/executable_name)
        set executable's status to DOWN
        exit (1); // once the child's system() returns, I don't want it to go back to the loop above and start more executables
    } else {
        verify executable started
        set executable's status to UP
    }
}
EDIT 2
As mentioned at the beginning, this assumes a simplified setup (a first run, if you will). The plan is to handle not just an "UP" or "DOWN" state, but also a third state, "STANDBY", which involves sending a message to the executables my program has started. I initially left this piece out to avoid complicating the explanation, but I now see that it is imperative to include it.
You need to understand what exactly is happening when you fork. What you're doing is creating a subprocess that's an exact clone of the forking process. All variables currently in memory are copied exactly, and the subprocess has access to all of those copies of all of those variables.
But they're copies, so as you've noticed, fork and exec/system does not on its own handle inter-process communication (IPC). Setting a memory value in one of the processes doesn't alter that variable in any other process, including its parent, because the memory spaces are different.
Also, system is very similar to exec, but gives you much less control over the file descriptors and execution environment. You're effectively already doing a fork and exec, which is what you should be doing.
When you fork properly (as you do in your example), you now have two processes, and neither one is waiting for the other; they just run in completely different code paths. What you basically want is to have the parent do nothing but sit around waiting for new programs to open, and occasionally check the status of the kids, while the kids run and play as long as they want.
There are IPC solutions such as pipes and message FIFO queues, but they are excessive in your case. In your case, you're just looking for process management. The parent is given the pid of each child. Save it and use it. You can call waitpid to wait for the child to end, but you don't want that. You just want the parent to check the status of the child. One way to do that is to check whether kill(childPid, 0) == 0. If it isn't, the process has exited, i.e. it's no longer running. You can also check /proc/childPid for all sorts of information.
If your status is less simple than your question implied, you'll want to look into piping after forking and execing. Otherwise, all you need is process monitoring.
Based on your EDIT 2, you're still within the domain of process management rather than IPC. The kill function sends a signal to a process (if the signal number is non-zero). What you're looking for is to have the parent call kill(childPid, SIGTSTP). On the child side, you just need to install a signal handler, using the signal function. Among many other references, see http://www.yolinux.com/TUTORIALS/C++Signals.html. Basically, you want:
void sigTempStopHandler(int signum) { /* ... */ }
signal(SIGTSTP, sigTempStopHandler);
to be executed in the child code. The parent, of course, would know when this state is sent, so can change the status. You can use other signals for resuming when necessary.
When to pipe vs. signal:
Piping is the most robust IPC you could use - it lets you send any amount of data from one process to another, and can be in whichever direction you want. If you want your parent to send "You've been a very bad boy" to the child, it can, and the child can send "But I'll choose your nursing home one day" to the parent. (Less flippantly, you can pass any data, whether text or binary from one process to another - including objects that you serialize, or just the raw data for objects if it doesn't depend on memory, e.g. an int.)
So far, what you've described is sending simple commands from the parent to the child, and kill is perfect for that. The child could send signals almost as easily, except that it needs to know the parent's pid to do that. (Not hard to do: before forking, save the pid with pid_t parentPid = getpid();, and the child then knows the parent; the child can also simply call getppid().) Signals carry no data, they're just very raw events, but so far that sounds like all you're looking for.
I am writing a shell where I need to launch several child processes at once and record the system time and user time.
So far I am able to do it. The only problem is that I am using wait4 to grab the system resources used by the child program and put it in my rusage structure called usage.
How can I launch all the processes at the same time and keep track of their user and system times? I can remove the wait4() system call and use it outside the loop so that the parent waits, but if I do that, I can only record the times of the last process and not all of them.
Do you have any idea how I can fix this?
execute(commandPipev,"STANDARD",0);
wait4(pid,&status,0,&usage);
printf("Child process: %s\t PID:%d\n", commandPipev[0], pid);
printf("System time: %ld.%06ld sec\n",usage.ru_stime.tv_sec, usage.ru_stime.tv_usec);
printf("User time: %ld.%06ld sec\n\n",usage.ru_utime.tv_sec, usage.ru_utime.tv_usec);
A convoluted answer.
In a POSIX environment, launch the children, then use waitid() with the WNOWAIT option to tell you that some child has exited. The option leaves the child in a waitable state - that is, you can use another wait-family call to garner the information you need. You can then use the non-POSIX wait4() system call to garner the usage information for the just exited child, and deal with the accounting you need to do. Note that you might find a different process has terminated between the waitid() and wait4() calls; you need to use a loop and appropriate flags and tests to collect all the available corpses (dead child processes) before going back to the waitid() call to find out about the other previously incomplete child processes. You also have to worry about any of the wait-family of functions returning the information for a process that was previously started in the background and has now finished.
The Linux man page for wait4(2) suggests that WNOWAIT might work directly with wait4(2), so you may be able to do it all more cleanly - if, indeed, you need the option at all.
Consider whether you can use process groups to group the child processes together, to make waiting for the members of the process group easier.