Dynamically executing and terminating external programs with C++

I need to execute processes while staying in control of each one.
I want to create a class that stores the threads, PIDs, or whatever else is necessary to do so.
I currently have a program that executes one external application with the C function execvp and also loads the environment from a shell script. My current program therefore blocks. But I need it to keep running freely, so that from time to time I can terminate a running external application or start a new one.
My current approach would be to create a thread that calls execve. But then the thread would block, as far as I can see.
The code which might be in the thread (with variables then):
char *argv[] = { "/bin/bash", "-c", "myApplication", 0 };
execve(argv[0], &argv[0], environment.data());
The applications to call are probably not fixed in the code; their names, including parameters, will be given by an external setup file.
Now my actual question: is there a better way to "manage" external applications like that in C++? Some ready-made solution (class, library)? And if not, how do I terminate the thread, if this is the actual way? Using the terminate call is said to be bad practice; that's what I often read.
I hope this is now specific enough for the forum, because I do not know how to get more specific anymore. If you need more hints what I want to create here, feel free to ask in the comments.
Update:
to DBus & others:
Additional information: I did not write all of the programs I want to start!
So it will be used to start 3rd party applications, which, even if I have the code, I do not want to change.

You want to fork() before you exec. fork() is a function that creates a new process identical to the original caller of fork() running as a subprocess. The difference is that the parent process gets the child's pid as a return value and the child gets 0. The gist of what you want to do is this:
pid_t pid = fork();
if (pid == 0)
{
    // we're the child process
    const char *argv[] = { "/bin/bash", "-c", "myApplication", nullptr };
    execve(argv[0], const_cast<char *const *>(argv), environment.data());
    // execve only returns if there was an error;
    // check 'errno' and handle it here, then leave the child
    // so that it never falls through into the parent's code
    _exit(127);
}
else if (pid < 0)
{
    // pid is less than zero, we didn't successfully fork,
    // there is no child process.
    throw "error message";
}
// do whatever processing the parent does; 'pid' identifies the child
More info is here. The kill() function isn't bad practice per se. If you want to end the subprocess quickly and gracefully, you can write signal handlers in it, but you should be using something like dbus or zeromq to do proper interprocess communication. You want to tell the program to do something, not just tell it to die (which is usually what you are doing when you kill it).
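As a rough sketch of how the fork/exec pattern above could be wrapped so that the parent keeps the pid and can later stop the program: the helper names spawnProcess and terminateProcess are made up for this illustration, and execvp (which inherits the parent's environment and searches PATH) stands in for the execve call with a custom environment used in the question.

```cpp
#include <cassert>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: starts args[0] with the given arguments.
// Returns the child's pid in the parent, or -1 if fork() failed.
pid_t spawnProcess(const std::vector<std::string>& args) {
    assert(!args.empty());
    // Build the argv array before forking so the child only has to call exec.
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec failed
    }
    return pid;
}

// Hypothetical helper: asks the child to exit with SIGTERM and reaps it.
// Returns the raw wait status, or -1 on error.
int terminateProcess(pid_t pid) {
    if (kill(pid, SIGTERM) != 0) return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return status;
}
```

A class that manages several programs could store the returned pids in a map keyed by the names from the setup file. Note that terminateProcess blocks until the child is gone; a program that ignores SIGTERM would need a timeout followed by SIGKILL.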

NEVER USE the exec functions in threads, because the execve() system call overlays the current process image (all of its threads included) with a new process image.
The correct pattern is fork-exec or, better, vfork-exec. Extract from the manpage:
The vfork() system call can be used to create new processes without fully
copying the address space of the old process, which is horrendously inefficient in a paged environment. It is useful when the purpose of fork(2)
would have been to create a new system context for an execve(2). The
vfork() system call differs from fork(2) in that the child borrows the
parent's memory and thread of control until a call to execve(2) or an
exit (either by a call to _exit(2) or abnormally). The parent process is
suspended while the child is using its resources.
Using vfork shortly followed by execve, you avoid copying the original process image and do not overwrite it with the new program, so the original process has the pid of its child and can control it: check whether it has ended, send it signals, and so on.
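If calling vfork directly feels too delicate, POSIX also offers posix_spawn/posix_spawnp, which performs the "create a child and exec in it" step in a single call; implementations are free to use a vfork-like mechanism internally. This is a swapped-in alternative, not what the answer above uses, and the wrapper name spawnWithPosixSpawn is made up for this sketch.

```cpp
#include <cassert>
#include <spawn.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <vector>

extern "C" char** environ;  // environment of the calling process

// Hypothetical wrapper: starts args[0] (searched in PATH) with the parent's
// environment. Returns the child's pid, or -1 if the spawn failed.
pid_t spawnWithPosixSpawn(const std::vector<std::string>& args) {
    assert(!args.empty());
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = -1;
    // No file actions and no spawn attributes: the child inherits the defaults.
    const int rc = posix_spawnp(&pid, argv[0], nullptr, nullptr, argv.data(), environ);
    return rc == 0 ? pid : -1;
}
```

The parent keeps the returned pid exactly as in the fork-exec case and can pass it to kill() or waitpid().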

Related

Keeping track of background processes internally in cpp shell

I'm writing a shell in cpp and I was hoping to get some advice. I have a command that will do an exec in the background, and I'm trying to keep track of which background processes are still running. I thought maybe I could keep track of the PID and do a string find on /proc/, but it seems to stay longer than it should. I'm testing it by using the sleep command, but it seems to always linger around wherever I look long after it should've finished. I'm probably just not doing the right thing to see if it is still running though.
Thanks in advance for any help.

Assuming you are spawning off the child process via fork() or forkpty(), one reasonably good way to track the child process's condition is to have the parent process create a connected-socket-pair (e.g. via socketpair()) before forking, and have the child process call dup2() to make one end of that socket-pair its stdin/stdout/stderr file descriptor, e.g.:
// Note: error-checking has been removed for clarity
int temp[2];
(void) socketpair(AF_UNIX, SOCK_STREAM, 0, temp);
pid_t pid = fork();
if (pid == 0)
{
// We are the child process!
(void) dup2(temp[1], STDIN_FILENO);
(void) dup2(temp[1], STDOUT_FILENO);
(void) dup2(temp[1], STDERR_FILENO);
// call exec() here...
}
The benefit of this is that now the parent process has a file descriptor (temp[0]) that is connected to the stdin, stdout, and stderr of the child process, and the parent process can select() on that descriptor to find out whenever the child process has written text to its stderr or stdout streams, and can then read() on that file descriptor to find out what the child process wrote (useful if you want to then display that text to the user, or if not you can just throw the read text away), and most importantly, it will know when the child process has closed its stderr and stdout streams, because then the parent process's next call to read() on that file descriptor will indicate 0 aka EOF.
Since the OS will automatically close the child process's streams whenever it exits for any reason (including crashing), this is a pretty reliable way to get notified that the child process has gone away.
The only potential gotcha is that the child process could (for whatever reason) manually call close(STDOUT_FILENO) and close(STDERR_FILENO), and yet still remain running; in that case the parent process would see the socket-pair connection closing as usual, and wrongly think the child process had gone away when in fact it hadn't. Fortunately it's pretty rare for a program to do that, so unless you need to be super-robust you can probably ignore that corner case.
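A fuller sketch of the socket-pair idea, with a blocking read loop in place of select() to keep it short. The function name runAndCapture is invented for this example. One detail the fragment above leaves out: the parent has to close its own copy of temp[1] after forking, otherwise it never sees EOF.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: runs a program, returns everything it wrote to
// stdout/stderr, and returns only after the child has closed both streams.
std::string runAndCapture(const std::vector<std::string>& args) {
    assert(!args.empty());
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return {};

    const pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return {}; }
    if (pid == 0) {
        close(fds[0]);                       // the child only uses its own end
        dup2(fds[1], STDIN_FILENO);
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        if (fds[1] > STDERR_FILENO) close(fds[1]);
        execvp(argv[0], argv.data());
        _exit(127);                          // exec failed
    }

    close(fds[1]);  // without this the parent would never read EOF
    std::string output;
    char buf[4096];
    for (;;) {
        const ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) { output.append(buf, static_cast<size_t>(n)); continue; }
        if (n < 0 && errno == EINTR) continue;
        break;                               // 0 means EOF: the child's streams are closed
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);                // reap so no zombie is left behind
    return output;
}
```

In a shell that tracks background jobs, fds[0] would instead be added to the set passed to select() so the prompt stays responsive.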

On a POSIX-like system, after you create any child processes using fork, you should clean up those child processes by calling wait or waitpid from the parent process. The name "wait" is used because the functions are most commonly used when the parent has nothing to do until a child exits or is killed, but waitpid can also be used (by passing WNOHANG) to check on whether a child process is finished without making the parent process wait.
Note that at least on Linux, when a child process has exited or been killed but the parent process has not "waited" for the child, the kernel keeps some information about the child process in memory, as a "zombie process". This is done so that a later "wait" can correctly fetch the information about the child's exit code or fatal signal. These zombie processes do have entries in /proc, which may be why you see a child "stay longer than it should", if that's how you were checking.
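A minimal sketch of the non-blocking check described above. The names ChildState, pollChild and waitByPolling are invented for this example; the child here simply sleeps briefly instead of running exec.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class ChildState { Running, Finished, Error };

// Checks on one child without blocking. When the result is Finished, the
// child has been reaped and 'status' holds its wait status.
ChildState pollChild(pid_t pid, int& status) {
    const pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0) return ChildState::Running;    // still alive
    if (r == pid) return ChildState::Finished; // exited or was killed
    return ChildState::Error;                  // e.g. not our child
}

// Example: start a short-lived child and poll until it is gone.
// Returns the child's exit code, or -1 on error.
int waitByPolling() {
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        usleep(100 * 1000);
        _exit(5);
    }
    int status = 0;
    ChildState state;
    while ((state = pollChild(pid, status)) == ChildState::Running) {
        usleep(10 * 1000);  // the shell would do its normal work here
    }
    return (state == ChildState::Finished && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Once pollChild reports Finished, the zombie entry is gone and the pid no longer shows up in /proc.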

Linux best practice to start and watch another process

In my process I need to start/restart another process.
Currently I use a thread with a tiny stack size and the following code:
void startAndMonitorA()
{
while(true)
{
system("myProcess");
LOG("myProcess crashed");
usleep(1000 * 1000);
}
}
I feel like that's not best practice. I have no idea about the resources the std::system() call is blocking or wasting. I'm on an embedded Linux - so in general I try to care about resources.

One problematic piece is restarting immediately: if the child process fails to start, that is going to cause 100% CPU usage. It may be a transient error in the child process (e.g. it cannot connect to a server). It may be a good idea to add at least a one-second pause before trying to restart.
What system call does on Linux is:
1. Sets up signals SIGINT and SIGQUIT to be ignored.
2. Blocks signal SIGCHLD.
3. fork()
4. Child process calls exec() shell, passing the command line to the shell.
5. Parent process calls waitpid() that blocks the thread till the child process terminates.
6. Parent process restores its signal dispositions.
If you were to re-implement the functionality of system you would probably omit step 5 (along with steps 1, 2 and 6) to avoid blocking the thread and rely on SIGCHLD to get notified when the child process has terminated and needs to be restarted.
In other words, the bare minimum would be to set up a signal handler for SIGCHLD and call fork and exec.

The code as shown would be adequate for most circumstances. If you really care about resource usage, you should be aware that you are starting (and keeping around) a thread for each process you are monitoring. If your program has an event loop anyway, that kind of thing can be avoided at the cost of some additional effort (and an increase in complexity).
Implementing this would entail the following:
Instead of calling system(), use fork() and exec() to start the external program. Store its PID in a global table.
Set a SIGCHLD handler that notifies the event loop of the exit of a child, e.g. by writing a byte to a pipe monitored by the event loop.
When a child exits, run waitpid with the WNOHANG flag in a loop that runs for as long as there are children to reap. waitpid() will return the PID of the child that exited, so that you know to remove its PID from the table, and to schedule a timeout that restarts it.
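The three steps above could look roughly like this. It is a sketch under assumptions: the function names and the std::set used as the "global table" are invented here, the event loop is reduced to a single poll() call, and the child exits immediately instead of calling exec.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <set>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_notifyFd = -1;  // write end of the pipe, used only by the handler

static void onSigchld(int) {
    const int savedErrno = errno;
    const char byte = 1;
    const ssize_t ignored = write(g_notifyFd, &byte, 1);  // write() is async-signal-safe
    (void)ignored;
    errno = savedErrno;
}

// Installs the SIGCHLD handler; returns the pipe's read end for the event loop.
int installSigchldPipe() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    g_notifyFd = fds[1];

    struct sigaction sa {};
    sa.sa_handler = onSigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, nullptr) != 0) return -1;
    return fds[0];
}

// Drains the pipe, then reaps every finished child and drops it from the table.
// Returns how many tracked children were reaped.
int reapChildren(int readFd, std::set<pid_t>& running) {
    char buf[64];
    while (read(readFd, buf, sizeof buf) > 0) {}
    int reaped = 0;
    int status = 0;
    pid_t pid;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        reaped += static_cast<int>(running.erase(pid));  // a restart timer would be scheduled here
    }
    return reaped;
}

// Example: one child, one pass through a tiny "event loop".
int superviseOnce() {
    const int readFd = installSigchldPipe();
    if (readFd < 0) return -1;
    std::set<pid_t> running;
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0);  // a real child would call exec here
    running.insert(pid);

    struct pollfd pfd { readFd, POLLIN, 0 };
    int rc;
    do { rc = poll(&pfd, 1, 5000); } while (rc < 0 && errno == EINTR);
    if (rc <= 0) return -1;
    return reapChildren(readFd, running);
}
```

The loop around waitpid matters because several children exiting close together may produce only one SIGCHLD.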

Memory usage by a child process, fork and exec

How can I measure the memory used by a child process after I call fork and exec? Basically I want to be able to write code that corresponds to the following
if (!fork()) {
// run child process
exec();
} else {
while (child active) {
print memory used by child
}
}
There are two things that I do not know here, how can I see if the child process has finished running? Will I have to use some sort of process level mutual exclusion here? If yes then what is a structure I can use? Can I just use the OS filesystem for this purpose?
Also I was looking at the answer at this link Differences between fork and exec, in paragraph 8 the author says copy on write is useful when process calls fork without calling exec. But isn't this true more in the case when the parent calls fork and does not call exec? When the parent calls exec the virtual address space of the child is wiped out and replaced with the one resulting from the new program loaded into memory correct?
Thank you!

Regarding the above comment chain which I evidently can't reply to because I don't have 50 rep:
The return value of fork in the parent, if successful, is the PID of the child. You should probably save the return value so you can 1. wait on the correct child (if you have more than one), and 2. see if fork fails (in which case you probably don't want to loop until the child exits).
You could also use signals to figure out when the child dies instead of continuously trying to wait with the WNOHANG option. The kernel sends SIGCHLD to the parent when the child terminates (or stops), and if it died you can then wait on it with waitpid and stop your loop. See:
man 7 signal
man 2 sigaction
for more information on this.
Regarding memory usage, it seems you want either /proc/[pid]/statm or /proc/[pid]/stat.
man 5 proc will give you all the information about what is in those files.
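For instance, the resident set size could be read from /proc/[pid]/statm along these lines. This is a Linux-only sketch; the function name residentBytes is made up, and it relies on the second field of statm being the number of resident pages, as described in man 5 proc.

```cpp
#include <fstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Returns the resident set size of 'pid' in bytes, or -1 if the process
// does not exist or /proc is not available.
long residentBytes(pid_t pid) {
    std::ifstream in("/proc/" + std::to_string(pid) + "/statm");
    long totalPages = 0;
    long residentPages = 0;
    // statm starts with: total program size, resident set size (both in pages)
    if (!(in >> totalPages >> residentPages)) return -1;
    return residentPages * sysconf(_SC_PAGESIZE);
}
```

In the loop from the question, the parent would call residentBytes(childPid) between non-blocking waitpid checks and stop once the child has been reaped.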

Linux fork function compared to Windows' CreateProcess - what gets copied?

I am porting a Windows application to Linux. I use CreateProcess on Windows to run child processes and redirect all standard streams (in, out, error). Stream redirection is critical: the main process sends data to the children and receives their output and error messages. The main process is very big, with a lot of memory and threads, and the child processes are small. On Linux I see that the fork function has similar functionality to CreateProcess on Windows. However, the manual says that fork "creates parent process copy", including code, data and stack. Does it mean that if I create a copy of a huge process that uses 1 GB of memory just to run a very simple command line tool that uses 1 MB of memory itself, I will need to first duplicate 1 GB of memory with fork, and then replace this 1 GB with a 1 MB process? So, if I have 100 threads, will 100 GB of memory be required to run 100 processes that need just 100 MB of memory? Also, what about other threads in the parent process that "don't know" about the fork execution, what will they do? What does the fork function do "under the hood", and is it really an effective way to create a lot of small child processes from a huge parent?

When you call fork(), initially only your virtual memory mappings are copied and all pages are marked copy-on-write. Your new child process will have a logical copy of the parent process's memory, but it will not consume any additional RAM until one of the two processes actually starts writing to it.
As for threads, fork creates only one new thread in the child process that resembles a copy of the calling thread.
Also as soon as you call any of the exec family of calls (which I assume you want to) then your entire process image is replaced with a new one and only file descriptors are kept.
If your parent process has a lot of open file descriptors then I suggest you go through /proc/self/fd and close all file descriptors in the child that you don't need.
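A sketch of that /proc/self/fd walk (Linux-specific; both function names are invented for this example). Because opendir may allocate memory, in a heavily multithreaded parent it is safer to collect the list before fork(), or to open descriptors with O_CLOEXEC in the first place so that exec closes them automatically.

```cpp
#include <algorithm>
#include <cstdlib>
#include <dirent.h>
#include <unistd.h>
#include <vector>

// Lists the descriptors currently open in this process.
std::vector<int> openFileDescriptors() {
    std::vector<int> fds;
    DIR* dir = opendir("/proc/self/fd");
    if (dir == nullptr) return fds;
    const int self = dirfd(dir);  // the directory stream's own descriptor
    while (dirent* entry = readdir(dir)) {
        char* end = nullptr;
        const long fd = std::strtol(entry->d_name, &end, 10);
        if (end == entry->d_name || *end != '\0') continue;  // skips "." and ".."
        if (fd != self) fds.push_back(static_cast<int>(fd));
    }
    closedir(dir);
    return fds;
}

// Closes every open descriptor that is not listed in 'keep'.
// Pass {0, 1, 2, ...} to preserve the standard streams.
void closeAllExcept(const std::vector<int>& keep) {
    for (int fd : openFileDescriptors()) {
        if (std::find(keep.begin(), keep.end(), fd) == keep.end()) close(fd);
    }
}
```

In the child, this would be called after the dup2() calls that set up the redirected streams and right before exec.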

fork basically splits your process into two, with both parent and child processes continuing at the instruction after the fork function call. However, the return value in the child process is 0, whilst in the parent process it is the process id of the child process.
The creation of the child process is extremely quick since it uses the same pages as the parent. The pages are marked as copy-on-write (COW) so that if either process changes a page, the other won't be affected. Once the child process exists it usually calls one of the exec functions to replace itself with a new image. Windows doesn't have an equivalent to fork; instead the CreateProcess call only allows you to start a new process.
There is an alternative to fork called clone which gives you much more control over what happens when the new process is started. For example you can specify a function to call in the new process.

The copies are "copy-on-write", so if your child process does not modify the data, it will not use any memory besides that of the parent process. Typically, after a fork(), the child process calls exec() to replace the program of this process with a different one, and then all the copied memory is dropped anyway.

I haven't used CreateProcess, but fork() is not an exact copy of the process. It creates a child process, but the child starts its execution at the same instruction in which the parent called fork, and continues from there.
I recommend taking a look at Chapter 5 of the Three Easy Pieces OS book. This may get you started and you might find the child spawning call you're looking for.

The forked child process has almost all of the parent's state copied: memory, descriptors, text, etc. The only exception is the parent's threads: they are not copied, and only the thread that called fork() exists in the child.

Start an executable from C++ program & continue

I have a program written in C++ intended to run on a Linux OS. Ignoring much of the program, it boils down to this - it starts X number of executables after some amount of time (for simplicity sake, let's use 5 seconds).
Currently, I'm using system(path/to/executable/executable_name) to do the actual starting of the executable(s) and that works just fine for getting the executable(s) to start.
I'm also trying to maintain a status for each executable (for simplicity sake again, let's just say the status is either "UP" or "DOWN" (running or not running)). I have been able to accomplish this...somewhat...
Backing up just a tad, when my program is told to start the executable(s), the logic looks something like this:
pid = fork()
if (pid < 0) exit 0; //fork failed
if (pid == 0) {
system(path/to/executable/executable_name)
set executable's status to DOWN
} else {
verify executable started
set executable's status to UP
}
Herein lies my problem. fork() causes a child process to be spawned, which is what I thought I needed in order for the original process to continue starting additional executables. I don't want to wait for an executable to stop in order to start another.
However, the executable starts in another child process...which is separate from the parent process... and if I try to set the executable's status to DOWN in the child process when system returns, the parent process does not know about it...
I have a few ideas of what I might need to do:
use threads instead of fork: create a new thread to call system, but would the parent/main thread know about the new thread changing the status of the executable?
use fork and exec: but I'm not sure that would be any better than what I already have (I've read the man pages for fork and exec but I guess I'm still a little fuzzy on how to best utilize exec)
Any suggestions?
EDIT 1
I thought I'd better give a little more context for the logic:
void startAll() {
for each 'executable'
call startExecutable(executable_name)
}
...
void startExecutable (executable_name) {
pid = fork()
if (pid < 0) exit 0; //fork failed
if (pid == 0) {
system(path/to/executable/executable_name)
set executable's status to DOWN
exit (1); <-- this is because once the child process's system returns, I don't want it to return to the above loop and start starting executables
} else {
verify executable started
set executable's status to UP
}
}
EDIT 2
As mentioned at the beginning, this is assuming a simplified setup (a first run if you will). The plan is to handle not just an "UP" or "DOWN" state, but also a third state to handle sending a message to the executables my program has started - "STANDBY." I initially left this piece out to avoid complicating the explanation but I now see that it is imperative to include.

You need to understand what exactly is happening when you fork. What you're doing is creating a subprocess that's an exact clone of the forking process. All variables currently in memory are copied exactly, and the subprocess has access to all of those copies of all of those variables.
But they're copies, so as you've noticed, fork and exec/system does not on its own handle inter-process communication (IPC). Setting a memory value in one of the processes doesn't alter that variable in any other process, including its parent, because the memory spaces are different.
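A tiny experiment makes the "copies" point concrete. The function name forkCopiesMemory is invented here; the child reports its value through its exit code because that is the only channel this sketch sets up.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <utility>

// Returns {value as seen by the parent, value the child reported on exit}.
std::pair<int, int> forkCopiesMemory() {
    int value = 1;
    const pid_t pid = fork();
    if (pid < 0) return {-1, -1};
    if (pid == 0) {
        value = 2;     // changes only the child's copy of the variable
        _exit(value);  // report it through the exit code
    }
    int status = 0;
    waitpid(pid, &status, 0);
    const int childValue = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return {value, childValue};  // the parent's 'value' was never touched
}
```

This is the same effect as setting the status to DOWN in the child: the parent's copy of the status never changes, so the parent has to learn about the exit through waitpid or SIGCHLD.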
Also, system is very similar to exec, but gives you much less control over the file descriptors and execution environment. You're effectively already doing a fork and exec, which is what you should be doing.
When you fork properly (as you do in your example), you now have two processes, and neither one is waiting for the other - they just run in completely different codepaths. What you basically want is to have the parent do nothing but sit around waiting for new programs to open, and occasionally check the status of the kids, while the kids run and play as long as they want.
There are IPC solutions such as pipes and message FIFO queues, but that's excessive in your case. In your case, you're just looking for process management. The parent is given the pid of the children. Save it and use it. A blocking waitpid would wait for the child to end, but you don't want that. You just want the parent to check the status of the child. One way to do that is to check whether kill(childPid, 0) == 0. If not, the process no longer exists. Be aware that a child that has exited but has not yet been waited for (a zombie) still counts as existing, so this check keeps returning 0 until the parent reaps it; waitpid(childPid, &status, WNOHANG) does not block and reports the exit reliably. You can also check /proc/childPid for all sorts of information.
If your status is less simple than your question implied, you'll want to look into piping after forking and execing. Otherwise, all you need is process monitoring.
Based on your EDIT 2, you're still within the domain of process management, instead of IPC. The kill function sends a signal to a process (if the signal number is non-zero; signal 0 only checks that the process exists). What you're looking for is to have the parent call kill(childPid, SIGTSTP). On the child side, you just need to install a signal handler, using the signal function. Among many other references, see http://www.yolinux.com/TUTORIALS/C++Signals.html. Basically, you want:
void sigTempStopHandler(int signum) { /* ... */ }
signal(SIGTSTP, sigTempStopHandler);
to be executed in the child code. The parent, of course, would know when this state is sent, so can change the status. You can use other signals for resuming when necessary.
When to pipe vs. signal:
Piping is the most robust IPC you could use - it lets you send any amount of data from one process to another, and can be in whichever direction you want. If you want your parent to send "You've been a very bad boy" to the child, it can, and the child can send "But I'll choose your nursing home one day" to the parent. (Less flippantly, you can pass any data, whether text or binary from one process to another - including objects that you serialize, or just the raw data for objects if it doesn't depend on memory, e.g. an int.)
So far, what you've described is sending simple command structures from the parent to the child, and kill is perfect for that. The child could send signals almost as easily - except that it would need to know the parent's pid to do that. (Not hard to do - before forking, save the pid: pid_t pid = getpid();, now the child knows the parent. The child can also simply call getppid().) Signals have no data, they're just very raw events, but so far, that sounds like all you're looking for.
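A self-contained sketch of the parent-signals-child idea. Several things here are assumptions for illustration: SIGUSR1 is used in place of SIGTSTP, sigaction replaces signal, the child is a plain fork of the same program (after an exec, handlers are reset, so a started program has to install its own), and the name demoStandbySignal is invented.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t g_standbyRequested = 0;

static void onStandby(int) { g_standbyRequested = 1; }  // only sets a flag

// Forks a child that waits for SIGUSR1 and then exits with code 42.
// Returns the child's exit code as seen by the parent, or -1 on error.
int demoStandbySignal() {
    struct sigaction sa {};
    sa.sa_handler = onStandby;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) != 0) return -1;

    // Block SIGUSR1 so it cannot slip in between the flag check and the wait.
    sigset_t blocked, previous;
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR1);
    sigprocmask(SIG_BLOCK, &blocked, &previous);

    const pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &previous, nullptr);
        return -1;
    }
    if (pid == 0) {
        sigset_t waitMask = previous;
        sigdelset(&waitMask, SIGUSR1);
        while (!g_standbyRequested) sigsuspend(&waitMask);  // atomically unblock and sleep
        _exit(42);  // stands in for "switch to STANDBY"
    }

    sigprocmask(SIG_SETMASK, &previous, nullptr);
    kill(pid, SIGUSR1);  // the parent knows it sent this, so it can update the status
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The handler does nothing but set a flag; the actual state change happens in normal code, which keeps the handler async-signal-safe.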
