Under Unix / Linux, what happens to my active RAII objects upon forking? Will there be double deletions?
What about copy construction and copy assignment? How can I make sure nothing bad happens?
fork(2) creates a full copy of the process, including all of its memory. Yes, destructors of automatic objects will run twice - in the parent process and in the child process, in separate virtual memory spaces. Nothing "bad" happens (unless of course, you deduct money from an account in a destructor), you just need to be aware of the fact.
In principle, using these functions in C++ is no problem, but you have to be aware of what data is shared and how.
Consider that upon fork(), the new process gets a complete copy of the parent's memory (using copy-on-write). Memory is state, therefore
you have two independent processes that must leave a clean state behind.
Now, as long as you stay within the bounds of the memory given to you, you should not have any problem at all:
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
class Foo {
public:
Foo () { std::cout << "Foo():" << this << std::endl; }
~Foo() { std::cout << "~Foo():" << this << std::endl; }
Foo (Foo const &) {
std::cout << "Foo::Foo():" << this << std::endl;
}
Foo& operator= (Foo const &) {
std::cout << "Foo::operator=():" << this<< std::endl;
return *this;
}
};
int main () {
Foo foo;
int pid = fork();
if (pid > 0) {
// We are parent.
int childExitStatus;
waitpid(pid, &childExitStatus, 0); // wait until child exits
} else if (pid == 0) {
// We are the new process.
} else {
// fork() failed.
}
}
The above program prints roughly:
Foo():0xbfb8b26f
~Foo():0xbfb8b26f
~Foo():0xbfb8b26f
No copy-construction or copy-assignment happens, the OS will make bitwise copies.
The addresses are the same because they are not physical addresses, but pointers into each process' virtual memory space.
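To make the point about separate virtual memory spaces concrete, here is a minimal sketch. The helper name parent_value_after_child_write is made up for illustration. The child overwrites a variable and leaves with _exit(); the parent, looking at the same address, still sees its own value.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns the value the parent sees after the child
// has modified its own copy of the same variable (-1 if fork() failed).
int parent_value_after_child_write() {
    int value = 1;
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                  // fork() failed
    }
    if (pid == 0) {
        value = 2;                  // changes only the child's copy
        _exit(value == 2 ? 0 : 1);  // leave without returning into the caller
    }
    int status = 0;
    waitpid(pid, &status, 0);       // the child has finished writing by now
    return value;                   // still 1 in the parent
}
```

Calling parent_value_after_child_write() in the parent yields 1, although the child wrote 2 to a variable at the very same virtual address.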
It becomes more difficult when the two instances share information, e.g. an opened file that must be flushed and closed before exiting:
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
int main () {
std::ofstream of ("meh");
srand(clock());
int pid = fork();
if (pid > 0) {
// We are parent.
sleep(rand()%3);
of << "parent" << std::endl;
int childExitStatus;
waitpid(pid, &childExitStatus, 0); // wait until child exits
} else if (pid == 0) {
// We are the new process.
sleep(rand()%3);
of << "child" << std::endl;
} else {
// fork() failed.
}
}
This may print
parent
or
child
parent
or something else.
The problem is that the two instances do not do enough to coordinate their access to the same file, and you don't know the implementation details of std::ofstream.
(Possible) solutions can be found under the terms "Interprocess Communication" or "IPC"; the most obvious one would be waitpid():
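One implementation detail that matters here is the stream's userspace buffer: fork() copies it along with everything else, so unflushed data can end up being written by both processes. The following is only a sketch under the assumption that std::ofstream buffers small writes, which is typical but not guaranteed. The helper name write_then_fork and the file path in the usage note are made up for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: writes one line, forks, lets both processes close
// the stream, and returns what ended up in the file.
std::string write_then_fork(const std::string& path, bool flush_before_fork) {
    {
        std::ofstream out(path, std::ios::trunc);
        out << "hello\n";               // '\n' does not force a flush
        if (flush_before_fork) {
            out.flush();                // buffer is empty when fork() copies it
        }
        pid_t pid = fork();
        if (pid == 0) {
            out.close();                // child writes whatever its copy holds
            _exit(0);
        }
        if (pid > 0) {
            waitpid(pid, nullptr, 0);
        }
    }                                   // parent's copy is flushed and closed here
    std::ifstream in(path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With write_then_fork("/tmp/demo.txt", true) the file contains the line once. With false, common implementations write it twice, because the child and the parent each flush their own copy of the buffer.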
#include <unistd.h>
#include <sys/wait.h>
int main () {
pid_t pid = fork();
if (pid > 0) {
int childExitStatus;
waitpid(pid, &childExitStatus, 0); // wait until child exits
} else if (pid == 0) {
...
} else {
// fork() failed.
}
}
The simplest solution would be to ensure that each process only uses its own virtual memory, and nothing else.
The other solution is a Linux-specific one: ensure that the sub-process does no cleanup. The operating system will do a raw, non-RAII cleanup of all acquired memory and close all open files without flushing them.
This can be useful if you are using fork() with exec() to run another process:
#include <unistd.h>
#include <sys/wait.h>
int main () {
pid_t pid = fork();
if (pid > 0) {
// We are parent.
int childExitStatus;
waitpid(pid, &childExitStatus, 0);
} else if (pid == 0) {
// We are the new process.
execlp("echo", "echo", "hello, exec", (char*)0);
// only here if exec failed
_exit(127); // leave without running the parent's cleanup code
} else {
// fork() failed.
}
}
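Another way to exit without running the destructors of automatic objects is the exit() function (it still runs atexit() handlers and destructors of static objects, and flushes stdio buffers). I generally advise against using it in C++, but when forking, it has its place.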
References:
http://www.yolinux.com/TUTORIALS/ForkExecProcesses.html
man pages
The currently accepted answer shows a synchronization problem which frankly has nothing to do with what problems RAII can really cause. That is, whether you use RAII or not, you will have synchronization problems between parent and child. Heck, if you run the same process in two different consoles, you have the exact same synchronization problem! (i.e. no fork() involved in your program, just your program running twice in parallel.)
To resolve synchronization problems, you may use a semaphore. See sem_open(3) and related functions. Note that a thread would generate the exact same synchronization problems. Only, with threads you can use a mutex instead, and in most cases a mutex is much faster than a semaphore.
So where you do get a problem with RAII is when you use it to hold on to what I call an external resource, although not all external resources are affected the same way. I have had the problem in two circumstances and I will show both here.
Do not shutdown() a socket
Say you have your own socket class. In the destructor, you do a shutdown. After all, once you are done, you can as well send a message to the other end of the socket saying you are done with the connection:
class my_socket
{
public:
my_socket(char const * addr)
{
socket_ = socket(s);
...bind, connect...
}
~my_socket()
{
if(socket_ != -1)
{
shutdown(socket_, SHUT_RDWR);
close(socket_);
}
}
private:
int socket_ = -1;
};
When you use this RAII class, the shutdown() function affects the socket in the parent AND the child. That means neither the parent nor the child can read from or write to that socket anymore. Here I suppose that the child does not use the socket at all (and thus I have absolutely no synchronization problems), but when the child dies, the RAII class wakes up and the destructor gets called. At that point it shuts down the socket, which becomes unusable.
{
my_socket soc("127.0.0.1:1234");
// do something with soc in parent
...
pid_t const pid(fork());
if(pid > 0)
{
int status(0);
waitpid(pid, &status, 0);
}
else if(pid == 0)
{
// the fork() "duplicated" all memory (with copy-on-write for most)
// and duplicated all descriptors (see dup(2)) which is why
// calling 'close(s)' is perfectly safe in the child process.
// child does some work
...
// here 'soc' calls my_socket::~my_socket()
return;
}
else
{
// fork did not work
...
}
// here my_socket::~my_socket() was called in child and
// the socket was shutdown -- therefore it cannot be used
// anymore!
// do more work in parent, but cannot use 'soc'
// (which is probably not the wanted behavior!)
...
}
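The difference between close() and shutdown() after a fork can be observed without a network. This is only a sketch: it uses socketpair() as a stand-in for a real connection, the helper name is made up, and MSG_NOSIGNAL is assumed to be available (it is on Linux).

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if the parent can still send on its
// socket after the child has disposed of its inherited copy.
bool parent_can_send_after_child(bool child_calls_shutdown) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return false;
    }
    pid_t pid = fork();
    if (pid == 0) {
        if (child_calls_shutdown) {
            shutdown(sv[0], SHUT_RDWR);  // acts on the shared connection
        }
        close(sv[0]);                    // only drops the child's descriptor
        close(sv[1]);
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, nullptr, 0);
    }
    // MSG_NOSIGNAL turns the SIGPIPE into an EPIPE error return.
    bool ok = pid > 0 && send(sv[0], "x", 1, MSG_NOSIGNAL) == 1;
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

parent_can_send_after_child(false) returns true, since the child only closed its own descriptor. parent_can_send_after_child(true) returns false, because the child's shutdown() disabled the connection for the parent as well.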
Avoid using socket in parent and child
Another possibility, still with a socket (although you could have the same effect with a pipe or some other mechanism used to communicate externally), is to end up sending a "BYE" command twice. This is very close to being a synchronization problem, but in this case the problem occurs in the RAII object when it gets destroyed.
Say for example that you create a socket and manage it in an object. Whenever the object gets destroyed, you want to tell the other side by sending a "BYE" command:
class communicator
{
public:
communicator()
{
socket_ = socket();
...bind, connect...
}
~communicator()
{
write(socket_, "BYE\n", 4);
// shutdown(socket_); -- now we know not to do that!
close(socket_);
}
private:
int socket_ = -1;
};
In this case, the other end receives the "BYE" command and closes the connection. Now the parent cannot communicate using that socket since it got closed by the other end!
This is very similar to what phresnel talks about with his ofstream example. Only, it is not a synchronization problem that is easy to fix. The order in which you write the "BYE\n" or another command to the socket won't change the fact that in the end the socket gets closed from the other side (i.e. synchronization can be achieved using an inter-process lock, whereas that "BYE" command is similar to the shutdown() command: it stops the communication in its tracks!)
A Solution
For the shutdown() it was easy enough, we just do not call the function. That being said, maybe you still wanted to have the shutdown() happen in the parent, just not in the child.
There are several ways to fix the problem. One of them is to memorize the pid and use it to know whether these destructive function calls should be made or not. Here is a possible fix:
class communicator
{
public:
communicator()
: pid_(getpid())
{
socket_ = socket();
...bind, connect...
}
~communicator()
{
if(socket_ != -1)
{
if(pid_ == getpid())
{
write(socket_, "BYE\n", 4);
shutdown(socket_, SHUT_RDWR);
}
close(socket_);
}
}
private:
pid_t pid_;
int socket_ = -1;
};
Here we do the write() and shutdown() only if we are in the parent, i.e. in the process that created the object.
Notice that the child can (and is expected to) close() the socket descriptor: fork() duplicates all descriptors as if by dup(), so the child has its own descriptor for each file it holds.
Another Security Guard
Now there may be far more complicated cases where an RAII object is created way up in the parent and the child will call the destructor of that RAII object anyway. As mentioned by roemcke, calling _exit() is probably the safest thing to do (exit() works in most cases, but it can have unwanted side effects in the parent; at the same time, exit() may be required for the child to end cleanly, e.g. to delete a tmpfile() it created!). In other words, instead of using return, call _exit().
pid_t r(fork());
if(r == 0)
{
try
{
...child do work here...
}
catch(...)
{
// you probably want to log a message here...
}
_exit(0); // prevent stack unwinding and calls to atexit() functions
/* NOT REACHED */
}
This is much safer anyway, because you probably do not want the child to return into the "parent's code", where many other things could happen, not just stack unwinding (e.g. continuing a for() loop that the child is not supposed to continue...).
The _exit() function does not return, so destructors of objects defined on the stack do not get called. The try/catch is very important here: if the child throws an exception that escapes this block, _exit() is never reached. Either the exception propagates into the parent's code and unwinds the stack, running all your RAII destructors on the way, or it goes uncaught and std::terminate() gets called (which also does not destroy heap-allocated objects). Neither is what you would expect.
The difference between exit() and _exit() is that the former calls your atexit() functions (and flushes stdio buffers). You relatively rarely need that in the child or the parent. At least, I never had any strange side effect. However, some libraries do make use of atexit() without considering the possibility that fork() gets called. One way to protect yourself in an atexit() function is to record the PID of the process which requires the atexit() function. If the PID doesn't match when the function gets called, then you just return and do nothing else.
pid_t cleanup_pid = -1;
void cleanup()
{
if(cleanup_pid != getpid())
{
return;
}
... do your clean up here ...
}
void some_function_requiring_cleanup()
{
if(cleanup_pid != getpid())
{
cleanup_pid = getpid();
atexit(cleanup);
}
... do work requiring cleanup ...
}
Obviously, the number of libraries that use atexit() and do it right is probably very close to 0. So... you should avoid such libraries.
Remember that if you call execve() or _exit(), the cleanup will not occur. So in case of a tmpfile() call in the child + _exit(), that temporary file will not get deleted automatically...
Unless you know what you are doing, the child process should always call _exit() after it has done its stuff:
pid_t pid = fork();
if (pid == 0)
{
do_some_stuff(); // Make sure this doesn't throw anything
_exit(0);
}
The underscore is important. Do not call exit() in the child process: it flushes stream buffers to disk (or wherever the file descriptor is pointing), and you will end up with things written twice.
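The double write can be reproduced with a plain FILE*. The following is a minimal sketch; the helper name and the file path in the usage note are made up for illustration. One line is left in the stdio buffer, the process forks, and the child ends with either _exit() or exit().

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: buffers one line in a FILE*, forks, ends the child
// with _exit() or exit(), and returns the resulting file contents.
std::string fork_with_buffered_stdio(const char* path, bool child_uses_underscore_exit) {
    FILE* f = std::fopen(path, "w");
    if (!f) {
        return "";
    }
    std::fputs("hello\n", f);           // regular files are fully buffered: still in memory
    pid_t pid = fork();
    if (pid == 0) {
        if (child_uses_underscore_exit) {
            _exit(0);                   // child's copy of the buffer is discarded
        }
        std::exit(0);                   // child's copy of the buffer is flushed
    }
    if (pid > 0) {
        waitpid(pid, nullptr, 0);
    }
    std::fclose(f);                     // parent flushes its own copy

    std::string content;
    if (FILE* in = std::fopen(path, "r")) {
        char buf[256];
        std::size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
            content.append(buf, n);
        }
        std::fclose(in);
    }
    return content;
}
```

With fork_with_buffered_stdio("/tmp/demo.txt", true) the file contains "hello\n" once. Passing false makes the child call exit(), which flushes the child's copy of the buffer too, so the line typically shows up twice.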
Say I have used fork to create one child process. Here is an example:
pid_t pid=fork();
if (pid==0) /* child */
{
// do something
exit(0); // _exit, exit or return????
}
else /* parent */
{
wait(nullptr);
return 0;
}
I've seen many examples of fork. Some of them used _exit to terminate the child process to avoid flushing the I/O buffers, others used exit to terminate the child process. But none of them used return. As I understand it, _exit and exit won't call destructors automatically, so is it better to call return instead of exit in the child process? Or is it that all the examples I've seen are C rather than C++, so they don't need to worry about destructors?
You can use either _exit or exit, but you shouldn't use return. When you fork a child, you retain the entire call stack as part of forking the child. So if you use return, you end up returning up all the way through your program, potentially continuing on and performing other tasks, which is almost certainly not what you want.
For example, if you have something like this snippet:
int get_value()
{
pid_t pid;
if (!(pid = fork())) {
int x = 0;
// do something with x.
exit(x);
}
else {
int status;
wait(&status);
return status;
}
}
int main()
{
int value = get_value();
switch (value) {
case 0:
// call f
break;
case 255 << 8:
// call g
break;
}
}
you could end up calling f or g or doing other work with return, which is definitely not desired.
If you call _exit, functions that are registered with atexit are not called. This is the right thing to do in threaded environments. If you're not working in a threaded environment and you don't have any handlers registered with atexit, then the two should be functionally equivalent (apart from exit also flushing stdio buffers).
If you want destructors in your child process to be called, put the child process code in its own function and let its variables be automatically destroyed when they go out of scope. exit will not destroy objects for you, which is good because usually you do not want to destroy objects created in the parent process in your child process.
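A minimal sketch of that layout follows. The names child_work and run_in_child are made up for illustration. The child's objects live inside its own function and are destroyed when that function returns; only then does the child leave, via _exit(), without unwinding the parent's stack.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical child entry point: its locals are destroyed on return,
// before the child process ends.
int child_work() {
    std::string scratch = "abc";    // destroyed when child_work() returns
    return static_cast<int>(scratch.size());
}

// Runs child_main in a forked child and returns its exit code (-1 on error).
int run_in_child(int (*child_main)()) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(child_main());        // never returns into the parent's call stack
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

run_in_child(child_work) returns 3 in the parent. The sketch assumes child_main does not throw; an exception escaping it would still unwind into the caller's frames.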
You could use return if you are looking for an exit code of the child process, just to say the process ran and executed correctly/not. Same as you do with your main function in a program. Otherwise just use exit to stop the process from running any further.
fork will copy the whole process; it's not equivalent to launching a thread with a new main function.
Returning will simply return from the current function and the execution of the child will continue in the enclosing function.
So in your snippet you have to terminate the child or it will "escape". You can do that by calling exit() or std::terminate(). Destructors of automatic objects are not called in either case. Don't mix two different languages.
If you really need to call the destructors in the child, throw an exception and catch it in main. That will unwind the stack correctly.
exit() should be avoided in all cases except for ending the execution of the program. For anything else, I would use return.
My assignment requires me to encapsulate the principle of process handling.
Here's what my Process class contains:
class Process
{
public:
Process();
~Process();
pid_t getPid() const;
private:
pid_t pid_;
};
Constructor:
Process::Process()
{
this->pid_ = fork();
}
Destructor:
Process::~Process()
{
if (this->pid_ > 0)
kill(this->pid_, SIGKILL);
}
Here's the problem: after encapsulating and creating an object like such:
void example()
{
Process pro;
if (pro.getPid() == 0)
{
// Child Process
}
else if (pro.getPid() < 0)
{
// Error
}
else
{
// Parent Process
}
}
My program almost never enters the child code, but when I fork() normally (with no encapsulation) it works like a charm.
Where did I go wrong?
This sounds like a race condition. In your case, the parent seems to kill the child most of the time before it does what you expected of it. Without synchronization between the two processes, any execution order is possible. If you want the child to always perform its work, you have to implement some form of synchronization. A common hack (i.e. not really synchronization) is sleep()ing in the parent. Very common is wait()ing in the parent for completion of the child.
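One way to get rid of the race is to let the destructor wait for the child instead of killing it. The following is only a sketch of that idea, not the asker's class: ChildProcess, its members and sample_work are hypothetical names, and the child's code is passed in as a function so that the child never continues in the parent's code path.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical replacement for the Process class: the child runs `work`
// and exits; the parent's destructor waits instead of killing.
class ChildProcess {
public:
    explicit ChildProcess(int (*work)()) : pid_(fork()) {
        if (pid_ == 0) {
            _exit(work());          // child never runs the parent's code path
        }
    }
    ChildProcess(const ChildProcess&) = delete;
    ChildProcess& operator=(const ChildProcess&) = delete;
    ~ChildProcess() { wait(); }

    pid_t pid() const { return pid_; }

    // Blocks until the child has finished; returns its exit code or -1.
    int wait() {
        if (pid_ <= 0) {
            return -1;              // fork() failed or child already reaped
        }
        int status = 0;
        pid_t r = waitpid(pid_, &status, 0);
        pid_ = -1;                  // reaped: never wait for this pid again
        return (r > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
    }

private:
    pid_t pid_;
};

int sample_work() { return 5; }     // stand-in for the real child code
```

In the parent, ChildProcess p(sample_work); followed by p.wait() returns 5 once the child has completed. If wait() is never called, the destructor blocks until the child is done.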
I have some code that looks like this and I'm unsure how to handle the part which will never get executed since a part of this code runs in infinite loop while waiting for connections and when I terminate the program, it exits from there only.
int main(){
// do some stuff....
while(1) {
int newFD =
accept(sockFD, (struct sockaddr *)&client_addr, &client_addr_size);
if(newFD == -1) {
std::cerr << "Error while Accepting on socket" << std::endl;
continue;
}
if(!fork()) {
close(sockFD); // close child's sockfd - not needed here
// lalala do stuff send message here
close(newFD); // finally close its newFD - message sent, no use
return 0;
}
close(newFD); // close parent's newFD - no use here
}
// now execution never reaches here
close(sockFD); // so how to handle this?
freeaddrinfo(res); // and this?
return 0;
}
You can, and probably should, add an exit handler if your code is to be used by other people or you just want it cleaner. In your exit handler you can toggle a flag that makes the while() loop terminate. The following code works for this use case and is portable, but if you want to do more complicated things you should use proper thread-safe OS-specific functions or something like Boost or C++11.
First declare two global variables and make them volatile so the compiler always reads and writes their actual memory values. If we do not declare them volatile, the compiler may keep a value in a register, which will make this not work. With volatile, the memory location is read on every loop iteration. Note that volatile alone does not make access from multiple threads safe (use std::atomic for that), and the only flag type the standard guarantees for use in a signal handler is volatile std::sig_atomic_t.
volatile bool bRunning=true;
volatile bool bFinished=false;
Then replace your while(1) {} loop with this:
while(bRunning)
{
dostuff
}
bFinished=true;
In your exit handler simply set bRunning=false;
void ExitHandler()
{
bRunning=false;
while(bFinished==false) { Sleep(1); }
}
You didn't specify an operating system, but it looks like you are on Linux. To set a handler on Linux you need this:
void ExitHandler(int s)
{
bRunning=false;
}
int main()
{
struct sigaction sigIntHandler;
sigIntHandler.sa_handler = ExitHandler;
sigemptyset(&sigIntHandler.sa_mask);
sigIntHandler.sa_flags = 0;
sigaction(SIGINT, &sigIntHandler, NULL);
while(bRunning)
{
dostuff
}
...error_handling...
}
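A variant of the Linux handler with the flag type that is guaranteed to be safe in a signal handler could look like the sketch below. The names g_stop_requested, request_stop and install_stop_handler are made up for illustration.

```cpp
#include <csignal>
#include <signal.h>

// Flag type that may safely be written from a signal handler.
static volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void request_stop(int) { g_stop_requested = 1; }

// Hypothetical helper: installs request_stop() for SIGINT; true on success.
bool install_stop_handler() {
    struct sigaction sa {};
    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;    // no SA_RESTART: a blocking accept() fails with EINTR
    return sigaction(SIGINT, &sa, nullptr) == 0;
}
```

The accept loop then becomes while(!g_stop_requested). Because SA_RESTART is not set, a pending accept() returns -1 with errno set to EINTR when Ctrl-C arrives, so the loop gets a chance to re-check the flag and fall through to the cleanup code.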
And on Windows, when you are a console app, it's the following:
BOOL WINAPI ConsoleHandler(DWORD CEvent)
{
switch (CEvent)
{
case CTRL_C_EVENT:
case CTRL_BREAK_EVENT:
case CTRL_CLOSE_EVENT:
case CTRL_LOGOFF_EVENT:
case CTRL_SHUTDOWN_EVENT:
bRunning = false;
while (bFinished == false) Sleep(1);
break;
}
return TRUE;
}
int main()
{
SetConsoleCtrlHandler(ConsoleHandler, TRUE);
while(bRunning)
{
dostuff
}
...error_handling...
}
Notice the need to test and wait for bFinished here. If you don't do this on Windows, your app may not have enough time to shut down, as the exit handler is called by a separate OS-specific thread. On Linux this is not necessary, and you need to return from your handler for your main thread to continue.
Another thing to note is by default Windows only gives you ~5 seconds to shut down before it terminates you. This is unfortunate in many cases and if more time is needed you will need to change the registry setting (bad idea) or implement a service which has better hooks into such things. For your simple case it will be fine.
For these things, the OS will take care of properly releasing the resources on shutdown. However, more generally, you still need to make sure that allocated resources don't pile up during program execution, even if they are reclaimed by the OS automatically, because such a resource leak will still influence behaviour and performance of your program.
Now, concerning the resources at hand, there's no reason not to treat them like all resources in C++. The accepted rule is to bind them to an object that will release them in their destructor, see also the RAII idiom. That way, even if at some later stage someone added a break statement the code would still behave correctly.
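For a file descriptor such as sockFD, such an object can be very small. The following is a minimal sketch; FdGuard and fd_is_open are hypothetical names, not part of any library.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical RAII wrapper for a file descriptor.
class FdGuard {
public:
    explicit FdGuard(int fd = -1) : fd_(fd) {}
    FdGuard(const FdGuard&) = delete;
    FdGuard& operator=(const FdGuard&) = delete;
    ~FdGuard() {
        if (fd_ != -1) {
            close(fd_);             // released on every path out of the scope
        }
    }
    int get() const { return fd_; }

private:
    int fd_;
};

// Returns true if fd currently refers to an open descriptor.
bool fd_is_open(int fd) { return fcntl(fd, F_GETFD) != -1; }
```

Wrapping the listening socket as FdGuard listener(sockFD); at the top of main() means the descriptor is closed whether the loop ends through a flag, a break, or an exception. Copying is disabled so that two guards can never close the same descriptor.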
BTW: The more serious problem I see here is the lack of proper error handling in general.
I'd like to create a process by calling an executable, just as popen would allow. But I don't want to actually communicate with it through a pipe: I want to control it, e.g. send signals to it, find out whether the process is running, wait for it to finish after sending SIGINT and so on, just like multiprocessing in Python works. Like this:
pid_t A = create_process("foo");
pid_t B = create_process("bar");
join(B); // wait for B to return
send_signal(A, SIGINT);
What's the proper way to go?
Use case for example:
monitoring a bunch of processes (like restarting them when they crash)
UPDATE
I see in which direction the answers are going: fork(). Then I'd like to modify my use case: I'd like to create a class which takes a string in the constructor and is specified as follows: When an object is instantiated, a (sub)process is started (and controlled by the instance of the class), when the destructor is called, the process gets the terminate signal and the destructor returns as soon as the process returned.
Use case now: In a boost state chart, start a process when a state is entered, and send termination when the state has been left. I guess http://www.highscore.de/boost/process/process/tutorials.html#process.tutorials.start_child is the thing that comes closest to what I'm looking for, except that it seems outdated.
Isn't that possible in a non-invasive way? Maybe I have a fundamental misunderstanding and there is a better way to do this kind of work, if so I'd be glad to get some hints.
UPDATE 2
Thanks to the answers below, I think I got the idea a little bit. I thought, this example would print "This is main" three times, once for the "parent", and once for each fork() – but that's wrong. So: Thank you for the patient answers!
#include <iostream>
#include <string>
#include <unistd.h>
struct myclass
{
pid_t the_pid;
myclass(std::string the_call)
{
the_pid = fork();
if(the_pid == 0)
{
execl(the_call.c_str(), the_call.c_str(), (char*)NULL);
}
}
};
int main( int argc, char** argv )
{
std::cout << "This is main" << std::endl;
myclass("trivial_process");
myclass("trivial_process");
}
The code below is not realistic at all, but it gives you some idea.
pid_t pid = fork();
if (pid == 0) {
// this is child process
execl("foo", "foo", NULL);
}
// continue your code in the main process.
Using the previously posted code, try this:
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>
class MyProc
{
public:
MyProc( const std::string& cmd)
{
m_pid = fork();
if (m_pid == 0) {
execl(cmd.c_str(), cmd.c_str(), (char*)NULL);
_exit(127); // only reached if execl() failed
}
}
~MyProc()
{
// Just in case we have 0 (or -1): we do not want to kill ourselves
if( m_pid > 0 )
{
kill(m_pid, SIGKILL);
waitpid(m_pid, NULL, 0); // reap the child so it does not stay a zombie
}
}
private:
pid_t m_pid;
};
The downside I see in this example is that you cannot be sure the process has finished once the signal has been sent (and it probably has not), since kill() returns immediately and the other process may receive the signal with a delay.
To make sure, you could check with ps ... and grep for the pid; that should work.
Edit: I have added the wait, which came up in a comment above!
Have a look at fork() (man 2 fork).
I have a program that uses fork() to create a child process. I have seen various examples that use wait() to wait for the child process to end before closing, but I am wondering what I can do to simply check if the child process is still running.
I basically have an infinite loop and I want to do something like:
if(child process has ended) break;
How could I go about doing this?
Use waitpid() with the WNOHANG option.
int status;
pid_t result = waitpid(ChildPID, &status, WNOHANG);
if (result == 0) {
// Child still alive
} else if (result == -1) {
// Error
} else {
// Child exited
}
You don't need to wait for a child until you get the SIGCHLD signal. If you've gotten that signal, you can call wait and see if it's the child process you're looking for. If you haven't gotten the signal, the child is still running.
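A sketch of the SIGCHLD approach is shown below. The names g_child_exited, on_sigchld and wait_for_child_via_sigchld are made up, and the child merely exits with code 42 as a stand-in for real work. The signal is blocked around fork() so that it cannot slip in between checking the flag and going to sleep.

```cpp
#include <csignal>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile std::sig_atomic_t g_child_exited = 0;

extern "C" void on_sigchld(int) { g_child_exited = 1; }

// Hypothetical helper: forks a child, sleeps until SIGCHLD arrives, then
// reaps the child and returns its exit code (-1 on error).
int wait_for_child_via_sigchld() {
    struct sigaction sa {};
    struct sigaction old_sa {};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;          // only report terminated children
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) {
        return -1;
    }

    sigset_t block, old_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);   // hold SIGCHLD back for now

    g_child_exited = 0;
    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        _exit(42);                       // stand-in for the child's real work
    }
    if (pid > 0) {
        sigset_t wait_mask = old_mask;
        sigdelset(&wait_mask, SIGCHLD);  // let SIGCHLD through while sleeping
        while (!g_child_exited) {
            sigsuspend(&wait_mask);      // atomically unblock and wait
        }
        int status = 0;
        if (waitpid(pid, &status, WNOHANG) == pid && WIFEXITED(status)) {
            result = WEXITSTATUS(status);
        }
    }
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    sigaction(SIGCHLD, &old_sa, nullptr);        // restore the previous handler
    return result;
}
```

In a real program the parent would do other work instead of sleeping in sigsuspend() and would simply check g_child_exited in its loop. The sketch assumes a single child; with several children one SIGCHLD may stand for more than one exit, so waitpid() should then be called in a loop.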
Obviously, if you need to do nothing until the child finishes, just call wait.
EDIT: If you just want to know if the child process stopped running, then the other answers are probably better. Mine is more to do with synchronizing when a process could do several computations, without necessarily terminating.
If you have some object representing the child computation, add a method such as bool isFinished() which would return true if the child has finished. Have a private bool member in the object that represents whether the operation has finished. Finally, have another method private setFinished(bool) on the same object that your child process calls when it finishes its computation.
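Now the most important thing is mutex locks. Make sure you have a per-object mutex that you lock every time you access any members, including inside the bool isFinished() and setFinished(bool) methods. Note that this only works if both sides see the same object: with threads that is automatic, but after fork() each process has its own copy, so the object (and a process-shared mutex) would have to live in shared memory.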
EDIT2: (some OO clarifications)
Since I was asked to explain how this could be done with OO, I'll give a few suggestions, although it heavily depends on the overall problem, so take this with a mound of salt. Having most of the program written in C style, with one object floating around is inconsistent.
As a simple example you could have a class called ChildComputation
class ChildComputation {
public:
// constructor
ChildComputation(/*some params to differentiate each child's computation*/)
: m_isFinished(false) // populate internal members here
{
}
~ChildComputation();
public:
bool isFinished() {
// lock here as well, so that the read cannot race with setFinished()
m_mutex.lock();
bool finished = m_isFinished;
m_mutex.unlock();
return finished;
}
void doComputation() {
// put code here for your child to execute
this->setFinished(true);
}
private:
void setFinished(bool finished) {
m_mutex.lock();
m_isFinished = finished;
m_mutex.unlock();
}
private:
// class members
mutex m_mutex; // replace mutex with whatever mutex you are working with
bool m_isFinished;
// other stuff needed for computation
};
Now in your main program, where you fork:
ChildComputation* myChild = new ChildComputation(/*params*/);
ChildPID= fork();
if (ChildPID == 0) {
// will do the computation and automatically set its finish flag.
myChild->doComputation();
}
else {
while (1) { // your infinite loop in the parent
// ...
// check if child completed its computation
if (myChild->isFinished()) {
break;
}
}
// at the end, make sure the child is no longer running, and dispose of the object
// when you don't need it.
waitpid(ChildPID, NULL, 0);
delete myChild;
}
Hope that makes sense.
To reiterate, what I have written above is an ugly amalgamation of C and C++ (not in terms of syntax, but style/design), and is just there to give you a glimpse of synchronization with OO, in your context.
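One caveat when combining this with fork(): the parent and the child each get their own copy of myChild, so a flag set by the child is not visible to the parent unless it lives in memory that both processes share. The following is a minimal sketch of such a shared flag, assuming MAP_ANONYMOUS is available and std::atomic<int> is lock-free on the platform. The helper name is made up for illustration.

```cpp
#include <atomic>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: the child sets a flag in shared memory, the parent
// polls it. Returns true if the parent observed the flag.
bool child_sets_shared_flag() {
    void* mem = mmap(nullptr, sizeof(std::atomic<int>), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    auto* finished = new (mem) std::atomic<int>(0);

    bool seen = false;
    pid_t pid = fork();
    if (pid == 0) {
        finished->store(1);         // visible to the parent: the page is shared
        _exit(0);
    }
    if (pid > 0) {
        int status = 0;
        bool reaped = false;
        while (finished->load() == 0 && !reaped) {
            // the parent's loop would do its real work here
            reaped = waitpid(pid, &status, WNOHANG) == pid;
            usleep(1000);
        }
        if (!reaped) {
            waitpid(pid, &status, 0);
        }
        seen = finished->load() == 1;
    }
    munmap(mem, sizeof(std::atomic<int>));
    return seen;
}
```

child_sets_shared_flag() returns true in the parent. With an ordinary member variable instead of the shared mapping, the parent would keep reading its own, unchanged copy.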
I'm posting the same answer here that I posted to the question "How to check if a process is running in C++?", as this is basically a duplicate. The only difference is the use case of the function.
Use kill(pid, sig) but check for the errno status. If you're running as a different user and you have no access to the process it will fail with EPERM but the process is still alive. You should be checking for ESRCH which means No such process.
If you're running a child process, kill will succeed until waitpid is called, which forces the cleanup of any defunct processes as well.
Here's a function that returns true if the process is still running and cleans up defunct processes as well.
bool IsProcessAlive(int ProcessId)
{
// Wait for child process, this should clean up defunct processes
waitpid(ProcessId, nullptr, WNOHANG);
// kill failed let's see why..
if (kill(ProcessId, 0) == -1)
{
// First of all kill may fail with EPERM if we run as a different user and we have no access, so let's make sure the errno is ESRCH (Process not found!)
if (errno != ESRCH)
{
return true;
}
return false;
}
// If kill didn't fail the process is still running
return true;
}