Ownership of global resources after fork - C++

Consider a Foo that holds some resource
struct Foo
{
    ~Foo();
};
and a global std::vector<Foo>. Perhaps a stupid example, but it illustrates the problem well.
std::vector<Foo> bar;
Now, the process forks.
If bar is then only modified by the child process, a call to exit should be the proper thing to do within the child process. If _exit is called instead, any Foos in bar would leak. If the parent added some items to bar before the fork, those objects may be destroyed twice. Or maybe this is not a problem, because they should be considered different objects.
What is the proper way of dealing with object lifetime together with a fork? Is the only sane way of dealing with this problem to let the child exec and start over?
I should note that at this point in the program, there is guaranteed to be only one thread.

What is the proper way of dealing with object lifetime together with a fork?
The proper way to handle shared resources when forking depends on what those objects or resources are.
For any object or variable in the process memory, you automatically get a copy of it upon forking. Each process will then be able to modify or destroy any variable without affecting the other process. This also means that each process is responsible for cleaning up its unique copy of the resource.
For other resources that exist outside the process, like a file, web socket, or shared memory, the best way to handle them will depend on what that resource is. Normally those best practices will be outlined by the library / API you used to create those resources in the first place.

Once you've forked, your variables exhibit copy-on-write semantics, so any changes by the child process result in unique variables for the child not shared with the parent. Similarly, changes in the parent process will result in the child having new copies and do not propagate, so the parent can go as far as to exit without interrupting the child. I've done this to implement a self-updating program.
Note that as stated in other answers, "global resources" should be treated on a case by case basis, but variables are not global resources.
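As a minimal sketch of those copy semantics (assuming a POSIX system; the logging destructor and the main() harness are mine, purely to make the behaviour visible), each process ends up with its own bar and cleans up only its own copy:

#include <sys/types.h>
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, getpid
#include <cstdio>
#include <cstdlib>      // std::exit
#include <vector>

struct Foo { ~Foo() { std::printf("~Foo in pid %d\n", getpid()); } };

std::vector<Foo> bar;   // global resource held in process memory

int main() {
    bar.emplace_back();           // created before the fork; both processes get a copy
    pid_t pid = fork();
    if (pid == 0) {               // child: owns its own copy of bar
        bar.emplace_back();       // does not affect the parent's bar
        std::exit(0);             // runs destructors for the child's copies only
                                  // (_exit(0) would skip them, leaking the child's Foos)
    }
    waitpid(pid, nullptr, 0);     // parent: its bar still holds exactly one element
    return 0;                     // parent's copy is destroyed on its own exit
}

Each process prints its own pid in the destructor, so nothing is "destroyed twice": the parent and child destroy distinct copies.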

Related

Can I cause a Meyers singleton to reinitialize in forked child processes?

Say I have a Meyers singleton
Data& get() {
    static Data data = initialization_work();
    return data;
}
that's already been used and initialized in a parent process. I then fork(2) a child process, and in the child process data may need to be different (i.e., I'd like initialization_work() to rerun the first time get() is called in the child).
First, will data automatically be reinitialized in the forked child? My suspicion is no, based on my mediocre knowledge of Linux (the forked child's VA space is a duplicate of the parent's, mapped to physical memory copy-on-write, so I assume the child process will see data as already initialized and won't redo initialization_work()) and a couple of other questions (C static variables and linux fork, C++ static variable and multiple processes).
Second, if data will not be reinitialized in the child by default, is there a way to force reinitialization the first time the child calls get()? If not, I can try to figure out some other pattern (and I'd really appreciate any suggestions on what pattern might fit this use case).
First, will data automatically be reinitialized in the forked child? My suspicion is no
Why just a suspicion? Indeed, nothing will be reinitialized. As @MartinYork mentions, you get a copy of the original process. That's the magic of forking: no need to initialize the world. The parent and the child both proceed to act differently based (almost) only on the return value of the fork() call.
Second, if data will not be reinitialized in the child by default, is there a way to force reinitialization the first time the child calls get()?
That's the same question as asking about forcing a reinitialization of a singleton in the original process. Now, of course there are ways...
You get a non-const reference to data, right? Well, you can destroy it explicitly, then use placement-new on that address. I'm... almost sure that's legit.
@BenVoigt's neat alternative to the above: use an std::optional<Data> singleton. It's still a sort-of-a-singleton, but you can assign nullopt to it for destruction, then construct a new Data in place with std::optional::emplace (a sketch of this follows below).
Change the class' code to place the data variable somewhere more visible.
Just drop the singleton pattern. It is unlikely you really need it. Watch:
Retiring the Singleton Pattern: Concrete Suggestions for What to use Instead, a talk by Peter Muldoon at CppCon'20.
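For the std::optional route mentioned above, a rough sketch could look like this (initialization_work() is stubbed out here, and reset_singleton() is a name I made up, not anything from the question):

#include <optional>

struct Data { int value = 0; };

// Stand-in for the question's initialization_work().
Data initialization_work() { return Data{42}; }

std::optional<Data>& storage() {
    static std::optional<Data> data;          // still a function-local static
    return data;
}

Data& get() {
    auto& d = storage();
    if (!d) d.emplace(initialization_work()); // (re)initialize on first use
    return *d;
}

// Hypothetical helper: call this in the child right after fork() so the
// next get() re-runs initialization_work().
void reset_singleton() {
    storage().reset();                        // destroys the old Data in place
}

The child would call reset_singleton() once after fork(); the parent never calls it and keeps its original data.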

Shared and local variables C++

I need to implement a solution for the readers-writers problem in a file system I'm developing. I was searching on the internet and found this Wikipedia solution. Since I'm told no starvation may occur, I chose the third solution. Now, I'm new to multithreaded programming and I have one question: how do I separate shared and local variables? I want to instantiate one ReadersWriters class in every file object that would manage access to that file, so every thread needs to have its own prev and current local variables, and all of them need to share the nreaders variable. I want to place them in the ReadersWriters class.
As far as I know there are only two ways for the parent thread to share data with a child thread.
Global Variables
Passing it via a pointer during creation of the child thread.
Obviously new pointers may be tacked onto any existing objects.
Local variables will remain thread-local unless you do something to prevent them from being so. Remember that each thread will have its own stack.
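As a small illustration of the second option (the names nreaders, prev, and current follow the question; the rest is mine, and this is not the full no-starvation algorithm from the Wikipedia article, only the sharing mechanics): each thread receives a pointer to the shared object at creation, while its own stack variables stay local.

#include <atomic>
#include <thread>
#include <vector>

struct ReadersWriters {
    std::atomic<int> nreaders{0};            // shared by every thread using this file
};

void reader(ReadersWriters* rw) {            // shared state arrives via a pointer
    int prev = rw->nreaders.fetch_add(1);    // prev lives on this thread's stack
    int current = prev + 1;                  // so does current
    // ... read the file ...
    rw->nreaders.fetch_sub(1);
    (void)current;
}

int main() {
    ReadersWriters rw;                       // one instance per file object
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(reader, &rw);   // pass the pointer when creating the thread
    for (auto& t : threads) t.join();
}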

C++ - Initialize 2 copies of object owned by different threads

Suppose I have thread A and thread B. Each thread owns a copy of the same object (let's call it "Foo" for simplicity). Certain pieces of data in Foo are initialized from a file on startup. So that means that on startup, I have to set the data read from the file in both copies of Foo. If no data is read from the file, then I still want to set the same data instance in both copies of Foo and not have each thread initialize the data separately.
Due to the architecture of the software I am working on, I am unable to perform this work on construction of either copy of Foo. So I am forced to do some type of Initialize() or similar method that is called after construction of the object. Since this all happens during the initialization phase of the application, I do not need to be concerned about thread-safety, but I am still concerned about the cleanliness of this solution. Here is what I have come up with so far:
ThreadA::Start()
{
    ...
    // Initialize() will read the data from the file if it exists. Otherwise, it will set default values.
    m_Foo.Initialize();
    // Now call Initialize() for thread B's copy of Foo while running in thread A but before thread B
    // is even started. Pass in thread A's copy of Foo to set the values with. This should be safe
    // to do since thread B has not even started yet.
    m_ThreadB.GetFoo().Initialize(m_Foo);
    ...
}
Is this the cleanest way to initialize the data in both copies of Foo? Thanks!
I think you might be overthinking it because there are threads. If no threads are running, then you can use any method you wish to initialize them because there are no race conditions. Don't even think about the threads. For the rest of this, I will call ThreadA and ThreadB BugbearA and BugbearB instead.
If you only have two Bugbears, and BugbearA always starts before BugbearB, then your solution is the best one can ever come up with.
If the situation is actually more complicated than you let on, I would recommend refactoring the data to have a shared object containing the data that may be initialized from a file, and an unshared object for each Bugbear. The first Bugbear to Start() initializes the shared object. All threads copy that data into their own Foo and start.
This refactor will also set you up for multithreading, if you need it later. All you have to do is provide proper synchronization for the shared object.
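A sketch of that refactor (the Config type and InitializeFromFile() are mine, standing in for the real file reading; Foo and Bugbear follow the question's naming):

#include <string>

// Shared, initialized once by whichever Bugbear starts first.
struct Config {
    std::string data = "defaults";
    bool initialized = false;
    void InitializeFromFile() {
        // read the file here; keep the defaults if it doesn't exist
        initialized = true;
    }
};

struct Foo {
    std::string data;
    void Initialize(const Config& cfg) { data = cfg.data; }   // copy the shared values
};

struct Bugbear {
    Foo foo;
    void Start(Config& shared) {
        if (!shared.initialized) shared.InitializeFromFile();  // first starter does the work
        foo.Initialize(shared);  // every Bugbear copies the same data into its own Foo
        // ... then spawn the real thread, which only ever touches this->foo ...
    }
};

int main() {
    Config shared;
    Bugbear a, b;
    a.Start(shared);   // reads the file (or sets defaults)
    b.Start(shared);   // just copies the already-initialized data
}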

Static objects and singletons

I'm using boost singletons (from serialization).
For example, there are some classes which inherit boost::serialization::singleton. Each of them has such a define near its definition (in the header file):
#define appManager ApplicationManager::get_const_instance()
class ApplicationManager: public boost::serialization::singleton<ApplicationManager> { ... };
And I have to call some method from that class every update (roughly every 17 ms), for example 200 times. So the code is like:
for (int i=0; i < 200; ++i)
    appManager.get_some_var();
I looked at the function call stack with gprof and saw that boost::get_const_instance is called each time. Maybe the compiler will optimize this in release mode?
My idea is to make some global variable like:
ApplicationManager &handle = ApplicationManager::get_const_instance();
And use handle, so it wouldn't call get_const_instance each time. Is that right?
Instead of using the Singleton anti-pattern, just use a global variable and be done with it. It's more honest.
The main benefit of a Singleton is when you want lazy initialization, or more fine-grained control over initialization order than a global variable would allow you. It doesn't look like either of these things is a concern for you, so just use a global.
Personally, I think designs with global variables or Singletons are almost certainly broken. But to each h(is/er) own.
If you are bent on using a Singleton, the performance concern you raise is interesting, but likely not an issue as the function call overhead is probably less than 100ns. As was pointed out, you should profile. If it really concerns you a whole lot, store a local reference to the Singleton before the loop:
ApplicationManager &myAppManager = appManager;
for (int i=0; i < 200; ++i)
    myAppManager.get_some_var();
BTW, using that #define in that way is a serious mistake. Almost any use of the preprocessor for anything other than conditional compilation based on compile-time flags is probably a poor one. Boost does make extensive use of the preprocessor, but mostly to get around C++ limitations. Do not emulate it.
Lastly, that function is probably doing something important. One of the jobs of a get_instance method for Singletons is to avoid having multiple threads initialize the same Singleton at the same time. With global variables this shouldn't be an issue because they should be initialized before you've started any threads.
Is it really a problem? I mean, does your application really suffer from this behaviour?
I would despise such a solution because, in all effects, you are countering one of the benefits of the Singleton pattern, namely to avoid global variables. If you want to use a global variable, then don't use Singleton at all, right?
Yes, that is certainly a possible solution. I'm not entirely sure what boost is doing with its singleton behind the scenes; you can look that up yourself in the code.
The singleton pattern is just like creating a global object and accessing the global object, in most respects. There are some differences:
1) The singleton object instance is not created until it is first accessed, whereas the global object is created at program startup.
2) Because the singleton object is not created until it is first accessed, it is actually created when the program is running. Thus the singleton instance has access to other fully constructed objects in the program when the constructor is actually running.
3) Because you access the singleton through the getInstance() method (boost's get_const_instance method) there is a little bit of overhead for executing that method call.
So if you're not concerned about when the singleton is actually created, and can live with it being created at program startup, you could just go with a global variable and access that. If you really need the singleton created after the program starts up, then you need the singleton. In that case, you can grab and hold onto a reference to the object returned by get_const_instance() and use that reference.
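If the construction timing is what matters, a small sketch (the Tracer type and the printouts are mine, purely illustrative) shows the difference between a namespace-scope global and a function-local static:

#include <cstdio>

struct Tracer {
    explicit Tracer(const char* name) { std::printf("constructing %s\n", name); }
};

Tracer global_instance("global");          // constructed before main() runs

Tracer& get_instance() {
    static Tracer instance("singleton");   // constructed on the first call only
    return instance;
}

int main() {
    std::printf("main started\n");         // "constructing global" was already printed
    get_instance();                        // "constructing singleton" is printed here
    get_instance();                        // no further construction
}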
Something that bit me in the past, though, that you should be aware of: you're actually getting a reference to the object that is owned by the singleton. You don't own that object.
1) Do not write code that would cause the destructor to execute (say, using a shared pointer on the returned reference), or write any other code that could cause the object to end up in a bad state.
2) In a multi-threaded app, take care to correctly lock fields in the object if the object may be used by more than one thread.
3) In a multi-threaded app, make sure that all threads that hold references to the object terminate before the program is unloaded. I've seen a case where the singleton's code resided in one DLL and a thread holding the reference lived in another DLL. When the program ended, the thread was still active. The DLL holding the singleton's code was unloaded first; the thread that was still alive tried to do something to the singleton's object and caused a crash.
Singletons have their advantages in situations where you want to control access to something at process or application scope beyond what a global variable could achieve, and to do so in a more elegant way.
However, most singleton objects provided by a library will be designed to ensure some level of thread safety, and access to the instance is most likely locked via a mutex or other critical section of some kind, which can affect performance.
In the case of a game or 3D application where performance is key, you may want to consider making your own lightweight singleton, if thread safety is not a concern, and gain some performance.

Cleaning up threads referencing an object when deleting the object (in C++)

I have an object (Client * client) which starts multiple threads to handle various tasks (such as processing incoming data). The threads are started like this:
// Start the thread that will process incoming messages and stuff them into the appropriate queues.
mReceiveMessageThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)receiveRtpMessageFunction, this, 0, 0);
These threads all have references back to the initial object, like so:
// Thread initialization function for receiving RTP messages from a newly connected client.
static int WINAPI receiveRtpMessageFunction(LPVOID lpClient)
{
    LOG_METHOD("receiveRtpMessageFunction");
    Client * client = (Client *)lpClient;
    while (client->isConnected())
    {
        if (client->receiveMessage() == ERROR)
        {
            Log::log("receiveRtpMessageFunction Failed to receive message");
        }
    }
    return SUCCESS;
}
Periodically, the Client object gets deleted (for various good and sufficient reasons). But when that happens, the processing threads that still have references to the (now deleted) object throw exceptions of one sort or another when trying to access member functions on that object.
So I'm sure that there's a standard way to handle this situation, but I haven't been able to figure out a clean approach. I don't want to just terminate the thread, as that doesn't allow for cleaning up resources. I can't set a property on the object, as it's precisely properties on the object that become inaccessible.
Thoughts on the best way to handle this?
I would solve this problem by introducing a reference count to your object. The worker thread would hold a reference and so would the creator of the object. Instead of using delete, you decrement from the reference count and whoever drops the last reference is the one that actually calls delete.
You can use existing reference counting mechanisms (shared_ptr etc.), or you can roll your own with the Win32 APIs InterlockedIncrement() and InterlockedDecrement() or similar (maybe the reference count is a volatile DWORD starting out at 1...).
The only other thing that's missing is that when the main thread releases its reference, it should signal to the worker thread to drop its own reference. One way you can do this is by an event; you can rewrite the worker thread's loop as calls to WaitForMultipleObjects(), and when a certain event is signalled, you take that to mean that the worker thread should clean up and drop the reference.
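A sketch of that scheme using the standard library rather than raw Win32 calls (the shared_ptr plays the role of the reference count and an atomic flag stands in for the event; all names here are mine):

#include <atomic>
#include <memory>
#include <thread>

struct Client {
    std::atomic<bool> stopRequested{false};
    bool receiveMessage() { /* ... */ return true; }
};

void receiveLoop(std::shared_ptr<Client> client) {   // the worker holds its own reference
    while (!client->stopRequested) {
        client->receiveMessage();
    }
}   // the worker's reference is dropped when the loop ends

int main() {
    auto client = std::make_shared<Client>();   // creator's reference
    std::thread worker(receiveLoop, client);    // worker gets a copy of the shared_ptr

    // ... later, when the Client should go away ...
    client->stopRequested = true;   // "signal the event": tell the worker to finish up
    worker.join();                  // wait for it to drop its reference
    client.reset();                 // last reference released: Client is deleted here
}

This polls a flag instead of blocking on an event; with the original blocking receiveMessage() you would indeed want something like WaitForMultipleObjects() so the stop request can interrupt a wait, as described above.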
You don't have much leeway because of the running threads.
No combination of shared_ptr + weak_ptr may save you... you may call a method on the object while it's still valid and then have its destruction ordered (using only shared_ptr would).
The only thing I can imagine is to first terminate the various threads and then destroy the object. This way you ensure that each thread terminates gracefully, cleaning up its own mess if necessary (and it might need the object to do that).
This means that you cannot delete the object out of hand, since you must first resynchronize with those who use it, and that you need some event handling for the synchronization part (since you basically want to tell the threads to stop, and not wait indefinitely for them).
I leave the synchronization part to you, there are many alternatives (events, flags, etc...) and we don't have enough data.
You can deal with the actual cleanup from either the destructor itself or by overloading the various delete operations, whichever suits you.
You'll need to have some other state object the threads can check to verify that the "client" is still valid.
One option is to encapsulate your client reference inside some other object that remains persistent, and provide a reference to that object from your threads.
You could use the observer pattern with proxy objects for the client in the threads. The proxies act like smart pointers, forwarding access to the real client. When you create them, they register themselves with the client, so that it can invalidate them from its destructor. Once they're invalidated, they stop forwarding and just return errors.
This could be handled by passing a (boost) weak pointer to the threads.
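A sketch of that idea with std::weak_ptr standing in for the Boost weak pointer (the Client stub and the timing in main() are mine, just for illustration):

#include <chrono>
#include <memory>
#include <thread>

struct Client {
    bool receiveMessage() { /* ... */ return true; }
};

void receiveLoop(std::weak_ptr<Client> weakClient) {
    for (;;) {
        std::shared_ptr<Client> client = weakClient.lock();  // temporary strong reference
        if (!client) return;          // the Client has been released; stop cleanly
        client->receiveMessage();     // safe: the object stays alive for this iteration
    }                                 // the temporary strong reference is released each loop
}

int main() {
    auto client = std::make_shared<Client>();
    std::thread worker(receiveLoop, std::weak_ptr<Client>(client));

    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    client.reset();   // drop the owner's reference; the worker notices on its next lock()
    worker.join();
}

Note that the worker may keep the Client alive for one last iteration after the owner drops its reference, which is exactly the sort of subtlety the shared_ptr/weak_ptr caveat above is pointing at.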