C++ - Initialize 2 copies of object owned by different threads

Suppose I have thread A and thread B. Each thread owns a copy of the same object (let's call it "Foo" for simplicity). Certain pieces of data in Foo are initialized from a file on startup. So that means that on startup, I have to set the data read from the file in both copies of Foo. If no data is read from the file, then I still want to set the same data instance in both copies of Foo and not have each thread initialize the data separately.
Due to the architecture of the software I am working on, I am unable to perform this work on construction of either copy of Foo. So I am forced to do some type of Initialize() or similar method that is called after construction of the object. Since this all happens during the initialization phase of the application, I do not need to be concerned about thread-safety, but I am still concerned about the cleanliness of this solution. Here is what I have come up with so far:
void ThreadA::Start()
{
    ...
    // Initialize() will read the data from the file if it exists. Otherwise, it will set default values.
    m_Foo.Initialize();
    // Now call Initialize() for thread B's copy of Foo while running in thread A but before thread B is even started.
    // Pass in thread A's copy of Foo to set the values with. This should be safe to do since thread B has not even started yet.
    m_ThreadB.GetFoo().Initialize(m_Foo);
    ...
}
Is this the cleanest way to initialize the data in both copies of Foo? Thanks!

I think you might be overthinking it because there are threads. If no threads are running, then you can use any method you wish to initialize the copies because there are no race conditions. Don't even think about the threads. For the rest of this, I will call thread A and thread B BugbearA and BugbearB instead.
If you only have two Bugbears, and BugbearA always starts before BugbearB, then your solution is the best one can ever come up with.
If the situation is actually more complicated than you let on, I would recommend refactoring the data into a shared object containing the data that may be initialized from a file, and an unshared object for each Bugbear. The first Bugbear to Start() initializes the shared object. All threads copy that data into their own Foo and start.
This refactor will also set you up for multithreading, if you need it later. All you have to do is provide proper synchronization for the shared object.
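A minimal sketch of that refactor, assuming hypothetical names (SharedConfig, LoadSharedConfig, GetSharedConfig) that are not in the question. The shared data is built once through a function-local static, which C++11 initializes in a thread-safe way, and each thread copies it into its own Foo:

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical data that would come from the file (or from defaults).
struct SharedConfig {
    int retries;
    std::string name;
};

// Assumption: a real program would parse the file here and fall back
// to default values if the file does not exist.
SharedConfig LoadSharedConfig() {
    return SharedConfig{5, "from-file"};
}

// Whichever thread calls this first triggers the load; everyone else
// gets the already-initialized instance.
const SharedConfig& GetSharedConfig() {
    static const SharedConfig instance = LoadSharedConfig();
    return instance;
}

// Each thread owns one Foo and copies the shared data into it.
struct Foo {
    SharedConfig config;
    void Initialize(const SharedConfig& shared) { config = shared; }
};

int RunTwoWorkers() {
    Foo fooA, fooB;
    std::thread a([&fooA] { fooA.Initialize(GetSharedConfig()); });
    std::thread b([&fooB] { fooB.Initialize(GetSharedConfig()); });
    a.join();
    b.join();
    assert(fooA.config.name == fooB.config.name);
    return fooA.config.retries + fooB.config.retries;
}
```

With this shape it no longer matters which thread starts first, and the only shared state is read-only after initialization.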

Related

Multithreading: first access from another thread after being written... do I need volatile?

Hope the question is easy to understand. In any case, let's write some code...
struct MyClass
{
    int a;
    std::vector<char> b;
    ...
};
MyClass object;
Let's say I am running two threads A and B:
Thread A creates object and reads or writes some data members except a and b.
Thread A passes a pointer to object to a function that will run in thread B.
Thread B writes a or adds data to vector b.
Thread A reads a and b (i.e., access these for the first time).
If the answer is yes, i.e., that I need volatile in this case, I have another question: why is it thread safe to use immutable objects where data members are written in one thread and read in many others? It seems very similar to this case :)

why is it thread safe to use immutable objects where data members are written in one thread and read in many others?
Immutable objects are created and then never change after that point. Since multithreaded access to an object is only possible after it is created (what is there to share before then?), every thread will always see the same value.
So there is never a case where an immutable object will appear differently on separate threads, since there is no way for an immutable object to change state after it has been shared.
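As a rough illustration, here is a sketch with a hypothetical immutable Settings class (not from the question). The object is fully constructed before any reader thread starts, and starting a std::thread synchronizes with the new thread, so no volatile is involved:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical immutable type: members are const and set only in the constructor.
class Settings {
public:
    Settings(int a, std::string label) : a_(a), label_(std::move(label)) {}
    int a() const { return a_; }
    const std::string& label() const { return label_; }
private:
    const int a_;
    const std::string label_;
};

// Builds the object first, then shares it read-only with several threads.
int SumSeenByReaders(int readers) {
    assert(readers > 0);
    auto settings = std::make_shared<const Settings>(7, "shared");
    std::vector<int> seen(readers, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < readers; ++i) {
        // Each thread writes only its own slot of `seen`, so the slots do not race.
        threads.emplace_back([settings, &seen, i] { seen[i] = settings->a(); });
    }
    for (auto& t : threads) t.join();
    int sum = 0;
    for (int v : seen) sum += v;
    return sum;
}
```

Every reader observes the same value because nothing writes to the object after it has been shared.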

Destructors and asynchronous tasks

I have a class which calls an asynchronous task using std::async in its constructor for loading its content. (I want the loading of the object done asynchronously.)
The code looks like this:
void loadObject(Object* object)
{
    // ... load object
}
Object::Object()
{
    auto future = std::async(std::launch::async, loadObject, this);
}
I have several instances of these objects getting created and deleted on my main thread. They can get deleted at any time, even before their loading has finished.
I'd like to know if it is dangerous for an object to be destroyed while it is still being handled on another thread. And how can I stop the thread if the object gets destroyed?
EDIT: The std::future destructor does not block my code with the VS2013's compiler that I am using due to a bug.

As MikeMB already mentioned, your constructor doesn't finish until the load has been completed. Check this question for how to overcome that: Can I use std::async without waiting for the future limitation?
I'd like to know if it is dangerous for an object to be destroyed while it is still being handled on another thread.
Accessing an object's memory after deletion is certainly dangerous, yes. The behaviour will be undefined.
How can I stop the thread if the object gets destroyed?
What I recommend you take care of first is making sure that the object doesn't get destroyed while something that is going to use it still points to it.
One approach is to use a member flag signifying completed load that is updated in the async task and checked in the destructor and synchronize the access with a condition variable. That will allow the destructor to block until the async task is complete.
Once you've managed to prevent the object from being destroyed, you can use another synchronized member flag to signify that the object is being destroyed and skip the loading if it's set. That'll add synchronization overhead but may be worth it if loading is expensive.
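A sketch of the blocking-destructor idea, with two stated simplifications: the std::future itself is used as the completion signal instead of a separate flag plus condition variable, and the ten-chunk load loop is a hypothetical stand-in for real loading work. The future is stored as a member so the constructor does not block on it:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

class Object {
public:
    Object() {
        // Keeping the future in a member lets the constructor return immediately.
        task_ = std::async(std::launch::async, [this] { Load(); });
    }
    ~Object() {
        assert(task_.valid());
        cancelled_ = true;  // ask the loader to stop early
        task_.wait();       // block until the loader no longer touches *this
    }
    Object(const Object&) = delete;
    Object& operator=(const Object&) = delete;

    int chunks_loaded() const { return chunks_loaded_; }

private:
    void Load() {
        // Hypothetical load: ten chunks, checking the cancel flag between chunks.
        for (int i = 0; i < 10 && !cancelled_; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ++chunks_loaded_;
        }
    }
    std::atomic<bool> cancelled_{false};
    std::atomic<int> chunks_loaded_{0};
    std::future<void> task_;
};
```

Destroying an Object early costs at most one chunk of waiting, and the loader never runs against freed memory.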
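Another approach, which avoids a blocking destructor, is to pass a std::shared_ptr to the async task and require all Object instances to be owned by a shared pointer. That limitation may not be very desirable, and the class needs to inherit from std::enable_shared_from_this so it can hand out a shared pointer to itself. Note that shared_from_this() cannot be called inside the constructor, because no shared_ptr owns the object yet; the task would have to be started from a factory function (or a separate method) after the object has been handed to a shared pointer.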

There is nothing asynchronous happening in your code, because the constructor blocks until loadObject() returns (the destructor of a future returned by std::async implicitly joins).
If it did not block, the outcome would depend on how you have written your code (and especially your destructor), but most probably your code would incur undefined behavior.

Yes, it is dangerous for an object to be destroyed while it is still being handled on another thread.
You can implement a lot of strategies, depending on your requirements and the desired behaviour.
I would implement a sort of pimpl strategy here, which means that all the actual data is stored behind a pointer that your object holds. You load all the data into the data-pointer-object and store it in the public object atomically.
Technically speaking, an object should be fully constructed and ready to use by the time the constructor has finished. In your case the data-pointer-object will probably still not be ready to use, and you should make your class handle that state correctly.
So here we go:
class Object
{
    std::shared_ptr<Object_data> d;
public:
    Object():
        d(std::make_shared<Object_data>())
    {
        some_futures_master.add_future(std::async(std::launch::async, loadObject, d));
    }
};
Then you add an atomic flag to your data object that signals that loading is complete and the object is ready to use.
class Object_data
{
    // ...
    std::atomic<bool> loaded {false};
};
void loadObject(std::shared_ptr<Object_data> d)
{
    /// some load code here
    d->loaded = true;
}
You have to check, in a thread-safe way through the loaded flag, whether your object is fully loaded every time you access it.
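Put together, the idea might look like the sketch below. FutureStore is a hypothetical stand-in for whatever keeps the pending futures alive, and the payload value 42 is made up for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>
#include <utility>
#include <vector>

struct Object_data {
    int value = 0;                    // hypothetical payload
    std::atomic<bool> loaded{false};  // set once loading has finished
};

void loadObject(std::shared_ptr<Object_data> d) {
    d->value = 42;                    // stand-in for the real loading work
    d->loaded.store(true, std::memory_order_release);
}

// Hypothetical holder for pending futures, used from one thread only.
class FutureStore {
public:
    void add_future(std::future<void> f) {
        assert(f.valid());
        futures_.push_back(std::move(f));
    }
    void wait_all() {
        for (auto& f : futures_) f.wait();
        futures_.clear();
    }
private:
    std::vector<std::future<void>> futures_;
};

class Object {
public:
    explicit Object(FutureStore& store) : d(std::make_shared<Object_data>()) {
        // The task co-owns the data, so destroying Object early is safe.
        store.add_future(std::async(std::launch::async, loadObject, d));
    }
    bool ready() const { return d->loaded.load(std::memory_order_acquire); }
    // Returns the payload only once loading has completed, otherwise the fallback.
    int value_or(int fallback) const { return ready() ? d->value : fallback; }
private:
    std::shared_ptr<Object_data> d;
};
```

The Object can be destroyed at any moment: the loader only ever touches the Object_data it co-owns, never the Object itself.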

Preventing variables from going out of scope so they persist for another thread

I have a function that creates a bunch of local variables, then passes their addresses to a function that runs in a separate thread - something like this:
void MyFunction()
{
    MyClass a;
    AnotherClass b;
    ...
    FinalClass z;
    CallFunctionInNewThread(&a,&b,&c,...,&z);
}
Of course, these variables are destroyed when MyFunction returns (so the function in the thread is now pointing to garbage), so this setup doesn't work. What are my options here? If I allocate the variables on the heap with 'new', I will never get a chance to delete them. If I make them smart pointers or similar, I'd have to make the threaded function accept them as smart pointers, or their reference count will not be increased so they will still get destroyed immediately. It seems like they kind of want to be member variables of a wrapper class of MyFunction, but there are a few hundred lines and tens of these things and that would just be crazy messy. Are there any other choices?

Are there any other choices?
Simply copy the data (if it is trivial) or move/swap it (if it is heavy to create), similar to transferring ownership from one thread to the other. From the description, it seems thread A does not really need to keep a reference. Bonus: this removes concurrent-access complexities from your program.
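A small sketch of the ownership transfer, using std::async and made-up contents for MyClass and AnotherClass (the real classes are not shown in the question). The worker takes its arguments by value, so the locals can die as soon as the function returns:

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the classes in the question.
struct MyClass { std::vector<int> values; };
struct AnotherClass { std::string label; };

// Parameters are taken by value: the worker owns its data outright.
int Worker(MyClass a, AnotherClass b) {
    return std::accumulate(a.values.begin(), a.values.end(), 0)
           + static_cast<int>(b.label.size());
}

std::future<int> MyFunction() {
    MyClass a{{1, 2, 3}};
    AnotherClass b{"abc"};
    assert(!a.values.empty());
    // std::async moves a and b into storage owned by the task, so it is
    // fine for the locals to go out of scope right after this line.
    return std::async(std::launch::async, Worker, std::move(a), std::move(b));
}
```

The caller collects the result with MyFunction().get() whenever it needs it.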

One little trick you can do is to pass a semaphore object into the thread function and then wait for that semaphore to be signaled. You do need to check that the thread was created successfully.
The new thread first makes local copies of the values (or references in the case of smart pointers), then signals the semaphore and carries on.
The calling thread can then continue and drop those objects off its stack without interfering with your new thread. It can even delete the semaphore object since it is no longer required by either thread.
It does mean that the calling thread has to wait until the thread is started and has copied its data, but that probably will be a short time. If you are going to the effort of spawning a thread to do any work at all, then this slight delay in the parent thread ought to be acceptable.
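The handshake could be sketched as follows. This version swaps the semaphore for a std::promise<void> used as a one-shot signal, and Job is a hypothetical payload. The promise is moved into the thread so that its shared state stays alive while the signal is being delivered; if the thread cannot be created, the std::thread constructor throws std::system_error:

```cpp
#include <cassert>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical payload that lives on the caller's stack.
struct Job { std::string name; int count; };

// `out` must point to storage that outlives the returned thread.
std::thread StartWorker(int* out) {
    assert(out != nullptr);
    Job job{"resize", 4};        // local: dies when this function returns
    std::promise<void> copied;   // one-shot signal standing in for the semaphore
    std::future<void> ready = copied.get_future();

    std::thread worker(
        [&job, out](std::promise<void> signal) {
            Job mine = job;      // 1. take a private copy
            signal.set_value();  // 2. tell the caller the originals are free
            // From here on only `mine` is used; `job` must not be touched.
            *out = static_cast<int>(mine.name.size()) * mine.count;
        },
        std::move(copied));

    ready.wait();                // block until the copy has been made
    return worker;               // `job` goes out of scope safely
}
```

The caller only waits for the copy, not for the work, and joins the returned thread whenever it likes.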

Do I need mutex in constructor for field?

Let's assume I have a simple class A with one field in C++. This field is initialized in the constructor. Class A also has a method called doit() for modifying the value of this field. doit() will be called from multiple threads. If I have a mutex only in the doit() method, is this sufficient? Do I have a guarantee that I will never read an uninitialized field (because there is no lock in the constructor)?
Edit: I probably was not clear enough. Is there no issue involving processor cache or something similar? I mean, if there is no mutex for initializing memory region (i.e. my field) - is there no risk that the other thread will read some garbage value?

Your object can only be initialised once, and you won't be able to use it before it's initialised, so you don't need a mutex there. You will, however, need a mutex or other suitable lock in your doit() function, since you said it will be called from multiple threads.
Update for edited question: No, you don't need to worry about processor cache. You must construct your object first, before you can have a handle to it. Only once you have this handle can you pass it to other threads to be used. What I'm trying to say is that the spawned threads must start after the construction of the original object; it is impossible for it to happen the other way around! With std::thread, the constructor call that starts the thread also synchronizes with the beginning of the thread function, so the new thread sees everything the object's constructor wrote.

It is not possible to call doit() on an object that has not been created yet, so you do not need a mutex in the constructor.
If doit() is the only method that accesses the field, then you should be fine.
If other methods of your class also access that field, even from a single thread, then you must use the mutex in those methods as well.

You first need to construct the object before those pesky threads get their hands on it. The OS will allocate memory for the object, and the constructor is only called by one thread. The OS looks after that allocation, and therefore nothing needs to be done on your part. Hell, you can even create two objects of the same class in two different threads.
You can be very conservative and use a mutex at the start of any method that uses that field to lock it, and release it at the end.
Or, if you understand the interactions of the various methods with the various algorithms, you can use a mutex only for the critical sections of code that use that field, i.e. the parts of the code that need to be sure the field is not altered by another thread during processing. Your method can release the lock after the critical section, do something else, and then perhaps enter another critical section.
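For illustration, a minimal version of such a class might look like this. The get() accessor and the RunThreads helper are additions for the sake of the example, not part of the question:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class A {
public:
    // No lock here: no other thread can reach the object until it is constructed.
    explicit A(int start) : field_(start) {}

    void doit() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++field_;
    }
    // Any other method that touches the field takes the same mutex.
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return field_;
    }
private:
    mutable std::mutex mutex_;
    int field_;
};

int RunThreads(int threads, int callsPerThread) {
    assert(threads > 0 && callsPerThread >= 0);
    A a(100);  // fully constructed before any thread starts
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&a, callsPerThread] {
            for (int j = 0; j < callsPerThread; ++j) a.doit();
        });
    }
    for (auto& t : pool) t.join();
    return a.get();
}
```

No increment is lost, and no thread can ever observe the field before the constructor has set it.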

Passing data structures to different threads

I have an application that will be spawning multiple threads. However, I feel there might be an issue with threads accessing data that they shouldn't be.
Here is the structure of the threaded application (sorry for the crudeness):
                      MainThread
                     /          \
                    /            \
                   /              \
            Thread A              Thread B
            /      \              /      \
           /        \            /        \
          /          \          /          \
   Thread A_1   Thread A_2   Thread B_1   Thread B_2
Under each lettered thread (of which there could be many), there will only be two threads, and they are fired off sequentially. The issue I'm having is that I'm not entirely sure how to pass a data structure into these threads.
So, the data structure is created in MainThread, will be modified in the lettered thread (Thread A, etc.) specific to that thread, and then a member variable from that data structure is sent to the Letter_Numbered threads.
Currently, the lettered thread class has a member variable, and when the class is constructed, the data structure from MainThread is passed in by reference, invoking the copy constructor so the lettered thread has its own copy to play with.
The Letter_Numbered thread simply takes in a string variable from the data structure within the lettered thread. My question is, is this acceptable? Is there a much better way to ensure each lettered thread gets its own data structure to play with?
Sorry for the somewhat poor explanation; please leave comments and I'll try to clarify.
EDIT:
So my lettered thread constructor should take the VALUE of the data structure, not the reference?

I would have each thread create its own copy of the data structure, e.g. you pass the structure to the constructor and then explicitly create a local copy. Then you are guaranteed that the threads have distinct copies. (You say that it's passed by reference, and that this invokes the copy constructor. I think you mean pass by value? I feel it's better to explicitly make a copy, to leave no doubt and to make your intent clear. Otherwise someone might later come along and change your pass by value to pass by reference as a "smart optimization".)
EDIT: Removed comment about strings. For some reason, I was assuming .NET.
To ensure strings are privately owned, follow the same procedure, create a copy of the string, which you can then freely modify.
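A sketch of the explicit-copy version, with a made-up Data struct and LetteredThread class. The constructor takes a const reference, but the member initializer runs the copy constructor, so the thread object ends up with its own private instance:

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical data structure created in the main thread.
struct Data { std::string text; int counter; };

class LetteredThread {
public:
    // Reference in, copy stored: local_(source) invokes Data's copy constructor.
    explicit LetteredThread(const Data& source) : local_(source) {}

    // Work done on the private copy only.
    void Run() {
        local_.text += "-modified";
        ++local_.counter;
    }
    const Data& local() const { return local_; }
private:
    Data local_;  // owned by this object, never shared
};

bool OriginalUnchangedAfterRun() {
    Data original{"base", 0};
    LetteredThread a(original);
    std::thread t([&a] { a.Run(); });
    t.join();
    assert(a.local().counter == 1);
    return original.text == "base" && a.local().text == "base-modified";
}
```

The same holds for the string handed to a numbered thread: copy it into a member and modify the copy freely.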

There is a pattern called the Active Object pattern, wherein each object executes in its own thread. Frameworks like ACE support this. If you have access to such frameworks, you should use those. In any case, I believe creating a new instance of an object and allowing it to execute in its own thread is much cleaner than invoking the copy constructor to make a copy of the object. Otherwise, see if you can fit a solution that uses thread-local storage.

Have you looked at boost threads?
You would basically create a callable class that has a constructor that takes the parameters the thread is to work on and then launch the thread by passing objects of your callable class, initialized and ready to go.
This is very similar to how Java implements threads and it makes a good amount of sense most of the time from a design point of view.
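The same shape works with std::thread, which is used here in place of Boost. Greeter and RunGreeter are hypothetical names chosen for this sketch:

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical callable: the constructor receives what the thread will work on.
class Greeter {
public:
    Greeter(std::string name, std::string* out)
        : name_(std::move(name)), out_(out) {
        assert(out_ != nullptr);
    }
    // The thread runs this once the object has been handed to std::thread.
    void operator()() const { *out_ = "hello " + name_; }
private:
    std::string name_;   // private copy of the input
    std::string* out_;   // where to store the result; must outlive the thread
};

std::string RunGreeter(const std::string& name) {
    std::string result;
    // The callable is copied into the thread, initialized and ready to go.
    std::thread t{Greeter(name, &result)};
    t.join();
    return result;
}
```

Because the callable carries its own copy of the input, the launching code can discard its data immediately after starting the thread.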

You are apparently making a copy of the data for each thread, and everything works? Then there is no problem.
Here are some additional thoughts:
If the data is read-only, you can share a single struct and everything will be OK, as long as each read is small and fast (basic types).
If the data needs to be written but is "private" to (or contained in) each thread, then send a copy to each thread (which is what you are doing). Caveat: I assume the data is not too big and a copy does not eat too many resources.
If the data needs to be written and the new values shared between threads, then you need to think about it (read up on it) and create a proper design. I like a transactional object to centralize each thread's read/write operations, like a tiny database in memory. (Check on thread mutexes, semaphores and critical sections.) When dealing with huge data sets, I have used a database to centralize requests (see ODBM). You can also check existing message queuing libraries (like MSMQ) to have data changes ordered and synchronized.
Hope this helps.

It seems unlikely that you would want each thread to operate on the data without at least occasionally having another thread react to what it has done. If the threads are truly independent, meaning that no thread will ever care about work that another thread has done, then I suggest making a copy of the data. Otherwise, if you want to do work in one thread and make the result of that work available to another thread, I would suggest that you pass a reference or pointer to the object around and protect access to it via locks so that the threads can work with it properly. I suggest a multiple-reader, single-writer lock implementation.
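One way to sketch a multiple-reader, single-writer lock is with std::shared_mutex, which requires C++17. SharedData and WriteFromThreads are hypothetical names for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

class SharedData {
public:
    // Writers take the lock exclusively.
    void Append(const std::string& s) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        text_ += s;
    }
    // Readers take it shared, so several of them can read at once.
    std::size_t Size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return text_.size();
    }
private:
    mutable std::shared_mutex mutex_;
    std::string text_;
};

std::size_t WriteFromThreads(int writers, int appendsEach) {
    assert(writers > 0 && appendsEach >= 0);
    SharedData data;  // one instance, passed by reference to every thread
    std::vector<std::thread> pool;
    for (int i = 0; i < writers; ++i) {
        pool.emplace_back([&data, appendsEach] {
            for (int j = 0; j < appendsEach; ++j) {
                data.Append("x");
                (void)data.Size();  // a read interleaved with the writes
            }
        });
    }
    for (auto& t : pool) t.join();
    return data.Size();
}
```

Whether this beats a plain std::mutex depends on how read-heavy the workload is, so it is worth measuring before committing to it.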
