Passing data structures to different threads

I have an application that will be spawning multiple threads. However, I feel there might be an issue with threads accessing data that they shouldn't be.
Here is the structure of the threaded application (sorry for the crudeness):
MainThread
    Thread A
        Thread A_1
        Thread A_2
    Thread B
        Thread B_1
        Thread B_2
Under each lettered thread (of which there could be many), there will only be two threads, and they are fired off sequentially. The issue I'm having is that I'm not entirely sure how to pass a data structure into these threads.
So, the data structure is created in MainThread, is modified in the lettered thread (Thread A, etc.) specific to that thread, and then a member variable from that data structure is sent to the Letter_Numbered threads.
Currently, the lettered thread class has a member variable, and when the class is constructed, the data structure from MainThread is passed in by reference, invoking the copy constructor so the lettered thread has its own copy to play with.
The lettered_numbered thread simply takes in a string variable from the data structure within the lettered thread. My question is: is this acceptable? Is there a much better way to ensure each lettered thread gets its own data structure to play with?
Sorry for the somewhat poor explanation; please leave comments and I'll try to clarify.
EDIT:
So my lettered thread constructor should take the VALUE of the data structure, not the reference?

I would have each thread create its own copy of the data structure: e.g. pass the structure to the constructor and then explicitly create a local copy. Then you are guaranteed that the threads have distinct copies. (You say that it's passed by reference, and that this invokes the copy constructor. I think you mean pass by value? I feel it's better to make the copy explicitly, to leave no doubt and to make your intent clear. Otherwise someone might later come along and change your pass by value to pass by reference as a "smart optimization".)
EDIT: Removed comment about strings. For some reason, I was assuming .NET.
To ensure strings are privately owned, follow the same procedure, create a copy of the string, which you can then freely modify.
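A minimal sketch of the explicit-copy approach, using std::thread (C++11) rather than any particular threading library. Data, LetteredWorker and their members are hypothetical stand-ins for the question's classes: the constructor takes the structure by const reference and copies it into a member, and each numbered sub-thread receives its own copy of the string by value.

```cpp
#include <string>
#include <thread>
#include <vector>

// Hypothetical data structure created by MainThread.
struct Data {
    std::string name;
    std::vector<int> values;
};

// Hypothetical "lettered" thread object.
class LetteredWorker {
public:
    // Take a const reference, then copy explicitly into a member, so the
    // intent (a private copy) stays obvious.
    explicit LetteredWorker(const Data& shared) : local_(shared) {}

    // Thread body for Thread A, B, ...: modifies its own copy, then runs the
    // two numbered sub-threads one after the other.
    void run() {
        local_.name += "_modified";           // nobody else can see local_
        const std::string arg = local_.name;  // copied into each lambda below
        std::thread first([this, arg] { results_[0] = arg + "_1"; });
        first.join();
        std::thread second([this, arg] { results_[1] = arg + "_2"; });
        second.join();
    }

    const Data& data() const { return local_; }
    const std::string& result(int i) const { return results_[i]; }

private:
    Data local_;
    std::string results_[2];  // one slot per numbered sub-thread
};

// Usage: MainThread keeps its original; Thread A works on a copy.
inline bool lettered_demo() {
    Data shared{"A", {1, 2, 3}};
    LetteredWorker worker(shared);  // the copy is made here
    std::thread threadA(&LetteredWorker::run, &worker);
    threadA.join();
    return shared.name == "A" && worker.data().name == "A_modified" &&
           worker.result(1) == "A_modified_2";
}
```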

There is a pattern called the Active Object pattern, wherein each object executes in its own thread. Frameworks like ACE support this. If you have access to such frameworks, you should use them. In any case, I believe creating a new instance of an object and allowing it to execute in its own thread is much cleaner than invoking the copy constructor to make a copy of the object. Otherwise, see if you can fit a solution that uses thread-local storage.
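A minimal Active Object sketch in standard C++11 (not ACE's API): ActiveCounter is a hypothetical class that owns its state and its own thread, and callers only enqueue requests and receive a std::future for the result. Each std::promise is wrapped in a std::shared_ptr because std::function requires a copyable callable.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical active object: its state is touched only by its own thread.
class ActiveCounter {
public:
    ActiveCounter() : worker_([this] { loop(); }) {}

    ~ActiveCounter() {
        post([this] { done_ = true; });  // queued after all earlier requests
        worker_.join();
    }

    // Asynchronously adds n; the future receives the new total.
    std::future<int> add(int n) {
        auto p = std::make_shared<std::promise<int>>();
        std::future<int> f = p->get_future();
        post([this, n, p] { count_ += n; p->set_value(count_); });
        return f;
    }

private:
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    void loop() {
        while (!done_) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !jobs_.empty(); });
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // runs outside the lock, on the object's own thread
        }
    }

    int count_ = 0;      // private state, only accessed by worker_
    bool done_ = false;  // only accessed by worker_
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    std::thread worker_;  // declared last so all members above exist first
};

inline int active_demo() {
    ActiveCounter counter;
    std::future<int> a = counter.add(2);
    std::future<int> b = counter.add(3);
    return a.get() * 10 + b.get();  // requests run in FIFO order
}
```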

Have you looked at boost threads?
You would basically create a callable class that has a constructor that takes the parameters the thread is to work on and then launch the thread by passing objects of your callable class, initialized and ready to go.
This is very similar to how Java implements threads and it makes a good amount of sense most of the time from a design point of view.
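A sketch of the callable-class style, written with std::thread, whose constructor accepts a callable object in the same way boost::thread does. WordCounter is a hypothetical example: the constructor receives everything the thread will work on, and operator() is the thread body.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <utility>

// Hypothetical callable task: fully initialised before the thread starts.
class WordCounter {
public:
    WordCounter(std::string text, std::size_t* result)
        : text_(std::move(text)), result_(result) {}

    // Thread body: counts space-separated words in its own copy of the text.
    void operator()() const {
        std::size_t words = 0;
        bool in_word = false;
        for (char c : text_) {
            bool space = (c == ' ');
            if (!space && !in_word) ++words;
            in_word = !space;
        }
        *result_ = words;  // each task writes to its own slot, so no lock
    }

private:
    std::string text_;
    std::size_t* result_;
};

inline bool callable_demo() {
    std::size_t a = 0, b = 0;
    std::thread t1(WordCounter("hello brave new world", &a));
    std::thread t2(WordCounter("one two", &b));
    t1.join();
    t2.join();
    return a == 4 && b == 2;
}
```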

You are apparently making a copy of the data for each thread and everything works? Then there's no problem.
Here are some additional thoughts:
If data is read only, you can share a single struct and everything will be ok, as long as each read is small and fast (basic types)
If data needs to be written, but is "private" (or contained) to each thread, then send a copy to each thread (what you are doing). Caveat: I assume the data is not too big and a copy does not eat too many resources.
If the data needs to be written and the new values shared between threads, then you need to think about it (read up on it) and create a proper design. I like a transactional object to centralize each thread's read/write operations, like a tiny in-memory database. Look into mutexes, semaphores and critical sections. When dealing with huge data sets, I have used a database to centralize requests (see ODBM). You can also check existing message queuing libraries (like MSMQ) to keep data changes ordered and synchronized.
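For the third case, a minimal sketch of the "transactional object" idea, assuming a single std::mutex is acceptable: Transactional is a hypothetical template, and every read or write of the shared value goes through transact(), which runs a caller-supplied function under the lock.

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical holder that centralizes all access to one shared value.
template <typename T>
class Transactional {
public:
    explicit Transactional(T initial) : value_(std::move(initial)) {}

    // Runs f on the value while holding the lock; returns what f returns.
    template <typename F>
    auto transact(F f) -> decltype(f(std::declval<T&>())) {
        std::lock_guard<std::mutex> lock(m_);
        return f(value_);
    }

    // Returns a copy, so the caller can read it without holding the lock.
    T snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }

private:
    mutable std::mutex m_;
    T value_;
};

// Usage: four threads increment a shared counter.
inline int demo_transactional() {
    Transactional<int> counter(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&counter] {
            for (int k = 0; k < 1000; ++k)
                counter.transact([](int& v) { ++v; });
        });
    for (auto& t : threads) t.join();
    return counter.snapshot();
}
```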
Hope this helps.

It seems unlikely that you would want each thread to operate on the data without at least occasionally having one thread react to what another thread has done. If the threads are truly independent, meaning that no thread will ever care about the work another thread has done, then I suggest making a copy of the data. Otherwise, if you want to do work in one thread and make the result of that work available to another thread, I suggest passing a reference/pointer to the object around and protecting access to it with locks so that the threads can work with it properly. I suggest a multiple-reader, single-writer lock implementation.
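A multiple-reader, single-writer lock is available as std::shared_mutex since C++17 (Boost.Thread also provides boost::shared_mutex). A minimal sketch, with SharedTable as a hypothetical example class:

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>

// Hypothetical shared object guarded by a reader/writer lock (C++17).
class SharedTable {
public:
    // Any number of threads may hold the shared lock at once.
    bool lookup(const std::string& key, int& out) const {
        std::shared_lock<std::shared_mutex> lock(m_);
        auto it = table_.find(key);
        if (it == table_.end()) return false;
        out = it->second;
        return true;
    }

    // The exclusive lock excludes both readers and other writers.
    void store(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(m_);
        table_[key] = value;
    }

private:
    mutable std::shared_mutex m_;
    std::map<std::string, int> table_;
};

inline int demo_shared_table() {
    SharedTable t;
    t.store("a", 1);
    std::thread writer([&t] { t.store("b", 2); });
    writer.join();
    int a = 0, b = 0;
    std::thread r1([&] { t.lookup("a", a); });  // the two readers may
    std::thread r2([&] { t.lookup("b", b); });  // run concurrently
    r1.join();
    r2.join();
    return a + b;
}
```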

Related

Why do we need both std::promise and std::future?

I am wondering why we need both std::promise and std::future. Why did the C++11 standard divide get and set_value into two separate classes, std::future and std::promise?
An answer to this post mentions that:
The reason it is separated into these two separate "interfaces" is to hide the "write/set" functionality from the "consumer/reader".
I don't understand the benefit of hiding here. Isn't it simpler to have only one class, "future"? For example, promise.set_value could be replaced by future.set_value.

The problem that promise/future exist to solve is to shepherd a value from one thread to another. It may also transfer an exception instead.
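A minimal sketch of that hand-off with the standard types; parse_number and parse_in_thread are hypothetical helpers. The producer thread owns the std::promise and fulfils it with either a value or an exception, and the consumer only ever sees the std::future.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Producer side: owns the promise and fulfils it exactly once.
void parse_number(std::promise<int> result, std::string text) {
    try {
        result.set_value(std::stoi(text));
    } catch (...) {
        // The exception is stored and rethrown in the consumer's get().
        result.set_exception(std::current_exception());
    }
}

// Consumer side: only ever holds the future.
inline int parse_in_thread(const std::string& text) {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread producer(parse_number, std::move(p), text);
    producer.join();  // the value (or exception) is now stored
    return f.get();   // rethrows if the producer stored an exception
}
```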
So the source thread must have some object that it can talk to, in order to send the desired value to the other thread. Alright... who owns that object? If the source has a pointer to something that the destination thread owns, how does the source know if the destination thread has deleted the object? Maybe the destination thread no longer cares about the value; maybe something changed such that it decided to just drop your thread on the floor and forget about it.
That's entirely legitimate code in some cases.
So now the question becomes why the source doesn't own the promise and simply give the destination a pointer/reference to it? Well, there's a good reason for that: the promise is owned by the source thread. Once the source thread terminates, the promise will be destroyed. Thus leaving the destination thread with a reference to a destroyed promise.
Oops.
Therefore, the only viable solution is to have two full-fledged objects: one for the source and one for the destination. These objects share ownership of the value that gets transferred. Of course, that doesn't mean that they couldn't be the same type; you could have something like shared_ptr<promise> or somesuch. After all, promise/future must have some shared storage of some sort internally, correct?
However, consider the interface of promise/future as they currently stand.
promise is non-copyable. You can move it, but you can't copy it. future is also non-copyable, but a future can become a shared_future that is copyable. So you can have multiple destinations, but only one source.
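To illustrate the one-source, many-destinations shape: a sketch in which a single std::promise feeds several reader threads, each holding its own copy of a std::shared_future obtained with future::share(). broadcast_demo is a hypothetical name.

```cpp
#include <future>
#include <thread>
#include <vector>

inline int broadcast_demo() {
    std::promise<int> source;  // the only writer
    std::shared_future<int> ready = source.get_future().share();

    std::vector<int> seen(3, 0);
    std::vector<std::thread> readers;
    for (int i = 0; i < 3; ++i)
        readers.emplace_back([ready, &seen, i] {  // each copy is independent
            seen[i] = ready.get();                // blocks until set_value
        });

    source.set_value(7);
    for (auto& t : readers) t.join();
    return seen[0] + seen[1] + seen[2];
}
```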
promise can only set the value; it can't even get it back. future can only get the value; it cannot set it. Therefore, you have an asymmetric interface, which is entirely appropriate to this use case. You don't want the destination to be able to set the value and the source to be able to retrieve it. That's backwards code logic.
So that's why you want two objects. You have an asymmetric interface, and that's best handled with two related but separate types and objects.

I would think of a promise/future as an asynchronous queue (that's only intended to hold a single value).
The future is the read end of the queue. The promise is the write end of the queue.
The use of the two is normally distinct: the producer normally just writes to the "queue", and the consumer just reads from it. Although, as you've noted, it's possible for a producer to read the value, there's rarely much reason for it to do that, so optimizing that particular operation is rarely seen as much of a priority.
In the usual scheme of things, the producer produces the value, and puts it into the promise. The consumer gets the value from the future. Each "client" uses one simple interface dedicated exclusively to one simple task, so it's easier to design and document the code, as well as ensuring that (for example) the consumer code doesn't mess with something related to producing the value (or vice versa). Yes, it's possible to do that, but enough extra work that it's fairly unlikely to happen by accident.

Guarded Data Design Pattern

In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we are thinking about reworking our code, where locking is currently done explicitly by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to the data in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
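#include <boost/thread/locks.hpp>

template<typename T, typename Lockable>
class GuardedData {
public:
    // Locks m on construction; the lock is released when this object is destroyed.
    GuardedData(T &d, Lockable &m) : guard(m), data(d) {}
    // Returns a pointer so that g->member reaches the payload, as with an iterator.
    T *operator->() { return &data; }
private:
    boost::lock_guard<Lockable> guard;  // declared first, so it is initialised first
    T &data;
};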
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
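For comparison, the same idea with std::lock_guard in place of boost::lock_guard, plus a short usage example showing that the lock lasts exactly as long as the accessor's scope. Guarded and guarded_demo are hypothetical names.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical standard-library port of the GuardedData idea.
template <typename T, typename Lockable = std::mutex>
class Guarded {
public:
    Guarded(T& d, Lockable& m) : guard_(m), data_(d) {}  // locks m
    T* operator->() { return &data_; }
    T& operator*() { return data_; }

private:
    std::lock_guard<Lockable> guard_;  // unlocks m in the destructor
    T& data_;
};

inline std::size_t guarded_demo() {
    std::mutex m;
    std::vector<std::string> log;
    {
        Guarded<std::vector<std::string>> g(log, m);  // m locked here
        g->push_back("first");
        g->push_back("second");
    }                                                 // m unlocked here
    bool free_again = m.try_lock();  // succeeds: the guard is gone
    if (free_again) m.unlock();
    return free_again ? log.size() : 0;
}
```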
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.

Depending upon how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on 2 pieces of data then you end up locking the mutex twice and deadlocking (unless each piece of data has its own mutex - which would also result in deadlock if the lock order is not consistent - you have no control over that with this scheme without making it really complicated). Unless you use a recursive mutex which may not be desired.
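When one operation really does need two separately guarded pieces of data, the standard library can acquire both locks without relying on a fixed order: std::lock uses a deadlock-avoidance algorithm (C++17 also offers std::scoped_lock). A sketch with a hypothetical Account type, where two threads lock the same pair in opposite orders:

```cpp
#include <mutex>
#include <thread>

// Hypothetical piece of data with its own mutex.
struct Account {
    std::mutex m;
    int balance = 0;
};

// Locks both mutexes together, so the argument order does not matter.
// (from and to must be different objects.)
void transfer(Account& from, Account& to, int amount) {
    std::unique_lock<std::mutex> l1(from.m, std::defer_lock);
    std::unique_lock<std::mutex> l2(to.m, std::defer_lock);
    std::lock(l1, l2);
    from.balance -= amount;
    to.balance += amount;
}

inline int transfer_demo() {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    // Opposite orders: two plain lock_guards taken in sequence could deadlock.
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```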
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - it raises ownership issues on the mutex i.e. where & when it is released.
It's probably easier to copy the parts of the data you need to the reader/writer threads as and when they need them, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially, your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in a CPU cache sitting near the core that is running the thread and never make it into RAM. The writer thread may modify the underlying data whilst the reader is dealing with it (but that should invalidate the view). However, since the viewer has a copy, it can continue on and provide a view of the data at the moment it was synchronized with the data.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and when completes, switches the pointer to the data in the model. This would necessitate blocking all readers/writers whilst processing, unless there is only 1 writer. The next time the reader requests the data, it gets the fresh copy.
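A minimal sketch of the immutable-snapshot option, assuming a single writer as the answer notes: readers copy a std::shared_ptr to const data under a short lock, while the writer copies the data, modifies the copy and then swaps the pointer. Model and snapshot_demo are hypothetical names.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class Model {
public:
    using Snapshot = std::shared_ptr<const std::vector<std::string>>;

    Model() : current_(std::make_shared<std::vector<std::string>>()) {}

    // Cheap: copies a pointer, not the data. The snapshot stays valid
    // for as long as the reader holds it.
    Snapshot read() const {
        std::lock_guard<std::mutex> lock(m_);
        return current_;
    }

    // Single-writer assumption: concurrent writers could lose updates.
    void append(const std::string& item) {
        auto copy = std::make_shared<std::vector<std::string>>(*read());
        copy->push_back(item);  // modify the private copy without a lock
        std::lock_guard<std::mutex> lock(m_);
        current_ = std::move(copy);  // publish the new version
    }

private:
    mutable std::mutex m_;  // protects only the pointer
    Snapshot current_;
};

inline bool snapshot_demo() {
    Model model;
    model.append("a");
    Model::Snapshot view = model.read();  // the reader keeps this version
    model.append("b");                    // the writer publishes a new one
    return view->size() == 1 && model.read()->size() == 2;
}
```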

Well known, I'm not sure. However, I use a similar mechanism in Qt pretty often called a QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, but in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to me to be a slight disconnect in the idea. Its use conveys to me that accessing the data is always thread-safe because the data is guarded. Often, this isn't enough to ensure thread safety. The order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one of the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.

Several singletons: one for each task

I have a multi-threaded C++ application which runs tasks in separate threads. Each task has an object which handles and stores its output. Each task creates different business logic objects and possibly other threads or thread pools.
What I want to do is somehow provide an easy way for any of the business logic objects run by a task to access that task's output without manually passing the "output" object to each business logic object.
My idea is to create an output singleton factory and store the task_id in TLS. But the problem is that when the business logic creates a new thread or thread pool, those threads would not have the task_id in TLS. In that case I would need access to the parent thread's TLS.
The other way is to simply grab all output since the task's start. There would be output from other tasks during that time, but at least it's better than nothing...
I'm looking for any suggestions or ideas for a clean and pretty way of solving my problem. Thanks.
Update: yes, it is not a singleton, I agree. I just want to be able to access this object like this:
output << "message";
And that's it. No worry of passing pointers to output object between business logic classes. I need to have a global output object per task.

From an application point of view, they are not singletons, so why treat the objects like singletons?
I would make a new instance of the output storer and pass the (smart?) pointer to the new thread. The main function may put the pointer in TLS, thus making the instance global per thread (I don't think that this is a wise design decision, but it is what was asked). When making a new (sub-?)thread, the pointer can be passed again. So in my view, no singletons or factories are needed.
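A sketch of this suggestion, assuming the per-task output must be thread-safe because a task's child threads may write to it concurrently: the task's main function stores a std::shared_ptr in a thread_local variable, and a hypothetical spawn_in_task helper copies that pointer into each child thread it starts. TaskOutput, output() and spawn_in_task are illustrative names, not an existing library.

```cpp
#include <memory>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical per-task output sink; locked because child threads share it.
class TaskOutput {
public:
    template <typename T>
    TaskOutput& operator<<(const T& v) {
        std::lock_guard<std::mutex> lock(m_);
        buf_ << v;
        return *this;
    }
    std::string str() const {
        std::lock_guard<std::mutex> lock(m_);
        return buf_.str();
    }

private:
    mutable std::mutex m_;
    std::ostringstream buf_;
};

// Each thread's current task output, kept in thread-local storage.
thread_local std::shared_ptr<TaskOutput> current_output;

// Lets business logic write: output() << "message";
inline TaskOutput& output() { return *current_output; }

// Starts a child thread that explicitly inherits the parent's pointer.
template <typename F>
std::thread spawn_in_task(F f) {
    std::shared_ptr<TaskOutput> inherited = current_output;  // parent's TLS
    return std::thread([inherited, f] {
        current_output = inherited;  // child's TLS
        f();
    });
}

inline std::string task_demo() {
    auto out = std::make_shared<TaskOutput>();
    std::thread task([out] {
        current_output = out;  // the task's main function sets its TLS
        output() << "start;";
        std::thread child = spawn_in_task([] { output() << "child;"; });
        child.join();
        output() << "end";
    });
    task.join();
    return out->str();
}
```

Threads in an already-running pool do not inherit anything this way, so with a pool the pointer would have to travel with each submitted job instead.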

If I understand you correctly, you want to have multiple class instances (not necessarily of the same class) all be able to access a common data pool that needs to be thread-safe. I can think of a few ways to do this. The first idea is to have this data pool in a class that each of the other classes contains. This data pool will actually store its data in a static member, so there is only one instance of the data even though there will be more than one instance of the data pool class. The class will then have accessor methods which access this static data pool (so that it is transparent). To make it thread-safe, you would then require the access to go through a mutex or something like that.
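A minimal sketch of that first idea, with DataPool, Parser and Auditor as hypothetical names: every DataPool object is a thin handle onto static members, so all classes that contain one share the same data, and a static mutex guards each access. Note that this yields one pool for the whole process, not one per task.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical handle class: instances are cheap, the data is shared.
class DataPool {
public:
    void add(const std::string& item) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push_back(item);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return items_.size();
    }

private:
    static std::vector<std::string> items_;  // the single shared store
    static std::mutex m_;                    // guards items_
};

std::vector<std::string> DataPool::items_;
std::mutex DataPool::m_;

// Two unrelated classes, each containing its own DataPool member.
struct Parser {
    DataPool pool;
    void run() { pool.add("parsed"); }
};
struct Auditor {
    DataPool pool;
    std::size_t seen() const { return pool.size(); }
};

inline std::size_t pool_demo() {
    Parser p;
    Auditor a;
    std::thread t([&p] { p.run(); });
    t.join();
    return a.seen();  // Auditor sees what Parser added
}
```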

Do I need a mutex in the constructor for a field?

Let's assume I have a simple class A with one field in C++. This field is initialized in the constructor. Class A also has a method called doit() for modifying the value of this field. doit() will be called from multiple threads. If I have a mutex only in the doit() method, is this sufficient? Do I have a guarantee that I will never read an uninitialized field (because there is no lock in the constructor)?
Edit: I probably was not clear enough. Is there no issue involving processor cache or something similar? I mean, if there is no mutex for initializing memory region (i.e. my field) - is there no risk that the other thread will read some garbage value?

Your object can only be initialised once, and you won't be able to use it before it's initialised, so you don't need a mutex there. You will, however, need a mutex or other suitable lock in your doit() function, as you said it will be accessed from multiple threads.
Update for edited question: No, you don't need to worry about the processor cache. You must construct your object first, before you can have a handle to it. Only once you have this handle can you pass it to other threads to be used. What I'm trying to say is that the spawned threads must start after the construction of the original object; it is impossible for it to happen the other way around!
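A sketch of the construct-then-share order, with class A filled in hypothetically. The standard backs this reasoning: completion of the std::thread constructor synchronizes with the start of the new thread's function, so everything A's constructor wrote is visible in that thread without a lock. (If the object were instead handed to an already-running thread, that hand-off itself would need synchronization.)

```cpp
#include <mutex>
#include <thread>
#include <vector>

class A {
public:
    explicit A(int start) : field_(start) {}  // one thread, no lock needed

    void doit() {
        std::lock_guard<std::mutex> lock(m_);
        ++field_;
    }

    int value() {
        std::lock_guard<std::mutex> lock(m_);  // other accessors lock too
        return field_;
    }

private:
    std::mutex m_;
    int field_;
};

inline int construct_then_share() {
    A a(10);  // fully constructed before any worker thread exists
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&a] {
            for (int k = 0; k < 100; ++k) a.doit();
        });
    for (auto& t : threads) t.join();
    return a.value();
}
```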

It is not possible to call doit() on an object that is not created yet, so you do not need mutex in the constructor.
If doit() is the only method that accesses the field, then you should be fine.
If other methods of your class also access that field, even from a single thread, then you must use a mutex also in these methods.

You first need to construct the object before those pesky threads get their hands on it. The OS will allocate memory for the object, and the constructor is only called by one thread. The OS looks after that allocation, so nothing needs to be done on your part. Hell, you can even create two objects of the same class in two different threads.
You can be very conservative and lock a mutex at the start of any method that uses that field, and release it at the end.
Or, if you understand the interactions of the various methods with the various algorithms, you can use a mutex only for the critical sections of code that use that field, i.e. the parts of the code that need to be sure the field is not altered by another thread during processing. Your method can then release the lock after the critical section, do something else, and perhaps enter another critical section later.

Passing "this" to a thread in C++

What is the best way of performing the following in C++. Whilst my current method works I'm not sure it's the best way to go:
1) I have a master class that has some functions in it
2) I have a thread that takes some instructions on a socket and then runs one of the functions in the master class
3) There are a number of threads that access various functions in the master class
I create the master class and then create instances of the thread classes from the master. The constructor for the thread class gets passed the "this" pointer for the master. I can then run functions from the master class inside the threads, i.e. I get a command to do something, and the thread runs a function in the master class. I have mutexes etc. to prevent race problems.
Am I going about this the wrong way? It kinda seems like the thread classes should inherit from the master class. Another approach would be to not have separate thread classes, but just have them as functions of the master class, but that gets ugly.

Sounds good to me. In my servers, it is called 'SCB' - ServerControlBlock - and provides access to services like the IOCPbuffer/socket pools, logger, UI access for status/error messages and anything else that needs to be common to all the handler threads. Works fine and I don't see it as a hack.
I create the SCB, (and ensure in the ctor that all services accessed through it are started and ready for use), before creating the thread pool that uses the SCB - no nasty singletonny stuff.
Rgds,
Martin
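A minimal sketch of the design in the question, with hypothetical names: Master creates its worker threads and hands each one its own this pointer, the workers call back into mutex-protected member functions, and Master joins the threads before it is destroyed so no worker is left with a dangling pointer.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class Master;  // forward declaration

// Hypothetical worker: holds a non-owning pointer back to its Master.
class CommandWorker {
public:
    CommandWorker(Master* master, std::vector<std::string> commands)
        : master_(master), commands_(std::move(commands)) {}
    void operator()() const;  // defined once Master is complete

private:
    Master* master_;  // Master must outlive this worker's thread
    std::vector<std::string> commands_;
};

class Master {
public:
    void start() {
        threads_.emplace_back(CommandWorker(this, {"ping", "ping"}));
        threads_.emplace_back(CommandWorker(this, {"ping"}));
    }
    void stop() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }
    ~Master() {
        for (auto& t : threads_)
            if (t.joinable()) t.join();  // never leave a dangling "this"
    }

    void ping() {  // called from worker threads
        std::lock_guard<std::mutex> lock(m_);
        ++pings_;
    }
    int pings() {
        std::lock_guard<std::mutex> lock(m_);
        return pings_;
    }

private:
    std::mutex m_;
    int pings_ = 0;
    std::vector<std::thread> threads_;
};

inline void CommandWorker::operator()() const {
    for (const auto& c : commands_)
        if (c == "ping") master_->ping();
}

inline int master_demo() {
    Master m;
    m.start();
    m.stop();
    return m.pings();
}
```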

Separate thread classes is pretty normal, especially if they have specific functionality. I wouldn't inherit from the main thread.

Passing the this pointer to threads is not, in itself, bad. What you do with it can be.
The this pointer is just like any other POD-ish data type. It's just a chunk of bits. The stuff that is in this might be more than PODs, however, and passing what is in effect a pointer to its members can be dangerous for all the usual reasons. Any time you share anything across threads, it introduces potential race conditions and deadlocks. The elementary means to resolve those conflicts is, of course, to introduce synchronization in the form of mutexes, semaphores, etc., but this can have the surprising effect of serializing your application.
Say you have one thread reading data from a socket and storing it to a synchronized command buffer, and another thread which reads from that command buffer. Both threads use the same mutex, which protects the buffer. All is well, right?
Well, maybe not. Your threads could become serialized if you're not very careful with how you lock the buffer. Presumably you created separate threads for the buffer-insert and buffer-remove code so that they could run in parallel. But if you lock the buffer with each insert and each remove, then only one of those operations can be executing at a time. As long as you're writing to the buffer, you can't read from it, and vice versa.
You can try to fine-tune the locks so that they are as brief as possible, but so long as you have shared, synchronized data, you will have some degree of serialization.
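One common way to keep the locks brief, shown as a sketch with a hypothetical CommandBuffer: the consumer swaps the whole pending queue out under the lock and processes the batch after releasing it, so the producer is blocked only for the duration of a push or a swap.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class CommandBuffer {
public:
    void push(std::string cmd) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(std::move(cmd));
    }

    // Takes everything queued so far in one short critical section.
    std::deque<std::string> take_all() {
        std::deque<std::string> batch;
        std::lock_guard<std::mutex> lock(m_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex m_;
    std::deque<std::string> pending_;
};

inline std::size_t buffer_demo() {
    CommandBuffer buf;
    std::thread producer([&buf] {
        for (int i = 0; i < 100; ++i) buf.push("cmd" + std::to_string(i));
    });
    std::size_t handled = 0;
    while (handled < 100) {
        std::deque<std::string> batch = buf.take_all();
        for (const std::string& cmd : batch)  // processed without the lock
            if (!cmd.empty()) ++handled;
        if (batch.empty()) std::this_thread::yield();
    }
    producer.join();
    return handled;
}
```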
Another approach is to hand data off to another thread explicitly, and remove as much data sharing as possible. Instead of writing to and reading from a buffer as above, for example, your socket code might create some kind of Command object on the heap (e.g. Command* cmd = new Command(...);) and pass that off to the other thread. (One way to do this in Windows is via the QueueUserAPC mechanism.)
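QueueUserAPC is Windows-specific; as a portable illustration of the same hand-off idea, here is a sketch in which std::unique_ptr<Command> objects are moved through a small blocking queue. Command, Mailbox and the "quit" sentinel are hypothetical; once a command is sent, only the receiving thread owns it.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical command built by the socket thread.
struct Command {
    std::string name;
};

// Ownership of each Command moves from sender to receiver.
class Mailbox {
public:
    void send(std::unique_ptr<Command> cmd) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(cmd));
        }
        cv_.notify_one();
    }
    std::unique_ptr<Command> receive() {  // blocks until a command arrives
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::unique_ptr<Command> cmd = std::move(q_.front());
        q_.pop();
        return cmd;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::unique_ptr<Command>> q_;
};

inline std::string handoff_demo() {
    Mailbox box;
    std::string received;
    std::thread consumer([&] {
        for (;;) {
            std::unique_ptr<Command> cmd = box.receive();
            if (cmd->name == "quit") break;  // sentinel ends the loop
            received += cmd->name + ";";
        }
    });
    box.send(std::unique_ptr<Command>(new Command{"open"}));
    box.send(std::unique_ptr<Command>(new Command{"close"}));
    box.send(std::unique_ptr<Command>(new Command{"quit"}));
    consumer.join();
    return received;
}
```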
There are pros & cons to both approaches. The synchronization method has the benefit of being somewhat simpler to understand and implement at the surface, but the potential drawback of being much more difficult to debug if you mess something up. The hand-off method can make many of the problems inherent with synchronization impossible (thereby actually making it simpler), but it takes time to allocate memory on the heap.
