C++ std condition variable covering a lot of shared variables

I am writing a multithreaded program in C++. In my main thread I wait for my other threads to put packages in different queues, depending on the sort of package and on the thread it originates from.
The queues are protected by mutexes as it should be.
But in my main I don't want to be doing:
while (true)
{
    if (!queue1->empty())
    {
        // do stuff
    }
    if (!queue2->empty())
    {
        // do stuff
    }
    // etc.
}
So you need to use condition variables to signal the main thread that something has changed. Now I can only block on one condition variable, so all these threads would need to use the same condition variable and an accompanying mutex. But I don't really want to use that mutex to lock all my threads: when one thread is writing to a queue, another should still be able to write to a totally different queue. So I use separate mutexes for each queue. But then how do I use the additional mutex that comes with the condition variable?
Here is how it's done with 2 threads and 1 queue using Boost (very similar to std):
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
template<typename Data>
class concurrent_queue
{
private:
    boost::condition_variable the_condition_variable;
public:
    void wait_for_data()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
    }
    void push(Data const& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        bool const was_empty = the_queue.empty();
        the_queue.push(data);
        if(was_empty)
        {
            the_condition_variable.notify_one();
        }
    }
    // rest as before
};
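If memory serves, the elided "rest" of the article's class also includes a wait_and_pop that folds the wait and the removal into one critical section; a sketch along those lines (the_mutex and the_queue are the members hidden behind "rest as before"):

void wait_and_pop(Data& popped_value)
{
    // wait and removal happen under one lock, so no item can be
    // lost between checking for data and popping it
    boost::mutex::scoped_lock lock(the_mutex);
    while(the_queue.empty())
    {
        the_condition_variable.wait(lock);
    }
    popped_value = the_queue.front();
    the_queue.pop();
}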
So how do you solve this?

I'd say the key to your problem is here:
But I don't really want to use that mutex to lock all my threads: when one thread is writing to a queue, another should still be able to write to a totally different queue. So I use separate mutexes for each queue.
Why? Because:
... packages come in relatively slow. And queues are empty most of the time
It seems to me that you've designed yourself into a corner because of something you thought you needed when in reality you probably didn't need it, because one queue would have actually worked in the usage scenario you mention.
I'd say start off with one queue and see how far it gets you. Then, when you run into a limitation where you really do have many threads waiting on a single mutex, you will have a lot more information about the problem and will therefore be able to solve it better.
In essence, I'd say the reason you're facing this problem is premature design optimization, and the way to fix that is to backtrack and change the design right now.

Create a top-level (possibly circular) queue of all the queues that have work in them.
This queue can be protected by a single mutex, and have a condvar which only needs to be signalled when it changes from empty to non-empty.
Now, your individual queues can each have their own mutex, and they only need to touch the shared/top-level queue (and its mutex) when they change from empty to non-empty.
Some details will depend on whether you want your thread to take only the front item from each non-empty queue in turn, or consume each whole queue in sequence, but the idea is there.
Should going from non-empty to non-empty (but with increased size) also be passed on to the top-level queue?
That depends, as I said, on how you consume them. If, every time a queue has something in it, you do this:
(you already have the top-level lock, that's how you know this queue has something in it)
lock the queue
swap the queue contents with a local working copy
remove the queue from the top-level queue
unlock the queue
then a work queue is always either non-empty, and hence in the top-level queue, or empty and hence not in the queue.
If you don't do this, and just pull the front element off each non-empty queue, then you have more states to consider.
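For concreteness, here is a minimal sketch of the swap-the-whole-queue variant. All names are invented, shutdown handling is omitted, and a single consumer is assumed:

#include <condition_variable>
#include <deque>
#include <mutex>
#include <queue>

template<typename T>
struct work_queue {
    std::mutex mtx;       // protects items only
    std::queue<T> items;
};

template<typename T>
struct queue_of_queues {
    std::mutex mtx;                    // protects ready
    std::condition_variable cv;        // signalled on empty -> non-empty
    std::deque<work_queue<T>*> ready;  // queues known to hold work

    // Producer side: only touch the shared state on the
    // empty -> non-empty transition of your own queue.
    void push(work_queue<T>& q, T value) {
        bool was_empty;
        {
            std::lock_guard<std::mutex> lk(q.mtx);
            was_empty = q.items.empty();
            q.items.push(std::move(value));
        }
        if (was_empty) {
            std::lock_guard<std::mutex> lk(mtx);
            ready.push_back(&q);
            cv.notify_one();
        }
    }

    // Consumer side: wait for any queue to have work, then swap its
    // whole contents into a local copy, exactly as described above.
    std::queue<T> take_batch() {
        work_queue<T>* q;
        {
            std::unique_lock<std::mutex> lk(mtx);
            cv.wait(lk, [this]{ return !ready.empty(); });
            q = ready.front();
            ready.pop_front();  // the queue will be empty after the swap
        }
        std::queue<T> batch;
        std::lock_guard<std::mutex> lk(q->mtx);
        batch.swap(q->items);
        return batch;
    }
};

Note the two mutexes are never held at the same time, so there is no lock-ordering problem.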
Note that if
... packages come in relatively slow. And queues are empty most of the time
you could probably just have one queue, since there wouldn't be enough activity to cause a lot of contention. This simplifies things enormously.

Both @Carleeto and @Useless have given good answers. You have only a single consumer, so a single queue will give you the best performance. You can't get higher throughput than the single consumer working constantly, so your objective should be to minimize the locking overhead of the single consumer, not of the producers. You do this by having the consumer wait on a single condition variable (indicating that the queue is non-empty) with a single mutex.
Here's how you do the parametric polymorphism. Complete type safety, no casting, and only a single virtual function in the parent class:
class ParentType {
public:
    virtual void do_work(...[params]...) = 0;
    virtual ~ParentType() {}
};

class ChildType1 : public ParentType {
private:
    // all my private variables and functions
public:
    virtual void do_work(...[params]...) {
        // call private functions and use private variables from ChildType1
    }
};

class ChildType2 : public ParentType {
private:
    // completely different private variables and functions
public:
    virtual void do_work(...[params]...) {
        // call private functions and use private variables from ChildType2
    }
};
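To tie this back to the single queue: the consumer pops owning base-class pointers and lets virtual dispatch sort out the type. A self-contained sketch, with do_work shown without parameters for brevity and a plain std::queue standing in for the concurrent one (std::make_unique is C++14; substitute new for strict C++11):

#include <iostream>
#include <memory>
#include <queue>

struct ParentType {
    virtual void do_work() = 0;
    virtual ~ParentType() {}
};

struct ChildType1 : ParentType {
    void do_work() override { std::cout << "package type 1\n"; }
};

struct ChildType2 : ParentType {
    void do_work() override { std::cout << "package type 2\n"; }
};

int main() {
    std::queue<std::unique_ptr<ParentType>> q;  // one queue, many package types
    q.push(std::make_unique<ChildType1>());
    q.push(std::make_unique<ChildType2>());
    while (!q.empty()) {
        q.front()->do_work();  // virtual dispatch: no casting needed
        q.pop();
    }
}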

Related

C++11 How to handle 2 thread safe queues using condition variables in 1 thread

I have 2 connection objects running on their own threads that each put different data into their respective queue, which lives in the main thread. So the main thread has 2 queues and needs to be woken when either one of these queues signals it has received data. I have written a thread-safe queue that encapsulates pushing, popping, and signaling the condition variable inside the threadsafe_queue. But it seems it won't work, because the main loop can block on the first queue while data arrives in the second queue and never be woken up, and vice versa.
Do I have to share the same condition variable and mutex between the 2 queues?
I could modify my threadsafe_queue to take the condition variable and mutex as parameters and pass the same ones to each queue.
Or I am thinking of using wait_until with a timer for each queue, to get a chance to check each queue after a timeout, but this doesn't seem efficient.
The main processing thread has a lot of legacy code with static objects/variables and containers, so it can't be split into 2 threads without introducing a lot of locks.
What do you think is the best way?
Merge the queues.
Or, write a streaming system. The producers don't need to know where their data goes; it just has to go. They need a:
template<class T>
using sink=std::function<void(T)>;
to send their data.
The listener doesn't need to know where the data is coming from. It needs a source:
template<class T>
using source= sink<sink<T>>;
Now they are on different threads, so you need a way to get data from A to B.
template<class T>
struct threadsafe_queue {
    sink<T> get_sink();
    source<T> get_source();
};
In there, maintain your mutex, condition variable, and buffer.
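A hedged sketch of what those internals might look like (details invented; no shutdown handling, so the drain loop below runs forever):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

template<class T> using sink = std::function<void(T)>;
template<class T> using source = sink<sink<T>>;

template<class T>
struct threadsafe_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::queue<T> buffer;

    sink<T> get_sink() {
        // the producer end: push and wake the consumer
        return [this](T value) {
            {
                std::lock_guard<std::mutex> lk(mtx);
                buffer.push(std::move(value));
            }
            cv.notify_one();
        };
    }

    source<T> get_source() {
        // the consumer end: drain items into the supplied sink
        return [this](sink<T> consume) {
            for (;;) {
                std::unique_lock<std::mutex> lk(mtx);
                cv.wait(lk, [this]{ return !buffer.empty(); });
                T value = std::move(buffer.front());
                buffer.pop();
                lk.unlock();  // don't hold the lock during the callback
                consume(std::move(value));
            }
        };
    }
};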
Now here is the fun part. If we have X = variant<A,B>, then sink<X> can convert to sink<A> (and likewise source<A> can convert to source<X>).
So if thread 1 produces A and thread 2 produces B, they can both feed into a sink<X> without them even knowing.
Meanwhile the consumer thread sees either A or B coming from the queue.
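That conversion can be written out explicitly. A hedged sketch (std::variant is C++17; boost::variant works the same way, and the adapter name is invented):

#include <functional>
#include <variant>

template<class T> using sink = std::function<void(T)>;

// Any sink<X> where X can be constructed from an A also works as a sink<A>.
template<class A, class X>
sink<A> as_sink_of(sink<X> s) {
    return [s](A a) { s(X(std::move(a))); };
}

// Usage, assuming the threadsafe_queue sketch above:
//   using X = std::variant<A, B>;
//   sink<A> for_thread_1 = as_sink_of<A>(queue.get_sink());
//   sink<B> for_thread_2 = as_sink_of<B>(queue.get_sink());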
You can replace source<T> = sink<sink<T>> with source<T> = std::function<std::optional<T>()>, where it returns empty when done. I like sources being sinks of sinks; use is:
void print_ints( source<int> src ) {
    src([](int x){ std::cout << x << ','; });
    std::cout << "\n";
}
vs my less preferred:
void print_ints( source<int> src ) {
    while (auto x = src()) { std::cout << *x << ','; }
    std::cout << "\n";
}
As an aside, you can tag source/sink types and overload | and add pipe<In,Out> etc.
But that isn't useful here.

What is the best way to share data containers between threads in C++?

I have an application which has a couple of processing levels like:
InputStream->Pre-Processing->Computation->OutputStream
Each of these entities run in separate thread.
So in my code I have the general thread, which owns the
std::vector<ImageRead> m_readImages;
and then it passes this member variable to each thread:
InputStream input{&m_readImages};
std::thread threadStream{&InputStream::start, &input};
PreProcess pre{&m_readImages};
std::thread preStream{&PreProcess::start, &pre};
...
And each of these classes owns a pointer member to this data:
std::vector<ImageRead>* m_ptrReadImages;
I also have a global mutex defined, which I lock and unlock on each read/write operation to that shared container.
What bothers me is that this mechanism is pretty obscure and sometimes I get confused whether the data is used by another thread or not.
So what is the more straightforward way to share this container between those threads?
The process you described as "Input-->preprocessing-->computation-->Output" is sequential by design: each step depends on the previous one, so parallelizing in this particular manner is not beneficial, as each thread just has to wait for another to complete. Try to find out which step takes the most time and parallelize that. Or try to set up multiple parallel processing pipelines that operate sequentially on independent, individual data sets. A usual approach for that would employ a processing queue which distributes the tasks among a set of threads.
It would seem to me that your reading and preprocessing could be done independently of the container.
Naively, I would structure this as a fan-out and then fan-in network of tasks.
First, make a dispatch task (a task is a unit of work that is given to a thread to actually execute) that will create the input-and-preprocess tasks.
Use futures as a means for the sub-tasks to communicate back a pointer to the completely loaded image.
Make a second task, the std::vector builder task, that just waits on the futures to get the results when they are done and adds them to the std::vector.
I suggest you structure things this way because I suspect that any IO and preprocessing you are doing will take longer than setting a value in the vector. Using tasks instead of threads directly lets you tune the parallel portion of your work.
I hope that's not too abstracted away from the concrete elements. This is a pattern I find to be well balanced between saturating available hardware, reducing thrash / lock contention, and is understandable by future-you debugging it later.
I would use 3 separate queues, ready_for_preprocessing which is fed by InputStream and consumed by Pre-processing, ready_for_computation which is fed by Pre-Processing and consumed by Computation, and ready_for_output which is fed by Computation and consumed by OutputStream.
You'll want each queue to be in a class, which has an access mutex (to control actually adding and removing items from the queue) and an "image available" semaphore (to signal that items are available) as well as the actual queue. This would allow multiple instances of each thread. Something like this:
class imageQueue
{
    std::deque<ImageRead> m_readImages;
    std::mutex m_changeQueue;
    Semaphore m_imagesAvailable;
public:
    bool addImage( ImageRead );
    ImageRead getNextImage();
};
addImage() takes the m_changeQueue mutex, adds the image to m_readImages, then signals m_imagesAvailable.
getNextImage() waits on m_imagesAvailable. When it becomes signaled, it takes m_changeQueue, removes the next image from the list, and returns it.
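A hedged sketch of those two members, substituting a std::condition_variable for the Semaphore (C++11 has no standard semaphore; std::counting_semaphore only arrived in C++20):

#include <condition_variable>
#include <deque>
#include <mutex>

class imageQueue
{
    std::deque<ImageRead> m_readImages;
    std::mutex m_changeQueue;
    std::condition_variable m_imagesAvailable;  // stands in for the Semaphore
public:
    bool addImage( ImageRead img )
    {
        {
            std::lock_guard<std::mutex> lock(m_changeQueue);
            m_readImages.push_back(std::move(img));
        }
        m_imagesAvailable.notify_one();  // "signals m_imagesAvailable"
        return true;                     // this sketch never fails
    }

    ImageRead getNextImage()
    {
        std::unique_lock<std::mutex> lock(m_changeQueue);
        m_imagesAvailable.wait(lock, [this]{ return !m_readImages.empty(); });
        ImageRead img = std::move(m_readImages.front());
        m_readImages.pop_front();
        return img;
    }
};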
cf. http://en.cppreference.com/w/cpp/thread
Ignoring the question of "Should each operation run in an individual thread?", it appears that the objects you want to process move from thread to thread. In effect, they are uniquely owned by only one thread at a time (no thread ever needs to access any data from other threads). There is a way to express just that in C++: std::unique_ptr.
Each step then only works on its owned image. All you have to do is find a thread-safe way to move the ownership of your images through the process steps one by one, which means the critical sections are only at the boundaries between tasks. Since you have multiple of these, abstracting it away would be reasonable:
class ProcessBoundary
{
public:
    void setImage(std::unique_ptr<ImageRead> newImage)
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer == nullptr)
                {
                    // Previous image has been taken by the next step,
                    // so we can place this one here.
                    m_imageToTransfer = std::move(newImage);
                    return;
                }
            }
            std::this_thread::yield();
        }
    }

    std::unique_ptr<ImageRead> getImage()
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer != nullptr)
                {
                    // An image is waiting for us; take ownership of it.
                    return std::move(m_imageToTransfer);
                }
            }
            std::this_thread::yield();
        }
        return nullptr;  // stopped before an image arrived
    }

    void stop()
    {
        running = false;
    }

private:
    std::mutex m_mutex;
    std::unique_ptr<ImageRead> m_imageToTransfer;
    std::atomic<bool> running;  // Set to true in constructor
};
The process steps would then ask for an image with getImage(), which they uniquely own once that function returns. They process it and pass it to the setImage of the next ProcessBoundary.
You could probably improve on this with condition variables, or by adding a queue in this class so that threads can get back to processing the next image; a sketch follows. However, if some steps are faster than others, they will necessarily be stalled by the slower ones eventually.
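A hedged sketch of that improvement: a condition variable plus a small queue per boundary, so threads sleep instead of spinning on yield(). Names follow the class above; the details are invented:

#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

class ProcessBoundary
{
public:
    void setImage(std::unique_ptr<ImageRead> newImage)
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_images.push(std::move(newImage));
        }
        m_condition.notify_one();
    }

    std::unique_ptr<ImageRead> getImage()
    {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_condition.wait(guard, [this] {
            return !m_images.empty() || !m_running;
        });
        if (m_images.empty())
            return nullptr;  // stopped
        auto img = std::move(m_images.front());
        m_images.pop();
        return img;
    }

    void stop()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_running = false;
        }
        m_condition.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::queue<std::unique_ptr<ImageRead>> m_images;
    bool m_running = true;
};

Note the running flag is now protected by the mutex instead of being atomic, so the condition variable cannot miss a stop() notification.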
This is a design pattern problem. I suggest to read about concurrency design pattern and see if there is anything that would help you out.
If you want to add concurrency to the following sequential process:
InputStream->Pre-Processing->Computation->OutputStream
Then I suggest using the active object design pattern. This way each process is not blocked by the previous step and can run concurrently. It is also very simple to implement (here is an implementation: http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095).
As to your question about each thread sharing a DTO: this is easily solved with a wrapper on the DTO. The wrapper will contain write and read functions. The write function blocks with a mutex, and the read returns const data.
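A hedged sketch of such a wrapper (DTO is a stand-in type; names invented):

#include <mutex>

struct DTO { /* the shared fields */ };

class SharedDTO
{
    mutable std::mutex m_mutex;  // mutable so read() can lock while const
    DTO m_dto;
public:
    void write(const DTO& dto)
    {
        std::lock_guard<std::mutex> lock(m_mutex);  // blocks other writers/readers
        m_dto = dto;
    }

    DTO read() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_dto;  // returns a copy the caller can treat as const data
    }
};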
However, I think your problem lies in design. If the process is sequential as you described, then why are each process sharing the data? The data should be passed into the next process once the current one completes. In other words, each process should be decoupled.
You are correct in using mutexes and locks. For C++11, this is really the most elegant way of accessing complex data between threads.

Put all database operations in a specific thread using Qt

I have a console application where, after a timeout signal, a 2D matrix (15*1200) should be parsed element by element and inserted into a database. Since the operation is time-consuming, I perform the insertion in a new thread using QtConcurrent::run.
However, due to timeout signals, several threads may start before one has finished, so multiple accesses to the database may occur.
As a solution, I was trying to buffer all database operations in a specific thread, in other words, assign a specific thread to the database class, but I do not know how to do so.
Your problem is a classical concurrent data analysis problem. Have you tried using std::mutex? Here's how you use it:
You create a std::mutex variable (mutex = mutual exclusion) that's accessible by all the relevant threads.
std::mutex myLock;
and then, let's say that the function that processes the data looks like this:
void processData(const Data& myData)
{
    ProcessedData d = parseData(myData);
    insertToDatabase(d);
}
Now from what I understand, you're afraid that multiple threads will call insertToDatabase(d) simultaneously. Now to solve this issue, simply do the following:
void processData(const Data& myData)
{
    ProcessedData d = parseData(myData);
    myLock.lock();
    insertToDatabase(d);
    myLock.unlock();
}
Now with this, if another thread tries to access the same function, it will block until all other threads are finished. So threads are mutually excluded from making the call together.
Caveats:
This mutex object must be the same one that all the threads see, otherwise this is useless. So either make it global (bad idea, but it will work), or put it in the class that will do the calls.
Mutex objects are non-copyable, so if you include one in a class, you should either make the mutex object a pointer, reimplement the copy constructor of that class to prevent copying the mutex, or make your class non-copyable using delete:
class MyClass
{
    //... stuff
    MyClass(const MyClass& src) = delete;
    //... other stuff
};
There are fancier ways to use std::mutex, including std::lock_guard and std::unique_lock, which take ownership of the mutex and do the locking for you. These are good to use if you know that the call insertToDatabase(d); could throw an exception: in that case, the plain lock()/unlock() code above would never unlock the mutex, and the program would reach a deadlock.
In the example I provided, here's how you use lock_guard:
void processData(const Data& myData)
{
    ProcessedData d = parseData(myData);
    std::lock_guard<std::mutex> guard(myLock);
    insertToDatabase(d);
    // it will unlock automatically at the end of this function,
    // when the object "guard" is destroyed
}
Be aware that calling lock() twice by the same thread has undefined behavior.
Everything I did above is C++11.
If you're going to deal with multiple threads, I recommend that you start reading about data management with multiple threads. This is a good book.
If you insist on using Qt stuff, here's the same thing from Qt... QMutex.
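For completeness, a hedged sketch of the same lock_guard example translated to Qt types (QMutexLocker unlocks in its destructor, just like std::lock_guard; the other names are the same stand-ins as above):

#include <QMutex>
#include <QMutexLocker>

QMutex myLock;

void processData(const Data& myData)
{
    ProcessedData d = parseData(myData);
    QMutexLocker locker(&myLock);  // released when "locker" goes out of scope
    insertToDatabase(d);
}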

How can I avoid deadlocking when registering/deregistering observers during notification?

I have ideas for solving this, but I have a feeling this problem has been solved many times over.
I have implemented an observer pattern, similar to this:
struct IObserver {
    virtual void notify(Event& event) = 0;
};

struct Notifier {
    void registerObserver(IObserver* observer, EventRange range) {
        lock(_mutex);
        _observers[observer] = range;
    }

    void deregisterObserver(IObserver* observer) {
        lock(_mutex);
        _observers.erase(_observers.find(observer));
    }

    void handleEvent() { /* pushes event onto queue */ }

    void run();

    mutex _mutex;
    queue<Event> _eventQueue;
    map<IObserver*, EventRange> _observers;
};
The run method is called from a thread I create (it is actually owned by the notifier). The method looks something like...
void Notifier::run() {
    while (true) {
        waitForEvent();
        Event event = _eventQueue.pop();
        // now we have an event, acquire a lock and notify listeners
        lock(_mutex);
        BOOST_FOREACH(map<IObserver*, EventRange>::value_type& value, _observers) {
            value.first->notify(event);
        }
    }
}
This works perfectly, until notify attempts to create an object that in turn attempts to register an observer. In this scenario, an attempt is made to acquire the already locked lock, and we end up in a deadlock. This situation can be avoided by using a recursive mutex. However, now consider the situation where a notification triggers removal of an Observer. Now map iterators are invalidated.
My question is, is there a pattern available that prevents this deadlock situation?
I think the real problem here is that you have an event that is manipulating the list of observers while you are iterating over the list of observers. If you are executing a notify(...) operation, you are iterating over the list. If you are iterating over the original list (and not a copy), then either registration or deregistration alters the list while you are iterating over it. I don't believe the iterators in a std::map would handle this well.
I have had this problem as well (just in a single threaded context) and found the only way to deal with it was to create a temporary copy of the observer list and iterate over that.
I also cached off removed observers during iteration, so I could be sure that if I had observers A, B, and C, and A led to C being removed, the list would still have C in it but C would get skipped.
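A hedged sketch of that copy-and-skip idea applied to the Notifier above. It assumes _mutex is a std::mutex, plus an extra member std::set<IObserver*> _recentlyRemoved that deregisterObserver fills in while holding _mutex:

void Notifier::run() {
    while (true) {
        waitForEvent();
        Event event = _eventQueue.pop();

        // Snapshot the observer list under the lock...
        std::vector<IObserver*> snapshot;
        {
            std::lock_guard<std::mutex> guard(_mutex);
            for (auto& entry : _observers)
                snapshot.push_back(entry.first);
            _recentlyRemoved.clear();
        }

        // ...then notify outside the lock, so callbacks can freely
        // register or deregister observers without deadlocking.
        for (IObserver* obs : snapshot) {
            {
                std::lock_guard<std::mutex> guard(_mutex);
                if (_recentlyRemoved.count(obs))
                    continue;  // deregistered by an earlier callback; skip it
            }
            obs->notify(event);
        }
    }
}

There is still a hazard if an observer object is destroyed between the check and the call; that is exactly the kind of subtlety the unit tests mentioned below are for.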
I have an implementation of this for single threaded applications.
You could convert it to a threaded approach with a little work.
EDIT: I think the points of vulnerability for a multi-threaded application are the creation of the copy of the observer list (which you do when you enter notify(...)) and the addition of observers to the "recently removed" list when observers detach. Don't place mutexes around these functions; place mutexes around the creation/update of the lists inside those functions, or create functions for just that purpose and place mutexes around them.
EDIT: I also strongly suggest creating some unit test cases (e.g. CPP Unit) to hammer the attach/detach/multi-detach scenarios from multiple threads. I had to do this in order to find one of the subtler problems when I was working on it.
EDIT: I specifically don't try to handle the case of new observers added as a consequence of a notify(...) call. That is to say, there is a list of recently removed but not a list of recently added. This is done to prevent a "notify->add->notify->add->etc." cascade, which can happen if somebody sticks a notify in a constructor.
The general approach is sketched out here.
The code is available on github here.
I have used this approach in several example solutions, which you can find on this site (and code for many of them on github as well).

Elegant ways to notify consumer when producer is done?

I'm implementing a concurrent_blocking_queue with minimal functions:
//a thin wrapper over std::queue
template<typename T>
class concurrent_blocking_queue
{
    std::queue<T> m_internal_queue;
    //...
public:
    void add(T const& item);
    T& remove();
    bool empty();
};
I intend to use this for the producer-consumer problem (I guess that is where one uses such data structures?). But I'm stuck on one problem:
How do I elegantly notify the consumer when the producer is done? How would the producer notify the queue when it is done? By calling a specific member function, say done()? Is throwing an exception from the queue (i.e. from the remove function) a good idea?
I came across many examples, but all have an infinite loop, as if the producer will produce items forever. None discusses the issue of a stopping condition, not even the wiki article.
I've simply introduced a dummy "done" product in the past. So if the producer can create "products" of, say, type A and type B, I've invented type "done". When a consumer encounters a product of type "done" it knows that further processing isn't required anymore.
It is true that it's common to enqueue a special "we're done" message; however I think OP's original desire for an out-of-band indicator is reasonable. Look at the complexity people are contemplating to set up an in-band completion message! Proxy types, templating; good grief. I'd say a done() method is simpler and easier, and it makes the common case (we're not done yet) faster and cleaner.
I would agree with kids_fox that a try_remove that returns an error code if the queue is done is preferred, but that's stylistic and YMMV.
Edit:
Bonus points for implementing a queue that keeps track of how many producers are remaining in a multiple-producers situation and raises the done exception iff all producers have thrown in the towel ;-) Not going to do that with in-band messages!
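A hedged sketch of that bonus idea, returning an empty value instead of raising an exception (std::optional is C++17; Fallible or an out-parameter works the same way, and all names are invented):

#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

template<typename T>
class concurrent_blocking_queue
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<T> m_queue;
    int m_producers = 0;  // how many producers are still active
public:
    void add_producer()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_producers;
    }

    void producer_done()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            --m_producers;
        }
        m_cv.notify_all();  // wake everyone so they can observe "done"
    }

    void add(T item)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(item));
        }
        m_cv.notify_one();
    }

    // Empty result means: queue drained and no producer remains.
    std::optional<T> remove()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this]{ return !m_queue.empty() || m_producers == 0; });
        if (m_queue.empty())
            return std::nullopt;
        T item = std::move(m_queue.front());
        m_queue.pop();
        return item;
    }
};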
My queues have usually used pointers (with an std::auto_ptr in the interface, to clearly indicate that the sender may no longer access the pointer); for the most part, the queued objects were polymorphic, so dynamic allocation and reference semantics were required anyway.
Otherwise, it shouldn't be too difficult to add an "end of file" flag to the queue. You'd need a special function on the producer side (close?) to set it (using exactly the same locking primitives as when you write to the queue), and the loop in the removal function must wait for either something to be there, or the queue to be closed. Of course, you'll need to return a Fallible value, so that the reader can know whether the read succeeded or not. Also, don't forget that in this case, you need a notify_all to ensure that all processes waiting on the condition are awoken.
BTW: I don't quite see how your interface is implementable. What does the T& returned by remove refer to? Basically, remove has to be something like:
Fallible<T>
MessageQueue<T>::receive()
{
    ScopedLock l( myMutex );
    while ( myQueue.empty() && ! myIsDone )
        myCondition.wait( l );
    Fallible<T> results;
    if ( !myQueue.empty() ) {
        results.validate( myQueue.front() );
        myQueue.pop();
    }
    return results;
}
Even without the myIsDone condition, you have to read the value into a local variable before removing it from the queue, and you can't return a reference to a local variable.
For the rest:
void
MessageQueue<T>::send( T const& newValue )
{
    ScopedLock l( myMutex );
    myQueue.push( newValue );
    myCondition.notify_all();
}

void
MessageQueue<T>::close()
{
    ScopedLock l( myMutex );
    myIsDone = true;
    myCondition.notify_all();
}
'Stopping' is not often discussed because it's often never done. In those cases where it is required, it's often easier and more flexible to enqueue a poison pill using the higher-level P-C protocol itself than to build extra functionality into the queue itself.
If you really want to do this, you could indeed set a flag that causes every consumer to raise an exception, either 'immediately' or whenever it gets back to the queue, but there are problems. Do you need the 'done' method to be synchronous, i.e. do you want all the consumers gone by the time 'done' returns, or asynchronous, i.e. should the last consumer thread signal an event parameter once all the other consumers are gone?
How are you going to arrange for those consumers that are currently waiting to wake up? How many are waiting and how many are busy but will return to the queue when they have done their work? What if one or more consumers are stuck on a blocking call (perhaps they can be unblocked, but that requires a call from another thread - how are you going to do that)?
How are the consumers going to notify that they have handled their exception and are about to die? Is 'about to die' enough, or do you need to wait on the thread handle? If you have to wait on the thread handle, what is going to do the waiting - the thread requesting the queue shutdown or the last consumer thread to notify?
Oh yes - to be safe, you should arrange for producer threads that turn up with objects to enqueue while in the 'shutting down' state to raise an exception as well.
I raise these questions because I've done all this once, a long time ago. Eventually, it all worked-ish. The objects queued up all had to have a 'QueuedItem' inserted into their inheritance chain (so that a job-cancellation method could be exposed to the queue), and the queue had to keep a thread-safe list of objects that had been popped off by threads but not processed yet.
After a while, I stopped using the class in favour of a simple P-C queue with no special shutdown mechanism.