C++ wait notify in threads with synchronized queues - c++

I have a program structured like that: one thread that receives tasks and writes them to input queue, multiple which process them and write in output queue, one that responds with results from it. When queue is empty, thread sleeps for several milliesconds. Queue has mutex inside it, pushing does lock(), and popping does try_lock() and returns if there is nothing in queue.
This is processing thread for example:
//working - atomic bool
while (working) {
if (!inputQue_->pop(msg)) {
std::this_thread::sleep_for(std::chrono::milliseconds(200));
continue;
} else {
string reply = messageHandler_->handle(msg);
if (!reply.empty()) {
outputQue_->push(reply);
}
}
}
And the thing that I dont like is that the time since receiving task until responding, as i have measured with high_resolution_clock, is almost 0, when there is no sleeping. When there is sleeping, it becomes bigger.
I dont want cpu resources to be wasted and want to do something like that: when recieving thread gets task, it notifies one of the processing threads, that does wait_for, and when processing task is done, it notifies responding thread same way. As a result I think i will get less time spent and cpu resources will not be wasted. And I have some questions:
Will this work the way that I see it supposed to, and the only difference will be waking up on notifying?
To do this, I have to create 2 condition variables: first same for receiving thread and all processing, second same for all processing and responding? And mutex in processing threads has to be common for all of them or uniuqe?
Can I place creation of unique_lock(mutex) and wait_for() in if branch just instead of sleep_for?
If some processing threads are busy, is it possible that notify_one() can try to wake up one of them, but not the free thread? I need to use notify_all()?
Is it possible that notify will not wake up any of threads? If yes, does it have high probability?

Will this work the way that I see it supposed to, and the only difference will be waking up on notifying?
Yes, assuming you do it correctly.
To do this, I have to create 2 condition variables: first same for receiving thread and all processing, second same for all processing and responding? And mutex in processing threads has to be common for all of them or uniuqe?
You can use a single mutex and a single condition variable, but that makes it a bit more complex. I'd suggest a single mutex, but one condition variable for each condition a thread might want to wait for.
Can I place creation of unique_lock(mutex) and wait_for() in if branch just instead of sleep_for?
Absolutely not. You need to hold the mutex while you check whether the queue is empty and continue to hold it until you call wait_for. Otherwise, you destroy the entire logic of the condition variable. The mutex associated with the condition variable must protect the condition that the thread is going to wait for, which in this case is the queue being non-empty.
If some processing threads are busy, is it possible that notify_one() can try to wake up one of them, but not the free thread? I need to use notify_all()?
I don't know what you mean by the "free thread". As a general rule, you can use notify_one if it's not possible for a thread to be blocked on the condition variable that can't handle the condition. You should use notify_all if either more than one thread might need to be awoken or there's a possibility that more than one thread will be blocked on the condition variable and the "wrong thread" could be woken, that is, there could be at least one thread that can't do whatever it is that needs to be done.
Is it possible that notify will not wake up any of threads? If yes, does it have high probability?
Sure, it's quite possible. But that would mean no threads were blocked on the condition. In that case, no thread can block on the condition because threads must check the condition before they wait, and they do it while holding a mutex. To provide this atomic "unlock and wait" semantic is the entire purpose of a condition variable.

The mechanism you have is called polling. The thread repeatedly checks (polls) if there is data available. As you mentioned, it has the drawback of wasting time. (But it is simple). What you mentioned you would like to use is called a blocking mechanism. This deschedules the thread until the moment that work becomes available.
1) Yes (although I don't know exactly what you're imagining)
2) a) Yes, 2 condition variables is one way to do it. b) Common mutex is best
3) You would probably place those within pop, and calling pop would have the potential to block.
4) No. notify_one will only wake a thread that is currently waiting from having called wait. Also, if multiple are waiting, it is not necessarily guaranteed which will receive the notification. (OS/library dependent)
5) No. If 1+ threads are waiting, notify_one it is guaranteed to wake one. BUT if no threads are waiting, the notification is consumed (and has no effect). Note that under certain edge conditions, notify_one may actually wake more than one. Also, a thread may wake from wait without anyone having called notify_one ("Spurious wake up"). The fact that this can happen at all means that you always have to do additional checking for it.
This is called the producer/consumer problem btw.

In general, your considerations about condition variable are correct. My proposal is more connected to design and reusability of such functionality.
The main idea is to implement ThreadPool pattern, which has constructor with number of worker threads ,methods submitTask, shutdown, join.
Having such class, you will use 2 instances of pools: one multithreaded for processing, second (singlethreaded by your choice) for result sending.
The pool consists of Blocking Queue of Tasks and array of Worker threads, each performing the same "pop Task and run" loop.The Blocking Queue encapsulates mutex and cond_var. The Task is common functor.
This also brings your design to Task oriented approach, which has a lot of advantages in future of your application.
You are welcome to ask more questions about implementation details if you like this idea.
Best regards, Daniel

Related

unlock the mutex after condition_variable::notify_all() or before?

Looking at several videos and the documentation example, we unlock the mutex before calling the notify_all(). Will it be better to instead call it after?
The common way:
Inside the Notifier thread:
//prepare data for several worker-threads;
//and now, awaken the threads:
std::unique_lock<std::mutex> lock2(sharedMutex);
_threadsCanAwaken = true;
lock2.unlock();
_conditionVar.notify_all(); //awaken all the worker threads;
//wait until all threads completed;
//cleanup:
_threadsCanAwaken = false;
//prepare new batches once again, etc, etc
Inside one of the worker threads:
while(true){
// wait for the next batch:
std::unique_lock<std::mutex> lock1(sharedMutex);
_conditionVar.wait(lock1, [](){return _threadsCanAwaken});
lock1.unlock(); //let sibling worker-threads work on their part as well
//perform the final task
//signal the notifier that one more thread has completed;
//loop back and wait until the next task
}
Notice how the lock2 is unlocked before we notify the condition variable - should we instead unlock it after the notify_all() ?
Edit
From my comment below: My concern is that, what if the worker spuriously awakes, sees that the mutex is unlocked, super-quickly completes the task and loops back to the start of while. Now the slow-poke Notifier finally calls notify_all(), causing the worker to loop an additional time (excessive and undesired).
There are no advantages to unlocking the mutex before signaling the condition variable unless your implementation is unusual. There are two disadvantages to unlocking before signaling:
If you unlock before you signal, the signal may wake a thread that choose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
There is a performance penalty for unlocking before signaling that is avoided by unlocking after signaling. If you signal before you unlock, a good implementation will know that your signal cannot possibly render any thread ready-to-run because the mutex is held by the calling thread and any thread affects by the condition variable necessarily cannot make forward progress without the mutex. This permits a significant optimization (often called "wait morphing") that is not possible if you unlock first.
So signal while holding the lock unless you have some unusual reason to do otherwise.
should we instead unlock it after the notify_all() ?
It is correct to do it either way but you may have different behavior in different situations. It is quite difficult to predict how it will affect performance of your program - I've seen both positive and negative effects for different applications. So it is better you profile your program and make decision on your particular situation based on profiling.
As mentioned here : cppreference.com
The notifying thread does not need to hold the lock on the same mutex
as the one held by the waiting thread(s); in fact doing so is a
pessimization, since the notified thread would immediately block
again, waiting for the notifying thread to release the lock.
That said, documentation for wait
At the moment of blocking the thread, the function automatically calls
lck.unlock(), allowing other locked threads to continue.
Once notified (explicitly, by some other thread), the function
unblocks and calls lck.lock(), leaving lck in the same state as when
the function was called. Then the function returns (notice that this
last mutex locking may block again the thread before returning).
so when notified wait will re-attempt to gain the lock and in that process it will get blocked again till original notifying thread releases the lock.
So I'll suggest that release the lock before calling notify. As done in example on cppreference.com and most importantly
Don't be Pessimistic.
David's answer seems to me wrong.
First, assuming the simple case of two threads, one waiting for the other on a condition variable, unlocking first by the notifier will not waken the other waiting thread, as the signal has not arrived. Then the notify call will immediately waken the waiting thread. You do not need any special optimizations.
On the other hand, signalling first has the potential of waking up a thread and making it sleep immediately again, as it cannot hold the lock—unless wait morphing is implemented.
Wait morphing does not exist in Linux at least, according to the answer under this StackOverflow question: Which OS / platforms implement wait morphing optimization?
The cppreference example also unlocks first before signalling: https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
It explicit says:
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s). Doing so may be a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock, though some implementations recognize the pattern and do not attempt to wake up the thread that is notified under lock.
should we instead unlock it after the notify_all() ?
After reading several related posts, I've formed the opinion that it's purely a performance issue. If OS supports "wait morphing", unlock after; otherwise, unlock before.
I'm adding an answer here to augment that of #DavidSchwartz 's. Particularly, I'd like to clarify his point 1.
If you unlock before you signal, the signal may wake a thread that choose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
The 1st thing I said is that, because it's a CV and not a Mutex, a better term for the so-called "deadlock" might be "sleep paralysis" - a mistake some programs make is that
a thread that's supposed to wake
went to sleep due to not rechecking the condition it's been waiting for before wait'ng again.
The 2nd thing is that, when waking some other thread(s),
the default choice should be broadcast/notify_all (broadcast is the POSIX term, which is equivalent to its C++ counterpart).
signal/notify is an optimized special case used for when there's only 1 other thread is waiting.
Finally 3rd, David is adamant that
it's better to unlock after notify,
because it can avoid the "deadlock" which I've been referring to as "sleep paralysis".
If it's unlock then notify, then there's a window where another thread (let's call this the "wrong" thread) may i.) acquire the mutex, ii.)going into wait, and iii.) wake up. The steps i. ii. and iii. happens too quickly, consumed the signal, leaving the intended (let's call it "correct") thread in sleep.
I discussed this extensively with David, he clarified that only when all 3 points are violated ( 1. condvar associated with several separate conditions and/or didn't check it before waiting again; 2. signal/notify only 1 thread when there're more than 1 other threads using the condvar; 3. unlock before notify creating a window for race condition ), the "sleep paralysis" would occur.
Finally, my recommendation is that, point 1 and 2 are essential for correctness of the program, and fixing issues associated with 1 and 2 should be prioritized over 3, which should only be a augmentative "last resort".
For the purpose of providing reference, manpage for signal/broadcast and wait contains some info from version 3 of Single Unix Specification that gave some explanations on point 1 and 2, and partly 3. Although specified for POSIX/Unix/Linux in C, it's concepts are applicable to C++.
As of this writing (2023-01-31), the 2018 edition of version 4 of Single Unix Specification is released, and the drafting of version 5 is underway.

Waiting for all asynchronous consumers to complete before producer continues

We have a problem set that is very close to the producer-consumer problem. The actual use case is for a thread (producer) that runs through a directory listing (approx. 2000 entries), then feeds these entries to 4 threads (consumers) that processes specific files in those directories.
The problem we are attempting to resolve is how to make the producer thread wait for the final consumer to complete before continuing on. There is post-processing required once we have all the files in memory that can only be done once all the files have been read.
We have implemented a very naive counter solution based on a busy wait that polls a class counter (counter incremented by producer, decremented by consumer, protected by a mutex):
while(fileCnt > 0) {
usleep(10000);
}
This is of cause not a nice soltion.
Is there any way of doing this via conditionals/semaphores/something else?
We are limited to non-C++11 implementations (pthread based).
Thanks.
Hmm.. this is actually quite difficult to do in an efficient manner for a general case. If you know before submitting your first entry how many objects you are going to submit to the queue, (as you seem to do), it's easier:
Set an atomic integer to the number of objects to be submitted. Load a callback in each item queued that the threads call when they have finished processing each object. The callback decrements the int towards zero. When a thread decs it to zero, it signals a synchro object upon which the producer is waiting after queueing its last object.
I'm still thinking about what to do if the producer is iterating some list and does not know where the end is before queueing its first item:(
That case may require an actual lock in the callback so that the producer can enter it and check 'atomically' if all the queued operations are finished yet and, if not, wait on the synchro object after exiting the lock. It's safer if the synchro object maintains state, eg. a semaphore, so that a signal made after exiting the lock, but before the waiting, is not missed, (?? not sure how to do it safely with a condvar??).

How it is possible to wait inside thread loop

i want to know how it is possible to wait for a work to done and then continue and create new one
while(!stop)
{
CreateWork();
waitForWorkToDone();
}
wait must not block calling thread
how i can achive this?
To achieve this, you can rely on the operating system providing a facility to block until notified with or without a timeout. Thus, your thread correctly does not use unnecessary CPU cycles by performing a busy wait, but is still able to respond to program state changes. With POSIX threads, you can use a condition timed wait. I'll illustrate with the boost implementation, but the concept extends generally.
do
{
boost::unique_lock<boost::mutex> lock(state_change_mutex);
boost::system_time const timeout = boost::get_system_time() + boost::posix_time::seconds(5);
state_change_cond.timed_wait(lock,timeout);
...
} while(!done);
Overall this thread will loop until the done sentinel value becomes true. Other threads can signal this thread by calling
state_change_cond.notify_all();
Or in this example if no signal happens in 5 seconds then the thread wakes up by itself.
Note that condition variables require locking by mutexes. This is to guarantee that the thread is awoken atomically and that it will behave correctly in a mutually exclusive section as inter-thread signaling implicitly is.
How about Creating a Signal. Create a handler that creates CreateWork() and signals when the job is done! Just a Suggestion

C++/BOOST: Does condition_variable::wait( ) / notify( ) ensure ordering of threads waiting?

I need to know if there is a way to "queue up" threads that wait on a condition variable so that they are awoken in the correct order...without writing a bunch of queueing code, that is.
In most systems, the following reversal of the producer/consumer model (with blocking on full mailbox) may not ensure ordering:
unique_lock lock1(mutex), lock2(mutex)
ConditionVariable cv
Code Block A: (called by multiple threads)
lock(lock1)
timestampOnEntry = now()
cv.wait(lock1) // Don't worry about spurious notifies, out of scope.
somethingRequiringMonotonicOrderOfTimestamps(timestampOnEntry)
unlock(lock1)
Code Block B: (called by a single thread, typically within a loop)
lock(lock2)
somethingVeryVerySlow()
(1) unlock(lock2) // the ordering here is not a mistake
(2) cv.notify_one(lock2) // prevents needless reblocking in code block A
Note that lines (1) and (2) in the given order. This prevents an unnecessary second block on guard in code block A should the notified thread wake up before guard is unlocked by the thread in code block B.
The question is that if multiple threads are "blocked" on wait, I need to know if
*notify_one* will wake them up in the order in which blocked. Probably not (as in Java). If not by default, if there is a way to specify that.
This could of course be done with a bunch of queuing code, but I'd prefer to use a pre-canned BOOST methodology, regardless of how complicated the contents of the can are. Of course, should I convert *cv.notify_one(guard)* into *cv.notify_all(guard)*, I would be required to do the queueing code, regardless.
No such guarantess are given by the standard, notify_one may wake any thread that is currently waiting (§30.5.1):
void notify_one() noexcept;
Effects: If any threads are blocked waiting for *this, unblocks one of those theads.
The only way to ensure that a specific thread reacts to the event is to wake all threads and then have some additional synchronization mechanism that sends all but the correct thread back to sleep.
This is a fundamental limitation due to the requirements that the platform has to fulfill: Usually condition variables are implemented in a way that the waiting threads are put into a suspended state and will not get scheduled by the system again until a notify occurs. A scheduler implementation is not required to provide the functionality for selecting a specific thread for waking up (and many actually don't).
So this part of the logic inevitably has to be handled by user code, which in turn means you have to wake up all threads to make it work, because this is the only way to ensure that the correct thread will get woken at all.
The short answer, as you seem to have suspected, is no. Which thread (or threads) notify_one is going to rouse is not necessarily guaranteed.
That said, I'm not sure what to make of your example code. Specifically, passing a mutex to notify_one doesn't make sense to me (I am unaware of any condition variable implementations on any platform that signal/broadcast that way). I don't know your use case--perhaps you must have a lot of thread local data that prevents arranging your application state in such a way that any thread can pick up the necessary data to do the next task? My first reaction to that would be to refactor the code to care less about which particular OS thread does which work and focus more on the ordering of the work itself.

Cheapest way to wake up multiple waiting threads without blocking

I use boost::thread to manage threads. In my program i have pool of threads (workers) that are activated sometimes to do some job simultaneously.
Now i use boost::condition_variable: and all threads are waiting inside boost::condition_variable::wait() call on their own conditional_variableS objects.
Can i AVOID using mutexes in classic scheme, when i work with conditional_variables? I want to wake up threads, but don't need to pass some data to them, so don't need a mutex to be locked/unlocked during awakening process, why should i spend CPU on this (but yes, i should remember about spurious wakeups)?
The boost::condition_variable::wait() call trying to REACQUIRE the locking object when CV received the notification. But i don't need this exact facility.
What is cheapest way to awake several threads from another thread?
If you don't reacquire the locking object, how can the threads know that they are done waiting? What will tell them that? Returning from the block tells them nothing because the blocking object is stateless. It doesn't have an "unlocked" or "not blocking" state for it to return in.
You have to pass some data to them, otherwise how will they know that before they had to wait and now they don't? A condition variable is completely stateless, so any state that you need must be maintained and passed by you.
One common pattern is to use a mutex, condition variable, and a state integer. To block, do this:
Acquire the mutex.
Copy the value of the state integer.
Block on the condition variable, releasing the mutex.
If the state integer is the same as it was when you coped it, go to step 3.
Release the mutex.
To unblock all threads, do this:
Acquire the mutex.
Increment the state integer.
Broadcast the condition variable.
Release the mutex.
Notice how step 4 of the locking algorithm tests whether the thread is done waiting? Notice how this code tracks whether or not there has been an unblock since the thread decided to block? You have to do that because condition variables don't do it themselves. (And that's why you need to reacquire the locking object.)
If you try to remove the state integer, your code will behave unpredictably. Sometimes you will block too long due to missed wakeups and sometimes you won't block long enough due to spurious wakeups. Only a state integer (or similar predicate) protected by the mutex tells the threads when to wait and when to stop waiting.
Also, I haven't seen how your code uses this, but it almost always folds into logic you're already using. Why did the threads block anyway? Is it because there's no work for them to do? And when they wakeup, are they going to figure out what to do? Well, finding out that there's no work for them to do and finding out what work they do need to do will require some lock since it's shared state, right? So there almost always is already a lock you're holding when you decide to block and need to reacquire when you're done waiting.
For controlling threads doing parallel jobs, there is a nice primitive called a barrier.
A barrier is initialized with some positive integer value N representing how many threads it holds. A barrier has only a single operation: wait. When N threads call wait, the barrier releases all of them. Additionally, one of the threads is given a special return value indicating that it is the "serial thread"; that thread will be the one to do some special job, like integrating the results of the computation from the other threads.
The limitation is that a given barrier has to know the exact number of threads. It's really suitable for parallel processing type situations.
POSIX added barriers in 2003. A web search indicates that Boost has them, too.
http://www.boost.org/doc/libs/1_33_1/doc/html/barrier.html
Generally speaking, you can't.
Assuming the algorithm looks something like this:
ConditionVariable cv;
void WorkerThread()
{
for (;;)
{
cv.wait();
DoWork();
}
}
void MainThread()
{
for (;;)
{
ScheduleWork();
cv.notify_all();
}
}
NOTE: I intentionally omitted any reference to mutexes in this pseudo-code. For the purposes of this example, we'll suppose ConditionVariable does not require a mutex.
The first time through MainTnread(), work is queued and then it notifies WorkerThread() that it should execute its work. At this point two things can happen:
WorkerThread() completes DoWork() before MainThread() can complete ScheduleWork().
MainThread() completes ScheduleWork() before WorkerThread() can complete DoWork().
In case #1, WorkerThread() comes back around to sleep on the CV, and is awoken by the next cv.notify() and all is well.
In case #2, MainThread() comes back around and notifies... nobody and continues on. Meanwhile WorkerThread() eventually comes back around in its loop and waits on the CV but it is now one or more iterations behind MainThread().
This is known as a "lost wakeup". It is similar to the notorious "spurious wakeup" in that the two threads now have different ideas about how many notify()s have taken place. If you are expecting the two threads to maintain synchrony (and usually you are), you need some sort of shared synchronization primitive to control it. This is where the mutex comes in. It helps avoid lost wakeups which, arguably, are a more serious problem than the spurious variety. Either way, the effects can be serious.
UPDATE: For further rationale behind this design, see this comment by one of the original POSIX authors: https://groups.google.com/d/msg/comp.programming.threads/cpJxTPu3acc/Hw3sbptsY4sJ
Spurious wakeups are two things:
Write your program carefully, and make sure it works even if you
missed something.
Support efficient SMP implementations
There may be rare cases where an "absolutely, paranoiacally correct"
implementation of condition wakeup, given simultaneous wait and
signal/broadcast on different processors, would require additional
synchronization that would slow down ALL condition variable operations
while providing no benefit in 99.99999% of all calls. Is it worth the
overhead? No way!
But, really, that's an excuse because we wanted to force people to
write safe code. (Yes, that's the truth.)
boost::condition_variable::notify_*(lock) does NOT require that the caller hold the lock on the mutex. THis is a nice improvement over the Java model in that it decouples the notification of threads with the holding of the lock.
Strictly speaking, this means the following pointless code SHOULD DO what you are asking:
lock_guard lock(mutex);
// Do something
cv.wait(lock);
// Do something else
unique_lock otherLock(mutex);
//do something
otherLock.unlock();
cv.notify_one();
I do not believe you need to call otherLock.lock() first.