C++11 lockfree single producer single consumer: how to avoid busy wait - c++

I'm trying to implement a class that uses two threads: one for the producer and one for the consumer. The current implementation does not use locks:
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
using Queue =
boost::lockfree::spsc_queue<
int,
boost::lockfree::capacity<1024>>;
class Worker
{
public:
Worker() : working_(false), done_(false) {}
~Worker() {
done_ = true; // exit even if the work has not been completed
worker_.join();
}
void enqueue(int value) {
queue_.push(value);
if (!working_) {
working_ = true;
worker_ = std::thread([this]{ work(); });
}
}
void work() {
int value;
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
working_ = false;
}
private:
std::atomic<bool> working_;
std::atomic<bool> done_;
Queue queue_;
std::thread worker_;
};
The application needs to enqueue work items for a certain amount of time and then sleep waiting for an event. This is a minimal main that simulates the behavior:
int main()
{
Worker w;
for (int i = 0; i < 1000; ++i)
w.enqueue(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
for (int i = 0; i < 1000; ++i)
w.enqueue(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
}
I'm pretty sure that my implementation is buggy: what if the worker thread completes and, before it executes working_ = false, another enqueue comes in? Is it possible to make my code thread-safe without using locks?
The solution requires:
a fast enqueue
the destructor has to quit even if the queue is not empty
no busy wait, because there are long periods of time in which the worker thread is idle
no locks if possible
Edit
I did another implementation of the Worker class, based on your suggestions. Here is my second attempt:
class Worker
{
public:
Worker()
: working_(ATOMIC_FLAG_INIT), done_(false) { }
~Worker() {
// exit even if the work has not been completed
done_ = true;
if (worker_.joinable())
worker_.join();
}
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set()) {
if (worker_.joinable())
worker_.join();
worker_ = std::thread([this]{ work(); });
}
return enqueued;
}
void work() {
int value;
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
working_.clear();
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
}
private:
std::atomic_flag working_;
std::atomic<bool> done_;
Queue queue_;
std::thread worker_;
};
I introduced the worker_.join() inside the enqueue method. This can impact performance, but only in very rare cases (when the queue becomes empty and another enqueue arrives before the thread exits). The working_ variable is now an atomic_flag that is set in enqueue and cleared in work. The additional while after working_.clear() is needed because, if another value is pushed after the first while but before the clear, that value would otherwise not be processed.
Is this implementation correct?
I did some tests and the implementation seems to work.
OT: Is it better to put this as an edit, or an answer?

what if the worker thread completes and before executing working_ = false, another enqueue comes?
Then the value will be pushed to the queue but will not be processed until another value is enqueued after the flag is set. You (or your users) may decide whether that is acceptable. This can be avoided using locks, but they're against your requirements.
The code may fail if the running thread is about to finish and has already set working_ = false, but hasn't stopped running before the next value is enqueued. In that case your code will call operator= on a std::thread that is still joinable, which results in a call to std::terminate according to the linked documentation.
Adding worker_.join() before assigning the worker to a new thread should prevent that.
Another problem is that queue_.push may fail if the queue is full because it has a fixed size. Currently you just ignore the case and the value will not be added to the full queue. If you wait for queue to have space, you don't get fast enqueue (in the edge case). You could take the bool returned by push (which tells if it was successful) and return it from enqueue. That way the caller may decide whether it wants to wait or discard the value.
Or use a non-fixed-size queue. Boost has this to say about that choice:
Can be used to completely disable dynamic memory allocations during push in order to ensure lockfree behavior.
If the data structure is configured as fixed-sized, the internal nodes are stored inside an array and they are addressed
by array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
to achieve lock-freedom.

Your worker thread needs more than 2 states.
Not running
Doing tasks
Idle shutdown
Shutdown
If you force shut down, it skips idle shutdown. If you run out of tasks, it transitions to idle shutdown. In idle shutdown, it empties the task queue, then goes into shutting down.
Shutdown is set, then you walk off the end of your worker task.
The producer first puts things on the queue. Then it checks the worker state. If Shutdown or Idle shutdown, first join it (and transition it to not running) then launch a new worker. If not running, just launch a new worker.
If the producer wants to launch a new worker, it first makes sure that we are in the not running state (otherwise, logic error). We then transition to the Doing tasks state, and then we launch the worker thread.
If the producer wants to shut down the helper task, it sets the done flag. It then checks the worker state. If it is anything besides not running, it joins it.
This can result in a worker thread that is launched for no good reason.
There are a few cases where the above can block, but there were a few before as well.
Then, we write a formal or semi-formal proof that the above cannot lose messages, because when writing lock free code you aren't done until you have a proof.

This is my solution to the question. I don't much like answering my own question, but I think showing actual code may help others.
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <iostream>
#include <thread>
// I used this semaphore class: https://gist.github.com/yohhoy/2156481
#include "binsem.hpp"
using Queue =
boost::lockfree::spsc_queue<
int,
boost::lockfree::capacity<1024>>;
class Worker
{
public:
// the worker thread starts in the constructor
Worker()
: working_(ATOMIC_FLAG_INIT), done_(false), semaphore_(0)
, worker_([this]{ work(); })
{ }
~Worker() {
// exit even if the work has not been completed
done_ = true;
semaphore_.signal();
worker_.join();
}
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set())
// signal to the worker thread to wake up
semaphore_.signal();
return enqueued;
}
void work() {
int value;
// the worker thread stays alive until done_ is set
while (!done_) {
// sleep until the start signal arrives
semaphore_.wait();
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
}
working_.clear();
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
}
}
}
private:
std::atomic_flag working_;
std::atomic<bool> done_;
binsem semaphore_;
Queue queue_;
std::thread worker_;
};
I tried the suggestion of @Cameron: not shutting down the thread and adding a semaphore. The semaphore is actually used only in the first enqueue and in the last work. This is not lock-free, but only in these two cases.
I did some performance comparisons between my previous version (see my edited question) and this one. There are no significant differences when there are not many starts and stops. However, enqueue is 10 times faster when it has to signal the worker thread instead of starting a new thread. This is a rare case, so it is not very important, but it is an improvement anyway.
This implementation satisfies:
lock-free in the common case (when enqueue and work are busy);
no busy wait when there are no enqueues for a long time
the destructor exits as soon as possible
correctness?? :)
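The binsem.hpp header from the gist is not reproduced here. As a rough idea of what such a class does, the following is a minimal sketch of a binary semaphore built on std::mutex and std::condition_variable. It is an assumption about the interface, not the gist's actual code: only the constructor, signal() and wait() are taken from the Worker above, and try_wait() is a hypothetical extra added for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for binsem.hpp: a binary semaphore built on a mutex
// and a condition variable. Not the gist's implementation.
class binsem {
public:
    explicit binsem(int initial = 0) : available_(initial != 0) {
        assert(initial == 0 || initial == 1);  // binary: only 0 or 1 make sense
    }

    // Mark the semaphore as available and wake one waiter.
    // Signalling twice in a row has the same effect as signalling once.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            available_ = true;
        }
        cv_.notify_one();
    }

    // Sleep (no busy wait) until the semaphore is available, then take it.
    void wait() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return available_; });
        available_ = false;
    }

    // Take the semaphore if it is available, without blocking.
    // (Illustrative extra; Worker does not use it.)
    bool try_wait() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (!available_) return false;
        available_ = false;
        return true;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    bool available_;
};
```

Because the flag is changed under the mutex and wait() uses a predicate, a signal() that arrives just before the worker goes to sleep is not lost: the worker finds available_ already set and returns immediately.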

Very partial answer: I think all those atomics, semaphores and states are a back-communication channel, from "the thread" to "the Worker". Why not use another queue for that? At the very least, thinking about it will help you around the problem.

Related

How bad it is to lock a mutex in an infinite loop or an update function

std::queue<double> some_q;
std::mutex mu_q;
/* an update function may be an event observer */
void UpdateFunc()
{
/* some other processing */
std::lock_guard lock{ mu_q };
while (!some_q.empty())
{
const auto& val = some_q.front();
/* update different states according to val */
some_q.pop();
}
/* some other processing */
}
/* some other thread might add some values after processing some other inputs */
void AddVal(...)
{
std::lock_guard lock{ mu_q };
some_q.push(...);
}
For this case is it okay to handle the queue this way?
Or would it be better if I try to use a lock-free queue like the boost one?
How bad it is to lock a mutex in an infinite loop or an update function
It's pretty bad. Infinite loops actually make your program have undefined behavior unless it does one of the following:
terminate
make a call to a library I/O function
perform an access through a volatile glvalue
perform a synchronization operation or an atomic operation
Acquiring the mutex lock before entering the loop and just holding it does not count as performing a synchronization operation (in the loop). Also, while you hold the mutex, no one can add information to the queue, so while you process the information you extract, all threads wanting to add to the queue have to wait, and no other worker threads wanting to share the load can extract from the queue either. It's usually better to extract one task from the queue, release the lock and then work with what you got.
The common way is to use a condition_variable that lets other threads acquire the lock and then notify other threads waiting with the same condition_variable. The CPU will be pretty close to idle while waiting and wake up to do the work when needed.
Using your program as a base, it could look like this:
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
std::queue<double> some_q;
std::mutex mu_q;
std::condition_variable cv_q; // the condition variable
bool stop_q = false; // something to signal the worker thread to quit
/* an update function may be an event observer */
void UpdateFunc() {
while(true) {
double val;
{
std::unique_lock lock{mu_q};
// cv_q.wait lets others acquire the lock to work with the queue
// while it waits to be notified.
while (not stop_q && some_q.empty()) cv_q.wait(lock);
if(stop_q) break; // time to quit
val = std::move(some_q.front());
some_q.pop();
} // lock released so others can use the queue
// do time consuming work with "val" here
std::cout << "got " << val << '\n';
}
}
/* some other thread might add some values after processing some other inputs */
void AddVal(double val) {
std::lock_guard lock{mu_q};
some_q.push(val);
cv_q.notify_one(); // notify someone that there's a new value to work with
}
void StopQ() { // a function to set the queue in shutdown mode
std::lock_guard lock{mu_q};
stop_q = true;
cv_q.notify_all(); // notify all that it's time to stop
}
int main() {
auto th = std::thread(UpdateFunc);
// simulate some events coming with some time apart
std::this_thread::sleep_for(std::chrono::seconds(1));
AddVal(1.2);
std::this_thread::sleep_for(std::chrono::seconds(1));
AddVal(3.4);
std::this_thread::sleep_for(std::chrono::seconds(1));
AddVal(5.6);
std::this_thread::sleep_for(std::chrono::seconds(1));
StopQ();
th.join();
}
If you really want to process everything that is currently in the queue, then extract everything first and then release the lock, then work with what you extracted. Extracting everything from the queue is done quickly by just swapping in another std::queue. Example:
#include <atomic>
std::atomic<bool> stop_q{}; // needs to be atomic in this version
void UpdateFunc() {
while(not stop_q) {
std::queue<double> work; // this will be used to swap with some_q
{
std::unique_lock lock{mu_q};
// cv_q.wait lets others acquire the lock to work with the queue
// while it waits to be notified.
while (not stop_q && some_q.empty()) cv_q.wait(lock);
std::swap(work, some_q); // extract everything from the queue at once
} // lock released so others can use the queue
// do time consuming work here
while(not stop_q && not work.empty()) {
auto val = std::move(work.front());
work.pop();
std::cout << "got " << val << '\n';
}
}
}
You can use it the way you currently do, assuming the lock is used properly across all threads. However, you may run into some frustration over how you want to call UpdateFunc().
Are you going to be using a callback?
Are you going to be using an ISR?
Are you going to be polling?
If you use a third-party library, it often trivializes thread synchronization and queues.
For example, if you are using CMSIS-RTOS v2, it is a fairly straightforward process to get multiple threads to pass information to each other. You could have multiple producers and a single consumer.
The single consumer can sit in a forever loop where it waits to receive a message before performing its work:
when timeout is set to osWaitForever the function will wait for an
infinite time until the message is retrieved (i.e. wait semantics).
// Two producers
osMessageQueuePut(X,Y,Z,timeout=0)
osMessageQueuePut(X,Y,Z,timeout=0)
// One consumer which will run only once something enters the queue
osMessageQueueGet(X,Y,Z,osWaitForever)
tldr; You are safe to proceed, but using a library will likely make your synchronization problems easier.

Two questions on std::condition_variables

I have been trying to figure out std::condition_variables and I am particularly confused by wait() and whether to use notify_all or notify_one.
First, I've written some code and attached it below. Here's a short explanation: Collection is a class that holds onto a bunch of Counter objects. These Counter objects have a Counter::increment() method, which needs to be called on all the objects, over and over again. To speed everything up, Collection also maintains a thread pool to distribute the work over, and sends out all the work with its Collection::increment_all() method.
These threads don't need to communicate with each other, and there are usually many more Counter objects than there are threads. It's fine if one thread processes more Counters than others, just as long as all the work gets done. Adding work to the queue is easy and only needs to be done in the "main" thread. As far as I can see, the only bad thing that can happen is if other methods (e.g. Collection::printCounts) are allowed to be called on the counters in the middle of the work being done.
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <condition_variable>
#include <queue>
class Counter{
private:
int m_count;
public:
Counter() : m_count(0) {}
void increment() {
m_count ++;
}
int getCount() const { return m_count; }
};
class Collection{
public:
Collection(unsigned num_threads, unsigned num_counters)
: m_shutdown(false)
{
// start workers
for(size_t i = 0; i < num_threads; ++i){
m_threads.push_back(std::thread(&Collection::work, this));
}
// instantiate counters
for(size_t j = 0; j < num_counters; ++j){
m_counters.emplace_back();
}
}
~Collection()
{
m_shutdown = true;
for(auto& t : m_threads){
if(t.joinable()){
t.join();
}
}
}
void printCounts() {
// wait for work to be done
std::unique_lock<std::mutex> lk(m_mtx);
m_work_complete.wait(lk); // q2: do I need a while loop?
// print all current counters
for(const auto& cntr : m_counters){
std::cout << cntr.getCount() << ", ";
}
std::cout << "\n";
}
void increment_all()
{
std::unique_lock<std::mutex> lock(m_mtx);
m_work_complete.wait(lock);
for(size_t i = 0; i < m_counters.size(); ++i){
m_which_counters_have_work.push(i);
}
}
private:
void work()
{
while(!m_shutdown){
bool action = false;
unsigned which_counter;
{
std::unique_lock<std::mutex> lock(m_mtx);
if(m_which_counters_have_work.size()){
which_counter = m_which_counters_have_work.front();
m_which_counters_have_work.pop();
action = true;
}else{
m_work_complete.notify_one(); // q1: notify_all
}
}
if(action){
m_counters[which_counter].increment();
}
}
}
std::vector<Counter> m_counters;
std::vector<std::thread> m_threads;
std::condition_variable m_work_complete;
std::mutex m_mtx;
std::queue<unsigned> m_which_counters_have_work;
bool m_shutdown;
};
int main() {
int num_threads = std::thread::hardware_concurrency()-1;
int num_counters = 10;
Collection myCollection(num_threads, num_counters);
myCollection.printCounts();
myCollection.increment_all();
myCollection.printCounts();
myCollection.increment_all();
myCollection.printCounts();
return 0;
}
I compile this on Ubuntu 18.04 with g++ -std=c++17 -pthread thread_pool.cpp -o tp && ./tp. I think the code accomplishes all of those objectives, but a few questions remain:
I am using m_work_complete.wait(lk) to make sure the work is finished before I start printing all the new counts. Why do I sometimes see this written inside a while loop, or with a second argument as a lambda predicate function? These docs mention spurious wake ups. If a spurious wake up occurs, does that mean printCounts could prematurely print? If so, I don't want that. I just want to ensure the work queue is empty before I start using the numbers that should be there.
I am using m_work_complete.notify_all instead of m_work_complete.notify_one. I've read this thread, and I don't think it matters--only the main thread is going to be blocked by this. Is it faster to use notify_one just so the other threads don't have to worry about it?
std::condition_variable is not really a condition variable, it's more of a synchronization tool for reaching a certain condition. What that condition is is up to the programmer, and it should still be checked after each condition_variable wake-up, since it can wake-up spuriously, or "too early", when the desired condition isn't yet reached.
On POSIX systems, condition_variable::wait() delegates to pthread_cond_wait, which is susceptible to spurious wake-up (see "Condition Wait Semantics" in the Rationale section). On Linux, pthread_cond_wait is in turn implemented via a futex, which is again susceptible to spurious wake-up.
So yes you still need a flag (protected by the same mutex) or some other way to check that the work is actually complete. A convenient way to do this is by wrapping the check in a predicate and passing it to the wait() function, which would loop for you until the predicate is satisfied.
notify_all unblocks all threads waiting on the condition variable; notify_one unblocks just one (or at least one, to be precise). If there is more than one waiting thread, and they are equivalent (i.e. any one of them can handle the condition fully), and if the condition is sufficient to let just one thread continue (as in submitting a work unit to a thread pool), then notify_one is more efficient, since it won't unblock other threads unnecessarily only for them to notice that there is no work to be done and go back to waiting. If you only ever have one waiter, then there is no difference between notify_one and notify_all.
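To make the two points above concrete, here is a minimal sketch of how the Collection from the question could be reworked so that every wait has a predicate over state protected by the same mutex. The class and member names (CounterPool, unfinished_, wait_and_get_counts and so on) are hypothetical and chosen for illustration; this is one possible arrangement under those assumptions, not the only correct one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical rework of Collection: two condition variables, and every wait
// checks a predicate so spurious wake-ups are harmless.
class CounterPool {
public:
    CounterPool(unsigned num_threads, std::size_t num_counters)
        : counts_(num_counters, 0) {
        assert(num_threads > 0);
        for (unsigned i = 0; i < num_threads; ++i)
            threads_.emplace_back([this] { work(); });
    }
    ~CounterPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            shutdown_ = true;
        }
        work_available_.notify_all();  // every worker has to see the shutdown
        for (auto& t : threads_) t.join();
    }
    void increment_all() {
        {
            std::unique_lock<std::mutex> lock(mtx_);
            // Wait for the previous batch, so each index is queued at most once.
            work_complete_.wait(lock, [this] { return unfinished_ == 0; });
            for (std::size_t i = 0; i < counts_.size(); ++i) pending_.push(i);
            unfinished_ = counts_.size();
        }
        work_available_.notify_all();  // many items: let all workers compete
    }
    std::vector<int> wait_and_get_counts() {
        std::unique_lock<std::mutex> lock(mtx_);
        work_complete_.wait(lock, [this] { return unfinished_ == 0; });
        return counts_;
    }

private:
    void work() {
        for (;;) {
            std::size_t idx;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                work_available_.wait(
                    lock, [this] { return shutdown_ || !pending_.empty(); });
                if (shutdown_) return;
                idx = pending_.front();
                pending_.pop();
            }
            ++counts_[idx];  // outside the lock: only this worker holds idx
            bool last;
            {
                std::lock_guard<std::mutex> lock(mtx_);
                last = (--unfinished_ == 0);
            }
            if (last) work_complete_.notify_all();
        }
    }

    std::mutex mtx_;
    std::condition_variable work_available_;
    std::condition_variable work_complete_;
    std::queue<std::size_t> pending_;
    std::size_t unfinished_ = 0;
    bool shutdown_ = false;
    std::vector<int> counts_;
    std::vector<std::thread> threads_;
};
```

With this shape, a caller can run increment_all() twice and then wait_and_get_counts(), and every counter reads 2; a spurious wake-up only causes the predicate to be evaluated again. The workers also sleep when there is nothing to do instead of spinning on the queue.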
It's pretty simple: use notify_one() when:
There is no reason why more than one thread needs to know about the event (e.g., use notify_one() to announce the availability of an item that a worker thread will "consume," thereby making the item unavailable to other workers), AND
There is no wrong thread that could be awakened. (E.g., you're probably safe if all of the threads are wait()ing in the same line of the same exact function.)
Use notify_all() in all other cases.

How could I quit a C++ blocking queue?

After reading some other articles, I got to know that I could implement a c++ blocking queue like this:
template<typename T>
class BlockingQueue {
public:
std::mutex mtx;
std::condition_variable not_full;
std::condition_variable not_empty;
std::queue<T> queue;
size_t capacity{5};
BlockingQueue()=default;
BlockingQueue(int cap):capacity(cap) {}
BlockingQueue(const BlockingQueue&)=delete;
BlockingQueue& operator=(const BlockingQueue&)=delete;
void push(const T& data) {
std::unique_lock<std::mutex> lock(mtx);
while (queue.size() >= capacity) {
not_full.wait(lock, [&]{return queue.size() < capacity;});
}
queue.push(data);
not_empty.notify_all();
}
T pop() {
std::unique_lock<std::mutex> lock(mtx);
while (queue.empty()) {
not_empty.wait(lock, [&]{return !queue.empty();});
}
T res = queue.front();
queue.pop();
not_full.notify_all();
return res;
}
bool empty() {
std::unique_lock<std::mutex> lock(mtx);
return queue.empty();
}
size_t size() {
std::unique_lock<std::mutex> lock(mtx);
return queue.size();
}
void set_capacity(const size_t capacity) {
this->capacity = (capacity > 0 ? capacity : 10);
}
};
This works for me, but I do not know how I could shut it down if I start it in a background thread:
void main() {
BlockingQueue<float> q;
bool stop{false};
auto fun = [&] {
std::cout << "before entering loop\n";
while (!stop) {
q.push(1);
}
std::cout << "after entering loop\n";
};
std::thread t_bg(fun);
t_bg.detach();
// Some other tasks here
stop = true;
// How could I shut it down before quitting here, or could I simply let the operating system do that when the whole program is over?
}
The problem is that when I want to shut down the background thread, the background thread might have been sleeping because the queue is full and the push operation is blocked. How could I stop it when I want the background thread to stop ?
One easy way would be to add a flag that you set from outside when you want to abort a pop() operation that's already blocked. And then you'd have to decide what an aborted pop() is going to return. One way is for it to throw an exception, another would be to return an std::optional<T>. Here's the first method (I'll only write the changed parts.)
Add this type wherever you think is appropriate:
struct AbortedPopException {};
Add this to your class fields:
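mutable std::atomic<bool> abort_flag{false};
Also add this method (it has to notify the condition variable, otherwise a pop() that is already blocked never re-checks the flag):
void abort() {
{ std::lock_guard<std::mutex> lock(mtx); abort_flag = true; } // set under the mutex so a waiter cannot miss it
not_empty.notify_all(); // wake every pop() that is blocked
}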
Change the while loop in the pop() method like this: (you don't need the while at all, since I believe the condition variable wait() method that accepts a lambda does not wake up/return spuriously; i.e. the loop is inside the wait already.)
not_empty.wait(lock, [this]{return !queue.empty() || abort_flag;});
if (abort_flag)
throw AbortedPopException{};
That's it (I believe.)
In your main(), when you want to shut the "consumer" down you can call abort() on your queue. But you'll have to handle the thrown exception there as well. It's your "exit" signal, basically.
Some side notes:
Don't detach from threads! Especially here, where AFAICT there is no reason for it (and some actual danger too). Just signal them to exit (in any manner appropriate) and join() them.
Your stop flag should be atomic. You read from it in your background thread and write to it from your main thread, and those can (and in fact do) overlap in time, so... data race!
I don't understand why you have a "full" state and "capacity" in your queue. Think about whether they are necessary.
UPDATE 1: In response to OP's comment about detaching... Here's what happens in your main thread:
You spawn the "producer" thread (i.e. the one that pushes stuff onto the queue)
Then you do all the work you want to do (e.g. consuming the stuff on the queue)
Sometime, perhaps at the end of main(), you signal the thread to stop (e.g. by setting stop flag to true)
then, and only then you join() with the thread.
It is true that your main thread will block while it is waiting for the thread to pick up the "stop" signal, exit its loop, and return from its thread function, but that's a very very short wait. And you have nothing else to do. More importantly, you'll know that your thread exited cleanly and predictably, and from that point on, you know definitely that that thread won't be running (not important for you here, but could be critical for some other threaded task.)
That is the pattern that you usually want to follow when spawning worker threads that loop over a short task.
Update 2: About "full" and "capacity" of the queue. That's fine. It's certainly your decision. No problem with that.
Update 3: About "throwing" vs. returning an "empty" object to signal an aborted "blocking pop()". I don't think there is anything wrong with throwing like that, especially since it is very rare (it just happens once, at the end of the operation of the producer/consumer). However, if all T types that you want to store in your queue have an "invalid" or "empty" state, then you certainly can use that. But throwing is more general, if more "icky" to some people.
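For comparison, here is a minimal sketch of the other option mentioned above, returning std::optional<T> instead of throwing, extended so that a blocked push() can be released as well (which is the case in the question, where the producer sleeps on a full queue). The class name ClosableQueue and the method close() are hypothetical, and the sketch assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical variant of BlockingQueue that can be closed from outside.
template <typename T>
class ClosableQueue {
public:
    explicit ClosableQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Blocks while the queue is full. Returns false (and stores nothing)
    // if the queue has been closed.
    bool push(T value) {
        std::unique_lock<std::mutex> lock(mtx_);
        not_full_.wait(lock,
                       [this] { return closed_ || queue_.size() < capacity_; });
        if (closed_) return false;
        queue_.push(std::move(value));
        lock.unlock();
        not_empty_.notify_one();
        return true;
    }

    // Blocks while the queue is empty. Returns std::nullopt if the queue
    // has been closed, even if items are still stored (abort semantics).
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        not_empty_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (closed_) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        lock.unlock();
        not_full_.notify_one();
        return value;
    }

    // Set the flag under the mutex, then wake every blocked push() and pop().
    void close() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            closed_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    std::mutex mtx_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
    std::size_t capacity_;
    bool closed_ = false;
};
```

A background producer could then loop with while (q.push(x)) { ... }, and main() would call q.close() followed by join() on the thread, with no detach and no separate stop flag. Whether pop() should drain the remaining items after close() instead of returning immediately is a design choice; this sketch follows the abort behaviour described above.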

Thread Pool blocks main threads after some loops

I'm trying to learn how threading works in C++ and I found an implementation which I used as a guide
to make my own implementation, however after a loop or a couple it blocks.
I have a thread-safe queue in which I retrieve the jobs that are assigned to the thread pool.
Each thread runs this function:
// Declarations
std::vector<std::thread> m_threads;
JobQueue m_jobs; // A queue with locks
std::mutex m_mutex;
std::condition_variable m_condition;
std::atomic_bool m_active;
std::atomic_bool m_started;
std::atomic_int m_busy;
///...
[this, threadIndex] {
int numThread = threadIndex;
while(this->m_active) {
std::unique_ptr<Job> currJob;
bool dequeued = false;
{
std::unique_lock<std::mutex> lock { this->m_mutex };
this->m_condition.wait(lock, [this, numThread]() {
return (this->m_started && !this->m_jobs.empty()) || !this->m_active;
});
if (this->m_active) {
m_busy++;
dequeued = this->m_jobs.dequeue(currJob);
}
}
if (dequeued) {
currJob->execute();
{
std::lock_guard<std::mutex> lock { this->m_mutex };
m_busy--;
}
m_condition.notify_all();
} else {
{
std::lock_guard<std::mutex> lock { this->m_mutex };
m_busy--;
}
}
}
}
and the loop is basically:
while(1) {
int numJobs = rand() % 10000;
std::cout << "Will do " << numJobs << " jobs." << std::endl;
while(numJobs--) {
pool.assign([](){
// some heavy calculation
});
}
pool.waitEmpty();
std::cout << "Done!" << std::endl; // chrono removed for readability
}
While the waitEmpty method is described as:
std::unique_lock<std::mutex> lock { this->m_mutex };
this->m_condition.wait(lock, [this] {
return this->empty();
});
And it is in this wait method that the code usually hangs, as the test inside is never called again.
I've debugged it, changed the notification_one's and all's from place to place, but for some reason after some loops it always blocks.
Usually, but not always, it locks on condition_variable.wait() method that locks the current thread until there are no other thread working and the queue is empty, but I also saw it happen when I call condition_variable.notify_all().
Some debugging helped me notice that while I call notify_all() on the slave thread, the wait() in the main thread is not tested again.
The expected behavior is that it does not block when it loops.
I'm using G++ 8.1.0 on Windows.
and the output is:
Will do 41 jobs.
Done! Took 0ms!
Will do 8467 jobs.
<main thread blocked>
Edit: I fixed the issue pointed by paddy's comment: now m_busy-- also happens when a job is not dequeued.
Edit 2: Running this on Linux does not lock the main thread and runs as expected. (g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0)
Edit 3: As mentioned in the comments, corrected deadlock to block, as it only involves one lock
Edit 4: As commented by Jérôme Richard I was able to improve it by creating a lock_guard around the m_busy--; but now the code blocks at the notify_all() that is called inside the assign method. Here is the assign method for reference:
template<class Func, class... Args>
auto assign(Func&& func, Args&&... args) -> std::future<typename std::result_of<Func(Args...)>::type> {
using jobResultType = typename std::result_of<Func(Args...)>::type;
auto task = std::make_shared<std::packaged_task<jobResultType()>>(
std::bind(std::forward<Func>(func), std::forward<Args>(args)...)
);
auto job = std::unique_ptr<Job>(new Job([task](){ (*task)(); }));
std::future<jobResultType> result = task->get_future();
m_jobs.enqueue(std::move(job));
std::cout << " - enqueued";
m_condition.notify_all();
std::cout << " - ok!" << std::endl;
return result;
}
In one of the loops the last output is
//...
- enqueued - ok!
- enqueued - ok!
- enqueued
<blocked again>
Edit 5: With the latest changes, this does not happen with the msbuild compiler.
The Gist for my implementation is here: https://gist.github.com/GuiAmPm/4be7716b7f1ea62819e61ef4ad3beb02
Here's also the original Article which I based my implementation:
https://roar11.com/2016/01/a-platform-independent-thread-pool-using-c14/
Any help will be appreciated.
tl;dr: use a std::lock_guard of m_mutex around m_busy-- to avoid unexpected wait condition blocking.
Analysis
First of all, please note that the problem can occur with one thread in the pool and just a few jobs. This means that there is a problem between the master thread that submits the jobs and the one that executes them.
Using GDB to further analyze the state of the program when the wait gets stuck, one can see that there are no jobs, m_busy is set to 0 and both threads are waiting for notifications.
This means that there is a concurrency issue on the condition variable between the master and the only worker on the last job to execute.
By adding a global atomic clock to your code, one can see that (in almost all cases) the worker finishes all the jobs before the master can wait for the jobs to be completed and the workers to be done.
Here is one practical scenario that was observed (the steps happen in sequence):
the master starts the wait call and there are jobs remaining
the worker performs m_busy++, dequeues the last job and executes it (m_busy is now set to 1 and the job queue is empty)
the master evaluates the predicate of the wait call
the master calls ThreadPool::empty and the result is false because m_busy is set to 1
the worker performs m_busy-- (m_busy is now set to 0)
from that moment, the master could go back to waiting on the condition (but is suspected not to do so yet)
the worker notifies the condition
the master is suspected to go back to waiting on the condition only now, and is not affected by this last notification (as no further notifications will follow)
At this point, the master is no longer executing instructions and will wait forever
the worker waits for the condition and will wait forever too
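The fact that the master is not impacted by the notification looks very strange at first. It matches a lost wake-up: because m_busy is changed without holding m_mutex, the worker can decrement it and notify after the master has evaluated the predicate but before the master has actually started waiting, so the notification reaches no waiter.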
It seems to be related to memory fencing issues. A more detailed explanation can be found here. To quote the article:
Even if you make dataReady an atomic, it must be modified under the mutex; if not the modification to the waiting thread may be published, but not correctly synchronized.
So a solution is to replace the m_busy-- instruction by the following lines:
{
std::lock_guard<std::mutex> lck {this->m_mutex};
m_busy--;
}
This avoids the previous scenario. Indeed, on the one hand m_mutex is held while the predicate of the wait call is checked, which prevents m_busy from being modified at that specific moment; on the other hand it forces the data to be properly synchronized.
It should theoretically be safer to also include the m_jobs.dequeue call in it, but that would strongly reduce the degree of parallelism of the workers. In practice, the useful synchronization happens when the lock is released in the worker threads.
Please note that one general workaround to avoid such problems is to add a timeout to the waiting calls, using the wait_for function in a loop to enforce the predicate condition. However, this solution comes at the price of a higher latency of the waiting calls and can thus significantly slow the execution down.
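The rule behind the fix can be shown in isolation. The following is a minimal sketch, not the pool from the question: BusyTracker and run_and_wait are hypothetical names, and the "jobs" are trivial. The point it illustrates is that the counter read by the wait predicate is only ever changed while holding the same mutex, and the notification is sent after the change.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: counts jobs in flight and lets a thread wait for zero.
class BusyTracker {
public:
    void begin() {
        std::lock_guard<std::mutex> lock(mtx_);
        ++busy_;
    }
    void end() {
        {
            // Changing busy_ under the mutex means a waiter is either before
            // its predicate check (and will see the new value) or already
            // blocked (and will receive the notification below).
            std::lock_guard<std::mutex> lock(mtx_);
            assert(busy_ > 0);
            --busy_;
        }
        cv_.notify_all();
    }
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return busy_ == 0; });
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    int busy_ = 0;
};

// Registers `jobs` trivial jobs, runs them on one worker thread and returns
// how many had completed when wait_idle() returned.
int run_and_wait(int jobs) {
    BusyTracker tracker;
    int completed = 0;  // written only by the worker before its end() calls
    for (int i = 0; i < jobs; ++i) tracker.begin();
    std::thread worker([&] {
        for (int i = 0; i < jobs; ++i) {
            ++completed;    // the "job"
            tracker.end();  // publish completion under the mutex
        }
    });
    tracker.wait_idle();     // returns only after the last end()
    int result = completed;  // ordered after the worker's writes by the mutex
    worker.join();
    return result;
}
```

Calling run_and_wait(1000) returns 1000 regardless of how the two threads interleave, because wait_idle() cannot miss the final decrement.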

C++ multithreading, simple consumer / producer threads, LIFO, notification, counter

I am new to multithreaded programming and want to implement the following functionality.
There are 2 threads, producer and consumer.
Consumer only processes the latest value, i.e., last in first out (LIFO).
Producer sometimes generates new values at a faster rate than consumer can
process them. For example, producer may generate 2 new values in 1
millisecond, but it takes consumer approximately 5 milliseconds to process one.
If consumer receives a new value in the middle of processing an old
value, there is no need to interrupt. In other words, consumer will finish current
execution first, then start an execution on the latest value.
Here is my design process, please correct me if I am wrong.
There is no need for a queue, since only the latest value is
processed by consumer.
Are notifications sent from the producer queued automatically?
I will use a counter instead.
ConsumerThread() checks the counter at the end, to make sure producer
hasn't generated a new value.
But what happens if producer generates a new value just before consumer
goes to sleep(), but after it checks the counter?
Here is some pseudo code.
boost::mutex mutex;
double x;
int counter = 0; // number of values produced since the consumer last reset it
void ProducerThread()
{
{
boost::mutex::scoped_lock lock(mutex);
x = rand();
counter++;
}
notify(); // wake up consumer thread
}
void ConsumerThread()
{
counter = 0; // reset counter, only process the latest value
... do something which takes 5 milli-seconds ...
if (counter > 0)
{
... execute this function again, not too sure how to implement this ...
}
else
{
... what happens if producer generates a new value here??? ...
sleep();
}
}
Thanks.
If I understood your question correctly, for your particular application, the consumer only needs to process the latest available value provided by the producer. In other words, it's acceptable for values to get dropped because the consumer cannot keep up with the producer.
If that's the case, then I agree that you can get away without a queue and use a counter. However, the shared counter and value variables will need to be accessed atomically, i.e. read and written together while holding a mutex.
You can use boost::condition_variable to signal notifications to the consumer that a new value is ready. Here is a complete example; I'll let the comments do the explaining.
#include <cstdlib>   // std::rand
#include <iostream>  // std::cout
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
boost::mutex mutex;
boost::condition_variable condvar;
typedef boost::unique_lock<boost::mutex> LockType;
// Variables that are shared between producer and consumer.
double value = 0;
int count = 0;
void producer()
{
while (true)
{
{
// 'value' and 'count' must be updated together,
// while holding the mutex lock
LockType lock(mutex);
value = std::rand();
++count;
// Notify the consumer that a new value is ready.
condvar.notify_one();
}
// Simulate the producer's 2 ms delay (exaggerated to 200 ms)
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
}
}
void consumer()
{
// Local copies of 'count' and 'value' variables. We want to do the
// work using local copies so that they don't get clobbered by
// the producer when it updates.
int currentCount = 0;
double currentValue = 0;
while (true)
{
{
// Acquire the mutex before accessing 'count' and 'value' variables.
LockType lock(mutex); // mutex is locked while in this scope
while (count == currentCount)
{
// Wait for producer to signal that there is a new value.
// While we are waiting, Boost releases the mutex so that
// other threads may acquire it.
condvar.wait(lock);
}
// `lock` is automatically re-acquired when we come out of
// condvar.wait(lock). So it's safe to access the 'value'
// variable at this point.
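currentValue = value; // Grab a copy of the latest value
currentCount = count; // and remember its count while we hold the lock,
// so that the next iteration waits for a newer value.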
}
// Now that we are out of the mutex lock scope, we work with our
// local copy of `value`. The producer can keep on clobbering the
// 'value' variable all it wants, but it won't affect us here
// because we are now using `currentValue`.
std::cout << "value = " << currentValue << "\n";
// Simulate the consumer's 5 ms processing time (exaggerated to 500 ms)
boost::this_thread::sleep(boost::posix_time::milliseconds(500));
}
}
int main()
{
boost::thread c(&consumer);
boost::thread p(&producer);
c.join();
p.join();
}
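The same idea can be packaged into a small class using only the C++11 standard library. This is a sketch rather than a drop-in replacement: LatestValue, publish and wait_next are hypothetical names, and a single consumer is assumed. Because the counter is checked and the wait is entered under the same mutex, a value published just before the consumer goes to sleep cannot be missed.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical helper: keeps only the most recent value plus a counter
// that tells the consumer whether something new has arrived.
class LatestValue {
public:
    // Called by the producer; overwrites any value not yet consumed.
    void publish(double v) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            value_ = v;
            ++count_;
        }
        condvar_.notify_one();
    }

    // Called by the (single) consumer; blocks until a value newer than
    // the one it saw last is available, then returns the latest value.
    double wait_next() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate is evaluated with the mutex held, so a publish()
        // can happen before the check or after the wait has started,
        // but never in between.
        condvar_.wait(lock, [this] { return count_ != seen_; });
        seen_ = count_;
        return value_;
    }

private:
    std::mutex mutex_;
    std::condition_variable condvar_;
    double value_ = 0;
    unsigned long count_ = 0;  // number of values published so far
    unsigned long seen_ = 0;   // count at the time of the last wait_next()
};
```

If the producer calls publish three times before the consumer gets around to calling wait_next, the consumer receives only the third value and the first two are dropped, which is the behaviour asked for in the question.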
ADDENDUM
I was thinking about this question recently, and realized that this solution, while it may work, is not optimal. Your producer is using all that CPU just to throw away half of the computed values.
I suggest that you reconsider your design and go with a bounded blocking queue between the producer and consumer. Such a queue should have the following characteristics:
Thread-safe
The queue has a fixed size (bounded)
If the consumer wants to pop the next item, but the queue is empty, the operation will be blocked until notified by the producer that an item is available.
The producer can check if there's room to push another item and block until the space becomes available.
With this type of queue, you can effectively throttle down the producer so that it doesn't outpace the consumer. It also ensures that the producer doesn't waste CPU resources computing values that will be thrown away.
Libraries such as TBB and PPL provide implementations of concurrent queues. If you want to attempt to roll your own using std::queue (or boost::circular_buffer) and boost::condition_variable, check out this blogger's example.
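The characteristics listed above can be sketched with the standard library alone. This is only an illustration under assumptions: BoundedQueue, push and pop are hypothetical names, and it uses std::mutex and std::condition_variable rather than the Boost equivalents or any of the libraries mentioned.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical bounded blocking queue for illustration.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full, which throttles the producer.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        lock.unlock();
        not_empty_.notify_one();  // wake a consumer waiting in pop()
    }

    // Blocks while the queue is empty, so the consumer never busy-waits.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        lock.unlock();
        not_full_.notify_one();   // wake a producer waiting in push()
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};
```

With a small capacity, a producer that calls push in a loop is slowed down to the consumer's pace as soon as the queue fills up, instead of computing values that are never used.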
The short answer is that you're almost certainly wrong.
With a producer/consumer, you pretty much need a queue between the two threads. Without one, there are basically two alternatives: either your code will simply lose tasks (which usually equals not working at all), or else your producer thread will need to block until the consumer thread is idle before it can produce an item, which effectively translates to single threading.
For the moment, I'm going to assume that the value you get back from rand is supposed to represent the task to be executed (i.e., is the value produced by the producer and consumed by the consumer). In that case, I'd write the code something like this:
void producer() {
for (int i=0; i<100; i++)
queue.insert(random()); // queue.insert blocks if queue is full
queue.insert(-1.0); // Tell consumer to exit
}
void consumer() {
double value;
while ((value = queue.get()) != -1) // queue.get blocks if queue is empty
process(value);
}
This relegates nearly all the interlocking to the queue. The rest of the code for both threads pretty much ignores threading issues entirely.
Implementing a pipeline is actually quite tricky if you are doing it from the ground up. For example, you'd have to use a condition variable to avoid the kind of race condition you described in your question, avoid busy waiting when implementing the mechanism for "waking up" the consumer, etc. Even using a "queue" of just 1 element won't save you from some of these complexities.
It's usually much better to use specialized libraries that were developed and extensively tested specifically for this purpose. If you can live with a Visual C++ specific solution, take a look at the Parallel Patterns Library and its concept of pipelines.