Allow multiple mutex owners or a specified number of concurrent code executions - C++

There is a piece of code that I don't want to be executed by more than X threads (e.g. five) at a time; any smaller number is fine. I'm currently experimenting with condition variables; here is what I have worked out so far:
void Manager::EnterQueue(Worker *w)
{
    {
        // Ensure stable work of std::queue
        const std::lock_guard<std::mutex> lock(queueInsertMutex);
        workerQueue.push(w);
    }
    std::unique_lock<std::mutex> unlock_object(queueMutex);
    while (workerQueue.front() != w)
    {
        // stop all threads not being at front of queue
        cv.wait(unlock_object);
    }
    workerQueue.pop();
    {
        // ensure that numOfAvailableObjects is checked by one thread at a time
        const std::lock_guard<std::mutex> lock(counterMutex);
        if (numOfAvailableObjects > 1)
        {
            // limit is not exceeded. Fire up another thread
            numOfAvailableObjects--;
            cv.notify_all();
        }
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(SOME_WORK_TIME_IN_MS));
    {
        const std::lock_guard<std::mutex> lock(counterMutex);
        numOfAvailableObjects++;
    }
    cv.notify_all();
}
So the idea was:
- make a while loop that lets through only the Worker that is first in the queue
- if numOfAvailableObjects > 1, let the next worker pass by calling cv.notify_all(), which (in theory) would resume all the cv.wait(unlock_object) calls.
Of course it doesn't work as expected: sleep_for is currently executed by only one thread at a time. Do you know any alternative to cv.wait() that will stop threads until I tell them to resume?

I think you're looking for a semaphore. Look it up.
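For illustration, here is a minimal counting-semaphore sketch built from a mutex and a condition variable (C++20 later added std::counting_semaphore to the standard library); the Semaphore class name and the count of 5 are placeholders, not part of any library:

#include <mutex>
#include <condition_variable>

// Minimal counting semaphore sketch.
class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    int count_; // number of threads still allowed to enter
public:
    explicit Semaphore(int count) : count_(count) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; }); // block while no slots are free
        --count_;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one(); // wake one waiting thread
    }
};

// Usage sketch: allow at most five threads inside the section at once.
// Semaphore sem(5);
// sem.acquire();
// ... do the work ...
// sem.release();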

Related

Thread Pool blocks main threads after some loops

I'm trying to learn how threading works in C++ and I found an implementation which I used as a guide to make my own; however, after a loop or two it blocks.
I have a thread-safe queue in which I retrieve the jobs that are assigned to the thread pool.
Each thread runs this function:
// Declarations
std::vector<std::thread> m_threads;
JobQueue m_jobs; // A queue with locks
std::mutex m_mutex;
std::condition_variable m_condition;
std::atomic_bool m_active;
std::atomic_bool m_started;
std::atomic_int m_busy;
///...
[this, threadIndex] {
    int numThread = threadIndex;
    while (this->m_active) {
        std::unique_ptr<Job> currJob;
        bool dequeued = false;
        {
            std::unique_lock<std::mutex> lock { this->m_mutex };
            this->m_condition.wait(lock, [this, numThread]() {
                return (this->m_started && !this->m_jobs.empty()) || !this->m_active;
            });
            if (this->m_active) {
                m_busy++;
                dequeued = this->m_jobs.dequeue(currJob);
            }
        }
        if (dequeued) {
            currJob->execute();
            {
                std::lock_guard<std::mutex> lock { this->m_mutex };
                m_busy--;
            }
            m_condition.notify_all();
        } else {
            {
                std::lock_guard<std::mutex> lock { this->m_mutex };
                m_busy--;
            }
        }
    }
}
and the loop is basically:
while (1) {
    int numJobs = rand() % 10000;
    std::cout << "Will do " << numJobs << " jobs." << std::endl;
    while (numJobs--) {
        pool.assign([]() {
            // some heavy calculation
        });
    }
    pool.waitEmpty();
    std::cout << "Done!" << std::endl; // chrono removed for readability
}
While the waitEmpty method is described as:
std::unique_lock<std::mutex> lock { this->m_mutex };
this->m_condition.wait(lock, [this] {
    return this->empty();
});
And it is in this wait method that the code usually hangs, as the predicate inside is never tested again.
I've debugged it and moved the notify_one's and notify_all's from place to place, but for some reason after some loops it always blocks.
Usually, but not always, it blocks on the condition_variable.wait() call that holds the main thread until no other threads are working and the queue is empty, but I also saw it happen when I call condition_variable.notify_all().
Some debugging helped me notice that while I call notify_all() on the slave thread, the wait() in the main thread is not tested again.
The expected behavior is that it does not block when it loops.
I'm using G++ 8.1.0 on Windows.
and the output is:
Will do 41 jobs.
Done! Took 0ms!
Will do 8467 jobs.
<main thread blocked>
Edit: I fixed the issue pointed out in paddy's comment: now m_busy-- also happens when a job is not dequeued.
Edit 2: Running this on Linux does not lock the main thread and runs as expected. (g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0)
Edit 3: As mentioned in the comments, I corrected "deadlock" to "block", as the situation only involves one lock.
Edit 4: As commented by Jérôme Richard, I was able to improve it by creating a lock_guard around m_busy--; but now the code blocks at the notify_all() that is called inside the assign method. Here is the assign method for reference:
template<class Func, class... Args>
auto assign(Func&& func, Args&&... args) -> std::future<typename std::result_of<Func(Args...)>::type> {
    using jobResultType = typename std::result_of<Func(Args...)>::type;
    auto task = std::make_shared<std::packaged_task<jobResultType()>>(
        std::bind(std::forward<Func>(func), std::forward<Args>(args)...)
    );
    auto job = std::unique_ptr<Job>(new Job([task]() { (*task)(); }));
    std::future<jobResultType> result = task->get_future();
    m_jobs.enqueue(std::move(job));
    std::cout << " - enqueued";
    m_condition.notify_all();
    std::cout << " - ok!" << std::endl;
    return result;
}
In one of the loops the last output is
//...
- enqueued - ok!
- enqueued - ok!
- enqueued
<blocked again>
Edit 5: With the latest changes, this does not happen with the MSVC compiler.
The Gist for my implementation is here: https://gist.github.com/GuiAmPm/4be7716b7f1ea62819e61ef4ad3beb02
Here's also the original article on which I based my implementation:
https://roar11.com/2016/01/a-platform-independent-thread-pool-using-c14/
Any help will be appreciated.
tl;dr: use a std::lock_guard of m_mutex around m_busy-- to avoid unexpected wait condition blocking.
Analysis
First of all, please note that the problem can occur with one thread in the pool and just a few jobs. This means that there is a problem between the master thread that submits the jobs and the one that executes them.
Using GDB to analyze the state of the program when the wait condition gets stuck, one can see that there are no jobs, m_busy is set to 0, and both threads are waiting for notifications.
This means that there is a concurrency issue on the wait condition between the master and the only worker, on the last job to execute.
By adding a global atomic clock to the code, one can see that (in almost all cases) the worker finishes all the jobs before the master can start waiting for the jobs to be completed and the workers to be done.
Here is one practical scenario retrieved (bullets happen sequentially):
- the master starts the wait call while there are still jobs remaining
- the worker performs m_busy++, dequeues the last job and executes it (m_busy is now set to 1 and the job queue is empty)
- the master evaluates the predicate of the wait call
- the master calls ThreadPool::empty and the result is false due to m_busy being set to 1
- the worker performs m_busy-- (m_busy is now set to 0)
- from that moment, the master could resume waiting on the condition (but is suspected not to do so yet)
- the worker notifies the condition
- the master is suspected to resume waiting on the condition only now, and thus to miss this last notification (as no further notifications will happen)
- at this point, the master is no longer executing instructions and will wait forever
- the worker waits on the condition and will wait forever too
The fact that the master is not impacted by the notification is very strange.
It seems to be related to memory fencing issues. A more detailed explanation can be found here. To quote the article:
Even if you make dataReady an atomic, it must be modified under the mutex; if not the modification to the waiting thread may be published, but not correctly synchronized.
So a solution is to replace the m_busy-- instruction with the following lines:
{
    std::lock_guard<std::mutex> lck { this->m_mutex };
    m_busy--;
}
This avoids the previous scenario: on one hand, m_mutex is held during the predicate check of the wait call, preventing m_busy from being modified at that specific moment; on the other hand, it forces the data to be properly synchronized.
It would theoretically be safer to also include the m_jobs.dequeue call in the locked section, but that would strongly reduce the degree of parallelism of the workers. In practice, the needed synchronization happens when the lock is released in the worker threads.
Please note that one general workaround to avoid such problems is to add a timeout to waiting calls, using the wait_for function in a loop that enforces the predicate condition. However, this solution comes at the price of a higher latency of the waiting calls and can thus significantly slow down the execution.
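For example, such a timeout-based wait for the waitEmpty predicate might look like this sketch (the 100 ms period is an arbitrary assumption; m_mutex, m_condition and empty() are the members from the original pool):

// Sketch: periodically re-test the predicate even if a notification was missed.
std::unique_lock<std::mutex> lock { this->m_mutex };
while (!this->m_condition.wait_for(lock, std::chrono::milliseconds(100),
                                   [this] { return this->empty(); })) {
    // timed out with the predicate still false: loop and test again
}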

How to properly abort a thread with the use of a condition_variable?

I have a class with some methods that should be thread safe, i.e. multiple threads should be able to operate on the object's state. One of the methods spawns a new thread that updates a field every 10 seconds. Because this thread can be long-running, I'd like to be able to abort it properly.
I have implemented a solution that uses std::condition_variable::wait_for() to wait for an abort signal inside the thread, but I'm not particularly sure whether my solution is optimal or even correct.
class A
{
    unsigned int value; // A value that will be updated every 10 s in another thread
    bool is_being_updated; // true while value is being updated in another thread
    std::thread t;
    bool aborted; // true = thread should abort
    mutable std::mutex m1;
    mutable std::mutex m2;
    std::condition_variable cv;
public:
    A();
    ~A();
    void begin_update(); // Creates a thread that periodically updates value
    void abort(); // Aborts the updating thread
    unsigned int get_value() const;
    void set_value(unsigned int);
};
This is how I implemented the methods:
A::A() : value(0), is_being_updated(false), aborted(false) { }

A::~A()
{
    // Not sure if this is thread safe?
    if (t.joinable()) t.join();
}

// Updates this->value every 10 seconds
void A::begin_update()
{
    std::lock_guard<std::mutex> lck(m1);
    if (is_being_updated) return; // Don't allow begin_update() while updating
    is_being_updated = true;
    if (aborted) aborted = false;

    // Create a thread that will update value periodically
    t = std::thread([this] {
        std::unique_lock<std::mutex> update_lock(m2);
        for (int i = 0; i < 10; i++)
        {
            cv.wait_for(update_lock, std::chrono::seconds(10), [this] { return aborted; });
            if (!aborted)
            {
                std::lock_guard<std::mutex> lck(m1);
                this->value++; // Update value
            }
            else
            {
                break; // Break on thread abort
            }
        }
        // Locking here would cause indefinite blocking ...
        // std::lock_guard<std::mutex> lck(m1);
        if (is_being_updated) is_being_updated = false;
    });
}

// Aborts the thread created in begin_update()
void A::abort()
{
    std::lock_guard<std::mutex> lck(m1);
    is_being_updated = false;
    this->value = 0; // Reset value
    {
        std::lock_guard<std::mutex> update_lock(m2);
        aborted = true;
    }
    cv.notify_one(); // Signal abort ...
    if (t.joinable()) t.join(); // Wait for the thread to finish
}

unsigned int A::get_value() const
{
    std::lock_guard<std::mutex> lck(m1);
    return this->value;
}

void A::set_value(unsigned int v)
{
    std::lock_guard<std::mutex> lck(m1);
    if (is_being_updated) return; // Cannot set value while thread is updating it
    this->value = v;
}
This seems to work fine, but I'm uncertain about it being correct. My concerns are the following:
Is my destructor safe? Suppose the updating thread has not been aborted and is still doing its job while the A object goes out of scope. A switch to a different thread now happens while the dtor's t.join() still hasn't finished, and the switched-to thread calls begin_update() on the same object. Is something like this possible? Should I introduce e.g. an extra is_being_destructed flag that I set to true inside the destructor and that all other methods check is false before they proceed? Or can no such undesired scenario happen?
Inside the thread, at the end, I'm setting is_being_updated = false without a lock, despite the variable being shared state. This can mean that other threads won't see its correct value, e.g. even after the thread is done, some other thread may still see is_being_updated == true instead of false. I cannot lock the mutex, however, because abort() may have already locked it, meaning that the call would block indefinitely. I'm not sure of the best way to solve this, other than perhaps making is_being_updated atomic. Would that work?
I've read about spurious wakeups, but am not sure if the code should do anything extra to handle them. As far as I understand, the answer is no, and no problems are to be expected in this regard.
Is my thinking here correct? Did I miss anything else that I should have in mind?
This stuff is always hard to check, so don't be afraid to question me if you think I misunderstand.
Short answer, no, it's not thread safe.
As long as the thread that owns A is the one calling abort (and it doesn't forget to call abort), you won't experience a race condition, as A::abort() will block until the thread is joined. Under these assumptions, the join in your destructor is pointless.
If abort is called by a thread that doesn't own A, then it's definitely possible for the thread to be join-ed twice, which is bad. Using .joinable() to decide whether to join a thread is a big red flag.
Please remove one of your if(t.joinable()) t.join(); (I'm leaning towards the one in the destructor) and change the other to just t.join().
As you said, you can make is_being_updated atomic. That's a great solution.
Here's another solution. You can signal without holding the lock. (It's actually better form in general, as it helps reduce lock contention, since the first thing the woken thread needs to do is reacquire its mutex.)
void A::abort()
{
    {
        std::lock(m1, m2); // deadlock-proof
        std::lock_guard<std::mutex> lck(m1, std::adopt_lock);
        std::lock_guard<std::mutex> update_lock(m2, std::adopt_lock);
        is_being_updated = false;
        this->value = 0; // Reset value
        aborted = true;
    }
    cv.notify_one(); // Signal abort ...
    t.join(); // Wait for the thread to finish
}
You're good. The way you wrote the wait, you will only come back if aborted == true or 10 seconds have elapsed.
1) I think this problem is inherent in your design; as it stands, a bool flag will not fix it. Maybe A shouldn't go out of scope until all the threads stop using it, in which case it should reside in a managed pointer like shared_ptr.
2) You should be using atomics for your bools and also for value; this would avoid having to use the unique_lock for increasing the value and for returning it.
3) As I said in the comments, the lambda in the cv wait handles the spurious wakeups.
The biggest bit of code smell is using a full thread to update a variable every 10 seconds: a heavy-weight OS thread, with megabytes to gigabytes of address space, to do one task every 10 seconds.
What's more, it is updating a value without anyone being able to see the change.
You already have a get_value wrapping accessor. Simply store the start point when you want to start counting; when get_value is called, compute the time since the start point, divide by 10 seconds, and use that to calculate the returned value.
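A minimal sketch of that idea, assuming value should increase by one every 10 seconds from a stored starting point (abort, set_value and locking are omitted for brevity):

#include <chrono>

class A {
    std::chrono::steady_clock::time_point start_;
public:
    // Record the starting point instead of launching a thread.
    void begin_update() { start_ = std::chrono::steady_clock::now(); }

    // Derive the current value from the elapsed time on demand.
    unsigned int get_value() const {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        return std::chrono::duration_cast<std::chrono::seconds>(elapsed).count() / 10;
    }
};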
In a real application, you'd have a timer system that lets you trigger events (either in a thread pool or in a message pump) every period of time. You'd use that instead of a dedicated thread to do something like this, and you'd make the value observable (allow people to subscribe to changes in it). Then your abort would consist of deregistering the timer instead of stopping a thread.
Your system is a horrible mixture of the two, using threads for no good reason.

Thread pool stuck on wait condition

I'm encountering a hang in my C++ program using this thread pool class:
class ThreadPool {
    unsigned threadCount;
    std::vector<std::thread> threads;
    std::list<std::function<void(void)> > queue;
    std::atomic_int jobs_left;
    std::atomic_bool bailout;
    std::atomic_bool finished;
    std::condition_variable job_available_var;
    std::condition_variable wait_var;
    std::mutex wait_mutex;
    std::mutex queue_mutex;
    std::mutex mtx;

    void Task() {
        while (!bailout) {
            next_job()();
            --jobs_left;
            wait_var.notify_one();
        }
    }

    std::function<void(void)> next_job() {
        std::function<void(void)> res;
        std::unique_lock<std::mutex> job_lock(queue_mutex);
        // Wait for a job if we don't have any.
        job_available_var.wait(job_lock, [this]() -> bool { return queue.size() || bailout; });
        // Get job from the queue
        mtx.lock();
        if (!bailout) {
            res = queue.front();
            queue.pop_front();
        } else {
            // If we're bailing out, 'inject' a job into the queue to keep jobs_left accurate.
            res = [] {};
            ++jobs_left;
        }
        mtx.unlock();
        return res;
    }

public:
    ThreadPool(int c)
        : threadCount(c)
        , threads(threadCount)
        , jobs_left(0)
        , bailout(false)
        , finished(false)
    {
        for (unsigned i = 0; i < threadCount; ++i)
            threads[i] = std::move(std::thread([this, i] { this->Task(); }));
    }

    ~ThreadPool() {
        JoinAll();
    }

    void AddJob(std::function<void(void)> job) {
        std::lock_guard<std::mutex> lock(queue_mutex);
        queue.emplace_back(job);
        ++jobs_left;
        job_available_var.notify_one();
    }

    void JoinAll(bool WaitForAll = true) {
        if (!finished) {
            if (WaitForAll) {
                WaitAll();
            }
            // note that we're done, and wake up any thread that's
            // waiting for a new job
            bailout = true;
            job_available_var.notify_all();
            for (auto& x : threads)
                if (x.joinable())
                    x.join();
            finished = true;
        }
    }

    void WaitAll() {
        std::unique_lock<std::mutex> lk(wait_mutex);
        if (jobs_left > 0) {
            wait_var.wait(lk, [this] { return this->jobs_left == 0; });
        }
        lk.unlock();
    }
};
gdb says (when stopping the blocked execution) that the hang was in (std::unique_lock&, ThreadPool::WaitAll()::{lambda()#1})+58>
I'm using g++ v5.3.0 with support for c++14 (-std=c++1y)
How can I avoid this problem?
Update
I've edited (rewritten) the class: https://github.com/edoz90/threadpool/blob/master/ThreadPool.h
The issue here is a race condition on your job count. You're using one mutex to protect the queue and another to protect the count, which is semantically equivalent to the queue size. Clearly the second mutex is redundant (and improperly used), as is the jobs_left variable itself.
Every method that deals with the queue has to gain exclusive access to it (even JoinAll, to read its size), so you should use the same queue_mutex in the three bits of code that tamper with it (JoinAll, AddJob and next_job).
Btw, splitting the code at next_job() is pretty awkward IMO. You would avoid calling a dummy function if you handled the worker thread body in a single function.
EDIT:
As other comments have already stated, you would probably be better off getting your eyes off the code and reconsidering the problem globally for a while.
The only thing you need to protect here is the job queue, so you need only one mutex.
Then there is the problem of waking up the various actors, which requires a condition variable, since C++ basically does not give you any other usable synchronization object.
Here again you don't need more than one variable. Terminating the thread pool is equivalent to dequeueing the jobs without executing them, which can be done any which way, be it in the worker threads themselves (skipping execution if the termination flag is set) or in the JoinAll function (clearing the queue after gaining exclusive access).
Last but not least, you might want to invalidate AddJob once someone has decided to close the pool, or else you could get stuck in the destructor while someone keeps feeding in new jobs.
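As a sketch of the single-mutex idea (busy here is a hypothetical counter of in-flight jobs, incremented and decremented by the workers under the same queue_mutex; the other names are from the original class):

// Sketch only: queue_mutex protects both the queue and the busy counter.
void AddJob(std::function<void(void)> job) {
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        queue.emplace_back(std::move(job));
    }
    job_available_var.notify_one();
}

void WaitAll() {
    std::unique_lock<std::mutex> lock(queue_mutex);
    // woken by workers, under the same queue_mutex, whenever a job completes
    wait_var.wait(lock, [this] { return queue.empty() && busy == 0; });
}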
I think you need to keep it simple.
You seem to be using one mutex too many. There's queue_mutex, and you use it when you add and process jobs.
Now, what's the need for a separate mutex when you are waiting on reading the queue?
Why can't you just use a condition variable with the same queue_mutex to read the queue in your WaitAll() method?
Update
I would also recommend using a lock_guard instead of the unique_lock in your WaitAll. There really isn't a need to keep the queue_mutex locked beyond WaitAll under exceptional conditions; if you exit WaitAll exceptionally, it should be released regardless.
Update2
Ignore my update above. Since you are using a condition variable, you can't use a lock_guard in WaitAll. But if you are using a unique_lock, always go with the try_to_lock version, especially if you have more than a couple of control paths.

C++11 lockfree single producer single consumer: how to avoid busy wait

I'm trying to implement a class that uses two threads: one for the producer and one for the consumer. The current implementation does not use locks:
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <chrono>   // added: used by main below (missing in the original snippet)
#include <iostream> // added: needed for std::cout (missing in the original snippet)
#include <thread>

using Queue =
    boost::lockfree::spsc_queue<
        int,
        boost::lockfree::capacity<1024>>;

class Worker
{
public:
    Worker() : working_(false), done_(false) {}

    ~Worker() {
        done_ = true; // exit even if the work has not been completed
        worker_.join();
    }

    void enqueue(int value) {
        queue_.push(value);
        if (!working_) {
            working_ = true;
            worker_ = std::thread([this] { work(); });
        }
    }

    void work() {
        int value;
        while (!done_ && queue_.pop(value)) {
            std::cout << value << std::endl;
        }
        working_ = false;
    }

private:
    std::atomic<bool> working_;
    std::atomic<bool> done_;
    Queue queue_;
    std::thread worker_;
};
The application needs to enqueue work items for a certain amount of time and then sleep waiting for an event. This is a minimal main that simulates the behavior:
int main()
{
    Worker w;
    for (int i = 0; i < 1000; ++i)
        w.enqueue(i);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    for (int i = 0; i < 1000; ++i)
        w.enqueue(i);
    std::this_thread::sleep_for(std::chrono::seconds(1));
}
I'm pretty sure that my implementation is bugged: what if the worker thread completes and before executing working_ = false, another enqueue comes? Is it possible to make my code thread safe without using locks?
The solution requires:
- a fast enqueue
- the destructor has to quit even if the queue is not empty
- no busy wait, because there are long periods of time in which the worker thread is idle
- no locks, if possible
Edit
I did another implementation of the Worker class, based on your suggestions. Here is my second attempt:
class Worker
{
public:
    Worker()
        : working_(ATOMIC_FLAG_INIT), done_(false) { }

    ~Worker() {
        // exit even if the work has not been completed
        done_ = true;
        if (worker_.joinable())
            worker_.join();
    }

    bool enqueue(int value) {
        bool enqueued = queue_.push(value);
        if (!working_.test_and_set()) {
            if (worker_.joinable())
                worker_.join();
            worker_ = std::thread([this] { work(); });
        }
        return enqueued;
    }

    void work() {
        int value;
        while (!done_ && queue_.pop(value)) {
            std::cout << value << std::endl;
        }
        working_.clear();
        while (!done_ && queue_.pop(value)) {
            std::cout << value << std::endl;
        }
    }

private:
    std::atomic_flag working_;
    std::atomic<bool> done_;
    Queue queue_;
    std::thread worker_;
};
I introduced the worker_.join() inside the enqueue method. This can impact performance, but only in very rare cases (when the queue becomes empty and, before the thread exits, another enqueue comes). The working_ variable is now an atomic_flag that is set in enqueue and cleared in work. The additional while loop after working_.clear() is needed because if another value is pushed after the first loop exits but before the clear, that value would otherwise not be processed.
Is this implementation correct?
I did some tests and the implementation seems to work.
OT: Is it better to put this as an edit, or an answer?
what if the worker thread completes and before executing working_ = false, another enqueue comes?
Then the value will be pushed to the queue but will not be processed until another value is enqueued after the flag is set. You (or your users) may decide whether that is acceptable. This could be avoided using locks, but they're against your requirements.
The code may fail if the running thread is about to finish and sets working_ = false, but hasn't stopped running before the next value is enqueued. In that case your code will call operator= on the running thread, which results in a call to std::terminate according to the linked documentation.
Adding worker_.join() before assigning the worker to a new thread should prevent that.
Another problem is that queue_.push may fail if the queue is full, because it has a fixed size. Currently you just ignore that case and the value is not added to the full queue. If you wait for the queue to have space, you don't get a fast enqueue (in the edge case). You could take the bool returned by push (which tells whether it was successful) and return it from enqueue; that way the caller may decide whether to wait or discard the value.
Or use a non-fixed-size queue. Boost has this to say about that choice:
Can be used to completely disable dynamic memory allocations during push in order to ensure lockfree behavior. If the data structure is configured as fixed-sized, the internal nodes are stored inside an array and they are addressed by array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way to achieve lock-freedom.
Your worker thread needs more than 2 states:
- Not running
- Doing tasks
- Idle shutdown
- Shutdown
If you force a shutdown, it skips idle shutdown. If you run out of tasks, it transitions to idle shutdown. In idle shutdown, it empties the task queue, then goes into shutdown.
Shutdown is set, then you walk off the end of your worker task.
The producer first puts things on the queue. Then it checks the worker state. If Shutdown or Idle shutdown, it first joins the thread (transitioning it to not running), then launches a new worker. If not running, it just launches a new worker.
If the producer wants to launch a new worker, it first makes sure that we are in the not running state (otherwise, it's a logic error). It then transitions to the Doing tasks state and launches the worker thread.
If the producer wants to shut down the helper task, it sets the done flag. It then checks the worker state; if it is anything besides not running, it joins it.
This can result in a worker thread that is launched for no good reason.
There are a few cases where the above can block, but there were a few before as well.
Then, we write a formal or semi-formal proof that the above cannot lose messages, because when writing lock-free code you aren't done until you have a proof.
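A sketch of those states as an enum (the names are illustrative, not from any library; the transitions would be driven by the producer as described above):

// Sketch: the four worker lifecycle states described above.
enum class WorkerState {
    NotRunning,   // no worker thread exists
    DoingTasks,   // worker is consuming the queue
    IdleShutdown, // queue ran dry; worker drains stragglers, then shuts down
    Shutdown      // worker finished; must be joined before relaunching
};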
This is my solution to the question. I don't much like answering myself, but I think showing actual code may help others.
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <iostream> // added: needed for std::cout (missing in the original snippet)
#include <thread>
// I used this semaphore class: https://gist.github.com/yohhoy/2156481
#include "binsem.hpp"

using Queue =
    boost::lockfree::spsc_queue<
        int,
        boost::lockfree::capacity<1024>>;

class Worker
{
public:
    // the worker thread starts in the constructor
    Worker()
        : working_(ATOMIC_FLAG_INIT), done_(false), semaphore_(0)
        , worker_([this] { work(); })
    { }

    ~Worker() {
        // exit even if the work has not been completed
        done_ = true;
        semaphore_.signal();
        worker_.join();
    }

    bool enqueue(int value) {
        bool enqueued = queue_.push(value);
        if (!working_.test_and_set())
            // signal the worker thread to wake up
            semaphore_.signal();
        return enqueued;
    }

    void work() {
        int value;
        // the worker thread continues to live
        while (!done_) {
            // wait for the start signal, sleeping
            semaphore_.wait();
            while (!done_ && queue_.pop(value)) {
                // perform actual work
                std::cout << value << std::endl;
            }
            working_.clear();
            while (!done_ && queue_.pop(value)) {
                // perform actual work
                std::cout << value << std::endl;
            }
        }
    }

private:
    std::atomic_flag working_;
    std::atomic<bool> done_;
    binsem semaphore_;
    Queue queue_;
    std::thread worker_;
};
I tried the suggestion of @Cameron: not shutting down the thread, and adding a semaphore. The semaphore is actually used only for the first enqueue and the last work. This is not lock-free, but only in those two cases.
I did a performance comparison between my previous version (see my edited question) and this one. There are no significant differences when there are not many starts and stops. However, enqueue is 10 times faster when it has to signal the worker thread instead of starting a new thread. This is a rare case, so it is not very important, but anyway it is an improvement.
This implementation satisfies:
- lock-free in the common case (when enqueue and work are busy);
- no busy wait when there are no enqueues for a long time;
- the destructor exits as soon as possible;
- correctness?? :)
Very partial answer: I think all those atomics, semaphores and states are a back-communication channel from "the thread" to "the Worker". Why not use another queue for that? At the very least, thinking about it will help you reason around the problem.

Parallel writer and reader of std::vector

I have a class that is used by 2 threads at the same time: one thread adds results (one by one) to the results of a task, the second thread works on those results that are already there.
// all members are copy-able
struct task {
    command cmd;
    vector<result> results;
};

class generator {
public:
    generator(executor* e); // store the ptr
    void run();
    ...
};

class executor {
public:
    void run();
    void add_result(int command_id, result r);
    task& find_task(int command_id);
    ...
private:
    vector<task> tasks_;
    condition_variable_any update_condition_;
};
Launch
// In main, we have instances of generator and executor,
// we launch 2 threads and wait for them.
std::thread gen_th( std::bind( &generator::run, gen_instance_) );
std::thread exe_th( std::bind( &executor::run, exe_instance_) );
Generator Thread
void generator::run() {
    while (is_running) {
        sleep_for_random_seconds();
        executor_->add_result( SOME_ID, new_result() );
    }
}
Executor Thread
void executor::add_result( int command_id, result r ) {
    std::unique_lock<std::recursive_mutex> l(mutex_);
    task& t = this->find_task(command_id);
    t.results.push_back(r);
    update_condition_.notify_all();
}

void executor::run() {
    while (is_running) {
        update_condition_.wait(...);
        task& t = this->find_task(SOME_ID);
        for (result r : t.results) {
            // no live updates are visible here
        }
    }
}
The generator thread adds a result every few seconds.
The executor thread is an executor itself. It is run via the run method, which waits for an update; when that happens, it works on the results.
A few things to take notice of:
- the vector of tasks may be big; the results are never disposed of;
- the for-each loop in executor fetches the task it's working on, then iterates over the results, checks which of them are new, and processes them. Once processed, they are marked and won't be processed again. This processing may take some time.
The problem occurs when the executor thread doesn't finish the for loop before another result is added: the new result object is not visible in the for loop. Since the executor thread is working, it doesn't notice the condition update and doesn't refresh the vector. When it finishes (having worked on an already-stale view of tasks_), it blocks again on update_condition_, which was just triggered.
I need to make the code aware that it should run the loop again after finishing, or make the changes to a task visible inside the for-each loop. What is the best solution to this problem?
You just need to check whether your vector is empty or not before blocking on the CV. Something like that:
while (running) {
    std::unique_lock<std::mutex> lock(mutex);
    while (tasks_.empty()) // <-- this is important
        update_condition_.wait(lock);
    // handle tasks_
}
If your architecture allows it (ie. if you don't need to hold the lock while handling the tasks), you may also want to unlock the mutex ASAP, before handling the tasks so that the producer can push more tasks without blocking. Maybe swapping your tasks_ vector with a temporary one, then unlock the mutex, and only then start handling the tasks in the temporary vector:
while (running) {
    std::unique_lock<std::mutex> lock(mutex);
    while (tasks_.empty())
        update_condition_.wait(lock);
    std::vector<task> localTasks;
    localTasks.swap(tasks_);
    lock.unlock(); // <-- release the lock early
    // handle localTasks
}
Edit: ah, now I realize this doesn't really fit your situation, because your messages are not directly in tasks_ but in the results of each task. You get my general idea though, but using it will require structural changes in your code (e.g. flatten your tasks/results and always have a cmd associated with a single result).
I act in the following way in the same situation:
std::vector< ... > temp;
mutex.lock();
temp.swap( results );
mutex.unlock();
for (result r : temp) {
    ...
}
A little overhead takes place, but in general the whole code is more readable, and if the amount of computation is big, the time spent copying goes to zero. (Sorry for my English - it's not my native language.)