Thread Pool blocks main threads after some loops

Thread Pool blocks main threads after some loops - c++

I'm trying to learn how threading works on C++ and I found an implementation which I used as a guide
to make my own implementation, however after a loop or a couple it blocks.
I have a thread-safe queue in which I retrieve the jobs that are assigned to the thread pool.
Each thread runs this function:
// Declarations
std::vector<std::thread> m_threads;
JobQueue m_jobs; // A queue with locks
std::mutex m_mutex;
std::condition_variable m_condition;
std::atomic_bool m_active;
std::atomic_bool m_started;
std::atomic_int m_busy;
///...
[this, threadIndex] {
int numThread = threadIndex;
while(this->m_active) {
std::unique_ptr<Job> currJob;
bool dequeued = false;
{
std::unique_lock<std::mutex> lock { this->m_mutex };
this->m_condition.wait(lock, [this, numThread]() {
return (this->m_started && !this->m_jobs.empty()) || !this->m_active;
});
if (this->m_active) {
m_busy++;
dequeued = this->m_jobs.dequeue(currJob);
}
}
if (dequeued) {
currJob->execute();
{
std::lock_guard<std::mutex> lock { this->m_mutex };
m_busy--;
}
m_condition.notify_all();
} else {
{
std::lock_guard<std::mutex> lock { this->m_mutex };
m_busy--;
}
}
}
}
and the loop is basically:
while(1) {
int numJobs = rand() % 10000;
std::cout << "Will do " << numJobs << " jobs." << std::endl;
while(numJobs--) {
pool.assign([](){
// some heavy calculation
});
}
pool.waitEmpty();
std::cout << "Done!" << std::endl; // chrono removed for readability
}
While the waitEmpty method is described as:
std::unique_lock<std::mutex> lock { this->m_mutex };
this->m_condition.wait(lock, [this] {
return this->empty();
});
And is in this wait method that the code usually hangs as the test inside is never called again.
I've debugged it, changed the notification_one's and all's from place to place, but for some reason after some loops it always blocks.
Usually, but not always, it locks on condition_variable.wait() method that locks the current thread until there are no other thread working and the queue is empty, but I also saw it happen when I call condition_variable.notify_all().
Some debugging helped me notice that while I call notify_all() on the slave thread, the wait() in the main thread is not tested again.
The expected behavior is that it does not block when it loops.
I'm using G++ 8.1.0 on Windows.
and the output is:
Will do 41 jobs.
Done! Took 0ms!
Will do 8467 jobs.
<main thread blocked>
Edit: I fixed the issue pointed by paddy's comment: now m_busy-- also happens when a job is not dequeued.
Edit 2: Running this on Linux does not locks the main thread and runs as expected. (g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0)
Edit 3: As mentioned in the comments, corrected deadlock to block, as it only involves one lock
Edit 4: As commented by Jérôme Richard I was able to improve it by creating a lock_guard around the m_busy--; but now the code blocks at the notify_all() that is called inside the assign method. Here is the assign method for reference:
template<class Func, class... Args>
auto assign(Func&& func, Args&&... args) -> std::future<typename std::result_of<Func(Args...)>::type> {
using jobResultType = typename std::result_of<Func(Args...)>::type;
auto task = std::make_shared<std::packaged_task<jobResultType()>>(
std::bind(std::forward<Func>(func), std::forward<Args>(args)...)
);
auto job = std::unique_ptr<Job>(new Job([task](){ (*task)(); }));
std::future<jobResultType> result = task->get_future();
m_jobs.enqueue(std::move(job));
std::cout << " - enqueued";
m_condition.notify_all();
std::cout << " - ok!" << std::endl;
return result;
}
In one of the loops the last output is
//...
- enqueued - ok!
- enqueued - ok!
- enqueued
<blocked again>
Edit 5: With the latest changes, this does not happens on msbuild compiler.
The Gist for my implementation is here: https://gist.github.com/GuiAmPm/4be7716b7f1ea62819e61ef4ad3beb02
Here's also the original Article which I based my implementation:
https://roar11.com/2016/01/a-platform-independent-thread-pool-using-c14/
Any help will be appreciated.

tl;dr: use a std::lock_guard of m_mutex around m_busy-- to avoid unexpected wait condition blocking.
Analysis
First of all, please note that the problem can occur with one thread in the pool and just few jobs. This means that there is a problem between the master thread that submit the jobs and the one that execute them.
Using GDB to analyze further the state of the program when the wait condition get stuck, one can see that there is no jobs, m_busy is set to 0 and both threads are waiting for notifications.
This means that there is a concurrency issue on the wait condition between the master and the only worker on the last job to execute.
By adding a global atomic clock in your code, one can see that (in almost all case) the worker finishes all the jobs before the master can wait for the jobs to be completed and workers done.
Here is one practical scenario retrieved (bullets are done sequentially):
the master start the wait call and there is jobs remaining
the worker perform m_busy++, dequeue the last job and execute it (m_busy is now set to 1 and the job queue is empty)
the master compute the predicate of the wait call
the master call ThreadPool::empty and the result is false due to busy set to 1
the worker perform m_busy-- (m_busy is now set to 0)
from that moment, the master could wait for the condition back (but is suspected to not do it)
the worker notify the condition
the master is suspected to wait for the condition back only now and to not be impacted by this last notification (as no waits will happen next)
At this point, the master is no longer executing instructions and will wait forever
the worker wait for the condition and will wait forever too
The fact that the master is not impacted by the notification is very strange.
It seems to be related to memory fencing issues. A more detailed explanation can be found here. To quote the article:
Even if you make dataReady an atomic, it must be modified under the mutex; if not the modification to the waiting thread may be published, but not correctly synchronized.
So a solution is to replace the m_busy-- instruction by the following lines:
{
std::lock_guard<std::mutex> lck {this->m_mutex};
m_busy--;
}
It avoid the previous scenario. Indeed, on one hand m_mutex is acquired in during the predicate checking of the wait call preventing m_busy to be modified during this specific moment; on the other hand it enforce data to be properly synchronized.
It should be theoretically safer to also include the m_jobs.dequeue call into it but will strongly reduce the degree of parallelism of the workers. In practice, useful synchronizations are made when the lock is released in the worker threads.
Please note that one general workaround to avoid such problems could be to add a timeout to waiting calls using the wait_for function in a loop to enforce the predicate condition. However, this solution comes a the price of a higher latency of the waiting calls and can thus significantly slow the execution down.

Related

How to let a thread wait itself out without using Sleep()?

I want the while loop in the thread to run , wait a second, then run again, so on and so on., but this don't seem to work, how would I fix it?
main(){
bool flag = true;
pthread = CreateThread(NULL, 0, ThreadFun, this, 0, &ThreadIP);
}
ThreadFun(){
while(flag == true)
WaitForSingleObject(pthread,1000);
}

This is one way to do it, I prefer using condition variables over sleeps since they are more responsive and std::async over std::thread (mainly because std::async returns a future which can send information back the the starting thread. Even if that feature is not used in this example).
#include <iostream>
#include <chrono>
#include <future>
#include <condition_variable>
// A very useful primitive to communicate between threads is the condition_variable
// despite its name it isn't a variable perse. It is more of an interthread signal
// saying, hey wake up thread something may have changed that's interesting to you.
// They come with some conditions of their own
// - always use with a lock
// - never wait without a predicate
// (https://www.modernescpp.com/index.php/c-core-guidelines-be-aware-of-the-traps-of-condition-variables)
// - have some state to observe (in this case just a bool)
//
// Since these three things go together I usually pack them in a class
// in this case signal_t which will be used to let thread signal each other
class signal_t
{
public:
// wait for boolean to become true, or until a certain time period has passed
// then return the value of the boolean.
bool wait_for(const std::chrono::steady_clock::duration& duration)
{
std::unique_lock<std::mutex> lock{ m_mtx };
m_cv.wait_for(lock, duration, [&] { return m_signal; });
return m_signal;
}
// wiat until the boolean becomes true, wait infinitely long if needed
void wait()
{
std::unique_lock<std::mutex> lock{ m_mtx };
m_cv.wait(lock, [&] {return m_signal; });
}
// set the signal
void set()
{
std::unique_lock<std::mutex> lock{ m_mtx };
m_signal = true;
m_cv.notify_all();
}
private:
bool m_signal { false };
std::mutex m_mtx;
std::condition_variable m_cv;
};
int main()
{
// create two signals to let mainthread and loopthread communicate
signal_t started; // indicates that loop has really started
signal_t stop; // lets mainthread communicate a stop signal to the loop thread.
// in this example I use a lambda to implement the loop
auto future = std::async(std::launch::async, [&]
{
// signal this thread has been scheduled and has started.
started.set();
do
{
std::cout << ".";
// the stop_wait_for will either wait 500 ms and return false
// or stop immediately when stop signal is set and then return true
// the wait with condition variables is much more responsive
// then implementing a loop with sleep (which will only
// check stop condition every 500ms)
} while (!stop.wait_for(std::chrono::milliseconds(500)));
});
// wait for loop to have started
started.wait();
// give the thread some time to run
std::this_thread::sleep_for(std::chrono::seconds(3));
// then signal the loop to stop
stop.set();
// synchronize with thread stop
future.get();
return 0;
}

While the other answer is a possible way to do it, my answer will mostly answer from a different angle trying to see what could be wrong with your code...
Well, if you don't care to wait up to one second when flag is set to false and you want a delay of at least 1000 ms, then a loop with Sleep could work but you need
an atomic variable (for ex. std::atomic)
or function (for ex. InterlockedCompareExchange)
or a MemoryBarrier
or some other mean of synchronisation to check the flag.
Without proper synchronisation, there is no guarantee that the compiler would read the value from memory and not the cache or a register.
Also using Sleep or similar function from a UI thread would also be suspicious.
For a console application, you could wait some time in the main thread if the purpose of you application is really to works for a given duration. But usually, you probably want to wait until processing is completed. In most cases, you should usually wait that threads you have started have completed.
Another problem with Sleep function is that the thread always has to wake up every few seconds even if there is nothing to do. This can be bad if you want to optimize battery usage. However, on the other hand having a relatively long timeout on function that wait on some signal (handle) might make your code a bit more robust against missed wakeup if your code has some bugs in it.
You also need a delay in some cases where you don't really have anything to wait on but you need to pull some data at regular interval.
A large timeout could also be useful as a kind of watch dog timer. For example, if you expect to have something to do and receive nothing for an extended period, you could somehow report a warning so that user could check if something is not working properly.
I highly recommand you to read a book on multithreading like Concurrency in Action before writing multithread code code.
Without proper understanding of multithreading, it is almost 100% certain that anyone code is bugged. You need to properly understand the C++ memory model (https://en.cppreference.com/w/cpp/language/memory_model) to write correct code.
A thread waiting on itself make no sense. When you wait a thread, you are waiting that it has terminated and obviously if it has terminated, then it cannot be executing your code. You main thread should wait for the background thread to terminate.
I also usually recommand to use C++ threading function over the API as they:
Make your code portable to other system.
Are usually higher level construct (std::async, std::future, std::condition_variable...) than corresponding Win32 API code.

is there any way to wakeup multiple threads at the same time in c/c++

well, actually, I'm not asking the threads must "line up" to work, but I just want to notify multiple threads. so I'm not looking for barrier.
it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation(also the potential problem in multiple semaphore post operation). it's kind of like:
std::atomic_flag flag{ATOMIC_FLAG_INIT};
void example() {
if (!flag.test_and_set()) {
// this is the thread to do the job, and notify others
do_something();
notify_others(); // this is what I'm looking for
flag.clear();
} else {
// this is the waiting thread
wait_till_notification();
do_some_other_thing();
}
}
void runner() {
std::vector<std::threads>;
for (int i=0; i<10; ++i) {
threads.emplace_back([]() {
while(1) {
example();
}
});
}
// ...
}
so how can I do this in c/c++ or maybe posix API?
sorry, I didn't make this question clear enough, I'd add some more explaination.
it's not thunder heard problem I'm talking about, and yes, it's the re-acquire-lock that bothers me, and I tried shared_mutex, there's still some problem.
let me split the threads to 2 parts, 1 as leader thread, which do the writing job, the others as worker threads, which do the reading job.
but actually they're all equal in programme, the leader thread is the thread that 1st got access to the job( you can take it as the shared buffer is underflowed for this thread). once the job is done, the other workers just need to be notified that them have the access.
if the mutex is used here, any thread would block the others.
to give an example: the main thread's job do_something() here is a read, and it block the main thread, thus the whole system is blocked.
unfortunatly, shared_mutex won't solve this problem:
void example() {
if (!flag.test_and_set()) {
// leader thread:
lk.lock();
do_something();
lk.unlock();
flag.clear();
} else {
// worker thread
lk.shared_lock();
do_some_other_thing();
lk.shared_unlock();
}
}
// outer loop
void looper() {
std::vector<std::threads>;
for (int i=0; i<10; ++i) {
threads.emplace_back([]() {
while(1) {
example();
}
});
}
}
in this code, if the leader job was done, and not much to do between this unlock and next lock (remember they're in a loop), it may get the lock again, leave the worker jobs not working, which is why I call it starve earlier.
and to explain the blocking in do_something(), I don't want this part of job takes all my CPU time, even if the leader's job is not ready (no data arrive for read)
and std::call_once may still not be the answer to this. because, as you can see, the workers must wait till the leader's job finished.
to summarize, this is actually a one-producer-multi-consumer problem.
but I want the consumers can do the job when the product is ready for them. and any can be the producer or consumer. if any but the 1st find the product has run out, the thread should be the producer, thus others are automatically consumer.
but unfortunately, I'm not sure if this idea would work or not

it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation
In principle it's not waking up that is serialized, but re-acquiring the lock.
You can avoid that by using std::condition_variable_any with a std::shared_lock - so long as nobody ever gets an exclusive lock on the std::shared_mutex. Alternatively, you can provide your own Lockable type.
Note however that this won't magically allow you to concurrently run more threads than you have cores, or force the scheduler to start them all running in parallel. They'll just be marked as runnable and scheduled as normal - this only fixes the avoidable serialization in your own code.

It sounds like you are looking for call_once
#include <mutex>
void example()
{
static std::once_flag flag;
bool i_did_once = false;
std::call_once(flag, [&i_did_once]() mutable {
i_did_once = true;
do_something();
});
if(! i_did_once)
do_some_other_thing();
}
I don't see how your problem relates to starvation. Are you perhaps thinking about the thundering herd problem? This may arise if do_some_other_thing has a mutex but in that case you have to describe your problem in more detail.

Allow multiple mutex owners or specified number of concurrent code executions

There is a code I that I don't want to be executed by more than X threads (e.g. five) at a time. Every smaller number would be fine. I'm currently experimenting with condition variables, here is what I worked out so far:
void Manager::EnterQueue(Worker *w)
{
{
// Ensure stable work of std::queue
const std::lock_guard<std::mutex> lock(queueInsertMutex);
workerQueue.push(w);
}
std::unique_lock<std::mutex> unlock_object(queueMutex);
while (workerQueue.front() != w)
{
// stop all threads not being at front of queue
cv.wait(unlock_object);
}
workerQueue.pop();
{
// ensure that numOfAvailableObjects is checked by one thread at a time
const std::lock_guard<std::mutex> lock(counterMutex);
if (numOfAvailableObjects > 1)
{
// limit is not exceeded. Fire up another thread
numOfAvailableObjects--;
cv.notify_all();
}
}
std::this_thread::sleep_for(std::chrono::milliseconds(SOME_WORK_TIME_IN_MS));
{
const std::lock_guard<std::mutex> lock(counterMutex);
numOfAvailableObjects++;
}
cv.notify_all();
}
So the idea was:
make while loop that passes only Workers that are first in queue
if numOfAvailableObjects > 1 then pass next worker by calling cv.notify_all() which (in theory) would resume all cv.wait(unlock_object).
Of course it doesn't work as expected. sleep_for is executed currently by one thread at a time. Do you know any other alternatives that would work like cv.wait() that will stop threads until I tell them to resume?

I think you're looking for a semaphore. Look it up.

C++11 lockfree single producer single consumer: how to avoid busy wait

I'm trying to implement a class that uses two threads: one for the producer and one for the consumer. The current implementation does not use locks:
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>
using Queue =
boost::lockfree::spsc_queue<
int,
boost::lockfree::capacity<1024>>;
class Worker
{
public:
Worker() : working_(false), done_(false) {}
~Worker() {
done_ = true; // exit even if the work has not been completed
worker_.join();
}
void enqueue(int value) {
queue_.push(value);
if (!working_) {
working_ = true;
worker_ = std::thread([this]{ work(); });
}
}
void work() {
int value;
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
working_ = false;
}
private:
std::atomic<bool> working_;
std::atomic<bool> done_;
Queue queue_;
std::thread worker_;
};
The application needs to enqueue work items for a certain amount of time and then sleep waiting for an event. This is a minimal main that simulates the behavior:
int main()
{
Worker w;
for (int i = 0; i < 1000; ++i)
w.enqueue(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
for (int i = 0; i < 1000; ++i)
w.enqueue(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
}
I'm pretty sure that my implementation is bugged: what if the worker thread completes and before executing working_ = false, another enqueue comes? Is it possible to make my code thread safe without using locks?
The solution requires:
a fast enqueue
the destructor has to quit even if the queue is not empty
no busy wait, because there are long period of time in which the worker thread is idle
no locks if possible
Edit
I did another implementation of the Worker class, based on your suggestions. Here is my second attempt:
class Worker
{
public:
Worker()
: working_(ATOMIC_FLAG_INIT), done_(false) { }
~Worker() {
// exit even if the work has not been completed
done_ = true;
if (worker_.joinable())
worker_.join();
}
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set()) {
if (worker_.joinable())
worker_.join();
worker_ = std::thread([this]{ work(); });
}
return enqueued;
}
void work() {
int value;
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
working_.clear();
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
}
private:
std::atomic_flag working_;
std::atomic<bool> done_;
Queue queue_;
std::thread worker_;
};
I introduced the worker_.join() inside the enqueue method. This can impact the performances, but in very rare cases (when the queue gets empty and before the thread exits, another enqueue comes). The working_ variable is now an atomic_flag that is set in enqueue and cleared in work. The Additional while after working_.clear() is needed because if another value is pushed, before the clear, but after the while, the value is not processed.
Is this implementation correct?
I did some tests and the implementation seems to work.
OT: Is it better to put this as an edit, or an answer?

what if the worker thread completes and before executing working_ = false, another enqueue comes?
Then the value will be pushed to the queue but will not be processed until another value is enqueued after the flag is set. You (or your users) may decide whether that is acceptable. This can be avoided using locks, but they're against your requirements.
The code may fail if the running thread is about to finish and sets working_ = false; but hasn't stopped running before next value is enqueued. In that case your code will call operator= on the running thread which results in a call to std::terminate according to the linked documentation.
Adding worker_.join() before assigning the worker to a new thread should prevent that.
Another problem is that queue_.push may fail if the queue is full because it has a fixed size. Currently you just ignore the case and the value will not be added to the full queue. If you wait for queue to have space, you don't get fast enqueue (in the edge case). You could take the bool returned by push (which tells if it was successful) and return it from enqueue. That way the caller may decide whether it wants to wait or discard the value.
Or use non-fixed size queue. Boost has this to say about that choice:
Can be used to completely disable dynamic memory allocations during push in order to ensure lockfree behavior.
If the data structure is configured as fixed-sized, the internal nodes are stored inside an array and they are addressed
by array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
to achieve lock-freedom.

Your worker thread needs more than 2 states.
Not running
Doing tasks
Idle shutdown
Shutdown
If you force shut down, it skips idle shutdown. If you run out of tasks, it transitions to idle shutdown. In idle shutdown, it empties the task queue, then goes into shutting down.
Shutdown is set, then you walk off the end of your worker task.
The producer first puts things on the queue. Then it checks the worker state. If Shutdown or Idle shutdown, first join it (and transition it to not running) then launch a new worker. If not running, just launch a new worker.
If the producer wants to launch a new worker, it first makes sure that we are in the not running state (otherwise, logic error). We then transition to the Doing tasks state, and then we launch the worker thread.
If the producer wants to shut down the helper task, it sets the done flag. It then checks the worker state. If it is anything besides not running, it joins it.
This can result in a worker thread that is launched for no good reason.
There are a few cases where the above can block, but there where a few before as well.
Then, we write a formal or semi-formal proof that the above cannot lose messages, because when writing lock free code you aren't done until you have a proof.

This is my solution of the question. I don't like very much answering myself, but I think showing actual code may help others.
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>
// I used this semaphore class: https://gist.github.com/yohhoy/2156481
#include "binsem.hpp"
using Queue =
boost::lockfree::spsc_queue<
int,
boost::lockfree::capacity<1024>>;
class Worker
{
public:
// the worker thread starts in the constructor
Worker()
: working_(ATOMIC_FLAG_INIT), done_(false), semaphore_(0)
, worker_([this]{ work(); })
{ }
~Worker() {
// exit even if the work has not been completed
done_ = true;
semaphore_.signal();
worker_.join();
}
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set())
// signal to the worker thread to wake up
semaphore_.signal();
return enqueued;
}
void work() {
int value;
// the worker thread continue to live
while (!done_) {
// wait the start signal, sleeping
semaphore_.wait();
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
}
working_.clear();
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
}
}
}
private:
std::atomic_flag working_;
std::atomic<bool> done_;
binsem semaphore_;
Queue queue_;
std::thread worker_;
};
I tried the suggestion of #Cameron, to not shutdown the thread and adding a semaphore. This actually is used only in the first enqueue and in the last work. This is not lock-free, but only in these two cases.
I did some performance comparison, between my previous version (see my edited question), and this one. There are no significant differences, when there are not many start and stop. However, the enqueue is 10 times faster when it have to signal the worker thread, instead of starting a new thread. This is a rare case, so it is not very important, but anyway it is an improvement.
This implementation satisfies:
lock-free in the common case (when enqueue and work are busy);
no busy wait in case for long time there are not enqueue
the destructor exits as soon as possible
correctness?? :)

Very partial answer: I think all those atomics, semaphores and states are a back-communication channel, from "the thread" to "the Worker". Why not use another queue for that? At the very least, thinking about it will help you around the problem.

Parallel writer and reader of std::vector

I have a class that is used by 2 threads at the same time: one thread adds results (one by one) to the results of a task, the second thread works on those results that are already there.
// all members are copy-able
struct task {
command cmd;
vector<result> results;
};
class generator {
public:
generator(executor* e); // store the ptr
void run();
...
};
class executor {
public:
void run();
void add_result(int command_id, result r);
task& find_task(int command_id);
...
private:
vector<task> tasks_;
condition_variable_any update_condition_;
};
Launch
// In main, we have instances of generator and executor,
// we launch 2 threads and wait for them.
std::thread gen_th( std::bind( &generator::run, gen_instance_) );
std::thread exe_th( std::bind( &executor::run, exe_instance_) );
Generator Thread
void generator::run() {
while(is_running) {
sleep_for_random_seconds();
executor_->add_result( SOME_ID, new_result() );
}
}
Executor thread
void executor::add_result( int command_id, result r ) {
std::unique_lock<std::recursive_mutex> l(mutex_);
task& t = this->find_task(command_id);
t.results.push_back(r);
update_condition_.notify_all();
}
void executor::run() {
while(is_running) {
update_condition_.wait(...);
task& t = this->find_task(SOME_ID);
for(result r: t.results) {
// no live updates are visible here
}
}
}
Generator thread adds a result every few seconds.
Executor thread is an executor itself. It is run via the run method, which waits for an update and when that happens, it works on the results.
Few things to take notice of:
vector of tasks may be big; the results are never disposed;
the for-each loop in executor fetches the task it's working on, then iterates over results, checks which of them are new and processes them. Once processed, they are marked and won't be processed again. This processing may take some time.
The problem occurs when Executor Thread doesn't finish the for loop before another result is added - the result object is not visible in the for loop. Since Executor Thread is working, it doesn't notice the update condition update, doesn't refresh the vector etc. When it finishes (working on a alread-not-actual view of tasks_) it hangs again on the update_condition_.. which was just triggered.
I need to make the code aware, that it should run the loop again after finishing it or make changes to a task visible in the for-each loop. What is the best solution to this problem?

You just need to check whether your vector is empty or not before blocking on the CV. Something like that:
while (running) {
std::unique_lock<std::mutex> lock(mutex);
while (tasks_.empty()) // <-- this is important
update_condition_.wait(lock);
// handle tasks_
}
If your architecture allows it (ie. if you don't need to hold the lock while handling the tasks), you may also want to unlock the mutex ASAP, before handling the tasks so that the producer can push more tasks without blocking. Maybe swapping your tasks_ vector with a temporary one, then unlock the mutex, and only then start handling the tasks in the temporary vector:
while (running) {
std::unique_lock<std::mutex> lock(mutex);
while (tasks_.empty())
update_condition_.wait(lock);
std::vector<task> localTasks;
localTasks.swap(tasks_);
lock.unlock(); // <-- release the lock early
// handle localTasks
}
Edit: ah now I realize this doesn't really fit your situation, because your messages are not directly in tasks_ but in tasks_.results. You get my general idea though, but using it will require structure changes in your code (eg. flatten your tasks / results and always have a cmd associated with a single result).

I act in the following way in the same situation
std::vector< ... > temp;
mutex.lock();
temp.swap( results );
mutex.unlock();
for(result r: temp ){
...
}
A little overhead takes a place, but in general whole code is more readeble and if an amount of calculations is big, then the time for copying goes to zero (sorry for english - it's not native to me)))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js