C++ thread worker failure under high load

I have been working on an idea for a system where I can have many workers that are triggered on a regular basis by a central timer class. The part I'm concerned about here is a TriggeredWorker which, in a loop, uses the mutex and condition variable approach to wait to be told to do work. It has a method, trigger, that is called (by a different thread) to trigger work to be done. It is an abstract class that has to be subclassed for the actual work method to be implemented.
I have a test that shows that this mechanism works. However, as I increase the load by reducing the trigger interval, the test starts to fail. When I delay 20 microseconds between triggers, the test is 100% reliable. As I reduce down to 1 microsecond, I start to get failures in that the count of work performed reduces from 1000 (expected) to values like 986, 933, 999 etc..
My questions are: (1) what is it that is going wrong and how can I capture what is going wrong so I can report it or do something about it? And, (2) is there some better approach that I could use that would be better? I have to admit that my experience with c++ is limited to the last 3 months, although I have worked with other languages for several years.
Many thanks for reading...
Here are the key bits of code:
Triggered worker header file:
#ifndef TIMER_TRIGGERED_WORKER_H
#define TIMER_TRIGGERED_WORKER_H
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <plog/Log.h>
class TriggeredWorker {
private:
std::mutex mutex_;
std::condition_variable condVar_;
std::atomic<bool> running_{false};
std::atomic<bool> ready_{false};
void workLoop();
protected:
virtual void work() {};
public:
void start();
void stop();
void trigger();
};
#endif //TIMER_TRIGGERED_WORKER_H
Triggered worker implementation:
#include "TriggeredWorker.h"
void TriggeredWorker::workLoop() {
PLOGD << "workLoop started...";
while(true) {
std::unique_lock<std::mutex> lock(mutex_);
condVar_.wait(lock, [this]{
bool ready = this->ready_;
bool running = this->running_;
return ready | !running; });
this->ready_ = false;
if (!this->running_) {
break;
}
PLOGD << "Calling work()...";
work();
lock.unlock();
condVar_.notify_one();
}
PLOGD << "Worker thread completed.";
}
void TriggeredWorker::start() {
PLOGD << "Worker start...";
this->running_ = true;
auto thread = std::thread(&TriggeredWorker::workLoop, this);
thread.detach();
}
void TriggeredWorker::stop() {
PLOGD << "Worker stop.";
this->running_ = false;
}
void TriggeredWorker::trigger() {
PLOGD << "Trigger.";
std::unique_lock<std::mutex> lock(mutex_);
ready_ = true;
lock.unlock();
condVar_.notify_one();
}
and the test:
#include "catch.hpp"
#include "TriggeredWorker.h"
#include <atomic>
#include <chrono>
#include <thread>
TEST_CASE("Simple worker performs work when triggered") {
static std::atomic<int> twt_count{0};
class SimpleTriggeredWorker : public TriggeredWorker {
protected:
void work() override {
PLOGD << "Incrementing counter.";
twt_count.fetch_add(1);
}
};
SimpleTriggeredWorker worker;
worker.start();
for (int i = 0; i < 1000; i++) {
worker.trigger();
std::this_thread::sleep_for(std::chrono::microseconds(20));
}
std::this_thread::sleep_for(std::chrono::seconds(1));
CHECK(twt_count == 1000);
std::this_thread::sleep_for(std::chrono::seconds(1));
worker.stop();
}

What happens when worker.trigger() is called twice before workLoop acquires the lock? You lose one of those "triggers". A smaller time gap means a higher probability of test failure, because it raises the probability of several consecutive worker.trigger() calls before workLoop wakes up. Note that there's nothing that guarantees that workLoop will acquire the lock after worker.trigger() but before another worker.trigger() happens, even when those calls happen one after another (i.e. not in parallel). This is governed by the OS scheduler and we have no control over it.
Anyway, the core problem is that setting ready_ = true twice loses information, unlike incrementing an integer twice. So the simplest solution is to replace the bool with an int and do increments and decrements with == 0 checks. This is essentially a counting semaphore. A more advanced (potentially better, especially when you need to pass some data to the worker) approach is to use a (bounded?) thread-safe queue. That depends on what exactly you are trying to achieve.
BTW 1: all your reads and updates, except for the stop() function (and start(), but this isn't really relevant), happen under the lock. I suggest you fix stop() to run under the lock as well (it is rarely called anyway) and have it notify the condition variable, since otherwise a worker that is already waiting never notices that running_ changed. You can then turn the atomics into non-atomics. There's an unnecessary overhead of atomics at the moment.
BTW 2: I suggest not using thread.detach(). You should store the std::thread object on TriggeredWorker and add a destructor that does stop with join. The worker object and its thread are not independent beings, so without detach() you make your code safer (one should never die without the other).

Related

Two questions on std::condition_variables

I have been trying to figure out std::condition_variables and I am particularly confused by wait() and whether to use notify_all or notify_one.
First, I've written some code and attached it below. Here's a short explanation: Collection is a class that holds onto a bunch of Counter objects. These Counter objects have a Counter::increment() method, which needs to be called on all the objects, over and over again. To speed everything up, Collection also maintains a thread pool to distribute the work over, and sends out all the work with its Collection::increment_all() method.
These threads don't need to communicate with each other, and there are usually many more Counter objects than there are threads. It's fine if one thread processes more Counters than others, just as long as all the work gets done. Adding work to the queue is easy and only needs to be done in the "main" thread. As far as I can see, the only bad thing that can happen is if other methods (e.g. Collection::printCounts) are allowed to be called on the counters in the middle of the work being done.
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <condition_variable>
#include <queue>
class Counter{
private:
int m_count;
public:
Counter() : m_count(0) {}
void increment() {
m_count ++;
}
int getCount() const { return m_count; }
};
class Collection{
public:
Collection(unsigned num_threads, unsigned num_counters)
: m_shutdown(false)
{
// start workers
for(size_t i = 0; i < num_threads; ++i){
m_threads.push_back(std::thread(&Collection::work, this));
}
// instantiate counters
for(size_t j = 0; j < num_counters; ++j){
m_counters.emplace_back();
}
}
~Collection()
{
m_shutdown = true;
for(auto& t : m_threads){
if(t.joinable()){
t.join();
}
}
}
void printCounts() {
// wait for work to be done
std::unique_lock<std::mutex> lk(m_mtx);
m_work_complete.wait(lk); // q2: do I need a while loop?
// print all current counters
for(const auto& cntr : m_counters){
std::cout << cntr.getCount() << ", ";
}
std::cout << "\n";
}
void increment_all()
{
std::unique_lock<std::mutex> lock(m_mtx);
m_work_complete.wait(lock);
for(size_t i = 0; i < m_counters.size(); ++i){
m_which_counters_have_work.push(i);
}
}
private:
void work()
{
while(!m_shutdown){
bool action = false;
unsigned which_counter;
{
std::unique_lock<std::mutex> lock(m_mtx);
if(m_which_counters_have_work.size()){
which_counter = m_which_counters_have_work.front();
m_which_counters_have_work.pop();
action = true;
}else{
m_work_complete.notify_one(); // q1: notify_all
}
}
if(action){
m_counters[which_counter].increment();
}
}
}
std::vector<Counter> m_counters;
std::vector<std::thread> m_threads;
std::condition_variable m_work_complete;
std::mutex m_mtx;
std::queue<unsigned> m_which_counters_have_work;
bool m_shutdown;
};
int main() {
int num_threads = std::thread::hardware_concurrency()-1;
int num_counters = 10;
Collection myCollection(num_threads, num_counters);
myCollection.printCounts();
myCollection.increment_all();
myCollection.printCounts();
myCollection.increment_all();
myCollection.printCounts();
return 0;
}
I compile this on Ubuntu 18.04 with g++ -std=c++17 -pthread thread_pool.cpp -o tp && ./tp
I think the code accomplishes all of those objectives, but a few questions remain:
I am using m_work_complete.wait(lk) to make sure the work is finished before I start printing all the new counts. Why do I sometimes see this written inside a while loop, or with a second argument as a lambda predicate function? These docs mention spurious wake ups. If a spurious wake up occurs, does that mean printCounts could prematurely print? If so, I don't want that. I just want to ensure the work queue is empty before I start using the numbers that should be there.
I am using m_work_complete.notify_all instead of m_work_complete.notify_one. I've read this thread, and I don't think it matters--only the main thread is going to be blocked by this. Is it faster to use notify_one just so the other threads don't have to worry about it?
std::condition_variable is not really a condition variable, it's more of a synchronization tool for reaching a certain condition. What that condition is is up to the programmer, and it should still be checked after each condition_variable wake-up, since it can wake-up spuriously, or "too early", when the desired condition isn't yet reached.
On POSIX systems, condition_variable::wait() delegates to pthread_cond_wait, which is susceptible to spurious wake-up (see "Condition Wait Semantics" in the Rationale section). On Linux, pthread_cond_wait is in turn implemented via a futex, which is again susceptible to spurious wake-up.
So yes you still need a flag (protected by the same mutex) or some other way to check that the work is actually complete. A convenient way to do this is by wrapping the check in a predicate and passing it to the wait() function, which would loop for you until the predicate is satisfied.
notify_all unblocks all threads waiting on the condition variable; notify_one unblocks just one (or at least one, to be precise). If there are more than one waiting threads, and they are equivalent, i.e. either one can handle the condition fully, and if the condition is sufficient to let just one thread continue (as in submitting a work unit to a thread pool), then notify_one would be more efficient since it won't unblock other threads unnecessarily for them to only notice no work to be done and going back to waiting. If you ever only have one waiter, then there would be no difference between notify_one and notify_all.
It's pretty simple. Use notify_one() when:
There is no reason why more than one thread needs to know about the event (e.g., use notify_one() to announce the availability of an item that a worker thread will "consume," thereby making the item unavailable to other workers), AND
There is no wrong thread that could be awakened (e.g., you're probably safe if all of the threads are wait()ing in the same line of the same exact function).
Use notify_all() in all other cases.

How to wake a std::thread while it is sleeping

I am using C++11 and I have a std::thread which is a class member, and it sends information to listeners every 2 minutes. Other than that it just sleeps. So, I have made it sleep for 2 minutes, then send the required info, and then sleep for 2 minutes again.
// MyClass.hpp
class MyClass {
public:
~MyClass();
void RunMyThread();
private:
std::thread my_thread;
std::atomic<bool> m_running;
};
void MyClass::RunMyThread() {
// capturing this is enough to reach m_running
my_thread = std::thread { [this] {
m_running = true;
while(m_running) {
std::this_thread::sleep_for(std::chrono::minutes(2));
SendStatusInfo(some_info);
}
}};
}
// Destructor
MyClass::~MyClass() {
m_running = false; // this won't work as the thread is sleeping. How to exit thread here?
}
Issue:
The issue with this approach is that I cannot exit the thread while it is sleeping. I understand from reading that I can wake it using a std::condition_variable and exit gracefully? But I am struggling to find a simple example which does the bare minimum as required in above scenario. All the condition_variable examples I've found look too complex for what I am trying to do here.
Question:
How can I use a std::condition_variable to wake the thread and exit gracefully while it is sleeping? Or are there any other ways of achieving the same without the condition_variable technique?
Additionally, I see that I need to use a std::mutex in conjunction with std::condition_variable? Is that really necessary? Is it not possible to achieve the goal by adding the std::condition_variable logic only to required places in the code here?
Environment:
Linux and Unix with compilers gcc and clang.
How can I use an std::condition_variable to wake the thread and exit gracefully while it was sleeping? Or are there any other ways of achieving the same without condition_variable technique?
No, not in standard C++ as of C++17 (there are of course non-standard, platform-specific ways to do it, and it's likely some kind of semaphore will be added to C++2a).
Additionally, I see that I need to use a std::mutex in conjunction with std::condition_variable? Is that really necessary?
Yes.
Is it not possible to achieve the goal by adding the std::condition_variable logic only to required places in the code piece here?
No. For a start, you can't wait on a condition_variable without locking a mutex (and passing the lock object to the wait function) so you need to have a mutex present anyway. Since you have to have a mutex anyway, requiring both the waiter and the notifier to use that mutex isn't such a big deal.
Condition variables are subject to "spurious wake ups" which means they can stop waiting for no reason. In order to tell if it woke because it was notified, or woke spuriously, you need some state variable that is set by the notifying thread and read by the waiting thread. Because that variable is shared by multiple threads it needs to be accessed safely, which the mutex ensures.
Even if you use an atomic variable for the shared variable, you still typically need a mutex to avoid missed notifications.
This is all explained in more detail in
https://github.com/isocpp/CppCoreGuidelines/issues/554
A working example for you using std::condition_variable:
struct MyClass {
MyClass()
: my_thread([this]() { this->thread(); })
{}
~MyClass() {
{
std::lock_guard<std::mutex> l(m_);
stop_ = true;
}
c_.notify_one();
my_thread.join();
}
void thread() {
while(this->wait_for(std::chrono::minutes(2)))
SendStatusInfo(some_info);
}
// Returns false if stop_ == true.
template<class Duration>
bool wait_for(Duration duration) {
std::unique_lock<std::mutex> l(m_);
return !c_.wait_for(l, duration, [this]() { return stop_; });
}
std::condition_variable c_;
std::mutex m_;
bool stop_ = false;
std::thread my_thread;
};
How can I use an std::condition_variable to wake the thread and exit gracefully while it was sleeping?
You use std::condition_variable::wait_for() instead of std::this_thread::sleep_for(); the former can be interrupted by std::condition_variable::notify_one() or std::condition_variable::notify_all().
Additionally, I see that I need to use a std::mutex in conjunction with std::condition_variable? Is that really necessary? Is it not possible to achieve the goal by adding the std::condition_variable logic only to required places in the code piece here?
Yes, it is necessary to use a std::mutex with std::condition_variable, and you should use it instead of making your flag std::atomic. Despite the atomicity of the flag itself, you would still have a race condition in your code: without the mutex, your sleeping thread would sometimes miss the notification.
There is a sad, but true fact - what you are looking for is a signal, and Posix threads do not have a true signalling mechanism.
Also, the only Posix threading primitive associated with any sort of timing is the condition variable, which is why your online search led you to it. Since the C++ threading model is heavily built on the Posix API, Posix-compatible primitives are all you get in standard C++.
Unless you are willing to go outside of Posix (you do not indicate a platform, but there are native platform ways to work with events which are free from those limitations, notably eventfd on Linux), you will have to stick with condition variables. And yes, working with a condition variable requires a mutex, since it is built into the API.
Your question doesn't specifically ask for code sample, so I am not providing any. Let me know if you'd like some.
Additionally, I see that I need to use a std::mutex in conjunction with std::condition_variable? Is that really necessary? Is it not possible to achieve the goal by adding the std::condition_variable logic only to required places in the code piece here?
std::condition_variable is a low level primitive. Actually using it requires fiddling with other low level primitives as well.
struct timed_waiter {
void interrupt() {
auto l = lock();
interrupted = true;
cv.notify_all();
}
// returns false if interrupted
template<class Rep, class Period>
bool wait_for( std::chrono::duration<Rep, Period> how_long ) const {
auto l = lock();
return !cv.wait_until( l,
std::chrono::steady_clock::now() + how_long,
[&]{
return interrupted; // stop waiting early only once interrupt() has been called
}
);
}
private:
std::unique_lock<std::mutex> lock() const {
return std::unique_lock<std::mutex>(m);
}
mutable std::mutex m;
mutable std::condition_variable cv;
bool interrupted = false;
};
Simply create a timed_waiter somewhere that both the thread(s) that want to wait and the code that wants to interrupt can see.
The waiting threads do
while(m_timer.wait_for(std::chrono::minutes(2))) {
SendStatusInfo(some_info);
}
to interrupt do m_timer.interrupt() (say in the dtor) then my_thread.join() to let it finish.
Live example:
struct MyClass {
~MyClass();
void RunMyThread();
private:
std::thread my_thread;
timed_waiter m_timer;
};
void MyClass::RunMyThread() {
my_thread = std::thread {
[this] {
while(m_timer.wait_for(std::chrono::seconds(2))) {
std::cout << "SendStatusInfo(some_info)\n";
}
}};
}
// Destructor
MyClass::~MyClass() {
std::cout << "~MyClass::MyClass\n";
m_timer.interrupt();
my_thread.join();
std::cout << "~MyClass::MyClass done\n";
}
int main() {
std::cout << "start of main\n";
{
MyClass x;
x.RunMyThread();
using namespace std::literals;
std::this_thread::sleep_for(11s);
}
std::cout << "end of main\n";
}
Or are there any other ways of achieving the same without the condition_variable technique?
You can use std::promise/std::future as a simpler alternative to a bool/condition_variable/mutex in this case. A future is not susceptible to spurious wakes and doesn't require a mutex for synchronisation.
Basic example:
std::promise<void> pr;
std::thread thr{[fut = pr.get_future()]{
while(true)
{
if(fut.wait_for(std::chrono::minutes(2)) != std::future_status::timeout)
return;
}
}};
//When ready to stop
pr.set_value();
thr.join();
Or are there any other ways of achieving the same without condition_variable technique?
One alternative to a condition variable is you can wake your thread up at much more regular intervals to check the "running" flag and go back to sleep if it is not set and the allotted time has not yet expired:
void periodically_call(std::atomic_bool& running, std::chrono::milliseconds wait_time)
{
auto wake_up = std::chrono::steady_clock::now();
while(running)
{
wake_up += wait_time; // next signal send time
while(std::chrono::steady_clock::now() < wake_up)
{
if(!running)
break;
// sleep for just 1/10 sec (maximum)
auto pre_wake_up = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
pre_wake_up = std::min(wake_up, pre_wake_up); // don't overshoot
// keep going to sleep here until full time
// has expired
std::this_thread::sleep_until(pre_wake_up);
}
SendStatusInfo(some_info); // do the regular call
}
}
Note: You can make the actual wait time anything you want. In this example I made it 100 ms (std::chrono::milliseconds(100)). It depends on how responsive you want your thread to be to a signal to stop.
For example in one application I made that one whole second because I was happy for my application to wait a full second for all the threads to stop before it closed down on exit.
How responsive you need it to be is up to your application. The shorter the wake up times the more CPU it consumes. However even very short intervals of a few milliseconds will probably not register much in terms of CPU time.
You could also use promise/future so that you don't need to bother with condition variables and/or threads:
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
struct MyClass {
~MyClass() {
_stop.set_value();
}
MyClass() {
auto future = std::shared_future<void>(_stop.get_future());
_thread_handle = std::async(std::launch::async, [future] () {
std::future_status status;
do {
status = future.wait_for(std::chrono::seconds(2));
if (status == std::future_status::timeout) {
std::cout << "do periodic things\n";
} else if (status == std::future_status::ready) {
std::cout << "exiting\n";
}
} while (status != std::future_status::ready);
});
}
private:
std::promise<void> _stop;
std::future<void> _thread_handle;
};
// Destructor
int main() {
MyClass c;
std::this_thread::sleep_for(std::chrono::seconds(9));
}

Tricky situation with race condition

I have this race condition with an audio playback class, where every time I start playback I set keepRunning to true, and to false when I stop.
The problem happens when I call stop() immediately after I start: the keepRunning flag is set to false, then reset to true again.
I could put a delay in stop(), but I don't think that's a very good solution. Should I use a condition variable to make stop() wait until keepRunning is true?
How would you normally solve this problem?
#include <iostream>
#include <thread>
using namespace std;
class AudioPlayer
{
bool keepRunning;
thread thread_play;
public:
AudioPlayer(){ keepRunning = false; }
~AudioPlayer(){ stop(); }
void play()
{
stop();
// keepRunning = true; // A: this works OK
thread_play = thread(&AudioPlayer::_play, this);
}
void stop()
{
keepRunning = false;
if (thread_play.joinable()) thread_play.join();
}
void _play()
{
cout << "Playing: started\n";
keepRunning = true; // B: this causes problem
while(keepRunning)
{
this_thread::sleep_for(chrono::milliseconds(100));
}
cout << "Playing: stopped\n";
}
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
return 0;
}
Output:
$ ./test
Playing: started
(pause indefinitely...)
Here is my suggestion, combining many comments from below as well:
1) Briefly synchronized the keepRunning flag with a mutex so that it cannot be modified while a previous thread is still changing state.
2) Changed the flag to atomic_bool, as it is also modified while the mutex is not used.
class AudioPlayer
{
thread thread_play;
public:
AudioPlayer(){ }
~AudioPlayer()
{
keepRunning = false;
thread_play.join();
}
void play()
{
unique_lock<mutex> l(_mutex);
keepRunning = false;
if ( thread_play.joinable() )
thread_play.join();
keepRunning = true;
thread_play = thread(&AudioPlayer::_play, this);
}
void stop()
{
unique_lock<mutex> l(_mutex);
keepRunning = false;
}
private:
void _play()
{
cout << "Playing: started\n";
while ( keepRunning == true )
{
this_thread::sleep_for(chrono::milliseconds(10));
}
cout << "Playing: stopped\n";
}
atomic_bool keepRunning { false };
std::mutex _mutex;
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
this_thread::sleep_for(chrono::milliseconds(100));
ap.stop();
return 0;
}
To answer the question directly.
Setting keepRunning = true at point A is synchronous in the main thread, but setting it at point B is asynchronous to the main thread.
Being asynchronous, the call to ap.stop() in the main thread (and the one in the destructor) might take place before point B is reached (by the asynchronous thread), so the last thread runs forever.
You should also make keepRunning atomic; that will make sure that the value is communicated between the threads correctly. There's no guarantee of when or if the sub-thread will 'see' the value set by the main thread without some synchronization. You could also use a std::mutex.
Other answers don't like .join() in stop(). I would say that's a design decision. You certainly need to make sure the thread has stopped before leaving main()(*) but that could take place in the destructor (as other answers suggest).
As a final note the more conventional design wouldn't keep re-creating the 'play' thread but would wake/sleep a single thread. There's an overhead of creating a thread and the 'classic' model treats this as a producer/consumer pattern.
#include <iostream>
#include <thread>
#include <atomic>
class AudioPlayer
{
std::atomic<bool> keepRunning;
std::thread thread_play;
public:
AudioPlayer():keepRunning(false){
}
~AudioPlayer(){ stop(); }
void play()
{
stop();
keepRunning = true; // A: this works OK
thread_play = std::thread(&AudioPlayer::_play, this);
}
void stop()
{
keepRunning=false;
if (thread_play.joinable()){
thread_play.join();
}
}
void _play()
{
std::cout<<"Playing: started\n";
while(keepRunning)
{
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
std::cout<<"Playing: stopped\n";
}
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
ap.stop();
return 0;
}
(*) You can also detach() but that's not recommended.
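For the "single thread that wakes and sleeps" design mentioned in the final note, a minimal sketch might look like the following. PersistentPlayer, chunks_ and the 10 ms chunk interval are assumptions made up for illustration; incrementing a counter stands in for playing one chunk of audio.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical player with one long-lived thread. The thread sleeps on the
// condition variable while idle and is woken by play(), stop() or destruction.
class PersistentPlayer {
public:
    PersistentPlayer() : thread_([this] { run(); }) {}

    ~PersistentPlayer() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    void play() { setPlaying(true); }
    void stop() { setPlaying(false); }

    int chunksPlayed() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return chunks_;
    }

private:
    void setPlaying(bool value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            playing_ = value;
        }
        cv_.notify_all();
    }

    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!quit_) {
            // Idle: block until play() is called or the player is destroyed.
            cv_.wait(lock, [this] { return playing_ || quit_; });
            while (playing_ && !quit_) {
                ++chunks_;  // stand-in for playing one chunk of audio
                // Interruptible pause between chunks: returns early on stop()/quit.
                cv_.wait_for(lock, std::chrono::milliseconds(10),
                             [this] { return !playing_ || quit_; });
            }
        }
    }

    mutable std::mutex mutex_;
    std::condition_variable cv_;
    bool playing_ = false;
    bool quit_ = false;
    int chunks_ = 0;
    std::thread thread_;  // declared last so the other members exist before it starts
};
```

Because the flags are only read and written under the mutex, calling play() and stop() in any order cannot leave the thread running forever, and no thread is created or joined per playback.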
First, what you have here is indeed the definition of a data race - one thread is writing to a non-atomic variable keepRunning and another is reading from it. So even if you uncomment the line in play, you'd still have a data race. To avoid that, make keepRunning a std::atomic<bool>.
Now, the fundamental problem is the lack of symmetry between play and stop - play does the actual work in a spawned thread, while stop does it in the main thread. To make the flow easier to reason about, increase symmetry:
set keepRunning in play, or
have play wait for the thread to be up and running and done with any setup (also eliminating the need for the if in stop).
As a side note, one way to handle cases where a flag is set and reset in possibly uneven order is to replace it with a counter. You then stall until you see the expected value, and only then apply the change (using CAS).
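The counter idea in the side note could be sketched like this. advanceWhen is a hypothetical helper name, and the loop spins with yield(), so it is only reasonable when the wait is expected to be short.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: stall until `counter` holds `expected`, then replace
// it with `desired` in one atomic compare-and-swap step.
inline void advanceWhen(std::atomic<int>& counter, int expected, int desired) {
    int seen = expected;
    while (!counter.compare_exchange_weak(seen, desired)) {
        seen = expected;            // a failed CAS overwrote `seen` with the current value
        std::this_thread::yield();  // let the other thread make progress
    }
}
```

For example, with a counter that starts at 0, play() could call advanceWhen(counter, 0, 1) and stop() could call advanceWhen(counter, 1, 2); a stop() that arrives early then waits for the 1 instead of being overwritten later.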
Ideally, you'd just set keepRunning before starting the thread (as in your commented out line in play()). That's the neatest solution, and skips the race completely.
If you want to be more fancy, you can also use a condition_variable: signal the playing thread with notify_one or notify_all, and in the loop replace the sleep with wait_for and a predicate on the stop flag. If the predicate is true when wait_for returns, you should stop playing. (A zero-duration wait would usually miss the notification, because a notification only wakes threads that are already waiting.)
Don't make stop pause and wait for state to settle down. That would work here, but is a bad habit to get into for later.
As noted in the comment, it is undefined behavior to write to a variable while simultaneously reading from it. atomic<bool> solves this, but wouldn't fix your race on its own, it just makes the reads and writes well defined.
I modified your program a bit and it works now. Let's discuss problems first:
Problem 1: using plain bool variable in 2 threads
Here both threads update the variable, which might lead to a race condition, because the outcome depends heavily on which thread comes first, and it may even end up in undefined behaviour. Undefined behaviour might occur especially when a write from one thread is interrupted by another. Here Snps brought up links to the following SO answers:
When do I really need to use atomic<bool> instead of bool?
trap representation
In addition I was searching if write can be interrupted for bool on x86 platforms and came across this answer:
Can a bool read/write operation be not atomic on x86?
Problem 2: Caching as compiler optimization
Another problem is that variables are allowed to be cached. It means that the «playing thread» might cache the value of keepRunning and thus never terminate or terminate after considerable amount of time. In previous C++ version (98, 2003) a volatile modifier was the only construct to mark variables to prevent/avoid caching optimization and in this case force the compiler to always read the variable from its actual memory location. Thus given the «playing thread» enters the while loop keepRunning might be cached and never read or with considerable delays no matter when stop() modifies it.
After C++11, the atomic template and the atomic_bool specialization were introduced to make such variables non-cacheable and read/written in an uninterruptible manner, thus addressing Problems 1 & 2.
Side note: volatile and caching explained by Andrei Alexandrescu in the Dr. Dobbs article which addresses exactly this situation:
Caching variables in registers is a very valuable optimization that applies most of the time, so it would be a pity to waste it. C and C++ give you the chance to explicitly disable such caching. If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable.
Problem 3: stop was called before _play() function was even started
The problem here is that in multi-threaded OSs the scheduler grants a time slice for a thread to run. If the thread can progress and this time slice is not over, the thread continues to run. In the «main thread», all play() calls were executed even before the «play threads» started to run. Thus the object destruction took place before the _play() function started running, and that is where the variable keepRunning is set to true.
How I fixed this problem
We need to ensure that play() returns only once the _play() function has started running. A condition_variable is of help here: play() blocks until _play() notifies it that it has started executing.
Here is the code:
#include <iostream>
#include <thread>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
using namespace std;
class AudioPlayer
{
atomic_bool keepRunning;
thread thread_play;
std::mutex mutex;
std::condition_variable play_started;
public:
AudioPlayer()
: keepRunning{false}
{}
~AudioPlayer(){ stop(); }
void play()
{
stop();
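std::unique_lock<std::mutex> lock(mutex);
thread_play = thread(&AudioPlayer::_play, this);
// Wait with a predicate so a spurious wake-up cannot return too early.
play_started.wait(lock, [this]{ return keepRunning.load(); });
}
void stop()
{
keepRunning = false;
cout << "stop called" << endl;
if (thread_play.joinable()) thread_play.join();
}
void _play()
{
cout << "Playing: started\n";
{
// Set the flag under the mutex so play() cannot miss the notification.
std::lock_guard<std::mutex> lock(mutex);
keepRunning = true; // B: no longer a problem, play() waits for this
}
play_started.notify_one();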
while(keepRunning)
{
this_thread::sleep_for(chrono::milliseconds(100));
}
cout << "Playing: stopped\n";
}
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
return 0;
}
Your solution A is actually almost correct. It's still undefined behavior to have one thread read from non-atomic variable that another is writing to. So keepRunning must be made an atomic<bool>. Once you do that and in conjunction with your fix from A, your code will be fine. That is because stop now has a correct post condition that no thread will be active (in particular no _play call) after it exits.
Note that no mutex is necessary. However, play and stop are not themselves thread safe. As long as the client of AudioPlayer is not using the same instance of AudioPlayer in multiple threads though that shouldn't matter.

C++11 lockfree single producer single consumer: how to avoid busy wait

I'm trying to implement a class that uses two threads: one for the producer and one for the consumer. The current implementation does not use locks:
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>
using Queue =
boost::lockfree::spsc_queue<
int,
boost::lockfree::capacity<1024>>;
class Worker
{
public:
Worker() : working_(false), done_(false) {}
~Worker() {
done_ = true; // exit even if the work has not been completed
worker_.join();
}
void enqueue(int value) {
queue_.push(value);
if (!working_) {
working_ = true;
worker_ = std::thread([this]{ work(); });
}
}
void work() {
int value;
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
working_ = false;
}
private:
std::atomic<bool> working_;
std::atomic<bool> done_;
Queue queue_;
std::thread worker_;
};
The application needs to enqueue work items for a certain amount of time and then sleep waiting for an event. This is a minimal main that simulates the behavior:
int main()
{
Worker w;
for (int i = 0; i < 1000; ++i)
w.enqueue(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
for (int i = 0; i < 1000; ++i)
w.enqueue(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
}
I'm pretty sure that my implementation is buggy: what if the worker thread finishes its loop and, before it executes working_ = false, another enqueue arrives? Is it possible to make my code thread-safe without using locks?
The solution requires:
a fast enqueue
the destructor has to quit even if the queue is not empty
no busy wait, because there are long periods of time in which the worker thread is idle
no locks if possible
Edit
I did another implementation of the Worker class, based on your suggestions. Here is my second attempt:
class Worker
{
public:
Worker()
: working_(ATOMIC_FLAG_INIT), done_(false) { }
~Worker() {
// exit even if the work has not been completed
done_ = true;
if (worker_.joinable())
worker_.join();
}
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set()) {
if (worker_.joinable())
worker_.join();
worker_ = std::thread([this]{ work(); });
}
return enqueued;
}
void work() {
int value;
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
working_.clear();
while (!done_ && queue_.pop(value)) {
std::cout << value << std::endl;
}
}
private:
std::atomic_flag working_;
std::atomic<bool> done_;
Queue queue_;
std::thread worker_;
};
I introduced the worker_.join() inside the enqueue method. This can hurt performance, but only in very rare cases (when the queue becomes empty and another enqueue arrives before the thread exits). The working_ variable is now an atomic_flag that is set in enqueue and cleared in work. The additional while after working_.clear() is needed because a value pushed after the first while has finished but before the clear would otherwise not be processed.
Is this implementation correct?
I did some tests and the implementation seems to work.
OT: Is it better to put this as an edit, or an answer?
what if the worker thread completes and before executing working_ = false, another enqueue comes?
Then the value will be pushed to the queue but will not be processed until another value is enqueued after the flag is set. You (or your users) may decide whether that is acceptable. This can be avoided using locks, but they're against your requirements.
The code may fail if the running thread is about to finish and sets working_ = false; but hasn't stopped running before next value is enqueued. In that case your code will call operator= on the running thread which results in a call to std::terminate according to the linked documentation.
Adding worker_.join() before assigning the worker to a new thread should prevent that.
Another problem is that queue_.push may fail if the queue is full because it has a fixed size. Currently you just ignore the case and the value will not be added to the full queue. If you wait for queue to have space, you don't get fast enqueue (in the edge case). You could take the bool returned by push (which tells if it was successful) and return it from enqueue. That way the caller may decide whether it wants to wait or discard the value.
Or use a non-fixed-size queue. Boost has this to say about that choice:
Can be used to completely disable dynamic memory allocations during push in order to ensure lockfree behavior.
If the data structure is configured as fixed-sized, the internal nodes are stored inside an array and they are addressed
by array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
to achieve lock-freedom.
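The full-queue case described above can be illustrated without Boost. The sketch below uses a small hand-written ring buffer as a stand-in for boost::lockfree::spsc_queue; SpscRing and enqueue_or_drop are hypothetical names, and dropping the value is only one of the policies a caller might choose:

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for boost::lockfree::spsc_queue (one producer, one consumer).
// One slot stays empty to tell "full" from "empty", so it holds N - 1 items.
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& value) {            // producer thread only
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;                  // full: report it instead of blocking
        buffer_[tail] = value;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& out) {                     // consumer thread only
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;                  // empty
        out = buffer_[head];
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> buffer_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// The caller picks the policy: here a value that does not fit is counted as dropped.
template <typename Queue>
int enqueue_or_drop(Queue& q, int first, int last) {
    int dropped = 0;
    for (int v = first; v < last; ++v)
        if (!q.push(v)) ++dropped;
    return dropped;
}
```

Because push never blocks, enqueue stays fast even in the edge case. A caller that prefers to wait could instead retry the push after yielding.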
Your worker thread needs more than 2 states.
Not running
Doing tasks
Idle shutdown
Shutdown
If you force shut down, it skips idle shutdown. If you run out of tasks, it transitions to idle shutdown. In idle shutdown, it empties the task queue, then goes into shutting down.
Shutdown is set, then you walk off the end of your worker task.
The producer first puts things on the queue. Then it checks the worker state. If Shutdown or Idle shutdown, first join it (and transition it to not running) then launch a new worker. If not running, just launch a new worker.
If the producer wants to launch a new worker, it first makes sure that we are in the not running state (otherwise, logic error). We then transition to the Doing tasks state, and then we launch the worker thread.
If the producer wants to shut down the helper task, it sets the done flag. It then checks the worker state. If it is anything besides not running, it joins it.
This can result in a worker thread that is launched for no good reason.
There are a few cases where the above can block, but there were a few before as well.
Then, we write a formal or semi-formal proof that the above cannot lose messages, because when writing lock free code you aren't done until you have a proof.
This is my solution to the question. I don't much like answering my own question, but I think showing actual code may help others.
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>
// I used this semaphore class: https://gist.github.com/yohhoy/2156481
#include "binsem.hpp"
using Queue =
boost::lockfree::spsc_queue<
int,
boost::lockfree::capacity<1024>>;
class Worker
{
public:
// the worker thread starts in the constructor
Worker()
: working_(ATOMIC_FLAG_INIT), done_(false), semaphore_(0)
, worker_([this]{ work(); })
{ }
~Worker() {
// exit even if the work has not been completed
done_ = true;
semaphore_.signal();
worker_.join();
}
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set())
// signal to the worker thread to wake up
semaphore_.signal();
return enqueued;
}
void work() {
int value;
// the worker thread stays alive until done_ is set
while (!done_) {
// sleep until the start signal arrives
semaphore_.wait();
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
}
working_.clear();
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
}
}
}
private:
std::atomic_flag working_;
std::atomic<bool> done_;
binsem semaphore_;
Queue queue_;
std::thread worker_;
};
I tried the suggestion of @Cameron: do not shut down the thread, and add a semaphore. The semaphore is actually used only on the first enqueue and after the last work item, so the code is not lock-free, but only in these two cases.
I did some performance comparisons between my previous version (see my edited question) and this one. There are no significant differences when there are not many starts and stops. However, enqueue is 10 times faster when it has to signal the worker thread instead of starting a new thread. This is a rare case, so it is not very important, but it is an improvement anyway.
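The semaphore class from the gist is not reproduced here. As a rough idea of why the signalling path is not lock-free, a binary semaphore of this kind can be sketched with a mutex and a condition variable. BinarySemaphore and try_wait are hypothetical names, and the constructor differs from the binsem used above:

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the binsem class; names and constructor are illustrative.
class BinarySemaphore {
public:
    explicit BinarySemaphore(bool signaled = false) : signaled_(signaled) {}

    void signal() {
        {
            std::lock_guard<std::mutex> lk(m_);
            signaled_ = true;          // repeated signals collapse into one
        }
        cv_.notify_one();
    }

    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return signaled_; });  // sleeps, no busy wait
        signaled_ = false;             // consume the signal
    }

    // Non-blocking variant: returns false if no signal is pending.
    bool try_wait() {
        std::lock_guard<std::mutex> lk(m_);
        if (!signaled_) return false;
        signaled_ = false;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_;
};
```

Since C++20 the standard library offers std::binary_semaphore in <semaphore>, which could play the same role.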
This implementation satisfies:
lock-free in the common case (when enqueue and work are busy);
no busy wait when nothing is enqueued for a long time
the destructor exits as soon as possible
correctness?? :)
Very partial answer: I think all those atomics, semaphores and states are a back-communication channel, from "the thread" to "the Worker". Why not use another queue for that? At the very least, thinking about it will help you around the problem.

How would you implement your own reader/writer lock in C++11?

I have a set of data structures I need to protect with a readers/writer lock. I am aware of boost::shared_lock, but I would like to have a custom implementation using std::mutex, std::condition_variable and/or std::atomic so that I can better understand how it works (and tweak it later).
Each data structure (moveable, but not copyable) will inherit from a class called Commons which encapsulates the locking. I'd like the public interface to look something like this:
class Commons {
public:
void read_lock();
bool try_read_lock();
void read_unlock();
void write_lock();
bool try_write_lock();
void write_unlock();
};
...so that it can be publicly inherited by some:
class DataStructure : public Commons {};
I'm writing scientific code and can generally avoid data races; this lock is mostly a safeguard against the mistakes I'll probably make later. Thus my priority is low read overhead so I don't hamper a correctly-running program too much. Each thread will probably run on its own CPU core.
Could you please show me (pseudocode is ok) a readers/writer lock? What I have now is supposed to be the variant that prevents writer starvation. My main problem so far has been the gap in read_lock between checking if a read is safe to actually incrementing a reader count, after which write_lock knows to wait.
void Commons::write_lock() {
write_mutex.lock();
reading_mode.store(false);
while(readers.load() > 0) {}
}
bool Commons::try_read_lock() {
if(reading_mode.load()) {
//if another thread calls write_lock here, bad things can happen
++readers;
return true;
} else return false;
}
I'm kind of new to multithreading, and I'd really like to understand it. Thanks in advance for your help!
Here's pseudo-code for a very simple reader/writer lock using a mutex and a condition variable. The mutex API should be self-explanatory. Condition variables are assumed to have a member wait(Mutex&) which (atomically!) drops the mutex and waits for the condition to be signaled. The condition is signaled with either signal(), which wakes up one waiter, or signal_all(), which wakes up all waiters.
read_lock() {
mutex.lock();
while (writer)
unlocked.wait(mutex);
readers++;
mutex.unlock();
}
read_unlock() {
mutex.lock();
readers--;
if (readers == 0)
unlocked.signal_all();
mutex.unlock();
}
write_lock() {
mutex.lock();
while (writer || (readers > 0))
unlocked.wait(mutex);
writer = true;
mutex.unlock();
}
write_unlock() {
mutex.lock();
writer = false;
unlocked.signal_all();
mutex.unlock();
}
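The pseudocode above maps fairly directly onto std::mutex and std::condition_variable. The following is a sketch of that translation, not a tuned implementation; SimpleRWLock and count_with_simple_lock are hypothetical names, and the helper exists only to exercise the lock:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class SimpleRWLock {
public:
    void read_lock() {
        std::unique_lock<std::mutex> lk(mutex_);
        unlocked_.wait(lk, [this] { return !writer_; });
        ++readers_;
    }
    void read_unlock() {
        std::lock_guard<std::mutex> lk(mutex_);
        if (--readers_ == 0) unlocked_.notify_all();
    }
    void write_lock() {
        std::unique_lock<std::mutex> lk(mutex_);
        unlocked_.wait(lk, [this] { return !writer_ && readers_ == 0; });
        writer_ = true;
    }
    void write_unlock() {
        std::lock_guard<std::mutex> lk(mutex_);
        writer_ = false;
        unlocked_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable unlocked_;
    int readers_ = 0;
    bool writer_ = false;
};

// Hypothetical check: writers bump a plain int while one reader polls it.
int count_with_simple_lock(int writers, int iterations) {
    SimpleRWLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.write_lock();
                ++counter;
                lock.write_unlock();
            }
        });
    pool.emplace_back([&] {
        for (int i = 0; i < iterations; ++i) {
            lock.read_lock();
            int seen = counter;   // safe: no writer is active here
            (void)seen;
            lock.read_unlock();
        }
    });
    for (auto& t : pool) t.join();
    return counter;
}
```

The predicate overload of wait replaces the explicit while loops and handles spurious wakeups in the same way.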
That implementation has quite a few drawbacks, though.
Wakes up all waiters whenever the lock becomes available
If most of the waiters are waiting for a write lock, this is wasteful: most waiters will fail to acquire the lock, after all, and resume waiting. Simply using signal() doesn't work, because you do want to wake up everyone waiting for a read lock when the lock is released. So to fix that, you need separate condition variables for readability and writability.
No fairness. Readers starve writers
You can fix that by tracking the number of pending read and write locks, and either stop acquiring read locks once there are pending write locks (though you'll then starve readers!), or randomly wake up either all readers or one writer (assuming you use separate condition variables, see the section above).
Locks aren't dealt out in the order they are requested
To guarantee this, you'll need a real wait queue. You could e.g. create one condition variable for each waiter, and signal all readers or a single writer, both at the head of the queue, after releasing the lock.
Even pure read workloads cause contention due to the mutex
This one is hard to fix. One way is to use atomic instructions to acquire read or write locks (usually compare-and-exchange). If the acquisition fails because the lock is taken, you'll have to fall back to the mutex. Doing that correctly is quite hard, though. Plus, there'll still be contention - atomic instructions are far from free, especially on machines with lots of cores.
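The first two drawbacks can be addressed roughly as follows, with separate condition variables and a count of pending writers. This is a sketch of the writer-preferring option only, so it can starve readers as warned above; WriterPreferringLock and count_with_writer_preference are hypothetical names:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class WriterPreferringLock {
public:
    void read_lock() {
        std::unique_lock<std::mutex> lk(m_);
        // New readers hold back while a writer is active or queued.
        can_read_.wait(lk, [this] { return !writer_ && pending_writers_ == 0; });
        ++readers_;
    }
    void read_unlock() {
        std::lock_guard<std::mutex> lk(m_);
        if (--readers_ == 0 && pending_writers_ > 0)
            can_write_.notify_one();       // wake exactly one writer
    }
    void write_lock() {
        std::unique_lock<std::mutex> lk(m_);
        ++pending_writers_;
        can_write_.wait(lk, [this] { return !writer_ && readers_ == 0; });
        --pending_writers_;
        writer_ = true;
    }
    void write_unlock() {
        std::lock_guard<std::mutex> lk(m_);
        writer_ = false;
        if (pending_writers_ > 0)
            can_write_.notify_one();       // hand over to the next writer
        else
            can_read_.notify_all();        // otherwise release all readers
    }

private:
    std::mutex m_;
    std::condition_variable can_read_;
    std::condition_variable can_write_;
    int readers_ = 0;
    int pending_writers_ = 0;
    bool writer_ = false;
};

// Hypothetical check: several writers and one reader share a plain int.
int count_with_writer_preference(int writers, int iterations) {
    WriterPreferringLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.write_lock();
                ++counter;
                lock.write_unlock();
            }
        });
    pool.emplace_back([&] {
        for (int i = 0; i < iterations; ++i) {
            lock.read_lock();
            int seen = counter;
            (void)seen;
            lock.read_unlock();
        }
    });
    for (auto& t : pool) t.join();
    return counter;
}
```

Every operation still goes through the one mutex, so the contention described in the last drawback remains.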
Conclusion
Implementing synchronization primitives correctly is hard. Implementing efficient and fair synchronization primitives is even harder, and it hardly ever pays off. pthreads on Linux, for example, contains a reader/writer lock which uses a combination of futexes and atomic instructions, and which thus probably outperforms anything you can come up with in a few days of work.
Check this class:
//
// Multi-reader Single-writer concurrency base class for Win32
//
// (c) 1999-2003 by Glenn Slayden (glenn@glennslayden.com)
//
//
#include "windows.h"
class MultiReaderSingleWriter
{
private:
CRITICAL_SECTION m_csWrite;
CRITICAL_SECTION m_csReaderCount;
long m_cReaders;
HANDLE m_hevReadersCleared;
public:
MultiReaderSingleWriter()
{
m_cReaders = 0;
InitializeCriticalSection(&m_csWrite);
InitializeCriticalSection(&m_csReaderCount);
m_hevReadersCleared = CreateEvent(NULL,TRUE,TRUE,NULL);
}
~MultiReaderSingleWriter()
{
WaitForSingleObject(m_hevReadersCleared,INFINITE);
CloseHandle(m_hevReadersCleared);
DeleteCriticalSection(&m_csWrite);
DeleteCriticalSection(&m_csReaderCount);
}
void EnterReader(void)
{
EnterCriticalSection(&m_csWrite);
EnterCriticalSection(&m_csReaderCount);
if (++m_cReaders == 1)
ResetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
LeaveCriticalSection(&m_csWrite);
}
void LeaveReader(void)
{
EnterCriticalSection(&m_csReaderCount);
if (--m_cReaders == 0)
SetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
}
void EnterWriter(void)
{
EnterCriticalSection(&m_csWrite);
WaitForSingleObject(m_hevReadersCleared,INFINITE);
}
void LeaveWriter(void)
{
LeaveCriticalSection(&m_csWrite);
}
};
I didn't have a chance to try it, but the code looks OK.
You can implement a Readers-Writers lock following the exact Wikipedia algorithm from here (I wrote it):
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
int g_sharedData = 0;
int g_readersWaiting = 0;
std::mutex mu;
bool g_writerWaiting = false;
std::condition_variable cond;
void reader(int i)
{
std::unique_lock<std::mutex> lg{mu};
while(g_writerWaiting)
cond.wait(lg);
++g_readersWaiting;
// reading
std::cout << "\n reader #" << i << " is reading data = " << g_sharedData << '\n';
// end reading
--g_readersWaiting;
while(g_readersWaiting > 0)
cond.wait(lg);
cond.notify_one();
}
void writer(int i)
{
std::unique_lock<std::mutex> lg{mu};
while(g_writerWaiting)
cond.wait(lg);
// writing
std::cout << "\n writer #" << i << " is writing\n";
g_sharedData += i * 10;
// end writing
g_writerWaiting = true;
while(g_readersWaiting > 0)
cond.wait(lg);
g_writerWaiting = false;
cond.notify_all();
}//lg.unlock()
int main()
{
std::thread reader1{reader, 1};
std::thread reader2{reader, 2};
std::thread reader3{reader, 3};
std::thread reader4{reader, 4};
std::thread writer1{writer, 1};
std::thread writer2{writer, 2};
std::thread writer3{writer, 3};
std::thread writer4{writer, 4};
reader1.join();
reader2.join();
reader3.join();
reader4.join();
writer1.join();
writer2.join();
writer3.join();
writer4.join();
return(0);
}
I believe this is what you are looking for:
class Commons {
std::mutex write_m_;
std::atomic<unsigned int> readers_;
public:
Commons() : readers_(0) {
}
void read_lock() {
write_m_.lock();
++readers_;
write_m_.unlock();
}
bool try_read_lock() {
if (write_m_.try_lock()) {
++readers_;
write_m_.unlock();
return true;
}
return false;
}
// Note: unlock without holding a lock is Undefined Behavior!
void read_unlock() {
--readers_;
}
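// Note: this implementation busy-waits for active readers so that the read path stays cheap.
// Consider using try_write_lock instead; the number of readers is available through readers().
void write_lock() {
write_m_.lock(); // taken first, so no new reader can register between the check and the lock
while (readers_) {} // spin until the readers that are already in have left
}
bool try_write_lock() {
if (!write_m_.try_lock())
return false;
if (readers_) { // readers are still active: back out
write_m_.unlock();
return false;
}
return true;
}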
// Note: unlock without holding a lock is Undefined Behavior!
void write_unlock() {
write_m_.unlock();
}
int readers() {
return readers_;
}
};
For the record since C++17 we have std::shared_mutex, see: https://en.cppreference.com/w/cpp/thread/shared_mutex
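A short sketch of how std::shared_mutex is typically used, with std::shared_lock for readers and std::unique_lock for writers. Registry and its members are hypothetical names chosen for illustration, and the code needs a C++17 compiler:

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical example type; the name and members are illustrative only.
class Registry {
public:
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lk(m_);   // exclusive (write) lock
        data_[key] = value;
    }

    bool get(const std::string& key, int& out) const {
        std::shared_lock<std::shared_mutex> lk(m_);   // shared (read) lock
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::shared_mutex m_;  // mutable so that const readers can lock it
    std::map<std::string, int> data_;
};
```

The RAII lock types release the mutex on every exit path, which avoids the unlock-without-lock mistakes that manual read_unlock and write_unlock calls invite.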