Writing a (spinning) thread barrier using C++11 atomics

I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self-improvement than due to any real need). My class looks basically as follows:
class barrier
{
private:
std::atomic<int> counter[2];
std::atomic<int> lock[2];
std::atomic<int> cur_idx;
int thread_count;
public:
//constructors...
bool wait();
};
All members are initialized to zero, except thread_count, which holds the appropriate count.
I have implemented the wait function as
int idx = cur_idx.load();
if(lock[idx].load() == 0)
{
lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if(val >= thread_count - 1)
{
counter[idx].store(0);
cur_idx.fetch_xor(1);
lock[idx].store(0);
return true;
}
while(lock[idx].load() == 1);
return false;
However, when I try to use it with two threads (thread_count is 2), the first thread enters the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). When I instead use the gcc atomic intrinsics, with volatile int instead of std::atomic<int>, and write wait as follows:
int idx = cur_idx;
if(lock[idx] == 0)
{
__sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if(val >= thread_count - 1)
{
__sync_synchronize();
counter[idx] = 0;
cur_idx ^= 1;
__sync_synchronize();
lock[idx] = 0;
__sync_synchronize();
return true;
}
while(lock[idx] == 1);
return false;
it works just fine. From my understanding there shouldn't be any fundamental differences between the two versions (more to the point if anything the second should be less likely to work). So which of the following scenarios applies?
I got lucky with the second implementation and my algorithm is crap
I didn't fully understand std::atomic and there is a problem with the first variant (but not the second)
It should work, but the experimental implementation for c++11 libraries isn't as mature as I have hoped
For the record I'm using 32bit mingw with gcc 4.6.1
The calling code looks like this:
spin_barrier b(2);
std::thread t([&b]()->void
{
std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
b.wait();
});
b.wait();
t.join();
Since mingw doesn't have <thread> headers yet, I use a self-written version which basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping).
Any insights would be appreciated.
edit: Explanation for the algorithm to make it clearer:
thread_count is the number of threads which shall wait for the barrier (so if thread_count threads are in the barrier all can leave the barrier).
lock is set to one when the first (or any) thread enters the barrier.
counter counts how many threads are inside the barrier and is atomically incremented once for each thread
if counter>=thread_count all threads are inside the barrier so counter and lock are reset to zero
otherwise the thread waits for the lock to become zero
in the next use of the barrier different variables (counter, lock) are used to ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier was lifted)
edit2:
I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine. That seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced: looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic intrinsic, meaning there really shouldn't be a difference between the two versions.

I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:
std::atomic<bool> consumerLock;
{ // the critical section
while (consumerLock.exchange(true)) { } // this is the spinlock
// do something useful
consumerLock = false; // unlock
}
In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations, std::atomic_flag. With that, the critical section would look like this:
std::atomic_flag consumerLock;
{
// critical section
while (consumerLock.test_and_set()) { /* spin */ }
// do stuff
consumerLock.clear();
}
(You can use acquire and release memory ordering there if you prefer.)

It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it:))) :
#include <atomic>
class spinning_barrier
{
public:
spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}
bool wait ()
{
unsigned int step = step_.load ();
if (nwait_.fetch_add (1) == n_ - 1)
{
/* OK, last thread to come. */
nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
step_.fetch_add (1);
return true;
}
else
{
/* Run in circles and scream like a little girl. */
while (step_.load () == step)
;
return false;
}
}
protected:
/* Number of synchronized threads. */
const unsigned int n_;
/* Number of threads currently spinning. */
std::atomic<unsigned int> nwait_;
/* Number of barrier synchronizations completed so far,
* it's OK to wrap. */
std::atomic<unsigned int> step_;
};
EDIT:
@Grizzy, I can't find any errors in your first (C++11) version, and I've also run it for about a hundred million syncs with two threads and it completes. I ran it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3: the library (or rather, its port to win32) is not mature enough.
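On the "maybe can use relaxed ordering here" comment in the code above: the following is a minimal sketch of the same generation-counting barrier with explicit memory orders. It assumes that the relaxed reset of the counter is published by the release increment of the step counter that follows it, which I believe holds but have not verified on weakly ordered hardware. The class name spin_barrier, the yield in the spin loop, and the helper count_last_arrivals are illustrative additions, not part of the original answer.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Generation-counting spin barrier with explicit memory orders (a sketch).
class spin_barrier {
public:
    explicit spin_barrier(unsigned n) : n_(n), nwait_(0), step_(0) {}

    // Returns true for exactly one thread per round (the last to arrive).
    bool wait() {
        const unsigned step = step_.load(std::memory_order_acquire);
        if (nwait_.fetch_add(1, std::memory_order_acq_rel) == n_ - 1) {
            // Assumption: this relaxed reset becomes visible to the waiters
            // through the release increment of step_ right below.
            nwait_.store(0, std::memory_order_relaxed);
            step_.fetch_add(1, std::memory_order_release);
            return true;
        }
        while (step_.load(std::memory_order_acquire) == step)
            std::this_thread::yield();  // be polite when cores are scarce
        return false;
    }

private:
    const unsigned n_;             // number of synchronized threads
    std::atomic<unsigned> nwait_;  // threads that have arrived this round
    std::atomic<unsigned> step_;   // completed rounds, wrap-around is fine
};

// Hypothetical self-check: runs `threads` workers through `rounds` barriers
// and returns how often wait() returned true (expected: once per round).
unsigned count_last_arrivals(unsigned threads, unsigned rounds) {
    spin_barrier barrier(threads);
    std::atomic<unsigned> last_arrivals(0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned r = 0; r < rounds; ++r)
                if (barrier.wait())
                    last_arrivals.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return last_arrivals.load();
}
```

Calling count_last_arrivals(3, 100) should report 100 if the barrier releases each round exactly once. If in doubt about the orderings, the default sequentially consistent operations of the original remain the safer choice.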

Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.
struct bar_t {
unsigned const count;
std::atomic<unsigned> spaces;
std::atomic<unsigned> generation;
bar_t(unsigned count_) :
count(count_), spaces(count_), generation(0)
{}
void wait() {
unsigned const my_generation = generation;
if (!--spaces) {
spaces = count;
++generation;
} else {
while(generation == my_generation);
}
}
};

Here is a simple version of mine :
// spinning_mutex.hpp
#include <atomic>
class spinning_mutex
{
private:
std::atomic<bool> lockVal;
public:
spinning_mutex() : lockVal(false) { };
void lock()
{
while(lockVal.exchange(true) );
}
void unlock()
{
lockVal.store(false);
}
bool is_locked()
{
return lockVal.load();
}
};
Usage : (from std::lock_guard example)
#include <thread>
#include <mutex>
#include "spinning_mutex.hpp"
int g_i = 0;
spinning_mutex g_i_mutex; // protects g_i
void safe_increment()
{
std::lock_guard<spinning_mutex> lock(g_i_mutex);
++g_i;
// g_i_mutex is automatically released when lock
// goes out of scope
}
int main()
{
std::thread t1(safe_increment);
std::thread t2(safe_increment);
t1.join();
t2.join();
}

I know the thread is a little bit old, but since it is still the first google result when searching for a thread barrier using c++11 only, I want to present a solution that gets rid of the busy waiting using the std::condition_variable.
Basically it is chill's solution, but instead of the while loop it uses std::condition_variable::wait() and std::condition_variable::notify_all(). In my tests it seems to work fine.
#include <atomic>
#include <condition_variable>
#include <mutex>
class SpinningBarrier
{
public:
SpinningBarrier (unsigned int threadCount) :
threadCnt(threadCount),
step(0),
waitCnt(0)
{}
bool wait()
{
if(waitCnt.fetch_add(1) >= threadCnt - 1)
{
std::lock_guard<std::mutex> lock(mutex);
step += 1;
condVar.notify_all();
waitCnt.store(0);
return true;
}
else
{
std::unique_lock<std::mutex> lock(mutex);
unsigned char s = step;
condVar.wait(lock, [&]{return step != s;}); // wait() returns once the predicate is true, i.e. once step has changed
return false;
}
}
private:
const unsigned int threadCnt;
unsigned char step;
std::atomic<unsigned int> waitCnt;
std::condition_variable condVar;
std::mutex mutex;
};
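In the version above the arrival counter is incremented outside the mutex, so the last thread may bump step before a waiter has read it, and that waiter would then sleep through the notification. A minimal sketch that avoids this by keeping the counter under the same mutex follows. The names blocking_barrier and count_releases are hypothetical, and this is one possible arrangement rather than a fix endorsed by the original author.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Barrier that blocks instead of spinning. All state is guarded by mutex_.
class blocking_barrier {
public:
    explicit blocking_barrier(unsigned n)
        : thread_count_(n), waiting_(0), generation_(0) {}

    // Returns true for the thread that completes the round.
    bool wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned my_generation = generation_;
        if (++waiting_ == thread_count_) {
            waiting_ = 0;     // reset for the next round
            ++generation_;    // releases everyone waiting on this round
            cond_.notify_all();
            return true;
        }
        // The predicate handles spurious wakeups: only a new generation
        // lets a waiter through.
        cond_.wait(lock, [&] { return generation_ != my_generation; });
        return false;
    }

private:
    const unsigned thread_count_;
    unsigned waiting_;
    unsigned generation_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Hypothetical self-check: counts how often wait() returned true.
unsigned count_releases(unsigned threads, unsigned rounds) {
    blocking_barrier barrier(threads);
    std::atomic<unsigned> releases(0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned r = 0; r < rounds; ++r)
                if (barrier.wait())
                    ++releases;
        });
    }
    for (auto& th : pool) th.join();
    return releases.load();
}
```

Because the counter and the generation are read and written under one lock, no atomics are needed inside the barrier itself.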

Why not use std::atomic_flag (from C++11)?
http://en.cppreference.com/w/cpp/atomic/atomic_flag
std::atomic_flag is an atomic boolean type. Unlike all specializations
of std::atomic, it is guaranteed to be lock-free.
Here's how I would write my spinning thread barrier class:
#ifndef SPINLOCK_H
#define SPINLOCK_H
#include <atomic>
#include <cstdint> // int32_t
#include <thread>
class SpinLock
{
public:
inline SpinLock() :
m_lock(ATOMIC_FLAG_INIT)
{
}
inline SpinLock(const SpinLock &) :
m_lock(ATOMIC_FLAG_INIT)
{
}
inline SpinLock &operator=(const SpinLock &)
{
return *this;
}
inline void lock()
{
while (true)
{
for (int32_t i = 0; i < 10000; ++i)
{
if (!m_lock.test_and_set(std::memory_order_acquire))
{
return;
}
}
std::this_thread::yield(); // A great idea that you don't see in many spinlock examples
}
}
inline bool try_lock()
{
return !m_lock.test_and_set(std::memory_order_acquire);
}
inline void unlock()
{
m_lock.clear(std::memory_order_release);
}
private:
std::atomic_flag m_lock;
};
#endif

Stolen straight from docs
spinlock.h
#include <atomic>
using namespace std;
/* Fast userspace spinlock */
class spinlock {
public:
spinlock(std::atomic_flag& flag) : flag(flag) {
while (flag.test_and_set(std::memory_order_acquire)) ;
};
~spinlock() {
flag.clear(std::memory_order_release);
};
private:
std::atomic_flag& flag;
};
usage.cpp
#include "spinlock.h"
atomic_flag kartuliga = ATOMIC_FLAG_INIT;
void mutually_exclusive_function()
{
spinlock lock(kartuliga);
/* your shared-resource-using code here */
}

Related

Elegant assert that function is not called from several threads

I have a function that must not be called from more than one thread at the same time. Can you suggest some elegant assert for this?
You can use a thin RAII wrapper around std::atomic<>:
namespace {
std::atomic<int> access_counter;
struct access_checker {
access_checker() { check = ++access_counter; }
access_checker( const access_checker & ) = delete;
~access_checker() { --access_counter; }
int check;
};
}
void foobar()
{
access_checker checker;
// assert that checker.check == 1 and react accordingly
...
}
This is a simplified version for single use to show the idea; it can be extended to cover multiple functions if necessary.
Sounds like you need a mutex. Assuming you are using std::thread you can look at the coding example in the following link for specifically using std::mutex: http://www.cplusplus.com/reference/mutex/mutex/
// mutex example
#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex
std::mutex mtx; // mutex for critical section
void print_block (int n, char c) {
// critical section (exclusive access to std::cout signaled by locking mtx):
mtx.lock();
for (int i=0; i<n; ++i) { std::cout << c; }
std::cout << '\n';
mtx.unlock();
}
int main ()
{
std::thread th1 (print_block,50,'*');
std::thread th2 (print_block,50,'$');
th1.join();
th2.join();
return 0;
}
In the above code print_block locks mtx, does what it needs to do, and then unlocks mtx. If print_block is called from two different threads, one thread will lock mtx first and the other thread will block on mtx.lock() and be forced to wait until the first thread calls mtx.unlock(). This means only one thread at a time can execute the code between mtx.lock() and mtx.unlock().
This assumes by "at the same time" you mean at the same literal time. If you only want one thread to be able to call a function I would recommend looking into std::this_thread::get_id which will get you the id of the current thread. An assert could be as simple as storing the owning thread in owning_thread_id and then calling assert(owning_thread_id == std::this_thread::get_id()).
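The owning-thread check described above can be sketched as follows. The class name owner_thread_checker and the helper check_from_other_thread are hypothetical; the sketch assumes the owner is whichever thread constructs the checker.

```cpp
#include <cassert>
#include <thread>

// Remembers the thread that created it and reports whether the current
// caller is that same thread.
class owner_thread_checker {
public:
    owner_thread_checker() : owner_(std::this_thread::get_id()) {}

    bool called_from_owner() const {
        return owner_ == std::this_thread::get_id();
    }

private:
    const std::thread::id owner_;
};

// Example of guarding a function that must stay on the owning thread.
void owner_only_function(const owner_thread_checker& checker) {
    assert(checker.called_from_owner());
    // ... work that must only run on the owning thread ...
}

// Hypothetical helper: evaluates the check from a freshly started thread.
bool check_from_other_thread(const owner_thread_checker& checker) {
    bool result = true;
    std::thread t([&] { result = checker.called_from_owner(); });
    t.join();
    return result;
}
```

Note that this only detects calls from the wrong thread. It does not detect two overlapping calls in general, which is what the atomic counter approach earlier in this thread is for.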

Audio threading

I have a separate thread for audio in my application because it sounded like a good idea at the time, but now I am concerned about how other threads will communicate with the audio thread.
audioThread() {
while(!isCloseRequested) {
If(audio.dogSoundRequested) {
audio.playDogSound();
}
}
}
otherThread() {
Audio.dogSoundRequested();
}
Would this be an efficient way to thread audio or do you see issues with this setup?
The issue at stake here seems to be
1: how to make audio.dogSoundRequested and isCloseRequested thread-safe.
2: audioThread is busy-waiting (i.e. spinning until audio.dogSoundRequested becomes true).
As others have suggested, you could use a mutex to protect both variables, but this is overkill. Additionally, it's generally good form in audio code not to use blocking synchronisation, in order to avoid issues with priority inversion.
Instead, assuming you're using C++11 or C++14, you could use atomic variables, which are lightweight and don't (in most implementations) block:
#include <atomic>
...
std::atomic<bool> dogSoundRequested{false};
std::atomic<bool> isCloseRequested{false};
Reads and writes to the std::atomic have the same contract as for built-in types, but will generate code that ensures the read and write are atomic with respect to other threads, and that the results are synchronised with other CPUs.
In the case of audio.dogSoundRequested you want both of these effects; in the case of isCloseRequested, you want the result to become visible to other CPUs promptly.
To solve the busy-waiting issue, use a condition variable to awake audioThread when there's something to do:
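#include <condition_variable>
#include <mutex>
std::mutex m;
std::condition_variable cv;
void audioThread()
{
while(!isCloseRequested)
{
{
// wait() needs a std::unique_lock; it releases the mutex while blocked.
// The predicate protects against spurious and missed wakeups.
std::unique_lock<std::mutex> lock(m);
cv.wait(lock, []{ return dogSoundRequested.load() || isCloseRequested.load(); });
} // the mutex is released here, before any sound is played
if(dogSoundRequested.exchange(false))
{
audio.playDogSound();
}
}
}
// Renamed from dogSoundRequested() so it no longer clashes with the variable.
void requestDogSound()
{
{
std::lock_guard<std::mutex> lock(m); // pairs with the wait in audioThread()
dogSoundRequested = true;
}
cv.notify_one();
// Code that sets isCloseRequested must notify in the same way so the thread can exit.
}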
In addition to the use of mutex, here is a simple setup for multiple threads
// g++ -o multi_threading -pthread -std=c++11 multi_threading.cpp
#include <iostream>
#include <thread>
#include <exception>
#include <mutex>
#include <climits> // min max of short int
#include <chrono> // std::chrono::milliseconds
#include <string> // std::string
void launch_consumer() {
std::cout << "launch_consumer" << std::endl;
} // launch_consumer
void launch_producer(std::string chosen_file) {
std::cout << "launch_producer " << chosen_file << std::endl;
} // launch_producer
// -----------
int main(int argc, char** argv) {
std::string chosen_file = "audio_file.wav";
std::thread t1(launch_producer, chosen_file);
std::this_thread::sleep_for (std::chrono::milliseconds( 100));
std::thread t2(launch_consumer);
// -------------------------
t1.join();
t2.join();
return 0;
}
Rather than complicate the code with mutexes and condition variables, consider making a thread-safe FIFO. In this case, one that could have multiple writers and one consumer. Other threads of the application are the writers to this FIFO, the audioThread() is the consumer.
// NOP = no operation
enum AudioTask {NOP, QUIT, PLAY_DOG, ...};
class Fifo
{
public:
bool can_push() const; // is it full?
void push(AudioTask t); // safely writes to the FIFO
AudioTask pop(); // safely reads from the FIFO, if empty NOP
};
Now the audioThread() is a bit cleaner, assuming fifo and audio are application class members:
void audioThread()
{
bool running = true;
while(running)
{
auto task = fifo.pop();
switch(task)
{
case NOP: std::this_thread::yield(); break;
case QUIT: running = false; break;
case PLAY_DOG: audio.playDogSound(); break;
}
}
}
Finally, the calling code only needs to push tasks into the FIFO:
void request_dog_sound()
{
fifo.push(PLAY_DOG);
}
void stop_audio_thread()
{
fifo.push(QUIT);
audio_thread.join();
}
This puts the details of the thread-safe synchronization inside the Fifo class, keeping the rest of the application cleaner.
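The Fifo interface above is only declared, so here is one minimal way it could be filled in, assuming a mutex-guarded std::deque with a fixed capacity. The capacity parameter, the drop-when-full behaviour of push(), and the three-value enum are assumptions made for illustration. The original enum has further elided values that are not reproduced here.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Only the task values that are named in the answer.
enum AudioTask { NOP, QUIT, PLAY_DOG };

class Fifo {
public:
    // Assumption: a bounded queue; the default capacity of 64 is arbitrary.
    explicit Fifo(std::size_t capacity = 64) : capacity_(capacity) {}

    // True if there is room for at least one more task.
    bool can_push() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return tasks_.size() < capacity_;
    }

    // Appends a task; in this sketch a task is silently dropped when full.
    void push(AudioTask t) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.size() < capacity_)
            tasks_.push_back(t);
    }

    // Removes and returns the oldest task, or NOP if the queue is empty.
    AudioTask pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty())
            return NOP;
        AudioTask t = tasks_.front();
        tasks_.pop_front();
        return t;
    }

private:
    mutable std::mutex mutex_;  // mutable so can_push() can stay const
    std::deque<AudioTask> tasks_;
    const std::size_t capacity_;
};
```

The mutex is held only for a few instructions per call, but it is still a lock. If the audio thread must never block, a lock-free single-consumer ring buffer would be the alternative to look at.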
If you want to be sure that no other thread touches the playDogSound() function, use a mutex lock to lock the resource.
std::mutex mtx;
audioThread() {
while(!isCloseRequested) {
if (audio.dogSoundRequested) {
mtx.lock();
audio.playDogSound();
mtx.unlock();
}
}
}

Multi Threading Using Boost C++ - Synchronisation Issue

I would like to do multithreading where thread ONE passes data to 4-5 worker threads which process the data, and once ALL worker threads are finished I would like to continue. I'm using Boost to realize that, but I have a synchronisation problem: at some point the program stops and doesn't continue working.
I used OpenMP before and that works nicely, but I would like to set the thread priorities individually and I could not figure out how to do that with OpenMP, so I worked on my own solution.
I would be very glad if someone could give hints to find the bug in this code or could help me find another approach to the problem.
Thank you,
KmgL
#include <QCoreApplication>
#include <boost/thread.hpp>
#define N_CORE 6
#define N_POINTS 10
#define N_RUNS 100000
class Sema{
public:
Sema(int _n =0): m_count(_n),m_mut(),m_cond(){}
void set(int _n)
{
boost::unique_lock<boost::mutex> w_lock(m_mut);
m_count = -_n;
}
void wait()
{
boost::unique_lock<boost::mutex> lock(m_mut);
while (m_count < 0)
{
m_cond.wait(lock);
}
--m_count;
}
void post()
{
boost::unique_lock<boost::mutex> lock(m_mut);
++m_count;
m_cond.notify_all();
}
private:
boost::condition_variable m_cond;
boost::mutex m_mut;
int m_count;
};
class Pool
{
private:
boost::thread m_WorkerThread;
boost::condition_variable m_startWork;
bool m_WorkerRun;
bool m_InnerRun;
Sema * m_sem;
std::vector<int> *m_Ep;
std::vector<int> m_ret;
void calc()
{
unsigned int no_pt(m_Ep->size());
std::vector<int> c_ret;
for(unsigned int i=0;i<no_pt;i++)
c_ret.push_back(100 + m_Ep->at(i));
m_ret = c_ret;
}
void run()
{
boost::mutex WaitWorker_MUTEX;
while(m_WorkerRun)
{
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
m_startWork.wait(u_lock);
calc();
m_sem->post();
}
}
public:
Pool():m_WorkerRun(false),m_InnerRun(false){}
~Pool(){}
void start(Sema * _sem){
m_WorkerRun = true;
m_sem = _sem;
m_ret.clear();
m_WorkerThread = boost::thread(&Pool::run, this);}
void stop(){m_WorkerRun = false;}
void join(){m_WorkerThread.join();}
void newWork(std::vector<int> &Ep)
{
m_Ep = &Ep;
m_startWork.notify_all();
}
std::vector<int> getWork(){return m_ret;}
};
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
Pool TP[N_CORE];
Sema _sem(0);
for(int k=0;k<N_CORE;k++)
TP[k].start(&_sem);
boost::this_thread::sleep(boost::posix_time::milliseconds(10));
std::vector<int> V[N_CORE];
for(int k=0;k<N_CORE;k++)
for(int i=0;i<N_POINTS;i++)
{
V[k].push_back((k+1)*1000+i);
}
for(int j=0;j<N_RUNS;j++)
{
_sem.set(N_CORE);
for(int k=0;k<N_CORE;k++)
{
TP[k].newWork(V[k]);
}
_sem.wait();
for(int k=0;k<N_CORE;k++)
{
V[k].clear();
V[k]=TP[k].getWork();
if(V[k].size()!=N_POINTS)
std::cout<<"ERROR: "<<"V["<<k<<"].size(): "<<V[k].size()<<std::endl;
}
if((j+1)%100==0)
std::cout<<"LOOP: "<<j+1<<std::endl;
}
std::cout<<"FINISHED: "<<std::endl;
return a.exec();
}
You have a race between the calls to Pool::newWork() and Pool::run().
You have to remember that signaling/broadcasting a condition variable is not a sticky event. If your thread is not waiting on the condition variable at the time of the signaling, the signal will be lost. This is what can happen in your program: nothing prevents your main thread from calling Pool::newWork() on each of your Pool objects before they have had time to call wait() on your condition variable.
To solve this, you need to make boost::mutex WaitWorker_MUTEX a class member instead of a local variable. Pool::newWork() needs to grab that mutex before doing updates:
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
m_Ep = &Ep;
m_startWork.notify_one(); // no need to use notify_all()
Since you're using a condition variable in Pool::run(), you need to handle spurious wakeups. I would recommend setting m_Ep to NULL when you construct the object and every time you're done with a work item:
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
while (1) {
while (m_Ep == NULL && m_WorkerRun) {
m_startWork.wait(u_lock);
}
if (!m_WorkerRun) {
return;
}
calc();
m_sem->post();
m_Ep = NULL;
}
stop() will need to grab the mutex and call notify_one():
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
m_WorkerRun = false;
m_startWork.notify_one();
These changes should make the 10 ms sleep unnecessary. You do not seem to call Pool::stop() or Pool::join(); you should change your code to call them.
You'll also get better performance by working on m_ret in Pool::calc() than copying the result at the end. You're also doing copies when you return the work. You might want Pool::getWork() to return a const ref to m_ret.
I have not run this code, so there might be other issues, but it should help you move forward.
It seems from your code that you're probably wondering why condition variables need to go hand in hand with a mutex (because you declare one local mutex in Pool::run()). I hope my fix makes it clearer.
It could be done with Boost futures. Start the threads then wait for all of them to finish. No other synchronization needed.
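As a rough illustration of the futures idea, here is a sketch that uses std::async and std::future from the standard library as a stand-in for Boost futures. The function names calc and process_all are hypothetical; calc mirrors the "add 100 to every point" work of Pool::calc() in the question.

```cpp
#include <future>
#include <vector>

// Same per-point work as Pool::calc() in the question: add 100 to each input.
std::vector<int> calc(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int v : in)
        out.push_back(100 + v);
    return out;
}

// Starts one asynchronous task per input vector, then waits for all of them.
std::vector<std::vector<int>> process_all(
        const std::vector<std::vector<int>>& inputs) {
    std::vector<std::future<std::vector<int>>> pending;
    for (const auto& in : inputs) {
        // `in` refers to an element of `inputs`, which outlives the task
        // because every future is consumed before this function returns.
        pending.push_back(
            std::async(std::launch::async, [&in] { return calc(in); }));
    }
    std::vector<std::vector<int>> results;
    for (auto& f : pending)
        results.push_back(f.get());  // blocks until that task has finished
    return results;
}
```

The get() calls are the only synchronization needed. One caveat for the original question: std::async gives no control over thread priorities, so this only fits if that requirement can be dropped or handled some other way.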

Is it possible to get a thread-locking mechanism in C++ with a std::atomic_flag?

Using MS Visual C++2012
A class has a member of type std::atomic_flag
class A {
public:
...
std::atomic_flag lockFlag;
A () { std::atomic_flag_clear (&lockFlag); }
};
There is an object of type A
A object;
which can be accessed by two (Boost) threads
void thr1(A* objPtr) { ... }
void thr2(A* objPtr) { ... }
The idea is to make a thread wait if the object is being accessed by the other thread.
The question is: is it possible to construct such a mechanism with an atomic_flag object? For the moment, I want something more lightweight than a boost::mutex.
By the way, the process involved in one of the threads is a very long query to a dBase which returns many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
I've tried something like this in each thread:
thr1 (A* objPtr) {
...
while (std::atomic_flag_test_and_set_explicit (&objPtr->lockFlag, std::memory_order_acquire)) {
boost::this_thread::sleep(boost::posix_time::millisec(100));
}
... /* Zone to protect */
std::atomic_flag_clear_explicit (&objPtr->lockFlag, std::memory_order_release);
... /* the process continues */
}
But with no success, because the second thread hangs. In fact, I don't completely understand the mechanism involved in the atomic_flag_test_and_set_explicit function, nor whether it returns immediately or can block until the flag can be locked.
It is also a mystery to me how to build a lock mechanism with a function that always sets the value and returns the previous value, with no option to only read the current setting.
Any suggestions are welcome.
By the way, the process involved in one of the threads is a very long query to a dBase which returns many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
Such a zone is known as the critical section. The simplest way to work with a critical section is to lock by mutual exclusion.
The mutex solution suggested is indeed the way to go, unless you can prove that this is a hotspot and the lock contention is a performance problem. Lock-free programming using just atomic and intrinsics is enormously complex and cannot be recommended at this level.
Here's a simple example showing how you could do this (live on http://liveworkspace.org/code/6af945eda5132a5221db823fa6bde49a):
#include <iostream>
#include <thread>
#include <mutex>
struct A
{
std::mutex mux;
int x;
A() : x(0) {}
};
void threadf(A* data)
{
for(int i=0; i<10; ++i)
{
std::lock_guard<std::mutex> lock(data->mux);
data->x++;
}
}
int main(int argc, const char *argv[])
{
A instance;
auto t1 = std::thread(threadf, &instance);
auto t2 = std::thread(threadf, &instance);
t1.join();
t2.join();
std::cout << instance.x << std::endl;
return 0;
}
It looks like you're trying to write a spinlock. Yes, you can do that with std::atomic_flag, but you are better off using std::mutex instead. Don't use atomics unless you really know what you're doing.
To actually answer the question asked: Yes, you can use std::atomic_flag to create a thread locking object called a spinlock.
#include <atomic>
class atomic_lock
{
public:
atomic_lock()
: lock_( ATOMIC_FLAG_INIT )
{}
void lock()
{
while ( lock_.test_and_set() ) { } // Spin until the lock is acquired.
}
void unlock()
{
lock_.clear();
}
private:
std::atomic_flag lock_;
};
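On the question of how a function that "always sets the value and returns the previous value" can act as a lock: test_and_set never blocks, and its return value says whether someone else already held the flag. A minimal sketch, with try_acquire and release as hypothetical helper names:

```cpp
#include <atomic>

// Returns true if the flag was clear and the caller now holds it.
bool try_acquire(std::atomic_flag& flag) {
    // test_and_set returns immediately. It sets the flag and reports the
    // previous value: false means it was free, so we just took the lock;
    // true means another owner already had it and nothing changed.
    return !flag.test_and_set(std::memory_order_acquire);
}

// Clears the flag so the next test_and_set sees it as free.
void release(std::atomic_flag& flag) {
    flag.clear(std::memory_order_release);
}
```

A spinlock is then just a loop that repeats try_acquire until it succeeds. Setting an already-set flag is harmless, which is why no separate read-only operation is needed.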

How would you implement your own reader/writer lock in C++11?

I have a set of data structures I need to protect with a readers/writer lock. I am aware of boost::shared_lock, but I would like to have a custom implementation using std::mutex, std::condition_variable and/or std::atomic so that I can better understand how it works (and tweak it later).
Each data structure (moveable, but not copyable) will inherit from a class called Commons which encapsulates the locking. I'd like the public interface to look something like this:
class Commons {
public:
void read_lock();
bool try_read_lock();
void read_unlock();
void write_lock();
bool try_write_lock();
void write_unlock();
};
...so that it can be publicly inherited by some:
class DataStructure : public Commons {};
I'm writing scientific code and can generally avoid data races; this lock is mostly a safeguard against the mistakes I'll probably make later. Thus my priority is low read overhead so I don't hamper a correctly-running program too much. Each thread will probably run on its own CPU core.
Could you please show me (pseudocode is ok) a readers/writer lock? What I have now is supposed to be the variant that prevents writer starvation. My main problem so far has been the gap in read_lock between checking whether a read is safe and actually incrementing the reader count, after which write_lock knows to wait.
void Commons::write_lock() {
write_mutex.lock();
reading_mode.store(false);
while(readers.load() > 0) {}
}
bool Commons::try_read_lock() {
if(reading_mode.load()) {
//if another thread calls write_lock here, bad things can happen
++readers;
return true;
} else return false;
}
I'm kind of new to multithreading, and I'd really like to understand it. Thanks in advance for your help!
Here's pseudo-code for a very simple reader/writer lock using a mutex and a condition variable. The mutex API should be self-explanatory. Condition variables are assumed to have a member wait(Mutex&) which (atomically!) drops the mutex and waits for the condition to be signaled. The condition is signaled with either signal(), which wakes up one waiter, or signal_all(), which wakes up all waiters.
read_lock() {
mutex.lock();
while (writer)
unlocked.wait(mutex);
readers++;
mutex.unlock();
}
read_unlock() {
mutex.lock();
readers--;
if (readers == 0)
unlocked.signal_all();
mutex.unlock();
}
write_lock() {
mutex.lock();
while (writer || (readers > 0))
unlocked.wait(mutex);
writer = true;
mutex.unlock();
}
write_unlock() {
mutex.lock();
writer = false;
unlocked.signal_all();
mutex.unlock();
}
That implementation has quite a few drawbacks, though.
Wakes up all waiters whenever the lock becomes available
If most of the waiters are waiting for a write lock, this is wasteful: most waiters will fail to acquire the lock, after all, and resume waiting. Simply using signal() doesn't work, because you do want to wake up everyone waiting for a read lock. So to fix that, you need separate condition variables for readability and writability.
No fairness. Readers starve writers
You can fix that by tracking the number of pending read and write locks, and either stop acquiring read locks once there are pending write locks (though you'll then starve readers!), or randomly waking up either all readers or one writer (assuming you use separate condition variables, see the section above).
Locks aren't dealt out in the order they are requested
To guarantee this, you'll need a real wait queue. You could e.g. create one condition variable for each waiter, and signal all readers or a single writer, both at the head of the queue, after releasing the lock.
Even pure read workloads cause contention due to the mutex
This one is hard to fix. One way is to use atomic instructions to acquire read or write locks (usually compare-and-exchange). If the acquisition fails because the lock is taken, you'll have to fall back to the mutex. Doing that correctly is quite hard, though. Plus, there'll still be contention - atomic instructions are far from free, especially on machines with lots of cores.
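The first two drawbacks (waking every waiter, and readers starving writers) can be addressed together. The following is a minimal C++11 sketch of the pseudo-code above with separate condition variables and a pending-writer count. The names rw_lock and run_rw_demo are hypothetical, and this variant deliberately prefers writers, so it can starve readers instead, as noted above.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class rw_lock {
public:
    void read_lock() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Readers queue behind pending writers so writers are not starved.
        readable_.wait(lock, [this] { return !writer_ && pending_writers_ == 0; });
        ++readers_;
    }
    void read_unlock() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (--readers_ == 0)
            writable_.notify_one();  // only a writer can be waiting on this
    }
    void write_lock() {
        std::unique_lock<std::mutex> lock(mutex_);
        ++pending_writers_;
        writable_.wait(lock, [this] { return !writer_ && readers_ == 0; });
        --pending_writers_;
        writer_ = true;
    }
    void write_unlock() {
        std::unique_lock<std::mutex> lock(mutex_);
        writer_ = false;
        if (pending_writers_ > 0)
            writable_.notify_one();  // hand over to one waiting writer
        else
            readable_.notify_all();  // otherwise let all readers in
    }

private:
    std::mutex mutex_;
    std::condition_variable readable_;
    std::condition_variable writable_;
    unsigned readers_ = 0;
    unsigned pending_writers_ = 0;
    bool writer_ = false;
};

// Hypothetical demo: several writers increment a plain int under the write
// lock while one reader polls it under the read lock.
int run_rw_demo(int writers, int increments) {
    rw_lock lock;
    int shared = 0;
    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w)
        pool.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                lock.write_lock();
                ++shared;
                lock.write_unlock();
            }
        });
    pool.emplace_back([&] {
        for (int i = 0; i < increments; ++i) {
            lock.read_lock();
            int seen = shared;  // no writer can hold the lock here
            (void)seen;
            lock.read_unlock();
        }
    });
    for (auto& t : pool) t.join();
    return shared;
}
```

Every operation still takes the mutex, so the last drawback (contention even for pure read workloads) is untouched by this sketch.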
Conclusion
Implementing synchronization primitives correctly is hard. Implementing efficient and fair synchronization primitives is even harder. And it hardly ever pays off. pthreads on Linux, for example, contains a reader/writer lock which uses a combination of futexes and atomic instructions, and which thus probably outperforms anything you can come up with in a few days of work.
Check this class:
//
// Multi-reader Single-writer concurrency base class for Win32
//
// (c) 1999-2003 by Glenn Slayden (glenn#glennslayden.com)
//
//
#include "windows.h"
class MultiReaderSingleWriter
{
private:
CRITICAL_SECTION m_csWrite;
CRITICAL_SECTION m_csReaderCount;
long m_cReaders;
HANDLE m_hevReadersCleared;
public:
MultiReaderSingleWriter()
{
m_cReaders = 0;
InitializeCriticalSection(&m_csWrite);
InitializeCriticalSection(&m_csReaderCount);
m_hevReadersCleared = CreateEvent(NULL,TRUE,TRUE,NULL);
}
~MultiReaderSingleWriter()
{
WaitForSingleObject(m_hevReadersCleared,INFINITE);
CloseHandle(m_hevReadersCleared);
DeleteCriticalSection(&m_csWrite);
DeleteCriticalSection(&m_csReaderCount);
}
void EnterReader(void)
{
EnterCriticalSection(&m_csWrite);
EnterCriticalSection(&m_csReaderCount);
if (++m_cReaders == 1)
ResetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
LeaveCriticalSection(&m_csWrite);
}
void LeaveReader(void)
{
EnterCriticalSection(&m_csReaderCount);
if (--m_cReaders == 0)
SetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
}
void EnterWriter(void)
{
EnterCriticalSection(&m_csWrite);
WaitForSingleObject(m_hevReadersCleared,INFINITE);
}
void LeaveWriter(void)
{
LeaveCriticalSection(&m_csWrite);
}
};
I didn't have a chance to try it, but the code looks OK.
You can implement a Readers-Writers lock following the exact Wikipedia algorithm from here (I wrote it):
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
int g_sharedData = 0;
int g_readersWaiting = 0;
std::mutex mu;
bool g_writerWaiting = false;
std::condition_variable cond;
void reader(int i)
{
std::unique_lock<std::mutex> lg{mu};
while(g_writerWaiting)
cond.wait(lg);
++g_readersWaiting;
// reading
std::cout << "\n reader #" << i << " is reading data = " << g_sharedData << '\n';
// end reading
--g_readersWaiting;
while(g_readersWaiting > 0)
cond.wait(lg);
cond.notify_one();
}
void writer(int i)
{
std::unique_lock<std::mutex> lg{mu};
while(g_writerWaiting)
cond.wait(lg);
// writing
std::cout << "\n writer #" << i << " is writing\n";
g_sharedData += i * 10;
// end writing
g_writerWaiting = true;
while(g_readersWaiting > 0)
cond.wait(lg);
g_writerWaiting = false;
cond.notify_all();
}//lg.unlock()
int main()
{
std::thread reader1{reader, 1};
std::thread reader2{reader, 2};
std::thread reader3{reader, 3};
std::thread reader4{reader, 4};
std::thread writer1{writer, 1};
std::thread writer2{writer, 2};
std::thread writer3{writer, 3};
std::thread writer4{writer, 4};
reader1.join();
reader2.join();
reader3.join();
reader4.join();
writer1.join();
writer2.join();
writer3.join();
writer4.join();
return(0);
}
I believe this is what you are looking for:
class Commons {
std::mutex write_m_;
std::atomic<unsigned int> readers_;
public:
Commons() : readers_(0) {
}
void read_lock() {
write_m_.lock();
++readers_;
write_m_.unlock();
}
bool try_read_lock() {
if (write_m_.try_lock()) {
++readers_;
write_m_.unlock();
return true;
}
return false;
}
// Note: unlock without holding a lock is Undefined Behavior!
void read_unlock() {
--readers_;
}
// Note: This implementation uses a busy wait to make other functions more efficient.
// Consider using try_write_lock instead! and note that the number of readers can be accessed using readers()
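void write_lock() {
// Take the mutex first so no new reader can enter, then wait for the
// readers that are already inside to leave.
write_m_.lock();
while (readers_) {}
}
bool try_write_lock() {
if (!write_m_.try_lock())
return false;
if (readers_) {
// Readers are still active: back out instead of waiting.
write_m_.unlock();
return false;
}
return true;
}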
// Note: unlock without holding a lock is Undefined Behavior!
void write_unlock() {
write_m_.unlock();
}
int readers() {
return readers_;
}
};
For the record since C++17 we have std::shared_mutex, see: https://en.cppreference.com/w/cpp/thread/shared_mutex
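A minimal usage sketch of std::shared_mutex, assuming a C++17 compiler. The class SharedCounter and the helper run_counter_demo are hypothetical names, and the class stands in for the kind of data structure described in the question.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

class SharedCounter {
public:
    int get() const {
        // Shared (read) lock: any number of readers may hold it at once.
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return value_;
    }
    void increment() {
        // Exclusive (write) lock: excludes readers and other writers.
        std::unique_lock<std::shared_mutex> lock(mutex_);
        ++value_;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so get() can stay const
    int value_ = 0;
};

// Hypothetical demo: `writers` threads increment while one thread reads.
int run_counter_demo(int writers, int increments) {
    SharedCounter counter;
    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w)
        pool.emplace_back([&] {
            for (int i = 0; i < increments; ++i)
                counter.increment();
        });
    pool.emplace_back([&] {
        for (int i = 0; i < increments; ++i)
            (void)counter.get();
    });
    for (auto& t : pool) t.join();
    return counter.get();
}
```

std::shared_lock and std::unique_lock release the mutex automatically at scope exit, which replaces the explicit read_unlock and write_unlock calls of the hand-written versions.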