How can I synchronize three threads? - c++

My app consist of the main-process and two threads, all running concurrently and making use of three fifo-queues:
The fifo-q's are Qmain, Q1 and Q2. Internally the queues each use a counter that is incremented when an item is put into the queue, and decremented when an item is 'get'ed from the queue.
The processing involve two threads,
QMaster, which get from Q1 and Q2, and put into Qmain,
Monitor, which put into Q2,
and the main process, which get from Qmain and put into Q1.
The QMaster-thread loop consecutively checks the counts of Q1 and Q2 and if any items are in the q's, it get's them and puts them into Qmain.
The Monitor-thread loop obtains data from external sources, package it and put it into Q2.
The main-process of the app also runs a loop checking the count of Qmain, and if any items, get's an item
from Qmain at each iteration of the loop and process it further. During this processing it occasionally
puts an item into Q1 to be processed later (when it is get'ed from Qmain in turn).
The problem:
I've implemented all as described above, and it works for a randomly (short) time and then hangs.
I've managed to identify the source of the crashing to happen in the increment/decrement of the
count of a fifo-q (it may happen in any of them).
What I've tried:
Using three mutex's: QMAIN_LOCK, Q1_LOCK and Q2_LOCK, which I lock whenever any get/put operation
is done on a relevant fifo-q. Result: the app doesn't get going, just hangs.
The main-process must continue running all the time, must not be blocked on a 'read' (named-pipes fail, socketpair fail).
Any advice?
I think I'm not implementing the mutex's properly, how should it be done?
(Any comments on improving the above design also welcome)
[edit] below are the processes and the fifo-q-template:
Where & how in this should I place the mutex's to avoid the problems described above?
main-process:
...
start thread QMaster
start thread Monitor
...
while (!quit)
{
...
if (Qmain.count() > 0)
{
X = Qmain.get();
process(X)
delete X;
}
...
//at some random time:
Q2.put(Y);
...
}
Monitor:
{
while (1)
{
//obtain & package data
Q2.put(data)
}
}
QMaster:
{
while(1)
{
if (Q1.count() > 0)
Qmain.put(Q1.get());
if (Q2.count() > 0)
Qmain.put(Q2.get());
}
}
fifo_q:
template < class X* > class fifo_q
{
struct item
{
X* data;
item *next;
item() { data=NULL; next=NULL; }
}
item *head, *tail;
int count;
public:
fifo_q() { head=tail=NULL; count=0; }
~fifo_q() { clear(); /*deletes all items*/ }
void put(X x) { item i=new item(); (... adds to tail...); count++; }
X* get() { X *d = h.data; (...deletes head ...); count--; return d; }
clear() {...}
};

An example of how I would adapt the design and lock the queue access the posix way.
Remark that I would wrap the mutex to use RAII or use boost-threading and that I would use stl::deque or stl::queue as queue, but staying as close as possible to your code:
main-process:
...
start thread Monitor
...
while (!quit)
{
...
if (Qmain.count() > 0)
{
X = Qmain.get();
process(X)
delete X;
}
...
//at some random time:
QMain.put(Y);
...
}
Monitor:
{
while (1)
{
//obtain & package data
QMain.put(data)
}
}
fifo_q:
template < class X* > class fifo_q
{
struct item
{
X* data;
item *next;
item() { data=NULL; next=NULL; }
}
item *head, *tail;
int count;
pthread_mutex_t m;
public:
fifo_q() { head=tail=NULL; count=0; }
~fifo_q() { clear(); /*deletes all items*/ }
void put(X x)
{
pthread_mutex_lock(&m);
item i=new item();
(... adds to tail...);
count++;
pthread_mutex_unlock(&m);
}
X* get()
{
pthread_mutex_lock(&m);
X *d = h.data;
(...deletes head ...);
count--;
pthread_mutex_unlock(&m);
return d;
}
clear() {...}
};
Remark too that the mutex still needs to be initialized as in the example here and that count() should also use the mutex

Use the debugger. When your solution with mutexes hangs look at what the threads are doing and you will get a good idea about the cause of the problem.
What is your platform? In Unix/Linux you can use POSIX message queues (you can also use System V message queues, sockets, FIFOs, ...) so you don't need mutexes.
Learn about condition variables. By your description it looks like your Qmaster-thread is busy looping, burning your CPU.
One of your responses suggest you are doing something like:
Q2_mutex.lock()
Qmain_mutex.lock()
Qmain.put(Q2.get())
Qmain_mutex.unlock()
Q2_mutex.unlock()
but you probably want to do it like:
Q2_mutex.lock()
X = Q2.get()
Q2_mutex.unlock()
Qmain_mutex.lock()
Qmain.put(X)
Qmain_mutex.unlock()
and as Gregory suggested above, encapsulate the logic into the get/put.
EDIT: Now that you posted your code I wonder, is this a learning exercise?
Because I see that you are coding your own FIFO queue class instead of using the C++ standard std::queue. I suppose you have tested your class really well and the problem is not there.
Also, I don't understand why you need three different queues. It seems that the Qmain queue would be enough, and then you will not need the Qmaster thread that is indeed busy waiting.
About the encapsulation, you can create a synch_fifo_q class that encapsulates the fifo_q class. Add a private mutex variable and then the public methods (put, get, clear, count,...) should be like put(X) { lock m_mutex; m_fifo_q.put(X); unlock m_mutex; }
question: what would happen if you have more than one reader from the queue? Is it guaranteed that after a "count() > 0" you can do a "get()" and get an element?

I wrote a simple application below:
#include <queue>
#include <windows.h>
#include <process.h>
using namespace std;
queue<int> QMain, Q1, Q2;
CRITICAL_SECTION csMain, cs1, cs2;
unsigned __stdcall TMaster(void*)
{
while(1)
{
if( Q1.size() > 0)
{
::EnterCriticalSection(&cs1);
::EnterCriticalSection(&csMain);
int i1 = Q1.front();
Q1.pop();
//use i1;
i1 = 2 * i1;
//end use;
QMain.push(i1);
::LeaveCriticalSection(&csMain);
::LeaveCriticalSection(&cs1);
}
if( Q2.size() > 0)
{
::EnterCriticalSection(&cs2);
::EnterCriticalSection(&csMain);
int i1 = Q2.front();
Q2.pop();
//use i1;
i1 = 3 * i1;
//end use;
QMain.push(i1);
::LeaveCriticalSection(&csMain);
::LeaveCriticalSection(&cs2);
}
}
return 0;
}
unsigned __stdcall TMoniter(void*)
{
while(1)
{
int irand = ::rand();
if ( irand % 6 >= 3)
{
::EnterCriticalSection(&cs2);
Q2.push(irand % 6);
::LeaveCriticalSection(&cs2);
}
}
return 0;
}
unsigned __stdcall TMain(void)
{
while(1)
{
if (QMain.size() > 0)
{
::EnterCriticalSection(&cs1);
::EnterCriticalSection(&csMain);
int i = QMain.front();
QMain.pop();
i = 4 * i;
Q1.push(i);
::LeaveCriticalSection(&csMain);
::LeaveCriticalSection(&cs1);
}
}
return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
::InitializeCriticalSection(&cs1);
::InitializeCriticalSection(&cs2);
::InitializeCriticalSection(&csMain);
unsigned threadID;
::_beginthreadex(NULL, 0, &TMaster, NULL, 0, &threadID);
::_beginthreadex(NULL, 0, &TMoniter, NULL, 0, &threadID);
TMain();
return 0;
}

You should not lock second mutex when you already locked one.
Since the question is tagged with C++, I suggest to implement locking inside get/add logic of the queue class (e.g. using boost locks) or write a wrapper if your queue is not a class.
This allows you to simplify the locking logic.
Regarding the sources you have added: queue size check and following put/get should be done in one transaction otherwise another thread can edit the queue in between

Are you acquiring multiple locks simultaneously? This is generally something you want to avoid. If you must, ensure you are always acquiring the locks in the same order in each thread (this is more restrictive to your concurrency and why you generally want to avoid it).
Other concurrency advice: Are you acquiring the lock prior to reading the queue sizes? If you're using a mutex to protect the queues, then your queue implementation isn't concurrent and you probably need to acquire the lock before reading the queue size.

1 problem may occur due to this rule "The main-process must continue running all the time, must not be blocked on a 'read'". How did you implement it? what is the difference between 'get' and 'read'?
Problem seems to be in your implementation, not in the logic. And as you stated, you should not be in any dead lock because you are not acquiring another lock whether in a lock.

Related

Sharing common data element between threads

I am trying to use c++ queue. I know queue element can be accessed by existing threads, but I want to use same queue element. It will be used by all threads, e.g: same video frame I want to use between thread1 and thread2.
Once it's processed by two threads I want to pop next video frame. I know threads will access individual elements (queue element 1 by thread1 and queue element2 by thread2), but I want to access queue element 1 by both threads. I am unable to lock single buffer for both threads.
Please help me to share same queue element between threads.
You could put each frame in an envelope containing a counter that you decrease every time the queue is pop:ed. When the counter reaches zero, you remove the element. Example:
struct envelope_t {
int count;
frame_t frame;
envelope_t(const frame_t& f) : count(2), frame(f) {}
};
class myqueue {
std::queue<envelope_t> data;
std::mutex mtx_data;
std::condition_variable cv_data;
public:
template<class... Args>
decltype(auto) emplace(Args&&... args) {
std::lock_guard<std::mutex> lock(mtx_data);
auto rv = data.emplace(std::forward<Args>(args)...);
cv_data.notify_one();
return rv;
}
frame_t pop() {
std::unique_lock<std::mutex> lock(mtx_data);
while(data.size() == 0) cv_data.wait(lock);
if(--data.front().count) {
cv_data.notify_one();
return data.front().frame;
} else {
auto msg = std::move(data.front().frame);
data.pop();
return msg;
}
}
};

C++: both classes do not run concurrently

its my first time here. My code is suppose to make two ultrasonic sensors function at the same time using an mbed. However, i cant seem to make both classes void us_right() and void us_left() in the code run concurrently. Help please :(
#include "mbed.h"
DigitalOut triggerRight(p9);
DigitalIn echoRight(p10);
DigitalOut triggerLeft(p13);
DigitalIn echoLeft(p14);
//DigitalOut myled(LED1); //monitor trigger
//DigitalOut myled2(LED2); //monitor echo
PwmOut steering(p21);
PwmOut velocity(p22);
int distanceRight = 0, distanceLeft = 0;
int correctionRight = 0, correctionLeft = 0;
Timer sonarRight, sonarLeft;
float vo=0;
// Velocity expects -1 (reverse) to +1 (forward)
void Velocity(float v) {
v=v+1;
if (v>=0 && v<=2) {
if (vo>=1 && v<1) { //
velocity.pulsewidth(0.0014); // this is required to
wait(0.1); //
velocity.pulsewidth(0.0015); // move into reverse
wait(0.1); //
} //
velocity.pulsewidth(v/2000+0.001);
vo=v;
}
}
// Steering expects -1 (left) to +1 (right)
void Steering(float s) {
s=s+1;
if (s>=0 && s<=2) {
steering.pulsewidth(s/2000+0.001);
}
}
void us_right() {
sonarRight.reset();
sonarRight.start();
while (echoRight==2) {};
sonarRight.stop();
correctionRight = sonarLeft.read_us();
triggerRight = 1;
sonarRight.reset();
wait_us(10.0);
triggerRight = 0;
while (echoRight==0) {};
// myled2=echoRight;
sonarRight.start();
while (echoRight==1) {};
sonarRight.stop();
distanceRight = ((sonarRight.read_us()-correctionRight)/58.0);
printf("Distance from Right is: %d cm \n\r",distanceRight);
}
void us_left() {
sonarLeft.reset();
sonarLeft.start();
while (echoLeft==2) {};
sonarLeft.stop();
correctionLeft = sonarLeft.read_us();
triggerLeft = 1;
sonarLeft.reset();
wait_us(10.0);
triggerLeft = 0;
while (echoLeft==0) {};
// myled2=echoLeft;
sonarLeft.start();
while (echoLeft==1) {};
sonarLeft.stop();
distanceLeft = (sonarLeft.read_us()-correctionLeft)/58.0;
printf("Distance from Left is: %d cm \n\r",distanceLeft);
}
int main() {
while(true) {
us_right();
us_left();
}
if (distanceLeft < 10 || distanceRight < 10) {
if (distanceLeft < distanceRight) {
for(int i=0; i>-100; i--) { // Go left
Steering(i/100.0);
wait(0.1);
}
}
if (distanceLeft > distanceRight) {
for(int i=0; i>100; i++) { // Go Right
Steering(i/100.0);
wait(0.1);
}
}
}
wait(0.2);
}
You need to use some mechanism to create new threads or processes. Your implementation is sequential, there is nothing you do that tells the code to run concurrently.
You should take a look at some threads libraries (pthreads for example, or if you have access to c++11, there are thread functionality there) or how to create new processes as well as some kind of message passing interface between these processes.
Create two threads, one for each ultrasonic sensor:
void read_left_sensor() {
while (1) {
// do the reading
wait(0.5f);
}
}
int main() {
Thread left_thread;
left_thread.start(&read_left_sensor);
Thread right_thread;
right_thread.start(&read_right_sensor);
while (1) {
// put your control code for the vehicle here
wait(0.1f);
}
}
You can use global variables to write to when reading the sensor, and read them in your main loop. The memory is shared.
Your first problem is that you have placed code outside of your infinite while(true) loop. This later code will never run. But maybe you know this.
int main() {
while(true) {
us_right();
us_left();
} // <- Loops back to the start of while()
// You Never pass this point!!!
if (distanceLeft < 10 || distanceRight < 10) {
// Do stuff etc.
}
wait(0.2);
}
But, I think you are expecting us_right() and us_left() to happen at exactly the same time. You cannot do that in a sequential environment.
Jan Jongboom is correct in suggesting you could use Threads. This allows the 'OS' to designate time for each piece of code to run. But it is still not truly parallel. Each function (classes are a different thing) will get a chance to run. One will run, and when it is finished (or during a wait) another function will get its chance to run.
As you are using an mbed, I'd suggest that your project is an MBED OS 5 project
(you select this when you start a new project). Otherwise you'll need to use an RTOS library. There is a blinky example using threads that should sum it up well. Here is more info.
Threading can be dangerous for someone without experience. So stick to a simple implementation to start with. Make sure you understand what/why/how you are doing it.
Aside: From a hardware perspective, running ultrasonic sensors in parallel is actually not ideal. They both broadcast the same frequency, and can hear each other. Triggering them at the same time, they interfere with each other.
Imagine two people shouting words in a closed room. If they take turns, it will be obvious what they are saying. If they both shout at the same time, it will be very hard!
So actually, not being able to run in parallel is probably a good thing.

Locking multiple parts of an array - Multithreading

I'm trying to implement a threadsafe locking mechanism for an array with the following intended use case:
Request the indexes you want to lock and try to acquire them. If you fail to acquire ANY index, bail out, and try again (essentially spin).
Once the necessary locks have been acquired perform processing on these indexes.
Release the acquired locks!
I'm using the below code to test the lock - it just increments a test count, with the same indexes being specified for each iteration (so it forces access to be sequential). The only problem is it doesn't work, and I'm kind of stumped...
I have a feeling I'm missing some sort of key race condition, but I can't identity it yet :(
#pragma omp parallel for
for (int i = 0; i < activeItems; i++)
{
std::vector<int> lockedBucketIndexes = {0, 1, 2, 3};
//try and get a lock on the buckets we want, otherwise keep trying
while (!spatialHash->TryAcquireBucketLocks(lockedBucketIndexes))
{
//TODO - do some fancy backoff here
}
testCount++;
spatialHash->DropBucketLocks(lockedBucketIndexes);
}
The class/methods that do the locking:
std::vector<int> _bucketLocks;
SpinLock _bucketLockLock;
bool SpatialHash::TryAcquireBucketLocks(std::vector<int> bucketIndexes)
{
bool success = true;
//try and get a lock to set our bucket locks... lockception
_bucketLockLock.lock();
//quickly check that the buckets we want are free
for each (int bucketIndex in bucketIndexes)
{
if (_bucketLocks[bucketIndex] > 0)
{
success = false;
break;
}
}
//if all the buckets are free, set them to occupied
if (success)
{
for each (int bucketIndex in bucketIndexes)
{
_bucketLocks[bucketIndex] = 1;
}
}
//let go of the lock
_bucketLockLock.unlock();
return success;
}
void DropBucketLocks(std::vector<int> bucketIndexes)
{
//I have no idea why these locks are required
//It seems to almost work with them though...
_bucketLockLock.lock();
for each (int bucketIndex in bucketIndexes)
{
_bucketLocks[bucketIndex] = 0;
}
_bucketLockLock.unlock();
return true;
}
The spinlock class:
class SpinLock {
std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
void lock() {
while (locked.test_and_set(std::memory_order_acquire)) { ; }
}
void unlock() {
locked.clear(std::memory_order_release);
}
};

flushing thread local buffer at end of parallel loop with TBB

I want to parallelize a loop (using tbb) which contains some expensive but vectorizable iterations (randomly spread). My idea was to buffer those and flush the buffer whenever it reaches the vector size. Such a buffer must be thread-local. For example,
// dummy for testing
void do_vectorized_work(size_t k, size_t*indices)
{}
// dummy for testing
bool requires_expensive_work(size_t k)
{ return (k&7)==0; }
struct buffer
{
size_t K=0, B[vector_size];
void load(size_t i)
{
B[K++]=i;
if(K==vector_size)
flush();
}
void flush()
{
do_vectorized_work(K,B);
K=0;
}
};
void do_work_in_parallel(size_t N)
{
tbb::enumerable_thread_specific<buffer> tl_buffer;
tbb::parallel_for(size_t(0),N,[&](size_t i)
{
if(requires_expensive_work(i))
tl_buffer.local().load(i);
});
}
However, this leaves the buffers non-empty, so I still have to flush each of them a final time
for(auto&b:tl_buffer)
b.flush();
but this is serial! Of course, I can also try to do this in parallel
using tl_range = typename tbb::enumerable_thread_specific<buffer>::range_type;
tbb::parallel_for(tl_buffer.range(),[](tl_range const&range)
{
for(auto r:range)
r->flush();
});
But I'm not sure this is efficient (since there are only as many buffers as there are threads). I was wondering whether it is possible to avoid this final flush after the event. I.e. is it possible to use tbb::tasks (replacing tbb::parallel_for) in such a way that each thread's final task is to flush its buffer?
No, a worker thread does not have complete information about whether this particular task is the last task of the given work or not (this is how work-stealing works). Thus, it is not possible to implement such a function on the level of parallel_for or the scheduler itself. Thus, I'd recommend you to go with these two approaches you describe.
There are two other things you can do about this though.
make it asynchronous. I.e. enqueue a task which will get everything flushed. It will help to remove this code from the hot path on the main thread. Just be careful if there are any dependencies which need to be set on completion of this task.
use tbb::task_scheduler_observer in order to initialize thread-specific data and release it lazily when threads get shut down or when there is no work remains for some time. The latter requires using local observer feature which is not yet officially supported but remains stable for few years already.
Example:
#define TBB_PREVIEW_LOCAL_OBSERVER 1
#include <tbb/tbb.h>
#include <assert.h>
typedef void * buffer_t;
const static int bufsz = 1024;
class thread_buffer_allocator: public tbb::task_scheduler_observer {
tbb::enumerable_thread_specific<buffer_t> _buf;
public:
thread_buffer_allocator( )
: tbb::task_scheduler_observer( /*local=*/ true ) {
observe(true); // activate the observer
}
~thread_buffer_allocator( ) {
observe(false); // deactivate the observer
for(auto &b : _buf) {
printf("destructor: cleared: %p\n", b);
free(b);
}
}
/*override*/ void on_scheduler_entry( bool worker ) {
assert(_buf.local() == nullptr);
_buf.local() = malloc(bufsz);
printf("on entry: %p\n", _buf.local());
}
/*override*/ void on_scheduler_exit( bool worker ) {
printf("on exit\n");
if(_buf.local()) {
printf("on exit: cleared %p\n", _buf.local());
free(_buf.local());
_buf.local() = nullptr;
}
}
};
int main() {
thread_buffer_allocator buffers_scope;
tbb::parallel_for(0, 1024*1024*1024, [&](auto i){
usleep(i%3);
});
return 0;
}
It occurred to me that this can be solved by reduction.
struct buffer
{
std::size_t K=0, B[vector_size];
void load(std::size_t i)
{
B[K++]=i;
if(K==vector_size) flush();
}
void flush()
{
do_vectorized_work(K,B);
K=0;
}
buffer(buffer const&, tbb::split)
{}
void operator()(tbb::block_range<std::size_t> const&range)
{ for(i:range) load(i); }
bool empty()
{ return K==0; }
std::size_t pop()
{ return K? B[--K] : 0; }
void join(buffer&rhs)
{ while(!rhs.empty()) load(rhs.pop()); }
};
void do_work_in_parallel(std::size_t N)
{
buffer buff;
tbb::parallel_reduce(tbb::block_range<std::size_t>(0,N,vector_size),buff);
if(!buff.empty())
buff.flush();
}

c++11 multi-reader / multi-writer queue using atomics for object state and perpetual incremented indexes

I am using atomics and a circular buffer in order to implement a multi-reader threads, multi-writer threads object pool.
It is difficult to investigate because instrumenting code leads to bug vanishment !
The model
Producers (or writer threads) request an Element to the Ring in order to 'prepare' the element. When terminated, the writer thread changes the element state so a reader can 'consume' it. After that, the element becomes available again for writing.
Consumers (or reader threads) request an object to the Ring in order to 'read' the object.
After 'releasing' the object, the object is in a state::Ready state, eg available to be consume by a reader thread.
It can fail if no object is available eg the next free object in the Ring is not on state::Unused state.
The 2 classes, Element and Ring
Element :
to be written, a writer thread must successfully exchange the _state member from state::Unused to state::LockForWrite
when finished, the writer thread force the state to state::Ready (it should be the only to handle this Element)
to be read, a rader thread must successfully exchange the _state member from state::Ready to state::LockForRead
when finished, the reader thread force the state to state::Unused (it should be the only to handle this Element)
Summarized :
writers lifecycle : state::Unused -> state::LockForWrite -> state::Ready
readers lifecycle : state::Ready -> state::LockForRead -> state::Unused
Ring
has a vector of Element , seen as a circular buffer.
std::atomic<int64_t> _read, _write; are the 2 indexes used to access the elements via :
_elems[ _write % _elems.size() ] for writers,
_elems[ _read % _elems.size() ] for readers.
When a reader has successfully LockForRead an object, the _read index is incremented.
When a writer has successfully LockForWrite an object, the _write index is incremented.
The main :
We add to a vector some writers and readers threads sharing the same Ring. Each thread just try to get_read or get_write element and release them just after.
Based on Element transition everything should be fine but one can observe that the Ring at some point gets blocked like because some elements in the ring are in state state::Ready with a _write % _elems.size() index pointing on it and symetrically, some elements in the ring are in state state::Unused with a _read % _elems.size() index pointing on it ! Both = deadlock.
#include<atomic>
#include<vector>
#include<thread>
#include<iostream>
#include<cstdint>
typedef enum : int
{
Unused, LockForWrite, Ready, LockForRead
}state;
class Element
{
std::atomic<state> _state;
public:
Element():_state(Unused){ }
// a reader need to successfully make the transition Ready => LockForRead
bool lock_for_read() { state s = Ready; return _state.compare_exchange_strong(s, LockForRead); }
void unlock_read() { state s = Unused; _state.store(s); }
// a reader need to successfully make the transition Unused => LockForWrite
bool lock_for_write() { state s = Unused; return _state.compare_exchange_strong(s, LockForWrite); }
void unlock_write() { state s = Ready; _state.store(s); }
};
class Ring
{
std::vector<Element> _elems;
std::atomic<int64_t> _read, _write;
public:
Ring(size_t capacity)
: _elems(capacity), _read(0), _write(0) {}
Element * get_for_read() {
Element * ret = &_elems[ _read.load() % _elems.size() ];
if (!ret->lock_for_read()) // if success, the object belongs to the caller thread as reader
return NULL;
_read.fetch_add(1); // success! incr _read index
return ret;
}
Element * get_for_write() {
Element * ret = &_elems[ _write.load() % _elems.size() ];
if (!ret->lock_for_write())// if success, the object belongs to the caller thread as writer
return NULL;
_write.fetch_add(1); // success! incr _write index
return ret;
}
void release_read(Element* e) { e->unlock_read();}
void release_write(Element* e) { e->unlock_write();}
};
int main()
{
const int capacity = 10; // easy to process modulo[![enter image description here][1]][1]
std::atomic<bool> stop=false;
Ring ring(capacity);
std::function<void()> writer_job = [&]()
{
std::cout << "writer starting" << std::endl;
Element * e;
while (!stop)
{
if (!(e = ring.get_for_write()))
continue;
// do some real writer job ...
ring.release_write(e);
}
};
std::function<void()> reader_job = [&]()
{
std::cout << "reader starting" << std::endl;
Element * e;
while (!stop)
{
if (!(e = ring.get_for_read()))
continue;
// do some real reader job ...
ring.release_read(e);
}
};
int nb_writers = 1;
int nb_readers = 2;
std::vector<std::thread> threads;
threads.reserve(nb_writers + nb_readers);
std::cout << "adding writers" << std::endl;
while (nb_writers--)
threads.push_back(std::thread(writer_job));
std::cout << "adding readers" << std::endl;
while (nb_readers--)
threads.push_back(std::thread(reader_job));
// wait user key press, halt in debugger after 1 or 2 seconds
// in order to reproduce problem and watch ring
std::cin.get();
stop = true;
std::cout << "waiting all threads...\n";
for (auto & th : threads)
th.join();
std::cout << "end" << std::endl;
}
This "watch debugger screeshot" has been took pausing the program after running 1 second. As you can see, _read is pointing to the element 8 marked as state::Unused so no transition can unblock this state for this reader, except a writer but _write index is pointing on element 0 with state state::Ready !
My question: what did I missed in this ? Structurally I am sure the sequence is correct but I am missing some atomic trick ...
os tested : rhel5/gcc 4.1.2, rhel 7/gcc 4.8, win10/ms visual 2015, win10/mingw
Yann's answer is correct about the problem: your threads can create "holes" in the sequence by reading and writing elements out-of-order if there's a delay between the read/write lock and the increment of the index. The fix is to verify that the index has not changed between the initial read and the increment, a la:
class Element
{
std::atomic<state> _state;
public:
Element():_state(Unused){ }
// a reader need to successfully make the transition Ready => LockForRead
bool lock_for_read() {
state s = Ready;
return _state.compare_exchange_strong(s, LockForRead);
}
void abort_read() { _state = Ready; }
void unlock_read() { state s = Unused; _state.store(s); }
// a reader need to successfully make the transition Unused => LockForWrite
bool lock_for_write() {
state s = Unused;
return _state.compare_exchange_strong(s, LockForWrite);
}
void abort_write() { _state = Unused; }
void unlock_write() { state s = Ready; _state.store(s); }
};
class Ring
{
std::vector<Element> _elems;
std::atomic<int64_t> _read, _write;
public:
Ring(size_t capacity)
: _elems(capacity), _read(0), _write(0) {}
Element * get_for_read() {
auto i = _read.load();
Element * ret = &_elems[ i % _elems.size() ];
if (ret->lock_for_read()) {
// if success, the object belongs to the caller thread as reader
if (_read.compare_exchange_strong(i, i + 1))
return ret;
// Woops, reading out of order.
ret->abort_read();
}
return NULL;
}
Element * get_for_write() {
auto i = _write.load();
Element * ret = &_elems[ i % _elems.size() ];
if (ret->lock_for_write()) {
// if success, the object belongs to the caller thread as writer
if (_write.compare_exchange_strong(i, i + 1))
return ret;
// Woops, writing out of order.
ret->abort_write();
}
return NULL;
}
void release_read(Element* e) { e->unlock_read();}
void release_write(Element* e) { e->unlock_write();}
};
You do not have atomic section around the increment of the two shared counters _read and _write.
That looks bad to me, you could switch another element without meaning to.
Imagine this scenario,
1 reader R1 and 1 writer W are happily cooperating.
Reader 2 executes : Element * ret = &_elems[ _read.load() % _elems.size() ];
and gets pushed off the cpu.
Now R1 and W are still playing together, so the positions of _read and _write are now arbitrary w.r.t. the element ret that R2 is pointing.
Now at some point R2 gets scheduled, and it so happens that *ret_ is readable (again possibly, R1 and W went around the block a few times).
Ouch, as you see, we will read it, and increment "_read", but _read has no relation to _ret. This creates kind of holes, elements that have not been read, but that are below _read index.
So, make critical sections to ensure that increment of _read/_write is done in the same semantic step as the actual lock.