Reader-Writer Preference Using Semaphores - C++

I'm currently working on a correct implementation of the Reader-Writer problem (see here).
I found this solution in the Qt docs, guaranteeing fair treatment of reader and writer threads by using a semaphore and a mutex. The basic code is this:
sem_t semaphore_;
pthread_mutex_t lock_;

void PalindromeDatabase::initializeLocks()
{
    sem_init(&semaphore_, 0, NumberOfReaders_);
    pthread_mutex_init(&lock_, nullptr);
}

void PalindromeDatabase::lockReaders()
{
    sem_wait(&semaphore_);
}

void PalindromeDatabase::unlockReaders()
{
    sem_post(&semaphore_);
}

void PalindromeDatabase::lockWriters()
{
    pthread_mutex_lock(&lock_);
    {
        for (int i = 0; i < NumberOfReaders_; ++i)
            sem_wait(&semaphore_);
    }
    pthread_mutex_unlock(&lock_);
}

void PalindromeDatabase::unlockWriters()
{
    for (int i = 0; i < NumberOfReaders_; ++i)
        sem_post(&semaphore_);
}
This seems like a very elegant solution. It seems easier and a lot more efficient than the pthread_rwlock_* behavior detailed in this SO answer.
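For context, calling code would drive these methods along these lines (a hypothetical sketch; the thread functions below are mine, not from the Qt docs):

void* readerThread(void* arg)
{
    PalindromeDatabase* db = static_cast<PalindromeDatabase*>(arg);
    db->lockReaders();   // takes one of the NumberOfReaders_ semaphore slots
    // ... read from the database ...
    db->unlockReaders(); // gives the slot back
    return nullptr;
}

void* writerThread(void* arg)
{
    PalindromeDatabase* db = static_cast<PalindromeDatabase*>(arg);
    db->lockWriters();   // drains every slot, so no reader can be active
    // ... write to the database ...
    db->unlockWriters(); // returns all the slots
    return nullptr;
}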
I was wondering if this code below is a correct adjustment of the Qt solution to prefer Reader threads.
int readersActive_;
sem_t semaphore_;
pthread_mutex_t lock_;
pthread_mutex_t readLock_;
pthread_cond_t wait_;

void PalindromeDatabase::initializeLocks()
{
    sem_init(&semaphore_, 0, numberOfReaders_);
    pthread_mutex_init(&lock_, nullptr);
    pthread_mutex_init(&readLock_, nullptr);
    pthread_cond_init(&wait_, nullptr);
}
void PalindromeDatabase::lockReaders()
{
    pthread_mutex_lock(&lock_);
    {
        pthread_mutex_lock(&readLock_);
        sem_wait(&semaphore_);
        pthread_mutex_unlock(&readLock_);
        ++readersActive_;
    }
    pthread_mutex_unlock(&lock_);
}
void PalindromeDatabase::unlockReaders()
{
    sem_post(&semaphore_);
    pthread_mutex_lock(&lock_);
    {
        --readersActive_;
        if (readersActive_ == 0)
            pthread_cond_signal(&wait_);
    }
    pthread_mutex_unlock(&lock_);
}

void PalindromeDatabase::lockWriters()
{
    pthread_mutex_lock(&lock_);
    {
        if (readersActive_ != 0)
        {
            do
            {
                pthread_cond_wait(&wait_, &lock_);
            } while (readersActive_ != 0);
        }
        pthread_mutex_lock(&readLock_);
        for (int i = 0; i < numberOfReaders_; ++i)
            sem_wait(&semaphore_);
        pthread_mutex_unlock(&readLock_);
    }
    pthread_mutex_unlock(&lock_);
}

void PalindromeDatabase::unlockWriters()
{
    for (int i = 0; i < numberOfReaders_; ++i)
        sem_post(&semaphore_);
}

There are quite a few issues with your code:
The semaphore is only used by the writer, and as such it is meaningless.
While locking for the writer you use the mutex; while unlocking you don't.
The readers signal a changed condition when the number of readers becomes zero, and the writer waits for the signal of the condition variable, but it doesn't check the condition.
Upon locking for the writer, if the number of readers is already zero, it will not actually lock.
Following up on my remark that it is easy: locking is still tricky. I thought about it, and I hope I cracked it with this pseudocode, which focuses on getting the order correct rather than the notation:
void lockReader()
{
    lock(rdmutex);              // make sure Reader and Writer can't interfere during locking
    lock(wrmutex);              // lock mutex so waitfor can unlock
    while (writer_)
        waitfor(wrcv, wrmutex); // no active writers
    ++readers_;                 // at least 1 reader present
    unlock(wrmutex);
    unlock(rdmutex);
}

void unlockReader()
{
    lock(rdmutex);
    bool noReaders = (--readers_ == 0);
    unlock(rdmutex);
    if (noReaders) signal(rdcv); // signal when no more readers
}

void lockWriter()
{
    lock(WritersLock);          // only 1 writer allowed
    lock(rdmutex);              // lock mutex so waitfor can unlock and no interference by lockReader
    while (readers_ != 0)
        waitfor(rdcv, rdmutex); // wait until no more readers
    lock(wrmutex);
    writer_ = true;             // a writer is busy
    unlock(wrmutex);
    unlock(rdmutex);
    // WritersLock is still locked
}

void unlockWriter()
{
    lock(wrmutex);
    writer_ = false;
    unlock(wrmutex);
    signal(wrcv);               // no more writer (until WritersLock is unlocked)
    unlock(WritersLock);
}
As it turns out, the Qt implementation is simpler, but my algorithm doesn't need to know the maximum number of readers in advance.
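For completeness, here is a minimal pthreads C++ rendering of that pseudocode (a sketch, untested; the globals and initializers are mine, the logic mirrors the pseudocode above):

#include <pthread.h>

pthread_mutex_t rdmutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t wrmutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t writersLock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t rdcv = PTHREAD_COND_INITIALIZER;
pthread_cond_t wrcv = PTHREAD_COND_INITIALIZER;
int readers_ = 0;
bool writer_ = false;

void lockReader()
{
    pthread_mutex_lock(&rdmutex);           // readers and writers can't interfere during locking
    pthread_mutex_lock(&wrmutex);           // lock so the wait below can unlock
    while (writer_)
        pthread_cond_wait(&wrcv, &wrmutex); // wait until no writer is active
    ++readers_;                             // at least 1 reader present
    pthread_mutex_unlock(&wrmutex);
    pthread_mutex_unlock(&rdmutex);
}

void unlockReader()
{
    pthread_mutex_lock(&rdmutex);
    bool noReaders = (--readers_ == 0);
    pthread_mutex_unlock(&rdmutex);
    if (noReaders)
        pthread_cond_signal(&rdcv);         // signal when no more readers
}

void lockWriter()
{
    pthread_mutex_lock(&writersLock);       // only 1 writer allowed
    pthread_mutex_lock(&rdmutex);           // no interference from lockReader
    while (readers_ != 0)
        pthread_cond_wait(&rdcv, &rdmutex); // wait until no more readers
    pthread_mutex_lock(&wrmutex);
    writer_ = true;                         // a writer is busy
    pthread_mutex_unlock(&wrmutex);
    pthread_mutex_unlock(&rdmutex);
    // writersLock stays locked until unlockWriter
}

void unlockWriter()
{
    pthread_mutex_lock(&wrmutex);
    writer_ = false;
    pthread_mutex_unlock(&wrmutex);
    pthread_cond_signal(&wrcv);             // no more writer (until writersLock is released)
    pthread_mutex_unlock(&writersLock);
}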


two threads entering the critical section on the mutex

I'm writing C++20 software that uses pthreads. The simplified example below shows how I have a shared resource, shared_resource, an int variable that is written by several threads, several times. To access the variable I use a mutex and a condition variable; a typical use of mutexes and condition variables.
num_readers is used as follows:
greater than 0: multiple readers are accessing the shared variable
0: neither writers nor readers are accessing the resource
-1: a writer is writing a new value to the resource. No more readers or writers are allowed until the writer releases the resource
The simplified version has no readers, to keep the focus on the problem. Since num_readers = num_readers - 1; can only be executed after a writer releases the resource by setting it to 0 and signaling the other writers, I expect the values 0 or -1, but never -2!
The problem is that when I run the following, I randomly get -2 values, so I guess some interleaving problem is occurring:
WAT>? num_readers -2
Process finished with exit code 1
#include <iostream>
#include <pthread.h>
#include <cstdlib>
#include <thread>
#include <random>

void* writer(void* parameters);

pthread_mutex_t mutex{PTHREAD_MUTEX_DEFAULT};
pthread_cond_t cond_writer = PTHREAD_COND_INITIALIZER;
int num_readers{0};
int shared_resource{0};

int main() {
    const int WRITERS{500};
    pthread_t writers[WRITERS];
    for (unsigned int i = 0; i < WRITERS; i++) {
        pthread_create(&writers[i], NULL, writer, NULL);
    }
    for (auto& writer_thread : writers) {
        pthread_join(writer_thread, NULL);
        std::cout << "[main] writer returned\n";
    }
    std::cout << "[main] exiting..." << std::endl;
    return 0;
}

void* writer(void* parameters) {
    for (int i = 0; i < 5; i++) {
        pthread_mutex_lock(&mutex);
        while (num_readers != 0) {
            if (num_readers < -1) {
                std::cout << "WAT>? num_readers " << std::to_string(num_readers) << "\n";
                exit(1);
            }
            pthread_cond_wait(&cond_writer, &mutex);
        }
        num_readers = num_readers - 1;
        pthread_mutex_unlock(&mutex);

        std::uniform_int_distribution<int> dist(1, 1000);
        std::random_device rd;
        int new_value = dist(rd);
        shared_resource = new_value;

        pthread_mutex_lock(&mutex);
        num_readers = 0;
        pthread_mutex_unlock(&mutex);
        pthread_cond_signal(&cond_writer);
    }
    return 0;
}
So: why isn't this code thread safe?
Some issues stand out in your code:
You modify the number of readers in the writer function. Only the reader function should do that.
The same goes for signaling the condition variable: it should only be signaled from the reader function.
Incrementing and decrementing the number of readers is usually done with a semaphore: an atomic int and an associated condition variable.
Here is the algorithm:
int reader()
{
    // indicate that a read is in progress:
    //
    // a. lock()
    // b. increment the number of readers.
    // c. unlock() as soon as possible, so other readers can also start reading.
    //
    // note that any write in progress will stop the thread here.
    pthread_mutex_lock(&mutex);
    ++num_readers;
    pthread_mutex_unlock(&mutex);

    // read protected data
    int result = shared_resource;

    // decrement the readers count.
    //
    // note that calls to lock()/unlock() are not necessary if
    // num_readers is atomic (i.e. std::atomic<int>)
    pthread_mutex_lock(&mutex);
    if (--num_readers == 0)
        pthread_cond_signal(&cond_writer); // the last reader signals the condition variable
    pthread_mutex_unlock(&mutex);
    return result;
}

void writer(int value)
{
    // lock
    pthread_mutex_lock(&mutex);

    // wait for no readers; the mutex is released while waiting for
    // the last read to complete. Note that access to num_readers is
    // done while the mutex is owned.
    while (num_readers != 0)
        pthread_cond_wait(&cond_writer, &mutex);

    // modify protected data.
    shared_resource = value;

    // unlock.
    pthread_mutex_unlock(&mutex);
}
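To make this concrete, a minimal driver for these two functions might look like the following (a sketch; the thread wrappers and thread counts are mine):

#include <cstdio>
#include <cstdint>
#include <pthread.h>

// Assumes the reader() and writer() functions above, with their globals.
void* readerThread(void*)
{
    int value = reader();       // concurrent reads are allowed
    printf("read %d\n", value);
    return nullptr;
}

void* writerThread(void* arg)
{
    writer((int)(intptr_t)arg); // blocks until the last reader signals
    return nullptr;
}

int main()
{
    pthread_t r[4], w[2];
    for (int i = 0; i < 2; ++i)
        pthread_create(&w[i], nullptr, writerThread, (void*)(intptr_t)(i + 1));
    for (int i = 0; i < 4; ++i)
        pthread_create(&r[i], nullptr, readerThread, nullptr);
    for (pthread_t t : w) pthread_join(t, nullptr);
    for (pthread_t t : r) pthread_join(t, nullptr);
    return 0;
}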

Atomically incrementing an integer in shared memory for multiple processes on linux x86-64 with gcc

The Question
What's a good way to increment a counter and signal once that counter reaches a given value (i.e., signaling a function waiting in block_until_full, below)? It's a lot like asking for a semaphore. The involved processes are communicating via shared memory (/dev/shm), and I'm currently trying to avoid using a library (like Boost).
Initial Solution
Declare a struct that contains a SignalingIncrementingCounter. This struct is allocated in shared memory, and a single process sets up the shared memory with this struct before the other processes begin. The SignalingIncrementingCounter contains the following three fields:
A plain old int to represent the counter's value.
Note: Due to the MESI caching protocol, we are guaranteed that if one CPU core modifies the value, the updated value will be reflected in other caches once the value is read from those caches.
A pthread mutex to guard the reading and incrementing of the integer counter
A pthread condition variable to signal when the integer has reached a desirable value
Other Solutions
Instead of using an int, I also tried using std::atomic<int>. I've tried just defining this field as a member of the SignalingIncrementingCounter class, and I've also tried allocating it into the struct at run time with placement new. It seems that neither worked better than the int.
The following should work.
The Implementation
I include most of the code, but I leave out parts of it for the sake of brevity.
signaling_incrementing_counter.h
#include <atomic>
#include <pthread.h>

struct SignalingIncrementingCounter {
public:
    void init(const int upper_limit_);
    void reset_to_empty();
    void increment(); // only valid when counting up
    void block_until_full(const char* comment = {""});

private:
    int upper_limit;
    volatile int value;
    pthread_mutex_t mutex;
    pthread_cond_t cv;
};
signaling_incrementing_counter.cpp
#include <cerrno>
#include <cstdio>
#include <ctime>
#include <pthread.h>
#include <stdexcept>
#include "signaling_incrementing_counter.h"

void SignalingIncrementingCounter::init(const int upper_limit_) {
    upper_limit = upper_limit_;
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        int retval = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (retval) {
            throw std::runtime_error("Error while setting sharedp field for mutex");
        }
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    {
        pthread_condattr_t attr;
        pthread_condattr_init(&attr);
        pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(&cv, &attr);
        pthread_condattr_destroy(&attr);
    }
    value = 0;
}

void SignalingIncrementingCounter::reset_to_empty() {
    pthread_mutex_lock(&mutex);
    value = 0;
    // No need to signal, because in my use-case there is no function that unblocks when the value changes to 0
    pthread_mutex_unlock(&mutex);
}

void SignalingIncrementingCounter::increment() {
    pthread_mutex_lock(&mutex);
    fprintf(stderr, "incrementing\n");
    ++value;
    if (value >= upper_limit) {
        pthread_cond_broadcast(&cv);
    }
    pthread_mutex_unlock(&mutex);
}

void SignalingIncrementingCounter::block_until_full(const char* comment) {
    struct timespec max_wait = {0, 0};
    pthread_mutex_lock(&mutex);
    while (value < upper_limit) {
        int val = value;
        printf("blocking until full, value is %i, for %s\n", val, comment);
        clock_gettime(CLOCK_REALTIME, &max_wait);
        max_wait.tv_sec += 5; // wait 5 seconds
        const int timed_wait_rv = pthread_cond_timedwait(&cv, &mutex, &max_wait);
        if (timed_wait_rv)
        {
            switch (timed_wait_rv) {
            case ETIMEDOUT:
                break;
            default:
                throw std::runtime_error("Unexpected error encountered. Investigate.");
            }
        }
    }
    pthread_mutex_unlock(&mutex);
}
Using either an int or std::atomic works.
One of the great things about the std::atomic interface is that it plays quite nicely with the int "interface", so the code is almost exactly the same. One can switch between the two implementations below by adding a #define USE_INT_IN_SHARED_MEMORY_FOR_SIGNALING_COUNTER true.
I'm not so sure about statically creating the std::atomic in shared memory, so I use placement new to allocate it. My guess is that relying on the static allocation would work, but it may technically be undefined behavior. Figuring that out is beyond the scope of my question, but a comment on that topic would be quite welcome.
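For what it's worth, the usual way to get such an object into /dev/shm is to map the region and placement-new the whole struct there, roughly as below (a sketch only; the shm name is made up and error handling is minimal):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <new>
#include "signaling_incrementing_counter.h"

// Map a shared-memory region and construct the counter in place.
SignalingIncrementingCounter* createSharedCounter(int upper_limit)
{
    int fd = shm_open("/signaling_counter_demo", O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return nullptr;
    if (ftruncate(fd, sizeof(SignalingIncrementingCounter)) == -1)
        return nullptr;
    void* mem = mmap(nullptr, sizeof(SignalingIncrementingCounter),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        return nullptr;
    // Placement new runs the constructors (including std::atomic's) in place,
    // rather than relying on the raw bytes of the region being a valid object.
    SignalingIncrementingCounter* counter = new (mem) SignalingIncrementingCounter;
    counter->init(upper_limit);
    return counter;
}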
signaling_incrementing_counter.h
#include <atomic>
#include <pthread.h>
#include "gpu_base_constants.h"

struct SignalingIncrementingCounter {
public:
    /**
     * We will either count up or count down to the given limit. Once the limit is reached,
     * whatever is waiting on this counter will be signaled and allowed to proceed.
     */
    void init(const int upper_limit_);
    void reset_to_empty();
    void increment(); // only valid when counting up
    void block_until_full(const char* comment = {""});
    // We don't have a use-case for block_until_non_full

private:
    int upper_limit;
#if USE_INT_IN_SHARED_MEMORY_FOR_SIGNALING_COUNTER
    volatile int value;
#else  // USE_INT_IN_SHARED_MEMORY_FOR_SIGNALING_COUNTER
    std::atomic<int> value;
    std::atomic<int>* value_ptr;
#endif // USE_INT_IN_SHARED_MEMORY_FOR_SIGNALING_COUNTER
    pthread_mutex_t mutex;
    pthread_cond_t cv;
};
signaling_incrementing_counter.cpp
#include <cerrno>
#include <cstdio>
#include <ctime>
#include <pthread.h>
#include <stdexcept>
#include "signaling_incrementing_counter.h"

void SignalingIncrementingCounter::init(const int upper_limit_) {
    upper_limit = upper_limit_;
#if !GPU_USE_INT_IN_SHARED_MEMORY_FOR_SIGNALING_COUNTER
    value_ptr = new (&value) std::atomic<int>(0);
#endif // GPU_USE_INT_IN_SHARED_MEMORY_FOR_SIGNALING_COUNTER
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        int retval = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (retval) {
            throw std::runtime_error("Error while setting sharedp field for mutex");
        }
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    {
        pthread_condattr_t attr;
        pthread_condattr_init(&attr);
        pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(&cv, &attr);
        pthread_condattr_destroy(&attr);
    }
    reset_to_empty(); // should be done at the end, since mutex functions are called
}

void SignalingIncrementingCounter::reset_to_empty() {
    int mutex_rv = pthread_mutex_lock(&mutex);
    if (mutex_rv) {
        throw std::runtime_error("Unexpected error encountered while grabbing lock. Investigate.");
    }
    value = 0;
    // No need to signal, because there is no function that unblocks when the value changes to 0
    pthread_mutex_unlock(&mutex);
}

void SignalingIncrementingCounter::increment() {
    fprintf(stderr, "incrementing\n");
    int mutex_rv = pthread_mutex_lock(&mutex);
    if (mutex_rv) {
        throw std::runtime_error("Unexpected error encountered while grabbing lock. Investigate.");
    }
    ++value;
    fprintf(stderr, "incremented\n");
    if (value >= upper_limit) {
        pthread_cond_broadcast(&cv);
    }
    pthread_mutex_unlock(&mutex);
}

void SignalingIncrementingCounter::block_until_full(const char* comment) {
    struct timespec max_wait = {0, 0};
    int mutex_rv = pthread_mutex_lock(&mutex);
    if (mutex_rv) {
        throw std::runtime_error("Unexpected error encountered while grabbing lock. Investigate.");
    }
    while (value < upper_limit) {
        int val = value;
        printf("blocking during increment until full, value is %i, for %s\n", val, comment);
        /*const int gettime_rv =*/ clock_gettime(CLOCK_REALTIME, &max_wait);
        max_wait.tv_sec += 5;
        const int timed_wait_rv = pthread_cond_timedwait(&cv, &mutex, &max_wait);
        if (timed_wait_rv)
        {
            switch (timed_wait_rv) {
            case ETIMEDOUT:
                break;
            default:
                pthread_mutex_unlock(&mutex);
                throw std::runtime_error("Unexpected error encountered. Investigate.");
            }
        }
    }
    pthread_mutex_unlock(&mutex);
}

Condition variable's "wait" function causing unexpected behaviour when predicate is provided

As an educational exercise I'm implementing a thread pool using condition variables. A controller thread creates a pool of threads that wait on a signal (an atomic variable being set to a value above zero). When signaled the threads wake, perform their work, and when the last thread is done it signals the main thread to awaken. The controller thread blocks until the last thread is complete. The pool is then available for subsequent re-use.
Every now and then I was getting a timeout on the controller thread waiting for the worker to signal completion (likely because of a race condition when decrementing the active work counter), so in an attempt to solidify the pool I replaced the "wait(lck)" form of the condition variable's wait method with "wait(lck, predicate)". Since doing this, the behaviour of the thread pool is such that it seems to permit decrementing of the active work counter below 0 (which is the condition for reawakening the controller thread) - I have a race condition. I've read countless articles on atomic variables, synchronisation, memory ordering, spurious and lost wakeups on stackoverflow and various other sites, have incorporated what I've learnt to the best of my ability, and still cannot for the life of me work out why the way I've coded the predicated wait just does not work. The counter should only ever be as high as the number of threads in the pool (say, 8) and as low as zero. I've started losing faith in myself - it just shouldn't be this hard to do something fundamentally simple. There is clearly something else I need to learn here :)
Suspecting a race condition, I ensured that the two variables that drive the awakening and termination of the pool are both atomic, and that both are only ever changed while protected with a unique_lock. Specifically, I made sure that when a request to the pool was launched, the lock was acquired, the active thread counter was changed from 0 to 8, the mutex was unlocked, and only then was notify_all called. The controller thread would then be awakened only when the active thread count reached zero, once the last worker thread decremented it that far and called notify_one.
In the worker thread, the condition variable waits and wakes only when the active thread count is greater than zero; the thread then unlocks the mutex, executes in parallel the work preassigned to its processor when the pool was created, re-acquires the mutex, and atomically decrements the active thread count. It then, while still supposedly protected by the lock, tests whether it was the last thread still active, and if so, unlocks the mutex again and calls notify_one to awaken the controller.
The problem is that the active thread counter repeatedly goes below zero after only 1 or 2 iterations. If I test the active thread count at the start of a new workload, I can find it down around -6; it is as if the pool was allowed to reawaken the controller thread before the work was completed.
Given that the thread counter and terminate flag are both atomic variables, are only ever modified while under the protection of the same mutex, and are updated with sequentially consistent memory ordering, I just cannot see how this is happening, and I'm lost.
#include <stdafx.h>
#include <Windows.h>
#include <iostream>
#include <thread>
using std::thread;
#include <mutex>
using std::mutex;
using std::unique_lock;
#include <condition_variable>
using std::condition_variable;
#include <atomic>
using std::atomic;
#include <chrono>
#include <vector>
using std::vector;

class IWorkerThreadProcessor
{
public:
    virtual void Process(int) = 0;
};

class MyProcessor : public IWorkerThreadProcessor
{
    int index_ = 0;
public:
    MyProcessor(int index)
    {
        index_ = index;
    }

    void Process(int threadindex)
    {
        for (int i = 0; i < 5000000; i++);
        std::cout << '(' << index_ << ':' << threadindex << ") ";
    }
};

#define MsgBox(x) do{ MessageBox(NULL, x, L"", MB_OK ); }while(false)

class ThreadPool
{
private:
    atomic<unsigned int> invocations_ = 0;
    //This goes negative when using the wait_for with predicate
    atomic<int> threadsActive_ = 0;
    atomic<bool> terminateFlag_ = false;
    vector<std::thread> threads_;
    atomic<unsigned int> poolSize_ = 0;
    mutex mtxWorker_;
    condition_variable cvSignalWork_;
    condition_variable cvSignalComplete_;

public:
    ~ThreadPool()
    {
        TerminateThreads();
    }

    void Init(std::vector<IWorkerThreadProcessor*>& processors)
    {
        unique_lock<mutex> lck2(mtxWorker_);
        threadsActive_ = 0;
        terminateFlag_ = false;
        poolSize_ = processors.size();
        for (int i = 0; i < poolSize_; ++i)
            threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
    }

    void ProcessWorkload(std::chrono::milliseconds timeout)
    {
        //Only used to see how many invocations I was getting through before experiencing the issue - sadly it's only one or two
        invocations_++;
        try
        {
            unique_lock<mutex> lck(mtxWorker_);
            //!!!!!! If I use the predicated wait this break will fire !!!!!!
            if (threadsActive_.load() != 0)
                __debugbreak();
            threadsActive_.store(poolSize_);
            lck.unlock();
            cvSignalWork_.notify_all();
            lck.lock();
            if (!cvSignalComplete_.wait_for(
                lck,
                timeout,
                [this] { return threadsActive_.load() == 0; })
                )
            {
                //As you can tell this has taken me through a journey trying to characterise the issue...
                if (threadsActive_ > 0)
                    MsgBox(L"Thread pool timed out with still active threads");
                else if (threadsActive_ == 0)
                    MsgBox(L"Thread pool timed out with zero active threads");
                else
                    MsgBox(L"Thread pool timed out with negative active threads");
            }
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }

    void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
    {
        do
        {
            unique_lock<mutex> lck(mtxWorker_);
            //!!!!!! If I use this predicated wait I see the failure !!!!!!
            cvSignalWork_.wait(
                lck,
                [this] {
                    return
                        threadsActive_.load() > 0 ||
                        terminateFlag_.load();
                });
            //!!!!!!!! Does not cause the failure but obviously will not handle
            //spurious wake-ups !!!!!!!!!!
            //cvSignalWork_.wait(lck);

            if (terminateFlag_.load())
                return;

            //Unlock to parallelise the work load
            lck.unlock();
            processor->Process(threadIndex);

            //Re-lock to decrement the work count
            lck.lock();
            //This returns the value before the subtraction, so theoretically if the previous value was 1 then we're the last thread going and we can now signal the controller thread to wake. This is the only place that the decrement happens, so I don't know how it could possibly go negative
            if (threadsActive_.fetch_sub(1, std::memory_order_seq_cst) == 1)
            {
                lck.unlock();
                cvSignalComplete_.notify_one();
            }
            else
                lck.unlock();
        } while (true);
    }

    void TerminateThreads()
    {
        try
        {
            unique_lock<mutex> lck(mtxWorker_);
            if (!terminateFlag_)
            {
                terminateFlag_ = true;
                lck.unlock();
                cvSignalWork_.notify_all();
                for (int i = 0; i < threads_.size(); i++)
                    threads_[i].join();
            }
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }
};

int main()
{
    std::vector<IWorkerThreadProcessor*> processors;
    for (int i = 0; i < 8; i++)
        processors.push_back(new MyProcessor(i));

    std::cout << "Instantiating thread pool\n";
    auto pool = new ThreadPool;
    std::cout << "Initialising thread pool\n";
    pool->Init(processors);
    std::cout << "Thread pool initialised\n";

    for (int i = 0; i < 200; i++)
    {
        std::cout << "Workload " << i << "\n";
        pool->ProcessWorkload(std::chrono::milliseconds(500));
        std::cout << "Workload " << i << " complete." << "\n";
    }

    for (auto a : processors)
        delete a;
    delete pool;
    return 0;
}
For reference, here is a revised version of the pool, instrumented with extra diagnostic counters:

class ThreadPool
{
private:
    atomic<unsigned int> invocations_ = 0;
    std::atomic<unsigned int> awakenings_ = 0;
    std::atomic<unsigned int> startedWorkloads_ = 0;
    std::atomic<unsigned int> completedWorkloads_ = 0;
    atomic<bool> terminate_ = false;
    atomic<bool> stillFiring_ = false;
    vector<std::thread> threads_;
    atomic<unsigned int> poolSize_ = 0;
    mutex mtx_;
    condition_variable cvSignalWork_;
    condition_variable cvSignalComplete_;

public:
    ~ThreadPool()
    {
        TerminateThreads();
    }

    void Init(std::vector<IWorkerThreadProcessor*>& processors)
    {
        unique_lock<mutex> lck2(mtx_);
        //threadsActive_ = 0;
        terminate_ = false;
        poolSize_ = processors.size();
        for (int i = 0; i < poolSize_; ++i)
            threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
        awakenings_ = 0;
        completedWorkloads_ = 0;
        startedWorkloads_ = 0;
        invocations_ = 0;
    }

    void ProcessWorkload(std::chrono::milliseconds timeout)
    {
        try
        {
            unique_lock<mutex> lck(mtx_);
            invocations_++;
            if (startedWorkloads_ != 0)
                __debugbreak();
            if (completedWorkloads_ != 0)
                __debugbreak();
            if (awakenings_ != 0)
                __debugbreak();
            if (stillFiring_)
                __debugbreak();
            stillFiring_ = true;
            lck.unlock();
            cvSignalWork_.notify_all();
            lck.lock();
            if (!cvSignalComplete_.wait_for(
                lck,
                timeout,
                //[this] { return this->threadsActive_.load() == 0; })
                [this] { return completedWorkloads_ == poolSize_ && !stillFiring_; })
                )
            {
                if (completedWorkloads_ < poolSize_)
                {
                    if (startedWorkloads_ < poolSize_)
                        MsgBox(L"Thread pool timed out with some threads unstarted");
                    else if (startedWorkloads_ == poolSize_)
                        MsgBox(L"Thread pool timed out with all threads started but not all completed");
                }
                else
                    __debugbreak();
            }

            if (completedWorkloads_ != poolSize_)
                __debugbreak();
            if (awakenings_ != poolSize_)
                __debugbreak();
            awakenings_ = 0;
            completedWorkloads_ = 0;
            startedWorkloads_ = 0;
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }

    void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
    {
        do
        {
            unique_lock<mutex> lck(mtx_);
            cvSignalWork_.wait(
                lck,
                [this] {
                    return
                        (stillFiring_ && (startedWorkloads_ < poolSize_)) ||
                        terminate_;
                });
            awakenings_++;
            if (startedWorkloads_ == 0 && terminate_)
                return;

            if (stillFiring_ && startedWorkloads_ < poolSize_) //guard against spurious wakeup
            {
                startedWorkloads_++;
                if (startedWorkloads_ == poolSize_)
                    stillFiring_ = false;

                lck.unlock();
                processor->Process(threadIndex);
                lck.lock();

                completedWorkloads_++;
                if (completedWorkloads_ == poolSize_)
                {
                    lck.unlock();
                    cvSignalComplete_.notify_one();
                }
                else
                    lck.unlock();
            }
            else
                lck.unlock();
        } while (true);
    }

    void TerminateThreads()
    {
        try
        {
            unique_lock<mutex> lck(mtx_);
            if (!terminate_) //Don't attempt to double-terminate
            {
                terminate_ = true;
                lck.unlock();
                cvSignalWork_.notify_all();
                for (int i = 0; i < threads_.size(); i++)
                    threads_[i].join();
            }
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }
};
I'm not certain if the following helps solve the problem, but I think the error is as shown below:
This
if (!cvSignalComplete_.wait_for(
    lck,
    timeout,
    [this] { return threadsActive_.load() == 0; })
    )
should be replaced by
if (!cvSignalComplete_.wait_for(
    lck,
    timeout,
    [&] { return threadsActive_.load() == 0; })
    )
It looks like the lambda is not accessing the instantiated member of the class. Here is some reference to back my case: look at the Lambda Capture section of this page.
Edit:
Another place where you are using wait with lambdas:
cvSignalWork_.wait(
    lck,
    [this] {
        return
            threadsActive_.load() > 0 ||
            terminateFlag_.load();
    });
Maybe modify all the lambdas and then see if it works?
The reason I'm looking at the lambdas is that it seems like a case similar to a spurious wakeup. Hope it helps.

A semaphore implementation with Peterson's N-process algorithm

I need feedback on my code for the following problem statement; am I on the right path?
Problem statement:
a. Implement a semaphore class that has a private int and three public methods: init, wait and signal. The wait and signal methods should behave as expected from a semaphore and must use Peterson's N process algorithm in their implementation.
b. Write a program that creates 5 threads that concurrently update the value of a shared integer and use an object of semaphore class created in part a) to ensure the correctness of the concurrent updates.
Here is my working program:
#include <iostream>
#include <pthread.h>
using namespace std;

pthread_mutex_t mid; // mutex id
int shared = 0;      // global shared variable

class semaphore {
    int counter;
public:
    semaphore() {
    }

    void init() {
        counter = 1; // initialise counter to 1 to give the first thread access
    }

    void wait() {
        pthread_mutex_lock(&mid); // lock the mutex here
        while (1) {
            if (counter > 0) { // check the counter value
                counter--;     // decrement counter
                break;         // break the loop
            }
        }
        pthread_mutex_unlock(&mid); // unlock mutex here
    }

    void signal() {
        pthread_mutex_lock(&mid);   // lock the mutex here
        counter++;                  // increment counter
        pthread_mutex_unlock(&mid); // unlock mutex here
    }
};

semaphore sm;

void* fun(void* id)
{
    sm.wait();   // call semaphore wait
    shared++;    // increment shared variable
    cout << "Inside thread " << shared << endl;
    sm.signal(); // call signal on the semaphore
    return NULL;
}

int main() {
    pthread_t id[5]; // thread ids for 5 threads
    sm.init();
    int i;
    for (i = 0; i < 5; i++) // create 5 threads
        pthread_create(&id[i], NULL, fun, NULL);
    for (i = 0; i < 5; i++)
        pthread_join(id[i], NULL); // join 5 threads to complete their task
    cout << "Outside thread " << shared << endl; // final value of shared variable
    return 0;
}
You need to release the mutex while spinning in the wait loop.
The test happens to work because the threads very likely run their functions start to finish before there is any context switch, and hence each one finishes before the next one even starts. So you have no contention over the semaphore. If you did, the threads would get stuck, with one waiter spinning while holding the mutex, preventing anyone else from accessing the counter and hence from releasing the spinner.
Here's an example that works (though it may still have an initialization race that causes it sporadically not to launch correctly). It looks more complicated, mainly because it uses the gcc built-in atomic operations. These are needed whenever you have more than a single core, since each core has its own cache. Declaring the counters 'volatile' only helps with compiler optimization; for what is effectively SMP, cache consistency requires cross-processor cache invalidation, which means special processor instructions need to be used. You can try replacing them with e.g. counter++ and counter-- (and the same for 'shared') and observe how on a multi-core CPU it won't work. (For more details on the gcc atomic ops, see https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fatomic-Builtins.html)
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdint.h>

class semaphore {
    pthread_mutex_t lock;
    int32_t counter;

public:
    semaphore() {
        init();
    }

    void init() {
        counter = 1; // initialise counter to 1 to give the first access
    }

    void spinwait() {
        while (true) {
            // Spin, waiting until we see a positive counter
            while (__atomic_load_n(&counter, __ATOMIC_SEQ_CST) <= 0)
                ;
            pthread_mutex_lock(&lock);
            if (__atomic_load_n(&counter, __ATOMIC_SEQ_CST) <= 0) {
                // Someone else stole the count from under us or it was
                // a fluke - keep trying
                pthread_mutex_unlock(&lock);
                continue;
            }
            // It's ours
            __atomic_fetch_add(&counter, -1, __ATOMIC_SEQ_CST);
            pthread_mutex_unlock(&lock);
            return;
        }
    }

    void signal() {
        pthread_mutex_lock(&lock); // lock the mutex here
        __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
        pthread_mutex_unlock(&lock); // unlock mutex here
    }
};

enum {
    NUM_TEST_THREADS = 5,
    NUM_BANGS = 1000
};

// Making semaphore sm volatile would be complicated, because the
// pthread_mutex library calls don't expect volatile arguments.
int shared = 0;               // Global shared variable
semaphore sm;                 // Semaphore protecting shared variable
volatile int num_workers = 0; // So we can wait until we have N threads

void* fun(void* id)
{
    usleep(100000); // 0.1s. Encourage context switch.
    const int worker = (intptr_t)id + 1;
    printf("Worker %d ready\n", worker);
    // Spin, waiting for all workers to be in a runnable state. These printouts
    // could be out of order.
    ++num_workers;
    while (num_workers < NUM_TEST_THREADS)
        ;

    // Go!
    // Bang on the semaphore. Odd workers increment, even decrement.
    if (worker & 1) {
        for (int n = 0; n < NUM_BANGS; ++n) {
            sm.spinwait();
            __atomic_fetch_add(&shared, 1, __ATOMIC_SEQ_CST);
            sm.signal();
        }
    } else {
        for (int n = 0; n < NUM_BANGS; ++n) {
            sm.spinwait();
            __atomic_fetch_add(&shared, -1, __ATOMIC_SEQ_CST);
            sm.signal();
        }
    }
    printf("Worker %d done\n", worker);
    return NULL;
}

int main() {
    pthread_t id[NUM_TEST_THREADS]; // thread ids

    // create test worker threads
    for (int i = 0; i < NUM_TEST_THREADS; i++)
        pthread_create(&id[i], NULL, fun, (void*)((intptr_t)(i)));

    // join threads to complete their task
    for (int i = 0; i < NUM_TEST_THREADS; i++)
        pthread_join(id[i], NULL);

    // Final value of shared variable. For an odd number of
    // workers this is the loop count, NUM_BANGS.
    printf("Test done. Final value: %d\n", shared);
    const int expected = (NUM_TEST_THREADS & 1) ? NUM_BANGS : 0;
    if (shared == expected) {
        puts("PASS");
    } else {
        printf("Value expected was: %d\nFAIL\n", expected);
    }
    return 0;
}

performance of boost::threads notify_one vs notify_all

I have an implementation of a semaphore to manage a shared resource using boost::threads. My implementation of the semaphore is shown below.
void releaseResource()
{
    boost::unique_lock<boost::mutex> lock(mutex_);
    boost::thread::id curr_thread = boost::this_thread::get_id();
    // scan through to identify the current thread
    for (unsigned int i = 0; i < org_count; ++i)
    {
        if (res_data[i].thread_id == curr_thread)
        {
            res_data[i].thread_id = boost::thread::id();
            res_data[i].available = true;
            break;
        }
    }
    ++count_;
    condition_.notify_one();
}

unsigned int acquireResource()
{
    boost::unique_lock<boost::mutex> lock(mutex_);
    while (count_ == 0) { // put thread to sleep until a resource becomes available
        condition_.wait(lock);
    }
    --count_;
    // Scan through the resource list and hand out an index for the resource
    unsigned int res_ctr;
    for (unsigned int i = 0; i < org_count; ++i)
    {
        if (res_data[i].available)
        {
            res_data[i].thread_id = boost::this_thread::get_id();
            res_data[i].available = false;
            res_ctr = i;
            break;
        }
    }
    return res_ctr;
}
My question is regarding the performance degradation I notice when there are more threads than available resources. If I use notify_all to wake threads instead of notify_one, as shown in the releaseResource() code, I see that performance improves. Has anyone else experienced something similar?
I am using Windows 7 and boost 1.52.
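For reference, the notify_all variant I am comparing against changes only the wake-up call at the end of releaseResource(); the bookkeeping is identical (sketch):

void releaseResource()
{
    boost::unique_lock<boost::mutex> lock(mutex_);
    // ... same per-thread bookkeeping as above ...
    ++count_;
    condition_.notify_all(); // wake every waiter instead of exactly one
}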