I have an atomic counter (std::atomic<uint32_t> count) which deals out sequentially incrementing values to multiple threads.
uint32_t my_val = ++count;
Before I get my_val I want to ensure that the increment won't overflow (i.e., wrap back to 0):
if (count == std::numeric_limits<uint32_t>::max())
throw std::runtime_error("count overflow");
I'm thinking this is a naive check, because if it is performed by two threads before either increments the counter, the second thread to increment will get 0 back:
if (count == std::numeric_limits<uint32_t>::max()) // if 2 threads execute this
throw std::runtime_error("count overflow");
uint32_t my_val = ++count; // before either gets here - possible overflow
As such I guess I need to use a CAS operation to make sure that when I increment my counter, I am indeed preventing a possible overflow.
So my questions are:
Is my implementation correct?
Is it as efficient as it can be (specifically do I need to check against max twice)?
My code (with a working example, using uint16_t so overflow is reached quickly) follows:
#include <iostream>
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <thread>
std::atomic<uint16_t> count;
uint16_t get_val() // called by multiple threads
{
uint16_t my_val;
do
{
my_val = count;
// make sure I get the next value
if (count.compare_exchange_strong(my_val, my_val + 1))
{
// if I got the next value, make sure we don't overflow
if (my_val == std::numeric_limits<uint16_t>::max())
{
count = std::numeric_limits<uint16_t>::max() - 1;
throw std::runtime_error("count overflow");
}
break;
}
// if I didn't then check if there are still numbers available
if (my_val == std::numeric_limits<uint16_t>::max())
{
count = std::numeric_limits<uint16_t>::max() - 1;
throw std::runtime_error("count overflow");
}
// there are still numbers available, so try again
}
while (1);
return my_val + 1;
}
void run()
try
{
while (1)
{
if (get_val() == 0)
exit(1);
}
}
catch(const std::runtime_error& e)
{
// overflow
}
int main()
{
while (1)
{
count = 1;
std::thread a(run);
std::thread b(run);
std::thread c(run);
std::thread d(run);
a.join();
b.join();
c.join();
d.join();
std::cout << ".";
}
return 0;
}
Yes, you need to use a CAS operation.
std::atomic<uint16_t> g_count;
uint16_t get_next() {
    uint16_t cur_val;
    uint16_t new_val = 0;
    do {
        cur_val = g_count; // 1
        if (cur_val == std::numeric_limits<uint16_t>::max()) { // 2
            throw std::runtime_error("count overflow");
        }
        new_val = cur_val + 1; // 3
    } while (!std::atomic_compare_exchange_weak(&g_count, &cur_val, new_val)); // 4
return new_val;
}
The idea is the following: once g_count == std::numeric_limits<uint16_t>::max(), the get_next() function will always throw an exception.
Steps:
Get current value of the counter
If it is maximal, throw an exception (no numbers available anymore)
Get new value as increment of the current value
Try to atomically set new value. If we failed to set it (it was done by another thread already), try again.
If efficiency is a big concern then I'd suggest not being so strict on the check. I'm guessing that under normal use overflow won't be an issue, but do you really need the full 65K range (your example uses uint16_t)?
It would be easier if you assume some maximum on the number of threads you have running. This is a reasonable limit, since no program has unlimited concurrency. So if you have N threads you can simply reduce your overflow limit to 65K - N. To check whether you overflow you then don't need a CAS:
uint16_t current = count.load(std::memory_order_relaxed);
if( current >= (std::numeric_limits<uint16_t>::max() - num_threads - 1) )
throw std::runtime_error("count overflow");
count.fetch_add(1,std::memory_order_relaxed);
This creates a soft-overflow condition. If two threads come here at once both of them will potentially pass, but that's okay since the count variable itself never overflows. Any future arrivals at this point will logically overflow (until count is reduced again).
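Going the other way, re-opening the soft limit is just an unconditional decrement, as the parenthetical above implies. A sketch (release_val is my name; it assumes the same count variable):

void release_val()
{
    // A plain decrement suffices; no CAS loop is needed here.
    count.fetch_sub(1, std::memory_order_relaxed);
}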
It seems to me that there's still a race condition in which count will momentarily be set to 0, such that another thread will see the 0 value.
Assume that count is at std::numeric_limits<uint16_t>::max() and two threads try to get the incremented value. At the moment Thread 1 performs count.compare_exchange_strong(my_val, my_val + 1), count is set to 0, and that is what Thread 2 will see if it happens to call and complete get_val() before Thread 1 has a chance to set count back to max() - 1.
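For reference, a variant of get_val() that closes this window by checking before the exchange, so count never momentarily wraps to 0 (a sketch using the member-function API; it is essentially the same approach as the CAS answer above):

uint16_t get_val()
{
    uint16_t cur = count.load();
    do
    {
        if (cur == std::numeric_limits<uint16_t>::max())
            throw std::runtime_error("count overflow");
        // On failure, compare_exchange_weak reloads cur, so the check repeats.
    } while (!count.compare_exchange_weak(cur, cur + 1));
    return cur + 1;
}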
I'm trying to implement some algorithm using threads that must be synchronized at some moment. More or less the sequence for each thread should be:
1. Try to find a solution with current settings.
2. Synchronize solution with other threads.
3. If any of the threads found a solution, end work.
4. (empty - kept to stay in line with the example below)
5. Modify parameters for algorithm and jump to 1.
Here is a toy example with the algorithm changed to just random number generation - all threads should end if at least one of them finds 0.
#include <iostream>
#include <condition_variable>
#include <cstdlib>
#include <ctime>
#include <mutex>
#include <thread>
#include <vector>
const int numOfThreads = 8;
std::condition_variable cv1, cv2;
std::mutex m1, m2;
int lockCnt1 = 0;
int lockCnt2 = 0;
int solutionCnt = 0;
void workerThread()
{
while(true) {
// 1. do some important work
int r = rand() % 1000;
// 2. synchronize and get results from all threads
{
std::unique_lock<std::mutex> l1(m1);
++lockCnt1;
if (r == 0) ++solutionCnt; // gather solutions
if (lockCnt1 == numOfThreads) {
// last thread ends here
lockCnt2 = 0;
cv1.notify_all();
}
else {
cv1.wait(l1, [&] { return lockCnt1 == numOfThreads; });
}
}
// 3. if solution found then quit all threads
if (solutionCnt > 0) return;
// 4. if not, then set lockCnt1 to 0 to have section 2. working again
{
std::unique_lock<std::mutex> l2(m2);
++lockCnt2;
if (lockCnt2 == numOfThreads) {
// last thread ends here
lockCnt1 = 0;
cv2.notify_all();
}
else {
cv2.wait(l2, [&] { return lockCnt2 == numOfThreads; });
}
}
// 5. Setup new algorithm parameters and repeat.
}
}
int main()
{
srand(time(NULL));
std::vector<std::thread> v;
for (int i = 0; i < numOfThreads ; ++i) v.emplace_back(std::thread(workerThread));
for (int i = 0; i < numOfThreads ; ++i) v[i].join();
return 0;
}
The questions I have are about sections 2. and 4. of the code above.
A) In section 2 all threads are synchronized and the solutions (if any were found) are gathered; all of this is driven by the lockCnt1 variable. Compared to a single use of a condition_variable, I found it hard to set lockCnt1 back to zero safely so that section 2. can be reused the next time around. Because of that I introduced section 4. Is there a better way to do this (without introducing section 4.)?
B) It seems that most examples show condition_variable in a 'producer-consumer' scenario. Is there a better way to synchronize all threads in a case where all of them are 'producers'?
Edit: Just to be clear, I didn't describe the algorithm's details since they are not important here. What matters is that each loop execution must yield either all of its solutions or none, and mixing solutions from different iterations is not allowed. The described sequence of execution must be followed, and the question is how to achieve such synchronization between threads.
A) You could just not reset lockCnt1 to 0, but keep incrementing it instead. The condition lockCnt1 == numOfThreads then changes to lockCnt1 % numOfThreads == 0, and you can then drop block #4. In the future you could also use std::experimental::barrier to get the threads to meet.
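A minimal generation-based reusable barrier sketch (my own class name; plain C++11, standing in for the experimental barrier):

#include <condition_variable>
#include <mutex>

class Barrier {
    std::mutex m;
    std::condition_variable cv;
    const int threshold; // number of threads that must arrive
    int count;           // arrivals still missing in this round
    int generation = 0;  // bumped each time the barrier opens
public:
    explicit Barrier(int n) : threshold(n), count(n) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m);
        int gen = generation;
        if (--count == 0) {    // last thread to arrive
            ++generation;      // open the barrier for this round
            count = threshold; // reset so the barrier is reusable
            cv.notify_all();
        } else {
            cv.wait(lock, [&] { return gen != generation; });
        }
    }
};

The generation counter plays the role of the modulo comparison above: a thread waits until the round it arrived in has been opened, so the barrier can be reused without the workers ever having to reset a shared counter to zero.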
B) I would suggest using std::atomic for solutionCnt; then you can drop all the other counters, the mutex, and the condition variable. Just atomically increase it by one in the thread that found a solution and then return. All other threads check after every iteration whether the value is greater than zero and, if it is, return as well. The advantage is that the threads do not have to meet regularly, but can each search at their own pace.
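A sketch of the resulting worker loop (the algorithm details are elided, as in the question's toy example):

#include <atomic>
#include <cstdlib>

std::atomic<int> solutionCnt{0};

void workerThread()
{
    while (solutionCnt.load() == 0) { // 3. quit once anyone has a solution
        int r = rand() % 1000;        // 1. do some important work
        if (r == 0)
            ++solutionCnt;            // found one; all threads will notice
        // 5. adjust parameters and try again
    }
}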
Out of curiosity, I tried to solve your problem using std::async. For every attempt to find a solution, we call async. Once all parallel attempts have finished, we process feedback, adjust parameters, and repeat. An important difference with your implementation is that feedback is processed in the calling (main) thread. If processing feedback takes too long — or if we don't want to block the main thread at all — then the code in main() can be adjusted to also call std::async.
The code is supposed to be quite efficient, provided that the implementation of async uses a thread pool (e.g., Microsoft's implementation does).
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <future>
#include <iostream>
#include <vector>
const int numOfThreads = 8;
struct Parameters{};
struct Feedback {
int result;
};
Feedback doTheWork(const Parameters &){
// do the work and provide result and feedback for future runs
return Feedback{rand() % 1000};
}
bool isSolution(const Feedback &f){
return f.result == 0;
}
// Runs doTheWork in parallel. Number of parallel tasks is same as size of params vector
std::vector<Feedback> findSolutions(const std::vector<Parameters> &params){
// 1. Run async tasks to find solutions. Normally threads are not created each time but re-used from a pool
std::vector<std::future<Feedback>> futures;
for (auto &p: params){
futures.push_back(std::async(std::launch::async,
[&p](){ return doTheWork(p); }));
}
// 2. Synchronize: wait for all tasks
std::vector<Feedback> feedback(futures.size());
for (auto nofRunning = futures.size(), iFuture = size_t{0}; nofRunning > 0; ){
// Check if the task has finished (future is invalid if we already handled it during an earlier iteration)
auto &future = futures[iFuture];
if (future.valid() && future.wait_for(std::chrono::milliseconds(1)) != std::future_status::timeout){
// Collect feedback for next attempt
// Alternatively, we could already check if solution has been found and cancel other tasks [if our algorithm supports cancellation]
feedback[iFuture] = std::move(future.get());
--nofRunning;
}
if (++iFuture == futures.size())
iFuture = 0;
}
return feedback;
}
int main()
{
srand(time(NULL));
std::vector<Parameters> params(numOfThreads);
// 0. Set initial parameter values here
// If we don't want to block the main thread while the algorithm is running, we can use std::async here too
while (true){
auto feedbackVector = findSolutions(params);
auto itSolution = std::find_if(std::begin(feedbackVector), std::end(feedbackVector), isSolution);
// 3. If any of the threads has found a solution, we stop
if (itSolution != feedbackVector.end())
break;
// 5. Use feedback to re-configure parameters for next iteration
}
return 0;
}
I'm trying to implement a lock-free queue that uses a linear circular buffer to store data. In contrast to a general-purpose lock-free queue I have the following relaxing conditions:
I know the worst-case number of elements that will ever be stored in the queue. The queue is part of a system that operates on a fixed set of elements. The code will never attempt to store more elements in the queue than there are elements in this fixed set.
No multi-producer/multi-consumer. The queue will either be used in a multi-producer/single-consumer or a single-producer/multi-consumer setting.
Conceptually, the queue is implemented as follows
Standard power-of-two ring buffer. The underlying data structure is a standard ring buffer using the power-of-two trick. Read and write indices are only ever incremented; they are wrapped to the size of the underlying array with a simple bitmask when indexing into it. The read pointer is atomically incremented in pop(), the write pointer is atomically incremented in push().
Size variable gates access to pop(). An additional "size" variable tracks the number of elements in the queue. This eliminates the need to perform arithmetic on the read and write indices. The size variable is atomically incremented after the entire write operation has taken place, i.e. the data has been written to the backing storage and the write cursor has been incremented. I'm using a compare-and-swap (CAS) operation to atomically decrement size in pop() and only continue, if size is non-zero. This way pop() should be guaranteed to return valid data.
My queue implementation is as follows. Note the debug code that halts execution whenever pop() attempts to read past the memory that has previously been written by push(). This should never happen, since ‒ at least conceptually ‒ pop() may only proceed if there are elements on the queue (there should be no underflows).
#include <atomic>
#include <cstdint>
#include <csignal> // XXX for debugging
template <typename T>
class Queue {
private:
uint32_t m_data_size; // Number of elements allocated
std::atomic<T> *m_data; // Queue data, size is power of two
uint32_t m_mask; // Bitwise AND mask for m_rd_ptr and m_wr_ptr
std::atomic<uint32_t> m_rd_ptr; // Circular buffer read pointer
std::atomic<uint32_t> m_wr_ptr; // Circular buffer write pointer
std::atomic<uint32_t> m_size; // Number of elements in the queue
static uint32_t upper_power_of_two(uint32_t v) {
v--; // https://graphics.stanford.edu/~seander/bithacks.html
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16;
v++;
return v;
}
public:
struct Optional { // Minimal replacement for std::optional
bool good;
T value;
Optional() : good(false) {}
Optional(T value) : good(true), value(std::move(value)) {}
explicit operator bool() const { return good; }
};
Queue(uint32_t max_size)
: // XXX Allocate 1 MiB of additional memory for debugging purposes
m_data_size(upper_power_of_two(1024 * 1024 + max_size)),
m_data(new std::atomic<T>[m_data_size]),
m_mask(m_data_size - 1),
m_rd_ptr(0),
m_wr_ptr(0),
m_size(0) {
// XXX Debug code begin
// Fill the memory with a marker so we can detect invalid reads
for (uint32_t i = 0; i < m_data_size; i++) {
m_data[i] = 0xDEADBEAF;
}
// XXX Debug code end
}
~Queue() { delete[] m_data; }
Optional pop() {
// Atomically decrement the size variable
uint32_t size = m_size.load();
while (size != 0 && !m_size.compare_exchange_weak(size, size - 1)) {
}
// The queue is empty, abort
if (size == 0) {
return Optional();
}
// Read the actual element, atomically increase the read pointer
T res = m_data[(m_rd_ptr++) & m_mask].load();
// XXX Debug code begin
if (res == T(0xDEADBEAF)) {
std::raise(SIGTRAP);
}
// XXX Debug code end
return res;
}
void push(T t) {
m_data[(m_wr_ptr++) & m_mask].store(t);
m_size++;
}
bool empty() const { return m_size == 0; }
};
However, underflows do occur and can easily be triggered in a multi-threaded stress-test. In this particular test I maintain two queues q1 and q2. In the main thread I feed a fixed number of elements into q1. Two worker threads read from q1 and push onto q2 in a tight loop. The main thread reads data from q2 and feeds it back to q1.
This works fine if there is only one worker-thread (single-producer/single-consumer) or as long as all worker-threads are on the same CPU as the main thread. However, it fails as soon as there are two worker threads that are explicitly scheduled onto a different CPU than the main thread.
The following code implements this test:
#include <pthread.h>
#include <functional>
#include <thread>
#include <vector>
static void queue_stress_test_main(std::atomic<uint32_t> &done_count,
Queue<int> &queue_rd, Queue<int> &queue_wr) {
for (size_t i = 0; i < (1UL << 24); i++) {
auto res = queue_rd.pop();
if (res) {
queue_wr.push(res.value);
}
}
done_count++;
}
static void set_thread_affinity(pthread_t thread, int cpu) {
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(cpu, &cpuset);
if (pthread_setaffinity_np(thread, sizeof(cpu_set_t),
&cpuset) != 0) {
throw "Error while calling pthread_setaffinity_np";
}
}
int main() {
static constexpr uint32_t n_threads{2U}; // Number of worker threads
//static constexpr uint32_t n_threads{1U}; // < Works fine
static constexpr uint32_t max_size{16U}; // Elements in the queue
std::atomic<uint32_t> done_count{0}; // Number of finished threads
Queue<int> queue1(max_size), queue2(max_size);
// Launch n_threads threads, make sure the main thread and the two worker
// threads are on different CPUs.
std::vector<std::thread> threads;
for (uint32_t i = 0; i < n_threads; i++) {
threads.emplace_back(queue_stress_test_main, std::ref(done_count),
std::ref(queue1), std::ref(queue2));
set_thread_affinity(threads.back().native_handle(), 0);
}
set_thread_affinity(pthread_self(), 1);
//set_thread_affinity(pthread_self(), 0); // < Works fine
// Pump data from queue2 into queue1
uint32_t elems_written = 0;
while (done_count < n_threads || !queue2.empty()) {
// Initially fill queue1 with all values from 0..max_size-1
if (elems_written < max_size) {
queue1.push(elems_written++);
}
// Read elements from queue2 and put them into queue1
auto res = queue2.pop();
if (res) {
queue1.push(res.value);
}
}
// Wait for all threads to finish
for (uint32_t i = 0; i < n_threads; i++) {
threads[i].join();
}
}
Most of the time this program triggers the trap in the queue code, which means that pop() attempts to read memory that has never been touched by push() ‒ although pop() should only succeed if push() has been called at least as often as pop().
You can compile and run the above program with GCC/clang on Linux using
c++ -std=c++11 queue.cpp -o queue -lpthread && ./queue
Either just concatenate the above two code blocks or download the complete program here.
Note that I'm a complete novice when it comes to lock-free datastructures. I'm perfectly aware that there are plenty of battle-tested lock-free queue implementations for C++. However, I simply can't figure out why the above code does not work as intended.
You have two bugs, one of which can cause the failure you observe.
Let's look at your push code, except we'll allow only one operation per statement:
void push(T t)
{
auto const claimed_index = m_wr_ptr++; /* 1 */
auto const claimed_offset = claimed_index & m_mask; /* 2 */
auto& claimed_data = m_data[claimed_offset]; /* 3 */
claimed_data.store(t); /* 4 */
m_size++; /* 5 */
}
Now, for a queue with two producers, there is a window of vulnerability to a race condition between operations 1 and 4:
Before:
m_rd_ptr == 1
m_wr_ptr == 1
m_size == 0
Producer A:
/* 1 */ claimed_index = 1; m_wr_ptr = 2;
/* 2 */ claimed_offset = 1;
Scheduler puts Producer A to sleep here
Producer B:
/* 1 */ claimed_index = 2; m_wr_ptr = 3;
/* 2 */ claimed_offset = 2;
/* 3 */ claimed_data = m_data[2];
/* 4 */ claimed_data.store(t);
/* 5 */ m_size = 1;
After:
m_size == 1
m_rd_ptr == 1
m_wr_ptr == 3
m_data[1] == 0xDEADBEAF
m_data[2] == value_produced_by_B
The consumer now runs, sees m_size > 0, and reads from m_data[1] while increasing m_rd_ptr from 1 to 2. But m_data[1] hasn't been written by Producer A yet, and Producer B wrote to m_data[2].
The second bug is the complementary case in pop() when a consumer thread is interrupted between the m_rd_ptr++ action and the .load() call. It can result in reading values out of order, potentially so far out of order that the queue has completely circled and overwritten the original value.
Just because two operations in a single source statement are atomic does not make the entire statement atomic.
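For reference, the standard way to close both windows is to give every slot its own sequence number, as in Dmitry Vyukov's bounded MPMC queue. A condensed sketch follows (illustrative, not a drop-in replacement for the Queue above):

#include <atomic>
#include <cstddef>
#include <cstdint>

// Each cell carries a sequence number that tells producers and consumers
// whether the cell is ready for them, so a claimed-but-unwritten cell can
// never be read.
template <typename T>
class MpmcQueue {
    struct Cell {
        std::atomic<size_t> seq;
        T data;
    };
    Cell *m_cells;
    const size_t m_mask;
    std::atomic<size_t> m_wr{0};
    std::atomic<size_t> m_rd{0};
public:
    explicit MpmcQueue(size_t size) // size must be a power of two
        : m_cells(new Cell[size]), m_mask(size - 1) {
        for (size_t i = 0; i != size; ++i)
            m_cells[i].seq.store(i, std::memory_order_relaxed);
    }
    ~MpmcQueue() { delete[] m_cells; }
    bool push(const T &t) {
        size_t pos = m_wr.load(std::memory_order_relaxed);
        for (;;) {
            Cell &c = m_cells[pos & m_mask];
            intptr_t dif = (intptr_t)c.seq.load(std::memory_order_acquire) - (intptr_t)pos;
            if (dif == 0) { // cell is free for this producer
                if (m_wr.compare_exchange_weak(pos, pos + 1)) {
                    c.data = t;
                    c.seq.store(pos + 1, std::memory_order_release); // publish
                    return true;
                }
            } else if (dif < 0) {
                return false; // queue is full
            } else {
                pos = m_wr.load(std::memory_order_relaxed);
            }
        }
    }
    bool pop(T &out) {
        size_t pos = m_rd.load(std::memory_order_relaxed);
        for (;;) {
            Cell &c = m_cells[pos & m_mask];
            intptr_t dif = (intptr_t)c.seq.load(std::memory_order_acquire) - (intptr_t)(pos + 1);
            if (dif == 0) { // cell has been fully written
                if (m_rd.compare_exchange_weak(pos, pos + 1)) {
                    out = c.data;
                    c.seq.store(pos + m_mask + 1, std::memory_order_release); // recycle
                    return true;
                }
            } else if (dif < 0) {
                return false; // queue is empty
            } else {
                pos = m_rd.load(std::memory_order_relaxed);
            }
        }
    }
};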
I have the following code:
The header:
class Counter
{
public:
Counter(const std::string& fileName);
boost::uint32_t getCounter();
private:
tbb::atomic<boost::uint32_t> counter;
std::string counterFileName;
};
The cpp:
Counter::Counter(const std::string& fileName) : counter(), counterFileName(fileName)
{
std::string line;
std::ifstream counterFile (fileName.c_str());
if (counterFile.is_open())
{
getline (counterFile, line);
counterFile.close();
}
boost::uint32_t temp = std::stoul (line,nullptr,0);
counter = temp;
}
boost::uint32_t Counter::getCounter()
{
if (counter > 1000)
{
counter = 0;
}
assert( counter < 1000);
const boost::uint32_t ret = counter++;
if ((counter % 10) == 0)
{
// write the counter back to the file.
std::ofstream file (counterFileName.c_str());
if (file.is_open())
{
file << counter;
file.close();
}
}
return ret;
}
And elsewhere, let's say we have two threads:
boost::thread t1(&Counter::getCounter, &counter);
boost::thread t2(&Counter::getCounter, &counter);
My question is around the atomic variable. Since the getCounter function can access the counter value up to three times per call, can the atomic variable change between those accesses? For example, if the if (counter > 1000) test fails to pass, is there any guarantee that the assert will also pass? More concretely, does the atomic variable block until the end of the function call, or only for as long as the value is being read/written? My second question is: how does the operating system handle the atomic? That is, if the atomic doesn't cause the function to block until it is finished, what happens if one thread is updating the variable while another thread is attempting to write it out? Sorry for the abundance of questions; this is my first attempt at a lock-free data structure.
First of all, even in a single-threaded app, the sequence
if (counter > 1000) ...
assert(counter < 1000)
will trip the assert when counter is exactly 1000.
Second, yes, an atomic variable can easily change between reads. The whole point is that a single read is atomic: if another thread is updating the variable exactly when it is read, you are guaranteed a properly memory-ordered read (and you have some guarantees regarding arithmetic on it - your increment is guaranteed to increment exactly once). But that says nothing about the next read!
If you need to lock the variable across several operations, you need to use locking mechanisms, such as mutexes.
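For instance, if the whole read-check-increment sequence must behave atomically, a mutex around it is the straightforward fix. A sketch along the lines of the question's class (counterMutex is a hypothetical extra member; the file writing is elided):

boost::uint32_t Counter::getCounter()
{
    boost::lock_guard<boost::mutex> lock(counterMutex); // hypothetical member
    if (counter > 1000)
        counter = 0;
    const boost::uint32_t ret = counter++;
    // ... write the counter back to the file every 10th value, as before ...
    return ret;
}

With the mutex in place, counter no longer needs to be atomic at all.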
I manage some memory that is used by concurrent threads, and I have a variable
unsigned int freeBytes
When I request some memory from a task
unsigned int bytesNeeded
I must check if
bytesNeeded<=freeBytes
and, if so, keep the old value of freeBytes and atomically subtract bytesNeeded from freeBytes.
Does the atomic library or the x86 itself offer such a possibility?
Use an atomic compare-and-swap operation. In pseudo-code:
unsigned int n, new_n;
do {
    n = load(freeBytes);
    if (n < bytesNeeded) { return NOT_ENOUGH_MEMORY; }
    new_n = n - bytesNeeded;
} while (!compare_and_swap(&freeBytes, n, new_n));
With real C++ <atomic> variables the actual code would look pretty similar:
#include <atomic>
// Global counter for the amount of available bytes
std::atomic<unsigned int> freeBytes; // global
// attempt to decrement the counter by bytesNeeded; returns whether
// decrementing succeeded.
bool allocate(unsigned int bytesNeeded)
{
for (unsigned int n = freeBytes.load(); ; )
{
if (n < bytesNeeded) { return false; }
unsigned int new_n = n - bytesNeeded;
if (freeBytes.compare_exchange_weak(n, new_n)) { return true; }
}
}
(Note that the final compare_exchange_weak takes the first argument by reference and updates it with the current value of the atomic variable in the event that the exchange fails.)
By contrast, incrementing the value ("deallocate") can be done with a simple atomic addition (unless you want to check for overflow). This is to some extent symptomatic of lock-free containers: creating something is relatively easy, assuming infinite resources, whereas removing requires trying in a loop.
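For instance, a sketch of the increment side (the function name is mine; it assumes no overflow check is wanted):

// Returning memory needs no CAS loop: a plain atomic addition suffices.
void deallocate(unsigned int bytesFreed)
{
    freeBytes.fetch_add(bytesFreed);
}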
Some time ago I had an interview and was asked to implement
a semaphore by using mutex operations and primitives only
(the interviewer allowed int to be considered atomic). I came up with the solution below.
He did not like the busy/wait part -- while (count >= size) {} -- and asked me to implement blocking instead, using more primitive
types and mutexes. I did not manage to come up with an improved solution.
Any ideas how it could be done?
struct Semaphore {
int size;
atomic<int> count;
mutex updateMutex;
Semaphore(int n) : size(n) { count.store(0); }
void acquire() {
while (1) {
while (count >= size) {}
updateMutex.lock();
if (count >= size) {
updateMutex.unlock();
continue;
}
++count;
updateMutex.unlock();
break;
}
}
void release() {
updateMutex.lock();
if (count > 0) {
--count;
} // else log err
updateMutex.unlock();
}
};
I'd wager this is not possible to implement without a busy-loop using mutexes only.
If you are not busy-looping, you have to block somewhere. The only blocking primitive you've got is
a mutex. Hence, you have to block on some mutex when the semaphore counter is exhausted. But you can be woken up only by the single owner of that mutex, whereas you should be woken up whenever an arbitrary thread returns a count to the semaphore.
Now, if you are allowed condition variables, it's an entirely different story.
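For completeness, a sketch of that story, keeping the question's counting convention (count counts acquisitions, up to size):

#include <condition_variable>
#include <mutex>

struct CvSemaphore {
    int size;
    int count;
    std::mutex m;
    std::condition_variable cv;
    CvSemaphore(int n) : size(n), count(0) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return count < size; }); // block instead of spinning
        ++count;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(m);
            if (count > 0) --count; // else log err
        }
        cv.notify_one();
    }
};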
As #chill pointed out, the solution I wrote down here will not work, since locks have unique ownership. I guess in the end you will have to revert to a busy wait (if you are not allowed to use condition variables). I leave it here so that people who have the same idea can see that it DOES NOT WORK ;)
struct Semaphore {
int size;
atomic<int> count;
mutex protection;
mutex wait;
Semaphore(int n) : size(n) { count.store(0); }
void acquire() {
protection.lock();
--count;
if (count < -1) {
protection.unlock();
wait.lock();
}
protection.unlock();
}
void release() {
protection.lock();
++count;
if (count > 0) {
wait.unlock();
}
protection.unlock();
}
};
That's true - technically, some parts of your code have no need to exist.
1- You used the atomic datatype atomic<int> count, which costs a few extra cycles to execute, and it is useless here: every increment and decrement already happens while updateMutex is locked, so no other thread can change the value during the locked state.
2- The unsynchronized spin while (count >= size) {} is also useless, because you check count again after acquiring the mutex, and that second check is the necessary and important one. (Remember that locking a mutex held by another thread is itself effectively a wait loop.)
Besides, if you switched to a plain int count, some compiler optimizations might keep your code from re-reading the value of count. Since your semaphore is supposed to be used by different threads, you would need to make it volatile to avoid that problem.
Lastly, let me rewrite your code in a more performant way:
struct Semaphore {
int size;
volatile int count;
mutex updateMutex;
Semaphore(int n) : size(n), count(0) {}
void acquire() {
while (1) {
updateMutex.lock();
if (count >= size) {
updateMutex.unlock();
continue;
}
++count;
updateMutex.unlock();
break;
}
}
void release() {
updateMutex.lock();
if (count > 0) {
--count;
} // else log err
updateMutex.unlock();
}
};
EDIT - use a second mutex for queuing instead of queuing threads explicitly
Since a mutex already has proper thread support, it can be used to queue the threads (instead of doing so explicitly, as I had first tried). However, if the mutex only lets its owner unlock it (as a std::mutex-style lock does), then this solution doesn't work.
I found the solution in Anthony Howe's pdf that I came across when searching. There are two more solutions given there as well. I changed the names to make more sense for this example.
More or less pseudo-code:
Semaphore{
    int n;
    mutex m_count; //unlocked initially
    mutex m_queue; //locked initially
};
void wait(){
m_count.lock();
n = n-1;
if(n < 0){
m_count.unlock();
m_queue.lock(); //wait
}
m_count.unlock(); //unlock signal's lock
}
void signal(){
m_count.lock();
n = n+1;
if(n <= 0){
m_queue.unlock(); //leave m_count locked
}
else{
m_count.unlock();
}
}
Let me try this:
# number of threads/workers
w = 10
# maximum concurrency
cr = 5
r_mutex = mutex()
w_mutex = [mutex() for x in range(w)]
# assuming mutex can be locked and unlocked by anyone
# (essentially we need a binary semaphore)
def acquire(id):
    global cr
    r_mutex.lock()
    cr -= 1
    # r_mutex.unlock()  # can't unlock yet, see below
    # if exceeding maximum concurrency
    if cr < 0:
        # lock twice so we block until someone wakes us up
        w_mutex[id].lock()
        r_mutex.unlock()
        w_mutex[id].lock()
        w_mutex[id].unlock()
        return
    r_mutex.unlock()
def release(id):
    global cr
    r_mutex.lock()
    cr += 1
    # someone must still be waiting if cr <= 0 after the increment
    if cr <= 0:
        # maybe you can do this in a random order
        for m in w_mutex:
            if m.is_locked():
                m.unlock()
                break
    r_mutex.unlock()