Implementing a "temporarily suspendable" concurrent loop in C++ - c++

I'm writing a program whose main thread spawns a worker thread that, in an infinite loop, performs some work and then sleeps for a set amount of time, i.e. the worker thread executes:
void do_work() {
    for (;;) {
        // do some work
        std::this_thread::sleep_for(100ms);
    }
}
Now, I would additionally like to be able to temporarily completely disable this worker thread from the main thread, i.e. I would like to write the following functions:
disable_worker(): disable the worker thread
enable_worker(): enable the worker thread again
What I've come up with is the following:
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using namespace std::literals::chrono_literals;

bool enabled;
std::mutex mtx;
std::condition_variable cond;

void disable_worker() {
    std::lock_guard<std::mutex> lock(mtx);
    enabled = false;
}

void enable_worker() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        enabled = true;
    }
    cond.notify_one();
}

void do_work() {
    for (;;) {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, []{ return enabled; });
        // ... do some work ...
        std::this_thread::sleep_for(100ms);
    }
}

int main() {
    std::thread t(do_work);
    // ... enable/disable t as necessary ...
}
I suppose this works (at least I can't spot any issues). However, I would also like to guarantee that when either enable_worker or disable_worker returns (in the main thread), the worker thread is guaranteed to be either blocking on the condition variable or sleeping, i.e. not performing any work. How can I implement this without any race conditions?

Here is an API for a concurrent door with a queue counter, and the idea of using it "sleepily".
struct SleepyDoorQueue {
    void UseDoor() {
        auto l = lock();
        ++queue_size;
        cv.notify_all();
        cv.wait(l, [&]{ return open; });
        --queue_size;
    }
    // Sleeps for a while, then tries to go through the door.
    // Considered in queue while sleeping.
    template<class Rep, class Period>
    void SleepyUseDoor(const std::chrono::duration<Rep, Period>& rel_time) {
        {
            auto l = lock();
            ++queue_size;
            cv.notify_all();
        }
        std::this_thread::sleep_for(rel_time);
        auto l = lock();
        cv.wait(l, [&]{ return open; });
        --queue_size;
    }
    void CloseDoor() {
        auto l = lock();
        open = false;
    }
    void OpenDoor() {
        auto l = lock();
        open = true;
        cv.notify_all();
    }
    void WaitForQueueSize(std::size_t n) const {
        auto l = lock();
        cv.wait(l, [&]{ return queue_size >= n; });
    }
    explicit SleepyDoorQueue(bool startOpened = true) : open(startOpened) {}
private:
    mutable std::condition_variable cv; // mutable: waited on in const WaitForQueueSize
    mutable std::mutex m;
    std::size_t queue_size = 0;
    bool open = true;
    auto lock() const { return std::unique_lock(m); }
};
The main thread closes the door and waits for a queue size of 1 to ensure that the worker thread isn't working.
The worker thread does a SleepyUseDoor to try to open it after sleeping for 100ms.
When the worker thread can do work, the main thread just opens the door.
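For concreteness, here is a minimal sketch of that protocol using the question's 100ms loop (the function names mirror the question; the global door object is my addition):
SleepyDoorQueue door;

void do_work() {
    for (;;) {
        // ... do some work ...
        door.SleepyUseDoor(100ms); // counted as queued while sleeping
    }
}

void disable_worker() {
    door.CloseDoor();
    door.WaitForQueueSize(1); // returns only once the worker is sleeping or blocked
}

void enable_worker() {
    door.OpenDoor();
}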
This will be inefficient with a large number of worker and controller threads, as I use the same cv for both the queue counter and the door-opening message, so each kind of waiter gets woken spuriously by the other kind of notification. With one worker and one controller thread, the spurious wakeups are insignificant.
I only notify on queue size increase and door opening, but I do more than 1 notification on purpose (if there is someone waiting for a queue size change and a door opener eats it, that would suck).
You could probably implement this with two doors actually.
struct Door {
    // Blocks until the door is open.
    void UseDoor() const {
        auto l = lock();
        cv.wait(l, [&]{ return open; });
    }
    // Opens the door. Notifies blocked threads trying to use the door.
    void OpenDoor() {
        auto l = lock();
        open = true;
        cv.notify_all();
    }
    // Closes the door.
    void CloseDoor() {
        auto l = lock();
        open = false;
    }
    explicit Door(bool startOpen = true) : open(startOpen) {}
private:
    mutable std::condition_variable cv; // mutable: waited on in const UseDoor
    mutable std::mutex m;
    bool open = true;
    auto lock() const { return std::unique_lock(m); }
};
The worker thread does this:
Door AmNotWorking(true);
Door CanWork(true);

void work() {
    for (;;) {
        CanWork.UseDoor();
        AmNotWorking.CloseDoor();
        // work
        AmNotWorking.OpenDoor();
        std::this_thread::sleep_for(100ms);
    }
}
The controller thread does:
void preventWork() {
    CanWork.CloseDoor();
    AmNotWorking.UseDoor();
}

void allowWork() {
    CanWork.OpenDoor();
}
But I see a race condition there: between CanWork.UseDoor() and AmNotWorking.CloseDoor(), someone could close the CanWork door and then see the AmNotWorking door still open, even though the worker is about to start working. We need that to be atomic.
// Goes through the door when it is open.
// Atomically runs the lambda passed in while the mutex
// is still locked after checking the door state.
// WARNING: this can cause deadlocks if you do the
// wrong things in the lambda.
template<class F>
void UseDoor(F atomicWhenOpen) const {
    auto l = lock();
    cv.wait(l, [&]{ return open; });
    atomicWhenOpen();
}
That runs an operation atomically with successfully using the door. A bit dangerous, but the worker thread can now:
void work() {
    for (;;) {
        CanWork.UseDoor([]{ AmNotWorking.CloseDoor(); });
        // work
        AmNotWorking.OpenDoor();
        std::this_thread::sleep_for(100ms);
    }
}
This guarantees that we close the "AmNotWorking" door under the same lock in which we verified that the "CanWork" door is open.
void preventWork() {
    CanWork.CloseDoor();
    AmNotWorking.UseDoor();
}
If the "use can work and close am working" operation happens before the CanWork.CloseDoor(), we won't be able to AmNotWorking.UseDoor() until the worker thread finishes their work.
If it happens after CanWork.CloseDoor(), then the AmNotWorking.UseDoor() is closed, so we again wait until the worker thread is not working.
We can't CanWork.CloseDoor() between the can work door being used and the AmNotWorking being closed, which is what that extra atomic lambda callback gives us.
We can probably make a less dangerous primitive, but I'm not sure how to do it elegantly.
Maybe a simple semaphore?
template<class T = std::ptrdiff_t>
struct Semaphore {
    void WaitUntilExactValue(T t) const {
        auto l = lock();
        cv.wait(l, [&]{ return value == t; });
    }
    void WaitUntilAtLeastValue(T t) const {
        auto l = lock();
        cv.wait(l, [&]{ return value >= t; });
    }
    void WaitUntilAtMostValue(T t) const {
        auto l = lock();
        cv.wait(l, [&]{ return value <= t; });
    }
    void Increment() {
        auto l = lock();
        ++value;
        cv.notify_all();
    }
    void BoundedIncrement(T ceil) {
        auto l = lock();
        cv.wait(l, [&]{ return value + 1 <= ceil; });
        ++value;
        cv.notify_all();
    }
    void Decrement() {
        auto l = lock();
        --value;
        cv.notify_all();
    }
    void BoundedDecrement(T floor) {
        auto l = lock();
        cv.wait(l, [&]{ return value - 1 >= floor; });
        --value;
        cv.notify_all();
    }
    explicit Semaphore(T in = 0) : value(std::forward<T>(in)) {}
private:
    mutable std::condition_variable cv; // mutable: waited on in const methods
    mutable std::mutex m;
    T value = 0;
    auto lock() const { return std::unique_lock(m); } // as above
};
then
Semaphore workLimit(1);

void work() {
    for (;;) {
        workLimit.BoundedDecrement(0);
        // work
        workLimit.Increment();
        std::this_thread::sleep_for(100ms);
    }
}

void preventWork() {
    workLimit.Decrement();
    workLimit.WaitUntilExactValue(0);
}

void allowWork() {
    workLimit.Increment();
}
Here, the workLimit is how many more workers are permitted to be working at this point. It is 1 to start with.
When a worker is working but not allowed to, it is -1. When it is working and allowed to, it is 0. When it is sleeping and allowed to work, it is 1. When it is sleeping (either because it is in sleep_for, or blocked in BoundedDecrement) and not allowed to work, it is 0.
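To make the bookkeeping concrete, here is a trace of workLimit over one suspend/resume cycle with a single worker (the trace is mine, not from the original):
// start:                                workLimit == 1   (sleeping, allowed)
// worker: BoundedDecrement(0)        -> workLimit == 0   (working, allowed)
// controller: preventWork: Decrement -> workLimit == -1  (working, not allowed)
// controller: WaitUntilExactValue(0)    blocks ...
// worker: Increment                  -> workLimit == 0   (controller returns; worker sleeps,
//                                       then blocks in BoundedDecrement(0))
// controller: allowWork: Increment   -> workLimit == 1   (worker may run again)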

Related

C++ STL Producer multiple consumer where producer waits for free consumer before producing next value

My little consumer-producer problem had me stumped for some time. I didn't want an implementation where one producer pushes some data round-robin to the consumers, filling up their queues of data respectively.
I wanted to have one producer and x consumers, with the producer waiting to produce new data until a consumer is free again. In my example there are 3 consumers, so the producer creates a maximum of 3 objects of data at any given time. Since I don't like polling, the consumers were supposed to notify the producer when they are done. Sounds simple, but the solution I found doesn't please me. First, the code.
#include "stdafx.h"
#include <mutex>
#include <iostream>
#include <future>
#include <map>
#include <atomic>
std::atomic_int totalconsumed;
class producer {
using runningmap_t = std::map<int, std::pair<std::future<void>, bool>>;
// Secure the map of futures.
std::mutex mutex_;
runningmap_t running_;
// Used for finished notification
std::mutex waitermutex_;
std::condition_variable waiter_;
// The magic number to limit the producer.
std::atomic<int> count_;
bool can_run();
void clean();
// Fake a source, e.g. filesystem scan.
int fakeiter;
int next();
bool has_next() const;
public:
producer() : fakeiter(50) {}
void run();
void notify(int value);
void wait();
};
class consumer {
producer& producer_;
public:
consumer(producer& producer) : producer_(producer) {}
void run(int value) {
std::this_thread::sleep_for(std::chrono::milliseconds(42));
std::cout << "Consumed " << value << " on (" << std::this_thread::get_id() << ")" << std::endl;
totalconsumed++;
producer_.notify(value);
}
};
// Only if less than three threads are active, another gets to run.
bool producer::can_run() { return count_.load() < 3; }
// Verify if there's something to consume
bool producer::has_next() const { return 0 != fakeiter; }
// Produce the next value for consumption.
int producer::next() { return --fakeiter; }
// Remove the futures that have reported to be finished.
void producer::clean()
{
for (auto it = running_.begin(); it != running_.end(); ) {
if (it->second.second) {
it = running_.erase(it);
}
else {
++it;
}
}
}
// Runs the producer. Creates a new consumer for every produced value. Max 3 at a time.
void producer::run()
{
while (has_next()) {
if (can_run()) {
auto c = next();
count_++;
auto future = std::async(&consumer::run, consumer(*this), c);
std::unique_lock<std::mutex> lock(mutex_);
running_[c] = std::make_pair(std::move(future), false);
clean();
}
else {
std::unique_lock<std::mutex> lock(waitermutex_);
waiter_.wait(lock);
}
}
}
// Consumers diligently tell the producer that they are finished.
void producer::notify(int value)
{
count_--;
mutex_.lock();
running_[value].second = true;
mutex_.unlock();
std::unique_lock<std::mutex> waiterlock(waitermutex_);
waiter_.notify_all();
}
// Wait for all consumers to finish.
void producer::wait()
{
while (!running_.empty()) {
mutex_.lock();
clean();
mutex_.unlock();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
// Looks like the application entry point.
int main()
{
producer p;
std::thread pthread(&producer::run, &p);
pthread.join();
p.wait();
std::cout << std::endl << std::endl << "Total consumed " << totalconsumed.load() << std::endl;
return 0;
}
The part I don't like is the list of values mapped to the futures, called running_. I need to keep the future around until the consumer is actually done. I can't remove the future from the map in the notify method or else I'll kill the thread that is currently calling notify.
Am I missing something that could simplify this construct?
template<class T>
struct slotted_data {
std::size_t I;
T t;
};
template<class T>
using sink = std::function<void(T)>;
template<class T, std::size_t N>
struct async_slots {
bool produce( slotted_data<T> data ) {
if (data.I>=N) return false;
{
auto l = lock();
// read terminate under the lock: finish() writes it under the same lock
if (terminate || slots[data.I]) return false;
slots[data.I] = std::move(data.t);
}
cv.notify_one();
return true;
}
// rare use of non-lambda cv.wait in the wild!
bool consume(sink<slotted_data<T>> f) {
auto l = lock();
while(!terminate) {
for (auto& slot:slots) {
if (slot) {
auto r = std::move(*slot);
slot = std::nullopt;
f({std::size_t(&slot-slots.data()), std::move(r)}); // invoke in lock
return true;
}
}
cv.wait(l);
}
return false;
}
// easier and safer version:
std::optional<slotted_data<T>> consume() {
std::optional<slotted_data<T>> r;
bool worked = consume([&](auto&& data) { r = std::move(data); });
if (!worked) return {};
return r;
}
void finish() {
{
auto l = lock();
terminate = true;
}
cv.notify_all();
}
private:
auto lock() { return std::unique_lock<std::mutex>(m); }
std::mutex m;
std::condition_variable cv;
std::array< std::optional<T>, N > slots;
bool terminate = false;
};
async_slots provides a fixed number of slots and an awaitable consume. If you try to produce two things in the same slot, the producer function returns false and ignores you.
consume invokes the sink of the data inside the mutex in a continuation passing style. This permits atomic consumption.
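For orientation, a small sketch of driving async_slots directly (slot numbers and values are illustrative; assumes <cstdio> for the printf):
async_slots<int, 3> q;

// Producer side: put 42 into slot 2 (returns false if slot 2 is still full).
q.produce({2, 42});

// Consumer side: take whichever slot is full, together with its index.
if (auto d = q.consume())
    std::printf("slot %zu -> %d\n", d->I, d->t);

// Shutdown: wakes all blocked consumers; consume() then returns an empty optional.
q.finish();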
We want to invert producer and consumer:
template<class T, std::size_t N>
struct slotted_consumer {
bool consume( std::size_t I, sink<T> sink ) {
std::optional<T> data;
std::condition_variable cv;
std::mutex m;
bool worked = slots.produce(
{
I,
[&](auto&& t){
{
std::unique_lock<std::mutex> l(m);
data.emplace(std::move(t));
}
cv.notify_one();
}
}
);
if (!worked) return false;
std::unique_lock<std::mutex> l(m);
cv.wait(l, [&]()->bool{
return (bool)data;
});
sink( std::move(*data) );
return true;
}
bool produce( T t ) {
return slots.consume(
[&](auto&& f) {
f.t( std::move(t) );
}
);
}
void finish() {
slots.finish();
}
private:
async_slots< sink<T>, N > slots;
};
We have to take some care to execute the sink in a context where we are not holding the mutex of async_slots, which is why consume above is so strange.
You share a slotted_consumer< int, 3 > slots. The producing thread repeatedly calls slots.produce(42);. It blocks until a new consumer lines up.
Consumer #2 calls slots.consume( 2, [&](int x){ /* code to consume x */ } ), and #1 and #0 pass their slot numbers as well.
All 3 consumers can be waiting for the next production. The above system defaults to feeding #0 first if it is waiting for more work; we could make it "fair" at a cost of keeping a bit more state.
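Putting that together, a sketch of the wiring just described (loop bounds and the consumer body are illustrative, not from the original):
slotted_consumer<int, 3> slots;

void producing_thread() {
    for (int i = 0; i < 100; ++i)
        slots.produce(i);  // blocks until one of the consumers lines up
    slots.finish();        // wake everyone; consume() starts returning false
}

void consuming_thread(std::size_t id) { // id is 0, 1 or 2
    while (slots.consume(id, [](int x) { /* code to consume x */ }))
        ;                  // loop until finish()
}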

Synchronize three Threads in C++

I have the following program (made up example!):
#include<thread>
#include<mutex>
#include<iostream>
class MultiClass {
public:
void Run() {
std::thread t1(&MultiClass::Calc, this);
std::thread t2(&MultiClass::Calc, this);
std::thread t3(&MultiClass::Calc, this);
t1.join();
t2.join();
t3.join();
}
private:
void Calc() {
for (int i = 0; i < 10; ++i) {
std::cout << i << std::endl;
}
}
};
int main() {
MultiClass m;
m.Run();
return 0;
}
What I need is to sync the loop iterations the following way, and I can't come up with a solution (I've been fiddling for about an hour now using mutexes but can't find THE combination):
t1 and t2 shall do one loop iteration, then t3 shall do one iteration, then again t1 and t2 shall do one, then t3 shall do one.
So you see, I need t1 and t2 to do things simultaneously and after one iteration, t3 shall do one iteration on its own.
Can you give me a pointer on how I would be able to achieve that? Like I said, I've been trying this with mutexes and can't come up with a solution.
If you really want to do this by hand with the given thread structure, you could use something like this*:
class SyncObj {
mutex mux;
condition_variable cv;
bool completed[2]{ false,false };
public:
void signalCompetionT1T2(int id) {
lock_guard<mutex> ul(mux);
completed[id] = true;
cv.notify_all();
}
void signalCompetionT3() {
lock_guard<mutex> ul(mux);
completed[0] = false;
completed[1] = false;
cv.notify_all();
}
void waitForCompetionT1T2() {
unique_lock<mutex> ul(mux);
cv.wait(ul, [&]() {return completed[0] && completed[1]; });
}
void waitForCompetionT3(int id) {
unique_lock<mutex> ul(mux);
cv.wait(ul, [&]() {return !completed[id]; });
}
};
class MultiClass {
public:
void Run() {
std::thread t1(&MultiClass::Calc1, this);
std::thread t2(&MultiClass::Calc2, this);
std::thread t3(&MultiClass::Calc3, this);
t1.join();
t2.join();
t3.join();
}
private:
SyncObj obj;
void Calc1() {
for (int i = 0; i < 10; ++i) {
obj.waitForCompetionT3(0);
std::cout << "T1:" << i << std::endl;
obj.signalCompetionT1T2(0);
}
}
void Calc2() {
for (int i = 0; i < 10; ++i) {
obj.waitForCompetionT3(1);
std::cout << "T2:" << i << std::endl;
obj.signalCompetionT1T2(1);
}
}
void Calc3() {
for (int i = 0; i < 10; ++i) {
obj.waitForCompetionT1T2();
std::cout << "T3:" << i << std::endl;
obj.signalCompetionT3();
}
}
};
However, this is only a reasonable approach if each iteration is computationally expensive, such that you can ignore the synchronization overhead. If that is not the case, you should probably have a look at a proper parallel programming library like Intel's TBB or Microsoft's PPL.
*) NOTE: This code is untested and unoptimized. I just wrote it to show what the general structure could look like.
Use two condition variables; here is a sketch...
thread 1 & 2 wait on condition variable segment_1:
std::condition_variable segment_1;
thread 3 waits on condition variable segment_2;
std::condition_variable segment_2;
threads 1 & 2 should wait() on segment_1, and thread 3 should wait() on segment_2. To kick off threads 1 & 2, call notify_all() on segment_1, and once they complete, call notify_one() on segment_2 to kick off thread 3. You may want to use some controlling thread to control the sequence unless you can chain (i.e. once 1 & 2 complete, the last one to complete calls notify for thread 3 and so on..)
This is not perfect (see lost wakeups).
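Here is a minimal sketch of that chaining with wait predicates added, so the lost-wakeup problem doesn't bite (the done flags and function names are mine):
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable segment_1;  // t1 and t2 wait here
std::condition_variable segment_2;  // t3 waits here
bool done[2] = { false, false };    // per-thread "finished this round" flags

void iterationT1T2(int id) {        // one loop body for t1 (id 0) or t2 (id 1)
    std::unique_lock<std::mutex> ul(m);
    segment_1.wait(ul, [&]{ return !done[id]; }); // wait for our turn
    // ... t1/t2 work for this iteration ...
    done[id] = true;
    if (done[0] && done[1])
        segment_2.notify_one();     // the last finisher kicks off t3
}

void iterationT3() {                // one loop body for t3
    std::unique_lock<std::mutex> ul(m);
    segment_2.wait(ul, [&]{ return done[0] && done[1]; });
    // ... t3 work for this iteration ...
    done[0] = done[1] = false;
    segment_1.notify_all();         // start the next t1/t2 round
}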

Efficiently waiting for all tasks in a threadpool to finish

I currently have a program with x workers in my threadpool. During the main loop y tasks are assigned to the workers to complete, but after the tasks are sent out I must wait for all tasks to finish before proceeding with the program. I believe my current solution is inefficient; there must be a better way to wait for all tasks to finish, but I am not sure how to go about this.
// called in main after all tasks are enqueued to
// std::deque<std::function<void()>> tasks
void ThreadPool::waitFinished()
{
while(!tasks.empty()) //check if there are any tasks in queue waiting to be picked up
{
//do literally nothing
}
}
More information:
threadpool structure
//worker thread objects
class Worker {
public:
Worker(ThreadPool& s): pool(s) {}
void operator()();
private:
ThreadPool &pool;
};
//thread pool
class ThreadPool {
public:
ThreadPool(size_t);
template<class F>
void enqueue(F f);
void waitFinished();
~ThreadPool();
private:
friend class Worker;
//keeps track of threads so we can join
std::vector< std::thread > workers;
//task queue
std::deque< std::function<void()> > tasks;
//sync
std::mutex queue_mutex;
std::condition_variable condition;
bool stop;
};
or here's a gist of my threadpool.hpp
example of what I want to use waitFinished() for:
while(running)
//....
for all particles alive
push particle position function to threadpool
end for
threadPool.waitFinished();
push new particle position data into openGL buffer
end while
so this way I can send hundreds of thousands of particle position tasks to be done in parallel, wait for them to finish, and put the new data inside the OpenGL position buffers.
This is one way to do what you're trying. Using two condition variables on the same mutex is not for the light-hearted unless you know what is going on internally. I didn't need the atomic processed member other than my desire to demonstrate how many items were finished between each run.
The sample workload function in this generates one million random int values, then sorts them (gotta heat my office one way or another). waitFinished will not return until the queue is empty and no threads are busy.
#include <iostream>
#include <deque>
#include <vector>
#include <functional>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <random>
#include <atomic>
#include <algorithm>
#include <iterator>
#include <cstdlib>
//thread pool
class ThreadPool
{
public:
ThreadPool(unsigned int n = std::thread::hardware_concurrency());
template<class F> void enqueue(F&& f);
void waitFinished();
~ThreadPool();
unsigned int getProcessed() const { return processed; }
private:
std::vector< std::thread > workers;
std::deque< std::function<void()> > tasks;
std::mutex queue_mutex;
std::condition_variable cv_task;
std::condition_variable cv_finished;
std::atomic_uint processed;
unsigned int busy;
bool stop;
void thread_proc();
};
ThreadPool::ThreadPool(unsigned int n)
: busy()
, processed()
, stop()
{
for (unsigned int i=0; i<n; ++i)
workers.emplace_back(std::bind(&ThreadPool::thread_proc, this));
}
ThreadPool::~ThreadPool()
{
// set stop-condition
std::unique_lock<std::mutex> latch(queue_mutex);
stop = true;
cv_task.notify_all();
latch.unlock();
// all threads terminate, then we're done.
for (auto& t : workers)
t.join();
}
void ThreadPool::thread_proc()
{
while (true)
{
std::unique_lock<std::mutex> latch(queue_mutex);
cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
if (!tasks.empty())
{
// got work. set busy.
++busy;
// pull from queue
auto fn = tasks.front();
tasks.pop_front();
// release lock. run async
latch.unlock();
// run function outside context
fn();
++processed;
latch.lock();
--busy;
cv_finished.notify_one();
}
else if (stop)
{
break;
}
}
}
// generic function push
template<class F>
void ThreadPool::enqueue(F&& f)
{
std::unique_lock<std::mutex> lock(queue_mutex);
tasks.emplace_back(std::forward<F>(f));
cv_task.notify_one();
}
// waits until the queue is empty and no threads are busy.
void ThreadPool::waitFinished()
{
std::unique_lock<std::mutex> lock(queue_mutex);
cv_finished.wait(lock, [this](){ return tasks.empty() && (busy == 0); });
}
// a cpu-busy task.
void work_proc()
{
std::random_device rd;
std::mt19937 rng(rd());
// build a vector of random numbers
std::vector<int> data;
data.reserve(100000);
std::generate_n(std::back_inserter(data), data.capacity(), [&](){ return rng(); });
std::sort(data.begin(), data.end(), std::greater<int>());
}
int main()
{
ThreadPool tp;
// run five batches of 100 items
for (int x=0; x<5; ++x)
{
// queue 100 work tasks
for (int i=0; i<100; ++i)
tp.enqueue(work_proc);
tp.waitFinished();
std::cout << tp.getProcessed() << '\n';
}
// destructor will close down thread pool
return EXIT_SUCCESS;
}
Output
100
200
300
400
500
Best of luck.

std::lock_guard won't unlock

I'm trying to lock my list of mutexes in the following code so that only one thread can search it, unlock, lock or modify it at a time.
#include <mutex>
#include <map>
#include <memory>
#include <vector>
#include <thread>
#include <atomic>
#include <iostream>
#include <Windows.h>
struct MoveableMutex
{
std::mutex m;
MoveableMutex() {}
MoveableMutex(MoveableMutex const&) {}
MoveableMutex& operator = (MoveableMutex const&) { return *this; }
};
class Locks
{
private:
static std::mutex map_lock;
static std::uint32_t lock_count;
std::map<std::uint32_t, MoveableMutex> locklist;
public:
std::uint32_t AddLock();
void RemoveLock(std::uint32_t ID);
void Lock(std::uint32_t ID);
bool TryLock(std::uint32_t ID);
void Unlock(std::uint32_t ID);
};
std::uint32_t Locks::lock_count = 0;
std::mutex Locks::map_lock;
std::uint32_t Locks::AddLock()
{
std::lock_guard<std::mutex> guard(map_lock);
locklist.insert(std::make_pair(++lock_count, MoveableMutex()));
return lock_count;
}
void Locks::RemoveLock(std::uint32_t ID)
{
std::lock_guard<std::mutex> guard(map_lock);
auto it = locklist.find(ID);
if (it != locklist.end())
{
it->second.m.unlock();
locklist.erase(it);
}
}
void Locks::Lock(std::uint32_t ID)
{
std::lock_guard<std::mutex> guard(map_lock);
auto it = this->locklist.find(ID);
if (it != this->locklist.end())
{
it->second.m.lock();
}
}
bool Locks::TryLock(std::uint32_t ID)
{
std::lock_guard<std::mutex> guard(map_lock);
auto it = this->locklist.find(ID);
if (it != this->locklist.end())
{
return it->second.m.try_lock();
}
return false;
}
void Locks::Unlock(std::uint32_t ID)
{
std::lock_guard<std::mutex> guard(map_lock);
auto it = this->locklist.find(ID);
if (it != locklist.end())
{
it->second.m.unlock();
}
}
int main()
{
Locks locklist;
int i = locklist.AddLock();
std::atomic<bool> stop(false);
std::atomic<bool> stop2(false);
std::thread o([&]
{
locklist.Lock(i);
while(!stop)
{
std::cout << "Hey\n";
Sleep(100);
}
locklist.Unlock(i);
});
std::thread t([&]
{
locklist.Lock(i);
while(!stop2)
{
std::cout << "Hey2\n";
Sleep(100);
}
locklist.Unlock(i);
});
Sleep(1000);
stop = true;
system("CLS");
o.join();
Sleep(1000);
stop2 = true;
t.join();
return 0;
}
However, with the std::lock_guard inside the Unlock function, it causes a deadlock. If I remove the lock_guard from the Unlock function, it works fine.
Is there a reason the lock_guard isn't destructing or unlocking?
One thread calls Lock, which ends up locking the mutex in the map. The other thread calls Lock, which locks map_lock then tries to lock the mutex in the map, and gets stuck there (with map_lock still held). Eventually, the first thread gets out of the loop and calls Unlock, which gets stuck waiting on map_lock.
The main design flaw here is that you have a thread acquire two locks, one after another. This only works safely if all threads acquire them in the same order (and release in reverse order of acquiring). But your code acquires them in different order at different times: that's a recipe for a deadlock.
See also: lock hierarchy
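One hedged way out, sketched below: never hold map_lock while locking or unlocking a per-ID mutex. Store the mutexes as std::shared_ptr, copy the pointer out under map_lock, release map_lock, and only then operate on the mutex; the shared_ptr keeps it alive even if another thread removes it from the map. The class below is my reconstruction of that idea, not the original code:
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

class Locks
{
    std::mutex map_lock;
    std::uint32_t lock_count = 0;
    std::map<std::uint32_t, std::shared_ptr<std::mutex>> locklist;

    // Copy the shared_ptr out under map_lock, then drop map_lock.
    std::shared_ptr<std::mutex> find(std::uint32_t id)
    {
        std::lock_guard<std::mutex> guard(map_lock);
        auto it = locklist.find(id);
        return it != locklist.end() ? it->second : nullptr;
    }
public:
    std::uint32_t AddLock()
    {
        std::lock_guard<std::mutex> guard(map_lock);
        locklist.emplace(++lock_count, std::make_shared<std::mutex>());
        return lock_count;
    }
    void RemoveLock(std::uint32_t id)
    {
        std::lock_guard<std::mutex> guard(map_lock);
        locklist.erase(id); // the shared_ptr keeps the mutex alive for current holders
    }
    // At most one lock is held at a time, so the ordering problem disappears.
    void Lock(std::uint32_t id)    { if (auto m = find(id)) m->lock(); }
    bool TryLock(std::uint32_t id) { auto m = find(id); return m && m->try_lock(); }
    void Unlock(std::uint32_t id)  { if (auto m = find(id)) m->unlock(); }
};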

C++0x has no semaphores? How to synchronize threads?

Is it true that C++0x will come without semaphores? There are already some questions on Stack Overflow regarding the use of semaphores. I use them (posix semaphores) all the time to let a thread wait for some event in another thread:
void thread0(...)
{
doSomething0();
event1.wait();
...
}
void thread1(...)
{
doSomething1();
event1.post();
...
}
If I would do that with a mutex:
void thread0(...)
{
doSomething0();
event1.lock(); event1.unlock();
...
}
void thread1(...)
{
event1.lock();
doSomething1();
event1.unlock();
...
}
Problem: It's ugly, and it's not guaranteed that thread1 locks the mutex first (given that the same thread should lock and unlock a mutex, you also can't lock event1 before thread0 and thread1 have started).
So since boost doesn't have semaphores either, what is the simplest way to achieve the above?
You can easily build one from a mutex and a condition variable:
#include <mutex>
#include <condition_variable>
class semaphore {
std::mutex mutex_;
std::condition_variable condition_;
unsigned long count_ = 0; // Initialized as locked.
public:
void release() {
std::lock_guard<decltype(mutex_)> lock(mutex_);
++count_;
condition_.notify_one();
}
void acquire() {
std::unique_lock<decltype(mutex_)> lock(mutex_);
while(!count_) // Handle spurious wake-ups.
condition_.wait(lock);
--count_;
}
bool try_acquire() {
std::lock_guard<decltype(mutex_)> lock(mutex_);
if(count_) {
--count_;
return true;
}
return false;
}
};
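Mapped back to the question's event-style usage, a minimal sketch (doSomething0/doSomething1 as in the question):
semaphore event1; // count starts at 0, so acquire() blocks until a release()

void thread0() {
    doSomething0();
    event1.acquire(); // event1.wait() in the question's terms
    // ...
}

void thread1() {
    doSomething1();
    event1.release(); // event1.post() in the question's terms
    // ...
}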
Based on Maxim Yegorushkin's answer, I tried to make the example in C++11 style.
#include <mutex>
#include <condition_variable>
class Semaphore {
public:
Semaphore (int count_ = 0)
: count(count_) {}
inline void notify()
{
std::unique_lock<std::mutex> lock(mtx);
count++;
cv.notify_one();
}
inline void wait()
{
std::unique_lock<std::mutex> lock(mtx);
while(count == 0){
cv.wait(lock);
}
count--;
}
private:
std::mutex mtx;
std::condition_variable cv;
int count;
};
I decided to write the most robust/generic C++11 semaphore I could, in the style of the standard as much as I could (note using semaphore = ..., you normally would just use the name semaphore similar to normally using string not basic_string):
template <typename Mutex, typename CondVar>
class basic_semaphore {
public:
using native_handle_type = typename CondVar::native_handle_type;
explicit basic_semaphore(size_t count = 0);
basic_semaphore(const basic_semaphore&) = delete;
basic_semaphore(basic_semaphore&&) = delete;
basic_semaphore& operator=(const basic_semaphore&) = delete;
basic_semaphore& operator=(basic_semaphore&&) = delete;
void notify();
void wait();
bool try_wait();
template<class Rep, class Period>
bool wait_for(const std::chrono::duration<Rep, Period>& d);
template<class Clock, class Duration>
bool wait_until(const std::chrono::time_point<Clock, Duration>& t);
native_handle_type native_handle();
private:
Mutex mMutex;
CondVar mCv;
size_t mCount;
};
using semaphore = basic_semaphore<std::mutex, std::condition_variable>;
template <typename Mutex, typename CondVar>
basic_semaphore<Mutex, CondVar>::basic_semaphore(size_t count)
: mCount{count}
{}
template <typename Mutex, typename CondVar>
void basic_semaphore<Mutex, CondVar>::notify() {
std::lock_guard<Mutex> lock{mMutex};
++mCount;
mCv.notify_one();
}
template <typename Mutex, typename CondVar>
void basic_semaphore<Mutex, CondVar>::wait() {
std::unique_lock<Mutex> lock{mMutex};
mCv.wait(lock, [&]{ return mCount > 0; });
--mCount;
}
template <typename Mutex, typename CondVar>
bool basic_semaphore<Mutex, CondVar>::try_wait() {
std::lock_guard<Mutex> lock{mMutex};
if (mCount > 0) {
--mCount;
return true;
}
return false;
}
template <typename Mutex, typename CondVar>
template<class Rep, class Period>
bool basic_semaphore<Mutex, CondVar>::wait_for(const std::chrono::duration<Rep, Period>& d) {
std::unique_lock<Mutex> lock{mMutex};
auto finished = mCv.wait_for(lock, d, [&]{ return mCount > 0; });
if (finished)
--mCount;
return finished;
}
template <typename Mutex, typename CondVar>
template<class Clock, class Duration>
bool basic_semaphore<Mutex, CondVar>::wait_until(const std::chrono::time_point<Clock, Duration>& t) {
std::unique_lock<Mutex> lock{mMutex};
auto finished = mCv.wait_until(lock, t, [&]{ return mCount > 0; });
if (finished)
--mCount;
return finished;
}
template <typename Mutex, typename CondVar>
typename basic_semaphore<Mutex, CondVar>::native_handle_type basic_semaphore<Mutex, CondVar>::native_handle() {
return mCv.native_handle();
}
In accordance with POSIX semaphores, I would add
class semaphore
{
...
bool trywait()
{
boost::mutex::scoped_lock lock(mutex_);
if(count_)
{
--count_;
return true;
}
else
{
return false;
}
}
};
And I much prefer using a synchronisation mechanism at a convenient level of abstraction, rather than always copy pasting a stitched-together version using more basic operators.
C++20 finally has semaphores - std::counting_semaphore<max_count>.
These have (at least) the following methods:
acquire() (blocking)
try_acquire() (non-blocking, returns immediately)
try_acquire_for() (blocks for up to a given duration)
try_acquire_until() (blocks until at most a given point in time)
release()
You can read these CppCon 2019 presentation slides, or watch the video. There's also the official proposal P0514R4, but it may not be up-to-date with actual C++20.
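For illustration, a minimal C++20 sketch exercising a few of these methods (the slot count and timeout are arbitrary):
#include <chrono>
#include <semaphore>

std::counting_semaphore<3> slots{3};   // up to 3 threads in the region at once

void worker() {
    slots.acquire();                   // blocks until a slot is free
    // ... work ...
    slots.release();
}

bool try_worker() {
    using namespace std::chrono_literals;
    if (!slots.try_acquire_for(100ms)) // gives up after 100ms
        return false;
    // ... work ...
    slots.release();
    return true;
}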
You can also check out cpp11-on-multicore - it has a portable and optimal semaphore implementation.
The repository also contains other threading goodies that complement c++11 threading.
You can work with mutex and condition variables. You gain exclusive access with the mutex, check whether you want to continue or need to wait for the other end. If you need to wait, you wait in a condition. When the other thread determines that you can continue, it signals the condition.
There is a short example in the boost::thread library that you can most probably just copy (the C++0x and boost thread libs are very similar).
An RAII semaphore wrapper can also be useful in threaded code:
class ScopedSemaphore
{
public:
explicit ScopedSemaphore(Semaphore& sem) : m_Semaphore(sem) { m_Semaphore.Wait(); }
ScopedSemaphore(const ScopedSemaphore&) = delete;
~ScopedSemaphore() { m_Semaphore.Notify(); }
ScopedSemaphore& operator=(const ScopedSemaphore&) = delete;
private:
Semaphore& m_Semaphore;
};
Usage example in multithread app:
boost::ptr_vector<std::thread> threads;
Semaphore semaphore;
for (...)
{
...
auto t = new std::thread([..., &semaphore]
{
ScopedSemaphore scopedSemaphore(semaphore);
...
}
);
threads.push_back(t);
}
for (auto& t : threads)
t.join();
I found that shared_ptr and weak_ptr, along with a list, did the job I needed. My issue was, I had several clients wanting to interact with a host's internal data. Typically, the host updates the data on its own; however, if a client requests it, the host needs to stop updating until no clients are accessing the host data. At the same time, a client could ask for exclusive access, so that no other clients, nor the host, could modify that host data.
How I did this was, I created a struct:
struct UpdateLock
{
typedef std::shared_ptr< UpdateLock > ptr;
};
Each client would have a member of such:
UpdateLock::ptr m_myLock;
Then the host would have a weak_ptr member for exclusivity, and a list of weak_ptrs for non-exclusive locks:
std::weak_ptr< UpdateLock > m_exclusiveLock;
std::list< std::weak_ptr< UpdateLock > > m_locks;
There is a function to enable locking, and another function to check if the host is locked:
UpdateLock::ptr LockUpdate( bool exclusive );
bool IsUpdateLocked( bool exclusive ) const;
I test for locks in LockUpdate, IsUpdateLocked, and periodically in the host's Update routine. Testing for a lock is as simple as checking whether the weak_ptr has expired, and removing any expired entries from the m_locks list (I only do this during the host update); then I can check if the list is empty. At the same time, I get automatic unlocking when a client resets the shared_ptr they are hanging onto, which also happens when a client gets destroyed.
The overall effect is: since clients rarely need exclusivity (typically reserved for additions and deletions only), most of the time a request to LockUpdate( false ), that is to say non-exclusive, succeeds so long as (! m_exclusiveLock). And a LockUpdate( true ), a request for exclusivity, succeeds only when both (! m_exclusiveLock) and (m_locks.empty()).
A queue could be added to mitigate between exclusive and non-exclusive locks, however, I have had no collisions thus far, so I intend to wait until that happens to add the solution (mostly so I have a real-world test condition).
So far this is working well for my needs; I can imagine the need to expand this, and some issues that might arise over expanded use; however, this was quick to implement and required very little custom code.
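A hedged sketch of what LockUpdate and IsUpdateLocked might look like under that description (the Host class and the member bodies are my reconstruction, not the original code):
#include <algorithm>
#include <list>
#include <memory>

struct UpdateLock
{
    typedef std::shared_ptr< UpdateLock > ptr;
};

class Host
{
    std::weak_ptr< UpdateLock > m_exclusiveLock;
    std::list< std::weak_ptr< UpdateLock > > m_locks;
public:
    bool IsUpdateLocked( bool exclusive ) const
    {
        if ( !m_exclusiveLock.expired() )
            return true;                  // someone holds exclusivity
        if ( !exclusive )
            return false;                 // shared access coexists with other shared locks
        return std::any_of( m_locks.begin(), m_locks.end(),
                            []( auto const& w ) { return !w.expired(); } );
    }
    // Returns null if the lock was refused; otherwise the client holds the
    // returned shared_ptr for as long as it needs access.
    UpdateLock::ptr LockUpdate( bool exclusive )
    {
        if ( IsUpdateLocked( exclusive ) )
            return nullptr;
        auto lock = std::make_shared< UpdateLock >();
        if ( exclusive )
            m_exclusiveLock = lock;       // expires when the client lets go
        else
            m_locks.push_back( lock );
        return lock;
    }
};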
This is an old question, but I would like to offer another solution.
It seems you need not a semaphore but an event, like Windows Events.
Very effective events can be implemented as follows:
#ifdef _MSC_VER
#include <concrt.h>
#else
// pthread implementation
#include <cstddef>
#include <cstdint>
#include <shared_mutex>
namespace Concurrency
{
const unsigned int COOPERATIVE_TIMEOUT_INFINITE = (unsigned int)-1;
const size_t COOPERATIVE_WAIT_TIMEOUT = SIZE_MAX;
class event
{
public:
event();
~event();
size_t wait(unsigned int timeout = COOPERATIVE_TIMEOUT_INFINITE);
void set();
void reset();
static size_t wait_for_multiple(event** _PPEvents, size_t _Count, bool _FWaitAll, unsigned int _Timeout = COOPERATIVE_TIMEOUT_INFINITE);
static const unsigned int timeout_infinite = COOPERATIVE_TIMEOUT_INFINITE;
private:
int d;
std::shared_mutex guard;
};
};
namespace concurrency = Concurrency;
#include <unistd.h>
#include <errno.h>
#include <sys/eventfd.h>
#include <sys/epoll.h>
#include <chrono>
#include "../HandleHolder.h"
typedef CommonHolder<int, close> fd_holder;
namespace Concurrency
{
int watch(int ep_fd, int fd)
{
epoll_event ep_event;
ep_event.events = EPOLLIN;
ep_event.data.fd = fd;
return epoll_ctl(ep_fd, EPOLL_CTL_ADD, fd, &ep_event);
}
event::event()
: d(eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK))
{
}
event::~event()
{
std::unique_lock<std::shared_mutex> lock(guard);
close(d);
d = -1;
}
size_t event::wait(unsigned int timeout)
{
fd_holder ep_fd(epoll_create1(EPOLL_CLOEXEC));
{
std::shared_lock<std::shared_mutex> lock(guard);
if (d == -1 || watch(ep_fd.GetHandle(), d) < 0)
return COOPERATIVE_WAIT_TIMEOUT;
}
epoll_event ep_event;
return epoll_wait(ep_fd.GetHandle(), &ep_event, 1, timeout) == 1 && (ep_event.events & EPOLLIN) ? 0 : COOPERATIVE_WAIT_TIMEOUT;
}
void event::set()
{
uint64_t count = 1;
write(d, &count, sizeof(count));
}
void event::reset()
{
uint64_t count;
read(d, &count, sizeof(count));
}
size_t event::wait_for_multiple(event** _PPEvents, size_t _Count, bool _FWaitAll, unsigned int _Timeout)
{
if (_FWaitAll) // not implemented
std::abort();
const auto deadline = _Timeout != COOPERATIVE_TIMEOUT_INFINITE ? std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now().time_since_epoch()).count() + _Timeout : COOPERATIVE_TIMEOUT_INFINITE;
fd_holder ep_fd(epoll_create1(EPOLL_CLOEXEC));
int fds[_Count];
for (int i = 0; i < _Count; ++i)
{
std::shared_lock<std::shared_mutex> lock(_PPEvents[i]->guard);
fds[i] = _PPEvents[i]->d;
if (fds[i] != -1 && watch(ep_fd.GetHandle(), fds[i]) < 0)
fds[i] = -1;
}
epoll_event ep_events[_Count];
// epoll_wait can be interrupted by a signal. Wait out the whole timeout, just as on Windows.
int res = 0;
while (true)
{
res = epoll_wait(ep_fd.GetHandle(), &ep_events[0], _Count, _Timeout);
if (res == -1 && errno == EINTR && std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now().time_since_epoch()).count() < deadline)
continue;
break;
}
for (int i = 0; i < _Count; ++i)
{
if (fds[i] == -1)
continue;
for (int j = 0; j < res; ++j)
if (ep_events[j].data.fd == fds[i] && (ep_events[j].events & EPOLLIN))
return i;
}
return COOPERATIVE_WAIT_TIMEOUT;
}
};
#endif
And then just use concurrency::event
Different from other answers, I propose a new version which:
Unblocks all waiting threads before being deleted. In this case, deleting the semaphore will wake up all waiting threads, and only after everybody has woken up will the semaphore destructor exit.
Has a parameter on the wait() call to automatically unblock the calling thread after the timeout in milliseconds has passed.
Has an option on the constructor to limit the available resource count to the count the semaphore was initialized with. This way, calling notify() too many times will not increase how many resources the semaphore has.
#include <stdio.h>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <iostream>
std::recursive_mutex g_sync_mutex;
#define sync(x) do { \
std::unique_lock<std::recursive_mutex> lock(g_sync_mutex); \
x; \
} while (false);
class Semaphore {
int _count;
bool _limit;
int _all_resources;
int _wakedup;
std::mutex _mutex;
std::condition_variable_any _condition_variable;
public:
/**
* count - how many resources this semaphore holds
* limit - limit notify() calls only up to the count value (available resources)
*/
Semaphore (int count, bool limit)
: _count(count),
_limit(limit),
_all_resources(count),
_wakedup(count)
{
}
/**
* Unlock all waiting threads before destructing the semaphore (to avoid them segfaulting later)
*/
virtual ~Semaphore () {
std::unique_lock<std::mutex> lock(_mutex);
_wakeup(lock);
}
void _wakeup(std::unique_lock<std::mutex>& lock) {
int lastwakeup = 0;
while( _wakedup < _all_resources ) {
lock.unlock();
notify();
lock.lock();
// avoids 100% CPU usage if someone is not waking up properly
if (lastwakeup == _wakedup) {
std::this_thread::sleep_for( std::chrono::milliseconds(10) );
}
lastwakeup = _wakedup;
}
}
// Mutex and condition variables are not movable and there is no need for smart pointers yet
Semaphore(const Semaphore&) = delete;
Semaphore& operator =(const Semaphore&) = delete;
Semaphore(const Semaphore&&) = delete;
Semaphore& operator =(const Semaphore&&) = delete;
/**
* Release one acquired resource.
*/
void notify()
{
std::unique_lock<std::mutex> lock(_mutex);
// sync(std::cerr << getTime() << "Calling notify(" << _count << ", " << _limit << ", " << _all_resources << ")" << std::endl);
_count++;
if (_limit && _count > _all_resources) {
_count = _all_resources;
}
_condition_variable.notify_one();
}
/**
* This function never blocks!
* Return false if it would block when acquiring the lock. Otherwise acquires the lock and return true.
*/
bool try_acquire() {
std::unique_lock<std::mutex> lock(_mutex);
// sync(std::cerr << getTime() << "Calling try_acquire(" << _count << ", " << _limit << ", " << _all_resources << ")" << std::endl);
if(_count <= 0) {
return false;
}
_count--;
return true;
}
/**
* Return true if the timeout expired, otherwise return false.
* timeout - how many milliseconds to wait before automatically unlocking the wait() call.
*/
bool wait(int timeout = 0) {
std::unique_lock<std::mutex> lock(_mutex);
// sync(std::cerr << getTime() << "Calling wait(" << _count << ", " << _limit << ", " << _all_resources << ")" << std::endl);
_count--;
_wakedup--;
try {
std::chrono::time_point<std::chrono::system_clock> timenow = std::chrono::system_clock::now();
while(_count < 0) {
if (timeout < 1) {
_condition_variable.wait(lock);
}
else {
std::cv_status status = _condition_variable.wait_until(lock, timenow + std::chrono::milliseconds(timeout));
if ( std::cv_status::timeout == status) {
_count++;
_wakedup++;
return true;
}
}
}
}
catch (...) {
_count++;
_wakedup++;
throw;
}
_wakedup++;
return false;
}
/**
* Return true if calling wait() will block the calling thread
*/
bool locked() {
std::unique_lock<std::mutex> lock(_mutex);
return _count <= 0;
}
/**
* Return true if the semaphore has at least all of its resources available (as many as when it was created)
*/
bool freed() {
std::unique_lock<std::mutex> lock(_mutex);
return _count >= _all_resources;
}
/**
* Return how many resources are available:
* - 0 means no free resources, and calling wait() will block the calling thread
* - a negative value means there are several threads being blocked
* - a positive value means there are no threads waiting
*/
int count() {
std::unique_lock<std::mutex> lock(_mutex);
return _count;
}
/**
* Wake everybody who is waiting and reset the semaphore to its initial value.
*/
void reset() {
std::unique_lock<std::mutex> lock(_mutex);
if(_count < 0) {
_wakeup(lock);
}
_count = _all_resources;
}
};
Utility to print the current timestamp:
std::string getTime() {
char buffer[20];
#if defined( WIN32 )
SYSTEMTIME wlocaltime;
GetLocalTime(&wlocaltime);
::snprintf(buffer, sizeof buffer, "%02d:%02d:%02d.%03d ", wlocaltime.wHour, wlocaltime.wMinute, wlocaltime.wSecond, wlocaltime.wMilliseconds);
#else
std::chrono::time_point< std::chrono::system_clock > now = std::chrono::system_clock::now();
auto duration = now.time_since_epoch();
auto hours = std::chrono::duration_cast< std::chrono::hours >( duration );
duration -= hours;
auto minutes = std::chrono::duration_cast< std::chrono::minutes >( duration );
duration -= minutes;
auto seconds = std::chrono::duration_cast< std::chrono::seconds >( duration );
duration -= seconds;
auto milliseconds = std::chrono::duration_cast< std::chrono::milliseconds >( duration );
duration -= milliseconds;
time_t theTime = time( NULL );
struct tm* aTime = localtime( &theTime );
::snprintf(buffer, sizeof buffer, "%02d:%02d:%02d.%03ld ", aTime->tm_hour, aTime->tm_min, aTime->tm_sec, milliseconds.count());
#endif
return buffer;
}
Example program using this semaphore:
// g++ -o test -Wall -Wextra -ggdb -g3 -pthread test.cpp && gdb --args ./test
// valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose ./test
// procdump -accepteula -ma -e -f "" -x c:\ myexe.exe
int main(int argc, char* argv[]) {
std::cerr << getTime() << "Creating Semaphore" << std::endl;
Semaphore* semaphore = new Semaphore(1, false);
semaphore->wait(1000);
semaphore->wait(1000);
std::cerr << getTime() << "Auto Unlocking Semaphore wait" << std::endl;
std::this_thread::sleep_for( std::chrono::milliseconds(5000) );
delete semaphore;
std::cerr << getTime() << "Exiting after 10 seconds..." << std::endl;
return 0;
}
Example output:
11:03:01.012 Creating Semaphore
11:03:02.012 Auto Unlocking Semaphore wait
11:03:07.012 Exiting after 10 seconds...
Extra function which uses a EventLoop to unlock the semaphores after some time:
std::shared_ptr<std::atomic<bool>> autowait(Semaphore* semaphore, int timeout, EventLoop<std::function<void()>>& eventloop, const char* source) {
std::shared_ptr<std::atomic<bool>> waiting(std::make_shared<std::atomic<bool>>(true));
sync(std::cerr << getTime() << "autowait '" << source << "'..." << std::endl);
if (semaphore->try_acquire()) {
eventloop.enqueue( timeout, [waiting, source, semaphore]{
if ( (*waiting).load() ) {
sync(std::cerr << getTime() << "Timeout '" << source << "'..." << std::endl);
semaphore->notify();
}
} );
}
else {
semaphore->wait(timeout);
}
return waiting;
}
Semaphore semaphore(1, false);
EventLoop<std::function<void()>>* eventloop = new EventLoop<std::function<void()>>(true);
std::shared_ptr<std::atomic<bool>> waiting_something = autowait(&semaphore, 45000, eventloop, "waiting_something");
In case someone is interested in an atomic version, here is the implementation. The performance is expected to be better than the mutex & condition variable version.
class semaphore_atomic
{
public:
void notify() {
count_.fetch_add(1, std::memory_order_release);
}
void wait() {
while (true) {
int count = count_.load(std::memory_order_relaxed);
if (count > 0) {
if (count_.compare_exchange_weak(count, count-1, std::memory_order_acq_rel, std::memory_order_relaxed)) {
break;
}
}
}
}
bool try_wait() {
int count = count_.load(std::memory_order_relaxed);
if (count > 0) {
if (count_.compare_exchange_strong(count, count-1, std::memory_order_acq_rel, std::memory_order_relaxed)) {
return true;
}
}
return false;
}
private:
std::atomic_int count_{0};
};