How to limit the number of running instances in C++

I have a C++ class that allocates a lot of memory. It does this by calling a third-party library that is designed to crash if it cannot allocate the memory, and sometimes my application creates several instances of my class in parallel threads. With too many threads I get a crash.
My best idea for a solution is to make sure that there are never, say, more than three instances running at the same time. (Is this a good idea?)
And my current best idea for implementing that is to use a boost mutex. Something along the lines of the following pseudo-code:
MyClass::MyClass() {
    my_thread_number = -1; // this is a member variable
    while (my_thread_number == -1) {
        for (int i = 0; i < MAX_PROCESSES; i++) {
            if (try_lock a mutex named i) {
                my_thread_number = i;
                break;
            }
        }
    }
    // Now I know that my thread holds mutex number i and is allowed to run
}

MyClass::~MyClass() {
    release mutex named my_thread_number
}
As you see, I am not quite sure of the exact syntax for mutexes here. So, summing up, my questions are:
Am I on the right track in trying to solve my memory error by limiting the number of threads?
If yes, should I do it with mutexes or by other means?
If yes, is my algorithm sound?
Is there a nice example somewhere of how to use try_lock with boost mutexes?
Edit: I realized I am talking about threads, not processes.
Edit: I am involved in building an application that can run on both Linux and Windows...

UPDATE My other answer addresses scheduling resources among threads (after the question was clarified).
It shows both a semaphore approach to coordinate work among (many) workers, and a thread_pool to limit workers in the first place and queue the work.
On Linux (and perhaps other OSes?) you can use a lock-file idiom (though it's not supported on some file systems and old kernels).
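For reference, a minimal sketch of that lock-file idiom on Linux (POSIX calls only; the path is an arbitrary choice, and the kernel drops the lock automatically when the process dies):

#include <fcntl.h>    // open
#include <sys/file.h> // flock
#include <unistd.h>   // close
#include <cstdio>

int main() {
    int fd = open("/tmp/myapp.lock", O_CREAT | O_RDWR, 0666);
    if (fd < 0) return 1;

    // LOCK_NB: fail immediately instead of blocking if another process holds it
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        std::puts("another instance is already running");
        return 1;
    }

    // ... do the real work; the kernel releases the lock when the process
    // exits (even on kill -9), which is the main appeal of this idiom ...

    flock(fd, LOCK_UN);
    close(fd);
}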
Instead, I would suggest using interprocess synchronisation objects.
E.g., using a Boost Interprocess named semaphore:
#include <boost/interprocess/sync/named_semaphore.hpp>
#include <boost/thread.hpp>
#include <iostream>

int main()
{
    using namespace boost::interprocess;
    named_semaphore sem(open_or_create, "ffed38bd-f0fc-4f79-8838-5301c328268c", 0ul);

    if (sem.try_wait())
    {
        // note: this try_wait consumed the token the first instance posted
        std::cout << "Oops, second instance\n";
    }
    else
    {
        sem.post();

        // feign hard work for 30s
        boost::this_thread::sleep_for(boost::chrono::seconds(30));

        if (sem.try_wait())
        {
            sem.remove("ffed38bd-f0fc-4f79-8838-5301c328268c");
        }
    }
}
If you start one copy in the background, new copies will "refuse" to start ("Oops, second instance") for about 30s.
I have a feeling it might be easier to reverse the logic here. Mmm. Lemme try.
some time passes
Hehe. That was more tricky than I thought.
The thing is, you want to make sure that the lock doesn't remain when your application is interrupted or killed. In the interest of sharing the techniques for portably handling the signals:
#include <boost/interprocess/sync/named_semaphore.hpp>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
#include <iostream>

#define MAX_PROCESS_INSTANCES 3

boost::interprocess::named_semaphore sem(
        boost::interprocess::open_or_create,
        "4de7ddfe-2bd5-428f-b74d-080970f980be",
        MAX_PROCESS_INSTANCES);

// to handle signals:
boost::asio::io_service service;
boost::asio::signal_set sig(service);

int main()
{
    if (sem.try_wait())
    {
        sig.add(SIGINT);
        sig.add(SIGTERM);
        sig.add(SIGABRT);

        // The handler also runs when the wait is cancelled below (sig == 0),
        // so the semaphore is released on normal exit as well.
        sig.async_wait([](boost::system::error_code, int sig) {
            std::cerr << "Exiting with signal " << sig << "...\n";
            sem.post();
        });
        boost::thread sig_listener([&] { service.run(); });

        // feign hard work for 3s
        boost::this_thread::sleep_for(boost::chrono::seconds(3));

        service.post([&] { sig.cancel(); });
        sig_listener.join();
    }
    else
    {
        std::cout << "More than " << MAX_PROCESS_INSTANCES << " instances not allowed\n";
    }
}
There's a lot that could be explained there. Let me know if you're interested.
NOTE It should be quite obvious that if kill -9 is used on your application (forced termination), all bets are off and you'll have to either remove the named semaphore object or explicitly unlock it (post()).
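For example, a one-off cleanup sketch for a semaphore left behind by a forced kill (named_semaphore::remove is a static member, so this can run from any process):

#include <boost/interprocess/sync/named_semaphore.hpp>

int main() {
    // removes the kernel/filesystem object backing the semaphore, if present
    boost::interprocess::named_semaphore::remove(
        "4de7ddfe-2bd5-428f-b74d-080970f980be");
}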
Here's a test run on my system:
sehe@desktop:/tmp$ (for a in {1..6}; do ./test& done; time wait)
More than 3 instances not allowed
More than 3 instances not allowed
More than 3 instances not allowed
Exiting with signal 0...
Exiting with signal 0...
Exiting with signal 0...
real 0m3.005s
user 0m0.013s
sys 0m0.012s

Here's a simplistic way to implement your own 'semaphore' (at the time of writing, neither the standard library nor Boost.Thread had one; C++20 later added std::counting_semaphore). This chooses a 'cooperative' approach where workers wait for each other:
#include <boost/thread.hpp>
#include <boost/phoenix.hpp>
#include <iostream>

using namespace boost;
using namespace boost::phoenix::arg_names;

void the_work(int id)
{
    static int running = 0;
    // note: this read of `running` is unsynchronized; it's for illustration only
    std::cout << "worker " << id << " entered (" << running << " running)\n";

    static mutex mx;
    static condition_variable cv;

    // synchronize here, waiting until we can begin work
    {
        unique_lock<mutex> lk(mx);
        cv.wait(lk, phoenix::cref(running) < 3);
        running += 1;
    }

    std::cout << "worker " << id << " start work\n";
    this_thread::sleep_for(chrono::seconds(2));
    std::cout << "worker " << id << " done\n";

    // signal one other worker, if waiting
    {
        lock_guard<mutex> lk(mx);
        running -= 1;
        cv.notify_one();
    }
}

int main()
{
    thread_group pool;

    for (int i = 0; i < 10; ++i)
        pool.create_thread(bind(the_work, i));

    pool.join_all();
}
Now, I'd say it's probably better to have a dedicated pool of n workers taking their work from a queue in turns:
#include <boost/thread.hpp>
#include <boost/phoenix.hpp>
#include <boost/optional.hpp>
#include <boost/atomic.hpp>
#include <boost/function.hpp>
#include <boost/bind.hpp>
#include <deque>
#include <iostream>

using namespace boost;
using namespace boost::phoenix::arg_names;

class thread_pool
{
  private:
    mutex mx;
    condition_variable cv;

    typedef function<void()> job_t;
    std::deque<job_t> _queue;

    thread_group pool;

    boost::atomic_bool shutdown;

    static void worker_thread(thread_pool& q)
    {
        while (auto job = q.dequeue())
            (*job)();
    }

  public:
    thread_pool() : shutdown(false)
    {
        for (unsigned i = 0; i < boost::thread::hardware_concurrency(); ++i)
            pool.create_thread(bind(worker_thread, ref(*this)));
    }

    void enqueue(job_t job)
    {
        lock_guard<mutex> lk(mx);
        _queue.push_back(std::move(job));

        cv.notify_one();
    }

    optional<job_t> dequeue()
    {
        unique_lock<mutex> lk(mx);
        namespace phx = boost::phoenix;

        cv.wait(lk, phx::ref(shutdown) || !phx::empty(phx::ref(_queue)));

        if (_queue.empty())
            return none;

        auto job = std::move(_queue.front());
        _queue.pop_front();

        return std::move(job);
    }

    ~thread_pool()
    {
        shutdown = true;
        {
            lock_guard<mutex> lk(mx);
            cv.notify_all();
        }

        pool.join_all();
    }
};

void the_work(int id)
{
    std::cout << "worker " << id << " entered\n";

    // no more synchronization needed; the pool size determines max concurrency
    std::cout << "worker " << id << " start work\n";
    this_thread::sleep_for(chrono::seconds(2));
    std::cout << "worker " << id << " done\n";
}

int main()
{
    thread_pool pool; // uses 1 thread per core

    for (int i = 0; i < 10; ++i)
        pool.enqueue(bind(the_work, i));
}
PS. You can use C++11 lambdas instead of boost::phoenix there if you prefer.
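For instance (an untested sketch), the phoenix predicate inside dequeue() could be replaced by a lambda capturing this:

    // equivalent lambda predicate for the cv.wait in dequeue():
    cv.wait(lk, [this] { return shutdown || !_queue.empty(); });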

Related

Multiple threads waiting for all to finish till new work is started

I am trying to create a sort of thread pool that runs functions on separate threads and only starts a new iteration when all functions have finished.
#include <iostream>
#include <map>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <chrono>
#include <cstdlib>

using namespace std;

map<size_t, bool> status_map;
vector<thread> threads;
condition_variable cond;

bool are_all_ready() {
    mutex m;
    unique_lock<mutex> lock(m);
    for (const auto& [_, status] : status_map) {
        if (!status) {
            return false;
        }
    }
    return true;
}

void do_little_work(size_t id) {
    this_thread::sleep_for(chrono::seconds(1));
    cout << id << " did little work..." << endl;
}

void do_some_work(size_t id) {
    this_thread::sleep_for(chrono::seconds(2));
    cout << id << " did some work..." << endl;
}

void do_much_work(size_t id) {
    this_thread::sleep_for(chrono::seconds(4));
    cout << id << " did much work..." << endl;
}

void run(const function<void(size_t)>& function, size_t id) {
    while (true) {
        mutex m;
        unique_lock<mutex> lock(m);
        cond.wait(lock, are_all_ready);
        status_map[id] = false;
        cond.notify_all();
        function(id);
        status_map[id] = true;
        cond.notify_all();
    }
}

int main() {
    threads.push_back(thread(run, do_little_work, 0));
    threads.push_back(thread(run, do_some_work, 1));
    threads.push_back(thread(run, do_much_work, 2));
    for (auto& thread : threads) {
        thread.join();
    }
    return EXIT_SUCCESS;
}
I expect to get the output:
0 did little work...
1 did some work...
2 did much work...
0 did little work...
1 did some work...
2 did much work...
.
.
.
after the respective timeouts, but when I run the program I only get
0 did little work...
0 did little work...
.
.
.
I also have to say that I'm rather new to multithreading, but in my understanding the condition_variable should do the task of blocking every thread till the predicate returns true. And in my case, are_all_ready should return true after all functions have returned.
There are several ways to do this.
Easiest in my opinion would be a C++20 std::barrier, which says, "wait until all of N threads have arrived and are waiting here."
#include <barrier>

std::barrier synch_workers(3);
....

void run(const std::function<void(size_t)>& func, size_t id) {
    while (true) {
        synch_workers.arrive_and_wait(); // wait for all three to be ready
        func(id);
    }
}
Cruder and less efficient, but equally effective, would be to construct and join() new sets of three worker threads for each "batch" of work:
int main(...) {
    std::vector<thread> threads;
    ...
    while (flag_running) {
        threads.push_back(...);
        threads.push_back(...);
        ...
        for (auto& thread : threads) {
            thread.join();
        }
        threads.clear();
    }
}
Aside
I'd suggest you revisit some core synchronization concepts, however. You are creating new mutexes where you want to re-use a shared one. The scope of your unique_lock isn't quite right.
Now, your idea to track worker thread "busy/idle" state in a map is straightforward, but cannot correctly coordinate "batches" or "rounds" of work that must be begun at the same time.
If a worker sees in the map that two of three threads, including itself, are "idle", what does that mean? Is a "batch" of work concluding (two workers are waiting for a tardy third)? Or has a batch just begun (the two idle threads are tardy and had better get to work like their more eager peer)?
The threads cannot know the answer without keeping track of the current batch of work, which is what a barrier (or its more complex cousin the phaser) does under the hood.
As-is, your program has a crash (undefined behaviour) due to unsynchronized concurrent access to status_map.
When you do:
void run(const function<void(size_t)>& function, size_t id)
{
    ...
    mutex m;
    unique_lock<mutex> lock(m);
    ...
    status_map[id] = false;

the mutex and lock created are local variables, one per thread, and as such independent. So they don't prevent multiple threads from writing to status_map at once, and the program crashes. That's what I get on my machine.
Now, if you make the mutex static, only one thread can access the map at a time. But that also means only one thread runs at once. With this I see 0, 1 and 2 running, but only one at a time, and with a strong tendency for the thread that just ran to run again.
My suggestion: go back to the drawing board and make it simpler. All threads run at once, a single mutex protects the map, the mutex is locked only to access the map, and... well, in fact, I don't even see the need for a condition variable.
e.g. what is wrong with:
#include <thread>
#include <iostream>
#include <vector>
#include <functional>
#include <chrono>
#include <cstdlib>

using namespace std;

vector<thread> threads;

void do_little_work(size_t id) {
    this_thread::sleep_for(chrono::seconds(1));
    cout << id << " did little work..." << endl;
}

void do_some_work(size_t id) {
    this_thread::sleep_for(chrono::seconds(2));
    cout << id << " did some work..." << endl;
}

void do_much_work(size_t id) {
    this_thread::sleep_for(chrono::seconds(4));
    cout << id << " did much work..." << endl;
}

void run(const function<void(size_t)>& function, size_t id) {
    while (true) {
        function(id);
    }
}

int main() {
    threads.push_back(thread(run, do_little_work, 0));
    threads.push_back(thread(run, do_some_work, 1));
    threads.push_back(thread(run, do_much_work, 2));
    for (auto& thread : threads) {
        thread.join();
    }
    return EXIT_SUCCESS;
}

Using boost to turn single thread to multi thread

I'm trying to turn code from a single thread into multiple threads (for example, create 6 threads instead of 1) while making sure they all start and finish without any interference from each other. What would be a way to do this? Could I just write a for loop that creates a thread while i < 6? And just add a mutex class with lock() and unlock()?
#include <iostream>
#include <cstdlib>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>

void workerFunc()
{
    boost::posix_time::seconds workTime(3);
    std::cout << "Worker: running" << std::endl;

    // Pretend to do something useful...
    boost::this_thread::sleep(workTime);

    std::cout << "Worker: finished" << std::endl;
}

int main(int argc, char* argv[])
{
    std::cout << "main: startup" << std::endl;

    boost::thread workerThread(workerFunc);

    std::cout << "main: waiting for thread" << std::endl;

    workerThread.join();

    std::cout << "main: done" << std::endl;

    system("pause");
    return 0;
}
Yes, it's certainly possible. Since you don't want any interference between the threads, give each one unique data to work with, so that you do not need to synchronize access to that data with a std::mutex or by making it std::atomic. To further minimize interference between threads, align the data according to std::hardware_destructive_interference_size.
You can use boost::thread::hardware_concurrency() to get the number of hardware threads available on the current system so that you don't have to hardcode the number of threads to run.
Passing references to the thread can be done using std::ref (otherwise the thread will work on its own copy of the data).
Here I create a std::list of threads and a std::vector of data to work on.
#include <cstdint>   // std::int64_t
#include <iostream>
#include <list>
#include <new>       // std::hardware_destructive_interference_size
#include <vector>
#include <boost/thread.hpp>

unsigned hardware_concurrency() {
    unsigned rv = boost::thread::hardware_concurrency();
    if(rv == 0) rv = 1; // fallback if hardware_concurrency returned 0
    return rv;
}

// if you don't have hardware_destructive_interference_size, use something like
// this instead:
//struct alignas(64) data {
struct alignas(std::hardware_destructive_interference_size) data {
    std::int64_t x;
};

void workerFunc(data& d) {
    // work on the supplied data
    for(int i = 0; i < 1024*1024-1; ++i) d.x -= i;
    for(int i = 0; i < 1024*1024*1024-1; ++i) d.x += i;
}

int main() {
    std::cout << "main: startup" << std::endl;

    size_t number_of_threads = hardware_concurrency();
    std::list<boost::thread> threads;
    std::vector<data> dataset(number_of_threads);

    // create the threads
    for(size_t idx = 0; idx < number_of_threads; ++idx)
        threads.emplace_back(workerFunc, std::ref(dataset[idx]));

    std::cout << "main: waiting for threads" << std::endl;

    // join all threads
    for(auto& th : threads) th.join();

    // display results
    for(const data& d : dataset) std::cout << d.x << "\n";

    std::cout << "main: done" << std::endl;
}
If you are using C++11 (or later), I suggest using std::thread instead.
Starting and stopping a bunch of Boost threads
std::vector<boost::thread> threads;

for (int i = 0; i < numberOfThreads; ++i) {
    boost::thread t(workerFunc);
    threads.push_back(std::move(t));
}

for (auto& t : threads) {
    t.join();
}
Keep in mind that join() doesn't terminate the threads, it only waits until they are finished.
Synchronization
Mutexes are required if multiple threads access the same data and at least one of them writes it. You can use a mutex to ensure that only one thread at a time enters the critical sections of the code. Example:
#include <mutex>
#include <queue>

std::queue<int> q;
std::mutex q_mu;

void workerFunc1() {
    // ...
    {
        std::lock_guard<std::mutex> guard(q_mu);
        q.push(foo);
    } // lock guard goes out of scope and automatically unlocks q_mu
    // ...
}

void workerFunc2() {
    // ...
    {
        std::lock_guard<std::mutex> guard(q_mu);
        foo = q.front(); // std::queue::pop() returns void, so read front() first
        q.pop();
    } // lock guard goes out of scope and automatically unlocks q_mu
    // ...
}
This prevents undefined behavior like reading an item from the queue that hasn't been written completely. Be careful: data races can crash your program or corrupt your data. I frequently use tools like ThreadSanitizer or Helgrind to make sure I didn't miss anything. If you only want to pass results back into the main program, but don't need to share data between your threads, you might want to consider using std::promise and std::future.
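A minimal sketch of that promise/future approach (the names here are illustrative):

#include <future>
#include <iostream>
#include <thread>

int compute() { return 42; } // stand-in for real work

int main() {
    std::promise<int> result;
    std::future<int> f = result.get_future();

    std::thread worker([&result] { result.set_value(compute()); });

    std::cout << "worker produced " << f.get() << "\n"; // blocks until set_value
    worker.join();
}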
Yes, spawning new threads can be done with a simple loop. You will have to keep a few things in mind though (see the sketch after this list):
If threads operate on shared data, it needs to be protected with mutexes, atomics, or some other mechanism to avoid data races and undefined behaviour (bear in mind that, according to the standard, even primitive types such as int have to be wrapped with an atomic or protected by a mutex).
You will have to make sure that you eventually call either join() or detach() on every spawned thread before its object goes out of scope; otherwise the program terminates abruptly (std::terminate is called).
It's best to do some computations on the main thread while waiting for worker threads, to use that time efficiently instead of wasting it.
You generally want to spawn one thread fewer than the total number of threads you want, as the program starts with one thread by default (the main thread).
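Here's a minimal sketch of those last two points: spawn one thread fewer than the target, let the main thread do a share of the work, and join everything before the vector goes out of scope (the work function is a placeholder):

#include <thread>
#include <vector>

void work(unsigned id) { /* placeholder for real work */ }

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1; // hardware_concurrency() may return 0

    std::vector<std::thread> threads;
    for (unsigned i = 1; i < n; ++i)   // n - 1 extra threads...
        threads.emplace_back(work, i);

    work(0);                           // ...main thread is the n-th worker

    for (auto& t : threads) t.join();  // join before scope exit
}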

Resolve deadlock issue, waiting in the main thread for multiple worker threads to finish (C++11)

I'm trying to write a program in C++11 in which multiple threads are run, and, during each cycle, the main thread waits for each thread to finish. The program below is a test program for this concept.
Apparently I'm missing something trivial in my implementation, as it looks like I'm experiencing a deadlock (not always, just during some random runs).
#include <iostream>
#include <stdio.h>
#include <thread>
#include <chrono>
#include <condition_variable>
#include <mutex>

using namespace std;

class Producer
{
public:
    Producer(int a_id):
        m_id(a_id),
        m_ready(false),
        m_terminate(false)
    {
        m_id = a_id;
        m_thread = thread(&Producer::run, this);
        // ensure thread is available before it is started
        this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    ~Producer() {
        terminate();
        m_thread.join();
    }

    void start() {
        //cout << "start " << m_id << endl;
        unique_lock<mutex> runLock(m_muRun);
        m_ready = true;
        runLock.unlock();
        m_cond.notify_all();
    }

    void wait() {
        cout << "wait " << m_id << endl;
        unique_lock<decltype(m_muRun)> runLock(m_muRun);
        m_cond.wait(runLock, [this]{return !m_ready;});
    }

    void terminate() {
        m_terminate = true;
        start();
    }

    void run() {
        do {
            unique_lock<decltype(m_muRun)> runLock(m_muRun);
            m_cond.wait(runLock, [this]{return m_ready;});
            if (!m_terminate) {
                cout << "running thread: " << m_id << endl;
            } else {
                cout << "exit thread: " << m_id << endl;
            }
            runLock.unlock();
            m_ready = false;
            m_cond.notify_all();
        } while (!m_terminate);
    }

private:
    int m_id;
    bool m_ready;
    bool m_terminate;
    thread m_thread;
    mutex m_muRun;
    condition_variable m_cond;
};

int main()
{
    Producer producer1(1);
    Producer producer2(2);
    Producer producer3(3);

    for (int i=0; i<10000; ++i) {
        cout << i << endl;
        producer1.start();
        producer2.start();
        producer3.start();

        producer1.wait();
        producer2.wait();
        producer3.wait();
    }

    cout << "exit" << endl;
    return 0;
}
The program's output when the deadlock is occurring:
....
.......
running thread: 2
running thread: 1
wait 1
wait 2
wait 3
running thread: 3
Looking at the program's output when the deadlock occurs, I suspect that sometimes Producer::wait is called before the corresponding thread has actually started its cycle, i.e. Producer::start should have triggered the run (the unlocking of the mutex), but it has not yet been picked up by the thread's run method (Producer::run). (NB: I'm not 100% sure of this!) I'm a bit lost here; hopefully somebody can provide some help.
You have a race condition in this code:

    runLock.unlock();
    m_ready = false;

The m_ready variable must always be protected by the mutex for proper synchronization. Also, it is completely unnecessary to wait for the thread to start with this_thread::sleep_for() - proper synchronization takes care of that as well, so you can simply remove that line. Note this is a pretty inefficient way of doing multithreading - there should be a thread pool instead of individual objects, each with its own mutex and condition variable.
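A minimal sketch of the fix described above, keeping the question's names (a drop-in replacement for Producer::run): clear m_ready before releasing the lock, so every access to it happens under the mutex:

void run() {
    do {
        unique_lock<decltype(m_muRun)> runLock(m_muRun);
        m_cond.wait(runLock, [this]{ return m_ready; });

        if (!m_terminate) {
            cout << "running thread: " << m_id << endl;
        } else {
            cout << "exit thread: " << m_id << endl;
        }

        m_ready = false;  // still holding the mutex: no race with wait()
        runLock.unlock();
        m_cond.notify_all();
    } while (!m_terminate);
}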

How to cancel std::async when condition is met?

I am running an asynchronous task and want to cancel it when a certain condition (bool) is met.
void MyClass::createTask()
{
    this->future = std::async(std::launch::async, [this]() {
        while (this->CONDITION == false)
        {
            // do work
        }
    });
}

void MyClass::cancelTask()
{
    this->CONDITION = true;
    this->future.get();
}
Obviously, calling MyClass::cancelTask() would cause a data race, because this->CONDITION is being written to and read from at the same time. So the first thing that came to my mind was to use a std::mutex. However, that would mean the task has to lock and unlock the mutex on every iteration of the while loop. Since the async task is performance critical, this seems like a bad choice.
Is there a cleaner, and especially a more performant, way to achieve what I am trying to do? Switching from std::async to std::thread would be OK if it enabled an efficient solution.
As far as I know, there is no elegant way to cancel a thread/async task in C++.
A simple way is to use std::atomic<bool> or std::atomic_flag instead of a mutex.
If you are familiar with the Boost library, then you could use boost::thread with interruption points.
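A minimal sketch of the atomic-flag variant, reusing the names from the question (memory_order_relaxed is arguably enough here, since no other data is synchronized through the flag):

#include <atomic>
#include <future>

class MyClass {
    std::atomic<bool> CONDITION{false};
    std::future<void> future;

public:
    void createTask() {
        future = std::async(std::launch::async, [this] {
            while (!CONDITION.load(std::memory_order_relaxed)) {
                // do work
            }
        });
    }

    void cancelTask() {
        CONDITION.store(true, std::memory_order_relaxed);
        future.get();
    }
};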
I have a solution for this kind of requirement. I use std::mutex, std::condition_variable and std::unique_lock<std::mutex> to create two methods: pauseThread and resumeThread.
The idea is to use the condition_variable and unique_lock to make the thread wait for a time, for example 5 seconds, and after the time is over the thread continues its execution. But if you want to interrupt the wait, you can use the condition variable's notify_one() method.
Using your code, and continuing with your idea, I made some changes to your class:
MODIFICATION: I made the flag bKeepRunning a std::atomic<bool>.
MyClass.h
#include <mutex>
#include <chrono>
#include <future>
#include <atomic>
#include <condition_variable>

class MyClass
{
    std::atomic<bool> bKeepRunning;
    std::mutex mtx_t;
    std::condition_variable cv_t;
    std::future<void> _future;

public:
    MyClass();
    ~MyClass();

    void createTask();
    void stopTask();
    void pauseThread(int time);
    void resumeThread();
};
MyClass.cpp
#include "MyClass.h"
#include <iostream>
using namespace std;
MyClass::MyClass()
{
bKeepRunning = false;
}
MyClass::~MyClass()
{
}
void MyClass::createTask()
{
bKeepRunning = true;
_future = std::async(std::launch::async, [this]() {
int counter = 0;
cout << "Thread running" << endl;
while (bKeepRunning)
{
counter++;
cout << "Asynchronous thread counter = [" << counter << "]" << endl;
this->pauseThread(5);//Wait for 5 seconds
}
cout << "Thread finished." << endl;
});
}
void MyClass::stopTask()
{
cout << "Stoping Thread." << endl;
bKeepRunning = false;
resumeThread();
}
void MyClass::pauseThread(int time)
{
std::unique_lock<std::mutex> lck_t(mtx_t);
cv_t.wait_for(lck_t, chrono::seconds(time));
}
void MyClass::resumeThread()
{
cout << "Resumming thread" << endl;
cv_t.notify_one();
}
I made a console sample to show how it works:
Main.cpp
#include <iostream>
#include <sstream>
#include <string>
#include "MyClass.h"

using namespace std;

int main(int argc, char* argv[])
{
    MyClass app;
    char line[80];

    cout << "Press Enter to stop thread." << endl;

    app.createTask();

    cin.getline(line, 80);

    app.stopTask();
}
If you need a different pause period, you can change chrono::seconds(time) to, for example, chrono::milliseconds(time), which counts in milliseconds.
If you execute this sample, the thread prints its counter every 5 seconds until you press Enter to stop it.

Non-blocking semaphores in C++11?

A number of questions on this site deal with the lack of a semaphore object in the multi-threading support introduced in C++11. Many people suggested implementing semaphores using mutexes or condition variables or a combination of both.
However, none of these approaches allows incrementing and decrementing a semaphore while guaranteeing that the calling thread is not blocked, since usually a lock must be acquired before reading the semaphore's value. The POSIX semaphore, for instance, has the functions sem_post() and sem_trywait(), both of which are non-blocking.
Is it possible to implement a non-blocking semaphore with the C++11 multi-threading support only? Or am I necessarily required to use an OS-dependent library for this? If so, why does the C++11 revision not include a semaphore object?
A similar question has not been answered in 3 years. (Note: I believe the question I am asking is much broader, though; there are certainly other uses for a non-blocking semaphore object aside from a producer/consumer. If despite this someone believes my question is a duplicate, then please tell me how I can bring back attention to the old question, since this is still an open issue.)
I don't see a problem implementing a semaphore. Using C++11 atomics it should be possible:
#include <atomic>

class Semaphore
{
private:
    std::atomic<int> count_;

public:
    Semaphore() :
        count_(0) // Initialized as locked.
    {
    }

    void notify() {
        count_++;
    }

    void wait() {
        while (!try_wait()) {
            // spin until a count becomes available
        }
    }

    bool try_wait() {
        int count = count_;
        if (count) {
            // a single CAS attempt; may fail under contention, in which case
            // the caller simply sees "not acquired"
            return count_.compare_exchange_strong(count, count - 1);
        } else {
            return false;
        }
    }
};
Here is a little example of the usage:
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include "Semaphore.hpp"

Semaphore sem;
int counter;

void run(int threadIdx) {
    while (!sem.try_wait()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // Alternative: use wait()
    //sem.wait();

    std::cout << "Thread " << threadIdx << " enter critical section" << std::endl;
    counter++;
    std::cout << "Thread " << threadIdx << " increased counter to " << counter << std::endl;

    // Do work;
    std::this_thread::sleep_for(std::chrono::milliseconds(30));

    std::cout << "Thread " << threadIdx << " leave critical section" << std::endl;
    sem.notify();
}

int main() {
    std::vector<std::thread> threads;

    for (int i = 0; i < 15; i++) {
        threads.push_back(std::thread(run, i));
    }

    sem.notify();

    for (auto& t : threads) {
        t.join();
    }

    std::cout << "Terminate main." << std::endl;
    return 0;
}
Of course, wait is a blocking operation. But notify and try_wait are both non-blocking, provided the compare-and-exchange operation is non-blocking (which can be checked).
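For instance, a quick runtime check (C++11; C++17 also offers the compile-time constant std::atomic<int>::is_always_lock_free):

#include <atomic>
#include <iostream>

int main() {
    std::atomic<int> probe{0};
    std::cout << "atomic<int> is lock-free: "
              << std::boolalpha << probe.is_lock_free() << "\n";
}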