Race condition on stack - c++

I have a simple class Hello, and I am trying to call its member function say_hello on different threads. I created two implementations, hellos_in_stack and hellos_in_heap. hellos_in_heap works as expected, but hellos_in_stack has a race condition on the member variable _i. How can I avoid it on the stack using a mutex?
#include <thread>
#include <iostream>
#include <vector>
#include <mutex>
std::mutex mu;
class Hello
{
int _i;
public:
Hello()
{
std::lock_guard<std::mutex> lock(mu);
_i = 0;
}
~Hello(){
}
void say_hello()
{
std::lock_guard<std::mutex> lock(mu);
std::cout << "say_hello from thread " << ++_i << " " <<this << " " << std::this_thread::get_id() << std::endl;
}
};
void hellos_in_stack()
{
std::vector<std::thread> threads;
for(int i = 0; i < 4; ++i)
{
Hello h;
threads.push_back(std::thread(&Hello::say_hello, &h));
}
for(auto& thread : threads){
thread.join();
}
}
void hellos_in_heap()
{
std::vector<std::thread> threads;
std::vector<Hello *> hellos;
Hello *h = nullptr;
for(int i = 0; i < 4; ++i)
{
h = new Hello();
hellos.push_back(h);
threads.push_back(std::thread(&Hello::say_hello, h));
}
for(auto& thread : threads){
thread.join();
}
for(auto hello : hellos){
delete hello;
}
}
int main()
{
hellos_in_stack();
hellos_in_heap();
return 0;
}

Let's describe the race condition first...
The line Hello h; is constructing h on the main thread's stack. Once the for loop moves on to create the next thread, h is destroyed and another Hello is created -- likely, but not guaranteed, to be at the same address as the previous h.
h must be kept alive for the lifetime of the thread that is running its say_hello method.
One solution would be to create h on the new thread's stack. This can be done like so:
std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i)
{
threads.emplace_back([]() {
Hello h;
h.say_hello();
});
}
Another option, if you still need the instances of h to be accessible from the main thread, would be to store them in a container.
std::vector<std::thread> threads;
std::list<Hello> hellos;
for (int i = 0; i < 4; ++i)
{
hellos.emplace_back();
threads.emplace_back(&Hello::say_hello, &hellos.back());
}
Using a container introduces some more complexity: care must be taken to use the container itself in a safe way. In this case std::list is used instead of std::vector because calling emplace_back/push_back on a std::vector can cause it to reallocate its buffer. That would move the Hello instances out from under the running threads and leave them holding dangling pointers!
Running example: https://ideone.com/F7STsf

Related

Thread pool with job queue gets stuck

I want to split jobs among multiple std::thread workers and continue once they are all done.
To do so, I implemented a thread pool class mainly based on this SO answer.
I noticed, however, that my benchmarks can get stuck, running forever, without any errors thrown.
I wrote a minimal reproducing code, enclosed at the end.
Based on terminal output, the issue seems to occur when the jobs are being queued.
I checked videos (1, 2), documentation (3) and blog posts (4).
I tried replacing the type of the locks, using atomics.
I could not find the underlying cause.
Here is the snippet to replicate the issue.
The program repeatedly counts the odd elements in the test vector.
#include <atomic>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>
class Pool {
public:
const int worker_count;
bool to_terminate = false;
std::atomic<int> unfinished_tasks = 0;
std::mutex mutex;
std::condition_variable condition;
std::vector<std::thread> threads;
std::queue<std::function<void()>> jobs;
void thread_loop()
{
while (true) {
std::function<void()> job;
{
std::unique_lock<std::mutex> lock(mutex);
condition.wait(lock, [&] { return (!jobs.empty()) || to_terminate; });
if (to_terminate)
return;
job = jobs.front();
jobs.pop();
}
job();
unfinished_tasks -= 1;
}
}
public:
Pool(int size) : worker_count(size)
{
if (size < 0)
throw std::invalid_argument("Worker count needs to be a positive integer");
for (int i = 0; i < worker_count; ++i)
threads.push_back(std::thread(&Pool::thread_loop, this));
};
~Pool()
{
{
std::unique_lock lock(mutex);
to_terminate = true;
}
condition.notify_all();
for (auto &thread : threads)
thread.join();
threads.clear();
};
void queue_job(const std::function<void()> &job)
{
{
std::unique_lock<std::mutex> lock(mutex);
jobs.push(job);
unfinished_tasks += 1;
// std::cout << unfinished_tasks;
}
condition.notify_one();
}
void wait()
{
while (unfinished_tasks) {
; // spinlock
};
}
};
int main()
{
constexpr int worker_count = 8;
constexpr int vector_size = 1 << 10;
Pool pool = Pool(worker_count);
std::vector<int> test_vector;
test_vector.reserve(vector_size);
for (int i = 0; i < vector_size; ++i)
test_vector.push_back(i);
std::vector<int> worker_odd_counts(worker_count, 0);
std::function<void(int)> worker_task = [&](int thread_id) {
int chunk_size = vector_size / (worker_count) + 1;
int my_start = thread_id * chunk_size;
int my_end = std::min(my_start + chunk_size, vector_size);
int local_odd_count = 0;
for (int ii = my_start; ii < my_end; ++ii)
if (test_vector[ii] % 2 != 0)
++local_odd_count;
worker_odd_counts[thread_id] = local_odd_count;
};
for (int iteration = 0;; ++iteration) {
std::cout << "Jobs.." << std::flush;
for (int i = 0; i < worker_count; ++i)
pool.queue_job([&worker_task, i] { worker_task(i); });
std::cout << "..queued. " << std::flush;
pool.wait();
int odd_count = 0;
for (auto elem : worker_odd_counts)
odd_count += elem;
std::cout << "Iter:" << iteration << ". Odd:" << odd_count << '\n';
}
}
Here is the terminal output of one specific run:
[...]
Jobs....queued. Iter:2994. Odd:512
Jobs....queued. Iter:2995. Odd:512
Jobs..
Edit:
The error occurs using GCC 12.2.0 x86_64-w64-mingw32 on Windows 10 with an AMD Ryzen 4750U CPU. I do not get past 15k iterations.
Using Visual Studio Community 2022, I got past 1.5M iterations (and stopped it myself). Thanks @IgorTandetnik for pointing out the latter.
MinGW doesn't natively support multithreading on Windows. It supports threads in its C++ standard library through the POSIX API, using the winpthreads compatibility layer, which implements that API on top of Windows OS threads.
I think your error is not in the C++ code but in the computer setup. Do the following.
Use the compiler from the x86_64-12.2.0-release-posix-seh-ucrt-rt_v10-rev2.7z archive, there.
Don't forget that a binary built that way depends on several DLL files provided by the compiler: libgcc_s_seh-1.dll, libwinpthread-1.dll and libstdc++-6.dll. You must use exactly the same versions of these DLLs that were shipped with MinGW. If you have other versions of these DLLs anywhere in your %PATH%, expect all kinds of failures.
A couple of general notes.
Linux-first C++ compilers like gcc have issues on Windows. The path of least resistance is to use Visual C++ instead. If you want your software to build on other platforms as well, consider cmake to abstract away the compiler.
Windows already includes a thread pool implementation, since Vista. The API is easy to use, you only need 4 functions: CreateThreadpoolWork, SubmitThreadpoolWork, WaitForThreadpoolWorkCallbacks, and CloseThreadpoolWork. Example.
The first thing you should do is split the queue from the thread pool. They are both tricky enough; writing both of them commingled in one class is asking for trouble.
This also allows you to unit test the queue without the pool.
template<class Payload>
class MutexQueue {
public:
std::optional<Payload> wait_and_pop();
void push(Payload);
void terminate_queue();
bool queue_is_terminated() const;
private:
mutable std::mutex m;
std::condition_variable cv;
std::deque<Payload> q;
bool terminated = false;
std::unique_lock<std::mutex> lock() const {
return std::unique_lock<std::mutex>(m);
}
};
This is a bit easier to write than the thread pool. The member functions, shown as they would appear inside the class body:
void push(Payload p) {
{
auto l = lock();
if (terminated) return; // drop work pushed after termination
q.push_back(std::move(p));
}
cv.notify_one();
}
void terminate_queue() {
{
auto l = lock(); // YOU CANNOT SKIP THIS LOCK, even if terminated is atomic
terminated = true;
q.clear();
}
cv.notify_all();
}
bool queue_is_terminated() const {
auto l = lock(); // if you make terminated atomic, you CAN skip this lock
return terminated;
}
std::optional<Payload> wait_and_pop() {
auto l = lock();
cv.wait(l, [&]{ return terminated || !q.empty(); });
if (terminated) return std::nullopt;
auto r = std::move(q.front());
q.pop_front();
return std::move(r);
}
There we go.
Now our thread pool is simpler.
struct ThreadPool {
explicit ThreadPool(std::size_t n) {
create_threads(n);
}
std::future<void> push_task(std::function<void()> f) {
std::packaged_task<void()> p(std::move(f)); // the constructor is explicit, so direct-initialize
auto r = p.get_future();
q.push( std::move(p) );
return r;
}
void terminate_pool() {
q.terminate_queue();
terminate_threads();
}
~ThreadPool() {
terminate_pool();
}
private:
MutexQueue<std::packaged_task<void()>> q;
std::vector<std::thread> threads;
void terminate_threads() {
for(auto& thread:threads)
thread.join();
threads.clear();
}
static void thread_task( MutexQueue<std::packaged_task<void()>>* pq ) {
if (!pq) return;
while (auto task = pq->wait_and_pop()) {
(*task)();
}
}
void create_threads(std::size_t n) {
for (std::size_t i = 0; i < n; ++i) {
threads.push_back( std::thread( thread_task, &q ) );
}
}
};
I cannot spot an error in your code. But with the above, you can test a split of the queue from the pool.
The queue will work with pthreads or other primitives.

Thread pool with individual std::function jobs per worker crashes with segmentation fault

I have successfully implemented the thread pool from an answer on Stack Overflow, which helped me in speeding up my program. It uses a single std::queue to distribute jobs (std::function<void()>) among multiple workers (std::threads).
I wanted to improve on this. As I only need to run a limited set of functions, I planned to ditch the queue and use variables instead. In other words, the n-th worker would do the n-th job from the std::vector<std::function<void()>>. Unfortunately, my test app crashes with Segmentation fault (core dumped), and I have not been able to find my mistake so far.
Here is my ~minimal reproducible code, with the job of counting the odd elements in a vector. (Idea taken from Scott Meyers: Cpu Caches and Why You Care.)
#include <algorithm>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <stdexcept> // std::invalid_argument
#include <thread>
#include <vector>
// Thread pool with a std::function for each worker.
class Pool {
public:
enum class Status {
idle,
working,
terminate
};
const int worker_count;
std::vector<Status> statuses;
std::vector<std::mutex> mutexes;
std::vector<std::condition_variable> conditions;
std::vector<std::thread> threads;
std::vector<std::function<void()>> jobs;
void thread_loop(int thread_id)
{
std::puts("Thread started");
auto &my_status = statuses[thread_id];
auto &my_mutex = mutexes[thread_id];
auto &my_condition = conditions[thread_id];
auto &my_job = jobs[thread_id];
while (true) {
std::unique_lock<std::mutex> lock(my_mutex);
my_condition.wait(lock, [this, &my_status] { return my_status != Status::idle; });
if (my_status == Status::terminate)
return;
my_job();
my_status = Status::idle;
lock.unlock();
my_condition.notify_one(); // Tell the main thread we are done
}
}
public:
Pool(int size) : worker_count(size), statuses(size, Status::idle), mutexes(size), conditions(size), threads(), jobs(size)
{
if (size < 0)
throw std::invalid_argument("Worker count needs to be a positive integer");
};
~Pool()
{
for (int i = 0; i < worker_count; ++i) {
std::unique_lock lock(mutexes[i]);
statuses[i] = Status::terminate;
lock.unlock(); // Unlock before notifying
conditions[i].notify_one();
}
for (auto &thread : threads)
thread.join();
threads.clear();
};
void start_threads()
{
threads.resize(worker_count);
jobs.resize(worker_count);
for (int i = 0; i < worker_count; ++i) {
statuses[i] = Status::idle;
jobs[i] = []() { std::puts("I am running"); };
threads[i] = std::thread(&Pool::thread_loop, this, i);
}
}
void set_and_start_job(const std::function<void(int)> &job)
{
for (int i = 0; i < worker_count; ++i) {
std::unique_lock lock(mutexes[i]);
jobs[i] = [&job, i]() { job(i); };
statuses[i] = Status::working;
lock.unlock();
conditions[i].notify_one();
}
}
void wait()
{
for (int i = 0; i < worker_count; ++i) {
auto &my_status = statuses[i];
std::unique_lock lock(mutexes[i]);
conditions[i].wait(lock, [this, &my_status] { return my_status != Status::working; });
}
}
};
int main()
{
constexpr int worker_count = 1;
constexpr int vector_size = 1 << 10;
std::vector<int> test_vector;
test_vector.reserve(vector_size);
for (int i = 0; i < vector_size; ++i)
test_vector.push_back(i);
std::vector<int> worker_odd_counts(worker_count, 0);
const auto worker_task = [&](int thread_id) {
int chunk_size = vector_size / (worker_count) + 1;
int my_start = thread_id * chunk_size;
int my_end = std::min(my_start + chunk_size, vector_size);
int local_odd_count = 0;
for (int ii = my_start; ii < my_end; ++ii)
if (test_vector[ii] % 2 != 0)
++local_odd_count;
worker_odd_counts[thread_id] = local_odd_count;
};
Pool pool = Pool(worker_count);
pool.start_threads();
pool.set_and_start_job(worker_task);
pool.wait();
int odd_count = 0;
for (auto elem : worker_odd_counts)
odd_count += elem;
std::cout << odd_count << '\n';
}
TL;DR version:
The simplest fix is to change
jobs[i] = [&job, i]() { job(i); };
to
jobs[i] = [job, i]() { job(i); };
This captures job by value and makes a copy. The copy won't go out of scope before the lambda does and the lambda will outlive the thread.
The Long version:
The problem is at
jobs[i] = [&job, i]() { job(i); };
in set_and_start_job. The object backing job goes out of scope before the threads get started, but how can this be if
pool.set_and_start_job(worker_task);
and worker_task won't go out of scope until after the threads are joined?
Turns out that's because set_and_start_job requires a const std::function<void(int)> & and worker_task isn't a std::function, merely implicitly convertible to a std::function. This conversion makes a temporary variable with a lifespan bound to set_and_start_job's job parameter. When set_and_start_job exits, job goes out of scope and the temporary is destroyed.
The simple fix is above, but we can also make the conversion right at the source, so that a std::function is passed all the way through the system and only goes out of scope after the threads are joined.
const std::function<void(int)> worker_task = [&](int thread_id) { ... };
There may be some small resource saving in end-to-end std::function and capturing a reference, but my experiences with references and threads haven't been the best, so I'd prefer the copy to reduce the possibility that I've missed some subtlety or someone in the future will make a change that adds some.
In the function Pool::set_and_start_job, when setting the job, removing the & from the job capture seems to have resolved the issue:
jobs[i] = [job, i]() { job(i); };
However, this was only a suspicion; I do not know the underlying cause.

std::thread throwing "resource dead lock would occur"

I have a list of objects; each object has member variables which are calculated by an "update" function. I want to update the objects in parallel, that is, I want to create a thread for each object to execute its update function.
Is this a reasonable thing to do? Any reasons why this may not be a good idea?
Below is a program which attempts to do what I described. It is a complete program, so you should be able to run it (I'm using VS2015). The goal is to update each object in parallel. The problem is that once the update function completes, the thread throws a "resource dead lock would occur" exception and aborts.
Where am I going wrong?
#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
#include <thread>
#include <mutex>
#include <chrono>
class Object
{
public:
Object(int sleepTime, unsigned int id)
: m_pSleepTime(sleepTime), m_pId(id), m_pValue(0) {}
void update()
{
if (!isLocked()) // if an object is not locked
{
// create a thread to perform its update
m_pThread.reset(new std::thread(&Object::_update, this));
}
}
unsigned int getId()
{
return m_pId;
}
unsigned int getValue()
{
return m_pValue;
}
bool isLocked()
{
bool mutexStatus = m_pMutex.try_lock();
if (mutexStatus) // if mutex is locked successfully (meaning it was unlocked)
{
m_pMutex.unlock();
return false;
}
else // if mutex is locked
{
return true;
}
}
private:
// private update function which actually does work
void _update()
{
m_pMutex.lock();
{
std::cout << "thread " << m_pId << " sleeping for " << m_pSleepTime << std::endl;
std::chrono::milliseconds duration(m_pSleepTime);
std::this_thread::sleep_for(duration);
m_pValue = m_pId * 10;
}
m_pMutex.unlock();
try
{
m_pThread->join();
}
catch (const std::exception& e)
{
std::cout << e.what() << std::endl; // throws "resource dead lock would occur"
}
}
unsigned int m_pSleepTime;
unsigned int m_pId;
unsigned int m_pValue;
std::mutex m_pMutex;
std::shared_ptr<std::thread> m_pThread; // store reference to thread so it doesn't go out of scope when update() returns
};
typedef std::shared_ptr<Object> ObjectPtr;
class ObjectManager
{
public:
ObjectManager()
: m_pNumObjects(0){}
void updateObjects()
{
for (int i = 0; i < m_pNumObjects; ++i)
{
m_pObjects[i]->update();
}
}
void removeObjectByIndex(int index)
{
m_pObjects.erase(m_pObjects.begin() + index);
}
void addObject(ObjectPtr objPtr)
{
m_pObjects.push_back(objPtr);
m_pNumObjects++;
}
ObjectPtr getObjectByIndex(unsigned int index)
{
return m_pObjects[index];
}
private:
std::vector<ObjectPtr> m_pObjects;
int m_pNumObjects;
};
void main()
{
int numObjects = 2;
// Generate sleep time for each object
std::vector<int> objectSleepTimes;
objectSleepTimes.reserve(numObjects);
for (int i = 0; i < numObjects; ++i)
objectSleepTimes.push_back(rand());
ObjectManager mgr;
// Create some objects
for (int i = 0; i < numObjects; ++i)
mgr.addObject(std::make_shared<Object>(objectSleepTimes[i], i));
// Print expected object completion order
// Sort from smallest to largest
std::sort(objectSleepTimes.begin(), objectSleepTimes.end());
for (int i = 0; i < numObjects; ++i)
std::cout << objectSleepTimes[i] << ", ";
std::cout << std::endl;
// Update objects
mgr.updateObjects();
int numCompleted = 0; // number of objects which finished updating
while (numCompleted != numObjects)
{
for (int i = 0; i < numObjects; ++i)
{
auto objectRef = mgr.getObjectByIndex(i);
if (!objectRef->isLocked()) // if object is not locked, it is finished updating
{
std::cout << "Object " << objectRef->getId() << " completed. Value = " << objectRef->getValue() << std::endl;
mgr.removeObjectByIndex(i);
numCompleted++;
}
}
}
system("pause");
}
Looks like you've got a thread that is trying to join itself.
While trying to understand your solution I simplified it a lot, and I concluded that you are using std::thread::join() the wrong way.
std::thread lets you wait for its completion without spinning. In your example you wait for thread completion in an infinite loop (a spin wait), which consumes a lot of CPU time.
You should call std::thread::join() from another thread to wait for the thread to complete. The mutex in Object is not necessary in your example. Moreover, you need a mutex to synchronize access to std::cout: without one, output from different threads can interleave. I hope the example below helps.
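#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib> // rand, srand
#include <ctime>   // time
#include <iostream>
#include <memory>  // std::shared_ptr
#include <mutex>
#include <thread>
#include <vector>
// Serializes writes to std::cout so that output from different threads does not interleave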
std::recursive_mutex cout_mutex;
class Object {
public:
Object(int sleepTime, unsigned int id)
: _sleepTime(sleepTime), _id(id), _value(0) {}
void runUpdate() {
if (!_thread.joinable())
_thread = std::thread(&Object::_update, this);
}
void waitForResult() {
_thread.join();
}
unsigned int getId() const { return _id; }
unsigned int getValue() const { return _value; }
private:
void _update() {
{
{
std::lock_guard<std::recursive_mutex> lock(cout_mutex);
std::cout << "thread " << _id << " sleeping for " << _sleepTime << std::endl;
}
std::this_thread::sleep_for(std::chrono::seconds(_sleepTime));
_value = _id * 10;
}
std::lock_guard<std::recursive_mutex> lock(cout_mutex);
std::cout << "Object " << getId() << " completed. Value = " << getValue() << std::endl;
}
unsigned int _sleepTime;
unsigned int _id;
unsigned int _value;
std::thread _thread;
};
class ObjectManager : public std::vector<std::shared_ptr<Object>> {
public:
void runUpdate() {
for (auto it = this->begin(); it != this->end(); ++it)
(*it)->runUpdate();
}
void waitForAll() {
auto it = this->begin();
while (it != this->end()) {
(*it)->waitForResult();
it = this->erase(it);
}
}
};
int main(int argc, char* argv[]) {
enum {
TEST_OBJECTS_NUM = 2,
};
srand(static_cast<unsigned int>(time(nullptr)));
ObjectManager mgr;
// Generate sleep time for each object
std::vector<int> objectSleepTimes;
objectSleepTimes.reserve(TEST_OBJECTS_NUM);
for (int i = 0; i < TEST_OBJECTS_NUM; ++i)
objectSleepTimes.push_back(rand() % 10 + 1); // 1..10 seconds; rand() * 9 can overflow int when RAND_MAX is large
// Create some objects
for (int i = 0; i < TEST_OBJECTS_NUM; ++i)
mgr.push_back(std::make_shared<Object>(objectSleepTimes[i], i));
assert(mgr.size() == TEST_OBJECTS_NUM);
// Print expected object completion order
// Sort from smallest to largest
std::sort(objectSleepTimes.begin(), objectSleepTimes.end());
for (size_t i = 0; i < mgr.size(); ++i)
std::cout << objectSleepTimes[i] << ", ";
std::cout << std::endl;
// Update objects
mgr.runUpdate();
mgr.waitForAll();
//system("pause"); // use Ctrl+F5 to run the app instead. That's more reliable in case of sudden app exit.
}
As for whether this is a reasonable thing to do...
A better approach is to create an object update queue. Objects that need to be updated are added to this queue, which is serviced by a group of threads instead of one thread per object.
The benefits are:
No 1-to-1 correspondence between thread and objects. Creating a thread is a heavy operation, probably more expensive than most update code for a single object.
Supports thousands of objects: with your solution you would need to create thousands of threads, which you will find exceeds your OS capacity.
Can support additional features like declaring dependencies between objects or updating a group of related objects as one operation.

std::condition_variable calling notify_all more than once

First, let me introduce you to my problem.
My code looks like this:
#include <iostream>
#include <thread>
#include <condition_variable>
std::mutex mtx;
std::mutex cvMtx;
std::mutex mtx2;
bool ready{false};
std::condition_variable cv;
int threadsFinishedCurrentLevel{0};
void tfunc() {
for(int i = 0; i < 5; i++) {
//do something
for (int j = 0; j < 10000; j++) {
std::cout << j << std::endl;
}
//this is i-th level
mtx2.lock();
threadsFinishedCurrentLevel++;
if (threadsFinishedCurrentLevel == 2) {
//this is last thread in current level
threadsFinishedCurrentLevel = 0;
cvMtx.unlock();
}
mtx2.unlock();
{
//wait for notify
std::unique_lock<std::mutex> lck(mtx);
while (!ready) cv.wait(lck);
}
}
}
int main() {
cvMtx.lock(); //init
std::thread t1(tfunc);
std::thread t2(tfunc);
for (int i = 0; i < 5; i++) {
cvMtx.lock();
{
std::unique_lock<std::mutex> lck(mtx);
ready = true;
cv.notify_all();
}
}
t1.join();
t2.join();
return 0;
}
I have 2 threads. My computation consists of levels (for this example, let's say we have 5 levels). Within a level, the computation can be divided among the threads, and each thread calculates part of the problem. Before I step to the next (higher) level, the lower level must be finished. So my idea is this: when the last thread on the current level is done, it unlocks the main thread, which can then notify all of the threads to continue to the next level. But this notify has to be called more than once, because there are many levels. Can this condition_variable be restarted or something? Or do I need one condition_variable for each level? For example, when I have 1000 levels, do I need to dynamically allocate 1000 condition_variables?
Is it just me, or are you trying to block the main thread with a mutex (as your way of notifying it when all threads are done)? That is not the job of a mutex; that is what a condition variable is for.
// New condition_variable, to notify the main thread when a child is done with a level
std::condition_variable cv2;
// When a child is done, it will update this counter
int counter = 0; // This is already protected by cvMtx, otherwise it could be atomic.
// This is to sync cout
std::mutex cout_mutex;
void tfunc()
{
for (int i = 0; i < 5; i++)
{
{
std::lock_guard<std::mutex> l(cout_mutex);
std::cout << "Level " << i + 1 << " " << std::this_thread::get_id() << std::endl;
}
{
std::lock_guard<std::mutex> l(cvMtx);
counter++; // update counter &
}
cv2.notify_all(); // notify main thread we are done.
{
//wait for notify
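std::unique_lock<std::mutex> lck(mtx);
cv.wait(lck);
// Note that I've removed the "ready" flag here,
// because you would need multiple ready flags to make that work.
// Caveat: without a predicate, this wait can wake spuriously or miss a
// notification that is sent before the wait begins.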
}
}
}
int main()
{
std::thread t1(tfunc);
std::thread t2(tfunc);
for (int i = 0; i < 5; i++)
{
{
std::unique_lock<std::mutex> lck(cvMtx);
// wait() takes a predicate, which you can take advantage of
cv2.wait(lck, [] { return (counter == 2); });
counter = 0;
// This thread will get notified multiple times
// But it only will wake up when counter matches 2
// Which equals to how many threads we've created.
}
// Sleeping a bit to know the code is working
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
// Wake up all threads and continue to the next level.
std::unique_lock<std::mutex> lck(mtx);
cv.notify_all();
}
t1.join();
t2.join();
return 0;
}
The synchronization can be done with a single counter: threads increment the counter under a lock and wait for it to reach a multiple of the number of concurrent threads. This greatly simplifies the logic. I've made this change, grouped the shared variables into a class, and provided member functions to access them. To avoid false sharing, I've ensured that read-only variables are separate from those that the threads read and write, and also separated the read-write variables by usage. The use of global variables is discouraged; see the C++ Core Guidelines for this and other good advice.
The simplified code follows; you can see it live on ideone. Note: it looks like there isn't true concurrency on ideone, so you'll have to run this in a multi-core environment to actually test hardware concurrency.
//http://stackoverflow.com/questions/35318942/stdcondition-variable-calling-notify-all-more-than-once
#include <iostream>
#include <functional>
#include <thread>
#include <mutex>
#include <vector>
#include <condition_variable>
static constexpr size_t CACHE_LINE_SIZE = 64;
static constexpr size_t NTHREADS = 2;
static constexpr size_t NLEVELS = 5;
static constexpr size_t NITERATIONS = 100;
class Synchronize
{
alignas(CACHE_LINE_SIZE) // read/write while threads are busy working
std::mutex mtx_std_cout;
alignas(CACHE_LINE_SIZE) // read/write while threads are synchronizing at level
std::mutex cvMtx;
std::condition_variable cv;
size_t threadsFinished{0};
alignas(CACHE_LINE_SIZE) // read-only parameters
const size_t n_threads;
const size_t n_levels;
public: // class Synchronize owns unique resources:
// - must be explicitly constructed
// - disallow default ctor,
// - disallow copy/move ctor and
// - disallow copy/move assignment
Synchronize( Synchronize const& ) = delete;
Synchronize & operator=( Synchronize const& ) = delete;
explicit Synchronize( size_t nthreads, size_t nlevels )
: n_threads{nthreads}, n_levels{nlevels}
{}
size_t nlevels() const { return n_levels; }
std::mutex & std_cout_mutex() { return mtx_std_cout; }
void level_done_wait_all( size_t level )
{
std::unique_lock<std::mutex> lk(cvMtx);
threadsFinished++;
cv.wait(lk, [&]{return threadsFinished >= n_threads * (level+1);});
cv.notify_all();
}
};
void tfunc( Synchronize & sync )
{
for(size_t i = 0; i < sync.nlevels(); i++)
{
//do something
for (size_t j = 0; j < NITERATIONS; j++) {
std::unique_lock<std::mutex> lck(sync.std_cout_mutex());
if (j == 0) std::cout << '\n';
std::cout << ' ' << i << ',' << j;
}
sync.level_done_wait_all(i);
}
}
int main() {
Synchronize sync{ NTHREADS, NLEVELS };
std::vector<std::thread> threads;
threads.reserve(NTHREADS);
for(size_t n = 0; n < NTHREADS; ++n) threads.emplace_back(tfunc,std::ref(sync));
for(auto&t:threads) t.join();
std::cout << std::endl;
return 0;
}

What's the use of a shared mutex?

Consider the following example:
#include <boost/thread.hpp>
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
void wait(int seconds)
{
boost::this_thread::sleep(boost::posix_time::seconds(seconds));
}
boost::shared_mutex mutex;
std::vector<int> random_numbers;
void fill()
{
std::srand(static_cast<unsigned int>(std::time(0)));
for (int i = 0; i < 3; ++i)
{
boost::unique_lock<boost::shared_mutex> lock(mutex);
random_numbers.push_back(std::rand());
lock.unlock();
wait(1);
}
}
void print()
{
for (int i = 0; i < 3; ++i)
{
wait(1);
boost::shared_lock<boost::shared_mutex> lock(mutex);
std::cout << random_numbers.back() << std::endl;
}
}
int sum = 0;
void count()
{
for (int i = 0; i < 3; ++i)
{
wait(1);
boost::shared_lock<boost::shared_mutex> lock(mutex);
sum += random_numbers.back();
}
}
int main()
{
boost::thread t1(fill);
boost::thread t2(print);
boost::thread t3(count);
t1.join();
t2.join();
t3.join();
std::cout << "Summe: " << sum << std::endl;
}
In the given example, both print() and count() access random_numbers read-only. While the print() function writes the last number of random_numbers to the standard output stream, the count() function adds it to the variable sum. Since neither function modifies random_numbers, both can access it at the same time using a non-exclusive lock of type boost::shared_lock.
My question is: since the resource is read-only, why is the shared mutex needed in the first place in the count and print functions? Can't we manage without it?
As the resource is read only [...]
No, it is not: the fill() function writes to it through the following line:
random_numbers.push_back(std::rand()); // write to random_numbers
So the shared mutex really is necessary to synchronize your access to the vector.