Alternative barrier to spinlock? - c++

Let's say I have this function that multiple threads need to run in a sort of lock step:
std::atomic<bool> go{false};
void func() {
while (!go.load()) {} //sync barrier
...
}
I want to get rid of the spinlock and replace it with something mutex based, since I have a lot of threads doing all kinds of stuff, and spinlocking a dozen threads is disastrous for the overall throughput. It runs much quicker if I include Sleep(1) inside the spinlock, for example.
So is there something in STL that would be similar to AllMemoryBarrierWithGroupSync() in HLSL for example? Basically it would just put each of the threads to sleep at the barrier until all of them have reached it.

It sounds like you want to do exactly what a condition variable is good for.
bool go = false;
std::mutex mtx;
std::condition_variable cv;
void thread_func()
{
{
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, []{ return go; });
}
// Do stuff
}
void start_all()
{
{
std::unique_lock<std::mutex> lock(mtx);
go = true;
}
cv.notify_all();
}

If you are willing to use experimental features, then latch or barrier will help you (both were later standardized in C++20 as std::latch and std::barrier). Otherwise you might create your own similar construct using condition_variable, or condition_variable_any with shared_lock (shared_mutex is a C++17 feature).
Using shared_mutex to implement a barrier:
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>
std::shared_mutex mtx;
std::condition_variable_any cv;
bool ready = false;
void thread_func()
{
{
std::shared_lock<std::shared_mutex> lock(mtx);
cv.wait(lock, []{return ready;});
}
std::cout << '0';
//Rest of calculations
}
int main()
{
std::vector<std::thread> threads;
for(int i = 0; i < 5; ++i)
threads.emplace_back(thread_func);
std::this_thread::sleep_for(std::chrono::seconds(1));
{
std::unique_lock<std::shared_mutex> lock(mtx);
std::cout << "Go\n";
ready = true;
}
cv.notify_all();
for(auto& t: threads)
t.join();
std::cout << "\nFinished\n";
}
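Both answers above give a one-shot gate that a controller thread opens. For the behaviour the question actually asks about (each thread sleeps until all of them have arrived), a reusable barrier can be sketched with the same mutex and condition variable tools. This is a minimal C++11 sketch, not a standard facility: the names SimpleBarrier and run_lockstep_demo are made up for illustration, and it assumes the number of participating threads is fixed up front.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the standard library): a reusable barrier
// for a fixed number of threads, built from a mutex and a condition variable.
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count) : threshold_(count) {}

    // Puts the caller to sleep (no spinning) until `count` threads have arrived.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mtx_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: reset for reuse and release the others.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // The generation check guards against spurious wakeups.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Runs the workers through several lock-step rounds and returns how many
// times a worker saw another thread that was not in the same round.
int run_lockstep_demo(int n_threads, int n_rounds) {
    assert(n_threads > 0 && n_rounds > 0);
    SimpleBarrier barrier(static_cast<std::size_t>(n_threads));
    std::vector<int> progress(n_threads, 0);
    std::atomic<int> violations{0};

    auto worker = [&](int id) {
        for (int round = 1; round <= n_rounds; ++round) {
            progress[id] = round;        // phase 1: write only our own slot
            barrier.arrive_and_wait();   // everyone has finished writing
            for (int value : progress)   // phase 2: read every slot
                if (value != round) ++violations;
            barrier.arrive_and_wait();   // everyone has finished reading
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) threads.emplace_back(worker, i);
    for (auto& t : threads) t.join();
    return violations.load();
}
```

Calling run_lockstep_demo(4, 50) should return 0 if the barrier keeps the threads in step. With a C++20 compiler, std::barrier offers the same arrive_and_wait() operation out of the box.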

Related

Pause threads from a different thread, and then wait until all are paused

I want to pause a number of worker threads from a creator thread. This can be done with a condition variable, as seen in this code.
#include <iostream>
#include <vector>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <atomic>
#define NR_ITERATIONS 3
#define NR_THREADS 5
class c_threads {
private:
bool m_worker_threads_pause;
//std::atomic<int> m_worker_threads_paused;
std::mutex m_worker_thread_mutex;
std::condition_variable m_worker_thread_conditional_variable;
void worker_thread() {
std::unique_lock<std::mutex> worker_thread_lock(m_worker_thread_mutex);
m_worker_thread_conditional_variable.wait(worker_thread_lock,
[this]{return !this->m_worker_threads_pause;}
);
std::cout << "worker thread function" << std::endl;
//...
}
void creator_thread() {
std::cout << "creator thread function" << std::endl;
{
std::lock_guard<std::mutex> lock_guard(m_worker_thread_mutex);
m_worker_threads_pause = true;
}
// wait_until( worker_threads_waiting == NR_THREADS);
//...
{
std::lock_guard<std::mutex> lock_guard(m_worker_thread_mutex);
m_worker_threads_pause = false;
}
m_worker_thread_conditional_variable.notify_all();
}
public:
c_threads() : m_worker_threads_pause(true)
/*m_worker_threads_paused(0)*/ {}
void start_job() {
std::vector<std::thread> worker_threads;
worker_threads.reserve(NR_THREADS);
for (int i=0;i<NR_THREADS;i++) {
worker_threads.emplace_back(&c_threads::worker_thread,this);
}
std::thread o_creator_thread(&c_threads::creator_thread,this);
o_creator_thread.join();
for (auto& thread : worker_threads) {
thread.join();
}
}
};
int main(int argc, char** argv) {
c_threads o_threads;
o_threads.start_job();
}
The problem is that the creator_thread function should wait until all worker threads are waiting at the condition variable before it proceeds.
Every time the creator_thread function is called, it should:
1. Pause the worker threads
2. Wait until they are all paused at the condition variable
3. Proceed
How to achieve this?
There might be a better way, but I think you're going to have to do something a little more complicated, like create a gatekeeper object. Worker threads generally work like this:
while(iShouldKeepRunning()) {
... lock the mutex
... look for something to do
... if nothing to do, then wait on the condition
}
I think instead you would want some sort of "give me more work" object, or maybe an "is it safe to keep working" object that your creator thread can block.
while(iShouldKeepRunning()) {
... no mutex at all
... ask the gatekeeper for something to do / if it's safe to do something
... and the gatekeeper blocks as necessary
... do the work
}
The gatekeeper locks the mutex, checks if it's safe to give out work, and if it isn't, increments a "I'm making this guy wait" counter before blocking on the condvar.
Something like that.
The blocker might look something like:
class BlockMyThreads {
public:
int runningCount = 0;
int blockedCount = 0;
bool mayWork = true;
std::mutex myMutex;
std::condition_variable condVar;
void iAmWorking() {
std::unique_lock<std::mutex> lock(myMutex);
++runningCount;
}
void letMeWork() {
std::unique_lock<std::mutex> lock(myMutex);
while (!mayWork) {
++blockedCount;
condVar.wait(lock);
--blockedCount;
}
}
void block() {
std::unique_lock<std::mutex> lock(myMutex);
mayWork = false;
}
void release() {
std::unique_lock<std::mutex> lock(myMutex);
mayWork = true;
condVar.notify_all();
}
};
I haven't tested this, so there might be errors. Your worker threads would need to call iAmWorking() at the start (to give you a thread count), and you'd want a matching decrement that they call when they're done, I suppose.
The main thread can call block() and release() as you desire.
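The class above blocks the workers but does not yet let the creator thread wait until all of them are actually parked. One way to add that step is a second condition variable that the workers signal as they park. The following is a rough sketch under stated assumptions: the names PauseGate, checkpoint and pause_and_wait are invented for illustration, the worker count is fixed, and every worker keeps calling checkpoint() for as long as a pause may be requested (a worker that has already exited would never park, and the controller would wait forever).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gatekeeper: workers call checkpoint() at safe points,
// the controller calls pause_and_wait() and later resume().
class PauseGate {
public:
    explicit PauseGate(int workers) : workers_(workers) {}

    // Returns immediately unless a pause is requested; otherwise parks the caller.
    void checkpoint() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!pause_requested_) return;
        ++parked_;
        all_parked_.notify_one();  // let the controller re-check the count
        resumed_.wait(lock, [this] { return !pause_requested_; });
        --parked_;
    }

    // Requests a pause and sleeps until every worker is parked.
    void pause_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        pause_requested_ = true;
        all_parked_.wait(lock, [this] { return parked_ == workers_; });
    }

    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pause_requested_ = false;
        }
        resumed_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable all_parked_;
    std::condition_variable resumed_;
    const int workers_;
    int parked_ = 0;
    bool pause_requested_ = false;
};

// Returns true if no worker made progress while the gate was paused.
bool workers_stay_parked_while_paused(int n_workers) {
    assert(n_workers > 0);
    PauseGate gate(n_workers);
    std::atomic<long> work_done{0};
    std::atomic<bool> stop{false};

    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i)
        workers.emplace_back([&] {
            while (!stop.load()) {
                gate.checkpoint();
                ++work_done;  // stands in for real work
            }
        });

    gate.pause_and_wait();  // all workers are parked once this returns
    const long before = work_done.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    const long after = work_done.load();

    stop.store(true);
    gate.resume();
    for (auto& t : workers) t.join();
    return before == after;
}
```

The parked counter is only touched while the mutex is held, so the controller's predicate cannot miss a worker that parks between the check and the wait.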

C++ condition_variable wait_for() blocks forever [duplicate]

I'm trying to create a producer-consumer program, where the consumers must keep running until all the producers are finished, then consume what's left in the queue (if there's anything left) and then end. You can check my code below. I think I know where the problem (probably a deadlock) is, but I don't know how to make it work properly.
#include<iostream>
#include<cstdlib>
#include <queue>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>
using namespace std;
class Company{
public:
Company() : producers_done(false) {}
void start(int n_producers, int n_consumers); // start customer&producer threads
void stop(); // join all threads
void consumer();
void producer();
/* some other stuff */
private:
condition_variable cond;
mutex mut;
bool producers_done;
queue<int> products;
vector<thread> producers_threads;
vector<thread> consumers_threads;
/* some other stuff */
};
void Company::consumer(){
while(!products.empty()){
unique_lock<mutex> lock(mut);
while(products.empty() && !producers_done){
cond.wait(lock); // <- I think this is where the deadlock happens
}
if (products.empty()){
break;
}
products.pop();
cout << "Removed product " << products.size() << endl;
}
}
void Company::producer(){
while(true){
if((rand()%10) == 0){
break;
}
unique_lock<mutex> lock(mut);
products.push(1);
cout << "Added product " << products.size() << endl;
cond.notify_one();
}
}
void Company::stop(){
for(auto &producer_thread : producers_threads){
producer_thread.join();
}
unique_lock<mutex> lock(mut);
producers_done = true;
cout << "producers done" << endl;
cond.notify_all();
for(auto &consumer_thread : consumers_threads){
consumer_thread.join();
}
cout << "consumers done" << endl;
}
void Company::start(int n_producers, int n_consumers){
for(int i = 0; i<n_producers; ++i){
producers_threads.push_back(thread(&Company::producer, this));
}
for(int i = 0; i<n_consumers; ++i){
consumers_threads.push_back(thread(&Company::consumer, this));
}
}
int main(){
Company c;
c.start(2, 2);
c.stop();
return true;
}
I know, there are a lot of producer-consumer related questions here, and I've scrolled through at least 10 of them, but none provided answer to my issue.
When people use std::atomic along with std::mutex and std::condition_variable that results in deadlock in almost 100% of cases. This is because modifications to that atomic variable are not protected by the mutex and hence condition variable notifications get lost when that variable is updated after the mutex is locked but before condition variable wait in the consumer.
A fix would be to not use std::atomic and only modify and read producers_done while the mutex is held. E.g.:
void Company::consumer(){
for(;;){
unique_lock<mutex> lock(mut);
while(products.empty() && !producers_done)
cond.wait(lock);
if(products.empty())
break;
products.pop();
}
}
Another error in the code is that in while(!products.empty()) it calls products.empty() without holding the mutex, resulting in a race condition.
The next error is keeping the mutex locked while waiting for the consumer threads to terminate. Fix:
{
unique_lock<mutex> lock(mut);
producers_done = true;
// mutex gets unlocked here.
}
cond.notify_all();
for(auto &consumer_thread : consumers_threads)
consumer_thread.join();
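Putting the three fixes together (check the queue only under the mutex, set producers_done under the mutex, release the mutex before joining), a self-contained version might look like the sketch below. The Channel struct and the run_pipeline function are hypothetical names used only for this illustration, and a fixed item count per producer replaces the random stop condition so the result is predictable.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical bundle of the shared state from the question.
struct Channel {
    std::mutex mut;
    std::condition_variable cond;
    std::queue<int> products;
    bool producers_done = false;  // only read or written while mut is held
};

// Starts the threads, shuts down cleanly, and returns how many items were consumed.
int run_pipeline(int n_producers, int n_consumers, int items_per_producer) {
    assert(n_producers > 0 && n_consumers > 0 && items_per_producer >= 0);
    Channel ch;
    int consumed = 0;  // protected by ch.mut

    auto producer = [&] {
        for (int i = 0; i < items_per_producer; ++i) {
            {
                std::lock_guard<std::mutex> lock(ch.mut);
                ch.products.push(1);
            }  // unlock before notifying so the woken consumer can proceed
            ch.cond.notify_one();
        }
    };

    auto consumer = [&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(ch.mut);
            ch.cond.wait(lock, [&] {
                return !ch.products.empty() || ch.producers_done;
            });
            if (ch.products.empty()) break;  // done flag set and nothing left
            ch.products.pop();
            ++consumed;
        }
    };

    std::vector<std::thread> producers, consumers;
    for (int i = 0; i < n_producers; ++i) producers.emplace_back(producer);
    for (int i = 0; i < n_consumers; ++i) consumers.emplace_back(consumer);

    for (auto& t : producers) t.join();
    {
        std::lock_guard<std::mutex> lock(ch.mut);
        ch.producers_done = true;
    }  // mutex released before waking and joining the consumers
    ch.cond.notify_all();
    for (auto& t : consumers) t.join();
    return consumed;
}
```

With two producers pushing 100 items each, run_pipeline(2, 2, 100) should return 200, since every consumer drains the queue before it honours the done flag.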

Condition variable basic example

I am learning condition variables in C++11 and wrote this program based on a sample code.
The goal is to accumulate in a vector the first ten natural integers that are generated by a producer and pushed into the vector by a consumer. However it does not work since, for example on some runs, the vector only contains 1, 7 and 10.
#include <mutex>
#include <condition_variable>
#include <thread>
#include <vector>
#include <iostream>
#include <cstdio>
std::mutex mut;
#define MAX 10
int counter;
bool isIncremented = false;
std::vector<int> vec;
std::condition_variable condvar;
void producer() {
while (counter < MAX) {
std::lock_guard<std::mutex> lg(mut);
++counter;
isIncremented = true;
condvar.notify_one();
}
}
void consumer() {
while (true) {
std::unique_lock<std::mutex> ul(mut);
condvar.wait(ul, [] { return isIncremented; });
vec.push_back(counter);
isIncremented = false;
if (counter >= MAX) {
break;
}
}
}
int main(int argc, char *argv[]) {
std::thread t1(consumer);
std::thread t2(producer);
t2.join();
t1.join();
for (auto i : vec) {
std::cout << i << ", ";
}
std::cout << std::endl;
// Expected output: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
// Example of actual output: 1, 7, 10,
std::cout << "Press enter to quit";
getchar();
return 0;
}
The problem is that you only remember the last number your producer produced. And your producer never waits until the consumer has consumed what it produced. If your producer thread gets to do more than one iteration of its loop before the consumer thread gets to run (which is not unlikely since the loop doesn't do much), the consumer will only see the last number the producer produced and only push that one into the vector…
To solve this problem, either use a second condition variable to make the producer wait for someone to pick up the last result it produced, or use something that can store more than one result between producer and consumer, or a combination thereof…
Note: Notifying a condition variable is not a blocking call. If it were, it would have to ask you to hand over the mutex so it can internally release it or you'd end up in a deadlock. notify_one() will just wake up one of the threads that are waiting on the condition variable and return. The wait call that the woken thread was blocking on will reacquire the mutex before it returns. In your case, it's not unlikely that the consumer thread be woken and then fail to reacquire the mutex and block again right away because your producer thread is still holding on to the mutex when it's calling notify_one(). Thus, as a general rule of thumb, you want to release the mutex associated with a condition variable should you be holding it before you call notify…
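The first suggestion (a second condition variable that makes the producer wait until its last value has been picked up) could be sketched as follows. This is only an illustration: run_handoff, slot_filled and slot_emptied are invented names, and the single shared slot plays the role of counter and isIncremented from the question. Both threads release the mutex before notifying, as recommended in the note above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hands the values 1..max from a producer to a consumer through a single
// slot and returns what the consumer collected.
std::vector<int> run_handoff(int max) {
    assert(max >= 0);
    std::mutex mut;
    std::condition_variable slot_filled;   // signalled by the producer
    std::condition_variable slot_emptied;  // signalled by the consumer
    int slot = 0;
    bool full = false;
    std::vector<int> collected;  // touched only by the consumer until join

    std::thread consumer([&] {
        for (int received = 0; received < max; ++received) {
            std::unique_lock<std::mutex> lock(mut);
            slot_filled.wait(lock, [&] { return full; });
            collected.push_back(slot);
            full = false;
            lock.unlock();              // release before notifying
            slot_emptied.notify_one();
        }
    });

    std::thread producer([&] {
        for (int value = 1; value <= max; ++value) {
            std::unique_lock<std::mutex> lock(mut);
            // Wait until the previous value has been picked up.
            slot_emptied.wait(lock, [&] { return !full; });
            slot = value;
            full = true;
            lock.unlock();
            slot_filled.notify_one();
        }
    });

    producer.join();
    consumer.join();
    return collected;
}
```

Because the producer cannot overwrite the slot while full is set, no value is skipped, at the cost of the two threads running strictly in alternation.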
A side note: you used lock_guard<> in the producer but unique_lock in the consumer. In the consumer, the unique_lock also doesn't seem to guard the shared resource exclusively.
Below is modified code that uses unique_lock in both the producer and the consumer to guard the shared resource counter.
The code adds a sleep in the producer so that the consumer can be notified of the counter change.
Output seems to be as expected.
#include <mutex>
#include <condition_variable>
#include<vector>
#include <iostream>
#include <cstdio>
#include <thread>
#include <chrono>
std::mutex mut;
#define MAX 10
int counter = 0;
bool isIncremented = false;
std::vector<int> vec;
std::condition_variable condvar;
void producer() {
while (counter < MAX) {
std::unique_lock<std::mutex> lg(mut);
++counter;
isIncremented = true;
lg.unlock();
condvar.notify_one();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void consumer() {
while (true) {
std::unique_lock<std::mutex> ul(mut);
condvar.wait(ul, [] { return isIncremented; });
vec.push_back(counter);
isIncremented = false;
if (counter >= MAX) {
break;
}
ul.unlock();
}
}
int main(int argc, char *argv[]) {
std::thread t1(consumer);
std::thread t2(producer);
t2.join();
t1.join();
for (auto i : vec) {
std::cout << i << ", ";
}
std::cout << std::endl;
return 0;
}
Using @MichaelKenzel's suggestions from the answer, here is a working example. std::queue is used in order to store more than one result between producer and consumer.
#include<mutex>
#include<condition_variable>
#include<vector>
#include<iostream>
#include<cstdio>
#include<thread>
#include<queue>
std::mutex mut;
#define MAX 10
int counter;
std::queue<int> data_queue;
std::vector<int> vec;
std::condition_variable condvar;
void producer()
{
while (counter < MAX)
{
++counter;
std::lock_guard<std::mutex> lg(mut);
data_queue.push(counter);
condvar.notify_one();
}
}
void consumer()
{
while (true)
{
std::unique_lock<std::mutex> ul(mut);
condvar.wait(ul, [] { return !data_queue.empty(); });
int data = data_queue.front();
data_queue.pop();
ul.unlock();
vec.push_back(data);
if (data >= MAX)
{
break;
}
}
}
int main(int argc, char *argv[])
{
std::thread t1(consumer);
std::thread t2(producer);
t2.join();
t1.join();
for (auto i : vec)
{
std::cout << i << ", ";
}
std::cout << std::endl;
return 0;
}

c++ thread does not execute

The thread1 function does not seem to get executed
#include <iostream>
#include <fstream>
#include <thread>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
std::condition_variable cv;
std::mutex mu;
std::queue<int> queue;
bool ready;
static void thread1() {
while(!ready) {std::this_thread::sleep_for(std::chrono::milliseconds(10));}
while(ready && queue.size() <= 4) {
std::unique_lock<std::mutex> lk(mu);
cv.wait(lk, [&]{return !queue.empty();});
queue.push(2);
}
}
int main() {
ready = false;
std::thread t(thread1);
while(queue.size() <= 4) {
{
std::lock_guard<std::mutex> lk(mu);
queue.push(1);
}
ready = true;
cv.notify_one();
}
t.join();
for(int i = 0; i <= queue.size(); i++) {
int a = queue.front();
std::cout << a << std::endl;
queue.pop();
}
return 0;
}
On my Mac the output is 1 2 1 2, but on my Ubuntu machine it's 1 1 1. I'm compiling with g++ -std=c++11 -pthread -o thread.out thread.cpp && ./thread.out. Am I missing something?
This:
for(int i = 0; i <= queue.size(); i++) {
int a = queue.front();
std::cout << a << std::endl;
queue.pop();
}
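does not do what it appears to. Because queue.size() shrinks with every pop() while i grows, the loop stops after printing only about half of the elements, and if the queue were empty, the very first front() call would be undefined behavior. I would suggest that you write this in the more idiomatic style for a queue: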
while(!queue.empty()) {
int a = queue.front();
std::cout << a << std::endl;
queue.pop();
}
When I run this on coliru, which I assume runs some kind of *nix machine, I get 4 1's: http://coliru.stacked-crooked.com/a/8de5b01e87e8549e.
Again, you haven't specified anything that would force each thread to run a certain amount of times. You only (try to*) cause an invariant where the queue will reach size 4, either way. It just happens to be that on the machines that we ran it on, thread 2 never manages to acquire the mutex.
This example will be more interesting if you add more work or even (just for pedagogical purposes) delays at various points, simulating that the two threads are actually doing work. If you add sleeps at various points you can ensure that the two threads alternate, though depending on where you add them you may see your invariant of 4 elements in the queue break!
*Note that even your 4 element invariant on the queue, is not really an invariant. It is possible (though very unlikely) that both threads pass the while condition at the exact same moment, when there are 3 elements in the queue. One acquires the lock first and pushes, and then the other. So you can end up with 5 elements in the queue! (as you can see, asynchronous programming is tricky). In particular you really need to check the queue size when you have the lock in order for this to work.
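To illustrate that last point, here is a small sketch in which both threads test the queue size while holding the lock, so the limit really is an invariant. The fill_queue function and its limit parameter are hypothetical and only serve this example; how many elements each thread contributes is still up to the scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Two threads push into one queue until it holds `limit` elements.
// Returns the final size, which should equal `limit` exactly.
std::size_t fill_queue(std::size_t limit) {
    assert(limit > 0);
    std::mutex mu;
    std::queue<int> queue;

    auto worker = [&](int tag) {
        for (;;) {
            std::lock_guard<std::mutex> lk(mu);
            // The size check and the push happen under the same lock, so no
            // other thread can push in between the two.
            if (queue.size() >= limit) return;
            queue.push(tag);
        }
    };

    std::thread a(worker, 1);
    std::thread b(worker, 2);
    a.join();
    b.join();
    return queue.size();
}
```

Checking the size in the while condition, outside the lock, is what allowed the fifth element in the original code.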
I was able to solve this by making the second thread wait on a separate predicate on a separate condition variable. I'm not sure if queue.size() is thread safe.
#include <iostream>
#include <fstream>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <queue>
std::condition_variable cv;
std::condition_variable cv2;
std::mutex mu;
std::queue<int> queue;
bool tick;
bool tock;
static void thread1() {
while(queue.size() < 6) {
std::unique_lock<std::mutex> lk(mu);
cv2.wait(lk, []{return tock;});
queue.push(1);
tock = false;
tick = true;
cv.notify_one();
}
}
int main() {
tick = false;
tock = true;
std::thread t(thread1);
while(queue.size() < 6) {
std::unique_lock<std::mutex> lk(mu);
cv.wait(lk, []{return tick;});
queue.push(2);
tick = false;
tock = true;
cv2.notify_one();
}
t.join();
while(!queue.empty()) {
int r = queue.front();
queue.pop();
std::cout << r << std::endl;
}
return 0;
}

unique lock and condition variable - explicitly calling unlock

I found some example code which demonstrates how to use a condition variable:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <deque>
using namespace std;
deque<int> qu;
mutex mu;
condition_variable cond;
void fun1()
{
int count = 100;
while (count > 0)
{
unique_lock<mutex> locker(mu);
qu.push_front(count);
locker.unlock(); // explicit unlock 1
cond.notify_one();
--count;
}
}
void fun2()
{
int data = 0;
while(data != 1)
{
unique_lock<mutex> locker(mu);
cond.wait(locker, [](){ return !(qu.empty()); });
data = qu.back();
qu.pop_back();
locker.unlock(); // explicit unlock 2
cout<<"data: "<<data<<endl;
}
}
int main()
{
thread t1(fun1);
thread t2(fun2);
t1.join();
t2.join();
system("pause");
return 0;
}
I think that explicitly calling unlock is not necessary. However, in fun1, calling it before notify_one might improve performance, right? Why is unlock called in fun2? (Unlock happens implicitly at the end of each iteration, so doing it explicitly makes no sense.)
std::unique_lock uses the RAII pattern.
That means you don't need to call unlock on the mutex explicitly. This also provides exception safety: if an exception is thrown after the mutex is locked and before it is explicitly unlocked, the mutex is released automatically when the lock goes out of scope.
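As a sketch of what relying on RAII can look like for the code in the question, the explicit unlock calls can be replaced by scopes. The IntChannel class and channel_smoke_test function below are made-up names for illustration; the queue operations mirror fun1 (push_front) and fun2 (pop_back).

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical wrapper around the deque, mutex and condition variable
// that the question declares as globals.
class IntChannel {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> locker(mu_);
            qu_.push_front(value);
        }  // locker is destroyed here, which unlocks the mutex
        cond_.notify_one();  // notify without holding the mutex
    }

    int pop() {
        std::unique_lock<std::mutex> locker(mu_);
        cond_.wait(locker, [this] { return !qu_.empty(); });
        const int data = qu_.back();
        qu_.pop_back();
        return data;  // locker unlocks on return, and also if an exception is thrown
    }

private:
    std::deque<int> qu_;
    std::mutex mu_;
    std::condition_variable cond_;
};

// Single-threaded check: values come out in the order they went in.
void channel_smoke_test() {
    IntChannel channel;
    channel.push(5);
    channel.push(7);
    assert(channel.pop() == 5);
    assert(channel.pop() == 7);
}
```

The inner scope in push() has the same effect as "explicit unlock 1" in the question. In pop(), any slow work on the returned value (such as printing) happens in the caller after the lock is gone, which is what "explicit unlock 2" achieves.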
It seems misleading to me. Locking with a mutex is required to use condition variables. This example uses the same mutex for multiple shared variables (cond and qu).
I think it will not work properly if fun1 or fun2 runs on more than one thread.
The following would be clearer:
mutex mu;
mutex mu_for_cv;
condition_variable cond;
void fun1()
{
int count = 100;
while (count > 0)
{
unique_lock<mutex> locker(mu);
qu.push_front(count);
{
unique_lock<mutex> locker(mu_for_cv);
cond.notify_one();
}
--count;
}
}
void fun2()
{
int data = 0;
while(data != 1)
{
{
unique_lock<mutex> locker(mu_for_cv);
cond.wait(locker, [](){ return !(qu.empty()); });
}
unique_lock<mutex> locker(mu);
if (!qu.empty())
{
data = qu.back();
qu.pop_back();
cout<<"data: "<<data<<endl;
}
}
}
Also, it'd be better to check that the queue is not empty in fun2 to defend against spurious wakeups.