Deadlock with blocking queue and barrier in C++ - c++

I have this very simple and small C++ program that creates a thread pool, then put messages in a blocking queue shared between threads to say to each thread what to do.
Message can be: -1 (end of stream -> terminate), -2 (barrier -> wait for all threads to reach it, then continue), other values to do random computation. The loop is done in this order: some computation, barrier, some computation, barrier, ..., barrier, end of stream, thread join, exit.
I'm not able to understand why I obtain deadlock even with 2 threads in the pool. The queue is not able to become empty, but the order in which I push and pop messages would always lead to an empty queue!
The blocking queue implementation is the one proposed here (C++ Equivalent to Java's BlockingQueue) with just two methods added. I copy also the queue code below.
Any help?
Main.cpp
#include <iostream>
#include <vector>
#include <thread>
#include "Queue.hpp"
using namespace std;
// function executed by each thread
void f(int i, Queue<int> &q){
while(1){
// take a message from blocking queue
int j= q.pop();
// if it is end of stream then exit
if (j==-1) break;
// if it is barrier, wait for other threads to reach it
if (j==-2){
// active wait! BAD, but anyway...
while(q.size() > 0){
;
}
}
else{
// random stuff
int x = 0;
for(int i=0;i<j;i++)
x += 4;
}
}
}
int main(){
Queue<int> queue; //blocking queue
vector<thread> tids; // thread pool
int nt = 2; // number of threads
int dim = 8; // number to control number of operations
// create thread pool, passing thread id and queue
for(int i=0;i<nt;i++)
tids.push_back(thread(f,i, std::ref(queue)));
for(int dist=1; dist<=dim; dist++){ // without this outer loop the program works fine
// push random number
for(int j=0;j<dist;j++){
queue.push(4);
}
// push barrier code
for(int i=0;i<nt;i++){
queue.push(-2);
}
// active wait! BAD, but anyway...
while (queue.size()>0){
;
}
}
// push end of stream
for(int i=0;i<nt;i++)
queue.push(-1);
// join thread pool
for(int i=0;i<nt;i++){
tids[i].join();
}
return 0;
}
Queue.hpp
#include <deque>
#include <mutex>
#include <condition_variable>
template <typename T>
class Queue
{
private:
std::mutex d_mutex;
std::condition_variable d_condition;
std::deque<T> d_queue;
public:
void push(T const& value) {
{
std::unique_lock<std::mutex> lock(this->d_mutex);
d_queue.push_front(value);
}
this->d_condition.notify_one();
}
T pop() {
std::unique_lock<std::mutex> lock(this->d_mutex);
this->d_condition.wait(lock, [=]{ return !this->d_queue.empty(); });
T rc(std::move(this->d_queue.back()));
this->d_queue.pop_back();
return rc;
}
bool empty(){
std::unique_lock<std::mutex> lock(this->d_mutex);
return this->d_queue.empty();
}
int size(){
std::unique_lock<std::mutex> lock(this->d_mutex);
return this->d_queue.size();
}
};

I think the problem is your active wait that you describe as "BAD, but anyway..." and using the size of the queue as a barrier instead of using a true synchronization barrier
For dim =1 you push a Queue that has 4, -2, -2. One thread will grab the 4 and -2 while the other grabs the remaining -2. At this point the queue is empty and you have three threads (the two workers and main thread) doing an active wait racing to see if the queue has been emptied. There is a mutex on size that only lets one read the size at a time. If the main thread is scheduled first and determines that queue is empty it will push on -1, -1 to signal end of stream. Now, the queue is no longer empty, but one or both of the two worker threads are waiting for it to empty. Since they are waiting for it to be empty before taking another item the queue is deadlocked in this state.
For the case were dim > 1 there is likely a similar issue with pushing the next set of values into the queue on the main thread before both workings acknowledge the empty the queue and exit the active wait.

I had run your code and I understand the problem. The problem is with "-2" option. When the two threads arrive to this point, your main thread already pushed another values to the queue. So, if your queue increased it's size between the time that your threads got "-2" value, and before they arrive to "-2" option, your code will stuck:
Thread 1: get -2.
Thread 2: get -2.
Thread main: push -1.
Thread main: push -1.
Thread 1: wait untill the whole queue will be empty.
Thread 2: wait untill the whole queue will be empty.
queue:
-1
-1
^ this in case that dim equals 1. In your code, dim equals 8, you don't want to see how it looks like..
To solve this, all I did was to disable the following loop:
for(int i=0;i<nt;i++){
queue.push(-2);
}
When this pard disable, the code run perfectly.
This is how I checked it:
std::mutex guarder;
// function executed by each thread
void f(int i, Queue<int> &q){
while(1){
// take a message from blocking queue
guarder.lock();
int j= q.pop();
guarder.unlock();
// if it is end of stream then exit
if (j==-1) break;
// if it is barrier, wait for other threads to reach it
if (j==-2){
// active wait! BAD, but anyway...
while(q.size() > 0){
;
}
}
else{
// random stuff
int x = 0;
for(int i=0;i<j;i++)
x += 4;
guarder.lock();
cout << x << std::endl;
guarder.unlock();
}
}
}
int main(){
Queue<int> queue; //blocking queue
vector<thread> tids; // thread pool
int nt = 2; // number of threads
int dim = 8; // number to control number of operations
// create thread pool, passing thread id and queue
for(int i=0;i<nt;i++)
tids.push_back(thread(f,i, std::ref(queue)));
for(int dist=1; dist<=dim; dist++){ // without this outer loop the program works fine
// push random number
for(int j=0;j<dist;j++){
queue.push(dist);
}
/*// push barrier code
for(int i=0;i<nt;i++){
queue.push(-2);
}*/
// active wait! BAD, but anyway...
while (queue.size()>0){
;
}
}
// push end of stream
for(int i=0;i<nt;i++)
queue.push(-1);
// join thread pool
for(int i=0;i<nt;i++){
tids[i].join();
}
return 0;
}
The result:
4
8
8
12
12
12
16
16
16
20
20
16
20
20
20
24
24
24
24
24
24
28
28
28
28
28
28
28
32
32
32
32
32
32
32
32
BTW, the stuck didn't occur because your "active wait" part. It is not good, but it cause other problems usually (like slowing down your system).

Related

double buffer for consumer and producer problem

So I am trying to implement a double buffer for a typical producer and consumer problem.
1.get_items() basically produces 10 items at a time.
2.producer basically push 10 items onto a write queue. Assume that currently we only have one producer.
3.consumers will consume one item from the queue. There are many consumers.
So I am sharing my code as the following. The implementation idea is simple, consume from the readq until it is empty and then swap the queue pointer, which the readq would point to the writeq now and writeq would now points to the emptied queue and would starts to fill it again. So producer and consumer can work independently without halting each other. This sort of swaps space for time.
However, my code does not work in multiple consumer cases. In my code, I initiated 10 consumer threads, and it always stuck at the .join().
So I am thinking that my code is definitely buggy. However, by examine carefully, I did not find where that bug is. And it seems the code stuck after lk1.unlock(), so it is not stuck in a while or something obvious.
mutex m1;
mutex m2; // using 2 mutex, so when producer is locked, consumer can still run
condition_variable put;
condition_variable fetch;
queue<int> q1;
queue<int> q2;
queue<int>* readq = &q1;
queue<int>* writeq = &q2;
bool flag{ true };
vector<int> get_items() {
vector<int> res;
for (int i = 0; i < 10; i++) {
res.push_back(i);
}
return res;
}
void producer_mul() {
unique_lock<mutex> lk2(m2);
put.wait(lk2, [&]() {return flag == false; }); //producer waits for consumer signal
vector<int> items = get_items();
for (auto it : items) {
writeq->push(it);
}
flag = true; //signal queue is filled
fetch.notify_one();
lk2.unlock();
}
int get_one_item_mul() {
unique_lock<mutex> lk1(m1);
int res;
if (!(*readq).empty()) {
res = (*readq).front(); (*readq).pop();
if ((*writeq).empty() && flag == true) { //if writeq is empty
flag = false;
put.notify_one();
}
}
else {
readq = writeq; // swap queue pointer
while ((*readq).empty()) { // not yet write
if (flag) {
flag = false;
put.notify_one();//start filling process
}
//if (readq->empty()) { //upadted due to race. readq now points to writeq, so if producer finished, readq is not empty and flag = true.
fetch.wait(lk1, [&]() {return flag == true; });
//}
}
if (flag) {
writeq = writeq == &q1 ? &q2 : &q1; //swap the writeq to the alternative queue and fill it again
flag = false;
//put.notify_one(); //fill that queue again if needed. but in my case, 10 item is produced and consumed, so no need to use the 2nd round, plus the code does not working in this simple case..so commented out for now.
}
res = readq->front(); readq->pop();
}
lk1.unlock();
this_thread::sleep_for(10ms);
return res;
}
int main()
{
std::vector<std::thread> threads;
std::packaged_task<void(void)> job1(producer_mul);
vector<std::future<int>> res;
for (int i = 0; i < 10; i++) {
std::packaged_task<int(void)> job2(get_one_item_mul);
res.push_back(job2.get_future());
threads.push_back(std::thread(std::move(job2)));
}
threads.push_back(std::thread(std::move(job1)));
for (auto& t : threads) {
t.join();
}
for (auto& a : res) {
cout << a.get() << endl;
}
return 0;
}
I added some comments, but the idea and code is pretty simple and self-explanatory.
I am trying to figure out where the problem is in my code. Does it work for multiple consumer? Further more, if there are multiple producers here, does it work? I do not see a problem since basically in the code the lock is not fine grained. Producer and Consumer both are locked from the beginning till the end.
Looking forward to discussion and any help is appreciated.
Update
updated the race condition based on one of the answer.
The program is still not working.
Your program contains data races, and therefore exhibits undefined behavior. I see at least two:
producer_mul accesses and modifies flag while holding m2 mutex but not m1. get_one_item_mul accesses and modifies flag while holding m1 mutex but not m2. So flag is not in fact protected against concurrent access.
Similarly, producer_mul accesses writeq pointer while holding m2 mutex but not m1. get_one_item_mul modifies writeq while holding m1 mutex but not m2.
There's also a data race on the queues themselves. Initially, both queues are empty. producer_mul is blocked waiting on flag. Then the following sequence occurs ( P for producer thread, C for consumer thread):
C: readq = writeq; // Both now point to the same queue
C: flag = false; put.notify_one(); // This wakes up producer
**P: writeq->push(it);
**C: if (readq->empty())
The last two lines happen concurrently, with no protection against concurrent access. One thread modifies an std::queue instance while the other accesses that same instance. This is a data race.
There's a data race at the heart of the design. Let's imagine there's just one producer P and two consumers C1 and C2. Initially, P waits on put until flag == false. C1 grabs m1; C2 is blocked on m1.
C1 sets readq = writeq, then unblocks P1, then calls fetch.wait(lk1, [&]() {return flag == true; });. This unlocks m1, allowing C2 to proceed. So now P is busy writing to writeq while C2 is busy reading from readq - which is one and the same queue.

Start multiple threads and wait only for one to finish to obtain results

Assuming I have the function double someRandomFunction(int n) that takes an integer and returns double but it's random in the sense that it tries random stuff to come up with the solution so even though you run the function with the same arguments, sometimes it can take 10 seconds to finish and other 40 seconds to finish.
The double someRandomFunction(int n) functions itself is a wrapper to a black box function. So the someRandomFunction takes a while to complete but I don't have control in the main loop of the black box, hence I can't really check for a flag variable within the thread as the heavy computation happens in a black box function.
I would like to start 10 threads calling that function and I am interested in the result of the first thread which finishes first. I don't care which one it's I only need 1 result from these threads.
I found the following code:
std::vector<boost::future<double>> futures;
for (...) {
auto fut = boost::async([i]() { return someRandomFunction(2) });
futures.push_back(std::move(fut));
}
for (...) {
auto res = boost::wait_for_any(futures.begin(), futures.end());
std::this_thread::yield();
std::cout << res->get() << std::endl;
}
Which is the closest to what I am looking for, but still I can't see how I can make my program to terminate the other threads as far as one thread returns a solution.
I would like to wait for one to finish and then carry on with the result of that one thread to continue my program execution (i.e., I don't want to terminate my program after I obtain that single result, but I would like to use it for the remaining program execution.).
Again, I want to start up 10 threads calling the someRandomFunction and then wait for one thread to finish first, get the result of that thread and stop all the other threads even though they didn't finish their work.
If the data structure supplied to the black-box has some obvious start and end values, one way to make it finish early could be to change the end value while it's computing. It could of course cause all sorts of trouble if you've misunderstood how the black-box must work with the data, but if you are reasonably sure, it can work.
main spawns 100 outer threads that each spawn one inner thread that calls the blackbox. The inner thread receives the blackbox result and notifies all waiting threads that it's done. The outer thread waits for any inner thread to get done and then modifies the data for its own blackbox to trick it to finish.
No polling (except for the spurious wakeup loops) and no detached threads.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <chrono>
// a work package for one black-box
struct data_for_back_box {
int start_here;
int end_here;
};
double blackbox(data_for_back_box* data) {
// time consuming work here:
for(auto v=data->start_here; v<data->end_here; ++v) {
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
// just a debug
if(data->end_here==0) std::cout << "I was tricked into exiting early\n";
return data->end_here;
}
// synchronizing stuff and result
std::condition_variable cv;
std::mutex mtx;
bool done=false;
double result;
// a wrapper around the real blackbox
void inner(data_for_back_box* data) {
double r = blackbox(data);
if(done) return; // someone has already finished, skip this result
// notify everyone that we're done
std::unique_lock<std::mutex> lock(mtx);
result = r;
done=true;
cv.notify_all();
}
// context setup and wait for any inner wrapper
// to signal "done"
void outer(int n) {
data_for_back_box data{0, 100+n*n};
std::thread work(inner, &data);
{
std::unique_lock<std::mutex> lock(mtx);
while( !done ) cv.wait(lock);
}
// corrupt data for blackbox:
data.end_here = 0;
// wait for this threads blackbox to finish
work.join();
}
int main() {
std::vector<std::thread> ths;
// spawn 100 worker threads
for(int i=0; i<100; ++i) {
ths.emplace_back(outer, i);
}
double saved_result;
{
std::unique_lock<std::mutex> lock(mtx);
while( !done ) cv.wait(lock);
saved_result = result;
} // release lock
// join all threads
std::cout << "got result, joining:\n";
for(auto& th : ths) {
th.join();
}
std::cout << "result: " << saved_result << "\n";
}

How to run a function on a separate thread, if a thread is available

How can I run a function on a separate thread if a thread is available, assuming that i always want k threads running at the same time at any point?
Here's a pseudo-code
For i = 1 to N
IF numberOfRunningThreads < k
// run foo() on another thread
ELSE
// run foo()
In summary, once a thread is finished it notifies the other threads that there's a thread available that any of the other threads can use. I hope the description was clear.
My personal approach: Just do create the k threads and let them call foo repeatedly. You need some counter, protected against race conditions, that is decremented each time before foo is called by any thread. As soon as the desired number of calls has been performed, the threads will exit one after the other (incomplete/pseudo code):
unsigned int global_counter = n;
void fooRunner()
{
for(;;)
{
{
std::lock_guard g(global_counter_mutex);
if(global_counter == 0)
break;
--global_counter;
}
foo();
}
}
void runThreads(unsigned int n, unsigned int k)
{
global_counter = n;
std::vector<std::thread> threads(std::min(n, k - 1));
// k - 1: current thread can be reused, too...
// (provided it has no other tasks to perform)
for(auto& t : threads)
{
t = std::thread(&fooRunner);
}
fooRunner();
for(auto& t : threads)
{
t.join();
}
}
If you have data to pass to foo function, instead of a counter you could use e. g a FIFO or LIFO queue, whatever appears most appropriate for the given use case. Threads then exit as soon as the buffer gets empty; you'd have to prevent the buffer running empty prematurely, though, e. g. by prefilling all the data to be processed before starting the threads.
A variant might be a combination of both: exiting, if the global counter gets 0, waiting for the queue to receive new data e. g. via a condition variable otherwise, and the main thread continuously filling the queue while the threads are already running...
you can use (std::thread in <thread>) and locks to do what you want, but it seems to me that your code could be simply become parallel using openmp like this.
#pragma omp parallel num_threads(k)
#pragma omp for
for (unsigned i = 0; i < N; ++i)
{
auto t_id = omp_get_thread_num();
if (t_id < K)
foo()
else
other_foo()
}

Segfault occuring in attempt to synchronize queue

I am learning about multithreading and I wanted to simulate producer-consumer problem ( using semaphore if I can call it that ).
I have a class that holds a queue, producer push ints into queue and consumer retrieves it and prints it. I simulated is as following
class TestClass{
public:
void producer( int i ){
unique_lock<mutex> l(m);
q.push(i);
if( q.size() )
cnd.notify_all();
}
void consumer(){
unique_lock<mutex> l(m);
while( q.empty() ){
cnd.wait(l);
}
int tmp = q.front();
q.pop();
cout << "Producer got " << tmp << endl;
}
void ConsumerInit( int threads ){
for( int i = 0; i < threads; i++ ){
thrs[i] = thread(&TestClass::consumer, this );
}
for( auto &a : thrs )
a.join();
}
private:
queue<int> q;
vector<thread> thrs;
mutex m;
condition_variable cnd;
};
And I used a little console application to call data:
int main(){
int x;
TestClass t;
int counter = 0;
while( cin >> x ){
if( x == 0 )
break;
if( x == 1)
t.producer(counter++);
if( x == 2 )
t.ConsumerInit(5);
}
}
So when user input 1, a data is pushed into the queue , if user press 2 threads are spawned.
In any order of invoking it, for example, pressing 1 1 and then 2, or 2 1 1
it throws segfault. I am not sure why my understanding of my code is as following: let's assume order 2 1 1
I initialize 5 threads, they see that queue is empty, so they go to sleep. When I push a number to the queue, it notifies all threads sleeping.
The first one to wake up lock mutex again and proceed to retrieve number from queue and afterwards releasing the mutex, when mutex is released another thread do the same and unlocks the mutex, the third thread after mutex is unlocked is still in loop and see that queue is yet again empty and goes to sleep again, same with all remaining threads.
Is this logic correct? If so, why does this keep throwing segfault, if not, I appreciate all explanations.
Thanks for the help!
//edit
By answers suggets , i replaced [] with vector.push_back , but consumer does nothing with data now , does not take it or print it.
You aren't expanding the thrs vector when you do
thrs[i] = thread(&CTest::consumer, this );
You should do
thrs.emplace_back(&CTest::consumer, this);
That's where the crash would be.
Your issue has nothing to do with multithreading. You are accessing a std::vector out-of-bounds:
for (int i = 0; i < threads; i++) {
thrs[i] = thread(&CTest::consumer, this);
//...
vector<thread> thrs;
The thrs vector is empty, and you're trying to access as if it has entries.
To show the error, use:
thrs.at(i) = thread(&CTest::consumer, this);
and you will be greeted with a std::out_of_range exception instead of a segmentation fault.
Your program deadlocks, if the input sequence is not in the form of 1 1 1 1 1 ... 2. That is if the number if 1s preceding 2 is less than five.
Here is the reason:
If the total elements in queue size are less than 5 and the main thread calls consumerInit, some of the five created consumer threads will block waiting for the queue to receive elements. Meanwhile, the main thread blocks on the join operation. Since the main thread will be waiting for consumer threads to finish while some of those threads are waiting for data to consume, there will be no progress. Hence deadlock.
Problem is here:
for( auto &a : thrs )
a.join();
Main thread gets blocked here after you enter 2 waiting for the consumers to finish. So after this point you think that you are entering inputs, while there is no cin happening.
Remove these two lines and then you can enter 1 and producer/consumer will do their job.

C++ boost thread: having a worker thread pause and unpause based on mutexes/conditions using a concurrent queue

I am fairly new to multi-threaded programming, so please forgive my possibly imprecise question. Here is my problem:
I have a function processing data and generating lots of objects of the same type. This is done iterating in several nested loops, so it would be practical to just do all iterations, save these objects in some container and then work on that container in interfacing code doing the next steps. However, I have to create millions of these objects which would blow up the memory usage. These constraints are mainly due to external factors I cannot control.
Generating only a certain amount of data would be ideal, but breaking out of the loops and restarting later at the same point is also impractical. My idea was to do the processing in a separate thread which would be paused after n iterations and resumed once all n objects are completely processed, then resuming, doing n next iterations and so on until all iterations are done. It is important to wait until the thread has done all n iterations, so both threads would not really run in parallel.
This is where my problems begin: How do I do the mutex locking properly here? My approaches produce boost::lock_errors. Here is some code to show what I want to do:
boost::recursive_mutex bla;
boost::condition_variable_any v1;
boost::condition_variable_any v2;
boost::recursive_mutex::scoped_lock lock(bla);
int got_processed = 0;
const int n = 10;
void ProcessNIterations() {
got_processed = 0;
// have some mutex or whatever unlocked here so that the worker thread can
// start or resume.
// my idea: have some sort of mutex lock that unlocks here and a condition
// variable v1 that is notified while the thread is waiting for that.
lock.unlock();
v1.notify_one();
// while the thread is working to do the iterations this function should wait
// because there is no use to proceed until the n iterations are done
// my idea: have another condition v2 variable that we wait for here and lock
// afterwards so the thread is blocked/paused
while (got_processed < n) {
v2.wait(lock);
}
}
void WorkerThread() {
int counter = 0;
// wait for something to start
// my idea: acquire a mutex lock here that was locked elsewhere before and
// wait for ProcessNIterations() to unlock it so this can start
boost::recursive_mutex::scoped_lock internal_lock(bla);
for (;;) {
for (;;) {
// here do the iterations
counter++;
std::cout << "iteration #" << counter << std::endl;
got_processed++;
if (counter >= n) {
// we've done n iterations; pause here
// my idea: unlock the mutex, notify v2
internal_lock.unlock();
v2.notify_one();
while (got_processed > 0) {
// when ProcessNIterations() is called again, resume here
// my idea: wait for v1 reacquiring the mutex again
v1.wait(internal_lock);
}
counter = 0;
}
}
}
}
int main(int argc, char *argv[]) {
boost::thread mythread(WorkerThread);
ProcessNIterations();
ProcessNIterations();
while (true) {}
}
The above code fails after doing 10 iterations in the line v2.wait(lock); with the following message:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
what(): boost::lock_error
How do I do this properly? If this is the way to go, how do I avoid lock_errors?
EDIT: I solved it using a concurrent queue like discussed here. This queue also has a maximum size after which a push will simply wait until at least one element has been poped. Therefore, the producer worker can simply go on filling this queue and the rest of the code can pop entries as it is suitable. No mutex locking needs to be done outside the queue. The queue is here:
template<typename Data>
class concurrent_queue
{
private:
std::queue<Data> the_queue;
mutable boost::mutex the_mutex;
boost::condition_variable the_condition_variable;
boost::condition_variable the_condition_variable_popped;
int max_size_;
public:
concurrent_queue(int max_size=-1) : max_size_(max_size) {}
void push(const Data& data) {
boost::mutex::scoped_lock lock(the_mutex);
while (max_size_ > 0 && the_queue.size() >= max_size_) {
the_condition_variable_popped.wait(lock);
}
the_queue.push(data);
lock.unlock();
the_condition_variable.notify_one();
}
bool empty() const {
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.empty();
}
bool wait_and_pop(Data& popped_value) {
boost::mutex::scoped_lock lock(the_mutex);
bool locked = true;
if (the_queue.empty()) {
locked = the_condition_variable.timed_wait(lock, boost::posix_time::seconds(1));
}
if (locked && !the_queue.empty()) {
popped_value=the_queue.front();
the_queue.pop();
the_condition_variable_popped.notify_one();
return true;
} else {
return false;
}
}
int size() {
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.size();
}
};
This could be implemented using conditional variables. Once you've performed N iterations, you call wait() on the condition variable, and when the objects are processed in another thread, call signal() on the condition variable to unblock the other thread that is blocked on the condition variable.
You probably want some sort of finite capacity queue list or stack in conjunction with a condition variable. When the queue is full, the producer thread waits on the condition variable, and any time a consumer thread removes an element from the queue, it signals the condition variable. That would allow the producer to wake up and fill the queue again. If you really wanted to process N elements at a time, then have the workers signal only when there's capacity in the queue for N elements, rather then every time they pull an item out of the queue.