Segfault occurring when attempting to synchronize a queue - C++

I am learning about multithreading and wanted to simulate the producer-consumer problem (using a semaphore, if I can call it that).
I have a class that holds a queue; the producer pushes ints into the queue and the consumer retrieves them and prints them. I simulated it as follows:
class TestClass{
public:
void producer( int i ){
unique_lock<mutex> l(m);
q.push(i);
if( q.size() )
cnd.notify_all();
}
void consumer(){
unique_lock<mutex> l(m);
while( q.empty() ){
cnd.wait(l);
}
int tmp = q.front();
q.pop();
cout << "Consumer got " << tmp << endl;
}
void ConsumerInit( int threads ){
for( int i = 0; i < threads; i++ ){
thrs[i] = thread(&TestClass::consumer, this );
}
for( auto &a : thrs )
a.join();
}
private:
queue<int> q;
vector<thread> thrs;
mutex m;
condition_variable cnd;
};
And I used a little console application to call data:
int main(){
int x;
TestClass t;
int counter = 0;
while( cin >> x ){
if( x == 0 )
break;
if( x == 1)
t.producer(counter++);
if( x == 2 )
t.ConsumerInit(5);
}
}
So when the user enters 1, a value is pushed into the queue; when the user enters 2, the threads are spawned.
Whatever order I invoke them in, for example 1 1 and then 2, or 2 1 1,
it segfaults. I am not sure why. My understanding of the code is as follows; let's assume the order 2 1 1.
I start 5 threads; they see that the queue is empty, so they go to sleep. When I push a number to the queue, it notifies all sleeping threads.
The first one to wake up locks the mutex again, retrieves the number from the queue, and then releases the mutex. When the mutex is released, another thread does the same and unlocks the mutex. The third thread, once the mutex is unlocked, is still in the loop, sees that the queue is empty again, and goes back to sleep, as do all the remaining threads.
Is this logic correct? If so, why does it keep segfaulting? If not, I would appreciate an explanation.
Thanks for the help!
//edit
As the answers suggested, I replaced [] with vector.push_back, but now the consumer does nothing with the data: it neither takes it nor prints it.

You aren't expanding the thrs vector when you do
thrs[i] = thread(&CTest::consumer, this );
You should do
thrs.emplace_back(&CTest::consumer, this);
That's where the crash would be.

Your issue has nothing to do with multithreading. You are accessing a std::vector out-of-bounds:
for (int i = 0; i < threads; i++) {
thrs[i] = thread(&CTest::consumer, this);
//...
vector<thread> thrs;
The thrs vector is empty, and you're trying to access it as if it had entries.
To show the error, use:
thrs.at(i) = thread(&CTest::consumer, this);
and you will be greeted with a std::out_of_range exception instead of a segmentation fault.
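To see the difference between operator[] and at() in isolation, here is a minimal sketch. The function name at_throws_on_empty is made up for illustration; it only relies on the standard guarantee that std::vector::at throws std::out_of_range for an invalid index.

```cpp
#include <stdexcept>
#include <vector>

// Returns true if writing to index 0 of an empty vector via at() throws.
// (Hypothetical helper, not part of the question's code.)
bool at_throws_on_empty() {
    std::vector<int> v;  // empty, like thrs before anything is pushed
    try {
        v.at(0) = 1;     // bounds-checked access; operator[] would be undefined behavior here
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With operator[] the same access is undefined behavior, which is why the original program may crash, or may appear to work.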

Your program deadlocks if the input sequence is not of the form 1 1 1 1 1 ... 2, that is, if the number of 1s preceding the 2 is less than five.
Here is the reason:
If the queue holds fewer than 5 elements when the main thread calls ConsumerInit, some of the five consumer threads it creates will block waiting for the queue to receive elements. Meanwhile, the main thread blocks on the join operation. Since the main thread is waiting for the consumer threads to finish while some of those threads are waiting for data to consume, there will be no progress. Hence the deadlock.

Problem is here:
for( auto &a : thrs )
a.join();
The main thread gets blocked here after you enter 2, waiting for the consumers to finish. So after this point you think you are entering input, but nothing is reading from cin any more.
Remove these two lines; then you can enter 1 and the producer/consumer will do their job.
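Putting the answers above together, here is a minimal sketch of one way to restructure the class: start the consumers without joining them, and join only after an explicit shutdown. The names WorkQueue, start_consumers, shutdown, consumed and the done_ flag are assumptions added for illustration, and the consumers record values instead of printing them so the result can be checked.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkQueue {
public:
    // Start consumer threads; does not block the caller.
    void start_consumers(int count) {
        for (int i = 0; i < count; ++i)
            workers_.emplace_back(&WorkQueue::consume, this);  // grows the vector
    }

    void produce(int value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(value);
        }
        cv_.notify_one();  // one new item, so waking one consumer is enough
    }

    // Tell the consumers to drain the queue and exit, then join them.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
        workers_.clear();
    }

    // Values seen by the consumers; call only after shutdown().
    std::vector<int> consumed() const { return consumed_; }

private:
    void consume() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            if (q_.empty()) return;           // done_ is set and nothing is left
            consumed_.push_back(q_.front());  // recorded while holding the lock
            q_.pop();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
    std::vector<int> consumed_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};
```

In the console loop from the question, input 2 would call start_consumers(5), input 1 would call produce(counter++), and shutdown() would run once after the loop ends, so the main thread keeps reading from cin in between.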


double buffer for consumer and producer problem

I am trying to implement a double buffer for a typical producer-consumer problem.
1. get_items() produces 10 items at a time.
2. The producer pushes 10 items onto a write queue. Assume that there is currently only one producer.
3. Each consumer consumes one item from the queue. There are many consumers.
My code is shared below. The idea is simple: consume from readq until it is empty, then swap the queue pointers, so that readq points to what was writeq and writeq points to the emptied queue, which starts being filled again. This way the producer and the consumers can work independently without halting each other. It trades space for time.
However, my code does not work with multiple consumers. I start 10 consumer threads, and it always gets stuck at the .join().
So my code is definitely buggy, but after examining it carefully I could not find the bug. The code seems to get stuck after lk1.unlock(), so it is not stuck in a while loop or anything obvious.
mutex m1;
mutex m2; // using 2 mutex, so when producer is locked, consumer can still run
condition_variable put;
condition_variable fetch;
queue<int> q1;
queue<int> q2;
queue<int>* readq = &q1;
queue<int>* writeq = &q2;
bool flag{ true };
vector<int> get_items() {
vector<int> res;
for (int i = 0; i < 10; i++) {
res.push_back(i);
}
return res;
}
void producer_mul() {
unique_lock<mutex> lk2(m2);
put.wait(lk2, [&]() {return flag == false; }); //producer waits for consumer signal
vector<int> items = get_items();
for (auto it : items) {
writeq->push(it);
}
flag = true; //signal queue is filled
fetch.notify_one();
lk2.unlock();
}
int get_one_item_mul() {
unique_lock<mutex> lk1(m1);
int res;
if (!(*readq).empty()) {
res = (*readq).front(); (*readq).pop();
if ((*writeq).empty() && flag == true) { //if writeq is empty
flag = false;
put.notify_one();
}
}
else {
readq = writeq; // swap queue pointer
while ((*readq).empty()) { // not yet write
if (flag) {
flag = false;
put.notify_one();//start filling process
}
//if (readq->empty()) { //updated due to race. readq now points to writeq, so if producer finished, readq is not empty and flag = true.
fetch.wait(lk1, [&]() {return flag == true; });
//}
}
if (flag) {
writeq = writeq == &q1 ? &q2 : &q1; //swap the writeq to the alternative queue and fill it again
flag = false;
//put.notify_one(); //fill that queue again if needed. but in my case, 10 item is produced and consumed, so no need to use the 2nd round, plus the code does not working in this simple case..so commented out for now.
}
res = readq->front(); readq->pop();
}
lk1.unlock();
this_thread::sleep_for(10ms);
return res;
}
int main()
{
std::vector<std::thread> threads;
std::packaged_task<void(void)> job1(producer_mul);
vector<std::future<int>> res;
for (int i = 0; i < 10; i++) {
std::packaged_task<int(void)> job2(get_one_item_mul);
res.push_back(job2.get_future());
threads.push_back(std::thread(std::move(job2)));
}
threads.push_back(std::thread(std::move(job1)));
for (auto& t : threads) {
t.join();
}
for (auto& a : res) {
cout << a.get() << endl;
}
return 0;
}
I added some comments, but the idea and the code are pretty simple and self-explanatory.
I am trying to figure out where the problem is. Does the code work for multiple consumers? Furthermore, would it work with multiple producers? I do not see a problem, since the locking is not fine-grained: producer and consumer both hold their lock from beginning to end.
Looking forward to the discussion; any help is appreciated.
Update
I fixed the race condition pointed out in one of the answers.
The program is still not working.
Your program contains data races, and therefore exhibits undefined behavior. I see at least two:
producer_mul accesses and modifies flag while holding m2 mutex but not m1. get_one_item_mul accesses and modifies flag while holding m1 mutex but not m2. So flag is not in fact protected against concurrent access.
Similarly, producer_mul accesses writeq pointer while holding m2 mutex but not m1. get_one_item_mul modifies writeq while holding m1 mutex but not m2.
There's also a data race on the queues themselves. Initially, both queues are empty. producer_mul is blocked waiting on flag. Then the following sequence occurs ( P for producer thread, C for consumer thread):
C: readq = writeq; // Both now point to the same queue
C: flag = false; put.notify_one(); // This wakes up producer
**P: writeq->push(it);
**C: if (readq->empty())
The last two lines happen concurrently, with no protection against concurrent access. One thread modifies an std::queue instance while the other accesses that same instance. This is a data race.
There's a data race at the heart of the design. Let's imagine there's just one producer P and two consumers C1 and C2. Initially, P waits on put until flag == false. C1 grabs m1; C2 is blocked on m1.
C1 sets readq = writeq, then unblocks P, then calls fetch.wait(lk1, [&]() {return flag == true; });. This unlocks m1, allowing C2 to proceed. So now P is busy writing to writeq while C2 is busy reading from readq, which is one and the same queue.
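One way to avoid the races described above is to guard both queues, the swap, and the shutdown state with a single mutex. The sketch below does that; it is an illustration under that assumption, not a fix of the original two-mutex design, and it gives up the independence between producer and consumers that the question was aiming for. The names DoubleBuffer, put_batch, get, close and run_double_buffer_demo are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class DoubleBuffer {
public:
    // Producer side: append a whole batch to the back buffer.
    void put_batch(const std::vector<int>& items) {
        {
            std::lock_guard<std::mutex> lk(m_);
            for (int v : items) back_.push(v);
        }
        cv_.notify_all();
    }

    // No more batches will arrive; wake every waiting consumer.
    void close() {
        {
            std::lock_guard<std::mutex> lk(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Consumer side: returns false once closed and fully drained.
    bool get(int& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !front_.empty() || !back_.empty() || closed_; });
        if (front_.empty()) {
            if (back_.empty()) return false;  // closed and nothing left
            std::swap(front_, back_);         // swap happens under the same lock
        }
        out = front_.front();
        front_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> front_, back_;
    bool closed_ = false;
};

// Runs `consumers` threads against `batches` batches of 10 items each
// and returns how many items were received in total.
std::size_t run_double_buffer_demo(int consumers, int batches) {
    DoubleBuffer buf;
    std::mutex count_mutex;
    std::size_t received = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < consumers; ++i) {
        pool.emplace_back([&] {
            int v;
            while (buf.get(v)) {
                std::lock_guard<std::mutex> g(count_mutex);
                ++received;
            }
        });
    }
    for (int b = 0; b < batches; ++b) buf.put_batch({0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
    buf.close();
    for (auto& t : pool) t.join();
    return received;
}
```

Because every access to front_, back_ and closed_ happens while holding m_, no thread can read a queue while another one is writing to it.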

Deadlock with blocking queue and barrier in C++

I have a very simple, small C++ program that creates a thread pool and then puts messages into a blocking queue shared between the threads to tell each thread what to do.
A message can be: -1 (end of stream -> terminate), -2 (barrier -> wait for all threads to reach it, then continue), or any other value, which triggers some random computation. The loop runs in this order: some computation, barrier, some computation, barrier, ..., barrier, end of stream, thread join, exit.
I cannot understand why I get a deadlock even with only 2 threads in the pool. The queue never becomes empty, yet the order in which I push and pop messages should always lead to an empty queue!
The blocking queue implementation is the one proposed here (C++ Equivalent to Java's BlockingQueue) with just two methods added. I have also copied the queue code below.
Any help?
Main.cpp
#include <iostream>
#include <vector>
#include <thread>
#include "Queue.hpp"
using namespace std;
// function executed by each thread
void f(int i, Queue<int> &q){
while(1){
// take a message from blocking queue
int j= q.pop();
// if it is end of stream then exit
if (j==-1) break;
// if it is barrier, wait for other threads to reach it
if (j==-2){
// active wait! BAD, but anyway...
while(q.size() > 0){
;
}
}
else{
// random stuff
int x = 0;
for(int i=0;i<j;i++)
x += 4;
}
}
}
int main(){
Queue<int> queue; //blocking queue
vector<thread> tids; // thread pool
int nt = 2; // number of threads
int dim = 8; // number to control number of operations
// create thread pool, passing thread id and queue
for(int i=0;i<nt;i++)
tids.push_back(thread(f,i, std::ref(queue)));
for(int dist=1; dist<=dim; dist++){ // without this outer loop the program works fine
// push random number
for(int j=0;j<dist;j++){
queue.push(4);
}
// push barrier code
for(int i=0;i<nt;i++){
queue.push(-2);
}
// active wait! BAD, but anyway...
while (queue.size()>0){
;
}
}
// push end of stream
for(int i=0;i<nt;i++)
queue.push(-1);
// join thread pool
for(int i=0;i<nt;i++){
tids[i].join();
}
return 0;
}
Queue.hpp
#include <deque>
#include <mutex>
#include <condition_variable>
template <typename T>
class Queue
{
private:
std::mutex d_mutex;
std::condition_variable d_condition;
std::deque<T> d_queue;
public:
void push(T const& value) {
{
std::unique_lock<std::mutex> lock(this->d_mutex);
d_queue.push_front(value);
}
this->d_condition.notify_one();
}
T pop() {
std::unique_lock<std::mutex> lock(this->d_mutex);
this->d_condition.wait(lock, [=]{ return !this->d_queue.empty(); });
T rc(std::move(this->d_queue.back()));
this->d_queue.pop_back();
return rc;
}
bool empty(){
std::unique_lock<std::mutex> lock(this->d_mutex);
return this->d_queue.empty();
}
int size(){
std::unique_lock<std::mutex> lock(this->d_mutex);
return this->d_queue.size();
}
};
I think the problem is the active wait that you describe as "BAD, but anyway...", together with using the size of the queue as a barrier instead of a true synchronization barrier.
For dim = 1 you push a queue that has 4, -2, -2. One thread will grab the 4 and a -2 while the other grabs the remaining -2. At this point the queue is empty and you have three threads (the two workers and the main thread) doing an active wait, racing to see whether the queue has been emptied. There is a mutex on size() that lets only one thread read the size at a time. If the main thread is scheduled first and determines that the queue is empty, it will push -1, -1 to signal end of stream. Now the queue is no longer empty, but one or both of the worker threads are still waiting for it to be empty. Since they wait for it to be empty before taking another item, the program is deadlocked in this state.
For the case where dim > 1 there is likely a similar issue: the main thread pushes the next set of values into the queue before both workers have noticed the empty queue and left the active wait.
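For comparison, here is a minimal sketch of what a "true synchronization barrier" could look like using only a mutex and a condition variable (C++20 also offers std::barrier). The Barrier class, arrive_and_wait and run_rounds are hypothetical names for this illustration; the generation counter is what makes the barrier reusable across rounds.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class Barrier {
public:
    explicit Barrier(std::size_t parties) : parties_(parties) {}

    // Blocks until `parties` threads have called this function.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t gen = generation_;
        if (++waiting_ == parties_) {
            waiting_ = 0;   // last thread to arrive resets the barrier
            ++generation_;  // and releases everyone from this round
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != gen; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t parties_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Returns how often a thread left the barrier before every thread had
// finished the current round of work (expected: 0).
int run_rounds(int nthreads, int rounds) {
    Barrier barrier(static_cast<std::size_t>(nthreads));
    std::atomic<int> finished{0};
    std::atomic<int> violations{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&] {
            for (int r = 1; r <= rounds; ++r) {
                ++finished;                 // stands in for "some computation"
                barrier.arrive_and_wait();  // wait for the other threads
                if (finished.load() < r * nthreads) ++violations;
            }
        });
    }
    for (auto& th : pool) th.join();
    return violations.load();
}
```

With a barrier like this, the workers no longer depend on the queue size, so the main thread can push the next messages without affecting threads that are still waiting.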
I ran your code and I understand the problem. It lies in the "-2" option. By the time the two threads reach this point, your main thread has already pushed more values to the queue. So if the queue grows between the moment your threads pop the "-2" value and the moment they reach the "-2" branch, your code gets stuck:
Thread 1: get -2.
Thread 2: get -2.
Thread main: push -1.
Thread main: push -1.
Thread 1: wait until the whole queue is empty.
Thread 2: wait until the whole queue is empty.
queue:
-1
-1
^ this is the case where dim equals 1. In your code dim equals 8; you don't want to see what that looks like.
To solve this, all I did was to disable the following loop:
for(int i=0;i<nt;i++){
queue.push(-2);
}
With this part disabled, the code runs perfectly.
This is how I checked it:
std::mutex guarder;
// function executed by each thread
void f(int i, Queue<int> &q){
while(1){
// take a message from blocking queue
guarder.lock();
int j= q.pop();
guarder.unlock();
// if it is end of stream then exit
if (j==-1) break;
// if it is barrier, wait for other threads to reach it
if (j==-2){
// active wait! BAD, but anyway...
while(q.size() > 0){
;
}
}
else{
// random stuff
int x = 0;
for(int i=0;i<j;i++)
x += 4;
guarder.lock();
cout << x << std::endl;
guarder.unlock();
}
}
}
int main(){
Queue<int> queue; //blocking queue
vector<thread> tids; // thread pool
int nt = 2; // number of threads
int dim = 8; // number to control number of operations
// create thread pool, passing thread id and queue
for(int i=0;i<nt;i++)
tids.push_back(thread(f,i, std::ref(queue)));
for(int dist=1; dist<=dim; dist++){ // without this outer loop the program works fine
// push random number
for(int j=0;j<dist;j++){
queue.push(dist);
}
/*// push barrier code
for(int i=0;i<nt;i++){
queue.push(-2);
}*/
// active wait! BAD, but anyway...
while (queue.size()>0){
;
}
}
// push end of stream
for(int i=0;i<nt;i++)
queue.push(-1);
// join thread pool
for(int i=0;i<nt;i++){
tids[i].join();
}
return 0;
}
The result:
4
8
8
12
12
12
16
16
16
20
20
16
20
20
20
24
24
24
24
24
24
28
28
28
28
28
28
28
32
32
32
32
32
32
32
32
BTW, the hang was not caused by your "active wait" part. Active waiting is not good, but it usually causes other problems (like slowing down your system).

How to run a function on a separate thread, if a thread is available

How can I run a function on a separate thread if a thread is available, assuming that I always want k threads running at the same time?
Here's a pseudo-code
For i = 1 to N
IF numberOfRunningThreads < k
// run foo() on another thread
ELSE
// run foo()
In summary, once a thread finishes, it notifies the others that a thread is available for any of them to use. I hope the description is clear.
My personal approach: just create the k threads and let them call foo repeatedly. You need a counter, protected against race conditions, that is decremented each time before foo is called by any thread. As soon as the desired number of calls has been performed, the threads exit one after the other (incomplete/pseudo code):
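std::mutex global_counter_mutex; // protects global_counter
unsigned int global_counter = 0; // remaining calls to foo(); set by runThreads()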
void fooRunner()
{
for(;;)
{
{
std::lock_guard g(global_counter_mutex);
if(global_counter == 0)
break;
--global_counter;
}
foo();
}
}
void runThreads(unsigned int n, unsigned int k)
{
global_counter = n;
std::vector<std::thread> threads(std::min(n, k - 1));
// k - 1: current thread can be reused, too...
// (provided it has no other tasks to perform)
for(auto& t : threads)
{
t = std::thread(&fooRunner);
}
fooRunner();
for(auto& t : threads)
{
t.join();
}
}
If you have data to pass to foo, you could use a FIFO or LIFO queue instead of a counter, whichever is most appropriate for the given use case. The threads then exit as soon as the buffer is empty; you have to prevent the buffer from running empty prematurely, though, e.g. by prefilling it with all the data to be processed before starting the threads.
A variant might combine both: exit if the global counter reaches 0, otherwise wait for the queue to receive new data, e.g. via a condition variable, while the main thread keeps filling the queue as the threads are already running...
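A minimal sketch of the prefilled-queue idea described above, assuming foo takes an int and returns an int (here it simply squares its argument, which is an invented stand-in). The function name process_all is also hypothetical. The calling thread acts as one of the k workers, as in the counter-based code.

```cpp
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real foo(): squares its input.
int foo(int x) { return x * x; }

// Calls foo on every input using at most k threads in total
// (k - 1 new threads plus the calling thread); returns the sum of results.
long process_all(const std::vector<int>& inputs, unsigned k) {
    std::queue<int> work;
    for (int v : inputs) work.push(v);  // prefill before any thread starts

    std::mutex m;  // protects both `work` and `total`
    long total = 0;

    auto runner = [&] {
        for (;;) {
            int item;
            {
                std::lock_guard<std::mutex> g(m);
                if (work.empty()) return;  // nothing left: this worker exits
                item = work.front();
                work.pop();
            }
            const int result = foo(item);  // runs outside the lock
            std::lock_guard<std::mutex> g(m);
            total += result;
        }
    };

    const unsigned extra = k > 1 ? k - 1 : 0;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < extra; ++i) threads.emplace_back(runner);
    runner();  // the calling thread is reused as a worker
    for (auto& t : threads) t.join();
    return total;
}
```

Because the queue is filled completely before the first worker starts, an empty queue reliably means that all work has been handed out.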
You can use std::thread (in <thread>) and locks to do what you want, but it seems to me that your code could simply be made parallel using OpenMP, like this:
#include <omp.h> // for omp_get_thread_num()
#pragma omp parallel num_threads(k)
#pragma omp for
for (unsigned i = 0; i < N; ++i)
{
auto t_id = omp_get_thread_num();
if (t_id < K)
foo();
else
other_foo();
}

How do I make threads run sequentially instead of concurrently?

For example I want each thread to not start running until the previous one has completed, is there a flag, something like thread.isRunning()?
#include <iostream>
#include <vector>
#include <thread>
using namespace std;
void hello() {
cout << "thread id: " << this_thread::get_id() << endl;
}
int main() {
vector<thread> threads;
for (int i = 0; i < 5; ++i)
threads.push_back(thread(hello));
for (thread& thr : threads)
thr.join();
cin.get();
return 0;
}
I know that the threads are meant to run concurrently, but what if I want to control the order?
There is no thread.isRunning(). You need some synchronization primitive to do it.
Consider std::condition_variable for example.
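If the only requirement is that thread i + 1 must not start before thread i has finished, the simplest approach is to join each thread before creating the next one. The sketch below illustrates that; run_in_order is a made-up name, and each thread just records its index so the order can be checked.

```cpp
#include <thread>
#include <vector>

// Runs `count` threads strictly one after another and returns the
// order in which they executed.
std::vector<int> run_in_order(int count) {
    std::vector<int> order;
    for (int i = 0; i < count; ++i) {
        std::thread t([&order, i] { order.push_back(i); });
        t.join();  // block until thread i is done before starting thread i + 1
    }
    return order;
}
```

No mutex is needed here, since join() guarantees that the previous thread has completed before the next one touches the vector. Of course, this gives up all concurrency.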
One approachable way is to use std::async. Under the current definition of std::async, the shared state of an operation launched by std::async can cause the returned std::future's destructor to block until the operation is complete. This can limit composability and result in code that appears to run in parallel but in reality runs sequentially.
{
std::async(std::launch::async, []{ hello(); });
std::async(std::launch::async, []{ hello(); }); // does not run until hello() completes
}
If we need the second thread start to run after the first one is completed, is a thread really needed?
As a solution, a global flag should work: set it in the first thread, and have the second thread check it before it starts its work.
You can't simply control the order by saying "First thread 1, then thread 2, ..."; you will need synchronization (i.e. std::mutex and condition variables such as std::condition_variable_any).
You can create events so as to block one thread until a certain event has happened.
See cppreference for an overview of the threading mechanisms in C++11.
You will need to use a semaphore or a lock.
If you initialize the semaphore to 0:
Call wait after thread.start(), and call signal/release at the end of the thread's execution function (e.g. the run function in Java, OnExit, etc.).
So the main thread will keep waiting until the thread started in the loop has completed its execution.
Task-based parallelism can achieve this, but C++ does not currently offer a task model as part of its threading libraries. If you have TBB or PPL, you can use their task-based facilities.
I think you can achieve this by using std::mutex and std::condition_variable from C++11. To run the threads sequentially, an array of booleans is used: when a thread has finished its work, it writes true at its own index in the array.
For example:
mutex mtx;
condition_variable cv;
bool ids[10] = { false };
void shared_method(int id) {
unique_lock<mutex> lock(mtx);
if (id != 0) {
while (!ids[id - 1]) {
cv.wait(lock);
}
}
int delay = rand() % 4;
cout << "Thread " << id << " will finish in " << delay << " seconds." << endl;
this_thread::sleep_for(chrono::seconds(delay));
ids[id] = true;
cv.notify_all();
}
void test_condition_variable() {
thread threads[10];
for (int i = 0; i < 10; ++i) {
threads[i] = thread(shared_method, i);
}
for (thread &t : threads) {
t.join();
}
}
Output:
Thread 0 will finish in 3 seconds.
Thread 1 will finish in 1 seconds.
Thread 2 will finish in 1 seconds.
Thread 3 will finish in 2 seconds.
Thread 4 will finish in 2 seconds.
Thread 5 will finish in 0 seconds.
Thread 6 will finish in 0 seconds.
Thread 7 will finish in 2 seconds.
Thread 8 will finish in 3 seconds.
Thread 9 will finish in 1 seconds.

C++ boost thread: having a worker thread pause and unpause based on mutexes/conditions using a concurrent queue

I am fairly new to multi-threaded programming, so please forgive my possibly imprecise question. Here is my problem:
I have a function processing data and generating lots of objects of the same type. This is done iterating in several nested loops, so it would be practical to just do all iterations, save these objects in some container and then work on that container in interfacing code doing the next steps. However, I have to create millions of these objects which would blow up the memory usage. These constraints are mainly due to external factors I cannot control.
Generating only a certain amount of data would be ideal, but breaking out of the loops and restarting later at the same point is also impractical. My idea was to do the processing in a separate thread which would be paused after n iterations and resumed once all n objects are completely processed, then resuming, doing n next iterations and so on until all iterations are done. It is important to wait until the thread has done all n iterations, so both threads would not really run in parallel.
This is where my problems begin: How do I do the mutex locking properly here? My approaches produce boost::lock_errors. Here is some code to show what I want to do:
boost::recursive_mutex bla;
boost::condition_variable_any v1;
boost::condition_variable_any v2;
boost::recursive_mutex::scoped_lock lock(bla);
int got_processed = 0;
const int n = 10;
void ProcessNIterations() {
got_processed = 0;
// have some mutex or whatever unlocked here so that the worker thread can
// start or resume.
// my idea: have some sort of mutex lock that unlocks here and a condition
// variable v1 that is notified while the thread is waiting for that.
lock.unlock();
v1.notify_one();
// while the thread is working to do the iterations this function should wait
// because there is no use to proceed until the n iterations are done
// my idea: have another condition v2 variable that we wait for here and lock
// afterwards so the thread is blocked/paused
while (got_processed < n) {
v2.wait(lock);
}
}
void WorkerThread() {
int counter = 0;
// wait for something to start
// my idea: acquire a mutex lock here that was locked elsewhere before and
// wait for ProcessNIterations() to unlock it so this can start
boost::recursive_mutex::scoped_lock internal_lock(bla);
for (;;) {
for (;;) {
// here do the iterations
counter++;
std::cout << "iteration #" << counter << std::endl;
got_processed++;
if (counter >= n) {
// we've done n iterations; pause here
// my idea: unlock the mutex, notify v2
internal_lock.unlock();
v2.notify_one();
while (got_processed > 0) {
// when ProcessNIterations() is called again, resume here
// my idea: wait for v1 reacquiring the mutex again
v1.wait(internal_lock);
}
counter = 0;
}
}
}
}
int main(int argc, char *argv[]) {
boost::thread mythread(WorkerThread);
ProcessNIterations();
ProcessNIterations();
while (true) {}
}
The above code fails after doing 10 iterations in the line v2.wait(lock); with the following message:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
what(): boost::lock_error
How do I do this properly? If this is the way to go, how do I avoid lock_errors?
EDIT: I solved it using a concurrent queue like the one discussed here. This queue also has a maximum size; once it is reached, a push simply waits until at least one element has been popped. Therefore, the producer worker can simply go on filling this queue, and the rest of the code can pop entries as suitable. No mutex locking needs to be done outside the queue. The queue is here:
template<typename Data>
class concurrent_queue
{
private:
std::queue<Data> the_queue;
mutable boost::mutex the_mutex;
boost::condition_variable the_condition_variable;
boost::condition_variable the_condition_variable_popped;
int max_size_;
public:
concurrent_queue(int max_size=-1) : max_size_(max_size) {}
void push(const Data& data) {
boost::mutex::scoped_lock lock(the_mutex);
while (max_size_ > 0 && the_queue.size() >= max_size_) {
the_condition_variable_popped.wait(lock);
}
the_queue.push(data);
lock.unlock();
the_condition_variable.notify_one();
}
bool empty() const {
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.empty();
}
bool wait_and_pop(Data& popped_value) {
boost::mutex::scoped_lock lock(the_mutex);
bool locked = true;
if (the_queue.empty()) {
locked = the_condition_variable.timed_wait(lock, boost::posix_time::seconds(1));
}
if (locked && !the_queue.empty()) {
popped_value=the_queue.front();
the_queue.pop();
the_condition_variable_popped.notify_one();
return true;
} else {
return false;
}
}
int size() {
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.size();
}
};
This could be implemented using condition variables. Once you've performed N iterations, you call wait() on the condition variable, and when the objects have been processed in another thread, that thread signals the condition variable to unblock the thread that is waiting on it.
You probably want some sort of finite-capacity queue, list, or stack in conjunction with a condition variable. When the queue is full, the producer thread waits on the condition variable, and any time a consumer thread removes an element from the queue, it signals the condition variable. That would allow the producer to wake up and fill the queue again. If you really wanted to process N elements at a time, have the workers signal only when there is capacity in the queue for N elements, rather than every time they pull an item out of the queue.
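The finite-capacity queue described above could be sketched with the standard library as follows. This is an illustration using std::mutex and std::condition_variable rather than the Boost types from the question; BoundedQueue and transfer are hypothetical names, and pop blocks indefinitely instead of using a timeout.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full.
    void push(T value) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(value));
        lock.unlock();
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop();
        lock.unlock();
        not_full_.notify_one();  // a slot was freed; wake a waiting producer
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> q_;
    const std::size_t capacity_;
};

// Pushes 1..count from a producer thread through a queue of the given
// capacity and returns the sum of everything the caller popped.
long transfer(int count, std::size_t capacity) {
    BoundedQueue<int> q(capacity);
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) q.push(i);
    });
    long sum = 0;
    for (int i = 0; i < count; ++i) sum += q.pop();
    producer.join();
    return sum;
}
```

The capacity bounds memory use: the producer can never be more than `capacity` items ahead of the consumer, which matches the requirement of not holding millions of objects at once.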