Blocking queue race condition? - c++

I'm trying to implement a high-performance blocking queue backed by a circular buffer, built on top of pthreads, semaphore.h and the gcc atomic builtins. The queue needs to handle multiple simultaneous readers and writers from different threads.
I've isolated some sort of race condition, and I'm not sure if it's a faulty assumption about the behavior of some of the atomic operations and semaphores, or whether my design is fundamentally flawed.
I've extracted and simplified it into the standalone example below. I would expect this program to never return. It does, however, return after a few hundred thousand iterations, with corruption detected in the queue.
In the example below (for exposition) the queue doesn't actually store anything; it sets a cell to 1 to represent a cell holding data, and to 0 to represent an empty cell. There is a counting semaphore (vacancies) representing the number of vacant cells, and another counting semaphore (occupants) representing the number of occupied cells.
Writers do the following:
decrement vacancies
atomically get next head index (mod queue size)
write to it
increment occupants
Readers do the opposite:
decrement occupants
atomically get next tail index (mod queue size)
read from it
increment vacancies
I would expect that given the above, precisely one thread can be reading or writing any given cell at one time.
Any ideas about why it doesn't work, or suggestions for debugging strategies, are appreciated. Code and output below...
#include <stdlib.h>
#include <pthread.h> // for pthread_create; not always pulled in transitively
#include <semaphore.h>
#include <iostream>
using namespace std;
#define QUEUE_CAPACITY 8 // must be power of 2
#define NUM_THREADS 2
struct CountingSemaphore
{
sem_t m;
CountingSemaphore(unsigned int initial) { sem_init(&m, 0, initial); }
void post() { sem_post(&m); }
void wait() { sem_wait(&m); }
~CountingSemaphore() { sem_destroy(&m); }
};
struct BlockingQueue
{
unsigned int head; // (head % capacity) is next head position
unsigned int tail; // (tail % capacity) is next tail position
CountingSemaphore vacancies; // how many cells are vacant
CountingSemaphore occupants; // how many cells are occupied
int cell[QUEUE_CAPACITY];
// (cell[x] == 1) means occupied
// (cell[x] == 0) means vacant
BlockingQueue() :
head(0),
tail(0),
vacancies(QUEUE_CAPACITY),
occupants(0)
{
for (size_t i = 0; i < QUEUE_CAPACITY; i++)
cell[i] = 0;
}
// put an item in the queue
void put()
{
vacancies.wait();
// atomic post increment
set(__sync_fetch_and_add(&head, 1) % QUEUE_CAPACITY);
occupants.post();
}
// take an item from the queue
void take()
{
occupants.wait();
// atomic post increment
get(__sync_fetch_and_add(&tail, 1) % QUEUE_CAPACITY);
vacancies.post();
}
// set cell i
void set(unsigned int i)
{
// atomic compare and assign
if (!__sync_bool_compare_and_swap(&cell[i], 0, 1))
{
corrupt("set", i);
exit(-1);
}
}
// get cell i
void get(unsigned int i)
{
// atomic compare and assign
if (!__sync_bool_compare_and_swap(&cell[i], 1, 0))
{
corrupt("get", i);
exit(-1);
}
}
// corruption detected
void corrupt(const char* action, unsigned int i)
{
static CountingSemaphore sem(1);
sem.wait();
cerr << "corruption detected" << endl;
cerr << "action = " << action << endl;
cerr << "i = " << i << endl;
cerr << "head = " << head << endl;
cerr << "tail = " << tail << endl;
for (unsigned int j = 0; j < QUEUE_CAPACITY; j++)
cerr << "cell[" << j << "] = " << cell[j] << endl;
}
};
BlockingQueue q;
// keep posting to the queue forever
void* Source(void*)
{
while (true)
q.put();
return 0;
}
// keep taking from the queue forever
void* Sink(void*)
{
while (true)
q.take();
return 0;
}
int main()
{
pthread_t id;
// start some pthreads to run Source function
for (int i = 0; i < NUM_THREADS; i++)
if (pthread_create(&id, NULL, &Source, 0))
abort();
// start some pthreads to run Sink function
for (int i = 0; i < NUM_THREADS; i++)
if (pthread_create(&id, NULL, &Sink, 0))
abort();
while (true);
}
Compile the above as follows:
$ g++ -pthread AboveCode.cpp
$ ./a.out
The output is different every time, but here is one example:
corruption detected
action = get
i = 6
head = 122685
tail = 122685
cell[0] = 0
cell[1] = 0
cell[2] = 1
cell[3] = 0
cell[4] = 1
cell[5] = 0
cell[6] = 1
cell[7] = 1
My system is Ubuntu 11.10 on Intel Core 2:
$ uname -a
Linux 3.0.0-14-generic #23-Ubuntu SMP \
Mon Nov 21 20:28:43 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep Intel
model name : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
$ g++ --version
g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Thanks,
Andrew.

One possible situation, traced step by step for two writer threads (W0, W1) and one reader thread (R0). W0 entered put() earlier than W1, was interrupted by the OS or hardware, and finished later.
     w0 (core 0)            w1 (core 1)          r0
t0   ----                   ----                 blocked on occupants.wait() / take
t1   entered put()          ----                 ----
t2   vacancies.wait()       entered put()        ----
t3   got new_head = 1       vacancies.wait()     ----
t4   <interrupted by OS>    got new_head = 2     ----
t5   <still interrupted>    wrote 1 at cell[2]   ----
t6   <still interrupted>    occupants.post()     ----
t7   <still interrupted>    exited put()         woke up
t8   <still interrupted>    ----                 got new_tail = 1
t9   <still interrupted>    ----                 read 0 from cell[1]  !! corruption !!
t10  wrote 1 at cell[1]     ----                 ----
t11  occupants.post()       ----                 ----
t12  exited put()           ----                 ----

From a design point of view, I would consider the whole queue as a shared resource and protect it with a single mutex (see the sketch after the steps below).
Writers do the following:
take the mutex
write to the queue (including handling of indexes)
free the mutex
Readers do the following:
take the mutex
read from the queue (including handling of indexes)
free the mutex
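A minimal sketch of that design (my illustration, not part of the original answer), using C++11 std::mutex and std::condition_variable in place of the pthread primitives; the same structure works with pthread_mutex_t and pthread_cond_t:

#include <mutex>
#include <condition_variable>

struct LockedQueue
{
    static const unsigned CAPACITY = 8;
    unsigned head, tail, count;
    int cell[CAPACITY];
    std::mutex m;
    std::condition_variable not_full, not_empty;

    LockedQueue() : head(0), tail(0), count(0)
    {
        for (unsigned i = 0; i < CAPACITY; i++)
            cell[i] = 0;
    }

    void put()
    {
        std::unique_lock<std::mutex> lk(m);
        not_full.wait(lk, [this]{ return count < CAPACITY; });
        cell[head] = 1;               // index handling and the write happen
        head = (head + 1) % CAPACITY; // under the same lock, so no thread can
        ++count;                      // be lapped between the two steps
        lk.unlock();
        not_empty.notify_one();
    }

    void take()
    {
        std::unique_lock<std::mutex> lk(m);
        not_empty.wait(lk, [this]{ return count > 0; });
        cell[tail] = 0;
        tail = (tail + 1) % CAPACITY;
        --count;
        lk.unlock();
        not_full.notify_one();
    }
};

This trades some throughput for correctness; whether the single lock becomes a bottleneck depends on how much work happens outside the queue.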

I have a theory. It's a circular queue so one reading thread may be getting lapped. Say a reader takes index 0. Before it does anything it loses the CPU. Another reader thread takes index 1, then 2, then 3 ... then 7, then 0. The first reader wakes up and both threads think they have exclusive access to index 0. Not sure how to prove it. Hope that helps.
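One cheap way to stress-test the lapping theory (my suggestion, not from the original thread): shrink the buffer and raise the thread count in the posted program, so that wrap-around, and therefore any lapping, happens far more often. If lapping is the cause, the corruption should trigger much sooner.

#define QUEUE_CAPACITY 2 // was 8; still a power of 2, wraps four times as often
#define NUM_THREADS 4    // was 2; more writers and readers contending per cell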

Related

C++: Use future.get with timeout and without blocking

I have a main loop that needs to trigger async work and must not wait for it to finish. What I want is to check, on every iteration of the while loop, whether the async work is done.
This can be accomplished with future.wait_for().
Since I don't want to block the main loop, I can use future.wait_for(0).
So far so good.
In addition, I'd like to verify that I received (or didn't receive) an answer within X ms.
I can do that by checking how long it has been since I launched the async call, and seeing what comes first: X ms passing or future_status::ready being returned.
My question - is this a good practice, or is there a better way to do it?
Some more information:
Since the main loop must launch many different async jobs, I end up with a lot of duplicated code: every launch needs to "remember" the timestamp at which it was launched, and every time I check whether an async job is ready, I need to recalculate the time difference for each job. This can be quite a hassle.
For now, here is an example of what I described:
#include <chrono>
#include <future>

#define MAX_TIMEOUT_MS 30

bool myFunc()
{
    bool result = false;
    // do something for quite some time
    return result;
}

int main()
{
    using namespace std::chrono;
    int timeout_ms = MAX_TIMEOUT_MS;
    steady_clock::time_point start;
    bool async_return = false;
    std::future<bool> fut; // declared outside the loop so it survives iterations
    std::future_status status = std::future_status::ready;
    int delta_ms = 0;
    while (true) {
        // On first time, or once we have an answer, launch async again
        if (status == std::future_status::ready) {
            fut = std::async(std::launch::async, myFunc);
            start = steady_clock::now(); // record the start timestamp whenever we launch async()
        }
        // do something...
        status = fut.wait_for(seconds(0));
        // check how long since we launched async
        delta_ms = (int)duration_cast<milliseconds>(steady_clock::now() - start).count();
        if (status != std::future_status::ready && delta_ms > timeout_ms) {
            break; // timed out without an answer
        } else if (status == std::future_status::ready) {
            async_return = fut.get(); // safe: the result is ready, so get() won't block
            // and we do something with the result
        }
    }
    return 0;
}
One thing you might want to consider: If your while loop doesn't do any relevant work, and just checks for task completion, you may be doing a busy-wait (https://en.wikipedia.org/wiki/Busy_waiting).
This means you are wasting a lot of CPU time doing useless work. It may sound counter-intuitive, but busy-waiting can actually delay your detection of task completion, even though you are constantly checking for it!
This can happen because the polling thread looks to the OS like it is doing a lot of work, and so it may receive high scheduling priority. That can make the threads doing your actual async jobs look less important and take longer to complete. Of course, this is not set in stone and anything can happen, but it is still a waste of CPU if you are not doing any other work in that loop.
wait_for(0) is not the best option, since it still briefly suspends the execution of this thread even if the work is not ready yet, and it may take longer than you expect for the thread to resume (https://en.cppreference.com/w/cpp/thread/future/wait_for). std::future doesn't seem to have a truly non-blocking API yet (C++ async programming, how to not wait for future?), but you can use other resources such as a mutex and try_lock (http://www.cplusplus.com/reference/mutex/try_lock/).
That said, if your loop still does important work, this flow is ok to use. But you might want to have a queue of completed jobs to check, instead of a single future. This queue would only be consumed by your main thread and can be implemented with a non-blocking thread-safe "try_get" call to get next completed jobs. As others commented, you may want to wrap your time-saving logic in a job dispatcher class or similar.
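For the per-job timestamp bookkeeping specifically, a small wrapper can keep each future together with its launch time, so the caller doesn't repeat the delta computation everywhere. A sketch under C++11; TimedFuture and its members are illustrative names, not a library API:

#include <chrono>
#include <future>

template <class T>
struct TimedFuture
{
    std::future<T> fut;
    std::chrono::steady_clock::time_point start;

    explicit TimedFuture(std::future<T> f)
        : fut(std::move(f)), start(std::chrono::steady_clock::now()) {}

    // Non-blocking: true once the result is available.
    bool ready()
    {
        return fut.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // True once more than `timeout` has elapsed since the launch.
    bool expired(std::chrono::milliseconds timeout) const
    {
        return std::chrono::steady_clock::now() - start > timeout;
    }
};

Usage would look like TimedFuture<bool> tf(std::async(std::launch::async, myFunc)); and then, per iteration: if (tf.ready()) consume the result, else if (tf.expired(std::chrono::milliseconds(MAX_TIMEOUT_MS))) give up.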
For the completed-jobs queue, maybe something like this (pseudo code!):
struct WorkInfo {
time_type begin_at; // initialized on job dispatch
time_type finished_at;
// more info
};
thread_safe_vector<WorkInfo> finished_work;
void timed_worker_job() {
WorkInfo info;
info.begin_at = current_time();
do_real_job_work();
info.finished_at = current_time();
finished_work.push(info);
}
void main() {
...
while (app_loop)
{
dispatch_some_jobs();
WorkInfo workTemp;
while (finished_work.try_get(&workTemp)) // returns true if it returned work
{
handle_finished_job(workTemp);
}
}
...
}
And if you are not familiar with them, I also suggest reading about Thread-Pools (https://en.wikipedia.org/wiki/Thread_pool) and Producer-Consumer (https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem).
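Finally, a minimal sketch of the non-blocking completed-jobs queue described above (the thread_safe_vector in the pseudocode); the names are illustrative. A plain std::mutex with short critical sections is enough here, though std::mutex::try_lock could make the consumer side strictly non-blocking:

#include <deque>
#include <mutex>
#include <utility>

template <class T>
class TryQueue
{
    std::deque<T> items;
    std::mutex m;
public:
    void push(T v)
    {
        std::lock_guard<std::mutex> lk(m);
        items.push_back(std::move(v));
    }

    // Returns false immediately when nothing is ready; never waits for work.
    bool try_get(T* out)
    {
        std::lock_guard<std::mutex> lk(m);
        if (items.empty())
            return false;
        *out = std::move(items.front());
        items.pop_front();
        return true;
    }
};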
The code below runs tasks async and checks later if they are finished.
I've added some fake work and waits to see the results.
#include <chrono>
#include <future>
#include <iostream>
#include <map>
#include <thread>
#include <vector>
using namespace std;
#define MAX_TIMEOUT_MS 30
struct fun_t {
size_t _count;
bool finished; // written by the worker thread, read by main; std::atomic<bool> would be safer
bool result;
fun_t () : _count (9999), finished (false), result (false) {
}
fun_t (size_t c) : _count (c), finished (false), result (false) {
}
fun_t (const fun_t & f) : _count (f._count), finished (f.finished), result (f.result) {
}
fun_t (fun_t && f) : _count (f._count), finished (f.finished), result (f.result) {
}
~fun_t () {
}
const fun_t & operator= (fun_t && f) {
_count = f._count;
finished = f.finished;
result = f.result;
return *this;
}
void run ()
{
for (int i = 0; i < 50; ++i) {
cout << _count << " " << i << endl;;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
result = true;
finished = true;
cout << " results: " << finished << ", " << result << endl;
}
operator bool () { return result; }
};
int main()
{
int timeout_ms = MAX_TIMEOUT_MS;
chrono::steady_clock::time_point start;
bool async_return = false;
std::future_status status = std::future_status::ready;
int delta_ms = 0;
std::map<size_t, fun_t> futs;
std::vector<std::future<void>> futfuncs;
size_t count = 0;
bool loop = true;
cout << "Begin --------------- " << endl;
while (loop) {
loop = false;
// On first time, or once we have an answer, launch async again
if (count < 3 && status == std::future_status::ready) {
//std::future<bool> fut = std::async (std::launch::async, myFunc);
futs[count] = std::move(fun_t(count));
//futs[futs.size() - 1].fut = std::async (std::launch::async, futs[futs.size() - 1]);
futfuncs.push_back (std::move(std::async(std::launch::async, &fun_t::run, &futs[count])));
}
// do something...
std::this_thread::sleep_for(std::chrono::seconds(2));
for (auto & f : futs) {
if (! f.second.finished) {
cout << " Not finished " << f.second._count << ", " << f.second.finished << endl;
loop = true;
} else {
bool aret = f.second;
cout << "Result: " << f.second._count << ", " << aret << endl;;
}
}
++count;
}
for (auto & f : futs) {
cout << " Verify " << f.second._count << ", " << f.second.finished;
if (f.second.finished) {
bool aret = f.second;
cout << "; result: " << aret;
}
cout << endl;
}
cout << "End --------------- " << endl;
return 0;
}
After removing lines (there are too many) you can see the tasks. The first number is the task id, the second the iteration number.
Begin ---------------
0 0
0 1
0 2
Not finished 0, 0
1 0
0 20
1 1
Not finished 0, 0
Not finished 1, 0
2 0
1 20
0 40
2 1
0 49 // here task 0 ends
2 10
1 30
results: 1, 1 // "run" function ends
1 39
Result: 0, 1 // this is the verification "for"
Not finished 1, 0
Not finished 2, 0
results: 1, 1
Result: 0, 1
Result: 1, 1
Result: 2, 1
Verify 0, 1; result: 1
Verify 1, 1; result: 1
Verify 2, 1; result: 1
End ---------------

Troubles with simple Lock-Free MPSC Ring Buffer

I am trying to implement an array-based ring buffer that is thread-safe for multiple producers and a single consumer. The main idea is to have atomic head and tail indices. When pushing an element to the queue, the head is increased atomically to reserve a slot in the buffer:
#include <atomic>
#include <chrono>
#include <iostream>
#include <memory> // for std::unique_ptr / std::make_unique
#include <stdexcept>
#include <thread>
#include <vector>
template <class T> class MPSC {
private:
int MAX_SIZE;
std::atomic<int> head{0}; ///< index of first free slot
std::atomic<int> tail{0}; ///< index of first occupied slot
std::unique_ptr<T[]> data;
std::unique_ptr<std::atomic<bool>[]> valid; ///< indicates whether data at an
///< index has been fully written
/// Compute next index modulo size.
inline int advance(int x) { return (x + 1) % MAX_SIZE; }
public:
explicit MPSC(int size) {
if (size <= 0)
throw std::invalid_argument("size must be greater than 0");
MAX_SIZE = size + 1;
data = std::make_unique<T[]>(MAX_SIZE);
valid = std::make_unique<std::atomic<bool>[]>(MAX_SIZE);
}
/// Add an element to the queue.
///
/// If the queue is full, this method blocks until a slot is available for
/// writing. This method is not starvation-free, i.e. it is possible that one
/// thread always fills up the queue and prevents others from pushing.
void push(const T &msg) {
int idx;
int next_idx;
int k = 100;
do {
idx = head;
next_idx = advance(idx);
while (next_idx == tail) { // queue is full
k = k >= 100000 ? k : k * 2; // exponential backoff
std::this_thread::sleep_for(std::chrono::nanoseconds(k));
} // spin
} while (!head.compare_exchange_weak(idx, next_idx));
if (valid[idx])
// this throws, suggesting that two threads are writing to the same index. I have no idea how this is possible.
throw std::runtime_error("message slot already written");
data[idx] = msg;
valid[idx] = true; // this was set to false by the reader,
// set it to true to indicate completed data write
}
/// Read an element from the queue.
///
/// If the queue is empty, this method blocks until a message is available.
/// This method is only safe to be called from one single reader thread.
T pop() {
int k = 100;
while (is_empty() || !valid[tail]) {
k = k >= 100000 ? k : k * 2;
std::this_thread::sleep_for(std::chrono::nanoseconds(k));
} // spin
T res = data[tail];
valid[tail] = false;
tail = advance(tail);
return res;
}
bool is_full() { return (head + 1) % MAX_SIZE == tail; }
bool is_empty() { return head == tail; }
};
When there is a lot of congestion, some messages get overwritten by other threads. Hence there must be something fundamentally wrong with what I'm doing here.
What seems to be happening is that two threads are acquiring the same index to write their data to. Why could that be?
Even if a producer were to pause just before writing its data, the tail could not increase past this thread's idx, and hence no other thread should be able to overtake and claim that same idx.
EDIT
At the risk of posting too much code, here is a simple program that reproduces the problem. It sends some incrementing numbers from many threads and checks whether all numbers are received by the consumer:
#include "mpsc.hpp" // or whatever; the above queue
#include <thread>
#include <iostream>
int main() {
static constexpr int N_THREADS = 10; ///< number of threads
static constexpr int N_MSG = 1E+5; ///< number of messages per thread
struct msg {
int t_id;
int i;
};
MPSC<msg> q(N_THREADS / 2);
std::thread threads[N_THREADS];
// consumer
threads[0] = std::thread([&q] {
int expected[N_THREADS] {};
for (int i = 0; i < N_MSG * (N_THREADS - 1); ++i) {
msg m = q.pop();
std::cout << "Got message from T-" << m.t_id << ": " << m.i << std::endl;
if (expected[m.t_id] != m.i) {
std::cout << "T-" << m.t_id << " unexpected msg " << m.i << "; expected " << expected[m.t_id] << std::endl;
return -1;
}
expected[m.t_id] = m.i + 1;
}
});
// producers
for (int id = 1; id < N_THREADS; ++id) {
threads[id] = std::thread([id, &q] {
for (int i = 0; i < N_MSG; ++i) {
q.push(msg{id, i});
}
});
}
for (auto &t : threads)
t.join();
}
I am trying to implement an array-based ring buffer that is thread-safe for multiple producers and a single consumer.
I assume you are doing this as a learning exercise. Implementing a lock-free queue yourself is most probably the wrong thing to do if you want to solve a real problem.
What seems to be happening is that two threads are acquiring the same index to write their data to. Why could that be?
The combination of that producer spinlock with the outer CAS loop does not work in the intended way:
do {
idx = head;
next_idx = advance(idx);
while (next_idx == tail) { // queue is full
k = k >= 100000 ? k : k * 2; // exponential backoff
std::this_thread::sleep_for(std::chrono::nanoseconds(k));
} // spin
//
// ...
//
// All other threads (producers and consumers) can progress.
//
// ...
//
} while (!head.compare_exchange_weak(idx, next_idx));
The queue may be full at the moment the CAS happens, because the fullness check and the CAS are performed independently. In addition, a stale CAS may still succeed, because in the meantime the other threads may have advanced head all the way around the ring so that it again exactly matches idx (an ABA problem with these modular indices).

"Segmentation fault (core dumped)" while using pthread_create

So I've got a problem: when I try to create the last thread, it always says the core is dumped. It doesn't matter whether I tell it to create 5 or 2 threads. Here is my code:
UPD: Now I can't create more than 3 threads, and the threads don't run the functions I want them to (consume and produce).
UPD_2: Now I get a message like this: terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
Aborted (core dumped)
#include<cstdlib>
#include <iostream>
#include <string>
#include <mutex>
#include <pthread.h>
#include <condition_variable>
#define NUM_THREADS 4
using namespace std;
struct thread_data
{
int thread_id;
int repeat;
};
class our_monitor{
private:
int buffer[100];
mutex m;
int n = 0, lo = 0, hi = 0;
condition_variable in,out;
unique_lock<mutex> lk;
public:
our_monitor():lk(m)
{
}
void insert(int val, int repeat)
{
in.wait(lk, [&]{return n <= 100-repeat;});
for(int i=0; i<repeat; i++)
{
buffer[hi] = val;
hi = (hi + 1) % 100; //ring buffer
n = n +1; //one more item in buffer
}
lk.unlock();
out.notify_one();
}
int remove(int repeat)
{
out.wait(lk, [&]{return n >= repeat;});
int val;
for(int i=0; i<repeat; i++)
{
val = buffer[lo];
lo = (lo + 1) % 100;
n -= 1;
}
lk.unlock();
in.notify_one();
return val;
}
};
our_monitor mon;
void* produce(void *threadarg)
{
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
cout<<"IN produce after paramiters"<< my_data->repeat<<endl;
int item;
item = rand()%100 + 1;
mon.insert(item, my_data->repeat);
cout<< "Item: "<< item << " Was prodused by thread:"<< my_data->thread_id << endl;
}
void* consume(void *threadarg)
{
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
cout<<"IN consume after paramiters"<< my_data->repeat<<endl;
int item;
item = mon.remove(my_data->repeat);
if(item) cout<< "Item: "<< item << " Was consumed by thread:"<< my_data->thread_id << endl;
return NULL;
}
int main()
{
our_monitor *mon = new our_monitor();
pthread_t threads[NUM_THREADS];
thread_data td[NUM_THREADS];
int rc;
int i;
for( i = 0; i < NUM_THREADS; i++ )
{
td[i].thread_id = i;
td[i].repeat = rand()%5 + 1;
if(i % 2 == 0)
{
cout << "main() : creating produce thread, " << i << endl;
rc = pthread_create(&threads[i], NULL, produce, (void*) &td[i]);
if (rc)
{
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
} else
{
cout << "main() : creating consume thread, " << i << endl;
rc = pthread_create(&threads[i], NULL, consume, (void *)&td[i]);
if (rc)
{
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
}
}
pthread_join(threads[0], NULL);
pthread_join(threads[1], NULL);
pthread_join(threads[2], NULL);
//pthread_exit(NULL);
}
From cppreference regarding std::condition_variable::wait(...):
"Calling this function if lock.mutex() is not locked by the current thread is undefined behavior."
http://en.cppreference.com/w/cpp/thread/condition_variable/wait
Unfortunately, the program doesn't crash on line 47, but on line 55, where you unlock the lock that wasn't locked.
Lock the lock when you enter your functions. I've done a quick check of the rest of your logic, and I'm like 85% sure it's otherwise ok.
While I have you here, this is not strictly necessary, but it's good practice. std::lock_guard and std::unique_lock automatically lock the mutex when it enters scope and unlock it when it leaves scope. This helps simplify exception handling and weird function returns. I recommend you get rid of lk as a member variable and use it as a scoped local variable instead.
void insert(int val, int repeat)
{
{ // Scoped. Somewhat pedantic in this case, but it's always best to signal after the mutex is unlocked
std::unique_lock<std::mutex> lk(m);
in.wait(lk, [&]{return n <= 100-repeat;});
for(int i=0; i<repeat; i++)
{
buffer[hi] = val;
hi = (hi + 1) % 100; //ring buffer
n = n +1; //one more item in buffer
}
}
out.notify_one();
}
Ok, now for the final issue. The cool thing about producer/consumer is that we could produce and consume at the same time. However, we just locked our functions so this is no longer possible. What you can do now is move your condition lock/wait/unlock/work/signal inside the for loop
in pseudocode:
// produce:
while (true)
{
{
unique_lock lk(m)
wait(m, predicate)
}
produce 1
signal
}
This is equivalent to using semaphores (which the C++11 STL doesn't have, but you can easily make your own from a mutex and a condition variable; see the sketch after the next snippet).
// produce:
semaphore in(100);
semaphore out(0);
while (true)
{
in.down(1) // Subtracts 1 from in.count. Blocks when in.count == 0 (meaning the buffer is full)
produce 1
out.up(1) // Adds 1 to out.count
}
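For reference, here is a minimal counting semaphore built from a C++11 mutex and condition variable (a sketch; C++20 later added std::counting_semaphore). The up/down names match the pseudocode above:

#include <mutex>
#include <condition_variable>

class Semaphore
{
    std::mutex m;
    std::condition_variable cv;
    unsigned count;
public:
    explicit Semaphore(unsigned initial) : count(initial) {}

    void down(unsigned n = 1) // blocks until n units are available
    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&]{ return count >= n; });
        count -= n;
    }

    void up(unsigned n = 1) // releases n units
    {
        {
            std::lock_guard<std::mutex> lk(m);
            count += n;
        }
        cv.notify_all();
    }
};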
When main ends, td goes out of scope and ceases to exist. But you passed pointers into it to threads. You need to make sure td continues to exist as long as any threads might be using it.
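A sketch of that part of the fix: join every thread you created before main returns, so td (and the monitor) outlive all of their users. The posted main() creates NUM_THREADS (4) threads but joins only three:

// join all NUM_THREADS threads instead of a fixed three
for (int i = 0; i < NUM_THREADS; i++)
    pthread_join(threads[i], NULL);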

Consuming a linked-list queue from multiple threads

I am learning the OpenMP parallel processing library in C++. I feel I have got the basic concepts, and tried to test my knowledge by implementing a linked-list queue. I wanted to consume the queue from multiple threads.
The challenge here is not to consume the same node twice. So I was considering sharing the queue between threads but allowing only a single thread at a time to update it (move to the next node in the queue). For this purpose, I could use critical or lock. However, without using them it somehow seems to be working perfectly; no race condition has occurred.
#include <iostream>
#include <omp.h>
#include <zconf.h>
struct Node {
int data;
struct Node* next = NULL;
Node() {}
Node(int data) {
this->data = data;
}
Node(int data, Node* node) {
this->data = data;
this->next = node;
}
};
void processNode(Node *pNode);
struct Queue {
Node *head = NULL, *tail = NULL;
Queue& add(int data) {
add(new Node(data));
return *this;
}
void add(Node *node) {
if (head == NULL) {
head = node;
tail = node;
} else {
tail->next = node;
tail = node;
}
}
Node* remove() {
Node *node;
node = head;
if (head != NULL)
head = head->next;
return node;
}
};
int main() {
srand(12);
Queue queue;
for (int i = 0; i < 6; ++i) {
queue.add(i);
}
double timer_started = omp_get_wtime();
omp_set_num_threads(3);
#pragma omp parallel
{
Node *n;
while ((n = queue.remove()) != NULL) {
double started = omp_get_wtime();
processNode(n);
double elapsed = omp_get_wtime() - started;
printf("Thread id: %d data: %d, took: %f \n", omp_get_thread_num(), n->data, elapsed);
}
}
double elapsed = omp_get_wtime() - timer_started;
std::cout << "end. took " << elapsed << " in total " << std::endl;
return 0;
}
void processNode(Node *node) {
int r = rand() % 3 + 1; // between 1 and 3
sleep(r);
}
Output looks like this:
Thread id: 0 data: 0, took: 1.000136
Thread id: 2 data: 2, took: 1.000127
Thread id: 2 data: 4, took: 1.000208
Thread id: 1 data: 1, took: 3.001371
Thread id: 0 data: 3, took: 2.001041
Thread id: 2 data: 5, took: 2.004960
end. took 4.00583 in total
I've run this many times with different numbers of threads, but I couldn't trigger a race condition or anything else going wrong. I was thinking it should be possible for two different threads to invoke remove() and process a single node twice, but it did not happen. Why?
https://github.com/muatik/openmp-examples/blob/master/linkedlist/main.cpp
First and foremost, you can never prove multi-threaded code to be correct through testing. Your hunch that you need a lock / critical section is correct.
Your test is particularly easy on the queue. The following breaks your queue quickly:
for (int i = 0; i < 10000; ++i) {
queue.add(i);
}
double timer_started = omp_get_wtime();
#pragma omp parallel
{
size_t counter = 0;
Node *n;
while ((n = queue.remove()) != NULL) {
processNode(n);
counter++;
}
#pragma omp critical
std::cout << "Thread " << omp_get_thread_num() << " processed " << counter << " nodes." << std::endl;
}
void processNode(Node *node) {}
It shows, for example, the following interesting result:
Thread 1 processed 11133 nodes.
Thread 0 processed 9039 nodes.
But again, even if your queue ran this test code correctly a million times, that wouldn't mean the queue is implemented correctly.
In particular, it is not sufficient to just protect remove; you must properly protect each and every read and write to the queue data. To get an idea of how difficult this is to get right, watch this excellent talk by Herb Sutter.
Generally, I recommend using an existing parallel data structure, for example from Boost.Lockfree.
However, unfortunately OpenMP and C++11 lock / atomic primitives don't officially play well together. So strictly speaking, if you use OpenMP, you should stick to OpenMP synchronization primitives or libraries that use them.
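As an illustration of sticking to OpenMP primitives (my sketch, not part of the original answer): every access to the shared queue, removal and insertion alike, would go through one named critical section, e.g.

// every queue access must share the same named critical section
Node* removeSafe(Queue& queue)
{
    Node* node;
    #pragma omp critical(queue_lock)
    {
        node = queue.head;
        if (queue.head != NULL)
            queue.head = queue.head->next;
    }
    return node;
}

add() would need the same #pragma omp critical(queue_lock) treatment before producers could safely run alongside consumers.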

One producer, two consumers acting on one 'queue' produced by producer

Preface: I'm new to multithreaded programming, and a little rusty with C++. My requirements are to use one mutex, and two conditions mNotEmpty and mEmpty. I must also create and populate the vectors in the way mentioned below.
I have one producer thread creating a vector of random numbers of size n*2, and two consumers inserting those values into two separate vectors of size n.
I am doing the following in the producer:
Lock the mutex: pthread_mutex_lock(&mMutex1)
Wait for consumer to say vector is empty: pthread_cond_wait(&mEmpty,&mMutex1)
Push back a value into the vector
Signal the consumer that the vector isn't empty anymore: pthread_cond_signal(&mNotEmpty)
Unlock the mutex: pthread_mutex_unlock(&mMutex1)
Return to step 1
In the consumer:
Lock the mutex: pthread_mutex_lock(&mMutex1)
Check to see if the vector is empty, and if so signal the producer: pthread_cond_signal(&mEmpty)
Else insert value into one of two new vectors (depending on which thread) and remove from original vector
Unlock the mutex: pthread_mutex_unlock(&mMutex1)
Return to step 1
What's wrong with my process? I keep getting segmentation faults or infinite loops.
Edit: Here's the code:
void Producer()
{
srand(time(NULL));
for(unsigned int i = 0; i < mTotalNumberOfValues; i++){
pthread_mutex_lock(&mMutex1);
pthread_cond_wait(&mEmpty,&mMutex1);
mGeneratedNumber.push_back((rand() % 100) + 1);
pthread_cond_signal(&mNotEmpty);
pthread_mutex_unlock(&mMutex1);
}
}
void Consumer(const unsigned int index)
{
for(unsigned int i = 0; i < mNumberOfValuesPerVector; i++){
pthread_mutex_lock(&mMutex1);
if(mGeneratedNumber.empty()){
pthread_cond_signal(&mEmpty);
}else{
mThreadVector.at(index).push_back[mGeneratedNumber.at(0)];
mGeneratedNumber.pop_back();
}
pthread_mutex_unlock(&mMutex1);
}
}
I'm not sure I understand the rationale behind the way you're doing things. In the usual consumer-provider idiom, the provider pushes as many items as possible into the channel, waiting only if there is insufficient space in the channel; it doesn't wait for empty. So the usual idiom would be:
provider (to push one item):
pthread_mutex_lock( &mutex );
while ( ! spaceAvailable() ) {
pthread_cond_wait( &spaceAvailableCondition, &mutex );
}
pushTheItem();
pthread_cond_signal( &itemAvailableCondition );
pthread_mutex_unlock( &mutex );
and on the consumer side, to get an item:
pthread_mutex_lock( &mutex );
while ( ! itemAvailable() ) {
pthread_cond_wait( &itemAvailableCondition, &mutex );
}
getTheItem();
pthread_cond_signal( &spaceAvailableCondition );
pthread_mutex_unlock( &mutex );
Note that for each condition, one side signals, and the other waits. (I don't see any wait in your consumer.) And if there is more than one process on either side, I'd recommend using pthread_cond_broadcast, rather than pthread_cond_signal.
There are a number of other issues in your code. Some of them look more like typos: you should copy/paste actual code to avoid this. Do you really mean to read and pop mGeneratedValues, when you push into mGeneratedNumber, and check whether that is empty? (If you actually do have two different queues, then you're popping from a queue where no one has pushed.) And you don't have any loops waiting for the conditions; you keep iterating through the number of elements you expect (incrementing the counter each time, so you're likely to terminate long before you should). I can't see an infinite loop, but I can readily see an endless wait in pthread_cond_wait in the producer. I don't see a core dump offhand, but consider what happens when one of the processes terminates (probably the consumer, because it never waits for anything); if it ends up destroying the mutex or the condition variables, you could get a core dump when another process attempts to use them.
In the producer, call pthread_cond_wait only when the queue is not empty. Otherwise you get blocked forever due to a race condition.
You might want to consider taking the mutex only after the condition is fulfilled, e.g.
producer()
{
while true
{
waitForEmpty();
takeMutex();
produce();
releaseMutex();
}
}
consumer()
{
while true
{
waitForNotEmpty();
takeMutex();
consume();
releaseMutex();
}
}
Here is a solution to a problem similar to yours. In this program the producer produces a number, writes it to an array (the buffer) and to a file, and then updates a status array to record what it did. Once data is available in the buffer, the consumers start to consume it (reading it and writing it to their own files) and update the status array to mark it consumed. When the producer sees that both consumers have consumed a value, it overwrites that value with a new one and carries on. For convenience, the code here is restricted to producing 2000 numbers.
// Producer-consumer //
#include <iostream>
#include <fstream>
#include <pthread.h>
#define MAX 100
using namespace std;
int dataCount = 2000;
int buffer_g[100];
int status_g[100];
void *producerFun(void *);
void *consumerFun1(void *);
void *consumerFun2(void *);
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t dataNotProduced = PTHREAD_COND_INITIALIZER;
pthread_cond_t dataNotConsumed = PTHREAD_COND_INITIALIZER;
int main()
{
for(int i = 0; i < MAX; i++)
status_g[i] = 0;
pthread_t producerThread, consumerThread1, consumerThread2;
int retProducer = pthread_create(&producerThread, NULL, producerFun, NULL);
int retConsumer1 = pthread_create(&consumerThread1, NULL, consumerFun1, NULL);
int retConsumer2 = pthread_create(&consumerThread2, NULL, consumerFun2, NULL);
pthread_join(producerThread, NULL);
pthread_join(consumerThread1, NULL);
pthread_join(consumerThread2, NULL);
return 0;
}
void *producerFun(void *)
{
//file to write produced data by producer
const char *producerFileName = "producer.txt";
ofstream producerFile(producerFileName);
int index = 0, producerCount = 0;
while(1)
{
pthread_mutex_lock(&mutex);
if(index == MAX)
{
index = 0;
}
if(status_g[index] == 0)
{
static int data = 0;
data++;
cout << "Produced: " << data << endl;
buffer_g[index] = data;
producerFile << data << endl;
status_g[index] = 5;
index ++;
producerCount ++;
pthread_cond_broadcast(&dataNotProduced);
}
else
{
cout << ">> Producer is in wait.." << endl;
pthread_cond_wait(&dataNotConsumed, &mutex);
}
pthread_mutex_unlock(&mutex);
if(producerCount == dataCount)
{
producerFile.close();
return NULL;
}
}
}
void *consumerFun1(void *)
{
const char *consumerFileName = "consumer1.txt";
ofstream consumerFile(consumerFileName);
int index = 0, consumerCount = 0;
while(1)
{
pthread_mutex_lock(&mutex);
if(index == MAX)
{
index = 0;
}
if(status_g[index] != 0 && status_g[index] != 2)
{
int data = buffer_g[index];
cout << "Cosumer1 consumed: " << data << endl;
consumerFile << data << endl;
status_g[index] -= 3;
index ++;
consumerCount ++;
pthread_cond_signal(&dataNotConsumed);
}
else
{
cout << "Consumer1 is in wait.." << endl;
pthread_cond_wait(&dataNotProduced, &mutex);
}
pthread_mutex_unlock(&mutex);
if(consumerCount == dataCount)
{
consumerFile.close();
return NULL;
}
}
}
void *consumerFun2(void *)
{
const char *consumerFileName = "consumer2.txt";
ofstream consumerFile(consumerFileName);
int index = 0, consumerCount = 0;
while(1)
{
pthread_mutex_lock(&mutex);
if(index == MAX)
{
index = 0;
}
if(status_g[index] != 0 && status_g[index] != 3)
{
int data = buffer_g[index];
cout << "Consumer2 consumed: " << data << endl;
consumerFile << data << endl;
status_g[index] -= 2;
index ++;
consumerCount ++;
pthread_cond_signal(&dataNotConsumed);
}
else
{
cout << ">> Consumer2 is in wait.." << endl;
pthread_cond_wait(&dataNotProduced, &mutex);
}
pthread_mutex_unlock(&mutex);
if(consumerCount == dataCount)
{
consumerFile.close();
return NULL;
}
}
}
There is only one problem here: the producer is not independent. It needs to take a lock on the whole array (buffer) before it produces new data, and if the mutex is locked by a consumer it has to wait, and vice versa. I am still looking for a way around that.