I am trying to implement a queue to solve the producer-consumer problem, but I cannot figure out why my solution deadlocks.
What it should do:
A fixed-size queue holding tasks for consumer threads; producers should block until there is space in the queue (m producers, n consumers).
Technical requirements: POSIX threads (no C++11)
void taskProducer(ArData &a) {
while (true) {
Work work = a.workGenerator();
if (work == NULL) {
a.queue.finished();
return;
}
WorkTask task(work);
a.queue.push(task);
}
}
void consumer(TaskQueue &queue) {
while (true) {
Task task = queue.pop();
if (task.end) return;
task.run();
}
}
#define NUM_OF_PRODUCERS 2
//Task that a thread can process
class Task {
public:
Task() : end(false){ }
//Execute task
virtual void run() { }
//Mark task as ending Task, consumer should exit thread
bool end;
};
//Synchronous queue
class TaskQueue {
private:
//Number of consumer threads
int numOfConsumers;
//Number of producers that finished producing
int doneProducers;
//Producers wait, Consumers post
sem_t producerNotificator;
//Consumers wait, Producers post
sem_t consumerNotificator;
//Mutex for entire TaskQueue
pthread_mutex_t mut;
//Data storage
std::queue<Task> queue;
public:
explicit TaskQueue(int m_numOfConsumers) : numOfConsumers(m_numOfConsumers), doneProducers(0) {
pthread_mutex_t mut;
pthread_mutex_init(&mut, NULL);
sem_init(&consumerNotificator, 0, 0);
sem_init(&producerNotificator, 0, numOfConsumers *10);
}
~TaskQueue() {
pthread_mutex_destroy(&mut);
sem_destroy(&consumerNotificator);
sem_destroy(&producerNotificator);
}
//Waits for empty slot in queue before pushing
void push(Task &task) {
//Wait for slot in queue
sem_wait(&producerNotificator);
//Lock before any manipulation
pthread_mutex_lock(&mut);
queue.push(task);
pthread_mutex_unlock(&mut);
//Notify consumer of waiting Task
sem_post(&consumerNotificator);
}
//Waits before item is in queue and pops it
Task pop() {
//Wait for Task in queue
sem_wait(&consumerNotificator);
//Lock before any manipulation
pthread_mutex_lock(&mut);
Task task = queue.front();
queue.pop();
pthread_mutex_unlock(&mut);
//Notify producer about empty slot in queue
sem_post(&producerNotificator);
return task;
}
//Handle finishing producers
void finished() {
//Lock before any manipulation
pthread_mutex_lock(&mut);
//Check if it is not last producer
if (NUM_OF_PRODUCERS > ++doneProducers) {
pthread_mutex_unlock(&mut);
return;
}
//If it was last producer end consumers by adding end tasks into queue
for (int i = 0; i < numOfConsumers; ++i) {
Task t;
t.end = true;
queue.push(t);
}
pthread_mutex_unlock(&mut);
//Notify all consumers about new Tasks
for (int i = 0; i < numOfConsumers; ++i) {
sem_post(&consumerNotificator);
}
}
};
As far as I can tell, when the second producer calls finished(), it deadlocks at pthread_mutex_lock(&mut);. I have no idea why.
Related
I have a class TaskManager that holds a queue of tasks. Each time the next task is popped and executed.
class TaskManager
{
TaskQueue m_queue;
void svc_tasks()
{
while (!m_queue.empty())
{
Task* task = m_queue.pop();
task->execute();
}
}
};
Inside the Task there are certain points I would like to pause for at least SLEEP_TIME_MS milliseconds. During this pause I would like to start executing the next task. When the pause ends I would like to put the task in the queue again.
class Task
{
int m_phase = -1;
void execute()
{
m_phase++;
switch(m_phase)
{
case 0:
// ...
do_pause(SLEEP_TIME_MS);
return;
case 1:
// ...
break;
}
}
};
Is there a scheduler in std (C++17) or boost that I could use that would call a handler function when SLEEP_TIME_MS passes?
Thank you for any advice.
You can use boost::asio::high_resolution_timer with its async_wait method.
Every time you want to schedule pushing a task into the queue, you have to:
create high_resolution_timer
call expires_after, which specifies the expiry time (SLEEP_TIME_MS), i.e. when the handler is called. In your case the handler pushes the task back into the queue.
call async_wait with your handler
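Put together, the three steps might look like this (a minimal sketch; io is assumed to be a boost::asio::io_context that some thread is running, and SLEEP_TIME_MS is taken from your code):

    auto timer = std::make_shared<boost::asio::high_resolution_timer>(io);
    timer->expires_after(std::chrono::milliseconds(SLEEP_TIME_MS));
    timer->async_wait([timer](const boost::system::error_code& ec) {
        if (!ec) {
            // push the task back into the queue here
        }
    });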
If we assume that the execute method returns a bool which indicates whether the task is completed (all phases were executed), it may be rewritten into something like this:
while (!m_queue.empty()) // this condition should be changed
{
    Task* task = m_queue.pop();
    bool finished = task->execute();
    if (!finished)
        ; // the scheduler works here - start async_wait with the handler
}
If I understand correctly, you want to push the task into the queue when SLEEP_TIME_MS has expired, so you cannot break the loop when the queue is empty: you have to wait until the pending tasks have completed. You can introduce a stop flag and break the loop on demand.
Below is a snippet of code which works in the way you described (I hope):
#include <boost/asio.hpp>
#include <atomic>
#include <chrono>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

struct Scheduler {
Scheduler(boost::asio::io_context& io)
: io(io) {}
boost::asio::io_context& io;
template<class F>
void schedule (F&& handler) {
auto timer = std::make_shared<boost::asio::high_resolution_timer>(io);
timer->expires_after(std::chrono::milliseconds(5000)); // SLEEP_TIME_MS
timer->async_wait(
[timer,handler](const boost::system::error_code& ec) {
handler();
});
}
};
struct Task {
int phase = -1;
bool execute() {
++phase;
std::cout << "phase: " << phase << std::endl;
if (phase == 0) {
return false;
}
else {
}
return true;
}
};
struct TaskManager {
Scheduler s;
std::queue<std::shared_ptr<Task>> tasks;
std::mutex tasksMtx;
std::atomic<bool> stop{false};
TaskManager(boost::asio::io_context& io) : s(io) {
for (int i = 0; i < 5; ++i)
tasks.push(std::make_shared<Task>());
}
void run() {
while (true) {
    if (stop)
        break;
    std::shared_ptr<Task> currTask;
    {
        // Hold the lock across the empty-check and front()/pop();
        // otherwise the timer handler can race with this thread.
        std::lock_guard<std::mutex> lock{tasksMtx};
        if (tasks.empty())
            continue;
        currTask = tasks.front();
        tasks.pop();
    }
    bool finished = currTask->execute();
    if (!finished)
        s.schedule([this, currTask]() { insertTaskToVector(currTask); });
}
}
}
template<class T>
void insertTaskToVector(T&& t) {
std::lock_guard<std::mutex> lock{tasksMtx};
tasks.push(std::forward<T>(t));
}
};
int main() {
    boost::asio::io_context io;
    // work keeps io.run() from returning while no timer is pending
    boost::asio::io_context::work work{io};
    std::thread th([&io]() { io.run(); });
    TaskManager tm(io);
    tm.run(); // loops until stop is set
    io.stop();
    th.join();
}
As an educational exercise I'm implementing a thread pool using condition variables. A controller thread creates a pool of threads that wait on a signal (an atomic variable being set to a value above zero). When signaled, the threads wake, perform their work, and when the last thread is done it signals the main thread to awaken. The controller thread blocks until the last thread is complete. The pool is then available for subsequent re-use.
Every now and then I was getting a timeout on the controller thread waiting for the worker to signal completion (likely because of a race condition when decrementing the active work counter), so in an attempt to solidify the pool I replaced the "wait(lck)" form of the condition variable's wait method with "wait(lck, predicate)". Since doing this, the thread pool seems to permit decrementing the active work counter below 0 (which is the condition for reawakening the controller thread); I have a race condition. I've read countless articles on atomic variables, synchronisation, memory ordering, and spurious and lost wakeups on Stack Overflow and various other sites, have incorporated what I've learnt to the best of my ability, and still cannot for the life of me work out why the way I've coded the predicated wait just does not work. The counter should only ever be as high as the number of threads in the pool (say, 8) and as low as zero. I've started losing faith in myself; it just shouldn't be this hard to do something fundamentally simple. There is clearly something else I need to learn here :)
Since there was clearly a race condition, I ensured that the two variables that drive the awakening and termination of the pool are both atomic, and that both are only ever changed while protected by a unique_lock. Specifically, I made sure that when a request to the pool was launched, the lock was acquired, the active thread counter was changed from 0 to 8, the mutex was unlocked, and only then did I "notify_all". The controller thread would only be awakened with the active thread count at zero, once the last worker thread had decremented it that far and called "notify_one".
In the worker thread, the condition variable waits and wakes only when the active thread count is greater than zero; the worker then unlocks the mutex, executes in parallel the work preassigned to it when the pool was created, re-acquires the mutex, and atomically decrements the active thread count. While still supposedly protected by the lock, it tests whether it was the last thread still active, and if so, it unlocks the mutex and calls "notify_one" to awaken the controller.
The problem is that the active thread counter repeatedly goes below zero after even only 1 or 2 iterations. If I test the active thread count at the start of a new workload, I can find it down around -6. It is as if the pool was allowed to reawaken the controller thread before the work was completed.
Given that the thread counter and terminate flag are both atomic variables, are only ever modified while under the protection of the same mutex, and I am using sequential memory ordering for all updates, I just cannot see how this is happening and I'm lost.
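In short, the worker side of the handshake described above is (condensed from the full listing that follows):

    unique_lock<mutex> lck(mtxWorker_);
    cvSignalWork_.wait(lck, [this] { return threadsActive_.load() > 0 || terminateFlag_.load(); });
    lck.unlock();
    processor->Process(threadIndex);        // run the workload in parallel
    lck.lock();
    if (threadsActive_.fetch_sub(1) == 1) { // last worker finishing
        lck.unlock();
        cvSignalComplete_.notify_one();     // wake the controller
    } else
        lck.unlock();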
#include <stdafx.h>
#include <Windows.h>
#include <iostream>
#include <thread>
using std::thread;
#include <mutex>
using std::mutex;
using std::unique_lock;
#include <condition_variable>
using std::condition_variable;
#include <atomic>
using std::atomic;
#include <chrono>
#include <vector>
using std::vector;
class IWorkerThreadProcessor
{
public:
virtual void Process(int) = 0;
};
class MyProcessor : public IWorkerThreadProcessor
{
int index_ = 0;
public:
MyProcessor(int index)
{
index_ = index;
}
void Process(int threadindex)
{
for (int i = 0; i < 5000000; i++);
std::cout << '(' << index_ << ':' << threadindex << ") ";
}
};
#define MsgBox(x) do{ MessageBox(NULL, x, L"", MB_OK ); }while(false)
class ThreadPool
{
private:
atomic<unsigned int> invocations_ = 0;
//This goes negative when using the wait_for with predicate
atomic<int> threadsActive_ = 0;
atomic<bool> terminateFlag_ = false;
vector<std::thread> threads_;
atomic<unsigned int> poolSize_ = 0;
mutex mtxWorker_;
condition_variable cvSignalWork_;
condition_variable cvSignalComplete_;
public:
~ThreadPool()
{
TerminateThreads();
}
void Init(std::vector<IWorkerThreadProcessor*>& processors)
{
unique_lock<mutex> lck2(mtxWorker_);
threadsActive_ = 0;
terminateFlag_ = false;
poolSize_ = processors.size();
for (int i = 0; i < poolSize_; ++i)
threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
}
void ProcessWorkload(std::chrono::milliseconds timeout)
{
//Only used to see how many invocations I was getting through before experiencing the issue - sadly it's only one or two
invocations_++;
try
{
unique_lock<mutex> lck(mtxWorker_);
//!!!!!! If I use the predicated wait this break will fire !!!!!!
if (threadsActive_.load() != 0)
__debugbreak();
threadsActive_.store(poolSize_);
lck.unlock();
cvSignalWork_.notify_all();
lck.lock();
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[this] { return threadsActive_.load() == 0; })
)
{
//As you can tell this has taken me through a journey trying to characterise the issue...
if (threadsActive_ > 0)
MsgBox(L"Thread pool timed out with still active threads");
else if (threadsActive_ == 0)
MsgBox(L"Thread pool timed out with zero active threads");
else
MsgBox(L"Thread pool timed out with negative active threads");
}
}
catch (const std::exception& e)
{
__debugbreak();
}
}
void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
{
do
{
unique_lock<mutex> lck(mtxWorker_);
//!!!!!! If I use this predicated wait I see the failure !!!!!!
cvSignalWork_.wait(
lck,
[this] {
return
threadsActive_.load() > 0 ||
terminateFlag_.load();
});
//!!!!!!!! Does not cause the failure but obviously will not handle
//spurious wake-ups !!!!!!!!!!
//cvSignalWork_.wait(lck);
if (terminateFlag_.load())
return;
//Unlock to parallelise the work load
lck.unlock();
processor->Process(threadIndex);
//Re-lock to decrement the work count
lck.lock();
//This returns the value before the subtraction so theoretically if the previous value was 1 then we're the last thread going and we can now signal the controller thread to wake. This is the only place that the decrement happens so I don't know how it could possibly go negative
if (threadsActive_.fetch_sub(1, std::memory_order_seq_cst) == 1)
{
lck.unlock();
cvSignalComplete_.notify_one();
}
else
lck.unlock();
} while (true);
}
void TerminateThreads()
{
try
{
unique_lock<mutex> lck(mtxWorker_);
if (!terminateFlag_)
{
terminateFlag_ = true;
lck.unlock();
cvSignalWork_.notify_all();
for (int i = 0; i < threads_.size(); i++)
threads_[i].join();
}
}
catch (const std::exception& e)
{
__debugbreak();
}
}
};
int main()
{
std::vector<IWorkerThreadProcessor*> processors;
for (int i = 0; i < 8; i++)
processors.push_back(new MyProcessor(i));
std::cout << "Instantiating thread pool\n";
auto pool = new ThreadPool;
std::cout << "Initialisting thread pool\n";
pool->Init(processors);
std::cout << "Thread pool initialised\n";
for (int i = 0; i < 200; i++)
{
std::cout << "Workload " << i << "\n";
pool->ProcessWorkload(std::chrono::milliseconds(500));
std::cout << "Workload " << i << " complete." << "\n";
}
for (auto a : processors)
delete a;
delete pool;
return 0;
}
Edit: here is a reworked version of the pool with extra instrumentation counters:
class ThreadPool
{
private:
atomic<unsigned int> invocations_ = 0;
std::atomic<unsigned int> awakenings_ = 0;
std::atomic<unsigned int> startedWorkloads_ = 0;
std::atomic<unsigned int> completedWorkloads_ = 0;
atomic<bool> terminate_ = false;
atomic<bool> stillFiring_ = false;
vector<std::thread> threads_;
atomic<unsigned int> poolSize_ = 0;
mutex mtx_;
condition_variable cvSignalWork_;
condition_variable cvSignalComplete_;
public:
~ThreadPool()
{
TerminateThreads();
}
void Init(std::vector<IWorkerThreadProcessor*>& processors)
{
unique_lock<mutex> lck2(mtx_);
//threadsActive_ = 0;
terminate_ = false;
poolSize_ = processors.size();
for (int i = 0; i < poolSize_; ++i)
threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
awakenings_ = 0;
completedWorkloads_ = 0;
startedWorkloads_ = 0;
invokations_ = 0;
}
void ProcessWorkload(std::chrono::milliseconds timeout)
{
try
{
unique_lock<mutex> lck(mtx_);
invocations_++;
if (startedWorkloads_ != 0)
__debugbreak();
if (completedWorkloads_ != 0)
__debugbreak();
if (awakenings_ != 0)
__debugbreak();
if (stillFiring_)
__debugbreak();
stillFiring_ = true;
lck.unlock();
cvSignalWork_.notify_all();
lck.lock();
if (!cvSignalComplete_.wait_for(
lck,
timeout,
//[this] { return this->threadsActive_.load() == 0; })
[this] { return completedWorkloads_ == poolSize_ && !stillFiring_; })
)
{
if (completedWorkloads_ < poolSize_)
{
if (startedWorkloads_ < poolSize_)
MsgBox(L"Thread pool timed out with some threads unstarted");
else if (startedWorkloads_ == poolSize_)
MsgBox(L"Thread pool timed out with all threads started but not all completed");
}
else
__debugbreak();
}
if (completedWorkloads_ != poolSize_)
__debugbreak();
if (awakenings_ != poolSize_)
__debugbreak();
awakenings_ = 0;
completedWorkloads_ = 0;
startedWorkloads_ = 0;
}
catch (const std::exception& e)
{
__debugbreak();
}
}
void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
{
do
{
unique_lock<mutex> lck(mtx_);
cvSignalWork_.wait(
lck,
[this] {
return
(stillFiring_ && (startedWorkloads_ < poolSize_)) ||
terminate_;
});
awakenings_++;
if (startedWorkloads_ == 0 && terminate_)
return;
if (stillFiring_ && startedWorkloads_ < poolSize_) //guard against spurious wakeup
{
startedWorkloads_++;
if (startedWorkloads_ == poolSize_)
stillFiring_ = false;
lck.unlock();
processor->Process(threadIndex);
lck.lock();
completedWorkloads_++;
if (completedWorkloads_ == poolSize_)
{
lck.unlock();
cvSignalComplete_.notify_one();
}
else
lck.unlock();
}
else
lck.unlock();
} while (true);
}
void TerminateThreads()
{
try
{
unique_lock<mutex> lck(mtx_);
if (!terminate_) //Don't attempt to double-terminate
{
terminate_ = true;
lck.unlock();
cvSignalWork_.notify_all();
for (int i = 0; i < threads_.size(); i++)
threads_[i].join();
}
}
catch (const std::exception& e)
{
__debugbreak();
}
}
};
I'm not certain if the following helps solve the problem, but I think the error is as shown below:
This
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[this] { return threadsActive_.load() == 0; })
)
should be replaced by
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[&] { return threadsActive_.load() == 0; })
)
It looks like the lambda is not accessing the instantiated member of the class. Here is some reference to back my case: see the Lambda Capture section of this page.
Edit:
Here is another place where you are using a wait with a lambda:
cvSignalWork_.wait(
lck,
[this] {
return
threadsActive_.load() > 0 ||
terminateFlag_.load();
});
Maybe modify all the lambdas and then see if it works?
The reason I'm looking at the lambdas is that this seems like a case similar to a spurious wakeup. Hope it helps.
I detach a thread from Class B:
t1 = std::thread(&Class::method, this);
t1.detach();
which, as part of its normal operation, waits on a condition variable:
cv.wait(lock);
However, when I close my application the detached thread remains. How do I stop/clean up this thread when B::~B() is called?
Try this snippet: set a bool member variable discard_ to true to avoid execution of your scheduled process:
std::thread([&](){
    // cv.wait() requires a std::unique_lock, not a std::lock_guard
    std::unique_lock<std::mutex> lock(mutex_);
    // the inner lambda must capture to see discard_ and the predicate
    cv.wait(lock, [&](){ return normal_predicate_here || discard_; });
    if(discard_) return;
    // execute scheduled process
}).detach();
Make the other thread cooperate in its termination. A non-detached thread is easier to terminate cleanly, so that you do not destroy the state accessed by the other thread prematurely:
struct OtherThread {
std::mutex m_;
std::condition_variable c_;
bool stop_ = false;
std::thread t_;
void thread_function() {
for(;;) {
std::unique_lock<std::mutex> l(m_);
while(!stop_ /* && !message_received */)
c_.wait(l);
if(stop_)
return;
// Process a message.
// ...
// Continue waiting for messages or stop.
}
}
~OtherThread() {
this->stop();
}
void stop() {
{
std::unique_lock<std::mutex> l(m_);
if(stop_)
return;
stop_ = true;
}
c_.notify_one();
t_.join(); // Wait till the thread exited, so that this object can be destroyed.
}
};
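A hypothetical usage sketch: the snippet above never starts t_, so it is launched by hand here before relying on stop() (or the destructor) for shutdown.

    int main() {
        OtherThread ot;
        ot.t_ = std::thread(&OtherThread::thread_function, &ot);
        // ... let it process messages ...
        ot.stop(); // joins t_; ~OtherThread() would otherwise do this
    }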
I am trying to understand the implementation below of a thread pool using pthreads. When I comment out the for loop in main, the program gets stuck; after adding logs, it seems to be stuck in the join call in the ThreadPool destructor.
I am unable to understand why this is happening. Is there a deadlock scenario here?
This may be naive, but can someone help me understand why this is happening and how to correct it?
Thanks a lot!!!
#include <stdio.h>
#include <queue>
#include <unistd.h>
#include <pthread.h>
#include <malloc.h>
#include <stdlib.h>
// Base class for Tasks
// run() should be overloaded and expensive calculations done there
// showTask() is for debugging and can be deleted if not used
class Task {
public:
Task() {}
virtual ~Task() {}
virtual void run()=0;
virtual void showTask()=0;
};
// Wrapper around std::queue with some mutex protection
class WorkQueue {
public:
WorkQueue() {
// Initialize the mutex protecting the queue
pthread_mutex_init(&qmtx,0);
// wcond is a condition variable that's signaled
// when new work arrives
pthread_cond_init(&wcond, 0);
}
~WorkQueue() {
// Cleanup pthreads
pthread_mutex_destroy(&qmtx);
pthread_cond_destroy(&wcond);
}
// Retrieves the next task from the queue
Task *nextTask() {
// The return value
Task *nt = 0;
// Lock the queue mutex
pthread_mutex_lock(&qmtx);
// Check if there's work
if (finished && tasks.size() == 0) {
// If not return null (0)
nt = 0;
} else {
// Not finished, but there are no tasks, so wait for
// wcond to be signalled
if (tasks.size()==0) {
pthread_cond_wait(&wcond, &qmtx);
}
// get the next task
nt = tasks.front();
if(nt){
tasks.pop();
}
// For debugging
if (nt) nt->showTask();
}
// Unlock the mutex and return
pthread_mutex_unlock(&qmtx);
return nt;
}
// Add a task
void addTask(Task *nt) {
// Only add the task if the queue isn't marked finished
if (!finished) {
// Lock the queue
pthread_mutex_lock(&qmtx);
// Add the task
tasks.push(nt);
// signal there's new work
pthread_cond_signal(&wcond);
// Unlock the mutex
pthread_mutex_unlock(&qmtx);
}
}
// Mark the queue finished
void finish() {
pthread_mutex_lock(&qmtx);
finished = true;
// Signal the condition variable in case any threads are waiting
pthread_cond_signal(&wcond);
pthread_mutex_unlock(&qmtx);
}
// Check if there's work
bool hasWork() {
//printf("task queue size is %d\n",tasks.size());
return (tasks.size()>0);
}
private:
std::queue<Task*> tasks;
bool finished;
pthread_mutex_t qmtx;
pthread_cond_t wcond;
};
// Function that retrieves a task from a queue, runs it and deletes it
void *getWork(void* param) {
Task *mw = 0;
WorkQueue *wq = (WorkQueue*)param;
while (mw = wq->nextTask()) {
mw->run();
delete mw;
}
pthread_exit(NULL);
}
class ThreadPool {
public:
// Allocate a thread pool and set them to work trying to get tasks
ThreadPool(int n) : _numThreads(n) {
int rc;
printf("Creating a thread pool with %d threads\n", n);
threads = new pthread_t[n];
for (int i=0; i< n; ++i) {
rc = pthread_create(&(threads[i]), 0, getWork, &workQueue);
if (rc){
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
}
}
// Wait for the threads to finish, then delete them
~ThreadPool() {
workQueue.finish();
//waitForCompletion();
for (int i=0; i<_numThreads; ++i) {
pthread_join(threads[i], 0);
}
delete [] threads;
}
// Add a task
void addTask(Task *nt) {
workQueue.addTask(nt);
}
// Tell the tasks to finish and return
void finish() {
workQueue.finish();
}
// Checks if there is work to do
bool hasWork() {
return workQueue.hasWork();
}
private:
pthread_t * threads;
int _numThreads;
WorkQueue workQueue;
};
// stdout is a shared resource, so protect it with a mutex
static pthread_mutex_t console_mutex = PTHREAD_MUTEX_INITIALIZER;
// Debugging function
void showTask(int n) {
pthread_mutex_lock(&console_mutex);
pthread_mutex_unlock(&console_mutex);
}
// Task to compute fibonacci numbers
// It's more efficient to use an iterative algorithm, but
// the recursive algorithm takes longer and is more interesting
// than sleeping for X seconds to show parallelism
class FibTask : public Task {
public:
FibTask(int n) : Task(), _n(n) {}
~FibTask() {
// Debug prints
pthread_mutex_lock(&console_mutex);
printf("tid(%d) - fibd(%d) being deleted\n", pthread_self(), _n);
pthread_mutex_unlock(&console_mutex);
}
virtual void run() {
// Note: it's important that this isn't contained in the console mutex lock
long long val = innerFib(_n);
// Show results
pthread_mutex_lock(&console_mutex);
printf("Fibd %d = %lld\n",_n, val);
pthread_mutex_unlock(&console_mutex);
// The following won't work in parallel:
// pthread_mutex_lock(&console_mutex);
// printf("Fibd %d = %lld\n",_n, innerFib(_n));
// pthread_mutex_unlock(&console_mutex);
}
virtual void showTask() {
// More debug printing
pthread_mutex_lock(&console_mutex);
printf("thread %d computing fibonacci %d\n", pthread_self(), _n);
pthread_mutex_unlock(&console_mutex);
}
private:
// Slow computation of fibonacci sequence
// To make things interesting, and perhaps improve load balancing, these
// inner computations could be added to the task queue
// Ideally set a lower limit on when that's done
// (i.e. don't create a task for fib(2)) because thread overhead makes it
// not worth it
long long innerFib(long long n) {
if (n<=1) { return 1; }
return innerFib(n-1) + innerFib(n-2);
}
long long _n;
};
int main(int argc, char *argv[])
{
// Create a thread pool
ThreadPool *tp = new ThreadPool(10);
// Create work for it
/*for (int i=0;i<100; ++i) {
int rv = rand() % 40 + 1;
showTask(rv);
tp->addTask(new FibTask(rv));
}*/
delete tp;
printf("\n\n\n\n\nDone with all work!\n");
}
The design is more or less OK-ish, but implementation-wise it contains several things that are a bit overcomplicated and may introduce instabilities. I guess your program deadlocks when you comment out the for loop because you should use pthread_cond_broadcast instead of pthread_cond_signal in your WorkQueue::finish() method.
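For reference, that change to your finish() is a one-liner:

    // Wake every waiting worker, not just one of them:
    void finish() {
        pthread_mutex_lock(&qmtx);
        finished = true;
        pthread_cond_broadcast(&wcond);
        pthread_mutex_unlock(&qmtx);
    }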
Note: I usually implement threadpool termination by placing NUM_THREADS NULL items into the workqueue, and I set a finished flag only to be able to check something in my addTask() method, because after finish() I usually don't allow adding new tasks: I return false from addTask(), or sometimes I assert.
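A sketch of that addTask() contract (a hypothetical variant of your method; note it also checks finished under the lock, unlike your original):

    bool addTask(Task *nt) {
        pthread_mutex_lock(&qmtx);
        bool accepted = !finished;
        if (accepted) {
            tasks.push(nt);
            pthread_cond_signal(&wcond);
        }
        pthread_mutex_unlock(&qmtx);
        return accepted; // false after finish(); the task was not queued
    }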
Another note: it's best to encapsulate threads into classes (like the Thread class in the rework below); that has several benefits and makes porting to multiple platforms easier.
There may be other bugs too, as I haven't executed your program, just read through your code (for instance, WorkQueue::finished is never initialized before use).
EDIT: Here is a reworked version. I made some modifications to your code, but I don't guarantee that it works. Fingers crossed... :-)
#include <stdio.h>
#include <queue>
#include <unistd.h>
#include <pthread.h>
#include <malloc.h>
#include <stdlib.h>
#include <assert.h>
// Reusable thread class
class Thread
{
public:
Thread()
{
state = EState_None;
handle = 0;
}
virtual ~Thread()
{
assert(state != EState_Started);
}
void start()
{
assert(state == EState_None);
// in case of thread create error I usually FatalExit...
if (pthread_create(&handle, NULL, threadProc, this))
abort();
state = EState_Started;
}
void join()
{
// A started thread must be joined exactly once!
// This requirement could be eliminated with an alternative implementation but it isn't needed.
assert(state == EState_Started);
pthread_join(handle, NULL);
state = EState_Joined;
}
protected:
virtual void run() = 0;
private:
static void* threadProc(void* param)
{
Thread* thread = reinterpret_cast<Thread*>(param);
thread->run();
return NULL;
}
private:
enum EState
{
EState_None,
EState_Started,
EState_Joined
};
EState state;
pthread_t handle;
};
// Base class for Tasks
// run() should be overloaded and expensive calculations done there
// showTask() is for debugging and can be deleted if not used
class Task {
public:
Task() {}
virtual ~Task() {}
virtual void run()=0;
virtual void showTask()=0;
};
// Wrapper around std::queue with some mutex protection
class WorkQueue
{
public:
WorkQueue() {
pthread_mutex_init(&qmtx,0);
// wcond is a condition variable that's signaled
// when new work arrives
pthread_cond_init(&wcond, 0);
}
~WorkQueue() {
// Cleanup pthreads
pthread_mutex_destroy(&qmtx);
pthread_cond_destroy(&wcond);
}
// Retrieves the next task from the queue
Task *nextTask() {
// The return value
Task *nt = 0;
// Lock the queue mutex
pthread_mutex_lock(&qmtx);
while (tasks.empty())
pthread_cond_wait(&wcond, &qmtx);
nt = tasks.front();
tasks.pop();
// Unlock the mutex and return
pthread_mutex_unlock(&qmtx);
return nt;
}
// Add a task
void addTask(Task *nt) {
// Lock the queue
pthread_mutex_lock(&qmtx);
// Add the task
tasks.push(nt);
// signal there's new work
pthread_cond_signal(&wcond);
// Unlock the mutex
pthread_mutex_unlock(&qmtx);
}
private:
std::queue<Task*> tasks;
pthread_mutex_t qmtx;
pthread_cond_t wcond;
};
// Thanks to the reusable thread class implementing threads is
// simple and free of pthread api usage.
class PoolWorkerThread : public Thread
{
public:
PoolWorkerThread(WorkQueue& _work_queue) : work_queue(_work_queue) {}
protected:
virtual void run()
{
while (Task* task = work_queue.nextTask())
task->run();
}
private:
WorkQueue& work_queue;
};
class ThreadPool {
public:
// Allocate a thread pool and set them to work trying to get tasks
ThreadPool(int n) {
printf("Creating a thread pool with %d threads\n", n);
for (int i=0; i<n; ++i)
{
threads.push_back(new PoolWorkerThread(workQueue));
threads.back()->start();
}
}
// Wait for the threads to finish, then delete them
~ThreadPool() {
finish();
}
// Add a task
void addTask(Task *nt) {
workQueue.addTask(nt);
}
// Asking the threads to finish, waiting for the task
// queue to be consumed and then returning.
void finish() {
for (size_t i=0,e=threads.size(); i<e; ++i)
workQueue.addTask(NULL);
for (size_t i=0,e=threads.size(); i<e; ++i)
{
threads[i]->join();
delete threads[i];
}
threads.clear();
}
private:
std::vector<PoolWorkerThread*> threads;
WorkQueue workQueue;
};
// stdout is a shared resource, so protect it with a mutex
static pthread_mutex_t console_mutex = PTHREAD_MUTEX_INITIALIZER;
// Debugging function
void showTask(int n) {
pthread_mutex_lock(&console_mutex);
pthread_mutex_unlock(&console_mutex);
}
// Task to compute fibonacci numbers
// It's more efficient to use an iterative algorithm, but
// the recursive algorithm takes longer and is more interesting
// than sleeping for X seconds to show parallelism
class FibTask : public Task {
public:
FibTask(int n) : Task(), _n(n) {}
~FibTask() {
// Debug prints
pthread_mutex_lock(&console_mutex);
printf("tid(%d) - fibd(%d) being deleted\n", (int)pthread_self(), (int)_n);
pthread_mutex_unlock(&console_mutex);
}
virtual void run() {
// Note: it's important that this isn't contained in the console mutex lock
long long val = innerFib(_n);
// Show results
pthread_mutex_lock(&console_mutex);
printf("Fibd %d = %lld\n",(int)_n, val);
pthread_mutex_unlock(&console_mutex);
// The following won't work in parallel:
// pthread_mutex_lock(&console_mutex);
// printf("Fibd %d = %lld\n",_n, innerFib(_n));
// pthread_mutex_unlock(&console_mutex);
// this thread pool implementation doesn't delete
// the tasks so we perform the cleanup here
delete this;
}
virtual void showTask() {
// More debug printing
pthread_mutex_lock(&console_mutex);
printf("thread %d computing fibonacci %d\n", (int)pthread_self(), (int)_n);
pthread_mutex_unlock(&console_mutex);
}
private:
// Slow computation of fibonacci sequence
// To make things interesting, and perhaps improve load balancing, these
// inner computations could be added to the task queue
// Ideally set a lower limit on when that's done
// (i.e. don't create a task for fib(2)) because thread overhead makes it
// not worth it
long long innerFib(long long n) {
if (n<=1) { return 1; }
return innerFib(n-1) + innerFib(n-2);
}
long long _n;
};
int main(int argc, char *argv[])
{
// Create a thread pool
ThreadPool *tp = new ThreadPool(10);
// Create work for it
for (int i=0;i<100; ++i) {
int rv = rand() % 40 + 1;
showTask(rv);
tp->addTask(new FibTask(rv));
}
delete tp;
printf("\n\n\n\n\nDone with all work!\n");
}
I think you have a race condition there...
When you remove the for loop, the pool is destructed as soon as it is created, so there is no time for the threads to start waiting on the queue. Try putting a sleep there and you'll see.
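For instance (a diagnostic sketch, not a fix):

    ThreadPool *tp = new ThreadPool(10);
    sleep(1);  // give the workers time to reach pthread_cond_wait
    delete tp; // the destructor then calls finish() and joins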
I implemented a threadpool library, which is used widely among all our services, so here is some advice:
You are using C++, so there's no need to use pthreads; just use boost, or std::thread if available
Don't signal; push empty tasks instead (pushing a task requires a signal, of course)
Use boost::function or std::function instead of a base class (see the sketch after this list)
Cope with spurious wake-ups (your code doesn't seem to handle them)
pthread_cond_signal wakes up only one thread; you must use pthread_cond_broadcast if you want to notify them all. That said, I'd recommend, again, sticking to boost's condition variables (@pasztorpisti got it right here; he's got my upvote)
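A minimal sketch combining the points above (illustrative, not the library mentioned): std::function tasks, empty functions as poison pills, and a predicate wait that copes with spurious wake-ups.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>

    class FunctionQueue {
        std::queue<std::function<void()>> tasks_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(std::function<void()> f) {
            {
                std::lock_guard<std::mutex> l(m_);
                tasks_.push(std::move(f));
            }
            cv_.notify_one();
        }
        // Workers exit when they pop an empty function (the poison pill).
        std::function<void()> pop() {
            std::unique_lock<std::mutex> l(m_);
            cv_.wait(l, [this] { return !tasks_.empty(); }); // spurious-wakeup safe
            std::function<void()> f = std::move(tasks_.front());
            tasks_.pop();
            return f;
        }
    };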
Here is what I want to do: a main thread produces numbers and puts them into a queue, and it starts a child thread that consumes the numbers from the queue.
The main thread should stop producing numbers if the queue's size grows above 10, and it should resume producing numbers if the queue's size drops below 5.
queue<int> qu;
void *num_consumer(void *arg)
{
while(1) {
//lock qu
int num = qu.front(); qu.pop(); // std::queue::pop() returns void
//unlock qu
do_something_with(num);
}
}
int main()
{
pthread_create(&tid, NULL, num_consumer, NULL);
while(1) {
int num;
produce(&num);
//lock qu
qu.push(num);
//unlock qu
if(qu.size() >= 10) {
//how to block and how to resume the main thread?
}
}
}
I might use a semaphore to do the job, but is there any other idea?
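(For reference, the semaphore approach mentioned above, sketched with POSIX counting semaphores and assuming a mutex qu_mutex protecting qu: it bounds the queue at 10, but the producer resumes as soon as a single slot frees, so the 10/5 hysteresis cannot be expressed directly.)

    sem_t slots; // sem_init(&slots, 0, 10): free slots
    sem_t items; // sem_init(&items, 0, 0): queued numbers

    void produce_one(int num) // hypothetical helper called from main's loop
    {
        sem_wait(&slots); // blocks while 10 numbers are queued
        pthread_mutex_lock(&qu_mutex);
        qu.push(num);
        pthread_mutex_unlock(&qu_mutex);
        sem_post(&items);
    }

    int consume_one()
    {
        sem_wait(&items); // blocks while the queue is empty
        pthread_mutex_lock(&qu_mutex);
        int num = qu.front();
        qu.pop();
        pthread_mutex_unlock(&qu_mutex);
        sem_post(&slots); // frees a slot, possibly resuming the producer
        return num;
    }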
A condition variable is appropriate here; actually, a pair of condition variables, because the consumer also needs to block if the queue is empty:
pthread_cond_t qu_empty_cond = PTHREAD_COND_INITIALIZER;
pthread_cond_t qu_full_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t qu_mutex = PTHREAD_MUTEX_INITIALIZER;
void *num_consumer(void *arg)
{
while(1) {
int num;
pthread_mutex_lock(&qu_mutex);
while (qu.size() < 1)
pthread_cond_wait(&qu_empty_cond, &qu_mutex);
num = qu.front();
qu.pop();
if (qu.size() < 5)
pthread_cond_signal(&qu_full_cond);
pthread_mutex_unlock(&qu_mutex);
do_something_with(num);
}
}
int main()
{
pthread_create(&tid, NULL, num_consumer, NULL);
while(1) {
int num;
produce(&num);
pthread_mutex_lock(&qu_mutex);
qu.push(num);
pthread_cond_signal(&qu_empty_cond);
if (qu.size() >= 10)
do {
pthread_cond_wait(&qu_full_cond, &qu_mutex);
} while (qu.size() >= 5);
pthread_mutex_unlock(&qu_mutex);
}
}
Note that when the producer waits on the condition variable, the queue mutex is atomically released.