How can I tell when my ThreadPool is finished with its tasks? - c++

In C++11, I have a ThreadPool object which manages a number of threads; jobs are enqueued on it via a single lambda function. I know how many rows of data I have to work on, so I know ahead of time that I will need to queue N jobs. What I am not sure about is how to tell when all of those jobs are finished, so I can move on to the next step.
This is the code to manage the ThreadPool:
#include <cstdlib>
#include <vector>
#include <deque>
#include <iostream>
#include <atomic>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <functional> // std::function
class ThreadPool;
class Worker {
public:
Worker(ThreadPool &s) : pool(s) { }
void operator()();
private:
ThreadPool &pool;
};
class ThreadPool {
public:
ThreadPool(size_t);
template<class F>
void enqueue(F f);
~ThreadPool();
void joinAll();
int taskSize();
private:
friend class Worker;
// the task queue
std::deque< std::function<void()> > tasks;
// keep track of threads
std::vector< std::thread > workers;
// sync
std::mutex queue_mutex;
std::condition_variable condition;
bool stop;
};
void Worker::operator()()
{
std::function<void()> task;
while(true)
{
{ // acquire lock
std::unique_lock<std::mutex>
lock(pool.queue_mutex);
// look for a work item
while ( !pool.stop && pool.tasks.empty() ) {
// if there are none wait for notification
pool.condition.wait(lock);
}
if ( pool.stop ) {// exit if the pool is stopped
return;
}
// get the task from the queue
task = pool.tasks.front();
pool.tasks.pop_front();
} // release lock
// execute the task
task();
}
}
// the constructor just launches some amount of workers
ThreadPool::ThreadPool(size_t threads)
: stop(false)
{
for (size_t i = 0;i<threads;++i) {
workers.push_back(std::thread(Worker(*this)));
}
//workers.
//tasks.
}
// the destructor joins all threads
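ThreadPool::~ThreadPool()
{
// stop all threads; set the flag under the lock so that a worker
// cannot miss it between checking the flag and starting to wait
{
std::unique_lock<std::mutex> lock(queue_mutex);
stop = true;
}
condition.notify_all();
// join them (skip any that joinAll() has already joined)
for ( size_t i = 0;i<workers.size();++i) {
if ( workers[i].joinable() ) {
workers[i].join();
}
}
}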
void ThreadPool::joinAll() {
// join them
for ( size_t i = 0;i<workers.size();++i) {
workers[i].join();
}
}
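int ThreadPool::taskSize() {
// the queue is shared with the workers, so read it under the lock
std::unique_lock<std::mutex> lock(queue_mutex);
return static_cast<int>(tasks.size());
}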
// add new work item to the pool
template<class F>
void ThreadPool::enqueue(F f)
{
{ // acquire lock
std::unique_lock<std::mutex> lock(queue_mutex);
// add the task
tasks.push_back(std::function<void()>(f));
} // release lock
// wake up one thread
condition.notify_one();
}
And then I distribute my job among threads like this:
ThreadPool pool(4);
/* ... */
for (int y=0;y<N;y++) {
pool->enqueue([this,y] {
this->ProcessRow(y);
});
}
// wait until all threads are finished
std::this_thread::sleep_for( std::chrono::milliseconds(100) );
Waiting for 100 milliseconds works only because I know those jobs can complete in less than 100 ms, but obviously it's not the best approach. Once it has completed N rows of processing, it needs to go through another 1000 or so generations of the same thing. Obviously, I want to begin the next generation as soon as I can.
I know there must be some way to add code into my ThreadPool so that I can do something like this:
while ( pool->isBusy() ) {
std::this_thread::sleep_for( std::chrono::milliseconds(1) );
}
I've been working on this for a couple of nights now and I find it hard to find good examples of how to do this. So, what would be the proper way to implement my isBusy() method?

I got it!
First of all, I introduced a few extra members to the ThreadPool class:
class ThreadPool {
/* ... existing code ... */
/* plus the following */
std::atomic<int> njobs_pending{0};
std::mutex main_mutex;
std::condition_variable main_condition;
};
Now, I can do better than checking some status every X amount of time. Now, I can block the Main loop until no more jobs are pending:
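void ThreadPool::waitUntilCompleted() {
std::unique_lock<std::mutex> lock(main_mutex);
// the predicate guards against spurious wakeups and returns at once
// when nothing is pending
main_condition.wait(lock, [this] { return njobs_pending == 0; });
}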
As long as I manage what's pending with the following bookkeeping code, at the head of the ThreadPool::enqueue() function:
njobs_pending++;
and right after I run the task in the Worker::operator()() function:
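if ( --pool.njobs_pending == 0 ) {
// take main_mutex before notifying so the waiting thread cannot
// miss the wakeup between its check and its wait
std::lock_guard<std::mutex> lock(pool.main_mutex);
pool.main_condition.notify_one();
}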
Then the main thread can enqueue whatever tasks are necessary and then sit and wait until all calculations are completed with:
for (int y=0;y<N;y++) {
pool->enqueue([this,y] {
this->ProcessRow(y);
});
}
pool->waitUntilCompleted();
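For reference, the bookkeeping described above can be put together into one small, self-contained sketch. This is not the poster's exact class: `CountingPool`, `runGenerations`, and the member names are hypothetical, and the counter is kept under the queue mutex instead of being a separate atomic, which is one simple way to rule out a missed wakeup.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal pool: one mutex guards the queue, the stop flag,
// and the count of jobs that are queued or still running.
class CountingPool {
public:
    explicit CountingPool(std::size_t threads) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    // Unlike the pool in the question, this destructor drains the queue first.
    ~CountingPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        work_ready_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(job));
            ++pending_;
        }
        work_ready_.notify_one();
    }
    // Blocks until every job enqueued so far has finished running.
    void waitUntilCompleted() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_ready_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;
                job = std::move(tasks_.front());
                tasks_.pop_front();
            }
            job();
            {
                // Decrement under the lock so the waiter cannot miss the notify.
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) all_done_.notify_all();
            }
        }
    }
    std::mutex mutex_;
    std::condition_variable work_ready_;
    std::condition_variable all_done_;
    std::deque<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    std::size_t pending_ = 0;
    bool stop_ = false;
};

// Runs `rows` jobs per generation and returns the total number of jobs run.
int runGenerations(int generations, int rows) {
    CountingPool pool(4);
    std::atomic<int> processed{0};
    for (int g = 0; g < generations; ++g) {
        for (int y = 0; y < rows; ++y)
            pool.enqueue([&processed] { ++processed; });
        pool.waitUntilCompleted();  // generation g is complete here
    }
    return processed;
}
```

Calling `runGenerations(3, 50)` from `main` would run three generations of 50 jobs each, starting each generation as soon as the previous one has finished.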

You may need to create an internal structure of threads associated with a bool variable flag.
class ThreadPool {
private:
// This Structure Will Keep Track Of Each Thread's Progress
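struct ThreadInfo {
std::thread thread;
// Note: accesses to isDone from different threads need synchronization
bool isDone;
// std::thread is move-only, so take it by rvalue reference and move it in
ThreadInfo( std::thread&& threadIn ) :
thread( std::move( threadIn ) ), isDone(false)
{}
}; // ThreadInfo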
// This Vector Should Be Populated In The Constructor Initially And
// Updated Anytime You Would Add A New Task.
// This Should Also Replace // std::vector<std::thread> workers
std::vector<ThreadInfo> workers;
public:
// The rest of your class would appear to be the same, but you need a
// way to test if a particular thread is currently active. When the
// thread is done this bool flag would report as being true;
// This will only return or report if a particular thread is done or not
// You would have to set this variable's flag for a particular thread to
// true when it completes its task, otherwise it will always be false
// from moment of creation. I did not add in any bounds checking to keep
// it simple which should be taken into consideration.
bool isBusy( unsigned idx ) const {
return !workers[idx].isDone; // busy means "not done yet"
}
};

If you have N jobs and the calling thread has to sleep while waiting for them, the most efficient way is to keep a counter that is atomically set to N before the jobs are scheduled; each job atomically decrements it when its computation is done. You can then use an atomic instruction to test whether the counter is zero.
Alternatively, use a locked decrement together with a wait handle that is signaled when the counter reaches zero.
I just have to say, I do not like this idea you are asking for:
while ( pool->isBusy() ) {
std::this_thread::sleep_for( std::chrono::milliseconds(1) );
}
It just does not fit well: the sleep will almost never last exactly 1 ms, and it uses resources needlessly.
The best way is to decrement a counter atomically and have the job that brings it to zero signal an event.
If the main thread has to wait, it then blocks on that event with WaitForSingleObject and wakes up once, after completion, instead of many times.
WaitForSingleObject
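WaitForSingleObject is Windows-specific. As a portable illustration of the same idea, here is a sketch in standard C++11 that uses a `std::promise<void>` as the one-shot event. The function name `runAndWait` is hypothetical, and each job gets its own thread only to keep the example short; the counter logic would be the same inside a pool.

```cpp
#include <atomic>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: runs n_jobs jobs and blocks until the last one
// signals completion. Returns the sum the jobs accumulated.
long runAndWait(int n_jobs) {
    if (n_jobs <= 0) return 0;            // nothing to wait for
    std::atomic<int> remaining{n_jobs};   // set to N before scheduling
    std::atomic<long> sum{0};
    std::promise<void> all_done;          // one-shot event, set by the last job
    std::future<void> done = all_done.get_future();

    std::vector<std::thread> threads;
    for (int i = 1; i <= n_jobs; ++i) {
        threads.emplace_back([&, i] {
            sum += i;                          // stand-in for real work
            if (remaining.fetch_sub(1) == 1)   // this job was the last one
                all_done.set_value();
        });
    }
    done.wait();   // wakes once, when the counter has reached zero
    for (auto& t : threads) t.join();
    return sum;
}
```

The waiting thread is woken exactly once, by whichever job happens to decrement the counter to zero; no polling or sleeping is involved.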

Related

How to assign N tasks to M threads max.?

I'm new to C++, and trying to get my head around multithreading. I've got the basics covered. Now imagine this situation:
I have, say, N tasks that I want to have completed ASAP. That's easy, just start N threads and lean back. But I'm not sure if this will work for N=200 or more.
So I’d like to say: I have N tasks, and I want to start a limited number of M worker threads. How do I schedule a task to be issued to a new thread once one of the previous threads has finished?
Or is all this taken care of by the OS or runtime, and I need not worry at all, even if N gets really big?
No, you don’t want to create 200 threads. While it would likely work just fine, creating a thread involves significant processing overhead. Rather, you want a “task queue” system, where a pool of worker threads (generally equal in size to the number of CPU cores) draw from a shared queue of things that need to be done. Intel TBB contains a commonly used task queue implementation, but there are others as well.
std::thread::hardware_concurrency may be useful to decide how many threads you want. If it returns anything but 0, it is a hint for the number of threads that can run concurrently. It's often the number of CPU cores multiplied by the number of hyperthreads each core can run: 12 cores with 2 hyperthreads per core makes 24. Exceeding this number will likely just slow everything down.
You can create a pool of threads standing by to grab work on your command since creating threads is somewhat expensive. If you have 1000000 tasks to deal with, you want the 24 threads (in this example) to be up all the time.
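As a small illustration of that sizing rule, the sketch below picks a worker count and handles the case where `hardware_concurrency()` returns 0. The helper name `chooseWorkerCount` and the fallback value of 2 are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: picks a worker count for `num_tasks` tasks.
std::size_t chooseWorkerCount(std::size_t num_tasks, std::size_t fallback = 2) {
    std::size_t hw = std::thread::hardware_concurrency();  // 0 means "unknown"
    if (hw == 0) hw = fallback;
    // Never start more workers than there are tasks, but always at least one.
    return std::max<std::size_t>(1, std::min(hw, num_tasks));
}
```

With a million tasks this returns the hardware hint (or the fallback); with a single task it returns 1, so no idle threads are created.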
This is a very common scenario though and since C++17 there is an addition to many of the standard algorithms, like std::for_each, to make them execute according to execution policies. If you want it to execute in parallel, it'll use a built-in thread pool (most likely) to finish the task.
Example:
#include <algorithm>
#include <execution>
#include <vector>
struct Task {
some_type data_to_work_on;
some_type result;
};
int main() {
std::vector<Task> tasks;
std::for_each(std::execution::par, tasks.begin(), tasks.end(), [](Task& t) {
// work on task `t` here
});
// all tasks done, check the result in each.
}
I have N tasks, and I want to start a limited number of M worker threads.
How do I schedule a task to be issued to a new thread once
one of the previous threads has finished?
Set your thread pool size, M, taking into account the number of threads available in your system (hardware_concurrency).
Use a counting_semaphore to make sure you don't launch a task if there is not an available thread pool slot.
Loop through your N tasks, acquiring a thread pool slot, running the task, and releasing the thread pool slot. Notice that, since tasks are launched asynchronously, you will be able to have M tasks running in parallel.
[Demo]
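#include <algorithm> // min
#include <cstddef> // size_t
#include <future> // async
#include <iostream> // cout
#include <semaphore> // counting_semaphore
#include <thread> // thread::hardware_concurrency
#include <vector>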
static const size_t THREAD_POOL_SIZE_DEFAULT{ std::thread::hardware_concurrency() };
static const size_t THREAD_POOL_SIZE_MAX{ std::thread::hardware_concurrency() * 2 };
static const size_t NUM_TASKS_DEFAULT{ 20 };
template <typename F>
void run_tasks(
F&& f,
size_t thread_pool_size = THREAD_POOL_SIZE_DEFAULT,
size_t num_tasks = NUM_TASKS_DEFAULT)
{
thread_pool_size = std::min(thread_pool_size, THREAD_POOL_SIZE_MAX);
std::counting_semaphore task_slots(thread_pool_size);
auto futures{ std::vector<std::future<void>>(num_tasks) };
auto task_results{ std::vector<int>(num_tasks) };
// We can run thread_pool_size tasks in parallel
// If all task slots are busy, we have to wait for a task to finish
for (size_t i{ 0 }; i < num_tasks; ++i)
{
// Wait for a task slot to be free
task_slots.acquire();
futures[i] = std::async(
std::launch::async,
[i, &f, &task_result = task_results[i], &task_slots]() {
// Execute task
task_result = std::forward<F>(f)(i);
// Release the task slot
task_slots.release();
}
);
}
// Wait for all the tasks to finish
for (auto& future : futures) { future.get(); };
for (auto& result: task_results) { std::cout << result << " "; }
}
int main()
{
run_tasks([](int i) { return i * i; }, 4, 20);
}
This is my take on a thread pool (not extensively debugged yet). In main, it starts a thread pool with the maximum number of threads the hardware allows (the thing Ted Lyngmo was referring to).
There are quite a few things involved, since this thread pool also allows callers to get back the results of asynchronously started calls:
std::shared_future (to return a result to caller if needed)
std::packaged_task (to hold a call)
std::condition_variable (to communicate that stuff has entered the queue, or to signal all threads should stop)
std::mutex/std::unique_lock (to protect the queue of calls)
std::thread (of course)
use of lambdas
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <exception>
#include <iostream>
#include <memory>
#include <mutex>
#include <future>
#include <thread>
#include <vector>
#include <queue>
//=====================================================================================================================================
namespace details
{
// task_itf is something the threadpool can call to start a scheduled function call
// independent of argument and/or return value types
class task_itf
{
public:
virtual void execute() = 0;
};
//-------------------------------------------------------------------------------------------------------------------------------------
// A task is a container for a function call + arguments and a future,
// but is already specialized for the return value type of the function call
// which the future also needs
//
template<typename retval_t>
class task final :
public task_itf
{
public:
template<typename lambda_t>
explicit task(lambda_t&& lambda) :
m_task(lambda)
{
}
std::future<retval_t> get_future()
{
return m_task.get_future();
}
std::shared_future<retval_t> get_shared_future()
{
return std::shared_future<retval_t>(m_task.get_future());
}
virtual void execute() override
{
m_task();
}
private:
std::packaged_task<retval_t()> m_task;
};
class stop_exception :
public std::exception
{
};
}
//-------------------------------------------------------------------------------------------------------------------------------------
// actual thread_pool class
class thread_pool_t
{
public:
// construct a thread_pool with specified number of threads.
explicit thread_pool_t(const std::size_t size) :
m_stop{ false }
{
std::mutex mtx;
std::condition_variable signal_started;
std::atomic<std::size_t> number_of_threads_started{ 0u };
for (std::size_t n = 0; n < size; ++n)
{
// construct the thread in place, no need to copy
m_threads.emplace_back([&]()
{
{
// count and notify under the lock, so the constructor can neither
// miss the wakeup nor destroy these locals while they are in use
std::unique_lock<std::mutex> lock{ mtx };
number_of_threads_started++;
signal_started.notify_all();
}
thread_loop();
});
}
// wait for all threads to have started.
std::unique_lock<std::mutex> lock{ mtx };
signal_started.wait(lock, [&] { return number_of_threads_started == size; });
}
// destructor signals all threads to stop as soon as they are done.
// then waits for them to stop.
~thread_pool_t()
{
{
std::unique_lock<std::mutex> lock(m_queue_mutex);
m_stop = true;
}
m_wakeup.notify_all();
for (auto& thread : m_threads)
{
thread.join();
}
}
// pass a function asynchronously to the threadpool
// this function returns a future so the calling thread
// may synchronize with a result if it so wishes.
template<typename lambda_t>
auto async(lambda_t&& lambda)
{
using retval_t = decltype(lambda());
auto task = std::make_shared<details::task<retval_t>>(lambda);
queue_task(task);
return task->get_shared_future();
}
// let the threadpool run the function but wait for
// the threadpool thread to finish
template<typename lambda_t>
auto sync(lambda_t&& lambda)
{
auto ft = async(lambda);
return ft.get();
}
void synchronize()
{
sync([] {});
}
private:
void queue_task(const std::shared_ptr<details::task_itf>& task_ptr)
{
{
std::unique_lock<std::mutex> lock(m_queue_mutex);
m_queue.push(task_ptr);
}
// signal only one thread, first waiting thread to wakeup will run the next task.
m_wakeup.notify_one();
}
std::shared_ptr<details::task_itf> get_next_task()
{
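std::unique_lock<std::mutex> lock(m_queue_mutex);
// The predicate must not be a static lambda: it would capture the `this`
// of the first pool only. wait(lock, pred) already loops until pred holds.
m_wakeup.wait(lock, [this] { return (m_stop || (m_queue.size() > 0)); });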
if (m_stop)
{
// use exception to break out of the mainloop
throw details::stop_exception();
}
auto task = m_queue.front();
m_queue.pop();
return task;
}
void thread_loop()
{
try
{
while (auto task = get_next_task())
{
task->execute();
}
}
catch (const details::stop_exception&)
{
}
}
std::vector<std::thread> m_threads;
std::mutex m_queue_mutex;
std::queue<std::shared_ptr<details::task_itf>> m_queue;
std::condition_variable m_wakeup;
bool m_stop;
};
//-----------------------------------------------------------------------------
int main()
{
thread_pool_t thread_pool{ std::thread::hardware_concurrency() };
for (int i = 0; i < 200; i++)
{
// just schedule asynchronous calls, returned futures are not used in this example
thread_pool.async([i]
{
std::cout << i << " ";
});
}
// this threadpool will not by default wait until all work is finished
// but stops processing when destructed.
// a call to synchronize blocks until an empty task queued at this point has run, so all
// earlier tasks have at least been dequeued; with several workers some may still be finishing.
thread_pool.synchronize();
std::cout << "\nDone...\n";
return 0;
}

When one worker thread fails, how to abort remaining workers?

I have a program which spawns multiple threads, each of which executes a long-running task. The main thread then waits for all worker threads to join, collects results, and exits.
If an error occurs in one of the workers, I want the remaining workers to stop gracefully, so that the main thread can exit shortly afterwards.
My question is how best to do this, when the implementation of the long-running task is provided by a library whose code I cannot modify.
Here is a simple sketch of the system, with no error handling:
void threadFunc()
{
// Do long-running stuff
}
void mainFunc()
{
std::vector<std::thread> threads;
for (int i = 0; i < 3; ++i) {
threads.push_back(std::thread(&threadFunc));
}
for (auto &t : threads) {
t.join();
}
}
If the long-running function executes a loop and I have access to the code, then
execution can be aborted simply by checking a shared "keep on running" flag at the top of each iteration.
std::mutex mutex;
bool error;
void threadFunc()
{
try {
for (...) {
{
std::unique_lock<std::mutex> lock(mutex);
if (error) {
break;
}
}
}
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
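As a variation on the flag check above, the same pattern can be sketched with a `std::atomic<bool>`, which avoids taking a mutex on every iteration. The names `g_error`, `worker`, and `runWorkers`, the iteration counts, and the simulated failure are all made up for this example.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>
#include <vector>

std::atomic<bool> g_error{false};   // shared stop flag (hypothetical name)

// Each worker counts its completed iterations; worker `failing_id` throws
// on its third iteration to simulate an error.
void worker(int id, int failing_id, int& iterations_done) {
    try {
        for (int iter = 0; iter < 1000; ++iter) {
            if (g_error.load()) break;            // another worker failed
            if (id == failing_id && iter == 2)
                throw std::runtime_error("simulated failure");
            ++iterations_done;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    } catch (const std::exception&) {
        g_error.store(true);                      // tell the others to stop
    }
}

// Runs three workers, of which worker 1 fails. Returns the number of
// iterations the failing worker completed before throwing.
int runWorkers() {
    g_error = false;
    std::vector<int> done(3, 0);
    std::vector<std::thread> threads;
    for (int id = 0; id < 3; ++id)
        threads.emplace_back(worker, id, 1, std::ref(done[id]));
    for (auto& t : threads) t.join();
    return done[1];
}
```

The other two workers notice the flag at the top of their next iteration and leave the loop long before reaching 1000 iterations.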
Now consider the case when the long-running operation is provided by a library:
std::mutex mutex;
bool error;
class Task
{
public:
// Blocks until completion, error, or stop() is called
void run();
void stop();
};
void threadFunc(Task &task)
{
try {
task.run();
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
In this case, the main thread has to handle the error, and call stop() on
the still-running tasks. As such, it cannot simply wait for each worker to
join() as in the original implementation.
The approach I have used so far is to share the following structure between
the main thread and each worker:
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
};
When a worker completes successfully, it decrements the running count. If
an exception is caught, the worker sets the error flag. In both cases, it
then calls condVar.notify_one().
The main thread then waits on the condition variable, waking up if either
error is set or running reaches zero. On waking up, the main thread
calls stop() on all tasks if error has been set.
This approach works, but I feel there should be a cleaner solution using some
of the higher-level primitives in the standard concurrency library. Can
anyone suggest an improved implementation?
Here is the complete code for my current solution:
// main.cpp
#include <chrono>
#include <condition_variable>
#include <exception>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>
#include "utils.h"
// Class which encapsulates long-running task, and provides a mechanism for aborting it
class Task
{
public:
Task(int tidx, bool fail)
: tidx(tidx)
, fail(fail)
, m_run(true)
{
}
void run()
{
static const int NUM_ITERATIONS = 10;
for (int iter = 0; iter < NUM_ITERATIONS; ++iter) {
{
std::unique_lock<std::mutex> lock(m_mutex);
if (!m_run) {
out() << "thread " << tidx << " aborting";
break;
}
}
out() << "thread " << tidx << " iter " << iter;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
if (fail) {
throw std::exception();
}
}
}
void stop()
{
std::unique_lock<std::mutex> lock(m_mutex);
m_run = false;
}
const int tidx;
const bool fail;
private:
std::mutex m_mutex;
bool m_run;
};
// Data shared between all threads
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
SharedData(int count)
: error(false)
, running(count)
{
}
};
void threadFunc(Task &task, SharedData &shared)
{
try {
out() << "thread " << task.tidx << " starting";
task.run(); // Blocks until task completes or is aborted by main thread
out() << "thread " << task.tidx << " ended";
} catch (std::exception &) {
out() << "thread " << task.tidx << " failed";
std::unique_lock<std::mutex> lock(shared.mutex);
shared.error = true;
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
--shared.running;
}
shared.condVar.notify_one();
}
int main(int argc, char **argv)
{
static const int NUM_THREADS = 3;
std::vector<std::unique_ptr<Task>> tasks(NUM_THREADS);
std::vector<std::thread> threads(NUM_THREADS);
SharedData shared(NUM_THREADS);
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
const bool fail = (tidx == 1);
tasks[tidx] = std::make_unique<Task>(tidx, fail);
threads[tidx] = std::thread(&threadFunc, std::ref(*tasks[tidx]), std::ref(shared));
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
// Wake up when either all tasks have completed, or any one has failed
shared.condVar.wait(lock, [&shared](){
return shared.error || !shared.running;
});
if (shared.error) {
out() << "error occurred - terminating remaining tasks";
for (auto &t : tasks) {
t->stop();
}
}
}
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
out() << "waiting for thread " << tidx << " to join";
threads[tidx].join();
out() << "thread " << tidx << " joined";
}
out() << "program complete";
return 0;
}
Some utility functions are defined here:
// utils.h
#include <iostream>
#include <mutex>
#include <thread>
#ifndef UTILS_H
#define UTILS_H
#if __cplusplus <= 201103L
// Backport std::make_unique from C++14
#include <memory>
namespace std {
template<typename T, typename ...Args>
std::unique_ptr<T> make_unique(
Args&& ...args)
{
return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
} // namespace std
#endif // __cplusplus <= 201103L
// Thread-safe wrapper around std::cout
class ThreadSafeStdOut
{
public:
ThreadSafeStdOut()
: m_lock(m_mutex)
{
}
~ThreadSafeStdOut()
{
std::cout << std::endl;
}
template <typename T>
ThreadSafeStdOut &operator<<(const T &obj)
{
std::cout << obj;
return *this;
}
private:
static std::mutex m_mutex;
std::unique_lock<std::mutex> m_lock;
};
std::mutex ThreadSafeStdOut::m_mutex;
// Convenience function for performing thread-safe output
ThreadSafeStdOut out()
{
return ThreadSafeStdOut();
}
#endif // UTILS_H
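One possible simplification with higher-level primitives, sketched here under assumptions: if `stop()` is safe to call from any thread (as it is for the mutex-protected `Task` above), the failing worker can stop the other tasks itself, and `std::async` with `std::future` can carry the exception back to the main thread. The `Task` below is a reduced stand-in for the library class, and `runAll` and `demoRun` are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <memory>
#include <stdexcept>
#include <thread>
#include <vector>

// Stand-in for the library task: run() blocks until done, error, or stop().
class Task {
public:
    explicit Task(bool fail) : fail_(fail) {}
    void run() {
        for (int iter = 0; iter < 1000 && run_.load(); ++iter) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            if (fail_) throw std::runtime_error("task failed");
        }
    }
    void stop() { run_.store(false); }
private:
    const bool fail_;
    std::atomic<bool> run_{true};
};

// Runs all tasks; the first one that throws stops the others.
// Returns the number of tasks that ended with an exception.
int runAll(std::vector<std::unique_ptr<Task>>& tasks) {
    std::vector<std::future<void>> results;
    for (auto& task : tasks) {
        Task* t = task.get();
        results.push_back(std::async(std::launch::async, [t, &tasks] {
            try {
                t->run();
            } catch (...) {
                for (auto& other : tasks) other->stop();  // abort the rest
                throw;                                     // report via the future
            }
        }));
    }
    int failures = 0;
    for (auto& r : results) {
        try { r.get(); } catch (const std::exception&) { ++failures; }
    }
    return failures;
}

// Three tasks, of which the second one fails.
int demoRun() {
    std::vector<std::unique_ptr<Task>> tasks;
    for (int i = 0; i < 3; ++i)
        tasks.emplace_back(new Task(i == 1));
    return runAll(tasks);
}
```

With this arrangement the main thread needs no condition variable or running count: `future::get()` both waits for each worker and rethrows its exception.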
I've been thinking about your situation for some time, and this may be of some help to you. You could try a couple of different methods to achieve your goal. There are 2-3 options that may be of use, alone or in combination. I will show at least the first option, since I'm still learning and trying to master the concepts of template specialization and lambdas.
Using a Manager Class
Using Template Specialization Encapsulation
Using Lambdas.
Pseudo code of a Manager Class would look something like this:
class ThreadManager {
private:
std::unique_ptr<MainThread> mainThread_;
std::list<std::shared_ptr<WorkerThread>> lWorkers_; // List to hold finished workers
std::queue<std::shared_ptr<WorkerThread>> qWorkers_; // Queue to hold inactive and waiting threads.
std::map<unsigned, std::shared_ptr<WorkerThread>> mThreadIds_; // Map to associate a WorkerThread with an ID value.
std::map<unsigned, bool> mFinishedThreads_; // A map to keep track of finished and unfinished threads.
bool threadError_; // Not needed if using exception handling
public:
explicit ThreadManager( const MainThread& main_thread );
void shutdownThread( const unsigned& threadId );
void shutdownAllThreads();
void addWorker( const WorkerThread& worker_thread );
bool isThreadDone( const unsigned& threadId );
void spawnMainThread() const; // Method to start main thread's work.
void spawnWorkerThread( unsigned threadId, bool& error );
bool getThreadError( unsigned& threadID ); // Returns True If Thread Encountered An Error and passes the ID of that thread,
};
I used a bool value to indicate whether a thread failed only for demonstration purposes, to keep the structure simple; you can of course substitute exceptions, invalid unsigned values, etc. if you prefer.
Using a class of this sort would look something like the following. Also note that a class of this type would arguably be better as a Singleton, since you wouldn't want more than one manager while you are working with shared pointers.
SomeClass::SomeClass( ... ) {
// This class could contain a private static smart pointer of this Manager Class
// Initialize the smart pointer giving it new memory for the Manager Class and by passing it a pointer of the Main Thread object
threadManager_ = new ThreadManager( main_thread ); // Wouldn't actually use raw pointers here unless if you had a need to, but just shown for simplicity
}
SomeClass::addThreads( ... ) {
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->addWorker( some_worker_thread );
}
}
SomeClass::someFunctionThatSpawnsThreads( ... ) {
threadManager_->spawnMainThread();
bool error = false;
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->spawnWorkerThread( u, error );
if ( error ) { // This Thread Failed To Start, Shutdown All Threads
threadManager_->shutdownAllThreads();
}
}
// If all threads spawn successfully we can do a while loop here to listen if one fails.
unsigned threadId;
while ( threadManager_->getThreadError( threadId ) ) {
// If the function passed to this while loop returns true and we end up here, it will pass the id value of the failed thread.
// We can now go through a for loop and stop all active threads.
for ( unsigned u = threadId + 1; u <= threadCount; u++ ) {
threadManager_->shutdownThread( u );
}
// We have successfully shutdown all threads
break;
}
}
I like the manager class design since I have used it in other projects, and it comes in handy quite often, especially in a code base with many resources, such as a game engine with assets like sprites, textures, audio files, maps, and game items. A manager class helps keep track of and maintain all of those assets. The same concept can be applied to managing active, inactive, and waiting threads, so that one object knows how to handle and shut down all threads properly. I would recommend using an ExceptionHandler, if your code base and libraries support exceptions and thread-safe exception handling, instead of passing bools around for errors. A Logger class is also useful: it can write to a log file and/or a console window with an explicit message saying which function threw the exception and what caused it. A log message might look like this:
Exception Thrown: someFunctionNamedThis in ThisFile on Line# (x)
threadID 021342 failed to execute.
This way you can look at the log file and find out very quickly what thread is causing the exception, instead of using passed around bool variables.
The implementation of the long-running task is provided by a library whose code I cannot modify.
That means you have no way to synchronize the job done by working threads
If an error occurs in one of the workers,
Let's suppose that you can really detect worker errors; some of them are easily detected if the library reports them, but others are not, e.g. when:
the library code loops.
the library code prematurely exit with an uncaught exception.
I want the remaining workers to stop **gracefully**
That's just not possible
The best you can do is writing a thread manager checking on worker thread status and if an error condition is detected it just (ungracefully) "kills" all the worker threads and exits.
You should also consider detecting a looped working thread (by timeout) and offer to the user the option to kill or continue waiting for the process to finish.
Your problem is that the long running function is not your code, and you say you cannot modify it. Consequently you cannot make it pay any attention whatsoever to any kind of external synchronisation primitive (condition variables, semaphores, mutexes, pipes, etc), unless the library developer has done that for you.
Therefore your only option is to do something that wrestles control away from any code no matter what it's doing. This is what signals do. For that, you're going to have to use pthread_kill(), or whatever the equivalent is these days.
The pattern would be that
The thread that detects an error needs to communicate that error back to the main thread in some manner.
The main thread then needs to call pthread_kill() for all the other remaining threads. Don't be confused by the name - pthread_kill() is simply a way of delivering an arbitrary signal to a thread. Note that signals like STOP, CONTINUE and TERMINATE are process-wide even if raised with pthread_kill(), not thread specific so don't use those.
In each of those threads you'll need a signal handler. On delivery of the signal to a thread the execution path in that thread will jump to the handler no matter what the long running function was doing.
You are now back in (limited) control, and can (probably, well, maybe) do some limited cleanup and terminate the thread.
In the meantime the main thread will have been calling pthread_join() on all the threads it's signaled, and those will now return.
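The delivery part of that pattern can be sketched as follows (POSIX only). This example shows only that `pthread_kill()` runs the handler in one specific thread. To stay within what is safe in a signal handler, the handler merely sets a flag and the stand-in worker polls it, which a real third-party function would not do. All names are hypothetical.

```cpp
#include <pthread.h>
#include <signal.h>
#include <time.h>

// Set by the handler; volatile sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_interrupted = 0;

extern "C" void onSigusr1(int) { g_interrupted = 1; }

// Stand-in for the long-running call. A real library would not poll a flag,
// which is why the answer's handler would have to end the thread itself.
extern "C" void* longRunning(void*) {
    timespec nap = {0, 10 * 1000 * 1000};   // 10 ms
    while (!g_interrupted) nanosleep(&nap, nullptr);
    return nullptr;
}

// Starts a worker, signals it with pthread_kill, and joins it.
// Returns 1 if the handler ran in response to the signal.
int signalWorker() {
    struct sigaction sa = {};
    sa.sa_handler = onSigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);       // handlers are process-wide

    pthread_t worker;
    pthread_create(&worker, nullptr, longRunning, nullptr);
    pthread_kill(worker, SIGUSR1);          // delivered to that thread only
    pthread_join(worker, nullptr);
    return g_interrupted;
}
```

SIGUSR1 is used because it has no process-wide stop or terminate semantics; the handler is installed before the worker starts so the signal cannot arrive while the default action (process termination) is still in effect.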
My thoughts:
This is a really ugly way of doing it (and signals / pthreads are notoriously difficult to get right and I'm no expert), but I don't really see what other choice you have.
It'll be a long way from looking 'graceful' in source code, though the end user experience will be OK.
You will be aborting execution part way through running that library function, so if there's any clean up it would normally do (e.g. freeing up memory it has allocated) that won't get done and you'll have a memory leak. Running under something like valgrind is a way of working out if this is happening.
The only way of getting the library function to clean up (if it needs it) will be for your signal handler to return control to the function and letting it run to completion, just what you don't want to do.
And of course, this won't work on Windows (no pthreads, at least none worth speaking of, though there may be an equivalent mechanism).
Really the best way is going to be to re-implement (if at all possible) that library function.

C/C++ pthread signals and pointers

I'm having the hardest time trying to wrap my head around how to allow threads to signal each other.
My design:
The main function creates a single master thread that coordinates a bunch of other worker threads. The main function also creates the workers because the worker threads spawn and exit at intervals programmed in the main. The master thread needs to be able to signal these worker threads and signal_broadcast them all as well as the worker threads have to signal the master back (pthread_cond_signal). Since each thread needs a pthread_mutex and pthread_cond I made a Worker class and a Master class with these variables. Now this is where I am stuck. C++ does not allow you to pass member functions as the pthread_create(...) handler so I had to make a static handler inside and pass a pointer to itself to reinterpret_cast it to use its class data...
void Worker::start() {
pthread_create(&thread, NULL, &Worker::run, this);
}
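void* Worker::run(void *ptr) {
Worker* data = reinterpret_cast<Worker*>(ptr);
// ... use data's mutex and condition variable here ...
return NULL; // a pthread start routine must return a void*
}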
The problem I have with this, probably wrong, setup is that when I passed in an array of worker pointers to the Master thread it signals a different reference of worker because I think the cast did some sort of copy. So I tried static_cast and same behavior.
I just need some sort of design where the Master and workers can pthread_cond_wait(...) and pthread_cond_signal(...) each other.
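For what it is worth, casting the `void*` back does not copy the object. A likely cause (an assumption, since the calling code is not shown) is that the Worker objects themselves were copied or moved after `start()`, for example by storing them by value in a growing container. The sketch below, with hypothetical member names, keeps the workers in a fixed array and checks that each thread modified the very object that started it.

```cpp
#include <pthread.h>

class Worker {
public:
    Worker() : thread_(), runs_(0) {}
    // Pass `this` as the void* argument; nothing is copied.
    bool start() { return pthread_create(&thread_, NULL, &Worker::run, this) == 0; }
    void join() { pthread_join(thread_, NULL); }
    int runs() const { return runs_; }
private:
    Worker(const Worker&);             // non-copyable, as in the question's Edit 1
    Worker& operator=(const Worker&);
    static void* run(void* ptr) {
        Worker* self = static_cast<Worker*>(ptr);  // same object that called start()
        ++self->runs_;
        return NULL;
    }
    pthread_t thread_;
    int runs_;
};

// Starts four workers that stay at fixed addresses and returns how many of
// them recorded exactly one run on the original object.
int startAndCheck() {
    Worker workers[4];
    for (int i = 0; i < 4; ++i) workers[i].start();
    for (int i = 0; i < 4; ++i) workers[i].join();
    int ok = 0;
    for (int i = 0; i < 4; ++i)
        if (workers[i].runs() == 1) ++ok;
    return ok;
}
```

If the Master is given pointers to these same array elements, its signals reach the objects the threads are actually waiting on.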
Edit 1
Added:
private:
Worker(const Worker&);
Still not working.
Edit Fixed the potential race in all versions:
1./1b Employs a semaphore built from a (mutex+condition+counter) as outlined in C++0x has no semaphores? How to synchronize threads?
2. uses a 'reverse' wait to ensure that a signal got ack-ed by the intended worker
I'd really suggest to use c++11 style <thread> and <condition_variable> to achieve this.
I have two (and a half) demonstrations. They each assume you have 1 master that drives 10 workers. Each worker awaits a signal before it does its work.
We'll use std::condition_variable (which works in conjunction with a std::mutex) to do the signaling. The difference between the first and second version will be the way in which the signaling is done:
1. Notifying any worker, one at a time:
1b. With a worker struct
2. Notifying all threads, coordinating which recipient worker is to respond
1. Notifying any worker, one at a time:
This is the simplest to do, because there's little coordination going on:
#include <vector>
#include <thread>
#include <mutex>
#include <algorithm>
#include <chrono>      // chrono::milliseconds
#include <cstdlib>     // rand
#include <functional>  // ref
#include <iostream>
#include <condition_variable>
using namespace std;
class semaphore
{ // see https://stackoverflow.com/questions/4792449/c0x-has-no-semaphores-how-to-synchronize-threads
std::mutex mx;
std::condition_variable cv;
unsigned long count;
public:
semaphore() : count() {}
void notify();
void wait();
};
static void run(int id, struct master& m);
struct master
{
mutable semaphore sem;
master()
{
for (int i = 0; i<10; ++i)
threads.emplace_back(run, i, ref(*this));
}
~master() {
for(auto& th : threads) if (th.joinable()) th.join();
std::cout << "done\n";
}
void drive()
{
// do wakeups
for (unsigned i = 0; i<threads.size(); ++i)
{
this_thread::sleep_for(chrono::milliseconds(rand()%100));
sem.notify();
}
}
private:
vector<thread> threads;
};
static void run(int id, master& m)
{
m.sem.wait();
{
static mutex io_mx;
lock_guard<mutex> lk(io_mx);
cout << "signaled: " << id << "\n";
}
}
int main()
{
master instance;
instance.drive();
}
/// semaphore members
void semaphore::notify()
{
lock_guard<mutex> lk(mx);
++count;
cv.notify_one();
}
void semaphore::wait()
{
unique_lock<mutex> lk(mx);
while(!count)
cv.wait(lk);
--count;
}
1b. With a worker struct
Note, if you had worker classes with worker::run a non-static member function, you can do the same with minor modifications:
struct worker
{
worker(int id) : id(id) {}
void run(master& m) const;
int id;
};
// ...
struct master
{
// ...
master()
{
for (int i = 0; i<10; ++i)
workers.emplace_back(i);
for (auto& w: workers)
threads.emplace_back(&worker::run, ref(w), ref(*this));
}
// ...
void worker::run(master& m) const
{
m.sem.wait();
{
static mutex io_mx;
lock_guard<mutex> lk(io_mx);
cout << "signaled: " << id << "\n";
}
}
A caveat
cv.wait() can suffer spurious wake-ups, in which the condition variable wasn't actually notified (e.g. in the event of OS signal handlers). This can happen with condition variables on any platform.
The following approach fixes this:
2. Notifying all threads, coordinating which recipient worker is to respond
Use a flag to signal which thread was intended to receive the signal:
struct master
{
mutable mutex mx;
mutable condition_variable cv;
int signaled_id; // ADDED
master() : signaled_id(-1)
{
Let's pretend that driver got a lot more interesting and wants to signal all workers in a specific (random...) order:
void drive()
{
// generate random wakeup order
vector<int> wakeups(10);
iota(begin(wakeups), end(wakeups), 0); // needs <numeric>
random_shuffle(begin(wakeups), end(wakeups)); // removed in C++17; use shuffle() with a random engine there
// do wakeups
for (int id : wakeups)
{
this_thread::sleep_for(chrono::milliseconds(rand()%1000));
signal(id);
}
}
private:
void signal(int id) // ADDED id
{
unique_lock<mutex> lk(mx);
std::cout << "signaling " << id << "\n";
signaled_id = id; // ADDED put it in the shared field
cv.notify_all();
cv.wait(lk, [&] { return signaled_id == -1; });
}
Now all we have to do is make sure that the receiving thread checks that its id matches:
m.cv.wait(lk, [&] { return m.signaled_id == id; });
m.signaled_id = -1;
m.cv.notify_all();
This puts an end to spurious wake-ups.
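The fragments of version 2 can be put together into one small program. What follows is only a minimal sketch under a few assumptions: the `handled` vector and the `run_in_order` helper are hypothetical names added to make the wake-up order observable, and the ids are assumed to be distinct. The point it illustrates is that the signal is stored in shared state (`signaled_id`), so a worker that starts waiting late still sees it, and the master's 'reverse' wait keeps it from overwriting a signal that has not been acknowledged yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct master {
    std::mutex mx;
    std::condition_variable cv;
    int signaled_id = -1;       // -1 means "no signal pending"
    std::vector<int> handled;   // hypothetical: records who responded, in order

    // Wake exactly the worker with this id and wait for its acknowledgement.
    void signal(int id) {
        std::unique_lock<std::mutex> lk(mx);
        signaled_id = id;
        cv.notify_all();
        // 'reverse' wait: block until the addressed worker has reset the field
        cv.wait(lk, [&] { return signaled_id == -1; });
    }

    // Body of one worker thread: sleep until addressed, then acknowledge.
    void worker(int id) {
        std::unique_lock<std::mutex> lk(mx);
        cv.wait(lk, [&] { return signaled_id == id; });
        assert(signaled_id == id);  // spurious wake-ups never get past the predicate
        handled.push_back(id);
        signaled_id = -1;           // acknowledge
        cv.notify_all();            // wakes the master; other workers re-check and sleep
    }
};

// Starts one worker per id, signals them in the given order and
// returns the order in which they actually responded.
std::vector<int> run_in_order(const std::vector<int>& order) {
    master m;
    std::vector<std::thread> threads;
    for (int id : order) threads.emplace_back(&master::worker, &m, id);
    for (int id : order) m.signal(id);
    for (auto& t : threads) t.join();
    return m.handled;
}
```

Calling `run_in_order({2, 0, 1})` should give back the same sequence, since each `signal()` returns only after the addressed worker has responded.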
Full code listings/live demos:
1. notify_one.cpp http://coliru.stacked-crooked.com/view?id=c968f8cffd57afc2a0c6777105203f85-03e740563a9d9c6bf97614ba6099fe92
1b. id. with worker struct: http://coliru.stacked-crooked.com/view?id=7bd224c42130a0461b0c894e0b7c74ae-03e740563a9d9c6bf97614ba6099fe92
2. notify_all.cpp http://coliru.stacked-crooked.com/view?id=1d3145ccbb93c1bec03b232d372277b8-03e740563a9d9c6bf97614ba6099fe92
It is not clear what your exact circumstances are, but it seems like you are using a container to hold your "Worker" instances that are created in main, and passing them to your "Master". If this is the case, there are a few remedies available to you. You need to pick one that is appropriate to your implementation.
Pass a reference to the container in main to the Master.
Change the container to hold (smart) pointers to Workers.
Make the container part of "Master" itself, so that it doesn't need to be passed to it.
Implement a proper destructor, copy constructor, and assignment operator for your Worker class (in other words, obey the Rule of Three).
Technically speaking, since pthread_create() is a C API, the function pointer that is passed to it needs to have C linkage (extern "C"). You can't make a method of a C++ class have C linkage, so you should define an external function:
extern "C" { static void * worker_run (void *arg); }
class Worker { //...
};
static void * worker_run (void *arg) {
return Worker::run(arg);
}
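To illustrate the point about the cast: passing `this` through `void*` and casting it back does not copy anything, so if the Master ends up talking to a different Worker, the copy most likely happened where the Workers are stored or passed around. Here is a minimal sketch, assuming a hypothetical `worker_entry` trampoline and a `seen_self` member added purely for the check (neither is from the question):

```cpp
#include <cassert>
#include <pthread.h>

// C-linkage trampoline handed to pthread_create (hypothetical name).
extern "C" void* worker_entry(void* arg);

class Worker {
public:
    Worker() : seen_self(nullptr) {}
    Worker(const Worker&) = delete;             // forbid accidental copies
    Worker& operator=(const Worker&) = delete;

    void start() {
        int rc = pthread_create(&thread, nullptr, &worker_entry, this);
        assert(rc == 0);
        (void)rc;
    }
    void join() { pthread_join(thread, nullptr); }

    // Runs on the new thread; records which object it was invoked on.
    void run() { seen_self = this; }

    const Worker* seen_self;  // added for illustration only

private:
    pthread_t thread;
};

extern "C" void* worker_entry(void* arg) {
    // The pointer round-trips unchanged: no copy of the Worker is made.
    static_cast<Worker*>(arg)->run();
    return nullptr;
}

// Returns true if the thread ran on the very object that started it.
bool same_object_demo() {
    Worker w;
    w.start();
    w.join();
    return w.seen_self == &w;
}
```

With the copy operations deleted, any place that tries to copy a Worker (for example storing Workers by value in a container that reallocates) is rejected at compile time instead of silently producing a second mutex and condition variable.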

Multi Threading Using Boost C++ - Synchronisation Issue

I would like to do multithreading where Thread ONE passes data to 4-5 worker threads which process the data, and once ALL worker threads are finished I would like to continue. I'm using Boost to realize that, but I have a synchronisation problem: at some point the program stops and doesn't continue working.
I used OpenMP before and that works nicely but I would like to set the thread priorities individually and I could not figure out how to do that with OpenMP therefore I worked on my own solution:
I would be very glad if someone could give hints to find the bug in this code or could help me find another approach to the problem.
Thank you,
KmgL
#include <QCoreApplication>
#include <boost/thread.hpp>
#define N_CORE 6
#define N_POINTS 10
#define N_RUNS 100000
class Sema{
public:
Sema(int _n =0): m_count(_n),m_mut(),m_cond(){}
void set(int _n)
{
boost::unique_lock<boost::mutex> w_lock(m_mut);
m_count = -_n;
}
void wait()
{
boost::unique_lock<boost::mutex> lock(m_mut);
while (m_count < 0)
{
m_cond.wait(lock);
}
--m_count;
}
void post()
{
boost::unique_lock<boost::mutex> lock(m_mut);
++m_count;
m_cond.notify_all();
}
private:
boost::condition_variable m_cond;
boost::mutex m_mut;
int m_count;
};
class Pool
{
private:
boost::thread m_WorkerThread;
boost::condition_variable m_startWork;
bool m_WorkerRun;
bool m_InnerRun;
Sema * m_sem;
std::vector<int> *m_Ep;
std::vector<int> m_ret;
void calc()
{
unsigned int no_pt(m_Ep->size());
std::vector<int> c_ret;
for(unsigned int i=0;i<no_pt;i++)
c_ret.push_back(100 + m_Ep->at(i));
m_ret = c_ret;
}
void run()
{
boost::mutex WaitWorker_MUTEX;
while(m_WorkerRun)
{
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
m_startWork.wait(u_lock);
calc();
m_sem->post();
}
}
public:
Pool():m_WorkerRun(false),m_InnerRun(false){}
~Pool(){}
void start(Sema * _sem){
m_WorkerRun = true;
m_sem = _sem;
m_ret.clear();
m_WorkerThread = boost::thread(&Pool::run, this);}
void stop(){m_WorkerRun = false;}
void join(){m_WorkerThread.join();}
void newWork(std::vector<int> &Ep)
{
m_Ep = &Ep;
m_startWork.notify_all();
}
std::vector<int> getWork(){return m_ret;}
};
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
Pool TP[N_CORE];
Sema _sem(0);
for(int k=0;k<N_CORE;k++)
TP[k].start(&_sem);
boost::this_thread::sleep(boost::posix_time::milliseconds(10));
std::vector<int> V[N_CORE];
for(int k=0;k<N_CORE;k++)
for(int i=0;i<N_POINTS;i++)
{
V[k].push_back((k+1)*1000+i);
}
for(int j=0;j<N_RUNS;j++)
{
_sem.set(N_CORE);
for(int k=0;k<N_CORE;k++)
{
TP[k].newWork(V[k]);
}
_sem.wait();
for(int k=0;k<N_CORE;k++)
{
V[k].clear();
V[k]=TP[k].getWork();
if(V[k].size()!=N_POINTS)
std::cout<<"ERROR: "<<"V["<<k<<"].size(): "<<V[k].size()<<std::endl;
}
if((j+1)%100==0)
std::cout<<"LOOP: "<<j+1<<std::endl;
}
std::cout<<"FINISHED: "<<std::endl;
return a.exec();
}
You have a race between the calls to Pool::newWork() and Pool::run().
You have to remember that signaling/broadcasting a condition variable is not a sticky event. If your thread is not waiting on the condition variable at the time of the signaling, the signal will be lost. This is what can happen in your program: nothing prevents your main thread from calling Pool::newWork() on each of your Pool objects before they have had time to call wait() on your condition variable.
To solve this, you need to make boost::mutex WaitWorker_MUTEX a class member instead of a local variable. Pool::newWork() needs to grab that mutex before doing updates:
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
m_Ep = &Ep;
m_startWork.notify_one(); // no need to use notify_all()
Since you're using a condition variable in Pool::run(), you need to handle spurious wakeups. I would recommend setting m_Ep to NULL when you construct the object and every time you're done with the work item:
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
while (1) {
while (m_Ep == NULL && m_WorkerRun) {
m_startWork.wait(u_lock);
}
if (!m_WorkerRun) {
return;
}
calc();
m_sem->post();
m_Ep = NULL;
}
stop() will need to grab the mutex and call notify_one():
boost::unique_lock<boost::mutex> u_lock(WaitWorker_MUTEX);
m_WorkerRun = false;
m_startWork.notify_one();
These changes should make the 10ms sleep you have unnecessary. You do not seem to call Pool::stop() or Pool::join(). You should change your code to call them.
You'll also get better performance by working on m_ret in Pool::calc() than copying the result at the end. You're also doing copies when you return the work. You might want Pool::getWork() to return a const ref to m_ret.
I have not run this code so there might be other issues. It should help you move
It seems from your code that you're probably wondering why condition variables need to go hand in hand with a mutex (because you declare one local mutex in Pool::run()). I hope my fix makes it clearer.
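Putting those pieces together, here is a minimal sketch of one worker whose wake-up cannot be lost. It makes several assumptions: it uses the C++11 std:: primitives instead of Boost, it replaces the shared Sema with a per-worker `done` flag, and the names `Worker` and `run_rounds` are hypothetical. The key idea is the same as above: the "there is work" state lives in a variable protected by the mutex, and every wait re-checks it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Worker {
    std::mutex mx;
    std::condition_variable cv;
    const std::vector<int>* input = nullptr;  // non-null means "work pending"
    std::vector<int> result;
    bool done = false;
    bool running = true;
    std::thread th;  // declared last so the members above exist before it starts

    void run() {
        std::unique_lock<std::mutex> lk(mx);
        for (;;) {
            // The predicate covers both spurious wake-ups and early notifications.
            cv.wait(lk, [&] { return input != nullptr || !running; });
            if (!running) return;
            assert(input != nullptr);
            result.clear();
            for (int v : *input) result.push_back(100 + v);
            input = nullptr;
            done = true;
            cv.notify_all();
        }
    }

public:
    Worker() : th(&Worker::run, this) {}
    ~Worker() {
        { std::lock_guard<std::mutex> lk(mx); running = false; }
        cv.notify_all();
        th.join();
    }
    void newWork(const std::vector<int>& in) {
        { std::lock_guard<std::mutex> lk(mx); input = &in; done = false; }
        cv.notify_all();
    }
    // Blocks until the current work item is finished, then returns a copy.
    std::vector<int> getWork() {
        std::unique_lock<std::mutex> lk(mx);
        cv.wait(lk, [&] { return done; });
        return result;
    }
};

// Runs the given number of rounds on 4 workers; returns how many rounds were correct.
int run_rounds(int rounds) {
    std::vector<int> data[4];
    for (int k = 0; k < 4; ++k) data[k] = {k, k + 1, k + 2};
    Worker pool[4];
    int ok = 0;
    for (int j = 0; j < rounds; ++j) {
        for (int k = 0; k < 4; ++k) pool[k].newWork(data[k]);
        bool good = true;
        for (int k = 0; k < 4; ++k) {
            std::vector<int> r = pool[k].getWork();
            good = good && r.size() == 3 && r[0] == 100 + k;
        }
        if (good) ++ok;
    }
    return ok;
}
```

No start-up sleep is needed here: if newWork() runs before the thread reaches its wait, the predicate is already true and the worker proceeds immediately.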
It could be done with Boost futures. Start the threads then wait for all of them to finish. No other synchronization needed.
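As a rough illustration of the futures idea, here is a sketch that uses std::async and std::future from C++11 as a stand-in for the Boost equivalents (an assumption; the answer names Boost futures). The function names `add_hundred` and `process_all` are hypothetical, and `add_hundred` only mirrors what calc() does in the question.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Stand-in for the per-thread computation (adds 100 to every element).
std::vector<int> add_hundred(std::vector<int> in) {
    for (int& v : in) v += 100;
    return in;
}

// Launches one asynchronous task per chunk, then waits for all of them.
std::vector<std::vector<int>> process_all(const std::vector<std::vector<int>>& chunks) {
    std::vector<std::future<std::vector<int>>> pending;
    for (const auto& c : chunks)
        pending.push_back(std::async(std::launch::async, add_hundred, c));

    std::vector<std::vector<int>> out;
    for (auto& f : pending)
        out.push_back(f.get());  // get() blocks until that task has finished
    assert(out.size() == chunks.size());
    return out;
}
```

The trade-off is that this starts fresh tasks on every round rather than reusing long-lived threads, and std::async gives no control over thread priorities, which was one of the stated goals.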

consumer/producer in c++

This is a classic producer/consumer problem where some threads produce data while others read the data. The producers and consumers share a fixed-size buffer. If the buffer is empty the consumers have to wait, and if it is full the producers have to wait. I am using semaphores to keep track of full or empty queues. The producer decrements the free-slots semaphore, adds a value, and increments the filled-slots semaphore. I am trying to implement a program that gets some numbers from a generator function and then prints out the average of the numbers. By treating this as a producer-consumer problem, I am trying to save some time in the execution of the program. The generateNumber function causes some delay, so I want to create a number of threads that generate numbers and put them into a queue. Then the "main thread", which is running the main function, has to read from the queue and compute the sum and then the average. Here is what I have so far:
#include <cstdio>
#include <cstdlib>
#include <time.h>
#include "Thread.h"
#include <queue>
int generateNumber() {
int delayms = rand() / (float) RAND_MAX * 400.f + 200;
int result = rand() / (float) RAND_MAX * 20;
struct timespec ts;
ts.tv_sec = 0;
ts.tv_nsec = delayms * 1000000;
nanosleep(&ts, NULL);
return result; }
struct threadarg {
Semaphore filled(0);
Semaphore empty(n);
std::queue<int> q; };
void* threadfunc(void *arg) {
threadarg *targp = (threadarg *) arg;
threadarg &targ = *targp;
while (targ.empty.value() != 0) {
int val = generateNumber();
targ.empty.dec();
q.push_back(val);
targ.filled.inc(); }
}
int main(int argc, char **argv) {
Thread consumer, producer;
// read the command line arguments
if (argc != 2) {
printf("usage: %s [nums to average]\n", argv[0]);
exit(1); }
int n = atoi(argv[1]);
// Seed random number generator
srand(time(NULL));
}
I am a bit confused now because I am not sure how to create multiple producer threads that generate numbers (if q is not full) while the consumer is reading from the queue (if q is not empty). I am not sure what to put in main to implement it.
Also, in "Thread.h" you can create a thread, a mutex, or a semaphore. The thread has the methods .run(threadFunc, arg), .join(), etc. A mutex can be locked or unlocked. The semaphore methods have all been used in my code.
Your queue is not synchronized, so multiple producers could call push at the same time, or at the same time as the consumer is calling pop ... this will break.
The simple approach to making this work is to use a thread-safe queue, which can be a wrapper around the std::queue you already have, plus a mutex.
You can start by adding a mutex, and locking/unlocking it around each call you forward to std::queue. For a single consumer that should be sufficient; for multiple consumers you'd need to fuse front() and pop() into a single synchronized call.
To let the consumer block while the queue is empty, you can add a condition variable to your wrapper.
That should be enough that you can find the answer online - sample code below.
#include <condition_variable>
#include <mutex>
#include <queue>
template <typename T> class SynchronizedQueue
{
std::queue<T> queue_;
std::mutex mutex_;
std::condition_variable condvar_;
typedef std::lock_guard<std::mutex> lock;
typedef std::unique_lock<std::mutex> ulock;
public:
void push(T const &val)
{
{
lock l(mutex_); // prevents multiple pushes corrupting queue_
queue_.push(val);
}
// always notify: waking only when the queue was empty can strand a second waiting consumer
condvar_.notify_one();
}
T pop()
{
ulock u(mutex_);
while (queue_.empty())
condvar_.wait(u);
// now queue_ is non-empty and we still have the lock
T retval = queue_.front();
queue_.pop();
return retval;
}
};
Replace std::mutex et al with whatever primitives your "Thread.h" gives you.
What I would do is this:
Make a data class that hides your queue
Create thread-safe accessor methods for saving a piece of data to the q, and removing a piece of data from the q (I would use a single mutex, or a critical section, for the accessors)
Handle the case where a consumer doesn't have any data to work with (sleep)
Handle the case where the q is becoming too full, and the producers need to slow down
Let the threads go willy-nilly adding and removing as they produce / consume
Also, remember to add a sleep into each and every thread, or else you'll peg the CPU and not give the thread scheduler a good spot to switch contexts and share the CPU with other threads / processes. You don't need to, but it's a good practice.
When managing shared state like this, you need a condition variable and
a mutex. The basic pattern is a function along the lines of:
ScopedLock l( theMutex );
while ( !conditionMet ) {
theCondition.wait( theMutex );
}
doWhatever();
theCondition.notify();
In your case, I'd probably make the condition variable and the mutex
members of the class implementing the queue. To write, the
conditionMet would be !queue.full(), so you'd end up with something
like:
ScopedLock l( queue.myMutex );
while ( queue.full() ) {
queue.myCondition.wait( queue.myMutex );
}
queue.insert( whatever );
queue.myCondition.notify();
and to read:
ScopedLock l( queue.myMutex );
while ( queue.empty() ) {
queue.myCondition.wait( queue.myMutex );
}
results = queue.extract();
queue.myCondition.notify();
return results;
Depending on the threading interface, there may be two notify
functions: notify one (which wakes up a single thread), and notify all
(which wakes up all of the waiting threads); in this case, you'll need
notify all (or you'll need two condition variables, one for space to
write, and one for something to read, with each function waiting on one,
but notifying the other).
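The two-condition-variable variant mentioned at the end could look roughly like this with the C++11 primitives. This is a sketch under assumptions: `BoundedQueue`, `not_full`, `not_empty` and `sum_via_queue` are hypothetical names, and the capacity of 4 is arbitrary. Each side waits on its own condition and notifies the other, so notify_one is enough.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class BoundedQueue {
    std::queue<T> q;
    const std::size_t capacity;
    std::mutex mx;
    std::condition_variable not_full;   // writers wait here
    std::condition_variable not_empty;  // readers wait here

public:
    explicit BoundedQueue(std::size_t cap) : capacity(cap) {}

    void insert(T v) {
        std::unique_lock<std::mutex> lk(mx);
        not_full.wait(lk, [&] { return q.size() < capacity; });
        q.push(std::move(v));
        assert(q.size() <= capacity);
        not_empty.notify_one();  // exactly one new item, so one reader suffices
    }

    T extract() {
        std::unique_lock<std::mutex> lk(mx);
        not_empty.wait(lk, [&] { return !q.empty(); });
        T v = std::move(q.front());
        q.pop();
        not_full.notify_one();   // exactly one slot was freed
        return v;
    }
};

// Several producers each push 1..per_producer; the caller consumes and sums everything.
long sum_via_queue(int producers, int per_producer) {
    BoundedQueue<int> bq(4);
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
        threads.emplace_back([&bq, per_producer] {
            for (int i = 1; i <= per_producer; ++i) bq.insert(i);
        });
    long total = 0;
    for (int i = 0; i < producers * per_producer; ++i) total += bq.extract();
    for (auto& t : threads) t.join();
    return total;
}
```

With a single shared condition variable, notify_one could wake another writer when a reader was needed, which is why that version needs notify all.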
Protect the queue accesses with a mutex, that should be it. A 'Computer Science 101' bounded producer-consumer queue needs two semaphores, (to manage the free/empty count and for producers/consumers to wait on, as you are already doing), and one mutex/futex/criticalSection to protect the queue.
I don't see how replacing the semaphores and mutex with condvars is any great help. What's the point? How do you implement a bounded producer-consumer queue with condvars that works on all platforms with multiple producers/consumers?
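For comparison, the two-semaphores-plus-one-mutex layout described above might be sketched as follows. Since C++11 has no semaphore type, the `Semaphore` class here is built from a mutex, a condition variable and a counter; it only imitates the inc()/dec() interface used in the question and is not the asker's "Thread.h". The names `Shared`, `produce` and `average` are hypothetical, and the producers push a simple counter instead of calling generateNumber().

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Semaphore {
    std::mutex mx;
    std::condition_variable cv;
    unsigned count;
public:
    explicit Semaphore(unsigned initial) : count(initial) {}
    void inc() {
        { std::lock_guard<std::mutex> lk(mx); ++count; }
        cv.notify_one();
    }
    void dec() {  // blocks while the count is zero
        std::unique_lock<std::mutex> lk(mx);
        cv.wait(lk, [&] { return count > 0; });
        --count;
    }
};

struct Shared {
    Semaphore filled;  // number of items ready to read
    Semaphore empty;   // number of free slots
    std::mutex qmx;    // protects q itself
    std::queue<int> q;
    explicit Shared(unsigned slots) : filled(0), empty(slots) {}
};

void produce(Shared& s, int howMany) {
    for (int i = 0; i < howMany; ++i) {
        s.empty.dec();                                       // wait for a free slot
        { std::lock_guard<std::mutex> lk(s.qmx); s.q.push(i); }
        s.filled.inc();                                      // announce one item
    }
}

// Main thread acts as the single consumer and averages everything produced.
double average(int producers, int perProducer) {
    Shared s(8);
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
        threads.emplace_back(produce, std::ref(s), perProducer);
    long sum = 0;
    const int n = producers * perProducer;
    for (int i = 0; i < n; ++i) {
        s.filled.dec();                                      // wait for an item
        {
            std::lock_guard<std::mutex> lk(s.qmx);
            assert(!s.q.empty());
            sum += s.q.front();
            s.q.pop();
        }
        s.empty.inc();                                       // free the slot
    }
    for (auto& t : threads) t.join();
    return static_cast<double>(sum) / n;
}
```

Each producer knows up front how many values to generate, which avoids the question's loop condition on the semaphore's current value; that check is racy once several producers run.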
#include<cstdlib> // rand
#include<iostream>
#include<deque>
#include<mutex>
#include<chrono>
#include<condition_variable>
#include<thread>
using namespace std;
mutex mu,c_out;
condition_variable cv;
class Buffer
{
public:
Buffer() {}
void add(int ele)
{
unique_lock<mutex> ulock(mu);
cv.wait(ulock,[this](){return q.size()<_size;});
q.push_back(ele);
ulock.unlock(); // release via the lock object; calling mu.unlock() here would unlock twice
cv.notify_all();
return;
}
int remove()
{
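unique_lock<mutex> ulock(mu);
cv.wait(ulock,[this](){return q.size()>0;});
int v=q.back(); // takes the newest element (LIFO); use front()/pop_front() for FIFO order
q.pop_back();
ulock.unlock(); // release via the lock object, not mu.unlock()
cv.notify_all();
return v;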
}
int calculateAvarage()
{
int total=0;
unique_lock<mutex> ulock(mu);
cv.wait(ulock,[this](){return q.size()>0;});
deque<int>::iterator it = q.begin();
while (it != q.end())
{
total += *it;
std::cout << ' ' << *it++;
}
return total/q.size();
}
private:
deque<int> q;
const unsigned int _size=10;
};
class Producer
{
public:
Producer(Buffer *_bf=NULL)
{
this->bf=_bf;
}
void Produce()
{
while(true)
{
int num=rand()%10;
bf->add(num);
c_out.lock();
cout<<"Produced:"<<num<<"avarage:"<<bf->calculateAvarage()<<endl;
this_thread::sleep_for(chrono::microseconds(5000));
c_out.unlock();
}
}
private:
Buffer *bf;
};
class Consumer
{
public:
Consumer(Buffer *_bf=NULL)
{
this->bf=_bf;
}
void Consume()
{
while (true)
{
int num=bf->remove();
c_out.lock();
cout<<"Consumed:"<<num<<"avarage:"<<bf->calculateAvarage()<<endl;
this_thread::sleep_for(chrono::milliseconds(5000));
c_out.unlock();
}
}
private:
Buffer *bf;
};
int main()
{
Buffer b;
Consumer c(&b);
Producer p(&b);
thread th1(&Producer::Produce,&p);
thread th2(&Consumer::Consume,&c);
th1.join();
th2.join();
return 0;
}
The Buffer class holds a double-ended queue with a maximum size of 10.
It has two functions, one to add to the queue and one to remove from it.
Buffer also has a calculateAvarage() function, which computes the average each time an element is added or removed.
There are two more classes, Producer and Consumer, each holding a pointer to the Buffer.
Consumer has Consume() and Producer has Produce().
Consume(): locks the buffer, waits until its size is not 0, removes an element, unlocks, and notifies the producer.
Produce(): locks the buffer, waits until its size is below the maximum buffer size, adds an element, unlocks, and notifies the consumer.