How to assign N tasks to at most M threads? - C++

I'm new to C++ and trying to get my head around multithreading. I've got the basics covered. Now imagine this situation:
I have, say, N tasks that I want completed as soon as possible. That's easy: just start N threads and lean back. But I'm not sure this will work for N=200 or more.
So I’d like to say: I have N tasks, and I want to start a limited number of M worker threads. How do I schedule a task to be issued to a new thread once one of the previous threads has finished?
Or is all this taken care of by the OS or runtime, and I need not worry at all, even if N gets really big?

No, you don’t want to create 200 threads. While it would likely work just fine, creating a thread involves significant processing overhead. Rather, you want a “task queue” system, where a pool of worker threads (generally equal in size to the number of CPU cores) draw from a shared queue of things that need to be done. Intel TBB contains a commonly used task queue implementation, but there are others as well.
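When all N tasks are known up front, a very small version of this idea needs no queue class at all: start M `std::thread`s and let each one claim the next unprocessed task index from a shared `std::atomic` counter. This is a sketch under the assumption that tasks are plain `std::function<void()>` objects; `run_with_m_threads` is a hypothetical helper name, not a library function.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task in `tasks` using at most `m` worker threads.
// Each worker repeatedly claims the next unclaimed index with fetch_add,
// so a worker that finishes early simply picks up more tasks.
void run_with_m_threads(const std::vector<std::function<void()>>& tasks, std::size_t m)
{
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (std::size_t i = next.fetch_add(1); i < tasks.size(); i = next.fetch_add(1)) {
            tasks[i]();
        }
    };
    // never start more threads than there are tasks, but always at least one
    m = std::max<std::size_t>(1, std::min(m, tasks.size()));
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < m; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();  // returns once all N tasks have run
}
```

Unlike a real task queue, this cannot accept new work while it runs; it fits the "N known tasks, M threads" case from the question.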

std::thread::hardware_concurrency may be useful to decide how many threads you want. If it returns anything but 0, it is the number of threads that can run simultaneously. It's often the number of CPU cores multiplied by the number of hyperthreads each core can run: 12 cores with 2 hyperthreads per core makes 24. For CPU-bound work, exceeding this number will likely just slow everything down.
Since creating threads is somewhat expensive, you can create a pool of threads standing by to grab work on your command. If you have 1000000 tasks to deal with, you want the 24 threads (in this example) to be up all the time.
This is a very common scenario, and since C++17 many of the standard algorithms, like std::for_each, have overloads that take an execution policy. If you ask for parallel execution, the implementation will (most likely) use a built-in thread pool to finish the task.
Example:
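#include <algorithm>
#include <execution>
#include <vector>

// placeholder: substitute the real types of your data and results
using some_type = int;

struct Task {
    some_type data_to_work_on;
    some_type result;
};

int main() {
    std::vector<Task> tasks; // fill with your tasks
    // With GCC's libstdc++, the parallel policies may require linking Intel TBB (-ltbb)
    std::for_each(std::execution::par, tasks.begin(), tasks.end(), [](Task& t) {
        // work on task `t` here
    });
    // all tasks done, check the result in each.
}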
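`std::thread::hardware_concurrency()` is only a hint and may return 0, so a pool size derived from it needs a fallback. A minimal sketch; the fallback value of 2 is an arbitrary assumption, and `pool_size` is a hypothetical helper name:

```cpp
#include <thread>

// Worker-thread count for a pool: the hardware hint when it is known,
// otherwise an arbitrary fallback (assumption: 2 threads).
unsigned pool_size()
{
    const unsigned hint = std::thread::hardware_concurrency();  // 0 means "not computable"
    return hint != 0 ? hint : 2u;
}
```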

I have N tasks, and I want to start a limited number of M worker threads.
How do I schedule a task to be issued to a new thread once
one of the previous threads has finished?
Set your thread pool size, M, taking into account the number of threads available in your system (hardware_concurrency).
Use a counting_semaphore to make sure you don't launch a task if there is not an available thread pool slot.
Loop through your N tasks, acquiring a thread pool slot, running the task, and releasing the thread pool slot. Notice that, since tasks are launched asynchronously, you will be able to have M tasks running in parallel.
#include <algorithm> // clamp, max
#include <cstddef> // size_t, ptrdiff_t
#include <future> // async
#include <iostream> // cout
#include <semaphore> // counting_semaphore (C++20)
#include <thread> // hardware_concurrency
#include <vector>

// hardware_concurrency() may return 0 when unknown; fall back to 1 thread then
static const size_t THREAD_POOL_SIZE_DEFAULT{ std::max(1u, std::thread::hardware_concurrency()) };
static const size_t THREAD_POOL_SIZE_MAX{ THREAD_POOL_SIZE_DEFAULT * 2 };
static const size_t NUM_TASKS_DEFAULT{ 20 };

template <typename F>
void run_tasks(
    F&& f,
    size_t thread_pool_size = THREAD_POOL_SIZE_DEFAULT,
    size_t num_tasks = NUM_TASKS_DEFAULT)
{
    // a pool of 0 slots would block forever in acquire()
    thread_pool_size = std::clamp(thread_pool_size, size_t{ 1 }, THREAD_POOL_SIZE_MAX);
    std::counting_semaphore<> task_slots(static_cast<std::ptrdiff_t>(thread_pool_size));
    auto futures{ std::vector<std::future<void>>(num_tasks) };
    auto task_results{ std::vector<int>(num_tasks) };
    // We can run thread_pool_size tasks in parallel.
    // If all task slots are busy, we have to wait for a task to finish.
    for (size_t i{ 0 }; i < num_tasks; ++i)
    {
        // Wait for a task slot to be free
        task_slots.acquire();
        futures[i] = std::async(
            std::launch::async,
            [i, &f, &task_result = task_results[i], &task_slots]() {
                // Execute task (f is called many times, so it is not forwarded)
                task_result = f(i);
                // Release the task slot
                task_slots.release();
            }
        );
    }
    // Wait for all the tasks to finish
    for (auto& future : futures) { future.get(); }
    for (auto& result : task_results) { std::cout << result << " "; }
}

int main()
{
    run_tasks([](int i) { return i * i; }, 4, 20);
}

This is my take on a thread pool (not extensively debugged yet). In main, it starts a thread pool with the maximum number of threads the hardware allows (the hardware_concurrency value Ted Lyngmo was referring to).
Quite a few things are involved, since this thread pool also allows callers to get back the results of asynchronously started calls:
std::shared_future (to return a result to the caller if needed)
std::packaged_task (to hold a call)
std::condition_variable (to communicate that stuff has entered the queue, or to signal that all threads should stop)
std::mutex/std::unique_lock (to protect the queue of calls)
std::thread (of course)
use of lambdas
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>
//=====================================================================================================================================
namespace details
{
    // task_itf is something the threadpool can call to start a scheduled function call
    // independent of argument and/or return value types
    class task_itf
    {
    public:
        virtual ~task_itf() = default;
        virtual void execute() = 0;
    };
    //-------------------------------------------------------------------------------------------------------------------------------------
    // A task is a container for a function call (arguments bound in a lambda) plus a future.
    // It is specialized for the return value type of the function call,
    // which the future also needs.
    //
    template<typename retval_t>
    class task final :
        public task_itf
    {
    public:
        template<typename lambda_t>
        explicit task(lambda_t&& lambda) :
            m_task(std::forward<lambda_t>(lambda))
        {
        }
        // note: only one of the two getters may be called, and only once
        std::future<retval_t> get_future()
        {
            return m_task.get_future();
        }
        std::shared_future<retval_t> get_shared_future()
        {
            return m_task.get_future().share();
        }
        void execute() override
        {
            m_task();
        }
    private:
        std::packaged_task<retval_t()> m_task;
    };
}
//-------------------------------------------------------------------------------------------------------------------------------------
// actual thread_pool class
class thread_pool_t
{
public:
    // construct a thread_pool with the specified number of threads (at least 1).
    // There is no need to wait for the threads to start: tasks queued
    // before a thread is running simply wait in the queue.
    explicit thread_pool_t(const std::size_t size)
    {
        for (std::size_t n = 0; n < std::max<std::size_t>(size, 1); ++n)
        {
            m_threads.emplace_back([this] { thread_loop(); });
        }
    }
    // destructor signals all threads to stop as soon as they are done with their
    // current task (tasks still queued are dropped), then waits for them to stop.
    ~thread_pool_t()
    {
        {
            std::unique_lock<std::mutex> lock(m_queue_mutex);
            m_stop = true;
        }
        m_wakeup.notify_all();
        for (auto& thread : m_threads)
        {
            thread.join();
        }
    }
    // pass a function asynchronously to the threadpool.
    // This function returns a future so the calling thread
    // may synchronize with the result if it so wishes.
    template<typename lambda_t>
    auto async(lambda_t&& lambda)
    {
        using retval_t = decltype(lambda());
        auto task = std::make_shared<details::task<retval_t>>(std::forward<lambda_t>(lambda));
        // take the future before queuing, so no worker touches the task concurrently
        auto future = task->get_shared_future();
        queue_task(task);
        return future;
    }
    // let the threadpool run the function but wait for
    // the threadpool thread to finish
    template<typename lambda_t>
    auto sync(lambda_t&& lambda)
    {
        auto ft = async(std::forward<lambda_t>(lambda));
        return ft.get();
    }
    // block until the queue is empty and no task is running.
    // (queuing an empty task and waiting for it is not enough with several
    // threads: other threads may still be busy with earlier tasks)
    void synchronize()
    {
        std::unique_lock<std::mutex> lock(m_queue_mutex);
        m_idle.wait(lock, [this] { return m_pending == 0; });
    }
private:
    void queue_task(const std::shared_ptr<details::task_itf>& task_ptr)
    {
        {
            std::unique_lock<std::mutex> lock(m_queue_mutex);
            m_queue.push(task_ptr);
            ++m_pending; // counts queued + running tasks
        }
        // signal only one thread, first waiting thread to wake up will run the next task.
        m_wakeup.notify_one();
    }
    // returns nullptr when the pool is stopping, which ends thread_loop
    std::shared_ptr<details::task_itf> get_next_task()
    {
        std::unique_lock<std::mutex> lock(m_queue_mutex);
        // not static: the predicate must capture this pool's `this`
        m_wakeup.wait(lock, [this] { return m_stop || !m_queue.empty(); });
        if (m_stop)
        {
            return nullptr;
        }
        auto task = std::move(m_queue.front());
        m_queue.pop();
        return task;
    }
    void thread_loop()
    {
        while (auto task = get_next_task())
        {
            task->execute(); // packaged_task stores exceptions in the future
            std::unique_lock<std::mutex> lock(m_queue_mutex);
            if (--m_pending == 0)
            {
                m_idle.notify_all();
            }
        }
    }
    std::vector<std::thread> m_threads;
    std::mutex m_queue_mutex;
    std::queue<std::shared_ptr<details::task_itf>> m_queue;
    std::condition_variable m_wakeup;
    std::condition_variable m_idle; // signalled when m_pending drops to 0
    std::size_t m_pending{ 0 };     // queued or running tasks, guarded by m_queue_mutex
    bool m_stop{ false };
};
//-----------------------------------------------------------------------------
int main()
{
    thread_pool_t thread_pool{ std::thread::hardware_concurrency() };
    for (int i = 0; i < 200; i++)
    {
        // just schedule asynchronous calls, returned futures are not used in this example
        thread_pool.async([i]
        {
            std::cout << i << " ";
        });
    }
    // this threadpool will not by default wait until all work is finished
    // but stops processing when destructed.
    // a call to synchronize blocks until all queued work is done.
    thread_pool.synchronize();
    std::cout << "\nDone...\n";
    return 0;
}

Related

Calling a Function in Parallel C++

I want to call a function in parallel in C++ that waits for some time and then performs some task, but I don't want the execution flow to wait for the function. I considered using pthreads in a simple way, but then I have to wait until the thread is joined!
void A_Function()
{
/* Call a function which waits for some time and then perform some tasks */
/* Do not wait for the above function to return and continue performing the background tasks */
}
Note: If I do not perform the background tasks while calling the function in parallel then in the next cycle, the function doesn't give me correct output.
Thanks in advance.
Use a std::future to package a std::async task. Wait for the future at the head of your function to ensure that it's completed before the next iteration, since you stated that the next iteration depends on the execution of this background task.
In the example below, I make the background task a simple atomic increment of a counter, and the foreground task just returns the counter value. This is for illustrative purposes only!
#include <atomic>
#include <future>
#include <iostream>
#include <thread>
#include <utility>

class Foo {
public:
    Foo() : counter_(0) {}

    std::pair<int, std::future<void>> a_function(std::future<void>& f) {
        // Ensure that the background task from the previous iteration
        // has completed
        f.wait();
        // Set the task for the next iteration
        std::future<void> fut = std::async(std::launch::async,
                                           &Foo::background_task, this);
        // Do some work
        int value = counter_.load();
        // Return the result and the future for the next iteration
        return std::make_pair(value, std::move(fut));
    }

    void background_task() {
        ++counter_;
    }

private:
    std::atomic<int> counter_;
};

int main() {
    // Bootstrap the procedure with some empty task...
    std::future<void> bleak = std::async(std::launch::deferred, [](){});
    Foo foo;
    // Iterate...
    for (size_t i = 0; i < 10; ++i) {
        // Call the function
        std::pair<int, std::future<void>> result = foo.a_function(bleak);
        // Set the future for the next iteration
        bleak = std::move(result.second);
        // Do something with the result
        std::cout << result.first << "\n";
    }
}

How can I tell when my ThreadPool is finished with its tasks?

In C++11, I have a ThreadPool object which manages a number of threads; work is enqueued to it as lambda functions. I know how many rows of data I have to work on, so I know ahead of time that I will need to queue N jobs. What I am not sure about is how to tell when all of those jobs are finished, so I can move on to the next step.
This is the code to manage the ThreadPool:
#include <atomic>
#include <condition_variable>
#include <cstdlib>
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

class ThreadPool;

class Worker {
public:
    Worker(ThreadPool &s) : pool(s) { }
    void operator()();
private:
    ThreadPool &pool;
};

class ThreadPool {
public:
    ThreadPool(size_t);
    template<class F>
    void enqueue(F f);
    ~ThreadPool();
    void joinAll();
    int taskSize();
private:
    friend class Worker;
    // the task queue
    std::deque< std::function<void()> > tasks;
    // keep track of threads
    std::vector< std::thread > workers;
    // sync
    std::mutex queue_mutex;
    std::condition_variable condition;
    bool stop;
};

void Worker::operator()()
{
    std::function<void()> task;
    while(true)
    {
        { // acquire lock
            std::unique_lock<std::mutex> lock(pool.queue_mutex);
            // look for a work item
            while ( !pool.stop && pool.tasks.empty() ) {
                // if there are none wait for notification
                pool.condition.wait(lock);
            }
            if ( pool.stop ) { // exit if the pool is stopped
                return;
            }
            // get the task from the queue
            task = std::move(pool.tasks.front());
            pool.tasks.pop_front();
        } // release lock
        // execute the task
        task();
    }
}

// the constructor just launches some amount of workers
ThreadPool::ThreadPool(size_t threads)
    : stop(false)
{
    for (size_t i = 0; i < threads; ++i) {
        workers.push_back(std::thread(Worker(*this)));
    }
}

// the destructor stops and joins all threads
ThreadPool::~ThreadPool()
{
    { // set the flag under the lock so no worker misses the notification
        std::unique_lock<std::mutex> lock(queue_mutex);
        stop = true;
    }
    condition.notify_all();
    joinAll();
}

void ThreadPool::joinAll() {
    // join them; skip threads that were already joined
    for (size_t i = 0; i < workers.size(); ++i) {
        if (workers[i].joinable()) {
            workers[i].join();
        }
    }
}

int ThreadPool::taskSize() {
    std::unique_lock<std::mutex> lock(queue_mutex);
    return static_cast<int>(tasks.size());
}

// add new work item to the pool
template<class F>
void ThreadPool::enqueue(F f)
{
    { // acquire lock
        std::unique_lock<std::mutex> lock(queue_mutex);
        // add the task
        tasks.push_back(std::function<void()>(f));
    } // release lock
    // wake up one thread
    condition.notify_one();
}
And then I distribute my job among threads like this:
std::unique_ptr<ThreadPool> pool(new ThreadPool(4)); // used through a pointer below (needs <memory>)
/* ... */
for (int y=0;y<N;y++) {
pool->enqueue([this,y] {
this->ProcessRow(y);
});
}
// wait until all threads are finished
std::this_thread::sleep_for( std::chrono::milliseconds(100) );
Waiting for 100 milliseconds works only because I know those jobs complete in less than 100 ms, but obviously it's not the best approach. Once it has completed N rows of processing, it needs to go through another 1000 or so generations of the same thing, and I want to begin the next generation as soon as I can.
I know there must be some way to add code into my ThreadPool so that I can do something like this:
while ( pool->isBusy() ) {
std::this_thread::sleep_for( std::chrono::milliseconds(1) );
}
I've been working on this for a couple of nights now and I find it hard to find good examples of how to do this. So, what would be the proper way to implement my isBusy() method?
I got it!
First of all, I introduced a few extra members to the ThreadPool class:
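class ThreadPool {
/* ... existing code ... */
/* plus the following */
public:
void waitUntilCompleted();
private:
std::atomic<int> njobs_pending{0};
std::mutex main_mutex;
std::condition_variable main_condition;
};
Now I can do better than checking some status every X amount of time: I can block the main loop until no more jobs are pending:
void ThreadPool::waitUntilCompleted() {
std::unique_lock<std::mutex> lock(main_mutex);
// the predicate guards against spurious wakeups and against
// all jobs having finished before we started waiting
main_condition.wait(lock, [this] { return njobs_pending == 0; });
}
This works as long as I manage what's pending with the following bookkeeping code at the head of the ThreadPool::enqueue() function:
njobs_pending++;
and right after running the task in the Worker::operator()() function:
if ( --pool.njobs_pending == 0 ) {
// taking main_mutex before notifying means the notification cannot slip in
// between the waiter's predicate check and its wait
std::unique_lock<std::mutex> lock(pool.main_mutex);
pool.main_condition.notify_all();
}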
Then the main thread can enqueue whatever tasks are necessary and then sit and wait until all calculations are completed with:
for (int y=0;y<N;y++) {
pool->enqueue([this,y] {
this->ProcessRow(y);
});
}
pool->waitUntilCompleted();
You may need to create an internal structure of threads associated with a bool variable flag.
class ThreadPool {
private:
    // This Structure Will Keep Track Of Each Thread's Progress
    struct ThreadInfo {
        std::thread thread;
        // written by the worker and read by other threads:
        // guard it with the pool's mutex (or make it atomic)
        bool isDone;
        // std::thread cannot be copied, only moved (needs <utility>)
        ThreadInfo( std::thread&& threadIn ) :
            thread( std::move(threadIn) ), isDone(false)
        {}
    }; // ThreadInfo
    // This Vector Should Be Populated In The Constructor Initially And
    // Updated Anytime You Would Add A New Task.
    // This Should Also Replace std::vector<std::thread> workers
    std::vector<ThreadInfo> workers;
public:
    // The rest of your class would appear to be the same, but you need a
    // way to test if a particular thread is currently active. When the
    // thread is done, its isDone flag reports true.
    // This only reports whether a particular thread is busy or not.
    // You have to set the flag for a particular thread to true when it
    // completes its task; otherwise it stays false from the moment of
    // creation. Bounds checking is omitted to keep it simple, but it
    // should be taken into consideration.
    bool isBusy( unsigned idx ) const {
        return !workers[idx].isDone;
    }
};
If you have N jobs and the calling thread has to wait for them, the most efficient way is to create a counter that is set to N by an atomic operation before the jobs are scheduled; each job atomically decrements it when its computation is done. Then you can use an atomic instruction to test whether the counter is zero.
Alternatively, use a locked decrement together with a wait handle that is signalled when the counter reaches zero.
I just have to say, I do not like this idea you are asking for:
while ( pool->isBusy() ) {
std::this_thread::sleep_for( std::chrono::milliseconds(1) );
}
It just does not fit well: the sleep will almost never be exactly 1 ms, it uses resources needlessly, etc.
The best way is for each job to decrement the counter atomically; the job that sees it reach zero then signals an event object (e.g. with SetEvent) that the main thread waits on.
If you must wait, the main thread then blocks in WaitForSingleObject and wakes up once, after completion, instead of many times.
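A portable C++11 counterpart of the "count down, and let the last job signal" idea uses std::mutex and std::condition_variable in place of a Windows event object. A sketch; `JobCounter` and `run_jobs` are hypothetical names. In C++20, std::latch offers similar behaviour when N is fixed in advance.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Counts outstanding jobs; wait() blocks until the count drops to zero.
class JobCounter {
public:
    void add(int n = 1) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ += n;
    }
    // Called by each job when it finishes; the last one wakes the waiter.
    void done() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--pending_ == 0) cv_.notify_all();
    }
    // Blocks without polling; the predicate also covers the case where
    // all jobs finished before wait() was called.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
};

// Usage sketch: n threads each finish one job while the caller blocks in wait().
int run_jobs(int n) {
    JobCounter counter;
    std::atomic<int> finished{0};
    counter.add(n);  // register all jobs before any can finish
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] { ++finished; counter.done(); });
    counter.wait();                // returns once every job has called done()
    int result = finished.load();  // each increment happened before its done()
    for (auto& t : threads) t.join();
    return result;
}
```

Calling add(N) before enqueuing the jobs matters: if each job incremented the counter itself, the count could briefly hit zero while jobs were still being scheduled.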

When one worker thread fails, how to abort remaining workers?

I have a program which spawns multiple threads, each of which executes a long-running task. The main thread then waits for all worker threads to join, collects results, and exits.
If an error occurs in one of the workers, I want the remaining workers to stop gracefully, so that the main thread can exit shortly afterwards.
My question is how best to do this, when the implementation of the long-running task is provided by a library whose code I cannot modify.
Here is a simple sketch of the system, with no error handling:
void threadFunc()
{
// Do long-running stuff
}
void mainFunc()
{
std::vector<std::thread> threads;
for (int i = 0; i < 3; ++i) {
threads.push_back(std::thread(&threadFunc));
}
for (auto &t : threads) {
t.join();
}
}
If the long-running function executes a loop and I have access to the code, then
execution can be aborted simply by checking a shared "keep on running" flag at the top of each iteration.
std::mutex mutex;
bool error;
void threadFunc()
{
try {
for (...) {
{
std::unique_lock<std::mutex> lock(mutex);
if (error) {
break;
}
}
}
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
Now consider the case when the long-running operation is provided by a library:
std::mutex mutex;
bool error;
class Task
{
public:
// Blocks until completion, error, or stop() is called
void run();
void stop();
};
void threadFunc(Task &task)
{
try {
task.run();
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
In this case, the main thread has to handle the error, and call stop() on
the still-running tasks. As such, it cannot simply wait for each worker to
join() as in the original implementation.
The approach I have used so far is to share the following structure between
the main thread and each worker:
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
};
When a worker completes successfully, it decrements the running count. If
an exception is caught, the worker sets the error flag. In both cases, it
then calls condVar.notify_one().
The main thread then waits on the condition variable, waking up if either
error is set or running reaches zero. On waking up, the main thread
calls stop() on all tasks if error has been set.
This approach works, but I feel there should be a cleaner solution using some
of the higher-level primitives in the standard concurrency library. Can
anyone suggest an improved implementation?
Here is the complete code for my current solution:
// main.cpp
#include <chrono>
#include <condition_variable>
#include <exception>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>
#include "utils.h"
// Class which encapsulates long-running task, and provides a mechanism for aborting it
class Task
{
public:
Task(int tidx, bool fail)
: tidx(tidx)
, fail(fail)
, m_run(true)
{
}
void run()
{
static const int NUM_ITERATIONS = 10;
for (int iter = 0; iter < NUM_ITERATIONS; ++iter) {
{
std::unique_lock<std::mutex> lock(m_mutex);
if (!m_run) {
out() << "thread " << tidx << " aborting";
break;
}
}
out() << "thread " << tidx << " iter " << iter;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
if (fail) {
throw std::exception();
}
}
}
void stop()
{
std::unique_lock<std::mutex> lock(m_mutex);
m_run = false;
}
const int tidx;
const bool fail;
private:
std::mutex m_mutex;
bool m_run;
};
// Data shared between all threads
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
SharedData(int count)
: error(false)
, running(count)
{
}
};
void threadFunc(Task &task, SharedData &shared)
{
try {
out() << "thread " << task.tidx << " starting";
task.run(); // Blocks until task completes or is aborted by main thread
out() << "thread " << task.tidx << " ended";
} catch (std::exception &) {
out() << "thread " << task.tidx << " failed";
std::unique_lock<std::mutex> lock(shared.mutex);
shared.error = true;
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
--shared.running;
}
shared.condVar.notify_one();
}
int main(int argc, char **argv)
{
static const int NUM_THREADS = 3;
std::vector<std::unique_ptr<Task>> tasks(NUM_THREADS);
std::vector<std::thread> threads(NUM_THREADS);
SharedData shared(NUM_THREADS);
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
const bool fail = (tidx == 1);
tasks[tidx] = std::make_unique<Task>(tidx, fail);
threads[tidx] = std::thread(&threadFunc, std::ref(*tasks[tidx]), std::ref(shared));
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
// Wake up when either all tasks have completed, or any one has failed
shared.condVar.wait(lock, [&shared](){
return shared.error || !shared.running;
});
if (shared.error) {
out() << "error occurred - terminating remaining tasks";
for (auto &t : tasks) {
t->stop();
}
}
}
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
out() << "waiting for thread " << tidx << " to join";
threads[tidx].join();
out() << "thread " << tidx << " joined";
}
out() << "program complete";
return 0;
}
Some utility functions are defined here:
// utils.h
#ifndef UTILS_H
#define UTILS_H
#include <iostream>
#include <mutex>
#include <thread>
#if __cplusplus <= 201103L
// Backport std::make_unique from C++14
#include <memory>
namespace std {
template<typename T, typename ...Args>
std::unique_ptr<T> make_unique(
Args&& ...args)
{
return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
} // namespace std
#endif // __cplusplus <= 201103L
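// Thread-safe wrapper around std::cout
class ThreadSafeStdOut
{
public:
ThreadSafeStdOut()
: m_lock(m_mutex)
{
}
// Needed to return by value from out() before C++17 (no guaranteed copy elision)
ThreadSafeStdOut(ThreadSafeStdOut&&) = default;
~ThreadSafeStdOut()
{
// Only the object that still holds the lock ends the line
if (m_lock.owns_lock()) {
std::cout << std::endl;
}
}
template <typename T>
ThreadSafeStdOut &operator<<(const T &obj)
{
std::cout << obj;
return *this;
}
private:
static std::mutex m_mutex;
std::unique_lock<std::mutex> m_lock;
};
std::mutex ThreadSafeStdOut::m_mutex; // move to a .cpp file if several files include this header
// Convenience function for performing thread-safe output
inline ThreadSafeStdOut out()
{
return ThreadSafeStdOut();
}
#endif // UTILS_H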
I've been thinking about your situation for some time and this may be of some help to you. You could try a couple of different methods to achieve your goal. There are 2-3 options that may be of use, or a combination of all three. I will at minimum show the first option, because I'm still learning and trying to master the concepts of template specialization as well as lambdas.
Using a Manager Class
Using Template Specialization Encapsulation
Using Lambdas.
Pseudo code of a Manager Class would look something like this:
class ThreadManager {
private:
std::unique_ptr<MainThread> mainThread_;
std::list<std::shared_ptr<WorkerThread>> lWorkers_; // List to hold finished workers
std::queue<std::shared_ptr<WorkerThread>> qWorkers_; // Queue to hold inactive and waiting threads.
std::map<unsigned, std::shared_ptr<WorkerThread>> mThreadIds_; // Map to associate a WorkerThread with an ID value.
std::map<unsigned, bool> mFinishedThreads_; // A map to keep track of finished and unfinished threads.
bool threadError_; // Not needed if using exception handling
public:
explicit ThreadManager( const MainThread& main_thread );
void shutdownThread( const unsigned& threadId );
void shutdownAllThreads();
void addWorker( const WorkerThread& worker_thread );
bool isThreadDone( const unsigned& threadId );
void spawnMainThread() const; // Method to start main thread's work.
void spawnWorkerThread( unsigned threadId, bool& error );
bool getThreadError( unsigned& threadID ); // Returns True If Thread Encountered An Error and passes the ID of that thread,
};
Only for demonstration purposes, and to keep the structure simple, did I use a bool value to determine whether a thread failed; of course this can be replaced with exceptions, invalid unsigned values, etc., if you prefer.
Using a class of this sort would look something like the code below. Also note that a class of this type would be better as a Singleton, since you wouldn't want more than one manager when you are working with shared pointers.
SomeClass::SomeClass( ... ) {
// This class could contain a private static smart pointer of this Manager Class
// Initialize the smart pointer giving it new memory for the Manager Class and by passing it a pointer of the Main Thread object
threadManager_ = new ThreadManager( main_thread ); // Wouldn't actually use raw pointers here unless you had a need to; shown this way for simplicity
}
void SomeClass::addThreads( ... ) {
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->addWorker( some_worker_thread );
}
}
void SomeClass::someFunctionThatSpawnsThreads( ... ) {
threadManager_->spawnMainThread();
bool error = false;
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->spawnWorkerThread( u, error );
if ( error ) { // This Thread Failed To Start, Shutdown All Threads
threadManager_->shutdownAllThreads();
return;
}
}
// If all threads spawn successfully, we can check here whether one has failed.
unsigned threadId;
if ( threadManager_->getThreadError( threadId ) ) {
// getThreadError returned true and passed back the id of the failed thread.
// Now stop every other thread (not only those with a higher id).
for ( unsigned u = 1; u <= threadCount; u++ ) {
if ( u != threadId ) {
threadManager_->shutdownThread( u );
}
}
// We have successfully shut down all threads
}
}
I like the manager class design since I have used it in other projects, and it comes in handy quite often, especially when working with a code base that has many resources, such as a game engine with many assets (sprites, textures, audio files, maps, game items, etc.). A manager class helps keep track of and maintain all of these assets. The same concept can be applied to "managing" active, inactive and waiting threads, with the manager knowing how to handle and shut down all threads properly. If your code base and libraries support exceptions, I would recommend an exception handler with thread-safe exception handling instead of passing bools around for errors. It is also good to have a Logger class that writes to a log file and/or a console window, giving an explicit message about which function threw the exception and what caused it. A log message might look like this:
Exception Thrown: someFunctionNamedThis in ThisFile on Line# (x)
threadID 021342 failed to execute.
This way you can look at the log file and find out very quickly what thread is causing the exception, instead of using passed around bool variables.
The implementation of the long-running task is provided by a library whose code I cannot modify.
That means you have no way to synchronize the work done by the worker threads.
If an error occurs in one of the workers,
Let's suppose that you can really detect worker errors. Some of them can easily be detected if the library reports them; others cannot, e.g.:
the library code loops forever;
the library code exits prematurely with an uncaught exception.
I want the remaining workers to stop **gracefully**
That's just not possible.
The best you can do is writing a thread manager checking on worker thread status and if an error condition is detected it just (ungracefully) "kills" all the worker threads and exits.
You should also consider detecting a looping worker thread (by a timeout) and offering the user the option to kill it or to continue waiting for the process to finish.
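Detecting a worker that runs too long can be sketched with std::future::wait_for. This assumes the worker's result is reachable through a std::future (here obtained from std::async); `finished_within` and `worker_finishes_in_time` are hypothetical helper names, and the check only reports a timeout, it does not stop anything.

```cpp
#include <chrono>
#include <future>
#include <thread>

// True if the work behind `result` finished within `timeout`.
// False only means "still running": it cannot tell a slow worker
// from one stuck in a loop, and it does not stop the worker.
bool finished_within(const std::future<void>& result, std::chrono::milliseconds timeout)
{
    return result.wait_for(timeout) == std::future_status::ready;
}

// Usage sketch: a worker that takes `work_time` is checked against `timeout`.
bool worker_finishes_in_time(std::chrono::milliseconds work_time,
                             std::chrono::milliseconds timeout)
{
    std::future<void> result = std::async(std::launch::async, [work_time] {
        std::this_thread::sleep_for(work_time);  // stands in for the library call
    });
    // note: this future's destructor waits for the worker to end
    return finished_within(result, timeout);
}
```

Because a future obtained from std::async blocks in its destructor until the task completes, a worker that may truly never return would have to run on a std::thread paired with a std::promise instead, and could then only be abandoned or killed as described above.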
Your problem is that the long running function is not your code, and you say you cannot modify it. Consequently you cannot make it pay any attention whatsoever to any kind of external synchronisation primitive (condition variables, semaphores, mutexes, pipes, etc), unless the library developer has done that for you.
Therefore your only option is to do something that wrestles control away from any code no matter what it's doing. This is what signals do. For that, you're going to have to use pthread_kill(), or whatever the equivalent is these days.
The pattern would be that
The thread that detects an error needs to communicate that error back to the main thread in some manner.
The main thread then needs to call pthread_kill() for all the other remaining threads. Don't be confused by the name: pthread_kill() is simply a way of delivering an arbitrary signal to a thread. Note that signals whose default action stops, continues or terminates the process (such as SIGSTOP, SIGCONT and SIGTERM) affect the whole process even if raised with pthread_kill(), not just one thread, so don't use those.
In each of those threads you'll need a signal handler. On delivery of the signal to a thread the execution path in that thread will jump to the handler no matter what the long running function was doing.
You are now back in (limited) control, and can (probably, well, maybe) do some limited cleanup and terminate the thread.
In the meantime the main thread will have been calling pthread_join() on all the threads it's signaled, and those will now return.
My thoughts:
This is a really ugly way of doing it (and signals / pthreads are notoriously difficult to get right and I'm no expert), but I don't really see what other choice you have.
It'll be a long way from looking 'graceful' in source code, though the end user experience will be OK.
You will be aborting execution part way through running that library function, so if there's any clean up it would normally do (e.g. freeing up memory it has allocated) that won't get done and you'll have a memory leak. Running under something like valgrind is a way of working out if this is happening.
The only way of getting the library function to clean up (if it needs it) will be for your signal handler to return control to the function and letting it run to completion, just what you don't want to do.
And of course, this won't work on Windows (no pthreads, at least none worth speaking of, though there may be an equivalent mechanism).
Really the best way is going to be to re-implement (if at all possible) that library function.

C/C++ pthread signals and pointers

I'm having the hardest time trying to wrap my head around how to allow threads to signal each other.
My design:
The main function creates a single master thread that coordinates a bunch of worker threads. The main function also creates the workers, because the worker threads spawn and exit at intervals programmed in main. The master thread needs to be able to signal individual workers and broadcast to all of them (pthread_cond_broadcast), and the workers have to signal the master back (pthread_cond_signal). Since each thread needs a pthread_mutex_t and a pthread_cond_t, I made a Worker class and a Master class holding these variables. This is where I am stuck. C++ does not allow you to pass a member function as the pthread_create(...) start routine, so I made a static handler inside the class and pass it a pointer to the object itself, which it reinterpret_casts back to access the class data...
void Worker::start() {
pthread_create(&thread, NULL, &Worker::run, this);
}
void* Worker::run(void *ptr) {
Worker* data = reinterpret_cast<Worker*>(ptr);
// ... use data's members here ...
return NULL; // a void* function must return a value
}
The problem I have with this (probably wrong) setup is that when I pass an array of Worker pointers to the Master thread, it signals a different Worker instance, I think because the cast made some sort of copy. So I tried static_cast and got the same behavior.
I just need some sort of design where the Master and workers can pthread_cond_wait(...) and pthread_cond_signal(...) each other.
Edit 1
Added:
private:
Worker(const Worker&);
Still not working.
Edit: Fixed the potential race in all versions:
1./1b. employ a semaphore built from a mutex + condition variable + counter, as outlined in "C++0x has no semaphores? How to synchronize threads?"
2. uses a 'reverse' wait to ensure that a signal got acknowledged by the intended worker
I'd really suggest using C++11-style <thread> and <condition_variable> to achieve this.
I have two (and a half) demonstrations. They each assume you have 1 master that drives 10 workers. Each worker waits for a signal before it does its work.
We'll use std::condition_variable (which works in conjunction with a std::mutex) to do the signaling. The difference between the first and second version will be the way in which the signaling is done:
1. Notifying any worker, one at a time:
1b. With a worker struct
2. Notifying all threads, coordinating which recipient worker is to respond
1. Notifying any worker, one at a time:
This is the simplest to do, because there's little coordination going on:
#include <vector>
#include <thread>
#include <mutex>
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <condition_variable>
using namespace std;
class semaphore
{ // see https://stackoverflow.com/questions/4792449/c0x-has-no-semaphores-how-to-synchronize-threads
std::mutex mx;
std::condition_variable cv;
unsigned long count;
public:
semaphore() : count() {}
void notify();
void wait();
};
static void run(int id, struct master& m);
struct master
{
mutable semaphore sem;
master()
{
for (int i = 0; i<10; ++i)
threads.emplace_back(run, i, ref(*this));
}
~master() {
for(auto& th : threads) if (th.joinable()) th.join();
std::cout << "done\n";
}
void drive()
{
// do wakeups
for (unsigned i = 0; i<threads.size(); ++i)
{
this_thread::sleep_for(chrono::milliseconds(rand()%100));
sem.notify();
}
}
private:
vector<thread> threads;
};
static void run(int id, master& m)
{
m.sem.wait();
{
static mutex io_mx;
lock_guard<mutex> lk(io_mx);
cout << "signaled: " << id << "\n";
}
}
int main()
{
master instance;
instance.drive();
}
/// semaphore members
void semaphore::notify()
{
lock_guard<mutex> lk(mx);
++count;
cv.notify_one();
}
void semaphore::wait()
{
unique_lock<mutex> lk(mx);
while(!count)
cv.wait(lk);
--count;
}
1b. With a worker struct
Note: if your worker class has worker::run as a non-static member function, you can do the same with minor modifications:
struct worker
{
worker(int id) : id(id) {}
void run(master& m) const;
int id;
};
// ...
struct master
{
// ...
master()
{
for (int i = 0; i<10; ++i)
workers.emplace_back(i);
for (auto& w: workers)
threads.emplace_back(&worker::run, ref(w), ref(*this));
}
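// ... (drive() as before; the private members now include vector<worker> workers; alongside vector<thread> threads;)
};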
void worker::run(master& m) const
{
m.sem.wait();
{
static mutex io_mx;
lock_guard<mutex> lk(io_mx);
cout << "signaled: " << id << "\n";
}
}
A caveat
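cv.wait() can suffer spurious wake-ups, in which it returns although the condition variable wasn't actually notified (e.g. in the event of OS signal handlers). This is a common thing to happen with condition variables on any platform, which is why semaphore::wait() re-checks count in a loop instead of calling cv.wait() just once.
The following approach handles it with the predicate overload of wait(), and additionally makes sure the signal reaches one specific worker: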
2. Notifying all threads, coordinating which recipient worker
Use a flag to signal which thread was intended to receive the signal:
struct master
{
mutable mutex mx;
mutable condition_variable cv;
int signaled_id; // ADDED
master() : signaled_id(-1)
{
Let's pretend that the driver gets a lot more interesting and wants to signal all workers in a specific (random...) order:
void drive()
{
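// generate random wakeup order (iota needs <numeric>; shuffle needs <algorithm> and <random>)
vector<int> wakeups(10);
iota(begin(wakeups), end(wakeups), 0);
shuffle(begin(wakeups), end(wakeups), mt19937{random_device{}()}); // random_shuffle was removed in C++17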
// do wakeups
for (int id : wakeups)
{
this_thread::sleep_for(chrono::milliseconds(rand()%1000));
signal(id);
}
}
private:
void signal(int id) // ADDED id
{
unique_lock<mutex> lk(mx);
std::cout << "signaling " << id << "\n";
signaled_id = id; // ADDED put it in the shared field
cv.notify_all();
cv.wait(lk, [&] { return signaled_id == -1; });
}
Now all we have to do is make sure that the receiving thread checks that its id matches (in run(), while holding unique_lock<mutex> lk(m.mx)):
m.cv.wait(lk, [&] { return m.signaled_id == id; });
m.signaled_id = -1;
m.cv.notify_all();
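Because wait() re-checks the predicate every time it wakes up, a spurious wake-up (or a notification meant for another worker) just sends the thread back to waiting.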
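For reference, here is a self-contained sketch that assembles the version-2 fragments above into one compilable unit. It is not the linked listing. It starts workers with lambdas instead of a free run function, uses std::shuffle in place of the removed std::random_shuffle, and drops the random sleeps. The wake_order vector and drive_in_random_order() are hypothetical additions so you can check that workers woke in exactly the signaled order.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <numeric>
#include <random>
#include <thread>
#include <vector>

// Approach 2: the master broadcasts, but only the worker whose id matches
// signaled_id proceeds, and the master waits for that worker's acknowledgement.
struct master {
    std::mutex mx;
    std::condition_variable cv;
    int signaled_id = -1;              // -1 means "nobody is being signaled"
    std::vector<int> wake_order;       // which worker woke, in order
    std::vector<std::thread> threads;  // declared last: started after the above exist

    explicit master(int n) {
        for (int id = 0; id < n; ++id)
            threads.emplace_back([this, id] { run(id); });
    }
    ~master() {
        for (auto& th : threads) th.join();
    }

    void signal(int id) {
        std::unique_lock<std::mutex> lk(mx);
        signaled_id = id;
        cv.notify_all();
        // 'Reverse' wait: block until the intended worker has acknowledged.
        cv.wait(lk, [&] { return signaled_id == -1; });
    }

private:
    void run(int id) {
        std::unique_lock<std::mutex> lk(mx);
        // Spurious wake-ups and notifications meant for others are ignored.
        cv.wait(lk, [&] { return signaled_id == id; });
        wake_order.push_back(id);
        signaled_id = -1;  // acknowledge
        cv.notify_all();
    }
};

// Signals n workers in a random order and returns the order they woke in.
std::vector<int> drive_in_random_order(int n) {
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937{std::random_device{}()});
    std::vector<int> observed;
    {
        master m(n);
        for (int id : order) m.signal(id);
        observed = m.wake_order;  // every worker has acknowledged by now
    }
    assert(observed == order);
    return observed;
}
```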
Full code listings/live demos:
1. notify_one.cpp: http://coliru.stacked-crooked.com/view?id=c968f8cffd57afc2a0c6777105203f85-03e740563a9d9c6bf97614ba6099fe92
1b. id. with worker struct: http://coliru.stacked-crooked.com/view?id=7bd224c42130a0461b0c894e0b7c74ae-03e740563a9d9c6bf97614ba6099fe92
2. notify_all.cpp: http://coliru.stacked-crooked.com/view?id=1d3145ccbb93c1bec03b232d372277b8-03e740563a9d9c6bf97614ba6099fe92
It is not clear what your exact circumstances are, but it seems like you are using a container to hold your "Worker" instances that are created in main, and passing them to your "Master". If this is the case, there are a few remedies available to you. You need to pick one that is appropriate to your implementation.
Pass a reference to the container in main to the Master.
Change the container to hold (smart) pointers to Workers.
Make the container part of "Master" itself, so that it doesn't need to be passed to it.
Implement a proper destructor, copy constructor, and assignment operator for your Worker class (in other words, obey the Rule of Three).
Technically speaking, since pthread_create() is a C API, the function pointer that is passed to it needs to have C linkage (extern "C"). You can't make a method of a C++ class have C linkage, so you should define an external function:
extern "C" { static void * worker_run (void *arg); }
class Worker { //...
};
static void * worker_run (void *arg) {
return Worker::run(arg);
}
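Here is a sketch that combines two of the remedies above: a container of smart pointers to Workers, and the extern "C" trampoline. Each Worker owns its pthread mutex and condition variable and is non-copyable. Because the vector stores std::unique_ptr, the address handed to pthread_create() never changes and no Worker is ever copied. The input/result fields and run_workers_demo() are hypothetical, added only to make the handshake observable. The worker-to-master signal is omitted for brevity.

```cpp
#include <cassert>
#include <memory>
#include <pthread.h>
#include <vector>

class Worker {
public:
    explicit Worker(int id) : id_(id) {
        pthread_mutex_init(&mx_, nullptr);
        pthread_cond_init(&cv_, nullptr);
    }
    ~Worker() {
        pthread_cond_destroy(&cv_);
        pthread_mutex_destroy(&mx_);
    }
    Worker(const Worker&) = delete;             // a copy would duplicate the sync objects
    Worker& operator=(const Worker&) = delete;

    void start();
    void join() { pthread_join(thread_, nullptr); }

    // Master -> worker: hand over a value and wake this particular worker.
    void signal(int value) {
        pthread_mutex_lock(&mx_);
        input_ = value;
        has_input_ = true;
        pthread_cond_signal(&cv_);
        pthread_mutex_unlock(&mx_);
    }
    int result() const { return result_; }  // read only after join()

    static void* run(void* arg) {
        Worker* self = static_cast<Worker*>(arg);  // the same object, not a copy
        pthread_mutex_lock(&self->mx_);
        while (!self->has_input_)                  // loop guards against spurious wake-ups
            pthread_cond_wait(&self->cv_, &self->mx_);
        self->result_ = self->input_ * 10 + self->id_;
        pthread_mutex_unlock(&self->mx_);
        return nullptr;
    }

private:
    int id_;
    pthread_t thread_{};
    pthread_mutex_t mx_;
    pthread_cond_t cv_;
    bool has_input_ = false;
    int input_ = 0;
    int result_ = -1;
};

// C-linkage trampoline passed to pthread_create().
extern "C" { static void* worker_run(void* arg); }
static void* worker_run(void* arg) { return Worker::run(arg); }

void Worker::start() { pthread_create(&thread_, nullptr, worker_run, this); }

// Starts n workers, signals each with 7, and sums their results.
int run_workers_demo(int n) {
    std::vector<std::unique_ptr<Worker>> workers;  // reallocation moves pointers only
    for (int i = 0; i < n; ++i) workers.push_back(std::make_unique<Worker>(i));
    for (auto& w : workers) w->start();
    for (auto& w : workers) w->signal(7);
    int sum = 0;
    for (auto& w : workers) { w->join(); sum += w->result(); }
    return sum;
}
```

With three workers the results are 70, 71 and 72.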

C++ std::async run on main thread

Is there a way of running a function back on the main thread?
For example, suppose I called a function via std::async that downloaded a file and then parsed the data. It would then call a callback function that runs on my main UI thread and updates the UI.
I know threads are all equal in the default C++ implementation, so would I have to create a shared pointer to my main thread? How would I do this? And how would I pass the async function not only the shared pointer to the main thread but also a pointer to the function I want to run on it, and then run it on that main thread?
I have been reading C++ Concurrency in Action and chapter four (AKA "The Chapter I Just Finished") describes a solution.
The Short Version
Have a shared std::deque<std::packaged_task<void()>> (or a similar sort of message/task queue). Your std::async-launched functions can push tasks to the queue, and your GUI thread can process them during its loop.
There Isn't Really a Long Version, but Here Is an Example
Shared Data
std::deque<std::packaged_task<void()>> tasks;
std::mutex tasks_mutex;
std::atomic<bool> gui_running;
The std::async Function
void one_off()
{
std::packaged_task<void()> task(function_to_run_on_gui_thread); //!! placeholder: any void() callable
std::future<void> result = task.get_future();
{
std::lock_guard<std::mutex> lock(tasks_mutex);
tasks.push_back(std::move(task));
}
// wait on result
result.get();
}
The GUI Thread
void gui_thread()
{
while (gui_running) {
// process messages
{
std::unique_lock<std::mutex> lock(tasks_mutex);
while (!tasks.empty()) {
auto task(std::move(tasks.front()));
tasks.pop_front();
// unlock during the task
lock.unlock();
task();
lock.lock();
}
}
// "do gui work"
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
}
Notes:
I am (always) learning, so there is a decent chance that my code is not great. The concept is at least sound though.
The destructor of the std::future<> returned by std::async blocks until the operation launched with std::async completes (see the std::async documentation). Because one_off also waits on its task's result, each launch in my example waits for the GUI thread to run the task, so waiting on the result inside one_off might not be a brilliant idea.
You may want to (I would, at least) create your own threadsafe MessageQueue type to improve code readability/maintainability/blah blah blah.
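Here is one possible shape for such a type, sketched under my own assumptions. The TaskQueue name and its post/run_pending/wait_for_work methods are hypothetical, not a standard API. It wraps the same deque + mutex as the example, returns the task's future from post(), and adds a condition variable so the GUI thread can block instead of polling with sleep_for. The demo keeps the std::async futures in a vector, so the launches overlap rather than running one after another.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class TaskQueue {
public:
    // Any thread: enqueue a callable and get a future for its completion.
    template <class F>
    std::future<void> post(F f) {
        std::packaged_task<void()> task(std::move(f));
        std::future<void> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mx_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

    // GUI thread: run every queued task, unlocking while each one executes.
    int run_pending() {
        int count = 0;
        std::unique_lock<std::mutex> lock(mx_);
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();
            task();
            ++count;
            lock.lock();
        }
        return count;
    }

    // GUI thread: block until at least one task is queued.
    void wait_for_work() {
        std::unique_lock<std::mutex> lock(mx_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
    }

private:
    std::mutex mx_;
    std::condition_variable cv_;
    std::deque<std::packaged_task<void()>> tasks_;
};

// Returns true if every posted task ran on the "GUI" thread.
bool gui_queue_demo() {
    constexpr int kTasks = 5;
    TaskQueue queue;
    std::thread::id gui_id;
    std::vector<std::thread::id> ran_on(kTasks);

    std::thread gui([&] {
        gui_id = std::this_thread::get_id();
        for (int done = 0; done < kTasks;) {
            queue.wait_for_work();
            done += queue.run_pending();
        }
    });

    std::vector<std::future<void>> launches;  // kept alive so launches overlap
    for (int i = 0; i < kTasks; ++i)
        launches.push_back(std::async(std::launch::async, [&queue, &ran_on, i] {
            queue.post([&ran_on, i] { ran_on[i] = std::this_thread::get_id(); }).get();
        }));
    for (auto& f : launches) f.get();
    gui.join();

    for (auto id : ran_on)
        if (id != gui_id) return false;
    return true;
}
```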
I swear there was one more thing I wanted to point out, but it escapes me right now.
Full Example
#include <atomic>
#include <chrono>
#include <deque>
#include <iostream>
#include <mutex>
#include <future>
#include <thread>
// shared stuff:
std::deque<std::packaged_task<void()>> tasks;
std::mutex tasks_mutex;
std::atomic<bool> gui_running;
void message()
{
std::cout << std::this_thread::get_id() << std::endl;
}
void one_off()
{
std::packaged_task<void()> task(message);
std::future<void> result = task.get_future();
{
std::lock_guard<std::mutex> lock(tasks_mutex);
tasks.push_back(std::move(task));
}
// wait on result
result.get();
}
void gui_thread()
{
std::cout << "gui thread: "; message();
while (gui_running) {
// process messages
{
std::unique_lock<std::mutex> lock(tasks_mutex);
while (!tasks.empty()) {
auto task(std::move(tasks.front()));
tasks.pop_front();
// unlock during the task
lock.unlock();
task();
lock.lock();
}
}
// "do gui work"
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
}
int main()
{
gui_running = true;
std::cout << "main thread: "; message();
std::thread gt(gui_thread);
for (unsigned i = 0; i < 5; ++i) {
// note:
// these will be launched sequentially because result's
// destructor will block until one_off completes
auto result = std::async(std::launch::async, one_off);
// maybe do something with result if it is not void
}
// the for loop will not complete until all the tasks have been
// processed by gui_thread
// ...
// cleanup
gui_running = false;
gt.join();
}
Dat Output
$ ./messages
main thread: 140299226687296
gui thread: 140299210073856
140299210073856
140299210073856
140299210073856
140299210073856
140299210073856
Are you looking for std::launch::deferred? Passing this policy to std::async defers the task: it runs on whichever thread first calls get() (or wait()) on the returned future. So it only runs on the main thread if the main thread is the one that calls get().
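A small sketch contrasts the two launch policies by comparing thread ids. The function names are illustrative only. Note that a deferred task does not move work from a worker thread onto the main thread. It simply runs lazily inside get() on the calling thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// True if a deferred task ran on the thread that called get().
bool deferred_runs_on_caller() {
    auto fut = std::async(std::launch::deferred,
                          [] { return std::this_thread::get_id(); });
    // Nothing has run yet: the callable executes inside get(), on this thread.
    return fut.get() == std::this_thread::get_id();
}

// True if std::launch::async ran the callable on a different thread.
bool async_runs_elsewhere() {
    auto fut = std::async(std::launch::async,
                          [] { return std::this_thread::get_id(); });
    return fut.get() != std::this_thread::get_id();
}
```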