How to create an efficient multi-threaded task scheduler in C++?

I'd like to create a very efficient task scheduler system in C++.
The basic idea is this:
class Task {
public:
virtual void run() = 0;
};
class Scheduler {
public:
void add(Task &task, double delayToRun);
};
Behind Scheduler, there should be a fixed-size thread pool that runs the tasks (I don't want to create a thread for each task). delayToRun means that the task doesn't get executed immediately, but delayToRun seconds later (measured from the point it was added to the Scheduler).
(delayToRun means an "at-least" value, of course. If the system is loaded, or if we ask the impossible from the Scheduler, it won't be able to handle our request. But it should do the best it can)
And here's my problem. How to implement delayToRun functionality efficiently? I'm trying to solve this problem with the use of mutexes and condition variables.
I see two ways:
With manager thread
Scheduler contains two queues: allTasksQueue, and tasksReadyToRunQueue. A task gets added into allTasksQueue at Scheduler::add. There is a manager thread, which waits the smallest amount of time so it can put a task from allTasksQueue to tasksReadyToRunQueue. Worker threads wait for a task available in tasksReadyToRunQueue.
If Scheduler::add adds a task at the front of allTasksQueue (that is, a task whose delayToRun puts it before the current soonest-to-run task), then the manager thread needs to be woken up so it can update its wait time.
This method can be considered inefficient, because it needs two queues, and it needs two condvar.signals to make a task run (one for allTasksQueue->tasksReadyToRunQueue, and one for signalling a worker thread to actually run the task)
Without manager thread
There is one queue in the scheduler. A task gets added into this queue at Scheduler::add. A worker thread checks the queue. If it is empty, it waits without a time constraint. If it is not empty, it waits for the soonest task.
If there is only one condition variable on which the worker threads wait, this method can be considered inefficient: if a task is added at the front of the queue (front meaning that, with N worker threads, the task index is < N), then all the worker threads need to be woken up to update the time they are waiting for.
If there is a separate condition variable for each thread, then we can control which thread to wake up, so we don't need to wake up all threads (we only need to wake up the thread that has the largest waiting time, so we need to track this value). I'm currently thinking about implementing this, but working out the exact details is complex. Are there any recommendations, thoughts, or documents on this method?
Is there any better solution for this problem? I'm trying to use standard C++ features, but I'm willing to use platform-dependent tools too (my main platform is Linux), like pthreads, or even Linux-specific tools (like futexes), if they provide a better solution.

You can avoid both having a separate "manager" thread, and having to wake up a large number of threads when the next-to-run task changes, by using a design where a single pool thread waits for the "next to run" task (if there is one) on one condition variable, and the remaining pool threads wait indefinitely on a second condition variable.
The pool threads would execute pseudocode along these lines:
pthread_mutex_lock(&queue_lock);
while (running)
{
    if (head task is ready to run)
    {
        dequeue head task;
        if (task_thread == 1)
            pthread_cond_signal(&task_cv);
        else
            pthread_cond_signal(&queue_cv);
        pthread_mutex_unlock(&queue_lock);
        run dequeued task;
        pthread_mutex_lock(&queue_lock);
    }
    else if (!queue_empty && task_thread == 0)
    {
        task_thread = 1;
        pthread_cond_timedwait(&task_cv, &queue_lock, time head task is ready to run);
        task_thread = 0;
    }
    else
    {
        pthread_cond_wait(&queue_cv, &queue_lock);
    }
}
pthread_mutex_unlock(&queue_lock);
If you change the next task to run, then you execute:
if (task_thread == 1)
    pthread_cond_signal(&task_cv);
else
    pthread_cond_signal(&queue_cv);
with the queue_lock held.
Under this scheme, every wakeup is directed at only a single thread, there's only one priority queue of tasks, and there's no manager thread required.
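A C++11 translation of this scheme might look like the sketch below. The class name TimedPool and all of its members are illustrative assumptions rather than part of the answer above, and tasks still pending at destruction are simply dropped to keep the sketch short.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical class: one thread waits (timed) for the head task on task_cv_,
// every other idle thread waits indefinitely on queue_cv_.
class TimedPool {
    using Clock = std::chrono::steady_clock;
    struct Item {
        Clock::time_point when;
        std::function<void()> fn;
        bool operator>(const Item& o) const { return when > o.when; }
    };
    std::mutex m_;
    std::condition_variable task_cv_;
    std::condition_variable queue_cv_;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q_;
    bool timer_waiting_ = false;   // plays the role of task_thread
    bool running_ = true;
    std::vector<std::thread> threads_;

    void wake_one() {              // must be called with m_ held
        if (timer_waiting_) task_cv_.notify_one();
        else queue_cv_.notify_one();
    }
    void loop() {
        std::unique_lock<std::mutex> lk(m_);
        while (running_) {
            if (!q_.empty() && q_.top().when <= Clock::now()) {
                auto fn = q_.top().fn;   // top() is const, so copy the closure
                q_.pop();
                wake_one();              // let another thread watch the new head
                lk.unlock();
                fn();
                lk.lock();
            } else if (!q_.empty() && !timer_waiting_) {
                timer_waiting_ = true;
                auto when = q_.top().when;
                task_cv_.wait_until(lk, when);
                timer_waiting_ = false;
            } else {
                queue_cv_.wait(lk);      // spurious wakeups just re-run the loop
            }
        }
    }
public:
    explicit TimedPool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i) threads_.emplace_back([this] { loop(); });
    }
    ~TimedPool() {
        { std::lock_guard<std::mutex> lk(m_); running_ = false; }
        task_cv_.notify_all();
        queue_cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void add(std::function<void()> fn, std::chrono::milliseconds delay) {
        std::lock_guard<std::mutex> lk(m_);
        q_.push(Item{Clock::now() + delay, std::move(fn)});
        wake_one();                      // the head may have changed
    }
};
```

Every notification is a notify_one, so adding a task wakes at most one thread, which matches the claim made for the pseudocode.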

Your specification is a bit too strong:
delayToRun means that the task doesn't get executed immediately, but delayToRun seconds later
You forgot to add "at least" :
The task doesn't get executed now, but at least delayToRun seconds later
The point is that if ten thousand tasks are all scheduled with a 0.1 delayToRun, they surely won't practically be able to run at the same time.
With such correction, you just maintain some queue (or agenda) of (scheduled-start-time, closure to run), you keep that queue sorted, and you start N (some fixed number) of threads which atomically pop the first element of the agenda and run it.
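As a rough illustration of such an agenda, the sketch below keeps (start time, closure) pairs in a min-heap. The names Entry, Later, Agenda and pop_due are hypothetical, and the caller is assumed to hold a mutex around the agenda when several threads share it.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

// One agenda entry: when to start, and what to run.
struct Entry {
    Clock::time_point start;
    std::function<void()> closure;
};

// Orders the heap so that the earliest start time is on top.
struct Later {
    bool operator()(const Entry& a, const Entry& b) const { return a.start > b.start; }
};

using Agenda = std::priority_queue<Entry, std::vector<Entry>, Later>;

// If the first entry is due at `now`, removes it, stores it in *out and
// returns true. The closure is not run here, so the caller can release
// its lock before executing it.
bool pop_due(Agenda& agenda, Clock::time_point now, Entry* out) {
    assert(out != nullptr);
    if (agenda.empty() || agenda.top().start > now) return false;
    *out = agenda.top();
    agenda.pop();
    return true;
}
```

A worker thread would call pop_due under the lock, unlock, and then invoke the returned closure.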
then all the worker threads need to be woken up to update the time which they are waiting for.
No, some worker threads would be woken up.
Read about condition variables and broadcast.
You might also use POSIX timers, see timer_create(2), or the Linux-specific timer file descriptors, see timerfd_create(2).
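For the timerfd route, a minimal Linux-only sketch could look like this. The function name wait_with_timerfd is made up for illustration; in a real scheduler the descriptor would more likely be handed to poll(2) or epoll alongside other event sources instead of being read in a blocking call.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <unistd.h>

// Linux-only sketch: blocks until a one-shot timer of `ms` milliseconds fires.
// Returns the number of expirations read from the descriptor, or 0 on error.
std::uint64_t wait_with_timerfd(long ms) {
    assert(ms > 0);  // an all-zero it_value would disarm the timer
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd == -1) return 0;

    itimerspec spec{};  // it_interval stays zero, so the timer is one-shot
    spec.it_value.tv_sec = ms / 1000;
    spec.it_value.tv_nsec = (ms % 1000) * 1000000L;

    std::uint64_t expirations = 0;
    if (timerfd_settime(fd, 0, &spec, nullptr) == -1 ||
        read(fd, &expirations, sizeof expirations) !=
            static_cast<ssize_t>(sizeof expirations)) {
        expirations = 0;
    }
    close(fd);
    return expirations;
}
```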
You would probably want to avoid running blocking system calls in your threads, and instead have some central thread managing them using an event loop (see poll(2)...); otherwise, if you have a hundred tasks running sleep(100) and one task scheduled to run in half a second, it won't run before a hundred seconds have passed.
You may want to read about continuation-passing style (CPS) programming, which is highly relevant. Read the paper about Continuation Passing C by Juliusz Chroboczek.
Look also into Qt threads.
You could also consider coding in Go (with its Goroutines).

This is a sample implementation for the interface you provided that comes closest to your 'With manager thread' description.
It uses a single thread (timer_thread) to manage a queue (allTasksQueue) that is sorted based on the actual time when a task must be started (std::chrono::time_point).
The 'queue' is a std::priority_queue (which keeps its time_point key elements sorted).
timer_thread is normally suspended until the next task is started or when a new task is added.
When a task is about to be run, it is placed in tasksReadyToRunQueue; one of the worker threads is signaled, wakes up, removes the task from the queue and starts processing it.
Note that the thread pool has a compile-time upper limit for the number of threads (40). If you schedule more tasks than can be dispatched to workers, new tasks will wait in the queue until threads are available again.
You said this approach is not efficient, but overall, it seems reasonably efficient to me. It's all event driven and you are not wasting CPU cycles by unnecessary spinning.
Of course, it's just an example, optimizations are possible (note: std::multimap has been replaced with std::priority_queue).
The implementation is C++11 compliant
#include <iostream>
#include <chrono>
#include <queue>
#include <vector>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <memory>
class Task {
public:
virtual void run() = 0;
virtual ~Task() { }
};
class Scheduler {
public:
Scheduler();
~Scheduler();
void add(Task &task, double delayToRun);
private:
using timepoint = std::chrono::time_point<std::chrono::steady_clock>;
struct key {
timepoint tp;
Task *taskp;
};
struct TScomp {
bool operator()(const key &a, const key &b) const
{
return a.tp > b.tp;
}
};
const int ThreadPoolSize = 40;
std::vector<std::thread> ThreadPool;
std::vector<Task *> tasksReadyToRunQueue;
std::priority_queue<key, std::vector<key>, TScomp> allTasksQueue;
std::thread TimerThr;
std::mutex TimerMtx, WorkerMtx;
std::condition_variable TimerCV, WorkerCV;
bool WorkerIsRunning = true;
bool TimerIsRunning = true;
void worker_thread();
void timer_thread();
};
Scheduler::Scheduler()
{
for (int i = 0; i <ThreadPoolSize; ++i)
ThreadPool.push_back(std::thread(&Scheduler::worker_thread, this));
TimerThr = std::thread(&Scheduler::timer_thread, this);
}
Scheduler::~Scheduler()
{
{
std::lock_guard<std::mutex> lck{TimerMtx};
TimerIsRunning = false;
TimerCV.notify_one();
}
TimerThr.join();
{
std::lock_guard<std::mutex> lck{WorkerMtx};
WorkerIsRunning = false;
WorkerCV.notify_all();
}
for (auto &t : ThreadPool)
t.join();
}
void Scheduler::add(Task &task, double delayToRun)
{
auto now = std::chrono::steady_clock::now();
long delay_ms = delayToRun * 1000;
std::chrono::milliseconds duration (delay_ms);
timepoint tp = now + duration;
if (now >= tp)
{
/*
* This is a short-cut
* When time is due, the task is directly dispatched to the workers
*/
std::lock_guard<std::mutex> lck{WorkerMtx};
tasksReadyToRunQueue.push_back(&task);
WorkerCV.notify_one();
} else
{
std::lock_guard<std::mutex> lck{TimerMtx};
allTasksQueue.push({tp, &task});
TimerCV.notify_one();
}
}
void Scheduler::worker_thread()
{
for (;;)
{
std::unique_lock<std::mutex> lck{WorkerMtx};
WorkerCV.wait(lck, [this] { return tasksReadyToRunQueue.size() != 0 ||
!WorkerIsRunning; } );
if (!WorkerIsRunning)
break;
Task *p = tasksReadyToRunQueue.back();
tasksReadyToRunQueue.pop_back();
lck.unlock();
p->run();
delete p; // delete Task
}
}
void Scheduler::timer_thread()
{
for (;;)
{
std::unique_lock<std::mutex> lck{TimerMtx};
if (!TimerIsRunning)
break;
auto duration = std::chrono::nanoseconds(1000000000);
if (allTasksQueue.size() != 0)
{
auto now = std::chrono::steady_clock::now();
auto head = allTasksQueue.top();
Task *p = head.taskp;
duration = head.tp - now;
if (now >= head.tp)
{
/*
* A Task is due, pass to worker threads
*/
std::unique_lock<std::mutex> ulck{WorkerMtx};
tasksReadyToRunQueue.push_back(p);
WorkerCV.notify_one();
ulck.unlock();
allTasksQueue.pop();
}
}
TimerCV.wait_for(lck, duration);
}
}
/*
* End sample implementation
*/
class DemoTask : public Task {
int n;
public:
DemoTask(int n=0) : n{n} { }
void run() override
{
std::cout << "Start task " << n << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(2));
std::cout << " Stop task " << n << std::endl;
}
};
int main()
{
Scheduler sched;
Task *t0 = new DemoTask{0};
Task *t1 = new DemoTask{1};
Task *t2 = new DemoTask{2};
Task *t3 = new DemoTask{3};
Task *t4 = new DemoTask{4};
Task *t5 = new DemoTask{5};
sched.add(*t0, 7.313);
sched.add(*t1, 2.213);
sched.add(*t2, 0.713);
sched.add(*t3, 1.243);
sched.add(*t4, 0.913);
sched.add(*t5, 3.313);
std::this_thread::sleep_for(std::chrono::seconds(10));
}

It means that you want to run all tasks one after another in some order.
You can keep the tasks in a stack (or even a linked list) sorted by delay. When a new task arrives, insert it at the position determined by its delay time (compute that position efficiently and insert the new task there).
Then run the tasks starting from the head of the stack (or list).
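One way to get the efficient sorted insertion described above is an ordered associative container instead of a hand-rolled list. The sketch below is only an illustration: the class name SortedTasks is hypothetical, and tasks are represented by a name and an absolute due time in seconds.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Keeps tasks ordered by due time. Tasks with equal due times keep
// their insertion order.
class SortedTasks {
    std::multimap<double, std::string> tasks_;  // due time (seconds) -> task name
public:
    void insert(double due, std::string name) {
        assert(due >= 0.0);
        tasks_.emplace(due, std::move(name));   // O(log n), lands in sorted position
    }
    bool empty() const { return tasks_.empty(); }
    // Removes and returns the name of the task with the smallest due time.
    std::string pop_head() {
        assert(!tasks_.empty());
        auto it = tasks_.begin();
        std::string name = std::move(it->second);
        tasks_.erase(it);
        return name;
    }
};
```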

Core code for C++11:
#include <thread>
#include <queue>
#include <chrono>
#include <mutex>
#include <atomic>
#include <cstdio>
#include <type_traits>
using namespace std::chrono;
using namespace std;
class Task {
public:
virtual void run() = 0;
};
template<typename T, typename = typename enable_if<std::is_base_of<Task, T>::value>::type>
class SchedulerItem {
public:
T task;
time_point<steady_clock> startTime;
int delay;
SchedulerItem(T t, time_point<steady_clock> s, int d) : task(t), startTime(s), delay(d){}
};
template<typename T, typename = typename enable_if<std::is_base_of<Task, T>::value>::type>
class Scheduler {
public:
queue<SchedulerItem<T>> pool;
mutex mtx;
atomic<bool> running;
Scheduler() : running(false){}
void add(T task, double delayMsToRun) {
lock_guard<mutex> lock(mtx);
pool.push(SchedulerItem<T>(task, steady_clock::now(), delayMsToRun));
if (running == false) runNext();
}
void runNext(void) {
running = true;
auto th = [this]() {
mtx.lock();
auto item = pool.front();
pool.pop();
mtx.unlock();
// startTime is a steady_clock time_point, so compare against steady_clock
auto remaining = (item.startTime + milliseconds(item.delay)) - steady_clock::now();
if(remaining.count() > 0) this_thread::sleep_for(remaining);
item.task.run();
// check the queue under the lock so a concurrent add() cannot be missed
lock_guard<mutex> lock(mtx);
if(pool.size() > 0)
runNext();
else
running = false;
};
thread t(th);
t.detach();
}
};
Test code:
class MyTask : public Task {
public:
virtual void run() override {
printf("mytask \n");
};
};
int main()
{
Scheduler<MyTask> s;
s.add(MyTask(), 0);
s.add(MyTask(), 2000);
s.add(MyTask(), 2500);
s.add(MyTask(), 6000);
std::this_thread::sleep_for(std::chrono::seconds(10));
}

Related

How to get local hour efficiently?

I'm developing a service. Currently I need to get local hour for every request, since it involves system call, it costs too much.
In my case, some deviation like 200ms is OK for me.
So what's the best way to maintain a variable storing local_hour, and update it every 200ms?
static int32_t GetLocalHour() {
time_t t = std::time(nullptr);
if (t == -1) { return -1; }
struct tm *time_info_ptr = localtime(&t);
return (nullptr != time_info_ptr) ? time_info_ptr->tm_hour : -1;
}
If you want your main thread to spend as little time as possible on getting the current hour you can start a background thread to do all the heavy lifting.
For all things time use std::chrono types.
Here is the example, which uses quite a few (very useful) multithreading building blocks from C++.
#include <chrono>
#include <future>
#include <condition_variable>
#include <mutex>
#include <atomic>
#include <iostream>
#include <thread>
// building blocks
// std::future/std::async, to start a loop/function on a separate thread
// std::atomic, to be able to read/write threadsafely from a variable
// std::chrono, for all things time
// std::condition_variable, for communicating between threads. Basically a signal that only signals that something has changed that might be interesting
// lambda functions : anonymous functions that are useful in this case for starting the asynchronous calls and to setup predicates (functions returning a bool)
// std::mutex : threadsafe access to a bit of code
// std::unique_lock : to automatically unlock a mutex when code goes out of scope (also needed for condition_variable)
// helper to convert time to start of day
using days_t = std::chrono::duration<int, std::ratio_multiply<std::chrono::hours::period, std::ratio<24> >::type>;
// class that has an asynchronously running loop that updates two variables (threadsafe)
// m_hours and m_seconds (m_seconds so output is a bit more interesting)
class time_keeper_t
{
public:
time_keeper_t() :
m_delay{ std::chrono::milliseconds(200) }, // update loop period
m_future{ std::async(std::launch::async,[this] {update_time_loop(); }) } // start update loop
{
// wait until the asynchronous loop has started.
// this can take a bit of time since the OS needs to schedule a thread for that
std::unique_lock<std::mutex> lock{ m_mtx };
m_cv.wait(lock, [this] {return m_started; });
}
~time_keeper_t()
{
// threadsafe stopping of the mainloop
// to avoid problems that the thread is still running but the object
// with members is deleted.
{
std::unique_lock<std::mutex> lock{ m_mtx };
m_stop = true;
m_cv.notify_all(); // this will wakeup the loop and stop
}
// future.get will wait until the loop also has finished
// this ensures no member variables will be accessed
// by the loop thread and it is safe to fully destroy this instance
m_future.get();
}
// inline to avoid extra calls
inline int hours() const
{
return m_hours;
}
// inline to avoid extra calls
inline int seconds() const
{
return m_seconds;
}
private:
void update_time()
{
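// system_clock counts from a calendar epoch; steady_clock's epoch is arbitrary
// (often boot time), so it cannot be used to derive the time of day.
// Note: this yields UTC; converting to local time still requires localtime().
auto now = std::chrono::system_clock::now();
std::chrono::system_clock::duration tp = now.time_since_epoch();
// calculate back till start of day
days_t days = std::chrono::duration_cast<days_t>(tp);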
tp -= days;
// calculate hours since start of day
auto hours = std::chrono::duration_cast<std::chrono::hours>(tp);
tp -= hours;
m_hours = hours.count();
// seconds since start of last hour
auto seconds = std::chrono::duration_cast<std::chrono::seconds>(tp);
m_seconds = seconds.count() % 60;
}
void update_time_loop()
{
std::unique_lock<std::mutex> lock{ m_mtx };
update_time();
// loop has started and has initialized all things time with values
m_started = true;
m_cv.notify_all();
// stop condition for the main loop, put in a predicate lambda
auto stop_condition = [this]()
{
return m_stop;
};
while (!m_stop)
{
// wait until m_cv is signaled or m_delay timed out
// a condition variable allows instant response and thus
// is better than just having a sleep here.
// (imagine a delay of seconds, that would also mean stopping could
// take seconds, this is faster)
m_cv.wait_for(lock, m_delay, stop_condition);
if (!m_stop) update_time();
}
}
std::atomic<int> m_hours;
std::atomic<int> m_seconds;
std::mutex m_mtx;
std::condition_variable m_cv;
bool m_started{ false };
bool m_stop{ false };
std::chrono::steady_clock::time_point m_now;
std::chrono::steady_clock::duration m_delay;
std::future<void> m_future;
};
int main()
{
time_keeper_t time_keeper;
// the mainloop now just can ask the time_keeper for seconds
// or in your case hours. The only time needed is the time
// to return an int (atomic) instead of having to make a full
// api call to get the time.
for (std::size_t n = 0; n < 30; ++n)
{
std::cout << "seconds now = " << time_keeper.seconds() << "\n";
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
return 0;
}
You don't need to query the local time for every request, because the hour doesn't change every 200 ms. Just update the local hour variable every hour.
The most correct solution would be to register for a timer event that runs at the start of every hour, like a scheduled task on Windows or a cron job on Linux. Alternatively, create a timer that fires every hour and updates the variable.
Timer creation depends on the platform: for example, on Windows use SetTimer, and on Linux use timer_create. Here's a very simple solution using boost::asio, which assumes that you start on the exact hour. You'll need to make some modifications to allow it to start at any time, for example by creating a one-shot timer or by sleeping until the next hour.
#include <atomic>
#include <cstdint>
#include <ctime>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
int32_t get_local_hour()
{
time_t t = std::time(nullptr);
if (t == -1) { return -1; }
struct tm *time_info_ptr = localtime(&t);
return (nullptr != time_info_ptr) ? time_info_ptr->tm_hour : -1;
}
// atomic so that request-handling threads can read it while the timer updates it
static std::atomic<int32_t> local_hour{get_local_hour()};
// Timer callback, called every hour
void update_local_hour(const boost::system::error_code& e,
boost::asio::deadline_timer* t)
{
if (e) { return; } // timer was cancelled or failed: stop rescheduling
local_hour = get_local_hour();
// re-arm the timer relative to the previous expiry to avoid drift
t->expires_at(t->expires_at() + boost::posix_time::hours(1));
t->async_wait(boost::bind(update_local_hour,
boost::asio::placeholders::error, t));
}
int main()
{
boost::asio::io_service io;
// Timer that fires every hour and updates the local_hour variable
boost::asio::deadline_timer t(io, boost::posix_time::hours(1));
t.async_wait(boost::bind(update_local_hour,
boost::asio::placeholders::error, &t));
// run() blocks for as long as the timer keeps rescheduling itself;
// call t.cancel() from another thread to stop it
io.run();
}
Now just use local_hour directly instead of GetLocalHour()
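To drop the "starts on the exact hour" assumption, the first wait can be shortened to the time remaining in the current hour. The helper below is a sketch using only the standard library; the name until_next_hour is hypothetical, and it assumes the caller passes a broken-down local time such as the one returned by localtime.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Seconds remaining until the next full local hour, given a broken-down
// local time.
std::chrono::seconds until_next_hour(const std::tm& local) {
    assert(local.tm_min >= 0 && local.tm_min < 60);
    int elapsed = local.tm_min * 60 + local.tm_sec;  // seconds into the current hour
    return std::chrono::seconds(3600 - elapsed);
}
```

The result can be used as the initial expiry of a one-shot timer, after which the hourly timer takes over.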

Correct way to wait a condition variable that is notified by several threads

I'm trying to do this with the C++11 concurrency support.
I have a sort of thread pool of worker threads that all do the same thing, where a master thread has an array of condition variables (one for each thread, they need to 'start' synchronized, ie not run ahead one cycle of their loop).
for (auto &worker_cond : cond_arr) {
worker_cond.notify_one();
}
then this thread has to wait for a notification from each thread of the pool before restarting its cycle. What's the correct way of doing this? Have a single condition variable and wait on some integer that each thread that isn't the master is going to increase? Something like (still in the master thread)
unique_lock<std::mutex> lock(workers_mtx);
workers_finished.wait(lock, [&] { return workers == cond_arr.size(); });
I see two options here:
Option 1: join()
Basically instead of using a condition variable to start the calculations in your threads, you spawn a new thread for every iteration and use join() to wait for it to be finished. Then you spawn new threads for the next iteration and so on.
Option 2: locks
You don't want the main-thread to notify as long as one of the threads is still working. So each thread gets its own lock, which it locks before doing the calculations and unlocks afterwards. Your main-thread locks all of them before calling the notify() and unlocks them afterwards.
I see nothing fundamentally wrong with your solution.
Guard workers with workers_mtx and done.
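Spelled out, the counter-plus-condition-variable approach could look like the sketch below. FinishCounter and run_one_cycle are hypothetical names; the key point is that the counter is only ever touched while the mutex is held.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: workers call done(), the master calls wait_all().
class FinishCounter {
    std::mutex mtx_;
    std::condition_variable cv_;
    std::size_t finished_ = 0;
    const std::size_t expected_;
public:
    explicit FinishCounter(std::size_t expected) : expected_(expected) {
        assert(expected > 0);
    }
    void done() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            ++finished_;
        }
        cv_.notify_one();  // only the master waits on cv_
    }
    // Blocks until every worker has reported, resets the counter for the
    // next cycle and returns how many reports were seen.
    std::size_t wait_all() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return finished_ == expected_; });
        std::size_t seen = finished_;
        finished_ = 0;
        return seen;
    }
};

// Usage sketch: n workers each report once, the master waits for all of them.
std::size_t run_one_cycle(std::size_t n) {
    FinishCounter counter(n);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i)
        workers.emplace_back([&counter] { counter.done(); });
    std::size_t seen = counter.wait_all();
    for (auto& w : workers) w.join();
    return seen;
}
```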
We could abstract this with a counting semaphore.
struct counting_semaphore {
std::unique_ptr<std::mutex> m=std::make_unique<std::mutex>();
std::ptrdiff_t count = 0;
std::unique_ptr<std::condition_variable> cv=std::make_unique<std::condition_variable>();
counting_semaphore( std::ptrdiff_t c=0 ):count(c) {}
counting_semaphore(counting_semaphore&&)=default;
void take(std::size_t n = 1) {
std::unique_lock<std::mutex> lock(*m);
cv->wait(lock, [&]{ if (count-std::ptrdiff_t(n) < 0) return false; count-=n; return true; } );
}
void give(std::size_t n = 1) {
{
std::unique_lock<std::mutex> lock(*m);
count += n;
if (count <= 0) return;
}
cv->notify_all();
}
};
take takes count away, and blocks if there is not enough.
give adds to count, and notifies if there is a positive amount.
Now the worker threads ferry tokens between two semaphores.
std::vector< counting_semaphore > m_worker_start(count); // parentheses: count default-constructed semaphores
counting_semaphore m_worker_done{0}; // not count, zero
std::atomic<bool> m_shutdown{false};
// master controller:
for (each step) {
for (auto&& starts:m_worker_start)
starts.give();
m_worker_done.take(count);
}
// master shutdown:
m_shutdown = true;
// wake up forever:
for (auto&& starts:m_worker_start)
starts.give(std::size_t(-1)/2);
// worker thread:
while (true) {
master->m_worker_start[my_id].take();
if (master->m_shutdown) return;
// do work
master->m_worker_done.give();
}
or somesuch.

C++ thread that starts several threads

I am trying to do a program that has to run 2 tasks periodically.
That is, for example, run task 1 every 10 seconds, and run task 2 every 20 seconds.
What I am thinking is to create two threads, each one with a timer. Thread 1 launches a new thread with task 1 every 10 seconds. and Thread 2 launches a new thread with task 2 every 20 seconds.
My doubt is, how to launch a new task 1 if the previous task 1 hasn't finished?
while (true)
{
thread t1 (task1);
this_thread::sleep_for(std::chrono::seconds(10));
t1.join();
}
I was trying this, but this way it will only launch a new task 1 when the previous one finishes.
EDIT:
Basically I want to implement a task scheduler.
Run task1 every X seconds.
Run task2 every Y seconds.
I was thinking in something like this:
thread t1 (timer1);
thread t2 (timer2);
void timer1()
{
while (true)
{
thread t (task1);
t.detach();
sleep(X);
}
}
the same for timer2 and task2
Perhaps you could create a periodic_task handler that is responsible for scheduling one task every t seconds. And then you can launch a periodic_task with a specific function and time duration from anywhere you want to in your program.
Below I've sketched something out. One valid choice is to detach the thread and let it run forever. Another is to include cancellation to allow the parent thread to cancel/join. I've included functionality to allow the latter (though you could still just detach/forget).
#include <chrono>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
class periodic_task
{
std::chrono::seconds d_;
std::function<void()> task_;
std::mutex mut_;
std::condition_variable cv_;
bool cancel_{false};
public:
periodic_task(std::function<void()> task, std::chrono::seconds s)
: d_{s}
, task_(std::move(task))
{}
void
operator()()
{
std::unique_lock<std::mutex> lk{mut_};
auto until = std::chrono::steady_clock::now();
while (true)
{
while (!cancel_ && std::chrono::steady_clock::now() < until)
cv_.wait_until(lk, until);
if (cancel_)
return;
lk.unlock();
task_();
lk.lock();
until += d_;
}
}
void cancel()
{
std::unique_lock<std::mutex> lk{mut_};
cancel_ = true;
cv_.notify_one();
}
};
void
short_task()
{
std::cerr << "short\n";
}
void
long_task(int i, const std::string& message)
{
std::cerr << "long " << message << ' ' << i << '\n';
}
int
main()
{
using namespace std::chrono_literals;
periodic_task task_short{short_task, 7s};
periodic_task task_long{[](){long_task(5, "Hi");}, 13s};
std::thread t1{std::ref(task_short)};
std::this_thread::sleep_for(200ms);
std::thread t2{std::ref(task_long)};
std::this_thread::sleep_for(1min);
task_short.cancel();
task_long.cancel();
t1.join();
t2.join();
}
You want to avoid using thread::join(), which by definition waits for the thread to finish. Instead, use thread::detach before sleeping, so it doesn't need to wait.
I'd suggest reading up on it http://www.cplusplus.com/reference/thread/thread/detach/

C++: Thread pool slower than single threading?

First of all I did look at the other topics on this website and found they don't relate to my problem as those mostly deal with people using I/O operations or thread creation overheads. My problem is that my threadpool or worker-task structure implementation is (in this case) a lot slower than single threading. I'm really confused by this and not sure if it's the ThreadPool, the task itself, how I test it, the nature of threads or something out of my control.
// Sorry for the long code
#include <vector>
#include <queue>
#include <thread>
#include <mutex>
#include <future>
#include "task.hpp"
class ThreadPool
{
public:
ThreadPool()
{
for (unsigned i = 0; i < std::thread::hardware_concurrency() - 1; i++)
m_workers.emplace_back(this, i);
m_running = true;
for (auto&& worker : m_workers)
worker.start();
}
~ThreadPool()
{
m_running = false;
m_task_signal.notify_all();
for (auto&& worker : m_workers)
worker.terminate();
}
void add_task(Task* task)
{
{
std::unique_lock<std::mutex> lock(m_in_mutex);
m_in.push(task);
}
m_task_signal.notify_one();
}
private:
class Worker
{
public:
Worker(ThreadPool* parent, unsigned id) : m_parent(parent), m_id(id)
{}
~Worker()
{
terminate();
}
void start()
{
m_thread = new std::thread(&Worker::work, this);
}
void terminate()
{
if (m_thread)
{
if (m_thread->joinable())
{
m_thread->join();
delete m_thread;
m_thread = nullptr;
m_parent = nullptr;
}
}
}
private:
void work()
{
while (m_parent->m_running)
{
std::unique_lock<std::mutex> lock(m_parent->m_in_mutex);
m_parent->m_task_signal.wait(lock, [&]()
{
return !m_parent->m_in.empty() || !m_parent->m_running;
});
if (!m_parent->m_running) break;
Task* task = m_parent->m_in.front();
m_parent->m_in.pop();
// Fixed the mutex being locked while the task is executed
lock.unlock();
task->execute();
}
}
private:
ThreadPool* m_parent = nullptr;
unsigned m_id = 0;
std::thread* m_thread = nullptr;
};
private:
std::vector<Worker> m_workers;
std::mutex m_in_mutex;
std::condition_variable m_task_signal;
std::queue<Task*> m_in;
bool m_running = false;
};
class TestTask : public Task
{
public:
TestTask() {}
TestTask(unsigned number) : m_number(number) {}
inline void Set(unsigned number) { m_number = number; }
void execute() override
{
if (m_number <= 3)
{
m_is_prime = m_number > 1;
return;
}
else if (m_number % 2 == 0 || m_number % 3 == 0)
{
m_is_prime = false;
return;
}
else
{
for (unsigned i = 5; i * i <= m_number; i += 6)
{
if (m_number % i == 0 || m_number % (i + 2) == 0)
{
m_is_prime = false;
return;
}
}
m_is_prime = true;
return;
}
}
public:
unsigned m_number = 0;
bool m_is_prime = false;
};
int main()
{
ThreadPool pool;
unsigned num_tasks = 1000000;
std::vector<TestTask> tasks(num_tasks);
for (auto&& task : tasks)
task.Set(randint(0, 1000000000));
auto s = std::chrono::high_resolution_clock::now();
#if MT
for (auto&& task : tasks)
pool.add_task(&task);
#else
for (auto&& task : tasks)
task.execute();
#endif
auto e = std::chrono::high_resolution_clock::now();
double seconds = std::chrono::duration_cast<std::chrono::nanoseconds>(e - s).count() / 1000000000.0;
}
Benchmarks with VS2013 Profiler:
10,000,000 tasks:
MT:
13 seconds of wall clock time
93.36% is spent in msvcp120.dll
3.45% is spent in Task::execute() // Not good here
ST:
0.5 seconds of wall clock time
97.31% is spent with Task::execute()
Usual disclaimer in such answers: the only way to tell for sure is to measure it with a profiler tool.
But I will try to explain your results without it. First of all, you have one mutex shared by all your threads, so only one thread at a time can execute a task. That kills any gains you might have had: in spite of your threads, your code is perfectly serial. So at the very least move the task execution outside the mutex. You need to lock the mutex only to get a task out of the queue; you don't need to hold it while the task executes.
Next, your tasks are so simple that a single thread will execute them in no time. You just can't measure any gains with such tasks. Create some heavier tasks that could produce more interesting results (tasks that are closer to the real world, not so contrived).
And the third point: threads are not without their cost (context switching, mutex contention, etc.). To see real gains, as the previous two points say, you need tasks that take more time than the overhead threads introduce, and the code should be truly parallel instead of waiting on some resource that makes it serial.
UPD: I looked at the wrong part of the code. The task is complex enough provided you create tasks with sufficiently large numbers.
UPD2: I've played with your code and found a good prime number to show how the MT code is better. Use the following prime number: 1019048297. It will give enough computation complexity to show the difference.
But why doesn't your code produce good results? It is hard to tell without seeing the implementation of randint(), but I take it that it is pretty simple: in half of the cases it returns even numbers, and the other cases don't produce many big prime numbers either. So the tasks are so simple that context switching and other overhead, both from your particular implementation and from threads in general, consume more time than the computation itself. The prime number I gave you leaves the tasks no choice but to spend time computing; there is no easy answer since the number is big and actually prime. That's why the big number will give you the answer you seek: better time for the MT code.
You should not hold the mutex while the task is getting executed, otherwise other threads will not be able to get a task:
void work() {
while (m_parent->m_running) {
Task* currentTask = nullptr;
std::unique_lock<std::mutex> lock(m_parent->m_in_mutex);
m_parent->m_task_signal.wait(lock, [&]() {
return !m_parent->m_in.empty() || !m_parent->m_running;
});
if (!m_parent->m_running) continue;
currentTask = m_parent->m_in.front();
m_parent->m_in.pop();
lock.unlock(); //<- Release the lock so that other threads can get tasks
currentTask->execute();
currentTask = nullptr;
}
}
For MT, how much time is spent in each phase of the "overhead": std::unique_lock, m_task_signal.wait, front, pop, unlock?
Based on your results of only 3% useful work, this means the above consumes 97%. I'd get numbers for each part of the above (e.g. add timestamps between each call).
It seems to me, that the code you use to [merely] dequeue the next task pointer is quite heavy. I'd do a much simpler queue [possibly lockless] mechanism. Or, perhaps, use atomics to bump an index into the queue instead of the five step process above. For example:
void
work()
{
while (m_parent->m_running) {
// NOTE: this is just an example, not necessarily the real function
int curindex = atomic_increment(&global_index);
if (curindex >= max_index)
break;
Task *task = m_parent->m_in[curindex];
task->execute();
}
}
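With standard C++11 facilities, the placeholder atomic_increment above could be expressed with std::atomic and fetch_add. The following is a sketch under the assumption that all tasks are known up front and stored in a vector; run_all is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task exactly once across `workers` threads. Each thread claims
// the next unclaimed index with an atomic fetch_add instead of locking a queue.
void run_all(const std::vector<std::function<void()>>& tasks, unsigned workers) {
    assert(workers > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (;;) {
                // relaxed is enough: the task list itself is never modified
                std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= tasks.size()) break;
                tasks[i]();
            }
        });
    }
    for (auto& t : pool) t.join();
}
```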
Also, maybe you should pop [say] ten at a time instead of just one.
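Batching can be sketched as a helper that takes several items per lock acquisition, so the locking cost is shared across the batch. The name pop_batch and its signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Moves up to `max_batch` items out of `q` under a single lock acquisition.
template <typename T>
std::vector<T> pop_batch(std::queue<T>& q, std::mutex& m, std::size_t max_batch) {
    assert(max_batch > 0);
    std::vector<T> batch;
    std::lock_guard<std::mutex> lock(m);
    while (!q.empty() && batch.size() < max_batch) {
        batch.push_back(std::move(q.front()));
        q.pop();
    }
    return batch;  // the lock is released on return; the caller then runs the batch
}
```

A worker would call pop_batch(queue, mutex, 10) and execute the returned tasks without holding the lock.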
You might also be memory bound and/or "task switch" bound. (e.g.) For threads that access an array, more than four threads usually saturates the memory bus. You could also have heavy contention for the lock, such that the threads get starved because one thread is monopolizing the lock [indirectly, even with the new unlock call]
Interthread locking usually involves a "serialization" operation where other cores must synchronize their out-of-order execution pipelines.
Here's a "lockless" implementation:
void
work()
{
// assume m_id is 0,1,2,...
int curindex = m_id;
while (m_parent->m_running) {
if (curindex >= max_index)
break;
Task *task = m_parent->m_in[curindex];
task->execute();
curindex += NUMBER_OF_WORKERS;
}
}

How can I safely terminate worker threads when they are complete?

I was trying to implement a master-worker model using the C++11 synchronization features for practice. The model uses a std::queue object along with a condition variable and some mutexes. The master thread puts tasks in the queue, and the worker threads pop tasks off the queue and "process" them.
The code I have works properly (unless I've missed some race conditions) when I don't terminate the worker threads. However, the program never ends until you manually terminate it with Ctrl+C. I have some code to terminate the workers after the master thread finishes. Unfortunately, this doesn't work properly as it skips the last task on some execution runs.
So my question:
Is it possible to safely and properly terminate worker threads after all tasks have been processed?
This was just a proof of concept and I'm new to C++ 11 features so I apologize for my style. I appreciate any constructive criticism.
EDIT: nogard has kindly pointed out that this implementation of the model makes it quite complicated and showed me that what I'm asking for is pointless since a good implementation will not have this problem. Thread pools are the way to go in order to implement this properly. Also, I should be using an std::atomic instead of a normal boolean for worker_done (Thanks Jarod42).
#include <iostream>
#include <sstream>
#include <string>
#include <thread>
#include <mutex>
#include <queue>
#include <condition_variable>
//To sleep
#include <unistd.h>
struct Task
{
int taskID;
};
typedef struct Task task;
//cout mutex
std::mutex printstream_accessor;
//queue related objects
std::queue<task> taskList;
std::mutex queue_accessor;
std::condition_variable cv;
//worker flag
bool worker_done = false;
//It is acceptable to call this on a lock only if you poll - you will get an inaccurate answer otherwise
//Will return true if the queue is empty, false if not
bool task_delegation_eligible()
{
return taskList.empty();
}
//Thread safe cout function
void safe_cout(std::string input)
{
// Apply a stream lock and state the calling thread information then print the input
std::unique_lock<std::mutex> cout_lock(printstream_accessor);
std::cout << "Thread:" << std::this_thread::get_id() << " " << input << std::endl;
}//cout_lock destroyed, therefore printstream_accessor mutex is unlocked
void worker_thread()
{
safe_cout("worker_thread() initialized");
while (!worker_done)
{
task getTask;
{
std::unique_lock<std::mutex> q_lock(queue_accessor);
cv.wait(q_lock,
[]
{ //predicate that will check if available
//using a lambda function to apply the ! operator
if (worker_done)
return true;
return !task_delegation_eligible();
}
);
if (!worker_done)
{
//Remove task from the queue
getTask = taskList.front();
taskList.pop();
}
}
if (!worker_done)
{
//process task
std::string statement = "Processing TaskID:";
std::stringstream convert;
convert << getTask.taskID;
statement += convert.str();
//print task information
safe_cout(statement);
//"process" task
usleep(5000);
}
}
}
/**
* master_thread():
* This thread is responsible for creating task objects and pushing them onto the queue
* After this, it will notify all other threads who are waiting to consume data
*/
void master_thread()
{
safe_cout("master_thread() initialized");
for (int i = 0; i < 10; i++)
{
//Following 2 lines are needed if you don't want this thread to bombard the queue with tasks before a task can be processed
while (!task_delegation_eligible() ) //task_delegation_eligible() is true IFF queue is empty
std::this_thread::yield(); //yield execution to other threads (if there are tasks on the queue)
//create a new task
task newTask;
newTask.taskID = (i+1);
//lock the queue then push
{
std::unique_lock<std::mutex> q_lock(queue_accessor);
taskList.push(newTask);
}//unique_lock destroyed here
cv.notify_one();
}
safe_cout("master_thread() complete");
}
int main(void)
{
std::thread MASTER_THREAD(master_thread); //create a thread object named MASTER_THREAD and have it run the function master_thread()
std::thread WORKER_THREAD_1(worker_thread);
std::thread WORKER_THREAD_2(worker_thread);
std::thread WORKER_THREAD_3(worker_thread);
MASTER_THREAD.join();
//wait for the queue tasks to finish
while (!task_delegation_eligible()); //spin until the queue is empty
/**
* Following 2 lines
* Terminate worker threads => this doesn't work as expected.
* The model is fine as long as you don't try to stop the worker
* threads like this as it might skip a task, however this program
* will terminate
*/
worker_done = true;
cv.notify_all();
WORKER_THREAD_1.join();
WORKER_THREAD_2.join();
WORKER_THREAD_3.join();
return 0;
}
Thanks a lot
There is a visibility issue in your program: the change to the worker_done flag made in one thread might not be observed by a worker thread. To guarantee that the results of one action are observable by a second action, you have to use some form of synchronization so that the second thread sees what the first thread did.
To fix this issue you can use atomic as proposed by Jarod42.
If you are writing this program for practice, that's fine, but real applications could profit from an existing thread pool, which would greatly simplify your code.
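For practice, here is a minimal sketch of one way to shut the workers down without losing a task. It is a different approach from the atomic flag suggested above: the done flag is only read and written under the queue mutex, and a worker exits only once the queue is empty. In the question's code, a worker that has already popped a task re-checks worker_done before processing it, which looks like the reason the last task is sometimes skipped. The function name run_master_worker and the returned sum of task IDs are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical demo: the calling thread acts as master, pushes task IDs
// 1..num_tasks, then requests shutdown. Returns the sum of the IDs that the
// workers processed, so a skipped task shows up as a wrong total.
long run_master_worker(int num_tasks, unsigned num_workers)
{
    assert(num_workers > 0);
    std::queue<int> tasks;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;               // guarded by m, so a plain bool is enough
    std::atomic<long> id_sum{0};

    auto worker = [&]() {
        for (;;) {
            int id;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return done || !tasks.empty(); });
                if (tasks.empty())
                    return;          // done was set and nothing is left to do
                id = tasks.front();
                tasks.pop();
            }
            id_sum += id;            // "process" the task outside the lock
        }
    };

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w)
        workers.emplace_back(worker);

    for (int i = 1; i <= num_tasks; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);
            tasks.push(i);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;                 // no more tasks will be added
    }
    cv.notify_all();
    for (auto& t : workers)
        t.join();
    return id_sum.load();
}
```

Because a popped task is always processed and workers drain the queue before exiting, the master does not need to spin on the queue before setting the flag.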