#ifndef THREADPOOL_H
#define THREADPOOL_H

#include <iostream>
#include <deque>
#include <functional>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <atomic>
#include <vector>

// thread pool
class ThreadPool
{
public:
    ThreadPool(unsigned int n = std::thread::hardware_concurrency())
        : busy()
        , processed()
        , stop()
    {
        for (unsigned int i = 0; i < n; ++i)
            workers.emplace_back(std::bind(&ThreadPool::thread_proc, this));
    }

    template<class F> void enqueue(F&& f)
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        tasks.emplace_back(std::forward<F>(f));
        cv_task.notify_one();
    }

    void waitFinished()
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        cv_finished.wait(lock, [this](){ return tasks.empty() && (busy == 0); });
    }

    ~ThreadPool()
    {
        // set stop-condition
        std::unique_lock<std::mutex> latch(queue_mutex);
        stop = true;
        cv_task.notify_all();
        latch.unlock();

        // all threads terminate, then we're done.
        for (auto& t : workers)
            t.join();
    }

    unsigned int getProcessed() const { return processed; }

private:
    std::vector<std::thread> workers;
    std::deque<std::function<void()>> tasks;
    std::mutex queue_mutex;
    std::condition_variable cv_task;
    std::condition_variable cv_finished;
    unsigned int busy;
    std::atomic_uint processed;
    bool stop;

    void thread_proc()
    {
        while (true)
        {
            std::unique_lock<std::mutex> latch(queue_mutex);
            cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
            if (!tasks.empty())
            {
                // got work. set busy.
                ++busy;

                // pull from queue
                auto fn = tasks.front();
                tasks.pop_front();

                // release lock. run async
                latch.unlock();

                // run function outside context
                fn();
                ++processed;

                latch.lock();
                --busy;
                cv_finished.notify_one();
            }
            else if (stop)
            {
                break;
            }
        }
    }
};

#endif // THREADPOOL_H
I have the above thread pool implementation, using a latch. However, every time I add a task through the enqueue call, the overhead is quite large: it takes about 100 microseconds.
How can I improve the performance of the thread pool?
Your code looks fine. The comments on your question about compiling with release optimizations on are probably correct, and that may be all you need to do.
Disclaimer: always measure code first with appropriate tools to identify where the bottlenecks are before attempting to improve its performance. Otherwise, you might not get the improvements you seek.
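For instance, a crude way to sanity-check the enqueue cost is to time a batch of trivial tasks, as in the rough sketch below (it assumes the header above is saved as "threadpool.h"; a real profiler gives far better data, and it must be built with optimizations on):

#include <chrono>
#include <iostream>
#include "threadpool.h" // assumed filename for the header above

int main() {
    ThreadPool pool;
    constexpr int N = 10000;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        pool.enqueue([]{});   // empty task: measures pure enqueue overhead
    auto end = std::chrono::steady_clock::now();
    pool.waitFinished();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    std::cout << "avg enqueue: " << ns / N << " ns\n";
}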
That said, here are a couple of potential micro-optimizations I see.
Change this in your thread_proc function:
while (true)
{
    std::unique_lock<std::mutex> latch(queue_mutex);
    cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
    if (!tasks.empty())
To this:
std::unique_lock<std::mutex> latch(queue_mutex);
while (!stop)
{
    cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
    while (!tasks.empty() && !stop)
Then remove the else if (stop) block at the end of the function.
The main impact of this change is that it avoids the extra "unlock" and "lock" of queue_mutex caused by latch going out of scope on each iteration of the while loop. Changing if (!tasks.empty()) to while (!tasks.empty()) might save a cycle or two as well, by letting the currently executing thread, which already holds the lock and has the quantum, try to dequeue the next work item.
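Put together, the revised thread_proc would look something like this (a sketch of the two changes described above, not a tested drop-in):

void thread_proc()
{
    std::unique_lock<std::mutex> latch(queue_mutex);  // acquired once, held across iterations
    while (!stop)
    {
        cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
        while (!tasks.empty() && !stop)
        {
            // got work. set busy.
            ++busy;
            auto fn = tasks.front();
            tasks.pop_front();

            latch.unlock();   // run the task without holding the lock
            fn();
            ++processed;
            latch.lock();     // re-acquire before touching busy/tasks again

            --busy;
            cv_finished.notify_one();
        }
    }
}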
One final thing. I'm always of the opinion that the notify should be outside the lock. That way, there's no lock contention when the other thread is woken up by the thread that just updated the queue. But I've never actually measured this assumption, so take it with a grain of salt:
template<class F> void enqueue(F&& f)
{
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        tasks.emplace_back(std::forward<F>(f));
    }   // lock is released here, before notifying
    cv_task.notify_one();
}
Related
I'm having trouble thinking of a way to properly implement a signalling mechanism for multiple listeners waiting in the same function for a producer to signal some new data continuously, without getting "signalled" for the same previous data.
I want all listeners to always see the latest available data (not caring about missed signals if they are busy), without repeats.
My attempt so far:
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <condition_variable>
#include <thread>
#include <chrono>
#include <cstdlib>

class Signaller {
public:
    // Used by producer, will hold on to the mutex uniquely as it modifies data
    void Signal(const std::function<void()>& fnIn) {
        std::unique_lock lock(m_mtx);
        fnIn();
        m_newData = true;
        m_cv.notify_all();
    }

    // Used by consumers, will only hold shared mutex to read data
    void Wait(const std::function<void()>& fnIn) {
        {
            std::shared_lock lock(m_mtx);
            m_cv.wait(lock, [this](){ return m_newData; });
            fnIn();
        }
        // Need some way to flip m_newData to false when all threads are "done"
        // (or some other method of preventing spurious wakeups)
        // I don't think this is particularly ideal
        {
            std::unique_lock lock(m_mtx);
            m_newData = false;
        }
    }

private:
    std::condition_variable_any m_cv;
    std::shared_mutex m_mtx;
    bool m_newData{false}; // To prevent spurious wakeups
};

class Example {
public:
    // Multiple threads will call this function in the same instance of Example
    void ConsumerLoop()
    {
        int latestData{0};
        while (true){
            m_signaller.Wait([this, &latestData](){ latestData = m_latestData; });
            // process latestData...
            // I want to make sure latestData here is always the latest
            // (It's OK to miss a few signals in between if it's off processing this latest data)
        }
    }

    // One thread will be using this to signal new data
    void ProducerLoop(){
        while(true){
            int newData = rand();
            m_signaller.Signal([this, newData](){ m_latestData = newData; });
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

private:
    Signaller m_signaller;
    int m_latestData;
};
My main issue (I think) is how to prevent spurious wakeups, while preventing repeated data from waking up the same thread. I've thought about using some sort of counter within each thread to keep track of whether it's receiving the same data, but couldn't get anywhere with that idea (unless I perhaps make some sort of map using std::this_thread::get_id?). Is there a better way to do this?
EDIT:
Expanding on my map of thread ID's idea, I think I've found a solution:
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <condition_variable>
#include <unordered_map>
#include <thread>
#include <chrono>
#include <cstdint>
#include <cstdlib>

class Signaller {
public:
    // Used by producer, will hold on to the mutex uniquely as it modifies data
    void Signal(const std::function<void()>& fnIn) {
        std::unique_lock lock(m_mtx);
        fnIn();
        m_ctr++;
        m_cv.notify_all();
    }

    void RegisterWaiter(){
        std::unique_lock lock(m_mtx);
        auto [itr, emplaced] = m_threadCtrMap.try_emplace(std::this_thread::get_id(), m_ctr);
        if (!emplaced) {
            itr->second = m_ctr;
        }
    }

    // Used by consumers, will only hold shared mutex to read data.
    // Note: each thread must call RegisterWaiter() first, so that operator[]
    // below never inserts (an insert would modify the map under a shared lock).
    void Wait(const std::function<void()>& fnIn) {
        std::shared_lock lock(m_mtx);
        m_cv.wait(lock, [this](){ return m_threadCtrMap[std::this_thread::get_id()] != m_ctr; });
        fnIn();
        m_threadCtrMap[std::this_thread::get_id()] = m_ctr;
    }

private:
    std::condition_variable_any m_cv;
    std::shared_mutex m_mtx;
    std::uint32_t m_ctr{0};
    std::unordered_map<std::thread::id, std::uint32_t> m_threadCtrMap; // Stores the last signalled ctr for that thread
};

class Example {
public:
    // Multiple threads will call this function in the same instance of Example
    void ConsumerLoop()
    {
        int latestData{0};
        m_signaller.RegisterWaiter();
        while (true){
            m_signaller.Wait([this, &latestData](){ latestData = m_latestData; });
        }
    }

    // One thread will be using this to signal new data
    void ProducerLoop(){
        while(true){
            int newData = rand();
            m_signaller.Signal([this, newData](){ m_latestData = newData; });
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

private:
    Signaller m_signaller;
    int m_latestData;
};
Here's my implementation:
#include <unordered_map>
#include <condition_variable>
#include <shared_mutex>
#include <mutex>
#include <thread>
#include <cstdint>

/*
Example usage:

struct MyClass {
    MultiCVSignaller m_signaller;
    int m_latestData;
    std::atomic<bool> m_stop{false};

    ~MyClass(){
        m_stop = true;
        m_signaller.Shutdown();
    }

    void FuncToWaitOnData() { // e.g. Multiple threads call this fn to "subscribe" to the signal
        auto& signalCtr = m_signaller.RegisterListener();
        while(!m_stop.load(std::memory_order_relaxed)) {
            int latestDataInLocalThread;
            // WaitForSignal() calls the provided function while holding on to the shared mutex
            m_signaller.WaitForSignal(signalCtr, [this, &latestDataInLocalThread](){
                latestDataInLocalThread = m_latestData;
            });
            // Make use of latest data...
        }
    }

    void ProducerLoop() {
        while(!m_stop.load(std::memory_order_relaxed)) {
            // Signal() holds on to the mutex uniquely while calling the provided function.
            m_signaller.Signal([this](){
                m_latestData = rand();
            });
        }
    }
};
*/
class MultiCVSignaller
{
public:
    using SignalCtr = std::uint32_t;

public:
    MultiCVSignaller() = default;
    ~MultiCVSignaller() { Shutdown(); }

    /*
    Call to set and signal the shutdown state, cancelling waits (and skipping the functions
    provided, if any). This should be called in the owning class' destructor before threads
    are joined.
    */
    void Shutdown() {
        std::unique_lock lock(m_mtx);
        m_shutdown = true;
        m_cv.notify_all();
    }

    // Calls the function if specified while holding on to the mutex with a UNIQUE lock
    template<class Func = void(*)()>
    void Signal(Func fnIn = +[]{})
    {
        std::unique_lock lock(m_mtx);
        fnIn();
        m_ctr++;
        m_cv.notify_all();
    }

    MultiCVSignaller::SignalCtr& RegisterListener(){
        std::unique_lock lock(m_mtx);
        auto [itr, emplaced] = m_threadCtrMap.try_emplace(std::this_thread::get_id(), m_ctr);
        if (!emplaced) {
            itr->second = m_ctr;
        }
        return itr->second;
    }

    /*
    Calls the optional function while holding on to the SHARED lock when signalled. The signalCtr
    argument should be the reference returned by RegisterListener() (see example).
    */
    template<class Func = void(*)()>
    void WaitForSignal(MultiCVSignaller::SignalCtr& signalCtr, Func fnIn = +[]{})
    {
        std::shared_lock lock(m_mtx);
        m_cv.wait(lock, [this, &signalCtr](){ return ( m_shutdown || signalCtr != m_ctr); });
        if (!m_shutdown)
        {
            fnIn();
            signalCtr = m_ctr;
        }
    }

private:
    std::condition_variable_any m_cv;
    std::shared_mutex m_mtx;
    bool m_shutdown{false};
    SignalCtr m_ctr{0}; // Latest ctr from Signal()
    // This map stores the signal count received for registered listeners.
    // We use an unordered_map as references are never invalidated (unless erased),
    // which is not the case for a vector.
    std::unordered_map<std::thread::id, SignalCtr> m_threadCtrMap;
};
I have two threads: one is the producer and the other is the consumer. My consumer is always late (due to some costly function call, simulated in the code below using sleeps), so I have used a ring buffer, since I can afford to lose some events.
Questions:
I am wondering if it would be better to use a condition variable instead of what I currently have: continuous monitoring of the ring buffer's size to see whether events got generated. I know the current while loop checking the ring buffer's size is expensive, so I could probably add some yield calls to reduce the tight loop. I want to reduce the chances of dropped events.
Can I get rid of the pointers? In my current code I am passing pointers to my ring buffer from the main function to the threads. Is there a fancier or better way to do the same?
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>
#include <atomic>
#include <mutex>
#include <boost/circular_buffer.hpp>
#include <condition_variable>
#include <functional>

std::atomic<bool> mRunning;
std::mutex m_mutex;
std::condition_variable m_condVar;
long int data = 0;

class Detacher {
public:
    template<typename Function, typename ... Args>
    void createTask(Function &&func, Args&& ... args) {
        m_threads.emplace_back(std::forward<Function>(func), std::forward<Args>(args)...);
    }

    Detacher() = default;
    Detacher(const Detacher&) = delete;
    Detacher & operator=(const Detacher&) = delete;
    Detacher(Detacher&&) = default;
    Detacher& operator=(Detacher&&) = default;

    ~Detacher() {
        for (auto& thread : m_threads) {
            thread.join();
        }
    }

private:
    std::vector<std::thread> m_threads;
};

void foo_1(boost::circular_buffer<int> *cb)
{
    while (mRunning) {
        std::unique_lock<std::mutex> mlock(m_mutex);
        if (!cb->size())
            continue;
        int data = cb[0][0]; // i.e. (*cb)[0]; note: shadows the global 'data'
        cb->pop_front();
        mlock.unlock();
        if (!mRunning) {
            break;
        }
        //simulate time consuming function call
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
}

void foo_2(boost::circular_buffer<int> *cb)
{
    while (mRunning) {
        std::unique_lock<std::mutex> mlock(m_mutex);
        cb->push_back(data);
        data++;
        mlock.unlock();
        //simulate time consuming function call
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

int main()
{
    mRunning = true;
    boost::circular_buffer<int> cb(100);
    Detacher thread_1;
    thread_1.createTask(foo_1, &cb);
    Detacher thread_2;
    thread_2.createTask(foo_2, &cb);
    std::this_thread::sleep_for(std::chrono::milliseconds(20000));
    mRunning = false;
}
The producer is faster (16x) than the consumer, so ~93% of all events will always be discarded.
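For the first question, a condition variable lets the consumer sleep until the producer has actually published something, and for the second, std::ref lets the threads take the buffer by reference instead of by pointer. A minimal sketch along those lines (the names buf_mutex, buf_cv, and done are mine, not from the original code):

#include <boost/circular_buffer.hpp>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex buf_mutex;
std::condition_variable buf_cv;
bool done = false;

// The consumer sleeps on the condition variable instead of spinning on size().
void consumer(boost::circular_buffer<int>& cb)
{
    while (true) {
        std::unique_lock<std::mutex> lock(buf_mutex);
        buf_cv.wait(lock, [&]{ return done || !cb.empty(); });
        if (done && cb.empty())
            return;
        int value = cb.front();
        cb.pop_front();
        lock.unlock();
        (void)value;
        //simulate time consuming function call
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
}

void producer(boost::circular_buffer<int>& cb)
{
    for (int i = 0; i < 1000; ++i) {
        {
            std::lock_guard<std::mutex> lock(buf_mutex);
            cb.push_back(i);   // overwrites the oldest entry when full
        }
        buf_cv.notify_one();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    {
        std::lock_guard<std::mutex> lock(buf_mutex);
        done = true;
    }
    buf_cv.notify_one();
}

int main()
{
    boost::circular_buffer<int> cb(100);
    std::thread t1(consumer, std::ref(cb));  // by reference: no raw pointers needed
    std::thread t2(producer, std::ref(cb));
    t2.join();
    t1.join();
}

Events can still be dropped while the consumer is busy (the ring buffer overwrites its oldest entries), which matches the stated requirement; the condition variable only removes the busy-wait.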
Application without std::condition_variable:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <chrono>

std::mutex mutex;
std::queue<int> queue;
int counter;

void loadData()
{
    while(true)
    {
        std::unique_lock<std::mutex> lock(mutex);
        queue.push(++counter);
        lock.unlock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

void writeData()
{
    while(true)
    {
        std::lock_guard<std::mutex> lock(mutex);
        while(queue.size() > 0)
        {
            std::cout << queue.front() << std::endl;
            queue.pop();
        }
    }
}

int main()
{
    std::thread thread1(loadData);
    std::thread thread2(writeData);
    thread1.join();
    thread2.join();
    return 0;
}
Application with std::condition_variable:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <chrono>

std::mutex mutex;
std::queue<int> queue;
std::condition_variable condition_variable;
int counter;

void loadData()
{
    while(true)
    {
        std::unique_lock<std::mutex> lock(mutex);
        queue.push(++counter);
        lock.unlock();
        condition_variable.notify_one();
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

void writeData()
{
    while(true)
    {
        std::unique_lock<std::mutex> lock(mutex);
        condition_variable.wait(lock, [](){return !queue.empty();});
        std::cout << queue.front() << std::endl;
        queue.pop();
    }
}

int main()
{
    std::thread thread1(loadData);
    std::thread thread2(writeData);
    thread1.join();
    thread2.join();
    return 0;
}
If I am right, it means that the second version of this application is unsafe, because the queue.empty() function is used without any synchronization, with no locks held. And there is my question: should we use condition variables if they cause problems like the one mentioned above?
Your first example busy waits -- there is a thread pounding on the lock, checking the queue, then releasing the lock. This both increases contention on the mutex and wastes up to an entire CPU core when nothing is being processed.
The second example has the waiting thread mostly sleeping. It only wakes up when there is data ready, or on a "spurious wakeup" (which the standard permits).
When it wakes up, it reacquires the mutex and checks the predicate. If the predicate fails, it releases the lock and waits on the condition variable again.
It is safe, because the predicate is guaranteed to be run within the mutex you acquired and passed to the wait function.
The second code is safe because the call to wait(lock, pred) is equivalent to (directly from the standard):
while (!pred())
    wait(lock);
And a call to wait(lock) releases (unlocks) lock, and reacquires (locks) it on notification.
In your case, this is equivalent to:
auto pred = [](){ return !queue.empty(); };
std::unique_lock<std::mutex> lock(mutex); // acquire
while (!pred()) { // OK, we are locked
    condition_variable.wait(lock); // release
    // if you get here, the lock has been re-acquired
}
So all the calls to your pred are made with lock acquired: no issue here, as long as all other operations on queue are also guarded.
Trying to expand on my two previous questions, Move operations for a class with a thread as member variable and Call function inside a lambda passed to a thread:
I don't understand why the thread doing a wait is sometimes not being notified, which results in a deadlock. Cppreference says this about condition variables (http://en.cppreference.com/w/cpp/thread/condition_variable/notify_one):
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.
MCVE below; the commented line explains what changes if I hold the lock, but I don't understand why:
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>

using namespace std;

class worker {
public:
    template <class Fn, class... Args>
    explicit worker(Fn func, Args... args) {
        t = std::thread(
            [&func, this](Args... cargs) -> void {
                std::unique_lock<std::mutex> lock(mtx);
                while (true) {
                    cond.wait(lock, [this]() -> bool { return ready; });
                    if (terminate) {
                        break;
                    }
                    func(cargs...);
                    ready = false;
                }
            },
            std::move(args)...);
    }

    ~worker() {
        terminate = true;
        if (t.joinable()) {
            run_once();
            t.join();
        }
    }

    void run_once() {
        // If I don't hold this mutex, the thread is never notified of ready
        // being true.
        std::unique_lock<std::mutex> lock(mtx);
        ready = true;
        cout << "ready run once " << ready << endl;
        cond.notify_all();
    }

    bool done() { return (!ready.load()); }

private:
    std::thread t;
    std::atomic<bool> terminate{false};
    std::atomic<bool> ready{false};
    std::mutex mtx;
    std::condition_variable cond;
};

// main.cpp
void foo() {
    worker t([]() -> void { cout << "Bark" << endl; });
    t.run_once();
    while (!t.done()) {
    }
}

int main() {
    while (true) {
        foo();
    }
    return 0;
}
Holding the mutex here is not about memory visibility: "ready" is atomic, so its new value is already guaranteed to become visible to the other thread. The real problem is a lost wakeup. The waiting thread evaluates the predicate (finding ready false) and then blocks on the condition variable; only the release-and-block step is atomic, and the predicate check happens just before it. If run_once() sets ready and calls notify_all() in the window after the check but before the thread has blocked, the notification is lost and the thread waits forever. Taking the mutex in run_once() closes that window, because the waiter holds the same mutex from the moment it evaluates the predicate until it is actually parked inside wait():
{
    std::unique_lock<std::mutex> lock(mtx);
    ready = true;
}
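Note that once the write to ready happens under the mutex, the notify itself can be issued after the unlock, so the woken thread does not immediately block on a still-held mutex (a sketch reusing the question's members):

void run_once() {
    {
        std::unique_lock<std::mutex> lock(mtx);
        ready = true;   // written under the mutex: closes the lost-wakeup window
    }                   // unlock first...
    cond.notify_all();  // ...then notify, so the waiter can acquire the mutex at once
}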
Hello,
I am quite new to C++, but I have 6 years of Java experience, 2 years of C experience, and some knowledge of concurrency basics. I am trying to create a thread pool to handle tasks; it is below, with the associated test main.
It seems like the error is generated from
void ThreadPool::ThreadHandler::enqueueTask(void (*task)(void)) {
    std::lock_guard<std::mutex> lock(queueMutex);
as reported by my debugger. But doing traditional cout debugging, I found out that it sometimes works without segfaulting, and that removing
threads.emplace(handler->getSize(), handler);
from ThreadPool::enqueueTask() improves stability greatly.
Overall, I think it is related to my bad use of the condition_variable (called idler).
Compiler: MinGW-w64 in CLion
.cpp
#include <iostream>
#include "ThreadPool.h"

ThreadPool::ThreadHandler::ThreadHandler(ThreadPool *parent) : parent(parent) {
    thread = std::thread([&]{
        while (this->parent->alive){
            if (getSize()){
                std::lock_guard<std::mutex> lock(queueMutex);
                (*(queue.front()))();
                queue.pop_front();
            } else {
                std::unique_lock<std::mutex> lock(idlerMutex);
                idler.wait(lock);
            }
        }
    });
}

void ThreadPool::ThreadHandler::enqueueTask(void (*task)(void)) {
    std::lock_guard<std::mutex> lock(queueMutex);
    queue.push_back(task);
    idler.notify_all();
}

size_t ThreadPool::ThreadHandler::getSize() {
    std::lock_guard<std::mutex> lock(queueMutex);
    return queue.size();
}

void ThreadPool::enqueueTask(void (*task)(void)) {
    std::lock_guard<std::mutex> lock(threadsMutex);
    std::multimap<int, ThreadHandler*>::iterator iter = threads.begin();
    threads.erase(iter);
    ThreadHandler *handler = iter->second;
    handler->enqueueTask(task);
    threads.emplace(handler->getSize(), handler);
}

ThreadPool::ThreadPool(size_t size) {
    for (size_t i = 0; i < size; ++i) {
        std::lock_guard<std::mutex> lock(threadsMutex);
        ThreadHandler *handler = new ThreadHandler(this);
        threads.emplace(handler->getSize(), handler);
    }
}

ThreadPool::~ThreadPool() {
    std::lock_guard<std::mutex> lock(threadsMutex);
    auto it = threads.begin(), end = threads.end();
    for (; it != end; ++it) {
        delete it->second;
    }
}
.h
#ifndef WLIB_THREADPOOL_H
#define WLIB_THREADPOOL_H

#include <mutex>
#include <thread>
#include <list>
#include <map>
#include <condition_variable>

class ThreadPool {
private:
    class ThreadHandler {
        std::condition_variable idler;
        std::mutex idlerMutex;
        std::mutex queueMutex;
        std::thread thread;
        std::list<void (*)(void)> queue;
        ThreadPool *parent;
    public:
        ThreadHandler(ThreadPool *parent);
        void enqueueTask(void (*task)(void));
        size_t getSize();
    };

    std::multimap<int, ThreadHandler*> threads;
    std::mutex threadsMutex;

public:
    bool alive = true;
    ThreadPool(size_t size);
    ~ThreadPool();
    virtual void enqueueTask(void (*task)(void));
};

#endif //WLIB_THREADPOOL_H
main:
#include <iostream>
#include <ThreadPool.h>

ThreadPool pool(3);

void fn() {
    std::cout << std::this_thread::get_id() << '\n';
    pool.enqueueTask(fn);
};

int main() {
    std::cout << "Hello, World!" << std::endl;
    pool.enqueueTask(fn);
    return 0;
}
Your main() function invokes enqueueTask().
Immediately afterwards, your main() returns.
This gets the gears in motion for winding down your process. This involves invoking the destructors of all global objects.
ThreadPool's destructor then proceeds to delete all dynamically-scoped threads.
While the threads are still running. Hilarity ensues.
You need to implement the process for an orderly shutdown of all threads.
This means setting alive to false, kicking all of the threads in the shins (notifying their condition variables), and then joining all threads, before letting nature take its course and finally destroying everything.
P.S. -- you need to fix how alive is being checked. You also need to make access to alive thread-safe, protected by a mutex. The problem is that the thread could be holding a lock on one of two different mutexes. This makes the process somewhat complicated; some redesign is in order here.
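One possible shape for that redesign, sketched under the assumption that a single per-handler mutex guards both the queue and a stop flag (the names stopping, threadLoop, and stop are illustrative, not from the original code):

#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>

class ThreadHandler {
    std::mutex queueMutex;              // one mutex guards the queue AND the stop flag
    std::condition_variable idler;
    std::list<void (*)(void)> queue;
    bool stopping = false;
    std::thread thread;                 // declared last: starts after the flag is initialized

    void threadLoop() {
        std::unique_lock<std::mutex> lock(queueMutex);
        while (true) {
            idler.wait(lock, [this]{ return stopping || !queue.empty(); });
            if (queue.empty())          // stopping, and no work left to drain
                return;
            auto task = queue.front();
            queue.pop_front();
            lock.unlock();
            task();                     // run the task outside the lock
            lock.lock();
        }
    }

public:
    ThreadHandler() : thread([this]{ threadLoop(); }) {}

    void enqueueTask(void (*task)(void)) {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            queue.push_back(task);
        }
        idler.notify_all();
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            stopping = true;            // flipped under the same mutex the worker waits on
        }
        idler.notify_all();             // cannot be lost: the worker is either checking the
                                        // predicate under the lock, or fully parked in wait()
    }

    ~ThreadHandler() {                  // orderly shutdown: signal first, then join
        stop();
        if (thread.joinable())
            thread.join();
    }
};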