Effective way of signaling and keeping a pthread open? - c++

I have some code that runs intense matrix processing, so I thought it would be faster if I multithreaded it. My intention is to keep the threads alive so that they can be reused for more processing later. The problem is that the multithreaded version of the code runs slower than a single thread, and I believe the cause lies in the way I signal my threads and keep them alive.
I am using pthreads on Windows and C++. Here is my code for the thread, where runtest() is the function where the matrix calculations happen:
void* playQueue(void* arg)
{
while(true)
{
pthread_mutex_lock(&queueLock);
if(testQueue.empty())
break;
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
pthread_exit(NULL);
}
The playQueue() function is the one passed to the pthread. As of now there is a queue (testQueue) of, let's say, 1000 items, and there are 100 threads. Each thread continues to run until the queue is empty (hence the code inside the mutex).
I believe the multithreaded version runs so slowly because of something called false sharing (I think?), and because my method of signaling the thread to call runtest() and keeping the thread alive is poor.
What would be an effective way of doing this so that the multithreaded version runs faster than (or at least as fast as) an iterative version?
HERE IS THE FULL VERSION OF MY CODE (minus the matrix stuff)
# include <cstdlib>
# include <iostream>
# include <cmath>
# include <complex>
# include <string>
# include <pthread.h>
# include <queue>
using namespace std;
# include "matrix_exponential.hpp"
# include "test_matrix_exponential.hpp"
# include "c8lib.hpp"
# include "r8lib.hpp"
# define NUM_THREADS 3
int main ( );
int counter;
queue<int> testQueue;
queue<int> anotherQueue;
void *playQueue(void* arg);
void runtest();
void matrix_exponential_test01 ( );
void matrix_exponential_test02 ( );
pthread_mutex_t anotherLock;
pthread_mutex_t queueLock;
pthread_cond_t queue_cv;
int main ()
{
counter = 0;
/* for (int i=0;i<1; i++)
for(int j=0; j<1000; j++)
{
runtest();
cout << counter << endl;
}*/
pthread_t threads[NUM_THREADS];
pthread_mutex_init(&queueLock, NULL);
pthread_mutex_init(&anotherLock, NULL);
pthread_cond_init (&queue_cv, NULL);
for(int z=0; z<1000; z++)
{
testQueue.push(1);
}
for( int i=0; i < NUM_THREADS; i++ )
{
pthread_create(&threads[i], NULL, playQueue, (void*)NULL);
}
while(anotherQueue.size()<NUM_THREADS)
{
}
cout << counter;
pthread_mutex_destroy(&queueLock);
pthread_cond_destroy(&queue_cv);
pthread_cancel(NULL);
cout << counter;
return 0;
}
void* playQueue(void* arg)
{
while(true)
{
cout<<counter<<endl;
pthread_mutex_lock(&queueLock);
if(testQueue.empty()){
pthread_mutex_unlock(&queueLock);
break;
}
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
pthread_mutex_lock(&anotherLock);
anotherQueue.push(1);
pthread_mutex_unlock(&anotherLock);
pthread_exit(NULL);
}
void runtest()
{
counter++;
matrix_exponential_test01 ( );
matrix_exponential_test02 ( );
}
The "matrix_exponential_tests" are taken from this website with permission; they are where all of the matrix math occurs. The counter is only used for debugging, to make sure all the instances are running.

Doesn't it get stuck?
while(true)
{
pthread_mutex_lock(&queueLock);
if(testQueue.empty())
break; //<---------------- you break without unlocking the mutex...
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
The section between lock and unlock runs slower than it would in a single thread.
Mutexes are slowing you down. You should lock only the critical section, and if you want to speed it up further, try not to use a mutex at all.
You can do that by supplying the test via a function argument rather than using the queue.
One way to avoid the mutex is to use a vector that you never delete from and a std::atomic_int (C++11) as the index (or to lock only while reading and incrementing the current index),
or use an iterator like this:
vector<test> testVector;
vector<test>::iterator it;
//where it is initialized to:
it = testVector.begin();
Now your loop can look like this:
while(true)
{
vector<test>::iterator it1;
pthread_mutex_lock(&queueLock);
it1 = (it==testVector.end())? it : it++;
pthread_mutex_unlock(&queueLock);
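//now you are outside the critical section:
//test the local copy it1, not the shared iterator it, which other threads may be changing
if(it1==testVector.end())
break;
//you don't delete from or change the vector,
//so you can use the it1 iterator freely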
runtest();
}
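The atomic-index variant mentioned above can be sketched roughly as follows. This is only a minimal illustration using std::thread instead of pthreads: the helper name process_all and its parameters are hypothetical, and the work callback stands in for runtest(). Each thread claims the next unprocessed index with fetch_add, so no mutex is needed as long as the vector itself is not resized while the threads run.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): applies `work` to every
// element of `items` using `num_threads` threads. The only shared state that
// is modified concurrently is the atomic index `next`.
template <typename T, typename F>
void process_all(std::vector<T>& items, unsigned num_threads, F work) {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) {
        threads.emplace_back([&] {
            for (;;) {
                // Claim one index; every index is handed out exactly once.
                std::size_t idx = next.fetch_add(1, std::memory_order_relaxed);
                if (idx >= items.size()) return;  // nothing left to do
                work(items[idx]);
            }
        });
    }
    // join() also makes the workers' writes visible to the caller.
    for (auto& t : threads) t.join();
}
```

Whether this is actually faster than the single-threaded loop still depends on how expensive each work item is compared with the cost of starting the threads.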

Related

Thread Queue C++

'''The original post has been edited'''
How can I make a thread pool for two for loops in C++? I need to run the start_thread function 22 times for each number between 0 and 6. And I will have a flexible number of threads available depending on the machine I am using. How can I create a pool to allocate the free threads to the next of the nested loop?
for (int t=0; t <22; t++){
for(int p=0; p<6; p++){
thread th1(start_thread, p);
thread th2(start_thread, p);
th1.join();
th2.join();
}
}
Not really certain about what you want, but maybe it's something like this.
for (int t=0; t <22; t++){
std::vector<std::thread> th;
for(int p=0; p<6; p++){
th.emplace_back(std::thread(start_thread, p));
}
for(int p=0; p<6; p++){
th[p].join();
}
}
(or maybe permute the two loops)
Edit: if you want to control the number of threads:
#include <iostream>
#include <thread>
#include <vector>
void
start_thread(int t, int p)
{
std::cout << "th " << t << ' ' << p << '\n';
}
void
join_all(std::vector<std::thread> &th)
{
for(auto &e: th)
{
e.join();
}
th.clear();
}
int
main()
{
std::size_t max_threads=std::thread::hardware_concurrency();
std::vector<std::thread> th;
for(int t=0; t <22; ++t)
{
for(int p=0; p<6; ++p)
{
th.emplace_back(std::thread(start_thread, t, p));
if(th.size()==max_threads)
{
join_all(th);
}
}
}
join_all(th);
return 0;
}
If you don't want dependency on a third-party library, this is pretty simple.
Just create a number of threads you like and let them pick a "job" from some queue.
For example:
#include <iostream>
#include <mutex>
#include <chrono>
#include <vector>
#include <thread>
#include <queue>
void work(int p)
{
// do the "work"
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << p << std::endl;
}
std::mutex m;
std::queue<int> jobs;
void worker()
{
while (true)
{
int job(0);
// sync access to the jobs queue
{
std::lock_guard<std::mutex> l(m);
if (jobs.empty())
return;
job = jobs.front();
jobs.pop();
}
work(job);
}
}
int main()
{
// queue all jobs
for (int t = 0; t < 22; t++) {
for (int p = 0; p < 6; p++) {
jobs.push(p);
}
}
// create reasonable number of threads
static const int n = std::thread::hardware_concurrency();
std::vector<std::thread> threads;
for (int i = 0; i < n; ++i)
threads.emplace_back(std::thread(worker));
// wait for all of them to finish
for (int i = 0; i < n; ++i)
threads[i].join();
}
[ADDED] Obviously, you don't want global variables in your production code; this is simply a demo solution.
Stop trying to code and draw out what you need to do and the pieces you need to have in order to do it.
You need one queue to hold the jobs, one mutex to protect the queue so the threads don't smurf it up with simultaneous accesses, and N threads.
Each thread function is a loop that
grabs the mutex,
gets a job from the queue,
releases the mutex, and
processes the job.
In this case I'd keep things simple by exiting the loop and the thread when there are no more jobs in the queue in step 2. In production you'd have the thread block and wait on the queue so it's still available to service jobs added later.
Wrap that up in a class with a function that allows you to add jobs to the queue, a function to start N threads, and a function to join on all of the running threads.
main defines an instance of the class, feeds in the jobs, starts the thread pool and then blocks on join until everyone's done.
Once you've beaten the design into something you have high confidence does what you need it to do, then you start writing code. Write code, especially multi-threaded code, without a plan and you're in for a lot of debugging and re-writing that usually exceeds the time spent on design by a significant margin.
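The design described above can be sketched roughly like this. The class name JobPool and its member names are hypothetical, chosen only for illustration; as in the description, a thread exits as soon as it finds the queue empty, so all jobs are expected to be added before start() is called.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch of the class described above: one queue, one mutex,
// N threads, and functions to add jobs, start the threads, and join them.
class JobPool {
public:
    void add_job(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push(std::move(job));
    }

    void start(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { run(); });
    }

    void join_all() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lock(mutex_);  // grab the mutex
                if (jobs_.empty()) return;                 // no more jobs: exit the thread
                job = std::move(jobs_.front());            // get a job from the queue
                jobs_.pop();
            }                                              // release the mutex
            job();                                         // process the job
        }
    }

    std::mutex mutex_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
};
```

main would then create a JobPool, call add_job for each (t, p) pair, call start with the desired thread count, and block in join_all.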
Since C++17 you can use one of the execution policies with many of the algorithms in the standard library. This can greatly simplify iterating over a number of work packages. Behind the curtains, the implementation usually picks threads from a built-in thread pool and distributes work to them efficiently. It usually uses just enough™ threads on both Linux and Windows, and it will use all the CPU you have left (0% idle on all cores once the CPUs are running at max frequency), strangely without making either Linux or Windows "sluggish".
Here I've used the execution policy std::execution::parallel_policy (indicated by the std::execution::par constant). If you can prepare the work that needs to be done and put it in a container, like a std::vector, it'll be really easy.
#include <algorithm>
#include <chrono>
#include <execution> // std::execution::par
#include <iostream>
// #include <thread> // not needed to run with execution policies
#include <vector>
struct work_package {
work_package() : payload(co) { ++co; }
int payload;
static int co;
};
int work_package::co = 10;
int main() {
std::vector<work_package> wps(22*6); // 132 work packages
for(const auto& wp : wps) std::cout << wp.payload << '\n'; // prints 10 to 141
// work on the work packages
std::for_each(std::execution::par, wps.begin(), wps.end(), [](auto& wp) {
// Probably in a thread - As long as you do not write to the same work package
// from different threads, you don't need synchronization here.
// do some work with the work package
++wp.payload;
});
for(const auto& wp : wps) std::cout << wp.payload << '\n'; // prints 11 to 142
}
With g++ you may need to install tbb (Threading Building Blocks), which you also need to link with: -ltbb.
apt install libtbb-dev on Ubuntu.
dnf install tbb-devel.x86_64 on Fedora.
Other distributions may call it something different.
Visual Studio (2017 and later) links with the proper library automatically (also tbb, if I'm not mistaken).

QtConcurrent: why releaseThread and reserveThread cause deadlock?

In Qt 4.7 Reference for QThreadPool, we find:
void QThreadPool::releaseThread()
Releases a thread previously reserved by a call to reserveThread().
Note: Calling this function without previously reserving a thread
temporarily increases maxThreadCount(). This is useful when a thread
goes to sleep waiting for more work, allowing other threads to
continue. Be sure to call reserveThread() when done waiting, so that
the thread pool can correctly maintain the activeThreadCount().
See also reserveThread().
void QThreadPool::reserveThread()
Reserves one thread, disregarding activeThreadCount() and
maxThreadCount().
Once you are done with the thread, call releaseThread() to allow it to
be reused.
Note: This function will always increase the number of active threads.
This means that by using this function, it is possible for
activeThreadCount() to return a value greater than maxThreadCount().
See also releaseThread().
I want to use releaseThread() to make it possible to use nested concurrent map, but in the following code, it hangs in waitForFinished():
#include <QApplication>
#include <QMainWindow>
#include <QtConcurrentMap>
#include <QtConcurrentRun>
#include <QFuture>
#include <QThreadPool>
#include <QtTest/QTest>
#include <QFutureSynchronizer>
struct Task2 { // only calculation
typedef void result_type;
void operator()(int count) {
int k = 0;
for (int i = 0; i < count * 10; ++i) {
for (int j = 0; j < count * 10; ++j) {
k++;
}
}
assert(k >= 0);
}
};
struct Task1 { // will launch some other concurrent map
typedef void result_type;
void operator()(int count) {
QVector<int> vec;
for (int i = 0; i < 5; ++i) {
vec.push_back(i+count);
}
Task2 task;
QFuture<void> f = QtConcurrent::map(vec.begin(), vec.end(), task);
{
// with out releaseThread before wait, it will hang directly
QThreadPool::globalInstance()->releaseThread();
f.waitForFinished(); // BUG: may hang there
QThreadPool::globalInstance()->reserveThread();
}
}
};
int main() {
QThreadPool* gtpool = QThreadPool::globalInstance();
gtpool->setExpiryTimeout(50);
int count = 0;
for (;;) {
QVector<int> vec;
for (int i = 0; i < 40 ; i++) {
vec.push_back(i);
}
// launch a task with nested map
Task1 task; // Task1 will have nested concurrent map
QFuture<void> f = QtConcurrent::map(vec.begin(), vec.end(),task);
f.waitForFinished(); // BUG: may hang there
count++;
// waiting most of thread in thread pool expire
while (QThreadPool::globalInstance()->activeThreadCount() > 0) {
QTest::qSleep(50);
}
// launch a task only calculation
Task2 task2;
QFuture<void> f2 = QtConcurrent::map(vec.begin(), vec.end(), task2);
f2.waitForFinished(); // BUG: may hang there
qDebug() << count;
}
return 0;
}
This code will not run forever; it hangs after many loops (anywhere from 1 to 10000), with all threads waiting on a condition variable.
My questions are:
Why does it hang?
Can I fix it and keep the nested concurrent map?
dev env:
Linux version 2.6.32-696.18.7.el6.x86_64; Qt4.7.4; GCC 3.4.5
Windows 7; Qt4.7.4; mingw 4.4.0
The program hangs because of a race condition in QThreadPool that shows up when you rely on expiryTimeout. Here is the analysis in detail:
The problem in QThreadPool - source
When starting a task, QThreadPool did something along the lines of:
QMutexLocker locker(&mutex);
taskQueue.append(task); // Place the task on the task queue
if (waitingThreads > 0) {
// there are already idle threads running. They are waiting on the 'runnableReady'
// QWaitCondition. Wake one of them up.
waitingThreads--;
runnableReady.wakeOne();
} else if (runningThreadCount < maxThreadCount) {
startNewThread(task);
}
And the thread's main loop looks like this:
void QThreadPoolThread::run()
{
QMutexLocker locker(&manager->mutex);
while (true) {
/* ... */
if (manager->taskQueue.isEmpty()) {
// no pending task, wait for one.
bool expired = !manager->runnableReady.wait(locker.mutex(),
manager->expiryTimeout);
if (expired) {
manager->runningThreadCount--;
return;
} else {
continue;
}
}
QRunnable *r = manager->taskQueue.takeFirst();
// run the task
locker.unlock();
r->run();
locker.relock();
}
}
The idea is that the thread will wait for a given amount of second for
a task, but if no task was added in a given amount of time, the thread
expires and is terminated. The problem here is that we rely on the
return value of runnableReady. If there is a task that is scheduled at
exactly the same time as the thread expires, then the thread will see
false and will expire. But the main thread will not restart any other
thread. That might let the application hang as the task will never be
run.
The quick workaround is to use a long expiry timeout (30000 ms by default) and remove the while loop that waits for the threads to expire.
Here is the modified main function; the program runs smoothly on Windows 7, with 4 threads used by default:
int main() {
QThreadPool* gtpool = QThreadPool::globalInstance();
//gtpool->setExpiryTimeout(50); <-- don't set the expiry Timeout, use the default one.
qDebug() << gtpool->maxThreadCount();
int count = 0;
for (;;) {
QVector<int> vec;
for (int i = 0; i < 40 ; i++) {
vec.push_back(i);
}
// launch a task with nested map
Task1 task; // Task1 will have nested concurrent map
QFuture<void> f = QtConcurrent::map(vec.begin(), vec.end(),task);
f.waitForFinished(); // BUG: may hang there
count++;
/*
// waiting most of thread in thread pool expire
while (QThreadPool::globalInstance()->activeThreadCount() > 0)
{
QTest::qSleep(50);
}
*/
// launch a task only calculation
Task2 task2;
QFuture<void> f2 = QtConcurrent::map(vec.begin(), vec.end(), task2);
f2.waitForFinished(); // BUG: may hang there
qDebug() << count ;
}
return 0;
}
@tungIt's answer is good enough; I found the Qt bug report and the fix commit, just for reference:
https://bugreports.qt.io/browse/QTBUG-3786
https://github.com/qt/qtbase/commit/a9b6a78e54670a70b96c122b10ad7bd64d166514#diff-6d5794cef91df41c39b5e7cc6b71d041

C++11 thread to modify std::list

I'll post my code, and then tell you what I think it's doing.
#include <thread>
#include <mutex>
#include <list>
#include <iostream>
using namespace std;
...
//List of threads and ints
list<thread> threads;
list<int> intList;
//Whether or not a thread is running
bool running(false);
//Counters
int busy(0), counter(0);
//Add 10000 elements to the list
for (int i = 0; i < 10000; ++i){
//push back an int
intList.push_back(i);
counter++;
//If the thread is running, make a note of it and continue
if (running){
busy++;
continue;
}
//If we haven't yet added 10 elements before a reset, continue
if (counter < 10)
continue;
//If we've added more than 10 ints, and there's no active thread,
//reset the counter and launch
counter = 0;
threads.push_back(std::thread([&]
//These iterators are function args
(list<int>::iterator begin, list<int>::iterator end){
//mutex for the running bool
mutex m;
m.lock();
running = true;
m.unlock();
//Remove either 10 elements or every element till the end
int removed(0);
while (removed < 10 && begin != end){
begin = intList.erase(begin);
removed++;
}
//unlock the running bool
m.lock();
running = false;
m.unlock();
//Pass into the thread func the current beginning and end of the list
}, intList.begin(), intList.end()));
}
for (auto& thread : threads){
thread.join();
}
What I think this code is doing is adding 10000 elements to the end of a list. For every 10 we add, launch a (single) thread that deletes the first 10 elements of the list (at the time the thread was launched).
I don't expect this to remove every list element, I was just interested in seeing if I could add to the end of a list while removing elements from the beginning. In Visual Studio I get a "list iterators incompatible" error quite often, but I figure the problem is cross platform.
What's wrong with my thinking? I know it's something.
EDIT:
So I see now that this code is very incorrect. Really I just want one auxiliary thread active at a time to delete elements, which is why I thought calling erase was ok. However, I don't know how to declare a thread without joining it, and if I wait for that then I don't really see the point of doing any of this.
Should I declare my thread before the loop and have it wait for a signal from the main thread?
To clarify, my goal here is to do the following: I want to grab keyboard presses on one thread and store them in a list, and every so often log them to a file on a separate thread while removing the things I've logged. Since I don't want to spend a lot of time writing to the disk, I'd like to write in discrete chunks (of 10).
Thanks to Christophe, and everyone else. Here's my code now... I may be using lock_guard incorrectly.
#include <thread>
#include <mutex>
#include <list>
#include <iostream>
#include <atomic>
using namespace std;
...
atomic<bool> running(false);
list<int> intList;
int busy(0), counter(0);
mutex m;
thread * t(nullptr);
for (int i = 0; i < 100000; ++i){
//Would a lock_guard here be inappropriate?
m.lock();
intList.push_back(i);
m.unlock();
counter++;
if (running){
busy++;
continue;
}
if (counter < 10)
continue;
counter = 0;
if (t){
t->join();
delete t;
}
t = new thread([&](){
running = true;
int removed(0);
while (removed < 10){
lock_guard<mutex> lock(m);
if (intList.size())
intList.erase(intList.begin());
removed++;
}
running = false;
});
}
if (t){
t->join();
delete t;
}
Your code won't work because:
your mutex is local to each thread (each thread has its own copy, used only by itself, so there is no chance of inter-thread synchronisation!)
intList is not an atomic type, but you access it from several threads, causing race conditions and undefined behaviour.
the begin and end iterators that you pass to your threads at their creation might no longer be valid during execution.
Here are some improvements (look at the commented lines):
atomic<bool> running(false); // <=== atomic (to avoid unnecessary use of mutex)
int busy(0), counter(0);
mutex l; // define the mutex here, so that it will be the same for all threads
for (int i = 0; i < 10000; ++i){
l.lock(); // <===you need to protect each access to the list
intList.push_back(i);
l.unlock(); // <===and unlock
counter++;
if (running){
busy++;
continue;
}
if (counter < 10)
continue;
counter = 0;
threads.push_back(std::thread([&]
(){ //<====No iterator args as they might be outdated during execution of threads!!
running = true; // <=== no longer surrounded from lock/unlock as it is now atomic
int removed(0);
while (removed < 10){
l.lock(); // <====you really need to protect access to the list
if (intList.size()) // <=== check if elements exist NOW
intList.erase(intList.begin()); // <===use current data, not a prehistoric outdated local begin !!
l.unlock(); // <====end of protected section
removed++;
}
running = false; // <=== no longer surrounded from lock/unlock as it is now atomic
})); //<===No other arguments
}
...
By the way, I'd suggest that you have a look at lock_guard<mutex> for the locks, as it ensures the unlock in all circumstances (especially when there are exceptions or other surprises like this).
Edit: I've avoided the lock protection of running with a mutex, by making it atomic<bool>.
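For the stated goal (collect items on one thread, then log and remove them in chunks of 10 on another), one possible shape is a small buffer guarded by lock_guard plus a condition variable, so that a single long-lived consumer thread sleeps until a full chunk is available. This is only a sketch under assumptions: the class name BatchBuffer and its member functions are hypothetical, and int stands in for a key press.

```cpp
#include <condition_variable>
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical buffer: the producer calls push(); the consumer calls
// take_batch(), which blocks until batch_size items exist or close() is called.
class BatchBuffer {
public:
    explicit BatchBuffer(std::size_t batch_size) : batch_size_(batch_size) {}

    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(m_);  // unlocks even if push_back throws
            items_.push_back(value);
        }
        cv_.notify_one();
    }

    // Signals that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Removes and returns up to batch_size items from the front of the list.
    // An empty result means the buffer was closed and fully drained.
    std::vector<int> take_batch() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return items_.size() >= batch_size_ || closed_; });
        std::vector<int> batch;
        while (!items_.empty() && batch.size() < batch_size_) {
            batch.push_back(items_.front());
            items_.pop_front();
        }
        return batch;
    }

private:
    std::size_t batch_size_;
    std::mutex m_;
    std::condition_variable cv_;
    std::list<int> items_;
    bool closed_ = false;
};
```

The consumer thread would loop on take_batch() and write each returned chunk to the file outside the lock, stopping when it receives an empty chunk.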

Stop infinite looping thread from main

I am relatively new to threads, and I'm still learning best techniques and the C++11 thread library. Right now I'm in the middle of implementing a worker thread which infinitely loops, performing some work. Ideally, the main thread would want to stop the loop from time to time to sync with the information that the worker thread is producing, and then start it again. My idea initially was this:
// Code run by worker thread
void thread() {
while(run_) {
// Do lots of work
}
}
// Code run by main thread
void start() {
if ( run_ ) return;
run_ = true;
// Start thread
}
void stop() {
if ( !run_ ) return;
run_ = false;
// Join thread
}
// Somewhere else
volatile bool run_ = false;
I was not completely sure about this so I started researching, and I discovered that volatile is actually not required for synchronization and is in fact generally harmful. Also, I discovered this answer, which describes a process nearly identical to the one I thought about. In the answer's comments, however, this solution is described as broken, as volatile does not guarantee that different processor cores readily (if ever) communicate changes to the volatile values.
My question is this then: Should I use an atomic flag, or something else entirely? What exactly is the property that is lacking in volatile and that is then provided by whatever construct is needed to solve my problem effectively?
Have you looked at mutexes? They're made to lock threads out and avoid conflicts on shared data. Is that what you're looking for?
I think you want to use barrier synchronization using std::mutex?
Also take a look at boost thread, for a relatively high level threading library
Take a look at this code sample from the link:
#include <iostream>
#include <map>
#include <string>
#include <chrono>
#include <thread>
#include <mutex>
std::map<std::string, std::string> g_pages;
std::mutex g_pages_mutex;
void save_page(const std::string &url)
{
// simulate a long page fetch
std::this_thread::sleep_for(std::chrono::seconds(2));
std::string result = "fake content";
g_pages_mutex.lock();
g_pages[url] = result;
g_pages_mutex.unlock();
}
int main()
{
std::thread t1(save_page, "http://foo");
std::thread t2(save_page, "http://bar");
t1.join();
t2.join();
g_pages_mutex.lock(); // not necessary as the threads are joined, but good style
for (const auto &pair : g_pages) {
std::cout << pair.first << " => " << pair.second << '\n';
}
g_pages_mutex.unlock();
}
I would suggest using std::mutex and std::condition_variable to solve the problem. Here's an example of how it can work with C++11:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;
int main()
{
mutex m;
condition_variable cv;
// Tells, if the worker should stop its work
bool done = false;
// Zero means, it can be filled by the worker thread.
// Non-zero means, it can be consumed by the main thread.
int result = 0;
// run worker thread
auto t = thread{ [&]{
auto bound = 1000;
for (;;) // ever
{
auto sum = 0;
for ( auto i = 0; i != bound; ++i )
sum += i;
++bound;
auto lock = unique_lock<mutex>( m );
// wait until we can safely write the result
cv.wait( lock, [&]{ return result == 0; });
// write the result
result = sum;
// wake up the consuming thread
cv.notify_one();
// exit the loop, if flag is set. This must be
// done with mutex protection. Hence this is not
// in the for-condition expression.
if ( done )
break;
}
} };
// the main threads loop
for ( auto i = 0; i != 20; ++i )
{
auto r = 0;
{
// lock the mutex
auto lock = unique_lock<mutex>( m );
// wait until we can safely read the result
cv.wait( lock, [&]{ return result != 0; } );
// read the result
r = result;
// set result to zero so the worker can
// continue to produce new results.
result = 0;
// wake up the producer
cv.notify_one();
// the lock is released here (the end of the scope)
}
// do time consuming io at the side.
cout << r << endl;
}
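// tell the worker to stop
{
auto lock = unique_lock<mutex>( m );
result = 0;
done = true;
// again the lock is released here
}
// wake the worker in case it is already waiting for result == 0;
// without this notification it could sleep forever and join() would hang
cv.notify_one();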
// wait for the worker to finish.
t.join();
cout << "Finished." << endl;
}
You could do the same with std::atomics by essentially implementing spin locks. Spin locks can be slower than mutexes, so I repeat the advice on the Boost website:
Do not use spinlocks unless you are certain that you understand the consequences.
I believe that mutexes and condition variables are the way to go in your case.
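If only the run/stop flag from the question needs to be shared, a std::atomic<bool> is a possible replacement for the volatile bool: unlike volatile, atomic loads and stores are guaranteed by the C++11 memory model to be free of data races and to become visible to other threads. The sketch below is an assumption-laden illustration, not the asker's code: the class name Worker and the iteration counter (a stand-in for "lots of work") are hypothetical, and start() and stop() are meant to be called from a single controlling thread.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker with the start()/stop() shape from the question,
// using std::atomic<bool> instead of volatile bool.
class Worker {
public:
    ~Worker() { stop(); }

    void start() {
        if (run_.exchange(true)) return;  // already running
        thread_ = std::thread([this] {
            while (run_.load(std::memory_order_acquire)) {
                // Stand-in for one unit of real work.
                iterations_.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    void stop() {
        if (!run_.exchange(false)) return;  // not running
        thread_.join();                     // worker sees run_ == false and exits
    }

    unsigned long iterations() const { return iterations_.load(); }

private:
    std::atomic<bool> run_{false};
    std::atomic<unsigned long> iterations_{0};
    std::thread thread_;
};
```

After stop() returns, the worker thread has been joined, so the main thread can read whatever the worker produced without further synchronization and then call start() again.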

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads to resynchronize without joining.
I have code that looks like this:
namespace {
std::vector<std::thread> workers;
int total = 4;
int arr[4] = {0};
void each_thread_does(int i) {
arr[i] += 2;
}
}
int main(int argc, char *argv[]) {
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
workers.push_back(std::thread(each_thread_does, j));
}
for (std::thread &t: workers) {
if (t.joinable()) {
t.join();
}
}
arr[4] = std::min_element(arr, arr+4);
}
return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
void Start();
void QueueJob(const std::function<void()>& job);
void Stop();
bool busy();
private:
void ThreadLoop();
bool should_terminate = false; // Tells threads to stop looking for jobs
std::mutex queue_mutex; // Prevents data races to the job queue
std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
std::vector<std::thread> threads;
std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
threads.resize(num_threads);
for (uint32_t i = 0; i < num_threads; i++) {
threads.at(i) = std::thread(&ThreadPool::ThreadLoop, this); // a member function needs the object it runs on
}
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
while (true) {
std::function<void()> job;
{
std::unique_lock<std::mutex> lock(queue_mutex);
mutex_condition.wait(lock, [this] {
return !jobs.empty() || should_terminate;
});
if (should_terminate) {
return;
}
job = jobs.front();
jobs.pop();
}
job();
}
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
{
std::unique_lock<std::mutex> lock(queue_mutex);
jobs.push(job);
}
mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
bool ThreadPool::busy() {
bool poolbusy;
{
std::unique_lock<std::mutex> lock(queue_mutex);
poolbusy = !jobs.empty(); // busy while jobs are still queued
}
return poolbusy;
}
The busy() function can be used in a while loop, so that the main thread can wait for the threadpool to complete all the tasks before calling the threadpool destructor.
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
should_terminate = true;
}
mutex_condition.notify_all();
for (std::thread& active_thread : threads) {
active_thread.join();
}
threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for
jobs to do.
I apologize if there are some syntax errors; I typed this code from memory, and I have a bad memory. Sorry that I cannot provide
you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().
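One caveat with polling busy(): it only inspects the queue, so a job that has been popped but is still executing is not counted. A possible variant, sketched below under assumptions, tracks the number of running jobs and lets the caller block until the pool is idle. The class name CountingPool and the members queue_job, wait_idle, and running_ are hypothetical and are not part of the answer's ThreadPool.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical variant of the pool above that also counts running jobs.
class CountingPool {
public:
    explicit CountingPool(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i)
            threads_.emplace_back(&CountingPool::loop, this);
    }

    // Like Stop() above: jobs still in the queue are dropped, running jobs finish.
    ~CountingPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        work_cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    void queue_job(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        work_cv_.notify_one();
    }

    // Blocks until the queue is empty and no job is executing.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return jobs_.empty() && running_ == 0; });
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_) return;
                job = std::move(jobs_.front());
                jobs_.pop();
                ++running_;  // counted while still holding the lock
            }
            job();
            {
                std::lock_guard<std::mutex> lock(mutex_);
                --running_;
            }
            idle_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_cv_;
    std::condition_variable idle_cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
    std::size_t running_ = 0;
    bool stop_ = false;
};
```

With this shape, the question's outer loop could queue its four jobs per iteration and call wait_idle() instead of creating and joining threads each time.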
You can use C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
int main (int argc, char *argv[]) {
ctpl::thread_pool p(2 /* two threads in the pool */);
int arr[4] = {0};
std::vector<std::future<void>> results(4);
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
results[j] = p.push([&arr, j](int){ arr[j] +=2; });
}
for (int j = 0; j < 4; ++j) {
results[j].get();
}
arr[4] = std::min_element(arr, arr + 4);
}
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The thread pool's job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
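For the "what is the result" part, one common approach (an assumption on my side, not the only option) is to wrap each task in a std::packaged_task and hand the caller the matching std::future. The sketch below uses a plain std::thread as a stand-in for a pool thread; the function names are hypothetical. It shows that both a return value and an exception thrown by the task travel back through the future.

```cpp
#include <future>
#include <stdexcept>
#include <thread>
#include <utility>

// The task's return value reaches the caller through the future.
int answer_via_future() {
    std::packaged_task<int()> task([] { return 6 * 7; });
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));  // stand-in for a pool thread
    int value = result.get();             // blocks until the task has run
    worker.join();
    return value;
}

// An exception thrown inside the task is rethrown by get() on the caller's side.
bool error_via_future() {
    std::packaged_task<int()> task([]() -> int {
        throw std::runtime_error("task failed");
    });
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));
    bool caught = false;
    try {
        result.get();
    } catch (const std::runtime_error&) {
        caught = true;
    }
    worker.join();
    return caught;
}
```

In a real pool the packaged_task would be pushed onto the task queue instead of being given its own thread, but the caller's side (get_future, then get) stays the same.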
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signal
i/o
hardware (processor affinity, heterogeneous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for Boost and the STL.
I offered a very crude implementation of a threadpool for another question, which doesn't address many of the problems outlined above. You might want to build on it. You might also want to have a look at existing frameworks in other languages for inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece)'s suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>
class Function_pool
{
private:
std::queue<std::function<void()>> m_function_queue;
std::mutex m_lock;
std::condition_variable m_data_condition;
std::atomic<bool> m_accept_functions;
public:
Function_pool();
~Function_pool();
void push(std::function<void()> func);
void done();
void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be released automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "starting operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}
You can use thread_pool from the Boost library:
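#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <thread>
void my_task(){...}
int main(){
int threadNumbers = std::thread::hardware_concurrency();
boost::asio::thread_pool pool(threadNumbers);
// Submit a function to the pool.
boost::asio::post(pool, my_task);
// Submit a lambda object to the pool.
boost::asio::post(pool, []() {
...
});
// Wait for all tasks in the pool to complete.
pool.join();
}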
You can also use threadpool from the open source community:
void first_task() {...}
void second_task() {...}
int main(){
int threadNumbers = thread::hardware_concurrency();
pool tp(threadNumbers);
// Add some tasks to the pool.
tp.schedule(&first_task);
tp.schedule(&second_task);
}
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
struct thread_pool {
typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;
thread_pool(int threads) :service(), service_worker(new asio_worker::element_type(service)) {
for (int i = 0; i < threads; ++i) {
auto worker = [this] { return service.run(); };
grp.add_thread(new boost::thread(worker));
}
}
template<class F>
void enqueue(F f) {
service.post(f);
}
~thread_pool() {
service_worker.reset();
grp.join_all();
service.stop();
}
private:
boost::asio::io_service service;
asio_worker service_worker;
boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);
pool.enqueue([] {
std::cout << "Hello from Task 1\n";
});
pool.enqueue([] {
std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
boost::asio::io_service is a very efficient implementation; more precisely, it is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n){
// Determine if n is prime.
}
int main(){
thread_pool pool(8); // 8 threads
list<future<bool>> results;
for(int n = 2;n < 10000;n++){
// Submit a job to the pool.
results.emplace_back(pool.async(is_prime, n));
}
int n = 2;
for(auto i = results.begin();i != results.end();i++, n++){
// i is an iterator pointing to a future representing the result of is_prime(n)
cout << n << " ";
bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
if(prime)
cout << "is prime";
else
cout << "is not prime";
cout << endl;
}
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.
It looks like a threadpool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it’s owned by me and publicly available here - https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, and ordering of some tasks.
The whole implementation is in two .h files.
So the code from the original question would look something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that a pending task's future.get() call hangs on the caller's side if the thread pool is terminated while some tasks are still in the task queue. How can I set an exception on the future from inside the thread pool when all the pool holds is the std::function wrapper?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
using return_type = std::result_of_t<F(Args...)>; // was used below without being declared
auto task = std::make_shared<std::packaged_task<return_type()>>(
std::bind(std::forward<F>(f), std::forward<Args>(args)...));
std::future<return_type> res = task->get_future();
{
std::unique_lock<std::mutex> lock(_mutex);
_tasks.push([task]() -> void { (*task)(); });
}
return res;
}
struct TASK {
//int _func_return_value;
std::function<void()> _func;
int priority;
...
};
class StdThreadPool {
std::vector<std::thread> _workers;
std::priority_queue<TASK> _tasks;
...
};
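One possible way out, sketched under the assumption that the pool is allowed to discard its pending wrappers on shutdown: when the last reference to a std::packaged_task is destroyed without the task having run, the standard library stores a std::future_error with the code std::future_errc::broken_promise in the shared state, so get() throws instead of blocking. The example below uses a plain std::queue as a stand-in for the pool's queue, and the function name is hypothetical.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <queue>

// Returns true if get() on a future whose task was discarded reports broken_promise.
bool discarded_task_reports_broken_promise() {
    std::queue<std::function<void()>> tasks;  // stand-in for the pool's task queue
    auto task = std::make_shared<std::packaged_task<int()>>([] { return 1; });
    std::future<int> res = task->get_future();
    tasks.push([task] { (*task)(); });
    task.reset();  // the queued wrapper now holds the only reference

    // Shutdown path: drop the pending wrappers instead of leaving them queued.
    // Destroying the last wrapper destroys the packaged_task, which abandons
    // its shared state and stores a broken_promise error in it.
    std::queue<std::function<void()>>().swap(tasks);

    try {
        res.get();  // throws rather than hanging
    } catch (const std::future_error& e) {
        return e.code() == std::future_errc::broken_promise;
    }
    return false;
}
```

If that reading is right, the hang comes from the wrappers (and therefore the packaged_task objects) staying alive in the queue after the workers have stopped. Emptying the queue in the pool's shutdown code, or destroying the pool object itself, should then be enough, and callers can catch std::future_error around get().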
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cooperative cancellation, so when the ThreadPool above goes out of scope, it cancels any running tasks (similar to C++20's jthread).