Thread Queue C++

(The original post has been edited.)
How can I make a thread pool for two for loops in C++? I need to run the start_thread function 22 times for each p from 0 to 5, and I will have a flexible number of threads available depending on the machine I am using. How can I create a pool that assigns free threads to the next iteration of the nested loop?
for (int t = 0; t < 22; t++) {
    for (int p = 0; p < 6; p++) {
        thread th1(start_thread, p);
        thread th2(start_thread, p);
        th1.join();
        th2.join();
    }
}

Not really certain about what you want, but maybe it's something like this.
for (int t = 0; t < 22; t++) {
    std::vector<std::thread> th;
    for (int p = 0; p < 6; p++) {
        th.emplace_back(start_thread, p);
    }
    for (int p = 0; p < 6; p++) {
        th[p].join();
    }
}
(or maybe permute the two loops)
Edit: if you want to control the number of threads:
#include <iostream>
#include <thread>
#include <vector>

void start_thread(int t, int p)
{
    std::cout << "th " << t << ' ' << p << '\n';
}

void join_all(std::vector<std::thread> &th)
{
    for (auto &e : th)
    {
        e.join();
    }
    th.clear();
}

int main()
{
    std::size_t max_threads = std::thread::hardware_concurrency();
    std::vector<std::thread> th;
    for (int t = 0; t < 22; ++t)
    {
        for (int p = 0; p < 6; ++p)
        {
            th.emplace_back(start_thread, t, p);
            if (th.size() == max_threads)
            {
                join_all(th);
            }
        }
    }
    join_all(th);
    return 0;
}

If you don't want a dependency on a third-party library, this is pretty simple.
Just create a number of threads you like and let them pick a "job" from some queue.
For example:
#include <iostream>
#include <mutex>
#include <chrono>
#include <vector>
#include <thread>
#include <queue>

void work(int p)
{
    // do the "work"
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    std::cout << p << std::endl;
}

std::mutex m;
std::queue<int> jobs;

void worker()
{
    while (true)
    {
        int job(0);
        // sync access to the jobs queue
        {
            std::lock_guard<std::mutex> l(m);
            if (jobs.empty())
                return;
            job = jobs.front();
            jobs.pop();
        }
        work(job);
    }
}

int main()
{
    // queue all jobs
    for (int t = 0; t < 22; t++) {
        for (int p = 0; p < 6; p++) {
            jobs.push(p);
        }
    }
    // create a reasonable number of threads
    static const int n = std::thread::hardware_concurrency();
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(worker);
    // wait for all of them to finish
    for (int i = 0; i < n; ++i)
        threads[i].join();
}
[ADDED] Obviously, you don't want global variables in your production code; this is simply a demo solution.

Stop trying to code and draw out what you need to do and the pieces you need to have in order to do it.
You need one queue to hold the jobs, one mutex to protect the queue so the threads don't smurf it up with simultaneous accesses, and N threads.
Each thread function is a loop that
grabs the mutex,
gets a job from the queue,
releases the mutex, and
processes the job.
In this case I'd keep things simple by exiting the loop and the thread when there are no more jobs in the queue in step 2. In production you'd have the thread block and wait on the queue so it's still available to service jobs added later.
Wrap that up in a class with a function that allows you to add jobs to the queue, a function to start N threads, and a function to join on all of the running threads.
main defines an instance of the class, feeds in the jobs, starts the thread pool and then blocks on join until everyone's done.
Once you've beaten the design into something you have high confidence does what you need it to do, then you start writing code. Write code, especially multi-threaded code, without a plan and you're in for a lot of debugging and re-writing that usually exceeds the time spent on design by a significant margin.
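To make the design described above concrete, here is a rough sketch of what such a class might look like (the names, the trivial work function, and main() are mine, not part of the original answer; the threads simply exit when the queue is empty, as suggested for the simple case):

#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class JobPool {
public:
    void add_job(int job) { jobs_.push(job); }   // call from main before start()

    void start(unsigned n, void (*process)(int)) {
        for (unsigned i = 0; i < n; ++i) {
            threads_.emplace_back([this, process] {
                for (;;) {
                    int job;
                    {
                        std::lock_guard<std::mutex> lock(m_); // 1. grab the mutex
                        if (jobs_.empty()) return;            // 2. queue empty: exit the thread
                        job = jobs_.front();
                        jobs_.pop();
                    }                                         // 3. mutex released here
                    process(job);                             // 4. process the job
                }
            });
        }
    }

    void join_all() {
        for (auto& t : threads_) t.join();
    }

private:
    std::mutex m_;
    std::queue<int> jobs_;
    std::vector<std::thread> threads_;
};

void start_thread(int p) { std::cout << p << '\n'; } // stand-in for the real work

int main() {
    JobPool pool;
    for (int t = 0; t < 22; ++t)
        for (int p = 0; p < 6; ++p)
            pool.add_job(p);
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;                 // hardware_concurrency() may report 0
    pool.start(n, start_thread);
    pool.join_all();                   // main blocks here until every job is processed
}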

Since C++17 you can use one of the execution policies with many of the algorithms in the standard library. This can greatly simplify working through a number of work packages. What goes on behind the curtains is usually that it picks threads from a built-in thread pool and distributes the work to them efficiently. It usually uses just enough™ threads on both Linux and Windows, and it will use all the CPU you've got left (0% idle on all cores once the CPUs have spun up to max frequency) - strangely without making either Linux or Windows sluggish.
Here I've used the execution policy std::execution::parallel_policy (indicated by the std::execution::par constant). If you can prepare the work that needs to be done and put it in a container, like a std::vector, it'll be really easy.
#include <algorithm>
#include <chrono>
#include <execution> // std::execution::par
#include <iostream>
// #include <thread> // not needed to run with execution policies
#include <vector>

struct work_package {
    work_package() : payload(co) { ++co; }
    int payload;
    static int co;
};
int work_package::co = 10;

int main() {
    std::vector<work_package> wps(22 * 6); // 132 work packages

    for (const auto& wp : wps) std::cout << wp.payload << '\n'; // prints 10 to 141

    // work on the work packages
    std::for_each(std::execution::par, wps.begin(), wps.end(), [](auto& wp) {
        // Probably in a thread - as long as you do not write to the same work package
        // from different threads, you don't need synchronization here.

        // do some work with the work package
        ++wp.payload;
    });

    for (const auto& wp : wps) std::cout << wp.payload << '\n'; // prints 11 to 142
}
With g++ you may need to install TBB (the Threading Building Blocks library), which you also need to link with: -ltbb (an example build line follows the install commands below).
apt install libtbb-dev on Ubuntu.
dnf install tbb-devel.x86_64 on Fedora.
Other distributions may call it something different.
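For example, a typical build line might look like this (my addition, assuming GCC on Linux; execution policies need C++17):
g++ -std=c++17 -O2 main.cpp -ltbb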
Visual Studio (2017 and later) links with the proper library automatically (also tbb, if I'm not mistaken).

Related

unable to implement list operations using thread

Thread newbie here. In the following code, I want to add elements to the global list using one thread and search for random elements using another thread.
#include <list>
#include <algorithm>
#include <mutex>
#include <thread>

using namespace std;

list<int> some_list;
mutex some_mutex;

void add_to_list(int new_value)
{
    lock_guard<mutex> guard(some_mutex);
    some_list.push_back(new_value);
}

bool list_contains(int value_to_find)
{
    lock_guard<mutex> guard(some_mutex);
    return find(begin(some_list), end(some_list), value_to_find) != end(some_list);
}

int main()
{
    for (int i = 0; i < 100; ++i)
    {
        // Add i to some_list through one thread only
        thread t(add_to_list, i);
        t.detach();

        // Search elements in different thread
        thread t2(list_contains, i);
        t2.detach();
    }
    return 0;
}
However, when I pass i along with add_to_list, that many threads are created. I want to add the elements to the list using a single thread only. How do I do this? Do I need to pass a vector of elements to the thread instead of a single variable?
First of all, don't use detach(). In the code above the program starts 200 threads and ends (when main() returns) without waiting for any of them to finish. Use join() instead; this lets you wait for thread completion properly.
Then, if you want add_to_list to run sequentially, do it in a single thread with its own loop.
For example:
#include <vector> // in addition to the includes above

int main() {
    vector<thread> threads;

    // Add i to some_list through one thread only
    threads.emplace_back([] {
        for (int i = 0; i < 100; ++i) {
            add_to_list(i);
        }
    });

    for (int i = 0; i < 100; ++i) {
        // Search elements in different threads
        threads.emplace_back([i] {
            list_contains(i);
        });
    }

    // Join all threads (waits for their completion)
    for (auto& t : threads) {
        t.join();
    }
}

C++ Fork Join Parallelism Blocking

Suppose you wish to run a section in parallel, then merge back into the main thread, then go back to a section in parallel, and so on - similar to the childhood game red light green light.
I've given an example of what I'm trying to do, where I'm using a condition variable to block the threads at the start. I wish to start them all in parallel, but then block them at the end so they can be printed out serially. The *= operation could be a much larger operation spanning many seconds. Reusing the threads is also important. Using a task queue might be too heavy.
I need to use some kind of blocking construct that isn't just a plain busy loop, because I know how to solve this problem with busy loops.
In English:
Thread 1 creates 10 threads that are blocked
Thread 1 signals all threads to start (without blocking each other)
Threads 2-11 process their exclusive memory
Thread 1 waits until 2-11 are complete (an atomic counter can be used here)
Threads 2-11 complete; each can notify thread 1 to check its condition if necessary
Thread 1 checks its condition and prints the array
Thread 1 re-signals 2-11 to process again, continuing from step 2
Example code (naively adapted from the example on cplusplus.com):
// condition_variable example
#include <iostream>           // std::cout
#include <thread>             // std::thread
#include <mutex>              // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable
#include <atomic>
#include <unistd.h>           // sleep

std::mutex mtx;
std::condition_variable cv;
bool ready = false;
std::atomic<int> count(0);
bool end = false;
int a[10];

void doublea(int id) {
    while (!end) {
        std::unique_lock<std::mutex> lck(mtx);
        while (!ready) cv.wait(lck);
        a[id] *= 2;
        count.fetch_add(1);
    }
}

void go() {
    std::unique_lock<std::mutex> lck(mtx);
    ready = true;
    cv.notify_all();
    ready = false; // Naive
    while (count.load() < 10) sleep(1);
    for (int i = 0; i < 10; i++) {
        std::cout << a[i] << std::endl;
    }

    ready = true;
    cv.notify_all();
    ready = false;
    while (count.load() < 10) sleep(1);
    for (int i = 0; i < 10; i++) {
        std::cout << a[i] << std::endl;
    }
    end = true;
    cv.notify_all();
}

int main() {
    std::thread threads[10];
    // spawn 10 threads:
    for (int i = 0; i < 10; ++i) {
        a[i] = 0;
        threads[i] = std::thread(doublea, i);
    }
    std::cout << "10 threads ready to race...\n";
    go(); // go!
    return 0;
}
This is not trivial to implement efficiently. Moreover, it does not make much sense unless you are learning the subject. Condition variables are not a good choice here because they do not scale well.
I suggest you look at how mature run-time libraries implement fork-join parallelism and learn from them or use them in your app. See http://www.openmprtl.org/, http://opentbb.org/, https://www.cilkplus.org/ - all of these are open source.
OpenMP is the closest model to what you are looking for, and it has the most efficient implementation of fork-join barriers. It has its disadvantages, though, because it is designed for HPC and lacks dynamic composability. TBB and Cilk work best for nested parallelism and for use in modules and libraries that can be called from external parallel regions.
You can use a barrier or a condition variable to start all threads. Then thread one can wait until all the threads have finished their work (using join on each thread, which is blocking) and then print their data in one for loop.
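For reference, here is a minimal sketch of the repeated fork-join cycle using std::barrier (this example is mine, not part of the original answer, and it assumes C++20); the completion function runs in exactly one thread once all workers have arrived, which is a natural place for the serial print step:

#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    constexpr int N = 10;      // number of worker threads
    constexpr int rounds = 2;  // number of fork-join cycles
    int a[N] = {};

    // The completion function runs in one thread after all N workers arrive,
    // before any of them is released into the next round.
    std::barrier sync(N, [&]() noexcept {
        for (int v : a) std::cout << v << ' ';
        std::cout << '\n';
    });

    std::vector<std::thread> workers;
    for (int i = 0; i < N; ++i) {
        workers.emplace_back([&, i] {
            for (int r = 0; r < rounds; ++r) {
                a[i] += i + 1;          // work on this thread's exclusive slot
                sync.arrive_and_wait(); // join point; print happens, then all resume
            }
        });
    }
    for (auto& t : workers) t.join();
}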

Why is this piece of C++ code not synchronized

I am learning to write multithreaded applications. So far I run into trouble any time I want my threads to access even the simplest shared resources, despite using a mutex.
For example, consider this code:
using namespace std;

mutex mu;
std::vector<string> ob;

void addSomeAValues() {
    mu.lock();
    for (int a = 0; a < 10; a++) {
        ob.push_back("A" + std::to_string(a));
        usleep(300);
    }
    mu.unlock();
}

void addSomeBValues() {
    mu.lock();
    for (int b = 0; b < 10; b++) {
        ob.push_back("B" + std::to_string(b));
        usleep(300);
    }
    mu.unlock();
}

int main() {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    thread t0(addSomeAValues);
    thread t1(addSomeBValues);
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    t0.join();
    t1.join();

    // Display the results
    cout << "Code Run Complete; results: \n";
    for (auto k : ob) {
        cout << k << endl;
    }

    // Code running complete, report the time it took
    typedef std::chrono::duration<int, std::milli> millisecs_t;
    millisecs_t duration(std::chrono::duration_cast<millisecs_t>(end - start));
    std::cout << duration.count() << " milliseconds.\n";
    return 0;
}
When I run the program, it behaves unpredictably. Sometimes the values A0-9 and B0-9 are printed to the console with no problem, sometimes there is a segmentation fault with a crash report, and sometimes only A0-3 and B0-5 are printed.
If I am missing a core synchronization issue, please help.
Edit: after a lot of useful feedback I changed the code to:
#include <iostream>
#include <string>
#include <vector>
#include <mutex>
#include <unistd.h>
#include <thread>
#include <chrono>

using namespace std;

mutex mu;
std::vector<string> ob;

void addSomeAValues() {
    for (int a = 0; a < 10; a++) {
        mu.lock();
        ob.push_back("A" + std::to_string(a));
        mu.unlock();
        usleep(300);
    }
}

void addSomeBValues() {
    for (int b = 0; b < 10; b++) {
        mu.lock();
        ob.push_back("B" + std::to_string(b));
        mu.unlock();
        usleep(300);
    }
}

int main() {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    thread t0(addSomeAValues);
    thread t1(addSomeBValues);
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    t0.join();
    t1.join();

    // Display the results
    cout << "Code Run Complete; results: \n";
    for (auto k : ob) {
        cout << k << endl;
    }

    // Code running complete, report the time it took
    typedef std::chrono::duration<int, std::milli> millisecs_t;
    millisecs_t duration(std::chrono::duration_cast<millisecs_t>(end - start));
    std::cout << duration.count() << " milliseconds.\n";
    return 0;
}
However, I sometimes get the following output:
*** Error in `/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment':
double free or corruption (fasttop): 0x00007f19fc000920 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7f1a0687da46]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x402dd4]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x402930]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x402a8d]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x402637
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x402278]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x4019cf]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x4041e3]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x404133]
/home/soliduscode/eclipse_workspace/CppExperiment/Debug/CppExperiment[0x404088]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb29f0)[0x7f1a06e8d9f0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7f1a060c6f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f1a068f6e1d]
Update & Solution
With the problem I was experiencing (namely: unpredictable execution of the program with intermittent dumps of corruption complaints), all was solved by including -lpthread as part of my Eclipse build (under project settings).
I am using C++11. It's odd, at least to me, that the program would compile without issuing a complaint that I had not linked against pthread.
So to anyone using C++11, std::thread, and linux, make sure you link against pthread otherwise your program runtime will be VERY unpredictable, and buggy.
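For example, a typical command line (my addition, assuming g++ on Linux) would be:
g++ -std=c++11 -pthread main.cpp -o main
(-pthread sets the right compiler flags and links the pthread library, so it is generally preferred over a bare -lpthread.)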
If you're going to use threads, I'd advise doing the job at least a little differently.
Right now, one thread gets the mutex, does all it's going to do (including sleeping for 3000 microseconds), then quits. Then the other thread does essentially the same thing. This being the case, the threads have accomplished essentially nothing positive and a fair amount of negative (synchronization code and such).
Your current code is also unsafe with respect to exceptions -- if an exception were thrown inside one of your thread functions, the mutex wouldn't be unlocked, even though that thread could no longer execute.
Finally, right now you're exposing a mutex and leaving it to all the code that accesses the associated resource to use the mutex correctly. I'd prefer to centralize the mutex locking so it's exception-safe, and so most of the code can ignore it completely.
// use std::lock_guard, if available.
class lock {
    mutex &m;
public:
    lock(mutex &m) : m(m) { m.lock(); }
    ~lock() { m.unlock(); }
};

class synched_vec {
    mutex m;
    std::vector<string> data;
public:
    void push_back(std::string const &s) {
        lock l(m);
        data.push_back(s);
    }
} ob;

void addSomeAValues() {
    for (int a = 0; a < 10; a++) {
        ob.push_back("A" + std::to_string(a));
        usleep(300);
    }
}
This also means that if (for example) you decide to use a lock-free (or minimal locking) structure in the future, you should only have to modify the synched_vec, not all the rest of the code that uses it. Likewise, by keeping all the mutex handling in one place, it's much easier to get the code right, and if you do find a bug, much easier to ensure you've fixed it (rather than looking through all the client code).
The code in the question runs without any segmentation faults for me (after adding the headers and replacing usleep with a sleep that works on my system).
There are two problems with the code, though, that could cause unexpected results:
Each thread locks the mutex for its entire execution. This prevents the other thread from running; the two threads are not running in parallel! In your case, you should only lock while you are accessing the vector.
Your end time point is taken after creating the threads, not after they are done executing. Both threads are done only once they have both been joined.
Working, compilable code with headers, a chrono-based sleep, and the two errors fixed:
#include <mutex>
#include <string>
#include <vector>
#include <thread>
#include <chrono>
#include <iostream>

std::mutex mu;
std::vector<std::string> ob;

void addSomeAValues() {
    for (int a = 0; a < 10; a++) {
        mu.lock();
        ob.push_back("A" + std::to_string(a));
        mu.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
    }
}

void addSomeBValues() {
    for (int b = 0; b < 10; b++) {
        mu.lock();
        ob.push_back("B" + std::to_string(b));
        mu.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
    }
}

int main() {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    std::thread t0(addSomeAValues);
    std::thread t1(addSomeBValues);
    t0.join();
    t1.join();
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

    // Display the results
    std::cout << "Code Run Complete; results: \n";
    for (auto k : ob) {
        std::cout << k << std::endl;
    }

    // Code running complete, report the time it took
    typedef std::chrono::duration<int, std::milli> millisecs_t;
    millisecs_t duration(std::chrono::duration_cast<millisecs_t>(end - start));
    std::cout << duration.count() << " milliseconds.\n";
    return 0;
}

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? I want persistent threads that I can resynchronize with, without joining them.
I have code that looks like this:
#include <algorithm>
#include <thread>
#include <vector>

namespace {
    std::vector<std::thread> workers;

    int total = 4;
    int arr[5] = {0};

    void each_thread_does(int i) {
        arr[i] += 2;
    }
}

int main(int argc, char *argv[]) {
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            workers.push_back(std::thread(each_thread_does, j));
        }
        for (std::thread &t : workers) {
            if (t.joinable()) {
                t.join();
            }
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
    return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
    void Start();
    void QueueJob(const std::function<void()>& job);
    void Stop();
    bool busy();

private:
    void ThreadLoop();

    bool should_terminate = false;           // Tells threads to stop looking for jobs
    std::mutex queue_mutex;                  // Prevents data races to the job queue
    std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
    std::vector<std::thread> threads;
    std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
    const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
    threads.resize(num_threads);
    for (uint32_t i = 0; i < num_threads; i++) {
        threads.at(i) = std::thread(&ThreadPool::ThreadLoop, this);
    }
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
    while (true) {
        std::function<void()> job;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            mutex_condition.wait(lock, [this] {
                return !jobs.empty() || should_terminate;
            });
            if (should_terminate) {
                return;
            }
            job = jobs.front();
            jobs.pop();
        }
        job();
    }
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        jobs.push(job);
    }
    mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
bool ThreadPool::busy() {
    bool poolbusy;
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        poolbusy = !jobs.empty();
    }
    return poolbusy;
}
The busy() function can be used in a while loop, so that the main thread can wait for the thread pool to complete all the tasks before calling the thread pool destructor.
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        should_terminate = true;
    }
    mutex_condition.notify_all();
    for (std::thread& active_thread : threads) {
        active_thread.join();
    }
    threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for jobs to do.
I apologize if there are some syntax errors; I typed this code from memory and my memory is bad. Sorry that I cannot provide you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().
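For completeness, a minimal usage sketch of the class above (the main() below is mine, not part of the original answer, and assumes the member functions above are assembled into one translation unit):

#include <chrono>
#include <iostream>
#include <thread>
// ... ThreadPool definition from above ...

int main() {
    ThreadPool pool;
    pool.Start();
    for (int i = 0; i < 20; ++i) {
        pool.QueueJob([i] { std::cout << "job " << i << '\n'; });
    }
    // Spin until the queue has been drained; Stop() then joins the workers,
    // so any job that is still executing finishes before the pool is destroyed.
    while (pool.busy()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    pool.Stop();
}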
You can use C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
int main(int argc, char *argv[]) {
    ctpl::thread_pool p(2 /* two threads in the pool */);

    int arr[5] = {0};
    std::vector<std::future<void>> results(4);
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            results[j] = p.push([&arr, j](int) { arr[j] += 2; });
        }
        for (int j = 0; j < 4; ++j) {
            results[j].get();
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The threadpool job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
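As a concrete illustration of the task/result notion mentioned above (a small sketch of mine, not part of this answer): std::packaged_task pairs a callable with a std::future, which is the usual building block for a "submit a job, collect the result later" interface.

#include <future>
#include <iostream>
#include <thread>

int main() {
    // The task wraps the callable; the future is the handle to its eventual result.
    std::packaged_task<int(int, int)> task([](int a, int b) { return a + b; });
    std::future<int> result = task.get_future();

    // A pool worker would pull the task off a queue and invoke it; here we
    // simply run it on its own thread.
    std::thread worker(std::move(task), 2, 3);

    std::cout << result.get() << '\n'; // prints 5 once the task has run
    worker.join();
}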
The current tooling for handling events is fairly barebones(*): primitives like mutexes and condition variables, and a few abstractions on top of that (locks, barriers). But in some cases these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signal
i/o
hardware (processor affinity, heterogenous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many of the problems outlined above. You might want to build on it. You might also want to have a look at existing frameworks in other languages, to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece)'s suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>

class Function_pool
{
private:
    std::queue<std::function<void()>> m_function_queue;
    std::mutex m_lock;
    std::condition_variable m_data_condition;
    std::atomic<bool> m_accept_functions;

public:
    Function_pool();
    ~Function_pool();
    void push(std::function<void()> func);
    void done();
    void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be release automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "stating operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}
You can use thread_pool from the Boost library:
void my_task() { ... }

int main() {
    int threadNumbers = std::thread::hardware_concurrency();
    boost::asio::thread_pool pool(threadNumbers);

    // Submit a function to the pool.
    boost::asio::post(pool, my_task);

    // Submit a lambda object to the pool.
    boost::asio::post(pool, []() {
        ...
    });
}
You can also use threadpool from the open source community:
void first_task() { ... }
void second_task() { ... }

int main() {
    int threadNumbers = std::thread::hardware_concurrency();
    pool tp(threadNumbers);

    // Add some tasks to the pool.
    tp.schedule(&first_task);
    tp.schedule(&second_task);
}
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>

struct thread_pool {
    typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;

    thread_pool(int threads) : service(), service_worker(new asio_worker::element_type(service)) {
        for (int i = 0; i < threads; ++i) {
            auto worker = [this] { return service.run(); };
            grp.add_thread(new boost::thread(worker));
        }
    }

    template<class F>
    void enqueue(F f) {
        service.post(f);
    }

    ~thread_pool() {
        service_worker.reset();
        grp.join_all();
        service.stop();
    }

private:
    boost::asio::io_service service;
    asio_worker service_worker;
    boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);

pool.enqueue([] {
    std::cout << "Hello from Task 1\n";
});

pool.enqueue([] {
    std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
Boost::asio::io_service is a very efficient implementation, or actually is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n) {
    // Determine if n is prime.
}

int main() {
    thread_pool pool(8); // 8 threads

    list<future<bool>> results;
    for (int n = 2; n < 10000; n++) {
        // Submit a job to the pool.
        results.emplace_back(pool.async(is_prime, n));
    }

    int n = 2;
    for (auto i = results.begin(); i != results.end(); i++, n++) {
        // i is an iterator pointing to a future representing the result of is_prime(n)
        cout << n << " ";
        bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
        if (prime)
            cout << "is prime";
        else
            cout << "is not prime";
        cout << endl;
    }
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.
Looks like a threadpool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it's owned by me and publicly available here - https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, and ordering of some tasks.
All the implementation is in two .h files.
So the code from the original question would become something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that a pending task's future.get() call hangs on the caller side if the thread pool gets terminated while some tasks are still left in the task queue. How do you set a future exception inside the thread pool when the task is only wrapped in a plain std::function?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
    using return_type = std::result_of_t<F(Args...)>;
    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    std::future<return_type> res = task->get_future();
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _tasks.push([task]() -> void { (*task)(); });
    }
    return res;
}
class StdThreadPool {
    std::vector<std::thread> _workers;
    std::priority_queue<TASK> _tasks;
    ...
};

struct TASK {
    //int _func_return_value;
    std::function<void()> _func;
    int priority;
    ...
};
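One way out (a sketch of mine, not from the original post): since the queued std::function owns the std::packaged_task through a shared_ptr, simply discarding the unexecuted entries on shutdown destroys the packaged_tasks, and destroying a packaged_task whose result was never set makes the corresponding future throw std::future_error (broken_promise) from get() instead of hanging. A self-contained demonstration of that behavior:

#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <queue>

int main() {
    std::queue<std::function<void()>> tasks;

    auto task = std::make_shared<std::packaged_task<int()>>([] { return 42; });
    std::future<int> result = task->get_future();
    tasks.push([task] { (*task)(); }); // same wrapping as the enqueue() above

    // Simulate pool termination: drop pending tasks without running them.
    while (!tasks.empty()) tasks.pop();
    task.reset(); // last owner gone; the packaged_task is destroyed unexecuted

    try {
        result.get();                   // would otherwise block forever
    } catch (const std::future_error& e) {
        std::cout << "pending task rejected: " << e.what() << '\n';
    }
}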
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cancelation (cooperative) - so that when the ThreadPool above goes out of scope - it cancels any running tasks (similar to c++20's jthread).
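For comparison, a minimal sketch of the C++20 std::jthread behavior mentioned above (this example is mine, not taken from the Stroika documentation): the destructor requests a stop and then joins, so cancellation is cooperative via the stop_token.

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    std::jthread worker([](std::stop_token st) {
        while (!st.stop_requested()) {
            std::cout << "working...\n";
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
        std::cout << "stop requested, exiting\n";
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(350));
    // worker's destructor calls request_stop() and then join().
}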

Limitation on Qt and boost thread local storage

I have following questions on QThreadStorage and boost's thread_specific_ptr:
1) Is there any limitation on the number of objects that can be stored in QThreadStorage? I came across a Qt query about 256 QThreadStorage objects, so I'd like to clarify what this limitation refers to.
2) Does QThreadStorage work only with QThreads?
3) Is there any limitation on boost tls?
4) I have a use case where I want to operate on TLS and sync the data to the main thread when all threads finish, for further processing. I wrote the code below and would like to check whether it is okay.
#include <iostream>
#include <cstdio>   // sprintf, puts
#include <cstdlib>  // malloc
#include <boost/thread/thread.hpp>
#include <boost/thread/tss.hpp>

boost::mutex mutex1;
int glob = 0;

class data
{
public:
    char* p;
    data()
    {
        p = (char*)malloc(10);
        sprintf(p, "test%d\n", ++glob);
    }
};

char* global_p[11] = {0};
int index = -1;

void cleanup(data* _ignored) {
    std::cout << "TLS cleanup" << std::endl;
    boost::mutex::scoped_lock lock(mutex1);
    global_p[++index] = _ignored->p;
}

boost::thread_specific_ptr<data> value(cleanup);

void thread_proc()
{
    value.reset(new data()); // initialize the thread's storage
    std::cout << "here" << std::endl;
}

int main(int argc, char* argv[])
{
    boost::thread_group threads;
    for (int i = 0; i < 10; ++i)
        threads.create_thread(&thread_proc);
    threads.join_all();

    for (int i = 0; i < 10; ++i)
        puts(global_p[i]);
}
I can partially answer your question.
The 256 limit belongs to old Qt. You are probably reading old documentation. Newer Qt versions (i.e., above 4.6) do not have such a limit.
QThreadStorage can destroy contained items at thread exit because it works closely with QThread. So separating these two is not a wise idea, in my opinion.
Here I think you are asking about the number of objects that can be stored with boost TLS. I am not aware of any limitation on boost TLS. You should be fine.
Your code looks OK to me, except that in the constructor of data you need to take a mutex lock before ++glob, otherwise you may not get incrementing values.
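A minimal sketch of that last point (mine, not part of the original answer): either take mutex1 around ++glob in the constructor, or make the counter a std::atomic<int> so each data object gets a distinct value without locking:

#include <atomic>
#include <cstdio>
#include <cstdlib>

std::atomic<int> glob(0);

class data
{
public:
    char* p;
    data()
    {
        p = (char*)malloc(10);
        // ++glob on std::atomic<int> is an atomic increment, so every
        // thread's data object receives a unique number.
        snprintf(p, 10, "test%d\n", ++glob);
    }
};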
I hope this helps.