how to make a threadpool with boost::thread

A boost::thread object is not the thread itself: a new thread is created when a functor is passed to it, and that thread exits when the functor returns.
We use a thread pool to minimize thread creation and destruction cost, but each thread in the pool would also be destroyed when the supplied functor returns.
So what's the basic concept behind building a thread pool? Is there any permanent thread to which I can assign functors?

A thread pool is just a bunch of threads that are already running, and that are all running the same function. This function basically just waits on a queue, and when there is a "function" in the queue it extracts and executes it.
Pseudo-code:
void thread_pool_function()
{
    while (true)
    {
        wait_for_signal_that_queue_is_not_empty();
        function_to_call = queue.remove_top();
        unlock_queue_semaphore();
        function_to_call();
    }
}
create_thread(thread_pool_function);
create_thread(thread_pool_function);
create_thread(thread_pool_function);
create_thread(thread_pool_function);
In the "code" above there are now four threads, all initially waiting for something to be put in a "queue". When there is something in the queue, one of the threads extracts it and calls it as a function.
This is probably the simplest way to implement a thread pool.
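The pseudo-code above can be sketched in real code roughly as follows. This is a minimal illustration that uses the standard C++11 equivalents (std::thread, std::mutex, std::condition_variable) rather than boost::thread, and the class name ThreadPool and its submit() member are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: every worker runs the same loop and blocks on the queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets the workers drain the queue, then stops and joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        not_empty_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    // Queues a task and wakes one waiting worker.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        not_empty_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                not_empty_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

The threads are created once in the constructor and only exit when the pool is destroyed, which is the "permanent thread" the question asks about.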

In addition to what @Joachim posted:
One way to flow-control such a system (and one I use a lot), is to use a 'pool queue', (blocking producer-consumer queue), of tasks, created and filled at startup with a fixed number of task objects. Any thread that wants to issue a task has to get one from the pool first and tasks are returned to the pool after completion handling. This limits the number of tasks in the system and, if the pool empties, requesting threads just have to wait, blocked on the empty pool, until some 'used' tasks come back in.
This works well, provides flow-control, prevents memory-runaway and eliminates continual task create/destroy. It's also easy to periodically display/write the pool queue depth on a timer, so you can see how 'busy' your app is, (and detect any leaks:).
Edit: Also, it removes the need for any bounded queues in the system. Unbounded queues are simpler and tend to need fewer system calls.
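A rough sketch of such a 'pool queue', assuming standard C++11 primitives. The names Task, TaskPool, acquire(), release() and depth() are hypothetical and only illustrate the idea of a fixed set of task objects that requesters must block on.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical task object; a real one would carry request data and results.
struct Task {
    int payload = 0;
};

// A blocking pool of pre-allocated Task objects, filled once at startup.
class TaskPool {
public:
    explicit TaskPool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(std::unique_ptr<Task>(new Task()));
    }

    // Blocks while the pool is empty; this is the flow control.
    std::unique_ptr<Task> acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        std::unique_ptr<Task> t = std::move(free_.back());
        free_.pop_back();
        return t;
    }

    // Returns a task after completion handling and wakes one blocked requester.
    void release(std::unique_ptr<Task> t) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(t));
        }
        available_.notify_one();
    }

    // Current pool depth, e.g. for periodic logging of how busy the app is.
    std::size_t depth() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable available_;
    std::vector<std::unique_ptr<Task>> free_;
};
```

Because no Task is ever created after startup, the number of tasks in flight can never exceed the initial count, and a depth that never returns to that count points to a leaked task.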

When to use std::launch::deferred?

Lines from Anthony Williams' book:
std::launch::deferred indicates that the function call is to be
deferred until either wait() or get() is called on the future.
X baz(X&);
auto f7 = std::async(std::launch::deferred, baz, std::ref(x)); //run in wait() or get()
//...
f7.wait(); //invoke deferred function
What could be the benefits or differences of this code over a direct call (baz(ref(x)) )?
In other words, what's the point of having future here?

Suppose you have a thread pool.
The thread pool owns a certain number of threads. Say 10.
When you add tasks, they return a future, and they queue into the pool.
Threads in the pool wake up, grab a task, work on it.
What happens when all 10 threads are running tasks that wait on a task later in the queue? Well, a deadlock.
Now, what if we return a deferred future from this pool.
When you wait on this deferred future, it wakes up and checks if the task is done. If so, it finishes and returns.
Next, if the task is in the queue and not yet started, it steals the work from the queue, runs it right there, and returns.
Finally, if it is being run by the pool but not finished, it does something more complex. (The simplest version, which usually works, is that it blocks on the task, but that doesn't solve some pathological cases.)
In any case, if a task in the pool now waits for another task that is queued but not yet started, we still get forward progress.
Another use of this is less arcane. Suppose we have some lazy values.
Instead of calculating them, we store shared futures with the calculation steps in them. Now anyone who needs them just does a .get(). If the value has already been calculated, we get the value; otherwise, we calculate it, then get it.
Later, we add in a system to do some work on idle or in another thread. These replace said deferred lazy futures in some cases, but not in others.
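The lazy-value use can be illustrated with a small sketch. The function expensive_value(), the counter g_compute_calls and the helper make_lazy_value() are hypothetical names introduced only to show that the computation runs on the first get() and not before.

```cpp
#include <cassert>
#include <future>

// Hypothetical counter so the example can show the computation runs at most once.
static int g_compute_calls = 0;

// Hypothetical stand-in for an expensive calculation.
int expensive_value() {
    ++g_compute_calls;
    return 42;
}

// The computation is stored, not run; the first get() or wait() evaluates it.
std::shared_future<int> make_lazy_value() {
    return std::async(std::launch::deferred, expensive_value).share();
}
```

A caller holds the std::shared_future<int> and calls get() wherever the value is needed; copies of the shared future all refer to the same result, so the function body is not executed again.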

I think the main benefit is that it might be executed in a different thread: the one which actually reads the future. This allows 'units of work' to be transferred between threads, i.e. thread 1 creates the future, while thread 2 calls wait on it.
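A small sketch of that hand-off, assuming standard C++11. The function name deferred_runs_on_waiting_thread() is hypothetical; it records thread ids to show that the deferred function runs on the thread that calls get(), not on the thread that created the future.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Returns true if a deferred task created on this thread ends up running
// on the other thread that calls get().
bool deferred_runs_on_waiting_thread() {
    const std::thread::id creator = std::this_thread::get_id();
    std::thread::id ran_on;

    // Nothing runs yet: the lambda is only stored in the future.
    std::future<void> f = std::async(
        std::launch::deferred, [&ran_on] { ran_on = std::this_thread::get_id(); });

    std::thread::id waiter_id;
    std::thread waiter([&f, &waiter_id] {
        waiter_id = std::this_thread::get_id();
        f.get();  // the deferred function executes here, on the waiter thread
    });
    waiter.join();

    return ran_on == waiter_id && ran_on != creator;
}
```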

In my view, the answer comes from Effective Modern C++, Item 35:
Compared to thread-based programming, a task-based design spares you the travails
of manual thread management
This means std::launch::deferred is the fallback for the case where the OS is unable to allocate a new thread for you: the baz function still runs, but as a deferred task, instead of failing the way pthread_create does or throwing an exception the way std::thread does, like this:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
Conclusion:
// Runs on the same thread that calls get() or wait().
std::async(std::launch::deferred, baz, ...) = baz()
// Creates a new thread to run baz(...) if the OS is able to allocate one; otherwise the same as above.
std::async(baz, ...) = std::async(std::launch::deferred | std::launch::async, baz, ...) != baz();
https://man7.org/linux/man-pages/man3/pthread_create.3p.html
tested at https://godbolt.org/z/hYv7TW51q
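Since the default policy leaves the choice to the implementation, it can be useful to check which one a given future actually got. A minimal sketch, where baz_like() is a hypothetical stand-in for baz and is_deferred() is a helper name chosen for this example:

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical stand-in for baz.
int baz_like(int x) { return x + 1; }

// True if the future holds a deferred function that has not started yet.
// A zero-length wait_for() does not run the function; it only reports the state.
template <typename T>
bool is_deferred(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}
```

For a future created with std::launch::deferred this returns true until get() or wait() is called; for one created with std::launch::async it returns false.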

boost::threadpool::pool vs. boost::thread_group

I'm trying to understand the different use cases and the difference between these two ways of using threads.
This is a great tutorial I have read which explains boost::thread_group.
Here is the code I'm using:
boost::threadpool::pool s_ThreadPool(GetCoreCount());
CFilterTask task(pFilter, // filter to run
    boost::bind(&CFilterManagerThread::OnCompleteTask, this, _1, _2) // OnComplete sync callback; _1 is the filter name, _2 is the error code
);
// schedule the new task - runs on the threadpool
s_ThreadPool.schedule(task);
This is the destructor:
s_ThreadPool.wait(0);
Can you please explain?

boost::thread_group is a convenience class for performing thread management operations on a collection of threads. For example, instead of having to iterate over std::vector<boost::thread>, invoking join() on each thread, the thread_group provides a convenient join_all() member function.
With boost::thread, regardless of whether it is managed by boost::thread_group, the lifetime of the thread often depends on the work that the thread is doing. For example, if a thread is created to perform a computationally expensive calculation, then the thread can exit once the result has been calculated. If the work is short-lived, then the overhead of creating and destroying threads can affect performance.
On the other hand, a threadpool is a pattern in which a number of threads service a number of tasks. The lifetime of a thread is not directly associated with the lifetime of a task. To continue with the previous example, the application would schedule the computationally expensive calculation to run within the thread pool. The work will be queued within the threadpool, and one of the threadpool's threads will be selected to perform the work. Once the calculation has completed, the thread goes back to waiting for more work to be scheduled with the threadpool.
As shown in this threadpool example, a threadpool can be implemented with boost::thread_group to manage lifetime of threads, and boost::asio::io_service for task/work dispatching.

Interrupting threads if not joined

I am looking for a way (preferably with boost threads) to interrupt a thread if it has not joined. I start multiple threads, and would like to end any of them that have not finished within 200 milliseconds. I tried something like this:
boost::thread_group tgroup;
tgroup.create_thread(boost::bind(&print_f));
tgroup.create_thread(boost::bind(&print_g));
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
tgroup.interrupt_all();
Now this works, and all threads are ended after 200 milliseconds; however, I would like to join these threads if they finish before the 200 milliseconds are up. Is there a way to join, and interrupt only the threads that have not finished within a certain amount of time?
Edit: the reason why I need the join to happen before the timeout:
I am creating a server where speed is very important. Unfortunately I have to make requests to other servers for some information, so I would like to make these calls in parallel and finish as soon as possible. If a server is taking too long, I have to ignore the information coming from that server and continue without it. So my timeout is the maximum amount of time I can wait. It would be extremely beneficial to be able to continue with the computation as soon as all responses are received, instead of waiting for the timeout timer. So what my program will do:
-Get a request from a client.
-Parse information.
Create threads
-Send information to multiple other servers.
-Get information back from servers.
-Put information from servers on a shared queue.
End Threads
-Parse information from shared queue.
-Return information back to client

What you want to use is probably a set of scoped threads, calling terminate on all the remaining threads after the timeout. Unfortunately, thread groups and scoped threads are not usable together.
The thread group class is actually a very simple container: you cannot remove a thread from it if you don't already have a pointer to it, and you cannot get a pointer to a thread which has been created by the group. The class API doesn't provide much else either. This is a bit of a hindrance for management in your situation.
The remaining solutions rely on creating the threads outside the group, and having each of them do a specific task just before finishing. It could:
remove itself from the group,
then add itself to another group
The managing thread will have to call join_all on the latter group, and act as before with the former.
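using namespace boost;
typedef std::map<thread::id, thread*> thread_map;
mutex thmap_mutex; // guards thmap, which is shared between all the threads

// Runs the task, then moves the calling thread from group t1 to group t2.
void thread_end(thread_map& thmap, thread_group& t1, thread_group& t2, function<void()> task){
    task();
    thread* self;
    {
        // Taking the lock also waits until the creator has registered this thread.
        lock_guard<mutex> lock(thmap_mutex);
        self = thmap[this_thread::get_id()];
    }
    t1.remove_thread(self);
    t2.add_thread(self);
}

thread_map thmap;
thread_group trunninggroup;
thread_group tfinishedgroup;
thread *th;
{
    // Hold the lock while creating and registering, so no thread looks itself up too early.
    lock_guard<mutex> lock(thmap_mutex);
    // boost::ref is needed because maps are shared and thread groups are not copyable.
    th = new thread(boost::bind(&thread_end, boost::ref(thmap), boost::ref(trunninggroup),
                                boost::ref(tfinishedgroup), function<void()>(&print_f)));
    thmap[th->get_id()] = th;
    trunninggroup.add_thread(th);
    th = new thread(boost::bind(&thread_end, boost::ref(thmap), boost::ref(trunninggroup),
                                boost::ref(tfinishedgroup), function<void()>(&print_g)));
    thmap[th->get_id()] = th;
    trunninggroup.add_thread(th);
}
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
tfinishedgroup.join_all();
trunninggroup.interrupt_all();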
But this is not ideal if you actually want the managing thread to be notified of a thread end when it actually happens (and I'm not really certain it does anything useful anyway). A solution for getting notified is perhaps to:
do the group migration as above
then trigger a condition variable on which the management thread is doing a timed_wait
but you will have to do some time computation to keep track of the remaining time after being notified, and resume sleep with that time left. That would be entirely dependent on the Duration class used for that task.
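One way to sidestep the remaining-time arithmetic is to wait against a single absolute deadline. The sketch below assumes standard C++11 rather than Boost; Completion and run_with_deadline() are hypothetical names, and unfinished threads are simply detached to keep the example short, which real code would replace with a proper stop request and join.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// State shared between the managing thread and the workers (hypothetical helper type).
struct Completion {
    std::mutex m;
    std::condition_variable cv;
    int finished = 0;
};

// Starts one thread per job and returns how many jobs finished before the timeout.
int run_with_deadline(const std::vector<std::function<void()>>& jobs,
                      std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    auto state = std::make_shared<Completion>();
    for (const std::function<void()>& job : jobs) {
        // Each thread owns a copy of its job and shares ownership of the state.
        std::thread([state, job] {
            job();
            {
                std::lock_guard<std::mutex> lock(state->m);
                ++state->finished;
            }
            state->cv.notify_one();
        }).detach();
    }

    // One absolute deadline: no remaining-time computation after each wake-up.
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    const int total = static_cast<int>(jobs.size());
    int seen = 0;
    std::unique_lock<std::mutex> lock(state->m);
    while (seen < total) {
        const bool progressed = state->cv.wait_until(
            lock, deadline, [&] { return state->finished > seen; });
        if (!progressed) break;  // deadline reached with no new completion
        seen = state->finished;  // a managing thread would handle new results here
    }
    return seen;
}
```

The managing thread wakes each time a worker finishes, so it can react immediately, and it still gives up at the original deadline no matter how many times it was woken.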
Update
Seeing the big picture, I would try a completely different approach: I don't think that terminating a thread which has already finished is a problem, so I would leave them all in the group, and use the group to create them, as your code demonstrates.
However, I would try to wake up the managing thread as soon as all the threads are done, or after the timeout. This is not doable with what the thread_group class offers alone, but it can be done with a custom-made semaphore, or a patched version of boost::barrier that allows a timed wait.
Basically, you set a barrier to the number of threads in the group plus one (the main thread), and have the main thread do a timed wait on it. Each worker thread does its work and, when finished, posts its result to the queue and waits on the barrier. If all the worker threads finish their tasks, everyone will be waiting and the barrier gets triggered.
Then the main thread (as well as all the others, but that doesn't matter) wakes up and can proceed by terminating the group and processing the results. Otherwise, it will be awakened at the timeout and do the same anyway.
Patching boost::barrier should not be too difficult: you should only need to duplicate the wait method and replace the condition variable wait inside with a timed_wait (I didn't look at the code, so this assumption might be totally off the mark). Otherwise, I provided a sample semaphore implementation for this question, which shouldn't be difficult to patch either.
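As a rough illustration of the timed wait, here is a sketch that does not patch boost::barrier but uses a simpler one-shot countdown latch built from standard C++11 primitives. TimedLatch is a hypothetical class; unlike a barrier, the workers only count down and do not block themselves, so only the managing thread waits.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot latch: workers count down, the manager waits with a timeout.
class TimedLatch {
public:
    explicit TimedLatch(int count) : remaining_(count) { assert(count >= 0); }

    // Called by each worker once its result has been posted to the queue.
    void count_down() {
        std::lock_guard<std::mutex> lock(m_);
        if (remaining_ > 0 && --remaining_ == 0) cv_.notify_all();
    }

    // Returns true if every worker counted down before the timeout expired.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return remaining_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int remaining_;
};
```

The managing thread would construct the latch with the number of workers, call wait_for() with the 200 millisecond budget, and then process whatever is in the result queue whether the call returned true or false.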
One last consideration: terminating a thread is usually not the best approach. You should instead try to signal the threads that they have to abort, and wait for them, or somehow have them pass their unfinished task to an auxiliary thread which cleans things up serially. Then your thread group would be ready to tackle the next task, and you wouldn't have to destroy and create threads all the time, which is a somewhat costly operation. This requires formalizing the idea of a task in the context of your application, and making the threads run in a loop that takes new tasks and processes them.
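Signalling a thread to abort can be as simple as a flag that the task polls between units of work. A minimal sketch in standard C++11, where run_until_stopped() is a hypothetical task and the 1 ms sleep stands in for one unit of real work:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical long-running task that polls a stop flag between small steps.
// Returns the number of steps completed before it finished or was asked to stop.
int run_until_stopped(const std::atomic<bool>& stop_requested, int max_steps) {
    assert(max_steps >= 0);
    int steps = 0;
    while (steps < max_steps && !stop_requested.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // one unit of work
        ++steps;
    }
    return steps;
}
```

The managing thread sets the flag when the timeout expires and then joins the worker, which returns at its next check instead of being killed in the middle of an operation.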

If you're using a very recent Boost and C++11, use try_join_for() (http://www.boost.org/doc/libs/1_53_0/doc/html/thread/thread_management.html#thread.thread_management.thread.try_join_for). Otherwise, use timed_join() (http://www.boost.org/doc/libs/1_53_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join).

pthread pool, C++

I am working on a networking program using C++ and I'd like to implement a pthread pool. Whenever I receive an event from the receive socket, I will put the data into the queue in the thread pool. I am thinking about creating 5 separate threads that will constantly check the queue to see if there is any incoming data to be processed.
This is quite a straightforward topic, but I am not an expert, so I would like to hear anything that might help to implement this.
Please let me know of any tutorials, references, or problems I should be aware of.

Use Boost.Asio and have each thread in the pool invoke io_service::run().
Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked. This approach may also be used with io_service::post() to use a means to perform any computational tasks across a thread pool.
Note that all threads that have joined an io_service's pool are considered equivalent, and the io_service may distribute work across them in an arbitrary fashion.

Before I start: use boost::thread.
If you want to know how to do it with pthreads, then you need to use pthread condition variables. These allow you to suspend threads that are waiting for work without consuming CPU.
When an item of work is added to the queue, you signal the condition variable and one pthread will be released from it, allowing that thread to take an item from the queue. When the thread finishes processing the work item, it returns to the condition variable to await the next piece of work.
The main loop for the threads in the pool should look like this:
ThreadWorkLoop() // The function that all the pool threads run.
{
    while(poolRunning)
    {
        WorkItem = GetWorkItem(); // Get an item from the queue. This suspends until an item
        WorkItem->run();          // is available; then you can run it.
    }
}
GetWorkItem()
{
    Locker lock(mutex); // RAII: Lock/unlock mutex
    while(workQueue.size() == 0)
    {
        conditionVariable.wait(mutex); // Waiting on a condition variable suspends a thread
    }                                  // until the condition variable is signalled.
                                       // Note: the mutex is unlocked while the thread is suspended
    return workQueue.popItem();
}
AddItemToQueue(item)
{
    Locker lock(mutex);
    workQueue.pushItem(item);
    conditionVariable.signal(); // Release a thread from the condition variable.
}
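The outline above could be written against the raw pthread API roughly as follows. This is only a sketch: WorkQueue, WorkerArgs and worker_main() are hypothetical names, the work items are plain ints, and a negative item is used as a made-up sentinel to tell the worker to exit.

```cpp
#include <cassert>
#include <pthread.h>
#include <queue>

// Hypothetical queue of int work items protected by a mutex and a condition variable.
class WorkQueue {
public:
    WorkQueue() {
        int rc = pthread_mutex_init(&mutex_, nullptr);
        assert(rc == 0);
        rc = pthread_cond_init(&not_empty_, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    ~WorkQueue() {
        pthread_cond_destroy(&not_empty_);
        pthread_mutex_destroy(&mutex_);
    }

    void push(int item) {
        pthread_mutex_lock(&mutex_);
        items_.push(item);
        pthread_cond_signal(&not_empty_);  // release one waiting worker
        pthread_mutex_unlock(&mutex_);
    }

    // Blocks, without consuming CPU, until an item is available.
    int pop() {
        pthread_mutex_lock(&mutex_);
        while (items_.empty())  // loop guards against spurious wake-ups
            pthread_cond_wait(&not_empty_, &mutex_);  // unlocks the mutex while waiting
        int item = items_.front();
        items_.pop();
        pthread_mutex_unlock(&mutex_);
        return item;
    }

private:
    WorkQueue(const WorkQueue&);             // not copyable
    WorkQueue& operator=(const WorkQueue&);  // not assignable
    pthread_mutex_t mutex_;
    pthread_cond_t not_empty_;
    std::queue<int> items_;
};

// Arguments handed to each pool thread (hypothetical helper type).
struct WorkerArgs {
    WorkQueue* queue;
    long sum;
};

// Pool thread entry point: adds up items until it sees the negative sentinel.
extern "C" void* worker_main(void* p) {
    WorkerArgs* args = static_cast<WorkerArgs*>(p);
    for (;;) {
        int item = args->queue->pop();
        if (item < 0) break;
        args->sum += item;
    }
    return nullptr;
}
```

A pool of 5 threads would call pthread_create() five times with worker_main, each with its own WorkerArgs pointing at the same WorkQueue, and push one sentinel per thread at shutdown.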

Have the receive thread push the data onto the queue and the 5 threads pop it. Protect the queue with a mutex and let them "fight" for the data.
You also want a usleep() or pthread_yield() in the worker thread's main loop, so that idle workers do not spin at full CPU load.

You will need a mutex and a condition variable. The mutex will protect your job queue, and when the receiving thread adds a job to the queue it will signal the condition variable. The worker threads will wait on the condition variable and will wake up when it is signaled.

Boost.Asio is a good solution.
But if you don't want to use it (or can't use it for whatever reason), then you'll probably want to use a semaphore-based implementation.
You can find a multithreaded queue implementation based on semaphores that I use here:
https://gist.github.com/482342
The reason for using semaphores is that you can avoid having the worker threads continually polling, and instead have them woken up by the OS when there is work to be done.
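This is not the code from the linked gist, just a minimal sketch of the idea using POSIX unnamed semaphores (sem_init, available on Linux) together with a std::mutex. SemaphoreQueue is a hypothetical name and the items are plain ints.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <semaphore.h>

// Hypothetical queue whose semaphore count always equals the number of queued items.
class SemaphoreQueue {
public:
    SemaphoreQueue() {
        int rc = sem_init(&items_available_, 0, 0);  // process-private, initial count 0
        assert(rc == 0);
        (void)rc;
    }
    ~SemaphoreQueue() { sem_destroy(&items_available_); }

    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        sem_post(&items_available_);  // one more item: the OS wakes one waiting worker
    }

    // Blocks in the kernel until an item is available; no polling.
    int pop() {
        while (sem_wait(&items_available_) != 0) {
            // sem_wait can return early when interrupted by a signal; retry.
        }
        std::lock_guard<std::mutex> lock(mutex_);
        int item = items_.front();
        items_.pop();
        return item;
    }

private:
    SemaphoreQueue(const SemaphoreQueue&);             // not copyable
    SemaphoreQueue& operator=(const SemaphoreQueue&);  // not assignable
    sem_t items_available_;
    std::mutex mutex_;
    std::queue<int> items_;
};
```

Each successful sem_wait() corresponds to exactly one earlier push(), so the queue is never empty when a worker goes on to take the mutex.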

Waiting win32 threads

I have a totally thread-safe FIFO structure (TaskList) to store task classes, and multiple threads, some of which create and store tasks while the others process them. The TaskList class has a pop_front() method which returns the first task if there is at least one. Otherwise it returns NULL.
Here is an example of processing function:
TaskList tlist;
unsigned _stdcall ThreadFunction(void * qwe)
{
    Task * task;
    while(!WorkIsOver) // a global bool to end all threads.
    {
        while(task = tlist.pop_front())
        {
            // process Task
        }
    }
    return 0;
}
My problem is that sometimes there is no new task in the task list, so the processing threads spin in the outer loop (while(!WorkIsOver)) and CPU load increases. Somehow I have to make the threads wait until a new task is stored in the list. I thought about suspending and resuming threads, but then I would need extra information about which threads are suspended or running, which makes the code more complex.
Any ideas?
PS. I am using WinAPI, not Boost or TBB, for threading, because sometimes I have to terminate threads that run for too long and create new ones immediately. This is critical for me. Please do not suggest either of these two.
Thanks

Assuming you are developing this in DevStudio, you can get the control you want using I/O Completion Ports. Scary name for a simple tool.
First, create an I/O completion port: CreateIoCompletionPort
Create your pool of worker threads using _beginthreadex / CreateThread
In each worker thread, implement a loop that calls GetQueuedCompletionStatus. The returned lpCompletionKey will point to a work item to process.
Now, whenever you get a work item to process, call PostQueuedCompletionStatus from any thread, passing the pointer to your work item as the completion key parameter.
That's it. 3 API calls and you have implemented a thread pooling mechanism based on a kernel-implemented queue object. Each call to PostQueuedCompletionStatus will automatically be dispatched to a thread pool thread that is blocking on GetQueuedCompletionStatus. The pool of worker threads is created and maintained by you, so you can call TerminateThread on any worker threads that are taking too long. Even better, depending on how it is set up, the kernel will only wake up as many threads as needed to ensure that each CPU core is running at ~100% load.
NB. TerminateThread is really not an appropriate API to use. Unless you really know what you are doing the threads are going to leak their stacks, none of the memory allocated by code on the thread will be deallocated and so on. TerminateThread is really only useful during process shutdown. There are some articles on the net detailing how to release the known OS resources that are leaked each time TerminateThread is called - if you persist in this approach you really need to find and read them if you haven't already.

Use a semaphore in your queue to indicate whether there are elements ready to be processed.
Every time you add an item, call ::ReleaseSemaphore to increment the count associated with the semaphore
In the loop in your thread process, call ::WaitForSingleObject() on the handle of your semaphore object -- you can give that wait a timeout so that you have an opportunity to know that your thread should exit. Otherwise, your thread will be woken up whenever there's one or more items for it to process, and also has the nice side effect of decrementing the semaphore count for you.

If you haven't read it, you should devour Herb Sutter's Effective Concurrency series which covers this topic and many many more.

Use condition variables to implement a producer/consumer queue - example code here.
If you need to support earlier versions of Windows, you can use the condition variable in Boost. Or you could build your own by copying the Windows-specific code out of the Boost headers; they use the same Win32 APIs under the covers as you would if you built your own.

Why not just use the existing thread pool? Let Windows manage all of this.

You can use the Windows thread pool!
Or you can use the API calls WaitForSingleObject or WaitForMultipleObjects.
At the very least, call SwitchToThread when a thread has no work to do.

If TaskList has some kind of wait_until_not_empty method, then use it. If it does not, then a Sleep(1000) (or some other value) may just do the trick. A proper solution would be to create a wrapper around TaskList that uses an auto-reset event handle to indicate whether the list is non-empty. You would need to reimplement the current pop/push methods, with the existing task list being a member of the new class:
WaitableTaskList::WaitableTaskList()
{
    // task list is empty upon creation
    non_empty_event = CreateEvent(NULL, FALSE, FALSE, NULL);
}
Task* WaitableTaskList::wait_and_pop_front(DWORD timeout)
{
    WaitForSingleObject(non_empty_event, timeout);
    // .. handle error, return NULL on timeout
    Task* result = task_list.pop_front();
    if (!task_list.empty())
        SetEvent(non_empty_event);
    return result;
}
void WaitableTaskList::push_back(Task* item)
{
    task_list.push_back(item);
    SetEvent(non_empty_event);
}
You must pop items from the task list only through methods such as this wait_and_pop_front().
EDIT: actually this is not a good solution. There is a way for non_empty_event to be raised even if the list is empty. The situation requires 2 threads trying to pop and the list holding 2 items. If the list becomes empty between the if and the SetEvent, we will have the wrong state. Obviously we need to implement synchronization as well. At this point I would reconsider a simple Sleep again :-)
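For comparison, here is a sketch of the same wrapper with the emptiness check and the wait made atomic under one mutex. It deliberately swaps the Win32 event for a standard C++11 std::condition_variable, so it is not WinAPI code; Task is a hypothetical stand-in for the question's class, and std::deque stands in for the original TaskList.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical stand-in for the question's Task class.
struct Task {
    int id;
};

class WaitableTaskList {
public:
    void push_back(Task* item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(item);
        }
        non_empty_.notify_one();  // wake one waiting consumer
    }

    // Returns the first task, or nullptr if none arrived within the timeout.
    // The predicate is evaluated with the mutex held, so the list cannot become
    // empty between the check and the pop.
    Task* wait_and_pop_front(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!non_empty_.wait_for(lock, timeout, [this] { return !tasks_.empty(); }))
            return nullptr;
        Task* result = tasks_.front();
        tasks_.pop_front();
        return result;
    }

private:
    std::mutex mutex_;
    std::condition_variable non_empty_;
    std::deque<Task*> tasks_;
};
```

The timeout gives each processing thread a regular chance to re-check a flag such as WorkIsOver without spinning.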
