how to implement a set of persistent coordinated worker threads - thread-synchronization

I want to reuse a set of worker threads. Each worker thread performs independent work but they must start and stop processing as a coordinated team. I need an efficient means for each worker thread to block until the main thread tells them all to go, and an efficient means for the main thread to block until all worker threads are finished.
Each chunk of work will only require some tens of microseconds so the usual approach of creating a set of threads then joining them all involves far too much overhead.
The pseudocode is like the following:
main thread:
    create N threads
    forever
        prepare new independent work for each thread
        tell all N threads to run their part
        wait for all N threads to complete their work
        use results
typical worker thread:
    forever
        wait to run
        do my work
        indicate to main my work is complete
My question is how best to perform this signaling and synchronization. I am not asking about how to divide up the work or move work to or from each thread; suffice it to say the threads do not interact.

Related

Continue executing another thread

I'm currently playing with WinAPI and I have proceeded to threads. My program has n threads which work with each other. There is one main thread, which writes some data to specific memory location and waits until thread working with specific memory location processes the data. Then whole procedure repeats again, without terminating any thread.
My problem is that busy waiting for second thread wastes too much time.
Is there any way to suspend the current thread (to leave time for other threads) or to solve this problem differently?
Please help.
I'm guessing that you're currently polling / busy waiting in your main thread, constantly checking the state of some completion flag the worker thread will set. As you note, this isn't desirable as you use some proportion of cpu bandwidth just waiting for the worker to complete. In some cases, this will reduce the amount of time your worker is scheduled for, delaying its completion.
Rather than doing this, you can use a synchronisation object such as an Event or Semaphore to have your main thread sleep until the worker signals its completion.
You can use synchronization objects such as mutexes, semaphores, and events for synchronization, and the WaitForSingleObject/WaitForMultipleObjects APIs for thread waiting.

Synchronizing worker threads

I have a scenario for which I am trying to come up with the best synchronization approach. We assume that std::thread in C++11 is present, so no need to worry about differences between various threading libraries etc.
The scenario is this. Thread a, the main thread, wants to hand out tasks to a bunch of worker threads. Then, after giving out its final instruction for the time being, it needs to wait for all the threads to complete their work. We don't want to join them, just wait for them to finish their given task. Then thread a has to analyze the collected data from all threads, and then send out commands to the workers to begin the procedure again.
In short, these are the steps.
1. Thread a sends command x to all worker threads.
2. Thread a waits until all the workers have finished.
3. Thread a does processing.
4. Go back to 1.
What would you suggest that I use? Simple mutexes? Condition variables? A combination of the two? Any tips on how to structure the synchronization to be as efficient as possible would be appreciated.
You have n worker threads and one main thread a, which delegates tasks to workers and must wait for them to complete these tasks before assigning them a new batch of tasks.
The basic technique is to use a barrier (like boost::barrier) to synchronize the end of the worker threads and a.
The barrier is initialized at n+1. Main thread a waits on the barrier, and each worker thread does the same at the end of its task. When the last thread calls wait on the barrier, all the threads are woken up, and the main thread can continue its work. You may want to add a second barrier to block the worker threads until a new task is assigned to them.
The body of worker thread may look like the following pseudocode:
while (running) {
    startbarrier.wait(); // wait for main thread to signal start
    do_work();
    endbarrier.wait(); // signal end of work
}
The same thing can also be implemented with semaphores. Both semaphore and barrier can be implemented with a mutex and a condition variable.
See this SO question for more details.
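As a rough illustration of the two-barrier scheme, here is a minimal sketch of a reusable barrier built from a mutex and a condition variable, assuming only C++11 std::thread. The names Barrier, run_rounds, input and output are made up for this example, and "multiply by two" merely stands in for do_work(). If C++20 is available, std::barrier from <barrier> could replace the hand-written class.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier built from a mutex and a condition variable.
class Barrier {
public:
    explicit Barrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {
            // Last arrival: reset for the next round and release everyone.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Comparing generations guards against spurious wakeups.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Runs `rounds` rounds on `n` persistent workers and returns the sum of all
// results. The "work" (doubling an input) is only a placeholder.
long run_rounds(std::size_t n, int rounds) {
    Barrier start_barrier(n + 1), end_barrier(n + 1);  // n workers + main
    std::vector<long> input(n), output(n);
    bool running = true;  // only written by main before start_barrier.wait()

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            for (;;) {
                start_barrier.wait();      // wait for main to signal start
                if (!running) return;
                output[i] = input[i] * 2;  // stand-in for do_work()
                end_barrier.wait();        // signal end of work
            }
        });
    }

    long total = 0;
    for (int r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < n; ++i)
            input[i] = r + static_cast<long>(i);  // prepare new work
        start_barrier.wait();                     // release the team
        end_barrier.wait();                       // wait for the whole team
        for (long v : output) total += v;         // use results
    }
    running = false;
    start_barrier.wait();  // let the workers observe the stop flag
    for (auto& t : workers) t.join();
    return total;
}
```

Calling run_rounds(4, 3) keeps the same four threads alive across all three rounds. The barrier's mutex also orders the accesses to input, output and running, so no extra atomics are needed in this sketch. Whether a blocking barrier is fast enough for work items of a few tens of microseconds would have to be measured.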

boost thread pool

I need a threadpool for my application, and I'd like to rely on standard (C++11 or boost) stuff as much as possible. I realize there is an unofficial(!) boost thread pool class, which basically solves what I need, however I'd rather avoid it because it is not in the boost library itself -- why is it still not in the core library after so many years?
In some posts on this page and elsewhere, people suggested using boost::asio to achieve a threadpool like behavior. At first sight, that looked like what I wanted to do, however I found out that all implementations I have seen have no means to join on the currently active tasks, which makes it useless for my application. To perform a join, they send stop signal to all the threads and subsequently join them. However, that completely nullifies the advantage of threadpools in my use case, because that makes new tasks require the creation of a new thread.
What I want to do is:
ThreadPool pool(4);
for (...)
{
    for (int i=0;i<something;i++)
        pool.pushTask(...);
    pool.join();
    // do something with the results
}
Can anyone suggest a solution (except for using the existing unofficial thread pool on sourceforge)? Is there anything in C++11 or core boost that can help me here?
Quoting the question: "At first sight, that looked like what I wanted to do, however I found out that all implementations I have seen have no means to join on the currently active tasks, which makes it useless for my application. To perform a join, they send stop signal to all the threads and subsequently join them. However, that completely nullifies the advantage of threadpools in my use case, because that makes new tasks require the creation of a new thread."
I think you might have misunderstood the asio example:
IIRC (and it's been a while) each thread running in the thread pool has called io_service::run, which means that effectively each thread has an event loop and a scheduler. To then get asio to complete tasks, you post tasks to the io_service using the io_service::post method, and asio's scheduling mechanism takes care of the rest. As long as you don't call io_service::stop, the thread pool will continue running using as many threads as you started running (assuming that each thread has work to do or has been assigned an io_service::work object).
So you don't need to create new threads for new tasks, that would go against the concept of a threadpool.
Have each task class derive from a Task that has an 'OnCompletion(task)' method/event. The threadpool threads can then call that after calling the main run() method of the task.
Waiting for a single task to complete is then easy. The OnCompletion() can perform whatever is required to signal the originating thread, signaling a condvar, queueing the task to a producer-consumer queue, calling SendMessage/PostMessage API's, Invoke/BeginInvoke, whatever.
If an originating thread needs to wait for several tasks to all complete, you could extend the above and issue a single 'Wait task' to the pool. The wait task has its own OnCompletion to communicate the completion of other tasks and has a thread-safe 'task counter' (atomic ops or lock), set to the number of 'main' tasks to be issued. The wait task is issued to the pool first, and the thread that runs it waits on a private 'allDone' condvar in the wait task. The 'main' tasks are then issued to the pool with their OnCompletion set to call a method of the wait task that decrements the task counter towards zero. When the task counter reaches zero, the thread that achieves this signals the allDone condvar. The wait task OnCompletion then runs and so signals the completion of all the main tasks.
Such a mechanism does not require the continual create/terminate/join/delete of threadpool threads, places no restriction on how the originating task needs to be signaled, and you can issue as many such task-groups as you wish. You should note, however, that each wait task blocks one threadpool thread, so make sure you create a few extra threads in the pool (not usually any problem).
This seems like a job for boost::futures. The example in the docs seems to demonstrate exactly what you're looking to do.
Joining a thread means waiting until it stops, and once it has stopped you must create a new thread to assign it a new task. So in your case you should wait on a condition (for example boost::condition_variable) that indicates the end of the tasks. Using this technique it is very easy to implement with boost::asio and boost::condition_variable. Each thread calls boost::asio::io_service::run, and tasks will be scheduled and executed on different threads; at the end, each task will set a boost::condition_variable or even decrement a std::atomic to indicate the end of the job. That's really easy, isn't it?
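To make the "count the outstanding tasks and wait on a condition variable" idea concrete, here is a minimal sketch using only the C++11 standard library rather than boost::asio. The ThreadPool class below is hypothetical: it borrows the pushTask/join names from the question, and its join() waits for the submitted tasks without stopping the threads. Error handling is left out, so a task that throws would terminate the program.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool: join() waits for submitted tasks but keeps threads alive.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_cv_.notify_all();
        for (auto& t : threads_) t.join();  // real thread join, only at shutdown
    }

    void pushTask(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;  // counts queued and currently running tasks
        }
        work_cv_.notify_one();
    }

    // Blocks until every task pushed so far has finished.
    void join() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) done_cv_.notify_all();  // last task signals join()
        }
    }

    std::mutex mutex_;
    std::condition_variable work_cv_, done_cv_;
    std::queue<std::function<void()>> tasks_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> threads_;  // declared last: started after the rest
};

// Usage in the shape the question asks for; returns the number of tasks run.
int run_batches(int batches, int tasks_per_batch) {
    ThreadPool pool(4);
    std::atomic<int> counter{0};
    for (int b = 0; b < batches; ++b) {
        for (int i = 0; i < tasks_per_batch; ++i)
            pool.pushTask([&counter] { ++counter; });
        pool.join();  // the same four threads are reused for the next batch
    }
    return counter.load();
}
```

The pending counter plays the role of the task counter described above, and done_cv_ corresponds to the 'allDone' condvar; unlike the wait-task variant, this version does not occupy a pool thread while waiting.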

Scheduling huge number of threads so only 4 are executed in parallel

As already stated in the title, I have a large number of threads (probably many more than 100) that mostly hold program state rather than run. I want only a few of them (enough to use all physical processors) to actually run concurrently, and the rest should wait until one of the running threads is blocked. When this happens, a new one should start running.
Is it possible to achieve this with pthreads for example with the pthread scheduling functions? How would you do this?
Regards,
Nobody
EDIT
More Information:
Each thread fetches a job from the taskpool on its own and goes on to a certain point.
I need 100 threads to gather at that certain point of program execution that cannot be calculated in parallel. When the calculation is done, the threads should be awakened and go on. To make this efficient I have to prevent the scheduler from wasting time on switching between 100 threads instead of 4.
Just use a semaphore with initial count of 4?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_init.html
You could always launch 4 at a time, assigning them to a thread group, then waiting with a join all on the thread group. But I think more information is needed to develop a really useful answer.
Initialize a global variable to the number of threads to run concurrently.
When a thread wants to do work it obtains a slot. Using a mutex and condition variable, it waits until slots_available > 0. It then decrements slots_available, releases the mutex, and proceeds with its work.
When a thread has completed its work, it releases the slot by locking the mutex and incrementing slots_available. It signals all threads waiting on the condition variable so they can wake and see if slots_available > 0.
See https://computing.llnl.gov/tutorials/pthreads/#Mutexes for specific pthread library calls to use for the above.
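The slot scheme can be sketched as follows. This is only an illustration: it uses C++11 std::mutex and std::condition_variable instead of the pthread calls the answer refers to, and SlotLimiter and max_concurrency are names invented here. A thread may proceed whenever at least one slot is free, i.e. slots_available > 0.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical slot limiter: at most `slots` threads hold a slot at once.
class SlotLimiter {
public:
    explicit SlotLimiter(int slots) : slots_available_(slots) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleep until at least one slot is free, then take it.
        cv_.wait(lock, [this] { return slots_available_ > 0; });
        --slots_available_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++slots_available_;
        }
        cv_.notify_all();  // waiters wake up and re-check the slot count
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int slots_available_;
};

// Starts `threads` threads that each take a slot, and returns the highest
// number of threads that were ever inside the limited region together.
int max_concurrency(int threads, int slots) {
    SlotLimiter limiter(slots);
    std::mutex stats_mutex;
    int active = 0, peak = 0;

    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&] {
            limiter.acquire();
            {
                std::lock_guard<std::mutex> lock(stats_mutex);
                peak = std::max(peak, ++active);
            }
            std::this_thread::yield();  // stand-in for the real work
            {
                std::lock_guard<std::mutex> lock(stats_mutex);
                --active;
            }
            limiter.release();
        });
    }
    for (auto& t : pool) t.join();
    return peak;
}
```

With 16 threads and 4 slots, the observed peak never exceeds 4, although on a lightly loaded machine it may be lower. This is effectively a counting semaphore, so sem_init with an initial count of 4 should behave the same way.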
I don't know how to do this with pthread functions, but I do have an idea:
I would implement this by adding some intelligence to the threadpool/taskpool to count the number of active threads and only make 4 - number of active threads available at any one time. This could be done by having an idle queue, a ready queue, and an active queue (or just active count). Tasks would grab from the ready queue, and the threadpool would only migrate tasks from the idle queue to the ready queue conditionally.

Possible frameworks/ideas for thread managment and work allocation in C++

I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Thread management should include at least the functionality below.
1. Decide on how many workers threads are required. We may need to provide user-defined function to calculate number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include the functionality below.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++11 (formerly C++0x) this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
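For instance, a small helper along these lines could choose the pool size. pick_worker_count is a hypothetical name and the policy (never more threads than work items, never fewer than one) is just one plausible choice. Note that hardware_concurrency() is allowed to return 0 when the value cannot be determined, so that case needs a fallback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: pick a worker count from the processor count and the
// number of independent work items available.
std::size_t pick_worker_count(std::size_t work_items) {
    std::size_t hw = std::thread::hardware_concurrency();  // 0 means "unknown"
    if (hw == 0) hw = 1;                                    // conservative fallback
    // No point in having more workers than work items; always at least one.
    return std::max<std::size_t>(1, std::min(hw, work_items));
}
```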
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.