Signal a thread to start a specific function - C++

I want to split some CPU-intensive jobs across multiple threads. I want to make a thread pool with, let's say, 4 threads.
I want to know very fast ways to do the following:
Check whether a thread is free to receive work
Signal a thread to start a specific function
Wait for all the threads to finish their jobs
This should be as fast as possible. I use C++ in Visual Studio 2010 on Windows 7. Any Win7/VS2010-specific solution would be preferred if it's faster than a portable approach.
EDIT:
I found on MSDN this sample:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686946(v=vs.85).aspx
Is there any faster way to do this?

The stuff from the Boost thread library is pretty fast. You can start 4 threads that end up waiting for a boost::condition_variable. In the main thread you can add stuff to a task-queue and then call boost::condition_variable::notify_one in order to start one free thread, if any. As soon as one of the working threads is notified, it takes stuff out of the task queue and continues to do so until the queue is empty. In order to wait for the task queue to finish, let the thread that makes the task queue empty call boost::condition_variable::notify_all and wait in the main thread for that signal. Obviously you need to protect the shared data for this stuff with a mutex.
This technique works fine if you have medium to large size tasks and several thousand or less should execute in a second. I don't have experience with smaller tasks using this technique.
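For illustration, a minimal sketch of that scheme using Boost.Thread (the names task_queue, submit and wait_for_all are invented for this sketch; it also counts running tasks so that "all done" means finished rather than merely dequeued, and worker shutdown is omitted):

#include <queue>
#include <boost/function.hpp>
#include <boost/thread.hpp>

std::queue<boost::function<void()> > task_queue;
boost::mutex mtx;
boost::condition_variable task_ready;   // signalled when a task is pushed
boost::condition_variable all_done;     // signalled when the last task finishes
unsigned running = 0;

void worker() {                         // run this in each of the 4 pool threads
    boost::unique_lock<boost::mutex> lk(mtx);
    for (;;) {
        while (task_queue.empty())
            task_ready.wait(lk);        // a free thread sleeps here
        boost::function<void()> task = task_queue.front();
        task_queue.pop();
        ++running;
        lk.unlock();
        task();                         // the CPU-intensive work, outside the lock
        lk.lock();
        if (--running == 0 && task_queue.empty())
            all_done.notify_all();      // last worker reports "everything finished"
    }
}

void submit(const boost::function<void()> &task) {  // called from the main thread
    {
        boost::lock_guard<boost::mutex> lk(mtx);
        task_queue.push(task);
    }
    task_ready.notify_one();            // wake one free thread, if any
}

void wait_for_all() {                   // main thread: wait for all jobs to finish
    boost::unique_lock<boost::mutex> lk(mtx);
    while (running != 0 || !task_queue.empty())
        all_done.wait(lk);
}

Start four boost::thread(worker) instances once, submit() tasks from the main thread, and call wait_for_all() when everything has been queued.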
The parallel patterns library (PPL) is really good at that stuff too, it does a lot of stuff for you, but you don't have as much control. It's Windows only, but that seems to be fine with you. ;)
EDIT: Your link seems to be a good solution. Using the WINAPI is often the fastest thing you can do, since other APIs are usually built upon it. The WINAPI does not, however, provide very good abstraction, so I would prefer PPL, futures, etc. for tasks like that. How big are your tasks? If they take more than a few milliseconds, then you shouldn't worry about the API you're using, since that's not the bottleneck.

First way: Asynchronous Procedure Calls.
Another way: I/O Completion Ports, which can be used for your task.
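For the completion-port route, here is a rough sketch of using a port as a kernel-level task queue (the Task struct, sample_job and the 4-thread count are made up for this example; a posted key of 0 is used as the quit signal):

#include <windows.h>
#include <process.h>

struct Task { void (*fn)(void *); void *arg; };    // illustrative task record

void sample_job(void *) { /* CPU-intensive work goes here */ }

HANDLE port;

unsigned __stdcall worker(void *) {
    DWORD bytes;
    ULONG_PTR key;
    OVERLAPPED *ov;
    // each free worker blocks here until a task (or the quit key) is posted
    while (GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE)) {
        if (key == 0) break;                       // quit signal
        Task *t = reinterpret_cast<Task *>(key);
        t->fn(t->arg);                             // run the job
        delete t;
    }
    return 0;
}

int main() {
    // a port not associated with any file handle is simply a thread-safe queue
    port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

    HANDLE threads[4];
    for (int i = 0; i < 4; ++i)
        threads[i] = (HANDLE)_beginthreadex(NULL, 0, worker, NULL, 0, NULL);

    // dispatch a task: the kernel wakes one waiting (i.e. free) worker
    Task *t = new Task;
    t->fn = sample_job;
    t->arg = NULL;
    PostQueuedCompletionStatus(port, 0, reinterpret_cast<ULONG_PTR>(t), NULL);

    // shutdown: one quit message per worker, then wait for all of them
    for (int i = 0; i < 4; ++i)
        PostQueuedCompletionStatus(port, 0, 0, NULL);
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);
    for (int i = 0; i < 4; ++i)
        CloseHandle(threads[i]);
    CloseHandle(port);
    return 0;
}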

I don't know about Visual C++ specific thread pools, but I've heard of the existence of ppl.h. There is also an unofficial Boost threadpool library, which is the one I've used, and just like the rest of Boost it compiles well in Visual Studio.

Try TBB:
#include "tbb/task.h"
#include "tbb/task_scheduler_init.h"

class SimpleTask : public tbb::task {
public:
    tbb::task* execute() {
        // do the work for this task
        return 0;
    }
};

// execute tasks and wait
int main() {
    tbb::task_scheduler_init init(50);      // initialize the scheduler / worker pool
    tbb::task_list list;
    for (int i = 0; i < 30; i++) {          // create 30 tasks
        list.push_back(*new(tbb::task::allocate_root()) SimpleTask());
    }
    tbb::task::spawn_root_and_wait(list);   // execute and wait for all tasks, or call spawn() without waiting
    return 0;
}

Related

How to use the work_stealing scheduler in boost.fibers

I am trying to build a generic task system where I can post tasks that get executed on whatever thread is free. With a previous attempt I often ran out of threads because they would block at some point. So I am trying boost fibers; when one fiber blocks, the thread is free to work on some other fiber, which sounds perfect.
The work-stealing algorithm seems to be ideal for my purpose, but I have a very hard time to use it. In the example code fibers get created and only then the threads and schedulers get created, so all the fibers actually get executed on all the threads. But I want to start fibers later and by then all the other threads are suspended indefinitely because they didn't have any work. I have not found any way to wake them up again, all my fibers get only executed on the main thread. "notify" seems to be the method to call, but I don't see any way to actually get to an instance of an algorithm.
I tried keeping pointers to all instances of the algorithm so I could call notify(), but that doesn't really help; most of the time the algorithms in the worker threads cannot steal anything from the main one because the next one is the dispatcher_context.
I could disable "suspend", but threads are busy-waiting then, not an option.
I also tried the shared_work-algorithm. Same problem, once a thread cannot find a fiber it will never wake up again. I tried the same hack manually calling notify(), same result, very unreliable.
I tried using the channels, but AFAICT, if a fiber is waiting for it, the current context just "hops" over and runs the waiting fiber, suspending the current one.
In short: I find it very hard to reliably run a fiber on another thread. When profiling most threads are just waiting on a condition_variable, even though I did create tons of fibers.
As a small testing case I am trying:
std::vector<boost::fibers::future<int>> v;
for (auto i = 0; i < 16; ++i)
    v.emplace_back(boost::fibers::async([i] {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return i;
    }));
int s = 0;
for (auto &f : v)
    s += f.get();
I am intentionally using this_thread::sleep_for to simulate the CPU being busy.
With 16 threads I would expect this code to run in 1s, but mostly it ends up taking 16s. I was able to get this specific example to actually run in 1s by hacking around, but no approach felt "right" and none worked for other scenarios; it always had to be hand-crafted for one specific case.
I think this example should just work as expected with a work_stealing algorithm; what am I missing? Is it just a misuse of fibers? How could I implement this reliably?
Thanks,
Dix
boost.fiber contains an example using the work_stealing algorithm (examples/work_stealing.cpp).
You have to install the algorithm on each worker-thread that should handle/steal fibers.
boost::fibers::use_scheduling_algorithm< boost::fibers::algo::work_stealing >( 4); // 4 worker-threads
Before you process tasks/fibers, you have to wait until all worker threads have been registered at the algorithm. The example uses a barrier for this purpose.
You need an indication that all work/tasks have been processed, for instance using a condition variable.
Take a look at Running with worker threads (boost documentation).
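For illustration, a minimal sketch of that setup (the thread_barrier class below stands in for the barrier shipped with the boost example; the thread count of 4, the worker lambda and the done flag are assumptions for this sketch):

#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>
#include <boost/fiber/all.hpp>

class thread_barrier {                         // plain thread-level barrier
    std::mutex mtx_;
    std::condition_variable cv_;
    std::size_t count_;
public:
    explicit thread_barrier(std::size_t count) : count_(count) {}
    void wait() {
        std::unique_lock<std::mutex> lk(mtx_);
        if (0 == --count_) cv_.notify_all();
        else cv_.wait(lk, [this] { return 0 == count_; });
    }
};

int main() {
    const std::uint32_t thread_count = 4;
    thread_barrier b(thread_count);
    boost::fibers::mutex mtx;
    boost::fibers::condition_variable cv;
    bool done = false;

    auto worker = [&] {
        // register this thread's scheduler with the work-stealing algorithm
        boost::fibers::use_scheduling_algorithm<boost::fibers::algo::work_stealing>(thread_count);
        b.wait();                              // wait until every thread has registered
        std::unique_lock<boost::fibers::mutex> lk(mtx);
        cv.wait(lk, [&] { return done; });     // keep the thread alive so its scheduler can steal fibers
    };

    std::vector<std::thread> threads;
    for (std::uint32_t i = 1; i < thread_count; ++i)
        threads.emplace_back(worker);

    boost::fibers::use_scheduling_algorithm<boost::fibers::algo::work_stealing>(thread_count);
    b.wait();                                  // the main thread is the fourth participant

    // ... launch fibers here (e.g. boost::fibers::async); they may now be stolen by the workers ...

    {
        std::unique_lock<boost::fibers::mutex> lk(mtx);
        done = true;
    }
    cv.notify_all();
    for (auto &t : threads) t.join();
    return 0;
}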

Notify caller that a thread has finished

I am trying to use the multithreading features in the C++11 standard library and have the following situation envisioned.
I have a parent class which maintains a queue of threads. So something like:
std::queue<MyMTObject *> _my_threads;
The class MyMTObject contains the std::thread object.
The queue has a fixed size of 5 and the class initially starts with the queue being full.
As I have jobs to process I launch threads and I remove them from the queue. What I would like is to get a notification when the job is finished along with the pointer to the MyMTObject, so that I can reinsert them into the queue and make them available again.
I have basically 2 questions:
1: Is this a sound idea? I know I have not given many specifics, but I mean broadly speaking. I will, of course, control all access to the queue with a mutex.
2: Is there a way to implement this notification mechanism without using external libraries like Qt or boost.
For duplicates, I did look on the site but could not find anything that was suitable to manage a collection of threads.
I'm not sure if I need to mention this, but std::thread objects can't be re-used. Generally, the only reason you keep a std::thread reference is to std::thread::join the thread. If you don't plan to join the thread later (e.g. dispatch to threads and wait for completion), it's generally advised to std::thread::detach it.
If you're trying to keep threads for a thread pool, it's probably easier to have each thread block on a std::queue and pull objects from the queue to work on. This is relatively easy to implement using a std::mutex and a std::condition_variable. It generally gives good throughput, but to get finer control over scheduling you can do things like keep a separate std::queue for each thread.
Detaching the threads and creating a work queue also has the added benefit that it avoids redundantly requesting the operating system create new threads which adds overhead and increases overall resource usage.
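A minimal sketch of such a pool (the class and member names are invented here): the workers block on the condition variable, pull std::function jobs off the queue, and are joined exactly once when the pool is destroyed.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto &t : workers_) t.join();   // threads are joined once, never re-used
    }
    void enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();                    // wake one blocked worker
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(mtx_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                           // run the task outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool stop_ = false;
};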
You could try to deploy some version of the Reactor pattern, I think. You could start one additional control thread that cleans up after these workers. Then you create a ThreadSafeQueue that will be used to communicate events from the worker threads to the control thread. This queue should be implemented in such a way that you can select on it and wait for any activity on the other end (some thread terminates and calls queue.push, for example).
All in all I think it's quite an elegant solution. It does add the overhead of an additional thread, but this thread will be mostly sleeping and waking up only once in a while to clean up after the workers.
There is no elegant way to do this in Posix, and the C++ threading model is almost a thin wrapper over Posix.
You can join a specific thread (one at a time), or you can wait on futures - again, one future at a time.
The best you can do to avoid looping is to employ a condition variable, and make all threads signal on it (as well as indicating which one just exited by setting some sort of per-thread flag) just before they are about to exit. The 'reaper' would notice the signal and check the flags.
The issue is that this solution requires thread cooperation. But I know not of any better.
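For illustration, a sketch of that cooperation (the Worker struct and flag names are made up): each thread sets its per-thread flag and signals the condition variable just before returning, and the 'reaper' joins whichever threads are flagged.

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct Worker {
    std::thread thr;
    bool finished = false;
};

int main() {
    std::mutex mtx;
    std::condition_variable cv;
    std::vector<Worker> workers(5);

    for (auto &w : workers) {
        Worker *self = &w;                    // stable pointer for the lambda
        w.thr = std::thread([self, &mtx, &cv] {
            // ... do the actual job ...
            {
                std::lock_guard<std::mutex> lk(mtx);
                self->finished = true;        // per-thread "about to exit" flag
            }
            cv.notify_one();                  // wake the reaper
        });
    }

    // reaper: join threads as they announce completion
    std::size_t reaped = 0;
    std::unique_lock<std::mutex> lk(mtx);
    while (reaped < workers.size()) {
        cv.wait(lk, [&] {
            for (auto &w : workers)
                if (w.finished && w.thr.joinable()) return true;
            return false;
        });
        for (auto &w : workers) {
            if (w.finished && w.thr.joinable()) {
                lk.unlock();
                w.thr.join();                 // reclaim this thread's resources
                lk.lock();
                ++reaped;
            }
        }
    }
    return 0;
}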

How to have a long waiting thread in Intel TBB?

I want to create a thread or task (more than one, to be exact) that does some non-CPU-intensive work that will take a lot of time because of external causes, such as an HTTP request or a file I/O operation on a slow disk. I could do this with async/await in C#, and that would be exactly what I am trying to do here: spawn a thread or task, let it do its own thing while I continue with execution of the program, and simply let it return the result whenever it's ready. The problem I have with TBB is that all the tasks I can create assume they are made for CPU-intensive work.
Is what TBB calls a GUI Thread what I want in this case? I would need more than one; is that possible? Can you point me in the right direction? Should I look for another library that provides threading and is available for multiple OSes?
Any I/O blocking activity is poorly modeled by a task -- since tasks are meant to run to completion, it's just not what tasks are for. You will not find any TBB task-based approach that circumvents this. Since what you want is a thread, and you want it to work more-or-less nicely with other TBB code you already have, just use TBB's native thread class to solve the problem as you would with any other threading API. You won't need to set priority or anything else on this TBB-managed thread, because it'll get to its blocking call and then not take up any further time until the resource is available.
About the only thing I can think of specifically in TBB is that a task can be assigned a priority. But this isn't the same thing as a thread priority. TBB task priorities only dictate when a task will be selected from the ready pool, but like you said - once the task is running, it's expected to be working hard. The way to use this to solve the problem you mentioned is to break your I/O work into segments, then submit them into the work pool as a series of (dependent) low-priority tasks. But I don't think this gets to your real problem ...
The GUI Thread you mentioned is a pattern in the TBB patterns document that says how to offload a task and then wait for a callback to signal that it's complete. It's not altogether different from an async. I don't think this solves your problem either.
I think the best way for you here is to make an OS-level thread. That's pthreads on Linux or Windows threads on Windows. Then you'll want to call this on it: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx ... and if you happen to be in C++11, you could use a std::thread to create the thread and then call thread::native_handle to get a handle to call the Windows API to set the priority.
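A sketch of that last suggestion, assuming Windows and a C++11 compiler (slow_io_work is a placeholder for your HTTP request or file read, and the chosen priority value is just an example):

#include <chrono>
#include <thread>
#include <windows.h>

void slow_io_work() {
    // stand-in for an HTTP request or a read from a slow disk
    std::this_thread::sleep_for(std::chrono::seconds(5));
}

int main() {
    std::thread io_thread(slow_io_work);

    // on MSVC, std::thread::native_handle() yields a HANDLE usable with Win32 calls
    if (!SetThreadPriority(io_thread.native_handle(), THREAD_PRIORITY_BELOW_NORMAL)) {
        // handle the error (GetLastError()) as appropriate
    }

    // ... continue with the CPU-bound TBB work on the other cores ...

    io_thread.join();   // wait for the I/O to finish / collect the result
    return 0;
}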

Is there a way to find out, whether a thread is blocked?

I'm writing a thread pool class in C++ which receives tasks to be executed in parallel. I want all cores to be busy, if possible, but sometimes some threads are idle because they are blocked for a time for synchronization purposes. When this happens I would like to start a new thread, so that there are always approximately as many threads awake as there are cpu cores. For this purpose I need a way to find out whether a certain thread is awake or sleeping (blocked). How can I find this out?
I'd prefer to use the C++11 standard library or boost for portability purposes. But if necessary I would also use WinAPI. I'm using Visual Studio 2012 on Windows 7. But really, I'd like to have a portable way of doing this.
Preferably this thread-pool should be able to master cases like
MyThreadPool pool;
for ( int i = 0; i < 100; ++i )
    pool.addTask( &block_until_this_function_has_been_called_a_hundred_times );
pool.join(); // waits until all tasks have been dispatched.
where the function block_until_this_function_has_been_called_a_hundred_times() blocks until 100 threads have called it. At this time all threads should continue running. One requirement for the thread-pool is that it should not deadlock because of a too low number of threads in the pool.
Add a facility to your thread pool for a thread to say "I'm blocked" and then "I'm no longer blocked". Before every significant blocking action (see below for what I mean by that) signal "I'm blocked", and then "I'm no longer blocked" afterwards.
What constitutes a "significant blocking action"? Certainly not a simple mutex lock: mutexes should only be held for a short period of time, so blocking on a mutex is not a big deal. I mean things like:
Waiting for I/O to complete
Waiting for another pool task to complete
Waiting for data on a shared queue
and other similar events.
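For illustration, one way to package that facility is a RAII guard (the ThreadPool interface below, notify_blocked/notify_unblocked, is hypothetical; what the pool does with the count, e.g. starting an extra worker, is left out):

#include <atomic>

class ThreadPool {
public:
    ThreadPool() : blocked_(0) {}
    // a task calls these around any significant blocking action
    void notify_blocked()   { ++blocked_; /* here the pool could spawn an extra worker */ }
    void notify_unblocked() { --blocked_; }
    int  blocked_count() const { return blocked_.load(); }
private:
    std::atomic<int> blocked_;
};

// RAII guard so "I'm no longer blocked" cannot be forgotten
class BlockedScope {
public:
    explicit BlockedScope(ThreadPool &p) : pool_(p) { pool_.notify_blocked(); }
    ~BlockedScope()                                 { pool_.notify_unblocked(); }
private:
    ThreadPool &pool_;
};

// usage inside a pool task:
// {
//     BlockedScope guard(pool);            // "I'm blocked"
//     other_task_future.wait();            // waiting for another pool task
// }                                        // "I'm no longer blocked"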
Use Boost Asio. It has its own thread pool management and scheduling framework. The basic idea is to push tasks to the io_service object using the post() method, and to call run() from as many threads as you have CPU cores. You should create a work object while the calculation is running to keep the threads from exiting when they don't have enough jobs.
The important thing about Asio is never to use any blocking calls. For I/O, use the asynchronous calls of Asio's own I/O objects. For synchronization, use strand objects instead of mutexes. If the functions you post to the io_service are wrapped in a strand, it is guaranteed that at any time at most one task belonging to that strand is running. If there is a conflict, the task remains in Asio's event queue instead of blocking a worker thread.
There is one drawback of using asynchronous programming though. It is much harder to read a code that is scattered into several asynchronous calls than one with a clear control flow. You should be aware of this when designing your program.
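A minimal sketch of that scheme, using the io_service/work/strand API of the Boost.Asio versions of that era (newer Boost renames io_service to io_context); the lambdas are placeholders for real tasks:

#include <memory>
#include <thread>
#include <vector>
#include <boost/asio.hpp>

int main() {
    boost::asio::io_service io;
    std::unique_ptr<boost::asio::io_service::work> keep_alive(
        new boost::asio::io_service::work(io));   // keeps run() from returning while idle
    boost::asio::io_service::strand strand(io);   // serializes the tasks posted through it

    // one run() call per CPU core
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([&io] { io.run(); });

    // independent task: may run on any thread, at any time
    io.post([] { /* calculation */ });

    // tasks posted through the same strand never run concurrently with each other
    strand.post([] { /* touch shared state, no mutex needed */ });
    strand.post([] { /* runs only after the previous strand task */ });

    keep_alive.reset();                           // let run() return once the queue drains
    for (auto &t : threads) t.join();
    return 0;
}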

Reading information from a worker thread efficiently

I'm writing some computer vision software, here's a brief description to help clarify the problem:
I have 3 cameras, each running at 60fps
Each camera has its own thread, to utilise multiple cores
Each thread waits for a new frame to arrive, does some processing on the image, saves the result and waits for the next frame
My main program creates these threads, using boost, following this tutorial: http://blog.emptycrate.com/node/282
I am currently polling the threads in a tight loop to retrieve the data, e.g.:
while(1) {
    for(i=0; i<numCams; i++) {
        result[i] = cam[i]->getResult();
    }
    //do some stuff
}
This seems silly. Is there a standard way of letting the main program know that there is a new result and that it needs to be retrieved?
Thanks!
Yes, you need to use condition variables (AKA events).
Yes, you need to use synchronization. There are many forms depending on what you're using as a threading API, however the simplest is probably a condition variable.
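For illustration, a sketch of the condition-variable approach (the Result struct and function names are made up; the question's code uses boost::thread, which provides equivalent classes under boost::): each camera thread pushes its result and notifies, and the main thread sleeps until something is ready instead of spinning.

#include <condition_variable>
#include <mutex>
#include <queue>

struct Result { int camera_id; /* ... processed image data ... */ };

std::mutex mtx;
std::condition_variable cv;
std::queue<Result> ready;

// called from a camera thread once a frame has been processed
void publish(const Result &r) {
    {
        std::lock_guard<std::mutex> lk(mtx);
        ready.push(r);
    }
    cv.notify_one();                  // wake the main thread
}

// main thread: blocks until at least one result is available
Result wait_for_result() {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [] { return !ready.empty(); });
    Result r = ready.front();
    ready.pop();
    return r;
}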
What you need is a thread pool. The number of cameras isn't necessarily the same as the optimal number of threads; a thread pool is sized for performance. With it, you don't need to wait on a condition or poll the jobs: you enqueue jobs (most often a std::function<void()>) into the thread pool, and that job object should perform all the required work. Use binders (std::bind) or lambda functions to create a job object.
In your case you are talking to hardware, so you may need to use whatever facilities your camera API provides for asynchronous notification of incoming data. Usually that will be some kind of callback you provide, or occasionally something like a Windows Event handle or a Unix signal.
In general, if you meant "standard" as in "part of the C++ standard", no. You need to use your OS's facilities for interprocess (or inter-thread) condition signalling.
Note that if we were talking Ada (or Modula-2, or many other modern systems programming languages) the answer would have been "yes". I understand there is some talk of putting concurrency support of some kind into a future C++ standard.
In the meantime, there is the boost::thread library for doing this kind of thing. That isn't exactly "standard", but for C++ it is pretty close. I think for what you are trying to do, condition variables might be what you want. However, if you read over the whole facility, other simpler designs may occur to you.
I know this sounds a little odd, however consider using a boost::asio::io_service; it's as close to a thread pool as you currently get. When you've captured an image, you can post to this service and the service can then execute a handler asynchronously to handle your image data.