I'm creating a multi-threaded program that relies heavily on boost::fibers for executing the computation-heavy code.
What I want to achieve:
My main thread knows when to activate and run which fiber, and when to collect the future that the fiber provides after executing.
As my software runs in a highly parallel environment with over 100 cores, I start as many worker threads as possible. The worker threads should run a fiber scheduler like boost::fibers::algo::work_stealing, so that they execute all the fibers my main thread produces.
What's the issue:
As my main thread is quite busy with creating and timing all the fibers for the over 100 worker threads, I want to prevent the main thread from joining in the execution of any fibers.
This means, my main thread should solely care about starting fibers and gathering their futures after they completed executing.
However, I do not know how to exclude my main thread from executing fibers as well.
Possible, naive, solutions:
Even though I don't know how to solve my problem properly, I have thought about a few possibilities.
Create my own fiber scheduler: the scheduler is a customization point of boost::fibers, so it might be possible to write a custom scheduler that excludes the main thread from execution.
Use a boost::fibers::buffered_channel to transmit tasks between threads: however, I don't know if this is a good solution, as it forfeits the benefits of using a fiber scheduler (see the sketch after this list).
Other, better way I don't know yet: I guess there might be another, simple way of excluding the main thread, which creates the fibers, from participating in the execution scheduling of boost::fibers.
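To illustrate option two, here is a minimal sketch of what I have in mind with the buffered_channel; the task type, channel capacity, and all names are just placeholders I made up:

#include <boost/fiber/all.hpp>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

using task_t  = std::function<void()>;
using channel = boost::fibers::buffered_channel<task_t>;

void worker(channel& ch) {
    task_t task;
    // pop() suspends the calling fiber until a task arrives or the
    // channel is closed; the worker never runs main-thread work.
    while (boost::fibers::channel_op_status::success == ch.pop(task))
        task();
}

int main() {
    channel ch{1024};  // capacity must be a power of two
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < std::thread::hardware_concurrency(); ++i)
        workers.emplace_back([&ch] { worker(ch); });

    // The main thread only creates work and collects futures.
    auto pt = std::make_shared<boost::fibers::packaged_task<int()>>(
        [] { return 42; });
    boost::fibers::future<int> f = pt->get_future();
    ch.push([pt] { (*pt)(); });

    int result = f.get();  // blocks the main fiber, not a worker
    ch.close();            // unblocks and terminates all workers
    for (auto& t : workers) t.join();
    return result == 42 ? 0 : 1;
}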
As I'm quite new to the boost::fibers library, I wonder: what is the best way to achieve my goal?
Regards and thank you for your help!
Related
I have a main linux thread (th1) that runs a number of boost fibers that are scheduled using the boost priority scheduler.
Every so often, I would like to launch a fiber from another thread (th2) that will run in th1 and be scheduled along with the other th1 fibers. The code I use to launch fibers in th1 looks like:
void launchFiber()
{
    // Select the custom scheduler for the calling thread, then create
    // and detach a fiber with the requested priority.
    boost::fibers::use_scheduling_algorithm< priority_scheduler >();
    boost::fibers::fiber *fib = new boost::fibers::fiber(fb_fiberFunction);
    priority_props & props( fib->properties< priority_props >() );
    props.set_priority(FiberPriorityValue);
    props.name = "Fiber Name";
    fib->detach();
}
The launch code works fine when I call the launchFiber function from th1 but it does not work when I call it from th2--it looks like the fiber is not added to the th1 fiber queue. I have added a mutex to the th1 priority_scheduler routine to protect the fiber queue but this doesn't seem to help.
It seems to me that I don't really understand how the fiber system works when there is more than one thread involved. I have tried to look at the library source code, but it is not really clear to me.
My guess is that this would be simple if I understood it correctly. Could someone provide an example of how I might do this?
Contrary to system threads, fibers are based on cooperative scheduling. This means that you must explicitly tell the scheduler when another fiber can be scheduled. The scheduler can then choose the best fiber to run at this user-defined scheduling point. Here, the scheduler will choose the one with the highest priority. If there are no fibers with a higher priority, the same fiber resumes its execution. The documentation states:
Each fiber has its own stack.
A fiber can save the current execution state, including all registers and CPU flags, the instruction pointer, and the stack pointer and later restore this state. The idea is to have multiple execution paths running on a single thread using cooperative scheduling (versus threads, which are preemptively scheduled). The running fiber decides explicitly when it should yield to allow another fiber to run (context switching).
Control is cooperatively passed between fibers launched on a given thread. At a given moment, on a given thread, at most one fiber is running. Spawning additional fibers on a given thread does not distribute your program across more hardware cores, though it can make more effective use of the core on which it's running.
this_fiber::yield() is meant to perform the actual yield operation on the current fiber.
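For example, here is a minimal sketch of two fibers cooperatively interleaving on a single thread via this_fiber::yield():

#include <boost/fiber/all.hpp>
#include <iostream>

int main() {
    boost::fibers::fiber f([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "worker " << i << '\n';
            boost::this_fiber::yield();  // hand control back to the scheduler
        }
    });
    // The main fiber also yields, so the two interleave on this one thread.
    for (int i = 0; i < 3; ++i) {
        std::cout << "main " << i << '\n';
        boost::this_fiber::yield();
    }
    f.join();
}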
Note that fibers are not safely compatible with thread-local storage if they are moved between threads (not the case by default), and using basic mutexes/condition variables is not safe either, particularly if a yield can appear in the middle of a protected code section (critical section), as this can cause deadlocks. It can also be sub-optimal, because mutexes can cause the current thread to be pre-empted or to wait passively while another fiber could be doing computations. Boost provides alternative synchronisation mechanisms for fibers that are safer/more efficient, though one still needs to take care. This is why fibers cannot be used to execute arbitrary code blindly.
For more information, you can take a look at the examples, starting from the simplest one.
I did spend some time looking into this problem. It turns out that executing the command: boost::fibers::use_scheduling_algorithm< priority_scheduler >() creates a new priority_scheduler object with its own fiber queue. And this scheduler is associated with a context that is specific to the thread it is running in. So, in my circumstance, when I created a new fiber it ended up in the queue specific to the calling thread (th2, which wasn't running fibers) instead of the thread that was running all my fibers, th1.
So, I abandoned my idea of creating a fiber to run in th1 by a call from th2. I'm now using a queue that holds fiber-launch requests from external threads. The fiber thread (th1) checks this queue when it executes the scheduler's pick_next() function, and if requests exist, fibers are created and added to th1's scheduler queue. It works fine, though I have an intermediate queue which I would prefer not to have (for aesthetic reasons only).
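For illustration, a minimal sketch of that intermediate queue (the names are placeholders; as described above, th1 drains the queue from the scheduler's pick_next()):

#include <boost/fiber/all.hpp>
#include <functional>
#include <mutex>
#include <queue>

// Filled by other threads (th2, ...): only enqueues a request,
// never touches th1's scheduler directly.
std::mutex g_launch_mtx;
std::queue<std::function<void()>> g_launch_requests;

void requestFiber(std::function<void()> fn) {
    std::lock_guard<std::mutex> lk(g_launch_mtx);
    g_launch_requests.push(std::move(fn));
}

// Called on th1 (from the scheduler's pick_next(), as described above),
// so the fibers are created on the thread that actually runs them.
void drainLaunchRequests() {
    std::queue<std::function<void()>> pending;
    {
        std::lock_guard<std::mutex> lk(g_launch_mtx);
        std::swap(pending, g_launch_requests);
    }
    while (!pending.empty()) {
        boost::fibers::fiber(std::move(pending.front())).detach();
        pending.pop();
    }
}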
I'm trying to create a task manager, which accepts tasks and runs each task as a new thread, using C++ and (currently) std::thread in a Linux environment.
The task manager accepts normal tasks and priority tasks.
When a priority task arrives, all normal tasks need to be halted until the priority task is done.
I'm keeping all normal task threads in a std::vector, but I couldn't find a proper function to halt those threads.
Is there a way, preferably not using locks, to implement the wanted behavior?
Maybe with <pthread> or boost threads?
There is no direct way to interrupt a thread from the outside.
Boost interruption points are handy to stop things once for all but that's not equivalent to a pause.
I would suggest implementing your own "interruption" class with a condition variable (and, yes, a mutex) to check and wait efficiently anywhere inside your tasks. But it is up to you to explicitly call these interruption points.
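A minimal sketch of such an "interruption" class (the names are placeholders): normal tasks call checkpoint() at safe points, and the manager calls pause() before a priority task and resume() afterwards.

#include <condition_variable>
#include <mutex>

class PauseToken {
    std::mutex m_;
    std::condition_variable cv_;
    bool paused_ = false;
public:
    void pause() {
        std::lock_guard<std::mutex> lk(m_);
        paused_ = true;
    }
    void resume() {
        { std::lock_guard<std::mutex> lk(m_); paused_ = false; }
        cv_.notify_all();
    }
    // Tasks call this periodically; it blocks while paused_ is set.
    void checkpoint() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !paused_; });
    }
};

Normal tasks would sprinkle token.checkpoint() inside their loops; this is cooperative, so a task that never calls it will never pause.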
Maybe another way would be to make your priority tasks multithreaded, so that you can allocate more threads to fulfill them => the scheduler is more likely to complete them first. But that's not guaranteed, so take that with a grain of salt.
Sorry, I don't know of anything better than this.
I want to create a thread or task (more than one, to be exact) that does some non-CPU-intensive work that takes a long time because of external causes, such as an HTTP request or a file I/O operation on a slow disk. I could do this with async/await in C#, and that is exactly what I'm trying to do here: spawn a thread or task, let it do its own thing while I continue with the execution of the program, and simply let it return the result whenever it's ready. The problem I have with TBB is that all the tasks I can create are assumed to be doing CPU-intensive work.
Is what TBB calls a GUI Thread what I want in this case? I would need more than one; is that possible? Can you point me in the right direction? Should I look for another library that provides threading and is available for multiple OSes?
Any I/O blocking activity is poorly modeled by a task -- since tasks are meant to run to completion, it's just not what tasks are for. You will not find any TBB task-based approach that circumvents this. Since what you want is a thread, and you want it to work more-or-less nicely with other TBB code you already have, just use TBB's native thread class to solve the problem as you would with any other threading API. You won't need to set priority or anything else on this TBB-managed thread, because it'll get to its blocking call and then not take up any further time until the resource is available.
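For illustration, here is a minimal sketch with std::thread and std::promise standing in for TBB's native thread class; slowRead() is a made-up placeholder for the blocking call:

#include <chrono>
#include <future>
#include <string>
#include <thread>

// Placeholder for the blocking HTTP/file call.
std::string slowRead() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return "payload";
}

int main() {
    std::promise<std::string> p;
    std::future<std::string> f = p.get_future();
    // The blocking call lives on its own OS thread, outside the task pool.
    std::thread io([&p] { p.set_value(slowRead()); });

    // ... continue doing CPU-bound (e.g., TBB) work here ...

    std::string data = f.get();  // collect the result whenever it's ready
    io.join();
    return data.empty() ? 1 : 0;
}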
About the only thing I can think of specifically in TBB is that a task can be assigned a priority. But this isn't the same thing as a thread priority. TBB task priorities only dictate when a task will be selected from the ready pool, but like you said - once the task is running, it's expected to be working hard. The way to use this to solve the problem you mentioned would be to break your I/O work into segments, then submit them into the work pool as a series of (dependent) low-priority tasks. But I don't think this gets to your real problem ...
The GUI Thread you mentioned is a pattern in the TBB patterns document that says how to offload a task and then wait for a callback to signal that it's complete. It's not altogether different from an async. I don't think this solves your problem either.
I think the best way for you here is to make an OS-level thread. That's pthreads on Linux or Windows threads on Windows. Then you'll want to call this on it: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx ... if you happen to be in C++11, you could use a std::thread to create the thread and then call thread::native_handle to get a handle with which to call the Windows API to set the priority.
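A minimal sketch of that last suggestion, assuming a Windows implementation of std::thread where native_handle() yields the HANDLE that SetThreadPriority expects:

#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

int main() {
    std::thread t([] { /* blocking I/O work */ });
#ifdef _WIN32
    // On MSVC, native_handle() returns the underlying HANDLE.
    SetThreadPriority(t.native_handle(), THREAD_PRIORITY_BELOW_NORMAL);
#endif
    t.join();
}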
I need a threadpool for my application, and I'd like to rely on standard (C++11 or boost) stuff as much as possible. I realize there is an unofficial(!) boost thread pool class, which basically solves what I need, however I'd rather avoid it because it is not in the boost library itself -- why is it still not in the core library after so many years?
In some posts on this page and elsewhere, people suggested using boost::asio to achieve a threadpool like behavior. At first sight, that looked like what I wanted to do, however I found out that all implementations I have seen have no means to join on the currently active tasks, which makes it useless for my application. To perform a join, they send stop signal to all the threads and subsequently join them. However, that completely nullifies the advantage of threadpools in my use case, because that makes new tasks require the creation of a new thread.
What I want to do is:
ThreadPool pool(4);
for (...)
{
    for (int i = 0; i < something; i++)
        pool.pushTask(...);
    pool.join();
    // do something with the results
}
Can anyone suggest a solution (except for using the existing unofficial thread pool on sourceforge)? Is there anything in C++11 or core boost that can help me here?
I think you might have misunderstood the asio example:
IIRC (and it's been a while) each thread running in the thread pool has called io_service::run, which means that effectively each thread has an event loop and a scheduler. To then get asio to complete tasks, you post tasks to the io_service using the io_service::post method, and asio's scheduling mechanism takes care of the rest. As long as you don't call io_service::stop, the thread pool will continue running using as many threads as you started running (assuming that each thread has work to do or has been assigned an io_service::work object).
So you don't need to create new threads for new tasks, that would go against the concept of a threadpool.
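For illustration, a minimal sketch of such a pool built on the io_service API discussed here (the thread count and task bodies are placeholders):

#include <boost/asio.hpp>
#include <memory>
#include <thread>
#include <vector>

int main() {
    boost::asio::io_service io;
    // The work object keeps run() from returning while the queue is empty.
    std::unique_ptr<boost::asio::io_service::work> work(
        new boost::asio::io_service::work(io));

    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&io] { io.run(); });

    io.post([] { /* task body */ });

    work.reset();  // let run() return once the queued work drains
    for (auto& t : pool) t.join();
}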
Have each task class derive from a Task that has an 'OnCompletion(task)' method/event. The threadpool threads can then call that after calling the main run() method of the task.
Waiting for a single task to complete is then easy. The OnCompletion() can perform whatever is required to signal the originating thread: signaling a condvar, queueing the task to a producer-consumer queue, calling SendMessage/PostMessage APIs, Invoke/BeginInvoke, whatever.
If an originating thread needs to wait for several tasks to all complete, you could extend the above and issue a single 'wait task' to the pool. The wait task has its own OnCompletion to communicate the completion of the other tasks and has a thread-safe 'task counter' (atomic ops or a lock) set to the number of 'main' tasks to be issued. The wait task is issued to the pool first, and the thread that runs it waits on a private 'allDone' condvar in the wait task. The 'main' tasks are then issued to the pool with their OnCompletion set to call a method of the wait task that decrements the task counter towards zero. When the task counter reaches zero, the thread that achieves this signals the allDone condvar. The wait task's OnCompletion then runs and so signals the completion of all the main tasks.
Such a mechanism does not require the continual create/terminate/join/delete of threadpool threads, places no restriction on how the originating task needs to be signaled, and lets you issue as many such task groups as you wish. You should note, however, that each wait task blocks one threadpool thread, so make sure you create a few extra threads in the pool (not usually any problem).
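A minimal sketch of the wait-task mechanism described above (names are placeholders and error handling is omitted):

#include <atomic>
#include <condition_variable>
#include <mutex>

// Issued to the pool before the N 'main' tasks.
struct WaitTask {
    std::mutex m;
    std::condition_variable allDone;
    std::atomic<int> remaining;

    explicit WaitTask(int n) : remaining(n) {}

    // Hooked into each main task's OnCompletion.
    void taskFinished() {
        if (remaining.fetch_sub(1) == 1) {  // last task just completed
            std::lock_guard<std::mutex> lk(m);
            allDone.notify_one();
        }
    }

    // run() of the wait task: blocks one pool thread until all main
    // tasks have called taskFinished(); its own OnCompletion then
    // signals the originating thread that the whole group is done.
    void run() {
        std::unique_lock<std::mutex> lk(m);
        allDone.wait(lk, [this] { return remaining.load() == 0; });
    }
};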
This seems like a job for boost::futures. The example in the docs seems to demonstrate exactly what you're looking to do.
Joining a thread means waiting for it to finish, and once it has finished, assigning it a new task requires creating a new thread. So in your case you should instead wait on a condition (for example a boost::condition_variable) that indicates the end of the tasks. Using this technique, it is very easy to implement with boost::asio and a boost::condition_variable: each thread calls boost::asio::io_service::run, tasks are scheduled and executed on different threads, and at the end each task sets a boost::condition_variable or even decrements a std::atomic to indicate the end of the job. That's really easy, isn't it?
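For illustration, a minimal sketch of that counter-plus-condition-variable join, assuming only one batch of tasks is in flight at a time (names are placeholders):

#include <boost/asio.hpp>
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>

std::atomic<int> pending{0};
std::mutex m;
std::condition_variable done;

// Wrap every posted task so it decrements the counter when finished.
void postTask(boost::asio::io_service& io, std::function<void()> fn) {
    ++pending;
    io.post([fn] {
        fn();
        if (--pending == 0) {
            std::lock_guard<std::mutex> lk(m);
            done.notify_all();
        }
    });
}

// The originating thread blocks here until the whole batch has run.
void waitAll() {
    std::unique_lock<std::mutex> lk(m);
    done.wait(lk, [] { return pending.load() == 0; });
}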
I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Managing threads should include at least the functionality below.
1. Decide how many worker threads are required. We may need to provide a user-defined function to calculate the number of threads.
2. Create the required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor the healthiness of each worker thread.
Work allocation should include the functionality below.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to an available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
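For example (a small sketch; note that hardware_concurrency() may return 0 when the count cannot be determined):

#include <algorithm>
#include <thread>

// Clamp the reported core count to at least one worker.
unsigned workerCount() {
    return std::max(1u, std::thread::hardware_concurrency());
}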
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
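A minimal sketch of such a shared job queue, where idle workers block on a condition variable instead of being destroyed (names are placeholders):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class JobQueue {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool closed_ = false;
public:
    void push(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Worker loop: each pool thread runs this; it blocks when idle and
    // returns once the queue is closed and drained.
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return closed_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // closed and drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }
};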
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.