I am trying to schedule tasks in a multithreaded system. My idea is to have a local queue per thread, with each thread fetching jobs from its own queue. When a thread reaches some load threshold, it should stop fetching jobs and instead transfer them to a thread that is below the threshold.
My question is how to set the threshold for the threads.
An alternative arrangement is to give threads that have finished their own queue the ability to take work from the queues of others. This is better known as "work stealing" and is a well-known scheduling algorithm, e.g.:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8905
What threading library are you using?
I use two OSS libraries in all of my threading projects: TBB and Cilk Plus. One feature these higher-level runtimes provide is that they automatically schedule tasks onto threads in a way that makes efficient use of processor resources. The runtimes are also very effective at load balancing large numbers of tasks.
www.threadingbuildblocks.org
www.cilkplus.org
I implemented a task-delegation scheduler instead of a task-stealing scheduler. The basic idea is that each thread has its own private local queue. Whenever a task is produced, before it is enqueued, the sizes of all the queues are compared to find the shortest one, and the task is enqueued there. This diverts work away from a busy thread's queue and delegates jobs to the least busy thread's queue.
The problem with this scheduling technique is that we don't know how long each task takes to complete. A queue may have a low task count while its current task is still running for a long time; conversely, a queue may have a higher count but its tasks may all complete very soon. Any ideas on how to solve this problem?
I am working on Linux, in C++, in our own multithreading library that implements a multi-rate synchronous data flow paradigm.
It seems that your scheduling policy doesn't fit the job at hand. Usually this type of naive scheduling, which ignores task completion times, is only appropriate when tasks have roughly equal execution times.
I'd recommend doing some research. A good place to start would be Wikipedia's Scheduling article but that is of course just the tip of the iceberg.
I'd also give a second (and third) thought to the task-delegation requirement since timeslicing task operations allows you to fine grain queue management by considering the task's "history". However, if clients are designed so that each client consistently sends the same "type" of task, then you can achieve similar results with this knowledge.
As far as I remember from my Queueing Theory class, the fairest (of them all ;) system is the one that has a single queue and multiple servers. Using such a system ensures the lowest expected average execution time for all tasks and the largest utilization factor (the percentage of time the servers are busy; I'm not sure the term is correct).
In other words, unless you have some priority tasks, please reconsider your task delegation scheduler implementation.
I am trying to implement a new scheduling technique with multiple threads. Each thread has its own private local queue. The idea is that each time a task is created by the program thread, it should find the shortest queue (the one with the fewest tasks) among the queues and enqueue the task there.
This is a way of load balancing among threads, where less busy queues receive more tasks.
Can you please suggest some logic or ideas for finding the shortest queue among the given queues dynamically, from a programming point of view?
I am working with Visual Studio 2008, in C++, in our own multithreading library that implements a multi-rate synchronous data flow paradigm.
As you can see, trying to find the least loaded queue is cumbersome and can be inefficient: you may add more work to a queue that holds only one heavy task, whereas queues holding small tasks will run out of jobs and quickly become inactive.
You'd be better off using a work-stealing heuristic: when a thread is done with its own jobs, it looks at the other threads' queues and "steals" some work instead of remaining idle or being terminated.
Then the system will be auto-balanced with each thread being active until there is not enough work for everyone.
You should not have a situation with idle threads and work waiting for processing.
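To make the work-stealing idea concrete, here is a minimal sketch, assuming C++17 and a mutex-protected deque per worker (real runtimes such as TBB use more sophisticated lock-free deques). The names `WorkQueue` and `next_task` are made up for illustration. The owner takes from the back of its own queue, and an idle worker steals from the front of someone else's.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// One queue per worker thread. A mutex keeps the sketch simple.
class WorkQueue {
public:
    void push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(t));
    }
    // The owning thread takes the newest task (back of the deque).
    std::optional<Task> pop() {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }
    // A thief takes the oldest task (front of the deque).
    std::optional<Task> steal() {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
private:
    std::mutex m_;
    std::deque<Task> tasks_;
};

// Worker `self` tries its own queue first, then visits the other queues
// in round-robin order and steals the first task it finds.
std::optional<Task> next_task(std::vector<WorkQueue>& queues, std::size_t self) {
    if (auto t = queues[self].pop()) return t;
    for (std::size_t i = 1; i < queues.size(); ++i) {
        std::size_t victim = (self + i) % queues.size();
        if (auto t = queues[victim].steal()) return t;
    }
    return std::nullopt;  // no work anywhere
}
```

Each worker thread would loop on `next_task(queues, my_index)` and run whatever it gets; how long to back off when nothing is available is a separate tuning decision.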
If you really want to try this, can each queue not just keep a public 'int count' member, updated with atomic inc/dec as tasks are pushed/popped?
Whether such a design is worth the management overhead and the occasional 'mistakes' when a task is queued to a thread that happens to be running a particularly lengthy job when another thread is just about to dequeue a very short job, is another issue.
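For what it's worth, the atomic counter idea could look roughly like this. It is only a sketch: `QueueLoad` and `least_loaded` are hypothetical names, the actual queue storage is left out, and the result is just a snapshot that may be stale by the time the task is enqueued.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Load counter that would sit next to each worker's queue.
struct QueueLoad {
    std::atomic<int> count{0};
    void on_push() { count.fetch_add(1, std::memory_order_relaxed); }
    void on_pop()  { count.fetch_sub(1, std::memory_order_relaxed); }
};

// Returns the index of the queue with the smallest counter.
// Precondition: loads is not empty. Ties go to the lowest index.
std::size_t least_loaded(const std::vector<QueueLoad>& loads) {
    std::size_t best = 0;
    int best_count = loads[0].count.load(std::memory_order_relaxed);
    for (std::size_t i = 1; i < loads.size(); ++i) {
        int c = loads[i].count.load(std::memory_order_relaxed);
        if (c < best_count) {
            best = i;
            best_count = c;
        }
    }
    return best;
}
```

The scan is linear in the number of queues, which is usually fine for a handful of worker threads. It says nothing about how long the queued tasks will take, which is exactly the weakness discussed above.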
Why aren't the threads fetching their work from a 'master' work queue ?
If you are really trying to distribute work items from a master source to a set of workers, you are doing load balancing, as you say. In that case, unless you simply do round-robin balancing, you really are talking about scheduling. Scheduling is a very deep subject in computing; you can easily spend weeks or months learning about it.
You could synchronise a counter among the threads. But I guess this isn't what you want.
Since you want to implement everything using dataflow, everything should be queues.
Your first option is to query the number of jobs inside a queue. I think this is not easy if you want a single reader/writer pattern, because you probably have to use a lock for this operation, which is not what you want. Note: I'm just guessing that you can't use lock-free queues here; either you have a counter or you take the difference of two pointers, and either way you have a lock.
Your second option (which can be done with lock-free code) is to send a command back to the dispatcher thread, telling it that worker thread x has consumed a job. With this approach you have n more queues, one from each worker thread to the dispatcher thread.
I have a (soft) realtime system which queries some sensor data, does some processing and then waits for the next set of sensor data. The sensor data are read in a receiver thread and put into a queue, so the main thread is "sleeping" (by means of a mutex) until the new data has arrived.
There are other tasks like logging or some long-term calculations in the background to do. These are implemented to run in other threads.
However, it is important that while the main thread processes the sensor data, it has the highest priority, which means that the other threads should not consume any CPU resources at all if possible (currently the background threads slow the main thread down to an unacceptable degree).
According to "Setting thread priority in Linux with Boost", there is doubt that setting thread priorities will do the job. How can I measure what effect setting thread priorities really has? (Platform: Angstrom Linux, ARM PC)
Is there a way to "pause" and "continue" threads completely?
Is there a pattern in C++ to maybe realize the pause/continue on my own? (I might be able to split the background work into small chunks and I could check after every chunk of work if I am allowed to continue, but the question is how big these chunks should be etc.)
Thanks for your thoughts!
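The chunked pause/continue idea from the question could be sketched as follows. This is only an illustration, with a hypothetical `PauseGate` class built on a mutex and a condition variable. It does not answer the chunk-size question and is no substitute for proper scheduler priorities, since a background thread only stops when it reaches its next check.

```cpp
#include <condition_variable>
#include <mutex>

// Cooperative pause/continue: background threads call wait_if_paused()
// between chunks of work, and the main thread toggles the gate.
class PauseGate {
public:
    void pause() {
        std::lock_guard<std::mutex> lock(m_);
        paused_ = true;
    }
    void resume() {
        {
            std::lock_guard<std::mutex> lock(m_);
            paused_ = false;
        }
        cv_.notify_all();  // wake every thread blocked at the gate
    }
    // Returns immediately when not paused; otherwise blocks until resume().
    void wait_if_paused() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !paused_; });
    }
    bool is_paused() {
        std::lock_guard<std::mutex> lock(m_);
        return paused_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool paused_ = false;
};
```

The main thread would call `pause()` when sensor data arrives and `resume()` once processing is done. The worst-case delay before the background work actually stops is the duration of one chunk, so that is the quantity to measure when choosing the chunk size.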
Your problem is with OS scheduler, not the C++. You need to have a real real-time scheduler that will block lower priority threads while the higher priority thread is running.
Most "standard" PC schedulers are not real-time. There's an RT scheduler for Linux - use it. Start with reading about SCHED_RR and SCHED_FIFO, and the nice command.
In many systems, you'll have to spawn a task (using fork) to ensure that the nice levels and the RT scheduler are actually effective. You will have to read through the manuals of your system to figure out which scheduling modules you have and how they are implemented.
There is no portable way to set the priority in boost::thread. The reason is that different OSs have different APIs for setting the priority (e.g. Windows and Linux).
The best way to set the priority in a portable way is to write a wrapper to boost::thread with a uniform API that internally gets the thread native_handle, and then uses the OS specific API (for example, in Linux you can use sched_setscheduler()).
You can see an example here:
https://sourceforge.net/projects/threadutility/
(code made by a student of mine, look at the svn repository)
I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and allocate work among worker threads.
Thread management should include at least the following functionality.
1. Decide how many worker threads are required. We may need to provide a user-defined function to calculate the number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include the following functionality.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
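For example, a user-defined function for picking the thread count might look like this. It is a sketch only: `choose_worker_count` is a hypothetical helper, and capping the count at the number of pending tasks is just one possible policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Picks a worker count: no more threads than hardware threads, no more
// than there are pending tasks, and always at least one.
std::size_t choose_worker_count(
    std::size_t pending_tasks,
    unsigned hw = std::thread::hardware_concurrency()) {
    // hardware_concurrency() may return 0 when the value is not computable.
    std::size_t hw_threads = (hw == 0) ? 1 : hw;
    return std::max<std::size_t>(1, std::min(hw_threads, pending_tasks));
}
```

The second parameter exists only so the policy can be exercised with a fixed value; normal callers would pass just the task count.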
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.
I need to develop a module which will execute scheduled tasks.
Each task is scheduled to be executed within X milliseconds.
The module takes the number of worker threads used to execute the tasks as a parameter.
The tasks are piled up in a queue which will probably be a priority queue, so a thread checks for the next-in-queue task (the one with the lowest "redemption" time), thus there's no need to iterate through all tasks each time.
Is there any public library that does that or shall I roll my own?
Note: I'm using VC2008 on Windows.
If you don't mind a Boost dependency, threadpool might fit your needs.
Take a look at TBB - Intel Threading Building Blocks.
Just to add a little information to your question, what you're asking for is a real-time scheduler that uses the Earliest Deadline First algorithm. Also note that without OS support, you can't guarantee that your program will work in that X millisecond deadline you assign it. The OS could always decide to swap your task off its CPU in the middle of the job, making it take an unpredictably-long time to complete.
If your application critically depends on the task being done within the X milliseconds you set for it (or something blows up), you'll need to be running a real-time operating system, not regular Windows.
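If you do roll your own, the earliest-deadline ordering itself is easy to get from std::priority_queue. Here is a sketch with made-up names (`ScheduledTask`, `DeadlineQueue`, `schedule_in`); it covers only the ordering, not the worker threads, the locking, or the waiting until a deadline is reached.

```cpp
#include <chrono>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

struct ScheduledTask {
    Clock::time_point deadline;
    std::string name;  // stand-in for the real work item
};

// Orders the heap so that top() is the task with the earliest deadline.
struct LaterDeadline {
    bool operator()(const ScheduledTask& a, const ScheduledTask& b) const {
        return a.deadline > b.deadline;
    }
};

using DeadlineQueue =
    std::priority_queue<ScheduledTask, std::vector<ScheduledTask>, LaterDeadline>;

// Schedules a task to become due `delay` after `now`.
void schedule_in(DeadlineQueue& q, Clock::time_point now,
                 std::chrono::milliseconds delay, std::string name) {
    q.push(ScheduledTask{now + delay, std::move(name)});
}
```

A worker would look at `top()`, sleep until that deadline (or until a task with an earlier deadline is pushed), then pop and run the task. As noted above, none of this guarantees the deadline is met on a non-real-time OS.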