How to add tasks to a Tango (D) ThreadPool asynchronously? - concurrency

I am comparing a task queue/thread pool pattern system to an n-threads system in D. I'm really new to the D programming language but have worked with threads in C, Java, and Python before. I'm using the Tango library, and I'm building a webserver as an example.
I decided to use tango.core.ThreadPool as my thread pool, as my project is focused on ease of use and performance between traditional threading and task queues.
The documentation shows that I have 3 options:
ThreadPool.wait() - Blocks the current thread while the pool consumes tasks from the queue.
ThreadPool.shutdown() - Finishes the tasks in the pool but not the ones in the queue.
ThreadPool.finish() - Finishes all tasks in the pool and queue, but then accepts no more.
None of these things are what I want. It is my understanding that your list of tasks should be able to grow in these systems. The web server is very simple and naïve; I just want it to try its best at scaling to many concurrent requests, even if its resource management only consists of consuming things in the task queue as quickly as possible.
I suspect that it's because the main thread needs to join the other threads, but I'm a bit rusty on my threading knowledge.

What about void append(JobD job, Args args)? According to the docs, it works like Executor.execute(Runnable) from Java: it submits a task to be run at some time in the future.
Note that the queue here is LIFO rather than the expected FIFO, so allocate enough workers.

I discovered that the way I was constructing my delegate contributed to blocking in some part of the code. Instead of closing over the object returned by SocketServer.accept, I now pass that object as a parameter to my delegate. I don't know why this was the solution, but the program now works as expected. I heard that closures in D version 1 are broken; maybe this has something to do with it.
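The difference between closing over a variable and passing it as a parameter can be illustrated with a rough C++ analogue. This is not D and does not reproduce whatever D1 actually does with closures; it only shows how tasks that share a captured variable can observe a later value than the one they were created with. Connection and both function names are hypothetical stand-ins.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the object returned by an accept() call.
struct Connection { int id; };

// Builds three deferred tasks that capture the current connection by
// reference. All tasks share one variable, so each sees its final value.
std::vector<int> run_capturing_by_reference() {
    std::vector<std::function<int()>> tasks;
    Connection current{0};
    for (int i = 1; i <= 3; ++i) {
        current = Connection{i};                  // "accept" the next connection
        tasks.push_back([&current] { return current.id; });
    }
    std::vector<int> seen;
    for (auto& t : tasks) seen.push_back(t());    // tasks run after the loop
    return seen;                                  // {3, 3, 3}
}

// Same loop, but the connection is handed to each task as an argument,
// so every task owns a copy taken at the moment it was created.
std::vector<int> run_passing_by_value() {
    std::vector<std::function<int()>> tasks;
    Connection current{0};
    for (int i = 1; i <= 3; ++i) {
        current = Connection{i};
        auto handler = [](Connection c) { return c.id; };
        tasks.push_back(std::bind(handler, current));  // copies current now
    }
    std::vector<int> seen;
    for (auto& t : tasks) seen.push_back(t());
    return seen;                                  // {1, 2, 3}
}
```

If the delegate in the web server was sharing one accepted-socket variable across iterations in a similar way, passing the socket as an argument would give each job its own copy, which may be why the change helped.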

Related

How to have a long waiting thread in Intel TBB?

I want to create a thread or task (more than one, to be exact) that does some non-CPU-intensive work that will take a long time because of external causes, such as an HTTP request or a file I/O operation on a slow disk. I could do this with async/await in C#, and that is exactly what I am trying to do here: spawn a thread or task and let it do its own thing while I continue executing the program, and simply let it return the result whenever it is ready. The problem I have with TBB is that every task I can create is assumed to be doing CPU-intensive work.
Is what TBB calls the GUI Thread what I want in this case? I would need more than one; is that possible? Can you point me in the right direction? Should I look for another library that provides threading and is available on multiple operating systems?
Any I/O blocking activity is poorly modeled by a task -- since tasks are meant to run to completion, it's just not what tasks are for. You will not find any TBB task-based approach that circumvents this. Since what you want is a thread, and you want it to work more-or-less nicely with other TBB code you already have, just use TBB's native thread class to solve the problem as you would with any other threading API. You won't need to set priority or anything else on this TBB-managed thread, because it'll get to its blocking call and then not take up any further time until the resource is available.
About the only thing I can think of specifically in TBB is that a task can be assigned a priority. But this isn't the same thing as a thread priority. TBB task priorities only dictate when a task will be selected from the ready pool, but like you said - once the task is running, it's expected to be working hard. The way to use this to solve the problem you mentioned is to break your I/O work into segments, then submit them into the work pool as a series of (dependent) low-priority tasks. But I don't think this gets to your real problem ...
The GUI Thread you mentioned is a pattern in the TBB patterns document that says how to offload a task and then wait for a callback to signal that it's complete. It's not altogether different from an async. I don't think this solves your problem either.
I think the best way for you here is to make an OS-level thread. That's pthreads on Linux or Windows threads on Windows. Then you'll want to call this on it: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx ... If you happen to be using C++11, you could use a std::thread to create the thread and then call thread::native_handle to get a handle to pass to the Windows API to set the priority.
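For the "spawn it and collect the result whenever it is ready" part, here is a minimal sketch using only the C++11 standard library. It uses std::async with std::launch::async instead of any TBB facility, and slow_fetch is a hypothetical stand-in that just sleeps where a real HTTP request or disk read would block.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a slow HTTP request or a read from a slow disk.
std::string slow_fetch(const std::string& resource) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return "contents of " + resource;
}

// Starts the blocking call on its own thread and returns immediately.
// The caller keeps working and calls get() on the future when it needs
// the result; get() blocks only if the result is not ready yet.
std::future<std::string> fetch_async(std::string resource) {
    return std::async(std::launch::async, slow_fetch, std::move(resource));
}
```

Several of these can be in flight at once by keeping the returned futures in a container. Each call may start its own thread, so for a very large number of outstanding requests a dedicated I/O thread or an asynchronous I/O library is probably a better fit.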

Find minimum queue size among threads

I am trying to implement a new scheduling technique with multiple threads. Each thread has its own private local queue. The idea is that each time a task is created from the program thread, it should find the queue with the fewest tasks and enqueue the task there.
This is a form of load balancing among threads, in which less busy queues receive more tasks.
Can you suggest some logic or ideas for finding the smallest of the given queues dynamically, from a programming point of view?
I am working in Visual Studio 2008, in C++, on our own multithreading library, which implements a multi-rate synchronous data flow paradigm.
As you can see, trying to find the least loaded queue is cumbersome and can be inefficient: you may add more work to a queue that holds only one heavy task, whereas queues with small tasks will have no more jobs and quickly become inactive.
You'd better use a work-stealing heuristic: when a thread is done with its own jobs, it looks at the other threads' queues and "steals" some work instead of remaining idle or being terminated.
Then the system will be auto-balanced with each thread being active until there is not enough work for everyone.
You should not have a situation with idle threads and work waiting for processing.
If you really want to try this, can each queue not just keep a public 'int count' member, updated with atomic inc/dec as tasks are pushed/popped?
Whether such a design is worth the management overhead and the occasional 'mistakes' when a task is queued to a thread that happens to be running a particularly lengthy job when another thread is just about to dequeue a very short job, is another issue.
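The atomic counter idea could look roughly like this. It is only a sketch: CountedQueue, least_loaded and dispatch are hypothetical names, plain ints stand in for real work items, and the queue itself is still protected by a mutex. Only the size is read without locking.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// A per-thread queue that publishes its length through an atomic counter,
// so the dispatcher can compare sizes without taking each queue's lock.
struct CountedQueue {
    std::atomic<int> count{0};
    std::mutex m;
    std::deque<int> tasks;   // task ids stand in for real work items

    void push(int task) {
        std::lock_guard<std::mutex> lock(m);
        tasks.push_back(task);
        count.fetch_add(1, std::memory_order_relaxed);
    }
    bool pop(int& task) {
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        task = tasks.front();
        tasks.pop_front();
        count.fetch_sub(1, std::memory_order_relaxed);
        return true;
    }
};

// Returns the index of the queue with the smallest published count
// (the first one on ties). The result is only a snapshot: counts can
// change while the scan is in progress.
std::size_t least_loaded(const std::vector<CountedQueue>& queues) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < queues.size(); ++i) {
        if (queues[i].count.load(std::memory_order_relaxed) <
            queues[best].count.load(std::memory_order_relaxed)) {
            best = i;
        }
    }
    return best;
}

// Enqueues the task on whichever queue currently looks shortest.
void dispatch(std::vector<CountedQueue>& queues, int task) {
    queues[least_loaded(queues)].push(task);
}
```

This measures queue length, not the cost of the tasks in it, so it is subject to exactly the "mistakes" described above.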
Why aren't the threads fetching their work from a 'master' work queue ?
If you are really trying to distribute work items from a master source to a set of workers, you are then doing load balancing, as you say. In that case, you really are talking about scheduling, unless you simply do round-robin style balancing. Scheduling is a very deep subject in computing; you can easily spend weeks or months learning about it.
You could synchronise a counter among the threads. But I guess this isn't what you want.
Since you want to implement everything using dataflow, everything should be queues.
Your first option is to query the number of jobs inside a queue. I think this is not easy if you want a single reader/writer pattern, because you probably have to use a lock for this operation, which is not what you want. Note: I'm just guessing that you can't use lock-free queues here; either you have a counter or you take the difference of two pointers, and either way you have a lock.
Your second option (which can be done with lock-free code) is to send a command back to the dispatcher thread, telling it that worker thread x has consumed a job. Using this approach you have n more queues, one from each worker thread to the dispatcher thread.

Managing agent thread pools in Clojure

Is there a way to control the thread pools that handle the functions sent to agents? As I understand things, if I send-off, under the hood I'm using an unbounded thread pool. I would like to, say, run some functions on one thread pool and other functions on another. The reason is that I have some functions which do IO and which are also less important. I'd throw these on a bounded thread pool and wouldn't worry if there was excessive blocking and they stacked up since they're, well, less important. The main thing is that I wouldn't want their crappy IO blocking to affect some more important functions which are running on another thread pool.
I'm basing the question on something similar I did with thread pools in Akka, and I'm just wondering if I can accomplish the same thing with Clojure.
For Clojure versions up to 1.4:
You cannot replace the built-in agent send and send-off thread pools. They are hard-coded in Agent.java.
The send pool (used for computation) is fixed size = 2 + Runtime.getRuntime().availableProcessors().
The send-off pool (also used for futures) is a cached thread pool and will grow without bound. This allows an arbitrary number of background tasks to wait for I/O. The cached threads will be reused and discarded if they've been idle for one minute.
If you want to manage work on your own thread pool, you'll need to dip into java.util.concurrent or use a Clojure wrapper lib.
For Clojure 1.5 (upcoming):
You can supply your own ExecutorService using (send-via executor a f); the default thread pools are no longer hard-wired. See Agent.java in Clojure 1.5+ for details.
The Clojure library Claypoole is designed for exactly this. It lets you define threadpools and use (and reuse) them for futures, pmaps, and so on.
Amit Rathore (of Runa inc), has published a library (called medusa) for managing thread pools. It looks like a fairly close match for what you are looking for.
http://s-expressions.com/2010/06/08/medusa-0-1-a-supervised-thread-pool-for-clojure-futures-2/

boost::asio starting different services in threads?

Seems like all the examples always show running the same io_service in all threads.
Can you start multiple io_services? Here is what I would like to do:
Start io_service A in the main thread for handling user input...
Start another io_service B in another thread that then can start a bunch of worker threads all sharing io_service B.
Users on io_service A can "post" work on io_service B so that it gets done on the worker pool but no work is to be done on io_service A, i.e. the main thread.
Is this possible? Does this make sense?
Thanks
In my experience, it really depends on the application if an io_service per cpu or one per process is better performing. There was a discussion on the asio-users mailing list a few years ago on this very topic.
The Boost.Asio documentation has some great examples showing these two techniques in the HTTP Server 2 and HTTP Server 3 examples. But keep in mind the second HTTP server just shows how to use this technique, not when or why to use it. Those questions will need to be answered by profiling your application.
In general, you should use the following order when creating applications using Boost.Asio:
Single threaded
Thread pool with a single io_service
Multiple io_service objects with some sort of CPU affinity
Good question!
Yes, it is possible for one. In an application I'm currently working on I have broken up the application into separate components responsible for different aspects of the system. Each component runs in its own thread, has its own set of timers, does its own network I/O using asio. From a testability/design perspective, it seems more clean to me, since no component can interfere with another, but I stand to be corrected. I suppose I could rewrite everything passing in the io service as a parameter, but currently haven't found the need to do so.
So coming back to your question, you can do whatever you want, IMO it's more a case of try it out and change it if you run into any issues.
Also, you might want to take a look at what Sam Miller pointed out in a different post WRT handling user input ( that is if you're using a console): https://stackoverflow.com/questions/5210796/boost-asio-how-to-write-console-server

Possible frameworks/ideas for thread management and work allocation in C++

I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can each handle a partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Thread management should include at least the following functionality.
1. Decide how many worker threads are required. We may need to provide a user-defined function to calculate the number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include the following functionality.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
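As a small illustration of combining the processor count with information about the work, here is one possible sizing rule. pick_worker_count is a hypothetical helper, and both the rule and the fallback value of 2 are arbitrary assumptions, not a recommendation. The one firm point is that std::thread::hardware_concurrency() is only a hint and may return 0 when the value cannot be determined.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Picks a worker count from the hardware hint and the amount of pending
// work. The hint is a parameter so the rule can be exercised with known
// values; by default it comes from the standard library.
unsigned pick_worker_count(
    std::size_t pending_tasks,
    unsigned hardware_hint = std::thread::hardware_concurrency()) {
    // A hint of 0 means "unknown"; fall back to an arbitrary small number.
    const unsigned cores = hardware_hint == 0 ? 2u : hardware_hint;
    if (pending_tasks == 0) return 1u;   // keep one worker ready
    // Never start more workers than there are tasks to run.
    return static_cast<unsigned>(std::min<std::size_t>(cores, pending_tasks));
}
```

A user-defined function for calculating the number of threads, as asked for in the question, would slot in where this rule is.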
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.