I have lightly CPU-intensive functions that I want to run in parallel. Which concurrency primitive should I use?
Using agents or futures doesn't seem worthwhile, as the cost of creating a new thread for these processes isn't justified.
Basically, I want to run a few light functions concurrently without creating threads. Can I do that?
Thanks,
Murtaza
Have you benchmarked?
Agents might well be a good solution anyway, since they use a fixed-size thread pool that gets re-used (so you aren't creating new threads constantly).
I ran a quick benchmark on my machine and can do over a million agent calls in under 3 seconds:
(def ag (agent 0))
(time (dotimes [i 1000000] (send ag inc)))
=> "Elapsed time: 2882.170586 msecs"
If agents are still too heavyweight (unlikely?), then you should probably be looking for a way to batch up a group of functions into a single block of work. If you do this, then the overhead of the concurrency primitives will be minimal.
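For example, here is a minimal sketch of that batching idea using agents (the chunk size of 50 and the names run-batch/batches are arbitrary illustrations): each agent dispatch runs a whole chunk of light functions, so the dispatch overhead is paid once per chunk rather than once per function.
(defn run-batch [fns]
  (let [a (agent nil)]
    (send a (fn [_] (mapv #(%) fns))) ; one dispatch runs the whole chunk
    a))

(def tasks (repeat 1000 #(reduce + (range 100)))) ; stand-in light workload
(def batches (mapv run-batch (partition-all 50 tasks)))
(apply await batches)                             ; wait for every chunk
(def results (into [] (mapcat deref) batches))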
Related
I have a (let's say) function in C++ that takes a random amount of time to return a result (from 0.001 to 10 seconds). I would like to run this function N ~ 10^5 independent times, so it is natural to run lots of threads.
My question is: how can I run only 10 threads at a time? By this I mean that I would like to start 10 threads and launch a new one only when another finishes. I tried launching 10^5 threads and letting the computer figure it out, but that doesn't work. Any suggestions?
You can either use std::async and let the system manage the number of threads,
or you can set up a limited number of threads (a thread pool), preferably sized with std::thread::hardware_concurrency. There are many ways to do this...
You need to start reading the documentation for:
std::promise, std::future (std::shared_future)
std::packaged_task
C++ Concurrency in Action has an excellent chapter 9 on thread pools.
One good advanced idea is a thread-safe function queue (the simplest such queue can be obtained by wrapping std::queue in a class with a much-reduced interface that locks a std::mutex before pushing and before popping/returning the popped value) to which you push your functions in advance. Then have your threads pull functions from the queue and execute them.
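Since the rest of this page is Clojure, here is the same idea sketched on the JVM instead, where java.util.concurrent.LinkedBlockingQueue already provides the locking (the worker count of 10 and the ::stop sentinel are illustrative choices):
(import '(java.util.concurrent LinkedBlockingQueue))

(def work-queue (LinkedBlockingQueue.))

(defn start-workers! [n]
  (dotimes [_ n]
    (.start (Thread.
             (fn []
               (loop []
                 (let [task (.take work-queue)] ; blocks until work arrives
                   (when-not (= task ::stop)    ; sentinel shuts a worker down
                     (task)
                     (recur)))))))))

(start-workers! 10)                                    ; never more than 10 threads
(dotimes [i 100] (.put work-queue #(println "job" i))) ; enqueue 100 jobs
(dotimes [_ 10] (.put work-queue ::stop))              ; one sentinel per worker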
I was using the reducers library in some places in my code on a production server with 32 cores to leverage some parallelism. But the fork/join framework seems to utilize the cores so heavily that other processes choke and become unresponsive.
Is there some way to limit the number of cores used, or threads spawned, by the reducers library on a JVM instance?
It seems it isn't possible to adjust the standard reducers forkjoin threadpool size through function or configuration parameters. You need to change core.reducers itself.
From the core.reducers source:
(def pool (delay (java.util.concurrent.ForkJoinPool.)))
This corresponds to the default Java constructor with no arguments:
ForkJoinPool()
Creates a ForkJoinPool with parallelism equal to Runtime.availableProcessors(), using the default thread factory, no UncaughtExceptionHandler, and non-async LIFO processing mode.
instead of
ForkJoinPool(int parallelism)
Creates a ForkJoinPool with the indicated parallelism level, the default thread factory, no UncaughtExceptionHandler, and non-async LIFO processing mode.
It would be a nice addition to have at least the option to control the number of cores (there's also an even more configurable ForkJoinPool constructor), but for now the only option is to fork core.reducers and change that line to the maximum number of cores you want used:
(def pool (delay (java.util.concurrent.ForkJoinPool. 28)))
As I write more core.async code, a very common pattern that emerges is a go-loop that alts over a sequence of channels and does some work in response to a message, e.g.:
(go-loop [state {}]
  (let [[value task] (alts! tasks)]
    ;; ... do the work ...
    (recur state)))
I don't feel like I understand the tradeoffs of the various ways I can actually do the work though, so I thought I'd try to explore them here.
Inline or by calling a function: this blocks the loop from continuing until the work is complete. Since it's in a go block, one wouldn't want to do I/O or locking operations.
>! a message to a channel monitored by a worker: if the channel is full, this parks the loop until the channel has capacity. Parking frees the underlying thread to do other work, and the full channel provides back pressure.
>!! a message: if the channel is full, this blocks the actual thread running the go loop. This is probably undesirable because go threads are a strictly finite resource.
>! a message within another go block: this will succeed nearly immediately unless there are no go threads available. However, if the channel is full and is being consumed slowly, this could starve the system of go threads in short order.
>!! a message within a thread block: similar to the go block, but this consumes system threads instead of go threads, so the upper bound is probably higher.
put! a message: it's unclear to me what the tradeoffs are.
call the work function in a future: hands the work to a thread from the Clojure agent pool, allowing the go loop to continue. If the input rate exceeds the output rate, this grows the agent pool's queue without bound.
Is this summary correct and comprehensive?
If the work to be done is entirely CPU-bound, then I would probably do it inline in the go block, unless it's an operation that may take a long time and I want the go block to continue responding to other messages.
In general, any work which doesn't block, sleep, or do I/O can be safely put in a go block without having a major impact on the throughput of the system.
You can use >! to submit work to a worker or pool of workers. I would almost never use >!! in a go block because it can block one of the finite number of threads allocated to running go blocks.
When you need to do I/O or a potentially long-running computation, use a thread instead of a go. This is very similar to future — it creates a real thread — but it returns a channel like go.
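As a minimal sketch of that combination (the tasks and results channels and the Thread/sleep workload are stand-ins I've made up): the go loop parks on the channel operations while thread does the slow part on a real thread.
(require '[clojure.core.async :refer [chan go-loop thread <! >!]])

(def tasks   (chan 10))
(def results (chan 10))

(go-loop []
  (when-some [task (<! tasks)]          ; park until a task arrives
    (let [worker (thread                ; real thread; returns a channel
                   (Thread/sleep 100)   ; stand-in for blocking work
                   (str "done: " task))]
      (>! results (<! worker)))         ; park, never block
    (recur)))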
put! is a lower-level operation generally used at the "boundaries" of core.async to connect it to conventional callback-based interfaces. There's rarely any reason to use put! inside a go.
core.async can support fine-grained control over how threads are created. I demonstrated a few possibilities in a blog post, Parallel Processing with core.async.
I am doing something involving threads in Clojure.
Right now I am working in the REPL, writing and updating code there.
My problem is that sometimes there are futures left running, and through some mistake I have lost the references to them. My only way to stop them is to restart the REPL.
I want to know: is there any way to stop running futures (threads) when I have no reference to them?
Clojure's public API provides only one way to stop running futures, which is the following:
(shutdown-agents)
But that will not interrupt your future jobs. Instead, you can interrupt them with:
(import 'clojure.lang.Agent)
(.shutdownNow Agent/soloExecutor)
But please keep in mind that after the operations described above, your Agent/soloExecutor will not accept new tasks. One way to deal with that is to reassign the soloExecutor field of the Agent class; fortunately it's public and not final:
(import 'java.util.concurrent.Executors)
;; In the original Clojure source the pool is created with a thread
;; factory, but createThreadFactory is private; for your purposes a
;; plain cached pool works just fine without one.
(set! Agent/soloExecutor (Executors/newCachedThreadPool))
(future (Thread/sleep 1000) (println "done") 100) ;; now works fine
But in my opinion this is not a recommended way to do things in the REPL. It's much better not to lose your future references in the first place.
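If you'd rather avoid losing references at all, here is a sketch (tracked-future and cancel-all! are names I made up) that registers every future in an atom so stray ones can be cancelled later:
(def running (atom []))

(defmacro tracked-future [& body]
  `(let [fut# (future ~@body)]
     (swap! running conj fut#)
     fut#))

(defn cancel-all! []
  (doseq [fut @running]
    (future-cancel fut)) ; interrupts the future's thread if it is running
  (reset! running []))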
Basically I have a Task and a Thread class. I create threads equal to the number of physical cores (or logical cores, since on Intel CPUs with hyper-threading they're double the count).
So threads take tasks from a list of tasks and execute them. However, I have to make sure everything is safe, so that multiple threads don't try to take the same task at once, and of course this introduces extra overhead (and headaches).
What if I put the tasks' functionality inside the threads? I mean, instead of 4 threads grabbing tasks from a pool of 200 tasks, why not 200 threads that execute in groups of 4? Basically I won't need to synchronize anything: no locking, no nothing. Of course I won't be creating the threads throughout the run time, just at initialization.
What pros and cons would such a method have? One problem I can think of is that since I only create the threads at initialization, their count is fixed, while with tasks I can keep dumping more tasks into the task pool.
Threads have cost: each one requires space for thread-local storage (TLS) and a stack, at a minimum.
Keeping your Task and Thread classes separate would be a cleaner and more manageable approach in the long run, and it keeps overhead down by allowing you to limit how many Threads are created and running at any given time (also, a Task is likely to take up less memory than a Thread, and to be faster to create and free when needed). A Task controls what gets done; a Thread controls when a Task is run. Yes, you would need to store the Task objects in a thread-safe list, but that is very simple to implement using a critical section, mutex, semaphore, etc. On Windows specifically, you could alternatively use an I/O completion port to submit Tasks to Threads, and let the OS handle the synchronization and scheduling for you.
It will definitely take longer to run 200 threads at once than to have 4 threads run 200 "tasks". You can test this with a simple program that does some simple math, e.g. counting the primes among the first 20,000,000 numbers, either by having each of 4 threads take 100,000 numbers at a time and then grab the next lot, or by making 200 threads with 100,000 numbers each.
How much slower? Don't know, depends on so many things.
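If you want to try that comparison on the JVM (a rough Clojure sketch to match the rest of this page; the 200 x 100,000 split mirrors the numbers above), a fixed-size pool makes both cases one line each:
(import '(java.util.concurrent Executors TimeUnit))

(defn prime? [n]
  (and (> n 1)
       (not-any? #(zero? (mod n %))
                 (range 2 (inc (long (Math/sqrt n)))))))

(defn count-primes [lo hi]
  (count (filter prime? (range lo hi))))

(defn run-tasks [n-threads n-tasks task-size]
  (let [pool (Executors/newFixedThreadPool n-threads)]
    (time
     (do
       (dotimes [i n-tasks]
         (.submit pool ^Runnable #(count-primes (* i task-size)
                                                (* (inc i) task-size))))
       (.shutdown pool)
       (.awaitTermination pool 1 TimeUnit/HOURS)))))

(run-tasks 4 200 100000)   ; 4 pooled threads working through 200 tasks
(run-tasks 200 200 100000) ; one thread per task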