I'm trying to use concurrency for my maps using pmap in Clojure, and I need to do some analysis based on the efficiency of the program under different thread counts.
Is the number of threads defined in Clojure within the pmap function, or somewhere in the project file? Looking at the pmap documentation there are no additional parameters compared to the map function.
For example, I need to run the program under 2, 32, 64 etc... threads.
Your question seems be to closely relative enough to:
How many threads does Clojure's pmap function spawn for URL-fetching operations?
From the answer of Alex Miller, you can deduce that the number of threads used by pmap is <your number of core> + 2. I don't why there is a + 2 but even with the current release of Clojure, 1.10.0, the source code of the pmap function is still the same.
As I have 4 cores on my machine, pmap should use 6 threads.
-- EDIT
To really answer to your question, you can define a custom pmap function, custom-pmap, which allow you to specify the number of thread you would like to use:
(defn custom-pmap
([f coll nb-thread]
(let [n nb-thread
rets (map #(future (f %)) coll)
step (fn step [[x & xs :as vs] fs]
(lazy-seq
(if-let [s (seq fs)]
(cons (deref x) (step xs (rest s)))
(map deref vs))))]
(step rets (drop n rets)))))
(custom-pmap inc (range 1000) 8)
;; => (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ....999 1000)
You can use claypoole's pmap that takes a certain sized threadpool as a first argument.
# project.clj
[com.climate/claypoole "1.1.4"]
# or deps.edn
com.climate/claypoole {:mvn/version "1.1.4"}
Now let's specify some pool sizes and map an operation that takes one second over a collection of size 64.
(ns demo
(:refer-clojure :exclude-clojure [pmap])
(:require [com.climate.claypoole :refer [threadpool pmap]]))
(def pool-sizes
[2 32 64])
(doseq [pool-size pool-sizes]
(time (doall (pmap (threadpool pool-size) (fn [n] (Thread/sleep 1000)) (range 64)))))
"Elapsed time: 32113.704013 msecs""Elapsed time: 2013.242638 msecs""Elapsed time: 1011.616369 msecs"
So some overhead and 32 seconds for a threadpool with size 2, 2 seconds for size 32 en 1 second for size 64.
Below is a simplified version of an application I am working on. Specifically, I am interested in benchmarking the execution time of process-list. In the function process-list, I partition the input list into partitions equal to the number of threads I would like to execute in parallel. I then pass each partition to a thread through a call to future. Finally, In main I call process-list with time wrapped around it. Time should return the elapsed time of processing done by process-list but apparently, it only returns the amount of time it takes to create the future threads and does not wait for the futures to execute to completion. How can I dereference the futures inside process-list to ensure the elapsed time accounts for the execution of the future-threads to completion?
(ns listProcessing
(:require [clojure.string]
[clojure.pprint]
[input-random :as input]))
(def N-THREADS 4)
(def element_processing_retries (atom 0))
(def list-collection
"Each element is made into a ref"
(map ref input/myList))
(defn partition-list [threads list]
"partition list into required number of partitions which is equal
to the number of threads"
(let [partitions (partition-all
(Math/ceil (/ (count list) threads)) list)]
partitions))
(defn increase-element [element]
(ref-set element inc))
(defn process-list [list]
"Process `members of list` one by one."
(let [sub-lists (partition-list N-THREADS list)]
(doseq [sub-list sub-lists]
(let [futures '()
myFuture (future (dosync (swap! element_processing_retries inc)
(map increase-element sub-list)))]
(cons myFuture futures)
(map deref futures)))))
(defn main []
(let [f1 (future (time (process-list input/mylist)))]
#f1)
(main)
(shutdown-agents)
Below is an example of a simplified list input: Note the input here is simplified and the list processing too to simplify the question.
(ns input-random)
(def myList (list 1 2 4 7 89 12 34 45 56))
This will have some overhead. If you're trying to time millisecond differences, this will skew things a bit (although minute timings shouldn't be using time anyways).
I think your example was a little convoluted, so I reduced it down to what I think represents the problem a little better:
(time (doseq [n (range 5)]
(future
(Thread/sleep 2000))))
"Elapsed time: 1.687702 msecs"
The problem here is the same as the problem with your code: all this really does is time how long it takes for doseq to dispatch all the jobs.
The idea with my hack is to put each finished job into an atom, then check for an end condition in a busy wait:
(defn do-stuff [n-things]
(let [ret-atom (atom 0)]
(doseq [n (range n-things)]
(future
(Thread/sleep 2000)
(swap! ret-atom inc)))
ret-atom))
; Time how long it takes the entire `let` to run
(time
(let [n 5
ret-atom (do-stuff n)]
; Will block until the condition is met
(while (< #ret-atom n))))
"Elapsed time: 2002.813288 msecs"
The reason this is so hard to time is all you're doing is spinning up some side effects in a doseq. There's nothing defining what "done" is, so there's nothing to block on. I'm not great with core.async, but I suspect there may be something that may help in there. It may be possible to have a call to <!! that blocks until a channel has a certain number of elements. In that case, you would just need to put results into the channel as they're produced.
I'm keeping a registry of processes in an atom.
I want to start one and only one process (specifically a core.async go-loop) per id.
However, you're not supposed to perform side-effects in a swap!, so this code is no good:
(swap! processes-atom
(fn [processes]
(if (get processes id)
processes ;; already exists, do nothing
(assoc processes id (create-process! id)))))
How would I go about doing this correctly?
I have looked at locking, which takes an object as a monitor for the lock. I would prefer that each id - which are dynamic - have their own lock.
It seems that you need to protect processes-atom from concurrent modification, so that only single thread can have access to it. locking will work in this case. Since, by usage of locking, we will manage thread safety by ourselves, we can use volatile instead of atom (volatile is faster, but doesn't provide any thread-safety and atomicity guaranees).
Summing up the above, something like below should work fine:
(def processes-volatile (volatile! {}))
(defn create-and-save-process! [id]
(locking processes-volatile
(vswap! processes-volatile
(fn [processes]
(if (get processes id)
processes
(assoc processes id (create-process! id)))))))
You can do this by hand with locking, as OlegTheCat shows, and often that is a fine approach. However, in the comments you remark that it would be nice to avoid having the whole atom locked for as long as it takes to spawn a process, and that too is possible in a surprisingly simple way: instead of having a map from pid to process, have a map from pid to delay of process. That way, you can add a new delay very cheaply, and only actually create the process by dereferencing the delay, outside of the call to swap!. Dereferencing the delay will block waiting for that particular delay, so multiple threads who need the same process will not step on each other's toes, but the atom itself will be unlocked, allowing threads who want a different process to get it.
Here is a sample implementation of that approach, along with example definitions of the other vars your question implies, to make the code runnable as-is:
(def process-results (atom []))
(defn create-process! [id]
;; pretend creating the process takes a long time
(Thread/sleep (* 1000 (rand-int 3)))
(future
;; running it takes longer, but happens on a new thread
(Thread/sleep (* 1000 (rand-int 10)))
(swap! process-results conj id)))
(def processes-atom (atom {}))
(defn cached-process [id]
(-> processes-atom
(swap! (fn [processes]
(update processes id #(or % (delay (create-process! id))))))
(get id)
(deref)))
Of course only cached-process is needed if you already have the other things defined. And a sample run, to show that processes are successfully reused:
(defn stress-test [num-processes]
(reset! process-results [])
(reset! processes-atom {})
(let [running-processes (doall (for [i (range num-processes)]
(cached-process (rand-int 10))))]
(run! deref running-processes)
(deref process-results)))
user> (time (stress-test 40))
"Elapsed time: 18004.617869 msecs"
[1 5 2 0 9 7 8 4 3 6]
I prefer using a channel
(defn create-process! [id] {:id id})
(def ^:private processes-channel (chan))
(go (loop [processes {}]
(let [id (<! processes-channel)
process (if (contains? processes id)
(get processes id)
(create-process! id))]
(>! processes-channel process)
(recur (assoc processes id process)))))
(defn get-process-by-id
"Public API"
[id]
(>!! processes-channel id)
(<!! processes-channel))
Another answer is to use an agent to start each process. This decouples each process from each other, and avoids the problem of possible multiple calls to the "create-process" function:
(defn start-proc-agent
[state]
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" (:id state)))
(Thread/sleep delay)
(println (format "finished %d" (:id state)))
(merge state {:delay delay :state :running} )))
(def procs-agent (atom {}))
(dotimes [i 3]
(let [curr-agent (agent {:id i :state :unstarted})]
(swap! procs-agent assoc i curr-agent)
(send curr-agent start-proc-agent )))
(println "all dispatched...")
(pprint #procs-agent)
(Thread/sleep 3000)
(pprint #procs-agent)
When run we see:
starting 2
starting 1
starting 0
all dispatched...
{0 #<Agent#39d8240b: {:id 0, :state :unstarted}>,
1 #<Agent#3a6732bc: {:id 1, :state :unstarted}>,
2 #<Agent#7414167a: {:id 2, :state :unstarted}>}
finished 0
finished 1
finished 2
{0 #<Agent#39d8240b: {:id 0, :state :running, :delay 317}>,
1 #<Agent#3a6732bc: {:id 1, :state :running, :delay 1635}>,
2 #<Agent#7414167a: {:id 2, :state :running, :delay 1687}>}
So the global map procs-agent associates each process ID with the agent for that process. A side benefit of this approach is that you can send subsequent commands (in the form of functions) to the agent for a process and be assured they are independent (and parallel & asynchronous) to every other agent.
Alternate solution
Similar to your original question, we could use a single agent (instead of an agent per process) to simply serialize the creation of each process. Since agents are asynchronous, they don't have the possibility of re-trying the input function like swap!. Thus, side-effecting functions aren't a problem. You could write it like so:
(defn start-proc-once-only
[state i]
(let [curr-proc (get state i) ]
(if (= :running (:state curr-proc))
(do
(println "skipping restart of" i)
state)
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" i))
(Thread/sleep delay)
(println (format "finished %d" i))
(assoc state i {:delay delay :state :running})))))
(def procs (agent {}))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(println "all dispatched...")
(println :procs) (pprint #procs)
(Thread/sleep 5000)
(println :procs) (pprint #procs)
with result
:starting 0
:starting 1
:starting 2
starting 0
:starting 0
:starting 1
:starting 2
all dispatched...
:procs
{}
finished 0
starting 1
finished 1
starting 2
finished 2
skipping restart of 0
skipping restart of 1
skipping restart of 2
:procs
{0 {:delay 1970, :state :running},
1 {:delay 189, :state :running},
2 {:delay 1337, :state :running}}
I think you should use add-watch. It gets called once per change to the atom. In the watch-fn check whether a new id has been added to the atom, if so, create the process and add it to the atom. That'll trigger another call to the watch-fn, but that second call won't identify any new id needing a process.
The following code does essentially just let you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This for example finds the prime numbers between 0 and 10000 with 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
Same just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably, new to Clojure and just playing around with stuff^^)? Because I thought that futures are actually prefered in that scenario.
Source:
(defn setup-agents
[coll-size num-agents]
(let [step (/ coll-size num-agents)
parts (partition step (range coll-size))
agents (for [_ (range num-agents)] (agent []) )
vect (map #(into [] [%1 %2]) agents parts)]
(vec vect)))
(defn start-agents
[coll f]
(for [[agent part] coll] (send agent into (f part))))
(defn results
[agents]
(apply await agents)
(vec (flatten (map deref agents))))
(defn experiment-with-agents
[coll-size num-agents f]
(-> (setup-agents coll-size num-agents)
(start-agents f)
(results)))
(defn experiment-with-futures
[coll-size num-futures f]
(let [step (/ coll-size num-futures)
parts (partition step (range coll-size))
futures (for [index (range num-futures)] (future (f (nth parts index))))]
(vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to not create the futures lazily. Try wrapping the for loop where you create the futures inside a doall.
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.
Starting with Clojure I discovered a talk by Rich Hickey where he demonstrates some of Clojure's strengths on a basic Ant-Simulator.
Can this code still be considered as a good reference for Clojure? Especially the parts when he recursively sends off functions to agents to simulate a game loop.
Example:
(defn animation [x]
(when b/running
(send-off *agent* #'animation))
(. panel (repaint))
(. Thread (sleep defs/animation-sleep-ms))
nil)
Edit:
I am not interested in the #' reader macro but more wether it is idiomatic/good Clojure to
recursively call a function on a agent or not.
This snippet is current in Clojure 1.4. Is it idiomatic for a function to submit a task back to the agent that called it? Yes.
Here is an example that uses a similar approach to recursively calculate a factorial:
(defn fac [n limit total]
(if (< n limit)
(let [next-n (inc n)]
(send-off *agent* fac limit (* total next-n))
next-n)
total))
(def a (agent 1))
(await (send-off a fac 5 1))
; => nil
#a
;=> 120
Update
The above is a contrived example and actually not a good one, as there is a race condition between the various recursive send-off calls and the later await. There may be some send-off calls yet to be added to the agent's task queue.
I re-wrote the above as follows:
(defn factorial-using-agent-recursive [x]
(let [a (agent 1)]
(letfn [(calc [n limit total]
(if (< n limit)
(let [next-n (inc n)]
(send-off *agent* calc limit (* total next-n))
next-n)
total))]
(await (send-off a calc x 1)))
#a))
and observed the following behavior:
user=> (for [x (range 10)] (factorial-using-agent-recursive 5))
(2 4 3 120 2 120 120 120 120 2)
user=> (for [x (range 10)] (factorial-using-agent-recursive 5))
(2 2 2 3 2 2 3 2 120 2)
user=> (for [x (range 10)] (factorial-using-agent-recursive 5))
(120 120 120 120 120 120 120 120 120 120)
Moral of the story is: don't use agents for synchronous calculations. Use them for asynchronous independent tasks - like updating animations displayed to a user :)