I'm trying to use concurrency for my maps using pmap in Clojure, and I need to do some analysis based on the efficiency of the program under different thread counts.
Is the number of threads defined in Clojure within the pmap function, or somewhere in the project file? Looking at the pmap documentation there are no additional parameters compared to the map function.
For example, I need to run the program under 2, 32, 64 etc... threads.
Your question seems closely related to:
How many threads does Clojure's pmap function spawn for URL-fetching operations?
From Alex Miller's answer, you can deduce that the number of threads used by pmap is <number of cores> + 2. I don't know why there is a + 2, but even in the current release of Clojure, 1.10.0, the source code of the pmap function is still the same.
As I have 4 cores on my machine, pmap should use 6 threads.
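As a quick check on your own machine, the same expression that pmap's source uses can be evaluated at the REPL. This is only a sketch; the name pmap-parallelism is made up for illustration and is not part of Clojure:

```clojure
;; Hypothetical helper name; the expression mirrors the one in pmap's source:
;; 2 + the number of processors the JVM reports as available.
(def pmap-parallelism
  (+ 2 (.. Runtime getRuntime availableProcessors)))

(println "pmap would run roughly" pmap-parallelism "futures ahead")
```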
-- EDIT
To really answer your question, you can define a custom pmap function, custom-pmap, which allows you to specify the number of threads you would like to use:
(defn custom-pmap
([f coll nb-thread]
(let [n nb-thread
rets (map #(future (f %)) coll)
step (fn step [[x & xs :as vs] fs]
(lazy-seq
(if-let [s (seq fs)]
(cons (deref x) (step xs (rest s)))
(map deref vs))))]
(step rets (drop n rets)))))
(custom-pmap inc (range 1000) 8)
;; => (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ....999 1000)
You can use claypoole's pmap that takes a certain sized threadpool as a first argument.
# project.clj
[com.climate/claypoole "1.1.4"]
# or deps.edn
com.climate/claypoole {:mvn/version "1.1.4"}
Now let's specify some pool sizes and map an operation that takes one second over a collection of size 64.
(ns demo
(:refer-clojure :exclude [pmap])
(:require [com.climate.claypoole :refer [threadpool pmap]]))
(def pool-sizes
[2 32 64])
(doseq [pool-size pool-sizes]
(time (doall (pmap (threadpool pool-size) (fn [n] (Thread/sleep 1000)) (range 64)))))
"Elapsed time: 32113.704013 msecs"
"Elapsed time: 2013.242638 msecs"
"Elapsed time: 1011.616369 msecs"
So, allowing for some overhead, that is 32 seconds for a threadpool of size 2, 2 seconds for size 32 and 1 second for size 64.
Related
Below is a simplified version of an application I am working on. Specifically, I am interested in benchmarking the execution time of process-list. In the function process-list, I split the input list into as many partitions as the number of threads I would like to run in parallel. I then pass each partition to a thread through a call to future. Finally, in main I call process-list with time wrapped around it. time should return the elapsed time of the processing done by process-list, but apparently it only returns the time it takes to create the futures and does not wait for them to run to completion. How can I dereference the futures inside process-list to ensure the elapsed time accounts for the execution of the futures to completion?
(ns listProcessing
(:require [clojure.string]
[clojure.pprint]
[input-random :as input]))
(def N-THREADS 4)
(def element_processing_retries (atom 0))
(def list-collection
"Each element is made into a ref"
(map ref input/myList))
(defn partition-list [threads list]
"partition list into required number of partitions which is equal
to the number of threads"
(let [partitions (partition-all
(Math/ceil (/ (count list) threads)) list)]
partitions))
(defn increase-element [element]
(ref-set element inc))
(defn process-list [list]
"Process `members of list` one by one."
(let [sub-lists (partition-list N-THREADS list)]
(doseq [sub-list sub-lists]
(let [futures '()
myFuture (future (dosync (swap! element_processing_retries inc)
(map increase-element sub-list)))]
(cons myFuture futures)
(map deref futures)))))
(defn main []
(let [f1 (future (time (process-list input/myList)))]
@f1))
(main)
(shutdown-agents)
Below is an example of a simplified list input: Note the input here is simplified and the list processing too to simplify the question.
(ns input-random)
(def myList (list 1 2 4 7 89 12 34 45 56))
This will have some overhead. If you're trying to time millisecond differences, this will skew things a bit (although minute timings shouldn't be using time anyways).
I think your example was a little convoluted, so I reduced it down to what I think represents the problem a little better:
(time (doseq [n (range 5)]
(future
(Thread/sleep 2000))))
"Elapsed time: 1.687702 msecs"
The problem here is the same as the problem with your code: all this really does is time how long it takes for doseq to dispatch all the jobs.
The idea with my hack is to put each finished job into an atom, then check for an end condition in a busy wait:
(defn do-stuff [n-things]
(let [ret-atom (atom 0)]
(doseq [n (range n-things)]
(future
(Thread/sleep 2000)
(swap! ret-atom inc)))
ret-atom))
; Time how long it takes the entire `let` to run
(time
(let [n 5
ret-atom (do-stuff n)]
; Will block until the condition is met
(while (< @ret-atom n))))
"Elapsed time: 2002.813288 msecs"
The reason this is so hard to time is all you're doing is spinning up some side effects in a doseq. There's nothing defining what "done" is, so there's nothing to block on. I'm not great with core.async, but I suspect there may be something that may help in there. It may be possible to have a call to <!! that blocks until a channel has a certain number of elements. In that case, you would just need to put results into the channel as they're produced.
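An alternative to the busy wait, sketched here under the assumption that it is acceptable to keep the futures around, is to collect them eagerly and dereference each one; deref blocks until that future has finished, which gives time something to wait on. The name run-jobs and its parameters are hypothetical:

```clojure
;; Hypothetical helper: starts n-jobs futures, then blocks until all are done.
(defn run-jobs [n-jobs sleep-ms]
  (let [;; doall forces the lazy `for`, so every future is submitted up front
        futures (doall (for [i (range n-jobs)]
                         (future
                           (Thread/sleep (long sleep-ms))
                           i)))]
    ;; deref blocks on each future in turn; mapv is eager
    (mapv deref futures)))

(time (run-jobs 5 200))
```

With this shape, the elapsed time should be close to the duration of the longest job rather than the time needed to dispatch the jobs.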
I have implemented the Sieve of Eratosthenes using Clojure's standard library.
(defn primes [below]
(remove (set (mapcat #(range (* % %) below %)
(range 3 (Math/sqrt below) 2)))
(cons 2 (range 3 below 2))))
I think this should be amenable to parallelism as there is no recursion and the reducer versions of remove and mapcat can be dropped in. Here is what I came up with:
(defn pprimes [below]
(r/foldcat
(r/remove
(into #{} (r/mapcat #(range (* % %) below %)
(into [] (range 3 (Math/sqrt below) 2))))
(into [] (cons 2 (range 3 below 2))))))
I've poured the initial set and the generated multiples into vectors as I understand that LazySeqs can't be folded. Also, r/foldcat is used to finally realize the collection.
My problem is that this is a little slower than the first version.
(time (first (primes 1000000))) ;=> approx 26000 seconds
(time (first (pprimes 1000000))) ;=> approx 28500 seconds
Is there too much overhead from the coordinating processes or am I using reducers wrong?
Thanks to leetwinski this seems to work:
(defn pprimes2 [below]
(r/foldcat
(r/remove
(into #{} (r/foldcat (r/map #(range (* % %) below %)
(into [] (range 3 (Math/sqrt below) 2)))))
(into [] (cons 2 (range 3 below 2))))))
Apparently I needed to add another fold operation in order to map #(range (* % %) below %) in parallel.
(time (first (pprimes 1000000))) ;=> approx 28500 seconds
(time (first (pprimes2 1000000))) ;=> approx 7500 seconds
Edit: The above code doesn't work. r/foldcat isn't concatenating the composite numbers; it is just returning a vector of the multiples for each prime number. The final result is a vector of 2 and all the odd numbers. Replacing r/map with r/mapcat gives the correct answer, but it is again slower than the original primes.
As far as I remember, r/mapcat and r/remove are not parallel themselves; they just produce foldable collections, which can in turn be parallelized by r/fold. In your case the only parallel operation is r/foldcat, which according to the documentation is "Equivalent to (fold cat append! coll)", meaning that you are at most doing append! in parallel, which isn't what you want at all.
To make it parallel you should probably use r/fold with remove as the reduce function and concat as the combine function, but I guess it won't really make your code faster, due to the nature of your algorithm (you would be removing a big set of items from every chunk of the collection).
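A rough sketch of that suggestion follows. It is one possible reading, not a tuned implementation: the name pprimes-fold is made up, the composite set is still built sequentially, and the "remove" step is written as a reducing function that skips composites while into plays the role of the combining function.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical name; assumes `below` is a positive integer.
(defn pprimes-fold [below]
  (let [;; Odd composites (plus some even multiples, which are harmless here),
        ;; built sequentially as in the original primes function
        composites (into #{}
                         (mapcat #(range (* % %) below %))
                         (range 3 (Math/sqrt below) 2))
        ;; A vector, so that r/fold can split it into chunks
        candidates (into [] (cons 2 (range 3 below 2)))]
    (r/fold (r/monoid into vector)            ; combine: concatenate chunk results
            (fn [acc x]                       ; reduce: drop composites per chunk
              (if (composites x) acc (conj acc x)))
            candidates)))

(pprimes-fold 30)
;; => [2 3 5 7 11 13 17 19 23 29]
```

Whether this is actually faster than primes would have to be measured; the per-element work is just a set lookup, so the parallel speedup may be small.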
I have the following bit of code that produces the correct results:
(ns scratch.core
(:require [clojure.string :as str]))
(defn numberify [str]
(vec (map read-string (str/split str #" "))))
(defn process [acc sticks]
(let [smallest (apply min sticks)
cuts (filter #(> % 0) (map #(- % smallest) sticks))]
(if (empty? cuts)
acc
(process (conj acc (count cuts)) cuts))))
(defn print-result [[x & xs]]
(prn x)
(if (seq xs)
(recur xs)))
(let [input "8\n1 2 3 4 3 3 2 1"
lines (str/split-lines input)
length (read-string (first lines))
inputs (first (rest lines))]
(print-result (process [length] (numberify inputs))))
The process function above recursively calls itself until the sequence sticks is empty?.
I am curious to know if I could have used something like take-while or some other technique to make the code more succinct?
If ever I need to do some work on a sequence until it is empty then I use recursion but I can't help thinking there is a better way.
Your core problem can be described as:
1. stop if count of sticks is zero
2. accumulate count of sticks
3. subtract the smallest stick from each of sticks
4. filter positive sticks
5. go back to 1.
Identify the smallest sub-problem as steps 3 and 4 and put a box around it:
(defn cuts [sticks]
(let [smallest (apply min sticks)]
(filter pos? (map #(- % smallest) sticks))))
Notice that sticks don't change between steps 5 and 3, that cuts is a fn sticks->sticks, so use iterate to put a box around that:
(defn process [sticks]
(->> (iterate cuts sticks)
;; ----- 8< -------------------
This gives an infinite seq of sticks, (cuts sticks), (cuts (cuts sticks)) and so on
Incorporate step 1 and 2
(defn process [sticks]
(->> (iterate cuts sticks)
(map count) ;; count each sticks
(take-while pos?))) ;; accumulate while counts are positive
(process [1 2 3 4 3 3 2 1])
;-> (8 6 4 1)
Behind the scenes this algorithm hardly differs from the one you posted, since lazy seqs are a delayed implementation of recursion. It is more idiomatic though, more modular, and uses take-while for cancellation, which adds to its expressiveness. It also doesn't require one to pass the initial count, and it does the right thing if sticks is empty. I hope it is what you were looking for.
I think the way your code is written is a very lispy way of doing it. Certainly there are many, many examples in The Little Schemer that follow this format of reduction/recursion.
To replace recursion, I usually look for a solution that involves using higher order functions, in this case reduce. It replaces the min calls each iteration with a single sort at the start.
(defn process [sticks]
(drop-last (reduce (fn [a i]
(let [n (- (last a) (count i))]
(conj a n)))
[(count sticks)]
(partition-by identity (sort sticks)))))
(process [1 2 3 4 3 3 2 1])
=> (8 6 4 1)
I've changed the algorithm to fit reduce by grouping the same numbers after sorting, and then counting each group and reducing the count size.
The following code essentially lets you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This for example finds the prime numbers between 0 and 10000 with 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
Same just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably, I'm new to Clojure and just playing around with stuff^^)? Because I thought that futures are actually preferred in that scenario.
Source:
(defn setup-agents
[coll-size num-agents]
(let [step (/ coll-size num-agents)
parts (partition step (range coll-size))
agents (for [_ (range num-agents)] (agent []) )
vect (map #(into [] [%1 %2]) agents parts)]
(vec vect)))
(defn start-agents
[coll f]
(for [[agent part] coll] (send agent into (f part))))
(defn results
[agents]
(apply await agents)
(vec (flatten (map deref agents))))
(defn experiment-with-agents
[coll-size num-agents f]
(-> (setup-agents coll-size num-agents)
(start-agents f)
(results)))
(defn experiment-with-futures
[coll-size num-futures f]
(let [step (/ coll-size num-futures)
parts (partition step (range coll-size))
futures (for [index (range num-futures)] (future (f (nth parts index))))]
(vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to not create the futures lazily. Try wrapping the for loop where you create the futures inside a doall.
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.
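A minimal sketch of that fix is below. The function name experiment-with-futures-eager is made up to keep it distinct from the original. One extra assumption goes beyond the explanation above: since a function like #(filter prime? %) itself returns a lazy sequence, the sketch also realizes the result inside each future, so that the filtering runs on the future's thread rather than on the thread that later consumes it.

```clojure
;; Hypothetical variant of experiment-with-futures from the question.
(defn experiment-with-futures-eager
  [coll-size num-futures f]
  (let [step    (/ coll-size num-futures)
        parts   (partition step (range coll-size))
        ;; outer doall: submit every future immediately instead of lazily
        ;; inner doall: force f's (possibly lazy) result inside the future
        futures (doall (for [part parts]
                         (future (doall (f part)))))]
    ;; deref each future in order and concatenate the per-part results
    (vec (mapcat deref futures))))

(experiment-with-futures-eager 20 4 #(filter even? %))
;; => [0 2 4 6 8 10 12 14 16 18]
```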
Starting with Clojure I discovered a talk by Rich Hickey where he demonstrates some of Clojure's strengths on a basic Ant-Simulator.
Can this code still be considered as a good reference for Clojure? Especially the parts when he recursively sends off functions to agents to simulate a game loop.
Example:
(defn animation [x]
(when b/running
(send-off *agent* #'animation))
(. panel (repaint))
(. Thread (sleep defs/animation-sleep-ms))
nil)
Edit:
I am not interested in the #' reader macro but rather in whether it is idiomatic/good Clojure to recursively call a function on an agent or not.
This snippet is current in Clojure 1.4. Is it idiomatic for a function to submit a task back to the agent that called it? Yes.
Here is an example that uses a similar approach to recursively calculate a factorial:
(defn fac [n limit total]
(if (< n limit)
(let [next-n (inc n)]
(send-off *agent* fac limit (* total next-n))
next-n)
total))
(def a (agent 1))
(await (send-off a fac 5 1))
; => nil
@a
;=> 120
Update
The above is a contrived example and actually not a good one, as there is a race condition between the various recursive send-off calls and the later await. There may be some send-off calls yet to be added to the agent's task queue.
I re-wrote the above as follows:
(defn factorial-using-agent-recursive [x]
(let [a (agent 1)]
(letfn [(calc [n limit total]
(if (< n limit)
(let [next-n (inc n)]
(send-off *agent* calc limit (* total next-n))
next-n)
total))]
(await (send-off a calc x 1)))
@a))
and observed the following behavior:
user=> (for [x (range 10)] (factorial-using-agent-recursive 5))
(2 4 3 120 2 120 120 120 120 2)
user=> (for [x (range 10)] (factorial-using-agent-recursive 5))
(2 2 2 3 2 2 3 2 120 2)
user=> (for [x (range 10)] (factorial-using-agent-recursive 5))
(120 120 120 120 120 120 120 120 120 120)
Moral of the story is: don't use agents for synchronous calculations. Use them for asynchronous independent tasks - like updating animations displayed to a user :)
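If you still want to experiment with the recursive send-off style and need the final value, one way to sidestep the race (a sketch only, with a made-up function name) is to have the last step deliver a promise and block on that promise instead of calling await:

```clojure
;; Hypothetical variant of factorial-using-agent-recursive.
(defn factorial-using-agent-promise [x]
  (let [a    (agent 1)
        done (promise)]
    (letfn [(calc [n limit total]
              (if (< n limit)
                (let [next-n (inc n)]
                  ;; queue the next step on the same agent
                  (send-off *agent* calc limit (* total next-n))
                  next-n)
                ;; last step: publish the result, which unblocks the caller
                (do (deliver done total)
                    total)))]
      (send-off a calc x 1)
      ;; blocks until the final step has run, however many sends it took
      @done)))

(factorial-using-agent-promise 5)
;; => 120
```

The promise is delivered only by the final action, so the caller no longer depends on which actions happened to be queued when it started waiting.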