Below is a simplified version of an application I am working on. Specifically, I am interested in benchmarking the execution time of process-list. In the function process-list, I partition the input list into partitions equal to the number of threads I would like to execute in parallel. I then pass each partition to a thread through a call to future. Finally, In main I call process-list with time wrapped around it. Time should return the elapsed time of processing done by process-list but apparently, it only returns the amount of time it takes to create the future threads and does not wait for the futures to execute to completion. How can I dereference the futures inside process-list to ensure the elapsed time accounts for the execution of the future-threads to completion?
(ns listProcessing
(:require [clojure.string]
[clojure.pprint]
[input-random :as input]))
(def N-THREADS 4)
(def element_processing_retries (atom 0))
(def list-collection
"Each element is made into a ref"
(map ref input/myList))
(defn partition-list [threads list]
"partition list into required number of partitions which is equal
to the number of threads"
(let [partitions (partition-all
(Math/ceil (/ (count list) threads)) list)]
partitions))
(defn increase-element [element]
(ref-set element inc))
(defn process-list [list]
"Process `members of list` one by one."
(let [sub-lists (partition-list N-THREADS list)]
(doseq [sub-list sub-lists]
(let [futures '()
myFuture (future (dosync (swap! element_processing_retries inc)
(map increase-element sub-list)))]
(cons myFuture futures)
(map deref futures)))))
(defn main []
(let [f1 (future (time (process-list input/mylist)))]
#f1)
(main)
(shutdown-agents)
Below is an example of a simplified list input: Note the input here is simplified and the list processing too to simplify the question.
(ns input-random)
(def myList (list 1 2 4 7 89 12 34 45 56))
This will have some overhead. If you're trying to time millisecond differences, this will skew things a bit (although minute timings shouldn't be using time anyways).
I think your example was a little convoluted, so I reduced it down to what I think represents the problem a little better:
(time (doseq [n (range 5)]
(future
(Thread/sleep 2000))))
"Elapsed time: 1.687702 msecs"
The problem here is the same as the problem with your code: all this really does is time how long it takes for doseq to dispatch all the jobs.
The idea with my hack is to put each finished job into an atom, then check for an end condition in a busy wait:
(defn do-stuff [n-things]
(let [ret-atom (atom 0)]
(doseq [n (range n-things)]
(future
(Thread/sleep 2000)
(swap! ret-atom inc)))
ret-atom))
; Time how long it takes the entire `let` to run
(time
(let [n 5
ret-atom (do-stuff n)]
; Will block until the condition is met
(while (< #ret-atom n))))
"Elapsed time: 2002.813288 msecs"
The reason this is so hard to time is all you're doing is spinning up some side effects in a doseq. There's nothing defining what "done" is, so there's nothing to block on. I'm not great with core.async, but I suspect there may be something that may help in there. It may be possible to have a call to <!! that blocks until a channel has a certain number of elements. In that case, you would just need to put results into the channel as they're produced.
I recently discovered the Specter library that provides data-structure navigation and transformation functions and is written in Clojure.
Implementing some of its API as a learning exercise seemed like a good idea. Specter implements an API taking a function and a nested structure as arguments and returns a vector of elements from the nested structure that satisfies the function like below:
(select (walker number?) [1 :a {:b 2}]) => [1 2]
Below is my attempt at implementing a function with similar API:
(defn select-walker [afn ds]
(vec (if (and (coll? ds) (not-empty ds))
(concat (select-walker afn (first ds))
(select-walker afn (rest ds)))
(if (afn ds) [ds]))))
(select-walker number? [1 :a {:b 2}]) => [1 2]
I have tried implementing select-walker by using list comprehension, looping, and using cons and conj. In all these cases
the return value was a nested list instead of a flat vector of elements.
Yet my implementation does not seem like idiomatic Clojure and has poor time and space complexity.
(time (dotimes [_ 1000] (select (walker number?) (range 100))))
"Elapsed time: 19.445396 msecs"
(time (dotimes [_ 1000] (select-walker number? (range 100))))
"Elapsed time: 237.000334 msecs"
Notice that my implementation is about 12 times slower than Specter's implementation.
I have three questions on the implemention of select-walker.
Is a tail-recursive implementaion of select-walker possible?
Can select-walker be written in more idiomatic Clojure?
Any hints to make select-walker execute faster?
there are at least two possibilities to make it tail recursive. First one is to process data in loop like this:
(defn select-walker-rec [afn ds]
(loop [res [] ds ds]
(cond (empty? ds) res
(coll? (first ds)) (recur res
(doall (concat (first ds)
(rest ds))))
(afn (first ds)) (recur (conj res (first ds)) (rest ds))
:else (recur res (rest ds)))))
in repl:
user> (select-walker-rec number? [1 :a {:b 2}])
[1 2]
user> user> (time (dotimes [_ 1000] (select-walker-rec number? (range 100))))
"Elapsed time: 19.428887 msecs"
(simple select-walker works about 200ms for me)
the second one (slower though, and more suitable for more difficult tasks) is to use zippers:
(require '[clojure.zip :as z])
(defn select-walker-z [afn ds]
(loop [res [] curr (z/zipper coll? seq nil ds)]
(cond (z/end? curr) res
(z/branch? curr) (recur res (z/next curr))
(afn (z/node curr)) (recur (conj res (z/node curr))
(z/next curr))
:else (recur res (z/next curr)))))
user> (time (dotimes [_ 1000] (select-walker-z number? (range 100))))
"Elapsed time: 219.015153 msecs"
this one is really slow, since zipper operates on more complex structures. It's great power brings unneeded overhead to this simple task.
the most idiomatic approach i guess, is to use tree-seq:
(defn select-walker-t [afn ds]
(filter #(and (not (coll? %)) (afn %))
(tree-seq coll? seq ds)))
user> (time (dotimes [_ 1000] (select-walker-t number? (range 100))))
"Elapsed time: 1.320209 msecs"
it is incredibly fast, as it produces a lazy sequence of results. In fact you should realize its data for the fair test:
user> (time (dotimes [_ 1000] (doall (select-walker-t number? (range 100)))))
"Elapsed time: 53.641014 msecs"
one more thing to notice about this variant, is that it's not tail recursive, so it would fail in case of really deeply nested structures (maybe i'm mistaken, but i guess it's about couple of thousands levels of nesting), still it's suitable for the most cases.
What is the difference between butlast and drop-last in Clojure ?
Is it only the laziness ? Should I should prefer one over the other ?
also, if you need to realize the whole collection, butlast is dramatically faster, which is logical if you look at their source:
(def
butlast (fn ^:static butlast [s]
(loop [ret [] s s]
(if (next s)
(recur (conj ret (first s)) (next s))
(seq ret)))))
(defn drop-last
([s] (drop-last 1 s))
([n s] (map (fn [x _] x) s (drop n s))))
so drop-last uses map, while butlast uses simple iteration with recur. Here is a little example:
user> (time (let [_ (butlast (range 10000000))]))
"Elapsed time: 2052.853726 msecs"
nil
user> (time (let [_ (doall (drop-last (range 10000000)))]))
"Elapsed time: 14072.259077 msecs"
nil
so i wouldn't blindly prefer one over another. I use drop-last only when i really need laziness, otherwise butlast.
Yes, laziness as well as the fact that drop-last can take also take n, indicating how many elements to drop from the end lazily.
There's a discussion here where someone is making the case that butlast is more readable and maybe a familiar idiom for Lisp programmers, but I generally just opt to use drop-last.
I'm doing problem 7 of Project Euler (calculate the 10001st prime). I have coded a solution in the form of a lazy sequence, but it is super slow, whereas another solution I found on the web (link below) and which does essentially the same thing takes less than a second.
I'm new to clojure and lazy sequences, so my usage of take-while, lazy-cat rest or map may be the culprits. Could you PLEASE look at my code and tell me if you see anything?
The solution that runs under a second is here:
https://zach.se/project-euler-solutions/7/
It doesn't use lazy sequences. I'd like to know why it's so fast while mine is so slow (the process they follow is similar).
My solution which is super slow:
(def primes
(letfn [(getnextprime [largestprimesofar]
(let [primessofar (concat (take-while #(not= largestprimesofar %) primes) [largestprimesofar])]
(loop [n (+ (last primessofar) 2)]
(if
(loop [primessofarnottriedyet (rest primessofar)]
(if (= 0 (count primessofarnottriedyet))
true
(if (= 0 (rem n (first primessofarnottriedyet)))
false
(recur (rest primessofarnottriedyet)))))
n
(recur (+ n 2))))))]
(lazy-cat '(2 3) (map getnextprime (rest primes)))))
To try it, just load it and run something like (take 10000 primes), but use Ctrl+C to kill the process, because it is too slow. However, if you try (take 100 primes), you should get an instant answer.
Let me re-write your code just a bit to break it down into pieces that will be easier to discuss. I'm using your same algorithm, I'm just splitting out some of the inner forms into separate functions.
(declare primes) ;; declare this up front so we can refer to it below
(defn is-relatively-prime? [n candidates]
(if (= 0 (count candidates))
true
(if (zero? (rem n (first candidates)))
false
(is-relatively-prime? n (rest candidates)))))
(defn get-next-prime [largest-prime-so-far]
(let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes) [largest-prime-so-far])]
(loop [n (+ (last primes-so-far) 2)]
(if
(is-relatively-prime? n (rest primes-so-far))
n
(recur (+ n 2))))))
(def primes
(lazy-cat '(2 3) (map get-next-prime (rest primes))))
(time (let [p (doall (take 200 primes))]))
That last line is just to make it easier to get some really rough benchmarks in the REPL. By making the timing statement part of the source file, I can keep re-loading the source, and get a fresh benchmark each time. If I just load the file once, and keep trying to do (take 500 primes) the benchmark will be skewed because primes will hold on to the primes it has already calculated. I also need the doall because I'm pulling my prime numbers inside a let statement, and if I don't use doall, it will just store the lazy sequence in p, instead of actually calculating the primes.
Now, let's get some base values. On my PC, I get this:
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 274.492597 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 293.673962 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 322.035034 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 285.29596 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 224.311828 msecs"
So about 275 milliseconds, give or take 50. My first suspicion is how we're getting primes-so-far in the let statement inside get-next-prime. We're walking through the complete list of primes (as far as we have it) until we get to one that's equal to the largest prime so far. The way we've structured our code, however, all the primes are already in order, so we're effectively walking thru all the primes except the last, and then concatenating the last value. We end up with exactly the same values as have been realized so far in the primes sequence, so we can skip that whole step and just use primes. That should save us something.
My next suspicion is the call to (last primes-so-far) in the loop. When we use the last function on a sequence, it also walks the list from the head down to the tail (or at least, that's my understanding -- I wouldn't put it past the Clojure compiler writers to have snuck in some special-case code to speed things up.) But again, we don't need it. We're calling get-next-prime with largest-prime-so-far, and since our primes are in order, that's already the last of the primes as far as we've realized them, so we can just use largest-prime-so-far instead of (last primes). That will give us this:
(defn get-next-prime [largest-prime-so-far]
; deleted the let statement since we don't need it
(loop [n (+ largest-prime-so-far 2)]
(if
(is-relatively-prime? n (rest primes))
n
(recur (+ n 2)))))
That seems like it should speed things up, since we've eliminated two complete walks through the primes sequence. Let's try it.
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 242.130691 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 223.200787 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 287.63579 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 244.927825 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 274.146199 msecs"
Hmm, maybe slightly better (?), but not nearly the improvement I expected. Let's look at the code for is-relatively-prime? (as I've re-written it). And the first thing that jumps out at me is the count function. The primes sequence is a sequence, not a vector, which means the count function also has to walk the complete list to add up how many elements are in it. What's worse, if we start with a list of, say, 10 candidates, it walks all ten the first time through the loop, then walks the nine remaining candidates on the next loop, then the 8 remaining, and so on. As the number of primes gets larger, we're going to spend more and more time in the count function, so maybe that's our bottleneck.
We want to get rid of that count, and that suggests a more idiomatic way we could do the loop, using if-let. Like this:
(defn is-relatively-prime? [n candidates]
(if-let [current (first candidates)]
(if (zero? (rem n current))
false
(recur n (rest candidates)))
true))
The (first candidates) function will return nil if the candidates list is empty, and if that happens, the if-let function will notice, and automatically jump to the else clause, which in this case is our return result of "true." Otherwise, we'll execute the "then" clause, and can test for whether n is evenly divisible by the current candidate. If it is, we return false, otherwise we recur back with the rest of the candidates. I also took advantage of the zero? function just because I could. Let's see what this gets us.
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 9.981985 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.011646 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.154197 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 9.905292 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.215208 msecs"
Pretty dramatic, eh? I'm an intermediate-level Clojure coder with a pretty sketchy understanding of the internals, so take my analysis with a grain of salt, but based on those numbers, I'd guess you were getting bitten by the count.
There's one other optimization the "fast" code is using that yours isn't, and that's bailing out on the is-relatively-prime? test whenever current squared is greater than n---you might speed up your code some more if you can throw that in. But I think count is the main thing you're looking for.
I will continue speeding it up, based on #manutter's solution.
(declare primes)
(defn is-relatively-prime? [n candidates]
(if-let [current (first candidates)]
(if (zero? (rem n current))
false
(recur n (rest candidates)))
true))
(defn get-next-prime [largest-prime-so-far]
(let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes) [largest-prime-so-far])]
(loop [n (+ (last primes-so-far) 2)]
(if
(is-relatively-prime? n (rest primes-so-far))
n
(recur (+ n 2))))))
(def primes
(lazy-cat '(2 3) (map get-next-prime (rest primes))))
(time (first (drop 10000 primes)))
"Elapsed time: 14092.414513 msecs"
Ok. First of all let's add this current^2 > n optimization:
(defn get-next-prime [largest-prime-so-far]
(let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes) [largest-prime-so-far])]
(loop [n (+ (last primes-so-far) 2)]
(if
(is-relatively-prime? n
(take-while #(<= (* % %) n)
(rest primes-so-far)))
n
(recur (+ n 2))))))
user> (time (first (drop 10000 primes)))
"Elapsed time: 10564.470626 msecs"
104743
Nice. Now let's look closer at the get-next-prime:
if you check the algorithm carefully, you will notice that this:
(concat (take-while #(not= largest-prime-so-far %) primes) [largest-prime-so-far]) really equals to all the primes we've found so far, and (last primes-so-far) is really the largest-prime-so-far. So let's rewrite it a little:
(defn get-next-prime [largest-prime-so-far]
(loop [n (+ largest-prime-so-far 2)]
(if (is-relatively-prime? n
(take-while #(<= (* % %) n) (rest primes)))
n
(recur (+ n 2)))))
user> (time (first (drop 10000 primes)))
"Elapsed time: 142.676634 msecs"
104743
let's add one more order of magnitude:
user> (time (first (drop 100000 primes)))
"Elapsed time: 2615.910723 msecs"
1299721
Wow! it's just mind blowing!
but that's not all. let's take a look at is-relatively-prime function:
it just checks that none of the candidates evenly divides the number. So it is really what not-any? library function does. So let's just replace it in get-next-prime.
(declare primes)
(defn get-next-prime [largest-prime-so-far]
(loop [n (+ largest-prime-so-far 2)]
(if (not-any? #(zero? (rem n %))
(take-while #(<= (* % %) n)
(rest primes)))
n
(recur (+ n 2)))))
(def primes
(lazy-cat '(2 3) (map get-next-prime (rest primes))))
it is a bit more productive
user> (time (first (drop 100000 primes)))
"Elapsed time: 2493.291323 msecs"
1299721
and obviously much cleaner and shorter.
The following code does essentially just let you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This for example finds the prime numbers between 0 and 10000 with 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
Same just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably, new to Clojure and just playing around with stuff^^)? Because I thought that futures are actually prefered in that scenario.
Source:
(defn setup-agents
[coll-size num-agents]
(let [step (/ coll-size num-agents)
parts (partition step (range coll-size))
agents (for [_ (range num-agents)] (agent []) )
vect (map #(into [] [%1 %2]) agents parts)]
(vec vect)))
(defn start-agents
[coll f]
(for [[agent part] coll] (send agent into (f part))))
(defn results
[agents]
(apply await agents)
(vec (flatten (map deref agents))))
(defn experiment-with-agents
[coll-size num-agents f]
(-> (setup-agents coll-size num-agents)
(start-agents f)
(results)))
(defn experiment-with-futures
[coll-size num-futures f]
(let [step (/ coll-size num-futures)
parts (partition step (range coll-size))
futures (for [index (range num-futures)] (future (f (nth parts index))))]
(vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to not create the futures lazily. Try wrapping the for loop where you create the futures inside a doall.
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.