Why is my Clojure prime number lazy sequence so slow?

I'm doing problem 7 of Project Euler (calculate the 10001st prime). I have coded a solution in the form of a lazy sequence, but it is super slow, whereas another solution I found on the web (link below), which does essentially the same thing, takes less than a second.
I'm new to Clojure and lazy sequences, so my usage of take-while, lazy-cat, rest, or map may be the culprit. Could you please look at my code and tell me if you see anything?
The solution that runs under a second is here:
https://zach.se/project-euler-solutions/7/
It doesn't use lazy sequences. I'd like to know why it's so fast while mine is so slow (the process they follow is similar).
My solution which is super slow:
(def primes
  (letfn [(getnextprime [largestprimesofar]
            (let [primessofar (concat (take-while #(not= largestprimesofar %) primes)
                                      [largestprimesofar])]
              (loop [n (+ (last primessofar) 2)]
                (if (loop [primessofarnottriedyet (rest primessofar)]
                      (if (= 0 (count primessofarnottriedyet))
                        true
                        (if (= 0 (rem n (first primessofarnottriedyet)))
                          false
                          (recur (rest primessofarnottriedyet)))))
                  n
                  (recur (+ n 2))))))]
    (lazy-cat '(2 3) (map getnextprime (rest primes)))))
To try it, just load it and run something like (take 10000 primes), but use Ctrl+C to kill the process, because it is too slow. However, if you try (take 100 primes), you should get an instant answer.

Let me re-write your code just a bit to break it down into pieces that will be easier to discuss. I'm using your same algorithm; I'm just splitting out some of the inner forms into separate functions.
(declare primes) ;; declare this up front so we can refer to it below

(defn is-relatively-prime? [n candidates]
  (if (= 0 (count candidates))
    true
    (if (zero? (rem n (first candidates)))
      false
      (is-relatively-prime? n (rest candidates)))))

(defn get-next-prime [largest-prime-so-far]
  (let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes)
                              [largest-prime-so-far])]
    (loop [n (+ (last primes-so-far) 2)]
      (if (is-relatively-prime? n (rest primes-so-far))
        n
        (recur (+ n 2))))))

(def primes
  (lazy-cat '(2 3) (map get-next-prime (rest primes))))

(time (let [p (doall (take 200 primes))]))
That last line is just to make it easier to get some really rough benchmarks in the REPL. By making the timing statement part of the source file, I can keep re-loading the source and get a fresh benchmark each time. If I just loaded the file once and kept trying (take 500 primes), the benchmark would be skewed, because primes holds on to the primes it has already calculated. I also need the doall because I'm pulling my prime numbers inside a let statement; if I don't use doall, it will just store the lazy sequence in p instead of actually calculating the primes.
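To see the difference doall makes, compare (a quick sketch; the exact timings don't matter):
(time (let [p (take 200 primes)]))          ; near-instant: p holds an unrealized lazy seq
(time (let [p (doall (take 200 primes))]))  ; actually computes the 200 primes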
Now, let's get some base values. On my PC, I get this:
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 274.492597 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 293.673962 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 322.035034 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 285.29596 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 224.311828 msecs"
So about 275 milliseconds, give or take 50. My first suspicion is how we're getting primes-so-far in the let statement inside get-next-prime. We're walking through the complete list of primes (as far as we have it) until we reach the one equal to the largest prime so far. The way we've structured the code, though, the primes are already in order, so we're effectively walking through all the primes except the last and then concatenating the last value. We end up with exactly the same values as have been realized so far in the primes sequence, so we can skip that whole step and just use primes. That should save us something.
My next suspicion is the call to (last primes-so-far) in the loop. When we use the last function on a sequence, it also walks the list from the head down to the tail (or at least, that's my understanding; I wouldn't put it past the Clojure compiler writers to have snuck in some special-case code to speed things up). But again, we don't need it. We're calling get-next-prime with largest-prime-so-far, and since our primes are in order, that's already the last of the primes as far as we've realized them, so we can just use largest-prime-so-far instead of (last primes-so-far). That gives us this:
(defn get-next-prime [largest-prime-so-far]
  ;; deleted the let statement since we don't need it
  (loop [n (+ largest-prime-so-far 2)]
    (if (is-relatively-prime? n (rest primes))
      n
      (recur (+ n 2)))))
That seems like it should speed things up, since we've eliminated two complete walks through the primes sequence. Let's try it.
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 242.130691 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 223.200787 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 287.63579 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 244.927825 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 274.146199 msecs"
Hmm, maybe slightly better (?), but not nearly the improvement I expected. Let's look at the code for is-relatively-prime? (as I've re-written it). The first thing that jumps out at me is the count function. The primes sequence is a sequence, not a vector, which means count has to walk the complete list to add up how many elements are in it. What's worse, if we start with a list of, say, 10 candidates, it walks all ten the first time through the loop, then the nine remaining candidates on the next pass, then the eight remaining, and so on. As the number of primes gets larger, we spend more and more time in the count function, so maybe that's our bottleneck.
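As an illustration (a rough sketch; timings are machine-dependent), count is O(1) on a vector, which carries its size, but on a generic lazy sequence it has to walk every element:
(def v (vec (range 1000000)))
(def s (map identity (range 1000000))) ; a LazySeq, which doesn't know its size

(time (count v)) ; effectively instant
(time (count s)) ; walks (and realizes) all 1,000,000 elements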
We want to get rid of that count, and that suggests a more idiomatic way we could do the loop, using if-let. Like this:
(defn is-relatively-prime? [n candidates]
  (if-let [current (first candidates)]
    (if (zero? (rem n current))
      false
      (recur n (rest candidates)))
    true))
The (first candidates) expression returns nil if the candidates list is empty, and if that happens, if-let notices and automatically jumps to the else clause, which in this case is our return value of true. Otherwise, we execute the then branch, where we can test whether n is evenly divisible by the current candidate. If it is, we return false; otherwise we recur with the rest of the candidates. I also took advantage of the zero? function, just because I could.
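As an aside, here is a minimal illustration of the if-let pattern:
(if-let [x (first [])]       ; (first []) is nil, so we take the else branch
  [:found x]
  :empty)
;=> :empty

(if-let [x (first [7 8 9])]  ; binds x to 7 and takes the then branch
  [:found x]
  :empty)
;=> [:found 7]
Let's see what this gets us.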
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 9.981985 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.011646 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.154197 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 9.905292 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.215208 msecs"
Pretty dramatic, eh? I'm an intermediate-level Clojure coder with a pretty sketchy understanding of the internals, so take my analysis with a grain of salt, but based on those numbers, I'd guess you were getting bitten by the count.
There's one other optimization the "fast" code is using that yours isn't: bailing out of the is-relatively-prime? test as soon as the current candidate squared is greater than n. You might speed up your code some more if you can throw that in, but I think count is the main thing you're looking for.

I will continue speeding it up, based on @manutter's solution.
(declare primes)

(defn is-relatively-prime? [n candidates]
  (if-let [current (first candidates)]
    (if (zero? (rem n current))
      false
      (recur n (rest candidates)))
    true))

(defn get-next-prime [largest-prime-so-far]
  (let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes)
                              [largest-prime-so-far])]
    (loop [n (+ (last primes-so-far) 2)]
      (if (is-relatively-prime? n (rest primes-so-far))
        n
        (recur (+ n 2))))))

(def primes
  (lazy-cat '(2 3) (map get-next-prime (rest primes))))
(time (first (drop 10000 primes)))
"Elapsed time: 14092.414513 msecs"
OK. First of all, let's add the current^2 > n optimization:
(defn get-next-prime [largest-prime-so-far]
  (let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes)
                              [largest-prime-so-far])]
    (loop [n (+ (last primes-so-far) 2)]
      (if (is-relatively-prime? n
                                (take-while #(<= (* % %) n)
                                            (rest primes-so-far)))
        n
        (recur (+ n 2))))))
user> (time (first (drop 10000 primes)))
"Elapsed time: 10564.470626 msecs"
104743
Nice. Now let's look closer at get-next-prime. If you check the algorithm carefully, you will notice that
(concat (take-while #(not= largest-prime-so-far %) primes) [largest-prime-so-far])
is really just all the primes we've found so far, and that (last primes-so-far) is really largest-prime-so-far. So let's rewrite it a little:
(defn get-next-prime [largest-prime-so-far]
  (loop [n (+ largest-prime-so-far 2)]
    (if (is-relatively-prime? n
                              (take-while #(<= (* % %) n) (rest primes)))
      n
      (recur (+ n 2)))))
user> (time (first (drop 10000 primes)))
"Elapsed time: 142.676634 msecs"
104743
Let's add one more order of magnitude:
user> (time (first (drop 100000 primes)))
"Elapsed time: 2615.910723 msecs"
1299721
Wow! It's just mind-blowing!
But that's not all. Let's take a look at the is-relatively-prime? function: it just checks that none of the candidates evenly divides the number, which is exactly what the not-any? library function does. So let's just replace it in get-next-prime:
(declare primes)

(defn get-next-prime [largest-prime-so-far]
  (loop [n (+ largest-prime-so-far 2)]
    (if (not-any? #(zero? (rem n %))
                  (take-while #(<= (* % %) n)
                              (rest primes)))
      n
      (recur (+ n 2)))))

(def primes
  (lazy-cat '(2 3) (map get-next-prime (rest primes))))
It's a bit faster:
user> (time (first (drop 100000 primes)))
"Elapsed time: 2493.291323 msecs"
1299721
And it's obviously much cleaner and shorter.

Related

Select elements from a nested structure that match a condition in Clojure

I recently discovered the Specter library, which provides data-structure navigation and transformation functions and is written in Clojure.
Implementing some of its API as a learning exercise seemed like a good idea. Specter implements an API that takes a function and a nested structure as arguments and returns a vector of the elements from the nested structure that satisfy the function, like below:
(select (walker number?) [1 :a {:b 2}]) => [1 2]
Below is my attempt at implementing a function with similar API:
(defn select-walker [afn ds]
  (vec (if (and (coll? ds) (not-empty ds))
         (concat (select-walker afn (first ds))
                 (select-walker afn (rest ds)))
         (if (afn ds) [ds]))))
(select-walker number? [1 :a {:b 2}]) => [1 2]
I have tried implementing select-walker using list comprehension, looping, and cons and conj. In all these cases the return value was a nested list instead of a flat vector of elements.
Still, my implementation does not seem like idiomatic Clojure, and it has poor time and space complexity.
(time (dotimes [_ 1000] (select (walker number?) (range 100))))
"Elapsed time: 19.445396 msecs"
(time (dotimes [_ 1000] (select-walker number? (range 100))))
"Elapsed time: 237.000334 msecs"
Notice that my implementation is about 12 times slower than Specter's.
I have three questions about the implementation of select-walker:
Is a tail-recursive implementation of select-walker possible?
Can select-walker be written in more idiomatic Clojure?
Any hints for making select-walker execute faster?
There are at least two ways to make it tail-recursive. The first is to process the data in a loop, like this:
(defn select-walker-rec [afn ds]
  (loop [res [] ds ds]
    (cond (empty? ds) res
          (coll? (first ds)) (recur res
                                    (doall (concat (first ds)
                                                   (rest ds))))
          (afn (first ds)) (recur (conj res (first ds)) (rest ds))
          :else (recur res (rest ds)))))
In the REPL:
user> (select-walker-rec number? [1 :a {:b 2}])
[1 2]
user> (time (dotimes [_ 1000] (select-walker-rec number? (range 100))))
"Elapsed time: 19.428887 msecs"
(the plain select-walker takes about 200 ms for me)
The second way (slower, though, and better suited to more difficult tasks) is to use zippers:
(require '[clojure.zip :as z])
(defn select-walker-z [afn ds]
(loop [res [] curr (z/zipper coll? seq nil ds)]
(cond (z/end? curr) res
(z/branch? curr) (recur res (z/next curr))
(afn (z/node curr)) (recur (conj res (z/node curr))
(z/next curr))
:else (recur res (z/next curr)))))
user> (time (dotimes [_ 1000] (select-walker-z number? (range 100))))
"Elapsed time: 219.015153 msecs"
This one is really slow, since the zipper operates on more complex structures; its great power brings unneeded overhead to this simple task.
The most idiomatic approach, I guess, is to use tree-seq:
(defn select-walker-t [afn ds]
  (filter #(and (not (coll? %)) (afn %))
          (tree-seq coll? seq ds)))
user> (time (dotimes [_ 1000] (select-walker-t number? (range 100))))
"Elapsed time: 1.320209 msecs"
It is incredibly fast, but that's because it produces a lazy sequence of results. For a fair test you should realize the data:
user> (time (dotimes [_ 1000] (doall (select-walker-t number? (range 100)))))
"Elapsed time: 53.641014 msecs"
One more thing to notice about this variant is that it's not tail-recursive, so it will fail on really deeply nested structures (maybe I'm mistaken, but I'd guess at a couple of thousand levels of nesting); still, it's suitable for most cases.
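If you want to see where that limit lies on your machine, here's a hypothetical stress test (deeply-nested is my own name; the exact breaking depth depends on your JVM stack size):
;; build a structure nested about 100,000 levels deep
(def deeply-nested (reduce (fn [acc _] [acc]) :x (range 100000)))

;; forcing the traversal will likely throw StackOverflowError
(first (select-walker-t keyword? deeply-nested))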

Clojure butlast vs drop-last

What is the difference between butlast and drop-last in Clojure?
Is it only the laziness? Should I prefer one over the other?
Also, if you need to realize the whole collection, butlast is dramatically faster, which makes sense if you look at their sources:
(def butlast
  (fn ^:static butlast [s]
    (loop [ret [] s s]
      (if (next s)
        (recur (conj ret (first s)) (next s))
        (seq ret)))))

(defn drop-last
  ([s] (drop-last 1 s))
  ([n s] (map (fn [x _] x) s (drop n s))))
So drop-last uses map, while butlast uses a simple iteration with recur. Here is a little example:
user> (time (let [_ (butlast (range 10000000))]))
"Elapsed time: 2052.853726 msecs"
nil
user> (time (let [_ (doall (drop-last (range 10000000)))]))
"Elapsed time: 14072.259077 msecs"
nil
So I wouldn't blindly prefer one over the other: I use drop-last only when I really need laziness, and butlast otherwise.
Yes, the laziness, as well as the fact that drop-last can also take n, indicating how many elements to drop from the end lazily.
There's a discussion here where someone is making the case that butlast is more readable and maybe a familiar idiom for Lisp programmers, but I generally just opt to use drop-last.
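For reference, here's each in action:
(drop-last 3 [1 2 3 4 5]) ;=> (1 2)
(butlast [1 2 3 4 5])     ;=> (1 2 3 4)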

Parallel Sieve of Eratosthenes using Clojure Reducers

I have implemented the Sieve of Eratosthenes using Clojure's standard library.
(defn primes [below]
  (remove (set (mapcat #(range (* % %) below %)
                       (range 3 (Math/sqrt below) 2)))
          (cons 2 (range 3 below 2))))
I think this should be amenable to parallelism as there is no recursion and the reducer versions of remove and mapcat can be dropped in. Here is what I came up with:
(require '[clojure.core.reducers :as r])

(defn pprimes [below]
  (r/foldcat
   (r/remove
    (into #{} (r/mapcat #(range (* % %) below %)
                        (into [] (range 3 (Math/sqrt below) 2))))
    (into [] (cons 2 (range 3 below 2))))))
I've poured the initial set and the generated multiples into vectors as I understand that LazySeqs can't be folded. Also, r/foldcat is used to finally realize the collection.
My problem is that this is a little slower than the first version.
(time (first (primes 1000000)))  ;=> approx 26000 msecs
(time (first (pprimes 1000000))) ;=> approx 28500 msecs
Is there too much overhead from the coordinating processes or am I using reducers wrong?
Thanks to leetwinski this seems to work:
(defn pprimes2 [below]
  (r/foldcat
   (r/remove
    (into #{} (r/foldcat (r/map #(range (* % %) below %)
                                (into [] (range 3 (Math/sqrt below) 2)))))
    (into [] (cons 2 (range 3 below 2))))))
Apparently I needed to add another fold operation in order to map #(range (* % %) below %) in parallel.
(time (first (pprimes 1000000)))  ;=> approx 28500 msecs
(time (first (pprimes2 1000000))) ;=> approx 7500 msecs
Edit: The above code doesn't work. r/foldcat isn't concatenating the composite numbers; it just returns a vector of the multiples for each prime number, so the final result is a vector of 2 and all the odd numbers. Replacing r/map with r/mapcat gives the correct answer, but it is again slower than the original primes.
As far as I remember, r/mapcat and r/remove are not parallel themselves; they just produce foldable collections, which in turn can be parallelized by r/fold. In your case the only parallel operation is r/foldcat, which according to the documentation is "Equivalent to (fold cat append! coll)", meaning that you only potentially do the append! in parallel, which isn't what you want at all.
To make it parallel you should probably use r/fold with remove as the reducing function and concat as the combining function, but I guess it won't really make your code faster, due to the nature of your algorithm (you would be trying to remove a big set of items from every chunk of the collection).
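For what it's worth, here's a minimal sketch of that shape (hypothetical and unbenchmarked; pprimes-fold is my own name, reusing the composites set from above):
(require '[clojure.core.reducers :as r])

(defn pprimes-fold [below]
  (let [composites (into #{}
                         (r/foldcat
                          (r/mapcat #(range (* % %) below %)
                                    (vec (range 3 (Math/sqrt below) 2)))))]
    (r/fold
     (fn ([] []) ([a b] (into a b)))                   ; combinef: concatenate chunk results
     (fn [acc n] (if (composites n) acc (conj acc n))) ; reducef: drop composites
     (vec (cons 2 (range 3 below 2))))))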

Futures somehow slower than agents?

The following code essentially just lets you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This, for example, finds the prime numbers between 0 and 10000 using 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
The same, just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably, as I'm new to Clojure and just playing around with stuff)? I thought futures were actually preferred in this scenario.
Source:
(defn setup-agents
  [coll-size num-agents]
  (let [step (/ coll-size num-agents)
        parts (partition step (range coll-size))
        agents (for [_ (range num-agents)] (agent []))
        vect (map #(into [] [%1 %2]) agents parts)]
    (vec vect)))

(defn start-agents
  [coll f]
  (for [[agent part] coll] (send agent into (f part))))

(defn results
  [agents]
  (apply await agents)
  (vec (flatten (map deref agents))))

(defn experiment-with-agents
  [coll-size num-agents f]
  (-> (setup-agents coll-size num-agents)
      (start-agents f)
      (results)))

(defn experiment-with-futures
  [coll-size num-futures f]
  (let [step (/ coll-size num-futures)
        parts (partition step (range coll-size))
        futures (for [index (range num-futures)] (future (f (nth parts index))))]
    (vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to avoid creating the futures lazily. Try wrapping the for loop where you create the futures in a doall, as sketched below.
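Applied to the original function, the fix is just the added doall, so every future is submitted up front:
(defn experiment-with-futures
  [coll-size num-futures f]
  (let [step (/ coll-size num-futures)
        parts (partition step (range coll-size))
        ;; doall forces the sequence, creating (and starting) all futures now
        futures (doall (for [index (range num-futures)]
                         (future (f (nth parts index)))))]
    (vec (flatten (map deref futures)))))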
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.

How to find length of lazy sequence without forcing realization?

I'm currently reading the O'Reilly Clojure Programming book, which says the following in its section about lazy sequences:
It is possible (though very rare) for a lazy sequence to know its length, and therefore return it as the result of count without realizing its contents.
My question is: how is this done, and why is it so rare?
Unfortunately, the book does not elaborate on either point in this section. I personally think it would be very useful to know the length of a lazy sequence prior to its realization. For instance, the same page has an example of a lazy sequence of files being processed with a function using map; it would be nice to know how many files will be processed before realizing the sequence.
Inspired by soulcheck's answer, here is a lazy but counted map of an expensive function over a fixed-size collection.
(defn foo [s f]
  (let [c (count s), res (map f s)]
    (reify
      clojure.lang.ISeq
      (seq [_] res)
      clojure.lang.Counted
      (count [_] c)
      clojure.lang.IPending
      (isRealized [_] (realized? res)))))
(def bar (foo (range 5) (fn [x] (Thread/sleep 1000) (inc x))))
(time (count bar))
;=> "Elapsed time: 0.016848 msecs"
; 5
(realized? bar)
;=> false
(time (into [] bar))
;=> "Elapsed time: 4996.398302 msecs"
; [1 2 3 4 5]
(realized? bar)
;=> true
(time (into [] bar))
;=> "Elapsed time: 0.042735 msecs"
; [1 2 3 4 5]
I suppose it's because there are usually other ways to find out the size.
The only sequence implementation I can think of that could potentially do this is some kind of map of an expensive function or procedure over a collection of known size.
A simple implementation would return the size of the underlying collection while postponing realization of the lazy sequence's elements (and therefore execution of the expensive part) until necessary.
But in that case one knows the size of the collection being mapped over beforehand and can use that instead of the lazy seq's size.
It might be handy sometimes, which is why it's not impossible to implement, but I'd guess it's rarely necessary.
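For instance, with the file-processing example from the question, you can simply count the fixed-size input instead of the lazy output (a sketch; process-file is a hypothetical expensive function):
(let [files ["a.txt" "b.txt" "c.txt"]             ; known-size input collection
      process-file (fn [f] (Thread/sleep 1000) f) ; hypothetical expensive step
      results (map process-file files)]           ; lazy: nothing processed yet
  (count files)) ;=> 3, without realizing results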