Quicksort in Clojure - clojure

I am trying to prove Clojure performance can be on equal footing with Java. An important use case I've found is the Quicksort. I have written an implementation as follows:
(set! *unchecked-math* true)
(defn qsort [^longs a]
(let [qs (fn qs [^long low, ^long high]
(when (< low high)
(let [pivot (aget a low)
[i j]
(loop [i low, j high]
(let [i (loop [i i] (if (< (aget a i) pivot)
(recur (inc i)) i))
j (loop [j j] (if (> (aget a j) pivot)
(recur (dec j)) j))
[i j] (if (<= i j)
(let [tmp (aget a i)]
(aset a i (aget a j)) (aset a j tmp)
[(inc i) (dec j)])
[i j])]
(if (< i j) (recur i j) [i j])))]
(when (< low j) (qs low j))
(when (< i high) (qs i high)))))]
(qs 0 (dec (alength a))))
a)
Also, this helps call the Java quicksort:
(defn jqsort [^longs a] (java.util.Arrays/sort a) a))
Now, for the benchmark.
user> (def xs (let [rnd (java.util.Random.)]
(long-array (repeatedly 100000 #(.nextLong rnd)))))
#'user/xs
user> (def ys (long-array xs))
#'user/ys
user> (time (qsort ys))
"Elapsed time: 163.33 msecs"
#<long[] [J#3ae34094>
user> (def ys (long-array xs))
user> (time (jqsort ys))
"Elapsed time: 13.895 msecs"
#<long[] [J#1b2b2f7f>
Performance is worlds apart (an order of magnitude, and then some).
Is there anything I'm missing, any Clojure feature I may have used? I think the main source of performance degradation is when I need to return several values from a loop and must allocate a vector for that. Can this be avoided?
BTW running Clojure 1.4. Also note that I have run the benchmark multiple times in order to warm up the HotSpot. These are the times when they settle down.
Update
The most terrible weakness in my code is not just the allocation of vectors, but the fact that they force boxing and break the primitive chain. Another weakness is using results of loop because they also break the chain. Yep, performance in Clojure is still a minefield.

This version is based on #mikera's, is just as fast and doesn't require the use of ugly macros. On my machine this takes ~12ms vs ~9ms for java.util.Arrays/sort:
(set! *unchecked-math* true)
(set! *warn-on-reflection* true)
(defn swap [^longs a ^long i ^long j]
(let [t (aget a i)]
(aset a i (aget a j))
(aset a j t)))
(defn ^long apartition [^longs a ^long pivot ^long i ^long j]
(loop [i i j j]
(if (<= i j)
(let [v (aget a i)]
(if (< v pivot)
(recur (inc i) j)
(do
(when (< i j)
(aset a i (aget a j))
(aset a j v))
(recur i (dec j)))))
i)))
(defn qsort
([^longs a]
(qsort a 0 (long (alength a))))
([^longs a ^long lo ^long hi]
(when
(< (inc lo) hi)
(let [pivot (aget a lo)
split (dec (apartition a pivot (inc lo) (dec hi)))]
(when (> split lo)
(swap a lo split))
(qsort a lo split)
(qsort a (inc split) hi)))
a))
(defn ^longs rand-long-array []
(let [rnd (java.util.Random.)]
(long-array (repeatedly 100000 #(.nextLong rnd)))))
(comment
(dotimes [_ 10]
(let [as (rand-long-array)]
(time
(dotimes [_ 1]
(qsort as)))))
)
The need for manual inlining is mostly unnecessary starting with Clojure 1.3. With a few type hints only on the function arguments the JVM will do the inlining for you. There is no need to cast index arguments to int for the the array operations - Clojure does this for you.
One thing to watch out for is that nested loop/recur does present problems for JVM inlining since loop/recur doesn't (at this time) support returning primitives. So you have to break apart your code into separate fns. This is for the best as nested loop/recurs get very ugly in Clojure anyhow.
For a more detailed look on how to consistently achieve Java performance (when you actually need it) please examine and understand test.benchmark.

This is slightly horrific because of the macros, but with this code I think you can match the Java speed (I get around 11ms for the benchmark):
(set! *unchecked-math* true)
(defmacro swap [a i j]
`(let [a# ~a
i# ~i
j# ~j
t# (aget a# i#)]
(aset a# i# (aget a# j#))
(aset a# j# t#)))
(defmacro apartition [a pivot i j]
`(let [pivot# ~pivot]
(loop [i# ~i
j# ~j]
(if (<= i# j#)
(let [v# (aget ~a i#)]
(if (< v# pivot#)
(recur (inc i#) j#)
(do
(when (< i# j#)
(aset ~a i# (aget ~a j#))
(aset ~a j# v#))
(recur i# (dec j#)))))
i#))))
(defn qsort
([^longs a]
(qsort a 0 (alength a)))
([^longs a ^long lo ^long hi]
(let [lo (int lo)
hi (int hi)]
(when
(< (inc lo) hi)
(let [pivot (aget a lo)
split (dec (apartition a pivot (inc lo) (dec hi)))]
(when (> split lo) (swap a lo split))
(qsort a lo split)
(qsort a (inc split) hi)))
a)))
The main tricks are:
Do everything with primitive arithmetic
Use ints for the array indexes (this avoids some unnecessary casts, not a big deal but every little helps....)
Use macros rather than functions to break up the code (avoids function call overhead and parameter boxing)
Use loop/recur for maximum speed in the inner loop (i.e. partitioning the subarray)
Avoid constructing any new objects on the heap (so avoid vectors, sequences, maps etc.)

The Joy of Clojure, Chapter 6.4 describes a lazy quicksort algorithm.The beauty of lazy sorting is that it will only do as much work as necessary to find the first x values. So if x << n this algorithm is O(n).
(ns joy.q)
(defn sort-parts
"Lazy, tail-recursive, incremental quicksort. Works against
and creates partitions based on the pivot, defined as 'work'."
[work]
(lazy-seq
(loop [[part & parts] work]
(if-let [[pivot & xs] (seq part)]
(let [smaller? #(< % pivot)]
(recur (list*
(filter smaller? xs)
pivot
(remove smaller? xs)
parts)))
(when-let [[x & parts] parts]
(cons x (sort-parts parts)))))))
(defn qsort [xs]
(sort-parts (list xs)))

By examining the main points from mikera's answer, you can see that they are mostly focused on eliminating the overhead introduced by using idiomatic (as opposed to tweaked) Clojure, which would probably not exist in an idiomatic Java implementation:
primitive arithmetic - slightly easier and more idiomatic in Java, you are more likely to use ints than Integers
ints for the array indexes - the same
Use macros rather than functions to break up the code (avoids functional call overhead and boxing) - fixes a problem introduced by using the language. Clojure encourages functional style, hence a function call overhead (and boxing).
Use loop/recur for maximum speed in the inner loop - in Java you'd idiomatically use an ordinary loop (which is what loop/recur compiles to anyway, as far as I know)
That being said, there actually is another trivial solution. Write (or find) an efficient Java implementation of Quick Sort, say something with a signature like this:
Sort.quickSort(long[] elems)
And then call it from Clojure:
(Sort/quickSort elems)
Checklist:
as efficient as in Java - yes
idiomatic in Clojure - arguably yes, I'd say that Java-interop is one of Clojure's core features.
reusable - yes, there's a good chance that you can easily find a very efficient Java implementation already written.
I'm not trying to troll, I understand what you are trying to find out with these experiments I'm just adding this answer for the sake of completeness. Let's not overlook the obvious one! :)

Related

Debugging a slow performing function in Clojure

I am trying to implement a solution for minimum-swaps required to sort an array in clojure.
The code works, but takes about a second to solve for the 7 element vector, which is very poor compared to a similar solution in Java. (edited)
I already tried providing the explicit types, but doesnt seem to make a difference
I tried using transients, but has an open bug for subvec, that I am using in my solution- https://dev.clojure.org/jira/browse/CLJ-787
Any pointers on how I can optimize the solution?
;; Find minimumSwaps required to sort the array. The algorithm, starts by iterating from 0 to n-1. In each iteration, it places the least element in the ith position.
(defn minimumSwaps [input]
(loop [mv input, i (long 0), swap-count (long 0)]
(if (< i (count input))
(let [min-elem (apply min (drop i mv))]
(if (not= min-elem (mv i))
(recur (swap-arr mv i min-elem),
(unchecked-inc i),
(unchecked-inc swap-count))
(recur mv,
(unchecked-inc i),
swap-count)))
swap-count)))
(defn swap-arr [vec x min-elem]
(let [y (long (.indexOf vec min-elem))]
(assoc vec x (vec y) y (vec x))))
(time (println (minimumSwaps [7 6 5 4 3 2 1])))
There are a few things that can be improved in your solution, both algorithmically and efficiency-wise. The main improvement is to remember both the minimal element in the vector and its position when you search for it. This allows you to not search for the minimal element again with .indexOf.
Here's my revised solution that is ~4 times faster:
(defn swap-arr [v x y]
(assoc v x (v y) y (v x)))
(defn find-min-and-position-in-vector [v, ^long start-from]
(let [size (count v)]
(loop [i start-from, min-so-far (long (nth v start-from)), min-pos start-from]
(if (< i size)
(let [x (long (nth v i))]
(if (< x min-so-far)
(recur (inc i) x i)
(recur (inc i) min-so-far min-pos)))
[min-so-far min-pos]))))
(defn minimumSwaps [input]
(loop [mv input, i (long 0), swap-count (long 0)]
(if (< i (count input))
(let [[min-elem min-pos] (find-min-and-position-in-vector mv i)]
(if (not= min-elem (mv i))
(recur (swap-arr mv i min-pos),
(inc i),
(inc swap-count))
(recur mv,
(inc i),
swap-count)))
swap-count)))
To understand where are the performance bottlenecks in your program, it is better to use https://github.com/clojure-goes-fast/clj-async-profiler rather than to guess.
Notice how I dropped unchecked-* stuff from your code. It is not as important here, and it is easy to get it wrong. If you want to use them for performance, make sure to check the resulting bytecode with a decompiler: https://github.com/clojure-goes-fast/clj-java-decompiler
A similar implementation in java, runs almost in half the time.
That's actually fairly good for Clojure, given that you use immutable vectors where in Java you probably use arrays. After rewriting the Clojure solution to arrays, the performance would be almost the same.

collect function of Lisp in Clojure

I defined a function is-prime? in Clojure that returns if a number is prime or not, and I am trying to define a function prime-seq that returns all the prime numbers between two numbersn and m.
I have created the function in Common Lisp since I am more comfortable with it and I am trying to translate the code to Clojure. However, I cannot find how to replace the collect function in Lisp to Clojure.
This is my prime-seq function in Lisp:
(defun prime-seq (i j)
(loop for x from i to j
when (is-prime x)
collect x
)
)
And this is the try I did in Clojure but it is not working:
(defn prime-seq? [n m]
(def list ())
(loop [k n]
(cond
(< k m) (if (prime? k) (def list (conj list k)))
)
)
(println list)
)
Any ideas?
loop in Clojure is not the same as CL loop. You probably want for:
(defn prime-seq [i j]
(for [x (range i j)
:when (is-prime x)]
x))
Which is basically the same as saying:
(defn prime-seq [i j]
(filter is-prime (range i j)))
which may be written using the ->> macro for readability:
(defn prime-seq [i j]
(->> (range i j)
(filter is-prime)))
However, you might actually want a lazy-sequence of all prime numbers which you could write with something like this:
(defonce prime-seq
(let [c (fn [m numbers] (filter #(-> % (mod m) (not= 0)) numbers))
f (fn g [s]
(when (seq s)
(cons (first s)
(lazy-seq (g (c (first s) (next s)))))))]
(f (iterate inc 2))))
The lazy sequence will cache the results of the previous calculation, and you can use things like take-while and drop-while to filter the sequence.
Also, you probably shouldn't be using def inside a function call like that. def is for defining a var, which is essentially global. Then using def to change that value completely destroys the var and replaces it with another var pointing to the new state. It's something that's allowed to enable iterative REPL based development and shouldn't really be used in that way. Var's are designed to isolate changes locally to a thread, and are used as containers for global things like functions and singletons in your system. If the algorithm you're writing needs a local mutable state you could use a transient or an atom, and define a reference to that using let, but it would be more idiomatic to use the sequence processing lib or maybe a transducer.
Loop works more like a tail recursive function:
(defn prime-seq [i j]
(let [l (transient [])]
(loop [k i]
(when (< k j)
(when (is-prime k)
(conj! l k))
(recur (inc k))))
(persistent! l)))
But that should be considered strictly a performance optimisation. The decision to use transients shouldn't be taken lightly, and it's often best to start with a more functional algorithm, benchmark and optimise accordingly. Here is a way to write the same thing without the mutable state:
(defn prime-seq [i j]
(loop [k i
l []]
(if (< k j)
(recur (inc k)
(if (is-prime k)
(conj l k)
l))
l)))
I'd try to use for:
(for [x (range n m) :when (is-prime? x)] x)

Clojure - sum up a bunch of numbers

Hey I'm doing a Project Euler question, and I'm looking to sum up all the numbers under 1000 that are multiplies of 3 or 5.
But being a clojure noob, my code just keeps returning zero.. and I'm not sure why.
(defn sum-of-multiples [max]
(let [result (atom 0)]
(for [i (range max)]
(if (or (= (rem i 3) 0) (= (rem i 5) 0))
(swap! result (+ #result i)))
)
#result))
(sum-of-multiples 1000)
Also the line (swap! result (+ #result i))) bugs me.. In C# I could do result += i, but I'm guessing there must be a better way to this in Clojure?
In Clojure - and at large in functional programming - we avoid assignment as it destroys state history and makes writing concurrent programs a whole lot harder. In fact, Clojure doesn't even support assignment. An atom is a reference type that is thread safe.
Another common trait of functional programming is that we try to solve problems as a series of data transformations. In your case you case some data, a list of numbers from 0 to 1000 exclusive, and you need to obtain the sum of all numbers that match a predicate. This can certainly be done by applying data transformations and completely removing the need for assignment. One such implementation is this:
(->> (range 1000)
(filter #(or (= (rem % 3) 0) (= (rem % 5) 0)))
(reduce +))
Please understand that a function such as the one you wrote isn't considered idiomatic code. Having said that, in the interest of learning, it can be made to work like so:
(defn sum-of-multiples [max]
(let [result (atom 0)]
(doseq [i (range max)]
(if (or (= (rem i 3) 0) (= (rem i 5) 0))
(swap! result #(+ % i)))
)
#result))
(sum-of-multiples 1000)
for returns a lazy sequence but since you're simply interested in the side-effects caused by swap! you need to use doseq to force the sequence. The other problem is that the second argument to swap! is a function, so you don't need to deref result again.
for is a list comprehension that return a lazy sequence, you have to traverse it for your code to work:
(defn sum-of-multiples [max]
(let [result (atom 0)]
(dorun
(for [i (range max)]
(if (or (= (rem i 3) 0) (= (rem i 5) 0))
(swap! result + i))))
#result))
An equivalent, more idiomatic implementation using for:
(defn sum-of-multiples [max]
(reduce +
(for [i (range max)
:when (or (zero? (rem i 3))
(zero? (rem i 5)))]
i)))
The other answers are good examples of what I alluded to in my comment. For the sake of completeness, here's a solution that uses loop/recur, so it may be easier to understand for someone who's still not comfortable with concepts like filter, map or reduce. It also happens to be about 30-40% faster, not that it really matters in this case.
(defn sum-of-multiples [max]
(loop [i 0
sum 0]
(if (> max i)
(recur (inc i)
(if (or (zero? (rem i 3)) (zero? (rem i 5)))
(+ sum i)
sum))
sum)))

Why is Clojure much faster than mit-scheme for equivalent functions?

I found this code in Clojure to sieve out first n prime numbers:
(defn sieve [n]
(let [n (int n)]
"Returns a list of all primes from 2 to n"
(let [root (int (Math/round (Math/floor (Math/sqrt n))))]
(loop [i (int 3)
a (int-array n)
result (list 2)]
(if (>= i n)
(reverse result)
(recur (+ i (int 2))
(if (< i root)
(loop [arr a
inc (+ i i)
j (* i i)]
(if (>= j n)
arr
(recur (do (aset arr j (int 1)) arr)
inc
(+ j inc))))
a)
(if (zero? (aget a i))
(conj result i)
result)))))))
Then I wrote the equivalent (I think) code in Scheme (I use mit-scheme)
(define (sieve n)
(let ((root (round (sqrt n)))
(a (make-vector n)))
(define (cross-out t to dt)
(cond ((> t to) 0)
(else
(vector-set! a t #t)
(cross-out (+ t dt) to dt)
)))
(define (iter i result)
(cond ((>= i n) (reverse result))
(else
(if (< i root)
(cross-out (* i i) (- n 1) (+ i i)))
(iter (+ i 2) (if (vector-ref a i)
result
(cons i result))))))
(iter 3 (list 2))))
The timing results are:
For Clojure:
(time (reduce + 0 (sieve 5000000)))
"Elapsed time: 168.01169 msecs"
For mit-scheme:
(time (fold + 0 (sieve 5000000)))
"Elapsed time: 3990 msecs"
Can anyone tell me why mit-scheme is more than 20 times slower?
update: "the difference was in iterpreted/compiled mode. After I compiled the mit-scheme code, it was running comparably fast. – abo-abo Apr 30 '12 at 15:43"
Modern incarnations of the Java Virtual Machine have extremely good performance when compared to interpreted languages. A significant amount of engineering resource has gone into the JVM, in particular the hotspot JIT compiler, highly tuned garbage collection and so on.
I suspect the difference you are seeing is primarily down to that. For example if you look Are the Java programs faster? you can see a comparison of java vs ruby which shows that java outperforms by a factor of 220 on one of the benchmarks.
You don't say what JVM options you are running your clojure benchmark with. Try running java with the -Xint flag which runs in pure interpreted mode and see what the difference is.
Also, it's possible that your example is too small to really warm-up the JIT compiler. Using a larger example may yield an even larger performance difference.
To give you an idea of how much Hotspot is helping you. I ran your code on my MBP 2011 (quad core 2.2Ghz), using java 1.6.0_31 with default opts (-server hotspot) and interpreted mode (-Xint) and see a large difference
; with -server hotspot (best of 10 runs)
>(time (reduce + 0 (sieve 5000000)))
"Elapsed time: 282.322 msecs"
838596693108
; in interpreted mode using -Xint cmdline arg
> (time (reduce + 0 (sieve 5000000)))
"Elapsed time: 3268.823 msecs"
838596693108
As to comparing Scheme and Clojure code, there were a few things to simplify at the Clojure end:
don't rebind the mutable array in loops;
remove many of those explicit primitive coercions, no change in performance. As of Clojure 1.3 literals in function calls compile to primitives if such a function signature is available, and generally the difference in performance is so small that it gets quickly drowned by any other operations happening in a loop;
add a primitive long annotation into the fn signature, thus removing the rebinding of n;
call to Math/floor is not needed -- the int coercion has the same semantics.
Code:
(defn sieve [^long n]
(let [root (int (Math/sqrt n))
a (int-array n)]
(loop [i 3, result (list 2)]
(if (>= i n)
(reverse result)
(do
(when (< i root)
(loop [inc (+ i i), j (* i i)]
(when (>= j n) (aset a j 1) (recur inc (+ j inc)))))
(recur (+ i 2) (if (zero? (aget a i))
(conj result i)
result)))))))

Binary search in clojure (implementation / performance)

I wrote a binary search function as part of a larger program, but it seems to be slower than it should be and profiling shows a lot of calls to methods in clojure.lang.Numbers.
My understanding is that Clojure can use primitives when it can determine that it can do so. The calls to the methods in clojure.lang.Numbers seems to indicate that it's not using primitives here.
If I coerce the loop variables to ints, it properly complains that the recur arguments are not primitive. If i coerce those too, the code works again but again it's slow. My only guess is that (quot (+ low-idx high-idx) 2) is not producing a primitive but I'm not sure where to go from here.
This is my first program in Clojure so feel free to let me know if there are more cleaner/functional/Clojure ways to do something.
(defn binary-search
[coll coll-size target]
(let [cnt (dec coll-size)]
(loop [low-idx 0 high-idx cnt]
(if (> low-idx high-idx)
nil
(let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
(cond
(= mid-val target) mid-idx
(< mid-val target) (recur (inc mid-idx) high-idx)
(> mid-val target) (recur low-idx (dec mid-idx))
))))))
(defn binary-search-perf-test
[test-size]
(do
(let [test-set (vec (range 1 (inc test-size))) test-set-size (count test-set)]
(time (count (map #(binary-search2 test-set test-set-size %) test-set)))
)))
First of all, you can use the binary search implementation provided by java.util.Collections:
(java.util.Collections/binarySearch [0 1 2 3] 2 compare)
; => 2
If you skip the compare, the search will be faster still, unless the collection includes bigints, in which case it'll break.
As for your pure Clojure implementation, you can hint coll-size with ^long in the parameter vector -- or maybe just ask for the vector's size at the beginning of the function's body (that's a very fast, constant time operation), replace the (quot ... 2) call with (bit-shift-right ... 1) and use unchecked math for the index calculations. With some additional tweaks a binary search could be written as follows:
(defn binary-search
"Finds earliest occurrence of x in xs (a vector) using binary search."
([xs x]
(loop [l 0 h (unchecked-dec (count xs))]
(if (<= h (inc l))
(cond
(== x (xs l)) l
(== x (xs h)) h
:else nil)
(let [m (unchecked-add l (bit-shift-right (unchecked-subtract h l) 1))]
(if (< (xs m) x)
(recur (unchecked-inc m) h)
(recur l m)))))))
This is still noticeably slower than the Java variant:
(defn java-binsearch [xs x]
(java.util.Collections/binarySearch xs x compare))
binary-search as defined above seems to take about 25% more time than this java-binsearch.
in Clojure 1.2.x you can only coerce local variables and they can't cross functions calls.
starting in Clojure 1.3.0 Clojure can use primative numbers across function calls but not through Higher Order Functions such as map.
if you are using clojure 1.3.0+ then you should be able to accomplish this using type hints
as with any clojure optimization problem the first step is to turn on (set! *warn-on-reflection* true) then add type hints until it no longer complains.
user=> (set! *warn-on-reflection* true)
true
user=> (defn binary-search
[coll coll-size target]
(let [cnt (dec coll-size)]
(loop [low-idx 0 high-idx cnt]
(if (> low-idx high-idx)
nil
(let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
(cond
(= mid-val target) mid-idx
(< mid-val target) (recur (inc mid-idx) high-idx)
(> mid-val target) (recur low-idx (dec mid-idx))
))))))
NO_SOURCE_FILE:23 recur arg for primitive local: low_idx is not matching primitive,
had: Object, needed: long
Auto-boxing loop arg: low-idx
#'user/binary-search
user=>
to remove this you can type hint the coll-size argument
(defn binary-search
[coll ^long coll-size target]
(let [cnt (dec coll-size)]
(loop [low-idx 0 high-idx cnt]
(if (> low-idx high-idx)
nil
(let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
(cond
(= mid-val target) mid-idx
(< mid-val target) (recur (inc mid-idx) high-idx)
(> mid-val target) (recur low-idx (dec mid-idx))
))))))
it is understandably difficult to connect the auto-boxing on line 10 to the coll-size parameter because it goes through cnt then high-idx then mid-ixd and so on, so I generally approach these problems by type-hinting everything until I find the one that makes the warnings go away, then remove hints so long as they stay gone