Why is my transducer function slower than using the ->> operator?

While solving a problem from HackerRank (https://www.hackerrank.com/challenges/string-compression/problem) I wrote two implementations, one with and one without transducers.
I was expecting the transducer implementation to be faster than the chain built with the ->> threading macro. Unfortunately, according to my mini-benchmark, the threaded version outperformed the transducer version by about 2.5 times.
I thought I should use transducers wherever possible. Or did I misunderstand the concept of transducers?
Time:
"Elapsed time: 0.844459 msecs"
"Elapsed time: 2.697836 msecs"
Code:
(defn string-compression-2
  [s]
  (->> s
       (partition-by identity)
       (mapcat #(if (> (count %) 1)
                  (list (first %) (count %))
                  (list (first %)))))
       (apply str)))

(def xform-str-compr
  (comp (partition-by identity)
        (mapcat #(if (> (count %) 1)
                   (list (first %) (count %))
                   (list (first %)))))))

(defn string-compression-3
  [s]
  (transduce xform-str-compr str s))
(time (string-compression-2 "aaabccdddd"))
(time (string-compression-3 "aaabccdddd"))

The transducer version does seem to be faster, according to Criterium:
(crit/quick-bench (string-compression-2 "aaabccdddd"))
Execution time mean : 6.150477 µs
Execution time std-deviation : 246.740784 ns
Execution time lower quantile : 5.769961 µs ( 2.5%)
Execution time upper quantile : 6.398563 µs (97.5%)
Overhead used : 1.620718 ns
(crit/quick-bench (string-compression-3 "aaabccdddd"))
Execution time mean : 2.533919 µs
Execution time std-deviation : 157.594154 ns
Execution time lower quantile : 2.341610 µs ( 2.5%)
Execution time upper quantile : 2.704182 µs (97.5%)
Overhead used : 1.620718 ns
As coredump commented, a sample size of one is not enough to say whether one approach is generally faster than the other.
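If Criterium isn't available, a rough-and-ready alternative is to time a large batch of calls so JIT warm-up is amortized over many iterations. This is only a sketch and is far cruder than Criterium, which handles warm-up and GC for you:

;; Time a batch of 100000 calls instead of a single call, so that
;; JIT warm-up and per-call noise are spread across the whole run.
(time (dotimes [_ 100000] (string-compression-2 "aaabccdddd")))
(time (dotimes [_ 100000] (string-compression-3 "aaabccdddd")))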

Related

Why is my Clojure prime number lazy sequence so slow?

I'm doing problem 7 of Project Euler (calculate the 10001st prime). I have coded a solution in the form of a lazy sequence, but it is super slow, whereas another solution I found on the web (link below), which does essentially the same thing, takes less than a second.
I'm new to Clojure and lazy sequences, so my usage of take-while, lazy-cat, rest, or map may be the culprit. Could you please look at my code and tell me if you see anything?
The solution that runs under a second is here:
https://zach.se/project-euler-solutions/7/
It doesn't use lazy sequences. I'd like to know why it's so fast while mine is so slow (the process they follow is similar).
My solution which is super slow:
(def primes
  (letfn [(getnextprime [largestprimesofar]
            (let [primessofar (concat (take-while #(not= largestprimesofar %) primes)
                                      [largestprimesofar])]
              (loop [n (+ (last primessofar) 2)]
                (if (loop [primessofarnottriedyet (rest primessofar)]
                      (if (= 0 (count primessofarnottriedyet))
                        true
                        (if (= 0 (rem n (first primessofarnottriedyet)))
                          false
                          (recur (rest primessofarnottriedyet)))))
                  n
                  (recur (+ n 2))))))]
    (lazy-cat '(2 3) (map getnextprime (rest primes)))))
To try it, just load it and run something like (take 10000 primes), but use Ctrl+C to kill the process, because it is too slow. However, if you try (take 100 primes), you should get an instant answer.
Let me re-write your code just a bit to break it down into pieces that will be easier to discuss. I'm using your same algorithm, I'm just splitting out some of the inner forms into separate functions.
(declare primes) ;; declare this up front so we can refer to it below

(defn is-relatively-prime? [n candidates]
  (if (= 0 (count candidates))
    true
    (if (zero? (rem n (first candidates)))
      false
      (is-relatively-prime? n (rest candidates)))))

(defn get-next-prime [largest-prime-so-far]
  (let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes)
                              [largest-prime-so-far])]
    (loop [n (+ (last primes-so-far) 2)]
      (if (is-relatively-prime? n (rest primes-so-far))
        n
        (recur (+ n 2))))))

(def primes
  (lazy-cat '(2 3) (map get-next-prime (rest primes))))

(time (let [p (doall (take 200 primes))]))
That last line is just to make it easier to get some really rough benchmarks in the REPL. By making the timing statement part of the source file, I can keep re-loading the source, and get a fresh benchmark each time. If I just load the file once, and keep trying to do (take 500 primes) the benchmark will be skewed because primes will hold on to the primes it has already calculated. I also need the doall because I'm pulling my prime numbers inside a let statement, and if I don't use doall, it will just store the lazy sequence in p, instead of actually calculating the primes.
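To see concretely why the doall matters, compare these two timings (a sketch):

;; Without doall, the let binds an unrealized lazy seq, so `time`
;; measures almost nothing; with doall it measures the real work.
(time (let [p (take 200 primes)]))         ; ~0 ms, nothing computed yet
(time (let [p (doall (take 200 primes))])) ; times the actual computation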
Now, let's get some base values. On my PC, I get this:
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 274.492597 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 293.673962 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 322.035034 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 285.29596 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 224.311828 msecs"
So about 275 milliseconds, give or take 50. My first suspicion is how we're getting primes-so-far in the let statement inside get-next-prime. We're walking through the complete list of primes (as far as we have it) until we get to one that's equal to the largest prime so far. The way we've structured our code, however, all the primes are already in order, so we're effectively walking through all the primes except the last, and then concatenating the last value. We end up with exactly the same values as have been realized so far in the primes sequence, so we can skip that whole step and just use primes. That should save us something.
My next suspicion is the call to (last primes-so-far) in the loop. When we use the last function on a sequence, it also walks the list from the head down to the tail (or at least, that's my understanding -- I wouldn't put it past the Clojure compiler writers to have snuck in some special-case code to speed things up). But again, we don't need it. We're calling get-next-prime with largest-prime-so-far, and since our primes are in order, that's already the last of the primes as far as we've realized them, so we can just use largest-prime-so-far instead of (last primes-so-far). That will give us this:
(defn get-next-prime [largest-prime-so-far]
  ;; deleted the let statement since we don't need it
  (loop [n (+ largest-prime-so-far 2)]
    (if (is-relatively-prime? n (rest primes))
      n
      (recur (+ n 2)))))
That seems like it should speed things up, since we've eliminated two complete walks through the primes sequence. Let's try it.
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 242.130691 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 223.200787 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 287.63579 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 244.927825 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 274.146199 msecs"
Hmm, maybe slightly better (?), but not nearly the improvement I expected. Let's look at the code for is-relatively-prime? (as I've re-written it). The first thing that jumps out at me is the count function. The primes sequence is a sequence, not a vector, which means the count function also has to walk the complete list to add up how many elements are in it. What's worse, if we start with a list of, say, ten candidates, it walks all ten the first time through the loop, then walks the nine remaining candidates on the next loop, then the eight remaining, and so on. As the number of primes gets larger, we're going to spend more and more time in the count function, so maybe that's our bottleneck.
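To make that cost concrete, here is a small sketch showing that count is constant-time on a vector but linear on a generic seq:

;; Counted collections (vectors) answer count in O(1); a lazy seq
;; must be realized and walked, so count is O(n). Calling it on
;; every iteration of a loop makes the loop quadratic.
(let [v (vec (range 1000000))
      s (map identity v)]
  (time (count v))  ; effectively instant
  (time (count s))) ; walks all million elements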
We want to get rid of that count, and that suggests a more idiomatic way we could do the loop, using if-let. Like this:
(defn is-relatively-prime? [n candidates]
  (if-let [current (first candidates)]
    (if (zero? (rem n current))
      false
      (recur n (rest candidates)))
    true))
The (first candidates) call will return nil if the candidates list is empty, and if that happens, the if-let form will notice and automatically jump to the else clause, which in this case is our return result of true. Otherwise, we'll execute the "then" clause and can test whether n is evenly divisible by the current candidate. If it is, we return false; otherwise we recur back with the rest of the candidates. I also took advantage of the zero? function just because I could.
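As a quick standalone illustration of the if-let mechanics (a sketch):

;; if-let binds and tests in one step: a nil (or false) test value
;; falls through to the else branch without binding.
(if-let [x (first [])] ; (first []) is nil
  :found
  :empty)
;; => :empty

Let's see what this gets us.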
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 9.981985 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.011646 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.154197 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 9.905292 msecs"
Loading src/scratch_clojure/core.clj... done
"Elapsed time: 8.215208 msecs"
Pretty dramatic, eh? I'm an intermediate-level Clojure coder with a pretty sketchy understanding of the internals, so take my analysis with a grain of salt, but based on those numbers, I'd guess you were getting bitten by the count.
There's one other optimization the "fast" code is using that yours isn't: bailing out of the is-relatively-prime? test as soon as the current candidate squared is greater than n. You might speed up your code some more if you can throw that in, but I think count is the main thing you were looking for.
I will continue speeding it up, based on @manutter's solution.
(declare primes)

(defn is-relatively-prime? [n candidates]
  (if-let [current (first candidates)]
    (if (zero? (rem n current))
      false
      (recur n (rest candidates)))
    true))

(defn get-next-prime [largest-prime-so-far]
  (let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes)
                              [largest-prime-so-far])]
    (loop [n (+ (last primes-so-far) 2)]
      (if (is-relatively-prime? n (rest primes-so-far))
        n
        (recur (+ n 2))))))

(def primes
  (lazy-cat '(2 3) (map get-next-prime (rest primes))))

(time (first (drop 10000 primes)))
"Elapsed time: 14092.414513 msecs"
OK. First of all, let's add that current^2 > n optimization:
(defn get-next-prime [largest-prime-so-far]
  (let [primes-so-far (concat (take-while #(not= largest-prime-so-far %) primes)
                              [largest-prime-so-far])]
    (loop [n (+ (last primes-so-far) 2)]
      (if (is-relatively-prime? n
                                (take-while #(<= (* % %) n)
                                            (rest primes-so-far)))
        n
        (recur (+ n 2))))))
user> (time (first (drop 10000 primes)))
"Elapsed time: 10564.470626 msecs"
104743
Nice. Now let's look closer at get-next-prime. If you check the algorithm carefully, you will notice that
(concat (take-while #(not= largest-prime-so-far %) primes) [largest-prime-so-far])
is really just all the primes we've found so far, and (last primes-so-far) is really largest-prime-so-far. So let's rewrite it a little:
(defn get-next-prime [largest-prime-so-far]
  (loop [n (+ largest-prime-so-far 2)]
    (if (is-relatively-prime? n
                              (take-while #(<= (* % %) n) (rest primes)))
      n
      (recur (+ n 2)))))
user> (time (first (drop 10000 primes)))
"Elapsed time: 142.676634 msecs"
104743
Let's add one more order of magnitude:
user> (time (first (drop 100000 primes)))
"Elapsed time: 2615.910723 msecs"
1299721
Wow! It's just mind-blowing!
But that's not all. Let's take a look at the is-relatively-prime? function: it just checks that none of the candidates evenly divides the number, which is exactly what the not-any? library function does. So let's just replace it in get-next-prime:
(declare primes)

(defn get-next-prime [largest-prime-so-far]
  (loop [n (+ largest-prime-so-far 2)]
    (if (not-any? #(zero? (rem n %))
                  (take-while #(<= (* % %) n)
                              (rest primes)))
      n
      (recur (+ n 2)))))

(def primes
  (lazy-cat '(2 3) (map get-next-prime (rest primes))))
It is a bit faster:
user> (time (first (drop 100000 primes)))
"Elapsed time: 2493.291323 msecs"
1299721
And it is obviously much cleaner and shorter.

Performance of multimethod vs cond in Clojure

Multimethods are slower than protocols, and one should prefer protocols when they can solve the problem, even though multimethods give a more flexible solution.
So what is the case with cond and multimethods? They can be used to solve the same problem, but my guess is that multimethods have a huge performance overhead compared to cond. If so, why would I ever want to use a multimethod instead of cond?
Multimethods allow for open extension; others can extend your multimethod dispatching on arbitrary expressions by adding new defmethods in their source. Cond expressions are closed to extension by others or even your own code without editing the cond source.
If you just want to act on conditional logic then a cond is the way to go. If you're wanting to do more complex dispatching, or apply a function over many types of data with different behaviour then a multimethod is probably more appropriate.
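For instance, here is a minimal sketch of what that open extension looks like in practice (area and the :shape values are hypothetical names for illustration):

;; Library code dispatches on a key in the argument:
(defmulti area :shape)
(defmethod area :circle [{:keys [r]}] (* Math/PI r r))

;; Later, entirely separate code can add a new case without touching
;; the original definition -- a cond would have to be edited in place:
(defmethod area :square [{:keys [side]}] (* side side))

(area {:shape :square :side 3}) ;; => 9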
Why worry when you can measure?
Here is a benchmark sample using the Criterium library. The cond and multimethod code is taken from http://blog.8thlight.com/myles-megyesi/2012/04/26/polymorphism-in-clojure.html.
Caveat: this is just one sample benchmark comparing multimethod and cond performance. The result below, which shows cond performing better than the multimethod, cannot be generalized to every usage in practice. You can apply the same benchmarking method to your own code.
;; cond
(defn convert-cond [data]
  (cond
    (nil? data)     "null"
    (string? data)  (str "\"" data "\"")
    (keyword? data) (convert-cond (name data))
    :else           (str data)))

(bench (convert-cond "yolo"))
Evaluation count : 437830380 in 60 samples of 7297173 calls.
Execution time mean : 134.822430 ns
Execution time std-deviation : 1.134226 ns
Execution time lower quantile : 133.066750 ns ( 2.5%)
Execution time upper quantile : 137.077603 ns (97.5%)
Overhead used : 1.893383 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
;; multimethod
(defmulti convert class)

(defmethod convert clojure.lang.Keyword [data]
  (convert (name data)))

(defmethod convert java.lang.String [data]
  (str "\"" data "\""))

(defmethod convert nil [data]
  "null")

(defmethod convert :default [data]
  (str data))

(bench (convert "yolo"))
Evaluation count : 340091760 in 60 samples of 5668196 calls.
Execution time mean : 174.225558 ns
Execution time std-deviation : 1.824118 ns
Execution time lower quantile : 170.841203 ns ( 2.5%)
Execution time upper quantile : 177.465794 ns (97.5%)
Overhead used : 1.893383 ns
nil
To follow up on @AlexMiller's comment, I tried to benchmark with more randomised data and added a protocol implementation (I also added one more type, Integer, to the different methods).
(defprotocol StrConvert
  (to-str [this]))

(extend-protocol StrConvert
  nil
  (to-str [this] "null")
  java.lang.Integer
  (to-str [this] (str this))
  java.lang.String
  (to-str [this] (str "\"" this "\""))
  clojure.lang.Keyword
  (to-str [this] (to-str (name this)))
  java.lang.Object
  (to-str [this] (str this)))
data contains a sequence of 10000 random integers that are randomly converted to a String, nil, a keyword, or a vector.
(let [fns [identity            ; as is (integer)
           str                 ; stringify
           (fn [_] nil)        ; nilify
           #(-> % str keyword) ; keywordize
           vector]             ; vectorize
      data (doall (map #(let [f (rand-nth fns)] (f %))
                       (repeatedly 10000 (partial rand-int 1000000))))]
  ;; print a summary of what we have in data
  (println (map (fn [[k v]] [k (count v)]) (group-by class data)))
  ;; multimethods
  (c/quick-bench (dorun (map convert data)))
  ;; cond-itional
  (c/quick-bench (dorun (map convert-cond data)))
  ;; protocols
  (c/quick-bench (dorun (map to-str data))))
The result for data containing:
([clojure.lang.PersistentVector 1999] [clojure.lang.Keyword 1949]
 [java.lang.Integer 2021] [java.lang.String 2069] [nil 1962])
Multimethods: 6.26 ms
Cond-itional: 5.18 ms
Protocols:    6.04 ms
I would second @DanielCompton's suggestion: the design matters more than the raw performance, which seems to be on par for each approach, at least in this example.

Futures somehow slower than agents?

The following code essentially lets you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This, for example, finds the prime numbers between 0 and 10000 using 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
The same, just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably; I'm new to Clojure and just playing around with stuff)? I thought that futures were actually preferred in this scenario.
Source:
(defn setup-agents
  [coll-size num-agents]
  (let [step   (/ coll-size num-agents)
        parts  (partition step (range coll-size))
        agents (for [_ (range num-agents)] (agent []))
        vect   (map #(into [] [%1 %2]) agents parts)]
    (vec vect)))

(defn start-agents
  [coll f]
  (for [[agent part] coll] (send agent into (f part))))

(defn results
  [agents]
  (apply await agents)
  (vec (flatten (map deref agents))))

(defn experiment-with-agents
  [coll-size num-agents f]
  (-> (setup-agents coll-size num-agents)
      (start-agents f)
      (results)))

(defn experiment-with-futures
  [coll-size num-futures f]
  (let [step    (/ coll-size num-futures)
        parts   (partition step (range coll-size))
        futures (for [index (range num-futures)] (future (f (nth parts index))))]
    (vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to avoid creating the futures lazily. Try wrapping the for loop where you create the futures in a doall.
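A minimal sketch of that fix, with only the doall added:

(defn experiment-with-futures
  [coll-size num-futures f]
  (let [step    (/ coll-size num-futures)
        parts   (partition step (range coll-size))
        ;; doall forces the sequence, so every future is submitted
        ;; (and starts running) before we begin dereferencing any of them
        futures (doall (for [index (range num-futures)]
                         (future (f (nth parts index)))))]
    (vec (flatten (map deref futures)))))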
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.
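If you would rather not rely on that implementation detail of apply, the same doall trick makes the agent dispatch eager by design (a sketch):

(defn start-agents
  [coll f]
  ;; doall realizes the sequence, dispatching every agent action
  ;; immediately instead of leaving it to apply to realize later
  (doall (for [[agent part] coll] (send agent into (f part)))))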

Maximum subarray algorithm in Clojure

Kadane's algorithm solves the maximum subarray problem. I'm trying to learn Clojure, so I came up with this implementation:
(defn max-subarray [xs]
  (last
    (reduce
      (fn [[here sofar] x]
        (let [new-here (max 0 (+ here x))]
          [new-here (max new-here sofar)]))
      [0 0]
      xs)))
This seems really verbose. Is there a cleaner way to implement this algorithm in Clojure?
As I said in a comment on the question, I believe the OP's approach is optimal. That's given the fully general problem in which the input is a seqable of arbitrary numbers.
However, if the requirement were added that the input should be a collection of longs (or doubles; other primitives are fine too, as long as we're not mixing integers with floating-point numbers), a loop / recur based solution could be made significantly faster by taking advantage of primitive arithmetic:
(defn max-subarray-prim [xs]
  (loop [xs (seq xs) here 0 so-far 0]
    (if xs
      (let [x (long (first xs))
            new-here (max 0 (+ here x))]
        (recur (next xs) new-here (max new-here so-far)))
      so-far)))
This is actually quite readable to my eye, though I do prefer reduce where there is no particular reason to use loop / recur. The hope now is that loop's ability to keep here and so-far unboxed throughout the loop's execution will make enough of a difference performance-wise.
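As an aside, if you want the compiler to tell you when a loop like this silently falls back to boxed arithmetic, you can turn on boxed-math warnings at the REPL (a sketch; this flag requires Clojure 1.7 or later):

;; After this, compiling a form that uses boxed primitive math
;; prints a warning, similar to reflection warnings.
(set! *unchecked-math* :warn-on-boxed)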
To benchmark this, I generated a vector of 100000 random integers from the range -50000, ..., 49999:
(def xs (vec (repeatedly 100000 #(- (rand-int 100000) 50000))))
Sanity check (max-subarray-orig refers to the OP's implementation):
(= (max-subarray-orig xs) (max-subarray-prim xs))
;= true
Criterium benchmarks:
(do (c/bench (max-subarray-orig xs))
    (flush)
    (c/bench (max-subarray-prim xs)))
WARNING: Final GC required 3.8238570080506156 % of runtime
Evaluation count : 11460 in 60 samples of 191 calls.
Execution time mean : 5.295551 ms
Execution time std-deviation : 97.329399 µs
Execution time lower quantile : 5.106146 ms ( 2.5%)
Execution time upper quantile : 5.456003 ms (97.5%)
Overhead used : 2.038603 ns
Evaluation count : 28560 in 60 samples of 476 calls.
Execution time mean : 2.121256 ms
Execution time std-deviation : 42.014943 µs
Execution time lower quantile : 2.045558 ms ( 2.5%)
Execution time upper quantile : 2.206587 ms (97.5%)
Overhead used : 2.038603 ns
Found 5 outliers in 60 samples (8.3333 %)
low-severe 1 (1.6667 %)
low-mild 4 (6.6667 %)
Variance from outliers : 7.8724 % Variance is slightly inflated by outliers
So that's a jump from ~5.29 ms to ~2.12 ms per call.
Here it is using loop and recur to more closely mimic the example on the Wikipedia page.
user> (defn max-subarray [xs]
        (loop [here 0 sofar 0 ar xs]
          (if (not (empty? ar))
            (let [x (first ar)
                  new-here (max 0 (+ here x))]
              (recur new-here (max new-here sofar) (rest ar)))
            sofar)))
#'user/max-subarray
user> (max-subarray [0 -1 1 2 -4 3])
3
Some people may find this easier to follow, others prefer reduce or map.

Clojure: Are side-effect-free calculations executed many times?

Let's say I have a side-effect-free function that I use repeatedly with the same parameters, without storing the results in a variable.
Does Clojure notice this and use the pre-calculated value of the function, or is the value recalculated every time?
Example:
(defn rank-selection [population fitness]
  (map
    #(select-with-probability (sort-by fitness population) %)
    (repeatedly (count population) #(rand))))

(defn rank-selection [population fitness]
  (let [sorted-population (sort-by fitness population)]
    (map
      #(select-with-probability sorted-population %)
      (repeatedly (count population) #(rand)))))
In the first version, sort-by is executed n times (where n is the size of the population).
In the second version, sort-by is executed once and the result is used n times.
Does Clojure store the result nonetheless?
Are these methods comparably fast?
Clojure doesn't store the results unless you specify that in your code, either by using memoize as mentioned in the comments, or by saving the calculation/result in a local binding like you did.
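For completeness, here is a minimal memoize sketch (slow-square is a hypothetical function for illustration; memoize trades memory for speed and only pays off when the same arguments recur):

(defn slow-square [x]
  (Thread/sleep 100) ; stand-in for an expensive computation
  (* x x))

(def fast-square (memoize slow-square))

(time (fast-square 3)) ;; ~100 ms: computed, then cached
(time (fast-square 3)) ;; ~0 ms: served from the cache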
Regarding the question of how fast one function is compared to the other, here's some code that returns the time of execution for each (I had to mock the select-with-probability function). The doalls are necessary to force the evaluation of the result of map.
(defn select-with-probability [x p]
  (when (< p 0.5)
    x))

(defn rank-selection [population fitness]
  (map
    #(select-with-probability (sort-by fitness population) %)
    (repeatedly (count population) rand)))

(defn rank-selection-let [population fitness]
  (let [sorted-population (sort-by fitness population)]
    (map
      #(select-with-probability sorted-population %)
      (repeatedly (count population) rand))))

(let [population (take 1000 (repeatedly #(rand-int 10)))]
  (time (doall (rank-selection population <)))
  (time (doall (rank-selection-let population <)))
  ;; So that we don't get the result seq
  nil)
This returns the following in my local environment:
"Elapsed time: 95.700138 msecs"
"Elapsed time: 1.477563 msecs"
nil
EDIT
In order to avoid the let form, you could also use partial, which takes a function and any number of arguments and returns a partial application of that function with the supplied argument values. The performance of the resulting code is of the same order as the let version, but it is more succinct and readable.
(defn rank-selection-partial [population fitness]
  (map
    (partial select-with-probability (sort-by fitness population))
    (repeatedly (count population) rand)))

(let [population (take 1000 (repeatedly #(rand-int 10)))]
  (time (doall (rank-selection-partial population <)))
  ;; So that we don't get the result seq
  nil)
;= "Elapsed time: 0.964413 msecs"
In Clojure sequences are lazy, but the rest of the language, including function evaluation, is eager. Clojure will invoke the function every time for you. Use the second version of your rank-selection function.
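A tiny demonstration of that eagerness (a sketch; f is a hypothetical function):

(defn f [x]
  (println "computing...")
  (* x x))

(+ (f 2) (f 2)) ;; prints "computing..." twice; each call re-runs the body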