Performance of multimethod vs cond in Clojure

Multimethods are slower than protocols, and one should try to use protocols when they can solve the problem, even though multimethods give a more flexible solution.
So what is the case with cond and multimethods? They can be used to solve the same problem, but my guess is that a multimethod has a huge performance overhead compared to cond. If so, why would I ever want to use a multimethod instead of cond?

Multimethods allow for open extension: others can extend your multimethod, dispatching on arbitrary expressions, by adding new defmethods in their own source. A cond expression is closed to extension, by others or even by your own code, without editing the cond source.
If you just want to act on conditional logic, a cond is the way to go. If you want to do more complex dispatching, or to apply a function over many types of data with different behaviour, then a multimethod is probably more appropriate.
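As a quick illustration of that open extension (a hypothetical area multimethod, not from the question):
(defmulti area :shape)

(defmethod area :circle [{:keys [r]}]
  (* Math/PI r r))

;; Anyone, even from another namespace, can add a case later without
;; touching the original definition; a cond would have to be edited.
(defmethod area :square [{:keys [side]}]
  (* side side))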

Why worry when you can measure?
Here is a benchmark sample using the criterium library. The cond and multimethod code is taken from http://blog.8thlight.com/myles-megyesi/2012/04/26/polymorphism-in-clojure.html.
Caveat: this is just one sample benchmark comparing multimethod and cond performance. The result below, which shows cond performing better than the multimethod, cannot be generalized to all usage in practice. You can apply the same benchmarking method to your own code.
(require '[criterium.core :refer [bench]])

;; cond
(defn convert-cond [data]
  (cond
    (nil? data)     "null"
    (string? data)  (str "\"" data "\"")
    (keyword? data) (convert-cond (name data))
    :else           (str data)))

(bench (convert-cond "yolo"))
Evaluation count : 437830380 in 60 samples of 7297173 calls.
Execution time mean : 134.822430 ns
Execution time std-deviation : 1.134226 ns
Execution time lower quantile : 133.066750 ns ( 2.5%)
Execution time upper quantile : 137.077603 ns (97.5%)
Overhead used : 1.893383 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
;; multimethod
(defmulti convert class)

(defmethod convert clojure.lang.Keyword [data]
  (convert (name data)))

(defmethod convert java.lang.String [data]
  (str "\"" data "\""))

(defmethod convert nil [data]
  "null")

(defmethod convert :default [data]
  (str data))

(bench (convert "yolo"))
Evaluation count : 340091760 in 60 samples of 5668196 calls.
Execution time mean : 174.225558 ns
Execution time std-deviation : 1.824118 ns
Execution time lower quantile : 170.841203 ns ( 2.5%)
Execution time upper quantile : 177.465794 ns (97.5%)
Overhead used : 1.893383 ns
nil

To follow up on @AlexMiller's comment, I tried to benchmark with more randomised data and added a protocol implementation (I also added one more type, Integer, to the different methods).
(defprotocol StrConvert
  (to-str [this]))

(extend-protocol StrConvert
  nil
  (to-str [this] "null")
  java.lang.Integer
  (to-str [this] (str this))
  java.lang.String
  (to-str [this] (str "\"" this "\""))
  clojure.lang.Keyword
  (to-str [this] (to-str (name this)))
  java.lang.Object
  (to-str [this] (str this)))
data contains a sequence of 10000 random integers, each randomly converted to a String, nil, keyword, or vector.
(require '[criterium.core :as c])

(let [fns  [identity            ; as is (integer)
            str                 ; stringify
            (fn [_] nil)        ; nilify
            #(-> % str keyword) ; keywordize
            vector]             ; vectorize
      data (doall (map #(let [f (rand-nth fns)] (f %))
                       (repeatedly 10000 (partial rand-int 1000000))))]
  ;; print a summary of what we have in data
  (println (map (fn [[k v]] [k (count v)]) (group-by class data)))
  ;; multimethods
  (c/quick-bench (dorun (map convert data)))
  ;; cond-itional
  (c/quick-bench (dorun (map convert-cond data)))
  ;; protocols
  (c/quick-bench (dorun (map to-str data))))
The result is for data containing:
([clojure.lang.PersistentVector 1999] [clojure.lang.Keyword 1949]
[java.lang.Integer 2021] [java.lang.String 2069] [nil 1962])
Multimethods: 6.26 ms
Cond-itional: 5.18 ms
Protocols: 6.04 ms
I would certainly second @DanielCompton's suggestion: the design matters more than the raw performance, which is on par for each method, at least in this example.

Related

Clojure butlast vs drop-last

What is the difference between butlast and drop-last in Clojure?
Is it only the laziness? Should I prefer one over the other?
Also, if you need to realize the whole collection, butlast is dramatically faster, which is logical if you look at their source:
(def
 butlast (fn ^:static butlast [s]
           (loop [ret [] s s]
             (if (next s)
               (recur (conj ret (first s)) (next s))
               (seq ret)))))

(defn drop-last
  ([s] (drop-last 1 s))
  ([n s] (map (fn [x _] x) s (drop n s))))
So drop-last uses map, while butlast uses simple iteration with recur. Here is a little example:
user> (time (let [_ (butlast (range 10000000))]))
"Elapsed time: 2052.853726 msecs"
nil
user> (time (let [_ (doall (drop-last (range 10000000)))]))
"Elapsed time: 14072.259077 msecs"
nil
So I wouldn't blindly prefer one over the other. I use drop-last only when I really need laziness, otherwise butlast.
Yes, laziness, as well as the fact that drop-last can also take n, indicating how many elements to drop from the end lazily.
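For example:
(butlast [1 2 3 4])     ;=> (1 2 3)
(drop-last [1 2 3 4])   ;=> (1 2 3)
(drop-last 2 [1 2 3 4]) ;=> (1 2)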
There's a discussion here where someone is making the case that butlast is more readable and maybe a familiar idiom for Lisp programmers, but I generally just opt to use drop-last.

Futures somehow slower than agents?

The following code essentially just lets you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This for example finds the prime numbers between 0 and 10000 with 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
Same just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably; I'm new to Clojure and just playing around with stuff)? Because I thought futures are actually preferred in that scenario.
Source:
(defn setup-agents
  [coll-size num-agents]
  (let [step   (/ coll-size num-agents)
        parts  (partition step (range coll-size))
        agents (for [_ (range num-agents)] (agent []))
        vect   (map #(into [] [%1 %2]) agents parts)]
    (vec vect)))

(defn start-agents
  [coll f]
  (for [[agent part] coll] (send agent into (f part))))

(defn results
  [agents]
  (apply await agents)
  (vec (flatten (map deref agents))))

(defn experiment-with-agents
  [coll-size num-agents f]
  (-> (setup-agents coll-size num-agents)
      (start-agents f)
      (results)))

(defn experiment-with-futures
  [coll-size num-futures f]
  (let [step    (/ coll-size num-futures)
        parts   (partition step (range coll-size))
        futures (for [index (range num-futures)] (future (f (nth parts index))))]
    (vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to avoid creating the futures lazily. Try wrapping the for expression that creates the futures in a doall.
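A minimal sketch of that fix, keeping the rest of experiment-with-futures as written above:
(defn experiment-with-futures
  [coll-size num-futures f]
  (let [step    (/ coll-size num-futures)
        parts   (partition step (range coll-size))
        ;; doall forces the lazy `for`, so every future is submitted
        ;; up front instead of one at a time during deref
        futures (doall (for [index (range num-futures)]
                         (future (f (nth parts index)))))]
    (vec (flatten (map deref futures)))))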
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.
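Rather than relying on that implementation detail, you could dispatch the agent actions explicitly; a minimal sketch:
(defn start-agents
  [coll f]
  ;; doall performs every send immediately instead of leaving it to
  ;; whatever happens to realize the sequence later
  (doall (for [[agent part] coll] (send agent into (f part)))))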

How to find length of lazy sequence without forcing realization?

I'm currently reading the O'Reilly Clojure Programming book, which says the following in its section about lazy sequences:
It is possible (though very rare) for a lazy sequence to know its length, and therefore return it as the result of count without realizing its contents.
My question is, how is this done, and why is it so rare?
Unfortunately, the book does not specify these things in this section. I personally think it's very useful to know the length of a lazy sequence prior to its realization; for instance, on the same page there is an example of a lazy sequence of files that are processed with a function using map. It would be nice to know how many files could be processed before realizing the sequence.
As inspired by soulcheck's answer, here is a lazy but counted map of an expensive function over a fixed size collection.
(defn foo [s f]
  (let [c (count s), res (map f s)]
    (reify
      clojure.lang.ISeq
      (seq [_] res)
      clojure.lang.Counted
      (count [_] c)
      clojure.lang.IPending
      (isRealized [_] (realized? res)))))
(def bar (foo (range 5) (fn [x] (Thread/sleep 1000) (inc x))))
(time (count bar))
;=> "Elapsed time: 0.016848 msecs"
; 5
(realized? bar)
;=> false
(time (into [] bar))
;=> "Elapsed time: 4996.398302 msecs"
; [1 2 3 4 5]
(realized? bar)
;=> true
(time (into [] bar))
;=> "Elapsed time: 0.042735 msecs"
; [1 2 3 4 5]
I suppose it's due to the fact that usually there are other ways to find out the size.
The only sequence implementation I can think of right now that could potentially do that is some kind of map of an expensive function/procedure over a known-size collection.
A simple implementation would return the size of the underlying collection, while postponing realization of the elements of the lazy-sequence (and therefore execution of the expensive part) until necessary.
In that case one knows the size of the collection that is being mapped over beforehand and can use that instead of the lazy-seq size.
It might be handy sometimes, which is why it's not impossible to implement, but I guess it's rarely necessary.

Clojure: Are side-effect-free calculations executed many times?

Let's say I have a side-effect-free function that I use repeatedly with the same parameters, without storing the results in a variable.
Does Clojure notice this and use the pre-calculated value of the function, or is the value recalculated every time?
Example:
(defn rank-selection [population fitness]
  (map
   #(select-with-probability (sort-by fitness population) %)
   (repeatedly (count population) #(rand))))

(defn rank-selection [population fitness]
  (let [sorted-population (sort-by fitness population)]
    (map
     #(select-with-probability sorted-population %)
     (repeatedly (count population) #(rand)))))
In the first version, sort-by is executed n times (where n is the size of the population).
In the second version, sort-by is executed once and the result is used n times.
Does Clojure store the result nonetheless?
Are these methods comparably fast?
Clojure doesn't store the results unless you specify that in your code, either by using memoize as mentioned in the comments or by saving the result in a local binding as you did.
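A minimal sketch of the memoize approach (the sorted-by-fitness wrapper is hypothetical, not from the original code):
;; memoize caches results keyed by the arguments, so repeated calls
;; with the same population and fitness reuse the first sort
(def sorted-by-fitness
  (memoize (fn [population fitness] (sort-by fitness population))))
Note that memoize holds on to every argument/result pair it has seen, so it trades memory for speed.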
Regarding the question of how fast one function is relative to the other, here's some code that measures the execution time of each (I had to mock the select-with-probability function). The doall calls are necessary to force the evaluation of the result of map.
(defn select-with-probability [x p]
  (when (< p 0.5)
    x))

(defn rank-selection [population fitness]
  (map
   #(select-with-probability (sort-by fitness population) %)
   (repeatedly (count population) rand)))

(defn rank-selection-let [population fitness]
  (let [sorted-population (sort-by fitness population)]
    (map
     #(select-with-probability sorted-population %)
     (repeatedly (count population) rand))))

(let [population (take 1000 (repeatedly #(rand-int 10)))]
  (time (doall (rank-selection population <)))
  (time (doall (rank-selection-let population <)))
  ;; So that we don't get the result seq
  nil)
This returns the following in my local environment:
"Elapsed time: 95.700138 msecs"
"Elapsed time: 1.477563 msecs"
nil
EDIT
In order to avoid the let form, you could also use partial, which takes a function and any number of arguments and returns a partial application of that function with those argument values supplied. The performance of the resulting code is in the same order as the let version, but it is more succinct and readable.
(defn rank-selection-partial [population fitness]
  (map
   (partial select-with-probability (sort-by fitness population))
   (repeatedly (count population) rand)))

(let [population (take 1000 (repeatedly #(rand-int 10)))]
  (time (doall (rank-selection-partial population <)))
  ;; So that we don't get the result seq
  nil)
;= "Elapsed time: 0.964413 msecs"
In Clojure sequences are lazy, but the rest of the language, including function evaluation, is eager. Clojure will invoke the function every time for you. Use the second version of your rank-selection function.
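A quick way to see this at the REPL (slow-inc is a made-up example, not from the question):
(defn slow-inc [x] (Thread/sleep 100) (inc x))

(time (slow-inc 1)) ; ~100 msecs
(time (slow-inc 1)) ; ~100 msecs again; nothing is cached automatically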

I have two versions of a function to count leading hash (#) characters, which is better?

I wrote a piece of code to count the leading hash (#) characters of a line, much like a heading line in Markdown:
### Line one      -> return 3
######## Line two -> return 6 (only care about the first 6 characters)
Version 1
(defn count-leading-hash
  [line]
  (let [cnt (count (take-while #(= % \#) line))]
    (if (> cnt 6) 6 cnt)))
Version 2
(defn count-leading-hash
  [line]
  (loop [cnt 0]
    (if (and (= (.charAt line cnt) \#) (< cnt 6))
      (recur (inc cnt))
      cnt)))
I used time to measure both implementations and found that the first version, based on take-while, is 2x faster than version 2. Taking "###### Line one" as input, version 1 took 0.09 msecs while version 2 took about 0.19 msecs.
Question 1. Is it recur that slows down the second implementation?
Question 2. Version 1 is closer to the functional programming paradigm, isn't it?
Question 3. Which one do you prefer? Why? (You're welcome to write your own implementation.)
--Update--
After reading the Clojure docs, I came up with a new version of this function, and I think it's much clearer.
(defn count-leading-hash
  [line]
  (->> line (take 6) (take-while #(= \# %)) count))
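For example:
(count-leading-hash "### Line one")      ;=> 3
(count-leading-hash "######## Line two") ;=> 6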
IMO it isn't useful to take time measurements for small pieces of code
Yes, version 1 is more functional
I prefer version 1 because it is easier to spot errors
I prefer version 1 because it is less code, thus less cost to maintain.
I would write the function like this:
(defn count-leading-hash [line]
  (count (take-while #{\#} (take 6 line))))
No, it's the reflection used to invoke .charAt. Call (set! *warn-on-reflection* true) before creating the function and you'll see the warning; a type-hinted fix is sketched below, after these answers.
Insofar as it uses HOFs, sure.
The first, though (if (> cnt 6) 6 cnt) is better written as (min 6 cnt).
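The usual way to eliminate that reflection is a type hint on the argument; a minimal sketch (the ^String hint is not in the original code, but it is the standard fix):
(set! *warn-on-reflection* true)

(defn count-leading-hash
  ;; ^String lets the compiler emit a direct String.charAt call
  ;; instead of a reflective lookup
  [^String line]
  (loop [cnt 0]
    (if (and (= (.charAt line cnt) \#) (< cnt 6))
      (recur (inc cnt))
      cnt)))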
1: No. recur is pretty fast. For every function you call, there is a bit of overhead and "noise" from the VM: the REPL needs to parse and evaluate your call, for example, or some garbage collection might happen. That's why benchmarks on such tiny bits of code don't mean anything.
Compare with:
(defn count-leading-hash
  [line]
  (let [cnt (count (take-while #(= % \#) line))]
    (if (> cnt 6) 6 cnt)))

(defn count-leading-hash2
  [line]
  (loop [cnt 0]
    (if (and (= (.charAt line cnt) \#) (< cnt 6))
      (recur (inc cnt))
      cnt)))

(def lines ["### Line one" "######## Line two"])

(time (dorun (repeatedly 10000 #(dorun (map count-leading-hash lines)))))
;; "Elapsed time: 620.628 msecs"
;; => nil

(time (dorun (repeatedly 10000 #(dorun (map count-leading-hash2 lines)))))
;; "Elapsed time: 592.721 msecs"
;; => nil
No significant difference.
2: Using loop/recur is not idiomatic in this instance; it's best to use it only when you really need it, and to use other available functions when you can. There are many useful functions that operate on collections/sequences; check ClojureDocs for a reference and examples. In my experience, people with an imperative programming background who are new to functional programming use loop/recur a lot more than those with a lot of Clojure experience; loop/recur can be a code smell.
3: I like the first version better. There are lots of different approaches:
;; more expensive, because it iterates n times, where n is the number of #'s
(defn count-leading-hash [line]
  (min 6 (count (take-while #(= \# %) line))))

;; takes at most 6 characters from line, so less expensive
(defn count-leading-hash [line]
  (count (take-while #(= \# %) (take 6 line))))

;; instead of an anonymous function, you can use `partial`
(defn count-leading-hash [line]
  (count (take-while (partial = \#) (take 6 line))))
edit:
How to decide when to use partial vs an anonymous function?
In terms of performance it doesn't matter, because (partial = \#) evaluates to (fn [& args] (apply = \# args)), while #(= \# %) translates to (fn [arg] (= \# arg)). Both are very similar, but partial gives you a function that accepts an arbitrary number of arguments, so in situations where you need that, it's the way to go. I'd say use whichever is easier to read, or partial if you need a function that takes an arbitrary number of arguments.
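For example:
((partial = \#) \# \# \#) ;=> true; accepts any number of arguments
(#(= \# %) \#)            ;=> true; exactly one argument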
Micro-benchmarks on the JVM are almost always misleading, unless you really know what you're doing. So, I wouldn't put too much weight on the relative performance of your two solutions.
The first solution is more idiomatic. You only really see explicit loop/recur in Clojure code when it's the only reasonable alternative. In this case, clearly, there is a reasonable alternative.
Another option, if you're comfortable with regular expressions:
(defn count-leading-hash [line]
  (count (or (re-find #"^#{1,6}" line) "")))
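For example:
(count-leading-hash "### Line one")      ;=> 3
(count-leading-hash "no leading hashes") ;=> 0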