(time (fib 30))
;; "Elapsed time: 8179.04028 msecs"
;; Fibonacci number with recursion and memoize.
(def m-fib
  (memoize (fn [n]
             (condp = n
               0 1
               1 1
               (+ (m-fib (dec n)) (m-fib (- n 2)))))))
(time (m-fib 30))
;; "Elapsed time: 1.282557 msecs"
This is the example code from https://clojuredocs.org/clojure.core/memoize, but when I run it in my browser, the time doesn't change at all. I'm wondering why that is.
I don't know what browser-based REPL you're using, but the left-hand column shows the result of the function call, not the time, which is why both values are the same. Both functions compute the same Fibonacci number.
The time macro is designed to wrap arbitrary expressions without having to change the structure of the code, so it returns the same value that's returned by the wrapped expression and prints the execution time to standard out. Essentially, the expression (time (m-fib 30)) is expanded to
(let [start (. System (nanoTime))
      ret (m-fib 30)]
  (println (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start)) 1000000.0) " msecs"))
  ret)
So to see the execution time in your REPL, you'll need to see the printed output of an expression.
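To see both halves of that contract at the REPL, here is a small sketch (the names result and printed are mine): time hands back the wrapped value, while the timing line goes to *out* and can be captured with with-out-str.

```clojure
;; time returns the value of the wrapped expression;
;; the "Elapsed time" line is printed to *out*, not returned.
(def result (time (reduce + (range 1000))))
;; result => 499500

;; with-out-str captures anything printed to *out*, so here we keep the
;; timing line as a string (and discard the value):
(def printed (with-out-str (time (reduce + (range 1000)))))
;; printed contains "Elapsed time: ... msecs"
```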
Task statement: concatenate two lists of 1e7 elements and find their sum. I'm trying to figure out an idiomatic way to write this in Clojure. And possibly also a fast non-idiomatic way, if warranted.
Here's what I got so far:
(def a (doall (vec (repeat 1e7 1))))
(def b (doall (vec (repeat 1e7 1))))
(println "Clojure:")
(time (def c (concat a b)))
(time (reduce + c))
Here's the result, using 1.9.0 with the shell command clojure -e '(load-file "clojure/concat.clj")':
Clojure:
"Elapsed time: 0.042615 msecs"
"Elapsed time: 636.798833 msecs"
20000000
There's quite a lot of room for improvement, compared to trivial implementations in Python (156ms), Java (159ms), SBCL (120ms), and C++ using STL algorithms (60ms).
I was curious about the tradeoff between just adding the numbers vs the memory allocations, so I wrote a bit of test code that uses both Clojure vectors and primitive (java) arrays. Results:
; verify we added numbers in (range 1e7) once or twice
(sum-vec) => 49999995000000
(into-sum-vec) => 99999990000000
ARRAY power = 7
"Elapsed time: 21.840198 msecs" ; sum once
"Elapsed time: 45.036781 msecs" ; 2 sub-sums, then add sub-totals
(timing (sum-sum-arr)) => 99999990000000
"Elapsed time: 397.254961 msecs" ; copy into 2x array, then sum
(timing (sum-arr2)) => 99999990000000
VECTOR power = 7
"Elapsed time: 112.522111 msecs" ; sum once from vector
"Elapsed time: 387.757729 msecs" ; make 2x vector, then sum
So we see that, using primitive long arrays (on my machine), we need 21 ms to sum 1e7 integers. If we do that sum twice and add the sub-totals, we get 45 ms elapsed time.
If we allocate a new array of length 2e7, copy in the first array twice, and then sum up the values, we get about 400ms which is 8x slower than the adding alone. So we see that the memory allocation & copying is by far the largest cost.
For the native Clojure vector case, we see a time of 112 ms to just sum up a preallocated vector of 1e7 integers. Combining the orig vector with itself into a 2e7 vector, then summing costs about 400ms, similar to the low-level array case. So we see that for large lists of data the memory IO cost overwhelms the details of native Java arrays vs Clojure vectors.
Code for the above (requires [tupelo "0.9.69"] ):
(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require [criterium.core :as crit]))

(defmacro timing [& forms]
  ; `(crit/quick-bench ~@forms)
  `(time ~@forms))

(def power 7)
(def reps (Math/pow 10 power))
(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))

; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
  (areduce data-arr i accum 0
    (+ accum (aget data-arr i)))) ; => 6300 ms (power 6)

(defn sum-arr []
  (let [data ^longs data-arr]
    (areduce data i accum 0
      (+ accum (aget data i))))) ; => 8 ms (power 6)

(defn sum-sum-arr []
  (let [data ^longs data-arr
        sum1 (areduce data i accum 0
               (+ accum (aget data i)))
        sum2 (areduce data i accum 0
               (+ accum (aget data i)))
        result (+ sum1 sum2)]
    result))

(defn sum-arr2 []
  (let [data   ^longs data-arr
        data2  (long-array (* 2 reps))
        >>     (dotimes [i reps] (aset data2 i (aget data i)))
        >>     (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
        result (areduce data2 i accum 0
                 (+ accum (aget data2 i)))]
    result))

(defn sum-vec [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))

(dotest
  (is= (spyx (sum-vec))
    (sum-arr))
  (is= (spyx (into-sum-vec))
    (sum-arr2)
    (sum-sum-arr))

  (newline)
  (println "-----------------------------------------------------------------------------")
  (println "ARRAY power = " power)
  (timing (sum-arr))
  (spyx (timing (sum-sum-arr)))
  (spyx (timing (sum-arr2)))

  (newline)
  (println "-----------------------------------------------------------------------------")
  (println "VECTOR power = " power)
  (timing (sum-vec))
  (timing (into-sum-vec)))
You can switch from time to using Criterium by changing the comment line in the timing macro. However, Criterium is meant for short tasks and you should probably keep power to only 5 or 6.
Can anyone explain why the time jumps by an order of magnitude simply by wrapping this in a function?
user> (time (loop [n 0 t 0]
(if (= n 10000000)
t
(recur (inc n) (+ t n)))))
"Elapsed time: 29.312145 msecs"
49999995000000
user> (defn tl [r] (loop [n 0 t 0]
(if (= n r)
t
(recur (inc n) (+ t n)))))
#<Var#54dd6004: #object[user$eval3462$tl__3463 0x7d8ba46 "user$eval3462$tl__3463#7d8ba46"]>
user> (time (tl 10000000))
"Elapsed time: 507.333844 msecs"
49999995000000
I'm curious how a simple iteration like this can be made much faster. For example, a similar iterative loop in C++ takes less than 1 ms in Release mode, or about 20 ms in Debug mode on the same system as this Clojure code.
This happens because in the second case the passed argument is boxed. Add a type hint to fix this:
user> (defn tl [^long r]
(loop [n 0 t 0]
(if (= n r)
t
(recur (inc n) (+ t n)))))
user> (time (tl 10000000))
"Elapsed time: 20.268396 msecs"
49999995000000
UPD:
1, 2) In the first case you operate on Java primitives, which is why it's so fast. ^Integer won't work here, because it's a type hint for the boxed type java.lang.Integer (that's why it's capitalized). ^long is the type hint for the Java long primitive. For function parameters, only the ^long and ^double primitive type hints are supported (besides these, you may type hint all kinds of primitive arrays, such as ^longs, ^floats, ^bytes, etc.).
3) Since the passed argument is boxed, this forces generic arithmetic for all operations. In other words, every + and inc operation will create a new object on the heap (with primitives, the values stay on the stack).
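To make point 2 concrete, here is a quick REPL sketch (the function name is mine); an unsupported primitive hint such as ^int fails at compile time:

```clojure
;; ^long on the parameter compiles to a primitive-taking method,
;; so arithmetic on x stays in unboxed longs
(defn inc-prim [^long x] (inc x))

(inc-prim 5) ; => 6

;; (defn inc-int [^int x] (inc x))
;; => CompilerException: Only long and double primitives are supported

;; Clojure 1.7+ can warn whenever the compiler falls back to boxed math:
;; (set! *unchecked-math* :warn-on-boxed)
```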
UPD 2:
As an alternative to type hinting, you may explicitly convert the passed argument to a primitive before the loop:
user> (defn tl [r]
(let [r (long r)]
(loop [n 0 t 0]
(if (= n r)
t
(recur (inc n) (+ t n))))))
user> (time (tl 10000000))
"Elapsed time: 18.907161 msecs"
49999995000000
I translated a piece of example Rust code to Clojure.
Rust (imperative and functional):
Note: the imperative and functional versions are shown together for clarity; in the tests, I ran them separately.
// The `AdditiveIterator` trait adds the `sum` method to iterators
use std::iter::AdditiveIterator;
use std::iter;

fn main() {
    println!("Find the sum of all the squared odd numbers under 1000");
    let upper = 1000u;

    // Imperative approach
    // Declare accumulator variable
    let mut acc = 0;
    // Iterate: 0, 1, 2, ... to infinity
    for n in iter::count(0u, 1) {
        // Square the number
        let n_squared = n * n;
        if n_squared >= upper {
            // Break loop if exceeded the upper limit
            break;
        } else if is_odd(n_squared) {
            // Accumulate value, if it's odd
            acc += n_squared;
        }
    }
    println!("imperative style: {}", acc);

    // Functional approach
    let sum_of_squared_odd_numbers =
        // All natural numbers
        iter::count(0u, 1).
        // Squared
        map(|n| n * n).
        // Below upper limit
        take_while(|&n| n < upper).
        // That are odd
        filter(|n| is_odd(*n)).
        // Sum them
        sum();
    println!("functional style: {}", sum_of_squared_odd_numbers);
}

fn is_odd(n: uint) -> bool {
    n % 2 == 1
}
Rust (imperative) time:
~/projects/rust_proj $> time ./hof_imperative
Find the sum of all the squared odd numbers under 1000
imperative style: 5456
real 0m0.006s
user 0m0.001s
sys 0m0.004s
~/projects/rust_proj $> time ./hof_imperative
Find the sum of all the squared odd numbers under 1000
imperative style: 5456
real 0m0.004s
user 0m0.000s
sys 0m0.004s
~/projects/rust_proj $> time ./hof_imperative
Find the sum of all the squared odd numbers under 1000
imperative style: 5456
real 0m0.005s
user 0m0.004s
sys 0m0.001s
Rust (Functional) time:
~/projects/rust_proj $> time ./hof
Find the sum of all the squared odd numbers under 1000
functional style: 5456
real 0m0.007s
user 0m0.001s
sys 0m0.004s
~/projects/rust_proj $> time ./hof
Find the sum of all the squared odd numbers under 1000
functional style: 5456
real 0m0.007s
user 0m0.007s
sys 0m0.000s
~/projects/rust_proj $> time ./hof
Find the sum of all the squared odd numbers under 1000
functional style: 5456
real 0m0.007s
user 0m0.004s
sys 0m0.003s
Clojure:
(defn sum-square-less-1000
  "Find the sum of all the squared odd numbers under 1000"
  []
  (->> (iterate inc 0)
       (map (fn [n] (* n n)))
       (take-while (partial > 1000))
       (filter odd?)
       (reduce +)))
Clojure time:
user> (time (sum-square-less-1000))
"Elapsed time: 0.443562 msecs"
5456
user> (time (sum-square-less-1000))
"Elapsed time: 0.201981 msecs"
5456
user> (time (sum-square-less-1000))
"Elapsed time: 0.4752 msecs"
5456
Question:
What's the difference of (reduce +) and (apply +) in Clojure?
Is this Clojure code the idiomatic way?
Can I draw the conclusion that, for speed, Clojure > Rust imperative > Rust functional? Clojure really surprised me with its performance here.
If you look at the source of +, you will see that (reduce +) and (apply +) are identical for higher argument counts. (apply +) is optimized for the 1 or 2 argument versions though.
(range) is going to be much faster than (iterate inc 0) for most cases, since range has a specialized chunked implementation, while iterate must call inc through a function object for every element.
partial is slower than a simple anonymous function, and should be reserved for cases where you don't know how many more args will be supplied.
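For reference, the two predicate forms mean the same thing here; the anonymous function simply calls > directly, while partial builds a closure that goes through apply and varargs handling on every invocation (a small sketch):

```clojure
;; both ask "is n less than 1000?"
((partial > 1000) 5)  ; => true  (closure that calls (apply > 1000 args))
(#(> 1000 %) 5)       ; => true  (direct call, no varargs machinery)
```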
Showing the results of benchmarking with criterium, we can see that applying those changes gives a 36% drop in execution time:
user> (crit/bench (->> (iterate inc 0)
(map (fn [n] (* n n)))
(take-while (partial > 1000))
(filter odd?)
(reduce +)))
WARNING: Final GC required 2.679748643529675 % of runtime
Evaluation count : 3522840 in 60 samples of 58714 calls.
Execution time mean : 16.954649 µs
Execution time std-deviation : 140.180401 ns
Execution time lower quantile : 16.720122 µs ( 2.5%)
Execution time upper quantile : 17.261693 µs (97.5%)
Overhead used : 2.208566 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user> (crit/bench (->> (range)
(map (fn [n] (* n n)))
(take-while #(> 1000 %))
(filter odd?)
(reduce +)))
Evaluation count : 5521440 in 60 samples of 92024 calls.
Execution time mean : 10.993332 µs
Execution time std-deviation : 118.100723 ns
Execution time lower quantile : 10.855536 µs ( 2.5%)
Execution time upper quantile : 11.238964 µs (97.5%)
Overhead used : 2.208566 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 1 (1.6667 %)
low-mild 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
The Clojure code looks idiomatic in my opinion, but it's doing a lot of unnecessary work. Here is an alternative that iterates directly over the odd numbers whose squares stay under 1000 (31^2 = 961 is the last one):
(reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2))
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.180778 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.255972 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.346192 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.162615 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.257901 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.175507 msecs"
5456
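As an aside, the same pipeline can also be written with transducers, which fuse the steps into a single pass and avoid allocating the intermediate lazy sequences (a sketch, not benchmarked here):

```clojure
;; same computation as the threaded version, as one transducing pass;
;; take-while terminates the reduction over the infinite (range)
(transduce (comp (map #(* % %))
                 (take-while #(> 1000 %))
                 (filter odd?))
           +
           (range))
;; => 5456
```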
You can't really conclude that one is faster than the other based on this test though. Benchmarking is a tricky game. You need to test your programs in production-like environments with heavy inputs to get any meaningful results.
What's the difference of (reduce +) and (apply +) in Clojure?
apply is a higher-order function. Its first argument is a function (usually one of variable arity), followed by any number of intervening args, and the last arg must be a seq of args. It works by consing the intervening args onto that seq and then passing all of them to the function.
Example:
(apply + 0 1 2 3 '(4 5 6 7))
=> (apply + '(0 1 2 3 4 5 6 7))
=> (+ 0 1 2 3 4 5 6 7)
=> result
As for reduce, well I think the docs say it clearly
user=> (doc reduce)
-------------------------
clojure.core/reduce
([f coll] [f val coll])
f should be a function of 2 arguments. If val is not supplied,
returns the result of applying f to the first 2 items in coll, then
applying f to that result and the 3rd item, etc. If coll contains no
items, f must accept no arguments as well, and reduce returns the
result of calling f with no arguments. If coll has only 1 item, it
is returned and f is not called. If val is supplied, returns the
result of applying f to val and the first item in coll, then
applying f to that result and the 2nd item, etc. If coll contains no
items, returns val and f is not called.
nil
There are situations where you could use either apply f coll or reduce f coll, but I normally use apply when f has variable arity, and reduce when f is a 2-ary function.
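A small illustration of where the choice matters in practice: str has an efficient variadic arity, so apply hands it the whole collection at once, while reduce makes repeated 2-ary calls, building an intermediate string each step. They can also disagree on edge cases:

```clojure
(apply str ["a" "b" "c"])   ; => "abc" — one variadic call
(reduce str ["a" "b" "c"])  ; => "abc" — same result, via N-1 two-arg calls

;; with a single item, reduce returns it without ever calling the function:
(reduce + ["not a number"]) ; => "not a number" (+ is never called)
;; (apply + ["not a number"]) would throw a ClassCastException
```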
I'm approximating π using the alternating (Leibniz) series π/4 = 1 - 1/3 + 1/5 - 1/7 + …
The function for the series then looks like this:
(defn- pi-series [k]
  (/ (if (even? (inc k)) 1 -1)
     (dec (* 2 k))))
And then my series generator looks like this*:
(defn pi [n]
  (* 4
     (loop [k 1
            acc 0]
       (if (= k (inc n))
         acc
         (recur (inc k)
                (+ acc (double (pi-series k))))))))
Running pi with the value 999,999 produces the following:
(time (pi 999999))
;;=> "Elapsed time: 497.686 msecs"
;;=> 3.1415936535907734
That looks great, but I realized pi could be written more declaratively. Here's what I ended up with:
(defn pi-fn [n]
  (* 4 (reduce +
               (map #(double (pi-series %))
                    (range 1 (inc n))))))
Which resulted in the following:
(time (pi-fn 999999))
;;=> "Elapsed time: 4431.626 msecs"
;;=> 3.1415936535907734
NOTE: The declarative version took around 4 seconds longer. Why?
Why is the declarative version so much slower? How can I update the declarative version to make it as fast as the imperative version?
* I'm casting the result of pi-series to a double, because using Clojure's ratio type performed a lot slower.
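For reference, without the cast each term stays an exact clojure.lang.Ratio, and the running sum accumulates ever-larger numerators and denominators, which is what makes it so slow (a quick REPL sketch):

```clojure
(/ 1 3)          ; => 1/3, a clojure.lang.Ratio (exact, arbitrary precision)
(class (/ 1 3))  ; => clojure.lang.Ratio
(/ 1.0 3)        ; => 0.3333333333333333, an inexact Double
(+ 1/3 1/5)      ; => 8/15 — exact sums keep growing the denominator
```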
By the way, you can express an alternating finite sum as a difference of two sums, eliminating the need to adjust each term for sign individually. For example,
(defn alt-sum [f n]
  (- (apply + (map f (range 1 (inc n) 2)))
     (apply + (map f (range 2 (inc n) 2)))))
(time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
; "Elapsed time: 195.244047 msecs"
;= 3.141593653590707
On my laptop pi runs at 2500 msec. However, pi and pi-fn (either version) run at approx. the same rate (10x slower than alt-sum). More often than not, pi-fn is faster than pi. Are you sure you didn't accidentally insert an extra 9 before the second timing? Contra Juan, I do not think you're iterating over the sequence more than once, since the terms are generated lazily.
scratch.core> (time (pi 999999))
"Elapsed time: 2682.86669 msecs"
3.1415936535907734
scratch.core> (time (pi-fn 999999))
"Elapsed time: 2082.071798 msecs"
3.1415936535907734
scratch.core> (time (pi-fn-juan 999999))
"Elapsed time: 1934.976217 msecs"
3.1415936535907734
scratch.core> (time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
"Elapsed time: 199.998438 msecs"
3.141593653590707
I implemented a function that returns the n-grams of a given input collection as a lazy seq.
(defn gen-ngrams
  [n coll]
  (if (>= (count coll) n)
    (lazy-seq (cons (take n coll) (gen-ngrams n (rest coll))))))
When I call this function with larger input collections, I would expect to see a linear increase in execution time. However, the timing I observe is worse than that:
user> (time (count (gen-ngrams 3 (take 1000 corpus))))
"Elapsed time: 59.426 msecs"
998
user> (time (count (gen-ngrams 3 (take 10000 corpus))))
"Elapsed time: 5863.971 msecs"
9998
user> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 23584.226 msecs"
19998
user> (time (count (gen-ngrams 3 (take 30000 corpus))))
"Elapsed time: 54905.999 msecs"
29998
user> (time (count (gen-ngrams 3 (take 40000 corpus))))
"Elapsed time: 100978.962 msecs"
39998
corpus is a Cons of string tokens.
What causes this behavior and how can I improve the performance?
I think your issue is with (count coll), which iterates over the whole remaining coll on each call to gen-ngrams: counting a lazy seq is O(length), so the total work is quadratic, which matches your timings.
The solution would be to use the built-in partition function:
user=> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 6212.894932 msecs"
19998
user=> (time (count (partition 3 1 (take 20000 corpus))))
"Elapsed time: 12.57996 msecs"
19998
Have a look at the partition source if you're curious about the implementation: http://clojuredocs.org/clojure_core/clojure.core/partition
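For completeness, the original function can also be repaired without partition by counting only the n-element window (O(n) per step) instead of the whole remaining collection. A sketch along those lines:

```clojure
(defn gen-ngrams
  [n coll]
  (lazy-seq
    (let [window (take n coll)]
      ;; count the bounded window, not the (possibly huge) tail
      (when (= n (count window))
        (cons window (gen-ngrams n (rest coll)))))))

(gen-ngrams 3 [1 2 3 4 5])
;; => ((1 2 3) (2 3 4) (3 4 5)) — the same windows (partition 3 1 ...) produces
```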
I am far from a Clojure expert, but I think the cons function causes this problem.
Try to use list instead:
(defn gen-ngrams
  [n coll]
  (if (>= (count coll) n)
    (lazy-seq (list (take n coll) (gen-ngrams n (rest coll))))))
I think cons constructs a new seq, which is more generic than a list and is therefore slower.
Edit: and if "corpus is a Cons of string tokens", then try to make it a list...