Memoize a Clojure function that takes a lazy sequence as input

How can I make memoize work when the argument to a memoized function is a sequence?
(defn foo
  ([x] (println "Hello First") (reduce + x))
  ([x y] (println "Hello Second") (reduce + (range x y))))
(def baz (memoize foo))
Passing one arg:
1) (time (baz (range 1 1000000))) ;=> Hello First "Elapsed time: 14.870628 msecs"
2) (time (baz (range 1 1000000))) ;=> "Elapsed time: 65.386561 msecs"
Passing 2 args:
1) (time (baz 1 1000000)) ;=> Hello Second "Elapsed time: 18.619768 msecs"
2) (time (baz 1 1000000)) ;=> "Elapsed time: 0.069684 msecs"
The second call with 2 arguments behaves as I expect.
However, using a vector appears to work:
(time (baz [1 2 3 5 3 5 7 4 6 7 4 45 6 7])) ;=> Hello First "Elapsed time: 0.294963 msecs"
(time (baz [1 2 3 5 3 5 7 4 6 7 4 45 6 7])) ;=> "Elapsed time: 0.068229 msecs"

memoize does work with sequences; you just need to compare apples to apples. memoize looks up the argument list among the keys of a map of previously used ones, and as a result you end up comparing (and hashing) the sequences. Comparing long sequences is what takes a long time, whether they are vectors or not:
user> (def x (vec (range 1000000)))
;; => #'user/x
user> (def y (vec (range 1000000)))
;; => #'user/y
user> (time (= x y))
"Elapsed time: 64.351274 msecs"
;; => true
user> (time (baz x))
"Elapsed time: 67.42694 msecs"
;; => 499999500000
user> (time (baz x))
"Elapsed time: 73.231174 msecs"
;; => 499999500000
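For illustration, here is a simplified memoize that, as far as I can tell, mirrors the body of clojure.core/memoize: results live in an atom holding a map from the argument list to the result. The names simple-memoize, calls and memo-sum are made up for this sketch.

```clojure
(defn simple-memoize
  "Simplified sketch of clojure.core/memoize (illustration only)."
  [f]
  (let [mem (atom {})]                  ; argument list -> cached result
    (fn [& args]
      ;; find has to compare args with the stored keys (and hash it once the
      ;; map has grown into a hash map); for a long, freshly built sequence
      ;; argument that work walks every element
      (if-let [e (find @mem args)]
        (val e)
        (let [ret (apply f args)]
          (swap! mem assoc args ret)
          ret)))))

;; Usage: count how often the wrapped function really runs.
(def calls (atom 0))
(def memo-sum
  (simple-memoize (fn [xs] (swap! calls inc) (reduce + xs))))

(memo-sum [1 2 3])       ; computed: returns 6
(memo-sum (list 1 2 3))  ; an equal argument, so it is served from the cache
@calls                   ; => 1
```

The second call passes a list rather than a vector, but because (= [1 2 3] '(1 2 3)) is true it counts as the same key. That is why a fresh (range 1 1000000) still hits the cache, and also why finding it requires walking both sequences.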
When you use very short input sequences, the timing is dominated by the reduce inside your function. But with very long ones most of the time you see is actually the comparison time inside memoize.
So technically memoize works, and in the same way for all sequences. But working "technically" doesn't imply "being useful." As you have discovered yourself, it is useless (maybe even harmful) for input with expensive comparison semantics. Your second signature solves this problem, because its cache key is just two numbers, which are cheap to compare and hash.

Related

How to make this Clojure code faster?

Task statement: concatenate two lists of 1e7 elements and find their sum. I'm trying to figure out an idiomatic way to write this in Clojure. And possibly also a fast non-idiomatic way, if warranted.
Here's what I got so far:
(def a (doall (vec (repeat 1e7 1))))
(def b (doall (vec (repeat 1e7 1))))
(println "Clojure:")
(time (def c (concat a b)))
(time (reduce + c))
Here's the result, using Clojure 1.9.0 with the shell command clojure -e '(load-file "clojure/concat.clj")':
Clojure:
"Elapsed time: 0.042615 msecs"
"Elapsed time: 636.798833 msecs"
20000000
There's quite a lot of room for improvement compared to trivial implementations in Python (156ms), Java (159ms), SBCL (120ms), and C++ using STL algorithms (60ms).
I was curious about the tradeoff between just adding the numbers vs the memory allocations, so I wrote a bit of test code that uses both Clojure vectors and primitive (java) arrays. Results:
; verify we added numbers in (range 1e7) once or twice
(sum-vec) => 49999995000000
(into-sum-vec) => 99999990000000
ARRAY power = 7
"Elapsed time: 21.840198 msecs" ; sum once
"Elapsed time: 45.036781 msecs" ; 2 sub-sums, then add sub-totals
(timing (sum-sum-arr)) => 99999990000000
"Elapsed time: 397.254961 msecs" ; copy into 2x array, then sum
(timing (sum-arr2)) => 99999990000000
VECTOR power = 7
"Elapsed time: 112.522111 msecs" ; sum once from vector
"Elapsed time: 387.757729 msecs" ; make 2x vector, then sum
So we see that, using primitive long arrays (on my machine), we need 21 ms to sum 1e7 integers. If we do that sum twice and add the sub-totals, we get 45 ms elapsed time.
If we allocate a new array of length 2e7, copy in the first array twice, and then sum up the values, we get about 400 ms, which is roughly 8x slower than just adding the same 2e7 numbers (45 ms). So memory allocation and copying is by far the largest cost.
For the native Clojure vector case, we see a time of 112 ms to just sum up a preallocated vector of 1e7 integers. Combining the original vector with itself into a 2e7 vector, then summing, costs about 400 ms, similar to the low-level array case. So for large lists of data, the memory IO cost overwhelms the difference between native Java arrays and Clojure vectors.
Code for the above (requires [tupelo "0.9.69"] and criterium):
(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require [criterium.core :as crit]))
(defmacro timing [& forms]
  ; `(crit/quick-bench ~@forms)
  `(time ~@forms))
(def power 7)
(def reps (Math/pow 10 power))
(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))
; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
  (areduce data-arr i accum 0
    (+ accum (aget data-arr i)))) ; => 6300 ms (power 6)
(defn sum-arr []
  (let [data ^longs data-arr]
    (areduce data i accum 0
      (+ accum (aget data i))))) ; => 8 ms (power 6)
(defn sum-sum-arr []
  (let [data   ^longs data-arr
        sum1   (areduce data i accum 0
                 (+ accum (aget data i)))
        sum2   (areduce data i accum 0
                 (+ accum (aget data i)))
        result (+ sum1 sum2)]
    result))
(defn sum-arr2 []
  (let [data   ^longs data-arr
        data2  (long-array (* 2 reps))
        >>     (dotimes [i reps] (aset data2 i (aget data i)))
        >>     (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
        result (areduce data2 i accum 0
                 (+ accum (aget data2 i)))]
    result))
(defn sum-vec [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))
(dotest
  (is= (spyx (sum-vec))
    (sum-arr))
  (is= (spyx (into-sum-vec))
    (sum-arr2)
    (sum-sum-arr))
  (newline) (println "-----------------------------------------------------------------------------")
  (println "ARRAY power = " power)
  (timing (sum-arr))
  (spyx (timing (sum-sum-arr)))
  (spyx (timing (sum-arr2)))
  (newline) (println "-----------------------------------------------------------------------------")
  (println "VECTOR power = " power)
  (timing (sum-vec))
  (timing (into-sum-vec)))
You can switch from time to Criterium by swapping the commented-out line in the timing macro. However, Criterium is meant for short tasks, so you should probably keep power at 5 or 6.
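For the original task, one idiomatic option that the answer above does not time is to let a transducer walk both vectors in turn, so the 2e7-element concatenation is never built. This is a sketch: sum-concat is a made-up name, and I make no claim about how it compares with the figures above.

```clojure
(defn sum-concat
  "Sum of every element of a followed by every element of b."
  [a b]
  ;; the cat transducer passes each element of a, then each element of b,
  ;; straight to +, so no concatenated sequence is ever built
  (transduce cat + [a b]))

;; Usage with the task's sizes (1e7 ones in each vector)
(def a (vec (repeat 10000000 1)))
(def b (vec (repeat 10000000 1)))
(time (sum-concat a b)) ; the sum is 20000000
```

Summing each vector separately and adding the two totals, (+ (reduce + a) (reduce + b)), is the same idea without transducers; it corresponds to the sum-sum-arr case above rather than the sum-arr2 one.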

Rust vs. Clojure speed comparison, any improvement for the Clojure code?

I translated a piece of Rust code example to Clojure.
Rust (imperative and functional):
Note: Both imperative and functional code here are together for clarity. In the test, I run them separately.
// The `AdditiveIterator` trait adds the `sum` method to iterators
use std::iter::AdditiveIterator;
use std::iter;
fn main() {
    println!("Find the sum of all the squared odd numbers under 1000");
    let upper = 1000u;
    // Imperative approach
    // Declare accumulator variable
    let mut acc = 0;
    // Iterate: 0, 1, 2, ... to infinity
    for n in iter::count(0u, 1) {
        // Square the number
        let n_squared = n * n;
        if n_squared >= upper {
            // Break loop if exceeded the upper limit
            break;
        } else if is_odd(n_squared) {
            // Accumulate value, if it's odd
            acc += n_squared;
        }
    }
    println!("imperative style: {}", acc);
    // Functional approach
    let sum_of_squared_odd_numbers =
        // All natural numbers
        iter::count(0u, 1).
        // Squared
        map(|n| n * n).
        // Below upper limit
        take_while(|&n| n < upper).
        // That are odd
        filter(|n| is_odd(*n)).
        // Sum them
        sum();
    println!("functional style: {}", sum_of_squared_odd_numbers);
}
fn is_odd(n: uint) -> bool {
    n % 2 == 1
}
Rust (imperative) time:
~/projects/rust_proj $> time ./hof_imperative
Find the sum of all the squared odd numbers under 1000
imperative style: 5456
real 0m0.006s
user 0m0.001s
sys 0m0.004s
~/projects/rust_proj $> time ./hof_imperative
Find the sum of all the squared odd numbers under 1000
imperative style: 5456
real 0m0.004s
user 0m0.000s
sys 0m0.004s
~/projects/rust_proj $> time ./hof_imperative
Find the sum of all the squared odd numbers under 1000
imperative style: 5456
real 0m0.005s
user 0m0.004s
sys 0m0.001s
Rust (Functional) time:
~/projects/rust_proj $> time ./hof
Find the sum of all the squared odd numbers under 1000
functional style: 5456
real 0m0.007s
user 0m0.001s
sys 0m0.004s
~/projects/rust_proj $> time ./hof
Find the sum of all the squared odd numbers under 1000
functional style: 5456
real 0m0.007s
user 0m0.007s
sys 0m0.000s
~/projects/rust_proj $> time ./hof
Find the sum of all the squared odd numbers under 1000
functional style: 5456
real 0m0.007s
user 0m0.004s
sys 0m0.003s
Clojure:
(defn sum-square-less-1000
  "Find the sum of all the squared odd numbers under 1000"
  []
  (->> (iterate inc 0)
       (map (fn [n] (* n n)))
       (take-while (partial > 1000))
       (filter odd?)
       (reduce +)))
Clojure time:
user> (time (sum-square-less-1000))
"Elapsed time: 0.443562 msecs"
5456
user> (time (sum-square-less-1000))
"Elapsed time: 0.201981 msecs"
5456
user> (time (sum-square-less-1000))
"Elapsed time: 0.4752 msecs"
5456
Question:
What's the difference between (reduce +) and (apply +) in Clojure?
Is this Clojure code idiomatic?
Can I conclude that, in speed, Clojure > Rust imperative > Rust functional? Clojure really surprised me here with its performance.
If you look at the source of +, you will see that (reduce +) and (apply +) are identical for higher argument counts. (apply +) is optimized for the 1 or 2 argument versions though.
(range) is going to be much faster than (iterate inc 0) in most cases.
partial is slower than a simple anonymous function, and should be reserved for cases where you don't know how many more args will be supplied.
Benchmarking with criterium shows that applying those changes gives roughly a 35% drop in execution time:
user> (crit/bench (->> (iterate inc 0)
                       (map (fn [n] (* n n)))
                       (take-while (partial > 1000))
                       (filter odd?)
                       (reduce +)))
WARNING: Final GC required 2.679748643529675 % of runtime
Evaluation count : 3522840 in 60 samples of 58714 calls.
Execution time mean : 16.954649 µs
Execution time std-deviation : 140.180401 ns
Execution time lower quantile : 16.720122 µs ( 2.5%)
Execution time upper quantile : 17.261693 µs (97.5%)
Overhead used : 2.208566 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user> (crit/bench (->> (range)
                       (map (fn [n] (* n n)))
                       (take-while #(> 1000 %))
                       (filter odd?)
                       (reduce +)))
Evaluation count : 5521440 in 60 samples of 92024 calls.
Execution time mean : 10.993332 µs
Execution time std-deviation : 118.100723 ns
Execution time lower quantile : 10.855536 µs ( 2.5%)
Execution time upper quantile : 11.238964 µs (97.5%)
Overhead used : 2.208566 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 1 (1.6667 %)
low-mild 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
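A further variation, not benchmarked above: the same pipeline written with transducers, which fuses map, take-while and filter into a single pass over (range) without building intermediate lazy sequences. The infinite (range) is fine here because the take-while transducer stops the reduction early. The function name is made up for this sketch.

```clojure
(defn sum-odd-squares-under-1000
  "Sum of the odd squares below 1000, as one transducer pipeline."
  []
  (transduce (comp (map (fn [n] (* n n)))    ; square each natural number
                   (take-while #(< % 1000))  ; stop at the first square >= 1000
                   (filter odd?))            ; keep only odd squares
             +
             (range)))

(sum-odd-squares-under-1000) ; => 5456
```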
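The Clojure code looks idiomatic in my opinion, but it's doing a lot of unnecessary work. A square is odd exactly when its root is odd, and 31*31 = 961 is the largest odd square below 1000, so you can sum the squares of the odd numbers from 1 to 31 directly: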
(reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2))
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.180778 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.255972 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.346192 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.162615 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.257901 msecs"
5456
user=> (time (reduce #(+ %1 (* %2 %2)) 0 (range 1 32 2)))
"Elapsed time: 0.175507 msecs"
5456
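Taking the same observation one step further (my addition, not part of the answer): since 1² + 3² + ... + (2m - 1)² = m(2m - 1)(2m + 1)/3, the sum can be computed without iterating at all. sum-odd-squares-below is a hypothetical name, and because it uses a floating-point square root this sketch is only meant for modest limits.

```clojure
(defn sum-odd-squares-below
  "Sum of k*k over odd k with k*k < limit (limit >= 1), using the closed form
  1^2 + 3^2 + ... + (2m-1)^2 = m(2m-1)(2m+1)/3."
  [limit]
  (let [k-max (long (Math/sqrt (dec limit)))  ; largest k with k*k < limit
        m     (quot (inc k-max) 2)]           ; number of odd k in 1..k-max
    ;; one of 2m-1, 2m, 2m+1 is divisible by 3, so quot is exact
    (quot (* m (dec (* 2 m)) (inc (* 2 m))) 3)))

(sum-odd-squares-below 1000) ; => 5456, with m = 16: 16 * 31 * 33 / 3
```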
You can't really conclude that one is faster than the other based on this test, though. In particular, the Rust numbers come from the shell's time command and include starting the process, while Clojure's time measures a single call inside an already running JVM, so the two are not measuring the same thing. Benchmarking is a tricky game: you need to test your programs in production-like environments with heavy inputs to get meaningful results.
What's the difference between (reduce +) and (apply +) in Clojure?
apply is a higher-order function with variable arity. Its first argument is a function, followed by any number of intervening args, and the last arg must be a sequence of args. It works by consing the intervening args onto the front of that sequence, then calling the function with the result as its arguments.
Example:
(apply + 0 1 2 3 '(4 5 6 7))
=> (apply + '(0 1 2 3 4 5 6 7))
=> (+ 0 1 2 3 4 5 6 7)
=> 28
As for reduce, I think the docstring says it clearly:
user=> (doc reduce)
-------------------------
clojure.core/reduce
([f coll] [f val coll])
f should be a function of 2 arguments. If val is not supplied,
returns the result of applying f to the first 2 items in coll, then
applying f to that result and the 3rd item, etc. If coll contains no
items, f must accept no arguments as well, and reduce returns the
result of calling f with no arguments. If coll has only 1 item, it
is returned and f is not called. If val is supplied, returns the
result of applying f to val and the first item in coll, then
applying f to that result and the 2nd item, etc. If coll contains no
items, returns val and f is not called.
nil
There are situations where you could use either apply f coll or reduce f coll, but I normally use apply when f has variable arity, and reduce when f is a 2-ary function.
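A few REPL one-liners make the distinction concrete. The str case illustrates the rule of thumb above: str's variadic arity builds the whole result in one StringBuilder, so (apply str ...) makes a single call, while (reduce str ...) calls str repeatedly, two arguments at a time, producing an intermediate string at each step.

```clojure
;; + reduces over its extra arguments itself, so both give the same answer
(= (apply + [1 2 3 4]) (reduce + [1 2 3 4]))  ; => true

;; str is variadic: apply makes one call, reduce makes one call per step
(apply str ["a" "b" "c"])   ; => "abc"
(reduce str ["a" "b" "c"])  ; => "abc", via the intermediate string "ab"

;; with an initial value and an empty collection, reduce returns the value
;; and never calls f
(reduce + 100 [])           ; => 100
```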

PI Approximation: Why is my declarative version slower?

I'm approximating PI using the series:
$$\pi = 4 \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{2k - 1}$$
The function for the series then looks like this:
(defn- pi-series [k]
  (/ (if (even? (inc k)) 1 -1)
     (dec (* 2 k))))
And then my series generator looks like this:
(defn pi [n]
  (* 4
     (loop [k 1
            acc 0]
       (if (= k (inc n))
         acc
         (recur (inc k)
                (+ acc (double (pi-series k))))))))
Running pi with the value 999,999 produces the following:
(time (pi 999999))
;;=> "Elapsed time: 497.686 msecs"
;;=> 3.1415936535907734
That looks great, but I realized pi could be written more declaratively. Here's what I ended up with:
(defn pi-fn [n]
  (* 4 (reduce +
               (map #(double (pi-series %))
                    (range 1 (inc n))))))
Which resulted in the following:
(time (pi-fn 999999))
;;=> "Elapsed time: 4431.626 msecs"
;;=> 3.1415936535907734
NOTE: The declarative version took around 4 seconds longer.
Why is the declarative version so much slower? How can I update the declarative version to make it as fast as the imperative version?
I'm casting the result of pi-series to a double, because using clojure's ratio types performed a lot slower.
By the way, you can express an alternating finite sum as a difference of two sums, eliminating the need to adjust each term for sign individually. For example,
(defn alt-sum [f n]
  (- (apply + (map f (range 1 (inc n) 2)))
     (apply + (map f (range 2 (inc n) 2)))))
(time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
; "Elapsed time: 195.244047 msecs"
;= 3.141593653590707
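Written out, the identity behind alt-sum for a finite n, where the term used in the example above, #(/ 1.0 (dec (+ % %))), is f(k) = 1/(2k - 1):

```latex
\sum_{k=1}^{n} (-1)^{k+1} f(k)
  = \sum_{\substack{1 \le k \le n \\ k \text{ odd}}} f(k)
  - \sum_{\substack{1 \le k \le n \\ k \text{ even}}} f(k),
\qquad
f(k) = \frac{1}{2k - 1},
\qquad
\pi \approx 4 \sum_{k=1}^{n} \frac{(-1)^{k+1}}{2k - 1}.
```

The two ranges in alt-sum, (range 1 (inc n) 2) and (range 2 (inc n) 2), enumerate exactly the odd and even k in 1..n.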
On my laptop pi runs in about 2500 ms, but pi and pi-fn (either version) run at roughly the same rate, about 10x slower than alt-sum; more often than not, pi-fn is actually faster than pi. Are you sure you didn't accidentally insert an extra 9 before the second timing? Contra Juan, I don't think you're iterating over the sequence more than once, since the terms are generated lazily.
scratch.core> (time (pi 999999))
"Elapsed time: 2682.86669 msecs"
3.1415936535907734
scratch.core> (time (pi-fn 999999))
"Elapsed time: 2082.071798 msecs"
3.1415936535907734
scratch.core> (time (pi-fn-juan 999999))
"Elapsed time: 1934.976217 msecs"
3.1415936535907734
scratch.core> (time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
"Elapsed time: 199.998438 msecs"
3.141593653590707
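One way to keep the declarative shape while avoiding both the Ratio arithmetic in pi-series and the intermediate lazy seq produced by map is to transduce over the range. This is a sketch I have not timed against the numbers above; pi-xf is a made-up name, and it computes each term as a double directly instead of calling pi-series, so the last digits may differ slightly.

```clojure
(defn pi-xf
  "Approximates pi with the first n terms of the series, summed with transduce."
  [n]
  (* 4 (transduce (map (fn [k]
                         ;; same term as pi-series, but computed as a double
                         ;; directly instead of via a Ratio
                         (/ (if (odd? k) 1.0 -1.0) (dec (* 2 k)))))
                  +
                  (range 1 (inc n)))))

(time (pi-xf 999999))
```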

Head retention in Clojure

Reading the paragraph about head retention in "Clojure Programming" (page 98), I couldn't figure out what happens in the split-with example. I've tried experimenting in the REPL, but it only made me more confused.
(time (let [r (range 1e7)
            a (take-while #(< % 12) r)
            b (drop-while #(< % 12) r)]
        [(count a) (count b)]))
"Elapsed time: 1910.401711 msecs"
[12 9999988]
(time (let [r (range 1e7)
            a (take-while #(< % 12) r)
            b (drop-while #(< % 12) r)]
        [(count b) (count a)]))
"Elapsed time: 3580.473787 msecs"
[9999988 12]
(time (let [r (range 1e7)
            a (take-while #(< % 12) r)
            b (drop-while #(< % 12) r)]
        [(count b)]))
"Elapsed time: 3516.70982 msecs"
[9999988]
As you can see from the last example, if I don't count a, the elapsed time somehow grows. I guess I've missed something here, but what?
Count is O(1). That's why your measurements don't depend on it.
The count function is O(1) for Counted collections, which includes vectors and lists.
Sequences, on the other hand, are not counted which makes count O(n) for them. The important part here is that the functions take-while and drop-while return sequences. The fact that they are also lazy is not a major factor here.
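clojure.core/counted? returns true when a collection implements count in constant time, which makes the distinction easy to check at the REPL:

```clojure
(counted? [1 2 3])                            ; => true  (vector)
(counted? '(1 2 3))                           ; => true  (list)
(counted? (take-while #(< % 12) (range 20)))  ; => false (lazy seq)
(counted? (drop-while #(< % 12) (range 20)))  ; => false (lazy seq)
```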
When using time as a benchmark, run the tests many times to get an accurate result:
user> (defn example2 [] (let [r (range 1e7)
                              a (take-while #(< % 12) r)
                              b (drop-while #(< % 12) r)]
                          [(count a) (count b)]))
#'user/example2
user> (dorun (take 1000 (repeatedly example2)))
nil
user> (time (example2))
"Elapsed time: 614.4 msecs"
[12 9999988]
I blame the variance in runtime on the HotSpot compiler not yet having fully optimized the generated classes. I ran the first and second examples several times and got mixed relative results:
run example one twice:
autotestbed.core> (time (let [r (range 1e7)
                              a (take-while #(< % 12) r)
                              b (drop-while #(< % 12) r)]
                          [(count a) (count b)]))
"Elapsed time: 929.681423 msecs"
[12 9999988]
autotestbed.core> (time (let [r (range 1e7)
                              a (take-while #(< % 12) r)
                              b (drop-while #(< % 12) r)]
                          [(count a) (count b)]))
"Elapsed time: 887.81269 msecs"
[12 9999988]
then run example two a couple of times:
core> (time (let [r (range 1e7)
                  a (take-while #(< % 12) r)
                  b (drop-while #(< % 12) r)]
              [(count a) (count b)]))
"Elapsed time: 3838.751561 msecs"
[12 9999988]
core> (time (let [r (range 1e7)
                  a (take-while #(< % 12) r)
                  b (drop-while #(< % 12) r)]
              [(count a) (count b)]))
"Elapsed time: 970.397078 msecs"
[12 9999988]
Sometimes the second example is just as fast.
Bindings in a let form are evaluated even if we don't use their values.
(let [x (println "Side effect")] 1)
The code above prints "Side effect" and returns 1.
All three of your examples use the same bindings in the let form, so I don't see any difference here. By the way, on my machine all your snippets took approximately the same time.
The real difference shows up when you try something like this:
(time (let [r (range 2e7)
            a (take 100 r)
            b (drop 100 r)]
        [(count a)]))
"Elapsed time: 0.128304 msecs"
[100]
(time (let [r (range 2e7)
            a (take 100 r)
            b (drop 100 r)]
        [(count b)]))
"Elapsed time: 3807.591342 msecs"
[19999900]
Because a and b are lazy sequences, count runs in O(n) time. In the first example we never count b, so it finishes almost immediately.
The time it shows is completely system-dependent; if you re-execute it, you will see a different elapsed time for each execution.
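A sketch of the head-retention effect the question asks about (my reading, not something the answers above state). The two helpers are hypothetical names wrapping the question's let forms, with the size as a parameter so small inputs can be tried first. What matters is which lazy seq is still unrealized while the long one is being counted.

```clojure
(defn count-a-first [n]
  (let [r (range n)
        a (take-while #(< % 12) r)
        b (drop-while #(< % 12) r)]
    ;; counting a realizes all of it; once realized, a's cells hold only the
    ;; numbers they took, not r, and the local r is cleared after its last
    ;; use, so the part of r that b has walked past can be garbage collected
    [(count a) (count b)]))

(defn count-b-first [n]
  (let [r (range n)
        a (take-while #(< % 12) r)
        b (drop-while #(< % 12) r)]
    ;; a is still unrealized while b is counted; an unrealized lazy seq holds
    ;; on to its source, so the head of r, and every cell b walks past,
    ;; stays reachable until (count a) finally runs
    [(count b) (count a)]))

(count-a-first 100000) ; => [12 99988]
(count-b-first 100000) ; => [99988 12]
```

Both return the same counts; the difference is only in how much of the range must stay in memory at once. With the question's (range 1e7) that is the whole walked range in the second order, which would be consistent with the extra time measured there, though I have not profiled it.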

Non-linear slowdown creating a lazy seq in Clojure

I implemented a function that returns the n-grams of a given input collection as a lazy seq.
(defn gen-ngrams
  [n coll]
  (if (>= (count coll) n)
    (lazy-seq (cons (take n coll) (gen-ngrams n (rest coll))))))
When I call this function with larger input collections, I would expect to see a linear increase in execution time. However, the timing I observe is worse than that:
user> (time (count (gen-ngrams 3 (take 1000 corpus))))
"Elapsed time: 59.426 msecs"
998
user> (time (count (gen-ngrams 3 (take 10000 corpus))))
"Elapsed time: 5863.971 msecs"
9998
user> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 23584.226 msecs"
19998
user> (time (count (gen-ngrams 3 (take 30000 corpus))))
"Elapsed time: 54905.999 msecs"
29998
user> (time (count (gen-ngrams 3 (take 40000 corpus))))
"Elapsed time: 100978.962 msecs"
39998
corpus is a Cons of string tokens.
What causes this behavior and how can I improve the performance?
I think your issue is with (count coll), which walks the entire remaining collection on every recursive call to gen-ngrams, so the total work grows quadratically with the input size.
The solution would be to use the built-in partition function:
user=> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 6212.894932 msecs"
19998
user=> (time (count (partition 3 1 (take 20000 corpus))))
"Elapsed time: 12.57996 msecs"
19998
Have a look at the partition source if you're curious about the implementation: http://clojuredocs.org/clojure_core/clojure.core/partition
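If you want to keep your own function rather than switch to partition, a minimal fix under the same diagnosis is to check only the first n elements instead of counting the whole remaining collection, and to put that check inside lazy-seq so nothing is computed until it is needed. This is a sketch, not benchmarked here; it assumes n >= 1.

```clojure
(defn gen-ngrams
  "Lazy seq of the n-grams (n >= 1) of coll."
  [n coll]
  (lazy-seq
    (let [gram (take n coll)]
      ;; (count gram) realizes at most n elements, so each step costs O(n)
      ;; instead of O(length of the remaining coll)
      (when (= n (count gram))
        (cons gram (gen-ngrams n (rest coll)))))))

(gen-ngrams 3 (range 5)) ; => ((0 1 2) (1 2 3) (2 3 4))
```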
I am far from a Clojure expert, but I think the cons function causes this problem.
Try to use list instead:
(defn gen-ngrams
  [n coll]
  (if (>= (count coll) n)
    (lazy-seq (list (take n coll) (gen-ngrams n (rest coll))))))
I think cons constructs a new seq which is more generic than a list, and is therefore slower.
Edit: and if "corpus is a Cons of string tokens", then try to make it a list...