Non-linear slowdown creating a lazy seq in Clojure

I implemented a function that returns the n-grams of a given input collection as a lazy seq.
(defn gen-ngrams
  [n coll]
  (if (>= (count coll) n)
    (lazy-seq (cons (take n coll) (gen-ngrams n (rest coll))))))
When I call this function with larger input collections, I would expect to see a linear increase in execution time. However, the timing I observe is worse than that:
user> (time (count (gen-ngrams 3 (take 1000 corpus))))
"Elapsed time: 59.426 msecs"
998
user> (time (count (gen-ngrams 3 (take 10000 corpus))))
"Elapsed time: 5863.971 msecs"
9998
user> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 23584.226 msecs"
19998
user> (time (count (gen-ngrams 3 (take 30000 corpus))))
"Elapsed time: 54905.999 msecs"
29998
user> (time (count (gen-ngrams 3 (take 40000 corpus))))
"Elapsed time: 100978.962 msecs"
39998
corpus is a Cons of string tokens.
What causes this behavior and how can I improve the performance?

I think your issue is with (count coll), which walks the entire remaining collection on every call to gen-ngrams, so the total work grows quadratically with the input length.
The solution would be to use the built-in partition function:
user=> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 6212.894932 msecs"
19998
user=> (time (count (partition 3 1 (take 20000 corpus))))
"Elapsed time: 12.57996 msecs"
19998
Have a look in the partition source if curious about the implementation http://clojuredocs.org/clojure_core/clojure.core/partition
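To see why avoiding (count coll) matters, here is a minimal sketch of a hand-written variant that only ever counts the n-element window, never the whole remaining input. The name gen-ngrams-linear is hypothetical, and this is not how partition itself is implemented; it only illustrates the idea.

```clojure
(defn gen-ngrams-linear
  "Hypothetical variant of gen-ngrams: lazily yields every run of n
  consecutive items without counting the whole remaining collection."
  [n coll]
  (lazy-seq
    (let [gram (take n coll)]
      ;; Counting the window costs O(n), independent of the input length.
      (when (= n (count gram))
        (cons gram (gen-ngrams-linear n (rest coll)))))))

;; Small usage example on a range instead of the corpus from the question.
(gen-ngrams-linear 3 (range 5))
;; => ((0 1 2) (1 2 3) (2 3 4))
```

For real use, (partition n 1 coll) remains the simpler choice.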

I am far from a Clojure expert, but I think the cons function causes this problem.
Try to use list instead:
(defn gen-ngrams
[n coll]
(if (>= (count coll) n)
(lazy-seq (list (take n coll) (gen-ngrams n (rest coll))))))
I think cons constructs a new seq, which is more generic than a list and therefore slower.
Edit: and if "corpus is a Cons of string tokens", then try to make it a list...

Related

How to make this Clojure code faster?

Task statement: concatenate two lists of 1e7 elements and find their sum. I'm trying to figure out an idiomatic way to write this in Clojure. And possibly also a fast non-idiomatic way, if warranted.
Here's what I got so far:
(def a (doall (vec (repeat 1e7 1))))
(def b (doall (vec (repeat 1e7 1))))
(println "Clojure:")
(time (def c (concat a b)))
(time (reduce + c))
Here's the result, using 1.9.0 with the shell command clojure -e '(load-file "clojure/concat.clj")':
Clojure:
"Elapsed time: 0.042615 msecs"
"Elapsed time: 636.798833 msecs"
20000000
There's quite a lot of room for improvement, comparing to trivial implementations in Python (156ms), Java (159ms), SBCL (120ms), and C++ using STL algorithms (60ms).
I was curious about the tradeoff between just adding the numbers vs the memory allocations, so I wrote a bit of test code that uses both Clojure vectors and primitive (java) arrays. Results:
; verify we added numbers in (range 1e7) once or twice
(sum-vec) => 49999995000000
(into-sum-vec) => 99999990000000
ARRAY power = 7
"Elapsed time: 21.840198 msecs" ; sum once
"Elapsed time: 45.036781 msecs" ; 2 sub-sums, then add sub-totals
(timing (sum-sum-arr)) => 99999990000000
"Elapsed time: 397.254961 msecs" ; copy into 2x array, then sum
(timing (sum-arr2)) => 99999990000000
VECTOR power = 7
"Elapsed time: 112.522111 msecs" ; sum once from vector
"Elapsed time: 387.757729 msecs" ; make 2x vector, then sum
So we see that, using primitive long arrays (on my machine), we need 21 ms to sum 1e7 integers. If we do that sum twice and add the sub-totals, we get 45 ms elapsed time.
If we allocate a new array of length 2e7, copy in the first array twice, and then sum up the values, we get about 400ms which is 8x slower than the adding alone. So we see that the memory allocation & copying is by far the largest cost.
For the native Clojure vector case, we see a time of 112 ms to just sum up a preallocated vector of 1e7 integers. Combining the orig vector with itself into a 2e7 vector, then summing costs about 400ms, similar to the low-level array case. So we see that for large lists of data the memory IO cost overwhelms the details of native Java arrays vs Clojure vectors.
Code for the above (requires [tupelo "0.9.69"] ):
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require [criterium.core :as crit]))
(defmacro timing [& forms]
  ; `(crit/quick-bench ~@forms)
  `(time ~@forms))
(def power 7)
(def reps (Math/pow 10 power))
(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))
; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
(areduce data-arr i accum 0
(+ accum (aget data-arr i)))) ; => 6300 ms (power 6)
(defn sum-arr []
(let [data ^longs data-arr]
(areduce data i accum 0
(+ accum (aget data i))))) ; => 8 ms (power 6)
(defn sum-sum-arr []
(let [data ^longs data-arr
sum1 (areduce data i accum 0
(+ accum (aget data i)))
sum2 (areduce data i accum 0
(+ accum (aget data i)))
result (+ sum1 sum2)]
result))
(defn sum-arr2 []
(let [data ^longs data-arr
data2 (long-array (* 2 reps))
>> (dotimes [i reps] (aset data2 i (aget data i)))
>> (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
result (areduce data2 i accum 0
(+ accum (aget data2 i)))]
result))
(defn sum-vec [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))
(dotest
(is= (spyx (sum-vec))
(sum-arr))
(is= (spyx (into-sum-vec))
(sum-arr2)
(sum-sum-arr))
(newline) (println "-----------------------------------------------------------------------------")
(println "ARRAY power = " power)
(timing (sum-arr))
(spyx (timing (sum-sum-arr)))
(spyx (timing (sum-arr2)))
(newline) (println "-----------------------------------------------------------------------------")
(println "VECTOR power = " power)
(timing (sum-vec))
(timing (into-sum-vec))
)
You can switch from time to using Criterium by changing the comment line in the timing macro. However, Criterium is meant for short tasks and you should probably keep power to only 5 or 6.
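The "two sub-sums, then add the sub-totals" idea measured above for arrays can also be applied to plain vectors, which avoids building the combined collection at all. A minimal sketch, assuming small stand-in vectors a and b and a hypothetical helper name sum-both; no timings are claimed for it:

```clojure
;; Small stand-ins for the 1e7-element vectors from the question.
(def a (vec (repeat 1000 1)))
(def b (vec (repeat 1000 1)))

(defn sum-both
  "Hypothetical helper: sums two collections separately and adds the
  sub-totals, so no concatenated collection is ever allocated."
  [xs ys]
  (+ (reduce + xs) (reduce + ys)))

(sum-both a b)
;; => 2000
```

This only helps when the concatenated collection is not needed for anything else afterwards.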

Memoize a Clojure function that takes a lazy sequence as input

How can I get memoize to work when the argument to a memoized function is a sequence?
(defn foo
([x] (println "Hello First") (reduce + x))
([x y] (println "Hello Second") (reduce + (range x y))))
(def baz (memoize foo))
Passing one arg:
1)
(time (baz (range 1 1000000))) ;=> Hello First "Elapsed time: 14.870628 msecs"
2)
(time (baz (range 1 1000000))) ;=> "Elapsed time: 65.386561 msecs"
Passing 2 args:
1)
(time (baz 1 1000000)) ;=> Hello Second "Elapsed time: 18.619768 msecs"
2)
(time (baz 1 1000000)) ;=> "Elapsed time: 0.069684 msecs"
The second run of the function when passed 2 arguments seems to be what I expect.
However using a vector appears to work...
(time (baz [1 2 3 5 3 5 7 4 6 7 4 45 6 7])) ;=> Hello First "Elapsed time: 0.294963 msecs"
(time (baz [1 2 3 5 3 5 7 4 6 7 4 45 6 7])) ;=> "Elapsed time: 0.068229 msecs"
memoize does work with sequences, you just need to compare apples to apples. memoize looks up the parameter in the hash map of previously used ones, and as a result you end up comparing the sequences. Comparing long sequences is what takes a long time, whether they are vectors or not:
user> (def x (vec (range 1000000)))
;; => #'user/x
user> (def y (vec (range 1000000)))
;; => #'user/y
user> (time (= x y))
"Elapsed time: 64.351274 msecs"
;; => true
user> (time (baz x))
"Elapsed time: 67.42694 msecs"
;; => 499999500000
user> (time (baz x))
"Elapsed time: 73.231174 msecs"
;; => 499999500000
When you use very short input sequences, the timing is dominated by the reduce inside your function. But with very long ones most of the time you see is actually the comparison time inside memoize.
So technically memoize works, in the same way for all sequences. But working "technically" doesn't imply "being useful." As you have discovered yourself, it is useless (actually maybe even harmful) for input with expensive comparison semantics. Your second signature solves this problem.
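The point about cheap cache keys can be sketched as follows: when the memoized function takes the range bounds rather than a realized sequence, the cache lookup only has to compare two numbers. The names calls, sum-range and sum-range-memo are made up for this illustration.

```clojure
;; Counts how often the underlying function body actually runs.
(def calls (atom 0))

(defn sum-range
  "Sums the integers from lo (inclusive) to hi (exclusive)."
  [lo hi]
  (swap! calls inc)
  (reduce + (range lo hi)))

(def sum-range-memo (memoize sum-range))

(sum-range-memo 1 1000) ;; computes the sum and caches it under the key (1 1000)
(sum-range-memo 1 1000) ;; cache hit: only the two small arguments are compared
@calls
;; => 1
```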

Largest collections first

I have a list of lists. And I want the biggest lists to come at the beginning. This works, but can take a long time:
(reverse (sort-by count coll))
What is a more efficient way of doing this, presumably in one go?
Thank you, galdre, for pointing out my error.
Don't use lazy sequences: counting them is O(n), while counting a vector is O(1).
A quick demo:
user> (let [xs (doall (repeat 1000000 1))]
        (time (count xs)))
"Elapsed time: 29.393886 msecs"
1000000
user> (let [xs (into [] (repeat 1000000 1))]
(time (count xs)))
"Elapsed time: 0.013346 msecs"
1000000
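As for doing it in one go: sort-by accepts an optional comparator, so the reverse step can be dropped by sorting in descending order directly. A minimal sketch (largest-first is a hypothetical name; it assumes the inner collections are vectors, as in the demo above, so that each count call is cheap):

```clojure
(defn largest-first
  "Hypothetical helper: sorts collections by size, biggest first."
  [colls]
  ;; Passing > as the comparator sorts the counts in descending order.
  (sort-by count > colls))

(largest-first [[1] [1 2 3] [1 2]])
;; => ([1 2 3] [1 2] [1])
```

Because the sort is stable, collections of equal size keep their original relative order here, whereas the reverse-based version flips them.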

PI Approximation: Why is my declarative version slower?

I'm approximating PI using the series $\pi \approx 4 \sum_{k=1}^{n} \frac{(-1)^{k+1}}{2k-1}$.
The function for the series then looks like this:
(defn- pi-series [k]
(/ (if (even? (inc k)) 1 -1)
(dec (* 2 k))))
And then my series generator looks like *:
(defn pi [n]
(* 4
(loop [k 1
acc 0]
(if (= k (inc n))
acc
(recur (inc k)
(+ acc (double (pi-series k))))))))
Running pi with the value 999,999 produces the following:
(time (pi 999999))
;;=> "Elapsed time: 497.686 msecs"
;;=> 3.1415936535907734
That looks great, but I realized pi could be written more declaratively. Here's what I ended up with:
(defn pi-fn [n]
(* 4 (reduce +
(map #(double (pi-series %))
(range 1 (inc n))))))
Which resulted in the following:
(time (pi-fn 999999))
;;=> "Elapsed time: 4431.626 msecs"
;;=> 3.1415936535907734
NOTE: The declarative version took around 4 seconds longer.
Why is the declarative version so much slower? How can I update the declarative version to make it as fast as the imperative version?
* I'm casting the result of pi-series to a double, because using Clojure's ratio types performed a lot slower.
By the way, you can express an alternating finite sum as a difference of two sums, eliminating the need to adjust each term for sign individually. For example,
(defn alt-sum [f n]
(- (apply + (map f (range 1 (inc n) 2)))
(apply + (map f (range 2 (inc n) 2)))))
(time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
; "Elapsed time: 195.244047 msecs"
;= 3.141593653590707
On my laptop pi runs at 2500 msec. However, pi and pi-fn (either version) run at approx. the same rate (10x slower than alt-sum). More often than not, pi-fn is faster than pi. Are you sure you didn't accidentally insert an extra 9 before the second timing? Contra Juan, I do not think you're iterating over the sequence more than once, since the terms are generated lazily.
scratch.core> (time (pi 999999))
"Elapsed time: 2682.86669 msecs"
3.1415936535907734
scratch.core> (time (pi-fn 999999))
"Elapsed time: 2082.071798 msecs"
3.1415936535907734
scratch.core> (time (pi-fn-juan 999999))
"Elapsed time: 1934.976217 msecs"
3.1415936535907734
scratch.core> (time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
"Elapsed time: 199.998438 msecs"
3.141593653590707
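Another way to keep the declarative shape is to compute each term directly as a double and sum with transduce, which skips both the ratio arithmetic and the intermediate lazy sequence. This is only a sketch: pi-term and pi-xf are hypothetical names, and no timings were measured for it.

```clojure
(defn pi-term
  "k-th term of the series, computed in doubles so no ratios are created."
  [k]
  (/ (if (odd? k) 1.0 -1.0)
     (dec (* 2 k))))

(defn pi-xf
  "Sums the first n terms with a transducer and scales the result by 4."
  [n]
  (* 4 (transduce (map pi-term) + (range 1 (inc n)))))

(pi-xf 1)
;; => 4.0
```

Calling (pi-xf 999999) should give an approximation comparable to the versions above, though the last digits may differ because the terms are rounded differently.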

Head retention in Clojure

While reading the paragraph about head retention in "Clojure Programming" (page 98), I couldn't figure out what happens in the split-with example. I tried experimenting in the REPL, but that left me more confused.
(time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 1910.401711 msecs"
[12 9999988]
(time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count b) (count a)]))
"Elapsed time: 3580.473787 msecs"
[9999988 12]
(time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count b)]))
"Elapsed time: 3516.70982 msecs"
[9999988]
As you can see from the last example, if I don't compute a, the elapsed time somehow grows. I guess I've missed something here, but what?
Count is O(1). That's why your measurements don't depend on it.
The count function is O(1) for Counted collections, which includes vectors and lists.
Sequences, on the other hand, are not counted which makes count O(n) for them. The important part here is that the functions take-while and drop-while return sequences. The fact that they are also lazy is not a major factor here.
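The distinction can be checked at the REPL with counted?, which reports whether a collection supports constant-time count. A small sketch with made-up inputs:

```clojure
;; Vectors and lists know their size, so count is O(1) for them.
(counted? [1 2 3])
;; => true
(counted? '(1 2 3))
;; => true

;; take-while and drop-while return lazy sequences, which must be walked.
(counted? (take-while #(< % 12) [1 2 3]))
;; => false
(counted? (drop-while #(< % 12) [1 2 3 20]))
;; => false
```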
When using time as a benchmark, run the tests many times to get an accurate result.
user> (defn example2 [] (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
#'user/example2
user> (dorun (take 1000 (repeatedly example2)))
nil
user> (time (example2))
"Elapsed time: 614.4 msecs"
[12 9999988]
I blame variance in runtime, because the HotSpot compiler has not yet fully optimized the generated classes. I ran the first and second examples several times and got mixed relative results:
Run example one twice:
autotestbed.core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 929.681423 msecs"
[12 9999988]
autotestbed.core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 887.81269 msecs"
[12 9999988]
Then run example two a couple of times:
core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 3838.751561 msecs"
[12 9999988]
core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 970.397078 msecs"
[12 9999988]
Sometimes the second example is just as fast.
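One way to make this kind of run-to-run variance visible is to collect several timings instead of relying on a single call to time. The helper below is a hypothetical sketch (time-runs and example are made-up names, and the input is smaller than in the question); a library such as Criterium handles warm-up far more carefully.

```clojure
(defn time-runs
  "Hypothetical helper: calls the zero-argument function f `runs` times and
  returns a vector with the elapsed milliseconds of each run."
  [runs f]
  (mapv (fn [_]
          (let [start (System/nanoTime)]
            (f)
            (/ (- (System/nanoTime) start) 1e6)))
        (range runs)))

(defn example []
  (let [r (range 100000)
        a (take-while #(< % 12) r)
        b (drop-while #(< % 12) r)]
    [(count a) (count b)]))

;; Early runs are usually slower than later ones while the JIT warms up.
(time-runs 5 example)
```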
Bindings in a let form are evaluated even if we don't use the bound value.
(let [x (println "Side effect")] 1)
The code above prints "Side effect" and returns 1.
All three of your examples use the same bindings in the let form, so I don't see any difference there. By the way, on my machine all your snippets took approximately the same time.
The real difference shows up when you try something like this:
(time (let [r (range 2e7)
a (take 100 r)
b (drop 100 r)]
[(count a)]))
"Elapsed time: 0.128304 msecs"
[100]
(time (let [r (range 2e7)
a (take 100 r)
b (drop 100 r)]
[(count b)]))
"Elapsed time: 3807.591342 msecs"
[19999900]
Because a and b are lazy sequences, count runs in O(n) time on them. In the first example we never call count on b, so it returns almost immediately.
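A small sketch can make the laziness visible by counting how many elements actually get produced. The names touched and traced-naturals are hypothetical, and the sketch uses an unchunked infinite sequence built with iterate rather than the range from the examples above:

```clojure
;; Counts how many elements of the source sequence are ever realized.
(def touched (atom 0))

(defn traced-naturals
  "Lazy, unchunked sequence 0, 1, 2, ... that bumps `touched` per element."
  []
  (map (fn [x] (swap! touched inc) x)
       (iterate inc 0)))

(let [r (traced-naturals)
      a (take 100 r)
      b (drop 100 r)] ; bound, but never realized
  (count a))
;; => 100

@touched
;; => 100, only the elements needed for a were produced
```

Binding b costs nothing here; the work only happens once something such as count walks the sequence.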
The time shown is completely system dependent.
If you re-execute the code, you will see a different elapsed time on each run.