I have a list of lists. And I want the biggest lists to come at the beginning. This works, but can take a long time:
(reverse (sort-by count coll))
What is a more efficient way of doing this, presumably in one go?
thank you galdre for pointing out my error
don't use lazy sequences
a quick demo:
user> (let [xs (doall(repeat 1000000 1))]
(time (count xs)))
"Elapsed time: 29.393886 msecs"
1000000
user> (let [xs (into [] (repeat 1000000 1))]
(time (count xs)))
"Elapsed time: 0.013346 msecs"
1000000
Related
Task statement: concatenate two lists of 1e7 elements and find their sum. I'm trying to figure out an idiomatic way to write this in Clojure. And possibly also a fast non-idiomatic way, if warranted.
Here's what I got so far:
(def a (doall (vec (repeat 1e7 1))))
(def b (doall (vec (repeat 1e7 1))))
(println "Clojure:")
(time (def c (concat a b)))
(time (reduce + c))
Here's the result, using 1.9.0 with the shell command clojure -e '(load-file "clojure/concat.clj")':
Clojure:
"Elapsed time: 0.042615 msecs"
"Elapsed time: 636.798833 msecs"
20000000
There's quite a lot of room for improvement, comparing to trivial implementations in Python (156ms), Java (159ms), SBCL (120ms), and C++ using STL algorithms (60ms).
I was curious about the tradeoff between just adding the numbers vs the memory allocations, so I wrote a bit of test code that uses both Clojure vectors and primitive (java) arrays. Results:
; verify we added numbers in (range 1e7) once or twice
(sum-vec) => 49999995000000
(into-sum-vec) => 99999990000000
ARRAY power = 7
"Elapsed time: 21.840198 msecs" ; sum once
"Elapsed time: 45.036781 msecs" ; 2 sub-sums, then add sub-totals
(timing (sum-sum-arr)) => 99999990000000
"Elapsed time: 397.254961 msecs" ; copy into 2x array, then sum
(timing (sum-arr2)) => 99999990000000
VECTOR power = 7
"Elapsed time: 112.522111 msecs" ; sum once from vector
"Elapsed time: 387.757729 msecs" ; make 2x vector, then sum
So we see that, using primitive long arrays (on my machine), we need 21 ms to sum 1e7 integers. If we do that sum twice and add the sub-totals, we get 45 ms elapsed time.
If we allocate a new array of length 2e7, copy in the first array twice, and then sum up the values, we get about 400ms which is 8x slower than the adding alone. So we see that the memory allocation & copying is by far the largest cost.
For the native Clojure vector case, we see a time of 112 ms to just sum up a preallocated vector of 1e7 integers. Combining the orig vector with itself into a 2e7 vector, then summing costs about 400ms, similar to the low-level array case. So we see that for large lists of data the memory IO cost overwhelms the details of native Java arrays vs Clojure vectors.
Code for the above (requires [tupelo "0.9.69"] ):
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require [criterium.core :as crit]))
(defmacro timing [& forms]
; `(crit/quick-bench ~#forms)
`(time ~#forms)
)
(def power 7)
(def reps (Math/pow 10 power))
(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))
; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
(areduce data-arr i accum 0
(+ accum (aget data-arr i)))) ; => 6300 ms (power 6)
(defn sum-arr []
(let [data ^longs data-arr]
(areduce data i accum 0
(+ accum (aget data i))))) ; => 8 ms (power 6)
(defn sum-sum-arr []
(let [data ^longs data-arr
sum1 (areduce data i accum 0
(+ accum (aget data i)))
sum2 (areduce data i accum 0
(+ accum (aget data i)))
result (+ sum1 sum2)]
result))
(defn sum-arr2 []
(let [data ^longs data-arr
data2 (long-array (* 2 reps))
>> (dotimes [i reps] (aset data2 i (aget data i)))
>> (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
result (areduce data2 i accum 0
(+ accum (aget data2 i)))]
result))
(defn sum-vec [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))
(dotest
(is= (spyx (sum-vec))
(sum-arr))
(is= (spyx (into-sum-vec))
(sum-arr2)
(sum-sum-arr))
(newline) (println "-----------------------------------------------------------------------------")
(println "ARRAY power = " power)
(timing (sum-arr))
(spyx (timing (sum-sum-arr)))
(spyx (timing (sum-arr2)))
(newline) (println "-----------------------------------------------------------------------------")
(println "VECTOR power = " power)
(timing (sum-vec))
(timing (into-sum-vec))
)
You can switch from time to using Criterium by changing the comment line in the timing macro. However, Criterium is meant for short tasks and you should probably keep power to only 5 or 6.
I'm approximating PI using the series:
The function for the series then looks like this:
(defn- pi-series [k]
(/ (if (even? (inc k)) 1 -1)
(dec (* 2 k))))
And then my series generator looks like *:
(defn pi [n]
(* 4
(loop [k 1
acc 0]
(if (= k (inc n))
acc
(recur (inc k)
(+ acc (double (pi-series k))))))))
Running pi with the value 999,999 produces the following:
(time (pi 999999))
;;=> "Elapsed time: 497.686 msecs"
;;=> 3.1415936535907734
That looks great, but I realize pi could be written more declarative. Here's what I ended up with:
(defn pi-fn [n]
(* 4 (reduce +
(map #(double (pi-series %))
(range 1 (inc n))))))
Which resulted in the following:
(time (pi-fn 999999))
;;=> "Elapsed time: 4431.626 msecs"
;;=> 3.1415936535907734
NOTE: The declarative version took around 4-seconds longer. Why?
Why is the declarative version so much slower? How can I update the declarative version to make it as fast as the imperative version?
I'm casting the result of pi-series to a double, because using clojure's ratio types performed a lot slower.
By the way, you can express an alternating finite sum as a difference of two sums, eliminating the need to adjust each term for sign individually. For example,
(defn alt-sum [f n]
(- (apply + (map f (range 1 (inc n) 2)))
(apply + (map f (range 2 (inc n) 2)))))
(time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
; "Elapsed time: 195.244047 msecs"
;= 3.141593653590707
On my laptop pi runs at 2500 msec. However, pi and pi-fn (either version) run at approx. the same rate (10x slower than alt-sum). More often than not, pi-fn is faster than pi. Are you sure you didn't accidentally insert an extra 9 before the second timing? Contra Juan, I do not think you're iterating over the sequence more than once, since the terms are generated lazily.
scratch.core> (time (pi 999999))
"Elapsed time: 2682.86669 msecs"
3.1415936535907734
scratch.core> (time (pi-fn 999999))
"Elapsed time: 2082.071798 msecs"
3.1415936535907734
scratch.core> (time (pi-fn-juan 999999))
"Elapsed time: 1934.976217 msecs"
3.1415936535907734
scratch.core> (time (* 4 (alt-sum #(/ 1.0 (dec (+ % %))) 999999)))
"Elapsed time: 199.998438 msecs"
3.141593653590707
Reading paragraph about head retention in "Clojure Programming" (page 98), i couldn't figure out what happens in split-with example. I've tried to experiment with repl but it made me more confused.
(time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 1910.401711 msecs"
[12 9999988]
(time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count b) (count a)]))
"Elapsed time: 3580.473787 msecs"
[9999988 12]
(time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count b)]))
"Elapsed time: 3516.70982 msecs"
[9999988]
As you can see from the last example, if I don't compute a, time consuming somehow grows. I guess, i've missed something here, but what?
Count is O(1). That's why your measurements don't depend on it.
The count function is O(1) for Counted collections, which includes vectors and lists.
Sequences, on the other hand, are not counted which makes count O(n) for them. The important part here is that the functions take-while and drop-while return sequences. The fact that they are also lazy is not a major factor here.
When using time a a benchmark, run the tests many times to get an accurate result
user> (defn example2 [] (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
#'user/example2
user> (dorun (take 1000 (repeatedly example2)))
nil
user> (time (example2))
"Elapsed time: 614.4 msecs"
[12 9999988]
I blame variance in runtime because the hotspot compiler has not yet fully optomized the generated classes. I ran the first and second examples several times and got mixed relative results:
run example one twice:
autotestbed.core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 929.681423 msecs"
[12 9999988]
autotestbed.core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 887.81269 msecs"
[12 9999988]
then run example two a couple times:
core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 3838.751561 msecs"
[12 9999988]
core> (time (let [r (range 1e7)
a (take-while #(< % 12) r)
b (drop-while #(< % 12) r)]
[(count a) (count b)]))
"Elapsed time: 970.397078 msecs"
[12 9999988]
sometiems the second examples are just as fast
Binding in let form performed even we don't use this value.
(let [x (println "Side effect")] 1)
Code above prints "Side effect", and return 1.
In all your three examples used the same binding in let form, so I don't see any difference here. By the way, on my machine all your snippets took approximately equal time.
The real difference when you try something like this:
(time (let [r (range 2e7)
a (take 100 r)
b (drop 100 r)]
[(count a)]))
"Elapsed time: 0.128304 msecs"
[100]
(time (let [r (range 2e7)
a (take 100 r)
b (drop 100 r)]
[(count b)]))
"Elapsed time: 3807.591342 msecs"
[19999900]
Due to fact that b and a are lazy sequences, count works in O(n) time. But in first example we don't calculate count for b, so it works almost immediately.
the time it is showing is completely system dependent....
if you re-execute it, it will show some different elapsed time for each execution
I implemented a function that returns the n-grams of a given input collection as a lazy seq.
(defn gen-ngrams
[n coll]
(if (>= (count coll) n)
(lazy-seq (cons (take n coll) (gen-ngrams n (rest coll))))))
When I call this function with larger input collections, I would expect to see a linear increase in execution time. However, the timing I observe is worse than that:
user> (time (count (gen-ngrams 3 (take 1000 corpus))))
"Elapsed time: 59.426 msecs"
998
user> (time (count (gen-ngrams 3 (take 10000 corpus))))
"Elapsed time: 5863.971 msecs"
9998
user> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 23584.226 msecs"
19998
user> (time (count (gen-ngrams 3 (take 30000 corpus))))
"Elapsed time: 54905.999 msecs"
29998
user> (time (count (gen-ngrams 3 (take 40000 corpus))))
"Elapsed time: 100978.962 msecs"
39998
corpus is a Cons of string tokens.
What causes this behavior and how can I improve the performance?
I think your issue is with "(count coll)", which is iterating over the coll for each call to ngrams.
The solution would be to use the build in partition function:
user=> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 6212.894932 msecs"
19998
user=> (time (count (partition 3 1 (take 20000 corpus))))
"Elapsed time: 12.57996 msecs"
19998
Have a look in the partition source if curious about the implementation http://clojuredocs.org/clojure_core/clojure.core/partition
I am far from a Clojure expert, but I think the cons function causes this problem.
Try to use list instead:
(defn gen-ngrams
[n coll]
(if (>= (count coll) n)
(lazy-seq (list (take n coll) (gen-ngrams n (rest coll))))))
I think cons construct a new seq which is more generic than a list, and therefore is slower.
Edit: and if "corpus is a Cons of string tokens", then try to make it a list...
I have a question about why there is such a difference in speed between the loop method and the iterate method in clojure. I followed the tutorial in http://www.learningclojure.com/2010/02/clojure-dojo-2-what-about-numbers-that.html and defined two square-root methods using the Heron method:
(defn avg [& nums] (/ (apply + nums) (count nums)))
(defn abs [x] (if (< x 0) (- x) x))
(defn close [a b] (-> a (- b) abs (< 1e-10) ) )
(defn sqrt [num]
(loop [guess 1]
(if (close num (* guess guess))
guess
(recur (avg guess (/ num guess)))
)))
(time (dotimes [n 10000] (sqrt 10))) ;;"Elapsed time: 1169.502 msecs"
;; Calculation using the iterate method
(defn sqrt2 [number]
(first (filter #(close number (* % %))
(iterate #(avg % (/ number %)) 1.0))))
(time (dotimes [n 10000] (sqrt2 10))) ;;"Elapsed time: 184.119 msecs"
There is about a x10 increase in speed between the two methods and I'm wondering what is happening below the surface to cause the two to be so pronouced?
Your results are surprising: normally loop/recur is the fastest construct in Clojure for looping.
I suspect that the JVM JIT has worked out a clever optimisation for the iterate method, but not for the loop/recur version. It's surprising how often this happens when you use clean functional code in Clojure: it seems to be very amenable to optimisation.
Note that you can get a substantial speedup in both versions by explicitly using doubles:
(set! *unchecked-math* true)
(defn sqrt [num]
(loop [guess (double 1)]
(if (close num (* guess guess))
guess
(recur (double (avg guess (/ num guess)))))))
(time (dotimes [n 10000] (sqrt 10)))
=> "Elapsed time: 25.347291 msecs"
(defn sqrt2 [number]
(let [number (double number)]
(first (filter #(close number (* % %))
(iterate #(avg % (/ number %)) 1.0)))))
(time (dotimes [n 10000] (sqrt 10)))
=> "Elapsed time: 32.939526 msecs"
As expected, the loop/recur version now has a slight edge. Results are for Clojure 1.3