Trust a Clojure collection to be sorted

If I get a sorted list of objects from an external API, is there a way to put it in a sorted set without the overhead of re-sorting it? Something like:
=> (sorted? (assume-sorted [1 2 3]))
true

Clojure uses a persistent red-black tree data structure for sorted sets and maps. When an inserted item makes the tree too unbalanced, the nodes (including the root) are rearranged to keep the tree "approximately" balanced.
What your measurement shows is that there is slightly more overhead in rebalancing a tree that only ever grows on the right (every new addition unbalances it further to the right) than in rebalancing a tree that grows in random locations (some insertions will, by chance, make the tree more balanced).
See:
https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentTreeMap.java
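As a quick REPL check (not part of the original answer), the red-black tree implementation is visible in the concrete classes behind the sorted collections:

(class (sorted-set 1 2 3))   ;=> clojure.lang.PersistentTreeSet
(class (sorted-map :a 1))    ;=> clojure.lang.PersistentTreeMap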
Update
I just tried this on my computer and got very different results from your test. This once again shows the folly of trying to optimize prematurely (especially when the difference is less than 2x):
(def x (range 1000000))
(def y (doall (shuffle x)))
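;; Note: the extra nil argument below makes doall return nil, so the REPL
;; doesn't try to print the million-element result of each expression.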
parse.core=> (time (doall (set x) nil))
"Elapsed time: 279.259531 msecs"
"Elapsed time: 291.31022 msecs"
"Elapsed time: 414.752484 msecs"
parse.core=> (time (doall (set y) nil))
"Elapsed time: 286.496324 msecs"
"Elapsed time: 284.95049 msecs"
"Elapsed time: 285.568222 msecs"
"Elapsed time: 301.55659 msecs"
parse.core=> (time (doall (into (sorted-set) x) nil))
"Elapsed time: 816.473169 msecs"
"Elapsed time: 775.025901 msecs"
"Elapsed time: 763.638447 msecs"
parse.core=> (time (doall (into (sorted-set) y) nil))
"Elapsed time: 1307.969889 msecs"
"Elapsed time: 1313.099123 msecs"
"Elapsed time: 1285.665797 msecs"
"Elapsed time: 1307.879676 msecs"
The Moral of the Story
1) First make it right.
2) If it is fast enough, move on to the next problem.
3) If it needs to be faster, measure where the biggest bottleneck is.
4) Decide whether it's cheaper to just use more hardware at $0.03/hr or to spend human time on code changes (which will increase complexity, reduce maintainability, etc.).

Related

What is a foldable collection in Clojure?

I am a beginner to Clojure. While trying to read about reducers I came across something called a foldable collection.
The docs mention that vectors and maps are foldable collections, but not lists.
I am trying to understand what a foldable collection is, and why vectors and maps are foldable.
I have not found any definition or explanation of "foldable collection".
The answer is there in the docs, if not quite as clear as it could be:
Additionally, some collections (persistent vectors and maps) are
foldable. The fold operation on a reducer executes the reduction in
parallel...
The idea is that, with modern hardware, a "reduction" operation like summing all elements of a vector can be done in parallel. For example, if summing all elements of a 400K length vector, we could break them up into 4 groups of 100K chunks, sum those in parallel, then combine the 4 subtotals into the final answer. This would be approximately 4x faster than using only a single thread (single cpu core).
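That idea in miniature (an illustrative sketch, not taken from the original answer): split the vector into chunks, sum the chunks in parallel, then combine the subtotals:

(let [v         (vec (range 400000))
      chunks    (partition-all 100000 v)      ; 4 chunks of 100K elements each
      subtotals (pmap #(reduce + %) chunks)]  ; sum each chunk on its own thread
  (reduce + subtotals))                       ; combine the 4 subtotals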
Reducers live in the clojure.core.reducers namespace. Assume we define aliases like:
(ns demo.xyz
  (:require [clojure.core :as core]
            [clojure.core.reducers :as r]))
Compared to clojure.core, we have:
core/reduce <=> r/fold   ; new name for `reduce`
core/map    <=> r/map    ; same name for `map`
core/filter <=> r/filter ; same name for `filter`
So, the naming is not the best. reduce lives in the clojure.core namespace, but there is no reduce in the clojure.core.reducers namespace. Instead, there is a work-alike function named fold in clojure.core.reducers.
Note that fold is a historical name for combining lists of data as with our summation example. See the Wikipedia entry for more information.
Because folding accesses the data in non-linear order (which is very inefficient for linked lists), folding is only worth doing on random-access data structures like vectors.
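For example (a minimal sketch, not from the original answer), the reducer versions of map and filter compose without building intermediate sequences, and r/fold then runs the final reduction in parallel when the source is a vector:

(require '[clojure.core.reducers :as r])

(def nums (vec (range 1000000)))

;; sum of (inc x) for every even x, folded in parallel over the vector
(r/fold + (r/map inc (r/filter even? nums)))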
Update #1:
Having said the above, remember the adage that "Premature optimization is the root of all evil." Here are some measurements for (vec (range 1e7)), i.e. 10M entries, on an 8-core machine:
(def data (vec (range 1e7)))   ; the 10M-entry vector described above

(time (reduce + data))
"Elapsed time: 284.52735 msecs"
"Elapsed time: 119.310289 msecs"
"Elapsed time: 98.740421 msecs"
"Elapsed time: 100.58998 msecs"
"Elapsed time: 98.642878 msecs"
"Elapsed time: 105.021808 msecs"
"Elapsed time: 99.886083 msecs"
"Elapsed time: 98.49152 msecs"
"Elapsed time: 99.879767 msecs"
(time (r/fold + data))
"Elapsed time: 61.67537 msecs"
"Elapsed time: 56.811961 msecs"
"Elapsed time: 55.613058 msecs"
"Elapsed time: 58.359599 msecs"
"Elapsed time: 55.299767 msecs"
"Elapsed time: 62.989939 msecs"
"Elapsed time: 56.518486 msecs"
"Elapsed time: 54.218251 msecs"
"Elapsed time: 54.438623 msecs"
Criterium reports:
reduce 144 ms
r/fold 72 ms
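Those Criterium numbers can be reproduced with something like this (a sketch, assuming the Criterium library is on the classpath):

(require '[criterium.core :as crit])

(crit/quick-bench (reduce + data))
(crit/quick-bench (r/fold + data))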
Update #2
Rich Hickey talked about the design of transducers/reducers at the 2014 Clojure Conj. You may find these details useful. The basic idea is that the folding is delegated to each collection type, which uses knowledge of its implementation details to perform the fold efficiently.
Since Clojure's hash maps are built from arrays of nodes internally (a hash array mapped trie), they can also be folded in parallel efficiently.
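Folding a map works too; note that when folding a map the reducing function receives the key and value as separate arguments (an illustrative sketch, not from the talk):

(require '[clojure.core.reducers :as r])

(def m (zipmap (range 100000) (range 100000)))

;; sum all the values of the map in parallel; reducef is called as (f acc k v)
(r/fold + (fn [acc _k v] (+ acc v)) m)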
There is this talk by Guy Steele which predates reducers and might just have served as an inspiration for them.
https://vimeo.com/6624203

Why does `doall` not force the sequence to be counted?

(counted? (map identity (range 100))) ;; false, expected
(time (counted? (doall (map identity (range 100))))) ;; false, unexpected
(time (counted? (into '() (map identity (range 100))))) ;; true, expected - but slower
(Clojure "1.8.0")
The first result is expected since map is lazy.
The second is unexpected for me, since after doall the entire sequence has been realized and is now in memory. Since the implementation probably has to walk through the list anyway, why not count it?
The third is a workaround. Is it idiomatic? Is there an alternative?
It sounds like you already know that lazy sequences do not satisfy counted?.
However, in your example, whilst doall realizes the entire sequence, it still returns that result as a LazySeq. Have a look at this REPL output:
user=> (class (doall (map identity (range 100))))
clojure.lang.LazySeq
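To make the distinction concrete (a small illustration, not from the original answer): doall realizes the lazy seq, but the realized object is still a LazySeq, so counted? stays false and count has to walk it:

user=> (def xs (doall (map identity (range 100))))
#'user/xs
user=> (realized? xs)
true
user=> (counted? xs)
false
user=> (count xs)   ; works, but walks the seq in linear time
100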
Using something like into seems like the right way to go to me, because you need to force your result into a non-lazy collection. You say into is slower, but it still seems acceptably fast to me.
Nevertheless, you could perhaps improve the performance by calling vec on your result instead of using into:
user=> (time (counted? (into '() (map identity (range 100)))))
"Elapsed time: 0.287542 msecs"
true
user=> (time (counted? (vec (map identity (range 100)))))
"Elapsed time: 0.169342 msecs"
true
Note: I'm using Clojure 1.9, rather than 1.8 on my machine, so you may see different results.
Update / corrections:
Commenters have respectfully pointed out that:
1) time is terrible for benchmarking, and doesn't really provide any useful evidence in this instance.
2) (vec x) substituted for (list x); (list x) is a constant-time operation no matter what the contents of x.
3) doall returns its input as its output; you get a LazySeq if you passed in a LazySeq, or a map if you passed in a map, etc.
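Following up on point 1 (a sketch, assuming the Criterium library is available), a more trustworthy comparison would use quick-bench rather than time:

(require '[criterium.core :refer [quick-bench]])

(quick-bench (counted? (into '() (map identity (range 100)))))
(quick-bench (counted? (vec (map identity (range 100)))))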

Clojure Transients Example - No significant speedup

I copied the code from:
http://clojure.org/transients
but my results differ significantly from what was posted.
(defn vrange [n]
  (loop [i 0 v []]
    (if (< i n)
      (recur (inc i) (conj v i))
      v)))

(defn vrange2 [n]
  (loop [i 0 v (transient [])]
    (if (< i n)
      (recur (inc i) (conj! v i))
      (persistent! v))))
(quick-bench (def v (vrange 1000000)))
"Elapsed time: 459.59 msecs"
(quick-bench (def v2 (vrange2 1000000)))
"Elapsed time: 379.85 msecs"
That's a slight speedup, but nothing like the 8x boost implied in the example docs?
Starting Java in server mode changes the story, but still nothing like the docs:
(quick-bench (def v (vrange 1000000)))
"Elapsed time: 121.14 msecs"
(quick-bench (def v2 (vrange2 1000000)))
"Elapsed time: 75.15 msecs"
Is it that the persistent implementations have improved since the post about transients here: http://clojure.org/transients ?
What other factors might be contributing to the lack of boost with transients?
I'm using OpenJDK Java 1.7 on Ubuntu 12.04. Maybe that's a lot slower than the (presumed) HotSpot 1.6 version used in the docs? But wouldn't that imply BOTH tests should be slower by some constant factor, with the same relative gap?
Your results are consistent with my experience with transients. I've used them quite a bit and I typically see a 2x performance improvement.
I tried this on Ubuntu 12.04, OpenJDK 1.7 with Clojure 1.6.0 and 1.7.0-alpha3. I get 2x performance with transients, slightly less than the 3x I get on OS X with the Oracle 1.8 JVM.
Also, the above page is from the time of Clojure 1.2, and the performance of the persistent collections has improved significantly since then. I tried the experiment with 1.2, but Criterium doesn't work with it, so I had to use time just like on that page. Obviously the results are significantly variable (from 2x to 8x). I suspect the example in the documentation may have been cherry-picked.
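As an aside (not from the original answer), you rarely need to write the transient loop by hand: into uses transients internally whenever the target collection supports them, so a plain into is roughly equivalent to vrange2:

(defn vrange3 [n]
  (into [] (range n)))   ; into builds the vector with a transient under the hood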

How can I compute the sum of a large list of numbers in parallel using Clojure

I am trying to figure out how to use Clojure to efficiently apply a simple operation to a large sequence in parallel. I would like the parallel solution to take advantage of the multiple cores on my machine to achieve some speedup.
I am attempting to use pmap in combination with partition-all to reduce the overhead of creating a future for every item in the input seq. Unfortunately, partition-all forces the complete evaluation of each partition seq. This causes an OutOfMemoryError on my machine.
(defn sum [vs]
  (reduce + vs))

(def workers
  (+ 2 (.. Runtime getRuntime availableProcessors)))

(let [n 80000000
      vs (range n)]
  (time (sum vs))
  (time (sum (pmap sum (partition-all (long (/ n workers)) vs)))))
How can I apply sum to a large input set, and beat the performance of the serial implementation?
Solution
Thanks to Arthur Ulfeldt for pointing out the reducers library. Here is the solution using reducers. This code shows the expected performance improvement when running on a multi-core machine. (NOTE: I have changed vs to be a function to make the timing more accurate.)
(require '[clojure.core.reducers :as r])
(let [n 80000000
      vs #(range n)]
  (time (reduce + (vs)))
  (time (r/fold + (vs))))
When using pmap I have found that fairly large chunks are required to overcome the switching and future overhead; try a chunk size of 10,000 for an operation as fast as +. The potential gains are bounded by the overhead of generating the chunks, so there is an optimal chunk size that balances the available cores against the time required to make the chunks. In this case, with + as the workload, I was unable to make this faster than the single-threaded option.
If you're interested in doing this without pmap and potentially using fork/join check out the new(ish) reducers library
The OOM situation comes from the first test realizing the lazy sequence from (range n) which is then retained so it can be passed to the second sequence.
If I make the + function much slower by defining a slow+ function and use that, the difference between a single thread, pmap over chunks, and reducers with fork/join becomes visible:
user> *clojure-version*
{:major 1, :minor 5, :incremental 0, :qualifier "RC15"}
(require '[clojure.core.reducers :as r])
(def workers
  (+ 2 (.. Runtime getRuntime availableProcessors)))

(defn slow+
  ([] 0)
  ([x] x)
  ([x y] (reduce + (range 100000)) (+ x y)))

(defn run-test []
  (let [n 8000]
    (time (reduce slow+ (range n)))
    (time (reduce slow+ (pmap #(reduce slow+ %) (partition-all (* workers 100) (range n)))))
    (time (r/fold slow+ (vec (range n))))))
user> (run-test)
"Elapsed time: 28655.951241 msecs" ; one thread
"Elapsed time: 6975.488591 msecs" ; pmap over chunks
"Elapsed time: 8170.254426 msecs" ; using reducer

Performance of large maps in Clojure

I have a Clojure program which is using some large maps (1000 - 2000 items) which are accessed 30 - 40 times a second and using Strings as the keys. I was wondering if there is a big performance difference if I used keywords or symbols as the keys instead?
Clojure map lookups are very fast, and do not particularly depend on the size of the map.
In fact, they are almost as fast as pure Java HashMaps, while enjoying many advantages over traditional HashMaps including being immutable and thread-safe.
If you are only doing 30-40 lookups a second then I guarantee you will never notice the difference regardless of what you use as keys. Worrying about this would count as premature optimisation.
Let's prove it: the following code does a million map lookups using strings as keys:
(def str-keys (map str (range 1000)))
(def m (zipmap str-keys (range 1000)))
(time (dotimes [i 1000] (doseq [k str-keys] (m k))))
=> "Elapsed time: 69.082224 msecs"
The following does a million map lookups using keywords as keys:
(def kw-keys (map #(keyword (str %)) (range 1000)))
(def m (zipmap kw-keys (range 1000)))
(time (dotimes [i 1000] (doseq [k kw-keys] (m k))))
=> "Elapsed time: 59.212864 msecs"
And for symbols:
(def sym-keys (map #(symbol (str %)) (range 1000)))
(def m (zipmap sym-keys (range 1000)))
(time (dotimes [i 1000] (doseq [k sym-keys] (m k))))
=> "Elapsed time: 61.590925 msecs"
In my tests, symbols and keywords were slightly faster than strings, but the difference could easily be explained by statistical error, and the average execution time per lookup was less than 100 nanoseconds in all cases.
So your 30-40 lookups are probably taking on the order of 0.001% of your CPU time (and this even allows for the fact that in a real app, lookups will probably be a few times slower due to caching issues).
The likely reason for Keywords in particular being slightly faster is that they are interned (and can therefore use reference equality to check for equality). But as you can see the difference is sufficiently small that you really don't need to worry about it.
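A small illustration of the interning point (not part of the original answer): keywords with the same name are the same object, so equality can short-circuit on reference identity, while equal strings may be distinct objects that need a character-by-character comparison:

(identical? :foo :foo)               ;=> true, keywords are interned
(identical? "foo" (String. "foo"))   ;=> false, two distinct String objects
(= "foo" (String. "foo"))            ;=> true, but must compare the contents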