performance gain while using core.reducers - clojure

I tried the below to compare the performance of core/map vs transducers vc core.reducers/map vs core.reducers/fold -
(time (->> (range 10000)
(r/map inc)
(r/map inc)
(r/map inc)
(into [])))
;; core.reducers/map
;; "Elapsed time: 3.962802 msecs"
(time (->> (range 10000)
vec
(r/map inc)
(r/map inc)
(r/map inc)
(r/fold conj)))
;; core.reducers/fold
;; "Elapsed time: 3.318809 msecs"
(time (->> (range 10000)
(map inc)
(map inc)
(map inc)))
;; core/map
;; "Elapsed time: 0.148433 msecs"
(time (->> (range 10000)
(sequence (comp (map inc)
(map inc)
(map inc)))))
;; transducers
;; "Elapsed time: 0.215037 msecs"
1) My expectation was that core/map will have the highest time, however it has the lowest time. Why is it more performant than transducers, when intermediate seqs dont get created for transducers, and transducers should be faster ?
2) Why is the core.reducers/fold version not significantly faster than the core.reducers/map version, shouldnt it have parallelized the operation ?
3) Why are the core.reducers versions so slow as compared to their lazy counterparts, the whole sequence is being realized at the end, so should not eager evaluation be more performant than the lazy one ?

map is lazy, so your test case with core/map does no work at all. Try doalling the collection (or into []), and I expect it will be the slowest after all. You can convince yourself of this by changing 10000 to 1e12, and observe that if your computer can process a trillion elements just as quickly as it can process ten thousand, it must not be doing much work for each element!
What is there to parallelize? The most expensive part of this operation is not the calls to inc (which are parallelized), but combining the results into a vector at the end (which can't be). Try it with a much more expensive function, like #(do (Thread/sleep 500) (inc %)) and you may see different results.
Isn't this the same question as (1)?

;; core/map without transducers
(quick-bench (doall (->> [1 2 3 4]
(map inc)
(map inc)
(map inc))))
;; Evaluation count : 168090 in 6 samples of 28015 calls.
;; Execution time mean : 3.651319 µs
;; Execution time std-deviation : 88.055389 ns
;; Execution time lower quantile : 3.584198 µs ( 2.5%)
;; Execution time upper quantile : 3.799202 µs (97.5%)
;; Overhead used : 7.546189 ns
;; Found 1 outliers in 6 samples (16.6667 %)
;; low-severe 1 (16.6667 %)
;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;; transducers with a non lazy seq
(quick-bench (doall (->> [1 2 3 4]
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 214902 in 6 samples of 35817 calls.
;; Execution time mean : 2.776696 µs
;; Execution time std-deviation : 24.377634 ns
;; Execution time lower quantile : 2.750123 µs ( 2.5%)
;; Execution time upper quantile : 2.809933 µs (97.5%)
;; Overhead used : 7.546189 ns
;;;;
;; tranducers with a lazy seq
;;;;
(quick-bench (doall (->> (range 1 5)
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 214230 in 6 samples of 35705 calls.
;; Execution time mean : 3.361220 µs
;; Execution time std-deviation : 622.084860 ns
;; Execution time lower quantile : 2.874093 µs ( 2.5%)
;; Execution time upper quantile : 4.328653 µs (97.5%)
;; Overhead used : 7.546189 ns
;;;;
;; core.reducers
;;;;
(quick-bench (->> [1 2 3 4]
(r/map inc)
(r/map inc)
(r/map inc)))
;; Evaluation count : 6258966 in 6 samples of 1043161 calls.
;; Execution time mean : 89.610689 ns
;; Execution time std-deviation : 0.936108 ns
;; Execution time lower quantile : 88.786938 ns ( 2.5%)
;; Execution time upper quantile : 91.128549 ns (97.5%)
;; Overhead used : 7.546189 ns
;; Found 1 outliers in 6 samples (16.6667 %)
;; low-severe 1 (16.6667 %)
;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;;;; Evaluating a larger range so that the chunking comes into play ;;;;
;; core/map without transducers
(quick-bench (doall (->> (range 500)
(map inc)
(map inc)
(map inc))))
;; transducers with a non lazy seq
(quick-bench (doall (->> (doall (range 500))
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 2598 in 6 samples of 433 calls.
;; Execution time mean : 237.164523 µs
;; Execution time std-deviation : 5.336417 µs
;; Execution time lower quantile : 231.751575 µs ( 2.5%)
;; Execution time upper quantile : 244.836021 µs (97.5%)
;; Overhead used : 7.546189 ns
;; tranducers with a lazy seq
(quick-bench (doall (->> (range 500)
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 3210 in 6 samples of 535 calls.
;; Execution time mean : 183.866148 µs
;; Execution time std-deviation : 1.799841 µs
;; Execution time lower quantile : 182.137656 µs ( 2.5%)
;; Execution time upper quantile : 186.347677 µs (97.5%)
;; Overhead used : 7.546189 ns
;; core.reducers
(quick-bench (->> (range 500)
(r/map inc)
(r/map inc)
(r/map inc)))
;; Evaluation count : 4695642 in 6 samples of 782607 calls.
;; Execution time mean : 126.973627 ns
;; Execution time std-deviation : 5.972927 ns
;; Execution time lower quantile : 122.471060 ns ( 2.5%)
;; Execution time upper quantile : 134.181056 ns (97.5%)
;; Overhead used : 7.546189 ns
Based on the above answers / comments I tried the benchmarking again -
1) The reducers version is faster on a magnitude of 10^3.
2) This applies for both small collections (4 elements) and larger ones (500 element) where chunking can happen for lazy seqs.
3) Thus even with chunking, lazy evaluation is much slower than eager evaluation.
Corrections based on the remark :- the reducers only get executed on the reduce operation, which was not getting executed in the above code -
(quick-bench (->> [1 2 3 4]
(r/map inc)
(r/map inc)
(r/map inc)
(into [])))
;; Evaluation count : 331302 in 6 samples of 55217 calls.
;; Execution time mean : 2.035153 µs
;; Execution time std-deviation : 314.070348 ns
;; Execution time lower quantile : 1.720615 µs ( 2.5%)
;; Execution time upper quantile : 2.381706 µs (97.5%)
;; Overhead used : 7.546189 ns
(quick-bench (->> (range 500)
(r/map inc)
(r/map inc)
(r/map inc)
(into [])))
;; Evaluation count : 3870 in 6 samples of 645 calls.
;; Execution time mean : 150.349870 µs
;; Execution time std-deviation : 2.825632 µs
;; Execution time lower quantile : 146.468231 µs ( 2.5%)
;; Execution time upper quantile : 153.271325 µs (97.5%)
;; Overhead used : 7.546189 ns
So the reducer versions are 30-70 % faster than the transducer counterparts. The performance differential increases as the data set size increases.

Related

Why reduce performs much sub-optimal than recur in this case?

One of intuitive ways to calculate π in polynomial sum looks like below,
π = ( 1/1 - 1/3 + 1/5 - 1/7 + 1/9 ... ) × 4
The following function ρ or ρ' denotes the polynomial sum, and the consumed time τ to calculate the π is measured respectively,
(defn term [k]
(let [x (/ 1. (inc (* 2 k)))]
(if (even? k)
x
(- x))))
(defn ρ [n]
(reduce
(fn [res k] (+ res (term k)))
0
(lazy-seq (range 0 n))))
(defn ρ' [n]
(loop [res 0 k 0]
(if (< k n)
(recur (+ res (term k)) (inc k))
res)))
(defn calc [P]
(let [start (System/nanoTime)
π (* (P 1000000) 4)
end (System/nanoTime)
τ (- end start)]
(printf "π=%.16f τ=%d\n" π τ)))
(calc ρ)
(calc ρ')
The result tells that ρ is about half more time spent than ρ', hence the underlying reduce performs much sub-optimal than recur in this case, but why?
Rewriting your code and using a more accurate timer shows there is no significant difference. This is to be expected since both loop/recur and reduce are very basic forms and we would expect them to both be fairly optimized.
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test)
(:require
[criterium.core :as crit] ))
(def result (atom nil))
(defn term [k]
(let [x (/ 1. (inc (* 2 k)))]
(if (even? k)
x
(- x))))
(defn ρ [n]
(reduce
(fn [res k] (+ res (term k)))
0
(range 0 n)) )
(defn ρ' [n]
(loop [res 0 k 0]
(if (< k n)
(recur (+ res (term k)) (inc k))
res)) )
(defn calc [calc-fn N]
(let [pi (* (calc-fn N) 4)]
(reset! result pi)
pi))
We measure the execution time for both algorithms using Criterium:
(defn timings
[power]
(let [N (Math/pow 10 power)]
(newline)
(println :-----------------------------------------------------------------------------)
(spyx N)
(newline)
(crit/quick-bench (calc ρ N))
(println :rho #result)
(newline)
(crit/quick-bench (calc ρ' N))
(println :rho-prime N #result)
(newline)))
and we try it for 10^2, 10^4, and 10^6 values of N:
(dotest
(timings 2)
(timings 4)
(timings 6))
with results for 10^2:
-------------------------------
Clojure 1.10.1 Java 14
-------------------------------
Testing tst.demo.core
:-----------------------------------------------------------------------------
N => 100.0
Evaluation count : 135648 in 6 samples of 22608 calls.
Execution time mean : 4.877255 µs
Execution time std-deviation : 647.723342 ns
Execution time lower quantile : 4.438762 µs ( 2.5%)
Execution time upper quantile : 5.962740 µs (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 31.6928 % Variance is moderately inflated by outliers
:rho 3.1315929035585537
Evaluation count : 148434 in 6 samples of 24739 calls.
Execution time mean : 4.070798 µs
Execution time std-deviation : 68.430348 ns
Execution time lower quantile : 4.009978 µs ( 2.5%)
Execution time upper quantile : 4.170038 µs (97.5%)
Overhead used : 2.165947 ns
:rho-prime 100.0 3.1315929035585537
with results for 10^4:
:-----------------------------------------------------------------------------
N => 10000.0
Evaluation count : 1248 in 6 samples of 208 calls.
Execution time mean : 519.096208 µs
Execution time std-deviation : 143.478354 µs
Execution time lower quantile : 454.389510 µs ( 2.5%)
Execution time upper quantile : 767.610509 µs (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 65.1517 % Variance is severely inflated by outliers
:rho 3.1414926535900345
Evaluation count : 1392 in 6 samples of 232 calls.
Execution time mean : 431.020370 µs
Execution time std-deviation : 14.853924 µs
Execution time lower quantile : 420.838884 µs ( 2.5%)
Execution time upper quantile : 455.282989 µs (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
:rho-prime 10000.0 3.1414926535900345
with results for 10^6:
:-----------------------------------------------------------------------------
N => 1000000.0
Evaluation count : 18 in 6 samples of 3 calls.
Execution time mean : 46.080480 ms
Execution time std-deviation : 1.039714 ms
Execution time lower quantile : 45.132049 ms ( 2.5%)
Execution time upper quantile : 47.430310 ms (97.5%)
Overhead used : 2.165947 ns
:rho 3.1415916535897743
Evaluation count : 18 in 6 samples of 3 calls.
Execution time mean : 52.527777 ms
Execution time std-deviation : 17.483930 ms
Execution time lower quantile : 41.789520 ms ( 2.5%)
Execution time upper quantile : 82.539445 ms (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 81.7010 % Variance is severely inflated by outliers
:rho-prime 1000000.0 3.1415916535897743
Note that the times for rho and rho-prime flip-flop for the 10^4 and 10^6 cases. In any case, don't believe or worry much about timings that vary by less than 2x.
Update
I deleted the lazy-seq in the original code since clojure.core/range is already lazy. Also, I've never seen lazy-seq used without a cons and a recursive call to the generating function.
Re clojure.core/range, we have the docs:
range
Returns a lazy seq of nums from start (inclusive) to end (exclusive),
by step, where start defaults to 0, step to 1, and end to infinity.
When step is equal to 0, returns an infinite sequence of start. When
start is equal to end, returns empty list.
In the source code, it calls out into the Java impl of clojure.core:
([start end]
(if (and (instance? Long start) (instance? Long end))
(clojure.lang.LongRange/create start end)
(clojure.lang.Range/create start end)))
& the Java code indicates it is chunked:
public class Range extends ASeq implements IChunkedSeq, IReduce {
private static final int CHUNK_SIZE = 32;
<snip>
Additionally to other answers.
Performance can be significantly increased in you eliminate math boxing (original versions were both about 25ms). And variant with loop/recur is 2× faster.
(set! *unchecked-math* :warn-on-boxed)
(defn term ^double [^long k]
(let [x (/ 1. (inc (* 2 k)))]
(if (even? k)
x
(- x))))
(defn ρ [n]
(reduce
(fn [^double res ^long k] (+ res (term k)))
0
(range 0 n)))
(defn ρ' [^long n]
(loop [res (double 0) k 0]
(if (< k n)
(recur (+ res (term k)) (inc k))
res)))
(criterium.core/quick-bench
(ρ 1000000))
Evaluation count : 42 in 6 samples of 7 calls.
Execution time mean : 15,639294 ms
Execution time std-deviation : 371,972168 µs
Execution time lower quantile : 15,327698 ms ( 2,5%)
Execution time upper quantile : 16,227505 ms (97,5%)
Overhead used : 1,855553 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 13,8889 % Variance is moderately inflated by outliers
=> nil
(criterium.core/quick-bench
(ρ' 1000000))
Evaluation count : 72 in 6 samples of 12 calls.
Execution time mean : 8,570961 ms
Execution time std-deviation : 302,554974 µs
Execution time lower quantile : 8,285648 ms ( 2,5%)
Execution time upper quantile : 8,919635 ms (97,5%)
Overhead used : 1,855553 ns
=> nil
Below is the improved version to be more representative. Seemingly, the performance varies from case to case but not by that much.
(defn term [k]
(let [x (/ 1. (inc (* 2 k)))]
(if (even? k)
x
(- x))))
(defn ρ1 [n]
(loop [res 0 k 0]
(if (< k n)
(recur (+ res (term k)) (inc k))
res)))
(defn ρ2 [n]
(reduce
(fn [res k] (+ res (term k)))
0
(range 0 n)))
(defn ρ3 [n]
(reduce + 0 (map term (range 0 n))))
(defn ρ4 [n]
(transduce (map term) + 0 (range 0 n)))
(defn calc [ρname ρ n]
(let [start (System/nanoTime)
π (* (ρ n) 4)
end (System/nanoTime)
τ (- end start)]
(printf "ρ=%8s n=%10d π=%.16f τ=%10d\n" ρname n π τ)))
(def args
{:N (map #(long (Math/pow 10 %)) [4 6])
:T 10
:P {:recur ρ1 :reduce ρ2 :mreduce ρ3 :xreduce ρ4}})
(doseq [n (:N args)]
(dotimes [_ (:T args)]
(println "---")
(doseq [kv (:P args)] (calc (key kv) (val kv) n))))

Trim unnecessary entries from deeply nested data structure using specter

I'm looking to use Clojure Specter to simplify a deeply nested datastructure. I want to remove:
any entries with nil values
any entries with empty string values
any entries with empty map values
any entries with empty sequential values
any entries with maps/sequential values that are empty after removing the above cases.
Something like this:
(do-something
{:a {:aa 1}
:b {:ba -1
:bb 2
:bc nil
:bd ""
:be []
:bf {}
:bg {:ga nil}
:bh [nil]
:bi [{}]
:bj [{:ja nil}]}
:c nil
:d ""
:e []
:f {}
:g {:ga nil}
:h [nil]
:i [{}]
:j [{:ja nil}]})
=>
{:a {:aa 1}
:b {:ba -1
:bb 2}}
I have something in vanilla Clojure:
(defn prunable?
[v]
(if (sequential? v)
(keep identity v)
(or (nil? v) (#{"" [] {}} v))))
(defn- remove-nil-values
[ticket]
(clojure.walk/postwalk
(fn [el]
(if (map? el)
(let [m (into {} (remove (comp prunable? second) el))]
(when (seq m)
m))
el))
ticket))
I think I need some sort of recursive-path but I'm not getting anywhere fast. Help much appreciated.
Comparing the performance of different versions against specter implementation:
#bm1729 plain vanilla version:
Evaluation count : 1060560 in 60 samples of 17676 calls.
Execution time mean : 57.083226 µs
Execution time std-deviation : 543.184398 ns
Execution time lower quantile : 56.559237 µs ( 2.5%)
Execution time upper quantile : 58.519433 µs (97.5%)
Overhead used : 7.023993 ns
Found 5 outliers in 60 samples (8.3333 %)
low-severe 3 (5.0000 %)
low-mild 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Below version:
Evaluation count : 3621960 in 60 samples of 60366 calls.
Execution time mean : 16.606135 µs
Execution time std-deviation : 141.114975 ns
Execution time lower quantile : 16.481250 µs ( 2.5%)
Execution time upper quantile : 16.922734 µs (97.5%)
Overhead used : 7.023993 ns
Found 9 outliers in 60 samples (15.0000 %)
low-severe 6 (10.0000 %)
low-mild 3 (5.0000 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
(defn prune [x]
(cond
(map? x) (not-empty
(reduce-kv
(fn [s k v]
(let [v' (prune v)]
(cond-> s
v' (assoc k v'))))
(empty x)
x))
(seqable? x) (not-empty
(into
(empty x)
(->> x (map prune) (filter identity))))
:else x))
Test case:
(prune {:a {:aa 1}
:b {:ba -1
:bb 2
:bc nil
:bd ""
:be []
:bf {}
:bg {:ga nil}
:bh [nil]
:bi [{}]
:bj [{:ja nil}]}
:c nil
:d ""
:e []
:f {}
:g {:ga nil}
:h [nil]
:i [{}]
:j [{:ja nil}]})
;; => {:b {:bb 2, :ba -1}, :a {:aa 1}}
UPDATE - #bm1729 specter version
Evaluation count : 3314820 in 60 samples of 55247 calls.
Execution time mean : 18.421613 µs
Execution time std-deviation : 591.106243 ns
Execution time lower quantile : 18.148204 µs ( 2.5%)
Execution time upper quantile : 20.674292 µs (97.5%)
Overhead used : 7.065044 ns
Found 8 outliers in 60 samples (13.3333 %)
low-severe 2 (3.3333 %)
low-mild 6 (10.0000 %)
Variance from outliers : 18.9883 % Variance is moderately inflated by outliers
Thanks to nathanmarz on the Clojurians slack channel:
(def COMPACTED-VALS-PATH
(recursive-path [] p
(continue-then-stay
(cond-path
map? [(compact MAP-VALS) p]
vector? [(compact ALL) p]))))
(defn- compact-data
[m]
(setval [MAP-VALS COMPACTED-VALS-PATH #(or (nil? %) (= "" %))] NONE m))

Optimize tail-recursion in Clojure: exponential moving average

I'm new to Clojure and trying to implement an exponential moving average function using tail recursion. After battling a little with stack overflows using lazy-seq and concat, I got to the following implementation which works, but is very slow:
(defn ema3 [c a]
(loop [ct (rest c) res [(first c)]]
(if (= (count ct) 0)
res
(recur
(rest ct)
(into;NOT LAZY-SEQ OR CONCAT
res
[(+ (* a (first ct)) (* (- 1 a) (last res)))]
)
)
)
)
)
For a 10,000 item collection, Clojure will take around 1300ms, whereas a Python Pandas call such as
s.ewm(alpha=0.3, adjust=True).mean()
will only take 700 us. How can I reduce that performance gap? Thank you,
Personally I would do this lazily with reductions. It's simpler to do than using loop/recur or building up a result vector by hand with reduce, and it also means you can consume the result as it is built up, rather than needing to wait for the last element to be finished before you can look at the first one.
If you care most about throughput then I suppose Taylor Wood's reduce is the best approach, but the lazy solution is only very slightly slower and is much more flexible.
(defn ema3-reductions [c a]
(let [a' (- 1 a)]
(reductions
(fn [ave x]
(+ (* a x)
(* (- 1 a') ave)))
(first c)
(rest c))))
user> (quick-bench (dorun (ema3-reductions (range 10000) 0.3)))
Evaluation count : 288 in 6 samples of 48 calls.
Execution time mean : 2.336732 ms
Execution time std-deviation : 282.205842 µs
Execution time lower quantile : 2.125654 ms ( 2.5%)
Execution time upper quantile : 2.686204 ms (97.5%)
Overhead used : 8.637601 ns
nil
user> (quick-bench (dorun (ema3-reduce (range 10000) 0.3)))
Evaluation count : 270 in 6 samples of 45 calls.
Execution time mean : 2.357937 ms
Execution time std-deviation : 26.934956 µs
Execution time lower quantile : 2.311448 ms ( 2.5%)
Execution time upper quantile : 2.381077 ms (97.5%)
Overhead used : 8.637601 ns
nil
Honestly in that benchmark you can't even tell the lazy version is slower than the vector version. I think my version is still slower, but it's a vanishingly trivial difference.
You can also speed things up if you tell Clojure to expect doubles, so it doesn't have to keep double-checking the types of a, c, and so on.
(defn ema3-reductions-prim [c ^double a]
(let [a' (- 1.0 a)]
(reductions (fn [ave x]
(+ (* a (double x))
(* a' (double ave))))
(first c)
(rest c))))
user> (quick-bench (dorun (ema3-reductions-prim (range 10000) 0.3)))
Evaluation count : 432 in 6 samples of 72 calls.
Execution time mean : 1.720125 ms
Execution time std-deviation : 385.880730 µs
Execution time lower quantile : 1.354539 ms ( 2.5%)
Execution time upper quantile : 2.141612 ms (97.5%)
Overhead used : 8.637601 ns
nil
Another 25% speedup, not too bad. I expect you could squeeze out a bit more by using primitives in either a reduce solution or with loop/recur if you were really desperate. It would be especially helpful in a loop because you wouldn't have to keep boxing and unboxing the intermediate results between double and Double.
If res is a vector (which it is in your example) then using peek instead of last yields much better performance:
(defn ema3 [c a]
(loop [ct (rest c) res [(first c)]]
(if (= (count ct) 0)
res
(recur
(rest ct)
(into
res
[(+ (* a (first ct)) (* (- 1 a) (peek res)))])))))
Your example on my computer:
(time (ema3 (range 10000) 0.3))
"Elapsed time: 990.417668 msecs"
Using peek:
(time (ema3 (range 10000) 0.3))
"Elapsed time: 9.736761 msecs"
Here's a version using reduce that's even faster on my computer:
(defn ema3 [c a]
(reduce (fn [res ct]
(conj
res
(+ (* a ct)
(* (- 1 a) (peek res)))))
[(first c)]
(rest c)))
;; "Elapsed time: 0.98824 msecs"
Take these timings with a grain of salt. Use something like criterium for more thorough benchmarking. You might be able to squeeze out more gains using mutability/transients.

clojure simple markov data transform

If I have a vector of words for example ["john" "said"... "john" "walked"...]
and I want to make a hash map of each word and the number of occurrences of next word for example {"john" {"said" 1 "walked" 1 "kicked" 3}}
The best solution I came up with was recursively walking through a list by index and using assoc to keep updating the hash-map but that seems really messy. Is there a more idiomatic way of doing this?
Given you have words:
(def words ["john" "said" "lara" "chased" "john" "walked" "lara" "chased"])
Use this transformation-fn
(defn transform
[words]
(->> words
(partition 2 1)
(reduce (fn [acc [w next-w]]
;; could be shortened to #(update-in %1 %2 (fnil inc 0))
(update-in acc
[w next-w]
(fnil inc 0)))
{})))
(transform words)
;; {"walked" {"lara" 1}, "chased" {"john" 1}, "lara" {"chased" 2}, "said" {"lara" 1}, "john" {"walked" 1, "said" 1}}
EDIT: You can gain performance using transient hash-maps like this:
(defn transform-fast
[words]
(->> (map vector words (next words))
(reduce (fn [acc [w1 w2]]
(let [c-map (get acc w1 (transient {}))]
(assoc! acc w1 (assoc! c-map w2
(inc (get c-map w2 0))))))
(transient {}))
persistent!
(reduce-kv (fn [acc w1 c-map]
(assoc! acc w1 (persistent! c-map)))
(transient {}))
persistent!))
Obviously the resulting source code doesn't look as nice and such optimization should only happen if it is critical.
(Criterium says it beats Michał Marczyks transform* being roughly two times as fast on King Lear).
(Update: see below for a solution using java.util.HashMap for the intermediate products -- end result still being fully persistent -- which is the fastest yet, with a factor of 2.35 advantage over transform-fast in the King Lear benchmark.)
merge-with-based solution
Here's a faster solution, by a factor of roughly 1.7 on words taken from King Lear (see below for the exact methodology), almost 3x on the sample words:
(defn transform* [words]
(apply merge-with
#(merge-with + %1 %2)
(map (fn [w nw] {w {nw 1}})
words
(next words))))
The function passed to map could alternatively be written
#(array-map %1 (array-map %2 1)),
although the timings with this approach aren't quite as good. (I still include this version in the benchmark below as transform**.)
First, a sanity check:
;; same input
(def words ["john" "said" "lara" "chased" "john"
"walked" "lara" "chased"])
(= (transform words) (transform* words) (transform** words))
;= true
Criterium benchmark using the test input (OpenJDK 1.7 with -XX:+UseConcMarkSweepGC):
(do (c/bench (transform words))
(flush)
(c/bench (transform* words))
(flush)
(c/bench (transform** words)))
Evaluation count : 4345080 in 60 samples of 72418 calls.
Execution time mean : 13.945669 µs
Execution time std-deviation : 158.808075 ns
Execution time lower quantile : 13.696874 µs ( 2.5%)
Execution time upper quantile : 14.295440 µs (97.5%)
Overhead used : 1.612143 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 12998220 in 60 samples of 216637 calls.
Execution time mean : 4.705608 µs
Execution time std-deviation : 63.133406 ns
Execution time lower quantile : 4.605234 µs ( 2.5%)
Execution time upper quantile : 4.830540 µs (97.5%)
Overhead used : 1.612143 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 10847220 in 60 samples of 180787 calls.
Execution time mean : 5.706852 µs
Execution time std-deviation : 73.589941 ns
Execution time lower quantile : 5.560404 µs ( 2.5%)
Execution time upper quantile : 5.828209 µs (97.5%)
Overhead used : 1.612143 ns
Finally, the more interesting benchmark using King Lear as found on Project Gutenberg (not bothering to strip out the legal notices and such before processing):
(def king-lear (slurp (io/file "/path/to/pg1128.txt")))
(def king-lear-words
(-> king-lear
(string/lower-case)
(string/replace #"[^a-z]" " ")
(string/trim)
(string/split #"\s+")))
(do (c/bench (transform king-lear-words))
(flush)
(c/bench (transform* king-lear-words))
(flush)
(c/bench (transform** king-lear-words)))
Evaluation count : 720 in 60 samples of 12 calls.
Execution time mean : 87.012898 ms
Execution time std-deviation : 833.381589 µs
Execution time lower quantile : 85.772832 ms ( 2.5%)
Execution time upper quantile : 88.585741 ms (97.5%)
Overhead used : 1.612143 ns
Evaluation count : 1200 in 60 samples of 20 calls.
Execution time mean : 51.786860 ms
Execution time std-deviation : 587.029829 µs
Execution time lower quantile : 50.854355 ms ( 2.5%)
Execution time upper quantile : 52.940274 ms (97.5%)
Overhead used : 1.612143 ns
Evaluation count : 1020 in 60 samples of 17 calls.
Execution time mean : 61.287369 ms
Execution time std-deviation : 720.816107 µs
Execution time lower quantile : 60.131219 ms ( 2.5%)
Execution time upper quantile : 62.960647 ms (97.5%)
Overhead used : 1.612143 ns
java.util.HashMap-based solution
Going all out, it's possible to do even better using mutable hash maps for the intermediate state and loop / recur to avoid consing while looping over pairs of words:
(defn t9 [words]
(let [m (java.util.HashMap.)]
(loop [ws words
nws (next words)]
(if nws
(let [w (first ws)
nw (first nws)]
(if-let [im ^java.util.HashMap (.get m w)]
(.put im nw (inc (or (.get im nw) 0)))
(let [im (java.util.HashMap.)]
(.put im nw 1)
(.put m w im)))
(recur (next ws) (next nws)))
(persistent!
(reduce (fn [out k]
(assoc! out k
(clojure.lang.PersistentHashMap/create
^java.util.HashMap (.get m k))))
(transient {})
(iterator-seq (.iterator (.keySet m)))))))))
clojure.lang.PersistentHashMap/create is a static method in the PHM class and is admittedly an implementation detail. (Not likely to change in the near future, though -- currently all map creation in Clojure for the built-in map types goes through static methods like this.)
Sanity check:
(= (transform king-lear-words) (t9 king-lear-words))
;= true
Benchmark results:
(c/bench (transform-fast king-lear-words))
Evaluation count : 2100 in 60 samples of 35 calls.
Execution time mean : 28.560527 ms
Execution time std-deviation : 262.483916 µs
Execution time lower quantile : 28.117982 ms ( 2.5%)
Execution time upper quantile : 29.104784 ms (97.5%)
Overhead used : 1.898836 ns
(c/bench (t9 king-lear-words))
Evaluation count : 4980 in 60 samples of 83 calls.
Execution time mean : 12.153898 ms
Execution time std-deviation : 119.028100 µs
Execution time lower quantile : 11.953013 ms ( 2.5%)
Execution time upper quantile : 12.411588 ms (97.5%)
Overhead used : 1.898836 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Get all moving k-sized partitions of a string

I'm trying to get all "moving" partitions sized k of a string. Basically, I want to move a window of sized k along the string and get that k-word.
Here's an example,
k: 3
Input: ABDEFGH
Output: ABD, EFG, BDE, FGH, DEF
My idea was to walk along the input, drop a head and partition and then drop a head again from the previously (now headless) sequence, but I'm not sure exactly how to do this...Also, maybe there's a better way of doing this? Below is the idea I had in mind.
(#(partition k input) (collection of s where head was consecutively dropped))
Strings in Clojure can be treated as seqs of characters, so you can partition them directly. To get a sequence of overlapping partitions, use the version that accepts a size and a step:
user> (partition 3 1 "abcdef")
((\a \b \c) (\b \c \d) (\c \d \e) (\d \e \f))
To put a character sequence back into a string, just apply str to it:
user> (apply str '(\a \b \c))
"abc"
To put it all together:
user> (map (partial apply str) (partition 3 1 "abcdef"))
("abc" "bcd" "cde" "def")
Here are implementations of partition and partition-all for strings, returning a lazy-seq of strings, doing the splitting using subs. If you need high performance doing string transformations, these will be significantly faster (by average 8 times as fast, see criterium benchmarks below) than creating char-seqs.
(defn partition-string
"Like partition, but splits string using subs and returns a
lazy-seq of strings."
([n s]
(partition-string n n s))
([n p s]
(lazy-seq
(if-not (< (count s) n)
(cons
(subs s 0 n)
(->> (subs s p)
(partition-string n p)))))))
(defn partition-string-all
"Like partition-all, but splits string using subs and returns a
lazy-seq of strings."
([n s]
(partition-string-all n n s))
([n p s]
(let [less (if (< (count s) n)
(count s))]
(lazy-seq
(cons
(subs s 0 (or less n))
(if-not less
(->> (subs s p)
(partition-string-all n p))))))))
;; Alex answer:
;; (let [test-str "abcdefghijklmnopqrstuwxyz"]
;; (criterium.core/bench
;; (doall
;; (map (partial apply str) (partition 3 1 test-str)))))
;; WARNING: Final GC required 1.010207840526515 % of runtime
;; Evaluation count : 773220 in 60 samples of 12887 calls.
;; Execution time mean : 79.900801 µs
;; Execution time std-deviation : 2.008823 µs
;; Execution time lower quantile : 77.725304 µs ( 2.5%)
;; Execution time upper quantile : 83.888349 µs (97.5%)
;; Overhead used : 17.786101 ns
;; Found 3 outliers in 60 samples (5.0000 %)
;; low-severe 3 (5.0000 %)
;; Variance from outliers : 12.5585 % Variance is moderately inflated by outliers
;; KobbyPemson answer:
;; (let [test-str "abcdefghijklmnopqrstuwxyz"]
;; (criterium.core/bench
;; (doall
;; (moving-partition test-str 3))))
;; WARNING: Final GC required 1.674347646128195 % of runtime
;; Evaluation count : 386820 in 60 samples of 6447 calls.
;; Execution time mean : 161.928479 µs
;; Execution time std-deviation : 8.362590 µs
;; Execution time lower quantile : 154.707888 µs ( 2.5%)
;; Execution time upper quantile : 184.095816 µs (97.5%)
;; Overhead used : 17.786101 ns
;; Found 3 outliers in 60 samples (5.0000 %)
;; low-severe 2 (3.3333 %)
;; low-mild 1 (1.6667 %)
;; Variance from outliers : 36.8985 % Variance is moderately inflated by outliers
;; This answer
;; (let [test-str "abcdefghijklmnopqrstuwxyz"]
;; (criterium.core/bench
;; (doall
;; (partition-string 3 1 test-str))))
;; WARNING: Final GC required 1.317098148979236 % of runtime
;; Evaluation count : 5706000 in 60 samples of 95100 calls.
;; Execution time mean : 10.480174 µs
;; Execution time std-deviation : 240.957206 ns
;; Execution time lower quantile : 10.234580 µs ( 2.5%)
;; Execution time upper quantile : 11.075740 µs (97.5%)
;; Overhead used : 17.786101 ns
;; Found 3 outliers in 60 samples (5.0000 %)
;; low-severe 3 (5.0000 %)
;; Variance from outliers : 10.9961 % Variance is moderately inflated by outliers
(defn moving-partition
[input k]
(map #(.substring input % (+ k %))
(range (- (count input) (dec k)))))