I'm new to Clojure and trying to implement an exponential moving average function using tail recursion. After battling a little with stack overflows using lazy-seq and concat, I got to the following implementation which works, but is very slow:
(defn ema3 [c a]
  (loop [ct (rest c) res [(first c)]]
    (if (= (count ct) 0)
      res
      (recur
        (rest ct)
        (into ;; NOT lazy-seq or concat
          res
          [(+ (* a (first ct)) (* (- 1 a) (last res)))])))))
For a 10,000-item collection, Clojure takes around 1300 ms, whereas a Python Pandas call such as
s.ewm(alpha=0.3, adjust=True).mean()
takes only about 700 µs. How can I reduce that performance gap? Thank you.
Personally I would do this lazily with reductions. It's simpler to do than using loop/recur or building up a result vector by hand with reduce, and it also means you can consume the result as it is built up, rather than needing to wait for the last element to be finished before you can look at the first one.
If you care most about throughput then I suppose Taylor Wood's reduce is the best approach, but the lazy solution is only very slightly slower and is much more flexible.
(defn ema3-reductions [c a]
  (let [a' (- 1 a)]
    (reductions
      (fn [ave x]
        (+ (* a x)
           (* a' ave)))
      (first c)
      (rest c))))
user> (quick-bench (dorun (ema3-reductions (range 10000) 0.3)))
Evaluation count : 288 in 6 samples of 48 calls.
Execution time mean : 2.336732 ms
Execution time std-deviation : 282.205842 µs
Execution time lower quantile : 2.125654 ms ( 2.5%)
Execution time upper quantile : 2.686204 ms (97.5%)
Overhead used : 8.637601 ns
nil
user> (quick-bench (dorun (ema3-reduce (range 10000) 0.3)))
Evaluation count : 270 in 6 samples of 45 calls.
Execution time mean : 2.357937 ms
Execution time std-deviation : 26.934956 µs
Execution time lower quantile : 2.311448 ms ( 2.5%)
Execution time upper quantile : 2.381077 ms (97.5%)
Overhead used : 8.637601 ns
nil
Honestly, in that benchmark you can't even tell the lazy version is slower than the vector version. I think my version is still slower, but the difference is vanishingly small.
You can also speed things up by telling Clojure to expect doubles, so it doesn't have to keep re-checking the types of a, the elements of c, and so on.
(defn ema3-reductions-prim [c ^double a]
  (let [a' (- 1.0 a)]
    (reductions (fn [ave x]
                  (+ (* a (double x))
                     (* a' (double ave))))
                (first c)
                (rest c))))
user> (quick-bench (dorun (ema3-reductions-prim (range 10000) 0.3)))
Evaluation count : 432 in 6 samples of 72 calls.
Execution time mean : 1.720125 ms
Execution time std-deviation : 385.880730 µs
Execution time lower quantile : 1.354539 ms ( 2.5%)
Execution time upper quantile : 2.141612 ms (97.5%)
Overhead used : 8.637601 ns
nil
Another 25% speedup, not too bad. I expect you could squeeze out a bit more by using primitives in either a reduce solution or with loop/recur if you were really desperate. It would be especially helpful in a loop because you wouldn't have to keep boxing and unboxing the intermediate results between double and Double.
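For illustration, here is a sketch of the primitive-friendly loop/recur variant alluded to above, assuming a vector input; the function name and structure are mine, not from the thread. The accumulator is carried as a primitive double between iterations, which avoids the boxing/unboxing of intermediate results:

```clojure
;; Sketch: EMA via loop/recur, keeping the running average as a
;; primitive double so it is never boxed between iterations.
(defn ema3-loop [v ^double a]
  (let [a' (- 1.0 a)
        n  (count v)]
    (loop [i    1
           prev (double (first v))
           res  (transient [(double (first v))])]
      (if (< i n)
        (let [nxt (+ (* a (double (nth v i))) (* a' prev))]
          (recur (inc i) nxt (conj! res nxt)))
        (persistent! res)))))

(ema3-loop [1.0 2.0 3.0] 0.5)
;; => [1.0 1.5 2.25]
```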
If res is a vector (which it is in your example) then using peek instead of last yields much better performance:
(defn ema3 [c a]
  (loop [ct (rest c) res [(first c)]]
    (if (= (count ct) 0)
      res
      (recur
        (rest ct)
        (into
          res
          [(+ (* a (first ct)) (* (- 1 a) (peek res)))])))))
Your example on my computer:
(time (ema3 (range 10000) 0.3))
"Elapsed time: 990.417668 msecs"
Using peek:
(time (ema3 (range 10000) 0.3))
"Elapsed time: 9.736761 msecs"
Here's a version using reduce that's even faster on my computer:
(defn ema3 [c a]
  (reduce (fn [res ct]
            (conj
              res
              (+ (* a ct)
                 (* (- 1 a) (peek res)))))
          [(first c)]
          (rest c)))
;; "Elapsed time: 0.98824 msecs"
Take these timings with a grain of salt. Use something like criterium for more thorough benchmarking. You might be able to squeeze out more gains using mutability/transients.
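For instance, a transient-based variant of the reduce version might look like the sketch below (untimed; the name is mine). Note that peek doesn't work on transient vectors, but nth and count do, so the last element is read by index:

```clojure
;; Sketch: same reduce-based EMA, accumulating into a transient vector
;; to cut per-step persistent-vector allocation.
(defn ema3-transient [c a]
  (let [a' (- 1 a)]
    (persistent!
     (reduce (fn [res x]
               ;; transients support nth/count, but not peek
               (conj! res (+ (* a x)
                             (* a' (nth res (dec (count res)))))))
             (transient [(first c)])
             (rest c)))))

(ema3-transient [1.0 2.0 3.0] 0.5)
;; => [1.0 1.5 2.25]
```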
Related
One of the intuitive ways to calculate π is the polynomial sum below,
π = ( 1/1 - 1/3 + 1/5 - 1/7 + 1/9 ... ) × 4
The following functions ρ and ρ' compute the polynomial sum, and the time τ consumed to calculate π is measured for each:
(defn term [k]
  (let [x (/ 1. (inc (* 2 k)))]
    (if (even? k)
      x
      (- x))))

(defn ρ [n]
  (reduce
    (fn [res k] (+ res (term k)))
    0
    (lazy-seq (range 0 n))))

(defn ρ' [n]
  (loop [res 0 k 0]
    (if (< k n)
      (recur (+ res (term k)) (inc k))
      res)))

(defn calc [P]
  (let [start (System/nanoTime)
        π (* (P 1000000) 4)
        end (System/nanoTime)
        τ (- end start)]
    (printf "π=%.16f τ=%d\n" π τ)))
(calc ρ)
(calc ρ')
The results show that ρ spends about 50% more time than ρ', hence the underlying reduce performs much worse than recur in this case. But why?
Rewriting your code and using a more accurate timer shows there is no significant difference. This is to be expected since both loop/recur and reduce are very basic forms and we would expect them to both be fairly optimized.
(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test)
  (:require
    [criterium.core :as crit]))

(def result (atom nil))

(defn term [k]
  (let [x (/ 1. (inc (* 2 k)))]
    (if (even? k)
      x
      (- x))))

(defn ρ [n]
  (reduce
    (fn [res k] (+ res (term k)))
    0
    (range 0 n)))

(defn ρ' [n]
  (loop [res 0 k 0]
    (if (< k n)
      (recur (+ res (term k)) (inc k))
      res)))

(defn calc [calc-fn N]
  (let [pi (* (calc-fn N) 4)]
    (reset! result pi)
    pi))
We measure the execution time for both algorithms using Criterium:
(defn timings
  [power]
  (let [N (Math/pow 10 power)]
    (newline)
    (println :-----------------------------------------------------------------------------)
    (spyx N)
    (newline)
    (crit/quick-bench (calc ρ N))
    (println :rho @result)
    (newline)
    (crit/quick-bench (calc ρ' N))
    (println :rho-prime N @result)
    (newline)))
and we try it for 10^2, 10^4, and 10^6 values of N:
(dotest
  (timings 2)
  (timings 4)
  (timings 6))
with results for 10^2:
-------------------------------
Clojure 1.10.1 Java 14
-------------------------------
Testing tst.demo.core
:-----------------------------------------------------------------------------
N => 100.0
Evaluation count : 135648 in 6 samples of 22608 calls.
Execution time mean : 4.877255 µs
Execution time std-deviation : 647.723342 ns
Execution time lower quantile : 4.438762 µs ( 2.5%)
Execution time upper quantile : 5.962740 µs (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 31.6928 % Variance is moderately inflated by outliers
:rho 3.1315929035585537
Evaluation count : 148434 in 6 samples of 24739 calls.
Execution time mean : 4.070798 µs
Execution time std-deviation : 68.430348 ns
Execution time lower quantile : 4.009978 µs ( 2.5%)
Execution time upper quantile : 4.170038 µs (97.5%)
Overhead used : 2.165947 ns
:rho-prime 100.0 3.1315929035585537
with results for 10^4:
:-----------------------------------------------------------------------------
N => 10000.0
Evaluation count : 1248 in 6 samples of 208 calls.
Execution time mean : 519.096208 µs
Execution time std-deviation : 143.478354 µs
Execution time lower quantile : 454.389510 µs ( 2.5%)
Execution time upper quantile : 767.610509 µs (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 65.1517 % Variance is severely inflated by outliers
:rho 3.1414926535900345
Evaluation count : 1392 in 6 samples of 232 calls.
Execution time mean : 431.020370 µs
Execution time std-deviation : 14.853924 µs
Execution time lower quantile : 420.838884 µs ( 2.5%)
Execution time upper quantile : 455.282989 µs (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
:rho-prime 10000.0 3.1414926535900345
with results for 10^6:
:-----------------------------------------------------------------------------
N => 1000000.0
Evaluation count : 18 in 6 samples of 3 calls.
Execution time mean : 46.080480 ms
Execution time std-deviation : 1.039714 ms
Execution time lower quantile : 45.132049 ms ( 2.5%)
Execution time upper quantile : 47.430310 ms (97.5%)
Overhead used : 2.165947 ns
:rho 3.1415916535897743
Evaluation count : 18 in 6 samples of 3 calls.
Execution time mean : 52.527777 ms
Execution time std-deviation : 17.483930 ms
Execution time lower quantile : 41.789520 ms ( 2.5%)
Execution time upper quantile : 82.539445 ms (97.5%)
Overhead used : 2.165947 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 81.7010 % Variance is severely inflated by outliers
:rho-prime 1000000.0 3.1415916535897743
Note that the times for rho and rho-prime flip-flop for the 10^4 and 10^6 cases. In any case, don't believe or worry much about timings that vary by less than 2x.
Update
I deleted the lazy-seq in the original code since clojure.core/range is already lazy. Also, I've never seen lazy-seq used without a cons and a recursive call to the generating function.
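The usual lazy-seq shape referred to above, for reference, is lazy-seq wrapping a cons whose tail is the recursive call back into the generating function (a minimal sketch; the name is illustrative):

```clojure
;; Idiomatic lazy-seq: cons one realized element onto a deferred
;; recursive call that produces the rest.
(defn naturals-from [n]
  (lazy-seq (cons n (naturals-from (inc n)))))

(take 5 (naturals-from 0))
;; => (0 1 2 3 4)
```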
Re clojure.core/range, we have the docs:
range
Returns a lazy seq of nums from start (inclusive) to end (exclusive),
by step, where start defaults to 0, step to 1, and end to infinity.
When step is equal to 0, returns an infinite sequence of start. When
start is equal to end, returns empty list.
In the source code, it calls out to the underlying Java implementation:
([start end]
  (if (and (instance? Long start) (instance? Long end))
    (clojure.lang.LongRange/create start end)
    (clojure.lang.Range/create start end)))
and the Java code indicates it is chunked:
public class Range extends ASeq implements IChunkedSeq, IReduce {
private static final int CHUNK_SIZE = 32;
<snip>
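You can confirm the chunking directly at the REPL (a small sketch):

```clojure
;; Ranges over longs come back as chunked seqs, which reduce and map
;; can consume a 32-element chunk at a time.
(chunked-seq? (seq (range 100)))
;; => true
```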
In addition to the other answers: performance can be significantly increased if you eliminate math boxing (the original versions were both about 25 ms). The loop/recur variant is then 2× faster.
(set! *unchecked-math* :warn-on-boxed)

(defn term ^double [^long k]
  (let [x (/ 1. (inc (* 2 k)))]
    (if (even? k)
      x
      (- x))))

(defn ρ [n]
  (reduce
    (fn [^double res ^long k] (+ res (term k)))
    0
    (range 0 n)))

(defn ρ' [^long n]
  (loop [res (double 0) k 0]
    (if (< k n)
      (recur (+ res (term k)) (inc k))
      res)))
(criterium.core/quick-bench
(ρ 1000000))
Evaluation count : 42 in 6 samples of 7 calls.
Execution time mean : 15,639294 ms
Execution time std-deviation : 371,972168 µs
Execution time lower quantile : 15,327698 ms ( 2,5%)
Execution time upper quantile : 16,227505 ms (97,5%)
Overhead used : 1,855553 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 13,8889 % Variance is moderately inflated by outliers
=> nil
(criterium.core/quick-bench
(ρ' 1000000))
Evaluation count : 72 in 6 samples of 12 calls.
Execution time mean : 8,570961 ms
Execution time std-deviation : 302,554974 µs
Execution time lower quantile : 8,285648 ms ( 2,5%)
Execution time upper quantile : 8,919635 ms (97,5%)
Overhead used : 1,855553 ns
=> nil
Below is an improved version that should be more representative. Apparently the performance varies from case to case, but not by that much.
(defn term [k]
  (let [x (/ 1. (inc (* 2 k)))]
    (if (even? k)
      x
      (- x))))

(defn ρ1 [n]
  (loop [res 0 k 0]
    (if (< k n)
      (recur (+ res (term k)) (inc k))
      res)))

(defn ρ2 [n]
  (reduce
    (fn [res k] (+ res (term k)))
    0
    (range 0 n)))

(defn ρ3 [n]
  (reduce + 0 (map term (range 0 n))))

(defn ρ4 [n]
  (transduce (map term) + 0 (range 0 n)))

(defn calc [ρname ρ n]
  (let [start (System/nanoTime)
        π (* (ρ n) 4)
        end (System/nanoTime)
        τ (- end start)]
    (printf "ρ=%8s n=%10d π=%.16f τ=%10d\n" ρname n π τ)))

(def args
  {:N (map #(long (Math/pow 10 %)) [4 6])
   :T 10
   :P {:recur ρ1 :reduce ρ2 :mreduce ρ3 :xreduce ρ4}})

(doseq [n (:N args)]
  (dotimes [_ (:T args)]
    (println "---")
    (doseq [kv (:P args)] (calc (key kv) (val kv) n))))
I tried the code below to compare the performance of core/map vs transducers vs core.reducers/map vs core.reducers/fold:
(time (->> (range 10000)
           (r/map inc)
           (r/map inc)
           (r/map inc)
           (into [])))
;; core.reducers/map
;; "Elapsed time: 3.962802 msecs"

(time (->> (range 10000)
           vec
           (r/map inc)
           (r/map inc)
           (r/map inc)
           (r/fold conj)))
;; core.reducers/fold
;; "Elapsed time: 3.318809 msecs"

(time (->> (range 10000)
           (map inc)
           (map inc)
           (map inc)))
;; core/map
;; "Elapsed time: 0.148433 msecs"

(time (->> (range 10000)
           (sequence (comp (map inc)
                           (map inc)
                           (map inc)))))
;; transducers
;; "Elapsed time: 0.215037 msecs"
1) My expectation was that core/map would have the highest time; however, it has the lowest. Why is it more performant than transducers, when no intermediate seqs get created for transducers and transducers should be faster?
2) Why is the core.reducers/fold version not significantly faster than the core.reducers/map version? Shouldn't it have parallelized the operation?
3) Why are the core.reducers versions so slow compared to their lazy counterparts? The whole sequence is realized at the end, so shouldn't eager evaluation be more performant than the lazy one?
map is lazy, so your test case with core/map does no work at all. Try doalling the collection (or into []), and I expect it will be the slowest after all. You can convince yourself of this by changing 10000 to 1e12, and observe that if your computer can process a trillion elements just as quickly as it can process ten thousand, it must not be doing much work for each element!
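The point about unrealized laziness is easy to check at the REPL (a sketch):

```clojure
;; map alone builds an unrealized lazy seq; no inc calls have run yet.
(def xs (map inc (range 10000)))

(instance? clojure.lang.LazySeq xs)
;; => true

;; Forcing it (doall, into [], count, ...) is what actually does the work:
(count (doall xs))
;; => 10000
```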
What is there to parallelize? The most expensive part of this operation is not the calls to inc (which are parallelized), but combining the results into a vector at the end (which can't be). Try it with a much more expensive function, like #(do (Thread/sleep 500) (inc %)) and you may see different results.
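A sketch of that suggestion, with an artificially expensive step (the `slow-inc` name and the input size are mine, chosen only for illustration):

```clojure
(require '[clojure.core.reducers :as r])

;; An artificially expensive function, as suggested above:
(defn slow-inc [x] (Thread/sleep 1) (inc x))

;; r/fold can run partitions of a vector source in parallel (for inputs
;; larger than its partition size); + serves as both the combine and
;; reduce function because (+) => 0 supplies the identity.
(r/fold + (r/map slow-inc (vec (range 64))))
;; => 2080
```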
Isn't this the same question as (1)?
;; core/map without transducers
(quick-bench (doall (->> [1 2 3 4]
(map inc)
(map inc)
(map inc))))
;; Evaluation count : 168090 in 6 samples of 28015 calls.
;; Execution time mean : 3.651319 µs
;; Execution time std-deviation : 88.055389 ns
;; Execution time lower quantile : 3.584198 µs ( 2.5%)
;; Execution time upper quantile : 3.799202 µs (97.5%)
;; Overhead used : 7.546189 ns
;; Found 1 outliers in 6 samples (16.6667 %)
;; low-severe 1 (16.6667 %)
;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;; transducers with a non lazy seq
(quick-bench (doall (->> [1 2 3 4]
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 214902 in 6 samples of 35817 calls.
;; Execution time mean : 2.776696 µs
;; Execution time std-deviation : 24.377634 ns
;; Execution time lower quantile : 2.750123 µs ( 2.5%)
;; Execution time upper quantile : 2.809933 µs (97.5%)
;; Overhead used : 7.546189 ns
;;;;
;; transducers with a lazy seq
;;;;
(quick-bench (doall (->> (range 1 5)
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 214230 in 6 samples of 35705 calls.
;; Execution time mean : 3.361220 µs
;; Execution time std-deviation : 622.084860 ns
;; Execution time lower quantile : 2.874093 µs ( 2.5%)
;; Execution time upper quantile : 4.328653 µs (97.5%)
;; Overhead used : 7.546189 ns
;;;;
;; core.reducers
;;;;
(quick-bench (->> [1 2 3 4]
(r/map inc)
(r/map inc)
(r/map inc)))
;; Evaluation count : 6258966 in 6 samples of 1043161 calls.
;; Execution time mean : 89.610689 ns
;; Execution time std-deviation : 0.936108 ns
;; Execution time lower quantile : 88.786938 ns ( 2.5%)
;; Execution time upper quantile : 91.128549 ns (97.5%)
;; Overhead used : 7.546189 ns
;; Found 1 outliers in 6 samples (16.6667 %)
;; low-severe 1 (16.6667 %)
;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;;;; Evaluating a larger range so that the chunking comes into play ;;;;
;; core/map without transducers
(quick-bench (doall (->> (range 500)
(map inc)
(map inc)
(map inc))))
;; transducers with a non lazy seq
(quick-bench (doall (->> (doall (range 500))
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 2598 in 6 samples of 433 calls.
;; Execution time mean : 237.164523 µs
;; Execution time std-deviation : 5.336417 µs
;; Execution time lower quantile : 231.751575 µs ( 2.5%)
;; Execution time upper quantile : 244.836021 µs (97.5%)
;; Overhead used : 7.546189 ns
;; transducers with a lazy seq
(quick-bench (doall (->> (range 500)
(sequence (comp (map inc)
(map inc)
(map inc))))))
;; Evaluation count : 3210 in 6 samples of 535 calls.
;; Execution time mean : 183.866148 µs
;; Execution time std-deviation : 1.799841 µs
;; Execution time lower quantile : 182.137656 µs ( 2.5%)
;; Execution time upper quantile : 186.347677 µs (97.5%)
;; Overhead used : 7.546189 ns
;; core.reducers
(quick-bench (->> (range 500)
(r/map inc)
(r/map inc)
(r/map inc)))
;; Evaluation count : 4695642 in 6 samples of 782607 calls.
;; Execution time mean : 126.973627 ns
;; Execution time std-deviation : 5.972927 ns
;; Execution time lower quantile : 122.471060 ns ( 2.5%)
;; Execution time upper quantile : 134.181056 ns (97.5%)
;; Overhead used : 7.546189 ns
Based on the above answers/comments I tried the benchmarking again:
1) The reducers version is faster by roughly three orders of magnitude.
2) This applies both for small collections (4 elements) and larger ones (500 elements) where chunking can happen for lazy seqs.
3) Thus even with chunking, lazy evaluation is much slower than eager evaluation.
Correction based on the remark: reducers only do their work when a reduce operation runs, and that was not happening in the code above:
(quick-bench (->> [1 2 3 4]
(r/map inc)
(r/map inc)
(r/map inc)
(into [])))
;; Evaluation count : 331302 in 6 samples of 55217 calls.
;; Execution time mean : 2.035153 µs
;; Execution time std-deviation : 314.070348 ns
;; Execution time lower quantile : 1.720615 µs ( 2.5%)
;; Execution time upper quantile : 2.381706 µs (97.5%)
;; Overhead used : 7.546189 ns
(quick-bench (->> (range 500)
(r/map inc)
(r/map inc)
(r/map inc)
(into [])))
;; Evaluation count : 3870 in 6 samples of 645 calls.
;; Execution time mean : 150.349870 µs
;; Execution time std-deviation : 2.825632 µs
;; Execution time lower quantile : 146.468231 µs ( 2.5%)
;; Execution time upper quantile : 153.271325 µs (97.5%)
;; Overhead used : 7.546189 ns
So the reducer versions are 30-70% faster than their transducer counterparts. The performance differential increases as the data set size grows.
If I have a vector (def v [1 2 3]), I can replace the first element with (assoc v 0 666), obtaining [666 2 3]
But if I try to do the same after mapping over the vector:
(def v (map inc [1 2 3]))
(assoc v 0 666)
the following exception is thrown:
ClassCastException clojure.lang.LazySeq cannot be cast to clojure.lang.Associative
What's the most idiomatic way of editing or updating a single element of a lazy sequence?
Should I use map-indexed and alter only index 0, or realize the lazy sequence into a vector and then edit it via assoc/update?
The first has the advantage of maintaining laziness, while the second is less efficient but maybe more obvious.
I guess for the first element I could also use drop and cons.
Are there any other ways? I was not able to find any examples anywhere.
What's the most idiomatic way of editing or updating a single element of a lazy sequence?
There's no built-in function for modifying a single element of a sequence/list, but map-indexed is probably the closest thing. It's not an efficient operation for lists. Assuming you don't need laziness, I'd pour the sequence into a vector, which is what mapv does i.e. (into [] (map f coll)). Depending on how you use your modified sequence, it may be just as performant to vectorize it and modify.
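The vectorize-and-modify route described above, in one expression (values taken from the question):

```clojure
;; Pour the mapped seq into a vector, then assoc normally:
(assoc (into [] (map inc [1 2 3])) 0 666)
;; => [666 3 4]
```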
You could write a function using map-indexed to do something similar and lazy:
(defn assoc-seq [s i v]
(map-indexed (fn [j x] (if (= i j) v x)) s))
Or if you want to do this work in one pass lazily without vector-izing, you can also use a transducer:
(sequence
(comp
(map inc)
(map-indexed (fn [j x] (if (= 0 j) 666 x))))
[1 2 3])
Realizing your use case is to only modify the first item in a lazy sequence, then you can do something simpler while preserving laziness:
(concat [666] (rest s))
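A cons-based equivalent, wrapped in lazy-seq so nothing is forced early (`replace-first` is an illustrative name, not a core function):

```clojure
;; Replace the head of a (possibly lazy) sequence without realizing
;; anything beyond what the caller consumes.
(defn replace-first [v s]
  (lazy-seq (cons v (rest s))))

(replace-first 666 (map inc [1 2 3]))
;; => (666 3 4)
```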
Update re: comment on optimization: leetwinski's assoc-at function is ~8ms faster when updating the 500,000th element in a 1,000,000 element lazy sequence, so you should use his answer if you're looking to squeeze every bit of performance out of an inherently inefficient operation:
(def big-lazy (range 1e6))
(crit/bench
(last (assoc-at big-lazy 500000 666)))
Evaluation count : 1080 in 60 samples of 18 calls.
Execution time mean : 51.567317 ms
Execution time std-deviation : 4.947684 ms
Execution time lower quantile : 47.038877 ms ( 2.5%)
Execution time upper quantile : 65.604790 ms (97.5%)
Overhead used : 1.662189 ns
Found 6 outliers in 60 samples (10.0000 %)
low-severe 4 (6.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 68.6139 % Variance is severely inflated by outliers
=> nil
(crit/bench
(last (assoc-seq big-lazy 500000 666)))
Evaluation count : 1140 in 60 samples of 19 calls.
Execution time mean : 59.553335 ms
Execution time std-deviation : 4.507430 ms
Execution time lower quantile : 54.450115 ms ( 2.5%)
Execution time upper quantile : 69.288104 ms (97.5%)
Overhead used : 1.662189 ns
Found 4 outliers in 60 samples (6.6667 %)
low-severe 4 (6.6667 %)
Variance from outliers : 56.7865 % Variance is severely inflated by outliers
=> nil
The assoc-at version is 2-3x faster when updating the first item in a large lazy sequence, but it's no faster than (last (concat [666] (rest big-lazy))).
I would probably go with something generic like this, if this functionality is really needed (which I strongly doubt):
(defn assoc-at [data i item]
  (if (associative? data)
    (assoc data i item)
    (if-not (neg? i)
      (letfn [(assoc-lazy [i data]
                (cond (zero? i) (cons item (rest data))
                      (empty? data) data
                      :else (lazy-seq (cons (first data)
                                            (assoc-lazy (dec i) (rest data))))))]
        (assoc-lazy i data))
      data)))
user> (assoc-at {:a 10} :b 20)
;; {:a 10, :b 20}
user> (assoc-at [1 2 3 4] 3 101)
;; [1 2 3 101]
user> (assoc-at (map inc [1 2 3 4]) 2 123)
;; (2 3 123 5)
another way is to use split-at:
(defn assoc-at [data i item]
  (if (neg? i)
    data
    (let [[l r] (split-at i data)]
      (if (seq r)
        (concat l [item] (rest r))
        data))))
Notice that both of these functions short-circuit the collection traversal, which the mapping approach doesn't. Here's a quick and dirty benchmark:
(defn massoc-at [data i item]
  (if (neg? i)
    data
    (map-indexed (fn [j x] (if (== i j) item x)) data)))
(time (last (assoc-at (range 10000000) 0 1000)))
;;=> "Elapsed time: 747.921032 msecs"
9999999
(time (last (massoc-at (range 10000000) 0 1000)))
;;=> "Elapsed time: 1525.446511 msecs"
9999999
I'm trying to get all "moving" partitions of size k of a string. Basically, I want to move a window of size k along the string and collect each k-character word.
Here's an example,
k: 3
Input: ABDEFGH
Output: ABD, BDE, DEF, EFG, FGH
My idea was to walk along the input, drop the head, partition, and then drop the head again from the previously (now headless) sequence, but I'm not sure exactly how to do this... Also, maybe there's a better way of doing this? Below is the idea I had in mind.
(#(partition k input) (collection of s where head was consecutively dropped))
Strings in Clojure can be treated as seqs of characters, so you can partition them directly. To get a sequence of overlapping partitions, use the version that accepts a size and a step:
user> (partition 3 1 "abcdef")
((\a \b \c) (\b \c \d) (\c \d \e) (\d \e \f))
To put a character sequence back into a string, just apply str to it:
user> (apply str '(\a \b \c))
"abc"
To put it all together:
user> (map (partial apply str) (partition 3 1 "abcdef"))
("abc" "bcd" "cde" "def")
Here are implementations of partition and partition-all for strings, returning a lazy seq of strings and doing the splitting with subs. If you need high performance when doing string transformations, these will be significantly faster (on average 8 times as fast; see the Criterium benchmarks below) than creating char seqs.
(defn partition-string
  "Like partition, but splits the string using subs and returns a
  lazy-seq of strings."
  ([n s]
   (partition-string n n s))
  ([n p s]
   (lazy-seq
     (if-not (< (count s) n)
       (cons
         (subs s 0 n)
         (->> (subs s p)
              (partition-string n p)))))))
(defn partition-string-all
  "Like partition-all, but splits the string using subs and returns a
  lazy-seq of strings."
  ([n s]
   (partition-string-all n n s))
  ([n p s]
   (let [less (if (< (count s) n)
                (count s))]
     (lazy-seq
       (cons
         (subs s 0 (or less n))
         (if-not less
           (->> (subs s p)
                (partition-string-all n p))))))))
;; Alex answer:
;; (let [test-str "abcdefghijklmnopqrstuwxyz"]
;; (criterium.core/bench
;; (doall
;; (map (partial apply str) (partition 3 1 test-str)))))
;; WARNING: Final GC required 1.010207840526515 % of runtime
;; Evaluation count : 773220 in 60 samples of 12887 calls.
;; Execution time mean : 79.900801 µs
;; Execution time std-deviation : 2.008823 µs
;; Execution time lower quantile : 77.725304 µs ( 2.5%)
;; Execution time upper quantile : 83.888349 µs (97.5%)
;; Overhead used : 17.786101 ns
;; Found 3 outliers in 60 samples (5.0000 %)
;; low-severe 3 (5.0000 %)
;; Variance from outliers : 12.5585 % Variance is moderately inflated by outliers
;; KobbyPemson answer:
;; (let [test-str "abcdefghijklmnopqrstuwxyz"]
;; (criterium.core/bench
;; (doall
;; (moving-partition test-str 3))))
;; WARNING: Final GC required 1.674347646128195 % of runtime
;; Evaluation count : 386820 in 60 samples of 6447 calls.
;; Execution time mean : 161.928479 µs
;; Execution time std-deviation : 8.362590 µs
;; Execution time lower quantile : 154.707888 µs ( 2.5%)
;; Execution time upper quantile : 184.095816 µs (97.5%)
;; Overhead used : 17.786101 ns
;; Found 3 outliers in 60 samples (5.0000 %)
;; low-severe 2 (3.3333 %)
;; low-mild 1 (1.6667 %)
;; Variance from outliers : 36.8985 % Variance is moderately inflated by outliers
;; This answer
;; (let [test-str "abcdefghijklmnopqrstuwxyz"]
;; (criterium.core/bench
;; (doall
;; (partition-string 3 1 test-str))))
;; WARNING: Final GC required 1.317098148979236 % of runtime
;; Evaluation count : 5706000 in 60 samples of 95100 calls.
;; Execution time mean : 10.480174 µs
;; Execution time std-deviation : 240.957206 ns
;; Execution time lower quantile : 10.234580 µs ( 2.5%)
;; Execution time upper quantile : 11.075740 µs (97.5%)
;; Overhead used : 17.786101 ns
;; Found 3 outliers in 60 samples (5.0000 %)
;; low-severe 3 (5.0000 %)
;; Variance from outliers : 10.9961 % Variance is moderately inflated by outliers
(defn moving-partition
  [input k]
  (map #(.substring input % (+ k %))
       (range (- (count input) (dec k)))))
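For comparison, a sketch of the same sliding-window idea using clojure.core/subs, which avoids the reflective .substring interop call; checked against the example input from the question (the function name is mine):

```clojure
;; Sliding window of size k over a string via subs:
(defn moving-partition-subs [input k]
  (map #(subs input % (+ k %))
       (range (- (count input) (dec k)))))

(moving-partition-subs "ABDEFGH" 3)
;; => ("ABD" "BDE" "DEF" "EFG" "FGH")
```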
I looked at map's source code, which basically keeps creating lazy sequences. I would have thought that iterating over a collection and adding to a transient vector would be faster, but clearly it isn't. What don't I understand about Clojure's performance behavior?
;=> (time (do-with / (range 1 1000) (range 1 1000)))
;"Elapsed time: 23.1808 msecs"
;
; vs
;=> (time (doall (map #(/ %1 %2) (range 1 1000) (range 1 1000))))
;"Elapsed time: 2.604174 msecs"
(defn do-with
  [fn coll1 coll2]
  (let [end (count coll1)]
    (loop [i 0
           res (transient [])]
      (if (= i end)
        (persistent! res)
        (let [x (nth coll1 i)
              y (nth coll2 i)
              r (fn x y)]
          (recur (inc i) (conj! res r)))))))
In order of conjectured impact on relative results:
Your do-with function uses nth to access the individual items in the input collections. nth operates in linear time on ranges, making do-with quadratic. Needless to say, this will kill performance on large collections.
range produces chunked seqs and map handles those extremely efficiently. (Essentially it produces chunks of up to 32 elements -- here it will in fact be exactly 32 -- by running a tight loop over the internal array of each input chunk in turn, placing results in internal arrays of output chunks.)
Benchmarking with time doesn't give you steady state performance. (Which is why one should really use a proper benchmarking library; in the case of Clojure, Criterium is the standard solution.)
Incidentally, (map #(/ %1 %2) xs ys) can simply be written as (map / xs ys).
Update:
I've benchmarked the map version, the original do-with and a new do-with version with Criterium, using (range 1 1000) as both inputs in each case (as in the question text), obtaining the following mean execution times:
;;; (range 1 1000)
new do-with 170.383334 µs
(doall (map ...)) 230.756753 µs
original do-with 15.624444 ms
Additionally, I've repeated the benchmark using a vector stored in a Var as input rather than ranges (that is, with (def r (vec (range 1 1000))) at the start and using r as both collection arguments in each benchmark). Unsurprisingly, the original do-with came in first -- nth is very fast on vectors (plus using nth with a vector avoids all the intermediate allocations involved in seq traversal).
;;; (vec (range 1 1000))
original do-with 73.975419 µs
new do-with 87.399952 µs
(doall (map ...)) 153.493128 µs
Here's the new do-with with linear time complexity:
(defn do-with [f xs ys]
  (loop [xs (seq xs)
         ys (seq ys)
         ret (transient [])]
    (if (and xs ys)
      (recur (next xs)
             (next ys)
             (conj! ret (f (first xs) (first ys))))
      (persistent! ret))))
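A quick sanity check of the linear-time version (restated here so the snippet runs standalone):

```clojure
;; Linear-time do-with: walk both seqs with next/first instead of nth.
(defn do-with [f xs ys]
  (loop [xs (seq xs)
         ys (seq ys)
         ret (transient [])]
    (if (and xs ys)
      (recur (next xs)
             (next ys)
             (conj! ret (f (first xs) (first ys))))
      (persistent! ret))))

(do-with + [1 2 3] [10 20 30])
;; => [11 22 33]
```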