Lazy sequence using loop/recur? - clojure

I'd like to implement an algorithm that produces an infinite sequence of results, where each element is the result of a single iteration of the algorithm. A lazy sequence is convenient because it decouples the number of iterations (via take) and the burn-in iterations (via drop) from the implementation.
Here's an example of two algorithm implementations, one that produces a lazy sequence (yadda-lazy), and one that does not (yadda-loop).
(defn yadda-iter
  [v1 v2 v3]
  (+ (first v1)
     (first v2)
     (first v3)))
(defn yadda-lazy
  [len]
  (letfn [(inner [v1 v2 v3]
            (cons (yadda-iter v1 v2 v3)
                  (lazy-seq (inner (rest v1)
                                   (rest v2)
                                   (rest v3)))))]
    (let [base (cycle (range len))]
      (inner base
             (map #(* %1 %1) base)
             (map #(* %1 %1 %1) base)))))
(defn yadda-loop
  [len iters]
  (let [base (cycle (range len))]
    (loop [result nil
           i 0
           v1 base
           v2 (map #(* %1 %1) base)
           v3 (map #(* %1 %1 %1) base)]
      (if (= i iters)
        result
        (recur (cons (yadda-iter v1 v2 v3) result)
               (inc i)
               (rest v1)
               (rest v2)
               (rest v3))))))
(prn (take 11 (yadda-lazy 4)))
(prn (yadda-loop 4 11))
Is there a way to create a lazy sequence using the same style as loop/recur? I like yadda-loop better, because:
It's more obvious what the initial conditions are and how the algorithm progresses to the next iteration.
It won't suffer from a stack overflow, thanks to tail-call optimization.

Your loop version would be better written to (1) pull the addition out of the loop so you don't have to recur on so many sequences, and (2) use conj on a vector accumulator so your results are in the same order as your yadda-lazy.
(defn yadda-loop-2 [len iters]
(let [v1 (cycle (range len))
v2 (map * v1 v1)
v3 (map * v1 v2)
s (map + v1 v2 v3)]
(loop [result [], s s, i 0]
(if (= i iters)
result
(recur (conj result (first s)), (rest s), (inc i))))))
However, at this point it becomes clear that the loop is pointless as this is just
(defn yadda-loop-3 [len iters]
(let [v1 (cycle (range len))
v2 (map * v1 v1)
v3 (map * v1 v2)
s (map + v1 v2 v3)]
(into [] (take iters s))))
and we might as well pull out the iters parameter, return simply s and take from it.
(defn yadda-yadda [len]
(let [v1 (cycle (range len))
v2 (map * v1 v1)
v3 (map * v1 v2)]
(map + v1 v2 v3)))
This produces the same results as your yadda-lazy, is also lazy, and is quite clear:
(take 11 (yadda-yadda 4)) ;=> (0 3 14 39 0 3 14 39 0 3 14)
You could also write it equivalently as:
(defn yadda-yadda [len]
(as-> (range len) s
(cycle s)
(take 3 (iterate (partial map * s) s))
(apply map + s)))
Addendum
If you are looking for a pattern for converting an eager loop like yours into a lazy sequence:
(loop [acc [] args args] ...) -> ((fn step [args] ...) args)
(if condition (recur ...) acc) -> (when condition (lazy-seq ...))
(recur (conj acc (f ...)) ...) -> (lazy-seq (cons (f ...) (step ...)))
Applying this to your yadda-loop:
(defn yadda-lazy-2 [len iters]
(let [base (cycle (range len))]
((fn step [i, v1, v2, v3]
(when (< i iters)
(lazy-seq
(cons (yadda-iter v1 v2 v3)
(step (inc i), (rest v1), (rest v2), (rest v3))))))
0, base, (map #(* %1 %1) base), (map #(* %1 %1 %1) base))))
And at this point you'd probably want to pull out the iters
(defn yadda-lazy-3 [len]
(let [base (cycle (range len))]
((fn step [v1, v2, v3]
(lazy-seq
(cons (yadda-iter v1 v2 v3)
(step (rest v1), (rest v2), (rest v3)))))
base, (map #(* %1 %1) base), (map #(* %1 %1 %1) base))))
So you can write:
(take 11 (yadda-lazy-3 4)) ;=> (0 3 14 39 0 3 14 39 0 3 14)
And then you might say, hey, my yadda-iter is just applying + on the first and step is applied on the rest, so why not combine my v1, v2, v3 and make this a bit clearer?
(defn yadda-lazy-4 [len]
(let [base (cycle (range len))]
((fn step [vs]
(lazy-seq
(cons (apply + (map first vs))
(step (map rest vs)))))
[base, (map #(* %1 %1) base), (map #(* %1 %1 %1) base)])))
And lo and behold, you have just re-implemented variadic map
(defn yadda-lazy-5 [len]
(let [base (cycle (range len))]
(map + base, (map #(* %1 %1) base), (map #(* %1 %1 %1) base))))
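If what you like about loop/recur is that the initial state and the step to the next state are spelled out, iterate offers a similar shape while staying lazy. The following is only a sketch: yadda-iterate is a hypothetical name that does not appear in the code above, and keeping the three sequences together in one vector is an assumption made for illustration.

```clojure
(defn yadda-iterate
  "Hypothetical variant: lazy sequence built from an explicit state and step."
  [len]
  (let [base (cycle (range len))]
    (->> [base (map #(* % %) base) (map #(* % % %) base)] ; initial state, like the loop bindings
         (iterate (fn [vs] (mapv rest vs)))               ; next state, like the recur call
         (map (fn [vs] (apply + (map first vs)))))))      ; value emitted for each state

(take 11 (yadda-iterate 4)) ;=> (0 3 14 39 0 3 14 39 0 3 14)
```

The number of iterations stays outside the function, so take and drop can still be applied by the caller.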

@A.Webb's answer is perfect, but if your love for loop/recur overcomes his arguments, know that you can still combine both styles of recursion.
For example, take a look at the implementation of range:
(defn range
(...)
([start end step]
(lazy-seq
(let [b (chunk-buffer 32)
comp (cond (or (zero? step) (= start end)) not=
(pos? step) <
(neg? step) >)]
(loop [i start] ;; chunk building through loop/recur
(if (and (< (count b) 32)
(comp i end))
(do
(chunk-append b i)
(recur (+ i step)))
(chunk-cons (chunk b)
(when (comp i end)
(range i end step))))))))) ;; lazy recursive call
Here's another example, an alternate implementation of filter:
(defn filter [pred coll]
(letfn [(step [pred coll]
(when-let [[x & more] (seq coll)]
(if (pred x)
(cons x (lazy-seq (step pred more))) ;; lazy recursive call
(recur pred more))))] ;; eager recursive call
(lazy-seq (step pred coll))))

The Tupelo library has a new lazy-gen/yield feature that mimics generator functions in Python. It can generate a lazy sequence from any point in a looping structure. Here is a version of yadda-loop that shows lazy-gen and yield in action:
(ns tst.xyz
(:use clojure.test tupelo.test)
(:require [tupelo.core :as t] ))
(defn yadda-lazy-gen
[len iters]
(t/lazy-gen
(let [base (cycle (range len))]
(loop [i 0
v1 base
v2 (map #(* %1 %1) base)
v3 (map #(* %1 %1 %1) base)]
(when (< i iters)
(t/yield (yadda-iter v1 v2 v3))
(recur
(inc i)
(rest v1)
(rest v2)
(rest v3)))))))
Testing tst.clj.core
(take 11 (yadda-lazy 4)) => (0 3 14 39 0 3 14 39 0 3 14)
(yadda-loop 4 11) => (0 3 14 39 0 3 14 39 0 3 14)
(yadda-lazy-gen 4 11) => (0 3 14 39 0 3 14 39 0 3 14)
Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

Related

lazy re-implementation of Clojure Interleaving

I want to write a lazy implementation of Clojure's interleave using lazy-seq (I'm not sure whether the original implementation is lazy or not) that works like this:
(take 4 (lazy-interleave '(1 2 3) '(a b c)))
(1 a 2 b)
I came up with something like this, but I'm not sure why it doesn't work:
(defn lazy-interleave [v1 v2]
(lazy-seq (concat (list (first v1) (first v2))) (lazy-interleave (next v1) (next v2)))
)
Edit:
Thanks to Arthur's answer, here is a modified working solution:
(defn lazy-interleave [v1 v2]
  (lazy-seq
    (cons (first v1) (cons (first v2) (lazy-interleave (rest v1) (rest v2))))))
A bit of reformatting reveals the problem:
(defn lazy-interleave [v1 v2]
(lazy-seq
(concat (list (first v1) (first v2)))
(lazy-interleave (next v1) (next v2))))
In other words, you're constructing a lazy sequence that, when realized, will evaluate (concat (list (first v1) (first v2))), ignore the result, and then try to evaluate and return (lazy-interleave (next v1) (next v2)). This call to lazy-interleave does the same thing, again dropping the first elements of v1 and v2, and so on, ad infinitum.
You never get to the bottom because you have no empty check, and so since (next nil) returns nil, it just keeps going even after you exhaust both sequences. You don't get a StackOverflowError because you're using lazy sequences instead of recursion.
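The "evaluate, ignore the result, return the last form" behaviour can be seen in isolation. A minimal sketch (the name s and the keywords are made up for illustration): lazy-seq accepts several body forms and, like do, only the value of the last one becomes the sequence.

```clojure
(def s
  (lazy-seq
    (list :discarded) ; evaluated when s is realized, but its value is thrown away
    (list :kept)))    ; only the last body form becomes the sequence

(first s) ;=> :kept
(count s) ;=> 1
```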
A correct implementation would look like this:
(defn lazy-interleave [v1 v2]
(when (and (seq v1) (seq v2))
(lazy-cat [(first v1) (first v2)]
(lazy-interleave (rest v1) (rest v2)))))
interleave is already lazy:
user> (interleave (take 5 (iterate #(do (println "sequence A:" %) (inc %)) 0))
(take 5 (iterate #(do (println "sequence B:" %) (inc %)) 100)))
sequence A: 0
sequence B: 100
sequence A: 1
sequence B: 101
sequence A: 2
sequence B: 102
sequence A: 3
sequence B: 103
(0 100 1 101 2 102 3 103 4 104)
user> (take 4 (interleave (take 5 (iterate #(do (println "sequence A:" %) (inc %)) 0))
(take 5 (iterate #(do (println "sequence B:" %) (inc %)) 100))))
sequence A: 0
sequence B: 100
(0 100 1 101)
And the core of its implementation looks much like your example:
(lazy-seq
(let [s1 (seq c1) s2 (seq c2)]
(when (and s1 s2)
(cons (first s1) (cons (first s2)
(interleave (rest s1) (rest s2)))))))
Except it also works on more than two sequences, so it has another arity that handles that case.
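As a rough sketch of what handling more than two sequences could look like (this is not the actual clojure.core source; lazy-interleave-n is a hypothetical name, and at least one input collection is assumed):

```clojure
(defn lazy-interleave-n
  "Hypothetical variadic interleave; expects at least one collection."
  [& colls]
  (lazy-seq
    (let [ss (map seq colls)]
      ;; stop as soon as any input is exhausted
      (when (every? identity ss)
        (concat (map first ss)
                (apply lazy-interleave-n (map rest ss)))))))

(take 6 (lazy-interleave-n [1 2 3] '(a b c) (range)))
;=> (1 a 0 2 b 1)
```

Because the recursive call sits inside lazy-seq, infinite inputs such as (range) are only consumed as far as the caller asks.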

Function for replacing subsequences

Is there a function that could replace subsequences? For example:
user> (good-fnc [1 2 3 4 5] [1 2] [3 4 5])
;; => [3 4 5 3 4 5]
I know that there is clojure.string/replace for strings:
user> (clojure.string/replace "fat cat caught a rat" "a" "AA")
;; => "fAAt cAAt cAAught AA rAAt"
Is there something similar for vectors and lists?
Does this work for you?
(defn good-fnc [s sub r]
(loop [acc []
s s]
(cond
(empty? s) (seq acc)
(= (take (count sub) s) sub) (recur (apply conj acc r)
(drop (count sub) s))
:else (recur (conj acc (first s)) (rest s)))))
Here is a version that plays nicely with lazy seq inputs. Note that it can take an infinite lazy sequence (range) without looping infinitely as a loop based version would.
(defn sq-replace
[match replacement sq]
(let [matching (count match)]
((fn replace-in-sequence [[elt & elts :as sq]]
(lazy-seq
(cond (empty? sq)
()
(= match (take matching sq))
(concat replacement (replace-in-sequence (drop matching sq)))
:default
(cons elt (replace-in-sequence elts)))))
sq)))
#'user/sq-replace
user> (take 10 (sq-replace [3 4 5] ["hello, world"] (range)))
(0 1 2 "hello, world" 6 7 8 9 10 11)
I took the liberty of making the sequence argument the final argument, since this is the convention in Clojure for functions that walk a sequence.
My previous (now deleted) answer was incorrect because this was not as trivial as I first thought. Here is my second attempt:
(defn seq-replace
[coll sub rep]
(letfn [(seq-replace' [coll]
(when-let [s (seq coll)]
(let [start (take (count sub) s)
end (drop (count sub) s)]
(if (= start sub)
(lazy-cat rep (seq-replace' end))
(cons (first s) (lazy-seq (seq-replace' (rest s))))))))]
(seq-replace' coll)))
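One edge case the versions above do not spell out is an empty pattern: (take 0 s) equals an empty sub at every position, so a match is found without consuming any input, and the loop-based good-fnc never advances. Below is a hedged sketch of a lazy variant with an explicit guard. The name replace-subseq is hypothetical, and returning the input unchanged for an empty pattern is an assumption about the desired behaviour.

```clojure
(defn replace-subseq
  "Hypothetical helper: lazily replace every occurrence of sub in coll with rep."
  [coll sub rep]
  (let [n (count sub)]
    (if (zero? n)
      (seq coll) ; assumption: an empty pattern leaves the input unchanged
      ((fn step [s]
         (lazy-seq
           (when (seq s)
             (if (= sub (take n s))
               (concat rep (step (drop n s)))       ; match: emit rep, skip the matched items
               (cons (first s) (step (rest s))))))) ; no match: emit one item, move on
       coll))))

(replace-subseq [1 2 3 4 5] [1 2] [3 4 5])                  ;=> (3 4 5 3 4 5)
(take 10 (replace-subseq (range) [3 4 5] ["hello, world"])) ;=> (0 1 2 "hello, world" 6 7 8 9 10 11)
```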

Clojure: find repetition

Let's say we have a list of integers: 1, 2, 5, 13, 6, 5, 7, and I want to find the first number that repeats and return a vector of its two indices. In my sample, it's 5 at [2, 5]. So far I've used loop, but is there a shorter, more elegant way?
(defn get-cycle
[xs]
(loop [[x & xs_rest] xs, indices {}, i 0]
(if (nil? x)
[0 i] ; Sequence is over before we found a duplicate.
(if-let [x_index (indices x)]
[x_index i]
(recur xs_rest (assoc indices x i) (inc i))))))
There's no need to return the number itself: first, I can get it by index, and second, it may not always be there.
An option using list processing, but not significantly more concise:
(defn get-cycle [xs]
(first (filter #(number? (first %))
(reductions
(fn [[m i] x] (if-let [xat (m x)] [xat i] [(assoc m x i) (inc i)]))
[(hash-map) 0] xs))))
Here is a version using reduced to stop consuming the sequence when you find the first duplicate:
(defn first-duplicate [coll]
(reduce (fn [acc [idx x]]
(if-let [v (get acc x)]
(reduced (conj v idx))
(assoc acc x [idx])))
{} (map-indexed #(vector % %2) coll)))
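Note that first-duplicate as written returns the accumulated map when the input has no duplicate. A sketch of a variant that returns nil in that case and stores a single index per value follows; first-duplicate-indices is a hypothetical name, and treating "no duplicate" as nil is an assumption.

```clojure
(defn first-duplicate-indices
  "Hypothetical variant: [first-index second-index] of the first repeated value, or nil."
  [coll]
  (let [result (reduce (fn [seen [idx x]]
                         (if-let [prev (get seen x)]
                           (reduced [prev idx]) ; stop consuming the input here
                           (assoc seen x idx)))
                       {}
                       (map-indexed vector coll))]
    ;; a vector means reduced fired; a map means no duplicate was found
    (when (vector? result) result)))

(first-duplicate-indices [1 2 5 13 6 5 7]) ;=> [2 5]
(first-duplicate-indices [1 2 3])          ;=> nil
(first-duplicate-indices (cycle [1 2 3]))  ;=> [0 3]
```

The last call terminates even though the input is infinite, because reduced ends the reduction at the first repeat.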
I know that you have only asked for the first. Here is a fully lazy implementation with little per-step allocation overhead
(defn dups
[coll]
(letfn [(loop-fn [idx [elem & rest] cached]
(if elem
(if-let [last-idx (cached elem)]
(cons [last-idx idx]
(lazy-seq (loop-fn (inc idx) rest (dissoc cached elem))))
(lazy-seq (loop-fn (inc idx) rest (assoc cached elem idx))))))]
(loop-fn 0 coll {})))
(first (dups v))
=> [2 5]
Edit: Here are some criterium benchmarks:
The answer that got accepted: 7.819269 µs
This answer (first (dups [12 5 13 6 5 7])): 6.176290 µs
Beschastny's: 5.841101 µs
first-duplicate: 5.025445 µs
Actually, loop is a pretty good choice unless you want to find all duplicates. Things like reduce will scan the full input sequence even when that's not necessary, unless you short-circuit with reduced.
Here is my version of get-cycle:
(defn get-cycle [coll]
(loop [i 0 seen {} coll coll]
(when-let [[x & xs] (seq coll)]
(if-let [j (seen x)]
[j i]
(recur (inc i) (assoc seen x i) xs)))))
The only difference from your get-cycle is that my version returns nil when there are no duplicates.
The intent of your code seems different from your description in the comments so I'm not totally confident I understand. That said, loop/recur is definitely a valid way to approach the problem.
Here's what I came up with:
(defn get-cycle [xs]
(loop [xs xs index 0]
(when-let [[x & more] (seq xs)]
(when-let [[y] (seq more)]
(if (= x y)
{x [index (inc index)]}
(recur more (inc index)))))))
This will return a map of the repeated item to a vector of the two indices the item was found at.
(get-cycle [1 1 2 1 2 4 2 1 4 5 6 7])
;=> {1 [0 1]}
(get-cycle [1 2 1 2 4 2 1 4 5 6 7 7])
;=> {7 [10 11]}
(get-cycle [1 2 1 2 4 2 1 4 5 6 7 8])
;=> nil
Here's an alternative solution using sequence functions. I like this way better but whether it's shorter or more elegant is probably subjective.
(defn pairwise [coll]
(map vector coll (rest coll)))
(defn find-first [pred xs]
(first (filter pred xs)))
(defn get-cycle [xs]
(find-first #(apply = (val (first %)))
(map-indexed hash-map (pairwise xs))))
Edited based on clarification from @demi
Ah, got it. Is this what you have in mind?
(defn get-cycle [xs]
(loop [xs (map-indexed vector xs)]
(when-let [[[i n] & more] (seq xs)]
(if-let [[j _] (find-first #(= n (second %)) more)]
{n [i j]}
(recur more)))))
I re-used find-first from my earlier sequence-based solution.

Is there any better and more idiomatic way of taking "while not enough" from a seq?

I need to take some amount of elements from a sequence based on some quantity rule. Here is a solution I came up with:
(defn take-while-not-enough
[p len xs]
(loop [ac 0
r []
s xs]
(if (empty? s)
r
(let [new-ac (p ac (first s))]
(if (>= new-ac len)
r
(recur new-ac (conj r (first s)) (rest s)))))))
(take-while-not-enough + 10 [2 5 7 8 2 1]) ; [2 5]
(take-while-not-enough #(+ %1 (%2 1)) 7 [[2 5] [7 8] [2 1]]) ; [[2 5]]
Is there any better way to achieve the same?
Thanks.
UPDATE:
Somebody posted this solution but then removed it. It does the same as the answer that I accepted, but is more readable. Thank you, anonymous well-wisher!
(defn take-while-not-enough [reducer-fn limit data]
  (->> (rest (reductions reducer-fn 0 data)) ; 1. the accumulated values (rest drops the initial 0)
       (map vector data)                     ; 2. paired with the original sequence
       (take-while #(< (second %) limit))    ; 3. until a certain accumulated value
       (map first)))                         ; 4. then extract the original values
My first thought is to view this problem as a variation on reduce and thus to break the problem into two steps:
count the number of items in the result
take that many from the input
I also took some liberties with the argument names:
user> (defn take-while-not-enough [reducer-fn limit data]
(take (dec (count (take-while #(< % limit) (reductions reducer-fn 0 data))))
data))
#'user/take-while-not-enough
user> (take-while-not-enough #(+ %1 (%2 1)) 7 [[2 5] [7 8] [2 1]])
([2 5])
user> (take-while-not-enough + 10 [2 5 7 8 2 1])
(2 5)
This returns a sequence, whereas your examples return a vector; if that is important, you can add a call to vec.
Something that would traverse the input sequence only once:
(defn take-while-not-enough [r v data]
(->> (rest (reductions (fn [s i] [(r (s 0) i) i]) [0 []] data))
(take-while (comp #(< % v) first))
(map second)))
Well, if you want to use flatland/useful, this is a kinda-okay way to use glue:
(defn take-while-not-enough [p len xs]
(first (glue conj []
(constantly true)
#(>= (reduce p 0 %) len)
xs)))
But it's rebuilding the accumulator for the entire "processed so far" chunk every time it decides whether to grow the chunk more, so it's O(n^2), which will be unacceptable for larger inputs.
The most obvious improvement to your implementation is to make it lazy instead of tail-recursive:
(defn take-while-not-enough [p len xs]
  ((fn step [acc coll]
     (lazy-seq
       (when-let [xs (seq coll)]
         (let [x (first xs)
               acc (p acc x)]
           (when-not (>= acc len)
             ;; recurse on the rest; passing xs again would repeat x
             (cons x (step acc (rest xs))))))))
   0 xs))
Sometimes lazy-seq is straightforward and self-explanatory.
(defn take-while-not-enough
  ([f limit coll] (take-while-not-enough f limit (f) coll))
  ([f limit acc coll]
   (lazy-seq
     (when-let [s (seq coll)]
       (let [fst (first s)
             nacc (f acc fst)]
         (when (< nacc limit)
           (cons fst (take-while-not-enough f limit nacc (rest s)))))))))
Note: f is expected to follow the rules of reduce.
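To illustrate what "the rules of reduce" means for the three-argument arity above, which calls (f) to obtain the initial accumulator: built-in reducing functions return their identity when called with no arguments, while a two-argument anonymous function does not support that call at all. A small sketch, where add-second is a hypothetical name for the function used in the question:

```clojure
;; Built-in reducing functions return their identity when called with no arguments.
(+) ;=> 0
(*) ;=> 1

;; A two-argument anonymous function has no zero-argument arity,
;; so calling it with no arguments throws an ArityException.
(def add-second #(+ %1 (%2 1))) ; hypothetical name for the function from the question

(try
  (add-second)
  (catch clojure.lang.ArityException _ :no-zero-arity))
;=> :no-zero-arity
```

For such functions, the initial accumulator has to be passed explicitly through the four-argument arity.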

How do I filter elements from a sequence based on indexes

I have a sequence s and a list of indexes into this sequence indexes. How do I retain only the items given via the indexes?
Simple example:
(filter-by-index '(a b c d e f g) '(0 2 3 4)) ; => (a c d e)
My usecase:
(filter-by-index '(c c# d d# e f f# g g# a a# b) '(0 2 4 5 7 9 11)) ; => (c d e f g a b)
You can use keep-indexed:
(defn filter-by-index [coll idxs]
(keep-indexed #(when ((set idxs) %1) %2)
coll))
Another version using explicit recursion and lazy-seq:
(defn filter-by-index [coll idxs]
(lazy-seq
(when-let [idx (first idxs)]
(if (zero? idx)
(cons (first coll)
(filter-by-index (rest coll) (rest (map dec idxs))))
(filter-by-index (drop idx coll)
(map #(- % idx) idxs))))))
Make a list of vectors containing the items combined with the indexes,
(def with-indexes (map #(vector %1 %2 ) ['a 'b 'c 'd 'e 'f] (range)))
#'clojure.core/with-indexes
with-indexes
([a 0] [b 1] [c 2] [d 3] [e 4] [f 5])
filter this list
clojure.core=> (def filtered (filter #(#{1 3 5 7} (second % )) with-indexes))
#'clojure.core/filtered
clojure.core=> filtered
([b 1] [d 3] [f 5])
then remove the indexes.
clojure.core=> (map first filtered)
(b d f)
then we thread it together with the "thread last" macro
(defn filter-by-index [coll idxs]
(->> coll
(map #(vector %1 %2)(range))
(filter #(idxs (first %)))
(map second)))
clojure.core=> (filter-by-index ['a 'b 'c 'd 'e 'f 'g] #{2 3 1 6})
(b c d g)
The moral of the story is, break it into small independent parts, test them, then compose them into a working function.
The easiest solution is to use map:
(defn filter-by-index [coll idx]
(map (partial nth coll) idx))
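The nth-based version and the keep-indexed version are not interchangeable when the indexes are unsorted or repeated. A small comparison, with by-nth and by-keep as hypothetical names for the two approaches: the first follows the order of the indexes and repeats items, the second follows the order of the collection and returns each item at most once.

```clojure
(defn by-nth [coll idxs]
  (map (partial nth coll) idxs))

(defn by-keep [coll idxs]
  (let [wanted (set idxs)] ; build the lookup set once
    (keep-indexed #(when (wanted %1) %2) coll)))

(by-nth  '(a b c d) [2 0 2]) ;=> (c a c)
(by-keep '(a b c d) [2 0 2]) ;=> (a c)
```

Also keep in mind that nth walks a list from the start on every lookup, so the nth-based version is best suited to vectors or short inputs.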
I like Jonas's answer, but neither version will work well for an infinite sequence of indices: the first tries to create an infinite set, and the latter runs into a stack overflow by layering too many unrealized lazy sequences on top of each other. To avoid both problems you have to do slightly more manual work:
(defn filter-by-index [coll idxs]
((fn helper [coll idxs offset]
(lazy-seq
(when-let [idx (first idxs)]
(if (= idx offset)
(cons (first coll)
(helper (rest coll) (rest idxs) (inc offset)))
(helper (rest coll) idxs (inc offset))))))
coll idxs 0))
With this version, both coll and idxs can be infinite and you will still have no problems:
user> (nth (filter-by-index (range) (iterate #(+ 2 %) 0)) 1e6)
2000000
Edit: not trying to single out Jonas's answer: none of the other solutions work for infinite index sequences, which is why I felt a solution that does is needed.
I had a similar use case and came up with another easy solution. This one expects vectors.
I've changed the function name to match other similar clojure functions.
(defn select-indices [coll indices]
(reverse (vals (select-keys coll indices))))
(defn filter-by-index [seq idxs]
(let [idxs (into #{} idxs)]
(reduce (fn [h [char idx]]
(if (contains? idxs idx)
(conj h char) h))
[] (partition 2 (interleave seq (iterate inc 0))))))
(filter-by-index [\a \b \c \d \e \f \g] [0 2 3 4])
=>[\a \c \d \e]
=> (defn filter-by-index [src indexes]
(reduce (fn [a i] (conj a (nth src i))) [] indexes))
=> (filter-by-index '(a b c d e f g) '(0 2 3 4))
[a c d e]
I know this is not what was asked, but after reading these answers, I realized in my own personal use case, what I actually wanted was basically filtering by a mask.
So here was my take. Hopefully this will help someone else.
(defn filter-by-mask [coll mask]
(filter some? (map #(if %1 %2) mask coll)))
(defn make-errors-mask [coll]
(map #(nil? (:error %)) coll))
Usage
(let [v [{} {:error 3} {:ok 2} {:error 4 :yea 7}]
data ["one" "two" "three" "four"]
mask (make-errors-mask v)]
(filter-by-mask data mask))
; ==> ("one" "three")