Why does the program run endlessly? - clojure

(defn lazycombine
  ([s] (lazycombine s []))
  ([s v] (let [a (take 1 s)
               b (drop 1 s)]
           (if (= a :start)
             (lazy-seq (lazycombine b v))
             (if (= a :end)
               (lazy-seq (cons v (lazycombine b [])))
               (lazy-seq (lazycombine b (conj v a))))))))
(def w '(:start 1 2 3 :end :start 7 7 :end))
(lazycombine w)
I need a function that returns a lazy sequence of elements by taking elements from another sequence of the form [:start 1 2 :end :start 5 :end] and combining all the elements between :start and :end into a vector.

You need to handle the termination condition, i.e. what should be returned when the input s is empty?
Also, the detection of :start and :end should use first instead of (take 1 s), since (take 1 s) returns a sequence rather than a single element. You can simplify that with destructuring.
(defn lazycombine
  ([s] (lazycombine s []))
  ([[a & b :as s] v]
   (if (empty? s)
     v
     (if (= a :start)
       (lazy-seq (lazycombine b v))
       (if (= a :end)
         (lazy-seq (cons v (lazycombine b [])))
         (lazy-seq (lazycombine b (conj v a))))))))
(def w '(:start 1 2 3 :end :start 7 7 :end))
(lazycombine w)
;; => ([1 2 3] [7 7])
To reduce cyclomatic complexity a bit, you can use condp to replace the nested ifs:
(defn lazycombine
  ([s] (lazycombine s []))
  ([[a & b :as s] v]
   (if (empty? s)
     v
     (lazy-seq
       (condp = a
         :start (lazycombine b v)
         :end   (cons v (lazycombine b []))
         (lazycombine b (conj v a)))))))

I would do it like so, using take-while:
(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(def data
  [:start 1 2 3 :end :start 7 7 :end])

(defn end-tag? [it] (= it :end))
(defn start-tag? [it] (= it :start))

(defn lazy-segments
  [data]
  (when-not (empty? data)
    (let [next-segment   (take-while #(not (end-tag? %)) data)
          data-next      (drop (inc (count next-segment)) data)
          segment-result (vec (remove #(start-tag? %) next-segment))]
      (cons segment-result
            (lazy-seq (lazy-segments data-next))))))

(dotest
  (println "result: " (lazy-segments data)))
Running it, we get:
result: ([1 2 3] [7 7])
Note the contract when constructing a sequence recursively using cons (lazy or not). You must return either the next value in the sequence, or nil. Supplying nil to cons is the same as supplying an empty sequence:
(cons 5 nil) => (5)
(cons 5 []) => (5)
So it is convenient to use a when form to test the termination condition (instead of using if and returning an empty vector when the sequence must end).
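As a minimal sketch of the two styles (hypothetical countdown-when / countdown-if helpers, not from the answer above):
(defn countdown-when [n]
  (when (pos? n)                                  ; returns nil when done
    (cons n (lazy-seq (countdown-when (dec n))))))

(defn countdown-if [n]
  (if (pos? n)
    (cons n (lazy-seq (countdown-if (dec n))))
    []))                                          ; an empty seq also ends it

(countdown-when 3) ;=> (3 2 1)
(countdown-if 3)   ;=> (3 2 1)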
Suppose we wrote the cons as a simple recursion:
(cons segment-result
(lazy-segments data-next))
This works great and produces the same result. The only thing the lazy-seq part does is to delay when the recursive call takes place. Because lazy-seq is a Clojure built-in (a macro that defers evaluation of its body), it is similar to loop/recur in that it does not consume the stack like ordinary recursion does. Thus, we can generate millions (or more) values in the lazy sequence without creating a StackOverflowError (on my computer, the default maximum stack depth is about 4000 frames). Consider the infinite lazy sequence of integers beginning at 0:
(defn intrange
  [n]
  (cons n (lazy-seq (intrange (inc n)))))

(dotest
  (time
    (spyx (first (drop 1e6 (intrange 0))))))
Dropping the first million integers and taking the next one succeeds and requires only a few milliseconds:
(first (drop 1000000.0 (intrange 0))) => 1000000
"Elapsed time: 49.5 msecs"

Related

Clojure: Why is the wrapper lazy-seq required?

Why is the wrapper lazy-seq required? The two functions below give the same result.
(defn seq1 [s]
  (lazy-seq
    (when-let [x (seq s)]
      (cons (first x) (seq1 (rest x))))))

(defn seq2 [s]
  (when-let [x (seq s)]
    (cons (first x) (seq2 (rest x)))))
In both cases I got the same result: non-chunked sequences.
repl.core=> (first (map println (seq1 (range 1000))))
0
nil
repl.core=> (first (map println (seq2 (range 1000))))
0
nil
repl.core=> (chunked-seq? (seq2 (range 1000)))
false
repl.core=> (chunked-seq? (seq1 (range 1000)))
false
The first is lazy: it only evaluates elements of the sequence as needed. The second, however, is strict and goes through the entire sequence immediately. This can be seen if you add some println calls in each:
(defn seq1 [s]
  (lazy-seq
    (when-let [x (seq s)]
      (println "Seq1" (first x))
      (cons (first x) (seq1 (rest x))))))

(defn seq2 [s]
  (when-let [x (seq s)]
    (println "Seq2" (first x))
    (cons (first x) (seq2 (rest x)))))
(->> (range 10)
     (seq1)
     (take 5))
Seq1 0
Seq1 1
Seq1 2
Seq1 3
Seq1 4 ; Only iterated over what was asked for
=> (0 1 2 3 4)
(->> (range 10)
     (seq2)
     (take 5))
Seq2 0
Seq2 1
Seq2 2
Seq2 3
Seq2 4
Seq2 5
Seq2 6
Seq2 7
Seq2 8
Seq2 9 ; Iterated over everything immediately
=> (0 1 2 3 4)
So, to answer the question, lazy-seq is only required if you intend the iteration of the sequence to only happen as needed. Prefer lazy solutions if you think you may not need the entire sequence, or the sequence is infinite, or you want to sequence several transformations using map or filter. Use a strict solution if you're likely going to need the entire sequence at some point and/or you need fast random access.
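For instance, with the seq1/seq2 definitions above, only the lazy version is usable on an infinite input (a small illustration):
(take 5 (seq1 (range)))   ;=> (0 1 2 3 4) -- realizes only what is needed
;; (take 5 (seq2 (range))) would never return a value: seq2 tries to walk
;; the entire (infinite) input before yielding anything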

Make (map f c1 c2) map (count c1) times, even if c2 has fewer elements

When doing
(map f [0 1 2] [:0 :1])
f will get called twice, with the arguments being
0 :0
1 :1
Is there a simple yet efficient way, i.e. without producing more intermediate sequences etc., to make f get called for every value of the first collection, with the following arguments?
0 :0
1 :1
2 nil
Edit: Addressing a question by @fl00r in the comments.
The actual use case that triggered this question needed map to always work exactly (count first-coll) times, regardless if the second (or third, or ...) collection was longer.
It's a bit late in the game now and somewhat unfair after having accepted an answer, but if a good answer gets added that only does what I specifically asked for - mapping (count first-coll) times - I would accept that.
You could do:
(map f [0 1 2] (concat [:0 :1] (repeat nil)))
Basically, pad the second coll with an infinite sequence of nils. map stops when it reaches the end of the first collection.
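For illustration, with vector standing in for the question's f, the padded call gives:
(map vector [0 1 2] (concat [:0 :1] (repeat nil)))
;; => ([0 :0] [1 :1] [2 nil])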
An (eager) loop/recur form that walks to the end of the longest collection:
(loop [c1 [0 1 2] c2 [:0 :1] o []]
  (if (or (seq c1) (seq c2))
    (recur (rest c1) (rest c2) (conj o (f (first c1) (first c2))))
    o))
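Wrapping the same loop for illustration, again with vector standing in for f:
(let [f vector]
  (loop [c1 [0 1 2] c2 [:0 :1] o []]
    (if (or (seq c1) (seq c2))
      (recur (rest c1) (rest c2) (conj o (f (first c1) (first c2))))
      o)))
;; => [[0 :0] [1 :1] [2 nil]]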
Or you could write a lazy version of map that did something similar.
A general lazy version, as suggested by Alex Miller's answer, is
(defn map-all [f & colls]
  (lazy-seq
    (when-not (not-any? seq colls)
      (cons
        (apply f (map first colls))
        (apply map-all f (map rest colls))))))
For example,
(map-all vector [0 1 2] [:0 :1])
;([0 :0] [1 :1] [2 nil])
You would probably want to specialise map-all for one and two collections.
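A two-collection specialisation might look like this (a sketch; map-all2 is a hypothetical name, using the same pad-to-the-longest idea but without the per-step (map first colls) / (map rest colls) allocations):
(defn map-all2
  [f c1 c2]
  (lazy-seq
    (when (or (seq c1) (seq c2))
      (cons (f (first c1) (first c2))
            (map-all2 f (rest c1) (rest c2))))))

(map-all2 vector [0 1 2] [:0 :1])
;; => ([0 :0] [1 :1] [2 nil])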
Just for fun: this could easily be done with Common Lisp's do macro. We could implement it in Clojure and do this (and much more fun things) with it:
(defmacro cl-do [clauses [end-check result] & body]
  (let [clauses  (map #(if (coll? %) % (list %)) clauses)
        bindings (mapcat (juxt first second) clauses)
        nexts    (map #(nth % 2 (first %)) clauses)]
    `(loop [~@bindings]
       (if ~end-check
         ~result
         (do
           ~@body
           (recur ~@nexts))))))
and then just use it for mapping (notice it can operate on more than 2 colls):
(defn map-all [f & colls]
  (cl-do ((colls colls (map next colls))
          (res [] (conj res (apply f (map first colls)))))
         ((every? empty? colls) res)))
In the REPL:
user> (map-all vector [1 2 3] [:a :s] '[z x c v])
;;=> [[1 :a z] [2 :s x] [3 nil c] [nil nil v]]

Function for replacing subsequences

Is there a function that could replace subsequences? For example:
user> (good-fnc [1 2 3 4 5] [1 2] [3 4 5])
;; => [3 4 5 3 4 5]
I know that there is clojure.string/replace for strings:
user> (clojure.string/replace "fat cat caught a rat" "a" "AA")
;; => "fAAt cAAt cAAught AA rAAt"
Is there something similar for vectors and lists?
Does this work for you?
(defn good-fnc [s sub r]
  (loop [acc []
         s   s]
    (cond
      (empty? s) (seq acc)
      (= (take (count sub) s) sub) (recur (apply conj acc r)
                                          (drop (count sub) s))
      :else (recur (conj acc (first s)) (rest s)))))
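Checking it against the example from the question:
(good-fnc [1 2 3 4 5] [1 2] [3 4 5])
;; => (3 4 5 3 4 5)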
Here is a version that plays nicely with lazy seq inputs. Note that it can take an infinite lazy sequence such as (range) without looping infinitely, which a loop-based version would do.
(defn sq-replace
  [match replacement sq]
  (let [matching (count match)]
    ((fn replace-in-sequence [[elt & elts :as sq]]
       (lazy-seq
         (cond
           (empty? sq) ()
           (= match (take matching sq)) (concat replacement
                                                (replace-in-sequence (drop matching sq)))
           :default (cons elt (replace-in-sequence elts)))))
     sq)))
#'user/sq-replace
user> (take 10 (sq-replace [3 4 5] ["hello, world"] (range)))
(0 1 2 "hello, world" 6 7 8 9 10 11)
I took the liberty of making the sequence argument the final argument, since this is the convention in Clojure for functions that walk a sequence.
My previous (now deleted) answer was incorrect because this was not as trivial as I first thought; here is my second attempt:
(defn seq-replace
  [coll sub rep]
  (letfn [(seq-replace' [coll]
            (when-let [s (seq coll)]
              (let [start (take (count sub) s)
                    end   (drop (count sub) s)]
                (if (= start sub)
                  (lazy-cat rep (seq-replace' end))
                  (cons (first s) (lazy-seq (seq-replace' (rest s))))))))]
    (seq-replace' coll)))
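Checking it against the question's example:
(seq-replace [1 2 3 4 5] [1 2] [3 4 5])
;; => (3 4 5 3 4 5)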

Clojure: find repetition

Let's say we have a list of integers: 1, 2, 5, 13, 6, 5, 7, and I want to find the first number that repeats and return a vector of the two indices. In my sample, it's 5 at [2, 5]. What I have so far uses loop, but can I do it in a more elegant, shorter way?
(defn get-cycle
  [xs]
  (loop [[x & xs_rest] xs, indices {}, i 0]
    (if (nil? x)
      [0 i] ; Sequence is over before we found a duplicate.
      (if-let [x_index (indices x)]
        [x_index i]
        (recur xs_rest (assoc indices x i) (inc i))))))
There is no need to return the number itself, because I can get it by index and, besides, it may not always be there.
An option using list processing, but not significantly more concise:
(defn get-cycle [xs]
  (first (filter #(number? (first %))
                 (reductions
                   (fn [[m i] x] (if-let [xat (m x)] [xat i] [(assoc m x i) (inc i)]))
                   [(hash-map) 0] xs))))
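With the sample data from the question this returns the expected indices:
(get-cycle [1 2 5 13 6 5 7])
;; => [2 5]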
Here is a version using reduced to stop consuming the sequence when you find the first duplicate:
(defn first-duplicate [coll]
  (reduce (fn [acc [idx x]]
            (if-let [v (get acc x)]
              (reduced (conj v idx))
              (assoc acc x [idx])))
          {} (map-indexed #(vector % %2) coll)))
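With the question's sample data it stops at the first duplicate:
(first-duplicate [1 2 5 13 6 5 7])
;; => [2 5]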
I know that you have only asked for the first. Here is a fully lazy implementation with little per-step allocation overhead:
(defn dups
  [coll]
  (letfn [(loop-fn [idx [elem & rest] cached]
            (if elem
              (if-let [last-idx (cached elem)]
                (cons [last-idx idx]
                      (lazy-seq (loop-fn (inc idx) rest (dissoc cached elem))))
                (lazy-seq (loop-fn (inc idx) rest (assoc cached elem idx))))))]
    (loop-fn 0 coll {})))
(first (dups [1 2 5 13 6 5 7]))
=> [2 5]
Edit: Here are some criterium benchmarks:
The answer that got accepted: 7.819269 µs
This answer (first (dups [1 2 5 13 6 5 7])): 6.176290 µs
Beschastny's: 5.841101 µs
first-duplicate: 5.025445 µs
Actually, loop is a pretty good choice unless you want to find all duplicates. Things like reduce will cause the full scan of an input sequence even when it's not necessary.
Here is my version of get-cycle:
(defn get-cycle [coll]
  (loop [i 0 seen {} coll coll]
    (when-let [[x & xs] (seq coll)]
      (if-let [j (seen x)]
        [j i]
        (recur (inc i) (assoc seen x i) xs)))))
The only difference from your get-cycle is that my version returns nil when there are no duplicates.
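For example:
(get-cycle [1 2 5 13 6 5 7]) ;=> [2 5]
(get-cycle [1 2 3])          ;=> nil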
The intent of your code seems different from your description in the comments so I'm not totally confident I understand. That said, loop/recur is definitely a valid way to approach the problem.
Here's what I came up with:
(defn get-cycle [xs]
  (loop [xs xs index 0]
    (when-let [[x & more] (seq xs)]
      (when-let [[y] (seq more)]
        (if (= x y)
          {x [index (inc index)]}
          (recur more (inc index)))))))
This will return a map of the repeated item to a vector of the two indices the item was found at.
(get-cycle [1 1 2 1 2 4 2 1 4 5 6 7])
;=> {1 [0 1]}
(get-cycle [1 2 1 2 4 2 1 4 5 6 7 7])
;=> {7 [10 11]}
(get-cycle [1 2 1 2 4 2 1 4 5 6 7 8])
;=> nil
Here's an alternative solution using sequence functions. I like this way better but whether it's shorter or more elegant is probably subjective.
(defn pairwise [coll]
  (map vector coll (rest coll)))

(defn find-first [pred xs]
  (first (filter pred xs)))

(defn get-cycle [xs]
  (find-first #(apply = (val (first %)))
              (map-indexed hash-map (pairwise xs))))
Edited based on clarification from @demi.
Ah, got it. Is this what you have in mind?
(defn get-cycle [xs]
  (loop [xs (map-indexed vector xs)]
    (when-let [[[i n] & more] (seq xs)]
      (if-let [[j _] (find-first #(= n (second %)) more)]
        {n [i j]}
        (recur more)))))
I re-used find-first from my earlier sequence-based solution.
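With the question's sample data this gives:
(get-cycle [1 2 5 13 6 5 7])
;; => {5 [2 5]}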

How do I filter elements from a sequence based on indexes

I have a sequence s and a list indexes of indexes into this sequence. How do I retain only the items given by the indexes?
Simple example:
(filter-by-index '(a b c d e f g) '(0 2 3 4)) ; => (a c d e)
My use case:
(filter-by-index '(c c# d d# e f f# g g# a a# b) '(0 2 4 5 7 9 11)) ; => (c d e f g a b)
You can use keep-indexed:
(defn filter-by-index [coll idxs]
  (keep-indexed #(when ((set idxs) %1) %2)
                coll))
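For example, with the question's data:
(filter-by-index '(a b c d e f g) '(0 2 3 4))
;; => (a c d e)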
Another version using explicit recur and lazy-seq:
(defn filter-by-index [coll idxs]
  (lazy-seq
    (when-let [idx (first idxs)]
      (if (zero? idx)
        (cons (first coll)
              (filter-by-index (rest coll) (rest (map dec idxs))))
        (filter-by-index (drop idx coll)
                         (map #(- % idx) idxs))))))
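It handles the question's use case as well:
(filter-by-index '(c c# d d# e f f# g g# a a# b) '(0 2 4 5 7 9 11))
;; => (c d e f g a b)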
Make a list of vectors containing the items combined with the indexes:
(def with-indexes (map #(vector %1 %2 ) ['a 'b 'c 'd 'e 'f] (range)))
#'clojure.core/with-indexes
with-indexes
([a 0] [b 1] [c 2] [d 3] [e 4] [f 5])
Filter this list:
clojure.core=> (def filtered (filter #(#{1 3 5 7} (second %)) with-indexes))
#'clojure.core/filtered
clojure.core=> filtered
([b 1] [d 3] [f 5])
Then remove the indexes:
clojure.core=> (map first filtered)
(b d f)
Then we thread it all together with the "thread last" macro:
(defn filter-by-index [coll idxs]
  (->> coll
       (map #(vector %1 %2) (range))
       (filter #(idxs (first %)))
       (map second)))
clojure.core=> (filter-by-index ['a 'b 'c 'd 'e 'f 'g] #{2 3 1 6})
(b c d g)
The moral of the story is, break it into small independent parts, test them, then compose them into a working function.
The easiest solution is to use map:
(defn filter-by-index [coll idx]
  (map (partial nth coll) idx))
I like Jonas's answer, but neither version will work well for an infinite sequence of indices: the first tries to create an infinite set, and the latter runs into a stack overflow by layering too many unrealized lazy sequences on top of each other. To avoid both problems you have to do slightly more manual work:
(defn filter-by-index [coll idxs]
  ((fn helper [coll idxs offset]
     (lazy-seq
       (when-let [idx (first idxs)]
         (if (= idx offset)
           (cons (first coll)
                 (helper (rest coll) (rest idxs) (inc offset)))
           (helper (rest coll) idxs (inc offset))))))
   coll idxs 0))
With this version, both coll and idxs can be infinite and you will still have no problems:
user> (nth (filter-by-index (range) (iterate #(+ 2 %) 0)) 1e6)
2000000
Edit: not trying to single out Jonas's answer: none of the other solutions work for infinite index sequences, which is why I felt a solution that does is needed.
I had a similar use case and came up with another easy solution. This one expects vectors.
I've changed the function name to match other similar Clojure functions.
(defn select-indices [coll indices]
  (reverse (vals (select-keys coll indices))))
(defn filter-by-index [seq idxs]
  (let [idxs (into #{} idxs)]
    (reduce (fn [h [char idx]]
              (if (contains? idxs idx)
                (conj h char)
                h))
            [] (partition 2 (interleave seq (iterate inc 0))))))

(filter-by-index [\a \b \c \d \e \f \g] [0 2 3 4])
=> [\a \c \d \e]
=> (defn filter-by-index [src indexes]
     (reduce (fn [a i] (conj a (nth src i))) [] indexes))
=> (filter-by-index '(a b c d e f g) '(0 2 3 4))
[a c d e]
I know this is not what was asked, but after reading these answers, I realized that in my own personal use case, what I actually wanted was basically filtering by a mask.
So here is my take. Hopefully it will help someone else.
(defn filter-by-mask [coll mask]
  (filter some? (map #(if %1 %2) mask coll)))

(defn make-errors-mask [coll]
  (map #(nil? (:error %)) coll))
Usage
(let [v    [{} {:error 3} {:ok 2} {:error 4 :yea 7}]
      data ["one" "two" "three" "four"]
      mask (make-errors-mask v)]
  (filter-by-mask data mask))
; ==> ("one" "three")