how to build a chunked lazy-seq that blocks? - clojure

I'd like to use chunked cons or some other way to create a lazy-seq that blocks. Given a source:
(defn -source- [] (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))
(take 2 (-source-))
;; => (<future> <future>)
I'd like to have a function called injest where:
(take 3 (injest (-source-)))
=> [;; sleep 100
1 2
;; sleep 100
1]
(take 6 (injest (-source-)))
=> [;; sleep 100
1 2
;; sleep 100
1 2
;; sleep 100
1 2]
;; ... etc ...
how would I go about writing this function?

This source will naturally block as you consume it, so you don't have to do anything terribly fancy. It's almost enough to simply (mapcat deref):
(doseq [x (take 16 (mapcat deref (-source- )))]
(println {:value x :time (System/currentTimeMillis)}))
{:value 1, :time 1597725323091}
{:value 2, :time 1597725323092}
{:value 1, :time 1597725323092}
{:value 2, :time 1597725323093}
{:value 1, :time 1597725323093}
{:value 2, :time 1597725323093}
{:value 1, :time 1597725323194}
{:value 2, :time 1597725323195}
{:value 1, :time 1597725323299}
{:value 2, :time 1597725323300}
{:value 1, :time 1597725323406}
{:value 2, :time 1597725323406}
{:value 1, :time 1597725323510}
{:value 2, :time 1597725323511}
Notice how the first few items come in all at once, and then after that each pair is staggered by about the time you'd expect? This is due to the well-known(?) fact that apply (and therefore mapcat, which is implemented with apply concat) is more eager than necessary, for performance reasons. If it is important for you to get the right delay even on the first few items, you can simply implement your own version of apply concat that doesn't optimize for short input lists.
(defn ingest [xs]
(when-let [coll (seq (map (comp seq deref) xs))]
((fn step [curr remaining]
(lazy-seq
(cond curr (cons (first curr) (step (next curr) remaining))
remaining (step (first remaining) (next remaining)))))
(first coll) (next coll))))
A. Webb in the comments suggests an equivalent but much simpler implementation:
(defn ingest [coll]
(for [batch coll,
item #batch]
item))

You can solve it by iterating a state machine. I don't think this suffers from the optimizations related to apply pointed out by others, but I am not sure if there might be other issues with this approach:
(defn step-state [[current-element-to-unpack input-seq]]
(cond
(empty? input-seq) nil
(empty? current-element-to-unpack) [(deref (first input-seq)) (rest input-seq)]
:default [(rest current-element-to-unpack) input-seq]))
(defn injest [input-seq]
(->> [[] input-seq]
(iterate step-state)
(take-while some?)
(map first)
(filter seq)
(map first)))

I think you're good with just deref'ing the elements of the lazy seq, and just force the consumption of the entries you need, like this:
(defn -source- [] (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))
(defn injest [src]
(map deref src))
;; (time (dorun (take 3 (injest (-source-)))))
;; => "Elapsed time: 303.432003 msecs"
;; (time (dorun (take 6 (injest (-source-)))))
;; => "Elapsed time: 603.319103 msecs"
On the other hand, I think that depending on the number of items it might be better to avoid creating lots of futures and use a lazy-seq that depending on the index of the element might block for a while.

Related

How to schedule a list of functions `n` seconds apart with Clojure Core Async

In my Clojure project I'm trying to make a list of http calls to an API that has a rate limiter that only allows n calls per minute. I want each of the responses to be returned once all the http calls are finished for further processing. I am new to Clojure's Core Async, but thought it would be a good fit, but because I need to run each call n seconds apart I am also trying to use the Chime library. In Chime's library it has examples using Core Async, but the examples all call the same function at each time interval which won't work for this use case.
While there is probably a way to use chime-async that better serves this use case, all of my attempts at that have failed so I've tried simply wrapping Chime calls with core async, but I am probably more baffled by Core Async than Chime.
This is an example of my name space.
(ns mp.util.schedule
(:require [chime.core :as chime]
[clojure.core.async :as a]
[tick.alpha.api :as tick]))
(defn schedule-fns
"Takes a list of functions and a duration in seconds then runs each function in the list `sec` seconds apart
optionally provide an inst to start from"
[fs sec & [{:keys [inst] :or {inst (tick/now)}}]]
(let [ch (a/chan (count fs))
chime-times (map-indexed
(fn mapped-fn [i f]
(a/put! ch (chime/chime-at [(.plusSeconds inst (* i sec))]
(fn wrapped-fn [_] (f)))))
fs)]
(doseq [chi chime-times]
(a/<!! chi))))
; === Test Code ===
; simple test function
(defn sim-fn
"simple function that prints a message and value, then returns the value"
[v m]
(println m :at (tick/now))
v)
; list of test functions
(def fns [#(sim-fn 1 :one)
#(sim-fn 2 :two)
#(sim-fn 3 :three)])
What I want to happen when calling (schedule-fns fns 2) is for each function in fns to run n seconds from each other and for schedule-fns to return (1 2 3) (the return values of the functions), but this isn't what it is doing. It is calling each of the functions at the correct times (which I can see from the log statements) but it isn't returning anything and there's an error I don't understand. I'm getting:
(schedule-fns fns 2)
:one :at #time/instant "2021-03-05T23:31:52.565Z"
Execution error (IllegalArgumentException) at clojure.core.async.impl.protocols/eval11496$fn$G (protocols.clj:15).
No implementation of method: :take! of protocol: #'clojure.core.async.impl.protocols/ReadPort found for class: java.lang.Boolean
:two :at #time/instant "2021-03-05T23:31:54.568Z"
:three :at #time/instant "2021-03-05T23:31:56.569Z"
If I could get help getting my code to use Core Async properly (with or without Chime) I'd really appreciate it. Thanks.
Try this:
(defn sim-fn
"simple function that prints a message and value, then returns the value"
[v m]
(println m)
v)
; list of test functions
(def fns [#(sim-fn 1 :one)
#(sim-fn 2 :two)
#(sim-fn 3 :three)])
(defn schedule-fns [fns sec]
(let [program (interpose #(Thread/sleep (* sec 1000))
fns)]
(remove #(= % nil)
(for [p program]
(p)))))
Then call:
> (schedule-fns fns 2)
:one
:two
:three
=> (1 2 3)
I came up with a way to get what I want...with some caveats.
(def results (atom []))
(defn schedule-fns
"Takes a list of functions and a duration in seconds then runs each function in the list `sec` seconds apart
optionally provide an inst to start from"
[fs sec]
(let [ch (chan (count fs))]
(go-loop []
(swap! results conj (<! ch))
(recur))
(map-indexed (fn [i f]
(println :waiting (* i sec) :seconds)
(go (<! (timeout (* i sec 1000)))
(>! ch (f))))
fs)))
This code has the timing and behavior that I want, but I have to use an atom to store the responses. While I can add a watcher to determine when all the results are in, I still feel like I shouldn't have to do that.
I guess I'll use this for now, but at some point I'll keep working on this and if anyone has something better than this approach I'd love to see it.
I had a couple friends look at this and they each came up with different answers. These are certainly better than what I was doing.
(defn schedule-fns [fs secs]
(let [ret (atom {})
sink (a/chan)]
(doseq [[n f] (map-indexed vector fs)]
(a/thread (a/<!! (a/timeout (* 1000 n secs)))
(let [val (f)
this-ret (swap! ret assoc n val)]
(when (= (count fs) (count this-ret))
(a/>!! sink (mapv (fn [i] (get this-ret i)) (range (count fs))))))))
(a/<!! sink)))
and
(defn schedule-fns
[fns sec]
(let [concurrent (count fns)
output-chan (a/chan)
timedout-coll (map-indexed (fn [i f]
#(do (println "Waiting")
(a/<!! (a/timeout (* 1000 i sec)))
(f))) fns)]
(a/pipeline-blocking concurrent
output-chan
(map (fn [f] (f)))
(a/to-chan timedout-coll))
(a/<!! (a/into [] output-chan))))
If your objective is to work around the rate limiter, you can consider implementing it in the async channel. Below is one sample implementation - the function takes a channel, throttled its input with a token based limiter and pipe it to an output channel.
(require '[clojure.core.async :as async])
(defn rate-limiting-ch [input xf rate]
(let [tokens (numerator rate)
period (denominator rate)
ans (async/chan tokens xf)
next (fn [] (+ period (System/currentTimeMillis)))]
(async/go-loop [c tokens
t (next)]
(if (zero? c)
(do
(async/<! (async/timeout (- t (System/currentTimeMillis))))
(recur tokens (next)))
(when-let [x (async/<! input)]
(async/>! ans x)
(recur (dec c) t))))
ans))
And here is a sample usage:
(let [start (System/currentTimeMillis)
input (async/to-chan (range 10))
output (rate-limiting-ch input
;; simulate an api call with roundtrip time of ~300ms
(map #(let [wait (rand-int 300)
ans {:time (- (System/currentTimeMillis) start)
:wait wait
:input %}]
(Thread/sleep wait)
ans))
;; rate limited to 2 calls per 1000ms
2/1000)]
;; consume the output
(async/go-loop []
(when-let [x (async/<! output)]
(println x)
(recur))))
Output:
{:time 4, :wait 63, :input 0}
{:time 68, :wait 160, :input 1}
{:time 1003, :wait 74, :input 2}
{:time 1079, :wait 151, :input 3}
{:time 2003, :wait 165, :input 4}
{:time 2169, :wait 182, :input 5}
{:time 3003, :wait 5, :input 6}
{:time 3009, :wait 18, :input 7}
{:time 4007, :wait 138, :input 8}
{:time 4149, :wait 229, :input 9}

Clojure return value from a loop

I want to calculate intersection points. This works well, but I want to store the points in a vector that the function should return.
Here is my code:
(defn intersections
[polygon line]
(let [[p1 p2] line
polypoints (conj polygon (first polygon))]
(doseq [x (range (- (count polypoints) 1))]
(println (intersect p1 p2 (nth polypoints x) (nth polypoints (+ x 1))))
)))
Instead of println I want to add the result to a new vector that should be returned. How can I change it?
You need to use a for loop. The doseq function is meant for side-effects only and always returns nil. An example:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn intersect-1
[numbers]
(let [data-vec (vec numbers)]
(vec
(for [i (range (dec (count numbers)))]
{:start (nth data-vec i)
:stop (nth data-vec (inc i))}))))
The above way works, as seen by the unit test:
(dotest
(is= (intersect-1 (range 5))
[{:start 0, :stop 1}
{:start 1, :stop 2}
{:start 2, :stop 3}
{:start 3, :stop 4}])
However, it is more natural to write it like so in Clojure:
(defn intersect-2
[numbers]
(let [pairs (partition 2 1 numbers)]
(vec
(for [[start stop] pairs]
{:start start :stop stop} ))))
With the same result
(is= (intersect-2 (range 5))
[{:start 0, :stop 1}
{:start 1, :stop 2}
{:start 2, :stop 3}
{:start 3, :stop 4}]))
You can get more details on my favorite template project (including a big documentation list!). See especially the Clojure CheatSheet!
Side note: The vec is optional in both versions. This just forces the answer into a Clojure vector (instead of a "lazy seq"), which is easier to cut and paste in examples and unit tests.
Instead of for-loop, a map would be more idiomatic.
(defn intersections
[polygon line]
(let [[p1 p2] line]
(vec (map (fn [pp1 pp2] (intersect p1 p2 pp1 pp2)) polygon (cdr polygon)))))
or:
(defn intersections
[polygon line]
(let [[p1 p2] line]
(vec (map #(intersect p1 p2 %1 %2) polygon (cdr polygon)))))

clojure - contains?, conj and recur

I'm trying to write a function with recur that cut the sequence as soon as it encounters a repetition ([1 2 3 1 4] should return [1 2 3]), this is my function:
(defn cut-at-repetition [a-seq]
(loop[[head & tail] a-seq, coll '()]
(if (empty? head)
coll
(if (contains? coll head)
coll
(recur (rest tail) (conj coll head))))))
The first problem is with the contains? that throws an exception, I tried replacing it with some but with no success. The second problem is in the recur part which will also throw an exception
You've made several mistakes:
You've used contains? on a sequence. It only works on associative
collections. Use some instead.
You've tested the first element of the sequence (head) for empty?.
Test the whole sequence.
Use a vector to accumulate the answer. conj adds elements to the
front of a list, reversing the answer.
Correcting these, we get
(defn cut-at-repetition [a-seq]
(loop [[head & tail :as all] a-seq, coll []]
(if (empty? all)
coll
(if (some #(= head %) coll)
coll
(recur tail (conj coll head))))))
(cut-at-repetition [1 2 3 1 4])
=> [1 2 3]
The above works, but it's slow, since it scans the whole sequence for every absent element. So better use a set.
Let's call the function take-distinct, since it is similar to take-while. If we follow that precedent and make it lazy, we can do it thus:
(defn take-distinct [coll]
(letfn [(td [seen unseen]
(lazy-seq
(when-let [[x & xs] (seq unseen)]
(when-not (contains? seen x)
(cons x (td (conj seen x) xs))))))]
(td #{} coll)))
We get the expected results for finite sequences:
(map (juxt identity take-distinct) [[] (range 5) [2 3 2]]
=> ([[] nil] [(0 1 2 3 4) (0 1 2 3 4)] [[2 3 2] (2 3)])
And we can take as much as we need from an endless result:
(take 10 (take-distinct (range)))
=> (0 1 2 3 4 5 6 7 8 9)
I would call your eager version take-distinctv, on the map -> mapv precedent. And I'd do it this way:
(defn take-distinctv [coll]
(loop [seen-vec [], seen-set #{}, unseen coll]
(if-let [[x & xs] (seq unseen)]
(if (contains? seen-set x)
seen-vec
(recur (conj seen-vec x) (conj seen-set x) xs))
seen-vec)))
Notice that we carry the seen elements twice:
as a vector, to return as the solution; and
as a set, to test for membership of.
Two of the three mistakes were commented on by #cfrick.
There is a tradeoff between saving a line or two and making the logic as simple & explicit as possible. To make it as obvious as possible, I would do it something like this:
(defn cut-at-repetition
[values]
(loop [remaining-values values
result []]
(if (empty? remaining-values)
result
(let [found-values (into #{} result)
new-value (first remaining-values)]
(if (contains? found-values new-value)
result
(recur
(rest remaining-values)
(conj result new-value)))))))
(cut-at-repetition [1 2 3 1 4]) => [1 2 3]
Also, be sure to bookmark The Clojure Cheatsheet and always keep a browser tab open to it.
I'd like to hear feedback on this utility function which I wrote for myself (uses filter with stateful pred instead of a loop):
(defn my-distinct
"Returns distinct values from a seq, as defined by id-getter."
[id-getter coll]
(let [seen-ids (volatile! #{})
seen? (fn [id] (if-not (contains? #seen-ids id)
(vswap! seen-ids conj id)))]
(filter (comp seen? id-getter) coll)))
(my-distinct identity "abracadabra")
; (\a \b \r \c \d)
(->> (for [i (range 50)] {:id (mod (* i i) 21) :value i})
(my-distinct :id)
pprint)
; ({:id 0, :value 0}
; {:id 1, :value 1}
; {:id 4, :value 2}
; {:id 9, :value 3}
; {:id 16, :value 4}
; {:id 15, :value 6}
; {:id 7, :value 7}
; {:id 18, :value 9})
Docs of filter says "pred must be free of side-effects" but I'm not sure if it is ok in this case. Is filter guaranteed to iterate over the sequence in order and not for example take skips forward?

All subsets of a set in clojure

I wish to generate all subsets of a set except empty set
ie
(all-subsets #{1 2 3}) => #{#{1},#{2},#{3},#{1,2},#{2,3},#{3,1},#{1,2,3}}
How can this be done in clojure?
In your :dependencies in project.clj:
[org.clojure/math.combinatorics "0.0.7"]
At the REPL:
(require '[clojure.math.combinatorics :as combinatorics])
(->> #{1 2 3}
(combinatorics/subsets)
(remove empty?)
(map set)
(set))
;= #{#{1} #{2} #{3} #{1 2} #{1 3} #{2 3} #{1 2 3}}
clojure.math.combinatorics/subsets sensibly returns a seq of seqs, hence the extra transformations to match your desired output.
Here's a concise, tail-recursive version with dependencies only on clojure.core.
(defn power [s]
(loop [[f & r] (seq s) p '(())]
(if f (recur r (concat p (map (partial cons f) p)))
p)))
If you want the results in a set of sets, use the following.
(defn power-set [s] (set (map set (power s))))
#zcaudate: For completeness, here is a recursive implementation:
(defn subsets
[s]
(if (empty? s)
#{#{}}
(let [ts (subsets (rest s))]
(->> ts
(map #(conj % (first s)))
(clojure.set/union ts)))))
;; (subsets #{1 2 3})
;; => #{#{} #{1} #{2} #{3} #{1 2} #{1 3} #{2 3} #{1 2 3}} (which is correct).
This is a slight variation of #Brent M. Spell's solution in order to seek enlightenment on performance consideration in idiomatic Clojure.
I just wonder if having the construction of the subset in the loop instead of another iteration through (map set ...) would save some overhead, especially, when the set is very large?
(defn power [s]
(set (loop [[f & r] (seq s) p '(#{})]
(if f (recur r (concat p (map #(conj % f) p)))
p))))
(power [1 2 3])
;; => #{#{} #{3} #{2} #{1} #{1 3 2} #{1 3} #{1 2} #{3 2}}
It seems to me loop and recuris not lazy.
It would be nice to have a lazy evaluation version like Brent's, to keep the expression elegancy, while using laziness to achieve efficiency at the sametime.
This version as a framework has another advantage to easily support pruning of candidates for subsets, when there are too many subsets to compute. One can add the logic of pruning at position of conj. I used it to implement the prior algorithm for "Frequent Item Set".
refer to: Algorithm to return all combinations of k elements from n
(defn comb [k l]
(if (= 1 k) (map vector l)
(apply concat
(map-indexed
#(map (fn [x] (conj x %2))
(comb (dec k) (drop (inc %1) l)))
l))))
(defn all-subsets [s]
(apply concat
(for [x (range 1 (inc (count s)))]
(map #(into #{} %) (comb x s)))))
; (all-subsets #{1 2 3})
; (#{1} #{2} #{3} #{1 2} #{1 3} #{2 3} #{1 2 3})
This version is loosely modeled after the ES5 version on Rosetta Code. I know this question seems reasonably solved already... but here you go, anyways.
(fn [s]
(reduce
(fn [a b] (clojure.set/union a
(set (map (fn [y] (clojure.set/union #{b} y)) a))))
#{#{}} s))

make sequence side-effectfull in Clojure

What I want to do is like following.
(def mystream (stream (range 100)))
(take 3 mystream)
;=> (0 1 2)
(take 3 mystream)
;=> (3 4 5)
(first (drop 1 mystream))
;=> 7
The stream function make sequence side-effectfull like io stream.
I think this is almost impossible.
Here is my attempt.
(defprotocol Stream (first! [this]))
(defn stream [lst]
(let [alst (atom lst)]
(reify Stream
(first! [this]
(let [[fs] #alst]
(swap! alst rest)
fs)))))
(let [mystream (stream (iterate inc 1))]
(map #(if (string? %) (first! mystream) %)
[:a "e" "b" :c "i" :f]))
;=> (:a 1 2 :c 3 :f)
Unfotunately this approach need to implement all function I will use.
Judging by your followup comment to Maurits, you don't need mutation, but rather simply need to emit a new sequence with the elements in the right place.
For example:
(defn replace-when [pred coll replacements]
(lazy-seq
(when (seq coll)
(if (seq replacements)
(if (pred (first coll))
(cons (first replacements)
(replace-when pred (rest coll) (rest replacements)))
(cons (first coll)
(replace-when pred (rest coll) replacements)))
coll))))
user=> (def seq1 [:a :b :c])
#'user/seq1
user=> (def seq2 [:x "i" "u" :y :z "e"])
#'user/seq2
user=> (replace-when string? seq2 seq1)
(:x :a :b :y :z :c)
This won't work with the standard take and drop, but you could quite easily write your own to work on a mutable atom, e.g. you could do something like this:
(def mystream (atom (range 100)))
(defn my-take [n stream]
(let [data #stream
result (take n data)]
(reset! stream (drop n data))
result))
(my-take 3 mystream)
=> (0 1 2)
(my-take 3 mystream)
=> (3 4 5)