In the same way alt! waits for one of n channels to get a value, I'm looking for the idiomatic way to wait for all n channels to get a value.
I need this because I "spawn" n go blocks to work on async tasks, and I want to know when they are all done. I'm sure there is a very beautiful way to achieve this.
Use the core.async map function:
(<!! (a/map vector [ch1 ch2 ch3]))
;; => [val-from-ch1 val-from-ch2 val-from-ch3]
You can say (mapv #(async/<!! %) channels); this blocks the calling thread until every channel has delivered a value.
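For example, a minimal sketch of the whole pattern (the tasks collection below is a stand-in invented for illustration):

(require '[clojure.core.async :as async])

;; stand-ins for the real asynchronous jobs
(def tasks (repeat 5 #(reduce + (range 10000))))

(let [chans   (mapv #(async/go (%)) tasks) ; start every go block up front
      results (mapv async/<!! chans)]      ; then block for each result in turn
  results)

Because mapv is eager, all the go blocks are already running before the first take, so the total wait is bounded by the slowest task rather than the sum of all of them.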
If you wanted to handle individual values as they arrive, and then do something special after the final channel produces a value, you can exploit the fact that alts! / alts!! take a vector of channels, and that they are functions, not macros, so you can easily pass in dynamically constructed vectors.
So, you can use alts!! to wait on your initial collection of n channels, then use it again on the remaining channels etc.
(def c1 (async/chan))
(def c2 (async/chan))
(def out
  (async/thread
    (loop [cs [c1 c2] vs []]
      (let [[v p] (async/alts!! cs)
            cs (filterv #(not= p %) cs)
            vs (conj vs v)]
        (if (seq cs)
          (recur cs vs)
          vs)))))
(async/>!! c1 :foo)
(async/>!! c2 :bar)
(async/<!! out)
;= [:foo :bar]
If instead you wanted to take all values from all the input channels and then do something else when they all close, you'd want to use async/merge:
clojure.core.async/merge
([chs] [chs buf-or-n])
Takes a collection of source channels and returns a channel which
contains all values taken from them. The returned channel will be
unbuffered by default, or a buf-or-n can be supplied. The channel
will close after all the source channels have closed.
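For instance, a small sketch of that pattern (the squaring go blocks below are placeholders for real workers):

;; each go block's channel delivers its one value and then closes,
;; so the merged channel closes once all of them are done
(let [chans  (mapv (fn [i] (async/go (* i i))) [1 2 3])
      merged (async/merge chans)]
  (async/<!! (async/into [] merged)))
;; => [1 4 9], in whatever order the values arrived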
Related
Let's say I've got a channel out created with (chan). I need to take the values that are put onto the channel and add them up. The number of values is undetermined (so I cannot use a traditional loop with an exit case around (<! out)) and they come from an external IO source. I'm using a fixed timeout with alts!, but that doesn't seem like the best way to approach the problem. So far I've got the following (adapted from https://gist.github.com/schaueho/5726a96641693dce3e47):
(go-loop [[v ch] (alts! [out (timeout 1000)])
          acc 0]
  (if-not v
    (do (close! out)
        (deliver p acc))
    (do
      (>! task-ch (as/progress-tick))
      (recur (alts! [out (timeout 1000)]) (+ acc v)))))
The problem I've got is that a timeout of 1000 is sometimes not enough and causes the go-loop to exit prematurely (as it may take more than 1000ms for the IO operation to complete and put the val in the out channel). I do not think increasing the timeout value is such a good idea as it may cause me to wait longer than necessary.
What is the best way to guarantee all reads from the out channel and exit out correctly from the loop?
Update:
Why am I using timeout?
Because the number of values being put on the channel is not fixed, which means I cannot create an exit case. Without the exit case, the go-loop will park indefinitely waiting ((<! out)) for values to be put on the channel out. If you have a solution without the timeout, that'd be really awesome.
How do i know I've read the last value?
I don't. That's the problem; that's why I'm using timeout and alts! to exit the go-loop.
What do you want to do w/ the result?
Simple addition for now. However, that's not the important bit.
Update Final:
I figured out a way to get the number of values I'd be dealing with, so I modified my logic to make use of that. I'm still going to use the timeout and alts! to guard against blocking forever.
(go-loop [[v _] (alts! [out (timeout 1000)])
          i 0
          acc 0]
  (if (and v (not= n i))
    (do
      (>! task-ch (as/progress-tick))
      (recur (alts! [out (timeout 1000)]) (inc i) (+ acc v)))
    (do (close! out)
        (deliver p* (if (= n i) acc nil)))))
I think your problem sits a bit higher up in your design; it is not a core.async-specific one:
On one hand, you have an undetermined number of values coming in on a channel — there could be 0, there could be 10, there could be 1,000,000.
On the other hand, you want to read all of them, do some calculation, and then return. This is impossible to do — unless there is some other signal that you can use to say "I think I'm done now".
If that signal is the timing of the values, then your approach of using alts! is the correct one, though I believe the code can be simplified a bit.
Update: Do you have access to the "upstream" IO? Can you put a sentinel value (e.g. something like ::closed) to the channel when the IO operation finishes?
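That would look roughly like this (a sketch reusing out and the promise p from the question):

;; the producer puts ::closed onto out when the IO finishes;
;; the consumer exits on that sentinel, with no timeout at all
(go-loop [acc 0]
  (let [v (<! out)]
    (if (= v ::closed)
      (deliver p acc)
      (recur (+ acc v)))))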
The 'best' way is to wait either for a special batch-ending message from out, or for out to be closed by the sender to mark the end of the inputs.
Either way, the solution rests with the sender communicating something about the inputs.
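With the closing variant, a take from a closed and drained channel returns nil, so the loop needs no timeout either (again a sketch assuming the out channel and promise p from the question):

;; the sender calls (close! out) after putting its last value
(go-loop [acc 0]
  (if-some [v (<! out)]
    (recur (+ acc v))  ; a real value: accumulate and keep reading
    (deliver p acc)))  ; nil: out is closed and drained, so we are done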
Consider a core.async channel which is created like so:
(def c (chan))
And let's assume values are put onto and taken from this channel in different places (e.g. in go-loops).
How would one flush all the items on the channel at a certain time?
For instance one could make the channel an atom and then have an event like this:
(def c (atom (chan)))

(defn reset []
  (close! @c)
  (reset! c (chan)))
Is there another way to do so?
Read everything into a vector with async/into and simply don't use the result:
(async/into [] c)
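Note that async/into delivers its result only after the source channel closes, so a flush looks like this (a sketch):

(close! c)              ; no more puts will be accepted
(<!! (async/into [] c)) ; drains whatever was still pending; ignore the result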
Let's define a little more clearly what you seem to want to do: you have code running in several go-loops, each of them putting data on the same channel. You want to be able to tell them all: "the channel you're putting values on is no good anymore; from now on, put your values on some other channel." If that's not what you want to do, then your original question doesn't make much sense, as there's no "flushing" to be done -- you either take the values being put on the channel, or you don't.
First, understand the reason your approach won't work, which the comments to your question touch on: if you deref an atom c, you get a channel, and that value is always the same channel. You have code in go-loops that have called >! and are currently parked, waiting for takers. When you close @c, those parked threads stay parked (anyone parked taking from a channel with <! will immediately get nil when the channel closes, but parked >!s will simply stay parked). You can reset! c all day long, but the parked threads are still parked on a previous value they got from dereferencing.
So, how do you do it? Here's one approach.
(require '[clojure.core.async :as a
           :refer [>! <! >!! <!! alt! take! go-loop chan close! mult tap]])

(def rand-int-chan (chan))
(def control-chan (chan))
(def control-chan-mult (mult control-chan))

(defn create-worker
  [put-chan control-chan worker-num]
  (go-loop [put-chan put-chan]
    (alt!
      [[put-chan (rand-int 10)]]
      ([_ _] (println (str "Worker" worker-num " generated value."))
             (recur put-chan))

      control-chan
      ([new-chan] (recur new-chan)))))

(defn create-workers
  [n c cc]
  (dotimes [i n]
    (let [tap-chan (chan)]
      (a/tap cc tap-chan)
      (create-worker c tap-chan i))))
(create-workers 5 rand-int-chan control-chan-mult)
So we are going to create 5 worker loops that will put their result on rand-int-chan, and we will give them a "control channel." I will let you explore mult and tap on your own, but in short, we are creating a single channel which we can put values on, and that value is then broadcast to all channels which tap it.
In our worker loop, we do one of two things: put a value onto the rand-int-chan that we use when we create it, or we will take a value off of this control channel. We can cleverly let the worker thread know that the channel to put its values on has changed by actually handing it the new channel, which it will then bind on the next time through the loop. So, to see it in action:
(<!! rand-int-chan)
=> 6
Worker2 generated value.
This will take random ints from the channel, and the worker thread will print that it has generated a value, to see that indeed multiple threads are participating here.
Now, let's say we want to change the channel to put the random integers on. No problem, we do:
(def new-rand-int-chan (chan))
(>!! control-chan new-rand-int-chan)
(close! rand-int-chan) ;; for good measure, may not be necessary
We create the channel, and then we put that channel onto our control-chan. When we do this, every worker thread will have the second portion of its alt! executed, which simply loops back to the top of the go-loop, except this time put-chan will be bound to the new-rand-int-chan we just received. So now:
(<!! new-rand-int-chan)
=> 3
Worker1 generated value.
This gives us our integers, which is exactly what we want. Any attempt to <!! from the old channel will give nil, since we closed the channel:
(<!! rand-int-chan)
; nil
In Clojure(Script), is there any way to jam a value in at the bottom (as opposed to the top) of a channel, so that the next time the channel is taken from (for example with <! inside a go block), it is guaranteed that the value you jammed in is the one that is received?
I'm not sure there is any easy way to do this. You may be able to implement your own type of buffer, as in buffers.clj, and then instantiate your channel using that buffer:
(chan (lifo-buffer 10))
For example you could create a buffer that works like a LIFO queue:
(import 'java.util.LinkedList)
(require '[clojure.core.async.impl.protocols :as impl])

(deftype LIFOBuffer [^LinkedList buf ^long n]
  impl/UnblockingBuffer
  impl/Buffer
  (full? [this]
    false)
  (remove! [this]
    (.removeFirst buf))
  (add!* [this itm]
    ;; drop new items once the buffer already holds n values
    (when-not (>= (.size buf) n)
      (.addFirst buf itm))
    this)
  (close-buf! [this])
  clojure.lang.Counted
  (count [this]
    (.size buf)))

(defn lifo-buffer [n]
  (LIFOBuffer. (LinkedList.) n))
Beware, though, that this solution relies on implementation details of core.async and may break in the future.
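A quick REPL sketch of how it would behave, assuming the buffer works as written above:

(def c (chan (lifo-buffer 10)))
(>!! c 1)
(>!! c 2)
(>!! c 3)
(<!! c) ;=> 3, the most recent put comes out first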
You can have two channels; imagine one is called first-class and the other is called economy. You write a go-loop that just recurs and calls alts! (or alts!!) on the two channels with the :priority option, so first-class is always checked first. Then you put values onto economy as your "normal" input channel, and when you want a value to jump the line, you put it onto first-class.
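A minimal sketch of that pattern (all names here are invented; handle stands in for the real processing):

(require '[clojure.core.async :refer [chan go-loop alts! >!!]])

(def first-class (chan 10))
(def economy (chan 10))

(defn handle [v] (println "got" v)) ; placeholder for real work

(go-loop []
  (let [[v _port] (alts! [first-class economy] :priority true)]
    (when (some? v) ; nil would mean one of the channels closed
      (handle v)
      (recur))))

(>!! economy :normal) queues a regular value; (>!! first-class :urgent) jumps the line whenever both channels have something ready.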
Say I have a function, (get-events "feed"), that returns a vector of events in chronological order, taken from an external source.
Now, at any given moment, that function returns a list of events up to that point in time. Called a few seconds later, it will return a few more events, etc, as the feed continually grows.
If I want to create a lazy-seq that forever pulls new events from the feed, making sure it doesn't repeat those that have already been seen, how would I write this? I'm running into a stack overflow error when I don't use recur, but I can't use recur, because it doesn't appear in a tail position.
(defn continually-list-events
  ([feed] (continually-list-events feed (hash-set)))
  ([feed seen]
   (let [events-now (get-events feed)]
     (into (remove seen events-now)
           (lazy-seq
            (continually-list-events feed
                                     (into seen events-now)))))))
You can see I'm trying to use an accumulator to track events already seen (in a set), and I'm making sure to always filter out the ones I've seen.
If each step keeps track of how many events have been received so far, then each iteration can return the sequence of new events by dropping the old ones.
user> (->> (iterate (fn [[events-so-far contents]]
                      (let [events (get-events)
                            new-events (drop events-so-far events)]
                        [(count events) new-events]))
                    [0 nil]) ; seed state: nothing seen yet
           (mapcat second))
Then you can drop the counts from the sequence and flatten the chunks of events into a sequence of single events.
In your example the stack overflow occurs because there is no call to cons after the call to lazy-seq, so it calculates the whole list as the first item in the sequence.
user> (defn example [x] (lazy-seq (cons x (example (inc x)))))
#'user/example
user> (take 5 (example 4))
(4 5 6 7 8)
user> (defn example [x] (lazy-seq (example (inc x))))
#'user/example
user> (take 5 (example 4))
... long pause then out of memory ...
PS: using lazy-seq directly is somewhat uncommon, though it's important to know how it works.
I'm experimenting with filtering through elements in parallel. For each element, I need to perform a distance calculation to see if it is close enough to a target point. Never mind that data structures already exist for doing this, I'm just doing initial experiments for now.
Anyway, I wanted to run some very basic experiments where I generate random vectors and filter them. Here's my implementation that does all of this
(defn pfilter [pred coll]
  (map second
       (filter first
               (pmap (fn [item] [(pred item) item]) coll))))

(defn random-n-vector [n]
  (take n (repeatedly rand)))

(defn distance [u v]
  (Math/sqrt (reduce + (map #(Math/pow (- %1 %2) 2) u v))))

(defn -main [& args]
  (let [[n-str vectors-str threshold-str] args
        n (Integer/parseInt n-str)
        vectors (Integer/parseInt vectors-str)
        threshold (Double/parseDouble threshold-str)
        random-vector (partial random-n-vector n)
        u (random-vector)]
    (time (println n vectors
                   (count
                    (pfilter
                     (fn [v] (< (distance u v) threshold))
                     (take vectors (repeatedly random-vector))))))))
The code executes and returns what I expect, that is, the parameter n (the length of the vectors), vectors (the number of vectors), and the count of vectors that are closer than the threshold to the target vector. What I don't understand is why the program hangs for an additional minute before terminating.
Here is the output of a run which demonstrates the error
$ time lein run 10 100000 1.0
[null] 10 100000 12283
[null] "Elapsed time: 3300.856 msecs"
real 1m6.336s
user 0m7.204s
sys 0m1.495s
Any comments on how to filter in parallel in general are also more than welcome, as I haven't yet confirmed that pfilter actually works.
You need to call shutdown-agents to kill the threads backing the threadpool used by pmap; those non-daemon threads otherwise keep the JVM alive for about a minute after main returns.
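A sketch of the fix, with a trivial body standing in for the question's -main:

(defn -main [& args]
  (println (count (pfilter even? (range 100000))))
  (shutdown-agents)) ; releases pmap's threadpool so the JVM can exit promptly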
About pfilter: it should work, but it will run slower than filter, since your predicate is cheap. Parallelization isn't free, so you have to give each thread moderately intensive tasks to offset the multithreading overhead. Batch your items before filtering them, as sketched below.
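A sketch of the batching idea; the chunk size of 512 is an arbitrary starting point to tune:

(defn pfilter-batched [pred coll]
  ;; doall forces each chunk's filtering to happen on the pmap worker,
  ;; not lazily on the consuming thread
  (apply concat
         (pmap #(doall (filter pred %))
               (partition-all 512 coll))))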