How can I return a vector? - clojure

I have a channel that I am putting values onto inside a doseq loop.
This code reads from a list of ISBNs and, for each ISBN, does an Amazon search to return the contents of a book, then calls another function to get the title and rank:
(def book_channel (chan 10))

Make sure you use clojure.core.async/into rather than clojure.core/into. Here is an example of a round trip from collection to channel and back to collection:
user> (require '[clojure.core.async :as async :refer [<! <!! >!! >! chan go]])
nil
user> (def book-chan (async/to-chan [:book1 :book2 :book3]))
#'user/book-chan
user> (<!! (clojure.core.async/into [] book-chan))
[:book1 :book2 :book3]
clojure.core.async/into returns a channel that will have exactly one item written to it. That one item is written once its input channel closes. This keeps the whole thing asynchronous, but it does require that the code putting things onto the book channel close it to signal that all the books are there.

You need to do some type of coordination to determine when all of your work is finished. You can pull that coordination out into the main thread fairly easily:
(def book_channel (chan 10))
(defn concurrency_test
  [list_of_isbns]
  (doseq [isbn list_of_isbns]
    (go (>! book_channel
            (get_title_and_rank_for_one_isbn
             (amazon_search isbn)))))
  (prn (loop [results []]
         (if (= (count results) (count list_of_isbns))
           results
           (recur (conj results (<!! book_channel)))))))
Here, I used a loop that keeps waiting for results and adding them to the vector until we have as many results as we do isbns. You'll want to make sure that get_title_and_rank_for_one_isbn always generates a result that can be put on a channel, otherwise the loop will wait forever.
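Another way to coordinate, sketched here on the assumption that each lookup can run in its own go block: every go block returns a channel that receives the body's value and then closes, so merging those channels and handing the merge to async/into yields a vector once all lookups have finished. lookup-book is a hypothetical stand-in for the real get_title_and_rank_for_one_isbn / amazon_search pair, and results arrive in completion order rather than input order.

```clojure
(require '[clojure.core.async :as async :refer [go <!!]])

;; Hypothetical stand-in for the real Amazon lookup. It must return a
;; non-nil value, because nil cannot be put on a channel.
(defn lookup-book [isbn]
  {:isbn isbn :title (str "Title for " isbn)})

(defn fetch-all
  "Runs lookup-book for every ISBN in its own go block and returns a
  vector of the results (in completion order)."
  [isbns]
  (let [result-chans (mapv (fn [isbn] (go (lookup-book isbn))) isbns)]
    ;; merge closes once every source channel has closed, which is the
    ;; signal async/into needs before it delivers the vector.
    (<!! (async/into [] (async/merge result-chans)))))

(fetch-all ["111" "222" "333"])
```

If the real lookup blocks on network I/O, async/thread is probably a better fit than go, since blocking inside go blocks ties up the shared dispatch threads.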

You should close! the book_channel after you finish pushing stuff into it. Per async/into documentation - "ch must close before into produces a result."
(let [book> (chan)]
  (go
    (doseq [e (range 8)]
      (>! book> e))
    (close! book>))
  (<!! (async/into [] book>)))
Alternatively, you can use async/onto-chan (named onto-chan! in newer core.async releases), which will close the channel for you:
(let [book> (chan)]
  (async/onto-chan book> (range 8))
  (<!! (async/into [] book>)))

Related

Clojure program works fine when debugged, fails in repl

I'm learning core.async and have written a simple producer consumer code:
(ns webcrawler.parallel
  (:require [clojure.core.async :as async
             :refer [>! <! >!! <!! go chan buffer close! thread alts! alts!! timeout]]))
(defn consumer
  [in out f]
  (go (loop [request (<! in)]
        (if (nil? request)
          (close! out)
          (do (print f)
              (let [result (f request)]
                (>! out result))
              (recur (<! in)))))))
(defn make-consumer [in f]
  (let [out (chan)]
    (consumer in out f)
    out))
(defn process
  [f s no-of-consumers]
  (let [in (chan (count s))
        consumers (repeatedly no-of-consumers #(make-consumer in f))
        out (async/merge consumers)]
    (map #(>!! in %1) s)
    (close! in)
    (loop [result (<!! out)
           results '()]
      (if (nil? result)
        results
        (recur (<!! out)
               (conj results result))))))
This code works fine when I step in through the process function in debugger supplied with Emacs' cider.
(process (partial + 1) '(1 2 3 4) 1)
(5 4 3 2)
However, if I run it by itself (or hit continue in the debugger) I get an empty result.
(process (partial + 1) '(1 2 3 4) 1)
()
My guess is that in the second case for some reason producer doesn't wait for consumers before exiting, but I'm not sure why. Thanks for help!
The problem is that your call to map is lazy, and will not run until something asks for the results. Nothing does this in your code.
There are 2 solutions:
(1) Use the eager function mapv:
(mapv #(>!! in %1) items)
(2) Use the doseq, which is intended for side-effecting operations (like putting values on a channel):
(doseq [item items]
  (>!! in item))
Both will work and produce output:
(process (partial + 1) [1 2 3 4] 1) => (5 4 3 2)
P.S. You have a debug statement in (defn consumer ...)
(print f)
that produces a lot of noise in the output:
<#clojure.core$partial$fn__5561 #object[clojure.core$partial$fn__5561 0x31cced7
"clojure.core$partial$fn__5561#31cced7"]>
That is repeated 5 times back to back. You probably want to avoid that, as printing function "refs" is pretty useless to a human reader.
Also, debug printouts in general should normally use println so you can see where each one begins and ends.
I'm going to take a safe stab that this is being caused by the lazy behavior of map, and this line that's carrying out side effects:
(map #(>!! in %1) s)
Because you never explicitly use the results, it never runs. Change it to use mapv, which is strict, or more correctly, use doseq. Never use map to run side effects. It's meant to lazily transform a list, and abuse of it leads to behaviour like this.
So why is it working while debugging? I'm going to guess because the debugger forces evaluation as part of its operation, which is masking the problem.
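The laziness is easy to see in isolation. This is a minimal sketch using a counter atom (calls is a name made up for the illustration); the map call is bound inside a let rather than evaluated bare, because printing a result at the REPL would force the sequence and hide the effect.

```clojure
(def calls (atom 0))

;; map only builds a lazy sequence; since nothing consumes it here,
;; the function is never called.
(let [_unused (map (fn [x] (swap! calls inc)) [1 2 3])]
  @calls)
;; => 0

;; doseq is eager and meant for side effects.
(doseq [x [1 2 3]]
  (swap! calls inc))
@calls
;; => 3

;; run! is another eager option when you already have a function.
(run! (fn [x] (swap! calls inc)) [1 2 3])
@calls
;; => 6
```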
As the docstring says, map returns a lazy sequence, so I think the best way is to wrap it in dorun. Here is an example from clojuredocs:
;;map a function which makes database calls over a vector of values
user=> (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"])
JdbcSQLException The object is already closed [90007-170] org.h2.message.DbException.getJdbcSQLException (DbException.java:329)
;;database connection was closed before we got a chance to do our transactions
;;lets wrap it in dorun
user=> (dorun (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"]))
DEBUG :db insert into person values name = 'Fred'
DEBUG :db insert into person values name = 'Ethel'
DEBUG :db insert into person values name = 'Lucy'
DEBUG :db insert into person values name = 'Ricardo'
nil

How to stop go block in ClojureScript / core.async?

Is there an elegant way to stop a running go block?
(without introducing a flag and polluting the code with checks/branches)
(ns example
  (:require-macros [cljs.core.async.macros :refer [go]])
  (:require [cljs.core.async :refer [<! timeout]]))
(defn some-long-task []
  (go
    (println "entering")
    ; some complex long-running task (e.g. fetching something via network)
    (<! (timeout 1000))
    (<! (timeout 1000))
    (<! (timeout 1000))
    (<! (timeout 1000))
    (println "leaving")))
; run the task
(def task (some-long-task))
; later, realize we no longer need the result and want to cancel it
; (stop! task)
Sorry, this is not possible with core.async today. What you get back from creating a go block is a normal channel that the result of the block will be put on; it does not give you any handle to the actual block itself.
As stated in Arthur's answer, you cannot terminate a go block immediately, but since your example indicates a multi-phase task (using sub-tasks), an approach like this might work:
(defn task-processor
  "Takes an initial state value and number of tasks (fns). Puts tasks
  on a work queue channel and then executes them in a go-loop, with
  each task passed the current state. A task's return value is used as
  input for next task. When all tasks are processed or queue has been
  closed, places current result/state onto a result channel. To allow
  nil values, result is wrapped in a map:
  {:value state :complete? true/false}
  This fn returns a map of {:queue queue-chan :result result-chan}"
  [init & tasks]
  (assert (pos? (count tasks)))
  (let [queue (chan)
        result (chan)]
    (async/onto-chan queue tasks)
    (go-loop [state init, i 0]
      (if-let [task (<! queue)]
        (recur (task state) (inc i))
        (do (prn "task queue finished/terminated")
            (>! result {:value state :complete? (== i (count tasks))}))))
    {:queue queue
     :result result}))
(defn dummy-task [x] (prn :task x) (Thread/sleep 1000) (inc x))
;; kick off tasks
(def proc (apply task-processor 0 (repeat 100 dummy-task)))
;; result handler
(go
  (let [res (<! (:result proc))]
    (prn :final-result res)))
;; to stop the queue after current task is complete
;; in this example it might take up to an additional second
;; for the terminated result to be delivered
(close! (:queue proc))
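A lighter variant of the same idea is to race every park point against a cancel channel using alts!. This is only a sketch: cancellable-task and its arguments are hypothetical names, the timeouts stand in for the real sub-tasks, and it is written for JVM Clojure (the final <!! is not available in ClojureScript, where you would take from the result channel inside another go block). It still cannot interrupt a step that is already running; it only stops at the next park point.

```clojure
(require '[clojure.core.async :as async
           :refer [go-loop alts! chan close! timeout <!!]])

(defn cancellable-task
  "Runs `steps` parked waits of `step-ms` milliseconds each. Returns
  {:cancel ch :result ch}; closing :cancel stops the loop at the next
  park point."
  [steps step-ms]
  (let [cancel (chan)
        result (go-loop [i 0]
                 (if (= i steps)
                   :done
                   ;; a closed cancel channel is immediately ready,
                   ;; so alts! returns [nil cancel]
                   (let [[_ port] (alts! [cancel (timeout step-ms)])]
                     (if (= port cancel)
                       :cancelled
                       (recur (inc i))))))]
    {:cancel cancel :result result}))

(let [{:keys [cancel result]} (cancellable-task 4 1000)]
  (close! cancel)
  (<!! result))
;; => :cancelled
```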
You may want to use future and future-cancel for such task.
(def f (future (while (not (Thread/interrupted)) (your-function ... ))))
(future-cancel f)
Why do cancelled Clojure futures continue using CPU?

what's the best way to alts!! on a vector of channel multiple times?

I'm using core.async to do some work in parallel, and then using alts!! to wait for a certain number of results with a timeout.
(ns c
  (:require [clojure.core.async :as a]))
(defn async-call-on-vector [v]
  (mapv (fn [n]
          (a/go (a/<! (a/timeout n)) ; simulate long time work
                n))
        v))
(defn wait-result-with-timeout [chans num-to-get timeout]
  (let [chans-count (count chans)
        num-to-get (min num-to-get
                        chans-count)]
    (if (empty? chans)
      []
      (let [timeout (a/timeout timeout)]
        (loop [result []
               met 0]
          (if (or (= (count result) num-to-get)
                  (= met chans-count)) ; all chan has been consumed
            result
            (let [[v c] (a/alts!! (conj chans timeout))]
              (if (= c timeout)
                result
                (case v
                  nil (do (println "got nil") (recur result met)) ; close! on that channel
                  (recur (conj result v) (inc met)))))))))))
and then invoke like:
user=> (-> [1 200 300 400 500] c/async-call-on-vector (c/wait-result-with-timeout 2 30))
This expression prints "got nil" many times. It seems the channel returned by a go block is closed after its result has been delivered, which makes alts!! return nil for that channel from then on. That is very CPU-unfriendly; it is effectively busy waiting. Is there a way to avoid this?
I worked around it by defining a macro like go that returns a channel which is not closed after the result is delivered. Is this the right way to solve it?
I'm using core.async to do something in parallel, and then using alts!! wait on certain amount of result with timeout.
It looks like you want to collect all of the values that will be delivered by some channels, until all of those channels are closed, or until a timeout occurs. One way to do that is to merge those channels onto a single channel, and then use alts! within a go-loop to collect the values into a vector:
(defn wait-result-with-timeout [chans timeout]
  (let [all-chans (a/merge chans)
        t-out (a/timeout timeout)]
    (a/go-loop [vs []]
      (let [[v _] (a/alts! [all-chans t-out])]
        ;; v will be nil if either every channel in
        ;; `chans` is closed, or if `t-out` fires.
        (if (nil? v)
          vs
          (recur (conj vs v)))))))
It seems channel returned by go block will close that channel after result has been returned.
You are correct, that is the documented behavior of a go block.
I solved this by define a macro like go, but return a channel that will not closed on result returned. Is this a right way to solve it?
Probably not, although it's not for me to say whether it's right or wrong for your particular use case. Generally speaking, channels should close if they are done delivering values, to indicate the semantics of being done delivering values. For example, the above code uses the closing of all-chans to indicate that there is no more work to wait on.
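If you also need the "stop after N results" part of the original function, one possible extension of the merge approach is to wrap the merged channel in a/take, which closes its output after n values. This is a sketch, not a drop-in replacement: first-n-with-timeout and delayed-value are names invented here, and unlike the go-loop version above it blocks the calling thread with <!!.

```clojure
(require '[clojure.core.async :as a])

(defn first-n-with-timeout
  "Collects up to n values from chans, giving up after ms milliseconds.
  Blocks the calling thread and returns a vector."
  [chans n ms]
  (let [limited (a/take n (a/merge chans)) ; closes after n values
        t-out   (a/timeout ms)]
    (a/<!! (a/go-loop [vs []]
             (let [[v _] (a/alts! [limited t-out])]
               ;; nil means either `limited` closed or `t-out` fired
               (if (nil? v)
                 vs
                 (recur (conj vs v))))))))

;; Simulated work: delivers n after n milliseconds.
(defn delayed-value [n]
  (a/go (a/<! (a/timeout n)) n))

;; Stops after the two fastest results instead of waiting for all three.
(first-n-with-timeout (mapv delayed-value [1 400 500]) 2 2000)
```

Because closed channels are never polled again, there is no busy loop of nil results.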

clojure.async: "<! not in (go ...) block" error

When I evaluate the following core.async clojurescript code I get an error: "Uncaught Error: <! used not in (go ...) block"
(let [chans [(chan)]]
  (go
    (doall (for [c chans]
             (let [x (<! c)]
               x)))))
What am I doing wrong here? It definitely looks like the <! is in the go block.
Because go blocks can't cross function boundaries (and for expands into code that wraps its body in a function), I tend to fall back on loop/recur for a lot of these cases. The (go (loop ...)) pattern is so common that core.async has a shorthand for it, go-loop, which is useful in cases like this:
user> (require '[clojure.core.async :as async])
user> (async/<!! (let [chans [(async/chan) (async/chan) (async/chan)]]
                   (doseq [c chans]
                     (async/go (async/>! c 42)))
                   (async/go-loop [[f & r] chans result []]
                     (if f
                       (recur r (conj result (async/<! f)))
                       result))))
[42 42 42]
Why don't you use alts! from core.async?
This function lets you listen on multiple channels and tells you which channel each value came from.
For example:
(let [chans [(chan)]]
  (go
    (let [[data ch] (alts! chans)]
      data)))
You can ask of the channel origin too:
...
(let [slow-chan (chan)
fast-chan (chan)
[data ch] (alts! [slow-chan fast-chan])]
(when (= ch slow-chan)
...))
From the Docs:
Completes at most one of several channel operations. Must be called
inside a (go ...) block. ports is a vector of channel endpoints,
which can be either a channel to take from or a vector of
[channel-to-put-to val-to-put], in any combination. Takes will be
made as if by <!, and puts will be made as if by >!. Unless
the :priority option is true, if more than one port operation is
ready a non-deterministic choice will be made. If no operation is
ready and a :default value is supplied, [default-val :default] will
be returned, otherwise alts! will park until the first operation to
become ready completes. Returns [val port] of the completed
operation, where val is the value taken for takes, and a
boolean (true unless already closed, as per put!) for puts.
Documentation ref
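For a version of the snippets above that can be pasted into a JVM REPL, here is a small sketch. first-ready is a hypothetical helper name, and the channels are given a buffer of 1 only so the put can happen before anyone is taking.

```clojure
(require '[clojure.core.async :as async :refer [chan go alts! >!! <!!]])

(defn first-ready
  "Takes from whichever of the two channels is ready first and reports
  both the value and where it came from."
  [slow-chan fast-chan]
  (<!! (go
         (let [[data ch] (alts! [slow-chan fast-chan])]
           {:data data
            :from (if (= ch fast-chan) :fast-chan :slow-chan)}))))

(let [slow-chan (chan 1)
      fast-chan (chan 1)]
  (>!! fast-chan :quick) ; buffered, so this put does not block
  (first-ready slow-chan fast-chan))
;; => {:data :quick, :from :fast-chan}
```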

Agent/actor like constructs in clojure that operate on all messages received since last update

What's best way in clojure to implement something like an actor or agent (asynchronously updated, uncoordinated reference) that does the following?
gets sent messages/data
executes some function on that data to obtain new state; something like (fn [state new-msgs] ...)
continues to receive messages/data during that update
once done with that update, runs the same update function against all messages that have been sent in the interim
An agent doesn't seem quite right here. One must simultaneously send function and data to agents, which doesn't leave room for a function which operates on all data that has come in during the last update. The goal implicitly requires a decoupling of function and data.
The actor model seems generally better suited in that there is a decoupling of function and data. However, all actor frameworks I'm aware of seem to assume each message sent will be processed separately. It's not clear how one would turn this on its head without adding extra machinery. I know Pulsar's actors accept a :lifecycle-handle function which can be used to make actors do "special tricks", but there isn't a lot of documentation around this, so it's unclear whether the functionality would be helpful.
I do have a solution to this problem using agents, core.async channels, and watch functions, but it's a bit messy, and I'm hoping there is a better solution. I'll post it as a solution in case others find it helpful, but I'd like to see what others come up with.
Here's the solution I came up with using agents, core.async channels, and watch functions. Again, it's a bit messy, but it does what I need it to for now. Here it is, in broad strokes:
(require '[clojure.core.async :as async :refer [>!! <!! >! <! chan go]])
; We'll call this thing a queued-agent
(defprotocol IQueuedAgent
  (enqueue [this message])
  (ping [this]))
(defrecord QueuedAgent [agent queue]
  IQueuedAgent
  (enqueue [_ message]
    (go (>! queue message)))
  (ping [_]
    (send agent identity)))
; Need a function for draining a core async channel of all messages
(defn drain! [c]
  (let [cc (chan 1)]
    (go (>! cc ::queue-empty))
    (letfn
      ; This fn does all the hard work, but closes over cc to avoid reconstruction
      [(drainer! [c]
         (let [[v _] (<!! (go (async/alts! [c cc] :priority true)))]
           (if (= v ::queue-empty)
             (lazy-seq [])
             (lazy-seq (cons v (drainer! c))))))]
      (drainer! c))))
; Constructor function
(defn queued-agent [& {:keys [buffer update-fn init-fn error-handler-builder] :or {buffer 100}}]
  (let [q (chan buffer)
        a (agent (if init-fn (init-fn) {}))
        error-handler-fn (error-handler-builder q a)]
    ; Set up the queue, and watcher which runs the update function when there is new data
    (add-watch
     a
     :update-conv
     (fn [k r o n]
       (let [queued (drain! q)]
         (when-not (empty? queued)
           (send a update-fn queued error-handler-fn)))))
    (QueuedAgent. a q)))
; Now we can use these like this
(def a (queued-agent
        :init-fn (fn [] {:some "initial value"})
        :update-fn (fn [a queued-data error-handler-fn]
                     (println "Receiving data" queued-data)
                     ; Simulate some work/load on data
                     (Thread/sleep 2000)
                     (println "Done with work; ready to queue more up!"))
        ; This is a little warty at the moment, but closing over the queue and agent lets you requeue work on
        ; failure so you can try again.
        :error-handler-builder
        (fn [q a] (println "do something with errors"))))
(defn -main []
  (doseq [i (range 10)]
    (enqueue a (str "data" i))
    (Thread/sleep 500) ; simulate things happening
    ; This part stinks... have to manually let the queued agent know that we've queued some things up for it
    (ping a)))
As you'll notice, having to ping the queued-agent here every time new data is added is pretty warty. It definitely feels like things are being twisted out of typical usage.
Agents are the inverse of what you want here: they are a value that gets sent updating functions. This is easiest with a queue and a thread. For convenience I am using future to construct the thread.
user> (def q (java.util.concurrent.LinkedBlockingDeque.))
#'user/q
user> (defn accumulate
        [summary input]
        (let [{vowels true consonents false}
              (group-by #(contains? (set "aeiouAEIOU") %) input)]
          (-> summary
              (update-in [:vowels] + (count vowels))
              (update-in [:consonents] + (count consonents)))))
#'user/accumulate
user> (def worker
        (future (loop [summary {:vowels 0 :consonents 0} in-string (.take q)]
                  (if (not in-string)
                    summary
                    (recur (accumulate summary in-string)
                           (.take q))))))
#'user/worker
user> (.add q "hello")
true
user> (.add q "goodbye")
true
user> (.add q false)
true
user> @worker
{:vowels 5, :consonents 7}
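The worker above handles one message per iteration. To get closer to the "everything received since the last update" behaviour asked about, one option is to block for the first message and then drain the rest with the queue's drainTo method. This is a sketch under that assumption; take-batch! and inbox are hypothetical names, and it uses LinkedBlockingQueue rather than the deque from the session above.

```clojure
(import '[java.util.concurrent LinkedBlockingQueue]
        '[java.util ArrayList])

(defn take-batch!
  "Blocks until at least one message is available, then returns it
  together with everything else currently queued, in arrival order."
  [^LinkedBlockingQueue q]
  (let [batch (ArrayList.)]
    (.add batch (.take q))  ; block for the first message
    (.drainTo q batch)      ; grab whatever else has accumulated
    (vec batch)))

(def inbox (LinkedBlockingQueue.))
(doseq [msg ["a" "b" "c"]] (.put inbox msg))
(take-batch! inbox)
;; => ["a" "b" "c"]
```

A worker loop could then call the update function once per batch, for example (recur (update-fn state (take-batch! inbox))).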
I came up with something closer to an actor, inspired by Tim Baldridge's cast on actors (Episode 16). I think this addresses the problem much more cleanly.
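(defmacro take-all! [c]
  `(loop [acc# []]
     ;; bind the returned port to its own gensym instead of shadowing c,
     ;; so the macro also works when c is an expression
     (let [[v# port#] (alts! [~c] :default nil)]
       (if (not= port# :default)
         (recur (conj acc# v#))
         acc#))))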
(defn eager-actor [f]
  (let [msgbox (chan 1024)]
    (go (loop [f f]
          (let [first-msg (<! msgbox) ; do this so we park efficiently, and only
                                      ; run when there are actually messages
                msgs (take-all! msgbox)
                msgs (concat [first-msg] msgs)]
            (recur (f msgs)))))
    msgbox))
(let [a (eager-actor (fn f [ms]
                       (Thread/sleep 1000) ; simulate work
                       (println "doing something with" ms)
                       f))]
  (doseq [i (range 20)]
    (Thread/sleep 300)
    (put! a i)))
;; =>
;; doing something with (0)
;; doing something with (1 2 3)
;; doing something with (4 5 6)
;; doing something with (7 8 9 10)
;; doing something with (11 12 13)
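The draining step can also be written as a plain function with async/poll!, which takes a value only if one is immediately available and otherwise returns nil. This is a sketch of that alternative rather than part of the answer above; drain-available is a hypothetical name. Being a function, it can be called inside or outside a go block, and it stops cleanly on a closed channel because poll! returns nil there too.

```clojure
(require '[clojure.core.async :as async :refer [chan >!!]])

(defn drain-available
  "Returns a vector of every value that can be taken from c right now,
  without blocking or parking."
  [c]
  (loop [acc []]
    (if-some [v (async/poll! c)]
      (recur (conj acc v))
      acc)))

(let [msgbox (chan 10)]
  (doseq [i (range 3)] (>!! msgbox i)) ; buffered puts complete immediately
  (drain-available msgbox))
;; => [0 1 2]
```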