I'm learning core.async and have written a simple producer consumer code:
(ns webcrawler.parallel
(:require [clojure.core.async :as async
:refer [>! <! >!! <!! go chan buffer close! thread alts! alts!! timeout]]))
(defn consumer
[in out f]
(go (loop [request (<! in)]
(if (nil? request)
(close! out)
(do (print f)
(let [result (f request)]
(>! out result))
(recur (<! in)))))))
(defn make-consumer [in f]
(let [out (chan)]
(consumer in out f)
out))
(defn process
[f s no-of-consumers]
(let [in (chan (count s))
consumers (repeatedly no-of-consumers #(make-consumer in f))
out (async/merge consumers)]
(map #(>!! in %1) s)
(close! in)
(loop [result (<!! out)
results '()]
(if (nil? result)
results
(recur (<!! out)
(conj results result))))))
This code works fine when I step in through the process function in debugger supplied with Emacs' cider.
(process (partial + 1) '(1 2 3 4) 1)
(5 4 3 2)
However, if I run it by itself (or hit continue in the debugger) I get an empty result.
(process (partial + 1) '(1 2 3 4) 1)
()
My guess is that in the second case for some reason producer doesn't wait for consumers before exiting, but I'm not sure why. Thanks for help!
The problem is that your call to map is lazy, and will not run until something asks for the results. Nothing does this in your code.
There are 2 solutions:
(1) Use the eager function mapv:
(mapv #(>!! in %1) items)
(2) Use the doseq, which is intended for side-effecting operations (like putting values on a channel):
(doseq [item items]
(>!! in item))
Both will work and produce output:
(process (partial + 1) [1 2 3 4] 1) => (5 4 3 2)
P.S. You have a debug statement in (defn consumer ...)
(print f)
that produces a lot of noise in the output:
<#clojure.core$partial$fn__5561 #object[clojure.core$partial$fn__5561 0x31cced7
"clojure.core$partial$fn__5561#31cced7"]>
That is repeated 5 times back to back. You probably want to avoid that, as printing function "refs" is pretty useless to a human reader.
Also, debug printouts in general should normally use println so you can see where each one begins and ends.
I'm going to take a safe stab that this is being caused by the lazy behavior of map, and this line that's carrying out side effects:
(map #(>!! in %1) s)
Because you never explicitly use the results, it never runs. Change it to use mapv, which is strict, or more correctly, use doseq. Never use map to run side effects. It's meant to lazily transform a list, and abuse of it leads to behaviour like this.
So why is it working while debugging? I'm going to guess because the debugger forces evaluation as part of its operation, which is masking the problem.
As you can read from docstring map returns a lazy sequence. And I think the best way is to use dorun. Here is an example from clojuredocs:
;;map a function which makes database calls over a vector of values
user=> (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"])
JdbcSQLException The object is already closed [90007-170] org.h2.message.DbE
xception.getJdbcSQLException (DbException.java:329)
;;database connection was closed before we got a chance to do our transactions
;;lets wrap it in dorun
user=> (dorun (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"]))
DEBUG :db insert into person values name = 'Fred'
DEBUG :db insert into person values name = 'Ethel'
DEBUG :db insert into person values name = 'Lucy'
DEBUG :db insert into person values name = 'Ricardo'
nil
Related
I have a channel where I am putting values into inside a doseq loop.
This code reads from a list of isbns and for each isbn, does an amazon search to return contents of a book, and then calls another function to get the title and rank
(def book_channel (chan 10))
make sure you use clojure.core.async/into rather than clojure.core/into. Here is an example of a round trip from collection to channel and back to collection:
user> (require '[clojure.core.async :as async :refer [<! <!! >!! >! chan go]])
nil
user> (def book-chan (async/to-chan [:book1 :book2 :book3]))
#'user/book-chan
user> (<!! (clojure.core.async/into [] book-chan))
[:book1 :book2 :book3]
clojure.core.async/into returns a channel that will have exactly one item written to it. That one item will be written once it's input channel closes. This keeps the whole thing asynchronous and it does require that the code putting things into the book-channel close the chan to signal that all the books are there.
You need to do some type of coordination to determine when all of your work is finished. You can pull that coordination out into the main thread fairly easily:
(def book_channel (chan 10))
(defn concurrency_test
[list_of_isbns]
(doseq [isbn list_of_isbns]
(go (>! book_channel
(get_title_and_rank_for_one_isbn
(amazon_search isbn)))))
(prn (loop [results []]
(if (= (count results) (count list_of_isbns))
results
(recur (conj results (<!! book_channel)))))))
Here, I used a loop that keeps waiting for results and adding them to the vector until we have as many results as we do isbns. You'll want to make sure that get_title_and_rank_for_one_isbn always generates a result that can be put on a channel, otherwise the loop will wait forever.
You should close! the book_channel after you finish pushing stuff into it. Per async/into documentation - "ch must close before into produces a result."
(let [book> (chan)]
(go
(doseq [e (range 8)]
(>! book> e))
(close! book>))
(<!! (async/into [] book>)))
Alternatively, you can use async/onto-chan which will close the channel for you:
(let [book> (chan)]
(async/onto-chan book> (range 8))
(<!! (async/into [] book>)))
Consider a dataset like this:
(def data [{:url "http://www.url1.com" :type :a}
{:url "http://www.url2.com" :type :a}
{:url "http://www.url3.com" :type :a}
{:url "http://www.url4.com" :type :b}])
The contents of those URL's should be requested in parallel. Depending on the item's :type value those contents should be parsed by corresponding functions. The parsing functions return collections, which should be concatenated, once all the responses have arrived.
So let's assume that there are functions parse-a and parse-b, which both return a collection of strings when they are passed a string containing HTML content.
It looks like core.async could be a good tool for this. One could either have separate channels for each item ore one single channel. I'm not sure which way would be preferable here. With several channels one could use transducers for the postprocessing/parsing. There is also a special promise-chan which might be proper here.
Here is a code-sketch, I'm using a callback based HTTP kit function. Unfortunately, I could not find a generic solution inside the go block.
(defn f [data]
(let [chans (map (fn [{:keys [url type]}]
(let [c (promise-chan (map ({:a parse-a :b parse-b} type)))]
(http/get url {} #(put! c %))
c))
data)
result-c (promise-chan)]
(go (put! result-c (concat (<! (nth chans 0))
(<! (nth chans 1))
(<! (nth chans 2))
(<! (nth chans 3)))))
result-c))
The result can be read like so:
(go (prn (<! (f data))))
I'd say that promise-chan does more harm than good here. The problem is that most of core.async API (a/merge, a/reduce etc.) relies on fact that channels will close at some point, promise-chans in turn never close.
So, if sticking with core.async is crucial for you, the better solution will be not to use promise-chan, but ordinary channel instead, which will be closed after first put!:
...
(let [c (chan 1 (map ({:a parse-a :b parse-b} type)))]
(http/get url {} #(do (put! c %) (close! c)))
c)
...
At this point, you're working with closed channels and things become a bit simpler. To collect all values you could do something like this:
;; (go (put! result-c (concat (<! (nth chans 0))
;; (<! (nth chans 1))
;; (<! (nth chans 2))
;; (<! (nth chans 3)))))
;; instead of above, now you can do this:
(->> chans
async/merge
(async/reduce into []))
UPD (below are my personal opinions):
Seems, that using core.async channels as promises (either in form of promise-chan or channel that closes after single put!) is not the best approach. When things grow, it turns out that core.async API overall is (you may have noticed that) not that pleasant as it could be. Also there are several unsupported constructs, that may force you to write less idiomatic code than it could be. In addition, there is no built-in error handling (if error occurs within go-block, go-block will silently return nil) and to address this you'll need to come up with something of your own (reinvent the wheel). Therefore, if you need promises, I'd recommend to use specific library for that, for example manifold or promesa.
I wanted this functionality as well because I really like core.async but I also wanted to use it in certain places like traditional JavaScript promises. I came up with a solution using macros. In the code below, <? is the same thing as <! but it throws if there's an error. It behaves like Promise.all() in that it returns a vector of all the returned values from the channels if they all are successful; otherwise it will return the first error (since <? will cause it to throw that value).
(defmacro <<? [chans]
`(let [res# (atom [])]
(doseq [c# ~chans]
(swap! res# conj (serverless.core.async/<? c#)))
#res#))
If you'd like to see the full context of the function it's located on GitHub. It's heavily inspired from David Nolen's blog post.
Use pipeline-async in async.core to launch asynchronous operations like http/get concurrently while delivering the result in the same order as the input:
(let [result (chan)]
(pipeline-async
20 result
(fn [{:keys [url type]} ch]
(let [parse ({:a parse-a :b parse-b} type)
callback #(put! ch (parse %)(partial close! ch))]
(http/get url {} callback)))
(to-chan data))
result)
if anyone is still looking at this, adding on to the answer by #OlegTheCat:
You can use a separate channel for errors.
(:require [cljs.core.async :as async]
[cljs-http.client :as http])
(:require-macros [cljs.core.async.macros :refer [go]])
(go (as-> [(http/post <url1> <params1>)
(http/post <url2> <params2>)
...]
chans
(async/merge chans (count chans))
(async/reduce conj [] chans)
(async/<! chans)
(<callback> chans)))
I have 100 workers (agents) that share one ref that contains collection of tasks. While this collection have tasks, each worker get one task from this collection (in dosync block), print it and sometimes put it back in the collection (in dosync block):
(defn have-tasks?
[tasks]
(not (empty? #tasks)))
(defn get-task
[tasks]
(dosync
(let [task (first #tasks)]
(alter tasks rest)
task)))
(defn put-task
[tasks task]
(dosync (alter tasks conj task))
nil)
(defn worker
[& {:keys [tasks]}]
(agent {:tasks tasks}))
(defn worker-loop
[{:keys [tasks] :as state}]
(while (have-tasks? tasks)
(let [task (get-task tasks)]
(println "Task: " task)
(when (< (rand) 0.1)
(put-task tasks task))))
state)
(defn create-workers
[count & options]
(->> (range 0 count)
(map (fn [_] (apply worker options)))
(into [])))
(defn start-workers
[workers]
(doseq [worker workers] (send-off worker worker-loop)))
(def tasks (ref (range 1 10000000)))
(def workers (create-workers 100 :tasks tasks))
(start-workers workers)
(apply await workers)
When i run this code, the last value printed by agents is (after several tries):
435445,
4556294,
1322061,
3950017.
But never 9999999 what I expect.
And every time the collection is really empty at the end.
What I'm doing wrong?
Edit:
I rewrote worker-loop as simple as possible:
(defn worker-loop
[{:keys [tasks] :as state}]
(loop []
(when-let [task (get-task tasks)]
(println "Task: " task)
(recur)))
state)
But problem is still there.
This code behaves as expected when create one and only one worker.
The problem here has nothing to do with agents and barely anything to do with laziness. Here's a somewhat reduced version of the original code that still exhibits the problem:
(defn f [init]
(let [state (ref init)
task (fn []
(loop [last-n nil]
(if-let [n (dosync
(let [n (first #state)]
(alter state rest)
n))]
(recur n)
(locking :out
(println "Last seen:" last-n)))))
workers (->> (range 0 5)
(mapv (fn [_] (Thread. task))))]
(doseq [w workers] (.start w))
(doseq [w workers] (.join w))))
(defn r []
(f (range 1 100000)))
(defn i [] (f (->> (iterate inc 1)
(take 100000))))
(defn t []
(f (->> (range 1 100000)
(take Integer/MAX_VALUE))))
Running this code shows that both i and t, both lazy, reliably work, whereas r reliably doesn't. The problem is in fact a concurrency bug in the class returned by the range call. Indeed, that bug is documented in this Clojure ticket and is fixed as of Clojure version 1.9.0-alpha11.
A quick summary of the bug in case the ticket is not accessible for some reason: in the internals of the rest call on the result of range, there was a small opportunity for a race condition: the "flag" that says "the next value has already been computed" was set before the actual value itself, which meant that a second thread could see that flag as true even though the "next value" is still nil. The call to alter would then fix that nil value on the ref. It's been fixed by swapping the two assignment lines.
In cases where the result of range was either forcibly realized in a single thread or wrapped in another lazy seq, that bug would not appear.
I asked this question on the Clojure Google Group and it helped me to find the answer.
The problem is that I used a lazy sequence within the STM transaction.
When I replaced this code:
(def tasks (ref (range 1 10000000)))
by this:
(def tasks (ref (into [] (range 1 10000000))))
it worked as expected!
In my production code where the problem occurred, I used the Korma framework that also returns a lazy collection of tuples, as in my example.
Conclusion: Avoid the use of lazy data structures within the STM transaction.
When the last number in the range is reached, there a are still older numbers being held by the workers. Some of these will be returned to the queue, to be processed again.
In order to better see what is happening, you can change worker-loop to print the last task handled by each worker:
(defn worker-loop
[{:keys [tasks] :as state}]
(loop [last-task nil]
(if (have-tasks? tasks)
(let [task (get-task tasks)]
;; (when (< (rand) 0.1)
;; (put-task tasks task)
(recur task))
(when last-task
(println "Last task:" last-task))))
state)
This also shows the race condition in the code, where tasks seen by have-tasks? often is taken by others when get-task is called near the end of the processing of the tasks.
The race condition can be solved by removing have-tasks? and instead using the return value of nil from get-task as a signal that no more tasks are available (at the moment).
Updated:
As observed, this race conditions does not explain the problem.
Neither is the problem solved by removing a possible race condition in get-task like this:
(defn get-task [tasks]
(dosync
(first (alter tasks rest))))
However changing get-task to use an explicit lock seems to solve the problem:
(defn get-task [tasks]
(locking :lock
(dosync
(let [task (first #tasks)]
(alter tasks rest)
task))))
What's best way in clojure to implement something like an actor or agent (asynchronously updated, uncoordinated reference) that does the following?
gets sent messages/data
executes some function on that data to obtain new state; something like (fn [state new-msgs] ...)
continues to receive messages/data during that update
once done with that update, runs the same update function against all messages that have been sent in the interim
An agent doesn't seem quite right here. One must simultaneously send function and data to agents, which doesn't leave room for a function which operates on all data that has come in during the last update. The goal implicitly requires a decoupling of function and data.
The actor model seems generally better suited in that there is a decoupling of function and data. However, all actor frameworks I'm aware of seem to assume each message sent will be processed separately. It's not clear how one would turn this on it's head without adding extra machinery. I know Pulsar's actors accept a :lifecycle-handle function which can be used to make actors do "special tricks" but there isn't a lot of documentation around this so it's unclear whether the functionality would be helpful.
I do have a solution to this problem using agents, core.async channels, and watch functions, but it's a bit messy, and I'm hoping there is a better solution. I'll post it as a solution in case others find it helpful, but I'd like to see what other's come up with.
Here's the solution I came up with using agents, core.async channels, and watch functions. Again, it's a bit messy, but it does what I need it to for now. Here it is, in broad strokes:
(require '[clojure.core.async :as async :refer [>!! <!! >! <! chan go]])
; We'll call this thing a queued-agent
(defprotocol IQueuedAgent
(enqueue [this message])
(ping [this]))
(defrecord QueuedAgent [agent queue]
IQueuedAgent
(enqueue [_ message]
(go (>! queue message)))
(ping [_]
(send agent identity)))
; Need a function for draining a core async channel of all messages
(defn drain! [c]
(let [cc (chan 1)]
(go (>! cc ::queue-empty))
(letfn
; This fn does all the hard work, but closes over cc to avoid reconstruction
[(drainer! [c]
(let [[v _] (<!! (go (async/alts! [c cc] :priority true)))]
(if (= v ::queue-empty)
(lazy-seq [])
(lazy-seq (cons v (drainer! c))))))]
(drainer! c))))
; Constructor function
(defn queued-agent [& {:keys [buffer update-fn init-fn error-handler-builder] :or {:buffer 100}}]
(let [q (chan buffer)
a (agent (if init-fn (init-fn) {}))
error-handler-fn (error-handler-builder q a)]
; Set up the queue, and watcher which runs the update function when there is new data
(add-watch
a
:update-conv
(fn [k r o n]
(let [queued (drain! q)]
(when-not (empty? queued)
(send a update-fn queued error-handler-fn)))))
(QueuedAgent. a q)))
; Now we can use these like this
(def a (queued-agent
:init-fn (fn [] {:some "initial value"})
:update-fn (fn [a queued-data error-handler-fn]
(println "Receiving data" queued-data)
; Simulate some work/load on data
(Thread/sleep 2000)
(println "Done with work; ready to queue more up!"))
; This is a little warty at the moment, but closing over the queue and agent lets you requeue work on
; failure so you can try again.
:error-handler-builder
(fn [q a] (println "do something with errors"))))
(defn -main []
(doseq [i (range 10)]
(enqueue a (str "data" i))
(Thread/sleep 500) ; simulate things happening
; This part stinks... have to manually let the queued agent know that we've queued some things up for it
(ping a)))
As you'll notice, having to ping the queued-agent here every time new data is added is pretty warty. It definitely feels like things are being twisted out of typical usage.
Agents are the inverse of what you want here - they are a value that gets sent updating functions. This easiest with a queue and a Thread. For convenience I am using future to construct the thread.
user> (def q (java.util.concurrent.LinkedBlockingDeque.))
#'user/q
user> (defn accumulate
[summary input]
(let [{vowels true consonents false}
(group-by #(contains? (set "aeiouAEIOU") %) input)]
(-> summary
(update-in [:vowels] + (count vowels))
(update-in [:consonents] + (count consonents)))))
#'user/accumulate
user> (def worker
(future (loop [summary {:vowels 0 :consonents 0} in-string (.take q)]
(if (not in-string)
summary
(recur (accumulate summary in-string)
(.take q))))))
#'user/worker
user> (.add q "hello")
true
user> (.add q "goodbye")
true
user> (.add q false)
true
user> #worker
{:vowels 5, :consonents 7}
I came up with something closer to an actor, inspired by Tim Baldridge's cast on actors (Episode 16). I think this addresses the problem much more cleanly.
(defmacro take-all! [c]
`(loop [acc# []]
(let [[v# ~c] (alts! [~c] :default nil)]
(if (not= ~c :default)
(recur (conj acc# v#))
acc#))))
(defn eager-actor [f]
(let [msgbox (chan 1024)]
(go (loop [f f]
(let [first-msg (<! msgbox) ; do this so we park efficiently, and only
; run when there are actually messages
msgs (take-all! msgbox)
msgs (concat [first-msg] msgs)]
(recur (f msgs)))))
msgbox))
(let [a (eager-actor (fn f [ms]
(Thread/sleep 1000) ; simulate work
(println "doing something with" ms)
f))]
(doseq [i (range 20)]
(Thread/sleep 300)
(put! a i)))
;; =>
;; doing something with (0)
;; doing something with (1 2 3)
;; doing something with (4 5 6)
;; doing something with (7 8 9 10)
;; doing something with (11 12 13)
I have a number of (unevaluated) expressions held in a vector; [ expr1 expr2 expr3 ... ]
What I wish to do is hand each expression to a separate thread and wait until one returns a value. At that point I'm not interested in the results from the other threads and would like to cancel them to save CPU resource.
( I realise that this could cause non-determinism in that different runs of the program might cause different expressions to be evaluated first. I have this in hand. )
Is there a standard / idiomatic way of achieving the above?
Here's my take on it.
Basically you have to resolve a global promise inside each of your futures, then return a vector containing future list and the resolved value and then cancel all the futures in the list:
(defn run-and-cancel [& expr]
(let [p (promise)
run-futures (fn [& expr] [(doall (map #(future (deliver p (eval %1))) expr)) #p])
[fs res] (apply run-futures expr)]
(map future-cancel fs)
res))
It's not reached an official release yet, but core.async looks like it might be an interesting way of solving your problem - and other asynchronous problems, very neatly.
The leiningen incantation for core.async is (currently) as follows:
[org.clojure/core.async "0.1.0-SNAPSHOT"]
And here's some code to make a function that will take a number of time-consuming functions, and block until one of them returns.
(require '[clojure.core.async :refer [>!! chan alts!! thread]]))
(defn return-first [& ops]
(let [v (map vector ops (repeatedly chan))]
(doseq [[op c] v]
(thread (>!! c (op))))
(let [[value channel] (alts!! (map second v))]
value)))
;; Make sure the function returns what we expect with a simple Thread/sleep
(assert (= (return-first (fn [] (Thread/sleep 3000) 3000)
(fn [] (Thread/sleep 2000) 2000)
(fn [] (Thread/sleep 5000) 5000))
2000))
In the sample above:
chan creates an asynchronous channel
>!! puts a value onto a channel
thread executes the body in another thread
alts!! takes a vector of channels, and returns when a value appears on any of them
There's way more to it than this, and I'm still getting my head round it, but there's a walkthrough here: https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
And David Nolen's blog has some great, if mind-boggling, posts on it (http://swannodette.github.io/)
Edit
Just seen that MichaĆ Marczyk has answered a very similar question, but better, here, and it allows you to cancel/short-circuit.
with Clojure threading long running processes and comparing their returns
What you want is Java's CompletionService. I don't know of any wrapper around this in clojure, but it wouldn't be hard to do with interop. The example below is loosely based around the example on the JavaDoc page for the ExecutorCompletionService.
(defn f [col]
(let [cs (ExecutorCompletionService. (Executors/newCachedThreadPool))
futures (map #(.submit cs %) col)
result (.get (.take cs))]
(map #(.cancel % true) futures)
result))
You could use future-call to get a list of all futures, storing them in an Atom. then, compose each running future with a "shoot the other ones in the head" function so the first one will terminate all the remaining ones. Here is an example:
(defn first-out [& fns]
(let [fs (atom [])
terminate (fn [] (println "cancling..") (doall (map future-cancel #fs)))]
(reset! fs (doall (map (fn [x] (future-call #((x) (terminate)))) fns)))))
(defn wait-for [n s]
(fn [] (print "start...") (flush) (Thread/sleep n) (print s) (flush)))
(first-out (wait-for 1000 "long") (wait-for 500 "short"))
Edit
Just noticed that the previous code does not return the first results, so it is mainly useful for side-effects. here is another version that returns the first result using a promise:
(defn first-out [& fns]
(let [fs (atom [])
ret (promise)
terminate (fn [x] (println "cancling.." )
(doall (map future-cancel #fs))
(deliver ret x))]
(reset! fs (doall (map (fn [x] (future-call #(terminate (x)))) fns)))
#ret))
(defn wait-for [n s]
"this time, return the value"
(fn [] (print "start...") (flush) (Thread/sleep n) (print s) (flush) s))
(first-out (wait-for 1000 "long") (wait-for 500 "short"))
While I don't know if there is an idiomatic way to achieve your goal but Clojure Future looks like a good fit.
Takes a body of expressions and yields a future object that will
invoke the body in another thread, and will cache the result and
return it on all subsequent calls to deref/#. If the computation has
not yet finished, calls to deref/# will block, unless the variant of
deref with timeout is used.