Why does the execution order of tasks in clojure boot change? - clojure

(deftask test1 "first test task" [] (print "1") identity)
(deftask test2 "second test task" [] (print "2") identity)
(boot (comp (test1) (test2)))
=> 12nil
(boot (comp (fn [x] (print "1") identity) (fn [x] (print "2") identity)))
=> 21nil
If I use comp on tasks, the execution order is from left to right. If I use comp on anonymous functions, the execution order is from right to left. How is this inconsistency reasonable?

The reason for that difference is that when you use comp with boot tasks is that they are not bare logic that gets composed but each boot tasks returns a function which will be invoked later and that function wraps another function which is passed to it (like in ring middleware).
With plain functions it works like this:
(comp inc dec)
produces a function which does following:
(inc (dec n))
In boot tasks are similar to ring middleware. Each task is a function which returns another function that will wrap the next handler from the pipeline. It works like that (not literally, it's simplified for readability sake):
(defn task1 []
(fn [next-handler]
(fn [fileset]
(print 1) ;; do something in task1
(next-handler fileset))) ;; and call wrapped handler
(defn task2 []
(fn [next-handler]
(fn [fileset]
(print 2) ;; do something in task1
(next-handler fileset)))) ;; and call wrapped handler
So when you do:
(comp (task1) (task2))
And execute such composed task it works as if it was:
(fn [fileset1]
(print 1)
((fn [fileset2]
(print 2)
(next-handler fileset2))
fileset1))
Because a function produces by (task2) will be passed to function produced by (task1) which will wrap the one from (task2) (and call it after printing 1).
You can read more on boot tasks anatomy in its wiki. It might be also useful to read on ring middleware.

Related

Clojure program works fine when debugged, fails in repl

I'm learning core.async and have written a simple producer consumer code:
(ns webcrawler.parallel
(:require [clojure.core.async :as async
:refer [>! <! >!! <!! go chan buffer close! thread alts! alts!! timeout]]))
(defn consumer
[in out f]
(go (loop [request (<! in)]
(if (nil? request)
(close! out)
(do (print f)
(let [result (f request)]
(>! out result))
(recur (<! in)))))))
(defn make-consumer [in f]
(let [out (chan)]
(consumer in out f)
out))
(defn process
[f s no-of-consumers]
(let [in (chan (count s))
consumers (repeatedly no-of-consumers #(make-consumer in f))
out (async/merge consumers)]
(map #(>!! in %1) s)
(close! in)
(loop [result (<!! out)
results '()]
(if (nil? result)
results
(recur (<!! out)
(conj results result))))))
This code works fine when I step in through the process function in debugger supplied with Emacs' cider.
(process (partial + 1) '(1 2 3 4) 1)
(5 4 3 2)
However, if I run it by itself (or hit continue in the debugger) I get an empty result.
(process (partial + 1) '(1 2 3 4) 1)
()
My guess is that in the second case for some reason producer doesn't wait for consumers before exiting, but I'm not sure why. Thanks for help!
The problem is that your call to map is lazy, and will not run until something asks for the results. Nothing does this in your code.
There are 2 solutions:
(1) Use the eager function mapv:
(mapv #(>!! in %1) items)
(2) Use the doseq, which is intended for side-effecting operations (like putting values on a channel):
(doseq [item items]
(>!! in item))
Both will work and produce output:
(process (partial + 1) [1 2 3 4] 1) => (5 4 3 2)
P.S. You have a debug statement in (defn consumer ...)
(print f)
that produces a lot of noise in the output:
<#clojure.core$partial$fn__5561 #object[clojure.core$partial$fn__5561 0x31cced7
"clojure.core$partial$fn__5561#31cced7"]>
That is repeated 5 times back to back. You probably want to avoid that, as printing function "refs" is pretty useless to a human reader.
Also, debug printouts in general should normally use println so you can see where each one begins and ends.
I'm going to take a safe stab that this is being caused by the lazy behavior of map, and this line that's carrying out side effects:
(map #(>!! in %1) s)
Because you never explicitly use the results, it never runs. Change it to use mapv, which is strict, or more correctly, use doseq. Never use map to run side effects. It's meant to lazily transform a list, and abuse of it leads to behaviour like this.
So why is it working while debugging? I'm going to guess because the debugger forces evaluation as part of its operation, which is masking the problem.
As you can read from docstring map returns a lazy sequence. And I think the best way is to use dorun. Here is an example from clojuredocs:
;;map a function which makes database calls over a vector of values
user=> (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"])
JdbcSQLException The object is already closed [90007-170] org.h2.message.DbE
xception.getJdbcSQLException (DbException.java:329)
;;database connection was closed before we got a chance to do our transactions
;;lets wrap it in dorun
user=> (dorun (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"]))
DEBUG :db insert into person values name = 'Fred'
DEBUG :db insert into person values name = 'Ethel'
DEBUG :db insert into person values name = 'Lucy'
DEBUG :db insert into person values name = 'Ricardo'
nil

How to stop go block in ClojureScript / core.async?

Is there an elegant way to stop a running go block?
(without introducing a flag and polluting the code with checks/branches)
(ns example
(:require-macros [cljs.core.async.macros :refer [go]])
(:require [cljs.core.async :refer [<! timeout]]))
(defn some-long-task []
(go
(println "entering")
; some complex long-running task (e.g. fetching something via network)
(<! (timeout 1000))
(<! (timeout 1000))
(<! (timeout 1000))
(<! (timeout 1000))
(println "leaving")))
; run the task
(def task (some-long-task))
; later, realize we no longer need the result and want to cancel it
; (stop! task)
Sorry, this is not possible with core.async today. What you get back from creating a go block is a normal channel what the result of the block will be put on, though this does not give you any handle to the actual block itself.
As stated in Arthur's answer, you cannot terminate a go block immediately, but you since your example indicates a multi-phased task (using sub-tasks), an approach like this might work:
(defn task-processor
"Takes an initial state value and number of tasks (fns). Puts tasks
on a work queue channel and then executes them in a go-loop, with
each task passed the current state. A task's return value is used as
input for next task. When all tasks are processed or queue has been
closed, places current result/state onto a result channel. To allow
nil values, result is wrapped in a map:
{:value state :complete? true/false}
This fn returns a map of {:queue queue-chan :result result-chan}"
[init & tasks]
(assert (pos? (count tasks)))
(let [queue (chan)
result (chan)]
(async/onto-chan queue tasks)
(go-loop [state init, i 0]
(if-let [task (<! queue)]
(recur (task state) (inc i))
(do (prn "task queue finished/terminated")
(>! result {:value state :complete? (== i (count tasks))}))))
{:queue queue
:result result}))
(defn dummy-task [x] (prn :task x) (Thread/sleep 1000) (inc x))
;; kick of tasks
(def proc (apply task-processor 0 (repeat 100 dummy-task)))
;; result handler
(go
(let [res (<! (:result proc))]
(prn :final-result res)))
;; to stop the queue after current task is complete
;; in this example it might take up to an additional second
;; for the terminated result to be delivered
(close! (:queue proc))
You may want to use future and future-cancel for such task.
(def f (future (while (not (Thread/interrupted)) (your-function ... ))))
(future-cancel f)
Why do cancelled Clojure futures continue using CPU?

Agent/actor like constructs in clojure that operate on all messages received since last update

What's best way in clojure to implement something like an actor or agent (asynchronously updated, uncoordinated reference) that does the following?
gets sent messages/data
executes some function on that data to obtain new state; something like (fn [state new-msgs] ...)
continues to receive messages/data during that update
once done with that update, runs the same update function against all messages that have been sent in the interim
An agent doesn't seem quite right here. One must simultaneously send function and data to agents, which doesn't leave room for a function which operates on all data that has come in during the last update. The goal implicitly requires a decoupling of function and data.
The actor model seems generally better suited in that there is a decoupling of function and data. However, all actor frameworks I'm aware of seem to assume each message sent will be processed separately. It's not clear how one would turn this on it's head without adding extra machinery. I know Pulsar's actors accept a :lifecycle-handle function which can be used to make actors do "special tricks" but there isn't a lot of documentation around this so it's unclear whether the functionality would be helpful.
I do have a solution to this problem using agents, core.async channels, and watch functions, but it's a bit messy, and I'm hoping there is a better solution. I'll post it as a solution in case others find it helpful, but I'd like to see what other's come up with.
Here's the solution I came up with using agents, core.async channels, and watch functions. Again, it's a bit messy, but it does what I need it to for now. Here it is, in broad strokes:
(require '[clojure.core.async :as async :refer [>!! <!! >! <! chan go]])
; We'll call this thing a queued-agent
(defprotocol IQueuedAgent
(enqueue [this message])
(ping [this]))
(defrecord QueuedAgent [agent queue]
IQueuedAgent
(enqueue [_ message]
(go (>! queue message)))
(ping [_]
(send agent identity)))
; Need a function for draining a core async channel of all messages
(defn drain! [c]
(let [cc (chan 1)]
(go (>! cc ::queue-empty))
(letfn
; This fn does all the hard work, but closes over cc to avoid reconstruction
[(drainer! [c]
(let [[v _] (<!! (go (async/alts! [c cc] :priority true)))]
(if (= v ::queue-empty)
(lazy-seq [])
(lazy-seq (cons v (drainer! c))))))]
(drainer! c))))
; Constructor function
(defn queued-agent [& {:keys [buffer update-fn init-fn error-handler-builder] :or {:buffer 100}}]
(let [q (chan buffer)
a (agent (if init-fn (init-fn) {}))
error-handler-fn (error-handler-builder q a)]
; Set up the queue, and watcher which runs the update function when there is new data
(add-watch
a
:update-conv
(fn [k r o n]
(let [queued (drain! q)]
(when-not (empty? queued)
(send a update-fn queued error-handler-fn)))))
(QueuedAgent. a q)))
; Now we can use these like this
(def a (queued-agent
:init-fn (fn [] {:some "initial value"})
:update-fn (fn [a queued-data error-handler-fn]
(println "Receiving data" queued-data)
; Simulate some work/load on data
(Thread/sleep 2000)
(println "Done with work; ready to queue more up!"))
; This is a little warty at the moment, but closing over the queue and agent lets you requeue work on
; failure so you can try again.
:error-handler-builder
(fn [q a] (println "do something with errors"))))
(defn -main []
(doseq [i (range 10)]
(enqueue a (str "data" i))
(Thread/sleep 500) ; simulate things happening
; This part stinks... have to manually let the queued agent know that we've queued some things up for it
(ping a)))
As you'll notice, having to ping the queued-agent here every time new data is added is pretty warty. It definitely feels like things are being twisted out of typical usage.
Agents are the inverse of what you want here - they are a value that gets sent updating functions. This easiest with a queue and a Thread. For convenience I am using future to construct the thread.
user> (def q (java.util.concurrent.LinkedBlockingDeque.))
#'user/q
user> (defn accumulate
[summary input]
(let [{vowels true consonents false}
(group-by #(contains? (set "aeiouAEIOU") %) input)]
(-> summary
(update-in [:vowels] + (count vowels))
(update-in [:consonents] + (count consonents)))))
#'user/accumulate
user> (def worker
(future (loop [summary {:vowels 0 :consonents 0} in-string (.take q)]
(if (not in-string)
summary
(recur (accumulate summary in-string)
(.take q))))))
#'user/worker
user> (.add q "hello")
true
user> (.add q "goodbye")
true
user> (.add q false)
true
user> #worker
{:vowels 5, :consonents 7}
I came up with something closer to an actor, inspired by Tim Baldridge's cast on actors (Episode 16). I think this addresses the problem much more cleanly.
(defmacro take-all! [c]
`(loop [acc# []]
(let [[v# ~c] (alts! [~c] :default nil)]
(if (not= ~c :default)
(recur (conj acc# v#))
acc#))))
(defn eager-actor [f]
(let [msgbox (chan 1024)]
(go (loop [f f]
(let [first-msg (<! msgbox) ; do this so we park efficiently, and only
; run when there are actually messages
msgs (take-all! msgbox)
msgs (concat [first-msg] msgs)]
(recur (f msgs)))))
msgbox))
(let [a (eager-actor (fn f [ms]
(Thread/sleep 1000) ; simulate work
(println "doing something with" ms)
f))]
(doseq [i (range 20)]
(Thread/sleep 300)
(put! a i)))
;; =>
;; doing something with (0)
;; doing something with (1 2 3)
;; doing something with (4 5 6)
;; doing something with (7 8 9 10)
;; doing something with (11 12 13)

How to memoize a function that uses core.async and blocking channel read?

I'd like to use memoize for a function that uses core.async and <!! e.g
(defn foo [x]
(go
(<!! (timeout 2000))
(* 2 x)))
(In the real-life, it could be useful in order to cache the results of server calls)
I was able to achieve that by writing a core.async version of memoize (almost the same code as memoize):
(defn memoize-async [f]
(let [mem (atom {})]
(fn [& args]
(go
(if-let [e (find #mem args)]
(val e)
(let [ret (<! (apply f args))]; this line differs from memoize [ret (apply f args)]
(swap! mem assoc args ret)
ret))))))
Example of usage:
(def foo-memo (memoize-async foo))
(go (println (<!! (foo-memo 3)))); delay because of (<!! (timeout 2000))
(go (println (<!! (foo-memo 3)))); subsequent calls are memoized => no delay
I am wondering if there are simpler ways to achieve the same result.
Remark: I need a solution that works with <!!. For <!, see this question: How to memoize a function that uses core.async and non-blocking channel read?
You can use the built in memoize function for this. Start by defining a method that reads from a channel and returns the value:
(defn wait-for [ch]
(<!! ch))
Note that we'll use <!! and not <! because we want this function block until there is data on the channel in all cases. <! only exhibits this behavior when used in a form inside of a go block.
You can then construct your memoized function by composing this function with foo, like such:
(def foo-memo (memoize (comp wait-for foo)))
foo returns a channel, so wait-for will block until that channel has a value (i.e. until the operation inside foo finished).
foo-memo can be used similar to your example above, except you do not need the call to <!! because wait-for will block for you:
(go (println (foo-memo 3))
You can also call this outside of a go block, and it will behave like you expect (i.e. block the calling thread until foo returns).

Is there a Clojure idiom for dispatching multiple expressions in parallel

I have a number of (unevaluated) expressions held in a vector; [ expr1 expr2 expr3 ... ]
What I wish to do is hand each expression to a separate thread and wait until one returns a value. At that point I'm not interested in the results from the other threads and would like to cancel them to save CPU resource.
( I realise that this could cause non-determinism in that different runs of the program might cause different expressions to be evaluated first. I have this in hand. )
Is there a standard / idiomatic way of achieving the above?
Here's my take on it.
Basically you have to resolve a global promise inside each of your futures, then return a vector containing future list and the resolved value and then cancel all the futures in the list:
(defn run-and-cancel [& expr]
(let [p (promise)
run-futures (fn [& expr] [(doall (map #(future (deliver p (eval %1))) expr)) #p])
[fs res] (apply run-futures expr)]
(map future-cancel fs)
res))
It's not reached an official release yet, but core.async looks like it might be an interesting way of solving your problem - and other asynchronous problems, very neatly.
The leiningen incantation for core.async is (currently) as follows:
[org.clojure/core.async "0.1.0-SNAPSHOT"]
And here's some code to make a function that will take a number of time-consuming functions, and block until one of them returns.
(require '[clojure.core.async :refer [>!! chan alts!! thread]]))
(defn return-first [& ops]
(let [v (map vector ops (repeatedly chan))]
(doseq [[op c] v]
(thread (>!! c (op))))
(let [[value channel] (alts!! (map second v))]
value)))
;; Make sure the function returns what we expect with a simple Thread/sleep
(assert (= (return-first (fn [] (Thread/sleep 3000) 3000)
(fn [] (Thread/sleep 2000) 2000)
(fn [] (Thread/sleep 5000) 5000))
2000))
In the sample above:
chan creates an asynchronous channel
>!! puts a value onto a channel
thread executes the body in another thread
alts!! takes a vector of channels, and returns when a value appears on any of them
There's way more to it than this, and I'm still getting my head round it, but there's a walkthrough here: https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
And David Nolen's blog has some great, if mind-boggling, posts on it (http://swannodette.github.io/)
Edit
Just seen that MichaƂ Marczyk has answered a very similar question, but better, here, and it allows you to cancel/short-circuit.
with Clojure threading long running processes and comparing their returns
What you want is Java's CompletionService. I don't know of any wrapper around this in clojure, but it wouldn't be hard to do with interop. The example below is loosely based around the example on the JavaDoc page for the ExecutorCompletionService.
(defn f [col]
(let [cs (ExecutorCompletionService. (Executors/newCachedThreadPool))
futures (map #(.submit cs %) col)
result (.get (.take cs))]
(map #(.cancel % true) futures)
result))
You could use future-call to get a list of all futures, storing them in an Atom. then, compose each running future with a "shoot the other ones in the head" function so the first one will terminate all the remaining ones. Here is an example:
(defn first-out [& fns]
(let [fs (atom [])
terminate (fn [] (println "cancling..") (doall (map future-cancel #fs)))]
(reset! fs (doall (map (fn [x] (future-call #((x) (terminate)))) fns)))))
(defn wait-for [n s]
(fn [] (print "start...") (flush) (Thread/sleep n) (print s) (flush)))
(first-out (wait-for 1000 "long") (wait-for 500 "short"))
Edit
Just noticed that the previous code does not return the first results, so it is mainly useful for side-effects. here is another version that returns the first result using a promise:
(defn first-out [& fns]
(let [fs (atom [])
ret (promise)
terminate (fn [x] (println "cancling.." )
(doall (map future-cancel #fs))
(deliver ret x))]
(reset! fs (doall (map (fn [x] (future-call #(terminate (x)))) fns)))
#ret))
(defn wait-for [n s]
"this time, return the value"
(fn [] (print "start...") (flush) (Thread/sleep n) (print s) (flush) s))
(first-out (wait-for 1000 "long") (wait-for 500 "short"))
While I don't know if there is an idiomatic way to achieve your goal but Clojure Future looks like a good fit.
Takes a body of expressions and yields a future object that will
invoke the body in another thread, and will cache the result and
return it on all subsequent calls to deref/#. If the computation has
not yet finished, calls to deref/# will block, unless the variant of
deref with timeout is used.