How can I improve this Clojure Component+async example? - clojure

I want to figure out how best to create an async component, or accommodate async code in a Component-friendly way. This is the best I can come up with, and... it just doesn't feel quite right.
The Gist: take words, uppercase them and reverse them, finally print them.
Problem 1: I can't get the system to stop at the end. I expect to see a println of the individual c-chans stopping, but don't.
Problem 2: How do I properly inject deps. into the producer/consumer fns? I mean, they're not components, and I think they should not be components since they have no sensible lifecycle.
Problem 3: How do I idiomatically handle the async/pipeline-creating side-effects named a>b, and b>c? Should a pipeline be a component?
(ns pipelines.core
(:require [clojure.core.async :as async
:refer [go >! <! chan pipeline-blocking close!]]
[com.stuartsierra.component :as component]))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; PIPELINES
(defn a>b [a> b>]
(pipeline-blocking 4
b>
(map clojure.string/upper-case)
a>))
(defn b>c [b> c>]
(pipeline-blocking 4
c>
(map (comp (partial apply str)
reverse))
b>))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; PRODUCER / CONSUMER
(defn producer [a>]
(doseq [word ["apple" "banana" "carrot"]]
(go (>! a> word))))
(defn consumer [c>]
(go (while true
(println "Your Word Is: " (<! c>)))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; SYSTEM
(defn pipeline-system [config-options]
(let [c-chan (reify component/Lifecycle
(start [this]
(println "starting chan: " this)
(chan 1))
(stop [this]
(println "stopping chan: " this)
(close! this)))]
(-> (component/system-map
:a> c-chan
:b> c-chan
:c> c-chan)
(component/using {}))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; RUN IT!
(def system (atom nil))
(let [_ (reset! system (component/start (pipeline-system {})))
_ (a>b (:a> #system) (:b> #system))
_ (b>c (:b> #system) (:c> #system))
_ (producer (:a> #system))
_ (consumer (:c> #system))
_ (component/stop #system)])
EDIT:
I started thinking about the following, but I'm not quite sure if it's closing properly...
(extend-protocol component/Lifecycle
clojure.core.async.impl.channels.ManyToManyChannel
(start [this]
this)
(stop [this]
(close! this)))

I rewrote your example a little to make it reloadable:
Reloadable Pipeline
(ns pipeline
(:require [clojure.core.async :as ca :refer [>! <!]]
[clojure.string :as s]))
(defn upverse [from to]
(ca/pipeline-blocking 4
to
(map (comp s/upper-case
s/reverse))
from))
(defn produce [ch xs]
(doseq [word xs]
(ca/go (>! ch word))))
(defn consume [ch]
(ca/go-loop []
(when-let [word (<! ch)]
(println "your word is:" word)
(recur))))
(defn start-engine []
(let [[from to] [(ca/chan) (ca/chan)]]
(upverse to from)
(consume from)
{:stop (fn []
(ca/close! to)
(ca/close! from)
(println "engine is stopped"))
:process (partial produce to)}))
this way you can just do (start-engine) and use it to process word sequences:
REPL time
boot.user=> (require '[pipeline])
boot.user=> (def engine (pipeline/start-engine))
#'boot.user/engine
running with it
boot.user=> ((engine :process) ["apple" "banana" "carrot"])
your word is: TORRAC
your word is: ANANAB
your word is: ELPPA
boot.user=> ((engine :process) ["do" "what" "makes" "sense"])
your word is: OD
your word is: SEKAM
your word is: ESNES
your word is: TAHW
stopping it
boot.user=> ((:stop engine))
engine is stopped
;; engine would not process anymore
boot.user=> ((engine :process) ["apple" "banana" "carrot"])
nil
State Management
Depending on how you intend to use this pipeline, a state management framework like Component might not be needed at all: no need to add anything "just in case", starting and stopping the pipeline in this case is a matter of calling two functions.
However in case this pipeline is used within a larger app with more states you could definitely benefit from a state management library.
I am not a fan of Component primarily because it requires a full app buyin (which makes it a framework), but I do respect other people using it.
mount
I would recommend to either not use anything specific in case the app is small: you, for example could compose this pipeline with other pipelines / logic and kick it off from -main, but if the app is any bigger and has more unrelated states, here is all you need to do to add mount to it:
(defstate engine :start (start-engine)
:stop ((:stop engine)))
starting pipeline
boot.user=> (mount/start)
{:started ["#'pipeline/engine"]}
running with it
boot.user=> ((engine :process) ["do" "what" "makes" "sense"])
your word is: OD
your word is: SEKAM
your word is: ESNES
your word is: TAHW
stopping it
boot.user=> (mount/stop)
engine is stopped
{:stopped ["#'pipeline/engine"]}
Here is a gist with a full example that includes build.boot.
You can just download and play with it via boot repl
[EDIT]: to answer the comments
In case you are already hooked on Component, this should get you started:
(defrecord WordEngine []
component/Lifecycle
(start [component]
(merge component (start-engine)))
(stop [component]
((:stop component))
(assoc component :process nil :stop nil)))
This, on start, would create a WordEngine object that would have a :process method.
You won't be able to call it as you would a normal Clojure function: i.e. from REPL or any namespace just by :requireing it, unless you pass a reference to the whole system around which is not recommended.
So in order to call it, this WordEngine would need to be plugged into a Component system, and injected into yet another Component which can then destructure the :process function and call it.

Related

Understanding core.async merge, in Clojure vs ClojureScript

I'm experimenting with core.async on Clojure and ClojureScript, to try and understand how merge works. In particular, whether merge makes any values put on input channels available to take immediately on the merged channel.
I have the following code:
(ns async-merge-example.core
(:require
#?(:clj [clojure.core.async :as async] :cljs [cljs.core.async :as async])
[async-merge-example.exec :as exec]))
(defn async-fn-timeout
[v]
(async/go
(async/<! (async/timeout (rand-int 5000)))
v))
(defn async-fn-exec
[v]
(exec/exec "sh" "-c" (str "sleep " (rand-int 5) "; echo " v ";")))
(defn merge-and-print-results
[seq async-fn]
(let [chans (async/merge (map async-fn seq))]
(async/go
(while (when-let [v (async/<! chans)]
(prn v)
v)))))
When I try async-fn-timeout with a large-ish seq:
(merge-and-print-results (range 20) async-fn-timeout)
For both Clojure and ClojureScript I get the result I expect, as in, results start getting printed pretty much immediately, with the expected delays.
However, when I try async-fn-exec with the same seq:
(merge-and-print-results (range 20) async-fn-exec)
For ClojureScript, I get the result I expect, as in results start getting printed pretty much immediately, with the expected delays. However for Clojure even though the sh processes are executed concurrently (subject to the size of the core.async thread pool), the results appear to be initially delayed, then mostly printed all at once! I can make this difference more obvious by increasing the size of the seq e.g. (range 40)
Since the results for async-fn-timeout are as expected on both Clojure and ClojureScript, the finger is pointed at the differences between the Clojure and ClojureScript implementation for exec..
But I don't know why this difference would cause this issue?
Notes:
These observations were made in WSL on Windows 10
The source code for async-merge-example.exec is below
In exec, the implementation differs for Clojure and ClojureScript due to differences between Clojure/Java and ClojureScript/NodeJS.
(ns async-merge-example.exec
(:require
#?(:clj [clojure.core.async :as async] :cljs [cljs.core.async :as async])))
; cljs implementation based on https://gist.github.com/frankhenderson/d60471e64faec9e2158c
; clj implementation based on https://stackoverflow.com/questions/45292625/how-to-perform-non-blocking-reading-stdout-from-a-subprocess-in-clojure
#?(:cljs (def spawn (.-spawn (js/require "child_process"))))
#?(:cljs
(defn exec-chan
"spawns a child process for cmd with args. routes stdout, stderr, and
the exit code to a channel. returns the channel immediately."
[cmd args]
(let [c (async/chan), p (spawn cmd (if args (clj->js args) (clj->js [])))]
(.on (.-stdout p) "data" #(async/put! c [:out (str %)]))
(.on (.-stderr p) "data" #(async/put! c [:err (str %)]))
(.on p "close" #(async/put! c [:exit (str %)]))
c)))
#?(:clj
(defn exec-chan
"spawns a child process for cmd with args. routes stdout, stderr, and
the exit code to a channel. returns the channel immediately."
[cmd args]
(let [c (async/chan)]
(async/go
(let [builder (ProcessBuilder. (into-array String (cons cmd (map str args))))
process (.start builder)]
(with-open [reader (clojure.java.io/reader (.getInputStream process))
err-reader (clojure.java.io/reader (.getErrorStream process))]
(loop []
(let [line (.readLine ^java.io.BufferedReader reader)
err (.readLine ^java.io.BufferedReader err-reader)]
(if (or line err)
(do (when line (async/>! c [:out line]))
(when err (async/>! c [:err err]))
(recur))
(do
(.waitFor process)
(async/>! c [:exit (.exitValue process)]))))))))
c)))
(defn exec
"executes cmd with args. returns a channel immediately which
will eventually receive a result map of
{:out [stdout-lines] :err [stderr-lines] :exit [exit-code]}"
[cmd & args]
(let [c (exec-chan cmd args)]
(async/go (loop [output (async/<! c) result {}]
(if (= :exit (first output))
(assoc result :exit (second output))
(recur (async/<! c) (update result (first output) #(conj (or % []) (second output)))))))))
Your Clojure implementation uses blocking IO in a single thread. You are first reading from stdout and then stderr in a loop. Both do a blocking readLine so they will only return once they actually finished reading a line. So unless your process creates the same amount of output to stdout and stderr one stream will end up blocking the other one.
Once the process is finished the readLine will no longer block and just return nil once the buffer is empty. So the loop just finishes reading the buffered output and then finally completes explaining the "all at once" messages.
You'll probably want to start a second thread that deals reading from stderr.
node does not do blocking IO so everything happens async by default and one stream doesn't block the other.

Can an echo program in Clojure be built with lazy infinite sequences?

Take the following program as an example:
(defn echo-ints []
(doseq [i (->> (BufferedReader. *in*)
(line-seq)
(map read-string)
(take-while integer?))]
(println i)))
The idea is to prompt the user for input and then echo it back if it's an integer. However, in this particular program almost every second input won't be echoed immediately. Instead the program will wait for additional input before processing two inputs at once.
Presumably this a consequence of some performance tweaks happening behind the scenes. However in this instance I'd really like to have an immediate feedback loop. Is there an easy way to accomplish this, or does the logic of the program have to be significantly altered?
(The main motivation here is to pass the infinite sequence of user inputs to another function f that transforms lazy sequences to other lazy sequences. If I wrote some kind of while-loop, I wouldn't be able to use f.)
It is generally not good to mix lazyness with side-effect (printing in this case), since most sequence functions have built-in optimizations that cause unintended effects while still being functionally correct.
Here's a good write up: https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects
What you are trying to do seems like a good fit for core.async channels. I would think as the problem as 'a stream of user input' instead of 'infinite sequence of user inputs', and 'f transforms lazy sequences to lazy sequences' becomes 'f transform a stream into another stream'. This will allow you to write f as transducers which you can arbitrarily compose.
I would do it like the following. Note we use spyx and spyxx from the Tupelo library to display some results.
First, write a simple version with canned test data:
(ns tst.demo.core
(:use tupelo.test)
(:require
[tupelo.core :as t] )
(:import [java.io BufferedReader StringReader]))
(t/refer-tupelo)
(def user-input
"hello
there
and
a
1
and-a
2
and
a
3.14159
and-a
4
bye" )
(defn echo-ints
[str]
(let [lines (line-seq (BufferedReader. (StringReader. str)))
data (map read-string lines)
nums (filter integer? data) ]
(doseq [it data]
(spyxx it))
(spyx nums)))
(newline)
(echo-ints user-input)
This gives us the results:
it => <#clojure.lang.Symbol hello>
it => <#clojure.lang.Symbol there>
it => <#clojure.lang.Symbol and>
it => <#clojure.lang.Symbol a>
it => <#java.lang.Long 1>
it => <#clojure.lang.Symbol and-a>
it => <#java.lang.Long 2>
it => <#clojure.lang.Symbol and>
it => <#clojure.lang.Symbol a>
it => <#java.lang.Double 3.14159>
it => <#clojure.lang.Symbol and-a>
it => <#java.lang.Long 4>
it => <#clojure.lang.Symbol bye>
nums => (1 2 4)
So, we see that it works and gives us the numbers we want.
Next, write a looping version. We make it terminate gracefully when our test data runs out.
(defn echo-ints-loop
[str]
(loop [lines (line-seq (BufferedReader. (StringReader. str)))]
(let [line (first lines)
remaining (rest lines)
data (read-string line)]
(when (integer? data)
(println "found:" data))
(when (not-empty? remaining)
(recur remaining)))))
(newline)
(echo-ints-loop user-input)
found: 1
found: 2
found: 4
Next, we write an infinite loop to read the keyboard. You need to terminate this one with CRTL-C at the keyboard:
(ns demo.core
(:require [tupelo.core :as t])
(:import [java.io BufferedReader StringReader]))
(t/refer-tupelo)
(defn echo-ints-inf
[]
(loop [lines (line-seq (BufferedReader. *in*))]
(let [line (first lines)
remaining (rest lines)
data (read-string line)]
(when (integer? data)
(println "found:" data))
(when (not-empty? remaining)
(recur remaining)))))
(defn -main []
(println "main - enter")
(newline)
(echo-ints-inf))
And we run it manually:
~/clj > lein run
main - enter
hello
there
1
found: 1
and
a
2
found: 2
and-a
3
found: 3
further more
4
found: 4
^C
~/clj >
~/clj >

How can I return a vector?

I have a channel where I am putting values into inside a doseq loop.
This code reads from a list of isbns and for each isbn, does an amazon search to return contents of a book, and then calls another function to get the title and rank
(def book_channel (chan 10))
make sure you use clojure.core.async/into rather than clojure.core/into. Here is an example of a round trip from collection to channel and back to collection:
user> (require '[clojure.core.async :as async :refer [<! <!! >!! >! chan go]])
nil
user> (def book-chan (async/to-chan [:book1 :book2 :book3]))
#'user/book-chan
user> (<!! (clojure.core.async/into [] book-chan))
[:book1 :book2 :book3]
clojure.core.async/into returns a channel that will have exactly one item written to it. That one item will be written once it's input channel closes. This keeps the whole thing asynchronous and it does require that the code putting things into the book-channel close the chan to signal that all the books are there.
You need to do some type of coordination to determine when all of your work is finished. You can pull that coordination out into the main thread fairly easily:
(def book_channel (chan 10))
(defn concurrency_test
[list_of_isbns]
(doseq [isbn list_of_isbns]
(go (>! book_channel
(get_title_and_rank_for_one_isbn
(amazon_search isbn)))))
(prn (loop [results []]
(if (= (count results) (count list_of_isbns))
results
(recur (conj results (<!! book_channel)))))))
Here, I used a loop that keeps waiting for results and adding them to the vector until we have as many results as we do isbns. You'll want to make sure that get_title_and_rank_for_one_isbn always generates a result that can be put on a channel, otherwise the loop will wait forever.
You should close! the book_channel after you finish pushing stuff into it. Per async/into documentation - "ch must close before into produces a result."
(let [book> (chan)]
(go
(doseq [e (range 8)]
(>! book> e))
(close! book>))
(<!! (async/into [] book>)))
Alternatively, you can use async/onto-chan which will close the channel for you:
(let [book> (chan)]
(async/onto-chan book> (range 8))
(<!! (async/into [] book>)))

Clojure - tests with components strategy

I am implementing an app using Stuart Sierra component. As he states in the README :
Having a coherent way to set up and tear down all the state associated
with an application enables rapid development cycles without
restarting the JVM. It can also make unit tests faster and more
independent, since the cost of creating and starting a system is low
enough that every test can create a new instance of the system.
What would be the preferred strategy here ? Something similar to JUnit oneTimeSetUp / oneTimeTearDown , or really between each test (similar to setUp / tearDown) ?
And if between each test, is there a simple way to start/stop a system for all tests (before and after) without repeating the code every time ?
Edit : sample code to show what I mean
(defn test-component-lifecycle [f]
(println "Setting up test-system")
(let [s (system/new-test-system)]
(f s) ;; I cannot pass an argument here ( https://github.com/clojure/clojure/blob/master/src/clj/clojure/test.clj#L718 ), so how can I pass a system in parameters of a test ?
(println "Stopping test-system")
(component/stop s)))
(use-fixtures :once test-component-lifecycle)
Note : I am talking about unit-testing here.
I would write a macro, which takes a system-map and starts all components before running tests and stop all components after testing.
For example:
(ns de.hh.new-test
(:require [clojure.test :refer :all]
[com.stuartsierra.component :as component]))
;;; Macro to start and stop component
(defmacro with-started-components [bindings & body]
`(let [~(bindings 0) (component/start ~(bindings 1))]
(try
(let* ~(destructure (vec (drop 2 bindings)))
~#body)
(catch Exception e1#)
(finally
(component/stop ~(bindings 0))))))
;; Test Component
(defprotocol Action
(do-it [self]))
(defrecord TestComponent [state]
component/Lifecycle
(start [self]
(println "====> start")
(assoc self :state (atom state)))
(stop [self]
(println "====> stop"))
Action
(do-it [self]
(println "====> do action")
#(:state self)))
;: TEST
(deftest ^:focused component-test
(with-started-components
[system (component/system-map :test-component (->TestComponent"startup-state"))
test-component (:test-component system)]
(is (= "startup-state" (do-it test-component)))))
Running Test you should see the out put like this
====> start
====> do action
====> stop
Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

Agent/actor like constructs in clojure that operate on all messages received since last update

What's best way in clojure to implement something like an actor or agent (asynchronously updated, uncoordinated reference) that does the following?
gets sent messages/data
executes some function on that data to obtain new state; something like (fn [state new-msgs] ...)
continues to receive messages/data during that update
once done with that update, runs the same update function against all messages that have been sent in the interim
An agent doesn't seem quite right here. One must simultaneously send function and data to agents, which doesn't leave room for a function which operates on all data that has come in during the last update. The goal implicitly requires a decoupling of function and data.
The actor model seems generally better suited in that there is a decoupling of function and data. However, all actor frameworks I'm aware of seem to assume each message sent will be processed separately. It's not clear how one would turn this on it's head without adding extra machinery. I know Pulsar's actors accept a :lifecycle-handle function which can be used to make actors do "special tricks" but there isn't a lot of documentation around this so it's unclear whether the functionality would be helpful.
I do have a solution to this problem using agents, core.async channels, and watch functions, but it's a bit messy, and I'm hoping there is a better solution. I'll post it as a solution in case others find it helpful, but I'd like to see what other's come up with.
Here's the solution I came up with using agents, core.async channels, and watch functions. Again, it's a bit messy, but it does what I need it to for now. Here it is, in broad strokes:
(require '[clojure.core.async :as async :refer [>!! <!! >! <! chan go]])
; We'll call this thing a queued-agent
(defprotocol IQueuedAgent
(enqueue [this message])
(ping [this]))
(defrecord QueuedAgent [agent queue]
IQueuedAgent
(enqueue [_ message]
(go (>! queue message)))
(ping [_]
(send agent identity)))
; Need a function for draining a core async channel of all messages
(defn drain! [c]
(let [cc (chan 1)]
(go (>! cc ::queue-empty))
(letfn
; This fn does all the hard work, but closes over cc to avoid reconstruction
[(drainer! [c]
(let [[v _] (<!! (go (async/alts! [c cc] :priority true)))]
(if (= v ::queue-empty)
(lazy-seq [])
(lazy-seq (cons v (drainer! c))))))]
(drainer! c))))
; Constructor function
(defn queued-agent [& {:keys [buffer update-fn init-fn error-handler-builder] :or {:buffer 100}}]
(let [q (chan buffer)
a (agent (if init-fn (init-fn) {}))
error-handler-fn (error-handler-builder q a)]
; Set up the queue, and watcher which runs the update function when there is new data
(add-watch
a
:update-conv
(fn [k r o n]
(let [queued (drain! q)]
(when-not (empty? queued)
(send a update-fn queued error-handler-fn)))))
(QueuedAgent. a q)))
; Now we can use these like this
(def a (queued-agent
:init-fn (fn [] {:some "initial value"})
:update-fn (fn [a queued-data error-handler-fn]
(println "Receiving data" queued-data)
; Simulate some work/load on data
(Thread/sleep 2000)
(println "Done with work; ready to queue more up!"))
; This is a little warty at the moment, but closing over the queue and agent lets you requeue work on
; failure so you can try again.
:error-handler-builder
(fn [q a] (println "do something with errors"))))
(defn -main []
(doseq [i (range 10)]
(enqueue a (str "data" i))
(Thread/sleep 500) ; simulate things happening
; This part stinks... have to manually let the queued agent know that we've queued some things up for it
(ping a)))
As you'll notice, having to ping the queued-agent here every time new data is added is pretty warty. It definitely feels like things are being twisted out of typical usage.
Agents are the inverse of what you want here - they are a value that gets sent updating functions. This easiest with a queue and a Thread. For convenience I am using future to construct the thread.
user> (def q (java.util.concurrent.LinkedBlockingDeque.))
#'user/q
user> (defn accumulate
[summary input]
(let [{vowels true consonents false}
(group-by #(contains? (set "aeiouAEIOU") %) input)]
(-> summary
(update-in [:vowels] + (count vowels))
(update-in [:consonents] + (count consonents)))))
#'user/accumulate
user> (def worker
(future (loop [summary {:vowels 0 :consonents 0} in-string (.take q)]
(if (not in-string)
summary
(recur (accumulate summary in-string)
(.take q))))))
#'user/worker
user> (.add q "hello")
true
user> (.add q "goodbye")
true
user> (.add q false)
true
user> #worker
{:vowels 5, :consonents 7}
I came up with something closer to an actor, inspired by Tim Baldridge's cast on actors (Episode 16). I think this addresses the problem much more cleanly.
(defmacro take-all! [c]
`(loop [acc# []]
(let [[v# ~c] (alts! [~c] :default nil)]
(if (not= ~c :default)
(recur (conj acc# v#))
acc#))))
(defn eager-actor [f]
(let [msgbox (chan 1024)]
(go (loop [f f]
(let [first-msg (<! msgbox) ; do this so we park efficiently, and only
; run when there are actually messages
msgs (take-all! msgbox)
msgs (concat [first-msg] msgs)]
(recur (f msgs)))))
msgbox))
(let [a (eager-actor (fn f [ms]
(Thread/sleep 1000) ; simulate work
(println "doing something with" ms)
f))]
(doseq [i (range 20)]
(Thread/sleep 300)
(put! a i)))
;; =>
;; doing something with (0)
;; doing something with (1 2 3)
;; doing something with (4 5 6)
;; doing something with (7 8 9 10)
;; doing something with (11 12 13)