I am new to Clojure and am trying to understand how to properly use its concurrency features, so any critique/suggestions are appreciated.
So I am trying to write a small test program in clojure that works as follows:
there are 5 producers and 2 consumers
a producer waits for a random time and then pushes a number onto a shared queue.
a consumer should pull a number off the queue as soon as the queue is nonempty and then sleep for a short time to simulate doing work
the consumers should block when the queue is empty
producers should block when the queue has more than 4 items in it to prevent it from growing huge
Here is my plan for each step above:
the producers and consumers will be agents that don't really care for their state (just nil values or something); I just use the agents to send-off a "consumer" or "producer" function to do at some time. Then the shared queue will be (def queue (ref [])). Perhaps this should be an atom though?
in the "producer" agent function, simply (Thread/sleep (rand-int 1000)) and then (dosync (alter queue conj (rand-int 100))) to push onto the queue.
I am thinking to make the consumer agents watch the queue for changes with add-watch. Not sure about this though... it will wake up the consumers on any change, even if the change came from a consumer pulling something off (possibly making it empty). Perhaps checking for this in the watch function is sufficient. Another problem I see is that if all consumers are busy, then what happens when a producer adds something new to the queue? Does the watched event get queued up on some consumer agent or does it disappear?
see above
I really don't know how to do this. I heard that Clojure's seque may be useful, but I couldn't find enough doc on how to use it and my initial testing didn't seem to work (sorry, I don't have the code on me anymore).
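For reference, since the doc really is sparse: seque wraps a (possibly lazy) seq so that a background thread keeps realizing items ahead of the consumer, up to a buffer bound. A minimal sketch of the idea, with the sleep standing in for slow production:

(def buffered
  ;; keep at most 4 items realized ahead of whoever consumes this seq
  (seque 4 (map #(do (Thread/sleep (rand-int 1000)) %) (range 100))))

(first buffered) ; blocks until the background thread has produced an item

Note that seque has a single producing thread behind it, so it doesn't map directly onto the 5-producer setup above.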
Here's my take on it. I made a point of only using Clojure data structures to see how that would work out. Note that it would have been perfectly usual and idiomatic to take a blocking queue from the Java toolbox and use it here; the code would be easy to adapt, I think. Update: I actually did adapt it to java.util.concurrent.LinkedBlockingQueue, see below.
clojure.lang.PersistentQueue
Call (pro-con) to start a test run; then have a look at the contents of output to see if anything happened and queue-lengths to see if they stayed within the given bound.
Update: To explain why I felt the need to use ensure below (I was asked about this on IRC), this is to prevent write skew (see the Wikipedia article on Snapshot isolation for a definition). If I substituted @queue for (ensure queue), it would become possible for two or more producers to check the length of the queue, find that it is less than 4, then place additional items on the queue and possibly bring the total length of the queue above 4, breaking the constraint. Similarly, two consumers doing @queue could accept the same item for processing, then pop two items off the queue. ensure prevents either of these scenarios from happening.
(def go-on? (atom true))
(def queue (ref clojure.lang.PersistentQueue/EMPTY))
(def output (ref ()))
(def queue-lengths (ref ()))
(def *max-queue-length* 4)
(defn overseer
([] (overseer 20000))
([timeout]
(Thread/sleep timeout)
(swap! go-on? not)))
(defn queue-length-watch [_ _ _ new-queue-state]
(dosync (alter queue-lengths conj (count new-queue-state))))
(add-watch queue :queue-length-watch queue-length-watch)
(defn producer [tag]
(future
    (while @go-on?
(if (dosync (let [l (count (ensure queue))]
(when (< l *max-queue-length*)
(alter queue conj tag)
true)))
(Thread/sleep (rand-int 2000))))))
(defn consumer []
(future
    (while @go-on?
(Thread/sleep 100) ; don't look at the queue too often
(when-let [item (dosync (let [item (first (ensure queue))]
(alter queue pop)
item))]
(Thread/sleep (rand-int 500)) ; do stuff
(dosync (alter output conj item)))))) ; and let us know
(defn pro-con []
(reset! go-on? true)
(dorun (map #(%1 %2)
(repeat 5 producer)
(iterate inc 0)))
(dorun (repeatedly 2 consumer))
(overseer))
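A quick REPL session to sanity-check a run (the output values shown are illustrative):

(pro-con)                  ; blocks for ~20 seconds while the overseer runs
(take 5 @output)           ; => (3 0 2 4 1), consumed items, most recent first
(apply max @queue-lengths) ; => 4, i.e. the bound was respected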
java.util.concurrent.LinkedBlockingQueue
A version of the above written using LinkedBlockingQueue. Note how the general outline of the code is basically the same, with some details actually being slightly cleaner. I removed queue-lengths from this version, as LBQ takes care of that constraint for us.
(def go-on? (atom true))
(def *max-queue-length* 4)
(def queue (java.util.concurrent.LinkedBlockingQueue. *max-queue-length*))
(def output (ref ()))
(defn overseer
([] (overseer 20000))
([timeout]
(Thread/sleep timeout)
(swap! go-on? not)))
(defn producer [tag]
(future
    (while @go-on?
(.put queue tag)
(Thread/sleep (rand-int 2000)))))
(defn consumer []
(future
    (while @go-on?
;; I'm using .poll on the next line so as not to block
;; indefinitely if we're done; note that this has the
;; side effect that nulls = nils on the queue will not
;; be handled; there's a number of other ways to go about
;; this if this is a problem, see docs on LinkedBlockingQueue
(when-let [item (.poll queue)]
(Thread/sleep (rand-int 500)) ; do stuff
(dosync (alter output conj item)))))) ; and let us know
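One of the alternatives hinted at in the comment above, as a sketch: the two-argument .poll (standard LinkedBlockingQueue API) blocks for up to a timeout, so the consumer parks on the queue instead of spinning through the while loop, yet still rechecks go-on? periodically.

(import 'java.util.concurrent.TimeUnit)

(defn consumer []
  (future
    (while @go-on?
      ;; wait up to 100 ms for an item, then recheck go-on?
      (when-let [item (.poll queue 100 TimeUnit/MILLISECONDS)]
        (Thread/sleep (rand-int 500)) ; do stuff
        (dosync (alter output conj item))))))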
(defn pro-con []
(reset! go-on? true)
(dorun (map #(%1 %2)
(repeat 5 producer)
(iterate inc 0)))
(dorun (repeatedly 2 consumer))
(overseer))
Below is a simplified version of an application I am working on. Specifically, I am interested in benchmarking the execution time of process-list. In the function process-list, I partition the input list into partitions equal to the number of threads I would like to execute in parallel. I then pass each partition to a thread through a call to future. Finally, in main I call process-list with time wrapped around it. time should return the elapsed time of the processing done by process-list, but apparently it only returns the amount of time it takes to create the futures and does not wait for them to run to completion. How can I dereference the futures inside process-list to ensure the elapsed time accounts for their execution to completion?
(ns listProcessing
(:require [clojure.string]
[clojure.pprint]
[input-random :as input]))
(def N-THREADS 4)
(def element_processing_retries (atom 0))
(def list-collection
"Each element is made into a ref"
(map ref input/myList))
(defn partition-list
  "partition list into required number of partitions which is equal
  to the number of threads"
  [threads list]
  (let [partitions (partition-all
                     (Math/ceil (/ (count list) threads)) list)]
    partitions))
(defn increase-element [element]
(ref-set element inc))
(defn process-list [list]
"Process `members of list` one by one."
(let [sub-lists (partition-list N-THREADS list)]
(doseq [sub-list sub-lists]
(let [futures '()
myFuture (future (dosync (swap! element_processing_retries inc)
(map increase-element sub-list)))]
(cons myFuture futures)
(map deref futures)))))
(defn main []
  (let [f1 (future (time (process-list input/myList)))]
    @f1))

(main)
(shutdown-agents)
Below is an example of a simplified list input (both the input and the processing have been simplified for the sake of the question):
(ns input-random)
(def myList (list 1 2 4 7 89 12 34 45 56))
Fair warning about the busy-wait hack below: it adds some overhead, so if you're trying to time millisecond differences it will skew things a bit (although such fine-grained timings shouldn't be using time anyway).
I think your example was a little convoluted, so I reduced it down to what I think represents the problem a little better:
(time (doseq [n (range 5)]
(future
(Thread/sleep 2000))))
"Elapsed time: 1.687702 msecs"
The problem here is the same as the problem with your code: all this really does is time how long it takes for doseq to dispatch all the jobs.
The idea with my hack is to have each finished job bump a counter in an atom, then busy-wait until the count reaches an end condition:
(defn do-stuff [n-things]
(let [ret-atom (atom 0)]
(doseq [n (range n-things)]
(future
(Thread/sleep 2000)
(swap! ret-atom inc)))
ret-atom))
; Time how long it takes the entire `let` to run
(time
(let [n 5
ret-atom (do-stuff n)]
; Will block until the condition is met
    (while (< @ret-atom n))))
"Elapsed time: 2002.813288 msecs"
The reason this is so hard to time is all you're doing is spinning up some side effects in a doseq. There's nothing defining what "done" is, so there's nothing to block on. I'm not great with core.async, but I suspect there may be something that may help in there. It may be possible to have a call to <!! that blocks until a channel has a certain number of elements. In that case, you would just need to put results into the channel as they're produced.
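That said, for the question as posed the most direct fix is to keep references to the futures and deref each one before time returns; deref blocks until the future has completed. A sketch:

;; doall forces all the futures to be created up front (for is lazy),
;; so they run in parallel; the doseq then blocks on each in turn.
(time
  (let [futures (doall (for [_ (range 5)]
                         (future (Thread/sleep 2000))))]
    (doseq [f futures] @f)))
;; "Elapsed time: 2002.1 msecs" (approximately)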
I have some code that combines clj-http, core.async facilities, and an atom. It creates some threads to fetch and parse a bunch of pages:
(defn fetch-page
([url] (fetch-page url nil))
([url conn-manager]
(-> (http.client/get url {:connection-manager conn-manager})
:body hickory/parse hickory/as-hickory)))
(defn- create-worker
[url-chan result conn-manager]
(async/thread
(loop [url (async/<!! url-chan)]
(when url
(swap! result assoc url (fetch-page url conn-manager))
(recur (async/<!! url-chan))))))
(defn fetch-pages
[urls]
(let [url-chan (async/to-chan urls)
pages (atom (reduce (fn [m u] (assoc m u nil)) {} urls))
conn-manager (http.conn-mgr/make-reusable-conn-manager {})
workers (mapv (fn [_] (create-worker url-chan pages conn-manager))
(range n-cpus))]
; wait for workers to finish and shut conn-manager down
(dotimes [_ n-cpus] (async/alts!! workers))
(http.conn-mgr/shutdown-manager conn-manager)
    (mapv #(get @pages %) urls)))
The idea is to use multiple threads to reduce the time to fetch and parse the pages, but I'd like not to overload the server by sending a lot of requests at once; that is why a connection manager is used. I don't know if my approach is correct; suggestions are welcome. Currently the problem is that the last requests fail because the connection manager is shut down before they terminate: Exception in thread "async-thread-macro-15" java.lang.IllegalStateException: Connection pool shut down.
The main question: how do I close the connection manager at the right moment (and why does my current code fail to do so)? The side quest: is my approach right? If not, what could I do to fetch and parse multiple pages at once without overloading the server?
Thanks!
The problem is that async/alts!! returns on the first result (and will keep doing so, since workers never changes). I think using async/merge to build a channel and then repeatedly reading off of it should work.
(defn fetch-pages
[urls]
(let [url-chan (async/to-chan urls)
pages (atom (reduce (fn [m u] (assoc m u nil)) {} urls))
conn-manager (http.conn-mgr/make-reusable-conn-manager {})
workers (mapv (fn [_] (create-worker url-chan pages conn-manager))
(range n-cpus))
all-workers (async/merge workers)]
; wait for workers to finish and shut conn-manager down
(dotimes [_ n-cpus] (async/<!! all-workers))
(http.conn-mgr/shutdown-manager conn-manager)
    (mapv #(get @pages %) urls)))
Alternatively, you could recur and keep shrinking workers instead so that you're only waiting on previously unfinished workers.
(defn fetch-pages
[urls]
(let [url-chan (async/to-chan urls)
pages (atom (reduce (fn [m u] (assoc m u nil)) {} urls))
conn-manager (http.conn-mgr/make-reusable-conn-manager {})
workers (mapv (fn [_] (create-worker url-chan pages conn-manager))
(range n-cpus))]
; wait for workers to finish and shut conn-manager down
(loop [workers workers]
(when (seq workers)
(let [[_ finished-worker] (async/alts!! workers)]
(recur (filterv #(not= finished-worker %) workers)))))
(http.conn-mgr/shutdown-manager conn-manager)
    (mapv #(get @pages %) urls)))
I believe Alejandro is correct about the reason for your error, and this is logical: the error indicates that you have shut down the connection manager before all requests have completed, so it's likely that not all the workers had finished when you shut it down.
Another solution I'll propose stems from the fact that you aren't actually doing anything in your create-worker thread that requires the channel which async/thread implicitly creates. So you can replace it with a future, like so:
(defn- create-worker
[url-chan result conn-manager]
(future
    (loop [url (async/<!! url-chan)]
(when url
(swap! result assoc url (fetch-page url conn-manager))
        (recur (async/<!! url-chan))))))
And in your fetch-pages function, "join" by derefing:
(doseq [worker workers]
  @worker) ; alternatively, use deref to specify a timeout
This eliminates a good deal of core.async interference in what is not a core.async issue to begin with. This of course depends on you keeping your method of collecting the data as-is, that is, using swap! on an atom to keep track of page data. If you were to send the result of fetch-page out onto a return channel, or something similar, then you'd want to keep your current thread approach.
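For illustration, a sketch of that channel-based variant (the results channel and the reworked worker are assumptions, not code from the question): each worker sends [url page] pairs out on a channel instead of swap!-ing an atom, and here the channel returned by async/thread earns its keep again.

(defn- create-worker
  [url-chan results conn-manager]
  (async/thread
    (loop []
      (when-let [url (async/<!! url-chan)]
        (async/>!! results [url (fetch-page url conn-manager)])
        (recur)))))

;; the caller would then take (count urls) results off of results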
Regarding your concern about overloading the server -- you have not yet defined what it means to "overload" the server. There are two dimensions of this: one is the rate of requests (number of requests per second, for example), and the other is the number of concurrent requests. Your current app has n worker threads, and that is the effective concurrency (along with the settings in the connection manager). But this does nothing to address the rate of requests per second.
This is a little more complicated than it might seem, though it is possible. You have to consider the total of all requests done by all threads per unit of time, and managing that is not something to tackle in one answer here. I suggest you do some research about throttling and rate limiting, and give it a go, and then go from there with questions.
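To give a flavour of the rate dimension, here is one minimal sketch (names and numbers are illustrative, and this is not a production-grade limiter): a shared ticker channel that emits at a fixed interval, from which every worker must take a token before issuing a request. The combined request rate across all threads is then bounded by the tick rate.

(defn rate-ticker
  "Returns a channel that yields a token every interval-ms milliseconds."
  [interval-ms]
  (let [out (async/chan)]
    (async/go-loop []
      (async/<! (async/timeout interval-ms))
      (when (async/>! out :tick)
        (recur)))
    out))

;; in each worker, before fetching:
;; (async/<!! ticker) ; wait for permission to send a request
;; (fetch-page url conn-manager)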
I have 100 workers (agents) that share one ref holding a collection of tasks. While this collection has tasks, each worker gets one task from the collection (in a dosync block), prints it, and sometimes puts it back in the collection (in a dosync block):
(defn have-tasks?
[tasks]
  (not (empty? @tasks)))
(defn get-task
[tasks]
(dosync
    (let [task (first @tasks)]
(alter tasks rest)
task)))
(defn put-task
[tasks task]
(dosync (alter tasks conj task))
nil)
(defn worker
[& {:keys [tasks]}]
(agent {:tasks tasks}))
(defn worker-loop
[{:keys [tasks] :as state}]
(while (have-tasks? tasks)
(let [task (get-task tasks)]
(println "Task: " task)
(when (< (rand) 0.1)
(put-task tasks task))))
state)
(defn create-workers
[count & options]
(->> (range 0 count)
(map (fn [_] (apply worker options)))
(into [])))
(defn start-workers
[workers]
(doseq [worker workers] (send-off worker worker-loop)))
(def tasks (ref (range 1 10000000)))
(def workers (create-workers 100 :tasks tasks))
(start-workers workers)
(apply await workers)
When I run this code, the last value printed by the agents is (after several tries):
435445,
4556294,
1322061,
3950017.
But never 9999999, which is what I expect.
And every time, the collection really is empty at the end.
What am I doing wrong?
Edit:
I rewrote worker-loop as simple as possible:
(defn worker-loop
[{:keys [tasks] :as state}]
(loop []
(when-let [task (get-task tasks)]
(println "Task: " task)
(recur)))
state)
But problem is still there.
This code behaves as expected when I create one and only one worker.
The problem here has nothing to do with agents and barely anything to do with laziness. Here's a somewhat reduced version of the original code that still exhibits the problem:
(defn f [init]
(let [state (ref init)
task (fn []
(loop [last-n nil]
(if-let [n (dosync
                             (let [n (first @state)]
(alter state rest)
n))]
(recur n)
(locking :out
(println "Last seen:" last-n)))))
workers (->> (range 0 5)
(mapv (fn [_] (Thread. task))))]
(doseq [w workers] (.start w))
(doseq [w workers] (.join w))))
(defn r []
(f (range 1 100000)))
(defn i [] (f (->> (iterate inc 1)
(take 100000))))
(defn t []
(f (->> (range 1 100000)
(take Integer/MAX_VALUE))))
Running this code shows that i and t, both lazy, reliably work, whereas r reliably doesn't. The problem is in fact a concurrency bug in the class returned by the range call. Indeed, that bug is documented in this Clojure ticket and is fixed as of Clojure version 1.9.0-alpha11.
A quick summary of the bug in case the ticket is not accessible for some reason: in the internals of the rest call on the result of range, there was a small opportunity for a race condition: the "flag" that says "the next value has already been computed" was set before the actual value itself, which meant that a second thread could see that flag as true even though the "next value" is still nil. The call to alter would then fix that nil value on the ref. It's been fixed by swapping the two assignment lines.
In cases where the result of range was either forcibly realized in a single thread or wrapped in another lazy seq, that bug would not appear.
I asked this question on the Clojure Google Group and it helped me to find the answer.
The problem is that I used a lazy sequence within the STM transaction.
When I replaced this code:
(def tasks (ref (range 1 10000000)))
by this:
(def tasks (ref (into [] (range 1 10000000))))
it worked as expected!
In my production code where the problem occurred, I used the Korma framework that also returns a lazy collection of tuples, as in my example.
Conclusion: Avoid the use of lazy data structures within the STM transaction.
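Other ways of forcing realization would have worked too; for example (sketches, equivalent in effect to the into [] fix):

(def tasks (ref (doall (range 1 10000000)))) ; realize the whole seq up front
(def tasks (ref (vec (range 1 10000000))))   ; same result as (into [] ...)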
When the last number in the range is reached, there are still older numbers being held by the workers. Some of these will be returned to the queue, to be processed again.
In order to better see what is happening, you can change worker-loop to print the last task handled by each worker:
(defn worker-loop
[{:keys [tasks] :as state}]
(loop [last-task nil]
(if (have-tasks? tasks)
(let [task (get-task tasks)]
;; (when (< (rand) 0.1)
;; (put-task tasks task)
(recur task))
(when last-task
(println "Last task:" last-task))))
state)
This also shows the race condition in the code, where a task seen by have-tasks? is often taken by another worker by the time get-task is called, near the end of the processing of the tasks.
The race condition can be solved by removing have-tasks? and instead using the return value of nil from get-task as a signal that no more tasks are available (at the moment).
Updated:
As observed, this race condition does not explain the problem.
Neither is the problem solved by removing a possible race condition in get-task like this:
(defn get-task [tasks]
(dosync
(first (alter tasks rest))))
However changing get-task to use an explicit lock seems to solve the problem:
(defn get-task [tasks]
(locking :lock
(dosync
      (let [task (first @tasks)]
(alter tasks rest)
task))))
Is there an elegant way to stop a running go block?
(without introducing a flag and polluting the code with checks/branches)
(ns example
(:require-macros [cljs.core.async.macros :refer [go]])
(:require [cljs.core.async :refer [<! timeout]]))
(defn some-long-task []
(go
(println "entering")
; some complex long-running task (e.g. fetching something via network)
(<! (timeout 1000))
(<! (timeout 1000))
(<! (timeout 1000))
(<! (timeout 1000))
(println "leaving")))
; run the task
(def task (some-long-task))
; later, realize we no longer need the result and want to cancel it
; (stop! task)
Sorry, this is not possible with core.async today. What you get back from creating a go block is a normal channel that the result of the block will be put on, though this does not give you any handle to the actual block itself.
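The usual workaround (a sketch, assuming chan, close! and alts! are also required in from cljs.core.async) is to pass a cancel channel into the task and race each parking step against it with alts!; closing the channel then stops the task at its next step:

(defn some-long-task [cancel]
  (go
    (println "entering")
    (loop [steps 4]
      (if (zero? steps)
        (println "leaving")
        (let [[_ ch] (alts! [(timeout 1000) cancel])]
          (if (= ch cancel)
            (println "cancelled")
            (recur (dec steps))))))))

(def cancel-ch (chan))
(def task (some-long-task cancel-ch))
;; later, when the result is no longer needed:
(close! cancel-ch)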
As stated in Arthur's answer, you cannot terminate a go block immediately, but since your example indicates a multi-phased task (using sub-tasks), an approach like this might work:
(defn task-processor
"Takes an initial state value and number of tasks (fns). Puts tasks
on a work queue channel and then executes them in a go-loop, with
each task passed the current state. A task's return value is used as
input for next task. When all tasks are processed or queue has been
closed, places current result/state onto a result channel. To allow
nil values, result is wrapped in a map:
{:value state :complete? true/false}
This fn returns a map of {:queue queue-chan :result result-chan}"
[init & tasks]
(assert (pos? (count tasks)))
(let [queue (chan)
result (chan)]
(async/onto-chan queue tasks)
(go-loop [state init, i 0]
(if-let [task (<! queue)]
(recur (task state) (inc i))
(do (prn "task queue finished/terminated")
(>! result {:value state :complete? (== i (count tasks))}))))
{:queue queue
:result result}))
(defn dummy-task [x] (prn :task x) (Thread/sleep 1000) (inc x))
;; kick of tasks
(def proc (apply task-processor 0 (repeat 100 dummy-task)))
;; result handler
(go
(let [res (<! (:result proc))]
(prn :final-result res)))
;; to stop the queue after current task is complete
;; in this example it might take up to an additional second
;; for the terminated result to be delivered
(close! (:queue proc))
You may want to use future and future-cancel for such a task.
(def f (future (while (not (Thread/interrupted)) (your-function ... ))))
(future-cancel f)
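One caveat worth spelling out (standard java.util.concurrent behaviour, not specific to this snippet): future-cancel interrupts the future's thread, so if your-function blocks in Thread/sleep or I/O, the cancellation surfaces as an InterruptedException rather than through the (Thread/interrupted) check. A sketch that handles both paths:

(def f (future
         (try
           (while (not (Thread/interrupted))
             (your-function ...))
           (catch InterruptedException _
             :cancelled))))

(future-cancel f)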
See also: Why do cancelled Clojure futures continue using CPU?
What's the best way in Clojure to implement something like an actor or agent (an asynchronously updated, uncoordinated reference) that does the following?
gets sent messages/data
executes some function on that data to obtain new state; something like (fn [state new-msgs] ...)
continues to receive messages/data during that update
once done with that update, runs the same update function against all messages that have been sent in the interim
An agent doesn't seem quite right here. One must simultaneously send function and data to agents, which doesn't leave room for a function which operates on all data that has come in during the last update. The goal implicitly requires a decoupling of function and data.
The actor model seems generally better suited in that there is a decoupling of function and data. However, all actor frameworks I'm aware of seem to assume each message sent will be processed separately. It's not clear how one would turn this on its head without adding extra machinery. I know Pulsar's actors accept a :lifecycle-handle function which can be used to make actors do "special tricks", but there isn't a lot of documentation around this, so it's unclear whether the functionality would be helpful.
I do have a solution to this problem using agents, core.async channels, and watch functions, but it's a bit messy, and I'm hoping there is a better solution. I'll post it as a solution in case others find it helpful, but I'd like to see what others come up with.
Here's the solution I came up with using agents, core.async channels, and watch functions. Again, it's a bit messy, but it does what I need it to for now. Here it is, in broad strokes:
(require '[clojure.core.async :as async :refer [>!! <!! >! <! chan go]])
; We'll call this thing a queued-agent
(defprotocol IQueuedAgent
(enqueue [this message])
(ping [this]))
(defrecord QueuedAgent [agent queue]
IQueuedAgent
(enqueue [_ message]
(go (>! queue message)))
(ping [_]
(send agent identity)))
; Need a function for draining a core async channel of all messages
(defn drain! [c]
(let [cc (chan 1)]
(go (>! cc ::queue-empty))
(letfn
; This fn does all the hard work, but closes over cc to avoid reconstruction
[(drainer! [c]
(let [[v _] (<!! (go (async/alts! [c cc] :priority true)))]
(if (= v ::queue-empty)
(lazy-seq [])
(lazy-seq (cons v (drainer! c))))))]
(drainer! c))))
; Constructor function
(defn queued-agent [& {:keys [buffer update-fn init-fn error-handler-builder] :or {buffer 100}}]
(let [q (chan buffer)
a (agent (if init-fn (init-fn) {}))
error-handler-fn (error-handler-builder q a)]
; Set up the queue, and watcher which runs the update function when there is new data
(add-watch
a
:update-conv
(fn [k r o n]
(let [queued (drain! q)]
(when-not (empty? queued)
(send a update-fn queued error-handler-fn)))))
(QueuedAgent. a q)))
; Now we can use these like this
(def a (queued-agent
:init-fn (fn [] {:some "initial value"})
:update-fn (fn [a queued-data error-handler-fn]
(println "Receiving data" queued-data)
; Simulate some work/load on data
(Thread/sleep 2000)
(println "Done with work; ready to queue more up!"))
; This is a little warty at the moment, but closing over the queue and agent lets you requeue work on
; failure so you can try again.
:error-handler-builder
(fn [q a] (println "do something with errors"))))
(defn -main []
(doseq [i (range 10)]
(enqueue a (str "data" i))
(Thread/sleep 500) ; simulate things happening
; This part stinks... have to manually let the queued agent know that we've queued some things up for it
(ping a)))
As you'll notice, having to ping the queued-agent here every time new data is added is pretty warty. It definitely feels like things are being twisted out of typical usage.
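One small cleanup that would remove that wart (a sketch, not tested against the code above): fold the ping into enqueue itself, so callers never have to remember it. The watch still fires exactly as before; the only difference is who triggers it.

(defrecord QueuedAgent [agent queue]
  IQueuedAgent
  (enqueue [_ message]
    (go (>! queue message)
        (send agent identity))) ; ping the agent once the message is queued
  (ping [_]
    (send agent identity)))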
Agents are the inverse of what you want here - they are a value that gets sent updating functions. This is easiest with a queue and a Thread. For convenience I am using future to construct the thread.
user> (def q (java.util.concurrent.LinkedBlockingDeque.))
#'user/q
user> (defn accumulate
[summary input]
(let [{vowels true consonents false}
(group-by #(contains? (set "aeiouAEIOU") %) input)]
(-> summary
(update-in [:vowels] + (count vowels))
(update-in [:consonents] + (count consonents)))))
#'user/accumulate
user> (def worker
(future (loop [summary {:vowels 0 :consonents 0} in-string (.take q)]
(if (not in-string)
summary
(recur (accumulate summary in-string)
(.take q))))))
#'user/worker
user> (.add q "hello")
true
user> (.add q "goodbye")
true
user> (.add q false)
true
user> @worker
{:vowels 5, :consonents 7}
I came up with something closer to an actor, inspired by Tim Baldridge's screencast on actors (Episode 16). I think this addresses the problem much more cleanly.
(require '[clojure.core.async :refer [alts! chan go <! put!]])

(defmacro take-all! [c]
`(loop [acc# []]
(let [[v# ~c] (alts! [~c] :default nil)]
(if (not= ~c :default)
(recur (conj acc# v#))
acc#))))
(defn eager-actor [f]
(let [msgbox (chan 1024)]
(go (loop [f f]
(let [first-msg (<! msgbox) ; do this so we park efficiently, and only
; run when there are actually messages
msgs (take-all! msgbox)
msgs (concat [first-msg] msgs)]
(recur (f msgs)))))
msgbox))
(let [a (eager-actor (fn f [ms]
(Thread/sleep 1000) ; simulate work
(println "doing something with" ms)
f))]
(doseq [i (range 20)]
(Thread/sleep 300)
(put! a i)))
;; =>
;; doing something with (0)
;; doing something with (1 2 3)
;; doing something with (4 5 6)
;; doing something with (7 8 9 10)
;; doing something with (11 12 13)