I've got a number of workers running, pulling work items off a queue. Something like:
(def num-workers 100)
(def num-tied-up (atom 0))
(def num-started (atom 0))
(def input-queue (chan (dropping-buffer 200))
(dotimes [x num-workers]
(go
(swap num-started inc)
(loop []
(let [item (<! input-queue)]
(swap! num-tied-up inc)
(try
(process-f worker-id item)
(catch Exception e _))
(swap! num-tied-up dec))
(recur)))
(swap! num-started dec))))
Hopefully num-tied-up represents the number of workers performing work at a given point in time. The value of num-tied-up hovers around a fairly consistent 50, sometimes 60. As num-workers is 100 and the value of num-started is, as expected, 100 (i.e. all the go routines are running), this feels like there's a comfortable margin.
My problem is that input-queue is growing. I would expect it to hover around the zero mark because there are enough workers to take items off it. But in practice, it eventually maxes out and drops events.
It looks like num-tied-up has plenty of head-room in num-workers, so workers should be available to take work off the queue.
My questions:
Is there anything I can do to make this more robust?
Are there any other diagnostics I can use to work out what's going wrong? Is there a way to monitor the number of goroutines currently working in case they die?
Can you make the observations fit with the data?
When I fix the indentation of your code, this is what I see:
(def num-workers 100)
(def num-tied-up (atom 0))
(def num-started (atom 0))
(def input-queue (chan (dropping-buffer 200))

(dotimes [x num-workers]
  (go
    (swap num-started inc)
    (loop []
      (let [item (<! input-queue)]
        (swap! num-tied-up inc)
        (try
          (process-f worker-id item)
          (catch Exception e _))
        (swap! num-tied-up dec))
      (recur)))
  (swap! num-started dec))
)) ;; These parens don't balance.
Ignoring the extra parens, which I assume are some copy/paste error, here are some observations:
You increment num-started inside of a go thread, but you decrement it outside of the go thread immediately after creating that thread. There's a good likelihood that the decrement is always happening before the increment.
The 100 loops you create (one per go thread) never terminate. This in and of itself isn't a problem, as long as this is intentional and by design.
Remember, spawning a go thread doesn't mean that the current thread (the one doing the spawning, in which the dotimes executes) will block. I may be mistaken, but it looks as though your code is making the assumption that (swap! num-started dec) will only run when the go thread spawned immediately above it is finished. But this is not true, even if your go threads did eventually finish (which, as mentioned above, they don't).
Work done by go routines shouldn't include any IO or blocking operations (like Thread/sleep), as all go routines share the same thread pool, which right now has a fixed size of cpus * 2 + 42. For IO-bound work, use core.async/thread.
Note that the thread pool limits the number of go routines that will be executing concurrently, but you can have plenty waiting to be executed.
As an analogy, if you start Chrome, Vim and iTunes (equivalent to 3 go routines) but you have only one CPU in your laptop (equivalent to a thread pool of size 1), then only one of them will be executing on the CPU while the others wait to be executed. It is the OS that takes care of pausing/resuming the programs so that it looks like they are all running at the same time. core.async does much the same, with the difference that core.async can only park go-routines when they hit a parking operation like <! or >!.
Now to answer your questions:
No. Wrapping everything in try/catch is your best option. Also, monitor the size of the queue. I would reevaluate the need for core.async. Zach Tellman has a very nice thread pool implementation with plenty of metrics.
A thread dump will show you where all the core.async threads are blocking, but as I said, you shouldn't really be doing any IO work in the go-routine threads. Have a look at core.async/pipeline-blocking.
If you have 4 cores, you will get a core.async thread pool of 50 (4 * 2 + 42), which matches your observation of 50 concurrent go blocks. The other 50 go blocks are running, but either waiting for work to appear in the queue or for a time slot in which to execute. Note that they have all had a chance to execute at least once, as num-started is 100.
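As a sketch of that core.async/thread suggestion: if process-f (from the question's code) does blocking IO, each worker can run on a real JVM thread instead of a go block, so blocking no longer starves the shared dispatch pool. The worker-count and buffer size here just mirror the question's numbers.

```clojure
(require '[clojure.core.async :as a :refer [chan thread <!!]])

(def input-queue (chan (a/dropping-buffer 200)))

;; One real JVM thread per worker, so blocking IO inside process-f
;; only ties up that worker's own thread.
(defn start-workers [n process-f]
  (dotimes [_ n]
    (thread
      (loop []
        ;; <!! blocks this thread only; returns nil once the
        ;; channel is closed and drained, which ends the loop.
        (when-some [item (<!! input-queue)]
          (try
            (process-f item)
            (catch Exception _))
          (recur))))))
```

Closing input-queue gives the workers a clean shutdown signal, which also answers the "how do I know if they die?" concern: a worker only exits when the channel closes or its thread is killed.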
Hope it helps.
Related
According to the official Clojure docs:
Since another thread may have changed the value in the intervening time, it [swap!] may have to
retry, and does so in a spin loop.
Does this mean that the thread performing the swap might possibly get stuck forever inside swap! if the atom in question never returns to the value read by swap!?
Yes. If you have one very slow mutation competing with a large number of fast operations, the slow operation will have to retry every time, and will not finish until all the fast operations finish. If the fast operations go on indefinitely, then the slow one will never finish.
For example, try:
(time
(let [a (atom 0)]
(future (dotimes [_ 1e9] (swap! a inc)))
(swap! a (fn [x] (Thread/sleep 1000) (* 2 x)))))
You'll see, first of all, that it takes a long time to finish, much longer than a second. This is because the swap! outside of the loop can't make any progress until the smaller tasks have all finished. You'll also see that the answer you get is exactly 2000000000, meaning that the doubling operation definitely happened last, after every single increment. If there were more increments, they would all get "priority".
I've additionally thought of a few cute ways to deadlock an atom forever without tying up any more threads at all!
One approach is to get the thread to race with itself:
(let [a (atom 0)]
  (swap! a (fn [x]
             (swap! a inc')
             (inc' x))))
I use inc' so that it really is forever: it won't break after Long/MAX_VALUE.
And a way that doesn't even involve another swap! operation, much less another thread!
(swap! (atom (repeat 1)) rest)
Here the problem is that the .equals comparison in compare-and-swap never terminates, because (repeat 1) goes on forever.
No*. When the atom retries the operation, it uses the "new" value for the comparison.
You can find many examples online (like here and also here) where people have used from dozens to hundreds of threads to pound on an atom, and it always returns the correct result.
* Unless you have an infinite number of interruptions from competing, "faster" threads.
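A small experiment (my own sketch, not from the answers above) makes the retry behaviour visible by counting how many times the slow update function actually runs; extra runs are retries forced by the competing fast increments:

```clojure
(def a (atom 0))
(def calls (atom 0))

(let [f (future (dotimes [_ 1000] (swap! a inc)))]
  ;; This slow, identity-preserving swap! reads the value, sleeps,
  ;; then tries to compare-and-set. If a fast increment landed in
  ;; between, the CAS fails and the function runs again.
  (swap! a (fn [x] (swap! calls inc) (Thread/sleep 5) x))
  @f)

(println "slow fn ran" @calls "time(s); final value" @a)
```

Because the slow function returns its input unchanged, the final value is exactly the 1000 contributed by the increments, demonstrating that retries never lose or duplicate updates.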
I'm going through the book 7 concurrency models in 7 weeks. In it philosophers are represented as a number of ref's:
(def philosophers (into [] (repeatedly 5 #(ref :thinking))))
The state of each philosopher is flipped between :thinking and :eating using dosync transactions to ensure consistency.
Now I want to have a thread that outputs current status, so that I can be sure that the state is valid at all times:
(defn status-thread []
  (Thread.
   #(while true
      (dosync
       (println (map (fn [p] @p) philosophers))
       (Thread/sleep 100)))))
We use multiple @s to read the value of each philosopher. It can happen that some refs are changed while we map over philosophers. Would that cause us to print an inconsistent state even though no such state ever existed?
I'm aware that Clojure uses MVCC to implement STM, but I'm not sure that I apply it correctly.
My transaction contains side effects and generally they should not appear inside a transaction. But in this case, transaction will always succeed and side effect should take place only once. Is it acceptable?
Your transaction doesn't really need a side effect. And if you scale the problem up enough, I believe the transaction could fail for lack of history and retry the side effect when there's a lot of writing going on. The more appropriate way here would be to pull the dosync in closer: the transaction should be a pure, side-effect-free fact-finding mission. Once it has produced a value, you are free to perform side effects with that value without affecting the STM.
(defn status-thread []
  (-> #(while true
         (println (dosync (mapv deref philosophers)))
         (Thread/sleep 100))
      Thread.
      .start)) ;; Threw in starting of the thread for my own testing
A few things I want to mention here:
@ is a reader macro for the deref fn, so (fn [p] @p) is equivalent to just deref.
You should avoid laziness within transactions as some of the lazy values may be evaluated outside the context of the dosync or not at all. For mappings that means you can use e.g. doall, or like here just the eagerly evaluated mapv variant that makes a vector rather than a sequence.
This contingency was included in the STM design.
This problem is explicitly solved by combining agents with refs. The STM guarantees that all messages sent to agents in a transaction are sent exactly once, and only when the transaction commits. If the transaction is retried, the pending sends are dropped; when the transaction does eventually get through, they are sent at the moment it commits.
(def watcher (agent nil))

(defn status-thread []
  (future
    (while true
      (dosync
       (send watcher (fn [_] (println (map (fn [p] @p) philosophers))))
       (Thread/sleep 100)))))
The STM guarantees that your transaction will not be committed if the refs you deref during the transaction were changed in an incompatible way while it was running. You don't need to explicitly worry about derefing multiple refs in a transaction (that's what the STM was made for).
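That guarantee can be sketched with a minimal two-ref example (the account names and amounts here are hypothetical): because both alters commit atomically and both derefs happen inside one transaction, a reader can never observe a half-applied transfer.

```clojure
(def account-a (ref 100))
(def account-b (ref 0))

(defn transfer! [n]
  ;; Both alters happen atomically, or the whole transaction retries.
  (dosync
   (alter account-a - n)
   (alter account-b + n)))

(defn snapshot []
  ;; Both derefs inside one dosync see a single consistent point in
  ;; time, so the sum is always 100 however many transfers are in flight.
  (dosync [@account-a @account-b]))
```

Calling snapshot while transfers run concurrently always yields a pair summing to 100, which is exactly the consistency the philosopher-printing thread is after.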
Consider the following snippet:
(let [chs (repeatedly 10 chan)]
  (doseq [c chs]
    (>!! c "hello"))
  (doseq [c chs]
    (println (<!! c))))
Executing this will hang forever. Why is that?
If I do (go (>! c "hello")) instead, it works just fine.
To make an asynchronous put, use clojure.core.async/put!
(let [chs (repeatedly 10 chan)]
  (doseq [c chs]
    (put! c "hello"))
  (doseq [c chs]
    (println (<!! c))))
This works in this example as <!! will always unblock because of all necessary puts happening asynchronously. Notice the following things:
Blocking serves as a synchronization constraint between different processes
>!! and <!! block the thread they are called on. go blocks run on a small, shared thread pool; their code is transformed via macroexpansion so that execution control is inverted and they can be parked and resumed according to core.async's channel blocking/buffering logic. This technique is commonly referred to as an IOC (inversion of control) state machine.
ClojureScript has only one thread, so its implementation of core.async doesn't even contain >!!/<!!. If you write code intended to be ClojureScript compatible, only take from channels within go-routines (or dispatch their values to callbacks passed to take!), and always put either within go-routines or with put!.
Is (go (>! ch v)) equivalent to (put! ch v)?
Yes, but it's not the same. put! is an API wrapper around the channel implementation's core.async.impl.protocols/WritePort put! method. Macroexpansion of (go (>! ch v)) ends up making the same method call, but wraps it in lots of generated state-machine code that can park the putting operation and pause execution of the go-routine until a consumer is ready to take from ch (try (macroexpand `(go (>! ch v))) yourself). Spawning a go block just to do one asynchronous put is wasteful and performs worse than calling put! directly. go also spawns and returns an extra channel from which you can take the result of its body. This lets you await completion of its execution, which you didn't intend to do in your example (you were aiming at an asynchronous operation).
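To make the contrast concrete, here is a small sketch of the asynchronous pair put!/take!, which accept optional callbacks instead of parking a go block:

```clojure
(require '[clojure.core.async :refer [chan put! take!]])

(let [c (chan)]
  ;; put! registers the value (plus an optional completion callback)
  ;; and returns immediately; no thread blocks, no go block is spawned.
  (put! c :hello (fn [accepted?] (println "delivered?" accepted?)))
  ;; take! hands the value to a callback as soon as one is available.
  (take! c (fn [v] (println "got" v))))
```

On this unbuffered channel the put simply pends until take! registers a consumer, at which point both callbacks fire.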
That channel has no buffer, and >!! is blocking. Refer to the examples for this exact case. They spawn a second thread for this reason - to prevent blocking the main thread. Using a goroutine, as in your question, operates on a similar principle.
You can also create the channel with some buffer space - (chan 1)
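A sketch of that buffered variant: with (chan 1), each >!! can deposit its value into the buffer and return immediately, so the original snippet no longer hangs.

```clojure
(require '[clojure.core.async :refer [chan >!! <!!]])

(let [chs (repeatedly 10 #(chan 1))]  ; one buffer slot per channel
  (doseq [c chs]
    (>!! c "hello"))                  ; returns at once: the value sits in the buffer
  (doseq [c chs]
    (println (<!! c))))               ; each take drains the buffered value
```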
I am trying to understand Clojure futures, and I've seen examples from the common Clojure books out there, and there are examples where futures are used for parallel computations (which seems to makes sense).
However, I am hoping someone can explain the behavior of a simple example adapted from O'Reilly's Programming Clojure book.
(def long-calculation (future (apply + (range 1e8))))
When I try to dereference this, by doing
(time @long-calculation)
It returns the correct result (4999999950000000), but almost instantly (in 0.045 msecs) on my machine.
But when I call the actual function, like so
(time (apply + (range 1e8)))
I get the correct result as well, but the time taken is much larger (~ 5000 msecs).
When I dereference the future, my understanding is that a new thread is created on which the expression is evaluated - in which case I would expect it to take around 5000 msec as well.
How come the dereferenced future returns the correct result so quickly?
The calculation in a future starts as soon as you create the future (in a separate thread). In your case, the calculation starts as soon as you execute (def long-calculation ....)
Dereferencing will do one of two things:
If the future has not completed, block until it completes and then return the value (this could take an arbitrary amount of time, or even never complete if the future fails to terminate)
If the future has completed, return the result. This is almost instantaneous (which is why you are seeing very fast dereference returns)
You can see the effect by comparing the following:
;; dereference before future completes
(let [f (future (Thread/sleep 1000))]
  (time @f))
=> "Elapsed time: 999.46176 msecs"
;; dereference after future completes
(let [f (future (Thread/sleep 1000))]
  (Thread/sleep 2000)
  (time @f))
=> "Elapsed time: 0.039598 msecs"
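If you want to check completion without blocking at all, realized? reports whether a future has finished. A small sketch:

```clojure
(let [f (future (Thread/sleep 200) :done)]
  (println (realized? f))   ; checked immediately, without blocking
  @f                        ; now block until the computation finishes
  (println (realized? f)))  ; true once the value is available
```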
What is the correct way, in Clojure, to do parallel processing when each job of the processing can occur in utter isolation and may generate a list of additional jobs that need to be evaluated?
My actual problem is a nutritional calculation problem, but I will put this in the form of Chess which shares the same problem space traits as my calculation.
Assume, for instance, that I am trying to find all of the moves to Checkmate in a game of Chess. When searching through the board states, I would start out with 20 possible states, each representing a different possible opening move. Each of those will need to be evaluated, accepted or rejected, and then for each accepted move, a new list of jobs would be created representing all of the possible next moves. The jobs would look like this:
initial:  '([] proposed-move)
accepted: '([move] proposed-response)
          '([move move] proposed-response)
The number of states to evaluate grows as a result of each computation, and each state can be evaluated in complete isolation from all of the others.
A solution I am playing with goes as such:
; a list of all final solutions, each of which is a sequence of moves
(def solutions (agent []))
; a list of all jobs pending evaluation
(def jobs (agent []))
Given these definitions, I would have a java thread pool, and each thread would request a job from the jobs agent (and wait for that request to be fulfilled). It would then run the calculation, generate a list of solutions and possible solutions. Finally, it would send the solutions to the solutions agent, and the possible solutions to the jobs agent.
Is using a combination of agents and threads the most idiomatic way to go in this case? Can I even get data out of the job queue in the way I am proposing?
Or should my jobs be a java.util.concurrent.LinkedBlockingQueue, as described in Producer consumer with qualifications?
You can do this with the following approach:
Repeated applications of pmap (which provides parallel processing of all elements in a collection)
The function used in pmap returns a list of elements. Could be zero, one or multiple elements, which will then be processed in the next iteration
The results get recombined with concat
You repeat the processing of the list for as many times as you like, perhaps storing the result in an atom.
Example code could be something like the following
(def jobs (atom '(1 10 100)))

(defn process-element [value]
  (if (< (rand) 0.8)
    [(inc value)]
    []))

(defn do-processing []
  (swap! jobs
         (fn [job-list] (apply concat (pmap process-element job-list)))))

(while (seq @jobs)
  (prn @jobs)
  (do-processing))
Which could produce output like:
(1 10 100)
(2 11 101)
(3 12 102)
(4 13 103)
(5 14 104)
(6 15 105)
(7 106)
(107)
(108)
(109)
nil
Note that you need to be a bit careful to make sure your algorithm terminates! In the example this is guaranteed by the elements dying off over time, but if your search space is growing then you will probably want to apply a time limit instead of just using a (while ...) loop.
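One way to sketch such a safeguard, reusing the toy example with a hypothetical iteration cap instead of a wall-clock limit:

```clojure
(def jobs (atom '(1 10 100)))

(defn process-element [value]
  (if (< (rand) 0.8)
    [(inc value)]
    []))

(defn do-processing []
  (swap! jobs
         (fn [job-list] (apply concat (pmap process-element job-list)))))

;; Bound the number of iterations so a growing search space
;; cannot keep the loop alive forever.
(def max-iterations 10000)

(loop [i 0]
  (when (and (seq @jobs) (< i max-iterations))
    (do-processing)
    (recur (inc i))))
```

The same shape works with a deadline: replace the counter with a comparison against (System/currentTimeMillis) plus a budget.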
Your approach with agents and threads seems quite close to (what I see as) idiomatic Clojure.
The only thing I would change to make it more "Clojure-like" would be to use pmap to iterate over the queue stored in an agent. Using pmap instead of your own thread pool saves you the effort of managing that pool, because pmap already uses Clojure's thread pool, which is initialized appropriately for the current number of processors. It also helps you take advantage of sequence chunking (which could perhaps help).
You could also use channels. Maybe something like this:
(def jobs (chan))
(def solutions (chan))
(def accepted-solutions (atom (vector)))

(go (loop [job (<! jobs)]
      (when job
        (go (doseq [solution (process-job-into-solutions job)]
              (>! solutions solution)))
        (recur (<! jobs)))))

(go (loop [solution (<! solutions)]
      (when (acceptable? solution)
        (swap! accepted-solutions conj solution)
        (doseq [new-job (generate-new-jobs solution)]
          (>! jobs new-job))
        (recur (<! solutions)))))

(>!! jobs initial-job)