Which clojure parallelism technique to use when searching a growing solution space? - clojure

What is the correct way, in Clojure, to do parallel processing when each job of the processing can occur in utter isolation and may generate a list of additional jobs that need to be evaluated?
My actual problem is a nutritional calculation problem, but I will put this in the form of Chess which shares the same problem space traits as my calculation.
Assume, for instance, that I am trying to find all of the moves to Checkmate in a game of Chess. When searching through the board states, I would start out with 20 possible states, each representing a different possible opening move. Each of those will need to be evaluated, accepted or rejected, and then for each accepted move, a new list of jobs would be created representing all of the possible next moves. The jobs would look like this:
initial: '([] proposed-move)
accepted: '([move] proposed-response)
'([move move] proposed-response)
The number of states to evaluates grows as a result of each computation, and each state can be evaluated in complete isolation from all of the others.
A solution I am playing with goes as such:
; a list of all final solutions, each of which is a sequence of moves
(def solutions (agent []))
; a list of all jobs pending evaluation
(def jobs (agent []))
Given these definitions, I would have a java thread pool, and each thread would request a job from the jobs agent (and wait for that request to be fulfilled). It would then run the calculation, generate a list of solutions and possible solutions. Finally, it would send the solutions to the solutions agent, and the possible solutions to the jobs agent.
Is using a combination of agents and threads the most idiomatic way to go in this case? Can I even get data out of the job queue in the way I am proposing?
Or should my jobs be a java.util.concurrent.LinkedBlockingQueue, as described in Producer consumer with qualifications?

You can do this with the following approach:
Repeated applications of pmap (which provides parallel processing of all elements in collection)
The function used in pmap returns a list of elements. Could be zero, one or multiple elements, which will then be processed in the next iteration
The results get recombined with concat
You repeat the processing of the list for as many times as you like, perhaps storing the result in an atom.
Example code could be something like the following
(def jobs (atom '(1 10 100)))
(defn process-element [value]
(if (< (rand) 0.8)
[(inc value)]
[]))
(defn do-processing []
(swap! jobs
(fn [job-list] (apply concat (pmap process-element job-list)))))
(while (seq #jobs)
(prn #jobs)
(do-processing))
Whick could produce output like:
(1 10 100)
(2 11 101)
(3 12 102)
(4 13 103)
(5 14 104)
(6 15 105)
(7 106)
(107)
(108)
(109)
nil
Note that you need to be a bit careful to make sure your algorithm terminates! In the example this is guaranteed by the elements dying off over time, but if your seach space is growing then you will probably want to apply a time limit instead of just using a (while ... ) loop.

Your approach with agents and threads seems quite close to (what I see as) idiomatic clojure.
the only thing I would change to make it more "clojure like" would be to use pmap to iterate over queue that is stored in an agent. using pmap instead of your own thread pool will save you the effort of managing the thread pool because pmap already uses clojure's thread pool which is initialized properly for the current number of processors. it also helps you take advantage of sequence chunking (which perhaps could help).

You could also use channels. Maybe something like this:
(def jobs (chan))
(def solutions (chan))
(def accepted-solutions (atom (vector)))
(go (loop [job (<! jobs)]
(when job
(go (doseq [solution (process-job-into-solutions job)]
(>! solutions)))
(recur (<! jobs)))))
(go (loop [solution (<! solutions)]
(when (acceptable? solution)
(swap! accepted-solutions conj solution)
(doseq [new-job (generate-new-jobs solution)]
(>! jobs))
(recur (<! solutions)))))
(>!! jobs initial-job)

Related

What happens if a thread changes the value of an atom during `swap!`?

According to the official Clojure docs:
Since another thread may have changed the value in the intervening time, it [swap!] may have to
retry, and does so in a spin loop.
Does this mean that the thread performing the swap might possibly get stuck forever inside swap! if the atom in question never returns to the value read by swap!?
Yes. If you have one very slow mutation competing with a large number of fast operations, the slow operation will have to retry every time, and will not finish until all the fast operations finish. If the fast operations go on indefinitely, then the slow one will never finish.
For example, try:
(time
(let [a (atom 0)]
(future (dotimes [_ 1e9] (swap! a inc)))
(swap! a (fn [x] (Thread/sleep 1000) (* 2 x)))))
You'll see, first of all, that it takes a long time to finish, much longer than a second. This is because the swap! outside of the loop can't make any progress until the smaller tasks have all finished. You'll also see that the answer you get is exactly 2000000000, meaning that the doubling operation definitely happened last, after every single increment. If there were more increments, they would all get "priority".
I've additionally thought of a few cute ways to deadlock an atom forever without tying up any more threads at all!
One approach is to get the thread to race with itself:
(let [a (atom 0)]
(swap! a (fn [x]
(swap! a inc')
(inc' x))))
I use inc' so that it really is forever: it won't break after Long/MAX_VALUE.
And a way that doesn't even involve another swap! operation, much less another thread!
(swap! (atom (repeat 1)) rest)
Here the problem is that the .equals comparison in compare-and-swap never terminates, because (repeat 1) goes on forever.
No*. When the atom re-trys the operation, it uses the "new" value for the comparison.
You can find many examples online (like here and also here) where people have used from dozens to hundreds of threads to pound on an atom, and it always returns the correct result.
* Unless you have an infinite number of interruptions from competing, "faster" threads.

Understanding STM properties in Clojure

I'm going through the book 7 concurrency models in 7 weeks. In it philosophers are represented as a number of ref's:
(def philosophers (into [] (repeatedly 5 #(ref :thinking))))
The state of each philosopher is flipped between :thinking and :eating using dosync transactions to ensure consistency.
Now I want to have a thread that outputs current status, so that I can be sure that the state is valid at all times:
(defn status-thread []
(Thread.
#(while true
(dosync
(println (map (fn [p] #p) philosophers))
(Thread/sleep 100)))))
We use multiple # to read values of each philosopher. It can happen that some refs are changed as we map over philosophers. Would it cause us to print inconsistent state although we don't have one?
I'm aware that Clojure uses MVCC to implement STM, but I'm not sure that I apply it correctly.
My transaction contains side effects and generally they should not appear inside a transaction. But in this case, transaction will always succeed and side effect should take place only once. Is it acceptable?
Your transaction doesn't really need a side effect, and if you scale the problem up enough I believe the transaction could fail for lack of history and retry the side effect if there's a lot of writing going on. I think the more appropriate way here would be to pull the dosync closer in. The transaction should be a pure, side-effect free fact finding mission. Once that has resulted in a value, you are then free to perform side effects with it without affecting the STM.
(defn status-thread []
(-> #(while true
(println (dosync (mapv deref philosophers)))
(Thread/sleep 100))
Thread.
.start)) ;;Threw in starting of the thread for my own testing
A few things I want to mention here:
# is a reader macro for the deref fn, so (fn [p] #p) is equivalent to just deref.
You should avoid laziness within transactions as some of the lazy values may be evaluated outside the context of the dosync or not at all. For mappings that means you can use e.g. doall, or like here just the eagerly evaluated mapv variant that makes a vector rather than a sequence.
This contingency was included in the STM design.
This problem is explicitly solved by combining agents with refs. refs guarantee that all messages set to agents in a transaction are sent exactly once and they are only sent when the transaction commits. If the transaction is retried then they will be dropped and not sent. When the transaction does eventually get through they will be sent at the moment the transaction commits.
(def watcher (agent nil))
(defn status-thread []
(future
(while true
(dosync
(send watcher (fn [_] (println (map (fn [p] #p) philosophers))))
(Thread/sleep 100)))))
The STM guarantees that your transaction will not be committed if the refs you deref during the transaction where changes in an incompatible way while it was running. You don't need to explicitly worry about derefing multiple refs in a transaction (that what the STM was made for)

Monitor goroutine and channel leaking / starvation?

I've got a number of workers running, pulling work items off a queue. Something like:
(def num-workers 100)
(def num-tied-up (atom 0))
(def num-started (atom 0))
(def input-queue (chan (dropping-buffer 200))
(dotimes [x num-workers]
(go
(swap num-started inc)
(loop []
(let [item (<! input-queue)]
(swap! num-tied-up inc)
(try
(process-f worker-id item)
(catch Exception e _))
(swap! num-tied-up dec))
(recur)))
(swap! num-started dec))))
Hopefully num-tied-up represents the number of workers performing work at a given point in time. The value of num-tied-up hovers around a fairly consistent 50, sometimes 60. As num-workers is 100 and the value of num-started is, as expected 100 (i.e. all the go routines are running), this feels like there's a comfortable margin.
My problem is that input-queue is growing. I would expect it to hover around the zero mark because there are enough workers to take items off it. But in practice, eventually it maxes out and drops events.
It looks like tied-up has plenty of head-room in num-workers so workers should be available to take work off the queue.
My questions:
Is there anything I can do to make this more robust?
Are there any other diagnostics I can use to work out what's going wrong? Is there a way to monitor the number of goroutines currently working in case they die?
Can you make the observations fit with the data?
When I fix the indentation of your code, this is what I see:
(def num-workers 100)
(def num-tied-up (atom 0))
(def num-started (atom 0))
(def input-queue (chan (dropping-buffer 200))
(dotimes [x num-workers]
(go
(swap num-started inc)
(loop []
(let [item (<! input-queue)]
(swap! num-tied-up inc)
(try
(process-f worker-id item)
(catch Exception e _))
(swap! num-tied-up dec))
(recur)))
(swap! num-started dec))
)) ;; These parens don't balance.
Ignoring the extra parens, which I assume are some copy/paste error, here are some observations:
You increment num-started inside of a go thread, but you decrement it outside of the go thread immediately after creating that thread. There's a good likelihood that the decrement is always happening before the increment.
The 100 loops you create (one per go thread) never terminate. This in and of itself isn't a problem, as long as this is intentional and by design.
Remember, spawning a go thread doesn't mean that the current thread (the one doing the spawning, in which the dotimes executes) will block. I may be mistaken, but it looks as though your code is making the assumption that (swap! num-started dec) will only run when the go thread spawned immediately above it is finished. But this is not true, even if your go threads did eventually finish (which, as mentioned above, they don't).
Work done by go routines shouldn't do any IO or blocking operation (like Thread/sleep) as all go routines share the same thread pool, which right now has a fixed size of cpus * 2 + 42. For IO bounded work, use core.async/thread.
Note that the thread pool limits the number of go routines that will be executing concurrently, but you can have plenty waiting to be executed.
As an analogy, if you start Chrome, Vim and iTunes (equivalent to 3 go routines) but you have only one CPU in your laptop (equivalent to a thread pool of size 1), then only one of them will be executing in the CPU and the other will be waiting to be executed. It is the OS the one that takes care of pausing/resuming the programs so it looks like they are all running at the same time. Core.async just does the same, but with the difference that core.async can just pause the go-routines when they hit a !.
Now to answer your questions:
No. Try/catch all exceptions is your best option. Also, monitor the size of the queue. I would reevaluate the need of core.async. Zach Tellman has a very nice thread pool implementation with plenty of metrics.
A thread dump will show you where all the core.async threads are blocking, but as I said, you shouldn't really be doing any IO work in the go-routine threads. Have a look at core.async/pipeline-blocking
If you have 4 cores, you will get a core.async thread pool of 50, which matches your observations of 50 concurrent go blocks. The other 50 go blocks are running but either waiting for work to appear in the queue or for a time-slot to be executed. Note that they all have had a chance to be executed at least once as num-started is 100.
Hope it helps.

Clojure confusion - behavior of map, doseq in a multiprocess environment

In trying to replicate some websockets examples I've run into some behavior I don't understand and can't seem to find documentation for. Simplified, here's an example I'm running in lein that's supposed to run a function for every element in a shared map once per second:
(def clients (atom {"a" "b" "c" "d" }))
(def ticker-agent (agent nil))
(defn execute [a]
(println "execute")
(let [ keys (keys #clients) ]
(println "keys= " keys )
(doseq [ x keys ] (println x)))
;(map (fn [k] (println k)) keys)) ;; replace doseq with this?
(Thread/sleep 1000)
(send *agent* execute))
(defn -main [& args]
(send ticker-agent execute)
)
If I run this with map I get
execute
keys= (a c)
execute
keys= (a c)
...
First confusing issue: I understand that I'm likely using map incorrectly because there's no return value, but does that mean the inner println is optimized away? Especially given that if I run this in a repl:
(map #(println %) '(1 2 3))
it works fine?
Second question - if I run this with doseq instead of map I can run into conditions where the execution agent stops (which I'd append here, but am having difficulty isolating/recreating). Clearly there's something I"m missing possibly relating to locking on the maps keyset? I was able to do this even moving the shared map out of an atom. Is there default syncrhonization on the clojure map?
map is lazy. This means that it does not calculate any result until the result is accessed from the data structure it reteruns. This means that it will not run anything if its result is not used.
When you use map from the repl the print stage of the repl accesses the data, which causes any side effects in your mapped function to be invoked. Inside a function, if the return value is not investigated, any side effects in the mapping function will not occur.
You can use doall to force full evaluation of a lazy sequence. You can use dorun if you don't need the result value but want to ensure all side effects are invoked. Also you can use mapv which is not lazy (because vectors are never lazy), and gives you an associative data structure, which is often useful (better random access performance, optimized for appending rather than prepending).
Edit: Regarding the second part of your question (moving this here from a comment).
No, there is nothing about doseq that would hang your execution, try checking the agent-error status of your agent to see if there is some exception, because agents stop executing and stop accepting new tasks by default if they hit an error condition. You can also use set-error-model and set-error-handler! to customize the agent's error handling behavior.

How many threads does Clojure's pmap function spawn for URL-fetching operations?

The documentation on the pmap function leaves me wondering how efficient it would be for something like fetching a collection of XML feeds over the web. I have no idea how many concurrent fetch operations pmap would spawn and what the maximum would be.
If you check the source you see:
> (use 'clojure.repl)
> (source pmap)
(defn pmap
"Like map, except f is applied in parallel. Semi-lazy in that the
parallel computation stays ahead of the consumption, but doesn't
realize the entire result unless required. Only useful for
computationally intensive functions where the time of f dominates
the coordination overhead."
{:added "1.0"}
([f coll]
(let [n (+ 2 (.. Runtime getRuntime availableProcessors))
rets (map #(future (f %)) coll)
step (fn step [[x & xs :as vs] fs]
(lazy-seq
(if-let [s (seq fs)]
(cons (deref x) (step xs (rest s)))
(map deref vs))))]
(step rets (drop n rets))))
([f coll & colls]
(let [step (fn step [cs]
(lazy-seq
(let [ss (map seq cs)]
(when (every? identity ss)
(cons (map first ss) (step (map rest ss)))))))]
(pmap #(apply f %) (step (cons coll colls))))))
The (+ 2 (.. Runtime getRuntime availableProcessors)) is a big clue there. pmap will grab the first (+ 2 processors) pieces of work and run them asynchronously via future. So if you have 2 cores, it's going to launch 4 pieces of work at a time, trying to keep a bit ahead of you but the max should be 2+n.
future ultimately uses the agent I/O thread pool which supports an unbounded number of threads. It will grow as work is thrown at it and shrink if threads are unused.
Building on Alex's excellent answer that explains how pmap works, here's my suggestion for your situation:
(doall
(map
#(future (my-web-fetch-function %))
list-of-xml-feeds-to-fetch))
Rationale:
You want as many pieces of work in-flight as you can, since most will block on network IO.
Future will fire off an asynchronous piece of work for each request, to be handled in a thread pool. You can let Clojure take care of that intelligently.
The doall on the map will force the evaluation of the full sequence (i.e. the launch of all the requests).
Your main thread can start dereferencing the futures right away, and can therefore continue making progress as the individual results come back
No time to write a long response, but there's a clojure.contrib http-agent which creates each get/post request as its own agent. So you can fire off a thousand requests and they'll all run in parallel and complete as the results come in.
Looking the operation of pmap, it seems to go 32 threads at a time no mater what number of processors you have, the issue is that map will go ahead of the computation by 32 and the futures are started in their own. (SAMPLE)
(defn samplef [n]
(println "starting " n)
(Thread/sleep 10000)
n)
(def result (pmap samplef (range 0 100)))
; you will wait for 10 seconds and see 32 prints then when you take the 33rd an other 32
; prints this mins that you are doing 32 concurrent threads at a time
; to me this is not perfect
; SALUDOS Felipe