I am trying to understand Clojure futures. I've seen examples from the common Clojure books out there, where futures are used for parallel computations (which seems to make sense).
However, I am hoping someone can explain the behavior of a simple example adapted from O'Reilly's Programming Clojure book.
(def long-calculation (future (apply + (range 1e8))))
When I try to dereference this, by doing
(time @long-calculation)
It returns the correct result (4999999950000000), but almost instantly (in 0.045 msecs) on my machine.
But when I call the actual function, like so
(time (apply + (range 1e8)))
I get the correct result as well, but the time taken is much larger (~ 5000 msecs).
When I dereference the future, my understanding is that a new thread is created on which the expression is evaluated - in which case I would expect it to take around 5000 msec as well.
How come the dereferenced future returns the correct result so quickly?
The calculation in a future starts as soon as you create the future (in a separate thread). In your case, the calculation starts as soon as you execute (def long-calculation ....)
Dereferencing will do one of two things:
If the future has not completed, block until it completes and then return the value (this could take an arbitrary amount of time, or even never complete if the future fails to terminate)
If the future has completed, return the result. This is almost instantaneous (which is why you are seeing very fast dereference returns)
You can see the effect by comparing the following:
;; dereference before future completes
(let [f (future (Thread/sleep 1000))]
  (time @f))
=> "Elapsed time: 999.46176 msecs"
;; dereference after future completes
(let [f (future (Thread/sleep 1000))]
(Thread/sleep 2000)
  (time @f))
=> "Elapsed time: 0.039598 msecs"
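As a related sketch (not part of the original answer), you can also ask whether a future has completed without blocking, using future-done? (or realized?), before deciding whether a deref will have to wait:
(let [f (future (Thread/sleep 1000) :done)]
  (println (future-done? f)) ;; => false, still running
  (Thread/sleep 1500)
  (println (future-done? f)) ;; => true
  @f)                        ;; => :done, returns immediately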
According to the official Clojure docs:
Since another thread may have changed the value in the intervening time, it [swap!] may have to
retry, and does so in a spin loop.
Does this mean that the thread performing the swap might possibly get stuck forever inside swap! if the atom in question never returns to the value read by swap!?
Yes. If you have one very slow mutation competing with a large number of fast operations, the slow operation will have to retry every time, and will not finish until all the fast operations finish. If the fast operations go on indefinitely, then the slow one will never finish.
For example, try:
(time
(let [a (atom 0)]
(future (dotimes [_ 1e9] (swap! a inc)))
(swap! a (fn [x] (Thread/sleep 1000) (* 2 x)))))
You'll see, first of all, that it takes a long time to finish, much longer than a second. This is because the swap! outside of the loop can't make any progress until the smaller tasks have all finished. You'll also see that the answer you get is exactly 2000000000, meaning that the doubling operation definitely happened last, after every single increment. If there were more increments, they would all get "priority".
I've additionally thought of a few cute ways to deadlock an atom forever without tying up any more threads at all!
One approach is to get the thread to race with itself:
(let [a (atom 0)]
(swap! a (fn [x]
(swap! a inc')
(inc' x))))
I use inc' so that it really is forever: it won't break after Long/MAX_VALUE.
And a way that doesn't even involve another swap! operation, much less another thread!
(swap! (atom (repeat 1)) rest)
Here the problem is that the .equals comparison in compare-and-swap never terminates, because (repeat 1) goes on forever.
No*. When the atom retries the operation, it uses the "new" value for the comparison.
You can find many examples online (like here and also here) where people have used from dozens to hundreds of threads to pound on an atom, and it always returns the correct result.
* Unless you have an infinite number of interruptions from competing, "faster" threads.
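As a minimal sketch of that retry behaviour (the numbers here are arbitrary), you can count how many times the slow function is invoked; each attempt re-reads the atom's then-current value and the compare-and-set is done against that same fresh value:
(def retry-count (atom 0))
(let [a (atom 0)]
  (future (dotimes [_ 1e6] (swap! a inc)))
  (swap! a (fn [x]
             (swap! retry-count inc) ;; count attempts (a different atom, so this is safe)
             (Thread/sleep 10)
             (* 2 x)))
  [@a @retry-count])
;; @retry-count is typically greater than 1: the doubling function was re-run
;; against fresher and fresher values until one compare-and-set succeeded.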
I would like to define a predicate that, taking as input some predicates
with corresponding inputs (they could be given as a lazy sequence of calls),
runs them in parallel and computes the logical or of the results,
in such a way that, the moment a predicate call terminates returning true,
the whole computation also terminates (returning true).
Apart from offering time optimization, this would also help avoiding
non-termination in some cases (some predicate calls may not terminate).
Actually, interpreting non-termination as a third undefined value,
this predicate simulates the or operation in Kleene's K3 logic
(the join in the initial centered Kleene algebra).
Something similar is presented here for the Haskell family.
Is there any (preferably simple) way to do this in Clojure?
EDIT: I decided to add some clarifications after reading the comments.
(a) First of all, what happens after the thread pool gets exhausted is of less importance. I think creating a thread pool large enough for our needs is a reasonable convention.
(b) The most crucial requirement is that the predicate calls start running in parallel and, once a predicate call terminates returning true, all the other threads running get interrupted. The intended behavior is that:
if there is a predicate call returning true: the parallel or returns true
else if there is a predicate call that does not terminate: the parallel or does not terminate
else: the parallel or returns false
In other words, it behaves like the join in the 3-element lattice given by false<undefined<true, with undefined representing non-termination.
(c) The parallel or should be able to take as input many predicates and many predicate-inputs (each one corresponding to a predicate). But it would be even better if it took as input a lazy sequence. Then, naming the parallel or pany (for "parallel any"), we could have calls like the following:
(pany (map (comp eval list) predicates inputs))
(pany (map (comp eval list) predicates (repeat input)))
(pany (map (comp eval list) (repeat predicate) inputs)) which is equivalent to (pany (map predicate (unchunk inputs)))
As a final remark, I think that it is quite natural to ask for things like pany, a dual pall or a mechanism for building such early-terminating parallel reductions to be easily implementable or even built-in in a parallelism-oriented language like Clojure.
I will define our predicates in terms of a reducing function. Practically, we could reimplement all of the Clojure iteration functions to support this parallel operation, but I'll just use reduce as an example.
I'll define a computation function. I'll just use the same one for every predicate, but nothing stops you from having many different ones. The computation counts as "true" once its accumulator exceeds 1000.
(defn computor [acc val]
(let [new (+' acc val)] (if (> new 1000) (reduced new) new)))
(reduce computor 0 (range))
;; =>
1035
(reduce computor 0 (range Long/MIN_VALUE 0))
;; =>
;; ...this is a proxy for a non-returning computation
;; wrap these up in a form suitable for application of reduction
(def predicates [[computor 0 (range)]
[computor 0 (range Long/MIN_VALUE 0)]])
Now let's get to the meat of this. I want to take a step in each computation, and if one of the computations completes, I want to return it. In actual fact one step at a time using pmap is very slow - the units of work are too small to be worth threading. Here I've changed things to do 1000 iterations of each unit of work before moving on. You'd probably tune this based on your workload and the cost of a step.
(defn p-or-reducer* [reductions]
(let [splits (map #(split-at 1000 %) reductions) ;; do at least 1000 iterations per cycle
complete (some #(if (empty? (second %)) (last (first %))) splits)]
(or complete (recur (map second splits)))))
I then wrap this in a driver.
(defn p-or [s]
(p-or-reducer* (map #(apply reductions %) s)))
(p-or predicates)
;; =>
1035
Where to insert the CPU parallelism? s/map/pmap/ in p-or-reducer* should do it. I suggest just parallelising the first operation, as this will drive the reducing sequences to compute.
(defn p-or-reducer* [reductions]
(let [splits (pmap #(split-at 1000 %) reductions) ;; do at least 1000 iterations per cycle
complete (some #(if (empty? (second %)) (last (first %))) splits)]
(or complete (recur (map second splits)))))
(def parallelism-tester (conj (vec (repeat 40000 [computor 0 (range Long/MIN_VALUE 0)]))
[computor 0 (range)]))
(p-or parallelism-tester) ;; terminates even though the first 40K predicates will not
It's extremely hard to define a performant generic version of this. Without knowing the cost per iteration, an efficient parallelism strategy is hard to derive - if one iteration takes 10s then we'd probably take a single step at a time. If it takes 100ns then we need to take many steps at a time.
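As a rough sketch of that tuning (just a variation on the code above, not a definitive version), the chunk size could be lifted into a parameter of the driver:
(defn p-or-reducer* [chunk-size reductions]
  (let [splits   (pmap #(split-at chunk-size %) reductions)
        complete (some #(if (empty? (second %)) (last (first %))) splits)]
    (or complete (recur chunk-size (map second splits)))))
(defn p-or [chunk-size s]
  (p-or-reducer* chunk-size (map #(apply reductions %) s)))
;; e.g. (p-or 1000 predicates) for cheap steps, (p-or 1 predicates) for very expensive ones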
Will you consider adopting core.async to handle parallel tasks with async/go or async/thread, and early return with async/alts!?
For example, to turn the core or function from serial into parallel, we can create a macro (I called it por) to wrap the input forms (the predicate calls) in async/thread and then do a select over them with async/alts!!:
(defmacro por [& fns]
`(let [[v# c#] (async/alts!!
[~@(for [f fns]
(list `async/thread f))])]
v#))
(time
(por (do (println "running a") (Thread/sleep 30) :a)
(do (println "running b") (Thread/sleep 20) :b)
(do (println "running c") (Thread/sleep 10) :c)))
;; running a
;; running b
;; running c
;; "Elapsed time: 11.919169 msecs"
;; => :c
In comparison with the original or (which runs in serial):
(time
(or (do (println "running a") (Thread/sleep 30) :a)
(do (println "running b") (Thread/sleep 20) :b)
(do (println "running c") (Thread/sleep 10) :c)))
;; running a
;; => :a
;; "Elapsed time: 31.642506 msecs"
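For the question's stricter requirement that only a true result should short-circuit, here is a rough sketch along the same lines (my own; note that, unlike what the question asks for, it does not interrupt the still-running calls, it merely stops waiting for them):
(require '[clojure.core.async :as async])
(defn pany [thunks]
  ;; one result channel per no-arg predicate call
  (let [chans (mapv #(async/thread (%)) thunks)]
    (loop [pending (set chans)]
      (if (empty? pending)
        false ;; every call finished with a falsey result
        (let [[v c] (async/alts!! (vec pending))]
          (if v
            true ;; first truthy result wins
            (recur (disj pending c)))))))) ;; that call was falsey; keep waiting on the rest
;; (pany [#(do (Thread/sleep 50) true)
;;        #(do (Thread/sleep 10) false)
;;        #(while true)]) ;; => true after ~50 ms, despite the non-terminating call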
Let's say I have a huge lazy seq and I want to iterate over it so I can process the data I get during the iteration.
The thing is, I want to lose the head of the lazy seq (have the processed part GC'd) so that I can work on seqs with millions of elements without getting an OutOfMemoryError.
I have 3 examples that I'm not sure about.
Could you provide best practices (examples) for that purpose?
Do these functions lose the head?
Example 1
(defn lose-head-fn
[lazy-seq-coll]
(when (seq (take 1 lazy-seq-coll))
(do
;;do some processing...(take 10000 lazy-seq-coll)
(recur (drop 10000 lazy-seq-coll)))))
Example 2
(defn lose-head-fn
[lazy-seq-coll]
(loop [i lazy-seq-coll]
(when (seq (take 1 i))
(do
;;do some processing...(take 10000 i)
(recur (drop 10000 i))))))
Example 3
(doseq [i lazy-seq-coll]
;;do some processing...
)
Update: Also there is an explanation in this answer here
copy of my above comments
As far as I know, all of the above would lose the head (the first two are obvious, since you manually drop the head, while doseq's docstring claims that it doesn't retain the head).
That means that if the lazy-seq-coll you pass to the function isn't bound somewhere else with def or let and used later, there should be nothing to worry about. So (lose-head-fn (range)) won't eat all your memory, while
(def r (range))
(lose-head-fn r)
probably would.
And the only best practice I could think of is not to def possibly infinite (or just huge) sequences, because all of their realized items would live forever in the var.
In general, you must be careful not to retain a reference, either locally or globally, to a part of a lazy seq that precedes another part whose realization involves excessive computation.
For example:
(let [nums (range)
first-ten (take 10 nums)]
(+ (last first-ten) (nth nums 100000000)))
=> 100000009
This takes about 2 seconds on a modern machine. How about this though? The difference is the last line, where the order of arguments to + is swapped:
;; Don't do this!
(let [nums (range)
first-ten (take 10 nums)]
(+ (nth nums 100000000) (last first-ten)))
You'll hear your chassis/cpu fans come to life, and if you're running htop or similar, you'll see memory usage grow rather quickly (about 1G in the first several seconds for me).
What's going on?
Much like a linked list, elements in a lazy seq in Clojure reference the portion of the seq that comes next. In the second example above, first-ten is needed for the second argument to +. Thus, even though nth is happy to hold no references to anything (after all, it's just finding an index in a long list), first-ten refers to a portion of the sequence that, as stated above, must hold onto references to the rest of the sequence.
The first example, by contrast, computes (last first-ten), and after this, first-ten is no longer used. Now the only reference to any portion of the lazy sequence is nums. As nth does its work, each portion of the list that it's finished with is no longer needed, and since nothing else refers to the list in this block, as nth walks the list, the memory taken by the sequence that has been examined can be garbage collected.
Consider this:
;; Don't do this!
(let [nums (range)]
(time (nth nums 1e8))
(time (nth nums 1e8)))
Why does this have a similar result as the second example above? Because the sequence will be cached (held in memory) on the first realization of it (the first (time (nth nums 1e8))), because nums is being used on the next line. If, instead, we use a different sequence for the second nth, then there is no need to cache the first one, so it can be discarded as it's processed:
(let [nums (range)]
(time (nth nums 1e8))
(time (nth (range) 1e8)))
"Elapsed time: 2127.814253 msecs"
"Elapsed time: 2042.608043 msecs"
So as you work with large lazy seqs, consider whether anything is still pointing to the list, and if anything is (global vars being a common one), then it will be held in memory.
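One practical pattern (a small sketch of the advice above) is to hold a producer function in a var instead of the sequence itself, so no var ever retains a realized head:
(defn nums [] (range)) ;; the var holds a function, not a realized sequence
(time (nth (nums) 1e8)) ;; each call builds a fresh seq...
(time (nth (nums) 1e8)) ;; ...which can be GC'd as nth walks it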
What is the correct way, in Clojure, to do parallel processing when each job of the processing can occur in utter isolation and may generate a list of additional jobs that need to be evaluated?
My actual problem is a nutritional calculation problem, but I will put this in the form of Chess which shares the same problem space traits as my calculation.
Assume, for instance, that I am trying to find all of the moves to Checkmate in a game of Chess. When searching through the board states, I would start out with 20 possible states, each representing a different possible opening move. Each of those will need to be evaluated, accepted or rejected, and then for each accepted move, a new list of jobs would be created representing all of the possible next moves. The jobs would look like this:
initial: '([] proposed-move)
accepted: '([move] proposed-response)
'([move move] proposed-response)
The number of states to evaluate grows as a result of each computation, and each state can be evaluated in complete isolation from all of the others.
A solution I am playing with goes as such:
; a list of all final solutions, each of which is a sequence of moves
(def solutions (agent []))
; a list of all jobs pending evaluation
(def jobs (agent []))
Given these definitions, I would have a java thread pool, and each thread would request a job from the jobs agent (and wait for that request to be fulfilled). It would then run the calculation, generate a list of solutions and possible solutions. Finally, it would send the solutions to the solutions agent, and the possible solutions to the jobs agent.
Is using a combination of agents and threads the most idiomatic way to go in this case? Can I even get data out of the job queue in the way I am proposing?
Or should my jobs be a java.util.concurrent.LinkedBlockingQueue, as described in Producer consumer with qualifications?
You can do this with the following approach:
Repeated applications of pmap (which provides parallel processing of all elements in a collection)
The function used in pmap returns a list of elements. Could be zero, one or multiple elements, which will then be processed in the next iteration
The results get recombined with concat
You repeat the processing of the list for as many times as you like, perhaps storing the result in an atom.
Example code could be something like the following
(def jobs (atom '(1 10 100)))
(defn process-element [value]
(if (< (rand) 0.8)
[(inc value)]
[]))
(defn do-processing []
(swap! jobs
(fn [job-list] (apply concat (pmap process-element job-list)))))
(while (seq @jobs)
  (prn @jobs)
(do-processing))
Which could produce output like:
(1 10 100)
(2 11 101)
(3 12 102)
(4 13 103)
(5 14 104)
(6 15 105)
(7 106)
(107)
(108)
(109)
nil
Note that you need to be a bit careful to make sure your algorithm terminates! In the example this is guaranteed by the elements dying off over time, but if your search space is growing then you will probably want to apply a time limit instead of just using a (while ...) loop.
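For example, reusing the jobs atom and do-processing from above, a simple wall-clock bound might look like this (the 10-second budget is an arbitrary choice):
(let [deadline (+ (System/currentTimeMillis) 10000)]
  (while (and (seq @jobs)
              (< (System/currentTimeMillis) deadline))
    (do-processing)))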
Your approach with agents and threads seems quite close to (what I see as) idiomatic Clojure.
The only thing I would change to make it more "Clojure-like" would be to use pmap to iterate over the queue stored in an agent. Using pmap instead of your own thread pool will save you the effort of managing that pool, because pmap already uses Clojure's thread pool, which is initialized properly for the current number of processors. It also helps you take advantage of sequence chunking (which perhaps could help).
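A rough sketch of that idea (the evaluator is a toy stand-in, and the agent names simply mirror the question's):
(def jobs (agent [[:start]]))
(def solutions (agent []))
(defn evaluate [moves]
  ;; toy evaluator: a line of play counts as a "solution" once it is three moves deep,
  ;; otherwise it branches into two follow-up jobs
  (if (>= (count moves) 3)
    {:found [moves] :next-jobs []}
    {:found []      :next-jobs [(conj moves :a) (conj moves :b)]}))
(defn process-round! []
  (let [batch @jobs]
    (send jobs #(vec (drop (count batch) %))) ;; remove the batch we are handling
    (doseq [{:keys [found next-jobs]} (pmap evaluate batch)]
      (send solutions into found)
      (send jobs into next-jobs))))
(while (seq @jobs)
  (process-round!)
  (await jobs solutions))
;; @solutions now holds four move sequences, each three plies deep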
You could also use channels. Maybe something like this:
(require '[clojure.core.async :refer [chan go <! >! >!!]])
(def jobs (chan))
(def solutions (chan))
(def accepted-solutions (atom (vector)))
(go (loop [job (<! jobs)]
      (when job
        (go (doseq [solution (process-job-into-solutions job)]
              (>! solutions solution)))
        (recur (<! jobs)))))
(go (loop [solution (<! solutions)]
      (when solution
        (when (acceptable? solution)
          (swap! accepted-solutions conj solution)
          (doseq [new-job (generate-new-jobs solution)]
            (>! jobs new-job)))
        (recur (<! solutions)))))
(>!! jobs initial-job)
I'm looking for a macro that will throw an exception if an expression takes longer than X seconds to complete.
This question has better answers here:
Executing a function with a timeout
Futures to the rescue!
user=> (let [f (future (reduce * (range 1 1001)))]
(.get f 1 java.util.concurrent.TimeUnit/MILLISECONDS))
java.util.concurrent.TimeoutException (NO_SOURCE_FILE:0)
And to make a macro of it:
(defmacro time-limited [ms & body]
`(let [f# (future ~@body)]
(.get f# ~ms java.util.concurrent.TimeUnit/MILLISECONDS)))
So you can do this:
user=> (time-limited 1 (reduce * (range 1 1001)))
java.util.concurrent.TimeoutException (NO_SOURCE_FILE:0)
user=> (time-limited 1 (reduce * (range 1 101)))
93326215443944152681699238856266700490715968264381621468592963895217599993229915
608941463976156518286253697920827223758251185210916864000000000000000000000000
I'm not sure this is possible without running the expression in a separate thread. The reason being, if the thread is busy processing the expression, you can't inject code to throw an exception.
A version with a monitor thread that throws an exception if the expression takes too long is definitely possible, however, the exception thrown would be from the monitor thread, not the thread in which the expression is running. Then, there'd be no way of stopping it short of sending that thread an interrupt, which it might ignore if you haven't coded for it in the expression.
If it's acceptable to have a version which runs the expression in a separate thread, let me know and I can post some sample code. Otherwise, your best bet sounds like it would be to write the main loop/recursion of the expression in such a way that it checks how long it has taken on every iteration and throws an exception if it has exceeded the bound. Sorry if that's not quite what you need...
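For illustration, a loop written in that style might look like this (a toy summing loop, purely as a sketch of the idea):
(defn sum-with-deadline [n timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop [i 0, acc 0]
      (cond
        (> (System/currentTimeMillis) deadline)
        (throw (ex-info "Timed out" {:completed-iterations i}))
        (< i n) (recur (inc i) (+ acc i))
        :else acc))))
;; (sum-with-deadline 1e6 100)  ;; normally finishes well inside the budget
;; (sum-with-deadline 1e12 100) ;; throws after roughly 100 ms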
I came across this thread recently while asking the same question. I wasn't completely satisfied with the answers given, so I cobbled together an alternative solution. This solution will run your code in the current thread and spin off a future to interrupt it after a set timeout in ms.
(import '[java.util.concurrent TimeoutException])
(defn invoke-timeout [f timeout-ms]
  (let [thr (Thread/currentThread)
        fut (future (Thread/sleep timeout-ms)
                    (.interrupt thr))]
    (try (f)
         (catch InterruptedException e
           (throw (TimeoutException. "Execution timed out!")))
         (finally
           (future-cancel fut)))))
(defmacro timeout [ms & body] `(invoke-timeout (fn [] ~@body) ~ms))
You would use it in your code like this:
(timeout 1000 your-code)
OR
(invoke-timeout #(your-code) 1000)
One caveat to keep in mind is that your-code must not catch the InterruptedException used to trigger the TimeoutException. I use this for testing and it works well.
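For example, a body that swallows the interrupt itself defeats the mechanism: invoke-timeout never sees the InterruptedException, so no TimeoutException is thrown and the call simply returns whatever the body returns.
(timeout 100
  (try
    (Thread/sleep 10000)
    (catch InterruptedException _
      :interrupt-swallowed)))
;; => :interrupt-swallowed (after roughly 100 ms), instead of a TimeoutException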
See the Thread.interrupt() javadoc for additional caveats.
You can see this code in use here.