Does alter use locks internally? - concurrency

(def a (ref 0))
(def b (ref 0))
(def f1 (future
          (dosync
            (println "f1 start")
            (alter a inc)
            (Thread/sleep 500)
            (alter b inc)
            (println "f1 end"))))
(def f2 (future
          (dosync
            (println "f2 start")
            (alter a inc)
            (println "f2 end"))))
@f1 @f2
In the example above, I expected thread f2 to finish before f1. Although f1 reaches the expression (alter a inc) before f2 does, f1 then continues its time-consuming execution, so f2 should commit first. When f1 then tries to commit, it should find that ref a has been modified and retry.
But the result showed I was wrong; it printed the following:
f1 start
f2 start
f2 start
f2 start
f2 start
f2 start
f1 end
f2 start
f2 end
It's f2 that retried. It seems as if f1 "locked" the ref at (alter a inc), and f2 had to wait for f1 to "release the lock" before it could successfully commit its alteration.
What is the underlying mechanism?

Your program illustrates one problem of the STM: when you have multiple transactions working on the same state, those transactions must essentially be serialisable, that is, they must behave as if they ran serially (one after the other, each in its entirety).
That is why long-running transactions are really a very bad thing as they may cause all other transactions that are working on the same refs to retry, even if in theory they could finish very quickly.
commute is the tool that is provided to mitigate this problem. If you know that certain operations in different transactions alter the same refs but don’t (logically) interfere with each other because they are commutative operations, you can use commute instead of alter to loosen the serialisability requirement.
And yes, STM transactions use locks internally. Basically, you can think of (alter a inc) as obtaining the ‘write lock’ on ref a and failing and retrying if it is already taken – consider this an implementation detail. (There are complications: under stress, older transactions are allowed to break through a lock held by a younger transaction; also, the STM uses timeouts internally, so using timeouts in your program brings these implementation details to the surface.)
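As a rough illustration of the commute suggestion, here is a minimal sketch, assuming a plain shared counter. The names count-with-commute and counter are hypothetical and not taken from the question. Because inc is commutative, the transactions do not need to be ordered against each other on this ref; note that a commuted function may be run more than once, so it should be free of side effects.

```clojure
(defn count-with-commute
  "Hypothetical helper: runs n transactions in parallel, each of which
  increments the same ref with commute instead of alter."
  [n]
  (let [counter (ref 0)
        ;; doall forces the lazy seq so that all futures are started now.
        futs    (doall (repeatedly n #(future (dosync (commute counter inc)))))]
    ;; Block until every transaction has committed.
    (doseq [f futs] @f)
    @counter))

(println (count-with-commute 10))
```

Whether commute is appropriate depends on the operation: it only makes sense when the final value does not depend on the order in which the transactions commit.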

It seems to vary depending on the timing. I tried your version and got the same results. Then I changed it a bit and got different results:
(let [a  (ref 0)
      b  (ref 0)
      f1 (future
           (dosync
             (println "f1 start")
             (alter a inc)
             (Thread/sleep 500)
             (alter b inc)
             (println "f1 end")))
      f2 (future
           (dosync
             (println "f2 start")
             (alter a inc)
             (println "f2 end")))]
  (println @f1)
  (println @f2)
  (println @a)
  (println @b))
with results:
Testing tst.demo.core
f1 start
f2 start
f2 start
f2 end
f1 start
f1 end
(clojure.core/deref f1) => nil
(clojure.core/deref f2) => nil
(clojure.core/deref a) => 2
(clojure.core/deref b) => 1
So the detailed timing of the threads on any particular machine at any particular time seems to make a big difference.
I will admit that I am surprised, as I would have predicted the same thing as you. Neither your result nor this new one is what I would have expected.
Update
I modified it a bit and then got the expected result:
(let [a  (ref 0)
      b  (ref 0)
      f1 (future
           (dosync
             (println "f1 start")
             (alter a inc)
             (Thread/sleep 500)
             (alter b inc)
             (println "f1 end")))
      f2 (future
           (dosync
             (Thread/sleep 100)
             (println "f2 start")
             (alter a inc)
             (println "f2 end")))]
  (println @f1)
  (println @f2)
  (println @a)
  (println @b))
f1 start
f2 start
f2 end
f1 start
f1 end
(clojure.core/deref f1) => nil
(clojure.core/deref f2) => nil
(clojure.core/deref a) => 2
(clojure.core/deref b) => 1

Related

Clojure STM simple program

I am writing a simple program to transform the values of two integers using Clojure's STM.
I am following the approach of Lewandowski (http://lewandowski.io/2016/01/clojure-summary/), using his function a06.
My code:
(defn trans [p1 p2]
  (println "a")
  (dosync
    (let [newval1 (@p1 + 50)
          newval2 (@p2 - 30)]
      (do
        (println "b")
        (ref-set p1 newval1)
        (ref-set p2 newval2)))))
(defn main []
  (let [p1 (ref 20) p2 (ref 100)]
    (do
      (future (trans p1 p2))
      (future (trans p1 p2))
      (Thread/sleep 500))
    (println @p1)
    (println @p2)))
For some reason, my function main is not getting into the "do part" of the function trans. My output is hence only:
a
a
20
100
=> nil
I added "a" and "b" to show my problem.
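I am sorry if this is a very simple mistake I made but I simply don't see what is missing in my code. Every answer is much appreciated! Thank you!
Clojure uses prefix notation, so writing the arithmetic in infix order tries to call the value of p1 (the number 20) as a function. That throws inside the future, and because the future is never dereferenced, the exception is never shown and "b" is never printed. Compare to the following, which works correctly: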
(defn trans [p1 p2]
  (println "a")
  (dosync
    (let [newval1 (+ @p1 50)
          newval2 (- @p2 30)]
      (do
        (println "b")
        (ref-set p1 newval1)
        (ref-set p2 newval2)))))
(defn main []
  (let [p1 (ref 20) p2 (ref 100)]
    (let [f1 (future (trans p1 p2))
          f2 (future (trans p1 p2))]
      (Thread/sleep 500))
    (println (pr-str @p1))
    (println (pr-str @p2))))
That said, futures aren't really the ideal tool for the job when your interest is in a side effect, rather than in later retrieval of the result of the calculation itself.
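The silent failure in the question can be made visible by dereferencing the future. The following is a minimal sketch that reproduces the infix mistake on purpose; the names broken and failure are hypothetical and only serve the illustration.

```clojure
(def p1 (ref 20))

;; Hypothetical reproduction of the infix mistake from the question.
(def broken
  (future
    (dosync
      ;; (@p1 + 50) calls the number 20 as a function, which throws.
      (ref-set p1 (@p1 + 50)))))

(def failure
  (try
    @broken
    :no-error
    (catch java.util.concurrent.ExecutionException e
      ;; The exception thrown inside the future is stored as the cause.
      (class (.getCause e)))))

(println failure)
(println @p1) ; the transaction was aborted, so p1 is unchanged
```

Dereferencing every future whose outcome matters is one simple way to avoid losing exceptions like this.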

Memoize over one parameter

I have a function which takes two inputs which I would like to memoize. The output of the function only depends on the value of the first input, the value of the second input has no functional effect on the outcome (but it may affect how long it takes to finish). Since I don't want the second parameter to affect the memoization I cannot use memoize. Is there an idiomatic way to do this or will I just have to implement the memoization myself?
I'd recommend using a cache (like clojure.core.cache) for this instead of function memoization:
(require '[clojure.core.cache :as cache])

(defonce result-cache
  (atom (cache/fifo-cache-factory {})))
(defn expensive-fun [n s]
  (println "Sleeping" s)
  (Thread/sleep s)
  (* n n))
(defn cached-fun [n s]
  ;; Only n is used as the cache key; s just controls the delay on a miss.
  (cache/lookup
    (swap! result-cache
           #(cache/through
              (fn [k] (expensive-fun k s))
              %
              n))
    n))
(cached-fun 111 500)
Sleeping 500
=> 12321
(cached-fun 111 600) ;; returns immediately regardless of 2nd arg
=> 12321
(cached-fun 123 600)
Sleeping 600
=> 15129
memoize doesn't support caching on only some of the args, but it's pretty easy to build it yourself:
(defn search* [a b]
  (* a b))
(def search
  (let [mem (atom {})]
    (fn [a b]
      (or (when-let [cached (get @mem a)]
            (println "retrieved from cache")
            cached)
          (let [ret (search* a b)]
            (println "storing in cache")
            (swap! mem assoc a ret)
            ret)))))
You can wrap your function in another function (with one parameter) that calls it with a default value for the second parameter. Then you can memoize the new function.
(defn foo
  [param1]
  (baz param1 default-value))
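A minimal sketch of that wrapping approach, assuming a stand-in for the expensive function. The names baz, default-delay-ms and the calls counter are hypothetical; the counter is only there to show that the second call is served from the cache.

```clojure
;; Hypothetical stand-ins: `calls` counts how often the expensive function runs.
(def calls (atom 0))

(defn baz
  "Result depends only on n; delay-ms only affects how long it takes."
  [n delay-ms]
  (swap! calls inc)
  (Thread/sleep delay-ms)
  (* n n))

(def default-delay-ms 10)

;; memoize only ever sees the first argument, so the cache is keyed on n alone.
(def foo
  (memoize (fn [n] (baz n default-delay-ms))))

(println (foo 111) (foo 111) @calls)
;; prints: 12321 12321 1
```

The trade-off is that the second parameter is fixed at definition time, so this only fits when a single default is acceptable for every cache miss.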

Is there a way to be notified when a clojure future finishes?

Is there a way to set a watch on a future so that it triggers a callback when it is done?
something like this?
> (def a (future (Thread/sleep 1000) "Hello World!"))
> (when-done a (println @a))
...waits for 1sec...
;; => "Hello World"
You can start another task that watches the future and then runs the function. In this case I'll just use another future, which wraps up nicely into a when-done function:
user=> (defn when-done [future-to-watch function-to-call]
         (future (function-to-call @future-to-watch)))
user=> (def meaning-of-the-universe
         (let [f (future (Thread/sleep 10000) 42)]
           (when-done f #(println "future available and the answer is:" %))
           f))
#'user/meaning-of-the-universe
... waiting ...
user=> future available and the answer is: 42
user=> @meaning-of-the-universe
42
For very simple cases:
If you don't want to block and don't care about the result, just add the callback in the future definition:
(future (a-taking-time-computation) (the-callback))
If you care about the result, compose the callback with the computation:
(future (the-callback (a-taking-time-computation)))
or
(future (-> input a-taking-time-computation callback))
Semantically speaking, the Java equivalent would be:
final MyCallBack callbackObj = new MyCallBack();
new Thread() {
    public void run() {
        aTakingTimeComputation();
        callbackObj.call();
    }
}.start();
For complex cases you may want to look at:
https://github.com/ztellman/manifold
https://github.com/clojure/core.async
I found this thread on Google, which looks interesting:
https://groups.google.com/forum/?fromgroups=#!topic/clojure-dev/7BKQi9nWwAw
http://dev.clojure.org/display/design/Promises
https://github.com/stuartsierra/cljque/blob/master/src/cljque/promises.clj
An extension to the accepted answer: please note the following caveat. With the above when-done implementation, the callback will not be invoked if the future is cancelled. That is,
(do
  (def f0 (future (Thread/sleep 1000)))
  (when-done f0 (fn [e] (println "THEN=>" e)))
  (Thread/sleep 500)
  (future-cancel f0))
will not print, because the deref call on the cancelled future raises an exception.
If you need the callback to run in that case as well, I suggest:
(defn then
  "Run a future waiting on another, and invoke
  the callback with the exit value if the future completes,
  or invoke the callback with no args if the future is cancelled"
  [fut cb]
  (future
    (try
      (cb @fut)
      (catch java.util.concurrent.CancellationException e
        (cb)))))
So now this will print:
(do
  (def f0 (future (Thread/sleep 1000)))
  (then f0 (fn
             ([] (println "CANCELLED"))
             ([e] (println "THEN=>" e))))
  (Thread/sleep 500)
  (future-cancel f0))
Instead of adding extra time with Thread/sleep, I'm relying on the fact that @future-ref, for any reference to a future, blocks until the future is done.
(defn wait-for
  [futures-to-complete]
  ;; Deref each future in turn; returns true once all of them have finished.
  (every? #(do @% true) futures-to-complete))

Are there any good libraries or strategies for testing multithreaded applications in clojure?

I'm curious if anyone has come up with a good strategy for testing multithreaded apps.
I do a lot of testing with midje, which is great for testing functions... but I'm not really sure how to test multithreaded code without it looking really hacky:
(fact "the state is modified by a thread call"
  (Thread/sleep 100)
  (check-state-eq *state* nil)
  (Thread/sleep 100)
  (modify-state-thread-call *state* :updated-value)
  (Thread/sleep 100)
  (check-state-eq *state* :updated-value))
Sometimes, because of compilation time, my tests fail because a state was not updated in time, so then I have to sleep for longer. Ideally, I would want a way to write something like:
(fact "the state is modified by a thread call"
  (modify-state-thread-call *state* :updated-value)
  =leads-to=> (check-state-eq *state* :updated-value))
and move away from the sleeps. Are there strategies to do that?
If *state* in this example is one of the clojure reference types, you can add a function that is notified of every change to that object using add-watch: http://clojuredocs.org/clojure_core/clojure.core/add-watch
An approach I might suggest is to use a watch to deliver a promise when the condition is satisfied.
(let [check-promise (promise)]
  (add-watch *state* :check-for-updated-value
             (fn [rkey refr _oldval newval]
               (when (some-check newval)
                 (remove-watch refr rkey)
                 (deliver check-promise true))))
  (modify-state-thread-call *state* :updated-value)
  (deref check-promise 1000 false))
This returns true as soon as *state* takes on a value that satisfies some-check, provided that happens within 1000 ms; otherwise it returns false once the 1000 ms have elapsed.
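The watch-and-promise idea can be packaged as a small helper. This is a minimal sketch, assuming the state is an atom (any reference type that supports add-watch should behave the same way); the name leads-to? and its parameters are hypothetical and not part of midje.

```clojure
(defn leads-to?
  "Hypothetical helper: installs a watch on state, calls trigger!, and returns
  true if a new value satisfying pred shows up within timeout-ms, else false."
  [state trigger! pred timeout-ms]
  (let [done (promise)]
    (add-watch state ::leads-to
               (fn [_key _ref _old new]
                 (when (pred new)
                   (deliver done true))))
    ;; The watch is installed before triggering, so the update cannot be missed.
    (trigger!)
    (let [result (deref done timeout-ms false)]
      (remove-watch state ::leads-to)
      result)))

(def state (atom nil))

(println
  (leads-to? state
             #(future (Thread/sleep 50) (reset! state :updated-value))
             #(= % :updated-value)
             1000))
```

The timeout still bounds how long a failing test takes, but a passing test returns as soon as the update arrives rather than after a fixed sleep.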
Based on Crate's response, I've created a wait function:
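;; hash-keyword is not defined in the post; this definition is an assumption
;; that simply derives a unique watch key from the object's hash.
(defn hash-keyword [obj]
  (keyword (str "__" (hash obj) "__")))
(defn return-val [p ms ret]
  (cond (nil? ms) (deref p)
        :else     (deref p ms ret)))
(defn wait
  ([f rf] (wait f rf nil nil))
  ([f rf ms] (wait f rf ms nil))
  ([f rf ms ret]
   (let [p    (promise)
         pk   (hash-keyword p)
         d-fn (fn [_ rf _ _]
                (remove-watch rf pk)
                (deliver p rf))]
     (add-watch rf pk d-fn)
     (f rf)
     (return-val p ms ret))))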
Its usage is:
(defn threaded-inc [rf]
  (future
    (Thread/sleep 100)
    (dosync (alter rf inc)))
  rf)
(def arf (ref 0))
(deref (threaded-inc arf)) ;=> 0
(dosync (ref-set arf 0))
(deref (wait threaded-inc arf)) ;=> 1

Dead simple Fork-Join concurrency in Clojure

I have two costly functions that are independent. I want to run them in parallel. I don't want to deal with futures and such (I'm new to Clojure and easily confused).
I'm looking for a simple way to run two functions concurrently. I want it to work like the following
(defn fn1 [input] ...) ; costly
(defn fn2 [input] ...) ; costly
(let [[out1 out2] (conc (fn1 x) (fn2 y))] ...)
I want this to return a vector with a pair of outputs. It should only return once both threads have terminated. Ideally conc should work for any number of inputs. I suspect this is a simple pattern.
Using futures is very easy in Clojure. At any rate, here is an answer that avoids them:
(defn conc [& fns]
  (doall (pmap (fn [f] (f)) fns)))
pmap uses futures under the hood. doall will force the sequence to evaluate. Since conc calls each function with no arguments, wrap the calls in anonymous functions:
(let [[out1 out2] (conc #(fn1 x) #(fn2 y))]
  [out1 out2])
Note that I destructured out1 and out2 in an attempt to preserve your example.
You do need a macro to preserve the desired syntax, though there are other ways of obtaining the same behavior, as the other answers indicate. Here is one way to do it:
(defn f1 [x] (Thread/sleep 500) 5)
(defn f2 [y] 2)
(defmacro conc [& exprs]
  `(map deref
        [~@(for [x# exprs] `(future ~x#))]))
(time (let [[a b] (conc (f1 6) (f2 7))]
        [a b]))
; "Elapsed time: 500.951 msecs"
;= [5 2]
The expansion shows how it works:
(macroexpand-1 '(conc (f1 6) (f2 7)))
;= (clojure.core/map clojure.core/deref [(clojure.core/future (f1 6))
;= (clojure.core/future (f2 7))])
You specified two functions but this should work with any number of expressions.
I understand you don't want your final solution to expose futures, though it is useful to illustrate how to do this with futures and then wrap them in something that hides this detail:
core> (defn fn1 [input] (java.lang.Thread/sleep 2000) (inc input))
#'core/fn1
core> (defn fn2 [input] (java.lang.Thread/sleep 3000) (* 2 input))
#'core/fn2
core> (time (let [f1 (future (fn1 4)) f2 (future (fn2 4))] @f1 @f2))
"Elapsed time: 3000.791021 msecs"
Then we can wrap that up in any of the many Clojure wrappers around futures, the simplest being just a function which takes two functions and runs them in parallel.
core> (defn conc [fn1 fn2]
        (let [f1 (future (fn1))
              f2 (future (fn2))] [@f1 @f2]))
#'core/conc
core> (time (conc #(fn1 4) #(fn2 4)))
"Elapsed time: 3001.197634 msecs"
This avoids the need to write it as a macro by having conc take the functions to run instead of the body to evaluate, and then creating the functions to pass to it by putting # in front of the calls.
This can also be written with map and future-call:
core> (map deref (map future-call [#(fn1 4) #(fn2 42)]))
(5 84)
You can then improve conc until it resembles (as Julien Chastang wisely points out) pmap.
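One possible next step in that direction is a variadic version built on future-call. This is a minimal sketch; the name conc-fns is hypothetical, and it assumes every argument is a zero-argument function.

```clojure
(defn conc-fns
  "Hypothetical variadic version of conc: takes zero-argument functions,
  runs each in its own future, and returns their results in order."
  [& thunks]
  ;; mapv is eager, so every future is started before the first deref blocks.
  (let [futs (mapv future-call thunks)]
    (mapv deref futs)))

(let [[out1 out2 out3] (conc-fns #(inc 4) #(* 2 4) #(str "done"))]
  (println out1 out2 out3))
;; prints: 5 8 done
```

Unlike pmap, this starts one future per function with no limit on parallelism, which is fine for a handful of costly calls but not for large collections.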