Dead simple Fork-Join concurrency in Clojure - concurrency

I have two costly functions that are independent. I want to run them in parallel. I don't want to deal with futures and such (I'm new to Clojure and easily confused).
I'm looking for a simple way to run two functions concurrently. I want it to work like the following
(defn fn1 [input] ...) ; costly
(defn fn2 [input] ...) ; costly
(let [[out1 out2] (conc (fn1 x) (fn2 y))] ...)
I want this to return a vector with a pair of outputs. It should only return once both threads have terminated. Ideally conc should work for any number of inputs. I suspect this is a simple pattern.

Using futures is very easy in Clojure. At any rate, here is an answer that avoids them
(defn conc [& fns]
(doall (pmap (fn [f] (f)) fns)))
pmap uses futures under the hood. doall will force the sequence to evaluate.
(let [[out1 out2] (conc fn1 fn2)]
[out1 out2])
Note, that I destructured out1 and out2 in an attempt to preserve your example.

You do need a macro to preserve the desired syntax, though there are other ways of obtaining the same behavior, as the other answers indicate. Here is one way to do it:
(defn f1 [x] (Thread/sleep 500) 5)
(defn f2 [y] 2)
(defmacro conc [& exprs]
`(map deref
[~#(for [x# exprs] `(future ~x#))]))
(time (let [[a b] (conc (f1 6) (f2 7))]
[a b]))
; "Elapsed time: 500.951 msecs"
;= (5 2)
The expansion shows how it works:
(macroexpand-1 '(conc (f1 6) (f2 7)))
;= (clojure.core/map clojure.core/deref [(clojure.core/future (f1 6))
;= (clojure.core/future (f2 7))])
You specified two functions but this should work with any number of expressions.

I understand you don't want your final solution to expose futures though it is useful to illustrate how to do this with futures, and then wrap them in something that hides this detail:
core> (defn fn1 [input] (java.lang.Thread/sleep 2000) (inc input))
#'core/fn1
core> (defn fn2 [input] (java.lang.Thread/sleep 3000) (* 2 input))
#'core/fn2
core> (time (let [f1 (future (fn1 4)) f2 (future (fn2 4))] #f1 #f2))
"Elapsed time: 3000.791021 msecs"
then we can wrap that up in any of the many clojure wrappers around futures. the simplest being just a function which takes two functions and runs them in parallel.
core> (defn conc [fn1 fn2]
(let [f1 (future (fn1))
f2 (future (fn2))] [#f1 #f2]))
#'core/conc
core> (time (conc #(fn1 4) #(fn2 4)))
"Elapsed time: 3001.197634 msecs"
This avoids the need to write it as a macro by having conc take the function to run instead of the body to evaluate, and then create the functions to pass to it by putting # infront of the calls.
This can also be written with map and future-call:
core> (map deref (map future-call [#(fn1 4) #(fn2 42)]))
(5 84)
You can then improce conc until it resembles (as Julien Chastang wisely points out) pmap

Related

Memoize over one parameter

I have a function which takes two inputs which I would like to memoize. The output of the function only depends on the value of the first input, the value of the second input has no functional effect on the outcome (but it may affect how long it takes to finish). Since I don't want the second parameter to affect the memoization I cannot use memoize. Is there an idiomatic way to do this or will I just have to implement the memoization myself?
I'd recommend using a cache (like clojure.core.cache) for this instead of function memoization:
(defonce result-cache
(atom (cache/fifo-cache-factory {})))
(defn expensive-fun [n s]
(println "Sleeping" s)
(Thread/sleep s)
(* n n))
(defn cached-fun [n s]
(cache/lookup
(swap! result-cache
#(cache/through
(fn [k] (expensive-fun k s))
%
n))
n))
(cached-fun 111 500)
Sleeping 500
=> 12321
(cached-fun 111 600) ;; returns immediately regardless of 2nd arg
=> 12321
(cached-fun 123 600)
Sleeping 600
=> 15129
memoize doesn't support caching only on some args, but's pretty easy to make it yourself:
(defn search* [a b]
(* a b))
(def search
(let [mem (atom {})]
(fn [a b]
(or (when-let [cached (get #mem a)]
(println "retrieved from cache")
cached)
(let [ret (search* a b)]
(println "storing in cache")
(swap! mem assoc a ret)
ret)))))
You can wrap you function into another function (with one parameter) and call it the function with second default parameter. Then you can memoize the new function.
(defn foo
[param1]
(baz param1 default-value))

Is is possible to destructure a clojure vector into last two items and the rest?

I know I can destructure a vector "from the front" like this:
(fn [[a b & rest]] (+ a b))
Is there any (short) way to access the last two elements instead?
(fn [[rest & a b]] (+ a b)) ;;Not legal
My current alternative is to
(fn [my-vector] (let [[a b] (take-last 2 my-vector)] (+ a b)))
and it was trying to figure out if there is way to do that in a more convenient way directly in the function arguments.
You can peel off the last two elements and add them thus:
((fn [v] (let [[b a] (rseq v)] (+ a b))) [1 2 3 4])
; 7
rseq supplies a reverse sequence for a vector in quick time.
We just destructure its first two elements.
We needn't mention the rest of it, which we don't do anything with.
user=> (def v (vec (range 0 10000000)))
#'user/v
user=> (time ((fn [my-vector] (let [[a b] (take-last 2 my-vector)] (+ a b))) v))
"Elapsed time: 482.965121 msecs"
19999997
user=> (time ((fn [my-vector] (let [a (peek my-vector) b (peek (pop my-vector))] (+ a b))) v))
"Elapsed time: 0.175539 msecs"
19999997
My advice would be to throw convenience to the wind and use peek and pop to work with the end of a vector. When your input vector is very large, you'll see tremendous performance gains.
(Also, to answer the question in the title: no.)

Is there a simpler way to memoize a recursive let fn?

Let's say you have a recursive function defined in a let block:
(let [fib (fn fib [n]
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2)))))]
(fib 42))
This can be mechanically transformed to utilize memoize:
Wrap the fn form in a call to memoize.
Move the function name in as the 1st argument.
Pass the function into itself wherever it is called.
Rebind the function symbol to do the same using partial.
Transforming the above code leads to:
(let [fib (memoize
(fn [fib n]
(if (< n 2)
n
(+ (fib fib (- n 1))
(fib fib (- n 2))))))
fib (partial fib fib)]
(fib 42))
This works, but feels overly complicated. The question is: Is there a simpler way?
I take risks in answering since I am not a scholar but I don't think so. You pretty much did the standard thing which in fine is a partial application of memoization through a fixed point combinator.
You could try to fiddle with macros though (for simple cases it could be easy, syntax-quote would do name resolution for you and you could operate on that). I'll try once I get home.
edit: went back home and tried out stuff, this seems to be ok-ish for simple cases
(defmacro memoize-rec [form]
(let [[fn* fname params & body] form
params-with-fname (vec (cons fname params))]
`(let [f# (memoize (fn ~params-with-fname
(let [~fname (partial ~fname ~fname)] ~#body)))]
(partial f# f#))))
;; (clojure.pprint/pprint (macroexpand '(memoize-rec (fn f [x] (str (f x))))))
((memoize-rec (fn fib [n]
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2)))))) 75) ;; instant
((fn fib [n]
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2))))) 75) ;; slooooooow
simpler than what i thought!
I'm not sure this is "simpler" per se, but I thought I'd share an approach I took to re-implement letfn for a CPS transformer I wrote.
The key is to introduce the variables, but delay assigning them values until they are all in scope. Basically, what you would like to write is:
(let [f nil]
(set! f (memoize (fn []
<body-of-f>)))
(f))
Of course this doesn't work as is, because let bindings are immutable in Clojure. We can get around that, though, by using a reference type — for example, a promise:
(let [f (promise)]
(deliver! f (memoize (fn []
<body-of-f>)))
(#f))
But this still falls short, because we must replace every instance of f in <body-of-f> with (deref f). But we can solve this by introducing another function that invokes the function stored in the promise. So the entire solution might look like this:
(let [f* (promise)]
(letfn [(f []
(#f*))]
(deliver f* (memoize (fn []
<body-of-f>)))
(f)))
If you have a set of mutually-recursive functions:
(let [f* (promise)
g* (promise)]
(letfn [(f []
(#f*))
(g []
(#g*))]
(deliver f* (memoize (fn []
(g))))
(deliver g* (memoize (fn []
(f))))
(f)))
Obviously that's a lot of boiler-plate. But it's pretty trivial to construct a macro that gives you letfn-style syntax.
Yes, there is a simpler way.
It is not a functional transformation, but builds on the impurity allowed in clojure.
(defn fib [n]
(if (< n 2)
n
(+ (#'fib (- n 1))
(#'fib (- n 2)))))
(def fib (memoize fib))
First step defines fib in almost the normal way, but recursive calls are made using whatever the var fib contains. Then fib is redefined, becoming the memoized version of its old self.
I would suppose that clojure idiomatic way will be using recur
(def factorial
(fn [n]
(loop [cnt n acc 1]
(if (zero? cnt)
acc
(recur (dec cnt) (* acc cnt))
;; Memoized recursive function, a mash-up of memoize and fn
(defmacro mrfn
"Returns an anonymous function like `fn` but recursive calls to the given `name` within
`body` use a memoized version of the function, potentially improving performance (see
`memoize`). Only simple argument symbols are supported, not varargs or destructing or
multiple arities. Memoized recursion requires explicit calls to `name` so the `body`
should not use recur to the top level."
[name args & body]
{:pre [(simple-symbol? name) (vector? args) (seq args) (every? simple-symbol? args)]}
(let [akey (if (= (count args) 1) (first args) args)]
;; name becomes extra arg to support recursive memoized calls
`(let [f# (fn [~name ~#args] ~#body)
mem# (atom {})]
(fn mr# [~#args]
(if-let [e# (find #mem# ~akey)]
(val e#)
(let [ret# (f# mr# ~#args)]
(swap! mem# assoc ~akey ret#)
ret#))))))
;; only change is fn to mrfn
(let [fib (mrfn fib [n]
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2)))))]
(fib 42))
Timings on my oldish Mac:
original, Elapsed time: 14089.417441 msecs
mrfn version, Elapsed time: 0.220748 msecs

Efficient side-effect-only analogue of Clojure's map function

What if map and doseq had a baby? I'm trying to write a function or macro like Common Lisp's mapc, but in Clojure. This does essentially what map does, but only for side-effects, so it doesn't need to generate a sequence of results, and wouldn't be lazy. I know that one can iterate over a single sequence using doseq, but map can iterate over multiple sequences, applying a function to each element in turn of all of the sequences. I also know that one can wrap map in dorun. (Note: This question has been extensively edited after many comments and a very thorough answer. The original question focused on macros, but those macro issues turned out to be peripheral.)
This is fast (according to criterium):
(defn domap2
[f coll]
(dotimes [i (count coll)]
(f (nth coll i))))
but it only accepts one collection. This accepts arbitrary collections:
(defn domap3
[f & colls]
(dotimes [i (apply min (map count colls))]
(apply f (map #(nth % i) colls))))
but it's very slow by comparison. I could also write a version like the first, but with different parameter cases [f c1 c2], [f c1 c2 c3], etc., but in the end, I'll need a case that handles arbitrary numbers of collections, like the last example, which is simpler anyway. I've tried many other solutions as well.
Since the second example is very much like the first except for the use of apply and the map inside the loop, I suspect that getting rid of them would speed things up a lot. I have tried to do this by writing domap2 as a macro, but the way that the catch-all variable after & is handled keeps tripping me up, as illustrated above.
Other examples (out of 15 or 20 different versions), benchmark code, and times on a Macbook Pro that's a few years old (full source here):
(defn domap1
[f coll]
(doseq [e coll]
(f e)))
(defn domap7
[f coll]
(dorun (map f coll)))
(defn domap18
[f & colls]
(dorun (apply map f colls)))
(defn domap15
[f coll]
(when (seq coll)
(f (first coll))
(recur f (rest coll))))
(defn domap17
[f & colls]
(let [argvecs (apply (partial map vector) colls)] ; seq of ntuples of interleaved vals
(doseq [args argvecs]
(apply f args))))
I'm working on an application that uses core.matrix matrices and vectors, but feel free to substitute your own side-effecting functions below.
(ns tst
(:use criterium.core
[clojure.core.matrix :as mx]))
(def howmany 1000)
(def a-coll (vec (range howmany)))
(def maskvec (zero-vector :vectorz howmany))
(defn unmaskit!
[idx]
(mx/mset! maskvec idx 1.0)) ; sets element idx of maskvec to 1.0
(defn runbench
[domapfn label]
(print (str "\n" label ":\n"))
(bench (def _ (domapfn unmaskit! a-coll))))
Mean execution times according to Criterium, in microseconds:
domap1: 12.317551 [doseq]
domap2: 19.065317 [dotimes]
domap3: 265.983779 [dotimes with apply, map]
domap7: 53.263230 [map with dorun]
domap18: 54.456801 [map with dorun, multiple collections]
domap15: 32.034993 [recur]
domap17: 95.259984 [doseq, multiple collections interleaved using map]
EDIT: It may be that dorun+map is the best way to implement domap for multiple large lazy sequence arguments, but doseq is still king when it comes to single lazy sequences. Performing the same operation as unmask! above, but running the index through (mod idx 1000), and iterating over (range 100000000), doseq is about twice as fast as dorun+map in my tests (i.e. (def domap25 (comp dorun map))).
You don't need a macro, and I don't see why a macro would be helpful here.
user> (defn do-map [f & lists] (apply mapv f lists) nil)
#'user/do-map
user> (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
note do-map here is eager (thanks to mapv) and only executes for side effects
Macros can use varargs lists, as the (useless!) macro version of do-map demonstrates:
user> (defmacro do-map-macro [f & lists] `(do (mapv ~f ~#lists) nil))
#'user/do-map-macro
user> (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
user> (macroexpand-1 '(do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40)))
(do (clojure.core/mapv (comp println +) (range 2 6) (range 8 11) (range 22 40)) nil)
Addendum:
addressing the efficiency / garbage-creation concerns:
note that below I truncate the output of the criterium bench function, for conciseness reasons:
(defn do-map-loop
[f & lists]
(loop [heads lists]
(when (every? seq heads)
(apply f (map first heads))
(recur (map rest heads)))))
user> (crit/bench (with-out-str (do-map-loop (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 11.367804 µs
...
This looks promising because it doesn't create a data structure that we aren't using anyway (unlike mapv above). But it turns out it is slower than the previous (maybe because of the two map calls?).
user> (crit/bench (with-out-str (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 7.427182 µs
...
user> (crit/bench (with-out-str (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 8.355587 µs
...
Since the loop still wasn't faster, let's try a version which specializes on arity, so that we don't need to call map twice on every iteration:
(defn do-map-loop-3
[f a b c]
(loop [[a & as] a
[b & bs] b
[c & cs] c]
(when (and a b c)
(f a b c)
(recur as bs cs))))
Remarkably, though this is faster, it is still slower than the version that just used mapv:
user> (crit/bench (with-out-str (do-map-loop-3 (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 9.450108 µs
...
Next I wondered if the size of the input was a factor. With larger inputs...
user> (def test-input (repeatedly 3 #(range (rand-int 100) (rand-int 1000))))
#'user/test-input
user> (map count test-input)
(475 531 511)
user> (crit/bench (with-out-str (apply do-map-loop-3 (comp println +) test-input)))
...
Execution time mean : 1.005073 ms
...
user> (crit/bench (with-out-str (apply do-map (comp println +) test-input)))
...
Execution time mean : 756.955238 µs
...
Finally, for completeness, the timing of do-map-loop (which as expected is slightly slower than do-map-loop-3)
user> (crit/bench (with-out-str (apply do-map-loop (comp println +) test-input)))
...
Execution time mean : 1.553932 ms
As we see, even with larger input sizes, mapv is faster.
(I should note for completeness here that map is slightly faster than mapv, but not by a large degree).

Piping data through arbitrary functions in Clojure

I know that the -> form can be used to pass the results of one function result to another:
(f1 (f2 (f3 x)))
(-> x f3 f2 f1) ; equivalent to the line above
(taken from the excellent Clojure tutorial at ociweb)
However this form requires that you know the functions you want to use at design time. I'd like to do the same thing, but at run time with a list of arbitrary functions.
I've written this looping function that does it, but I have a feeling there's a better way:
(defn pipe [initialData, functions]
(loop [
frontFunc (first functions)
restFuncs (rest functions)
data initialData ]
(if frontFunc
(recur (first restFuncs) (rest restFuncs) (frontFunc data) )
data )
) )
What's the best way to go about this?
I must admit I'm really new to clojure and I might be missing the point here completely, but can't this just be done using comp and apply?
user> (defn fn1 [x] (+ 2 x))
user> (defn fn2 [x] (/ x 3))
user> (defn fn3 [x] (* 1.2 x))
user> (defn pipe [initial-data my-functions] ((apply comp my-functions) initial-data))
user> (pipe 2 [fn1 fn2 fn3])
2.8
You can do this with a plain old reduce:
(defn pipe [x fs] (reduce (fn [acc f] (f acc)) x fs))
That can be shortened to:
(defn pipe [x fs] (reduce #(%2 %1) x fs))
Used like this:
user> (pipe [1 2 3] [#(conj % 77) rest reverse (partial map inc) vec])
[78 4 3]
If functions is a sequence of functions, you can reduce it using comp to get a composed function. At a REPL:
user> (def functions (list #(* % 5) #(+ % 1) #(/ % 3)))
#'user/my-list
user> ((reduce comp functions) 9)
20
apply also works in this case because comp takes a variable number of arguments:
user> (def functions (list #(* % 5) #(+ % 1) #(/ % 3)))
#'user/my-list
user> ((apply comp functions) 9)
20