I am reading Fogus' book The Joy of Clojure, and in the parallel programming chapter I saw a function definition which surely wants to illustrate something important, but I can't work out what. Moreover, I can't see what this function is for: when I execute it, it doesn't seem to do anything:
(import '(java.util.concurrent Executors))
(def *pool* (Executors/newFixedThreadPool
(+ 2 (.availableProcessors (Runtime/getRuntime)))))
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [_ exec-count] (f)))))
I tried to run in this way:
(defn wait [] (Thread/sleep 1000))
(dothreads! wait :thread-count 10 :exec-count 10)
(dothreads! wait)
(dothreads! #(println "running"))
...but it returns nil. Why?
So, here's the same code, tweaked slightly so that the function passed to dothreads! gets passed the count of the inner dotimes.
(import 'java.util.concurrent.Executors)
(def ^:dynamic *pool* (Executors/newFixedThreadPool (+ 2 (.availableProcessors (Runtime/getRuntime)))))
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [c exec-count] (f c)))))
(defn hello [name]
(println "Hello " name))
Try running it like this:
(dothreads! hello :threads 2 :times 4)
For me, it prints something to the effect of:
Hello 0
Hello 1
Hello 2
Hello 3
nil
user=> Hello 0
Hello 1
Hello 2
Hello 3
So, note one mistake you made when calling the function: you passed in :thread-count and :exec-count as the keys whereas those are actually the bindings in the destructuring that's happening inside dothreads!. The keywords are the words starting with a colon, :threads and :times.
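To make the key/binding distinction concrete, here is a minimal sketch. The function show-opts is a hypothetical helper (not from the book) that uses the same destructuring form as dothreads! and simply returns what the locals were bound to:

```clojure
;; Hypothetical helper: same keyword-argument destructuring as dothreads!.
;; :threads and :times are the keys callers pass;
;; thread-count and exec-count are only local names inside the function.
(defn show-opts
  [& {thread-count :threads
      exec-count   :times
      :or {thread-count 1 exec-count 1}}]
  {:thread-count thread-count :exec-count exec-count})

(show-opts :threads 2 :times 4)
;; => {:thread-count 2, :exec-count 4}

;; Unknown keys are silently ignored, so the defaults apply:
(show-opts :thread-count 10 :exec-count 10)
;; => {:thread-count 1, :exec-count 1}
```

This is why the original calls ran the function only once on one thread: the misspelled keys were ignored rather than rejected.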
As to what this code actually does:
It creates a new fixed-size thread pool that will use at most the number of cores in your machine + 2 threads. This pool is called *pool* and is created using the Java Executor framework. See [1] for more details.
The dothreads! function gets a function that will be called exec-count times on each of the thread-count threads. So, in the example above, you can clearly see it being called 4 times per thread (:threads being 2 and :times being 4).
The reason why this function returns nil is that the last expression in dothreads! is the dotimes form, and dotimes always returns nil. (The submit method of the thread pool does return a Future, but that value is discarded inside the loop.) If you were to add some other expression at the end of the function, making it:
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [c exec-count] (f c))))
(* thread-count exec-count))
It will return 8 for the example above (2 * 4). Only the value of the last expression in the function is returned, so if you were to write (fn [x y] (+ x y) (* x y)), it would always return the product. The sum would be evaluated, but for nothing. So, don't do this! If you want to add more than one expression to a function, make sure that all but the last one have side effects; otherwise they'll be useless.
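A small sketch of that rule. The names product-only, add-with-log and log are made up for illustration:

```clojure
;; The sum is computed and thrown away; only the product is returned.
(def product-only (fn [x y] (+ x y) (* x y)))

(product-only 2 3)
;; => 6

;; A non-final expression is only useful if it has a side effect,
;; here recording the call in an atom before returning the sum.
(def log (atom []))

(defn add-with-log [x y]
  (swap! log conj [:adding x y])
  (+ x y))

(add-with-log 2 3)
;; => 5
@log
;; => [[:adding 2 3]]
```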
You might also notice that the output is not printed in a fixed order. On my machine, it says hello 4 times, then prints the result of the function, and then says hello another 4 times. The order in which the functions are executed is undetermined between threads; however, the hellos are sequential within each thread (there can never be a Hello 3 before a Hello 2). The reason for the sequentiality is that the function actually submitted to the thread pool is #(dotimes [c exec-count] (f c)), so each submitted task runs all of its iterations, in order, on a single thread.
[1] http://download.oracle.com/javase/tutorial/essential/concurrency/executors.html
It's used afterwards in the book to run test functions multiple times in multiple threads. It doesn't illustrate anything by itself, but it's used to demonstrate locking, promises, and other parallel and concurrent stuff.
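If you want to see the effect from a test or at the REPL, one option is a variant that waits for the submitted tasks to finish. This is only a sketch, not the book's code: dothreads-wait!, pool and counter are hypothetical names. It relies on ExecutorService.submit returning a Future whose get method blocks until the task completes.

```clojure
(import '(java.util.concurrent Executors ExecutorService Callable Future))

;; A small pool just for this sketch (the book sizes it from the CPU count).
(def pool (Executors/newFixedThreadPool 4))

(defn dothreads-wait!
  "Like dothreads!, but blocks until every submitted task has finished
  and returns the total number of calls to f."
  [f & {thread-count :threads
        exec-count   :times
        :or {thread-count 1 exec-count 1}}]
  (let [task    (fn [] (dotimes [_ exec-count] (f)))
        ;; mapv is eager, so all tasks are submitted right away
        futures (mapv (fn [_] (.submit ^ExecutorService pool ^Callable task))
                      (range thread-count))]
    (doseq [^Future fut futures]
      (.get fut))                       ; wait for each task to complete
    (* thread-count exec-count)))

(def counter (atom 0))

(dothreads-wait! #(swap! counter inc) :threads 2 :times 4)
;; => 8
@counter
;; => 8

;; Pool threads are non-daemon; shut the pool down when finished with it.
(.shutdown ^ExecutorService pool)
```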
dotimes, dothreads!, and println are not pure functions: they're used to introduce side-effects. For example,
user=> (println 3)
3
nil
That code snippet prints 3 to the screen, but returns nil. Similarly, dothreads! is useful for its side-effects and not its return value.
(defn f1 []
(for [a [1 2]]
a))
;user=>(f1)
;(1 2)
(defn f2 []
(for [a [1 2]]
(prn a)))
;user=>(f2)
;1
;2
;(nil nil)
(defn f3 []
(for [a [1 2]]
(prn a))
'something-else)
;user=>(f3)
;something-else
Why does f3 not print 1 and 2 before printing 'something-else?
i.e. I expected and had assumed (wrongly) it would print the following:
; 1
; 2
; something-else
Came across this when using a for with a lot of code inside it and this played havoc with
my attempt to use prn statements to trace the value of variables while debugging.
i.e. prn and println do not print out. I think it's only when the for block is not the
final form in its enclosing form, but i'm still not sure what's going on.
The point being that a side-effect such as prn or println, should not require to be in return value position for it to fire. So there's something deeper with list comprehensions that I don't understand.
A notion as I write this - maybe in f3 the list comprehension is simply never evaluated, due to laziness? ... oh [censored].
Yes, that is indeed it:
(defn f4 []
(doall
(for [a [1 2]]
(prn a)))
'something-else)
user=> (f4)
;1
;2
;something-else
So, even though mostly solved, I will still post this question to consolidate the learning - would anyone care to post some examples of their own gotchas re. laziness.
As you have documented here, the macro for is lazy, and doesn't execute until it has to. You can force it by wrapping it in a (vec ...) form or a doall.
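A sketch that makes the difference observable without relying on printed output. The names calls, lazy-version and forced-version are made up; the atom counts how many times the body of the for actually ran:

```clojure
(def calls (atom 0))

(defn lazy-version []
  ;; The lazy seq is created and immediately discarded: the body never runs.
  (for [a [1 2]]
    (swap! calls inc))
  :something-else)

(defn forced-version []
  ;; doall walks the whole seq, so the body runs once per element.
  (doall
    (for [a [1 2]]
      (swap! calls inc)))
  :something-else)

(lazy-version)
@calls
;; => 0

(forced-version)
@calls
;; => 2
```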
For debugging purposes especially, I like to use spyx and forv from the Tupelo library to avoid all of these gotchas:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(defn f5 []
(forv [a [1 2]]
(spyx a))
(spy :returning :something-else))
(dotest
(spyx (f5)))
With result:
-------------------------------
Clojure 1.10.1 Java 14
-------------------------------
Testing tst.demo.core
a => 1
a => 2
:returning => :something-else
(f5) => :something-else
Ran 2 tests containing 1 assertions.
0 failures, 0 errors.
So you can see printed nicely:
the labeled value of a at each step of the forv loop.
the return value of the function with a custom label
the explicit value of the function call (f5)
nothing anywhere is lazy
spy, spyx, et al always return the value printed for further use
a convenient unit test via is= confirms the output value.
There is a nice template project that you can clone to get you started quickly.
You may also be interested in with-result, or just doseq. Don't forget the Clojure CheatSheet.
I was practicing one Clojure tutorial and had to ensure that a for loop was executed so I put a println command there, but it did not display messages.
So now I have got the question...
This code prints Tom's name:
(ns tutorial.core)
(defn -main []
(println 'Jane)
(for [a ['Tom]]
(println a))
;; 'Kate
)
tutorial.core> (-main)
Jane
Tom
(nil)
tutorial.core>
but this not:
(ns tutorial.core)
(defn -main []
(println 'Jane)
(for [a ['Tom]]
(println a))
'Kate
)
tutorial.core> (-main)
Jane
Kate
tutorial.core>
Why? In which cases can we expect that println will not print texts?
for is not a loop, it is a sequence comprehension which returns a lazy sequence. Your for expression will therefore only execute its side-effects (calls to println) when the returned sequence is evaluated. The REPL evaluates the values returned from your calls to -main so it can print them.
Your first example returns a lazy sequence which is evaluated by the REPL, causing the (println 'Tom) call to be evaluated. Since println returns nil, the resulting sequence contains a single nil value - this is the (nil) you see in the output.
Your second example creates the same sequence but does not evaluate it, instead 'Kate is returned from the function and the REPL prints that.
If you want an imperative for loop you should use doseq:
(defn -main []
(println 'Jane)
(doseq [a ['Tom]]
(println a))
'Kate)
As Lee says, if you only want side effects like printing, a doseq is the best solution as it never returns a value other than nil.
If you do want to use a for loop, you can remove the laziness by wrapping it inside a (vec ...) expression, which will force the for loop to run immediately. Thus we get:
(println :start)
(vec
(for [a [1 2 3]]
(println a)))
(println :end)
with result:
:start
1
2
3
:end
Without the vec, we get the behavior you saw:
(println :start)
(for [a [1 2 3]]
(println a))
(println :end)
with result:
:start
:end
I almost never want a lazy result, as the uncertainty over when a computation occurs can make debugging difficult. I use the above construct so often that I wrote a small macro forv that always returns a vector result, similar to the mapv function.
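For illustration, such a macro could look roughly like the following. This is a minimal sketch and not necessarily how the library version is written; it is named forv* here to make clear that it is a stand-in:

```clojure
;; Sketch of an eager `for`: same syntax, but the result is forced into a vector.
(defmacro forv*
  "Like `for`, but eagerly realizes the result into a vector."
  [& body]
  `(vec (for ~@body)))

(def seen (atom []))

(def squares
  (forv* [a [1 2 3]]
    (do (swap! seen conj a)   ; side effect runs immediately
        (* a a))))

squares
;; => [1 4 9]
@seen
;; => [1 2 3]
```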
It appears that apply forces the realization of four elements given a lazy sequence.
(take 1
(apply concat
(repeatedly #(do
(println "called")
(range 1 10)))))
=> "called"
=> "called"
=> "called"
=> "called"
Is there a way to do an apply which does not behave this way?
Thank You
Is there a way to do an apply which does not behave this way?
I think the short answer is: not without reimplementing some of Clojure's basic functionality. apply's implementation relies directly on Clojure's implementation of callable functions, and tries to discover the proper arity of the given function to .invoke by enumerating the input sequence of arguments.
It may be easier to factor your solution using functions over lazy, un-chunked sequences / reducers / transducers, rather than using variadic functions with apply. For example, here's your sample reimplemented with transducers and it only invokes the body function once (per length of range):
(sequence
(comp
(mapcat identity)
(take 1))
(repeatedly #(do
(println "called")
(range 1 10))))
;; called
;; => (1)
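Another option, if you prefer to stay with plain lazy sequences, is a small hand-written helper that concatenates one collection at a time instead of going through apply. This is only a sketch; lazy-concat-all and calls are hypothetical names and nothing like it ships in clojure.core:

```clojure
(defn lazy-concat-all
  "Lazily concatenates a (possibly infinite) seq of collections,
  realizing each inner collection only when it is reached."
  [colls]
  (lazy-seq
    (when-let [s (seq colls)]
      ;; The recursive call returns an unrealized lazy-seq, so the
      ;; rest of colls is not touched until the first one is exhausted.
      (concat (first s) (lazy-concat-all (rest s))))))

(def calls (atom 0))

(def first-item
  (doall
    (take 1
          (lazy-concat-all
            (repeatedly #(do (swap! calls inc)
                             (range 1 10)))))))

first-item
;; => (1)
@calls
;; => 1
```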
Digging into what's happening in your example with apply, concat, seq, LazySeq, etc.:
repeatedly returns a new LazySeq instance: (lazy-seq (cons (f) (repeatedly f))).
For the given 2-arity (apply concat <args>), apply calls RT.seq on its argument list, which for a LazySeq invokes LazySeq.seq, which in turn invokes your function.
apply then calls a Java implementation method, applyToHelper, which needs the length of the argument list so it can find the proper overload of IFn.invoke to call. It determines that length using RT.boundedLength, which internally calls next and, in turn, seq.
concat itself adds another layer of lazy-seq behavior.
You can see the stack traces of these invocations like this:
(take 1
(repeatedly #(do
(clojure.stacktrace/print-stack-trace (Exception.))
(range 1 10))))
The first trace descends from the apply's initial call to seq, and the subsequent traces from RT.boundedLength.
In fact, your code doesn't realize any of the items from the concatenated collections (ranges in your case), so the resulting collection is truly lazy as far as elements are concerned. The prints you get come from the function calls, which generate unrealized lazy seqs. This can easily be checked this way:
(defn range-logged [a b]
(lazy-seq
(when (< a b)
(println "realizing item" a)
(cons a (range-logged (inc a) b)))))
user> (take 1
(apply concat
(repeatedly #(do
(println "called")
(range-logged 1 10)))))
;;=> called
;; called
;; called
;; called
;; realizing item 1
(1)
user> (take 10
(apply concat
(repeatedly #(do
(println "called")
(range-logged 1 10)))))
;; called
;; called
;; called
;; called
;; realizing item 1
;; realizing item 2
;; realizing item 3
;; realizing item 4
;; realizing item 5
;; realizing item 6
;; realizing item 7
;; realizing item 8
;; realizing item 9
;; realizing item 1
(1 2 3 4 5 6 7 8 9 1)
So my guess is that you have nothing to worry about, as long as the collection returned by the function passed to repeatedly is lazy.
I'm learning core.async and have written some simple producer/consumer code:
(ns webcrawler.parallel
(:require [clojure.core.async :as async
:refer [>! <! >!! <!! go chan buffer close! thread alts! alts!! timeout]]))
(defn consumer
[in out f]
(go (loop [request (<! in)]
(if (nil? request)
(close! out)
(do (print f)
(let [result (f request)]
(>! out result))
(recur (<! in)))))))
(defn make-consumer [in f]
(let [out (chan)]
(consumer in out f)
out))
(defn process
[f s no-of-consumers]
(let [in (chan (count s))
consumers (repeatedly no-of-consumers #(make-consumer in f))
out (async/merge consumers)]
(map #(>!! in %1) s)
(close! in)
(loop [result (<!! out)
results '()]
(if (nil? result)
results
(recur (<!! out)
(conj results result))))))
This code works fine when I step in through the process function in debugger supplied with Emacs' cider.
(process (partial + 1) '(1 2 3 4) 1)
(5 4 3 2)
However, if I run it by itself (or hit continue in the debugger) I get an empty result.
(process (partial + 1) '(1 2 3 4) 1)
()
My guess is that in the second case for some reason producer doesn't wait for consumers before exiting, but I'm not sure why. Thanks for help!
The problem is that your call to map is lazy, and will not run until something asks for the results. Nothing does this in your code.
There are 2 solutions:
(1) Use the eager function mapv:
(mapv #(>!! in %1) items)
(2) Use the doseq, which is intended for side-effecting operations (like putting values on a channel):
(doseq [item items]
(>!! in item))
Both will work and produce output:
(process (partial + 1) [1 2 3 4] 1) => (5 4 3 2)
P.S. You have a debug statement in (defn consumer ...)
(print f)
that produces a lot of noise in the output:
<#clojure.core$partial$fn__5561 #object[clojure.core$partial$fn__5561 0x31cced7
"clojure.core$partial$fn__5561#31cced7"]>
That is repeated 5 times back to back. You probably want to avoid that, as printing function "refs" is pretty useless to a human reader.
Also, debug printouts in general should normally use println so you can see where each one begins and ends.
I'm going to take a safe stab that this is being caused by the lazy behavior of map, and this line that's carrying out side effects:
(map #(>!! in %1) s)
Because you never explicitly use the results, it never runs. Change it to use mapv, which is strict, or more correctly, use doseq. Never use map to run side effects. It's meant to lazily transform a list, and abuse of it leads to behaviour like this.
So why is it working while debugging? I'm going to guess because the debugger forces evaluation as part of its operation, which is masking the problem.
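The effect can be reproduced without core.async. In this sketch, send! is a hypothetical stand-in for the channel put and records its argument in an atom; run! (available since Clojure 1.7) is an eager alternative to doseq when you already have a one-argument function:

```clojure
(def sent (atom []))

(defn send!
  "Stand-in for a side-effecting call such as (>!! in x)."
  [x]
  (swap! sent conj x)
  nil)

(defn lazy-attempt [xs]
  (map send! xs)     ; lazy seq is built and thrown away, nothing is sent
  :done)

(defn eager-run [xs]
  (run! send! xs)    ; calls send! on every element immediately
  :done)

(lazy-attempt [1 2 3 4])
(def after-lazy @sent)
;; after-lazy => []

(eager-run [1 2 3 4])
(def after-eager @sent)
;; after-eager => [1 2 3 4]
```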
As you can read in its docstring, map returns a lazy sequence. I think the best way to force it is to use dorun. Here is an example from clojuredocs:
;;map a function which makes database calls over a vector of values
user=> (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"])
JdbcSQLException The object is already closed [90007-170] org.h2.message.DbE
xception.getJdbcSQLException (DbException.java:329)
;;database connection was closed before we got a chance to do our transactions
;;lets wrap it in dorun
user=> (dorun (map #(db/insert :person {:name %}) ["Fred" "Ethel" "Lucy" "Ricardo"]))
DEBUG :db insert into person values name = 'Fred'
DEBUG :db insert into person values name = 'Ethel'
DEBUG :db insert into person values name = 'Lucy'
DEBUG :db insert into person values name = 'Ricardo'
nil
I am trying to wrap a go routine inside a function (as in Method 1 below). Method 2 works perfectly fine, but Method 1 doesn't. The only difference is that in Method 1, I am passing a channel as a parameter to a function and the put is inside the function. What are the exact rules concerning go routines and functions?
(defn doit [ch i]
(print "g\n")
(async/>! ch i)
(print "f\n"))
;Method 1
(let [c1 (async/chan)]
(async/go (while true
(let [[v ch] (async/alts! [c1])]
(println "Read" v "from" ch))))
(dotimes [i 10]
(async/go (doit c1 i))))
;Method 2
(let [ch (async/chan)]
(async/go (while true
(let [[v ch] (async/alts! [ch])]
(println "Read" v "from" ch))))
(dotimes [i 10]
(async/go
(print "g\n")
(async/>! ch i)
(print "f\n"))))
I also noticed that if I remove the go in Method 1 and move it into the doit function as shown below, the function prints "g" but not "f", but otherwise works fine. Why?
(defn doit [ch i]
(async/go
(print "g\n")
(async/>! ch i)
(print "f\n")))
>! and <! are not really functions that get called. The go macro identifies these two symbols and auto-magically generates code according to the semantics of those two operators. The macro therefore has no way of knowing that a function uses the >! or <! operator internally, since all it sees is the form that calls that function.
Method 1 is actually throwing an exception on each call to doit since the actual code for >! and <! is just an assertion that always fails. Evaluating the code for this method in a REPLy session started with lein repl shows the exception Exception in thread "async-dispatch-46" java.lang.AssertionError: Assert failed: >! used not in (go ...) block a bunch of times (10 to be exact). If you are using a REPL through an nREPL client you might not be seeing this because the exception is thrown asynchronously in the server and the client is not taking this into account.
Additionally, instead of using (print "something\n") you could just use (println "something"). This is not really related to your question, but I thought I'd mention it.
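If you do want the put to live inside an ordinary function, one option is put!, which is a regular function and may be called outside a go block. The following is a sketch under the assumption that core.async is on the classpath; run-puts is a hypothetical helper added only so the result can be inspected:

```clojure
(require '[clojure.core.async :as async])

(defn doit
  "Plain function: put! is asynchronous and legal outside a go block."
  [ch i]
  (println "g")
  (async/put! ch i)
  (println "f"))

(defn run-puts
  "Calls doit n times, then collects the n values from the channel."
  [n]
  (let [c1 (async/chan)]
    (dotimes [i n]
      (doit c1 i))
    ;; async/take closes its output after n values; async/into gathers them.
    ;; <!! blocks the calling thread until the collected vector is ready.
    (async/<!! (async/into [] (async/take n c1)))))

(run-puts 3)
```

The other route is the one you already found: keep >! but put the go block inside doit itself, so the macro can see the operator.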