Clojure freezes after calling future and similar functions - concurrency

I have a test script:
(defn foo [] ( print "OK!" ))
(print "main")
(future-call foo)
(print "end")
When I run it in the REPL, it always works fine:
user=> (defn foo [] ( print "OK!" ))
#'user/foo
user=> (print "main")
mainnil
user=> (future-call foo)
OK!#<core$future_call$reify__6320#1d4997: nil>
user=> (print "end")
endnil
But when I run it from the console, there is a strange freeze after the code has finished executing:
$ time clojure-1.6 /tmp/1.clj
mainend
real 1m1.672s
user 0m2.229s
sys 0m0.143s
mainend is displayed almost immediately, but returning to the shell takes about a minute.
pmap also behaves strangely:
(defn foo [x] ( print x ))
(print "main")
(pmap foo [1 2 3 4 5 6 7 8 9 0])
(print "end")
This displays:
$ time clojure-1.6 /tmp/1.clj
main12365409end
real 1m1.688s
user 0m2.320s
sys 0m0.114s
I know that ..365.. is normal for concurrent code, but why are 7 and 8 not displayed?

You need to call shutdown-agents
Note: If you leave out the call to (shutdown-agents), the program will on most (all?) OS/JVM combinations "hang" for 1 minute before the process exits. It is waiting for a thread created by the future call to be shut down. shutdown-agents will shut them down immediately, or (System/exit ) will exit immediately without waiting for them to shut down.
This wait occurs even if you use futures indirectly through some other Clojure functions that use them internally, such as pmap or clojure.java.shell/sh
From https://clojuredocs.org/clojure.core/future
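A minimal sketch of how the pmap script from the question might look with that advice applied. Two details are additions for illustration and are not in the question: foo also returns its argument so the results can be checked, and doall forces the lazy sequence that pmap returns, so every call has finished before "end" is printed.

```clojure
(defn foo [x]
  (print x)
  x) ; returning the argument is an addition, for illustration only

(print "main")
;; pmap returns a lazy sequence; doall forces all ten calls to complete here.
(def results (doall (pmap foo [1 2 3 4 5 6 7 8 9 0])))
(print "end")
(flush)

;; Shut down the thread pools behind futures, agents and pmap so the
;; JVM can exit without the one-minute wait.
(shutdown-agents)
```

The digits may still print in any order, but the returned values keep the order of the input.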

Related

Printing inside a for loop

I was practicing one Clojure tutorial and had to ensure that a for loop was executed so I put a println command there, but it did not display messages.
So now I have got the question...
This code prints Tom's name:
(ns tutorial.core)
(defn -main []
(println 'Jane)
(for [a ['Tom]]
(println a))
;; 'Kate
)
tutorial.core> (-main)
Jane
Tom
(nil)
tutorial.core>
but this does not:
(ns tutorial.core)
(defn -main []
(println 'Jane)
(for [a ['Tom]]
(println a))
'Kate
)
tutorial.core> (-main)
Jane
Kate
tutorial.core>
Why? In which cases can we expect that println will not print texts?
for is not a loop, it is a sequence comprehension which returns a lazy sequence. Your for expression will therefore only execute its side-effects (calls to println) when the returned sequence is evaluated. The REPL evaluates the values returned from your calls to -main so it can print them.
Your first example returns a lazy sequence which is evaluated by the REPL, causing the (println 'Tom) call to be evaluated. Since println returns nil, the resulting sequence contains a single nil value - this is the (nil) you see in the output.
Your second example creates the same sequence but does not evaluate it, instead 'Kate is returned from the function and the REPL prints that.
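The laziness described above can be observed directly with a counter. This is only an illustrative sketch; the names calls, lazy-result and realized are made up for the example.

```clojure
(def calls (atom 0))

;; Building the sequence runs nothing: the body has not been evaluated yet.
(def lazy-result (for [a ['Tom 'Jane]]
                   (do (swap! calls inc) a)))
(def calls-before @calls)   ; still 0

;; doall walks the whole sequence, which runs the body for each element.
(def realized (doall lazy-result))
(def calls-after @calls)    ; now 2
```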
If you want an imperative for loop you should use doseq:
(defn -main []
(println 'Jane)
(doseq [a ['Tom]]
(println a))
'Kate)
As Lee says, if you only want side effects like printing, a doseq is the best solution as it never returns a value other than nil.
If you do want to use a for loop, you can remove the laziness by wrapping it inside a (vec ...) expression, which will force the for loop to run immediately. Thus we get:
(println :start)
(vec
(for [a [1 2 3]]
(println a)))
(println :end)
with result:
:start
1
2
3
:end
Without the vec, we get the behavior you saw:
(println :start)
(for [a [1 2 3]]
(println a))
(println :end)
with result:
:start
:end
I almost never want a lazy result, as the uncertainty over when a computation occurs can make debugging difficult. I use the above construct so often that I wrote a small macro forv that always returns a vector result, similar to the mapv function.
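The answer does not show the forv macro itself, so the following is a hypothetical version written under the assumption that it simply wraps for in vec. The author's real macro may differ.

```clojure
;; Hypothetical forv: same syntax as for, but eager and vector-valued.
(defmacro forv [& for-args]
  `(vec (for ~@for-args)))

(def squares (forv [x [1 2 3]] (* x x)))   ; [1 4 9]
```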

run sql in parallel using future: but the sql is not executed

I have the following function
(defn run [x]
(doseq [i (range 1 x)]
(println i)
(future (j/execute! vertica-db ["insert /*+ direct */ into a select * from a limit 1"]))
))
When I call it using
(run 100)
it prints 1..99; however, when I check the row count of table a, it has not increased, which means the SQL is not executed. How can I run the SQL in parallel?
The only suspicious thing I see in your code is the fact that you never wait for the futures to finish (so maybe they don't?).
You need to collect the values returned by the future calls and then block until they finish by using (deref f) or @f (i.e. dereferencing the future), where f is one of those values.
Something like this should work:
(defn run [x]
  (let [db-insert (fn [i]
                    (println i)
                    (future (j/execute! vertica-db ["insert /*+ direct */ into a select * from a limit 1"])))
        inserts (doall (map db-insert (range 1 x)))] ; force execution of all the db-insert calls
    (doseq [insert inserts] @insert))) ; wait for all the futures to finish
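The same collect-then-dereference pattern can be tried without a database. In this sketch, run-all and its task argument are hypothetical names; task stands in for the j/execute! call, which is not reproduced here.

```clojure
(defn run-all
  "Starts one future per value in (range 1 n), then blocks until all finish.
  task is a hypothetical one-argument function standing in for the insert."
  [n task]
  (let [futures (mapv (fn [i] (future (task i))) (range 1 n))]
    ;; deref blocks on each future in turn and collects its result
    (mapv deref futures)))

;; Example with a cheap stand-in task instead of a database call.
(def results (run-all 5 (fn [i] (* i 10))))   ; [10 20 30 40]
```

mapv is eager, so all the futures are started before the first deref blocks.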

Out of memory on spit

I don't understand why my code raises an out-of-memory exception.
I have an agent that calls a function which appends a line to the "test.log" file. The out-of-memory error occurs in PersistentHashMap$BitmapIndexedNode.assoc(PersistentHashMap.java:624).
(use 'clojure.java.io)
(def the-agent(agent nil))
(defn process [_o content]
(spit "test.log" content :append true)
)
(defn write-all []
(doseq
[x (range 1 5000000)]
(send-off
the-agent
process
"Line to be appended\n"
)
)
)
Thanks !
When you have many agents running blocking (or just long) tasks at the same time, you can get into trouble with Clojure's default behaviour. By default, send-off uses unbounded parallelism, which tends to fall over in situations like this. Fortunately, in Clojure 1.5 and later you can set the executor used by send-off to limit the degree of parallel execution:
(use 'clojure.java.io)
(def the-agent (agent nil))
(defn process [_o content]
(spit "test.log" content :append true))
(set-agent-send-off-executor!
(java.util.concurrent.Executors/newFixedThreadPool 20))
(defn write-all []
(doseq
[x (range 1 5000000)]
(send-off
the-agent
process
"Line to be appended\n")))
which then completes without running out of memory:
hello.core> (write-all)
nil
hello.core>
This is a global change that affects all agents. In most cases it is preferable to make a thread pool specifically for this task and use send-via to dispatch to that specific pool:
(def output-thread-pool (java.util.concurrent.Executors/newFixedThreadPool 20))
(defn write-all []
(doseq
[x (range 1 5000000)]
(send-via output-thread-pool
the-agent
process
"Line to be appended\n")))
This allows you to choose the degree of parallelism you want for each task. Just remember to shut down your thread pools when you are finished with them.
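A small sketch of that last point, shutting the pool down once the work is done. The names demo-pool and counter, the pool size and the 5-second timeout are illustrative assumptions, and a counter replaces the file write to keep the example self-contained.

```clojure
(import '(java.util.concurrent Executors ExecutorService TimeUnit))

(def ^ExecutorService demo-pool (Executors/newFixedThreadPool 4))
(def counter (agent 0))

(dotimes [_ 100]
  (send-via demo-pool counter inc))

(await counter)        ; block until the actions sent so far have run
(.shutdown demo-pool)  ; stop accepting work and let idle threads exit
(def terminated? (.awaitTermination demo-pool 5 TimeUnit/SECONDS))
```

Without the .shutdown call, the pool's idle threads would keep the JVM alive after the script finishes.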
The dispatched sends are blocked on I/O on the individual spits. The dispatches are created much faster than they can be completed and are accumulating.
(defn write-all []
(doseq [x (range 1 5000000)]
(send-off the-agent process "foo")
(when (zero? (mod x 100000))
(println (. the-agent clojure.lang.Agent/getQueueCount)))))
user=> (write-all)
99577
199161
298644
398145
497576
596548
Exception in thread "nREPL-worker-0" java.lang.OutOfMemoryError: Java heap space
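One possible way to keep the queue from growing without bound is to pause the producer periodically with await. This is an assumption added for illustration, not something proposed in the answers above. The sketch uses a counter in place of the spit call, and the name log-agent and the batch size of 1000 are arbitrary.

```clojure
(def log-agent (agent 0))

;; Stand-in for the spit call: counts lines instead of writing them.
(defn process [line-count _content]
  (inc line-count))

(defn write-all [total]
  (doseq [x (range 1 (inc total))]
    (send-off log-agent process "Line to be appended\n")
    ;; Every 1000 sends, wait for the agent to drain its queue.
    (when (zero? (mod x 1000))
      (await log-agent)))
  (await log-agent)   ; wait for the final partial batch
  @log-agent)         ; number of lines processed
```

With this change the queue should never hold more than about 1000 pending actions, at the cost of the producer occasionally blocking.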

why can't I call a println after a while in defn body in Clojure?

I have a function that looks like,
(defn app [server]
(println "before while...."))
(while test
while-body)
(println "...after while."))
However when I call the fn I just see the "before while" at the REPL, and then when the while fails its test, "nil".
If I write a test foo at the repl like
(defn foo []
(println "testing before")
(loop [i 100]
(when (> i 10)
(prn i)
(recur (- i 2))))
(println "after..."))
It works as I'd expect.
I've put the actual code up in a paste here, https://www.refheap.com/paste/12147 , if it helps.
What explains the difference in behavior here?
edit
Apologies for not trying this before, but this does work at the REPL:
(defn bar []
(let [i (atom 100)]
(println "before...")
(while (> @i 10)
(swap! i dec))
(println "after...")))
So there's something else going on.
edit #2
Testing more at the repl, if I comment out the while loop, the println before and after will print. I was mistaken before about the 'nil', this is the return value of a different function called after the while was called. So it seems to have something to do with the while loop.
I noticed that if I change the while to this
(loop []
(if test
(do things and recur...)
(println "test failed")))
The "test failed" never prints to the repl.
You've got an extraneous ) at the end of the first println.
(defn app [server]
(println "before while....")
(while test
while-body)
(println "...after while."))
But since this is obviously example code that you didn't run, I expect the problem to be in the code that you did run. Please copy & paste that code exactly as is if this doesn't fix the problem.
The problem wasn't what I thought it was. I was blocking on a select call (called in the while loop) and that was causing problems with my shutdown function which ended the while loop. Adding a timeout to the select fixes it.
In Clojure function definitions, the value of the last expression evaluated is what's returned, and (println ...) returns nil.

Clojure dothreads! function

I am reading Fogus' book The Joy of Clojure, and in the parallel programming chapter I saw a function definition which is surely meant to illustrate something important, but I can't work out what. Moreover, I can't see what this function is for: when I execute it, it doesn't do anything:
(import '(java.util.concurrent Executors))
(def *pool* (Executors/newFixedThreadPool
(+ 2 (.availableProcessors (Runtime/getRuntime)))))
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [_ exec-count] (f)))))
I tried to run in this way:
(defn wait [] (Thread/sleep 1000))
(dothreads! wait :thread-count 10 :exec-count 10)
(dothreads! wait)
(dothreads! #(println "running"))
...but it returns nil. Why?
So, here's the same code, tweaked slightly so that the function passed to dothreads! gets passed the count of the inner dotimes.
(import 'java.util.concurrent.Executors)
(def ^:dynamic *pool* (Executors/newFixedThreadPool (+ 2 (.availableProcessors (Runtime/getRuntime)))))
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [c exec-count] (f c)))))
(defn hello [name]
(println "Hello " name))
Try running it like this:
(dothreads! hello :threads 2 :times 4)
For me, it prints something to the effect of:
Hello 0
Hello 1
Hello 2
Hello 3
nil
user=> Hello 0
Hello 1
Hello 2
Hello 3
So, note one mistake you made when calling the function: you passed in :thread-count and :exec-count as the keys whereas those are actually the bindings in the destructuring that's happening inside dothreads!. The keywords are the words starting with a colon, :threads and :times.
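The difference between the keywords and the local names they bind can be seen in isolation. show-opts is a hypothetical helper that uses the same destructuring form as dothreads! and simply returns the two bound values.

```clojure
(defn show-opts [& {thread-count :threads
                    exec-count   :times
                    :or {thread-count 1 exec-count 1}}]
  [thread-count exec-count])

;; The keys the function actually looks up:
(def with-right-keys (show-opts :threads 2 :times 4))              ; [2 4]
;; Unknown keys are ignored, so the :or defaults apply:
(def with-wrong-keys (show-opts :thread-count 10 :exec-count 10))  ; [1 1]
```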
As to what this code actually does:
It creates a new fixed-size thread pool that will use at most the number of cores in your machine + 2 threads. This pool is called *pool* and is created using the Java Executor Framework. See [1] for more details.
The dothreads! function gets a function that will be called exec-count times on each of the thread-count threads. So, in the example above, you can clearly see it being called 4 times per thread (:threads being 2 and :times being 4).
The reason why this function returns nil is that dothreads! returns the value of its last expression, the dotimes form, and dotimes always returns nil. (The submit method of the thread pool returns a java.util.concurrent.Future, but that value is discarded.) If you were to add some other expression at the end of the function, making it:
(defn dothreads! [f & {thread-count :threads
exec-count :times
:or {thread-count 1 exec-count 1}}]
(dotimes [t thread-count]
(.submit *pool* #(dotimes [c exec-count] (f c))))
(* thread-count exec-count))
It will return 8 for the example above (2 * 4). Only the value of the last expression in the function is returned, so if you were to write (fn [x y] (+ x y) (* x y)), it would always return the product. The sum will be evaluated, but it will be for nothing. So, don't do this! If you want to add more than one expression to a function, make sure that all but the last one have side effects; otherwise they'll be useless.
You might also notice that the order in which stuff is printed is not deterministic. On my machine, it says hello 4 times, then returns the result of the function, and then says hello 4 more times. The order in which the functions are executed is undetermined between threads; however, the hellos are sequential within each thread (there can never be a Hello 3 before a Hello 2). The reason for the sequentiality is that the function actually submitted to the thread pool is #(dotimes [c exec-count] (f c)), and dotimes runs its iterations one after another on a single thread.
[1] http://download.oracle.com/javase/tutorial/essential/concurrency/executors.html
It's used afterwards in the book to run test functions multiple times in multiple threads. It doesn't illustrate anything by itself, but it's used to demonstrate locking, promises, and other parallel and concurrent stuff.
dotimes, dothreads!, and println are not pure functions: they're used to introduce side-effects. For example,
user=> (println 3)
3
nil
That code snippet prints 3 to the screen, but returns nil. Similarly, dothreads! is useful for its side-effects and not its return value.
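If the caller needs to know when the submitted work has finished, one option is to keep the Future objects that submit returns and call .get on each. The sketch below is a hypothetical variant and is not from the book: the names dothreads-wait!, wait-pool and hits, and the pool size of 4, are assumptions made for the example.

```clojure
(import '(java.util.concurrent Callable Executors ExecutorService Future))

(def ^ExecutorService wait-pool (Executors/newFixedThreadPool 4))

(defn dothreads-wait!
  [f & {thread-count :threads
        exec-count   :times
        :or {thread-count 1 exec-count 1}}]
  (let [^Callable task #(dotimes [c exec-count] (f c))
        ;; doall submits every task before we start waiting on any of them
        futures (doall (repeatedly thread-count #(.submit wait-pool task)))]
    (doseq [fut futures]
      (.get ^Future fut))          ; block until this task has finished
    (* thread-count exec-count)))

(def hits (atom 0))
(def total (dothreads-wait! (fn [_] (swap! hits inc)) :threads 2 :times 4))
```

Because the function blocks on every Future before returning, hits holds 8 by the time total is defined.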