Clojure error - GC overhead limit exceeded using reducer with lazy sequence

; Test 1 - Using Map Reduce (Successful)
(ns example
(:gen-class))
(require '[clojure.core.reducers :as r])
(def n 100000000000)
(time (println "map: " (reduce + 0N (map inc (range n)))))
I get:
map: 5000000000050000000000N
"Elapsed time: 8540888.550507 msecs"
; Test 2 - Using Map Reducer (Creates GC Error)
(ns example
(:gen-class))
(require '[clojure.core.reducers :as r])
(def n 100000000000)
(time (println "rmap: " (reduce + 0N (r/map inc (range n)))))
I get:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded, compiling:...
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
; Test 3 - Using Reducer with Fold (Creates GC Error)
(ns example
(:gen-class))
(require '[clojure.core.reducers :as r])
(def n 100000000000)
(time (println "fold: " (r/fold + (r/map inc (range n)))))
I get:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded, compiling:...
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Would have expected all three to produce the same result. Instead only #1 works but the other two have GC issues.
Note: You can get all three to work using smaller values of n.

Related

why is the time macro claiming a very short elapsed time for slow function call?

Was looking at the exercises at the bottom of chapter 9 of clojure for the brave and true (in particular the last one of searching multiple engines and returning the first hit of each)
I mocked the actual search with slurp part to be this:
(defn search-for
[query engine]
(Thread/sleep 2000)
(format "https://www.%s.com/search?q%%3D%s", engine query))
And implemented the behavior like this:
(defn get-first-hit-from-each
[query engines]
(let [futs (map (fn [engine]
(future (search-for query engine))) engines)]
(doall futs)
(map deref futs)))
(I know the return here is a list and the exercise asks for a vector but can just do an into for that...)
When I run this in the REPL:
(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
it seems to take 2 seconds now that I have added the doall. (Since map returns a lazy seq, I don't think any of the futures even start unless I consume the seq; (last futs) also seems to work.) But the time macro reports almost no time consumed, even though the call takes 2 seconds:
(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 0.189609 msecs"
("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")
what is going on with the time macro here?
TL;DR: Lazy seqs don't play nice with the time macro, and your function get-first-hit-from-each returns a lazy seq. To make lazy seqs work with time, wrap them in a doall, as suggested by the documentation. See below for a fuller thought process:
The following is the definition of the time macro in clojure.core (source):
(defmacro time
"Evaluates expr and prints the time it took. Returns the value of
expr."
{:added "1.0"}
[expr]
`(let [start# (. System (nanoTime))
ret# ~expr]
(prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
ret#))
Notice how the macro saves the return value of expr in ret#, right after which it prints the elapsed time? Only after that does ret# get returned. The key here is that your function get-first-hit-from-each returns a lazy sequence (since map returns a lazy sequence):
(type (get-first-hit-from-each "gray+cat" '("google" "bing")))
;; => clojure.lang.LazySeq
As such, when you do (time (get-first-hit-from-each "gray+cat" '("google" "bing"))), what gets saved in ret# is a lazy sequence, which doesn't actually get evaluated until we try to use its value...
We can check whether a lazy sequence has been evaluated using the realized? function. So let's tweak the time macro by adding a line to check whether ret# has been evaluated, right after the elapsed time is printed:
(defmacro my-time
[expr]
`(let [start# (. System (nanoTime))
ret# ~expr]
(prn (str "Elapsed time: " (/ (double (- (. System (nanoTime)) start#)) 1000000.0) " msecs"))
(prn (realized? ret#)) ;; has the lazy sequence been evaluated?
ret#))
Now testing it:
(my-time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 0.223054 msecs"
false
;; => ("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")
Nope... so that's why time prints inaccurately. None of the computationally-long stuff actually gets to run before the printout is made.
To fix that and get an accurate time, we need to ensure evaluation of the lazy seq, which can be done by strategically placing a doall in a bunch of possible places, either within your function, wrapping the map:
(defn get-first-hit-from-each
[query engines]
(let [futs (map (fn [engine]
(future (search-for query engine))) engines)]
(doall futs)
(doall (map deref futs))))
;; => #'propeller.core/get-first-hit-from-each
(time (get-first-hit-from-each "gray+cat" '("google" "bing")))
"Elapsed time: 2005.478689 msecs"
;; => ("https://www.google.com/search?q%3Dgray+cat" "https://www.bing.com/search?q%3Dgray+cat")
or within time, wrapping the function call:
(time (doall (get-first-hit-from-each "gray+cat" '("google" "bing"))))
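The effect can also be measured directly. The following is a minimal sketch, not part of the original answer: slow-inc and elapsed-ms are hypothetical helpers, and elapsed-ms simply repeats the System/nanoTime arithmetic from the time macro shown above.

```clojure
;; Hypothetical stand-in for any slow per-element computation.
(defn slow-inc [x]
  (Thread/sleep 100)
  (inc x))

;; Hypothetical helper: runs thunk and returns the elapsed milliseconds,
;; computed the same way clojure.core/time computes them.
(defn elapsed-ms [thunk]
  (let [start (System/nanoTime)]
    (thunk)
    (/ (double (- (System/nanoTime) start)) 1000000.0)))

;; Only builds the lazy seq; slow-inc never runs inside the timed call.
(def lazy-ms (elapsed-ms #(map slow-inc '(1 2 3))))

;; doall realizes all three elements, so the three sleeps are included.
(def eager-ms (elapsed-ms #(doall (map slow-inc '(1 2 3)))))
```

Under these assumptions lazy-ms should come out as a small fraction of a millisecond, while eager-ms should cover all three sleeps.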
Something weird is happening in your setup. It works as expected for me, taking 4 seconds:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(defn search-for
[query engine]
(Thread/sleep 2000)
(format "https://www.%s.com/search?q%%3D%s", engine query))
(defn get-first-hit-from-each
[query engines]
(let [futs (map (fn [engine]
(future (search-for query engine)))
engines)]
; (doall futs)
(mapv #(println (deref %)) futs)))
(dotest
(time
(get-first-hit-from-each "gray+cat" '("google" "bing"))))
with result
--------------------------------------
Clojure 1.10.2-alpha1 Java 14
--------------------------------------
Testing tst.demo.core
https://www.google.com/search?q%3Dgray+cat
https://www.bing.com/search?q%3Dgray+cat
"Elapsed time: 4001.384795 msecs"
I did not even use the doall.
Update:
I found my mistake. I had accidentally used mapv instead of map on line 15. This forces it to wait for each call to deref. If you use map here, you get a lazy seq of a lazy seq and the function ends before the timer expires (twice => 4 sec).
--------------------------------------
Clojure 1.10.2-alpha1 Java 14
--------------------------------------
Testing tst.demo.core
"Elapsed time: 0.182797 msecs"
I always recommend using mapv instead of map. There is also a filterv available. When in doubt, force the output into a nice, eager vector using (vec ...) to rid yourself of the headaches due to laziness.
Maybe one time in a hundred you will need the features a lazy seq provides. The other times it is a problem since you cannot predict the order of execution of statements.
P.S.
See this list of documentation, including the fabulous Clojure CheatSheet.
P.P.S.
The OP is correct that, ideally, each query should have run in parallel in a separate thread (each future uses a separate thread). The problem is, again, due to lazy map behavior.
On the println at the end, each element in the lazy list from futs is not evaluated until required by the println. So, they don't even start until later, in sequence. This defeats the intended goal of parallel execution. Again, lazy-seq behavior is the cause.
The cure: make everything explicit & eager 99+ percent of the time (i.e. mapv):
(defn search-for
[query engine]
(Thread/sleep 2000)
(format "https://www.%s.com/search?q%%3D%s", engine query))
(defn get-first-hit-from-each
[query engines]
(let [futs (mapv (fn [engine]
(future (search-for query engine)))
engines)]
(mapv #(println (deref %)) futs)))
with result:
Testing tst.demo.core
https://www.google.com/search?q%3Dgray+cat
https://www.bing.com/search?q%3Dgray+cat
"Elapsed time: 2003.770331 msecs"

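To see the difference between creating futures lazily and eagerly, here is a minimal sketch under stated assumptions: slow-task, millis-for, run-lazy and run-eager are hypothetical names, not from the answer above. The input is a list on purpose, since seqs over vectors and ranges are chunked and map may then realize up to 32 elements at once, which would hide the effect.

```clojure
;; Hypothetical stand-in for a slow search request.
(defn slow-task [x]
  (Thread/sleep 200)
  x)

;; Hypothetical helper: elapsed milliseconds for running thunk.
(defn millis-for [thunk]
  (let [start (System/nanoTime)]
    (thunk)
    (/ (double (- (System/nanoTime) start)) 1000000.0)))

;; Lazy map: each future is created only when deref reaches it,
;; so the tasks end up running one after another.
(defn run-lazy [xs]
  (let [futs (map #(future (slow-task %)) xs)]
    (doall (map deref futs))))

;; mapv: every future is created (and started) before the first deref.
(defn run-eager [xs]
  (let [futs (mapv #(future (slow-task %)) xs)]
    (mapv deref futs)))

(def sequential-ms (millis-for #(run-lazy  '(1 2 3))))
(def parallel-ms   (millis-for #(run-eager '(1 2 3))))
```

With three 200 ms tasks, sequential-ms should be roughly three times parallel-ms.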
Measuring time of a process executing inside a future in Clojure using "time"

Below is a simplified version of an application I am working on. Specifically, I am interested in benchmarking the execution time of process-list. In process-list, I split the input list into as many partitions as the number of threads I would like to run in parallel, then pass each partition to a thread through a call to future. Finally, in main I call process-list with time wrapped around it. time should report the elapsed time of the processing done by process-list, but apparently it only reports the time it takes to create the future threads and does not wait for the futures to run to completion. How can I dereference the futures inside process-list so that the elapsed time accounts for the futures running to completion?
(ns listProcessing
(:require [clojure.string]
[clojure.pprint]
[input-random :as input]))
(def N-THREADS 4)
(def element_processing_retries (atom 0))
(def list-collection
"Each element is made into a ref"
(map ref input/myList))
(defn partition-list [threads list]
"partition list into required number of partitions which is equal
to the number of threads"
(let [partitions (partition-all
(Math/ceil (/ (count list) threads)) list)]
partitions))
(defn increase-element [element]
(ref-set element inc))
(defn process-list [list]
"Process `members of list` one by one."
(let [sub-lists (partition-list N-THREADS list)]
(doseq [sub-list sub-lists]
(let [futures '()
myFuture (future (dosync (swap! element_processing_retries inc)
(map increase-element sub-list)))]
(cons myFuture futures)
(map deref futures)))))
(defn main []
  (let [f1 (future (time (process-list input/mylist)))]
    @f1))
(main)
(shutdown-agents)
Below is a simplified example of the list input. Both the input and the list processing are simplified here to keep the question short.
(ns input-random)
(def myList (list 1 2 4 7 89 12 34 45 56))
This will have some overhead. If you're trying to time millisecond differences, this will skew things a bit (although minute timings shouldn't be using time anyways).
I think your example was a little convoluted, so I reduced it down to what I think represents the problem a little better:
(time (doseq [n (range 5)]
(future
(Thread/sleep 2000))))
"Elapsed time: 1.687702 msecs"
The problem here is the same as the problem with your code: all this really does is time how long it takes for doseq to dispatch all the jobs.
The idea with my hack is to put each finished job into an atom, then check for an end condition in a busy wait:
(defn do-stuff [n-things]
(let [ret-atom (atom 0)]
(doseq [n (range n-things)]
(future
(Thread/sleep 2000)
(swap! ret-atom inc)))
ret-atom))
; Time how long it takes the entire `let` to run
(time
(let [n 5
ret-atom (do-stuff n)]
; Will block until the condition is met
(while (< @ret-atom n))))
"Elapsed time: 2002.813288 msecs"
The reason this is so hard to time is all you're doing is spinning up some side effects in a doseq. There's nothing defining what "done" is, so there's nothing to block on. I'm not great with core.async, but I suspect there may be something that may help in there. It may be possible to have a call to <!! that blocks until a channel has a certain number of elements. In that case, you would just need to put results into the channel as they're produced.
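An alternative to the busy wait, sketched here under the assumption that the jobs can be collected instead of fired off in a doseq: keep the futures in a vector and deref each one. do-stuff-blocking is a hypothetical name, and this is a different technique from the atom-based hack above, not a restatement of it.

```clojure
;; Hypothetical variant of do-stuff: returns only after every job is done.
(defn do-stuff-blocking [n-things]
  (let [futs (mapv (fn [n]
                     (future
                       (Thread/sleep 200)
                       n))
                   (range n-things))]
    ;; deref blocks until that future has finished, so this vector
    ;; is complete only once all jobs have run.
    (mapv deref futs)))

;; time now covers the jobs themselves, not just their dispatch.
(time (do-stuff-blocking 5))
```

Here "done" is defined by the futures themselves, so there is nothing to poll.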

What is the correct way to perform side effects in a clojure atom swap

I'm keeping a registry of processes in an atom.
I want to start one and only one process (specifically a core.async go-loop) per id.
However, you're not supposed to perform side-effects in a swap!, so this code is no good:
(swap! processes-atom
(fn [processes]
(if (get processes id)
processes ;; already exists, do nothing
(assoc processes id (create-process! id)))))
How would I go about doing this correctly?
I have looked at locking, which takes an object as a monitor for the lock. I would prefer that each id - which are dynamic - have their own lock.
It seems that you need to protect processes-atom from concurrent modification, so that only a single thread can access it at a time. locking will work in this case. Since locking means we manage thread safety ourselves, we can use a volatile instead of an atom (a volatile is faster, but provides no thread-safety or atomicity guarantees).
Summing up the above, something like below should work fine:
(def processes-volatile (volatile! {}))
(defn create-and-save-process! [id]
(locking processes-volatile
(vswap! processes-volatile
(fn [processes]
(if (get processes id)
processes
(assoc processes id (create-process! id)))))))
You can do this by hand with locking, as OlegTheCat shows, and often that is a fine approach. However, in the comments you remark that it would be nice to avoid having the whole atom locked for as long as it takes to spawn a process, and that too is possible in a surprisingly simple way: instead of having a map from pid to process, have a map from pid to delay of process. That way, you can add a new delay very cheaply, and only actually create the process by dereferencing the delay, outside of the call to swap!. Dereferencing the delay will block waiting for that particular delay, so multiple threads who need the same process will not step on each other's toes, but the atom itself will be unlocked, allowing threads who want a different process to get it.
Here is a sample implementation of that approach, along with example definitions of the other vars your question implies, to make the code runnable as-is:
(def process-results (atom []))
(defn create-process! [id]
;; pretend creating the process takes a long time
(Thread/sleep (* 1000 (rand-int 3)))
(future
;; running it takes longer, but happens on a new thread
(Thread/sleep (* 1000 (rand-int 10)))
(swap! process-results conj id)))
(def processes-atom (atom {}))
(defn cached-process [id]
(-> processes-atom
(swap! (fn [processes]
(update processes id #(or % (delay (create-process! id))))))
(get id)
(deref)))
Of course only cached-process is needed if you already have the other things defined. And a sample run, to show that processes are successfully reused:
(defn stress-test [num-processes]
(reset! process-results [])
(reset! processes-atom {})
(let [running-processes (doall (for [i (range num-processes)]
(cached-process (rand-int 10))))]
(run! deref running-processes)
(deref process-results)))
user> (time (stress-test 40))
"Elapsed time: 18004.617869 msecs"
[1 5 2 0 9 7 8 4 3 6]
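The property this approach relies on is that a delay runs its body at most once, even when several threads dereference it at the same time. A minimal sketch to check that in isolation (creation-count, shared-delay and results are hypothetical names used only for illustration):

```clojure
;; Counts how many times the delay body actually runs.
(def creation-count (atom 0))

;; Stand-in for (delay (create-process! id)).
(def shared-delay
  (delay
    (swap! creation-count inc)
    (Thread/sleep 100)
    :process))

;; Eight threads dereference the same delay concurrently.
(def results
  (let [futs (mapv (fn [_] (future @shared-delay)) (range 8))]
    (mapv deref futs)))

;; All eight callers get :process, and the body ran exactly once.
(println @creation-count results)
```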
I prefer using a channel
(defn create-process! [id] {:id id})
(def ^:private processes-channel (chan))
(go (loop [processes {}]
(let [id (<! processes-channel)
process (if (contains? processes id)
(get processes id)
(create-process! id))]
(>! processes-channel process)
(recur (assoc processes id process)))))
(defn get-process-by-id
"Public API"
[id]
(>!! processes-channel id)
(<!! processes-channel))
Another answer is to use an agent to start each process. This decouples each process from each other, and avoids the problem of possible multiple calls to the "create-process" function:
(defn start-proc-agent
[state]
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" (:id state)))
(Thread/sleep delay)
(println (format "finished %d" (:id state)))
(merge state {:delay delay :state :running} )))
(def procs-agent (atom {}))
(dotimes [i 3]
(let [curr-agent (agent {:id i :state :unstarted})]
(swap! procs-agent assoc i curr-agent)
(send curr-agent start-proc-agent )))
(println "all dispatched...")
(pprint @procs-agent)
(Thread/sleep 3000)
(pprint @procs-agent)
When run we see:
starting 2
starting 1
starting 0
all dispatched...
{0 #<Agent@39d8240b: {:id 0, :state :unstarted}>,
1 #<Agent@3a6732bc: {:id 1, :state :unstarted}>,
2 #<Agent@7414167a: {:id 2, :state :unstarted}>}
finished 0
finished 1
finished 2
{0 #<Agent@39d8240b: {:id 0, :state :running, :delay 317}>,
1 #<Agent@3a6732bc: {:id 1, :state :running, :delay 1635}>,
2 #<Agent@7414167a: {:id 2, :state :running, :delay 1687}>}
So the global map procs-agent associates each process ID with the agent for that process. A side benefit of this approach is that you can send subsequent commands (in the form of functions) to the agent for a process and be assured they are independent (and parallel & asynchronous) to every other agent.
Alternate solution
Similar to your original question, we could use a single agent (instead of an agent per process) to simply serialize the creation of each process. Since agents are asynchronous, they don't have the possibility of re-trying the input function like swap!. Thus, side-effecting functions aren't a problem. You could write it like so:
(defn start-proc-once-only
[state i]
(let [curr-proc (get state i) ]
(if (= :running (:state curr-proc))
(do
(println "skipping restart of" i)
state)
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" i))
(Thread/sleep delay)
(println (format "finished %d" i))
(assoc state i {:delay delay :state :running})))))
(def procs (agent {}))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(println "all dispatched...")
(println :procs) (pprint @procs)
(Thread/sleep 5000)
(println :procs) (pprint @procs)
with result
:starting 0
:starting 1
:starting 2
starting 0
:starting 0
:starting 1
:starting 2
all dispatched...
:procs
{}
finished 0
starting 1
finished 1
starting 2
finished 2
skipping restart of 0
skipping restart of 1
skipping restart of 2
:procs
{0 {:delay 1970, :state :running},
1 {:delay 189, :state :running},
2 {:delay 1337, :state :running}}
I think you should use add-watch. It gets called once per change to the atom. In the watch-fn check whether a new id has been added to the atom, if so, create the process and add it to the atom. That'll trigger another call to the watch-fn, but that second call won't identify any new id needing a process.
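A rough sketch of how that could look, under several assumptions: the atom maps each id to either a :pending marker or the created process, create-process! is a hypothetical stub, and registry, created and request-process! are names invented for this example. It is one possible reading of the suggestion, not a vetted implementation.

```clojure
;; Hypothetical stub: records each creation so duplicates would be visible.
(def created (atom []))
(defn create-process! [id]
  (swap! created conj id)
  {:id id})

;; id -> :pending or the created process.
(def registry (atom {}))

;; The watch runs after each successful change. It creates a process only
;; for ids that are :pending now and were absent in the previous state.
(add-watch registry :spawner
  (fn [_key reg old-state new-state]
    (doseq [[id v] new-state
            :when (and (= v :pending)
                       (not (contains? old-state id)))]
      ;; This triggers the watch again, but by then the id is no longer new.
      (swap! reg assoc id (create-process! id)))))

;; The swap! function itself is pure: it only marks the id as pending.
(defn request-process! [id]
  (swap! registry (fn [m] (if (contains? m id) m (assoc m id :pending)))))

(request-process! 1)
(request-process! 1) ; already known, nothing is created
(request-process! 2)
```

The side effect happens in the watch rather than inside the function passed to swap!, so a retried swap! does not spawn a second process.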

Futures somehow slower than agents?

The following code does essentially just let you execute something like (function (range n)) in parallel.
(experiment-with-agents 10000 10 #(filter prime? %))
This for example finds the prime numbers between 0 and 10000 with 10 agents.
(experiment-with-futures 10000 10 #(filter prime? %))
Same just with futures.
Now the problem is that the solution with futures doesn't run faster with more futures. Example:
; Futures
(time (experiment-with-futures 10000 1 #(filter prime? %)))
"Elapsed time: 33417.524634 msecs"
(time (experiment-with-futures 10000 10 #(filter prime? %)))
"Elapsed time: 33891.495702 msecs"
; Agents
(time (experiment-with-agents 10000 1 #(filter prime? %)))
"Elapsed time: 33048.80492 msecs"
(time (experiment-with-agents 10000 10 #(filter prime? %)))
"Elapsed time: 9211.864133 msecs"
Why? Did I do something wrong (probably, new to Clojure and just playing around with stuff^^)? Because I thought that futures are actually preferred in that scenario.
Source:
(defn setup-agents
[coll-size num-agents]
(let [step (/ coll-size num-agents)
parts (partition step (range coll-size))
agents (for [_ (range num-agents)] (agent []) )
vect (map #(into [] [%1 %2]) agents parts)]
(vec vect)))
(defn start-agents
[coll f]
(for [[agent part] coll] (send agent into (f part))))
(defn results
[agents]
(apply await agents)
(vec (flatten (map deref agents))))
(defn experiment-with-agents
[coll-size num-agents f]
(-> (setup-agents coll-size num-agents)
(start-agents f)
(results)))
(defn experiment-with-futures
[coll-size num-futures f]
(let [step (/ coll-size num-futures)
parts (partition step (range coll-size))
futures (for [index (range num-futures)] (future (f (nth parts index))))]
(vec (flatten (map deref futures)))))
You're getting tripped up by the fact that for produces a lazy sequence inside of experiment-with-futures. In particular, this piece of code:
(for [index (range num-futures)] (future (f (nth parts index))))
does not immediately create all of the futures; it returns a lazy sequence that will not create the futures until the contents of the sequence are realized. The code that realizes the lazy sequence is:
(vec (flatten (map deref futures)))
Here, map returns a lazy sequence of the dereferenced future results, backed by the lazy sequence of futures. As vec consumes results from the sequence produced by map, each new future is not submitted for processing until the previous one completes.
To get parallel processing, you need to not create the futures lazily. Try wrapping the for loop where you create the futures inside a doall.
The reason you're seeing an improvement with agents is the call to (apply await agents) immediately before you gather the agent results. Your start-agents function also returns a lazy sequence and does not actually dispatch the agent actions. An implementation detail of apply is that it completely realizes small sequences (under 20 items or so) passed to it. A side effect of passing agents to apply is that the sequence is realized and all agent actions are dispatched before it is handed off to await.
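A possible shape for the fix is sketched below, with the name experiment-with-futures-eager being hypothetical. It goes one step beyond the advice above: it also forces the result of f inside each future, on the assumption that f returns a lazy seq (as #(filter prime? %) does), since otherwise the filtering itself would run on whichever thread later consumes the result. It uses partition-all so that no trailing elements are dropped, which can yield one more part than num-futures.

```clojure
(defn experiment-with-futures-eager
  [coll-size num-futures f]
  (let [step    (max 1 (quot coll-size num-futures))
        parts   (partition-all step (range coll-size))
        ;; Outer doall: every future is created, and therefore submitted,
        ;; before any of them is dereferenced.
        futures (doall
                 (for [part parts]
                   ;; Inner doall: realize f's result on the future's thread.
                   (future (doall (f part)))))]
    ;; Concatenate the per-part results in their original order.
    (vec (mapcat deref futures))))

;; Usage with a cheap predicate standing in for prime?:
(experiment-with-futures-eager 100 4 #(filter even? %))
```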

Clojure error - GC overhead limit exceeded

I'm trying to randomly sample a large FASTQ file and write it to standard out. I keep getting 'GC overhead limit exceeded' errors and I'm not sure what I'm doing wrong. I've tried increasing Xmx in leiningen but that didn't help. Here is my code:
(ns fastq-sample.core
(:gen-class)
(:use clojure.java.io))
(def n-read-pair-lines 8)
(defn sample? [sample-rate]
(> sample-rate (rand)))
;
; Agent for writing the reads asynchronously
;
(def wtr (agent (writer *out*)))
(defn write-out [r]
(letfn [(write [out msg] (.write out msg) out)]
(send wtr write r)))
(defn write-close []
(send wtr #(.close %))
(await wtr))
;
; Main
;
(defn reads [file]
(->>
(input-stream file)
(java.util.zip.GZIPInputStream.)
(reader)
(line-seq)))
(defn -main [fastq-file sample-rate-str]
(let [sample-rate (Float. sample-rate-str)
in-reads (partition n-read-pair-lines (reads fastq-file))]
(doseq [x (filter (fn [_] (sample? sample-rate)) in-reads)]
(write-out (clojure.string/join "\n" x)))
(write-close)
(shutdown-agents)))
This is the same symptom I often get when I try to merge an infinite sequence into a single data structure like a map or vector. It very often means that memory was tight and the garbage collector could not keep up with demand for new objects. Most likely the wtr agent is too large for memory. Perhaps you may want to not store the printed results in the agent by changing
(write [out msg] (.write out msg) out)
to
(write [out msg] (.write out msg))