Out of memory on spit - clojure

I don't understand why my code raise an Out of memory exception.
I have an agent that call a function which append a line to the "test.log" file. The out of memory is on PersistentHashMap $ BitmapIndexedNode.assoc(PersistentHashMap.java:624).
(use 'clojure.java.io)
(def the-agent(agent nil))
(defn process [_o content]
(spit "test.log" content :append true)
)
(defn write-all []
(doseq
[x (range 1 5000000)]
(send-off
the-agent
process
"Line to be appended\n"
)
)
)
Thanks !

When you have many agents running blocking (or just long) tasks at the same time you can get into trouble with Clojure's default behaviour. By default send-off uses unbounded parallelism which tends to fall over in situations like this. Fortunatly in Clojure 1.5 plus you can set the execution strategy used by send-off to limit the degree of parallel execution
(use 'clojure.java.io)
(def the-agent (agent nil))
(defn process [_o content]
(spit "test.log" content :append true))
(set-agent-send-executor!
(java.util.concurrent.Executors/newFixedThreadPool 20))
(defn write-all []
(doseq
[x (range 1 5000000)]
(send-off
the-agent
process
"Line to be appended\n")))
which then completes with out running out of memory:
hello.core> (write-all)
nil
hello.core>
This is a global change that affects all agents in most cases it is preferable to make a thread pool specifically for this task and use send-via to use that specific pool:
(def output-thread-pool (java.util.concurrent.Executors/newFixedThreadPool 20))
(defn write-all []
(doseq
[x (range 1 5000000)]
(send-via output-thread-pool
the-agent
process
"Line to be appended\n")))
This allows you to choose the degree of parallelism you want for each task. Just remember to shut down your thread pools when you are finished with them.

The dispatched sends are blocked on I/O on the individual spits. The dispatches are created much faster than they can be completed and are accumulating.
(defn write-all []
(doseq [x (range 1 5000000)]
(send-off the-agent process "foo")
(when (zero? (mod x 100000))
(println (. the-agent clojure.lang.Agent/getQueueCount)))))
user=> (write-all)
99577
199161
298644
398145
497576
596548
Exception in thread "nREPL-worker-0" java.lang.OutOfMemoryError: Java heap space

Related

Clojure: Polling database periodically - core.async w/ timeout channel VS vanilla recursive Thread w/ sleep?

I have a Ring-based server which has an atom for storing application state which is periodically fetched from database every 10 seconds for frequently changing info and every 60 seconds for the rest.
(defn set-world-update-interval
[f time-in-ms]
(let [stop (async/chan)]
(async/go-loop []
(async/alt!
(async/timeout time-in-ms) (do (async/<! (async/thread (f)))
(recur))
stop :stop))
stop))
(mount/defstate world-listener
:start (set-world-update-interval #(do (println "Checking data in db") (reset! world-atom (fetch-world-data)) ) 10000)
:stop (async/close! world-listener))
It works pretty good. RAM usage is pretty stable. But I'm wondering if this is an improper use of core.async?
Perhaps it should be a regular Thread instead like this?
(doto (Thread. (fn []
(loop []
(Thread/sleep 1000)
(println "Checking data in db")
(reset! world-atom (fetch-world-data))
(recur))))
(.setUncaughtExceptionHandler
(reify Thread$UncaughtExceptionHandler
(uncaughtException [this thread exception]
(println "Cleaning up!"))))
(.start))
While there's nothing wrong with your core.async implementation of this pattern, I'd suggest using a java.util.concurrent.ScheduledExecutorService for this. It gives you precise control over the thread pool and the scheduling.
Try something like this:
(ns your-app.world
(:require [clojure.tools.logging :as log]
[mount.core :as mount])
(:import
(java.util.concurrent Executors ScheduledExecutorService ThreadFactory TimeUnit)))
(defn ^ThreadFactory create-thread-factory
[thread-name-prefix]
(let [thread-number (atom 0)]
(reify ThreadFactory
(newThread [_ runnable]
(Thread. runnable (str thread-name-prefix "-" (swap! thread-number inc)))))))
(defn ^ScheduledExecutorService create-single-thread-scheduled-executor
[thread-name-prefix]
(let [thread-factory (create-thread-factory thread-name-prefix)]
(Executors/newSingleThreadScheduledExecutor thread-factory)))
(defn schedule
[executor runnable interval unit]
(.scheduleWithFixedDelay executor runnable 0 interval unit))
(defn shutdown-executor
"Industrial-strength executor shutdown, modify/simplify according to need."
[^ScheduledExecutorService executor]
(if (.isShutdown executor)
(log/info "Executor already shut down")
(do
(log/info "Shutting down executor")
(.shutdown executor) ;; Disable new tasks from being scheduled
(try
;; Wait a while for currently running tasks to finish
(if-not (.awaitTermination executor 10 TimeUnit/SECONDS)
(do
(.shutdownNow executor) ;; Cancel currently running tasks
(log/info "Still waiting to shut down executor. Sending interrupt to tasks.")
;; Wait a while for tasks to respond to being cancelled
(when-not (.awaitTermination executor 10 TimeUnit/SECONDS)
(throw (ex-info "Executor could not be shut down" {}))))
(log/info "Executor shutdown completed"))
(catch InterruptedException _
(log/info "Interrupted while shutting down. Sending interrupt to tasks.")
;; Re-cancel if current thread also interrupted
(.shutdownNow executor)
;; Preserve interrupt status
(.interrupt (Thread/currentThread)))))))
(defn world-updating-fn
[]
(log/info "Updating world atom")
;; Do your thing here
)
(mount/defstate world-listener
:start (doto (create-single-thread-scheduled-executor "world-listener")
(schedule world-updating-fn 10 TimeUnit/MINUTES))
:stop (shutdown-executor world-listener))
It seems a little silly to me to use go-loop to create a goroutine that parks on a timeout, when all the work you actually want to do is IO-intensive, and therefore (rightly) done in a separate thread. The result of this is that you're going through threads every cycle. These threads are pooled by core.async, so you're not doing the expensive work of creating new threads from nothing, but there's still some overhead to getting them out of the pool and interacting with core.async for the timeout. I'd just keep it simple by leaving core.async out of this operation.

What is the correct way to perform side effects in a clojure atom swap

I'm keeping a registry of processes in an atom.
I want to start one and only one process (specifically a core.async go-loop) per id.
However, you're not supposed to perform side-effects in a swap!, so this code is no good:
(swap! processes-atom
(fn [processes]
(if (get processes id)
processes ;; already exists, do nothing
(assoc processes id (create-process! id)))))
How would I go about doing this correctly?
I have looked at locking, which takes an object as a monitor for the lock. I would prefer that each id - which are dynamic - have their own lock.
It seems that you need to protect processes-atom from concurrent modification, so that only single thread can have access to it. locking will work in this case. Since, by usage of locking, we will manage thread safety by ourselves, we can use volatile instead of atom (volatile is faster, but doesn't provide any thread-safety and atomicity guaranees).
Summing up the above, something like below should work fine:
(def processes-volatile (volatile! {}))
(defn create-and-save-process! [id]
(locking processes-volatile
(vswap! processes-volatile
(fn [processes]
(if (get processes id)
processes
(assoc processes id (create-process! id)))))))
You can do this by hand with locking, as OlegTheCat shows, and often that is a fine approach. However, in the comments you remark that it would be nice to avoid having the whole atom locked for as long as it takes to spawn a process, and that too is possible in a surprisingly simple way: instead of having a map from pid to process, have a map from pid to delay of process. That way, you can add a new delay very cheaply, and only actually create the process by dereferencing the delay, outside of the call to swap!. Dereferencing the delay will block waiting for that particular delay, so multiple threads who need the same process will not step on each other's toes, but the atom itself will be unlocked, allowing threads who want a different process to get it.
Here is a sample implementation of that approach, along with example definitions of the other vars your question implies, to make the code runnable as-is:
(def process-results (atom []))
(defn create-process! [id]
;; pretend creating the process takes a long time
(Thread/sleep (* 1000 (rand-int 3)))
(future
;; running it takes longer, but happens on a new thread
(Thread/sleep (* 1000 (rand-int 10)))
(swap! process-results conj id)))
(def processes-atom (atom {}))
(defn cached-process [id]
(-> processes-atom
(swap! (fn [processes]
(update processes id #(or % (delay (create-process! id))))))
(get id)
(deref)))
Of course only cached-process is needed if you already have the other things defined. And a sample run, to show that processes are successfully reused:
(defn stress-test [num-processes]
(reset! process-results [])
(reset! processes-atom {})
(let [running-processes (doall (for [i (range num-processes)]
(cached-process (rand-int 10))))]
(run! deref running-processes)
(deref process-results)))
user> (time (stress-test 40))
"Elapsed time: 18004.617869 msecs"
[1 5 2 0 9 7 8 4 3 6]
I prefer using a channel
(defn create-process! [id] {:id id})
(def ^:private processes-channel (chan))
(go (loop [processes {}]
(let [id (<! processes-channel)
process (if (contains? processes id)
(get processes id)
(create-process! id))]
(>! processes-channel process)
(recur (assoc processes id process)))))
(defn get-process-by-id
"Public API"
[id]
(>!! processes-channel id)
(<!! processes-channel))
Another answer is to use an agent to start each process. This decouples each process from each other, and avoids the problem of possible multiple calls to the "create-process" function:
(defn start-proc-agent
[state]
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" (:id state)))
(Thread/sleep delay)
(println (format "finished %d" (:id state)))
(merge state {:delay delay :state :running} )))
(def procs-agent (atom {}))
(dotimes [i 3]
(let [curr-agent (agent {:id i :state :unstarted})]
(swap! procs-agent assoc i curr-agent)
(send curr-agent start-proc-agent )))
(println "all dispatched...")
(pprint #procs-agent)
(Thread/sleep 3000)
(pprint #procs-agent)
When run we see:
starting 2
starting 1
starting 0
all dispatched...
{0 #<Agent#39d8240b: {:id 0, :state :unstarted}>,
1 #<Agent#3a6732bc: {:id 1, :state :unstarted}>,
2 #<Agent#7414167a: {:id 2, :state :unstarted}>}
finished 0
finished 1
finished 2
{0 #<Agent#39d8240b: {:id 0, :state :running, :delay 317}>,
1 #<Agent#3a6732bc: {:id 1, :state :running, :delay 1635}>,
2 #<Agent#7414167a: {:id 2, :state :running, :delay 1687}>}
So the global map procs-agent associates each process ID with the agent for that process. A side benefit of this approach is that you can send subsequent commands (in the form of functions) to the agent for a process and be assured they are independent (and parallel & asynchronous) to every other agent.
Alternate solution
Similar to your original question, we could use a single agent (instead of an agent per process) to simply serialize the creation of each process. Since agents are asynchronous, they don't have the possibility of re-trying the input function like swap!. Thus, side-effecting functions aren't a problem. You could write it like so:
(defn start-proc-once-only
[state i]
(let [curr-proc (get state i) ]
(if (= :running (:state curr-proc))
(do
(println "skipping restart of" i)
state)
(let [delay (int (* 2000 (rand)))]
(println (format "starting %d" i))
(Thread/sleep delay)
(println (format "finished %d" i))
(assoc state i {:delay delay :state :running})))))
(def procs (agent {}))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(dotimes [i 3]
(println :starting i)
(send procs start-proc-once-only i))
(println "all dispatched...")
(println :procs) (pprint #procs)
(Thread/sleep 5000)
(println :procs) (pprint #procs)
with result
:starting 0
:starting 1
:starting 2
starting 0
:starting 0
:starting 1
:starting 2
all dispatched...
:procs
{}
finished 0
starting 1
finished 1
starting 2
finished 2
skipping restart of 0
skipping restart of 1
skipping restart of 2
:procs
{0 {:delay 1970, :state :running},
1 {:delay 189, :state :running},
2 {:delay 1337, :state :running}}
I think you should use add-watch. It gets called once per change to the atom. In the watch-fn check whether a new id has been added to the atom, if so, create the process and add it to the atom. That'll trigger another call to the watch-fn, but that second call won't identify any new id needing a process.

Strange behavior of clojure ref

I have 100 workers (agents) that share one ref that contains collection of tasks. While this collection have tasks, each worker get one task from this collection (in dosync block), print it and sometimes put it back in the collection (in dosync block):
(defn have-tasks?
[tasks]
(not (empty? #tasks)))
(defn get-task
[tasks]
(dosync
(let [task (first #tasks)]
(alter tasks rest)
task)))
(defn put-task
[tasks task]
(dosync (alter tasks conj task))
nil)
(defn worker
[& {:keys [tasks]}]
(agent {:tasks tasks}))
(defn worker-loop
[{:keys [tasks] :as state}]
(while (have-tasks? tasks)
(let [task (get-task tasks)]
(println "Task: " task)
(when (< (rand) 0.1)
(put-task tasks task))))
state)
(defn create-workers
[count & options]
(->> (range 0 count)
(map (fn [_] (apply worker options)))
(into [])))
(defn start-workers
[workers]
(doseq [worker workers] (send-off worker worker-loop)))
(def tasks (ref (range 1 10000000)))
(def workers (create-workers 100 :tasks tasks))
(start-workers workers)
(apply await workers)
When i run this code, the last value printed by agents is (after several tries):
435445,
4556294,
1322061,
3950017.
But never 9999999 what I expect.
And every time the collection is really empty at the end.
What I'm doing wrong?
Edit:
I rewrote worker-loop as simple as possible:
(defn worker-loop
[{:keys [tasks] :as state}]
(loop []
(when-let [task (get-task tasks)]
(println "Task: " task)
(recur)))
state)
But problem is still there.
This code behaves as expected when create one and only one worker.
The problem here has nothing to do with agents and barely anything to do with laziness. Here's a somewhat reduced version of the original code that still exhibits the problem:
(defn f [init]
(let [state (ref init)
task (fn []
(loop [last-n nil]
(if-let [n (dosync
(let [n (first #state)]
(alter state rest)
n))]
(recur n)
(locking :out
(println "Last seen:" last-n)))))
workers (->> (range 0 5)
(mapv (fn [_] (Thread. task))))]
(doseq [w workers] (.start w))
(doseq [w workers] (.join w))))
(defn r []
(f (range 1 100000)))
(defn i [] (f (->> (iterate inc 1)
(take 100000))))
(defn t []
(f (->> (range 1 100000)
(take Integer/MAX_VALUE))))
Running this code shows that both i and t, both lazy, reliably work, whereas r reliably doesn't. The problem is in fact a concurrency bug in the class returned by the range call. Indeed, that bug is documented in this Clojure ticket and is fixed as of Clojure version 1.9.0-alpha11.
A quick summary of the bug in case the ticket is not accessible for some reason: in the internals of the rest call on the result of range, there was a small opportunity for a race condition: the "flag" that says "the next value has already been computed" was set before the actual value itself, which meant that a second thread could see that flag as true even though the "next value" is still nil. The call to alter would then fix that nil value on the ref. It's been fixed by swapping the two assignment lines.
In cases where the result of range was either forcibly realized in a single thread or wrapped in another lazy seq, that bug would not appear.
I asked this question on the Clojure Google Group and it helped me to find the answer.
The problem is that I used a lazy sequence within the STM transaction.
When I replaced this code:
(def tasks (ref (range 1 10000000)))
by this:
(def tasks (ref (into [] (range 1 10000000))))
it worked as expected!
In my production code where the problem occurred, I used the Korma framework that also returns a lazy collection of tuples, as in my example.
Conclusion: Avoid the use of lazy data structures within the STM transaction.
When the last number in the range is reached, there a are still older numbers being held by the workers. Some of these will be returned to the queue, to be processed again.
In order to better see what is happening, you can change worker-loop to print the last task handled by each worker:
(defn worker-loop
[{:keys [tasks] :as state}]
(loop [last-task nil]
(if (have-tasks? tasks)
(let [task (get-task tasks)]
;; (when (< (rand) 0.1)
;; (put-task tasks task)
(recur task))
(when last-task
(println "Last task:" last-task))))
state)
This also shows the race condition in the code, where tasks seen by have-tasks? often is taken by others when get-task is called near the end of the processing of the tasks.
The race condition can be solved by removing have-tasks? and instead using the return value of nil from get-task as a signal that no more tasks are available (at the moment).
Updated:
As observed, this race conditions does not explain the problem.
Neither is the problem solved by removing a possible race condition in get-task like this:
(defn get-task [tasks]
(dosync
(first (alter tasks rest))))
However changing get-task to use an explicit lock seems to solve the problem:
(defn get-task [tasks]
(locking :lock
(dosync
(let [task (first #tasks)]
(alter tasks rest)
task))))

How to stop go block in ClojureScript / core.async?

Is there an elegant way to stop a running go block?
(without introducing a flag and polluting the code with checks/branches)
(ns example
(:require-macros [cljs.core.async.macros :refer [go]])
(:require [cljs.core.async :refer [<! timeout]]))
(defn some-long-task []
(go
(println "entering")
; some complex long-running task (e.g. fetching something via network)
(<! (timeout 1000))
(<! (timeout 1000))
(<! (timeout 1000))
(<! (timeout 1000))
(println "leaving")))
; run the task
(def task (some-long-task))
; later, realize we no longer need the result and want to cancel it
; (stop! task)
Sorry, this is not possible with core.async today. What you get back from creating a go block is a normal channel what the result of the block will be put on, though this does not give you any handle to the actual block itself.
As stated in Arthur's answer, you cannot terminate a go block immediately, but you since your example indicates a multi-phased task (using sub-tasks), an approach like this might work:
(defn task-processor
"Takes an initial state value and number of tasks (fns). Puts tasks
on a work queue channel and then executes them in a go-loop, with
each task passed the current state. A task's return value is used as
input for next task. When all tasks are processed or queue has been
closed, places current result/state onto a result channel. To allow
nil values, result is wrapped in a map:
{:value state :complete? true/false}
This fn returns a map of {:queue queue-chan :result result-chan}"
[init & tasks]
(assert (pos? (count tasks)))
(let [queue (chan)
result (chan)]
(async/onto-chan queue tasks)
(go-loop [state init, i 0]
(if-let [task (<! queue)]
(recur (task state) (inc i))
(do (prn "task queue finished/terminated")
(>! result {:value state :complete? (== i (count tasks))}))))
{:queue queue
:result result}))
(defn dummy-task [x] (prn :task x) (Thread/sleep 1000) (inc x))
;; kick of tasks
(def proc (apply task-processor 0 (repeat 100 dummy-task)))
;; result handler
(go
(let [res (<! (:result proc))]
(prn :final-result res)))
;; to stop the queue after current task is complete
;; in this example it might take up to an additional second
;; for the terminated result to be delivered
(close! (:queue proc))
You may want to use future and future-cancel for such task.
(def f (future (while (not (Thread/interrupted)) (your-function ... ))))
(future-cancel f)
Why do cancelled Clojure futures continue using CPU?

Producer consumer with qualifications

I am new to clojure and am trying to understand how to properly use its concurrency features, so any critique/suggestions is appreciated.
So I am trying to write a small test program in clojure that works as follows:
there 5 producers and 2 consumers
a producer waits for a random time and then pushes a number onto a shared queue.
a consumer should pull a number off the queue as soon as the queue is nonempty and then sleep for a short time to simulate doing work
the consumers should block when the queue is empty
producers should block when the queue has more than 4 items in it to prevent it from growing huge
Here is my plan for each step above:
the producers and consumers will be agents that don't really care for their state (just nil values or something); i just use the agents to send-off a "consumer" or "producer" function to do at some time. Then the shared queue will be (def queue (ref [])). Perhaps this should be an atom though?
in the "producer" agent function, simply (Thread/sleep (rand-int 1000)) and then (dosync (alter queue conj (rand-int 100))) to push onto the queue.
I am thinking to make the consumer agents watch the queue for changes with add-watcher. Not sure about this though..it will wake up the consumers on any change, even if the change came from a consumer pulling something off (possibly making it empty) . Perhaps checking for this in the watcher function is sufficient. Another problem I see is that if all consumers are busy, then what happens when a producer adds something new to the queue? Does the watched event get queued up on some consumer agent or does it disappear?
see above
I really don't know how to do this. I heard that clojure's seque may be useful, but I couldn't find enough doc on how to use it and my initial testing didn't seem to work (sorry don't have the code on me anymore)
Here's my take on it. I made a point of only using Clojure data structures to see how that would work out. Note that it would have been perfectly usual and idiomatic to take a blocking queue from the Java toolbox and use it here; the code would be easy to adapt, I think. Update: I actually did adapt it to java.util.concurrent.LinkedBlockingQueue, see below.
clojure.lang.PersistentQueue
Call (pro-con) to start a test run; then have a look at the contents of output to see if anything happened and queue-lengths to see if they stayed within the given bound.
Update: To explain why I felt the need to use ensure below (I was asked about this on IRC), this is to prevent write skew (see the Wikipedia article on Snapshot isolation for a definition). If I substituted #queue for (ensure queue), it would become possible for two or more producers to check the length of the queue, find that it is less than 4, then place additional items on the queue and possibly bring the total length of the queue above 4, breaking the constraint. Similarly, two consumers doing #queue could accept the same item for processing, then pop two items off the queue. ensure prevents either of these scenarios from happening.
(def go-on? (atom true))
(def queue (ref clojure.lang.PersistentQueue/EMPTY))
(def output (ref ()))
(def queue-lengths (ref ()))
(def *max-queue-length* 4)
(defn overseer
([] (overseer 20000))
([timeout]
(Thread/sleep timeout)
(swap! go-on? not)))
(defn queue-length-watch [_ _ _ new-queue-state]
(dosync (alter queue-lengths conj (count new-queue-state))))
(add-watch queue :queue-length-watch queue-length-watch)
(defn producer [tag]
(future
(while #go-on?
(if (dosync (let [l (count (ensure queue))]
(when (< l *max-queue-length*)
(alter queue conj tag)
true)))
(Thread/sleep (rand-int 2000))))))
(defn consumer []
(future
(while #go-on?
(Thread/sleep 100) ; don't look at the queue too often
(when-let [item (dosync (let [item (first (ensure queue))]
(alter queue pop)
item))]
(Thread/sleep (rand-int 500)) ; do stuff
(dosync (alter output conj item)))))) ; and let us know
(defn pro-con []
(reset! go-on? true)
(dorun (map #(%1 %2)
(repeat 5 producer)
(iterate inc 0)))
(dorun (repeatedly 2 consumer))
(overseer))
java.util.concurrent.LinkedBlockingQueue
A version of the above written using LinkedBlockingQueue. Note how the general outline of the code is basically the same, with some details actually being slightly cleaner. I removed queue-lengths from this version, as LBQ takes care of that constraint for us.
(def go-on? (atom true))
(def *max-queue-length* 4)
(def queue (java.util.concurrent.LinkedBlockingQueue. *max-queue-length*))
(def output (ref ()))
(defn overseer
([] (overseer 20000))
([timeout]
(Thread/sleep timeout)
(swap! go-on? not)))
(defn producer [tag]
(future
(while #go-on?
(.put queue tag)
(Thread/sleep (rand-int 2000)))))
(defn consumer []
(future
(while #go-on?
;; I'm using .poll on the next line so as not to block
;; indefinitely if we're done; note that this has the
;; side effect that nulls = nils on the queue will not
;; be handled; there's a number of other ways to go about
;; this if this is a problem, see docs on LinkedBlockingQueue
(when-let [item (.poll queue)]
(Thread/sleep (rand-int 500)) ; do stuff
(dosync (alter output conj item)))))) ; and let us know
(defn pro-con []
(reset! go-on? true)
(dorun (map #(%1 %2)
(repeat 5 producer)
(iterate inc 0)))
(dorun (repeatedly 2 consumer))
(overseer))