Create 10k+ agents in clojure

In my tests, a separate thread appears to be used for each new agent I create.
Could several agents run in one thread?
My idea is to create 10K+ lightweight agents (like actors in Erlang), so is that a challenge for Clojure?
Thanks

This is incorrect. Agents use a thread pool whose size is the number of cores + 2, so on a quad-core machine even 10k+ agents will only use 6 worker threads.
That is with send. With send-off, actions run on a separate, unbounded thread pool, so new threads will be started as needed.
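
To check this from a REPL, here is a small sketch (the names thread-names, agents, total and pool-size are made up for illustration). It creates 10,000 agents, sends each one a single action with send, and records which threads ran the actions. The number of distinct threads should stay within the cores + 2 limit described above.

```clojure
;; Names of the threads that actually executed agent actions.
(def thread-names (atom #{}))

;; 10,000 agents, each holding the number 0.
(def agents (mapv (fn [_] (agent 0)) (range 10000)))

;; send dispatches the action on the fixed-size agent pool.
(doseq [a agents]
  (send a (fn [n]
            (swap! thread-names conj (.getName (Thread/currentThread)))
            (inc n))))

;; Block until every agent has processed its action.
(apply await agents)

(def total (reduce + (map deref agents)))
(def pool-size (+ 2 (.availableProcessors (Runtime/getRuntime))))

(println "agents updated:" total
         "distinct threads:" (count @thread-names)
         "pool size:" pool-size)

;; In a script (not a REPL), call (shutdown-agents) when done so the JVM can exit.
```

Swapping send for send-off in the sketch is an easy way to see the difference, since send-off is free to start more threads.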

Consider using a java.util.concurrent.DelayQueue.
Here's a sketch of how it would work (delayed-function is a bit cumbersome here, but it basically constructs an instance of j.u.c.Delayed for submission to the queue):
(import [java.util.concurrent Delayed DelayQueue TimeUnit])
(defn delayed-function [f]
  (let [execute-time    (+ 5000 (System/currentTimeMillis))
        ;; remaining delay, converted into the unit the queue asks for
        remaining-delay (fn [t] (.convert t
                                          (- execute-time
                                             (System/currentTimeMillis))
                                          TimeUnit/MILLISECONDS))]
    (reify
      Delayed    (getDelay [_ t] (remaining-delay t))
      Runnable   (run [_] (f))
      ;; order entries by remaining delay so the queue head is the next one due
      Comparable (compareTo [this other]
                   (compare (.getDelay this TimeUnit/MILLISECONDS)
                            (.getDelay ^Delayed other TimeUnit/MILLISECONDS))))))
;;; Use Java's DelayQueue from Clojure.
;;; See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/DelayQueue.html
(def q (DelayQueue.))
(defn delayed
  "Puts the function f on the queue; it will only execute after the delay
  expires."
  [f]
  (.offer q (delayed-function f)))
(defn start-processing
  "Starts a thread that endlessly reads from the delay queue and
  executes the function found in the queue."
  []
  (.start
   (Thread.
    #(while true
       (.run (.take q))))))
user> (start-processing)
user> (delayed #(println "Hello"))
; 5 seconds pass
Hello

The at-at library, which was developed to support the (in my opinion fantastic) Overtone music synthesizer, provides a nice clean interface for running functions at a specific point in time (at) or after a delay (after).
(use 'overtone.at-at)
(def my-pool (mk-pool))
(after 1000 #(println "hello from the past!") my-pool)

Related

Sleeping Barber Problem in Clojure using core/async and go

Hello and thank you for any help. I am just starting to learn Clojure, and think it's amazing. Below is my code for the sleeping barber problem. I thought that dropping-buffer from core.async would be perfect for this problem, and while it seems to work, it never stops.
The haircuts and dropping buffer seem to work right.
---Edited
It does stop now. But I get an error trying to check customer-num for nil (I've noted the line in the code below). It seems like it can't do an if on a nil because it's nil!
(if (not (nil? customer-num)) ;; throws error => Cannot invoke "clojure.lang.IFn.invoke()" because the return value of "clojure.lang.IFn.invoke(Object)" is null
---End of edit
Also, how to get the return value of the number of haircuts to the calling operate-shop?
Sleeping barber problem as written up in Seven Languages in Seven Weeks. It was created by Edsger Dijkstra in 1965.
A barber shop takes customers.
Customers arrive at random intervals, from ten to thirty milliseconds.
The barber shop has three chairs in the waiting room.
The barber shop has one barber and one barber chair.
When the barber’s chair is empty, a customer sits in the chair, wakes up the barber, and gets a haircut.
If the chairs are occupied, all new customers will turn away.
Haircuts take twenty milliseconds.
After a customer receives a haircut, he gets up and leaves.
Determine how many haircuts a barber can give in ten seconds.
(ns sleepbarber
  (:require [clojure.core.async
             :as a
             :refer [>! <! >!! <!! go go-loop chan dropping-buffer close! thread
                     alt! alts! alts!! timeout]]))
(def barber-shop (chan (dropping-buffer 3))) ;; no more than 3 customers waiting
(defn cut-hair []
  (go-loop [haircuts 0]
    (let [customer-num (<! barber-shop)]
      (if (not (nil? customer-num)) ;; throws error => Cannot invoke "clojure.lang.IFn.invoke()" because the return value of "clojure.lang.IFn.invoke(Object)" is null
        (do (<! (timeout 20)) ;; wait for haircut to finish
            (println haircuts "haircuts!" (- customer-num haircuts) "customers turned away!!")
            (recur (inc haircuts)))
        haircuts))))
(defn operate-shop [open-time]
  ((let [[_ opening] (alts!! [(timeout open-time)
                              (go-loop [customer 0]
                                (<! (timeout (+ 10 (rand-int 20)))) ;; wait for random arrival of customers
                                (>! barber-shop customer)
                                (recur (+ customer 1)))])]
     (close! barber-shop)
     (close! opening))))
(cut-hair)
(operate-shop 2000)

Without running your code to confirm my suspicions, I see two problems with your implementation.
The first is that the body of operate-shop starts with ((, which you appear to intend as a grouping mechanism. But of course, in Clojure, (f x y) is how you call the function f with arguments x y. So your implementation calls alts!, then calls close!, then calls shutdown-agents - all intended so far - but then calls the result of alts! (which surely is not a function) with two nil arguments. So you should get a ClassCastException once your shop closes. Normally I would recommend just removing the outer parens, but since you're using core.async you should wrap the body in go, as in (go x y z). Is this your real code? If you call alts! outside of a go context, as your snippet suggests, you can only get a runtime error.
The second is that your first go-loop has no termination condition. You treat customer-num as if it were a number, but if the channel is closed, it will be nil: that's how you can tell a channel is closed. Involving it in subtraction should throw some kind of exception. Instead, you should check whether the result is nil, and if so, exit the loop as the shop is closed.
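
To make the first point concrete without any core.async, here is a minimal sketch. The function names close-shop-wrong and close-shop-right are hypothetical and only illustrate the double-paren problem: the extra pair of parentheses calls whatever the let returns, and calling nil fails at runtime.

```clojure
;; Wrong: the outer parens CALL the value of the let form.
;; The let returns nil here, so this amounts to (nil).
(defn close-shop-wrong []
  ((let [status :closed]
     (println "closing, status" status)
     nil)))

;; Right: a let already groups several expressions; no extra parens needed.
(defn close-shop-right []
  (let [status :closed]
    (println "closing, status" status)
    status))

;; Calling nil as a function throws a NullPointerException.
(def outcome
  (try
    (close-shop-wrong)
    :no-error
    (catch NullPointerException _
      :npe)))

(println outcome (close-shop-right))
```

If several expressions need grouping outside a let, do is the form for that, not an extra pair of parentheses.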

Closing a channel at the producer end when all the jobs are finished

For my Mandelbrot explorer project, I need to run several expensive jobs, ideally in parallel. I decided to try chunking the jobs and running each chunk in its own thread, and ended up with something like
(defn point-calculator [chunk-size points]
  (let [out-chan (chan (count points))
        chunked (partition chunk-size points)]
    (doseq [chunk chunked]
      (thread
        (let [processed-chunk (expensive-calculation chunk)]
          (>!! out-chan processed-chunk))))
    out-chan))
Where points is a list of [real, imaginary] coordinates to be tested, and expensive-calculation is a function that takes the chunk, and tests each point in the chunk. Each chunk can take a long time to finish (potentially a minute or more depending on the chunk size and the number of jobs).
On my consumer end, I'm using
(loop []
  (when-let [proc-chunk (<!! result-chan)]
    ; Do stuff with chunk
    (recur)))
To consume each processed chunk. Right now, this blocks when the last chunk is consumed since the channel is still open.
I need a way of closing the channel when the jobs are done. This is proving difficult because of asynchronicity of the producer loop. I can't simply put a close! after the doseq since the loop doesn't block, and I can't just close when the last-indexed job is done, since the order is indeterminate.
The best idea I could come up with was maintaining a (atom #{}) of jobs, and disj each job as it finishes. Then I could either check for the set size in the loop, and close! when it's 0, or attach a watch to the atom and check there.
This seems very hackish though. Is there a more idiomatic way of dealing with this? Does this scenario suggest I'm using async incorrectly?

I would take a look at the take function from core.async. This is what its documentation says:
"Returns a channel that will return, at most, n items from ch. After n items
have been returned, or ch has been closed, the return channel will close.
"
That leads to a simple fix: instead of returning out-chan directly, you can wrap it in take:
(clojure.core.async/take (count chunked) out-chan)
That should work.
I would also recommend rewriting your example from blocking put/take to parking (<!, >!), and from thread to go / go-loop, which is the more idiomatic usage of core.async.

You may want to use async/pipeline(-blocking) to control parallelism, and use async/onto-chan to close the input channel automatically after all the chunks are copied.
E.g. the example below shows a 16x improvement in elapsed time when parallelism is set to 16.
(defn expensive-calculation [pts]
  (Thread/sleep 100)
  (reduce + pts))
(time
 (let [points (take 10000 (repeatedly #(rand 100)))
       chunk-size 500
       inp-chan (chan)
       out-chan (chan)]
   (go-loop [] (when-let [res (<! out-chan)]
                 ;; do stuff with chunk
                 (recur)))
   (pipeline-blocking 16 out-chan (map expensive-calculation) inp-chan)
   (<!! (onto-chan inp-chan (partition-all chunk-size points)))))
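
Another way to get a result channel that closes by itself, shown here only as a sketch, relies on two documented behaviours: thread returns a channel that receives the body's result and then closes, and async/merge closes its output once all of its input channels have closed. The expensive-calculation below is a made-up stand-in, and consume-all is a hypothetical helper for the consumer loop from the question.

```clojure
(require '[clojure.core.async :as async :refer [thread <!!]])

;; Hypothetical stand-in for the real per-chunk work.
(defn expensive-calculation [chunk]
  (Thread/sleep 50)
  (reduce + chunk))

(defn point-calculator [chunk-size points]
  ;; One thread (and one result channel) per chunk; merge funnels them
  ;; into a single channel that closes when every worker has finished.
  (async/merge
   (mapv (fn [chunk] (thread (expensive-calculation chunk)))
         (partition-all chunk-size points))))

(defn consume-all [result-chan]
  ;; Reads until the channel is closed (<!! returns nil), then returns
  ;; everything it saw. Arrival order is not deterministic.
  (loop [acc []]
    (if-some [proc-chunk (<!! result-chan)]
      (recur (conj acc proc-chunk))
      acc)))

(def results (consume-all (point-calculator 10 (range 100))))
(println (count results) "chunks processed")
```

One caveat with this shape: a worker whose body returns nil contributes nothing to the merged channel, because a thread channel cannot carry nil.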

Should clojure.core.async channel be drained to release parked puts

The problem: I have a channel that a consumer reads from, and the consumer might stop reading once it has enough data. When the reader stops, it closes the channel with clojure.core.async/close!
The documentation says that at this moment all puts to channel after close is invoked should return false and do nothing. But the documentation also says that
Logically closing happens after all puts have been delivered. Therefore, any blocked or parked puts will remain blocked/parked until a taker releases them.
Does it mean that to release producers that were already blocked in parked puts at the moment of closing channel I should always also drain channel (read all remaining items) at consumer side? Following code shows that go block never finishes:
(require '[clojure.core.async :as a])
(let [c (a/chan)]
  (a/go
    (prn "Go")
    (prn "Put" (a/>! c 333)))
  (Thread/sleep 300) ;; Let go block to be scheduled
  (a/close! c))
If this is true and I do not want to read all events, should I implement e.g. timeouts on the producer side to detect that no more data is needed?
Is there a simpler way for the consumer to say "enough" and push back, so that the producer also stops gracefully?
I found out that clojure.core.async/put! does not block, which avoids unnecessary blocking. Are there disadvantages to using it instead of clojure.core.async/>!?

Closing a chan frees everyone who is reading from it, and leaves writers blocked.
Here is the reading case (where it works nicely):
user> (def a-chan (async/chan))
#'user/a-chan
user> (future (async/<!! a-chan)
              (println "continuing after take"))
#future[{:status :pending, :val nil} 0x5fb5a025]
user> (async/close! a-chan)
nil
user> continuing after take
And here is a test of the writing case where, as you say, draining it may be a good idea:
user> (def b-chan (async/chan))
#'user/b-chan
user> (future (try (async/>!! b-chan 4)
                   (println "continuing after put")
                   (catch Exception e
                     (println "got exception" e))
                   (finally
                     (println "finished in finally"))))
#future[{:status :pending, :val nil} 0x17be0f7b]
user> (async/close! b-chan)
nil
I don't find any evidence of the stuck writer unblocking here when the chan is closed.

This behavior is intended, since they explicitly state it in the docs!
In your case, do (while (async/poll! c)) after closing channel c to release all blocked/parked (message sending) threads/go-blocks.
If you want to do anything with the content you can do:
(->> (repeatedly #(async/poll! c))
     (take-while identity))
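
As a rough illustration of the drain-after-close idea, the sketch below queues three puts on an unbuffered channel with put!, closes the channel, and then drains it with poll!. The names released, drained and results are made up for the example, and the promises exist only so the put callbacks can be observed.

```clojure
(require '[clojure.core.async :as a])

(def c (a/chan))

;; One promise per pending put, delivered by that put's callback.
(def released (vec (repeatedly 3 promise)))

;; Nobody is taking yet, so these three puts stay pending on the channel.
(doseq [[i p] (map vector (range) released)]
  (a/put! c i (fn [accepted?] (deliver p accepted?))))

;; Closing does not discard or release the pending puts.
(a/close! c)

;; Drain: poll! hands back each pending value, then nil once nothing is left.
(def drained (vec (take-while some? (repeatedly #(a/poll! c)))))

;; Every callback should have fired by now; the timeout is only a guard.
(def results (mapv #(deref % 1000 :timeout) released))

(println "drained:" drained "callbacks:" results)
```

Using some? instead of identity in the drain keeps a false value on the channel from ending the loop early.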

easiest way to use a i/o callback within concurrent http-kit/get instances

I am launching a few hundred concurrent http-kit.client/get requests, each provided with a callback that writes results to a single file.
What would be a good way to deal with thread-safety? Using chan and <!! from core.async?
Here's the code I would consider :
(defn launch-async [channel url]
  (http/get url {:timeout 5000
                 :user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:10.0) Gecko/20100101 Firefox/10.0"}
            (fn [{:keys [status headers body error]}]
              (if error
                (put! channel (json/generate-string {:url url :headers headers :status status}))
                (put! channel (json/generate-string body))))))
(defn process-async [channel func]
  (when-let [response (<!! channel)]
    (func response)))
(defn http-gets-async [func urls]
  (let [channel (chan)]
    (doall (map #(launch-async channel %) urls))
    (process-async channel func)))
Thanks for your insights.

Since you are already using core.async in your example, I thought I'd point out a few issues and how you can address them. The other answer mentions using a more basic approach, and I agree wholeheartedly that a simpler approach is just fine. However, with channels, you have a simple way of consuming the data which does not involve mapping over a vector, which will also grow large over time if you have many responses. Consider the following issues and how we can fix them:
(1) Your current version will crash if your url list has more than 1024 elements. There's an internal buffer for puts and takes that are asynchronous (i.e., put! and take! don't block but always return immediately), and the limit is 1024. This is in place to prevent unbounded asynchronous usage of the channel. To see for yourself, call (http-gets-async println (repeat 1025 "http://blah-blah-asdf-fakedomain.com")).
What you want to do is to only put something on the channel when there's room to do so. This is called back-pressure. Taking a page from the excellent wiki on go block best practices, one clever way to do this from your http-kit callback is to use the put! callback option to launch your next http get; this will only happen when the put! immediately succeeds, so you will never have a situation where you can go beyond the channel's buffer:
(defn launch-async
  [channel [url & urls]]
  (when url
    (http/get url {:timeout 5000
                   :user-agent "Mozilla"}
              (fn [{:keys [status headers body error]}]
                (let [put-on-chan (if error
                                    (json/generate-string {:url url :headers headers :status status})
                                    (json/generate-string body))]
                  (put! channel put-on-chan (fn [_] (launch-async channel urls))))))))
(2) Next, you seem to be only processing one response. Instead, use a go-loop:
(defn process-async
  [channel func]
  (go-loop []
    (when-let [response (<! channel)]
      (func response)
      (recur))))
(3) Here's your http-gets-async function. I see no harm in adding a buffer here, as it should help you fire off a nice burst of requests at the beginning:
(defn http-gets-async
  [func urls]
  (let [channel (chan 1000)]
    (launch-async channel urls)
    (process-async channel func)))
Now, you have the ability to process an infinite number of urls, with back-pressure. To test this, define a counter, and then make your processing function increment this counter to see your progress. Using a localhost URL that is easy to bang on (wouldn't recommend firing off hundreds of thousands of requests to, say, google, etc.):
(def responses (atom 0))
(http-gets-async (fn [_] (swap! responses inc))
                 (repeat 1000000 "http://localhost:8000"))
As this is all asynchronous, your function will return immediately and you can watch @responses grow.
One other interesting thing you can do is instead of running your processing function in process-async, you could optionally apply it as a transducer on the channel itself.
(defn process-async
  [channel]
  (go-loop []
    (when-let [_ (<! channel)]
      (recur))))
(defn http-gets-async
  [func urls]
  (let [channel (chan 10000 (map func))] ;; <-- transducer on channel
    (launch-async channel urls)
    (process-async channel)))
There are many ways to do this, including constructing it so that the channel closes (note that above, it stays open). You have java.util.concurrent primitives to help in this regard if you like, and they are quite easy to use. The possibilities are very numerous.

This is simple enough that I wouldn't use core.async for it. You can do this with an atom storing a vector of the responses, then have a separate thread reading the contents of the atom until it has seen all of the responses. Then, in your http-kit callback, you could just swap! the response into the atom directly.
If you do want to use core.async, I'd recommend a buffered channel to keep from blocking your http-kit thread pool.
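
A minimal sketch of the atom approach, with no HTTP involved: fake-get below is a hypothetical stand-in for http-kit's callback-style get (it is not part of http-kit) and simply runs the callback on another thread. The point is only that concurrent callbacks can conj into one atom safely.

```clojure
;; All responses end up in this vector; swap! makes concurrent updates safe.
(def responses (atom []))

;; Hypothetical stand-in for an asynchronous HTTP client: calls the
;; callback with a fake response map from a background thread.
(defn fake-get [url callback]
  (future
    (Thread/sleep (rand-int 20))
    (callback {:url url :status 200})))

(def urls (mapv #(str "http://localhost:8000/" %) (range 100)))

;; Launch every request; each callback appends its response to the atom.
(def pending
  (mapv (fn [url]
          (fake-get url (fn [resp] (swap! responses conj resp))))
        urls))

;; Wait for all callbacks, then handle the collected responses in one
;; place, for example by writing them to a single file from one thread.
(run! deref pending)

(println (count @responses) "responses collected")

;; In a script (not a REPL), call (shutdown-agents) when done so the JVM can exit.
```

The order of the collected responses depends on which callbacks finish first, so anything that needs the original order should carry the URL or an index in each response.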

leiningen run does not return for one minute after invoking future

If I wrap a function in a future and invoke this from leiningen on the commandline, it adds 1 full minute to the runtime. Can any one tell me why this is? Can you also tell me how to stop this behavior? I'd like to avoid this extra minute.
Example code:
(ns futest.core
  (:gen-class))
(defn testme []
  (Thread/sleep 2000)
  42)
(defn -main
  [& args]
  (if (= (nth args 0) "a")
    ;; either run a simple function that takes 2 seconds
    (do
      (println "no future sleep")
      (let [t (testme)]
        (println t))
      (println "done."))
    ;; or run a simple function that takes 2 seconds as a future
    (do
      (println "future sleep")
      (let [t (future (testme))]
        (println @t))
      (println "done.")
      ;; soo, why does it wait for 1 minute here?
      )))

This is because futures run on the agent thread pools. Agents use two thread pools: the first is a fixed thread pool and the second a cached thread pool. The cached thread pool terminates threads that have been inactive for a certain duration, the default being 60 seconds. This is the reason you see the 60-second delay. Of course, if you manually call shutdown-agents, both thread pools terminate, leaving no non-daemon threads to block your exit.

As noted in the answer to this question you need to call shutdown-agents at the end of your -main method.
I'm posting this as self-answered Q&A since that question doesn't mention future, so it didn't turn up on my google searches. Sure enough, if I add:
      ;; soo, why does it wait for 1 minute here?
      (shutdown-agents)
      )))
the problem goes away.
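
A slightly more defensive variant, offered as a sketch rather than the only way to do it, puts the call in a finally so the pools are shut down even if the future's body throws. The function slow-answer is a made-up stand-in for testme.

```clojure
;; Stand-in for the slow function from the question.
(defn slow-answer []
  (Thread/sleep 200)
  42)

(defn -main [& _]
  (try
    (let [t (future (slow-answer))]
      ;; Dereferencing blocks until the future has finished.
      (println @t))
    (finally
      ;; Stops the agent/future thread pools so the JVM can exit right away.
      (shutdown-agents))))
```

After shutdown-agents has run, no further futures or agent actions can be started in that JVM, so it belongs at the very end of the program.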
