Timing loading of Clojure source files

I want to do some timings of how quickly clojure files load.
Is there a way to hook into the loader to get this information?
I would like to obtain information like this, e.g. when A depends on B, which depends on C:
A -> starting to load at T(a0)
B -> starting to load at T(b0)
C -> starting to load at T(c0)
C -> loaded at T(c1), time taken 50ms
B -> loaded at T(b1), time taken 100ms
A -> loaded at T(a1), time taken 200ms

You could replace clojure.core/load-lib (or a similar function like load-one or load) with a wrapping function that logs/times each invocation:
(alter-var-root
  #'clojure.core/load-lib
  (fn [f]
    (fn [prefix lib & options]
      (println lib "loading...")
      (let [start   (System/nanoTime)
            result  (apply f prefix lib options)
            elapsed (double (/ (- (System/nanoTime) start) 1000000))]
        (println lib "loaded in" elapsed "ms")
        result))))
Then whenever you load a file with a ns declaration, or use require/use, etc., you'll see output like this:
(require 'clojure.spec.test.alpha)
clojure.spec.test.alpha loading...
clojure.pprint loading...
clojure.pprint loaded in 0.159361 ms
clojure.spec.alpha loading...
clojure.spec.alpha loaded in 0.108056 ms
clojure.spec.gen.alpha loading...
clojure.spec.gen.alpha loaded in 0.116731 ms
clojure.string loading...
clojure.string loaded in 0.241387 ms
clojure.spec.test.alpha loaded in 103.37399 ms
=> nil
clojure.tools.namespace might also have some useful functionality for this.
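To get closer to the nested start/finish report asked for in the question, the wrapper can track nesting depth in a dynamic var. This is only a minimal sketch, not something Clojure provides: `*depth*`, `with-timing` and `label-fn` are names made up here, and the wrapper is generic so it can be tried on ordinary functions before touching the loader.

```clojure
;; Hypothetical helper names; nothing here is part of clojure.core.
(def ^:dynamic *depth* 0)

(defn with-timing
  "Wraps f so that every call prints an indented start line and finish line.
  label-fn receives the same arguments as f and returns a printable name."
  [f label-fn]
  (fn [& args]
    (let [label  (apply label-fn args)
          indent (apply str (repeat *depth* "  "))
          start  (System/nanoTime)]
      (println (str indent label " -> starting to load"))
      ;; Nested calls made by f see a deeper indentation level.
      (let [result (binding [*depth* (inc *depth*)]
                     (apply f args))
            ms     (/ (- (System/nanoTime) start) 1e6)]
        (println (str indent label " -> loaded, time taken "
                      (format "%.3f" ms) " ms"))
        result))))

;; Possible installation, assuming load-lib keeps the
;; [prefix lib & options] argument list used in the answer above:
;; (alter-var-root #'clojure.core/load-lib with-timing (fn [prefix lib & _] lib))
```

Because `alter-var-root` passes the current function as the first argument, `with-timing` can be handed to it directly along with the labelling function.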

Related

How can I record time for function call in clojure

I am a newbie to Clojure. I am invoking a Clojure function from Java, and I want to record the time a particular line of Clojure code takes to execute.
Suppose my Clojure function is:
(defn sampleFunction [sampleInput]
  (fun1 (fun2 sampleInput)))
I invoke this function from Java; it returns a String value, and I want to record the time it takes to execute fun2.
I have another function, say logTime, which writes the parameter passed to it to a database:
(defn logTime [time]
.....
)
My question is: how can I modify sampleFunction to invoke logTime so that it records the time it took to execute fun2?
Thank you in advance.
I'm not entirely sure how the different pieces of your code fit together and interoperate with Java, but here's something that could work with the way you described it.
To get the execution time of a piece of code, there's a core macro called time. However, it doesn't return the execution time, it just prints it. So given that you want to log that time into a database, we need to write a macro to capture both the return value of fun2 as well as the time it took to execute:
(defmacro time-execution
  [& body]
  `(let [s# (new java.io.StringWriter)]
     (binding [*out* s#]
       (hash-map :return (time ~@body)
                 :time   (.replaceAll (str s#) "[^0-9\\.]" "")))))
What this macro does is bind standard output to a Java StringWriter, so that we can use it to store whatever time prints. To return both the result of fun2 and the time it took to execute, we package the two values in a hash-map (it could be some other collection too; we'll end up destructuring it later). Notice that the code whose execution we're timing is wrapped in a call to time, so that we trigger the printing side effect and capture it in s#. Finally, the .replaceAll is just to ensure that we're only extracting the actual numeric value (in milliseconds), since time prints something of the form "Elapsed time: 0.014617 msecs".
Incorporating this into your code, we need to rewrite sampleFunction like so:
(defn sampleFunction [sampleInput]
  (let [{:keys [return time]} (time-execution (fun2 sampleInput))]
    (logTime time)
    (fun1 return)))
We're simply destructuring the hash-map to access both the return value of fun2 and the time it took to execute, then we log the execution time using logTime, and finally we finish by calling fun1 on the return value of fun2.
The tupelo.profile namespace of the Tupelo library gives you many options if you want to capture execution time for one or more functions and accumulate it over multiple calls. An example:
(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [tupelo.profile :as prof]))
(defn add2 [x y] (+ x y))
(prof/defnp fast [] (reduce add2 0 (range 10000)))
(prof/defnp slow [] (reduce add2 0 (range 10000000)))
(dotest
  (prof/timer-stats-reset)
  (dotimes [i 10000] (fast))
  (dotimes [i 10] (slow))
  (prof/print-profile-stats))
with result:
--------------------------------------
Clojure 1.10.2-alpha1 Java 14
--------------------------------------
Testing tst.demo.core
---------------------------------------------------------------------------------------------------
Profile Stats:
 Samples    TOTAL      MEAN     SIGMA  ID
   10000    0.955  0.000096  0.000045  :tst.demo.core/fast
      10    0.905  0.090500  0.000965  :tst.demo.core/slow
---------------------------------------------------------------------------------------------------
If you want detailed timing for a single method, the Criterium library is what you need. Start off with the quick-bench function.
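The accepted answer has some shortcomings: because it rebinds *out*, it swallows anything the timed code itself prints or logs to standard output.
A simpler alternative that avoids this (note that :time here is in seconds, since the millisecond difference is divided by 1000):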
(defmacro time-execution [body]
  `(let [st#     (System/currentTimeMillis)
         return# ~body
         se#     (System/currentTimeMillis)]
     {:return return#
      :time   (double (/ (- se# st#) 1000))}))
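To show how a macro like this plugs into the function from the question, here is a minimal self-contained sketch. The names `timed`, `log-time` and `sample-function` are made up for illustration, `fun1` and `fun2` are stand-ins for the asker's real functions, and an atom stands in for the database. This variant uses System/nanoTime and reports milliseconds.

```clojure
(defmacro timed
  "Evaluates body once; returns {:return value, :ms elapsed-milliseconds}."
  [& body]
  `(let [start# (System/nanoTime)
         ret#   (do ~@body)]
     {:return ret#
      :ms     (/ (- (System/nanoTime) start#) 1e6)}))

;; Hypothetical stand-ins for the question's functions.
(def logged (atom []))                    ; pretend database
(defn log-time [ms] (swap! logged conj ms))
(defn fun2 [s] (Thread/sleep 20) (str s "-2"))
(defn fun1 [s] (str s "-1"))

(defn sample-function [input]
  ;; Time only fun2, log the duration, then pass its result on to fun1.
  (let [{:keys [return ms]} (timed (fun2 input))]
    (log-time ms)
    (fun1 return)))
```

Since nothing is rebound, output printed by the timed code still reaches the console.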

easiest way to use a i/o callback within concurrent http-kit/get instances

I am launching a few hundred concurrent http-kit.client/get requests, each provided with a callback that writes its result to a single file.
What would be a good way to deal with thread-safety? Using chan and <!! from core.async?
Here's the code I would consider:
(defn launch-async [channel url]
  (http/get url {:timeout 5000
                 :user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:10.0) Gecko/20100101 Firefox/10.0"}
            (fn [{:keys [status headers body error]}]
              (if error
                (put! channel (json/generate-string {:url url :headers headers :status status}))
                (put! channel (json/generate-string body))))))
(defn process-async [channel func]
  (when-let [response (<!! channel)]
    (func response)))
(defn http-gets-async [func urls]
  (let [channel (chan)]
    (doall (map #(launch-async channel %) urls))
    (process-async channel func)))
Thanks for your insights.
Since you are already using core.async in your example, I thought I'd point out a few issues and how you can address them. The other answer mentions using a more basic approach, and I agree wholeheartedly that a simpler approach is just fine. However, with channels, you have a simple way of consuming the data which does not involve mapping over a vector, which will also grow large over time if you have many responses. Consider the following issues and how we can fix them:
(1) Your current version will crash if your url list has more than 1024 elements. There's an internal buffer for puts and takes that are asynchronous (i.e., put! and take! don't block but always return immediately), and the limit is 1024. This is in place to prevent unbounded asynchronous usage of the channel. To see for yourself, call (http-gets-async println (repeat 1025 "http://blah-blah-asdf-fakedomain.com")).
What you want to do is to only put something on the channel when there's room to do so. This is called back-pressure. Taking a page from the excellent wiki on go block best practices, one clever way to do this from your http-kit callback is to use the put! callback option to launch your next http get; this will only happen when the put! immediately succeeds, so you will never have a situation where you can go beyond the channel's buffer:
(defn launch-async
  [channel [url & urls]]
  (when url
    (http/get url {:timeout 5000
                   :user-agent "Mozilla"}
              (fn [{:keys [status headers body error]}]
                (let [put-on-chan (if error
                                    (json/generate-string {:url url :headers headers :status status})
                                    (json/generate-string body))]
                  (put! channel put-on-chan (fn [_] (launch-async channel urls))))))))
(2) Next, you seem to be only processing one response. Instead, use a go-loop:
(defn process-async
  [channel func]
  (go-loop []
    (when-let [response (<! channel)]
      (func response)
      (recur))))
(3) Here's your http-gets-async function. I see no harm in adding a buffer here, as it should help you fire off a nice burst of requests at the beginning:
(defn http-gets-async
  [func urls]
  (let [channel (chan 1000)]
    (launch-async channel urls)
    (process-async channel func)))
Now, you have the ability to process an infinite number of urls, with back-pressure. To test this, define a counter, and then make your processing function increment this counter to see your progress. Using a localhost URL that is easy to bang on (wouldn't recommend firing off hundreds of thousands of requests to, say, google, etc.):
(def responses (atom 0))
(http-gets-async (fn [_] (swap! responses inc))
                 (repeat 1000000 "http://localhost:8000"))
As this is all asynchronous, your function will return immediately and you can watch @responses grow.
One other interesting thing you can do is instead of running your processing function in process-async, you could optionally apply it as a transducer on the channel itself.
(defn process-async
  [channel]
  (go-loop []
    (when-let [_ (<! channel)]
      (recur))))
(defn http-gets-async
  [func urls]
  (let [channel (chan 10000 (map func))] ;; <-- transducer on channel
    (launch-async channel urls)
    (process-async channel)))
There are many ways to do this, including constructing it so that the channel closes (note that above, it stays open). You have java.util.concurrent primitives to help in this regard if you like, and they are quite easy to use. The possibilities are very numerous.
This is simple enough that I wouldn't use core.async for it. You can do this with an atom storing a vector of the responses, then have a separate thread reading the contents of the atom until it has seen all of the responses. Then, in your http-kit callback, you could just swap! the response into the atom directly.
If you do want to use core.async, I'd recommend a buffered channel to keep from blocking your http-kit thread pool.
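A minimal sketch of the atom-based approach, using only clojure.core so it runs without http-kit. `fake-get` is a hypothetical stand-in for the real asynchronous client (it just invokes the callback on another thread), and `collect-responses` is a name made up here.

```clojure
(defn fake-get
  "Hypothetical stand-in for an async HTTP client: calls callback with a
  response map on another thread."
  [url callback]
  (future (callback {:url url :status 200})))

(defn collect-responses
  "Fires one request per URL, lets each callback swap! its response into an
  atom, and blocks until every response has arrived."
  [urls]
  (let [responses (atom [])
        expected  (count urls)]
    (doseq [url urls]
      (fake-get url (fn [resp] (swap! responses conj resp))))
    ;; Poll until all callbacks have run; this sketch has no timeout handling.
    (while (< (count @responses) expected)
      (Thread/sleep 10))
    @responses))
```

Once `collect-responses` returns, a single thread holds the whole vector and can write it to the file, so the file itself never sees concurrent writers.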

clojure Riemann project collectd

I am trying to set up an apparently simple custom configuration using Riemann and collectd. Basically, I'd like to calculate the ratio between two streams. To do that, I tried something like this (following the suggestion in the Riemann API docs for project):
(project [(service "cache-miss")
          (service "cache-all")]
  (smap folds/quotient
    (with :service "ratio"
      index)))
This apparently works, but after a while I noticed that some of the results were miscalculated. After some log debugging I ended up with the following configuration, in order to see what's happening and print the values:
(project [(service "cache-miss")
          (service "cache-all")]
  (fn [[miss all]]
    (if (or (nil? miss) (nil? all))
      (do nil)
      (do (where (= (:time miss) (:time all))
            ; to print time marks
            (println (:time all))
            (println (:time miss))
            ; to distinguish easily each event
            (println "NEW LINE"))))))
To my surprise, each time I get new data from collectd (every 10 seconds) the function I created is executed twice, as if reusing previous unused data, and moreover it doesn't seem to care at all about my time equality constraint in the (where (= :time....) clause. The problem is that I am dividing metrics with different timestamps. Below is some output of the previous code:
1445606294
1445606294
NEW LINE -- First time I get data
1445606304
1445606294
NEW LINE
1445606304
1445606304
NEW LINE -- Second time I get data
1445606314
1445606304
NEW LINE
1445606314
1445606314
NEW LINE -- Third time I get data
Can anyone give a hint on how to get the data formatted as I expected? I assume there is something I am not understanding about the "project" function, or something related to how incoming data is processed in Riemann.
Thanks in advance!
Updated
I managed to solve my problem, although I still don't have a clear idea of how it works. Right now I am receiving two different streams from the collectd tail plugin (from nginx logs) and I managed to compute the quotient between them as follows:
(where (or (service "nginx/counter-cacheHit") (service "nginx/counter-cacheAll"))
  (coalesce
    (smap folds/quotient (with :service "cacheHit" (scale (* 1 100) index)))))
I have tested it widely and so far it produces the right results. However, I still don't understand several things. First, how is it that coalesce only returns data after both events are processed? Collectd sends the events of both streams every two seconds with the same timestamp. Using "project" instead of "coalesce" resulted in two different executions of smap every two seconds (one for each event), whereas coalesce results in only one execution of smap with the two events with the same timestamp, which is exactly what I wanted.
Finally, I don't know what the criterion is for choosing the numerator and the denominator. Is it because of the order of the "or" clauses in the "where" clause?
Anyway, there is some black magic behind it, but I managed to solve my problem ;^)
Thank you all!
Taking the ratios between streams that were moving at different rates didn't work out for me. I have since settled on calculating ratios and rates within a fixed time interval or a moving time interval. This way you are capturing a consistent snapshot of events in a time block and calculating over this. Here is some elided code that compares the rate at which a service is receiving events to the rate at which it is forwarding events:
(moving-time-window 30 ;; seconds
  (smap (fn [events]
          (let [in  (or (->> events
                             (filter #(= (:service %) "event-received"))
                             count)
                        0)
                out (or (->> events
                             (filter #(= (:service %) "event-sent"))
                             count)
                        0)
                flow-rate (float (if (> in 0) (/ out in) 0))]
            {:service "flow rate"
             :metric  flow-rate
             :host    "All"
             :state   (if (< flow-rate 0.99) "WARNING" "OK")
             :time    (:time (last events))
             :ttl     default-interval}))
        (tag ["some" "tags" "here"] index)
        (where (and
                 (< (:metric event) 0.9)
                 (= (:environment event) "production"))
          (throttle 1 3600 send-to-slack))))
This takes in a fixed window of events, calculates the ratio for that block and emits an event containing that ratio as its metric. Then, if the metric is bad, it notifies me on Slack.
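The per-window calculation can also be pulled out into a plain function and tried at a REPL without Riemann. This is just a sketch of the arithmetic: `flow-rate` is a name made up here, and the events are ordinary maps with only a :service key.

```clojure
(defn flow-rate
  "Ratio of \"event-sent\" to \"event-received\" events in one window.
  Returns 0.0 when nothing was received, to avoid dividing by zero."
  [events]
  (let [n   (fn [service]
              (count (filter #(= service (:service %)) events)))
        in  (n "event-received")
        out (n "event-sent")]
    (if (pos? in)
      (double (/ out in))
      0.0)))

;; Example window: 4 received, 3 sent, 1 unrelated event.
(def sample-window
  (concat (repeat 4 {:service "event-received"})
          (repeat 3 {:service "event-sent"})
          [{:service "other"}]))
```

Since count never returns nil, the (or ... 0) guards in the Riemann version are not needed in this form.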

leiningen run does not return for one minute after invoking future

If I wrap a function in a future and invoke this from leiningen on the command line, it adds 1 full minute to the runtime. Can anyone tell me why this is, and how to stop this behavior? I'd like to avoid this extra minute.
Example code:
(ns futest.core
  (:gen-class))
(defn testme []
  (Thread/sleep 2000)
  42)
(defn -main
  [& args]
  (if (= (nth args 0) "a")
    ;; either run a simple function that takes 2 seconds
    (do
      (println "no future sleep")
      (let [t (testme)]
        (println t))
      (println "done."))
    ;; or run a simple function that takes 2 seconds as a future
    (do
      (println "future sleep")
      (let [t (future (testme))]
        (println @t))
      (println "done.")
      ;; soo, why does it wait for 1 minute here?
      )))
This is because agents use two thread pools: the first is a fixed thread pool and the second a cached thread pool. Futures run on the cached thread pool, which only terminates threads after they have been idle for a certain duration, 60 seconds by default. This is the reason you see the 60-second delay. If you call shutdown-agents, both thread pools terminate, leaving no non-daemon threads to block your exit.
As noted in the answer to this question you need to call shutdown-agents at the end of your -main method.
I'm posting this as self-answered Q&A since that question doesn't mention future, so it didn't turn up on my google searches. Sure enough, if I add:
      ;; soo, why does it wait for 1 minute here?
      (shutdown-agents)
      )))
the problem goes away.
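One way to make sure the pools are shut down even if the work throws is to put the call in a finally block. A minimal sketch, where `run-task` is a helper name made up here and the sleep is shortened for illustration:

```clojure
(defn testme []
  (Thread/sleep 200)
  42)

(defn run-task
  "Runs testme on the future thread pool and blocks for its result."
  []
  @(future (testme)))

(defn -main [& _]
  (try
    (println (run-task))
    (println "done.")
    (finally
      ;; Stops the agent/future thread pools so the JVM can exit promptly.
      (shutdown-agents))))
```

After shutdown-agents has run, no further futures or agent actions can be started, so it belongs at the very end of -main.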

Create 10k+ agents in clojure

As I tested, a separate thread is used for each new agent, when I create them.
Could several agents be run in one thread?
My idea is to create 10K+ light-weight agents (like actors in erlang), so is it a challenge for Clojure?
Thanks
This is incorrect. With send, agents share a fixed thread pool whose size is the number of cores + 2, so on a quad-core machine even 10k+ agents will only use 6 worker threads.
With send-off, by contrast, actions run on an unbounded cached thread pool, so new threads will be started as needed.
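You can check this yourself by recording which threads actually run the actions. A small sketch using only clojure.core; `threads-used` is a name made up here:

```clojure
(defn threads-used
  "Sends one action to each of n agents and returns the set of names of the
  threads that ran those actions."
  [n]
  (let [seen   (atom #{})
        agents (mapv agent (range n))]
    (doseq [a agents]
      (send a (fn [state]
                (swap! seen conj (.getName (Thread/currentThread)))
                (inc state))))
    ;; Block until every agent has processed its action.
    (apply await agents)
    @seen))

;; At the REPL:
;; (count (threads-used 10000))
```

The count should never exceed the number of cores + 2, however many agents are created; swapping send for send-off in the sketch lets you compare the two pools.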
Consider using a j.u.c.DelayQueue
Here's a sketch of how it would work (delayed-function is a bit cumbersome here, but it basically constructs an instance of j.u.c.Delayed for submission to the queue):
(import [java.util.concurrent Delayed DelayQueue TimeUnit])
(defn delayed-function [f]
  (let [execute-time    (+ 5000 (System/currentTimeMillis))
        remaining-delay (fn [t] (.convert t
                                          (- execute-time
                                             (System/currentTimeMillis))
                                          TimeUnit/MILLISECONDS))]
    (reify
      Delayed    (getDelay [_ t] (remaining-delay t))
      Runnable   (run [_] (f))
      Comparable (compareTo [_ _] 0))))
;;; Use java's DelayQueue from clojure.
;;; See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/DelayQueue.html
(def q (DelayQueue.))
(defn delayed
  "put the function f on the queue, it will only execute after the delay
  expires"
  [f]
  (.offer q (delayed-function f)))
(defn start-processing
  "starts a thread that endlessly reads from the delay queue and
  executes the function found in the queue"
  []
  (.start
    (Thread.
      #(while true
         (.run (.take q))))))
user> (start-processing)
user> (delayed #(println "Hello"))
; 5 seconds passes
Hello
The at-at library, which was developed to support the (in my opinion fantastic) Overtone music synthesizer, provides a nice clean interface for running functions at a specific point in time (at) or after a delay (after):
(use 'overtone.at-at)
(def my-pool (mk-pool))
(after 1000 #(println "hello from the past!") my-pool)