I have a bit of computation that is somewhat expensive (starting a database), and I only want to create the database if I actually am going to use it. I am looking for a reference variable (or just a plain variable, if that is possible) that would only evaluate its value in the event that it is used (or dereferenced). Something conceptually like the following.
(def v (lazy-var (fn [] (do (println "REALLY EXPENSIVE FUNCTION") true))))
and in the future, when I either just use the var v or dereference it with @v, it prints "REALLY EXPENSIVE FUNCTION", and from then on v has a value of true. The important thing here is that the fn is not evaluated until the variable is (de)referenced. When needed, the function is evaluated once and only once to calculate the value of the variable. Is this possible in Clojure?
delay would be perfect for this application:
delay- (delay & body)
Takes a body of expressions and yields a Delay object that will invoke the body only the first time it is forced (with force or deref/@), and will cache the result and return it on all subsequent force calls.
Place the code to construct the database handle within the body of a delay invocation, stored as a Var. Then dereference this Var whenever you need to use the DB handle — on the first dereference the body will be run, and on subsequent dereferences the cached handle will be returned.
(def db (delay (println "DB stuff") x))
(select @db ...) ; "DB stuff" printed, x returned
(insert @db ...) ; x returned (cached)
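To see the run-once behaviour in isolation, here is a minimal sketch. The names start-db! and init-count are made up for illustration, and the map stands in for a real database handle.

```clojure
;; Hypothetical stand-in for an expensive database start-up.
(def init-count (atom 0))

(defn start-db! []
  (swap! init-count inc)      ; record that initialisation ran
  {:conn :fake-handle})       ; placeholder for a real handle

(def db (delay (start-db!)))

(realized? db)   ; => false, the body has not run yet
@db              ; first deref runs start-db!
@db              ; second deref returns the cached value
@init-count      ; => 1
```

realized? lets you check whether the body has already run without triggering it.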
Clojure also provides the memoize function (in clojure.core since 1.0), which can be used for this purpose:
(memoize f)
Returns a memoized version of a referentially transparent function.
The memoized version of the function keeps a cache of the mapping from
arguments to results and, when calls with the same arguments are
repeated often, has higher performance at the expense of higher memory
use.
In your example, replace the non-existent lazy-var with memoize:
(def v (memoize (fn [] (do (println "REALLY EXPENSIVE FUNCTION") true))))
(v)
=>REALLY EXPENSIVE FUNCTION
=>true
(v)
=>true
(delay expr) also does the job, as another answer explains. An extra comment on dereferencing the delay: the difference between force and deref/@ is that force does not throw an exception when used on a value that is not a delay, while deref/@ may throw a ClassCastException ("cannot be cast to clojure.lang.IDeref").
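A small sketch of that difference, using a plain number as the non-delay value (the var names are only for illustration):

```clojure
(def d (delay (+ 1 2)))

(force d)    ; => 3
(deref d)    ; => 3
(force 42)   ; => 42, non-delay values are returned unchanged

;; deref on a value that cannot be dereferenced throws instead.
(def deref-error
  (try
    (deref 42)
    nil
    (catch ClassCastException e e)))
```

The exact exception message depends on the Clojure version, so only the exception class is caught here.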
The Clojure reference contains the following comments about transducers, which seem to say something important about the safety of writing and using transducers:
If you have a new context for applying transducers, there are a few general rules to be aware of:
If a step function returns a reduced value, the transducible process must not supply any more inputs to the step function. The
reduced value must be unwrapped with deref before completion.
A completing process must call the completion operation on the final accumulated value exactly once.
A transducing process must encapsulate references to the function returned by invoking a transducer - these may be stateful and unsafe
for use across threads.
Can you explain, possibly with some examples, what each of these cases means? Also, what does "context" refer to in this context?
Thanks!
If a step function returns a reduced value, the transducible process must not supply any more inputs to the step function. The reduced value must be unwrapped with deref before completion.
One example of this scenario is take-while transducer:
(fn [rf]
  (fn
    ([] (rf))
    ([result] (rf result))
    ([result input]
     (if (pred input)
       (rf result input)
       (reduced result)))))
As you can see, it can return a reduced value which means there is no point (and actually it would be an error) to provide more input to such step function - we know already there can be no more values produced.
For example while processing (1 1 3 5 6 8 7) input collection with odd? predicate once we reach value 6 there will be no more values returned by a step function created by take-while odd? transducer.
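As a rough illustration, the same input can be run through into, and the step function can also be called by hand to see the reduced wrapper. The name step is just for this sketch.

```clojure
(into [] (take-while odd?) [1 1 3 5 6 8 7])
;; => [1 1 3 5]

;; Calling the step function directly shows the reduced value.
(def step ((take-while odd?) conj))

(step [1 1] 3)                ; => [1 1 3]
(reduced? (step [1 1 3] 6))   ; => true
@(step [1 1 3] 6)             ; => [1 1 3], unwrapped with deref
```

into stops feeding inputs as soon as it sees the reduced value, which is why 8 and 7 are never passed to the step function.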
A completing process must call the completion operation on the final accumulated value exactly once.
This is a scenario where a transducer returns a stateful step function. A good example is the partition-by transducer. For example, when (partition-by odd?) is used by the transducible process for processing (1 3 2 4 5 6 8), it will produce ((1 3) (2 4) (5) (6 8)).
(fn [rf]
  (let [a (java.util.ArrayList.)
        pv (volatile! ::none)]
    (fn
      ([] (rf))
      ([result]
       (let [result (if (.isEmpty a)
                      result
                      (let [v (vec (.toArray a))]
                        ;;clear first!
                        (.clear a)
                        (unreduced (rf result v))))]
         (rf result)))
      ([result input]
       (let [pval @pv
             val (f input)]
         (vreset! pv val)
         (if (or (identical? pval ::none)
                 (= val pval))
           (do
             (.add a input)
             result)
           (let [v (vec (.toArray a))]
             (.clear a)
             (let [ret (rf result v)]
               (when-not (reduced? ret)
                 (.add a input))
               ret))))))))
If you take a look at the implementation, you will notice that the step function won't return its accumulated values (stored in an array list) until the predicate function returns a different result (e.g. after a run of odd numbers it receives an even number, and only then emits the accumulated odd numbers). The issue arises when we reach the end of the source data: there is no chance to observe a change in the predicate result, so the accumulated values would never be returned. Thus the transducible process must call the completion operation of the step function (arity 1) so it can return its accumulated result (in our case (6 8)).
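A sketch of what goes wrong when completion is skipped, driving the step function by hand with reduce. The var names are invented for the example.

```clojure
(def input [1 3 2 4 5 6 8])

;; into calls the completion arity for us.
(into [] (partition-by odd?) input)
;; => [[1 3] [2 4] [5] [6 8]]

;; Driving the step function by hand, without completion:
(def step ((partition-by odd?) conj))

(def without-completion (reduce step [] input))
;; => [[1 3] [2 4] [5]]   ; [6 8] is still sitting in the buffer

;; Calling the 1-arity flushes the buffered partition.
(def with-completion (step without-completion))
;; => [[1 3] [2 4] [5] [6 8]]
```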
A transducing process must encapsulate references to the function returned by invoking a transducer - these may be stateful and unsafe for use across threads.
When a transducible process is executed by passing a source data and transducer instance, it will first call the transducer function to produce a step function. The transducer is a function of the following shape:
(fn [xf]
  (fn ([] ...)
      ([result] ...)
      ([result input] ...)))
Thus the transducible process calls this top-level function (which accepts xf, a reducing function) to obtain the actual step function used for processing the data elements. The process must keep the reference to that step function and use the same instance for all elements from a particular data source (e.g. the step function instance produced by the partition-by transducer must be used for the whole input sequence, because it keeps internal state, as you saw above). Using different instances for processing a single data source would yield incorrect results.
Similarly, a transducible process cannot reuse a step function instance for processing multiple data sources, for the same reason: the step function instance might be stateful and keep internal state for a particular data source. That state would be corrupted if the step function were used for processing another data source.
There is also no guarantee that the step function implementation is thread-safe.
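The reuse problem can be sketched with the stateful (take 2) transducer. Here shared-step is deliberately misused across two sources to show the corrupted state; the var names are made up.

```clojure
(def xform (take 2))

;; Correct: into creates a fresh step function for every run.
(into [] xform [1 2 3])   ; => [1 2]
(into [] xform [4 5 6])   ; => [4 5]

;; Incorrect: one step function instance shared by two sources.
(def shared-step (xform conj))

(def first-run  (reduce shared-step [] [1 2 3]))
;; => [1 2]
(def second-run (reduce shared-step [] [4 5 6]))
;; => []   ; the internal counter was already used up by the first run
```

The transducer xform itself is safe to share; only the step function it returns carries state.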
What does "context" refer to in this context?
"A new context for applying transducers" means implementing a new type of transducible process. Clojure provides transducible processes that work with collections (e.g. into, sequence). The chan function in the core.async library accepts (in one of its arities) a transducer as an argument, which gives an asynchronous transducible process: the transducer is applied to the values put on the channel, and the results can be consumed from it.
You could for example create a transducible process for handling data received on a socket, or your own implementation of observables.
They could use transducers for transforming the data, because transducers are agnostic about where the data comes from (a socket, a stream, a collection, an event source etc.): the step function is just a function called with individual elements.
They also don't care (and don't know) what should be done with the results they generate (e.g. should they be appended to a result sequence with conj? sent over the network? inserted into a database?). That is abstracted away by the reducing function captured by the step function (the rf argument above).
So instead of creating a step function that just uses conj or saves elements to db, we pass a function which has a specific implementation of that operation. And your transducible process defines what that operation is.
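To tie the three rules together, here is a minimal sketch of a hand-written transducible process over any seqable source. transduce-items is a hypothetical name, not part of Clojure, and a real implementation (see transduce) handles more cases.

```clojure
(defn transduce-items
  "Hypothetical minimal transducible process: applies xform to items,
  combining the results with the reducing function rf, starting from init."
  [xform rf init items]
  (let [step (xform rf)]                 ; private step fn, one per run
    (loop [acc init
           xs  (seq items)]
      (cond
        ;; a reduced value: supply no more inputs, unwrap, then complete
        (reduced? acc) (step @acc)
        ;; more input available: feed the next element
        xs             (recur (step acc (first xs)) (next xs))
        ;; source exhausted: call completion exactly once
        :else          (step acc)))))

(transduce-items (partition-by odd?) conj [] [1 3 2 4 5 6 8])
;; => [[1 3] [2 4] [5] [6 8]]

(transduce-items (take-while odd?) + 0 [1 1 3 5 6 8 7])
;; => 10
```

The same xform works with conj or +, which is the point about the process deciding what to do with the results.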
Why does this bit of Clojure code:
user=> (map (constantly (println "Loop it.")) (range 0 3))
Yield this output:
Loop it.
(nil nil nil)
I'd expect it to print "Loop it" three times as a side effect of evaluating the function three times.
constantly doesn't evaluate its argument multiple times. It's a function, not a macro, so the argument is evaluated exactly once, before constantly runs. All constantly does is take its (already evaluated) argument and return a function that returns that value every time it's called, without re-evaluating anything.
If all you want to do is to call (println "Loop it") for every element in the range, you should pass that in as the function to map instead of constantly. Note that you'll actually have to pass it in as a function, not an evaluated expression.
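A sketch of both variants, assuming you really do want map here. The doall is only there to force the lazy sequence outside the REPL, and f is a made-up name.

```clojure
;; The println runs once, when the argument to constantly is evaluated.
(def f (constantly (println "Loop it.")))   ; prints "Loop it." here
(f 0)   ; => nil, nothing is printed
(f 1)   ; => nil, nothing is printed

;; Passing a function instead: the body runs once per element.
(doall (map (fn [_] (println "Loop it.")) (range 0 3)))
;; prints "Loop it." three times and returns (nil nil nil)
```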
As sepp2k rightly points out constantly is a function, so its argument will only be evaluated once.
The idiomatic way to achieve what you are doing here would be to use doseq:
(doseq [i (range 0 3)]
  (println "Loop it."))
Or alternatively dotimes (which is a little more concise and efficient in this particular case as you aren't actually using the sequence produced by range):
(dotimes [i 3]
  (println "Loop it."))
Both of these solutions are non-lazy, which is probably what you want if you are just running some code for the side effects.
You can get behavior close to your intent by using repeatedly and a lambda expression.
For instance:
(repeatedly 3 #(println "Loop it"))
Unless you're at the REPL, this needs to be surrounded by a dorun or similar. repeatedly is lazy.
I have a web app where I want to be able to track the number of times a given function is called in a request (i.e. thread).
I know that it is possible to do in a non-thread local way with a ref, but how would I go about doing it thread locally?
There's a tool for this in useful called thread-local. You can write, for example, (def counter (thread-local (atom 0))). This will create a global variable which, when derefed, will yield a fresh atom per thread. So you could read the current value with @@counter, or increment it with (swap! @counter inc). Of course, you could also get hold of the atom itself with @counter and just treat it like a normal atom from then on.
You can use a dynamic global var, bound to a value with binding in combination with the special form set! to change its value. Vars bound with binding are thread-local. The following will increase *counter* every time my-fn is called for any form called within a with-counter call:
(def ^{:dynamic true} *counter*)
(defmacro with-counter [& body]
  `(binding [*counter* 0]
     ~@body
     *counter*))
(defn my-fn []
  (set! *counter* (inc *counter*)))
To demonstrate, try:
(with-counter (doall (repeatedly 5 my-fn)))
;; ==> 5
For more information, see http://clojure.org/vars#set
You can keep an instance of ThreadLocal in a ref. Every time you need to increase the count, just read the value, increment it and set it back. At the beginning of each request you should initialize the thread local to 0, because threads may be reused for different requests.
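A rough sketch of that idea using plain Java interop. For simplicity the ThreadLocal is held in a var rather than a ref, and handle-request, tracked-fn and reset-count! are hypothetical names.

```clojure
;; One ThreadLocal shared by all threads; each thread sees its own value.
(def ^ThreadLocal call-count (ThreadLocal.))

(defn reset-count! []
  (.set call-count 0))

(defn tracked-fn []
  ;; read, increment and write back the current thread's counter
  (.set call-count (inc (.get call-count))))

(defn handle-request []
  (reset-count!)              ; threads may be reused, so start from 0
  (dotimes [_ 3] (tracked-fn))
  (.get call-count))

(handle-request)            ; => 3
@(future (handle-request))  ; => 3, counted independently on another thread
```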
In the accepted answer to another question, Setting Clojure "constants" at runtime, the Clojure function constantly is used.
The definition of constantly looks like so:
(defn constantly
  "Returns a function that takes any number of arguments and returns x."
  {:added "1.0"}
  [x] (fn [& args] x))
The doc string says what it does but not why one would use it.
In the answer given in the previous question constantly is used as follows:
(declare version)
(defn -main
  [& args]
  (alter-var-root #'version (constantly (-> ...)))
  (do-stuff))
So the function returned by constantly is directly evaluated for its result. I am confused as to how this is useful. I am probably not understanding how x would be evaluated with and without being wrapped in constantly.
When should I use constantly and why is it necessary?
The constantly function is useful when an API expects a function and you just want a constant. This is the case in the example provided in the question.
Most of the alter-* functions (including alter-var-root) take a function, to allow the caller to modify something based on its old value. Even if you just want the new value to be 7 (disregarding the old value), you still need to provide a function (providing just 7 will result in an attempt to call 7 as a function, which will fail). So you have to provide a function that just returns 7. (constantly 7) produces just this function, sparing the effort required to define it.
Edit: As to the second part of the question, constantly is an ordinary function, so its argument is evaluated before the constant function is constructed. So (constantly @myref) always returns the value referenced by myref at the time constantly was called, even if it is changed later.
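Both points can be sketched in a few lines. The atom and the version string below are made-up values for illustration.

```clojure
;; alter-var-root wants a function of the old value.
(def version nil)
(alter-var-root #'version (constantly "1.0.0"))
version   ; => "1.0.0"

;; The argument to constantly is evaluated once, up front.
(def myref (atom 1))
(def always-old (constantly @myref))   ; @myref is read here => 1

(reset! myref 2)
(always-old)              ; => 1, not 2
(always-old :any :args)   ; => 1, arguments are ignored
```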
I'm having some trouble understanding how the delay macro works in Clojure. It doesn't seem to do what I expect it to do (that is: delaying evaluation). As you can see in this code sample:
; returns the current time
(defn get-timestamp [] (System/currentTimeMillis))
; var should contain the current timestamp after calling "force"
(def current-time (delay (get-timestamp)))
However, evaluating current-time in the REPL appears to immediately evaluate the expression, even without having called force:
user=> current-time
#<Delay@19b5217: 1276376485859>
user=> (force current-time)
1276376485859
Why was the evaluation of get-timestamp not delayed until the first force call?
The printed representation of various objects which appears at the REPL is the product of a multimethod called print-method. It resides in the file core_print.clj in Clojure's sources, which constitutes part of what goes in the clojure.core namespace.
The problem here is that for objects implementing clojure.lang.IDeref -- the Java interface for things deref / @ can operate on -- print-method includes the value behind the object in the printed representation. To this end, it needs to deref the object, and although special provisions are made for printing failed Agents and pending Futures, Delays are always forced.
Actually I'm inclined to consider this a bug, or at best a situation in need of an improvement. As a workaround for now, take extra care not to print unforced delays.
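If you want to check on a delay without printing it, realized? reports its state without forcing it. A small sketch, reusing the names from the question:

```clojure
(defn get-timestamp [] (System/currentTimeMillis))

(def current-time (delay (get-timestamp)))

(realized? current-time)   ; => false, nothing has been evaluated yet

(def t (force current-time))

(realized? current-time)   ; => true
(= t @current-time)        ; => true, the cached value is returned
```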