delay and defonce both give similar results - clojure

I am trying to understand the difference between delay and defonce, and when one should be used over the other.
Below is a snippet from tomekw/hikari-cp
(defonce datasource
  (delay (hcp/make-datasource datasource-options)))
Why are defonce and delay both used? Is it not enough to just use defonce?
As I understand it, both allow the form to be executed just once. What is the difference between the two?

In terms of how many times the body is evaluated, both do call it exactly once, but this form is not only about execution count; it is also about laziness vs. eagerness.
The delay macro wraps the body in a special reference object and postpones its execution until the first time it is dereferenced:
user> (let [x (delay 100)]
        (println x)
        (Thread/sleep 100)
        (println "woke up")
        (println x)
        (println @x)
        (println x)
        @x)
;;=> #delay[{:status :pending, :val nil} 0x273336d4]
;; woke up
;; #delay[{:status :pending, :val nil} 0x273336d4]
;; 100
;; #delay[{:status :ready, :val 100} 0x273336d4]
;; 100
defonce itself is eager, returning the delayed block at once but not executing it.
So datasource is set to an encapsulated lazy block, which is executed exactly when the code first needs the data source (by calling deref or using the @ reader macro), and defonce is there to prohibit redefinition of the var. I would call it a 'lazy singleton'.
user> (defonce value
        (do (println "DEFINING!")
            (delay
              (println "EVALUATING!")
              101)))
;;=> DEFINING!
#'user/value
user> value
#<Delay@b1388c1a: :not-delivered>
user> @value
;;=> EVALUATING!
101
user> value
#<Delay@1ed9ec81: 101>
;; `delay` caches value (doesn't print again)
user> @value
101
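To make the pattern concrete, here is a hedged sketch of how a consumer might use such a var (the helper is illustrative only, assuming the pool implements javax.sql.DataSource as hikari-cp's does):

(defn with-connection* [f]
  ;; @datasource forces the delay on first use, creating the pool;
  ;; every later deref returns the same pool instance
  (with-open [conn (.getConnection @datasource)]
    (f conn)))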

Related

What is the difference between using reset! and a new def to change the value of a variable associated to an atom in Clojure?

According to Clojure's documentation, reset! can be used as:
Sets the value of atom to newval without regard for the current value.
Returns newval.
Thus, I can do:
user> (def my-test (atom 666))
#'user/my-test
user> my-test
#<Atom@66d7a880: 666>
user> @my-test
666
user> (reset! my-test 77)
77
user> my-test
#<Atom@66d7a880: 77>
user> @my-test
77
But, is there any difference between using another def instead of reset!?
user> (def my-test (atom 666))
#'user/my-test
user> my-test
#<Atom@66d7a880: 666>
user> @my-test
666
user> (reset! my-test 77)
77
user> my-test
#<Atom@66d7a880: 77>
user> @my-test
77
;;;; converting it back to the original value via def
user> (def my-test (atom 666))
#'user/my-test
user> @my-test
666
user> my-test
#<Atom@7ce4f432: 666>
user>
Just by reading the experiments on the REPL I cannot identify any difference. But I am new to Clojure, so I am probably naive here.
If there is any difference, why should I use reset! instead of a new def?
You can see the answer in the REPL output in your question. When you write (reset! a 1), you give a new value to the existing atom. When you write (def a (atom 1)), you get a brand new atom. Why does this matter? Because someone may have another reference to the old atom: in the former case they see the new value, and in the latter case they don't. Compare, for example:
(def a (atom 0))
(defn counter [c] (fn [] (swap! c inc)))
(def count-up (counter a))
(count-up) ; 1
(count-up) ; 2
(reset! a 0)
(count-up) ; 1 again
with
(def a (atom 0))
(defn counter [c] (fn [] (swap! c inc)))
(def count-up (counter a))
(count-up) ; 1
(count-up) ; 2
(def a (atom 0))
(count-up) ; 3, because the old atom still holds 2
Changes to atoms are always free of race conditions. New-def-ing is not.
A Clojure Var is meant to be a global value that, in general, never changes (as always, there are exceptions to every rule). As an example, function declarations are normally stored in a Var.
A Clojure Atom is meant to point to a value that can change. An atom may be held in a global Var or a local variable binding (e.g. in a (let ...) form). Atoms are thread-safe (this is one of their primary purposes).
If you are just playing around with experimental code with only one thread, you can do a lot of sloppy or dangerous stuff and there is no problem. However, you should learn how to use each tool for its intended purpose.
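As a small illustration of that thread-safety (a sketch, not part of the original answer): many threads hitting one atom with swap! never lose an update, because swap! applies a pure function atomically and retries on conflict.

(defn bump-in-parallel []
  (let [c       (atom 0)
        futures (doall (repeatedly 100 #(future (swap! c inc))))]
    (run! deref futures) ; wait for every future to finish
    @c))                 ; always 100, no increments are lost

;; (bump-in-parallel) ;=> 100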
More detailed discussion:
Brave Clojure
Book Getting Clojure
Clojure.org - Vars
Clojure.org - Atoms
clojuredocs.org - atom
Clojure CheatSheet
def creates a new atom (it allocates a new object and re-points the var at it), while reset! just changes the value held by an existing atom.
It is therefore logical that reset! should be much cheaper (faster execution, fewer allocations) than def, which you can test with:
(def n 10000000)

(time (dotimes [_ n] (def a (atom 1))))
;; "Elapsed time: 2294.676443 msecs"

(def b (atom 1))
(time (dotimes [_ n] (reset! b 1)))
;; "Elapsed time: 106.03302 msecs"
So reset! is over an order of magnitude faster than def.

What is the correct way to perform side effects in a clojure atom swap

I'm keeping a registry of processes in an atom.
I want to start one and only one process (specifically a core.async go-loop) per id.
However, you're not supposed to perform side-effects in a swap!, so this code is no good:
(swap! processes-atom
       (fn [processes]
         (if (get processes id)
           processes ;; already exists, do nothing
           (assoc processes id (create-process! id)))))
How would I go about doing this correctly?
I have looked at locking, which takes an object as a monitor for the lock. I would prefer that each id - which are dynamic - have their own lock.
It seems that you need to protect processes-atom from concurrent modification, so that only a single thread can access it at a time. locking will work in this case. Since, by using locking, we manage thread safety ourselves, we can use a volatile instead of an atom (a volatile is faster, but doesn't provide any thread-safety or atomicity guarantees).
Summing up the above, something like below should work fine:
(def processes-volatile (volatile! {}))

(defn create-and-save-process! [id]
  (locking processes-volatile
    (vswap! processes-volatile
            (fn [processes]
              (if (get processes id)
                processes
                (assoc processes id (create-process! id)))))))
You can do this by hand with locking, as OlegTheCat shows, and often that is a fine approach. However, in the comments you remark that it would be nice to avoid having the whole atom locked for as long as it takes to spawn a process, and that too is possible in a surprisingly simple way: instead of having a map from pid to process, have a map from pid to delay of process. That way, you can add a new delay very cheaply, and only actually create the process by dereferencing the delay, outside of the call to swap!. Dereferencing the delay will block waiting for that particular delay, so multiple threads who need the same process will not step on each other's toes, but the atom itself will be unlocked, allowing threads who want a different process to get it.
Here is a sample implementation of that approach, along with example definitions of the other vars your question implies, to make the code runnable as-is:
(def process-results (atom []))

(defn create-process! [id]
  ;; pretend creating the process takes a long time
  (Thread/sleep (* 1000 (rand-int 3)))
  (future
    ;; running it takes longer, but happens on a new thread
    (Thread/sleep (* 1000 (rand-int 10)))
    (swap! process-results conj id)))

(def processes-atom (atom {}))

(defn cached-process [id]
  (-> processes-atom
      (swap! (fn [processes]
               (update processes id #(or % (delay (create-process! id))))))
      (get id)
      (deref)))
Of course only cached-process is needed if you already have the other things defined. And a sample run, to show that processes are successfully reused:
(defn stress-test [num-processes]
  (reset! process-results [])
  (reset! processes-atom {})
  (let [running-processes (doall (for [i (range num-processes)]
                                   (cached-process (rand-int 10))))]
    (run! deref running-processes)
    (deref process-results)))
user> (time (stress-test 40))
"Elapsed time: 18004.617869 msecs"
[1 5 2 0 9 7 8 4 3 6]
I prefer using a channel
(require '[clojure.core.async :refer [chan go <! >! >!! <!!]])

(defn create-process! [id] {:id id})

(def ^:private processes-channel (chan))

(go (loop [processes {}]
      (let [id (<! processes-channel)
            process (if (contains? processes id)
                      (get processes id)
                      (create-process! id))]
        (>! processes-channel process)
        (recur (assoc processes id process)))))

(defn get-process-by-id
  "Public API"
  [id]
  (>!! processes-channel id)
  (<!! processes-channel))
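A hypothetical single-threaded REPL interaction with this API (not part of the original answer) would look like:

(get-process-by-id 1) ;=> {:id 1}
(get-process-by-id 1) ;=> {:id 1}, reused from the registry rather than recreated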
Another answer is to use an agent to start each process. This decouples each process from each other, and avoids the problem of possible multiple calls to the "create-process" function:
(require '[clojure.pprint :refer [pprint]])

(defn start-proc-agent
  [state]
  (let [delay (int (* 2000 (rand)))]
    (println (format "starting %d" (:id state)))
    (Thread/sleep delay)
    (println (format "finished %d" (:id state)))
    (merge state {:delay delay :state :running})))

(def procs-agent (atom {}))

(dotimes [i 3]
  (let [curr-agent (agent {:id i :state :unstarted})]
    (swap! procs-agent assoc i curr-agent)
    (send curr-agent start-proc-agent)))

(println "all dispatched...")
(pprint @procs-agent)
(Thread/sleep 3000)
(pprint @procs-agent)
When run we see:
starting 2
starting 1
starting 0
all dispatched...
{0 #<Agent@39d8240b: {:id 0, :state :unstarted}>,
 1 #<Agent@3a6732bc: {:id 1, :state :unstarted}>,
 2 #<Agent@7414167a: {:id 2, :state :unstarted}>}
finished 0
finished 1
finished 2
{0 #<Agent@39d8240b: {:id 0, :state :running, :delay 317}>,
 1 #<Agent@3a6732bc: {:id 1, :state :running, :delay 1635}>,
 2 #<Agent@7414167a: {:id 2, :state :running, :delay 1687}>}
So the global map procs-agent associates each process ID with the agent for that process. A side benefit of this approach is that you can send subsequent commands (in the form of functions) to a process's agent and be assured they are independent of (and parallel and asynchronous to) every other agent.
Alternate solution
Similar to your original question, we could use a single agent (instead of an agent per process) to simply serialize the creation of each process. Since agents process their actions asynchronously and never retry the input function the way swap! can, side-effecting functions aren't a problem. You could write it like so:
(defn start-proc-once-only
  [state i]
  (let [curr-proc (get state i)]
    (if (= :running (:state curr-proc))
      (do
        (println "skipping restart of" i)
        state)
      (let [delay (int (* 2000 (rand)))]
        (println (format "starting %d" i))
        (Thread/sleep delay)
        (println (format "finished %d" i))
        (assoc state i {:delay delay :state :running})))))

(def procs (agent {}))

(dotimes [i 3]
  (println :starting i)
  (send procs start-proc-once-only i))

(dotimes [i 3]
  (println :starting i)
  (send procs start-proc-once-only i))

(println "all dispatched...")
(println :procs) (pprint @procs)
(Thread/sleep 5000)
(println :procs) (pprint @procs)
with result
:starting 0
:starting 1
:starting 2
starting 0
:starting 0
:starting 1
:starting 2
all dispatched...
:procs
{}
finished 0
starting 1
finished 1
starting 2
finished 2
skipping restart of 0
skipping restart of 1
skipping restart of 2
:procs
{0 {:delay 1970, :state :running},
1 {:delay 189, :state :running},
2 {:delay 1337, :state :running}}
I think you should use add-watch. It gets called once per change to the atom. In the watch fn, check whether a new id has been added to the atom; if so, create the process and add it to the atom. That will trigger another call to the watch fn, but the second call won't find any new id needing a process. A rough sketch of this idea follows.
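Here is that sketch, assuming the same create-process! as in the other answers; ids are registered as keys mapped to nil and the watch fills them in (illustrative only, not hardened against heavy contention):

(def processes-atom (atom {}))

(add-watch processes-atom :spawner
  (fn [_key atm _old new-state]
    (doseq [[id proc] new-state
            :when (nil? proc)]
      ;; create-process! runs once here, outside any swap! retry loop;
      ;; this swap! fires the watch again, but by then the id already
      ;; has a process, so nothing further happens
      (swap! atm assoc id (create-process! id)))))

(defn request-process! [id]
  ;; register the id if absent; the watch will create the process
  (swap! processes-atom update id identity))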

Passing compile-time state between nested macros in Clojure

I'm trying to write a macro that can be used both in a global and nested way, like so:
;;; global:
(do-stuff 1)

;;; nested, within a "with-context" block:
(with-context {:foo :bar}
  (do-stuff 2)
  (do-stuff 3))
When used in the nested way, do-stuff should have access to {:foo :bar} set by with-context.
I've been able to implement it like this:
(def ^:dynamic *ctx* nil)

(defmacro with-context [ctx & body]
  `(binding [*ctx* ~ctx]
     (do ~@body)))

(defmacro do-stuff [v]
  `(if *ctx*
     (println "within context" *ctx* ":" ~v)
     (println "no context:" ~v)))
However, I've been trying to shift the if inside do-stuff from runtime to compile time, because whether do-stuff is called within the body of with-context or globally is information that's already available at compile time.
Unfortunately, I've not been able to find a solution, because nested macros seem to get expanded in multiple "macro expansion runs", so the dynamic binding of *ctx* (as set within with-context) is no longer in place when do-stuff gets expanded. So this does not work:
(def ^:dynamic *ctx* nil)

(defmacro with-context [ctx & body]
  (binding [*ctx* ctx]
    `(do ~@body)))

(defmacro do-stuff [v]
  (if *ctx*
    `(println "within context" ~*ctx* ":" ~v)
    `(println "no context:" ~v)))
Any ideas how to accomplish this?
Or is my approach totally insane and there's a pattern for how to pass state in such a way from one macro to a nested one?
EDIT:
The body of with-context should be able to work with arbitrary expressions, not only with do-stuff (or other context aware functions/macros). So something like this should also be possible:
(with-context {:foo :bar}
  (do-stuff 2)
  (some-arbitrary-function)
  (do-stuff 3))
(I'm aware that some-arbitrary-function is about side effects, it might write something to a database for example.)
When the code is being macroexpanded, Clojure computes a fixpoint:
(defn macroexpand
  "Repeatedly calls macroexpand-1 on form until it no longer
  represents a macro form, then returns it. Note neither
  macroexpand-1 nor macroexpand expand macros in subforms."
  {:added "1.0"
   :static true}
  [form]
  (let [ex (macroexpand-1 form)]
    (if (identical? ex form)
      form
      (macroexpand ex))))
Any binding you establish during the execution of a macro is no longer in place when you exit the macro (this happens inside macroexpand-1). By the time an inner macro is being expanded, the context is long gone.
But you can call macroexpand directly, in which case the bindings are still in effect. Note, however, that in your case you probably need to call macroexpand-all.
This answer explains the differences between macroexpand and clojure.walk/macroexpand-all: basically, you need to make sure all inner forms are macroexpanded.
The source code for macroexpand-all shows how it is implemented.
So, you can implement your macro as follows:
(defmacro with-context [ctx form]
  (binding [*ctx* ctx]
    (clojure.walk/macroexpand-all form)))
In that case, the dynamic bindings should be visible from inside the inner macros.
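A quick hypothetical REPL check of that version (using the do-stuff macro from the second snippet above, which reads *ctx* at expansion time; note this with-context takes a single form, so multiple expressions are wrapped in a do):

user> (with-context {:foo :bar}
        (do (do-stuff 2)
            (do-stuff 3)))
;; within context {:foo :bar} : 2
;; within context {:foo :bar} : 3
user> (do-stuff 1)
;; no context: 1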
I'd keep it simple.
This solution avoids keeping state in an additional *ctx* variable. I think it is a more functional approach.
(defmacro do-stuff
  ([arg1 context]
   `(do (prn :arg1 ~arg1 :context ~context)
        {:a 4 :b 5}))
  ([arg1]
   `(prn :arg1 ~arg1 :no-context)))

(->> {:a 3 :b 4}
     (do-stuff 1)
     (do-stuff 2))
output:
:arg1 1 :context {:a 3, :b 4}
:arg1 2 :context {:b 5, :a 4}
there is one more variant to do this, using some macro magic:
(defmacro with-context [ctx & body]
  (let [ctx (eval ctx)]
    `(let [~'&ctx ~ctx]
       (binding [*ctx* ~ctx]
         (do ~@body)))))
In this definition we introduce another let binding for ctx. Clojure's macro system will then put it into the &env variable, accessible to inner macros at compile time. Notice that we also keep the binding so that inner functions can use it.
Now we need a function to get the context value from the macro's &env:
(defn env-ctx [env]
  (some-> env ('&ctx) .init .eval))
and then you can easily define do-stuff:
(defmacro do-stuff [v]
  (if-let [ctx (env-ctx &env)]
    `(println "within context" ~ctx ":" ~v)
    `(println "no context:" ~v)))
In the REPL:
user> (defn my-fun []
        (println "context in fn is: " *ctx*))
#'user/my-fun
user> (defmacro my-macro []
        `(do-stuff 100))
#'user/my-macro
user> (with-context {:a 10 :b 20}
        (do-stuff 1)
        (my-fun)
        (my-macro)
        (do-stuff 2))
;; within context {:a 10, :b 20} : 1
;; context in fn is: {:a 10, :b 20}
;; within context {:a 10, :b 20} : 100
;; within context {:a 10, :b 20} : 2
nil
user> (do (do-stuff 1)
          (my-fun)
          (my-macro)
          (do-stuff 2))
;; no context: 1
;; context in fn is: nil
;; no context: 100
;; no context: 2
nil

What is the difference between def and defonce in Clojure?

What is the difference between def and defonce in Clojure?
When to use def over defonce or vice versa?
defonce is skipped when the var is already defined:
user> (def a 1) ;;=> #'user/a
user> a ;;=> 1
user> (def a 2) ;;=> #'user/a
user> a ;;=> 2
user> (defonce b 1) ;;=> #'user/b
user> b ;;=> 1
user> (defonce b 2) ;;=> nil
user> b ;;=> 1
Defonce only binds the name to the root value if the name has no root value.
For example, as Jay Fields blogs about, it can be used when you want to reload namespaces but don't need to reload them all.
(defonce ignored-namespaces (atom #{}))

(defn reload-all []
  (doseq [n (remove (comp @ignored-namespaces ns-name) (all-ns))]
    (require (ns-name n) :reload)))
As for when to use defonce: if you're using a system with hot reloading (CLJS with mount and re-frame, for example), defonce is useful for keeping state between reloads.
Similar situation when you re-evaluate source file yourself (e.g. in REPL) but want to keep the value of the var bound to the symbol.
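As an illustration of that reload scenario (names here are hypothetical, in a ClojureScript-style setup):

;; survives figwheel/shadow-cljs hot reloads: defonce skips re-definition,
;; so accumulated application state is preserved
(defonce app-state (atom {:route :home :user nil}))

;; re-evaluated on every reload: plain def always rebinds the var
(def default-config {:debug? true})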

When are the different elements of a lazy sequence realized in clojure?

I'm trying to understand when clojure's lazy sequences are lazy, and when the work happens, and how I can influence those things.
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (let [[a b] lz-seq])
fn call!
fn call!
fn call!
fn call!
nil
I was hoping to see only two "fn call!"s here. Is there a way to manage that?
Anyway, moving on to something which indisputably only requires one evaluation:
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (first lz-seq)
fn call!
fn call!
fn call!
fn call!
0
Is first not suitable for lazy sequences?
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (take 1 lz-seq)
(fn call!
fn call!
fn call!
fn call!
0)
At this point, I'm completely at a loss as to how to access the beginning of my toy lz-seq without having to realize the entire thing. What's going on?
Clojure's sequences are lazy, but for efficiency are also chunked, realizing blocks of 32 results at a time.
=>(def lz-seq (map #(do (println (str "fn call " %)) (identity %)) (range 100)))
=>(first lz-seq)
fn call 0
fn call 1
...
fn call 31
0
The same thing happens once you first cross the 32-element boundary:
=>(nth lz-seq 33)
fn call 0
fn call 1
...
fn call 63
33
For code where considerable work needs to be done per realisation, Fogus gives a way to work around chunking, and hints that an official way to control chunking might be on the way. A rough sketch of such a de-chunking helper is shown below.
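Here is that sketch (the commonly used helper, a version of the idea rather than Fogus's exact code):

(defn unchunk
  "Wraps a (possibly chunked) seq so elements are realized one at a time."
  [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; (first (map #(do (println "fn call!") %) (unchunk (range 100))))
;; fn call!
;; => 0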
I believe that the expression produces a chunked sequence. Try replacing 4 with 10000 in the range expression - you'll see something like 32 calls on first eval, which is the size of the chunk.
A lazy sequence is one where we evaluate the sequence as and when needed. (hence lazy). Once a result is evaluated, it is cached so that it can be re-used (and we don't have to do the work again). If you try to realize an item of the sequence that hasn't been evaluated yet, clojure evaluates it and returns the value to you. However, it also does some extra work. It anticipates that you might want to evaluate the next element(s) in the sequence and does that for you too. This is done to avoid some performance overheads, the exact nature of which is beyond my skill-level. Thus, when you say (first lz-seq), it actually calculates the first as well as the next few elements in the seq. Since your println statement is a side effect, you can see the evaluation happening. Now if you were to say (second lz-seq), you will not see the println again since the result has already been evaluated and cached.
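A quick REPL illustration of that caching, assuming a freshly defined lz-seq over (range 4) as in the question:

user=> (first lz-seq)
fn call!
fn call!
fn call!
fn call!
0
user=> (second lz-seq)
1
;; nothing is printed this time: the chunk was already realized and cached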
A better way to see that your sequence is lazy is:
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 400)))
#'user/lz-seq
user=> (first lz-seq)
This will print a few "fn call!" statements, but not all 400 of them. That's because the first call will actually end up evaluating more than one element of the sequence.
Hope this explanation is clear enough.
I think it's some sort of optimization made by the REPL.
My REPL is caching 32 at a time.
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 100)))
#'user/lz-seq
user=> (first lz-seq)
prints 32 times
user=> (take 20 lz-seq)
does not print any "fn call!"
user=> (take 33 lz-seq)
prints elements 0 to 30, then prints 32 more "fn call!"s, followed by 31 and 32