Unused args in watcher function - concurrency

Here is a code example from Clojure Programming
(def history (atom ()))
(defn log->list
[dest-atom key source old new]
(when (not= old new)
(swap! dest-atom conj new)))
(def sarah (atom {:name "Sarah" :age 25}))
(add-watch sarah :record (partial log->list history))
(swap! sarah update-in [:age] inc)
(swap! sarah identity)
(pprint #history)
The log->list function has two args key and source that never gets used. What are they for here?

the key is the tag used to later remove it:
(remove-watch reference key)
If the key is then passed to the watching function then it has the ability to later remove it's self once it has been triggered. Though you can build that by agreeing on the key name in advance as well, though having this feature can make things more elegant.

From the docstring on add-watch:
Adds a watch function to an agent/atom/var/ref reference. The watch
fn must be a fn of 4 args: a key, the reference, its old-state, its
new-state...
Because we're adding a watch on the atom, log->list needs to be a 4 arity function. We will get passed the key, reference, old value, and new value. Even though we only use the old and new values, we need to have 4 args in the function so it has the right arity.
It is often idiomatic in Clojure to use an underscore in place of an unused argument in a function definition to show it is unused, and to reduce visual noise. However in this case, I think keeping the unused variables and naming them makes it clearer what's happening and makes it easier to modify this in the future if necessary. This is particularly important as we need to keep the arity of this function exactly the same for it to work with add-watch. In a book, I would lean towards this option.
In actual code it's borderline and I could go either way on underscore vs naming the args. What I chose would probably depend on the context and whatever struck the right balance of clarity and conciseness.

Related

In Clojure, why do you have to use parenthesis when def'ing a function and use cases of let

I'm starting to learn clojure and I've stumbled upon the following, when I found myself declaring a "sum" function (for learning purposes) I wrote the following code
(def sum (fn [& args] (apply + args)))
I have understood that I defined the symbol sum as containing that fn, but why do I have to enclose the Fn in parenthesis, isn't the compiler calling that function upon definition instead of when someone is actually invoking it? Maybe it's just my imperative brain talking.
Also, what are the use cases of let? Sometimes I stumble on code that use it and other code that don't, for example on the Clojure site there's an exercise to use the OpenStream function from the Java Interop, I wrote the following code:
(defn http-get
[url]
(let [url-obj (java.net.URL. url)]
(slurp (.openStream url-obj))))
(http-get "https://www.google.com")
whilst they wrote the following on the clojure site as an answer
(defn http-get [url]
(slurp
(.openStream
(java.net.URL. url))))
Again maybe it's just my imperative brain talking, the need of having a "variable" or an "object" to store something before using it, but I quite don't understand when I should use let or when I shouldn't.
To answer both of your questions:
1.
(def sum (fn [& args] (apply + args)))
Using def here is very unorthodox. When you define a function you usually want to use defn. But since you used def you should know that def binds a name to a value. fn's return value is a function. Effectively you bound the name sum to the function returned by applying (using parenthesis which are used for application) fn.
You could have used the more traditional (defn sum [& args] (apply + args))
2.
While using let sometimes makes sense for readability (separating steps outside their nested use) it is sometimes required when you want to do something once and use it multiple times. It binds the result to a name within a specified context.
We can look at the following example and see that without let it becomes harder to write (function is for demonstration purposes):
(let [db-results (query "select * from table")] ;; note: query is not a pure function
;; do stuff with db-results
(f db-results)
;; return db-results
db-results)))
This simply re-uses a return value (db-results) from a function that you usually only want to run once - in multiple locations. So let can be used for style like the example you've given, but its also very useful for value reuse within some context.
Both def and defn define a global symbol, sort of like a global variable in Java, etc. Also, (defn xxx ...) is a (very common) shortcut for (def xxx (fn ...)). So, both versions will work exactly the same way when you run the program. Since the defn version is shorter and more explicit, that is what you will do 99% of the time.
Typing (let [xxx ...] ...) defines a local symbol, which cannot be seen by code outside of the let form, just like a local variable (block-scope) in Java, etc.
Just like Java, it is optional when to have a local variable like url-obj. It will make no difference to the running program. You must answer the question, "Which version makes my code easier to read and understand?" This part is no different than Java.

Creating clojure atoms with a function

I want to 1) create a list of symbols with the function below; then 2) create atoms with these symbols/names so that the atoms can be modified from other functions. This is the function to generate symbols/names:
(defn genVars [ dist ]
(let [ nms (map str (range dist)) neigs (map #(apply str "neig" %) nms) ]
(doseq [ v neigs ]
(intern *ns* (symbol v) [ ] ))
))
If dist=3, then 3 symbols, neig0, ... neig2 are created each bound with an empty vector. If it is possible to functionally create atoms with these symbols so that they are accessible from other functions. Any help is much appreciated, even if there are other ways to accomplish this.
your function seems to be correct, just wrap the value in the intern call with atom call. Also I would rather use dotimes.
user>
(defn gen-atoms [amount prefix]
(dotimes [i amount]
(intern *ns* (symbol (str prefix i)) (atom []))))
#'user/gen-atoms
user> (gen-atoms 2 "x")
nil
user> x0
#atom[[] 0x30f1a7b]
user> x1
#atom[[] 0x2149efef]
The desire to generate names suggests you would be better served by a single map instead:
(def neighbours (atom (make-neighbours)))
Where the definition of make-neigbours might look something like this:
(defn make-neighbours []
(into {} (for [i (range 10)]
[(str "neig" i) {:age i}])))
Where the other namespace would look values up using something like:
(get-in #data/neighbours ["neig0" :age])
Idiomatic Clojure tends to avoid creating many named global vars, preferring instead to collocating state into one or a few vars governed by Clojure's concurrency primitives (atom/ref/agent). I encourage you to think about whether your problem can be solved with a single atom in this way instead of requiring defining multiple vars.
Having said that, if you really really need multiple atoms, consider storing them all in a single map var instead of creating many global vars. Personally, I have never encountered a situation where creating many atoms was better than a single big atom (so I would be interested to hear about situations where this would be important).
If you really really need many vars, be aware that defining vars inside a function is actually bad style (https://github.com/bbatsov/clojure-style-guide#dont-def-vars-inside-fns). With good reason too! The beauty of using functions and data comes from the purity of the functions. def inside a function is particularly nasty as it is not only a side-effect, but is an potentially execution flow altering side-effect.
Of course yes there is a way to achieve it, as another answer points out.
Where it comes to defining things that goes beyond def and defn, there is quite a lot of precedence to using macros. For example defroutes from compojure, defschema from Schema, deftest from clojure.test. Generally anything that is a convenience form for creating vars. You could use a macro solution to create defs for your atoms:
(defmacro defneighbours [n]
`(do
~#(for [sym (for [i (range n)]
(symbol (str "neig" i)))]
`(def ~sym (atom {}))))
In my opinion this is actually less offensive than a functional version, only because it is creating global defs. It is a little more obvious about creating global defs by using the regular def syntax. But I only bring it up as a strawman, because this is still bad.
The reason functions and data work best is because they compose.
There are tangible considerations that make a single atom governing state very convenient. You can iterate over all neighbors conveniently, you can add new ones dynamically. Also you can do things like concatenating neighbors with other neighbors etc. Basically there are lots of function/data abstractions that you lock yourself out of if you create many global vars.
This is the reason that macros are generally considered useful for syntactic tricks, but best avoided in favor of functions and data. And it has a real impact on the flexibility of your code. For example going back to compojure; the macro syntax is actually very limiting, and for that reason I prefer not to use defroutes at all.
In summary:
Don't make lots of global defs if you can avoid it.
Prefer 1 atom over many atoms where possible.
Don't def inside a function.
Macros are best avoided in favor of functions and data.
Regardless of these guidelines, it is always good to explore what is possible, and I can't know your circumstances, so above all I hope you overcome your immediate problem and find Clojure a pleasant language to use.

Call a side effecting function only when atom value changes

What is the simplest way to trigger a side-effecting function to be called only when an atom's value changes?
If I were using a ref, I think I could just do this:
(defn transform-item [x] ...)
(defn do-side-effect-on-change [] nil)
(def my-ref (ref ...))
(when (dosync (let [old-value #my-ref
_ (alter! my-ref transform-item)
new-value #my-ref]
(not= old-value new-value)))
(do-side-effect-on-change))
But this seems seems a bit roundabout, since I'm using a ref even though I am not trying to coordinate changes across multiple refs. Essentially I am using it just to conveniently access the old and new value within a successful transaction.
I feel like I should be able to use an atom instead. Is there a solution simpler than this?
(def my-atom (atom ...))
(let [watch-key ::side-effect-watch
watch-fn (fn [_ _ old-value new-value]
(when (not= old-value new-value)
(do-side-effect-on-change)))]
(add-watch my-atom watch-key watch-fn)
(swap! my-atom transform-item)
(remove-watch watch-key))
This also seems roundabout, because I am adding and removing the watch around every call to swap!. But I need this, because I don't want a watch hanging around that causes the side-effecting function to be triggered when other code modifies the atom.
It is important that the side-effecting function be called exactly once per mutation to the atom, and only when the transform function transform-item actually returns a new value. Sometimes it will return the old value, yielding new change.
(when (not= #a (swap! a transform))
(do-side-effect))
But you should be very clear about what concurrency semantics you need. For example another thread may modify the atom between reading it and swapping it:
a = 1
Thread 1 reads a as 1
Thread 2 modifies a to 2
Thread 1 swaps a from 2 to 2
Thread 1 determines 1 != 2 and calls do-side-effect
It is not clear to me from the question whether this is desirable or not desirable. If you do not want this behavior, then an atom just will not do the job unless you introduce concurrency control with a lock.
Seeing as you started with a ref and asked about an atom, I think you have probably given some thought to concurrency already. It seems like from your description the ref approach is better:
(when (dosync (not= #r (alter r transform))
(do-side-effect))
Is there a reason you don't like your ref solution?
If the answer is "because I don't have concurrency" Then I would encourage you to use a ref anyway. There isn't really a downside to it, and it makes your semantics explicit. IMO programs tend to grow and to a point where concurrency exists, and Clojure is really great at being explicit about what should happen when it exists. (For example oh I'm just calculating stuff, oh I'm just exposing this stuff as a web service now, oh now I'm concurrent).
In any case, bear in mind that functions like alter and swap! return the value, so you can make use of this for concise expressions.
I'm running into the same situation and just come up 2 solutions.
state field :changed?
Keeping a meanless :changed mark in atom to track swap function. And take the return value of swap! to see if things changed. For example:
(defn data (atom {:value 0 :changed? false}))
(let [{changed? :changed?} (swap! data (fn [data] (if (change?)
{:value 1 :changed? true}
{:value 0 :change? false})))]
(when changed? (do-your-task)))
exception based
You can throw an Exception in swap function, and catch it outside:
(try
(swap! data (fn [d] (if (changed?) d2 (ex-info "unchanged" {})))
(do-your-task)
(catch Exception _
))

When to use `constantly` in clojure, how and when are its arguments evaluated?

In the accepted answer to another question, Setting Clojure "constants" at runtime the clojure function constantly is used.
The definition of constantly looks like so:
(defn constantly
"Returns a function that takes any number of arguments and returns x."
{:added "1.0"}
[x] (fn [& args] x))
The doc string says what it does but not why one would use it.
In the answer given in the previous question constantly is used as follows:
(declare version)
(defn -main
[& args]
(alter-var-root #'version (constantly (-> ...)))
(do-stuff))
So the function returned by constantly is directly evaluated for its result. I am confused as to how this is useful. I am probably not understanding how x would be evaluated with and without being wrapped in `constantly'.
When should I use constantly and why is it necessary?
The constantly function is useful when an API expects a function and you just want a constant. This is the case in the example provided in the question.
Most of the alter-* functions (including alter-var-root) take a function, to allow the caller to modify something based on its old value. Even if you just want the new value to be 7 (disregarding the old value), you still need to provide a function (providing just 7 will result in an attempt to evaluate it, which will fail). So you have to provide a function that just returns 7. (constantly 7) produces just this function, sparing the effort required to define it.
Edit: As to second part of the question, constantly is an ordinary function, so its argument is evaluated before the constant function is constructed. So (constantly #myref) always returns the value referenced by myref at the time constantly was called, even if it is changed later.

Best Practice for globals in clojure, (refs vs alter-var-root)?

I've found myself using the following idiom lately in clojure code.
(def *some-global-var* (ref {}))
(defn get-global-var []
#*global-var*)
(defn update-global-var [val]
(dosync (ref-set *global-var* val)))
Most of the time this isn't even multi-threaded code that might need the transactional semantics that refs give you. It just feels like refs are for more than threaded code but basically for any global that requires immutability. Is there a better practice for this? I could try to refactor the code to just use binding or let but that can get particularly tricky for some applications.
I always use an atom rather than a ref when I see this kind of pattern - if you don't need transactions, just a shared mutable storage location, then atoms seem to be the way to go.
e.g. for a mutable map of key/value pairs I would use:
(def state (atom {}))
(defn get-state [key]
(#state key))
(defn update-state [key val]
(swap! state assoc key val))
Your functions have side effects. Calling them twice with the same inputs may give different return values depending on the current value of *some-global-var*. This makes things difficult to test and reason about, especially once you have more than one of these global vars floating around.
People calling your functions may not even know that your functions are depending on the value of the global var, without inspecting the source. What if they forget to initialize the global var? It's easy to forget. What if you have two sets of code both trying to use a library that relies on these global vars? They are probably going to step all over each other, unless you use binding. You also add overheads every time you access data from a ref.
If you write your code side-effect free, these problems go away. A function stands on its own. It's easy to test: pass it some inputs, inspect the outputs, they'll always be the same. It's easy to see what inputs a function depends on: they're all in the argument list. And now your code is thread-safe. And probably runs faster.
It's tricky to think about code this way if you're used to the "mutate a bunch of objects/memory" style of programming, but once you get the hang of it, it becomes relatively straightforward to organize your programs this way. Your code generally ends up as simple as or simpler than the global-mutation version of the same code.
Here's a highly contrived example:
(def *address-book* (ref {}))
(defn add [name addr]
(dosync (alter *address-book* assoc name addr)))
(defn report []
(doseq [[name addr] #*address-book*]
(println name ":" addr)))
(defn do-some-stuff []
(add "Brian" "123 Bovine University Blvd.")
(add "Roger" "456 Main St.")
(report))
Looking at do-some-stuff in isolation, what the heck is it doing? There are a lot of things happening implicitly. Down this path lies spaghetti. An arguably better version:
(defn make-address-book [] {})
(defn add [addr-book name addr]
(assoc addr-book name addr))
(defn report [addr-book]
(doseq [[name addr] addr-book]
(println name ":" addr)))
(defn do-some-stuff []
(let [addr-book (make-address-book)]
(-> addr-book
(add "Brian" "123 Bovine University Blvd.")
(add "Roger" "456 Main St.")
(report))))
Now it's clear what do-some-stuff is doing, even in isolation. You can have as many address books floating around as you want. Multiple threads could have their own. You can use this code from multiple namespaces safely. You can't forget to initialize the address book, because you pass it as an argument. You can test report easily: just pass the desired "mock" address book in and see what it prints. You don't have to care about any global state or anything but the function you're testing at the moment.
If you don't need to coordinate updates to a data structure from multiple threads, there's usually no need to use refs or global vars.