Clojure idiomatic get-and-set function - clojure

Is there a more idiomatic/readable way of writing a get-and-set function in Clojure than:
(def the-ref (ref {}))
(defn get-and-set [new-value]
(dosync
(let [old-value #the-ref]
(do
(ref-set the-ref new-value)
old-value))))

for the simple cases I tend to see this operation used directly instead of wrapped in a function:
hello.core> (dosync (some-work #the-ref) (ref-set the-ref 5))
5
In this case dosync generally serves as the wrapper you are looking for. within the dosync This is significant because dosync composes nicely with other transactions and makes the bounds of the transaction visible. If you are in a position where the wrapper function can completely encapsulate all the references to the ref then perhaps refs are not the best tool.
A typical use of refs could look more along these lines:
(dosync (some-work #the-ref #the-other-ref) (ref-set the-ref #the-other-ref))
The need to wrap it is rare because when ref's are used they typically are use in groups of more than one ref because coordinated changes are required by the problem at hand. In cases where their is just one value then atoms are more common.

Related

Functional alternative to "let"

I find myself writing a lot of clojure in this manner:
(defn my-fun [input]
(let [result1 (some-complicated-procedure input)
result2 (some-other-procedure result1)]
(do-something-with-results result1 result2)))
This let statement seems very... imperative. Which I don't like. In principal, I could be writing the same function like this:
(defn my-fun [input]
(do-something-with-results (some-complicated-procedure input)
(some-other-procedure (some-complicated-procedure input)))))
The problem with this is that it involves recomputation of some-complicated-procedure, which may be arbitrarily expensive. Also you can imagine that some-complicated-procedure is actually a series of nested function calls, and then I either have to write a whole new function, or risk that changes in the first invocation don't get applied to the second:
E.g. this works, but I have to have an extra shallow, top-level function that makes it hard to do a mental stack trace:
(defn some-complicated-procedure [input] (lots (of (nested (operations input)))))
(defn my-fun [input]
(do-something-with-results (some-complicated-procedure input)
(some-other-procedure (some-complicated-procedure input)))))
E.g. this is dangerous because refactoring is hard:
(defn my-fun [input]
(do-something-with-results (lots (of (nested (operations (mistake input))))) ; oops made a change here that wasn't applied to the other nested calls
(some-other-procedure (lots (of (nested (operations input))))))))
Given these tradeoffs, I feel like I don't have any alternatives to writing long, imperative let statements, but when I do, I cant shake the feeling that I'm not writing idiomatic clojure. Is there a way I can address the computation and code cleanliness problems raised above and write idiomatic clojure? Are imperitive-ish let statements idiomatic?
The kind of let statements you describe might remind you of imperative code, but there is nothing imperative about them. Haskell has similar statements for binding names to values within bodies, too.
If your situation really needs a bigger hammer, there are some bigger hammers that you can either use or take for inspiration. The following two libraries offer some kind of binding form (akin to let) with a localized memoization of results, so as to perform only the necessary steps and reuse their results if needed again: Plumatic Plumbing, specifically the Graph part; and Zach Tellman's Manifold, whose let-flow form furthermore orchestrates asynchronous steps to wait for the necessary inputs to become available, and to run in parallel when possible. Even if you decide to maintain your present course, their docs make good reading, and the code of Manifold itself is educational.
I recently had this same question when I looked at this code I wrote
(let [user-symbols (map :symbol states)
duplicates (for [[id freq] (frequencies user-symbols) :when (> freq 1)] id)]
(do-something-with duplicates))
You'll note that map and for are lazy and will not be executed until do-something-with is executed. It's also possible that not all (or even not any) of the states will be mapped or the frequencies calculated. It depends on what do-something-with actually requests of the sequence returned by for. This is very much functional and idiomatic functional programming.
i guess the simplest approach to keep it functional would be to have a pass-through state to accumulate the intermediate results. something like this:
(defn with-state [res-key f state]
(assoc state res-key (f state)))
user> (with-state :res (comp inc :init) {:init 10})
;;=> {:init 10, :res 11}
so you can move on to something like this:
(->> {:init 100}
(with-state :inc'd (comp inc :init))
(with-state :inc-doubled (comp (partial * 2) :inc'd))
(with-state :inc-doubled-squared (comp #(* % %) :inc-doubled))
(with-state :summarized (fn [st] (apply + (vals st)))))
;;=> {:init 100,
;; :inc'd 101,
;; :inc-doubled 202,
;; :inc-doubled-squared 40804,
;; :summarized 41207}
The let form is a perfectly functional construct and can be seen as syntactic sugar for calls to anonymous functions. We can easily write a recursive macro to implement our own version of let:
(defmacro my-let [bindings body]
(if (empty? bindings)
body
`((fn [~(first bindings)]
(my-let ~(rest (rest bindings)) ~body))
~(second bindings))))
Here is an example of calling it:
(my-let [a 3
b (+ a 1)]
(* a b))
;; => 12
And here is a macroexpand-all called on the above expression, that reveal how we implement my-let using anonymous functions:
(clojure.walk/macroexpand-all '(my-let [a 3
b (+ a 1)]
(* a b)))
;; => ((fn* ([a] ((fn* ([b] (* a b))) (+ a 1)))) 3)
Note that the expansion doesn't rely on let and that the bound symbols become parameter names in the anonymous functions.
As others write, let is actually perfectly functional, but at times it can feel imperative. It's better to become fully comfortable with it.
You might, however, want to kick the tires of my little library tl;dr that lets you write code like for example
(compute
(+ a b c)
where
a (f b)
c (+ 100 b))

Call a side effecting function only when atom value changes

What is the simplest way to trigger a side-effecting function to be called only when an atom's value changes?
If I were using a ref, I think I could just do this:
(defn transform-item [x] ...)
(defn do-side-effect-on-change [] nil)
(def my-ref (ref ...))
(when (dosync (let [old-value #my-ref
_ (alter! my-ref transform-item)
new-value #my-ref]
(not= old-value new-value)))
(do-side-effect-on-change))
But this seems seems a bit roundabout, since I'm using a ref even though I am not trying to coordinate changes across multiple refs. Essentially I am using it just to conveniently access the old and new value within a successful transaction.
I feel like I should be able to use an atom instead. Is there a solution simpler than this?
(def my-atom (atom ...))
(let [watch-key ::side-effect-watch
watch-fn (fn [_ _ old-value new-value]
(when (not= old-value new-value)
(do-side-effect-on-change)))]
(add-watch my-atom watch-key watch-fn)
(swap! my-atom transform-item)
(remove-watch watch-key))
This also seems roundabout, because I am adding and removing the watch around every call to swap!. But I need this, because I don't want a watch hanging around that causes the side-effecting function to be triggered when other code modifies the atom.
It is important that the side-effecting function be called exactly once per mutation to the atom, and only when the transform function transform-item actually returns a new value. Sometimes it will return the old value, yielding new change.
(when (not= #a (swap! a transform))
(do-side-effect))
But you should be very clear about what concurrency semantics you need. For example another thread may modify the atom between reading it and swapping it:
a = 1
Thread 1 reads a as 1
Thread 2 modifies a to 2
Thread 1 swaps a from 2 to 2
Thread 1 determines 1 != 2 and calls do-side-effect
It is not clear to me from the question whether this is desirable or not desirable. If you do not want this behavior, then an atom just will not do the job unless you introduce concurrency control with a lock.
Seeing as you started with a ref and asked about an atom, I think you have probably given some thought to concurrency already. It seems like from your description the ref approach is better:
(when (dosync (not= #r (alter r transform))
(do-side-effect))
Is there a reason you don't like your ref solution?
If the answer is "because I don't have concurrency" Then I would encourage you to use a ref anyway. There isn't really a downside to it, and it makes your semantics explicit. IMO programs tend to grow and to a point where concurrency exists, and Clojure is really great at being explicit about what should happen when it exists. (For example oh I'm just calculating stuff, oh I'm just exposing this stuff as a web service now, oh now I'm concurrent).
In any case, bear in mind that functions like alter and swap! return the value, so you can make use of this for concise expressions.
I'm running into the same situation and just come up 2 solutions.
state field :changed?
Keeping a meanless :changed mark in atom to track swap function. And take the return value of swap! to see if things changed. For example:
(defn data (atom {:value 0 :changed? false}))
(let [{changed? :changed?} (swap! data (fn [data] (if (change?)
{:value 1 :changed? true}
{:value 0 :change? false})))]
(when changed? (do-your-task)))
exception based
You can throw an Exception in swap function, and catch it outside:
(try
(swap! data (fn [d] (if (changed?) d2 (ex-info "unchanged" {})))
(do-your-task)
(catch Exception _
))

What are side-effects in predicates and why are they bad?

I'm wondering what is considered to be a side-effect in predicates for fns like remove or filter. There seems to be a range of possibilities. Clearly, if the predicate writes to a file, this is a side-effect. But consider a situation like this:
(def *big-var-that-might-be-garbage-collected* ...)
(let [my-ref *big-var-that-might-be-garbage-collected*]
(defn my-pred
[x]
(some-operation-on my-ref x)))
Even if some-operation-on is merely a query that does not change state, the fact that my-pred retains a reference to *big... changes the state of the system in that the big var cannot be garbage collected. Is this also considered to be side-effect?
In my case, I'd like to write to a logging system in a predicate. Is this a side effect?
And why are side-effects in predicates discouraged exactly? Is it because filter and remove and their friends work lazily so that you cannot determine when the predicates are called (and - hence - when the side-effects happen)?
GC is not typically considered when evaluating if a function is pure or not, although many actions that make a function impure can have a GC effect.
Logging is a side effect, as is changing any state in the program or the world. A pure function takes data and returns data, without modifying anything else.
https://softwareengineering.stackexchange.com/questions/15269/why-are-side-effects-considered-evil-in-functional-programming covers why side effects are avoided in functional languages.
I found this link helpful
The problem is determining when, or even whether, the side-effects will occur on any given call to the function.
If you only care that the same inputs return the same answer, you are fine. Side-effects are dependent on how the function is executed.
For example,
(first (filter odd? (range 20)))
; 1
But if we arrange for odd? to print its argument as it goes:
(first (filter #(do (print %) (odd? %)) (range 20)))
It will print 012345678910111213141516171819 before returning 1!
The reason is that filter, where it can, deals with its sequence argument in chunks of 32 elements.
If we take the limit off the range:
(first (filter #(do (print %) (odd? %)) (range)))
... we get a full-size chunk printed: 012345678910111213141516171819012345678910111213141516171819202122232425262728293031
Just printing the argument is confusing. If the side effects are significant, things could go seriously awry.

Creating refs with transducers

Is it possible to create a ref with a transducer in Clojure, in a way analogous to creating a chan with a transducer?
i.e., when you create a chan with a transducer, it filters/maps all the inputs into the outputs.
I'd expect there's also a way to create a ref such that whatever you set, it can either ignore or modify the input. Is this possible to do?
Adding a transducer to a channel modifies the contents as they pass through, which is roughly analogous to adding a watch to a ref that applies it's own change each time the value changes. This change it's self then triggers the watch again so be careful not to blow the stack if they are recursive.
user> (def r (ref 0))
#'user/r
user> (add-watch r :label
(fn [label the-ref old-state new-state]
(println "adding that little something extra")
(if (< old-state 10) (dosync (commute the-ref inc)))))
#<Ref#1af618c2: 0>
user> (dosync (alter r inc))
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
adding that little something extra
1
user> #r
11
You could even apply a transducer to the state of the atom if you wanted.
This is an interesting idea, but the wrong way to go about it for at least a couple reasons. You'd lose some relationships you'd expect to hold:
(alter r identity) =/= r
(alter r f)(alter r f) =/= (alter r (comp f f))
(alter r f) =/= (ref-set r (f #r))
Also some transducers are side-effecting volatiles, and have no business in a dosync block. i.e. if you use (take n) as your transducer then if your dosync fails, then it'll retry as though invoked with (take (dec n)), which violates dosync body requirements.
The problem is a ref lets you read and write as separate steps. If instead there was something foundational that let you "apply" an input to a hidden "state" and collect the output all in one step, consistently with the STM, then that'd be something to work with.

Best Practice for globals in clojure, (refs vs alter-var-root)?

I've found myself using the following idiom lately in clojure code.
(def *some-global-var* (ref {}))
(defn get-global-var []
#*global-var*)
(defn update-global-var [val]
(dosync (ref-set *global-var* val)))
Most of the time this isn't even multi-threaded code that might need the transactional semantics that refs give you. It just feels like refs are for more than threaded code but basically for any global that requires immutability. Is there a better practice for this? I could try to refactor the code to just use binding or let but that can get particularly tricky for some applications.
I always use an atom rather than a ref when I see this kind of pattern - if you don't need transactions, just a shared mutable storage location, then atoms seem to be the way to go.
e.g. for a mutable map of key/value pairs I would use:
(def state (atom {}))
(defn get-state [key]
(#state key))
(defn update-state [key val]
(swap! state assoc key val))
Your functions have side effects. Calling them twice with the same inputs may give different return values depending on the current value of *some-global-var*. This makes things difficult to test and reason about, especially once you have more than one of these global vars floating around.
People calling your functions may not even know that your functions are depending on the value of the global var, without inspecting the source. What if they forget to initialize the global var? It's easy to forget. What if you have two sets of code both trying to use a library that relies on these global vars? They are probably going to step all over each other, unless you use binding. You also add overheads every time you access data from a ref.
If you write your code side-effect free, these problems go away. A function stands on its own. It's easy to test: pass it some inputs, inspect the outputs, they'll always be the same. It's easy to see what inputs a function depends on: they're all in the argument list. And now your code is thread-safe. And probably runs faster.
It's tricky to think about code this way if you're used to the "mutate a bunch of objects/memory" style of programming, but once you get the hang of it, it becomes relatively straightforward to organize your programs this way. Your code generally ends up as simple as or simpler than the global-mutation version of the same code.
Here's a highly contrived example:
(def *address-book* (ref {}))
(defn add [name addr]
(dosync (alter *address-book* assoc name addr)))
(defn report []
(doseq [[name addr] #*address-book*]
(println name ":" addr)))
(defn do-some-stuff []
(add "Brian" "123 Bovine University Blvd.")
(add "Roger" "456 Main St.")
(report))
Looking at do-some-stuff in isolation, what the heck is it doing? There are a lot of things happening implicitly. Down this path lies spaghetti. An arguably better version:
(defn make-address-book [] {})
(defn add [addr-book name addr]
(assoc addr-book name addr))
(defn report [addr-book]
(doseq [[name addr] addr-book]
(println name ":" addr)))
(defn do-some-stuff []
(let [addr-book (make-address-book)]
(-> addr-book
(add "Brian" "123 Bovine University Blvd.")
(add "Roger" "456 Main St.")
(report))))
Now it's clear what do-some-stuff is doing, even in isolation. You can have as many address books floating around as you want. Multiple threads could have their own. You can use this code from multiple namespaces safely. You can't forget to initialize the address book, because you pass it as an argument. You can test report easily: just pass the desired "mock" address book in and see what it prints. You don't have to care about any global state or anything but the function you're testing at the moment.
If you don't need to coordinate updates to a data structure from multiple threads, there's usually no need to use refs or global vars.