Differences between assoc-in with two elements and update / assoc - clojure

For a while I've been doing things like (assoc-in my-hash [:data :id] 1), and it looks fine.
Recently, since I rarely have more than two levels, I noticed I can do (update my-hash :data assoc :id 1), which reads quite differently but returns the same result.
So, I wonder, is there any difference in performance? Do you think it's more readable in one way than the other? More idiomatic?
update / assoc feels like it's more expensive to me, but I really like it better than assoc-in, which makes me stop to think each time I see it.

When it comes to performance, it's always good to measure. Ideally you'd assemble a realistic map (whether your maps are big or small will have some impact on the relative cost of various operations) and try it both ways with Criterium:
(require '[criterium.core :as c])
(let [m (construct-your-map)]
  (c/bench (assoc-in m [:data :id] 1))
  (c/bench (update m :data assoc :id 1)))
Under the hood, update + assoc is sort of the unrolled version of assoc-in here that doesn't need the auxiliary vector to hold the keys, so I would expect it to be faster than assoc-in. But (1) ordinarily I wouldn't worry about minor performance differences when it comes to things like this, (2) when I do care, again, it's better to measure than to guess.
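To make "unrolled" concrete: for a two-level path, assoc-in can be written out by hand, and update + assoc expresses essentially the same thing without the key vector (a sketch of the equivalence, not the actual clojure.core implementation):
(assoc-in my-hash [:data :id] 1)
;; is equivalent to
(assoc my-hash :data (assoc (get my-hash :data) :id 1))
;; which is what the update form amounts to, minus the key vector:
(update my-hash :data assoc :id 1)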
(On my box, with Clojure 1.9.0-alpha14, update + assoc is indeed faster at ~282 ns vs ~353 ns for assoc-in given my small test map of (assoc (into {} (map #(vector % %)) (range 20)) :data {:id 0}).)
Ultimately most of the time readability will be the more important factor, but I don't think you can say in general that one approach is more readable than the other. If you have a -> chain that already uses assoc-in or update multiple times, it may be preferable to repeat the same function for the sake of consistency (just to avoid making the reader wonder "is this thing really different?"). If you have a codebase that you control, you can adopt a "house style" that favours one approach over the other. Etc., etc.
I might see assoc-in as a little more readable most of the time – it uses a single "verb" and makes it clear at a glance what the (single, exact) path to the update is – but if you prefer update + assoc and expect to keep their use consistent in your codebase, that's certainly fine as well.

Related

Functional alternative to "let"

I find myself writing a lot of clojure in this manner:
(defn my-fun [input]
  (let [result1 (some-complicated-procedure input)
        result2 (some-other-procedure result1)]
    (do-something-with-results result1 result2)))
This let statement seems very... imperative. Which I don't like. In principle, I could be writing the same function like this:
(defn my-fun [input]
  (do-something-with-results (some-complicated-procedure input)
                             (some-other-procedure (some-complicated-procedure input))))
The problem with this is that it involves recomputation of some-complicated-procedure, which may be arbitrarily expensive. Also you can imagine that some-complicated-procedure is actually a series of nested function calls, and then I either have to write a whole new function, or risk that changes in the first invocation don't get applied to the second:
E.g. this works, but I have to have an extra shallow, top-level function that makes it hard to do a mental stack trace:
(defn some-complicated-procedure [input] (lots (of (nested (operations input)))))
(defn my-fun [input]
  (do-something-with-results (some-complicated-procedure input)
                             (some-other-procedure (some-complicated-procedure input))))
E.g. this is dangerous because refactoring is hard:
(defn my-fun [input]
  (do-something-with-results (lots (of (nested (operations (mistake input))))) ; oops, made a change here that wasn't applied to the other nested calls
                             (some-other-procedure (lots (of (nested (operations input)))))))
Given these tradeoffs, I feel like I don't have any alternative to writing long, imperative-looking let statements, but when I do, I can't shake the feeling that I'm not writing idiomatic Clojure. Is there a way I can address the recomputation and code-cleanliness problems raised above and write idiomatic Clojure? Are imperative-ish let statements idiomatic?
The kind of let statements you describe might remind you of imperative code, but there is nothing imperative about them. Purely functional languages such as Haskell have similar constructs (let expressions and where clauses) for binding names to values within a body.
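To make that concrete, the let from the question can be rewritten mechanically as nested anonymous-function calls; each binding simply becomes a parameter (a sketch of the desugaring, reusing the question's hypothetical function names; the my-let answer further down builds a general macro along exactly these lines):
(defn my-fun [input]
  ((fn [result1]
     ((fn [result2]
        (do-something-with-results result1 result2))
      (some-other-procedure result1)))
   (some-complicated-procedure input)))
Each intermediate result is still computed exactly once, just as with let.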
If your situation really needs a bigger hammer, there are some bigger hammers that you can either use or take for inspiration. The following two libraries offer some kind of binding form (akin to let) with a localized memoization of results, so as to perform only the necessary steps and reuse their results if needed again: Plumatic Plumbing, specifically the Graph part; and Zach Tellman's Manifold, whose let-flow form furthermore orchestrates asynchronous steps to wait for the necessary inputs to become available, and to run in parallel when possible. Even if you decide to maintain your present course, their docs make good reading, and the code of Manifold itself is educational.
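For a taste of what that looks like, a let-flow form reads much like a let, except that each step runs as soon as its inputs become available (a rough sketch based on the Manifold docs; slow-op is a hypothetical expensive function):
(require '[manifold.deferred :as d])
(d/let-flow [a (d/future (slow-op 1))   ; these two run concurrently
             b (d/future (slow-op 2))]
  (+ a b))                              ; runs once both are realized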
I recently had this same question when I looked at this code I wrote
(let [user-symbols (map :symbol states)
      duplicates (for [[id freq] (frequencies user-symbols) :when (> freq 1)] id)]
  (do-something-with duplicates))
You'll note that map and for are lazy and will not be executed until do-something-with consumes their output. It's also possible that not all (or even none) of the states will be mapped or the frequencies calculated; it depends on what do-something-with actually requests of the sequence returned by for. This is very much idiomatic functional programming.
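A quick way to see this laziness at the REPL (a sketch: the println side effect exists only to show when elements are realized, and note that Clojure realizes chunked seqs such as range up to 32 elements at a time):
(def xs (map (fn [x] (println "computing" x) (* x x)) (range 100)))
;; nothing is printed yet; the sequence is unrealized
(take 2 xs)
;; only now does "computing ..." print, and only for the first chunk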
I guess the simplest approach to keep it functional would be to have a pass-through state to accumulate the intermediate results. Something like this:
(defn with-state [res-key f state]
  (assoc state res-key (f state)))
user> (with-state :res (comp inc :init) {:init 10})
;;=> {:init 10, :res 11}
So you can move on to something like this:
(->> {:init 100}
     (with-state :inc'd (comp inc :init))
     (with-state :inc-doubled (comp (partial * 2) :inc'd))
     (with-state :inc-doubled-squared (comp #(* % %) :inc-doubled))
     (with-state :summarized (fn [st] (apply + (vals st)))))
;;=> {:init 100,
;;    :inc'd 101,
;;    :inc-doubled 202,
;;    :inc-doubled-squared 40804,
;;    :summarized 41207}
The let form is a perfectly functional construct and can be seen as syntactic sugar for calls to anonymous functions. We can easily write a recursive macro to implement our own version of let:
(defmacro my-let [bindings body]
  (if (empty? bindings)
    body
    `((fn [~(first bindings)]
        (my-let ~(rest (rest bindings)) ~body))
      ~(second bindings))))
Here is an example of calling it:
(my-let [a 3
         b (+ a 1)]
  (* a b))
;; => 12
And here is clojure.walk/macroexpand-all called on the above expression, which reveals how my-let is implemented using anonymous functions:
(clojure.walk/macroexpand-all '(my-let [a 3
                                        b (+ a 1)]
                                 (* a b)))
;; => ((fn* ([a] ((fn* ([b] (* a b))) (+ a 1)))) 3)
Note that the expansion doesn't rely on let and that the bound symbols become parameter names in the anonymous functions.
As others write, let is actually perfectly functional, but at times it can feel imperative. It's better to become fully comfortable with it.
You might, however, want to kick the tires of my little library tl;dr, which lets you write code like, for example:
(compute
  (+ a b c)
  where
  a (f b)
  c (+ 100 b))

Functionality of update vs update-in for non-nested structures

I was looking over some Quil examples, and noticed that 2 different authors (for the "Hyper" and "Equilibrium" examples) used:
(update-in s [:x] + dx vx)
instead of simply
(update s :x + dx vx)
Is there a reason for this? If s were a deeply nested structure, sure, it would make sense. In both cases, though, the list of keys has only one entry, so the two snippets above seem equivalent to me:
(let [dx 1
      vx 2
      s {:x 5}]
  (println (update-in s [:x] + dx vx))
  (println (update s :x + dx vx)))
{:x 8}
{:x 8}
Except that update-in will probably have a little bit more overhead.
The only reason I could think of is if they make the state nested in the future, it will ease the transition. For such a simple example though, this seemed unlikely, especially given there are magic constants everywhere.
Is there any reason to use update-in over update when the structure isn't nested?
There is no reason to use update-in for a non-nested structure and update is preferred.
If you look at the source code, you will see they both use assoc under the covers. There is no reason to prefer one over the other except for style and code clarity, taking nearby and related code into account.
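Lightly simplified from clojure.core (the real definitions add fixed-arity overloads for performance), both bottom out in assoc:
;; illustrative only; shadowing clojure.core names like this will warn
(defn update [m k f & args]
  (assoc m k (apply f (get m k) args)))
(defn update-in [m ks f & args]
  (let [[k & ks] ks]
    (if ks
      (assoc m k (apply update-in (get m k) ks f args))
      (assoc m k (apply f (get m k) args)))))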
Also, update was not added until Clojure 1.7, which may explain the choice in code written before it was available.
P.S. If you are ever looking for the missing function dissoc-in you can find it in the Tupelo library.
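If you would rather not pull in a library just for that, a minimal dissoc-in is easy to sketch (a hypothetical helper, not the Tupelo implementation; note that it leaves empty intermediate maps behind):
(defn dissoc-in [m ks]
  (let [path (butlast ks)
        k (last ks)]
    (if (seq path)
      (update-in m path dissoc k)
      (dissoc m k))))
(dissoc-in {:a {:b {:c 1, :d 2}}} [:a :b :c])
;;=> {:a {:b {:d 2}}}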

Creating clojure atoms with a function

I want to 1) create a list of symbols with the function below; then 2) create atoms with these symbols/names so that the atoms can be modified from other functions. This is the function to generate symbols/names:
(defn genVars [dist]
  (let [nms (map str (range dist))
        neigs (map #(apply str "neig" %) nms)]
    (doseq [v neigs]
      (intern *ns* (symbol v) []))))
If dist = 3, then three symbols, neig0 through neig2, are created, each bound to an empty vector. Is it possible to functionally create atoms with these symbols so that they are accessible from other functions? Any help is much appreciated, even if there are other ways to accomplish this.
Your function seems to be correct; just wrap the value in the intern call with a call to atom. Also, I would rather use dotimes:
user> (defn gen-atoms [amount prefix]
        (dotimes [i amount]
          (intern *ns* (symbol (str prefix i)) (atom []))))
#'user/gen-atoms
user> (gen-atoms 2 "x")
nil
user> x0
#atom[[] 0x30f1a7b]
user> x1
#atom[[] 0x2149efef]
The desire to generate names suggests you would be better served by a single map instead:
(def neighbours (atom (make-neighbours)))
Where the definition of make-neighbours might look something like this:
(defn make-neighbours []
  (into {} (for [i (range 10)]
             [(str "neig" i) {:age i}])))
Where the other namespace would look values up using something like:
(get-in @data/neighbours ["neig0" :age])
Idiomatic Clojure tends to avoid creating many named global vars, preferring instead to collocate state into one or a few vars governed by Clojure's concurrency primitives (atom/ref/agent). I encourage you to think about whether your problem can be solved with a single atom in this way instead of requiring the definition of multiple vars.
Having said that, if you really really need multiple atoms, consider storing them all in a single map var instead of creating many global vars. Personally, I have never encountered a situation where creating many atoms was better than a single big atom (so I would be interested to hear about situations where this would be important).
If you really really need many vars, be aware that defining vars inside a function is actually bad style (https://github.com/bbatsov/clojure-style-guide#dont-def-vars-inside-fns). With good reason too! The beauty of using functions and data comes from the purity of the functions. A def inside a function is particularly nasty, as it is not only a side effect but a potentially execution-flow-altering side effect.
Of course there is a way to achieve it, as another answer points out.
When it comes to defining things that go beyond def and defn, there is quite a lot of precedent for using macros: for example defroutes from Compojure, defschema from Schema, deftest from clojure.test. Generally anything that is a convenience form for creating vars. You could use a macro solution to create defs for your atoms:
(defmacro defneighbours [n]
  `(do
     ~@(for [sym (for [i (range n)]
                   (symbol (str "neig" i)))]
         `(def ~sym (atom {})))))
In my opinion this is actually less offensive than a functional version, because using the regular def syntax makes it more obvious that global defs are being created. But I only bring it up as a strawman, because this is still bad.
The reason functions and data work best is that they compose.
There are tangible considerations that make a single atom governing state very convenient. You can iterate over all neighbors conveniently, you can add new ones dynamically. Also you can do things like concatenating neighbors with other neighbors etc. Basically there are lots of function/data abstractions that you lock yourself out of if you create many global vars.
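For instance, with all the state in one map atom (like the neighbours atom from the earlier answer), each of those operations is a one-liner. A sketch, where other-neighbours stands for any other map of the same shape:
(doseq [[name {:keys [age]}] @neighbours]   ; iterate over all neighbours
  (println name "is" age))
(swap! neighbours assoc "neig10" {:age 10}) ; add a new one dynamically
(swap! neighbours merge other-neighbours)   ; combine two sets of neighbours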
This is the reason that macros are generally considered useful for syntactic tricks, but best avoided in favor of functions and data. And it has a real impact on the flexibility of your code. For example, going back to Compojure: the macro syntax is actually very limiting, and for that reason I prefer not to use defroutes at all.
In summary:
Don't make lots of global defs if you can avoid it.
Prefer 1 atom over many atoms where possible.
Don't def inside a function.
Macros are best avoided in favor of functions and data.
Regardless of these guidelines, it is always good to explore what is possible, and I can't know your circumstances, so above all I hope you overcome your immediate problem and find Clojure a pleasant language to use.

Best Practice for globals in clojure, (refs vs alter-var-root)?

I've found myself using the following idiom lately in clojure code.
(def *some-global-var* (ref {}))
(defn get-global-var []
  @*some-global-var*)
(defn update-global-var [val]
  (dosync (ref-set *some-global-var* val)))
Most of the time this isn't even multi-threaded code that might need the transactional semantics that refs give you. It just feels like refs are for more than threaded code: basically for any global that requires mutability. Is there a better practice for this? I could try to refactor the code to just use binding or let, but that can get particularly tricky for some applications.
I always use an atom rather than a ref when I see this kind of pattern - if you don't need transactions, just a shared mutable storage location, then atoms seem to be the way to go.
e.g. for a mutable map of key/value pairs I would use:
(def state (atom {}))
(defn get-state [key]
  (@state key))
(defn update-state [key val]
  (swap! state assoc key val))
Your functions have side effects. Calling them twice with the same inputs may give different return values depending on the current value of *some-global-var*. This makes things difficult to test and reason about, especially once you have more than one of these global vars floating around.
People calling your functions may not even know that your functions are depending on the value of the global var, without inspecting the source. What if they forget to initialize the global var? It's easy to forget. What if you have two sets of code both trying to use a library that relies on these global vars? They are probably going to step all over each other, unless you use binding. You also add overheads every time you access data from a ref.
If you write your code side-effect free, these problems go away. A function stands on its own. It's easy to test: pass it some inputs, inspect the outputs, they'll always be the same. It's easy to see what inputs a function depends on: they're all in the argument list. And now your code is thread-safe. And probably runs faster.
It's tricky to think about code this way if you're used to the "mutate a bunch of objects/memory" style of programming, but once you get the hang of it, it becomes relatively straightforward to organize your programs this way. Your code generally ends up as simple as or simpler than the global-mutation version of the same code.
Here's a highly contrived example:
(def *address-book* (ref {}))
(defn add [name addr]
  (dosync (alter *address-book* assoc name addr)))
(defn report []
  (doseq [[name addr] @*address-book*]
    (println name ":" addr)))
(defn do-some-stuff []
  (add "Brian" "123 Bovine University Blvd.")
  (add "Roger" "456 Main St.")
  (report))
Looking at do-some-stuff in isolation, what the heck is it doing? There are a lot of things happening implicitly. Down this path lies spaghetti. An arguably better version:
(defn make-address-book [] {})
(defn add [addr-book name addr]
  (assoc addr-book name addr))
(defn report [addr-book]
  (doseq [[name addr] addr-book]
    (println name ":" addr)))
(defn do-some-stuff []
  (let [addr-book (make-address-book)]
    (-> addr-book
        (add "Brian" "123 Bovine University Blvd.")
        (add "Roger" "456 Main St.")
        (report))))
Now it's clear what do-some-stuff is doing, even in isolation. You can have as many address books floating around as you want. Multiple threads could have their own. You can use this code from multiple namespaces safely. You can't forget to initialize the address book, because you pass it as an argument. You can test report easily: just pass the desired "mock" address book in and see what it prints. You don't have to care about any global state or anything but the function you're testing at the moment.
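As a sketch of how cheap that testing becomes (with-out-str captures everything printed to *out*; the address-book entry here is made up):
(require '[clojure.test :refer [deftest is]])
(deftest report-prints-entries
  (let [mock-book {"Alice" "1 Test Lane"}]
    (is (= "Alice : 1 Test Lane\n"
           (with-out-str (report mock-book))))))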
If you don't need to coordinate updates to a data structure from multiple threads, there's usually no need to use refs or global vars.

Clojure: reduce vs. apply

I understand the conceptual difference between reduce and apply:
(reduce + (list 1 2 3 4 5))
; translates to: (+ (+ (+ (+ 1 2) 3) 4) 5)
(apply + (list 1 2 3 4 5))
; translates to: (+ 1 2 3 4 5)
However, which one is more idiomatic clojure? Does it make much difference one way or the other? From my (limited) performance testing, it seems reduce is a bit faster.
reduce and apply are of course only equivalent (in terms of the ultimate result returned) for associative functions which need to see all their arguments in the variable-arity case. When they are result-wise equivalent, I'd say that apply is always perfectly idiomatic, while reduce is equivalent -- and might shave off a fraction of a blink of an eye -- in a lot of the common cases. What follows is my rationale for believing this.
+ is itself implemented in terms of reduce for the variable-arity case (more than 2 arguments). Indeed, this seems like an immensely sensible "default" way to go for any variable-arity, associative function: reduce has the potential to perform some optimisations to speed things up -- perhaps through something like internal-reduce, a 1.2 novelty recently disabled in master, but hopefully to be reintroduced in the future -- which it would be silly to replicate in every function which might benefit from them in the vararg case. In such common cases, apply will just add a little overhead. (Note it's nothing to be really worried about.)
On the other hand, a complex function might take advantage of some optimisation opportunities which aren't general enough to be built into reduce; then apply would let you take advantage of those while reduce might actually slow you down. A good example of the latter scenario occurring in practice is provided by str: it uses a StringBuilder internally and will benefit significantly from the use of apply rather than reduce.
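You can check the str case yourself: apply hands all the strings to one StringBuilder pass, whereas reduce allocates a fresh intermediate string at each step. A sketch to try with Criterium (exact numbers will vary):
(require '[criterium.core :as c])
(let [words (repeat 100 "word")]
  (c/quick-bench (apply str words))   ; one StringBuilder pass
  (c/quick-bench (reduce str words))) ; a new string per step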
So, I'd say use apply when in doubt; and if you happen to know that it's not buying you anything over reduce (and that this is unlikely to change very soon), feel free to use reduce to shave off that diminutive unnecessary overhead if you feel like it.
For newbies looking at this answer, be careful: they are not the same:
(apply hash-map [:a 5 :b 6])
;= {:a 5, :b 6}
(reduce hash-map [:a 5 :b 6])
;= {{{:a 5} :b} 6}
It doesn't make a difference in this case, because + is a special case that can apply to any number of arguments. Reduce is a way to apply a function that expects a fixed number of arguments (2) to an arbitrarily long list of arguments.
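A function of fixed arity makes the contrast concrete (add2 here is a made-up strictly-two-argument function):
(defn add2 [a b] (+ a b))
(reduce add2 [1 2 3 4 5]) ;=> 15; folds pairwise through the collection
(apply add2 [1 2 3 4 5])  ;=> ArityException; add2 accepts exactly 2 args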
Opinions vary- In the greater Lisp world, reduce is definitely considered more idiomatic. First, there is the variadic issues already discussed. Also, some Common Lisp compilers will actually fail when apply is applied against very long lists because of how they handle argument lists.
Amongst Clojurists in my circle, though, using apply in this case seems more common. I find it easier to grok and prefer it also.
I normally find myself preferring reduce when acting on any kind of collection - it performs well, and is a pretty useful function in general.
The main reason I would use apply is if the parameters mean different things in different positions, or if you have a couple of initial parameters but want to get the rest from a collection, e.g.
(apply + 1 2 other-number-list)
In this specific case I prefer reduce because it's more readable: when I read
(reduce + some-numbers)
I know immediately that you're turning a sequence into a value.
With apply I have to consider which function is being applied: "ah, it's the + function, so I'm getting... a single number". Slightly less straightforward.
When using a simple function like +, it really doesn't matter which one you use.
In general, the idea is that reduce is an accumulating operation. You present the current accumulation value and one new value to your accumulating function. The result of the function is the cumulative value for the next iteration. So, your iterations look like:
cum-val[i+1] = F( cum-val[i], input-val[i] ) ; please forgive the java-like syntax!
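Clojure's reductions makes those intermediate cumulative values visible, which is a handy way to picture the iteration:
(reductions + [1 2 3 4 5])
;;=> (1 3 6 10 15)   ; each cum-val[i] in turn
(reduce + [1 2 3 4 5])
;;=> 15              ; just the final cumulative value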
For apply, the idea is that you are attempting to call a function expecting a number of scalar arguments, but they are currently in a collection and need to be pulled out. So, instead of saying:
(def vals [val1 val2 val3])
(some-fn (vals 0) (vals 1) (vals 2))
we can say:
(apply some-fn vals)
and it is converted to be equivalent to:
(some-fn val1 val2 val3)
So, using "apply" is like "removing the parentheses" around the sequence.
Bit late on the topic, but I did a simple experiment after reading this example. Here is the result from my REPL; I just can't deduce anything from the responses, but it seems there is some sort of caching kicking in between reduce and apply.
user=> (time (reduce + (range 1e3)))
"Elapsed time: 5.543 msecs"
499500
user=> (time (apply + (range 1e3)))
"Elapsed time: 5.263 msecs"
499500
user=> (time (apply + (range 1e4)))
"Elapsed time: 19.721 msecs"
49995000
user=> (time (reduce + (range 1e4)))
"Elapsed time: 1.409 msecs"
49995000
user=> (time (reduce + (range 1e5)))
"Elapsed time: 17.524 msecs"
4999950000
user=> (time (apply + (range 1e5)))
"Elapsed time: 11.548 msecs"
4999950000
Looking at the source code of Clojure's reduce, it's pretty clean recursion with internal-reduce; I didn't find anything on the implementation of apply though. Clojure's implementation of + for the apply case internally invokes reduce, which gets cached by the REPL, which seems to explain the 4th call. Can someone clarify what's really happening here?
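One likely factor: time measures a single unwarmed run, so JIT compilation triggered by earlier calls skews later ones. A benchmarking library such as Criterium warms up the code and reports statistics instead (a sketch; results differ per machine):
(require '[criterium.core :as c])
(c/quick-bench (reduce + (range 1e5)))
(c/quick-bench (apply + (range 1e5)))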
The beauty of apply is that the given function (+ in this case) can be applied to the argument list formed by prepending any intervening arguments to the final collection. Reduce is an abstraction that processes collection items by applying the function to each, and it doesn't work with the variable-args case:
(apply + 1 2 3 [3 4])
=> 13
(reduce + 1 2 3 [3 4])
ArityException Wrong number of args (5) passed to: core/reduce clojure.lang.AFn.throwArity (AFn.java:429)
A bit late, but...
In this case, there is not a big difference. But in general they are not equivalent. Furthermore, reduce can be more performant. Why?
reduce checks whether a collection or type implements the IReduce interface. That means the type knows how to provide its values to the reducing function in the most performant way.
reduce can be stopped prematurely by returning a value wrapped with reduced.
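For example, wrapping the accumulator in reduced short-circuits the fold as soon as the answer is known, even over an infinite sequence:
(reduce (fn [acc x]
          (let [total (+ acc x)]
            (if (> total 100)
              (reduced total) ; stop; no further elements are consumed
              total)))
        0
        (range)) ; infinite, yet this terminates
;;=> 105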
apply, on the other hand, goes through applyToHelper, which dispatches to the right arity by counting the args and unpacking the values from the collection.
Is it a big performance impact? Probably not.
My opinion is the same as others have already pointed out: use reduce if you want to semantically "reduce" a collection to a single value; otherwise use apply.