Clojure: reduce vs. apply

I understand the conceptual difference between reduce and apply:
(reduce + (list 1 2 3 4 5))
; translates to: (+ (+ (+ (+ 1 2) 3) 4) 5)
(apply + (list 1 2 3 4 5))
; translates to: (+ 1 2 3 4 5)
However, which one is more idiomatic Clojure? Does it make much difference one way or the other? From my (limited) performance testing, it seems reduce is a bit faster.

reduce and apply are of course only equivalent (in terms of the ultimate result returned) for associative functions which need to see all their arguments in the variable-arity case. When they are result-wise equivalent, I'd say that apply is always perfectly idiomatic, while reduce is equivalent -- and might shave off a fraction of a blink of an eye -- in a lot of the common cases. What follows is my rationale for believing this.
+ is itself implemented in terms of reduce for the variable-arity case (more than 2 arguments). Indeed, this seems like an immensely sensible "default" way to go for any variable-arity, associative function: reduce has the potential to perform some optimisations to speed things up -- perhaps through something like internal-reduce, a 1.2 novelty recently disabled in master, but hopefully to be reintroduced in the future -- which it would be silly to replicate in every function which might benefit from them in the vararg case. In such common cases, apply will just add a little overhead. (Note it's nothing to be really worried about.)
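As a sketch of that pattern (illustrative only, using a hypothetical my-add; Clojure's actual + is similar in spirit for the 3+-argument arity):
(defn my-add
  "A variadic, associative function that delegates its vararg case to reduce."
  ([] 0)
  ([x] x)
  ([x y] (+ x y))
  ([x y & more] (reduce my-add (my-add x y) more)))

(my-add 1 2 3 4 5)
;= 15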
On the other hand, a complex function might take advantage of some optimisation opportunities which aren't general enough to be built into reduce; then apply would let you take advantage of those while reduce might actually slow you down. A good example of the latter scenario occurring in practice is provided by str: it uses a StringBuilder internally and will benefit significantly from the use of apply rather than reduce.
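For instance (an illustrative comparison; exact timings will vary):
(apply str ["foo" "bar" "baz"])  ; one StringBuilder pass over all the arguments
;= "foobarbaz"
(reduce str ["foo" "bar" "baz"]) ; builds an intermediate string at each step
;= "foobarbaz"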
So, I'd say use apply when in doubt; and if you happen to know that it's not buying you anything over reduce (and that this is unlikely to change very soon), feel free to use reduce to shave off that diminutive unnecessary overhead if you feel like it.

For newbies looking at this answer, be careful: they are not the same:
(apply hash-map [:a 5 :b 6])
;= {:a 5, :b 6}
(reduce hash-map [:a 5 :b 6])
;= {{{:a 5} :b} 6}

It doesn't make a difference in this case, because + is a special case that can apply to any number of arguments. Reduce is a way to apply a function that expects a fixed number of arguments (2) to an arbitrarily long list of arguments.

Opinions vary. In the greater Lisp world, reduce is definitely considered more idiomatic. First, there are the variadic issues already discussed. Also, some Common Lisp compilers will actually fail when apply is used on very long lists because of how they handle argument lists.
Amongst Clojurists in my circle, though, using apply in this case seems more common. I find it easier to grok and prefer it also.

I normally find myself preferring reduce when acting on any kind of collection - it performs well, and is a pretty useful function in general.
The main reason I would use apply is if the parameters mean different things in different positions, or if you have a couple of initial parameters but want to get the rest from a collection, e.g.
(apply + 1 2 other-number-list)

In this specific case I prefer reduce because it's more readable: when I read
(reduce + some-numbers)
I know immediately that you're turning a sequence into a value.
With apply I have to consider which function is being applied: "ah, it's the + function, so I'm getting... a single number". Slightly less straightforward.

When using a simple function like +, it really doesn't matter which one you use.
In general, the idea is that reduce is an accumulating operation. You present the current accumulation value and one new value to your accumulating function. The result of the function is the cumulative value for the next iteration. So, your iterations look like:
cum-val[i+1] = F( cum-val[i], input-val[i] )  ; please forgive the Java-like syntax!
For apply, the idea is that you are attempting to call a function expecting a number of scalar arguments, but they are currently in a collection and need to be pulled out. So, instead of saying:
(def vals [val1 val2 val3])
(some-fn (vals 0) (vals 1) (vals 2))
we can say:
(apply some-fn vals)
and it is converted to be equivalent to:
(some-fn val1 val2 val3)
So, using "apply" is like "removing the parentheses" around the sequence.
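A concrete, runnable instance of the same idea (nums is a hypothetical vector):
(def nums [3 1 4])
(apply max nums) ; behaves like (max 3 1 4)
;= 4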

A bit late on the topic, but I did a simple experiment after reading this example. Here are the results from my REPL. I can't deduce anything definitive from them, but it seems there is some sort of caching kicking in between reduce and apply.
user=> (time (reduce + (range 1e3)))
"Elapsed time: 5.543 msecs"
499500
user=> (time (apply + (range 1e3)))
"Elapsed time: 5.263 msecs"
499500
user=> (time (apply + (range 1e4)))
"Elapsed time: 19.721 msecs"
49995000
user=> (time (reduce + (range 1e4)))
"Elapsed time: 1.409 msecs"
49995000
user=> (time (reduce + (range 1e5)))
"Elapsed time: 17.524 msecs"
4999950000
user=> (time (apply + (range 1e5)))
"Elapsed time: 11.548 msecs"
4999950000
Looking at the source code of Clojure's reduce, it's a pretty clean recursion using internal-reduce; I didn't find anything on the implementation of apply, though. Clojure's implementation of + for the apply case internally invokes reduce, which seems to be cached by the REPL, which would explain the 4th call. Can someone clarify what's really happening here?
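A likely explanation is JIT warm-up rather than caching: a single time call measures a cold code path, so later calls appear faster regardless of which function is used. A more reliable comparison would use the Criterium library (assuming it is on the classpath):
(require '[criterium.core :as c])
(c/quick-bench (reduce + (range 1e4)))
(c/quick-bench (apply + (range 1e4)))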

The beauty of apply is that the given function (+ in this case) can be applied to an argument list formed by prepending the intervening arguments to a final collection. reduce is an abstraction for processing collection items by applying the function to each one, and it doesn't work with the variable-args case:
(apply + 1 2 3 [3 4])
=> 13
(reduce + 1 2 3 [3 4])
ArityException Wrong number of args (5) passed to: core/reduce clojure.lang.AFn.throwArity (AFn.java:429)
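reduce accepts at most a single initial value before the collection, so the reduce spelling of the apply call above would be:
(reduce + (+ 1 2 3) [3 4]) ; the leading args must be combined into one init value
;= 13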

A bit late, but...
In this case, there is not a big difference. But in general they are not equivalent. Furthermore, reduce can be more performant. Why?
reduce checks whether a collection or type implements the IReduce interface. That means the type knows how to provide its values to the reducing function in the most performant way.
reduce can be stopped prematurely by returning a Reduced value.
apply, on the other hand, is invoked via applyToHelper, which dispatches to the right arity by counting the args, unpacking the values from the collection.
Is it a big performance impact? Probably not.
My opinion is as others already pointed out. Use reduce if you want to semantically "reduce" a collection to a single value. Otherwise use apply.

Related

Differences between assoc-in with two elements and update / assoc

For a while I've been doing things like (assoc-in my-hash [:data :id] 1), and it looks fine.
Recently, since I rarely have more than two levels, I noticed I can do (update my-hash :data assoc :id 1), which sounds totally different but returns the same result.
So, I wonder, is there any difference in performance? Do you think it's more readable in one way than the other? More idiomatic?
update / assoc feels like it's more expensive to me, but I really like it better than assoc-in, which makes me stop to think each time I see it.
When it comes to performance, it's always good to measure. Ideally you'd assemble a realistic map (whether your maps are big or small will have some impact on the relative cost of various operations) and try it both ways with Criterium:
(require '[criterium.core :as c])
(let [m (construct-your-map)]
  (c/bench (assoc-in m [:data :id] 1))
  (c/bench (update m :data assoc :id 1)))
Under the hood, update + assoc is sort of the unrolled version of assoc-in here that doesn't need the auxiliary vector to hold the keys, so I would expect it to be faster than assoc-in. But (1) ordinarily I wouldn't worry about minor performance differences when it comes to things like this, (2) when I do care, again, it's better to measure than to guess.
(On my box, with Clojure 1.9.0-alpha14, update + assoc is indeed faster at ~282 ns vs ~353 ns for assoc-in given my small test map of (assoc (into {} (map #(vector % %)) (range 20)) :data {:id 0}).)
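Concretely, the equivalence being unrolled looks roughly like this (a sketch, not the literal implementation, for a hypothetical map m):
(assoc-in m [:data :id] 1)
;; behaves like
(assoc m :data (assoc (get m :data) :id 1))
;; which is also what (update m :data assoc :id 1) amounts to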
Ultimately most of the time readability will be the more important factor, but I don't think you can say in general that one approach is more readable than the other. If you have a -> chain that already uses assoc-in or update multiple times, it may be preferable to repeat the same function for the sake of consistency (just to avoid making the reader wonder "is this thing really different?"). If you have a codebase that you control, you can adopt a "house style" that favours one approach over the other. Etc., etc.
I might see assoc-in as a little more readable most of the time – it uses a single "verb" and makes it clear at a glance what the (single, exact) path to the update is – but if you prefer update + assoc and expect to keep their use consistent in your codebase, that's certainly fine as well.

What is the difference between (take 5 (range)) and (range 5)

I'm just starting to learn Clojure and I've seen several uses of the 'take' function in reference to range.
Specifically
(take 5 (range))
Which seems identical to
(range 5)
Both generate
(0 1 2 3 4)
Is there a reason either stylistically or for performance to use one or the other?
Generally speaking, using (range 5) is likely to be more performant and I would consider it more idiomatic. However, keep in mind that this requires one to know the extent of the range at the time of its creation.
In cases where the size is unknown initially, or some other transformation may take place after construction, having the take option is quite nice. For example:
(->> (range) (filter even?) (drop 1) (take 5))
Both have the same performance, because (range) returns a lazy seq that is not realized until you access its elements. According to Daniel Higginbotham in his book "Clojure for the Brave and True", a lazy sequence consists of two parts: a recipe for how to realize the elements of a sequence, and the elements that have been realized so far. When you use (range), it doesn't include any realized elements, but it does have the recipe for generating them. Every time you try to access an unrealized element, the lazy seq uses its recipe to generate the requested element.
Here is a link that explains lazy seqs in depth:
http://www.braveclojure.com/core-functions-in-depth/
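To watch the recipe being consulted, here is a small sketch (it uses iterate rather than range, because range produces chunked seqs that realize 32 elements at a time; nums is a hypothetical name):
(def nums (map (fn [i] (println "realizing" i) i)
               (iterate inc 0)))
(doall (take 2 nums))
; realizing 0
; realizing 1
;= (0 1)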
range can be used in the following forms:
(range)
(range end)
(range start end)
(range start end step)
So here you have control over the range you are generating, and you generate the whole collection. In your example, using (range) gives a lazy sequence that is evaluated as needed: your take call needs 5 items, so only that many are generated.
take is used like:
(take n)
(take n coll)
where you pass the collection from which you want to take n items.

purpose of clojure reduced function

What is the purpose of the Clojure reduced function (added in Clojure 1.5, https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/reduced)?
I can't find any examples for it. The doc says:
Wraps x in a way such that a reduce will terminate with the value x.
There is also a reduced? that goes along with it:
Returns true if x is the result of a call to reduced
When I try it out, e.g. with (reduce + (reduced 100)), I get an error instead of 100. Also, why would I reduce something when I know the result in advance? Since it was added, there is likely a reason, but googling for clojure reduced only turns up reduce results.
reduced allows you to short circuit a reduction:
(reduce (fn [acc x]
          (if (> acc 10)
            (reduced acc)
            (+ acc x)))
        0
        (range 100))
;= 15
(NB. the edge case with (reduced 0) passed in as the initial value doesn't work as of Clojure 1.6.)
This is useful, because reduce-based looping is both very elegant and very performant (so much so that reduce-based loops are not infrequently more performant than the "natural" replacements based on loop/recur), so it's good to make this pattern as broadly applicable as possible. The ability to short circuit reduce vastly increases the range of possible applications.
As for reduced?, I find it useful primarily when implementing reduce logic for new data structures; in regular code, I let reduce perform its own reduced? checks where appropriate.
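For instance, a simplified seq-based reduce that honors reduced might look like this (a sketch only; Clojure's real implementation dispatches through IReduce and chunked seqs):
(defn my-reduce
  "Simplified reduce that stops as soon as f returns a (reduced ...) value."
  [f init coll]
  (loop [acc init, s (seq coll)]
    (cond
      (reduced? acc) (deref acc) ; unwrap the Reduced box and stop early
      (nil? s)       acc
      :else          (recur (f acc (first s)) (next s)))))

(my-reduce (fn [acc x] (if (> acc 10) (reduced acc) (+ acc x))) 0 (range 100))
;= 15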

Is there a more idiomatic way to get N random elements of a collection in Clojure?

I’m currently doing this: (repeatedly n #(rand-nth (seq coll))), but I suspect there might be a more idiomatic way, for 2 reasons:
I’ve found that there’s frequently a more concise and expressive alternative to using short anonymous functions, e.g. partial
the docstring for repeatedly says “presumably with side effects”, implying that it’s not intended to be used to produce values
I suppose I could figure out a way to use reduce but that seems like it would be tricky and less efficient, as it would have to process the entire collection, since reduce is not lazy.
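For reference, the partial-based spelling of the snippet above would be something like:
(repeatedly n (partial rand-nth (seq coll)))
which also has the small advantage of evaluating (seq coll) only once.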
An easy solution, though not optimal for big collections, could be:
(take n (shuffle coll))
It has the "advantage" of not repeating elements. You could also implement a lazy shuffle, but that would involve more code.
I know it's not exactly what you're asking - but if you're doing a lot of sampling and statistical work, you might be interested in Incanter ([incanter "1.5.2"]).
Incanter provides the function sample, which provides options for sample size, and replacement.
(require '[incanter.stats :refer [sample]])
(sample [1 2 3 4 5 6 7] :size 5 :replacement false)
; => (1 5 6 2 7)

Splice in Clojure

Is there a single function for getting "from x to y" items in a sequence?
For example, given (range 10) I want [5 6 7 8] (take from the 6th to the 9th, or take 4 starting at the 6th). Of course I can get this with a combination of a couple of functions (e.g. (take 4 (drop 5 (range 10)))), but it seems strange that there's no built-in like Python's mylist[5:9]. Thanks
subvec for vectors, primarily since it is O(1). For seqs you will need to use the O(n) of take/drop.
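For example:
(subvec (vec (range 10)) 5 9)
;= [5 6 7 8]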
From a philosophical point of view, the reason there's no built-in operator is that you don't need a built-in operator to make it feel "natural" like you do in Python.
(defn splice [coll start stop]
  (take (- stop start) (drop start coll)))
(splice (range 10) 5 9)
;= (5 6 7 8)
Feels just like a language built-in, with exactly as much "new syntax" as any feature. In Python, the special [x:y] operator needs language-level support to make it feel as natural as the single-element accessor.
So rather than cluttering up the (already crowded) language core, Clojure simply leaves room for a user or library to implement this if you want it.
(range 5 9), or (vec (range 5 9)).
(Perhaps this syntax for range wasn't available mid-2012.)