Clojure: precise definition of reducer

The Clojure website defines a reducer as follows:
A reducer is the combination of a reducible collection (a collection that knows how to reduce itself) with a reducing function (the "recipe" for what needs to be done during the reduction).
The following is the implementation of the reducer function (from Rich's blog post on the topic)
(defn reducer
([coll xf]
(reify
clojure.core.protocols/CollReduce
(coll-reduce [_ f1 init]
(clojure.core.protocols/coll-reduce coll (xf f1) init)))))
It seems that it would be more accurate to say that a reducer is a combination of a reducible collection and a reducing function transformer (later called a transducer), instead of a reducing function.
The reducer does not "know" anything about the reducing function, which is provided by reduce. All it knows is a "recipe" for taking some reducing function and modifying (transforming) it.
Is my understanding the right definition of "reducer"? Or is there something I'm missing about the "reducible collection with a reducing function" definition?

When you look at the source of the reducer function from the blog:
(defn reducer
([coll xf]
(reify
clojure.core.protocols/CollReduce
(coll-reduce [_ f1 init]
(clojure.core.protocols/coll-reduce coll (xf f1) init)))))
you can see that the reducer asks the collection to reduce itself (by calling coll-reduce on the coll) and supplies a reducing function that is produced by calling the reducing function transformer ((xf f1)).
So I would say that the sentence:
A reducer is a combination of a reducible collection and a reducing function transformer (later called a transducer), instead of a reducing function.
is more accurate as what you need to start with is a reducible collection and a reducing function transformer. A reduction function is just a result of calling the reducing function transformer.
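To make the point concrete, here is a minimal sketch of the same idea. The names my-reducer, mapping and incremented are made up for illustration (mapping follows the shape of a hand-written reducing function transformer); the reducing function itself (+ or conj) only arrives when reduce is called.

```clojure
;; Same shape as the reducer from the blog post: it pairs a reducible
;; collection with a reducing function TRANSFORMER xf.
(defn my-reducer [coll xf]
  (reify
    clojure.core.protocols/CollReduce
    (coll-reduce [_ f1 init]
      ;; f1 is supplied by reduce; xf only modifies it.
      (clojure.core.protocols/coll-reduce coll (xf f1) init))))

;; A hand-written transformer (hypothetical helper): given a reducing
;; function f1, return a new reducing function that applies f to each
;; input before passing it on to f1.
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

(def incremented (my-reducer [1 2 3] (mapping inc)))

;; The same reducer works with different reducing functions.
(reduce + 0 incremented)      ;=> 9
(reduce conj [] incremented)  ;=> [2 3 4]
```

Note that incremented is built before any reducing function is known, which is why "reducible collection plus transformer" describes it well.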

The language in the docs is confusing a reducer and a transducer. A transducer can be thought of as nothing more than an optimized implementation of a reducer, with the goal of reducing object allocation (& subsequent GC). This optimization does not change the conceptual model of a reducer.
So, sticking to plain-old reduce, its goal is simply to provide a way to accumulate a result from a sequence. The simplest example is to total a sequence:
(ns clj.core
(:require [clojure.string :as str] )
(:use tupelo.core)) ; it->
(def values (range 6))
(spyx values)
(def total (reduce + 0 values))
(spyx total)
;=> values => (0 1 2 3 4 5)
;=> total => 15
However, the "reducing function" can do anything. It can also return a sequence, not just a scalar value:
(def duplicate (reduce (fn [cum-result new-val] ; accumulating function
(conj cum-result new-val))
[] ; initial value
values)) ; sequence to process
(spyx duplicate)
;=> duplicate => [0 1 2 3 4 5]
Here is a more complicated reducing function that computes the integral of the input sequence:
(def integral (reduce (fn [cum-state new-val] ; accumulating function
(let [integ-val (+ (:running-total cum-state) new-val) ]
{ :integ-vals (conj (:integ-vals cum-state) integ-val)
:running-total integ-val} ))
{:integ-vals [] :running-total 0} ; initial value
values)) ; sequence to process
(spyx integral)
;=> integral => {:integ-vals [0 1 3 6 10 15], :running-total 15}
So this is the big difference compared to map. We call map as:
(def y (map f x))
where x and y are sequences, and the result looks like
y(0) = f( x(0) ) ; math notation used here
y(1) = f( x(1) )
y(2) = f( x(2) )
...
So each y(i) depends only on the function f and x(i). In contrast, we define reduce as:
(def y (reduce f init x))
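where x is a sequence and init is the initial accumulator value (like 0 or []). Writing y(i) for the accumulated value after step i, with the last y(i) being the value that reduce returns, the result looks like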
y(0) = f( init, x(0) ) ; math notation used here
y(1) = f( y(0), x(1) )
y(2) = f( y(1), x(2) )
...
So the reducing function f is a function of 2 values: the cumulative result and the new x value.
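The intermediate y(i) values in the recurrence can be observed directly with clojure.core/reductions, which returns init followed by each accumulated value. A small sketch (the name xs is just for illustration):

```clojure
(def xs (range 6))   ; (0 1 2 3 4 5)

;; reduce returns only the final accumulated value...
(reduce + 0 xs)      ;=> 15

;; ...while reductions returns init followed by every intermediate y(i).
(reductions + 0 xs)  ;=> (0 0 1 3 6 10 15)
```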

Related

How do I use "mean" as the final reducing function in a transducer?

I'm trying to estimate the mean distance of all pairs of points in a unit square.
This transducer returns a vector of the distances of x randomly selected pairs of points, but the final step would be to take the mean of all values in that vector. Is there a way to use mean as the final reducing function (or to include it in the composition)?
(defn square [x] (* x x))
(defn mean [x] (/ (reduce + x) (count x)))
(defn xform [iterations]
(comp
(partition-all 4)
(map #(Math/sqrt (+ (square (- (first %) (nth % 1)))
(square (- (nth % 2) (nth % 3))))))
(take iterations)))
(transduce (xform 5) conj (repeatedly #(rand)))
[0.5544757422041136
0.4170515673848907
0.7457675423415904
0.5560901974277822
0.6053573945754688]
(transduce (xform 5) mean (repeatedly #(rand)))
Execution error (ArityException) at test.core/eval19667 (form-init9118116578029918666.clj:562).
Wrong number of args (0) passed to: test.core/mean

If you implement your mean function differently, you won't have to collect all the values before computing the mean. Here is how you can implement it, based on this Java code:
(defn mean
([] [0 1]) ;; <-- Construct an empty accumulator
([[mu n]] mu) ;; <-- Get the mean (final step)
([[mu n] x] ;; <-- Accumulate a value to the mean
[(+ mu (/ (- x mu) n)) (inc n)]))
And you use it like this:
(transduce identity mean [1 2 3 4])
;; => 5/2
or like this:
(transduce (xform 5) mean (repeatedly #(rand)))
;; => 0.582883812837961

From the docs of transduce:
If init is not supplied, (f) will be called to produce it. f should be
a reducing step function that accepts both 1 and 2 arguments, if it
accepts only 2 you can add the arity-1 with 'completing'.
To dissect this:
Your function needs a 0-arity to produce an initial value, so conj is fine (it produces an empty vector).
You need to provide a 2-arity function to do the actual reducing; again, conj is fine here.
You need to provide a 1-arity function to finalize; here you want your mean.
So as the docs suggest, you can use completing to just provide that:
(transduce (xform 5) (completing conj mean) (repeatedly #(rand)))
; → 0.4723186070904141
If you look at the source of completing you will see how it produces all of this:
(defn completing
"Takes a reducing function f of 2 args and returns a fn suitable for
transduce by adding an arity-1 signature that calls cf (default -
identity) on the result argument."
{:added "1.7"}
([f] (completing f identity))
([f cf]
(fn
([] (f))
([x] (cf x))
([x y] (f x y)))))
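As a possible variation on the same idea, completing can also wrap a step function that keeps only a running [sum count] pair, so the values never need to be collected into a vector. This is a sketch, not the only way to do it; mean-rf is a hypothetical name, and the init value [0 0] is passed explicitly because the step function has no 0-arity.

```clojure
(defn mean-rf
  "Hypothetical helper: a reducing function that accumulates [sum count]
  and turns it into a mean in the completing (1-arity) step."
  []
  (completing
    (fn [[sum n] x] [(+ sum x) (inc n)])              ; step
    (fn [[sum n]] (if (zero? n) nil (/ sum n)))))     ; finalize

;; init is supplied explicitly, so the 0-arity is never called.
(transduce identity  (mean-rf) [0 0] [1 2 3 4])  ;=> 5/2
(transduce (map inc) (mean-rf) [0 0] [1 2 3 4])  ;=> 7/2
```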

Iteratively apply function to its result without generating a seq

This is one of those "Is there a built-in/better/idiomatic/clever way to do this?" questions.
I want a function--call it fn-pow--that will apply a function f to the result of applying f to an argument, then apply it to the result of applying it to its result, etc., n times. For example,
(fn-pow inc 0 3)
would be equivalent to
(inc (inc (inc 0)))
It's easy to do this with iterate:
(defn fn-pow-0
[f x n]
(nth (iterate f x) n))
but that creates and throws away an unnecessary lazy sequence.
It's not hard to write the function from scratch. Here is one version:
(defn fn-pow-1
[f x n]
(if (> n 0)
(recur f (f x) (dec n))
x))
I found this to be almost twice as fast as fn-pow-0, using Criterium on (fn-pow inc 0 10000000).
I don't consider the definition of fn-pow-1 to be unidiomatic, but fn-pow seems like something that might be a standard built-in function, or there may be some simple way to define it with a couple of higher-order functions in a clever arrangement. I haven't succeeded in discovering either. Am I missing something?

The built-in you are looking for is probably dotimes. I'll tell you why in a round-about fashion.
Time
What you are testing in your benchmark is mainly the overhead of a level of indirection. That (nth (iterate ...) n) is only twice as slow as what compiles to a loop when the body is a very fast function is rather surprising/encouraging. If f is a more costly function, the importance of that overhead diminishes. (Of course if your f is low-level and fast, then you should use a low-level loop construct.)
Say your function takes ~ 1 ms instead
(defn my-inc [x] (Thread/sleep 1) (inc x))
Then both of these will take about 1 second -- the difference is around 2% rather than 100%.
(bench (fn-pow-0 my-inc 0 1000))
(bench (fn-pow-1 my-inc 0 1000))
Space
The other concern is that iterate is creating an unnecessary sequence. But, if you are not holding onto the head, just doing an nth, then you aren't really creating a sequence per se but sequentially creating, using, and discarding LazySeq objects. In other words, you are using a constant amount of space, though generating garbage in proportion to n. However, unless your f is primitive or mutating its argument, then it is already producing garbage in proportion to n in producing its own intermediate results.
Reducing Garbage
An interesting compromise between fn-pow-0 and fn-pow-1 would be
(defn fn-pow-2 [f x n] (reduce (fn [x _] (f x)) x (range n)))
Since range objects know how to intelligently reduce themselves, this does not create additional garbage in proportion to n. It boils down to a loop as well. This is the reduce method of range:
public Object reduce(IFn f, Object start) {
Object ret = f.invoke(start,n);
for(int x = n+1;x < end;x++)
ret = f.invoke(ret, x);
return ret;
}
This was actually the fastest of the three (before adding primitive type-hints on n in the recur version, that is) with the slowed down my-inc.
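As a quick sanity check (not a benchmark), the three versions discussed so far can be compared on a small input. The definitions are repeated here so the snippet stands alone; the accumulator is renamed acc only for readability.

```clojure
(defn fn-pow-0 [f x n] (nth (iterate f x) n))

(defn fn-pow-1 [f x n]
  (if (> n 0)
    (recur f (f x) (dec n))
    x))

;; The range only counts the steps; its elements are ignored.
(defn fn-pow-2 [f x n] (reduce (fn [acc _] (f acc)) x (range n)))

(map #(% inc 0 3) [fn-pow-0 fn-pow-1 fn-pow-2])  ;=> (3 3 3)
(fn-pow-2 inc 5 0)                               ;=> 5 (zero applications)
```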
Mutation
If you are iterating a function potentially expensive in time or space, such as matrix operations, then you may very well be wanting to use (in a contained manner) an f that mutates its argument to eliminate the garbage overhead. Since mutation is a side effect, and you want that side effect n times, dotimes is the natural choice.
For the sake of example, I'll use an atom as a stand-in, but imagine bashing on a mutable matrix instead.
(def my-x (atom 0))
(defn my-inc! [x] (Thread/sleep 1) (swap! x inc))
(defn fn-pow-3! [f! x n] (dotimes [i n] (f! x)))
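Because dotimes returns nil, the result has to be read from the mutated object afterwards. A minimal usage sketch, where counter and bump! are hypothetical stand-ins (bump! is like my-inc! without the sleep):

```clojure
(def counter (atom 0))            ; stand-in for some mutable state

(defn bump! [a] (swap! a inc))    ; mutates its argument in place

(defn fn-pow-3! [f! x n] (dotimes [_ n] (f! x)))

(fn-pow-3! bump! counter 3)       ;=> nil (dotimes is for side effects)
@counter                          ;=> 3
```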

That sounds just like composing functions n times.
(defn fn-pow [f p t]
((apply comp (repeat t f)) p))

Hmmm. I note that Ankur's version is around 10x slower than your original - possibly not the intent, no matter how idiomatic? :-)
Type hinting fn-pow-1 simply for the counter yields substantially faster results for me - around 3x faster.
(defn fn-pow-3 [f x ^long n]
(if (> n 0)
(recur f (f x) (dec n))
x))
This is around twice as slow as a version which uses inc directly, losing the variability (not hinting x to keep to the spirit of the test)...
(defn inc-pow [x ^long n]
(if (> n 0)
(recur (inc x) (dec n))
x))
I think that for any nontrivial f that fn-pow-3 is probably the best solution.
I haven't found a particularly "idiomatic" way of doing this, as it does not feel like a common use case outside of micro benchmarks (although I would love to be contradicted).
Would be intrigued to hear of a real world example, if you have one?

To us benighted imperative programmers, a more general pattern is known as a while statement. We can capture it in a macro:
(defmacro while [bv ; binding vector
tf ; test form
recf ; recur form
retf ; return form
]
`(loop ~bv (if ~tf (recur ~@recf) ~retf)))
... in your case
(while [x 0, n 3] (pos? n)
[(inc x) (dec n)]
x)
; 3
I was hoping to type-hint the n, but it's illegal. Maybe it's inferred.
Forgive me (re)using while.
This isn't quite right: it doesn't allow for computation prior to the recur-form.
We can adapt the macro to do things prior to the recur:
(defmacro while [bv ; binding vector
tf ; test form
bodyf ; body form
retf ; return form
]
(let [bodyf (vec bodyf)
recf (peek bodyf)
bodyf (seq (conj (pop bodyf) (cons `recur recf)))]
`(loop ~bv (if ~tf ~bodyf ~retf))))
For example
(while [x 0, n 3] (pos? n)
(let [x- (inc x) n- (dec n)] [x- n-])
x)
; 3
I find this quite expressive. YMMV.

Pass multiple parameters function from other function with Clojure and readability issues

I'm trying to learn functional programming with SICP. I want to use Clojure.
Clojure is a dialect of Lisp, but I'm very unfamiliar with Lisp. This code snippet is unclean and unreadable. How can I write more efficient code in Lisp dialects?
And how do I pass multiple parameters to a function from another function?
(defn greater [x y z]
(if (and (>= x y) (>= x z))
(if (>= y z)
[x,y]
[x,z])
(if (and (>= y x) (>= y z))
(if (>= x z)
[y,x]
[y,z])
(if (and (>= z x) (>= z y))
(if (>= y x)
[z,y]
[z,x])))))
(defn sum-of-squares [x y]
(+ (* x x) (* y y)))
(defn -main
[& args]
(def greats (greater 2 3 4))
(def sum (sum-of-squares greats)))

You are asking two questions, and I will try to answer them in reverse order.
Applying Collections as Arguments
To use a collection as a function's arguments, where each item is a positional argument to the function, you would use the apply function.
(apply sum-of-squares greats) ;; => 25
Readability
As for the more general question of readability:
You can gain readability by generalizing the problem. From your code sample, it looks like the problem is to compute the sum of the squares of the two largest numbers in a collection. So, it would be visually cleaner to sort the collection in descending order and take the first two items.
(defn greater [& numbers]
(take 2 (sort > numbers)))
(defn sum-of-squares [x y]
(+ (* x x) (* y y)))
You can then use apply to pass them to your sum-of-squares function.
(apply sum-of-squares (greater 2 3 4)) ;; => 25
Keep in Mind: The sort function is not lazy. So, it will both realize and sort the entire collection you give it. This could have performance implications in some scenarios. But, in this case, it is not an issue.
One Step Further
You can further generalize your sum-of-squares function to handle multiple arguments by switching the two arguments, x and y, to a collection.
(defn sum-of-squares [& xs]
(reduce + (map #(* % %) xs)))
The above function creates an anonymous function, using the #() short hand syntax, to square a number. That function is then lazily mapped, using map, over every item in the xs collection. So, [1 2 3] would become (1 4 9). The reduce function takes each item and applies the + function to it and the current total, thus producing the sum of the collection. (Because + takes multiple parameters, in this case you could also use apply.)
If you put it all together using one of the threading macros, ->>, it starts looking very approachable. (Although, an argument could be made that, in this case, I have traded some composability for more readability.)
(defn super-sum-of-squares [n numbers]
(->> (sort > numbers)
(take n)
(map #(* % %))
(reduce +)))
(super-sum-of-squares 2 [2 3 4]) ;;=> 25

(defn greater [& args] (take 2 (sort > args)))
(defn -main
[& args]
(let [greats (greater 2 3 4)
sum (apply sum-of-squares greats)]
sum))
A key to good Clojure style is to use the built-in sequence operations. An alternative approach would have been a single cond form instead of the deeply nested if statements.
def should not be used inside function bodies.
A function should return a usable result (the value returned by -main will be printed if you run the project).
apply uses a list as the args for the function provided.
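For comparison, the single cond form mentioned above might look roughly like the following. This is only a sketch: two-largest is a hypothetical name, and unlike the original greater it does not return the pair in descending order, which does not matter for a sum of squares.

```clojure
(defn two-largest
  "Return the two largest of three numbers by dropping a smallest one."
  [x y z]
  (cond
    (and (<= x y) (<= x z)) [y z]    ; x is a smallest value
    (and (<= y x) (<= y z)) [x z]    ; y is a smallest value
    :else                   [x y]))  ; otherwise z is the smallest

(defn sum-of-squares [x y]
  (+ (* x x) (* y y)))

(apply sum-of-squares (two-largest 2 3 4))  ;=> 25
```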

To write readable code, use the functions provided by the language as much as possible:
e.g. greater can be defined as
(defn greater [& args]
(butlast (sort > args)))
To make sum-of-squares work on the return value from greater, use argument destructuring
(defn sum-of-squares [[x y]]
(+ (* x x) (* y y)))
which requires the number of elements in the argument sequence to be known,
or define sum-of-squares to take a single sequence as argument
(defn sum-of-squares [args]
(reduce + (map (fn [x] (* x x)) args)))

Clojure Partial Application - How to get 'map' to return a collection of functions?

I have a function that I basically yanked from a discussion in the Clojure google group, that takes a collection and a list of functions of arbitrary length, and filters it to return a new collection containing all elements of the original list for which at least one of the functions evaluates to true:
(defn multi-any-filter [coll & funcs]
(filter #(some true? ((apply juxt funcs) %)) coll))
I'm playing around with making a generalizable solution to Project Euler Problem 1, so I'm using it like this:
(def f3 (fn [x] (= 0 (mod x 3))))
(def f5 (fn [x] (= 0 (mod x 5))))
(reduce + (multi-any-filter (range 1 1000) f3 f5))
Which gives the correct answer.
However, I want to modify it so I can pass ints to it instead of functions, like
(reduce + (multi-any-filter (range 1 1000) 3 5))
where I can replace 3 and 5 with an arbitrary number of ints and do the function wrapping of (= 0 (mod x y)) as an anonymous function inside the multi-any-filter function.
Unfortunately this is past the limit of my Clojure ability. I'm thinking that I would need to do something with map to the list of args, but I'm not sure how to get map to return a list of functions, each of which is waiting for another argument. Clojure doesn't seem to support currying the way I learned how to do it in other functional languages. Perhaps I need to use partial in the right spot, but I'm not quite sure how.
In other words, I want to be able to pass an arbitrary number of arguments (that are not functions) and then have each of those arguments get wrapped in the same function, and then that list of functions gets passed to juxt in place of funcs in my multi-any-filter function above.
Thanks for any tips!

(defn evenly-divisible? [x y]
(zero? (mod x y)))
(defn multi-any-filter [col & nums]
(let [partials (map #(fn [x] (evenly-divisible? x %)) nums)
f (apply juxt partials)]
(filter #(some true? (f %)) col)))
I couldn't use partial because it applies the arg in the first position of the fn. We want it in the second position of evenly-divisible?. We could rearrange the arguments of evenly-divisible?, but then it would not really look correct when used standalone.
user=> (reduce + (multi-any-filter (range 1 1000) 3 5))
233168
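If building a list of functions for juxt is not a requirement, a shorter sketch is to test each element against the numbers directly with some. The name divisible-by-any is made up for illustration.

```clojure
(defn divisible-by-any
  "Keep the elements of coll that are evenly divisible by at least one of nums."
  [coll & nums]
  (filter (fn [x] (some #(zero? (mod x %)) nums)) coll))

(reduce + (divisible-by-any (range 1 1000) 3 5))  ;=> 233168
```

Here some stops at the first divisor that matches, whereas the juxt version always evaluates every predicate.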

Partial out all but i-th variable in variadic function in Clojure

I'm starting out learning Clojure, and was trying to implement some basic numerical derivative functions for practice. I'm trying to create a gradient function that accepts an n-variable function and the points at which to evaluate it. To do this in a "functional" style, I want to implement the gradient as a map of a 1-variable derivatives.
The 1-variable derivative function is simple:
(defn derivative
"Numerical derivative of a univariate function."
[f x]
(let [eps 10e-6] ; Fix epsilon, just for starters.
; Centered derivative is [f(x+e) - f(x-e)] / (2e)
(/ (- (f (+ x eps)) (f (- x eps))) (* 2 eps))))
I'd like to design the gradient along these lines:
(defn gradient
"Numerical gradient of a multivariate function."
[f & x]
(let [varity-index (range (count x))
univariate-in-i (fn [i] (_?_))] ; Creates a univariate fn
; of x_i (other x's fixed)
;; For each i = 0, ... n-1:
;; (1) Get univariate function of x_i
;; (2) Take derivative of that function
;; Gradient is sequence of those univariate derivatives.
(map derivative (map univariate-in-i varity-index) x)))
So, gradient has variable arity (can accept any # of x's), and the order of the x's counts. The function univariate-in-i takes an index i = 0, 1, ... n-1 and returns a 1-variable function by partial-ing out all the variables except x_i. E.g., you'd get:
#(f x_0 x_1 ... x_i-1 % x_i+1 ... x_n)
^
(x_i still variable)
map-ping this function over varity-index gets you a sequence of 1-variable functions in each x_i, and then map-ping derivative over these gets you a sequence of derivatives in each x_i which is the gradient we want.
My question is: I'm not sure what a good way to implement univariate-in-i is. I essentially need to fill in values for x's in f except at some particular spot (i.e., place the % above), but programmatically.
I'm more interested in technique than solution (i.e., I know how to compute gradients, I'm trying to learn functional programming and idiomatic Clojure). Therefore, I'd like to stay true to the strategy of treating the gradient as a map of 1-d derivatives over partialed-out functions. But if there's a better "functional" approach to this, please let me know. I'd rather not resort to macros if possible.
Thanks in advance!
Update:
Using Ankur's answer below, the gradient function I get is:
(defn gradient
"Numerical gradient of a multivariate function."
[f & x]
(let [varity-index (range (count x))
x-vec (vec x)
univariate-in-i
(fn [i] #(->> (assoc x-vec i %) (apply f)))]
(map derivative (map univariate-in-i varity-index) x)))
which does exactly what I'd hoped, and seems very concise and functional.

You can define univariate-in-i as shown below. (Assuming that all the other position values are defined in some var default which is a vector)
(fn [i] #(->>
(assoc default i %)
(apply f)))
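A small usage sketch may help show what this produces. The function f and the vector default below are assumptions chosen for illustration; each call to univariate-in-i returns a one-argument function in which only position i varies.

```clojure
(defn f [x y] (+ (* x x) (* x y y)))   ; example two-variable function

(def default [3 3])                    ; fixed values for all positions

(def univariate-in-i
  (fn [i] #(->> (assoc default i %)    ; replace position i with the argument
                (apply f))))           ; then call f with the full vector

((univariate-in-i 0) 2)  ;=> 22, i.e. (f 2 3)
((univariate-in-i 1) 2)  ;=> 21, i.e. (f 3 2)
```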

If you find this a bit difficult to comprehend (in the context of how to implement gradient), here is another variant of a multivariable gradient implementation in Clojure.
Given f and a vector v of a1,....,aN, partial-diff differentiates with respect to xi while all the other variables are held fixed:
(defn partial-diff [f v i]
(let [h 10e-6
w (update v i + h)]
(/ (- (apply f w) (apply f v))
h)))
(defn gradient [f v]
(map #(partial-diff f v %) (range (count v))))
=>
(gradient (fn [x y]
(+ (* x x) (* x y y))) [3 3])
=> (15.000009999965867 18.000030000564493)
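The version above uses a forward difference, which explains the small error in the fifth decimal place. A sketch of the same approach with the centered difference from the question is shown below; the names partial-diff-central and gradient-central are hypothetical, and v is assumed to be a vector so that update works.

```clojure
(defn partial-diff-central
  "Centered difference in coordinate i: [f(v + h*e_i) - f(v - h*e_i)] / (2h)."
  [f v i]
  (let [h 1e-5]
    (/ (- (apply f (update v i + h))
          (apply f (update v i - h)))
       (* 2 h))))

(defn gradient-central [f v]
  (map #(partial-diff-central f v %) (range (count v))))

(gradient-central (fn [x y] (+ (* x x) (* x y y))) [3 3])
;; approximately (15.0 18.0), up to floating-point rounding
```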
