How do I use "mean" as the final reducing function in a transducer? - clojure

I'm trying to estimate the mean distance of all pairs of points in a unit square.
This transducer returns a vector of the distances of x randomly selected pairs of points, but the final step would be to take the mean of all values in that vector. Is there a way to use mean as the final reducing function (or to include it in the composition)?
(defn square [x] (* x x))
(defn mean [x] (/ (reduce + x) (count x)))
(defn xform [iterations]
(comp
(partition-all 4)
(map #(Math/sqrt (+ (square (- (first %) (nth % 1)))
(square (- (nth % 2) (nth % 3))))))
(take iterations)))
(transduce (xform 5) conj (repeatedly #(rand)))
[0.5544757422041136
0.4170515673848907
0.7457675423415904
0.5560901974277822
0.6053573945754688]
(transduce (xform 5) mean (repeatedly #(rand)))
Execution error (ArityException) at test.core/eval19667 (form-init9118116578029918666.clj:562).
Wrong number of args (0) passed to: test.core/mean

If you implement your mean function differently, you won't have to collect all the values before computing the mean. Here is how you can implement it, based on this Java code:
(defn mean
([] [0 1]) ;; <-- Construct an empty accumulator
([[mu n]] mu) ;; <-- Get the mean (final step)
([[mu n] x] ;; <-- Accumulate a value to the mean
[(+ mu (/ (- x mu) n)) (inc n)]))
And you use it like this:
(transduce identity mean [1 2 3 4])
;; => 5/2
or like this:
(transduce (xform 5) mean (repeatedly #(rand)))
;; => 0.582883812837961

From the docs of transduce:
If init is not supplied, (f) will be called to produce it. f should be
a reducing step function that accepts both 1 and 2 arguments, if it
accepts only 2 you can add the arity-1 with 'completing'.
To disect this:
Your function needs 0-arity to produce an initial value -- so conj
is fine (it produces an empty vector).
You need to provide a 2-arity function to do the actual redudcing
-- again conj is fine here
You need to provide a 1-arity function to finalize - here you want
your mean.
So as the docs suggest, you can use completing to just provide that:
(transduce (xform 5) (completing conj mean) (repeatedly #(rand)))
; → 0.4723186070904141
If you look at the source of completing you will see how it produces
all of this:
(defn completing
"Takes a reducing function f of 2 args and returns a fn suitable for
transduce by adding an arity-1 signature that calls cf (default -
identity) on the result argument."
{:added "1.7"}
([f] (completing f identity))
([f cf]
(fn
([] (f))
([x] (cf x))
([x y] (f x y)))))

Related

Clojure function to Replace Count

I need help with an assignment that uses Clojure. It is very small but the language is a bit confusing to understand. I need to create a function that behaves like count without actually using the count funtion. I know a loop can be involved with it somehow but I am at a lost because nothing I have tried even gets my code to work. I expect it to output the number of elements in list. For example:
(defn functionname []
...
...)
(println(functionname '(1 4 8)))
Output:3
Here is what I have so far:
(defn functionname [n]
(def n 0)
(def x 0)
(while (< x n)
do
()
)
)
(println(functionname '(1 4 8)))
It's not much but I think it goes something like this.
This implementation takes the first element of the list and runs a sum until it can't anymore and then returns the sum.
(defn recount [list-to-count]
(loop [xs list-to-count sum 0]
(if (first xs)
(recur (rest xs) (inc sum))
sum
)))
user=> (recount '(3 4 5 9))
4
A couple more example implementations:
(defn not-count [coll]
(reduce + (map (constantly 1) coll)))
or:
(defn not-count [coll]
(reduce (fn [a _] (inc a)) 0 coll))
or:
(defn not-count [coll]
(apply + (map (fn [_] 1) coll)))
result:
(not-count '(5 7 8 1))
=> 4
I personally like the first one with reduce and constantly.

How to make reduce more readable in Clojure?

A reduce call has its f argument first. Visually speaking, this is often the biggest part of the form.
e.g.
(reduce
(fn [[longest current] x]
(let [tail (last current)
next-seq (if (or (not tail) (> x tail))
(conj current x)
[x])
new-longest (if (> (count next-seq) (count longest))
next-seq
longest)]
[new-longest next-seq]))
[[][]]
col))
The problem is, the val argument (in this case [[][]]) and col argument come afterward, below, and it's a long way for your eyes to travel to match those with the parameters of f.
It would look more readable to me if it were in this order instead:
(reduceb val col
(fn [x y]
...))
Should I implement this macro, or am I approaching this entirely wrong in the first place?
You certainly shouldn't write that macro, since it is easily written as a function instead. I'm not super keen on writing it as a function, either, though; if you really want to pair the reduce with its last two args, you could write:
(-> (fn [x y]
...)
(reduce init coll))
Personally when I need a large function like this, I find that a comma actually serves as a good visual anchor, and makes it easier to tell that two forms are on that last line:
(reduce (fn [x y]
...)
init, coll)
Better still is usually to not write such a large reduce in the first place. Here you're combining at least two steps into one rather large and difficult step, by trying to find all at once the longest decreasing subsequence. Instead, try splitting the collection up into decreasing subsequences, and then take the largest one.
(defn decreasing-subsequences [xs]
(lazy-seq
(cond (empty? xs) []
(not (next xs)) (list xs)
:else (let [[x & [y :as more]] xs
remainder (decreasing-subsequences more)]
(if (> y x)
(cons [x] remainder)
(cons (cons x (first remainder)) (rest remainder)))))))
Then you can replace your reduce with:
(apply max-key count (decreasing-subsequences xs))
Now, the lazy function is not particularly shorter than your reduce, but it is doing one single thing, which means it can be understood more easily; also, it has a name (giving you a hint as to what it's supposed to do), and it can be reused in contexts where you're looking for some other property based on decreasing subsequences, not just the longest. You can even reuse it more often than that, if you replace the > in (> y x) with a function parameter, allowing you to split up into subsequences based on any predicate. Plus, as mentioned it is lazy, so you can use it in situations where a reduce of any sort would be impossible.
Speaking of ease of understanding, as you can see I misunderstood what your function is supposed to do when reading it. I'll leave as an exercise for you the task of converting this to strictly-increasing subsequences, where it looked to me like you were computing decreasing subsequences.
You don't have to use reduce or recursion to get the descending (or ascending) sequences. Here we are returning all the descending sequences in order from longest to shortest:
(def in [3 2 1 0 -1 2 7 6 7 6 5 4 3 2])
(defn descending-sequences [xs]
(->> xs
(partition 2 1)
(map (juxt (fn [[x y]] (> x y)) identity))
(partition-by first)
(filter ffirst)
(map #(let [xs' (mapcat second %)]
(take-nth 2 (cons (first xs') xs'))))
(sort-by (comp - count))))
(descending-sequences in)
;;=> ((7 6 5 4 3 2) (3 2 1 0 -1) (7 6))
(partition 2 1) gives every possible comparison and partition-by allows you to mark out the runs of continuous decreases. At this point you can already see the answer and the rest of the code is removing the baggage that is no longer needed.
If you want the ascending sequences instead then you only need to change the < to a >:
;;=> ((-1 2 7) (6 7))
If, as in the question, you only want the longest sequence then put a first as the last function call in the thread last macro. Alternatively replace the sort-by with:
(apply max-key count)
For maximum readability you can name the operations:
(defn greatest-continuous [op xs]
(let [op-pair? (fn [[x y]] (op x y))
take-every-second #(take-nth 2 (cons (first %) %))
make-canonical #(take-every-second (apply concat %))]
(->> xs
(partition 2 1)
(partition-by op-pair?)
(filter (comp op-pair? first))
(map make-canonical)
(apply max-key count))))
I feel your pain...they can be hard to read.
I see 2 possible improvements. The simplest is to write a wrapper similar to the Plumatic Plumbing defnk style:
(fnk-reduce { :fn (fn [state val] ... <new state value>)
:init []
:coll some-collection } )
so the function call has a single map arg, where each of the 3 pieces is labelled & can come in any order in the map literal.
Another possibility is to just extract the reducing fn and give it a name. This can be either internal or external to the code expression containing the reduce:
(let [glommer (fn [state value] (into state value)) ]
(reduce glommer #{} some-coll))
or possibly
(defn glommer [state value] (into state value))
(reduce glommer #{} some-coll))
As always, anything that increases clarity is preferred. If you haven't noticed already, I'm a big fan of Martin Fowler's idea of Introduce Explaining Variable refactoring. :)
I will apologize in advance for posting a longer solution to something where you wanted more brevity/clarity.
We are in the new age of clojure transducers and it appears a bit that your solution was passing the "longest" and "current" forward for record-keeping. Rather than passing that state forward, a stateful transducer would do the trick.
(def longest-decreasing
(fn [rf]
(let [longest (volatile! [])
current (volatile! [])
tail (volatile! nil)]
(fn
([] (rf))
([result] (transduce identity rf result))
([result x] (do (if (or (nil? #tail) (< x #tail))
(if (> (count (vswap! current conj (vreset! tail x)))
(count #longest))
(vreset! longest #current))
(vreset! current [(vreset! tail x)]))
#longest)))))))
Before you dismiss this approach, realize that it just gives you the right answer and you can do some different things with it:
(def coll [2 1 10 9 8 40])
(transduce longest-decreasing conj coll) ;; => [10 9 8]
(transduce longest-decreasing + coll) ;; => 27
(reductions (longest-decreasing conj) [] coll) ;; => ([] [2] [2 1] [2 1] [2 1] [10 9 8] [10 9 8])
Again, I know that this may appear longer but the potential to compose this with other transducers might be worth the effort (not sure if my airity 1 breaks that??)
I believe that iterate can be a more readable substitute for reduce. For example here is the iteratee function that iterate will use to solve this problem:
(defn step-state-hof [op]
(fn [{:keys [unprocessed current answer]}]
(let [[x y & more] unprocessed]
(let [next-current (if (op x y)
(conj current y)
[y])
next-answer (if (> (count next-current) (count answer))
next-current
answer)]
{:unprocessed (cons y more)
:current next-current
:answer next-answer}))))
current is built up until it becomes longer than answer, in which case a new answer is created. Whenever the condition op is not satisfied we start again building up a new current.
iterate itself returns an infinite sequence, so needs to be stopped when the iteratee has been called the right number of times:
(def in [3 2 1 0 -1 2 7 6 7 6 5 4 3 2])
(->> (iterate (step-state-hof >) {:unprocessed (rest in)
:current (vec (take 1 in))})
(drop (- (count in) 2))
first
:answer)
;;=> [7 6 5 4 3 2]
Often you would use a drop-while or take-while to short circuit just when the answer has been obtained. We could so that here however there is no short circuiting required as we know in advance that the inner function of step-state-hof needs to be called (- (count in) 1) times. That is one less than the count because it is processing two elements at a time. Note that first is forcing the final call.
I wanted this order for the form:
reduce
val, col
f
I was able to figure out that this technically satisfies my requirements:
> (apply reduce
(->>
[0 [1 2 3 4]]
(cons
(fn [acc x]
(+ acc x)))))
10
But it's not the easiest thing to read.
This looks much simpler:
> (defn reduce< [val col f]
(reduce f val col))
nil
> (reduce< 0 [1 2 3 4]
(fn [acc x]
(+ acc x)))
10
(< is shorthand for "parameters are rotated left"). Using reduce<, I can see what's being passed to f by the time my eyes get to the f argument, so I can just focus on reading the f implementation (which may get pretty long). Additionally, if f does get long, I no longer have to visually check the indentation of the val and col arguments to determine that they belong to the reduce symbol way farther up. I personally think this is more readable than binding f to a symbol before calling reduce, especially since fn can still accept a name for clarity.
This is a general solution, but the other answers here provide many good alternative ways to solve the specific problem I gave as an example.

mapcat breaking the lazyness

I have a function that produces lazy-sequences called a-function.
If I run the code:
(map a-function a-sequence-of-values)
it returns a lazy sequence as expected.
But when I run the code:
(mapcat a-function a-sequence-of-values)
it breaks the lazyness of my function. In fact it turns that code into
(apply concat (map a-function a-sequence-of-values))
So it needs to realize all the values from the map before concatenating those values.
What I need is a function that concatenates the result of a map function on demand without realizing all the map beforehand.
I can hack a function for this:
(defn my-mapcat
[f coll]
(lazy-seq
(if (not-empty coll)
(concat
(f (first coll))
(my-mapcat f (rest coll))))))
But I can't believe that clojure doesn't have something already done. Do you know if clojure has such feature? Only a few people and I have the same problem?
I also found a blog that deals with the same issue: http://clojurian.blogspot.com.br/2012/11/beware-of-mapcat.html
Lazy-sequence production and consumption is different than lazy evaluation.
Clojure functions do strict/eager evaluation of their arguments. Evaluation of an argument that is or that yields a lazy sequence does not force realization of the yielded lazy sequence in and of itself. However, any side effects caused by evaluation of the argument will occur.
The ordinary use case for mapcat is to concatenate sequences yielded without side effects. Therefore, it hardly matters that some of the arguments are eagerly evaluated because no side effects are expected.
Your function my-mapcat imposes additional laziness on the evaluation of its arguments by wrapping them in thunks (other lazy-seqs). This can be useful when significant side effects - IO, significant memory consumption, state updates - are expected. However, the warning bells should probably be going off in your head if your function is doing side effects and producing a sequence to be concatenated that your code probably needs refactoring.
Here is similar from algo.monads
(defn- flatten*
"Like #(apply concat %), but fully lazy: it evaluates each sublist
only when it is needed."
[ss]
(lazy-seq
(when-let [s (seq ss)]
(concat (first s) (flatten* (rest s))))))
Another way to write my-mapcat:
(defn my-mapcat [f coll] (for [x coll, fx (f x)] fx))
Applying a function to a lazy sequence will force realization of a portion of that lazy sequence necessary to satisfy the arguments of the function. If that function itself produces lazy sequences as a result, those are not realized as a matter of course.
Consider this function to count the realized portion of a sequence
(defn count-realized [s]
(loop [s s, n 0]
(if (instance? clojure.lang.IPending s)
(if (and (realized? s) (seq s))
(recur (rest s) (inc n))
n)
(if (seq s)
(recur (rest s) (inc n))
n))))
Now let's see what's being realized
(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
concat-seq (apply concat seq-of-seqs)]
(println "seq-of-seqs: " (count-realized seq-of-seqs))
(println "concat-seq: " (count-realized concat-seq))
(println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))
;=> seq-of-seqs: 4
; concat-seq: 0
; seqs-in-seq: [0 0 0 0 0 0]
So, 4 elements of the seq-of-seqs got realized, but none of its component sequences were realized nor was there any realization in the concatenated sequence.
Why 4? Because the applicable arity overloaded version of concat takes 4 arguments [x y & xs] (count the &).
Compare to
(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
foo-seq (apply (fn foo [& more] more) seq-of-seqs)]
(println "seq-of-seqs: " (count-realized seq-of-seqs))
(println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))
;=> seq-of-seqs: 2
; seqs-in-seq: [0 0 0 0 0 0]
(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
foo-seq (apply (fn foo [a b c & more] more) seq-of-seqs)]
(println "seq-of-seqs: " (count-realized seq-of-seqs))
(println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))
;=> seq-of-seqs: 5
; seqs-in-seq: [0 0 0 0 0 0]
Clojure has two solutions to making the evaluation of arguments lazy.
One is macros. Unlike functions, macros do not evaluate their arguments.
Here's a function with a side effect
(defn f [n] (println "foo!") (repeat n n))
Side effects are produced even though the sequence is not realized
user=> (def x (concat (f 1) (f 2)))
foo!
foo!
#'user/x
user=> (count-realized x)
0
Clojure has a lazy-cat macro to prevent this
user=> (def y (lazy-cat (f 1) (f 2)))
#'user/y
user=> (count-realized y)
0
user=> (dorun y)
foo!
foo!
nil
user=> (count-realized y)
3
user=> y
(1 2 2)
Unfortunately, you cannot apply a macro.
The other solution to delay evaluation is wrap in thunks, which is exactly what you've done.
Your premise is wrong. Concat is lazy, apply is lazy if its first argument is, and mapcat is lazy.
user> (class (mapcat (fn [x y] (println x y) (list x y)) (range) (range)))
0 0
1 1
2 2
3 3
clojure.lang.LazySeq
note that some of the initial values are evaluated (more on this below), but clearly the whole thing is still lazy (or the call would never have returned, (range) returns an endless sequence, and will not return when used eagerly).
The blog you link to is about the danger of recursively using mapcat on a lazy tree, because it is eager on the first few elements (which can add up in a recursive application).

Writing the Lp norm function

I'm attempting to write the Lp norm function as to generalize the standard L2 norm (Euclidean distance) used. Here is what I have come up with so far, given how I had written the L2 norm:
(defn foo [a b p]
(reduce + (map (comp (map #(power a %) p) -) a b)))
However I am getting the error ClassCastException whenever I try to implement this function. Part of the interim code is from a previously asked question Raising elements in a vector to a power where the following code was provided:
(defn compute [exp numbers]
(map #(power exp %) numbers))
Consider factoring your code.
First define the p-norm
(defn p-norm [p x]
(if (= p :infinity)
(apply max (for [xi x] (Math/abs xi)))
(Math/pow
(reduce + (for [xi x] (Math/pow xi p)))
(/ 1 p))))
And then use the p-norm to define your p-metric
(defn p-metric [p x y]
(p-norm p (map - x y)))
Example
(p-metric 2 [0 0] [3 4])
;=> 5.0
(p-metric :infinity [0 0] [3 4])
;=> 4
Your inner (map):
(map #(power a %) p)
Returns a sequence and you can't feed that to (comp). 'comp' is for 'Function Composition'.
In the REPL:
(doc comp)
clojure.core/comp
([] [f] [f g] [f g h] [f1 f2 f3 & fs])
Takes a set of functions and returns a fn that is the composition
of those fns. The returned fn takes a variable number of args,
applies the rightmost of fns to the args, the next
fn (right-to-left) to the result, etc.
Start breaking your code into smaller steps. (let) form is quite handy, don't be shy to use it.

Piping data through arbitrary functions in Clojure

I know that the -> form can be used to pass the results of one function result to another:
(f1 (f2 (f3 x)))
(-> x f3 f2 f1) ; equivalent to the line above
(taken from the excellent Clojure tutorial at ociweb)
However this form requires that you know the functions you want to use at design time. I'd like to do the same thing, but at run time with a list of arbitrary functions.
I've written this looping function that does it, but I have a feeling there's a better way:
(defn pipe [initialData, functions]
(loop [
frontFunc (first functions)
restFuncs (rest functions)
data initialData ]
(if frontFunc
(recur (first restFuncs) (rest restFuncs) (frontFunc data) )
data )
) )
What's the best way to go about this?
I must admit I'm really new to clojure and I might be missing the point here completely, but can't this just be done using comp and apply?
user> (defn fn1 [x] (+ 2 x))
user> (defn fn2 [x] (/ x 3))
user> (defn fn3 [x] (* 1.2 x))
user> (defn pipe [initial-data my-functions] ((apply comp my-functions) initial-data))
user> (pipe 2 [fn1 fn2 fn3])
2.8
You can do this with a plain old reduce:
(defn pipe [x fs] (reduce (fn [acc f] (f acc)) x fs))
That can be shortened to:
(defn pipe [x fs] (reduce #(%2 %1) x fs))
Used like this:
user> (pipe [1 2 3] [#(conj % 77) rest reverse (partial map inc) vec])
[78 4 3]
If functions is a sequence of functions, you can reduce it using comp to get a composed function. At a REPL:
user> (def functions (list #(* % 5) #(+ % 1) #(/ % 3)))
#'user/my-list
user> ((reduce comp functions) 9)
20
apply also works in this case because comp takes a variable number of arguments:
user> (def functions (list #(* % 5) #(+ % 1) #(/ % 3)))
#'user/my-list
user> ((apply comp functions) 9)
20