Perform multiple reductions in a single pass in Clojure - clojure

In Clojure I want to find the result of multiple reductions while only consuming the sequence once. In Java I would do something like the following:
double min = Double.MIN_VALUE;
double max = Double.MAX_VALUE;
for (Item item : items) {
double price = item.getPrice();
if (price > min) {
min = price;
}
if (price < max) {
max = price;
}
}
In Clojure I could do much the same thing by using loop and recur, but it's not very composable - I'd like to do something that lets you add in other aggregation functions as needed.
I've written the following function to do this:
(defn reduce-multi
"Given a sequence of fns and a coll, returns a vector of the result of each fn
when reduced over the coll."
[fns coll]
(let [n (count fns)
r (rest coll)
initial-v (transient (into [] (repeat n (first coll))))
fns (into [] fns)
reduction-fn
(fn [v x]
(loop [v-current v, i 0]
(let [y (nth v-current i)
f (nth fns i)
v-new (assoc! v-current i (f y x))]
(if (= i (- n 1))
v-new
(recur v-new (inc i))))))]
(persistent! (reduce reduction-fn initial-v r))))
This can be used in the following way:
(reduce-multi [max min] [4 3 6 7 0 1 8 2 5 9])
=> [9 0]
I appreciate that it's not implemented in the most idiomatic way, but the main problem is that it's about 10x as slow as doing the reductions one at at time. This might be useful for lots performing lots of reductions where the seq is doing heavy IO, but surely this could be better.
Is there something in an existing Clojure library that would do what I want? If not, where am I going wrong in my function?

that's what i would do: simply delegate this task to a core reduce function, like this:
(defn multi-reduce
([fs accs xs] (reduce (fn [accs x] (doall (map #(%1 %2 x) fs accs)))
accs xs))
([fs xs] (when (seq xs)
(multi-reduce fs (repeat (count fs) (first xs))
(rest xs)))))
in repl:
user> (multi-reduce [+ * min max] (range 1 10))
(45 362880 1 9)
user> (multi-reduce [+ * min max] [10])
(10 10 10 10)
user> (multi-reduce [+ * min max] [])
nil
user> (multi-reduce [+ * min max] [1 1 1000 0] [])
[1 1 1000 0]
user> (multi-reduce [+ * min max] [1 1 1000 0] [1])
(2 1 1 1)
user> (multi-reduce [+ * min max] [1 1 1000 0] (range 1 10))
(46 362880 1 9)
user> (multi-reduce [max min] (range 1000000))
(999999 0)

The code for reduce is fast for reducible collections. So it's worth trying to base multi-reduce on core reduce. To do so, we have to be able to construct reducing functions of the right shape. An ancillary function to do so is ...
(defn juxt-reducer [f g]
(fn [[fa ga] x] [(f fa x) (g ga x)]))
Now we can define the function you want, which combines juxt with reduce as ...
(defn juxt-reduce
([[f g] coll]
(if-let [[x & xs] (seq coll)]
(juxt-reduce (list f g) [x x] xs)
[(f) (g)]))
([[f g] init coll]
(reduce (juxt-reducer f g) init coll)))
For example,
(juxt-reduce [max min] [4 3 6 7 0 1 8 2 5 9]) ;=> [9 0]
The above follows the shape of core reduce. It can clearly be extended to cope with more than two functions. And I'd expect it to be faster than yours for reducible collections.

Here is how I would do it:
(ns clj.core
(:require [clojure.string :as str] )
(:use tupelo.core))
(def data (flatten [ (range 5 10) (range 5) ] ))
(spyx data)
(def result (reduce (fn [cum-result curr-val] ; reducing (accumulator) fn
(it-> cum-result
(update it :min-val min curr-val)
(update it :max-val max curr-val)))
{ :min-val (first data) :max-val (first data) } ; inital value
data)) ; seq to reduce
(spyx result)
(defn -main [] )
;=> data => (5 6 7 8 9 0 1 2 3 4)
;=> result => {:min-val 0, :max-val 9}
So the reducing function (fn ...) carries along a map like {:min-val xxx :max-val yyy} through each element of the sequence, updating the min & max values as required at each step.
While this does make only one pass through the data, it is doing a lot of extra work calling update twice per element. Unless your sequence is very unusual, it is probably more efficient to make two (very efficient) passes through the data like:
(def min-val (apply min data))
(def max-val (apply max data))
(spyx min-val)
(spyx max-val)
;=> min-val => 0
;=> max-val => 9

Related

Concatenate elements to list by using loop in clojure?

I am trying to get into Lisps and FP by trying out the 99 problems.
Here is the problem statement (Problem 15)
Replicate the elements of a list a given number of times.
I have come up with the following code which simply returns an empty list []
I am unable to figure out why my code doesn't work and would really appreciate some help.
(defn replicateList "Replicates each element of the list n times" [l n]
(loop [initList l returnList []]
(if (empty? initList)
returnList
(let [[head & rest] initList]
(loop [x 0]
(when (< x n)
(conj returnList head)
(recur (inc x))))
(recur rest returnList)))))
(defn -main
"Main" []
(test/is (=
(replicateList [1 2] 2)
[1 1 2 2])
"Failed basic test")
)
copying my comment to answer:
this line: (conj returnList head) doesn't modify returnlist, rather it just drops the result in your case. You should restructure your program to pass the accumulated list further to the next iteration. But there are better ways to do this in clojure. Like (defn replicate-list [data times] (apply concat (repeat times data)))
If you still need the loop/recur version for educational reasons, i would go with this:
(defn replicate-list [data times]
(loop [[h & t :as input] data times times result []]
(if-not (pos? times)
result
(if (empty? input)
(recur data (dec times) result)
(recur t times (conj result h))))))
user> (replicate-list [1 2 3] 3)
;;=> [1 2 3 1 2 3 1 2 3]
user> (replicate-list [ ] 2)
;;=> []
user> (replicate-list [1 2 3] -1)
;;=> []
update
based on the clarified question, the simplest way to do this is
(defn replicate-list [data times]
(mapcat (partial repeat times) data))
user> (replicate-list [1 2 3] 3)
;;=> (1 1 1 2 2 2 3 3 3)
and the loop/recur variant:
(defn replicate-list [data times]
(loop [[h & t :as data] data n 0 res []]
(cond (empty? data) res
(>= n times) (recur t 0 res)
:else (recur data (inc n) (conj res h)))))
user> (replicate-list [1 2 3] 3)
;;=> [1 1 1 2 2 2 3 3 3]
user> (replicate-list [1 2 3] 0)
;;=> []
user> (replicate-list [] 10)
;;=> []
Here is a version based on the original post, with minimal modifications:
;; Based on the original version posted
(defn replicateList "Replicates each element of the list n times" [l n]
(loop [initList l returnList []]
(if (empty? initList)
returnList
(let [[head & rest] initList]
(recur
rest
(loop [inner-returnList returnList
x 0]
(if (< x n)
(recur (conj inner-returnList head) (inc x))
inner-returnList)))))))
Please keep in mind that Clojure is mainly a functional language, meaning that most functions produce their results as a new return value instead of updating in place. So, as pointed out in the comment, the line (conj returnList head) will not have an effect, because it's return value is ignored.
The above version works, but does not really take advantage of Clojure's sequence processing facilities. So here are two other suggestions for solving your problem:
;; Using lazy seqs and reduce
(defn replicateList2 [l n]
(reduce into [] (map #(take n (repeat %)) l)))
;; Yet another way using transducers
(defn replicateList3 [l n]
(transduce
(comp (map #(take n (repeat %)))
cat
)
conj
[]
l))
One thing is not clear about your question though: From your implementation, it looks like you want to create a new list where each element is repeated n times, e.g.
playground.replicate> (replicateList [1 2 3] 4)
[1 1 1 1 2 2 2 2 3 3 3 3]
But if you would instead like this result
playground.replicate> (replicateList [1 2 3] 4)
[1 2 3 1 2 3 1 2 3 1 2 3]
the answer to your question will be different.
If you want to learn idiomatic Clojure you should try to find a solution without such low level facilities as loop. Rather try to combine higher level functions like take, repeat, repeatedly. If you're feeling adventurous you might throw in laziness as well. Clojure's sequences are lazy, that is they get evaluated only when needed.
One example I came up with would be
(defn repeat-list-items [l n]
(lazy-seq
(when-let [s (seq l)]
(concat (repeat n (first l))
(repeat-list-items (next l) n)))))
Please also note the common naming with kebab-case
This seems to do what you want pretty well and works for an unlimited input (see the call (range) below), too:
experi.core> (def l [:a :b :c])
#'experi.core/
experi.core> (repeat-list-items l 2)
(:a :a :b :b :c :c)
experi.core> (repeat-list-items l 0)
()
experi.core> (repeat-list-items l 1)
(:a :b :c)
experi.core> (take 10 (drop 10000 (repeat-list-items (range) 4)))
(2500 2500 2500 2500 2501 2501 2501 2501 2502 2502)

Clojure : idiomatic weighted mean of vectors

I would like to compute the weighted mean of vectors in an idiomatic way.
To illustrate what I want, imagine I have this data :
data 1 = [2 1] , weight 1 = 1
data 2 = [3 4], weight 2 = 2
Then mean = [(2*1 + 3*2)/(1+2) (1*1 + 2*4)/(1+2)] = [2.67 3.0]
Here is my code :
(defn meanv
"Returns the vector that is the mean of input ones.
You can also pass weights just like apache-maths.stats/mean"
([data]
(let [n (count (first data))]
(->> (for [i (range 0 n)]
(vec (map (i-partial nth i) data)))
(mapv stats/mean))))
([data weights]
(let [n (count (first data))]
(->> (for [i (range 0 n)]
(vec (map (i-partial nth i) data)))
(mapv (i-partial stats/mean weights))))))
Then
(meanv [[2 1] [3 4]] [1 2]) = [2.67 3.0]
Few notes :
stats/means takes 1 or 2 inputs.
One input version has weights = 1 by default.
Two inputs is the weighted version.
i-partial is like partial but the fn has reversed args
Ex : ((partial / 2) 1) = 2
((i-partial / 2) 1 = 1/2
So my function works, no problem.
But in a way I would like to implement it in a more idiomatic Clojure.
I tried many combinations with things like (map (fn [&xs ... but it does not work.
Is it possible to take all nth elements of undefined number of vectors and directly apply stats/mean ? I mean a one-liner
Thanks
EDIT (birdspider answer)
(defn meanv
([data]
(->> (apply mapv vector data)
(mapv stats/mean)))
([data weights]
(->> (apply mapv vector data)
(mapv (i-partial stats/mean weights)))))
And with
(defn transpose [m]
(apply mapv vector m))
(defn meanv
([data]
(->> (transpose data)
(mapv stats/mean)))
([data weights]
(->> (transpose data)
(mapv (i-partial stats/mean weights)))))
(def mult-v (partial mapv *))
(def sum-v (partial reduce +))
(def transpose (partial apply mapv vector))
(defn meanv [data weights]
(->> data
transpose
(map (partial mult-v weights))
(map sum-v)
(map #(/ % (sum-v weights)))))
First thing you want to do is to transpose the matrix (get the firsts, seconds, thirds, etc.)
See this SO page.
; https://stackoverflow.com/a/10347404/2645347
(defn transpose [m]
(apply mapv vector m))
Then I would do it as follows, input checks are utterly absent.
(defn meanv
([data]
; no weigths default to (1 1 1 ...
(meanv data (repeat (count data) 1))))
([data weigths]
(let [wf (mapv #(partial * %) weigths) ; vector of weight mult fns
wsum (reduce + weigths)]
(map-indexed
(fn [i datum]
(/
; map over datum apply corresponding weight-fn - then sum
(apply + (map-indexed #((wf %1) %2) datum))
wsum))
(transpose data)))))
(meanv [[2 1] [3 4]] [1 2]) => (8/3 3) ; (2.6666 3.0)
Profit!

Partition a seq by a "windowing" predicate in Clojure

I would like to "chunk" a seq into subseqs the same as partition-by, except that the function is not applied to each individual element, but to a range of elements.
So, for example:
(gather (fn [a b] (> (- b a) 2))
[1 4 5 8 9 10 15 20 21])
would result in:
[[1] [4 5] [8 9 10] [15] [20 21]]
Likewise:
(defn f [a b] (> (- b a) 2))
(gather f [1 2 3 4]) ;; => [[1 2 3] [4]]
(gather f [1 2 3 4 5 6 7 8 9]) ;; => [[1 2 3] [4 5 6] [7 8 9]]
The idea is that I apply the start of the list and the next element to the function, and if the function returns true we partition the current head of the list up to that point into a new partition.
I've written this:
(defn gather
[pred? lst]
(loop [acc [] cur [] l lst]
(let [a (first cur)
b (first l)
nxt (conj cur b)
rst (rest l)]
(cond
(empty? l) (conj acc cur)
(empty? cur) (recur acc nxt rst)
((complement pred?) a b) (recur acc nxt rst)
:else (recur (conj acc cur) [b] rst)))))
and it works, but I know there's a simpler way. My question is:
Is there a built in function to do this where this function would be unnecessary? If not, is there a more idiomatic (or simpler) solution that I have overlooked? Something combining reduce and take-while?
Thanks.
Original interpretation of question
We (all) seemed to have misinterpreted your question as wanting to start a new partition whenever the predicate held for consecutive elements.
Yet another, lazy, built on partition-by
(defn partition-between [pred? coll]
(let [switch (reductions not= true (map pred? coll (rest coll)))]
(map (partial map first) (partition-by second (map list coll switch)))))
(partition-between (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))
Actual Question
The actual question asks us to start a new partition whenever pred? holds for the beginning of the current partition and the current element. For this we can just rip off partition-by with a few tweaks to its source.
(defn gather [pred? coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
run (cons fst (take-while #((complement pred?) fst %) (next s)))]
(cons run (gather pred? (seq (drop (count run) s))))))))
(gather (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))
(gather (fn [a b] (> (- b a) 2)) [1 2 3 4])
;=> ((1 2 3) (4))
(gather (fn [a b] (> (- b a) 2)) [1 2 3 4 5 6 7 8 9])
;=> ((1 2 3) (4 5 6) (7 8 9))
Since you need to have the information from previous or next elements than the one you are currently deciding on, a partition of pairs with a reduce could do the trick in this case.
This is what I came up with after some iterations:
(defn gather [pred s]
(->> (partition 2 1 (repeat nil) s) ; partition the sequence and if necessary
; fill the last partition with nils
(reduce (fn [acc [x :as s]]
(let [n (dec (count acc))
acc (update-in acc [n] conj x)]
(if (apply pred s)
(conj acc [])
acc)))
[[]])))
(gather (fn [a b] (when (and a b) (> (- b a) 2)))
[1 4 5 8 9 10 15 20 21])
;= [[1] [4 5] [8 9 10] [15] [20 21]]
The basic idea is to make partitions of the number of elements the predicate function takes, filling the last partition with nils if necessary. The function then reduces each partition by determining if the predicate is met, if so then the first element in the partition is added to the current group and a new group is created. Since the last partition could have been filled with nulls, the predicate has to be modified.
Tow possible improvements to this function would be to let the user:
Define the value to fill the last partition, so the reducing function could check if any of the elements in the partition is this value.
Specify the arity of the predicate, thus allowing to determine the grouping taking into account the current and the next n elements.
I wrote this some time ago in useful:
(defn partition-between [split? coll]
(lazy-seq
(when-let [[x & more] (seq coll)]
(lazy-loop [items [x], coll more]
(if-let [[x & more] (seq coll)]
(if (split? [(peek items) x])
(cons items (lazy-recur [x] more))
(lazy-recur (conj items x) more))
[items])))))
It uses lazy-loop, which is just a way to write lazy-seq expressions that look like loop/recur, but I hope it's fairly clear.
I linked to a historical version of the function, because later I realized there's a more general function that you can use to implement partition-between, or partition-by, or indeed lots of other sequential functions. These days the implementation is much shorter, but it's less obvious what's going on if you're not familiar with the more general function I called glue:
(defn partition-between [split? coll]
(glue conj []
(fn [v x]
(not (split? [(peek v) x])))
(constantly false)
coll))
Note that both of these solutions are lazy, which at the time I'm writing this is not true of any of the other solutions in this thread.
Here is one way, with steps split up. It can be narrowed down to fewer statements.
(def l [1 4 5 8 9 10 15 20 21])
(defn reduce_fn [f x y]
(cond
(f (last (last x)) y) (conj x [y])
:else (conj (vec (butlast x)) (conj (last x) y)) )
)
(def reduce_fn1 (partial reduce_fn #(> (- %2 %1) 2)))
(reduce reduce_fn1 [[(first l)]] (rest l))
keep-indexed is a wonderful function. Given a function f and a vector lst,
(keep-indexed (fn [idx it] (if (apply f it) idx))
(partition 2 1 lst)))
(0 2 5 6)
this returns the indices after which you want to split. Let's increment them and tack a 0 at the front:
(cons 0 (map inc (.....)))
(0 1 3 6 7)
Partition these to get ranges:
(partition 2 1 nil (....))
((0 1) (1 3) (3 6) (6 7) (7))
Now use these to generate subvecs:
(map (partial apply subvec lst) ....)
([1] [4 5] [8 9 10] [15] [20 21])
Putting it all together:
(defn gather
[f lst]
(let [indices (cons 0 (map inc
(keep-indexed (fn [idx it]
(if (apply f it) idx))
(partition 2 1 lst))))]
(map (partial apply subvec (vec lst))
(partition 2 1 nil indices))))
(gather #(> (- %2 %) 2) '(1 4 5 8 9 10 15 20 21))
([1] [4 5] [8 9 10] [15] [20 21])

Clojure: find repetition

Let's say we have a list of integers: 1, 2, 5, 13, 6, 5, 7 and I want to find the first number that repeats and return a vector of the two indices. In my sample, it's 5 at [2, 5]. What I did so far is loop, but can I do it more elegant, short way?
(defn get-cycle
[xs]
(loop [[x & xs_rest] xs, indices {}, i 0]
(if (nil? x)
[0 i] ; Sequence is over before we found a duplicate.
(if-let [x_index (indices x)]
[x_index i]
(recur xs_rest (assoc indices x i) (inc i))))))
No need to return number itself, because I can get it by index and, second, it may be not always there.
An option using list processing, but not significantly more concise:
(defn get-cycle [xs]
(first (filter #(number? (first %))
(reductions
(fn [[m i] x] (if-let [xat (m x)] [xat i] [(assoc m x i) (inc i)]))
[(hash-map) 0] xs))))
Here is a version using reduced to stop consuming the sequence when you find the first duplicate:
(defn first-duplicate [coll]
(reduce (fn [acc [idx x]]
(if-let [v (get acc x)]
(reduced (conj v idx))
(assoc acc x [idx])))
{} (map-indexed #(vector % %2) coll)))
I know that you have only asked for the first. Here is a fully lazy implementation with little per-step allocation overhead
(defn dups
[coll]
(letfn [(loop-fn [idx [elem & rest] cached]
(if elem
(if-let [last-idx (cached elem)]
(cons [last-idx idx]
(lazy-seq (loop-fn (inc idx) rest (dissoc cached elem))))
(lazy-seq (loop-fn (inc idx) rest (assoc cached elem idx))))))]
(loop-fn 0 coll {})))
(first (dups v))
=> [2 5]
Edit: Here are some criterium benchmarks:
The answer that got accepted: 7.819269 µs
This answer (first (dups [12 5 13 6 5 7])): 6.176290 µs
Beschastnys: 5.841101 µs
first-duplicate: 5.025445 µs
Actually, loop is a pretty good choice unless you want to find all duplicates. Things like reduce will cause the full scan of an input sequence even when it's not necessary.
Here is my version of get-cycle:
(defn get-cycle [coll]
(loop [i 0 seen {} coll coll]
(when-let [[x & xs] (seq coll)]
(if-let [j (seen x)]
[j i]
(recur (inc i) (assoc seen x i) xs)))))
The only difference from your get-cycle is that my version returns nil when there is no duplicates.
The intent of your code seems different from your description in the comments so I'm not totally confident I understand. That said, loop/recur is definitely a valid way to approach the problem.
Here's what I came up with:
(defn get-cycle [xs]
(loop [xs xs index 0]
(when-let [[x & more] (seq xs)]
(when-let [[y] (seq more)]
(if (= x y)
{x [index (inc index)]}
(recur more (inc index)))))))
This will return a map of the repeated item to a vector of the two indices the item was found at.
(get-cycle [1 1 2 1 2 4 2 1 4 5 6 7])
;=> {1 [0 1]}
(get-cycle [1 2 1 2 4 2 1 4 5 6 7 7])
;=> {7 [10 11]}
(get-cycle [1 2 1 2 4 2 1 4 5 6 7 8])
;=> nil
Here's an alternative solution using sequence functions. I like this way better but whether it's shorter or more elegant is probably subjective.
(defn pairwise [coll]
(map vector coll (rest coll)))
(defn find-first [pred xs]
(first (filter pred xs)))
(defn get-cycle [xs]
(find-first #(apply = (val (first %)))
(map-indexed hash-map (pairwise xs))))
Edited based on clarification from #demi
Ah, got it. Is this what you have in mind?
(defn get-cycle [xs]
(loop [xs (map-indexed vector xs)]
(when-let [[[i n] & more] (seq xs)]
(if-let [[j _] (find-first #(= n (second %)) more)]
{n [i j]}
(recur more)))))
I re-used find-first from my earlier sequence-based solution.

Is there any better and more idiomatic way of taking "while not enough" from a seq?

I need to take some amount of elements from a sequence based on some quantity rule. Here is a solution I came up with:
(defn take-while-not-enough
[p len xs]
(loop [ac 0
r []
s xs]
(if (empty? s)
r
(let [new-ac (p ac (first s))]
(if (>= new-ac len)
r
(recur new-ac (conj r (first s)) (rest s)))))))
(take-while-not-enough + 10 [2 5 7 8 2 1]) ; [2 5]
(take-while-not-enough #(+ %1 (%2 1)) 7 [[2 5] [7 8] [2 1]]) ; [[2 5]]
Is there any better way to achieve the same?
Thanks.
UPDATE:
Somebody posted that solution, but then removed it. It does the same is the answer that I accepted, but is more readable. Thank you, anonymous well-wisher!
(defn take-while-not-enough [reducer-fn limit data]
(->> (reductions reducer-fn 0 data) ; 1. the sequence of accumulated values
(map vector data) ; 2. paired with the original sequence
(take-while #(< (second %) limit)) ; 3. until a certain accumulated value
(map first))) ; 4. then extract the original values
My first thought is to view this problem as a variation on reduce and thus to break the problem into two steps:
count the number of items in the result
take that many from the input
I also took some liberties with the argument names:
user> (defn take-while-not-enough [reducer-fn limit data]
(take (dec (count (take-while #(< % limit) (reductions reducer-fn 0 data))))
data))
#'user/take-while-not-enough
user> (take-while-not-enough #(+ %1 (%2 1)) 7 [[2 5] [7 8] [2 1]])
([2 5])
user> (take-while-not-enough + 10 [2 5 7 8 2 1])
(2 5)
This returns a sequence and your examples return a vector, if this is important then you can add a call to vec
Something that would traverse the input sequence only once:
(defn take-while-not-enough [r v data]
(->> (rest (reductions (fn [s i] [(r (s 0) i) i]) [0 []] data))
(take-while (comp #(< % v) first))
(map second)))
Well, if you want to use flatland/useful, this is a kinda-okay way to use glue:
(defn take-while-not-enough [p len xs]
(first (glue conj []
(constantly true)
#(>= (reduce p 0 %) len)
xs)))
But it's rebuilding the accumulator for the entire "processed so far" chunk every time it decides whether to grow the chunk more, so it's O(n^2), which will be unacceptable for larger inputs.
The most obvious improvement to your implementation is to make it lazy instead of tail-recursive:
(defn take-while-not-enough [p len xs]
((fn step [acc coll]
(lazy-seq
(when-let [xs (seq coll)]
(let [x (first xs)
acc (p acc x)]
(when-not (>= acc len)
(cons x (step acc xs)))))))
0 xs))
Sometimes lazy-seq is straight-forward and self-explaining.
(defn take-while-not-enough
([f limit coll] (take-while-not-enough f limit (f) coll))
([f limit acc coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
nacc (f acc fst)]
(when (< nxt-sd limit)
(cons fst (take-while-not-enough f limit nacc (rest s)))))))))
Note: f is expected to follow the rules of reduce.