I would like to compute the weighted mean of vectors in an idiomatic way.
To illustrate what I want, imagine I have this data:
data 1 = [2 1] , weight 1 = 1
data 2 = [3 4], weight 2 = 2
Then mean = [(1*2 + 2*3)/(1+2) (1*1 + 2*4)/(1+2)] = [2.67 3.0]
Here is my code:
(defn meanv
  "Returns the vector that is the mean of the input ones.
  You can also pass weights, just like apache-maths.stats/mean."
  ([data]
   (let [n (count (first data))]
     (->> (for [i (range 0 n)]
            (vec (map (i-partial nth i) data)))
          (mapv stats/mean))))
  ([data weights]
   (let [n (count (first data))]
     (->> (for [i (range 0 n)]
            (vec (map (i-partial nth i) data)))
          (mapv (i-partial stats/mean weights))))))
Then
(meanv [[2 1] [3 4]] [1 2]) = [2.67 3.0]
A few notes:
stats/mean takes 1 or 2 inputs.
The one-input version has weights = 1 by default.
The two-input version is the weighted one.
i-partial is like partial, but the fn has reversed args.
Ex: ((partial / 2) 1) = 2
((i-partial / 2) 1) = 1/2
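For reference, a minimal sketch of what i-partial could look like; this is my assumption based on the description above, not the actual implementation:

; Hypothetical sketch: like partial, but the pre-supplied args
; are appended *after* the call-time args.
(defn i-partial [f & fixed]
  (fn [& args]
    (apply f (concat args fixed))))

((i-partial / 2) 1)        ;=> 1/2, i.e. (/ 1 2)
((i-partial nth 0) [2 1])  ;=> 2, i.e. (nth [2 1] 0)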
So my function works, no problem.
But I would like to implement it in more idiomatic Clojure.
I tried many combinations with things like (map (fn [& xs] ..., but it does not work.
Is it possible to take all the nth elements of an undefined number of vectors and directly apply stats/mean? I mean a one-liner.
Thanks
EDIT (birdspider answer)
(defn meanv
  ([data]
   (->> (apply mapv vector data)
        (mapv stats/mean)))
  ([data weights]
   (->> (apply mapv vector data)
        (mapv (i-partial stats/mean weights)))))
And with
(defn transpose [m]
  (apply mapv vector m))

(defn meanv
  ([data]
   (->> (transpose data)
        (mapv stats/mean)))
  ([data weights]
   (->> (transpose data)
        (mapv (i-partial stats/mean weights)))))
(def mult-v (partial mapv *))
(def sum-v (partial reduce +))
(def transpose (partial apply mapv vector))

(defn meanv [data weights]
  (->> data
       transpose
       (map (partial mult-v weights))
       (map sum-v)
       (map #(/ % (sum-v weights)))))
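If I am reading it right, this version produces the same result as before, as a lazy seq of exact ratios:

(meanv [[2 1] [3 4]] [1 2]) ;=> (8/3 3)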
The first thing you want to do is to transpose the matrix (get the firsts, seconds, thirds, etc.).
See this SO page.
; https://stackoverflow.com/a/10347404/2645347
(defn transpose [m]
  (apply mapv vector m))
Then I would do it as follows; input checks are utterly absent.
(defn meanv
  ([data]
   ; no weights, default to (1 1 1 ...)
   (meanv data (repeat (count data) 1)))
  ([data weights]
   (let [wf   (mapv #(partial * %) weights) ; vector of weight mult fns
         wsum (reduce + weights)]
     (map-indexed
       (fn [i datum]
         (/
           ; map over datum, apply the corresponding weight fn, then sum
           (apply + (map-indexed #((wf %1) %2) datum))
           wsum))
       (transpose data)))))
(meanv [[2 1] [3 4]] [1 2]) => (8/3 3) ; (2.6666 3.0)
Profit!
Related
I am trying to extract the two elements of a map with the largest distance. For that, I defined a function for calculating the distance, and I can obtain the distance between the first element (p1) and the other elements of the map. But I also need the distances between the second item (p2) and the following ones (p3, p4, p5), the third item (p3) and (p4, p5), and the fourth item (p4) and the fifth item (p5). Then I need to find the maximum among all distances and return the two items with the largest distance, along with the distance itself. Any help is highly appreciated.
Here is my code:
(defn eclid-dist [u v]
  (Math/sqrt (apply + (map #(* % %) (mapv - u v)))))

(def error
  {:p1 [1 2 3]
   :p2 [4 5 6]
   :p3 [7 8 9]
   :p4 [1 2 3]
   :p5 [6 5 4]})

(dotimes [i (dec (count error))]
  (let [dis (eclid-dist (second (nth (seq error) 0))
                        (second (nth (seq error) (+ i 1))))
        max-error (max dis)]
    (println [:dis dis :max-error max-error])))
I tried to save each calculated distance as a vector element separately to prevent overwriting but it was not successful.
You could use the for macro for this. It lets you combine two nested loops to test all pairs. Then you can use max-key to pick the pair with the largest distance:
(defn find-largest-dist-pair [vec-map]
  (apply max-key second
         (for [[[k0 v0] & r] (iterate rest vec-map)
               :while r
               [k1 v1] r]
           [[k0 k1] (eclid-dist v0 v1)])))
(find-largest-dist-pair error)
;; => [[:p3 :p4] 10.392304845413264]
There is nothing wrong with eclid-dist, but you could use the built-in clojure.math namespace (and the ->> thread-last macro for better readability) and rewrite it like this:
(:require [clojure.math :as m])

(defn distance [u v]
  (->> (mapv - u v)
       (mapv #(m/pow % 2))
       (reduce +)
       m/sqrt))
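For example, on two of the points from the data above:

(distance [1 2 3] [7 8 9]) ;=> 10.392304845413264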
Your main problem is how to create unique pairs of points from your data. You could write a recursive function for this:
(defn unique-pairs [point-seq]
  (let [[f & r] point-seq]
    (when (seq r)
      (concat (map #(vector f %) r)
              (unique-pairs r)))))

(def error {:p1 [1 2 3]
            :p2 [4 5 6]
            :p3 [7 8 9]
            :p4 [1 2 3]
            :p5 [6 5 4]})
(unique-pairs (vals error))
or use the library clojure.math.combinatorics:
Dependency: [org.clojure/math.combinatorics "0.1.6"]
(:require [clojure.math.combinatorics :as combi])
(combi/combinations (vals error) 2)
Note that these functions return slightly different results; it doesn't affect the final answer here, but if you can, you should use combinations.
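The difference comes from the duplicate point (:p1 and :p4 hold the same vector): unique-pairs pairs every position with every later position, while combinations, as I understand its multiset semantics, collapses pairs with equal values, so it can return fewer pairs here; the maximal distance is the same either way.

(count (unique-pairs (vals error)))         ;=> 10
(count (combi/combinations (vals error) 2)) ;=> fewer, since duplicate value pairs collapse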
Now, you have to compute distance for all these pairs and return the pair with the largest one:
(defn max-distance [point-map]
  (->> (combi/combinations (vals point-map) 2)
       (map (fn [[u v]] {:u u :v v :distance (distance u v)}))
       (apply max-key :distance)))
(max-distance error)
=> {:u [1 2 3], :v [7 8 9], :distance 10.392304845413264}
In Clojure I want to find the result of multiple reductions while only consuming the sequence once. In Java I would do something like the following:
double min = Double.POSITIVE_INFINITY;
double max = Double.NEGATIVE_INFINITY;
for (Item item : items) {
    double price = item.getPrice();
    if (price < min) {
        min = price;
    }
    if (price > max) {
        max = price;
    }
}
In Clojure I could do much the same thing using loop and recur, but it's not very composable; I'd like something that lets you add other aggregation functions as needed.
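For reference, a direct loop/recur translation of the Java version might look like this; a sketch assuming a non-empty coll of numbers:

; One pass, but the aggregates are hard-wired into the loop body.
(defn min-max [coll]
  (loop [mn (first coll)
         mx (first coll)
         xs (rest coll)]
    (if-let [[x & more] (seq xs)]
      (recur (min mn x) (max mx x) more)
      [mn mx])))

(min-max [4 3 6 7 0 1 8 2 5 9]) ;=> [0 9]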
I've written the following function to do this:
(defn reduce-multi
  "Given a sequence of fns and a coll, returns a vector of the result of each fn
  when reduced over the coll."
  [fns coll]
  (let [n (count fns)
        r (rest coll)
        initial-v (transient (into [] (repeat n (first coll))))
        fns (into [] fns)
        reduction-fn
        (fn [v x]
          (loop [v-current v, i 0]
            (let [y (nth v-current i)
                  f (nth fns i)
                  v-new (assoc! v-current i (f y x))]
              (if (= i (- n 1))
                v-new
                (recur v-new (inc i))))))]
    (persistent! (reduce reduction-fn initial-v r))))
This can be used in the following way:
(reduce-multi [max min] [4 3 6 7 0 1 8 2 5 9])
=> [9 0]
I appreciate that it's not implemented in the most idiomatic way, but the main problem is that it's about 10x as slow as doing the reductions one at a time. This might be useful when performing lots of reductions where the seq involves heavy IO, but surely it could be better.
Is there something in an existing Clojure library that would do what I want? If not, where am I going wrong in my function?
That's what I would do: simply delegate this task to the core reduce function, like this:
(defn multi-reduce
  ([fs accs xs] (reduce (fn [accs x] (doall (map #(%1 %2 x) fs accs)))
                        accs xs))
  ([fs xs] (when (seq xs)
             (multi-reduce fs (repeat (count fs) (first xs))
                           (rest xs)))))
In the REPL:
user> (multi-reduce [+ * min max] (range 1 10))
(45 362880 1 9)
user> (multi-reduce [+ * min max] [10])
(10 10 10 10)
user> (multi-reduce [+ * min max] [])
nil
user> (multi-reduce [+ * min max] [1 1 1000 0] [])
[1 1 1000 0]
user> (multi-reduce [+ * min max] [1 1 1000 0] [1])
(2 1 1 1)
user> (multi-reduce [+ * min max] [1 1 1000 0] (range 1 10))
(46 362880 1 9)
user> (multi-reduce [max min] (range 1000000))
(999999 0)
The code for reduce is fast for reducible collections. So it's worth trying to base multi-reduce on core reduce. To do so, we have to be able to construct reducing functions of the right shape. An ancillary function to do so is ...
(defn juxt-reducer [f g]
  (fn [[fa ga] x] [(f fa x) (g ga x)]))
Now we can define the function you want, which combines juxt with reduce as ...
(defn juxt-reduce
  ([[f g] coll]
   (if-let [[x & xs] (seq coll)]
     (juxt-reduce (list f g) [x x] xs)
     [(f) (g)]))
  ([[f g] init coll]
   (reduce (juxt-reducer f g) init coll)))
For example,
(juxt-reduce [max min] [4 3 6 7 0 1 8 2 5 9]) ;=> [9 0]
The above follows the shape of core reduce. It can clearly be extended to cope with more than two functions. And I'd expect it to be faster than yours for reducible collections.
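For instance, the extension to arbitrarily many functions could be sketched like this (juxt-reducer* is a hypothetical name, not part of the answer above):

(defn juxt-reducer* [fs]
  ; one accumulator slot per reducing fn
  (fn [accs x]
    (mapv (fn [f acc] (f acc x)) fs accs)))

(reduce (juxt-reducer* [max min +]) [4 4 4] [3 6 7 0 1 8 2 5 9])
;=> [9 0 45]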
Here is how I would do it:
(ns clj.core
  (:require [clojure.string :as str])
  (:use tupelo.core))

(def data (flatten [(range 5 10) (range 5)]))
(spyx data)

(def result (reduce (fn [cum-result curr-val] ; reducing (accumulator) fn
                      (it-> cum-result
                            (update it :min-val min curr-val)
                            (update it :max-val max curr-val)))
                    {:min-val (first data) :max-val (first data)} ; initial value
                    data)) ; seq to reduce
(spyx result)

(defn -main [])
;=> data => (5 6 7 8 9 0 1 2 3 4)
;=> result => {:min-val 0, :max-val 9}
So the reducing function (fn ...) carries along a map like {:min-val xxx :max-val yyy} through each element of the sequence, updating the min & max values as required at each step.
While this does make only one pass through the data, it is doing a lot of extra work calling update twice per element. Unless your sequence is very unusual, it is probably more efficient to make two (very efficient) passes through the data like:
(def min-val (apply min data))
(def max-val (apply max data))
(spyx min-val)
(spyx max-val)
;=> min-val => 0
;=> max-val => 9
I have the following variable
(def a [[1 2] [3 4] [5 6]])
and want to return
[[1 3 5][2 4 6]]
and if input is
[[1 2] [3 4] [5 6] [7 8 9]] then the required result is
[[1 3 5 7] [2 4 6 8] [9]]
How do I do it in Clojure?
(persistent!
  (reduce
    (fn [acc e]
      (reduce-kv
        (fn [acc2 i2 e2]
          (assoc! acc2 i2 ((fnil conj []) (get acc2 i2) e2)))
        acc
        e))
    (transient [])
    [[1 2 3] [:a :b] [\a] [111] [200 300 400 500]]))
;;=> [[1 :a \a 111 200] [2 :b 300] [3 400] [500]]
An empty vector can be updated at index 0 (via update-in, assoc, or their transient counterparts); a non-empty vector can additionally be updated at the index immediately following its last value.
The reduction here passes the outer accumulator to the inner reducing function, which updates it accordingly and returns it to the outer reducing function, which in turn passes it again to the inner rf for processing the next element.
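Concretely, the vector-growing property being relied on:

(assoc [] 0 :a)       ;=> [:a]
(assoc [:a :b] 2 :c)  ;=> [:a :b :c]
(assoc [:a :b] 9 :c)  ;=> throws IndexOutOfBoundsException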
EDIT: Updated to fastest version.
I like ifett's implementation, though it seems weird to use reduce-kv to build a vector that could easily be built with map/mapv.
So, here is how I would've done it:
(defn transpose [v]
  (mapv (fn [ind]
          (mapv #(get % ind)
                (filter #(contains? % ind) v)))
        (->> (map count v)
             (apply max)
             range)))

(->> (range)
     (map (fn [i]
            (->> a
                 (filter #(contains? % i))
                 (map #(nth % i)))))
     (take-while seq))
Notice that this algorithm creates a lazy seq of lazy seqs, so you only pay for the transformations you actually consume. If you insist on creating vectors instead, wrap the forms in vec at the necessary places; or, if you are using ClojureScript or don't mind a Clojure 1.7 alpha, use transducers to create vectors eagerly without paying for laziness or immutability:
(into []
      (comp
        (map (fn [i]
               (into [] (comp (filter #(contains? % i))
                              (map #(nth % i)))
                     a)))
        (take-while seq))
      (range))
I find this easy to understand:
(defn nth-column [matrix n]
  (for [row matrix] (nth row n)))

(defn transpose [matrix]
  (for [column (range (count (first matrix)))]
    (nth-column matrix column)))

(transpose a)
;;=> ((1 3 5) (2 4 6))
nth-column is a list comprehension generating a sequence from the nth element of each row.
Then transpose simply iterates over the column indices, creating a sequence element for each, consisting of (nth-column matrix column), i.e. the sequence of elements in that column. (Note that this version assumes equal-length rows, so it won't handle the ragged [[1 2] [3 4] [5 6] [7 8 9]] case.)
(map
  (partial filter identity)  ;; remove nils in each sub-list
  (take-while
    #(some identity %)       ;; stop on an all-nil sub-list
    (for [i (range)]
      (map #(get % i) a))))  ;; get returns nil on missing values
Use get to obtain nil for missing values, iterate (for) over an infinite range, stop on the first all-nil sub-list, and remove the nils from each sub-list. Add the vector constructor before the first map, and in its function (first argument), if you really need vectors.
EDIT: please leave a comment if you think this is not useful. We can all learn from mistakes.
I need to take some amount of elements from a sequence based on some quantity rule. Here is a solution I came up with:
(defn take-while-not-enough [p len xs]
  (loop [ac 0
         r []
         s xs]
    (if (empty? s)
      r
      (let [new-ac (p ac (first s))]
        (if (>= new-ac len)
          r
          (recur new-ac (conj r (first s)) (rest s)))))))
(take-while-not-enough + 10 [2 5 7 8 2 1]) ; [2 5]
(take-while-not-enough #(+ %1 (%2 1)) 7 [[2 5] [7 8] [2 1]]) ; [[2 5]]
Is there any better way to achieve the same?
Thanks.
UPDATE:
Somebody posted that solution, but then removed it. It does the same as the answer that I accepted, but is more readable. Thank you, anonymous well-wisher!
(defn take-while-not-enough [reducer-fn limit data]
  (->> (rest (reductions reducer-fn 0 data)) ; 1. the sequence of accumulated values
       (map vector data)                     ; 2. paired with the original sequence
       (take-while #(< (second %) limit))    ; 3. until a certain accumulated value
       (map first)))                         ; 4. then extract the original values
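Tracing the pipeline on the first example makes the steps concrete:

(rest (reductions + 0 [2 5 7 8 2 1]))
;=> (2 7 14 16 18 19)                          ; running totals
(map vector [2 5 7 8 2 1] [2 7 14 16 18 19])
;=> ([2 2] [5 7] [7 14] [8 16] [2 18] [1 19])
;; take-while keeps the pairs whose total is < 10: ([2 2] [5 7])
;; map first then recovers the original elements:  (2 5)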
My first thought is to view this problem as a variation on reduce and thus to break the problem into two steps:
count the number of items in the result
take that many from the input
I also took some liberties with the argument names:
user> (defn take-while-not-enough [reducer-fn limit data]
        (take (dec (count (take-while #(< % limit) (reductions reducer-fn 0 data))))
              data))
#'user/take-while-not-enough
user> (take-while-not-enough #(+ %1 (%2 1)) 7 [[2 5] [7 8] [2 1]])
([2 5])
user> (take-while-not-enough + 10 [2 5 7 8 2 1])
(2 5)
This returns a sequence while your examples return vectors; if that is important, you can add a call to vec.
Something that would traverse the input sequence only once:
(defn take-while-not-enough [r v data]
  (->> (rest (reductions (fn [s i] [(r (s 0) i) i]) [0 []] data))
       (take-while (comp #(< % v) first))
       (map second)))
Well, if you want to use flatland/useful, this is a kinda-okay way to use glue:
(defn take-while-not-enough [p len xs]
  (first (glue conj []
               (constantly true)
               #(>= (reduce p 0 %) len)
               xs)))
But it's rebuilding the accumulator for the entire "processed so far" chunk every time it decides whether to grow the chunk more, so it's O(n^2), which will be unacceptable for larger inputs.
The most obvious improvement to your implementation is to make it lazy instead of tail-recursive:
(defn take-while-not-enough [p len xs]
  ((fn step [acc coll]
     (lazy-seq
       (when-let [xs (seq coll)]
         (let [x (first xs)
               acc (p acc x)]
           (when-not (>= acc len)
             (cons x (step acc (rest xs))))))))
   0 xs))
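One payoff of the lazy version is that it also works on infinite inputs, because it stops consing as soon as the limit is reached:

(take-while-not-enough + 10 (range)) ;=> (0 1 2 3)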
Sometimes lazy-seq is straightforward and self-explanatory.
(defn take-while-not-enough
  ([f limit coll] (take-while-not-enough f limit (f) coll))
  ([f limit acc coll]
   (lazy-seq
     (when-let [s (seq coll)]
       (let [fst (first s)
             nacc (f acc fst)]
         (when (< nacc limit)
           (cons fst (take-while-not-enough f limit nacc (rest s)))))))))
Note: f is expected to follow the rules of reduce; in particular, in the three-argument arity the zero-arity call (f) supplies the initial accumulator.
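With that fix, the original example checks out; for + the initial accumulator (f) is 0, and fns that don't support a zero-arity call can use the four-argument arity with an explicit acc:

(take-while-not-enough + 10 [2 5 7 8 2 1])                     ;=> (2 5)
(take-while-not-enough #(+ %1 (%2 1)) 7 0 [[2 5] [7 8] [2 1]]) ;=> ([2 5])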
The following code
(let [coll [1 2 3 4 5]
      filters [#(> % 1) #(< % 5)]]
  (->> coll
       (filter (first filters))
       (filter (second filters))))
Gives me
(2 3 4)
Which is great, but how do I apply all the predicates in filters without having to name them explicitly?
There may be totally better ways of doing this, but ideally I'd like to know an expression that can replace (filter (first filters)) (filter (second filters)) above.
Thanks!
Clojure 1.3 has a new every-pred function, which you could use thusly:
(filter (apply every-pred filters) coll)
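Plugged into the original example:

(let [coll [1 2 3 4 5]
      filters [#(> % 1) #(< % 5)]]
  (filter (apply every-pred filters) coll))
;=> (2 3 4)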
This should work :-
(let [coll [1 2 3 4 5]
      filters [#(> % 1) #(< % 5)]]
  (filter (fn [x] (every? #(% x) filters)) coll))
I can't say I'm very proud of the following, but at least it works and allows for an arbitrary number of filters:
(seq
  (reduce #(clojure.set/intersection
             (set %1)
             (set %2))
          (map #(filter % coll) filters)))
If you can use sets in place of seqs it would simplify the above code as follows:
(reduce clojure.set/intersection (map #(filter % coll) filters))
(let [coll [1 2 3 4 5]
      filters [#(> % 1) #(< % 5)]]
  (reduce (fn [c f] (filter f c)) coll filters))
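The same reduce can be packaged as a small helper; filter-all is a hypothetical name, not from the answer above:

(defn filter-all [coll filters]
  ; each step wraps the seq in another lazy filter,
  ; so all predicates are applied as the result is consumed
  (reduce (fn [c f] (filter f c)) coll filters))

(filter-all [1 2 3 4 5] [#(> % 1) #(< % 5)]) ;=> (2 3 4)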