Is there a better way to write this pivot function in Clojure?

I've written a function to "pivot" some data tables but wonder if there's a simpler way to accomplish the same result.
(defn pivot-ab-ba
[data]
(r/reduce
(fn [result [ks c]]
(assoc-in result ks c))
{}
;; pivot {a [b1 c1 b2 c2]} => [[b1 a] c1] [[b2 a] c2]
(mapcat (fn [[a bcseq]]
;; pivot [a [b c]] => [[[b a] c]]
(mapcat (fn [[b c]] [[[b a] c]]) bcseq))
data)))
(let [data {1 {:good [1 2] :bad [3 4]}
2 {:good [5 6] :bad [7 8]}}]
(pivot-ab-ba data))
; => {:good {1 [1 2], 2 [5 6]}, :bad {1 [3 4], 2 [7 8]}}
This works, but seems over-complicated.
UPDATE:
@TaylorWood proposed a solution below; here is that answer with a modification to avoid passing in the keys that are being pivoted:
(defn pivot [data]
(reduce-kv
(fn [acc k v]
(reduce (fn [acc' k'] (assoc-in acc' [k' k] (k' v)))
acc
(keys v)))
{}
data))
UPDATE 2: Thank you all for your answers. Because there is such a diversity of answers I profiled the results to get an idea about how they performed. Admittedly, this is a single test, but still interesting:
Benchmarks performed with (criterium.core/bench pivot-function)
# Original pivot-ab-ba
Evaluation count : 8466240 in 60 samples of 141104 calls.
Execution time mean : 7.274613 µs
Execution time std-deviation : 108.681498 ns
# @TaylorWood - pivot
Evaluation count : 39848280 in 60 samples of 664138 calls.
Execution time mean : 1.568971 µs
Execution time std-deviation : 32.567822 ns
# @AlanThompson - reorder-tree
Evaluation count : 25999260 in 60 samples of 433321 calls.
Execution time mean : 2.385929 µs
Execution time std-deviation : 33.130731 ns
# @AlanThompson - reorder-tree-reduce
Evaluation count : 14507820 in 60 samples of 241797 calls.
Execution time mean : 4.249135 µs
Execution time std-deviation : 89.933197 ns
# @amalloy - pivot
Evaluation count : 12721980 in 60 samples of 212033 calls.
Execution time mean : 5.087314 µs
Execution time std-deviation : 226.242206 ns

Here's another way to do it:
(defn pivot [data ks]
(reduce-kv
(fn [acc k v]
(reduce (fn [acc' k'] (assoc-in acc' [k' k] (k' v)))
acc
ks))
{}
data))
This takes a map and the expected keys, then reduces over each key/value pair in data, then does another inner reduce over each expected key, grabbing their value from the data map and assoc'ing it into the output map.
(def data
{1 {:good [1 2] :bad [3 4]}
2 {:good [5 6] :bad [7 8]}})
user=> (pivot data [:good :bad])
{:good {1 [1 2], 2 [5 6]}, :bad {1 [3 4], 2 [7 8]}}

Here is how I would do it. I'm using an atom to accumulate the results, but you could translate that to reduce if you really wanted:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test) )
(defn reorder-tree
[data]
(let [result (atom {})]
(doseq [[letter gb-data] data]
(doseq [[gb-key nums-data] gb-data]
(swap! result assoc-in [gb-key letter] nums-data)))
@result))
(dotest
(let [data {:a {:good [1 2]
:bad [3 4]}
:b {:good [5 6]
:bad [7 8]}}
expected {:good {:a [1 2]
:b [5 6]}
:bad {:a [3 4]
:b [7 8]}}]
(is= expected (reorder-tree data))))
Update: OK, I couldn't resist writing the reduce version using a nested for:
(defn reorder-tree-reduce
[data]
(reduce
(fn [cum-map [letter gb-key nums-data]]
(assoc-in cum-map [gb-key letter] nums-data))
{}
(for [[letter gb-data] data
[gb-key nums-data] gb-data]
[letter gb-key nums-data])))

Reduce is an unnecessarily low-level function to accomplish this goal. I'd prefer to generate a sequence of maps, in a way that is self-evidently correct, and then use merge to combine them. Interleaving the combination logic with the production logic makes it harder for a reader of the function to see what it's doing and whether it's correct. Relying instead on well-understood, simple functions merge and merge-with means there's no extra complexity that has to be understood anew.
(defn pivot [coll]
(apply merge-with merge
(for [[a m] coll
[b x] m]
{b {a x}})))
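To make the two stages of this approach visible, here is a small sketch that names the intermediate sequence before merging it. The names `data`, `pieces`, and `pivoted` are only illustrative; `data` is the sample map from the question.

```clojure
(def data {1 {:good [1 2] :bad [3 4]}
           2 {:good [5 6] :bad [7 8]}})

;; Stage 1: produce one tiny nested map per leaf, with the two keys swapped.
(def pieces
  (for [[a m] data
        [b x] m]
    {b {a x}}))

;; Stage 2: combine the pieces. merge-with resolves clashes on the outer key
;; by calling merge on the two inner maps.
(def pivoted (apply merge-with merge pieces))

pivoted
;; => {:good {1 [1 2], 2 [5 6]}, :bad {1 [3 4], 2 [7 8]}}
```

Because each piece holds exactly one leaf, the production step can be inspected on its own before any combining happens.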

Related

Extracting two map elements with the largest distance in Clojure

I am trying to extract two elements of a map with the largest distance. For that, I defined the function for calculating the distance and can obtain the distance between the first element (p1) and other elements of the map. But I need to calculate distances between the second item (p2) and the next ones (p3, p4, p5), the third item (p3) and (p4, p5), the fourth item (p4) and fifth item (p5). Then I need to identify the maximum amount between all distances and return the 2 items with the largest distance and the distance itself. Any help is highly appreciated.
Here is my code:
(defn eclid-dist
[u v]
(Math/sqrt (apply + (map #(* % %) (mapv - u v)))))
(def error
{:p1 [1 2 3]
:p2 [4 5 6]
:p3 [7 8 9]
:p4 [1 2 3]
:p5 [6 5 4]})
(dotimes [i (dec (count error))]
(let [dis (eclid-dist (second (nth (seq error) 0))
(second (nth (seq error) (+ i 1))))
max-error (max dis)]
(println [':dis' dis ':max-error' max-error])))
I tried to save each calculated distance as a vector element separately to prevent overwriting but it was not successful.
You could use the for macro for this. It lets you combine two nested loops to test all pairs. Then you can use max-key to pick the pair with the largest distance:
(defn find-largest-dist-pair [vec-map]
(apply max-key second
(for [[[k0 v0] & r] (iterate rest vec-map)
:while r
[k1 v1] r]
[[k0 k1] (eclid-dist v0 v1)])))
(find-largest-dist-pair error)
;; => [[:p3 :p4] 10.392304845413264]
There is nothing wrong with eclid-dist, you could just use the dedicated Clojure library clojure.math (and ->> thread-last macro for better readability) and rewrite it like this:
(:require [clojure.math :as m])
(defn distance [u v]
(->> (mapv - u v)
(mapv #(m/pow % 2))
(reduce +)
m/sqrt))
Your main problem is, how to create unique pairs of points from your data. You could write a recursive function for this:
(defn unique-pairs [point-seq]
(let [[f & r] point-seq]
(when (seq r)
(concat (map #(vector f %) r)
(unique-pairs r)))))
(def error {:p1 [1 2 3]
:p2 [4 5 6]
:p3 [7 8 9]
:p4 [1 2 3]
:p5 [6 5 4]})
(unique-pairs (vals error))
or use library clojure.math.combinatorics:
Dependency: [org.clojure/math.combinatorics "0.1.6"]
(:require [clojure.math.combinatorics :as combi])
(combi/combinations (vals error) 2)
Note that these functions return slightly different results. It doesn't affect the final result here, but if you can, you should use combinations.
Now, you have to compute distance for all these pairs and return the pair with the largest one:
(defn max-distance [point-map]
(->> (combi/combinations (vals point-map) 2)
(map (fn [[u v]] {:u u :v v :distance (distance u v)}))
(apply max-key :distance)))
(max-distance error)
=> {:u [1 2 3], :v [7 8 9], :distance 10.392304845413264}
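Since the question asks for the two items themselves, a variant that keeps the map keys may be useful. The following is a minimal sketch using only clojure.core: `max-distance-entry` is a hypothetical name, `distance` is re-implemented with `Math/sqrt` so the block has no dependencies, and the map is assumed to hold at least two points. When several pairs tie for the largest distance (as `:p1`/`:p3` and `:p3`/`:p4` do in the sample data), which pair comes back depends on `max-key`.

```clojure
(defn distance
  "Euclidean distance between two equal-length numeric vectors."
  [u v]
  (Math/sqrt (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) u v))))

(defn unique-pairs
  "Lazy seq of every unordered pair [x y] drawn from xs."
  [xs]
  (lazy-seq
    (when-let [[f & r] (seq xs)]
      (concat (map #(vector f %) r)
              (unique-pairs r)))))

(defn max-distance-entry
  "Pairs up the map entries (not just the values), so the keys survive."
  [point-map]
  (->> (unique-pairs point-map)
       (map (fn [[[k1 u] [k2 v]]]
              {:keys [k1 k2] :distance (distance u v)}))
       (apply max-key :distance)))

(def points {:p1 [1 2 3] :p2 [4 5 6] :p3 [7 8 9] :p4 [1 2 3] :p5 [6 5 4]})

(max-distance-entry points)
```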

Clojure - Function that returns all the indices of a vector of vectors

If I have a vector [[[1 2 3] [4 5 6] [7 8 9]] [[10 11] [12 13]] [[14] [15]]]
How can I return the positions of each element in the vector?
For example 1 has index [0 0 0], 2 has index [0 0 1], etc
I want something like
(some-fn [[[1 2 3] [4 5 6] [7 8 9]] [[10 11] [12 13]] [[14] [15]]] 1)
=> [0 0 0]
I know that if I have a vector [1 2 3 4], I can do (.indexOf [1 2 3 4] 1) => 0 but how can I extend this to vectors within vectors.
Thanks
And one more solution, with zippers:
(require '[clojure.zip :as z])
(defn find-in-vec [x data]
(loop [curr (z/vector-zip data)]
(cond (z/end? curr) nil
(= x (z/node curr)) (let [path (rseq (conj (z/path curr) x))]
(reverse (map #(.indexOf %2 %1) path (rest path))))
:else (recur (z/next curr)))))
user> (find-in-vec 11 data)
(1 0 1)
user> (find-in-vec 12 data)
(1 1 0)
user> (find-in-vec 18 data)
nil
user> (find-in-vec 8 data)
(0 2 1)
The idea is to do a depth-first search for the item, then reconstruct the path to it by looking up the index of each node in its parent.
Maybe something like this.
Unlike Asthor's answer it works for any nesting depth (until it runs out of stack). Their answer will give the indices of all items that match, while mine will return the first one. Which one you want depends on the specific use-case.
(defn indexed [coll]
(map-indexed vector coll))
(defn nested-index-of [coll target]
(letfn [(step [indices coll]
(reduce (fn [_ [i x]]
(if (sequential? x)
(when-let [result (step (conj indices i) x)]
(reduced result))
(when (= x target)
(reduced (conj indices i)))))
nil, (indexed coll)))]
(step [] coll)))
(def x [[[1 2 3] [4 5 6] [7 8 9]] [[10 11] [12 13]] [[14] [15]]])
(nested-index-of x 2) ;=> [0 0 1]
(nested-index-of x 15) ;=> [2 1 0]
Edit: Target never changes, so the inner step fn doesn't need it as an argument.
Edit 2: Cause I'm procrastinating here, and recursion is a nice puzzle, maybe you wanted the indices of all matches.
You can tweak my first function slightly to carry around an accumulator.
(defn nested-indices-of [coll target]
(letfn [(step [indices acc coll]
(reduce (fn [acc [i x]]
(if (sequential? x)
(step (conj indices i) acc x)
(if (= x target)
(conj acc (conj indices i))
acc)))
acc, (indexed coll)))]
(step [] [] coll)))
(def y [[[1 2 3] [4 5 6] [7 8 9]] [[10 11] [12 13]] [[14] [15 [16 17 4]]]])
(nested-indices-of y 4) ;=> [[0 1 0] [2 1 1 2]]
Vectors within vectors are no different to ints within vectors:
(.indexOf [[[1 2 3] [4 5 6] [7 8 9]] [[10 11] [12 13]] [[14] [15]]] [[14] [15]])
;;=> 2
The above might be a bit difficult to read, but [[14] [15]] is the third element.
Something like
(defn indexer [vec number]
(for [[x set1] (map-indexed vector vec)
[y set2] (map-indexed vector set1)
[z val] (map-indexed vector set2)
:when (= number val)]
[x y z]))
Written directly into here so not tested. Giving more context on what this would be used for might make it easier to give a good answer as this feels like something you shouldn't end up doing in Clojure.
You can also try and flatten the vectors in some way
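One possible reading of "flatten" here is to turn the nested vector into a flat sequence of [path value] pairs and then filter it. The sketch below assumes that reading; `leaf-paths` and `paths-of` are hypothetical names, and anything that is not `sequential?` is treated as a leaf.

```clojure
(defn leaf-paths
  "Lazy seq of [path value] pairs, one per non-sequential leaf of coll."
  ([coll] (leaf-paths [] coll))
  ([path coll]
   (mapcat (fn [i x]
             (if (sequential? x)
               (leaf-paths (conj path i) x)
               [[(conj path i) x]]))
           (range)
           coll)))

(defn paths-of
  "Paths of every leaf equal to target, in depth-first order."
  [coll target]
  (for [[path x] (leaf-paths coll)
        :when (= x target)]
    path))

(def y [[[1 2 3] [4 5 6] [7 8 9]] [[10 11] [12 13]] [[14] [15 [16 17 4]]]])

(paths-of y 4)
;; => ([0 1 0] [2 1 1 2])
```

Taking `first` of the result gives the first match only, and because the sequence is lazy the rest of the structure is not necessarily searched.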
Another solution, which finds the path of every occurrence of a given number.
With functional programming you can usually start with a broad, general, elegant, bite-sized solution. You can always optimize later with language constructs or techniques as needed (tail recursion, an accumulator, lazy-seq, etc.).
(defn indexes-of-value [v coll]
(into []
(comp (map-indexed #(if (== v %2) %1))
(remove nil?))
coll))
(defn coord' [v path coll]
(cond
;; node is a leaf: empty or coll of numbers
(or (empty? coll)
(number? (first coll)))
(when-let [indexes (seq (indexes-of-value v coll))]
(map #(conj path %) indexes))
;; node is branch: a coll of colls
(coll? (first coll))
(seq (sequence (comp (map-indexed vector)
(mapcat #(coord' v (conj path (first %)) (second %))))
coll))))
(defn coords [v coll] (coord' v [] coll))
Execution examples:
(def coll [[2 1] [] [7 8 9] [[] [1 2 2 3 2]]])
(coords 2 coll)
=> ([0 0] [3 1 1] [3 1 2] [3 1 4])
As a bonus you can write a function to test if paths are all valid:
(defn valid-coords? [v coll coords]
(->> coords
(map #(get-in coll %))
(remove #(== v %))
empty?))
and try the solution with input generated with clojure.spec:
(s/def ::leaf-vec (s/coll-of nat-int? :kind vector?))
(s/def ::branch-vec (s/or :branch (s/coll-of ::branch-vec :kind vector?
:min-count 1)
:leaf ::leaf-vec))
(let [v 1
coll (first (gen/sample (s/gen ::branch-vec) 1))
res (coords v coll)]
(println "generated coll: " coll)
(if-not (valid-coords? v coll res)
(println "Error:" res)
:ok))
Here is a function that can recursively search for a target value, keeping track of the indexes as it goes:
(ns tst.clj.core
(:use clj.core tupelo.test)
(:require [tupelo.core :as t] ))
(t/refer-tupelo)
(defn index-impl
[idxs data tgt]
(apply glue
(for [[idx val] (zip (range (count data)) data)]
(let [idxs-curr (append idxs idx)]
(if (sequential? val)
(index-impl idxs-curr val tgt)
(if (= val tgt)
[{:idxs idxs-curr :val val}]
[nil]))))))
(defn index [data tgt]
(keep-if not-nil? (index-impl [] data tgt)))
(dotest
(let [data-1 [1 2 3]
data-2 [[1 2 3]
[10 11]
[]]
data-3 [[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11]
[12 13]]
[[20]
[21]]
[[30]]
[[]]]
]
(spyx (index data-1 2))
(spyx (index data-2 10))
(spyx (index data-3 13))
(spyx (index data-3 21))
(spyx (index data-3 99))
))
with results:
(index data-1 2) => [{:idxs [1], :val 2}]
(index data-2 10) => [{:idxs [1 0], :val 10}]
(index data-3 13) => [{:idxs [1 1 1], :val 13}]
(index data-3 21) => [{:idxs [2 1 0], :val 21}]
(index data-3 99) => []
If we add repeated values we get the following:
data-4 [[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11]
[12 2]]
[[20]
[21]]
[[30]]
[[2]]]
(index data-4 2) => [{:idxs [0 0 1], :val 2}
{:idxs [1 1 1], :val 2}
{:idxs [4 0 0], :val 2}]

How to transpose a nested vector in clojure

I have the following variable
(def a [[1 2] [3 4] [5 6]])
and want to return
[[1 3 5][2 4 6]]
and if input is
[[1 2] [3 4] [5 6] [7 8 9]] then the required result is
[[1 3 5 7] [2 4 6 8] [9]]
How to do it in clojure?
(persistent!
(reduce
(fn [acc e]
(reduce-kv
(fn [acc2 i2 e2]
(assoc! acc2 i2 ((fnil conj []) (get acc2 i2) e2)))
acc
e))
(transient [])
[[1 2 3] [:a :b] [\a] [111] [200 300 400 500]]))
;;=> [[1 :a \a 111 200] [2 :b 300] [3 400] [500]]
An empty vector can be updated via the update-in fn at the 0th index, a non-empty vector can be, additionally, updated at the index immediately following the last value.
The reduction here is about passing the outer accumulator to the inner reducing function, updating it accordingly, and then returning it back to the outer reducing function, which in turn will pass again to the inner rf for processing the next element.
EDIT: Updated to fastest version.
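The same accumulator-passing reduction can be written without transients, which may make the two points above easier to follow. This is a sketch, not the answer's benchmarked version: `transpose-ragged` is a hypothetical name, rows are assumed to be vectors (so `reduce-kv` can walk them by index), and `update` requires Clojure 1.7 or later.

```clojure
(defn transpose-ragged
  "Transposes a vector of vectors whose rows may differ in length."
  [rows]
  (reduce (fn [acc row]
            ;; The inner reduction receives the outer accumulator,
            ;; updates it, and hands it back to the outer reduction.
            (reduce-kv (fn [acc i x]
                         ;; i is either an existing index or exactly
                         ;; (count acc), which is the one extra position
                         ;; a vector allows; fnil supplies [] in that case.
                         (update acc i (fnil conj []) x))
                       acc
                       row))
          []
          rows))

(transpose-ragged [[1 2] [3 4] [5 6] [7 8 9]])
;; => [[1 3 5 7] [2 4 6 8] [9]]
```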
I like ifett's implementation, though it seems weird to use reduce-kv to build a vector that could easily be built with map/mapv.
So, here is how I would've done it:
(defn transpose [v]
(mapv (fn [ind]
(mapv #(get % ind)
(filter #(contains? % ind) v)))
(->> (map count v)
(apply max)
range)))
(->> (range)
(map (fn [i]
(->> a
(filter #(contains? % i))
(map #(nth % i)))))
(take-while seq))
Notice that this algorithm creates a lazy seq of lazy seqs, so you only pay for the transformations you actually consume. If you insist on creating vectors instead, wrap the forms in vec at the necessary places. Alternatively, if you are using ClojureScript or don't mind a Clojure 1.7 alpha, use transducers to create vectors eagerly without paying for laziness or immutability:
(into []
(comp
(map (fn [i]
(into [] (comp (filter #(contains? % i))
(map #(nth % i)))
a)))
(take-while seq))
(range))
I find this easy to understand:
(defn nth-column [matrix n]
(for [row matrix] (nth row n)))
(defn transpose [matrix]
(for [column (range (count (first matrix)))]
(nth-column matrix column)))
(transpose a)
=> ((1 3 5) (2 4 6))
nth-column is a list comprehension generating a sequence from the nth element of each sequence (of rows).
Then transpose simply iterates over the columns, creating a sequence element for each one, consisting of (nth-column matrix column), i.e. the sequence of elements for that column.
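For rectangular input such as the first example in the question, the column-by-column idea can also be expressed with `mapv` over all rows at once. This is a sketch under two assumptions: the matrix is non-empty, and truncating to the shortest row is acceptable. It therefore does not cover the ragged second example. `transpose-rect` is a hypothetical name.

```clojure
(def a [[1 2] [3 4] [5 6]])

(defn transpose-rect
  "Transposes a non-empty matrix; stops at the shortest row."
  [matrix]
  ;; (apply mapv vector [r1 r2 r3]) is (mapv vector r1 r2 r3):
  ;; the n-th call to vector receives the n-th element of every row.
  (apply mapv vector matrix))

(transpose-rect a)
;; => [[1 3 5] [2 4 6]]
```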
(map
(partial filter identity) ;;remove nil in each sub-list
(take-while
#(some identity %) ;;stop on all nil sub-list
(for [i (range)]
(map #(get % i) a)))) ;; get returns nil on missing values
Use get to have nil on missing values, iterate (for) over an infinite range, stop at the first sub-list that is all nil, and remove nil from the sub-lists. Add a vector constructor before the first map and in its function (first argument) if you really need vectors.
EDIT: please leave a comment if you think this is not useful. We can all learn from mistakes.

How to create a map from a list of key-value pairs using values as a predicate in Clojure?

Just started learning Clojure, so I imagine my main issue is I don't know how to formulate the problem correctly to find an existing solution. I have a map:
{[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1}
and I'd like to "transform" it to:
{[0 1] "a", [1 1] "a"}
i.e. use the first two elements of the composite key as the new key, and the third element as the value, taken from the key-value pair that had the highest value in the original map.
I can easily create a new map structure:
=> (into {} (for [[[x y z] v] {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1}] [[x y] {z v}]))
{[0 1] {"b" 1}, [1 1] {"a" 1}}
but into accepts no predicates so last one wins. I also experimented with :let and merge-with but can't seem to correctly refer to the map, eliminate the unwanted pairs or replace values of the map while processing.
You can do this by threading together a series of sequence transformations.
(->> data
(group-by #(->> % key (take 2)))
vals
(map (comp first first (partial sort-by (comp - val))))
(map (juxt #(subvec % 0 2) #(% 2)))
(into {}))
;{[0 1] "a", [1 1] "a"}
... where
(def data {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1})
You build up the solution line by line. I recommend you follow in the footsteps of the construction, starting with ...
(->> data
(group-by #(->> % key (take 2))))
;{(0 1) [[[0 1 "a"] 2] [[0 1 "b"] 1]], (1 1) [[[1 1 "a"] 1]]}
Stacking up layers of (lazy) sequences can run fairly slowly, but the transducers available in Clojure 1.7 will allow you to write faster code in this idiom, as seen in this excellent answer.
Into tends to be most useful when you just need to take a seq of values and with no additional transformation construct a result from it using only conj. Anything else where you are performing construction tends to be better suited by preprocessing such as sorting, or by a reduction which allows you to perform accumulator introspection such as you want here.
First of all we have to be able to compare two strings..
(defn greater? [^String a ^String b]
(> (.compareTo a b) 0))
Now we can write a transformation that compares the current value in the accumulator to the "next" value and keeps the maximum. -> is used somewhat gratuitously to make the update function more readable.
(defn transform [input]
(-> (fn [acc [[x y z] _]] ;; take the acc, [k, v], destructure k discard v
(let [key [x y]] ;; construct key into accumulator
(if-let [v (acc key)] ;; if the key is set
(if (greater? z v) ;; and z (the new val) is greater
(assoc acc key z) ;; then update
acc) ;; else do nothing
(assoc acc key z)))) ;; else update
(reduce {} input))) ;; do that over all [k, v]s from empty acc
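The reduction above keeps the greatest string per key. If the goal is instead to keep the third key element whose count (the map's value) is highest, as the question describes, the accumulator can carry the count alongside the winner. The following is a sketch under that assumption; `best-by-count` is a hypothetical name, and on equal counts the entry seen first is kept.

```clojure
(defn best-by-count
  "For each [x y] prefix, keeps the z whose value in input is highest."
  [input]
  (->> input
       ;; Pass 1: acc maps [x y] -> [best-z best-n].
       (reduce (fn [acc [[x y z] n]]
                 (let [k [x y]
                       [_ best-n] (acc k)]
                   (if (or (nil? best-n) (> n best-n))
                     (assoc acc k [z n])
                     acc)))
               {})
       ;; Pass 2: drop the counts, keeping only the winning z.
       (reduce-kv (fn [m k [z _]] (assoc m k z)) {})))

(best-by-count {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1})
;; => {[0 1] "a", [1 1] "a"}
```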
user> (def m {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1})
#'user/m
user> (->> m
keys
sort
reverse
(mapcat (fn [x]
(vector (-> x butlast vec)
(last x))))
(apply sorted-map))
;=> {[0 1] "a", [1 1] "a"}

Clojure: how to merge several sorted vectors of vectors into common structure?

Need your help. Stuck on an intuitively simple task.
I have a few vectors of vectors. The first element of each of the sub-vectors is a numeric key. All parent vectors are sorted by these keys. For example:
[[1 a b] [3 c d] [4 f d] .... ]
[[1 aa bb] [2 cc dd] [3 ww qq] [5 f]... ]
[[3 ccc ddd] [4 fff ddd] ...]
To clarify: some key values may be missing from the nested vectors, but the sort order is guaranteed.
I need to merge all of these vectors into a unified structure by numeric key. I also need to know when a key was missing from one or more of the original vectors.
Like this:
[ [[1 a b][1 aa bb][]] [[][2 cc dd]] [[3 c d][3 ww qq][3 ccc ddd]] [[4 f d][][4 fff dd]]...]
You can break the problem down into two parts:
1) get the unique keys in sorted order
2) for each unique key, iterate through the list of vectors, and output either the entry for the key, or an empty list if missing
To get the unique keys, just pull out all the keys into lists, concat them into one big list, and then put them into a sorted-set:
(into
(sorted-set)
(apply concat
(for [vector vectors]
(map first vector))))
if we start with a list of vectors of:
(def vectors
[[[1 'a 'b] [3 'c 'd] [4 'f 'd]]
[[1 'aa 'bb] [2 'cc 'dd] [3 'ww 'qq] [5 'f]]
[[3 'ccc 'ddd] [4 'fff 'ddd]]])
then we get a sorted set of:
=> #{1 2 3 4 5}
So far so good. Now, for each key in that sorted set, we need to iterate through the vectors and get the entry with that key, or an empty list if it's missing. You can do that using two 'for' forms and then 'some' to find the entry (if present):
(for [k (sorted-set 1 2 3 4 5)]
(for [vector vectors]
(or (some #(when (= (first %) k) %) vector )
[])))
this yields:
=> (([1 a b] [1 aa bb] []) ([] [2 cc dd] []) ([3 c d] [3 ww qq] [3 ccc ddd]) ([4 f d] [] [4 fff ddd]) ([] [5 f] []))
which I think is what you want. (if you need vectors and not lists, just use "(into [] ...)" in the appropriate places.)
Putting it all together, we get:
(defn unify-vectors [vectors]
(for [k (into (sorted-set)
(apply concat
(for [vector vectors]
(map first vector))))]
(for [vector vectors]
(or (some #(when (= (first %) k) %) vector)
[]))))
(unify-vectors
[[[1 'a 'b] [3 'c 'd] [4 'f 'd]]
[[1 'aa 'bb] [2 'cc 'dd] [3 'ww 'qq] [5 'f]]
[[3 'ccc 'ddd] [4 'fff 'ddd]]])
=> (([1 a b] [1 aa bb] []) ([] [2 cc dd] []) ([3 c d] [3 ww qq] [3 ccc ddd]) ([4 f d] [] [4 fff ddd]) ([] [5 f] []))
I do not have a complete solution for you, but as a hint: use group-by to group the entries of each vector by their first element.
This will be more idiomatic and maybe just a few lines when it is ready.
So you could write something like
(group-by first [[1 :a :b] [3 :c :d] [4 :f :d]])
and do this for all vectors. Then you can sort / merge them with the keys provided by group-by.
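One way the hint might be completed is sketched below. It indexes each vector by key with `into {}` rather than `group-by`, on the assumption that a key appears at most once per vector; `unify-by-key` is a hypothetical name, and the transducer arities of `into`, `map`, and `mapcat` require Clojure 1.7 or later.

```clojure
(def vectors
  [[[1 'a 'b] [3 'c 'd] [4 'f 'd]]
   [[1 'aa 'bb] [2 'cc 'dd] [3 'ww 'qq] [5 'f]]
   [[3 'ccc 'ddd] [4 'fff 'ddd]]])

(defn unify-by-key
  "For every key, in sorted order, returns one slot per input vector:
  the matching entry, or [] when that vector has no such key."
  [vectors]
  (let [;; One lookup map per input vector: key -> whole entry.
        indexes (map #(into {} (map (juxt first identity)) %) vectors)
        ;; Every key seen in any vector, sorted.
        ks      (into (sorted-set) (mapcat keys) indexes)]
    (vec (for [k ks]
           (mapv #(get % k []) indexes)))))

(unify-by-key vectors)
```

Building the lookup maps once avoids rescanning each vector for every key, which the nested `some` approach does.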
This is a simple workaround that doesn't follow Clojure best practices (it redefines a var with def inside a loop); it is only meant to convey the basic idea.
(def vectors
[
[[1 'a 'b] [3 'c 'd] [4 'f 'd]]
[[1 'aa 'bb] [2 'cc 'dd] [3 'ww 'qq] [5 'f]]
[[3 'ccc 'ddd] [4 'fff 'ddd]]]
)
(loop [i 1
result []]
(def sub-result [])
(doseq [v vectors]
(doseq [sub-v v]
(if
(= i (first sub-v))
(def sub-result (into sub-result [sub-v]))))
(if-not
(some #{i}
(map first v))
(def sub-result (into sub-result [[]]))
))
(if (< i 6)
(recur (inc i) (into result [sub-result]))
(print result)))