I have a data set that looks like this
[{1 "a"} {2 "b"} {3 "c"}]
I want to transform it into a cummulative map that looks like
{1 "a" 3 "b" 6 "c"}
I think my current approach is long winded. So far I have come up with
(reduce
(fn [sum item]
(assoc sum (+ (reduce + (keys sum))
(key (first item)))
(val (first item))))
split-map)
but the addition on the keys is incorrect. Does anyone know how I can improve on this?
and one more fun version:
(->> data
(reductions (fn [[sum] m] (update (first m) 0 + sum)) [0])
rest
(into {}))
;;=> {1 "a", 3 "b", 6 "c"}
the trick is that reduction function operates on previous and current key-value pairs updating the current pair's key:
(reductions (fn [[sum] m] (update (first m) 0 + sum)) [0] data)
;;=> ([0] [1 "a"] [3 "b"] [6 "c"])
If you fancy transducers:
(require '[net.cgrand.xforms :as xf])
(let [data [{1 "a"} {2 "b"} {3 "c"}]]
(into {} (comp
(map first)
(xf/multiplex [(map last)
(comp (map first) (xf/reductions +) (drop 1))])
(partition-all 2)) data))
=> {1 "a", 3 "b", 6 "c"}
Here is a possible implementation of the computation and it makes extensive use of Clojure sequence functions:
(defn cumul-pairs [data]
(zipmap (rest (reductions ((map (comp key first)) +) 0 data))
(map (comp val first) data)))
(cumul-pairs [{1 "a"} {2 "b"} {3 "c"}])
;; => {1 "a", 3 "b", 6 "c"}
In this code, the expression (rest (reductions ((map (comp first keys)) +) 0 data)) computes the keys of the resulting map and the expression (map (comp first vals) data) computes the values. Then we combine them with zipmap. The function reductions works just like reduce but returns a sequence of all intermediate results instead of just the last one. The curious looking subexpression ((map (comp first keys)) +) is the reducing function, where we use a mapping transducer to construct a reducing function from the + reducing function that will map the input value before adding it.
This is a bit of an awkward problem to solve succintly. Here is one way:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(dotest
(let-spy
[x1 [{1 "a"} {2 "b"} {3 "c"}]
nums (mapv #(first (first %)) x1)
chars (mapv #(second (first %)) x1)
nums-cum (reductions + nums)
pairs (mapv vector nums-cum chars) ; these 2 lines are
result (into {} pairs)] ; like `zipmap`
(is= result {1 "a", 3 "b", 6 "c"})))
By using my favorite template project
we are able to use let-spy from the Tupelo library
and see the results printed at each step:
-----------------------------------
Clojure 1.10.3 Java 15.0.2
-----------------------------------
Testing tst.demo.core
x1 => [{1 "a"} {2 "b"} {3 "c"}]
nums => [1 2 3]
chars => ["a" "b" "c"]
nums-cum => (1 3 6)
pairs => [[1 "a"] [3 "b"] [6 "c"]]
result => {1 "a", 3 "b", 6 "c"}
Ran 2 tests containing 1 assertions.
0 failures, 0 errors.
When it is working with all unit tests, just trim off the -spy part to leave a normal (let ...)form.
Be sure to see this list of documentation sources, especially the Clojure CheatSheet.
Probably the easiest (most readable) version:
(def ml [{1 "a"} {2 "b"} {3 "c"}])
(defn cumsum [l] (reductions + l))
(let [m (into (sorted-map) ml)]
(zipmap (cumsum (keys m)) (vals m)))
;; => {1 "a", 3 "b", 6 "c"}
How about this?
(defn f [v]
(zipmap (reductions + (mapcat keys v)) (mapcat vals v)))
which works with the original vector of maps:
(f [{1 "a"} {2 "b"} {3 "c"}])
;; => {1 "a", 3 "b", 6 "c"}
.. and also with maps of varying length:
(f [{1 "a"} {2 "b"} {3 "c" 4 "d"} {5 "e" 6 "f" 7 "g"}])
;; => {1 "a", 3 "b", 6 "c", 10 "d", 15 "e", 21 "f", 28 "g"}
Related
Getting data from the database as a list of maps (LazySeq) leaves me in need of transforming it into a map of maps.
I tried to 'assoc' and 'merge', but that didn't bring the desired result because of the nesting.
This is the form of my data:
(def data (list {:structure 1 :cat "A" :item "item1" :val 0.1}
{:structure 1 :cat "A" :item "item2" :val 0.2}
{:structure 1 :cat "B" :item "item3" :val 0.4}
{:structure 2 :cat "A" :item "item1" :val 0.3}
{:structure 2 :cat "B" :item "item3" :val 0.5}))
I would like to get it in the form
=> {1 {"A" {"item1" 0.1}
"item2" 0.2}}
{"B" {"item3" 0.4}}
2 {"A" {"item1" 0.3}}
{"B" {"item3" 0.5}}}
I tried
(->> data
(map #(assoc {} (:structure %) {(:cat %) {(:item %) (:val %)}}))
(apply merge-with into))
This gives
{1 {"A" {"item2" 0.2}, "B" {"item3" 0.4}},
2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}
By merging I lose some entries, but I can't think of any other way. Is there a simple way? I was even about to try to use specter.
Any thoughts would be appreciated.
If I'm dealing with nested maps, first stop is usually to think about update-in or assoc-in - these take a sequence of the nested keys. For a problem like this where the data is very regular, it's straightforward.
(assoc-in {} [1 "A" "item1"] 0.1)
;; =>
{1 {"A" {"item1" 0.1}}}
To consume a sequence into something else, reduce is the idiomatic choice. The reducing function is right on the edge of the complexity level I'd consider an anonymous fn for, so I'll pull it out instead for clarity.
(defn- add-val [acc line]
(assoc-in acc [(:structure line) (:cat line) (:item line)] (:val line)))
(reduce add-val {} data)
;; =>
{1 {"A" {"item1" 0.1, "item2" 0.2}, "B" {"item3" 0.4}},
2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}
Which I think was the effect you were looking for.
Roads less travelled:
As your sequence is coming from a database, I wouldn't worry about using a transient collection to speed the aggregation up. Also, now I think about it, dealing with nested transient maps is a pain anyway.
update-in would be handy if you wanted to add up any values with the same key, for example, but the implication of your question is that structure/cat/item tuples are unique and so you just need the grouping.
juxt could be used to generate the key structure - i.e.
((juxt :structure :cat :item) (first data))
[1 "A" "item1"]
but it's not clear to me that there's any way to use this to make the add-val fn more readable.
You may continue to use your existing code. Only the final merge has to change:
(defn deep-merge [& xs]
(if (every? map? xs)
(apply merge-with deep-merge xs)
(apply merge xs)))
(->> data
(map #(assoc {} (:structure %) {(:cat %) {(:item %) (:val %)}}))
(apply deep-merge))
;; =>
{1
{"A" {"item1" 0.1, "item2" 0.2},
"B" {"item3" 0.4}},
2
{"A" {"item1" 0.3},
"B" {"item3" 0.5}}}
Explanation: your original (apply merge-with into) only merge one level down. deep-merge from above will recurse into all nested maps to do the merge.
Addendum: #pete23 - one use of juxt I can think of is to make the function reusable. For example, we can extract arbitrary fields with juxt, then convert them to nested maps (with yet another function ->nested) and finally do a deep-merge:
(->> data
(map (juxt :structure :cat :item :val))
(map ->nested)
(apply deep-merge))
where ->nested can be implemented like:
(defn ->nested [[k & [v & r :as t]]]
{k (if (seq r) (->nested t) v)})
(->nested [1 "A" "item1" 0.1])
;; => {1 {"A" {"item1" 0.1}}}
One sample application (sum val by category):
(let [ks [:cat :val]]
(->> data
(map (apply juxt ks))
(map ->nested)
(apply (partial deep-merge-with +))))
;; => {"A" 0.6000000000000001, "B" 0.9}
Note deep-merge-with is left as an exercise for our readers :)
(defn map-values [f m]
(into {} (map (fn [[k v]] [k (f v)])) m))
(defn- transform-structures [ss]
(map-values (fn [cs]
(into {} (map (juxt :item :val) cs))) (group-by :cat ss)))
(defn transform [data]
(map-values transform-structures (group-by :structure data)))
then
(transform data)
I'm trying to write a function with recur that cut the sequence as soon as it encounters a repetition ([1 2 3 1 4] should return [1 2 3]), this is my function:
(defn cut-at-repetition [a-seq]
(loop[[head & tail] a-seq, coll '()]
(if (empty? head)
coll
(if (contains? coll head)
coll
(recur (rest tail) (conj coll head))))))
The first problem is with the contains? that throws an exception, I tried replacing it with some but with no success. The second problem is in the recur part which will also throw an exception
You've made several mistakes:
You've used contains? on a sequence. It only works on associative
collections. Use some instead.
You've tested the first element of the sequence (head) for empty?.
Test the whole sequence.
Use a vector to accumulate the answer. conj adds elements to the
front of a list, reversing the answer.
Correcting these, we get
(defn cut-at-repetition [a-seq]
(loop [[head & tail :as all] a-seq, coll []]
(if (empty? all)
coll
(if (some #(= head %) coll)
coll
(recur tail (conj coll head))))))
(cut-at-repetition [1 2 3 1 4])
=> [1 2 3]
The above works, but it's slow, since it scans the whole sequence for every absent element. So better use a set.
Let's call the function take-distinct, since it is similar to take-while. If we follow that precedent and make it lazy, we can do it thus:
(defn take-distinct [coll]
(letfn [(td [seen unseen]
(lazy-seq
(when-let [[x & xs] (seq unseen)]
(when-not (contains? seen x)
(cons x (td (conj seen x) xs))))))]
(td #{} coll)))
We get the expected results for finite sequences:
(map (juxt identity take-distinct) [[] (range 5) [2 3 2]]
=> ([[] nil] [(0 1 2 3 4) (0 1 2 3 4)] [[2 3 2] (2 3)])
And we can take as much as we need from an endless result:
(take 10 (take-distinct (range)))
=> (0 1 2 3 4 5 6 7 8 9)
I would call your eager version take-distinctv, on the map -> mapv precedent. And I'd do it this way:
(defn take-distinctv [coll]
(loop [seen-vec [], seen-set #{}, unseen coll]
(if-let [[x & xs] (seq unseen)]
(if (contains? seen-set x)
seen-vec
(recur (conj seen-vec x) (conj seen-set x) xs))
seen-vec)))
Notice that we carry the seen elements twice:
as a vector, to return as the solution; and
as a set, to test for membership of.
Two of the three mistakes were commented on by #cfrick.
There is a tradeoff between saving a line or two and making the logic as simple & explicit as possible. To make it as obvious as possible, I would do it something like this:
(defn cut-at-repetition
[values]
(loop [remaining-values values
result []]
(if (empty? remaining-values)
result
(let [found-values (into #{} result)
new-value (first remaining-values)]
(if (contains? found-values new-value)
result
(recur
(rest remaining-values)
(conj result new-value)))))))
(cut-at-repetition [1 2 3 1 4]) => [1 2 3]
Also, be sure to bookmark The Clojure Cheatsheet and always keep a browser tab open to it.
I'd like to hear feedback on this utility function which I wrote for myself (uses filter with stateful pred instead of a loop):
(defn my-distinct
"Returns distinct values from a seq, as defined by id-getter."
[id-getter coll]
(let [seen-ids (volatile! #{})
seen? (fn [id] (if-not (contains? #seen-ids id)
(vswap! seen-ids conj id)))]
(filter (comp seen? id-getter) coll)))
(my-distinct identity "abracadabra")
; (\a \b \r \c \d)
(->> (for [i (range 50)] {:id (mod (* i i) 21) :value i})
(my-distinct :id)
pprint)
; ({:id 0, :value 0}
; {:id 1, :value 1}
; {:id 4, :value 2}
; {:id 9, :value 3}
; {:id 16, :value 4}
; {:id 15, :value 6}
; {:id 7, :value 7}
; {:id 18, :value 9})
Docs of filter says "pred must be free of side-effects" but I'm not sure if it is ok in this case. Is filter guaranteed to iterate over the sequence in order and not for example take skips forward?
I have a list of lists of maps:
(( {:id 1 :temp 1} {:id 2} )
( {:id 1 :temp 2} )
( {:id 1 :temp 3} {:id 2} ))
I want to get ids which are at intersection of these 3 sets only by :id key. So my result here will be 1
I came up with this solution but it's hurting my eyes:
(def coll '(( {:id 1 :temp 1} {:id 2} )
( {:id 1 :temp 2} )
( {:id 1 :temp 3} {:id 2} )))
(apply clojure.set/intersection
(map set (map (fn [m]
(map #(select-keys % '(:id)) m)) coll)))
returns
#{{:id 1}}
which is Ok, but any other suggestions?
If you are fine with getting #{1} (as you mention initially) instead of #{{:id 1}}, then it can be slightly improved:
(apply set/intersection (map (fn [c] (into #{} (map :id c))) coll))
(require '[clojure.set :refer [intersection]])
The select keys I guess you don't need, since you are only interested in the id. (map :id m) does the job for the inner-most map. By this you are getting rid of a function shorthand. You can use it in the next map:
(map #(map :id %) coll)
;; ((1 2) (1) (1 2))
The third map you introduce is not necessary. it can be merged in the above piece of code:
(map (comp set #(map :id %)) coll)
or:
(map #(set (map :id %)) coll)
both evaluating to: (#{1 2} #{1} #{1 2})
This is still pretty nested. Threading macros don't help here. But you can use a very powerful list comprehension macro called for:
(for [row coll]
(set (map :id row)))
This gives you the advantage of naming list items (rows) but keeping it concise at the same time.
So finally:
(apply intersection (for [row coll]
(set (map :id row))))
;; #{1}
Just started learning Clojure, so I imagine my main issue is I don't know how to formulate the problem correctly to find an existing solution. I have a map:
{[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1}
and I'd like to "transform" it to:
{[0 1] "a", [1 1] "a"}
i.e. use the two first elements of the composite key as they new key and the third element as the value for the key-value pair that had the highest value in the original map.
I can easily create a new map structure:
=> (into {} (for [[[x y z] v] {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1}] [[x y] {z v}]))
{[0 1] {"b" 1}, [1 1] {"a" 1}}
but into accepts no predicates so last one wins. I also experimented with :let and merge-with but can't seem to correctly refer to the map, eliminate the unwanted pairs or replace values of the map while processing.
You can do this by threading together a series of sequence transformations.
(->> data
(group-by #(->> % key (take 2)))
vals
(map (comp first first (partial sort-by (comp - val))))
(map (juxt #(subvec % 0 2) #(% 2)))
(into {}))
;{[0 1] "a", [1 1] "a"}
... where
(def data {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1})
You build up the solution line by line. I recommend you follow in the footsteps of the construction, starting with ...
(->> data
(group-by #(->> % key (take 2)))
;{(0 1) [[[0 1 "a"] 2] [[0 1 "b"] 1]], (1 1) [[[1 1 "a"] 1]]}
Stacking up layers of (lazy) sequences can run fairly slowly, but the transducers available in Clojure 1.7 will allow you to write faster code in this idiom, as seen in this excellent answer.
Into tends to be most useful when you just need to take a seq of values and with no additional transformation construct a result from it using only conj. Anything else where you are performing construction tends to be better suited by preprocessing such as sorting, or by a reduction which allows you to perform accumulator introspection such as you want here.
First of all we have to be able to compare two strings..
(defn greater? [^String a ^String b]
(> (.compareTo a b) 0))
Now we can write a transformation that compares the current value in the accumulator to the "next" value and keeps the maximum. -> used somewhat gratuitusly to make the update function more readable.
(defn transform [input]
(-> (fn [acc [[x y z] _]] ;; take the acc, [k, v], destructure k discard v
(let [key [x y]] ;; construct key into accumulator
(if-let [v (acc key)] ;; if the key is set
(if (greater? z v) ;; and z (the new val) is greater
(assoc acc key z) ;; then update
acc) ;; else do nothing
(assoc acc key z)))) ;; else update
(reduce {} input))) ;; do that over all [k, v]s from empty acc
user> (def m {[0 1 "a"] 2, [0 1 "b"] 1, [1 1 "a"] 1})
#'user/m
user> (->> m
keys
sort
reverse
(mapcat (fn [x]
(vector (-> x butlast vec)
(last x))))
(apply sorted-map))
;=> {[0 1] "a", [1 1] "a"}
Given:
(def my-vec [{:a "foo" :b 10} {:a "bar" :b 13} {:a "baz" :b 7}])
How could iterate over each element to print that element's :a and the sum of all :b's to that point? That is:
"foo" 10
"bar" 23
"baz" 30
I'm trying things like this to no avail:
; Does not work!
(map #(prn (:a %2) %1) (iterate #(+ (:b %2) %1) 0)) my-vec)
This doesn't work because the "iterate" lazy-seq can't refer to the current element in my-vec (as far as I can tell).
TIA! Sean
user> (reduce (fn [total {:keys [a b]}]
(let [total (+ total b)]
(prn a total)
total))
0 my-vec)
"foo" 10
"bar" 23
"baz" 30
30
You could look at this as starting with a sequence of maps, filtering out a sequence of the :a values and a separate sequence of the rolling sum of the :b values and then mapping a function of two arguments onto the two derived sequences.
create sequence of just the :a and :b values with
(map :a my-vec)
(map :b my-vec)
then a function to get the rolling sum:
(defn sums [sum seq]
"produce a seq of the rolling sum"
(if (empty? seq)
sum
(lazy-seq
(cons sum
(recur (+ sum (first seq)) (rest seq))))))
then put them together:
(map #(prn %1 %s) (map :a my-vec) (sums 0 (map :b my-vec)))
This separates the problem of generating the data from processing it. Hopefully this makes life easier.
PS: whats a better way of getting the rolling sum?
Transform it into the summed sequence:
(defn f [start mapvec]
(if (empty? mapvec) '()
(let [[ m & tail ] mapvec]
(cons [(m :a)(+ start (m :b))] (f (+ start (m :b)) tail)))))
Called as:
(f 0 my-vec)
returns:
(["foo" 10] ["bar" 23] ["baz" 30])