GROUP BY and Aggregation on Vector of maps - Clojure

I have data that looks like this:
(def a [{:firmAccount "MSFT" :Val 10 :PE 3}
        {:firmAccount "MSFT" :Val 15 :PE 4}
        {:firmAccount "GOG" :Val 15 :PE 3}
        {:firmAccount "YAH" :Val 8 :PE 1}])
I want to group by on :firmAccount and then SUM the :Val and :PE for each firm account and get something like
[{:firmAccount "MSFT" :Val 25 :PE 7}
 {:firmAccount "GOG" :Val 15 :PE 3}
 {:firmAccount "YAH" :Val 8 :PE 1}]
It is really a trivial thing, and in SQL I would not even think twice, but since I am learning Clojure, please bear with me.

clojure.core has a built-in group-by function. The solution becomes a little ugly because the maps hold both string and integer values, so the string-valued :firmAccount has to be removed before summing with merge-with + and added back afterwards.
(for [m (group-by :firmAccount a)]
  (assoc (apply merge-with + (map #(dissoc % :firmAccount) (val m)))
         :firmAccount (key m)))
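For the sample data above this yields a sequence along these lines (key order inside each map may differ):
;; => ({:Val 25, :PE 7, :firmAccount "MSFT"}
;;     {:Val 15, :PE 3, :firmAccount "GOG"}
;;     {:Val 8, :PE 1, :firmAccount "YAH"})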

And for completeness here's an alternate version that uses map:
(map (fn [[firmAccount vals]]
       {:firmAccount firmAccount
        :Val (reduce + (map :Val vals))
        :PE (reduce + (map :PE vals))})
     (group-by :firmAccount a))

Try creating a new map of maps with the same information, keyed by firm account. You can write a function that adds an element to this new map, summing the fields if the firm account is already present. Maybe a map like this?
(def a {"MSFT" {:Val 25 :PE 7}
        "GOG" {:Val 15 :PE 3}
        "YAH" {:Val 8 :PE 1}})
With a custom add function along the lines of this pseudocode (find-name and add-fields are hypothetical helpers):
(defn add-to-map [m element]
  (if (contains? m (find-name element))
    (update m (find-name element) #(add-fields element %))
    (assoc m (find-name element) element)))
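A runnable sketch of that idea, accumulating into such a map with reduce and merge-with (my own shape, not the answerer's exact helpers):
(reduce (fn [acc {:keys [firmAccount Val PE]}]
          (update acc firmAccount
                  (fnil #(merge-with + % {:Val Val :PE PE}) {})))
        {}
        a)
;; => {"MSFT" {:Val 25, :PE 7}, "GOG" {:Val 15, :PE 3}, "YAH" {:Val 8, :PE 1}}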

It's a matter of style, but I find using the thread-last macro (->>) easier to read.
(->> a
     (group-by :firmAccount)
     (map (fn [[firmAccount vals]]
            {:firmAccount firmAccount
             :Val (reduce + (map :Val vals))
             :PE (reduce + (map :PE vals))})))

Related

How to transform a list of maps to a nested map of maps?

Getting data from the database as a list of maps (LazySeq) leaves me in need of transforming it into a map of maps.
I tried to 'assoc' and 'merge', but that didn't bring the desired result because of the nesting.
This is the form of my data:
(def data (list {:structure 1 :cat "A" :item "item1" :val 0.1}
                {:structure 1 :cat "A" :item "item2" :val 0.2}
                {:structure 1 :cat "B" :item "item3" :val 0.4}
                {:structure 2 :cat "A" :item "item1" :val 0.3}
                {:structure 2 :cat "B" :item "item3" :val 0.5}))
I would like to get it in the form
=> {1 {"A" {"item1" 0.1
            "item2" 0.2}
       "B" {"item3" 0.4}}
    2 {"A" {"item1" 0.3}
       "B" {"item3" 0.5}}}
I tried
(->> data
     (map #(assoc {} (:structure %) {(:cat %) {(:item %) (:val %)}}))
     (apply merge-with into))
This gives
{1 {"A" {"item2" 0.2}, "B" {"item3" 0.4}},
2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}
By merging I lose some entries, but I can't think of any other way. Is there a simple way? I was even about to try to use Specter.
Any thoughts would be appreciated.
If I'm dealing with nested maps, first stop is usually to think about update-in or assoc-in - these take a sequence of the nested keys. For a problem like this where the data is very regular, it's straightforward.
(assoc-in {} [1 "A" "item1"] 0.1)
;; =>
{1 {"A" {"item1" 0.1}}}
To consume a sequence into something else, reduce is the idiomatic choice. The reducing function is right on the edge of the complexity level I'd consider an anonymous fn for, so I'll pull it out instead for clarity.
(defn- add-val [acc line]
  (assoc-in acc [(:structure line) (:cat line) (:item line)] (:val line)))

(reduce add-val {} data)
;; =>
{1 {"A" {"item1" 0.1, "item2" 0.2}, "B" {"item3" 0.4}},
2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}
Which I think was the effect you were looking for.
Roads less travelled:
As your sequence is coming from a database, I wouldn't worry about using a transient collection to speed the aggregation up. Also, now that I think about it, dealing with nested transient maps is a pain anyway.
update-in would be handy if you wanted to add up any values with the same key, for example, but the implication of your question is that structure/cat/item tuples are unique and so you just need the grouping.
juxt could be used to generate the key structure - i.e.
((juxt :structure :cat :item) (first data))
[1 "A" "item1"]
but it's not clear to me that there's any way to use this to make the add-val fn more readable.
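For what it's worth, a juxt-based add-val would look roughly like this; whether it is actually more readable is a matter of taste:
(defn- add-val [acc line]
  (assoc-in acc ((juxt :structure :cat :item) line) (:val line)))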
You may continue to use your existing code. Only the final merge has to change:
(defn deep-merge [& xs]
  (if (every? map? xs)
    (apply merge-with deep-merge xs)
    (apply merge xs)))

(->> data
     (map #(assoc {} (:structure %) {(:cat %) {(:item %) (:val %)}}))
     (apply deep-merge))
;; =>
{1
{"A" {"item1" 0.1, "item2" 0.2},
"B" {"item3" 0.4}},
2
{"A" {"item1" 0.3},
"B" {"item3" 0.5}}}
Explanation: your original (apply merge-with into) only merges one level down. deep-merge from above will recurse into all nested maps to do the merge.
Addendum: #pete23 - one use of juxt I can think of is to make the function reusable. For example, we can extract arbitrary fields with juxt, then convert them to nested maps (with yet another function ->nested) and finally do a deep-merge:
(->> data
     (map (juxt :structure :cat :item :val))
     (map ->nested)
     (apply deep-merge))
where ->nested can be implemented like:
(defn ->nested [[k & [v & r :as t]]]
  {k (if (seq r) (->nested t) v)})
(->nested [1 "A" "item1" 0.1])
;; => {1 {"A" {"item1" 0.1}}}
One sample application (sum val by category):
(let [ks [:cat :val]]
  (->> data
       (map (apply juxt ks))
       (map ->nested)
       (apply (partial deep-merge-with +))))
;; => {"A" 0.6000000000000001, "B" 0.9}
Note deep-merge-with is left as an exercise for our readers :)
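For the curious, one possible deep-merge-with can be sketched by analogy with deep-merge above (my own version, not necessarily the one the author had in mind):
(defn deep-merge-with [f & xs]
  (if (every? map? xs)
    (apply merge-with (partial deep-merge-with f) xs)
    (apply f xs)))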
(defn map-values [f m]
  (into {} (map (fn [[k v]] [k (f v)])) m))

(defn- transform-structures [ss]
  (map-values (fn [cs]
                (into {} (map (juxt :item :val) cs)))
              (group-by :cat ss)))

(defn transform [data]
  (map-values transform-structures (group-by :structure data)))
then
(transform data)
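which should produce the same nested map as the earlier answers, roughly:
;; => {1 {"A" {"item1" 0.1, "item2" 0.2}, "B" {"item3" 0.4}},
;;     2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}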

Intersecting a list of lists of maps overriding equality

I have a list of lists of maps:
(({:id 1 :temp 1} {:id 2})
 ({:id 1 :temp 2})
 ({:id 1 :temp 3} {:id 2}))
I want to get the ids that are in the intersection of all 3 lists, comparing only by the :id key. So my result here will be 1.
I came up with this solution but it's hurting my eyes:
(def coll '(({:id 1 :temp 1} {:id 2})
            ({:id 1 :temp 2})
            ({:id 1 :temp 3} {:id 2})))

(apply clojure.set/intersection
       (map set (map (fn [m]
                       (map #(select-keys % '(:id)) m)) coll)))
returns
#{{:id 1}}
which is Ok, but any other suggestions?
If you are fine with getting #{1} (as you mention initially) instead of #{{:id 1}}, then it can be slightly improved:
(require '[clojure.set :as set])
(apply set/intersection (map (fn [c] (into #{} (map :id c))) coll))
(require '[clojure.set :refer [intersection]])
You don't need the select-keys, since you are only interested in the id: (map :id m) does the job for the innermost map and gets rid of one function shorthand. You can use it in the next map:
(map #(map :id %) coll)
;; ((1 2) (1) (1 2))
The third map you introduce is not necessary; it can be merged into the piece of code above:
(map (comp set #(map :id %)) coll)
or:
(map #(set (map :id %)) coll)
both evaluating to: (#{1 2} #{1} #{1 2})
This is still pretty nested. Threading macros don't help here. But you can use a very powerful list comprehension macro called for:
(for [row coll]
  (set (map :id row)))
This gives you the advantage of naming list items (rows) but keeping it concise at the same time.
So finally:
(apply intersection (for [row coll]
                      (set (map :id row))))
;; #{1}

How to group-by a collection that is already grouped by in Clojure?

I have a collection of maps
(def a '({:id 9345 :value 3 :type "orange"}
         {:id 2945 :value 2 :type "orange"}
         {:id 145 :value 3 :type "orange"}
         {:id 2745 :value 6 :type "apple"}
         {:id 2345 :value 6 :type "apple"}))
I want to group this first by value, followed by type.
My output should look like:
{:orange [{:value 3, :id [9345, 145]}
          {:value 2, :id [2945]}]
 :apple  [{:value 6, :id [2745, 2345]}]}
How would I do this in Clojure? Appreciate your answers.
Thanks!
Edit:
Here is what I had so far:
(defn by-type-key [data]
  (group-by #(get % :type) data))

(reduce-kv
  (fn [m k v] (assoc m k (reduce-kv
                           (fn [sm sk sv] (assoc sm sk (into [] (map #(:id %) sv))))
                           {}
                           (group-by :value (map #(dissoc % :type) v)))))
  {}
  (by-type-key a))
Output:
=> {"orange" {3 [9345 145], 2 [2945]}, "apple" {6 [2745 2345]}}
I just couldn't figure out how to proceed from here...
Your requirements are a bit inconsistent (or rather irregular): you turn the :type values into keywords in the result, but the other values are carried through as-is. Maybe that is what you must do to satisfy some external format; otherwise you should either apply the same treatment to the other values, or add a new keyword to the result, like :group or :rows, and keep the original values intact. I will assume the former approach for the moment (but see below, where I get to the exact shape you want), so the final shape of the data is like
{:orange {:3 [9345 145],
          :2 [2945]},
 :apple  {:6 [2745 2345]}}
There is more than one way of getting there, here's the gist of one:
(group-by (juxt :type :value) a)
The result:
{["orange" 3] [{:id 9345, :value 3, :type "orange"} {:id 145, :value 3, :type "orange"}],
["orange" 2] [{:id 2945, :value 2, :type "orange"}],
["apple" 6] [{:id 2745, :value 6, :type "apple"} {:id 2345, :value 6, :type "apple"}]}
Now all rows in your collection are grouped by the keys you need. From this, you can go and get the shape you want, say to get to the shape above you can do
(reduce
  (fn [m [k v]]
    (let [ks (map (comp keyword str) k)]
      (assoc-in m ks (map :id v))))
  {}
  (group-by (juxt :type :value) a))
The basic idea is to get the rows grouped by the key sequence (and that's what group-by and juxt do,) and then combine reduce and assoc-in or update-in to beat the result into place.
To get exactly the shape you described:
(reduce
  (fn [m [k v]]
    (let [type (keyword (first k))
          value (second k)
          ids (map :id v)]
      (update-in m [type]
                 #(conj % {:value value :id ids}))))
  {}
  (group-by (juxt :type :value) a))
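Applied to the sample data this gives something close to the requested shape; note that conj onto a list prepends, so the order inside each grouping may be reversed relative to the example, and the id collections come out as seqs rather than vectors:
;; => {:orange ({:value 2, :id (2945)} {:value 3, :id (9345 145)}),
;;     :apple ({:value 6, :id (2745 2345)})}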
It's a bit noisy, and it might be harder to see the forest for the trees - that's why I simplified the shape, to highlight the main idea. The more regular your shapes are, the shorter and more regular your functions become - so if you have control over it, try to make it simpler for you.
I would do the transform in two stages (using reduce):
the first to collect the values
the second for formatting
The following code solves your problem:
(def a '({:id 9345 :value 3 :type "orange"}
         {:id 2945 :value 2 :type "orange"}
         {:id 145 :value 3 :type "orange"}
         {:id 2745 :value 6 :type "apple"}
         {:id 2345 :value 6 :type "apple"}))

(defn standardise [m]
  (->> m
       ;; first stage
       (reduce (fn [out {:keys [type value id]}]
                 (update-in out [type value] (fnil #(conj % id) [])))
               {})
       ;; second stage
       (reduce-kv (fn [out k v]
                    (assoc out (keyword k)
                           (reduce-kv (fn [out value id]
                                        (conj out {:value value
                                                   :id id}))
                                      []
                                      v)))
                  {})))
(standardise a)
;; => {:orange [{:value 3, :id [9345 145]}
;; {:value 2, :id [2945]}],
;; :apple [{:value 6, :id [2745 2345]}]}
the output of the first stage is:
(reduce (fn [out {:keys [type value id]}]
          (update-in out [type value] (fnil #(conj % id) [])))
        {}
        a)
;;=> {"orange" {3 [9345 145], 2 [2945]}, "apple" {6 [2745 2345]}}
You may wish to use the built-in function group-by. See http://clojuredocs.org/clojure.core/group-by
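For example, the first grouping step could be as simple as (using the data above):
(group-by :type a)
;; => {"orange" [{:id 9345, :value 3, :type "orange"} ...],
;;     "apple" [{:id 2745, :value 6, :type "apple"} ...]}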

How to reduce a nested collection without using mutable state?

Given a nested collection I would like to reduce it to only the k-v pairs which are of the form [_ D] where D is an integer. For instance I would like to transform as follows:
; Start with this ...
{:a {:val 1 :val 2} :b {:val 3 :c {:val 4}} :val 5}
; ... end with this
{:val 1, :val 2, :val 3, :val 4, :val 5}
I have written a function using postwalk as follows:
(require '[clojure.walk :refer [postwalk]])

(defn mindwave-values [data]
  (let [values (atom {})
        integer-walk (fn [x]
                       (if (map? x)
                         (doseq [[k v] x]
                           (if (integer? v) (swap! values assoc k v)))
                         x))]
    (postwalk integer-walk data)
    @values))
I am curious if it is possible to do this without using mutable state?
EDIT The original function was not quite correct.
Your example data structure is not a legal map, so I've changed it a bit:
(defn int-vals [x]
  (cond (map? x) (mapcat int-vals x)
        (coll? x) (when (= 2 (count x))
                    (if (integer? (second x))
                      [x]
                      (int-vals (second x))))))
user> (int-vals {:a {:x 1 :y 2} :b {:val 3 :c {:val 4}} :val 5})
([:y 2] [:x 1] [:val 4] [:val 3] [:val 5])
Your requirements are a bit vague: you say "collection", but your example contains only maps, so I've just had to guess at what you intended.
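If you do want a single map at the end, note that duplicate keys collapse (which is exactly why the desired output above is not a legal map). For instance:
(into {} (int-vals {:a {:x 1 :y 2} :b {:val 3 :c {:val 4}} :val 5}))
;; => {:y 2, :x 1, :val 5}   (key order may vary; only one :val entry survives)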

How to Increment Values in a Map

I am wrapping my head around state in Clojure. I come from languages where state can be mutated. For example, in Python, I can create a dictionary, put some string => integer pairs inside, and then walk over the dictionary and increment the values.
How would I do this in idiomatic Clojure?
(def my-map {:a 1 :b 2})
(zipmap (keys my-map) (map inc (vals my-map)))
;;=> {:b 3, :a 2}
To update only one value by key:
(update-in my-map [:b] inc) ;;=> {:a 1, :b 3}
Since Clojure 1.7 it's also possible to use update:
(update my-map :b inc)
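which, for the map above, likewise gives:
;;=> {:a 1, :b 3}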
Just produce a new map and use it:
(def m {:a 3 :b 4})
(apply merge
       (map (fn [[k v]] {k (inc v)}) m))
; {:b 5, :a 4}
To update multiple values, you can also take advantage of the fact that reduce accepts an already-filled map as its accumulator and applies the function to it and every member of the following collection.
=> (reduce (fn [a k] (update-in a k inc)) {:a 1 :b 2 :c 3 :d 4} [[:a] [:c]])
{:a 2, :c 4, :b 2, :d 4}
Be aware that the keys need to be enclosed in vectors, but you can still do multiple updates in nested structures, just like with the original update-in.
If you made it a generalized function, you could automatically wrap a vector over a key by testing it with coll?:
(defn multi-update-in
  [m v f & args]
  (reduce
    (fn [acc p] (apply
                  (partial update-in acc (if (coll? p) p (vector p)) f)
                  args)) m v))
which would allow for single-level/key updates without the need for wrapping the keys in vectors
=> (multi-update-in {:a 1 :b 2 :c 3 :d 4} [:a :c] inc)
{:a 2, :c 4, :b 2, :d 4}
but still be able to do nested updates
(def people
  {"keith" {:age 27 :hobby "needlefelting"}
   "penelope" {:age 39 :hobby "thaiboxing"}
   "brian" {:age 12 :hobby "rocket science"}})
=> (multi-update-in people [["keith" :age] ["brian" :age]] inc)
{"keith" {:age 28, :hobby "needlefelting"},
"penelope" {:age 39, :hobby "thaiboxing"},
"brian" {:age 13, :hobby "rocket science"}}
To slightly improve on #Michiel Brokent's answer: this will also work if the key isn't already present.
(update my-map :a #(if (nil? %) 1 (inc %)))
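The same fallback can be written more compactly with fnil, which substitutes a default argument when the value is nil:
(update my-map :a (fnil inc 0))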
I've been toying with the same idea, so I came up with:
(defn remap
  "returns a function which takes a map as argument
   and applies f to each value in the map"
  [f]
  #(into {} (map (fn [[k v]] [k (f v)]) %)))
((remap inc) {:foo 1})
;=> {:foo 2}
or
(def inc-vals (remap inc))
(inc-vals {:foo 1})
;=> {:foo 2}