In Clojure, how to group elements? - clojure

In Clojure, I want to aggregate this data:
(def data [[:morning :pear][:morning :mango][:evening :mango][:evening :pear]])
(group-by first data)
;{:morning [[:morning :pear][:morning :mango]],:evening [[:evening :mango][:evening :pear]]}
My problem is that :evening and :morning are redundant.
Instead, I would like to create the following collection:
([:morning (:pear :mango)] [:evening (:mango :pear)])
I came up with:
(for [[moment moment-fruit-vec] (group-by first data)] [moment (map second moment-fruit-vec)])
Is there a more idiomatic solution?

I've come across similar grouping problems. Usually I end up plugging merge-with or update-in into some seq processing step:
(apply merge-with list (map (partial apply hash-map) data))
You get a map, but this is just a seq of key-value pairs:
user> (apply merge-with list (map (partial apply hash-map) data))
{:morning (:pear :mango), :evening (:mango :pear)}
user> (seq *1)
([:morning (:pear :mango)] [:evening (:mango :pear)])
This solution only gets what you want if each key appears twice, however. This might be better:
(reduce (fn [map [x y]] (update-in map [x] #(cons y %))) {} data)
Both of these feel "more functional" but also a little convoluted. Don't be too quick to dismiss your solution: it's easy to understand and functional enough.
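A minimal sketch of the same reduce idea, assuming Clojure 1.7+ (for update) and that the fruits should stay in input order. fnil supplies an empty vector the first time a key is seen, so conj appends instead of prepending as cons does. The name group-vals is made up for illustration.

```clojure
(def data [[:morning :pear] [:morning :mango] [:evening :mango] [:evening :pear]])

;; group-vals is a hypothetical helper name, not part of clojure.core.
(defn group-vals
  "Builds a map from each pair's first element to a vector of its second elements."
  [pairs]
  (reduce (fn [m [k v]]
            ;; (fnil conj []) treats a missing key as an empty vector
            (update m k (fnil conj []) v))
          {}
          pairs))

(group-vals data)
;; => {:morning [:pear :mango], :evening [:mango :pear]}
```

Calling seq on the result gives the sequence of [key values] pairs asked for in the question.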

Don't be too quick to dismiss group-by: it has aggregated your data by the desired key, and it hasn't changed the data. Any other function expecting a sequence of moment-fruit pairs will accept any value looked up in the map returned by group-by.
For computing the summary, my inclination was to reach for merge-with, but for that I had to transform the input data into a sequence of maps and construct a "base-map" with the required keys and empty vectors as values.
(let [i-maps   (for [[moment fruit] data] {moment fruit})
      base-map (into {}
                     (for [key (into #{} (map first data))]
                       [key []]))]
  (apply merge-with conj base-map i-maps))
{:morning [:pear :mango], :evening [:mango :pear]}
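The base-map step can arguably be skipped by wrapping each fruit in a one-element vector up front and merging with into. This is only a sketch of that variation; pairs->map is a hypothetical name.

```clojure
(def data [[:morning :pear] [:morning :mango] [:evening :mango] [:evening :pear]])

;; pairs->map is a made-up name for illustration.
(defn pairs->map
  "Merges single-entry maps of the form {k [v]}, concatenating vectors on key clashes."
  [pairs]
  (apply merge-with into (for [[k v] pairs] {k [v]})))

(pairs->map data)
;; => {:morning [:pear :mango], :evening [:mango :pear]}
```

Because every value starts out as a vector, keys that appear once, twice, or more all end up with the same shape. Note that an empty input yields nil rather than an empty map.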

Meditating on @mike t's answer, I've come up with:
(defn agg[x y] (if (coll? x) (cons y x) (list y x)))
(apply merge-with agg (map (partial apply hash-map) data))
This solution also works when a key appears more than twice in the data:
(apply merge-with agg (map (partial apply hash-map)
[[:morning :pear][:morning :mango][:evening :mango] [:evening :pear] [:evening :kiwi]]))
;{:morning (:mango :pear), :evening (:kiwi :pear :mango)}

Maybe just modify the standard group-by a little bit:
(defn my-group-by
  [fk fv coll]
  (persistent!
   (reduce
    (fn [ret x]
      (let [k (fk x)]
        (assoc! ret k (conj (get ret k []) (fv x)))))
    (transient {}) coll)))
Then use it as:
(my-group-by first second data)

Related

Using Clojure, is there a better way to remove an item from a sequence which is the value in a map?

There is a map containing sequences. The sequences contain items.
I want to remove a given item from any sequence that contains it.
The solution I found does what it should, but I wonder if there is a better
or more elegant way to achieve the same.
My current solution:
(defn remove-item-from-map-value [my-map item]
(apply merge (for [[k v] my-map] {k (remove #(= item %) v)})))
The tests describe the expected behaviour:
(require '[clojure.test :as t])
(def my-map {:keyOne ["itemOne"]
:keyTwo ["itemTwo" "itemThree"]
:keyThree ["itemFour" "itemFive" "itemSix"]})
(defn remove-item-from-map-value [my-map item]
(apply merge (for [[k v] my-map] {k (remove #(= item %) v)})))
(t/is (= (remove-item-from-map-value my-map "unknown-item") my-map))
(t/is (= (remove-item-from-map-value my-map "itemFive") {:keyOne ["itemOne"]
:keyTwo ["itemTwo" "itemThree"]
:keyThree ["itemFour" "itemSix"]}))
(t/is (= (remove-item-from-map-value my-map "itemThree") {:keyOne ["itemOne"]
:keyTwo ["itemTwo"]
:keyThree ["itemFour" "itemFive" "itemSix"]}))
(t/is (= (remove-item-from-map-value my-map "itemOne") {:keyOne []
:keyTwo ["itemTwo" "itemThree"]
:keyThree ["itemFour" "itemFive" "itemSix"]}))
I'm fairly new to Clojure and am interested in different solutions.
So any input is welcome.
I'll throw in the Specter version for good measure. It keeps the vectors inside the map, and it's really compact.
(setval [MAP-VALS ALL #{"itemFive"}] NONE my-map)
Example
user=> (use 'com.rpl.specter)
nil
user=> (def my-map {:keyOne ["itemOne"]
#_=> :keyTwo ["itemTwo" "itemThree"]
#_=> :keyThree ["itemFour" "itemFive" "itemSix"]})
#_=>
#'user/my-map
user=> (setval [MAP-VALS ALL #{"itemFive"}] NONE my-map)
{:keyOne ["itemOne"],
:keyThree ["itemFour" "itemSix"],
:keyTwo ["itemTwo" "itemThree"]}
user=> (setval [MAP-VALS ALL #{"unknown"}] NONE my-map)
{:keyOne ["itemOne"],
:keyThree ["itemFour" "itemFive" "itemSix"],
:keyTwo ["itemTwo" "itemThree"]}
I would go with something like this:
user> (defn remove-item [my-map item]
(into {}
(map (fn [[k v]] [k (remove #{item} v)]))
my-map))
#'user/remove-item
user> (remove-item my-map "itemFour")
;;=> {:keyOne ("itemOne"),
;; :keyTwo ("itemTwo" "itemThree"),
;; :keyThree ("itemFive" "itemSix")}
You could also write a handy function, map-val, that maps a function over the values of a map:
(defn map-val [f data]
(reduce-kv
(fn [acc k v] (assoc acc k (f v)))
{} data))
Or, more briefly:
(defn map-val [f data]
(reduce #(update % %2 f) data (keys data)))
user> (map-val inc {:a 1 :b 2})
;;=> {:a 2, :b 3}
(defn remove-item [my-map item]
(map-val (partial remove #{item}) my-map))
user> (remove-item my-map "itemFour")
;;=> {:keyOne ("itemOne"),
;; :keyTwo ("itemTwo" "itemThree"),
;; :keyThree ("itemFive" "itemSix")}
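The outputs above show lazy seqs where the original map held vectors. If the vector type matters, one option is to pour each filtered value back into a vector with a transducer. The following is a sketch under that assumption; remove-item-v is a hypothetical name.

```clojure
(def my-map {:keyOne   ["itemOne"]
             :keyTwo   ["itemTwo" "itemThree"]
             :keyThree ["itemFour" "itemFive" "itemSix"]})

;; remove-item-v is a made-up name for illustration.
(defn remove-item-v
  "Removes item from every value of m, keeping each value a vector."
  [m item]
  (into {}
        (map (fn [[k v]]
               ;; #(= item %) rather than #{item}, so nil or false items also work
               [k (into [] (remove #(= item %)) v)]))
        m))

(remove-item-v my-map "itemFour")
;; => {:keyOne ["itemOne"], :keyTwo ["itemTwo" "itemThree"], :keyThree ["itemFive" "itemSix"]}
```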
I think your solution is mostly okay, but I would try to avoid the apply merge part, as you can easily recreate a map from a sequence with into. You could also use map instead of for, which I think is a little more idiomatic in this case, as you don't use any of the list comprehension features of for.
(defn remove-item-from-map-value [m item]
(->> m
(map (fn [[k vs]]
{k (remove #(= item %) vs)}))
(into {})))
Another solution, much like @leetwinski's:
(defn remove-item [m i]
(zipmap (keys m)
(map (fn [v] (remove #(= % i) v))
(vals m))))
Here's a one-liner that does this elegantly. The function I find ideal for this scenario is clojure.walk/prewalk. It traverses all of the sub-forms of the form you pass to it and transforms each one with the provided fn:
(require 'clojure.walk)
(defn remove-item-from-map-value [data item]
  (clojure.walk/prewalk #(if (map-entry? %) [(first %) (remove #{item} (second %))] %) data))
remove-item-from-map-value checks whether the current form is a map entry and, if so, removes the specified item from its value (the second element of the map entry, which behaves like a vector of a key and a value).
The best thing about this approach is that it is completely extensible: you could decide to do different things for different types of forms, handle nested forms, and so on.
It took me some time to master this fn but once I got it I found it extremely useful!
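As written, the prewalk one-liner assumes every map value is a flat sequence. A hedged sketch of how the walking idea might extend to nested maps, assuming Clojure 1.9+ (where clojure.walk keeps map entries as map entries) and that items live only in plain vectors: filter each vector on the way back up with postwalk. The name remove-item-deep is hypothetical.

```clojure
(require '[clojure.walk :as walk])

;; remove-item-deep is a made-up name for illustration.
(defn remove-item-deep
  "Removes item from every plain vector found anywhere inside data."
  [data item]
  (walk/postwalk
   (fn [form]
     ;; map entries also satisfy vector?, so they are excluded explicitly
     (if (and (vector? form) (not (map-entry? form)))
       (filterv #(not= item %) form)
       form))
   data))

(remove-item-deep {:a {:b ["x" "y"]} :c ["y" "z"]} "y")
;; => {:a {:b ["x"]}, :c ["z"]}
```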

Get key by first element in value list in Clojure

This is similar to Clojure get map key by value
However, there is one difference. How would you do the same thing if hm is like
{1 ["bar" "choco"]}
The idea is to get 1 (the key) where the first element of the value list is "bar". Please feel free to close/merge this question if some other question answers it.
I tried something like this, but it doesn't work.
(def hm {:foo ["bar", "choco"]})
(keep #(when (= ((nth val 0) %) "bar")
(key %))
hm)
You can filter the map and return the first element of the first item in the resulting sequence:
(ffirst (filter (fn [[k [v & _]]] (= "bar" v)) hm))
You can destructure the vector value to access the second and/or third elements, e.g.
(ffirst (filter (fn [[k [f s t & _]]] (= "choco" s))
{:foo ["bar", "choco"]}))
Past the first few elements, you will probably find nth more readable.
Another way to do it using some:
(some (fn [[k [v & _]]] (when (= "bar" v) k)) hm)
Your example was pretty close to working, with some minor changes:
(keep #(when (= (nth (val %) 0) "bar")
(key %))
hm)
keep and some are similar, but some only returns one result.
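A small sketch of that difference, using a made-up map in which two keys match. The entries :baz and :qux are assumptions added for illustration.

```clojure
;; Hypothetical data: two entries whose first value element is "bar".
(def hm {:foo ["bar" "choco"]
         :baz ["bar" "mint"]
         :qux ["tea"]})

;; keep returns a lazy seq of every non-nil result, i.e. all matching keys.
(def all-matches
  (keep (fn [[k [v]]] (when (= "bar" v) k)) hm))

;; some stops at the first non-nil result and returns that single key.
(def first-match
  (some (fn [[k [v]]] (when (= "bar" v) k)) hm))
```

Here all-matches contains both :foo and :baz, while first-match is just one of them. Which one comes first depends on the map's iteration order, so it is safer not to rely on it.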
In addition to all the above (correct) answers, you may also want to reindex your map into the desired form, especially if the search is performed frequently and the initial map is rather big. This lets you reduce the lookup cost from linear to effectively constant:
(defn map-invert+ [kfn vfn data]
(reduce (fn [acc entry] (assoc acc (kfn entry) (vfn entry)))
{} data))
user> (def data
{1 ["bar" "choco"]
2 ["some" "thing"]})
#'user/data
user> (def inverted (map-invert+ (comp first val) key data))
#'user/inverted
user> inverted
;;=> {"bar" 1, "some" 2}
user> (inverted "bar")
;;=> 1
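One caveat with an inverted index built with assoc: if two entries share the same first element, the later one overwrites the earlier one. A possible variation, sketched here under the assumption that all matching keys are wanted, collects the keys into a set. The name index-by-first is hypothetical.

```clojure
;; index-by-first is a made-up name for illustration.
(defn index-by-first
  "Maps the first element of each value to the set of keys that have it."
  [m]
  (reduce-kv (fn [acc k [v]]
               ;; (fnil conj #{}) starts a new set the first time v is seen
               (update acc v (fnil conj #{}) k))
             {}
             m))

(index-by-first {1 ["bar" "choco"]
                 2 ["some" "thing"]
                 3 ["bar" "mint"]})
;; => {"bar" #{1 3}, "some" #{2}}
```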

Is there a reducing function in Clojure that performs the equivalent of `first`?

I'm often writing code of the form
(->> init
(map ...)
(filter ...)
(first))
When converting this into code that uses transducers I'll end up with something like
(transduce (comp (map ...) (filter ...)) (completing #(reduced %2)) nil init)
Writing (completing #(reduced %2)) instead of first doesn't sit well with me at all. It needlessly obscures a very straightforward task. Is there a more idiomatic way of performing this task?
I'd personally use your approach with a custom reducing function but here are some alternatives:
(let [[x] (into [] (comp (map inc) (filter even?) (take 1)) [0 1])]
x)
Using destructuring :/
Or:
(first (eduction (map inc) (filter even?) [0 1]))
Here you save on calling comp which is done for you. Though it's not super lazy. It'll realize up to 32 elements so it's potentially wasteful.
Fixed with a (take 1):
(first (eduction (map inc) (filter even?) (take 1) [0 1]))
Overall a bit shorter and not too unclear compared to:
(transduce (comp (map inc) (filter even?) (take 1)) (completing #(reduced %2)) nil [0 1])
If you need this a bunch, then I'd probably NOT create a custom reducer function but instead a function similar to transduce that takes xform, coll as the argument and returns the first value. It's clearer what it does and you can give it a nice docstring. If you want to save on calling comp you can also make it similar to eduction:
(defn single-xf
"Returns the first item of transducing the xforms over collection"
{:arglists '([xform* coll])}
[& xforms]
(transduce (apply comp (butlast xforms)) (completing #(reduced %2)) nil (last xforms)))
Example:
(single-xf (map inc) (filter even?) [0 1])
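The other shape mentioned above, a function that takes xform and coll like transduce does, might look like the following sketch. The name xfirst is hypothetical; the reducing function is the same (completing #(reduced %2)) idea, just hidden behind a descriptive name and docstring.

```clojure
;; xfirst is a made-up name for illustration.
(defn xfirst
  "Returns the first value produced by running coll through xform,
  or nil if nothing is produced."
  [xform coll]
  (transduce xform
             ;; ignore the accumulator and stop at the first value that arrives
             (completing (fn [_ x] (reduced x)))
             nil
             coll))

(xfirst (comp (map inc) (filter even?)) [0 1 2 3])
;; => 2
```

Because reduced short-circuits the reduction, this also terminates on an infinite input such as (range).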
medley has find-first with a transducer arity and xforms has a reducing function called last. I think that the combination of the two is what you're after.
(ns foo.bar
(:require
[medley.core :as medley]
[net.cgrand.xforms.rfs :as rfs]))
(transduce (comp (map ,,,) (medley/find-first ,,,)) rfs/last init)

Simple "R-like" melt : better way to do?

Today I tried to implement an "R-like" melt function. I use it for Big Data coming from Big Query.
I have no strict constraints on compute time, and this function takes less than 5-10 seconds to work on millions of rows.
I start with this kind of data :
(def sample
'({:list "123,250" :group "a"} {:list "234,260" :group "b"}))
Then I defined a function to put the list into a vector :
(defn split-data-rank [datatab value]
(let [splitted (map (fn[x] (assoc x value (str/split (x value) #","))) datatab)]
(map (fn[y] (let [index (map inc (range (count (y value))))]
(assoc y value (zipmap index (y value)))))
splitted)))
Launch :
(split-data-rank sample :list)
As you can see, it returns the same sequence, but it replaces :list with a map giving the position of each item in the comma-separated string.
Then I want to melt the "dataframe" by creating, for each item in a group, its own row with its rank in the group.
So I created this function:
(defn split-melt [datatab value]
(let [splitted (split-data-rank datatab value)]
(map (fn [y] (dissoc y value))
(apply concat
(map
(fn[x]
(map
(fn[[k v]]
(assoc x :item v :Rank k))
(x value)))
splitted)))))
Launch :
(split-melt sample :list)
The problem is that it is heavily indented and uses a lot of map calls. I apply dissoc to drop :list (which is useless now), and I also have to use concat because otherwise I get a sequence of sequences.
Do you think there is a more efficient or shorter way to design this function?
I am quite confused by reduce and do not know whether it can be applied here, since there are two arguments in a way.
Thanks a lot!
If you don't need the split-data-rank function, I would go for:
(defn melt [datatab value]
(mapcat (fn [x]
(let [items (str/split (get x value) #",")]
(map-indexed (fn [idx item]
(-> x
(assoc :Rank (inc idx) :item item)
(dissoc value)))
items)))
datatab))
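The same flattening can also be written with for, whose nested bindings play the role of mapcat. This is only a sketch; melt-for is a hypothetical name, and it assumes clojure.string is required as str, as in the code above.

```clojure
(require '[clojure.string :as str])

(def sample
  '({:list "123,250" :group "a"} {:list "234,260" :group "b"}))

;; melt-for is a made-up name for illustration.
(defn melt-for
  "Emits one row per comma-separated item under key k, with a 1-based :Rank."
  [rows k]
  (for [row rows
        ;; pair each item with its 0-based position
        [idx item] (map-indexed vector (str/split (get row k) #","))]
    (-> row
        (dissoc k)
        (assoc :item item :Rank (inc idx)))))

(melt-for sample :list)
;; => ({:group "a", :item "123", :Rank 1} {:group "a", :item "250", :Rank 2}
;;     {:group "b", :item "234", :Rank 1} {:group "b", :item "260", :Rank 2})
```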

Grouping words and more

I'm working on a project to learn Clojure in practice. I'm doing well, but sometimes I get stuck. This time I need to transform sequence of the form:
[":keyword0" "word0" "word1" ":keyword1" "word2" "word3"]
into:
[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
I've been trying for at least two hours, but I don't know enough Clojure functions to compose something useful that solves the problem in a functional manner.
I think that this transformation should include some partition, here is my attempt:
(partition-by (fn [x] (.startsWith x ":")) *1)
But the result looks like this:
((":keyword0") ("word0" "word1") (":keyword1") ("word2" "word3"))
Now I should group it again... I doubt that I'm doing the right thing here... Also, I need to convert strings (only those that begin with :) into keywords. I think this combination should work:
(keyword (subs ":keyword0" 1))
How do I write a function that performs the transformation in the most idiomatic way?
Here is a high performance version, using reduce
(reduce (fn [acc next]
(if (.startsWith next ":")
(conj acc [(-> next (subs 1) keyword)])
(conj (pop acc) (conj (peek acc)
next))))
[] data)
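Note that the reduce version above expects the input to start with a keyword string; otherwise pop is called on an empty vector and throws. A hedged sketch of a guarded variant follows, assuming Clojure 1.8+ for clojure.string/starts-with? and assuming that words before the first keyword should simply be dropped. The name group-words is hypothetical.

```clojure
(require '[clojure.string :as str])

;; group-words is a made-up name for illustration.
(defn group-words
  "Groups words into vectors, each starting with the keyword that introduced it."
  [words]
  (reduce (fn [acc w]
            (cond
              ;; a ":name" string starts a new group headed by the keyword :name
              (str/starts-with? w ":") (conj acc [(keyword (subs w 1))])
              ;; assumption: words before the first keyword are ignored
              (empty? acc) acc
              ;; otherwise append the word to the most recent group
              :else (conj (pop acc) (conj (peek acc) w))))
          []
          words))

(group-words [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
;; => [[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
```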
Alternatively, you could extend your code like this
(->> data
(partition-by #(.startsWith % ":"))
(partition 2)
(map (fn [[[kw-str] strs]]
(cons (-> kw-str
(subs 1)
keyword)
strs))))
What about this:
(defn group-that [arg]
  (if (not-empty arg)
    (loop [list arg, acc [], result []]
      (if (not-empty list)
        (if (.startsWith (first list) ":")
          (if (not-empty acc)
            (recur (rest list) (vector (first list)) (conj result acc))
            (recur (rest list) (vector (first list)) result))
          (recur (rest list) (conj acc (first list)) result))
        (conj result acc)))))
Just one pass over the seq, and no need for extra macros.
Since the question is already here... This is my best effort:
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
(->> data
(partition-by (fn [x] (.startsWith x ":")))
(partition 2)
(map (fn [[[k] w]] (apply conj [(keyword (subs k 1))] w))))
I'm still looking for a better solution or criticism of this one.
First, let's construct a function that breaks vector v into sub-vectors, the breaks occurring everywhere property pred holds.
(defn breakv-by [pred v]
(let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
starts (cons 0 break-points)
finishes (concat break-points [(count v)])]
(mapv (partial subvec v) starts finishes)))
For our case, given
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
then
(breakv-by #(= (first %) \:) data)
produces
[[] [":keyword0" "word0" "word1"] [":keyword1" "word2" "word3"]]
Notice that the initial sub-vector is different:
- It has no element for which the predicate holds.
- It can be of length zero.
All the others
- start with their only element for which the predicate holds and
- are at least of length 1.
So breakv-by behaves properly with data that
- doesn't start with a breaking element or
- has a succession of breaking elements.
For the purposes of the question, we need to muck about with what breakv-by produces somewhat:
(let [pieces (breakv-by #(= (first %) \:) data)]
(mapv
#(update-in % [0] (fn [s] (keyword (subs s 1))))
(rest pieces)))
;[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]