Idiomatic Clojure way to find most frequent items in a seq

Given a sequence of items I want to find the n most frequent items, in descending order of frequency. So for example I would like this unit test to pass:
(fact "can find 2 most common items in a sequence"
(most-frequent-n 2 ["a" "bb" "a" "x" "bb" "ccc" "dddd" "dddd" "bb" "dddd" "bb"])
=>
'("bb" "dddd"))
I am fairly new to Clojure and still trying to get to grips with the standard library. Here is what I came up with:
(defn- sort-by-val [s] (sort-by val s))
(defn- first-elements [pairs] (map #(get % 0) pairs))
(defn most-frequent-n
"Return the n most common items, e.g.
(most-frequent-n 2 [:a :b :a :d :x :b :c :d :d :b :d :b])
=> (:d :b)"
[n items]
(take n (->
items ; [:a :b :a :d :x :b :c :d :d :b :d :b]
frequencies ; {:a 2, :b 4, :d 4, :x 1, :c 1}
seq ; ([:a 2] [:b 4] [:d 4] [:x 1] [:c 1])
sort-by-val ; ([:x 1] [:c 1] [:a 2] [:b 4] [:d 4])
reverse ; ([:d 4] [:b 4] [:a 2] [:c 1] [:x 1])
first-elements))) ; (:d :b :a :c :x)
However, this seems like a complicated chain of functions for a fairly common operation. Is there a more elegant, more idiomatic, or more efficient way to do this?

As you have discovered, typically you would use a combination of sort-by and frequencies to get a frequency-sorted list.
(sort-by val (frequencies ["a" "bb" "a" "x" "bb" "ccc" "dddd" "dddd" "bb" "dddd" "bb"]))
=> (["x" 1] ["ccc" 1] ["a" 2] ["dddd" 3] ["bb" 4])
Then you can manipulate this fairly easily to get the lowest / highest frequency items. Perhaps something like:
(defn most-frequent-n [n items]
(->> items
frequencies
(sort-by val)
reverse
(take n)
(map first)))
This is again pretty similar to your solution, except that the ->> macro makes the helper functions unnecessary.
So overall I think your solution is pretty good. Don't worry about the chain of functions: it's actually a very short solution for what is logically quite a complicated concept. Try coding the same thing in C# or Java and you will see what I mean.
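One possible variation, offered as a sketch rather than the canonical form: sort-by accepts a comparator as its second argument, so passing > sorts by descending count directly and the reverse step is not needed. The name most-frequent-n-desc is made up for this example. Because the sort is stable, items with equal counts may come out in a different order than with the sort-then-reverse version.

```clojure
(defn most-frequent-n-desc
  "Return the n most frequent items, most frequent first."
  [n items]
  (->> items
       frequencies        ; map of item -> count
       (sort-by val >)    ; entries ordered by descending count
       (take n)
       (map key)))        ; keep only the items, drop the counts

(most-frequent-n-desc 2 ["a" "bb" "a" "x" "bb" "ccc" "dddd" "dddd" "bb" "dddd" "bb"])
;; => ("bb" "dddd")
```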

Related

into vs. partition

This makes sense:
user=> (into {} [[:a 1] [:b 2]])
{:a 1, :b 2}
But why does this generate an error?
user=> (into {} (partition 2 [:a 1 :b 2]))
ClassCastException clojure.lang.Keyword cannot be cast to java.util.Map$Entry clojure.lang.ATransientMap.conj (ATransientMap.java:44)
Just to be sure:
user=> (partition 2 [:a 1 :b 2])
((:a 1) (:b 2))
Does into have a problem with lazy sequences? If so, why?
Beyond an explanation of why this doesn't work, what is the recommended way to conj a sequence of key-value pairs like [:a 1 :b 2] into a map? (apply conj doesn't seem to work, either.)
You can apply the sequence to assoc:
(apply assoc {:foo 1} [:a 1 :b 2])
=> {:foo 1, :a 1, :b 2}
As for whether into has a problem with lazy sequences: no, into is commonly used with lazily evaluated sequences. The following sequence is lazy, but each key/value tuple is a vector, which is why it works when into reduces the pairs into the map:
(into {} (map vector (range 3) (repeat :x)))
=> {0 :x, 1 :x, 2 :x}
This doesn't work because the key/value pairs are lists:
(into {} (map list (range 3) (repeat :x))) ;; throws ClassCastException
So the difference isn't laziness. into reduces the pairs into the map with conj, and conj on a map accepts a key/value pair only as a 2-element vector or a MapEntry:
(conj {} [:a 1]) ;; ok
(conj {} (clojure.lang.MapEntry. :a 1)) ;; ok
(conj {} '(:a 1)) ;; not ok
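To try the three cases at a REPL without the third one aborting the session, a small wrapper can catch the exception. This is only a sketch; conj-result is a hypothetical helper name, not part of clojure.core.

```clojure
;; Hypothetical helper: conj the pair onto an empty map, or report the failure.
(defn conj-result [pair]
  (try
    (conj {} pair)
    (catch ClassCastException _
      :class-cast-exception)))

(conj-result [:a 1])                        ;; => {:a 1}
(conj-result (clojure.lang.MapEntry. :a 1)) ;; => {:a 1}
(conj-result '(:a 1))                       ;; => :class-cast-exception
```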
Update: assoc wrapper for applying empty/nil sequences as suggested in comments:
(defn assoc*
([m] m)
([m k v & kvs]
(apply assoc m k v kvs)))
The recommended way (assuming the seq argument is non-empty, as the OP pointed out) would be:
Clojure 1.9.0
user=> (apply assoc {} [:a 1 :b 2])
{:a 1, :b 2}
The version with partition doesn't work because the blocks that partition returns are seqs and those are not treated as map entries when conj'd on to a map the way vectors and actual map entries are.
E.g. (into {} (map vec) (partition 2 [:a 1 :b 2])) would work because here the pairs get converted to vectors before conjing.
Still the approach with assoc is preferable unless there's some particular circumstance that makes into convenient (like, say, if you have a bunch of transducers that you want to use for preprocessing your partition-generated pairs etc.).
Clojure treats a 2-vec such as [:a 1] as equivalent to a MapEntry, doing what amounts to "automatic type conversion". I try to avoid this and always be explicit.
(first {:a 1}) => <#clojure.lang.MapEntry [:a 1]>
(conj {:a 1} [:b 2]) => <#clojure.lang.PersistentArrayMap {:a 1, :b 2}>
So we see that a MapEntry prints like a vector but has a different type (just like a Clojure seq prints like a list but has a different type). seq converts a Clojure map into a sequence of MapEntry objects, and first gets us the first one (most Clojure functions call (seq ...) on any input collections before any other processing).
Notice that conj does the inverse type conversion, treating the vector [:b 2] as if it were a MapEntry. However, conj won't perform automatic type conversion for a list or a seq:
(throws? (conj {:a 1} '(:b 2)))
(throws? (into {:a 1} '(:b 2)))
into has the same problem since it is basically just (reduce conj <1st-arg> <2nd-seq>).
The other answers already have 3 ways that work:
(assoc {} :b 2) => {:b 2}
(conj {} [:b 2]) => {:b 2}
(into {} [[:a 1] [:b 2]]) => {:a 1, :b 2}
However, I would avoid those and stick to either hash-map or sorted-map, both of which avoid the problem of empty input seqs:
(apply hash-map []) => {} ; works for empty input seq
(apply hash-map [:a 1 :b 2]) => {:b 2, :a 1}
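If your input sequence is a list of pairs, flatten is sometimes helpful (but only when none of the keys or values is itself a sequential collection, since flatten would flatten those too):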
(apply sorted-map (flatten [[:a 1] [:b 2]])) => {:a 1, :b 2}
(apply hash-map (flatten '((:a 1) (:b 2)))) => {:a 1, :b 2}
P.S.
Please note that these are not the same:
java.util.Map$Entry (listed in jdk docs as "Map.Entry")
clojure.lang.MapEntry
P.P.S
If you already have a map and want to merge in a (possibly empty) sequence of key-value pairs, just use a combination of into and hash-map:
(into {:a 1} (apply hash-map [])) => {:a 1}
(into {:a 1} (apply hash-map [:b 2])) => {:a 1, :b 2}

Understanding what appears to be an example of destructuring in clojure

The following code:
(into {} [[:a 1][:b 2][:c 3][:d 4][:e 5]])
...produces a map(?) of keyword / value pairs. I don't quite understand the significance of the double square brackets and I am assuming it is an example of destructuring?
Thanks,
~Caitlin
It's not destructuring; it's just an example of using the core function into.
into conjoins two collections by repeatedly adding elements from the second collection to the first one with conj.
So, (into {} [[:a 1][:b 2]]) is equivalent to
(-> {} (conj [:a 1]) (conj [:b 2]))
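The equivalence can be checked directly. A minimal sketch (pairs is just an illustrative name; into may use transients internally, so this compares results, not the implementation):

```clojure
(def pairs [[:a 1] [:b 2]])

(= (into {} pairs)
   (reduce conj {} pairs)
   (-> {} (conj [:a 1]) (conj [:b 2])))
;; => true
```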
This answer is a supplement to Leonid's. One can think of a Clojure map as a collection of "map entries", i.e. key/value pairs. These print like 2-element vectors, though their class is clojure.lang.MapEntry rather than clojure.lang.PersistentVector. Nevertheless, if you want to convert something into a map using into, it makes sense to pass the data that will become map entries as 2-element vectors.
=> (def foo {:a 1 :b 2 :c 3})
#'/foo
=> (find foo :b)
[:b 2]
=> (class (find foo :b))
clojure.lang.MapEntry
=> (map identity foo)
([:c 3] [:b 2] [:a 1])
=> (map class (map identity foo))
(clojure.lang.MapEntry clojure.lang.MapEntry clojure.lang.MapEntry)
=> (list [:c 3] [:b 2] [:a 1])
([:c 3] [:b 2] [:a 1])
=> (map class (list [:c 3] [:b 2] [:a 1]))
(clojure.lang.PersistentVector clojure.lang.PersistentVector clojure.lang.PersistentVector)

What does Clojure's zip-map do?

I am new to Clojure and need help with this function. If you could please tell me what this function does and how it works, I would be really thankful.
(defn zip-map
[k v]
(into {} (map vec (partition 2 (interleave k v)))))
Example of usage:
(zip-map [:a :b :c] [1 2 3]) ;=> {:a 1, :b 2, :c 3}
And from the inside out:
(interleave [:a :b :c] [1 2 3]) ;=> (:a 1 :b 2 :c 3)
(partition 2 '(:a 1 :b 2 :c 3)) ;=> ((:a 1) (:b 2) (:c 3))
(map vec '((:a 1) (:b 2) (:c 3))) ;=> ([:a 1] [:b 2] [:c 3])
(into {} '([:a 1] [:b 2] [:c 3])) ;=> {:a 1, :b 2, :c 3}
The function is more complicated, and hence harder to understand, than it needs to be. It could be written thus:
(defn zip-map [ks vs]
(into {} (map vector ks vs)))
Then
(zip-map [:a :b :c] [1 2 3])
;{:a 1, :b 2, :c 3}
as before.
The function imitates the standard zipmap, which you can find explained, complete with source code, in the official docs or ClojureDocs, which also gives examples. Both these sites help you to pick your way through the Clojure vocabulary.
As is often the case, the standard function is faster though more complex than the simple one-liner above.
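For comparison, a quick look at the standard zipmap on the same input. The second call is an extra illustration of how it behaves when the inputs differ in length:

```clojure
(zipmap [:a :b :c] [1 2 3])
;; => {:a 1, :b 2, :c 3}

;; Pairing stops at the shorter input, so the unmatched key :c is dropped.
(zipmap [:a :b :c] [1 2])
;; => {:a 1, :b 2}
```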

Clojure - map-indexed is it possible to start with an index other than 0?

Usually the map-indexed function maps each list item to its index, where the first index is 0, the second is 1, etc.
Is it possible to have the index start at another number and proceed from there?
The easiest way is to remember that you can pass multiple sequences to map.
(map vector [:a :b :c] (iterate inc 100))
=> ([:a 100] [:b 101] [:c 102])
You can simply adjust the index inside the function you pass to map-indexed.
For example, to start at 1 instead of 0, use inc:
(map-indexed (fn [i v] (vector (inc i) v)) ["one" "two" "three"])
This returns
([1 "one"] [2 "two"] [3 "three"])
map-indexed does not allow this. However, it's easy to write your own version that lets you do it.
(defn map-indexed-from [n f coll]
(map f (range n Double/POSITIVE_INFINITY) coll))
Example usage:
user> (map-indexed-from 5 vector [:a :b :c])
([5 :a] [6 :b] [7 :c])
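A variant of the same idea, sketched with (iterate inc n) as the infinite index source so that no floating-point infinity is involved. The name map-indexed-from* is invented here to keep it apart from the version above:

```clojure
(defn map-indexed-from*
  "Like map-indexed, but the index starts at n instead of 0."
  [n f coll]
  ;; (iterate inc n) is the lazy, infinite sequence n, n+1, n+2, ...
  ;; map stops as soon as coll is exhausted.
  (map f (iterate inc n) coll))

(map-indexed-from* 5 vector [:a :b :c])
;; => ([5 :a] [6 :b] [7 :c])
```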

Convert map of lists into list of maps (i.e. rows to columns)

I have the following data structure in Clojure
{:a [1 2 3]
:b [4 5 6]
:c [7 8 9]}
And I'd like to convert it into something like
[{:a 1 :b 4 :c 7}
{:a 2 :b 5 :c 8}
{:a 3 :b 6 :c 9}]
At the moment I'm kinda stumped as to how to do this.
In Clojure you can never rely on the order of keys in a map after transformations: maps are indexed by key, not by position.
Vectors, however, are ordered, and with get-in you can look up an element by position using a vector of coordinates.
=> (def mat
[[1 2 3]
[4 5 6]
[7 8 9]])
=> (defn transpose
[m]
(apply mapv vector m))
=> (get-in (transpose mat) [1 2])
8
Got it:
(defn transpose-lists [x]
(map (fn [m] (zipmap (keys x) m)) (apply map vector (vals x))))
Unfortunately it doesn't preserve order of the keys.
If anyone has a better solution then of course I'd like to hear it!
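For reference, here is the same idea as a sketch that returns a vector, matching the desired output above. The name columns->rows is made up, and it assumes a non-empty map whose values all have the same length. keys and vals of the same map come back in matching order, so each value is paired with the right key; only the printed order of keys inside each result map may vary.

```clojure
(defn columns->rows
  "Turn a map of equal-length sequences into a vector of maps, one per position."
  [m]
  (let [ks (keys m)]
    ;; mapv walks all value sequences in parallel; each step receives one
    ;; element per key, which zipmap pairs back up with the keys.
    (apply mapv (fn [& vs] (zipmap ks vs)) (vals m))))

(columns->rows {:a [1 2 3]
                :b [4 5 6]
                :c [7 8 9]})
;; => [{:a 1, :b 4, :c 7} {:a 2, :b 5, :c 8} {:a 3, :b 6, :c 9}]
```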