Clojure: Why is flatten "the wrong thing to use"?

I've read this kind of thing a couple of times since I've started Clojure.
For instance, here: How to convert map to a sequence?
And in a tweet, which I don't remember exactly, that said more or less "if you're using flatten you're probably doing it wrong".
I would like to know, what is wrong with flatten?

I think this is what they were talking about in the answer you linked:
so> ((comp flatten seq) {:a [1 2] :b [3 4]})
(:b 3 4 :a 1 2)
so> (apply concat {:a [1 2] :b [3 4]})
(:b [3 4] :a [1 2])
Flatten will remove the structure from the keys and values, which is probably not what you want. There are use cases where you do want to remove the structure of nested sequences, and flatten was written for such cases. But for destructuring a map, you usually do want to keep the internal sequences as is.

Anything flatten can't flatten, it ought to return intact. At the top level, it doesn't.
(flatten 8)
()
(flatten {1 2, 3 4})
()
If you think you've supplied a sequence, but you haven't, you'll get the effect of supplying an empty sequence. This is the sort of leg-breaker that most core functions take care to preclude. For example, (str nil) => "".
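To make the point above concrete, here is a small sketch of what clojure.core/flatten returns for a few non-sequential inputs. The name flatten-results is only for illustration; everything else is from clojure.core.

```clojure
;; clojure.core/flatten only descends into sequential collections
;; (lists, vectors, seqs). Anything else passed at the top level
;; silently yields an empty sequence.
(def flatten-results
  {:number (flatten 8)
   :nil    (flatten nil)
   :string (flatten "abc")
   :set    (flatten #{1 2 3})
   :map    (flatten {1 2, 3 4})
   ;; Nested non-sequential collections are kept as single elements.
   :nested (flatten [1 [2 #{3 4}] {:k [5]}])})
```

Only the :nested entry is non-empty: the set and the map inside the vector survive intact, while the same kinds of values at the top level are lost.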
flatten ought to work like this:
(defn flatten [x]
  (if (sequential? x)
    ((fn flat [y] (if (sequential? y) (mapcat flat y) [y])) x)
    x))
(flatten 8)
;8
(flatten [{1 2, 3 4}])
;({1 2, 3 4})
(flatten [1 [2 [[3]] 4]])
;(1 2 3 4)
You can find Steve Miner's faster lazy version of this here.

Probability of "probably"
Listen to people who say "you're probably doing it wrong", but also do not forget they say "probably", because it all depends on the problem.
For example, if your task is to flatten a map and you couldn't care less which element was a key and which was a value, because you just need an unstructured sequence of everything, then by all means use flatten (or apply concat).
The reason it causes a "suspicion" is the fact that you had / were given a map to begin with, hence whoever gave it to you meant a "key value" paired structure, and if you flatten it, you lose that intention, as well as flexibility and clarity.
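As a rough illustration of what "losing the intention" means, the sketch below compares apply concat with flatten on the map from the earlier answer. The var names (m, pairs-kept, rebuilt, flattened) are made up for this example.

```clojure
(def m {:a [1 2] :b [3 4]})

;; apply concat only removes one level (the map entries),
;; so each value is still a single element next to its key.
(def pairs-kept (apply concat m))

;; Because keys and values still alternate, the map can be rebuilt.
(def rebuilt (apply hash-map pairs-kept))

;; flatten also dissolves the vectors, so there is no way to tell
;; afterwards where one entry ended and the next began.
(def flattened (flatten (seq m)))
```

Here rebuilt is equal to m again, whereas flattened is just six loose elements.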
Keep in mind
In case you are still not sure what to do with a map for your particular problem, keep a for comprehension in mind, since it gives you full control over what to do with the map as you iterate over it:
create a vector?
;; can also be (apply vector {:a 34 :b 42}), but just to use "for" for all consistently
user=> (into [] (for [[k v] {:a 34 :b 42}] [k v]))
[[:a 34] [:b 42]]
create another map?
user=> (into {} (for [[k v] {:a 34 :b 42}] [k (inc v)]))
{:a 35, :b 43}
create a set?
user=> (into #{} (for [[k v] {:a 34 :b 42}] [k v]))
#{[:a 34] [:b 42]}
reverse keys and values?
user=> (into {} (for [[k v] {:a 34 :b 42}] [v k]))
{34 :a, 42 :b}
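For comparison, the same transformations can be sketched without for. This assumes Clojure 1.7 or later for the transducer arity of into; the var names are illustrative only.

```clojure
(require '[clojure.set :as set])

(def m {:a 34 :b 42})

;; A map is already a sequence of [k v] entries, so vec is enough
;; to get a vector of pairs.
(def as-pairs (vec m))

;; into with a map transducer builds the new map directly,
;; without an intermediate lazy sequence.
(def incremented (into {} (map (fn [[k v]] [k (inc v)])) m))

;; clojure.set/map-invert swaps keys and values.
(def inverted (set/map-invert m))
```

Whether this reads better than the for versions is a matter of taste; for keeps all four cases visually uniform.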

Related

How do I exit a clojure.walk/postwalk over nested maps on the first true predicate match?

I am using clojure.walk/postwalk to compare a predicate to every map in a nested collection and want to exit with true on the first true. How would I do that? I am ok with it walking the whole data structure and then returning true if there is a true match.
As a corollary question, I guess the same question could apply to when one performs a map as opposed to a postwalk.
UPDATE: this was truly a tired/lazy question; I should have provided a code example. That said, I'm leaving it up in case anyone is currently formulating an answer to my half-baked question. The only thing that is worse than asking one is taking it down after someone has been kind enough to start helping. I will be quite content if no one answers, if they request a better question, or if they just give me suggestions of what to research.
A slightly different way to do it, also employing tree-seq:
(defn find-deep [pred data not-found]
  (->> data
       (tree-seq coll? seq)
       (some #(when (pred %) [%]))
       ((fnil first [not-found]))))
user> (find-deep #(= (:c %) 30) [{:a 10 :b [{:c 20 :d {:c 30}}]}] ::none)
;;=> {:c 30}
user> (find-deep #(= (:c %) 40) [{:a 10 :b [{:c 20 :d {:c 30}}]}] ::none)
;;=> :user/none
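One detail worth spelling out, as I read the code: the match is wrapped in a vector so that falsy matches (nil or false) can be told apart from "nothing found". The sketch below repeats find-deep so it is self-contained and adds a hypothetical naive-find for contrast.

```clojure
(defn find-deep [pred data not-found]
  (->> data
       (tree-seq coll? seq)
       ;; Wrap the match in a vector so that a matching nil/false
       ;; still makes `some` return something truthy.
       (some #(when (pred %) [%]))
       ;; Unwrap the match, or fall back to not-found when `some` gave nil.
       ((fnil first [not-found]))))

;; Hypothetical naive version: returns the element itself,
;; so a matching false/nil is indistinguishable from no match.
(defn naive-find [pred data]
  (some #(when (pred %) %) (tree-seq coll? seq data)))

(def found-false (find-deep false? [1 [false]] ::none))   ; the match itself
(def found-none  (find-deep false? [1 [true]] ::none))    ; the sentinel
(def naive-result (naive-find false? [1 [false]]))        ; nil, match is lost
```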
You may be interested in this function I call walk-seq. It returns a lazy depth-first sequence over a data structure which you can then seek against to find the first match. I find it to be preferable here because it doesn't require callbacks and exceptions to exit early like clojure.walk/postwalk would.
(defn walk-seq
  "Returns a lazy depth-first sequence of all forms within a data structure."
  [form]
  (tree-seq coll? seq form))
(defn seek
  "Find the first element in the collection that matches pred,
  else returns not-found. Note that using seek can lead to
  poor performance and you should always use indexed data
  structures instead of multiple seeks over the same data."
  ([pred coll]
   (seek pred coll nil))
  ([pred coll not-found]
   (reduce (fn [nf x] (if (pred x) (reduced x) nf)) not-found coll)))
Usage of walk-seq:
(walk-seq {:a [{:b -1} {:b 1}] :b 2})
=>
({:a [{:b -1} {:b 1}], :b 2}
[:a [{:b -1} {:b 1}]]
:a
[{:b -1} {:b 1}]
{:b -1}
[:b -1]
:b
-1
{:b 1}
[:b 1]
:b
1
[:b 2]
:b
2)
Combining the two:
(seek (every-pred number? pos?) (walk-seq {:a [{:b -1} {:b 1}] :b 2}))
=>
1
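To get back to what the question literally asks (true as soon as any map in the structure satisfies a predicate), a minimal sketch along the same lines could look like this. The name any-map-matches? is made up for this example.

```clojure
(defn any-map-matches?
  "Returns true if pred holds for at least one map nested anywhere in data.
  Hypothetical helper; only maps are tested, other nodes are skipped."
  [pred data]
  (boolean
    (some #(and (map? %) (pred %))
          ;; Lazy depth-first walk; `some` stops consuming it
          ;; once a match has been found.
          (tree-seq coll? seq data))))

(def data [{:a 10 :b [{:c 20 :d {:c 30}}]}])

(def hit  (any-map-matches? #(= 30 (:c %)) data))
(def miss (any-map-matches? #(= 40 (:c %)) data))
```

The map? guard means the predicate never sees map entries, keys, or plain vectors, so it can safely assume a map argument.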
It can be done using postwalk by throwing an exception once the predicate is true as I suggested in the comment. This approach is unconventional but concise and lets us reuse the logic of postwalk for walking the datastructure:
(defn walk-some [pred data]
  (try
    (clojure.walk/postwalk
      #(if (pred %)
         (throw (ex-info "Found" {:data %}))
         %)
      data)
    false
    (catch clojure.lang.ExceptionInfo e
      true)))
(walk-some #(and (number? %) (odd? %)) {:a [[9] 3]})
;; => true
(walk-some #(and (number? %) (even? %)) {:a [[9] 3]})
;; => false
Using exceptions for control flow is rarely needed, but occasionally it is useful to deviate a bit from convention. You may want to define a custom exception type for improved robustness in case your predicate can throw objects of type ExceptionInfo.
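A lighter-weight alternative to a full custom exception type, sketched here as an assumption rather than as the answer's own method, is to tag the ex-data with a namespaced marker key and rethrow anything that does not carry it. The ::found key is invented for this example.

```clojure
(require '[clojure.walk :as walk])

(defn walk-some
  "Returns true if pred holds for any node of data, false otherwise.
  ExceptionInfo thrown by pred itself is rethrown, not swallowed."
  [pred data]
  (try
    (walk/postwalk
      (fn [x]
        (if (pred x)
          ;; ::found marks this exception as our own early exit.
          (throw (ex-info "Found" {::found true, :data x}))
          x))
      data)
    false
    (catch clojure.lang.ExceptionInfo e
      (if (::found (ex-data e))
        true
        (throw e)))))
```

This is not a distinct exception class, so it does not protect against a predicate that deliberately uses the same marker key, but it does stop unrelated ex-info errors from being reported as a match.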

How to merge maps and get a map of lists?

Let's say we have a list of maps. The maps all have the same keywords, but we don't know the keywords beforehand.
[{:a 1 :b 2} {:a 3 :b 4}]
And what would be the idiomatic way of merging this list into such a map:
{:a [1 3]
:b [2 4]}
Doesn't seem hard, however as I start to implement the function, it gets super ugly and repetitive. I have a feeling that there are much cleaner ways of achieving this.
Thank you
You can actually get a pretty elegant solution by using several functions from the standard library:
(defn consolidate [& ms]
  (apply merge-with conj (zipmap (mapcat keys ms) (repeat [])) ms))
Example:
(consolidate {:a 1 :b 2} {:a 3 :b 4})
;=> {:a [1 3], :b [2 4]}
One cool thing about this solution is that it works even if the maps have different key sets.
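A quick sketch of the different-key-sets case, with consolidate repeated so the snippet runs on its own (the var name mixed is illustrative):

```clojure
(defn consolidate [& ms]
  ;; The zipmap seeds every key with an empty vector, so conj always
  ;; has a vector to add to. Without that seed, merge-with would call
  ;; conj on the first plain value (e.g. the number 1) and fail.
  (apply merge-with conj (zipmap (mapcat keys ms) (repeat [])) ms))

;; :b appears only in the first map, :c only in the second.
(def mixed (consolidate {:a 1 :b 2} {:a 3 :c 4}))
```

Keys that are missing from some maps simply end up with shorter vectors; no nil placeholders are inserted.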
I would rather use a double reduction to "merge" them with update:
(defn merge-maps-with-vec [maps]
  (reduce (partial reduce-kv #(update %1 %2 (fnil conj []) %3))
          {} maps))
user> (merge-maps-with-vec [{:a 1 :b 2} {:a 3 :b 4 :c 10}])
{:a [1 3], :b [2 4], :c [10]}
It is not as expressive as @Sam Estep's answer, but on the other hand it doesn't generate any intermediate collections (such as the map from every key to an empty vector, which also needs one extra pass through every entry of every map). Of course, premature optimization is bad in general, but I guess it won't hurt here. The reduce-based solution looks a bit more obscure, but once it is put into a library with proper docs it will not look as obscure to the end user (or to yourself a year later).
While many solutions are possible, here is one that uses some of the convenience functions in the Tupelo library:
(ns clj.core
  (:use tupelo.core)
  (:require [clojure.test :refer [deftest]] ; deftest is used below
            [tupelo.schema :as ts]
            [schema.core :as s]))
(s/defn gather-keys
  [list-of-maps :- [ts/KeyMap]]
  (newline)
  (let [keys-vec (keys (first list-of-maps))]
    (s/validate [s/Keyword] keys-vec) ; verify it is a sequence of keywords
    (apply glue
      (for [curr-key keys-vec]
        {curr-key (forv [curr-map list-of-maps]
                    (get curr-map curr-key))}))))
(deftest t-maps
  (spyx
    (gather-keys [{:a 1 :b 2}
                  {:a 3 :b 4}])))
(gather-keys [{:a 1, :b 2} {:a 3, :b 4}]) ;=> {:a [1 3], :b [2 4]}
Note that this solution assumes that each input map has an identical set of keys. Normally I'd want to enforce that assumption with a sanity check in the code as well.
Looking at the answer from Sam, I would rewrite it with some temporary variables to help document the sub-steps:
(defn consolidate-keys [list-of-maps]
  (let [keys-set    (set (mapcat keys list-of-maps))
        base-result (zipmap keys-set (repeat []))]
    (apply merge-with conj base-result list-of-maps)))
(consolidate-keys [{:a 1 :b 2}
                   {:a 3 :z 9}])
;=> {:z [9], :b [2], :a [1 3]}

Transforming list of hashmaps into set

Having two maps:
(def a {:a 1 :b 2 :c 3})
(def b {:b 222 :d 4})
placed into one vector:
(def l [a b])
what's the easiest way to construct a single map (a set in terms of unique keys) in which, on a key conflict (:b in this case), the left-hand operand takes priority (:b 2 in this case)? In other words, I'd like to get this result:
{:a 1 :b 2 :c 3 :d 4}
Two solutions which came to mind are:
(apply merge-with (fn [left _] left) l)
(reduce conj (reverse l))
The first one doesn't seem idiomatic to me, and the second one worries me because of the eager list reversal, which sounds a bit inefficient. Any other ideas?
Numerous other possibilities of which (reduce #(into %2 %1) l) (or with merge instead of into) could be considered. Your merge-with solution is absolutely fine.
How about
(apply merge (reverse l))
It seems fine and is similar to the second one.
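As a sanity check, here is a small sketch that runs the variants mentioned in this thread side by side. The last entry uses rseq, which is my own addition: on a vector it gives a reversed view without copying, which may ease the worry about reverse, although for a handful of maps the difference is unlikely to matter.

```clojure
(def a {:a 1 :b 2 :c 3})
(def b {:b 222 :d 4})
(def l [a b])

(def expected {:a 1 :b 2 :c 3 :d 4})

(def results
  [(apply merge-with (fn [left _] left) l) ; keep the left value on conflict
   (reduce conj (reverse l))               ; later maps are conj'ed over earlier ones
   (reduce #(into %2 %1) l)                ; pour the accumulated map over the next one
   (apply merge (reverse l))               ; merge is right-biased, so reverse first
   (apply merge (rseq l))])                ; same, but rseq works on vectors only
```

All five expressions produce the expected map for this input.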

dissoc in clojure can't get to work

I have this function:
(defn dissoc-all [m kv]
  (let [[k & ks] kv]
    (dissoc m k ks)))
Where m is the map and kv is the vector of keys. I use it like this:
(dissoc-all {:a 1 :b 2} [:a :b])
=>{:b 2}
This is not what I expected. ks has :b but I don't know why it is not being used by dissoc. Can anyone help me with this?
Edit: An added question is why this is not triggering the 3rd overload of dissoc, which is dissoc [map key & ks].
Changed the name from dissoc-in to dissoc-all because, as noisesmith has said, -in is not a proper name for this, and I agree.
This won't work because ks is a collection of all the elements in kv after the first. So instead of :b it is [:b].
Instead, you can just use apply:
(defn dissoc-in [m vs]
  (apply dissoc m vs))
Also, dissoc-in is an odd name for this function, because the standard functions with -in in the name all do nested access, and this does not use the keys to do any nested access of the map.
Why not something like this?
(defn dissoc-all [m ks]
  (apply dissoc m ks))
(dissoc-all {:a 1 :b 2} [:a :b])
=> {}
The reason the third overload of dissoc is not getting called is that it does not expect a collection of keys like [:a :b]; it expects just the keys.
For example:
(dissoc {:a "a" :b "b" :c "c" :d "d"} :a :b :c)
=> {:d "d"}
Further to noisesmith's answer:
You're being confused by the overloads/arities of dissoc, which have this simple effect:
[m & ks]
"Returns a new map of the same (hashed/sorted) type,
that does not contain a mapping for any of ks. "
The explicit arities for no keys and one key are for performance. Many clojure functions are so organised, and the documentation follows the organisation, not the underlying idea.
Now, the action of
(dissoc-all {:a 1 :b 2} [:a :b])
;{:b 2}
is to bind
k to :a
ks to [:b]
Note the latter. The example removes the :a but fails to remove the [:b], which isn't there.
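The binding described above can be reproduced directly at the REPL. In this sketch the var names are illustrative, and the third expression is an extra case I added to show that a collection really is treated as a single key.

```clojure
(def m {:a 1 :b 2})

;; What the original dissoc-all effectively does: the collection [:b]
;; is passed as ONE key, and no such key exists in m.
(def direct (dissoc m :a [:b]))

;; apply spreads the trailing collection into separate arguments,
;; so this is the same as (dissoc m :a :b).
(def applied (apply dissoc m :a [:b]))

;; If the map really had the vector [:b] as a key, plain dissoc would remove it.
(def vector-key (dissoc {:a 1 [:b] 2} :a [:b]))
```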
You can use apply to crack open ks:
(defn dissoc-all [m kk]
  (let [[k & ks] kk]
    (apply dissoc m k ks)))
(dissoc-all {:a 1 :b 2} [:a :b])
;{}
... or, better, do as @noisesmith does and short-circuit the destructuring, using apply at once.

clojure: create a lazy-seq containing another lazy-seq

I would like to create a lazy-seq containing another lazy-seq using clojure.
The data structure that I already have is a lazy-seq of maps and it looks like this:
({:a 1 :b 1})
Now I would like to put that lazy-seq into another one so that the result would be a lazy-seq of a lazy-seq of map:
(({:a 1 :b 1}))
Does anyone know how to do this? Any help would be appreciated
Regards,
Here is an example of creating a list containing a list of maps:
=> (list (list {:a 1 :b 1}))
(({:a 1, :b 1}))
It's not lazy, but you can make both lists lazy with lazy-seq macro:
=> (lazy-seq (list (lazy-seq (list {:a 1 :b 1}))))
or the same code with -> macro:
=> (-> {:a 1 :b 1} list lazy-seq list lazy-seq)
Actually, if you replace the lists here with vectors, you'll get the same result:
=> (lazy-seq [(lazy-seq [{:a 1 :b 1}])])
(({:a 1, :b 1}))
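If you want to convince yourself that both levels really are lazy, realized? can be used as a rough check. This is only a sketch; the var names are made up, and note that printing a sequence at the REPL realizes it.

```clojure
(def nested
  (lazy-seq (list (lazy-seq (list {:a 1 :b 1})))))

;; Nothing has been evaluated yet.
(def outer-before (realized? nested))

;; Taking the first element forces the outer seq only.
(def inner (first nested))
(def outer-after (realized? nested))

;; The inner lazy-seq is still untouched at this point.
(def inner-before (realized? inner))
```

Here outer-before and inner-before are false, while outer-after is true.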
I'm not sure what you're trying to do or why you want both lists to be lazy, so please provide a better explanation if you want further help.
Generally, there's nothing special about having a lazy-seq containing many lazy-seqs, so I don't understand exactly what it is you are really after.
You could always do
(map list '({:a 1 :b 1})) ;; gives (({:a 1, :b 1}))
We can even verify that it maintains laziness:
(def a
  (concat
    (take 5 (repeat {:a 1 :b 2}))
    (lazy-seq
      (throw (Exception. "too eager")))))
(println (take 5 (map list a))) ;; works fine
(println (take 6 (map list a))) ;; throws an exception