Clojure data structure translation - list

I need to translate an array map that has this structure:
{A [(A B) (A C)], C [(C D)], B [(B nil)], D [(D E) (D F)]}
Into this equivalent list:
'(A (B (nil)) (C (D (E) (F))))
I have this function that works fine for shallow structures:
(def to-tree (memoize (fn [start nodes]
  (list* start
         (if-let [connections (seq (nodes start))]
           (map #(to-tree (second %) nodes) connections))))))
However, as the nesting depth grows, it throws a StackOverflowError. How can I optimize this function, or rather, is there a way of doing this using walk or any other functional approach?

The input data that you provide looks a lot like an adjacency list. One approach you could take would be to convert your data into a graph and then create trees from it.
Here is a solution using loom to work with graphs. This example only uses one function from loom (loom.graph/digraph), so you could probably build something similar if adding a dependency is not an option for you.
Let's start by creating a directed graph from your data structure.
(defn adj-list
  "Converts the data structure into an adjacency list."
  [ds]
  (into {} (map
            ;; convert [:a [[:a :b] [:a :c]]] => [:a [:b :c]]
            (fn [[k vs]] [k (map second vs)])
            ds)))
(defn ds->digraph
  "Creates a directed graph that mirrors the data structure."
  [ds]
  (loom.graph/digraph (adj-list ds)))
Once we have the graph built, we want to generate the trees from the root nodes of the graph. In your example, there is only one root node (A), but there is really nothing limiting it to just one.
Loom stores a list of all nodes in the graph as well as a set of all nodes with incoming edges to a given node in the graph. We can use these to find the root nodes.
(defn roots
  "Finds the set of nodes that are root nodes in the graph.
  Root nodes are those with no incoming edges."
  [g]
  (clojure.set/difference (:nodeset g)
                          (set (keys (:in g)))))
Given the root nodes, we now just need to create a tree for each. We can query the graph for the nodes adjacent to a given node, and then create trees for those recursively.
(defn to-tree
  "Given a node in a graph, create a tree (lazily).
  Assumes that n is a node in g."
  [g n]
  (if-let [succ (get-in g [:adj n])]
    (cons n (lazy-seq (map #(to-tree g %) succ)))
    (list n)))
(defn to-trees
  "Convert a graph into a collection of trees, one for each root node."
  [g]
  (map #(to-tree g %) (roots g)))
...and that's it! Taking your input, we can generate the desired output:
(def input {:a [[:a :b] [:a :c]] :c [[:c :d]] :b [[:b nil]] :d [[:d :e] [:d :f]]})
(first (to-trees (ds->digraph input))) ; => (:a (:c (:d (:e) (:f))) (:b (nil)))
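If adding loom is not an option, the same idea can be sketched with plain maps. This is only a minimal illustration: the names adjacency, root-nodes and lazy-tree are hypothetical helpers, not part of loom or of the code above, and the sketch assumes every edge is a two-element [from to] pair.

```clojure
(require '[clojure.set :as set])

(defn adjacency
  "Hypothetical helper: {:a [[:a :b] [:a :c]]} => {:a [:b :c]}."
  [ds]
  (into {} (map (fn [[k edges]] [k (mapv second edges)]) ds)))

(defn root-nodes
  "Nodes that appear as keys but never as the target of an edge."
  [adj]
  (set/difference (set (keys adj)) (set (mapcat val adj))))

(defn lazy-tree
  "Builds (node child-tree ...) lazily, so children are only
  computed when somebody walks into them."
  [adj n]
  (cons n (lazy-seq (map #(lazy-tree adj %) (get adj n)))))

(def input {:a [[:a :b] [:a :c]] :c [[:c :d]] :b [[:b nil]] :d [[:d :e] [:d :f]]})
(def adj (adjacency input))

(root-nodes adj)   ; => #{:a}
(lazy-tree adj :a) ; => (:a (:b (nil)) (:c (:d (:e) (:f))))
```

Because the adjacency values are vectors here, children keep the order they had in the input, which is why :b comes before :c in this version.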
Here are a couple of inputs for generating structures that are deep or have multiple root nodes.
(def input-deep (into {} (map (fn [[x y z]] [x [[x y] [x z]]]) (partition 3 2 (range 1000)))))
(def input-2-roots {:a [[:a :b] [:a :c]] :b [[:b nil]] :c [[:c :d]] :e [[:e :b] [:e :d]]})
(to-trees (ds->digraph input-2-roots)) ; => ((:e (:b (nil)) (:d)) (:a (:c (:d)) (:b (nil))))
One of the cool things about this approach is that it can work with infinitely nested data structures, since generating the tree is lazy. You will get a StackOverflowError if you try to render the tree (because it is also infinitely nested), but actually generating it is no problem.
The easiest way to play with this is to create a structure with a cycle, such as in the following example. (Note that the :c node is necessary. If only :a and :b are in the graph, there are no root nodes!)
(def input-cycle {:a [[:a :b]] :b [[:b :a]] :c [[:c :a]]})
(def ts (to-trees (ds->digraph input-cycle)))
(-> ts first second first) ;; :a
(-> ts first second second first) ;; :b
You can test for this condition using loom.alg/dag?.
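If you do want to look at part of a cyclic tree without rendering all of it, one option is to cut it off at a fixed depth first. The sketch below is an assumption-laden illustration: take-depth is a hypothetical helper, and lazy-tree is a loom-free stand-in for to-tree that reads children from a plain adjacency map.

```clojure
(defn lazy-tree
  "Stand-in for to-tree: builds (node child-tree ...) lazily from a plain map."
  [adj n]
  (cons n (lazy-seq (map #(lazy-tree adj %) (get adj n)))))

(defn take-depth
  "Hypothetical helper: keeps the node plus `depth` further levels of children."
  [depth [node & children]]
  (if (pos? depth)
    (cons node (map #(take-depth (dec depth) %) children))
    (list node)))

;; :a and :b point at each other, so the full tree is infinitely deep.
(def cyclic-tree (lazy-tree {:a [:b] :b [:a]} :a))

(take-depth 3 cyclic-tree) ; => (:a (:b (:a (:b))))
```

The truncated result is finite, so it can be printed safely.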

Related

visit clojure.zip tree by path pattern

Say I have a tree where I want to visit - and that should include the possibility to modify the visited items - all items that match the path
(def visit-path [:b :all :x :all])
where I use :all as a wildcard to match all child nodes. In the following example tree,
(def my-tree
  {:a "a"
   :b
   {:b-1
    {:x
     {:b-1-1 "b11"
      :b-1-2 "b12"}}
    :b-2
    {:x
     {:b-2-1 "b21"}}}})
that would be items
[:b-1-1 "b11"]
[:b-1-2 "b12"]
[:b-2-1 "b21"]
Is there an elegant way to do this using clojure core?
FYI, I did solve this by creating my own pattern-visitor
(defn visit-zipper-pattern
[loc pattern f]
but although this function is generically usable, it is quite complex, combining both stack-consuming recursion and tail-call recursion. So when calling that function like
(visit-zipper-pattern (map-zipper my-tree) visit-path
(fn [[k v]] [k (str "mod-" v)]))
using map-zipper from https://stackoverflow.com/a/15020649/709537, it transforms the tree to
{:a "a"
 :b {:b-1
     {:x
      {:b-1-1 "mod-b11"
       :b-1-2 "mod-b12"}}
     :b-2
     {:x
      {:b-2-1 "mod-b21"}}}}
The following will work. Note that 1) it may allocate unneeded objects when handling :all keys and 2) you need to decide how to handle edge cases like :all on non-map leaves.
(defn traverse [[k & rest-ks :as pattern] f tree]
  (if (empty? pattern)
    (f tree)
    (if (= k :all)
      (reduce #(assoc %1 %2 (traverse rest-ks f (get tree %2)))
              tree (keys tree))
      (cond-> tree (contains? tree k)
        (assoc k (traverse rest-ks f (get tree k)))))))
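As a usage sketch, here is traverse applied to the tree and path from the question. One difference from the zipper-based visitor is worth noting: once the pattern is exhausted, f receives only the value at that position (for example "b11"), not a [k v] pair, so the modifying function is written accordingly.

```clojure
(defn traverse [[k & rest-ks :as pattern] f tree]
  (if (empty? pattern)
    (f tree)
    (if (= k :all)
      ;; :all: rewrite the subtree under every key of this map
      (reduce #(assoc %1 %2 (traverse rest-ks f (get tree %2)))
              tree (keys tree))
      ;; literal key: descend only if the key is present
      (cond-> tree (contains? tree k)
        (assoc k (traverse rest-ks f (get tree k)))))))

(def my-tree
  {:a "a"
   :b {:b-1 {:x {:b-1-1 "b11"
                 :b-1-2 "b12"}}
       :b-2 {:x {:b-2-1 "b21"}}}})

(def modified
  (traverse [:b :all :x :all] #(str "mod-" %) my-tree))
;; modified =>
;; {:a "a"
;;  :b {:b-1 {:x {:b-1-1 "mod-b11", :b-1-2 "mod-b12"}}
;;      :b-2 {:x {:b-2-1 "mod-b21"}}}}
```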
For a more efficient solution, it's probably better to use https://github.com/nathanmarz/specter as recommended above.

Clojure and ClojureScript REPL produce different output

Using the following recursive definition of a depth-first search, a Clojure (JVM) REPL and a ClojureScript REPL (tested with both a browser-connected REPL and lumo) produce two different outputs: the order in which nodes are printed is different, and the Clojure REPL produces a duplicate :f. The ClojureScript order is the behaviour I would expect. Why is this?
Code:
(defn dfs
  ([g v] (dfs g v #{}))
  ([g v seen]
   (println v)
   (let [seen (conj seen v)]
     (for [n (v g)]
       (if-not (contains? seen n)
         (dfs g n seen))))))
(def graph {:a [:b :c :e]
            :b [:d :f]
            :c [:g]})
(dfs graph :a)
Clojure REPL output:
:a
:b
:c
:e
:d
:f
:g
:f
;; => ((() ()) (()) (()))
ClojureScript REPL output:
:a
:b
:d
:f
:c
:g
:e
;; => ((() ()) (()) ())
Clojure's for generates a lazy sequence. The actual evaluation of all recurrent dfs calls is triggered only by your REPL as it needs to print the function's output, i.e. ((() ()) (()) ()). If you evaluate (do (dfs graph :a) nil), you will only get :a printed.
Now, Clojure's lazy sequences are evaluated in chunks of size 32 for efficiency when the underlying collection is chunked, as vectors are. So, when the REPL (through the str function) evaluates the first element of the lazy sequence produced by the top-level for (which should print :b), the other elements of that seq are also evaluated, and you get :c and :e printed before the child nodes' sequences (which are lazy too) are evaluated.
In contrast, ClojureScript's lazy sequences are not chunked (LazySeq does not implement IChunkedSeq) and are evaluated one by one, so when the return value is recursively converted to a string, everything is evaluated in depth-first order.
To illustrate this, try (first (for [i (range 300)] (do (println "printing:" i) i))) in the REPL in both Clojure and CLJS: you will get 32 numbers printed in Clojure and only one number in CLJS.
If you want better guarantees on the order of evaluation, you can use doseq instead of for or wrap the for in doall.
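To make the doall suggestion concrete, here is a minimal sketch. It records the visit order in an atom instead of printing, so the order can be inspected as a value; dfs-order and the visited atom are hypothetical names introduced only for this illustration.

```clojure
(def graph {:a [:b :c :e]
            :b [:d :f]
            :c [:g]})

(defn dfs-order
  "Returns the nodes in the order they were visited.
  doall forces each node's children right away, so a whole
  subtree is finished before the next sibling is touched."
  ([g v]
   (let [visited (atom [])]
     (dfs-order g v #{} visited)
     @visited))
  ([g v seen visited]
   (swap! visited conj v)
   (let [seen (conj seen v)]
     (doall
      (for [n (v g)]
        (when-not (contains? seen n)
          (dfs-order g n seen visited)))))))

(dfs-order graph :a) ; => [:a :b :d :f :c :g :e]
```

Chunking still happens inside for, but since every recursive call realizes its own children before returning, the side effects occur in depth-first order regardless.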
Hope this helps.
Side note: just like @Josh, I get no :f at the end in Clojure 1.8, and the parens are identical to the cljs output. That's really weird...
I am not sure I follow how you want to use the result of your DFS. If you want side effects, i.e. printing all the nodes to the console, use doseq to make sure that they are traversed:
(defn dfs-eager
  ([g v] (dfs-eager g v #{}))
  ([g v seen]
   (println v)
   (let [seen (conj seen v)]
     (doseq [n (v g)]
       (if-not (contains? seen n)
         (dfs-eager g n seen))))))
This will print all the nodes to the console, depth-first. If you want to get the traversal as a return value, use for but make sure you actually return a meaningful value:
(defn dfs-lazy
  ([g v] (dfs-lazy g v #{}))
  ([g v seen]
   (cons v
         (let [seen (conj seen v)]
           (for [n (v g)]
             (if-not (contains? seen n)
               (dfs-lazy g n seen)))))))
You will get a nested list (:a (:b (:d) (:f)) (:c (:g)) (:e)) - which you can then flatten to get a traversal. You will also get the benefits of laziness.
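A short sketch of that last step, using the graph from the question. For graphs with cycles, the if-not branch leaves nil placeholders in the nested list, so this version also drops them with remove; that extra step is an addition of this sketch, not part of the function above.

```clojure
(def graph {:a [:b :c :e]
            :b [:d :f]
            :c [:g]})

(defn dfs-lazy
  ([g v] (dfs-lazy g v #{}))
  ([g v seen]
   (cons v
         (let [seen (conj seen v)]
           (for [n (v g)]
             (if-not (contains? seen n)
               (dfs-lazy g n seen)))))))

;; Flatten the nested result into a flat depth-first traversal.
(def traversal
  (remove nil? (flatten (dfs-lazy graph :a))))
;; traversal => (:a :b :d :f :c :g :e)
```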

Transforming list of hashmaps into set

Having two maps:
(def a {:a 1 :b 2 :c 3})
(def b {:b 222 :d 4})
placed into one vector:
(def l [a b])
what's the easiest way to construct a set (in the sense of a structure with unique keys) where, in case of a key conflict (:b in this case), the left-hand operand takes priority (:b 2 in this case)? In other words, I'd like to get this result:
{:a 1 :b 2 :c 3 :d 4}
Two solutions which came to mind are:
(apply merge-with (fn [left _] left) l)
(reduce conj (reverse l))
The first one doesn't seem idiomatic to me; the second one worries me because of the eager list reversal, which sounds a bit inefficient. Any other ideas?
There are numerous other possibilities, of which (reduce #(into %2 %1) l) (or with merge instead of into) could be considered. Your merge-with solution is absolutely fine.
How about
(apply merge (reverse l))
It seems fine and is similar to the second one.
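For comparison, here are the variants from this thread side by side on the example data. The last line is an addition of this sketch rather than something proposed above: since l is a vector, rseq gives a reversed view in constant time, which may address the concern about eagerly reversing the list.

```clojure
(def a {:a 1 :b 2 :c 3})
(def b {:b 222 :d 4})
(def l [a b])

;; Every expression below gives the leftmost map priority on conflicts.
(def results
  [(apply merge-with (fn [left _] left) l) ; keep the left value on conflict
   (reduce conj (reverse l))               ; later (left) maps overwrite
   (reduce #(into %2 %1) l)                ; pour the accumulated left into the right
   (reduce #(merge %2 %1) l)               ; same idea with merge
   (apply merge (reverse l))
   (apply merge (rseq l))])                ; rseq: constant-time reverse of a vector

;; each element of results => {:a 1, :b 2, :c 3, :d 4}
```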

Clojure: Why is flatten "the wrong thing to use"

I've read this kind of thing a couple of times since I've started Clojure.
For instance, here: How to convert map to a sequence?
And in some tweet I don't remember exactly that was more or less saying "if you're using flatten you're probably doing it wrong".
I would like to know, what is wrong with flatten?
I think this is what they were talking about in the answer you linked:
so> ((comp flatten seq) {:a [1 2] :b [3 4]})
(:b 3 4 :a 1 2)
so> (apply concat {:a [1 2] :b [3 4]})
(:b [3 4] :a [1 2])
Flatten will remove the structure from the keys and values, which is probably not what you want. There are use cases where you do want to remove the structure of nested sequences, and flatten was written for such cases. But for destructuring a map, you usually do want to keep the internal sequences as is.
Anything flatten can't flatten, it ought to return intact. At the top level, it doesn't.
(flatten 8)
()
(flatten {1 2, 3 4})
()
If you think you've supplied a sequence, but you haven't, you'll get the effect of supplying an empty sequence. This is the sort of leg-breaker that most core functions take care to preclude. For example, (str nil) => "".
flatten ought to work like this:
(defn flatten [x]
  (if (sequential? x)
    ((fn flat [y] (if (sequential? y) (mapcat flat y) [y])) x)
    x))
(flatten 8)
;8
(flatten [{1 2, 3 4}])
;({1 2, 3 4})
(flatten [1 [2 [[3]] 4]])
;(1 2 3 4)
You can find Steve Miner's faster lazy version of this here.
Probability of "probably"
Listen to people who say "you're probably doing it wrong", but also do not forget they say "probably", because it all depends on the problem.
For example, if your task is to flatten the map and you couldn't care less which was the key and which was the value, because you just need an unstructured sequence of everything, then by all means use flatten (or apply concat).
The reason it causes a "suspicion" is the fact that you had / were given a map to begin with, hence whoever gave it to you meant a "key value" paired structure, and if you flatten it, you lose that intention, as well as flexibility and clarity.
Keep in mind
In case you are still not sure what to do with a map for your particular problem, keep a for comprehension in mind, since it gives you full control over what to do with the map as you iterate over it:
create a vector?
;; can also be (apply vector {:a 34 :b 42}), but just to use "for" for all consistently
user=> (into [] (for [[k v] {:a 34 :b 42}] [k v]))
[[:a 34] [:b 42]]
create another map?
user=> (into {} (for [[k v] {:a 34 :b 42}] [k (inc v)]))
{:a 35, :b 43}
create a set?
user=> (into #{} (for [[k v] {:a 34 :b 42}] [k v]))
#{[:a 34] [:b 42]}
reverse keys and values?
user=> (into {} (for [[k v] {:a 34 :b 42}] [v k]))
{34 :a, 42 :b}

Converting lists to sets in a list of maps of lists in Clojure

I have a list of maps where each key is associated with a list of strings.
I would like to convert each of these string lists to sets instead.
(def list-of-maps-of-lists '({:a ["abc"]} {:a ["abc"]} {:a ["def"]} {:x ["xyz"]} {:x ["xx"]}))
This is my best attempt so far:
(flatten (map (fn [amap] (for [[k v] amap] {k (set v)})) list-of-maps-of-lists))
=> ({:a #{"abc"}} {:a #{"abc"}} {:a #{"def"}} {:x #{"xyz"}} {:x #{"xx"}})
What is the idiomatic solution to this problem?
This is very similar to your solution.
Using list comprehension:
(map
#(into {} (for [[k v] %] [k (set v)]))
list-of-maps-of-lists)
Alternative:
(map
#(zipmap (keys %) (map set (vals %)))
list-of-maps-of-lists)
I prefer solving such problems with fmap function from clojure.contrib:
(map (partial fmap set)
list-of-maps-of-lists)
Update: According to This Migration Guide, fmap has been moved to clojure.algo.generic.functor namespace of algo.generic library.
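If pulling in algo.generic just for fmap feels heavy, a small helper built on reduce-kv covers the map case. This is a sketch only: map-vals is a hypothetical name, not a core function. On Clojure 1.11 or later, clojure.core/update-vals does the same job with the arguments in the other order, i.e. (update-vals m set).

```clojure
(defn map-vals
  "Hypothetical helper: applies f to every value of map m."
  [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(def list-of-maps-of-lists
  '({:a ["abc"]} {:a ["abc"]} {:a ["def"]} {:x ["xyz"]} {:x ["xx"]}))

(def converted
  (map (partial map-vals set) list-of-maps-of-lists))
;; converted =>
;; ({:a #{"abc"}} {:a #{"abc"}} {:a #{"def"}} {:x #{"xyz"}} {:x #{"xx"}})
```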