Clojure Zipper of nested Maps repressing a TRIE - clojure

How can I create a Clojure zipper for a TRIE, represented by nested maps, were the keys are the letters.?
Something like this:
{\b {\a {\n {\a {\n {\a {'$ '$}}}}}} \a {\n {\a {'$ '$}}}}
Represents a trie with 2 words 'banana' and 'ana'. (If necessary , its possible to make some changes here in maps..)
I've tried to pass map? vals assoc as the 3 functions to the zipper,respectively.
But it doesnt seem to work..
What 3 functions should I use?
And how the insert-into-trie would look like based on the zipper ?

map? vals #(zipmap (keys %1) %2) would do but doesn't support insertion/removal of children (since children are only values, you don't know which key to remove/add).
The map-zipper below does support insertion/removal because nodes are [k v] pairs (except the root which is a map).
(defn map-zipper [m]
(z/zipper
(fn [x] (or (map? x) (map? (nth x 1))))
(fn [x] (seq (if (map? x) x (nth x 1))))
(fn [x children]
(if (map? x)
(into {} children)
(assoc x 1 (into {} children))))
m))

The solution proposed by #cgrant is great, but implicitly describes a tree where all branches and leaf nodes have an associated value (the key in the dictionary) except for the root node that is just a branch without a value.
So, the tree {"/" nil}, is not a tree with a single leaf node, but a tree with an anonymous root branch and a single leaf node with value /.
In practice, this means that every traversal of the tree has to first execute a (zip/down t) in order to descend the root node.
An alternative solution is to explicitly model the root inside the map, that is, only create zippers from maps with a single key at the root. For example: {"/" {"etc/" {"hosts" nil}}}
The zipper can then be implemented with:
(defn map-zipper [map-or-pair]
"Define a zipper data-structure to navigate trees represented as nested dictionaries."
(if (or (and (map? map-or-pair) (= 1 (count map-or-pair))) (and (= 2 (count map-or-pair))))
(let [pair (if (map? map-or-pair) (first (seq map-or-pair)) map-or-pair)]
(zip/zipper
(fn [x] (map? (nth x 1)))
(fn [x] (seq (nth x 1)))
(fn [x children] (assoc x 1 (into {} children)))
pair))
(throw (Exception. "Input must be a map with a single root node or a pair."))))

Related

How to calculate the sum of all depths in a tree using basic `recur`?

For a given tree I would like to sum the depth of each node and calculate that recursively (so not with map/flatten/sum).
Is there a way to do that with recur or do I need to use a zipper in this case?
recur is for tail recursion, meaning if you could do it with normal recursion, where the return value is exactly what a single recursive call would return, then you can use it.
Most functions on trees cannot be written in a straightforward way when restricted to using only tail recursion. Normal recursive calls are much more straightforward, and as long as the tree depth is not thousands of levels deep, then normal recursive calls are just fine in Clojure.
The reason you may have found recommendations against using normal recursive calls in Clojure is for cases when the call stack could grow to tens or hundreds of thousands of calls deep, e.g. a recursive call one level deep for each element of a sequence that could be tens or hundreds of thousands of elements long. That would exceed the default maximum call stack depth limits of many run-time systems.
Using normal stack consuming recursion you can accomplish this pretty easily, by doing a depth-first traversal and summing the depth on the way back out.
(defn sum-depths
([tree]
(sum-depths tree 0))
([node depth]
(if-not (vector? node)
depth
(do
(apply
+
(for [child-node (second node)]
(sum-depths child-node (inc depth))))))))
(sum-depths [:root
[:a1
[:b1
[:a2 :b2]]
:c1]])
;; => 6
(sum-depths ["COM"
[["B"
[["C"
[["D"
[["E"
["F"
["J"
[["K"
["L"]]]]]]
"I"]]]]
["G"
["H"]]]]]])
;; => 19
The details depend a little bit on how you model your tree, so the above assumes that a node is either a vector pair where the first element is the value and the second element is a vector of children nodes, or if it is a leaf node then it's not a vector.
So a leaf node is anything that's not a vector.
And a node with children is a vector of form: [value [child1 child2 ...]
And here I assumed you wanted to sum the depth of all leaf nodes. Since I see from your answer, that your example gives 42, I'm now thinking you meant the sum of the depth of every node, not just leaves, if so it only takes one extra line of code to do so:
(defn sum-depths
([tree]
(sum-depths tree 0))
([node depth]
(if-not (vector? node)
depth
(do
(apply
+
depth
(for [child-node (second node)]
(sum-depths child-node (inc depth))))))))
(sum-depths [:root
[:a1
[:b1
[:a2 :b2]]
:c1]])
;; => 7
(sum-depths ["COM"
[["B"
[["C"
[["D"
[["E"
["F"
["J"
[["K"
["L"]]]]]]
"I"]]]]
["G"
["H"]]]]]])
;; => 42
And like your own answer showed, this particular algorithm can be solved without a stack as well, by doing a level order traversal (aka breadth-first traversal) of the tree. Here it is working on my tree data-structure (similar strategy then your own answer otherwise):
(defn sum-depths [tree]
(loop [children (second tree) depth 0 total 0]
(if (empty? children)
total
(let [child-depth (inc depth)
level-total (* (count children) child-depth)]
(recur (into [] (comp (filter vector?) (mapcat second)) children)
child-depth
(+ total level-total))))))
(sum-depths [:root
[:a1
[:b1
[:a2 :b2]]
:c1]])
;; => 7
(sum-depths ["COM"
[["B"
[["C"
[["D"
[["E"
["F"
["J"
[["K"
["L"]]]]]]
"I"]]]]
["G"
["H"]]]]]])
;; => 42
And for completeness, I also want to show how you can do a depth-first recursive traversal using core.async instead of the function call stack in order to be able to traverse trees that would cause a StackOverFlow otherwise, but still using a stack based recursive depth-first traversal instead of an iterative one. As an aside, there exists some non stack consuming O(1) space depth-first traversals as well, using threaded trees (Morris algorithm) or tree transformations, but I won't show those as I'm not super familiar with them and I believe they only work on binary trees.
First, let's construct a degenerate tree of depth 10000 which causes a StackOverFlow when run against our original stack-recursive sum-depths:
(def tree
(loop [i 0 t [:a [:b]]]
(if (< i 10000)
(recur (inc i)
[:a [t]])
t)))
(defn sum-depths
([tree]
(sum-depths tree 0))
([node depth]
(if-not (vector? node)
depth
(do
(apply
+
depth
(for [child-node (second node)]
(sum-depths child-node (inc depth))))))))
(sum-depths tree)
;; => java.lang.StackOverflowError
If it works on your machine, try increasing 10000 to something even bigger.
Now we rewrite it to use core.async instead:
(require '[clojure.core.async :as async])
(defmacro for* [[element-sym coll] & body]
`(loop [acc# [] coll# ~coll]
(if-let [~element-sym (first coll#)]
(recur (conj acc# (do ~#body)) (next coll#))
acc#)))
(def tree
(loop [i 0 t [:a [:b]]]
(if (< i 10000)
(recur (inc i)
[:a [t]])
t)))
(defn sum-depths
([tree]
(async/<!! (sum-depths tree 0)))
([node depth]
(async/go
(if-not (vector? node)
depth
(do
(apply
+
depth
(for* [child-node (second node)]
(async/<!
(sum-depths child-node (inc depth))))))))))
;; => (sum-depths tree)
50015001
It is relatively easy to rewrite a stack-recursive algorithm to use core.async instead of the call stack, and thus make it so it isn't at risk of causing a StackOverFlow in the case of large inputs. Just wrap it in a go block, and wrap the recursive calls in a <! and the whole algorithm in a <!!. The only tricky part is that core.async cannot cross function boundaries, which is why the for* macro is used above. The normal Clojure for macro crosses function boundaries internally, and thus we can't use <! inside it. By rewriting it to not do so, we can use <! inside it.
Now for this particular algorithm, the tail-recursive variant using loop/recur is probably best, but I wanted to show this technique of using core.async for posterity, since it can be useful in other cases where there isn't a trivial tail-recursive implementation.
i would also propose this one, which is kinda straightforward:
it uses more or less the same approach, as tail recursive flatten does:
(defn sum-depth
([data] (sum-depth data 1 0))
([[x & xs :as data] curr res]
(cond (empty? data) res
(coll? x) (recur (concat x [:local/up] xs) (inc curr) res)
(= :local/up x) (recur xs (dec curr) res)
:else (recur xs curr (+ res curr)))))
the trick is that when you encounter the collection at the head of the sequence, you concat it to the rest, adding special indicator that signals the end of branch and level up. It allows you to track the current depth value. Quite simple, and also using one pass.
user> (sum-depth [1 [2 7] [3]])
;;=> 7
user> (sum-depth [1 2 3 [[[[[4]]]]]])
;;=> 9
You can use map/mapcat to walk a tree recursively to produce a lazy-seq (of leaf nodes). If you need depth information, just add it along the way.
(defn leaf-seq
[branch? children root]
(let [walk (fn walk [lvl node]
(if (branch? node)
(->> node
children
(mapcat (partial walk (inc lvl))))
[{:lvl lvl
:leaf node}]))]
(walk 0 root)))
To run:
(->> '((1 2 ((3))) (4))
(leaf-seq seq? identity)
(map :lvl)
(reduce +))
;; => 10
where the depths of each node are:
(->> '((1 2 ((3))) (4))
(leaf-seq seq? identity)
(map :lvl))
;; => (2 2 4 2)
Updates - sum all nodes instead of just leaf nodes
I misread the original requirement and was assuming leaf nodes only. To add the branch node back is easy, we just need to cons it before its child sequence.
(defn node-seq
"Returns all the nodes marked with depth/level"
[branch? children root]
(let [walk (fn walk [lvl node]
(lazy-seq
(cons {:lvl lvl
:node node}
(when (branch? node)
(->> node
children
(mapcat (partial walk (inc lvl))))))))]
(walk 0 root)))
Then we can walk on the hiccup-like tree as before:
(->> ["COM" [["B" [["C" [["D" [["E" [["F"] ["J" [["K" [["L"]]]]]]] ["I"]]]]] ["G" [["H"]]]]]]]
(node-seq #(s/valid? ::branch %) second)
(map :lvl)
(reduce +))
;; => 42
Note: above function uses below helper specs to identify the branch/leaf:
(s/def ::leaf (s/coll-of string? :min-count 1 :max-count 1))
(s/def ::branch (s/cat :tag string? :children (s/coll-of (s/or :leaf ::leaf
:branch ::branch))))
Here's my alternative approach that does use recur:
(defn sum-of-depths
[branches]
(loop [branches branches
cur-depth 0
total-depth 0]
(cond
(empty? branches) total-depth
:else (recur
(mapcat (fn [node] (second node)) branches)
(inc cur-depth)
(+ total-depth (* (count branches) cur-depth))))))
(def tree ["COM" (["B" (["C" (["D" (["E" (["F"] ["J" (["K" (["L"])])])] ["I"])])] ["G" (["H"])])])])
(sum-of-depths [tree]) ; For the first call we have to wrap the tree in a list.
=> 42
You can do this using the Tupelo Forest library. Here is a function to extract information about a tree in Hiccup format. First, think about how we want to use the information for a simple tree with 3 nodes:
(dotest
(hid-count-reset)
(let [td (tree-data [:a
[:b 21]
[:c 39]])]
(is= (grab :paths td) [[1003]
[1003 1001]
[1003 1002]])
(is= (grab :node-hids td) [1003 1001 1002])
(is= (grab :tags td) [:a :b :c])
(is= (grab :depths td) [1 2 2])
(is= (grab :total-depth td) 5) ))
Here is how we calculate the above information:
(ns tst.demo.core
(:use tupelo.forest tupelo.core tupelo.test)
(:require
[schema.core :as s]
[tupelo.schema :as tsk]))
(s/defn tree-data :- tsk/KeyMap
"Returns data about a hiccup tree"
[hiccup :- tsk/Vec]
(with-forest (new-forest)
(let [root-hid (add-tree-hiccup hiccup)
paths (find-paths root-hid [:** :*])
node-hids (mapv xlast paths)
tags (mapv #(grab :tag (hid->node %)) node-hids)
depths (mapv count paths)
total-depth (apply + depths)]
(vals->map paths node-hids tags depths total-depth))))
and an example on a larger Hiccup-format tree:
(dotest
(let [td (tree-data [:a
[:b 21]
[:b 22]
[:b
[:c
[:d
[:e
[:f
[:g 7]
[:h
[:i 9]]]]]]
[:c 32]]
[:c 39]])]
(is= (grab :tags td) [:a :b :b :b :c :d :e :f :g :h :i :c :c])
(is= (grab :depths td) [1 2 2 2 3 4 5 6 7 7 8 3 2])
(is= (grab :total-depth td) 52)))
Don't be afraid of stack size for normal processing. On my computer, the default stack doesn't overflow until you get to a stack depth of over 3900 recursive calls. For a binary tree, just 2^30 is over a billion nodes, and 2^300 is more nodes than the number of protons in the universe (approx).

Flatten nested map into rows [duplicate]

Given:
{:o {:i1 1
:i2 {:ii1 4}}}
I'd like to iterate over the keys of the map in "absolute" form from the root as a vector. So I'd like:
{
[:o :i1] 1
[:o :i2 :ii1] 4
}
As the result. Basically only get the leaf nodes.
A version that I think is rather nicer, using for instead of mapcat:
(defn flatten-keys [m]
(if (not (map? m))
{[] m}
(into {}
(for [[k v] m
[ks v'] (flatten-keys v)]
[(cons k ks) v']))))
The function is naturally recursive, and the most convenient base case for a non-map is "this one value, with no keyseq leading to it". For a map, you can just call flatten-keys on each value in the map, and prepend its key to each keyseq of the resulting map.
Looks like this is basically a flatten of the nested keys. This also seems to be a 4clojure problem.
A flatten-map search on github yield many results.
One example implementation:
(defn flatten-map
"Flattens the keys of a nested into a map of depth one but
with the keys turned into vectors (the paths into the original
nested map)."
[s]
(let [combine-map (fn [k s] (for [[x y] s] {[k x] y}))]
(loop [result {}, coll s]
(if (empty? coll)
result
(let [[i j] (first coll)]
(recur (into result (combine-map i j)) (rest coll)))))))
Example
(flatten-map {:OUT
{:x 5
:x/A 21
:x/B 33
:y/A 24}})
=> {[:OUT :x] 5, [:OUT :x/A] 21, [:OUT :x/B] 33, [:OUT :y/A] 24}
An even more general version from Christoph Grand:
(defn flatten-map
"Take a nested map (or a nested collection of key-value pairs) and returns a
sequence of key-value pairs where keys are replaced by the result of calling
(reduce f pk path) where path is the path to this key through the nested
maps."
([f kvs] (flatten-map f nil kvs))
([f pk kvs]
(mapcat (fn [[k v]]
(if (map? v)
(flatten-map f (f pk k) v)
[[(f pk k) v]])) kvs)))
Example:
(flatten-map conj [] {:OUT
{:x 5
:x/A 21
:x/B 33
:y/A 24}})
=> ([[:OUT :x] 5] [[:OUT :x/A] 21] [[:OUT :x/B] 33] [[:OUT :y/A] 24])

Simple "R-like" melt : better way to do?

Today I tried to implement a "R-like" melt function. I use it for Big Data coming from Big Query.
I do not have big constraints about time to compute and this function takes less than 5-10 seconds to work on millions of rows.
I start with this kind of data :
(def sample
'({:list "123,250" :group "a"} {:list "234,260" :group "b"}))
Then I defined a function to put the list into a vector :
(defn split-data-rank [datatab value]
(let [splitted (map (fn[x] (assoc x value (str/split (x value) #","))) datatab)]
(map (fn[y] (let [index (map inc (range (count (y value))))]
(assoc y value (zipmap index (y value)))))
splitted)))
Launch :
(split-data-rank sample :list)
As you can see, it returns the same sequence but it replaces :list by a map giving the position in the list of each item in quoted list.
Then, I want to melt the "dataframe" by creating for each item in a group its own row with its rank in the group.
So that I created this function :
(defn split-melt [datatab value]
(let [splitted (split-data-rank datatab value)]
(map (fn [y] (dissoc y value))
(apply concat
(map
(fn[x]
(map
(fn[[k v]]
(assoc x :item v :Rank k))
(x value)))
splitted)))))
Launch :
(split-melt sample :list)
The problem is that it is heavily indented and use a lot of map. I apply dissoc to drop :list (which is useless now) and I have also to use concat because without that I have a sequence of sequences.
Do you think there is a more efficient/shorter way to design this function ?
I am heavily confused with reduce, does not know whether it can be applied here since there are two arguments in a way.
Thanks a lot !
If you don't need the split-data-rank function, I will go for:
(defn melt [datatab value]
(mapcat (fn [x]
(let [items (str/split (get x value) #",")]
(map-indexed (fn [idx item]
(-> x
(assoc :Rank (inc idx) :item item)
(dissoc value)))
items)))
datatab))

Grouping words and more

I'm working on a project to learn Clojure in practice. I'm doing well, but sometimes I get stuck. This time I need to transform sequence of the form:
[":keyword0" "word0" "word1" ":keyword1" "word2" "word3"]
into:
[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
I'm trying for at least two hours, but I know not so many Clojure functions to compose something useful to solve the problem in functional manner.
I think that this transformation should include some partition, here is my attempt:
(partition-by (fn [x] (.startsWith x ":")) *1)
But the result looks like this:
((":keyword0") ("word1" "word2") (":keyword1") ("word3" "word4"))
Now I should group it again... I doubt that I'm doing right things here... Also, I need to convert strings (only those that begin with :) into keywords. I think this combination should work:
(keyword (subs ":keyword0" 1))
How to write a function which performs the transformation in most idiomatic way?
Here is a high performance version, using reduce
(reduce (fn [acc next]
(if (.startsWith next ":")
(conj acc [(-> next (subs 1) keyword)])
(conj (pop acc) (conj (peek acc)
next))))
[] data)
Alternatively, you could extend your code like this
(->> data
(partition-by #(.startsWith % ":"))
(partition 2)
(map (fn [[[kw-str] strs]]
(cons (-> kw-str
(subs 1)
keyword)
strs))))
what about that:
(defn group-that [ arg ]
(if (not-empty arg)
(loop [list arg, acc [], result []]
(if (not-empty list)
(if (.startsWith (first list) ":")
(if (not-empty acc)
(recur (rest list) (vector (first list)) (conj result acc))
(recur (rest list) (vector (first list)) result))
(recur (rest list) (conj acc (first list)) result))
(conj result acc)
))))
Just 1x iteration over the Seq and without any need of macros.
Since the question is already here... This is my best effort:
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
(->> data
(partition-by (fn [x] (.startsWith x ":")))
(partition 2)
(map (fn [[[k] w]] (apply conj [(keyword (subs k 1))] w))))
I'm still looking for a better solution or criticism of this one.
First, let's construct a function that breaks vector v into sub-vectors, the breaks occurring everywhere property pred holds.
(defn breakv-by [pred v]
(let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
starts (cons 0 break-points)
finishes (concat break-points [(count v)])]
(mapv (partial subvec v) starts finishes)))
For our case, given
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
then
(breakv-by #(= (first %) \:) data)
produces
[[] [":keyword0" "word0" "word1"] [":keyword1" "word2" "word3"]]
Notice that the initial sub-vector is different:
It has no element for which the predicate holds.
It can be of length zero.
All the others
start with their only element for which the predicate holds and
are at least of length 1.
So breakv-by behaves properly with data that
doesn't start with a breaking element or
has a succession of breaking elements.
For the purposes of the question, we need to muck about with what breakv-by produces somewhat:
(let [pieces (breakv-by #(= (first %) \:) data)]
(mapv
#(update-in % [0] (fn [s] (keyword (subs s 1))))
(rest pieces)))
;[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]

Clojure: index of a value in a list or other collection

How do I get the index of any of the elements on a list of strings as so:
(list "a" "b" "c")
For example, (function "a") would have to return 0, (function "b") 1, (function "c") 2 and so on.
and... will it be better to use any other type of collection if dealing with a very long list of data?
Christian Berg's answer is fine. Also it is possible to just fall back on Java's indexOf method of class String:
(.indexOf (appl­y str (list­ "a" "b" "c"))­ "c")
; => 2
Of course, this will only work with lists (or more general, seqs) of strings (of length 1) or characters.
A more general approach would be:
(defn index-of [e coll] (first (keep-indexed #(if (= e %2) %1) coll)))
More idiomatic would be to lazily return all indexes and only ask for the ones you need:
(defn indexes-of [e coll] (keep-indexed #(if (= e %2) %1) coll))
(first (indexes-of "a" (list "a" "a" "b"))) ;; => 0
I'm not sure I understand your question. Do you want the nth letter of each of the strings in a list? That could be accomplished like this:
(map #(nth % 1) (list "abc" "def" "ghi"))
The result is:
(\b \e \h)
Update
After reading your comment on my initial answer, I assume your question is "How do I find the index (position) of a search string in a list?"
One possibility is to search for the string from the beginning of the list and count all the entries you have to skip:
(defn index-of [item coll]
(count (take-while (partial not= item) coll)))
Example: (index-of "b" (list "a" "b" "c")) returns 1.
If you have to do a lot of look-ups, it might be more efficient to construct a hash-map of all strings and their indices:
(def my-list (list "a" "b" "c"))
(def index-map (zipmap my-list (range)))
(index-map "b") ;; returns 1
Note that with the above definitions, when there are duplicate entries in the list index-of will return the first index, while index-map will return the last.
You can use the Java .indexOf method reliably for strings and vectors, but not for lists. This solution should work for all collections, I think:
(defn index-of
"Clojure doesn't have an index-of function. The Java .indexOf method
works reliably for vectors and strings, but not for lists. This solution
works for all three."
[item coll]
(let [v (if
(or (vector? coll) (string? coll))
coll
(apply vector coll))]
(.indexOf coll item)))
Do you mean, how do you get the nth element of a list?
For example, if you want to get the 2nd element on the list (with zero-based index):
(nth (list "a" "b" "c") 2)
yields
"c"
Cat-skinning is fun. Here's a low-level approach.
(defn index-of
([item coll]
(index-of item coll 0))
([item coll from-idx]
(loop [idx from-idx coll (seq (drop from-idx coll))]
(if coll
(if (= item (first coll))
idx
(recur (inc idx) (next coll)))
-1))))
This is a Lispy answer, I suspect those expert in Clojure could do it better:
(defn position
"Returns the position of elt in this list, or nil if not present"
([list elt n]
(cond
(empty? list) nil
(= (first list) elt) n
true (position (rest list) elt (inc n))))
([list elt]
(position list elt 0)))
You seem to want to use the nth function.
From the docs for that function:
clojure.core/nth
([coll index] [coll index not-found])
Returns the value at the index. get returns nil if index out of
bounds, nth throws an exception unless not-found is supplied. nth
also works for strings, Java arrays, regex Matchers and Lists, and,
in O(n) time, for sequences.
That last clause means that in practice, nth is slower for elements "farther off" in sequences, with no guarantee to work quicker for collections that in principle support faster access (~ O(n)) to indexed elements. For (clojure) sequences, this makes sense; the clojure seq API is based on the linked-list API and in a linked list, you can only access the nth item by traversing every item before it. Keeping that restriction is what makes concrete list implementations interchangeable with lazy sequences.
Clojure collection access functions are generally designed this way; functions that do have significantly better access times on specific collections have separate names and cannot be used "by accident" on slower collections.
As an example of a collection type that supports fast "random" access to items, clojure vectors are callable; (vector-collection index-number) yields the item at index index-number - and note that clojure seqs are not callable.
I know this question has been answered a million time but here is a recursive solution that leverages deconstructing.
(defn index-of-coll
([coll elm]
(index-of-coll coll elm 0))
([[first & rest :as coll] elm idx]
(cond (empty? coll) -1
(= first elm) idx
:else (recur rest elm (inc idx)))))
(defn index-of [item items]
(or (last (first (filter (fn [x] (= (first x) item))
(map list items (range (count items))))))
-1))
seems to work - but I only have like three items in my list