Clojure tools.analyzer - identifying last leaf node? - clojure

I'm struggling to come up with a reliable solution to a problem I'm having using tools.analyzer.
What I'm trying to achieve is, given an ast node, what is the last/furthest node in the tree? E.g. If the following code were analayzed: (def a (do (+ 1 2) 3))
Is there a reliable way of marking the node that has the value "3" as the last node in this tree? Essentially what I'm attempting to do is work out which form will eventually be bound to the var b.

Questions like the above are the reason I created the tupelo.forest library a few years ago.
You may wish to view:
The home page of the lib
The API docs
The Clojure/Conj talk
Many live examples
To get started, lets add the data as a tree, after putting it into hiccup format. Here is an outline of how to proceed, with unit tests showing the result of the operations:
(ns tst.tupelo.forest-examples
(:use tupelo.core tupelo.forest tupelo.test))
(dotest-focus
(hid-count-reset)
(with-forest (new-forest)
(let [data-orig (quote (def a (do (+ 1 2) 3)))
data-vec (unlazy data-orig)
root-hid (add-tree-hiccup data-vec)
all-paths (find-paths root-hid [:** :*])
max-len (apply max (mapv #(count %) all-paths))
paths-max-len (keep-if #(= max-len (count %)) all-paths)]
Here is the result:
(is= data-vec (quote [def a [do [+ 1 2] 3]]))
(is= (hid->bush root-hid)
(quote
[{:tag def}
[{:tag :tupelo.forest/raw, :value a}]
[{:tag do}
[{:tag +}
[{:tag :tupelo.forest/raw, :value 1}]
[{:tag :tupelo.forest/raw, :value 2}]]
[{:tag :tupelo.forest/raw, :value 3}]]]))
(is= all-paths
[[1007]
[1007 1001]
[1007 1006]
[1007 1006 1004]
[1007 1006 1004 1002]
[1007 1006 1004 1003]
[1007 1006 1005]])
(is= paths-max-len
[[1007 1006 1004 1002]
[1007 1006 1004 1003]])
(nl)
(is= (format-paths paths-max-len)
(quote [[{:tag def}
[{:tag do} [{:tag +} [{:tag :tupelo.forest/raw, :value 1}]]]]
[{:tag def}
[{:tag do} [{:tag +} [{:tag :tupelo.forest/raw, :value 2}]]]]])))))
Depending on your specific goal, you can continue the processing further.

Related

How to obtain paths to all the child nodes in a tree that only have leaves using clojure zippers?

Say I have a tree like this. I would like to obtain the paths to child nodes that only contain leaves and not non-leaf child nodes.
So for this tree
root
├──leaf123
├──level_a_node1
│ ├──leaf456
├──level_a_node2
│ ├──level_b_node1
│ │ └──leaf987
│ └──level_b_node2
│ └──level_c_node1
| └── leaf654
├──leaf789
└──level_a_node3
└──leaf432
The result would be
[["root" "level_a_node1"]
["root" "level_a_node2" "level_b_node1"]
["root" "level_a_node2" "level_b_node2" "level_c_node1"]
["root" "level_a_node3"]]
I've attempted to go down to the bottom nodes and check if the (lefts) and the (rights) are not branches, but that that doesn't quite work.
(z/vector-zip ["root"
["level_a_node3" ["leaf432"]]
["level_a_node2" ["level_b_node2" ["level_c_node1" ["leaf654"]]] ["level_b_node1" ["leaf987"]] ["leaf789"]]
["level_a_node1" ["leaf456"]]
["leaf123"]])
edit: my data is actually coming in as a list of paths and I'm converting that into a tree. But maybe that is an overcomplication?
[["root" "leaf"]
["root" "level_a_node1" "leaf"]
["root" "level_a_node2" "leaf"]
["root" "level_a_node2" "level_b_node1" "leaf"]
["root" "level_a_node2" "level_b_node2" "level_c_node1" "leaf"]
["root" "level_a_node3" "leaf"]]
Hiccup-style structures are a nice place to visit, but I wouldn't want to live there. That is, they're very succinct to write, but a giant pain to manipulate programmatically, because the semantic nesting structure is not reflected in the physical structure of the nodes. So, the first thing I would do is convert to Enlive-style tree representation (or, ideally, generate Enlive to begin with):
(def hiccup
["root"
["level_a_node3" ["leaf432"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf654"]]]
["level_b_node1"
["leaf987"]]
["leaf789"]]
["level_a_node1"
["leaf456"]]
["leaf123"]])
(defn hiccup->enlive [x]
(when (vector? x)
{:tag (first x)
:content (map hiccup->enlive (rest x))}))
(def enlive (hiccup->enlive hiccup))
;; Yielding...
{:tag "root",
:content
({:tag "level_a_node3", :content ({:tag "leaf432", :content ()})}
{:tag "level_a_node2",
:content
({:tag "level_b_node2",
:content
({:tag "level_c_node1",
:content ({:tag "leaf654", :content ()})})}
{:tag "level_b_node1", :content ({:tag "leaf987", :content ()})}
{:tag "leaf789", :content ()})}
{:tag "level_a_node1", :content ({:tag "leaf456", :content ()})}
{:tag "leaf123", :content ()})}
Having done this, the last thing getting in your way is your desire to use zippers. They are a good tool for targeted traversals, where you care a lot about the structure near the node you are working on. But if all you care about is the node and its children, it is much easier to just write a simple recursive function to traverse the tree:
(defn paths-to-leaves [{:keys [tag content] :as root}]
(when (seq content)
(if (every? #(empty? (:content %)) content)
[(list tag)]
(for [child content
path (paths-to-leaves child)]
(cons tag path)))))
The ability to write recursive traversals like this is a skill that will serve you many times throughout your Clojure career (for example, a similar question I recently answered on Code Review). It turns out that a huge number of functions on trees are just: call yourself recursively on each child, and somehow combine the results, usually in a possibly-nested for loop. The hard part is just figuring out what your base case needs to be, and the correct sequence of maps/mapcats to combine the results without introducing undesired levels of nesting.
If you insist on sticking with Hiccup, you can de-mangle it at the use site without too much pain:
(defn hiccup-paths-to-leaves [node]
(when (vector? node)
(let [tag (first node), content (next node)]
(if (and content (every? #(= 1 (count %)) content))
[(list tag)]
(for [child content
path (hiccup-paths-to-leaves child)]
(cons tag path))))))
But it's noticeably messier, and is work you'll have to repeat every time you work with a tree. Again I encourage you to use Enlive-style trees for your internal data representation.
You can definitely use the file api to navigate the directory. If using zipper, you can do this:
(loop [loc (vector-zip ["root"
["level_a_node3"
["leaf432"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf654"]]]
["level_b_node1"
["leaf987"]]
["leaf789"]]
["level_a_node1"
["leaf456" "leaf456b"]]
["leaf123"]])
ans nil]
(if (end? loc)
ans
(recur (next loc)
(cond->> ans
(contains-leaves-only? loc)
(cons (->> loc down path (map node)))))))
which will output this:
(("root" "level_a_node1")
("root" "level_a_node2" "level_b_node1")
("root" "level_a_node2" "level_b_node2" "level_c_node1")
("root" "level_a_node3"))
with the way you define the tree, helper functions can be implemented
as:
(def is-leaf? #(-> % down nil?))
(defn contains-leaves-only?
[loc]
(some->> loc
down ;; branch name
right ;; children list
down ;; first child
(iterate right) ;; with other sibiling
(take-while identity)
(every? is-leaf?)))
UPDATE - add a lazy sequence version
(->> ["root"
["level_a_node3"
["leaf432"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf654"]]]
["level_b_node1"
["leaf987"]]
["leaf789"]]
["level_a_node1"
["leaf456" "leaf456b"]]
["leaf123"]]
vector-zip
(iterate next)
(take-while (complement end?))
(filter contains-leaves-only?)
(map #(->> % down path (map node))))
It is because zippers have so many limitations that I created the Tupelo Forest library for processing tree-like data structures. Your problem then has a simple solution:
(ns tst.tupelo.forest-examples
(:use tupelo.core tupelo.forest tupelo.test))
(with-forest (new-forest)
(let [data ["root"
["level_a_node3" ["leaf"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf"]]]
["level_b_node1" ["leaf"]]]
["level_a_node1" ["leaf"]]
["leaf"]]
root-hid (add-tree-hiccup data)
leaf-paths (find-paths-with root-hid [:** :*] leaf-path?)]
with a tree that looks like:
(hid->bush root-hid) =>
[{:tag "root"}
[{:tag "level_a_node3"}
[{:tag "leaf"}]]
[{:tag "level_a_node2"}
[{:tag "level_b_node2"}
[{:tag "level_c_node1"}
[{:tag "leaf"}]]]
[{:tag "level_b_node1"}
[{:tag "leaf"}]]]
[{:tag "level_a_node1"}
[{:tag "leaf"}]]
[{:tag "leaf"}]])
and a result like:
(format-paths leaf-paths) =>
[[{:tag "root"} [{:tag "level_a_node3"} [{:tag "leaf"}]]]
[{:tag "root"} [{:tag "level_a_node2"} [{:tag "level_b_node2"} [{:tag "level_c_node1"} [{:tag "leaf"}]]]]]
[{:tag "root"} [{:tag "level_a_node2"} [{:tag "level_b_node1"} [{:tag "leaf"}]]]]
[{:tag "root"} [{:tag "level_a_node1"} [{:tag "leaf"}]]]
[{:tag "root"} [{:tag "leaf"}]]]))))
There are many choices after this depending on the next steps in the processing chain.

What is the Clojure way to transform following data?

So I've just played with Clojure today.
Using this data,
(def test-data
[{:id 35462, :status "COMPLETED", :p 2640000, :i 261600}
{:id 35462, :status "CREATED", :p 240000, :i 3200}
{:id 57217, :status "COMPLETED", :p 470001, :i 48043}
{:id 57217, :status "CREATED", :p 1409999, :i 120105}])
Then transform the above data with,
(as-> test-data input
(group-by :id input)
(map (fn [x] {:id (key x)
:p {:a (as-> (filter #(= (:status %) "COMPLETED") (val x)) tmp
(into {} tmp)
(get tmp :p))
:b (as-> (filter #(= (:status %) "CREATED") (val x)) tmp
(into {} tmp)
(get tmp :p))}
:i {:a (as-> (filter #(= (:status %) "COMPLETED") (val x)) tmp
(into {} tmp)
(get tmp :i))
:b (as-> (filter #(= (:status %) "CREATED") (val x)) tmp
(into {} tmp)
(get tmp :i))}})
input)
(into [] input))
To produce,
[{:id 35462, :p {:a 2640000, :b 240000}, :i {:a 261600, :b 3200}}
{:id 57217, :p {:a 470001, :b 1409999}, :i {:a 48043, :b 120105}}]
But I have a feeling that my code is not the "Clojure way". So my question is, what is the "Clojure way" to achieve what I've produced?
The only things that stand out to me are using as-> when ->> would work just as well, and some work being done redundantly, and some destructuring opportunities:
(defn aggregate [[id values]]
(let [completed (->> (filter #(= (:status %) "COMPLETED") values)
(into {}))
created (->> (filter #(= (:status %) "CREATED") values)
(into {}))]
{:id id
:p {:a (:p completed)
:b (:p created)}
:i {:a (:i completed)
:b (:i created)}}))
(->> test-data
(group-by :id)
(map aggregate))
=>
({:id 35462, :p {:a 2640000, :b 240000}, :i {:a 261600, :b 3200}}
{:id 57217, :p {:a 470001, :b 1409999}, :i {:a 48043, :b 120105}})
However, pouring those filtered values (which are maps themselves) into a map seems suspect to me. This is creating a last-one-wins scenario where the order of your test data affects the output. Try this to see how different orders of test-data affect output:
(into {} (filter #(= (:status %) "COMPLETED") (shuffle test-data)))
It's a pretty odd transformation, keys seem a little arbitrary and it's hard to generalise from n=2 (or indeed to know whether n ever > 2).
I'd use functional decomposition to factor out some of the commonality and get some traction. First of all let us transform the statuses into our keys...
(def status->ab {"COMPLETED" :a "CREATED" :b})
Then, with that in hand, I'd like an easy way of getting the "meat" outof the substructure. Here, for a given key into the data, I'm providing the content of the enclosing map for that key and a given group result.
(defn subgroup->subresult [k subgroup]
(apply array-map (mapcat #(vector (status->ab (:status %)) (k %)) subgroup)))
With this, the main transformer becomes much more tractable:
(defn group->result [group]
{
:id (key group)
:p (subgroup->subresult :p (val group))
:i (subgroup->subresult :i (val group))})
I wouldn't consider generalising across :p and :i for this - if you had more than two keys, then maybe I would generate a map of k -> the subgroup result and do some sort of reducing merge. Anyway, we have an answer:
(map group->result (group-by :id test-data))
;; =>
({:id 35462, :p {:b 240000, :a 2640000}, :i {:b 3200, :a 261600}}
{:id 57217, :p {:b 1409999, :a 470001}, :i {:b 120105, :a 48043}})
There are no one "Clojure way" (I guess you mean functional way) as it depends on how you decompose a problem.
Here is the way I will do:
(->> test-data
(map (juxt :id :status identity))
(map ->nested)
(apply deep-merge)
(map (fn [[id m]]
{:id id
:p (->ab-map m :p)
:i (->ab-map m :i)})))
;; ({:id 35462, :p {:a 2640000, :b 240000}, :i {:a 261600, :b 3200}}
;; {:id 57217, :p {:a 470001, :b 1409999}, :i {:a 48043, :b 120105}})
As you can see, I used a few functions and here is the step-by-step explanation:
Extract index keys (id + status) and the map itself into vector
(map (juxt :id :status identity) test-data)
;; ([35462 "COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600}]
;; [35462 "CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}]
;; [57217 "COMPLETED" {:id 57217, :status "COMPLETED", :p 470001, :i 48043}]
;; [57217 "CREATED" {:id 57217, :status "CREATED", :p 1409999, :i 120105}])
Transform into nested map (id, then status)
(map ->nested *1)
;; ({35462 {"COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600}}}
;; {35462 {"CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}}}
;; {57217 {"COMPLETED" {:id 57217, :status "COMPLETED", :p 470001, :i 48043}}}
;; {57217 {"CREATED" {:id 57217, :status "CREATED", :p 1409999, :i 120105}}})
Merge nested map by id
(apply deep-merge *1)
;; {35462
;; {"COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600},
;; "CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}},
;; 57217
;; {"COMPLETED" {:id 57217, :status "COMPLETED", :p 470001, :i 48043},
;; "CREATED" {:id 57217, :status "CREATED", :p 1409999, :i 120105}}}
For attribute :p and :i, map to :a and :b according to status
(->ab-map {"COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600},
"CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}}
:p)
;; => {:a 2640000, :b 240000}
And below are the few helper functions I used:
(defn ->ab-map [m k]
(zipmap [:a :b]
(map #(get-in m [% k]) ["COMPLETED" "CREATED"])))
(defn ->nested [[k & [v & r :as t]]]
{k (if (seq r) (->nested t) v)})
(defn deep-merge [& xs]
(if (every? map? xs)
(apply merge-with deep-merge xs)
(apply merge xs)))
I would approach it more like the following, so it can handle any number of entries for each :id value. Of course, many variations are possible.
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test)
(:require
[tupelo.core :as t] ))
(dotest
(let [test-data [{:id 35462, :status "COMPLETED", :p 2640000, :i 261600}
{:id 35462, :status "CREATED", :p 240000, :i 3200}
{:id 57217, :status "COMPLETED", :p 470001, :i 48043}
{:id 57217, :status "CREATED", :p 1409999, :i 120105}]
d1 (group-by :id test-data)
d2 (t/forv [[id entries] d1]
{:id id
:status-all (mapv :status entries)
:p-all (mapv :p entries)
:i-all (mapv :i entries)})]
(is= d1
{35462
[{:id 35462, :status "COMPLETED", :p 2640000, :i 261600}
{:id 35462, :status "CREATED", :p 240000, :i 3200}],
57217
[{:id 57217, :status "COMPLETED", :p 470001, :i 48043}
{:id 57217, :status "CREATED", :p 1409999, :i 120105}]})
(is= d2 [{:id 35462,
:status-all ["COMPLETED" "CREATED"],
:p-all [2640000 240000],
:i-all [261600 3200]}
{:id 57217,
:status-all ["COMPLETED" "CREATED"],
:p-all [470001 1409999],
:i-all [48043 120105]}])
))

Clojure parse nested vectors

I am looking to transform a clojure tree structure into a map with its dependencies
For example, an input like:
[{:value "A"}
[{:value "B"}
[{:value "C"} {:value "D"}]
[{:value "E"} [{:value "F"}]]]]
equivalent to:
:A
:B
:C
:D
:E
:F
output:
{:A [:B :E] :B [:C :D] :C [] :D [] :E [:F] :F}
I have taken a look at tree-seq and zippers but can't figure it out!
Here's a way to build up the desired map while using a zipper to traverse the tree. First let's simplify the input tree to match your output format (maps of :value strings → keywords):
(def tree
[{:value "A"}
[{:value "B"} [{:value "C"} {:value "D"}]
{:value "E"} [{:value "F"}]]])
(def simpler-tree
(clojure.walk/postwalk
#(if (map? %) (keyword (:value %)) %)
tree))
;; [:A [:B [:C :D] :E [:F]]]
Then you can traverse the tree with loop/recur and clojure.zip/next, using two loop bindings: the current position in tree, and the map being built.
(loop [loc (z/vector-zip simpler-tree)
deps {}]
(if (z/end? loc)
deps ;; return map when end is reached
(recur
(z/next loc) ;; advance through tree
(if (z/branch? loc)
;; for (non-root) branches, add top-level key with direct descendants
(if-let [parent (some-> (z/prev loc) z/node)]
(assoc deps parent (filterv keyword? (z/children loc)))
deps)
;; otherwise add top-level key with no direct descendants
(assoc deps (z/node loc) [])))))
=> {:A [:B :E], :B [:C :D], :C [], :D [], :E [:F], :F []}
This is easy to do using the tupelo.forest library. I reformatted your source data to make it fit into the Hiccup syntax:
(dotest
(let [relationhip-data-hiccup [:A
[:B
[:C]
[:D]]
[:E
[:F]]]
expected-result {:A [:B :E]
:B [:C :D]
:C []
:D []
:E [:F]
:F []} ]
(with-debug-hid
(with-forest (new-forest)
(let [root-hid (tf/add-tree-hiccup relationhip-data-hiccup)
result (apply glue (sorted-map)
(forv [hid (all-hids)]
(let [parent-tag (grab :tag (hid->node hid))
kid-tags (forv [kid-hid (hid->kids hid)]
(let [kid-tag (grab :tag (hid->node kid-hid))]
kid-tag))]
{parent-tag kid-tags})))]
(is= (format-paths (find-paths root-hid [:A]))
[[{:tag :A}
[{:tag :B} [{:tag :C}] [{:tag :D}]]
[{:tag :E} [{:tag :F}]]]])
(is= result expected-result ))))))
API docs are here. The project README (in progress) is here. A video from the 2017 Clojure Conj is here.
You can see the above live code in the project repo.

How to get the href attribute value using enlive

I am a new to Clojure and enlive.
I have html like this
<SPAN CLASS="f10">....</SPAN></DIV><DIV CLASS="p5"><SPAN CLASS="f10">.....</SPAN>
I tried this
(html/select (fetch-url base-url) [:span.f10 [:a (html/attr? :href)]]))
but it returns this
({:tag :a,
:attrs
{:target "detail",
:title
"...",
:href
"value1"},
:content ("....")}
{:tag :a,
:attrs
{:target "detail",
:title
"....",
:href
"value2"},
:content
("....")}
What i want is just value1 and value 2 in the output. How can i accomplish it ?
select returns the matched nodes, but you still need to extract their href attributes. To do that, you can use attr-values:
(mapcat #(html/attr-values % :href)
(html/select (html/html-resource "sample.html") [:span.f10 (html/attr? :href)]))
I use this little function because the Enlive attr functions don't return the values. You are basically just walking the hash to get the value.
user=> (def data {:tag :a, :attrs {:target "detail", :title "...", :href "value1"}})
#'user/data
user=> (defn- get-attr [node attr]
#_=> (some-> node :attrs attr))
#'user/get-attr
user=> (get-attr data :href)
"value1"

How best to update this tree?

I've got the following tree:
{:start_date "2014-12-07"
:data {
:people [
{:id 1
:projects [{:id 1} {:id 2}]}
{:id 2
:projects [{:id 1} {:id 3}]}
]
}
}
I want to update the people and projects subtrees by adding a :name key-value pair.
Assuming I have these maps to perform the lookup:
(def people {1 "Susan" 2 "John")
(def projects {1 "Foo" 2 "Bar" 3 "Qux")
How could I update the original tree so that I end up with the following?
{:start_date "2014-12-07"
:data {
:people [
{:id 1
:name "Susan"
:projects [{:id 1 :name "Foo"} {:id 2 :name "Bar"}]}
{:id 2
:name "John"
:projects [{:id 1 :name "Foo"} {:id 3 :name "Qux"}]}
]
}
}
I've tried multiple combinations of assoc-in, update-in, get-in and map calls, but haven't been able to figure this out.
I have used letfn to break down the update into easier to understand units.
user> (def tree {:start_date "2014-12-07"
:data {:people [{:id 1
:projects [{:id 1} {:id 2}]}
{:id 2
:projects [{:id 1} {:id 3}]}]}})
#'user/tree
user> (def people {1 "Susan" 2 "John"})
#'user/people
user> (def projects {1 "Foo" 2 "Bar" 3 "Qux"})
#'user/projects
user>
(defn integrate-tree
[tree people projects]
;; letfn is like let, but it creates fn, and allows forward references
(letfn [(update-person [person]
;; -> is the "thread first" macro, the result of each expression
;; becomes the first arg to the next
(-> person
(assoc :name (people (:id person)))
(update-in [:projects] update-projects)))
(update-projects [all-projects]
(mapv
#(assoc % :name (projects (:id %)))
all-projects))]
(update-in tree [:data :people] #(mapv update-person %))))
#'user/integrate-tree
user> (pprint (integrate-tree tree people projects))
{:start_date "2014-12-07",
:data
{:people
[{:projects [{:name "Foo", :id 1} {:name "Bar", :id 2}],
:name "Susan",
:id 1}
{:projects [{:name "Foo", :id 1} {:name "Qux", :id 3}],
:name "John",
:id 2}]}}
nil
Not sure if entirely the best approach:
(defn update-names
[tree people projects]
(reduce
(fn [t [id name]]
(let [person-idx (ffirst (filter #(= (:id (second %)) id)
(map-indexed vector (:people (:data t)))))
temp (assoc-in t [:data :people person-idx :name] name)]
(reduce
(fn [t [id name]]
(let [project-idx (ffirst (filter #(= (:id (second %)) id)
(map-indexed vector (get-in t [:data :people person-idx :projects]))))]
(if project-idx
(assoc-in t [:data :people person-idx :projects project-idx :name] name)
t)))
temp
projects)))
tree
people))
Just call it with your parameters:
(clojure.pprint/pprint (update-names tree people projects))
{:start_date "2014-12-07",
:data
{:people
[{:projects [{:name "Foo", :id 1} {:name "Bar", :id 2}],
:name "Susan",
:id 1}
{:projects [{:name "Foo", :id 1} {:name "Qux", :id 3}],
:name "John",
:id 2}]}}
With nested reduces
Reduce over the people to update corresponding names
For each people, reduce over projects to update corresponding names
The noisesmith solution looks better since doesn't need to find person index or project index for each step.
Naturally you tried to assoc-in or update-in but the problem lies in your tree structure, since the key path to update John name is [:data :people 1 :name], so your assoc-in code would look like:
(assoc-in tree [:data :people 1 :name] "John")
But you need to find John's index in the people vector before you can update it, same things happens with projects inside.