Clojure parse nested vectors - clojure

I am looking to transform a clojure tree structure into a map with its dependencies
For example, an input like:
[{:value "A"}
[{:value "B"}
[{:value "C"} {:value "D"}]
[{:value "E"} [{:value "F"}]]]]
equivalent to:
:A
:B
:C
:D
:E
:F
output:
{:A [:B :E] :B [:C :D] :C [] :D [] :E [:F] :F}
I have taken a look at tree-seq and zippers but can't figure it out!

Here's a way to build up the desired map while using a zipper to traverse the tree. First let's simplify the input tree to match your output format (maps of :value strings → keywords):
(def tree
[{:value "A"}
[{:value "B"} [{:value "C"} {:value "D"}]
{:value "E"} [{:value "F"}]]])
(def simpler-tree
(clojure.walk/postwalk
#(if (map? %) (keyword (:value %)) %)
tree))
;; [:A [:B [:C :D] :E [:F]]]
Then you can traverse the tree with loop/recur and clojure.zip/next, using two loop bindings: the current position in tree, and the map being built.
(loop [loc (z/vector-zip simpler-tree)
deps {}]
(if (z/end? loc)
deps ;; return map when end is reached
(recur
(z/next loc) ;; advance through tree
(if (z/branch? loc)
;; for (non-root) branches, add top-level key with direct descendants
(if-let [parent (some-> (z/prev loc) z/node)]
(assoc deps parent (filterv keyword? (z/children loc)))
deps)
;; otherwise add top-level key with no direct descendants
(assoc deps (z/node loc) [])))))
=> {:A [:B :E], :B [:C :D], :C [], :D [], :E [:F], :F []}

This is easy to do using the tupelo.forest library. I reformatted your source data to make it fit into the Hiccup syntax:
(dotest
(let [relationhip-data-hiccup [:A
[:B
[:C]
[:D]]
[:E
[:F]]]
expected-result {:A [:B :E]
:B [:C :D]
:C []
:D []
:E [:F]
:F []} ]
(with-debug-hid
(with-forest (new-forest)
(let [root-hid (tf/add-tree-hiccup relationhip-data-hiccup)
result (apply glue (sorted-map)
(forv [hid (all-hids)]
(let [parent-tag (grab :tag (hid->node hid))
kid-tags (forv [kid-hid (hid->kids hid)]
(let [kid-tag (grab :tag (hid->node kid-hid))]
kid-tag))]
{parent-tag kid-tags})))]
(is= (format-paths (find-paths root-hid [:A]))
[[{:tag :A}
[{:tag :B} [{:tag :C}] [{:tag :D}]]
[{:tag :E} [{:tag :F}]]]])
(is= result expected-result ))))))
API docs are here. The project README (in progress) is here. A video from the 2017 Clojure Conj is here.
You can see the above live code in the project repo.

Related

What is the Clojure way to transform following data?

So I've just played with Clojure today.
Using this data,
(def test-data
[{:id 35462, :status "COMPLETED", :p 2640000, :i 261600}
{:id 35462, :status "CREATED", :p 240000, :i 3200}
{:id 57217, :status "COMPLETED", :p 470001, :i 48043}
{:id 57217, :status "CREATED", :p 1409999, :i 120105}])
Then transform the above data with,
(as-> test-data input
(group-by :id input)
(map (fn [x] {:id (key x)
:p {:a (as-> (filter #(= (:status %) "COMPLETED") (val x)) tmp
(into {} tmp)
(get tmp :p))
:b (as-> (filter #(= (:status %) "CREATED") (val x)) tmp
(into {} tmp)
(get tmp :p))}
:i {:a (as-> (filter #(= (:status %) "COMPLETED") (val x)) tmp
(into {} tmp)
(get tmp :i))
:b (as-> (filter #(= (:status %) "CREATED") (val x)) tmp
(into {} tmp)
(get tmp :i))}})
input)
(into [] input))
To produce,
[{:id 35462, :p {:a 2640000, :b 240000}, :i {:a 261600, :b 3200}}
{:id 57217, :p {:a 470001, :b 1409999}, :i {:a 48043, :b 120105}}]
But I have a feeling that my code is not the "Clojure way". So my question is, what is the "Clojure way" to achieve what I've produced?
The only things that stand out to me are using as-> when ->> would work just as well, and some work being done redundantly, and some destructuring opportunities:
(defn aggregate [[id values]]
(let [completed (->> (filter #(= (:status %) "COMPLETED") values)
(into {}))
created (->> (filter #(= (:status %) "CREATED") values)
(into {}))]
{:id id
:p {:a (:p completed)
:b (:p created)}
:i {:a (:i completed)
:b (:i created)}}))
(->> test-data
(group-by :id)
(map aggregate))
=>
({:id 35462, :p {:a 2640000, :b 240000}, :i {:a 261600, :b 3200}}
{:id 57217, :p {:a 470001, :b 1409999}, :i {:a 48043, :b 120105}})
However, pouring those filtered values (which are maps themselves) into a map seems suspect to me. This is creating a last-one-wins scenario where the order of your test data affects the output. Try this to see how different orders of test-data affect output:
(into {} (filter #(= (:status %) "COMPLETED") (shuffle test-data)))
It's a pretty odd transformation, keys seem a little arbitrary and it's hard to generalise from n=2 (or indeed to know whether n ever > 2).
I'd use functional decomposition to factor out some of the commonality and get some traction. First of all let us transform the statuses into our keys...
(def status->ab {"COMPLETED" :a "CREATED" :b})
Then, with that in hand, I'd like an easy way of getting the "meat" outof the substructure. Here, for a given key into the data, I'm providing the content of the enclosing map for that key and a given group result.
(defn subgroup->subresult [k subgroup]
(apply array-map (mapcat #(vector (status->ab (:status %)) (k %)) subgroup)))
With this, the main transformer becomes much more tractable:
(defn group->result [group]
{
:id (key group)
:p (subgroup->subresult :p (val group))
:i (subgroup->subresult :i (val group))})
I wouldn't consider generalising across :p and :i for this - if you had more than two keys, then maybe I would generate a map of k -> the subgroup result and do some sort of reducing merge. Anyway, we have an answer:
(map group->result (group-by :id test-data))
;; =>
({:id 35462, :p {:b 240000, :a 2640000}, :i {:b 3200, :a 261600}}
{:id 57217, :p {:b 1409999, :a 470001}, :i {:b 120105, :a 48043}})
There are no one "Clojure way" (I guess you mean functional way) as it depends on how you decompose a problem.
Here is the way I will do:
(->> test-data
(map (juxt :id :status identity))
(map ->nested)
(apply deep-merge)
(map (fn [[id m]]
{:id id
:p (->ab-map m :p)
:i (->ab-map m :i)})))
;; ({:id 35462, :p {:a 2640000, :b 240000}, :i {:a 261600, :b 3200}}
;; {:id 57217, :p {:a 470001, :b 1409999}, :i {:a 48043, :b 120105}})
As you can see, I used a few functions and here is the step-by-step explanation:
Extract index keys (id + status) and the map itself into vector
(map (juxt :id :status identity) test-data)
;; ([35462 "COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600}]
;; [35462 "CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}]
;; [57217 "COMPLETED" {:id 57217, :status "COMPLETED", :p 470001, :i 48043}]
;; [57217 "CREATED" {:id 57217, :status "CREATED", :p 1409999, :i 120105}])
Transform into nested map (id, then status)
(map ->nested *1)
;; ({35462 {"COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600}}}
;; {35462 {"CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}}}
;; {57217 {"COMPLETED" {:id 57217, :status "COMPLETED", :p 470001, :i 48043}}}
;; {57217 {"CREATED" {:id 57217, :status "CREATED", :p 1409999, :i 120105}}})
Merge nested map by id
(apply deep-merge *1)
;; {35462
;; {"COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600},
;; "CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}},
;; 57217
;; {"COMPLETED" {:id 57217, :status "COMPLETED", :p 470001, :i 48043},
;; "CREATED" {:id 57217, :status "CREATED", :p 1409999, :i 120105}}}
For attribute :p and :i, map to :a and :b according to status
(->ab-map {"COMPLETED" {:id 35462, :status "COMPLETED", :p 2640000, :i 261600},
"CREATED" {:id 35462, :status "CREATED", :p 240000, :i 3200}}
:p)
;; => {:a 2640000, :b 240000}
And below are the few helper functions I used:
(defn ->ab-map [m k]
(zipmap [:a :b]
(map #(get-in m [% k]) ["COMPLETED" "CREATED"])))
(defn ->nested [[k & [v & r :as t]]]
{k (if (seq r) (->nested t) v)})
(defn deep-merge [& xs]
(if (every? map? xs)
(apply merge-with deep-merge xs)
(apply merge xs)))
I would approach it more like the following, so it can handle any number of entries for each :id value. Of course, many variations are possible.
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test)
(:require
[tupelo.core :as t] ))
(dotest
(let [test-data [{:id 35462, :status "COMPLETED", :p 2640000, :i 261600}
{:id 35462, :status "CREATED", :p 240000, :i 3200}
{:id 57217, :status "COMPLETED", :p 470001, :i 48043}
{:id 57217, :status "CREATED", :p 1409999, :i 120105}]
d1 (group-by :id test-data)
d2 (t/forv [[id entries] d1]
{:id id
:status-all (mapv :status entries)
:p-all (mapv :p entries)
:i-all (mapv :i entries)})]
(is= d1
{35462
[{:id 35462, :status "COMPLETED", :p 2640000, :i 261600}
{:id 35462, :status "CREATED", :p 240000, :i 3200}],
57217
[{:id 57217, :status "COMPLETED", :p 470001, :i 48043}
{:id 57217, :status "CREATED", :p 1409999, :i 120105}]})
(is= d2 [{:id 35462,
:status-all ["COMPLETED" "CREATED"],
:p-all [2640000 240000],
:i-all [261600 3200]}
{:id 57217,
:status-all ["COMPLETED" "CREATED"],
:p-all [470001 1409999],
:i-all [48043 120105]}])
))

How to get 'at-least' schema?

By 'at-least' I mean schema that will ignore all disallowed-key errors.
Consider the following snippet:
(require '[schema.core :as s])
(def s {:a s/Int})
(s/check s {:a 1}) ;; => nil (check passed)
(s/check s {:a 1 :b 2}) ;; => {:b disallowed-key}
(def at-least-s (at-least s))
(s/check at-least-s {:a 1}) ;; => nil
(s/check at-least-s {:a 1 :b 2}) ;; => nil
The first idea about implementation of at-least function was to conjoin [s/Any s/Any] entry to initial schema:
(defn at-least [s]
(conj s [s/Any s/Any]))
but unfortunately such implementation won't work for nested maps:
(def another-s {:a {:b s/Int}})
(s/check (at-least another-s) {:a {:b 1} :c 2}) ;; => nil
(s/check (at-least another-s) {:a {:b 1 :d 3} :c 2}) ;; => {:a {:d disallowed-key}}
Is there's a possibility to get at-least schemas that work for nested maps as well? Or maybe prismatic/schema provides something out of the box that I'm missing?
There is something you can use from metosin/schema-tools: schema-tools.walk. There is even a code that you need in the test:
(defn recursive-optional-keys [m]
(sw/postwalk (fn [s]
(if (and (map? s) (not (record? s)))
(st/optional-keys s)
s))
m))

how to traverse / walk an arbitrary embedded structure

I want a mechanism to traverse an arbitrarily nested data structure. Then apply a fn on every node, and then check if the fn returned true at each point.
Its easy to do this with a flat structure -
(walk (complement string?) #(every? true? %) [ 1 2 3 4])
However walk doesnt work with a nested one -
(walk (complement string?) #(every? true? %) [ 1 2 3 [ "a" ]])
Using only flatten also wont work, as I will have a map as one of the forms, and I want fn applied to each value in the map too. This is the structure I will have -
[ ["2012" [{:a 2} {:b 3}]] ["2013" [{:a 2} {:b 3}]] ]
I can easily write a fn to only traverse the above and apply the fn to each val. However is there a way to write a generic mechanism for traversing?
tree-seq might be what you want
(every? (complement string?)
(remove coll?
(tree-seq coll? #(if (map? %)
(vals %)
%)
[["2012" [{:a 2} {:b 3}]] ["2013" [{:a 2} {:b 3}]]])))
;; false
(every? (complement string?)
(remove coll?
(tree-seq coll? #(if (map? %)
(vals %)
%)
[[2012 [{:a 2} {:b 3}]] [2013 [{:a 2} {:b 3}]]])))
;; true

how to check if a nested key exists in a map

I want to check if every key given in a vector [:e [:a :b] [:c :d]] exists in a map.
{:e 2 :a {:b 3} :c {:d 5}}
I could write the following to check -
(def kvs {:e 2 :a {:b 3} :c {:d 5}})
(every? #(contains? kvs %) [[:e] [:a :b] [:c :d]])
However the above would fail as contains doesnt check the key one level deep like update-in does. How do I accomplish the above ?
An improvement on murtaza's basic approach, which also works when the map has nil or false values:
(defn contains-every? [m keyseqs]
(let [not-found (Object.)]
(not-any? #{not-found}
(for [ks keyseqs]
(get-in m ks not-found)))))
user> (contains-every? {:e 2 :a {:b 3} :c {:d 5}}
[[:e] [:a :b] [:c :d]])
true
user> (contains-every? {:e 2 :a {:b 3} :c {:d 5}}
[[:e] [:a :b] [:c :d :e]])
false
The following does it -
(every? #(get-in kvs %) [[:e] [:a :b] [:c :d]])
Any other answers also welcome !
How about this:
(every? #(if (vector? %)
(contains? (get-in kvs (drop-last %)) (last %))
(contains? kvs %)) [:e [:a :b] [:c :d]])

Clojure: How to pass two sets of unbounded parameters?

Contrived example to illustrate:
(def nest1 {:a {:b {:c "foo"}}})
(def nest2 {:d {:e "bar"}})
If I wanted to conj these nests at arbitrary levels, I could explicitly do this:
(conj (-> nest1 :a :b) (-> nest2 :d)) ; yields {:e "bar", :c "foo"}
(conj (-> nest1 :a) (-> nest2 :d)) ; yields {:e "bar", :b {:c "foo"}}
But what if I wanted to create a function that would accept the "depth" of nest1 and nest2 as parameters?
; Does not work, but shows what I am trying to do
(defn join-nests-by-paths [nest1-path nest2-path]
(conj (-> nest1 nest1-path) (-> nest2 nest2-path))
And I might try to call it like this:
; Does not work
(join-nests-by-paths '(:a :b) '(:d))
This doesn't work. I can't simply pass each "path" as a list to the function (or maybe I can, but need to work with it differently in the function).
Any thoughts? TIA...
Sean
Use get-in:
(defn join-by-paths [path1 path2]
(conj (get-in nest1 path1) (get-in nest2 path2)))
user> (join-by-paths [:a :b] [:d])
{:e "bar", :c "foo"}
user> (join-by-paths [:a] [:d])
{:e "bar", :b {:c "foo"}}
Your version is actually doing something like this:
user> (macroexpand '(-> nest1 (:a :b)))
(:a nest1 :b)
which doesn't work, as you said.
get-in has friends assoc-in and update-in, all for working with nested maps of maps. There's a dissoc-in somewhere in clojure.conrtrib.
(In Clojure it's more idiomatic to use vectors instead of quoted lists when you're passing around sequential groups of things.)