Dynamically create map using vector - clojure

I have a vector ["x" "y" "z"].
I am trying to dynamically create the following:
{:aggs {:bucket-aggregation
        {:terms {:field "x"},
         :aggs {:bucket-aggregation
                {:terms {:field "y"},
                 :aggs {:bucket-aggregation
                        {:terms {:field "z"}}}}}}}}
I currently have the following but can't figure out how to make it recursive
(defn testing [terms]
  {:aggs {:bucket-aggregation
          {:terms {:field (nth terms 0)} (testing (pop terms))}}})

Here's one way to solve:
(def my-vec ["x" "y" "z"])
(defn testing [[head & tail]]
(when head
{:aggs {:bucket-aggregation (merge {:terms {:field head}}
(testing tail))}}))
(testing my-vec)
;=>
;{:aggs {:bucket-aggregation {:terms {:field "x"},
; :aggs {:bucket-aggregation {:terms {:field "y"},
; :aggs {:bucket-aggregation {:terms {:field "z"}}}}}}}}
This works by destructuring the input into a head element and tail elements, so each call is adding a :field of head and recursing on the tail.
And here's another way to solve using reduce:
(reduce
 (fn [acc elem]
   {:aggs {:bucket-aggregation (merge {:terms {:field elem}} acc)}})
 nil
 (reverse my-vec))
This works by reversing the input vector and building the map from the inside out. The reduce approach won't overflow the call stack for large vectors, but the first solution will.
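As a rough check (the exact cutoff depends on your JVM stack size, so the count and the big-vec name below are only illustrative):
(def big-vec (mapv str (range 100000)))
;; (testing big-vec)         ; non-tail recursion ~100k frames deep: expect java.lang.StackOverflowError
;; (reduce ... big-vec)      ; the reduce version above handles big-vec fine, since it never grows the call stack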

Related

Using Clojure, is there a better way to remove an item from a sequence that is the value in a map?

There is a map containing sequences. The sequences contain items.
I want to remove a given item from any sequence that contains it.
The solution I found does what it should, but I wonder if there is a better or more elegant way to achieve the same result.
My current solution:
(defn remove-item-from-map-value [my-map item]
  (apply merge (for [[k v] my-map] {k (remove #(= item %) v)})))
The tests describe the expected behaviour:
(require '[clojure.test :as t])

(def my-map {:keyOne ["itemOne"]
             :keyTwo ["itemTwo" "itemThree"]
             :keyThree ["itemFour" "itemFive" "itemSix"]})

(defn remove-item-from-map-value [my-map item]
  (apply merge (for [[k v] my-map] {k (remove #(= item %) v)})))

(t/is (= (remove-item-from-map-value my-map "unknown-item") my-map))

(t/is (= (remove-item-from-map-value my-map "itemFive")
         {:keyOne ["itemOne"]
          :keyTwo ["itemTwo" "itemThree"]
          :keyThree ["itemFour" "itemSix"]}))

(t/is (= (remove-item-from-map-value my-map "itemThree")
         {:keyOne ["itemOne"]
          :keyTwo ["itemTwo"]
          :keyThree ["itemFour" "itemFive" "itemSix"]}))

(t/is (= (remove-item-from-map-value my-map "itemOne")
         {:keyOne []
          :keyTwo ["itemTwo" "itemThree"]
          :keyThree ["itemFour" "itemFive" "itemSix"]}))
I'm fairly new to Clojure and am interested in different solutions.
So any input is welcome.
I'll throw in the Specter version for good measure. It keeps the vectors inside the map and it's really compact.
(setval [MAP-VALS ALL #{"itemFive"}] NONE my-map)
Example
user=> (use 'com.rpl.specter)
nil
user=> (def my-map {:keyOne ["itemOne"]
#_=> :keyTwo ["itemTwo" "itemThree"]
#_=> :keyThree ["itemFour" "itemFive" "itemSix"]})
#_=>
#'user/my-map
user=> (setval [MAP-VALS ALL #{"itemFive"}] NONE my-map)
{:keyOne ["itemOne"],
:keyThree ["itemFour" "itemSix"],
:keyTwo ["itemTwo" "itemThree"]}
user=> (setval [MAP-VALS ALL #{"unknown"}] NONE my-map)
{:keyOne ["itemOne"],
:keyThree ["itemFour" "itemFive" "itemSix"],
:keyTwo ["itemTwo" "itemThree"]}
I would go with something like this:
user> (defn remove-item [my-map item]
        (into {}
              (map (fn [[k v]] [k (remove #{item} v)]))
              my-map))
#'user/remove-item
user> (remove-item my-map "itemFour")
;;=> {:keyOne ("itemOne"),
;; :keyTwo ("itemTwo" "itemThree"),
;; :keyThree ("itemFive" "itemSix")}
You could also make up a handy function, map-val, that maps a function over the values of a map:
(defn map-val [f data]
  (reduce-kv
   (fn [acc k v] (assoc acc k (f v)))
   {} data))
or, more concisely:
(defn map-val [f data]
  (reduce #(update % %2 f) data (keys data)))
user> (map-val inc {:a 1 :b 2})
;;=> {:a 2, :b 3}
(defn remove-item [my-map item]
  (map-val (partial remove #{item}) my-map))
user> (remove-item my-map "itemFour")
;;=> {:keyOne ("itemOne"),
;; :keyTwo ("itemTwo" "itemThree"),
;; :keyThree ("itemFive" "itemSix")}
I think your solution is mostly okay, but I would try to avoid the apply merge part, as you can easily recreate a map from a sequence with into. You could also use map instead of for, which I think is a little more idiomatic here since you don't use any of the list-comprehension features of for.
(defn remove-item-from-map-value [m item]
  (->> m
       (map (fn [[k vs]]
              {k (remove #(= item %) vs)}))
       (into {})))
Another solution, much like leetwinski's:
(defn remove-item [m i]
  (zipmap (keys m)
          (map (fn [v] (remove #(= % i) v))
               (vals m))))
Here's a one-liner which does this in an elegant way. The perfect function for this scenario is clojure.walk/prewalk. It traverses all of the sub-forms of the form you pass to it and transforms them with the provided fn:
(defn remove-item-from-map-value [data item]
  (clojure.walk/prewalk
   #(if (map-entry? %) [(first %) (remove #{item} (second %))] %)
   data))
The remove-item-from-map-value fn checks whether the current form is a map entry and, if so, removes the specified item from its value (the second element of the map entry, a map entry being a vector containing a key and a value).
The best thing about this approach is that it is completely extensible: you could decide to do different things for different types of forms, handle nested forms, etc.
It took me some time to master this fn but once I got it I found it extremely useful!
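For reference, applied to my-map from the question this should give the following (note the vectors come back as seqs; this is my own trace, not output from the answer):
(remove-item-from-map-value my-map "itemFive")
;; => {:keyOne ("itemOne"),
;;     :keyTwo ("itemTwo" "itemThree"),
;;     :keyThree ("itemFour" "itemSix")}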

How to calculate the sum of all depths in a tree using basic `recur`?

For a given tree I would like to sum the depth of each node and calculate that recursively (so not with map/flatten/sum).
Is there a way to do that with recur or do I need to use a zipper in this case?
recur is for tail recursion, meaning if you could do it with normal recursion, where the return value is exactly what a single recursive call would return, then you can use it.
Most functions on trees cannot be written in a straightforward way when restricted to using only tail recursion. Normal recursive calls are much more straightforward, and as long as the tree depth is not thousands of levels deep, then normal recursive calls are just fine in Clojure.
The reason you may have found recommendations against using normal recursive calls in Clojure is for cases when the call stack could grow to tens or hundreds of thousands of calls deep, e.g. a recursive call one level deep for each element of a sequence that could be tens or hundreds of thousands of elements long. That would exceed the default maximum call stack depth limits of many run-time systems.
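A minimal illustration of the distinction (my own example, not from the question): summing a flat sequence can carry an accumulator and therefore use recur, whereas the naive version still has work to do after the recursive call returns, so it cannot.
;; Tail recursive: the recur call is the entire return value, so the stack stays flat.
(defn sum-tail [coll]
  (loop [acc 0, xs coll]
    (if (seq xs)
      (recur (+ acc (first xs)) (rest xs))
      acc)))

;; Not tail recursive: the + happens after the recursive call returns,
;; so every element costs a stack frame.
(defn sum-naive [coll]
  (if (seq coll)
    (+ (first coll) (sum-naive (rest coll)))
    0))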
Using normal stack consuming recursion you can accomplish this pretty easily, by doing a depth-first traversal and summing the depth on the way back out.
(defn sum-depths
  ([tree]
   (sum-depths tree 0))
  ([node depth]
   (if-not (vector? node)
     depth
     (apply +
            (for [child-node (second node)]
              (sum-depths child-node (inc depth)))))))
(sum-depths [:root
             [:a1
              [:b1
               [:a2 :b2]]
              :c1]])
;; => 6
(sum-depths ["COM"
[["B"
[["C"
[["D"
[["E"
["F"
["J"
[["K"
["L"]]]]]]
"I"]]]]
["G"
["H"]]]]]])
;; => 19
The details depend a little bit on how you model your tree. The above assumes that a node is either a vector pair, where the first element is the value and the second element is a vector of children nodes, or a leaf, in which case it is not a vector.
So a leaf node is anything that's not a vector.
And a node with children is a vector of the form [value [child1 child2 ...]].
And here I assumed you wanted to sum the depth of all leaf nodes. Since I see from your answer that your example gives 42, I'm now thinking you meant the sum of the depth of every node, not just leaves. If so, it only takes one extra line of code:
(defn sum-depths
  ([tree]
   (sum-depths tree 0))
  ([node depth]
   (if-not (vector? node)
     depth
     (apply +
            depth
            (for [child-node (second node)]
              (sum-depths child-node (inc depth)))))))
(sum-depths [:root
             [:a1
              [:b1
               [:a2 :b2]]
              :c1]])
;; => 7
(sum-depths ["COM"
[["B"
[["C"
[["D"
[["E"
["F"
["J"
[["K"
["L"]]]]]]
"I"]]]]
["G"
["H"]]]]]])
;; => 42
And like your own answer showed, this particular algorithm can also be solved without a stack, by doing a level-order traversal (aka breadth-first traversal) of the tree. Here it is working on my tree data structure (otherwise a similar strategy to your own answer):
(defn sum-depths [tree]
  (loop [children (second tree)
         depth 0
         total 0]
    (if (empty? children)
      total
      (let [child-depth (inc depth)
            level-total (* (count children) child-depth)]
        (recur (into [] (comp (filter vector?) (mapcat second)) children)
               child-depth
               (+ total level-total))))))
(sum-depths [:root
             [:a1
              [:b1
               [:a2 :b2]]
              :c1]])
;; => 7
(sum-depths ["COM"
[["B"
[["C"
[["D"
[["E"
["F"
["J"
[["K"
["L"]]]]]]
"I"]]]]
["G"
["H"]]]]]])
;; => 42
And for completeness, I also want to show how you can do a depth-first recursive traversal using core.async instead of the function call stack, in order to traverse trees that would otherwise cause a StackOverflowError, while still using a stack-based recursive depth-first traversal rather than an iterative one. As an aside, there exist some non-stack-consuming O(1)-space depth-first traversals as well, using threaded trees (Morris traversal) or tree transformations, but I won't show those as I'm not super familiar with them and I believe they only work on binary trees.
First, let's construct a degenerate tree of depth 10000 which causes a StackOverflowError when run against our original stack-recursive sum-depths:
(def tree
  (loop [i 0
         t [:a [:b]]]
    (if (< i 10000)
      (recur (inc i) [:a [t]])
      t)))
(defn sum-depths
  ([tree]
   (sum-depths tree 0))
  ([node depth]
   (if-not (vector? node)
     depth
     (apply +
            depth
            (for [child-node (second node)]
              (sum-depths child-node (inc depth)))))))

(sum-depths tree)
;; => java.lang.StackOverflowError
If it works on your machine, try increasing 10000 to something even bigger.
Now we rewrite it to use core.async instead:
(require '[clojure.core.async :as async])

(defmacro for* [[element-sym coll] & body]
  `(loop [acc# [] coll# ~coll]
     (if-let [~element-sym (first coll#)]
       (recur (conj acc# (do ~@body)) (next coll#))
       acc#)))
(def tree
  (loop [i 0
         t [:a [:b]]]
    (if (< i 10000)
      (recur (inc i) [:a [t]])
      t)))
(defn sum-depths
  ([tree]
   (async/<!! (sum-depths tree 0)))
  ([node depth]
   (async/go
     (if-not (vector? node)
       depth
       (apply +
              depth
              (for* [child-node (second node)]
                (async/<!
                 (sum-depths child-node (inc depth)))))))))

(sum-depths tree)
;; => 50015001
It is relatively easy to rewrite a stack-recursive algorithm to use core.async instead of the call stack, and thus avoid the risk of a StackOverflowError on large inputs. Just wrap the body in a go block, wrap the recursive calls in <!, and wrap the whole algorithm in <!!. The only tricky part is that <! cannot cross a function boundary inside a go block, which is why the for* macro is used above. The normal Clojure for macro crosses function boundaries internally, so we can't use <! inside it; by rewriting it not to, we can.
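As a quick sanity check of the for* helper on its own, outside any go block (my own example):
(for* [x [1 2 3]] (* x x))
;; => [1 4 9]   ; an eager, single-binding stand-in for (for [x [1 2 3]] (* x x))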
Now for this particular algorithm, the tail-recursive variant using loop/recur is probably best, but I wanted to show this technique of using core.async for posterity, since it can be useful in other cases where there isn't a trivial tail-recursive implementation.
I would also propose this one, which is fairly straightforward. It uses more or less the same approach as a tail-recursive flatten:
(defn sum-depth
  ([data] (sum-depth data 1 0))
  ([[x & xs :as data] curr res]
   (cond (empty? data) res
         (coll? x) (recur (concat x [:local/up] xs) (inc curr) res)
         (= :local/up x) (recur xs (dec curr) res)
         :else (recur xs curr (+ res curr)))))
The trick is that when you encounter a collection at the head of the sequence, you concat it onto the rest, adding a special marker that signals the end of that branch and a step one level up. That lets you track the current depth value. Quite simple, and it's a single pass.
user> (sum-depth [1 [2 7] [3]])
;;=> 7
user> (sum-depth [1 2 3 [[[[[4]]]]]])
;;=> 9
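To make the marker mechanics concrete, here is a hand trace of (sum-depth [1 [2]]) (my own annotation; state shown at the start of each recur round):
;; data            curr  res
;; (1 [2])          1     0   ; 1 counted at depth 1 -> res 1
;; ([2])            1     1   ; [2] is a coll        -> splice (2 :local/up), curr 2
;; (2 :local/up)    2     1   ; 2 counted at depth 2 -> res 3
;; (:local/up)      2     3   ; marker               -> curr back to 1
;; ()               1     3   ; empty                -> return 3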
You can use map/mapcat to walk a tree recursively to produce a lazy-seq (of leaf nodes). If you need depth information, just add it along the way.
(defn leaf-seq
  [branch? children root]
  (let [walk (fn walk [lvl node]
               (if (branch? node)
                 (->> node
                      children
                      (mapcat (partial walk (inc lvl))))
                 [{:lvl lvl
                   :leaf node}]))]
    (walk 0 root)))
To run:
(->> '((1 2 ((3))) (4))
     (leaf-seq seq? identity)
     (map :lvl)
     (reduce +))
;; => 10
where the depths of each node are:
(->> '((1 2 ((3))) (4))
     (leaf-seq seq? identity)
     (map :lvl))
;; => (2 2 4 2)
Update: sum all nodes instead of just leaf nodes
I misread the original requirement and assumed leaf nodes only. Adding the branch nodes back is easy: we just need to cons each one in front of its child sequence.
(defn node-seq
  "Returns all the nodes marked with depth/level"
  [branch? children root]
  (let [walk (fn walk [lvl node]
               (lazy-seq
                (cons {:lvl lvl
                       :node node}
                      (when (branch? node)
                        (->> node
                             children
                             (mapcat (partial walk (inc lvl))))))))]
    (walk 0 root)))
Then we can walk on the hiccup-like tree as before:
(->> ["COM" [["B" [["C" [["D" [["E" [["F"] ["J" [["K" [["L"]]]]]]] ["I"]]]]] ["G" [["H"]]]]]]]
(node-seq #(s/valid? ::branch %) second)
(map :lvl)
(reduce +))
;; => 42
Note: the above function uses the helper specs below to identify a branch or a leaf:
(require '[clojure.spec.alpha :as s]) ; the s alias used above

(s/def ::leaf (s/coll-of string? :min-count 1 :max-count 1))
(s/def ::branch (s/cat :tag string?
                       :children (s/coll-of (s/or :leaf ::leaf
                                                  :branch ::branch))))
Here's my alternative approach that does use recur:
(defn sum-of-depths
  [branches]
  (loop [branches branches
         cur-depth 0
         total-depth 0]
    (cond
      (empty? branches) total-depth
      :else (recur (mapcat (fn [node] (second node)) branches)
                   (inc cur-depth)
                   (+ total-depth (* (count branches) cur-depth))))))
(def tree ["COM" (["B" (["C" (["D" (["E" (["F"] ["J" (["K" (["L"])])])] ["I"])])] ["G" (["H"])])])])
(sum-of-depths [tree]) ; For the first call we have to wrap the tree in a list.
=> 42
You can do this using the Tupelo Forest library. Here is a function to extract information about a tree in Hiccup format. First, think about how we want to use the information for a simple tree with 3 nodes:
(dotest
  (hid-count-reset)
  (let [td (tree-data [:a
                       [:b 21]
                       [:c 39]])]
    (is= (grab :paths td) [[1003]
                           [1003 1001]
                           [1003 1002]])
    (is= (grab :node-hids td) [1003 1001 1002])
    (is= (grab :tags td) [:a :b :c])
    (is= (grab :depths td) [1 2 2])
    (is= (grab :total-depth td) 5)))
Here is how we calculate the above information:
(ns tst.demo.core
  (:use tupelo.forest tupelo.core tupelo.test)
  (:require
    [schema.core :as s]
    [tupelo.schema :as tsk]))

(s/defn tree-data :- tsk/KeyMap
  "Returns data about a hiccup tree"
  [hiccup :- tsk/Vec]
  (with-forest (new-forest)
    (let [root-hid    (add-tree-hiccup hiccup)
          paths       (find-paths root-hid [:** :*])
          node-hids   (mapv xlast paths)
          tags        (mapv #(grab :tag (hid->node %)) node-hids)
          depths      (mapv count paths)
          total-depth (apply + depths)]
      (vals->map paths node-hids tags depths total-depth))))
and an example on a larger Hiccup-format tree:
(dotest
  (let [td (tree-data [:a
                       [:b 21]
                       [:b 22]
                       [:b
                        [:c
                         [:d
                          [:e
                           [:f
                            [:g 7]
                            [:h
                             [:i 9]]]]]]
                        [:c 32]]
                       [:c 39]])]
    (is= (grab :tags td) [:a :b :b :b :c :d :e :f :g :h :i :c :c])
    (is= (grab :depths td) [1 2 2 2 3 4 5 6 7 7 8 3 2])
    (is= (grab :total-depth td) 52)))
Don't be afraid of stack size for normal processing. On my computer, the default stack doesn't overflow until you get to a stack depth of over 3900 recursive calls. For a binary tree, just 2^30 is over a billion nodes, and 2^300 is more nodes than the number of protons in the universe (approx).

In a tree, how do I find paths to tree nodes that have children with leaves?

Basically, I am trying to implement this algorithm, though maybe there's a better way to go about it.
starting at the root
check each child of the current node for children with leaves (child of child)
if any child-of-child nodes of the current node have leaves, record the path to the current node (not to the child) and do not continue down that path any farther.
else continue DFS
non-functional pseudo code:
def find_paths(node):
    for child in node.children:
        if child.children.len() == 0
            child_with_leaf = true
    if child_with_leaf
        record path to node
    else
        for child in node.children
            find_paths(child)
For example:
:root
|- :a
| +- :x
| |- :y
| | +- :t
| | +- :l2
| +- :z
| +- :l3
+- :b
+- :c
|- :d
| +- :l4
+- :e
+- :l5
The result would be:
[[:root :a]
[:root :b :c]]
Here is my crack at it in Clojure:
(defn atleast-one?
  [pred coll]
  (not (nil? (some pred coll))))
; updated with erdos's answer
(defn children-have-leaves?
  [loc]
  (some->> loc
           (iterate z/children)
           (take-while z/branch?)
           (atleast-one? (comp not empty? z/children))))
(defn find-paths
  [tree]
  (loop [loc (z/vector-zip tree)
         ans nil]
    (if (z/end? loc)
      ans
      (recur (z/next loc)
             (cond->> ans
               (children-have-leaves? loc)
               (cons (->> loc z/down z/path (map z/node))))))))
(def test-data2
  [:root [:a [:x [:y [:t [:l2]]] [:z [:l3]]]] [:b [:c [:d [:l4]] [:e [:l5]]]]])
Update: fixed the crash with erdos' answer below, but I think there's still a problem with my code since this prints every path and not the desired ones.
I assume you have referenced my previous answer related to zippers. But please note that my previous answer uses vector-zip as is, and hence you have to navigate it like a vector-zip, which takes some head-wrapping to see how the cursor moves. To simplify the navigation, I suggest you create your own zipper for your tree structure, i.e.:
(defn my-zipper [root]
  (z/zipper
   ;; branch?
   (fn [x]
     (when (vector? x)
       (let [[n & xs] x]
         (and n (-> xs count zero? not)))))
   ;; children
   (fn [[n & xs]] xs)
   ;; make-node
   (fn [[n & _] xs] [n xs])
   root))
then the solution will be similar to my other answer:
(def test-data2
  [:root
   [:a
    [:x
     [:y
      [:t [:l2]]]
     [:z [:l3]]]]
   [:b
    [:c
     [:d [:l4]]
     [:e [:l5]]]]])
(->> test-data2
     my-zipper
     (iterate z/next)
     (take-while (complement z/end?))
     (filter (comp children-with-leaves? z/node))
     (map #(->> % z/path (map z/node)))
     set)
;; => #{(:root :a :x) (:root :a :x :y) (:root :b :c)}
where the main logic is simplified to:
(defn children-with-leaves? [[_ & children]]
(some (fn [[c & xs]] (nil? xs)) children))
The exception comes from your children-have-leaves? function.
The (not (empty? z/children)) expression fails because z/children is a function, whereas empty? must be invoked on a collection.
What you need is a predicate that returns true if a node has children, like (fn [x] (not (empty? (z/children x)))), or shorter: (comp not empty? z/children)
The correct implementation:
(defn children-have-leaves?
  [loc]
  (some->> loc
           (iterate z/children)
           (take-while z/branch?)
           (atleast-one? (comp not empty? z/children))))
If you want to process tree-like data structures, I would highly recommend the tupelo.forest library.
I don't understand your goal, though. Nodes :a and :c in your example are not equally distant from the closest leaf.
Actually, I just noticed that the tree in your example is different than the tree in your code attempt. Could you please update the question to make them consistent?
Here is an example of how you could do it:
(dotest   ; find the grandparent of each leaf
  (hid-count-reset)
  (with-forest (new-forest)
    (let [data              [:root
                             [:a
                              [:x
                               [:y [:t [:l2]]]
                               [:z [:l3]]]]
                             [:b [:c
                                  [:d [:l4]]
                                  [:e [:l5]]]]]
          root-hid          (add-tree-hiccup data)
          leaf-paths        (find-paths-with root-hid [:** :*] leaf-path?)
          grandparent-paths (mapv #(drop-last 2 %) leaf-paths)
          grandparent-tags  (set
                              (forv [path grandparent-paths]
                                (let [path-tags (it-> path
                                                  (mapv #(hid->node %) it)
                                                  (mapv #(grab :tag %) it))]
                                  path-tags)))]
      (is= (format-paths leaf-paths)
           [[{:tag :root} [{:tag :a} [{:tag :x} [{:tag :y} [{:tag :t} [{:tag :l2}]]]]]]
            [{:tag :root} [{:tag :a} [{:tag :x} [{:tag :z} [{:tag :l3}]]]]]
            [{:tag :root} [{:tag :b} [{:tag :c} [{:tag :d} [{:tag :l4}]]]]]
            [{:tag :root} [{:tag :b} [{:tag :c} [{:tag :e} [{:tag :l5}]]]]]])
      (is= grandparent-tags
           #{[:root :a :x]
             [:root :a :x :y]
             [:root :b :c]}))))

How to transform a list of maps to a nested map of maps?

Getting data from the database as a list of maps (LazySeq) leaves me in need of transforming it into a map of maps.
I tried to 'assoc' and 'merge', but that didn't bring the desired result because of the nesting.
This is the form of my data:
(def data (list {:structure 1 :cat "A" :item "item1" :val 0.1}
                {:structure 1 :cat "A" :item "item2" :val 0.2}
                {:structure 1 :cat "B" :item "item3" :val 0.4}
                {:structure 2 :cat "A" :item "item1" :val 0.3}
                {:structure 2 :cat "B" :item "item3" :val 0.5}))
I would like to get it in the form
=> {1 {"A" {"item1" 0.1}
"item2" 0.2}}
{"B" {"item3" 0.4}}
2 {"A" {"item1" 0.3}}
{"B" {"item3" 0.5}}}
I tried
(->> data
     (map #(assoc {} (:structure %) {(:cat %) {(:item %) (:val %)}}))
     (apply merge-with into))
This gives
{1 {"A" {"item2" 0.2}, "B" {"item3" 0.4}},
2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}
By merging I lose some entries, but I can't think of any other way. Is there a simple way? I was even about to try to use Specter.
Any thoughts would be appreciated.
If I'm dealing with nested maps, my first stop is usually to think about update-in or assoc-in; these take a sequence of the nested keys. For a problem like this where the data is very regular, it's straightforward.
(assoc-in {} [1 "A" "item1"] 0.1)
;; =>
{1 {"A" {"item1" 0.1}}}
To consume a sequence into something else, reduce is the idiomatic choice. The reducing function is right on the edge of the complexity level I'd consider an anonymous fn for, so I'll pull it out instead for clarity.
(defn- add-val [acc line]
  (assoc-in acc [(:structure line) (:cat line) (:item line)] (:val line)))

(reduce add-val {} data)
;; =>
{1 {"A" {"item1" 0.1, "item2" 0.2}, "B" {"item3" 0.4}},
 2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}
Which I think was the effect you were looking for.
Roads less travelled:
As your sequence is coming from a database, I wouldn't worry about using a transient collection to speed the aggregation up. Also, now that I think about it, dealing with nested transient maps is a pain anyway.
update-in would be handy if you wanted to add up any values with the same key, for example, but the implication of your question is that structure/cat/item tuples are unique and so you just need the grouping.
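For example, a hypothetical variant (add-val-summing is my own name) if repeated structure/cat/item tuples had to be summed rather than overwritten:
(defn- add-val-summing [acc line]
  ;; (fnil + 0) treats a missing entry as 0, so duplicates accumulate
  (update-in acc [(:structure line) (:cat line) (:item line)]
             (fnil + 0) (:val line)))

(reduce add-val-summing {} data)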
juxt could be used to generate the key structure, i.e.
((juxt :structure :cat :item) (first data))
;; => [1 "A" "item1"]
but it's not clear to me that there's any way to use this to make the add-val fn more readable.
You may continue to use your existing code. Only the final merge has to change:
(defn deep-merge [& xs]
(if (every? map? xs)
(apply merge-with deep-merge xs)
(apply merge xs)))
(->> data
     (map #(assoc {} (:structure %) {(:cat %) {(:item %) (:val %)}}))
     (apply deep-merge))
;; =>
{1
{"A" {"item1" 0.1, "item2" 0.2},
"B" {"item3" 0.4}},
2
{"A" {"item1" 0.3},
"B" {"item3" 0.5}}}
Explanation: your original (apply merge-with into) only merges one level down. deep-merge from above will recurse into all nested maps to do the merge.
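To see the one-level limitation concretely, compare the two merges on a minimal pair of nested maps (my own example):
(merge-with into {1 {"A" {"item1" 0.1}}} {1 {"A" {"item2" 0.2}}})
;; => {1 {"A" {"item2" 0.2}}}               ; the inner "A" entry is simply replaced
(deep-merge {1 {"A" {"item1" 0.1}}} {1 {"A" {"item2" 0.2}}})
;; => {1 {"A" {"item1" 0.1, "item2" 0.2}}}  ; the recursion merges the innermost maps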
Addendum: @pete23 - one use of juxt I can think of is to make the function reusable. For example, we can extract arbitrary fields with juxt, then convert them to nested maps (with yet another function, ->nested) and finally do a deep-merge:
(->> data
     (map (juxt :structure :cat :item :val))
     (map ->nested)
     (apply deep-merge))
where ->nested can be implemented like:
(defn ->nested [[k & [v & r :as t]]]
{k (if (seq r) (->nested t) v)})
(->nested [1 "A" "item1" 0.1])
;; => {1 {"A" {"item1" 0.1}}}
One sample application (sum val by category):
(let [ks [:cat :val]]
  (->> data
       (map (apply juxt ks))
       (map ->nested)
       (apply (partial deep-merge-with +))))
;; => {"A" 0.6000000000000001, "B" 0.9}
Note deep-merge-with is left as an exercise for our readers :)
(defn map-values [f m]
  (into {} (map (fn [[k v]] [k (f v)])) m))

(defn- transform-structures [ss]
  (map-values (fn [cs]
                (into {} (map (juxt :item :val) cs)))
              (group-by :cat ss)))

(defn transform [data]
  (map-values transform-structures (group-by :structure data)))
then
(transform data)
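which, assuming I have traced the two group-by levels correctly, should yield:
;; => {1 {"A" {"item1" 0.1, "item2" 0.2}, "B" {"item3" 0.4}},
;;     2 {"A" {"item1" 0.3}, "B" {"item3" 0.5}}}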

get-in for lists

Apparently get-in doesn't work for '() lists since they're not an associative data structure. This makes sense for the API and from the perspective of performance on large lists. From my perspective as a user, it'd still be great to use this function to explore some small test data in the REPL. For example, I want to be able to do:
(-> '({:a ("zero" 0)} {:a ("one" 1)} {:a ("two" 2)})
(get-in [1 :a 0]))
=> "one"
Is there some other function that works this way? Is there some other way to achieve this behavior that doesn't involve converting all my lists to (say) vectors?
This does what you ask:
(defn get-nth-in [init ks]
  (reduce
   (fn [a k]
     (if (associative? a)
       (get a k)
       (nth a k)))
   init
   ks))
For example,
(-> '({:a "zero"} {:a "one"} {:a "two"})
(get-nth-in [1 :a]))
;"one"
and
(-> '({:a ("zero" 0)} {:a ("one" 1)} {:a ("two" 2)})
(get-nth-in [1 :a 0]))
;"one"
The extra quotes (') you have get expanded into (quote ...):
(-> '({:a '("zero" 0)} {:a '("one" 1)} {:a '("two" 2)})
(get-nth-in [1 :a 0]))
;quote
Not what you intended, I think.
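To see why, check what the inner quote leaves behind at index 0 (my own check):
(-> '({:a '("zero" 0)} {:a '("one" 1)} {:a '("two" 2)})
    first
    :a)
;; => (quote ("zero" 0))   ; so index 0 of that value is the symbol quote, not "zero"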
A post just yesterday had a problem regarding lazy lists and lazy maps (from clojure/data.xml). One answer was to just replace the lazy bits with plain vectors & maps using this function:
(require '[clojure.walk :refer [postwalk]])

(defn unlazy
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)
                        (map? item) (into {} item)
                        :else item))
        result (postwalk unlazy-item coll)]
    result))
Since the resulting data structure uses only vectors & maps, it works for your example with get-in:
(let [l2 '({:a ("zero" 0)} {:a ("one" 1)} {:a ("two" 2)})
e2 (unlazy l2) ]
(is= l2 e2)
(is= "one" (get-in e2 [1 :a 0] l2))
)
You can find the unlazy function in the Tupelo library.
The first param for get-in should be an associative structure (a map or a vector), so it won't reach into your list.
Since you have a sequence, you have to get at the element yourself first, e.g. with last, first, filter or some.
For example, you could use (:a (last data)).
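Concretely, on the list from the question, a plain-function equivalent of (get-in ... [1 :a 0]) would be (my own example):
(-> '({:a ("zero" 0)} {:a ("one" 1)} {:a ("two" 2)})
    (nth 1)
    :a
    first)
;; => "one"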