Is manipulating a vector of nested maps possible using zippers? - clojure

I need to turn the following input into the output below by applying two rules:
remove every vector that has 'nope as its last item
remove every map that does not contain at least one vector with 'ds1 as its last item
(def input
  [{:simple1 [:from [:simple1 'ds1]]}
   {:simple2 [:from-any [[:simple2 'nope] [:simple2 'ds1]]]}
   {:walk1 [:from [:sub1 :sub2 'ds1]]}
   {:unaffected [:from [:unaffected 'nope]]}
   {:replaced-with-nil [:from [:the-original 'ds1]]}
   {:concat1 [:concat [[:simple1 'ds1] [:simple2 'ds1]]]}
   {:lookup-word [:lookup [:word 'word :word 'ds1]]}])
(def output
  [{:simple1 [:from [:simple1 'ds1]]}
   {:simple2 [:from-any [[:simple2 'ds1]]]}
   {:walk1 [:from [:sub1 :sub2 'ds1]]}
   {:replaced-with-nil [:from [:the-original 'ds1]]}
   {:concat1 [:concat [[:simple1 'ds1] [:simple2 'ds1]]]}
   {:lookup-word [:lookup [:word 'word :word 'ds1]]}])
I was wondering if performing this transformation is possible with zippers?

I'd recommend clojure.walk instead for this kind of general tree transformation. It can take a bit of fiddling to get the replacement functions right but it works nicely with any nesting of Clojure data structures, which AFAIK can be a bit more challenging in a zipper based approach.
We're looking to shrink our tree, so postwalk is my go-to here. It takes a function f and a tree root and goes through the tree, replacing each leaf value with (f leaf), then their parents and their parents etc. until finally replacing the root. (prewalk is similar but proceeds from root and down to leaves, so it's usually more natural when you're growing the tree by splitting branches.)
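To make the traversal order concrete, here is a small sketch that records each node as the walker visits it. The visit-order helper is an illustrative name introduced here, not part of clojure.walk.

```clojure
(require '[clojure.walk :refer [postwalk prewalk]])

(defn visit-order
  "Hypothetical helper: returns the nodes of tree in the order walker visits them."
  [walker tree]
  (let [visited (atom [])]
    ;; The function returns each node unchanged, so the tree itself is not modified.
    (walker (fn [node] (swap! visited conj node) node) tree)
    @visited))

;; postwalk reaches the leaves first and the root last.
(visit-order postwalk [1 [2 3]])
;; => [1 2 3 [2 3] [1 [2 3]]]

;; prewalk starts at the root and works down.
(visit-order prewalk [1 [2 3]])
;; => [[1 [2 3]] 1 [2 3] 2 3]
```

Because postwalk hands a parent to f only after its children have been replaced, a pruning function always sees already-pruned children.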
The strategy here is to somehow construct a function that prunes any branch which meets our removal criteria, but returns any other value unchanged.
(ns shrink-tree
  (:require [clojure.walk :refer [postwalk]]))
(letfn [(rule-1 [node]
          (and (vector? node)
               (= 'nope (last node))))
        (rule-2 [node]
          (and
            (map? node)
            (not-any? #(and (vector? %) (= 'ds1 (last %)))
                      (tree-seq vector? seq (-> node vals first)))))
        (remove-marked [node]
          (if (coll? node)
            (into (empty node) (remove (some-fn rule-1 rule-2) node))
            node))]
  (= output (postwalk remove-marked input)))
;; => true
Here the fns rule-1 and rule-2 try to turn your rules into predicates and remove-marked:
If a node is a collection, returns the same kind of collection, less any members for which rule-1 or rule-2 returns truthy when called with that member. To check for either one at the same time we combine the predicates with some-fn.
Otherwise returns the same node. This is how we keep values like 'ds1 or :from-any around.
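The two building blocks of remove-marked can be tried in isolation. This is only a sketch on plain numbers; small-or-even? is a made-up predicate used for illustration.

```clojure
;; some-fn builds one predicate that is truthy when any of its arguments is.
(def small-or-even? (some-fn even? #(< % 3)))

(remove small-or-even? [1 2 3 4 5 7])
;; => (3 5 7)

;; Pouring the survivors into (empty coll) gives back a vector rather than a lazy seq.
(into (empty [1 2 3 4 5 7]) (remove small-or-even? [1 2 3 4 5 7]))
;; => [3 5 7]
```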

You might also want to consider looking at specter. It supports these sorts of transformations by allowing you to select and transform arbitrarily complex structures.

Related

Loop through vector of vectors and remove element from vector in Clojure

I am new to clojure programming and would like some help with some code. I have this vector of vectors like below,
(def start-pos [[[:fox :goose :corn :you] [:boat] []]])
I would like to loop through the vector and remove an element from one of the internal vectors, e.g. remove ':goose' from start-pos.
I tried the code below, but for some reason it doesn't work as intended:
(map #(disj (set %) :goose) start-pos)
Instead the result is,
(#{[:boat] [] [:fox :goose :corn :you]})
As you can see from the result, the internal vectors are now a set and the original order is lost. Is there a way of removing the element without disturbing the original order of the vectors, perhaps without converting to a set first? I chose to convert to a set because, according to the docs, disj only works on sets.
Add: This post is not similar to this suggested post as my vector is nested three vectors deep.
the internal vectors are now a set
That's because #(disj (set %) :goose) returns a set.
original order is distorted
Sets don't preserve insertion order by default, similar to maps with over 8 keys.
I would like to loop through the vector and remove an element from one of the internal vectors, e.g. remove ':goose' from start-pos.
The function you need for removing an element from a collection by predicate is called remove, but...
The value you want to remove is actually nested three vectors deep in start-pos, so you'd need an additional iteration for each inner vector, and so on if you wanted to remove the keyword :goose from every vector recursively. That's an excuse to use clojure.walk:
(clojure.walk/postwalk
  (fn [v]
    (if (coll? v)
      (into (empty v) (remove #{:goose}) v)
      v))
  start-pos)
=> [[[:fox :corn :you] [:boat] []]]
This walks every value in start-pos, removing :goose from any collections it finds.
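The pieces of that one-liner can be tried separately. The sketch below relies only on core functions; the list caveat at the end matters if the data being walked contains lists as well as vectors.

```clojure
;; A set used as a predicate returns the element when present, nil otherwise.
(#{:goose} :goose) ;; => :goose
(#{:goose} :fox)   ;; => nil

;; (remove pred) with no collection is a transducer; into applies it while pouring.
(into [] (remove #{:goose}) [:fox :goose :corn :you])
;; => [:fox :corn :you]

;; Caveat: lists grow at the front, so the same idiom reverses them.
(into (empty '(1 2 3)) (remove #{2}) '(1 2 3))
;; => (3 1)
```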
Here is a less flexible approach, that I made more so for my own benefit (learning Clojure)
(update-in
start-pos
[0 0]
#(vec (concat
(subvec % 0 1)
(subvec % (inc 1)))))
It manually navigates in and reconstructs the :goose level of keywords so that :goose is no longer inside.
I think some alternative approaches to this problem include Specter and zippers.
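A variation on the same idea, sketched under the assumption that the path to the inner vector ([0 0]) is known up front: filtering by value avoids the index arithmetic with subvec.

```clojure
(def start-pos [[[:fox :goose :corn :you] [:boat] []]])

;; Navigate to the first inner vector and filter it by value rather than by index.
(update-in start-pos [0 0] #(vec (remove #{:goose} %)))
;; => [[[:fox :corn :you] [:boat] []]]
```

The vec call keeps the inner collection a vector, since remove on its own returns a lazy sequence.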
You could also employ a Clojure zipper for that:
user> (require '[clojure.zip :as z])
user> (loop [curr (z/vector-zip start-pos)]
        (cond (z/end? curr) (z/root curr)
              (= :goose (z/node curr)) (recur (z/remove curr))
              :else (recur (z/next curr))))
;; => [[[:fox :corn :you] [:boat] []]]
Also, that is quite easy to do with Clojure's core functions only:
user> (defn remv [pred data]
        (if (vector? data)
          (mapv (partial remv pred) (remove pred data))
          data))
#'user/remv
user> (remv #{:goose} start-pos)
;; => [[[:fox :corn :you] [:boat] []]]

Clojure Specter: Pass complete entry path to walker()

I am using Nathan Marz's wonderful Specter library. I am doing syntax tree transformations with it, among other things. Suppose that there is a nested data structure:
(def expr
  '([:var price (5)]
    [:var output ("")]
    (clojure.core/cond
      (< price 4)
      [:var output ("6")]
      (= price 5)
      [:var output ("==")]
      :else
      [:var output ("7")])))
I can apply a transformation to all :var nodes via:
(transform [(walker #(and (sequential? %1) (= :var (first %1))))]
transform-fn expr)
However, I'd like to pass the complete path of the navigated-to node to transform-fn, so as to distinguish between parent :var entries and nested ones.
More generally, the action of transform-fn should depend on the complete path of the node being operated on. In a sense, this is similar to inserting VAL for each node visited by walker. How can this be achieved?
Thanks!
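For comparison, here is a minimal sketch of the idea in plain Clojure. It does not use Specter and says nothing about how Specter itself would do this; transform-with-path and var-node? are hypothetical names, and the "path" is assumed to be the vector of indices leading from the root to the node.

```clojure
(defn transform-with-path
  "Hypothetical helper: walks nested sequential data depth-first and calls
  (f path node) on every node matching pred, where path is the vector of
  indices leading to that node. Matching nodes are replaced, not descended into."
  ([pred f data] (transform-with-path pred f [] data))
  ([pred f path data]
   (cond
     (pred data) (f path data)
     (sequential? data)
     (let [children (map-indexed
                      (fn [i child]
                        (transform-with-path pred f (conj path i) child))
                      data)]
       ;; Rebuild the same kind of collection: vectors stay vectors, the rest become lists.
       (if (vector? data) (vec children) (apply list children)))
     :else data)))

(def var-node? #(and (sequential? %) (= :var (first %))))

;; Tag every :var node with its nesting depth, taken from the length of its path.
(transform-with-path var-node?
                     (fn [path node] (conj node {:depth (count path)}))
                     '([:var a (1)] (cond true [:var b (2)])))
;; => ([:var a (1) {:depth 1}] (cond true [:var b (2) {:depth 2}]))
```

With the path available, the transform function can tell a top-level :var entry (path of length 1) from one nested inside a cond form.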

"nth not supported" on PersistentHashSet when destructuring Set in Loop header

Clojure noob here.
I want to pull the front and rest out of a Set. Doing (first #{1}) and (rest #{1}) produces 1 and () respectively, which is mostly what I'd expect.
However in the code below, I use the destructuring [current-node & open-nodes] #{start} in my loop to pull something out of the set (at this point I don't really care about if it was the first or last item. I just want this form working) and it breaks.
Here's my function, half-implementing a grid search:
(defn navigate-to [grid start dest]
"provides route from start to dest, not including start"
(loop [[current-node & open-nodes] #{start} ;; << throws exception
closed-nodes #{}]
(if (= dest current-node)
[] ;; todo: return route
(let [all-current-neighbours (neighbours-of grid current-node) ;; << returns a set
open-neighbours (set/difference all-current-neighbours closed-nodes)]
(recur (set/union open-nodes open-neighbours)
(conj closed-nodes current-node))))))
When stepping through (with Cider), on the start of the first loop, it throws this exception:
UnsupportedOperationException nth not supported on this type: PersistentHashSet clojure.lang.RT.nthFrom (RT.java:933)
I could use a nested let form that does first/rest manually, but that seems wasteful. Is there a way to get destructured Sets working like this in the loop form? Is it just not supported on Sets?
Sets are unordered, so positional destructuring doesn't make much sense.
According to the documentation for Special Forms, which treats destructuring as well, sequential (vector) binding is specified to use nth and nthnext to look up the elements to bind.
Vector binding-exprs allow you to bind names to parts of sequential things (not just vectors), like vectors, lists, seqs, strings, arrays, and anything that supports nth.
Clojure hash sets (being instances of java.util.Set) do not support lookup by index.
I don't know the context of your example code, but in any case pouring the set contents into an ordered collection, for example (vec #{start}), would make the destructuring work.
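A quick sketch of that workaround on a one-element set. With more than one element the binding still works, but which element ends up first is unspecified.

```clojure
;; Pouring the set into a vector gives destructuring something that supports nth.
(let [[x & more] (vec #{:a})]
  [x more])
;; => [:a nil]

;; Calling seq on the set works as well and avoids building a vector.
(let [[x & more] (seq #{:a})]
  [x more])
;; => [:a nil]
```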
As mentioned by others, you cannot destructure a set with a vector binding form, because a set is not sequential. So even this simple let fails with nth not supported:
(let [[x] #{1}])
You could work around this by "destructuring" the set with the use of first and disj:
(loop [remaining-nodes #{start}
       closed-nodes #{}]
  (let [current-node (first remaining-nodes)
        open-nodes (disj remaining-nodes current-node)]
    ;; rest of your code ...
    ))
Using (rest remaining-nodes) instead of (disj remaining-nodes current-node) could be possible, but as sets are unordered, rest is in theory not obliged to take out the same element as was extracted with first. Anyway disj will do the job.
NB: be sure to detect remaining-nodes being empty (first then returns nil); otherwise you could end up in an endless loop.
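The first/disj pattern together with an explicit stop condition can be sketched on its own. The drain function is a hypothetical helper written for this illustration.

```clojure
(defn drain
  "Hypothetical helper: visits every element of set s once and returns them
  in visiting order."
  [s]
  (loop [remaining s
         seen []]
    (if (empty? remaining)            ; stop condition: nothing left to visit
      seen
      (let [x (first remaining)]      ; some element; which one is unspecified
        (recur (disj remaining x)     ; remove exactly the element we took
               (conj seen x))))))

(drain #{1 2 3})
;; a vector holding 1, 2 and 3, in whatever order the set yields them
```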
Algorithm for returning the route
For implementing the missing part in the algorithm (returning the route) you could maintain
a map of paths. It would have one path for each visited node: a vector with the nodes leading from the start node to that node, keyed by that node.
You could use reduce to maintain that map of paths as you visit new nodes. With a new function used together with that reduce and an added nil test, the program could look like this:
(defn add-path
  "adds a node to a given path, which is added to a map of paths, keyed by that node"
  [[path paths] node]
  [path (assoc paths node (conj path node))])
(defn navigate-to
  "provides route from start to dest, including both"
  [grid start dest]
  (loop [remaining-nodes #{start}
         closed-nodes #{}
         paths (hash-map start [start])]
    (let [current-node (first remaining-nodes)
          current-path (get paths current-node)
          all-current-neighbours (neighbours-of grid current-node)
          open-neighbours (set/difference all-current-neighbours closed-nodes)]
      (if (contains? #{dest nil} current-node)
        current-path ;; search complete
        (recur (set/union (disj remaining-nodes current-node) open-neighbours)
               (conj closed-nodes current-node)
               (second (reduce add-path [current-path paths] open-neighbours)))))))
The essence of the algorithm is still the same, although I merged the original let with the one needed for destructuring the nodes. This is not absolutely needed, but it probably makes the code more readable.
Test
I tested this with a poor man's definition of grid and neighbours-of, based on this graph (digits are nodes, bars indicate linked nodes):
0--1  2
|  |  |
3--4--5
|
6--7--8
This graph seems a good candidate for a test as it has a loop, a dead end, and is connected.
The graph is encoded with grid being a vector, where each element represents a node. An element's index in that vector is the node's identifier. The content of each element is a set of neighbours, making the neighbours-of function a trivial thing (your implementation will be different):
(def grid [#{1 3}   #{0 4}   #{5}
           #{0 4 6} #{1 3 5} #{2 4}
           #{3 7}   #{6 8}   #{7}])
(defn neighbours-of [grid node]
  (get grid node))
Then the test is to find the route from node 0 to node 8:
(println (navigate-to grid 0 8))
Output is:
[0 1 4 3 6 7 8]
This outcome demonstrates that the algorithm does not guarantee a shortest route, only that a route will be found if it exists. I suppose the outcome could be different on different engines, depending on how the Clojure internals decide which element to take from a set with first.
After removing one of the necessary node links, like the one between node 7 and 8, the output is nil.
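If a shortest route matters, one option is to swap the unordered set of open nodes for a FIFO queue, which turns the search into a breadth-first one. The following is a sketch of that different technique, not the code from this answer; shortest-route is a hypothetical name, and grid and neighbours-of are repeated from the test above so the block stands alone.

```clojure
(def grid [#{1 3}   #{0 4}   #{5}
           #{0 4 6} #{1 3 5} #{2 4}
           #{3 7}   #{6 8}   #{7}])

(defn neighbours-of [grid node]
  (get grid node))

(defn shortest-route
  "Breadth-first search. Returns a shortest route from start to dest as a
  vector of nodes including both ends, or nil when dest is unreachable."
  [grid start dest]
  (loop [queue (conj clojure.lang.PersistentQueue/EMPTY [start]) ; queue of partial routes
         visited #{start}]
    (when-let [path (peek queue)]          ; nil once the queue is empty
      (let [node (peek path)]              ; last node of the route
        (if (= node dest)
          path
          (let [fresh (remove visited (neighbours-of grid node))]
            (recur (into (pop queue) (map #(conj path %) fresh))
                   (into visited fresh))))))))

(shortest-route grid 0 8)
;; => [0 3 6 7 8]
```

Routes leave the queue in order of length, so the first one that reaches dest is a shortest one. Marking nodes as visited when they are enqueued keeps each node from being expanded twice.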
NB: I found this an interesting question, and probably went a bit too far in my answer.

How do I flatten a sequence of sequences of maps into a sequence of vectors?

I'm trying to build a POS tagger in Clojure. I need to iterate over a file and build out feature vectors. The input is (text pos chunk) triples from a file like the following:
input from the file:
I PP B-NP
am VBP B-VB
groot NN B-NP
I've written functions to input the file, transform each line into a map, and then slide over a variable amount of the data.
(defn lazy-file-lines
  "open a file and make it a lazy sequence."
  [filename]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader filename))))
(defn to-map
  "take a line from a file and make it a map."
  [lines]
  (map
    #(zipmap [:text :pos :chunk] (clojure.string/split (apply str %) #" "))
    lines))
(defn window
  "create windows around the target word."
  [size filelines]
  (partition size 1 [] filelines))
I plan to use the above functions in the following way:
(take 2 (window 3 (to-map (lazy-file-lines "/path/to/train.txt"))))
which gives the following output for the first two entries in the sequence:
(({:chunk B-NP, :pos NN, :text Confidence} {:chunk B-PP, :pos IN, :text in} {:chunk B-NP, :pos DT, :text the}) ({:chunk B-PP, :pos IN, :text in} {:chunk B-NP, :pos DT, :text the} {:chunk I-NP, :pos NN, :text pound}))
Given each sequence of maps within the sequence, I want to extract :pos and :text for each map and put them in one vector. Like so:
[Confidence in the NN IN DT]
[in the pound IN DT NN]
I've not been able to conceptualize how to handle this in Clojure. My partial attempted solution is below:
(defn create-features
  "creates the features and tags from the datafile."
  [filename windowsize & features]
  (map #(apply select-keys % [:text :pos])
       (->>
         (lazy-file-lines filename)
         (window windowsize))))
I think one of the issues is that apply is referencing a sequence itself, so select-keys isn't operating on a map. I'm not sure how to nest another apply function into this, though.
Any thoughts on this code would be great. Thanks.
I'm not entirely sure what you want as input and output, and to be honest, I don't want to work through all of the code that you've provided to figure that out, since I don't think that all of the code is essential to the question. Someone else may give you an answer that's narrowly tailored to your code, but I think the real question is more general.
I'm guessing that the general idea of what you want to implement is that:
Given a sequence of sequence of maps, select those map entries that have particular keys, and then return a sequence of vectors representing map entries. If that's not what you want, I think that the following will probably give you an idea about how to proceed.
This method is not the most efficient or concise, but it breaks the problem down into a series of steps that are easy to understand:
(defn selkeys-or-not
  "Like select-keys, but returns nil rather than {} if no keys match."
  [keys map]
  (not-empty (select-keys map keys)))
(defn seq-seqs-maps-to-seq-vecs
  "Given a sequence of keys, and a sequence of sequences of maps,
  returns a sequence of vectors, where each vector contains key-val
  pairs from the maps for matching keys."
  [keys seq-seqs-maps]
  (let [maps (flatten seq-seqs-maps)]
    (map vec
         (apply concat
                (filter identity
                        (map (partial selkeys-or-not keys) maps))))))
What's happening in the second function:
First, we flatten the outer sequence, since the fact that the maps are within inner sequences is irrelevant to our goals. This gives us a single sequence of maps.
Then we map a helper function selkeys-or-not over the sequence of maps, passing our keys to the helper function. select-keys returns {} when it finds nothing, but {} is truthy, and we want a falsey value in this case for the next step. selkeys-or-not returns a falsey value (nil) instead of {}.
Now we can filter out the nils using filter identity--filter returns a sequence containing all values such that its first argument returns a truthy value.
At this point we have a sequence of maps, but we want a sequence of vectors instead. Applying concat turns the sequence of maps into a sequence of map entries, and mapping vec over them turns the map entries into vectors.
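The individual steps can be checked in isolation; the small maps below are made-up sample data.

```clojure
;; select-keys yields {} when nothing matches; not-empty turns that into nil.
(select-keys {:a 1} [:b])             ;; => {}
(not-empty (select-keys {:a 1} [:b])) ;; => nil

;; filter identity drops the nils and keeps the non-empty maps.
(filter identity [{:text "in"} nil {:pos "IN"}])
;; => ({:text "in"} {:pos "IN"})

;; Concatenating maps yields their entries; vec turns each entry into a vector.
(map vec (apply concat [{:text "in"} {:pos "IN"}]))
;; => ([:text "in"] [:pos "IN"])
```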
(defn extract-line-seq
  [ls]
  (concat (map :text ls)
          (map :pos ls)))
(extract-line-seq '({:chunk B-NP, :pos NN, :text Confidence} {:chunk B-PP, :pos IN, :text in} {:chunk B-NP, :pos DT, :text the}))
;-> (Confidence in the NN IN DT)
You can put it into a vector if you want outside of the function. This way laziness is an option to the caller.
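Applied to every window, the same idea might look like the sketch below. The tokens data is a made-up sample mirroring the question's output, with strings in place of the bare symbols, and window->features is a hypothetical name.

```clojure
(def tokens
  [{:text "Confidence" :pos "NN" :chunk "B-NP"}
   {:text "in"         :pos "IN" :chunk "B-PP"}
   {:text "the"        :pos "DT" :chunk "B-NP"}
   {:text "pound"      :pos "NN" :chunk "I-NP"}])

(defn window->features
  "Hypothetical helper: all :text values of a window followed by all :pos values."
  [window]
  (into [] (concat (map :text window) (map :pos window))))

;; Sliding windows of three tokens, stepping by one; mapv makes the result eager.
(mapv window->features (partition 3 1 tokens))
;; => [["Confidence" "in" "the" "NN" "IN" "DT"]
;;     ["in" "the" "pound" "IN" "DT" "NN"]]
```

Swapping mapv for map keeps the outer sequence lazy, which may matter when the windows come from a large file.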

Clojure (or any functional language): is there a functional way of building flat lists by a recursive function?

I've got a recursive function building a list:
(defn- traverse-dir
"Traverses the (source) directory, preorder"
[src-dir dst-root dst-step ffc!]
(let [{:keys [options]} *parsed-args*
uname (:unified-name options)
[dirs files] (list-dir-groomed (fs/list-dir src-dir))
... recursive call of traverse-dir is the last expression of dir-handler
(doall (concat (map-indexed (dir-handler) dirs) (map-indexed (file-handler) files))))) ;; traverse-dir
The list, built by traverse-dir, is recursive, while I want a flat one:
flat-list (->> (flatten recursive-list) (partition 2) (map vec))
Is there a way of building the flat list in the first place? Short of using mutable lists, that is.
I don't quite understand your context with a dir-handler that is called with nothing and returns a function which expects indices and directories, list-dir-groomed and all of that, but I'd recommend a look at tree-seq:
(defn tree-seq
  "Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
  branch? must be a fn of one arg that returns true if passed a node
  that can have children (but may not). children must be a fn of one
  arg that returns a sequence of the children. Will only be called on
  nodes for which branch? returns true. Root is the root node of the
  tree."
  {:added "1.0"
   :static true}
  [branch? children root]
  (let [walk (fn walk [node]
               (lazy-seq
                 (cons node
                       (when (branch? node)
                         (mapcat walk (children node))))))]
    (walk root)))
My go-to use here is
(tree-seq #(.isDirectory %) #(.listFiles %) (clojure.java.io/as-file file-name))
but your context might mean that doesn't work. You can change to different functions for getting child files if you need to sanitize those, or you can just use filter on the output. If that's no good, the same pattern of a local fn from nodes into pre-walks that handles children by recursively mapcatting itself over them seems pretty applicable.
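The pattern is easier to see without a file system. In the sketch below nested vectors stand in for directories and numbers for files; nested and leaves are names made up for the illustration.

```clojure
(def nested [1 [2 [3 4]] 5])

;; tree-seq returns every node, branches included, in depth-first order.
(tree-seq vector? seq nested)
;; => ([1 [2 [3 4]] 5] 1 [2 [3 4]] 2 [3 4] 3 4 5)

;; Filtering out the branches leaves a flat sequence of leaves.
(remove vector? (tree-seq vector? seq nested))
;; => (1 2 3 4 5)

;; The same shape written by hand: a branch contributes the concatenated
;; results of its children, a leaf contributes itself.
(defn leaves [node]
  (if (vector? node)
    (mapcat leaves node)
    [node]))

(leaves nested)
;; => (1 2 3 4 5)
```

Because mapcat concatenates as it goes, the result is flat from the start and no flatten pass is needed afterwards.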