Build a tree from a vector in Clojure

I'm working on my first Clojure program. I'm having some issues figuring out how to build a tree based on an input that looks like this:
["A B" "A C" "C D" "D E" "A F" "F G"]
And the output should be something like this:
'(A (B) (C (D (E))) (F (G)))
I'm not sure how to start doing this. In an imperative language, for example, I'd use nested loops to check whether a relation already exists and, when it doesn't, find the parent element and append the child to it. But as far as I know, functional programming takes another approach: I would recursively walk all the elements in the vector and, in lieu of changing the existing tree, build a new one.
I'm not sure whether this works or not, but I have a function that looks like this:
(defn build-tree
[func tree parent child]
(map (fn [i] (if (seq? i)
build-tree
(func i parent child)))))
I'm sure the issue comes from my unfamiliarity with Clojure and functional programming, and I was hoping someone could explain the best way to use recursion to build this tree.

You are possibly going to receive a few answers much shorter than mine below. I've decided to kind of teach you to fish: instead of showing a short program, I am going to take you through a systematic way of solving this class of problems. You will have to fill in a few blanks, including converting your data into a more palatable form. These are minor points that will only distract.
The first thing you need to do is break down your output into construction steps. Those steps should match the abstract type your output is representing. In your case the output's concrete type is a list, but it actually represents a tree. These are different abstract types - a list has a head and a tail, that's it, but the tree has nodes, which may have children. You need to think in terms of abstract constructors that build different types of nodes, not about the accidental specific structure you happened to choose to represent your abstract type - a list in your case.
So imagine that you have two constructors that look like
(defn ->branch
[id kids]
...)
(defn ->leaf
[id]
...)
It is of course assumed that every kid is a tree - that is, either a branch or a leaf. In other words, every kid should be a result of either ->branch or ->leaf.
So the only way to build any tree is to have a nested chain of constructor invocations. That's what I mean by "breaking down the output into construction steps." In your case, it means the following chain:
(->branch :A [(->leaf :B) (->branch :C [(->branch :D [(->leaf :E)])]) (->branch :F [(->leaf :G)])])
(I am going to use keywords instead of symbols for node ids, otherwise we will get bogged down in binding/quoting details.)
Stop here for a second to feel the difference between functional and imperative styles. You outlined the imperative style yourself - you think in terms of updating a tree by finding the right place and adding a new node there. In functional style, you think in terms of creating a value, not updating another value. (Of course nothing technically prevents you from doing the same in an imperative language - like using a nested call to constructors, or using a static factory this way, it's just not something that comes about naturally.)
To get to your specific tree representation as a list, you just need to fill in the implementation of ->branch and ->leaf above, which is very simple. (Left as an exercise for you.)
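For the list representation used in the question, one possible way to fill in the two constructors is sketched below. The bodies are an assumption (the answer leaves them as an exercise): a leaf is a one-element list, and a branch is the id followed by its kids.

```clojure
;; Hypothetical implementations of the two constructors.

(defn ->leaf
  "A leaf is just its id wrapped in a list."
  [id]
  (list id))

(defn ->branch
  "A branch is its id followed by its kids, each of which is already a tree."
  [id kids]
  (cons id kids))

(->branch :A [(->leaf :B) (->branch :C [(->leaf :D)])])
;; => (:A (:B) (:C (:D)))
```

With these bodies a branch without kids would look the same as a leaf, which is why the two constructors can later be merged if desired.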
Now back to building the tree. The task of coding a function that builds a tree is actually creating a chain of constructor calls from the input data. So you need to understand two things:
When to call which constructor (here, when to call ->branch and when to call ->leaf)
How to get parameters for the constructors (id for either, and kids for ->branch)
To start, how do we know which node to begin with? That depends on what the problem is, but let's assume that it is given to us as a parameter. So we assume that we are building the following function:
(defn ->tree
[adj-list node]
...)
The adj-list is your input, which is an adjacency list even if you try to disguise it as a list of strings, and we are going to treat it this way (as @SamEstep suggested in the comments). Converting your form of input into an adjacency list is left as an exercise for you.
So it should be clear what we use as an id in our constructors - the node. But is it ->branch or ->leaf? Well, the answer depends on whether we have direct descendants of node in the adj-list, so apparently we need a function
(defn descendants
[adj-list node]
...)
which returns a list (possibly empty) of ids of direct descendants of the node. (Left as an exercise.) So we can decide whether to call ->branch or ->leaf depending on whether that list is empty or not. An empty list is truthy in Clojure, so we wrap the call in seq, which returns nil for an empty collection:
(if-let [kid-ids (seq (descendants adj-list node))]
  (->branch node ...)
  (->leaf node))
Ok, then we need to supply these ... to ->branch. Are they kid-ids? No, they must be trees, not ids, either branches or leaves themselves. In other words, we need to first call ->tree on them. See? We hit the recursion point, and it came about naturally (or so I hope). In Clojure, calling a function on every element of a sequence is done by map, which returns a sequence of results, which is exactly what we need:
(if-let [kid-ids (seq (descendants adj-list node))]
  (->branch node (map ->tree kid-ids))
  (->leaf node))
except that ->tree expects an additional parameter, adj-list. We can use an anonymous function, or we can use partial. We will use a partial with a let binding, which is the cleanest way of doing it:
(let [->tree' (partial ->tree adj-list)]
  ;; seq turns an empty result into nil, so leaves take the else branch
  (if-let [kid-ids (seq (descendants adj-list node))]
    (->branch node (map ->tree' kid-ids))
    (->leaf node)))
This is it. Let's put it together:
(defn ->tree
  [adj-list node]
  (let [->tree' (partial ->tree adj-list)]
    ;; seq turns an empty result into nil, so leaves take the else branch
    (if-let [kid-ids (seq (descendants adj-list node))]
      (->branch node (map ->tree' kid-ids))
      (->leaf node))))
The result:
(def adj-list [[:A :B] [:A :C] [:C :D] [:D :E] [:A :F] [:F :G]])
(->tree adj-list :A)
;; => (:A (:B) (:C (:D (:E))) (:F (:G)))
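To see all the pieces working together, here is a minimal self-contained sketch. The helpers parse-edges and child-ids, and the constructor bodies, are assumptions filling in the parts left as exercises. child-ids plays the role of descendants above; it is renamed here only to avoid shadowing clojure.core/descendants. The sketch also assumes the input contains no cycles.

```clojure
(require '[clojure.string :as str])

;; Assumed helper: turn ["A B" "A C"] into [[:A :B] [:A :C]].
(defn parse-edges
  [lines]
  (mapv (fn [line] (mapv keyword (str/split line #" "))) lines))

;; Assumed helper: ids of the direct children of node, in input order.
(defn child-ids
  [adj-list node]
  (for [[parent child] adj-list
        :when (= parent node)]
    child))

;; Assumed constructor bodies for the list representation.
(defn ->leaf [id] (list id))
(defn ->branch [id kids] (cons id kids))

(defn ->tree
  [adj-list node]
  ;; seq returns nil when there are no children, selecting ->leaf
  (if-let [kid-ids (seq (child-ids adj-list node))]
    (->branch node (map (partial ->tree adj-list) kid-ids))
    (->leaf node)))

(->tree (parse-edges ["A B" "A C" "C D" "D E" "A F" "F G"]) :A)
;; => (:A (:B) (:C (:D (:E))) (:F (:G)))
```

Scanning the whole adjacency list for every node is quadratic in the worst case; for larger inputs, grouping the edges by parent first would likely be a better fit.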
Let's sum up:
Look at your data in abstract terms. Create constructors, and use them for building your output.
To build output means to create a chain of constructor calls.
Which means you need to map your input's shape to output constructors and their parameters.
In many cases, the recursion will emerge by itself naturally at this step.
After you have done it, you can optimize and short-circuit to your heart's content (like if you look closely, you can do away with ->leaf.) But don't do it prematurely.

Your output looks like an associative structure, is there a reason you're using a list instead of a map here?
Also is the ordering guaranteed to match the tree structure as in your example? I'll assume not. And I think it's easier to not use recursion here...
So let's parse the input into an adjacency list, like:
(reduce
(fn[g edge]
(let [[from to]] (map keyword (str/split " " edge))
(update g from #(conj % to))
{}
["A B" "B C"])
Which should output:
{A [B C F],,,}
which can be used to create your tree as a list if you desire.
I can't test this because I'm on mobile at the moment, so forgive any mistakes. :)

Thought user2946753's example was useful. Tweaked it to get it working.
(require '[clojure.string :as str])

(defn example []
  (reduce (fn [g edge]
            (let [[from to] (map keyword (str/split edge #" "))]
              (update g from #(conj % to))))
          {}
          ["A B" "B C"]))
(example)
=> {:A (:B), :B (:C)}
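The adjacency map can then be turned into the nested-list tree. The sketch below is one possible way to do it, not part of either answer: adjacency-map and graph->tree are hypothetical names, (fnil conj []) is swapped in so that children keep their input order (conj onto nil builds a list and would reverse them), and the graph is assumed to be acyclic.

```clojure
(require '[clojure.string :as str])

(defn adjacency-map
  "Builds {parent [child ...]} from strings such as \"A B\"."
  [edges]
  (reduce (fn [g edge]
            (let [[from to] (map keyword (str/split edge #" "))]
              ;; (fnil conj []) starts from an empty vector,
              ;; so children stay in input order
              (update g from (fnil conj []) to)))
          {}
          edges))

(defn graph->tree
  "Returns node followed by the subtrees of its children."
  [g node]
  (cons node (map #(graph->tree g %) (get g node))))

(adjacency-map ["A B" "A C" "C D"])
;; => {:A [:B :C], :C [:D]}

(graph->tree (adjacency-map ["A B" "A C" "C D"]) :A)
;; => (:A (:B) (:C (:D)))
```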

Related

Process a changing list using a higher-order function in Clojure

Is there any way to process a changing list using higher-order functions in Clojure and not using explicit recursion? For example, consider the following problem (that I made up to illustrate what I have in mind):
Problem: Given a list of unique integers of unknown order. Write a function
that produces an output list as follows:
For any even integer, keep the same relative position in the output list.
For any odd integer, multiply by ten, and put the new number at a new
place: at the back of the original list.
So for example, from original vector [1 2 3 4 5], we get: [2 4 10 30 50]
I know how to solve this using explicit recursion. For example:
(defn process
  [v]
  (loop [results []
         remaining v]
    (if (empty? remaining)
      results
      (if (even? (first remaining))
        (recur (conj results (first remaining)) (rest remaining))
        (recur results (conj (vec (rest remaining)) (* 10 (first remaining))))))))
This works fine. Notice that remaining changes as the function does its work. I'm doing the housekeeping here too: shuffling elements from remaining to results. What I would like to do is use a higher-order function that does the housekeeping for me. For example, if remaining did not change as the function does its work, I would use reduce and just kick off the process without worrying about loop or recur.
So my question is: is there any way to process an input (in this example, v) that changes over the course of its operations, using a higher-order function?
(Side note for more context: this question was inspired by Advent of Code 2020, Question 7, first part. There, the natural way to approach it is to use recursion. I do here (in the find-all-containers function), which is the same way others have approached it, for example, here in the find-outer-bags function, or here in the sub-contains? function.)
This is much easier to do without recursion than with it! Since you only care about the order of evens relative to other evens, and likewise for odds, you can start by splitting the list in two. Then, map the right function over each, and simply concatenate the results.
(defn process [xs]
  (let [evens (filter even? xs)
        odds (filter odd? xs)]
    (concat evens (map #(* 10 %) odds))))
As to the Advent of Code problem, I recommend working with a better data structure than a list or a vector. A map is a better way to represent what's going on, because you can easily look up the properties of each sub-bag by name. If you have a map from bag color to contents, you can write a simple (recursive) function that asks: "Can color a contain color b?" For leaf nodes the answer is no, for nodes equal to the goal color it's yes, and for branches you recurse on the contents.
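A minimal sketch of that recursive lookup follows. The bags data and the name can-contain? are made up for illustration, the quantities from the real puzzle input are left out, and the containment graph is assumed to have no cycles.

```clojure
;; Hypothetical data: each color maps to the colors it directly contains.
(def bags
  {:red   [:blue :green]
   :blue  [:gold]
   :green []
   :gold  []
   :white [:green]})

(defn can-contain?
  "True if a bag of color outer can, directly or indirectly,
  hold a bag of color goal."
  [bags outer goal]
  (boolean
   (some (fn [inner]
           (or (= inner goal)                    ; direct hit
               (can-contain? bags inner goal)))  ; recurse on the contents
         (get bags outer))))

(can-contain? bags :red :gold)    ;; => true
(can-contain? bags :white :gold)  ;; => false

;; Colors that can eventually hold a :gold bag
(count (filter #(can-contain? bags % :gold) (keys bags)))
;; => 2
```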

clojure's `into` in common lisp

clojure has a handy (into to-coll from-coll) function, adding elements from from-coll to to-coll, retaining to-coll's type.
How can this one be implemented in common lisp?
The first attempt would be
(defun into (seq1 seq2)
(concatenate (type-of seq1) seq1 seq2))
but this one obviously fails, since type-of includes the vector's length in its result, disallowing adding more elements (as of sbcl), though it still works for a list as the first arg
(while still failing for the empty list).
the question is: is it possible to make up this kind of function without using generic methods and/or complex type-of result processing (e.g. removing length for vectors/arrays etc) ?
i'm okay with into acting as append (in contrast with clojure, where into result depends on target collection type) Let's call it concat-into
In Clojure, you have a concrete idea (most of the time) of what kind that first collection is when you use into, because it changes the semantics: if it is a list, additional elements will be conjed onto the front, if it is a vector, they will be conjed to the back, if it is a map, you need to supply map entry designators (i. e. actual map entries or two-element vectors), sets are more flexible but also carry their own semantics. That's why I'd guess that using concatenate directly, explicitly supplying the type, is probably a good enough fit for many use cases.
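For reference, the Clojure behaviour described above looks like this at the REPL (a small illustration of how the target collection drives the result):

```clojure
;; List target: each element is conjed onto the front.
(into '(1 2) [3 4])
;; => (4 3 1 2)

;; Vector target: each element is conjed onto the back.
(into [1 2] '(3 4))
;; => [1 2 3 4]

;; Map target: elements must be map entries or two-element vectors.
(into {:a 1} [[:b 2]])
;; => {:a 1, :b 2}

;; Set target: duplicates are dropped (print order may vary).
(into #{1 2} [2 3])
;; => #{1 2 3}
```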
Other than that, I think that it could be useful to extend this functionality (Common Lisp only has a closed set of sequence types), but for that, it seems too obviously convenient to use generic functions to ignore. It is not trivial to provide a solution that is extensible, generic, and performant.
EDIT: To summarize: no, you can't get that behaviour with clever application of one or two “built-ins”, but you can certainly write an extensible and generic solution using generic functions.
ok, the only thing i've come to (besides generic methods) is this dead simple function:
(defun into (target source)
(let ((target-type (etypecase target
(vector (list 'array (array-element-type target) (*)))
(list 'list))))
(concatenate target-type target source)))
CL-USER> (into (list 1 2 4) "asd")
;;=> (1 2 4 #\a #\s #\d)
CL-USER> (into #*0010 (list 1 1 0 0))
;;=> #*00101100
CL-USER> (into "asdasd" (list #\a #\b))
;;=> "asdasdab"
also the simple empty impl:
(defun empty (target)
(etypecase target
(vector (make-array 0
:element-type (array-element-type target)
:adjustable t :fill-pointer 0))
(list)))
The result indeed (as @Svante noted) doesn't have the exact type of the target, but rather is "the collection with the element type being the same as that of target". It doesn't conform to Clojure's protocol (where a list target should be prepended to).
I can't see where it is flawed (if it is), so it would be nice to hear about that. Anyway, as it was only for the sake of education, that will do.

Create a map entry in Clojure

What is the built-in Clojure way (if any), to create a single map entry?
In other words, I would like something like (map-entry key value). That is, the result should be more or less equivalent to (first {key value}).
Remarks:
Of course, I already tried googling, and only found map-entry? However, this document has no linked resources.
I know that (first {1 2}) returns [1 2], which seems a vector. However:
(class (first {1 2}))
; --> clojure.lang.MapEntry
(class [1 2])
; --> clojure.lang.PersistentVector
I checked in the source code, and I'm aware that both MapEntry and PersistentVector extend APersistentVector (so MapEntry is more-or-less also a vector). However, the question is still, whether I can create a MapEntry instance from Clojure code.
Last, but not least: "no, there is no built in way to do that in Clojure" is also a valid answer (which I strongly suspect is the case, just want to make sure that I did not accidentally miss something).
"no, there is no built in way to do that in Clojure" is also a valid answer
Yeah, unfortunately that's the answer. I'd say the best you can do is define a map-entry function yourself:
(defn map-entry [k v]
(clojure.lang.MapEntry/create k v))
Just specify a class name as follows
(clojure.lang.MapEntry. "key" "val")
or import the class to instantiate by a short name
(import (clojure.lang MapEntry))
(MapEntry. "key" "val")
As Rich Hickey says here: "I make no promises about the continued existence of MapEntry. Please don't use it." You should not attempt to directly instantiate an implementation class such as clojure.lang.MapEntry. It's better to just use:
(defn map-entry [k v] (first {k v}))
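A quick check of what this helper returns, as a sketch; it assumes Clojure 1.8 or later, where the map-entry? predicate is available:

```clojure
(defn map-entry [k v] (first {k v}))

(def e (map-entry :a 1))

(key e)              ;; => :a
(val e)              ;; => 1
(map-entry? e)       ;; => true
(map-entry? [:a 1])  ;; => false, a plain vector is not a map entry
(= e [:a 1])         ;; => true, entries still compare equal to vectors
```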

How to inherit function options

I wrote this function that gets an element out of a tree. It's just this:
(defn at [address tree] (reduce nth tree address))
Now the problem with this is that nth has 2 overloads: one that throws an exception if the index is out of range, and one that takes a not-found argument to return instead of throwing an exception.
Now I could make an overload for my function to add this option like so:
(defn at [address tree not-found]
(reduce (fn [curr-tree index] (nth curr-tree index not-found))
tree address))
I could complain about how I have to explicitly make a new function instead of the nice nth by itself.
This isn't the real problem though. I shouldn't have to make an overload for every overload that nth has.
nth only has two overloads, but for other times when I want to write a wrapper-like function, how can I defer decisions to the user. In this example, I'm just wrapping nth; to be consistent, I want at to imitate the behavior of nth. How do I inherit the options of other functions?
I'm asking this from a clojure point of view, but it may or may not apply to other languages.
Why do you design at in terms of nth's design? Whoever calls at shouldn't be thinking about nth's overloads.
at can be passed a function like in @amalloy's answer. But I would suggest starting with a simpler design and refactoring later if need be:
(defn at
  ([address tree]
   (reduce nth tree address))
  ([address tree not-found]
   (reduce #(nth %1 %2 not-found) tree address)))
My rationale is passing a not-found value is easier to understand than passing a function:
(def maybe-x (at addr tree :bummer))
;; See @amalloy's answer
(def maybe-x (at' addr tree #(nth %1 %2 :bummer)))
In fact if I need to pass some other function later, I would make a new at-by function (see group-by).
Maybe something like this?
(defn at [address tree & more]
(reduce (fn [a i] (apply nth a i more)) tree address))
The & more picks up whatever extra arguments the user might supply, and the apply sticks them on to the end of the call to nth, without you having to worry about what nth is going to do with them.
Stylistically, though, I'd prefer to write out the overloads. It will make for better documentation of what options your function supports, and it will be easier to maintain down the road.
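Here is how the variadic version behaves in practice, as a small sketch; the sample tree is made up for illustration:

```clojure
(defn at [address tree & more]
  (reduce (fn [a i] (apply nth a i more)) tree address))

;; Hypothetical sample data
(def tree [[:a :b] [:c [:d :e]]])

;; No extra arguments: behaves like plain nth and throws when out of range.
(at [1 1 0] tree)
;; => :d

;; One extra argument is forwarded to nth as its not-found value.
(at [1 5] tree :missing)
;; => :missing
```

One caveat worth keeping in mind: if an index misses before the last step, the next nth is applied to the not-found value itself, which fails for values that nth does not support (a keyword, for example). Using nil as the not-found value avoids this, since nth on nil with a not-found argument returns that argument.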
You can take a function argument to call instead of nth, and then you don't care how many overloads it has, because the caller will handle the one overload they actually want to use.
(defn at
  ([address tree]
   (at address tree nth))
  ([address tree f]
   (reduce f tree address)))
(at [whatever] some-tree #(nth % %2 nil))

Clojure: working with a java.util.HashMap in an idiomatic Clojure fashion

I have a java.util.HashMap object m (a return value from a call to Java code) and I'd like to get a new map with an additional key-value pair.
If m were a Clojure map, I could use:
(assoc m "key" "value")
But trying that on a HashMap gives:
java.lang.ClassCastException: java.util.HashMap cannot be cast to clojure.lang.Associative
No luck with seq either:
(assoc (seq m) "key" "value")
java.lang.ClassCastException: clojure.lang.IteratorSeq cannot be cast to clojure.lang.Associative
The only way I managed to do it was to use HashMap's own put, but that returns void so I have to explicitly return m:
(do (. m put "key" "value") m)
This is not idiomatic Clojure code, plus I'm modifying m instead of creating a new map.
How to work with a HashMap in a more Clojure-ish way?
Clojure makes the java Collections seq-able, so you can directly use the Clojure sequence functions on the java.util.HashMap.
But assoc expects a clojure.lang.Associative so you'll have to first convert the java.util.HashMap to that:
(assoc (zipmap (.keySet m) (.values m)) "key" "value")
Edit: simpler solution:
(assoc (into {} m) "key" "value")
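A short REPL-style sketch of this conversion; m here is a stand-in for the HashMap returned by the Java code. It shows that the result is a new Clojure map and the original HashMap is left untouched:

```clojure
;; Stand-in for the HashMap coming back from Java
(def m (doto (java.util.HashMap.) (.put "a" 1)))

;; Copy into a persistent map, then assoc as usual
(def m2 (assoc (into {} m) "key" "value"))

m2
;; => {"a" 1, "key" "value"}

(.size m)
;; => 1, the original HashMap was not modified
```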
If you're interfacing with Java code, you might have to bite the bullet and do it the Java way, using .put. This is not necessarily a mortal sin; Clojure gives you things like do and . specifically so you can work with Java code easily.
assoc only works on Clojure data structures because a lot of work has gone into making it very cheap to create new (immutable) copies of them with slight alterations. Java HashMaps are not intended to work in the same way. You'd have to keep cloning them every time you make an alteration, which may be expensive.
If you really want to get out of Java mutation-land (e.g. maybe you're keeping these HashMaps around for a long time and don't want Java calls all over the place, or you need to serialize them via print and read, or you want to work with them in a thread-safe way using the Clojure STM) you can convert between Java HashMaps and Clojure hash-maps easily enough, because Clojure data structures implement the right Java interfaces so they can talk to each other.
user> (java.util.HashMap. {:foo :bar})
#<HashMap {:foo=:bar}>
user> (into {} (java.util.HashMap. {:foo :bar}))
{:foo :bar}
If you want a do-like thing that returns the object you're working on once you're done working on it, you can use doto. In fact, a Java HashMap is used as the example in the official documentation for this function, which is another indication that it's not the end of the world if you use Java objects (judiciously).
clojure.core/doto
([x & forms])
Macro
Evaluates x then calls all of the methods and functions with the
value of x supplied at the front of the given arguments. The forms
are evaluated in order. Returns x.
(doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2))
Some possible strategies:
Limit your mutation and side-effects to a single function if you can. If your function always returns the same value given the same inputs, it can do whatever it wants internally. Sometimes mutating an array or map is the most efficient or easiest way to implement an algorithm. You will still enjoy the benefits of functional programming as long as you don't "leak" side-effects to the rest of the world.
If your objects are going to be around for a while or they need to play nicely with other Clojure code, try to get them into Clojure data structures as soon as you can, and cast them back into Java HashMaps at the last second (when feeding them back to Java).
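As a sketch of the first strategy, the function below mutates a HashMap internally but only ever hands an immutable Clojure map back to its caller. The name word-counts is made up for illustration.

```clojure
(defn word-counts
  "Counts occurrences of each word. The HashMap is created, mutated,
  and discarded inside this function, so callers never see the mutation."
  [words]
  (let [counts (java.util.HashMap.)]
    (doseq [w words]
      ;; .get returns nil for a missing key, so start counting from 0
      (.put counts w (inc (or (.get counts w) 0))))
    ;; Convert to a persistent map before returning
    (into {} counts)))

(word-counts ["a" "b" "a"])
;; => {"a" 2, "b" 1}
```

For this particular task clojure.core/frequencies already does the job; the point of the sketch is only the shape: mutable inside, immutable at the boundary.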
It's totally OK to use the java hash map in the traditional way.
(do (. m put "key" "value") m)
This is not idiomatic Clojure code, plus I'm modifying m instead of creating a new map.
You are modifying a data structure that really is intended to be modified. Java's hash map lacks the structural sharing that allows Clojure's maps to be efficiently copied. The generally idiomatic way of doing this is to use java-interop functions to work with the java structures in the typical java way, or to cleanly convert them into Clojure structures and work with them in the functional Clojure way. Unless of course it makes life easier and results in better code; then all bets are off.
This is some code I wrote using hashmaps when I was trying to compare memory characteristics of the clojure version vs java's (but used from clojure)
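(import '(java.util Hashtable))

(defn frequencies2 [coll]
  (let [mydict (Hashtable.)]
    ;; doseq visits every element; a reduce without an initial value
    ;; would treat the first element as the accumulator and skip it
    (doseq [x coll]
      (let [y (.toLowerCase ^String x)]
        ;; .get returns nil for a missing key, so start from 0
        (.put mydict y (inc (or (.get mydict y) 0)))))
    mydict))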
This is to take some collection and return how many times each different thing (say a word in a string) is reused.