"rerootable" purely functional tree data structure

I recently purchased Inferring Phylogenies by Joseph Felsenstein, which is a great book about mathematical and computational methods for inferring phylogenetic trees, and have been playing around with implementing some of the algorithms it describes.
Specifically, I'm interested in doing so in a functional setting with persistent data structures, as a lot of the methods involve walking through a space of possible trees, and it would be nice to cheaply remember the history of where we've been via structural sharing (à la what aphyr does with "worlds" in this blog post), easily cache previously computed values for subtrees, etc.
The problem with this is that a lot of the methods involve "rerooting" trees, which I cannot figure out how to do cheaply in a purely functional way. Basically I need some way of capturing the idea that each of the following (using clojure notation, representing trees as vectors):
[:a [:b [:c :d]]]
[:b [:a [:c :d]]]
[:a [:b [:d :c]]]
[:b [:a [:d :c]]]
[[:a :b] [:c :d]]
[[:c :d] [:a :b]]
[:c [:d [:a :b]]]
[:d [:c [:a :b]]]
[:c [:d [:b :a]]]
[:d [:c [:b :a]]]
represent the same data and only differ in where the root is placed; they each represent the unrooted tree:
a   b
 \ /
  |
 / \
c   d
I'd like to be able to navigate into one of these trees with a zipper and then call a function reroot, which will return a new tree that's zipped up in such a way that the root is at the current loc.
In the book Felsenstein describes a data structure for a cheaply rerootable tree, which looks something like the following hastily made diagram
in which the circles are structs and the arrows are pointers. The rings of structs are internal nodes on the tree, and once we have a reference to one, we can move the root there by doing some pointer swapping. Unfortunately this is a mutating operation and requires mutual references, both of which are impossible in a purely functional setting.
I feel like there should be a way to do what I want using zippers, but I've been playing around with clojure.zip for a while and getting nowhere.
Does anyone know of an implementation of something like this or have suggestions for things I should read / papers I should look at / ideas for how to do this?
Thanks!

The JVM doesn't give us pointers that we can manipulate directly, but we do have a few options for representing a doubly linked structure.
This looks a lot like a graph, and for sparse graphs like this, a classic representation is the adjacency list. An advantage of adjacency lists is that they dereference by name rather than relying on pointer / object identity, and as such we can express arbitrary circular or self referential paths in the structure without any need for mutation.
Naming your nodes alphabetically, left to right and top to bottom:
{:a [:c]
:b [:d]
:c [:a :d :e]
:d [:b :c :e]
:e [:c :d :g]
:f [:h]
:g [:e :h :i]
:h [:f :g :i]
:i [:g :h]}
Elements in the network are looked up by name, and the arrows coming out of each element are represented by the vector stored as its value. Traversal can be implemented as a recursive function that looks up the node to step to at each iteration. The "root" is just the element used to start your traversal (:i in your graph).
Various kinds of insertion / splitting rearrangement can be done with conj, update-in, assoc, etc. since the hash-map literal is a regular clojure persistent data structure.
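As a minimal sketch of the traversal described above (not code from Felsenstein's book or from any library), the adjacency map can be walked from whichever node you choose to treat as the root. The helper name reachable-from is hypothetical, and the walk is a plain depth-first search that assumes every neighbour appears as a key in the map.

```clojure
;; Adjacency map from the answer above: node name -> vector of neighbours.
(def graph
  {:a [:c]
   :b [:d]
   :c [:a :d :e]
   :d [:b :c :e]
   :e [:c :d :g]
   :f [:h]
   :g [:e :h :i]
   :h [:f :g :i]
   :i [:g :h]})

;; Hypothetical helper: depth-first walk starting at `root`.
;; Returns the nodes in the order they are first visited.
(defn reachable-from [g root]
  (loop [stack [root], seen #{}, order []]
    (if (empty? stack)
      order
      (let [node  (peek stack)
            stack (pop stack)]
        (if (seen node)
          (recur stack seen order)
          (recur (into stack (remove seen (g node)))
                 (conj seen node)
                 (conj order node)))))))

;; "Rerooting" is just starting somewhere else; the map itself is untouched.
(reachable-from graph :i)
(reachable-from graph :a)
```

Because the map is persistent, both walks share the same underlying structure, and an edit such as (update graph :i conj :j) returns a new map without disturbing the old one.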

An unrooted tree is a graph with the following characteristics:
It is symmetric/undirected - it is its own inverse.
It is strongly connected - you can get everywhere from anywhere.
The only way to get back to where you came from is to retrace your steps.
The standard way to represent a graph is as a map giving the set of neighbors for each node. This is what the standard clojure graph library does, though its operations are obscured behind a largely redundant defstruct.
For your example, the map is
{:I #{:a :b :c :d}, :a #{:I}, :b #{:I}, :c #{:I}, :d #{:I}}
This is an undirected graph when it is its own inverse, where
(require 'clojure.set)
(defn inverse [g]
  (apply merge-with clojure.set/union
         (for [[x xs] g, y xs] {y #{x}})))
You don't need to do anything to root it anywhere. As @noisesmith says, the root is just the node you start enumerating from. Judging by the diagram, this is equally true of Felsenstein's data structure.
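To make "the root is just the node you start from" concrete, here is a rough sketch that checks the symmetry condition and then derives a rooted, nested view from any starting node. The names undirected? and rooted-at are hypothetical helpers, the nested [node & children] shape is only one possible choice, and the recursion assumes the graph really is a tree (with a cycle it would not terminate).

```clojure
(require '[clojure.set :as set])

;; The neighbour map from the answer above.
(def g {:I #{:a :b :c :d}, :a #{:I}, :b #{:I}, :c #{:I}, :d #{:I}})

(defn inverse [g]
  (apply merge-with set/union
         (for [[x xs] g, y xs] {y #{x}})))

;; Hypothetical helper: a graph is undirected when it equals its own inverse.
(defn undirected? [g]
  (= g (inverse g)))

;; Hypothetical helper: build a rooted view [node child-subtree ...] by
;; walking away from the node we arrived from. Children are sorted only to
;; make the output deterministic.
(defn rooted-at
  ([g root] (rooted-at g root nil))
  ([g node parent]
   (into [node]
         (map #(rooted-at g % node))
         (sort (disj (g node) parent)))))

(undirected? g)   ;=> true
(rooted-at g :I)  ;=> [:I [:a] [:b] [:c] [:d]]
(rooted-at g :a)  ;=> [:a [:I [:b] [:c] [:d]]]
```

Both rooted views are computed from the same immutable map, so no pointer swapping is involved; the cost is that each view is rebuilt on demand rather than updated in place.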
If, as the diagram suggests, only your internal nodes are multiply connected, you could save some space by mapping directly from each external node to its unique neighbour. Your example would become
{:I #{:a :b :c :d}, :a :I, :b :I, :c :I, :d :I}
perhaps better expressed as two maps:
{:internals {:I #{:a :b :c :d}}, :externals {:a :I, :b :I, :c :I, :d :I}}

Related

Scheme to Clojure Function (subst)

I am reading Paul Graham's The Roots of Lisp
I have tried converting the function subst on page 5, which is defined like this:
(defun subst (x y z)
(cond ((atom z)
(cond ((eq z y) x)
('t z)))
('t (cons (subst x y (car z))
(subst x y (cdr z))))))
into its corresponding Clojure implementation. I do not have production experience in either language (I have been reading about Clojure), so any help would be appreciated, since I am reading this to understand the roots of Lisp. The closest I came was this (but it is horribly wrong):
(defn subst
[x y z]
(if (not (nil? z)) z x)
(if (= z y) x z)
(cons (subst x y (first z))
(subst (x y (rest z)))))

"Traduttore, traditore"
(This can be translated as "translator, traitor", but doing so ruins the pun, which is fun in itself)
It is hard to hint at possible fixes in your Clojure code because the specification is unclear:
If you follow The Roots of Lisp to the letter, you are going to implement a Lisp on top of Clojure, and subst might be similar to the one in the book.
But if you want to implement subst as commonly used in Lisp, the code shown here won't do it.
Even though Clojure has cons and nil? functions, they do not mean the same as in Common Lisp (resp. cons and null): See clojure: no cons cells for details.
Before you can translate subst, you have to determine what is the idiomatic thing to do in Clojure.
Typically subst is used to transform a tree, made of cons cells; note for example that subst does not recurse into vectors, strings, etc. Among those trees, a particular subset of trees are those which are Lisp forms. In fact, one important use case for subst is to search-and-replace forms during code generation.
If you restrict yourself to the Clojure Cons type, you won't support code as data, as far as I know.
Since Clojure code also uses vectors and maps, you probably need to recurse into such objects. So, how to translate subst is not an easy problem to specify.
A possible starting point is to read LispReader.java to determine the set of objects that constitute an AST, and see what kind of code walking you want to do.
My advice would be to study those languages independently first. With a bit of experience with each, you will have a better way to see how similar and how different they are with each other.

the translation of the scheme version could possibly look like this:
(defn my-subst [new old data]
(when-not (nil? data)
(cond (sequential? data) (cons (my-subst new old (first data))
(my-subst new old (next data)))
(= data old) new
:else data)))
user> (my-subst 1 :x '(1 2 :x (:x 10 :x [:x :z :x])))
;;=> (1 2 1 (1 10 1 (1 :z 1)))
this is quite close (though not exactly the same, since there are more than one native collection type, which makes you face the choice: which ones should be considered to be the targets to substitution). This example handles 'listy' (sequential) structures, while omitting hash maps and sets.
Another problem is retaining the type AND form of the original sequence, which is not as trivial as it sounds (e.g. (into (empty (list 1 2 3)) (list 1 2 3)) => (3 2 1)).
So what you have to do, is to first decide the semantics of the substitution, while in scheme it is just a natural list processing.
As for clojure.walk, which has already been mentioned, the simplest way to use it for substitution could be
(defn subst [new old data]
(clojure.walk/prewalk-replace {old new} data))
user> (subst :z :x [1 :x 3 '(:x {:a :x}) #{:x 1}])
;;=> [1 :z 3 (:z {:a :z}) #{1 :z}]

This is how I would do it, including a unit test to verify:
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require [clojure.walk :as walk]))
(defn subst
[replacement target listy]
(walk/postwalk (fn [elem]
(if (= elem target)
replacement
elem))
listy))
(dotest
(is= (subst :m :b [:a :b [:a :b :c] :d])
[:a :m [:a :m :c] :d]))
However, I would not spend a lot of time reading 40-year old texts on Common Lisp, even though I think Paul Graham's book Hackers & Painters is quite mind blowing.
Clojure has evolved the state-of-the-art for lisp by at least one order-of-magnitude (more like 2, I would say). Major improvements include the use of the JVM, persistent data structures, concurrency, syntax, and data literals, just to name a few.
Please see this list of Clojure learning resources, and maybe start with Getting Clojure or similar.
Update
More from Paul Graham on Clojure

Clojurescript - map from list of subvecs

I'm trying to create a map from a list of 2-element Subvecs.
This works fine in Clojure:
(into {} (list (subvec [1 2 3] 1)))
>> {2 3}
But fails in ClojureScript, with the following error:
No protocol method IMapEntry.-key defined for type number: 2
Replacing (subvec [1 2 3] 1) with [2 3] makes it work in both languages.
I'm new to ClojureScript, and can't find where this behaviour is documented. Is this a bug? And how would you suggest going around it efficiently?
Thanks!

I think it's an omission. Subvectors should be indistinguishable from ordinary vectors, and therefore Subvec should have an implementation of IMapEntry added to it, like the one in PersistentVector.
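Until that is fixed, one possible workaround (a sketch, not an official recommendation) is to rebuild each pair as an ordinary vector literal before pouring it into the map. The helper name pairs->map is hypothetical. This is the behaviour I would expect on both platforms, but I have not verified it against every ClojureScript release.

```clojure
;; Hypothetical helper: destructure each 2-element sequence and rebuild it
;; as a plain [k v] vector, so the map only ever sees ordinary vectors.
(defn pairs->map [pairs]
  (into {} (map (fn [[k v]] [k v])) pairs))

(pairs->map (list (subvec [1 2 3] 1)))
;=> {2 3}
```

The transducer form avoids building an intermediate sequence, so the extra cost is one small vector per entry.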

Namespace qualified record field accessors

I've made the same dumb mistake many many times:
(defrecord Record [field-name])
(let [field (:feld-name (->Record 1))] ; Whoops!
(+ 1 field))
Since I misspelled the field name keyword, this will cause a NPE.
The "obvious" solution to this would be to have defrecord emit namespaced keywords instead, since then, especially when working in a different file, the IDE will be able to immediately show what keywords are available as soon as I type ::n/.
I could probably with some creativity create a macro that wraps defrecord that creates the keywords for me, but this seems like overkill.
Is there a way to have defrecord emit namespaced field accessors, or is there any other good way to avoid this problem?

Because defrecords compile to java classes and fields on a java class don't have a concept of namespaces, I don't think there's a good way to have defrecord emit namespaced keywords.
One alternative, if the code is not performance sensitive and doesn't need to implement any protocols and similar, is to just use maps.
Another is, like Alan Thompson's solution, to make a safe-get function. The prismatic/plumbing util library also has an implementation of this.
(defn safe-get [m k]
(let [ret (get m k ::not-found)]
(if (= ::not-found ret)
(throw (ex-info "Key not found: " {:map m, :key k}))
ret)))
(defrecord x [foo])
(safe-get (->x 1) :foo) ;=> 1
(safe-get (->x 1) :fo) ;=>
;; 1. Unhandled clojure.lang.ExceptionInfo
;; Key not found:
;; {:map {:foo 1}, :key :fo}

I feel your pain. Thankfully I have a solution that saves me many times a week, and that I've been using for a couple of years. It is the grab function from the Tupelo library. It does not provide the type of IDE integration you are hoping for, but it does provide fail-fast typo detection, so you will always be notified the very first time you try to use the nonexistent key. Another benefit is that you'll get a stack trace showing the line number with the misspelled keyword, not the line number (possibly far, far away) where the nil value causes an NPE.
It also works equally well for both records & plain-old maps (my usual use-case).
From the README:
Map Value Lookup
Maps are convenient, especially when keywords are used as functions to look up a value in a map. Unfortunately, attempting to look up a non-existent keyword in a map will return nil. While sometimes convenient, this means that a simple typo in the keyword name will silently return corrupted data (i.e. nil) instead of the desired value.
Instead, use the function grab for keyword/map lookup:
(grab k m)
"A fail-fast version of keyword/map lookup. When invoked as (grab :the-key the-map),
returns the value associated with :the-key as for (clojure.core/get the-map :the-key).
Throws an Exception if :the-key is not present in the-map."
(def sidekicks {:batman "robin" :clark "lois"})
(grab :batman sidekicks)
;=> "robin"
(grab :spiderman sidekicks)
;=> IllegalArgumentException Key not present in map:
map : {:batman "robin", :clark "lois"}
keys: [:spiderman]
The function grab should also be used in place of clojure.core/get. Simply reverse the order of arguments to match the "keyword-first, map-second" convention.
For looking up values in nested maps, the function fetch-in replaces clojure.core/get-in:
(fetch-in m ks)
"A fail-fast version of clojure.core/get-in. When invoked as (fetch-in the-map keys-vec),
returns the value associated with keys-vec as for (clojure.core/get-in the-map keys-vec).
Throws an Exception if the path keys-vec is not present in the-map."
(def my-map {:a 1
:b {:c 3}})
(fetch-in my-map [:b :c])
;=> 3
(fetch-in my-map [:b :z])
;=> IllegalArgumentException Key seq not present in map:
;=> map : {:b {:c 3}, :a 1}
;=> keys: [:b :z]
Your other option, using records, is to use the Java-interop style of accessor:
(.field-name myrec)
Since Clojure defrecord compiles into a simple Java class, your IDE may be able to recognize these names more easily. YMMV

Clojure : is there a more idiomatic way to work on nested vectors?

I want to cap samples that I generate from Poisson distributions.
Original data is like
[[2 12] [3 14]] (samples)
Here, [2 12] correspond to samples of distributions [P1 P2], [3 14] as well.
I want to cap P1 and P2 with max values, let's say for instance
[4 12] (max-values)
With these parameters, I want to output (keeping vectors)
[[2 12] [3 12]]
This is pretty easy but I do not know if my way is very idiomatic :
(defn cap-poisson-samples
"Cap Poisson samples to meet the expectations
if required"
[data max-values]
(mapv
(fn [x]
(mapv (fn [u v] (if (> u v) v u)) x max-values))
data))
Someone once told me that it's better to avoid nested map calls, but I do not know if that's true.
I know prewalk exists, but it's not possible to pass it two inputs (as I do with my second mapv).
I could also use for, but it's heavier.
Generally speaking, I'm quite lost when I have to process two vectors at the same indexes. I searched clojure.core for a dedicated function but did not find one.
So I generally use for or mapv, depending on whether the index is important or not.
Thanks
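One possible tightening of the code in the question, offered as a sketch rather than the definitive idiom: map (and mapv) already accept several collections and walk them in step, and clojure.core/min does the same job as the inner if. The name cap-samples is hypothetical.

```clojure
;; Hypothetical variant of cap-poisson-samples from the question.
;; mapv with two collections pairs elements up by position, and min
;; picks the smaller of each sample and its cap.
(defn cap-samples [data max-values]
  (mapv #(mapv min % max-values) data))

(cap-samples [[2 12] [3 14]] [4 12])
;=> [[2 12] [3 12]]
```

The nesting is still there, but it mirrors the nesting of the data, which is usually easier to read than flattening and regrouping. When the position itself matters, map-indexed is the usual alternative to for.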

Can I add fields to clojure types?

Clojure structs can be arbitrarily extended, adding new fields.
Is it possible to extend types (created using deftype) in a similar way?
EDIT: For the benefit of future visitors, as Brian pointed out below, this feature is subject to change.

Actually you can treat types as maps, you just need to extend clojure.lang.IPersistentMap (an implementation is magically supplied).
(deftype A [a b]
clojure.lang.IPersistentMap)
(A 1 2) ;; => #:A{:a 1, :b 2}
(assoc (A 1 2) :c 3) ;; => #:A{:a 1, :b 2, :c 3}
Note
Clojure has since split the semantics of types into defrecord and deftype. For most application-level programming, you'll want to use records. Conveniently, they automatically provide an implementation of clojure.lang.IPersistentMap, no magic necessary.
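With current Clojure, the record-based equivalent of the example above looks roughly like the following sketch. The record name A mirrors the deftype in the answer; ->A is the positional constructor that defrecord generates.

```clojure
;; A record with two declared fields.
(defrecord A [a b])

;; assoc with a key that is not a declared field extends the record;
;; the result is still an instance of A.
(def extended (assoc (->A 1 2) :c 3))

(:c extended)            ;=> 3
(instance? A extended)   ;=> true

;; Removing a declared field, by contrast, yields a plain map,
;; not an A.
(instance? A (dissoc extended :a))   ;=> false
```

Extra keys are stored in an auxiliary map inside the record, so lookups on them are somewhat slower than lookups on declared fields.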
