Simplifying regex in Haskell with trees - regex

I have this data structure for regular expressions (RE), and so far I do not have any functions modifying REs:
data Regex a = Letter a | Emptyword | Concat (Regex a) (Regex a) | Emptyset | Or (Regex a) (Regex a) | Star (Regex a)
  deriving (Show, Eq)
I would like to implement a simplification algorithm for my REs. For this I thought I should first represent the RE as tree, update the tree according to some equivalences and then convert it back to a RE. My reasoning was that with trees I would have functions to find, extract and attach subtrees, update values etc.
However, I have difficulties finding a tree module giving these functionalities and being simple enough for a beginner to learn.
I found this avl-tree package; however, it seems very large.
I'd like to have alternative suggestions to my approach with trees and suggestions on easy tree modules supporting mentioned functions.
Note that I'm a beginner in Haskell and I do not understand monads yet and that I'm not interested in an implementation to simplify REs.
Edit 1: We know that the following two REs are equivalent, where L b stands for Letter b and C for Concat:
    Or                      Or
   /  \                    /  \
 L b   C           =     L b  L a
      / \
   L a   Emptyword
So given the left RE I'd like to replace the subtree with its root labeled by C with a node labeled by L a. As was pointed out my data structure is a tree structure. However, currently I do not have functions to, e.g. replace a subtree with a node, or find a subtree of a structure that I can replace.

As noted in the comments, you already have a tree. You can simplify right away:
simplify :: Eq a => Regex a -> Regex a
simplify (Star Emptyset) = Emptyword
simplify (Star (Star x)) = Star (simplify x)
simplify (Concat x Emptyword) = simplify x
simplify (Concat Emptyword y) = simplify y
simplify (Or x y) | x == y = x
-- or rather simplify (Or x y) | simplify x == simplify y = simplify x
-- more sophisticated rules here
-- ...
-- otherwise just push down (do not call simplify on the rebuilt node again, or it loops forever)
simplify (Or x y) = Or (simplify x) (simplify y)
-- ...
simplify x@(Letter _) = x
This is just superficial, e.g. the first rule should be simplify (Star x) | simplify x == Emptyset = Emptyword.
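To make the point about simplifying subterms first concrete, here is a minimal sketch (not the code above, just one way to arrange it) that rebuilds each node from simplified children and then applies local rules once at the root. The helper name rewrite and the particular rule set are assumptions chosen for illustration; the Regex type is repeated so the block stands alone.

```haskell
data Regex a
  = Letter a
  | Emptyword
  | Emptyset
  | Concat (Regex a) (Regex a)
  | Or (Regex a) (Regex a)
  | Star (Regex a)
  deriving (Show, Eq)

-- Simplify the children first, then apply local rules to the rebuilt node.
-- The recursion is structural, so it always terminates.
simplify :: Eq a => Regex a -> Regex a
simplify (Concat x y) = rewrite (Concat (simplify x) (simplify y))
simplify (Or x y)     = rewrite (Or (simplify x) (simplify y))
simplify (Star x)     = rewrite (Star (simplify x))
simplify r            = r

-- One rewriting step at the root (hypothetical helper).
-- The children are assumed to be simplified already.
rewrite :: Eq a => Regex a -> Regex a
rewrite (Concat Emptyset _)  = Emptyset
rewrite (Concat _ Emptyset)  = Emptyset
rewrite (Concat Emptyword y) = y
rewrite (Concat x Emptyword) = x
rewrite (Or Emptyset y)      = y
rewrite (Or x Emptyset)      = x
rewrite (Or x y) | x == y    = x
rewrite (Star Emptyset)      = Emptyword
rewrite (Star Emptyword)     = Emptyword
rewrite (Star (Star x))      = Star x
rewrite r                    = r
```

With this, simplify (Or (Letter 'b') (Concat (Letter 'a') Emptyword)) gives Or (Letter 'b') (Letter 'a'), which is the example from the question's edit. Because children are simplified before the rule is tried, Star (Concat Emptyset (Letter 'a')) also reduces to Emptyword.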
AVL Trees
AVL trees are for balancing and are not really applicable here. The only place where balance makes sense is for the associative operations
Or x (Or y z) == Or (Or x y) z
I suggest using lists for those operations
data Regex' a = Letter' a | Concat' [Regex' a] | Or' [Regex' a] | Star' (Regex' a)
  deriving (Show, Eq)
(No Emptyword' because it is Concat' []; same with Emptyset' and Or' [].)
Converting between Regex and Regex' is the usual exercise for the reader.
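For readers who want a starting point, one direction of that conversion could look like the sketch below. The names flatten, concatParts and orParts are hypothetical, and both types are repeated (with primed constructors throughout) so the block compiles on its own.

```haskell
data Regex a = Letter a | Emptyword | Concat (Regex a) (Regex a)
             | Emptyset | Or (Regex a) (Regex a) | Star (Regex a)
  deriving (Show, Eq)

data Regex' a = Letter' a | Concat' [Regex' a] | Or' [Regex' a] | Star' (Regex' a)
  deriving (Show, Eq)

-- Convert to the list-based form, merging nested Concat/Or into one list.
flatten :: Regex a -> Regex' a
flatten (Letter a)   = Letter' a
flatten Emptyword    = Concat' []   -- empty concatenation is the empty word
flatten Emptyset     = Or' []       -- empty alternative is the empty set
flatten (Star x)     = Star' (flatten x)
flatten (Concat x y) = Concat' (concatParts x ++ concatParts y)
flatten (Or x y)     = Or' (orParts x ++ orParts y)

-- Operands of a concatenation; anything that is not a Concat' is one operand.
concatParts :: Regex a -> [Regex' a]
concatParts r = case flatten r of
  Concat' xs -> xs
  r'         -> [r']

-- Operands of an alternative; anything that is not an Or' is one operand.
orParts :: Regex a -> [Regex' a]
orParts r = case flatten r of
  Or' xs -> xs
  r'     -> [r']
```

Both Or x (Or y z) and Or (Or x y) z flatten to the same Or' [x, y, z], so associativity no longer needs its own rule. This sketch does not collapse one-element lists, so Concat (Letter 'a') Emptyword becomes Concat' [Letter' 'a']; a later pass could turn that into Letter' 'a'.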
General Hardness
Note that Regex equivalence is not easy:
(a|b)* = (a*b)*a*
Optimizing Or "(a|b)*" "(a*b)*a*" is hard...

Related

How do I combine consecutive numbers in a list into a range in Haskell?

I'm trying to wrap my head around Haskell and I'm having a hard time pinning down the general procedure/algorithm for this specific task. What I want to do is give Haskell a list [1,2,3,5,6,9,16,17,18,19] and have it give me back [1-3, 5, 6, 9, 16-19], essentially turning three or more consecutive numbers into a range in the style of lowestnumber - highestnumber. I suppose my issue is the all too common difficulty of grappling with the functional paradigm of Haskell. So I would really appreciate a general algorithm or an insight into how to view this from a "Haskellian" point of view.
Thanks in advance.
If I understand the question correctly, the idea is to break up the input lists in chunks, where a chunk is either a single input element or a range of at least three consecutive elements.
So, let's start by defining a datatype for representing such chunks:
data Chunk a = Single a | Range a a
As you can see, the type is parametric in the type of input elements.
Next, we define a function chunks to actually construct a list of chunks from a list of input elements. For this, we require the ability to compare input elements and to obtain the immediate consecutive for a given input element (that is, its successor). Hence, the type of the function reads
chunks :: (Eq a, Enum a) => [a] -> [Chunk a]
Implementation is relatively straightforward:
chunks = foldr go []
  where
    go x (Single y : Single z : cs) | y == succ x && z == succ y = Range x z : cs
    go x (Range y z : cs) | y == succ x = Range x z : cs
    go x cs = Single x : cs
We traverse the list from right to left, generating chunks as we go. We generate a range if an input element precedes its two immediate consecutive elements (the first case of the helper function go) or if it precedes a range that starts with its immediate consecutive (the second case). Otherwise, we generate a single element (the final case).
To arrange for pretty output, we declare applications of the type constructor Chunk to be instances of the class Show (given that the type of input elements is in Show):
instance Show a => Show (Chunk a) where
  show (Single x ) = show x
  show (Range x y) = show x ++ "-" ++ show y
Returning to the example from the question, we then have:
> chunks [1,2,3,5,6,9,16,17,18,19]
[1-3,5,6,9,16-19]
Unfortunately, things are slightly more complicated if we need to account for bounded element types; such types have a largest element for which succ is undefined:
> chunks [maxBound, 1, 2, 3] :: [Chunk Int]
*** Exception: Prelude.Enum.succ{Int}: tried to take `succ' of maxBound
This suggests that we should abstract from the specific approach for determining whether one element succeeds another:
chunksBy :: (a -> a -> Bool) -> [a] -> [Chunk a]
chunksBy succeeds = foldr go []
  where
    go x (Single y : Single z : cs) | y `succeeds` x && z `succeeds` y =
      Range x z : cs
    go x (Range y z : cs) | y `succeeds` x = Range x z : cs
    go x cs = Single x : cs
Now, the version of chunks that was given above, can be expressed in terms of chunksBy simply by writing
chunks :: (Eq a, Enum a) => [a] -> [Chunk a]
chunks = chunksBy (\y x -> y == succ x)
Moreover, we can now also implement a version for bounded input types:
chunks' :: (Eq a, Enum a, Bounded a) => [a] -> [Chunk a]
chunks' = chunksBy (\y x -> x /= maxBound && y == succ x)
That merrily gives us:
> chunks' [maxBound, 1, 2, 3] :: [Chunk Int]
[9223372036854775807,1-3]
First, all elements of a list must be of the same type. Your resulting list has two different types: ranges (whatever that means) and Ints. We should convert a single number into a range whose lowest and highest values are the same.
That said, you should define a Range data type and fold your list of Int into a list of Range:
data Range = Range {from :: Int , to :: Int}
intsToRange :: [Int] -> [Range]
intsToRange [] = []
intsToRange [x] = [Range x x]
intsToRange (x:y:xs) = ... -- hint: you can use an auxiliary accumulator that holds the lowest value and keep recursing until you find a y - x difference greater than 1.
You can also use fold, etc... to get a very haskelly point of view
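As a rough illustration of the fold-based route, the sketch below builds the list of Range values from the right. It assumes the input is sorted ascending without duplicates, it merges runs of two as well (unlike the question's "three or more" rule), and it does not guard against overflow at maxBound. The deriving clause is an addition so the result can be printed and compared.

```haskell
data Range = Range { from :: Int, to :: Int }
  deriving (Show, Eq)

-- Fold from the right: each element either extends the range that starts
-- right after it, or opens a new one-element range.
intsToRange :: [Int] -> [Range]
intsToRange = foldr step []
  where
    step x (Range lo hi : rs) | lo == x + 1 = Range x hi : rs
    step x rs                               = Range x x : rs
```

For the list from the question, intsToRange [1,2,3,5,6,9,16,17,18,19] yields the ranges 1 to 3, 5 to 6, 9 to 9 and 16 to 19.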
Use recursion. Recursion is a leap of faith. It is imagining you've already written your definition and so can ("recursively") call it on a sub-problem of your full problem, and combine the (recursively calculated) sub-result with the left-over part to get the full solution -- easy:
ranges xs = let (leftovers, subproblem) = split xs
                subresult = ranges subproblem
                result = combine leftovers subresult
            in
                result
  where
    split xs = ....
    combine as rs = ....
Now, we know the type of rs in combine (i.e. subresult in ranges) -- it is what ranges returns:
ranges :: [a] -> rngs
So, how do we split our input list xs? The type-oriented design philosophy says, follow the type.
xs is a list [a] of as. This type has two cases: [] or x:ys with x :: a and ys :: [a]. So the easiest way to split a list into a smaller list and some leftover part is
split (x:ys) = (x, ys)
split [] = *error* "no way to do this" -- intentionally invalid code
Taking note of the last case, we'll have to tweak the overall design to take that into account. But first things first: what could the rngs type be? Going by your example data, it is naturally a list of rngs, rngs ~ [rng].
As for the rng type, we have considerable freedom to make it whatever we want. The cases we have to account for are pairs and singletons:
data Rng a = Single a
           | Pair a a
.... and now we need to fit the jagged pieces together into one picture.
Combining a number with a range which starts from consecutive number is obvious.
Combining a number with a single number will have two obvious cases, for whether those numbers are consecutive or not.
I think / hope you can proceed from here.

SML map a filter?

If I have this code:
fun coord_select (x : int, cs : (int*int) list) =
    List.filter (fn (first, _) => first = x ) cs
testing with input gives this:
coord_select (2, [(2,2),(2,3),(3,3),(4,3)])
: val it = [(2,2),(2,3)] : (int * int) list
Now, what if I don't give the desired first coordinate as an int but as a list of several required first coordinates such as [3,4], i.e., I want all coordinate tuples that start with 3 as well as 4? The easy way would simply be to create a recursive wrapper around this that goes through the list and plugs in each value as coord_select's first argument. But I would like to understand nested things better than such brute force. So I came up with this:
fun coord_match (fs : int list, cs :(int*int) list) =
map (coord_select (f, cs)) fs
but this can't really work because, as was pointed out, coord_select in the map is actually trying to return a list -- and how does map know to plug in the members of fs into f in the first place? Common Lisp does have a device to keep functions like this from running, i.e., the ' operator. But this wouldn't help, again, because map doesn't know which variable fs is supplying. For input, e.g., I have these coordinates:
[(2,2),(2,3),(3,3),(4,3)]
and I have this list of x-coordinates to match against the above list
[3,4]
Again, I could just put a recursive wrapper around this, but I'm reaching for a more elegant nested solution from the greater fold family.
what if I don't give the desired first coordinate as an int but as a list of several required first coordinates such as [3,4], i.e., I want all coordinate tuples that start with 3 and 4
It sounds like you want all coordinate tuples that start with 3 or 4, since a coordinate can't both be 3 and 4.
Given that, you can write coord_select like:
fun member (x, xs) =
    List.exists (fn x2 => x = x2) xs
fun coord_select (xs, coords) =
    List.filter (fn (x, _) => member (x, xs)) coords
the greater fold family
This family is called catamorphisms, to which map, filter, exists and foldl belong. Since foldl is the most general of these, it is technically possible to write the code above using folds entirely:
fun coord_select (xs, coords) =
    foldr (fn ((x, y), acc1) =>
              if foldl (fn (x2, acc2) => acc2 orelse x = x2) false xs
              then (x, y) :: acc1
              else acc1) [] coords
but as should be evident, explicit folds are not very readable.
If there's a specialised combinator that does a job, you would rather want that over a fold. And if it doesn't exist, creating it from slightly less specialised combinators improves readability. Folding is as close to manual recursion as you get and so provides little information to the reader about what kind of recursion we're attempting.
For that reason I also made member from exists, since exists requires me to specify a predicate and my predicate is "equality with x"; so even exists, I feel, adds clutter to the coord_select function.
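For comparison, the same contrast between a specialised combinator and explicit folds can be sketched in Haskell. The names coordSelect and coordSelectFold are hypothetical transliterations of the SML functions above, and Haskell's standard elem plays the role of member.

```haskell
-- Specialised combinators: filter plus a membership test.
coordSelect :: [Int] -> [(Int, Int)] -> [(Int, Int)]
coordSelect xs = filter (\(x, _) -> x `elem` xs)

-- The same function written with folds only, mirroring the SML version.
coordSelectFold :: [Int] -> [(Int, Int)] -> [(Int, Int)]
coordSelectFold xs = foldr keep []
  where
    keep c@(x, _) acc
      | foldl (\found x2 -> found || x == x2) False xs = c : acc
      | otherwise                                      = acc
```

Both return [(3,3),(4,3)] for the first-coordinate list [3,4] and the coordinates [(2,2),(2,3),(3,3),(4,3)]; the first version states its intent far more directly.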
You can learn more about list catamorphisms in functional programming by reading Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire (1991) by Meijer, Fokkinga, Paterson.

elm list comprehensions, retrieving the nth element of a list

I was trying to do a simulation of the Rubik's cube in Elm when I noticed Elm doesn't support list comprehensions. In Haskell or even Python I would write something like:
ghci> [2*c | c <- [1,2,3,4]]
[2,4,6,8]
I could not find a way in Elm. The actual list comprehension I had to write was (in Haskell):
ghci> let x = [0,1,3,2]
ghci> let y = [2,3,1,0]
ghci> [y !! fromIntegral c | c <- x]
[2,3,0,1]
where fromIntegral :: (Integral a, Num b) => a -> b turns Integer into Num.
In Elm, I tried to use Arrays:
x = Array.fromList [0,1,3,2]
y = Array.fromList [2,3,1,0]
Array.get (Array.get 2 x) y
And I started getting difficulties with Maybe types:
Expected Type: Maybe number
Actual Type: Int
In fact, I had to look up what they were. Instead of working around the maybe, I just did something with lists:
x = [0,1,3,2]
y = [2,3,1,0]
f n = head ( drop n x)
map f y
I have no idea if that's efficient or correct, but it worked in the cases I tried.
I guess my main questions are:
does Elm support list comprehensions? ( I guess just use map)
how to get around the maybe types in the Array example?
is it efficient to call head ( drop n x) to get the nth element of a list?
Elm doesn't and will not support list comprehensions: https://github.com/elm-lang/Elm/issues/147
The style guide Evan refers to says 'prefer map, filter, and fold', so, using map:
map ((y !!).fromIntegral) x
or
map (\i-> y !! fromIntegral i) x
Commenters point out that (!!) isn't valid Elm (it is valid Haskell). We can define it as either:
(!!) a n = head (drop n a), a total function.
or perhaps
(!!) a n = case (head (drop n a)) of
    Just x -> x
    Nothing -> crash "(!!) index error"
I don't know much about Elm, so I can't answer to whether it supports list comprehensions (couldn't find anything via Google about it either way), but I can answer your other two questions.
How to get around the Maybe types in the Array example?
The type of Array.get is Int -> Array a -> Maybe a, which means that it returns either Nothing or Just x, where x is the value at the given index. If you want to feed the result of one of these operations into another, in Haskell you could just do
Array.get 2 x >>= \i -> Array.get i y
Or with do notation:
do
    i <- Array.get 2 x
    Array.get i y
However, from a quick search it seems that Elm may or may not support all monadic types, but hopefully you can still use a case statement to get around this (it's just not very fun)
case Array.get 2 x of
    Nothing -> Nothing
    Just i -> Array.get i y
In fact, I would recommend writing a function to do this in general for you, it's just a direct clone of >>= for Maybe in Haskell:
mayBind :: Maybe a -> (a -> Maybe b) -> Maybe b
mayBind Nothing _ = Nothing
mayBind (Just x) f = f x
Then you could use it as
Array.get 2 x `mayBind` (\i -> Array.get i y)
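To see the chaining run end to end, here is a small Haskell sketch. The function get is a hypothetical list-based stand-in for Array.get (same argument order, same Maybe result), and permute is a made-up name for the lookup the question's list comprehension performs; neither is part of any Elm or Haskell library.

```haskell
-- Hypothetical stand-in for Array.get, working on plain lists.
get :: Int -> [a] -> Maybe a
get n xs
  | n < 0     = Nothing
  | otherwise = case drop n xs of
      (v : _) -> Just v
      []      -> Nothing

-- The bind helper from above, repeated so the block stands alone.
mayBind :: Maybe a -> (a -> Maybe b) -> Maybe b
mayBind Nothing  _ = Nothing
mayBind (Just v) f = f v

-- Look up every index from xs in ys; the whole result is Nothing
-- if any single index is out of range.
permute :: [Int] -> [a] -> Maybe [a]
permute xs ys = traverse (\i -> get i ys) xs
```

With x = [0,1,3,2] and y = [2,3,1,0], the expression get 2 x `mayBind` (\i -> get i y) evaluates to Just 0, and permute x y gives Just [2,3,0,1], matching the list comprehension in the question.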
Is it efficient to call head (drop n x) to get the nth element of a list?
No, but neither is direct indexing, which is equivalent to head . drop n. For lists, indexing will always be O(n) complexity, meaning it takes n steps to get the nth element from the list. Arrays have a different structure, which lets them index in logarithmic time, which is significantly faster. For small lists (< 100 elements), this doesn't really matter, but once you start getting more than a hundred or a thousand elements, it starts becoming a bottleneck. Lists are great for simple code that doesn't have to be the fastest, as they are generally more convenient. Now, I don't know how exactly this gets translated in Elm, it may be that Elm will convert them into Javascript arrays, which are true arrays and indexable in O(1) time. If Elm uses its own version of Haskell lists after it's been compiled, then you'll still have a slowdown.

Haskell Tree With Function Branches

I'll start off by saying I'm very new to Haskell, so I haven't learned about things like Monads yet.
In Haskell I'm trying to make a type of tree that has numbers as the leaves and functions as the branches so the whole tree can act kind of like a calculator.
Here's my code so far. Currently instead of having functions as an input I'm just using characters.
data Tree3 = Leaf3 Int | Node3 Char Tree3 Tree3 deriving (Show)
-- I would like to replace this ^ Char somehow with a function.
evaluate :: Tree3 -> Int
evaluate (Leaf3 x) = x
evaluate (Node3 c m n) | c == '+' = evaluate m + evaluate n
                       | c == '-' = evaluate m - evaluate n
                       | c == '/' = evaluate m `div` evaluate n
                       | c == '*' = evaluate m * evaluate n
So my question is can I have an input of a function in the data structure (and what would the type be?)
Sorry for the probably confusing question, but thanks for any advice!
I would recommend writing your tree as:
data Tree = Leaf Int | Node (Int -> Int -> Int) Tree Tree
Note that you won't be able to derive Eq or Show, since Int -> Int -> Int doesn't implement either of those typeclasses (and it's impractical to do so).
Then you can write your evaluate function as
evaluate :: Tree -> Int
evaluate (Leaf x) = x
evaluate (Node f l r) = f (evaluate l) (evaluate r)
which is much simpler!
You can make a tree to represent an expression like (1 + 2) * (3 * 4) as
expr :: Tree
expr = Node (*) (Node (+) (Leaf 1) (Leaf 2)) (Node (*) (Leaf 3) (Leaf 4))
Another way, which would make it easier to pretty-print your tree, would be to use almost the same definition you have:
data Tree = Leaf Int | Node String Tree Tree
-- ^ String instead of Char
Then if you have Data.Map imported, you can create a map of functions to look up, but it makes your evaluate function a bit more complex since you introduce the possibility that your function won't be in your map. Luckily Haskell has some really handy tools for handling this elegantly!
import qualified Data.Map as Map
data Tree = Leaf Int | Node String Tree Tree deriving (Eq, Show)
type FuncMap = Map.Map String (Int -> Int -> Int)
evaluate :: FuncMap -> Tree -> Maybe Int
evaluate funcs (Leaf x) = return x
evaluate funcs (Node funcName left right) = do
    -- Use qualified import since there's a Prelude.lookup
    f <- Map.lookup funcName funcs
    l <- evaluate funcs left
    r <- evaluate funcs right
    return $ f l r
This will automatically result in Nothing if you try something like
evaluate (Map.fromList [("+", (+))]) (Node "blah" (Leaf 1) (Leaf 2))
since the function "blah" isn't in your FuncMap. Notice how we didn't have to do any explicit error handling of any kind thanks to Maybe's monad instance! If any of the lookups to the function map return Nothing, the whole computation returns Nothing without us having to think about it.
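Putting the pieces together, the following sketch shows the String-labelled tree being both evaluated and pretty-printed. The names defaultFuncs and render are hypothetical additions for illustration, and the operator table is an assumed example; division by zero is not handled here and would still raise an exception.

```haskell
import qualified Data.Map as Map

data Tree = Leaf Int | Node String Tree Tree
  deriving (Eq, Show)

type FuncMap = Map.Map String (Int -> Int -> Int)

-- Hypothetical operator table for the examples below.
defaultFuncs :: FuncMap
defaultFuncs = Map.fromList [("+", (+)), ("-", (-)), ("*", (*)), ("/", div)]

-- Evaluate the tree; any unknown operator name makes the result Nothing.
evaluate :: FuncMap -> Tree -> Maybe Int
evaluate _     (Leaf x)        = Just x
evaluate funcs (Node name l r) = do
  f <- Map.lookup name funcs
  a <- evaluate funcs l
  b <- evaluate funcs r
  return (f a b)

-- Pretty-print with full parenthesisation; possible because nodes hold names.
render :: Tree -> String
render (Leaf x)        = show x
render (Node name l r) = "(" ++ render l ++ " " ++ name ++ " " ++ render r ++ ")"

-- The expression (1 + 2) * (3 * 4) from earlier in this answer.
expr :: Tree
expr = Node "*" (Node "+" (Leaf 1) (Leaf 2)) (Node "*" (Leaf 3) (Leaf 4))
```

Here evaluate defaultFuncs expr gives Just 36 and render expr gives "((1 + 2) * (3 * 4))", while a node labelled "blah" still evaluates to Nothing.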

Is it possible to match with decomposed sequences in F#?

I seem to remember an older version of F# allowing structural decomposition when matching sequences just like lists. Is there a way to use the list syntax while keeping the sequence lazy? I'm hoping to avoid a lot of calls to Seq.head and Seq.skip 1.
I'm hoping for something like:
let decomposable (xs:seq<'a>) =
    match xs with
    | h :: t -> true
    | _ -> false
seq{ 1..100 } |> decomposable
But this only handles lists and gives a type error when using sequences. When using List.of_seq, it seems to evaluate all the elements in the sequence, even if it is infinite.
If you use the LazyList type in the PowerPack, it has Active Patterns called LazyList.Nil and LazyList.Cons that are great for this.
The seq/IEnumerable type is not particularly amenable to pattern matching; I'd highly recommend LazyList for this. (See also Why is using a sequence so much slower than using a list in this example.)
let s = seq { 1..100 }
let ll = LazyList.ofSeq s
match ll with
| LazyList.Nil -> printfn "empty"
| LazyList.Cons(h,t) -> printfn "head: %d" h
Seq works fine in active patterns! Unless I'm doing something horrible here...
let (|SeqEmpty|SeqCons|) (xs: 'a seq) =
    if Seq.isEmpty xs then SeqEmpty
    else SeqCons(Seq.head xs, Seq.skip 1 xs)
// Stupid example usage
let a = [1; 2; 3]
let f = function
    | SeqEmpty -> 0
    | SeqCons(x, rest) -> x
let result = f a
Remember seq has map reduce functions as well, so you might often be able to get away with only those. In the example, your function is equivalent to "Seq.isEmpty". You might try to launch fsi and just run through the tab completion options (enter "Seq." and hit tab a lot); it might have what you want.