Parsing a string of parentheses into a nested list in Haskell

My goal was to write a function to parse a string of nested parentheses into a corresponding list:
parseParens "()" --> []
parseParens "(())" --> [[]]
parseParens "((()()))" --> [[[],[]]]
First off, I discovered that I can't easily define the type of the return value. I could do something like:
parseParens :: String -> [[[[t]]]]
But how do I say that it's infinitely nested? I guess Haskell doesn't allow that.
My solution
I came up with my own data type:
data InfiniteList = EmptyList | Cons InfiniteList InfiniteList deriving (Show)
And a parser function that uses this:
parseParens :: String -> InfiniteList
parseParens ('(':xs) =
  if remainder == ""
    then result
    else error "Unbalanced parenthesis"
  where (result, remainder) = parseToClose EmptyList xs
parseParens _ = error "Unbalanced parenthesis"
parseToClose :: InfiniteList -> String -> (InfiniteList, String)
parseToClose acc "" = error "Unbalanced parenthesis!"
parseToClose acc (')':xs) = (acc, xs)
parseToClose acc ('(':xs) = parseToClose (concatInfLists acc (Cons result EmptyList)) remainder
  where (result, remainder) = parseToClose EmptyList xs
concatInfLists :: InfiniteList -> InfiniteList -> InfiniteList
concatInfLists EmptyList ys = ys
concatInfLists (Cons x xs) ys = Cons x (concatInfLists xs ys)
Working like so:
parseParens "()" --> EmptyList
parseParens "(())" --> Cons EmptyList EmptyList
parseParens "((()()))" --> Cons (Cons EmptyList (Cons EmptyList EmptyList)) EmptyList
How to improve?
There surely must be a better way to do this. Perhaps there's even a way to use the built-in List data type for this?

Edit: Fixed my mischaracterization of Benjamin's answer.
While the answer in Benjamin Hodgson's comment:
data Nested a = Flat a | Nested (Nested [a]) deriving (Show)
gives a good way to represent a homogeneous list of arbitrary nesting depth (i.e., sort of like a sum type of [a] plus [[a]] plus [[[a]]] plus all the rest), it seems like an unusual representation for your problem, particularly in a case like:
parseParens "(()(()))"
where the nesting depth of the "child nodes" differs. This would be represented as:
Nested (Nested (Nested (Flat [[],[[]]]))) :: Nested a
so it literally allows you to represent the result of the parse as the desired list, given enough Nested constructors, but it has some odd properties. For example, the innermost empty lists actually have different types: the first is of type [[a]] while the second is of type [a].
As an alternative approach, I think the data type you actually want is probably just:
data Nested = N [Nested] deriving (Show)
where each node N is a (possibly empty) list of nodes. Then, you'll get:
> parseParens "()"
N []
> parseParens "(())"
N [N []]
> parseParens "((()()))"
N [N [N [],N []]]
> parseParens "(()(()))"
N [N [],N [N []]]
If you just ignore the N constructors in these results, the first three of these match your "corresponding list" test cases from the beginning of your question.
As a side note: the Nested data type above is actually a "rose tree" containing no data, equivalent to Tree () using the Tree data type from Data.Tree in the containers package.
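To make that equivalence concrete, here is a small sketch of the two conversions, assuming the Data.Tree module from the containers package (which ships with GHC). The helper names toTree and fromTree are hypothetical, not part of any library:

```haskell
import Data.Tree (Tree (..))

-- The rose tree of parentheses from this answer.
data Nested = N [Nested] deriving (Show, Eq)

-- Every node of a Tree () carries the unit label (), so nothing is lost.
toTree :: Nested -> Tree ()
toTree (N children) = Node () (map toTree children)

-- Going back simply ignores the (always unit) label.
fromTree :: Tree () -> Nested
fromTree (Node () children) = N (map fromTree children)
```

Because Tree is Foldable, length (toTree n) then counts the parenthesis pairs in the parsed string.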
Finally, I can't emphasize enough how helpful it is to learn and use a monadic parsing library, even for simple parsing jobs. Using the parsec library, for example, you can write a parser for your grammar in one line:
nested = N <$> between (char '(') (char ')') (many nested)
My full code for parseParens is:
import Data.Tree
import Text.Parsec
import Text.Parsec.String
data Nested = N [Nested] deriving (Show)
nested :: Parser Nested
nested = N <$> between (char '(') (char ')') (many nested)
parseParens :: String -> Nested
parseParens str =
  -- Partial pattern: evaluating result crashes if the input does not parse.
  let Right result = parse (nested <* eof) "" str
  in result
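If you would rather not depend on a parser library, or want to avoid the partial let Right result pattern, a hand-written recursive-descent parser returning Maybe is a small alternative. This is a sketch, not the answer's code; node and parseParensMaybe are hypothetical names:

```haskell
data Nested = N [Nested] deriving (Show, Eq)

-- Parse one "( ... )" group, returning the node and the unconsumed input.
node :: String -> Maybe (Nested, String)
node ('(' : rest) = children [] rest
  where
    -- Collect child groups until the matching ')'.
    children acc (')' : more) = Just (N (reverse acc), more)
    children acc s@('(' : _) = do
      (child, more) <- node s
      children (child : acc) more
    children _ _ = Nothing -- end of input or an unexpected character
node _ = Nothing

-- Succeeds only if the whole string is a single balanced group.
parseParensMaybe :: String -> Maybe Nested
parseParensMaybe s = case node s of
  Just (n, "") -> Just n
  _ -> Nothing
```

Unbalanced input such as "(()" or trailing input such as "()()" yields Nothing instead of a runtime error.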

Related

OCaml Type error: This expression has type 'a * 'b but an expression was expected of type 'c list

I'm required to output a pair of lists and I'm not understanding why the pair I'm returning is not of the correct type.
let rec split l = match l with
  | [] -> []
  | [y] -> [y]
  | x :: xs ->
    let rec helper l1 acc = match l1 with
      | [] -> []
      | x :: xs ->
        if ((List.length xs) = ((List.length l) / 2)) then
          (xs, (x :: acc))
        else helper xs (x :: acc)
    in helper l []
(Please take the time to copy/paste and format your code on SO rather than providing a link to an image. It makes it much easier to help, and more useful in the future.)
The first case of the match in your helper function doesn't return a pair. All the cases of a match need to return the same type (of course).
Note that the cases of your outermost match are also of different types (if you assume that helper returns a pair).

How to compare elements in a [[]]?

I am working on a small program in Haskell. The answer is probably really simple, but I have tried and got nowhere.
So one of the part in my program is the list:
first = [(3,3),(4,6),(7,7),(5,43),(9,9),(32,1),(43,43) ..]
From that list I want to make a new one containing the elements whose two components are equal:
result = [3,7,9,43, ..]
Even though you appear not to have made even a minimal effort to solve this yourself, I will give you the answer because it is so trivial and because Haskell is a great language.
Create a function with this signature:
findIdentical :: [(Int, Int)] -> [Int]
It takes a list of tuples and returns a list of ints.
Implement it like this:
findIdentical [] = []
findIdentical ((a,b) : xs)
  | a == b = a : (findIdentical xs)
  | otherwise = findIdentical xs
As you can see, findIdentical is a recursive function that checks each tuple for equality between its two components and, if they are equal, adds the value to the result list.
You can do this, for instance, with a list comprehension. We iterate over every tuple (f,s) in first, so we write (f,s) <- first on the right-hand side of the list comprehension, and we need to filter on f and s being equal, so f == s. In that case we add f (or s) to the result. So:
result = [ f | (f,s) <- first, f == s ]
We can turn this into a function that takes as input a list of 2-tuples [(a,a)], and compares these two elements, and returns a list [a]:
f :: Eq a => [(a,a)] -> [a]
f dat = [x | (x,y) <- dat, x == y]
An easy way to do this is to use the Prelude's filter function, which has the type signature:
filter :: (a -> Bool) -> [a] -> [a]
All you need to do is supply a predicate that decides which elements to keep, and the list to filter. You can accomplish this easily as follows:
filterList :: (Eq a) => [(a, a)] -> [a]
filterList xs = [x | (x, y) <- filter (\(a, b) -> a == b) xs]
Which behaves as expected:
*Main> filterList [(3,3),(4,6),(7,7),(5,43),(9,9),(32,1),(43,43)]
[3,7,9,43]
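The filter and the projection can also be combined point-free. A minimal sketch (the name sameParts is hypothetical), using only the Prelude functions map, fst, filter and uncurry:

```haskell
-- Keep the first component of every pair whose two components are equal.
-- uncurry (==) turns (==) :: a -> a -> Bool into a predicate on pairs.
sameParts :: Eq a => [(a, a)] -> [a]
sameParts = map fst . filter (uncurry (==))
```

Since both components are equal whenever a pair survives the filter, map snd would work just as well as map fst.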

Converting a hierarchical data structure to a flat one in Haskell

I'm extracting some data from a text document organized like this:
- "day 1"
    - "Person 1"
        - "Bill 1"
    - "Person 2"
        - "Bill 2"
I can read this into a list of tuples that looks like this:
[(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]
Where the first item of each tuple indicates the heading level, and the second item the information associated with each heading.
My question is, how can I get a list of items that looks like this:
[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]
I.e. one list per deepest nested item, containing all the information from the headings above it.
The closest I've gotten is this:
f [] = []
f (x:xs) = row:f rest where
  leaves = takeWhile (\i -> fst i > fst x) xs
  rest = dropWhile (\i -> fst i > fst x) xs
  row = concat $ map (\i -> (snd x):[snd i]) leaves
Which gives me this:
[[["day 1"],["Person 1"],["day 1"],["Bill 1"],["day 1"],["Person 2"],["day 1"],["Bill 2"]]]
I'd like the solution to work for any number of levels.
P.S. I'm new to Haskell. I have a sense that I could (or should) use a tree to store the data, but I can't wrap my head around it. I also couldn't think of a better title.
Trees
You were right that you should probably use a tree to store the data. I'll copy how Data.Tree does it:
data Tree a = Node a (Forest a) deriving (Show)
type Forest a = [Tree a]
Building the Tree
Now we want to take your weakly typed list of tuples and convert it to a (slightly) stronger Tree of Strings. Any time you need to convert a weakly typed value and validate it before converting to a stronger type, you use a Parser:
type YourData = [(Int, [String])]
type Parser a = YourData -> Maybe (a, YourData)
The YourData type synonym represents the weak type that you are parsing. The a type variable is the value you are retrieving from the parse. Our Parser type returns a Maybe because the Parser might fail. To see why, the following input does not correspond to a valid Tree, since it is missing level 1 of the tree:
[(0, ["val1"]), (2, ["val2"])]
If the Parser does succeed, it also returns the unconsumed input so that subsequent parsing stages can use it.
Now, curiously enough, the above Parser type exactly matches a well known monad transformer stack:
StateT s Maybe a
You can see this if you expand out the underlying implementation of StateT:
StateT s Maybe a ~ s -> Maybe (a, s)
This means we can just define:
import Control.Monad.Trans.State.Strict
type Parser a = StateT [(Int, [String])] Maybe a
If we do this, we get a Monad, Applicative and Alternative instance for our Parser type for free. This makes it very easy to define parsers!
First, we must define a primitive parser that consumes a single node of the tree:
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
  [] -> Nothing
  (level', strs):rest -> case strs of
    [str] ->
      if (level' == level)
        then Just (str, rest)
        else Nothing
    _ -> Nothing
This is the only non-trivial piece of code we have to write. Because it is total, it handles all of the following corner cases:
The list is empty
Your node has multiple values in it
The number in the tuple doesn't match the expected depth
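A quick way to see that totality in practice is to run parseElement on each corner case with runStateT from the transformers package. This sketch repeats the answer's definition so it stands alone; the sample bindings emptyInput, multipleValues, wrongLevel and ok are hypothetical:

```haskell
import Control.Monad.Trans.State.Strict (StateT (..))

type Parser a = StateT [(Int, [String])] Maybe a

parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
  [] -> Nothing
  (level', strs) : rest -> case strs of
    [str] ->
      if level' == level
        then Just (str, rest)
        else Nothing
    _ -> Nothing

-- Each corner case yields Nothing instead of crashing.
emptyInput, multipleValues, wrongLevel, ok :: Maybe (String, [(Int, [String])])
emptyInput = runStateT (parseElement 0) []
multipleValues = runStateT (parseElement 0) [(0, ["a", "b"])]
wrongLevel = runStateT (parseElement 1) [(0, ["a"])]
ok = runStateT (parseElement 0) [(0, ["a"]), (1, ["b"])]
```

On success the unconsumed entries come back alongside the value, which is what lets later parsing stages continue.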
The next part is where things get really elegant. We can then define two mutually recursive parsers, one for parsing a Tree, and the other for parsing a Forest:
import Control.Applicative
parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)
parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)
The first parser uses Applicative style, since StateT gave us an Applicative instance for free. However, I could also have used StateT's Monad instance instead, to give code that's more readable for an imperative programmer:
parseTree :: Int -> Parser (Tree String)
parseTree level = do
  str <- parseElement level
  forest <- parseForest (level + 1)
  return $ Node str forest
But what about the many function? What's that doing? Let's look at its type:
many :: (Alternative f) => f a -> f [a]
It takes any action whose type has an Alternative instance and runs it repeatedly until it fails, returning the list of results. When we defined our Parser type in terms of StateT, we got an Alternative instance for free, so we can use the many function to convert something that parses a single Tree (i.e. parseTree) into something that parses a Forest (i.e. parseForest).
To use our Parser, we just rename an existing StateT function to make its purpose clear:
runParser :: Parser a -> [(Int, [String])] -> Maybe a
runParser = evalStateT
Then we just run it!
>>> runParser (parseForest 0) [(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]
Just [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]
That's just magic! Let's see what happens if we give it an invalid input:
>>> runParser (parseForest 0) [(0, ["val1"]), (2, ["val2"])]
Just [Node "val1" []]
It succeeds on a portion of the input! We can actually specify that it must consume the entire input by defining a parser that matches the end of the input:
eof :: Parser ()
eof = StateT $ \list -> case list of
  [] -> Just ((), [])
  _ -> Nothing
Now let's try it:
>>> runParser (parseForest 0 >> eof) [(0, ["val1"]), (2, ["val2"])]
Nothing
Perfect!
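One small refinement, offered as a suggestion rather than as part of the original answer: because >> discards the result of parseForest 0, a successful parse written that way only returns Just (). Using <* instead keeps the parsed forest while still requiring the input to end. This sketch assumes the appendix definitions, with Tree and Forest taken from Data.Tree; parseAll is a hypothetical name:

```haskell
import Control.Applicative (many)
import Control.Monad.Trans.State.Strict (StateT (..), evalStateT)
import Data.Tree (Forest, Tree (..))

type Parser a = StateT [(Int, [String])] Maybe a

runParser :: Parser a -> [(Int, [String])] -> Maybe a
runParser = evalStateT

-- A compact but equivalent form of the answer's parseElement:
-- consume one entry at exactly this level, holding exactly one string.
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
  (level', [str]) : rest | level' == level -> Just (str, rest)
  _ -> Nothing

parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)

parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)

eof :: Parser ()
eof = StateT $ \list -> case list of
  [] -> Just ((), [])
  _ -> Nothing

-- Parse a whole forest, insist nothing is left over, and keep the forest.
parseAll :: [(Int, [String])] -> Maybe (Forest String)
parseAll = runParser (parseForest 0 <* eof)
```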
Flattening the Tree
To answer your second question, we again solve the problem using mutually recursive functions:
flattenForest :: Forest a -> [[a]]
flattenForest forest = concatMap flattenTree forest
flattenTree :: Tree a -> [[a]]
flattenTree (Node a forest) = case forest of
  [] -> [[a]]
  _ -> map (a:) (flattenForest forest)
Let's try it!
>>> flattenForest [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]
[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]
Now, technically I didn't have to use mutually recursive functions. I could have done a single recursive function. I was just following the definition of the Tree type from Data.Tree.
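For comparison, a single recursive function over Data.Tree might look like this. It is a sketch; paths is a hypothetical name, and it relies on concatMap to walk the children instead of a separate flattenForest:

```haskell
import Data.Tree (Tree (..))

-- One root-to-leaf path per leaf.
paths :: Tree a -> [[a]]
paths (Node a []) = [[a]]                        -- a leaf ends a path
paths (Node a ts) = map (a :) (concatMap paths ts) -- prefix this label to every child path
```

Applying concatMap paths to the parsed forest gives the same rows as flattenForest.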
Conclusion
So in theory I could have shortened the code even further by skipping the intermediate Tree type and just parsing the flattened result directly, but I figured you might want to use the Tree-based representation for other purposes.
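As an illustration of that shortcut, here is a sketch that skips the Tree entirely and walks the list once, carrying the path of ancestors. It assumes, as in the question's example, that every heading carries exactly one string and that levels are well formed (no skipped levels); unlike the parser, it does not validate its input. The name rows is hypothetical:

```haskell
-- Emit one row per leaf heading, containing the strings of all its ancestors.
-- Assumes each entry holds exactly one string and levels never skip.
rows :: [(Int, [String])] -> [[String]]
rows = go []
  where
    -- path: the strings of the previous entry and its ancestors, one per level
    go _ [] = []
    go path ((level, strs) : rest) =
      let path' = take level path ++ strs -- keep the ancestors above this level
          isLeaf = case rest of
            (next, _) : _ -> next <= level -- the next entry is not a child
            [] -> True
      in if isLeaf then path' : go path' rest else go path' rest
```

The trade-off is the one the answer mentions: this is shorter, but it gives up both the reusable Tree and the validation that the parser provides.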
The key take home points from this are:
Learn Haskell abstractions to simplify your code
Always write total functions
Learn to use recursion effectively
If you do these, you will write robust and elegant code that exactly matches the problem.
Appendix
Here is the final code that incorporates everything I've said:
import Control.Applicative
import Control.Monad.Trans.State.Strict
import Data.Tree
type YourType = [(Int, [String])]
type Parser a = StateT [(Int, [String])] Maybe a
runParser :: Parser a -> [(Int, [String])] -> Maybe a
runParser = evalStateT
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
  [] -> Nothing
  (level', strs):rest -> case strs of
    [str] ->
      if (level' == level)
        then Just (str, rest)
        else Nothing
    _ -> Nothing
parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)
parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)
eof :: Parser ()
eof = StateT $ \list -> case list of
  [] -> Just ((), [])
  _ -> Nothing
flattenForest :: Forest a -> [[a]]
flattenForest forest = concatMap flattenTree forest
flattenTree :: Tree a -> [[a]]
flattenTree (Node a forest) = case forest of
  [] -> [[a]]
  _ -> map (a:) (flattenForest forest)
I seem to have solved it.
group :: [(Integer, [String])] -> [[String]]
group ((n, str):ls) = let
    (children, rest) = span (\(m, _) -> m > n) ls
    subgroups = map (str ++) $ group children
  in if null children then [str] ++ group rest
     else subgroups ++ group rest
group [] = []
I didn't test it much though.
The idea is to notice the recursive pattern. This function takes the first element (N, S) of the list and gathers all the following entries at deeper levels, up to the next element at level N or shallower, into a list 'children'. If there are no children, the entry is a leaf and S alone forms an output row. If there are some, S is prepended to each of their rows.
As for why your algorithm doesn't work, the problem is mostly in row. Notice that you are not descending recursively.
Trees can be used too.
data Tree a = Node a [Tree a] deriving Show
listToTree :: [(Integer, [String])] -> [Tree [String]]
listToTree ((n, str):ls) = let
    (children, rest) = span (\(m, _) -> m > n) ls
    subtrees = listToTree children
  in Node str subtrees : listToTree rest
listToTree [] = []
treeToList :: [Tree [String]] -> [[String]]
treeToList (Node s ns:ts) = children ++ treeToList ts where
  children = if null ns then [s] else map (s++) (treeToList ns)
treeToList [] = []
The algorithm is essentially the same. The first half goes to the first function, the second half to the second.

Lists defined as Maybe in Haskell? Why not?

You don't often see Maybe List except for error handling, for example, because lists are a bit like Maybe themselves: they have their own "Nothing" ([]) and their own "Just" ((:)).
I wrote a list type using Maybe, plus functions to convert standard lists to and from these "experimental" lists, such that toStd . toExp == id.
data List a = List a (Maybe (List a))
  deriving (Eq, Show, Read)
toExp [] = Nothing
toExp (x:xs) = Just (List x (toExp xs))
toStd Nothing = []
toStd (Just (List x xs)) = x : (toStd xs)
What do you think about it, as an attempt to reduce repetition, to generalize?
Trees too could be defined using these lists:
type Tree a = List (Tree a, Tree a)
I haven't tested this last piece of code, though.
All ADTs are isomorphic (almost; see the end) to some combination of (,), Either, (), (->), Void and Mu, where
data Void --using empty data decls or
newtype Void = Void Void
and Mu computes the fixpoint of a functor
newtype Mu f = Mu (f (Mu f))
so for example
data [a] = [] | (a:[a])
is the same as
type [a] = Mu (ListF a)
data ListF a f = End | Pair a f
which itself is isomorphic to
newtype ListF a f = ListF (Either () (a,f))
since
data Maybe a = Nothing | Just a
is isomorphic to
newtype Maybe a = Maybe (Either () a)
you have
newtype ListF a f = ListF (Maybe (a,f))
which can be inlined in the mu to
data List a = List (Maybe (a,List a))
and your definition
data List a = List a (Maybe (List a))
is just the unfolding of the Mu and elimination of the outer Maybe (corresponding to non-empty lists)
and you are done...
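The chain of isomorphisms can be checked in code. This sketch uses only the Prelude; toMu, fromMu, toMaybeList and fromMaybeList are hypothetical names. It converts ordinary lists into the Mu (ListF a) encoding and into the Maybe-based List, and back:

```haskell
-- Fixpoint of a functor, as defined in the answer.
newtype Mu f = Mu (f (Mu f))

-- Base functor of lists; r stands for "the rest of the list".
data ListF a r = End | Pair a r

toMu :: [a] -> Mu (ListF a)
toMu [] = Mu End
toMu (x : xs) = Mu (Pair x (toMu xs))

fromMu :: Mu (ListF a) -> [a]
fromMu (Mu End) = []
fromMu (Mu (Pair x r)) = x : fromMu r

-- The Maybe-based form reached at the end of the derivation.
newtype List a = List (Maybe (a, List a))

toMaybeList :: [a] -> List a
toMaybeList [] = List Nothing
toMaybeList (x : xs) = List (Just (x, toMaybeList xs))

fromMaybeList :: List a -> [a]
fromMaybeList (List Nothing) = []
fromMaybeList (List (Just (x, r))) = x : fromMaybeList r
```

Each round trip gives back the original finite list; the "almost" caveat below concerns partial values such as undefined, which these total tests do not exercise.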
A couple of things:
Using custom ADTs increases clarity and type safety
This universality is useful: see GHC.Generics
Okay, I said almost isomorphic. It is not exactly, namely
hmm = List (Just undefined)
has no equivalent value in the [a] = [] | (a:[a]) definition of lists. This is because Haskell data types are coinductive, which has been a point of criticism of the lazy evaluation model. You can get around these problems by using only strict sums and products (and call-by-value functions) and adding a special "Lazy" data constructor:
data SPair a b = SPair !a !b
data SEither a b = SLeft !a | SRight !b
data Lazy a = Lazy a  -- Note: this has no obvious encoding in pure CBV languages;
                      -- although Lazy a = (() -> a) is semantically correct,
                      -- it is strictly less efficient than Haskell's call-by-need
and then all the isomorphisms can be faithfully encoded.
You can define lists in a bunch of ways in Haskell. For example, as functions:
{-# LANGUAGE RankNTypes #-}
-- Hide the Prelude names redefined below; keep the Prelude available qualified as P.
import Prelude hiding (head, tail, foldr)
import qualified Prelude as P
newtype List a = List { runList :: forall b. (a -> b -> b) -> b -> b }
nil :: List a
nil = List (\_ z -> z )
cons :: a -> List a -> List a
cons x xs = List (\f z -> f x (runList xs f z))
isNil :: List a -> Bool
isNil xs = runList xs (\_ _ -> False) True
head :: List a -> a
head xs = runList xs (\x _ -> x) (error "empty list")
tail :: List a -> List a
tail xs | isNil xs = error "empty list"
tail xs = fst (runList xs go (nil, nil))
  -- Folding from the right, go builds (tail of the suffix, the suffix itself).
  where go x (_, suffix) = (suffix, cons x suffix)
foldr :: (a -> b -> b) -> b -> List a -> b
foldr f z xs = runList xs f z
The trick to this implementation is that lists are being represented as functions that execute a fold over the elements of the list:
fromNative :: [a] -> List a
fromNative xs = List (\f z -> P.foldr f z xs)
toNative :: List a -> [a]
toNative xs = runList xs (:) []
In any case, what really matters is the contract (or laws) that the type and its operations follow, and the performance of implementation. Basically, any implementation that fulfills the contract will give you correct programs, and faster implementations will give you faster programs.
What is the contract of lists? Well, I'm not going to express it in complete detail, but lists obey statements like these:
head (x:xs) == x
tail (x:xs) == xs
[] == []
[] /= x:xs
If xs == ys and x == y, then x:xs == y:ys
foldr f z [] == z
foldr f z (x:xs) == f x (foldr f z xs)
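Some of these laws can be spot-checked for the function-based encoding above. Because such lists are functions, this sketch compares them through toNative rather than with ==. The operations are repeated under the hypothetical names headL, tailL and foldrL so they don't clash with the Prelude, and lawsHold is a hypothetical helper; a passing check on a few samples is evidence, not a proof:

```haskell
{-# LANGUAGE RankNTypes #-}

-- The fold-based encoding from the answer, with renamed operations.
newtype List a = List { runList :: forall b. (a -> b -> b) -> b -> b }

nil :: List a
nil = List (\_ z -> z)

cons :: a -> List a -> List a
cons x xs = List (\f z -> f x (runList xs f z))

toNative :: List a -> [a]
toNative xs = runList xs (:) []

headL :: List a -> a
headL xs = runList xs (\x _ -> x) (error "empty list")

-- Fold from the right, carrying (tail of the suffix, the suffix itself).
tailL :: List a -> List a
tailL xs = fst (runList xs step (nil, nil))
  where step x (_, suffix) = (suffix, cons x suffix)

foldrL :: (a -> b -> b) -> b -> List a -> b
foldrL f z xs = runList xs f z

-- Spot-check some of the laws for one element and one list,
-- comparing encoded lists through toNative.
lawsHold :: Eq a => a -> List a -> Bool
lawsHold x xs =
  headL (cons x xs) == x
    && toNative (tailL (cons x xs)) == toNative xs
    && not (null (toNative (cons x xs)))
    && foldrL (:) [] (cons x xs) == x : foldrL (:) [] xs
```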
EDIT: And to tie this to augustss' answer:
newtype ExpList a = ExpList (Maybe (a, ExpList a))
toExpList :: List a -> ExpList a
toExpList xs = runList xs (\x xs -> ExpList (Just (x, xs))) (ExpList Nothing)
foldExpList :: (a -> b -> b) -> b -> ExpList a -> b
foldExpList f z (ExpList Nothing) = z
foldExpList f z (ExpList (Just (x, rest))) = f x (foldExpList f z rest)
fromExpList :: ExpList a -> List a
fromExpList xs = List (\f z -> foldExpList f z xs)
You could define lists in terms of Maybe, but not the way you did. Your List type cannot be empty. Or did you intend Maybe (List a) to be the replacement for [a]? That seems bad, since it doesn't distinguish between the list and Maybe types.
This would work
newtype List a = List (Maybe (a, List a))
This has some problems. First, using it would be more verbose than using ordinary lists; second, the domain is not isomorphic to lists, since there is a pair in there (which can be undefined, adding an extra level in the domain).
If it's a list, it should be an instance of Functor, right?
instance Functor List where
  fmap f (List a as) = List (f a) (mapMaybeList f as)
mapMaybeList :: (a -> b) -> Maybe (List a) -> Maybe (List b)
mapMaybeList f as = fmap (fmap f) as
Here's a problem: you can make List an instance of Functor, but your Maybe List is not one: even if Maybe were not already an instance of Functor in its own right, you can't directly make a composition like Maybe . List into an instance of anything (you'd need a wrapper type).
Similarly for other typeclasses.
Having said that, with your formulation you can do this, which you can't do with standard Haskell lists:
instance Comonad List where  -- Comonad from Control.Comonad (comonad package)
  extract (List a _) = a
  duplicate x@(List _ y) = List x (fmap duplicate y)
A Maybe List still wouldn't be comonadic though.
When I first started using Haskell, I too tried to represent things in existing types as much as I could on the grounds that it's good to avoid redundancy. My current understanding (moving target!) tends to involve more the idea of a multidimensional web of trade-offs. I won't be giving any “answer” here so much as pasting examples and asking “do you see what I mean?” I hope it helps anyway.
Let's have a look at a bit of Darcs code:
data UseCache = YesUseCache | NoUseCache
  deriving ( Eq )
data DryRun = YesDryRun | NoDryRun
  deriving ( Eq )
data Compression = NoCompression
                 | GzipCompression
  deriving ( Eq )
Did you notice that these three types could all have been Bools? Why do you think the Darcs hackers decided that they should introduce this sort of redundancy in their code? As another example, here is a piece of code we changed a few years back:
type Slot = Maybe Bool -- OLD code
data Slot = InFirst | InMiddle | InLast -- newer code
Why do you think we decided that the second code was an improvement over the first?
Finally, here is a bit of code from some of my day job stuff. It uses the newtype syntax that augustss mentioned:
newtype Role = Role { fromRole :: Text }
  deriving (Eq, Ord)
newtype KmClass = KmClass { fromKmClass :: Text }
  deriving (Eq, Ord)
newtype Lemma = Lemma { fromLemma :: Text }
  deriving (Eq, Ord)
Here you'll notice that I've done the curious thing of taking a perfectly good Text type and then wrapping it up into three different things. The three things don't have any new features compared to plain old Text. They're just there to be different. To be honest, I'm not entirely sure if it was a good idea for me to do this. I provisionally think it was because I manipulate lots of different bits and pieces of text for lots of reasons, but time will tell.
Can you see what I'm trying to get at?

Flatten a list of lists

I have to write a function that flattens a list of lists.
For example flatten [] = [] or flatten [1,2,3,4] = [1,2,3,4] or flatten [[1,2],[3],4,5]] = [1,2,3,4,5]
I'm having trouble making the type match depending on what is passed to the flatten function.
Here's what I have:
data A a = B a | C [a] deriving (Show, Eq, Ord)
flatten::(Show a, Eq a, Ord a)=>A a -> A a
flatten (C []) = (C [])
flatten (C (x:xs) ) = (C flatten x) ++ (C flatten xs)
flatten (B a) = (C [a])
From what I can tell the issue is that the ++ operator is expecting a list for both of its arguments and I'm trying to give it something of type A. I've added the A type so the function can either get a single element or a list of elements.
Does anyone know a different way to do this, or can you explain how to fix the type error?
It's a bit unclear what you are asking for, but flattening a list of list is a standard function called concat in the prelude with type signature [[a]] -> [a].
If you make a data type of nested lists as you have started above, maybe you want to adjust your data type to something like this:
data Lists a = List [a] | ListOfLists [Lists a]
Then you can flatten these into a list:
flatten :: Lists a -> [a]
flatten (List xs) = xs
flatten (ListOfLists xss) = concatMap flatten xss
As a test,
> flatten (ListOfLists [List [1,2],List [3],ListOfLists [List [4],List[5]]])
[1,2,3,4,5]
Firstly, the A type is on the right track but I don't think it's quite correct. You want it to be able to flatten arbitrarily nested lists, so a value of type "A a" should be able to contain values of type "A a":
data A a = B a | C [A a]
Secondly, the type of the function should be slightly different. Instead of returning a value of type "A a", you probably want it to return just a list of a, since by definition the function is always returning a flat list. So the type signature is thus:
flatten :: A a -> [a]
Also note that no typeclass constraints are necessary -- this function is completely generic since it does not look at the list's elements at all.
Here's my implementation:
flatten (B a) = [a]
flatten (C []) = []
flatten (C (x:xs)) = flatten x ++ flatten (C xs)
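The last clause recurses over the list by hand; the same function can be written with concatMap, which applies flatten to every child and concatenates the results. A sketch with the same A type:

```haskell
-- Arbitrarily nested values: a single element, or a list of nested values.
data A a = B a | C [A a]

flatten :: A a -> [a]
flatten (B a) = [a]
flatten (C xs) = concatMap flatten xs -- also covers C [], giving []
```

This matches the ListOfLists version from the previous answer, with B playing the role of a one-element List.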
This one-liner will do the job, although, as Malin mentioned, the type signature is different:
flatten :: [[a]] -> [a]
flatten xs = (\z n -> foldr (\x y -> foldr z y x) n xs) (:) []
A simple test:
frege> li = [[3,4,2],[1,9,9],[5,8]]
frege> flatten li
[3,4,2,1,9,9,5,8]
Flatten via list comprehension.
flatten arr = [y | x<- arr, y <- x]
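All of these flattening functions should agree with the Prelude's concat, and with join from Control.Monad, since join for the list monad is concatenation. A small sketch comparing them on the test list; the flattenX names are hypothetical:

```haskell
import Control.Monad (join)

-- The foldr one-liner from the earlier answer.
flattenFold :: [[a]] -> [a]
flattenFold xs = (\z n -> foldr (\x y -> foldr z y x) n xs) (:) []

-- The list comprehension version.
flattenComp :: [[a]] -> [a]
flattenComp arr = [y | x <- arr, y <- x]

-- Library versions: concat from the Prelude, join from Control.Monad.
flattenStd, flattenJoin :: [[a]] -> [a]
flattenStd = concat
flattenJoin = join
```

In practice concat is the idiomatic choice; the other forms are mainly useful for seeing how it works.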