It is easy to reconstruct the list type in Haskell:
data MyList a = Cons a (MyList a)
              | Empty
              deriving (Show)
someList = Cons 1 (Cons 2 (Cons 3 Empty)) -- represents [1,2,3]
This allows for constructing infinite lists. Is it possible to somehow define a list type that only allows for finite (but still arbitrary-length) lists?
The example of lists here can be replaced with any other potentially infinite data structure, like trees etc. Note that I do not have any particular application in mind, so there is no need to question the usefulness of this; I'm just curious whether that would be possible.
Alternative 1: lists with a strict tail
data MyList a = Cons a !(MyList a) | Empty
Trying to build an infinite list will surely lead to a bottom element of MyList a.
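To make the strict-tail behaviour concrete, here is a minimal sketch. The helpers fromList and toList and the value ones are illustrative names that are not part of the answer above; only the data declaration comes from it.

```haskell
-- Sketch only: fromList, toList and ones are hypothetical helper names.
data MyList a = Cons a !(MyList a) | Empty
  deriving (Show)

-- Build a strict-tailed list from an ordinary finite list.
fromList :: [a] -> MyList a
fromList = foldr Cons Empty

-- Convert back to an ordinary list.
toList :: MyList a -> [a]
toList Empty       = []
toList (Cons x xs) = x : toList xs

-- A knot-tied definition still type-checks, but evaluating it to a
-- constructor requires its own (strict) tail first, so it is bottom.
ones :: MyList Int
ones = Cons 1 ones
```

Finite lists round-trip as expected, while forcing ones never produces a constructor. The type still admits the definition; it is only its value that is bottom.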
Alternative 2: existentially-quantified fixed-length list
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
data Nat = O | S Nat
data List a (n :: Nat) where
  Nil :: List a O
  Cons :: a -> List a n -> List a (S n)
data MyList a where
  MyList :: List a n -> MyList a
I would say that this does not allow for infinite lists either.
This is because we can't pattern match on a GADT constructor in a where binding (or with lazy patterns in general).
-- fails to compile
test :: MyList Int
test = MyList (Cons 1 list)
  where MyList list = test
The following would be too strict.
-- diverges
test2 :: MyList Int
test2 = case test2 of
  MyList list -> MyList (Cons 1 list)
The following makes the existentially quantified type variable "escape" the scope of the case:
-- fails to compile
test3 :: MyList Int
test3 = MyList $ case test3 of
  MyList list -> (Cons 1 list)
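For contrast with the three failing attempts, finite lists can still be built the ordinary way. The following is a sketch under the same extensions; consML, fromList and toList are made-up names for illustration, not part of the answer.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

data Nat = O | S Nat

data List a (n :: Nat) where
  Nil  :: List a 'O
  Cons :: a -> List a n -> List a ('S n)

data MyList a where
  MyList :: List a n -> MyList a

-- Hypothetical helper: prepend one element. The match on MyList is strict,
-- which is exactly what prevents tying the knot lazily.
consML :: a -> MyList a -> MyList a
consML x (MyList xs) = MyList (Cons x xs)

-- Hypothetical helper: wrap an ordinary finite list.
fromList :: [a] -> MyList a
fromList = foldr consML (MyList Nil)

-- Hypothetical helper: forget the length index again.
toList :: MyList a -> [a]
toList (MyList xs) = go xs
  where
    go :: List a n -> [a]
    go Nil         = []
    go (Cons y ys) = y : go ys
```

Calling fromList on an infinite input does not produce an infinite MyList: the outermost consML has to force the result for the rest of the input first, so the computation diverges.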
Does this function take two int lists "x and y" and return an int list of y-x?
let rec fun4 (l: int list) :int list =
begin match l with | [] -> []
| [_] -> []
| x::y::rest -> (y-x)::(fun4 (y::rest))
end
A list is defined as a recursive type:
type 'a list =
| [] (* the empty case *)
| ( :: ) of 'a * 'a list
So you basically have two constructors: [], which is the empty list, and x :: other_list, which is a list with x as head and other_list as tail. These constructors make it easy to define a list: [0; 1; 2; 3] is exactly the same as 0 :: 1 :: 2 :: 3 :: [] and as (::) (0, (::) (1, (::) (2, (::) (3, [])))) (which is not very pleasant to read).
Recursive algebraic types (here a sum of two constructors, [] and (::), where (::) carries the product 'a * 'a list), combined with pattern matching, make it possible to describe all sorts of common data structures, as well as the functions that consume and modify them.
In your example, you use pattern matching to deconstruct the list:
let rec fun4 my_list =
match my_list with
(* if my list is empty, I can't process the function so
I return the empty list *)
| [] -> []
(* if my list has only one element, I can't process the function
so, like in the previous case, I return the empty list *)
| [ _ ] -> []
(* Here is the last case of the function, where I have at least two elements in the
list. Remember that the empty list is also a list, so rest may be empty! *)
| x :: y :: rest -> (y - x) :: (fun4 (y :: rest))
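As you can see, recursive algebraic data types coupled with pattern matching are a powerful tool for describing data structures (like lists, but also many others) and for writing functions that use those data structures. To answer the question directly: fun4 takes a single int list, not two, and returns the differences between consecutive elements.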
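For readers more at home with the Haskell threads collected here, the same pattern match can be sketched in Haskell. The name diffs is invented for this sketch; the behaviour is meant to mirror fun4, which consumes one list and yields the differences of neighbouring elements.

```haskell
-- Hypothetical Haskell rendering of fun4; `diffs` is an illustrative name.
diffs :: [Int] -> [Int]
diffs (x : y : rest) = (y - x) : diffs (y : rest)  -- at least two elements
diffs _              = []                          -- empty or singleton list
```

For example, diffs [1, 4, 9, 16] evaluates to [3, 5, 7], and a list with fewer than two elements yields the empty list.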
The question I have is how might I transform a list of a string and integer pair to a list of string and int list pairs.
For example, if I have the list [("hello",1) ; ("hi", 1) ; ("hello", 1) ; ("hi", 1) ; ("hey",1 )] then I should get back [("hello",[1;1]) ; ("hi", [1;1]) ; ("hey",[1])] where basically from a previous function I wrote that creates string * int pairs in a list, I want to group every string that's the same into a pair that has a list of ones of a length = to how many times that exact string appeared in a pair from the input list. Sorry if my wording is confusing but I am quite lost on this function. Below is the code I have written so far:
let transform5 (lst: (string *int) list) : (string *int list) list =
match lst with
| (hd,n)::(tl,n) -> let x,[o] = List.fold_left (fun (x,[o]) y -> if y = x then x,[o]@[1] else
(x,[o])::y,[o]) (hd,[o]) tl in (x,[1])::(tl,[1])
Any help is appreciated!
General advice on how to improve understanding of core concepts:
The code suggests you could use more practice with destructuring and manipulating lists. I recommend reading the chapter on Lists and Patterns in Real World Ocaml and spending some time working through the first 20 or so 99 OCaml Problems.
Some pointers on the code you've written so far:
I have reorganized your code into a strictly equivalent function, with some annotations indicating problem areas:
let transform5 : (string * int) list -> (string * int list) list =
fun lst ->
let f (x, [o]) y =
if y = x then (* The two branches of this conditional are values of different types *)
(x, [o] @ [1]) (* : ('a * int list) *)
else
(x, [o]) :: (y, [o]) (* : ('a * int list) list *)
in
match lst with
| (hd, n) :: (tl, n) -> (* This will only match a list with two tuples *)
let x, [o] = List.fold_left f (hd, [o]) tl (* [o] can only match a singleton list *)
in (x, [1]) :: (tl, [1]) (* Doesn't use the value of o, so that info is lost*)
(* case analysis in match expressions should be exhaustive, but this omits
matches for, [], [_], and (_ :: _ :: _) *)
If you load your code in utop or compile it in a file, you should get a number of warnings and type errors that help indicate problem areas. You can learn a lot by taking up each of those messages one by one and working out what they are indicating.
Refactoring the problem
A solution to your problem using a fold over the input list is probably the right way to go. But writing solutions that use explicit recursion and break the task down into a number of sub-problems can often help study the problem and make the underlying mechanics very clear.
In general, a function of type 'a -> 'b can be understood as a problem:
Given a x : 'a, construct a y : 'b where ...
Our function has type (string * int) list -> (string * int list) list and you
state the problem quite clearly, but I've edited a bit to fit the format:
Given xs : (string * int) list, construct ys: (string * int list) list
where I want to group every string from xs that's the same into a pair
(string * int list) in ys that has a list of ones of a length = to how
many times that exact string appeared in a pair from xs.
We can break this into two sub-problems:
Given xs : (string * int) list, construct ys : (string * int) list list where each y : (string * int) list in ys is a group of the items in xs with the same string.
let rec group : (string * int) list -> (string * int) list list = function
| [] -> []
| x :: xs ->
let (grouped, rest) = List.partition (fun y -> y = x) xs in
(x :: grouped) :: group rest
Given xs : (string * int) list list, construct ys : (string * int list) list where for each group (string, int) list in xs we have one (s : string, n : int list) in ys where s is the string determining the group and n is a list holding all the 1s in the group.
let rec tally : (string * int) list list -> (string * int list) list = function
| [] -> []
| group :: xs ->
match group with
| [] -> tally xs (* This case shouldn't arise, but we match it to be complete *)
| (s, _) :: _ ->
let ones = List.map (fun (_, one) -> one) group in
(s, ones) :: tally xs
The solution to your initial problem will just be the composition of these two sub-problems:
let transform5 : (string * int) list -> (string * int list) list =
fun xs -> (tally (group xs))
Hopefully this is a helpful illustration of one way to go about decomposing these kinds of problems. However, there are some obvious defects with the code I have written: it is inefficient, in that it creates an intermediate data structure and it must iterate through the first list repeatedly to form its groups, before finally tallying up the results. It also resorts to explicit recursion, whereas it would be preferable to use higher order functions to take care of iterating over the lists for us (as you tried in your example). Trying to fix these defects might be instructive.
Reconsidering our context
Is the problem you've posed in this SO question the best sub-problem from the overall task you are pursuing? Here are two questions that have occurred to me:
Why do you have a (string * int) list where the value of int is always 1 in the first place? Does this actually carry any more information than a string list?
In general, we can represent any n : int by an int list which contains only 1s and has length = n. But why not just use n here?
Here is the expected input/output:
repeated "Mississippi" == "ips"
repeated [1,2,3,4,2,5,6,7,1] == [1,2]
repeated " " == " "
And here is my code so far:
repeated :: String -> String
repeated "" = ""
repeated x = group $ sort x
I know that the last part of the code doesn't work. I was thinking to sort the list then group it, then I wanted to make a filter on the list of list which are greater than 1, or something like that.
Your code already does half of the job
> group $ sort "Mississippi"
["M","iiii","pp","ssss"]
You said you want to filter out the non-duplicates. Let's define a predicate which identifies the lists having at least two elements:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
Using this:
> filter atLeastTwo . group $ sort "Mississippi"
["iiii","pp","ssss"]
Good. Now, we need to take only the first element from such lists. Since the lists are non-empty, we can use head safely:
> map head . filter atLeastTwo . group $ sort "Mississippi"
"ips"
Alternatively, we could replace the filter with filter (\xs -> length xs >= 2) but this would be less efficient.
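The efficiency difference comes from how much of each list is inspected: the pattern looks at no more than two cells, while length walks the whole list. A small sketch, in which atLeastTwoLen is a made-up name for the length-based variant:

```haskell
-- Pattern-based predicate from above: inspects at most two list cells.
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _       = False

-- Hypothetical name for the length-based variant: traverses the whole list.
atLeastTwoLen :: [a] -> Bool
atLeastTwoLen xs = length xs >= 2
```

Both agree on finite lists, but only atLeastTwo terminates on an infinite one such as [1 ..], because length never finishes there.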
Yet another option is to use a list comprehension
> [ x | (x:_y:_) <- group $ sort "Mississippi" ]
"ips"
This pattern matches on the lists starting with x and having at least another element _y, combining the filter with taking the head.
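Putting the pieces together, one possible complete definition looks like the sketch below. It assumes the more general Ord a => [a] -> [a] type rather than the String-only signature from the question, so that it also covers the numeric example.

```haskell
import Data.List (group, sort)

-- Sketch: keep one representative of every element that occurs at least twice.
-- The generalised signature is an assumption, not the asker's original type.
repeated :: Ord a => [a] -> [a]
repeated xs = [ x | (x:_:_) <- group (sort xs) ]
```

With this definition, repeated "Mississippi" gives "ips" and repeated [1,2,3,4,2,5,6,7,1] gives [1,2], matching the first two expected results.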
Okay, good start. One immediate problem is that the specification requires the function to work on lists of numbers, but you define it for strings. The list must be sorted, so its element type must be an instance of the Ord type class. Therefore, let’s fix the type signature:
repeated :: Ord a => [a] -> [a]
After calling sort and group, you will have a list of lists, [[a]]. Let’s take your idea of using filter. That works. Your predicate should, as you said, check the length of each list in the list, then compare that length to 1.
Filtering a list of lists gives you a subset, which is another list of lists, of type [[a]]. You need to flatten this list. What you want to do is map each entry in the list of lists to one of its elements. For example, the first. There’s a function in the Prelude to do that.
So, you might fill in the following skeleton:
module Repeated (repeated) where
import Data.List (group, sort)
repeated :: Ord a => [a] -> [a]
repeated = map _
         . filter (\x -> _)
         . group
         . sort
I’ve written this in point-free style with the filtering predicate as a lambda expression, but many other ways to write this are equally good. Find one that you like! (For example, you could also write the filter predicate in point-free style, as a composition of two functions: a comparison on the result of length.)
When you try to compile this, the compiler will tell you that there are two typed holes, the _ entries to the right of the equal signs. It will also tell you the type of the holes. The first hole needs a function that takes a list and gives you back a single element. The second hole needs a Boolean expression using x. Fill these in correctly, and your program will work.
Here are some other approaches, to evaluate @chepner's comment on the solution using group $ sort. (Those solutions look simpler, because some of the complexity is hidden in the library routines.)
While it's true that sorting is O(n lg n), ...
It's not just the sorting but especially the group: that uses span, and both of them build and destroy temporary lists. I.e. they do this:
a linear traversal of an unsorted list will require some other data structure to keep track of all possible duplicates, and lookups in each will add to the space complexity at the very least. While carefully chosen data structures could be used to maintain an overall O(n) running time, the constant would probably make the algorithm slower in practice than the O(n lg n) solution, ...
group/span adds considerably to that complexity, so O(n lg n) is not a correct measure.
while greatly complicating the implementation.
The following all traverse the input list just once. Yes they build auxiliary lists. (Probably a Set would give better performance/quicker lookup.) They maybe look more complex, but to compare apples with apples look also at the code for group/span.
repeated2, repeated3, repeated4 :: Ord a => [a] -> [a]
repeated2/inserter2 builds an auxiliary list of pairs [(a, Bool)], in which the Bool is True if the a appears more than once, False if only once so far.
repeated2 xs = sort $ map fst $ filter snd $ foldr inserter2 [] xs
inserter2 :: Ord a => a -> [(a, Bool)] -> [(a, Bool)]
inserter2 x [] = [(x, False)]
inserter2 x (xb@(x', _): xs)
  | x == x' = (x', True): xs
  | otherwise = xb: inserter2 x xs
repeated3/inserter3 builds an auxiliary list of pairs [(a, Int)], in which the Int counts how many of the a appear. The aux list is sorted anyway, just for the heck of it.
repeated3 xs = map fst $ filter ((> 1).snd) $ foldr inserter3 [] xs
inserter3 :: Ord a => a -> [(a, Int)] -> [(a, Int)]
inserter3 x [] = [(x, 1)]
inserter3 x xss@(xc@(x', c): xs) = case x `compare` x' of
  { LT -> ((x, 1): xss)
  ; EQ -> ((x', c+1): xs)
  ; GT -> (xc: inserter3 x xs)
  }
repeated4/go4 builds an output list of elements known to repeat. It maintains an intermediate list of elements met once (so far) as it traverses the input list. If it meets a repeat: it adds that element to the output list; deletes it from the intermediate list; filters that element out of the tail of the input list.
repeated4 xs = sort $ go4 [] [] xs
go4 :: Ord a => [a] -> [a] -> [a] -> [a]
go4 repeats _ [] = repeats
go4 repeats onces (x: xs) = case findUpd x onces of
  { (True, oncesU) -> go4 (x: repeats) oncesU (filter (/= x) xs)
  ; (False, oncesU) -> go4 repeats oncesU xs
  }
findUpd :: Ord a => a -> [a] -> (Bool, [a])
findUpd x [] = (False, [x])
findUpd x (x': os) | x == x' = (True, os) -- i.e. x' removed
                   | otherwise =
                       let (b, os') = findUpd x os in (b, x': os')
(That last bit of list-fiddling in findUpd is very similar to span.)
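The remark above that a tree-based structure would probably give quicker lookups than an auxiliary list can be sketched with Data.Map from the containers package. This is an assumption-laden illustration rather than one of the numbered variants: repeatedMap is a made-up name, and the choice of Data.Map.Strict over Data.Set is mine, since counting needs a value per key.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical variant: count occurrences in a balanced tree in one pass
-- over the input, then keep the keys seen more than once.
repeatedMap :: Ord a => [a] -> [a]
repeatedMap xs = Map.keys (Map.filter (> 1) counts)
  where
    counts = Map.fromListWith (+) [ (x, 1 :: Int) | x <- xs ]
```

Map.keys returns the keys in ascending order, so no separate sort is needed at the end.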
I am new to Haskell and I am trying to define a function that takes a finite list and creates an infinite list by adding 1 to each element of the list on each repetition. For example, if I have the list [3,4,5], the function will generate the list [3,4,5,4,5,6,5,6,7,...]
I'm thinking of something like a loop that runs forever, adding one to each element on every pass and then appending the result to the list. But the problem is that I don't know exactly how to write it in Haskell!
Quick example in GHCi:
> let f x = x ++ (f $ map (+1) x)
> take 10 $ f [3,4,5]
[3,4,5,4,5,6,5,6,7,6]
Here, we define a recursive function f, that simply appends to the initial list the output of the recursive call with each number incremented by one. We can break it out to examine the function more closely.
GHCi will tell you the type of f if you ask with :t
> :t f
f :: Num b => [b] -> [b]
This means it will work on any list of things with a Num instance (like Int).
So what does f do?
> let f x = x ++ (f $ map (+1) x)
            ^                     -- Start with the initial list we pass in
                      ^^^^^^^^^^  -- Modify each element of that list and increment their values by 1.
                          ^^^^    -- This is where the `Num` constraint comes in
                  ^               -- Recursively call f with the new "initial list"
              ^^                  -- Append the result of calling f recursively to the initial list
The components you need for this are:
map (+ 1) :: Num n => [n] -> [n] to add 1 to each element of the list
iterate :: (a -> a) -> a -> [a] to create an infinite list where each element is a function over the previous element
concat :: [[a]] -> [a] to flatten a list of lists
take 9 :: [a] -> [a] which we will use to get the first 9 elements, for the sake of testing, to avoid trying to print an infinite list
λ> [3,4,5] & iterate (map (+ 1)) & concat & take 9
[3,4,5,4,5,6,5,6,7]
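The two answers describe the same list. As a quick sketch, both can be written as named functions in a file and compared on a prefix. The name grow is chosen here purely for illustration, and (&) has to be imported from Data.Function.

```haskell
import Data.Function ((&))

-- Recursive version from the first answer.
f :: Num b => [b] -> [b]
f x = x ++ f (map (+ 1) x)

-- Pipeline version from the second answer; `grow` is a hypothetical name.
grow :: Num n => [n] -> [n]
grow xs = xs & iterate (map (+ 1)) & concat
```

Taking the first nine elements of grow [3,4,5] gives [3,4,5,4,5,6,5,6,7], and any finite prefix of f [3,4,5] agrees with it. Note that both versions loop without producing output when given an empty list, since there is never an element to emit.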
My goal was to write a function to parse string of nested parentheses into a corresponding list:
parseParens "()" --> []
parseParens "(())" --> [[]]
parseParens "((()()))" --> [[[],[]]]
First off, I discovered that I can't easily specify the type of the return value. I could do something like:
parseParens :: String -> [[[[t]]]]
But how do I say that it's infinitely nested? I guess Haskell doesn't allow that.
My solution
I came up with my own data type:
data InfiniteList = EmptyList | Cons InfiniteList InfiniteList deriving (Show)
And a parser function that uses this:
parseParens :: String -> InfiniteList
parseParens ('(':xs) =
  if remainder == ""
    then result
    else error "Unbalanced parenthesis"
  where (result, remainder) = parseToClose EmptyList xs
parseParens _ = error "Unbalanced parenthesis"
parseToClose :: InfiniteList -> String -> (InfiniteList, String)
parseToClose acc "" = error "Unbalanced parenthesis!"
parseToClose acc (')':xs) = (acc, xs)
parseToClose acc ('(':xs) = parseToClose (concatInfLists acc (Cons result EmptyList)) remainder
  where (result, remainder) = parseToClose EmptyList xs
concatInfLists :: InfiniteList -> InfiniteList -> InfiniteList
concatInfLists EmptyList ys = ys
concatInfLists (Cons x xs) ys = Cons x (concatInfLists xs ys)
Working like so:
parseParens "()" --> EmptyList
parseParens "(())" --> Cons EmptyList EmptyList
parseParens "((()()))" --> Cons (Cons EmptyList (Cons EmptyList EmptyList)) EmptyList
How to improve?
There surely must be a better way to do this. Perhaps there's even a way to use the built-in List data type for this?
Edit: Fixed my mischaracterization of Benjamin's answer.
While the answer in @Benjamin Hodgson's comment:
data Nested a = Flat a | Nested (Nested [a]) deriving (Show)
gives a good way to represent a homogeneous list of arbitrary nesting depth (i.e., sort of like a sum type of [a] plus [[a]] plus [[[a]]] plus all the rest), it seems like an unusual representation for your problem, particularly in a case like:
parseParens "(()(()))"
where the nesting depth of the "child nodes" differs. This would be represented as:
Nested (Nested (Nested (Flat [[],[[]]]))) :: Nested a
so it literally allows you to represent the result of the parse as the desired list, given enough Nested constructors, but it has some odd properties. For example, the innermost empty lists actually have different types: the first is of type [[a]] while the second is of type [a].
As an alternative approach, I think the data type you actually want is probably just:
data Nested = N [Nested] deriving (Show)
where each node N is a (possibly empty) list of nodes. Then, you'll get:
> parseParens "()"
N []
> parseParens "(())"
N [N []]
> parseParens "((()()))"
N [N [N [],N []]]
> parseParens "(()(()))"
N [N [],N [N []]]
If you just ignore the N constructors in these results, the first three of these match your "corresponding list" test cases from the beginning of your question.
As a side note: the Nested data type above is actually a "rose tree" containing no data, equivalent to Tree () using the Tree data type from Data.Tree in the containers package.
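The equivalence with Tree () can be spelled out as a pair of conversions. This is a sketch: toTree and fromTree are names invented here, and the Eq instance is added only so that results can be compared.

```haskell
import Data.Tree (Tree (..))

data Nested = N [Nested] deriving (Show, Eq)

-- Hypothetical helper: every N node becomes a Node labelled with ().
toTree :: Nested -> Tree ()
toTree (N ns) = Node () (map toTree ns)

-- Hypothetical helper: drop the unit labels again.
fromTree :: Tree () -> Nested
fromTree (Node () ts) = N (map fromTree ts)
```

The two functions are inverses of each other, which is what "equivalent" means here: no information is gained or lost in either direction.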
Finally, I can't emphasize enough how helpful it is to learn and use a monadic parsing library, even for simple parsing jobs. Using the parsec library, for example, you can write a parser for your grammar in one line:
nested = N <$> between (char '(') (char ')') (many nested)
My full code for parseParens is:
import Data.Tree
import Text.Parsec
import Text.Parsec.String
data Nested = N [Nested] deriving (Show)
nested :: Parser Nested
nested = N <$> between (char '(') (char ')') (many nested)
parseParens :: String -> Nested
parseParens str =
  let Right result = parse (nested <* eof) "" str
  in result
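The let Right result = ... binding above fails at run time with a pattern-match error when the input is unbalanced. A total variant can return the parse error instead. The sketch below assumes the parsec package is available; parseParensE is a made-up name, and the Eq instance is added only for comparing results.

```haskell
import Text.Parsec (ParseError, between, char, eof, many, parse)
import Text.Parsec.String (Parser)

data Nested = N [Nested] deriving (Show, Eq)

nested :: Parser Nested
nested = N <$> between (char '(') (char ')') (many nested)

-- Hypothetical total variant: Left carries parsec's error for bad input,
-- Right carries the parsed structure.
parseParensE :: String -> Either ParseError Nested
parseParensE = parse (nested <* eof) ""
```

With this, parseParensE "(()(()))" yields a Right holding N [N [],N [N []]], while an input such as "(()" yields a Left instead of crashing.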