Pattern matching and infinite lists

I'm having trouble understanding this simple snippet of code:
-- This works: foldr go1 [] [1..]
-- This doesn't: foldr go2 [] [1..]
go1 a b = a : b
go2 a [] = a : []
go2 a b = a : b
Folding with go1 immediately starts returning values, but go2 appears to be waiting for the end of the list.
Clearly the pattern matching is causing something to be handled differently. Can someone explain what exactly is going on here?

Unlike go1, go2 checks whether or not its second argument is empty. In order to do that the second argument needs to be evaluated, at least enough to determine whether it is empty or not.
So for your call to foldr this means the following:
Both go1 and go2 are first called with two arguments: 1 and the result of foldr go [] [2 ..] (where go stands for go1 or go2, respectively). In the case of go1 the second argument remains untouched, so the result of the foldr is simply 1 : foldr go1 [] [2 ..], and the tail is not evaluated any further until it is accessed.
In the case of go2, however, foldr go2 [] [2 ..] needs to be evaluated to check whether it is empty. To do that, foldr go2 [] [3 ..] needs to be evaluated for the same reason, and so on ad infinitum.
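To make the evaluation order concrete, here is a minimal sketch that uses a hand-written copy of foldr. The name myFoldr is purely illustrative; it mirrors the usual textbook definition, not necessarily the exact source in base. It shows that go1 yields a cons cell before the recursive call is ever inspected:

```haskell
-- Textbook definition of a right fold; `myFoldr` is an illustrative name.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)   -- the recursive call is passed unevaluated

-- Never inspects its second argument, so it returns `a : <thunk>` immediately.
go1 :: a -> [a] -> [a]
go1 a b = a : b

-- Only the first three cons cells of the infinite result are ever built.
firstThree :: [Integer]
firstThree = take 3 (myFoldr go1 [] [1 ..])
```

Swapping go2 into the same expression makes it diverge, because each call has to force the next one before it can return anything.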

To test whether an expression matches a constructor pattern such as [] or (x:xs), you need to evaluate it at least to weak head normal form. So pattern matching forces evaluation.
One common example is the interleave function, which interleaves two lists. It could be defined like
interleave :: [a] -> [a] -> [a]
interleave xs [] = xs
interleave [] ys = ys
interleave (x:xs) (y:ys) = x : y : interleave xs ys
But this function is strict in its second argument. A lazier version is
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs
You can read more here: http://en.wikibooks.org/wiki/Haskell/Laziness
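The difference between the two definitions can be observed directly. The sketch below renames them interleaveStrict and interleaveLazy (names chosen here only so both can live in one file) and passes undefined as the second list:

```haskell
-- First version from above: the clause `xs []` is tried first, so the
-- second argument is forced before anything is returned.
interleaveStrict :: [a] -> [a] -> [a]
interleaveStrict xs [] = xs
interleaveStrict [] ys = ys
interleaveStrict (x:xs) (y:ys) = x : y : interleaveStrict xs ys

-- Lazier version: only the first argument is matched; the lists swap roles
-- on every step.
interleaveLazy :: [a] -> [a] -> [a]
interleaveLazy [] ys = ys
interleaveLazy (x:xs) ys = x : interleaveLazy ys xs

-- The second list is never touched for the first element, so `undefined`
-- is harmless here.
lazyHead :: [Int]
lazyHead = take 1 (interleaveLazy [1, 2, 3] undefined)
```

Evaluating take 1 (interleaveStrict [1, 2, 3] undefined) instead raises an exception, since the first clause has to inspect the second list. On finite, fully defined lists both versions give the same result.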

It is because of laziness. Because of the way go1 and go2 are defined in this example, they behave exactly the same way for b == [], but the compiler doesn't know this.
For go1, no pattern match on b is needed, so the cons cell a : b is returned immediately and b is left unevaluated until something demands it.
go1 a b -> return a : b right away, calculate b only when it is needed
For go2, the compiler doesn't even know which case to match until the value of b is computed.... which will never happen.
go2 a b -> wait for the value of b, pattern match against it, then output a:b

To see the difference try this in GHCi:
> head (go1 1 (error "urk!"))
1
> head (go2 1 (error "urk!"))
*** Exception: urk!
The issue is that go2 will evaluate its second argument before returning the head of the list. That is, go2 is strict in its second argument, unlike go1 which is lazy.
This matters when you fold over infinite lists:
foldr go1 [] [1..] =
go1 1 (go1 2 (go1 3 ( ..... =
1 : (go1 2 (go1 3 ( ..... =
1 : 2 : (go1 3 ( ...
works fine, but
foldr go2 [] [1..] =
go2 1 (go2 2 (go2 3 ( .....
cannot be simplified to 1 : ... since go2 insists on evaluating its second argument, which is another call to go2, which in turn requires its own second argument to be evaluated, which is another ...
Well, you get the point. The second one will not halt.
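If a special case for the empty list is really wanted, one way to keep the function usable on infinite lists is to emit the cons cell first and inspect the second argument only inside the tail. The name go3 is hypothetical; this is a sketch, not part of the original question:

```haskell
-- Returns `a : <thunk>` right away; the match on `b` is deferred until
-- the tail of the result is demanded.
go3 :: a -> [a] -> [a]
go3 a b = a : case b of
                [] -> []
                _  -> b

-- Works on an infinite list because each step only forces the next cons cell.
prefix :: [Integer]
prefix = take 3 (foldr go3 [] [1 ..])
```

In GHCi, head (go3 1 (error "urk!")) returns 1 just like go1 does, because the error value is only reached when the tail is evaluated.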

Related

Checking whether element is in list or not

I have just started writing my first few programs in Haskell and I'm struggling a lot - so please excuse errors galore. I found this practice problem online and it wants me to take in a list, an int, and return a bool. I should check if the int is to be found within the list or not and return true or false, respectively.
This is what I have:
import Data.List
elem':: (Eq a) => a -> [a] -> Bool
    elem' x [x]
        | x == [x:xs] = True
        | length [x] == 1 = False
        | otherwise = elem' tail [x]
The compiler returns
Illegal type signature [...] Type signatures are only allowed in patterns with ScopedTypeVariables
which I cannot really make sense of yet. I've found some people saying that this error occurs when using tabs instead of spaces, yet I solely used spaces. I am grateful for any advice!

First you have a simple indentation problem. Your definition is parsed as
elem':: (Eq a) => a -> [a] -> Bool elem' x [x]
    | x == [x:xs] = True
    | length [x] == 1 = False
    | otherwise = elem' tail [x]
I'm a bit surprised that this isn't reported as a parse error, but basically it is one.
What you need to do is, indent all function clauses to the same level as the signature (i.e. in this case, not indented at all).
elem':: (Eq a) => a -> [a] -> Bool
elem' x [x]
  | x == [x:xs] = True
  | length [x] == 1 = False
  | otherwise = elem' tail [x]
This is still very wrong though. First, you're matching twice on x. This is not possible in Haskell. OTOH there's no variable xs in scope. If there were, it would have to have type [a], and then [x:xs] would have type [[a]]: a nested list. But then comparing it to x, which is not even a simple list, doesn't make any sense. length [x] is always 1, regardless of what x is. Similarly, tail [x] is always the empty list (but that's not even what's used here; you've actually written that the tail function should be passed as an argument to elem').
Perhaps the confusion comes from the fact that [ ] can mean two different things: on the type level, it means you're talking about lists of elements of the enclosed type, i.e. [a] is the type of lists whose elements have type a. By contrast, on the value level, [x] is specifically the list that contains only a single element, namely x.
You should make sure you understand how pattern matching works. Your clause would have been legal if you had two different variables in it. Then, matching [y] would mean this clause only accepts lists with exactly one element, and y will then be the value of that single element. In this case, the correct implementation would be
elem' x [y] = x==y
(Alternatively, you could have guards here yielding True and False, but that's just redundant work.)
Actually though there's no good reason to have a clause for exactly one element. All you need is a clause for zero elements
elem' x [] = ...
and a clause for at least one element:
elem' x (y:ys)
| ...
| ...
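For reference, one possible way to fill in the two clauses is sketched below. It is only one of several reasonable completions (the guards could equally be written as x == y || elem' x ys):

```haskell
-- Membership test by structural recursion on the list.
elem' :: Eq a => a -> [a] -> Bool
elem' _ [] = False                 -- nothing can be found in an empty list
elem' x (y:ys)
  | x == y    = True               -- found it at the head
  | otherwise = elem' x ys         -- keep looking in the tail
```

For example, elem' 3 [1, 2, 3] evaluates to True and elem' 4 [1, 2, 3] to False.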

Haskell: Why does (x:xs) match the list with only one element?

Given this definition of a function f:
f :: [Int] -> [Int]
f [] = []
f (x:xs) = x:[]
I would assume that a call such as
f [1]
would not match, because the pattern (x:xs) only matches, if there are more elements xs after the x in the list, which is not the case for the list [1]. Or is it?

If you write [x], a list with one element, this is short for (x : []), or even more verbose (:) x []. So it is a "cons" ((:)) with x as element, and the empty list as tail.
So your function f (x:xs) will indeed match a list with one (or more) elements. For a list with one element, x will be the element, and xs an empty list.
would not match, because the pattern (x:xs) only matches, if there are more elements xs after the x in the list, which is not the case for the list [1].
No, the pattern (x:xs) matches every non-empty list, with x the first element of the list and xs a (possibly empty) list of the remaining elements.
If you want to match only lists with, for example, two or more elements, you can use:
-- two or more elements
f (x1 : x2 : xs) = …
Here x1 and x2 will match the first and second item of the list respectively, and xs is a list that contains the remaining elements.
EDIT: to answer your comments:
I wonder why my function definition does even compile in the first place, because the function type is [Int] -> [Int], so if I give it the empty list, then that is not [Int] as a result, is it?
The empty list [] is one of the data constructors of the [a] type, so that means that [] has type [] :: [a]. It can match the type variable a with Int, and thus [] can have type [] :: [Int].
Second then, how do I match a list with exactly two elements? [a, b]?
You can match such list with:
f (a : b : []) = …
or you can match it with:
f [a, b] = …
The two are equivalent. [a, b] is syntactic sugar: the compiler desugars it to (a : b : []), but for humans it is of course more convenient to work with [a, b].
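The patterns discussed in this answer can be put side by side in one small function. The name describe and the result strings are made up for this illustration:

```haskell
-- Classifies a list by its length using only list patterns.
describe :: [a] -> String
describe []      = "empty"
describe [_]     = "one element"      -- same as (_ : [])
describe [_, _]  = "two elements"     -- same as (_ : _ : [])
describe (_:_:_) = "three or more"    -- reached only after the clauses above fail

-- Bracket notation and cons notation build the same value.
sameList :: Bool
sameList = [1, 2] == (1 : 2 : ([] :: [Int]))
```

Note that (_:_:_) on its own matches any list with at least two elements; it only means "three or more" here because the two-element case is handled by an earlier clause.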

Haskell self-referential List termination

EDIT: see this followup question that simplifies the problem I am trying to identify here, and asks for input on a GHC modification proposal.
So I was trying to write a generic breadth-first search function and came up with the following:
bfs :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs predf expandf xs = find predf bfsList
  where bfsList = xs ++ concatMap expandf bfsList
which I thought was pretty elegant, however in the does-not-exist case it blocks forever.
After all the terms have been expanded to [], concatMap will never return another item, so concatMap is blocking waiting for another item from itself? Could Haskell be made smart enough to realize the list generation is blocked reading the self-reference and terminate the list?
The best replacement I've been able to come up with isn't quite as elegant, since I have to handle the termination case myself:
where bfsList = concat.takeWhile (not.null) $ iterate (concatMap expandf) xs
For concrete examples, the first search terminates with success, and the second one blocks:
bfs (==3) (\x -> if x<1 then [] else [x/2, x/5]) [5, 3*2**8]
bfs (==3) (\x -> if x<1 then [] else [x/2, x/5]) [5, 2**8]

Edited to add a note to explain my bfs' solution below.
The way your question is phrased ("could Haskell be made smart enough"), it sounds like you think the correct value for a computation like:
bfs (\x -> False) (\x -> []) []
given your original definition of bfs should be Nothing, and Haskell is just failing to find the correct answer.
However, the correct value for the above computation is bottom. Substituting the definition of bfs (and simplifying the [] ++ expression), the above computation is equal to:
find (\x -> False) bfsList
  where bfsList = concatMap (\x -> []) bfsList
Evaluating find requires determining if bfsList is empty or not, so it must be forced to weak head normal form. This forcing requires evaluating the concatMap expression, which also must determine if bfsList is empty or not, forcing it to WHNF. This forcing loop implies bfsList is bottom, and therefore so is find.
Haskell could be smarter in detecting the loop and giving an error, but it would be incorrect to return [].
Ultimately, this is the same thing that happens with:
foo = case foo of [] -> []
which also loops infinitely. Haskell's semantics imply that this case construct must force foo, and forcing foo requires forcing foo, so the result is bottom. It's true that if we considered this definition an equation, then substituting foo = [] would "satisfy" it, but that's not how Haskell semantics work, for the same reason that:
bar = bar
does not have value 1 or "awesome", even though these values satisfy it as an "equation".
So, the answer to your question is, no, this behavior couldn't be changed so as to return an empty list without fundamentally changing Haskell semantics.
Also, as an alternative that looks pretty slick -- even with its explicit termination condition -- maybe consider:
bfs' :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs' predf expandf = look
  where look [] = Nothing
        look xs = find predf xs <|> look (concatMap expandf xs)
This uses the Alternative instance for Maybe, which is really very straightforward:
Just x <|> ... -- yields `Just x`
Nothing <|> Just y -- yields `Just y`
Nothing <|> Nothing -- yields `Nothing` (doesn't happen above)
so look checks the current set of values xs with find, and if it fails and returns Nothing, it recursively looks in their expansions.
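A self-contained version of bfs' with the imports it needs is sketched below, together with the two searches from the question. The helper names expand, found and missing are introduced here for illustration only:

```haskell
import Control.Applicative ((<|>))
import Data.List (find)

-- Level-by-level search: test the current level, then recurse on its expansion.
bfs' :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs' predf expandf = look
  where
    look [] = Nothing                       -- nothing left to expand: give up
    look xs = find predf xs <|> look (concatMap expandf xs)

-- Expansion function from the question.
expand :: Double -> [Double]
expand x = if x < 1 then [] else [x / 2, x / 5]

found, missing :: Maybe Double
found   = bfs' (== 3) expand [5, 3 * 2 ** 8]   -- reaches 3.0 after eight halvings of 768
missing = bfs' (== 3) expand [5, 2 ** 8]       -- every branch eventually drops below 1
```

Here found evaluates to Just 3.0 and missing to Nothing, so the search that blocked with the self-referential bfsList now terminates.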
As a silly example that makes the termination condition look less explicit, here's its double-monad (Maybe in implicit Reader) version using listToMaybe as the terminator! (Not recommended in real code.)
bfs'' :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs'' predf expandf = look
  where look = listToMaybe *>* find predf *|* (look . concatMap expandf)
        (*>*) = liftM2 (>>)
        (*|*) = liftM2 (<|>)
        infixl 1 *>*
        infixl 3 *|*
How does this work? Well, it's a joke. As a hint, the definition of look is the same as:
  where look xs = listToMaybe xs >>
                  (find predf xs <|> look (concatMap expandf xs))

We produce the results list (queue) in steps. On each step we consume what we have produced on the previous step. When the last expansion step added nothing, we stop:
bfs :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs predf expandf xs = find predf queue
  where
    queue = xs ++ gen (length xs) queue                -- start the queue with `xs`, and
    gen 0 _ = []                                       -- when nothing in queue, stop;
    gen n q = let next = concatMap expandf (take n q)  -- take n elements from queue,
              in next ++                               -- process, enqueue the results,
                 gen (length next) (drop n q)          -- advance by `n` and continue
Thus we get
~> bfs (==3) (\x -> if x<1 then [] else [x/2, x/5]) [5, 3*2**8]
Just 3.0
~> bfs (==3) (\x -> if x<1 then [] else [x/2, x/5]) [5, 2**8]
Nothing
One potentially serious flaw in this solution is that if any expandf step produces an infinite list of results, it will get stuck calculating its length, quite needlessly.
In general, just introduce a counter and increment it by the length of solutions produced at each expansion step (length . concatMap expandf or something), decrementing by the amount that was consumed. When it reaches 0, do not attempt to consume anything anymore because there's nothing to consume at that point, and you should instead terminate.
This counter serves in effect as a pointer back into the queue being constructed. A value of n indicates that the place where the next result will be placed is n notches ahead of the place in the list from which the input is taken. 1 thus means that the next result is placed directly after the input value.
The following code can be found in Wikipedia's article about corecursion (search for "corecursive queue"):
data Tree a b = Leaf a | Branch b (Tree a b) (Tree a b)
bftrav :: Tree a b -> [Tree a b]
bftrav tree = queue
  where
    queue = tree : gen 1 queue                         -- have one value in queue from the start
    gen 0 _ = []
    gen len (Leaf _ : s) = gen (len-1) s               -- consumed one, produced none
    gen len (Branch _ l r : s) = l : r : gen (len+1) s -- consumed one, produced two
This technique is natural in Prolog with top-down list instantiation and logical variables which can be explicitly in a not-yet-set state. See also tail recursion modulo cons.
gen in bfs can be re-written to be more incremental, which is usually a good thing to have:
    gen 0 _ = []
    gen n (y:ys) = let next = expandf y
                   in next ++ gen (n - 1 + length next) ys
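Put together, the incremental variant might look like the sketch below. The names bfsQ, expandQ, hit and miss are hypothetical, and the final gen clause is an addition made here only so that the pattern match is total:

```haskell
import Data.List (find)

-- Breadth-first search over a self-referential queue with an explicit counter.
bfsQ :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfsQ predf expandf xs = find predf queue
  where
    queue = xs ++ gen (length xs) queue
    -- n counts the elements already in the queue that are not yet expanded
    gen 0 _      = []                        -- nothing left to consume: end the list
    gen n (y:ys) = let next = expandf y      -- expand one element at a time
                   in next ++ gen (n - 1 + length next) ys
    gen _ []     = []                        -- not reached while n is counted correctly

-- Expansion function from the question.
expandQ :: Double -> [Double]
expandQ x = if x < 1 then [] else [x / 2, x / 5]

hit, miss :: Maybe Double
hit  = bfsQ (== 3) expandQ [5, 3 * 2 ** 8]
miss = bfsQ (== 3) expandQ [5, 2 ** 8]
```

As with the non-incremental version, hit is Just 3.0 and miss is Nothing. The same caveat applies: length next still diverges if a single expansion is infinite.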

bfsList is defined recursively, which is not in itself a problem in Haskell. It does, however, produce an infinite list, which, again, isn't in itself a problem, because Haskell is lazily evaluated.
As long as find eventually finds what it's looking for, it's not an issue that there's still an infinity of elements, because at that point evaluation stops (or, rather, moves on to do other things).
AFAICT, the problem in the second case is that the predicate is never matched, so bfsList just keeps producing new elements, and find keeps on looking.
After all the terms have been expanded to [] concatMap will never return another item
Are you sure that's the correct diagnosis? As far as I can tell, with the lambda expressions supplied above, each input element always expands to two new elements - never to []. The list is, however, infinite, so if the predicate goes unmatched, the function will evaluate forever.
Could Haskell be made smart enough to realize the list generation is blocked reading the self-reference and terminate the list?
It'd be nice if there was a general-purpose algorithm to determine whether or not a computation would eventually complete. Alas, as both Turing and Church (independently of each other) proved in 1936, such an algorithm can't exist. This is also known as the Halting problem. I'm not a mathematician, though, so I may be wrong, but I think it applies here as well...
The best replacement I've been able to come up with isn't quite as elegant
Not sure about that one... If I try to use it instead of the other definition of bfsList, it doesn't compile... Still, I don't think the problem is the empty list.

weaving lists in haskell

I'm trying to create a function to weave two lists together for example
[1,3,5] and [2,4] -> [1,2,3,4,5]
I get the basic principle of what I have to do and check for but I'm running into the problem that the required type
interleave :: ([a],[a]) -> [a]
is giving errors about different number of arguments. This is the version that's given me the least amount of errors so far
interleave ([],[]) = []
interleave (xs,[]) = [xs]
interleave ([],ys) = [ys]
interleave (x:xs) (y:ys) = x : y : interleave xs ys
I've tried messing with the arguments and outputs a few times but I'm new to haskell syntax so I don't really see where I'm going wrong
PART 2: Also, I have a testing file to make sure the functions are correct, so if I'm still having trouble with that file after this (I was getting similar input/output mismatches there, which led me to change to what I have now), I'll probably post that code too for help.

Unless you have a requirement to take only a single parameter, I think it would make more sense to change all cases to take two parameters:
interleave [] [] = []
interleave xs [] = xs
interleave [] ys = ys
interleave (x:xs) (y:ys) = x : y : interleave xs ys
Note that in the case interleave xs [], you originally returned [xs]. This is a list containing the list named xs. Instead you should return xs directly. Similarly for the case involving ys.

The problem is that your type signature, and your three cases, are all defining a function of one parameter (of type ([a], [a])), but then your fourth case is trying to define a function of two parameters (the first one being x:xs, the second y:ys).
The fix is to change the fourth case to also be over a single pair parameter:
interleave (x:xs, y:ys) = x : y : interleave (xs, ys)
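Combining this fix with the [xs] and [ys] correction from the other answer, a complete pair-taking version could look like the following sketch. Folding the ([], []) case into the first clause is a simplification made here, not something the question requires:

```haskell
-- Takes a single pair of lists, as the required type signature demands.
interleave :: ([a], [a]) -> [a]
interleave ([], ys)     = ys                 -- also covers ([], [])
interleave (xs, [])     = xs                 -- return the list itself, not [xs]
interleave (x:xs, y:ys) = x : y : interleave (xs, ys)
```

With this definition, interleave ([1,3,5], [2,4]) gives [1,2,3,4,5].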

Haskell - List pattern matching on multiple parameters? (cannot construct the infinite type)

I'm having trouble using list pattern with multiple parameters. For example, trying to define:
somefunction (x:xs) (y:ys) = x:[y]
results in Occurs check: cannot construct the infinite type: t0 = [t0].
Basically, I want to take two lists as parameters to a function and manipulate each of them using the (x:xs) pattern matching approach. Why is this wrong and how can I do it right? Thank you much!
EDIT: Update with more code as suggested was needed in answers.
somefunction a [] = [a]:[]
somefunction [] b = [b]:[]
somefunction (x:xs) (y:ys) = x:[y]
EDIT 2: Missed an important update. The error I'm getting with the above code is Occurs check: cannot construct the infinite type: t0 = [[t0]]. I think I understand the problem now.

Your function snippet is perfectly sound:
(! 514)-> ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> let f (x:xs) (y:ys) = x:[y]
Prelude> :type f
f :: [a] -> [a] -> [a]
But the context contradicts that type, and type inference gives you that error. For instance, I can create a context that will give this error:
Prelude> let g xs ys = xs : ys
Prelude> :type g
g :: a -> [a] -> [a]
And then if I combine f and g like below, then I get your error:
Prelude> let z x y = g x (f x y)
<interactive>:7:20:
Occurs check: cannot construct the infinite type: a0 = [a0]
In the first argument of `f', namely `x'
In the second argument of `g', namely `(f x y)'
In the expression: g x (f x y)
Prelude>
To understand your error properly, you will need to examine (or post) enough context.

The problem is with all 3 lines taken together:
somefunction a [] = [a]:[]
somefunction [] b = [b]:[]
somefunction (x:xs) (y:ys) = x:[y]
None of them are incorrect taken on their own. The problem is that the three equations are inconsistent about the return type of somefunction.
From the last equation, we can see that both arguments are lists (since you pattern match on them using the list constructor :).
From the last equation, we can see that the return type is a list whose elements must be the same type as the elements of the argument lists (which must also both be the same type), since the return value is x:[y] (which is more often written [x, y]; just the list containing only the two elements x and y) and x and y were elements of the argument lists. So if x has type t0, the arguments to somefunction both have type [t0] and the return type is [t0].
Now try to apply those facts to the first equation. a must be a list. So [a] (the list containing exactly one element a) must be a list of lists. And then [a]:[] (the list whose first element is [a] and whose tail is empty - also written [[a]]) must be a list of lists of lists! If the parameter a has type [t0] (to match the type we figured out from looking at the last equation), then [a] has type [[t0]] and [a]:[] (or [[a]]) has type [[[t0]]], which is the return type we get from this equation.
To reconcile what we learned from those two equations we need to find some type expression to use for t0 such that [t0] = [[[t0]]], which also requires that t0 = [[t0]]. This is impossible, which is what the error message Occurs check: cannot construct the infinite type: t0 = [[t0]] was about.
If your intention was to return one of the parameters as-is when the other one is empty, then you need something more like:
somefunction a [] = a
somefunction [] b = b
somefunction (x:xs) (y:ys) = [x, y]
Or it's possible that the first two equations were correct (you intend to return a list of lists of lists?), in which case the last one needs to be modified. Without knowing what you wanted the function to do, I can't say.
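Writing the intended type down explicitly is a useful habit here: with a signature in place, GHC checks each equation against it and tends to point at the equation that disagrees instead of reporting an occurs check for the whole definition. The sketch below assumes the first interpretation (return the other list unchanged when one is empty):

```haskell
-- Assumed intent: both arguments and the result are lists of the same type.
somefunction :: [a] -> [a] -> [a]
somefunction a [] = a                     -- second list empty: return the first as-is
somefunction [] b = b                     -- first list empty: return the second as-is
somefunction (x:_) (y:_) = [x, y]         -- otherwise pair up the two heads
```

For instance, somefunction [1, 2] [3, 4] yields [1, 3]. Replacing the first right-hand side with [a]:[] under this signature makes the compiler reject that equation.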

Maybe you want to write:
somefunction xs [] = xs
somefunction [] ys = ys
somefunction (x:xs) (y:ys) = x : y : []
You have extra brackets. Also, an expression like x : y has no [] at the end, so the compiler assumes y is already a list.
