Haskell - List comprehension with infinite lists - list
This is the piece of code
primepowers n = foldr merge [] [ map (^i) primes | i <- [1..n] ] -- (1)
merge :: (Ord t) => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
  | x < y = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
which is equal to the mathematical expression {p^i | p is prime, 1 <= i <= n}.
primes is an infinite list of prime numbers. What I am interested in is the evaluation of (1).
These are my thoughts:
If we first just look at [ map (^i) primes | i <- [1..3] ], this would return an infinite list of [[2,3,5,7,11,...],...]. But as we know the list of p^1 (p prime) never ends, Haskell will never evaluate [p^2] and [p^3]. Is this just because it is an infinite list, or because of lazy evaluation?
Let's carry on with merge:
merge will return [2,3,5,7,11,...]. Is that because we still have an infinite list, or for some other reason?
Now to foldr:
foldr starts evaluating from the back. Here we specifically ask for the rightmost element, which is an infinite list [p^3].
So the evaluation would be like this
merge (merge (merge [] [p^3]) [p^2]) [p^1]
But we should not forget that these lists are infinite, so how does Haskell deal with that fact?
Could anyone explain the evaluation process of the above function to me?
The trick is to define it as
primepowers n = foldr (\(x:xs) r-> x:merge xs r)
[] [ map (^i) primes | i <- [1..n] ]
(as seen in Richard Bird's code in the article O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes").
The lists to the right of the current one all start with bigger numbers, so there's no chance of their merged list ever producing a value smaller than or equal to the current list's head. That head can therefore be produced unconditionally.
That way it will also explore only as many of the internal streams as needed:
GHCi> let pps_list = [ map (^i) primes | i <- [1..42] ]
GHCi> :sprint pps_list
pps_list = _
GHCi> take 20 $ foldr (\(x:xs) r-> x:merge xs r) [] pps_list
[2,3,4,5,7,8,9,11,13,16,17,19,23,25,27,29,31,32,37,41]
GHCi> :sprint pps_list
pps_list = (2 : 3 : 5 : 7 : 11 : 13 : 17 : 19 : 23 : 29 : 31 : 37 :
41 : _) :
(4 : 9 : 25 : 49 : _) : (8 : 27 : 125 : _) : (16 : 81 : _) :
(32 : 243 : _) : (64 : _) : _
To your question per se, foldr f z [a,b,c,...,n] = f a (f b (f c (... (f n z)...))) so (writing ps_n for map (^n) primes), your expression is equivalent to
merge ps (merge ps_2 (merge ps_3 (... (merge ps_n [])...)))
= merge ps r
where r = merge ps_2 (merge ps_3 (... (merge ps_n [])...))
because you use merge as your combining function. Notice that the leftmost merge springs into action first, while the expression for r isn't even built yet (because its value wasn't yet needed - Haskell's evaluation is by need.)
Now, this merge demands the head value of both its first and second argument (as written, it actually checks the second argument first, for being []).
The first argument isn't the problem, but the second is the result of folding all the rest of the lists ("r" in foldr's combining function stands for "recursive result"). Thus, each element in the list will be visited and its head element forced - and all this just to produce one very first value, the head of the result list, by the leftmost merge call...
In my code, the combining function does not at first demand the head of its second argument list. That's what limits its exploration of the whole list of lists, makes it more economical in its demands, and thus more productive (it will even work if you just omit the n altogether).
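A minimal, self-contained sketch of that last remark (omitting n altogether). The primes definition is an assumption, since the question never shows it: it is a simple, slow trial-division stream chosen only for brevity. The name allPrimePowers and the extra case for an empty inner list are also additions made for this sketch.

```haskell
-- Assumed definition: a simple (slow) stream of primes, for illustration only.
primes :: [Integer]
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
        sieve []     = []

merge :: Ord t => [t] -> [t] -> [t]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Hypothetical name: all prime powers, folding over infinitely many infinite lists.
-- mg emits the head of the current list before the rest of the fold is touched.
allPrimePowers :: [Integer]
allPrimePowers = foldr mg [] [ map (^i) primes | i <- [1 :: Int ..] ]
  where mg (x:xs) r = x : merge xs r
        mg []     r = r
```

With these definitions, take 20 allPrimePowers should give the same twenty values as the GHCi session above, even though the outer list never ends.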
Your example Haskell expression [ map (^i) primes | i <- [1..3] ] returns a finite list of length 3, each element being an infinite list: [[2,3,5,7,11,...],[4,9,25,...],[8,27,125,...]], so foldr has no problem translating it into merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,..] [])):
foldr merge [] [ map (^i) primes | i <- [1..3] ]
= merge [2,3,5,7,11,...] (foldr merge [] [ map (^i) primes | i <- [2..3] ])
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (foldr merge [] [ map (^i) primes | i <- [3..3] ]))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,..] (foldr merge [] [])))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,..] []))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] [8,27,125,..])
= merge [2,3,5,7,11,...] (4:merge [9,25,...] [8,27,125,..])
= 2:merge [3,5,7,11,...] (4:merge [9,25,...] [8,27,125,..])
= 2:3:merge [5,7,11,...] (4:merge [9,25,...] [8,27,125,..])
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] [8,27,125,..])
= 2:3:4:merge [5,7,11,...] (8:merge [9,25,...] [27,125,..])
= 2:3:4:5:merge [7,11,...] (8:merge [9,25,...] [27,125,..])
.....
As you can see, the rightmost inner list is examined first, because merge is strict in (i.e. demands to know) both its arguments, as explained above. For [ map (^i) primes | i <- [1..42] ] it would expand all 42 of them, and examine the heads of all of them, before producing even the head element of the result.
With the tweaked function, mg (x:xs) r = x:merge xs r, the evaluation proceeds as
foldr mg [] [ map (^i) primes | i <- [1..3] ]
= mg [2,3,5,7,11,...] (foldr mg [] [ map (^i) primes | i <- [2..3] ])
= 2:merge [3,5,7,11,...] (foldr mg [] [ map (^i) primes | i <- [2..3] ])
= 2:merge [3,5,7,11,...] (mg [4,9,25,...]
(foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:merge [3,5,7,11,...] (4:merge [9,25,...]
(foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:merge [5,7,11,...] (4:merge [9,25,...]
(foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...]
(foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...]
(mg [8,27,125,..] (foldr mg [] [])))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...]
(8:merge [27,125,..] (foldr mg [] [])))
= 2:3:4:merge [5,7,11,...] (8:merge [9,25,...]
(merge [27,125,..] (foldr mg [] [])))
= 2:3:4:5:merge [7,11,...] (8:merge [9,25,...]
(merge [27,125,..] (foldr mg [] [])))
.....
so it starts producing the results much sooner, without expanding much of the inner lists. This just follows the definition of foldr,
foldr f z (x:xs) = f x (foldr f z xs)
where, because of the laziness, (foldr f z xs) is not evaluated right away if f does not demand its value (or a part of it, like its head).
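The difference in demand can be checked empirically. The sketch below cuts the outer list off with undefined after its first element; the names truncated, lazyHead and strictHead are made up for this illustration, and mg gets an extra case for an empty inner list that the original one-liner does not have.

```haskell
import Control.Exception (SomeException, evaluate, try)

merge :: Ord t => [t] -> [t] -> [t]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- The tweaked combining function, plus a case for empty inner lists.
mg :: Ord t => [t] -> [t] -> [t]
mg (x:xs) r = x : merge xs r
mg []     r = r

-- Hypothetical test input: the outer list is undefined after its first element.
truncated :: [[Integer]]
truncated = [2,3,5] : undefined

-- mg never looks past the first inner list to produce the head.
lazyHead :: Integer
lazyHead = head (foldr mg [] truncated)

-- merge pattern-matches on its second argument first, so it hits undefined.
strictHead :: IO (Either SomeException Integer)
strictHead = try (evaluate (head (foldr merge [] truncated)))
```

Here lazyHead evaluates to 2, while strictHead yields a Left carrying the exception raised by undefined.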
The lists being merged are infinite, but that doesn't matter.
What matters is that you only have a finite number of lists being merged, and so to compute the next element of the merge you only need to perform a finite number of comparisons.
To compute the head of merge xs ys you only need to compute the head of xs and the head of ys. So by induction, if you have a finite tree of merge operations, you can compute the head of the overall merge in finite time.
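As a small illustration of the induction argument, here is a sketch of a finite tree of two merges over three infinite lists. The name threeWay and the particular input lists are chosen here only as an example.

```haskell
merge :: Ord t => [t] -> [t] -> [t]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Two nested merges over three infinite lists. Each output element needs
-- at most two comparisons, however long the inputs are.
threeWay :: [Integer]
threeWay = merge [1,3..] (merge [2,4..] [10,20..])
```

Since this merge does not drop duplicates, take 12 threeWay is [1,2,3,4,5,6,7,8,9,10,10,11].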
[map (^i) primes | i <- [1..3]] returns just a thunk. Nothing is evaluated for now. You could try this:
xs = [x | x <- [1..], error ""]
main = print $ const 0 xs
This program prints 0, so error "" wasn't evaluated here.
You can think about foldr being defined like this:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
Then
primepowers n = foldr merge [] [map (^i) primes | i <- [1..3]]
evaluates like this (after it was forced):
merge thunk1 (merge thunk2 (merge thunk3 []))
where thunkn is a suspended computation of the primes raised to the n-th power. Now the first merge forces evaluation of thunk1 and merge thunk2 (merge thunk3 []), which are evaluated to weak head normal form (WHNF). Forcing merge thunk2 (merge thunk3 []) forces thunk2 and merge thunk3 []. merge thunk3 [] reduces to thunk3, and then thunk3 is forced. So the expression becomes
merge (2 : thunk1') (merge (4 : thunk2') (8 : thunk3'))
Which, due to the definition of merge, reduces to
merge (2 : thunk1') (4 : merge thunk2' (8 : thunk3'))
And again:
2 : merge thunk1' (4 : merge thunk2' (8 : thunk3'))
Now merge forces thunk1', but not the rest of the expression, because it's already in WHNF:
2 : merge (3 : thunk1'') (4 : merge thunk2' (8 : thunk3'))
2 : 3 : merge thunk1'' (4 : merge thunk2' (8 : thunk3'))
2 : 3 : merge (5 : thunk1''') (4 : merge thunk2' (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (merge thunk2' (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (merge (9 : thunk2'') (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (8 : merge (9 : thunk2'') thunk3')
2 : 3 : 4 : 5 : merge thunk1''' (8 : merge (9 : thunk2'') thunk3')
...
Intuitively, only the values that are needed get evaluated. Read this for a better explanation.
You can also merge an infinite list of infinite lists. The simplest way would be:
interleave (x:xs) ys = x : interleave ys xs
primepowers = foldr1 interleave [map (^i) primes | i <- [1..]]
The interleave function interleaves two infinite lists, for example, interleave [1,3..] [2,4..] is equal to [1..]. So take 20 primepowers gives you [2,4,3,8,5,9,7,16,11,25,13,27,17,49,19,32,23,121,29,125]. But this list is unordered, we can do better.
[map (^i) primes | i <- [1..]] reduces to
[[2,3,5...]
,[4,9,25...]
,[8,27,125...]
...
]
We have the precondition that every n-th list contains elements that are smaller than the head of the (n+1)-th list. We can extract such elements from the first list (2 and 3 are smaller than 4), and now we have this:
[[5,7,11...]
,[4,9,25...]
,[8,27,125...]
...
]
The precondition doesn't hold, so we must fix this and swap the first list and the second:
[[4,9,25...]
,[5,7,11...]
,[8,27,125...]
...
]
Now we extract 4 and swap the first list and the second:
[[5,7,11...]
,[9,25,49...]
,[8,27,125...]
...
]
But the precondition doesn't hold, since there are elements in the second list (9) that are not smaller than the head of the third list (8). So we do the same trick again:
[[5,7,11...]
,[8,27,125...]
,[9,25,49...]
...
]
And now we can extract elements again. Repeating the process infinitely gives us an ordered list of prime powers. Here is the code:
swap xs@(x:_) xss = xss1 ++ xs : xss2 where
  (xss1, xss2) = span ((< x) . head) xss
mergeAll (xs:xss@((x:_):_)) = xs1 ++ mergeAll (swap xs2 xss) where
  (xs1, xs2) = span (< x) xs
primepowers = mergeAll [map (^i) primes | i <- [1..]]
For example, take 20 primepowers is equal to [2,3,4,5,7,8,9,11,13,16,17,19,23,25,27,29,31,32,37,41].
This is probably not the nicest way to obtain an ordered list of prime powers, but it's a fairly easy one.
EDIT
Look at Will Ness's answer for a better solution, which is both easier and nicer.
It is true that merge needs to completely scan its whole input lists to produce its whole output. However, the key point is that every element in the output depends only on finite prefixes of the input lists.
For instance, consider take 10 (map (*2) [1..]). To compute the first 10 elements, you do not need to examine the whole [1..]. Indeed, map will not scan the whole infinite list and "after that" start returning the output: if it behaved like that, it would simply hang on infinite lists. This "streaming" property of map is given by laziness and the map definition
map f [] = []
map f (x:xs) = f x : map f xs
The last line reads "yield f x, and then proceed with the rest", so the caller gets to inspect f x before map produces its whole output. By comparison
map f xs = go xs []
  where go [] acc = acc
        go (x:xs) acc = go xs (acc ++ [f x])
would be another definition of map which would start generating its output only after its input has been consumed. It is equivalent on finite lists (performance aside), but not equivalent on infinite ones (hangs on infinite lists).
If you want to empirically test that your merge is indeed working lazily, try this:
take 10 $ merge (10:20:30:error "end of 1") (5:15:25:35:error "end of 2")
Feel free to play by changing the constants. You will see an exception being printed on screen, but only after a few list elements have already been produced by merge.
Related
Return a list of all the even elements in the original list - how can I write this function without using recursion?
This function takes a list and returns a list of all the even elements from the original list. I'm trying to figure out how to do this using foldl, foldr, or map instead but I can't seem to figure it out.
fun evens [] = []
  | evens (x::xs) = if x mod 2 = 0 then x::evens(xs) else evens(xs);
Since you want fewer elements than you start with, map is out. If you copy a list using both foldl and foldr,
- foldl (op ::) [] [1,2,3];
val it = [3,2,1] : int list
- foldr (op ::) [] [1,2,3];
val it = [1,2,3] : int list
you see that foldl reverses it, so foldr is a pretty natural choice if you want to maintain the order. Now all you need is a function that conses a number to a list if it is even, and just produces the list otherwise. Like this one:
fun cons_if_even (x, xs) = if x mod 2 = 0 then x::xs else xs
And then you have
fun evens xs = foldr cons_if_even [] xs
or inlined,
fun evens xs = foldr (fn (y, ys) => if y mod 2 = 0 then y::ys else ys) [] xs
It's more "natural" to use the standard filtering function, though:
fun evens xs = filter (fn x => x mod 2 = 0) xs
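For comparison, the same fold can be sketched in Haskell rather than SML. The names evens and evens' are chosen here and are not part of the original answer.

```haskell
-- foldr keeps the original order; the step function conses x only when it is even.
evens :: [Int] -> [Int]
evens = foldr (\x acc -> if even x then x : acc else acc) []

-- The filter-based equivalent.
evens' :: [Int] -> [Int]
evens' = filter even
```

Both versions map [1..10] to [2,4,6,8,10].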
How to fix the error ('cannot construct the infinite type') in my code and how to make my code work
Basically I'm trying to write a function that is given a list and a number and has to split the list into lists of the same size as the given number; the last chunk may be shorter than that number.
separa a xs = if length xs >= a then separaM a (drop a xs) ([take a xs]) else [xs]
separaM a xs yss = if length xs >= a then separaM a (drop a xs) (yss : (take a xs)) else separaM a [] (yss : xs)
separaM a [] yss = yss
I expect the output of separa 3 "comovais" to be ["com","ova","is"], but my program produces no output because of the error.
Note that in the expression
yss : (take a xs)
(take a xs) has type [b], so yss has type b. But when you pass yss : (take a xs) as an argument to the separaM function, yss is expected to have type [b], not b. That is why the error occurred. Actually, you don't need yss to store the result; the recursive function can be defined as:
separaM _ [] = []
separaM a xs = (if length xs >= a then (take a xs) else xs) : separaM a (drop a xs)
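The conditional in that definition can be dropped, since take already returns the whole list when it is shorter than the requested length. A sketch using splitAt, under the assumption that the chunk size is positive:

```haskell
-- Assumes n > 0; with n <= 0 the remaining list would never get shorter.
separa :: Int -> [a] -> [[a]]
separa _ [] = []
separa n xs = chunk : separa n rest
  where (chunk, rest) = splitAt n xs
```

With this definition, separa 3 "comovais" gives ["com","ova","is"].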
Your code has some errors in it. Tweaking your misuse of (:) gets it to pass the type-checker: separa a xs | length xs >= a = go a (drop a xs) [take a xs] | otherwise = [xs] where go a xs yss | length xs >= a = go a (drop a xs) (yss ++ [take a xs]) -- was: (yss : (take a xs)) | otherwise = go a [] (yss ++ [xs]) -- was: (yss : xs) go a [] yss = yss but it's better to further change it to separa :: Int -> [a] -> [[a]] separa a xs | length xs >= a = go a (drop a xs) [take a xs] | otherwise = [xs] where go a xs yss | length xs >= a = go a (drop a xs) ([take a xs] ++ yss) | otherwise = reverse ([xs] ++ yss) It works: > separa 3 [1..10] [[1,2,3],[4,5,6],[7,8,9],[10]] This is a common "build in reverse, then reverse when built" idiom, frequently seen in strict functional languages. Some of them allow for lists to be built in top-down, natural order, by a technique known as tail-recursion modulo cons. Haskell is lazy, and lets us build its lists in top-down manner naturally and easily, with the equivalent guarded recursion: separa :: Int -> [a] -> [[a]] separa a xs | length xs >= a = go a (drop a xs) [take a xs] | otherwise = [xs] where go a xs yss | length xs >= a = -- go a (drop a xs) (yss ++ [take a xs]) yss ++ go a (drop a xs) [take a xs] | otherwise = -- go a [] (yss ++ [xs]) yss ++ [xs] There's an off-by-one error here; I'll leave it for you to fix on your own. But sometimes the infinite type is inherent to a problem, and not a result of a programming error. Then we can fix it by using recursive types. Whenever we get type equivalency t ~ a..b..t..c.., we can start by defining a type newtype T = MkT (a..b..T..c..) then see which type variables are free and close over them, as newtype T a b c = MkT (a..b..(T a b c)..c..) An example: Infinite type error when defining zip with foldr only; can it be fixed?
Hamming with lists in Haskell
I want to write a hamming function in Haskell that gets a list as input. I already have this:
merge :: [Integer] -> [Integer] -> [Integer]
merge (x:xs)(y:ys)
  | x == y = x : merge xs ys
  | x < y = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
hamming :: [Integer]
hamming = 1 : merge (map (2*) hamming) (merge (map (3*) hamming) (map (5*) hamming))
That was easy. But now I want something like "hamming [4,6,7,9]" as input. The actual input is 1, but now the input should be a list, and every number that is in the list is in the hamming list. And of course 2x, 3x and 5x are in the list. I wrote something like "hamming (x:xs) = x : merge (map (2*) hamming) (merge (map (3*) hamming) (map (5*) hamming))" just to test with a list, but it doesn't work.
Even though this is a duplicate, I'll show you how you could arrive at the solution. Which does appear at the duplicate; here my focus will be more on a journey, not its destination. You tried hamming (x:xs) = 1 : merge (map (2*) hamming) (merge (map (3*) hamming) (map (5*) hamming)) What is going on here? Is it a function? A list? It's all jumbled up here; it's a mess. You want to turn your list definition into a function, calling it as hamming [2,3,5], say; but then what should be going into the map expressions? A function call, hamming [2,3,5], as well? But that would defeat the purpose, as we are expressly using the same list here in several separate places, i.e. the three (or possibly more...) maps, each maintaining its own pointer into the shared sequence. And making separate function calls, even if equivalent, will (most likely and nearly assuredly) produce three separate even if equal lists. And that is not what we need here (this is actually a fun exercise; try it and see how much slower and memory hungry the function will get). So, separate your concerns! Re-write it first as (still invalid) hamming (x:xs) = h where h = 1 : merge (map (2*) h) (merge (map (3*) h) (map (5*) h)) Now h is the shared list, and you have the freedom to make your function, hamming, whatever you want it to be, i.e. hamming :: [Integer] -> [Integer] hamming [2,3,5] = h where h = 1 : merge (map (2*) h) (merge (map (3*) h) (map (5*) h)) = 1 : merge (map (2*) h) (merge (map (3*) h) (merge (map (5*) h) [])) that is, = 1 : foldr merge [] [map (p*) h | p <- [2,3,5]] because g a (g b (g c (... (g n z) ...))) = foldr g z [a,b,c,...,n] and there it is, your answer, up to some mundane renaming of parameters. Don't forget to rename your merge function as union, as "merge" isn't supposed to skip the duplicates, being evocative of mergesort as it is. And keep all your definitions starting at the same indentation level in the file.
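Putting those steps together gives roughly the following sketch. It reads the list argument as the set of multipliers, as the answer does, and uses the name union for the duplicate-dropping merge. The two base cases for empty lists are an addition needed here because the fold ends in [].

```haskell
-- Duplicate-dropping merge of two ascending lists.
union :: Ord a => [a] -> [a] -> [a]
union xs [] = xs
union [] ys = ys
union (x:xs) (y:ys)
  | x == y    = x : union xs ys
  | x <  y    = x : union xs (y:ys)
  | otherwise = y : union (x:xs) ys

-- h is the single shared list that every map reads from.
hamming :: [Integer] -> [Integer]
hamming ps = h
  where h = 1 : foldr union [] [ map (p*) h | p <- ps ]
```

For example, take 10 (hamming [2,3,5]) gives [1,2,3,4,5,6,8,9,10,12].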
What does this list permutations implementation in Haskell exactly do?
I am studying the code in the Data.List module and can't exactly wrap my head around this implementation of permutations: permutations :: [a] -> [[a]] permutations xs0 = xs0 : perms xs0 [] where perms [] _ = [] perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is) where interleave xs r = let (_,zs) = interleave' id xs r in zs interleave' _ [] r = (ts, r) interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r in (y:us, f (t:y:us) : zs) Can somebody explain in detail how these nested functions connect/work with each other?
Sorry about the late answer, it took a bit longer to write down than expected. So, first of all to maximize lazyness in a list function like this there are two goals: Produce as many answers as possible before inspecting the next element of the input list The answers themselves must be lazy, and so there the same must hold. Now consider the permutation function. Here maximal lazyness means: We should determine that there are at least n! permutations after inspecting just n elements of input For each of these n! permutations, the first n elements should depend only on the first n elements of the input. The first condition could be formalized as length (take (factorial n) $ permutations ([1..n] ++ undefined))) `seq` () == () David Benbennick formalized the second condition as map (take n) (take (factorial n) $ permutations [1..]) == permutations [1..n] Combined, we have map (take n) (take (factorial n) $ permutations ([1..n] ++ undefined)) == permutations [1..n] Let's start with some simple cases. First permutation [1..]. We must have permutations [1..] = [1,???] : ??? And with two elements we must have permutations [1..] = [1,2,???] : [2,1,???] : ??? Note that there is no choice about the order of the first two elements, we can't put [2,1,...] first, since we already decided that the first permutation must start with 1. It should be clear by now that the first element of permutations xs must be equal to xs itself. Now on to the implementation. 
First of all, there are two different ways to make all permutations of a list: Selection style: keep picking elements from the list until there are none left permutations [] = [[]] permutations xxs = [(y:ys) | (y,xs) <- picks xxs, ys <- permutations xs] where picks (x:xs) = (x,xs) : [(y,x:ys) | (y,ys) <- picks xs] Insertion style: insert or interleave each element in all possible places permutations [] = [[]] permutations (x:xs) = [y | p <- permutations xs, y <- interleave p] where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys) Note that neither of these is maximally lazy. The first case, the first thing this function does is pick the first element from the entire list, which is not lazy at all. In the second case we need the permutations of the tail before we can make any permutation. To start, note that interleave can be made more lazy. The first element of interleave yss list is [x] if yss=[] or (x:y:ys) if yss=y:ys. But both of these are the same as x:yss, so we can write interleave yss = (x:yss) : interleave' yss interleave' [] = [] interleave' (y:ys) = map (y:) (interleave ys) The implementation in Data.List continues on this idea, but uses a few more tricks. It is perhaps easiest to go through the mailing list discussion. We start with David Benbennick's version, which is the same as the one I wrote above (without the lazy interleave). We already know that the first elment of permutations xs should be xs itself. So, let's put that in permutations xxs = xxs : permutations' xxs permutations' [] = [] permutations' (x:xs) = tail $ concatMap interleave $ permutations xs where interleave = .. The call to tail is of course not very nice. 
But if we inline the definitions of permutations and interleave we get permutations' (x:xs) = tail $ concatMap interleave $ permutations xs = tail $ interleave xs ++ concatMap interleave (permutations' xs) = tail $ (x:xs) : interleave' xs ++ concatMap interleave (permutations' xs) = interleave' xs ++ concatMap interleave (permutations' xs) Now we have permutations xxs = xxs : permutations' xxs permutations' [] = [] permutations' (x:xs) = interleave' xs ++ concatMap interleave (permutations' xs) where interleave yss = (x:yss) : interleave' yss interleave' [] = [] interleave' (y:ys) = map (y:) (interleave ys) The next step is optimization. An important target would be to eliminate the (++) calls in interleave. This is not so easy, because of the last line, map (y:) (interleave ys). We can't immediately use the foldr/ShowS trick of passing the tail as a parameter. The way out is to get rid of the map. If we pass a parameter f as the function that has to be mapped over the result at the end, we get permutations' (x:xs) = interleave' id xs ++ concatMap (interleave id) (permutations' xs) where interleave f yss = f (x:yss) : interleave' f yss interleave' f [] = [] interleave' f (y:ys) = interleave (f . (y:)) ys Now we can pass in the tail, permutations' (x:xs) = interleave' id xs $ foldr (interleave id) [] (permutations' xs) where interleave f yss r = f (x:yss) : interleave' f yss r interleave' f [] r = r interleave' f (y:ys) r = interleave (f . (y:)) ys r This is starting to look like the one in Data.List, but it is not the same yet. In particular, it is not as lazy as it could be. Let's try it out: *Main> let n = 4 *Main> map (take n) (take (factorial n) $ permutations ([1..n] ++ undefined)) [[1,2,3,4],[2,1,3,4],[2,3,1,4],[2,3,4,1]*** Exception: Prelude.undefined Uh oh, only the first n elements are correct, not the first factorial n. 
The reason is that we still try to place the first element (the 1 in the above example) in all possible locations before trying anything else. Yitzchak Gale came up with a solution. Consider all ways to split the input into an initial part, a middle element, and a tail:
[1..n] == [] ++ 1 : [2..n]
       == [1] ++ 2 : [3..n]
       == [1,2] ++ 3 : [4..n]
If you haven't seen the trick to generate these before, you can do this with zip (inits xs) (tails xs). Now the permutations of [1..n] will be
- [] ++ 1 : [2..n] aka. [1..n], or
- 2 inserted (interleaved) somewhere into a permutation of [1], followed by [3..n]. But not 2 inserted at the end of [1], since we already got that result in the previous bullet point.
- 3 interleaved into a permutation of [1,2] (not at the end), followed by [4..n].
- etc.
You can see that this is maximally lazy, since before we even consider doing something with 3, we have given all permutations that start with some permutation of [1,2]. The code that Yitzchak gave was
permutations xs = xs : concat (zipWith newPerms (init $ tail $ tails xs)
                                                (init $ tail $ inits xs))
  where
    newPerms (t:ts) = map (++ts) . concatMap (interleave t) . permutations3
    interleave t [y] = [[t, y]]
    interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys')
Note the recursive call to permutations3, which can be a variant that doesn't have to be maximally lazy. As you can see this is a bit less optimized than what we had before. But we can apply some of the same tricks. The first step is to get rid of init and tail. Let's look at what zip (init $ tail $ tails xs) (init $ tail $ inits xs) actually is
*Main> let xs = [1..5] in zip (init $ tail $ tails xs) (init $ tail $ inits xs)
[([2,3,4,5],[1]),([3,4,5],[1,2]),([4,5],[1,2,3]),([5],[1,2,3,4])]
The init gets rid of the combination ([],[1..n]), while the tail gets rid of the combination ([1..n],[]). We don't want the former, because that would fail the pattern match in newPerms. The latter would fail interleave.
Both are easy to fix: just add a case for newPerms [] and for interleave t [].
permutations xs = xs : concat (zipWith newPerms (tails xs) (inits xs))
  where
    newPerms [] is = []
    newPerms (t:ts) is = map (++ts) (concatMap (interleave t) (permutations is))
    interleave t [] = []
    interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys')
Now we can try to inline tails and inits. Their definition is
tails xxs = xxs : case xxs of
  [] -> []
  (_:xs) -> tails xs
inits xxs = [] : case xxs of
  [] -> []
  (x:xs) -> map (x:) (inits xs)
The problem is that inits is not tail recursive. But since we are going to take a permutation of the inits anyway, we don't care about the order of the elements. So we can use an accumulating parameter,
inits' = inits'' []
  where
    inits'' is xxs = is : case xxs of
      [] -> []
      (x:xs) -> inits'' (x:is) xs
Now we make newPerms a function of xxs and this accumulating parameter, instead of tails xxs and inits xxs.
permutations xs = xs : concat (newPerms' xs [])
  where
    newPerms' xxs is =
      newPerms xxs is :
      case xxs of
        [] -> []
        (x:xs) -> newPerms' xs (x:is)
    newPerms [] is = []
    newPerms (t:ts) is = map (++ts) (concatMap (interleave t) (permutations3 is))
Inlining newPerms into newPerms' then gives
permutations xs = xs : concat (newPerms' xs [])
  where
    newPerms' [] is = [] : []
    newPerms' (t:ts) is =
      map (++ts) (concatMap (interleave t) (permutations is)) :
      newPerms' ts (t:is)
Inlining and unfolding concat, and moving the final map (++ts) into interleave,
permutations xs = xs : newPerms' xs []
  where
    newPerms' [] is = []
    newPerms' (t:ts) is =
      concatMap interleave (permutations is) ++
      newPerms' ts (t:is)
      where
        interleave [] = []
        interleave (y:ys) = (t:y:ys++ts) : map (y:) (interleave ys)
Then finally, we can reapply the foldr trick to get rid of the (++):
permutations xs = xs : newPerms' xs []
  where
    newPerms' [] is = []
    newPerms' (t:ts) is =
      foldr (interleave id) (newPerms' ts (t:is)) (permutations is)
      where
        interleave f [] r = r
        interleave f (y:ys) r = f (t:y:ys++ts) : interleave (f . (y:)) ys r
Wait, I said get rid of the (++). We got rid of one of them, but not the one in interleave. For that, we can see that we are always concatenating some tail of yys to ts. So, we can unfold the calculating (ys++ts) along with the recursion of interleave, and have the function interleave' f ys r return the tuple (ys++ts, interleave f ys r). This gives
permutations xs = xs : newPerms' xs []
  where
    newPerms' [] is = []
    newPerms' (t:ts) is =
      foldr interleave (newPerms' ts (t:is)) (permutations is)
      where
        interleave ys r = let (_,zs) = interleave' id ys r in zs
        interleave' f [] r = (ts,r)
        interleave' f (y:ys) r =
          let (us,zs) = interleave' (f . (y:)) ys r
          in (y:us, f (t:y:us) : zs)
And there you have it, Data.List.permutations in all its maximally lazy optimized glory.
Great write-up by Twan! I (@Yitz) will just add a few references:
- The original email thread where Twan developed this algorithm, linked above by Twan, is fascinating reading.
- Knuth classifies all possible algorithms that satisfy these criteria in Vol. 4 Fasc. 2 Sec. 7.2.1.2.
- Twan's permutations3 is essentially the same as Knuth's "Algorithm P". As far as Knuth knows, that algorithm was first published by English church bell ringers in the 1600s.
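The second laziness condition quoted earlier in this answer can be tried directly against the library function. The sketch below assumes the permutations exported by Data.List in base; factorial and lazyPrefixOK are helper names introduced here.

```haskell
import Data.List (permutations)

factorial :: Int -> Int
factorial n = product [1..n]

-- Each of the first n! permutations of an infinite list, cut to n elements,
-- should match the corresponding permutation of [1..n].
lazyPrefixOK :: Int -> Bool
lazyPrefixOK n =
  map (take n) (take (factorial n) (permutations [1 :: Integer ..]))
    == permutations [1 .. fromIntegral n]
```

Evaluating lazyPrefixOK 3 terminates and returns True even though the input list is infinite, which is the property the derivation is built around.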
The basic algorithm is based on the idea of taking one item from the list at a time, finding every permutation of items including that new one, and then repeating. To explain what this looks like, [1..] will mean a list from one up, where no values (not even the first) have been examined yet. It is the parameter to the function. The resulting list is something like:
[[1..]] ++
[[2,1,3..]] ++
[[3,2,1,4..], [2,3,1,4..]] ++ [[3,1,2,4..], [1,3,2,4..]]
[[4,3,2,1,5..], etc
The clustering above reflects the core idea of the algorithm: each row represents a new item taken from the input list and added to the set of items that are being permuted. Furthermore, it is recursive: on each new row, it takes all the existing permutations and places the item in each place it hasn't been yet (all the places other than the last one). So, on the third row, we have the two permutations [2,1] and [1,2], and then we place 3 in both available slots, giving [[3,2,1], [2,3,1]] and [[3,1,2], [1,3,2]] respectively, and then append whatever the unobserved part is. Hopefully, this at least clarifies the algorithm a little. However, there are some optimizations and implementation details to explain.
(Side note: There are two central performance optimizations that are used. First, if you want to repeatedly prepend some items to multiple lists, map (x:y:z:) list is a lot faster than matching some conditional or pattern matching, because it has no branch, just a calculated jump. Second, and this one is used a lot, it is cheap (and handy) to build lists from the back to the front by repeatedly prepending items; this is used in a few places.)
The first thing the function does is establish two base cases: first, every list has at least one permutation: itself. This can be returned with no evaluation whatsoever. This could be thought of as the "take 0" case.
The outer loop is the part that looks like the following:
perms (t:ts) is = <prepend_stuff_to> (perms ts (t:is))
ts is the "untouched" part of the list, which we are not yet permuting and haven't even examined yet; it is initially the entire input sequence. t is the new item we will be sticking in between the permutations. is is the list of items that we will permute and then place t in between; it is initially empty. Each time we calculate one of the above rows, we reach the end of the items we have prepended to the thunk containing (perms ts (t:is)) and will recurse.
The second loop is a foldr. For each permutation of is (the stuff before the current item in the original list), it interleaves the item into that list and prepends it to the thunk.
foldr interleave <thunk> (permutations is)
The third loop is one of the most complex. We know that it prepends each possible interspersing of our target item t in a permutation, followed by the unobserved tail, onto the result sequence. It does this with a recursive call, where it folds the permutation into a stack of functions as it recurses, and then as it returns, it executes what amounts to two little state machines to build the results. Let's look at an example: interleave [<thunk>] [1,2,3] where t = 4 and is = [5..]. First, as interleave' is called recursively, it builds up ys and fs on the stack, like this:
y = 1, f = id
y = 2, f = (id . (1:))
y = 3, f = ((id . (1:)) . (2:))
(the functions are conceptually the same as ([]++), ([1]++), and ([1,2]++) respectively)
Then, as we go back up, we return and evaluate a tuple containing two values, (us, zs). us is the list to which we prepend the ys after our target t. zs is the result accumulator, where each time we get a new permutation, we prepend it to the results list. Thus, to finish the example, f (t:y:us) gets evaluated and returned as a result for each level of the stack above.
([1,2]++) (4:3:[5..]) === [1,2,4,3,5..]
([1]++) (4:2:[3,5..]) === [1,4,2,3,5..]
([]++) (4:1:[2,3,5..]) === [4,1,2,3,5..]
Hopefully that helps, or at least supplements the material linked in the author's comment above. (Thanks to dfeuer for bringing this up on IRC and discussing it for a few hours)
Creating tuples variations from a list - Haskell
I am a relative Haskell newbie and am trying to create a list of tuples with an equation I named splits that arises from a single list originally, like this:
splits [1..4] --> [ ([1],[2,3,4]), ([1,2],[3,4]), ([1,2,3],[4]) ]
or
splits "xyz" --> [ ("x","yz"), ("xy","z") ]
Creating a list of tuples that take 1, then 2, then 3 elements, etc. I figured out I should probably use the take/drop functions, but this is what I have so far and I'm running into a lot of type declaration errors... Any ideas?
splits :: (Num a) => [a] -> [([a], [a])]
splits [] = error "shortList"
splits [x]
  | length [x] <= 1 = error "shortList"
  | otherwise = splits' [x] 1
  where splits' [x] n = [(take n [x], drop n [x])] + splits' [x] (n+1)
The Haskell-y approach is to use the inits and tails functions from Data.List:
inits [1,2,3,4] = [ [], [1], [1,2], [1,2,3], [1,2,3,4] ]
tails [1,2,3,4] = [ [1,2,3,4], [2,3,4], [3,4], [4], [] ]
We then just zip these two lists together and drop the first pair:
splits xs = tail $ zip (inits xs) (tails xs)
or equivalently, drop the first element of each of the constituent lists first:
          = zip (tail (inits xs)) (tail (tails xs))
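Note that zipping inits with tails also produces a final pair whose second component is empty, which the examples in the question leave out. A sketch that filters out both boundary pairs with a pattern in the list comprehension (the non-empty patterns are the only addition to the idea above):

```haskell
import Data.List (inits, tails)

-- Keep only the pairs where both halves are non-empty; pairs that fail
-- the (_:_) patterns are silently skipped by the comprehension.
splits :: [a] -> [([a], [a])]
splits xs = [ (ys, zs) | (ys@(_:_), zs@(_:_)) <- zip (inits xs) (tails xs) ]
```

With this definition, splits "xyz" gives [("x","yz"),("xy","z")], and splits of an empty or one-element list is empty.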
splits [] = []
splits [_] = []
splits (x:xs) = ([x], xs) : map (\(ys, zs) -> (x:ys, zs)) (splits xs)
You have several mistakes.
- You don't need the Num a class constraint for a.
- Use [] or [x] as a pattern, but not as a variable; use xs instead.
- Use ++ instead of + for concatenating lists.
- In our case, use (:) to add a value to a list instead of ++.
- Add a stop condition for the recursion, like an additional variable maxn passed to splits'.
splits :: [a] -> [([a], [a])]
splits [] = error "shortList"
splits xs
  | lxs <= 1 = error "shortList"
  | otherwise = splits' xs 1 lxs
  where lxs = length xs
        splits' xs n maxn
          | n > maxn = []
          | otherwise = (take n xs, drop n xs) : splits' xs (n+1) maxn
There is a built-in function that does part of what you want:
splitAt :: Int -> [a] -> ([a], [a])
which does what it looks like it would do:
> splitAt 2 [1..4]
([1,2],[3,4])
Using this function, you can just define splits like this:
splits xs = map (flip splitAt xs) [1..length xs - 1]