I've just started learning about Functional Programming, using Haskell.
I'm slowly getting through Erik Meijer's lectures on Channel 9 (I've watched the first 4 so far) and in the 4th video Erik explains how tail works, and it fascinated me.
I've tried to write a function that returns the middle of a list (2 items for even lengths, 1 for odd) and I'd like to hear how others would implement it in
The least amount of Haskell code
The fastest Haskell code
If you could explain your choices I'd be very grateful.
My beginner's code looks like this:
middle as | length as > 2 = middle (drop 2 (reverse as))
| otherwise = as
Just for your amusement, a solution from someone who doesn't speak Haskell:
Write a recursive function that takes two arguments, a1 and a2, and pass your list in as both of them. At each recursion, drop 2 from a2 and 1 from a1. If you're out of elements for a2, you'll be at the middle of a1. You can handle the case of just 1 element remaining in a2 to answer whether you need 1 or 2 elements for your "middle".
I don't make any performance claims, though it only processes the elements of the list once (my assumption is that computing length t is an O(N) operation, so I avoid it), but here's my solution:
mid [] = [] -- Base case: the list is empty ==> no midpt
mid t = m t t -- The 1st t is the slow ptr, the 2nd is fast
where m (x:_) [_] = [x] -- Base case: list tracked by the fast ptr has
-- exactly one item left ==> the first item
-- pointed to by the slow ptr is the midpt.
m (x:y:_) [_,_] = [x,y] -- Base case: list tracked by the fast ptr has
-- exactly two items left ==> the first two
-- items pointed to by the slow ptr are the
-- midpts
m (_:t) (_:_:u) = m t u -- Recursive step: advance slow ptr by 1, and
-- advance fast ptr by 2.
The idea is to have two "pointers" into the list, one that increments one step at each point in the recursion, and one that increments by two.
(which is essentially what Carl Smotricz suggested)
Two versions
Using pattern matching, tail and init:
middle :: [a] -> [a]
middle l@(_:_:_:_) = middle $ tail $ init l
middle l = l
Using length, take, signum, mod, drop and div:
middle :: [a] -> [a]
middle xs = take (signum ((l + 1) `mod` 2) + 1) $ drop ((l - 1) `div` 2) xs
where l = length xs
The second one is basically a one-liner (but uses where for readability).
I've tried to write a function that returns the middle of a list (2 items for even lengths, 1 for odd) and I'd like to hear how others would implement it in
The right datastructure for the right problem. In this case, you've specified something that only makes sense on a finite list, right? There is no 'middle' to an infinite list. So just reading the description, we know that the default Haskell list may not be the best solution: we may be paying the price for the laziness even when we don't need it. Notice how many of the solutions have difficulty avoiding 2*O(n) or O(n). Singly-linked lazy lists just don't match a quasi-array-problem too well.
Fortunately, we do have a finite list in Haskell: it's called Data.Sequence.
Let's tackle it the most obvious way: 'index (length / 2)'.
Data.Sequence.length is O(1) according to the docs. Data.Sequence.index is O(log(min(i,n-i))) (where I think i=index, and n=length). Let's just call it O(log n). Pretty good!
And note that even if we don't start out with a Seq and have to convert a [a] into a Seq, we may still win. Data.Sequence.fromList is O(n). So if our rival was a O(n)+O(n) solution like xs !! (length xs `div` 2), a solution like
middle x = let x' = Seq.fromList x in Seq.index x' (Seq.length x' `div` 2)
will be better since it would be O(1) + O(log n) + O(n), which simplifies to O(log n) + O(n), obviously better than O(n)+O(n).
(I leave as an exercise to the reader modifying middle to return 2 items if length be even and 1 if length be odd. And no doubt one could do better with an array with constant-time length and indexing operations, but an array isn't a list, I feel.)
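One possible take on that exercise, as a sketch only: the name middleSeq and the drop/take arithmetic are my own assumptions, not part of the answer above. It converts the list to a Seq and cuts out one or two elements around the midpoint with Seq.drop and Seq.take (from Data.Sequence in the containers package).

```haskell
import qualified Data.Sequence as Seq
import Data.Foldable (toList)

-- Hypothetical helper: one middle element for odd lengths, two for even.
middleSeq :: [a] -> [a]
middleSeq xs = toList (Seq.take width (Seq.drop start s))
  where
    s     = Seq.fromList xs          -- O(n) conversion
    n     = Seq.length s             -- O(1)
    start = (n - 1) `div` 2          -- index of the first middle element
    width = if even n then 2 else 1  -- how many elements to keep
```

With this, middleSeq [1,2,3,4] gives [2,3] and middleSeq [1,2,3,4,5] gives [3]; the empty list maps to the empty list.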
Haskell solution inspired by Carl's answer.
middle = m =<< drop 1
where m [] = take 1
m [_] = take 2
m (_:_:ys) = m ys . drop 1
If the sequence is a linked list, traversal of this list is the dominating factor of efficiency. Since we need to know the overall length, we have to traverse the list at least once. There are two equivalent ways to get the middle elements:
Traverse the list once to get the length, then traverse it half to get at the middle elements.
Traverse the list in double steps and single steps at the same time, so that when the first traversal stops, the second traversal is in the middle.
Both need the same number of steps. The second is needlessly complicated, in my opinion.
In Haskell, it might be something like this:
middle xs = take (2 - r) $ drop ((div l 2) + r - 1) xs
where l = length xs
r = rem l 2
middle xs =
let (ms, len) = go xs 0 [] len
in ms
go (x:xs) i acc len =
let acc_ = case len `divMod` 2 of
(m, 0) -> if m == (i+1) then (take 2 (x:xs))
else acc
(m, 1) -> if m == i then [x]
else acc
in go xs (i+1) acc_ len
go [] i acc _ = (acc,i)
This solution traverses the list just once using lazy evaluation. While it traverses the list, it calculates the length and then backfeeds it to the function:
let (ms, len) = go xs 0 [] len
Now the middle elements can be calculated:
let acc_ = case len `divMod` 2 of
...
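The same "feed the result back in" trick can be seen in a smaller setting. This is only a sketch, and withLength is a hypothetical name: it pairs every element with the total length of the list, even though that length is only known once the traversal has finished. It relies on lazy evaluation, since n is not inspected until the result is consumed.

```haskell
-- Pair each element with the length of the whole list in a single traversal.
withLength :: [a] -> [(a, Int)]
withLength xs = ys
  where
    -- n is produced by go and also used inside go: the knot is tied here.
    (ys, n) = go xs 0
    go []     i = ([], i)
    go (z:zs) i =
      let (rest, total) = go zs (i + 1)
      in ((z, n) : rest, total)
```

For example, withLength "abc" evaluates to [('a',3),('b',3),('c',3)].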
F# solution based on Carl's answer:
let halve_list l =
let rec loop acc1 = function
| x::xs, [] -> List.rev acc1, x::xs
| x::xs, [y] -> List.rev (x::acc1), xs
| x::xs, y::y'::ys -> loop (x::acc1) (xs, ys)
| [], _ -> [], []
loop [] (l, l)
It's pretty easy to modify to get the median elements in the list too:
let median l =
let rec loop acc1 = function
| x::xs, [] -> [List.head acc1; x]
| x::xs, [y] -> [x]
| x::xs, y::y'::ys -> loop (x::acc1) (xs, ys)
| [], _ -> []
loop [] (l, l)
A more intuitive approach uses a counter:
let halve_list2 l =
let rec loop acc = function
| (_, []) -> [], []
| (0, rest) -> List.rev acc, rest
| (n, x::xs) -> loop (x::acc) (n - 1, xs)
let count = (List.length l) / 2
loop [] (count, l)
And a really ugly modification to get the median elements:
let median2 l =
let rec loop acc = function
| (n, [], isEven) -> []
| (0, rest, isEven) ->
match rest, isEven with
| x::xs, true -> [List.head acc; x]
| x::xs, false -> [x]
| _, _ -> failwith "Should never happen"
| (n, x::xs, isEven) -> loop (x::acc) (n - 1, xs, isEven)
let len = List.length l
let count = len / 2
let isEven = if len % 2 = 0 then true else false
loop [] (count, l, isEven)
Getting the length of a list requires traversing its entire contents at least once. Fortunately, it's perfectly easy to write your own list data structure which holds the length of the list in each node, allowing you to get the length in O(1).
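A minimal sketch of such a structure in Haskell, assuming the names CountedList, consC, lengthC and middleC (all made up for this illustration): every cell stores the length of the list that starts at it, so asking for the length never traverses anything.

```haskell
-- Each Cons cell caches the length of the list starting at that cell.
data CountedList a = Nil | Cons Int a (CountedList a)

lengthC :: CountedList a -> Int
lengthC Nil          = 0
lengthC (Cons n _ _) = n              -- O(1): just read the cached count

consC :: a -> CountedList a -> CountedList a
consC x rest = Cons (lengthC rest + 1) x rest

fromListC :: [a] -> CountedList a
fromListC = foldr consC Nil

dropC :: Int -> CountedList a -> CountedList a
dropC k (Cons _ _ rest) | k > 0 = dropC (k - 1) rest
dropC _ xs = xs

takeC :: Int -> CountedList a -> [a]
takeC k (Cons _ x rest) | k > 0 = x : takeC (k - 1) rest
takeC _ _ = []

-- Middle element(s): no separate pass to compute the length.
middleC :: CountedList a -> [a]
middleC xs = takeC width (dropC start xs)
  where
    n     = lengthC xs
    start = (n - 1) `div` 2
    width = if even n then 2 else 1
```

Finding the middle still walks half the list, but the full pass for length is gone.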
Weird that this perfectly obvious formulation hasn't come up yet:
middle [] = []
middle [x] = [x]
middle [x,y] = [x,y]
middle xs = middle $ init $ tail xs
A very straightforward, yet inelegant and not so terse solution might be:
middle :: [a] -> Maybe [a]
middle xs
| len <= 2 = Nothing
| even len = Just $ take 2 . drop (half - 1) $ xs
| odd len = Just $ take 1 . drop (half) $ xs
where
len = length xs
half = len `div` 2
This iterates twice over the list.
mid xs = m where
  l = length xs
  m | l `elem` [0..2] = xs
    | odd l = take 1 $ drop (l `div` 2) xs
    | otherwise = take 2 $ drop (l `div` 2 - 1) xs
I live for one liners, although this example only works for odd lists. I just want to stretch my brain! Thank you for the fun =)
foo d = map (\(Just a) -> a) $ filter (/=Nothing) $ zipWith (\a b -> if a == b then Just a else Nothing) (Data.List.nub d) (Data.List.nub $ reverse d)
I'm not much of a haskeller myself but I tried this one.
First the tests (yes, you can do TDD using Haskell)
module Main
where
import Test.HUnit
import Middle
main = do runTestTT tests
tests = TestList [ test1
, test2
, test3
, test4
, test_final1
, test_final2
]
test1 = [0] ~=? middle [0]
test2 = [0, 1] ~=? middle [0, 1]
test3 = [1] ~=? middle [0, 1, 2]
test4 = [1, 2] ~=? middle [0, 1, 2, 3]
test_final1 = [3] ~=? middle [0, 1, 2, 3, 4, 5, 6]
test_final2 = [3, 4] ~=? middle [0, 1, 2, 3, 4, 5, 6, 7]
And the solution I came to:
module Middle
where
middle a = midlen a (length a)
midlen (a:xs) 1 = [a]
midlen (a:b:xs) 2 = [a, b]
midlen (a:xs) lg = midlen xs (lg - (2))
It traverses the list twice, once to get the length and half of it again to reach the middle, but I don't care: it's still O(n) (and getting the middle of something implies getting its length, so there's no reason to avoid it).
My solution, I like to keep things simple:
middle [] = []
middle xs | odd (length xs) = [xs !! ((length xs) `div` 2)]
| otherwise = [(xs !! ((length xs) `div` 2)),(reverse $ xs) !! ((length xs)`div` 2)]
This uses !! (from the Prelude, also exported by Data.List) to get the value at a given index, which in this case is half the length of the list.
Edit: it actually works now
I like Svante's answer. My version:
> middle :: [a] -> [a]
> middle [] = []
> middle xs = take (r+1) . drop d $ xs
> where
> (d,r) = (length xs - 1) `divMod` 2
Here is my version. It was just a quick run up. I'm sure it's not very good.
middleList xs@(_:_:_:_) = take (if odd n then 1 else 2) $ drop en xs
where n = length xs
en = if n < 5 then 1 else 2 * (n `div` 4)
middleList xs = xs
I tried. :)
If anyone feels like commenting and telling me how awful or good this solution is, I would deeply appreciate it. I'm not very well versed in Haskell.
EDIT: Improved with suggestions from kmc on #haskell-blah
EDIT 2: Can now accept input lists with a length of less than 5.
Another one-line solution:
--
middle = ap (take . (1 +) . signum . (`mod` 2) . (1 +) . length) $ drop =<< (`div` 2) . subtract 1 . length
--
Related
I'm taking a functional programming class and I'm having a hard time leaving the OOP mindset behind and finding answers to a lot of my questions.
I have to create a function that takes an ordered list and converts it into specified size sublists using a variation of fold.
This isn't right, but it's what I have:
splitList :: (Ord a) => Int -> [a] -> [[a]]
splitList size xs
| [condition] = foldr (\item subList -> item:subList) [] xs
| otherwise =
I've been searching and I found out that foldr is the variation that works better for what I want, and I think I've understood how fold works, I just don't know how I'll set up the guards so that when length sublist == size haskell resets the accumulator and goes on to the next list.
If I didn't explain myself correctly, here's the result I want:
> splitList 3 [1..10]
> [[1,2,3],[4,5,6],[7,8,9],[10]]
Thanks!
While Fabián's and chi's answers are entirely correct, there is actually an option to solve this puzzle using foldr. Consider the following code:
splitList :: Int -> [a] -> [[a]]
splitList n =
foldr (\el acc -> case acc of
[] -> [[el]]
(h : t) | length h < n -> (el : h) : t
_ -> [el] : acc
) []
The strategy here is to build up a list by extending its head as long as its length is less than desired. This solution has, however, two drawbacks:
It does something slightly different than in your example;
splitList 3 [1..10] produces [[1],[2,3,4],[5,6,7],[8,9,10]]
Its complexity is O(n * length l): for each element we measure the length of a list of up to n elements, which yields a linear number of linear operations.
Let's first take care of the first issue. In order to start counting at the beginning we need to traverse the list left-to-right, while foldr does it right-to-left. There is a common trick called "continuation passing" which will allow us to reverse the direction of the walk:
splitList :: Int -> [a] -> [[a]]
splitList n l = map reverse . reverse $
foldr (\el cont acc ->
case acc of
[] -> cont [[el]]
(h : t) | length h < n -> cont ((el : h) : t)
_ -> cont ([el] : acc)
) id l []
Here, instead of building the list in the accumulator, we build up a function that will transform the list in the right direction. See this question for details. The side effect is that the result comes out reversed, so we need to counter that by applying reverse to the whole list and to each of its elements. This is linear and tail-recursive, though.
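To see the continuation trick in isolation, here is a minimal sketch (the name foldlViaFoldr is mine, not from the answer) that expresses a left fold through foldr in the same way. Each step returns a function that awaits the accumulator, so the accumulator ends up being threaded from left to right.

```haskell
-- A left fold written with foldr: the fold builds a function,
-- which is then applied to the initial accumulator z.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr step id xs z
  where
    -- x: current element, cont: what to do with the rest, acc: state so far
    step x cont acc = cont (f acc x)
```

For instance, foldlViaFoldr (flip (:)) [] [1,2,3] gives [3,2,1], the same as foldl (flip (:)) [] [1,2,3].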
Now let's work on the performance issue. The problem was that length is linear on ordinary lists. There are two solutions for this:
Use another structure that caches length for a constant time access
Cache the value by ourselves
Because I guess it is a list exercise, let's go for the latter option:
splitList :: Int -> [a] -> [[a]]
splitList n l = map reverse . reverse . snd $
foldr (\el cont (countAcc, listAcc) ->
case listAcc of
[] -> cont (countAcc, [[el]])
(h : t) | countAcc < n -> cont (countAcc + 1, (el : h) : t)
(h : t) -> cont (1, [el] : (h : t))
) id l (1, [])
Here we extend our computational state with a counter that at each point stores the current length of the list. This gives us a constant-time check on each element and results in linear time complexity in the end.
A way to simplify this problem would be to split this into multiple functions. There are two things you need to do:
take n elements from the list, and
keep taking from the list as much as possible.
Let's try taking first:
taking :: Int -> [a] -> [a]
taking n [] = undefined
taking n (x:xs) = undefined
If there are no elements then we cannot take any more elements, so we can only return an empty list. On the other hand, if we do have an element then we can think of taking n (x:xs) as x : taking (n-1) xs; we would only need to check that n > 0.
taking n (x:xs)
| n > 0 = x : taking (n-1) xs
| otherwise = []
Now, we need to do that multiple times with the remainder so we should probably also return whatever remains from taking n elements from a list, in this case it would be whatever remains when n = 0 so we could try to adapt it to
| otherwise = ([], x:xs)
and then you would need to modify the type signature to return ([a], [a]) and the other 2 definitions to ensure you do return whatever remained after taking n.
With this approach your splitList would look like:
splitList n [] = []
splitList n l = chunk : splitList n remainder
where (chunk, remainder) = taking n l
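Putting the pieces together, one way the finished functions might look. This is a sketch: the empty-list case of taking and the guard against a non-positive n are my own assumptions (without the guard, a non-positive n on a non-empty list would produce empty chunks forever).

```haskell
-- Take up to n elements and also return what is left over.
taking :: Int -> [a] -> ([a], [a])
taking _ [] = ([], [])
taking n (x:xs)
  | n > 0     = let (chunk, rest) = taking (n - 1) xs
                in (x : chunk, rest)
  | otherwise = ([], x : xs)

-- Keep taking chunks of size n until the list is exhausted.
splitList :: Int -> [a] -> [[a]]
splitList _ [] = []
splitList n l
  | n <= 0    = error "splitList: chunk size must be positive"
  | otherwise = chunk : splitList n remainder
  where (chunk, remainder) = taking n l
```

Here splitList 3 [1..10] evaluates to [[1,2,3],[4,5,6],[7,8,9],[10]].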
Note however that folding would not be appropriate since it "flattens" whatever you are working on, for example given a [Int] you could fold to produce a sum which would be an Int. (foldr :: (a -> b -> b) -> b -> [a] -> b or "foldr function zero list produces an element of the function return type")
You want:
splitList 3 [1..10]
> [[1,2,3],[4,5,6],[7,8,9],[10]]
Since the "remainder" [10] is on the tail, I recommend you use foldl instead. E.g.
splitList :: (Ord a) => Int -> [a] -> [[a]]
splitList size xs
| size > 0 = foldl go [] xs
| otherwise = error "need a positive size"
where go acc x = ....
What should go do? Essentially, on your example, we must have:
splitList 3 [1..10]
= go (splitList 3 [1..9]) 10
= go [[1,2,3],[4,5,6],[7,8,9]] 10
= [[1,2,3],[4,5,6],[7,8,9],[10]]
splitList 3 [1..9]
= go (splitList 3 [1..8]) 9
= go [[1,2,3],[4,5,6],[7,8]] 9
= [[1,2,3],[4,5,6],[7,8,9]]
splitList 3 [1..8]
= go (splitList 3 [1..7]) 8
= go [[1,2,3],[4,5,6],[7]] 8
= [[1,2,3],[4,5,6],[7,8]]
and
splitList 3 [1]
= go [] 1
= [[1]]
Hence, go acc x should
check if acc is empty, if so, produce a singleton list [[x]].
otherwise, check the last list in acc:
if its length is less than size, append x
otherwise, append a new list [x] to acc
Try doing this by hand on your example to understand all the cases.
This will not be efficient, but it will work.
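For reference, the steps above can be written out directly. This is a sketch under a different name (splitListL is hypothetical) so it does not clash with the skeleton; it follows the case analysis literally and is just as inefficient as warned, since last, init and ++ each walk the accumulator.

```haskell
splitListL :: Int -> [a] -> [[a]]
splitListL size xs
  | size > 0  = foldl go [] xs
  | otherwise = error "need a positive size"
  where
    -- Empty accumulator: start the first chunk.
    go [] x = [[x]]
    go acc x
      -- Last chunk still has room: append x to it.
      | length (last acc) < size = init acc ++ [last acc ++ [x]]
      -- Last chunk is full: start a new chunk at the end.
      | otherwise                = acc ++ [[x]]
```

With this definition, splitListL 3 [1..10] gives [[1,2,3],[4,5,6],[7,8,9],[10]].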
You don't really need the Ord a constraint.
Checking the accumulator's first sublist's length would lead to information flow from the right and the first chunk ending up the shorter one, potentially, instead of the last. Such a function won't work on infinite lists either (not to mention the foldl-based variants).
A standard way to arrange for the information flow from the left with foldr is using an additional argument. The general scheme is
subLists n xs = foldr g z xs n
where
g x r i = cons x i (r (i-1))
....
The i argument to cons will guide its decision as to where to add the current element into. The i-1 decrements the counter on the way forward from the left, instead of on the way back from the right. z must have the same type as r and as the foldr itself as a whole, so,
z _ = [[]]
This means there must be a post-processing step, and some edge cases must be handled as well,
subLists n xs = post . foldr g z xs $ n
where
z _ = [[]]
g x r i | i == 1 = cons x i (r n)
g x r i = cons x i (r (i-1))
....
cons must be lazy enough not to force the results of the recursive call prematurely.
I leave finishing this up as an exercise.
For a simpler version with a pre-processing step instead, see this recent answer of mine.
Just going to give another answer: this is quite similar to trying to write groupBy as a fold, and actually has a couple gotchas w.r.t. laziness that you have to bear in mind for an efficient and correct implementation. The following is the fastest version I found that maintains all the relevant laziness properties:
splitList :: Int -> [a] -> [[a]]
splitList m xs = snd (foldr f (const ([],[])) xs 1)
where
f x a i
| i <= 1 = let (ys,zs) = a m in ([], (x : ys) : zs)
| otherwise = let (ys,zs) = a (i-1) in (x : ys , zs)
The ys and the zs gotten from the recursive processing of the rest of list indicate the first and the rest of the groups into which the rest of the list will be broken up, by said recursive processing. So we either prepend the current element before that first subgroup if it is still shorter than needed, or we prepend before the first subgroup when it is just right and start a new, empty subgroup.
I'd like to reverse the first k elements of a list efficiently.
This is what I came up with:
reverseFirst :: Int -> [a] -> [a] -> [a]
reverseFirst 0 xs rev = rev ++ xs
reverseFirst k (x:xs) rev = reverseFirst (k-1) xs (x:rev)
reversed = reverseFirst 3 [1..5] mempty -- Result: [3,2,1,4,5]
It is fairly nice, but the (++) bothers me. Or should I maybe consider using another data structure? I want to do this many million times with short lists.
Let's think about the usual structure of reverse:
reverse = rev [] where
rev acc [] = acc
rev acc (x : xs) = rev (x : acc) xs
It starts with the empty list and tacks on elements from the front of the argument list till it's done. We want to do something similar, except we want to tack the elements onto the front of the portion of the list that we don't reverse. How can we do that when we don't have that un-reversed portion yet?
The simplest way I can think of to avoid traversing the front of the list twice is to use laziness:
reverseFirst :: Int -> [a] -> [a]
reverseFirst k xs = dis where
(dis, dat) = rf dat k xs
rf acc 0 ys = (acc, ys)
rf acc n [] = (acc, [])
rf acc n (y : ys) = rf (y : acc) (n - 1) ys
dat represents the portion of the list that is left alone. We calculate it in the same helper function rf that does the reversing, but we also pass it to rf in the initial call. It's never actually examined in rf, so everything just works. Looking at the generated core (using ghc -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures) suggests that the pairs are compiled away into unlifted pairs and the Ints are unboxed, so everything should probably be quite efficient.
Profiling suggests that this implementation is about 1.3 times as fast as the difference list one, and allocates about 65% as much memory.
Well, usually I'd just write splitAt 3 >>> first reverse >>> uncurry(++) to achieve the goal.
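Spelled out as a complete definition, that pipeline might look like the sketch below. The name reverseFirstK is made up here; >>> and first come from Control.Arrow in base.

```haskell
import Control.Arrow (first, (>>>))

-- Split at k, reverse only the prefix, then glue the two parts back together.
reverseFirstK :: Int -> [a] -> [a]
reverseFirstK k = splitAt k >>> first reverse >>> uncurry (++)
```

For example, reverseFirstK 3 [1..5] gives [3,2,1,4,5]. If k exceeds the length, the whole list is reversed, because splitAt simply returns an empty suffix.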
If you're anxious about performance, you can consider a difference list:
reverseFirstN :: Int -> [a] -> [a]
reverseFirstN = go id
where go rev 0 xs = rev xs
go rev k (x:xs) = go ((x:).rev) (k-1) xs
but frankly I wouldn't expect this to be a lot faster: you need to traverse the first n elements either way. Actual performance will depend a lot on what the compiler is able to fuse away.
What's the most direct/efficient way to create all possibilities of dividing one (even) list into two in Haskell? I toyed with splitting all permutations of the list but that would add many extras - all the instances where each half contains the same elements, just in a different order. For example,
[1,2,3,4] should produce something like:
[ [1,2], [3,4] ]
[ [1,3], [2,4] ]
[ [1,4], [2,3] ]
Edit: thank you for your comments -- the order of elements and the type of the result is less important to me than the concept - an expression of all two-groups from one group, where element order is unimportant.
Here's an implementation, closely following the definition.
The first element always goes into the left group. After that, we add the next head element into one group or the other. If one of the groups becomes too big, there is no choice anymore and we must add all the rest into the shorter group.
divide :: [a] -> [([a], [a])]
divide [] = [([],[])]
divide (x:xs) = go ([x],[], xs, 1,length xs) []
where
go (a,b, [], i,j) zs = (a,b) : zs -- i == length a - length b
go (a,b, s@(x:xs), i,j) zs -- j == length s
| i >= j = (a,b++s) : zs
| (-i) >= j = (a++s,b) : zs
| otherwise = go (x:a, b, xs, i+1, j-1) $ go (a, x:b, xs, i-1, j-1) zs
This produces
*Main> divide [1,2,3,4]
[([2,1],[3,4]),([3,1],[2,4]),([1,4],[3,2])]
The limitation of having an even length list is unnecessary:
*Main> divide [1,2,3]
[([2,1],[3]),([3,1],[2]),([1],[3,2])]
(the code was re-written in the "difference-list" style for efficiency: go2 A zs == go1 A ++ zs).
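As an illustration of that rewriting style on a simpler function (a sketch, with rev1 and rev2 as made-up names standing in for go1 and go2): the second version takes the "rest of the output" zs as an extra argument instead of appending to its result, and satisfies rev2 xs zs == rev1 xs ++ zs.

```haskell
-- Direct version: each step appends, so the total cost is quadratic.
rev1 :: [a] -> [a]
rev1 []     = []
rev1 (x:xs) = rev1 xs ++ [x]

-- Accumulator version: zs is whatever should follow the result,
-- so no (++) is needed and the cost is linear.
rev2 :: [a] -> [a] -> [a]
rev2 []     zs = zs
rev2 (x:xs) zs = rev2 xs (x : zs)
```

The equation can be checked by unfolding one step: rev2 (x:xs) zs = rev2 xs (x:zs) = rev1 xs ++ (x:zs) = rev1 (x:xs) ++ zs.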
edit: How does this work? Imagine yourself sitting at a pile of stones, dividing it into two. You put the first stone to a side, which one it doesn't matter (so, left, say). Then there's a choice where to put each next stone — unless one of the two piles becomes too small by comparison, and we thus must put all the remaining stones there at once.
To find all partitions of a non-empty list (of even length n) into two equal-sized parts, we can, to avoid repetitions, posit that the first element shall be in the first part. Then it remains to find all ways to split the tail of the list into one part of length n/2 - 1 and one of length n/2.
-- not to be exported
splitLen :: Int -> Int -> [a] -> [([a],[a])]
splitLen 0 _ xs = [([],xs)]
splitLen _ _ [] = error "Oops"
splitLen k l ys@(x:xs)
| k == l = [(ys,[])]
| otherwise = [(x:us,vs) | (us,vs) <- splitLen (k-1) (l-1) xs]
++ [(us,x:vs) | (us,vs) <- splitLen k (l-1) xs]
does that splitting if called appropriately. Then
partitions :: [a] -> [([a],[a])]
partitions [] = [([],[])]
partitions (x:xs)
| even len = error "Original list with odd length"
| otherwise = [(x:us,vs) | (us,vs) <- splitLen half len xs]
where
len = length xs
half = len `quot` 2
generates all the partitions without redundantly computing duplicates.
luqui raises a good point. I haven't taken into account the possibility that you'd want to split lists with repeated elements. With those, it gets a little more complicated, but not much. First, we group the list into equal elements (done here for an Ord constraint, for only Eq, that could still be done in O(length²)). The idea is then similar, to avoid repetitions, we posit that the first half contains more elements of the first group than the second (or, if there is an even number in the first group, equally many, and similar restrictions hold for the next group etc.).
repartitions :: Ord a => [a] -> [([a],[a])]
repartitions = map flatten2 . halves . prepare
where
flatten2 (u,v) = (flatten u, flatten v)
prepare :: Ord a => [a] -> [(a,Int)]
prepare = map (\xs -> (head xs, length xs)) . group . sort
halves :: [(a,Int)] -> [([(a,Int)],[(a,Int)])]
halves [] = [([],[])]
halves ((a,k):more)
| odd total = error "Odd number of elements"
| even k = [((a,low):us,(a,low):vs) | (us,vs) <- halves more] ++ [normalise ((a,c):us,(a,k-c):vs) | c <- [low + 1 .. min half k], (us,vs) <- choose (half-c) remaining more]
| otherwise = [normalise ((a,c):us,(a,k-c):vs) | c <- [low + 1 .. min half k], (us,vs) <- choose (half-c) remaining more]
where
remaining = sum $ map snd more
total = k + remaining
half = total `quot` 2
low = k `quot` 2
normalise (u,v) = (nz u, nz v)
nz = filter ((/= 0) . snd)
choose :: Int -> Int -> [(a,Int)] -> [([(a,Int)],[(a,Int)])]
choose 0 _ xs = [([],xs)]
choose _ _ [] = error "Oops"
choose need have ((a,k):more) = [((a,c):us,(a,k-c):vs) | c <- [least .. most], (us,vs) <- choose (need-c) (have-k) more]
where
least = max 0 (need + k - have)
most = min need k
flatten :: [(a,Int)] -> [a]
flatten xs = xs >>= uncurry (flip replicate)
Daniel Fischer's answer is a good way to solve the problem. I offer a worse (more inefficient) way, but one which more obviously (to me) corresponds to the problem description. I will generate all partitions of the list into two equal length sublists, then filter out equivalent ones according to your definition of equivalence. The way I usually solve problems is by starting like this -- create a solution that is as obvious as possible, then gradually transform it into a more efficient one (if necessary).
import Data.List (sort, nubBy, permutations)
type Partition a = ([a],[a])
-- Your notion of equivalence (sort to ignore the order)
equiv :: (Ord a) => Partition a -> Partition a -> Bool
equiv p q = canon p == canon q
where
canon (xs,ys) = sort [sort xs, sort ys]
-- All ordered partitions
partitions :: [a] -> [Partition a]
partitions xs = map (splitAt l) (permutations xs)
where
l = length xs `div` 2
-- All partitions filtered out by the equivalence
equivPartitions :: (Ord a) => [a] -> [Partition a]
equivPartitions = nubBy equiv . partitions
Testing
>>> equivPartitions [1,2,3,4]
[([1,2],[3,4]),([3,2],[1,4]),([3,1],[2,4])]
Note
After using QuickCheck to test the equivalence of this implementation with Daniel's, I found an important difference. Clearly, mine requires an (Ord a) constraint and his does not, and this hints at what the difference would be. In particular, if you give his [0,0,0,0], you will get a list with three copies of ([0,0],[0,0]), whereas mine will give only one copy. Which of these is correct was not specified; Daniel's is natural when considering the two output lists to be ordered sequences (which is what that type is usually considered to be), mine is natural when considering them as sets or bags (which is how this question seemed to be treating them).
Splitting The Difference
It is possible to get from an implementation that requires Ord to one that doesn't, by operating on the positions rather than the values in a list. I came up with this transformation -- an idea which I believe originates with Benjamin Pierce in his work on bidirectional programming.
import Data.Traversable
import Control.Monad.Trans.State
data Labelled a = Labelled { label :: Integer, value :: a }
instance Eq (Labelled a) where
a == b = compare a b == EQ
instance Ord (Labelled a) where
compare a b = compare (label a) (label b)
labels :: (Traversable t) => t a -> t (Labelled a)
labels t = evalState (traverse trav t) 0
where
trav x = state (\i -> i `seq` (Labelled i x, i + 1))
onIndices :: (Traversable t, Functor u)
=> (forall a. Ord a => t a -> u a)
-> forall b. t b -> u b
onIndices f = fmap value . f . labels
Using onIndices on equivPartitions wouldn't speed it up at all, but it would allow it to have the same semantics as Daniel's (up to equiv of the results) without the constraint, and with my more naive and obvious way of expressing it -- and I just thought it was an interesting way to get rid of the constraint.
My own generalized version, added much later, inspired by Will's answer:
import Data.Map (adjust, fromList, toList)
import Data.List (groupBy, sort)
divide xs n evenly = divide' xs (zip [0..] (replicate n [])) where
evenPSize = div (length xs) n
divide' [] result = [result]
divide' (x:xs) result = do
index <- indexes
divide' xs (toList $ adjust (x :) index (fromList result)) where
notEmptyBins = filter (not . null . snd) $ result
partlyFullBins | evenly == "evenly" = map fst . filter ((<evenPSize) . length . snd) $ notEmptyBins
| otherwise = map fst notEmptyBins
indexes = partlyFullBins
++ if any (null . snd) result
then map fst . take 1 . filter (null . snd) $ result
else if null partlyFullBins
then map fst. head . groupBy (\a b -> length (snd a) == length (snd b)) . sort $ result
else []
I'm pretty new to Haskell, and I'm having a little trouble. I'm trying to implement a function that takes a list and an Int. The Int is supposed to be the index k at which the list is split into a pair of lists: the first containing the first k elements of the list, and the second everything from k+1 to the last element. Here's what I have so far:
split :: [a] -> Int -> ([a], [a])
split [] k = error "Empty list!"
split (x:[]) k = ([x],[])
split xs k | k >= (length xs) = error "Number out of range!"
| k < 0 = error "Number out of range!"
I can't actually figure out how to do the split. Any help would be appreciated.
First of all, note that the function you are trying to construct is already in the standard library, in the Prelude - it is called splitAt. Now, directly looking at its definition is confusing, as there are two algorithms, one which doesn't use the standard recursive structure at all -splitAt n xs = (take n xs, drop n xs) - and one that is hand-optimized making it ugly. The former makes more intuitive sense, as you are simply taking a prefix and a suffix and putting them in a pair. However, the latter teaches more, and has this overall structure:
splitAt :: Int -> [a] -> ([a], [a])
splitAt 0 xs = ([], xs)
splitAt _ [] = ([], [])
splitAt n (x:xs) = (x:xs', xs'')
where
(xs', xs'') = splitAt (n - 1) xs
The basic idea is that if a list is made up of a head and a tail (it is of the form x:xs), then the list going from index k+1 onwards will be the same as the list going from k onwards once you remove the first element - drop (k + 1) (x : xs) == drop k xs. To construct the prefix, you similarly remove the first element, take a smaller prefix, and stick the element back on - take (k + 1) (x : xs) == x : take k xs.
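Those two equations translate almost directly into recursive definitions. The following is a sketch with made-up names (takeK, dropK, splitAtK) to avoid clashing with the Prelude; the handling of non-positive k mirrors what take and drop do.

```haskell
-- take (k + 1) (x : xs) == x : take k xs
takeK :: Int -> [a] -> [a]
takeK k _ | k <= 0 = []
takeK _ []         = []
takeK k (x:xs)     = x : takeK (k - 1) xs

-- drop (k + 1) (x : xs) == drop k xs
dropK :: Int -> [a] -> [a]
dropK k xs | k <= 0 = xs
dropK _ []          = []
dropK k (_:xs)      = dropK (k - 1) xs

-- The simple two-traversal formulation of splitAt.
splitAtK :: Int -> [a] -> ([a], [a])
splitAtK k xs = (takeK k xs, dropK k xs)
```

For example, splitAtK 2 "haskell" gives ("ha","skell").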
What about this:
splitAt' = \n -> \xs -> (take n xs, drop n xs)
Some tests:
> splitAt' 3 [1..10]
> ([1,2,3],[4,5,6,7,8,9,10])
> splitAt' 0 [1..10]
> ([],[1,2,3,4,5,6,7,8,9,10])
> splitAt' 3 []
> ([],[])
> splitAt' 11 [1..10]
> ([1,2,3,4,5,6,7,8,9,10],[])
> splitAt' 2 "haskell"
> ("ha","skell")
Basically, you need some way of passing along partial progress as you recurse through the list. I used a second function that takes an accumulator parameter; it is called from split and then calls itself recursively. There are almost certainly better ways.
EDIT: removed all the length checks, but I believe the use of ++ means it's still O(n^2).
split xs k | k < 0 = error "Number out of range!"
split xs k = ssplit [] xs k
ssplit p xs 0 = (p, xs)
ssplit p (x:xs) k = ssplit (p++[x]) xs (k-1)
ssplit p [] k = error "Number out of range!"
to get the behavior in the original post or
ssplit p [] k = (p,[])
To get the more forgiving behavior of the standard splitAt function.
A common trick for getting rid of quadratic behavior in building a list is to build it up backwards, then reverse it, modifying Mark Reed's solution:
split xs k | k < 0 = error "Number out of range!"
split xs k = (reverse a, b)
where
(a,b) = ssplit [] xs k
ssplit p xs 0 = (p, xs)
ssplit p (x:xs) k = ssplit (x:p) xs (k-1)
ssplit p [] k = error "Number out of range!"
The error check in ssplit is fine since it won't get checked (one of the earlier patterns will match) unless there is an actual error.
In practice you might want to add a few strictness annotations to ssplit to manage stack growth, but that's a further refinement.
See splitAt in the prelude:
ghci> :t flip splitAt
flip splitAt :: [a] -> Int -> ([a], [a])
ghci> flip splitAt ['a'..'j'] 5
("abcde","fghij")
I'm looking for the best way to partition a list (or seq) so that groups have a given size.
for ex. let's say I want to group with size 2 (this could be any other number though):
let xs = [(a,b,c); (a,b,d); (y,z,y); (w,y,z); (n,y,z)]
let grouped = partitionBySize 2 xs
// => [[(a,b,c);(a,b,d)]; [(y,z,y);(w,y,z)]; [(n,y,z)]]
The obvious way to implement partitionBySize would be by adding the position to every tuple in the input list so that it becomes
[(0,a,b,c); (1,a,b,d); (2,y,z,y); (3,w,y,z); (4,n,y,z)]
and then use Seq.groupBy with
xs |> Seq.ofList |> Seq.groupBy (function | (i,_,_,_) -> i - (i % n))
However this solution doesn't look very elegant to me.
Is there a better way to implement this function (maybe with a built-in function)?
This seems to be a repeating pattern that's not captured by any function in the F# core library. When solving similar problems earlier, I defined a function Seq.groupWhen (see F# snippets) that turns a sequence into groups. A new group is started when the predicate holds.
You could solve the problem using Seq.groupWhen similarly to Seq.group (by starting a new group at even index). Unlike with Seq.group, this is efficient, because Seq.groupWhen iterates over the input sequence just once:
[3;3;2;4;1;2;8]
|> Seq.mapi (fun i v -> i, v) // Add indices to the values (as first tuple element)
|> Seq.groupWhen (fun (i, v) -> i%2 = 0) // Start new group after every 2nd element
|> Seq.map (Seq.map snd) // Remove indices from the values
Implementing the function directly using recursion is probably easier - the solution from John does exactly what you need - but if you wanted to see a more general approach then Seq.groupWhen may be interesting.
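To make the "start a new group when the predicate holds" idea concrete, here is a small sketch in Haskell, the language used earlier in this thread. It is only an analogue: groupWhen and chunkPairs are hypothetical names, and I'm assuming the F# snippet behaves the same way (the first element always opens a group, and each later element for which the predicate holds opens the next one).

```haskell
-- groupWhen is a hypothetical Haskell analogue of the F# snippet.
-- The first element always opens a group; every later element for
-- which the predicate holds opens a new one.
groupWhen :: (a -> Bool) -> [a] -> [[a]]
groupWhen _ []     = []
groupWhen p (x:xs) = (x : ys) : groupWhen p zs
  where
    -- ys: elements up to (not including) the next group start
    (ys, zs) = break p xs

-- Mirrors the pipeline above: add indices, start a group at every
-- even index, then drop the indices again.
chunkPairs :: [a] -> [[a]]
chunkPairs = map (map snd) . groupWhen (even . fst) . zip [0 :: Int ..]
```

For the input used above, chunkPairs [3,3,2,4,1,2,8] gives [[3,3],[2,4],[1,2],[8]].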
List.chunkBySize (hat tip: Scott Wlaschin) is now available and does exactly what you're talking about. It appears to be new with F# 4.0.
let grouped = [1..10] |> List.chunkBySize 3
// val grouped : int list list =
// [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]; [10]]
Seq.chunkBySize and Array.chunkBySize are also now available.
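For readers coming from the Haskell answers earlier in this thread, the same chunking can be sketched on top of splitAt. This is an illustration only, not how the F# library implements it; the Haskell function below merely borrows the name chunkBySize, and rejecting non-positive sizes with error is my own choice.

```haskell
-- A Haskell sketch of chunking built on the Prelude's splitAt.
-- The name is borrowed from F#; this is not the F# implementation.
chunkBySize :: Int -> [a] -> [[a]]
chunkBySize n xs
  | n <= 0    = error "chunk size must be positive"
  | null xs   = []
  | otherwise = chunk : chunkBySize n rest
  where
    -- splitAt is forgiving: a short final chunk is returned as-is.
    (chunk, rest) = splitAt n xs
```

Here chunkBySize 3 [1..10] gives [[1,2,3],[4,5,6],[7,8,9],[10]], matching the F# output above.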
Here's a tail-recursive function that traverses the list once.
let chunksOf n items =
    let rec loop i acc items =
        seq {
            match i, items, acc with
            //exit if chunk size is zero or input list is empty
            | _, [], [] | 0, _, [] -> ()
            //counter=0 so yield group and continue looping
            | 0, _, _::_ -> yield List.rev acc; yield! loop n [] items
            //decrement counter, add head to group, and loop through tail
            | _, h::t, _ -> yield! loop (i-1) (h::acc) t
            //reached the end of input list, yield accumulated elements
            //handles items.Length % n <> 0
            | _, [], _ -> yield List.rev acc
        }
    loop n [] items
Usage
[1; 2; 3; 4; 5]
|> chunksOf 2
|> Seq.toList //[[1; 2]; [3; 4]; [5]]
I like the elegance of Tomas' approach, but I benchmarked both our functions using an input list of 10 million elements. This one clocked in at 9 secs vs 22 for his. Of course, as he admitted, the most efficient method would probably involve arrays/loops.
What about a recursive approach? It only requires a single pass over the input:
let rec partitionBySize length inp dummy =
    match inp with
    |h::t ->
        if dummy |> List.length < length then
            partitionBySize length t (h::dummy)
        else dummy::(partitionBySize length t (h::[]))
    |[] -> dummy::[]
Then invoke it with partitionBySize 2 xs []
let partitionBySize size xs =
    let sq = ref (seq xs)
    seq {
        while (Seq.length !sq >= size) do
            yield Seq.take size !sq
            sq := Seq.skip size !sq
        if not (Seq.isEmpty !sq) then yield !sq
    }
    // result to list, if you want
    |> Seq.map (Seq.toList)
    |> Seq.toList
UPDATE
let partitionBySize size (sq:seq<_>) =
    seq {
        let e = sq.GetEnumerator()
        let empty = ref true;
        while !empty do
            yield seq { for i = 1 to size do
                            empty := e.MoveNext()
                            if !empty then yield e.Current
                      }
    }
array slice version:
let partitionBySize size xs =
    let xa = Array.ofList xs
    let len = xa.Length
    [
        for i in 0..size..(len-1) do
            yield ( if i + size >= len then xa.[i..] else xa.[i..(i+size-1)] ) |> Array.toList
    ]
Well, I was late for the party. The code below is a tail-recursive version using higher-order functions on List:
let partitionBySize size xs =
    let i = size - (List.length xs - 1) % size
    let xss, _, _ =
        List.foldBack( fun x (acc, ls, j) ->
            if j = size then ((x::ls)::acc, [], 1)
            else (acc, x::ls, j+1)
        ) xs ([], [], i)
    xss
I did the same benchmark as Daniel did. This function is efficient: it is 2x faster than his approach on my machine. I also compared it with an array/loop version, and the two are comparable in terms of performance.
Moreover, unlike John's answer, this version preserves order of elements in inner lists.
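The trick in the fold is the starting counter: because the list is consumed from the right, the counter is primed so that the short group (if any) ends up last. Here is a sketch of the same idea in Haskell, the language used earlier in this thread, with foldr playing the role of List.foldBack. The name chunksFromRight is hypothetical, and like the F# version this sketch does not guard against a size of 0.

```haskell
-- chunksFromRight is a hypothetical name; foldr stands in for List.foldBack.
chunksFromRight :: Int -> [a] -> [[a]]
chunksFromRight size xs = groups
  where
    -- Prime the counter so that the rightmost group holds the leftover
    -- elements; e.g. 5 elements with size 2 gives start = 2, so the last
    -- element alone closes a group.
    start = size - (length xs - 1) `mod` size
    (groups, _, _) = foldr step ([], [], start) xs
    -- j == size means x completes the current group.
    step x (acc, cur, j)
      | j == size = ((x : cur) : acc, [], 1)
      | otherwise = (acc, x : cur, j + 1)
```

For example, chunksFromRight 2 [1,2,3,4,5] gives [[1,2],[3,4],[5]], with the elements of each inner list in their original order.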