TL;DR: I want the exact behavior of filter ((== 4) . length) . subsequences. Using subsequences alone also creates lists of every length, which takes a lot of time to process. Since in the end only lists of length 4 are needed, I was thinking there must be a faster way.
I have a list of functions. The list has the type [Wor -> Wor]
The list looks something like this
[f1, f2, f3 .. fn]
What I want is a list of lists of n functions while preserving order like this
input : [f1, f2, f3 .. fn]
argument : 4 functions
output : A list of lists of 4 functions.
In the expected output, if there's an f1 in a sublist, it'll always be at the head of that sublist.
If there's an f2 in the sublist and the sublist doesn't contain f1, f2 will be at the head. If fn is in the sublist, it'll be last.
In general, if there's an fx in a sublist, it will never come before f(x - 1).
Basically preserving the main list's order when generating sublists.
It can be assumed that the length of the list will always be greater than the given argument.
I'm just starting to learn Haskell, so I haven't tried all that much; what I have tried so far is this:
Generating combinations with the subsequences function and applying filter ((== 4) . length) to the result seems to generate the correct combinations -but it doesn't preserve order- (it does preserve order, I was confusing it with my own function).
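Indeed, a quick ghci check with numbers standing in for the functions (choosing 3 out of 4 to keep the output short) shows that subsequences keeps elements in their original order:

> filter ((== 3) . length) (subsequences [1,2,3,4])
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]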
So what should I do?
Also if possible, is there a function or a combination of functions present in Hackage or Stackage which can do this? Because I would like to understand the source.
You describe a nondeterministic take:
ndtake :: Int -> [a] -> [[a]]
ndtake 0 _ = [[]]
ndtake n [] = []
ndtake n (x:xs) = map (x:) (ndtake (n-1) xs) ++ ndtake n xs
Either we take an x, and have n-1 more to take from xs; or we don't take the x and have n more elements to take from xs.
Running:
> ndtake 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Update: you wanted efficiency. If we're sure the input list is finite, we can aim at stopping as soon as possible:
ndetake n xs = go (length xs) n xs
    where
    go spare n _  | n >  spare = []
    go spare n xs | n == spare = [xs]
    go spare 0 _      = [[]]
    go spare n []     = []
    go spare n (x:xs) = map (x:) (go (spare-1) (n-1) xs)
                        ++ go (spare-1) n xs
Trying it:
> length $ ndetake 443 [1..444]
444
The former version seems to be stuck on this input, but the latter one returns immediately: the n > spare and n == spare guards cut off every branch that can no longer produce a result of the requested length, and here there are only 444 sublists of length 443 to find.
But it measures the length of the whole list, and needlessly so, as pointed out by @dfeuer in the comments. We can achieve the same improvement in efficiency while retaining a bit more laziness:
ndzetake :: Int -> [a] -> [[a]]
ndzetake n xs | n > 0 =
    go n (length (take n xs) == n) (drop n xs) xs
    where
    go n b p ~(x:xs)
        | n == 0    = [[]]
        | not b     = []
        | null p    = [(x:xs)]
        | otherwise = map (x:) (go (n-1) b p xs)
                      ++ go n b (tail p) xs
Now the last test works instantly with this code as well.
There's still room for improvement here. Just as with the library function subsequences, the search space could be explored even more lazily. Right now we have
> take 9 $ ndzetake 3 [1..]
[[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,2,7],[1,2,8],[1,2,9],[1,2,10],[1,2,11]]
but it could be finding [2,3,4] before forcing the 5 out of the input list. Shall we leave it as an exercise?
Here's the best I've been able to come up with. It answers the challenge Will Ness laid down to be as lazy as possible in the input. In particular, ndtake m ([1..n]++undefined) will produce as many entries as possible before throwing an exception. Furthermore, it strives to maximize sharing among the result lists (note the treatment of end in ndtakeEnding'). It avoids problems with badly balanced list appends using a difference list. This sequence-based version is considerably faster than any pure-list version I've come up with, but I haven't teased apart just why that is. I have the feeling it may be possible to do even better with a better understanding of just what's going on, but this seems to work pretty well.
Here's the general idea. Suppose we ask for ndtake 3 [1..5]. We first produce all the results ending in 3 (of which there is one). Then we produce all the results ending in 4. We do this by (essentially) calling ndtake 2 [1..3] and adding the 4 onto each result. We continue in this manner until we have no more elements.
{-# LANGUAGE BangPatterns #-}   -- for the strict !front pattern below
                                -- (implied by GHC2021, needed under Haskell2010)

import qualified Data.Sequence as S
import Data.Sequence (Seq, (|>))
import Data.Foldable (toList)
We will use the following simple utility function. It's almost the same as splitAtExactMay from the 'safe' package, but hopefully a bit easier to understand. For reasons I haven't investigated, letting this produce a result when its argument is negative leads to ndtake with a negative argument being equivalent to subsequences. If you want, you can easily change ndtake to do something else for negative arguments, e.g. to return an empty list in the negative case.
splitAtMay :: Int -> [a] -> Maybe ([a], [a])
splitAtMay n xs
  | n <= 0 = Just ([], xs)
splitAtMay _ [] = Nothing
splitAtMay n (x : xs) = flip fmap (splitAtMay (n - 1) xs) $
  \(front, rear) -> (x : front, rear)
Now we really get started. ndtake is implemented using ndtakeEnding, which produces a sort of "difference list", allowing all the partial results to be concatenated cheaply.
ndtake :: Int -> [t] -> [[t]]
ndtake n xs = ndtakeEnding n xs []

ndtakeEnding :: Int -> [t] -> ([[t]] -> [[t]])
ndtakeEnding 0 _xs = ([]:)
ndtakeEnding n xs = case splitAtMay n xs of
  Nothing -> id -- Not enough elements
  Just (front, rear) ->
    (front :) . go rear (S.fromList front)
  where
    -- For each element, produce a list of all combinations
    -- *ending* with that element.
    go [] _front = id
    go (r : rs) front =
      ndtakeEnding' [r] (n - 1) front
      . go rs (front |> r)
ndtakeEnding doesn't call itself recursively. Rather, it calls ndtakeEnding' to calculate the combinations of the front part. ndtakeEnding' is very much like ndtakeEnding, but with a few differences:
We use a Seq rather than a list to represent the input sequence. This lets us split and snoc cheaply, but I'm not yet sure why that seems to give amortized performance that is so much better in this case.
We already know that the input sequence is long enough, so we don't need to check.
We're passed a tail (end) to add to each result. This lets us share tails when possible. There are lots of opportunities for sharing tails, so this can be expected to be a substantial optimization.
We use foldr rather than pattern matching. Doing this manually with pattern matching gives clearer code, but worse constant factors. That's because the :<| and :|> patterns exported from Data.Sequence are non-trivial pattern synonyms that perform a bit of calculation, including amortized O(1) allocation, to build the tail or initial segment, whereas folds don't need to build those.
NB: this implementation of ndtakeEnding' works well for recent GHC and containers; it seems less efficient for earlier versions. That might be the work of Donnacha Kidney on foldr for Data.Sequence. In earlier versions, it might be more efficient to pattern match by hand, using viewl for versions that don't offer the pattern synonyms.
ndtakeEnding' :: [t] -> Int -> Seq t -> ([[t]] -> [[t]])
ndtakeEnding' end 0 _xs = (end:)
ndtakeEnding' end n xs = case S.splitAt n xs of
  (front, rear) ->
    ((toList front ++ end) :) . go rear front
  where
    go = foldr go' (const id) where
      go' r k !front = ndtakeEnding' (r : end) (n - 1) front . k (front |> r)
    -- With patterns, a bit less efficiently:
    -- go Empty _front = id
    -- go (r :<| rs) !front =
    --   ndtakeEnding' (r : end) (n - 1) front
    --   . go rs (front :|> r)
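Here's a quick ghci check of the laziness claim above, using the definitions just given (my own test, not part of the original answer): every combination that doesn't depend on the undefined tail comes out before the exception surfaces.

> take 4 $ ndtake 3 ([1..4] ++ undefined)
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
> ndtake 3 ([1..4] ++ undefined) !! 4
*** Exception: Prelude.undefined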
I wanted to create a function genAllSize :: [a] -> [[a]], that receives a list l and generates all the lists, sorted by size, that can be built with the elements of the list l; i.e.
> genAllSize [2,4,8]
[[],[2],[4],[8],[2,2],[4,2],[8,2],[2,4],[4,4],[8,4],[2,8],[4,8],[8,8],[2,2,2],[4,2,2],[8,2,2], ...
How would you do it? I came up with a solution using permutations from Data.List but I do not want to use it.
Given an input list xs, select a prefix of it in a nondeterministic way.
For each element in the prefix, replace it with any element of xs, again nondeterministically.
Result (inits comes from Data.List):
> xs = [2,4,8]
> inits xs >>= mapM (const xs)
[[],[2],[4],[8],[2,2],[2,4],[2,8],[4,2],[4,4],[4,8],[8,2],[8,4],
[8,8],[2,2,2],[2,2,4],[2,2,8],[2,4,2],[2,4,4],[2,4,8],[2,8,2],
[2,8,4],[2,8,8],[4,2,2],[4,2,4],[4,2,8],[4,4,2],[4,4,4],[4,4,8],
[4,8,2],[4,8,4],[4,8,8],[8,2,2],[8,2,4],[8,2,8],[8,4,2],[8,4,4],
[8,4,8],[8,8,2],[8,8,4],[8,8,8]]
The other answers seem sort of complicated. I'd do it this way:
> [0..] >>= flip replicateM "abc"
["","a","b","c","aa","ab","ac","ba","bb","bc","ca","cb","cc","aaa","aab",...
Hmm, I guess you need a lazy infinite list of cycling subsequences. One naive way could be (nub and subsequences are from Data.List):
Prelude> take 100 $ nub . subsequences . cycle $ [2,4,8]
[[],[2],[4],[2,4],[8],[2,8],[4,8],[2,4,8],[2,2],[4,2],[2,4,2],[8,2],[2,8,2],[4,8,2],[2,4,8,2],[4,4],[2,4,4],[8,4],[2,8,4],[4,8,4],[2,4,8,4],[2,2,4],[4,2,4],[2,4,2,4],[8,2,4],[2,8,2,4],[4,8,2,4],[2,4,8,2,4],[8,8],[2,8,8],[4,8,8],[2,4,8,8],[2,2,8],[4,2,8],[2,4,2,8],[8,2,8],[2,8,2,8],[4,8,2,8],[2,4,8,2,8],[4,4,8],[2,4,4,8],[8,4,8],[2,8,4,8],[4,8,4,8],[2,4,8,4,8],[2,2,4,8],[4,2,4,8],[2,4,2,4,8],[8,2,4,8],[2,8,2,4,8],[4,8,2,4,8],[2,4,8,2,4,8],[2,2,2],[4,2,2],[2,4,2,2],[8,2,2],[2,8,2,2],[4,8,2,2],[2,4,8,2,2],[4,4,2],[2,4,4,2],[8,4,2],[2,8,4,2],[4,8,4,2],[2,4,8,4,2],[2,2,4,2],[4,2,4,2],[2,4,2,4,2],[8,2,4,2],[2,8,2,4,2],[4,8,2,4,2],[2,4,8,2,4,2]]
A simple and highly efficient option:
genAllSize [] = [[]]
genAllSize [a] = iterate (a:) []
genAllSize xs =
    [] : [x:q | q <- genAllSize xs, x <- xs]
(Thanks to Will Ness for a small but very nice simplification.)
This solution takes advantage of the fact that a valid solution list is either empty or an element of the argument list consed onto a shorter valid solution list. Unlike Daniel Wagner's solution, this one doesn't resort to counting. My tests suggest that it performs extremely well under typical conditions.
Why do we need a special case for a one-element list? The general case performs extremely badly for that, because it maps over the same list over and over with no logarithmic slowdown.
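For a singleton list, the special case just produces the one word of each length (a quick ghci check):

> take 5 $ genAllSize "a"
["","a","aa","aaa","aaaa"]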
But what's the deal with that call to genAllSize with the very same argument? Wouldn't it be better to save the result to increase sharing?
genAllSize [] = [[]]
genAllSize xs = p
  where
    p = [] : [x:q | q <- p, x <- xs]
Indeed, on a theoretical machine with unlimited constant-time memory, this is optimal: walking the list takes worst-case O(1) time for each cons. In practice, it's only a good idea if a great many entries will be realized and retained. Otherwise, there's a problem: most of the list entries will be retained indefinitely, dramatically increasing memory residency and the amount of work the garbage collector needs to do. The non-sharing version above still offers amortized O(1) time per cons, but it needs very little memory (logarithmic rather than linear).
Examples
genAllSize "ab" =
["","a","b","aa","ba"
,"ab","bb","aaa","baa"
,"aba","bba","aab","bab"
,"abb","bbb","aaaa",...]
genAllSize "abc" =
["","a","b","c","aa","ba"
,"ca","ab","bb","cb","ac"
,"bc","cc","aaa","baa"
,"caa","aba","bba","cba"
,"aca",...]
An explicit option
You can also use two accumulators:
{-# LANGUAGE BangPatterns #-}   -- for the strict tuple pattern below

genAllSize [] = [[]]
genAllSize [a] = iterate (a:) []
genAllSize (x:xs) = go ([], []) where
  go (curr, remain) = curr : go (step curr remain)
  step [] [] = ([x], [xs])
  step (_:ls) ((r:rs):rss) =
    (r:ls, rs:rss)
  step (_:ls) ([] : rs) =
    (x : ls', xs : rs')
    where
      !(ls', rs') = step ls rs
This version keeps track of the current "word" and also the remaining available "letters" in each position. The performance seems comparable in general, but a bit better with regard to memory residency. It's also much harder to understand!
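A quick check that it agrees with the first version's ordering (compare the genAllSize "ab" example above):

> take 8 $ genAllSize "ab"
["","a","b","aa","ba","ab","bb","aaa"]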
This produces the elements in a different order within each length than your example, but it meets the definition in the text of your question. Changing the order is easy: you have to replace <*> with a slightly different operator of your own making.
import Control.Applicative
import Control.Monad

rinvjoin :: Applicative both => both a -> both (both a)
rinvjoin = fmap pure

extendBranches options branches = (<|>) <$> options <*> branches
singletonBranchExtensions = rinvjoin

genAllSize [] = []
genAllSize xs = join <$> iterate (extendBranches extensions) $ initialBranches
  where extensions = singletonBranchExtensions xs
        initialBranches = pure empty
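For instance (note the within-length order differs from the question's example, as mentioned):

> take 7 $ genAllSize "ab"
["","a","b","aa","ab","ba","bb"]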
To put it plainly, I'm fairly new to Haskell and trying to solve a problem (programming exercise) I came across. It says I should create a function
com :: Int -> [t] -> [[t]]
that returns all possible choices of n elements, where n and list are the first and second arguments, respectively. Elements can be picked over again and in a different order. A result would be like:
com 2 [1,2,3] = [[1,1], [1,2]..[3,3]]
For the cases n = 1 and n = 2, I managed to solve them. The case n = 1 is quite simple, and for the case n = 2 I would use concatenation and build it up. However, I don't understand how it can be made n-ary and work for all n, e.g. if a function call were suddenly com 10 ...
Is this what you want?
> sequence (replicate 3 "abc")
["aaa","aab","aac","aba","abb","abc","aca","acb","acc"
,"baa","bab","bac","bba","bbb","bbc","bca","bcb","bcc"
,"caa","cab","cac","cba","cbb","cbc","cca","ccb","ccc"]
The above exploits the fact that sequence, in the list monad, builds the cartesian product of a list of lists. So, we can simply replicate our list n times, and then take the product.
(Note that "abc" above is a shorthand for the list of charatcters ['a','b','c'])
So, a solution could be
com n xs = sequence (replicate n xs)
or equivalently, as Daniel Wagner points out below,
com = replicateM
A final note: I do realize that this is probably not very helpful for actually learning how to program. Indeed, I pulled two "magic" functions from the library which solved the task. Still, it shows how the problem can be reduced to two subproblems: 1) replicating a value n times and 2) building a cartesian product. The second task is a nice exercise on its own, if you don't want to use the library. You may wish to solve that starting from:
sequence :: [[a]] -> [[a]]
sequence [] = [[]]
sequence (x:xs) = ...
  where ys = sequence xs
First: [] is a list constructor, not a tuple. I don't know of any general way to build an n-ary tuple.
However, sticking to lists, if you have the n = 1 case solved and the n = 2 case solved, try to express the latter in terms of the former. Then generalize to any n in terms of n-1:
com n xs = concat [map (x:) (com (n-1) xs) | x <- xs ]
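A minimal check, supplying the n = 1 base case this answer assumes you already have (the base case here is my own choice):

com :: Int -> [a] -> [[a]]
com 1 xs = [[x] | x <- xs]   -- the assumed n = 1 case
com n xs = concat [map (x:) (com (n-1) xs) | x <- xs]

> com 2 [1,2,3]
[[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]]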
A more verbose way to write it, but potentially more helpful when trying to understand list nondeterminism and what the Haskell comprehension syntactic sugar really means, is do notation:
com :: Int -> [a] -> [[a]]
com 0 _ = [[]]   -- exactly one way to choose zero elements
com 1 xs = [[x] | x <- xs]
com n xs = do
  x <- xs
  let ys = com (n - 1) xs
  map (x:) ys
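For reference, since the point is understanding the sugar: the do block desugars to roughly

com n xs = xs >>= \x -> map (x:) (com (n - 1) xs)

and in the list monad (>>=) is just concatMap with its arguments flipped, which recovers the comprehension version above.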
Say I have any list like this:
[4,5,6,7,1,2,3,4,5,6,1,2]
I need a Haskell function that will transform this list into a list of lists which are composed of the segments of the original list which form a series in ascending order. So the result should look like this:
[[4,5,6,7],[1,2,3,4,5,6],[1,2]]
Any suggestions?
You can do this by resorting to manual recursion, but I like to believe Haskell is a more evolved language. Let's see if we can develop a solution that uses existing recursion strategies. First some preliminaries.
{-# LANGUAGE NoMonomorphismRestriction #-}
-- because who wants to write type signatures, amirite?
import Data.List.Split -- from package split on Hackage
Step one is to observe that we want to split the list based on a criterion that looks at two elements of the list at once. So we'll need a new list with elements representing a "previous" and "next" value. There's a very standard trick for this:
previousAndNext xs = zip xs (drop 1 xs)
However, for our purposes, this won't quite work: this function always outputs a list that's shorter than the input, and we will always want a list of the same length as the input (and in particular we want some output even when the input is a list of length one). So we'll modify the standard trick just a bit with a "null terminator".
pan xs = zip xs (map Just (drop 1 xs) ++ [Nothing])
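For example, comparing the two helpers:

> previousAndNext [1,2,3]
[(1,2),(2,3)]
> pan [1,2,3]
[(1,Just 2),(2,Just 3),(3,Nothing)]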
Now we're going to look through this list for places where the previous element is bigger than the next element (or the next element doesn't exist). Let's write a predicate that does that check.
bigger (x, y) = maybe False (x >) y
Now let's write the function that actually does the split. Our "delimiters" will be values that satisfy bigger; and we never want to throw them away, so let's keep them.
ascendingTuples = split . keepDelimsR $ whenElt bigger
The final step is just to throw together the bit that constructs the tuples, the bit that splits the tuples, and a last bit of munging to throw away the bits of the tuples we don't care about:
ascending = map (map fst) . ascendingTuples . pan
Let's try it out in ghci:
*Main> ascending [4,5,6,7,1,2,3,4,5,6,1,2]
[[4,5,6,7],[1,2,3,4,5,6],[1,2]]
*Main> ascending [7,6..1]
[[7],[6],[5],[4],[3],[2],[1]]
*Main> ascending []
[[]]
*Main> ascending [1]
[[1]]
P.S. In the current release of split, keepDelimsR is slightly stricter than it needs to be, and as a result ascending currently doesn't work with infinite lists. I've submitted a patch that makes it lazier, though.
ascend :: Ord a => [a] -> [[a]]
ascend xs = foldr f [] xs
  where
    f a [] = [[a]]
    f a xs'@(y:ys) | a < head y = (a:y):ys
                   | otherwise  = [a]:xs'
In ghci
*Main> ascend [4,5,6,7,1,2,3,4,5,6,1,2]
[[4,5,6,7],[1,2,3,4,5,6],[1,2]]
This problem is a natural fit for a paramorphism-based solution. Having (as defined in that post)
para  :: (a -> [a] -> b -> b) -> b -> [a] -> b
foldr :: (a ->        b -> b) -> b -> [a] -> b

para  c n (x : xs) = c x xs (para c n xs)
foldr c n (x : xs) = c x    (foldr c n xs)

para  c n [] = n
foldr c n [] = n
we can write
partition_asc xs = para c [] xs where
  c x (y:_) ~(a:b) | x < y = (x:a):b
  c x _ r = [x]:r
Trivial, since the abstraction fits.
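As a small illustration of the extra context para provides (my own example, not from the linked post): the combining function also receives the remaining tail, so e.g. the proper suffixes of a list come out directly:

suffixes :: [a] -> [[a]]
suffixes = para (\_ xs r -> xs : r) []

> suffixes [1,2,3]
[[2,3],[3],[]]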
BTW they have two kinds of map in Common Lisp: mapcar (processing elements of an input list one by one) and maplist (processing "tails" of a list). With this idea we get
import Data.List (tails)

partition_asc2 xs = foldr c [] . init . tails $ xs where
  c (x:y:_) ~(a:b) | x < y = (x:a):b
  c (x:_) r = [x]:r
Lazy patterns in both versions make it work with infinite input lists
in a productive manner (as first shown in Daniel Fischer's answer).
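For instance, a quick productivity check on an infinite input:

> take 3 $ partition_asc (cycle [1,2,3])
[[1,2,3],[1,2,3],[1,2,3]]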
update 2020-05-08: not so trivial after all. Both head . head . partition_asc $ [4] ++ undefined and the same for partition_asc2 fail with *** Exception: Prelude.undefined. The combining function c forces the next element y prematurely. It needs to be more carefully written to be productive right away, before ever looking at the next element, as e.g. for the second version,
partition_asc2' xs = foldr c [] . init . tails $ xs where
  c (x:ys) r@(~(a:b)) = (x:g):gs
    where
      (g,gs) | not (null ys)
               && x < head ys = (a,b)
             | otherwise      = ([],r)
(again, as first shown in Daniel's answer).
You can use a right fold to break up the list at down-steps:
foldr foo [] xs
  where
    foo x yss = (x:zs) : ws
      where
        (zs, ws) = case yss of
          (ys@(y:_)) : rest
            | x < y     -> (ys, rest)
            | otherwise -> ([], yss)
          _ -> ([], [])
(It's a bit complicated in order to have the combining function lazy in the second argument, so that it works well for infinite lists too.)
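Binding the fold to a name (splitAsc is my own label here) and trying it on the question's input and on an infinite one:

splitAsc :: Ord a => [a] -> [[a]]
splitAsc xs = foldr foo [] xs   -- with foo as defined above

> splitAsc [4,5,6,7,1,2,3,4,5,6,1,2]
[[4,5,6,7],[1,2,3,4,5,6],[1,2]]
> take 3 $ splitAsc (cycle [3,1,2])
[[3],[1,2,3],[1,2,3]]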
One other way of approaching this task (which, in fact, lays the foundations of a very efficient sorting algorithm) is using Continuation Passing Style, a.k.a. CPS, applied here to folding from the right, i.e. foldr.
As is, this answer only chunks up the ascending runs; it would be nice to chunk up the descending ones at the same time, preferably in reverse order, all in O(n), which would leave us with only binary merging of the obtained chunks for a perfectly sorted output. Yet that's another answer for another question.
chunks :: Ord a => [a] -> [[a]]
chunks xs = foldr go return xs $ []
  where
    go :: Ord a => a -> ([a] -> [[a]]) -> ([a] -> [[a]])
    go c f = \ps -> let (r:rs) = f [c]
                    in case ps of
                         []  -> r:rs
                         [p] -> if c > p then (p:r):rs else [p]:(r:rs)
*Main> chunks [4,5,6,7,1,2,3,4,5,6,1,2]
[[4,5,6,7],[1,2,3,4,5,6],[1,2]]
*Main> chunks [4,5,6,7,1,2,3,4,5,4,3,2,6,1,2]
[[4,5,6,7],[1,2,3,4,5],[4],[3],[2,6],[1,2]]
In the above code c stands for current and p for previous; and again, remember we are folding from the right, so "previous" is actually the next item to process.