Just recently I tried to write a program that simulates a simple system from an online game. The idea is to have the program calculate the most stat-efficient set of items. To clarify this a bit more:
You've got 8 item slots and 74 different items; you can't use any item twice, and it doesn't matter which item is in which slot. I'm not even trying to calculate the stats of one set yet, I'm stuck way earlier!
So the problem with this is the number of possibilities: 74^8 (roughly 9 * 10^14) before filtering and 74 choose 8 = 15,071,474,661 (roughly 1.5 * 10^10) after filtering.
My program already starts lagging when I just try head (permu' 2).
Since I know Haskell is supposed to work with infinite lists, how does it cope with a list of 899 trillion entries? Obviously it takes a lot of capacity, but that's why I'm here to ask:
How do I treat a big List in Haskell so that I can work with it?
The function, simplified, looks like this:
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort [a] = [a]
quicksort (x:xs) = (quicksort [y | y <- xs, y <= x]) ++ [x] ++ (quicksort [z | z <- xs , z > x])
eliminatedouble [] = []
eliminatedouble (x:xs) = if x `elem` xs then eliminatedouble xs else x:(eliminatedouble xs)
permu' n | n>8 = error "8 is max"
| otherwise = eliminatedouble (filter allSatisfied (generate n))
where
generate 0 = [[]]
generate x = [quicksort (a:xs) | a <- [1..74], xs <- generate (x-1)]
allSatisfied [] = True
allSatisfied (x:xs) = (checkConstraint x xs) && (allSatisfied xs)
checkConstraint x xs = not (doubled x xs)
doubled x xs = x `elem` xs
It would be interesting to know how to do all this much more cheaply.
Thanks in advance, regards.
You're making this much more difficult than it needs to be.
choose 0 xs = [[]]
choose n [] = []
choose n (x:xs) = map (x:) (choose (n-1) xs) ++ choose n xs
In my interpreter, choose 5 [1..74] takes about 22 seconds to compute all the entries and choose 6 [1..74] takes 273 seconds. Additionally, choose 8 [1..74] starts chugging through combinations straight away; I estimate that it would take about 6 hours to generate them all. N.B. that this is in the interpreter, with no optimization or other fanciness going on; possibly it could go much faster if you give GHC a chance to figure out how.
Assuming that you intend to do some nontrivial computation on each element of choose 8 [1..74], I suggest you either schedule a largish chunk of time or else think about solutions that do not do an exhaustive search -- perhaps using some heuristics to get an approximate answer, or figuring out how to do some pruning to cut out large, uninteresting swaths of the search space.
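If you do go the exhaustive route, at least consume the combinations one at a time instead of materialising the whole list. A minimal sketch, where score is a hypothetical stand-in for whatever stat computation the game needs:
import Data.List (foldl')

-- Stream through all combinations, keeping only the best seen so far.
-- `score` is an assumption, not part of the original question.
bestSet :: ([Int] -> Double) -> [[Int]] -> ([Int], Double)
bestSet score = foldl' step ([], -1/0)
  where
    step best@(_, s) c =
      let s' = score c
      in if s' > s then (c, s') else best

-- usage: bestSet score (choose 8 [1..74])  -- expect hours, per the timings above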
TL;DR: I want exactly the behavior of filter ((== 4) . length) . subsequences. Just using subsequences also creates lists of every length, which takes a lot of time to process. Since in the end only lists of length 4 are needed, I was thinking there must be a faster way.
I have a list of functions. The list has the type [Wor -> Wor]
The list looks something like this
[f1, f2, f3 .. fn]
What I want is a list of lists of n functions, preserving order, like this:
input : [f1, f2, f3 .. fn]
argument : 4 functions
output : A list of lists of 4 functions.
Expected output: if f1 is in a sublist, it will always be at the head of that sublist.
If f2 is in a sublist and the sublist doesn't contain f1, f2 will be at the head. If fn is in a sublist, it will be last.
In general, if fx is in a sublist, it will never come before f(x - 1).
Basically preserving the main list's order when generating sublists.
It can be assumed that the length of the list will always be greater than the given argument.
I'm just starting to learn Haskell, so I haven't tried all that much, but here is what I have so far:
Generating the candidates with the subsequences function and applying filter ((== 4) . length) to it seems to produce the correct combinations (and it does preserve order; I first thought it didn't, but I was confusing it with my own function).
So what should I do?
Also if possible, is there a function or a combination of functions present in Hackage or Stackage which can do this? Because I would like to understand the source.
You describe a nondeterministic take:
ndtake :: Int -> [a] -> [[a]]
ndtake 0 _ = [[]]
ndtake n [] = []
ndtake n (x:xs) = map (x:) (ndtake (n-1) xs) ++ ndtake n xs
Either we take an x, and have n-1 more to take from xs; or we don't take the x and have n more elements to take from xs.
Running:
> ndtake 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Update: you wanted efficiency. If we're sure the input list is finite, we can aim at stopping as soon as possible:
ndetake n xs = go (length xs) n xs
where
go spare n _ | n > spare = []
go spare n xs | n == spare = [xs]
go spare 0 _ = [[]]
go spare n [] = []
go spare n (x:xs) = map (x:) (go (spare-1) (n-1) xs)
++ go (spare-1) n xs
Trying it:
> length $ ndetake 443 [1..444]
444
The former version seems to be stuck on this input, but the latter one returns immediately.
But it measures the length of the whole list, and needlessly so, as pointed out by @dfeuer in the comments. We can achieve the same improvement in efficiency while retaining a bit more laziness:
ndzetake :: Int -> [a] -> [[a]]
ndzetake n xs | n > 0 =
go n (length (take n xs) == n) (drop n xs) xs
where
go n b p ~(x:xs)
| n == 0 = [[]]
| not b = []
| null p = [(x:xs)]
| otherwise = map (x:) (go (n-1) b p xs)
++ go n b (tail p) xs
Now the last test works instantly with this code as well.
There's still room for improvement here. Just as with the library function subsequences, the search space could be explored even more lazily. Right now we have
> take 9 $ ndzetake 3 [1..]
[[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,2,7],[1,2,8],[1,2,9],[1,2,10],[1,2,11]]
but it could be finding [2,3,4] before forcing the 5 out of the input list. Shall we leave it as an exercise?
Here's the best I've been able to come up with. It answers the challenge Will Ness laid down to be as lazy as possible in the input. In particular, ndtake m ([1..n]++undefined) will produce as many entries as possible before throwing an exception. Furthermore, it strives to maximize sharing among the result lists (note the treatment of end in ndtakeEnding'). It avoids problems with badly balanced list appends using a difference list. This sequence-based version is considerably faster than any pure-list version I've come up with, but I haven't teased apart just why that is. I have the feeling it may be possible to do even better with a better understanding of just what's going on, but this seems to work pretty well.
Here's the general idea. Suppose we ask for ndtake 3 [1..5]. We first produce all the results ending in 3 (of which there is one). Then we produce all the results ending in 4. We do this by (essentially) calling ndtake 2 [1..3] and adding the 4 onto each result. We continue in this manner until we have no more elements.
{-# LANGUAGE BangPatterns #-} -- for the !front pattern in ndtakeEnding'

import qualified Data.Sequence as S
import Data.Sequence (Seq, (|>))
import Data.Foldable (toList)
We will use the following simple utility function. It's almost the same as splitAtExactMay from the 'safe' package, but hopefully a bit easier to understand. For reasons I haven't investigated, letting this produce a result when its argument is negative leads to ndtake with a negative argument being equivalent to subsequences. If you want, you can easily change ndtake to do something else for negative arguments.
-- Change the n <= 0 guard below to n == 0 if you'd rather have ndtake
-- return an empty list in the negative case.
splitAtMay :: Int -> [a] -> Maybe ([a], [a])
splitAtMay n xs
| n <= 0 = Just ([], xs)
splitAtMay _ [] = Nothing
splitAtMay n (x : xs) = flip fmap (splitAtMay (n - 1) xs) $
\(front, rear) -> (x : front, rear)
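For example:
> splitAtMay 2 [1,2,3]
Just ([1,2],[3])
> splitAtMay 4 [1,2,3]
Nothing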
Now we really get started. ndtake is implemented using ndtakeEnding, which produces a sort of "difference list", allowing all the partial results to be concatenated cheaply.
ndtake :: Int -> [t] -> [[t]]
ndtake n xs = ndtakeEnding n xs []
ndtakeEnding :: Int -> [t] -> ([[t]] -> [[t]])
ndtakeEnding 0 _xs = ([]:)
ndtakeEnding n xs = case splitAtMay n xs of
Nothing -> id -- Not enough elements
Just (front, rear) ->
(front :) . go rear (S.fromList front)
where
-- For each element, produce a list of all combinations
-- *ending* with that element.
go [] _front = id
go (r : rs) front =
ndtakeEnding' [r] (n - 1) front
. go rs (front |> r)
ndtakeEnding doesn't call itself recursively. Rather, it calls ndtakeEnding' to calculate the combinations of the front part. ndtakeEnding' is very much like ndtakeEnding, but with a few differences:
We use a Seq rather than a list to represent the input sequence. This lets us split and snoc cheaply, but I'm not yet sure why that seems to give amortized performance that is so much better in this case.
We already know that the input sequence is long enough, so we don't need to check.
We're passed a tail (end) to add to each result. This lets us share tails when possible. There are lots of opportunities for sharing tails, so this can be expected to be a substantial optimization.
We use foldr rather than pattern matching. Doing this manually with pattern matching gives clearer code, but worse constant factors. That's because the :<| and :|> patterns exported from Data.Sequence are non-trivial pattern synonyms that perform a bit of calculation, including amortized O(1) allocation, to build the tail or initial segment, whereas folds don't need to build those.
NB: this implementation of ndtakeEnding' works well for recent GHC and containers; it seems less efficient for earlier versions. That might be the work of Donnacha Kidney on foldr for Data.Sequence. In earlier versions, it might be more efficient to pattern match by hand, using viewl for versions that don't offer the pattern synonyms.
ndtakeEnding' :: [t] -> Int -> Seq t -> ([[t]] -> [[t]])
ndtakeEnding' end 0 _xs = (end:)
ndtakeEnding' end n xs = case S.splitAt n xs of
(front, rear) ->
((toList front ++ end) :) . go rear front
where
go = foldr go' (const id) where
go' r k !front = ndtakeEnding' (r : end) (n - 1) front . k (front |> r)
-- With patterns, a bit less efficiently:
-- go Empty _front = id
-- go (r :<| rs) !front =
-- ndtakeEnding' (r : end) (n - 1) front
-- . go rs (front :|> r)
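To check the laziness claim from the top of this answer, here is a quick GHCi session: [1..5] ++ undefined has only five defined elements, yet all choose-3 combinations of them come out before the undefined tail is forced (the ordering shown is what the code above produces, grouped by last element):
> take 10 $ ndtake 3 ([1..5] ++ undefined)
[[1,2,3],[1,2,4],[1,3,4],[2,3,4],[1,2,5],[1,3,5],[2,3,5],[1,4,5],[2,4,5],[3,4,5]]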
I'd like to reverse the first k elements of a list efficiently.
This is what I came up with:
reverseFirst :: Int -> [a] -> [a] -> [a]
reverseFirst 0 xs rev = rev ++ xs
reverseFirst k (x:xs) rev = reverseFirst (k-1) xs (x:rev)
reversed = reverseFirst 3 [1..5] mempty -- Result: [3,2,1,4,5]
It is fairly nice, but the (++) bothers me. Or should I maybe consider using another data structure? I want to do this many millions of times with short lists.
Let's think about the usual structure of reverse:
reverse = rev [] where
rev acc [] = acc
rev acc (x : xs) = rev (x : acc) xs
It starts with the empty list and tacks on elements from the front of the argument list till it's done. We want to do something similar, except we want to tack the elements onto the front of the portion of the list that we don't reverse. How can we do that when we don't have that un-reversed portion yet?
The simplest way I can think of to avoid traversing the front of the list twice is to use laziness:
reverseFirst :: Int -> [a] -> [a]
reverseFirst k xs = dis where
(dis, dat) = rf dat k xs
rf acc 0 ys = (acc, ys)
rf acc n [] = (acc, [])
rf acc n (y : ys) = rf (y : acc) (n - 1) ys
dat represents the portion of the list that is left alone. We calculate it in the same helper function rf that does the reversing, but we also pass it to rf in the initial call. It's never actually examined in rf, so everything just works. Looking at the generated core (using ghc -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures) suggests that the pairs are compiled away into unlifted pairs and the Ints are unboxed, so everything should probably be quite efficient.
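Trying it:
> reverseFirst 3 [1..5]
[3,2,1,4,5]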
Profiling suggests that this implementation is about 1.3 times as fast as the difference list one, and allocates about 65% as much memory.
Well, usually I'd just write splitAt 3 >>> first reverse >>> uncurry (++) to achieve the goal.
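Spelled out as a self-contained definition ((>>>) and first come from Control.Arrow; I've generalized the 3 to a parameter k):
import Control.Arrow (first, (>>>))

reverseFirst :: Int -> [a] -> [a]
reverseFirst k = splitAt k >>> first reverse >>> uncurry (++)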
If you're anxious about performance, you can consider a difference list:
reverseFirstN :: Int -> [a] -> [a]
reverseFirstN = go id
where go rev 0 xs = rev xs
go rev k (x:xs) = go ((x:).rev) (k-1) xs
but frankly I wouldn't expect this to be a lot faster: you need to traverse the first n elements either way. Actual performance will depend a lot on what the compiler is able to fuse away.
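Trying it:
> reverseFirstN 3 [1..5]
[3,2,1,4,5]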
I'm writing a mastermind solver, and in an inner loop I calculate the length of the intersection with duplicates of two lists. Right now the function I have is
overlap :: Eq c => [c] -> [c] -> Int
overlap [] _ = 0
overlap (x:xs) ys
| x `elem` ys = 1 + overlap xs (delete x ys)
| otherwise = overlap xs ys
Is it possible to make this faster? If it helps, the arguments to overlap are short lists of the same length, at most 6 elements, and the c type has less than 10 possible values.
In general it is (almost) impossible to boost the performance of such an algorithm: computing the intersection (with duplicates) of two unordered, unhashable lists inherently takes O(n^2).
You can, however, boost performance under certain conditions (each condition suggests a different approach):
If you can ensure that for each list you create/modify/..., the order of the elements is maintained (this can require some engineering), then the algorithm can run in O(n).
In that case you can run it with:
--Use this only if xs and ys are sorted
overlap :: Ord c => [c] -> [c] -> Int
overlap (x:xs) (y:ys) | x < y = overlap xs (y:ys)
| x > y = overlap (x:xs) ys
| otherwise = 1 + overlap xs ys
overlap [] _ = 0
overlap _ [] = 0
In general sorting of a list can be done in O(n log n) and is thus more efficient than your O(n^2) overlap algorithm. The new overlap algorithm runs in O(n).
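If the lists don't arrive sorted, a thin wrapper that sorts first still wins asymptotically:
import Data.List (sort)

-- O(n log n) overall: sort both lists, then merge-count in O(n)
overlapUnsorted :: Ord c => [c] -> [c] -> Int
overlapUnsorted xs ys = overlap (sort xs) (sort ys)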
In case c is ordered, you might use a Data.Set as well. You can use the fromList function, which runs in O(n log n), to build a balanced-tree set from each list, then use the intersection function to calculate the intersection in O(n) time, and finally use the size function to get its size. Note, however, that sets discard duplicates, so this counts distinct common values rather than the multiset overlap your original function computes.
--Use this only if c can be ordered
--(and note that duplicates are not counted)
import Data.Set (fromList, intersection, size)

overlap :: Ord c => [c] -> [c] -> Int
overlap xs ys = size $ intersection (fromList xs) (fromList ys)
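If you do need duplicates counted, as in the original overlap, a sketch using Data.Map as a bag (multiset) keeps the O(n log n) bound; the name overlapBag is mine:
import qualified Data.Map.Strict as M

-- Build value -> multiplicity maps, intersect with min,
-- and sum the multiplicities of the common part.
overlapBag :: Ord c => [c] -> [c] -> Int
overlapBag xs ys = sum (M.intersectionWith min (bag xs) (bag ys))
  where bag = M.fromListWith (+) . map (\x -> (x, 1 :: Int))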
Are you using the same ys for multiple xs?
If yes, you can try to calculate hash values for each element in ys and match by this value, but keep in mind that calculating the hash needs to be faster than 6 comparisons.
If either of those is Ord, you may also sort it earlier and examine only the necessary part of ys.
However, if you need fast random access, lists aren't the best structure; you should probably take a look at Data.Array and Data.HashMap.
To put it straight, I'm fairly new to Haskell and trying to solve a problem (a programming exercise) I came across. It says I should create a function
com :: Int -> [t] -> [[t]]
that returns all possible choices of n elements, where n and the list are the first and second arguments, respectively. Elements can be picked more than once and in a different order. A result would look like:
com 2 [1,2,3] = [[1,1], [1,2]..[3,3]]
For the cases n = 1 and n = 2, I managed to find solutions. The case n = 1 is quite simple, and for n = 2 I would use concatenation and build it up. However, I don't understand how to make it n-ary so it works for all n, for instance if suddenly a function call were com 10 ...
Is this what you want?
> sequence (replicate 3 "abc")
["aaa","aab","aac","aba","abb","abc","aca","acb","acc"
,"baa","bab","bac","bba","bbb","bbc","bca","bcb","bcc"
,"caa","cab","cac","cba","cbb","cbc","cca","ccb","ccc"]
The above exploits the fact that sequence, in the list monad, builds the cartesian product of a list of lists. So, we can simply replicate our list n times, and then take the product.
(Note that "abc" above is shorthand for the list of characters ['a','b','c'].)
So, a solution could be
com n xs = sequence (replicate n xs)
or equivalently, as Daniel Wagner points out below,
com = replicateM
A final note: I do realize that this is probably not very helpful for actually learning how to program. Indeed, I pulled two "magic" functions from the library which solved the task. Still, it shows how the problem can be reduced to two subproblems: 1) replicating a value n times and 2) building a cartesian product. The second task is a nice exercise on its own, if you don't want to use the library. You may wish to solve that starting from:
sequence :: [[a]] -> [[a]]
sequence [] = [[]]
sequence (x:xs) = ...
where ys = sequence xs
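(If you want to check your answer later: one possible completion, keeping the skeleton above, is the following; in a real module you would have to hide the Prelude's own sequence.)
sequence :: [[a]] -> [[a]]
sequence [] = [[]]
sequence (x:xs) = [y : rest | y <- x, rest <- ys]
  where ys = sequence xs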
First: [] is a list constructor, not a tuple, and I don't know of any general way to build an n-ary tuple.
However, sticking to lists: if you have the n = 1 case solved and the n = 2 case solved, try to express the latter in terms of the former. Then generalize to any n in terms of n - 1:
com n xs = concat [map (x:) (com (n-1) xs) | x <- xs ]
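With a base case added (there is exactly one way to choose zero elements: the empty choice), the complete function reads:
com :: Int -> [a] -> [[a]]
com 0 _ = [[]]
com n xs = concat [ map (x:) (com (n-1) xs) | x <- xs ]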
A more verbose way to write it, but potentially more helpful when trying to understand list non-determinism and what the Haskell comprehension syntactic sugar really means, is to write it with do notation:
com :: Int -> [a] -> [[a]]
com 0 _ = [[]] -- one empty choice, not zero choices
com 1 xs = [[x] | x <- xs]
com n xs = do
x <- xs
let ys = com (n - 1) xs
map (x:) ys
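For example:
> com 2 "ab"
["aa","ab","ba","bb"]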
Here is a sample problem I'm working on:
Example Input: test [4, 1, 5, 6] 6 returns 5
I'm solving this using this function:
test :: [Int] -> Int -> Int
test [] _ = 0
test (x:xs) time = if (time - x) < 0
then x
else test xs $ time - x
Is there a better way to solve this (perhaps using a built-in higher-order function)?
How about
test xs time = maybe 0 id . fmap snd . find ((>time) . fst) $ zip sums xs
where sums = scanl1 (+) xs
or equivalently with that sugary list comprehension
test xs time = headDef 0 $ [v | (s, v) <- zip sums xs, s > time]
where sums = scanl1 (+) xs
headDef is provided by safe. It's trivial to implement (headDef _ (x:_) = x; headDef d _ = d), but the safe package has loads of useful functions like these, so it's good to check out.
This sums the list up to each point and finds the first occurrence greater than time. scanl is a useful function that behaves like foldl but keeps intermediate results, and zip zips two lists into a list of tuples. Then we just use fmap and maybe to manipulate the Maybe (Integer, Integer) and get our result.
This defaults to 0 like yours, but I like the version that simply returns a Maybe Integer better from a user's point of view; to get that, simply remove the maybe 0 id.
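For example, with the question's input, the running sums make the logic visible:
> scanl1 (+) [4,1,5,6]
[4,5,10,16]
> test [4,1,5,6] 6
5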
You might like scanl and its close relative, scanl1. For example:
test_ xs time = [curr | (curr, tot) <- zip xs (scanl1 (+) xs), tot > time]
This finds all the places where the running sum is greater than time. Then you can pick the first one (or 0) like this:
safeHead def xs = head (xs ++ [def])
test xs time = safeHead 0 (test_ xs time)
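Trying it:
> test_ [4,1,5,6] 6
[5,6]
> test [4,1,5,6] 6
5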
This is verbose, and I don't necessarily recommend writing such a simple function this way (IMO the pattern matching & recursion is plenty clear). But here's a pretty declarative pipeline:
import Control.Error
import Data.List
deadline :: (Num a, Ord a) => a -> [a] -> a
deadline time = fromMaybe 0 . findDeadline time
findDeadline :: (Num a, Ord a) => a -> [a] -> Maybe a
findDeadline time xs = decayWithDifferences time xs
>>= findIndex (< 0)
>>= atMay xs
decayWithDifferences :: Num b => b -> [b] -> Maybe [b]
decayWithDifferences time = tailMay . scanl (-) time
-- > deadline 6 [4, 1, 5, 6]
-- 5
This documents the code a bit and in principle lets you test a little better, though IMO these functions fit more-or-less into the 'obviously correct' category.
You can verify that it matches your implementation:
import Test.QuickCheck
prop_equality :: [Int] -> Int -> Bool
prop_equality xs time = test xs time == deadline time xs
-- > quickCheck prop_equality
-- +++ OK, passed 100 tests.
In this particular case the zipping suggested by others is not quite necessary:
test xs time = head $ [y-x | (x:y:_) <- tails $ scanl1 (+) $ 0:xs, y > time]++[0]
Here scanl1 will produce a list of rolling sums of the list xs, starting it with 0. Therefore, tails will produce a list with at least one list having two elements for non-empty xs. Pattern-matching (x:y:_) extracts two elements from each tail of rolling sums, so in effect it enumerates pairs of neighbouring elements in the list of rolling sums. Filtering on the condition, we reconstruct a part of the list that starts with the first element that produces a rolling sum greater than time. Then use headDef 0 as suggested before, or append a [0], so that head always returns something.
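Concretely, for test [4,1,5,6] 6: the rolling sums are [0,4,5,10,16], the neighbouring pairs (x,y) are (0,4), (4,5), (5,10) and (10,16), and the first pair with y > 6 is (5,10), so the answer is 10 - 5 = 5.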
If you want to retain readability, I would just stick with your current solution. It's easy to understand, and isn't doing anything wrong.
Just because you can turn it into a one-line scan-map-fold mutant doesn't mean that you should!