Haskell Sieve of Eratosthenes with list of composites

I have to implement the classic Sieve of Eratosthenes in Haskell for a project. Rather than computing each prime, I only have to compare numbers between lists. For instance, I pass a list of potential primes (parameter 1) and a list of composites (parameter 2), and sieve [2..10] [] should result in the list [2,3,5,7].
I think I am very close, and it compiles, but it appends every item to the prime list rather than throwing out the composites. My thinking was that it would take the list x of all numbers 2..10 (or whatever) and a list y of composites, use elem to see whether the head of list x is found in list y, and if not, append it to list z and print. Thanks in advance!
Currently my code returns everything in the first list and never filters anything out: sieve [2..10] [] results in [2,3,4,5,6,7,8,9,10]
sieve :: [Int] -> [Int] -> [Int]
z = []
sieve [] [] = []
sieve x  [] = x
sieve [] y  = y
sieve xs ys = if elem (head xs) ys then sieve (tail xs) ys
              else head xs : z

What you call sieve is usually called minus, subtracting the second list from the first, assuming both are ordered, increasing lists of numbers. Then it is enough to compare just the two head elements, without any elem calls.
But it could still work, had you provided a proper definition for z. z = [] is just a placeholder to make it compile (right?); it's not the right definition. It should have been:
sieve :: [Int] -> [Int] -> [Int]
-- z = []
sieve [] [] = []
sieve x  [] = x
sieve [] y  = y
sieve xs ys = if elem (head xs) ys then sieve (tail xs) z
              else head xs : sieve (tail xs) ys
  where
    z = ... -- need to remove (head xs) from ys
For the last comment's task, you could use e.g. the delete function from Data.List.
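For instance (one way to spell out that hint; this is just the completion the comment above asks for, shown with the whole function so it runs on its own):

import Data.List (delete)

sieve :: [Int] -> [Int] -> [Int]
sieve [] [] = []
sieve x  [] = x
sieve [] y  = y
sieve xs ys = if elem (head xs) ys then sieve (tail xs) z
              else head xs : sieve (tail xs) ys
  where
    z = delete (head xs) ys -- drop the matched composite, keeping ys in step with xs

-- e.g. sieve [2..10] [4,6,8,9,10] == [2,3,5,7]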
This still won't produce a list of primes without a list of composites, so the initial call cannot be made with an empty second list (or else you'd get the same first argument back, as you do, because of the sieve x [] = x equation):
primesAmong input = sieve input composites
But what are composites? Eratosthenes's answer is: why, they are multiples of primes! (And trial division says: composites have other primes as their divisors.)
Given a prime, say 2, we just count: 2,4,6,...; and for 3, say, it's 3,6,9,12,...; to find its multiples. Let's write it down:
composites = multiplesOf primes
multiplesOf primes = [ mult | p <- primes, mult <- [...] ]
This doesn't quite fit: this multiplesOf needs an argument:
primes = primesAmong input
primesAmong input = sieve input (multiplesOf primes)
We seem to be chasing our own tail here; we don't have primes yet; what can we use instead? Is there a harm in finding multiples of non-primes, as well as primes?
After you do have a running code, try to find a way to use primes after all.
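Putting the pieces together for a finite input, here is one hedged sketch (not necessarily the intended final answer: it takes multiples of all candidates rather than of primes, which the preceding paragraph suggests is harmless, and it assumes the input is a contiguous range like [2..n], so every composite it generates occurs in the input and the composite list drains in step with it):

import Data.List (delete, nub)

primesAmong :: [Int] -> [Int]
primesAmong xs = sieve xs (multiplesOf xs)
  where
    -- nub: list each composite once, so the deletes inside sieve drain it exactly
    multiplesOf ns = nub [ m | n <- ns, m <- [2*n, 3*n .. maximum ns] ]

-- e.g. primesAmong [2..10] == [2,3,5,7]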

The program you show doesn't make much sense: first of all, the sieve x [] equation will always be used; furthermore, you should check whether an element is divisible by the elements of the other list. Finally, you should make the call recursive, something you don't do with head xs : z, since z is defined as the empty list.
Let's start with the base case: in case the left list is empty, regardless of the content of the second list, one returns the empty list. Since sieving nothing will result in nothing:
sieve [] _ = []
Next we look for the inductive case, with the pattern:
sieve (x:xs) ds = ...
Now we need to enumerate over the list of already found elements. From the moment any of the found elements divides x, we know the number is not (relatively) prime. This condition is formalized as:
(==) 0 . mod x :: Integral b => b -> Bool
Or to iterate over the list of ds:
any ((==) 0 . mod x) ds
In case such an element exists, we simply skip the element and call the inductive case with sieve xs ds.
In case there is no such element, we add it to the list ds and emit it; the result is then x : sieve xs (x:ds). The inductive case is thus:
sieve (x:xs) ds | any ((==) 0 . mod x) ds = sieve xs ds
                | otherwise               = x : sieve xs (x:ds)
We can shorten this a bit by introducing a helper for sieve xs:
sieve (x:xs) ds | any ((==) 0 . mod x) ds = rec ds
                | otherwise               = x : rec (x:ds)
    where rec = sieve xs
The full function is thus:
sieve :: Integral a => [a] -> [a] -> [a]
sieve [] _ = []
sieve (x:xs) ds | any ((==) 0 . mod x) ds = rec ds
                | otherwise               = x : rec (x:ds)
    where rec = sieve xs
You can boost the performance in two ways:
Adding x at the end of ds. This is indeed a more expensive operation, but after a while you don't add numbers that often. It is interesting because in that case ds looks like [2,3,5,7,11] instead of [11,7,5,3,2]. Now the chance that a number is divisible by 2 (50%) is greater than the chance that it is divisible by 11 (about 9.1%), and it is better to try the test that is most likely to succeed first.
Furthermore, you can end the check once the divisors reach the square root of the number under test: if a number has no divisor less than or equal to its square root, it has no nontrivial divisor at all, since any factor above the square root would pair with one below it.
A more efficient approach is thus:
sieve :: Integral a => [a] -> [a] -> [a]
sieve [] _ = []
sieve (x:xs) ds | any ((==) 0 . mod x) (takeWhile (\y -> y*y <= x) ds) = rec ds
                | otherwise = x : rec (ds ++ [x])
    where rec = sieve xs
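A quick sanity check in GHCi (assuming either definition above):

sieve [2..30] [] -- [2,3,5,7,11,13,17,19,23,29]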


Haskell - split a list into two sublists with closest sums

I'm a Haskell beginner trying to learn more about the language by solving some online quizzes/problem sets.
The problem/question is quite lengthy but a part of it requires code that can find the number which divides a given list into two (nearly) equal (by sum) sub-lists.
Given [1..10]
Answer should be 7 since 1+2+...+7 = 28 & 8+9+10 = 27
This is the way I implemented it
-- partitions list by y
partishner :: (Floating a) => Int -> [a] -> [[[a]]]
partishner 0 xs = [[xs],[]]
partishner y xs = [take y xs : [drop y xs]] ++ partishner (y - 1) xs
-- finds the equal sum
findTheEquilizer :: (Ord a, Floating a) => [a] -> [[a]]
findTheEquilizer xs = fst $ minimumBy (comparing snd) zipParty
  where party = (tail . init) (partishner (length xs) xs) -- removes [xs,[]] types
        afterParty = (map (\[x, y] -> (x - y) ** 2) . init . map (map sum)) party
        zipParty = zip party afterParty -- zips partitions and squared diff between their sums
Given (last . head) (findTheEquilizer [1..10])
output : 7
For numbers near 50k it works fine
λ> (last . head) (findTheEquilizer [1..10000])
7071.0
The trouble starts when I put in lists with any more than 70k elements in it. It takes forever to compute.
So what do I have to change in the code to make it run better, or do I have to change my whole approach? I'm guessing it's the latter, but I'm not sure how to go about doing that.
It looks to me like the implementation is quite chaotic. For example, partishner seems to construct a list of lists of lists of a where, if I understood it correctly, the outer list contains lists with two elements each: the list of elements on "the left", and the list of elements on "the right". As a result, this takes O(n^2) to construct the lists.
By using lists instead of 2-tuples, this is also quite "unsafe", since a list can - although here probably impossible - contain no elements, one element, or more than two elements. If you make a mistake in one of the functions, it will be hard to track down.
It looks to me that it might be easier to implement a "sweep algorithm": we first calculate the sum of all the elements in the list. This is the value on the "right" in case we decide to split at that specific point, next we start moving from left to right, each time subtracting the element from the sum on the right, and adding it to the sum on the left. We can each time evaluate the difference in score, like:
import Data.List(unfoldr)

sweep :: Num a => [a] -> [(Int, a, [a])]
sweep lst = x0 : unfoldr f x0
  where x0 = (0, sum lst, lst)
        f (_, _, []) = Nothing
        f (i, r, (x:xs)) = Just (l, l)
          where l = (i+1, r - 2*x, xs)
For example:
Prelude Data.List> sweep [1,4,2,5]
[(0,12,[1,4,2,5]),(1,10,[4,2,5]),(2,2,[2,5]),(3,-2,[5]),(4,-12,[])]
So if we select to split at the first split point (before the first element), the sum on the right is 12 higher than the sum on the left, if we split after the first element, the sum on the right (11) is 10 higher than the sum on the left (1).
We can then obtain the minimum of these splits with minimumBy :: (a -> a -> Ordering) -> [a] -> a:
import Data.List(minimumBy)
import Data.Ord(comparing)

findTheEquilizer :: (Ord a, Num a) => [a] -> ([a], [a])
findTheEquilizer lst = (take idx lst, tl)
  where (idx, _, tl) = minimumBy (comparing (abs . \(_, x, _) -> x)) (sweep lst)
We then obtain the correct value for [1..10]:
Prelude Data.List Data.Ord Data.List> findTheEquilizer [1..10]
([1,2,3,4,5,6,7],[8,9,10])
or for 70'000:
Prelude Data.List Data.Ord Data.List> head (snd (findTheEquilizer [1..70000]))
49498
The above is not ideal, it can be implemented more elegantly, but I leave this as an exercise.
Okay, firstly, let's analyse why it runs forever (...actually not forever, just very slowly). Take a look at the partishner function:
partishner y xs = [take y xs : [drop y xs]] ++ partishner (y - 1) xs
where take y xs and drop y xs each run in linear time, i.e. O(N), and so
[take y xs : [drop y xs]]
is O(N) too.
However, it is run again and again, recursively, for each element of the given list. Now suppose the length of the given list is M; since each call of partishner takes O(N) time, finishing the computation needs:
O(1 + 2 + ... + M) = O(M(M+1)/2) ~ O(M^2)
Now, the list has 70k elements, so it needs on the order of 70k^2 steps. That is why it hangs.
Instead of using the partishner function, you can compute the running sums of the list in linear time:
sumList :: (Floating a) => [a] -> [a]
sumList xs = sum 0 xs
  where sum _ [] = []
        sum s (y:ys) = let s' = s + y in s' : sum s' ys
and findEquilizer just sums the given list from left to right (leftSum) and from right to left (rightSum) and takes the result just as your original program does, but the whole process takes only linear time:
import Data.List (minimumBy)
import Data.Ord (comparing)

findEquilizer :: (Ord a, Floating a) => [a] -> a
findEquilizer [] = 0
findEquilizer xs =
  let leftSum    = reverse $ 0 : sumList (init xs)
      rightSum   = sumList (reverse xs)
      afterParty = zipWith (\x y -> (x - y) ** 2) leftSum rightSum
  in fst $ minimumBy (comparing snd) (zip (reverse (init xs)) afterParty)
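A quick sanity check (assuming the two definitions above):

findEquilizer [1..10] -- 7.0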
I assume that none of the list elements are negative, and use a "tortoise and hare" approach. The hare steps through the list, adding up elements. The tortoise does the same thing, but it keeps its sum doubled and it carefully ensures that it only takes a step when that step won't put it ahead of the hare.
approxEqualSums
  :: (Num a, Ord a)
  => [a] -> (Maybe a, [a])
approxEqualSums as0 = stepHare 0 Nothing as0 0 as0
  where
    -- ht is the current best guess.
    stepHare _tortoiseSum ht tortoise _hareSum []
      = (ht, tortoise)
    stepHare tortoiseSum ht tortoise hareSum (h:hs)
      = stepTortoise tortoiseSum ht tortoise (hareSum + h) hs

    stepTortoise tortoiseSum ht [] hareSum hare
      = stepHare tortoiseSum ht [] hareSum hare
    stepTortoise tortoiseSum ht tortoise@(t:ts) hareSum hare
      | tortoiseSum' <= hareSum
      = stepTortoise tortoiseSum' (Just t) ts hareSum hare
      | otherwise
      = stepHare tortoiseSum ht tortoise hareSum hare
      where tortoiseSum' = tortoiseSum + 2*t
In use:
> approxEqualSums [1..10]
(Just 6,[7,8,9,10])
6 is the last element before going over half, and 7 is the first one after that.
I asked in a comment, and the OP says [1..n] is not really what defines the question; what's asked about is more like a randomly ascending sequence from 1 to n, such as [1,3,7,19,37,...,1453,...,n].
Yet..! Even given the other answers, for a list like [1..n] we really don't need to do any list operations at all.
The sum of [1..n] is n*(n+1)/2.
Which means we need to find the m for which m(m+1)/2 = n*(n+1)/4 (half of the total).
So if n == 100 then m^2 + m - 5050 = 0
All we need is the quadratic formula m = (-b + sqrt(b^2 - 4ac)) / (2a) with a = 1, b = 1 and c = -5050, yielding the reasonable root 70.565 ⇒ 71 (rounded). Let's check: 71*72/2 = 2556 and 5050 - 2556 = 2494, so 2556 - 2494 = 62 is the minimal difference (< 71). Yes, we must split at 71. So just produce result = [[1..71],[72..100]]. Done..!
But when the list is not consecutively ascending, that's a different animal. It has to be done by first finding the total sum and then, binary-search-like, jumping halfway along the list and comparing the sums to decide whether to jump halfway back or forward accordingly. I will implement that one later.
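In the meantime, here is a hedged sketch of just the closed-form case worked out above (splitOneToN is a made-up name, and it assumes the input really is [1..n]):

splitOneToN :: Int -> ([Int], [Int])
splitOneToN n = ([1 .. m], [m + 1 .. n])
  where
    -- positive root of m^2 + m - n(n+1)/2 = 0, rounded to the nearest integer
    m = round ((sqrt (fromIntegral (1 + 2 * n * (n + 1))) - 1) / 2 :: Double)

-- e.g. splitOneToN 100 == ([1..71], [72..100])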
Here's code which empirically behaves better than linear, and gets to 2,000,000 in just over 1 second even when interpreted:
g :: (Ord c, Num c) => [c] -> [(Int, c)]
g = head . dropWhile ((> 0) . snd . last) . map (take 2) . tails . zip [1..]
. (\xs -> zipWith (-) (map (last xs -) xs) xs) . scanl1 (+)
g [1..10] ==> [(6,13),(7,-1)] -- 0.0s
g [1..70000] ==> [(49497,32494),(49498,-66502)] -- 0.09s
g [70000,70000-1..1] ==> [(20502,66502),(20503,-32494)] -- 0.09s
g [1..100000] ==> [(70710,75190),(70711,-66232)] -- 0.11s
g [1..1000000] ==> [(707106,897658),(707107,-516556)] -- 0.62s
g [1..2000000] ==> [(1414213,1176418),(1414214,-1652010)] -- 1.14s n^0.88
g [1..3000000] ==> [(2121320,836280),(2121321,-3406362)] -- 1.65s n^0.91
It works by running the partial sums with scanl1 (+) and taking the total sum as its last, so that for each partial sum, subtracting it from the total gives us the sum of the second part of the split.
The algorithm assumes all the numbers in the input list are strictly positive, so the partial sums list is monotonically increasing. Nothing else is assumed about the numbers.
The value must be chosen from the pair (g's result) whose second component's absolute value is the smaller of the two.
This is achieved by minimumBy (comparing (abs . snd)) . g.
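For example (with minimumBy and comparing from Data.List and Data.Ord in scope):

minimumBy (comparing (abs . snd)) (g [1..10]) ==> (7,-1) -- split after the 7th element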
clarifications: There's some confusion about "complexity" in the comments below, yet the answer says nothing at all about complexity but uses a specific empirical measurement. You can't argue with empirical data (unless you misinterpret its meaning).
The answer does not claim it "is better than linear", it says "it behaves better than linear" [in the tested range of problem sizes], which the empirical data incontrovertibly show.
Finally, an appeal to authority. Robert Sedgewick is an authority on algorithms. Take it up with him.
(and of course the algorithm handles unordered data as well as it does ordered).
As for the reasons for OP's code inefficiency: map sum . inits can't help being quadratic, but the equivalent scanl (+) 0 is linear. The radical improvement comes about from a lot of redundant calculations in the former being avoided in the latter. (Another example of this can be seen here.)

Reverse first k elements of a list

I'd like to reverse the first k elements of a list efficiently.
This is what I came up with:
reverseFirst :: Int -> [a] -> [a] -> [a]
reverseFirst 0 xs rev = rev ++ xs
reverseFirst k (x:xs) rev = reverseFirst (k-1) xs (x:rev)
reversed = reverseFirst 3 [1..5] mempty -- Result: [3,2,1,4,5]
It is fairly nice, but the (++) bothers me. Or should I maybe consider using another data structure? I want to do this many million times with short lists.
Let's think about the usual structure of reverse:
reverse = rev [] where
  rev acc []       = acc
  rev acc (x : xs) = rev (x : acc) xs
It starts with the empty list and tacks on elements from the front of the argument list till it's done. We want to do something similar, except we want to tack the elements onto the front of the portion of the list that we don't reverse. How can we do that when we don't have that un-reversed portion yet?
The simplest way I can think of to avoid traversing the front of the list twice is to use laziness:
reverseFirst :: Int -> [a] -> [a]
reverseFirst k xs = dis where
  (dis, dat) = rf dat k xs

  rf acc 0 ys       = (acc, ys)
  rf acc n []       = (acc, [])
  rf acc n (y : ys) = rf (y : acc) (n - 1) ys
dat represents the portion of the list that is left alone. We calculate it in the same helper function rf that does the reversing, but we also pass it to rf in the initial call. It's never actually examined in rf, so everything just works. Looking at the generated core (using ghc -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures) suggests that the pairs are compiled away into unlifted pairs and the Ints are unboxed, so everything should probably be quite efficient.
Profiling suggests that this implementation is about 1.3 times as fast as the difference list one, and allocates about 65% as much memory.
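A quick check (assuming the definition above):

reverseFirst 3 [1..5] -- [3,2,1,4,5]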
Well, usually I'd just write splitAt 3 >>> first reverse >>> uncurry (++) (with >>> and first from Control.Arrow) to achieve the goal.
If you're anxious about performance, you can consider a difference list:
reverseFirstN :: Int -> [a] -> [a]
reverseFirstN = go id
  where go rev 0 xs     = rev xs
        go rev k (x:xs) = go ((x:) . rev) (k-1) xs
but frankly I wouldn't expect this to be a lot faster: you need to traverse the first n elements either way. Actual performance will depend a lot on what the compiler is able to fuse away.
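For example (assuming the definition above; note that, like the code in the question, this sketch assumes the list has at least k elements):

reverseFirstN 3 [1..5] -- [3,2,1,4,5]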

Fast length of an intersection with duplicates in Haskell

I'm writing a mastermind solver, and in an inner loop I calculate the length of the intersection with duplicates of two lists. Right now the function I have is
import Data.List (delete)

overlap :: Eq c => [c] -> [c] -> Int
overlap [] _ = 0
overlap (x:xs) ys
  | x `elem` ys = 1 + overlap xs (delete x ys)
  | otherwise   = overlap xs ys
Is it possible to make this faster? If it helps, the arguments to overlap are short lists of the same length, at most 6 elements, and the c type has less than 10 possible values.
In general it is (almost) impossible to improve such an algorithm: intersecting two unordered, unhashable lists takes O(n^2).
You can however boost performance under the following conditions (each condition suggests a different approach):
If you can, for instance, ensure that for each list you create/modify/..., the order of the elements is maintained (this can require some engineering), then the algorithm can run in O(n).
In that case you can run it with:
-- Use this only if xs and ys are sorted
overlap :: Ord c => [c] -> [c] -> Int
overlap (x:xs) (y:ys) | x < y     = overlap xs (y:ys)
                      | x > y     = overlap (x:xs) ys
                      | otherwise = 1 + overlap xs ys
overlap [] _ = 0
overlap _ [] = 0
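For example, with both lists sorted and duplicates counted:

overlap [1,2,2,3] [2,2,4] -- 2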
In general, sorting a list can be done in O(n log n), which is more efficient than your O(n^2) overlap algorithm; the new overlap algorithm itself runs in O(n).
In case c is ordered, you might use a Data.Set as well. In that case you can use the fromList function, which runs in O(n log n), to create a (tree-based) set for each of the two lists, then use the intersection function to calculate the intersection in O(n) time, and finally use the size function to calculate the size.
import Data.Set (fromList, intersection, size)

-- Use this only if c can be ordered
overlap :: Ord c => [c] -> [c] -> Int
overlap xs ys = size $ intersection (fromList xs) (fromList ys)
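Note that a set discards duplicates, so this variant counts each common value only once, while the original overlap counts multiplicity. If duplicates must be counted, one option (a sketch, assuming Data.Map.Strict from the containers package) is to tally occurrences and sum the pointwise minima:

import qualified Data.Map.Strict as M

overlapDup :: Ord c => [c] -> [c] -> Int
overlapDup xs ys = sum (M.elems (M.intersectionWith min (count xs) (count ys)))
  where count = M.fromListWith (+) . map (\c -> (c, 1 :: Int))

-- e.g. overlapDup [1,2,2,3] [2,2,4] == 2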
Are you using the same ys for multiple xs?
If yes, you can try to calculate hash values for each element in ys and match by this value, but keep in mind that calculating the hash needs to be faster than 6 comparisons.
If either of those is Ord, you may also sort it earlier and check only the necessary part of ys.
However, if you need fast random access, lists aren't the best structure; you should probably take a look at Data.Array and Data.HashMap.

Better way to solve this [Int] -> Int -> Int

Here is a sample problem I'm working on:
Example Input: test [4, 1, 5, 6] 6 returns 5
I'm solving this using this function:
test :: [Int] -> Int -> Int
test [] _ = 0
test (x:xs) time = if (time - x) < 0
                     then x
                     else test xs $ time - x
Is there a better way to solve this (probably using a built-in higher-order function)?
How about
import Data.List (find)

test xs time = maybe 0 id . fmap snd . find ((> time) . fst) $ zip sums xs
  where sums = scanl1 (+) xs
or equivalently with that sugary list comprehension
test xs time = headDef 0 [v | (s, v) <- zip sums xs, s > time]
  where sums = scanl1 (+) xs
headDef is provided by safe. It's trivial to implement (f _ (x:_) = x; f x _ = x), but the safe package has loads of useful functions like these, so it's worth checking out.
This sums the list up to each point and finds the first occurrence greater than time. scanl is a useful function that behaves like foldl but keeps intermediate results, and zip zips two lists into a list of tuples. Then we just use fmap and maybe to manipulate the Maybe (Int, Int) to get our result.
This defaults to 0 like yours, but I like the version that simply returns Maybe Int better from a user's point of view; to get that, simply remove the maybe 0 id.
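A quick check (assuming the first definition above, with find from Data.List in scope):

test [4, 1, 5, 6] 6 -- 5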
You might like scanl and its close relative, scanl1. For example:
test_ xs time = [curr | (curr, tot) <- zip xs (scanl1 (+) xs), tot > time]
This finds all the places where the running sum is greater than time. Then you can pick the first one (or 0) like this:
safeHead def xs = head (xs ++ [def])
test xs time = safeHead 0 (test_ xs time)
This is verbose, and I don't necessarily recommend writing such a simple function like this (IMO the pattern matching & recursion is plenty clear). But, here's a pretty declarative pipeline:
import Control.Error
import Data.List

deadline :: (Num a, Ord a) => a -> [a] -> a
deadline time = fromMaybe 0 . findDeadline time

findDeadline :: (Num a, Ord a) => a -> [a] -> Maybe a
findDeadline time xs = decayWithDifferences time xs
                   >>= findIndex (< 0)
                   >>= atMay xs

decayWithDifferences :: Num b => b -> [b] -> Maybe [b]
decayWithDifferences time = tailMay . scanl (-) time
-- > deadline 6 [4, 1, 5, 6]
-- 5
This documents the code a bit and in principle lets you test a little better, though IMO these functions fit more-or-less into the 'obviously correct' category.
You can verify that it matches your implementation:
import Test.QuickCheck
prop_equality :: [Int] -> Int -> Bool
prop_equality xs time = test xs time == deadline time xs
-- > quickCheck prop_equality
-- +++ OK, passed 100 tests.
In this particular case the zipping suggested by others is not quite necessary:
test xs time = head $ [y-x | (x:y:_) <- tails $ scanl1 (+) $ 0:xs, y > time]++[0]
Here scanl1 will produce the list of rolling sums of the list xs, starting with 0. Therefore tails will produce a list with at least one two-element list for non-empty xs. The pattern (x:y:_) extracts two elements from each tail of the rolling sums, so in effect it enumerates pairs of neighbouring elements in the list of rolling sums. Filtering on the condition, we reconstruct the part of the list that starts with the first element producing a rolling sum greater than time. Then we use headDef 0 as suggested before, or append a [0], so that head always returns something.
If you want to retain readability, I would just stick with your current solution. It's easy to understand, and isn't doing anything wrong.
Just because you can make it into a one line scan map fold mutant doesn't mean that you should!

Haskell: List Boundary

I have a list of doubles (myList) which I want to add to a new list (someList), but once the new list's sum reaches a set bound (25, or 30 in the code below), I want to stop adding to it. I have tried implementing this function using sum but was unsuccessful. Example code below.
someList = [(a)| a <- myList, sum someList < 30]
The way @DanielFischer phrased the question is compatible with the Haskell way of thinking:
Do you want someList to be the longest prefix of myList that has a sum < 30?
Here's how I'd approach it: let's say our list is
>>> let list = [1..20]
we can find the "cumulative sums" using:
>>> let sums = tail . scanl (+) 0
>>> sums list
[1,3,6,10,15,21,28,36,45,55,66,78,91,105,120,136,153,171,190,210]
Now zip that with the original list to get pairs of elements with the sum up to that point
>>> zip list (sums list)
[(1,1),(2,3),(3,6),(4,10),(5,15),(6,21),(7,28),(8,36),
(9,45),(10,55),(11,66),(12,78),(13,91),(14,105),(15,120),
(16,136),(17,153),(18,171),(19,190),(20,210)]
Then we can takeWhile this list to get the prefix we want:
>>> takeWhile (\x -> snd x < 30) (zip list (sums list))
[(1,1),(2,3),(3,6),(4,10),(5,15),(6,21),(7,28)]
finally we can get rid of the cumulative sums that we used to perform this calculation:
>>> map fst (takeWhile (\x -> snd x < 30) (zip list (sums list)))
[1,2,3,4,5,6,7]
Note that because of laziness, this is as efficient as the recursive solutions -- only the sums up to the point where they fail the test need to be calculated. This can be seen because the solution works on infinite lists (because if we needed to calculate all the sums, we would never finish).
I'd probably abstract this and take the limit as a parameter:
>>> :{
... let initial lim list =
... map fst (takeWhile (\x -> snd x < lim) (zip list (sums list)))
... :}
This function has an obvious property it should satisfy, namely that the sum of the resulting list should always be less than the limit (as long as the limit is greater than 0). So we can use QuickCheck to make sure we did it right:
>>> import Test.QuickCheck
>>> quickCheck (\lim list -> lim > 0 ==> sum (initial lim list) < lim)
+++ OK, passed 100 tests.
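And, thanks to the laziness noted above, it works on infinite lists too:

>>> initial 30 [1..]
[1,2,3,4,5,6,7]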
someList = makeList myList [] 0 where
  makeList [] ys _ = ys   -- input exhausted before reaching the bound
  makeList (x:xs) ys total = let newTot = total + x
                             in if newTot >= 25
                                  then ys
                                  else makeList xs (ys ++ [x]) newTot
This takes elements from myList as long as their sum stays under 25.
The logic takes place in makeList. It takes the first element of the input list and adds it to the running total, to see whether the total would reach 25. If it would, we shouldn't add the element to the output list, and we finish recursing. Otherwise, we put x on the end of the output list (ys) and keep going with the rest of the input list, as illustrated below.
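For instance, with a hypothetical myList = [10, 9, 8, 7], someList evaluates to [10,9]: adding the 8 would push the total to 27, past the bound.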
The behaviour you want is
ghci> appendWhileUnder 25 [1..5] [1..5]
[1,2,3,4,5,1,2,3]
because that sums to 21 and adding the 4 would bring it to 25.
OK, one way to go about this is by just appending them with ++ then taking the initial segment that's under 25.
appendWhileUnder n xs ys = takeWhileUnder n (xs++ys)
I don't want to keep summing intermediate lists, so I'll keep track of how much allowance I have left (n).
takeWhileUnder n [] = []
takeWhileUnder n (x:xs) | x < n     = x : takeWhileUnder (n-x) xs
                        | otherwise = []
Here I allow x through if it doesn't take me beyond what's left of my allowance.
Possibly undesired side effect: it'll chop out bits of the original xs if xs alone already sums to over 25. Workaround: use
appendWhileUnder' n xs ys = xs ++ takeWhileUnder (n - sum xs) ys
which keeps the entire xs whether it brings you over n or not.
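For example (assuming the definitions above):

ghci> appendWhileUnder' 25 [1..5] [1..5]
[1,2,3,4,5,1,2,3]

Here xs sums to only 15, so both variants agree; they differ only when xs by itself already sums to more than n.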