Edit: I must not have worded it clearly enough, but I'm looking for a function like the one below, but not exactly it.
Given a list, I wanted to be able to find the index of the largest element in the list
(So, list !! (indexOfMaximum list) == maximum list)
I wrote some code that seems pretty efficient, although I feel I'm reinventing the wheel somewhere.
indexOfMaximum :: (Ord n, Num n) => [n] -> Int
indexOfMaximum list =
let indexOfMaximum' :: (Ord n, Num n) => [n] -> Int -> n -> Int -> Int
indexOfMaximum' list' currIndex highestVal highestIndex
| null list' = highestIndex
| (head list') > highestVal =
indexOfMaximum' (tail list') (1 + currIndex) (head list') currIndex
| otherwise =
indexOfMaximum' (tail list') (1 + currIndex) highestVal highestIndex
in indexOfMaximum' list 0 0 0
Now I want to return a list of the indices of the largest n numbers in the list.
My only solution is to store the top n elements in a list and replace (head list') > highestVal with a comparison across the n-largest-so-far list.
It feels like there has to be a more efficient way to do this, and I also feel I'm making insufficient use of Prelude and Data.List. Any suggestions?
This solution associates each element with its index, sorts the pairs so the smallest element comes first, reverses the result so the largest element comes first, takes the first n pairs, and then extracts the indices.
maxn n xs = map snd . take n . reverse . sort $ zip xs [0..]
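A quick GHCi check (my own example, not from the original answer); note that ties are broken by index because the pairs are sorted as (value, index), so the later 5 comes out first:
> maxn 2 [1,5,2,5,3]
[3,1]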
The shortest way finds the last index of a maximum element,
maxIndex list = snd . maximum $ zip list [0 .. ]
If you want the first index,
maxIndex list = snd . maximumBy cmp $ zip list [0 .. ]
where
cmp (v,i) (w,j) = case compare v w of
EQ -> compare j i
ne -> ne
The downside is that maximum and maximumBy are too lazy, so these may build large thunks. To avoid that, either use manual recursion (like you did, though some strictness annotations may be necessary) or use a strict left fold with a strict accumulator type. Plain tuples are not a good accumulator here, because foldl' only evaluates to weak head normal form (that is, to the outermost constructor), so thunks still build up in the tuple components.
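For illustration, here is a minimal sketch of that last suggestion, using a strict left fold over a custom strict accumulator type (the names Acc and maxIndexStrict are my own, not from any library):

import Data.List (foldl')

-- Strict fields, so forcing the constructor also forces its contents.
data Acc a = Acc !a !Int

-- First index of the maximum, computed with a strict left fold.
maxIndexStrict :: Ord a => [a] -> Int
maxIndexStrict []     = error "maxIndexStrict: empty list"
maxIndexStrict (x:xs) = let Acc _ i = foldl' step (Acc x 0) (zip xs [1..]) in i
  where
    step acc@(Acc best _) (v, j)
      | v > best  = Acc v j
      | otherwise = acc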
Well, a simple way would be to use maximum to find the largest element and then use findIndices to find each occurrence of it. Something like:
largestIndices :: Ord a => [a] -> [Int]
largestIndices ls = findIndices (== maximum ls) ls
However, this is not perfect because maximum is a partial function and will barf horribly if given an empty list. You can easily avoid this by adding a [] case:
largestIndices :: Ord a => [a] -> [Int]
largestIndices [] = []
largestIndices ls = findIndices (== maximum ls) ls
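For example (my own quick check in GHCi, assuming the definition is loaded):
> largestIndices [1,3,2,3]
[1,3]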
The real trick to this answer is how I figured it out. I didn't even know about findIndices before now! However, GHCi has a neat command called :browse.
Prelude> :browse Data.List
This lists every single function exported by Data.List. Using this, I just searched first for maximum and then for index to see what the options were. And, right by findIndex, there was findIndices, which was perfect.
Finally, I would not worry about efficiency unless you actually see that the code is running slowly. GHC can--and does--perform some very aggressive optimizations because the language is pure and it can get away with it. So the only time you need to worry about performance is when--after compiling with -O2--you see that it's a problem.
EDIT: If you want to find the n top elements' indices, here's an easy idea: sort the list in descending order, grab the first n unique elements, get their indices with elemIndices and take the first n indices from that. I hope this is relatively clear.
Here's a quick version of my idea:
nLargestIndices n ls = take n $ concatMap (`elemIndices` ls) nums
where nums = take n . reverse . nub $ sort ls
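A quick sanity check in GHCi (my own example): 9 sits at index 5 and 6 at index 7, so
> nLargestIndices 2 [3,1,4,1,5,9,2,6]
[5,7]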
Related
TL;DR: I want the exact behavior of filter ((== 4) . length) . subsequences. Just using subsequences also creates lists of varying lengths, which take a lot of time to process. Since in the end only lists of length 4 are needed, I was thinking there must be a faster way.
I have a list of functions. The list has the type [Wor -> Wor]
The list looks something like this
[f1, f2, f3 .. fn]
What I want is a list of lists of n functions while preserving order like this
input : [f1, f2, f3 .. fn]
argument : 4 functions
output : A list of lists of 4 functions.
Expected output: if there's an f1 in a sublist, it'll always be at the head of that sublist.
If there's an f2 in the sublist and the sublist doesn't have f1, f2 will be at the head. If fn is in the sublist, it'll be last.
In general, if there's an fx in the list, it will never be in front of f(x - 1).
Basically preserving the main list's order when generating sublists.
It can be assumed that the length of the list will always be greater than the given argument.
I'm just starting to learn Haskell, so I haven't tried all that much, but so far this is what I have tried:
Generating permutations with the subsequences function and applying filter ((== 4) . length) to it seems to generate the correct permutations -but it doesn't preserve order- (it preserves order, I was confusing it with my own function).
So what should I do?
Also if possible, is there a function or a combination of functions present in Hackage or Stackage which can do this? Because I would like to understand the source.
You describe a nondeterministic take:
ndtake :: Int -> [a] -> [[a]]
ndtake 0 _ = [[]]
ndtake n [] = []
ndtake n (x:xs) = map (x:) (ndtake (n-1) xs) ++ ndtake n xs
Either we take an x, and have n-1 more to take from xs; or we don't take the x and have n more elements to take from xs.
Running:
> ndtake 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Update: you wanted efficiency. If we're sure the input list is finite, we can aim at stopping as soon as possible:
ndetake n xs = go (length xs) n xs
where
go spare n _ | n > spare = []
go spare n xs | n == spare = [xs]
go spare 0 _ = [[]]
go spare n [] = []
go spare n (x:xs) = map (x:) (go (spare-1) (n-1) xs)
++ go (spare-1) n xs
Trying it:
> length $ ndetake 443 [1..444]
444
The former version seems to be stuck on this input, but the latter one returns immediately.
But, it measures the length of the whole list, and needlessly so, as pointed out by @dfeuer in the comments. We can achieve the same improvement in efficiency while retaining a bit more laziness:
ndzetake :: Int -> [a] -> [[a]]
ndzetake n xs | n > 0 =
go n (length (take n xs) == n) (drop n xs) xs
where
go n b p ~(x:xs)
| n == 0 = [[]]
| not b = []
| null p = [(x:xs)]
| otherwise = map (x:) (go (n-1) b p xs)
++ go n b (tail p) xs
Now the last test works instantly with this code as well.
There's still room for improvement here. Just as with the library function subsequences, the search space could be explored even more lazily. Right now we have
> take 9 $ ndzetake 3 [1..]
[[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,2,7],[1,2,8],[1,2,9],[1,2,10],[1,2,11]]
but it could be finding [2,3,4] before forcing the 5 out of the input list. Shall we leave it as an exercise?
Here's the best I've been able to come up with. It answers the challenge Will Ness laid down to be as lazy as possible in the input. In particular, ndtake m ([1..n]++undefined) will produce as many entries as possible before throwing an exception. Furthermore, it strives to maximize sharing among the result lists (note the treatment of end in ndtakeEnding'). It avoids problems with badly balanced list appends using a difference list. This sequence-based version is considerably faster than any pure-list version I've come up with, but I haven't teased apart just why that is. I have the feeling it may be possible to do even better with a better understanding of just what's going on, but this seems to work pretty well.
Here's the general idea. Suppose we ask for ndtake 3 [1..5]. We first produce all the results ending in 3 (of which there is one). Then we produce all the results ending in 4. We do this by (essentially) calling ndtake 2 [1..3] and adding the 4 onto each result. We continue in this manner until we have no more elements.
import qualified Data.Sequence as S
import Data.Sequence (Seq, (|>))
import Data.Foldable (toList)
We will use the following simple utility function. It's almost the same as splitAtExactMay from the 'safe' package, but hopefully a bit easier to understand. For reasons I haven't investigated, letting this produce a result when its argument is negative leads to ndtake with a negative argument being equivalent to subsequences. If you want, you can easily change ndtake to do something else for negative arguments, such as returning an empty list in the negative case.
splitAtMay :: Int -> [a] -> Maybe ([a], [a])
splitAtMay n xs
| n <= 0 = Just ([], xs)
splitAtMay _ [] = Nothing
splitAtMay n (x : xs) = flip fmap (splitAtMay (n - 1) xs) $
\(front, rear) -> (x : front, rear)
Now we really get started. ndtake is implemented using ndtakeEnding, which produces a sort of "difference list", allowing all the partial results to be concatenated cheaply.
ndtake :: Int -> [t] -> [[t]]
ndtake n xs = ndtakeEnding n xs []
ndtakeEnding :: Int -> [t] -> ([[t]] -> [[t]])
ndtakeEnding 0 _xs = ([]:)
ndtakeEnding n xs = case splitAtMay n xs of
Nothing -> id -- Not enough elements
Just (front, rear) ->
(front :) . go rear (S.fromList front)
where
-- For each element, produce a list of all combinations
-- *ending* with that element.
go [] _front = id
go (r : rs) front =
ndtakeEnding' [r] (n - 1) front
. go rs (front |> r)
ndtakeEnding doesn't call itself recursively. Rather, it calls ndtakeEnding' to calculate the combinations of the front part. ndtakeEnding' is very much like ndtakeEnding, but with a few differences:
We use a Seq rather than a list to represent the input sequence. This lets us split and snoc cheaply, but I'm not yet sure why that seems to give amortized performance that is so much better in this case.
We already know that the input sequence is long enough, so we don't need to check.
We're passed a tail (end) to add to each result. This lets us share tails when possible. There are lots of opportunities for sharing tails, so this can be expected to be a substantial optimization.
We use foldr rather than pattern matching. Doing this manually with pattern matching gives clearer code, but worse constant factors. That's because the :<| and :|> patterns exported from Data.Sequence are non-trivial pattern synonyms that perform a bit of calculation, including amortized O(1) allocation, to build the tail or initial segment, whereas folds don't need to build those.
NB: this implementation of ndtakeEnding' works well for recent GHC and containers; it seems less efficient for earlier versions. That might be the work of Donnacha Kidney on foldr for Data.Sequence. In earlier versions, it might be more efficient to pattern match by hand, using viewl for versions that don't offer the pattern synonyms.
ndtakeEnding' :: [t] -> Int -> Seq t -> ([[t]] -> [[t]])
ndtakeEnding' end 0 _xs = (end:)
ndtakeEnding' end n xs = case S.splitAt n xs of
(front, rear) ->
((toList front ++ end) :) . go rear front
where
go = foldr go' (const id) where
go' r k !front = ndtakeEnding' (r : end) (n - 1) front . k (front |> r)
-- With patterns, a bit less efficiently:
-- go Empty _front = id
-- go (r :<| rs) !front =
-- ndtakeEnding' (r : end) (n - 1) front
-- . go rs (front :|> r)
Here is the expected input/output:
repeated "Mississippi" == "ips"
repeated [1,2,3,4,2,5,6,7,1] == [1,2]
repeated " " == " "
And here is my code so far:
repeated :: String -> String
repeated "" = ""
repeated x = group $ sort x
I know that the last part of the code doesn't work. I was thinking to sort the list and then group it, and then filter the list of lists, keeping the groups whose length is greater than 1, or something like that.
Your code already does half of the job:
> group $ sort "Mississippi"
["M","iiii","pp","ssss"]
You said you want to filter out the non-duplicates. Let's define a predicate which identifies the lists having at least two elements:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
Using this:
> filter atLeastTwo . group $ sort "Mississippi"
["iiii","pp","ssss"]
Good. Now, we need to take only the first element from such lists. Since the lists are non-empty, we can use head safely:
> map head . filter atLeastTwo . group $ sort "Mississippi"
"ips"
Alternatively, we could replace the filter with filter (\xs -> length xs >= 2) but this would be less efficient.
Yet another option is to use a list comprehension
> [ x | (x:_y:_) <- group $ sort "Mississippi" ]
"ips"
This pattern matches on the lists starting with x and having at least another element _y, combining the filter with taking the head.
Okay, good start. One immediate problem is that the specification requires the function to work on lists of numbers, but you define it for strings. The list must be sorted, so its elements must have the typeclass Ord. Therefore, let’s fix the type signature:
repeated :: Ord a => [a] -> [a]
After calling sort and group, you will have a list of lists, [[a]]. Let’s take your idea of using filter. That works. Your predicate should, as you said, check the length of each list in the list, then compare that length to 1.
Filtering a list of lists gives you a subset, which is another list of lists, of type [[a]]. You need to flatten this list. What you want to do is map each entry in the list of lists to one of its elements. For example, the first. There’s a function in the Prelude to do that.
So, you might fill in the following skeleton:
module Repeated (repeated) where
import Data.List (group, sort)
repeated :: Ord a => [a] -> [a]
repeated = map _
. filter (\x -> _)
. group
. sort
I’ve written this in point-free style with the filtering predicate as a lambda expression, but many other ways to write this are equally good. Find one that you like! (For example, you could also write the filter predicate in point-free style, as a composition of two functions: a comparison on the result of length.)
When you try to compile this, the compiler will tell you that there are two typed holes, the _ entries to the right of the equal signs. It will also tell you the type of the holes. The first hole needs a function that takes a list and gives you back a single element. The second hole needs a Boolean expression using x. Fill these in correctly, and your program will work.
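If you get stuck, here is one possible way to fill the holes (just one choice among many; the lambda is only an example):

module Repeated (repeated) where

import Data.List (group, sort)

repeated :: Ord a => [a] -> [a]
repeated = map head
         . filter (\x -> length x > 1)
         . group
         . sort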
Here are some other approaches, to evaluate @chepner's comment on the solution using group $ sort. (Those solutions look simpler, because some of the complexity is hidden in the library routines.)
While it's true that sorting is O(n lg n), ...
It's not just the sorting but especially the group: that uses span, and both of them build and destroy temporary lists. I.e. they do this:
a linear traversal of an unsorted list will require some other data structure to keep track of all possible duplicates, and lookups in each will add to the space complexity at the very least. While carefully chosen data structures could be used to maintain an overall O(n) running time, the constant would probably make the algorithm slower in practice than the O(n lg n) solution, ...
group/span adds considerably to that complexity, so O(n lg n) is not a correct measure.
while greatly complicating the implementation.
The following all traverse the input list just once. Yes, they build auxiliary lists. (Probably a Set would give better performance/quicker lookups.) They may look more complex, but to compare apples with apples, look also at the code for group/span.
repeated2, repeated3, repeated4 :: Ord a => [a] -> [a]
repeated2/inserter2 builds an auxiliary list of pairs [(a, Bool)], in which the Bool is True if the a appears more than once, False if only once so far.
repeated2 xs = sort $ map fst $ filter snd $ foldr inserter2 [] xs
inserter2 :: Ord a => a -> [(a, Bool)] -> [(a, Bool)]
inserter2 x [] = [(x, False)]
inserter2 x (xb@(x', _): xs)
| x == x' = (x', True): xs
| otherwise = xb: inserter2 x xs
repeated3/inserter3 builds an auxiliary list of pairs [(a, Int)], in which the Int counts how many of the a appear. The aux list is sorted anyway, just for the heck of it.
repeated3 xs = map fst $ filter ((> 1).snd) $ foldr inserter3 [] xs
inserter3 :: Ord a => a -> [(a, Int)] -> [(a, Int)]
inserter3 x [] = [(x, 1)]
inserter3 x xss@(xc@(x', c): xs) = case x `compare` x' of
{ LT -> ((x, 1): xss)
; EQ -> ((x', c+1): xs)
; GT -> (xc: inserter3 x xs)
}
repeated4/go4 builds an output list of elements known to repeat. It maintains an intermediate list of elements met once (so far) as it traverses the input list. If it meets a repeat: it adds that element to the output list; deletes it from the intermediate list; filters that element out of the tail of the input list.
repeated4 xs = sort $ go4 [] [] xs
go4 :: Ord a => [a] -> [a] -> [a] -> [a]
go4 repeats _ [] = repeats
go4 repeats onces (x: xs) = case findUpd x onces of
{ (True, oncesU) -> go4 (x: repeats) oncesU (filter (/= x) xs)
; (False, oncesU) -> go4 repeats oncesU xs
}
findUpd :: Ord a => a -> [a] -> (Bool, [a])
findUpd x [] = (False, [x])
findUpd x (x': os) | x == x' = (True, os) -- i.e. x' removed
| otherwise =
let (b, os') = findUpd x os in (b, x': os')
(That last bit of list-fiddling in findUpd is very similar to span.)
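A quick check of all three against the original example (my own GHCi session):
> repeated2 "Mississippi"
"ips"
> repeated3 "Mississippi"
"ips"
> repeated4 "Mississippi"
"ips"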
Hi, I've got a list in Haskell with close to 10^15 Ints in it, and I'm trying to print the length of the list.
let list1 = [1..1000000000000000] -- this is just a dummy list I dont
print (length list1) -- know the actual number of elements
Printing this takes a very long time. Is there another way to get the number of elements in the list and print that number?
I've occasionally gotten some value out of lists that carry their length. The poor man's version goes like this:
import Data.Monoid
type ListLength a = (Sum Integer, [a])
singletonLL :: a -> ListLength a
singletonLL x = (1, [x])
lengthLL :: ListLength a -> Integer
lengthLL (Sum len, _) = len
The Monoid instance that comes for free gives you empty lists, concatenation, and a fromList-alike. Other standard Prelude functions that operate on lists like map, take, drop aren't too hard to mimic, though you'll need to skip the ones like cycle and repeat that produce infinite lists, and filter and the like are a bit expensive. For your question, you would also want analogs of the Enum methods; e.g. perhaps something like:
enumFromToLL :: Integral a => a -> a -> ListLength a
enumFromToLL lo hi = (fromIntegral hi-fromIntegral lo+1, [lo..hi])
Then, in ghci, your example is instant:
> lengthLL (enumFromToLL 1 1000000000000000)
1000000000000000
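To make "aren't too hard to mimic" concrete, here is a hedged sketch of a few such analogues (the names fromListLL, mapLL and takeLL are mine, not from any library):

-- fromList-alike, courtesy of the Monoid instance:
fromListLL :: [a] -> ListLength a
fromListLL = foldMap singletonLL

-- map doesn't change the length, so we keep it as-is.
mapLL :: (a -> b) -> ListLength a -> ListLength b
mapLL f (len, xs) = (len, map f xs)

-- take can only shrink the length, never grow it.
takeLL :: Integer -> ListLength a -> ListLength a
takeLL n (Sum len, xs) = (Sum (max 0 (min n len)), take (fromIntegral n) xs)  -- assumes n fits in an Int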
It's not easy to explain this, but I will try. I think I'm confusing my method with some C, but here it goes:
I want to check if a list is complete, like this:
main> check 1 [1,3,4,5]
False
main> check 1 [1,2,3,4]
True
It's a finite list, and the list doesn't have to be ordered. But for the result to be True, the list must contain the number that would otherwise be missing. In the first case it's the number 2.
This is my version, but it doesn't even compile.
check :: Eq a => a -> [a] -> Bool
check n [] = False
check n x | n/=(maximum x) = elem n x && check (n+1) x
| otherwise = False
So if I understand this correctly, you want to check to see that all the elements in a list form a sequence without gaps when sorted. Here's one way:
noGaps :: (Enum a, Ord a) => [a] -> Bool
noGaps xs = all (`elem` xs) [minimum xs .. maximum xs]
[minimum xs .. maximum xs] creates a sequential list of all values from the lowest to the highest value. Then you just check that they are all elements of the original list.
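Trying it on the examples from the question (note that noGaps takes just the list, without the extra starting argument):
> noGaps [1,3,4,5]
False
> noGaps [1,2,3,4]
True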
Your function doesn't compile because your type constraints are greater than what you declare them as. You say that a only needs to be an instance of Eq - but then you add something to it, which requires it to be an instance of Num. The way you use the function also doesn't make sense with the signature you declared - check [1,2,3,4] is a Bool in your example, but in the code you gave it would be Eq a => [[a]] -> Bool (if it compiled in the first place).
Do you only need this to work with integers? If not, give some example as to what "complete" means in that case. If yes, then do they always start with 1?
Here's another take on the problem, which uses a function that works on sorted lists and applies it to a sorted input.
The following will check that the provided list of n Ints contains all values from 1 to n:
import Data.List (sort)

check :: (Num a, Ord a) => [a] -> Bool
check l = check_ 1 (sort l)
where check_ n [] = True
check_ n [x] = n == x
check_ n (x:y:xs) = (x+1)==y && check_ (n+1) (y:xs)
Note the use of Data.List's sort to prepare the list for the real check implemented in check_.
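Checked against the examples from the question (this version takes only the list):
> check [1,3,4,5]
False
> check [1,2,3,4]
True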
Scenario:
Given a list of integers, I want to get back a list of integers whose total does not exceed 10.
I am a beginner in Haskell and tried the code below. If anyone could correct me, it would be greatly appreciated.
numbers :: [Int]
numbers = [1,2,3,4,5,6,7,8,9,10, 11, 12]
getUpTo :: [Int] -> Int -> [Int]
getUpTo (x:xs) max =
if max <= 10
then
max = max + x
getUpTo xs max
else
x
Input
getUpTo numbers 0
Output Expected
[1,2,3,4]
BEWARE: This is not a solution to the knapsack problem :)
A very fast solution I came up with is the following one. Of course solving the full knapsack problem would be harder, but if you only need a quick solution this should work:
import Data.List (sort)
getUpTo :: Int -> [Int] -> [Int]
getUpTo max xs = go (sort xs) 0 []
where
go [] sum acc = acc
go (x:xs) sum acc
| x + sum <= max = go xs (x + sum) (x:acc)
| otherwise = acc
By sorting the list before everything else, I can take items from the front one after another, until the maximum would be exceeded; the list built up to that point is then returned.
edit: as a side note, I swapped the order of the first two arguments because this way it should be more useful for partial application.
For educational purposes (and since I felt like explaining something :-), here's a different version, which uses more standard functions. As written it is slower, because it computes a number of sums, and doesn't keep a running total. On the other hand, I think it expresses quite well how to break the problem down.
getUpTo :: [Int] -> [Int]
getUpTo = last . filter (\xs -> sum xs <= 10) . Data.List.inits
I've written the solution as a 'pipeline' of functions; if you apply getUpTo to a list of numbers, Data.List.inits gets applied to the list first, then filter (\xs -> sum xs <= 10) gets applied to the result, and finally last gets applied to the result of that.
So, let's see what each of those three functions do. First off, Data.List.inits returns the initial segments of a list, in increasing order of length. For example, Data.List.inits [2,3,4,5,6] returns [[],[2],[2,3],[2,3,4],[2,3,4,5],[2,3,4,5,6]]. As you can see, this is a list of lists of integers.
Next up, filter (\xs -> sum xs <= 10) goes through these lists of integers in order, keeping them if their sum is at most 10, and discarding them otherwise. The first argument of filter is a predicate which, given a list xs, returns True if the sum of xs is at most 10. This may be a bit confusing at first, so an example with a simpler predicate is in order, I think. filter even [1,2,3,4,5,6,7] returns [2,4,6] because those are the even values in the original list. In the earlier example, the lists [], [2], [2,3], and [2,3,4] all have a sum of at most 10, but [2,3,4,5] and [2,3,4,5,6] don't, so the result of filter (\xs -> sum xs <= 10) . Data.List.inits applied to [2,3,4,5,6] is [[],[2],[2,3],[2,3,4]], again a list of lists of integers.
The last step is the easiest: we just return the last element of the list of lists of integers. This is in principle unsafe, because what should the last element of an empty list be? In our case, we are good to go, since inits always returns the empty list [] first, which has sum 0, which is at most ten - so there's always at least one element in the list of lists we're taking the last element of. We apply last to a list which contains the initial segments of the original list which sum to at most 10, ordered by length. In other words: we return the longest initial segment which sums to at most 10 - which is what you wanted!
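As a quick GHCi check, using the numbers list from the question (assuming the definition above is loaded):
> getUpTo [1,2,3,4,5,6,7,8,9,10,11,12]
[1,2,3,4]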
If there are negative numbers in your numbers list, this way of doing things can return something you don't expect: getUpTo [10,4,-5,20] returns [10,4,-5] because that is the longest initial segment of [10,4,-5,20] which sums to under 10; even though [10,4] is above 10. If this is not the behaviour you want, and you expect [10], then you must replace filter by takeWhile - that essentially stops the filtering as soon as the first element for which the predicate returns False is encountered. E.g. takeWhile even [2,4,1,3,6,8,5,7] evaluates to [2,4]. So in our case, using takeWhile stops the moment the sum goes over 10, not trying longer segments.
By writing getUpTo as a composition of functions, it becomes easy to change parts of your algorithm: if you want the longest initial segment that sums exactly to 10, you can use last . filter (\xs -> sum xs == 10) . Data.List.inits. Or if you want to look at the tail segments instead, use head . filter (\xs -> sum xs <= 10) . Data.List.tails; or to take all the possible sublists into account (i.e. an inefficient knapsack solution!): last . filter (\xs -> sum xs <= 10) . Data.List.sortBy (\xs ys -> length xs `compare` length ys) . Control.Monad.filterM (const [False,True]) - but I'm not going to explain that here, I've been rambling long enough!
There is an answer with a fast version; however, I thought it might also be instructive to see the minimal change necessary to your code to make it work the way you expect.
numbers :: [Int]
numbers = [1,2,3,4,5,6,7,8,9,10, 11, 12]
getUpTo :: [Int] -> Int -> [Int]
getUpTo (x:xs) max =
if max < 10 -- (<), not (<=)
then
-- return a list that still contains x;
-- can't reassign to max, but can send a
-- different value on to the next
-- iteration of getUpTo
x : getUpTo xs (max + x)
else
[] -- don't want to return any more values here
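With numbers as defined above, this behaves as the question expects (my own quick check):
> getUpTo numbers 0
[1,2,3,4]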
I am fairly new to Haskell. I just started with it a few hours ago, and as such I see in every question a challenge that helps me get out of the imperative way of thinking and an opportunity to practice my recursive thinking :)
I gave some thought to the question and I came up with this, perhaps, naive solution:
upToBound :: (Integral a) => [a] -> a -> [a]
upToBound (x:xs) bound =
let
summation _ [] = []
summation n (m:ms)
| n + m <= bound = m:summation (n + m) ms
| otherwise = []
in
summation 0 (x:xs)
I know there is already a better answer, I just did it for the fun of it.
Note that I changed the signature from the original invocation, because I thought it was pointless to provide an initial zero to the outer function call, since I can only assume it will be zero at first. As such, in my implementation I hid the seed from the caller and provided, instead, the maximum bound, which is more likely to change.
upToBound [1,2,3,4,5,6,7,8,9,0] 10
Which outputs: [1,2,3,4]