Haskell split list determined by sum value

Hi all, I am just starting out with Haskell. I have this list: [1.0, 2.0, 1.5, 1.5, n-1]. How would I split it into sublists whose sums are, for example, [3.0, 3.0, etc.]? That is, walking along the list and adding up the values, the list is separated whenever the running sum equals the given number (in this case 3). I hope this question makes sense. Could someone point out whether there is a function in Data.List that can do this, or whether filter could be used? The only thing I have so far is:
import Data.List
import Text.Printf
--separate :: [Double] -> [(Double)]
separate = sum [1.0, 2.0, 0.5, 2.5, 3.0, 1.1, 2.0] -- [(1.0, 2.0),(0.5, 2.5),(3.0),(1.1,2.0)] (desired list)
main = do
    putStrLn (printf "list : %s" $ show separate)
More precisely, the requirements are:
separate :: Double -> [Double] -> [[Double]]
where separate t xs = ys implies
concat ys = xs
all (near t . sum) ys, where near t x is a predicate stating that x is "close" to t (semi-formal requirement).

A recursive function can be written as:
split :: [Double] -> [[Double]]
split x = go x [] 0
  where go [] _ _ = []
        go (x:xs) acc s | x+s>=3 = (acc++[x]) : go xs [] 0
                        | otherwise = go xs (acc++[x]) (s+x)
split [1.0,2.0,0.5,2.5,3.0,1.1,2.0]
[[1.0,2.0],[0.5,2.5],[3.0],[1.1,2.0]]
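It carries an accumulator and the sum so far, and decides whether to carry forward or split based on the sum.
Note that this constructs "at least", not "near": every sublist it emits sums to 3.0 or more. A trailing run of elements whose sum never reaches 3.0 is dropped, because go [] _ _ returns [].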
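The threshold does not have to be hard-coded. Below is a minimal sketch with the threshold as an argument, matching the separate :: Double -> [Double] -> [[Double]] shape from the question. The name separateAtLeast is hypothetical and does not come from the answer. Unlike split, it keeps a trailing group that never reaches the threshold, so that concat of the result equals the input. It still implements "at least", not "near".

```haskell
-- Hypothetical generalisation of `split`: the threshold t is a parameter.
separateAtLeast :: Double -> [Double] -> [[Double]]
separateAtLeast t = go [] 0
  where
    -- acc holds the current group in reverse order, s is its running sum.
    go acc _ [] = [reverse acc | not (null acc)]   -- keep any leftover group
    go acc s (x:xs)
      | s + x >= t = reverse (x : acc) : go [] 0 xs
      | otherwise  = go (x : acc) (s + x) xs
```

Consing onto a reversed accumulator avoids the repeated acc++[x] of the original, which is quadratic in the length of a group.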

Related

Triangularizing a list in Haskell

I'm interested in writing an efficient Haskell function triangularize :: [a] -> [[a]] that takes a (perhaps infinite) list and "triangularizes" it into a list of lists. For example, triangularize [1..19] should return
[[1, 3, 6, 10, 15]
,[2, 5, 9, 14]
,[4, 8, 13, 19]
,[7, 12, 18]
,[11, 17]
,[16]]
By efficient, I mean that I want it to run in O(n) time where n is the length of the list.
Note that this is quite easy to do in a language like Python, because appending to the end of a list (array) is a constant time operation. A very imperative Python function which accomplishes this is:
def triangularize(elements):
    row_index = 0
    column_index = 0
    diagonal_array = []
    for a in elements:
        if row_index == len(diagonal_array):
            diagonal_array.append([a])
        else:
            diagonal_array[row_index].append(a)
        if row_index == 0:
            (row_index, column_index) = (column_index + 1, 0)
        else:
            row_index -= 1
            column_index += 1
    return diagonal_array
This came up because I have been using Haskell to write some "tabl" sequences in the On-Line Encyclopedia of Integer Sequences (OEIS), and I want to be able to transform an ordinary (1-dimensional) sequence into a (2-dimensional) sequence of sequences in exactly this way.
Perhaps there's some clever (or not-so-clever) way to foldr over the input list, but I haven't been able to sort it out.
Make chunks of increasing size:
import Data.List (transpose)
chunks :: [a] -> [[a]]
chunks = go 0 where
  go n [] = []
  go n as = b : go (n+1) e where (b,e) = splitAt n as
Then just transpose twice:
diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks
Try it in ghci:
> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
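To see why transposing twice works, it may help to look at the intermediate values for a short input. The sketch below repeats the chunks definition from this answer. The names step1, step2 and step3 are only illustrative.

```haskell
import Data.List (transpose)

-- Same definition as in the answer: chunks of length 0, 1, 2, ...
chunks :: [a] -> [[a]]
chunks = go 0
  where
    go _ [] = []
    go n as = b : go (n + 1) e
      where (b, e) = splitAt n as

step1, step2, step3 :: [[Int]]
step1 = chunks [1 .. 10]   -- [[],[1],[2,3],[4,5,6],[7,8,9,10]]
step2 = transpose step1    -- [[1,2,4,7],[3,5,8],[6,9],[10]]
step3 = transpose step2    -- [[1,3,6,10],[2,5,9],[4,8],[7]]
```

Each chunk is one diagonal of the triangle. The first transpose collects the k-th element of every diagonal, which gives the columns of the desired result. The second transpose turns those columns into rows.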
This appears to be directly related to the set-theory argument proving that the set of integer pairs is in one-to-one correspondence with the set of integers (i.e. is denumerable). The argument involves a so-called Cantor pairing function.
So, out of curiosity, let's see if we can get a diagonalize function that way.
Define the infinite list of Cantor pairs recursively in Haskell:
auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
    let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
    in (x,y) : auxCantorPairList nextPair
cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)
And try that inside ghci:
λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
λ>
We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:
λ>
λ> xs = [1..]
λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
λ>
We recognize this is the top row from the OP's result in the text of the question.
Similarly for the next two rows:
λ>
λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
λ>
λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
λ>
From there, we can write our first draft of a diagonalize function:
λ>
λ> printAsLines xs = mapM_ (putStrLn . show) xs
λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
λ>
λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
λ>
EDIT: performance update
For a list of 1 million items the runtime is 18 seconds, and for 4 million items it is 145 seconds. As mentioned by Redu, this looks like O(n√n) complexity.
Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.
To improve performance, we can use a Data.Map structure for the target sublists.
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.List as L
import qualified Data.Map as M
type MIL a = M.Map Integer [a]
buildCantorMap :: forall a. [a] -> MIL a
buildCantorMap xs =
    let ts = zip xs cantorPairList -- triplets (a,(x,y))
        m0 = (M.fromList [])::MIL a
        redOp m (n,(x,y)) = let afn as = case as of
                                           Nothing  -> Just [n]
                                           Just jas -> Just (n:jas)
                            in M.alter afn x m
        m1r = L.foldl' redOp m0 ts
    in
        fmap reverse m1r
diagonalize :: [a] -> [[a]]
diagonalize xs = let cm = buildCantorMap xs
                 in map snd $ M.toAscList cm
With that second version, performance appears to be much better: 568 msec for the 1-million-item list and 2669 msec for the 4-million-item list. So it is close to the O(n*log(n)) complexity we could have hoped for.
It might be a good idea to create a comb filter.
So what does a comb filter do? It's like splitAt, but instead of splitting at a single index it zips the given infinite list with the given comb, separating the items that correspond to True from those that correspond to False in the comb, such that:
comb :: [Bool] -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id
combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ [] = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
                         in if c then (x:f,s) else (f,x:s)
λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])
Now all we need to do is comb our infinite list, take the fst as the first row, and carry on combing the snd with the same comb.
Let's do it:
diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
           in h : diags t
λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]
It also seems to be lazy :)
λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]
I think the complexity could be something like O(n√n), but I cannot be sure. Any ideas?

Creating a lists of lists with new element in each position

I'm new to the Haskell world and I'd like to know how to insert a value at each position of a list, returning a list of sublists, each containing the value at a different position. For example:
insert' :: a -> [a] -> [[a]]
insert' a [] = [[a]]
insert' a list = ??
To get something like:
insert' 7 [1,2,3] = [[7,1,2,3],[1,7,2,3],[1,2,7,3],[1,2,3,7]]
insert' :: a -> [a] -> [[a]]
insert' y [] = [[y]]
insert' y xss@(x:xs) = (y : xss) : map (x :) (insert' y xs)
While the empty-list case comes naturally, let's take a look at insert' y xss@(x:xs). We essentially have two cases to cover:
y appears in front of x. Then we can just use y : xss.
y appears somewhere after x. We therefore just insert it in the rest of our list and make sure that x is the first element with map (x:).
Although @delta's answer is definitely more elegant, here is a solution with difference lists. If we insert an element x at every location of a list ys = [y1,y2,...,yn], the first time we insert it as the head, so we can construct x : ys. For the second element of the resulting list, we want to construct the list [y1,x,y2,...,yn]. We can do this as y1 : x : y2s, where y2s is the rest of the list after y1. The next lists will all have the structure y1 : ....
The question is: how can we write a recursive structure that keeps track of the fact that we want to put elements in the head. We can use a function for that: we start with a function id. If we now call id (x:ys) then we will of course generate the list (x:ys).
We can however, based on the id function, construct a new function id2 = \z -> id (y1:z). This function will thus put y1 in the head of the list and then add the list with which we call id2 as tail. Next we can construct id3 = \z -> id2 (y2:z). This will put y1 and y2 as first elements followed by the tail z.
So we can put this into the following recursive format:
insert' :: a -> [a] -> [[a]]
insert' x = go id
    where go d [] = [d [x]]
          go d ys@(yh:yt) = (d (x : ys)) : go (d . (yh :)) yt
So we redirect insert' to go, where the initial difference list is simply the id function. Each time, we check whether we have reached the end of the given list. If so, we return the base case: we apply the difference list to [x] (as tail), and thus construct a list with x appended as the last element.
If we have not yet reached the last element, we first emit d (x : ys): we prepend x to the remaining list and pass this as the argument to the difference list d, which prepends y1 : y2 : ... : yk up to the point where we insert x. Then we recursively call go (d . (yh :)) yt on the tail of the list: this constructs a new difference list that additionally puts yh in front of whatever tail it is given. We thus produce a new function with one argument: the tail after the yh element.
This function produces the expected results:
*Main> insert' 4 []
[[4]]
*Main> insert' 4 [1,2,5]
[[4,1,2,5],[1,4,2,5],[1,2,4,5],[1,2,5,4]]
*Main> insert' 7 [1,2,3]
[[7,1,2,3],[1,7,2,3],[1,2,7,3],[1,2,3,7]]
You may also do as follows;
import Data.List
spread :: a -> [a] -> [[a]]
spread x xs = zipWith (++) (inits xs) ((x:) <$> tails xs)
*Main> spread 7 [1,2,3]
[[7,1,2,3],[1,7,2,3],[1,2,7,3],[1,2,3,7]]
*Main> spread 7 []
[[7]]
So this is about three stages.
(x:) <$> tails xs is all about applying the (x:) function to all elements of tails xs function. So tails [1,2,3] would return [[1,2,3],[2,3],[3],[]] and we are to apply an fmap which is designated by <$> in the inline form. This is going to be the third argument of the zipWith function.
(inits xs) which would return [[],[1],[1,2],[1,2,3]], is going to be the second argument to zipWith.
zipWith (++) will then zip the two lists of lists by concatenating corresponding elements.
So we may also express the same functionality with applicative function functors as follows;
spread :: a -> [a] -> [[a]]
spread x = zipWith (++) <$> inits <*> fmap (x:) . tails
In this case we fmap the zipWith (++) function with type [[a]] -> [[a]] -> [[a]] over inits and then apply it over to fmap (x:) . tails.
It could be made more point-free, but it becomes harder to read (at least for me). In my opinion this is as good as it gets.

Intersection of infinite lists

I know from computability theory that it is possible to take the intersection of two infinite lists, but I can't find a way to express it in Haskell.
The traditional method fails as soon as the second list is infinite, because you spend all your time checking it for a non-matching element in the first list.
Example:
let ones = 1 : ones -- an unending list of 1s
intersect [0,1] ones
This never yields 1, as it never stops checking ones for the element 0.
A successful method needs to ensure that each element of each list will be visited in finite time.
Probably, this will be by iterating through both lists, and spending approximately equal time checking all previously-visited elements in each list against each other.
If possible, I'd like to also have a way to ignore duplicates in the lists, as it is occasionally necessary, but this is not a requirement.
Using the universe package's Cartesian product operator we can write this one-liner:
import Data.Universe.Helpers
isect :: Eq a => [a] -> [a] -> [a]
xs `isect` ys = [x | (x, y) <- xs +*+ ys, x == y]
-- or this, which may do marginally less allocation
xs `isect` ys = foldr ($) [] $ cartesianProduct
    (\x y -> if x == y then (x:) else id)
    xs ys
Try it in ghci:
> take 10 $ [0,2..] `isect` [0,3..]
[0,6,12,18,24,30,36,42,48,54]
This implementation will not produce any duplicates if the input lists don't have any; but if they do, you can tack on your favorite dup-remover either before or after calling isect. For example, with nub, you might write
> nub ([0,1] `isect` repeat 1)
[1
and then heat up your computer pretty good, since it can never be sure there might not be a 0 in that second list somewhere if it looks deep enough.
This approach is significantly faster than David Fletcher's, produces many fewer duplicates and produces new values much more quickly than Willem Van Onsem's, and doesn't assume the lists are sorted like freestyle's (but is consequently much slower on such lists than freestyle's).
An idea might be to use incrementing bounds. Let us first relax the problem a bit: yielding duplicated values is allowed. In that case you could use:
import Data.List (intersect)
intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
    where intersectInfinite' n xs ys = intersect (take n xs) (take n ys) ++ intersectInfinite' (n+1) xs ys
In other words we claim that:
A∩B = A1∩B1 ∪ A2∩B2 ∪ ... ∪ ...
where Ai is the set containing the first i elements of A (yes, there is no order in a set, but let's say there is somehow an order). If the set contains fewer than i elements, the full set is returned.
If c is in A (at index i) and in B (at index j), c will first be emitted in segment (not index) max(i,j).
This will thus always generate an infinite list (with an infinite number of duplicates), regardless of whether the given lists are finite or not. The only exception is when you give it an empty list, in which case it will run forever without producing anything. Nevertheless, we have ensured that every element in the intersection will be emitted at least once.
Making the result finite (if the given lists are finite)
Now we can improve our definition. First we write a more advanced version of take, takeFinite (let's first give a straightforward, but not very efficient, definition):
takeFinite :: Int -> [a] -> (Bool,[a])
takeFinite _ [] = (True,[])
takeFinite 0 _ = (False,[])
takeFinite n (x:xs) = let (b,t) = takeFinite (n-1) xs in (b,x:t)
Now we can iteratively deepen until both lists have reached the end:
intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
intersectInfinite' :: Eq a => Int -> [a] -> [a] -> [a]
intersectInfinite' n xs ys | fa && fb = intersect xs ys
                           | fa = intersect ys xs
                           | fb = intersect xs ys
                           | otherwise = intersect xfa xfb ++ intersectInfinite' (n+1) xs ys
    where (fa,xfa) = takeFinite n xs
          (fb,xfb) = takeFinite n ys
This will now terminate if both lists are finite, but it still produces a lot of duplicates. There are certainly ways to reduce them further.
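One possible way to cut down on the duplicates is to compare only the newly visited element of each list against the elements already seen from the other list, instead of re-intersecting whole prefixes at every step. The following is a rough sketch of that idea, not part of the answer above. The name isectFair is hypothetical, and the elem lookups make it quadratic in the number of elements visited.

```haskell
-- Hypothetical sketch: walk both lists in lockstep, remembering what was seen.
isectFair :: Eq a => [a] -> [a] -> [a]
isectFair = go [] []
  where
    -- seenX / seenY are the elements already consumed from each list.
    go seenX seenY (a:as) (b:bs) =
         [a | a == b]            -- both new elements match each other
      ++ [a | a `elem` seenY]    -- new x matches an earlier y
      ++ [b | b `elem` seenX]    -- new y matches an earlier x
      ++ go (a : seenX) (b : seenY) as bs
    -- One list is exhausted: check the rest of the other against everything seen.
    go _ seenY as [] = [a | a <- as, a `elem` seenY]
    go seenX _ [] bs = [b | b <- bs, b `elem` seenX]
```

Every pair of positions is compared exactly once, so a common element is emitted once per pair of positions at which it occurs. Like the other approaches, it can still run forever when one list is infinite and no further matches exist.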
Here's one way. For each x we make a list of maybes which has
Just x only where x appeared in ys. Then we interleave all
these lists.
import Data.Maybe (catMaybes)
isect :: Eq a => [a] -> [a] -> [a]
isect xs ys = (catMaybes . foldr interleave [] . map matches) xs
  where
    matches x = [if x == y then Just x else Nothing | y <- ys]
interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs
Maybe it can be improved using some sort of fairer interleaving -
it's already pretty slow on the example below because (I think)
it's doing an exponential amount of work.
> take 10 (isect [0..] [0,2..])
[0,2,4,6,8,10,12,14,16,18]
If the elements in the lists are ordered, this is easy to do:
intersectOrd :: Ord a => [a] -> [a] -> [a]
intersectOrd [] _ = []
intersectOrd _ [] = []
intersectOrd (x:xs) (y:ys) = case x `compare` y of
    EQ -> x : intersectOrd xs ys
    LT -> intersectOrd xs (y:ys)
    GT -> intersectOrd (x:xs) ys
Here's yet another alternative, leveraging Control.Monad.WeightedSearch
import Control.Monad (guard)
import Control.Applicative
import qualified Control.Monad.WeightedSearch as W
We first define a cost for digging inside the list. Accessing the tail costs 1 unit more. This will ensure a fair scheduling among the two infinite lists.
eachW :: [a] -> W.T Int a
eachW = foldr (\x w -> pure x <|> W.weight 1 w) empty
Then, we simply disregard infinite lists.
intersection :: [Int] -> [Int] -> [Int]
intersection xs ys = W.toList $ do
    x <- eachW xs
    y <- eachW ys
    guard (x==y)
    return y
Even better with MonadComprehensions on:
intersection2 :: [Int] -> [Int] -> [Int]
intersection2 xs ys = W.toList [ y | x <- eachW xs, y <- eachW ys, x==y ]
Solution
I ended up using the following implementation; a slight modification of the answer by David Fletcher:
isect :: Eq a => [a] -> [a] -> [a]
isect [] = const [] -- don't bother testing against an empty list
isect xs = catMaybes . diagonal . map matches
    where matches y = [if x == y then Just x else Nothing | x <- xs]
This can be augmented with nub to filter out duplicates:
isectUniq :: Eq a => [a] -> [a] -> [a]
isectUniq xs = nub . isect xs
Explanation
Of the line isect xs = catMaybes . diagonal . map matches
(map matches) ys computes a list of lists of comparisons between elements of xs and ys, where the list indices specify the indices in ys and xs respectively: i.e (map matches) ys !! 3 !! 0 would represent the comparison of ys !! 3 with xs !! 0, which would be Nothing if those values differ. If those values are the same, it would be Just that value.
diagonals takes a list of lists and returns a list of lists where the nth output list contains an element each from the first n lists. Another way to conceptualise it is that (diagonals . map matches) ys !! n contains comparisons between elements whose indices in xs and ys sum to n.
diagonal is simply a flat version of diagonals (diagonal = concat . diagonals)
Therefore (diagonal . map matches) ys is a list of comparisons between elements of xs and ys, where the elements are approximately sorted by the sum of the indices of the elements of ys and xs being compared; this means that early elements are compared to later elements with the same priority as middle elements being compared to each other.
(catMaybes . diagonal . map matches) ys is a list of only the elements which are in both lists, where the elements are approximately sorted by the sum of the indices of the two elements being compared.
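The solution does not show where diagonal and diagonals come from. Presumably they are the functions from Data.Universe.Helpers, the module imported in an earlier answer. For readers without that package, here is a rough base-only sketch of the behaviour described above. The names diagonalsSketch and isectSketch are hypothetical, and the order of elements within a diagonal is not claimed to match the library.

```haskell
import Data.List (transpose)
import Data.Maybe (catMaybes)

-- The n-th output list holds the elements whose row and column indices sum to n.
diagonalsSketch :: [[a]] -> [[a]]
diagonalsSketch = drop 1 . go []
  where
    -- active: rows already started, each trimmed to its not-yet-emitted part
    go active rest = [h | h:_ <- active] : continue
      where
        rests = [t | _:t <- active]
        continue = case rest of
          []     -> transpose rests          -- no new rows: drain what is left
          (r:rs) -> go (r : rests) rs        -- start one more row per step

-- Same shape as the chosen solution, using the sketch above.
isectSketch :: Eq a => [a] -> [a] -> [a]
isectSketch [] _  = []
isectSketch xs ys = catMaybes (concat (diagonalsSketch (map matches ys)))
  where matches y = [if x == y then Just x else Nothing | x <- xs]
```

Because every diagonal is finite, each comparison is reached after finitely many steps even when both inputs are infinite.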
Note
(diagonal . map (catMaybes . matches)) ys does not work: catMaybes . matches only yields when it finds a match, instead of also yielding Nothing on no match, so the interleaving does nothing to distribute the work.
To contrast, in the chosen solution, the interleaving of Nothing and Just values by diagonal means that the program divides its attention between 'searching' for multiple different elements, not waiting for one to succeed; whereas if the Nothing values are removed before interleaving, the program may spend too much time waiting for a fruitless 'search' for a given element to succeed.
Therefore, we would encounter the same problem as in the original question: while one element does not match any elements in the other list, the program will hang; whereas the chosen solution will only hang while no matches are found for any elements in either list.

Haskell - how to iterate list elements in reverse order in an elegant way?

I'm trying to write a function that given a list of numbers, returns a list where every 2nd number is doubled in value, starting from the last element. So if the list elements are 1..n, n-th is going to be left as-is, (n-1)-th is going to be doubled in value, (n-2)-th is going to be left as-is, etc.
So here's how I solved it:
myFunc :: [Integer] -> [Integer]
myFunc xs = reverse (myFuncHelper (reverse xs))
myFuncHelper :: [Integer] -> [Integer]
myFuncHelper [] = []
myFuncHelper (x:[]) = [x]
myFuncHelper (x:y:zs) = [x,y*2] ++ myFuncHelper zs
And it works:
myFunc [1,1,1,1] = [2,1,2,1]
myFunc [1,1,1] = [1,2,1]
However, I can't help but think there has to be a simpler solution than reversing the list, processing it and then reversing it again. Could I simply iterate the list backwards? If yes, how?
The under reversed f xs idiom from the lens library will apply f to xs in reverse order:
under reversed (take 5) [1..100] => [96,97,98,99,100]
When you need to process the list from the end, usually foldr works pretty well. Here is a solution for you without reversing the whole list twice:
doubleOdd :: Num a => [a] -> [a]
doubleOdd = fst . foldr multiplyCond ([], False)
    where multiplyCond x (rest, flag) = ((if flag then (x * 2) else x) : rest, not flag)
The multiplyCond function takes a tuple with a flag and the accumulator list. The flag constantly toggles on and off to track whether we should multiply the element or not. The accumulator list simply gathers the resulting numbers. This solution may be not so concise, but avoids extra work and doesn't use anything but prelude functions.
myFunc = reverse
       . map (\(b,x) -> if b then x*2 else x)
       . zip (cycle [False,True])
       . reverse
But this isn't much better. Your implementation is sufficiently elegant.
The simplest way to iterate the list backwards is to reverse the list. I don't think you can really do much better than that; I suspect that if you have to traverse the whole list to find the end, and remember how to get back up, you might as well just reverse it. If this is a big deal, maybe you should be using some other data structure instead of lists—Vector or Seq might be good choices.
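As a rough illustration of the "other data structure" suggestion, here is a sketch using Data.Sequence from the containers package. Its length is available in constant time, so each element's distance from the end can be computed directly without reversing anything. The names doubleFromEndSeq and doubleFromEndList are hypothetical.

```haskell
import Data.Foldable (toList)
import qualified Data.Sequence as Seq

-- Double every second element, counting from the last one (which is left as-is).
doubleFromEndSeq :: Num a => Seq.Seq a -> Seq.Seq a
doubleFromEndSeq s = Seq.mapWithIndex step s
  where
    n = Seq.length s
    -- (n - 1 - i) is the distance from the last element; odd distances are doubled.
    step i x
      | odd (n - 1 - i) = 2 * x
      | otherwise       = x

-- Convenience wrapper for plain lists.
doubleFromEndList :: Num a => [a] -> [a]
doubleFromEndList = toList . doubleFromEndSeq . Seq.fromList
```

For plain lists the conversion to and from Seq costs a traversal of its own, so this mainly pays off when the data already lives in a Seq.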
Another way to write your helper function is to use Traversable:
import Control.Monad.State
import Data.Traversable (Traversable, traverse)
toggle :: (Bool -> a -> b) -> a -> State Bool b
toggle f a =
  do active <- get
     put (not active)
     return (f active a)
doubleEvens :: (Num a, Traversable t) => t a -> t a
doubleEvens xs = evalState (traverse (toggle step) xs) False
  where step True x = 2*x
        step False x = x
yourFunc :: Num a => [a] -> [a]
yourFunc = reverse . doubleEvens . reverse -- reverse first so that counting starts from the last element
Or if we go a bit crazy with Foldable and Traversable, we can try this:
Use Foldable's foldl to extract a reverse-order list from any of its instances. For some types this will be more efficient than reversing a list.
Then we can use traverse and State to map each element of the original structure to its counterpart in the reversed order.
Here's how to do it:
import Control.Monad.State
import Data.Foldable (Foldable)
import qualified Data.Foldable as F
import Data.Traversable (Traversable, traverse)
import Data.Map (Map)
import qualified Data.Map as Map
toReversedList :: Foldable t => t a -> [a]
toReversedList = F.foldl (flip (:)) []
reverse' :: Traversable t => t a -> t a
reverse' ta = evalState (traverse step ta) (toReversedList ta)
  where step _ = do ~(h:t) <- get -- lazy pattern avoids the MonadFail constraint a refutable pattern needs
                    put t
                    return h
yourFunc' :: (Traversable t, Num a) => t a -> t a
yourFunc' = reverse' . doubleEvens . reverse' -- reverse first so that counting starts from the last element
-- >>> yourFunc' $ Map.fromList [(1, 1), (2, 1), (3, 1), (4, 1)]
-- fromList [(1,2),(2,1),(3,2),(4,1)]
-- >>> yourFunc' $ Map.fromList [(1, 1), (2, 1), (3, 1)]
-- fromList [(1,1),(2,2),(3,1)]
There's probably a better way to do this, though...
func xs = zipWith (*) xs $ reverse . (take $ length xs) $ cycle [1,2]

unique elements in a haskell list

okay, this is probably going to be in the prelude, but: is there a standard library function for finding the unique elements in a list? my (re)implementation, for clarification, is:
has :: (Eq a) => [a] -> a -> Bool
has [] _ = False
has (x:xs) a
  | x == a = True
  | otherwise = has xs a
unique :: (Eq a) => [a] -> [a]
unique [] = []
unique (x:xs)
  | has xs x = unique xs
  | otherwise = x : unique xs
I searched for (Eq a) => [a] -> [a] on Hoogle.
First result was nub (remove duplicate elements from a list).
Hoogle is awesome.
The nub function from Data.List (no, it's actually not in the Prelude) definitely does something like what you want, but it is not quite the same as your unique function. They both preserve the original order of the elements, but unique retains the last occurrence of each element, while nub retains the first occurrence.
You can do this to make nub act exactly like unique, if that's important (though I have a feeling it's not):
unique = reverse . nub . reverse
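To make the first-versus-last distinction concrete, here is a small sketch. uniqueLast is a hypothetical name for the asker's unique, rewritten with elem in place of the hand-rolled has. The other names are only illustrative.

```haskell
import Data.List (nub)

-- The asker's `unique`: an element is dropped if it occurs again later.
uniqueLast :: Eq a => [a] -> [a]
uniqueLast [] = []
uniqueLast (x:xs)
  | x `elem` xs = uniqueLast xs
  | otherwise   = x : uniqueLast xs

firstKept, lastKept, viaNub :: [Int]
firstKept = nub [1, 2, 1, 3, 2]                       -- [1,2,3]
lastKept  = uniqueLast [1, 2, 1, 3, 2]                -- [1,3,2]
viaNub    = (reverse . nub . reverse) [1, 2, 1, 3, 2] -- [1,3,2]
```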
Also, nub is only good for small lists.
Its complexity is quadratic, so it starts to get slow if your list can contain hundreds of elements.
If you limit your types to types having an Ord instance, you can make it scale better.
This variation on nub still preserves the order of the list elements, but its complexity is O(n * log n):
import qualified Data.Set as Set
nubOrd :: Ord a => [a] -> [a]
nubOrd xs = go Set.empty xs where
  go s (x:xs)
    | x `Set.member` s = go s xs
    | otherwise = x : go (Set.insert x s) xs
  go _ _ = []
In fact, it has been proposed to add nubOrd to Data.Set. Recent versions of the containers package provide a nubOrd function in Data.Containers.ListUtils.
import Data.Set (toList, fromList)
uniquify lst = toList $ fromList lst
I think that unique should return a list of elements that only appear once in the original list; that is, any elements of the original list that appear more than once should not be included in the result.
May I suggest an alternative definition, unique_alt:
unique_alt :: [Int] -> [Int]
unique_alt [] = []
unique_alt (x:xs)
  | elem x ( unique_alt xs ) = [ y | y <- ( unique_alt xs ), y /= x ]
  | otherwise = x : ( unique_alt xs )
Here are some examples that highlight the differences between unique_alt and unique:
unique [1,2,1] = [2,1]
unique_alt [1,2,1] = [2]
unique [1,2,1,2] = [1,2]
unique_alt [1,2,1,2] = []
unique [4,2,1,3,2,3] = [4,1,2,3]
unique_alt [4,2,1,3,2,3] = [4,1]
I think this would do it.
unique [] = []
unique (x:xs) = x:unique (filter ((/=) x) xs)
Another way to remove duplicates:
unique :: [Int] -> [Int]
unique xs = [x | (x,y) <- zip xs [0..], x `notElem` (take y xs)]
Algorithm in Haskell to create a unique list:
data Foo = Foo { id_ :: Int
               , name_ :: String
               } deriving (Show)
alldata = [ Foo 1 "Name"
          , Foo 2 "Name"
          , Foo 3 "Karl"
          , Foo 4 "Karl"
          , Foo 5 "Karl"
          , Foo 7 "Tim"
          , Foo 8 "Tim"
          , Foo 9 "Gaby"
          , Foo 9 "Name"
          ]
isolate :: [Foo] -> [Foo]
isolate [] = []
isolate (x:xs) = (fst f) : isolate (snd f)
  where
    f = foldl helper (x,[]) xs
    -- a is the best candidate so far for x's name; compare against it, not against x
    helper (a,b) y = if name_ x == name_ y
                     then if id_ a >= id_ y
                          then (a,b)
                          else (y,b)
                     else (a,y:b)
main :: IO ()
main = mapM_ (putStrLn . show) (isolate alldata)
Output:
Foo {id_ = 9, name_ = "Name"}
Foo {id_ = 9, name_ = "Gaby"}
Foo {id_ = 5, name_ = "Karl"}
Foo {id_ = 8, name_ = "Tim"}
A library-based solution:
We can use that style of Haskell programming where all looping and recursion activities are pushed out of user code and into suitable library functions. Said library functions are often optimized in ways that are way beyond the skills of a Haskell beginner.
A way to decompose the problem into two passes goes like this:
produce a second list that is parallel to the input list, but with duplicate elements suitably marked
eliminate elements marked as duplicates from that second list
For the first step, duplicate elements don't need a value at all, so we can use [Maybe a] as the type of the second list. So we need a function of type:
pass1 :: Eq a => [a] -> [Maybe a]
Function pass1 is an example of stateful list traversal where the state is the list (or set) of distinct elements seen so far. For this sort of problem, the library provides the mapAccumL :: (s -> a -> (s, b)) -> s -> [a] -> (s, [b]) function.
Here the mapAccumL function requires, besides the initial state and the input list, a step function argument, of type s -> a -> (s, Maybe a).
If the current element x is not a duplicate, the output of the step function is Just x and x gets added to the current state. If x is a duplicate, the output of the step function is Nothing, and the state is passed unchanged.
Testing under the ghci interpreter:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
λ>
λ> import Data.List(mapAccumL)
λ>
λ> pass1 xs = mapAccumL stepFn [] xs
λ>
λ> xs2 = snd $ pass1 "abacrba"
λ> xs2
[Just 'a', Just 'b', Nothing, Just 'c', Just 'r', Nothing, Nothing]
λ>
Writing a pass2 function is even easier. To filter out Nothing non-values, we could use:
import Data.Maybe( fromJust, isJust)
pass2 = (map fromJust) . (filter isJust)
but why bother at all ? - as this is precisely what the catMaybes library function does.
λ>
λ> import Data.Maybe(catMaybes)
λ>
λ> catMaybes xs2
"abcr"
λ>
Putting it all together:
Overall, the source code can be written as:
import Data.Maybe(catMaybes)
import Data.List(mapAccumL)
uniques :: (Eq a) => [a] -> [a]
uniques = let stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
          in catMaybes . snd . mapAccumL stepFn []
This code is reasonably compatible with infinite lists, something occasionally referred to as being “laziness-friendly”:
λ>
λ> take 5 $ uniques $ "abacrba" ++ (cycle "abcrf")
"abcrf"
λ>
Efficiency note:
If we anticipate that it is possible to find many distinct elements in the input list and we can have an Ord a instance, the state can be implemented as a Set object rather than a plain list, this without having to alter the overall structure of the solution.
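A minimal sketch of that variant, assuming an Ord instance is available. The name uniquesOrd is hypothetical. Only the state type and the step function change, while the mapAccumL / catMaybes pipeline stays the same.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import qualified Data.Set as Set

-- Same two-pass structure as `uniques`, with a Set as the traversal state.
uniquesOrd :: Ord a => [a] -> [a]
uniquesOrd = catMaybes . snd . mapAccumL step Set.empty
  where
    step seen x
      | x `Set.member` seen = (seen, Nothing)             -- duplicate: mark it
      | otherwise           = (Set.insert x seen, Just x) -- first occurrence
```

Membership tests then cost O(log n) instead of O(n), and the function should remain usable on infinite lists in the same way as the list-based version.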
Here's a solution that uses only Prelude functions:
uniqueList theList =
    if not (null theList)
        then head theList : filter (/= head theList) (uniqueList (tail theList))
        else []
I'm assuming this is equivalent to running two or three nested "for" loops (running through each element, then running through each element again to check for other elements with the same value, then removing those other elements) so I'd estimate this is O(n^2) or O(n^3)
Might even be better than reversing a list, nubbing it, then reversing it again, depending on your circumstances.