Haskell: Removing duplicates tuples from a list? - list

I'm trying to get from the before to after state. Is there a convenient Haskell function for removing duplicate tuples from a list? Or perhaps it is something a bit more complicated such as iterating through the entire list?
Before: the list of tuples, sorted by word, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]

Searching for Eq a => [a] -> [a] on hoogle, returns nub function:
The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means `essence'.)
As in the documentation the more general case is nubBy.
That said, this is an O(n^2) algorithm and may not be very efficient. An alternative would be to use Data.Set.fromList if the values are an instance of Ord type-class, as in:
import qualified Data.Set as Set
nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList
though this will not maintain the order of the original list.
A simple set style solution which maintains the order of the original list can be:
import Data.Set (Set, member, insert, empty)
nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
where
loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
loop acc#(xs, obs) x
| x `member` obs = acc
| otherwise = (x:xs, x `insert` obs)

If you want to define a version of nub for Ord, I recommend using
nub' :: Ord a => [a] -> [a]
nub' xs = foldr go (`seq` []) xs empty
where
go x r obs
| x `member` obs = r obs
| otherwise = obs' `seq` x : r obs'
where obs' = x `insert` obs
To see what this is doing, you can get rid of the foldr:
nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
where
nub'' [] obs = obs `seq` []
nub'' (y : ys) obs
| y `member` obs = nub'' ys obs
| otherwise = obs' `seq` y : nub'' ys obs'
where obs' = y `insert` obs
One key point about this implementation, as opposed to
behzad.nouri's, is that it produces elements lazily, as they are consumed. This is generally much better for cache utilization and garbage collection, as well as using a constant factor less memory than the reversing algorithm.

Related

Function to find number of occurrences in list

So I already have a function that finds the number of occurrences in a list using maps.
occur :: [a] -> Map a a
occur xs = fromListWith (+) [(x, 1) | x <- xs]
For example if a list [1,1,2,3,3] is inputted, the code will output [(1,2),(2,1),(3,2)], and for a list [1,2,1,1] the output would be [(1,3),(2,1)].
I was wondering if there's any way I can change this function to use foldr instead to eliminate the use of maps.
You can make use of foldr where the accumulator is a list of key-value pairs. Each "step" we look if the list already contains a 2-tuple for the given element. If that is the case, we increment the corresponding value. If the item x does not yet exists, we add (x, 1) to that list.
Our function thus will look like:
occur :: Eq => [a] -> [(a, Int)]
occur = foldr incMap []
where incMap thus takes an item x and a list of 2-tuples. We can make use of recursion here to update the "map" with:
incMap :: Eq a => a -> [(a, Int)] -> [(a, Int)]
incMap x = go
where go [] = [(x, 1)]
go (y2#(y, ny): ys)
| x == y = … : ys
| otherwise = y2 : …
where I leave implementing the … parts as an exercise.
This algorithm is not very efficient, since it takes O(n) to increment the map with n the number of 2-tuples in the map. You can also implement incrementing the Map for the given item by using insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a, which is more efficient.

Get index of next smallest element in the list in Haskell

I m a newbie to Haskell. I am pretty good with Imperative languages but not with functional. Haskell is my first as a functional language.
I am trying to figure out, how to get the index of the smallest element in the list where the minimum element is defined by me.
Let me explain by examples.
For example :
Function signature
minList :: x -> [x]
let x = 2
let list = [2,3,5,4,6,5,2,1,7,9,2]
minList x list --output 1 <- is index
This should return 1. Because the at list[1] is 3. It returns 1 because 3 is the smallest element after x (=2).
let x = 1
let list = [3,5,4,6,5,2,1,7,9,2]
minList x list -- output 9 <- is index
It should return 9 because at list[9] is 2 and 2 is the smallest element after 1. x = 1 which is defined by me.
What I have tried so far.
minListIndex :: (Ord a, Num a) => a -> [a] -> a
minListIndex x [] = 0
minListIndex x (y:ys)
| x > y = length ys
| otherwise = m
where m = minListIndex x ys
When I load the file I get this error
• Couldn't match expected type ‘a’ with actual type ‘Int’
‘a’ is a rigid type variable bound by
the type signature for:
minListIndex :: forall a. (Ord a, Num a) => a -> [a] -> a
at myFile.hs:36:17
• In the expression: 1 + length ys
In an equation for ‘minListIndex’:
minListIndex x (y : ys)
| x > y = 1 + length ys
| otherwise = 1 + m
where
m = minListIndex x ys
• Relevant bindings include
m :: a (bound at myFile.hs:41:19)
ys :: [a] (bound at myFile.hs:38:19)
y :: a (bound at myFile.hs:38:17)
x :: a (bound at myFile.hs:38:14)
minListIndex :: a -> [a] -> a (bound at myFile.hs:37:1)
When I modify the function like this
minListIndex :: (Ord a, Num a) => a -> [a] -> a
minListIndex x [] = 0
minListIndex x (y:ys)
| x > y = 2 -- <- modified...
| otherwise = 3 -- <- modifiedd
where m = minListIndex x ys
I load the file again then it compiles and runs but ofc the output is not desired.
What is the problem with
| x > y = length ys
| otherwise = m
?
In short: Basically, I want to find the index of the smallest element but higher than the x which is defined by me in parameter/function signature.
Thanks for the help in advance!
minListIndex :: (Ord a, Num a) => a -> [a] -> a
The problem is that you are trying to return result of generic type a but it is actually index in a list.
Suppose you are trying to evaluate your function for a list of doubles. In this case compiler should instantiate function's type to Double -> [Double] -> Double which is nonsense.
Actually compiler notices that you are returning something that is derived from list's length and warns you that it is not possible to match generic type a with concrete Int.
length ys returns Int, so you can try this instead:
minListIndex :: Ord a => a -> [a] -> Int
Regarding your original problem, seems that you can't solve it with plain recursion. Consider defining helper recursive function with accumulator. In your case it can be a pair (min_value_so_far, its_index).
First off, I'd separate the index type from the list element type altogether. There's no apparent reason for them to be the same. I will use the BangPatterns extension to avoid a space leak without too much notation; enable that by adding {-# language BangPatterns #-} to the very top of the file. I will also import Data.Word to get access to the Word64 type.
There are two stages: first, find the index of the given element (if it's present) and the rest of the list beyond that point. Then, find the index of the minimum of the tail.
-- Find the 0-based index of the first occurrence
-- of the given element in the list, and
-- the rest of the list after that element.
findGiven :: Eq a => a -> [a] -> Maybe (Word64, [a])
findGiven given = go 0 where
go !_k [] = Nothing --not found
go !k (x:xs)
| given == xs = Just (k, xs)
| otherwise = go (k+1) xs
-- Find the minimum (and its index) of the elements of the
-- list greater than the given one.
findMinWithIndexOver :: Ord a => a -> [a] -> Maybe (Word64, a)
findMinWithIndexOver given = go 0 Nothing where
go !_k acc [] = acc
go !k acc (x : xs)
| x <= given = go (k + 1) acc xs
| otherwise
= case acc of
Nothing -> go (k + 1) (Just (k, x)) xs
Just (ix_min, curr_min)
| x < ix_min = go (k + 1) (Just (k, x)) xs
| otherwise = go (k + 1) acc xs
You can now put these functions together to construct the one you seek. If you want a general Num result rather than a Word64 one, you can use fromIntegral at the very end. Why use Word64? Unlike Int or Word, it's (practically) guaranteed not to overflow in any reasonable amount of time. It's likely substantially faster than using something like Integer or Natural directly.
It is not clear for me what do you want exactly. Based on examples I guess it is: find the index of the smallest element higher than x which appears after x. In that case, This solution is plain Prelude. No imports
minList :: Ord a => a -> [a] -> Int
minList x l = snd . minimum . filter (\a -> x < fst a) . dropWhile (\a -> x /= fst a) $ zip l [0..]
The logic is:
create the list of pairs, [(elem, index)] using zip l [0..]
drop elements until you find the input x using dropWhile (\a -> x /= fst a)
discards elements less than x using filter (\a -> x < fst a)
find the minimum of the resulting list. Tuples are ordered using lexicographic order so it fits your problem
take the index using snd
Your function can be constructed out of ready-made parts as
import Data.Maybe (listToMaybe)
import Data.List (sortBy)
import Data.Ord (comparing)
foo :: (Ord a, Enum b) => a -> [a] -> Maybe b
foo x = fmap fst . listToMaybe . take 1
. dropWhile ((<= x) . snd)
. sortBy (comparing snd)
. dropWhile ((/= x) . snd)
. zip [toEnum 0..]
This Maybe finds the index of the next smallest element in the list above the given element, situated after the given element, in the input list. As you've requested.
You can use any Enum type of your choosing as the index.
Now you can implement this higher-level executable specs as direct recursion, using an efficient Map data structure to hold your sorted elements above x seen so far to find the next smallest, etc.
Correctness first, efficiency later!
Efficiency update: dropping after the sort drops them sorted, so there's a wasted effort there; indeed it should be replaced with the filtering (as seen in the answer by Luis Morillo) before the sort. And if our element type is in Integral (so it is a properly discrete type, unlike just an Enum, thanks to #dfeuer for pointing this out!), there's one more opportunity for an opportunistic optimization: if we hit on a succ minimal element by pure chance, there's no further chance of improvement, and so we should bail out at that point right there:
bar :: (Integral a, Enum b) => a -> [a] -> Maybe b
bar x = fmap fst . either Just (listToMaybe . take 1
. sortBy (comparing snd))
. findOrFilter ((== succ x).snd) ((> x).snd)
. dropWhile ((/= x) . snd)
. zip [toEnum 0..]
findOrFilter :: (a -> Bool) -> (a -> Bool) -> [a] -> Either a [a]
findOrFilter t p = go
where go [] = Right []
go (x:xs) | t x = Left x
| otherwise = fmap ([x | p x] ++) $ go xs
Testing:
> foo 5 [2,3,5,4,6,5,2,1,7,9,2] :: Maybe Int
Just 4
> foo 2 [2,3,5,4,6,5,2,1,7,9,2] :: Maybe Int
Just 1
> foo 1 [3,5,4,6,5,2,1,7,9,2] :: Maybe Int
Just 9

Intersection of infinite lists

I know from computability theory that it is possible to take the intersection of two infinite lists, but I can't find a way to express it in Haskell.
The traditional method fails as soon as the second list is infinite, because you spend all your time checking it for a non-matching element in the first list.
Example:
let ones = 1 : ones -- an unending list of 1s
intersect [0,1] ones
This never yields 1, as it never stops checking ones for the element 0.
A successful method needs to ensure that each element of each list will be visited in finite time.
Probably, this will be by iterating through both lists, and spending approximately equal time checking all previously-visited elements in each list against each other.
If possible, I'd like to also have a way to ignore duplicates in the lists, as it is occasionally necessary, but this is not a requirement.
Using the universe package's Cartesian product operator we can write this one-liner:
import Data.Universe.Helpers
isect :: Eq a => [a] -> [a] -> [a]
xs `isect` ys = [x | (x, y) <- xs +*+ ys, x == y]
-- or this, which may do marginally less allocation
xs `isect` ys = foldr ($) [] $ cartesianProduct
(\x y -> if x == y then (x:) else id)
xs ys
Try it in ghci:
> take 10 $ [0,2..] `isect` [0,3..]
[0,6,12,18,24,30,36,42,48,54]
This implementation will not produce any duplicates if the input lists don't have any; but if they do, you can tack on your favorite dup-remover either before or after calling isect. For example, with nub, you might write
> nub ([0,1] `isect` repeat 1)
[1
and then heat up your computer pretty good, since it can never be sure there might not be a 0 in that second list somewhere if it looks deep enough.
This approach is significantly faster than David Fletcher's, produces many fewer duplicates and produces new values much more quickly than Willem Van Onsem's, and doesn't assume the lists are sorted like freestyle's (but is consequently much slower on such lists than freestyle's).
An idea might be to use incrementing bounds. Let is first relax the problem a bit: yielding duplicated values is allowed. In that case you could use:
import Data.List (intersect)
intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
where intersectInfinite' n = intersect (take n xs) (take n ys) ++ intersectInfinite' (n+1)
In other words we claim that:
A∩B = A1∩B1 ∪ A2∩B2 ∪ ... ∪ ...
with A1 is a set containing the first i elements of A (yes there is no order in a set, but let's say there is somehow an order). If the set contains less elements then the full set is returned.
If c is in A (at index i) and in B (at index j), c will be emitted in segment (not index) max(i,j).
This will thus always generate an infinite list (with an infinite amount of duplicates) regardless whether the given lists are finite or not. The only exception is when you give it an empty list, in which case it will take forever. Nevertheless we here ensured that every element in the intersection will be emitted at least once.
Making the result finite (if the given lists are finite)
Now we can make our definition better. First we make a more advanced version of take, takeFinite (let's first give a straight-forward, but not very efficient defintion):
takeFinite :: Int -> [a] -> (Bool,[a])
takeFinite _ [] = (True,[])
takeFinite 0 _ = (False,[])
takeFinite n (x:xs) = let (b,t) = takeFinite (n-1) xs in (b,x:t)
Now we can iteratively deepen until both lists have reached the end:
intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
intersectInfinite' :: Eq a => Int -> [a] -> [a] -> [a]
intersectInfinite' n xs ys | fa && fb = intersect xs ys
| fa = intersect ys xs
| fb = intersect xs ys
| otherwise = intersect xfa xfb ++ intersectInfinite' (n+1) xs ys
where (fa,xfa) = takeFinite n xs
(fb,xfb) = takeFinite n ys
This will now terminate given both lists are finite, but still produces a lot of duplicates. There are definitely ways to resolve this issue more.
Here's one way. For each x we make a list of maybes which has
Just x only where x appeared in ys. Then we interleave all
these lists.
isect :: Eq a => [a] -> [a] -> [a]
isect xs ys = (catMaybes . foldr interleave [] . map matches) xs
where
matches x = [if x == y then Just x else Nothing | y <- ys]
interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs
Maybe it can be improved using some sort of fairer interleaving -
it's already pretty slow on the example below because (I think)
it's doing an exponential amount of work.
> take 10 (isect [0..] [0,2..])
[0,2,4,6,8,10,12,14,16,18]
If elements in the lists are ordered then you can easy to do that.
intersectOrd :: Ord a => [a] -> [a] -> [a]
intersectOrd [] _ = []
intersectOrd _ [] = []
intersectOrd (x:xs) (y:ys) = case x `compare` y of
EQ -> x : intersectOrd xs ys
LT -> intersectOrd xs (y:ys)
GT -> intersectOrd (x:xs) ys
Here's yet another alternative, leveraging Control.Monad.WeightedSearch
import Control.Monad (guard)
import Control.Applicative
import qualified Control.Monad.WeightedSearch as W
We first define a cost for digging inside the list. Accessing the tail costs 1 unit more. This will ensure a fair scheduling among the two infinite lists.
eachW :: [a] -> W.T Int a
eachW = foldr (\x w -> pure x <|> W.weight 1 w) empty
Then, we simply disregard infinite lists.
intersection :: [Int] -> [Int] -> [Int]
intersection xs ys = W.toList $ do
x <- eachW xs
y <- eachW ys
guard (x==y)
return y
Even better with MonadComprehensions on:
intersection2 :: [Int] -> [Int] -> [Int]
intersection2 xs ys = W.toList [ y | x <- eachW xs, y <- eachW ys, x==y ]
Solution
I ended up using the following implementation; a slight modification of the answer by David Fletcher:
isect :: Eq a => [a] -> [a] -> [a]
isect [] = const [] -- don't bother testing against an empty list
isect xs = catMaybes . diagonal . map matches
where matches y = [if x == y then Just x else Nothing | x <- xs]
This can be augmented with nub to filter out duplicates:
isectUniq :: Eq a => [a] -> [a] -> [a]
isectUniq xs = nub . isect xs
Explanation
Of the line isect xs = catMaybes . diagonal . map matches
(map matches) ys computes a list of lists of comparisons between elements of xs and ys, where the list indices specify the indices in ys and xs respectively: i.e (map matches) ys !! 3 !! 0 would represent the comparison of ys !! 3 with xs !! 0, which would be Nothing if those values differ. If those values are the same, it would be Just that value.
diagonals takes a list of lists and returns a list of lists where the nth output list contains an element each from the first n lists. Another way to conceptualise it is that (diagonals . map matches) ys !! n contains comparisons between elements whose indices in xs and ys sum to n.
diagonal is simply a flat version of diagonals (diagonal = concat diagonals)
Therefore (diagonal . map matches) ys is a list of comparisons between elements of xs and ys, where the elements are approximately sorted by the sum of the indices of the elements of ys and xs being compared; this means that early elements are compared to later elements with the same priority as middle elements being compared to each other.
(catMaybes . diagonal . map matches) ys is a list of only the elements which are in both lists, where the elements are approximately sorted by the sum of the indices of the two elements being compared.
Note
(diagonal . map (catMaybes . matches)) ys does not work: catMaybes . matches only yields when it finds a match, instead of also yielding Nothing on no match, so the interleaving does nothing to distribute the work.
To contrast, in the chosen solution, the interleaving of Nothing and Just values by diagonal means that the program divides its attention between 'searching' for multiple different elements, not waiting for one to succeed; whereas if the Nothing values are removed before interleaving, the program may spend too much time waiting for a fruitless 'search' for a given element to succeed.
Therefore, we would encounter the same problem as in the original question: while one element does not match any elements in the other list, the program will hang; whereas the chosen solution will only hang while no matches are found for any elements in either list.

Is there a function that takes a list and returns a list of duplicate elements in that list?

Is there a Haskell function that takes a list and returns a list of duplicates/redundant elements in that list?
I'm aware of the the nub and nubBy functions, but they remove the duplicates; I would like to keep the dupes and collects them in a list.
The simplest way to do this, which is extremely inefficient, is to use nub and \\:
import Data.List (nub, (\\))
getDups :: Eq a => [a] -> [a]
getDups xs = xs \\ nub xs
If you can live with an Ord constraint, everything gets much nicer:
import Data.Set (member, empty, insert)
getDups :: Ord a => [a] -> [a]
getDups xs = foldr go (const []) xs empty
where
go x cont seen
| member x seen = x : r seen
| otherwise = r (insert x seen)
I wrote these functions which seems to work well.
The first one return the list of duplicates element in a list with a basic equlity test (==)
duplicate :: Eq a => [a] -> [a]
duplicate [] = []
duplicate (x:xs)
| null pres = duplicate abs
| otherwise = x:pres++duplicate abs
where (pres,abs) = partition (x ==) xs
The second one make the same job by providing a equality test function (like nubBy)
duplicateBy :: (a -> a -> Bool) -> [a] -> [a]
duplicateBy eq [] = []
duplicateBy eq (x:xs)
| null pres = duplicateBy eq abs
| otherwise = x:pres++duplicateBy eq abs
where (pres,abs) = partition (eq x) xs
Is there a Haskell function that takes a list and returns a list of duplicates/redundant elements in that list?
You can write such a function yourself easily enough. Use a helper function that takes two list arguments, the first one of which being the list whose dupes are sought; walk along that list and accumulate the dupes in the second argument; finally, return the latter when the first argument is the empty list.
dupes l = dupes' l []
where
dupes' [] ls = ls
dupes' (x:xs) ls
| not (x `elem` ls) && x `elem` xs = dupes' xs (x:ls)
| otherwise = dupes' xs ls
Test:
λ> dupes [1,2,3,3,2,2,3,4]
[3,2]
Be aware that the asymptotic time complexity is as bad as that of nub, though: O(n^2). If you want better asymptotics, you'll need an Ord class constraint.
If you are happy with an Ord constraint you can use group from Data.List:
getDups :: Ord a => [a] -> [a]
getDups = concatMap (drop 1) . group . sort

unique elements in a haskell list

okay, this is probably going to be in the prelude, but: is there a standard library function for finding the unique elements in a list? my (re)implementation, for clarification, is:
has :: (Eq a) => [a] -> a -> Bool
has [] _ = False
has (x:xs) a
| x == a = True
| otherwise = has xs a
unique :: (Eq a) => [a] -> [a]
unique [] = []
unique (x:xs)
| has xs x = unique xs
| otherwise = x : unique xs
I searched for (Eq a) => [a] -> [a] on Hoogle.
First result was nub (remove duplicate elements from a list).
Hoogle is awesome.
The nub function from Data.List (no, it's actually not in the Prelude) definitely does something like what you want, but it is not quite the same as your unique function. They both preserve the original order of the elements, but unique retains the last
occurrence of each element, while nub retains the first occurrence.
You can do this to make nub act exactly like unique, if that's important (though I have a feeling it's not):
unique = reverse . nub . reverse
Also, nub is only good for small lists.
Its complexity is quadratic, so it starts to get slow if your list can contain hundreds of elements.
If you limit your types to types having an Ord instance, you can make it scale better.
This variation on nub still preserves the order of the list elements, but its complexity is O(n * log n):
import qualified Data.Set as Set
nubOrd :: Ord a => [a] -> [a]
nubOrd xs = go Set.empty xs where
go s (x:xs)
| x `Set.member` s = go s xs
| otherwise = x : go (Set.insert x s) xs
go _ _ = []
In fact, it has been proposed to add nubOrd to Data.Set.
import Data.Set (toList, fromList)
uniquify lst = toList $ fromList lst
I think that unique should return a list of elements that only appear once in the original list; that is, any elements of the orginal list that appear more than once should not be included in the result.
May I suggest an alternative definition, unique_alt:
unique_alt :: [Int] -> [Int]
unique_alt [] = []
unique_alt (x:xs)
| elem x ( unique_alt xs ) = [ y | y <- ( unique_alt xs ), y /= x ]
| otherwise = x : ( unique_alt xs )
Here are some examples that highlight the differences between unique_alt and unqiue:
unique [1,2,1] = [2,1]
unique_alt [1,2,1] = [2]
unique [1,2,1,2] = [1,2]
unique_alt [1,2,1,2] = []
unique [4,2,1,3,2,3] = [4,1,2,3]
unique_alt [4,2,1,3,2,3] = [4,1]
I think this would do it.
unique [] = []
unique (x:xs) = x:unique (filter ((/=) x) xs)
Another way to remove duplicates:
unique :: [Int] -> [Int]
unique xs = [x | (x,y) <- zip xs [0..], x `notElem` (take y xs)]
Algorithm in Haskell to create a unique list:
data Foo = Foo { id_ :: Int
, name_ :: String
} deriving (Show)
alldata = [ Foo 1 "Name"
, Foo 2 "Name"
, Foo 3 "Karl"
, Foo 4 "Karl"
, Foo 5 "Karl"
, Foo 7 "Tim"
, Foo 8 "Tim"
, Foo 9 "Gaby"
, Foo 9 "Name"
]
isolate :: [Foo] -> [Foo]
isolate [] = []
isolate (x:xs) = (fst f) : isolate (snd f)
where
f = foldl helper (x,[]) xs
helper (a,b) y = if name_ x == name_ y
then if id_ x >= id_ y
then (x,b)
else (y,b)
else (a,y:b)
main :: IO ()
main = mapM_ (putStrLn . show) (isolate alldata)
Output:
Foo {id_ = 9, name_ = "Name"}
Foo {id_ = 9, name_ = "Gaby"}
Foo {id_ = 5, name_ = "Karl"}
Foo {id_ = 8, name_ = "Tim"}
A library-based solution:
We can use that style of Haskell programming where all looping and recursion activities are pushed out of user code and into suitable library functions. Said library functions are often optimized in ways that are way beyond the skills of a Haskell beginner.
A way to decompose the problem into two passes goes like this:
produce a second list that is parallel to the input list, but with duplicate elements suitably marked
eliminate elements marked as duplicates from that second list
For the first step, duplicate elements don't need a value at all, so we can use [Maybe a] as the type of the second list. So we need a function of type:
pass1 :: Eq a => [a] -> [Maybe a]
Function pass1 is an example of stateful list traversal where the state is the list (or set) of distinct elements seen so far. For this sort of problem, the library provides the mapAccumL :: (s -> a -> (s, b)) -> s -> [a] -> (s, [b]) function.
Here the mapAccumL function requires, besides the initial state and the input list, a step function argument, of type s -> a -> (s, Maybe a).
If the current element x is not a duplicate, the output of the step function is Just x and x gets added to the current state. If x is a duplicate, the output of the step function is Nothing, and the state is passed unchanged.
Testing under the ghci interpreter:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
λ>
λ> import Data.List(mapAccumL)
λ>
λ> pass1 xs = mapAccumL stepFn [] xs
λ>
λ> xs2 = snd $ pass1 "abacrba"
λ> xs2
[Just 'a', Just 'b', Nothing, Just 'c', Just 'r', Nothing, Nothing]
λ>
Writing a pass2 function is even easier. To filter out Nothing non-values, we could use:
import Data.Maybe( fromJust, isJust)
pass2 = (map fromJust) . (filter isJust)
but why bother at all ? - as this is precisely what the catMaybes library function does.
λ>
λ> import Data.Maybe(catMaybes)
λ>
λ> catMaybes xs2
"abcr"
λ>
Putting it all together:
Overall, the source code can be written as:
import Data.Maybe(catMaybes)
import Data.List(mapAccumL)
uniques :: (Eq a) => [a] -> [a]
uniques = let stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
in catMaybes . snd . mapAccumL stepFn []
This code is reasonably compatible with infinite lists, something occasionally referred to as being “laziness-friendly”:
λ>
λ> take 5 $ uniques $ "abacrba" ++ (cycle "abcrf")
"abcrf"
λ>
Efficiency note:
If we anticipate that it is possible to find many distinct elements in the input list and we can have an Ord a instance, the state can be implemented as a Set object rather than a plain list, this without having to alter the overall structure of the solution.
Here's a solution that uses only Prelude functions:
uniqueList theList =
if not (null theList)
then head theList : filter (/= head theList) (uniqueList (tail theList))
else []
I'm assuming this is equivalent to running two or three nested "for" loops (running through each element, then running through each element again to check for other elements with the same value, then removing those other elements) so I'd estimate this is O(n^2) or O(n^3)
Might even be better than reversing a list, nubbing it, then reversing it again, depending on your circumstances.