Is there a function that takes a list and returns a list of duplicate elements in that list?

Is there a Haskell function that takes a list and returns a list of duplicates/redundant elements in that list?
I'm aware of the nub and nubBy functions, but they remove the duplicates; I would like to keep the dupes and collect them in a list.

The simplest way to do this, which is extremely inefficient, is to use nub and \\:
import Data.List (nub, (\\))
getDups :: Eq a => [a] -> [a]
getDups xs = xs \\ nub xs
If you can live with an Ord constraint, everything gets much nicer:
import Data.Set (member, empty, insert)
getDups :: Ord a => [a] -> [a]
getDups xs = foldr go (const []) xs empty
  where
    go x r seen
      | member x seen = x : r seen
      | otherwise = r (insert x seen)
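As a quick sanity check, here is a sketch that puts both versions in one module. The names getDupsEq and getDupsOrd are only there so the two definitions can coexist; they are not from the answer above.

```haskell
import Data.List (nub, (\\))
import qualified Data.Set as Set

-- Quadratic version: needs only Eq.
getDupsEq :: Eq a => [a] -> [a]
getDupsEq xs = xs \\ nub xs

-- Set-based version: needs Ord, roughly O(n log n).
getDupsOrd :: Ord a => [a] -> [a]
getDupsOrd xs = foldr go (const []) xs Set.empty
  where
    -- r is the rest of the fold; it still expects the set of elements seen so far
    go x r seen
      | Set.member x seen = x : r seen
      | otherwise         = r (Set.insert x seen)
```

For [1,2,3,3,2,2,3,4] both return [3,2,2,3]: every occurrence after the first one is reported. The Set-based version also produces its output lazily, so take 2 (getDupsOrd (cycle [1,2])) terminates.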

I wrote these functions, which seem to work well.
The first one returns the list of duplicated elements in a list, using the basic equality test (==):
import Data.List (partition)
duplicate :: Eq a => [a] -> [a]
duplicate [] = []
duplicate (x:xs)
  | null pres = duplicate abs
  | otherwise = x:pres++duplicate abs
  where (pres,abs) = partition (x ==) xs
The second one does the same job but takes an equality test function (like nubBy):
duplicateBy :: (a -> a -> Bool) -> [a] -> [a]
duplicateBy eq [] = []
duplicateBy eq (x:xs)
  | null pres = duplicateBy eq abs
  | otherwise = x:pres++duplicateBy eq abs
  where (pres,abs) = partition (eq x) xs
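One thing worth noticing about these two functions is that they keep the first occurrence as well, and they group equal elements together. A small sketch to show this, with duplicate defined in terms of duplicateBy (the local names same and rest are my own renaming, to avoid shadowing Prelude's abs):

```haskell
import Data.List (partition)

duplicateBy :: (a -> a -> Bool) -> [a] -> [a]
duplicateBy _ [] = []
duplicateBy eq (x:xs)
  | null same = duplicateBy eq rest          -- x has no partner: drop it
  | otherwise = x : same ++ duplicateBy eq rest
  where
    -- same: later elements equal to x; rest: everything else, in order
    (same, rest) = partition (eq x) xs

duplicate :: Eq a => [a] -> [a]
duplicate = duplicateBy (==)
```

With this, duplicate [1,2,3,3,2,2,3,4] gives [2,2,2,3,3,3], whereas the getDups versions give [3,2,2,3] for the same input.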

Is there a Haskell function that takes a list and returns a list of duplicates/redundant elements in that list?
You can write such a function yourself easily enough. Use a helper function that takes two list arguments, the first one of which being the list whose dupes are sought; walk along that list and accumulate the dupes in the second argument; finally, return the latter when the first argument is the empty list.
dupes l = dupes' l []
  where
    dupes' [] ls = ls
    dupes' (x:xs) ls
      | not (x `elem` ls) && x `elem` xs = dupes' xs (x:ls)
      | otherwise = dupes' xs ls
Test:
λ> dupes [1,2,3,3,2,2,3,4]
[3,2]
Be aware that the asymptotic time complexity is as bad as that of nub, though: O(n^2). If you want better asymptotics, you'll need an Ord class constraint.
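For what it's worth, here is a rough sketch of what the Ord-constrained variant could look like, counting occurrences in a Data.Map. The name dupesOrd is made up for this example, and note that it reports the duplicates in ascending order rather than in the order dupes finds them.

```haskell
import qualified Data.Map.Strict as Map

-- Each element that occurs more than once, reported once, in ascending order.
dupesOrd :: Ord a => [a] -> [a]
dupesOrd xs = Map.keys (Map.filter (> 1) counts)
  where
    -- occurrence count per distinct element
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
```

On the test input above, dupesOrd [1,2,3,3,2,2,3,4] gives [2,3], in O(n log n) time.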

If you are happy with an Ord constraint you can use group from Data.List:
import Data.List (group, sort)
getDups :: Ord a => [a] -> [a]
getDups = concatMap (drop 1) . group . sort

Sum corresponding elements of two lists, with the extra elements of the longer list added at the end

I'm trying to add two lists together and keep the extra elements that are unused and add those into the new list e.g.
addLists [1,2,3] [1,3,5,7,9] = [2,5,8,7,9]
I have this so far:
addLists :: Num a => [a] -> [a] -> [a]
addLists xs ys = zipWith (+) xs ys
but I'm unsure how to get the extra elements into the new list.
The next step is changing this to a higher-order function that takes the combining function as an argument:
longZip :: (a -> a -> a) -> [a] -> [a] -> [a]
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] is implemented as:
zipWith :: (a->b->c) -> [a]->[b]->[c]
zipWith f = go
  where
    go [] _ = []
    go _ [] = []
    go (x:xs) (y:ys) = f x y : go xs ys
It thus uses explicit recursion where go will check if the two lists are non-empty and in that case yield f x y, otherwise it stops and returns an empty list [].
You can implement a variant of zipWith which will continue even if one of the lists is empty. This will look like:
zipLongest :: (a -> a -> a) -> [a] -> [a] -> [a]
zipLongest f = go
  where go [] ys = …
        go xs [] = …
        go (x:xs) (y:ys) = f x y : go xs ys
where you still need to fill in ….
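In case it helps to compare against your own attempt, one possible completion of the zipLongest skeleton is sketched below. It assumes that the leftover elements of the longer list should be passed through unchanged, which is what the addLists example in the question asks for.

```haskell
zipLongest :: (a -> a -> a) -> [a] -> [a] -> [a]
zipLongest f = go
  where
    go [] ys = ys              -- first list exhausted: keep the rest of the second
    go xs [] = xs              -- second list exhausted: keep the rest of the first
    go (x:xs) (y:ys) = f x y : go xs ys

addLists :: Num a => [a] -> [a] -> [a]
addLists = zipLongest (+)
```

With this, addLists [1,2,3] [1,3,5,7,9] evaluates to [2,5,8,7,9].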
You can do it with higher order functions as simple as
import Data.List (transpose)
addLists :: Num a => [a] -> [a] -> [a]
addLists xs ys = map sum . transpose $ [xs, ys]
because the length of transpose[xs, ys, ...] is the length of the longest list in its argument list, and sum :: (Foldable t, Num a) => t a -> a is already defined to sum the elements of a list (since lists are Foldable).
transpose is used here as a kind of a zip (but cutting on the longest instead of the shortest list), with [] being a default element for the lists addition ++, like 0 is a default element for the numbers addition +:
cutLongest [xs, ys] $
zipWith (++) (map pure xs ++ repeat []) (map pure ys ++ repeat [])
See also:
Zip with default value instead of dropping values?
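The same transpose trick should also cover the longZip part of the question. This is only a sketch: it relies on every row produced by transpose [xs, ys] being non-empty (one or two elements), so foldr1 is safe here.

```haskell
import Data.List (transpose)

longZip :: (a -> a -> a) -> [a] -> [a] -> [a]
-- each row of the transposed list holds either [x, y] or a single leftover element
longZip f xs ys = map (foldr1 f) (transpose [xs, ys])
```

For example, longZip (+) [1,2,3] [1,3,5,7,9] gives [2,5,8,7,9], and longZip (-) [5] [1,2] gives [4,2], so the argument order of the combining function is preserved.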
You're looking for the semialign package. It gives you an operation like zipping, but that keeps going until both lists run out. It also generalizes to types other than lists, such as rose trees. In your case, you'd use it like this:
import Data.Semialign
import Data.These
addLists :: (Semialign f, Num a) => f a -> f a -> f a
addLists = alignWith (mergeThese (+))
longZip :: Semialign f => (a -> a -> a) -> f a -> f a -> f a
longZip = alignWith . mergeThese
The new type signatures are optional. If you want, you can keep using your old ones that restrict them to lists.

How to get the Index of an element in a list, by not using "list comprehensions"?

I'm new to Haskell programming and I'm trying to solve a problem both with and without list comprehensions.
The problem is to find the index of an element in a list and return a list of the indexes where the element was found.
I already solved the problem using a list comprehension, but now I'm having trouble solving it without one.
My recursive approach:
I tried to zip [0..(length list)] with the list itself.
Then, if the element a equals an element in the list, I build a new list from the first component of the tuple in the zipped list (my index), and then call the function recursively until the list is [].
That's my list comprehension (works):
positions :: Eq a => a -> [a] -> [Int]
positions a list = [x | (x,y) <- zip [0..(length list)] list, a == y]
That's my recursive way (not working):
positions' :: Eq a => a -> [a] -> [Int]
positions' _ [] = []
positions' a (x:xs) =
  let ((n,m):ns) = zip [0..(length (x:xs))] (x:xs)
  in if (a == m) then n:(positions' a xs)
     else (positions' a xs)
*sorry I don't know how to highlight words
but ghci says:
*Main> positions' 2 [1,2,3,4,5,6,7,8,8,9,2]
[0,0]
and it should be like that (my list comprehension):
*Main> positions 2 [1,2,3,4,5,6,7,8,8,9,2]
[1,10]
Where is my mistake ?
The problem with your attempt is simply that when you say:
let ((n,m):ns) = zip [0..(length (x:xs))] (x:xs)
then n will always be 0. That's because you are matching (n,m) against the first element of zip [0..(length (x:xs))] (x:xs), which will necessarily always be (0,x).
That's not a problem in itself - but it does mean you have to handle the recursive step properly. The way you have it now, positions _ _, if non-empty, will always have 0 as its first element, because the only way you allow it to find a match is if it's at the head of the list, resulting in an index of 0. That means that your result will always be a list of the correct length, but with all elements 0 - as you're seeing.
The problem isn't with your recursion scheme though, it's to do with the fact that you're not modifying the result to account for the fact that you don't always want 0 added to the front of the result list. Since each recursive call just adds 1 to the index you want to find, all you need to do is map the increment function (+1) over the recursive result:
positions' :: Eq a => a -> [a] -> [Int]
positions' _ [] = []
positions' a (x:xs) =
  let ((0,m):ns) = zip [0..(length (x:xs))] (x:xs)
  in if (a == m) then 0:(map (+1) (positions' a xs))
     else (map (+1) (positions' a xs))
(Note that I've changed your let to be explicit that n will always be 0 - I prefer to be explicit this way but this in itself doesn't change the output.) Since m is always bound to x and ns isn't used at all, we can elide the let, inlining the definition of m:
positions' :: Eq a => a -> [a] -> [Int]
positions' _ [] = []
positions' a (x:xs) =
  if a == x
    then 0 : map (+1) (positions' a xs)
    else map (+1) (positions' a xs)
You could go on to factor out the repeated map (+1) (positions' a xs) if you wanted to.
Incidentally, you didn't need explicit recursion to avoid a list comprehension here. For one, list comprehensions are basically a replacement for uses of map and filter. I was going to write this out explicitly, but I see @WillemVanOnsem has given this as an answer so I will simply refer you to his answer.
Another way, although perhaps not acceptable if you were asked to implement this yourself, would be to just use the built-in elemIndices function, which does exactly what you are trying to implement here.
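For completeness, a minimal sketch of the elemIndices route. The wrapper name positions is just the one from the question; elemIndices itself comes from Data.List.

```haskell
import Data.List (elemIndices)

-- elemIndices already has the shape the question asks for.
positions :: Eq a => a -> [a] -> [Int]
positions = elemIndices
```

With the list from the question, positions 2 [1,2,3,4,5,6,7,8,8,9,2] gives [1,10].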
We can make use of a filter :: (a -> Bool) -> [a] -> [a] and map :: (a -> b) -> [a] -> [b] approach, like:
positions :: Eq a => a -> [a] -> [Int]
positions x = map fst . filter ((x ==) . snd) . zip [0..]
We thus first construct tuples of the form (i, yi), next we filter such that we only retain these tuples for which x == yi, and finally we fetch the first item of these tuples.
For example:
Prelude> positions 'o' "foobaraboof"
[1,2,8,9]
Your
let ((n,m):ns) = zip [0..(length (x:xs))] (x:xs)
is equivalent to
== {- by laziness -}
let ((n,m):ns) = zip [0..] (x:xs)
== {- by definition of zip -}
let ((n,m):ns) = (0,x) : zip [1..] xs
== {- by pattern matching -}
let {(n,m) = (0,x)
; ns = zip [1..] xs }
== {- by pattern matching -}
let { n = 0
; m = x
; ns = zip [1..] xs }
but you never reference ns! So we don't need its binding at all:
positions' a (x:xs) =
  let { n = 0 ; m = x } in
  if (a == m) then n : (positions' a xs)
  else (positions' a xs)
and so, by substitution, you actually have
positions' :: Eq a => a -> [a] -> [Int]
positions' _ [] = []
positions' a (x:xs) =
  if (a == x) then 0 : (positions' a xs) -- NB: 0
  else (positions' a xs)
And this is why all you ever produce are 0s. But you want to produce the correct index: 0, 1, 2, 3, ....
First, let's tweak your code a little bit further into
positions' :: Eq a => a -> [a] -> [Int]
positions' a xs = go xs
  where
    go [] = []
    go (x:xs) | a == x = 0 : go xs -- NB: 0
              | otherwise = go xs
This is known as a worker/wrapper transform. go is a worker, positions' is a wrapper. There's no need to pass a around from call to call, it doesn't change, and we have access to it anyway. It is in the enclosing scope with respect to the inner function, go. We've also used guards instead of the more verbose and less visually apparent if ... then ... else.
Now we just need to use something -- the correct index value -- instead of 0.
To use it, we must have it first. What is it? It starts as 0, then it is incremented on each step along the input list.
When do we make a step along the input list? At the recursive call:
positions' :: Eq a => a -> [a] -> [Int]
positions' a xs = go xs 0
  where
    go [] _ = []
    go (x:xs) i | a == x = 0 : go xs (i+1) -- NB: 0
                | otherwise = go xs (i+1)
_ as a pattern means we don't care about the argument's value -- it's there but we're not going to use it.
Now all that's left for us to do is to use that i in place of that 0.
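If you want to check your result afterwards, carrying out that last substitution could look like the sketch below. The inner variables are renamed to y and ys here only to avoid shadowing the outer xs.

```haskell
positions' :: Eq a => a -> [a] -> [Int]
positions' a xs = go xs 0
  where
    go [] _ = []
    go (y:ys) i
      | a == y    = i : go ys (i + 1)   -- emit the current index, not 0
      | otherwise = go ys (i + 1)
```

This gives positions' 2 [1,2,3,4,5,6,7,8,8,9,2] = [1,10], matching the list comprehension.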

Take From a List While Increasing

I have a list of values that I would like to take from while the values are increasing. I assume it would always take the head of the list and then compare it to the next value. The function will continue to take as long as the values continue to increase. Upon reaching a list element that is less than or equal to the previous value, the list is returned.
takeIncreasing :: (Ord a) => [a] -> [a]
takeIncreasing [1,2,3,4,3,5,6,7,8] -- Should return [1,2,3,4]
A fold could compare the last element of the accumulation with the next value and append if the condition is met, but would continue to the end of the list. I would like the function to stop taking at the first instance the constraint is not met.
This seems like an application of a monad but cannot determine if an existing monad accomplishes this.
A fold [...] would continue to the end of the list. I would like the function to stop taking at the first instance the constraint is not met.
A right fold can short circuit:
fun :: Ord a => [a] -> [a]
fun [] = []
fun (x:xs) = x: foldr go (const []) xs x
  where go x f i = if i < x then x: f x else []
then,
> fun [1,2,3,4,3,undefined]
[1,2,3,4]
or with an infinite list:
> fun $ [1,2,3,4,3] ++ [1..]
[1,2,3,4]
Right folds are magical, so you never even have to pattern match on the list.
twi xs = foldr go (const []) xs Nothing where
  go x _ (Just prev)
    | x <= prev = []
  go x r _ = x : r (Just x)
Or one that IMO has a bit less code complexity:
takeIncreasing :: Ord x => [x] -> [x]
takeIncreasing (x:x':xs) | x < x' = x : takeIncreasing (x':xs)
                         | otherwise = [x]
takeIncreasing xs = xs
This one is just a bit less clever than previous suggestions. I like un-clever code.
A solution without folds:
takeIncreasing :: Ord a => [a] -> [a]
takeIncreasing [] = []
takeIncreasing (x:xs) = (x :) . map snd . takeWhile (uncurry (<)) $ zip (x:xs) xs

Haskell: Removing duplicates tuples from a list?

I'm trying to get from the before to after state. Is there a convenient Haskell function for removing duplicate tuples from a list? Or perhaps it is something a bit more complicated such as iterating through the entire list?
Before: the list of tuples, sorted by word, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]
Searching for Eq a => [a] -> [a] on Hoogle returns the nub function:
The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means `essence'.)
As the documentation notes, the more general case is nubBy.
That said, this is an O(n^2) algorithm and may not be very efficient. An alternative would be to use Data.Set.fromList if the values are an instance of Ord type-class, as in:
import qualified Data.Set as Set
nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList
though this will not maintain the order of the original list.
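To make the ordering caveat concrete, here is a small sketch using the list from the question. The names nubSorted and before are made up for this example.

```haskell
import qualified Data.Set as Set

-- Removes duplicates, but returns the elements in ascending order.
nubSorted :: Ord a => [a] -> [a]
nubSorted = Set.toList . Set.fromList

-- The "before" list from the question.
before :: [(Int, String)]
before = [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
```

nubSorted before gives [(1,"a"),(1,"b"),(1,"c"),(2,"a"),(2,"dd")]: the duplicate is gone, but the tuples are now ordered by number first, not by word as in the input.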
A simple set style solution which maintains the order of the original list can be:
import Data.Set (Set, member, insert, empty)
nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
  where
    loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
    loop acc@(xs, obs) x
      | x `member` obs = acc
      | otherwise = (x:xs, x `insert` obs)
If you want to define a version of nub for Ord, I recommend using
import Data.Set (member, insert, empty)
nub' :: Ord a => [a] -> [a]
nub' xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs
To see what this is doing, you can get rid of the foldr:
nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
  where
    nub'' [] obs = obs `seq` []
    nub'' (y : ys) obs
      | y `member` obs = nub'' ys obs
      | otherwise = obs' `seq` y : nub'' ys obs'
      where obs' = y `insert` obs
One key point about this implementation, as opposed to behzad.nouri's, is that it produces elements lazily, as they are consumed. This is generally much better for cache utilization and garbage collection, as well as using a constant factor less memory than the reversing algorithm.

unique elements in a haskell list

okay, this is probably going to be in the prelude, but: is there a standard library function for finding the unique elements in a list? my (re)implementation, for clarification, is:
has :: (Eq a) => [a] -> a -> Bool
has [] _ = False
has (x:xs) a
| x == a = True
| otherwise = has xs a
unique :: (Eq a) => [a] -> [a]
unique [] = []
unique (x:xs)
| has xs x = unique xs
| otherwise = x : unique xs
I searched for (Eq a) => [a] -> [a] on Hoogle.
First result was nub (remove duplicate elements from a list).
Hoogle is awesome.
The nub function from Data.List (no, it's actually not in the Prelude) definitely does something like what you want, but it is not quite the same as your unique function. They both preserve the original order of the elements, but unique retains the last occurrence of each element, while nub retains the first occurrence.
You can do this to make nub act exactly like unique, if that's important (though I have a feeling it's not):
unique = reverse . nub . reverse
Also, nub is only good for small lists.
Its complexity is quadratic, so it starts to get slow if your list can contain hundreds of elements.
If you limit your types to types having an Ord instance, you can make it scale better.
This variation on nub still preserves the order of the list elements, but its complexity is O(n * log n):
import qualified Data.Set as Set
nubOrd :: Ord a => [a] -> [a]
nubOrd xs = go Set.empty xs where
  go s (x:xs)
    | x `Set.member` s = go s xs
    | otherwise = x : go (Set.insert x s) xs
  go _ _ = []
In fact, it has been proposed to add nubOrd to Data.Set.
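As far as I know, a function of this shape now ships with the containers package (0.6.0.1 and later) as nubOrd in Data.Containers.ListUtils, so with a reasonably recent GHC you may not need to write it yourself. A minimal sketch, assuming that module is available; firstOccurrences is just a made-up wrapper name:

```haskell
import Data.Containers.ListUtils (nubOrd)

-- Keeps the first occurrence of each element, preserving order.
firstOccurrences :: Ord a => [a] -> [a]
firstOccurrences = nubOrd
```

For example, firstOccurrences [4,2,1,3,2,3] gives [4,2,1,3], and since the result is produced lazily, take 3 (firstOccurrences (cycle [1,2,3])) terminates.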
import Data.Set (toList, fromList)
uniquify lst = toList $ fromList lst
I think that unique should return a list of elements that only appear once in the original list; that is, any elements of the original list that appear more than once should not be included in the result.
May I suggest an alternative definition, unique_alt:
unique_alt :: [Int] -> [Int]
unique_alt [] = []
unique_alt (x:xs)
  | elem x ( unique_alt xs ) = [ y | y <- ( unique_alt xs ), y /= x ]
  | otherwise = x : ( unique_alt xs )
Here are some examples that highlight the differences between unique_alt and unique:
unique [1,2,1] = [2,1]
unique_alt [1,2,1] = [2]
unique [1,2,1,2] = [1,2]
unique_alt [1,2,1,2] = []
unique [4,2,1,3,2,3] = [4,1,2,3]
unique_alt [4,2,1,3,2,3] = [4,1]
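One caveat: as written, unique_alt lets an element through when it occurs an odd number of times, for example unique_alt [1,1,1] evaluates to [1]. If "appears exactly once" is the goal, a counting approach is one way around that. The sketch below assumes an Ord constraint is acceptable, and uniqueOnce is a made-up name:

```haskell
import qualified Data.Map.Strict as Map

-- Elements that occur exactly once, in their original order.
uniqueOnce :: Ord a => [a] -> [a]
uniqueOnce xs = filter occursOnce xs
  where
    -- occurrence count per distinct element
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    occursOnce x = Map.lookup x counts == Just 1
```

It agrees with the three unique_alt examples above ([2], [] and [4,1]) and returns [] for [1,1,1].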
I think this would do it.
unique [] = []
unique (x:xs) = x:unique (filter ((/=) x) xs)
Another way to remove duplicates:
unique :: [Int] -> [Int]
unique xs = [x | (x,y) <- zip xs [0..], x `notElem` (take y xs)]
Algorithm in Haskell to create a unique list:
data Foo = Foo { id_ :: Int
               , name_ :: String
               } deriving (Show)
alldata = [ Foo 1 "Name"
          , Foo 2 "Name"
          , Foo 3 "Karl"
          , Foo 4 "Karl"
          , Foo 5 "Karl"
          , Foo 7 "Tim"
          , Foo 8 "Tim"
          , Foo 9 "Gaby"
          , Foo 9 "Name"
          ]
isolate :: [Foo] -> [Foo]
isolate [] = []
isolate (x:xs) = (fst f) : isolate (snd f)
  where
    f = foldl helper (x,[]) xs
    -- compare against the running best a, not the original head x
    helper (a,b) y = if name_ a == name_ y
                       then if id_ a >= id_ y
                              then (a,b)
                              else (y,b)
                       else (a,y:b)
main :: IO ()
main = mapM_ (putStrLn . show) (isolate alldata)
Output:
Foo {id_ = 9, name_ = "Name"}
Foo {id_ = 9, name_ = "Gaby"}
Foo {id_ = 5, name_ = "Karl"}
Foo {id_ = 8, name_ = "Tim"}
A library-based solution:
We can use that style of Haskell programming where all looping and recursion activities are pushed out of user code and into suitable library functions. Said library functions are often optimized in ways that are way beyond the skills of a Haskell beginner.
A way to decompose the problem into two passes goes like this:
produce a second list that is parallel to the input list, but with duplicate elements suitably marked
eliminate elements marked as duplicates from that second list
For the first step, duplicate elements don't need a value at all, so we can use [Maybe a] as the type of the second list. So we need a function of type:
pass1 :: Eq a => [a] -> [Maybe a]
Function pass1 is an example of stateful list traversal where the state is the list (or set) of distinct elements seen so far. For this sort of problem, the library provides the mapAccumL :: (s -> a -> (s, b)) -> s -> [a] -> (s, [b]) function.
Here the mapAccumL function requires, besides the initial state and the input list, a step function argument, of type s -> a -> (s, Maybe a).
If the current element x is not a duplicate, the output of the step function is Just x and x gets added to the current state. If x is a duplicate, the output of the step function is Nothing, and the state is passed unchanged.
Testing under the ghci interpreter:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
λ>
λ> import Data.List(mapAccumL)
λ>
λ> pass1 xs = mapAccumL stepFn [] xs
λ>
λ> xs2 = snd $ pass1 "abacrba"
λ> xs2
[Just 'a', Just 'b', Nothing, Just 'c', Just 'r', Nothing, Nothing]
λ>
Writing a pass2 function is even easier. To filter out Nothing non-values, we could use:
import Data.Maybe( fromJust, isJust)
pass2 = (map fromJust) . (filter isJust)
but why bother at all ? - as this is precisely what the catMaybes library function does.
λ>
λ> import Data.Maybe(catMaybes)
λ>
λ> catMaybes xs2
"abcr"
λ>
Putting it all together:
Overall, the source code can be written as:
import Data.Maybe(catMaybes)
import Data.List(mapAccumL)
uniques :: (Eq a) => [a] -> [a]
uniques = let stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
          in catMaybes . snd . mapAccumL stepFn []
This code is reasonably compatible with infinite lists, something occasionally referred to as being “laziness-friendly”:
λ>
λ> take 5 $ uniques $ "abacrba" ++ (cycle "abcrf")
"abcrf"
λ>
Efficiency note:
If we anticipate that it is possible to find many distinct elements in the input list and we can have an Ord a instance, the state can be implemented as a Set object rather than a plain list, this without having to alter the overall structure of the solution.
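A rough sketch of that Set-based variant, assuming an Ord instance is acceptable. The name uniquesOrd is made up here; only the state type and the membership test change relative to uniques.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import qualified Data.Set as Set

uniquesOrd :: Ord a => [a] -> [a]
uniquesOrd = catMaybes . snd . mapAccumL stepFn Set.empty
  where
    -- state: the set of distinct elements seen so far
    stepFn seen x
      | x `Set.member` seen = (seen, Nothing)
      | otherwise           = (Set.insert x seen, Just x)
```

It should behave like uniques on the earlier examples, e.g. uniquesOrd "abacrba" gives "abcr", while each membership test costs O(log n) instead of O(n).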
Here's a solution that uses only Prelude functions:
uniqueList theList =
  if not (null theList)
    then head theList : filter (/= head theList) (uniqueList (tail theList))
    else []
I'm assuming this is equivalent to running two nested "for" loops (running through each element, then running through the remaining elements to drop the ones with the same value), so I'd estimate this is O(n^2): there is one filter pass per element, and each pass is at most linear in the length of the list.
Might even be better than reversing a list, nubbing it, then reversing it again, depending on your circumstances.