Remove duplicate elements from a list in Erlang

How can I remove the duplicates from a list in Erlang?
Suppose I have a list like:
[1,1,2,3,4,5,5,6]
How can I get:
[1,2,3,4,5,6]

You could use sets, for example:
my_nonDuplicate_list1() ->
    List = [1,1,2,3,4,5,5,6],
    Set = sets:from_list(List),
    sets:to_list(Set).
This returns [1,2,3,4,5,6] with no more duplicates, but most likely not sorted, because a set does not keep its elements in any particular order.
Another possibility, without using sets, would be:
my_nonDuplicate_list2() ->
    List = [1,1,2,3,4,5,5,6],
    lists:usort(List).
In this case it returns [1,2,3,4,5,6], with no duplicates and sorted.

And for those looking to preserve the order of the list:
remove_dups([]) -> [];
remove_dups([H|T]) -> [H | [X || X <- remove_dups(T), X /= H]].

A possible solution that preserves the order of the elements, and that may help you learn how to manipulate lists, uses two functions:
delete_all(Item, [Item | Rest_of_list]) ->
    delete_all(Item, Rest_of_list);
delete_all(Item, [Another_item | Rest_of_list]) ->
    [Another_item | delete_all(Item, Rest_of_list)];
delete_all(_, []) -> [].
remove_duplicates(List) -> removing(List, []).
removing([], This) -> lists:reverse(This);
removing([A | Tail], Acc) ->
    removing(delete_all(A, Tail), [A | Acc]).
To test it:
Eshell V5.9 (abort with ^G)
1> mymod:remove_duplicates([1,2,3,1,2,4,1,2,1]).
[1,2,3,4]
2>

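As a first attempt at preserving order I would do something like this, though it is not recommended. Remember that AddedStuff ++ Accumulator is fine, but Accumulator ++ AddedStuff is really bad: ++ copies its whole left operand, so appending to a growing accumulator costs time proportional to the accumulator's length on every step.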
rm_dup(List) ->
    lists:foldl(
        fun(Elem, Acc) ->
            case lists:member(Elem, Acc) of
                true ->
                    Acc;
                false ->
                    Acc ++ [Elem]
            end
        end, [], List
    ).
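This version is much more efficient if you want to preserve order: it prepends each new element and reverses the accumulator only once at the end (lists:member/2 still scans the accumulator, so the worst case remains quadratic):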
rm_dup(List) ->
    lists:reverse(lists:foldl(
        fun(Elem, Acc) ->
            case lists:member(Elem, Acc) of
                true ->
                    Acc;
                false ->
                    [Elem | Acc]
            end
        end, [], List
    )).

In my opinion, the best option is to use lists:usort/1.
But in case you don't want to use library functions such as lists:usort/1, and you want the list to be sorted, I suggest a version of quicksort: with this implementation you get the list sorted and without duplicate values.
unique_sort([]) -> [];
unique_sort([Pivot | T]) ->
    % elements equal to Pivot are in neither partition, which drops duplicates
    unique_sort([X || X <- T, X < Pivot]) ++
    [Pivot] ++
    unique_sort([X || X <- T, X > Pivot]).

The sets module has two functions that can be composed to do the job efficiently: sets:from_list/1 returns a set with all the elements of a list (with no duplicated elements, by definition), and sets:to_list/1 returns a list with the elements of a set. Here is an example of use:
4> sets:to_list(sets:from_list([1,1,2,3,4,5,5,6])).
[3,6,2,5,1,4]
We could define the function as
nub(L) -> sets:to_list(sets:from_list(L)).

Related

Haskell function to keep the repeating elements of a list

Here is the expected input/output:
repeated "Mississippi" == "ips"
repeated [1,2,3,4,2,5,6,7,1] == [1,2]
repeated " " == " "
And here is my code so far:
repeated :: String -> String
repeated "" = ""
repeated x = group $ sort x
I know that the last part of the code doesn't work. I was thinking of sorting the list, then grouping it, and then filtering the resulting list of lists to keep only the lists whose length is greater than 1, or something like that.
Your code already does half of the job:
> group $ sort "Mississippi"
["M","iiii","pp","ssss"]
You said you want to filter out the non-duplicates. Let's define a predicate which identifies the lists having at least two elements:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
Using this:
> filter atLeastTwo . group $ sort "Mississippi"
["iiii","pp","ssss"]
Good. Now, we need to take only the first element from such lists. Since the lists are non-empty, we can use head safely:
> map head . filter atLeastTwo . group $ sort "Mississippi"
"ips"
Alternatively, we could replace the filter with filter (\xs -> length xs >= 2), but this would be less efficient: length walks the whole group, while atLeastTwo only inspects its first two cells.
Yet another option is to use a list comprehension:
> [ x | (x:_y:_) <- group $ sort "Mississippi" ]
"ips"
This pattern matches on the lists starting with x and having at least another element _y, combining the filter with taking the head.
Okay, good start. One immediate problem is that the specification requires the function to work on lists of numbers, but you define it for strings. The list must be sorted, so its elements must belong to the Ord typeclass. Therefore, let’s fix the type signature:
repeated :: Ord a => [a] -> [a]
After calling sort and group, you will have a list of lists, [[a]]. Let’s take your idea of using filter. That works. Your predicate should, as you said, check the length of each list in the list, then compare that length to 1.
Filtering a list of lists gives you a subset, which is another list of lists, of type [[a]]. You need to turn this back into a flat list of type [a]: map each entry in the list of lists to one of its elements, for example the first. There’s a function in the Prelude that does exactly that.
So, you might fill in the following skeleton:
module Repeated (repeated) where
import Data.List (group, sort)
repeated :: Ord a => [a] -> [a]
repeated = map _
         . filter (\x -> _)
         . group
         . sort
I’ve written this in point-free style with the filtering predicate as a lambda expression, but many other ways to write this are equally good. Find one that you like! (For example, you could also write the filter predicate in point-free style, as a composition of two functions: a comparison on the result of length.)
When you try to compile this, the compiler will tell you that there are two typed holes, the _ entries to the right of the equals sign. It will also tell you the type of each hole. The first hole needs a function that takes a list and gives you back a single element. The second hole needs a Boolean expression using x. Fill these in correctly, and your program will work.
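For reference, here is one possible way to fill in both holes, assuming head for the first hole and a length comparison for the second; it is a sketch, and other fillings (such as atLeastTwo from the previous answer) work just as well. The module header is omitted so the snippet stands on its own:

```haskell
import Data.List (group, sort)

-- Keep one copy of every element that occurs more than once, in sorted order.
repeated :: Ord a => [a] -> [a]
repeated = map head                      -- every group produced by group is non-empty
         . filter (\x -> length x > 1)   -- keep only groups with at least two elements
         . group                         -- runs of equal, adjacent elements
         . sort                          -- bring equal elements next to each other
```

With this, repeated "Mississippi" gives "ips" and repeated [1,2,3,4,2,5,6,7,1] gives [1,2].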
Here are some other approaches, to evaluate @chepner's comment on the solution using group $ sort. (Those solutions look simpler because some of the complexity is hidden in the library routines.)
@chepner: "While it's true that sorting is O(n lg n), ..."
It's not just the sorting but especially the group: that uses span, and both of them build and destroy temporary lists. That is, they do this:
@chepner: "a linear traversal of an unsorted list will require some other data structure to keep track of all possible duplicates, and lookups in each will add to the space complexity at the very least. While carefully chosen data structures could be used to maintain an overall O(n) running time, the constant would probably make the algorithm slower in practice than the O(n lg n) solution, ..."
group/span add considerably to that complexity, so O(n lg n) is not a correct measure.
@chepner: "while greatly complicating the implementation."
The following functions all traverse the input list just once. Yes, they build auxiliary lists (a Set would probably give better performance and quicker lookup). They may look more complex, but to compare like with like, also look at the code for group/span.
repeated2, repeated3, repeated4 :: Ord a => [a] -> [a]
repeated2/inserter2 builds an auxiliary list of pairs [(a, Bool)], in which the Bool is True if the a appears more than once, False if only once so far.
repeated2 xs = sort $ map fst $ filter snd $ foldr inserter2 [] xs
inserter2 :: Ord a => a -> [(a, Bool)] -> [(a, Bool)]
inserter2 x [] = [(x, False)]
inserter2 x (xb@(x', _) : xs)
    | x == x'   = (x', True) : xs
    | otherwise = xb : inserter2 x xs
repeated3/inserter3 builds an auxiliary list of pairs [(a, Int)], in which the Int counts how many of the a appear. The aux list is sorted anyway, just for the heck of it.
repeated3 xs = map fst $ filter ((> 1).snd) $ foldr inserter3 [] xs
inserter3 :: Ord a => a -> [(a, Int)] -> [(a, Int)]
inserter3 x [] = [(x, 1)]
inserter3 x xss@(xc@(x', c) : xs) = case x `compare` x' of
    { LT -> ((x, 1) : xss)
    ; EQ -> ((x', c+1) : xs)
    ; GT -> (xc : inserter3 x xs)
    }
repeated4/go4 builds an output list of elements known to repeat. It maintains an intermediate list of elements met once (so far) as it traverses the input list. If it meets a repeat: it adds that element to the output list; deletes it from the intermediate list; filters that element out of the tail of the input list.
repeated4 xs = sort $ go4 [] [] xs
go4 :: Ord a => [a] -> [a] -> [a] -> [a]
go4 repeats _ [] = repeats
go4 repeats onces (x: xs) = case findUpd x onces of
    { (True, oncesU) -> go4 (x: repeats) oncesU (filter (/= x) xs)
    ; (False, oncesU) -> go4 repeats oncesU xs
    }
findUpd :: Ord a => a -> [a] -> (Bool, [a])
findUpd x [] = (False, [x])
findUpd x (x': os) | x == x' = (True, os) -- i.e. x' removed
                   | otherwise =
                       let (b, os') = findUpd x os in (b, x': os')
(That last bit of list-fiddling in findUpd is very similar to span.)
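Following up on the remark that a Set would probably give quicker lookup, here is a sketch that is not part of the original answer. It counts occurrences with Data.Map.Strict from the containers package (which ships with GHC), playing the same role as repeated3's counting list but with O(log n) inserts and lookups. The name repeatedMap is made up for this example:

```haskell
import qualified Data.Map.Strict as Map

-- Elements that occur more than once, each reported once.
-- Map.keys returns keys in ascending order, so the result comes out sorted.
repeatedMap :: Ord a => [a] -> [a]
repeatedMap xs = Map.keys (Map.filter (> 1) counts)
  where
    -- fromListWith (+) adds up the 1s for equal keys: one count per element
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
```

Like repeated3, this keeps a running count per element; the difference is only the data structure holding the counts.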

Haskell - Removing adjacent duplicates from a list

I'm trying to learn Haskell by solving some online problems and training exercises.
Right now I'm trying to make a function that'd remove adjacent duplicates from a list.
Sample Input
"acvvca"
"1456776541"
"abbac"
"aabaabckllm"
Expected Output
""
""
"c"
"ckm"
My first thought was to make a function that would simply remove the first instance of adjacent duplicates and return the resulting list.
module Test where
removeAdjDups :: (Eq a) => [a] -> [a]
removeAdjDups [] = []
removeAdjDups [x] = [x]
removeAdjDups (x : y : ys)
    | x == y    = removeAdjDups ys
    | otherwise = x : removeAdjDups (y : ys)
*Test> removeAdjDups "1233213443"
"122133"
This function only removes the pairs it finds in a single pass.
So now I need to apply the same function to its own result.
I think foldl can help with that, but I don't know how I'd go about implementing it.
Something along the lines of
removeAdjDups' xs = foldl (\acc x -> removeAdjDups x acc) xs
Also is this approach the best way to implement the solution or is there a better way I should be thinking of?
Work in last-first order: first remove duplicates from the tail, then check whether the head of the input equals the head of the tail's result (which, by this point, won't have any duplicates, so the only possible pair is the head of the input vs. the head of the tail's result):
main = mapM_ (print . squeeze) ["acvvca", "1456776541", "abbac", "aabaabckllm"]
squeeze :: Eq a => [a] -> [a]
squeeze (x:xs) = let ys = squeeze xs in case ys of
    (y:ys') | x == y -> ys'
    _ -> x:ys
squeeze _ = []
Outputs
""
""
"c"
"ckm"
I don't see how foldl could be used for this. (Generally, foldl pretty much combines the disadvantages of foldr and foldl'... those, or foldMap, are the folds you should normally be using, not foldl.)
What you seem to intend is: repeating the removeAdjDups, until no duplicates are found anymore. The repetition is a job for
iterate :: (a -> a) -> a -> [a]
like
Prelude> iterate removeAdjDups "1233213443"
["1233213443","122133","11","","","","","","","","","","","","","","","","","","","","","","","","","","",""...
This is an infinite list of ever reduced lists. Generally, it will not converge to the empty list; you'll want to add some termination condition. If you want to remove as many dups as necessary, that's the fixpoint; it can be found in a very similar way to how you implemented removeAdjDups: compare neighbor elements, just this time in the list of reductions.
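The fixpoint idea can be sketched as follows. fixpoint and removeAll are helper names made up here: fixpoint walks the list produced by iterate and returns the first reduction that equals its successor. It is still the repeated-pass approach, so every pass re-traverses the list:

```haskell
-- The question's single-pass function, unchanged apart from layout.
removeAdjDups :: Eq a => [a] -> [a]
removeAdjDups [] = []
removeAdjDups [x] = [x]
removeAdjDups (x : y : ys)
  | x == y    = removeAdjDups ys
  | otherwise = x : removeAdjDups (y : ys)

-- Hypothetical helper: apply f until the result stops changing, by comparing
-- neighbouring elements of the infinite list of reductions from iterate.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f = go . iterate f
  where
    go (a : b : rest)
      | a == b    = a
      | otherwise = go (b : rest)
    go _ = error "fixpoint: iterate always produces an infinite list"

-- Hypothetical name: keep applying removeAdjDups until nothing changes.
removeAll :: Eq a => [a] -> [a]
removeAll = fixpoint removeAdjDups
```

For "1233213443" the reductions are "122133", "11", "", "", ..., so removeAll stops at "".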
bipll's suggestion to handle recursive duplicates is much better though, it avoids unnecessary comparisons and traversing the start of the list over and over.
List comprehensions are often overlooked. They are, of course, syntactic sugar, but some of us, like me, are addicted. First off, strings are already lists. This function handles any string, including single characters and the empty string. You can use map to process many lists in a list.
(\l -> [ x | (x,y) <- zip l $ (tail l) ++ " ", x /= y]) "abcddeeffa"
"abcdefa"
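Note that this comprehension keeps one element of each run (as "abcddeeffa" becoming "abcdefa" shows) rather than deleting the pairs as the question asks, and the " " sentinel ties it to strings (it would also drop a trailing space). If collapsing runs is what you want, a sketch that works for any Eq type uses group from Data.List; collapseRuns is a made-up name:

```haskell
import Data.List (group)

-- Collapse each run of equal adjacent elements into a single element.
-- group never produces empty sublists, so head is safe here.
collapseRuns :: Eq a => [a] -> [a]
collapseRuns = map head . group
```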
I don't see how to use foldl either. Perhaps that's because, if you want to fold something here, you have to use foldr.
main = mapM_ (print . squeeze) ["acvvca", "1456776541", "abbac", "aabaabckllm"]
-- I like the name in @bipll's answer
squeeze = foldr (\ x xs -> if xs /= "" && x == head(xs) then tail(xs) else x:xs) ""
Let's analyze this. The idea is taken from @bipll's answer: go from right to left. If f is the lambda function, then by the definition of foldr:
squeeze "abbac" = f 'a' (f 'b' (f 'b' (f 'a' (f 'c' ""))))
By the definition of f, f 'c' "" = 'c':"" = "c", since xs == "". Next character from the right: f 'a' "c" = 'a':"c" = "ac", since 'a' /= head "c" = 'c'. f 'b' "ac" = "bac" for the same reason. But f 'b' "bac" = tail "bac" = "ac", because 'b' == head "bac". Finally, f 'a' "ac" = tail "ac" = "c", because 'a' == head "ac".
Bonus: by replacing foldr with scanr, you can see the whole process:
Prelude> squeeze' = scanr (\ x xs -> if xs /= "" && x == head(xs) then tail(xs) else x:xs) ""
Prelude> zip "abbac" (squeeze' "abbac")
[('a',"c"),('b',"ac"),('b',"bac"),('a',"ac"),('c',"c")]
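The same right-to-left fold can also be written with pattern matching instead of head, tail and the xs /= "" test, which makes it work for any Eq type rather than only String. This is a sketch of that variation; squeezeAny is a made-up name:

```haskell
-- Right fold: step sees each element together with the already-squeezed tail.
squeezeAny :: Eq a => [a] -> [a]
squeezeAny = foldr step []
  where
    -- x cancels against the head of the squeezed tail when they are equal
    step x (y : ys) | x == y = ys
    -- otherwise (including an empty tail) x is simply kept
    step x ys = x : ys
```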

How can I find the index where one list appears as a sublist of another?

I have been working with Haskell for a little over a week now so I am practicing some functions that might be useful for something. I want to compare two lists recursively. When the first list appears in the second list, I simply want to return the index at where the list starts to match. The index would begin at 0. Here is an example of what I want to execute for clarification:
subList [1,2,3] [4,4,1,2,3,5,6]
the result should be 2
I have attempted to code it:
subList :: [a] -> [a] -> a
subList [] = []
subList (x:xs) = x + 1 (subList xs)
subList xs = [ y:zs | (y,ys) <- select xs, zs <- subList ys]
where select [] = []
select (x:xs) = x
I am receiving an "error on input" and I cannot figure out why my syntax is not working. Any suggestions?
Let's first look at the function signature. You want to take in two lists whose contents can be compared for equality and return an index like so
subList :: Eq a => [a] -> [a] -> Int
So now we go through pattern matching on the arguments. First off, when the second list is empty there is nothing we can do, so we'll return -1 as an error condition:
subList _ [] = -1
Then we look at the recursive step
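subList as xxs@(x:xs)
    -- as must fit in the rest of the list; zip alone would accept a truncated match
    | length as <= length xxs && all (uncurry (==)) (zip as xxs) = 0
    | otherwise = case subList as xs of
        -1 -> -1    -- not found further on: propagate the error value
        n  -> 1 + n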
You should be familiar with the guard syntax I've used, although you may not be familiar with the @ syntax (an as-pattern). It lets xxs name the whole list while the pattern (x:xs) also binds its head and tail.
You may not be familiar with all, uncurry, and possibly zip, so let me elaborate on those. zip has the type zip :: [a] -> [b] -> [(a,b)], so it takes two lists and pairs up their elements (if one list is longer than the other, the excess is simply dropped). uncurry is a bit unusual, so let's just look at (uncurry (==)): its type is Eq a => (a, a) -> Bool, and it checks whether the first and second elements of a pair are equal. Finally, all (uncurry (==)) walks over the list of pairs and returns True only if every pair has equal components.
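If you'd rather not use -1 as an error value, here is a sketch that returns Maybe Int instead, built from findIndex, isPrefixOf and tails in Data.List. subListIndex is a made-up name: tails lists every suffix of the second list in order, so the index of the first suffix that starts with the first list is the answer:

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- Index where needle first appears inside haystack, or Nothing if it never does.
subListIndex :: Eq a => [a] -> [a] -> Maybe Int
subListIndex needle haystack = findIndex (needle `isPrefixOf`) (tails haystack)
```

For example, subListIndex [1,2,3] [4,4,1,2,3,5,6] gives Just 2. If you only need a yes/no answer, Data.List also provides isInfixOf.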

Haskell- looping every second element of list

I want to be able to loop over every second element of a given list. I can do this recursively like so:
check validate (x:xs) = check (validate x) (tail xs)
But the problem is that I need a function that accepts a list as a parameter and returns a list consisting of only every second element of the list, starting with (and including) the first element, and I do not think this is possible recursively.
Can someone show me how to do this using a list comprehension? This would probably be the best approach.
second (x:y:xs) = y : second xs
second _ = []
A list comprehension may not be the most useful tool here.
You can also try mutual recursion:
first [] = []
first (x:xs) = x:second xs
second [] = []
second (x:xs) = first xs
For example:
> first [1..10]
[1,3,5,7,9]
> second [1..10]
[2,4,6,8,10]
One of the Haskellish approaches would be something with map, filter, and zip.
second xs = map fst $ filter (odd . snd) $ zip xs [1..]
If you really wanted to use list comprehension, you could use the parallel list comprehension extension.
{-# LANGUAGE ParallelListComp #-}
second xs = [ x | (x, n) <- [ (x, n) | x <- xs | n <- [1..] ], odd n ]
I think the former is more concise, though.
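If you want a list comprehension without enabling any extension, zipping with positions works inside an ordinary comprehension too. This is a sketch; everyOther is a made-up name, and it keeps odd 1-based positions, so it starts with (and includes) the first element as the question asks:

```haskell
-- Every second element, starting with the first one.
-- Each element is paired with its 1-based position; odd positions are kept.
everyOther :: [a] -> [a]
everyOther xs = [x | (x, n) <- zip xs [(1 :: Int) ..], odd n]
```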

erlang list filter question

I have a list, Sep1:
[
....
["Message-ID", "AAAAAAAAAAAAAAAAAAA"],
["To", "BBBBBBBBBBBBBBBBB"]
...
]
I am trying to get the elements whose first item is "Message-ID", for example:
lists:filter(fun(Y) -> (lists:nth(1,lists:nth(1,Y)) =:= "Message-ID") end, Sep1).
But I get an error:
exception error: no function clause matching lists:nth(1,[])
in function utils:'-parse_to/1-fun-1-'/1
in call from lists:'-filter/2-lc$^0/1-0-'/2
But if I do:
io:format(lists:nth(1,lists:nth(1,Sep1))).
> Message-ID
What's wrong?
Thank you.
It's better to change representation to [{Key, Value}, ...] so you can use lists:key* functions, proplists module, or convert it to dict with dict:from_list/1.
But if you still want to use lists:filter/2, you can filter a list of lists by first element as follows (the catch-all clause returns false for empty inner lists instead of crashing with a function_clause error):
lists:filter(fun ([K | _]) -> K =:= "Message-ID"; (_) -> false end, ListOfLists).
If you want to extract the tails of the lists whose first element matches "Message-ID", you can use a list comprehension:
[Tail || ["Message-ID" | Tail] <- ListOfLists].
Why do you use two nested lists:nth calls?
lists:filter(fun(Y) -> lists:nth(1, Y) =:= "Message-ID" end, Sep1) works for me and returns a list containing the elements you want (lists where the first element is "Message-ID"). Just pattern match on that list to get the element you want, e.g. if you want only one such element you can do:
case lists:filter(fun(Y) -> lists:nth(1, Y) =:= "Message-ID" end, Sep1) of
    [Result] -> Result;   % do something with it
    [] -> not_found       % no such element found (not_found is a placeholder atom)
end
What you probably want is this:
[B || [A,B|_] <- L, A =:= "Message-ID"].
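This does not assume any particular length for the nested lists: inner lists with fewer than two elements simply fail the generator pattern and are skipped.
It returns a list of the second elements of all inner lists whose first element is "Message-ID".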
If you are sure there is only one "Message-ID" and want to throw an error otherwise:
[X] = [B || [A,B|_] <- L, A =:= "Message-ID"].
If you only want the first one (still throwing error when there is none):
[X|_] = [B || [A,B|_] <- L, A =:= "Message-ID"].
To understand what this code does, I recommend reading the official Erlang documentation on list comprehensions and the Learn You Some Erlang chapter on the same topic, "List Comprehensions".
Assuming that every element of your list has exactly two elements, you could use a list comprehension like this:
1> L = [["Message-ID","AAAAAAAA"],["To","BBBBBBBBBBB"]].
[["Message-ID","AAAAAAAA"],["To","BBBBBBBBBBB"]]
2> [[A,B]||[A,B] <- L, A =:= "Message-ID"].
[["Message-ID","AAAAAAAA"]]
Hope this helps.
You could write your own filter (which doesn't care how many elements each inner list has):
filter(List) -> filter(List, []).
filter([], Acc) -> lists:reverse(Acc);
filter([[] | Tail], Acc) -> filter(Tail, Acc);
filter([[H | T] | Tail], Acc) ->
    case H =:= "Message-ID" of
        true -> filter(Tail, [[H | T] | Acc]);
        _ -> filter(Tail, Acc)
    end.