nub not compiling when checking a list for duplicates

Working on a sudoku inspired assignment and I need to implement a function that checks if a Block Cell has no repeated elements in it (to check if its a valid solution to the puzzle).
okBlock :: Block Cell -> Bool
okBlock b = okList $ filter (/= Nothing) b
  where
    okList :: [a]-> Bool
    okList list
      | (length list) == (length (nub list)) = True
      | otherwise = False
Block a = [a]
Cell = [Maybe Int]
Haskell complains saying No instance for (Eq a) arising from a use of "==" Possible fix: add (Eq a) to the context of the type signature for okList...
Adding Eq a to the type signature does not help. I have tried the function in the terminal and it works fine for lists, and for lists of lists (i.e. the type I am feeding it in the function).
What am I missing here?

Well you can only filter out duplicates, if there is a way to check whether two values are duplicates. If we look at the type signature for nub, we see:
nub :: Eq a => [a] -> [a]
So that means that in order to filter out duplicates in a list of as, we need a to be an instance of the Eq class. We can thus simply forward the type constraint further in the signatures of the functions:
okBlock :: Block Cell -> Bool
okBlock b = okList $ filter (/= Nothing) b
  where
    okList :: Eq a => [a] -> Bool
    okList list
      | (length list) == (length (nub list)) = True
      | otherwise = False
We do not need to specify that Cell is an instance of Eq because:
Int is an instance of Eq;
if a is an instance of Eq, so is Maybe a, so Maybe Int is an instance of Eq; and
if a is an instance of Eq, so is [a], so [Maybe Int] is an instance of Eq.
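The instance chain above can be checked directly. The following is a minimal sketch; the sample values `cells` and `rows` are made up for illustration and do not come from the question.

```haskell
import Data.List (nub)

-- Hypothetical sample data, not taken from the question.
cells :: [Maybe Int]
cells = [Just 1, Nothing, Just 1, Just 2]

rows :: [[Maybe Int]]
rows = [[Just 1, Nothing], [Just 1, Nothing], [Just 2]]

-- nub needs Eq on the element type. Both Maybe Int and [Maybe Int]
-- obtain their Eq instance from the one for Int, so no extra
-- instance declarations are required.
uniqueCells :: [Maybe Int]
uniqueCells = nub cells

uniqueRows :: [[Maybe Int]]
uniqueRows = nub rows
```

Here `uniqueCells` keeps the first occurrence of each value, and `uniqueRows` does the same one level up, comparing whole rows.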
That being said we can do some syntactical improvements of the code:
there is no need to work with guards if you simply return the result of the guard True and False, and
you can use an eta reduction and omit the b in okBlock.
you don't need parentheses around function application (unless you feed the result straight to another, non-infix function).
This gives us:
okBlock :: Block Cell -> Bool
okBlock = okList . filter (/= Nothing)
  where
    okList :: Eq a => [a] -> Bool
    okList list = length list == length (nub list)
A final note is that usually you do not have to specify a type signature. In that case Haskell will aim to derive the most generic type signature. So you can write:
okBlock = okList . filter (/= Nothing)
  where
    okList list = length list == length (nub list)
Now okBlock will have type:
Prelude Data.List> :t okBlock
okBlock :: Eq a => [Maybe a] -> Bool

Three points that are too big to make in a comment.
nub is horribly slow
nub takes O(n^2) time to process a list of length n. Unless you know the list is very short, this is the wrong function to use to remove duplicates from a list. Adding a bit more information about what sort of thing you're working with allows more efficient nubbing. The simplest, and probably most general, approach that isn't absolutely wretched is to use an Ord constraint:
import qualified Data.Set as S
nubOrd :: Ord a => [a] -> [a]
nubOrd = go S.empty where
  go _seen [] = []
  go seen (a : as)
    | a `S.member` seen = go seen as
    | otherwise = a : go (S.insert a seen) as
length is wasteful
Suppose I write
sameLength :: [a] -> [b] -> Bool
sameLength xs ys = length xs == length ys
(which uses the approach you did). Now imagine I calculate
sameLength [1..16] [1..2^100]
How long will that take? Calculating length [1..16] will take nanoseconds. Calculating length [1..2^100] will probably take billions of years using current hardware. Whoops. What's the right way? Pattern match!
sameLength [] [] = True
sameLength (_ : xs) (_ : ys) = sameLength xs ys
sameLength _ _ = False
Nubbing isn't the right solution to this problem
Suppose I ask noDuplicates (1 : [1,2..]). Obviously, there's a duplicate, right at the beginning. But if I use sameLength and nub to check, I will never get an answer. It will keep building the nubbed list and comparing it to the original list until the seen set becomes so large it exhausts your computer's memory. How can you fix that? By directly calculating what you need:
noDuplicates = go S.empty where
  go _seen [] = True
  go seen (x : xs)
    | x `S.member` seen = False
    | otherwise = go (S.insert x seen) xs
Now the program will conclude that there's a duplicate the moment it sees the second 1.
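Applied to the original Sudoku question, the same idea might look like the sketch below. It assumes a cell is a `Maybe Int` with `Nothing` standing for an empty cell, which the question does not state unambiguously; `okBlockSet` is a name chosen here for illustration.

```haskell
import qualified Data.Set as S
import Data.Maybe (catMaybes)

-- Same early-exit check as above, repeated so this block is self-contained.
noDuplicates :: Ord a => [a] -> Bool
noDuplicates = go S.empty
  where
    go _seen [] = True
    go seen (x : xs)
      | x `S.member` seen = False
      | otherwise = go (S.insert x seen) xs

-- Assumption: empty cells are Nothing and are ignored by the check.
okBlockSet :: [Maybe Int] -> Bool
okBlockSet = noDuplicates . catMaybes
```

Using `catMaybes` instead of `filter (/= Nothing)` also unwraps the values, so the set only has to compare plain `Int`s.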

Related

Function to find number of occurrences in list

So I already have a function that finds the number of occurrences in a list using maps.
occur :: [a] -> Map a a
occur xs = fromListWith (+) [(x, 1) | x <- xs]
For example if a list [1,1,2,3,3] is inputted, the code will output [(1,2),(2,1),(3,2)], and for a list [1,2,1,1] the output would be [(1,3),(2,1)].
I was wondering if there's any way I can change this function to use foldr instead to eliminate the use of maps.
You can make use of foldr where the accumulator is a list of key-value pairs. At each "step" we check whether the list already contains a 2-tuple for the given element. If that is the case, we increment the corresponding value. If the item x does not yet exist, we add (x, 1) to that list.
Our function thus will look like:
occur :: Eq a => [a] -> [(a, Int)]
occur = foldr incMap []
where incMap thus takes an item x and a list of 2-tuples. We can make use of recursion here to update the "map" with:
incMap :: Eq a => a -> [(a, Int)] -> [(a, Int)]
incMap x = go
  where go [] = [(x, 1)]
        go (y2@(y, ny): ys)
          | x == y = … : ys
          | otherwise = y2 : …
where I leave implementing the … parts as an exercise.
This algorithm is not very efficient, since it takes O(n) to increment the map with n the number of 2-tuples in the map. You can also implement incrementing the Map for the given item by using insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a, which is more efficient.
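The insertWith variant mentioned above could be sketched as follows. The name `occurMap` is chosen here for illustration, and note that this version needs an `Ord` constraint rather than `Eq`.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Count occurrences by folding insertWith over the list:
-- a new key starts at 1, an existing key has 1 added to its count.
occurMap :: Ord a => [a] -> Map a Int
occurMap = foldr (\x -> Map.insertWith (+) x 1) Map.empty
```

Calling `Map.toList` on the result gives the same list of pairs as the list-based version, in ascending key order.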

Function to find the most frequent element

I am trying to code a function that returns the element that appears the most in a list. So far I have the following
task :: Eq a => [a] -> a
task xs = (map ((\l@(x:xs) -> (x,length l)) (occur (sort xs))))
occur is a function that takes a list and returns a list of pairs with the elements of the inputted list along with the amount of times they appear. So for example for a list [1,1,2,3,3] the output would be [(1,2),(2,1),(3,2)].
However, I am getting some errors related to the arguments of map. Can anyone tell me what I'm doing wrong?
A map maps every item to another item, so here \l is a 2-tuple, like (1,2), (2, 1) or (3, 2). It thus does not make much sense to work with length l, since length :: Foldable f => f a -> Int will always return one for a 2-tuple: this is because only the second part of the 2-tuple is used in the foldable. But we do not need length in the first place.
What you need is a function that can retrieve the maximum based on the second item of the 2-tuple. We can make use of the maximumOn :: Ord b => (a -> b) -> [a] -> a from the extra package, or we can implement our own function to calculate the maximum on a list of items.
Such function thus should look like:
maximumSnd :: Ord b => [(a, b)] -> (a, b)
maximumSnd [] = error "Empty list"
maximumSnd (x:xs) = go xs x
  where go [] m = m
        go (x@(xa, xb):xs) (ya, yb)
          | xb > yb = go … … -- (1)
          | otherwise = go … … -- (2)
Here (1) should be implemented such that we make a recursive call but work with x as the new maximum we found thus far. (2) should make a recursive call with the same thus far maximum.
Once we have implemented the maximumSnd function, we can use it as a helper function for:
task :: Eq a => [a] -> (a, Int)
task xs = maximumSnd (occur xs)
or we can use fst :: (a, b) -> a to retrieve the first item of the 2-tuple:
task :: Eq a => [a] -> a
task xs = (fst . maximumSnd) (occur xs)
In case two elements share the maximum number of occurrences, maximumSnd will return the first one in the list of occurrences.
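For comparison, here is a sketch that swaps in `maximumBy` from Data.List instead of a hand-written maximum. It is a different technique from the one above, not a solution to the exercise. `occurList` is a hypothetical stand-in for the asker's `occur`, and `mostFrequent` is a name chosen here. On ties, `maximumBy` may not pick the same element as maximumSnd.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical stand-in for the asker's occur: pairs each distinct
-- element with the number of times it appears.
occurList :: Eq a => [a] -> [(a, Int)]
occurList [] = []
occurList (x : xs) =
  (x, 1 + length (filter (== x) xs)) : occurList (filter (/= x) xs)

-- maximumBy is partial: it raises an error on an empty list.
mostFrequent :: Eq a => [a] -> a
mostFrequent = fst . maximumBy (comparing snd) . occurList
```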

Check if a list of lists has two or more identical elements

I need to write a function which checks if a list has two or more same elements and returns true or false.
For example [3,3,6,1] should return true, but [3,8] should return false.
Here is my code:
identical :: [Int] -> Bool
identical x = (\n-> filter (>= 2) n )( group x )
I know this is bad, and it does not work.
I wanted to group the list into list of lists, and if the length of a list is >= 2, then it is should return with true otherwise false.
Use any to get a Bool result.
any ( . . . ) ( group x )
Don’t forget to sort the list, group works on consecutive elements.
any ( . . . ) ( group ( sort x ) )
You can use (not . null . tail) for a predicate, as one of the options.
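Putting these hints together, one possible complete version is sketched below. It uses the predicate suggested above; other predicates would work equally well.

```haskell
import Data.List (group, sort)

-- Sort so that equal elements become adjacent, group them, and ask
-- whether any group has more than one element. group never produces
-- empty sublists, so tail is safe here.
identical :: [Int] -> Bool
identical = any (not . null . tail) . group . sort
```

Because `any` stops at the first group with a second element, the check is lazy in the grouped list, although `sort` still has to see the whole input.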
Just yesterday I posted a similar algorithm here. A possible way to go about it is,
generate the sequence of cumulative sets of elements
{}, {x0}, {x0,x1}, {x0,x1,x2} ...
pair the original sequence of elements with the cumulative sets
x0, x1 , x2 , x3 ...
{}, {x0}, {x0,x1}, {x0,x1,x2} ...
check repeated insertions, i.e.
xi such that xi ∈ {x0..xi-1}
This can be implemented for instance, via the functions below.
First we use scanl to iteratively add the elements of the list to a set, producing the cumulative sequence of these iterations.
sets :: [Int] -> [Set Int]
sets = scanl (\s x -> insert x s) empty
Then we zip the original list with this sequence, so each xi is paired with {x0...xi-1}.
elsets :: [Int] -> [(Int, Set Int)]
elsets xs = zip xs (sets xs)
Finally we use find to search for an element that is "about to be inserted" in a set which already contains it. The function find returns the pair element / set, and we pattern match to keep only the element, and return it.
result :: [Int] -> Maybe Int
result xs = do (x,_) <- find(\(y,s)->y `elem` s) (elsets xs)
               return x
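The three pieces can be assembled into one self-contained sketch with the imports they need. `firstRepeat` is a name chosen here, and the sketch swaps `elem` for `Set.member`, which looks the element up in logarithmic rather than linear time.

```haskell
import Data.List (find)
import Data.Set (Set)
import qualified Data.Set as Set

-- Cumulative sets: {}, {x0}, {x0,x1}, ...
sets :: [Int] -> [Set Int]
sets = scanl (flip Set.insert) Set.empty

-- Pair each element with the set of elements seen before it.
elsets :: [Int] -> [(Int, Set Int)]
elsets xs = zip xs (sets xs)

-- The first element that is already in the set of its predecessors.
firstRepeat :: [Int] -> Maybe Int
firstRepeat xs = fst <$> find (\(y, s) -> y `Set.member` s) (elsets xs)
```

Since `scanl`, `zip`, and `find` are all lazy, this also terminates on an infinite list as long as it contains a repeat.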
Another way to do this, using Data.Map as below, is not more efficient than the group . sort solution (it is still O(n log n)), but it is able to work with infinite lists.
import Data.Map.Lazy as Map (empty, lookup, insert)
identical :: [Int] -> Bool
identical = loop Map.empty
  where loop _ [] = False
        loop m (x:xs) = if Map.lookup x m == Nothing
                        then loop (insert x 0 m) xs
                        else True
OK basically this is one of the rare cases where you really need sort for efficiency. In fact Data.List.Unique package has a repeated function just for this job and if the source is checked one can see that sort and group strategy is chosen. I guess this is not the most efficient algorithm. I will come to how we can make sort even more efficient but for the time being let's enjoy a little since this is a nice question.
So we have the tails :: [a] -> [[a]] function in the Data.List module. Accordingly;
*Main> tails [3,3,6,1]
[[3,3,6,1],[3,6,1],[6,1],[1],[]]
As you may quickly notice, we can zipWith the tail of the tails list, which is [[3,6,1],[6,1],[1],[]], with the original list, applying a function that checks whether each item differs from everything after it. This function could be a list comprehension or simply the all :: Foldable t => (a -> Bool) -> t a -> Bool function. The thing is, I would like to short-circuit zipWith so that once I meet the first dupe, zipWith stops doing wasteful work checking the rest. For this purpose I can use the monadic version of zipWith, namely zipWithM :: Applicative m => (a -> b -> m c) -> [a] -> [b] -> m [c], which lives in the Control.Monad module. The reason is that, as its type signature suggests, it stops calculating any further when it encounters a Nothing or a Left in the middle, if my monad happens to be Maybe or Either.
Oh..! In Haskell I also love to use the bool :: a -> a -> Bool -> a function instead of if and then. bool is the ternary operation of Haskell which goes like
bool "work time" "coffee break" isCoffeeTime
The negative choice is on the left and the positive one is on the right, where isCoffeeTime :: Bool is a value that is True if it is coffee time. Very composable as well.. so cool..!
So since we now have all the background knowledge we may proceed with the code
import Control.Monad (zipWithM)
import Data.List (tails)
import Data.Bool (bool)
anyDupe :: Eq a => [a] -> Either a [a]
anyDupe xs = zipWithM f xs ts
  where ts = tail $ tails xs
        f = \x t -> bool (Left x) (Right x) $ all (x /=) t
*Main> anyDupe [1,2,3,4,5]
Right [1,2,3,4,5] -- no dupes so we get the `Right` with the original list
*Main> anyDupe [3,3,6,1]
Left 3 -- here we have the first duplicate since zipWithM short circuits.
*Main> anyDupe $ 10^7:[1..10^7]
Left 10000000 -- wow zipWithM worked and returned reasonably fast.
But again.. as i said, this is still a naive approach because theoretically we are doing n(n+1)/2 operations. Yes zipWithM cuts redundancy down greatly if the first met dupe is close to the head but still this algorithm is O(n^2).
I believe it would be best to use the heavenly sort algorithm of Haskell (which is not merge sort as we know it by the way) in this particular case.
Now the algorithm award goes to -> drum roll here -> sort and fold -> applause. Sorry no grouping.
So now... once again we will use a monadic trick to utilize short circuits. We will use foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b. This, when used with Either monad also allows us to return a more meaningful result. OK lets do it. Any Left n means n is the first dupe and no more calculations while any Right _ means there are no dupes.
import Control.Monad (foldM)
import Data.List (sort)
import Data.Bool (bool)
anyDupe' :: (Eq a, Ord a, Enum a) => [a] -> Either a a
anyDupe' xs = foldM f i $ sort xs
  where i = succ $ head xs -- prevent the initial value to be equal with the value at the head
        f = \b a -> bool (Left a) (Right a) (a /= b)
*Main> anyDupe' [1,2,3,4,5]
Right 5
*Main> anyDupe' [3,3,6,1]
Left 3
*Main> anyDupe' $ 1:[10^7,(10^7-1)..1]
Left 1
(2.97 secs, 1,040,110,448 bytes)
*Main> anyDupe $ 1:[10^7,(10^7-1)..1]
Left 1
(2.94 secs, 1,440,112,888 bytes)
*Main> anyDupe' $ [1..10^7]++[10^7]
Left 10000000
(5.71 secs, 3,600,116,808 bytes) -- winner by far
*Main> anyDupe $ [1..10^7]++[10^7] -- don't try at home, it's waste of energy
In real world scenarios anyDupe' should always be the winner.

Haskell function to keep the repeating elements of a list

Here is the expected input/output:
repeated "Mississippi" == "ips"
repeated [1,2,3,4,2,5,6,7,1] == [1,2]
repeated " " == " "
And here is my code so far:
repeated :: String -> String
repeated "" = ""
repeated x = group $ sort x
I know that the last part of the code doesn't work. I was thinking to sort the list then group it, then I wanted to make a filter on the list of list which are greater than 1, or something like that.
Your code already does half of the job
> group $ sort "Mississippi"
["M","iiii","pp","ssss"]
You said you want to filter out the non-duplicates. Let's define a predicate which identifies the lists having at least two elements:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
Using this:
> filter atLeastTwo . group $ sort "Mississippi"
["iiii","pp","ssss"]
Good. Now, we need to take only the first element from such lists. Since the lists are non-empty, we can use head safely:
> map head . filter atLeastTwo . group $ sort "Mississippi"
"ips"
Alternatively, we could replace the filter with filter (\xs -> length xs >= 2) but this would be less efficient.
Yet another option is to use a list comprehension
> [ x | (x:_y:_) <- group $ sort "Mississippi" ]
"ips"
This pattern matches on the lists starting with x and having at least another element _y, combining the filter with taking the head.
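Wrapped up as a function, the list-comprehension version might look like the sketch below. The `Ord a` constraint is an addition here so that it also covers the numeric example from the question.

```haskell
import Data.List (group, sort)

-- Keep one representative of every group that has at least two elements.
-- Groups with a single element fail the pattern match and are skipped.
repeated :: Ord a => [a] -> [a]
repeated xs = [x | (x : _ : _) <- group (sort xs)]
```

The result comes out in sorted order because the groups are produced from the sorted list.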
Okay, good start. One immediate problem is that the specification requires the function to work on lists of numbers, but you define it for strings. The list must be sorted, so its elements must have the typeclass Ord. Therefore, let’s fix the type signature:
repeated :: Ord a => [a] -> [a]
After calling sort and group, you will have a list of lists, [[a]]. Let’s take your idea of using filter. That works. Your predicate should, as you said, check the length of each list in the list, then compare that length to 1.
Filtering a list of lists gives you a subset, which is another list of lists, of type [[a]]. You need to flatten this list. What you want to do is map each entry in the list of lists to one of its elements. For example, the first. There’s a function in the Prelude to do that.
So, you might fill in the following skeleton:
module Repeated (repeated) where
import Data.List (group, sort)
repeated :: Ord a => [a] -> [a]
repeated = map _
         . filter (\x -> _)
         . group
         . sort
I’ve written this in point-free style with the filtering predicate as a lambda expression, but many other ways to write this are equally good. Find one that you like! (For example, you could also write the filter predicate in point-free style, as a composition of two functions: a comparison on the result of length.)
When you try to compile this, the compiler will tell you that there are two typed holes, the _ entries to the right of the equal signs. It will also tell you the type of the holes. The first hole needs a function that takes a list and gives you back a single element. The second hole needs a Boolean expression using x. Fill these in correctly, and your program will work.
Here's some other approaches, to evaluate @chepner's comment on the solution using group $ sort. (Those solutions look simpler, because some of the complexity is hidden in the library routines.)
"While it's true that sorting is O(n lg n), ..."
It's not just the sorting but especially the group: that uses span, and both of them build and destroy temporary lists. I.e. they do this:
"a linear traversal of an unsorted list will require some other data structure to keep track of all possible duplicates, and lookups in each will add to the space complexity at the very least. While carefully chosen data structures could be used to maintain an overall O(n) running time, the constant would probably make the algorithm slower in practice than the O(n lg n) solution, ..."
group/span adds considerably to that complexity, so O(n lg n) is not a correct measure.
"while greatly complicating the implementation."
The following all traverse the input list just once. Yes they build auxiliary lists. (Probably a Set would give better performance/quicker lookup.) They maybe look more complex, but to compare apples with apples look also at the code for group/span.
repeated2, repeated3, repeated4 :: Ord a => [a] -> [a]
repeated2/inserter2 builds an auxiliary list of pairs [(a, Bool)], in which the Bool is True if the a appears more than once, False if only once so far.
repeated2 xs = sort $ map fst $ filter snd $ foldr inserter2 [] xs
inserter2 :: Ord a => a -> [(a, Bool)] -> [(a, Bool)]
inserter2 x [] = [(x, False)]
inserter2 x (xb@(x', _): xs)
  | x == x' = (x', True): xs
  | otherwise = xb: inserter2 x xs
repeated3/inserter3 builds an auxiliary list of pairs [(a, Int)], in which the Int counts how many of the a appear. The aux list is sorted anyway, just for the heck of it.
repeated3 xs = map fst $ filter ((> 1).snd) $ foldr inserter3 [] xs
inserter3 :: Ord a => a -> [(a, Int)] -> [(a, Int)]
inserter3 x [] = [(x, 1)]
inserter3 x xss@(xc@(x', c): xs) = case x `compare` x' of
  { LT -> ((x, 1): xss)
  ; EQ -> ((x', c+1): xs)
  ; GT -> (xc: inserter3 x xs)
  }
repeated4/go4 builds an output list of elements known to repeat. It maintains an intermediate list of elements met once (so far) as it traverses the input list. If it meets a repeat: it adds that element to the output list; deletes it from the intermediate list; filters that element out of the tail of the input list.
repeated4 xs = sort $ go4 [] [] xs
go4 :: Ord a => [a] -> [a] -> [a] -> [a]
go4 repeats _ [] = repeats
go4 repeats onces (x: xs) = case findUpd x onces of
  { (True, oncesU) -> go4 (x: repeats) oncesU (filter (/= x) xs)
  ; (False, oncesU) -> go4 repeats oncesU xs
  }
findUpd :: Ord a => a -> [a] -> (Bool, [a])
findUpd x [] = (False, [x])
findUpd x (x': os) | x == x' = (True, os) -- i.e. x' removed
                   | otherwise =
                       let (b, os') = findUpd x os in (b, x': os')
(That last bit of list-fiddling in findUpd is very similar to span.)
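The remark that a Set-like structure would probably give quicker lookups could be sketched with Data.Map from the containers package. This is a swapped-in technique for comparison, not part of the approaches above, and `repeatedMap` is a name chosen here.

```haskell
import qualified Data.Map.Strict as Map

-- Count every element in one pass, then keep the keys seen more than once.
-- Map.keys returns the keys in ascending order, so no final sort is needed.
repeatedMap :: Ord a => [a] -> [a]
repeatedMap =
  Map.keys . Map.filter (> 1) . Map.fromListWith (+) . map (\x -> (x, 1 :: Int))
```

Each insertion costs a logarithmic lookup, so the whole pass is O(n log n), the same bound as sorting, but without the intermediate lists that group and span build.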

How to compare elements in a [[]]?

I am dealing with small program with Haskell. Probably the answer is really simple but I try and get no result.
So one of the part in my program is the list:
first = [(3,3),(4,6),(7,7),(5,43),(9,9),(32,1),(43,43) ..]
and according to that list I want to make new one with element that are equal in the () =:
result = [3,7,9,43, ..]
Even though you appear to have not made the most minimal amount of effort to solve this question by yourself, I will give you the answer because it is so trivial and because Haskell is a great language.
Create a function with this signature:
findIdentical :: [(Int, Int)] -> [Int]
It takes a list of tuples and returns a list of ints.
Implement it like this:
findIdentical [] = []
findIdentical ((a,b) : xs)
  | a == b = a : (findIdentical xs)
  | otherwise = findIdentical xs
As you can see, findIdentical is a recursive function that compares a tuple for equality between both items, and then adds it to the result list if there is found equality.
You can do this for instance with list comprehension. We iterate over every tuple (f,s) in first, so we write (f,s) <- first in the right side of the list comprehension, and need to filter on the fact that f and s are equal, so f == s. In that case we add f (or s) to the result. So:
result = [ f | (f,s) <- first, f == s ]
We can turn this into a function that takes as input a list of 2-tuples [(a,a)], and compares these two elements, and returns a list [a]:
f :: Eq a => [(a,a)] -> [a]
f dat = [f | (f,s) <- dat, f == s ]
An easy way to do this is to use the Prelude's filter function, which has the type definition:
filter :: (a -> Bool) -> [a] -> [a]
All you need to do is supply predicate on how to filter the elements in the list, and the list to filter. You can accomplish this easily below:
filterList :: (Eq a) => [(a, a)] -> [a]
filterList xs = [x | (x, y) <- filter (\(a, b) -> a == b) xs]
Which behaves as expected:
*Main> filterList [(3,3),(4,6),(7,7),(5,43),(9,9),(32,1),(43,43)]
[3,7,9,43]