How to divide a 2D list by last element in Haskell?

I am trying to group the lists in a 2D list based on the last element of each list in Haskell. Like this:
[[0,1],[2,2],[0,2],[1,1]]
would either become a 3d list like this:
[[[0,1],[1,1]],[[0,2],[2,2]]]
or would separate the data into n number of categories using any data structures.
Specifically, I'm trying to implement the separateByClass method in this tutorial http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/

The goal is to convert
def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

dataset = [[1,20,1], [2,21,0], [3,22,1]]
separated = separateByClass(dataset)
print('Separated instances: {0}'.format(separated))
which has output
Separated instances: {0: [[2, 21, 0]], 1: [[1, 20, 1], [3, 22, 1]]}
to Haskell.
This is the perfect use case for Data.Map's fromListWith :: Ord k => (a -> a -> a) -> [(k, a)] -> Map k a, which takes a list of key-value pairs and a strategy for combining values when two pairs happen to share the same key.
λ> import Data.Maybe
λ> import Data.Map
λ> let dataset = [[1,20,1],[2,21,0],[3,22,1]]
λ> let last_ = listToMaybe . reverse
λ> let pairs = [(last_ x, [x]) | x <- dataset]
λ> fromListWith (\a b -> b) pairs
fromList [(Just 0,[[2,21,0]]),(Just 1,[[1,20,1]])]
λ> fromListWith (++) [(last_ x, x) | x <- dataset]
fromList [(Just 0,[2,21,0]),(Just 1,[3,22,1,1,20,1])]
λ> fromListWith (++) pairs
fromList [(Just 0,[[2,21,0]]),(Just 1,[[3,22,1],[1,20,1]])]
Great job, Haskell.
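As a compiled-module counterpart to the GHCi session, here is a minimal sketch of the same fromListWith idea. The name separateByClass is borrowed from the Python tutorial and the helper lastMaybe is made up for illustration. It assumes that rows without a last element (empty rows) can simply be dropped, and it feeds the rows in reverse so that each group keeps the input order.

```haskell
import qualified Data.Map.Strict as M
import Data.Maybe (listToMaybe)

-- Hypothetical name, mirroring the Python tutorial's function.
-- Rows are keyed by their last element; empty rows have no key and are dropped.
separateByClass :: Ord a => [[a]] -> M.Map a [[a]]
separateByClass rows =
  M.fromListWith (++)
    [ (k, [row]) | row <- reverse rows, Just k <- [lastMaybe row] ]
  where
    -- Safe variant of `last`: Nothing for an empty list.
    lastMaybe = listToMaybe . reverse
```

Because fromListWith prepends each newly seen value, walking the reversed input leaves the rows of every class in their original order, which matches the Python output shown in the question.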

import Data.List(groupBy, sortBy)
import Data.Ord(compare)
groupBy (\x y -> x!!1==y!!1) $ sortBy (\x y -> compare (x!!1) (y!!1)) [[0,1],[2,2],[0,2],[1,1]]
[[[0,1],[1,1]],[[2,2],[0,2]]]
or, change the indexed access to last
groupBy (\x y -> last x==last y) $ sortBy (\x y -> compare (last x) (last y)) [[0,1],[2,2],[0,2],[1,1]]
[[[0,1],[1,1]],[[2,2],[0,2]]]
perhaps easier with some helper functions
compareLast x y = compare (last x) (last y)
equalLast x y = EQ == compareLast x y
groupBy equalLast $ sortBy compareLast [[0,1],[2,2],[0,2],[1,1]]
[[[0,1],[1,1]],[[2,2],[0,2]]]
Or, going one step further
compareBy f x y = compare (f x) (f y)
equalBy f = ((EQ ==) .) . compareBy f
partitionBy f = groupBy (equalBy f) . sortBy (compareBy f)
partitionBy last [[0,1],[2,2],[0,2],[1,1]]
[[[0,1],[1,1]],[[2,2],[0,2]]]
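The hand-written compareBy and equalBy helpers above correspond closely to comparing from Data.Ord and on from Data.Function. A minimal sketch using those library functions follows; the name partitionOn is made up for illustration, and passing last as the key assumes every inner list is non-empty.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- Hypothetical name; groups the elements that share the same key `f x`.
-- sortBy is stable, so elements with equal keys keep their relative order.
partitionOn :: Ord k => (a -> k) -> [a] -> [[a]]
partitionOn f = groupBy ((==) `on` f) . sortBy (comparing f)
```

With this definition, partitionOn last [[0,1],[2,2],[0,2],[1,1]] evaluates to [[[0,1],[1,1]],[[2,2],[0,2]]], the same result as partitionBy last above.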

Related

Haskell How to rewrite a code using fold-function?

I want to rewrite (or upgrade! :) ) my two functions, hist and sort, using fold-functions. But since I am only in the beginning of my Haskell-way, I can't figure out how to do it.
First of all, I have imported Data.Char and defined Insertion and Table:
import Data.Char
type Insertion = (Char, Int)
type Table = [Insertion]
Then I have implemented the following code for hist:
hist :: String -> Table
hist [] = []
hist (x:xs) = sortBy x (hist xs) where
  sortBy x [] = [(x,1)]
  sortBy x ((y,z):yzs)
    | x == y = (y,z+1) : yzs
    | otherwise = (y,z) : sortBy x yzs
And this one for sort:
sort :: Ord a => [a] -> [a]
sort [] = []
sort (x:xs) = paste x (sort xs)
paste :: Ord a => a -> [a] -> [a]
paste y [] = [y]
paste y (x:xs)
  | x < y = x : paste y xs
  | otherwise = y : x : xs
What can I do next? How can I use the fold-functions to implement them?
foldr f z replaces each "cons" (:) of a list with f and the empty list [] with z.
For a list like [1,4,2,5], we thus obtain f 1 (f 4 (f 2 (f 5 z))), since [1,4,2,5] is short for 1 : 4 : 2 : 5 : [], or in prefix form (:) 1 ((:) 4 ((:) 2 ((:) 5 []))).
The sort function for example can be replaced with a fold function:
sort :: Ord a => [a] -> [a]
sort = foldr paste []
since sort [1,4,2,5] is equivalent to paste 1 (paste 4 (paste 2 (paste 5 []))). Here f takes an element as its first parameter, and as its second parameter the result of calling foldr f z on the rest of the list.
I leave hist as an exercise.
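For readers who want to check their attempt, here is one possible way to express both functions with foldr. It is a sketch, not the only solution: insertionSort and bump are names made up for illustration (insertionSort avoids clashing with Data.List.sort), and bump is the question's local sortBy helper lifted to the top level.

```haskell
type Insertion = (Char, Int)
type Table = [Insertion]

-- Insert one element into an already sorted list (`paste` from the question).
paste :: Ord a => a -> [a] -> [a]
paste y [] = [y]
paste y (x:xs)
  | x < y     = x : paste y xs
  | otherwise = y : x : xs

-- Insertion sort: every (:) becomes paste, and [] stays [].
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr paste []

-- Hypothetical helper: bump the count for c, or append (c, 1) if c is new.
bump :: Char -> Table -> Table
bump c [] = [(c, 1)]
bump c ((y, n) : rest)
  | c == y    = (y, n + 1) : rest
  | otherwise = (y, n) : bump c rest

-- Histogram: every (:) becomes bump, and [] stays [].
hist :: String -> Table
hist = foldr bump []
```

Because foldr works from the right, hist "abca" counts the final 'a' first and yields [('a',2),('c',1),('b',1)], exactly as the recursive version in the question does.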

Haskell add unique combinations of list to tuple

Say for example that I have a list like this
list = ["AC", "BA"]
I would like to add every unique combination of this list to a tuple so the result is like this:
[("AC", "AC"),("AC","BA"),("BA", "BA")]
where ("BA","AC") is excluded.
My first approach was to use a list comprehension like this:
ya = [(x,y) | x <- list, y <- list]
But I couldn't manage to get it to work. Is there any way to achieve my result by using list comprehensions?
My preferred solution uses a list comprehension (tails comes from Data.List):
import Data.List (tails)
f :: [t] -> [(t, t)]
f list = [ (a,b) | theTail@(a:_) <- tails list , b <- theTail ]
I find this to be quite readable: first you choose (non-deterministically) a suffix theTail, starting with a, and then you choose (non-deterministically) an element b of the suffix. Finally, the pair (a,b) is produced, which clearly ranges over the wanted pairs.
It should also be optimally efficient: every time you demand an element from it, that is produced in constant time.
ThreeFx's answer will work, but it adds the constraint that your elements must be orderable. Instead, you can get away with functions in Prelude and Data.List to implement this more efficiently and more generically:
import Data.List (tails)
permutations2 :: [a] -> [(a, a)]
permutations2 list
  = concat
  $ zipWith (zip . repeat) list
  $ tails list
It doesn't use list comprehensions, but it works without having to perform potentially expensive comparisons and without any constraints on what kind of values you can put through it.
To see how this works, consider that if you had the list [1, 2, 3], you'd have the groups
[(1, 1), (1, 2), (1, 3),
(2, 2), (2, 3),
(3, 3)]
This is equivalent to
[(1, [1, 2, 3]),
(2, [2, 3]),
(3, [3])]
since it contains no more and no less information. The transformation from this form to our desired output is to map the function f (x, ys) = map (\y -> (x, y)) ys over each tuple, then concat the results together. Now we just need to figure out how to get the second element of those tuples. Quite clearly, all it's doing is dropping successive elements off the front of the list. Luckily, this is already implemented for us by the tails function in Data.List. The first elements of these tuples just make up the original list, so we know we can use a zip. Initially, you could implement this with
> concatMap (\(x, ys) -> map (\y -> (x, y)) ys) $ zip list $ tails list
But I personally prefer zips, so I'd turn the inner function into one that doesn't use lambdas more than necessary:
> concatMap (\(x, ys) -> zip (repeat x) ys) $ zip list $ tails list
And since I prefer zipWith f over map (uncurry f) . zip, I'd turn this into
> concat $ zipWith (\x ys -> zip (repeat x) ys) list $ tails list
Now, we can reduce this further:
> concat $ zipWith (\x -> zip (repeat x)) list $ tails list
> concat $ zipWith (zip . repeat) list $ tails list
thanks to eta-reduction and function composition. We could make this entirely pointfree (ap comes from Control.Monad):
> permutations2 = concat . ap (zipWith (zip . repeat)) tails
But I find this pretty hard to read and understand, so I think I'll stick with the previous version.
Just use a list comprehension:
f :: (Ord a) => [a] -> [(a, a)]
f list = [ (a, b) | a <- list, b <- list, a <= b ]
Since Haskell's String is in the Ord typeclass, which means it can be ordered, you first tell Haskell to get all possible combinations and then exclude every combination where a is greater than b, which removes all "duplicate" combinations.
Example output:
> f [1,2,3,4]
[(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(2,4),(3,3),(3,4),(4,4)]
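The tails-based and the Ord-based approaches agree on the question's input, but they are not interchangeable in general. The sketch below puts them side by side so the difference can be tried out; pairsByPosition and pairsByOrder are names made up for illustration.

```haskell
import Data.List (tails)

-- Position-based: pair each element with itself and everything after it.
pairsByPosition :: [a] -> [(a, a)]
pairsByPosition xs = [ (a, b) | suffix@(a:_) <- tails xs, b <- suffix ]

-- Order-based: keep the pairs whose first component is not greater than the second.
pairsByOrder :: Ord a => [a] -> [(a, a)]
pairsByOrder xs = [ (a, b) | a <- xs, b <- xs, a <= b ]
```

For ["AC", "BA"] both return [("AC","AC"),("AC","BA"),("BA","BA")]. For the unsorted input ["BA", "AC"], the position-based version keeps ("BA","AC") while the order-based one keeps ("AC","BA"). With repeated elements such as [1,1], the position-based version yields three pairs and the order-based one yields four.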

Haskell: Using a list to access indices

I am making a function that takes a boolean function and two lists. It needs to iterate through the first list and for the indices that make the boolean function true return the corresponding elements of the second list.
for example..
filterAB (>0) [-2, -1, 0, 1, 2] [5, 2, 5, 9, 0]
would return:
[9, 0]
I am using findIndices to return a list of the correct indices from the first list that make the boolean function true, so that I can use them to access the elements of the second list. Here is my code so far:
filterAB boolFunc listA listB = take listC listB where
listC = findIndices boolFunc listA
Unfortunately the line
take listC listB
does not work because the take function requires type Int as a specifier while listC is type [Int]
Any help would be greatly appreciated!
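For reference, the findIndices idea from the question can be made to type-check by looking up one element per index instead of passing the whole index list to take. This is only a sketch of that route: the bounds check (skipping indices that fall outside the second list) is an assumption about the desired behaviour, and each !! lookup walks the list from the front, so the zip-based approaches are the more efficient choice.

```haskell
import Data.List (findIndices)

-- Keep the elements of bs whose positions hold an element of as satisfying p.
-- Indices beyond the end of bs are skipped (an assumption, see above).
filterAB :: (a -> Bool) -> [a] -> [b] -> [b]
filterAB p as bs = [ bs !! i | i <- findIndices p as, i < len ]
  where
    len = length bs
```

With this definition, filterAB (>0) [-2, -1, 0, 1, 2] [5, 2, 5, 9, 0] gives [9,0].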
Also using simple list comprehensions ...
[ghci] let filterAB f as bs = [ b | (a, b) <- zip as bs, f a]
[ghci] filterAB (>0) [-2,-1,0,1,2] [5,2,5,9,0]
[9,0]
[ghci]
Another version:
filterAB f l1 l2 = map snd $ filter (f . fst) $ zip l1 l2
If you have difficulty understanding the $, this version is the same:
filterAB f l1 l2 = map snd ( filter (f . fst) ( zip l1 l2 ))
zip takes two lists and combines them into a list of tuples. For example:
zip [1,2,3,4] ["un", "deux", "trois", "quatre"] == [(1,"un"),(2,"deux"),(3,"trois"),(4,"quatre")]
filter takes a function that returns True or False for each element and a list, and keeps only the elements for which the function returns True. It is like your filterAB, but simpler:
filter (>0) [-1, 2, -2, 3, -3] == [2,3]
fst takes a pair and returns its first element, so f . fst applies f to the first element of a tuple. That way, filter (f . fst) lets us filter a list of tuples by considering only the first element of each tuple:
filter (odd . fst) [(1,"un"),(2,"deux"),(3,"trois"),(4,"quatre")] == [(1,"un"),(3,"trois")]
If you don't get the dot, it is just function composition, so the next two lines are identical:
h = f . g
h x = f ( g x )
snd takes a pair and returns its second element. Using it with map lets us take a list of tuples and return a list of only the second elements:
map snd [(1,"un"),(2,"deux"),(3,"trois"),(4,"quatre")] == ["un","deux","trois","quatre"]
Try this
filterAB f (x:xs) (y:ys)
  | f x = y : filterAB f xs ys
  | otherwise = filterAB f xs ys
filterAB _ _ _ = []
Chapter 3, "Defining Types, Streamlining Functions", of Real World Haskell gives a very good explanation of the syntax involved here.
Testing:
*Main> filterAB (>0) [-2,-1,0,1,2] [5,2,5,9,0]
[9,0]
*Main> filterAB (>0) [-2,-1,0,1,2] [5,2,5,9]
[9]
*Main> filterAB (>0) [-2,-1,0,1,2] [5,2,5]
[]
*Main> filterAB (>0) [-2,-1,0] [5,2,5,9,0]
[]
*Main>

groupBy with multiple test functions

Is there a better and more concise way to write the following code in Haskell? I've tried using if..else but that is getting less readable than the following. I want to avoid traversing the xs list (which is huge!) 8 times to just separate the elements into 8 groups. groupBy from Data.List takes only one test condition function: (a -> a -> Bool) -> [a] -> [[a]].
x1 = filter (check condition1) xs
x2 = filter (check condition2) xs
x3 = filter (check condition3) xs
x4 = filter (check condition4) xs
x5 = filter (check condition5) xs
x6 = filter (check condition6) xs
x7 = filter (check condition7) xs
x8 = filter (check condition8) xs
results = [x1,x2,x3,x4,x5,x6,x7,x8]
This only traverses the list once:
import Data.Functor
import Control.Monad
import Data.List (transpose)
import Data.Maybe (catMaybes)
filterN :: [a -> Bool] -> [a] -> [[a]]
filterN ps =
  map catMaybes . transpose .
  map (\x -> map (\p -> x <$ guard (p x)) ps)
For each element of the list, the map produces a list of Maybes, each Maybe corresponding to one of the predicates; it is Nothing if the element does not satisfy the predicate, or Just x if it does satisfy the predicate. Then, the transpose shuffles all these lists so that the list is organised by predicate, rather than by element, and the map catMaybes discards the entries for elements that did not satisfy a predicate.
Some explanation: x <$ m is fmap (const x) m, and for Maybe, guard b is if b then Just () else Nothing, so x <$ guard b is if b then Just x else Nothing.
The map could also be written as map (\x -> [x <$ guard (p x) | p <- ps]).
If you insist on traversing the list only once, you can write
filterMulti :: [a -> Bool] -> [a] -> [[a]]
filterMulti fs xs = go (reverse xs) (map (const []) fs) where
  go [] acc = acc
  go (y:ys) acc = go ys $ zipWith (\f a -> if f y then y:a else a) fs acc
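The same bucket-per-predicate idea can also be written as a right fold, which keeps the input order without an explicit reverse. This is a sketch under the assumption that one result list per predicate is wanted even for empty input; filterEach is a name made up for illustration. Note that the recursion depth grows with the length of the list, which may matter for very large inputs.

```haskell
-- Hypothetical name; one pass over the input, one bucket per predicate.
filterEach :: [a -> Bool] -> [a] -> [[a]]
filterEach ps = foldr step (map (const []) ps)
  where
    -- Prepend x to every bucket whose predicate accepts it.
    step x = zipWith (\p bucket -> if p x then x : bucket else bucket) ps
```

For example, filterEach [(<2), (>=5)] [2, 5, 1, -2, 5, 1, 7, 3, -20, 76, 8] gives [[1,-2,1,-20],[5,5,7,76,8]], and an empty input gives one empty list per predicate.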
map (\ cond -> filter (check cond) xs) [condition1, condition2, ..., condition8]
I think you could use groupWith from GHC.Exts.
If you write the a -> b function to assign every element in xs its 'class', I believe groupWith would split xs just the way you want it to, traversing the list just once.
groupBy doesn't really do what you're wanting; even if it did accept multiple predicate functions, it doesn't do any filtering on the list. It just groups together contiguous runs of list elements that satisfy some condition. Even if your filter conditions, when combined, cover all of the elements in the supplied list, this is still a different operation. For instance, groupBy won't modify the order of the list elements, nor will it have the possibility of including a given element more than once in the result, while your operation can do both of those things.
This function will do what you're looking for:
import Control.Applicative
filterMulti :: [a -> Bool] -> [a] -> [[a]]
filterMulti ps as = filter <$> ps <*> pure as
As an example:
> filterMulti [(<2), (>=5)] [2, 5, 1, -2, 5, 1, 7, 3, -20, 76, 8]
[[1, -2, 1, -20], [5, 5, 7, 76, 8]]
As an addendum to nietaki's answer (this should be a comment but it's too long, so if his answer is correct, accept his!), the function a -> b could be written as a series of nested if ... then .. else, but that is not very idiomatic Haskell and not very extensible. This might be slightly better:
import Data.List (elemIndex)
import GHC.Exts (groupWith)
f xs = groupWith test xs
  where test x = elemIndex True . map ($ x) $ [condition1, ..., condition8]
It categorises each element by the first condition_ it satisfies (and puts those that don't satisfy any into their own category).
(See the Data.List documentation for elemIndex.)
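To make the groupWith idea concrete, here is a small self-contained sketch. The two predicates in conditions are hypothetical stand-ins for condition1 to condition8, and classify and groupByFirstMatch are names made up for illustration. groupWith sorts by the key before grouping, so the groups come out ordered by key (Nothing before Just 0 before Just 1), and the work is a sort rather than a single linear pass.

```haskell
import Data.List (elemIndex)
import GHC.Exts (groupWith)

-- Hypothetical stand-ins for condition1 .. condition8 from the question.
conditions :: [Int -> Bool]
conditions = [(< 2), (>= 5)]

-- Index of the first condition an element satisfies, or Nothing if none match.
classify :: Int -> Maybe Int
classify x = elemIndex True (map ($ x) conditions)

-- Elements with the same class end up in the same group.
groupByFirstMatch :: [Int] -> [[Int]]
groupByFirstMatch = groupWith classify
```

Here groupByFirstMatch [2, 5, 1, -2, 7] gives [[2],[1,-2],[5,7]]: the element that matches no condition comes first, followed by the matches for the first and the second condition.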
The first function returns the list of updated lists for one value, and the second function goes through the whole list and updates the lists for each value:
myfilter :: a -> [a -> Bool] -> [[a]] -> [[a]]
myfilter x (f:fs) (l:ls)
  | f x = (x:l) : myfilter x fs ls
  | otherwise = l : myfilter x fs ls
myfilter _ _ _ = []
filterall :: [a] -> [a -> Bool] -> [[a]] -> [[a]]
filterall [] _ ls = ls
filterall (x:xs) fl ls = filterall xs fl (myfilter x fl ls)
This should be called with filterall xs [condition1,condition2...] [[],[]...], passing one empty list per condition. Note that each resulting list holds its elements in reverse order of the input.

unique elements in a haskell list

okay, this is probably going to be in the prelude, but: is there a standard library function for finding the unique elements in a list? my (re)implementation, for clarification, is:
has :: (Eq a) => [a] -> a -> Bool
has [] _ = False
has (x:xs) a
  | x == a = True
  | otherwise = has xs a
unique :: (Eq a) => [a] -> [a]
unique [] = []
unique (x:xs)
  | has xs x = unique xs
  | otherwise = x : unique xs
I searched for (Eq a) => [a] -> [a] on Hoogle.
First result was nub (remove duplicate elements from a list).
Hoogle is awesome.
The nub function from Data.List (no, it's actually not in the Prelude) definitely does something like what you want, but it is not quite the same as your unique function. They both preserve the original order of the elements, but unique retains the last occurrence of each element, while nub retains the first occurrence.
You can do this to make nub act exactly like unique, if that's important (though I have a feeling it's not):
unique = reverse . nub . reverse
Also, nub is only good for small lists.
Its complexity is quadratic, so it starts to get slow if your list can contain hundreds of elements.
If you limit your types to types having an Ord instance, you can make it scale better.
This variation on nub still preserves the order of the list elements, but its complexity is O(n * log n):
import qualified Data.Set as Set
nubOrd :: Ord a => [a] -> [a]
nubOrd xs = go Set.empty xs where
  go s (x:xs)
    | x `Set.member` s = go s xs
    | otherwise = x : go (Set.insert x s) xs
  go _ _ = []
In fact, it has been proposed to add nubOrd to Data.Set; the containers package has since gained it as Data.Containers.ListUtils.nubOrd.
import Data.Set (toList, fromList)
uniquify lst = toList $ fromList lst
I think that unique should return a list of elements that only appear once in the original list; that is, any elements of the original list that appear more than once should not be included in the result.
May I suggest an alternative definition, unique_alt:
unique_alt :: [Int] -> [Int]
unique_alt [] = []
unique_alt (x:xs)
  | elem x ( unique_alt xs ) = [ y | y <- ( unique_alt xs ), y /= x ]
  | otherwise = x : ( unique_alt xs )
Here are some examples that highlight the differences between unique_alt and unique:
unique [1,2,1] = [2,1]
unique_alt [1,2,1] = [2]
unique [1,2,1,2] = [1,2]
unique_alt [1,2,1,2] = []
unique [4,2,1,3,2,3] = [4,1,2,3]
unique_alt [4,2,1,3,2,3] = [4,1]
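One caveat with the recursive unique_alt above: an element that occurs an odd number of times greater than one is let back in, so unique_alt [1,1,1] evaluates to [1]. If "appears exactly once" is the intended meaning, counting occurrences first avoids that. The following is a sketch of that swapped-in technique, assuming an Ord instance is acceptable; occurringOnce is a name made up for illustration.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical name; keeps, in original order, the elements that occur exactly once.
occurringOnce :: Ord a => [a] -> [a]
occurringOnce xs = [ x | x <- xs, M.lookup x counts == Just 1 ]
  where
    -- Number of occurrences of every element of the input.
    counts = M.fromListWith (+) [ (x, 1 :: Int) | x <- xs ]
```

It agrees with the unique_alt examples listed above ([1,2,1] gives [2], [1,2,1,2] gives [], [4,2,1,3,2,3] gives [4,1]) and returns [] for [1,1,1].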
I think this would do it.
unique [] = []
unique (x:xs) = x:unique (filter ((/=) x) xs)
Another way to remove duplicates:
unique :: [Int] -> [Int]
unique xs = [x | (x,y) <- zip xs [0..], x `notElem` (take y xs)]
Algorithm in Haskell to create a unique list:
data Foo = Foo { id_ :: Int
               , name_ :: String
               } deriving (Show)
alldata = [ Foo 1 "Name"
          , Foo 2 "Name"
          , Foo 3 "Karl"
          , Foo 4 "Karl"
          , Foo 5 "Karl"
          , Foo 7 "Tim"
          , Foo 8 "Tim"
          , Foo 9 "Gaby"
          , Foo 9 "Name"
          ]
isolate :: [Foo] -> [Foo]
isolate [] = []
isolate (x:xs) = (fst f) : isolate (snd f)
  where
    f = foldl helper (x,[]) xs
    -- a is the best candidate so far for x's name; compare ids against a, not x
    helper (a,b) y = if name_ x == name_ y
                       then if id_ a >= id_ y
                              then (a,b)
                              else (y,b)
                       else (a,y:b)
main :: IO ()
main = mapM_ (putStrLn . show) (isolate alldata)
Output:
Foo {id_ = 9, name_ = "Name"}
Foo {id_ = 9, name_ = "Gaby"}
Foo {id_ = 5, name_ = "Karl"}
Foo {id_ = 8, name_ = "Tim"}
A library-based solution:
We can use that style of Haskell programming where all looping and recursion activities are pushed out of user code and into suitable library functions. Said library functions are often optimized in ways that are way beyond the skills of a Haskell beginner.
A way to decompose the problem into two passes goes like this:
produce a second list that is parallel to the input list, but with duplicate elements suitably marked
eliminate elements marked as duplicates from that second list
For the first step, duplicate elements don't need a value at all, so we can use [Maybe a] as the type of the second list. So we need a function of type:
pass1 :: Eq a => [a] -> [Maybe a]
Function pass1 is an example of stateful list traversal where the state is the list (or set) of distinct elements seen so far. For this sort of problem, the library provides the mapAccumL :: (s -> a -> (s, b)) -> s -> [a] -> (s, [b]) function.
Here the mapAccumL function requires, besides the initial state and the input list, a step function argument, of type s -> a -> (s, Maybe a).
If the current element x is not a duplicate, the output of the step function is Just x and x gets added to the current state. If x is a duplicate, the output of the step function is Nothing, and the state is passed unchanged.
Testing under the ghci interpreter:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
λ>
λ> import Data.List(mapAccumL)
λ>
λ> pass1 xs = mapAccumL stepFn [] xs
λ>
λ> xs2 = snd $ pass1 "abacrba"
λ> xs2
[Just 'a', Just 'b', Nothing, Just 'c', Just 'r', Nothing, Nothing]
λ>
Writing a pass2 function is even easier. To filter out Nothing non-values, we could use:
import Data.Maybe (fromJust, isJust)
pass2 = (map fromJust) . (filter isJust)
but why bother at all? This is precisely what the catMaybes library function does.
λ>
λ> import Data.Maybe(catMaybes)
λ>
λ> catMaybes xs2
"abcr"
λ>
Putting it all together:
Overall, the source code can be written as:
import Data.Maybe (catMaybes)
import Data.List (mapAccumL)
uniques :: (Eq a) => [a] -> [a]
uniques = let stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
          in catMaybes . snd . mapAccumL stepFn []
This code is reasonably compatible with infinite lists, something occasionally referred to as being “laziness-friendly”:
λ>
λ> take 5 $ uniques $ "abacrba" ++ (cycle "abcrf")
"abcrf"
λ>
Efficiency note:
If we anticipate that it is possible to find many distinct elements in the input list and we can have an Ord a instance, the state can be implemented as a Set object rather than a plain list, this without having to alter the overall structure of the solution.
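A minimal sketch of that Set-based variant, assuming an Ord instance is available. The name uniquesOrd is made up for illustration; the overall structure (mapAccumL followed by catMaybes) is the same as in uniques, and only the state and the membership test change.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import qualified Data.Set as Set

-- Hypothetical name; same structure as `uniques`, with a Set as the state.
uniquesOrd :: Ord a => [a] -> [a]
uniquesOrd = catMaybes . snd . mapAccumL step Set.empty
  where
    -- Emit x only the first time it is seen, and remember it in the set.
    step seen x
      | x `Set.member` seen = (seen, Nothing)
      | otherwise           = (Set.insert x seen, Just x)
```

Membership tests and insertions on a Set take logarithmic time in the number of distinct elements seen so far, instead of the linear scan that elem performs on a plain list.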
Here's a solution that uses only Prelude functions:
uniqueList theList =
  if not (null theList)
    then head theList : filter (/= head theList) (uniqueList (tail theList))
    else []
I'm assuming this is equivalent to running nested "for" loops (running through each element, then running through the remaining elements to remove those with the same value). Each recursion level adds one filter pass over the rest of the list, so this is O(n^2) in the worst case.
Might even be better than reversing a list, nubbing it, then reversing it again, depending on your circumstances.