Parsing out words in a string - list

I hope I was clear about my question!
Any help would be appreciated!

The function words from the Prelude will filter out spaces for you (a good way to find functions by desired type is Hoogle).
Prelude> :t words
words :: String -> [String]
You just need to compose this with an appropriate filter that makes use of Set. Here's a really basic one:
import Data.Set (Set, fromList, notMember)
parser :: String -> [String]
parser = words . filter (`notMember` delims)
  where delims = fromList ".,!?"
parser "yeah. what?" will return ["yeah","what"].
Check out Learn You A Haskell for some good introductory material.
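One caveat with deleting the delimiters before calling words: punctuation that directly joins two words (as in "yeah.what?") glues them into a single word. Below is a minimal sketch of an alternative, assuming you would rather treat every delimiter as a word boundary. The name splitWords is made up for this example; everything else comes from Data.Set and the Prelude.

```haskell
import Data.Set (Set, fromList, member)

-- Same delimiter set as in the answer above.
delims :: Set Char
delims = fromList ".,!?"

-- Hypothetical helper: turn every delimiter into a space, then let
-- `words` split on (and drop) the whitespace.
splitWords :: String -> [String]
splitWords = words . map (\c -> if c `member` delims then ' ' else c)

-- splitWords "yeah. what?" == ["yeah","what"]
-- splitWords "yeah.what?"  == ["yeah","what"]
```

With the filter-based parser, the second input would come back as ["yeahwhat"], so which version fits depends on how your text uses punctuation.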

You want Data.List.Split (from the split package), which covers the vast majority of splitting use cases.
For your example, just use:
splitOneOf ".,!?"
And if you want to get rid of the "empty words" between consecutive delimiters, just use:
filter (not . null) . splitOneOf ".,!?"
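If you want those delimiters to come from a set that you already stored them in, then just use:
import qualified Data.Set as S
import Data.List.Split (splitOneOf)
s :: S.Set Char
s = S.fromList ".,!?" -- example contents only; use your own set here
split = filter (not . null) . splitOneOf (S.toList s)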

As you are learning, here's how to do it from scratch.
import qualified Data.Set as S
First, the set of word boundaries:
wordBoundaries :: S.Set Char
wordBoundaries = S.fromList " ."
(Data.Set.fromList takes a list of elements; [Char] is the same as String, which is why we can pass a string in this case.)
Next, splitting a string into words:
toWords :: String -> [String]
toWords = fst . foldr cons ([], True)
  where
The documentation for fst and foldr is pretty clear, but that for . is a bit terse if you've not encountered function composition before.
The argument given to toWords is first passed to foldr cons ([], True). The composition operator . then takes the result of that fold and feeds it to fst. Finally, the result of fst is used as the result of toWords itself.
We have still to define cons:
    cons :: Char -> ([String], Bool) -> ([String], Bool)
    cons ch (words, startNew)
      | S.member ch wordBoundaries = (words, True)
      | startNew                   = ([ch] : words, False)
    cons ch (word : words, _)      = ((ch : word) : words, False)
Homework: work out what cons does and how it works. This may be easier if you first ensure you understand how foldr calls it.
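Because cons lives in the where clause of toWords, indentation matters when you paste the pieces together. The sketch below only assembles the definitions from this answer into one module. The tuple variables are renamed to ws and w here (an editorial choice, to avoid shadowing the Prelude's words), and the example at the bottom is added to check the result against.

```haskell
import qualified Data.Set as S

wordBoundaries :: S.Set Char
wordBoundaries = S.fromList " ."

toWords :: String -> [String]
toWords = fst . foldr cons ([], True)
  where
    -- foldr hands cons the characters from right to left, together
    -- with the words built so far and a "start a new word" flag.
    cons :: Char -> ([String], Bool) -> ([String], Bool)
    cons ch (ws, startNew)
      | S.member ch wordBoundaries = (ws, True)
      | startNew                   = ([ch] : ws, False)
    cons ch (w : ws, _)            = ((ch : w) : ws, False)

-- toWords "hi there." == ["hi","there"]
```

Loading this in GHCi and trying a few inputs of your own is a reasonable way to start on the homework.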

Related

Haskell Text Parser Combinators to parse a Range Greedily like Regex range notation

In regex you can acquire a range of a parse by doing something like \d{1,5}, which parses a digit 1 to 5 times greedily. Or you do \d{1,5}? to make it lazy.
How would you do this in Haskell's Text.ParserCombinators.ReadP?
My attempt gave this:
rangeParse :: Read a => ReadP a -> [Int] -> ReadP [a]
rangeParse parse ranges = foldr1 (<++) $ fmap (\n -> count n $ parse) ranges
If you call it as rangeParse (satisfy isDigit) [5,4..1], it performs a greedy parse of 1 to 5 digits. If you swap the number sequence to [1..5], you get a lazy parse.
Is there a better or more idiomatic way to do this with parser combinators?
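For reference, the attempt above can be run as a self-contained sketch. It keeps the question's approach unchanged apart from dropping the Read constraint, which the body never uses. The names greedyDigits and lazyDigits are made up for this example.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- The attempt from the question, minus the unneeded Read constraint.
-- Note: foldr1 fails at runtime if the list of counts is empty.
rangeParse :: ReadP a -> [Int] -> ReadP [a]
rangeParse p = foldr1 (<++) . map (\n -> count n p)

-- Hypothetical example parsers: try the longest count first (greedy)
-- or the shortest count first (lazy).
greedyDigits, lazyDigits :: ReadP String
greedyDigits = rangeParse (satisfy isDigit) [5,4..1]
lazyDigits   = rangeParse (satisfy isDigit) [1..5]

-- readP_to_S greedyDigits "1234567" == [("12345","67")]
-- readP_to_S lazyDigits   "1234567" == [("1","234567")]
```

The ReadP documentation describes <++ as a local, exclusive, left-biased choice, which is why only the first count that succeeds shows up in the results.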
update: the below is wrong - for example
rangeGreedy 2 4 a <* string "aab", the equivalent of regexp a{2,4}aab, doesn't match. The questioner's solution gets this right. I won't delete the answer just yet in case it keeps someone else from making the same mistake.
=========
This isn't a complete answer, just a possible way to write the greedy
version. I haven't found a nice way to do the lazy version.
Define a left-biased version of option that returns Maybes:
greedyOption :: ReadP a -> ReadP (Maybe a)
greedyOption p = (Just <$> p) <++ pure Nothing
Then we can do up to n of something with a replicateM of them:
upToGreedy :: Int -> ReadP a -> ReadP [a]
upToGreedy n p = catMaybes <$> replicateM n (greedyOption p)
To allow a minimum count, do the mandatory part separately and append
it:
rangeGreedy :: Int -> Int -> ReadP a -> ReadP [a]
rangeGreedy lo hi p = (++) <$> count lo p <*> upToGreedy (hi - lo) p
The rest of my test code in case it's useful for anyone:
module Main where
import Control.Monad (replicateM)
import Data.Maybe (catMaybes)
import Text.ParserCombinators.ReadP
main :: IO ()
main = mapM_ go ["aaaaa", "aaaab", "aaabb", "aabbb", "abbbb", "bbbbb"]
  where
    go = print . map fst . readP_to_S test
test :: ReadP [String]
test = ((++) <$> rangeGreedy 2 4 a <*> many aOrB) <* eof
  where
    a = char 'a' *> pure "ay"
    aOrB = (char 'a' +++ char 'b') *> pure "ayorbee"

Haskell split string on last occurence

Is there any way I can split String in Haskell on the last occurrence of given character into 2 lists?
For example I want to split list "a b c d e" on space into ("a b c d", "e").
Thank you for answers.
I'm not sure why the solutions suggested are so complicated. Only two traversals are needed:
splitLast :: Eq a => a -> [a] -> Either [a] ([a],[a])
splitLast c' = foldr go (Left [])
  where
    go c (Right (f,b)) = Right (c:f,b)
    go c (Left s) | c' == c   = Right ([],s)
                  | otherwise = Left (c:s)
Note this is total and clearly signifies its failure. When a split is not possible (because the character specified wasn't in the string) it returns a Left with the original list. Otherwise, it returns a Right with the two components.
ghci> splitLast ' ' "hello beautiful world"
Right ("hello beautiful","world")
ghci> splitLast ' ' "nospaceshere!"
Left "nospaceshere!"
It's not beautiful, but it works:
import Data.List
f :: Char -> String -> (String, String)
f char str = let n = findIndex (==char) (reverse str) in
  case n of
    Nothing -> (str, [])
    Just n -> splitAt (length str - n - 1) str
I mean f 'e' "a b c d e" = ("a b c d ", "e"), but I myself wouldn't crop that trailing space.
I would go with more pattern matching.
import Data.List
splitLast = contract . words
  where contract [] = ("", "")
        contract [x] = (x, "")
        contract [x,y] = (x, y)
        contract (x:y:rest) = contract $ intercalate " " [x,y] : rest
For long lists, we just join the first two strings with a space and try the shorter list again. Once the length is reduced to 2, we just return the pair of strings.
(x, "") seemed like a reasonable choice for strings with no whitespace, but I suppose you could return ("", x) instead.
It's not clear that ("", "") is the best choice for empty strings, but it seems like a reasonable alternative to raising an error or changing the return type to something like Maybe (String, String).
I can propose the following solution:
splitLast list elem = (reverse $ snd reversedSplit, reverse $ fst reversedSplit)
  where
    reversedSplit = span (/= elem) $ reverse list
This is probably not the fastest one (two needless reverses), but I like its simplicity. Note that it leaves the separator at the end of the first part.
If you would rather have the separator at the start of the second part, you can go for:
import qualified Data.List as List
splitLast list elem = splitAt (last $ List.elemIndices elem list) list
however, this version assumes that there will be at least one element matching the pattern. If you don't like this assumption, the code gets slightly longer (but no double-reversals here):
import qualified Data.List as List
splitLast list elem = splitAt index list where
    index = if null indices then 0 else last indices
    indices = List.elemIndices elem list
Of course, the choice of splitting at the beginning is arbitrary, and splitting at the end would probably be more intuitive for you. In that case you can simply replace 0 with length list.
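The variants above differ in where the separator ends up. Here is a minimal sketch that drops it entirely, which matches the ("a b c d", "e") result asked for in the question. The name splitOnLast and the choice of Maybe for the "separator not found" case are assumptions made for this example.

```haskell
-- Hypothetical helper: split on the last occurrence of `sep`, dropping
-- the separator itself. Nothing means `sep` does not occur in the list.
splitOnLast :: Eq a => a -> [a] -> Maybe ([a], [a])
splitOnLast sep xs =
  case break (== sep) (reverse xs) of
    (_, [])                   -> Nothing
    (revAfter, _ : revBefore) -> Just (reverse revBefore, reverse revAfter)

-- splitOnLast ' ' "a b c d e" == Just ("a b c d","e")
-- splitOnLast ' ' "abc"       == Nothing
```

It still reverses the list, so it is no faster than the span-based version; it only changes what happens to the separator and to the missing-separator case.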
My idea is to split at every occurrence and then separate the initial parts from the last part.
Pointed:
import Control.Arrow -- (&&&)
import Data.List -- intercalate
import Data.List.Split -- splitOn
breakOnLast :: Eq a => [a] -> [a] -> ([a], [a])
breakOnLast x = (intercalate x . init &&& last) . splitOn x
Point-free:
liftA2 (.) ((&&& last) . (. init) . intercalate) splitOn
(.) <$> ((&&&) <$> ((.) <$> pure init <*> intercalate) <*> pure last) <*> splitOn

string to list of lists of rhyming words

My goal is a function which takes a sentence and returns a list of lists with the words rhyming (rhyming = last 3 chars are equal).
Example: "Six sick hicks nick six slick bricks with picks and sticks." ->
[[Six,six],[sick,nick,slick],[hicks,bricks,picks,sticks],[with]]
This is my code so far (bsort is bubblesort):
rhymeWords:: String -> [[String]]
rhymeWords "" = []
rhymeWords xs = bsort (words (reverse xs))
I do not know how to translate it into code but I would like to take the first three chars of the first string and put them into a list. Then take the next String and test if it is equal to the first. If true put the second string into the first list otherwise create a second list. Then move on to the third string, each time testing with previous lists.
Can anyone please help me?
The following code groups rhymes as requested, although it converts all characters to lower case.
import Data.List (sort)
import Data.Char (toLower)
rhymeWords:: String -> [[String]]
rhymeWords "" = []
rhymeWords xs = [map reverse g | g <- groupRhymes (sortRhymes xs) []]
  where sortRhymes xs = sort $ map reverse (words [toLower x | x <- xs])
groupRhymes :: [String] -> [[String]] -> [[String]]
groupRhymes [] acc = acc
groupRhymes (x:xs) acc = case acc of
  [] -> groupRhymes xs [[x]]
  _  -> if take 3 x == take 3 (head (last acc))
          then groupRhymes xs ((init acc) ++ [(last acc) ++ [x]])
          else groupRhymes xs (acc ++ [[x]])
Example result:
rhymeWords "Six sick hicks nick six slick bricks with picks and sticks"
[["and"],["with"],["slick","nick","sick"],["hicks","picks","bricks","sticks"],["six","six"]]
Note that the example input doesn't have a period at the end of the sentence, because the last word would include it and break the sorting. You'll need to fiddle a bit with the presented code if you need to pass sentences with a period.
When you have to group items together, you can use Data.List's grouping higher order functions. With groupBy you can easily solve your problem just by writing your grouping function. In your case, you want to group words that rhyme together. You just have to write the function rhyming:
rhyming :: String -> String -> Bool
rhyming word1 word2 = last3 (lower word1) == last3 (lower word2)
  where
    last3 = take 3 . reverse -- if you wanted `last3` to return the last three characters in order, you'd just have to apply `reverse` to the result, but that's unnecessary here
    lower = map toLower
So your rhymeWords function can be written like so:
import Data.List (groupBy, sort)
import Data.Char (toLower)
rhyming :: String -> String -> Bool
rhyming word1 word2 = last3 (lowercase word1) == last3 (lowercase word2)
  where
    last3 = take 3 . reverse
    lowercase = map toLower
rhymeWords :: String -> [[String]]
rhymeWords = groupBy rhyming . map reverse . sort . map reverse . words
The map reverse . sort . map reverse step is needed because groupBy only groups elements that are next to one another. Sorting the reversed words puts words with the same ending, which are the ones likely to rhyme, side by side.
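Neither answer handles the period at the end of the original sentence. Below is a minimal sketch that does, assuming it is acceptable to drop every character that is neither a letter nor whitespace before splitting. The names rhymeKey and rhymeGroups are made up for this example, and sortOn is used in place of the reverse/sort/reverse step.

```haskell
import Data.Char (isAlpha, isSpace, toLower)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical key: the last three characters, lowercased and reversed.
rhymeKey :: String -> String
rhymeKey = take 3 . reverse . map toLower

-- Hypothetical variant of rhymeWords: strip punctuation, sort by the
-- rhyme key so rhyming words become adjacent, then group equal keys.
rhymeGroups :: String -> [[String]]
rhymeGroups =
  groupBy ((==) `on` rhymeKey) . sortOn rhymeKey . words . filter keep
  where
    keep c = isAlpha c || isSpace c

-- rhymeGroups "Six sick hicks nick six slick bricks with picks and sticks."
--   == [["and"],["with"],["sick","nick","slick"],
--       ["hicks","bricks","picks","sticks"],["Six","six"]]
```

Because sortOn is stable, words inside each group keep the order they had in the sentence, and the original capitalization is preserved.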

Type error in generator , haskell list using tuple

I am currently working in Haskell on a function that takes a String as a parameter and returns a list of (Char, Int). The function occur works with multiple types and is used in the function called word.
occur::Eq a=>a->[a]->Int
occur n [] = 0
occur n (x:xs) = if n == x
                   then 1 + occur n xs
                   else occur n xs
word::String->[(String,Int)]
word xs = [(x,y) | x<-head xs, y<-(occur x xs)]
This gives me the following error:
ERROR "file.hs":31 - Type error in generator
*** Term : head xs
*** Type : Char
*** Does not match : [a]
What am I doing wrong ? How can I make this code run properly , type-wise ?
The problem is you say that xs has type String, so head xs has type Char, and then you try to iterate over a single Char, which can't be done. The a <- b syntax only works when b is a list. You have the same problem in that y <- occur x xs is trying to iterate over a single Int, not a list of Int. You also had a problem in your type signature, the first type in the tuple should be Char, not String. You can fix it with:
word :: String -> [(Char, Int)]
word xs = [(x, occur x xs) | x <- xs]
Here we loop over the entire string xs, and for each character x in xs we compute occur x xs.
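As a self-contained sketch, here is the corrected word together with occur (rewritten with guards, same behavior). The variant wordDistinct is made up for this example; it assumes you want one entry per distinct character and uses nub from Data.List for that.

```haskell
import Data.List (nub)

-- Count how many times n appears in the list.
occur :: Eq a => a -> [a] -> Int
occur _ [] = 0
occur n (x:xs)
  | n == x    = 1 + occur n xs
  | otherwise = occur n xs

-- One entry per character of the input, so repeated characters
-- produce repeated entries.
word :: String -> [(Char, Int)]
word xs = [(x, occur x xs) | x <- xs]

-- Hypothetical variant: one entry per distinct character.
wordDistinct :: String -> [(Char, Int)]
wordDistinct xs = [(x, occur x xs) | x <- nub xs]

-- word "hello"         == [('h',1),('e',1),('l',2),('l',2),('o',1)]
-- wordDistinct "hello" == [('h',1),('e',1),('l',2),('o',1)]
```

Both versions still rescan the whole string for every character, so they remain quadratic in the length of the input.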
I would actually recommend using a slightly stronger constraint than just Eq. If you generalize word (that I've renamed to occurrences) and constrain it with Ord, you can use group and sort, which allow you to keep from iterating over the list repeatedly for each character and avoid the O(n^2) complexity. You can also simplify the definition pretty significantly:
import Control.Arrow
import Data.List
occurrences :: Ord a => [a] -> [(a, Int)]
occurrences = map (head &&& length) . group . sort
What this does is first sort your list, then group by identical elements. So "Hello, world" turns into
> sort "Hello, world"
" ,Hdellloorw"
> group $ sort "Hello, world"
[" ", ",", "H", "d", "e", "lll", "oo", "r", "w"]
Then we use the arrow operator &&&, which takes two functions, applies a single input to both, then returns the results as a tuple. So head &&& length is the same as saying
\x -> (head x, length x)
and we map this over our sorted, grouped list:
> map (head &&& length) $ group $ sort "Hello, world"
[(' ',1),(',',1),('H',1),('d',1),('e',1),('l',3),('o',2),('r',1),('w',1)]
This eliminates repeats, you aren't having to scan the list over and over counting the number of elements, and it can be defined in a single line in the pointfree style, which is nice. However, it does not preserve order. If you need to preserve order, I would then use sortBy and the handy function comparing from Data.Ord (but we lose a nice point free form):
import Control.Arrow
import Data.List
import Data.Ord (comparing)
occurrences :: Ord a => [a] -> [(a, Int)]
occurrences = map (head &&& length) . group . sort
occurrences' :: Ord a => [a] -> [(a, Int)]
occurrences' xs = sortBy (comparing ((`elemIndex` xs) . fst)) $ occurrences xs
You can almost read this as plain English. This sorts by comparing the index in xs of the first element of the tuples in occurrences xs. Even though elemIndex returns a value of type Maybe Int, we can still compare those directly (Nothing is "less than" any Just value). It simply looks up the first index of each letter in the original string and sorts by that index. That way
> occurrences' "Hello, world"
returns
[('H',1),('e',1),('l',3),('o',2),(',',1),(' ',1),('w',1),('r',1),('d',1)]
with all the letters in the original order, up to repetition.

Haskell: Function receiving a null list

Here are some type definitions in my program, FYI:
type BitString = String
type Plateau = [BitString]
I have a function called:
--Extract will take a list of lists, and return the inner list of the most items. Empty list should return ["00000"]
extract::[Plateau]->Plateau
extract _ = ["00000"]
extract (x:xs)
    |x==maximumBy(compare `on` length)xs=x --thanks SOF
    |otherwise = extract (xs)
The problem is that no matter what I do, extract returns ["00000"].
Here are some outputs from GHCi that are working:
>plateau graycodes
[["01000"],["01010","11010","10010"],["00101"],["01101","01001"]]
This is expected, and it is in the form of a [Plateau], since it is a list of lists of strings.
>maximumBy(compare `on` length)(plateau graycodes)
["01010","11010","10010"]
>extract (plateau graycodes)
["00000"]
In this case, I know for sure that extract will be called with a non-empty [Plateau], but the _ clause of the function is the one that returns.
I have also tried:
extract (x:xs)
    |x==[]=["00000"]
    |x==[""]=["00000"]
    |x==maximumBy(compare `on` length)xs=x --thanks SOF
    |otherwise = extract (xs)
error: List.maximumBy: Empty list
When you define a function with multiple patterns, they will be tried in order from top to bottom. The problem is that your topmost pattern of extract will match anything, and therefore the first case will always be chosen.
The solution is to either reorder them, or change the first pattern to only match the empty list:
extract [] = ["00000"]
extract (x:xs) = ...
You are getting that error because you are not passing your whole list (x:xs) to maximumBy:
extract :: [Plateau] -> Plateau
extract (x:xs)
    |x == maximumBy (compare `on` length) (x:xs) = x
    |otherwise = extract (xs)
extract _ = ["00000"]
or, preferably,
extract :: [Plateau] -> Plateau
extract s@(x:xs)
    |x == maximumBy (compare `on` length) s = x
    |otherwise = extract (xs)
extract _ = ["00000"]
(this also adds a needed = after your otherwise)
EDIT:
I was not satisfied with my answer, or your acceptance of that answer.
I believe this is the code you are really after:
extract :: [Plateau] -> Plateau
extract (x:[]) = x
extract s@(x:xs) = maximumBy (compare `on` length) s
extract _ = ["00000"]
The solution is simple: swap the order of the cases of extract. The pattern extract _ always matches,
so the second case is never executed.
Working code (hopefully):
--Extract will take a list of lists, and return the inner list with the most items. Empty list should return ["00000"]
extract::[Plateau]->Plateau
extract (x:xs)
    |x==maximumBy(compare `on` length)(x:xs)=x --thanks SOF
    |otherwise = extract (xs)
extract _ = ["00000"]
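Putting the advice from this thread together, here is a minimal self-contained sketch with the imports it needs. It assumes the type synonyms from the question and matches the empty list first, so the fallback only fires when there is nothing to compare. The sample data is copied from the GHCi output in the question.

```haskell
import Data.Function (on)
import Data.List (maximumBy)

type BitString = String
type Plateau = [BitString]

-- Return the plateau with the most items; the empty input falls back
-- to ["00000"] as the question requires.
extract :: [Plateau] -> Plateau
extract [] = ["00000"]
extract ps = maximumBy (compare `on` length) ps

-- extract [["01000"],["01010","11010","10010"],["00101"],["01101","01001"]]
--   == ["01010","11010","10010"]
-- extract [] == ["00000"]
```

Since the empty case is handled by its own pattern, maximumBy is never called on an empty list, which was the source of the "Empty list" error.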