Haskell getArgs changing data type - regex

I am trying to build a main function for a Haskell module that takes a regular expression from the user and passes it to the simplifyRegExp function, but that function wants its input as a RegExp:
data RegExp sy = Empty
               | Epsilon
               | Literal sy
               | Or (RegExp sy) (RegExp sy)
               | Then (RegExp sy) (RegExp sy)
               | Star (RegExp sy)
               deriving (Read, Eq)
How would I be able to turn a string to type RegExp?
If I load the program onto GHCi then I can call the method straight like the following:
*Language.HaLex.RegExp> simplifyRegExp(Star (Star a))
'a'*
But I would like to pass the program a single argument at the command prompt and have it print the result, something like the following (which of course doesn't work):
main = do
n <- getArgs $ head
print (simplifyRegExp(n))

You can derive a Read instance for your type and use that:
data RegExp sy = ...
  deriving Read
And then use readMaybe:
import Text.Read (readMaybe)
...
main = do
  regexp <- (readMaybe . head) `fmap` getArgs
  case regexp of
    Just r -> ...
    Nothing -> putStrLn "Parse error!"
But this is a little brittle in two ways. First, read itself is a partial function that blows up on ill-formed input, which is why the Maybe-returning variant is used above; head is partial as well, so the program still crashes when no argument is given. Second, using the default Read instance forces your internal representation of regexes onto your users! You'd be better off doing some actual parsing if this is a serious project.
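To make the second point concrete, here is a minimal sketch of what the derived Read instance accepts. The data type is copied from the question with Show added for printing; parseArg and its messages are hypothetical names introduced for illustration only.

import Text.Read (readMaybe)

data RegExp sy = Empty
               | Epsilon
               | Literal sy
               | Or (RegExp sy) (RegExp sy)
               | Then (RegExp sy) (RegExp sy)
               | Star (RegExp sy)
               deriving (Show, Read, Eq)

-- Hypothetical helper: turn the argument list into a RegExp or an error message.
-- Pattern matching on the list avoids the partial function head.
parseArg :: [String] -> Either String (RegExp Char)
parseArg []      = Left "usage: prog REGEXP"
parseArg (s : _) = maybe (Left ("Parse error: " ++ s)) Right (readMaybe s)

With this, parseArg ["Star (Star (Literal 'a'))"] succeeds, while parseArg ["a*"] is a parse error: the user has to type constructor syntax, not regex syntax. A main along the lines of getArgs >>= either putStrLn print . parseArg (with getArgs from System.Environment) would wire it up.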
Luckily, Haskell has some really awesome parsing libraries. Some of the most famous include parsec and attoparsec.
An example of a parsec parser might be
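import Text.Parsec
import Text.Parsec.String (Parser)
-- Import only the operators used here; Text.Parsec exports its own (<|>) and many.
import Control.Applicative ((<$>), (<*>), (<*), (*>))
-- parseRe is part of the elided code below. Every parser here except parseLiteral
-- starts by calling parseRe, so parseRe must not try them first without consuming
-- input, or the parser will recurse forever.
parseStar :: Parser (RegExp Char)
parseStar = Star <$> (parseRe <* char '*')
parseLiteral :: Parser (RegExp Char)
parseLiteral = Literal <$> noneOf "*()"
parseOr :: Parser (RegExp Char)
parseOr = Or <$> parseRe <*> (char '|' *> parseRe)
parseThen :: Parser (RegExp Char)
parseThen = Then <$> parseRe <*> parseRe
....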
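For a version that runs without extra packages, here is a rough sketch of the same idea using Text.ParserCombinators.ReadP from base instead of parsec. The grammar layering (pOr, pThen, pPostfix, pAtom) and the name parseRegExp are assumptions made for this example, not part of the answer above; the layering is there so that every recursive call happens only after a character has been consumed.

import Text.ParserCombinators.ReadP

data RegExp sy = Empty
               | Epsilon
               | Literal sy
               | Or (RegExp sy) (RegExp sy)
               | Then (RegExp sy) (RegExp sy)
               | Star (RegExp sy)
               deriving (Show, Read, Eq)

-- alternation: seq ('|' alternation)?
pOr :: ReadP (RegExp Char)
pOr = do
  l <- pThen
  option l (Or l <$> (char '|' *> pOr))

-- sequence: one or more postfix expressions, nested to the right
pThen :: ReadP (RegExp Char)
pThen = foldr1 Then <$> many1 pPostfix

-- postfix: an atom followed by zero or more stars
pPostfix :: ReadP (RegExp Char)
pPostfix = do
  a     <- pAtom
  stars <- munch (== '*')
  pure (foldr (const Star) a stars)

-- atom: a literal character or a parenthesised expression
pAtom :: ReadP (RegExp Char)
pAtom = (Literal <$> satisfy (`notElem` "*()|"))
    +++ between (char '(') (char ')') pOr

-- Hypothetical entry point: succeed only if the whole string is consumed.
parseRegExp :: String -> Maybe (RegExp Char)
parseRegExp s = case readP_to_S (pOr <* eof) s of
  ((r, _) : _) -> Just r
  []           -> Nothing

Here parseRegExp "(a|b)*c" gives Just (Then (Star (Or (Literal 'a') (Literal 'b'))) (Literal 'c')), and the result could then be handed to simplifyRegExp if that function is in scope.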

Related

Parser for recursive expressions hangs in ghci

I am trying to make a parser for the following recursive datatype:
data Expr = Val Int
          | Var Char
          | App Op Expr Expr
          deriving Show
data Op = Add | Sub | Mul | Div
        deriving Show
It should, for example, parse "(1 + (a / -2))" as App Add (Val 1) (App Div (Var 'a') (Val (-2))). I've managed to write parsers for the Val and Var constructors as well as for Op's constructors like so:
import Text.Regex.Applicative
import Data.Char
rNonnegativeIntegral :: (Read a, Integral a) => RE Char a
rNonnegativeIntegral = read <$> some (psym isDigit)
rNegativeIntegral :: (Read a, Integral a) => RE Char a
rNegativeIntegral = negate <$> (sym '-' *> rNonnegativeIntegral)
rIntegral :: (Read a, Integral a) => RE Char a
rIntegral = rNonnegativeIntegral <|> rNegativeIntegral
rVal :: RE Char Expr
rVal = Val <$> rIntegral
rVar :: RE Char Expr
rVar = Var <$> psym isAlpha
rOp = aux <$> (foldr1 (<|>) $ map sym "+-*/")
  where
    aux '+' = Add
    aux '-' = Sub
    aux '*' = Mul
    aux '/' = Div
When this is loaded into ghci it can produce the following output:
ghci> findLongestPrefix rVal "-271"
Just (Val (-271), "")
ghci> findLongestPrefix rVar "a"
Just (Var 'a', "")
ghci> findLongestPrefix rOp "-"
Just (Sub, "")
The trouble comes when I introduce this recursive definition for the App constructor:
whiteSpace :: RE Char String
whiteSpace = many $ psym isSpace
strictWhiteSpace :: RE Char String
strictWhiteSpace = some $ psym isSpace
rApp :: RE Char Expr
-- flip App :: Expr -> Op -> Expr -> Expr
-- strictWhiteSpace after rOp to avoid conflict with rNegativeIntegral
rApp = flip App <$> (sym '(' *> whiteSpace *> rExpr)
                <*> (whiteSpace *> rOp <* strictWhiteSpace)
                <*> (rExpr <* whiteSpace <* sym ')')
rExpr :: RE Char Expr
rExpr = rVal <|> rVar <|> rApp
This loads into ghci just fine, and all previous constructors still work. But findLongestPrefix rApp "(1 + a)" and many similar expressions cause ghci to hang and produce no output.
Through experimentation I've found that the issue happens in general when rExpr is passed in as the first argument to <*. For example, findLongestPrefix (rExpr <* whiteSpace) "a)" also causes ghci to hang.
Also, when the definition for rExpr is replaced by
rExpr = rVal <|> rVar
all of these hanging issues go away. Simple expressions like "(1 + a)" are able to be parsed, but support for recursive expressions is not available.
How can I implement a recursive parser here without hanging issues?
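The language of expressions that you describe isn't regular: arbitrarily nested parentheses need unbounded memory to match, which no regular expression can provide. In regex-applicative terms, rExpr refers to itself through rApp, so the RE value it denotes is infinite, and the library loops while traversing it. So you'll have to use a different library.
Luckily, essentially the same parser structure should work fine with most other parser combinator libraries. It should be as simple as changing the import and replacing a few basic parsers (such as sym and psym) with the new library's analogs.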
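As one possible illustration, here is a sketch of the same grammar ported to Text.ParserCombinators.ReadP, which ships with base. The library choice and the names pIntegral, pVal, pVar, pOp, pApp, pExpr and parseExpr are assumptions made for this example; the structure mirrors the parsers in the question, with Eq added to the data types so results can be compared.

import Data.Char (isAlpha, isDigit, isSpace)
import Text.ParserCombinators.ReadP

data Expr = Val Int
          | Var Char
          | App Op Expr Expr
          deriving (Show, Eq)

data Op = Add | Sub | Mul | Div
        deriving (Show, Eq)

-- A decimal integer with an optional leading minus sign.
pIntegral :: ReadP Int
pIntegral = do
  sign   <- option id (negate <$ char '-')
  digits <- munch1 isDigit
  pure (sign (read digits))

pVal, pVar, pApp, pExpr :: ReadP Expr
pVal = Val <$> pIntegral
pVar = Var <$> satisfy isAlpha

pOp :: ReadP Op
pOp = choice [Add <$ char '+', Sub <$ char '-', Mul <$ char '*', Div <$ char '/']

-- "(" expr op expr ")"; whitespace is mandatory after the operator so that
-- a binary minus is not confused with the sign of a negative literal.
pApp = do
  _  <- char '(' *> skipSpaces
  l  <- pExpr
  op <- skipSpaces *> pOp <* munch1 isSpace
  r  <- pExpr
  _  <- skipSpaces *> char ')'
  pure (App op l r)

-- The recursion is harmless here: pApp consumes '(' before calling pExpr again.
pExpr = pVal +++ pVar +++ pApp

-- Hypothetical entry point: accept only an unambiguous parse of the full input.
parseExpr :: String -> Maybe Expr
parseExpr s = case readP_to_S (pExpr <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing

With these definitions, parseExpr "(1 + (a / -2))" yields Just (App Add (Val 1) (App Div (Var 'a') (Val (-2)))).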

Haskell Text Parser Combinators to parse a Range Greedily like Regex range notation

In regex you can acquire a range of a parse by doing something like \d{1,5}, which parses a digit 1 to 5 times greedily. Or you do \d{1,5}? to make it lazy.
How would you do this in Haskell's Text.ParserCombinators.ReadP?
My attempt gave this:
rangeParse :: Read a => ReadP a -> [Int] -> ReadP [a]
rangeParse parse ranges = foldr1 (<++) $ fmap (\n -> count n $ parse) ranges
With rangeParse (satisfy isDigit) [5,4..1], this performs a greedy parse of 1 to 5 digits, while swapping the number sequence to [1..5] gives a lazy parse.
Is there a better or more idiomatic way to do this with parser combinators?
update: the below is wrong - for example
rangeGreedy 2 4 a <* string "aab", the equivalent of regexp a{2,4}aab, doesn't match. The questioner's solution gets this right. I won't delete the answer just yet in case it keeps someone else from making the same mistake.
=========
This isn't a complete answer, just a possible way to write the greedy
version. I haven't found a nice way to do the lazy version.
Define a left-biased version of option that returns Maybes:
greedyOption :: ReadP a -> ReadP (Maybe a)
greedyOption p = (Just <$> p) <++ pure Nothing
Then we can do up to n of something with a replicateM of them:
upToGreedy :: Int -> ReadP a -> ReadP [a]
upToGreedy n p = catMaybes <$> replicateM n (greedyOption p)
To allow a minimum count, do the mandatory part separately and append
it:
rangeGreedy :: Int -> Int -> ReadP a -> ReadP [a]
rangeGreedy lo hi p = (++) <$> count lo p <*> upToGreedy (hi - lo) p
The rest of my test code in case it's useful for anyone:
module Main where
import Control.Monad (replicateM)
import Data.Maybe (catMaybes)
import Text.ParserCombinators.ReadP
main :: IO ()
main = mapM_ go ["aaaaa", "aaaab", "aaabb", "aabbb", "abbbb", "bbbbb"]
  where
    go = print . map fst . readP_to_S test
test :: ReadP [String]
test = ((++) <$> rangeGreedy 2 4 a <*> many aOrB) <* eof
  where
    a = char 'a' *> pure "ay"
    aOrB = (char 'a' +++ char 'b') *> pure "ayorbee"
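If the goal is only to accept whatever a backtracking regex would accept, and the greedy or lazy preference does not matter, one option is ReadP's symmetric choice, which keeps every repetition count alive so that the rest of the parser can pick the one that fits. This is a sketch of a different technique from the ones above; rangeAny and example are hypothetical names.

import Text.ParserCombinators.ReadP

-- Between lo and hi repetitions of p. choice combines the alternatives with +++,
-- so no count is discarded before the following parsers have run.
rangeAny :: Int -> Int -> ReadP a -> ReadP [a]
rangeAny lo hi p = choice [count n p | n <- [lo .. hi]]

-- Roughly the regex ^a{2,4}aab$
example :: ReadP String
example = rangeAny 2 4 (char 'a') <* string "aab" <* eof

Running readP_to_S example "aaaab" returns [("aa","")]: only the two-repetition branch leaves "aab" for the rest of the parser. The trade-off is that when several counts succeed, all of them are returned, so the caller has to choose which one to keep.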

Frequency table in Haskell with list comprehension only, find frequency of characters in a String

I am new to Haskell, trying to learn some stuff and pass the task that I was given. I would like to find the number of characters in a String but without importing Haskell modules.
I need to implement a frequency table and I would like to understand more about programming in Haskell and how I can do it.
I have my FreqTable as a tuple with the character and the number of occurrences of the 'char' in a String.
type FreqTable = [(Char, Int)]
I have been searching for a solution for a couple of days, spending long hours looking for working examples.
My function (the function in the task) is declared as follows:
fTable :: String -> FreqTable
I know that the correct answer can be:
map (\x -> (head x, length x)) $ group $ sort
or
map (head &&& length) . group . sort
or
[ (x,c) | x <- ['A'..'z'], let c = (length . filter (==x)), c>0 ]
I can get this to work exactly with my list but I found this as an optional solution. I am getting an error which I can't solve at the moment with the above list comprehension:
Couldn't match expected type ‘String -> FreqTable’
with actual type ‘[(Char, [Char] -> Int)]’
In the expression:
[(x, c) |
x <- ['A' .. 'z'], let c = (length . filter (== x)), c > 0]
In an equation for ‘fTable’:
fTable
= [(x, c) |
x <- ['A' .. 'z'], let c = (length . filter (== x)), c > 0]
Can someone please share and explain a nice and simple way of counting the frequency of characters without importing Data.List or Map?
You haven't included what you should be filtering and taking the length of
[ (x,c) | x <- ['A'..'z'], let c = (length . filter (==x)), c>0 ]
-- ^_____________________^
-- this is a function from a String -> Int
-- you want the count, an Int
-- The function needs to be applied to a String
The string to apply it to is the argument to fTable
fTable :: String -> FreqTable
fTable text = [ (x,c) | x <- ['A'..'z'], let c = (length . filter (==x)) text, c>0 ]
-- ^--------------------------------------------------------------------^
The list: ['A'..'z'] is this string:
"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz"
so you are iterating over both upper and lower case letters (and some symbols.) That's why you have a tuple, e.g., for both 'A' and 'a'.
If you want to perform a case-insensitive count, you have to perform a case-insensitive comparison instead of straight equality.
import Data.Char (toLower)
ciEquals :: Char -> Char -> Bool
ciEquals a b = toLower a == toLower b
Then:
fTable text = [ (x,c) | x <- ['A'..'Z']
              , let c = (length . filter (ciEquals x)) text
              , c > 0 ]
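Since the question asks for a solution without imports, here is a small sketch that uses only the Prelude and list comprehensions, and that counts whichever characters actually occur rather than a fixed range. The helper unique is a hypothetical name introduced for this example.

type FreqTable = [(Char, Int)]

-- Hypothetical helper: keep the first occurrence of each character, in order.
unique :: String -> String
unique []       = []
unique (x : xs) = x : unique [y | y <- xs, y /= x]

-- For each distinct character, count how many times it appears in the text.
fTable :: String -> FreqTable
fTable text = [(x, length [y | y <- text, y == x]) | x <- unique text]

For example, fTable "hello" evaluates to [('h',1),('e',1),('l',2),('o',1)]. The cost is quadratic in the length of the input, which is usually acceptable for an exercise of this size.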

Parsing out words in a string

I hope I was clear about my question!
Any help would be appreciated!
The function words from the Prelude will filter out spaces for you (a good way to find functions by desired type is Hoogle).
Prelude> :t words
words :: String -> [String]
You just need to compose this with an appropriate filter that makes use of Set. Here's a really basic one:
import Data.Set (Set, fromList, notMember)
parser :: String -> [String]
parser = words . filter (`notMember` delims)
  where delims = fromList ".,!?"
parser "yeah. what?" will return ["yeah", "what"].
Check out Learn You A Haskell for some good introductory material.
You want Data.List.Split (from the split package), which covers the vast majority of splitting use cases.
For your example, just use:
splitOneOf ".,!?"
And if you want to get rid of the "empty words" between consecutive delimiters, just use:
filter (not . null) . splitOneOf ".,!?"
If you want those delimiters to come from a set that you already stored them in, then just use:
import Data.List.Split (splitOneOf)
import qualified Data.Set as S
s :: S.Set Char
split = filter (not . null) . splitOneOf (S.toList s)
As you are learning, here's how to do it from scratch.
import qualified Data.Set as S
First, the set of word boundaries:
wordBoundaries :: S.Set Char
wordBoundaries = S.fromList " ."
(Data.Set.fromList takes a list of elements; [Char] is the same as String, which is why we can pass a string in this case.)
Next, splitting a string into words:
toWords :: String -> [String]
toWords = fst . foldr cons ([], True)
where
The documentation for fst and foldr is pretty clear, but that for . is a bit terse if you've not encountered function composition before.
The argument given to toWords is fed to the foldr cons ([], True). . then takes the result from foldr cons ([], True) and feeds it to fst. Finally, the result from fst is used as the result from toWords itself.
We have still to define cons:
cons :: Char -> ([String], Bool) -> ([String], Bool)
cons ch (words, startNew)
| S.member ch wordBoundaries = ( words, True)
| startNew = ([ch] : words, False)
cons ch (word : words, _) = ((ch : word) : words, False)
Homework: work out what cons does and how it works. This may be easier if you first ensure you understand how foldr calls it.
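For readers who want to run the pieces above together, here is a self-contained sketch. As an assumption made to avoid the containers dependency, it replaces the Data.Set with a plain String and S.member with elem; the final unreachable clause is added only to make cons total.

-- Word boundaries as a plain String instead of a Set (a swap made for this sketch).
wordBoundaries :: String
wordBoundaries = " ."

toWords :: String -> [String]
toWords = fst . foldr cons ([], True)
  where
    -- The Bool records whether the next non-boundary character starts a new word.
    cons ch (ws, startNew)
      | ch `elem` wordBoundaries = (ws, True)
      | startNew                 = ([ch] : ws, False)
    cons ch (w : ws, _)          = ((ch : w) : ws, False)
    cons _  ([], _)              = ([], False) -- not reached from ([], True)

Evaluating toWords "hi there. you" gives ["hi","there","you"]. Because foldr hands cons the characters from the right end of the string first, each word is built back to front by prepending.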

OCaml function parameter pattern matching for strings

I tried to pass a string in to get a reversed string. Why can't I do this:
let rec reverse x =
match x with
| "" -> ""
| e ^ s -> (reverse s) ^ e;;
The compiler says it's a syntax error. Can't I use ^ to destructure parameters?
The reason for this is that strings are not represented as a datatype in the same way as lists are. Therefore, while cons (::) is a constructor, ^ is not. Instead, strings are represented as a lower level type without a recursive definition (as lists are). There is a way to match strings as a list of characters, using a function from SML (which you can write in OCaml) called explode and implode which -- respectively -- take a string to a char list and vice versa. Here's an example implementation of them.
As Kristopher Micinski explained, you can't decompose strings using pattern matching as you do with lists.
But you can convert them to lists, using explode. Here's your reverse function with pattern matching using explode and its counterpart implode:
let rec reverse str =
match explode str with
[] -> ""
| h::t -> reverse (implode t) ^ string_of_char h
Use it like this:
let () =
let text = "Stack Overflow ♥ OCaml" in
Printf.printf "Regular: %s\n" text;
Printf.printf "Reversed: %s\n" (reverse text)
Which shows that it works for single-byte characters but not for multi-byte ones.
And here are explode and implode along with a helper method:
let string_of_char c = String.make 1 c
(* Converts a string to a list of chars *)
let explode str =
  let rec explode_inner cur_index chars =
    if cur_index < String.length str then
      let new_char = str.[cur_index] in
      (* @ is list append *)
      explode_inner (cur_index + 1) (chars @ [new_char])
    else chars in
  explode_inner 0 []
(* Converts a list of chars to a string *)
let rec implode chars =
match chars with
[] -> ""
| h::t -> string_of_char h ^ (implode t)
When you write a pattern matching expression, you cannot use arbitrary functions in your patterns. You can only use constructors, which look like unevaluated functions. For example, the function "+" is defined on integers. So the expression 1+2 is evaluated and gives 3; the function "+" is evaluated, so you cannot match on x+y. Here is an attempt to define a function on natural numbers that checks whether the number is zero:
let f x = match x with
| 0 -> false
| a+1 -> true
;;
This cannot work! For the same reason, your example with strings cannot work. The function "^" is evaluated on strings, it is not a constructor.
The matching on x+1 would work only if numbers were unevaluated symbolic expressions made out of the unevaluated operator + and a symbolic constant 1. This is not the case in OCaml. Integers are implemented directly through machine numbers.
When you match a variant type, you match on constructors, which are unevaluated expressions. For example:
# let f x = match x with
| Some x -> x+1
| None -> 0
;;
val f : int option -> int = <fun>
This works because the 'a option type is made out of a symbolic expression, such as Some x. Here, Some is not a function that is evaluated and gives some other value, but rather a "constructor", which you can think of as a function that is never evaluated. The expression Some 3 is not evaluated any further; it remains as it is. It is only on such functions that you can pattern-match.
Lists are also symbolic, unevaluated expressions built out of constructors; the constructor is ::. The result of x :: y :: [] is an unevaluated expression, which is represented by the list [x;y] only for cosmetic convenience. For this reason, you can pattern-match on lists.