Where does many produce the empty string?

I am currently reading Programming in Haskell by Graham Hutton, and I am stuck on the chapter on parsers. It defines two mutually recursive functions:
many p = many1 p +++ return []
many1 p = do v <- p
             vs <- many p
             return (v:vs)
Here many1 is actually desugared into this form:
many1 p = p >>= (\ v -> many p >>= (\ vs -> return (v : vs)))
The >>= operator is defined as:
p >>= f = P (\inp -> case parse p inp of
                        []        -> []
                        [(v,out)] -> parse (f v) out)
The +++ operator is defined as:
p +++ q = P (\inp -> case parse p inp of
                        []        -> parse q inp
                        [(v,out)] -> [(v,out)])
The other functions relevant to this question are these:
parse :: Parser a -> String -> [(a,String)]
parse (P p) inp = p inp
sat p = do x <- item
           if p x then return x else failure
digit = sat isDigit
failure = P (\inp -> [])
item = P (\inp -> case inp of
                     []     -> []
                     (x:xs) -> [(x,xs)])
return v = P (\inp -> [(v,inp)])
Now, when attempting to use many to parse digits from the string "a", like:
parse (many digit) "a"
the result is [("","a")].
When attempting to parse digits from the string "a" using many1 like:
parse (many1 digit) "a"
the result is [].
I think I understand the second result: (many1 digit) attempts to parse the string "a", so it runs digit on "a", which fails since 'a' is not a digit, and so the empty list [] is returned.
However, I do not understand the first result when using (many digit). If (many1 digit) returns [] then obviously it failed, and so in the +++ operator the second parser, return [], is run on the input. But when I try parse (return []) "a", the result I get back is [([], "a")].
I don't understand why the result of many is [("", "a")] when the result of many1 is [].
Any help is appreciated.
P.S. I have seen this question already, but it doesn't give me the answer I am looking for.

If your confusion is that you get back [("", "a")] when you expected [([], "a")]:
A string is a list of Chars. So "" is an empty list of Chars. Since [] is an empty list of any type, that means that "" is just a special case of []. In other words [] :: [Char] is completely equivalent to "".
So since your parser is expected to produce a string, the empty list is known to be of type [Char] and thus printed as "" instead of [].
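The whole setup can be checked with a small self-contained sketch. It assumes the parser type is the newtype `Parser a = P (String -> [(a, String)])` implied by the definition of parse, and it adds the Functor/Applicative/Monad instances that a current GHC requires (the question defines >>= and return as standalone functions instead). The catch-all patterns `(v, out) : _` and `r : _` are also a small deviation from the question's `[(v,out)]`, made only to keep the case expressions total.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isDigit)

-- Assumed representation: a parser returns [] on failure, a singleton on success.
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])   -- plays the role of the question's return
  (<*>) = ap

instance Monad Parser where
  p >>= f = P (\inp -> case parse p inp of
                         []           -> []
                         (v, out) : _ -> parse (f v) out)

failure :: Parser a
failure = P (\_ -> [])

item :: Parser Char
item = P (\inp -> case inp of
                    []     -> []
                    x : xs -> [(x, xs)])

-- Choice: try p, and fall back to q on the same input only if p fails.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = P (\inp -> case parse p inp of
                       []    -> parse q inp
                       r : _ -> [r])

sat :: (Char -> Bool) -> Parser Char
sat p = do x <- item
           if p x then return x else failure

digit :: Parser Char
digit = sat isDigit

many :: Parser a -> Parser [a]
many p = many1 p +++ return []

many1 :: Parser a -> Parser [a]
many1 p = do v  <- p
             vs <- many p
             return (v : vs)

main :: IO ()
main = do
  print (parse (many1 digit) "a")   -- []
  print (parse (many digit) "a")    -- [("","a")]
  print ([] :: String)              -- ""
```

The last line is the point of the answer above: an empty list whose element type is known to be Char is shown as "", so [("","a")] and [([],"a")] are the same value.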

Related

Process a string using foldr where '#' means deleting the previous character

I need to process a string using foldr where '#' means deleting the previous character. For example:
>backspace "abc#d##c"
"ac"
>backspace "#####"
""
It needs to be done using foldr through one pass of the list, without using reverse and/or (++).
Here is what I have so far:
backspace :: String -> String
backspace xs = foldr func [] xs where
    func c cs | c /= '#'  = c:cs
              | otherwise = cs
But it just filters the '#' out of the string. I thought about deleting the last element of the current answer every time c == '#' and got something like this:
backspace :: String -> String
backspace xs = foldr func [] xs where
    func c cs | c /= '#'  = c:cs
              | cs /= []  = init cs
              | otherwise = cs
but it is not working properly:
ghci> backspace "abc#d##c"
"abc"

You can use (Int, String) as the state for your foldr, where the Int is the number of pending backspaces and the String is the string constructed so far.
This means that you can work with:
backspace :: String -> String
backspace = snd . foldr func (0, [])
    where func '#' (n, cs) = (n+1, cs)
          func c (n, cs)
              | n > 0 = … -- (1)
              | otherwise = … -- (2)
In case we have a character that is not a #, but n > 0, we need to remove that character, and thus ignore c and decrement n. In case n == 0 we can add c to the String.
I leave filling in the … parts as an exercise.
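For readers who want to compare against their own attempt, here is one possible completion of the two guards marked (1) and (2), following the description above literally. It is a sketch rather than the answerer's own code; the only addition is the `:: Int` annotation on the initial counter.

```haskell
backspace :: String -> String
backspace = snd . foldr func (0 :: Int, [])
  where
    -- A '#' seen from the right means one more character to its left must go.
    func '#' (n, cs) = (n + 1, cs)
    func c (n, cs)
      | n > 0     = (n - 1, cs)   -- (1) drop c and use up one pending backspace
      | otherwise = (0, c : cs)   -- (2) nothing pending, keep c

main :: IO ()
main = do
  putStrLn (backspace "abc#d##c")   -- ac
  print (backspace "#####")         -- ""
```

Because foldr combines elements starting from the right end of the string, every '#' is counted before the character it deletes is visited, which is why a single pass suffices.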

How to count the number of recurring character repetitions in a char list?

My goal is to take a char list like:
['a'; 'a'; 'a'; 'a'; 'a'; 'b'; 'b'; 'b'; 'a'; 'd'; 'd'; 'd'; 'd']
Count the number of repeated characters and transform it into a (int * char) list like this:
[(5, 'a'); (3, 'b'); (1, 'a'); (4, 'd')]
I am completely lost and also very new to OCaml. Here is the code I have right now:
let to_run_length (lst : char list) : (int * char) list =
  match lst with
  | [] -> []
  | h :: t ->
    let count = int 0 in
    while t <> [] do
      if h = t then
        count := count + 1;
    done;
I am struggling on how to check the list like you would an array in C or Python. I am not allowed to use fold functions or map or anything like that.
Edit: Updated code, yielding an exception on List.nth:
let rec to_run_length (lst : char list) : (int * char) list =
  let n = ref 0 in
  match lst with
  | [] -> []
  | h :: t ->
    if h = List.nth t 0 then n := !n + 1 ;
    (!n, h) :: to_run_length t ;;
Edit: Added nested match resulting in a function that doesn't work... but no errors!
let rec to_run_length (lst : char list) : (int * char) list =
  match lst with
  | [] -> []
  | h :: t ->
    match to_run_length t with
    | [] -> []
    | (n, c) :: tail ->
      if h <> c then to_run_length t
      else (n + 1, c) :: tail ;;
Final Edit: Finally got the code running perfect!
let rec to_run_length (lst : char list) : (int * char) list =
  match lst with
  | [] -> []
  | h :: t ->
    match to_run_length t with
    | (n, c) :: tail when h = c -> (n + 1, h) :: tail
    | tail -> (1, h) :: tail ;;

One way to answer your question is to point out that a list in OCaml isn't like an array in C or Python. There is no (constant-time) way to index an OCaml list like you can an array.
If you want to code in an imperative style, you can treat an OCaml list like a list in C, i.e., a linked structure that can be traversed in one direction from beginning to end.
To make this work you would indeed have a while statement that continues only as long as the list is non-empty. At each step you examine the head of the list and update your output accordingly. Then replace the list with the tail of the list.
For this you would want to use references for holding the input and output. (As a side comment, where you have int 0 you almost certainly wanted ref 0. I.e., you want to use a reference. There is no predefined OCaml function or operator named int.)
However, the usual reason to study OCaml is to learn functional style. In that case you should be thinking of a recursive function that will compute the value you want.
For that you need a base case and a way to reduce a non-base case to a smaller case that can be solved recursively. A pretty good base case is an empty list. The desired output for this input is (presumably) also an empty list.
Now assume (by recursion hypothesis) you have a function that works, and you are given a non-empty list. You can call your function on the tail of the list, and it (by hypothesis) gives you a run-length encoded version of the tail. What do you need to do to this result to add one more character to the front? That's what you would have to figure out.
Update
Your code is getting closer, as you say.
You need to ask yourself how to add a new character to the beginning of the encoded value. In your code you have this, for example:
. . .
match to_run_length t with
| [] -> []
. . .
This says to return an empty encoding if the tail is empty. But that doesn't make sense. You know for a fact that there's a character in the input (namely, h). You should be returning some kind of result that includes h.
In general if the returned list starts with h, you want to add 1 to the count of the first group. Otherwise you want to add a new group to the front of the returned list.

Partitioning a String into more pieces with separating char in Haskell

I have the following homework:
Define a function split :: Char -> String -> [String] that splits a string, which consists of substrings separated by a separator, into a list of strings.
Examples:
split '#' "foo##goo" = ["foo","","goo"]
split '#' "#" = ["",""]
I have written the following function:
split :: Char -> String -> [String]
split c "" = [""]
split a "a" = ["",""]
split c st = takeWhile (/=c) st : split c tail((dropWhile (/=c) st))
It does not compile, and I can't see why.
takeWhile adds all the characters which are not c to the result, then tail drops the c that was found, and we recursively apply split to the rest of the string, obtained with dropWhile. The : should build a list of lists, since strings are lists of chars in Haskell. Where is the gap in my thinking?
Update:
I have updated my program to the following:
my_tail :: [a]->[a]
my_tail [] = []
my_tail xs = tail xs
split :: Char -> String -> [String]
split c "" = [""]
split a "a" = ["",""]
split c st = takeWhile (/=c) st ++ split c (my_tail(dropWhile (/=c) st))
I still get an error, the following:
Why is the expected type [String] and then [Char]?

The reason this does not compile is that Haskell parses your last clause as:
split c st = takeWhile (/=c) st : split c tail ((dropWhile (/=c) st))
It thus thinks that you apply split to three arguments: c, tail and ((dropWhile (/=c) st)). You should use parentheses here, like:
split c st = takeWhile (/=c) st : split c (tail (dropWhile (/=c) st))
But that will not fully fix the problem. For example if we try to run your testcase, we see:
Prelude> split '#' "foo##goo"
["foo","","goo"*** Exception: Prelude.tail: empty list
tail :: [a] -> [a] is a "non-total" function. For the empty list, tail will error. Indeed:
Prelude> tail []
*** Exception: Prelude.tail: empty list
Eventually, the list will run out of characters, and then tail will raise an error. We might want to use span :: (a -> Bool) -> [a] -> ([a], [a]) here, and use pattern matching to determine if there is still some element that needs to be processed, like:
split :: Eq a => a -> [a] -> [[a]]
split _ [] = [[]]
split c txt = pf : rst
    where rst | (_:sf1) <- sf = split c sf1
              | otherwise = []
          (pf,sf) = span (c /=) txt
Here span (c /=) txt splits the non-empty list txt into two parts: pf (prefix) is the longest prefix of items that are not equal to c, and sf (suffix) holds the remaining elements.
Regardless of whether sf is empty or not, we emit the prefix pf. Then we inspect the suffix. We know that either sf is empty (we reached the end of the list), or that the first element of sf is equal to c. We thus use a pattern guard to check whether sf matches the (_:sf1) pattern. This happens if sf is non-empty. In that case we bind sf1 to the tail of sf, and we recurse on that tail. In case sf is empty, we can stop, and thus return [].
For example:
Prelude> split '#' "foo##goo"
["foo","","goo"]
Prelude> split '#' "#"
["",""]
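The same idea can also be written with break and a case expression instead of span and a pattern guard. This is an equivalent sketch, not the code above; the name splitOn is chosen here only to avoid clashing with split.

```haskell
-- break (== c) stops at the first separator, so the suffix is either
-- empty (no separator left) or starts with the separator itself.
splitOn :: Eq a => a -> [a] -> [[a]]
splitOn c txt = case break (== c) txt of
  (pf, [])       -> [pf]                  -- last piece, nothing left to split
  (pf, _ : rest) -> pf : splitOn c rest   -- skip the separator and recurse

main :: IO ()
main = do
  print (break (== '#') "foo##goo")   -- ("foo","##goo")
  print (splitOn '#' "foo##goo")      -- ["foo","","goo"]
  print (splitOn '#' "#")             -- ["",""]
```

Matching on the suffix with `_ : rest` replaces the call to tail, so there is no partial function left that could fail on an empty list.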

Haskell Split list into Sublist using pattern recognition

I am trying to split a list containing I and O values whenever a certain pattern occurs.
Let's assume I have an input looking like this:
data Bit = O | I deriving (Eq, Show)
let b = [I,I,O,O,O,O,O,I,I,O,O,O,I,O]
That is what I generate when encoding [[Bool]] -> [Bit]; the corresponding input to my encode function would be let a = [[True, False, False, True],[False, False],[False]]
Now my objective is to decode what I've generated, so I need a function that gets me from b to a.
But I can't come up with a way to split the list b into 3 sublists, starting a new one every time it reads either I,O or I,I. Every odd element indicates whether the element after it continues the current sublist or starts a new one. I am basically copying the UTF Unicode encoding.
So I am trying to build a function that would get me from b to a.
After some time I came up with this:
split :: [Bit] -> [[Bit]]
split (x1:x2:xs) = if (x1 == I)
                     then [x2 : split xs]
                     else x2 : split xs
And I can't figure out how to split the list into sublists. Any kind of advice/help/code is greatly appreciated.
EDIT:
split :: [Bit] ->[[Bit]]
split [] = []
split xs = case foo xs of (ys,I,x2) -> -- generate new subarray like [...,[x2]]
                          (ys,O,x2) -> -- append existing subarray with value x2 [.....,[previous values]++x2]
foo :: [a] -> ([a],x1,x2)
foo x1:x2:input = (input,x1,x2)
Those 2 comments are the last thing I need to figure out. After that I'm done :)
If I feed b into the function split, I want this output: [[I,O,O,I],[O,O],[O]]
The final step would be to get from b to [[True, False, False, True],[False, False],[False]]

I would start with if (x1 == 1) ...
If x1 is a Bit that can be either I or O, why are you comparing its equality against a Num, 1?

If I got it right, you need something like:
split [] = []
split xs = case foo xs of (ys,r) -> r : split ys
foo :: [a] -> ([a],r)
foo = undefined
foo should consume part of the list and return the rest of the list together with the value to collect.
EDIT:
data Bit = O | I deriving (Eq, Show)
sampleA = [[True, False, False, True],[False, False],[False]]
sampleB = [I,I,O,O,O,O,O,I,I,O,O,O,I,O]
type TwoBit = (Bit,Bit)
twobit (x:y:xs) = (x,y) : twobit xs
twobit _ = []
split :: [TwoBit] -> [[Bool]]
split [] = []
split xs = case spli xs of (ys,r) -> r : split ys
  where
    spli :: [TwoBit] -> ([TwoBit],[Bool])
    spli (x:xs) = case span (not . pterm) xs of
      (ys,zs) -> (zs, map ptrue $ x:ys)
    pterm x = (I,O) == x || (I,I) == x
    ptrue x = (O,I) == x || (I,I) == x
splitTB = split . twobit
main = print $ splitTB sampleB == sampleA
PS Functions that look like s -> (s,a) could also be represented as State monad.

Haskell: return the "list" result of a function as a "list of lists" without using an empty list "[]:foo"

What would be the syntax (if possible at all) for returning the list of lists ([[a]]) but without the use of empty list ([]:[a])?
(similar as the second commented guard (2) below, which is incorrect)
This is a function that works correctly:
-- Split string on every (shouldSplit == true)
splitWith :: (Char -> Bool) -> [Char] -> [[Char]]
splitWith shouldSplit list = filter (not.null) -- would like to get rid of filter
                             (imp' shouldSplit list)
  where
    imp' _ [] = [[]]
    imp' shouldSplit (x:xs)
      | shouldSplit x = []:imp' shouldSplit xs -- (1) this line is adding empty lists
      -- | shouldSplit x = [imp' shouldSplit xs] -- (2) if this would be correct, no filter needed
      | otherwise = let (z:zs) = imp' shouldSplit xs in (x:z):zs
This is the correct result
Prelude> splitWith (== 'a') "miraaaakojajeja234"
["mir","koj","jej","234"]
However, it must use "filter" to clean up its result, so I would like to get rid of function "filter".
This is the result without the use of filter:
["mir","","","","koj","jej","234"]
If "| shouldSplit x = imp' shouldSplit xs" is used instead of the first guard, the result is incorrect:
["mirkojjej234"]
The first guard (1) adds empty list so (I assume) compiler can treat the result as a list of lists ([[a]]).
(I'm not interested in another/different solutions of the function, just the syntax clarification.)
.
.
.
ANSWER:
Answer from Dave4420 led me to the answer, but it was a comment, not an answer so I can't accept it as answer. The solution of the problem was that I'm asking the wrong question. It is not the problem of syntax, but of my algorithm.
There are several answers with another/different solutions that solve the empty list problem, but they are not the answer to my question. However, they expanded my view of ways on how things can be done with basic Haskell syntax, and I thank them for it.
Edit:
splitWith :: (Char -> Bool) -> String -> [String]
splitWith p = go False
  where
    go _ [] = [[]]
    go lastEmpty (x:xs)
      | p x       = if lastEmpty then go True xs else []:go True xs
      | otherwise = let (z:zs) = go False xs in (x:z):zs

This one utilizes pattern matching to complete the task of not producing empty interleaving lists in a single traversal:
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith f list = case splitWith' f list of
    []:result -> result
    result -> result
  where
    splitWith' _ [] = []
    splitWith' f (a:[]) = if f a then [] else [[a]]
    splitWith' f (a:b:tail) =
      let next = splitWith' f (b : tail)
      in if f a
           then if f b   -- next element is a separator too (a == b would miss distinct separators)
                  then next
                  else [] : next
           else case next of
                  [] -> [[a]]
                  nextHead:nextTail -> (a : nextHead) : nextTail
Running it:
main = do
print $ splitWith (== 'a') "miraaaakojajeja234"
print $ splitWith (== 'a') "mirrraaaakkkojjjajeja234"
print $ splitWith (== 'a') "aaabbbaaa"
Produces:
["mir","koj","jej","234"]
["mirrr","kkkojjj","jej","234"]
["bbb"]

The problem is quite naturally expressed as a fold over the list you're splitting. You need to keep track of two pieces of state - the result list, and the current word that is being built up to append to the result list.
I'd probably write a naive version something like this:
splitWith p xs = word:result
    where
        (result, word)        = foldr func ([], []) xs
        func x (result, word) = if p x
            then (word:result,[])
            else (result, x:word)
Note that this also leaves in the empty lists, because it appends the current word to the result whenever it detects a new element that satisfies the predicate p.
To fix that, just replace the list cons operator (:) with a new operator
(~:) :: [a] -> [[a]] -> [[a]]
that only conses one list to another if the original list is non-empty. The rest of the algorithm is unchanged.
splitWith p xs = word ~: result
    where
        (result, word)        = foldr func ([], []) xs
        func x (result, word) = if p x
            then (word ~: result, [])
            else (result, x:word)
        x ~: xs = if null x then xs else x:xs
which does what you want.

I guess I had a similar idea to Chris, even if not as elegant:
splitWith shouldSplit list = imp' list [] []
  where
    imp' [] accum result = result ++ if null accum then [] else [accum]
    imp' (x:xs) accum result
      | shouldSplit x =
          imp' xs [] (result ++ if null accum
                                  then []
                                  else [accum])
      | otherwise = imp' xs (accum ++ [x]) result

This is basically just an alternating application of dropWhile and break, isn't it:
splitWith p xs = g xs
  where
    g xs = let (a,b) = break p (dropWhile p xs)
           in if null a then [] else a : g b
You say you aren't interested in other solutions than yours, but other readers might be. It sure is short and seems clear. As you learn, using basic Prelude functions becomes second nature. :)
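To see the alternation at work, a single round of the loop can be pulled out into its own function. The helper name step is hypothetical, introduced only for this illustration, and the case expression stands in for the `if null a` test above.

```haskell
-- One round: skip leading separators, then cut off the next word.
step :: (a -> Bool) -> [a] -> ([a], [a])
step p = break p . dropWhile p

splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith p xs = case step p xs of
  ([], _)      -> []                        -- no word left, only separators or nothing
  (word, rest) -> word : splitWith p rest

main :: IO ()
main = do
  print (step (== 'a') "aaakojaj")                  -- ("koj","aj")
  print (splitWith (== 'a') "miraaaakojajeja234")   -- ["mir","koj","jej","234"]
```

Since dropWhile removes every run of separators before break looks for a word, no empty list is ever produced and no filter is needed afterwards.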
As to your code, a little bit reworked in non-essential ways (using short suggestive function names, like p for "predicate" and g for a main worker function), it is
splitWith :: (Char -> Bool) -> [Char] -> [[Char]]
splitWith p list = filter (not.null) (g list)
  where
    g [] = [[]]
    g (x:xs)
      | p x       = [] : g xs
      | otherwise = let (z:zs) = g xs
                    in (x:z):zs
Also, there's no need to pass the predicate as an argument to the worker (as was also mentioned in the comments). Now it is arguably a bit more readable.
Next, with a minimal change it becomes
splitWith :: (Char -> Bool) -> [Char] -> [[Char]]
splitWith p list = case g list of ([]:r) -> r; x -> x
  where
    g [] = [[]]
    g (x:xs)
      | p x       = case z of [] -> r        -- start a new word IF not already
                              _  -> []:r
      | otherwise = (x:z):zs
      where                                  -- now z,zs are accessible
        r@(z:zs) = g xs                      -- in both cases
which works as you wanted. The top-level case removes at most one empty word here, which serves as a separator marker at some point during the inner function's work. Your filter (not.null) is essentially fused into the worker function g here, with the conditional opening [1] of a new word (i.e. addition [1] of an empty list).
Replacing your let with where allowed the variables (z etc.) to become accessible in both branches of the second clause of the g definition.
In the end, your algorithm was close enough, and the code could be fixed after all.
[1] when thinking "right-to-left". In reality the list is constructed left-to-right, in guarded recursion / tail recursion modulo cons fashion.
