I am hand-writing a parser for a simple regular expression engine.
The engine supports the characters a .. z, alternation (|), closure (*), concatenation, and parentheses.
Here is the CFG I made:
exp = concat factor1
factor1 = "|" exp | e
concat = term factor2
factor2 = concat | e
term = element factor3
factor3 = "*" | e
element = "(" exp ")" | a .. z
which is equivalent to the following (e and E both denote the empty string):
S = T X
X = "|" S | E
T = F Y
Y = T | E
F = U Z
Z = "*" | E
U = "(" S ")" | a .. z
I can easily handle alternation and closure by looking ahead and choosing a production based on the next token. However, I see no way to handle concatenation by looking ahead, because it is implicit (there is no token for it).
How can I handle concatenation? Or is there something wrong with my grammar?
And this is my OCaml code for parsing:
type regex =
| Closure of regex
| Char of char
| Concatenation of regex * regex
| Alternation of regex * regex
(*| Epsilon*)
exception IllegalExpression of string
type token =
| End
| Alphabet of char
| Star
| LParen
| RParen
| Pipe
let rec parse_S (l : token list) : (regex * token list) =
let (a1, l1) = parse_T l in
let (t, rest) = lookahead l1 in
match t with
| Pipe ->
let (a2, l2) = parse_S rest in
(Alternation (a1, a2), l2)
| _ -> (a1, l1)
and parse_T (l : token list) : (regex * token list) =
let (a1, l1) = parse_F l in
let (t, rest) = lookahead l1 in
match t with
| Alphabet c -> (Concatenation (a1, Char c), rest)
| LParen ->
(let (a, l1) = parse_S rest in
let (t1, l2) = lookahead l1 in
match t1 with
| RParen -> (Concatenation (a1, a), l2)
| _ -> raise (IllegalExpression "Unbalanced parentheses"))
| _ ->
let (a2, rest) = parse_T l1 in
(Concatenation (a1, a2), rest)
and parse_F (l : token list) : (regex * token list) =
let (a1, l1) = parse_U l in
let (t, rest) = lookahead l1 in
match t with
| Star -> (Closure a1, rest)
| _ -> (a1, l1)
and parse_U (l : token list) : (regex * token list) =
let (t, rest) = lookahead l in
match t with
| Alphabet c -> (Char c, rest)
| LParen ->
(let (a, l1) = parse_S rest in
let (t1, l2) = lookahead l1 in
match t1 with
| RParen -> (a, l2)
| _ -> raise (IllegalExpression "Unbalanced parentheses"))
| _ -> raise (IllegalExpression "Unknown token")
For an LL grammar, the FIRST set of a rule is the set of tokens that can appear as the first token of that rule. You can construct the FIRST sets iteratively until you reach a fixed point:
1. A rule starting with a token has that token in its FIRST set.
2. A rule starting with a term has the FIRST set of that term in its FIRST set (and, if that term can derive the empty string, the FIRST set of the symbol after it as well).
3. A rule T = A | B has the union of FIRST(A) and FIRST(B) as its FIRST set.
Start with step 1, then repeat steps 2 and 3 until the FIRST sets reach a fixed point (stop changing). Now you have the true FIRST sets for your grammar and can decide every rule using the lookahead.
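The fixed-point procedure above can be sketched in OCaml. The grammar encoding (the symbol type, the "char" terminal standing for a .. z, and the name first_sets) is an assumption made for illustration. Because several of your rules have an empty alternative, the sketch also tracks which nonterminals are nullable, which is what step 2's parenthetical needs.

```ocaml
(* Hypothetical encoding: Tok is a terminal (token), Nt a nonterminal. *)
type symbol = Tok of string | Nt of string

(* Each rule lists its alternatives; [] is the empty (E) alternative.
   "char" stands for any letter a .. z. *)
let grammar : (string * symbol list list) list = [
  ("S", [[Nt "T"; Nt "X"]]);
  ("X", [[Tok "|"; Nt "S"]; []]);
  ("T", [[Nt "F"; Nt "Y"]]);
  ("Y", [[Nt "T"]; []]);
  ("F", [[Nt "U"; Nt "Z"]]);
  ("Z", [[Tok "*"]; []]);
  ("U", [[Tok "("; Nt "S"; Tok ")"]; [Tok "char"]]);
]

module SS = Set.Make (String)

(* Repeat until neither a FIRST set nor a nullable flag changes. *)
let first_sets g =
  let first = Hashtbl.create 16 in
  let nullable = Hashtbl.create 16 in
  List.iter
    (fun (n, _) ->
      Hashtbl.replace first n SS.empty;
      Hashtbl.replace nullable n false)
    g;
  (* FIRST of a sequence of symbols, and whether the whole sequence is nullable *)
  let rec first_seq = function
    | [] -> (SS.empty, true)
    | Tok t :: _ -> (SS.singleton t, false)
    | Nt n :: rest ->
      let f = Hashtbl.find first n in
      if Hashtbl.find nullable n then
        (let f', nl = first_seq rest in
         (SS.union f f', nl))
      else (f, false)
  in
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter
      (fun (n, alts) ->
        List.iter
          (fun alt ->
            let f, nl = first_seq alt in
            let old = Hashtbl.find first n in
            let updated = SS.union old f in
            if not (SS.equal old updated) then begin
              Hashtbl.replace first n updated;
              changed := true
            end;
            if nl && not (Hashtbl.find nullable n) then begin
              Hashtbl.replace nullable n true;
              changed := true
            end)
          alts)
      g
  done;
  (first, nullable)
```

For this grammar the loop settles on FIRST(S) = FIRST(T) = FIRST(Y) = FIRST(F) = FIRST(U) = { "(", a .. z }, FIRST(X) = { "|" } and FIRST(Z) = { "*" }, with X, Y and Z nullable.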
Note: in your code, parse_T doesn't follow the FIRST(T) set. Take 'a|b' for example: it enters parse_T, and the 'a' is matched by the parse_F call. The lookahead is then '|', which selects the epsilon alternative of Y in your grammar, but your code falls through to the last case and calls parse_T again, which then fails on the '|'.
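Applied to concatenation, this means parse_T should continue with another term only when the next token is in FIRST(T) = { Alphabet _, LParen }, and otherwise take the empty alternative and leave the token for the caller. A sketch under assumptions: the question does not define lookahead, so the version here is hypothetical (it reads an empty list as End), and the top-level parse wrapper is an addition of mine.

```ocaml
type regex =
  | Closure of regex
  | Char of char
  | Concatenation of regex * regex
  | Alternation of regex * regex

type token = End | Alphabet of char | Star | LParen | RParen | Pipe

exception IllegalExpression of string

(* Hypothetical helper: split off the next token; an empty list reads as End. *)
let lookahead = function
  | [] -> (End, [])
  | t :: rest -> (t, rest)

(* S = T X,  X = "|" S | E *)
let rec parse_S l =
  let (a1, l1) = parse_T l in
  match lookahead l1 with
  | (Pipe, rest) ->
    let (a2, l2) = parse_S rest in
    (Alternation (a1, a2), l2)
  | _ -> (a1, l1)                  (* E: leave the token for the caller *)

(* T = F Y,  Y = T | E.  Implicit concatenation: continue only if the
   lookahead is in FIRST(T); anything else selects E. *)
and parse_T l =
  let (a1, l1) = parse_F l in
  match lookahead l1 with
  | ((Alphabet _ | LParen), _) ->
    let (a2, l2) = parse_T l1 in   (* l1, not rest: the token is not consumed here *)
    (Concatenation (a1, a2), l2)
  | _ -> (a1, l1)

(* F = U Z,  Z = "*" | E *)
and parse_F l =
  let (a1, l1) = parse_U l in
  match lookahead l1 with
  | (Star, rest) -> (Closure a1, rest)
  | _ -> (a1, l1)

(* U = "(" S ")" | a .. z *)
and parse_U l =
  match lookahead l with
  | (Alphabet c, rest) -> (Char c, rest)
  | (LParen, rest) ->
    let (a, l1) = parse_S rest in
    (match lookahead l1 with
     | (RParen, l2) -> (a, l2)
     | _ -> raise (IllegalExpression "Unbalanced parentheses"))
  | _ -> raise (IllegalExpression "Unknown token")

(* Hypothetical wrapper: parse everything and reject leftover tokens. *)
let parse tokens =
  match parse_S tokens with
  | (r, ([] | [End])) -> r
  | _ -> raise (IllegalExpression "Trailing tokens")
```

With this shape, ab|c* parses as Alternation (Concatenation (Char 'a', Char 'b'), Closure (Char 'c')). Concatenation comes out right-nested (a(bc)), which does not change the language matched.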
module Value =
struct
type t = Int of int
end
module M = Map.Make(String)
type expr =
| Num of int
| Add of expr * expr
type t = Value.t M.t (* Value.t is Int of int *)
let rec add_map (st: string list) (e: expr list) (s: t): t =
match st with
| [] -> s
| s1::st ->
match e with
| e1::e ->
M.add s1 e1 s;
add_map st e s;;
In the above function, e is a list of the user-defined type expr, and s is the user-defined map t = Value.t M.t (Value.t being Int of int), which stores ints under string keys. The problem is that when I compile this, the error says that the type of e1 is t = t M.t, and I need expr M.t. Clearly e1 is an element of an expr list, so why does OCaml think it is t? I know M.add needs (M.add string expr map).
You didn't show the exact error message, but there is a problem with your call to M.add: the map s has type Value.t M.t, but you are giving it a value of type expr, not Value.t.
You have a Map type t that maps strings to Value.t values. But in your add_map function, you're adding values of type expr to the map.
You need to convert values of type expr to Value.t:
let rec expr_to_value_t = function
| Num n -> Value.Int n
| Add (e1, e2) ->
let Value.Int n1 = expr_to_value_t e1 in
let Value.Int n2 = expr_to_value_t e2 in
Value.Int (n1 + n2)
let rec add_map (st: string list) (e: expr list) (s: t): t =
match st with
| [] -> s
| s1::st ->
match e with
| e1::e ->
M.add s1 (expr_to_value_t e1) s;
add_map st e s
However, while this compiles, it produces warnings about non-exhaustive pattern matching, and worse, M.add s1 (expr_to_value_t e1) s in this context doesn't do anything. Maps in OCaml are functional data structures: you don't mutate them, you transform them. M.add doesn't modify s; it returns a new map with the additional binding.
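A minimal sketch of that persistence, using the same M = Map.Make(String) but plain int values to keep it short: adding a binding to m0 produces a new map m1, and m0 itself stays empty.

```ocaml
module M = Map.Make (String)

(* M.add returns a new map; the map passed in is left untouched. *)
let m0 = M.empty
let m1 = M.add "hello" 23 m0

(* So discarding the result of M.add, as in `M.add k v s; ...`, loses the binding. *)
let () =
  Printf.printf "m0 has %d bindings, m1 has %d\n"
    (M.cardinal m0) (M.cardinal m1)
```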
You can overcome this with relatively few modifications to your function.
let rec add_map (st: string list) (e: expr list) (s: t): t =
match st with
| [] -> s
| s1::st ->
match e with
| e1::e ->
let s = M.add s1 (expr_to_value_t e1) s in
add_map st e s
Here I've shadowed the original s binding with the new map which is used in the recursive call to add_map. Testing this:
utop # add_map ["hello"; "world"] [Num 23; Num 42] M.empty |> M.bindings;;
- : (string * Value.t) list =
[("hello", Value.Int 23); ("world", Value.Int 42)]
This would be a great place to use List.fold_left2, assuming both lists are of equal length. Otherwise Invalid_argument will be raised.
let add_map st e s =
List.fold_left2 (fun m a b -> M.add a (expr_to_value_t b) m) s st e
I was working through chapter 1 of Modern Compiler Implementation in ML by Andrew Appel, and I decided to implement it in OCaml instead of SML. I'm new to OCaml and ran into a very frustrating problem. OCaml seems to think that the function below has the signature int * (int * 'a) list -> 'a option.
let rec lookupTable = function
| name, (i, v) :: _ when name = i -> Some v
| name, (_, _) :: rest -> lookupTable (name, rest)
| _, [] -> None
But as far as I can tell, nothing suggests that the first element of the tuple is an int. This is a problem because when I call lookupTable further down, the compiler complains that I am not passing it an integer. Perhaps I am missing something incredibly obvious, but it has been pretty mind-boggling. Here is the rest of the program:
open Base
type id = string
type binop = Plus | Minus | Times | Div
type stm =
| CompoundStm of stm * stm
| AssignStm of id * exp
| PrintStm of exp list
and exp =
| IdExp of id
| NumExp of int
| OpExp of exp * binop * exp
| EseqExp of stm * exp
(* Returns the maximum number of arguments of any print
statement within any subexpression of a given statement *)
let rec maxargs s =
match s with
| CompoundStm (stm1, stm2) -> Int.max (maxargs stm1) (maxargs stm2)
| AssignStm (_, exp) -> maxargs_exp exp
(* Might be more nested expressions *)
| PrintStm exps -> Int.max (List.length exps) (maxargs_explist exps)
and maxargs_exp e = match e with EseqExp (stm, _) -> maxargs stm | _ -> 0
and maxargs_explist exps =
match exps with
| exp :: rest -> Int.max (maxargs_exp exp) (maxargs_explist rest)
| [] -> 0
type table = (id * int) list
let updateTable name value t : table = (name, value) :: t
let rec lookupTable = function
| name, (i, v) :: _ when name = i -> Some v
| name, (_, _) :: rest -> lookupTable (name, rest)
| _, [] -> None
exception UndefinedVariable of string
let rec interp s =
let t = [] in
interpStm s t
and interpStm s t =
match s with
| CompoundStm (stm1, stm2) -> interpStm stm2 (interpStm stm1 t)
| AssignStm (id, exp) ->
let v, t' = interpExp exp t in
updateTable id v t'
(* Might be more nested expressions *)
| PrintStm exps ->
let interpretAndPrint t e =
let v, t' = interpExp e t in
Stdio.print_endline (Int.to_string v);
t'
in
List.fold_left exps ~init:t ~f:interpretAndPrint
and interpExp e t =
match e with
| IdExp i -> (
match lookupTable (i, t) with
| Some v -> (v, t)
| None -> raise (UndefinedVariable i))
| NumExp i -> (i, t)
| OpExp (exp1, binop, exp2) ->
let exp1_val, t' = interpExp exp1 t in
let exp2_val, _ = interpExp exp2 t' in
let res =
match binop with
| Plus -> exp1_val + exp2_val
| Minus -> exp1_val - exp2_val
| Times -> exp1_val * exp2_val
| Div -> exp1_val / exp2_val
in
(res, t')
| EseqExp (s, e) -> interpExp e (interpStm s t)
Base shadows the polymorphic = with one of type int -> int -> bool, so in the expression name = i the compiler infers both name and i to be ints.
You can access the polymorphic functions and operators through the Poly module, or use a type-specific operator by locally opening the relevant module, e.g. String.(name = i).
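As a sketch of the fix for this particular table: String.equal exists in both Base and the OCaml standard library, so the comparison below is string-specific whether or not Base is opened. Base is deliberately not opened here to keep the snippet self-contained; with open Base, String.(name = i) or Poly.(name = i) as described above work as well.

```ocaml
type id = string
type table = (id * int) list

(* String.equal : string -> string -> bool, so name and i are inferred as strings. *)
let rec lookupTable : id * table -> int option = function
  | name, (i, v) :: _ when String.equal name i -> Some v
  | name, _ :: rest -> lookupTable (name, rest)
  | _, [] -> None
```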
The reason Base does not expose polymorphic operators by default is briefly explained in the documentation's introduction:
The comparison operators exposed by the OCaml standard library are polymorphic:
What they implement is structural comparison of the runtime representation of values. Since these are often error-prone, i.e., they don't correspond to what the user expects, they are not exposed directly by Base.
There's also a performance argument to be made, because the polymorphic/structural operators also need to inspect what kind of value they are given at runtime in order to compare them correctly.
I must use the following data type:
type ilist = E | L of int * ilist
I can't find much help online on working with lists other than the standard list type ([1;2;3]).
I have to write a function that takes a list and reverses its order.
For example: reverse (L(1, L(2, L(3, E)))) should output (L(3, L(2, L(1, E)))).
Here is my code so far:
let rec reverse l =
match l with
| E -> failwith "Empty List"
| L(h, E) -> h
| L(h, t) -> // append tail and recursive call with rest of list?
let list = reverse (L(1, L(2, L(3, E))))
printfn "reversed list: %A" list
Thanks for any help!
What you are lacking is a convenient way to append an int to an ilist:
let rec append x l =
match l with
| E -> L (x,E)
| L (h,t) -> L (h,append x t)
printfn "%A" (append 4 list)
Now use this function in your last match case to append h to the reversed t:
let rec reverse l =
match l with
| E -> E
| L (h,t) -> append h (reverse t)
Note that it's probably better to just return an empty list when the input list is empty (| E -> E), since reversing an empty list is a perfectly valid operation; failwith should be reserved for genuinely exceptional cases.
Also note that your second match case | L(h, E) -> h is wrong, because it returns an int instead of an ilist. But it is not needed anyway, so just remove it. The singleton list L (h,E) will be matched with | L (h,t) -> ... instead, which in turn recursively matches t with | E -> E.
Here is a working example: https://repl.it/repls/PhonyAdventurousNet
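The code in this question and answer is F# (printfn, %A, // comments). Since the rest of this page is OCaml, here is a sketch of the same append-based reverse in OCaml syntax. The reverse_acc variant is my own addition, not from the answer: it carries an accumulator instead of appending, so it avoids re-walking the partially reversed list on every step and runs in linear rather than quadratic time.

```ocaml
type ilist = E | L of int * ilist

(* append x l puts x at the end of l *)
let rec append x l =
  match l with
  | E -> L (x, E)
  | L (h, t) -> L (h, append x t)

(* The answer's approach: quadratic, since each append walks the reversed tail *)
let rec reverse l =
  match l with
  | E -> E
  | L (h, t) -> append h (reverse t)

(* Accumulator variant: move each head onto acc, which ends up reversed *)
let reverse_acc l =
  let rec go acc = function
    | E -> acc
    | L (h, t) -> go (L (h, acc)) t
  in
  go E l
```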
I am trying to fill my lazy list with odd ("unpaired") elements (using recursion), starting with element k. For example: k = 2, list is [2,3,5,7,9,...]. The code:
let lgen =
let rec gen k = LCons(k, fun () -> gen (k + 2))
in gen 1;;
But how can I check whether the element k is odd? (I think I need to use match here.)
Assuming your type for lazy lists is something like this:
type 'a llist = LNil | LCons of 'a * (unit -> 'a llist);;
You can pattern match like this:
let rec lfind e lxs =
match lxs with
| LNil -> false
| LCons(x, _) when x > e -> false
| LCons(x, xs) -> if e=x then true else lfind e (xs ())
;;
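The answer shows how to pattern match on a lazy list but not the odd check itself. A sketch, assuming "unpaired" means odd and reusing the same llist type: match on k mod 2 and start the generator at the first odd number that is at least k. For k = 2 this yields 3, 5, 7, 9, ..., which differs from the example list in the question (it begins with 2). The names odds_from and ltake are hypothetical helpers, not library functions.

```ocaml
type 'a llist = LNil | LCons of 'a * (unit -> 'a llist)

(* Infinite lazy list of odd numbers, starting at k if k is odd, else at k + 1 *)
let odds_from k =
  let rec gen n = LCons (n, fun () -> gen (n + 2)) in
  match k mod 2 with
  | 0 -> gen (k + 1)   (* k is even: skip to the next odd number *)
  | _ -> gen k         (* k is odd; k mod 2 is -1 for negative odd k, hence _ *)

(* Hypothetical helper: force the first n elements into an ordinary list *)
let rec ltake n = function
  | LNil -> []
  | LCons (x, xs) -> if n <= 0 then [] else x :: ltake (n - 1) (xs ())
```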
I'm new to OCaml (and still a novice at programming in general), and I have a quick question about checking the next element of a string list.
I want to put a separator between the elements of the list (but not after the last one), but I can't figure out how to make the program 'know' that the last element is the last element.
Here is my code as it is now:
let rec join (separator: string) (l : string list) : string =
begin match l with
| []->""
| head::head2::list-> if head2=[] then head^(join separator list) else head^separator^(join separator list)
end
let test () : bool =
(join "," ["a";"b";"c"]) = "a,b,c"
;; run_test "test_join1" test
Thanks in advance!
You're almost there. The idea is to break the list down into three cases: it has 0, 1, or at least 2 elements. When the list has more than one element, it is safe to insert the separator into the output string:
let rec join (separator: string) (l : string list) : string =
begin match l with
| [] -> ""
| head::[] -> head
| head::list-> head^separator^(join separator list)
end
I have several comments about your function:
Type annotations are redundant. Because (^) is the string concatenation operator, the type checker can easily infer the types of separator, l and the function's result.
There is no need for a begin/end pair. Since you have only one level of pattern matching, there is nothing for the compiler to get confused about.
You could use function to eliminate the match l with part.
Therefore, your code could be shortened to:
let rec join sep l =
match l with
| [] -> ""
| x::[] -> x
| x::xs -> x ^ sep ^ join sep xs
or even more concise:
let rec join sep = function
| [] -> ""
| x::[] -> x
| x::xs -> x ^ sep ^ join sep xs
The empty list is [], the list with one element is [h] and the list with at least one element is h::t. So your function can be written as:
let rec join separator = function
| [] -> ""
| [h] -> h
| h::t -> h ^ separator ^ join separator t
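For reference, the standard library's String.concat sep l already joins a string list with a separator, covering the same empty, singleton and longer cases as the recursive versions. A minimal sketch of join on top of it:

```ocaml
(* String.concat : string -> string list -> string inserts sep between
   consecutive elements, never after the last one *)
let join sep l = String.concat sep l

let () =
  print_endline (join "," ["a"; "b"; "c"]);  (* a,b,c *)
  print_endline (join "," ["a"]);            (* a *)
  print_endline (join "," [])                (* an empty line *)
```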