OCaml Parsing a list - list

I would like to parse "[a;b;c;d;e;f;g]" as "a::b::c::d::e::f::g::[]".
In one part of my parser I have:
listOps:
| combOps COLONCOLON listOps { Bin($1,Cons,$3) }
| combOps SEMI listOps { Bin($1,Cons,$3) }
| combOps { $1 }
;
and I have this further down.
| LBRAC RBRAC { NilExpr }
| LBRAC listOps RBRAC { $2 }
But I'm not sure how to get it to read the list between the "[" and "]" as having a "::[]" at the end of it.
Any ideas?

Your grammar as given doesn't look quite right to me. In essence it treats :: and ; identically, so it would treat [a::b] and [a;b] as the same. If you figure out how to handle the two cases differently, you'll probably find a place to handle the [] at the end of a list specified with ::.
As a side comment, if you allow a :: b :: [] you are allowing the right side of :: to be a non-empty list. So you might want a :: [b] to be allowed, as it is in OCaml. Or maybe you'd rather not, it's your grammar!
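As a rough sketch of the "[] at the end" part: assuming an AST shaped like the Bin and NilExpr constructors in the question (the Var constructor and the name desugar_list are made up here for illustration), a bracketed list can be turned into nested conses by folding from the right, with NilExpr as the final tail:

```ocaml
(* Hypothetical AST, modelled on the Bin / Cons / NilExpr names in the question. *)
type binop = Cons

type expr =
  | Var of string                 (* assumed leaf node, for illustration only *)
  | NilExpr
  | Bin of expr * binop * expr

(* [a; b; c]  ==>  a :: (b :: (c :: [])) *)
let desugar_list (elems : expr list) : expr =
  List.fold_right (fun e tail -> Bin (e, Cons, tail)) elems NilExpr
```

In the grammar this would presumably mean giving the ;-separated elements their own rule that builds an OCaml list, and calling a helper like this in the LBRAC ... RBRAC action, instead of reusing the :: rule.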


What is this OCaml function returning?

As I understand it, OCaml doesn't require explicit return statements to yield a value from a function. The last line of the function is what returns something.
In that case, could someone please let me know what the following function foo is returning? It seems that it's returning a stream of data. Is it returning the lexer?
and foo ?(input = false) =
lexer
| 'x' _
-> let y = get_func lexbuf
get_text y
| ',' -> get_func lexbuf
| _ -> get_text lexbuf
I'm trying to edit the following function, bar, to return a data stream, as well, so that I can replace foo with bar in another function. However, it seems that bar has multiple lexers which is preventing this return. How can I rewrite bar to return a data stream in a similar way that foo appears to?
let bar cmd lexbuf =
let buff = Buffer.create 0 in
let quot plus =
lexer
| "<" -> if plus then Buffer.add_string b "<" quot plus lexbuf
and unquot plus =
lexer
| ">" -> if plus then Buffer.add_string b ">" unquot plus lexbuf
in
match unquot true lexbuf with
| e -> force_text cmd e
First, your code is probably using one of the old camlp4 syntax extensions; you should state that in the question.
Second, foo returns a value of the same type as get_text and get_func. Without the code for those functions, it is not really possible to say more than that.
Third,
Buffer.add_string b ">" unquot plus lexbuf
is ill-typed. Are you missing parentheses:
Buffer.add_string b ">" (unquot plus lexbuf)
?
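For what it's worth, Buffer.add_string has type Buffer.t -> string -> unit, so the recursive call cannot simply be handed to it as an extra argument. If the intent is "append, then keep lexing" (that is an assumption about the question's code), the two steps need to be sequenced. A minimal sketch in plain OCaml, where continue is a hypothetical stand-in for the recursive lexer call:

```ocaml
(* Append ">" when [plus] is set, then hand control to [continue].
   [continue] is a hypothetical stand-in for a call such as
   [unquot plus lexbuf]; it is not part of the original code. *)
let append_then_continue (buf : Buffer.t) (plus : bool) (continue : unit -> 'a) : 'a =
  if plus then Buffer.add_string buf ">";
  continue ()
```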

How to parse matched separators by nom?

I want to parse a YMD date in four forms ("20190919", "2019.09.19", "2019-09-19", and "2019/09/19") with the nom library.
I started with the iso8601 parser, which parses only the "YYYY-MM-DD" form. I tried to match the separator and reuse it for the next match, as with a regex backreference: (\d{4})([.-/]?)(\d{2})\2(\d{2}).
It turned out that this code works:
fn parse_ymd(i: &[u8]) -> IResult<&[u8], DateType> {
    let (i, y) = year(i)?;
    // Match the separator if it exists.
    let (i, sep) = opt(one_of(".-/"))(i)?;
    let (i, m) = month(i)?;
    // If the first separator was matched, then try to find the next one.
    let (i, _) = if let Some(sep) = sep {
        tag(&[sep as u8])(i)?
    } else {
        // Support the same signature as the previous branch.
        (i, &[' ' as u8][..])
    };
    let (i, d) = day(i)?;
    Ok((
        i,
        DateType::YMD {
            year: y,
            month: m,
            day: d,
        },
    ))
}
But obviously it looks weird.
Are there nom tools to do this in a more appropriate way?
(This question is about nom functionality and how to do things properly there, not just about this particular example.)
Your solution is decent enough. There is only one suggestion I can offer really:
fn parse_ymd(i: &[u8]) -> IResult<&[u8], DateType> {
    ...
    // If first separator was matched then try to find next one.
    let i = match sep {
        Some(sep) => tag(&[sep as u8])(i)?.0,
        _ => i,
    };
    ...
}
You may not be familiar with the syntax for accessing a tuple element directly. From the Rust book:
In addition to destructuring through pattern matching, we can access a tuple element directly by using a period (.) followed by the index of the value we want to access.
In this case, it saves you the awkwardness of trying to match the signature of two arms.

Remove characters from a string in all elements of a list

I'm trying to replace every string in a list that contains a given substring with a cleaned-up version of itself.
I've tried it by using the map function:
cleanUpChars = map(\w -> if isInfixOf "**" w then map(\c -> if c == "*" then ""; else c); else w)
To me this reads as: map elements in a list, such that if a character of a word contains * replace it with nothing
Haskell says: "Couldn't match expected type [[Char]] -> [[Char]] with actual type [Char] in the expression: w" (and the last w is underlined).
Any help is appreciated.
To answer the revised question (when isInfixOf has been imported correctly):
cleanUpChars = map(\w -> if isInfixOf "**" w then map(\c -> if c == "*" then ""; else c); else w)
The most obvious thing wrong here is that c in the inner parentheses will be a Char (since it's the input to a function which is mapped over a String) - and characters use single quotes, not double quotes. This isn't just a case of a typo or wrong syntax, however - "" works fine as an empty string (and is equivalent to [] since Strings are just lists), but there is no such thing as an "empty character".
If, as it seems, your aim is to remove all *s from each string in the list that contains **, then the right tool is filter rather than map:
Prelude Data.List> cleanUpChars = map(\w -> if isInfixOf "**" w then filter (/= '*') w; else w)
Prelude Data.List> cleanUpChars ["th**is", "is", "a*", "t**es*t"]
["this","is","a*","test"]
(Note that in the example I made up, it removes all asterisks from t**es*t, even the single one. This may not be what you actually wanted, but it's what your logic in the faulty version implied - you'll have to be a little more sophisticated to only remove pairs of consecutive *'s.)
PS I would certainly never write the function like that, with the semicolon - it really doesn't gain you anything. I would also use the infix form of isInfixOf, which makes it much clearer which string you are looking for inside the other:
cleanUpChars :: [String] -> [String]
cleanUpChars = map (\w -> if "**" `isInfixOf` w then filter (/= '*') w else w)
I'm still not particularly happy with that for readability - there's probably some nice way to tidy it up that I'm overlooking for now. But even if not, it helps readability imo to give the function a local name (hopefully you can come up with a more concise name than my version!):
cleanUpChars :: [String] -> [String]
cleanUpChars = map possiblyRemoveAsterisks
where possiblyRemoveAsterisks w = if "**" `isInfixOf` w then filter (/= '*') w else w

Changing the State of Lexing.lexbuf

I am writing a lexer for Brainfuck with ocamllex, and to implement its loop, I need to change the state of lexbuf so it can return to a previous position in the stream.
Background info on Brainfuck (skippable)
In Brainfuck, a loop is written as a pair of square brackets with the following rules:
[ -> proceed and evaluate the next token
] -> if the current cell's value is not 0, return to the matching [
Thus, the following code evaluates to 15:
+++ [ > +++++ < - ] > .
it reads:
In the first cell, assign 3 (increment 3 times)
Enter loop, move to the next cell
Assign 5 (increment 5 times)
Move back to the first cell, and subtract 1 from its value
Hit the closing square bracket; the current (first) cell now equals 2, so jump back to [ and proceed into the loop again
Keep going until the first cell equals 0, then exit the loop
Move to the second cell and output the value with .
The value in the second cell will have been incremented to 15 (incremented by 5, 3 times).
Problem:
Basically, I wrote two functions in the header section of the brainfuck.mll file to push and pop the position of the last [, namely push_curr_p and pop_last_p. They push and pop the lexbuf's current position to and from an int list ref named loopstack:
{ (* Header *)
  let tape = Array.make 100 0
  let tape_pos = ref 0
  let loopstack = ref []
  let push_curr_p (lexbuf: Lexing.lexbuf) =
    let p = lexbuf.Lexing.lex_curr_p in
    let curr_pos = p.Lexing.pos_cnum in
    (* Saving / pushing the position of `[` to loopstack *)
    ( loopstack := curr_pos :: !loopstack
    ; lexbuf
    )
  let pop_last_p (lexbuf: Lexing.lexbuf) =
    match !loopstack with
    | [] -> lexbuf
    | hd :: tl ->
      (* This is where I attempt to bring lexbuf back *)
      ( lexbuf.Lexing.lex_curr_p <- { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = hd }
      ; loopstack := tl
      ; lexbuf
      )
}
{ (* Rules *)
rule brainfuck = parse
| '[' { brainfuck (push_curr_p lexbuf) }
| ']' { (* current cell's value must be 0 to exit the loop *)
if tape.(!tape_pos) = 0
then brainfuck lexbuf
(* this needs to bring lexbuf back to the previous `[`
* and proceed with the parsing
*)
else brainfuck (pop_last_p lexbuf)
}
(* ... other rules ... *)
}
The other rules work just fine, but it seems to ignore [ and ]. The problem is obviously at the loopstack and how I get and set lex_curr_p state. Would appreciate any leads.
lex_curr_p is meant to keep track of the current position, so that you can use it in error messages and the like. Setting it to a new value won't make the lexer actually seek back to an earlier position in the file. In fact I'm 99% sure that you can't make the lexer loop like that no matter what you do.
So you can't use ocamllex to implement the whole interpreter like you're trying to do. What you can do (and what ocamllex is designed to do) is to translate the input stream of characters into a stream of tokens.
In other languages that means translating a character stream like var xyz = /* comment */ 123 into a token stream like VAR, ID("xyz"), EQ, INT(123). So lexing helps in three ways: it finds where one token ends and the next begins, it categorizes tokens into different types (identifiers vs. keywords etc.) and discards tokens you don't need (white space and comments). This can simplify further processing a lot.
Lexing Brainfuck is a lot less helpful as all Brainfuck tokens only consist of a single character anyway. So finding out where each token ends and the next begins is a no-op and finding out the type of the token just means comparing the character against '[', '+' etc. So the only useful thing a Brainfuck lexer does is to discard whitespace and comments.
So what your lexer would do is turn the input [,[+-. lala comment ]>] into something like LOOP_START, IN, LOOP_START, INC, DEC, OUT, LOOP_END, MOVE_RIGHT, LOOP_END, where LOOP_START etc. belong to a discriminated union that you (or your parser generator if you use one) defined and made the lexer output.
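To make that concrete, here is a small hand-written sketch of such a token type and a tokenizer in plain OCaml, without ocamllex. The constructor names follow the ones used above; MOVE_LEFT and the function name tokenize are my own additions for illustration:

```ocaml
(* Token type for Brainfuck; MOVE_LEFT is assumed by analogy with MOVE_RIGHT. *)
type token =
  | LOOP_START | LOOP_END
  | INC | DEC
  | MOVE_LEFT | MOVE_RIGHT
  | IN | OUT

(* Map each command character to its token and drop everything else,
   which is how whitespace and comments disappear. *)
let tokenize (s : string) : token list =
  String.to_seq s
  |> Seq.filter_map (function
       | '[' -> Some LOOP_START
       | ']' -> Some LOOP_END
       | '+' -> Some INC
       | '-' -> Some DEC
       | '<' -> Some MOVE_LEFT
       | '>' -> Some MOVE_RIGHT
       | ',' -> Some IN
       | '.' -> Some OUT
       | _ -> None)
  |> List.of_seq
```

An ocamllex rule would do the same job, with one pattern per command character and a catch-all pattern that skips to the next character.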
If you want to use a parser generator, you'd define the token types in the parser's grammar and make the lexer produce values of those types. Then the parser can just parse the token stream.
If you want to do the parsing by hand, you'd call the lexer's token function by hand in a loop to get all the tokens. In order to implement loops, you'd have to store the already-consumed tokens somewhere to be able to loop back. In the end it'd end up being more work than just reading the input into a string, but for a learning exercise I suppose that doesn't matter.
That said, I would recommend going all the way and using a parser generator to create an AST. That way you don't have to create a buffer of tokens for looping and having an AST actually saves you some work (you no longer need a stack to keep track of which [ belongs to which ]).
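A minimal sketch of the AST approach, written by hand rather than with a parser generator so that it stays self-contained. The type and function names (instr, parse, run) are made up for this example, the 100-cell tape mirrors the one in the question, and wrapping cells at 256 is an assumption:

```ocaml
(* A loop holds its body directly, so no stack of '[' positions is needed. *)
type instr =
  | Inc | Dec | Left | Right | Out | In
  | Loop of instr list

(* Recursive-descent parser: [go i acc] reads from index [i] and stops at
   end of input or at an unconsumed ']', returning that index. *)
let parse (src : string) : instr list =
  let n = String.length src in
  let rec go i acc =
    if i >= n then (List.rev acc, i)
    else
      match src.[i] with
      | '+' -> go (i + 1) (Inc :: acc)
      | '-' -> go (i + 1) (Dec :: acc)
      | '<' -> go (i + 1) (Left :: acc)
      | '>' -> go (i + 1) (Right :: acc)
      | '.' -> go (i + 1) (Out :: acc)
      | ',' -> go (i + 1) (In :: acc)
      | '[' ->
        let body, j = go (i + 1) [] in
        if j >= n then failwith "unmatched [";
        go (j + 1) (Loop body :: acc)
      | ']' -> (List.rev acc, i)
      | _ -> go (i + 1) acc            (* anything else is a comment *)
  in
  let prog, i = go 0 [] in
  if i < n then failwith "unmatched ]";
  prog

(* Interpreter: returns everything the program printed as a string. *)
let run ?(input = "") (prog : instr list) : string =
  let tape = Array.make 100 0 in
  let pos = ref 0 in
  let next_in = ref 0 in
  let out = Buffer.create 16 in
  let rec exec = function
    | Inc -> tape.(!pos) <- (tape.(!pos) + 1) land 255
    | Dec -> tape.(!pos) <- (tape.(!pos) - 1) land 255
    | Left -> decr pos
    | Right -> incr pos
    | Out -> Buffer.add_char out (Char.chr tape.(!pos))
    | In ->
      if !next_in < String.length input then begin
        tape.(!pos) <- Char.code input.[!next_in];
        incr next_in
      end else tape.(!pos) <- 0
    | Loop body ->
      while tape.(!pos) <> 0 do List.iter exec body done
  in
  List.iter exec prog;
  Buffer.contents out
```

With this, run (parse "+++ [ > +++++ < - ] > .") yields a one-character string whose character code is 15, matching the walkthrough in the question. Looping is just a while over the loop's body, so nothing ever has to seek backwards in the input.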

OCaml interpreter: evaluate a function inside a function

I'm trying to write an interpreter in OCaml and I have a problem here.
In my program, I want to call a function like this, for example:
print (get_line 4) // print: print to stdout, get_line: get a specific line in a file
How can I do that? I think the problem is in our parser, since it defines how a program is run, how a function is defined, and the flow of a program. This is what I have so far in the parser and lexer (code below), but it doesn't seem to work. I don't really see any difference between my code and the calculator example on the OCaml site: the expression inside the parentheses is evaluated first, and its value is then returned to the parent operation for the next evaluation.
In my interpreter, the function get_line inside the parentheses is evaluated first, but I don't think it returns its value to the print function, or it does but with the wrong type (I checked, and I don't think that is the error).
One difference between the calculator and my interpreter is that the calculator works with primitive types, while mine works with functions. But they should be similar.
This is my code, just a part of it:
parser.mly:
%token ODD
%token CUT
%start main
%type <Path.term list> main
%%
main:
| expr EOL main {$1 :: $3}
| expr EOF { [$1] }
| EOL main { $2 }
;
expr:
| ODD INT { Odd $2}
| ODD LPAREN INT RPAREN expr { Odd $3 }
| CUT INT INT { Cut ($2, $3)}
| CUT INT INT expr { Cut ($2, $3) }
lexer.mll:
{
open Parser
}
(* define all keyword used in the program *)
rule main =
parse
| ['\n'] { EOL }
| ['\r']['\n'] { EOL }
| [' ''\t''\n'] { main lexbuf }
| '(' { LPAREN }
| ')' { RPAREN }
| "cut" { CUT }
| "trunclength" { TRUNCLENGTH }
| "firstArithmetic" { FIRSTARITH }
| "f_ArithmeticLength" { F_ARITHLENGTH }
| "secondArithmetic" { SECARITH }
| "s_ArithmeticLength" { S_ARITHLENGTH }
| "odd" { ODD }
| "oddLength" { ODDLENGTH }
| "zip" { ZIP }
| "zipLength" { ZIPLENGTH }
| "newline" { NEWLINE }
| eof { EOF }
| ['0' - '9']+ as lxm { INT(int_of_string lxm) }
| ['a'-'z''A'-'Z'] ['a'-'z''A'-'Z''0'-'9']* as lxm { STRING lxm }
| ODD LPAREN INT RPAREN expr { Odd $3 }
Your grammar rule requires an INT between the parentheses. You need to change that to an expr. There are a number of other issues with this, but I'll leave it at that.
First, your parser only tries to build a list of Path.term, but what do you want to do with it?
Then, there are many things wrong with your parser, so I don't really know where to start. For instance, the second and fourth cases of the expr rule totally ignore the last expr. Moreover, your parser only recognizes expressions containing "odd <int>" (or "odd (<int>)") and "cut <int> <int>", so how is it supposed to evaluate print and get_line? You should edit your question and try to make it clearer.
To evaluate expressions, you can either:
- do it directly inside the semantic actions (as in the calculator example), or
- (better) build an AST (Abstract Syntax Tree) with your parser and then interpret it.
If you want to interpret print (get_line 4), your parser needs to know what print and get_line mean. In your code, your parser will see print or get_line as a STRING token (having a string value). As they seem to be keywords in your language, your lexer should recognize them and return a specific token.
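As a rough sketch of the AST route for this particular example: every type and function name below is made up for illustration and is not taken from your Path module. The file is modelled as an array of lines and stdout as a Buffer, purely so that the example is self-contained:

```ocaml
(* Hypothetical AST: the parser would build Print (GetLine (Int 4))
   for the input "print (get_line 4)". *)
type expr =
  | Int of int
  | GetLine of expr
  | Print of expr

type value = VInt of int | VStr of string | VUnit

(* Evaluate the argument first, then apply the enclosing operation to its value. *)
let rec eval (lines : string array) (out : Buffer.t) (e : expr) : value =
  match e with
  | Int n -> VInt n
  | GetLine arg ->
    (match eval lines out arg with
     | VInt n when n >= 1 && n <= Array.length lines -> VStr lines.(n - 1)
     | _ -> failwith "get_line: expected a valid line number")
  | Print arg ->
    (match eval lines out arg with
     | VStr s ->
       Buffer.add_string out s; Buffer.add_char out '\n'; VUnit
     | VInt n ->
       Buffer.add_string out (string_of_int n); Buffer.add_char out '\n'; VUnit
     | VUnit -> failwith "print: nothing to print")
```

The nesting in the source text becomes nesting in the tree, so the inner call's value reaches the outer one through eval's return value rather than through anything the parser does.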