I have a trivial lexer taken from a tutorial (http://plus.kaist.ac.kr/~shoh/ocaml/ocamllex-ocamlyacc/ocamllex-tutorial/sec-ocamllex-some-simple-examples.html):
{ }
rule translate = parse
| "c" { print_string (Sys.getcwd ()); translate lexbuf }
| _ as c { print_char c; translate lexbuf }
| eof { exit 0 }
After generating the lexer's OCaml source and building an executable,
ocamllex testlexer.mll && ocamlc -o testlexer testlexer.ml
I attempt to pass content in via stdin (echo c | ./testlexer) and via a file (./testlexer input), but neither works.
I also don't see any logic in the generated testlexer.ml for reading from stdin or a file. Is it meant to be included as a module in another program, or consumed by another code generation tool like ocamlyacc?
You need a main function (in essence). You can adapt it from the other examples on that page.
Here's a full example that I wrote up:
{ }
rule translate = parse
| "c" { print_string (Sys.getcwd ()); translate lexbuf }
| _ as c { print_char c; translate lexbuf }
| eof { exit 0 }
{
let main () =
let lexbuf = Lexing.from_channel stdin in translate lexbuf
let () = main ()
}
It seems to work as intended:
$ ocamllex l.mll
4 states, 257 transitions, table size 1052 bytes
$ ocamlc -o l l.ml
$ echo c/itworks | ./l
/home/jeffsco/tryll2/itworks
Update
Sorry, I forgot to answer your other questions. Yes, without the main function, the original code can be a module in a larger program. It could be a program that uses ocamlyacc, or not.
I am trying to implement a parser that reads regular expressions. It asks the user to enter a valid input (a string, an integer, or a float). If the input is valid and the user presses Ctrl-D, it should print the value; otherwise it should show an error. The problem is that the following code does not stop when I press Ctrl-D. How do I implement an eof token and print the input?
test.mll :
{ type result = Int of int | Float of float | String of string }
let digit = ['0'-'9']
let digits = digit +
let lower_case = ['a'-'z']
let upper_case = ['A'-'Z']
let letter = upper_case | lower_case
let letters = letter +
rule main = parse
(digits)'.'digits as f { Float (float_of_string f) }
| digits as n { Int (int_of_string n) }
| letters as s { String s}
| _ { main lexbuf }
{ let newlexbuf = (Lexing.from_channel stdin) in
let result = main newlexbuf in
print_endline result }
I'd say the main problem is that each call to main produces one token, and there's only one call to main in your code. So it will process just one token.
You need to have some kind of iteration that calls main repeatedly.
There is a special pattern eof in OCamllex that matches the end of the input file. You can use this to return a special value that stops the iteration.
As a side comment, you can't call print_endline with a result as its parameter. Its parameter must be a string. You will need to write your own function for printing the results.
Update
To get an iteration, change your code to something like this:
{
let newlexbuf = Lexing.from_channel stdin in
let rec loop () =
match main newlexbuf with
| Int i -> iprint i; loop ()
| Float f -> fprint f; loop ()
| String s -> sprint s; loop ()
| Endfile -> ()
in
loop ()
}
Then add a rule something like this to your patterns:
| eof { Endfile }
Then add Endfile as an element of your type.
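To make the iteration concrete without a generated lexer, here is a minimal sketch of the same loop. It assumes a few things that are not in the original code: string_of_result is a made-up printer standing in for iprint, fprint and sprint, and stub_source is a hypothetical replacement for (fun () -> main newlexbuf) that replays a fixed list of tokens and then returns Endfile.

```ocaml
(* The token type from the question, extended with Endfile. *)
type result = Int of int | Float of float | String of string | Endfile

(* Hypothetical printer standing in for iprint / fprint / sprint. *)
let string_of_result = function
  | Int i -> "Int " ^ string_of_int i
  | Float f -> "Float " ^ string_of_float f
  | String s -> "String " ^ s
  | Endfile -> "Endfile"

(* Pull tokens from [next] until it yields Endfile, collecting the
   printed form of each token in input order. *)
let drain next =
  let rec loop acc =
    match next () with
    | Endfile -> List.rev acc
    | tok -> loop (string_of_result tok :: acc)
  in
  loop []

(* Stand-in for [fun () -> main newlexbuf]: replays a fixed token list,
   then keeps returning Endfile. *)
let stub_source tokens =
  let remaining = ref tokens in
  fun () ->
    match !remaining with
    | [] -> Endfile
    | tok :: rest -> remaining := rest; tok

let () =
  stub_source [Int 42; String "abc"]
  |> drain
  |> List.iter print_endline
```

With the real lexer you would presumably pass (fun () -> main newlexbuf) to drain instead of the stub; the loop itself does not change.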
I assume this is homework. So make sure you see how the iteration is working. Aside from the details of ocamllex, that's something you want to master (apologies for unsolicited advice).
I just hit that "problem" : is there a smart way to insert the end of file (ASCII 0) character in a string?
By "smart", I mean something better than
let s = "foo" ^ (String.make 1 (Char.chr 0))
let s = "foo\000"
that is, something which would reflect that we are adding an EOF, not a "mystery char which ascii value is 0".
EDIT:
Mmh... indeed I was messing with eof being a char. But anyway, in C you can have
#include <stdio.h>
int main(void)
{
char a = getchar();
if (a = EOF)
printf("eof");
else
printf("not eof");
return 0;
}
Here you can test whether a char is an EOF (and (int) EOF is -1, not 0 as I was thinking). Similarly, you can set a char to be EOF, etc.
My question is: is it possible to have something similar in OCaml?
As @melpomene says, there is no EOF character, and '\000' really is just a character. So there's no real answer to your question as near as I can tell.
You can define your own name for a string consisting of just the NUL character (as we used to call it):
let eof = "\000"
Then your function looks like this:
let add_eof s = s ^ eof
Your C has two errors. First, you assign EOF to a instead of comparing a with EOF. Second, getchar() returns an int. It returns an int expressly so that it can return EOF, a value not representable by a char. Your code (with the first error corrected), which assigns getchar()'s value to a char before testing it, will fail to process a file with a char of value 255 in it:
$ gcc -Wall getchar.c -o getchar
$ echo -e "\xFF" > fake-eof
$ echo " " > space
$ ./getchar < fake-eof
eof
$ ./getchar < space
not eof
The trick of having getchar return an int, a larger type that can hold every char value and, alternatively, other kinds of information, is wholly unnecessary in OCaml thanks to its richer type system. OCaml could have any of the following:
(* using hypothetical c_getchar, a wrapper for the getchar() in C that returns an int *)
let getchar_opt () =
match c_getchar () with
| -1 -> None
| c -> Some (char_of_int c)
let getchar_exn () =
match c_getchar () with
| -1 -> raise End_of_file
| c -> char_of_int c
type 'a ior = EOF | Value of 'a
let getchar_ior () =
match c_getchar () with
| -1 -> EOF
| c -> Value (char_of_int c)
Of course Pervasives.input_char in OCaml (Stdlib.input_char in current versions) raises End_of_file at EOF rather than doing one of these other things. If you want a non-exceptional interface, you could wrap input_char with your own version that catches the exception, or, depending on your program, you could use Unix.read instead, which returns the number of bytes it was able to read; that number is 0 at EOF.
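A minimal sketch of such a wrapper follows. The names input_char_opt and read_all are made up for illustration; only input_char, End_of_file and the Buffer functions come from the standard library. Because end of input is reported as None rather than as a special character value, a byte of value 255 is read like any other.

```ocaml
(* Non-raising wrapper around input_char: None signals end of input. *)
let input_char_opt ic =
  match input_char ic with
  | c -> Some c
  | exception End_of_file -> None

(* Read every remaining character of a channel into a string,
   stopping when the wrapper reports end of input. *)
let read_all ic =
  let buf = Buffer.create 64 in
  let rec loop () =
    match input_char_opt ic with
    | Some c -> Buffer.add_char buf c; loop ()
    | None -> Buffer.contents buf
  in
  loop ()
```

The match ... with exception form needs OCaml 4.02 or later; on older compilers a try ... with End_of_file -> None would do the same job.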
I have written an interpreter using ocamllex and ocamlyacc. The lexer and the parser work correctly, but currently they only parse the last .txt argument they receive, as opposed to all of them in turn. For example, ./interpret one.txt two.txt three.txt only parses three.txt, whereas I want it to parse one.txt, then two.txt, and then three.txt. The parse results are as follows:
one.txt -> "1"
two.txt -> "2"
three.txt -> "3"
On calling ./interpret one.txt two.txt three.txt, the current output is 3, but I want it to be 123.
Here is my main class, which deals with stdin and stdout:
open Lexer
open Parser
open Arg
open Printf
let toParse c =
try let lexbuf = Lexing.from_channel c in
parser_main lexer_main lexbuf
with Parsing.Parse_error -> failwith "Parse failure!" ;;
let argument = ref stdin in
let prog p = argument := open_in p in
let usage = "./interpreter FILE" in
parse [] prog usage ;
let parsed = toParse !argument in
let result = eval parsed in
let _ = parsed in
flush stdout;
Thanks for your time
There's not really enough code here to be able to help.
If I assume that the output is written by eval, then I see only one call to eval. But there's nothing here that deals with filenames from the command line, so it's hard to say more.
If you are planning to read input from files, then there's no reason to be using stdin for anything as far as I can tell.
(I know this is a very minor point, but this code doesn't constitute a class. Other languages use classes for everything, but this is a module.)
Update
Here's a module that works something like the Unix cat command; it writes out the contents of all the files from the command line one after the next.
let cat () =
for i = 1 to Array.length Sys.argv - 1 do
let ic = open_in Sys.argv.(i) in
let rec loop () =
match input_line ic with
| line -> output_string stdout (line ^ "\n"); loop ()
| exception End_of_file -> ()
in
loop ();
close_in ic
done
let () = cat ()
Here's how it looks when you compile and run it.
$ ocamlc -o mycat mycat.ml
$ echo test line 1 > file1
$ echo test line 2 > file2
$ ./mycat file1 file2
test line 1
test line 2
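The same per-file loop can be applied to a parser instead of to line copying. The sketch below is only an illustration under assumptions: parse_files and files_from_argv are made-up names, and the parse argument stands in for a function such as the question's toParse, which takes an input channel. Fun.protect needs OCaml 4.08 or later.

```ocaml
(* Apply [parse] to each file in turn and collect the results in order.
   [parse] stands in for the question's toParse: in_channel -> 'a. *)
let parse_files parse paths =
  List.map
    (fun path ->
      let ic = open_in path in
      (* Close the channel even if [parse] raises. *)
      Fun.protect ~finally:(fun () -> close_in ic) (fun () -> parse ic))
    paths

(* Every command-line argument after the program name. *)
let files_from_argv () =
  match Array.to_list Sys.argv with
  | [] -> []
  | _program :: rest -> rest
```

In the interpreter this might be used as List.iter (fun parsed -> ignore (eval parsed)) (parse_files toParse (files_from_argv ())), assuming eval is what writes the output.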
I have some basic ocamllex code, which was written by my professor, and seems to be fine:
{ type token = EOF | Word of string }
rule token = parse
| eof { EOF }
| [’a’-’z’ ’A’-’Z’]+ as word { Word(word) }
| _ { token lexbuf }
{
(*module StringMap = BatMap.Make(String) in *)
let lexbuf = Lexing.from_channel stdin in
let wordlist =
let rec next l = match token lexbuf with
EOF -> l
| Word(s) -> next (s :: l)
in next []
in
List.iter print_endline wordlist
}
However, running ocamllex wordcount.mll produces
File "wordcount.mll", line 4, character 3: syntax error.
This indicates that there is an error at the first [ in the regex in the fourth line here. What is going on?
You seem to have curly quotes (also called "smart quotes" -- ugh) in your text. You need regular old single quotes.
curly quote: ’
old fashioned single quote: '
I'm trying to write an interpreter in OCaml and I have a problem here.
In my program, I want to call a function like this, for example:
print (get_line 4) // print: print to stdout, get_line: get a specific line in a file
How can I do that? I think the problem is in the parser, since it defines how a program runs, how a function is defined, and the flow of a program. This is what I have so far in the parser and lexer (code below), but it doesn't seem to work. I don't really see any difference between my code and the calculator example on the OCaml site: the expression inside the parentheses is evaluated first, and then its value is returned to the enclosing operation for the next evaluation.
In my interpreter, the function get_line inside the parentheses is evaluated first, but I don't think it returns its value to the print function, or it does but with the wrong type (I checked, and I don't think that is the error).
One difference between the calculator and my interpreter is that the calculator works with primitive types, while mine works with functions. But they should be similar.
This is my code, just a part of it:
parser.mly:
%token ODD
%token CUT
%start main
%type <Path.term list> main
%%
main:
| expr EOL main {$1 :: $3}
| expr EOF { [$1] }
| EOL main { $2 }
;
expr:
| ODD INT { Odd $2}
| ODD LPAREN INT RPAREN expr { Odd $3 }
| CUT INT INT { Cut ($2, $3)}
| CUT INT INT expr { Cut ($2, $3) }
lexer.mll:
{
open Parser
}
(* define all keyword used in the program *)
rule main =
parse
| ['\n'] { EOL }
| ['\r']['\n'] { EOL }
| [' ''\t''\n'] { main lexbuf }
| '(' { LPAREN }
| ')' { RPAREN }
| "cut" { CUT }
| "trunclength" { TRUNCLENGTH }
| "firstArithmetic" { FIRSTARITH }
| "f_ArithmeticLength" { F_ARITHLENGTH }
| "secondArithmetic" { SECARITH }
| "s_ArithmeticLength" { S_ARITHLENGTH }
| "odd" { ODD }
| "oddLength" { ODDLENGTH }
| "zip" { ZIP }
| "zipLength" { ZIPLENGTH }
| "newline" { NEWLINE }
| eof { EOF }
| ['0' - '9']+ as lxm { INT(int_of_string lxm) }
| ['a'-'z''A'-'Z'] ['a'-'z''A'-'Z''0'-'9']* as lxm { STRING lxm }
| ODD LPAREN INT RPAREN expr { Odd $3 }
Your grammar rule requires an INT between parentheses. You need to change that to an expr. There are a number of other issues with this, but I'll leave it at that.
First, your parser only tries to build a list of Path.term, but what do you want to do with it?
Then, there are many things wrong with your parser, so I don't really know where to start. For instance, the second and fourth cases of the expr rule totally ignore the last expr. Moreover, your parser only recognizes expressions containing "odd <int>" (or "odd (<int>)") and "cut <int> <int>", so how is it supposed to evaluate print and get_line? You should edit your question and try to make it clearer.
To evaluate expressions, you can
do it directly inside the semantic-actions (as in the calculator example),
or (better) build an AST (for Abstract Syntax Tree) with your parser and then interpret it.
If you want to interpret print (get_line 4), your parser needs to know what print and get_line mean. In your code, your parser will see print or get_line as a STRING token (having a string value). As they seem to be keywords in your language, your lexer should recognize them and return a specific token.
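As a rough illustration of the AST approach, here is a small sketch of an evaluator for print (get_line n). Everything in it is assumed for the example: the expr and value types, the constructor names, and the lines array that stands in for the file being read. It is not the Path.term type from the question.

```ocaml
(* Hypothetical AST for a language with print and get_line. *)
type expr =
  | Int of int
  | Get_line of expr   (* get_line n: the n-th line, counting from 1 *)
  | Print of expr      (* print e: write the value of e to stdout *)

(* Values that expressions evaluate to. *)
type value =
  | VInt of int
  | VString of string
  | VUnit

(* Evaluate inner expressions first and pass their values outward.
   [lines] stands in for the contents of the file being queried. *)
let rec eval lines = function
  | Int n -> VInt n
  | Get_line e ->
    (match eval lines e with
     | VInt n when n >= 1 && n <= Array.length lines -> VString lines.(n - 1)
     | VInt _ -> failwith "get_line: line number out of range"
     | _ -> failwith "get_line: expected an integer")
  | Print e ->
    (match eval lines e with
     | VString s -> print_endline s; VUnit
     | VInt n -> print_endline (string_of_int n); VUnit
     | VUnit -> failwith "print: nothing to print")

(* What a parser could build for: print (get_line 2) *)
let () =
  let lines = [| "first"; "second"; "third" |] in
  ignore (eval lines (Print (Get_line (Int 2))))
```

The grammar's semantic actions would then only construct values such as Print (Get_line (Int 4)), and a separate eval pass would carry the result of get_line to print.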