ocamlyacc parse error: what token? - ocaml

I'm using ocamlyacc and ocamllex. I have an error production in my grammar that signals a custom exception. So far, I can get it to report the error position:
| error { raise (Parse_failure (string_of_position (symbol_start_pos ()))) }
But I also want to know which token was read. There must be a way; does anyone know?
Thanks.

The best way to debug your ocamlyacc parser is to set the OCAMLRUNPARAM environment variable to include the character p. This makes the parser print every state it goes through and each shift/reduce it performs.
If you are using bash, you can do this with the following command:
$ export OCAMLRUNPARAM='p'

Tokens are generated by the lexer, so you can use the lexer's current token when an error occurs:
let parse_buf_exn lexbuf =
  try
    T.input T.rule lexbuf
  with exn ->
    begin
      let curr = lexbuf.Lexing.lex_curr_p in
      let line = curr.Lexing.pos_lnum in
      let cnum = curr.Lexing.pos_cnum - curr.Lexing.pos_bol in
      let tok = Lexing.lexeme lexbuf in
      let tail = Sql_lexer.ruleTail "" lexbuf in
      raise (Error (exn,(line,cnum,tok,tail)))
    end
Lexing.lexeme lexbuf is what you need. The other parts are not necessary, but they are useful.
ruleTail concatenates all remaining tokens into a string so the user can easily locate the error position. The lexer must keep lexbuf.Lexing.lex_curr_p up to date for the positions to be correct. (source)
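The line and column arithmetic in that handler can be tried on its own. The following is only a sketch: describe_error and example_pos are hypothetical names, and the position record is built by hand to stand in for lexbuf.Lexing.lex_curr_p. Note that lex_curr_p points just past the lexeme; lexbuf.Lexing.lex_start_p gives the position where it starts.

```ocaml
(* Convert a Lexing.position into a line number and a column offset.
   pos_cnum is an absolute offset, so the start of the line (pos_bol)
   has to be subtracted to obtain a column. *)
let line_and_column (pos : Lexing.position) : int * int =
  (pos.Lexing.pos_lnum, pos.Lexing.pos_cnum - pos.Lexing.pos_bol)

(* Hypothetical helper: format the location together with the lexeme. *)
let describe_error (pos : Lexing.position) (lexeme : string) : string =
  let line, col = line_and_column pos in
  Printf.sprintf "line %d, column %d: unexpected %S" line col lexeme

(* Hand-built position standing in for lexbuf.Lexing.lex_curr_p. *)
let example_pos =
  { Lexing.pos_fname = "input.sql"; pos_lnum = 3; pos_bol = 20; pos_cnum = 25 }

let () = print_endline (describe_error example_pos "FROM")
```

In a real handler the lexeme argument would be Lexing.lexeme lexbuf, as in the answer above.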

I think that, similar to yacc, the tokens are stored in variables corresponding to the symbols in your grammar rule. Here since there is one symbol (error), you may be able to simply output $1 using printf, etc.
Edit: responding to comment.
Why do you use an error terminal? I'm reading an ocamlyacc tutorial that says a special error-handling routine is called when a parse error happens. Like so:
3.1.5. The Error Reporting Routine
When the parser function detects a syntax error, it calls a function named parse_error with the string "syntax error" as argument. The default parse_error function does nothing and returns, thus initiating error recovery (see Error Recovery). The user can define a customized parse_error function in the header section of the grammar file such as:
let parse_error s = (* Called by the parser function on error *)
  print_endline s;
  flush stdout
Well, looks like you only get "syntax error" with that function though. Stay tuned for more info.
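Since parse_error only receives the string "syntax error", one possible workaround is to wrap the lexer so that it remembers the last token it handed to the parser. The sketch below is self-contained and makes several assumptions: the token type, make_stub_lexer and stub_parser are hypothetical stand-ins for the generated Parser module and the ocamllex rule, and only Parsing.Parse_error and Lexing.from_string come from the standard library.

```ocaml
(* Hypothetical token type; in a real project it comes from the generated Parser module. *)
type token = Ident of string | Equals | Semicolon | Eof

let string_of_token = function
  | Ident s -> "Ident " ^ s
  | Equals -> "Equals"
  | Semicolon -> "Semicolon"
  | Eof -> "Eof"

(* The most recent token handed to the parser, if any. *)
let last_token : token option ref = ref None

(* Wrap a lexer so that every token is recorded before the parser sees it. *)
let recording (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : token =
  let tok = lexer lexbuf in
  last_token := Some tok;
  tok

(* Stand-in lexer: replays a fixed token list and ignores the buffer contents. *)
let make_stub_lexer (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> Eof
    | tok :: rest -> remaining := rest; tok

(* Stand-in parser: accepts "Ident Equals Ident Semicolon" and raises
   Parsing.Parse_error on anything else. *)
let stub_parser (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : string * string =
  match lexer lexbuf with
  | Ident lhs ->
    (match lexer lexbuf with
     | Equals ->
       (match lexer lexbuf with
        | Ident rhs ->
          (match lexer lexbuf with
           | Semicolon -> (lhs, rhs)
           | _ -> raise Parsing.Parse_error)
        | _ -> raise Parsing.Parse_error)
     | _ -> raise Parsing.Parse_error)
  | _ -> raise Parsing.Parse_error

(* Run the parser and name the last token seen when it fails. *)
let parse_or_report (tokens : token list) : (string * string, string) result =
  last_token := None;
  let lexbuf = Lexing.from_string "" in
  try Ok (stub_parser (recording (make_stub_lexer tokens)) lexbuf)
  with Parsing.Parse_error ->
    let seen =
      match !last_token with
      | Some tok -> string_of_token tok
      | None -> "<no token>"
    in
    Error ("syntax error at token " ^ seen)
```

With a generated parser the call would look something like Parser.start (recording Lexer.token) lexbuf, where those names depend on your grammar. The recorded token is typically the lookahead token that triggered the error.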

Related

How to do proper error handling in BNFC? (C++, Flex, Bison)

I'm making a compiler in BNFC and it's got to a stage where it already compiles some stuff and the code works on my device. But before shipping it, I want my compiler to return proper error messages when the user tries to compile an invalid program.
I found how bison can write errors to the stderr stream, and I'm able to catch those. Now suppose the user's code has no syntax error and just references an undefined variable. I can catch this in my visitor, but I don't know the line number. How can I find it?
In bison you can access the starting and ending position of the current expression using the variable @$, which contains a struct with the members first_column, first_line, last_column and last_line. Similarly @1 etc. contain the same information for the sub-expressions $1 etc. respectively.
In order to have access to the same information later, you need to write it into your ast. So add a field to your AST node types to store the location and then set that field when creating the node in your bison file.
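The idea of carrying locations in the AST does not depend on the implementation language. Here is a minimal sketch of it, written in OCaml rather than C++; the loc record mirrors the four bison fields named above, while the expr type, span and undefined_vars are hypothetical names invented for the illustration.

```ocaml
(* Source span, mirroring the four fields of a bison location. *)
type loc = { first_line : int; first_column : int; last_line : int; last_column : int }

(* Hypothetical expression AST in which every node stores its location. *)
type expr =
  | Int of loc * int
  | Var of loc * string
  | Add of loc * expr * expr

(* A later pass (the "visitor") can now report where an undefined variable occurs. *)
let rec undefined_vars (defined : string list) (e : expr) : string list =
  match e with
  | Int _ -> []
  | Var (l, name) ->
    if List.mem name defined then []
    else
      [Printf.sprintf "line %d, column %d: undefined variable %s"
         l.first_line l.first_column name]
  | Add (_, lhs, rhs) -> undefined_vars defined lhs @ undefined_vars defined rhs

(* Helper to build a single-line span. *)
let span line c1 c2 =
  { first_line = line; first_column = c1; last_line = line; last_column = c2 }

(* AST for "x + y" on line 4, as a parser action would have built it. *)
let example = Add (span 4 1 5, Var (span 4 1 1, "x"), Var (span 4 5 5, "y"))

let () = List.iter print_endline (undefined_vars ["x"] example)
```

In the bison file, the equivalent step is copying the location of the current rule into the node inside each action.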
(The previous answer is more complete.) In some simple parsers, though, it can help to declare
%option yylineno
in flex and print it in yyerror:
void yyerror(const char *s) {
    fprintf(stderr, "ERROR (line %d): before '%s'\n-%s", yylineno, yytext, s);
}

Programmatically load code in sml/nj

I am trying to load an external .sml file, say a.sml, and execute a function (add: int -> int -> int) defined in that file.
I perfectly know how to do this in the interactive shell: use "a.sml";
But how to achieve this in a .sml file? I tried the following:
val doTest =
  let
    val _ = print ("Loading..." ^ "\n")
    val _ = use "a.sml"
    val _ = print ("1 + 2 = " ^ Int.toString (add 1 2) ^ "\n")
  in
    1
  end
But the compiler's reaction is:
test.sml:7.49-7.52 Error: unbound variable or constructor: add
BTW: I know that using the CM is the more appropriate way. But in my case I do not know the file a.sml prior to the compilation.
You can't do this. The compiler must know the types of the functions you are calling at compile time. What you are asking is for SML to load a file at run time (use ...) and subsequently run the code therein. This isn't possible due to the phase distinction; type checking occurs during compilation, after which all type information can be forgotten.
If you're generating code and know the file name, you can still use the CM and compile in two steps using your build system. Then you'd get the type errors from the generated code in the second compilation step. Please describe your situation if such an approach doesn't work for you.

Datalog require field `unlock'

While compiling an OCaml application I get the following error:
File "/tmp/ocamlpp466ee0", line 308, characters 34-233:
Error: Signature mismatch:
...
The field `unlock' is required but not provided
The field `lock' is required but not provided
Command exited with code 2.
My guess is that the error is related to the OCaml library Datalog (I've installed version 0.3 from here), because line 308 of the file /tmp/ocamlpp466ee0 is the first line of the following code:
module Logic = Datalog.Logic.Make(struct
  type t = atom
  let equal = eq_atom
  let hash = hash_atom
  let to_string a = Utils.sprintf "%a" pp_atom a
  let of_string s = atom_of_json (Json.from_string s)
end)
I would really appreciate it if someone could help me find out what I am doing wrong.
Moreover, I would like to understand why the file /tmp/ocamlpp466ee0 is generated each time I execute 'make'. I tried to understand it by reading the Makefile, but I did not succeed.
I think that something has changed in the Datalog library, and that in some version > 0.3 the functor Datalog.Logic.Make requires a module argument that declares the values lock and unlock. So it's a version problem.
As for the temporary file: as you can see, its name consists of the literal ocaml, pp (which stands for preprocessor), and some number. Preprocessors in OCaml usually work this way: they read the input source file and write an output source file. That's why temporary files are created.
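To see how this kind of signature mismatch arises and disappears, here is a small self-contained sketch. Everything in it is hypothetical: SYMBOL and Make are stand-ins written for the illustration and are not Datalog's real signature or functor, and the unit -> unit types for lock and unlock are an assumption.

```ocaml
(* Hypothetical signature, NOT Datalog's real one: it stands in for a functor
   argument that requires lock and unlock in addition to the usual fields. *)
module type SYMBOL = sig
  type t
  val equal : t -> t -> bool
  val hash : t -> int
  val to_string : t -> string
  val lock : unit -> unit
  val unlock : unit -> unit
end

(* Hypothetical functor that brackets an operation with lock/unlock. *)
module Make (S : SYMBOL) = struct
  let describe (x : S.t) : string =
    S.lock ();
    let s = S.to_string x in
    S.unlock ();
    s
end

(* Leaving lock or unlock out of this module makes the functor application
   below fail with a signature mismatch like the one in the question. *)
module StringSymbol = struct
  type t = string
  let equal = String.equal
  let hash = Hashtbl.hash
  let to_string s = s
  let lock () = ()    (* no-op, assuming a single-threaded program *)
  let unlock () = ()  (* no-op, assuming a single-threaded program *)
end

module Logic = Make (StringSymbol)

let () = print_endline (Logic.describe "edge")
```

Whether no-op definitions are acceptable for the real library depends on the signature of the installed version, so check it before copying this pattern.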

OCaml lex: doesn't work at all, whatsoever

I am at the end of my rope here. I cannot get anything to work in ocamllex, and it is driving me nuts. This is my .mll file:
{
open Parser
}
rule next = parse
| (['a'-'z'] ['a'-'z']*) as id { Identifier id }
| '=' { EqualsSign }
| ';' { Semicolon }
| '\n' | ' ' { next lexbuf }
| eof { EOF }
Here are the contents of the file I pass in as input:
a=b;
Yet, when I compile and run the thing, I get an error on the very first character, saying it's not valid. I honestly have no idea what's going on, and Google has not helped me at all. How can this even be possible? As you can see, I'm really stumped here.
EDIT:
I was working for so long that I gave up on the parser. Now this is the relevant code in my main file:
let parse_file filename =
  let l = Lexing.from_channel (open_in filename) in
  try
    Lexer.next l; ()
  with
  | Failure msg ->
    printf "line: %d, col: %d\n" l.lex_curr_p.pos_lnum l.lex_curr_p.pos_cnum
Prints out "line: 1, col: 1".
Without the corresponding ocamlyacc parser, nobody will be able to find the issue with your code since your lexer works perfectly fine!
I have taken the liberty of writing the following tiny parser (parser.mly) that constructs a list of identifier pairs, e.g. input "a=b;" should give the singleton list [("a", "b")].
%{%}
%token <string> Identifier
%token EqualsSign
%token Semicolon
%token EOF
%start start
%type <(string * string) list> start
%%
start:
| EOF {[]}
| Identifier EqualsSign Identifier Semicolon start {($1, $3) :: $5}
;
%%
To test whether the parser does what I promised, we create another file (main.ml) that parses the string "a=b;" and prints the result.
let print_list = List.iter (fun (a, b) -> Printf.printf "%s = %s;\n" a b)
let () = print_list (Parser.start Lexer.next (Lexing.from_string "a=b;"))
The code should compile (e.g. ocamlbuild main.byte) without any complaints, and the program should output "a = b;" as promised.
In response to the latest edit:
In general, I don't believe that catching standard library exceptions that are meant to indicate failure or misuse (like Invalid_argument or Failure) is a good idea. The reason is that they are used ubiquitously throughout the library such that you usually cannot tell which function raised the exception and why it did so.
Furthermore, you are throwing away the only useful information: the error message! The error message should tell you what the source of the problem is (my best guess is an IO-related issue). Thus, you should either print the error message or let the exception propagate to the toplevel. Personally, I prefer the latter option.
However, you probably still want to deal with syntactically ill-formed inputs in a graceful manner. For this, you can define a new exception in the lexer and add a default case that catches invalid tokens.
{
exception Unexpected_token
}
...
| _ {raise Unexpected_token}
Now, you can catch the newly defined exception in your main file and, unlike before, the exception is specific to syntactically invalid inputs. Consequently, you know both the source and the cause of the exception giving you the chance to do something far more meaningful than before.
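To make the pattern concrete without needing ocamllex, here is a sketch in which a hand-written scanner stands in for the generated lexer. The names are assumptions: Unexpected_char is a variant of the Unexpected_token exception above that also carries the offending character and its offset, and next and tokenize are hypothetical.

```ocaml
type token = Identifier of string | EqualsSign | Semicolon | EOF

(* Dedicated exception carrying the offending character and its offset. *)
exception Unexpected_char of char * int

(* Hand-written stand-in for the ocamllex rule: returns the token found at
   or after [pos] together with the offset just past it. *)
let rec next (s : string) (pos : int) : token * int =
  if pos >= String.length s then (EOF, pos)
  else
    match s.[pos] with
    | ' ' | '\n' -> next s (pos + 1)
    | '=' -> (EqualsSign, pos + 1)
    | ';' -> (Semicolon, pos + 1)
    | 'a' .. 'z' ->
      let stop = ref pos in
      while !stop < String.length s && s.[!stop] >= 'a' && s.[!stop] <= 'z' do
        incr stop
      done;
      (Identifier (String.sub s pos (!stop - pos)), !stop)
    | c -> raise (Unexpected_char (c, pos))

(* Tokenize the whole input, converting only the lexer's own exception
   into a message; any other exception still propagates. *)
let tokenize (s : string) : (token list, string) result =
  let rec loop pos acc =
    match next s pos with
    | (EOF, _) -> List.rev (EOF :: acc)
    | (tok, pos') -> loop pos' (tok :: acc)
  in
  try Ok (loop 0 [])
  with Unexpected_char (c, pos) ->
    Error (Printf.sprintf "unexpected character %C at offset %d" c pos)
```

Because the handler matches only Unexpected_char, an unrelated Failure or Sys_error is not silently swallowed.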
A fairly random OCaml development hint: If you compile the program with debug information enabled, setting the environment variable OCAMLRUNPARAM to "b" (e.g. export OCAMLRUNPARAM=b) enables stack traces for uncaught exceptions!
By the way, ocamllex also supports the + operator for 'one or more' in regular expressions, so this
['a'-'z']+
is equivalent to your
['a'-'z']['a'-'z']*
I was just struggling with the same thing (which is how I found this question), only to finally realize that I had mistakenly specified the path to the input file as Sys.argv.(0) instead of Sys.argv.(1)! LOLs
I really hope it helps! :)
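For reference, Sys.argv.(0) is the name of the program itself and the first real argument is at index 1. A small sketch of a guard against that mistake, where input_path is a hypothetical helper name:

```ocaml
(* Pick the input path from an argv-style array: index 0 is the program
   itself, so the first user-supplied argument lives at index 1. *)
let input_path (argv : string array) : string option =
  if Array.length argv > 1 then Some argv.(1) else None

let () =
  match input_path Sys.argv with
  | Some path -> Printf.printf "reading %s\n" path
  | None -> Printf.printf "usage: %s <input-file>\n" Sys.argv.(0)
```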
It looks like you have a space in the regular expression for identifiers. This could keep the lexer from recognizing a=b, although it should still recognize a = b ;

How to detect eof in ml-lex

When writing code in ml-lex, we need to write the eof function:
val eof = fn () => EOF;
Is this a necessary part to write? Also, if I want my lexer to stop when it detects an eof, what should I add to this function?
Thanks.
The User’s Guide to ML-Lex and ML-Yacc by Roger Price is great for learning ml-lex and ml-yacc.
The eof function is mandatory in the user declarations part of your lex definition, together with the lexresult type:
The function eof is called by the lexer when the end of the input stream is reached.
Your eof function can either raise an exception, if that is appropriate for your application, or return the EOF token. Either way, it has to return something of type lexresult. There is an example in chapter 7.1.2 of the user guide which prints a string if EOF was in the middle of a block comment.
I use a somewhat "simpler" eof function
structure T = Tokens
structure C = SourceData.Comments
fun eof data =
  if C.depth data = 0 then
    T.EOF (~1, ~1)
  else
    fail (C.start data) "Unclosed comment"
where the C structure is a "special" comment handling structure that counts the number of opening and closing comments. If the current depth is 0, it returns the EOF token, where (~1, ~1) is used to indicate the left and right position. As I don't use this position information for EOF, I just set it to (~1, ~1).
Normally you would then set %eop (end of parse) to use the EOF token in the yacc file, to indicate that whatever start symbol is used, it may be followed by the EOF token. Also remember to add EOF to %noshift. See section 9.4.5 for %eop and %noshift.
Obviously you have to define EOF in the %term declaration of your yacc file as well.
Hope this helps, else take a look at an MLB parser or an SML parser written in ml-lex and ml-yacc. The MLB parser is the simplest and thus might be easier to understand.