verbose error with ocamlyacc - ocaml

In bison, it is sufficient to add
%error-verbose
to the grammar file (or %define parse.error verbose in recent versions) to make the parser's error messages more verbose. Is there any way to get similar functionality with ocamlyacc?
I found an answer to a similar question, but I could not make anything out of it. This is how I call the lexer and parser functions:
let rec foo () =
  try
    let line = input_line stdin in
    (try
       let _ = (Parser.latexstatement lexer_token_safe (Lexing.from_string line)) in
       print_string ("SUCCESS\n")
     with
       LexerException s -> print_string ("$L" ^ line ^ "\n")
     | Parsing.Parse_error -> print_string ("$P" ^ line ^ "\n")
     | _ -> print_string ("$S " ^ line ^ "\n"));
    flush stdout;
    foo ();
  with
    End_of_file -> ()
;;
foo ();;

I don't think there is an option in ocamlyacc that does what you want automatically, so below is a thorough description of what can be done to handle syntax errors and produce more useful messages. It may not be exactly what you asked for.
Errors must be separated into lexical errors and parse errors, depending on the stage of the parsing process in which they happen:
- in mll files, a Failure exception (with the message "lexing: empty token") is raised when the input matches none of the patterns;
- in mly files, it is a Parsing.Parse_error exception that is raised.
So you have several options:
- let the lexer and parser code raise their exceptions, and catch them in the code calling them;
- handle specific error cases in either of them, with
  - a catch-all rule in the lexer (or more specific patterns if necessary);
  - the special error terminal in the parser rules, to catch errors in specific places.
In either case, you will need functions that report the position of the error in the source.
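Lexing and Parsing both use a position record, Lexing.position, with the following fields:
- pos_fname: the name of the file being processed
- pos_lnum: the line number in the file
- pos_bol: the offset, in characters from the start of the file, of the beginning of the current line
- pos_cnum: the offset, in characters from the start of the file, of the current position
Note that the lexer does not update pos_lnum by itself: the rule that matches a newline has to call Lexing.new_line lexbuf.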
The lexbuf used by the lexer holds two such positions, tracking the start and the end of the token currently being lexed (its lex_start_p and lex_curr_p fields, which Lexing.lexeme_start_p and Lexing.lexeme_end_p give you access to). The parser, for its part, tracks the positions of the symbol (non-terminal) about to be reduced and of the items of the current rule, which can be retrieved with the Parsing functions symbol_start_pos and symbol_end_pos, and rhs_start_pos and rhs_end_pos.
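To make the field arithmetic concrete, here is a small sketch using only the standard Lexing module: the column is pos_cnum - pos_bol, and pos_lnum only advances when Lexing.new_line is called. The helper names column and describe, and the file name "input.tex", are made up for illustration.

```ocaml
(* Column of a position: its offset from the beginning of its line *)
let column (p : Lexing.position) = p.Lexing.pos_cnum - p.Lexing.pos_bol

(* Render a position as file:line:column *)
let describe (p : Lexing.position) =
  Printf.sprintf "%s:%d:%d" p.Lexing.pos_fname p.Lexing.pos_lnum (column p)

let () =
  (* A hand-built position: line 3 starts at offset 40, and we are at offset 47 *)
  let p = { Lexing.pos_fname = "input.tex"; pos_lnum = 3; pos_bol = 40; pos_cnum = 47 } in
  print_endline (describe p);
  (* A fresh lexbuf starts on line 1; Lexing.new_line moves it to the next line *)
  let lexbuf = Lexing.from_string "a\nb" in
  Lexing.new_line lexbuf;
  print_endline (describe lexbuf.Lexing.lex_curr_p)
```

This prints input.tex:3:7, then :2:0 (a lexbuf built from a string has an empty file name).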
Here are a few functions that raise more detailed exceptions:
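open Lexing
(* Put these in their own module (say, a hypothetical errors.ml) and call them qualified
   from the .mly: a parse_error visible in the .mly header replaces ocamlyacc's
   string -> unit error hook and will not typecheck. *)
exception LexErr of string
exception ParseErr of string
(* Format msg with the line and the column range of the offending text *)
let error msg start finish =
  Printf.sprintf "(line %d: char %d..%d): %s" start.pos_lnum
    (start.pos_cnum - start.pos_bol) (finish.pos_cnum - finish.pos_bol) msg
(* Report the unexpected lexeme together with its position *)
let lex_error lexbuf =
  raise (LexErr (error (lexeme lexbuf) (lexeme_start_p lexbuf) (lexeme_end_p lexbuf)))
(* Report an error located at the nterm-th item of the rule being reduced *)
let parse_error msg nterm =
  raise (ParseErr (error msg (Parsing.rhs_start_pos nterm) (Parsing.rhs_end_pos nterm)))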
And a basic use case:
parser:
/* ... */
wsorword:
  WS { $1 }
| WORD { $1 }
| error { parse_error "wsorword" 1 } /* parse_error raises, so it fits any rule type */
;
lexer:
rule lexer = parse
(* ... *)
(* catch all pattern *)
| _ { lex_error lexbuf }
All that is left to do is to modify your top-level function to catch these exceptions and process them.
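A minimal sketch of such a top-level loop. The LexErr and ParseErr exceptions are re-declared here only so the sketch compiles on its own, and the report strings are illustrative. To stay self-contained it takes the parsing function as an argument; in the question's program that would be something like fun line -> Parser.latexstatement lexer_token_safe (Lexing.from_string line). Using match ... with exception (OCaml 4.02 or later) keeps the recursive call outside any handler, so the loop is tail-recursive, unlike foo, where the recursive call sits inside try.

```ocaml
exception LexErr of string
exception ParseErr of string

(* Classify the outcome of parsing one line; [parse] runs the lexer and parser *)
let report parse line =
  match parse line with
  | _ -> "SUCCESS"
  | exception LexErr msg -> "lexical error " ^ msg
  | exception ParseErr msg -> "syntax error " ^ msg
  | exception Parsing.Parse_error -> "syntax error (no position available)"
  | exception Failure msg -> "lexer failure: " ^ msg

(* Read stdin line by line; the recursive call is not inside a handler *)
let rec loop parse =
  match input_line stdin with
  | line ->
      print_endline (report parse line);
      loop parse
  | exception End_of_file -> ()
```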
Finally, for debugging purposes, the Parsing module provides a set_trace function that makes the parsing engine print messages about its state machine: it traces all the internal state changes of the automaton.

In the Parsing module there is the function Parsing.set_trace that will do just that. You can enable it with ignore (Parsing.set_trace true); it takes an OCaml bool and returns the previous setting. You can also run ocamlyacc with the -v option, and it will write a .output file listing all the states and transitions.
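Since Parsing.set_trace returns the previous value of the flag, a small wrapper can enable tracing around a single parse and restore the old setting afterwards, even if the parser raises. A sketch, assuming OCaml 4.08 or later for Fun.protect; the name with_parser_trace is made up:

```ocaml
(* Run [f ()] with the ocamlyacc automaton trace enabled, then restore the previous flag *)
let with_parser_trace f =
  let previous = Parsing.set_trace true in
  Fun.protect ~finally:(fun () -> ignore (Parsing.set_trace previous)) f
```

For example, with_parser_trace (fun () -> Parser.latexstatement lexer_token_safe (Lexing.from_string line)) would trace only that one call.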

Related

perl6 Need help to understand more about proto regex/token/rule

The following code is taken from the Perl 6 documentation, and I am trying to learn more about it before more experimentation:
proto token command {*}
token command:sym<create> { <sym> }
token command:sym<retrieve> { <sym> }
token command:sym<update> { <sym> }
token command:sym<delete> { <sym> }
Is the * in the first line a whatever-star? Can it be something else, such as
proto token command { /give me an apple/ }
Can "sym" be something else, such as
command:eat<apple> { <eat> } ?
{*} tells the runtime to call the correct candidate.
Rather than forcing you to write {{*}} for the common case of just calling the correct candidate, the compiler lets you shorten it to {*}.
That is the case for all proto routines like sub, method, regex, token, and rule.
In the case of the regex proto routines, only a bare {*} is allowed.
The main reason is probably because no-one has really come up with a good way to make it work sensibly in the regex sub-language.
So here is an example of a proto sub that does some things that are common to all of the candidates.
#! /usr/bin/env perl6
use v6.c;
for @*ARGS { $_ = '--stdin' when '-' }
# find out the number of bytes
proto sub MAIN (|) {
  try {
    # {*} calls the correct multi
    # then we get the number of elems from its result
    # and try to say it
    say {*}.elems # <-------------
  }
  # if {*} returns a Failure note the error message to $*ERR
  or note $!.message;
}
#| the number of bytes on the clipboard
multi sub MAIN () {
  xclip
}
#| the number of bytes in a file
multi sub MAIN ( Str $filename ){
  $filename.IO.slurp(:!chomp,:bin)
}
#| the number of bytes from stdin
multi sub MAIN ( Bool :stdin($)! ){
  $*IN.slurp-rest(:bin)
}
sub xclip () {
  run( «xclip -o», :out )
    .out.slurp-rest( :bin, :close );
}
This answers your second question. Yes, it's late.
You have to distinguish two different syms (or eats). The one that's on the definition of the token as an "adverb" (or extended syntax identifier, whatever you want to call it), and the one that's on the token itself.
If you use <eat> in the token body, Perl 6 will simply not find it. You will get an error like
No such method 'eat' for invocant of type 'Foo'
Where Foo would be the name of the grammar. <sym> is a predefined token, which matches the value of the adverb (or pair value) in the token.
You could, in principle, use the extended syntax to define a multi token (or rule, or regex). However, if you try to define it in this way, you will get a different error:
Can only use <sym> token in a proto regex
So, the answer to your second question is no, and no.

Error: Camlp4: Uncaught exception: Not_found

I am working on an Ocsigen example (http://ocsigen.org/tuto/manual/macaque).
I get an error when trying to compile the program, as follows.
File "testDB.ml", line 15, characters 14-81 (end at line 18, character 4):
While finding quotation "table" in a position of "expr":
Available quotation expanders are:
svglist (in a position of expr)
svg (in a position of expr)
html5list (in a position of expr)
html5 (in a position of expr)
xhtmllist (in a position of expr)
xhtml (in a position of expr)
Camlp4: Uncaught exception: Not_found
My code is:
module Lwt_thread = struct
  include Lwt
  include Lwt_chan
end
module Lwt_PGOCaml = PGOCaml_generic.Make(Lwt_thread)
module Lwt_Query = Query.Make_with_Db(Lwt_thread)(Lwt_PGOCaml)
let get_db : unit -> unit Lwt_PGOCaml.t Lwt.t =
  let db_handler = ref None in
  fun () ->
    match !db_handler with
    | Some h -> Lwt.return h
    | None -> Lwt_PGOCaml.connect ~database:"testbase" ()
let table = <:table< users (
  login text NOT NULL,
  password text NOT NULL
) >>
..........
I used eliom-destillery to generate the basic files.
I used "make" to compile the program.
I've tried many different things and done a google search but I can't figure out the problem. Any hints are greatly appreciated.
Generally speaking, the error message indicates that CamlP4 does not know the quotation you used, here table, which appears in your code as <:table< ... >>. Quotations are added by CamlP4 extension modules (pa_xxx.cmo or pa_xxx.cma). Unless you misspelled the quotation name, you did not load the extension that provides it to CamlP4.
According to http://ocsigen.org/tuto/manual/macaque , Macaque (or one of its underlying libraries; I am not sure, since I have never used it) provides the table quotation, so you have to instruct CamlP4 to load the corresponding extension. I believe the vanilla eliom-destillery setup is the minimum for basic Eliom programming and does not include the extensions Macaque needs.
In fact, the tutorial at http://ocsigen.org/tuto/manual/macaque points this out:
We need to reference macaque in the Makefile :
SERVER_PACKAGE := macaque.syntax
This should be the CamlP4 syntax extension name required for table.

Pgocaml customizing sql queries

I am trying to write a query that simply drops a table.
let drop_table dbh table_name =
let query = String.concat " " ["drop table"; table_name] in
PGSQL(dbh) query
I am receiving the following error from the query
File "save.ml", line 37, characters 10-11:
Parse error: STRING _ expected after ")" (in [expr])
File "save.ml", line 1:
Error: Preprocessor error
Why am I getting this error? It appears that this function is valid Ocaml syntax.
Thanks guys!
You cannot construct the query when using PG'OCaml's syntax extension; you must provide a literal string. This is the tradeoff for getting PG'OCaml's compile-time query validation: if query could be any OCaml expression, PG'OCaml would not know how to validate it at compile time.
Personally, I've stopped using the syntax extension completely; my feeling is that it doesn't scale to large projects. Instead, I call prepare and execute directly. For example, this function creates a new database connection (assuming the connection parameters host, user, database, port and password are defined beforehand), runs the given query, and closes the connection:
let exec query =
  let db = PGOCaml.connect ~host ~user ~database ~port ~password () in
  (* prepare the query, then execute it with an empty parameter list *)
  PGOCaml.prepare db ~query ();
  let ans = PGOCaml.execute db ~params:[] () in
  PGOCaml.close db;
  ans
Of course, this isn't a robust implementation and shouldn't be used in production code. It doesn't handle errors and isn't asynchronous.
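As for the table name in the original question: PostgreSQL cannot bind an identifier as a query parameter, so a DROP TABLE statement has to be built as a string, and it is safer to quote the name first. A minimal sketch, independent of PG'OCaml; quote_ident and drop_table_query are hypothetical helper names. The quoting follows PostgreSQL's rule of wrapping the identifier in double quotes and doubling any embedded double quote. Keep in mind that a quoted identifier is case-sensitive.

```ocaml
(* Quote a PostgreSQL identifier: wrap it in double quotes, doubling any embedded quote *)
let quote_ident name =
  let buf = Buffer.create (String.length name + 2) in
  Buffer.add_char buf '"';
  String.iter
    (fun c -> if c = '"' then Buffer.add_string buf "\"\"" else Buffer.add_char buf c)
    name;
  Buffer.add_char buf '"';
  Buffer.contents buf

(* Build the statement text for dropping one table *)
let drop_table_query table_name = "DROP TABLE " ^ quote_ident table_name
```

With the exec function from the answer, this would be used as exec (drop_table_query "users").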

Feed ocamlyacc parser from explicit token list?

Is it possible to feed an OCamlYacc-generated parser an explicit token list for analysis?
I'd like to use OCamlLex to explicitly generate a token list which I then analyze using a Yacc-generated parser later. However, the standard use case generates a parser that calls a lexer implicitly for the next token. Here tokens are computed during the yacc analysis rather than before. Conceptually a parser should only work on tokens but a Yacc-generated parser provides an interface that relies on a lexer which in my case I don't need.
As Jeffrey already mentioned, Menhir offers, as part of its runtime library, a module that lets you drive parsers with any kind of token stream (it just asks for a unit -> token function): MenhirLib.Convert.
(You could even use this code without using Menhir, with ocamlyacc instead. In practice the conversion is not terribly complicated so you could even re-implement it yourself.)
If you already have a list of tokens, you can go the ugly way and ignore the lexing buffer altogether. After all, the lexbuf -> token function that your parser expects is allowed to be impure:
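(* assumes Parser is opened, so that EOF refers to Parser.EOF *)
let my_tokens = ref [ (* WHATEVER *) ]
(* hand out the stored tokens one by one, ignoring the lexbuf *)
let token _lexbuf =
  match !my_tokens with
  | [] -> EOF
  | h :: t -> my_tokens := t ; h
let ast = Parser.parse token (Lexing.from_string "")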
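A variant of the same trick that avoids the global reference: a function that turns any token list into a lexer-shaped closure. This is a sketch; the token type below is a hypothetical stand-in for the one ocamlyacc generates in parser.mli, and in real code you would drop it and use the parser's own constructors.

```ocaml
(* Hypothetical stand-in for the token type generated by ocamlyacc *)
type token = WORD of string | WS | EOF

(* Return a function of the [Lexing.lexbuf -> token] shape the parser expects;
   it ignores the lexbuf, hands out the given tokens, then returns EOF forever *)
let lexer_of_list (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | tok :: rest ->
        remaining := rest;
        tok
```

It would be used as Parser.parse (lexer_of_list my_tokens) (Lexing.from_string ""). Since the dummy lexbuf never moves, positions obtained through Parsing.rhs_start_pos and friends will not be meaningful with this approach.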
On the other hand, it looks from your comments that you actually have a function of type Lexing.lexbuf -> token list that you're trying to fit into the Lexing.lexbuf -> token signature of your parser. If that is the case, you can easily use a queue to write a converter between the two types:
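(* adapt a lexbuf -> token list function to the lexbuf -> token shape the parser expects *)
let deflate token =
  let q = Queue.create () in
  fun lexbuf ->
    (* serve tokens left over from a previous call first *)
    if not (Queue.is_empty q) then Queue.pop q else
      match token lexbuf with
      | [ ] -> EOF
      | [tok] -> tok
      | hd :: t -> List.iter (fun tok -> Queue.add tok q) t ; hd
let ast = Parser.parse (deflate my_lexer) lexbuf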
The OCamlYacc interface does look pretty complicated; it seems to require a Lexing.lexbuf. Maybe you could consider using Lexing.from_string to feed a fixed string rather than a fixed sequence of tokens. You could also look at Menhir. I haven't used it, but it gets excellent reviews here whenever anybody mentions OCaml parser generators. It might have a more flexible lexing interface.

Suppress "val it" output in Standard ML

I'm writing a "script" in Standard ML (SML/NJ) that sets up the interactive environment to my liking. The last thing the script does is print out a message indicating everything went smoothly. Essentially, the last line is this:
print "SML is ready.\n";
When I run the script, all goes well but the SML interpreter displays the return value from the print function.
SML is ready.
val it = () : unit
-
Since I'm merely printing something to the screen, how can I suppress the "val it = () : unit" output so that all I see is the "SML is ready" message followed by the interpreter prompt?
To suppress the SML/NJ prompt and response, use the following assignment:
Compiler.Control.Print.out := {say=fn _=>(), flush=fn()=>()};
print "I don't show my type";
I don't show my type
Although I don't see what is wrong with the REPL showing the type that print returns.
The say function controls what is printed out.
There is a larger example in the following SML/NJ notes http://www.cs.cornell.edu/riccardo/prog-smlnj/notes-011001.pdf
The useSilently function can be used to load a file without displaying any output associated with the loading:
fun useSilently (s) = let
    val saved = !Compiler.Control.Print.out
    fun done () = Compiler.Control.Print.out := saved
  in
    Compiler.Control.Print.out := {say = fn _ => (), flush = fn () => ()};
    (use (s); done ()) handle _ => done ()
  end
This essentially replaces the say function with one that does nothing, then restores the original at the end.
Use this:
val _ = print "I don't show my type";
In Moscow ML you can run the REPL without declaration output with
mosml -quietdec file.sml