Is it possible to feed an OCamlYacc-generated parser an explicit token list for analysis?
I'd like to use OCamlLex to explicitly generate a token list which I then analyze using a Yacc-generated parser later. However, the standard use case generates a parser that calls a lexer implicitly for the next token. Here tokens are computed during the yacc analysis rather than before. Conceptually a parser should only work on tokens but a Yacc-generated parser provides an interface that relies on a lexer which in my case I don't need.
As already mentioned by Jeffrey, Menhir specifically offers, as part of its runtime library, a module for using its parsers with any kind of token stream (it just asks for a unit -> token function): MenhirLib.Convert.
(You could even use this code without using Menhir, with ocamlyacc instead. In practice the conversion is not terribly complicated so you could even re-implement it yourself.)
If you already have a list of tokens, you can just go the ugly way and ignore the lexing buffer altogether. After all, the parse-from-lexbuf function that your parser expects is a non-pure function:
let my_tokens = ref [ (* WHATEVER *) ]
let token lexbuf =
  match !my_tokens with
  | [] -> EOF
  | h :: t -> my_tokens := t ; h
let ast = Parser.parse token (Lexing.from_string "")
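If the global reference feels too ugly, the same trick can be wrapped in a closure so that the token list stays private to one parse. This is only a minimal sketch: the token type below is a stand-in for the one ocamlyacc generates in Parser, and of_list, dummy_lexbuf and next are hypothetical names used for illustration.

```ocaml
(* Stand-in for the token type that ocamlyacc would generate in Parser. *)
type token = INT of int | PLUS | EOF

(* of_list (hypothetical helper) turns a token list into the
   Lexing.lexbuf -> token function that the generated parser expects.
   The lexbuf argument is ignored; EOF is returned once the list is empty. *)
let of_list (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | tok :: rest ->
      remaining := rest;
      tok

(* A dummy buffer: the parser requires one, but it is never read. *)
let dummy_lexbuf = Lexing.from_string ""

(* Demo value: a token source for the list INT 1, PLUS, INT 2. *)
let next = of_list [INT 1; PLUS; INT 2]
```

With a real grammar this would be called as Parser.parse (of_list my_tokens) dummy_lexbuf. Since the buffer never advances, any positions the parser reports are likely to be meaningless.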
On the other hand, it looks from your comments that you actually have a function of type Lexing.lexbuf -> token list that you're trying to fit into the Lexing.lexbuf -> token signature of your parser. If that is the case, you can easily use a queue to write a converter between the two types:
let deflate token =
  let q = Queue.create () in
  fun lexbuf ->
    if not (Queue.is_empty q) then Queue.pop q else
      match token lexbuf with
      | [ ] -> EOF
      | [tok] -> tok
      | hd::t -> List.iter (fun tok -> Queue.add tok q) t ; hd
let ast = Parser.parse (deflate my_lexer) lexbuf
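To see the converter at work without a generated parser, here is a small self-contained sketch. The token type, batch_lexer and collect are hypothetical stand-ins: batch_lexer plays the role of a Lexing.lexbuf -> token list lexer, and collect plays the role of the parser pulling one token at a time.

```ocaml
(* Stand-in token type; the real one comes from the generated Parser module. *)
type token = WORD of string | EOF

(* Same converter as above, repeated so that the sketch compiles on its own. *)
let deflate (token : Lexing.lexbuf -> token list) : Lexing.lexbuf -> token =
  let q = Queue.create () in
  fun lexbuf ->
    if not (Queue.is_empty q) then Queue.pop q
    else
      match token lexbuf with
      | [] -> EOF
      | [tok] -> tok
      | hd :: tl -> List.iter (fun tok -> Queue.add tok q) tl; hd

(* Hypothetical batch lexer: hands out pre-built batches and ignores the lexbuf. *)
let batch_lexer (batches : token list list) : Lexing.lexbuf -> token list =
  let pending = ref batches in
  fun _lexbuf ->
    match !pending with
    | [] -> []
    | batch :: rest -> pending := rest; batch

(* Pull tokens one by one, as a parser would, until EOF shows up. *)
let collect (next : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : token list =
  let rec go acc =
    match next lexbuf with
    | EOF -> List.rev acc
    | tok -> go (tok :: acc)
  in
  go []

let tokens =
  let lexbuf = Lexing.from_string "" in
  collect (deflate (batch_lexer [[WORD "a"; WORD "b"]; [WORD "c"]])) lexbuf
```

Note that deflate treats an empty batch as end of input, so a lexer that can return [] for, say, a whitespace-only chunk would need that case handled differently.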
The OCamlYacc interface does look pretty complicated; it seems to require a Lexing.lexbuf. Maybe you could consider using Lexing.from_string to feed a fixed string rather than a fixed sequence of tokens. You could also look at Menhir. I haven't used it, but it gets excellent reviews here whenever anybody mentions OCaml parser generators. It might have a more flexible lexing interface.
I was using QuasiQuotations in Yesod, and everything worked fine. BUT my file became very large and not nice to look at. Also, my text editor does not highlight this syntax correctly. That is why I split my files like so:
getHomeR :: Handler Html
getHomeR = do
    webSockets chatApp
    defaultLayout $ do
        $(luciusFile "templates/chat.lucius")
        $(juliusFile "templates/chat.julius")
        $(hamletFile "templates/chat.hamlet")
If this is wrong, please do tell. Doing runghc myFile.hs throws many errors like this:
chat_new.hs:115:9:
Couldn't match expected type ‘t0 -> Css’
with actual type ‘WidgetT App IO a0’
The lambda expression ‘\ _render_ajFK
-> (shakespeare-2.0.7:Text.Css.CssNoWhitespace . (foldr ($) ...))
...’
has one argument,
but its type ‘WidgetT App IO a0’ has none
In a stmt of a 'do' block:
\ _render_ajFK
...
And this.
chat_new.hs:116:9:
Couldn't match type ‘(url0 -> [(Text, Text)] -> Text)
-> Javascript’
with ‘WidgetT App IO a1’
Expected type: WidgetT App IO a1
Actual type: JavascriptUrl url0
Probable cause: ‘asJavascriptUrl’ is applied to too few arguments
...
And also one for the HTML (Hamlet).
Thus, one per template.
It seems that hamletFile and the others treat templates as self-contained, while yours reference something from each other. You can play with the order of the *File calls, or use one of the widgetFile* functions from the Yesod.Default.Util module:
$(widgetFileNoReload def "chat")
The Reload variant is useful for development: it makes yesod devel watch for file changes and reload them.
This is an example in native CVC language:
isblue: STRING -> BOOLEAN;
ASSERT isblue("sky");
ASSERT isblue("water");
QUERY isblue("sky"); //valid
QUERY isblue("pig"); //invalid
How would I write it using the C++ API for CVC4? I couldn't find any documentation on how to do this.
There are some API examples in the source distribution that might help you. In particular, examples/api/combination.cpp creates some functions and predicates and makes some assertions:
https://github.com/CVC4/CVC4/blob/master/examples/api/combination.cpp
In your case, you'll create a predicate type with ExprManager::mkFunctionType(), then you construct an "isblue" predicate with ExprManager::mkVar() giving it that type. It will look something like this (assuming you've done "using namespace CVC4" and #included <cvc4/cvc4.h>):
ExprManager em;
SmtEngine smt(&em);
Type predType = em.mkFunctionType(em.stringType(), em.booleanType());
Expr isblue = em.mkVar(predType);
Then you can assert and query applications of your predicate:
smt.assertFormula(em.mkExpr(kind::APPLY_UF, isblue, em.mkConst(String("sky"))));
smt.assertFormula(em.mkExpr(kind::APPLY_UF, isblue, em.mkConst(String("water"))));
smt.query(em.mkExpr(kind::APPLY_UF, isblue, em.mkConst(String("sky"))));
smt.query(em.mkExpr(kind::APPLY_UF, isblue, em.mkConst(String("pig"))));
I'm trying to fix the following VBA statement (converting some old code just for fun and to learn Roslyn, not at all looking for anything perfect) to remove the Set keyword so it's a valid VB.NET statement:
Set f = New Foo()
When I look at it through the Syntax Visualizer, I see it turns into trailing trivia.
I'm trying to figure out how to find it using a query. I tried several approaches but all of the following came up empty:
var attempt1 = root.DescendantTokens().Where(t=>t.IsKind(SyntaxKind.SkippedTokensTrivia));
var attempt2 = root.DescendantTokens().Where(t => t.IsKind(SyntaxKind.SetKeyword));
var attempt3 = root.DescendantTrivia().Where(t => t.IsKind(SyntaxKind.SetKeyword));
var attempt4 = root.DescendantNodes()
.OfType<EmptyStatementSyntax>()
.Where(e => e.DescendantTokens().Any(t => t.IsKeyword()));
(Yes, I'm using C# to work with a VisualBasicSyntaxTree)
I can't seem to just find the SetKeyword token that appears in the visualizer, so I thought maybe it's doing some more heavy lifting to piece together what it really is (is that what's meant by structured trivia?). I read something in the documentation that mentioned the compiler can choose to represent it a couple of different ways, so I thought that may be what's going on here.
The query was just the first thing I tried, but in reality I have a SyntaxRewriter I'm using to visit the code to find and fix all such problems (I'm already able to fix missing parentheses around ArgumentLists, for example) but in this case I can't seem to figure out which Visit method to override.
So again, 1) how to query for these from the root and 2) the best override to select from a rewriter. I've been beating my face on the keyboard for two days on this which exponentially increases the likelihood that I'm having a cranio/recto-insertion moment and I need one of you kind souls to pull me out of it.
Cheers!
Brian
Edit: Fixed typo in query attempt1
So it appears that when the compiler reaches an error condition, it will skip all tokens up to the next point where it can recover and continue parsing (the end of the line in this case). The node representing this error condition is an EmptyStatement with trailing syntax trivia containing the rest of the text as parsed tokens.
So if you're going to rewrite a node, you'll want to rewrite EmptyStatements. But you don't want to rewrite just any empty statement, only the ones with the "BC30807" diagnostic code.
public override SyntaxNode VisitEmptyStatement(EmptyStatementSyntax node)
{
    var diagnostic = GetLetSetDiagnostic(node);
    if (diagnostic == null)
        return base.VisitEmptyStatement(node);
    return RewriteLetSetStatement(node);
}
private Diagnostic GetLetSetDiagnostic(EmptyStatementSyntax node)
{
    //'Let' and 'Set' assignment statements are no longer supported.
    const string code = "BC30807";
    return node.GetDiagnostics().SingleOrDefault(n => n.Id == code);
}
The implementation of the RewriteLetSetStatement() method is a bit of a mystery to me. I'm not sure how it can be implemented using the compiler services effectively, and I don't think this is a use case they cover well. The trivia retains the parsed tokens, but there's not much you can do with those tokens AFAIK.
Ideally, we'd just drop the Set token from the skipped tokens and hand the rest back to the parser to be reparsed. As far as I can tell, that's not possible: we can only parse from text.
So I guess the next best thing is to take the text, rewrite it to remove the Set, and parse the text again.
private SyntaxNode RewriteLetSetStatement(EmptyStatementSyntax node)
{
    // Collect the leading Let/Set keywords from the skipped tokens.
    var letSetTokens = node.GetTrailingTrivia()
        .Where(triv => triv.IsKind(SyntaxKind.SkippedTokensTrivia))
        .SelectMany(triv => triv.GetStructure().ChildTokens())
        .TakeWhile(tok => tok.IsKind(SyntaxKind.LetKeyword)
                       || tok.IsKind(SyntaxKind.SetKeyword));
    var span = new RelativeTextSpan(node.FullSpan);
    var newText = node.GetText().WithChanges(
        // replacement spans must be relative to the text
        letSetTokens.Select(tok => new TextChange(span.GetSpan(tok.Span), ""))
    );
    return SyntaxFactory.ParseExecutableStatement(newText.ToString());
}
private class RelativeTextSpan
{
    private readonly TextSpan span;
    public RelativeTextSpan(TextSpan span)
    {
        this.span = span;
    }
    // Translate a span in the tree into a span relative to the node's own text.
    public TextSpan GetSpan(TextSpan token)
    {
        return new TextSpan(token.Start - span.Start, token.Length);
    }
}
I am working on an Ocsigen example (http://ocsigen.org/tuto/manual/macaque).
I get an error when trying to compile the program, as follows.
File "testDB.ml", line 15, characters 14-81 (end at line 18, character 4):
While finding quotation "table" in a position of "expr":
Available quotation expanders are:
svglist (in a position of expr)
svg (in a position of expr)
html5list (in a position of expr)
html5 (in a position of expr)
xhtmllist (in a position of expr)
xhtml (in a position of expr)
Camlp4: Uncaught exception: Not_found
My code is:
module Lwt_thread = struct
  include Lwt
  include Lwt_chan
end
module Lwt_PGOCaml = PGOCaml_generic.Make(Lwt_thread)
module Lwt_Query = Query.Make_with_Db(Lwt_thread)(Lwt_PGOCaml)
let get_db : unit -> unit Lwt_PGOCaml.t Lwt.t =
  let db_handler = ref None in
  fun () ->
    match !db_handler with
    | Some h -> Lwt.return h
    | None -> Lwt_PGOCaml.connect ~database:"testbase" ()
let table = <:table< users (
  login text NOT NULL,
  password text NOT NULL
) >>
..........
I used eliom-distillery to generate the basic files.
I used "make" to compile the program.
I've tried many different things and done a google search but I can't figure out the problem. Any hints are greatly appreciated.
Generally speaking, the error message indicates that CamlP4 does not know the quotation you used, here table, which appears in your code as <:table< ... >>. Quotations are added by CamlP4 extension modules (pa_xxx.cmo or pa_xxx.cma). Unless you made a typo in the quotation name, you failed to load the extension which provides it to CamlP4.
According to http://ocsigen.org/tuto/manual/macaque , Macaque (or its underlying libraries? I am not sure since I have never used it) provides the quotation table. So you have to instruct CamlP4 to load the corresponding extension. I believe the vanilla eliom-distillery setup is the minimum for basic eliom programming and does not cover the extensions needed for Macaque.
Actually the document http://ocsigen.org/tuto/manual/macaque points this out:
We need to reference macaque in the Makefile :
SERVER_PACKAGE := macaque.syntax
This should be the CamlP4 syntax extension name required for table.
In bison, it is sufficient to add
%error-verbose
to the file to make the parser errors more verbose. Is there any way to gain similar functionality with ocamlyacc?
Here is the answer for a similar question, but I could not make anything out of it. This is how I call the lexer and parser functions:
let rec foo () =
  try
    let line = input_line stdin in
    (try
       let _ = (Parser.latexstatement lexer_token_safe (Lexing.from_string line)) in
       print_string ("SUCCESS\n")
     with
       LexerException s -> print_string ("$L" ^ line ^ "\n")
     | Parsing.Parse_error -> print_string ("$P" ^ line ^ "\n")
     | _ -> print_string ("$S " ^ line ^ "\n"));
    flush stdout;
    foo ();
  with
    End_of_file -> ()
;;
foo ();;
I don't think that there's an option in ocamlyacc to do what you want automatically, so let me try to provide below a thorough description of what could be done to handle syntax errors and produce more useful messages. Maybe it is not what you asked.
Errors must actually be separated into lexical and parse errors, depending on the stage of the parsing process in which the error happens.
- In .mll files, a Failure exception is raised when the input matches no pattern.
- In .mly files, a Parsing.Parse_error exception is raised.
So you have several solutions:
- let the lexer and parser code raise their exceptions, and catch them in the code calling them
- handle the specific error cases in either of them, with
  - a catch-all rule for the lexer (or some more specific patterns if necessary)
  - the special error terminal in the parser rules, to catch errors in specific places
In any case, you will have to make functions to get information about the position of the error in the source.
Lexing and Parsing both use a location record, defined in Lexing, with the following fields:
- pos_fname: the name of the file currently being processed
- pos_lnum: the line number in the file
- pos_bol: the offset of the beginning of the current line, counted in characters from the start of the file
- pos_cnum: the offset of the current position, counted in characters from the start of the file
The lexbuf variable used by the lexer holds two such values to track the token currently being lexed (lexeme_start_p and lexeme_end_p in Lexing give you access to them). The parser has four, tracking the current symbol (or non-terminal) about to be synthesized and the items of the current rule; they can be retrieved with Parsing functions (symbol_start_pos and symbol_end_pos, as well as rhs_start_pos and rhs_end_pos).
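As a small illustration of how these fields combine, the column of a position is pos_cnum - pos_bol. The sketch below uses a hand-built position rather than one produced by a real lexer; line_col, describe, example_pos and the file name are made up for the example.

```ocaml
(* line_col returns the line number and the 0-based column of a position. *)
let line_col (p : Lexing.position) : int * int =
  (p.Lexing.pos_lnum, p.Lexing.pos_cnum - p.Lexing.pos_bol)

(* A hand-built position: line 3 starts at offset 20, and we are at offset 25. *)
let example_pos =
  { Lexing.pos_fname = "input.tex"; pos_lnum = 3; pos_bol = 20; pos_cnum = 25 }

(* describe renders a position in the usual file:line:column form. *)
let describe (p : Lexing.position) : string =
  let line, col = line_col p in
  Printf.sprintf "%s:%d:%d" p.Lexing.pos_fname line col
```

Keep in mind that an ocamllex lexer does not update pos_lnum and pos_bol by itself: the rule that matches a newline has to call Lexing.new_line lexbuf, otherwise every position stays on line 1.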
Here's a few functions to generate more detailed exceptions:
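open Lexing
open Parsing
exception LexErr of string
exception ParseErr of string
(* Format a message with the line and the column range between two positions. *)
let error msg start finish =
  Printf.sprintf "(line %d: char %d..%d): %s" start.pos_lnum
    (start.pos_cnum - start.pos_bol) (finish.pos_cnum - finish.pos_bol) msg
(* Report the lexeme that could not be matched. *)
let lex_error lexbuf =
  raise (LexErr (error (lexeme lexbuf) (lexeme_start_p lexbuf) (lexeme_end_p lexbuf)))
(* Report an error at the nterm-th item of the current rule. *)
let parse_error msg nterm =
  raise (ParseErr (error msg (rhs_start_pos nterm) (rhs_end_pos nterm)))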
and some basic use case:
parser:
%token <string> ERR
/* ... */
wsorword:
WS { $1 }
| WORD { $1 }
| error { parse_error "wsorword" 1; ERR "" } /* a token needed for typecheck */
;
lexer:
rule lexer = parse
(* ... *)
(* catch all pattern *)
| _ { lex_error lexbuf }
All that would be left to do is to modify your top level function to catch the exceptions and process them.
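A minimal sketch of such a top-level function follows. It assumes the two exceptions defined earlier (they are redeclared here only so that the snippet compiles on its own) and a hypothetical parse_line function that wraps the call to the generated parser, for example fun line -> ignore (Parser.latexstatement lexer_token_safe (Lexing.from_string line)).

```ocaml
(* Redeclared only to keep the sketch self-contained; in a real program
   these are the exceptions defined next to the lexer and parser. *)
exception LexErr of string
exception ParseErr of string

(* report runs parse_line (hypothetical wrapper around the generated parser)
   on one line and turns the outcome into a printable message. *)
let report (parse_line : string -> unit) (line : string) : string =
  try
    parse_line line;
    "SUCCESS"
  with
  | LexErr msg -> "lexical error " ^ msg
  | ParseErr msg -> "syntax error " ^ msg
  | Parsing.Parse_error -> "syntax error (no location): " ^ line

(* Read stdin line by line until end of file, as in the question's loop. *)
let rec loop (parse_line : string -> unit) : unit =
  match input_line stdin with
  | line ->
    print_endline (report parse_line line);
    loop parse_line
  | exception End_of_file -> ()
```

The plain Parsing.Parse_error case is kept as a fallback for rules that have no error production.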
Finally, for debugging purposes, there is a set_trace function in Parsing which makes the parsing engine display the messages of its state machine: it traces all the internal state changes of the automaton.
In the Parsing module (you can check it here) there is the function Parsing.set_trace that will do just that. You can enable it with Parsing.set_trace true (it returns the previous setting). Also, you can run ocamlyacc with the -v argument and it will write a .output file listing all states and transitions.
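Because Parsing.set_trace returns the previous value of the flag, it is easy to switch tracing on for a single call and restore it afterwards. A small sketch, assuming OCaml 4.08 or later for Fun.protect; with_trace is a hypothetical helper name.

```ocaml
(* with_trace runs f with the parser trace switched on, then restores the
   previous setting even if f raises. *)
let with_trace (f : unit -> 'a) : 'a =
  let previous = Parsing.set_trace true in
  Fun.protect ~finally:(fun () -> ignore (Parsing.set_trace previous)) f
```

It would be used as with_trace (fun () -> Parser.latexstatement lexer_token_safe lexbuf), with the entry point and lexer from the question.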