Integer in rules for parser definition - OCaml

I am running into a problem when compiling the following rule for my parser:
%%
expr:
| expr ASN expr { Asn ($1, $2) }
This is an assignment rule that takes an integer, then the assignment (equal sign) and an expression, as defined in my AST:
type expr =
Asn of int * expr
Of course, the compiler is complaining because I am defining "expr ASN expr", and the first argument should be an integer, not an expression. However, I have not been able to figure out the syntax to specify this.
If somebody could lead me in the right direction, I would really appreciate it.
Thanks!

You don't supply enough details to give a good answer. What do you mean by an integer? I'll assume you mean an integer literal.
Assuming your lexical definitions have a token named INT that represents an integer literal, you might want something like this.
expr:
| INT ASN expr { Asn ($1, $3) }
Note that $2 would refer to the ASN token itself, so the right-hand expression is $3.
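To make the positional numbering concrete, here is a minimal hand-written sketch of the same rule in plain OCaml. It is not ocamlyacc output. The token type and the Lit constructor are assumptions added so the example is self-contained; only Asn comes from the question.

```ocaml
(* Hypothetical token type; a real lexer would produce these. *)
type token = INT of int | ASN | EOF

type expr =
  | Lit of int          (* assumed base case so the recursion can end *)
  | Asn of int * expr   (* from the question *)

(* expr ::= INT ASN expr | INT
   Returns the parsed expression and the remaining tokens. *)
let rec parse_expr tokens =
  match tokens with
  | INT n :: ASN :: rest ->
      (* symbol 1 is the INT, symbol 2 is ASN, symbol 3 is the nested expr *)
      let rhs, rest' = parse_expr rest in
      (Asn (n, rhs), rest')
  | INT n :: rest -> (Lit n, rest)
  | _ -> failwith "syntax error"
```

The ASN token is matched but carries no value, which is why the semantic action only uses the first and third symbols.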

Probably what you want is the assignment as:
type expr = Asn of var * int
and then define expr in the parser as:
expr:
| VAR ASN INT { Asn ($1, $3) }
In the lexer you should define VAR as carrying a string and INT as carrying an integer literal, for example:
| ['a'-'z' 'A'-'Z']+ as s { VAR s }
| ['0'-'9']+ as i { INT (int_of_string i) }

Related

Does Bison allow "?" in its syntax

I am trying to write a grammar for the Java specification, for example:
COMPILATION_UNIT: PACKAGE_DEC? IMPORT_DECS? TYPE_DECS?
but it doesn't work
I have the following error:
invalid character: `?'
for each question mark I use in my file.y
I know that Bison has special characters and it should handle it
Please help
Bison does not allow a ? to mean that the preceding symbol is optional. You have to write out the grammar with the optional elements as separate rules:
package_decl_opt: %empty
| SOME_TOKEN
;
package: package_decl_opt TOKEN_PACKAGE TOKEN_IDENTIFIER
;
would allow both of the following:
SOME_TOKEN TOKEN_PACKAGE TOKEN_IDENTIFIER
TOKEN_PACKAGE TOKEN_IDENTIFIER
As you have seen, bison does not implement the ? regular expression optionality operator. Nor does it implement the + or * repetition operators. That's because the right-hand sides of productions in context-free grammars are not regular expressions.
Yacc/bison context-free grammars do allow the | alternation operator, but as an abbreviation:
a : b | c
is exactly the same as writing
a : b
a : c
and semantic actions only apply to the alternative in which they are specified, so that
a : b | c { /* C action; */ }
is equivalent to:
a : b { /* Implicit default action*/ }
a : c { /* C action; */ }
It is tempting to create X_opt non-terminals to capture the semantics of X?:
X_opt: X | %empty { $$ = default_value; }
In many simple cases that will work fine, but there are also many grammars in which that introduces an unnecessary shift-reduce conflict. Consider, for example:
label: IDENT ':'
label_opt: label | %empty
statement: label_opt expr
Since expr can start with an identifier, there is no way to know if an IDENT token starts a label or if it starts an expr following an empty label_opt. But LR(1) requires that the empty label_opt be reduced before the IDENT is consumed. So the above grammar is LR(2) and cannot be correctly parsed by an LR(1) parser.
That problem does not occur without the use of the label_opt shortcut:
label: IDENT ':'
statement: label expr
| expr
This works because the parser no longer has to decide between label and expr before the ':' is encountered.

How to initiate a variable from the Grammar in Bison?

let's imagine we have this grammar
start:
expressions;
expressions:
expressions expression
| expression
;
expression:
expression NAME value { float $2 = $3;}
| NAME value { float $1 = $2;}
;
value:
INT '.' INT
;
and for this grammar we apply this input
a 2.0
b 3.0
this should be interpreted by our grammar like this ( float a = 2.0 ; float b = 3.0; )
My real aim is to declare a variable with the given name using a constructor, something like myClass NAME(value); where value is a float.
The problems are that I don't know how to get the whole value of a grammatical block like value in my example, and how to declare a variable whose name changes with each line of the input file rather than some generic float a = $1;
I already have my flex tokeniser working which will give me NAME and VALUE
You can't use strings in place of variable names in C++. What you should do instead is to define a map from strings to floats and then do something like the_map[$2] = $3; instead of float $2 = $3;.
On an unrelated note, you need to add an action to value that makes it produce a float value (or make your lexer generate a single token for floats and use that). Otherwise $3 doesn't have a proper value when you use it in expression's action.
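The map idea can be sketched as follows. The question's actions are C++, where a std::map from std::string to float would play this role. The sketch below uses OCaml's Hashtbl purely to illustrate the technique, and the names the_map and declare are hypothetical.

```ocaml
(* Symbol table: variable name -> value.
   Stands in for the "map from strings to floats" suggested above. *)
let the_map : (string, float) Hashtbl.t = Hashtbl.create 16

(* What the action for `NAME value` would do instead of `float $1 = $2;`:
   record the value under the name read from the input. *)
let declare (name : string) (value : float) : unit =
  Hashtbl.replace the_map name value

(* Look a name up later; None means it was never declared. *)
let lookup (name : string) : float option =
  Hashtbl.find_opt the_map name
```

With the input from the question, the parser would call declare "a" 2.0 and then declare "b" 3.0, after which lookup "a" yields Some 2.0.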

ocamlyacc with empty string

So I have a grammar that includes the empty string. The grammar is something like this:
S->ε
S->expression ;; S
I'm getting the error "No more states to discard" when I run my parser so I believe I'm not representing the empty string correctly. So how would I go about representing it, specifically in the lexer .mll file?
I know I need to make a rule for it so I think I have that down. This is what I think it should look like for the parser .mly file, excluding the stuff for expression.
s:
| EMPTY_STRING { [] }
| expression SEMICOLON s { $1::$3 }
You're thinking of epsilon as a token, but it's not a token. It's a 0-length sequence of tokens. Since there are no tokens there, it's not something your scanner needs to know about. Just the parser needs to know about it.
Here's a grammar something like what I think you want:
%token X
%token SEMICOLON
%token EOF
%start main
%type <char list> main
%%
main :
s EOF { $1 }
s :
| epsilon { $1 }
| X SEMICOLON s { 'x' :: $3 }
epsilon :
{ [] }
Note that epsilon is a non-terminal (not a token). Its definition is an empty sequence of symbols.
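The same point can be seen in a small hand-written recursive-descent sketch of this grammar. It is an illustration only, not what ocamlyacc generates, and the function names are made up. The empty alternative consumes no tokens and simply returns [].

```ocaml
type token = X | SEMICOLON | EOF

(* s ::= (empty) | X SEMICOLON s
   Returns the parsed list and the remaining tokens. *)
let rec parse_s tokens =
  match tokens with
  | X :: SEMICOLON :: rest ->
      let tail, rest' = parse_s rest in
      ('x' :: tail, rest')
  | _ ->
      (* epsilon: there is no token to match, so nothing is consumed *)
      ([], tokens)

(* main ::= s EOF *)
let parse_main tokens =
  match parse_s tokens with
  | result, [EOF] -> result
  | _ -> failwith "syntax error"
```

Here parse_main [EOF] returns the empty list without the lexer ever having produced an "empty" token.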

Grammar to Lex/Yacc

I have been tasked with a project that involves me taking a Grammar (in BNF form) and creating a lexical scanner (using lex) and a parser (using bison). I've never worked with any of these programs and I think a good reference would be to see how these items are created from a grammar. I am looking for a grammar and its associated .l and .ypp files, preferably in C++. I've been able to find sample files or sample grammars, but not both of them. I've spent some time searching and I could not find anything. I figure I'd post here in hopes that someone has something for me, but I will continue searching in the meantime.
I am currently reading Tom Niemann's
http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf which seems to be pretty well written and understandable.
Thanks
Edit: I am still searching, I am starting to think that what I am looking for does not exist. Google usually never fails me!
Edit 2: Maybe if I provide some of the grammar, you folks could show me what the appropriate .l and .ypp files would look like. This is just a snippet of the grammar, I just need a little 'taste' of how this works and I think I can take it from there.
Grammar:
Program ::= Compound
Statements ::= Compound | Assignment | ...
Assignment ::= Var ASSIGN Expression
Expression ::= Var | Operator Expression Expression | Number
Compound ::= START Statements END
Number ::= NUMBER
Descriptions:
Assignment is the equal sign ":="
Var is an identifier that begins with a lower case letter and is followed by lower case letters or digits
START is the "start" keyword
END is the "end" keyword
Operator is "+", "-", "*", "/"
Number is decimal digits which could potentially be negative (minus sign in front)
Most of this is fairly straightforward. One part, however, is decidedly problematic. You've defined a number to (potentially) include a leading -, and that's a problem.
The problem is pretty simple. Given an input like 321-123, it's essentially impossible for the lexer (which won't normally keep track of current state) to guess whether that's supposed to be two tokens (321 and -123) or three (321, -, and 123). In this case, the - is almost certainly intended to be separate from the 123, but if the input were 321 + -123 you'd apparently want -123 as a single token instead.
To deal with that, you probably want to change your grammar so the leading - isn't part of the number. Instead, you always want to treat the - as an operator, and the number itself is composed solely of the digits. Then it's up to the parser to sort out expressions where the - is unary vs. binary.
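As a rough illustration of letting the parser sort out unary versus binary minus, here is a small hand-written sketch in OCaml. It is not the lex/yacc solution itself. The token type is hypothetical, only + and - are handled, and the parser evaluates the expression directly instead of building a tree.

```ocaml
(* The lexer only ever emits digits as NUMBER; '-' is always its own token. *)
type token = NUMBER of int | MINUS | PLUS

(* value ::= NUMBER | MINUS NUMBER
   A MINUS seen where a value is expected is unary. *)
let parse_value = function
  | NUMBER n :: rest -> (n, rest)
  | MINUS :: NUMBER n :: rest -> (-n, rest)
  | _ -> failwith "syntax error"

(* expression ::= value { (PLUS | MINUS) value }
   A MINUS seen after a complete value is binary; evaluation is left to right. *)
let parse_expression tokens =
  let rec loop acc = function
    | PLUS :: rest ->
        let v, rest' = parse_value rest in
        loop (acc + v) rest'
    | MINUS :: rest ->
        let v, rest' = parse_value rest in
        loop (acc - v) rest'
    | rest -> (acc, rest)
  in
  let first, rest = parse_value tokens in
  loop first rest
```

Both 321-123 and 321 + -123 reach the parser with MINUS and NUMBER 123 as separate tokens. The parser's position in the grammar, not the lexer, decides which meaning the minus has.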
Taking that into account, the lexer file would look something like this:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%option noyywrap case-insensitive
%%
:= { return ASSIGN; }
start { return START; }
end { return END; }
[+/*] { return OPERATOR; }
- { return MINUS; }
[0-9]+ { return NUMBER; }
[a-z][a-z0-9]* { return VAR; }
[ \r\n] { ; }
%%
void yyerror(char const *s) { fputs(s, stderr); }
The matching yacc file would look something like this:
%{
int yylex(void);
void yyerror(char const *s);
%}
%token ASSIGN START END OPERATOR MINUS NUMBER VAR
%left OPERATOR MINUS
%%
program : compound
statement : compound
| assignment
;
assignment : VAR ASSIGN expression
;
statements :
| statements statement
;
expression : VAR
| expression OPERATOR expression
| expression MINUS expression
| value
;
value: NUMBER
| MINUS NUMBER
;
compound : START statements END
%%
int main() {
yyparse();
return 0;
}
Note: I've tested these only extremely minimally--enough to verify input I believe is grammatical, such as: start a:=1 b:=2 end and start a:=1+3*3 b:=a+4 c:=b*3 end is accepted (no error message printed out) and input I believe is un-grammatical, such as: 9:=13 and a=13 do both print out syntax error messages. Since this doesn't attempt to do any more with the expressions than recognize those which are or are not grammatical, that's about the best we can do though.

OCaml: get value's type name

Is it possible to print the name of a value's constructor in OCaml, for example if I have
type my_type =
| MyType_First of int
| MyType_Second of string
and then do something like:
let my_value = MyType_First 0 in
print_string ("my_value is of type " ^ String.from_type my_value ^ ".\n");
can I get "my_value is of type MyType_First." ?
Thank you.
Monomorphic solution:
let from_type = function
| MyType_First _ -> "MyType_First"
| MyType_Second _ -> "MyType_Second"
Polymorphic solution: none. (AFAIK, lexical tokens corresponding to constructors are not recorded in the bytecode/binary, even when debugging flags are specified. The only thing one could do is to print the integer ‘identifier’ for the constructor, using some dark Obj.magic.)
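For illustration only, the integer "identifier" trick might look like the sketch below. It relies on the runtime representation of values through the Obj module, which is unsafe and not meant for ordinary code. The helper name constructor_index is made up, and the sketch assumes every constructor of the type carries an argument, as in the question.

```ocaml
type my_type =
  | MyType_First of int
  | MyType_Second of string

(* Constructors with arguments are heap blocks whose tag is their index
   among the non-constant constructors, in declaration order.
   This yields a number, not the constructor's name. *)
let constructor_index (v : my_type) : int =
  Obj.tag (Obj.repr v)
```

Under these assumptions constructor_index (MyType_First 0) is 0 and constructor_index (MyType_Second "s") is 1. Turning that number back into a name still needs a hand-written table, so the monomorphic function above is usually the simpler choice.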
What you want is a simpler form of generic print and is not available in OCaml as such, but some workarounds exist - e.g. deriving.