I'm trying to implement a toy language with grammar inspired by Scala. I used part of the Scala Syntax Specification:
Expr1 : ‘if’ ‘(’ Expr ‘)’ {nl} Expr [[semi] ‘else’ Expr]
| ...
In this toy language, everything is an expression (if-else, for, and while all have a return value), and expressions are separated by ; or \n.
Here's the parser.y:
%code top {
#include <cstdio>
}
%union {
int n;
Ast *ast;
}
%code requires {
class Ast;
int yylex(void);
void yyerror(const char *msg);
}
%token<n> NUM
%token<n> PLUS '+'
%token<n> MINUS '-'
%token<n> TIMES '*'
%token<n> DIVIDE '/'
%token<n> SEMICOLON ';'
%token<n> NEWLINE '\n'
%token<n> IF "if"
%token<n> ELSE "else"
%token<n> LPAREN '('
%token<n> RPAREN ')'
%type<ast> prog expr primaryExpr optionalElse semi optionalSemi optionalNewlines newlines
%left PLUS MINUS
%left TIMES DIVIDE
%start prog
%%
prog : expr
;
expr : "if" '(' primaryExpr ')' optionalNewlines expr optionalElse { $$ = nullptr; }
| primaryExpr
;
optionalElse : optionalSemi "else" expr { $$ = nullptr; }
| %empty { $$ = nullptr; }
;
primaryExpr : NUM { $$ = nullptr; }
| primaryExpr '+' NUM { $$ = nullptr; }
| primaryExpr '-' NUM { $$ = nullptr; }
| primaryExpr '*' NUM { $$ = nullptr; }
| primaryExpr '/' NUM { $$ = nullptr; }
;
semi : ';' { $$ = nullptr; }
| '\n' { $$ = nullptr; }
;
optionalSemi : semi { $$ = nullptr; }
| %empty { $$ = nullptr; }
;
optionalNewlines : newlines { $$ = nullptr; }
| %empty { $$ = nullptr; }
;
newlines : '\n' { $$ = nullptr; }
| newlines '\n' { $$ = nullptr; }
;
%%
void yyerror(const char *msg) {
fprintf(stderr, "%s\n", msg);
}
This is my grammar compiled with: bison --debug --verbose -Wcounterexamples -o grammar.tab.cpp --defines=grammar.tab.h grammar.y
It gives me output:
Terminals unused in grammar
PLUS
MINUS
TIMES
DIVIDE
SEMICOLON
NEWLINE
LPAREN
RPAREN
Rules useless in parser due to conflicts
14 optionalSemi: %empty
State 21 conflicts: 2 shift/reduce, 1 reduce/reduce
Grammar
0 $accept: prog $end
1 prog: expr
2 expr: "if" '(' primaryExpr ')' optionalNewlines expr optionalElse
3 | primaryExpr
4 optionalElse: optionalSemi "else" expr
5 | %empty
6 primaryExpr: NUM
7 | primaryExpr '+' NUM
8 | primaryExpr '-' NUM
9 | primaryExpr '*' NUM
10 | primaryExpr '/' NUM
11 semi: ';'
12 | '\n'
13 optionalSemi: semi
14 | %empty
15 optionalNewlines: newlines
16 | %empty
17 newlines: '\n'
18 | newlines '\n'
Terminals, with rules where they appear
$end (0) 0
'\n' <n> (10) 12 17 18
'(' <n> (40) 2
')' <n> (41) 2
'*' <n> (42) 9
'+' <n> (43) 7
'-' <n> (45) 8
'/' <n> (47) 10
';' <n> (59) 11
error (256)
NUM <n> (258) 6 7 8 9 10
PLUS <n> (259)
MINUS <n> (260)
TIMES <n> (261)
DIVIDE <n> (262)
SEMICOLON <n> (263)
NEWLINE <n> (264)
"if" <n> (265) 2
"else" <n> (266) 4
LPAREN <n> (267)
RPAREN <n> (268)
Nonterminals, with rules where they appear
$accept (22)
on left: 0
prog <ast> (23)
on left: 1
on right: 0
expr <ast> (24)
on left: 2 3
on right: 1 2 4
optionalElse <ast> (25)
on left: 4 5
on right: 2
primaryExpr <ast> (26)
on left: 6 7 8 9 10
on right: 2 3 7 8 9 10
semi <ast> (27)
on left: 11 12
on right: 13
optionalSemi <ast> (28)
on left: 13 14
on right: 4
optionalNewlines <ast> (29)
on left: 15 16
on right: 2
newlines <ast> (30)
on left: 17 18
on right: 15 18
State 0
0 $accept: • prog $end
NUM shift, and go to state 1
"if" shift, and go to state 2
prog go to state 3
expr go to state 4
primaryExpr go to state 5
State 1
6 primaryExpr: NUM •
$default reduce using rule 6 (primaryExpr)
State 2
2 expr: "if" • '(' primaryExpr ')' optionalNewlines expr optionalElse
'(' shift, and go to state 6
State 3
0 $accept: prog • $end
$end shift, and go to state 7
State 4
1 prog: expr •
$default reduce using rule 1 (prog)
State 5
3 expr: primaryExpr •
7 primaryExpr: primaryExpr • '+' NUM
8 | primaryExpr • '-' NUM
9 | primaryExpr • '*' NUM
10 | primaryExpr • '/' NUM
'+' shift, and go to state 8
'-' shift, and go to state 9
'*' shift, and go to state 10
'/' shift, and go to state 11
$default reduce using rule 3 (expr)
State 6
2 expr: "if" '(' • primaryExpr ')' optionalNewlines expr optionalElse
NUM shift, and go to state 1
primaryExpr go to state 12
State 7
0 $accept: prog $end •
$default accept
State 8
7 primaryExpr: primaryExpr '+' • NUM
NUM shift, and go to state 13
State 9
8 primaryExpr: primaryExpr '-' • NUM
NUM shift, and go to state 14
State 10
9 primaryExpr: primaryExpr '*' • NUM
NUM shift, and go to state 15
State 11
10 primaryExpr: primaryExpr '/' • NUM
NUM shift, and go to state 16
State 12
2 expr: "if" '(' primaryExpr • ')' optionalNewlines expr optionalElse
7 primaryExpr: primaryExpr • '+' NUM
8 | primaryExpr • '-' NUM
9 | primaryExpr • '*' NUM
10 | primaryExpr • '/' NUM
'+' shift, and go to state 8
'-' shift, and go to state 9
'*' shift, and go to state 10
'/' shift, and go to state 11
')' shift, and go to state 17
State 13
7 primaryExpr: primaryExpr '+' NUM •
$default reduce using rule 7 (primaryExpr)
State 14
8 primaryExpr: primaryExpr '-' NUM •
$default reduce using rule 8 (primaryExpr)
State 15
9 primaryExpr: primaryExpr '*' NUM •
$default reduce using rule 9 (primaryExpr)
State 16
10 primaryExpr: primaryExpr '/' NUM •
$default reduce using rule 10 (primaryExpr)
State 17
2 expr: "if" '(' primaryExpr ')' • optionalNewlines expr optionalElse
'\n' shift, and go to state 18
$default reduce using rule 16 (optionalNewlines)
optionalNewlines go to state 19
newlines go to state 20
State 18
17 newlines: '\n' •
$default reduce using rule 17 (newlines)
State 19
2 expr: "if" '(' primaryExpr ')' optionalNewlines • expr optionalElse
NUM shift, and go to state 1
"if" shift, and go to state 2
expr go to state 21
primaryExpr go to state 5
State 20
15 optionalNewlines: newlines •
18 newlines: newlines • '\n'
'\n' shift, and go to state 22
$default reduce using rule 15 (optionalNewlines)
State 21
2 expr: "if" '(' primaryExpr ')' optionalNewlines expr • optionalElse
';' shift, and go to state 23
'\n' shift, and go to state 24
';' [reduce using rule 5 (optionalElse)]
'\n' [reduce using rule 5 (optionalElse)]
"else" reduce using rule 5 (optionalElse)
"else" [reduce using rule 14 (optionalSemi)]
$default reduce using rule 5 (optionalElse)
optionalElse go to state 25
semi go to state 26
optionalSemi go to state 27
shift/reduce conflict on token ';':
5 optionalElse: • %empty
11 semi: • ';'
Example: "if" '(' primaryExpr ')' optionalNewlines "if" '(' primaryExpr ')' optionalNewlines expr • ';' "else" expr
Shift derivation
expr
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse ↳ ε
↳ optionalSemi "else" expr
↳ semi
↳ • ';'
Reduce derivation
expr
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse ↳ optionalSemi "else" expr
↳ • ↳ semi
↳ ';'
shift/reduce conflict on token '\n':
5 optionalElse: • %empty
12 semi: • '\n'
Example: "if" '(' primaryExpr ')' optionalNewlines "if" '(' primaryExpr ')' optionalNewlines expr • '\n' "else" expr
Shift derivation
expr
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse ↳ ε
↳ optionalSemi "else" expr
↳ semi
↳ • '\n'
Reduce derivation
expr
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse
↳ "if" '(' primaryExpr ')' optionalNewlines expr optionalElse ↳ optionalSemi "else" expr
↳ • ↳ semi
↳ '\n'
reduce/reduce conflict on token "else":
5 optionalElse: • %empty
14 optionalSemi: • %empty
Example: •
First reduce derivation
optionalElse
↳ •
Second reduce derivation
optionalSemi
↳ •
State 22
18 newlines: newlines '\n' •
$default reduce using rule 18 (newlines)
State 23
11 semi: ';' •
$default reduce using rule 11 (semi)
State 24
12 semi: '\n' •
$default reduce using rule 12 (semi)
State 25
2 expr: "if" '(' primaryExpr ')' optionalNewlines expr optionalElse •
$default reduce using rule 2 (expr)
State 26
13 optionalSemi: semi •
$default reduce using rule 13 (optionalSemi)
State 27
4 optionalElse: optionalSemi • "else" expr
"else" shift, and go to state 28
State 28
4 optionalElse: optionalSemi "else" • expr
NUM shift, and go to state 1
"if" shift, and go to state 2
expr go to state 29
primaryExpr go to state 5
State 29
4 optionalElse: optionalSemi "else" expr •
$default reduce using rule 4 (optionalElse)
How should I fix the State 21 conflicts: 2 shift/reduce, 1 reduce/reduce?
For now I use a trick to fix this if-else shift/reduce conflict.
The idea comes from two answers:
https://stackoverflow.com/a/12734499/4438921 - use %prec to fix if-else shift/reduce error.
https://stackoverflow.com/a/63000285/4438921 - merge 2 tokens into single one to simplify bison grammar.
Firstly, add a token SEMICOLON_ELSE in flex token.l:
%option noyywrap
%option bison-bridge
%{
#include "parser.tab.hh"
%}
%%
[\n]+ { return NEWLINE; }
";" { return SEMICOLON; }
"else" { return ELSE; }
[\n]+[ \t\v\f\r]*"else" { return SEMICOLON_ELSE; }
";"[ \t\v\f\r]*"else" { return SEMICOLON_ELSE; }
%%
This assumes there is no comment between ;/\n and else, so the separator and else can be scanned as one single token.
A more complicated regex could allow comments between them.
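The merged-token idea can be tried outside flex. The sketch below is a small Python tokenizer that mimics flex's longest-match rule. The SEMICOLON_ELSE, NEWLINE, SEMICOLON, and ELSE names mirror the flex rules; the IF, NUM, OP, and SKIP rules and the tokenize helper are assumptions added only to make the example self-contained. It also assumes the intent is to allow optional horizontal whitespace, not arbitrary characters, between the separator and else.

```python
import re

# Rules in priority order; as in flex, the longest match wins and ties go
# to the earlier rule. IF, NUM, OP and SKIP are hypothetical extras.
TOKEN_SPEC = [
    ("SEMICOLON_ELSE", r"(?:\n+|;)[ \t\v\f\r]*else"),
    ("NEWLINE", r"\n+"),
    ("SEMICOLON", r";"),
    ("IF", r"if"),
    ("ELSE", r"else"),
    ("NUM", r"[0-9]+"),
    ("OP", r"[-+*/()]"),
    ("SKIP", r"[ \t\v\f\r]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def tokenize(text):
    """Return the list of token names for text, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Strict comparison keeps the earlier rule on ties.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "SKIP":
            tokens.append(best_name)
        pos = best_end
    return tokens


print(tokenize("if (1) 2\nelse 3"))
print(tokenize("if (1) 2 else 3"))
```

Because the separator is folded into SEMICOLON_ELSE by the scanner, the parser no longer has to decide whether a ; or newline ends the if expression or precedes an else.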
Then in bison parser.y:
%token<n> SEMICOLON_ELSE "semicolon_else"
%token<n> ELSE "else"
%nonassoc "then"
%nonassoc "else"
%nonassoc "semicolon_else"
%%
expr : "if" '(' primaryExpr ')' optionalNewlines expr %prec "then" { $$ = nullptr; }
| "if" '(' primaryExpr ')' optionalNewlines expr "semicolon_else" expr %prec "semicolon_else" { $$ = nullptr; }
| "if" '(' primaryExpr ')' optionalNewlines expr "else" expr %prec "else" { $$ = nullptr; }
| primaryExpr
;
%%
In this case, %prec works for the parser.
I am learning Flex/Bison and we are currently on the part about semantics, having previously dealt with lexical and syntax errors. I have googled extensively and haven't been able to find a solution to my error. I don't understand why I need to declare a type for '$4' when I thought that was done automatically.
When I run make I get this error:
flex scanner.l
mv lex.yy.c scanner.c
bison -d -v parser.y
parser.y:114.71-72: error: $4 of 'case' has no declared type
114 | case WHEN INT_LITERAL ARROW statement_ {case_statements.push_back($4);};
Here is the pseudo code I am trying to follow:
statement:
CASE expression IS cases OTHERS ARROW statement_ ENDCASE
{If the attribute of cases, is a number then
return it as the attribute otherwise return the
attribute of the OTHERS clause};
cases:
cases case
{if the attribute of cases is a number then return it as the
attribute otherwise return the attribute of case} |
%empty
{Set the attribute to the sentinel NAN} ;
case:
WHEN INT_LITERAL ARROW statement_
{$-2 contains the value of the expression after CASE.
It must be compared with the attribute of INT_LITERAL.
If they match the attribute of this production
should become the attribute of statement_
If they don't match, the attribute should be set to the
sentinel value NAN} ;
parser.y:
%{
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <math.h>
using namespace std;
#include "values.h"
#include "listing.h"
#include "symbols.h"
#include <stdlib.h>
#include <stdio.h>
int yylex();
void yyerror(const char* message);
Symbols<int> symbols;
//----------------------------------------------------------------------------------------------
vector<int> case_statements; //<<<<<<<<<<<<Is this wrong?
//---------------------------------------------------------------------------------------------
int result;
double *params;
%}
%define parse.error verbose
%union
{
CharPtr iden;
Operators oper;
int value;
}
%token <iden> IDENTIFIER
%token <value> INT_LITERAL REAL_LITERAL BOOL_LITERAL CASE TRUE FALSE
%token <oper> ADDOP MULOP RELOP OROP NOTOP REMOP EXPOP
%token ANDOP
%token BEGIN_ BOOLEAN END ENDREDUCE FUNCTION INTEGER IS REDUCE RETURNS
%token THEN WHEN ARROW
%token ELSE ENDCASE ENDIF IF OTHERS REAL
%type <value> body statement_ statement reductions expression relation term
factor case cases exponent unary primary
%type <oper> operator
%%
function:
function_header optional_variable body {result = $3;} ;
function_header:
FUNCTION IDENTIFIER optional_parameter RETURNS type ';' |
FUNCTION IDENTIFIER RETURNS type ';' |
error ';' ;
optional_variable:
optional_variable variable |
error ';' |
%empty ;
variable:
IDENTIFIER ':' type IS statement_ ;
parameters:
parameter optional_parameter;
optional_parameter:
optional_parameter ',' parameter |
%empty ;
parameter:
IDENTIFIER ':' type {symbols.insert($1, params[0]);} ;
type:
INTEGER |
REAL |
BOOLEAN ;
body:
BEGIN_ statement_ END ';' {$$ = $2;} ;
statement_:
statement ';' |
error ';' {$$ = 0;} ;
statement:
expression |
REDUCE operator reductions ENDREDUCE {$$ = $3;} |
IF expression THEN statement_ ELSE statement_ ENDIF {
if ($2 == true) {
$$ = $4;
}
else {
$$ = $6;
}
} |
CASE expression IS cases OTHERS ARROW statement_ ENDCASE {$$ = $<value>4 == $1 ? $4 : $7;} ;
cases:
cases case {$$ = $<value>1 == $1 ? $1 : $2;} |
%empty {$$ = NAN;} ;
//-----------------------------------------------------------------------------------------------------------
case:
case WHEN INT_LITERAL ARROW statement_ {case_statements.push_back($4);} ; //<<<<<<<<<How do I declare $4?
//-------------------------------------------------------------------------------------------------------------
operator:
ADDOP |
RELOP |
EXPOP |
MULOP ;
reductions:
reductions statement_ {$$ = evaluateReduction($<oper>0, $1, $2);} |
{$$ = $<oper>0 == ADD ? 0 : 1;} %empty ;
expression:
expression OROP relation {$$ = $1 || $3;} |
relation ;
expression:
expression ANDOP relation {$$ = $1 && $3;} |
relation ;
relation:
relation RELOP term {$$ = evaluateRelational($1, $2, $3);} |
term ;
term:
term ADDOP factor {$$ = evaluateArithmetic($1, $2, $3);} |
factor ;
factor:
factor MULOP primary {$$ = evaluateArithmetic($1, $2, $3);} |
factor REMOP exponent {$$ = $1 % $3;} |
exponent ;
exponent:
unary |
unary EXPOP exponent {$$ = pow($1, $3);} ;
unary:
NOTOP primary {$$ = $2;} |
primary;
primary:
'(' expression ')' {$$ = $2;} |
INT_LITERAL |
REAL_LITERAL |
BOOL_LITERAL |
IDENTIFIER {if (!symbols.find($1, $$)) appendError(UNDECLARED, $1);} ;
%%
void yyerror(const char* message)
{
appendError(SYNTAX, message);
}
int main(int argc, char *argv[])
{
params = new double[argc - 1];
for (int i = 1; i < argc; i++)
{
params[i - 1] = atof(argv[i]);
}
firstLine();
yyparse();
if (lastLine() == 0)
cout << "Result = " << result << endl;
return 0;
}
You need to assign a value/type to statement_:
statement_:
statement ';' {$$ = $1;}|
error ';' {$$ = MISMATCH;} ;
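It may also help to check which symbol $4 names. Bison numbers the right-hand-side symbols of a rule starting from 1, so in the rule as posted (with case repeated at the start of the right-hand side) $4 is ARROW, which has no declared type, and statement_ is $5. The Python sketch below only illustrates that numbering; the positional_refs helper is hypothetical and not part of Bison.

```python
def positional_refs(rhs):
    """Map Bison-style positional names ($1, $2, ...) to the symbols
    of a rule's right-hand side, given as a space-separated string."""
    return {f"${i}": symbol for i, symbol in enumerate(rhs.split(), start=1)}


# Right-hand side exactly as it appears in the posted rule for case.
as_posted = positional_refs("case WHEN INT_LITERAL ARROW statement_")
# Right-hand side as written in the pseudocode (no leading case).
as_pseudocode = positional_refs("WHEN INT_LITERAL ARROW statement_")

print(as_posted["$4"], as_posted["$5"])
print(as_pseudocode["$4"])
```

If the leading case on the right-hand side is unintended, dropping it makes $4 refer to statement_, whose type is already declared by the %type <value> line.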
I'm trying to integrate error recovery into my grammar. According to the Bison manual, the simplest error recovery is to skip the current line. But my flex file has no action for newlines, so the parser never sees them. Instead, I want the parser to ignore everything until it encounters a semicolon when an error occurs.
I have the following grammar:
start : program;
program : program unit
| unit
;
unit : var_declaration
| func_declaration
| func_definition
;
func_declaration : type_specifier ID LPAREN parameter_list RPAREN SEMICOLON
| type_specifier ID LPAREN RPAREN SEMICOLON
;
func_definition : type_specifier ID LPAREN parameter_list RPAREN compound_statement
| type_specifier ID LPAREN RPAREN compound_statement
;
parameter_list : parameter_list COMMA type_specifier ID
| parameter_list COMMA type_specifier
| type_specifier ID
| type_specifier
;
compound_statement : LCURL statements RCURL
| LCURL RCURL
;
var_declaration : type_specifier declaration_list SEMICOLON
;
type_specifier : INT
| FLOAT
| VOID
;
declaration_list : declaration_list COMMA ID
| declaration_list COMMA ID LTHIRD CONST_INT RTHIRD
| ID
| ID LTHIRD CONST_INT RTHIRD
;
statements : statement
| statements statement
;
statement : var_declaration
| expression_statement
| compound_statement
| FOR LPAREN expression_statement expression_statement expression RPAREN statement
| IF LPAREN expression RPAREN statement
| IF LPAREN expression RPAREN statement ELSE statement
| WHILE LPAREN expression RPAREN statement
| PRINTLN LPAREN ID RPAREN SEMICOLON
| RETURN expression SEMICOLON
;
expression_statement : SEMICOLON
| expression SEMICOLON
;
variable : ID
| ID LTHIRD expression RTHIRD
;
expression : logic_expression
| variable ASSIGNOP logic_expression
;
logic_expression : rel_expression
| rel_expression LOGICOP rel_expression
;
rel_expression : simple_expression
| simple_expression RELOP simple_expression
;
simple_expression : term
| simple_expression ADDOP term
;
term : unary_expression
| term MULOP unary_expression
;
unary_expression : ADDOP unary_expression
| NOT unary_expression
| factor
;
factor : variable
| ID LPAREN argument_list RPAREN
| LPAREN expression RPAREN
| CONST_INT
| CONST_FLOAT
| variable INCOP
| variable DECOP
;
argument_list : arguments
|
;
arguments : arguments COMMA logic_expression
| logic_expression
;
I'm currently working on the following input:
int main(){
int a[2],c,i,j ; float c;
a[2.5]=1;
i=2.3
j=2%3.7;
a=4;
func(a);
b=8;
return 0;
}
When the parser encounters i=2.3 (which is missing its semicolon), it should report a syntax error and then continue parsing rather than stop.
Based on the grammar, where should I put my error production so that the parser can continue parsing without any conflict? Could you also shed some light on other syntax errors, such as a missing RPAREN or curly brace? How should I approach adding error productions to a given grammar?
I am building a simple compiler and am now testing the Bison parser I recently created against some sample .decaf files. The parser handles all the other keywords, the terminal and non-terminal tokens/types, and the rest of the grammar rules and actions. The one problem is that it does not recognize the New keyword/operator: whenever a statement includes New, the output reports an error.
Defining New as a terminal token
%token T_New
CFG grammar rule and action for Expr that also includes rule and action for T_New
Expr : LValue '=' Expr { $$=new AssignExpr($1,new Operator(@2,"="),$3); }
| '(' Expr ')' { $$=$2; }
| Expr '+' Expr { $$=new ArithmeticExpr($1,new Operator(@2,"+"),$3); }
| Expr '-' Expr { $$=new ArithmeticExpr($1,new Operator(@2,"-"),$3); }
| Expr '*' Expr { $$=new ArithmeticExpr($1,new Operator(@2,"*"),$3); }
| Expr '/' Expr { $$=new ArithmeticExpr($1,new Operator(@2,"/"),$3); }
| Expr '%' Expr { $$=new ArithmeticExpr($1,new Operator(@2,"%"),$3); }
| '-' Expr %prec T_UnaryMinus { $$=new ArithmeticExpr(new Operator(@1,"-"),$2); }
| Expr T_And Expr { $$=new LogicalExpr($1,new Operator(@2,"&&"),$3); }
| Expr T_Or Expr { $$=new LogicalExpr($1,new Operator(@2,"||"),$3); }
| Expr '<' Expr { $$=new RelationalExpr($1,new Operator(@2,"<"),$3); }
| Expr T_LessEqual Expr { $$=new RelationalExpr($1,new Operator(@2,"<="),$3); }
| Expr '>' Expr { $$=new RelationalExpr($1,new Operator(@2,">"),$3); }
| Expr T_GreaterEqual Expr { $$=new RelationalExpr($1,new Operator(@2,">="),$3); }
| Expr T_Equal Expr { $$=new EqualityExpr($1,new Operator(@2,"=="),$3); }
| Expr T_NotEqual Expr { $$=new EqualityExpr($1,new Operator(@2,"!="),$3); }
| '!' Expr { $$=new LogicalExpr(new Operator(@1, "!"), $2); }
| T_ReadInteger '(' ')' { $$=new ReadIntegerExpr(@1); }
| T_ReadLine '(' ')' { $$=new ReadLineExpr(@1); }
| T_New Identifier { $$=new NewExpr(@2,new NamedType($2)); }
| T_NewArray '(' Expr ',' Type ')' { $$=new NewArrayExpr(@1,$3,$5); }
| LValue { $$=$1; }
| T_This { $$=new This(@1); }
| Call { $$=$1; }
| Constant { $$=$1; }
;
For example, I have this sample file, interface.decaf, for testing; its main function is shown below:
void main() {
Colorable s;
Color green;
green = New(Color);
green.SetRGB(0, 0, 255);
s = New(Rectangle);
s.SetColor(green);
}
But when I run my parser over this sample file in the terminal I get this error:
*** Error line 33.
green = New(Color);
*** syntax error
I tried other sample files and noticed that any file with a statement that mentions the 'New' keyword returns the same error.
I got a hint from this question that the New keyword is probably being mixed up between C and C++, and that this is why Bison does not recognize it, but I still can't figure out how to fix this. Can anyone help, please?
Your grammar has a rule
| T_New Identifier { ...
matching a New keyword followed immediately by an identifier. However, your examples all have parentheses around the identifier:
green = New(Color)
s = New(Rectangle)
hence the syntax error you are seeing: the input has a ( where the grammar expects an identifier.
Hi, I've used lex and yacc to create my own programming language syntax (sort of), but no matter how I arrange my grammar rules, I get a syntax error at the same first line.
This is my lex code of regular expressions:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%option noyywrap
punct [.]
virgula [,]
numar [0-9]+
numar2 [0-9]
%%
"&librarie=>" {return INCLUDE;}
"stringuri"|"vectori"|"mape"|"matematica" {return LIBRARII;}
"intreg"|"caracter"|"string"|"natural" {return TIPVAR;}
"real" {return REAL;}
"daca" {return DACA;}
"pentru" {return PENTRU;}
"cat_timp" {return CATTIMP;}
def_variabla_globala {return VARGLOBALE;}
def_variabla_locala {return VARLOCALE;}
structura_obiect {return STRUCTURA;}
"procedura[]" {return PROCEDURA;}
start_program {return START;}
stop_program {return STOP;}
inceput_bloc_if {return INBLOCIF;}
sfarsit_bloc_if {return SFBLOCIF;}
atunci {return ATUNCI;}
altfel {return ALTFEL;}
from {return FROM;}
to {return TO;}
smaller_than {return MAIMIC;}
greater_than {return MAIMARE;}
equal_to {return EGAL;}
different_than {return DIFERIT;}
inceput_bloc_for {return INBLOCFOR;}
sfarsit_bloc_for {return SFBLOCFOR;}
inceput_bloc_cat_timp {return INBLOCCATTIMP;}
sfarsit_bloc_cat_timp {return SFBLOCCATTIMP;}
executa {return EXECUTA;}
suma {return SUM;}
invers {return INV;}
oglindit {return OGL;}
"<-" {return ASIGNARE;}
[a-zA-Z][a-zA-Z0-9]* {return ID;REJECT;}
{numar} {return NUMAR;REJECT;}
[0-9]{punct}{numar} {return NUMARREAL;}
{numar}|({virgula}{numar})* {return VECTASIGN;}
({numar2}{punct}{numar})|({virgula}({numar2}{punct}{numar}))* {return VECTASIGNREAL;}
[a-zA-Z][a-zA-Z ]* {return CUVANT;REJECT;}
afisare {return AFISARE;}
[ \t] ;
\n {yylineno++;}
. {return yytext[0];}
%%
I used REJECT in those rules because flex warned me that several of my rules conflicted with each other.
My grammar rules:
%{
#include <stdio.h>
extern FILE* yyin;
extern char* yytext;
extern int yylineno;
%}
%token INCLUDE LIBRARII ID TIPVAR CUVCHEIE REAL NUMARREAL FROM TO VECTASIGNREAL VARGLOBALE VARLOCALE VECTASIGN STRUCTURA PROCEDURA START STOP DACA PENTRU CATTIMP INBLOCIF SFBLOCIF ATUNCI ALTFEL MAIMIC MAIMARE EGAL DIFERIT INBLOCFOR SFBLOCFOR INBLOCCATTIMP SFBLOCCATTIMP EXECUTA SUM INV OGL ASIGNARE NUMAR CUVANT AFISARE
%start progr
%left '+' '-'
%left '*' '/'
%%
progr: headere declaratii program {printf("correct syntax");}
;
headere : header
|headere header
;
header : INCLUDE LIBRARII
;
declaratii : declaratie ';'
| declaratii declaratie ';'
;
declaratie : VARGLOBALE TIPVAR ID
| VARGLOBALE TIPVAR ID '[' NUMAR ']'
| VARGLOBALE TIPVAR ID ASIGNARE NUMAR
| VARGLOBALE TIPVAR ID ASIGNARE '#' CUVANT '#'
| VARGLOBALE TIPVAR ID '[' NUMAR ']' ASIGNARE '[' VECTASIGN ']'
| VARGLOBALE REAL ID
| VARGLOBALE REAL ID '[' NUMAR ']'
| VARGLOBALE REAL ID ASIGNARE NUMARREAL
| VARGLOBALE REAL ID '[' NUMAR ']' ASIGNARE '[' VECTASIGNREAL ']'
;
program : PROCEDURA START bloc STOP
;
bloc : declaratiile instructiuni
;
declaratiile : declaratia ';'
| declaratiile declaratia ';'
;
declaratia : VARLOCALE TIPVAR ID
| VARLOCALE TIPVAR ID '[' NUMAR ']'
| VARLOCALE TIPVAR ID ASIGNARE NUMAR
| VARLOCALE TIPVAR ID ASIGNARE '#' CUVANT '#'
| VARLOCALE TIPVAR ID '[' NUMAR ']' ASIGNARE '[' VECTASIGN ']'
| VARLOCALE REAL ID
| VARLOCALE REAL ID '[' NUMAR ']'
| VARLOCALE REAL ID ASIGNARE NUMARREAL
| VARLOCALE REAL ID '[' NUMAR ']' ASIGNARE '[' VECTASIGNREAL ']'
;
instructiuni : instructiune ';'
| instructiuni instructiune ';'
;
instructiune : instructiune_simpla ';'
| instructiune_compusa ';'
;
instructiune_simpla : ID ASIGNARE expresie ';'
;
expresie : expresie '+' expresie
| expresie '-' expresie
| expresie '*' expresie
| expresie '/' expresie
| functie
| NUMAR
| ID
;
functie : INV '(' expresie ')'
| SUM '(' expresie ',' expresie ',' expresie ',' expresie ',' expresie ')'
| OGL '(' expresie ')'
| AFISARE '(' NUMAR ')'
| AFISARE '(' ID ')'
| AFISARE '(' CUVANT ')'
;
instructiune_compusa : DACA conditie ATUNCI
INBLOCIF
instructiuni
SFBLOCIF
ALTFEL
INBLOCIF
instructiuni
SFBLOCIF
|PENTRU ID FROM NUMAR TO NUMAR EXECUTA
INBLOCFOR
instructiuni
SFBLOCFOR
|CATTIMP ID conditie NUMAR
INBLOCCATTIMP
instructiuni
SFBLOCCATTIMP
;
conditie : MAIMARE
| MAIMIC
| EGAL
;
%%
int yyerror(char * s){
printf("error: %s line:%d\n",s,yylineno);
}
int main(int argc, char** argv){
yyin=fopen(argv[1],"r");
yyparse();
}
This is the file I test with; the parser should report that the program has correct syntax.
&librarie=>stringuri
&librarie=>vectori
def_variabila_globala intreg var1<-23;
def_variabila_globala natural vect[50]<-{1,3,51,2,421,12,43};
def_variabila_globala real a<-12.5;
def_variabila_globala caracter ch<-#x#;
def_variabila_globala string s<-#alabala portocala#;
structura_obiect persoana
~
real inaltime;
natural varsta;
~
procedura[]
start_program
def_variabila_locala intreg negativ,pozitiv,s,contor1,contor2<-5;
def_variabila_locala adunari_scaderi;
def_variabila_locala inmultiri_impartiri;
def_variabila_locala ad_scd_inm_imp;
persoana p1;
p1#inaltime<-1.82;
p1#varsta<-20;
adunari_scaderi<-243-12+43-12+11+31-124;
afisare<adunari_scaderi>;
inmultiri_impartiri<-3*4/2/2*3/9;
afisare<inmultiri_impartiri>;
ad_scd_inm_imp<-4*2-3+5/2*5;
afisare<ad_scd_inm_imp>;
negativ<-invers<2>;
afisare<negativ>;
daca(invers<inmultiri_impartiri> smaller_than 0)
antunci
inceput_bloc_if
pentru contor1 from 0 to 10 executa
inceput_bloc_for
afiseaza<#for1#>;
afiseaza<#for2#>;
sfarsit_bloc_for
sfarsit_bloc_if
altfel
inceput_bloc_if
afiseaza<#if#>;
sfarsit_bloc_if
cat_timp contor2 greater_than 0
inceput_bloc_cat_timp
afisare<#cattimp#>;
contor<-contor-1;
sfarsit_bloc_cat_timp
pozitiv<-invers<-2>;
afisare<pozitiv>;
var1<-oglindit<321>;
afisare<var1>;
s<-suma<1,3,12,31,oglindit<123-2>>;
afisare<s>;
afisare<#ProgramTerminat#>;
stop_program
I don't know where it goes wrong. Is it because of the lexical rules? Are they interfering with each other?
Thank you.
EDIT
Error message: error: syntax error at line:1
The exact problem is that the parser sees an invalid rule at my "&librarie=>stringuri" declaration in my file (line 1).
I'm trying to parse a subset of C++ source syntax. The following ANTLR4 parser rules are copied directly from the C++ language specification (except that hyphens are replaced by underscores):
abstract_declarator:
ptr_operator abstract_declarator?
| direct_abstract_declarator
;
direct_abstract_declarator:
direct_abstract_declarator? '(' parameter_declaration_clause ')' cv_qualifier_seq? exception_specification?
| direct_abstract_declarator? '[' constant_expression? ']'
| '(' abstract_declarator ')'
;
But I got this error when org.antlr.v4.Tool is parsing the grammar:
error(119): cppProcessor.g4::: The following sets of rules are mutually left-recursive [direct_abstract_declarator]
It seems that direct_abstract_declarator? syntax at the left hand side causes the error. How should I correct it? Why can't ANTLR4 support it?
Manually refactoring the rules to this form doesn't produce the error:
direct_abstract_declarator:
direct_abstract_declarator '(' parameter_declaration_clause ')' cv_qualifier_seq? exception_specification?
| '(' parameter_declaration_clause ')' cv_qualifier_seq? exception_specification?
| direct_abstract_declarator '[' constant_expression? ']'
| '[' constant_expression? ']'
| '(' abstract_declarator ')'
;
So is it possible for ANTLR4 to support the first syntax directly when handling left recursive rules?
ANTLR 4 supports direct left recursion, but not indirect or hidden left recursion. You can address the situation above by explicitly expanding the optional construct.
direct_abstract_declarator
: direct_abstract_declarator '(' parameter_declaration_clause ')' cv_qualifier_seq? exception_specification?
| direct_abstract_declarator '[' constant_expression? ']'
| '(' parameter_declaration_clause ')' cv_qualifier_seq? exception_specification?
| '[' constant_expression? ']'
| '(' abstract_declarator ')'
;
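The expansion can be described mechanically: an alternative that starts with an optional reference to the rule itself is replaced by two alternatives, one that starts with the reference and one without it. The Python sketch below illustrates this on lists of symbols. The expand_optional_self helper is hypothetical and not part of ANTLR, and it only handles a trailing ? on the first symbol of an alternative.

```python
def expand_optional_self(rule, alternatives):
    """Split each alternative that begins with `rule?` into two alternatives:
    one that begins with `rule` (direct left recursion) and one without it."""
    expanded = []
    for alt in alternatives:
        if alt and alt[0] == rule + "?":
            expanded.append([rule] + alt[1:])
            expanded.append(alt[1:])
        else:
            expanded.append(alt)
    return expanded


rule = "direct_abstract_declarator"
alternatives = [
    [rule + "?", "'['", "constant_expression?", "']'"],
    ["'('", "abstract_declarator", "')'"],
]
for alt in expand_optional_self(rule, alternatives):
    print(" ".join(alt))
```

After the expansion, every recursive alternative begins with the rule name itself, which is the direct form of left recursion that ANTLR 4 rewrites automatically.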