I am trying to create a parser-generator using flex/bison. This is my partial parser.y code:
func_definition : type_specifier ID LPAREN parameter_list RPAREN compound_statement
{
$$=new Symbol_info();
$$->code+="PROC:"+ $2->symbol+"\n";
if($2->symbol!="main")
{
$$->code+="PUSH AX\n";
$$->code+="PUSH BX\n";
$$->code+="PUSH CX\n";
$$->code+="PUSH DX\n";
}
$$->code += $6->code ;
if($2->symbol!="main") {
$$->code+="POP DX\n";
$$->code+="POP CX\n";
$$->code+="POP BX\n";
$$->code+="POP AX\n";
}
fprintf(parseLog, "GRAMMER RULE: func_definition -> type_specifier ID LPAREN parameter_list RPAREN compound_statement \n");
}
;
And this is my partial lex.l code.
{id} {
Symbol_info *s= new Symbol_info(yytext, "ID");
yylval = (YYSTYPE)s;
return ID;
}
And this is my partial symbol_table.h code
class SymbolInfo{
string type;
string symbol;
public:
string code;
SymbolInfo *next;
SymbolInfo(){
symbol="";
type="";
code="";
}
SymbolInfo(string symbol, string type){
this->symbol=symbol;
this->type=type;
code="";
}
SymbolInfo(char *symbol, char *type){
this->symbol=string(symbol);
this->type= string(type);
code="";
}
SymbolInfo(const SymbolInfo *sym){
symbol=sym->symbol;
type=sym->type;
code=sym->code;
}
So, when I create a program, I get a SIGSEGV segmentation fault. (Address boundary error). It appears that I get that error when I try to access the yylval returned to me by the lex function.
I tried to run this code on an Ubuntu 64-bit instance (Ubuntu 17.10). I don't know why but the same code runs fine on a 32 bit system (Ubuntu 14.10).
Maybe it's because of the large Integer sizes. Here is the code if you're interested.
Related
I am studying compilers and studying Lex and Yacc. I write a LexYacc code as my teacher shows:
here is exp.l:
/*%option outfile="scanner.cpp"*/
%{
/*#include "exp.tab.h"*/
#include "y.tab.h"
extern int yylval;
%}
%%
0|[1-9][0-9]* { yylval = atoi(yytext); return INTEGER; }
[+*()\n] { return yytext[0]; }
. { /* do nothing */ }
%%
and this is exp.y:
/*
%output "parser.cpp"
%skeleton "lalr1.cc"
*/
%{
#include <stdio.h>
%}
%token INTEGER
%left '+'
%left '*'
%%
input : /* empty string */
| input line
;
line : '\n'
| exp '\n' { printf ("\t%d\n", $1); }
| error '\n'
;
exp : INTEGER { $$ = $1; }
| exp '+' exp { $$ = $1 + $3; }
| exp '*' exp { $$ = $1 * $3; }
| '(' exp ')' { $$ = $2; } ;
%%
main () {
yyparse ();
}
yyerror (char *s) {
printf ("%s\n", s);
}
and I use linux command to run it:
flex exp.l
bison -d exp.y
gcc exp.tab.c lex.yy.c -o exp -lfl
and it shows this:
exp.tab.c: In function ‘yyparse’:
exp.tab.c:1217:16: warning: implicit declaration of function ‘yylex’ [-Wimplicit-function-declaration]
1217 | yychar = yylex ();
|
exp.tab.c:1374:7: warning: implicit declaration of function ‘yyerror’; did you mean ‘yyerrok’? [-Wimplicit-function-declaration]
1374 | yyerror (YY_("syntax error"));
|
| yyerrok
exp.y: At top level:
exp.y:28:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
28 | main () {
|
exp.y:33:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
33 | yyerror (char *s) {
|
exp.l:4:10: fatal error: y.tab.h: No such file or directory
4 | #include "y.tab.h"
|
compilation terminated.
The whole program is mainly a calculator to calculate addition and multiplication.
I don't know what happened and hope someone can help me.
I'm working on vmWare linux ubuntu on a bison-dlex project, and I have an error in my bison file which I can't get over. In my "line' definition I have " logExp '\n' " definition, but for some reason it never gets there even though it does recognize the expression as logExp.
line:
expr '\n' { printf("\nExpression = %d\n", $1); }
| logExp '\n' { printf("\nNEVER GETS HERE!!\n"); } //ERROR
;
logExp:
expr AND expr { $$ = 0 ; printf("\n$1=%d, $3=%d\n",$1,$3); } //PRINTS GOOD
| AND { }
;
input:
5&&6
output:
$1=5, $3=6
Error: parse error
If it recognizes the logExp, how come it doesn't recognize the line above??
..HELP ?
Write a parser (both Yacc and Lex files) that uses the following productions and actions:
S -> cSS {print “x”}
S -> a {print “y”}
S -> b {print “z”}
Indicate the string that it will print when the input is cacba.
I am getting this error: when I give input to it, it says valid input and also says syntax error.
My Scanner Code is this
%{
#include "prac.h"
%}
%%
[c] {return C; }
[a] {return A; }
[b] {return B; }
[ \t] ;
\n { return 0; }
. { return yytext[0]; }
%%
int yywrap(void) {
return 1;
}
And my yacc code is this:
%{
#include <stdio.h>
%}
%token A B C
%%
statement: S {printf("Valid Input"); }
;
S: C S S {printf("Print x\n");}
| A {printf("Print y\n");}
| B {printf("Print z\n");}
;
%%
int main()
{
return yyparse();
}
yyerror(char *s)
{
printf("\n%s\n",s);
printf("Invalid Input");
fprintf(stderr,"At line %d %s ",s,yylineno);
}
How can I fix this?
(Comments converted to an answer)
#ChrisDodd wrote:
Best guess -- you're running on windows, so you're getting a \r (carriage return) character before the newline which is causing your error. Try adding \r to the [ \t] pattern to ignore it.
#Cyclone wrote:
Change your fprintf() statement to fprintf(stderr, "At line %d %s", yylineno, s); not that it will solve your problem.
The OP wrote:
You mean I should add \r into \t so the new regex for it will be [\r\t] Am I right ?
#rici wrote:
#chris suggests [ \r\t]. If you have Windows somewhere in the loop, I agree.
For some reason bison is rejecting a specific rule, the notequal_expression, beware that Im just starting to learn the whole concept so my line of thought is not so mature, the input file: ( The Error is: "string.y contains 1 useless nonterminal and 1 useless rule." )
/* Parser for StringC */
%{
/* ------------------------------------------------------------------
Initial code (copied verbatim to the output file)
------------------------------------------------------------------ */
// Includes
#include <malloc.h> // _alloca is used by the parser
#include <string.h> // strcpy
#include "lex.h" // the lexer
// Some yacc (bison) defines
#define YYDEBUG 1 // Generate debug code; needed for YYERROR_VERBOSE
#define YYERROR_VERBOSE // Give a more specific parse error message
// Error-reporting function must be defined by the caller
void Error (char *format, ...);
// Forward references
void yyerror (char *msg);
%}
/* ------------------------------------------------------------------
Yacc declarations
------------------------------------------------------------------ */
/* The structure for passing value between lexer and parser */
%union {
char *str;
}
%token ERROR_TOKEN IF ELSE PRINT INPUT ASSIGN EQUAL NOTEQUAL
%token CONCAT END_STMT OPEN_PAR CLOSE_PAR
%token BEGIN_CS END_CS
%token <str> ID STRING BOOLEAN
/*%type <type> type simple_type cast*/
%expect 1 /* shift/reduce conflict: dangling ELSE */
/* declaration */
%%
/* ------------------------------------------------------------------
Yacc grammar rules
------------------------------------------------------------------ */
program
: statement_list
;
statement_list
: statement_list statement
| /* empty */
;
statement
: END_STMT {puts ("Empty statement");}
| expression END_STMT {puts ("Expression statement");}
| PRINT expression END_STMT {puts ("Print statement");}
| INPUT identifier END_STMT {puts ("Input statement");}
| if_statement {puts ("If statement");}
| compound_statement {puts ("Compound statement");}
| error END_STMT {puts ("Error statement");}
| notequal_expression {puts ("Not equal statement");}
;
/* NOTE: This rule causes an unresolvable shift/reduce conflict;
That's why %expect 1 was added (see above) */
if_statement
: IF OPEN_PAR expression CLOSE_PAR statement optional_else_statement
;
optional_else_statement
: ELSE statement
| /* empty */
;
compound_statement
: BEGIN_CS statement_list END_CS
;
expression
: equal_expression
| OPEN_PAR expression CLOSE_PAR
;
equal_expression
: expression EQUAL assign_expression
| assign_expression
;
notequal_expression
: expression NOTEQUAL assign_expression
| NOTEQUAL assign_expression
;
assign_expression
: identifier ASSIGN assign_expression
| concat_expression
;
concat_expression
: concat_expression CONCAT simple_expression
| simple_expression
;
simple_expression
: identifier
| string
;
identifier
: ID {}
;
string
: STRING {}
;
bool
: BOOLEAN {}
;
%%
/* ------------------------------------------------------------------
Additional code (again copied verbatim to the output file)
------------------------------------------------------------------ */
The lexer:
/* Lexical analyzer for StringC */
%{
/* ------------------------------------------------------------------
Initial code (copied verbatim to the output file)
------------------------------------------------------------------ */
// Includes
#include <string.h> // strcpy, strncpy
#include <io.h> // isatty
#ifdef MSVC
#define isatty _isatty // for some reason isatty is called _isatty in VC..
#endif
#define _LEX_CPP_ // make sure our variables get created
#include "lex.h"
#include "lexsymb.h"
extern "C" int yywrap (); // the yywrap function is declared by the caller
// Forward references
void Identifier ();
void StringConstant ();
void BoolConstant ();
void EatComment ();
//// End of inititial code
%}
/* ------------------------------------------------------------------
Some macros (standard regular expressions)
------------------------------------------------------------------ */
LETTER [a-zA-Z_]
DIGIT [0-9]
IDENT {LETTER}({LETTER}|{DIGIT})*
STR \"[^\"]*\"
BOOL \(false|true)\
WSPACE [ \t]+
/* ------------------------------------------------------------------
The lexer rules
------------------------------------------------------------------ */
%%
"if" {return IF;}
"else" {return ELSE;}
"print" {return PRINT;}
"input" {return INPUT;}
"=" {return ASSIGN;}
"==" {return EQUAL;}
"!=" {return NOTEQUAL;} /* Not equal to */
"+" {return CONCAT;}
";" {return END_STMT;}
"(" {return OPEN_PAR;}
")" {return CLOSE_PAR;}
"{" {return BEGIN_CS;}
"}" {return END_CS;}
{BOOL} {BoolConstant (); return BOOLEAN;}
{STR} {StringConstant (); return STRING;}
{IDENT} {Identifier (); return ID;}
"//" {EatComment();} /* comment: skip */
\n {lineno++;} /* newline: count lines */
{WSPACE} {} /* whitespace: (do nothing) */
. {return ERROR_TOKEN;} /* other char: error, illegal token */
%%
/* ------------------------------------------------------------------
Additional code (again copied verbatim to the output file)
------------------------------------------------------------------ */
// The comment-skipping function: skip to end-of-line
void EatComment() {
char c;
while ((c = yyinput()) != '\n' && c != 0);
lineno++;
}
// Pass the id name
void Identifier () {
yylval.str = new char[strlen(yytext)+1];
strcpy (yylval.str, yytext);
}
// Pass the string constant
void StringConstant() {
int l = strlen(yytext)-2;
yylval.str = new char[l+1];
strncpy (yylval.str, &yytext[1], l); yylval.str[l] = 0;
}
void BoolConstant() {
int l = strlen(yytext)-2;
yylval.str = new char[l+1];
strncpy(yylval.str, &yytext[1], l); yylval.str[l] = 0;
}
Are you sure that it's notequal_expression that is causing the issue? The nonterminal and rule that are not used, as I read it, are
bool
: BOOLEAN {}
;
Perhaps instead of
simple_expression
: identifier
| string
;
you intended to code
simple_expression
: identifier
| string
| bool
;
There are two problems with the grammar. The first is the shift/reduce conflict you've already seen (and addressed with %expect 1. I prefer to address it in the grammar instead and use %expect 0 instead. You can do that by removing ELSE from the %token list and adding a line
%right THEN ELSE
To declare right associativity. Your language doesn't actually have a THEN keyword but that's fine. You can then remove completely the rule for optional_else_statement and reword the rule for if_statement as follows:
if_statement
: IF OPEN_PAR expression CLOSE_PAR statement %prec THEN
| IF OPEN_PAR expression CLOSE_PAR statement ELSE statement
;
There are those who prefer to resolve it this way, and others who advocate the %expect 1 approach. I prefer this way, but now that you have both methods, you can certainly choose for yourself.
For the other problem, the useless rule is definitely this one:
bool
: BOOLEAN {}
;
because the non-terminal bool is not used anywhere else in the grammar. This accounts for both "1 useless nonterminal and 1 useless rule" as reported by bison. To be able to identify this kind of thing for yourself, you can use
bison --report=solved -v string.y
This will create a string.output file which will contain a large but readable report including any resolved shift-reduce errors (such as your IF-ELSE construction) and also a complete set of states created by bison. It's very useful when attempting to troubleshoot grammar problems.
I am writing a front end for my C compiler, where in I am adding Type system currently. Previously I assumed everything was an int and hence the following rule worked fine.
declaration: datatype varList ';' { gTrace<<"declaration ";}
varList: IDENTIFIER { builder.addSymbol($1); }
| varList',' IDENTIFIER { builder.addSymbol($3); }
;
But now I also add type to the symbol, and hence modified my rule like below:
declaration: datatype { currentType = $1; } varList ';' { gTrace<<"declaration "; currentType = -1; }
varList: IDENTIFIER { builder.addSymbol($1, getType(currentType)); }
| varList',' IDENTIFIER { builder.addSymbol($3, getType(currentType)); }
;
I get a shift/reduce error when I do that, since the { currentType = $1; } is being considered as an empty rule. How do I go about this error? Is there a way to specify that it is just an action?
Attached below is a snippet from my y.output
32 $#6: /* empty */
33 declaration: datatype $#6 varList ';'
34 varList: IDENTIFIER
35 | varList ',' IDENTIFIER
I don't get any error or warnings:
%token INT
%token FLOAT
%token CHAR
%token IDENTIFIER
%%
declaration: datatype { currentType = $1; } varList ';' { gTrace<<"declaration "; currentType = -1; }
varList : IDENTIFIER { builder.addSymbol($1, getType(currentType)); }
| varList ',' IDENTIFIER { builder.addSymbol($3, getType(currentType)); }
;
datatype: INT
| FLOAT
| CHAR
;
%%
Command
% bison p.yacc
%
I think you will need to provide more information.
The full yacc file and the parameters you are passing to yacc/bison
Edit
I tried your file (as per the comment) I still get no errors or warnings:
> yacc --version
bison (GNU Bison) 2.3
Written by Robert Corbett and Richard Stallman.
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I fixed the problem as below:
declaration: datatype varList ';' { gTrace<<"declaration "; currentType = -1; }
varList: IDENTIFIER { builder.addSymbol($1, getType(currentType)); }
| varList',' IDENTIFIER { builder.addSymbol($3, getType(currentType)); }
;
datatype: INTEGER { gTrace<<"int "; $$ = currentType = Type::IntegerTy; }
| FLOAT { gTrace<<"float "; $$ = currentType = Type::FloatTy; }
| VOID { gTrace<<"void "; $$ = currentType = Type::VoidTy; }
;
#sarnold, hope this helps!
I thing you can only define an actions block for each rule, so
declaration: datatype { currentType = $1; } varList ';' { gTrace<<"declaration "; currentType = -1; }
should be done as
declaration: datatype varList ';' { currentType = $1; gTrace<<"declaration "; currentType = -1; }
Anyway, you are setting currentType to the lexical value of datatype and to -1 right after