I am trying to parse a simple if condition from an input file.
I will have something like
if(color = black)
No matter what I do, I keep getting 1 shift/reduce conflict.
I am very new to lex and yacc.
Do yacc grammars often have shift/reduce conflicts, and should I just not worry about them?
My lex file returns every character in the file correctly, so I won't show you the lex file.
However, here is my yacc file:
%{
#include <ctype.h>
#include <stdio.h>
%}
|IF LPAREN COLOR EQ BLACK RPAREN {$$ = $1; printf("WONT COMPILE\n");}
;
In the yacc file I tried this, but that is where I am getting the shift/reduce conflict:
IF LPAREN COLOR EQ BLACK RPAREN {$$ = $1; printf("If statement\n");}
I originally wrote a long answer about the “dangling else” ambiguity, but then I took a closer look at your grammar. Let’s cut it down to size a bit and demonstrate where the problems are:
%token IF ELSE THEN EQ LPAREN RPAREN COLOR BLACK
%%
statement
: statement command
| /*nothing*/
;
command
: IF {$$ = $1; printf ("IF\n");}
| ELSE {$$ = $1; printf("ELSE\n");}
| EQ {$$ = $1; printf("EQ\n");}
| THEN {$$ = $1; printf("THEN\n");}
| LPAREN {$$ = $1; printf("LPAREN\n");}
| RPAREN {$$ = $1; printf("RPAREN\n");}
| COLOR EQ BLACK {$$ = $3; printf("color is black\n");}
| IF LPAREN COLOR EQ BLACK RPAREN {$$ = $1; printf("WONT COMPILE\n");}
;
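Just how do you expect the statement if(color = black) to be parsed? After the parser has read IF and sees ( as the next token, it can either reduce IF on its own to a command (via command: IF) or shift the ( to continue the longer rule IF LPAREN COLOR EQ BLACK RPAREN. Both lead to valid parses: a sequence of small commands (IF, LPAREN, COLOR EQ BLACK, RPAREN) or the single long command.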
That explains the specific warning you’re getting. Now, on to the rest of your grammar:
You don’t want to write your grammar so that incomplete statements are meaningful. Notice that the single symbol “=” is a complete, valid command and therefore a complete, valid statement. Is that really what you want?
You’re going to want to rewrite this from scratch. Start simple:
%token NUMBER COMMAND IF THEN ELSE COLOR BLACK
%%
statement
: COMMAND NUMBER
| IF cond THEN statement
| /* nothing */
;
cond
: '(' COLOR '=' BLACK ')'
;
Not tested, but this should be enough to get you started. If you need to do something when you encounter a token, you can (for example) replace IF cond THEN statement with if cond then statement and add rules like
if : IF { printf("%s\n", "IF"); }
;
then: THEN { printf("%s\n", "THEN"); }
;
Start simple, add slowly, and refactor when rules get too hairy or repetitive. And work through a tutorial before you jump into a large project. The GNU Bison manual has a good tutorial, as does Kernighan & Pike’s The Unix Programming Environment.
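As a sanity check that the simplified grammar has only one way to read each input, here is a hand-written recursive-descent recognizer in C++ for the same statement and cond rules. It is a sketch, not what Bison generates: the Recognizer class and the token spellings (plain strings such as "IF" and "(") are hypothetical. Each statement alternative starts with a different token (COMMAND, IF, or nothing at all), so one token of lookahead always decides, which is why this grammar should not produce conflicts.

```cpp
// Hypothetical recursive-descent recognizer for the simplified grammar:
//   statement : COMMAND NUMBER | IF cond THEN statement | /* nothing */
//   cond      : '(' COLOR '=' BLACK ')'
// Tokens are plain strings here; a real parser would get them from yylex().
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Recognizer {
public:
    explicit Recognizer(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // True only if the entire token list is exactly one statement.
    bool accepts() {
        pos_ = 0;
        return statement() && pos_ == toks_.size();
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    // Consume the next token if it equals t.
    bool expect(const std::string& t) {
        if (pos_ < toks_.size() && toks_[pos_] == t) {
            ++pos_;
            return true;
        }
        return false;
    }

    bool cond() {
        return expect("(") && expect("COLOR") && expect("=") &&
               expect("BLACK") && expect(")");
    }

    // One token of lookahead picks the alternative.
    bool statement() {
        if (expect("COMMAND")) return expect("NUMBER");
        if (expect("IF")) return cond() && expect("THEN") && statement();
        return true;  // the empty alternative consumes nothing
    }
};
```

With this sketch, IF ( COLOR = BLACK ) THEN COMMAND NUMBER is accepted, while a lone = is rejected, unlike in the original grammar where = by itself was a complete command.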
When I run my Bison code on Linux Mint, I get this warning:
warning: 64 shift/reduce conflicts [-Wconflicts-sr]
I cannot find the ambiguity. I've tried to make my implementation handle both INT and FLOAT datatypes, but I believe that is the source of the warnings.
My Bison code is below:
%{
/**
* Definition section
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cstdio>
#include <iostream>
using namespace std;
/**
* Declare stuff from Flex that Bison needs to know about:
*/
extern int yylex();
extern int yyparse();
extern FILE* yyin;
extern int line_num;
void yyerror(const char* s);
%}
/** Bison fundamentally works by asking Flex to get the next token,
* whose semantic value is an object of type "YYSTYPE". Initially (by default),
* YYSTYPE is merely a typedef of "int", but for non-trivial projects, tokens
* could be of any arbitrary data type. So, to deal with that, the idea is to
* override YYSTYPE's default typedef with a C union instead. Unions can hold all
* of the types of tokens that Flex could return, and this means we can return
* ints or floats or strings cleanly. Bison implements this mechanism with the
* %union directive:
*/
%union {
int ival;
float fval;
}
/**
* Define the "terminal symbol" token types. We name them in CAPS by convention,
* and associate each with a field of the %union:
*/
%token<ival> INT
%token<fval> FLOAT
%token PLUS MINUS MULTIPLY DIVIDE
%token NEWLINE QUIT
%type<ival> expression
%type<fval> mixed_expression
%start calculation
%%
calculation:
| calculation line
;
line: NEWLINE
| mixed_expression NEWLINE {printf("\tResult: %f\n", $1);}
| expression NEWLINE {printf("\tResult: %i\n", $1);}
| QUIT NEWLINE {printf("bye!\n"); exit(0);}
;
mixed_expression: FLOAT {$$ = $1;}
| mixed_expression PLUS mixed_expression {$$ = $1 + $3;}
| mixed_expression MINUS mixed_expression {$$ = $1 - $3;}
| mixed_expression MULTIPLY mixed_expression {$$ = $1 * $3;}
| mixed_expression DIVIDE mixed_expression {$$ = $1 / $3;}
| expression PLUS mixed_expression {$$ = $1 + $3;}
| expression MINUS mixed_expression {$$ = $1 - $3;}
| expression MULTIPLY mixed_expression {$$ = $1 * $3;}
| expression DIVIDE mixed_expression {$$ = $1 / $3;}
| mixed_expression PLUS expression {$$ = $1 + $3;}
| mixed_expression MINUS expression {$$ = $1 - $3;}
| mixed_expression MULTIPLY expression {$$ = $1 * $3;}
| mixed_expression DIVIDE expression {$$ = $1 / $3;}
| expression DIVIDE expression {$$ = $1 / (float)$3;}
;
expression: INT {$$ = $1;}
| expression PLUS expression {$$ = $1 + $3;}
| expression MINUS expression {$$ = $1 - $3;}
| expression MULTIPLY expression {$$ = $1 * $3;}
;
%%
int main(int, char**) {
// Open a file handle to a particular file:
FILE *myfile = fopen("numbers.txt", "r");
// Make sure it is valid:
if(!myfile) {
cout << "I can't open numbers.txt!" << endl;
return -1;
}
// Set Flex to read from it instead of defaulting to STDIN:
yyin = myfile;
// Parse through the input:
yyparse();
}
void yyerror(const char *s) {
cout << "AAAHHHH, parse error on line " << line_num << "! Message: " << s << endl;
// might as well halt now:
exit(-1);
}
I believe that the issue may be with my implementation of expression and mixed_expression.
The datatype of a computation is semantic, not syntactic; intuitively, one wouldn't expect a grammar to deal with it. Under some very limited circumstances it's possible — your attempt is workable, at least until you decide to implement variables — but it's almost never a good idea.
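To make the "semantic, not syntactic" point concrete: instead of separate expression and mixed_expression nonterminals, a single expression can carry a value that is either an int or a float, and the actions do the promotion. Below is a minimal C++17 sketch assuming a hypothetical Number type built on std::variant; how such a type is wired into the parser's semantic value is left out.

```cpp
// Hypothetical semantic value for a calculator: one type that is either an
// int or a float. The actions decide the result type; the grammar doesn't.
#include <functional>
#include <variant>

using Number = std::variant<int, float>;

// Read a Number as float, promoting ints.
inline float as_float(const Number& n) {
    if (std::holds_alternative<int>(n)) return static_cast<float>(std::get<int>(n));
    return std::get<float>(n);
}

// Apply op in int arithmetic if both operands are ints, else in float arithmetic.
template <typename Op>
Number arith(const Number& a, const Number& b, Op op) {
    if (std::holds_alternative<int>(a) && std::holds_alternative<int>(b))
        return op(std::get<int>(a), std::get<int>(b));
    return op(as_float(a), as_float(b));
}

// Division always yields a float, mirroring the question's
// "expression DIVIDE expression" rule, which casts to float.
inline Number divide(const Number& a, const Number& b) {
    return as_float(a) / as_float(b);
}
```

Assuming the parser's value type can hold a Number, the whole family of mixed rules would collapse to one rule per operator, such as expression PLUS expression { $$ = arith($1, $3, std::plus<>{}); }.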
But that's not what's producing the shift-reduce conflicts. The problem is that your grammar is ambiguous and you have not told Bison how to resolve the ambiguities. The grammar is ambiguous because 4-2*3 could be parsed either as expression(4-2) MULTIPLY expression(3) or as expression(4) MINUS expression(2*3), which obviously have different values (6 and -2). Associativity is ambiguous in the same way: 4-2-3 could be grouped as (4-2)-3 or as 4-(2-3). The usual way to resolve these ambiguities is through precedence declarations, which you will find deployed in pretty well all of the useful example parsers.
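In Bison the usual fix is a pair of declarations before the first %%, for example %left PLUS MINUS followed on the next line by %left MULTIPLY DIVIDE; later lines bind more tightly, and %left groups each operator from left to right. To show what that choice means for 4-2*3, here is a small hand-written precedence-climbing evaluator in C++. It is a different parsing technique from Bison's LR tables and is only meant to illustrate the grouping those declarations request; the Evaluator class is hypothetical.

```cpp
// Hypothetical Evaluator: precedence climbing over integer expressions with
// + - * / and parentheses. * and / bind tighter than + and -, and all four
// are left-associative, the same outcome the %left lines request.
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class Evaluator {
public:
    explicit Evaluator(std::string src) : s_(std::move(src)) {}

    long parse() {
        long v = expr(1);
        skip_ws();
        if (i_ != s_.size()) throw std::runtime_error("unexpected input");
        return v;
    }

private:
    std::string s_;
    std::size_t i_ = 0;

    void skip_ws() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
    }

    // Binding strength of a binary operator; 0 means "not an operator".
    static int prec(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return 0;
        }
    }

    long primary() {
        skip_ws();
        if (i_ < s_.size() && s_[i_] == '(') {
            ++i_;
            long v = expr(1);
            skip_ws();
            if (i_ >= s_.size() || s_[i_] != ')') throw std::runtime_error("expected )");
            ++i_;
            return v;
        }
        std::size_t start = i_;
        while (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_]))) ++i_;
        if (start == i_) throw std::runtime_error("expected a number");
        return std::stol(s_.substr(start, i_ - start));
    }

    // Parse a primary followed by operators of precedence >= min_prec.
    long expr(int min_prec) {
        long lhs = primary();
        for (;;) {
            skip_ws();
            if (i_ >= s_.size()) break;
            char op = s_[i_];
            int p = prec(op);
            if (p < min_prec) break;  // also stops at ')' and non-operators (p == 0)
            ++i_;
            // Left associativity: the right operand may only contain
            // operators that bind strictly tighter than op.
            long rhs = expr(p + 1);
            if (op == '+') lhs += rhs;
            else if (op == '-') lhs -= rhs;
            else if (op == '*') lhs *= rhs;
            else lhs /= rhs;  // no division-by-zero check in this sketch
        }
        return lhs;
    }
};
```

Evaluator("4-2*3").parse() groups the input as 4-(2*3) and yields -2, and Evaluator("10-4-3").parse() groups it as (10-4)-3 and yields 3.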
For example, I have a lot of TeX strings like
|u(x,t)|^2 = \frac{1}{\sqrt{1+(4+t)^2))e^{-\frac{2(x-k_0t)^2}{1+4t^2))
The problem with the TeX above is that the brackets don't match: \frac{1}{\sqrt{1+(4+t)^2)) should be \frac{1}{\sqrt{1+(4+t)^2}} and {-\frac{2(x-k_0t)^2}{1+4t^2)) should be {-\frac{2(x-k_0t)^2}{1+4t^2}}
wrong: \frac{1}{\sqrt{1+(4+t)^2))
right: \frac{1}{\sqrt{1+(4+t)^2}}
wrong: {-\frac{2(x-k_0t)^2}{1+4t^2))
right: {-\frac{2(x-k_0t)^2}{1+4t^2}}
Explanation: the first example is wrong because the last two ) characters have no ( to match them; they should be } to match the earlier {.
How can I automatically correct such mismatched brackets? I have Perl installed and intended to do it with a regex, but I can't figure out a way.
I don't know if I'm understanding you correctly, but it sounds to me like you need to count brackets and make sure that the number of ( or [ or { is equal to the number of corresponding ) or ] or }.
One possible solution is to use a hash for every line of TeX and count the brackets in it (I'm not sure what the file looks like; I assume all lines are like the ones you provided):
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift or die "Usage: $0 file.tex\n";
my $line_num = 0;
open my $fh, '<', $file or die "Error: $!\n";
while (<$fh>) {
my %brackets = (
'(' => 0,
'[' => 0,
'{' => 0
);
$line_num++;
my @chars = split //, $_;
### Count brackets: each opener adds one, each closer subtracts one.
foreach my $char (@chars) {
if ($char eq '(' or $char eq '[' or $char eq '{') {
$brackets{$char}++;
} elsif ($char eq ')') {
$brackets{'('}--;
} elsif ($char eq ']') {
$brackets{'['}--;
} elsif ($char eq '}') {
$brackets{'{'}--;
}
}
### Check that all hash values are 0.
foreach my $bracket (keys %brackets) {
if ($brackets{$bracket} != 0) {
print "In line $line_num: '$bracket' missing $brackets{$bracket} closing brackets.\n";
}
}
}
close $fh;
This code will at least tell you where the errors occurred and give you a general idea of their nature. For input such as )ff){gfs[[y[46rw] the output will be (the order of the lines may vary, because Perl's hash order is not fixed):
In line 1: '{' missing 1 closing brackets.
In line 1: '[' missing 2 closing brackets.
In line 1: '(' missing -2 closing brackets.
Instead of just printing the counts (it would probably be better to store the positions of the brackets rather than how many there are), you can write simple code to fix the errors, because at that point you'll have all the information you need.
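One way to do the fixing, assuming the opening brackets are correct and only the closers are wrong (true of both examples in the question), is to keep a stack of open brackets and rewrite each closer to match the innermost open one. This is a different technique from the counting approach above, shown as a C++ sketch; repair_brackets is a hypothetical name. It would also "repair" legitimate mismatches such as the half-open interval [0,1), so review its output rather than applying it blindly.

```cpp
// Hypothetical helper: force every closing bracket to match the innermost
// still-open bracket. Assumes the openers are right and only closers are wrong.
#include <cstddef>
#include <string>
#include <vector>

std::string repair_brackets(const std::string& tex) {
    std::string out = tex;
    std::vector<char> expected;  // stack of closers we expect to see
    for (std::size_t i = 0; i < out.size(); ++i) {
        char c = out[i];
        if (c == '\\') {  // skip the character after a backslash, e.g. \{ or \}
            ++i;
            continue;
        }
        switch (c) {
            case '(': expected.push_back(')'); break;
            case '[': expected.push_back(']'); break;
            case '{': expected.push_back('}'); break;
            case ')': case ']': case '}':
                if (!expected.empty()) {
                    out[i] = expected.back();  // rewrite to the matching closer
                    expected.pop_back();
                }
                // A closer with nothing open is left unchanged.
                break;
            default: break;
        }
    }
    return out;
}
```

Applied to the first example, \frac{1}{\sqrt{1+(4+t)^2)) becomes \frac{1}{\sqrt{1+(4+t)^2}}: the ) after (4+t is already correct, and the two trailing ) characters are rewritten to close \sqrt{ and the denominator {.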
This is not a simple problem if the errors in the file follow no pattern, so I recommend looking for one before trying anything else.
There must be some kind of condition that indicates where parentheses are used instead of braces. I assume it is either in front of e^ or at the end of the line.
This fixes the first one:
perl -pi~ -e 's/\)\)e\^/}}e^/' file.tex
And this the second:
perl -pi~ -e 's/\)\)$/}}/' file.tex
I'm trying to munge a simple grammar with a Perl regex (note this isn’t intended for production use, just a quick analysis for providing editor hints/completions). For instance,
my $GRAMMAR = qr{(?(DEFINE)
(?<expr> \( (?&expr) \) | (?&number) | (?&var) | (?&expr) (?&op) (?&expr) )
(?<number> \d++ )
(?<var> [a-z]++ )
(?<op> [-+*/] )
)}x;
I would like to be able to run this as
$expr =~ /$GRAMMAR(?&expr)/;
and then access all the variable names. However, according to perlre,
Note that capture groups matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing groups is necessary. Thus $+{NAME_PAT} would not be defined even though $+{NAME} would be.
So apparently this is not possible. I could try using a (?{ code }) block to save variable names to a hash, but this doesn't respect backtracking (i.e. the assignment’s side effect persists even if the variable is backtracked past).
Is there any way to get everything captured by a given named capture group, including recursive matches? Or do I need to manually dig through the individual pieces (and thus duplicate all the patterns)?
The necessity of having to add capturing and backtracking machinery is one of the shortcomings that Regexp::Grammars addresses.
However, the grammar in your question is left-recursive, which neither Perl regexes nor a recursive-descent parser will parse.
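To see why left recursion is the problem, and how the factoring avoids it, here is the same idea as a hand-written recursive-descent parser in C++ (a swapped-in technique for illustration, not Regexp::Grammars; VarCollector is a hypothetical name). A direct translation of expr -> expr op expr would call expr() again before consuming any input and recurse forever; the factored form always consumes a Term first. The parser never backtracks, and it records variable names in a scratch set that is copied out only when the whole input parses, so a failed match leaves no side effects.

```cpp
// Hypothetical VarCollector for the left-recursion-free grammar
//   Expr -> Term (Op Term)*
//   Term -> Number | Var | '(' Expr ')'
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

class VarCollector {
public:
    // Returns true and adds the variable names to vars if src is a complete Expr.
    static bool parse(const std::string& src, std::set<std::string>& vars) {
        VarCollector p(src);
        if (!p.expr()) return false;
        p.skip_ws();
        if (p.i_ != p.s_.size()) return false;  // trailing garbage
        vars.insert(p.seen_.begin(), p.seen_.end());
        return true;
    }

private:
    explicit VarCollector(std::string s) : s_(std::move(s)) {}

    std::string s_;
    std::size_t i_ = 0;
    std::set<std::string> seen_;  // scratch set, published only on success

    void skip_ws() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
    }
    static bool is_lower(char c) { return c >= 'a' && c <= 'z'; }
    static bool is_op(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }

    bool term() {
        skip_ws();
        if (i_ >= s_.size()) return false;
        if (std::isdigit(static_cast<unsigned char>(s_[i_]))) {  // Number
            while (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_]))) ++i_;
            return true;
        }
        if (is_lower(s_[i_])) {  // Var
            std::size_t start = i_;
            while (i_ < s_.size() && is_lower(s_[i_])) ++i_;
            seen_.insert(s_.substr(start, i_ - start));
            return true;
        }
        if (s_[i_] == '(') {  // '(' Expr ')'
            ++i_;
            if (!expr()) return false;
            skip_ws();
            if (i_ >= s_.size() || s_[i_] != ')') return false;
            ++i_;
            return true;
        }
        return false;
    }

    // Consumes a Term before looping, so it cannot recurse without progress.
    bool expr() {
        if (!term()) return false;
        for (;;) {
            skip_ws();
            if (i_ >= s_.size() || !is_op(s_[i_])) return true;
            ++i_;  // Op
            if (!term()) return false;
        }
    }
};
```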
Adapting your grammar to Regexp::Grammars and factoring out left-recursion produces
my $EXPR = do {
use Regexp::Grammars;
qr{
^ <Expr> $
<rule: Expr> <Term> <ExprTail>
| <Term>
<rule: Term> <Number>
| <Var>
| \( <MATCH=Expr> \)
<rule: ExprTail> <Op> <Expr>
<token: Op> \+ | \- | \* | \/
<token: Number> \d++
<token: Var> [a-z]++
}x;
};
Note that this simple grammar gives all operators equal precedence rather than Please Excuse My Dear Aunt Sally.
You want to extract all variable names, so you could walk the AST as in
sub all_variables {
my($root,$var) = @_;
$var ||= {};
++$var->{ $root->{Var} } if exists $root->{Var};
all_variables($_, $var) for grep ref $_, values %$root;
wantarray ? keys %$var : [ keys %$var ];
}
and print the result with
if ("(a + (b - c))" =~ $EXPR) {
print "[$_]\n" for sort +all_variables \%/;
}
else {
print "no match\n";
}
Another approach is to install an autoaction for the Var rule that records names of variables as they are successfully parsed.
package JustTheVarsMaam;
sub new { bless {}, shift }
sub Var {
my($self,$result) = @_;
++$self->{VARS}{$result};
$result;
}
sub all_variables { keys %{ $_[0]->{VARS} } }
1;
Call this one as in
my $vars = JustTheVarsMaam->new;
if ("(a + (b - c))" =~ $EXPR->with_actions($vars)) {
print "[$_]\n" for sort $vars->all_variables;
}
else {
print "no match\n";
}
Either way, the output is
[a]
[b]
[c]
Recursion is handled natively by Marpa::R2, using the BNF in the __DATA__ section below:
#!/usr/bin/env perl
use strict;
use diagnostics;
use Marpa::R2;
my $input = shift || '(a + (b - c))';
my $grammar_source = do {local $/; <DATA>};
my $recognizer = Marpa::R2::Scanless::R->new
(
{
grammar => Marpa::R2::Scanless::G->new
(
{
source => \$grammar_source,
action_object => __PACKAGE__,
}
)
},
);
my %vars = ();
sub new { return bless {}, shift;}
sub varAction { ++$vars{$_[1]}};
$recognizer->read(\$input);
$recognizer->value() || die "No parse";
print join(', ', sort keys %vars) . "\n";
__DATA__
:start ::= expr
expr ::= NUMBER
| VAR action => varAction
| expr OP expr
| '(' expr ')'
NUMBER ~ [\d]+
VAR ~ [a-z]+
OP ~ [-+*/]
WS ~ [\s]+
:discard ~ WS
The output is:
a, b, c
Your question only asked how to get the variable names, so this answer leaves out operator precedence and associativity. Marpa has no problem with those if you need them.
I'm trying to shorten some repetitive code in my bison parser, here's an excerpt of one of the rules:
expression : OBJECTID ASSIGN expression { $$ = std::make_shared<Assign>($1, $3); $$->setloc(@3.first_line, curr_filename); }
| expression '.' OBJECTID '(' method_expr_list ')' { $$ = std::make_shared<DynamicDispatch>($1, $3, $5);
$$->setloc(@1.first_line, curr_filename); }
I was thinking of something along the lines of:
expression : OBJECTID ASSIGN expression { $$ = std::make_shared<Assign>($1, $3); SETLOC(@1); }
| expression '.' OBJECTID '(' method_expr_list ')' { $$ = std::make_shared<DynamicDispatch>($1, $3, $5);
SETLOC(@1); }
I can't think of any way to achieve this other than a macro. This is what I came up with:
#define SETLOC(node) $$->setloc((node).first_line, curr_filename)
Unfortunately, I get a compile error saying that $$ is not defined, which makes sense since it's a function-like macro. Is there a way to achieve the code in the second snippet?
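This is because $$ is a special sequence that Bison recognizes and replaces inside rule actions; it never appears as such in the generated C code, and Bison does not rewrite it inside your #define. You have to pass it in as an argument to the macro instead:
#define SETLOC(parent, node) (parent)->setloc((node).first_line, curr_filename)
and then write SETLOC($$, @1); in each action.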
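The two-argument macro can be checked on its own outside Bison. In this C++ sketch, Node, Location, build, and the curr_filename value are hypothetical stand-ins (Location only mimics the first_line field of Bison's default location type); build shows roughly what an action such as { $$ = ...; SETLOC($$, @1); } turns into once Bison has replaced $$ and @1 with real expressions.

```cpp
// Hypothetical stand-ins so the macro can be compiled outside a .y file.
#include <memory>
#include <string>

struct Location { int first_line; };  // mimics the first_line field of YYLTYPE

struct Node {
    int line = 0;
    std::string file;
    void setloc(int l, const std::string& f) { line = l; file = f; }
};

std::string curr_filename = "example.cl";

// The target node is a parameter, so the macro body never mentions $$.
#define SETLOC(parent, node) (parent)->setloc((node).first_line, curr_filename)

// Roughly what { $$ = std::make_shared<Node>(); SETLOC($$, @1); } becomes after
// Bison substitutes its own expressions for $$ and @1.
std::shared_ptr<Node> build(const Location& loc1) {
    std::shared_ptr<Node> result = std::make_shared<Node>();
    SETLOC(result, loc1);
    return result;
}
```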
I was teaching myself Bison, so I headed over to Wikipedia and copy-pasted the entire code from the example there [ http://en.wikipedia.org/wiki/GNU_Bison ]. It compiled and worked perfectly. Then I OOPed it by adding a bit of C++. Here is my new Parser.y file:
%{
#include "TypeParser.h"
#include "ParserParam.h"
#include "addition.h"
%}
%define api.pure
%left '+' TOKEN_PLUS
%left '*' TOKEN_MULTIPLY
%left '-' TOKEN_SUBTRACT
%left '/' TOKEN_DIVIDE
%left '^' TOKEN_EXP
%token TOKEN_LPAREN
%token TOKEN_RPAREN
%token TOKEN_PLUS
%token TOKEN_MULTIPLY
%token <value> TOKEN_NUMBER
%type <expression> expr
%%
input:
expr { ((SParserParam*)data)->expression = $1; }
;
expr:
expr TOKEN_PLUS expr { $$ = new Addition($1, $2); }
| expr TOKEN_MULTIPLY expr { $$ = new Multiplication($1, $2); }
| expr TOKEN_SUBTRACT expr { $$ = new Addition($1, $2); }
| expr TOKEN_DIVIDE expr { $$ = new Multiplication($1, $2); }
| expr TOKEN_EXP expr { $$ = new Addition($1, $2); }
| TOKEN_LPAREN expr TOKEN_RPAREN { $$ = $2; }
| TOKEN_NUMBER { $$ = new Value($1); }
;
%%
But then I keep getting the following errors:
Parser.y:33.52-53: $2 of `expr' has no declared type
Parser.y:34.62-63: $2 of `expr' has no declared type
Parser.y:35.56-57: $2 of `expr' has no declared type
Parser.y:36.60-61: $2 of `expr' has no declared type
Parser.y:37.52-53: $2 of `expr' has no declared type
How do I resolve it? In other words, what have I changed that is causing this? I haven't changed anything else from the Wikipedia code, and the %type declaration is still there (the union has the same members, with the type changed from SExpression to Expression). All the classes, i.e. Addition, Expression, and Multiplication, are defined and declared. I don't think they are causing the problem here, but I mention it just in case.
And why exactly does it have a problem only with $2? $1 is also of type expr, so why do I not get any errors for $1?
Any help is appreciated...
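In the rule expr TOKEN_PLUS expr, $1 is the first expression, $2 is TOKEN_PLUS, and $3 is the second expression (see the Bison manual). TOKEN_PLUS was declared without a <type>, which is why Bison complains that $2 has no declared type; $1 is fine because it really is an expr. The same applies to each of the other binary-operator rules.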
So the semantic action needs to change from your { $$ = new Addition($1, $2); } to { $$ = new Addition($1, $3); }.
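To see why the constructors want $1 and $3, here is a minimal C++ sketch of classes like the ones the question describes. The members and the eval() method are assumptions, since the question doesn't show its Expression hierarchy; the point is only that Addition and Multiplication take two operand expressions, which in expr TOKEN_PLUS expr are $1 and $3.

```cpp
// Hypothetical AST classes; the question's real Expression hierarchy isn't shown.
#include <memory>

struct Expression {
    virtual ~Expression() = default;
    virtual double eval() const = 0;
};

struct Value : Expression {
    double val;
    explicit Value(double v) : val(v) {}
    double eval() const override { return val; }
};

// Binary nodes take ownership of the operands the parser allocated with new,
// i.e. the objects passed in as $1 and $3.
struct Addition : Expression {
    std::unique_ptr<Expression> lhs, rhs;
    Addition(Expression* l, Expression* r) : lhs(l), rhs(r) {}
    double eval() const override { return lhs->eval() + rhs->eval(); }
};

struct Multiplication : Expression {
    std::unique_ptr<Expression> lhs, rhs;
    Multiplication(Expression* l, Expression* r) : lhs(l), rhs(r) {}
    double eval() const override { return lhs->eval() * rhs->eval(); }
};
```

With classes shaped like these, new Addition($1, $3) builds a node whose operands are the two sub-expressions; $2 would be the TOKEN_PLUS token, which is not an Expression at all.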