I'm trying to shorten some repetitive code in my bison parser, here's an excerpt of one of the rules:
expression : OBJECTID ASSIGN expression { $$ = std::make_shared<Assign>($1, $3); $$->setloc(@3.first_line, curr_filename); }
| expression '.' OBJECTID '(' method_expr_list ')' { $$ = std::make_shared<DynamicDispatch>($1, $3, $5);
$$->setloc(@1.first_line, curr_filename); }
I was thinking of something along the lines of:
expression : OBJECTID ASSIGN expression { $$ = std::make_shared<Assign>($1, $3); SETLOC(@1); }
| expression '.' OBJECTID '(' method_expr_list ')' { $$ = std::make_shared<DynamicDispatch>($1, $3, $5);
SETLOC(@1); }
I can't think of any other way to achieve this other than to use a macro to do it. This is what I came up with:
#define SETLOC(node) $$->setloc((node).first_line, curr_filename)
Unfortunately, I get a compile error saying that $$ is not defined, which makes sense since it's a function-like macro. I would like to know if there's a way to achieve the code in the 2nd snippet?
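This is because $$ is a special sequence that Bison recognizes and replaces only inside grammar actions. It does not exist in the generated code, so a macro defined outside an action cannot refer to it. Pass it in as an argument to the macro instead:
#define SETLOC(parent, node) (parent)->setloc((node).first_line, curr_filename)
The action then calls it as SETLOC($$, @1), and Bison substitutes both arguments before the preprocessor sees them.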
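To see why passing the node in works, here is a minimal C++ sketch of what the compiler ends up with once Bison has replaced $$ and @1 in an action. Location, Node and run_action are hypothetical stand-ins invented for this illustration, not names from the asker's parser, and the variable result merely plays the role of $$.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the parser's types (assumptions for this sketch).
struct Location { int first_line; };   // mimics the first_line field of Bison's location type
struct Node {
    int line = 0;
    std::string file;
    void setloc(int l, const std::string& f) { line = l; file = f; }
};

static std::string curr_filename = "example.cl";   // assumed global, as in the question

// Both the target node and the location are explicit macro arguments.
#define SETLOC(parent, node) (parent)->setloc((node).first_line, curr_filename)

// Simulates a semantic action after Bison has substituted $$ and @1.
std::shared_ptr<Node> run_action(const Location& loc1) {
    assert(loc1.first_line >= 0);
    std::shared_ptr<Node> result = std::make_shared<Node>();  // plays the role of $$
    SETLOC(result, loc1);                                     // SETLOC($$, @1) in the grammar
    return result;
}
```

Calling run_action(Location{7}) yields a node whose line is 7 and whose file is the current file name. The extra parentheses around parent keep the macro safe if an expression rather than a plain name is passed.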
Related
I am trying to match different logic expression, such as: "$a and $b" using Perl regex, here is my code:
$input =~ /^(.*)\s(and|or|==|<|>|>=|<=)\s(.*)$/ {
$arg1=$1;
$arg2=$3;
$opt=$2;
}
and my purpose is to get:
$arg1="$ARGV[0]=~/\w{4}/"
$arg2="$num_arg==1"
$opt ="and"
I want to get the exact value matched in the or expression. I don't want to do the same thing for all the cases to match one by one, and hardcode the operator.
Does anyone know how to solve the problem?
This code works for me:
$input = '$ARGV[0]=~/\w{4}/ and $num_arg==1';
if ($input=~/^(.*)\s(and|or|==|<|>|>=|<=)\s(.*)$/) {
$arg1=$1;
$arg2=$3;
$opt=$2;
print "$arg1\n$arg2\n$opt\n";
}
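For comparison, the same pattern can be tried outside Perl. The sketch below uses C++ std::regex with the identical expression; Split and split_logic are names made up for this example. It also shows a side effect of the greedy first group: when several operators are present, the split happens at the last one.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Result of splitting "lhs op rhs" (hypothetical helper type).
struct Split { bool ok; std::string lhs, op, rhs; };

Split split_logic(const std::string& input) {
    // Same pattern as in the Perl code: greedy lhs, operator, rhs.
    static const std::regex re(R"re(^(.*)\s(and|or|==|<|>|>=|<=)\s(.*)$)re");
    std::smatch m;
    if (!std::regex_match(input, m, re)) return {false, "", "", ""};
    assert(m.size() == 4);  // whole match plus three capture groups
    return {true, m[1].str(), m[2].str(), m[3].str()};
}
```

For the input from the question this gives lhs = $ARGV[0]=~/\w{4}/, op = and, rhs = $num_arg==1. For "a and b or c" it gives lhs = "a and b", op = "or", rhs = "c", because the first (.*) takes as much as it can.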
You need a small parser that can reveal the structure of a logical expression, because a term may itself contain another expression. You can use Perl to test your grammar with the Marpa::R2 package.
As a first attempt I would write:
<expression> ::= <term> | <expression> <binary-op> <term>
<term> ::= <factor> <binary-op> <factor> | <unary-op><factor>
<factor> ::= <id>
<binary-op> ::= (and|or|==|<|>|>=|<=)
<unary-op> ::= (not | ! )
One thing is for sure: you can't completely describe the syntax of a logical expression using only regular expressions. Such a pattern will always miss some valid case.
The Perl Code for validation
use Modern::Perl;
use Marpa::R2;
my $dsl = <<'END_OF_DSL';
:default ::= action => [name,values]
lexeme default = latm => 1
Expression ::= Term
| Expression BinaryOP Term
Term ::= Factor BinaryOP Factor
| UnaryOP Factor
Factor ::= ID
ID ~ [\w]+
BinaryOP ~ 'and' | 'or' | '==' | '<' | '>' | '>=' | '<='
UnaryOP ~ 'not' | '!'
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL
# your input
my $input = 'a and b or !c';
# your parser
my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
# process input
my $recce = Marpa::R2::Scanless::R->new(
{ grammar => $grammar, semantics_package => 'My_Actions' } );
my $length_read = $recce->read( \$input );
die "Read ended after $length_read of ", length $input, " characters"
if $length_read != length $input;
I want to use a Perl regex to remove the outer brackets of a function, but I can't construct a regex that doesn't interfere with the inner brackets. Here is an example:
void init(){
if(true){
//do something
}
}
into
void init()
if(true){
//do something
}
Is there a regex that can do this?
Write a parser for the language. Here's a simplified example using Marpa::R2:
#!/usr/bin/perl
use warnings;
use strict;
use Marpa::R2;
my $input = << '__IN__';
void init(){
if(true){
//do something
}
}
__IN__
my $dsl = << '__DSL__';
:default ::= action => concat
lexeme default = latm => 1
FuncDef ::= type name Arglist ('{') Body ('}')
Arglist ::= '(' Args ')'
Args ::= Arg* separator => comma
Arg ::= type name
Body ::= Block+
Block ::= nonbrace
| '{' nonbrace '}'
nonbrace ~ [^{}]*
comma ~ ','
type ~ 'void'
name ~ [\w]+
space ~ [\s]+
:discard ~ space
__DSL__
sub concat { shift; join ' ', @_ }
my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
my $value = $grammar->parse(\$input, { semantics_package => 'main' });
print $$value;
The curly brackets at FuncDef are parenthesized, which tells Marpa to discard them.
Here it is:
my $s = "void init(){ if(true){ //do something }}";
$s =~ s/^([^{]+)\{(.*)\}([^{]*)$/$1$2$3/s;
print "$s\n";
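The substitution above amounts to deleting the first opening brace and the last closing brace. A minimal C++ sketch of that idea follows; strip_outer_braces is a name invented here, and the sketch assumes well-formed input where the outermost pair really is the first '{' and the last '}'.

```cpp
#include <cassert>
#include <string>

// Removes the first '{' and the last '}' from s (illustrative helper).
// Assumes balanced input; the string is returned unchanged if no such pair exists.
std::string strip_outer_braces(std::string s) {
    const std::string::size_type open = s.find('{');
    const std::string::size_type close = s.rfind('}');
    if (open == std::string::npos || close == std::string::npos || close < open) {
        return s;
    }
    assert(open < close);
    s.erase(close, 1);  // erase the later position first so 'open' stays valid
    s.erase(open, 1);
    return s;
}
```

Like the regex, this does no real parsing, so braces inside comments or string literals can fool it. That is the case the grammar-based answer handles better.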
Is there a way to specify that a Bison rule should NOT match if the lookahead token is a given value?
I currently have the following Bison grammar (simplified):
var_decl:
type ident
{
$$ = new NVariableDeclaration(*$1, *$2);
} |
type ident ASSIGN_EQUAL expr
{
$$ = new NVariableDeclaration(*$1, *$2, $4);
} |
type CURVED_OPEN STAR ident CURVED_CLOSE CURVED_OPEN func_decl_args CURVED_CLOSE
{
$$ = new NVariableDeclaration(*(new NFunctionPointerType(*$1, *$7)) /* TODO: free this memory */, *$4);
} |
type CURVED_OPEN STAR ident CURVED_CLOSE CURVED_OPEN func_decl_args CURVED_CLOSE ASSIGN_EQUAL expr
{
$$ = new NVariableDeclaration(*(new NFunctionPointerType(*$1, *$7)) /* TODO: free this memory */, *$4, $10);
} ;
...
deref:
STAR ident
{
$$ = new NDereferenceOperator(*$<ident>2);
} |
...
type:
ident
{
$$ = new NType($<type>1->name, 0, false);
delete $1;
} |
... ;
...
expr:
deref
{
$$ = $1;
} |
...
ident
{
$<ident>$ = $1;
} |
...
ident CURVED_OPEN call_args CURVED_CLOSE
{
$$ = new NMethodCall(*$1, *$3);
delete $3;
} |
...
CURVED_OPEN expr CURVED_CLOSE
{
$$ = $2;
} ;
...
call_args:
/* empty */
{
$$ = new ExpressionList();
} |
expr
{
$$ = new ExpressionList();
$$->push_back($1);
} |
call_args COMMA expr
{
$1->push_back($3);
} ;
The problem is that when parsing:
void (*ident)(char* some_arg);
It's seeing void (*ident) and deducing that it must be a function call instead of a function declaration. Is there a way I can tell Bison that it should favour looking ahead to match var_decl instead of reducing *ident and void into derefs and exprs?
any identifier can be a type
That's exactly the problem. LALR(1) grammars for C-like languages (or languages with C-like syntax for types) need to differentiate types and other identifiers at the token level. That is, you need IDENT and TYPEIDENT to be two different tokens, which means feeding data about declared identifiers from the compiler back to the tokenizer. It's the most standard way to disambiguate the otherwise ambiguous grammar.
Update: See, for instance, this ANSI C grammar for Yacc.
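The feedback from compiler to tokenizer can be sketched roughly as follows. TypeTable, declare_type, classify and make_builtin_types are hypothetical names for this illustration. In a real front end the lexer would call something like classify before returning a token, and the parser actions would register new type names as declarations are reduced.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// The two token kinds the answer asks for.
enum class Token { IDENT, TYPEIDENT };

// Table of names currently known to be types (illustrative sketch).
class TypeTable {
public:
    // Called by the parser when a type name is declared.
    void declare_type(const std::string& name) {
        assert(!name.empty());
        types_.insert(name);
    }
    // Called by the lexer for every identifier it scans.
    Token classify(const std::string& name) const {
        return types_.count(name) ? Token::TYPEIDENT : Token::IDENT;
    }
private:
    std::unordered_set<std::string> types_;
};

// Seeds the table with a few built-in type names (an assumption for the demo).
TypeTable make_builtin_types() {
    TypeTable table;
    table.declare_type("void");
    table.declare_type("char");
    return table;
}
```

With such a table, void (*ident)(char* some_arg) starts with a TYPEIDENT token, so the parser can commit to var_decl without having to guess whether void is an expression.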
I am trying to parse a simple if condition from an input file.
I will have something like
if(color = black)
No matter what I do, I keep getting 1 shift/reduce conflict.
I am very new to lex and yacc.
Do yacc grammars often have shift/reduce conflicts, and is it safe to ignore them?
My lex file returns every character in the file correctly, so I won't show it here.
However, here is my yacc file:
%{
#include <ctype.h>
#include <stdio.h>
%}
|IF LPAREN COLOR EQ BLACK RPAREN {$$ = $1; printf("WONT COMPILE\n");}
;
In the yacc file I tried this, and that is where I am getting the shift/reduce conflict:
IF LPAREN COLOR EQ BLACK RPAREN {$$ = $1; printf("If statement\n");}
SOLVED
I originally wrote a long answer about the “dangling else” ambiguity, but then I took a closer look at your grammar. Let’s cut it down to size a bit and demonstrate where the problems are:
%token IF COLOR BLACK
%%
statement
: statement command
| /*nothing*/
;
command
: IF {$$ = $1; printf ("IF\n");}
| ELSE {$$ = $1; printf("ELSE\n");}
| EQ {$$ = $1; printf("EQ\n");}
| THEN {$$ = $1; printf("THEN\n");}
| LPAREN {$$ = $1; printf("LPAREN\n");}
| RPAREN {$$ = $1; printf("RPAREN\n");}
| COLOR EQ BLACK {$$ = $3; printf("color is black\n");}
| IF LPAREN COLOR EQ BLACK RPAREN {$$ = $1; printf("WONT COMPILE\n");}
;
Just how do you expect the statement if(color = black) to be parsed? Notice that the “color = black” can reduce to a command via COLOR EQ BLACK or can be “shifted” onto the stack to become part of the longer parse IF LPAREN COLOR EQ BLACK RPAREN.
That explains the specific warning you’re getting. Now, on to the rest of your grammar:
You don’t want to be writing your grammar so incomplete statements are meaningful. Notice that the single symbol “=” is a complete valid command and therefore a complete valid statement—is that really what you want?
You’re going to want to rewrite this from scratch. Start simple:
%token NUMBER COMMAND IF THEN ELSE COLOR BLACK
%%
statement
: COMMAND NUMBER
| IF cond THEN statement
| /* nothing */
;
cond
: '(' COLOR '=' BLACK ')'
;
Not tested, but this should be enough to get you started. If you need to do something when you encounter a token, you can (for example) replace IF cond THEN statement with if cond then statement and add rules like
if : IF { printf("%s\n", "IF"); }
;
then: THEN { printf("%s\n", "THEN"); }
;
Start simple, add slowly, and refactor when rules get too hairy or repetitive. And work through a tutorial before you jump into a large project. The GNU Bison manual has a good tutorial, as does Kernighan & Pike’s The Unix Programming Environment.
I was teaching myself Bison, headed over to Wikipedia, and copy-pasted the entire code from the example there [ http://en.wikipedia.org/wiki/GNU_Bison ]. It compiled and worked perfectly. Then I OOPed it by adding a bit of C++. Here is my new Parser.y file:
%{
#include "TypeParser.h"
#include "ParserParam.h"
#include "addition.h"
%}
%define api.pure
%left '+' TOKEN_PLUS
%left '*' TOKEN_MULTIPLY
%left '-' TOKEN_SUBTRACT
%left '/' TOKEN_DIVIDE
%left '^' TOKEN_EXP
%token TOKEN_LPAREN
%token TOKEN_RPAREN
%token TOKEN_PLUS
%token TOKEN_MULTIPLY
%token <value> TOKEN_NUMBER
%type <expression> expr
%%
input:
expr { ((SParserParam*)data)->expression = $1; }
;
expr:
expr TOKEN_PLUS expr { $$ = new Addition($1, $2); }
| expr TOKEN_MULTIPLY expr { $$ = new Multiplication($1, $2); }
| expr TOKEN_SUBTRACT expr { $$ = new Addition($1, $2); }
| expr TOKEN_DIVIDE expr { $$ = new Multiplication($1, $2); }
| expr TOKEN_EXP expr { $$ = new Addition($1, $2); }
| TOKEN_LPAREN expr TOKEN_RPAREN { $$ = $2; }
| TOKEN_NUMBER { $$ = new Value($1); }
;
%%
But then I keep getting the following errors:
Parser.y:33.52-53: $2 of `expr' has no declared type
Parser.y:34.62-63: $2 of `expr' has no declared type
Parser.y:35.56-57: $2 of `expr' has no declared type
Parser.y:36.60-61: $2 of `expr' has no declared type
Parser.y:37.52-53: $2 of `expr' has no declared type
How do I resolve it? I mean, what have I changed that is causing this? I haven't changed anything from the Wikipedia code, and the %type declaration is still there [the union has the same members, with the type changed from SExpression to Expression]. All classes, i.e. Addition, Expression and Multiplication, are declared and defined. I don't think that is what is causing the problem here, but I mention it just in case.
And why exactly does it have a problem only with $2? Even $1 is of type expr, so why do I not get any errors for $1?
Any help is appreciated...
In the rule expr TOKEN_PLUS expr, $1 is the first expression, $2 is TOKEN_PLUS, and $3 is the second expression. TOKEN_PLUS was declared without a type, which is why Bison complains about $2 only. See the Bison manual.
So the semantic action needs to change from your { $$ = new Addition($1, $2); } to { $$ = new Addition($1, $3); }.
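The numbering can be pictured as indexing into the list of right-hand-side symbols, counting from 1. The C++ sketch below models that for expr TOKEN_PLUS expr. Symbol and reduce_addition are invented for this illustration, and the code only mimics the positions; it is not how Bison stores its stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry per right-hand-side symbol of the rule (illustrative model).
struct Symbol {
    std::string kind;  // "expr" or "TOKEN_PLUS"
    int value;         // semantic value; unused for the operator token
};

// Models the action for `expr TOKEN_PLUS expr`.
int reduce_addition(const std::vector<Symbol>& rhs) {
    assert(rhs.size() == 3);
    const Symbol& first = rhs[0];   // $1: left operand
    const Symbol& plus = rhs[1];    // $2: the operator token, carries no typed value
    const Symbol& second = rhs[2];  // $3: right operand
    assert(plus.kind == "TOKEN_PLUS");
    return first.value + second.value;  // corresponds to new Addition($1, $3)
}
```

Reducing the symbols for 2 + 5 this way returns 7. Using the middle entry as an operand would pick up the operator token instead of the second expression, which is the mistake in the original actions.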