I was trying to do a simple parsing of general html codes.
Here's my entire bison file (example4.y).
%{
#include <iostream>
#include <cstring>
using namespace std;
extern "C" int yylex();
extern "C" int yyparse();
extern "C" FILE *yyin;
void yyerror(const char *str)
{
cout<<"Error: "<<str<<"\n";
}
int yywrap()
{
return 0;
}
main()
{
yyparse();
}
%}
%token NUMBER LANGLE CLOSERANGLE RANGLE SLASH ANYTHING
%union
{
int intVal;
float floatVal;
char *strVal;
}
%%
tag: |
opening_tag anything closing_tag
{
if(strcmp($<strVal>1,$<strVal>3)==0){
cout<<"\n[i] Tag Matches: "<<$<strVal>1;
cout <<"\n[!] The text: "<<$<strVal>2;
} else {
cout<<"\n[!] Tag Mismatch: "<<$<strVal>1<<" and "<<$<strVal>3;
}
$<strVal>$ = $<strVal>2;
}
|
opening_tag tag closing_tag
{
if(strcmp($<strVal>1,$<strVal>3)==0){
cout<<"\n[i] Tag Matches: "<<$<strVal>1;
cout <<"\n[!] The text: "<<$<strVal>2;
} else {
cout<<"\n[!] Tag Mismatch: "<<$<strVal>1<<" and "<<$<strVal>3;
}
}
;
opening_tag:
LANGLE ANYTHING RANGLE
{
$<strVal>$ = $<strVal>2;
}
;
anything:
ANYTHING
{
$<strVal>$ = $<strVal>1;
}
;
closing_tag:
LANGLE SLASH ANYTHING RANGLE
{
$<strVal>$= $<strVal>3;
}
%%
The error i get is: example4.y: conflicts: 1 shift/reduce
I think it has to do something with opening_tag tag closing_tag but i could not think what's happening here?
Any help?
It's because of two rules that start with opening_tag. The parser has to decide between the rules by lookint at most one token ahead, but it cannot. <FOO> may lead to either rule, and this requires two more tokens of lookahead.
You can do this:
tag : /* nothing */
| opening_tag contents closing_tag
;
contents: tag
| anything
;
UPDATE This new grammar has a different shift/reduce conflict. (UPDATE2: or perhaps it's the same one). Because a tag can be empty, the parser cannot decide what to do at this input:
<Foo> <...
^
|
input is here
If the next symbol is a slash, then we have a closing tag, and the empty tag rule should be matched. If the next symbol is not a slash, then we have an opening tag, and the non-empty tag rule should be matched. But the parser cannot know, it is only allowed to look at <.
The solution would be to create a new token, LANGLE_SLASH, for the </ combination.
The problem is that tag can be empty, so that <x>< might be the beginning of opening_tag tag closing_tag or of opening_tag opening_tag. Consequently, bison cannot tell whether to reduce an empty tag before shifting the <.
You should be able to fix it by removing the empty production for tag and adding an explicit production for opening_tag closing_tag.
Related
I am using this function to perform regex replace on std::string:
String regexReplace(String s,String search,String replace,String modifier,int user){
bool case_sensitive=true,global=false;
String replaced_string=s;
if(modifier=="gi" || modifier=="ig"){global=true;case_sensitive=false;}
else if(modifier=="i"){case_sensitive=false;}
else if(modifier=="g"){global=true;}
try {
std::regex re (search);
if(user==1){re=createRegex(search,case_sensitive);}
else if(!case_sensitive){re= Regex (search, REGEX_DEFAULT | ICASE);}
if(global){
replaced_string=std::regex_replace (s,re,replace,std::regex_constants::format_default);
}
else{
replaced_string=std::regex_replace (s,re,replace,NON_RECURSIVE_REGEX_REPLACE);
}
}
catch (std::regex_error& e) {
printErrorLog("Invalid replace string regex: "+search);
Exit(1);
}
return replaced_string;
}
typedefs and #defines used:
typedef std::regex Regex;
typedef std::string String;
#define REGEX_DEFAULT std::regex::ECMAScript
#define ICASE std::regex::icase
#define NON_RECURSIVE_REGEX_REPLACE std::regex_constants::format_first_only
But this function consumes approximately 0.3 seconds on 14x4 consecutive executions:
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+$","$1","i",0);
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+(e.*)$","$1$2","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*$","$1","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*(e.*)$","$1$2","i",0);
Can I make it more efficient to lessen the execution time?
Note:
The createRegex() function is not being called (user=0 by default).
Write a parser (both Yacc and Lex files) that uses the following productions and actions:
S -> cSS {print “x”}
S -> a {print “y”}
S -> b {print “z”}
Indicate the string that it will print when the input is cacba.
I am getting this error: when I give input to it, it says valid input and also says syntax error.
My Scanner Code is this
%{
#include "prac.h"
%}
%%
[c] {return C; }
[a] {return A; }
[b] {return B; }
[ \t] ;
\n { return 0; }
. { return yytext[0]; }
%%
int yywrap(void) {
return 1;
}
And my yacc code is this:
%{
#include <stdio.h>
%}
%token A B C
%%
statement: S {printf("Valid Input"); }
;
S: C S S {printf("Print x\n");}
| A {printf("Print y\n");}
| B {printf("Print z\n");}
;
%%
int main()
{
return yyparse();
}
yyerror(char *s)
{
printf("\n%s\n",s);
printf("Invalid Input");
fprintf(stderr,"At line %d %s ",s,yylineno);
}
How can I fix this?
(Comments converted to an answer)
#ChrisDodd wrote:
Best guess -- you're running on windows, so you're getting a \r (carriage return) character before the newline which is causing your error. Try adding \r to the [ \t] pattern to ignore it.
#Cyclone wrote:
Change your fprintf() statement to fprintf(stderr, "At line %d %s", yylineno, s); not that it will solve your problem.
The OP wrote:
You mean I should add \r into \t so the new regex for it will be [\r\t] Am I right ?
#rici wrote:
#chris suggests [ \r\t]. If you have Windows somewhere in the loop, I agree.
For some reason bison is rejecting a specific rule, the notequal_expression, beware that Im just starting to learn the whole concept so my line of thought is not so mature, the input file: ( The Error is: "string.y contains 1 useless nonterminal and 1 useless rule." )
/* Parser for StringC */
%{
/* ------------------------------------------------------------------
Initial code (copied verbatim to the output file)
------------------------------------------------------------------ */
// Includes
#include <malloc.h> // _alloca is used by the parser
#include <string.h> // strcpy
#include "lex.h" // the lexer
// Some yacc (bison) defines
#define YYDEBUG 1 // Generate debug code; needed for YYERROR_VERBOSE
#define YYERROR_VERBOSE // Give a more specific parse error message
// Error-reporting function must be defined by the caller
void Error (char *format, ...);
// Forward references
void yyerror (char *msg);
%}
/* ------------------------------------------------------------------
Yacc declarations
------------------------------------------------------------------ */
/* The structure for passing value between lexer and parser */
%union {
char *str;
}
%token ERROR_TOKEN IF ELSE PRINT INPUT ASSIGN EQUAL NOTEQUAL
%token CONCAT END_STMT OPEN_PAR CLOSE_PAR
%token BEGIN_CS END_CS
%token <str> ID STRING BOOLEAN
/*%type <type> type simple_type cast*/
%expect 1 /* shift/reduce conflict: dangling ELSE */
/* declaration */
%%
/* ------------------------------------------------------------------
Yacc grammar rules
------------------------------------------------------------------ */
program
: statement_list
;
statement_list
: statement_list statement
| /* empty */
;
statement
: END_STMT {puts ("Empty statement");}
| expression END_STMT {puts ("Expression statement");}
| PRINT expression END_STMT {puts ("Print statement");}
| INPUT identifier END_STMT {puts ("Input statement");}
| if_statement {puts ("If statement");}
| compound_statement {puts ("Compound statement");}
| error END_STMT {puts ("Error statement");}
| notequal_expression {puts ("Not equal statement");}
;
/* NOTE: This rule causes an unresolvable shift/reduce conflict;
That's why %expect 1 was added (see above) */
if_statement
: IF OPEN_PAR expression CLOSE_PAR statement optional_else_statement
;
optional_else_statement
: ELSE statement
| /* empty */
;
compound_statement
: BEGIN_CS statement_list END_CS
;
expression
: equal_expression
| OPEN_PAR expression CLOSE_PAR
;
equal_expression
: expression EQUAL assign_expression
| assign_expression
;
notequal_expression
: expression NOTEQUAL assign_expression
| NOTEQUAL assign_expression
;
assign_expression
: identifier ASSIGN assign_expression
| concat_expression
;
concat_expression
: concat_expression CONCAT simple_expression
| simple_expression
;
simple_expression
: identifier
| string
;
identifier
: ID {}
;
string
: STRING {}
;
bool
: BOOLEAN {}
;
%%
/* ------------------------------------------------------------------
Additional code (again copied verbatim to the output file)
------------------------------------------------------------------ */
The lexer:
/* Lexical analyzer for StringC */
%{
/* ------------------------------------------------------------------
Initial code (copied verbatim to the output file)
------------------------------------------------------------------ */
// Includes
#include <string.h> // strcpy, strncpy
#include <io.h> // isatty
#ifdef MSVC
#define isatty _isatty // for some reason isatty is called _isatty in VC..
#endif
#define _LEX_CPP_ // make sure our variables get created
#include "lex.h"
#include "lexsymb.h"
extern "C" int yywrap (); // the yywrap function is declared by the caller
// Forward references
void Identifier ();
void StringConstant ();
void BoolConstant ();
void EatComment ();
//// End of inititial code
%}
/* ------------------------------------------------------------------
Some macros (standard regular expressions)
------------------------------------------------------------------ */
LETTER [a-zA-Z_]
DIGIT [0-9]
IDENT {LETTER}({LETTER}|{DIGIT})*
STR \"[^\"]*\"
BOOL \(false|true)\
WSPACE [ \t]+
/* ------------------------------------------------------------------
The lexer rules
------------------------------------------------------------------ */
%%
"if" {return IF;}
"else" {return ELSE;}
"print" {return PRINT;}
"input" {return INPUT;}
"=" {return ASSIGN;}
"==" {return EQUAL;}
"!=" {return NOTEQUAL;} /* Not equal to */
"+" {return CONCAT;}
";" {return END_STMT;}
"(" {return OPEN_PAR;}
")" {return CLOSE_PAR;}
"{" {return BEGIN_CS;}
"}" {return END_CS;}
{BOOL} {BoolConstant (); return BOOLEAN;}
{STR} {StringConstant (); return STRING;}
{IDENT} {Identifier (); return ID;}
"//" {EatComment();} /* comment: skip */
\n {lineno++;} /* newline: count lines */
{WSPACE} {} /* whitespace: (do nothing) */
. {return ERROR_TOKEN;} /* other char: error, illegal token */
%%
/* ------------------------------------------------------------------
Additional code (again copied verbatim to the output file)
------------------------------------------------------------------ */
// The comment-skipping function: skip to end-of-line
void EatComment() {
char c;
while ((c = yyinput()) != '\n' && c != 0);
lineno++;
}
// Pass the id name
void Identifier () {
yylval.str = new char[strlen(yytext)+1];
strcpy (yylval.str, yytext);
}
// Pass the string constant
void StringConstant() {
int l = strlen(yytext)-2;
yylval.str = new char[l+1];
strncpy (yylval.str, &yytext[1], l); yylval.str[l] = 0;
}
void BoolConstant() {
int l = strlen(yytext)-2;
yylval.str = new char[l+1];
strncpy(yylval.str, &yytext[1], l); yylval.str[l] = 0;
}
Are you sure that it's notequal_expression that is causing the issue? The nonterminal and rule that are not used, as I read it, are
bool
: BOOLEAN {}
;
Perhaps instead of
simple_expression
: identifier
| string
;
you intended to code
simple_expression
: identifier
| string
| bool
;
There are two problems with the grammar. The first is the shift/reduce conflict you've already seen (and addressed with %expect 1. I prefer to address it in the grammar instead and use %expect 0 instead. You can do that by removing ELSE from the %token list and adding a line
%right THEN ELSE
To declare right associativity. Your language doesn't actually have a THEN keyword but that's fine. You can then remove completely the rule for optional_else_statement and reword the rule for if_statement as follows:
if_statement
: IF OPEN_PAR expression CLOSE_PAR statement %prec THEN
| IF OPEN_PAR expression CLOSE_PAR statement ELSE statement
;
There are those who prefer to resolve it this way, and others who advocate the %expect 1 approach. I prefer this way, but now that you have both methods, you can certainly choose for yourself.
For the other problem, the useless rule is definitely this one:
bool
: BOOLEAN {}
;
because the non-terminal bool is not used anywhere else in the grammar. This accounts for both "1 useless nonterminal and 1 useless rule" as reported by bison. To be able to identify this kind of thing for yourself, you can use
bison --report=solved -v string.y
This will create a string.output file which will contain a large but readable report including any resolved shift-reduce errors (such as your IF-ELSE construction) and also a complete set of states created by bison. It's very useful when attempting to troubleshoot grammar problems.
So I have a function on javacc:
void parseDSL() throws SemanticException #void :
{}
{
<ALL> "/*#mat" dslStatements() "*/" <ALL> <EOF>
}
My objective is to ignore everything until the "/*#mat" matches and after the parsing ignores everythings until EOF.
I'm really struggling to find a regular expressions that works here.
One example of a file that should pass is:
public class blabla {
int i=1;/*#mat
in float B[100];
in float C[100];
in int A[9];
in int Z[9];
out float D[];
D=A*(B+C-Z)+A*Z;
*/boolean a;
}
Thank You.
This is what lexical states are for. See the documentation and FAQ for more information. Roughly what you want is
<DEFAULT> SKIP: { < ~[] > } // Skip everything up to "/*#mat"
<DEFAULT> TOKEN: { < STARTMAT: "/*#mat" > : GO }
<GO> TOKEN: { <IN : "in" >
| <OUT : "out" >
| .... other rules go here ....
| <ENDMAT : "*/" > : DEFAULT // On a "*/" go back to skipping.
}
I want to match a simple expression with boost, but it behaves strange... The code below should match and display "a" from first and second strings:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
#include "stdio.h"
using namespace boost::xpressive;
void xmatch_action( const char *line ) {
cregex g_re_var;
cmatch what;
g_re_var = cregex::compile( "\\s*var\\s+([\\w]+)\\s*=.*?" );
if (regex_match(line, what, g_re_var )) {
printf("OK\n");
printf(">%s<\n", what[1] );
}
else {
printf("NOK\n");
}
}
int main()
{
xmatch_action("var a = qqq");
xmatch_action(" var a = aaa");
xmatch_action(" var abc ");
}
but my actual output is:
OK
>a = qqq<
OK
>a = aaa<
NOK
and it should be
OK
>a<
OK
>a<
NOK
Instead of printf() use the << operator to print the sub_match object (what[1]). Or you can try using what[1].str() instead of what[1].
See the docs: sub_match, match_results, regex_match
Remove square brackets around \w in regex AND use std::cout for printing. Then you will get result that you want.