„Erratic“ grammar parsing by JavaCC for the IfStatement?

Starting from the Java 1.1 grammar, I have tried to write a simplified grammar for a new language. While trying to parse IfStatements in various shapes, I encountered a strange problem that I don’t understand and for which I can’t find a solution.
I have written two test programs, the first containing:
… if (a > b) { x = 15; } else { x = 30; }
and the second:
… if (a > b) x = 15; else x = 30;
The first program is parsed without any problems, while for the second the parser stops with the message “Encountered else at (start of ELSE) expecting one of: boolean | break | byte | char | continue ...”. According to the grammar (or rather, to my intention) there shouldn’t be any difference between a block and a single statement, and to my astonishment this works perfectly for WHILE and DO but not for IF.
The relevant pieces of my grammar are as follows:
void Statement() :
{}
{
( LOOKAHEAD(2)
Block() | EmptyStatement() | IfStatement() |
WhileStatement() | ...
}
void Block() :
{}
{ "{" ( BlockStatement() )* "}"
}
void BlockStatement() :
{}
{
( LOOKAHEAD(2) LocalVariableDeclaration() ";" |
Statement()
)
}
void EmptyStatement() :
{}
{ ";" }
void IfStatement() :
{}
{
"if" "(" Expression() ")" Statement()
[ LOOKAHEAD(1) "else" Statement() ]
}
void WhileStatement() :
{}
{
"while" "(" Expression() ")" Statement()
}
There is a second observation possibly connected with the above problem: As stated before, the first test program is parsed correctly. However, for the “Statement” in the (partial) grammar rule <"if" "(" Expression() ")" Statement()> in the IF statement the parser yields an overly complex sequence of “Statement, Block, BlockStatement, Statement” nodes including an EmptyStatement before and after the statement content while parsing a test program containing an equivalent WHILE loop just produces “Statement” and no EmptyStatements at all.
So far I have tried to increase the number of lookahead elements in the IfStatement and/or the Statement productions but without any effect.
What is going wrong here, or what is wrong in my grammar?
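No answer to this question appears on this page, so the following does not diagnose the JavaCC failure. As a point of comparison only, here is a minimal hand-written recursive-descent sketch in C++ of the subset shown above. ToyParser and parseToy are hypothetical names, a single comparison and simple assignments stand in for the elided Expression() and statement alternatives, and each production becomes one function that returns an S-expression. It shows that the productions, read as intended, accept both test programs, and that an optional else checked with one token of lookahead attaches to the nearest if, which is also how JavaCC resolves [ LOOKAHEAD(1) "else" Statement() ].

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy parser for a subset of the question's grammar:
//   Statement := Block | ";" | IfStatement | WhileStatement | Ident "=" Expr ";"
//   Block     := "{" Statement* "}"
//   Expr      := Atom [ ">" Atom ]   (simplified stand-in for Expression())
// Each parse function returns an S-expression so the tree shape is visible.
class ToyParser {
public:
    explicit ToyParser(const std::string& src) { tokenize(src); }

    std::string parseStatementOnly() {
        std::string tree = statement();
        if (pos_ != toks_.size()) throw std::runtime_error("trailing input");
        return tree;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    // Identifiers/numbers become one token; any other non-space char is its own token.
    void tokenize(const std::string& src) {
        std::size_t i = 0;
        while (i < src.size()) {
            unsigned char c = static_cast<unsigned char>(src[i]);
            if (std::isspace(c)) { ++i; continue; }
            std::size_t j = i + 1;
            if (std::isalnum(c))
                while (j < src.size() && std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
            toks_.push_back(src.substr(i, j - i));
            i = j;
        }
    }

    std::string peek() const {
        if (pos_ < toks_.size()) return toks_[pos_];
        return "<EOF>";
    }

    std::string next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }

    void expect(const std::string& t) {
        if (next() != t) throw std::runtime_error("expected " + t);
    }

    std::string statement() {
        std::string t = peek();
        if (t == "{") return block();
        if (t == ";") { ++pos_; return "(empty)"; }
        if (t == "if") return ifStatement();
        if (t == "while") return whileStatement();
        std::string target = next();              // assignment statement
        expect("=");
        std::string value = expression();
        expect(";");
        return "(= " + target + " " + value + ")";
    }

    std::string block() {
        expect("{");
        std::string tree = "(block";
        while (peek() != "}") tree += " " + statement();
        expect("}");
        return tree + ")";
    }

    std::string ifStatement() {
        expect("if");
        expect("(");
        std::string cond = expression();
        expect(")");
        std::string thenPart = statement();
        // [ "else" Statement() ]: one token of lookahead; the else binds to the nearest if.
        if (peek() != "else") return "(if " + cond + " " + thenPart + ")";
        ++pos_;
        std::string elsePart = statement();
        return "(if " + cond + " " + thenPart + " " + elsePart + ")";
    }

    std::string whileStatement() {
        expect("while");
        expect("(");
        std::string cond = expression();
        expect(")");
        std::string body = statement();
        return "(while " + cond + " " + body + ")";
    }

    std::string expression() {
        std::string lhs = next();
        if (peek() != ">") return lhs;
        ++pos_;
        std::string rhs = next();
        return "(> " + lhs + " " + rhs + ")";
    }
};

std::string parseToy(const std::string& src) { return ToyParser(src).parseStatementOnly(); }
```

For example, parseToy("if (a > b) x = 15; else x = 30;") returns (if (> a b) (= x 15) (= x 30)), and the braced version returns the same tree with each branch wrapped in (block ...).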

Related

Bison C++ mid-rule value lost with variants

I'm using Bison with lalr1.cc skeleton to generate C++ parser and api.value.type variant. I tried to use mid-rule action to return value that will be used in further semantic actions, but it appears that the value on stack becomes zero. Below is an example.
parser.y:
%require "3.0"
%skeleton "lalr1.cc"
%defines
%define api.value.type variant
%code {
#include <iostream>
#include "parser.yy.hpp"
extern int yylex(yy::parser::semantic_type * yylval);
}
%token <int> NUM
%%
expr: NUM | expr { $<int>$ = 42; } '+' NUM { std::cout << $<int>2 << std::endl; };
%%
void yy::parser::error(const std::string & message){
std::cerr << message << std::endl;
}
int main(){
yy::parser p;
return p.parse();
}
lexer.flex:
%option noyywrap
%option nounput
%option noinput
%{
#include "parser.yy.hpp"
typedef yy::parser::token token;
#define YY_DECL int yylex(yy::parser::semantic_type * yylval)
%}
%%
[ \n\t]+
[0-9]+ {
yylval->build(atoi(yytext));
return token::NUM;
}
. {
return yytext[0];
}
compiling:
bison -o parser.yy.cpp parser.y
flex -o lexer.c lexer.flex
g++ parser.yy.cpp lexer.c -O2 -Wall -o parser
Simple input like 2+2 should print the value 42, but 0 is printed instead. When the variant type is changed to %union, the printed value is as it should be. As a workaround I've been using marker actions with $<type>-n to get deeper values from the stack, but that approach can reduce legibility and maintainability.
I've read in the generated source that when using the variant type, the default action { $$ = $1 } is not executed. Is this an example of that behavior, or is it a bug?

I vote for "bug". It does not have to do with the absence of the default action.
I traced execution to the point where it attempts to push the value of the mid-rule action (MRA) onto the stack. However, it does not get the correct type for the MRA, so the call to yypush_ does nothing, which is clearly not the desired behavior.
If I were you, I'd report the problem.

This was indeed a bug in Bison 3.0.5, but using $<foo>1 is also intrinsically wrong (yet there was no alternative!).
Consider:
// 1.y
%nterm <int> exp
%%
exp: { $<int>$ = 42; } { $$ = $<int>1; }
It is translated into something like:
// 2.y
%nterm <int> exp
%%
#1: %empty { $<int>$ = 42; }
exp: #1 { $$ = $<int>1; }
What is wrong here is that we did not tell Bison the type of the semantic value of #1, so it will not be able to handle this semantic value correctly (typically, no %printer and no %destructor will be applied). Worse yet, because it does not know what kind of value is actually stored there, it is unable to manipulate it properly: it does not know how to copy it from the local variable $$ to the stack, and that's why the value was lost. (In C, the bits of the semantic value are copied blindly, so it works. In C++ with Bison variants, we use the exact assignment/copy operators corresponding to the current type, which requires knowing what the type is.)
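To make the difference concrete, here is a toy model in C++. It is not Bison's generated code, and DeclaredType, Value, pushBlindly and pushTyped are hypothetical names. pushBlindly copies the whole value, as a C %union parser effectively does; pushTyped copies only through the operation for the symbol's declared type, as the variant-based parser does. When the mid-rule action has no declared type, nothing is copied and the stack slot keeps its default-constructed value (0 here), which matches the symptom in the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: the type a grammar symbol was declared with, if any.
enum class DeclaredType { Unknown, Int, String };

// Stands in for one semantic value; only the member matching the
// declared type is meaningful (a real variant stores just one object).
struct Value {
    int i = 0;
    std::string s;
};

// C-style push: copy everything, no type information needed (the %union case).
void pushBlindly(std::vector<Value>& stack, const Value& v) { stack.push_back(v); }

// Variant-style push: copy only via the operation for the declared type.
void pushTyped(std::vector<Value>& stack, const Value& v, DeclaredType t) {
    Value slot;                                   // default-constructed: i == 0
    switch (t) {
        case DeclaredType::Int:     slot.i = v.i; break;
        case DeclaredType::String:  slot.s = v.s; break;
        case DeclaredType::Unknown: break;        // untyped mid-rule action: nothing copied
    }
    stack.push_back(slot);
}
```

Declaring the type, as 3.y does with %nterm or 4.y does with <int>{ ... }, corresponds to calling pushTyped with DeclaredType::Int.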
Clearly a better translation would have been to have #1 be given a type:
// 3.y
%nterm <int> exp #1
%%
#1: %empty { $$ = 42; }
exp: #1 { $$ = $1; }
which is actually what anybody would have written.
Thanks to your question, Bison 3.1 now features typed midrule actions:
// 4.y
%nterm <int> exp
%%
exp: <int>{ $$ = 42; } { $$ = $1; }
Once desugared, 4.y generates exactly the same thing as 3.y.
See http://lists.gnu.org/archive/html/bug-bison/2017-06/msg00000.html and http://lists.gnu.org/archive/html/bug-bison/2018-06/msg00001.html for more details.

rewrite define macro as c++ code?

How can I write this macro as C++ code?
extern bool distribution_test_server;
bool distribution_test_server = false;
#define GetGoldMultipler() (distribution_test_server ? 3 : 1)
And one more question: what is the value of the macro if distribution_test_server = false?
So if distribution_test_server is false... then the macro is not used?
For example, I have this:
#define GetGoldMultipler() (distribution_test_server ? 3 : 1)

You can write it as an inline function:
inline int GetGoldMultipler()
{
return distribution_test_server ? 3 : 1;
}
If distribution_test_server is false, the multiplier returned is 1.
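A minimal sketch showing that the macro is still used when distribution_test_server is false: the condition is evaluated each time the macro is used, at run time, exactly as in the function version. GoldMultiplierFn is a hypothetical name; the function can't share the macro's name while the #define is in effect, because the preprocessor would also expand that name in the function's own declaration.

```cpp
#include <cassert>

// Names taken from the question (including the "Multipler" spelling).
bool distribution_test_server = false;

#define GetGoldMultipler() (distribution_test_server ? 3 : 1)

// Function version, under a different (hypothetical) name so it does not
// collide with the function-like macro above.
inline int GoldMultiplierFn() { return distribution_test_server ? 3 : 1; }
```

Both read the current value of the flag: they return 1 while it is false and 3 after it is set to true.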

It's already C++ code, but I assume you want to rewrite it as a function:
int GetGoldMultiplier() {return distribution_test_server ? 3 : 1;}
If distribution_test_server is false then the gold multiplier will be 1; that's how ?: works. (If the first part is true, the second part is returned; else the third part)

What is defined in this macro is what we call a ternary expression.
It's basically a condensed "if" condition; this expression can be summed up as:
int MyFunction()
{
if(distribution_test_server == true)
{
return 3;
}
else
{
return 1;
}
}
more on ternary conditions : http://www.cprogramming.com/reference/operators/ternary-operator.html
Now, the macro is something completely different. When you define a macro and use it in your code, the preprocessor replaces the macro with what you wrote on its right-hand side.
For example :
#define MY_MACRO 8
int a = MY_MACRO;
actually translates to:
int a = 8;
more on macros : http://www.cplusplus.com/doc/tutorial/preprocessor/
So in your code, #define GetGoldMultiplier() (distribution_test_server ? 3 : 1) defines a macro named GetGoldMultiplier() (which is NOT a function!) that, wherever it is used, is replaced by (distribution_test_server ? 3 : 1), which can be interpreted as what I wrote before.
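One way to see the substitution for yourself is to turn the macro's expansion into a string with the preprocessor's # operator. This is a sketch: STRINGIZE and STRINGIZE_ are hypothetical helper names, and the two-level form is needed because # applied directly to a macro parameter does not expand its argument first. (Running only the preprocessor, e.g. g++ -E, shows the same thing for a whole file.)

```cpp
#include <cassert>
#include <string>

// The question's macro; it is only stringized here, never evaluated,
// so distribution_test_server does not need to be declared.
#define GetGoldMultipler() (distribution_test_server ? 3 : 1)

// Inner helper: # turns its (unexpanded) argument into a string literal.
#define STRINGIZE_(x) #x
// Outer helper: the argument is macro-expanded before being passed on.
#define STRINGIZE(x) STRINGIZE_(x)

// The text the preprocessor substitutes for GetGoldMultipler().
const char* const expansion = STRINGIZE(GetGoldMultipler());
// # applied directly keeps the spelling as written, without expansion.
const char* const unexpanded = STRINGIZE_(GetGoldMultipler());
```

Here expansion holds the replacement text of the macro, while unexpanded is just "GetGoldMultipler()".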

The macro replaces every occurrence of GetGoldMultiplier() in your code with the expression "distribution_test_server ? 3 : 1". This happens in a preprocessing step, before the code is compiled. It also means that there will never be a function GetGoldMultiplier(), even if your code looks like it is calling it.
This means that if distribution_test_server is false, the expression will always be 1; if it is true, the value will always be 3.
That is because the expression
val = a ? 3 : 1
is a shorthand syntax inherited from C for the code
if (a)
{
val = 3;
}
else
{
val = 1;
}
You could sort of achieve the same thing with an inline function, but inline is only a suggestion to the compiler, whereas the macro is guaranteed to be substituted in place. If inlining is performed, however, the result will be equivalent.
inline int GetGoldMultipler()
{
return distribution_test_server ? 3 : 1;
}

passing a substring to a function

I am working on building a Lisp interpreter. The problem I am stuck on is that I need to pass the entire remaining substring to a function as soon as I encounter a "(".
For example, if I have,
( begin ( set x 2 ) (set y 3 ) )
then I need to pass
begin ( set x 2 ) (set y 3 ) )
and when I encounter "(" again
I need to pass
set x 2 ) (set y 3 ) )
then
set y 3 ) )
I tried doing so with substr by calculating length, but that didn't quite work. If anyone could help, that'd be great.
Requested code
int a=0;
listnode *makelist(string t) //t is the substring
{
//some code
istringstream iss(t);
string word;
while(iss>>word){
if(word=="(")//I used strcmp here. I wrote == just to save time
//some operations
int x=word.size();
a=a+x;
word=word.substr(a);
p->down=makelist(word);//function called again and word here should be the substring
}}

Have you thought of using an intermediate representation? That is, first parse the whole string into a data structure and then execute it. After all, Lisps have traditionally used applicative order, which means they evaluate the arguments before calling the function. The data structure could be something along the lines of a struct that holds the first part of the string (i.e. begin or set in your example) as one property and the rest of the string to process as a second property (head and rest, if you like). Also consider that trees are more easily constructed through recursion than through iteration, the base case here being reaching the ')' character.
If you are interested in Lisp interpreters and compilers, you should check out Lisp in Small Pieces; it's well worth the price.

I would have thought something like this:
string str = "( begin ( set x 2 ) (set y 3 ) )";
func(str);
...
void func(string s)
{
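string::size_type i = 0; // unsigned, to match s.size()
while(s.size() > i)
{
if (s[i] == '(')
{
func(s.substr(i + 1)); // skip the '(' itself; s.substr(i) would start with it again and recurse forever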
}
i++;
}
}
would do the job. [Obviously, you'll perhaps want to do something else in there too!]

Normally, lisp parsing is done by recursively calling a reader and let the reader "consume" as much data as is necessary. If you're doing this on strings, it may be handy to pass the same string around, by reference, and return a tuple of "this is what I read" and "this is where I finished reading".
So something like this (obviously, in actual code you may want to pass pointers to the offset rather than use a pair structure and deal with its memory management; I elided that to make the code more readable):
struct readthing {
Node *data;
int offset;
};
struct readthing *read (char *str, int offset) {
if (str[offset] == '(')
return read_delimited(str, offset+1, ')'); /* Read a list, consuming the start */
...
}
struct readthing *read_delimited (char *str, int offset, char terminator) {
Node *list = NULL;
offset = skip_to_next_token(str, offset);
while (str[offset] != terminator) {
struct readthing *foo = read(str, offset);
offset = foo->offset;
list = do_cons(foo->data, list);
}
return make_readthing(do_reverse(list), offset+1);
}
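Here is a minimal C++ sketch of the reader approach described above: readExpr consumes one expression starting at an offset passed by reference and leaves the offset just past what it read, so every recursive call shares the same string and no substrings are built. The Node type and the readExpr/show functions are hypothetical names; returning the node and updating the offset in place replaces the answer's readthing struct, and only atoms and lists are handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node type: either an atom (symbol/number text) or a list.
struct Node {
    bool isList = false;
    std::string atom;
    std::vector<Node> items;
};

void skipSpace(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
}

// Reads one expression starting at pos; on return, pos is just past it.
Node readExpr(const std::string& s, std::size_t& pos) {
    skipSpace(s, pos);
    if (pos >= s.size()) throw std::runtime_error("unexpected end of input");
    Node n;
    if (s[pos] == '(') {
        ++pos;                                     // consume '('
        n.isList = true;
        for (;;) {
            skipSpace(s, pos);
            if (pos >= s.size()) throw std::runtime_error("missing ')'");
            if (s[pos] == ')') { ++pos; break; }   // base case: end of this list
            n.items.push_back(readExpr(s, pos));   // recurse for each element
        }
    } else if (s[pos] == ')') {
        throw std::runtime_error("unexpected ')'");
    } else {
        std::size_t start = pos;                   // atom: up to space or paren
        while (pos < s.size() && !std::isspace(static_cast<unsigned char>(s[pos])) &&
               s[pos] != '(' && s[pos] != ')')
            ++pos;
        n.atom = s.substr(start, pos - start);
    }
    return n;
}

// Prints a node back in canonical form, handy for checking the structure.
std::string show(const Node& n) {
    if (!n.isList) return n.atom;
    std::string out = "(";
    for (std::size_t i = 0; i < n.items.size(); ++i) {
        if (i > 0) out += " ";
        out += show(n.items[i]);
    }
    return out + ")";
}
```

For the question's input, show(readExpr(src, pos)) gives (begin (set x 2) (set y 3)), and pos ends at the end of the string.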

how does a C-like compiler interpret the if statement

In C-like languages, we are used to having if statements similar to the following:
if(x == 5) {
//do something
}
else if(x == 7) {
//do something else
}
else if(x == 9) {
//do something else
} else {
//do something else
}
My question is, does the compiler see that if statement that way, or does it end up being interpreted like:
if(x == 5) {
//do something
}
else {
if(x == 7) {
//do something
}
else {
if(x == 9) {
//do something
}
else {
//do something else
}
}
}
EDIT: I realized that while the question made sense in my head, it probably sounded rather stupid to everyone else. I was referring more to how the AST would look: whether there are any special AST cases for 'else-if' statements, or whether it would be compiled as a cascading if/else block.

They are equivalent to a C compiler. There is no special syntax else if in C. The second if is just another if statement.
To make it clearer, according to the C99 standard, the if statement is defined as
selection-statement:
if (expression) statement
if (expression) statement else statement
switch (expression) statement
and a compound-statement is defined as
compound-statement:
{block-item-list(opt) }
block-item-list:
block-item
block-item-list block-item
block-item:
declaration
statement
When a compiler front end tries to understand a source code file, it often follows these steps:
Lexical analysis: turn the plain-text source code into a list of 'tokens'
Syntax analysis (parsing): parse the token list and generate an abstract syntax tree (AST)
The tree is then passed to compiler middle-end (to optimize) or back-end (to generate machine code)
In your case this if statement
if(x == 7) {
//do something else
} else if(x == 9) {
//do something else
} else {
//do something else
}
Is parsed as a selection-statement inside a selection-statement,
        selection-stmt
        /      |      \
     exp     stmt     stmt
      |        |        |
     ...      ...   selection-stmt
                    /      |      \
                 exp     stmt     stmt
                  |        |        |
                 ...      ...      ...
and this one
if(x == 7) {
//do something else
} else {
if(x == 9) {
//do something else
} else {
//do something else
}
}
is the same selection-statement inside a compound-statement inside a selection-statement:
        selection-stmt
        /      |      \
     exp     stmt     stmt
      |        |        |
     ...      ...   compound-stmt
                          |
                   block-item-list
                          |
                     block-item
                          |
                        stmt
                          |
                   selection-stmt
                   /      |      \
                exp     stmt     stmt
                 |        |        |
                ...      ...      ...
So they have different ASTs, but that makes no difference to the compiler back end: as you can see in the trees, there are no structural changes beyond the extra compound-statement wrapper.
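To check the equivalence concretely, here is a small C++ sketch of the question's two layouts, with each //do something placeholder replaced by a distinct return value (an assumption made only so the results can be compared). ladder and nested are hypothetical names; both return the same result for every input.

```cpp
#include <cassert>

// The "else if" ladder from the question, placeholders replaced by return values.
int ladder(int x) {
    if (x == 5) {
        return 1;
    } else if (x == 7) {
        return 2;
    } else if (x == 9) {
        return 3;
    } else {
        return 4;
    }
}

// The explicitly nested form: each else wraps the next if in a compound statement.
int nested(int x) {
    if (x == 5) {
        return 1;
    } else {
        if (x == 7) {
            return 2;
        } else {
            if (x == 9) {
                return 3;
            } else {
                return 4;
            }
        }
    }
}
```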

In both C and C++ enclosing a statement into a redundant pair of {} does not change the semantics of the program. This statement
a = b;
is equivalent to this one
{ a = b; }
is equivalent to this one
{{ a = b; }}
and to this one
{{{{{ a = b; }}}}}
Redundant {} make absolutely no difference to the compiler.
In your example, the only difference between the first version and the second version is a bunch of redundant {} you added to the latter, just like I did in my a = b example above. Your redundant {} change absolutely nothing. There's no appreciable difference between the two versions of code you presented, which makes your question essentially meaningless.
Either clarify your question, or correct the code, if you meant to ask about something else.

The two snippets of code are, in fact, identical. You can see why this is true by realizing that the syntax of the "if" statement is as follows:
if <expression>
<block>
else
<block>
NOTE that <block> may be surrounded by curly braces if necessary.
So, your code breaks down as follows.
// if <expression>
if (x == 5)
// <block> begin
{
//do something
}
// <block> end
// else
else
// <block> begin
if(x == 7) {
//do something else
}
else if(x == 9) {
//do something else
} else {
//do something else
}
// <block> end
Now if you put curly braces around the block for the "else", as is allowed by the language, you end up with your second form.
// if <expression>
if (x == 5)
// <block> begin
{
//do something
}
// <block> end
// else
else
// <block> begin
{
if(x == 7) {
//do something else
}
else if(x == 9) {
//do something else
} else {
//do something else
}
}
// <block> end
And if you do this repeatedly for all "if else" clauses, you end up with exactly your second form. The two pieces of code are exactly identical, and seen exactly the same way by the compiler.

Closer to the first one, but the question doesn't exactly fit.
When a program is compiled, it goes through a few stages. The first stage is lexical analysis, and the second stage is syntactic analysis. Lexical analysis analyzes the text, separating it into tokens. Then syntactic analysis looks at the structure of the program and constructs an abstract syntax tree (AST). This is the underlying syntactic structure that's created during compilation.
So basically, if, if-else and if-else if-else statements are all eventually structured into an abstract syntax tree (AST) by the compiler.
Here's the wikipedia page on ASTs: https://en.wikipedia.org/wiki/Abstract_syntax_tree
edit:
And actually, an if/else-if statement probably forms something closer to the second one inside the AST. I'm not quite sure, but I wouldn't be surprised if it's represented at an underlying level as a binary tree-like conditional branching structure. If you're interested in learning more in depth about it, you can research the parsing side of compiler theory.

Note that although your first statement is indented according to the if-else "ladder" convention, actually the "correct" indentation for it which reveals the true nesting is this:
if(x == 5) {
//do something
} else
if(x == 7) { // <- this is all one big statement
//do something else
} else
if(x == 9) { // <- so is this
//do something else
} else {
//do something else
}
Indentation is whitespace; it means nothing to the compiler. What you have after the first else is one big if statement. Since it is just one statement, it does not require braces around it. When you ask, "does the compiler read it that way", you have to remember that most space is insignificant; the syntax determines the true nesting of the syntax tree.
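The same principle, that the syntax and not the indentation decides the nesting, is easiest to see with the classic dangling else. In this sketch (danglingElse is a hypothetical name) the indentation suggests that the else belongs to if (a), but the compiler attaches it to the nearest unmatched if, which is if (b). Compilers may warn about this misleading layout.

```cpp
#include <cassert>

// Indented as if the else belonged to "if (a)"; it actually belongs to "if (b)".
int danglingElse(bool a, bool b) {
    int x = 0;
    if (a)
        if (b)
            x = 1;
    else
        x = 2;
    return x;
}
```

With a false, neither assignment runs and the result is 0; the else only fires when a is true and b is false.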

ANTLR - Keep a block unchanged

I am a beginner with ANTLR, and I need to modify an existing - and complex - grammar.
I want to create a rule to keep a block without parsing with other rules.
To be more clear, I need to insert code written in C++ into the interpreted code.
Edit 11/02/2013
After many tests, here is my grammar, my test, the result I have, and the result and want:
Grammar
cppLiteral
: cppBegin cppInnerTerm cppEnd
;
cppBegin
: '//$CPP_IN$'
;
cppEnd
: '//$CPP_OUT$'
;
cppInnerTerm
: ( ~('//$CPP_OUT$') )*
;
Test
//$CPP_IN$
txt1 txt2
//$CPP_OUT$
Result
cppLiteral ->
cppBegin = '//$CPP_IN$'
cppInnerTerm = 'txt1' 'txt2'
cppEnd = '//$CPP_OUT$'
Expected result
cppLiteral ->
cppBegin = '//$CPP_IN$'
cppInnerTerm = 'txt1 txt2'
cppEnd = '//$CPP_OUT$'
(Sorry, I can't post the image of the AST because I don't have 10 reputation points.)
The three tokens "cppBegin", "cppInnerTerm" and "cppEnd" can be in one token, like this:
cppLiteral
: '//$CPP_IN$'( ~('//$CPP_OUT$') )*'//$CPP_OUT$'
;
to have this result:
cppLiteral = '//$CPP_IN$\n txt1 txt2\n //$CPP_OUT$'

I want to create a rule to keep a block without parsing with other rules.
Parse it like a multiline comment, e.g. /* foobar */. Below is a small example using the keywords specified in your question.
Note that most of the work is done with lexer rules (those that start with a capital letter). Any time you want to deal with blocks of text, particularly if you want to avoid other rules as in this case, you're probably thinking in terms of lexer rules rather than parser rules.
CppBlock.g
grammar CppBlock;
document: CPP_LITERAL* EOF;
fragment CPP_IN:'//$CPP_IN$';
fragment CPP_OUT:'//$CPP_OUT$';
CPP_LITERAL: CPP_IN .* CPP_OUT
{
String t = getText();
t = t.substring(10, t.length() - 11); //10 = length of CPP_IN, 11 = length of CPP_OUT
setText(t);
}
;
WS: (' '|'\t'|'\f'|'\r'|'\n')+ {skip();};
Here is a simple test case:
Input
//$CPP_IN$
static const int x = 0; //magic number
int *y; //$CPP_IN$ <-- junk comment
static void foo(); //forward decl...
//$CPP_OUT$
//$CPP_IN$
//Here is another block of CPP code...
const char* msg = ":D";
//The end.
//$CPP_OUT$
Output Tokens
[CPP_LITERAL :
static const int x = 0; //magic number
int *y; //$CPP_IN$ <-- junk comment
static void foo(); //forward decl...
]
[CPP_LITERAL :
//Here is another block of CPP code...
const char* msg = ":D";
//The end.
]
Rule CPP_LITERAL preserves newlines at the beginning and end of the input (after //$CPP_IN$ and before //$CPP_OUT$). If you don't want those, just update the action to strip them out. Otherwise, I think this grammar does what you're asking for.
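For readers outside ANTLR, here is a rough C++ analogue of what the CPP_LITERAL rule and its action do. extractCppBlocks is a hypothetical name, and unlike a real lexer it simply scans the whole string: find //$CPP_IN$, stop at the first //$CPP_OUT$ after it, and keep the text in between unchanged, including the leading and trailing newlines mentioned above. The lengths 10 and 11 in the action are the sizes of the two markers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical analogue of CPP_LITERAL: collect the raw text between each
// "//$CPP_IN$" and the first "//$CPP_OUT$" that follows it.
std::vector<std::string> extractCppBlocks(const std::string& input) {
    const std::string in = "//$CPP_IN$";    // 10 characters
    const std::string out = "//$CPP_OUT$";  // 11 characters
    std::vector<std::string> blocks;
    std::size_t pos = 0;
    while ((pos = input.find(in, pos)) != std::string::npos) {
        std::size_t bodyStart = pos + in.size();
        std::size_t end = input.find(out, bodyStart);  // first closing marker
        if (end == std::string::npos) break;           // unterminated block: stop
        blocks.push_back(input.substr(bodyStart, end - bodyStart));
        pos = end + out.size();                         // continue after the block
    }
    return blocks;
}
```

On the test input above it returns two blocks, and the junk //$CPP_IN$ comment stays inside the first one, matching the output tokens shown.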
