If-then-else ambiguity in CUP

I am constructing a grammar in CUP, and I have run into a roadblock defining IF-THEN-ELSE statements.
My code looked like this:
start with statements;
/* Top level statements */
statements ::= statement | statement SEPARATOR statements ;
statement ::= if_statement | block | while_statement | declaration | assignment ;
block ::= START_BLOCK statements END_BLOCK ;
/* Control statements */
if_statement ::= IF expression THEN statement
| IF expression THEN statement ELSE statement ;
while_statement ::= WHILE expression THEN statement ;
But the CUP tool complained about the ambiguity in the definition of the if_statement.
I found this article describing how to eliminate the ambiguity without introducing endif tokens.
So I tried adapting their solution:
start with statements;
statements ::= statement | statement SEPARATOR statements ;
statement ::= IF expression THEN statement
| IF expression THEN then_statement ELSE statement
| non_if_statement ;
then_statement ::= IF expression THEN then_statement ELSE then_statement
| non_if_statement ;
// The statement vs then_statement is for disambiguation purposes
// Solution taken from http://goldparser.org/doc/grammars/example-if-then-else.htm
non_if_statement ::= START_BLOCK statements END_BLOCK // code block
| WHILE expression statement // while statement
| declaration | assignment ;
Sadly CUP is complaining as follows:
Warning : *** Reduce/Reduce conflict found in state #57
between statement ::= non_if_statement (*)
and then_statement ::= non_if_statement (*)
under symbols: {ELSE}
Resolved in favor of the first production.
Why is this not working? How do I fix it?

The problem here is the interaction between if statements and while statements; the conflict goes away if you remove the while statement production from non_if_statement.
The problem is that the target of a while statement can be an if statement, and that while statement could then be in the then clause of another if statement:
IF expression THEN WHILE expression IF expression THEN statement ELSE ...
Now we have a slightly different manifestation of the original problem: the else at the end could be part of the nested if or the outer if.
The solution is to extend the distinction between unrestricted statements and restricted statements ("then-statements" in the terms of your link) so that there are also two different kinds of while statements:
statement ::= IF expression THEN statement
| IF expression THEN then_statement ELSE statement
| WHILE expression statement
| non_if_statement ;
then_statement ::= IF expression THEN then_statement ELSE then_statement
| WHILE expression then_statement
| non_if_statement ;
non_if_statement ::= START_BLOCK statements END_BLOCK
| declaration | assignment ;
Of course, if you extend your grammar to include other types of compound statements (such as for loops), you will have to do the same thing for each of them.
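For comparison, C++ resolves the same ambiguity the way the rewritten grammar does: an else pairs with the nearest if that does not yet have one, even when that if sits inside a loop body. A minimal sketch (the function names are made up for illustration, and compilers may warn about the deliberately missing braces):

```cpp
#include <cassert>

// Dangling else: the else pairs with the nearest unmatched if (the inner one).
int nested_if(bool outer, bool inner) {
    if (outer)
        if (inner) return 1;
        else return 2;       // belongs to `if (inner)`, not to `if (outer)`
    return 0;                // reached only when outer is false
}

// Same shape as the WHILE case in the answer: the inner if is the body of a
// loop, and that loop is the then-branch of an outer if.
int if_in_while(bool outer, bool inner) {
    int result = 0;
    if (outer)
        while (result == 0)
            if (inner) result = 1;
            else result = 2;  // still pairs with `if (inner)`
    return result;
}
```

Calling nested_if(true, false) takes the else branch of the inner if, while nested_if(false, false) skips both and falls through to the final return.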

Related

Antlr4 define a operator which can be Unary or Binary

I am in the process of creating an ANTLR4 grammar for a language where the logical operator NOT and the operators plus and minus can be unary or binary operators.
How should I define the operators in the ANTLR4 grammar so that the parser can differentiate between them?
Example:
NOT 1 is 0 (Unary Operator)
1 NOT 1 is 0 (Binary Operator)
Here is a small part of my Antlr4 Parser:
expr: expr ('%') expr #Modulo
| expr op=('*'|'/') expr #MulDiv
| expr op=('+'|'-') expr #AddSub
| NOT expr #NegOp
Here is a small part of my Antlr4 Lexer:
ADD : '+';
SUB : '-';
NOT : ([nN][oO][tT]|[~]);
To make an operator in my language both unary and binary, I needed to add the following rules to my parser:
expr (MOD) expr #Modulo
| op=(ADD|SUB) expr #UnaryPlusMinus
| expr op=(ADD|SUB) expr #AddSub
| expr op=(AND|OR|XOR|NOT) expr #LogOp
| NOT expr #NegOp
For me the above solution is fine because my language supports syntax like the following:
++5---4 (result is 1)
NOT NOT NOT 1 (result is 0)
However, it would be interesting to create a parser/lexer rule where an operator (for example minus) can be either unary or binary, with the parser knowing which of the two uses applies. Such a rule should reject something like 5-++--4 as an error, while still accepting ---5, which would evaluate to -5.

ANTLR4 grammar doesn't recognize declaration

I have a problem with my ANTLR grammar.
In SQL there are declaration types like UNSIGNED INT or UNSIGNED BIGINT. If I run my grammar with ANTLRWorks in Testrig the parser has a problem with the UNSIGNED.
This is my grammar part for declare_type
declare_type
: BIT
| BOOLEAN
| CHAR ('(' expression ')')?
| CHARACTER ('(' expression ')')?
...
Attempt 1:
...
| UNSIGNED? INT
| UNSIGNED? BIGINT
;
Attempt 2:
...
| INT
| BIGINT
| UNSIGNED INT
| UNSIGNED BIGINT
;
Attempt 3:
...
| INT
| BIGINT
| unsigned_separable_element
;
unsigned_separable_element
: UNSIGNED INT
| UNSIGNED BIGINT
;
I hope you guys know what my problem is, thanks.
EDITED:
I uploaded the full grammar to GitHub
Example: DECLARE value UNSIGNED INT; doesn't work because the grammar doesn't recognize UNSIGNED
If I use only INT then it works
I went through the grammar and the problem is not poetic at all. Just a typo in the UNSIGNED lexer rule... It was " U N S I G N D" - missing E
Could you specify what kind of a problem do you have? Or at least post whole grammar file? The way lexer rules are specified is important in solving a lot of problems in ANTLR. Thanks.

What does a pipe in a macro signify?

How is the following macro definition resolved?
#define EMAIL_SERVER_ADAPTER_FATAL_ERROR MSB_RETURN_TYPE_FATAL_ERROR | 1
I mean, is it resolved to 1 or to MSB_RETURN_TYPE_FATAL_ERROR and why?
| has no special meaning in macros. The macro is resolved to
MSB_RETURN_TYPE_FATAL_ERROR | 1
which is bitwise OR of two values (MSB_RETURN_TYPE_FATAL_ERROR and 1).
The | in the macro has the same meaning as elsewhere in C and C++. It means bitwise or.
Presumably MSB_RETURN_TYPE_FATAL_ERROR is some numeric value (otherwise it won't compile, pretty much).
For argument's sake, we'll make it 0x100.
So the following code:
return EMAIL_SERVER_ADAPTER_FATAL_ERROR;
will expand to:
return MSB_RETURN_TYPE_FATAL_ERROR | 1;
which in turn becomes:
return 0x100 | 1;
which in turn is the same as:
return 0x101;
Of course MSB_RETURN_TYPE_FATAL_ERROR is probably something other than 0x100 - but the principle still applies.
Macros are just text replacement, so
EMAIL_SERVER_ADAPTER_FATAL_ERROR
will be replaced with
MSB_RETURN_TYPE_FATAL_ERROR | 1
After that it is just numbers (i.e. plain bit-wise OR operation).
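Because the replacement is purely textual, the surrounding operators see the unparenthesized expansion. A minimal sketch, assuming 0x100 for MSB_RETURN_TYPE_FATAL_ERROR as above (the real value is not shown in the question); the _SAFE macro and the function names are hypothetical and only there for illustration:

```cpp
#include <cassert>

// Assumed value for illustration only.
#define MSB_RETURN_TYPE_FATAL_ERROR 0x100
#define EMAIL_SERVER_ADAPTER_FATAL_ERROR MSB_RETURN_TYPE_FATAL_ERROR | 1
// Hypothetical parenthesized variant of the same macro.
#define EMAIL_SERVER_ADAPTER_FATAL_ERROR_SAFE (MSB_RETURN_TYPE_FATAL_ERROR | 1)

// Expands to: return 0x100 | 1;
int plain() { return EMAIL_SERVER_ADAPTER_FATAL_ERROR; }

// Expands to: return 0x100 | 1 & 0xF;
// & binds tighter than |, so this is 0x100 | (1 & 0xF).
int masked_unsafe() { return EMAIL_SERVER_ADAPTER_FATAL_ERROR & 0xF; }

// Expands to: return (0x100 | 1) & 0xF;
int masked_safe() { return EMAIL_SERVER_ADAPTER_FATAL_ERROR_SAFE & 0xF; }
```

With the assumed value, plain() and masked_unsafe() both yield 0x101, whereas masked_safe() yields 1. That difference is the usual reason for wrapping a macro body in parentheses.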

what's an expression and expression statement in c++?

I've read that statements in C++ usually end with a semicolon, which might help explain what an expression statement is. But then what is an expression? Could you give an example?
In this case, are both just statements or expression statements or expressions?
int x;
x = 0;
An expression is "a sequence of operators and operands that specifies a computation" (that's the definition given in the C++ standard). Examples are 42, 2 + 2, "hello, world", and func("argument"). Assignments are expressions in C++; so are function calls.
I don't see a definition for the term "statement", but basically it's a chunk of code that performs some action. Examples are compound statements (consisting of zero or more other statements included in { ... }), if statements, goto statements, return statements, and expression statements. (In C++, but not in C, declarations are classified as statements.)
The terms statement and expression are defined very precisely by the language grammar.
An expression statement is a particular kind of statement. It consists of an optional expression followed by a semicolon. The expression is evaluated and any result is discarded. Usually this is used when the statement has side effects (otherwise there's not much point), but you can have an expression statement where the expression has no side effects. Examples are:
x = 42; // the expression happens to be an assignment
func("argument");
42; // no side effects, allowed but not useful
; // a null statement
The null statement is a special case. (I'm not sure why it's treated that way; in my opinion it would make more sense for it to be a distinct kind of statement. But that's the way the standard defines it.)
Note that
return 42;
is a statement, but it's not an expression statement. It contains an expression, but the expression (plus the ;) doesn't make up the entire statement.
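The distinctions above can be put side by side in one function. This is only a sketch; func and demo are placeholder names, and a compiler may warn about the statements that have no effect:

```cpp
#include <cassert>

// Placeholder function so that the call below compiles.
int func(const char*) { return 7; }

int demo() {
    int x;                 // declaration statement; not an expression
    x = 42;                // expression statement: an assignment followed by ';'
    func("argument");      // expression statement: the returned 7 is discarded
    42;                    // expression statement with no side effects
    ;                      // null statement
    int y = (x = 5) + 1;   // `x = 5` is an expression with a value, so y is 6
    return x + y;          // return statement that contains the expression x + y
}
```

The line that initializes y shows that an assignment is an expression in its own right: it can appear as an operand inside a larger expression.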
These are expressions (remember math?):
1
6 * 7
a + b * 3
sin(3) + 7
a > b
a ? 1 : 0
func()
mystring + gimmeAString() + std::string("\n")
The following are all statements:
int x; // Also a declaration.
x = 0; // Also an assignment.
if(expr) { /*...*/ } // This is why it's called an "if-statement".
for(expr; expr; expr) { /*...*/ } // For-loop.
A statement usually contains an expression:
if(a > b) // a > b is an expr.
while(true) // true is an expr.
func(); // func() is an expr.
To understand what an expression statement is, you should first know what an expression is and what a statement is.
An expression in a programming language is a combination of one or more explicit values, constants, variables, operators, and functions that the programming language interprets (according to its particular rules of precedence and of association) and computes to produce ("to return", in a stateful environment) another value. This process, as for mathematical expressions, is called evaluation.
Source: https://en.wikipedia.org/wiki/Expression_(computer_science)
In other words, expressions are things that evaluate to a value. They can consist of a single entity, such as a constant or a variable, or of several entities connected by operators. Expressions may or may not have side effects; that is, evaluating them may or may not change some state. For instance, numbers, things that look like mathematical formulas and calculations, assignments, function calls, logical evaluations, strings and string operations are all considered expressions.
function calls: According to MSDN, function calls are considered expressions. A function call is an expression that passes control and arguments (if any) to a function and has the form:
expression (expression-list opt) which is invoked by the ( ) function operator.
source: https://msdn.microsoft.com/en-us/library/be6ftfba.aspx
Some examples of expressions are:
46
18 * 3 + 22 / 2
a = 4
b = a + 3
c = b * -2
abs(c)
b >= c
c
"a string"
str = "some string"
strcat(str, " some thing else")
str2 = std::string("some string") + " some other string" // using the C++ <string> library; two bare string literals cannot be added with +
Statements are fragments of a program that execute in sequence and cause the computer to carry out some definite action. Some C++ statement types are:
expression statements;
compound statements;
selection statements;
iteration statements;
jump statements;
declaration statements;
try blocks;
atomic and synchronized blocks (TM TS).
Source: http://en.cppreference.com/w/cpp/language/statements
I've read that statements in C++ usually end with a semicolon
Yes usually! But not always. Consider the following piece of code which is a compound statement but does not end with a semicolon, rather it is enclosed between two curly braces:
{ // beginning of a compound statement
int x; // A declaration statement
int y;
int z;
x = 2; // x = 2 is an expression, thus x = 2; with the trailing semicolon is an expression statement
y = 2 * x + 5;
if(y == 9) { // A selection statement
z = 52;
} else { // The else branch of the same if statement
z = 0;
}
} // end of a compound statement
By now, as you might be guessing, an expression statement is any statement that has an expression followed by a semicolon. According to MSDN an expression statement is a statement that causes the expressions to be evaluated. No transfer of control or iteration takes place as a result of an expression statement.
Source: https://msdn.microsoft.com/en-us/library/s7ytfs2k.aspx
Some Examples of expression statements:
x = 4;
y = x * x + 10;
radius = 5;
pi = 3.141593;
circumference = 2. * pi * radius;
area = pi * radius * radius;
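Note that statements which call a function are still expression statements, because a function call is itself an expression; the "transfer of control" in the MSDN wording is best read as referring to jump, selection and iteration statements, not to function calls. So the following are expression statements too: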
printf("The control is passed to the printf function");
y = pow(x, 2);
side effects: A side effect is a modification of state, such as changing the value of a variable, writing data to a disk, or showing a menu in the user interface.
Source: https://en.wikipedia.org/wiki/Side_effect_(computer_science)
Note that expression statements don't need to have side effects; that is, they don't have to change or modify any state. For example, if we consider a program's control flow as a state which could be modified, then the following expression statements won't have any side effects on the program's control flow:
a = 8;
b = 10 + a;
k++;
Whereas the following expression statement would have a side effect under that view, since it would pass the control flow to the sqrt() function, thus changing a state:
d = sqrt(a); // The control flow is passed to sqrt() function
If we consider the value of a variable as a state as well, modifying it would be a side effect thus all of expression statements above have side effects, because they all modify a state. An expression statement that does not have any side effect is not very useful. Consider the following expression statements:
x = 7; // This expression statement sets the value of x to 7
x; // This expression statement is evaluated to 7 and does nothing useful
In the above example x = 7; is a useful expression statement. It sets the value of x to 7 using the assignment operator =. But x; merely evaluates to 7 and doesn't do anything useful.
According to The C++ Programming Language by Bjarne Stroustrup Special(3rd) Edition, a statement is basically any declaration, function call, assignment, or conditional. Though, if you look at the grammar, it is much more complicated than that. An expression, in simple terms, is any math or logical operation(s).
The wikipedia links that ok posted in his answer can be of help too.
In my opinion,
a statement *states* the purpose of a code block. i.e. we say this block of code if(){} is an if-statement, or this x=42; is an expression statement. So code such as 42; serves no purpose, therefore, this is *not* a statement.
and,
an expression is any legal combination of symbols that represents a value (Credit to Webopedia); it combines variables and constants to produce new values(Quoted from Chapter 2 in The C Programming Language). Therefore, it also has a mathematical connotation. For instance, number 42 in x=42; is an expression (x=42; is not an expression but rather an expression statement), or func(x) is an expression because it will evaluate to something. On the contrary, int x; is not an expression because it is not representing any value.
I think this excerpt from a technical book is most useful and clear.
Reading the paragraphs up to the start of section 1.4.2, "Statements", should be enough.
An expression is "a sequence of operators and operands that specifies a computation"
These are expressions:
1
2 + 2
"hi"
cout << "Hello, World!"
The last one is indeed an expression; << is the output operator, and cout (of type ostream) and "Hello, World!" (a string literal) are the operands. The operator returns the left-hand operand, so (cout << "Hello, ") << "World!" is also a valid expression, but still not a statement.
An expression becomes an expression statement when it is followed by a semicolon:
1;
2 + 2;
"hi";
cout << "Hello, World!";
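A small sketch of the point about << returning its left-hand operand. It uses std::ostringstream in place of cout so that the result can be inspected; the function names are made up for illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds "Hello, World!" in two steps to show that `out << "Hello, "`
// is an expression whose value is the stream itself.
std::string chained() {
    std::ostringstream out;                    // stands in for std::cout
    std::ostream& left = (out << "Hello, ");   // value of the expression: the stream
    left << "World!";                          // expression statement
    return out.str();
}

// True when the result of << refers to the same stream object
// that was its left-hand operand.
bool returns_left_operand() {
    std::ostringstream out;
    return &(out << "hi") == &out;
}
```

This is why (cout << "Hello, ") << "World!" works: the parenthesized part evaluates to the stream, which then becomes the left operand of the second <<.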
An expression is either part of a statement or, followed by a semicolon, a statement in its own right (an expression statement).
int x; is a statement (a declaration), but it is not an expression.
See this : http://en.wikipedia.org/wiki/Expression_%28programming%29
http://en.wikipedia.org/wiki/Statement_%28programming%29

Is it possible to generate a parser for a language using the Reverse Polish notation with bison/yacc?

Is it possible to generate a parser for a scripting language that uses the Reverse Polish notation (and a Postscript-like syntax) using bison/yacc?
The parser should be able to parse code similar to the following one:
/fib
{
dup dup 1 eq exch 0 eq or not
{
dup 1 sub fib
exch 2 sub fib
add
} if
} def
Given the short description above and the notes on Wikipedia:
http://en.wikipedia.org/wiki/Stack-oriented_programming_language#PostScript_stacks
A simple bison grammar for the above could be:
%token ADD
%token DUP
%token DEF
%token EQ
%token EXCH
%token IF
%token NOT
%token OR
%token SUB
%token NUMBER
%token IDENTIFIER
%%
program : action_list_opt
action_list_opt : action_list
| /* No Action */
action_list : action
| action_list action
action : param_list_opt operator
param_list_opt : param_list
| /* No Parameters */
param_list : param
| param_list param
param : literal
| name
| action_block
operator : ADD
| DUP
| DEF
| EQ
| EXCH
| IF
| NOT
| OR
| SUB
literal : NUMBER
name : '/' IDENTIFIER
action_block : '{' program '}'
%%
Yes. Assuming you mean one that also uses postfix notation, it means you'd define your expressions something like:
expression: operand operand operator
Rather than the more common infix notation:
expression: operand operator operand
but that hardly qualifies as a big deal. If you mean something else by "Postscript-like", you'll probably have to clarify before a better answer can be given.
Edit: Allowing an arbitrary number of operands and operators is also pretty easy:
operand_list:
| operand_list operand
;
operator_list:
| operator_list operator
;
expression: operand_list operator_list
;
As it stands, this doesn't attempt to enforce the proper number of operands being present for any particular operator; you'd have to add those checks separately. In a typical case, postfix notation is executed on a stack machine, so most such checks become simple stack checks.
I should add that although you certainly can write such parsers in something like yacc, languages using postfix notation generally need so little parsing that you frequently feed them more or less directly to some sort of virtual machine interpreter (mostly, the parsing comes down to throwing an error if you attempt to use a name that hasn't been defined).
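To illustrate what "simple stack checks" can look like, here is a minimal sketch of a stack-machine evaluator for a small, assumed subset of the operators in the question (add, sub, eq, dup, exch) over integers. It is not a PostScript implementation; eval_postfix and its error messages are invented for this example, and tokenizing is just splitting on whitespace:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Evaluates a whitespace-separated postfix program and returns the top of the
// stack. Anything that is not a known operator is treated as an integer literal.
long eval_postfix(const std::string& program) {
    std::vector<long> stack;

    // The operand-count check: each operator states how many values it needs.
    auto need = [&](std::size_t n, const std::string& op) {
        if (stack.size() < n)
            throw std::runtime_error("stack underflow in '" + op + "'");
    };
    auto pop = [&]() { long v = stack.back(); stack.pop_back(); return v; };

    std::istringstream in(program);
    std::string word;
    while (in >> word) {
        if (word == "add" || word == "sub" || word == "eq") {
            need(2, word);
            long b = pop();   // right operand was pushed last
            long a = pop();
            if (word == "add")      stack.push_back(a + b);
            else if (word == "sub") stack.push_back(a - b);
            else                    stack.push_back(a == b ? 1 : 0);
        } else if (word == "dup") {
            need(1, word);
            stack.push_back(stack.back());
        } else if (word == "exch") {
            need(2, word);
            std::swap(stack[stack.size() - 1], stack[stack.size() - 2]);
        } else {
            stack.push_back(std::stol(word));  // throws std::invalid_argument for unknown words
        }
    }
    need(1, "end of program");
    return stack.back();
}
```

For example, eval_postfix("1 2 add") leaves 3 on the stack, while eval_postfix("add") throws because the operator finds fewer than two operands. Names, procedures ({ ... }) and if/def are left out on purpose to keep the sketch short.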