How to build parse tree? - c++

Have found C++ BNF and there next lines
selection-statement:
if ( condition ) statement
if ( condition ) statement else statement
Now trying to write parser. Need to build parse tree. On input i have BNF and source file. But i'm stucked in how i can point my parser what if condition evaluated to true, then it need to execute first statement otherwise else block? Thanks.

Conditional statements have a simple recursive structure. The corresponding recursive descent parser has a similarly simple recursive structure. Abstractly, the interior conditionals are parsed as follows:
<cond> -> if <expression> then <statement> [else <statement>]
cond :
a = parse expression
b = parse statement
if is_else(token)
then c = parse statement
return conditional(a,b,c)
else return conditional(a,b)
In your example, conditional statements contain blocks of conditionals the last of which contains an else clause. Assuming that the tokenized input sequence has this form and syntactic errors were detected during lexical analysis, the outer conditional is parsed as follows:
<conditional> -> selection_statement: {<cond>} <cond>
conditional :
b = new block
while (iscond(next))
s = parse cond
b = insert(s,b)
return b
Of course, the actual implementation will be significantly more detailed and tedious. However, the preceding describes in outline the construction of a parse tree of a conditional statement having the required form from a tokenized input sequence.
I just realized you were talking about evaluating the abstract syntax tree. The structure of the function that evaluations a conditional statement is similar to the function that parses a conditional statement. Abstractly,
cond(input) :
a = evaluate(if_part(input))
if is_true(a)
then evaluate(then_part(input))
else if(is_else(input))
then evaluate(else_part(input))
else return
In order to determine which portion of the conditional to evaluate, you must first evalute the "if" part of the conditional to a Boolean value. If the Boolean value is "true," the "then" part of the conditional is evaluated. If the Boolean value is "false," then the "else" part of the conditional is evaluated. If there is no "else" part, there is nothing to evaluate. Of course, the implementation will be more detailed than the above.

First of all, you need to distinguish between the usual passes of a compiler:
The lexing, that is recognizing words and removing comments,
the parsing, that is structuring of the linear input stream into an abstract syntax tree, and
the evaluation or code generation.
Do the first things first, then you'll understand the rest. Look at boost::spirit for the first two steps.

There are a variety of programs that take a BNF grammar and output a proper parser: http://en.wikipedia.org/wiki/Backus-Naur_form#Software_using_BNF
If you are writing your own parser, there is an excellent overview online here.

Related

MARIE Assembly if else

Struggle with MARIE Assembly.
Needing to write a code that has x=3 and y=5, is x>y then it needs to output 1, if x<y it needs to output one,
I have the start but don't know how to do if else statements in MARIE
LOAD X
SUBT Y
SKIPCOND 800
JUMP ELSE
OUTPUT
HALT
Structured statements have a pattern, and each one has an equivalent pattern in assembly language.
The if-then-else statement, for example, has the following pattern:
if ( <condition> )
<then-part>
else
<else-part>
// some statement after if-then-else
Assembly language uses an if-goto-label style.  if-goto is a conditional test & branch; and goto alone is an unconditional branch.  These forms alter the flow of control and can be composed to do the same job as structure statements.
The equivalent pattern for the if-then-else in assembly (but written in pseudo code) is as follows:
if ( <condition> is false ) goto if1Else;
<then-part>
goto if1Done;
if1Else:
<else-part>
if1Done:
// some statement after if-then-else
You will note that the first conditional branch (if-goto) needs to branch on condition false.  For example, let's say that the condition is x < 10, then the if-goto should read if ( x >= 10 ) goto if1Else;, which branches on x < 10 being false.  The point of the conditional branch is to skip the then-part (to skip ahead to the else-part) when the condition is false — and when the condition is true, to simply allow the processor to run the then-part, by not branching ahead.
We cannot allow both the then-part and the else-part to execute for the same if-statement's execution.  The then-part, once completed, should make the processor move on to the next statement after the if-then-else, and in particular, to avoid the else-part, since the then-part just fired.  This is done using an unconditional branch (goto without if), to skip ahead around the else-part — if the then-part just fired, then we want the processor to unconditionally skip the else-part.
The assembly pattern for if-then-else statement ends with a label, here if1Done:, which is the logical end of the if-then-else pattern in the if-goto-label style.  Many prefer to name labels after what comes next, but these labels are logically part of the if-then-else, so I choose to name them after the structured statement patterns rather than about subsequent code.  Hopefully, you follow the assembly pattern and see that whether the if-then-else runs the then-part or the else-part, the flow of control comes back together to run the next line of code after the if-then-else, whatever that is (there must be a statement after the if-then-else, because a single statement alone is just a snippet: an incomplete fragment of code that would need to be completed to actually run).
When there are multiple structured statements, like if-statements, each pattern translation must use its own set of labels, hence the numbering of the labels.
(There are optimizations where labels can be shared between two structured statements, but doing that does not optimize the code in any way, and makes it harder to change.  Sometimes nested statements can result in branches to unconditional branches — since these actual machine code and have runtime costs, they can be optimized, but such optimizations make the code harder to rework so should probably be held off until the code is working.)
When two or more if-statements are nested, the pattern is simply applied multiple times.  We can transform the outer if statement first, or the inner first, as long as the pattern is properly applied, the flow of control will work the same in assembly as in the structured statement.
In summary, first compose a larger if-then-else statement:
if ( x < y )
Output(1)
else
Output(one)
(I'm not sure this is what you need, but it is what you said.)
Then apply the pattern transformation into if-goto-label: since, in the abstract, this is the first if-then-else, let's call it if #1, so we'll have two labels if1Done and if1Else.  Place the code found in the structured pattern into the equivalent locations of the if-goto-label pattern, and it will work the same.
MARIE uses SkipCond to form the if-goto statement.  It is typical of machine code to have separate compare and branch instructions (as for a many instruction set architectures, there are too many operands to encode an if goto in a single instruction (if x >= y goto Label; has x, y, >=, and Label as operands/parameters).  MARIE uses subtract and branch relative to 0 (the SkipCond).  There are other write-ups on the specific ways to use it so I won't go into that here, though you have a good start on that already.

How to scan through user input and cut it into chunks in c++?

I'm making a program to evaluate conditional proposition (~ or and -> <->). As the users input propositional variables and truth values (true, false) ,and proposition; the program will go through the inputs and return the truth value for the whole proposition.
For ex: if i set p = true, q = true, r = false and input: p or q and r.
Is the anyway I can cut it into q and r first, then process and put it back to result (which is false), then process the next bit (p or false) ??. And it has to keep cutting out bits (In proper order of Precedence) and putting them back until I have left is a single true or false.
And what I'm I supposed to use to hold user input (array, string) ???
Any help would be appreciated ! Thanks.
Tasks like this are usually split into two phases, lexical analysis and syntactic analysis.
Lexical analysis splits the input into a stream of tokens. In your case the tokens would be the operators ~, or, and, ->, <->, variables and the values true, false. You didn't mention them but I'd imagine you also want to include brackets as tokens in your language. Your language is simple enough that you could just write the lexical analyser yourself but tools such as flex or ragel might help you.
Synyactic analysis is where you tease out the syntactic structure of your input and perform whatever actions you need (evaluate the preposition in your case). Syntactic analysis is more complex than lexical analysys. You could write a recursive descent parser for this task, or you could use a parser generator to write the code for you. The traditional tool for this is called bison, but it's a bit clunky. I like another simple tool called the lemon parser generator although it's more C orientated than C++.

Initiating Short circuit rule in Bison for && and || operations

I'm programming a simple calculator in Bison & Flex , using C/C++ (The logic is done in Bison , and the C/C++ part is responsible for the data structures , e.g. STL and more) .
I have the following problem :
In my calculator the dollar sign $ means i++ and ++i (both prefix and postfix) , e.g. :
int y = 3;
-> $y = 4
-> y$ = 4
When the user hits : int_expression1 && int_expression2 , if int_expression1 is evaluated to 0 (i.e. false) , then I don't wan't bison to evaluate int_expression2 !
For example :
int a = 0 ;
int x = 2 ;
and the user hits : int z = a&&x$ ...
So , the variable a is evaluated to 0 , hence , I don't want to evaluate x , however it still grows by 1 ... here is the code of the bison/c++ :
%union
{
int int_value;
double double_value;
char* string_value;
}
%type <int_value> int_expr
%type <double_value> double_expr
%type <double_value> cmp_expr
int_expr:
| int_expr '&&' int_expr { /* And operation between two integers */
if ($1 == 0)
$$ = 0;
else // calc
$$ = $1 && $3;
}
How can I tell bison to not evaluate the second expression , if the first one was already evaluated to false (i.e. 0) ?
Converting extensive commentary into an answer:
How can I tell Bison to not evaluate the second expression if the first one was already evaluated to false?
It's your code that does the evaluation, not Bison; put the 'blame' where it belongs.
You need to detect that you're dealing with an && rule before the RHS is evaluated. The chances are that you need to insert some code after the && and before the second int_expr that suspends evaluation if the first int_expr evaluates to 0. You'll also need to modify all the other evaluation code to check for and obey a 'do not evaluate' flag.
Alternatively, you have the Bison do the parsing and create a program that you execute when the parse is complete, rather than evaluating as you parse. That is a much bigger set of changes.
Are you sure regarding putting some code before the second int_expr ? I can't seem to find a plausible way to do that. It's a nice trick, but I can't find a way to actually tell Bison not to evaluate the second int_expr, without ruining the entire evaluation.
You have to write your code so that it does not evaluate when it is not supposed to evaluate. The Bison syntax is:
| int_expr '&&' {...code 1...} int_expr {...code 2...}
'Code 1' will check on $1 and arrange to stop evaluating (set a global variable or something similar). 'Code 2' will conditionally evaluate $4 (4 because 'code 1' is now $3). All evaluation code must obey the dictates of 'code 1' — it must not evaluate if 'code 1' says 'do not evaluate'. Or you can do what I suggested and aselle suggested; parse and evaluate separately.
I second aselle's suggestion about The UNIX Programming Environment. There's a whole chapter in there about developing a calculator (they call it hoc for higher-order calculator) which is worth reading. Be aware, though, that the book was published in 1984, and pre-dates the C standard by a good margin. There are no prototypes in the C code, and (by modern standards) it takes a few liberties. I do have hoc6 (the last version of hoc they describe; also versions 1-3) in modern C — contact me if you want it (see my profile).
That's the problem: I can't stop evaluating in the middle of the rule, since I cannot use return (I can, but of no use; it causes the program to exit). | intExpr '&&' { if ($1 == 0) {/* turn off a flag */ } } intExpr { /* code */} After I exit $3 the $4 is being evaluated automatically.
You can stop evaluating in the middle of a rule, but you have to code your expression evaluation code block to take the possibility into account. And when I said 'stop evaluating', I meant 'stop doing the calculations', not 'stop the parser in its tracks'. The parsing must continue; your code that calculates values must only evaluate when evaluation is required, not when no evaluation is required. This might be an (ugh!) global flag, or you may have some other mechanism.
It's probably best to convert your parser into a code generator and execute the code after you've parsed it. This sort of complication is why that is a good strategy.
#JonathanLeffler: You're indeed the king ! This should be an answer !!!
Now it is an answer.
You almost assuredly want to generate some other representation before evaluating in your calculator. A parse tree or ast are classic methods, but a simple stack machine is also popular. There are many great examples of how to do this, but my favorite is
http://www.amazon.com/Unix-Programming-Environment-Prentice-Hall-Software/dp/013937681X
That shows how to take a simple direct evaluation tool like you have made in yacc (old bison) and take it all the way to a programming language that is almost as powerful as BASIC. All in very few pages. It's a very old book but well worth the read.
You can also look at SeExpr http://www.disneyanimation.com/technology/seexpr.html
which is a simple expression language calculator for scalars and 3 vectors. If you look at https://github.com/wdas/SeExpr/blob/master/src/SeExpr/SeExprNode.cpp
on line 313 you will see the && implementation of he eval() function:
void
SeExprAndNode::eval(SeVec3d& result) const
{
// operands and result must be scalar
SeVec3d a, b;
child(0)->eval(a);
if (!a[0]) {
result[0] = 0;
} else {
child(1)->eval(b);
result[0] = (b[0] != 0.0);
}
}
That file contains all objects that represent operations in the parse tree. These objects are generated as the code is parsed (these are the actions in yacc). Hope this helps.

Using first-set in recursive descent parser

I'm writing a recursive descent parser for C in C++. I don't know how to choose the right production in the following case:
statement: labeled-statement | compound-statement | expression-statement | selection-statement | iteration-statement | jump-statement
I read about the "first"-set which compares the lookahead-token/char with possible terminals which comes first in the productions. Currently I'm stuck at using a first-set in a recursive descent parser, because I only have a function and nothing else, no object for each rule or anything else with which I can identify a rule/production.
Your grammar is invalid for recursive descent parsers because it's ambiguous on the left side:
labeled-statement starts with an identifier
compound-statement starts with a { (this is fine)
expression-statement starts with an identifier or a number (or ()
Can stop here, you have a clash between labeled statement and expression statement. You need to transform your grammer to get rid of left-side ambiguities (through temporary grammar nodes to contain the common parts so when you branch you can determine which branch to go to using only the look-ahead).

Trouble with printf in Bison Rule

I have tried something like this in my Bison file...
ReturnS: RETURN expression {printf(";")}
...but the semicolon gets printed AFTER the next token, past this rule, instead of right after the expression. This rule was made as we're required to convert the input file to a c-like form and the original language doesn't require a semicolon after the expression in the return statement, but C does, so I thought I'd add it manually to the output with printf. That doesn't seem to work, as the semicolon gets added but for some reason, it gets added after the next token is parsed (outside the ReturnS rule) instead of right when the expression rule returns to ReturnS.
This rule also causes the same result:
loop_for: FOR var_name COLONEQUALS expression TO {printf("%s<=", $<chartype>2);} expression STEP {printf("%s+=", $<chartype>2);} expression {printf(")\n");} Code ENDFOR
Besides the first two printf's not working right (I'll post another question regarding that), the last printf is actually called AFTER the first token/literal of the "Code" rule has been parsed, resulting in something like this:
for (i=0; i<=5; i+=1
a)
=a+1;
instead of
for (i=0; i<=5; i+=1)
a=a+1;
Any ideas what I'm doing wrong?
Probably because the grammar has to look-ahead one token to decide to reduce by the rule you show.
The action is executed when the rule is reduced, and it is very typical that the grammar has to read one more token before it knows that it can/should reduce the previous rule.
For example, if an expression can consist of an indefinite sequence of added terms, it has to read beyond the last term to know there isn't another '+' to continue the expression.
After seeing the Yacc/Bison grammar and Lex/Flex analyzer, some of the problems became obvious, and others took a little more sorting out.
Having the lexical analyzer do much of the printing meant that the grammar was not properly in control of what appeared when. The analyzer was doing too much.
The analyzer was also not doing enough work - making the grammar process strings and numbers one character at a time is possible, but unnecessarily hard work.
Handling comments is tricky if they need to be preserved. In a regular C compiler, the lexical analyzer throws the comments away; in this case, the comments had to be preserved. The rule handling this was moved from the grammar (where it was causing shift/reduce and reduce/reduce conflicts because of empty strings matching comments) to the lexical analyzer. This may not always be optimal, but it seemed to work OK in this context.
The lexical analyzer needed to ensure that it returned a suitable value for yylval when a value was needed.
The grammar needed to propagate suitable values in $$ to ensure that rules had the necessary information. Keywords for the most part did not need a value; things like variable names and numbers do.
The grammar had to do the printing in the appropriate places.
The prototype solution returned had a major memory leak because it used strdup() liberally and didn't use free() at all. Making sure that the leaks are fixed - possibly by using a char array rather than a char pointer for YYSTYPE - is left to the OP.
Comments aren't a good place to provide code samples, so I'm going to provide an example of code that works, after Jonathan (replied above) did some work on my code. All due credit goes to him, this isn't mine.
Instead of having FLEX print any recognized parts and letting BISON do the formatting afterwards, Jonathan suggested that FLEX prints nothing and only returns to BISON, which should then handle all printing it self.
So, instead of something like this...
FLEX
"FOR" {printf("for ("); return FOR;}
"TO" {printf("; "); return TO;}
"STEP" {printf("; "); return STEP;}
"ENDFOR" {printf("\n"); printf("}\n"); return ENDFOR;}
[a-zA-Z]+ {printf("%s",yytext); yylval.strV = yytext; return CHARACTERS;}
":=" {printf("="); lisnew=0; return COLONEQUALS;}
BISON
loop_for: FOR var_name {strcpy(myvar, $<strV>2);} COLONEQUALS expression TO {printf("%s<=", myvar);} expression STEP {printf("%s+=", myvar);} expression {printf(")\n");} Code ENDFOR
...he suggested this:
FLEX
[a-zA-Z][a-zA-Z0-9]* { yylval = strdup(yytext); return VARNAME;}
[1-9][0-9]*|0 { yylval = strdup(yytext); return NUMBER; }
BISON
loop_for: FOR var_name COLONEQUALS NUMBER TO NUMBER STEP NUMBER
{ printf("for (%s = %s; %s <= %s; %s += %s)\n", $2, $4, $2, $6, $2, $8); }
var_name: VARNAME