Implementing short-circuit evaluation in Bison for && and || operations - C++

I'm programming a simple calculator in Bison & Flex, using C/C++ (the logic is done in Bison, and the C/C++ part is responsible for the data structures, e.g. STL containers and so on).
I have the following problem:
In my calculator the dollar sign $ means increment, like i++ and ++i (it works both prefix and postfix), e.g.:
int y = 3;
-> $y = 4
-> y$ = 4
When the user enters int_expression1 && int_expression2 and int_expression1 evaluates to 0 (i.e. false), I don't want Bison to evaluate int_expression2!
For example:
int a = 0;
int x = 2;
and the user enters: int z = a&&x$ ...
The variable a evaluates to 0, so I don't want x to be evaluated; however, it still grows by 1... Here is the Bison/C++ code:
%union
{
    int int_value;
    double double_value;
    char* string_value;
}

%type <int_value> int_expr
%type <double_value> double_expr
%type <double_value> cmp_expr

int_expr:
    | int_expr '&&' int_expr { /* And operation between two integers */
            if ($1 == 0)
                $$ = 0;
            else // calc
                $$ = $1 && $3;
        }
How can I tell Bison not to evaluate the second expression if the first one has already evaluated to false (i.e. 0)?

Converting extensive commentary into an answer:
How can I tell Bison to not evaluate the second expression if the first one was already evaluated to false?
It's your code that does the evaluation, not Bison; put the 'blame' where it belongs.
You need to detect that you're dealing with an && rule before the RHS is evaluated. The chances are that you need to insert some code after the && and before the second int_expr that suspends evaluation if the first int_expr evaluates to 0. You'll also need to modify all the other evaluation code to check for and obey a 'do not evaluate' flag.
Alternatively, you have Bison do the parsing and create a program that you execute when the parse is complete, rather than evaluating as you parse. That is a much bigger set of changes.
Are you sure about putting some code before the second int_expr? I can't seem to find a plausible way to do that. It's a nice trick, but I can't find a way to actually tell Bison not to evaluate the second int_expr without ruining the entire evaluation.
You have to write your code so that it does not evaluate when it is not supposed to evaluate. The Bison syntax is:
| int_expr '&&' {...code 1...} int_expr {...code 2...}
'Code 1' will check on $1 and arrange to stop evaluating (set a global variable or something similar). 'Code 2' will conditionally evaluate $4 (4 because 'code 1' is now $3). All evaluation code must obey the dictates of 'code 1' — it must not evaluate if 'code 1' says 'do not evaluate'. Or you can do what I suggested and aselle suggested; parse and evaluate separately.
I second aselle's suggestion about The UNIX Programming Environment. There's a whole chapter in there about developing a calculator (they call it hoc for higher-order calculator) which is worth reading. Be aware, though, that the book was published in 1984, and pre-dates the C standard by a good margin. There are no prototypes in the C code, and (by modern standards) it takes a few liberties. I do have hoc6 (the last version of hoc they describe; also versions 1-3) in modern C — contact me if you want it (see my profile).
That's the problem: I can't stop evaluating in the middle of the rule, since I cannot use return (I can, but it's of no use; it causes the program to exit). | intExpr '&&' { if ($1 == 0) { /* turn off a flag */ } } intExpr { /* code */ } After I exit $3, $4 is evaluated automatically.
You can stop evaluating in the middle of a rule, but you have to code your expression evaluation code block to take the possibility into account. And when I said 'stop evaluating', I meant 'stop doing the calculations', not 'stop the parser in its tracks'. The parsing must continue; your code that calculates values must only evaluate when evaluation is required, not when no evaluation is required. This might be an (ugh!) global flag, or you may have some other mechanism.
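For concreteness, here is a minimal sketch of that flag mechanism, assuming a token AND for '&&'; the names suppress_eval, IDENT, lookup and store are illustrative stand-ins for whatever the calculator already uses, not code from the question:

%{
/* Sketch only: a counter > 0 means "keep parsing, but skip the calculations". */
static int suppress_eval = 0;
%}

%%

int_expr
  : /* ... the existing alternatives ... */
  | int_expr AND                          /* AND is the token for '&&' */
      { if ($1 == 0) suppress_eval++; }   /* mid-rule action: this is $3 */
    int_expr                              /* the right operand is now $4 */
      { if ($1 == 0) { suppress_eval--; $$ = 0; }   /* short circuit */
        else         { $$ = ($4 != 0); } }
  | IDENT '$'                             /* postfix increment */
      { $$ = lookup($1);                  /* hypothetical symbol-table call */
        if (!suppress_eval)               /* side effect only when evaluation is on */
            store($1, $$ + 1); }
  ;

An || rule is the mirror image: suppress the right operand when $1 is non-zero. The key point is the last alternative above: every action that computes or has a side effect must consult the flag.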
It's probably best to convert your parser into a code generator and execute the code after you've parsed it. This sort of complication is why that is a good strategy.
@JonathanLeffler: You're indeed the king! This should be an answer!
Now it is an answer.

You almost assuredly want to generate some other representation before evaluating in your calculator. A parse tree or an AST is the classic method, but a simple stack machine is also popular. There are many great examples of how to do this, but my favorite is
http://www.amazon.com/Unix-Programming-Environment-Prentice-Hall-Software/dp/013937681X
That book shows how to take a simple direct-evaluation tool like the one you have made in yacc (Bison's predecessor) and grow it all the way into a programming language almost as powerful as BASIC, all in very few pages. It's a very old book but well worth the read.
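To illustrate the stack-machine route (a sketch of the idea only, not code from the book): '&&' is compiled to a conditional jump, so the code for the right operand, side effects and all, is skipped whenever the left operand is already false.

// Minimal sketch of short-circuit '&&' on a tiny stack machine.
// The instruction set and names here are invented for illustration.
#include <cstddef>
#include <cstdio>
#include <vector>

enum Op { PUSH, AND_SKIP, PRINT, HALT };
struct Ins { Op op; int arg; };   // arg: value for PUSH, jump target for AND_SKIP

static void run(const std::vector<Ins>& code) {
    std::vector<int> stack;
    for (std::size_t pc = 0; code[pc].op != HALT; ) {
        const Ins& i = code[pc];
        switch (i.op) {
        case PUSH:  stack.push_back(i.arg); ++pc; break;
        case AND_SKIP:                            // left operand is on top
            if (stack.back() == 0) pc = i.arg;    // false: jump over the RHS code
            else { stack.pop_back(); ++pc; }      // true: discard it, run the RHS
            break;
        case PRINT: std::printf("%d\n", stack.back() != 0); ++pc; break;
        case HALT:  break;
        }
    }
}

int main() {
    // a && x with a == 0: the PUSH for x (which could carry a '$' side
    // effect in the real calculator) is never executed.
    std::vector<Ins> code = {
        {PUSH, 0},        // a
        {AND_SKIP, 3},    // if a is false, jump straight to PRINT
        {PUSH, 2},        // x  -- skipped in this run
        {PRINT, 0},
        {HALT, 0},
    };
    run(code);            // prints 0
}

hoc does essentially this with its own machine; the point is that the jump target is fixed at code-generation time, so nothing after it runs unless the left operand allows it.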
You can also look at SeExpr http://www.disneyanimation.com/technology/seexpr.html
which is a simple expression language calculator for scalars and 3 vectors. If you look at https://github.com/wdas/SeExpr/blob/master/src/SeExpr/SeExprNode.cpp
around line 313 you will see the && implementation in the eval() function:
void
SeExprAndNode::eval(SeVec3d& result) const
{
    // operands and result must be scalar
    SeVec3d a, b;
    child(0)->eval(a);
    if (!a[0]) {
        result[0] = 0;
    } else {
        child(1)->eval(b);
        result[0] = (b[0] != 0.0);
    }
}
That file contains all objects that represent operations in the parse tree. These objects are generated as the code is parsed (these are the actions in yacc). Hope this helps.

Related

C++ parsing expressions, breaking down the order of evaluation

I'm trying to write an expression parser. One part I'm stuck on is breaking an expression down into blocks according to the appropriate order of precedence.
I found the order of precedence for C++ operators here. But where exactly do I split the expression based on this?
I have to assume the worst of the user. Here's a really messy over-exaggerated test example:
if (test(s[4]) < 4 && b + 3 < r && a!=b && ((c | e) == (g | e)) ||
r % 7 < 4 * givemeanobj(a & c & e, b, hello(c)).method())
Perhaps it doesn't even evaluate, and if it doesn't I still need to break it down to determine that.
It should break down into blocks of singles and pairs connected by operators. Essentially it breaks down into a tree-structure where the branches are the groupings, and each node has two branches.
Following the order of precedence, the first thing to do would be to evaluate givemeanobj(); however, that's an easy one to see. The next would be the multiplication sign. Does that split everything before the * into a separate block, or just the 4? 4 * givemeanobj comes before the <, right? So that's the first grouping?
Is there a straightforward rule to follow for this?
Is there a straightforward rule to follow for this?
Yes, use a parser generator such as ANTLR. You write your language specification formally, and it will generate code which parses all valid expressions (and no invalid ones). ANTLR is nice in that it can give you an abstract syntax tree which you can easily traverse and evaluate.
Or, if the language you are parsing is actually C++, use Clang, which is a proper compiler and happens to be usable as a library as well.

Why are there both if expressions and if statements in Ada, and likewise for case?

Taken from Introduction to Ada—If expressions:
Ada's if expressions are similar to if statements. However, there are a few differences that stem from the fact that it is an expression:
All branches' expressions must be of the same type
It must be surrounded by parentheses if the surrounding expression does not already contain them
An else branch is mandatory unless the expression following then has a Boolean value. In that case an else branch is optional and, if not present, defaults to else True.
I do not understand the need to have two different ways of constructing code with the if keyword. What is the reasoning behind this?
Also, there are case expressions and case statements. Why is this?
I think this is best answered by quoting the Ada 2012 Rationale Chapter 3.1:
One of the key areas identified by the WG9 guidance document [1] as needing attention was improving the ability to write and enforce contracts. These were discussed in detail in the previous chapter. When defining the new aspects for preconditions, postconditions, type invariants and subtype predicates it became clear that without more flexible forms of expressions, many functions would need to be introduced because in all cases the aspect was given by an expression. However, declaring a function and thus giving the detail of the condition, invariant or predicate in the function body makes the detail of the contract rather remote for the human reader. Information hiding is usually a good thing but in this case, it just introduces obscurity. Four forms are introduced, namely, if expressions, case expressions, quantified expressions and expression functions. Together they give Ada some of the flexible feel of a functional language.
In addition, if statements and case statements often just assign different values to the same variable in all branches, and do nothing else:
if Foo > 10 then
   Bar := 1;
else
   Bar := 2;
end if;
In this case, an if expression may increase readability and more clearly state in the code what's going on:
Bar := (if Foo > 10 then 1 else 2);
We can now see that there's no longer a need for the maintainer of the code to read a whole if statement in order to see that only a single variable is updated.
Same goes for case expressions, which can also reduce the need for nesting if expressions.
Also, I can throw the question back at you: why do C-based languages have the ternary operator ?: in addition to if statements?
Egilhh already covered the main reason, but there are sometimes other useful reasons to implement expressions. Sometimes you make packages where only one or two methods are needed and they are the only reason to make a package body. You can use expressions to make expression functions which allow you to define the operations in the spec file.
Additionally, if you ever end up with some complex variant record combinations, sometimes expressions can be used to set up default values for them in instances where you normally would not be able to as cleanly. Consider the following example:
with Ada.Text_IO; use Ada.Text_IO;

procedure Hello is

   type Binary_Type is (On, Off);

   type Inner (Binary : Binary_Type := Off) is record
      case Binary is
         when On =>
            Value : Integer := 0;
         when Off =>
            null;
      end case;
   end record;

   type Outer (Some_Flag : Boolean) is record
      Other : Integer := 32;
      Thing : Inner := (if Some_Flag then
                           (Binary => Off)
                        else
                           (Binary => On, Value => 23));
   end record;

begin
   Put_Line ("Hello, world!");
end Hello;
I had something come up with a more complex setup that was meant to map to a complex messaging interface at the hardware level. It's nice to have defaults whenever possible. Now I could have used a case inside Outer, but then I would have had to come up with two separately named versions of the message field for each case, which really isn't optimal when you want your code to map to an ICD. Again, I could have used a function to initialize it as well, but as noted in the other poster's answer, that isn't always a good way to go.
Another place that outlines the motivation for adding conditional expressions to Ada can be found in the ARG document, AI05-0147-1, which explains the motivation and gives some examples of use.
An example of a place where I find them quite useful is in processing command-line parameters, for the case when a default value is used if the parameter is not specified on the command line. Generally, you'd want to declare such values as constants in your program. Conditional expressions make it easier to do that.
with Ada.Command_Line; use Ada;

procedure Main
is
   N : constant Positive :=
     (if Command_Line.Argument_Count = 0 then 2_000_000
      else Positive'Value (Command_Line.Argument (1)));
   ...
Otherwise, without conditional expressions, in order to achieve the same effect you'd need to declare a function, which I find more difficult to read:
with Ada.Command_Line; use Ada;

procedure Main
is
   function Get_N return Positive is
   begin
      if Command_Line.Argument_Count = 0 then
         return 2_000_000;
      else
         return Positive'Value (Command_Line.Argument (1));
      end if;
   end Get_N;

   N : constant Positive := Get_N;
   ...
The if expression in Ada feels and works a lot like a statement using the ternary operator in the C-based languages. I took the liberty of copying some code from learn.adacore.com that introduces the if expression:
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Check_Positive is
   N : Integer;
begin
   Put ("Enter an integer value: ");
   Get (N);
   Put (N, 0);

   declare
      S : constant String :=
        (if N > 0 then " is a positive number"
         else " is not a positive number");
   begin
      Put_Line (S);
   end;
end Check_Positive;
And I translated it to a C-based language - in this case, Java. I believe the main point to notice is that both languages, although syntactically different, are effectively doing the same thing: testing a condition and assigning one of two values to a variable, all within one statement. I realize this is an oversimplification for most here on Stack Overflow; my goal is to help beginners understand the basic concept with introductory examples. Cheers.
import java.util.Scanner;

public class IfExpression {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter an integer value: ");
        var N = in.nextInt();
        System.out.print(N);
        var S = N > 0 ? " is a positive number" : " is not a positive number";
        System.out.println(S);
        in.close();
    }
}

C++ refactoring: conditional expansion and block elimination

I'm in the process of refactoring a very large amount of code, mostly C++, to remove a number of temporary configuration checks which have become permanently set to given values. So, for example, I would have the following code:
#include <value1.h>
#include <value2.h>
#include <value3.h>
...
if ( value1() )
{
    // do something
}

bool b = value2();
if ( b && anotherCondition )
{
    // do more stuff
}

if ( value3() < 10 )
{
    // more stuff again
}
where the valueX() calls return either a bool or an int. Since I know the values that these calls always return, I've done some regex substitution to expand the calls to their usual values:
// where:
//   value1() == true
//   value2() == false
//   value3() == 4

// TODO: Remove expanded config (value1)
if ( true )
{
    // do something
}

// TODO: Remove expanded config (value2)
bool b = false;
if ( b && anotherCondition )
{
    // do more stuff
}

// TODO: Remove expanded config (value3)
if ( 4 < 10 )
{
    // more stuff again
}
Note that although the values are fixed, they are not set at compile time but are read from shared memory so the compiler is not currently optimising anything away behind the scenes.
Although the resultant code looks a bit goofy, this regex approach achieves a lot of what I want: it's simple to apply and removes the dependence on the calls without changing the behaviour of the code, and the compiler is then likely to optimise a lot of it away, knowing that a block can never be reached or that a check will always return true. It also makes it reasonably easy (especially when diffing against version control) to see what has changed and to take the final step of cleaning it up, so that the code above eventually looks as follows:
// do something
// DONT do more stuff (b being false always prevented this)
// more stuff again
The trouble is that I have hundreds (possibly thousands) of changes to make to get from the second stage (correct but goofy) to the final cleaned-up code.
I wondered if anyone knew of a refactoring tool which might handle this or of any techniques I could apply. The main problem is that the C++ syntax makes full expansion or elimination quite difficult to achieve and there are many permutations to the code above. I feel I almost need a compiler to deal with the variation of syntax that I would need to cover.
I know there have been similar questions but I can't find any requirement quite like this and also wondered if any tools or procedures had emerged since they were asked?
It sounds like you have what I call "zombie code"... dead in practice, but still live as far as the compiler is concerned. This is a pretty common issue with most systems of organized runtime configuration variables: eventually some configuration variables arrive at a permanent fixed state, yet are reevaluated at runtime repeatedly.
The cure isn't regex, as you have noted, because regex doesn't parse C++ code reliably.
What you need is a program transformation system. This is a tool that really parses source code, and can apply a set of code-to-code rewriting rules to the parse tree, and can regenerate source text from the changed tree.
I understand that Clang has some capability here; it can parse C++ and build a tree, but it does not have source-to-source transformation capability. You can simulate that capability by writing AST-to-AST transformations but that's a lot more inconvenient IMHO. I believe it can regenerate C++ code but I don't know if it will preserve comments or preprocessor directives.
Our DMS Software Reengineering Toolkit with its C++(11) front end can (and has been used to) carry out massive transformations on C++ source code, and has source-to-source transformations. AFAIK, it is the only production tool that can do this. What you need is a set of transformations that represent your knowledge of the final state of the configuration variables of interest, and some straightforward code simplification rules. The following DMS rules are close to what you likely want:
rule fix_value1():expression->expression
"value1()" -> "true";
rule fix_value2():expression->expression
"value2()" -> "false";
rule fix_value3():expression->expression
"value3()" -> "4";
rule simplify_boolean_and_true(r:relation):condition->condition
"r && true" -> "r".
rule simplify_boolean_or_true(r:relation):condition->condition
"r || true" -> "true".
rule simplify_boolean_and_false(r:relation):condition->condition
"r && false" -> "false".
...
rule simplify_boolean_not_true(r:relation):condition->condition
"!true" -> "false".
...
rule simplify_if_then_false(s:statement): statement->statement
" if (false) \s" -> ";";
rule simplify_if_then_true(s:statement): statement->statement
" if (true) \s" -> "\s";
rule simplify_if_then_else_false(s1:statement, s2:statement): statement->statement
" if (false) \s1 else \s2" -> "\s2";
rule simplify_if_then_else_true(s1:statement, s2: statement): statement->statement
" if (true) \s1 else \s2" -> "\s2";
You also need rules to simplify ("fold") constant expressions involving arithmetic, and rules to handle switch on expressions that are now constant. To see what DMS rules look like for integer constant folding see Algebra as a DMS domain.
Unlike regexes, DMS rewrite rules cannot "mismatch" code; they represent the corresponding ASTs, and it is those ASTs that are matched. Because it is AST matching, they have no problems with whitespace, line breaks or comments. You might think they could have trouble with the order of operands ('what if "false && x" is encountered?'); they do not, as the grammar rules for && and || are marked in the DMS C++ parser as associative and commutative, and the matching process automatically takes that into account.
What these rules cannot do by themselves is value (in your case, constant) propagation across assignments. For this you need flow analysis so that you can trace such assignments ("reaching definitions"). Obviously, if you don't have such assignments, or very few, you can hand-patch those. If you do, you'll need the flow analysis; alas, DMS's C++ front end isn't quite there yet, but we are working on it; we have control-flow analysis in place. (DMS's C front end has full flow analysis.)
(EDIT February 2015: Now does full C++14; flow analysis within functions/methods).
We actually applied this technique to a 1.5M SLOC application of mixed C and C++ code from IBM Tivoli almost a decade ago with excellent success; we didn't need the flow analysis :-}
You say:
Note that although the values are reasonably fixed, they are not set at compile time but are read from shared memory so the compiler is not currently optimising anything away behind the scenes.
Constant-folding the values by hand doesn't make a lot of sense unless they are completely fixed. If your compiler provides constexpr you could use that, or you could substitute in preprocessor macros like this:
#define value1() true
#define value2() false
#define value3() 4
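If constexpr is available, a rough equivalent of the macro approach could look like the sketch below; these definitions would replace the real shared-memory-backed functions, so treat them as an assumption about the surrounding code rather than a drop-in patch:

// Sketch of the constexpr route mentioned above (C++11 or later).
constexpr bool value1() { return true; }
constexpr bool value2() { return false; }
constexpr int  value3() { return 4; }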
Either way, the optimizer will take care of you from there. Without seeing exactly what's in your <valueX.h> headers or knowing how your process of reading these values from shared memory works, I'll just throw out that it could be useful to rename the existing valueX() functions and add a runtime check in case they change again in the future:
// call this at startup to make sure our agreed on values haven't changed
void check_values() {
    assert(value1() == get_value1_from_shared_memory());
    assert(value2() == get_value2_from_shared_memory());
    assert(value3() == get_value3_from_shared_memory());
}

How to build a parse tree?

I have found a C++ BNF grammar, and it contains these lines:
selection-statement:
    if ( condition ) statement
    if ( condition ) statement else statement
Now I am trying to write a parser and need to build a parse tree. As input I have the BNF and a source file. But I'm stuck on how to tell my parser that if the condition evaluates to true, it needs to execute the first statement, and otherwise the else block. Thanks.
Conditional statements have a simple recursive structure. The corresponding recursive descent parser has a similarly simple recursive structure. Abstractly, the interior conditionals are parsed as follows:
<cond> -> if <expression> then <statement> [else <statement>]

cond:
    a = parse expression
    b = parse statement
    if is_else(token)
    then c = parse statement
         return conditional(a, b, c)
    else return conditional(a, b)
In your example, conditional statements contain blocks of conditionals the last of which contains an else clause. Assuming that the tokenized input sequence has this form and syntactic errors were detected during lexical analysis, the outer conditional is parsed as follows:
<conditional> -> selection_statement: {<cond>} <cond>

conditional:
    b = new block
    while (iscond(next))
        s = parse cond
        b = insert(s, b)
    return b
Of course, the actual implementation will be significantly more detailed and tedious. However, the preceding describes in outline the construction of a parse tree of a conditional statement having the required form from a tokenized input sequence.
I just realized you were talking about evaluating the abstract syntax tree. The structure of the function that evaluates a conditional statement is similar to that of the function that parses one. Abstractly,
cond(input):
    a = evaluate(if_part(input))
    if is_true(a)
    then evaluate(then_part(input))
    else if (is_else(input))
         then evaluate(else_part(input))
         else return
In order to determine which portion of the conditional to evaluate, you must first evaluate the "if" part of the conditional to a Boolean value. If the Boolean value is "true," the "then" part of the conditional is evaluated. If the Boolean value is "false," then the "else" part of the conditional is evaluated. If there is no "else" part, there is nothing to evaluate. Of course, the implementation will be more detailed than the above.
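To make the shape of this concrete in C++ (the type and member names below are illustrative, not from any particular compiler): the parser builds a node that stores all three parts, and only evaluate() decides at run time which branch to run.

// Minimal sketch of a parse-tree node for a conditional statement.
#include <memory>

struct Node {                             // generic parse-tree node
    virtual ~Node() = default;
    virtual int evaluate() const = 0;     // statements return a dummy value here
};

struct IfNode : Node {
    std::unique_ptr<Node> cond, then_part, else_part;   // else_part may be null
    int evaluate() const override {
        if (cond->evaluate() != 0)        // evaluate the condition first
            return then_part->evaluate(); // then-branch
        if (else_part)
            return else_part->evaluate(); // optional else-branch
        return 0;                         // no else: nothing to do
    }
};

The parser's job is only to fill in cond, then_part and else_part; choosing between them happens when the tree is evaluated (or when code is generated from it).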
First of all, you need to distinguish between the usual passes of a compiler:
The lexing, that is recognizing words and removing comments,
the parsing, that is structuring of the linear input stream into an abstract syntax tree, and
the evaluation or code generation.
Do the first things first, then you'll understand the rest. Look at boost::spirit for the first two steps.
There are a variety of programs that take a BNF grammar and output a proper parser: http://en.wikipedia.org/wiki/Backus-Naur_form#Software_using_BNF
If you are writing your own parser, there is an excellent overview online here.

Processing conditional statements

I'm not looking for an implementation, just pseudo-code, or at least an algorithm to handle this effectively. I need to process statements like these:
(a) # if(a)
(a,b) # if(a || b)
(a+b) # if(a && b)
(a+b,c) # same as ((a+b),c) or if((a&&b) || c)
(a,b+c) # same as (a,(b+c)) or if(a || (b&&c))
So the + operator takes precedence over the , operator. (so my + is like mathematical multiplication with , being mathematical addition, but that is just confusing).
I think a recursive function would be best, so I can handle nested parentheses nice and easy by a recursive call. I'll also take care of error handling once the function returns, so no worries there. The problems I'm having:
I just don't know how to tackle the precedence thing. I could return true as soon as I see a , when the previous value was true. Otherwise, I'll rerun the same routine. A plus would effectively be a boolean multiplication (i.e. true*true=true, true*false=false, etc.).
Error detection: I've thought up several schemes to handle the input, but there are a lot of ugly, bad inputs I want to detect so I can print an error to the user. None of the schemes I thought of handle errors in a unified (read: centralized) place in the code, which would be nice for maintainability and readability:
()
(,...
(+...
(a,,...
(a,+...
(a+,...
(a++...
Detecting these in my "routine" above should take care of bad input. Of course I'll check end-of-input each time I read a token.
Of course I'll have the problem of maybe having to read the full text file if there are unmatched parentheses, but hey, people should avoid such tension.
EDIT: Ah, yes, I forgot the ! which should also be usable like the classic not operator:
(!a+b,c,!d)
Tiny update for those interested: I had an uninformed, wild go at this and wrote my own implementation from scratch. It may not be pretty enough for the die-hards, hence this question on Code Review.
The shunting-yard algorithm is easily implementable in a relatively short amount of code. It can be used to convert an infix expression like those in your examples into postfix expressions, and evaluation of a postfix expression is Easy-with-a-capital-E (you don't strictly need to complete the infix-to-postfix conversion; you can evaluate the postfix output of the shunting yard directly and just accumulate the result as you go along).
It handles operator precedence, parentheses, and both unary and binary operators (and with a little effort can be modified to handle infix ternary operators, like the conditional operator in many languages).
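As a rough illustration of that approach for the question's operators (',' as OR, '+' as AND, '!' as NOT, plus parentheses), the sketch below evaluates the shunting-yard output on the fly instead of emitting postfix. It handles single-letter names only, and truth() is a made-up stand-in for however the real program decides whether a name is true:

// Shunting-yard sketch: ',' = OR (lowest), '+' = AND, '!' = NOT (unary, highest).
#include <cctype>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

static int prec(char op) {          // higher number binds tighter
    switch (op) {
    case ',': return 1;             // OR
    case '+': return 2;             // AND
    case '!': return 3;             // NOT
    default:  return 0;
    }
}

static bool truth(char name) {      // hypothetical variable lookup
    return name == 'a' || name == 'c';
}

static bool evaluate(const std::string& expr) {
    std::vector<char> ops;          // operator stack
    std::vector<bool> vals;         // value stack: we fold the postfix
                                    // output into a value as we go
    auto apply = [&](char op) {
        if (op == '!') {
            if (vals.empty()) throw std::runtime_error("missing operand");
            vals.back() = !vals.back();
        } else {
            if (vals.size() < 2) throw std::runtime_error("missing operand");
            bool b = vals.back(); vals.pop_back();
            bool a = vals.back(); vals.pop_back();
            vals.push_back(op == '+' ? (a && b) : (a || b));
        }
    };

    for (char c : expr) {
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        if (std::isalpha(static_cast<unsigned char>(c))) {
            vals.push_back(truth(c));          // operand
        } else if (c == '(' || c == '!') {
            ops.push_back(c);                  // unary '!' waits for its operand
        } else if (c == ',' || c == '+') {
            while (!ops.empty() && ops.back() != '(' &&
                   prec(ops.back()) >= prec(c)) {
                apply(ops.back()); ops.pop_back();
            }
            ops.push_back(c);
        } else if (c == ')') {
            while (!ops.empty() && ops.back() != '(') {
                apply(ops.back()); ops.pop_back();
            }
            if (ops.empty()) throw std::runtime_error("unmatched ')'");
            ops.pop_back();                    // discard '('
        } else {
            throw std::runtime_error(std::string("bad character: ") + c);
        }
    }
    while (!ops.empty()) {                     // flush remaining operators
        if (ops.back() == '(') throw std::runtime_error("unmatched '('");
        apply(ops.back()); ops.pop_back();
    }
    if (vals.size() != 1) throw std::runtime_error("malformed expression");
    return vals.back();
}

int main() {
    std::cout << evaluate("(!a+b,c,!d)") << '\n';   // the example from the question
}

With the made-up truth values above (a and c true, everything else false), the example prints 1; the error throws also give a single, centralized place to report bad input, which was one of the question's concerns.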
Write it in yacc (Bison) and it becomes trivial.
/* Yacc code */
%token IDENTIFIER
%token LITERAL

%%

Expression: OrExpression
          ;

OrExpression: AndExpression
            | OrExpression ',' AndExpression
            ;

AndExpression: NotExpression
             | AndExpression '+' NotExpression
             ;

NotExpression: PrimaryExpression
             | '!' NotExpression
             ;

PrimaryExpression: Identifier
                 | Literal
                 | '(' Expression ')'
                 ;

Literal: LITERAL
       ;

Identifier: IDENTIFIER
          ;

%%
There's probably a better (there's definitely a more concise) description of this, but I learned how to do this from this tutorial many years ago:
http://compilers.iecc.com/crenshaw/
It's a very easy read for non-programmers too (like me). You'll need only the first few chapters.