Consider the following grammar rule:
forstmt: for openparentheses nexpr semicolon expr semicolon nexpr closeparentheses stmt {}
nexpr: expr { }
| %empty { }
expr: .
.
.
// something huge
.
.
.
It is a parser rule for a for loop like the one below (a usual C-like for loop):
for (i = 0; i < 10; i++) {
    Print("hello world");
}
I need to generate IR for this C-like for loop (forstmt).
The IR generation for expr is already written.
The point is that the last nexpr's IR should go after the stmt's IR.
I know about mid-rule actions, and I thought I could somehow solve this with a stack, but my thoughts didn't lead to any conclusion.
To be precise: is there a way to stop Bison from emitting the IR for the last nexpr immediately, and make it emit after the stmt?
In other words, how can I make all the actions of the last nexpr run after the stmt?
Has anyone had a problem like this?
Generally you generate IR (intermediate representation) in memory, so you can manipulate it after you've parsed your program; that lets you analyse the whole program and reorder things as you see fit. So the order in which you generate the IR is irrelevant.
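For instance, here is a minimal sketch of the in-memory approach (the node layout and the emit/label helpers are my own illustration, not from any particular codebase):

#include <string>
#include <vector>

struct IRNode { std::string text; };                   // stand-in for a real IR instruction

struct ForNode {                                        // built by the forstmt action
    std::vector<IRNode> init, cond, step, body;         // the three header exprs and the stmt
};

static int labels = 0;
static std::vector<std::string> out;                    // the final instruction stream

static void emit(const std::vector<IRNode>& seq) { for (const auto& n : seq) out.push_back(n.text); }
static void place(int l)      { out.push_back("L" + std::to_string(l) + ":"); }
static void jump(int l)       { out.push_back("jmp L" + std::to_string(l)); }
static void jump_false(int l) { out.push_back("jz  L" + std::to_string(l)); }

// A separate pass walks the in-memory node and emits in whatever order it likes:
static void emit_for(const ForNode& f) {
    emit(f.init);
    int cond = labels++, exit_ = labels++;
    place(cond);
    emit(f.cond);
    jump_false(exit_);        // an empty cond simply always falls through
    emit(f.body);
    emit(f.step);             // the last nexpr's IR, now after the stmt
    jump(cond);
    place(exit_);
}

The parser actions then only build ForNode objects; emit_for runs later and is free to place the step IR after the body.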
If you are instead trying to generate code directly in your actions as they are parsed, you need to set things up so that it works, generally by branching around. So you might do something like:
forexpr:
  FOR '(' expr ';' {
      $$.l1 = create_label();          // creates a unique label
      output_label($$.l1);             // the condition is evaluated from this point
  } expr ';' {
      $$.l1 = create_label();          // start of the loop body
      $$.l2 = create_label();          // loop exit
      $$.l3 = create_label();          // start of the increment expression
      output_jump_if_true($6, $$.l1);  // conditional branch to the body
      output_jump($$.l2);              // otherwise, unconditional branch to the exit
      output_label($$.l3);             // the increment expr is emitted here
  } expr ')' {
      output_jump($5.l1);              // after the increment, re-test the condition
      output_label($8.l1);             // the body (stmt) is emitted here
  } stmt {
      output_jump($8.l3);              // after the body, jump to the increment
      output_label($8.l2);             // exit label
  }
Needless to say, this is quite suboptimal.
It must be done by hand: Bison doesn't have, and arguably shouldn't have, anything for this.
My solution was to set a flag somewhere in the code generator to hold back the generated IR, and then release it after the for loop's stmt.
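A sketch of that buffering idea (the names are hypothetical, not the actual code generator):

#include <string>
#include <vector>

static std::vector<std::string> output;   // the normal IR stream
static std::vector<std::string> held;     // IR held back while the flag is set
static bool hold = false;                 // the "bit": set before the last nexpr

static void gen(const std::string& instr) {
    (hold ? held : output).push_back(instr);   // divert IR while holding
}

static void release() {                   // called after the for loop's stmt
    output.insert(output.end(), held.begin(), held.end());
    held.clear();
    hold = false;
}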
Sorry for the complicated title, but it's a bit hard to explain in just one sentence.
So I'm writing a simple interpreted language to help with some stuff that I often do. I have a lexer set up, feeding into an abstract syntax tree generator.
The abstract syntax tree spits out Expressions (which I'm passing around using unique_ptrs). There are several types of expressions derived from this base class, including:
Numbers
Variables
Function calls / prototypes
Binary operations
etc. Each derived class contains the info it needs for that expression, i.e. variables contain a std::string of their identifier, binary operations contain unique_ptrs to the left and right hand side as well as a char of the operator.
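(For concreteness, a minimal sketch of what such a hierarchy might look like; the names here are illustrative:)

#include <memory>
#include <string>

struct Expr {                        // abstract base class for all expressions
    virtual ~Expr() = default;
};

struct NumberExpr : Expr {
    double value;                    // numbers carry their value
};

struct VariableExpr : Expr {
    std::string name;                // variables carry their identifier
};

struct BinaryExpr : Expr {
    char op;                         // e.g. '+', '*', '^'
    std::unique_ptr<Expr> lhs, rhs;  // left- and right-hand sides
};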
Now this is working perfectly, and expressions are parsed just as they should be.
This is what an AST would look like for 'x = y*6^(z-4)+5':

             Assignment (=)
              /          \
         Var (x)      BinOp (+)
                       /     \
                 Num (5)   BinOp (*)
                            /     \
                     BinOp (^)   Var (y)
                      /     \
                 Num (6)   BinOp (-)
                            /     \
                       Var (z)   Num (4)
The issue arises when trying to decouple the AST from the interpreter. I want to keep it decoupled in case I want to provide support for compilation in the future, or whatever. Plus the AST is already getting decently complex and I don't want to add to it. I only want the AST to have information about how to take tokens and convert them, in the right order, into an expression tree.
Now, the interpreter should be able to traverse this list of top down expressions, and recursively evaluate each subexpression, adding definitions to memory, evaluating constants, assigning definitions to their functions, etc. But, each evaluation must return a value so that I can recursively traverse the expression tree.
For example, a binary operation expression must recursively evaluate the left hand side and the right hand side, and then perform an addition of the two sides and return that.
Now, the issue is, the AST returns pointers to the base class, Expr – not the derived types. Calling getExpression returns the next expression regardless of its derived type, which lets me easily recursively evaluate binary operations and so on. In order for the interpreter to get the information about these expressions (the number value, or the identifier, for example), I would basically have to dynamically cast each expression and check whether it works, repeatedly. Another way would be something like the Visitor pattern – the Expr calls the interpreter and passes this to it, which allows the interpreter to have a definition for each derived type. But again, the interpreter must return a value!
This is why I can't use the visitor pattern – I have to return values, which would completely couple the AST to the interpreter.
I also can't use a strategy pattern because each strategy returns wildly different things. The interpreter strategy would be too different from the LLVM strategy, for example.
I'm at a complete loss as to what to do here. One really clunky solution would be to literally have an enum of each expression type as a member of the Expr base class; the interpreter could check the type and then make the appropriate typecast. But that's ugly. Really ugly.
What are my options here? Thanks!
The usual answer (as done with most parser generators) is to have both a token type value and associated data (called attributes in discussions of such things). The type value is generally a simple integer and says "number", "string", "binary op", etc. When deciding which production to use, you examine only the token types, and when you get a match to a production rule you then know what kind of tokens feed into that rule.
If you want to implement this yourself, look up parsing algorithms (LALR and GLR are a couple of examples); or you could switch to using a parser generator, and then you only have to worry about getting your grammar correct and implementing the productions properly, without having to implement the parsing engine yourself.
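A minimal sketch of the token-type-plus-attribute idea (the names are illustrative):

#include <string>

enum class TokenType { Number, String, Identifier, BinaryOp /* ... */ };

struct Token {
    TokenType type;      // drives the parsing decisions
    std::string text;    // attribute: the raw lexeme
    double number = 0;   // attribute: parsed value, meaningful when type == TokenType::Number
};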
Why can't you use the visitor pattern? Any return results simply become local state:
class EvalVisitor
{
    void visit(X x)
    {
        visit(x.y);          // evaluate the left subexpression
        int res1 = res();    // res() reads the result of the most recent visit
        visit(x.z);          // evaluate the right subexpression
        int res2 = res();
        res(res1 + res2);    // res(value) stores this visitor's result
    }
    ....
};
The above can be abstracted away so that the logic lies in proper eval functions:
class Visitor
{
public:
    virtual void visit(X) = 0;
    virtual void visit(Y) = 0;
    virtual void visit(Z) = 0;
};

class EvalVisitor : public Visitor
{
public:
    int eval(X);
    int eval(Y);
    int eval(Z);
    int result;
    virtual void visit(X x) { result = eval(x); }
    virtual void visit(Y y) { result = eval(y); }
    virtual void visit(Z z) { result = eval(z); }
};

int evalExpr(Expr& x)
{
    EvalVisitor v;
    x.accept(v);
    return v.result;   // note: the result lives on the visitor, not the expression
}
Then you can do:
Expr& expr = ...;
int result = evalExpr(expr);
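For completeness, each node type must implement accept for the double dispatch to work; a typical sketch (assuming the X, Y, Z node types above derive from a common Expr base):

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(Visitor& v) = 0;
};

struct X : public Expr {
    void accept(Visitor& v) override { v.visit(*this); }   // double dispatch
};
// Y and Z implement accept the same way.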
I was told that a while loop is more efficient than a for loop (C/C++).
This seemed reasonable but I wanted to find a way to prove or disprove it.
I have tried three tests using analogous snippets of code, each containing nothing but a for or while loop with the same output:
Compile time - roughly the same
Run time - Same
Compiled to intel assembly code and compared - Same number of lines and virtually the same code
Should I have tried anything else, or can anyone confirm one way or the other?
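For reference, the assembly comparison can be reproduced with two minimal functions like these (the names and the external use() call are my own illustration; the call just keeps the loops from being optimized away); compile with g++ -O2 -S and diff the .s files:

// loops.cpp -- compile with: g++ -O2 -S loops.cpp
int use(int);                        // defined elsewhere

int with_for() {
    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum = use(sum + i);
    return sum;
}

int with_while() {
    int sum = 0;
    int i = 0;
    while (i < 10) {
        sum = use(sum + i);
        i++;
    }
    return sum;
}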
All loops follow the same template:
{
    // Initialize
LOOP:
    if (!( /* Condition */ )) {
        goto END;
    }
    // Loop body
    // Loop increment/decrement
    goto LOOP;
}
END: ;
Therefore the two loops are the same:
// A
for (int i = 0; i < 10; i++) {
    // Do stuff
}

// B
int i = 0;
while (i < 10) {
    // Do stuff
    i++;
}

// Or even
int i = 0;
while (true) {
    if (!(i < 10)) {
        break;
    }
    // Do stuff
    i++;
}
Both are converted to something similar to:
{
    int i = 0;
LOOP:
    if (!(i < 10)) {
        goto END;
    }
    // Do stuff
    i++;
    goto LOOP;
}
END: ;
Unused/unreachable code will be removed from the final executable/library.
Do-while loops skip the first conditional check and are left as an exercise for the reader. :)
Certainly LLVM will convert ALL types of loops to a consistent form (to the extent possible, of course). So as long as you have the same functionality, it doesn't really matter whether you use for, while, do-while or goto to form the loop: if it has the same initialization, exit condition, update statement and body, it will produce the exact same machine code.
This is not terribly hard to do in a compiler if it's done early enough during the optimisation (so the compiler still understands what is actually being written). The purpose of such "make all loops equal" is that you then only need one way to optimise loops, rather than having one for while-loops, one for for-loops, one for do-while loops and one for "any other loops".
It's not guaranteed for ALL compilers, but I know that gcc/g++ will also generate nearly identical code whatever loop construct you use, and from what I've seen Microsoft also does the same.
C and C++ compilers convert high-level C or C++ code to assembly, and in assembly we don't have while or for loops; we can only check a condition and jump to another location.
So the performance of a for or while loop depends heavily on how well the compiler optimizes the code.
This is a good paper on code optimization:
http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
I would like to perform the following:
if(x == true)
{
// do this on behalf of x
// do this on behalf of x
// do this on behalf of x
}
Using a conditional operator, is this correct?
x == true ? { /*do a*/, /*do b*/, /*do c*/ } : y == true ? ... ;
Is this malformed?
I am not nesting more than one level with a conditional operator.
The expressions I intend to use are highly terse and simple making a conditional operator, in my opinion, worth using.
P.S. I am not asking (a) which I should use, (b) which is better, or (c) which is more appropriate.
I am asking how to convert an if-else statement to a ternary conditional operator.
Any advice given on this question regarding coding standards etc. is simply undesired.
Don't compare booleans to true and false. There's no point because they're true or false already! Just write
if (x)
{
// do this on behalf of x
// do this on behalf of x
// do this on behalf of x
}
Your second example doesn't compile because you use { and }. But this might:
x ? ( /*do a*/, /*do b*/, /*do c*/ ) : y ? ... ;
but it does depend on what /*do a*/ etc are.
Using the comma operator to string different expressions together is within the rules of the language, but it makes the code harder to read (because you have to spot the comma, which isn't always easy, especially if the expressions aren't really simple).
The other factor is of course that you can ONLY do this for if (x) ... else if (y) ... type conditional statements.
Sometimes it seems people prefer "short code" over "readable code", which is of course great if you are in a competition of "who can write this in the fewest lines", but for everything else, particularly code that is "on show" or shared with colleagues who also need to understand it, it's a bad idea: once a software project gets sufficiently large, it becomes hard to understand how the code works even WITHOUT obfuscation that makes it harder to read. I don't really see any benefit in using conditional expressions the way your second example describes. It is possible that the example is just a bad one, but generally I'd say "don't do that".
Of course it works (with C++11). I have not tried to compile it, but following Herb Sutter's way you can use either a function call or a lambda which is immediately executed:

cond ?
    [&] {
        int i = some_default_value;
        if (someConditionIsTrue)
        {
            // do some operations and calculate the value of i
            i = some_calculated_value;
        }
        return i;
    } ()
    :
    somefun() ;

Here you have a result which is either computed with a lambda or a normal function.
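A self-contained version that actually compiles (the values and names are made up for illustration):

#include <iostream>

int somefun() { return 42; }

int main() {
    bool cond = true;
    bool someConditionIsTrue = true;

    int result = cond
        ? [&] {
              int i = 0;                 // some default value
              if (someConditionIsTrue)
                  i = 7;                 // some calculated value
              return i;
          }()
        : somefun();

    std::cout << result << '\n';         // prints 7
}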
I just ran into this piece of code that does this :
delete a, a = 0;
It compiles and runs just fine. But isn't this supposed to be:
delete a;
a = 0;
Why is separating statements using , allowed in this case?
Thanks :)
In C and C++, most "statements" are actually expressions. The semicolon added to an expression makes it into a statement. Alternatively, it is allowed (but almost always bad style) to separate side-effectful expressions with the comma operator: the left-hand-side expression is evaluated for its side-effects (and its value is discarded), and the right-hand-side expression is evaluated for its value.
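A minimal illustration:

int f() { return 1; }
int g() { return 2; }

int x = (f(), g());   // f() runs for its side effects only; x == 2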
This is the comma operator. It evaluates both of its arguments and returns the second one.
This is the comma operator. It can be used to separate expressions, but not declarations.
That is the comma operator. The MSDN article is here. And have a look at this question to understand how it works.
While it is possible to write code like that, it may be somewhat weird. A slightly more realistic usecase would be if you have a struct T as follows:
struct T {
bool check() const;
void fix();
};
Now you want to iterate through everything in the struct and run check on it, and then call fix if check returns false. The simple way to do this would be
for (list<T>::iterator it = mylist.begin(); it != mylist.end(); ++it)
    if (!it->check())
        it->fix();
Let's pretend you want to write it in as short a way as possible. fix() returning void means you can't just put it in the condition. However, using the comma operator you can get around this:
for (auto it = mylist.begin(); it != mylist.end() && (it->check() || (it->fix(), true)); ++it);
I wouldn't use it without a particularly good reason, but it does allow you to call any function from a condition, which can be convenient.
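For anyone who wants to try it, here is a self-contained version of the above (the contents of T are made up):

#include <iostream>
#include <list>

struct T {
    int v;
    bool check() const { return v >= 0; }   // "ok" means non-negative here
    void fix() { v = 0; }                    // clamp negative values to zero
};

int main() {
    std::list<T> mylist{{1}, {-2}, {3}};
    // Walk the list; fix() runs on every element whose check() fails.
    for (auto it = mylist.begin(); it != mylist.end() && (it->check() || (it->fix(), true)); ++it);
    for (const T& t : mylist) std::cout << t.v << ' ';   // prints: 1 0 3
}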
I want to have a while loop do something like the following, but is this possible in C++? If so, how does the syntax go?
do {
//some code
while( expression to be evaluated );
// some more code
}
I would want the loop to be exited as soon as the while statement decides the expression is no longer true (i.e. if the expression is false, // some more code is not executed).
You can do:
while (1) {
    // some code
    if (condition) {
        break;
    }
    // some more code
}
A little background and analysis: what you're asking for I've heard called a "Dahl loop", named after Ole-Johan Dahl of Simula fame. As Sean E. states, C++ doesn't have them (ima's answer aside), though a few other languages do (notably, Ada). It can take the place of "do-while" and "while-do" loops, which makes it a useful construct. The more general case allows for an arbitrary number of tests. While C++ doesn't have special syntax for Dahl loops, Sean McCauliff and AraK's answers are completely equivalent to them. The "while (true)" loop should be turned into a simple jump by the compiler, so the compiled version is completely indistinguishable from a compiled version of a hypothetical Dahl loop. If you find it more readable, you could also use a
do {
...
} while (true);
Well, I think you should move the condition to the middle of the loop(?):
while (true)
{
    ...
    // Insert at favorite position
    if (condition)
        break;
    ...
}
Technically, yes:
for (; CodeBefore, Condition; ) { CodeAfter }
The answer is no: the loop can't automatically terminate the moment the controlling expression becomes false; it only terminates when the expression is actually evaluated, at the top (or bottom) of the loop. The while test can't be placed in the middle of the loop.
A variation on Dahl's loop that does not involve a goto into the middle of a do-while() is to use a switch-default:
switch (0) do {
    // some more code
default:
    // some code
} while (expression);
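A runnable illustration of the trick (the loop body here is made up: it echoes numbers until a negative one or end of input; the switch (0) jumps straight to default: on the first pass, skipping the post-test part):

#include <iostream>

int main() {
    int x;
    switch (0) do {
        std::cout << x << '\n';      // "some more code": runs only after a passed test
    default:
        std::cin >> x;               // "some code": runs every iteration, before the test
    } while (std::cin && x >= 0);
}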