A branch instruction contains labels naming the basic blocks it might jump to. Given that, is there a way to extract a MachineBasicBlock object from a branch instruction? For example:
for (MachineBasicBlock &BB : MF) {
    for (MachineInstr &MI : BB) {
        if (MI.isConditionalBranch()) {
            MachineBasicBlock &InstBB = something(MI.getOperand(0));
        }
    }
}
At the MachineInstr level the operand is a MachineOperand, so check MI.getOperand(0).isMBB() and then call getMBB() to obtain the MachineBasicBlock. (Casting a node to BasicBlockSDNode and calling getBasicBlock() is the equivalent at the SelectionDAG level; remember to perform that cast using LLVM's cast<>() function.)
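That access pattern can be shown self-contained with tiny stand-in classes; only isMBB()/getMBB() below mirror the real MachineOperand interface, everything else is simplified for illustration.

```cpp
// Stand-ins for the LLVM classes so the pattern compiles on its own; only
// isMBB()/getMBB() mirror the real MachineOperand interface, the rest is
// simplified for illustration.
struct MachineBasicBlock { int number; };

struct MachineOperand {
    MachineBasicBlock *mbb = nullptr;
    bool isMBB() const { return mbb != nullptr; }
    MachineBasicBlock *getMBB() const { return mbb; }
};

// Extract the branch target: verify the operand refers to a basic block,
// then ask it for the block.
MachineBasicBlock *branchTarget(const MachineOperand &op) {
    return op.isMBB() ? op.getMBB() : nullptr;
}
```

With the real API, the same check-then-get sequence applies to each operand of the conditional branch.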
Consider the following grammar rule:
forstmt: for openparentheses nexpr semicolon expr semicolon nexpr closeparentheses stmt {}

nexpr: expr {}
     | %empty {}

expr: ... // something huge
It is the parser rule for a usual C-like for loop such as:
for (i = 0; i < 10; i++) {
    Print("hello world");
}
I need to generate IR for this C-like for loop (forstmt). The IR for expr is already written.
The point is that the last nexpr's IR should be emitted after the stmt's.
I know about mid-rule actions, and I thought I could somehow solve this with a stack, but my attempts didn't lead anywhere.
Precisely: is there a way to stop Bison from generating IR for the last nexpr and have it generated after the stmt instead? In other words, how can I make all the actions of the last nexpr run after the stmt? Has anyone had a problem like this?
Generally you generate IR (intermediate representation) in memory, so you can manipulate it after you've parsed your program; this lets you analyze the whole program and reorder things as you see fit. So the order in which you generate the IR is irrelevant.
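A minimal sketch of that in-memory approach, assuming the IR is just a list of strings: each part of the for loop is parsed into its own list, and the pieces are stitched together in execution order afterwards, so the increment lands after the body no matter when it was parsed.

```cpp
#include <string>
#include <vector>

// Assumed toy IR: a flat list of instruction strings.
using IR = std::vector<std::string>;

IR buildForLoop(const IR &init, const IR &cond, const IR &incr, const IR &body) {
    IR out(init);                                     // init runs once
    out.push_back("cond:");
    out.insert(out.end(), cond.begin(), cond.end());
    out.push_back("br_false exit");                   // leave when the test fails
    out.insert(out.end(), body.begin(), body.end());  // body first...
    out.insert(out.end(), incr.begin(), incr.end());  // ...then the increment
    out.push_back("br cond");
    out.push_back("exit:");
    return out;
}
```

For `for(i=0; i<10; i++) Print(...);` the builder would be handed the four lists parsed from the rule's children, in grammar order, and would emit them in execution order.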
If you are instead trying to generate code directly from your actions as they are parsed, you need to set things up so that works, generally by branching around. So you might do something like:
forexpr:
    FOR '(' expr ';' {
        $$.l1 = create_label();         // unique label for the condition test
        output_label($$.l1);            // condition code starts here
    } expr ';' {
        $$.l1 = create_label();         // start of the loop body
        $$.l2 = create_label();         // loop exit
        $$.l3 = create_label();         // start of the increment expression
        output_jump_if_true($6, $$.l1); // conditional branch to the body
        output_jump($$.l2);             // otherwise leave the loop
        output_label($$.l3);            // increment code starts here
    } expr ')' {
        output_jump($5.l1);             // after the increment, retest the condition
        output_label($8.l1);            // body code starts here
    } stmt {
        output_jump($8.l3);             // after the body, run the increment
        output_label($8.l2);            // loop exit
    }
Needless to say, this is quite suboptimal.
It must be done by hand! Bison doesn't have (and arguably shouldn't have) anything for that.
My solution was to set a flag somewhere that makes the code generator hold the generated IR, and then release it after the for loop's stmt.
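That hold-and-release idea can be sketched with an assumed Emitter class: while holding, emitted instructions are parked in a side buffer instead of being written out, and release() flushes them at the current point.

```cpp
#include <string>
#include <vector>

// Hypothetical code generator with a "hold" buffer (names are illustrative).
class Emitter {
    std::vector<std::string> out, held;
    bool holding = false;
public:
    void emit(const std::string &insn) { (holding ? held : out).push_back(insn); }
    void hold()   { holding = true; }   // start parking emissions
    void resume() { holding = false; }  // stop parking, keep the buffer
    void release() {                    // flush the parked IR at this point
        resume();
        out.insert(out.end(), held.begin(), held.end());
        held.clear();
    }
    const std::vector<std::string> &code() const { return out; }
};
```

The last nexpr's action would run between hold() and resume(), and the forstmt action would call release() once the stmt's IR is out.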
Sorry for the complicated title, but it's a bit hard to explain in just one sentence.
So I'm writing a simple interpreted language to help with some stuff that I often do. I have a lexer set up, feeding into an abstract syntax tree generator.
The abstract syntax tree spits out Expressions (which I'm passing around using unique_ptrs). There are several types of expressions derived from this base class, including:
Numbers
Variables
Function calls / prototypes
Binary operations
etc. Each derived class contains the info it needs for that expression, e.g. variables contain a std::string of their identifier, and binary operations contain unique_ptrs to the left- and right-hand sides as well as a char for the operator.
Now this is working perfectly, and expressions are parsed just as they should be.
This is what an AST would look like for 'x=y*6^(z-4)+5'
+--Assignment (=)--+
| |
Var (x) +--------BinOp (+)----+
| |
5 +------------BinOp (*)---+
| |
+---------BinOp (^)-------+ Var (y)
| |
Num (6) +------BinOp (-)-----+
| |
Var (z) Num (4)
The issue arises when trying to decouple the AST from the interpreter. I want to keep it decoupled in case I want to provide support for compilation in the future, or whatever. Plus the AST is already getting decently complex and I don't want to add to it. I only want the AST to have information about how to take tokens and convert them, in the right order, into an expression tree.
Now, the interpreter should be able to traverse this list of top down expressions, and recursively evaluate each subexpression, adding definitions to memory, evaluating constants, assigning definitions to their functions, etc. But, each evaluation must return a value so that I can recursively traverse the expression tree.
For example, a binary operation expression must recursively evaluate the left hand side and the right hand side, and then perform an addition of the two sides and return that.
Now, the issue is, the AST returns pointers to the base class, Expr – not the derived types. Calling getExpression returns the next expression regardless of its derived type, which lets me easily recursively evaluate binary operations and so on. In order for the interpreter to get the information about these expressions (the number value, or identifier, for example), I would basically have to dynamically cast each expression and check whether it works, and I'd have to do this repeatedly. Another way would be something like the Visitor pattern – the Expr calls the interpreter and passes this to it, which allows the interpreter to have a definition for each derived type. But again, the interpreter must return a value!
This is why I can't use the visitor pattern – I have to return values, which would completely couple the AST to the interpreter.
I also can't use a strategy pattern because each strategy returns wildly different things. The interpreter strategy would be too different from the LLVM strategy, for example.
I'm at a complete loss of what to do here. One really clunky solution would be to literally have an enum of each expression type as a member of the Expr base class; the interpreter could check the type and then make the appropriate typecast. But that's ugly. Really ugly.
What are my options here? Thanks!
The usual answer (as done by most parser generators) is to have both a token type value and associated data (called attributes in discussions of such things). The type value is generally a simple integer and says "number", "string", "binary op", etc. When deciding which production to use, you examine only the token types, and when you match a production rule you then know what kinds of tokens feed into it.
If you want to implement this yourself, look up parsing algorithms (LALR and GLR are a couple of examples); or you could switch to using a parser generator, and then only have to worry about getting your grammar correct and properly implementing the productions, rather than implementing the parsing engine yourself.
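A hypothetical token shape along these lines: a plain integer type tag drives the parsing decisions, while attribute fields carry the associated data that production actions consume.

```cpp
#include <string>

// Illustrative token: the type tag is what the parser examines; the other
// fields are attributes consumed by the actions.
enum TokenType { TOK_NUMBER, TOK_STRING, TOK_BINOP };

struct Token {
    TokenType type;      // examined when choosing a production
    std::string text;    // the raw lexeme
    double number;       // attribute, meaningful when type == TOK_NUMBER
};

// A production's action looks only at the attributes once the types matched.
double addNumbers(const Token &lhs, const Token &rhs) {
    return lhs.number + rhs.number;
}
```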
Why can't you use the visitor pattern? Any return results simply become local state:
class EvalVisitor
{
    void visit(X x)
    {
        visit(x.y);
        int res1 = res();   // result of evaluating x.y
        visit(x.z);
        int res2 = res();   // result of evaluating x.z
        res(res1 + res2);   // store the combined result
    }
    ...
};
The above can be abstracted away so that the logic lies in proper eval
functions:
class Visitor
{
public:
    virtual void visit(X) = 0;
    virtual void visit(Y) = 0;
    virtual void visit(Z) = 0;
};

class EvalVisitor : public Visitor
{
public:
    int eval(X);
    int eval(Y);
    int eval(Z);
    int result;
    virtual void visit(X x) { result = eval(x); }
    virtual void visit(Y y) { result = eval(y); }
    virtual void visit(Z z) { result = eval(z); }
};

int evalExpr(Expr& x)
{
    EvalVisitor v;
    x.accept(v);
    return v.result;   // the result lives in the visitor, not the expression
}
Then you can do:
Expr& expr = ...;
int result = evalExpr(expr);
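A self-contained sketch of this wrapping, with illustrative Num/Add node types (not from the question): visit() stores into the visitor's state, and eval() turns that back into a return value.

```cpp
#include <memory>

struct Num; struct Add;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const Num &) = 0;
    virtual void visit(const Add &) = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(Visitor &v) const = 0;
};

struct Num : Expr {
    int value;
    explicit Num(int v) : value(v) {}
    void accept(Visitor &v) const override { v.visit(*this); }
};

struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(Visitor &v) const override { v.visit(*this); }
};

struct EvalVisitor : Visitor {
    int result = 0;
    // eval() wraps the void-returning visit so callers get a value back.
    int eval(const Expr &e) { e.accept(*this); return result; }
    void visit(const Num &n) override { result = n.value; }
    void visit(const Add &a) override { result = eval(*a.lhs) + eval(*a.rhs); }
};
```

The AST stays ignorant of the interpreter: it only knows the abstract Visitor interface.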
I'm working on a shift/reduce parser generator in C++11 and I am not sure how to specify the interface type of the input productions and reduction action functions such that they will hold the information I want to put in them.
I want to specify the grammar statically but using C++ types (not a separate build tool).
For each symbol (terminals and non-terminals) the user provides a string name and a type.
Then each production specifies a head symbol name and one or more body symbol names.
For each production an action function is provided by the user (the hard part) that returns the head nonterminal type and has parameters corresponding to the production body symbols (of their corresponding types).
The main problem is statically binding the parameter types and the return type of these action functions to the corresponding symbol types.
So for example:
Suppose we have nonterminals X, A, B, C.
Their names/types might be:
"X" Foo
"A" string
"B" string
"C" int
And in the grammar there might be a production:
X -> A B C
And there will be an action function provided by the user for that production:
Foo f(string A, string B, int C)
If that production is reduced, then the function f should be called with the production body's values as parameters. The value returned by f is then stored for when that symbol is used in a reduction higher up.
So to specify the grammar to the parser generator I need to provide something like:
(I know the following is invalid)
struct Symbol
{
    string name;
    type T;
};

struct Production
{
    string head;
    vector<string> body;
    function<head.T(body[0].T, body[1].T, ..., body[n].T)> action;
};

struct Grammar
{
    vector<Symbol> symbols;
    vector<Production> productions;
};
And to specify the earlier example would be:
Grammar example =
{
    // symbols
    {
        { "X", Foo },
        { "A", string },
        { "B", string },
        { "C", int }
    },
    // productions
    {
        {
            "X",
            { "A", "B", "C" },
            [](string A, string B, int C) { ... return Foo(...); }
        }
    }
};
This won't work of course, you can't mix type parameters with runtime parameters like that.
One solution would be to have some generic base:
struct SymbolBase
{
    ...
};

template<class SymbolType>
struct SymbolDerived : SymbolBase
{
    SymbolType value;
};
and then make all action functions of type:
typedef function<SymbolBase(vector<SymbolBase>)> ActionFunction;
and sort it out at runtime. But this makes usage more difficult, and all the casting is slow. I'd rather have the function signatures checked at compile-time and keep the mechanics hidden from the user.
How can I restructure the Symbol, Production and Grammar types to carry the information I am trying to convey in legal C++11?
(Yes, I have looked at Boost Spirit and friends. It is a fine framework, but it is recursive descent, so the languages it can handle in a single pass are fewer than an LALR parser can handle, and because it uses backtracking the reduction actions will get called multiple times, etc., etc.)
I've been playing around with precisely this problem. One possibility I've been looking at, which looks like it should work, is to use a stack of variant objects, perhaps boost::variant or boost::any. Since each reduction knows what it's expecting from the stack, the access will be type-safe; unfortunately, the type check will be at run time, but it should be very cheap. This has the advantage of catching bugs :) and it will also correctly destruct objects as they're popped from the stack.
I threw together some sample code as a PoC, available upon request. The basic style for writing a reduction rule is something like this:
parse.reduce<Expression(Expression, _, Expression)>(
    [](Expression left, Expression right) {
        return BinaryOperation(Operator::Times, left, right);
    });
which corresponds to the rule:
expression: expression TIMES expression
Here, BinaryOperation is the AST node-type, and must be convertible to Expression; the template argument Expression(Expression, _, Expression) is exactly the left-hand-side and right-hand-side of the production, expressed as types. (Because the second RHS type is _, the templates don't bother feeding the value to the reduction rule: with a proper parser generator, there would actually be no reason to even push punctuation tokens onto the stack in the first place.) I implemented both the tagged union Expression and the tagged type of the parser stack using boost::variant. In case you try this, it's worth knowing that using a variant as one of the option types of another variant doesn't really work. In the end, it was easiest to wrap the smaller union as a struct. You also really have to read the section about recursive types.
The following loop iterates over all edges of a graph, determines if the end nodes belong to the same group, and then adds the edge weight to the total edge weight of that group.
// TODO: parallel
FORALL_EDGES_BEGIN((*G)) {
    node u = EDGE_SOURCE;
    node v = EDGE_DEST;
    cluster c = zeta[u];
    cluster d = zeta[v];
    if (c == d) {
        // TODO: critical section
        intraEdgeWeight[c] += G->weight(u, v);
    } // else ignore edge
} FORALL_EDGES_END();
I would like to parallelize it with OpenMP. I think that the code in the if statement is a critical section that might lead to a race condition and wrong results if threads are interrupted in the middle of it (correct?).
If the += operation could be performed atomically, I believe the problem would be solved (correct?). However, I've looked up the atomic directive, and it states that:
Unfortunately, the atomicity setting can only be specified to simple
expressions that usually can be compiled into a single processor
opcode, such as increments, decrements, xors and the such. For
example, it cannot include function calls, array indexing, overloaded
operators, non-POD types, or multiple statements.
What should I use to parallelize this loop correctly?
Actually the accepted syntax for atomic update is:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
where x is a scalar l-value expression and expr is any expression, including function calls, with the only restriction being that it must be of scalar type. The compiler would take care to store the intermediate result in a temporary variable.
Sometimes it's better to consult the standards documents instead of reading tutorials on the Internet. Observe Example A.22.1c from the OpenMP 3.1 standard:
float work1(int i)
{
    return 1.0 * i;
}
...
#pragma omp atomic update
x[index[i]] += work1(i);
I also think your if block is a critical section: you should not write to a vector from multiple threads without serializing the writes. You can use #pragma omp critical to restrict execution of the += in your if block to a single thread at a time.
You can split the expression in two parts: the function call that assigns the result into a temporary, and the addition of that temporary to the accumulator. In that case, the second expression will be simple enough to use omp atomic, assuming that the addition itself is not a complex overloaded operator for a custom type. Of course you can only do that in case G->weight(u,v) is a thread-safe call, otherwise you have to use omp critical or a mutex.
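A sketch of that split on a plain edge list (the FORALL_EDGES macros are specific to the asker's framework): the weight is computed into a local temporary first, so the remaining update is a simple scalar += that the atomic construct accepts. Compile with -fopenmp; without it the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <vector>

struct Edge { int u, v; double w; };

std::vector<double> intraClusterWeights(const std::vector<Edge> &edges,
                                        const std::vector<int> &zeta,
                                        int nClusters) {
    std::vector<double> total(nClusters, 0.0);
    #pragma omp parallel for
    for (long i = 0; i < (long)edges.size(); ++i) {
        int c = zeta[edges[i].u];
        int d = zeta[edges[i].v];
        if (c == d) {
            double w = edges[i].w;  // stands in for the G->weight(u, v) call
            #pragma omp atomic
            total[c] += w;          // simple enough for an atomic update
        }
    }
    return total;
}
```

An OpenMP reduction over per-thread partial sums would avoid the atomic entirely, at the cost of combining the arrays afterwards.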
I just ran into a piece of code that does this:
delete a, a = 0;
It compiles and runs just fine. But isn't this supposed to be:
delete a;
a = 0;
Why is separating statements using , allowed in this case?
Thanks :)
In C and C++, most "statements" are actually expressions. The semicolon added to an expression makes it into a statement. Alternatively, it is allowed (but almost always bad style) to separate side-effectful expressions with the comma operator: the left-hand-side expression is evaluated for its side-effects (and its value is discarded), and the right-hand-side expression is evaluated for its value.
This is the comma operator. It evaluates both of its operands and returns the second one.
This is the comma operator. It can be used to separate expressions, but not declarations.
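A tiny demonstration of that sequencing: the left operand runs for its side effect, its value is discarded, and the right operand's value becomes the value of the whole expression.

```cpp
int calls = 0;
int bump() { return ++calls; }  // an observable side effect

int demo() {
    int x = (bump(), 42);  // bump() runs first, its value is discarded; x gets 42
    return x;
}
```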
While it is possible to write code like that, it may be somewhat weird. A slightly more realistic usecase would be if you have a struct T as follows:
struct T {
bool check() const;
void fix();
};
Now you want to iterate through everything in the struct and run check on it, and then call fix if check returns false. The simple way to do this would be
for (list<T>::iterator it = mylist.begin(); it != mylist.end(); ++it)
    if (!it->check())
        it->fix();
Let's pretend you want to write it in as short a way as possible. fix() returning void means you can't just put it in the condition. However, using the comma operator you can get around this:
for (auto it = mylist.begin(); it != mylist.end() && (it->check() || (it->fix(), true)); ++it);
I wouldn't use it without a particularly good reason, but it does allow you to call any function from a condition, which can be convenient.