I want to check the value of some instruction at runtime. Therefore, I create a compare instruction and a branch instruction which branches to either the "then" basic block or the "else" basic block. However, I am not sure how I can insert the created basic block after the conditional branch and how the splitting of the existing basic block works.
Instruction* someInst;
IRBuilder<> B(someInst);
Value* condition = B.CreateICmp(CmpInst::ICMP_UGT, someInst, someValue);
BasicBlock* thenBB = BasicBlock::Create(*ctx, "then");
BasicBlock* elseBB = BasicBlock::Create(*ctx, "else");
B.CreateCondBr(condition, thenBB, elseBB);
B.SetInsertPoint(thenBB);
//insert stuff
B.SetInsertPoint(elseBB);
//insert stuff
How can I insert an if/else in the middle of an existing basic block?
Short answer: you can probably use llvm::SplitBlockAndInsertIfThenElse. Don't forget your PHI node.
According to Wikipedia, a basic block:
is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit.
An if-then-else therefore involves several blocks:
1. The block that contains the condition,
2. The then block,
3. The else block,
4. Optionally, the block after the then and else blocks (if then and else don't return or branch elsewhere).
To insert an if-then-else, the original basic block must be split into (1) and (4). The condition check and the conditional branch go into (1), and (2) and (3) each finish with a branch to (4). The SplitBlockAndInsertIfThenElse function will do this for you in simple cases. If you have more complicated requirements, such as then or else containing their own control flow, you may need to do the splitting yourself.
If your then or else blocks modify variables, you will need a PHI node. The Kaleidoscope tutorial explains why PHI nodes are needed and how to use them. The tutorial references the Static Single Assignment Wikipedia article, which is useful background.
There is a helper function you can use called llvm::SplitBlockAndInsertIfThenElse. You'll need to #include "llvm/Transforms/Utils/BasicBlockUtils.h".
do [[unlikely]]
{...}
while(a == 0);
This code compiles.
But is this the correct way to tell the compiler that a is usually non-zero?
Structurally, this is a correct way to say what you're trying to say. The attribute is placed in a location that tags the path of execution that is likely/unlikely to be executed. Applying it to the block statement of the do/while loop works adequately. It would also work within the block.
That having been said, it's unclear what good this would do practically. It might prevent some unrolling of the loop or inhibit prefetching. But it can't really change the structure of the compiled code, since the block has to be executed at least once and the conditional branch has to come after the block.
Struggle with MARIE Assembly.
I need to write code that has x=3 and y=5: if x>y it needs to output 1, and if x<y it needs to output one.
I have the start, but I don't know how to do if/else statements in MARIE.
LOAD X
SUBT Y
SKIPCOND 800
JUMP ELSE
OUTPUT
HALT
Structured statements have a pattern, and each one has an equivalent pattern in assembly language.
The if-then-else statement, for example, has the following pattern:
if ( <condition> )
<then-part>
else
<else-part>
// some statement after if-then-else
Assembly language uses an if-goto-label style. if-goto is a conditional test & branch; and goto alone is an unconditional branch. These forms alter the flow of control and can be composed to do the same job as structure statements.
The equivalent pattern for the if-then-else in assembly (but written in pseudo code) is as follows:
if ( <condition> is false ) goto if1Else;
<then-part>
goto if1Done;
if1Else:
<else-part>
if1Done:
// some statement after if-then-else
You will note that the first conditional branch (if-goto) needs to branch on condition false. For example, let's say that the condition is x < 10, then the if-goto should read if ( x >= 10 ) goto if1Else;, which branches on x < 10 being false. The point of the conditional branch is to skip the then-part (to skip ahead to the else-part) when the condition is false — and when the condition is true, to simply allow the processor to run the then-part, by not branching ahead.
We cannot allow both the then-part and the else-part to execute for the same if-statement's execution. The then-part, once completed, should make the processor move on to the next statement after the if-then-else, and in particular, to avoid the else-part, since the then-part just fired. This is done using an unconditional branch (goto without if), to skip ahead around the else-part — if the then-part just fired, then we want the processor to unconditionally skip the else-part.
The assembly pattern for if-then-else statement ends with a label, here if1Done:, which is the logical end of the if-then-else pattern in the if-goto-label style. Many prefer to name labels after what comes next, but these labels are logically part of the if-then-else, so I choose to name them after the structured statement patterns rather than about subsequent code. Hopefully, you follow the assembly pattern and see that whether the if-then-else runs the then-part or the else-part, the flow of control comes back together to run the next line of code after the if-then-else, whatever that is (there must be a statement after the if-then-else, because a single statement alone is just a snippet: an incomplete fragment of code that would need to be completed to actually run).
When there are multiple structured statements, like if-statements, each pattern translation must use its own set of labels, hence the numbering of the labels.
(Labels can sometimes be shared between two structured statements, but doing that does not optimize the code in any way, and it makes the code harder to change. Sometimes nested statements can result in branches to unconditional branches; since these are actual machine code and have runtime costs, they can be optimized, but such optimizations make the code harder to rework, so they should probably be held off until the code is working.)
When two or more if-statements are nested, the pattern is simply applied multiple times. We can transform the outer if statement first, or the inner first, as long as the pattern is properly applied, the flow of control will work the same in assembly as in the structured statement.
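The pattern above can be tried out in C++, which still has goto. This is only a sketch: the condition x < 10 is the example used earlier, and the function names and the strings appended in each part are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Structured form, kept for comparison.
std::string structured(int x) {
    std::string out;
    if (x < 10)
        out += "then";
    else
        out += "else";
    out += "|after";  // some statement after the if-then-else
    return out;
}

// The same logic in if-goto-label style.
std::string ifGotoLabel(int x) {
    std::string out;
    if (x >= 10) goto if1Else;  // branch when the condition x < 10 is false
    out += "then";              // then-part
    goto if1Done;               // skip around the else-part
if1Else:
    out += "else";              // else-part
if1Done:
    out += "|after";            // both paths come back together here
    return out;
}

// Checks that both forms behave the same over a small range of inputs.
void checkEquivalent() {
    for (int x = 0; x < 20; ++x)
        assert(structured(x) == ifGotoLabel(x));
}
```

Calling ifGotoLabel(3) runs the then-part and ifGotoLabel(10) runs the else-part; in both cases control reaches the statement after if1Done exactly once.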
In summary, first compose a larger if-then-else statement:
if ( x < y )
Output(1)
else
Output(one)
(I'm not sure this is what you need, but it is what you said.)
Then apply the pattern transformation into if-goto-label: since, in the abstract, this is the first if-then-else, let's call it if #1, so we'll have two labels if1Done and if1Else. Place the code found in the structured pattern into the equivalent locations of the if-goto-label pattern, and it will work the same.
MARIE uses SkipCond to form the if-goto statement. It is typical of machine code to have separate compare and branch instructions, since for many instruction set architectures there are too many operands to encode an if-goto in a single instruction (if x >= y goto Label; has x, y, >=, and Label as operands/parameters). MARIE uses subtract and branch relative to 0 (the SkipCond). There are other write-ups on the specific ways to use it, so I won't go into that here, though you have a good start on that already.
I am writing a simple program interpreter in C++. When I am building the internal representation of the program and I get a break statement, how do I determine the enclosing loop's target location?
void Imp::whilestmt()
{
Expr *pExpr;
accept(Token::WHILE);
expr(pExpr);
WhileStmt *pwhilestmt = new WhileStmt(pExpr,vm.getLocationCounter);
vm.add(pwhilestmt);
accept(Token::LOOP);
stmtlist();
pwhilestmt->setTarget(vm.getLocationCounter);
accept(Token::END);
accept(Token::LOOP);
vm.add(new EndLoopStmt);
}
My break statement object is going to take the while statement's target as a parameter. How can I determine this?
I'd consider building a kind of execution tree/pipeline. Every LOOP/WHILE would be a new branch (similarly to every function), so when you encounter an END/BREAK instruction you just revert to the branch's origin point and continue down the line.
I think the solution is to add a forward reference that is resolved (by looking up the location of the end of the loop) when all the code for that level of loop has been produced.
In other words, when generating the code for the loop, you need to form a "jump" instruction which has its target set to somewhere you don't know yet. The solution is to emit a jump with an unknown destination and keep a fixup list of such jumps until you have generated the entire loop. Then work your way through that fixup list and fill in the relevant address, which you now know is "here" (the next instruction after the loop). Set the placeholder "destination" to instruction 0 or -1 or 0xdeaddead or something else that can be easily identified for debugging purposes later, because the best way to avoid bugs of the "I didn't fix it up properly" kind is to make those places easy to identify (bugs only occur in things that are hard to identify, just like it never rains when you carry an umbrella). I suspect you also need something similar for the condition of the loop itself: if that's false, then you need to continue "after" the loop.
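A minimal sketch of such a fixup list follows. It does not use the Stmt classes from the question: Instr, CodeGen, kUnresolved and the method names are all invented for illustration, and the "VM" is just a vector of instructions. Keeping one fixup list per open loop means a break inside a nested loop is patched to the end of the innermost loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kUnresolved = -1;  // easy-to-spot placeholder target

// Hypothetical instruction for a toy VM.
struct Instr {
    enum Kind { Work, JumpIfFalse, Jump } kind;
    int target;  // only meaningful for the two jump kinds
};

struct CodeGen {
    std::vector<Instr> code;
    std::vector<std::vector<std::size_t>> fixups;  // one list per open loop

    int here() const { return static_cast<int>(code.size()); }

    std::size_t emit(Instr::Kind kind, int target) {
        code.push_back({kind, target});
        return code.size() - 1;
    }

    void emitWork() { emit(Instr::Work, 0); }

    // Start a loop: the exit taken when the condition is false is not known yet.
    int beginLoop() {
        int top = here();
        fixups.emplace_back();
        fixups.back().push_back(emit(Instr::JumpIfFalse, kUnresolved));
        return top;
    }

    // A break is a jump to the end of the innermost open loop.
    void emitBreak() {
        assert(!fixups.empty() && "break outside of a loop");
        fixups.back().push_back(emit(Instr::Jump, kUnresolved));
    }

    // Close the loop: emit the back edge, then patch every pending jump to "here".
    void endLoop(int top) {
        assert(!fixups.empty());
        emit(Instr::Jump, top);
        for (std::size_t index : fixups.back())
            code[index].target = here();
        fixups.pop_back();
    }
};
```

After endLoop returns, scanning code for any jump whose target is still kUnresolved is a cheap way to catch a missed fixup.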
I added setTarget as a virtual function of Stmt.
I stored the start location in the part that handles the if statements and then checked if I had any break stmts from the start location to the current location, and if I did I set the target to the current location.
It's a really messy way to do it, but it works for now.
I want to insert a block in between two basic blocks in LLVM. So for example, if a basic block A was jumping to basic block B, I want to insert a basic block C in between them such that A jumps to C and C jumps to B. How can I do that? I do have the basic idea that I need to change the terminating instruction of Basic Block A, such that the target B is replaced by C, but how do I go on adding the new basic block C in between?
Yes, you need to change (or replace) the terminating instruction of basic block A - for example, if it's a branch, you can use BranchInst::setSuccessor(). You then create basic block C and make sure that its terminating instruction jumps to B, which will make it in-between.
All you need to do is to change the terminators' targets - you don't need to rearrange the block order in the memory or anything like that.
However, you must be aware that there are two special instructions you need to worry about - phi nodes and landing pads.
Phi nodes only refer to the block's immediate predecessors. That means that if you insert C between A and B, you must fix all the phi nodes in B by either removing them or making them refer to C instead of A.
If B is a landingpad block (contains a landingpad instruction), it is only legal to jump into it directly from the unwind target of an invoke instruction. If the jump from A to B is through the unwind target, you can't add a basic block in-between unless you make C itself into a landingpad and remove the landingpad from B.
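To make the retargeting and the phi fix-up concrete, here is a toy model in plain C++. It is not the LLVM API: Block, Incoming, Phi and insertBetween are invented for this sketch, a block's terminator is reduced to a list of successor pointers, and landing pads are ignored entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Block;

// One incoming entry of a phi: "value comes from predecessor pred".
struct Incoming {
    Block* pred;
    std::string value;
};
using Phi = std::vector<Incoming>;

// Toy basic block: only the pieces needed to show the edge rewrite.
struct Block {
    std::string name;
    std::vector<Block*> succs;  // targets of the terminator
    std::vector<Phi> phis;      // phi nodes at the top of the block
};

// Insert c on the edge a -> b, so that a jumps to c and c jumps to b.
// Every a -> b edge is retargeted; other predecessors of b are untouched.
void insertBetween(Block& a, Block& b, Block& c) {
    bool retargeted = false;
    for (Block*& succ : a.succs) {
        if (succ == &b) {
            succ = &c;  // change the terminator's target
            retargeted = true;
        }
    }
    assert(retargeted && "a does not branch to b");

    c.succs.assign(1, &b);  // c ends with an unconditional jump to b

    // Phi entries in b that named a as predecessor must now name c.
    for (Phi& phi : b.phis)
        for (Incoming& in : phi)
            if (in.pred == &a) in.pred = &c;
}
```

Note that nothing is moved in memory: only the successor list of a, the successor list of c, and the phi entries of b change.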
There is a function called llvm::SplitEdge, declared in "llvm/Transforms/Utils/BasicBlockUtils.h". It does exactly what the question asked for.
For my diploma thesis I chose to implement the task of the ICFP 2004 contest.
The task--as I translated it to myself--is to write a compiler which translates a high-level ant-language into a low-level ant-assembly. In my case this means using a DSL written in Clojure (a Lisp dialect) as the high-level ant-language to produce ant-assembly.
UPDATE:
The ant-assembly has several restrictions: there are no assembly-instructions for calling functions (that is, I can't write CALL function1, param1), nor returning from functions, nor pushing return addresses onto a stack. Also, there is no stack at all (for passing parameters), nor any heap, or any kind of memory. The only thing I have is a GOTO/JUMP instruction.
Actually, the ant-assembly is meant to describe the transitions of a state machine (=the ants' "brain"). For "function calls" (=state transitions) all I have is a JUMP/GOTO.
While not having anything like a stack, heap or a proper CALL instruction, I still would like to be able to call functions in the ant-assembly (by JUMPing to certain labels).
In several places I read that by transforming my Clojure DSL function calls into continuation-passing style (CPS) I can avoid using the stack[1], and I can translate my ant-assembly function calls into plain JUMPs (or GOTOs). That is exactly what I need, because in the ant-assembly I have no stack at all, only a GOTO instruction.
My problem is that after an ant-assembly function has finished, I have no way to tell the interpreter (which interprets the ant-assembly instructions) where to continue. Maybe an example helps:
The high-level Clojure DSL:
(defn search-for-food [cont]
(sense-food-here? ; a conditional w/ 2 branches
(pickup-food ; true branch, food was found
(go-home ; ***
(drop-food
(search-for-food cont))))
(move ; false branch, continue searching
(search-for-food cont))))
(defn run-away-from-enemy [cont]
(sense-enemy-here? ; a conditional w/ 2 branches
(go-home ; ***
(call-help-from-others cont))
(search-for-food cont)))
(defn go-home [cont]
(turn-backwards
; don't bother that this "while" is not in CPS now
(while (not (sense-home-here?))
(move)))
(cont))
The ant-assembly I'd like to produce from the go-home function is:
FUNCTION-GO-HOME:
turn left nextline
turn left nextline
turn left nextline ; now we turned backwards
SENSE-HOME:
sense here home WE-ARE-AT-HOME CONTINUE-MOVING
CONTINUE-MOVING:
move SENSE-HOME
WE-ARE-AT-HOME:
JUMP ???
FUNCTION-DROP-FOOD:
...
FUNCTION-CALL-HELP-FROM-OTHERS:
...
The syntax for the ant-asm instructions above:
turn direction which-line-to-jump
sense direction what jump-if-true jump-if-false
move which-line-to-jump
My problem is that I fail to find out what to write to the last line in the assembly (JUMP ???). Because--as you can see in the example--go-home can be invoked with two different continuations:
(go-home
(drop-food))
and
(go-home
(call-help-from-others))
After go-home has finished I'd like to call either drop-food or call-help-from-others. In assembly: after I arrived at home (=the WE-ARE-AT-HOME label) I'd like to jump either to the label FUNCTION-DROP-FOOD or to the FUNCTION-CALL-HELP-FROM-OTHERS.
How could I do that without a stack, without PUSHing the address of the next instruction (=FUNCTION-DROP-FOOD / FUNCTION-CALL-HELP-FROM-OTHERS) to the stack? My problem is that I don't understand how continuation-passing style (=no stack, only a GOTO/JUMP) could help me solve this problem.
(I can try to explain this again if the things above are incomprehensible.)
And huge thanks in advance for your help!
--
[1] "interpreting it requires no control stack or other unbounded temporary storage". Steele: Rabbit: a compiler for Scheme.
Yes, you've provided the precise motivation for continuation-passing style.
It looks like you've partially translated your code into continuation-passing-style, but not completely.
I would advise you to take a look at PLAI, but I can show you a bit of how your function would be transformed, assuming I can guess at Clojure syntax and mix in Scheme's lambda.
(defn search-for-food [cont]
(sense-food-here? ; a conditional w/ 2 branches
(search-for-food
(lambda (r)
(drop-food r
(lambda (s)
(go-home s cont)))))
(search-for-food
(lambda (r)
(move r cont)))))
I'm a bit confused by the fact that you're searching for food whether or not you sense food here, and I find myself suspicious that either this is weird half-translated code, or just doesn't mean exactly what you think it means.
Hope this helps!
And really: go take a look at PLAI. The CPS transform is covered in good detail there, though there's a bunch of stuff for you to read first.
Your ant assembly language is not even Turing-complete. You said it has no memory, so how are you supposed to allocate the environments for your function calls? You can at most get it to accept regular languages and simulate finite automata: anything more complex requires memory. To be Turing-complete you'll need what amounts to a garbage-collected heap. To do everything you need to do to evaluate CPS terms you'll also need an indirect GOTO primitive. Function calls in CPS are basically (possibly indirect) GOTOs that provide parameter passing, and the parameters you pass require memory.
Clearly, your two basic options are to inline everything, with no "external" procedures (for extra credit look up the original meaning of "internal" and "external" here), or somehow "remember" where you need to go on "return" from a procedure "call" (where the return point does not necessarily need to fall in the physical locations immediately following the "calling" point). Basically, the return point identifier can be a code address, an index into a branch table, or even a character symbol -- it just needs to identify the return target relative to the called procedure.
The most obvious here would be to track, in your compiler, all of the return targets for a given call target, then, at the end of the called procedure, build a branch table (or branch ladder) to select from one of the several possible return targets. (In most cases there are only a handful of possible return targets, though for commonly used procedures there could be hundreds or thousands.) Then, at the call point, the caller needs to load a parameter with the index of its return point relative to the called procedure.
Obviously, if the callee in turn calls another procedure, the first return point identifier must be preserved somehow.
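The compiler-side bookkeeping for this could look roughly like the sketch below. Everything in it is an assumption made for illustration: the ReturnTargets class, its method names, and especially the "IF-RET-IS n JUMP label" pseudo-assembly, which presumes some slot that holds the return index. The ant machine described in the question has no such storage, so there the index would have to be encoded some other way.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Tracks, per callee, every label it may have to "return" to.
class ReturnTargets {
public:
    // Called at each call site; returns the index the caller must pass.
    // Call sites that share a return label share an index.
    int indexFor(const std::string& callee, const std::string& label) {
        std::vector<std::string>& labels = table_[callee];
        for (std::size_t i = 0; i < labels.size(); ++i)
            if (labels[i] == label) return static_cast<int>(i);
        labels.push_back(label);
        return static_cast<int>(labels.size() - 1);
    }

    // What the branch ladder does at run time: index -> return label.
    const std::string& resolve(const std::string& callee, int index) const {
        const std::vector<std::string>& labels = table_.at(callee);
        assert(index >= 0 && static_cast<std::size_t>(index) < labels.size());
        return labels[static_cast<std::size_t>(index)];
    }

    // Emit the ladder placed at the end of the callee (made-up syntax).
    std::vector<std::string> ladder(const std::string& callee) const {
        std::vector<std::string> lines;
        auto it = table_.find(callee);
        if (it == table_.end()) return lines;
        for (std::size_t i = 0; i < it->second.size(); ++i)
            lines.push_back("IF-RET-IS " + std::to_string(i) + " JUMP " + it->second[i]);
        return lines;
    }

private:
    std::map<std::string, std::vector<std::string>> table_;
};
```

For the go-home example, registering FUNCTION-DROP-FOOD and FUNCTION-CALL-HELP-FROM-OTHERS as return targets yields indices 0 and 1, and the ladder emitted at WE-ARE-AT-HOME has one line per index.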
Continuation passing is, after all, just a more generalized form of a return address.
You might be interested in Andrew Appel's book Compiling with Continuations.