LLVM Remove Terminator Instruction

I want to remove an UnreachableInst since a previous transformation has made it reachable. However, calling eraseFromParent() gives me a malformed BasicBlock since the UnreachableInst is the terminator of its BasicBlock. How do I fix the BasicBlock to terminate at the instruction previous to the UnreachableInst?

Every basic block must end with a terminator. I think that the most straightforward way to remove the unreachable instruction, then, is to replace it with another terminator - for example, a branch into the next basic block, a return instruction, etc.
Take a look at llvm::ReplaceInstWithInst in BasicBlockUtils.h for a convenient way to replace one instruction with another.

Related

Rocket Universe string delete a character question

This one has me wondering if I may be missing a function or something.
Have a string, example TZ118-AH01
I simply want to remove the second character and was wondering if there was a simple way of doing this, cannot use CONVERT as the second character may be repeated in the string.
Currently I figure I have to do something like
VALUE = STRING[1,1]:STRING[3,LEN(STRING)-2]
Which seems a bit cumbersome.
Anyone have a nifty work around?

VALUE = STRING[1,1]:STRING[LEN(STRING)-2]
would be functionally the same, provided you are sure the length will never be less than 2; if it ever is, you have to accept that the entire call stack will stop.
If the first and second characters are known commodities, you could use FIELD and fudge the field count in the fourth argument to something high like 1000, but that is kludgy and relies on the first and second characters not being the same.
The best way would be a function or subroutine that builds a new string by iterating through the characters and skipping the second one.
SUBROUTINE Remove2ndChar(StringOut,StringIn)
* Copy every character of StringIn except the second into StringOut
StringOut=""
CC=LEN(StringIn)
FOR C=1 TO CC
   IF C NE 2 THEN
      StringOut:=StringIn[C,1]
   END
NEXT C
RETURN
This is not necessarily what you are looking for, but it is probably going to be more execution safe.
Good Luck.

How to get the memory address of all operands on an expression

I have some expression as a=b+c-d*e, and with the help of LLVM pass I want to make a string like this
"[Hexadecimal address of 'b'] [opcode of +] [Hexadecimal address of 'c'] [opcode of -] [Hexadecimal address of 'd'] [opcode of *] [Hexadecimal address of 'e']".
How can I do this?

First of all, keep in mind that variables do not necessarily reside in memory; they can be stored in registers or elided altogether. In the context of LLVM IR, this means a value may be used directly by another value, without any store or load in between.
Assuming all the variables involved do need to be loaded from memory, the most straightforward way I can think of for doing this is locating the store, then doing an in-order depth-first traversal backwards through the operands, recording the opcodes, and stopping when you identify a load. For your provided snippet, it should give you b's load, then the plus opcode, then c's load, then the minus opcode, etc.
Now that you have such a sequence, I'd say the simplest way to generate a string from it is to insert a call to C's sprintf with a dynamically-built format string, passing it the pointers that you found (that were loaded from).
I see two issues with the above, though:
1. There's some inherent ambiguity here - just visiting them this way cannot distinguish, for example, (b+c-d)*e from b+(c-d)*e. So I think it would make sense to also record "(" and ")" whenever you enter an arithmetic instruction and leave it, respectively.
2. This approach does not actually check that all the operations are part of the same expression. So if you have tmp = b+c; a = tmp-d*e;, and tmp is optimized away, then it will look the same in the IR. The only ways I can think of for enforcing that are compiling with debug symbols and digging into those to identify distinct expressions (though I don't really know if that's possible), or actually modifying Clang to record expression boundaries.
Pseudo-code for this approach (with simplistic sequence-handling operations):

functionPass:
    for each instruction:
        if instruction is store:
            processExpression(store)

processExpression(store):
    sequence <- initialize
    visit(sequence, store.value)
    generateSprintfCallFromSequence(sequence)

visit(sequence, value):
    if value is load:
        sequence.add(load.pointer)
    else if value is binaryop:
        // sequence.add(openingParen)
        visit(sequence, binaryop.operand(0))
        sequence.add(binaryop.opcode)
        visit(sequence, binaryop.operand(1))
        // sequence.add(closingParen)
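To make the parenthesization point concrete, here is a small self-contained C++ sketch. It is only a toy model and does not use the LLVM API: the Node struct and the leaf, binary, render, treeA and treeB helpers are hypothetical names, with a leaf standing in for a load and an inner node for a binary operator. It runs the same operand-opcode-operand visit, with and without the optional parentheses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for IR values (NOT LLVM types): a leaf models a load,
// an inner node models a binary operator.
struct Node {
    std::string text;                // variable name or operator symbol
    std::unique_ptr<Node> lhs, rhs;  // both null for a leaf
};

std::unique_ptr<Node> leaf(std::string name) {
    return std::unique_ptr<Node>(new Node{std::move(name), nullptr, nullptr});
}

std::unique_ptr<Node> binary(std::string op, std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new Node{std::move(op), std::move(l), std::move(r)});
}

// Same shape as visit() in the pseudo-code: operand 0, opcode, operand 1.
void visit(const Node& n, bool parens, std::string& out) {
    if (!n.lhs) {  // leaf: corresponds to "value is load"
        out += n.text;
        return;
    }
    if (parens) out += "(";
    visit(*n.lhs, parens, out);
    out += " " + n.text + " ";
    visit(*n.rhs, parens, out);
    if (parens) out += ")";
}

std::string render(const Node& n, bool parens) {
    std::string out;
    visit(n, parens, out);
    return out;
}

// (b+c-d)*e
std::unique_ptr<Node> treeA() {
    return binary("*",
                  binary("-", binary("+", leaf("b"), leaf("c")), leaf("d")),
                  leaf("e"));
}

// b+(c-d)*e
std::unique_ptr<Node> treeB() {
    return binary("+", leaf("b"),
                  binary("*", binary("-", leaf("c"), leaf("d")), leaf("e")));
}
```

Without parentheses both trees render to the same string, b + c - d * e; with them, treeA gives (((b + c) - d) * e) and treeB gives (b + ((c - d) * e)). In a real pass the leaves would be the pointer operands of the loads rather than names.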

C++ Writing an Interpreter - determining loop's target for break statement

I am writing a simple program interpreter in C++. When I am building the internal representation of the program and I get a break statement, how do I determine the enclosing loop's target location?
void Imp::whilestmt()
{
    Expr *pExpr;
    accept(Token::WHILE);
    expr(pExpr);
    // Remember where the loop starts so the end of the loop can jump back.
    WhileStmt *pwhilestmt = new WhileStmt(pExpr, vm.getLocationCounter());
    vm.add(pwhilestmt);
    accept(Token::LOOP);
    stmtlist();
    // The location counter now marks the end of the loop body.
    pwhilestmt->setTarget(vm.getLocationCounter());
    accept(Token::END);
    accept(Token::LOOP);
    vm.add(new EndLoopStmt);
}
My break statement object is going to take the while statement's target as a parameter. How can I determine this?

I'd consider building a kind of execution tree/pipeline. Every LOOP/WHILE would be a new branch (similarly to every function), so when you encounter an END/BREAK instruction you just return to the branch's origin point and continue down the line.

I think the solution is to add a forward reference that is resolved (by looking up the location of the end of the loop) when all the code for that level of loop has been produced.
In other words, when generating the code for the loop, you need to emit a "jump" instruction whose target is a location you don't know yet. The solution is to emit the jump with an unknown destination and keep a fixup list of such jumps until you have generated the entire loop. Set the placeholder destination to instruction 0, -1, 0xdeaddead, or something else that is easy to identify when debugging later: the best way to avoid "I didn't fix it up properly" bugs is to make those places easy to spot (bugs only occur in things that are hard to identify, just like it never rains when you carry an umbrella). Once the loop is complete, work your way through the fixup list and fill in the address that you now know is "here" (the next instruction after the loop). I suspect you also need something similar for the condition of the loop itself: if it is false, execution needs to continue "after" the loop.
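The fixup-list idea could be sketched roughly like this. It is a toy code emitter, not the asker's Imp/vm/Stmt classes: Op, Instr, Emitter, kUnresolved and compileSample are all hypothetical names, and the instruction set is reduced to the bare minimum needed to show the patching.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy instruction set, invented for this sketch.
enum class Op { Nop, JumpIfFalse, Jump };

struct Instr {
    Op op;
    int target;  // only meaningful for the two jump opcodes
};

constexpr int kUnresolved = -1;  // easy-to-spot placeholder destination

class Emitter {
public:
    int here() const { return static_cast<int>(code_.size()); }

    void emit(Op op, int target = 0) { code_.push_back({op, target}); }

    // Call when the parser enters a loop, right after the condition.
    void beginLoop() {
        loops_.push_back({here(), {}});
        recordExit(Op::JumpIfFalse);  // condition false: leave the loop
    }

    // Call for a break statement; it belongs to the innermost open loop.
    void emitBreak() { recordExit(Op::Jump); }

    // Call when the loop body is finished.
    void endLoop() {
        Loop loop = loops_.back();
        loops_.pop_back();
        emit(Op::Jump, loop.start);  // back edge to the top of the loop
        // "here" is now the first instruction after the loop: patch all exits.
        for (int index : loop.fixups) code_[index].target = here();
    }

    const std::vector<Instr>& code() const { return code_; }

private:
    struct Loop {
        int start;                // location of the loop's first instruction
        std::vector<int> fixups;  // indices of jumps awaiting a destination
    };

    void recordExit(Op op) {
        loops_.back().fixups.push_back(here());
        emit(op, kUnresolved);
    }

    std::vector<Instr> code_;
    std::vector<Loop> loops_;  // stack of open loops, innermost last
};

// Emits: while (...) { nop; break; nop; }
std::vector<Instr> compileSample() {
    Emitter e;
    e.beginLoop();    // index 0: JumpIfFalse, unresolved
    e.emit(Op::Nop);  // index 1
    e.emitBreak();    // index 2: Jump, unresolved
    e.emit(Op::Nop);  // index 3
    e.endLoop();      // index 4: Jump back to 0, then patch 0 and 2
    return e.code();
}
```

Keeping the open loops on a stack means a break inside a nested loop is patched to the end of the innermost loop only, and any jump still holding kUnresolved after code generation points straight at a missed fixup.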

I added setTarget as a virtual function of Stmt.
I stored the start location in the part that handles the if statements, then checked whether there were any break statements between the start location and the current location; if there were, I set their target to the current location.
It's a really messy way to do it, but it works for now.

Inserting a block between two blocks in LLVM

I want to insert a block in between two basic blocks in LLVM. So for example, if a basic block A was jumping to basic block B, I want to insert a basic block C in between them such that A jumps to C and C jumps to B. How can I do that? I do have the basic idea that I need to change the terminating instruction of Basic Block A, such that the target B is replaced by C, but how do I go on adding the new basic block C in between?

Yes, you need to change (or replace) the terminating instruction of basic block A - for example, if it's a branch, you can use BranchInst::setSuccessor(). You then create basic block C and make sure that its terminating instruction jumps to B, which will make it in-between.
All you need to do is to change the terminators' targets - you don't need to rearrange the block order in the memory or anything like that.
However, you must be aware that there are two special instructions you need to worry about - phi nodes and landing pads.
Phi nodes only refer to the block's immediate predecessor. That means that if you insert C between A and B, you must fix all the phi nodes in B by either removing them or making them refer to C instead of A.
If B is a landingpad block (contains a landingpad instruction), it is only legal to jump into it directly from the unwind target of an invoke instruction. If the jump from A to B is through the unwind target, you can't add a basic block in-between unless you make C itself into a landingpad and remove the landingpad from B.

There is a function called llvm::SplitEdge (declared in BasicBlockUtils.h). It does exactly what the question asked for.

Difference between putback() and unget()

I'm using a Standard iostream to get some input from a file, and I'm confused about unget() versus putback(character). It seems to me from the documentation that these functions are effectively identical, where unget() just remembers the character put in, so I'm nervous. I've always used putback(character), but character is always the last read character and I've been thinking about changing to unget(). Is putback(character) always identical to unget(), if character is always the last read character?

You can't lie with unget(). It "ungets" the last-read character. You can lie with putback(c). You can "putback" some character other than the last-read character. Sometimes putting back a character other than the last-read character can be useful.
Also, if the underlying read buffer really does have buffering capability, you can "putback" more than one character. I think unget() is limited to one character.
Edit
Nope. It looks like unget() can go as far back as putback().
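A small sketch of the difference, using string streams instead of a file so that it is self-contained. The function names are made up for illustration. Whether putback(c) accepts a character other than the last-read one depends on the stream buffer; the third function relies on std::stringstream being opened for both input and output, which lets its buffer overwrite the put-back position.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads one character, returns it with unget(), and reads it again.
std::string roundTripWithUnget(const std::string& text) {
    std::istringstream in(text);
    char first = 0, again = 0;
    in.get(first);
    in.unget();  // no argument: the stream steps back over the last character
    in.get(again);
    return std::string{first, again};
}

// Same round trip, but with putback(c) given the character just read.
std::string roundTripWithPutback(const std::string& text) {
    std::istringstream in(text);
    char first = 0, again = 0;
    in.get(first);
    in.putback(first);
    in.get(again);
    return std::string{first, again};
}

// "Lying" with putback: hand back a different character than was read.
// std::stringstream is read/write, so its buffer may store the new character.
std::string putBackDifferentChar(const std::string& text, char lie) {
    std::stringstream io(text);
    char first = 0, again = 0;
    io.get(first);
    io.putback(lie);
    io.get(again);
    return std::string{first, again};
}
```

With the input "abc", the first two functions both return "aa", which is the case the question asks about. putBackDifferentChar("abc", 'x') returns "ax": the next read sees the substituted character, which is something unget() cannot do.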

It's probably not the answer you expect, but I want to lay out my reasoning. The documentation states that the methods putback and unget call streambuf::sputbackc and streambuf::sungetc respectively. The definitions are as follows:
streambuf::sungetc
Moves the get pointer one character backwards, making the last character gotten by an input operation available once again for the next input operation.
During its operation, the function will call the protected virtual member function pbackfail if the get pointer gptr points to the same position as the beginning pointer eback.
The other one:
streambuf::sputbackc
The get pointer is moved back to point to the character right before its current position so the last character gotten, c, becomes available again as the character to be read at that position by the next input operation.
During its operation, the function calls the protected virtual member function pbackfail either if the character c doesn't match gptr()[-1] or if the get pointer gptr points to the same position as the beginning pointer eback.
When c does not match the character at that position, the default definition of pbackfail in streambuf will prepend c to be the character extracted at that position if possible, but derived classes may override this behavior.
The member function sungetc behaves in a similar way but without taking any parameter
As sputbackc calls pbackfail if the character doesn't match, the method has to check whether the values are equal. It looks like the additional check is the only overhead, but I have no idea how it is handled in practice. I can imagine that if the last character is not stored in the object then it has to be reread, so you might expect that cost even when the characters are guaranteed to be the same.
I was a little concerned about the situation where we call unget but the last character is not available. Would putback put the value back correctly? I doubt it, but that shouldn't happen when operating on files.
