How to Insert an LLVM Instruction? - c++

I've been searching for hours and I can't find anything that could help me. I'm working on a project that involves a FunctionPass. I've implemented a runOnFunction(Function &f) method and that's working fine. Basically it needs to:
1) Detect a store instruction
2) Convert the memory address of the store instruction to an Integer
3) Alter the integer using a bitwise AND operation (0000FFFF)
4) Convert the integer back into the pointer
So far I've got the following:
virtual bool runOnFunction(Function &F) {
  for (Function::iterator bb = F.begin(), bbe = F.end(); bb != bbe; ++bb) {
    BasicBlock& b = *bb;
    for (BasicBlock::iterator i = b.begin(), ie = b.end(); i != ie; ++i) {
      if(StoreInst *si = dyn_cast<StoreInst>(&*i)) {
        PtrToIntInst* ptrToInt = new PtrToIntInst(si->getPointerOperand(), IntegerType::get(si->getContext(), 32), "", si);
      }
    }
  }
  return true;
}
I can't for the life of me figure out how to actually insert the instruction, or even find a way to create an AND instruction. If anyone could point me in the right direction, that would be great.
Thanks in advance.

I recommend taking a look at the Programmer's Manual - it has pretty decent coverage of the basics.
In particular, there's a section about creating and inserting new instructions. The simplest way is just to provide an existing instruction as the last argument for the new instruction's constructor, which will then insert that instruction immediately before the existing one.
Alternatively, you can pass the enclosing basic block if you just want to add to its end (but remember you need to take care of the terminator!). Finally, you can just call getInstList() on the enclosing basic block, then insert or push_back to insert new instructions there.
As an aside, you don't have to iterate over all blocks and then over all instructions in each, you can just iterate over the instructions directly; see the section about the instruction iterator in the programmer's manual.

virtual bool runOnFunction(Function &F) {
  for (Function::iterator bb = F.begin(), bbe = F.end(); bb != bbe; ++bb) {
    BasicBlock &b = *bb;
    for (BasicBlock::iterator i = b.begin(), ie = b.end(); i != ie; ++i) {
      if (StoreInst *si = dyn_cast<StoreInst>(&*i)) {
        // Insert the new instructions immediately before the store
        IRBuilder<> Builder(si);
        Value *StoreAddr = Builder.CreatePtrToInt(si->getPointerOperand(), Builder.getInt32Ty());
        Value *Masked = Builder.CreateAnd(StoreAddr, 0xffff);
        Value *AlignedAddr = Builder.CreateIntToPtr(Masked, si->getPointerOperand()->getType());
        // ...
      }
    }
  }
  return true;
}

You can use an IRBuilder to easily insert new instructions before another instruction or at the end of a basic block.
Alternatively, if you need to insert an instruction after another one, you need to use the instruction list in the containing basic block:
BasicBlock *pb = ...;
Instruction *pi = ...;
Instruction *newInst = new Instruction(...);
pb->getInstList().insertAfter(pi, newInst);
Code and solution taken from here.
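For reference, here is a minimal plain C++ sketch of what the ptrtoint / and / inttoptr sequence computes at run time. It is not LLVM API code: maskLow16 and maskPointer are hypothetical helper names, and it assumes std::uintptr_t instead of the 32-bit integer type used in the pass, so no address bits are dropped before the mask is applied.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps only the low 16 bits of an address-sized
// integer, i.e. applies the 0x0000FFFF mask from the question.
std::uintptr_t maskLow16(std::uintptr_t address) {
    return address & static_cast<std::uintptr_t>(0x0000FFFF);
}

// Hypothetical helper: pointer -> integer -> masked integer -> pointer,
// mirroring ptrtoint / and / inttoptr. The resulting pointer is in general
// not safe to dereference; it only illustrates the value transformation.
void* maskPointer(void* p) {
    std::uintptr_t asInt = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(maskLow16(asInt));
}
```

Calling maskLow16(0x12345678) yields 0x5678, which is the value the masked store address would have in the instrumented program.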

Related

How to insert a basic block between two blocks in LLVM

This is similar to Inserting a block between two blocks in LLVM however the solution description is unclear to me - or better - I tried to do it like described but it does not work (for me).
What I want to do:
Wherever a basic block has more than one successor, I want to insert a basic block.
So if basic block A does a conditional jump to B or C, I want to insert a basic block between A and B and between A and C. And it should also work if there is a jump table.
So what I do is:
while (...) {
  // get next basic block and ensure it has at least 2 successors:
  BasicBlock *origBB = getNextBB();
  Instruction *TI = origBB->getTerminator();
  if (!TI || TI->getNumSuccessors() < 2)
    continue;
  // collect successors:
  std::vector<BasicBlock *> Successors;
  for (succ_iterator SI = succ_begin(origBB), SE = succ_end(origBB); SI != SE; ++SI) {
    BasicBlock *succ = *SI;
    Successors.push_back(succ);
  }
  // now for each successor:
  for (uint32_t i = 0; i < Successors.size(); i++) {
    // Create a new basic block
    BasicBlock *BB = BasicBlock::Create(C, "", &F, nullptr);
    // F.getBasicBlockList().push_back(BB); <= this did not work, seem to result in endless loop
    IRBuilder<> IRB(BB);
    // put instructions into BB
    ... // omitted
    // then add the terminator:
    IRB.CreateBr(Successors[i]);
    // Now we have to fix the original BB to our new basic block:
    TI->setSuccessor(i, BB);
  }
}
When I run this LLVM pass I get the following error:
PHI node entries do not match predecessors!
OK so I thought I have to remove the corresponding predecessor from the successor and added the following code after the setSuccessor():
origBB->replaceSuccessorsPhiUsesWith(Successors[j], BB);
BasicBlock *S = Successors[i];
S->removePredecessor(origBB);
Then however I get the error Instruction does not dominate all uses!
I am sure the solution is very simple - but I cannot find it :-(
Thanks a lot for any help or pointers!
OK, I am answering my own question here.
The comments proposed two things: a) using splitBlock(), which goes in the wrong direction because this is about inserting into an edge, so the result would be wrong; and b) updating the PHI nodes, which would actually solve the issue. However, b) is very complex because a simple replacePhiUsesWith cannot be used: the case where A->B and A is a loop head and B a loop tail (so going back to B) will result in a compilation error, so all PHIs have to be carefully evaluated.
The actual solution is very simple, and I found it by browsing through the LLVM source code: SplitEdge(). It does exactly what I want, inserting a basic block on the edge between two basic blocks!
It is not obvious, as the function is not documented in the doxygen class list, so it cannot be found other than by browsing sources and includes.
So here is how to use it:
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
void inYourFunction() {
  ...
  // Insert the new block into the edge between thisBB and a successorBB
  BasicBlock *insertedBB = SplitEdge(thisBB, successorBB);
  if (!insertedBB) {
    // SplitEdge can fail, e.g. if the successor is a landing pad
    return;
  }
  // Then put instructions into the new BB
  BasicBlock::iterator IP = insertedBB->getFirstInsertionPt();
  IRBuilder<> IRB(&(*IP));
  // and then work with IRB
  // You do not need to take care of the branch to successorBB - it is already there
  ...
}
That's it, it's that simple.

How to keep iterating over data structure after inserting elements into it?

In the code snippet below, I insert Instruction's into the BasicBlock pointed to by Function::iterator bs. The inner loop iterates over the instructions contained within this BasicBlock.
Now, after the inner loop inserts these instructions, the program goes into an infinite loop with instruction sequence:
and
mul
xor
and
mul
xor
and
mul
xor
and
mul
xor
and
mul
xor
and
mul
...
How would I insert into the data structure being iterated over, while avoiding going into an infinite loop?
Somehow the iterator goes nuts (or it is invalidated). Is there a common idiom for how to tackle this problem?
for (Function::iterator bs = F.begin(), be = F.end(); bs != be; ++bs) {
  for (BasicBlock::iterator is = bs->begin(), ie = be->end(); is != ie; ++is) {
    Instruction& inst = *is;
    BinaryOperator* binop = dyn_cast<BinaryOperator>(&inst);
    if (!binop) {
      continue;
    }
    unsigned opcode = binop->getOpcode();
    errs() << binop->getOpcodeName() << "\n";
    if (opcode != Instruction::Add) {
      continue;
    }
    IRBuilder<> builder(binop);
    Value* v = builder.CreateAdd(builder.CreateXor(binop->getOperand(0), binop->getOperand(1)),
                                 builder.CreateMul(ConstantInt::get(binop->getType(), 2),
                                                   builder.CreateAnd(binop->getOperand(0), binop->getOperand(1))));
    ReplaceInstWithValue(bs->getInstList(), is, v); // THINGS GO WRONG HERE!
  }
}
Unfortunately, you failed to provide sufficient details, but I strongly suspect that you're inserting a new element into a container in such a way that existing iterators (to other elements) are invalidated. This is the usual behaviour for many container classes, e.g. std::vector<>::insert(), which invalidates all existing iterators if the new size() exceeds capacity() (otherwise only existing iterators to elements before the insertion point remain valid).
The way to avoid this is to use a container that does not suffer from this problem, e.g. a std::list<>, since std::list<>::insert() does not invalidate any existing iterator or reference.
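To make the difference concrete, here is a small standard-library sketch (not LLVM code) of inserting into a container while iterating over it. The function names expandEvensList and expandEvensVector are made up for illustration. With std::list the loop iterator survives the insert; with std::vector the loop has to continue from the iterator that insert() returns.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Inserts `element * 10` in front of every even element while iterating.
// std::list::insert does not invalidate existing iterators, so `it` still
// refers to the same element afterwards and ++it moves to the next one.
std::list<int> expandEvensList(std::list<int> values) {
    for (std::list<int>::iterator it = values.begin(); it != values.end(); ++it) {
        if (*it % 2 == 0) {
            values.insert(it, *it * 10);  // new node goes before `it`
        }
    }
    return values;
}

// Same transformation on std::vector. insert() may reallocate and it always
// invalidates iterators at or after the insertion point, so the old `it`
// must not be reused; the returned iterator is used instead.
std::vector<int> expandEvensVector(std::vector<int> values) {
    for (std::vector<int>::iterator it = values.begin(); it != values.end(); ++it) {
        if (*it % 2 == 0) {
            it = values.insert(it, *it * 10);  // now points at the new element
            ++it;                               // step onto the original element again
        }
    }
    return values;
}
```

Both functions turn {1, 2, 3, 4} into {1, 20, 2, 3, 40, 4}. Neither loop revisits an element it has just inserted, which is what keeps the iteration finite.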

How do I print out an Instruction in LLVM?

for (BasicBlock::iterator i = bb->begin(), e = bb->end(); i != e; ++i) {
i.print(errs()); ???
I am writing an LLVM pass and I want to get the list of instructions inside the basic block, but how do I print them out on the console so I can see them? The code above shows what I have tried: it iterates through every instruction in the basic block, but I get the error below for the print function.
error: ‘llvm::BasicBlock::iterator’ has no member named ‘print’
i.print(errs());
Is there a better approach to printing out instructions?
The problem is that you are trying to print the iterator and not an instruction. You can try one of the following approaches. You can print the instructions in a basic block by either printing the basic block or printing each instruction:
BasicBlock* bb = ...;
// Option 1: print the whole basic block
errs() << *bb;
// Option 2: print each instruction
for (BasicBlock::iterator i = bb->begin(), e = bb->end(); i != e; ++i) {
  Instruction* ii = &*i;
  errs() << *ii << "\n";
}
Both prints will output the same results.

How to change a vector item in C++?

I've got a vector of structs in C++ and I would like to modify each item individually. I found that doing SomeStruct info = myVector[i] gives me a copy of the item, so if I modify it nothing will be changed. So right now I'm resetting the item like that: myVector[i] = info. Is there a more efficient way to do that? One that won't involve a copy operation?
This is my current code:
struct CharacterInfo {
  QChar character;
  int occurrences;
  double frequency;
};
std::vector<CharacterInfo> characterInfos;
// Some code to populate the vector
for (unsigned i = 0; i < characterInfos.size(); i++) {
  CharacterInfo info = characterInfos[i];
  info.frequency = (double)info.occurrences / (double)totalOccurrences;
  characterInfos[i] = info; // how to avoid this?
}
The simplest way which doesn't change too much of your code is just to use a reference instead of an instance. So:
SomeStruct & info = myVector[i];
The next easiest way is to change from using a loop with an index, so like:
for (std::vector<SomeStruct>::iterator it = myVector.begin(); it != myVector.end(); ++it)
{
  SomeStruct & info = *it;
  // do stuff here
}
With the STL you can go even further, especially if you have a C++11 capable compiler, for instance:
std::for_each(std::begin(myVector), std::end(myVector), [](SomeStruct & info) { /* do stuff here */ });
Also not related to your question directly, but if you add a method to the struct that computes the frequency, the code becomes much cleaner, for instance following from the last example you could do:
std::for_each(std::begin(myVector), std::end(myVector), std::mem_fn(&SomeStruct::calculateFrequency));
This will also work without a C++11 compiler if you replace std::begin(myVector) with myVector.begin() (and the same for end) and use std::mem_fun_ref instead of std::mem_fn.
You can use a reference:
CharacterInfo& info = characterInfos[i];
info.frequency = (double)info.occurrences / (double)totalOccurrences;
The reference info is bound to the element of your vector. If you change it, you change the vector element too.
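Here is a self-contained sketch of the difference between the copy and the reference. It makes two assumptions that are not in the question: a plain char stands in for QChar so that no Qt headers are needed, and the function names computeFrequencies and computeFrequenciesOnCopies are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CharacterInfo {
    char character;   // assumption: plain char instead of QChar
    int occurrences;
    double frequency;
};

// Updates every element in place through a reference; nothing is copied.
void computeFrequencies(std::vector<CharacterInfo>& infos, int totalOccurrences) {
    assert(totalOccurrences != 0);
    for (std::size_t i = 0; i < infos.size(); ++i) {
        CharacterInfo& info = infos[i];  // alias for the element itself
        info.frequency = static_cast<double>(info.occurrences) / totalOccurrences;
    }
}

// For contrast: works on a copy of each element, so the vector stays unchanged.
void computeFrequenciesOnCopies(std::vector<CharacterInfo>& infos, int totalOccurrences) {
    assert(totalOccurrences != 0);
    for (std::size_t i = 0; i < infos.size(); ++i) {
        CharacterInfo info = infos[i];   // copy; changes are lost at end of scope
        info.frequency = static_cast<double>(info.occurrences) / totalOccurrences;
    }
}
```

After computeFrequencies the frequency fields in the vector are filled in; after computeFrequenciesOnCopies they still hold their old values, which is exactly the behaviour described in the question.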
You could iterate through the vector with an STL iterator:
for (std::vector<CharacterInfo>::iterator it = characterInfos.begin();
     it != characterInfos.end(); ++it) {
  it->frequency = (double)it->occurrences / totalOccurrences;
}
In the loop, it is an iterator that has basically the same functionality and interface as a pointer to a CharacterInfo struct: http://cplusplus.com/reference/std/iterator/RandomAccessIterator/
Looping with an iterator is the more idiomatic way of iterating through each element of a std::vector if you don't need to know the index of each element.
I am not sure I understand your question but I think you are trying to do this?
for (unsigned i = 0; i < characterInfos.size(); i++) {
  characterInfos[i].frequency = (double)characterInfos[i].occurrences / (double)totalOccurrences;
}
Another option would be to use iterators:
for (std::vector<CharacterInfo>::iterator it = characterInfos.begin(); it != characterInfos.end(); ++it) {
  it->frequency = (double)it->occurrences / (double)totalOccurrences;
}
Wow, this is a very old question. For "newer" C++, the same can be done with a range-based for loop (since C++11):
for (auto &characterInfo : characterInfos) {
  characterInfo.frequency = characterInfo.occurrences / static_cast<double>(totalOccurrences);
}

Delete a loop with the eraseFromParent command in LLVM

I would like to delete a loop. I used the following code:
cout << "begin to delete loop" << endl;
for (Loop::block_iterator bi = L->block_begin(), bi2; bi != L->block_end(); bi = bi2) {
  bi2 = bi;
  bi2++;
  BasicBlock * BB = *bi;
  for (BasicBlock::iterator ii = BB->begin(), ii2; ii != BB->end(); ii= ii2) {
    ii2 = ii;
    ii2++;
    Instruction *inst = ii;
    inst->eraseFromParent();
  }
  BB->eraseFromParent();
}
But I get the following error:
Use still stuck around after Def is destroyed: %t1 = icmp sle i32 %t0, 9
opt: /home/llvm/src/lib/VMCore/Value.cpp:75: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.
0 opt 0x0848e569
Stack dump:
What suggestions do you have for solving this problem?
The solution to your problem is as follows:
Make sure that every instruction in the loop drops all its references, then simply erase all the BasicBlocks of the loop.
Here is my sample code:
for (Loop::block_iterator block = CPLoop->block_begin(), end = CPLoop->block_end(); block != end; block++) {
  BasicBlock * bb = *block;
  for (BasicBlock::iterator II = bb->begin(); II != bb->end(); ++II) {
    Instruction * insII = &(*II);
    insII->dropAllReferences();
  }
}
for (Loop::block_iterator block = CPLoop->block_begin(), end = CPLoop->block_end(); block != end; block++) {
  BasicBlock * bb = *block;
  bb->removeFromParent();
}
I hope this helps
What I write is only a guess, because I am just starting with LLVM, but I hope it will be helpful.
In SSA form each instruction:
1) uses values provided by previously executed instructions
2) provides a value (which is the result of executing this instruction), which is used by others.
Those are called use-def and def-use chains.
If you try to remove an instruction whose result (a.k.a. the "provided Value") is used by other instructions, then you break the instruction chain.
You might be interested in iterating over the users of the instruction you remove; see the LLVM Programmer's Manual section "Iterating over def-use & use-def chains". That way you can iterate over the users (u) of the value provided by the instruction you want to remove (inst) and change their reference to another value (like inst: add u v --> add X v). Once you have made sure no one is using the instruction you want to remove, remove it. (Depending on whether analysis passes have already run, you might be required to let the LLVM pass manager know that the CFG analysis needs to be updated, unless you update it yourself.)
You are invalidating the iterator with the call to
inst->eraseFromParent();
Store all Instruction* in a std::vector or similar and batch delete them at the end of your pass.
This should solve your problem.
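The collect-first, delete-later pattern can be sketched with standard containers. This is only an analogy, not LLVM code: a std::list of ints stands in for the instruction list, and eraseNegativesDeferred is a hypothetical name. In a real pass the collected items would be Instruction* and the second phase would call eraseFromParent() on each.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Removes all negative values in two phases so that the container is never
// modified while it is being traversed.
std::list<int> eraseNegativesDeferred(std::list<int> values) {
    // Phase 1: only remember what should be deleted.
    std::vector<std::list<int>::iterator> doomed;
    for (std::list<int>::iterator it = values.begin(); it != values.end(); ++it) {
        if (*it < 0) {
            doomed.push_back(it);
        }
    }
    // Phase 2: delete. Erasing one list node leaves iterators to all other
    // nodes valid, so every collected iterator is still usable here.
    for (std::list<int>::iterator victim : doomed) {
        values.erase(victim);
    }
    return values;
}
```

For the input {1, -2, 3, -4} the function returns {1, 3}. The traversal loop never touches an element that has already been erased, which is the property the batch-delete advice relies on.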
There is an alternative solution for "deleting" a loop: just permanently disable it, i.e. modify the IR code from something like this:
...
  br label %loop
loop:
  <loop body>
  br i1 %exitcond, label %exit, label %loop
exit:
...
to something like this:
...
  br i1 0, label %loop, label %exit
loop:
  <loop body>
  br i1 %exitcond, label %exit, label %loop
exit:
...
You will probably run optimizations (like dead code elimination) on your generated IR anyways, so why fight with all the references to the loop (e.g. in LoopInfos or ValueMaps)?