Traversal of LLVM Operands

Using a ModulePass, my goal is to traverse an SSA graph upwards. Starting from one statement with 0..2 operands (most opcodes fall under that), I want to find out two things:
Is an operand metadata / a constant (easy: just try casting to the Constant type) or a variable?
If it is a variable, which statement defines it (as LLVM IR is in SSA form, this is a well-formed query), so that I can recursively continue the traversal?
As an example, assume the following LLVM IR:
define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
entry:
%tmp = mul i32 %x, %y
%tmp2 = add i32 %tmp, %z
ret i32 %tmp2
}
I will be starting with the return statement. Now I want to find out what I'm returning:
I will get myself the operands of the return statement
I detect that the operand is a variable called %tmp2
I will get myself the operands of the statement where %tmp2 is defined
I will first traverse the first operand, %tmp
(...)
%x is a function parameter and therefore probably the end of my traversal (or is it?)
(... continue with the other branches in this DFS ...)
How would I use the C++ API to implement those steps?

The solution is simple, and I will describe it as generically as possible.
Each operand of an llvm::Instruction is of the supertype llvm::Value, and llvm::Instruction is itself a subtype of Value. This allows recursion via the following two functions:
bool runOnOperand(llvm::Value *operand); // forward declaration for the mutual recursion
bool runOnInstruction(llvm::Instruction *instruction) {
    bool flag = false;
    // Each operand is an llvm::Use; get() yields the underlying llvm::Value*.
    for (llvm::Use &operand : instruction->operands()) {
        printOperandStats(operand.get()); // the author's own helper, not shown here
        flag = runOnOperand(operand.get()) | flag;
    }
    return flag;
}
bool runOnOperand(llvm::Value *operand) {
    operand->printAsOperand(errs(), true);
    // ... do something else ...
    // Only instructions have operands of their own; arguments and constants end the walk.
    auto *instruction = dyn_cast<llvm::Instruction>(operand);
    if (nullptr != instruction) {
        return runOnInstruction(instruction);
    } else {
        return false;
    }
}
This equals a DFS as required by the question. Each Operand will be cast to an Instruction and will be recursively analyzed. The boolean return value is used as usual for LLVM Passes: a value of true describes a modification to the IR.
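One caveat worth sketching: once the IR contains loops, phi nodes can make an operand chain lead back to an instruction that was already visited, so a plain recursive walk may not terminate on such input. The following is a minimal, self-contained model of the same DFS with a visited set. The Node struct and the traverse function are hypothetical stand-ins for llvm::Value and the pass code, not LLVM API; in a real pass one would presumably key the set on llvm::Value pointers instead.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm::Value: a name plus pointers to its operands.
// Leaves (function arguments, constants) simply have no operands.
struct Node {
    std::string name;
    std::vector<Node *> operands;
};

// Depth-first walk up the operand edges. The visited set guarantees that
// every node is handled at most once, even if the graph contains a cycle.
void visit(Node *node, std::unordered_set<Node *> &visited,
           std::vector<std::string> &order) {
    if (!visited.insert(node).second)
        return; // already seen, do not recurse again
    order.push_back(node->name);
    for (Node *operand : node->operands)
        visit(operand, visited, order);
}

// Returns the node names in the order the DFS reaches them.
std::vector<std::string> traverse(Node *root) {
    std::unordered_set<Node *> visited;
    std::vector<std::string> order;
    visit(root, visited, order);
    return order;
}
```

Building the mul_add example as nodes (ret uses %tmp2, which uses %tmp and %z, and %tmp uses %x and %y) and calling traverse on the ret node visits each value once, first operand first, which matches the order described in the question.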

Related

LLVM How to get return value of an instruction

I have a program which allocates memory from stack like this:
%x = alloca i32, align 4
In my pass I want to get the actual memory pointer that points to this allocated memory at runtime. This should be %x. How do I get the pointer in my pass?
Instruction* I;
if (AllocaInst* AI = dyn_cast<AllocaInst>(I)) {
//How to get %x?
}

You can treat an Instruction* as a Value* (Instruction inherits from Value); used that way, it denotes the result / return value of that instruction. In your snippet, AI itself is therefore %x. I have adapted some code from my LLVM pass to demonstrate allocating space using alloca and then storing into that location. Notice that the results of the instructions can be passed directly to other instructions, as they are values.
// M is the module
// ci is the current instruction
LLVMContext &ctx = M.getContext();
Type* int32Ty = Type::getInt32Ty(ctx);
Type* int8Ty = Type::getInt8Ty(ctx);
Type* voidPtrTy = int8Ty->getPointerTo();
// Get an identifier for rand()
Constant* getRand = M.getOrInsertFunction("rand", FunctionType::get(int32Ty, false));
// Construct the struct and allocate space
Type* strTy[] = {int32Ty, voidPtrTy};
Type* t = StructType::create(strTy);
Instruction* nArg = new AllocaInst(t, "Wrapper Struct", ci);
// Add Store insts here
Value* gepArgs[2] = {ConstantInt::get(int32Ty, 0), ConstantInt::get(int32Ty, 0)};
Instruction* prand = GetElementPtrInst::Create(NULL, nArg, ArrayRef<Value*>(gepArgs, 2), "RandPtr", ci);
// Get a random number
Instruction* tRand = CallInst::Create(getRand, "", ci);
// Store the random number into the struct
Instruction* stPRand = new StoreInst(tRand, prand, ci);

If you want to store to or load from %x, you just use a store or load instruction.
If you want the numeric value of your pointer, use the ptrtoint instruction.

Outputting input's constant char array from llvm pass

Hi all,
I want to know how an LLVM pass can print a constant char array defined in the input source. Here is an example of what I want to do.
Test input source
char* msg = "hello, world\n";
void msg_out(char * in) {
printf("msg: %s \n", in);
}
main () {
...
msg_out(msg);
...
}
llvm pass snippet
...
const CallInst* ci = dyn_cast<CallInst>(val);
const Function* func = ci->getCalledFunction();
if (func->getName() == "msg_out") {
errs() << ci->getOperand(0);
}
...
With the source, the above llvm pass would print the following output.
output
i8* getelementptr inbounds ([8 x i8]* @10, i32 0, i32 0)
However, what I want to implement instead is
identify the 1st argument is a constant character array
if so, print out "hello, world\n"
Can anyone let me know how to implement this?
Thanks a lot for your help in advance!
/Kangkook

First of all, the first argument isn't a constant character array; it's a pointer to one, hence the getelementptr (gep). In any case, the proper way to do this is to take the gep's pointer operand, verify it's a global, then get its initializer. In your case (and since the gep is actually a constant expression), it should look like this:
Value* op0 = ci->getOperand(0);
// The gep is a constant expression here, so match ConstantExpr and check its opcode.
if (ConstantExpr* gep = dyn_cast<ConstantExpr>(op0)) {
    if (gep->getOpcode() == Instruction::GetElementPtr) {
        if (GlobalVariable* global = dyn_cast<GlobalVariable>(gep->getOperand(0))) {
            // getInitializer() is only valid when the global actually has an initializer.
            if (global->hasInitializer()) {
                if (ConstantDataArray* array = dyn_cast<ConstantDataArray>(global->getInitializer())) {
                    if (array->isCString()) return array->getAsCString();
                }
            }
        }
    }
}

How to Insert a LLVM Instruction?

I've been searching for hours and I can't find anything that could help me. I'm working on a project that involves a FunctionPass. I've implemented a runOnFunction(Function &f) method and that's working fine. Basically it needs to:
1) Detect a store instruction
2) Convert the memory address of the store instruction to an Integer
3) Alter the integer using a bitwise AND operation (0000FFFF)
4) Convert the integer back into the pointer
So far I've got the following:
virtual bool runOnFunction(Function &F) {
for (Function::iterator bb = F.begin(), bbe = F.end(); bb != bbe; ++bb) {
BasicBlock& b = *bb;
for (BasicBlock::iterator i = b.begin(), ie = b.end(); i != ie; ++i) {
if(StoreInst *si = dyn_cast<StoreInst>(&*i)) {
PtrToIntInst* ptrToInt = new PtrToIntInst(si->getPointerOperand(), IntegerType::get(si->getContext(), 32), "", si);
}
}
}
return true;
}
I can't for the life of me figure out how to actually insert the instruction, or even find a way to create an AND instruction. If anyone could point me in the right direction, that would be great.
Thanks in advance.

I recommend taking a look at the Programmer's Manual - it has a pretty decent coverage of the basics.
In particular, there's a section about creating and inserting new instructions. The simplest way is just to provide an existing instruction as the last argument for the new instruction's constructor, which will then insert that instruction immediately before the existing one.
Alternatively, you can pass the enclosing basic block if you just want to add to its end (but remember you need to take care of the terminator!). Finally, you can just call getInstList() on the enclosing basic block, then insert or push_back to insert new instructions there.
As an aside, you don't have to iterate over all blocks and then over all instructions in each, you can just iterate over the instructions directly; see the section about the instruction iterator in the programmer's manual.

virtual bool runOnFunction(Function &F) {
for (Function::iterator bb = F.begin(), bbe = F.end(); bb != bbe; ++bb) {
BasicBlock &b = *bb;
for (BasicBlock::iterator i = b.begin(), ie = b.end(); i != ie; ++i) {
if (StoreInst *si = dyn_cast<StoreInst>(&*i)) {
IRBuilder<> Builder(si);
Value *StoreAddr = Builder.CreatePtrToInt(si->getPointerOperand(), Builder.getInt32Ty());
Value *Masked = Builder.CreateAnd(StoreAddr, 0xffff);
Value *AlignedAddr = Builder.CreateIntToPtr(Masked, si->getPointerOperand()->getType());
// ...
}
}
}
return true;
}

You can use an IRBuilder to easily insert new instructions before another instruction or at the end of a basic block.
Alternatively, if you need to insert an instruction after another one, you need to use the instruction list in the containing basic block:
BasicBlock *pb = ...;
Instruction *pi = ...;
Instruction *newInst = new Instruction(...);
pb->getInstList().insertAfter(pi, newInst);
Code and solution taken from here.
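To see what the inserted ptrtoint / and / inttoptr sequence computes at run time, here is a rough plain C++ analogue. It is a sketch under assumptions: addresses are modelled as 64-bit integers, and the function names are made up for illustration. It also shows a side effect of converting to a 32-bit integer type on a 64-bit target: ptrtoint truncates the address to its low 32 bits, and inttoptr zero-extends the result again.

```cpp
#include <cassert>
#include <cstdint>

// Models `ptrtoint ... to i32`: only the low 32 bits of the address survive.
std::uint32_t ptrToInt32(std::uint64_t address) {
    return static_cast<std::uint32_t>(address);
}

// Models `and i32 %addr, 65535` (0xffff): keeps the low 16 bits.
std::uint32_t maskLow16(std::uint32_t value) {
    return value & 0xFFFFu;
}

// Models `inttoptr i32 ... to ptr`: the 32-bit value is zero-extended.
std::uint64_t intToPtr(std::uint32_t value) {
    return static_cast<std::uint64_t>(value);
}

// The whole chain, in the order the pass inserts the instructions.
std::uint64_t maskedAddress(std::uint64_t address) {
    return intToPtr(maskLow16(ptrToInt32(address)));
}
```

For an address such as 0x00007ffd12345678 the chain yields 0x5678. Note that the pass above only computes the new pointer; the store still uses its original address until its pointer operand is replaced.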

Does this qualify as tail recursion? [duplicate]

I'm new to tail recursion in c++. My project requires I make all my functions tail recursive. I've tested the following code and it works correctly. However, I'm not sure if how I've done it qualifies as tail recursion.
static int sum_helper(list_t hList, int accumulator){
if (list_isEmpty(hList))
return accumulator;
else {
accumulator += list_first(hList);
hList = list_rest(hList);
return sum_helper(hList, accumulator);
}
}
int sum(list_t list){
/*
// EFFECTS: returns the sum of each element in list
// zero if the list is empty.
*/
if (list_isEmpty(list))
return 0;
return sum_helper(list, 0);
}
Thanks!

In short, you don't do anything after the recursive call (sum_helper). This means that you never need to return to the caller, and thus, you can throw away the stack frame of the caller.
Take the example of the normal factorial function
int fact(int x)
{
if(x == 0)
return 1;
else
return x * fact(x-1);
}
This is not tail recursive since the value of fact(x-1) needs to be returned and then multiplied by x. Instead, we can cheat a little, and pass an accumulator too. See this:
int fact(int x, int acc)
{
if(x == 0)
return acc; // Technically, acc * 1, but that's the identity anyway.
else
return fact(x-1, acc*x);
}
Here, the last function call in the control flow is fact(x-1, acc*x). Afterwards, we don't need the return value of the called function for anything else, hence we don't need to return to the current frame. For this reason, we can throw away the stack frame and apply other optimisations.
Disclaimer: I've probably applied the factorial algorithm wrong, but you get the gist. Hopefully.
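To make the "throw away the stack frame" point concrete, here is a small sketch that puts the accumulator version next to the loop it is equivalent to. The names fact_acc and fact_loop are made up for this illustration. Keep in mind that C++ does not guarantee this transformation; a compiler may perform it when optimising, but is not required to.

```cpp
#include <cassert>

// Tail-recursive factorial: the recursive call is the very last action,
// so nothing in the current frame is needed once the call is made.
int fact_acc(int x, int acc) {
    if (x == 0)
        return acc;
    return fact_acc(x - 1, acc * x);
}

// The loop an optimising compiler may turn the function above into:
// the parameters become local variables that are updated in place.
int fact_loop(int x) {
    int acc = 1;
    while (x != 0) {
        acc = acc * x;
        x = x - 1;
    }
    return acc;
}
```

Calling fact_acc(5, 1) and fact_loop(5) both give 120; the initial accumulator must be 1, the identity for multiplication.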

It's tail-recursion provided list_t doesn't have a non-trivial destructor. If it does have a non-trivial destructor, the destructor needs to run after the recursive call returns and before the function itself returns.
Bonus:
int sum(list_t hList, int accumulator = 0) {
return list_isEmpty(hList)
? accumulator
: sum(list_rest(hList), accumulator + list_first(hList));
}
But tastes vary; some people might like yours more.

From a theoretical point of view, yes, it's tail recursion (provided that hList does not have a nontrivial destructor). But from a practical point of view, it depends on your compiler and its settings. Let's take a look at the assembly generated for this simple code:
#include <cstdlib>
struct list{
int head;
list * tail;
};
int sum_helper(list * l, int accumulator){
if (l == NULL)
return accumulator;
else {
accumulator += l->head;
return sum_helper(l->tail, accumulator);
}
}
Optimisations ON : (g++ -O2 ..., boring part omitted):
testq %rdi, %rdi
movl %esi, %eax
je .L2
...
.L6:
...
jne .L6 <-- loop
.L2:
rep
ret
This is clearly a loop. But when you disable optimisations, you get:
_Z10sum_helperP4listi:
.LFB6:
...
jne .L2
movl -12(%rbp), %eax
jmp .L3
.L2:
...
call _Z10sum_helperP4listi <-- recursion
.L3:
leave
.cfi_def_cfa 7, 8
ret
Which is recursive.

delete loop by eraseFromParent command in llvm

I would like to delete a loop. I used the following code:
cout << "begin to delete loop" << endl;
for (Loop::block_iterator bi = L->block_begin(), bi2; bi != L->block_end(); bi = bi2) {
bi2 = bi;
bi2++;
BasicBlock * BB = *bi;
for (BasicBlock::iterator ii = BB->begin(), ii2; ii != BB->end(); ii= ii2) {
ii2 = ii;
ii2++;
Instruction *inst = ii;
inst->eraseFromParent();
}
BB->eraseFromParent();
}
But I get the following error:
Use still stuck around after Def is destroyed: %t1 = icmp sle i32 %t0, 9
opt: /home/llvm/src/lib/VMCore/Value.cpp:75: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.
0 opt 0x0848e569
Stack dump:
What suggestions do you have for solving this problem?

The solution to your problem is as follows:
Make sure that each instruction in the loop drops all its references, then simply remove all the BasicBlocks of the loop.
Here is my sample code:
for (Loop::block_iterator block = CPLoop->block_begin(), end = CPLoop->block_end(); block != end; block++) {
BasicBlock * bb = *block;
for (BasicBlock::iterator II = bb->begin(); II != bb->end(); ++II) {
Instruction * insII = &(*II);
insII->dropAllReferences();
}
}
for (Loop::block_iterator block = CPLoop->block_begin(), end = CPLoop->block_end(); block != end; block++) {
BasicBlock * bb = *block;
bb->removeFromParent();
}
I hope this helps

What I write is only a guess, because I am just starting with LLVM, but I hope it will be helpful.
In SSA form each instruction:
uses values provided by previously executed instructions
provides a value (which is the result of executing this instruction), which is used by others.
Those are called use-def and def-use chains.
If you try to remove an instruction whose result (a.k.a. "provided Value") is used by other instructions, then you break the instruction chain.
You might be interested in iterating over the users of the instruction you remove; see the LLVM Programmer's Manual section "Iterating over def-use & use-def chains".
Thanks to that, you can iterate over the users (u) of the value provided by the instruction you want to remove (inst) and change their reference to another one (like inst: add u v --> add X v). Once you have made sure that no one is using the instruction you want to remove, remove it. (Depending on whether analysis passes have already run, you might be required to let the LLVM pass manager know that the CFG analysis needs to be updated, unless you update it yourself.)
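The idea of redirecting all users before removing an instruction can be sketched with a toy model. The Inst struct and the two helper functions below are hypothetical and only mimic the bookkeeping; in LLVM itself the corresponding operation is Value::replaceAllUsesWith.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy instruction: operands are the use-def edges, users the def-use edges.
struct Inst {
    std::vector<Inst *> operands;
    std::vector<Inst *> users;
};

// Records that `user` uses the value produced by `def`, in both directions.
void addOperand(Inst &user, Inst &def) {
    user.operands.push_back(&def);
    def.users.push_back(&user);
}

// Redirects every use of `from` to `to`. Afterwards `from` has no users
// left, which is the precondition for deleting it safely.
void replaceAllUses(Inst &from, Inst &to) {
    for (Inst *user : from.users) {
        std::replace(user->operands.begin(), user->operands.end(), &from, &to);
        to.users.push_back(user);
    }
    from.users.clear();
}
```

In terms of the error message from the question: the icmp that defines %t1 still had a user when it was destroyed. Redirecting that user to some other value first, or making it drop its references, leaves the icmp with an empty user list.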

You are invalidating the iterator with the call to
inst->eraseFromParent();
Store all Instruction* in an std::vector or similar and batch delete them at the end of your pass.
This should solve your problem.
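The collect-first, delete-later pattern can be sketched on a standard container. Here a std::list of integers stands in for a basic block's instruction list, and "even value" stands in for "instruction to delete"; both are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Removes all even values in two phases, so the scanning loop never
// advances an iterator that has just been erased.
void eraseEvens(std::list<int> &values) {
    // Phase 1: only remember what should go.
    std::vector<std::list<int>::iterator> doomed;
    for (auto it = values.begin(); it != values.end(); ++it) {
        if (*it % 2 == 0)
            doomed.push_back(it);
    }
    // Phase 2: erase in a batch. Erasing a std::list element invalidates
    // only the iterator to that element, so the remaining ones stay valid.
    for (auto it : doomed)
        values.erase(it);
}
```

Applied to the list 1, 2, 3, 4, 5, 6 this leaves 1, 3, 5.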

There is an alternative solution for "deleting" a loop: just permanently disable it, i.e. modify the IR code from something like this:
...
br label %loop
loop:
<loop body>
br i1 %exitcond, label %exit, label %loop
exit:
...
to something like this:
...
br i1 0, label %loop, label %exit
loop:
<loop body>
br i1 %exitcond, label %exit, label %loop
exit:
...
You will probably run optimizations (like dead code elimination) on your generated IR anyways, so why fight with all the references to the loop (e.g. in LoopInfos or ValueMaps)?
