LLVM IR: getting the value of an address

I'm trying to write a LLVM pass to analyse the following IR:
%d = alloca i32, align 4
store i32 0, i32* %d, align 4
%1 = load i32* %d, align 4
%2 = add nsw i32 %1, 2
store i32 %2, i32* %d, align 4
What I need to do is to figure out the final value of d.
For store i32 0, i32* %d, align 4, I cast operand 0 to ConstantInt and found the value assigned to d (which is 0). But I'm struggling to find the value of d in the last store instruction:
store i32 %2, i32* %d, align 4
As I understand it, %2 refers to the result of the instruction %2 = add nsw i32 %1, 2, and similarly for %1.
Do I need to backtrack for %2 to find the value of %2 or is there a simpler method for this?
EDIT:
Here is the code I have so far:
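#include <iostream>
#include <string>
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

void analyse(BasicBlock *BB)
{
    for (auto &I : *BB)
    {
        if (auto *SI = dyn_cast<StoreInst>(&I))
        {
            Value *v = SI->getValueOperand();      // operand 0: the value being stored
            Value *ptr = SI->getPointerOperand();  // operand 1: the address, e.g. %d
            if (auto *CI = dyn_cast<ConstantInt>(v))
            {
                int64_t value = CI->getSExtValue();
                // The address may be a global or an argument rather than an
                // Instruction, so take the name from the Value directly.
                std::string ope = ptr->getName().str();
                std::cout << ope << " = " << value << "\n";
            }
        }
    }
}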

The way to solve this is to backtrack. In this case:
store i32 %2, i32* %d, align 4
%2 = add nsw i32 %1, 2
%1 = load i32* %d, align 4
So you check whether the operand is an instruction and, if it is, check what kind of instruction it is (e.g. isa<BinaryOperator>(v), isa<LoadInst>(v) or isa<StoreInst>(v)), and then compute its value from that instruction's own operands.
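The backtracking described above can be sketched on a toy model of the question's basic block. This is a hypothetical, self-contained illustration rather than LLVM code: Operand, Inst, evalOperand, lastStoredValue and finalValue are made-up names. In a real pass the same recursion would use dyn_cast<ConstantInt>, dyn_cast<BinaryOperator> and dyn_cast<LoadInst> on the store's value operand. The sketch assumes straight-line code in a single basic block; loops, branches or pointer aliasing would need a proper dataflow analysis.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy IR (not LLVM's classes). An operand is either an integer
// constant or a reference, by index, to the instruction that defines it.
struct Operand {
    bool isConst;
    int64_t constVal;  // meaningful when isConst
    int defIdx;        // index of the defining instruction otherwise
};

Operand C(int64_t v) { return {true, v, -1}; }
Operand Ref(int idx) { return {false, 0, idx}; }

enum class Op { Alloca, Store, Load, Add };

struct Inst {
    Op op;
    std::vector<Operand> ops;  // Store: {value, ptr}; Load: {ptr}; Add: {lhs, rhs}
};

int64_t evalInst(const std::vector<Inst>& bb, int idx);

// Backtracking step: a constant is its own value; anything else is
// resolved by evaluating the instruction that defines it.
int64_t evalOperand(const std::vector<Inst>& bb, const Operand& o) {
    return o.isConst ? o.constVal : evalInst(bb, o.defIdx);
}

// Value of the most recent store to the alloca `ptrIdx` before position `before`.
int64_t lastStoredValue(const std::vector<Inst>& bb, int ptrIdx, int before) {
    for (int j = before - 1; j >= 0; --j) {
        const Inst& s = bb[j];
        if (s.op == Op::Store && !s.ops[1].isConst && s.ops[1].defIdx == ptrIdx)
            return evalOperand(bb, s.ops[0]);
    }
    throw std::runtime_error("load before any store");
}

int64_t evalInst(const std::vector<Inst>& bb, int idx) {
    const Inst& I = bb[idx];
    switch (I.op) {
    case Op::Add:
        assert(I.ops.size() == 2);
        return evalOperand(bb, I.ops[0]) + evalOperand(bb, I.ops[1]);
    case Op::Load:
        assert(I.ops.size() == 1 && !I.ops[0].isConst);
        return lastStoredValue(bb, I.ops[0].defIdx, idx);
    default:
        throw std::runtime_error("instruction has no integer value");
    }
}

// Value held by the alloca at `allocaIdx` at the end of the block.
int64_t finalValue(const std::vector<Inst>& bb, int allocaIdx) {
    return lastStoredValue(bb, allocaIdx, static_cast<int>(bb.size()));
}

// The question's block:
//   0: %d = alloca i32          1: store i32 0, i32* %d
//   2: %1 = load i32* %d        3: %2 = add nsw i32 %1, 2
//   4: store i32 %2, i32* %d
std::vector<Inst> questionBlock() {
    return {
        {Op::Alloca, {}},
        {Op::Store, {C(0), Ref(0)}},
        {Op::Load, {Ref(0)}},
        {Op::Add, {Ref(2), C(2)}},
        {Op::Store, {Ref(3), Ref(0)}},
    };
}
```

finalValue(questionBlock(), 0) walks back from the last store to %2, from %2 to the add, and from %1 to the load, which finds the earlier store of 0, so d ends up as 0 + 2.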

Related

Erasing redundant expression with llvm and local value numbering algorithm

So my C code is:
#include <stdio.h>
void main(){
int a, b,c, d;
b = 18, c = 112;
b = a - d;
d = a - d;
}
and part of its IR is:
%5 = load i32, i32* %1, align 4
%6 = load i32, i32* %4, align 4
%7 = sub nsw i32 %5, %6
store i32 %7, i32* %2, align 4
%8 = load i32, i32* %1, align 4
%9 = load i32, i32* %4, align 4
%10 = sub nsw i32 %8, %9
store i32 %10, i32* %4, align 4
I have implemented the LVN (local value numbering) algorithm to detect the redundant expression d = a - d. Now, to optimize it, I need to rewrite that instruction as d = b. I am not sure how to do that with LLVM, or how to manipulate the IR.
I am new to LLVM, so this might be a silly question, but I am really confused. Since LLVM works on IR, I understand that for d = a - d it first loads a and d, but the binary operation and the store instruction in the IR need to be changed so that %4 gets its value from %2. Can anyone check whether I am understanding this correctly, and tell me how I can manipulate the IR to optimize the code?
First of all, let's replace your example program with one that does not invoke undefined behaviour (due to accessing uninitialized variables), so that the UB does not confuse the issue:
void f(int a, int b, int c, int d){
b = a - d;
d = a - d;
// Code that uses b and d
}
(I've also removed the two assignments as they didn't have any effect and will disappear after mem2reg anyway.)
Now to actually answer your question: most optimizations run after the mem2reg pass, which promotes memory accesses to registers where possible. This matters because, unlike memory locations, each LLVM register is assigned at exactly one point in the program, so mem2reg puts the code into SSA form, which many optimizations require.
If we apply mem2reg to the example code, we get:
define void @f(i32, i32, i32, i32) #0 {
%5 = sub nsw i32 %0, %3
%6 = sub nsw i32 %0, %3
; Code that uses b and d
}
So now we'd apply your analysis to find out that %6 is equivalent to %5. With that information we can remove the definition of %6 and replace all occurrences of %6 with %5 (note that this would be more complicated if %5 and %6 were in different basic blocks, neither of which dominated the other). To do that, you can find all uses of %6 with the uses() method, which tells you which instructions use %6 and as which operand. Then you can just set that operand to refer to %5 instead.
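The rewrite just described can be sketched with a toy model of LLVM's def-use bookkeeping. The classes below are hypothetical and self-contained, not LLVM's. In a real pass, Value::replaceAllUsesWith performs this whole loop in one call, e.g. I6->replaceAllUsesWith(I5); followed by I6->eraseFromParent(); to delete the now-dead instruction (I5 and I6 being hypothetical Instruction pointers for %5 and %6).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of LLVM's def-use bookkeeping; not the real classes.
struct Value;

// One use: `user` has this value as operand number `operandNo`.
struct Use {
    Value* user;
    unsigned operandNo;
};

struct Value {
    std::string name;
    std::vector<Value*> operands;  // what this instruction reads
    std::vector<Use> uses;         // who reads this value, and in which operand slot

    explicit Value(std::string n) : name(std::move(n)) {}

    void addOperand(Value* v) {
        v->uses.push_back({this, static_cast<unsigned>(operands.size())});
        operands.push_back(v);
    }

    // Re-point every operand slot that refers to `this` at `repl`,
    // moving the use records over so both use lists stay accurate.
    void replaceAllUsesWith(Value* repl) {
        assert(repl != this && "cannot replace a value with itself");
        for (const Use& u : uses) {
            u.user->operands[u.operandNo] = repl;
            repl->uses.push_back(u);
        }
        uses.clear();
    }
};

// %5 = sub %0, %3 and %6 = sub %0, %3; %7 stands in for code that uses d (%6).
bool demoReplaceSixWithFive() {
    Value a("%0"), d("%3"), v5("%5"), v6("%6"), user("%7");
    v5.addOperand(&a); v5.addOperand(&d);
    v6.addOperand(&a); v6.addOperand(&d);
    user.addOperand(&v6); user.addOperand(&v6);
    v6.replaceAllUsesWith(&v5);
    return user.operands[0] == &v5 && user.operands[1] == &v5 &&
           v6.uses.empty() && v5.uses.size() == 2;
}
```

After the call, %7 reads %5 in both operand slots and %6 has no remaining uses, which is the state in which erasing %6 is safe.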

Address Problems in LLVM Interpreter

I'm using the LLVM interpreter. I tried to find the values of variables by using ExecutionEngine.cpp and Execution.cpp. I can find the current values, but I have another problem: the binary operations of different instructions show the same operand addresses. I think they are temporary addresses, but why the same ones? I need them to be different to get some results.
To be clearer:
Here are two instructions from different basic blocks, together with the interpreter information I got for them.
First Instruction is:
About to interpret: %BB1 = add i32 %4, 1
Basic Block Name: CBB1
arg0: %4 = load i32, i32* @BB1
arg1: i32 1
visitBinaryOperator
Source1 Current Input ::: 10
Source1 Current Address ::: 0x7fff4582cd90
Source1 [ 0, 41 ]
Source1 Current Input::: 1
Source1 Current Input ::: 10
Sourc2 Current Address::: 0x7fff4582cdc0
Second Instruction is:
About to interpret: %10 = add nsw i32 %9, %8
BasicBlock Name: CBB2
arg0: %9 = load i32, i32* %sum, align 4
arg1: %8 = load i32, i32* %7, align 4
visitBinaryOperator
Source1 Current Input ::: 0
Source1 Current Address ::: 0x7fff4582cd90
Source2 Current Input ::: 0
Source2 Current Address::: 0x7fff4582cdc0
Part of basic block CBB2:
%8 = load i32, i32* %7, align 4
%9 = load i32, i32* %sum, align 4
%10 = add nsw i32 %9, %8
store i32 %10, i32* %sum, align 4
I need to do some analysis using the current and previous values of each instruction's inputs and outputs.
I got the values in the function Interpreter::visitBinaryOperator:
ExecutionContext &SF = ECStack.back();
Type *Ty = I.getOperand(0)->getType();
GenericValue Src1 = getOperandValue(I.getOperand(0), SF);
GenericValue Src2 = getOperandValue(I.getOperand(1), SF);
GenericValue R; // Result
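Then I get the addresses y1, y2, r and the values v1, v2, rv:
// y1, y2 and r point into the local GenericValue objects Src1, Src2 and R:
// for integers of up to 64 bits, APInt::getRawData() returns the address of
// the word stored inline in the APInt, i.e. a stack address in this call's
// frame, which can be the same on every call.
uint8_t *y1 = reinterpret_cast<uint8_t *>(
    const_cast<uint64_t *>(Src1.IntVal.getRawData()));
int v1 = *reinterpret_cast<int *>(y1);
uint8_t *y2 = reinterpret_cast<uint8_t *>(
    const_cast<uint64_t *>(Src2.IntVal.getRawData()));
int v2 = *reinterpret_cast<int *>(y2);
// If this runs before the opcode switch has computed R, rv is not the result yet.
uint8_t *r = reinterpret_cast<uint8_t *>(
    const_cast<uint64_t *>(R.IntVal.getRawData()));
int rv = *reinterpret_cast<int *>(r);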
Another idea: if I can't solve the problem for the add instruction, could I use the load instructions before the add and the store instruction after it to get my results? I can already get their values individually, and they are valid, but I don't know how to connect them as the arguments of the binary operation.

How to know the type of a variable in an llvm code

Is there a way to find out the type of the variables in LLVM IR?
For example, I have the following code:
%i = alloca i32, align 4
store i32 1, i32* %i, align 4
%n = add i32 6, 1
br label %2
I want a function that returns the type of each of the variables %i, %n and %2, i.e. i32*, i32 and label respectively.
Any suggestions?
Type* var_type = cur_instruction->getType();
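The lines %i = alloca i32, align 4; store i32 1, i32* %i, align 4; and %n = add i32 6, 1 are instructions, and an instruction is itself the value it defines, so you can query its type with getType(): for %i that gives i32*, and for %n it gives i32 (a store defines no value, so its type is void).
%2 is a basic block and has label type. You can check whether a value is a basic block by using isa<BasicBlock>(V).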
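Checking whether a value is a basic block is done with isa<BasicBlock>(V), which relies on LLVM's hand-rolled RTTI rather than C++ dynamic_cast: each class supplies a static classof(const Value *) that tests a kind ID stored in the base class, and isa<T>, dyn_cast<T> and cast<T> defer to T::classof. The sketch below models that mechanism with hypothetical, stripped-down classes (not LLVM's headers); getTypeName() is a stand-in for getType() that simply returns a string.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical miniature of LLVM-style RTTI; not the real LLVM classes.
class Value {
public:
    enum ValueID { InstructionVal, BasicBlockVal };
    ValueID getValueID() const { return id_; }
    // Stand-in for getType(): the printed name of the value's type.
    const std::string& getTypeName() const { return type_; }
protected:
    Value(ValueID id, std::string type) : id_(id), type_(std::move(type)) {}
private:
    ValueID id_;
    std::string type_;
};

class Instruction : public Value {
public:
    explicit Instruction(std::string type) : Value(InstructionVal, std::move(type)) {}
    static bool classof(const Value* v) { return v->getValueID() == InstructionVal; }
};

class BasicBlock : public Value {
public:
    BasicBlock() : Value(BasicBlockVal, "label") {}  // basic blocks have label type
    static bool classof(const Value* v) { return v->getValueID() == BasicBlockVal; }
};

// isa<T> defers to T::classof; dyn_cast returns nullptr on a mismatch;
// cast asserts that the conversion is valid.
template <typename T> bool isa(const Value* v) { return T::classof(v); }

template <typename T> T* dyn_cast(Value* v) {
    return isa<T>(v) ? static_cast<T*>(v) : nullptr;
}

template <typename T> T* cast(Value* v) {
    assert(isa<T>(v) && "cast<T>() argument of incompatible type");
    return static_cast<T*>(v);
}
```

With Instruction i("i32*") modelling %i and BasicBlock bb modelling %2, isa<BasicBlock>(&bb) is true, isa<BasicBlock>(&i) is false, and bb.getTypeName() returns "label". Storing a kind ID and comparing it is why LLVM's checks stay cheap and work without compiler RTTI.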

How can you print instruction in llvm

From an LLVM pass, I need to print an LLVM instruction (of type llvm::Instruction) to the screen, exactly as it appears in the textual IR. My compilation is crashing before the bitcode file is generated, so for debugging I want to print some instructions to find out what is going wrong.
Simply use the print method. Assuming I is your instruction:
I.print(errs());
For a simple Hello World program, using C++'s range-based loops, you can do something like this:
for (auto &B : F) {
    for (auto &I : B) {
        errs() << I << "\n";
    }
}
This gives the output:
%3 = alloca i32, align 4
%4 = alloca i8**, align 8
store i32 %0, i32* %3, align 4
store i8** %1, i8*** %4, align 8
%5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i64 0, i64 0))
ret i32 0

Identify array type in IR

I have been trying to identify array access in IR by making use of following code:
for (BasicBlock::iterator ii = BB->begin(); ii != BB->end(); ++ii) {
    Instruction *I = &*ii;
    if (GetElementPtrInst *getElePntr = dyn_cast<GetElementPtrInst>(I))
    {
        Value *valAlloc = getElePntr->getOperand(0);
        if (valAlloc->getType()->isArrayTy())
        {
            errs() << "\tarray found";
        }
    }
}
This code identifies the getelementptr instruction, but it does not detect whether its first operand is an array type. What is wrong with my code?
The first operand of a GEP (getelementptr instruction) is a pointer, not an array. That pointer may point to an array, or it may not (see below). So you need to look what this pointer points to.
Here's a sample BasicBlockPass visitor:
virtual bool runOnBasicBlock(BasicBlock &BB) {
    for (BasicBlock::iterator ii = BB.begin(), ii_e = BB.end(); ii != ii_e; ++ii) {
        if (GetElementPtrInst *gep = dyn_cast<GetElementPtrInst>(&*ii)) {
            // Dump the GEP instruction
            gep->dump();
            Value* firstOperand = gep->getOperand(0);
            Type* type = firstOperand->getType();
            // Figure out whether the first operand points to an array
            if (PointerType *pointerType = dyn_cast<PointerType>(type)) {
                Type* elementType = pointerType->getElementType();
                errs() << "The element type is: " << *elementType << "\n";
                if (elementType->isArrayTy()) {
                    errs() << " .. points to an array!\n";
                }
            }
        }
    }
    return false;
}
Note, however, that many "arrays" in C/C++ are actually pointers so you may not get the array type where you expect.
For example, if you compile this code:
int main(int argc, char **argv) {
    return (int)argv[1][8];
}
You get the IR:
define i32 @main(i32 %argc, i8** %argv) nounwind uwtable {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i8**, align 8
store i32 0, i32* %1
store i32 %argc, i32* %2, align 4
store i8** %argv, i8*** %3, align 8
%4 = load i8*** %3, align 8
%5 = getelementptr inbounds i8** %4, i64 1
%6 = load i8** %5
%7 = getelementptr inbounds i8* %6, i64 8
%8 = load i8* %7
%9 = sext i8 %8 to i32
ret i32 %9
}
Although argv is used like an array, the compiler treats it as a pointer, so there is no array type in sight. The pass I pasted above won't recognize an array here, because the base operands of the two GEPs are a pointer to a pointer (i8**) and a pointer to i8 (i8*), never a pointer to an array.
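The difference between the question's check and the answer's can be modelled with a toy type system. This is a hypothetical sketch of LLVM's older typed-pointer model, not LLVM's Type classes: a pointer type records its pointee, so the answer looks through the pointer instead of testing the pointer itself. (In recent LLVM releases, which use opaque pointers, a pointer type no longer records a pointee and PointerType::getElementType() is gone; GetElementPtrInst::getSourceElementType() reports the type the GEP indexes into instead.)

```cpp
#include <cassert>

// Hypothetical miniature of LLVM's older typed-pointer type system.
struct Type {
    enum Kind { Int, Pointer, Array };
    Kind kind;
    const Type* elem;  // pointee (Pointer) or element type (Array); unused for Int
    unsigned n;        // bit width (Int) or element count (Array)
    bool isArrayTy() const { return kind == Array; }
    bool isPointerTy() const { return kind == Pointer; }
};

// The question's test: is the GEP's base operand itself an array type?
// Always false, because a GEP's base operand has pointer type.
bool baseIsArray(const Type& base) { return base.isArrayTy(); }

// The answer's test: look through the pointer at its pointee.
bool basePointsToArray(const Type& base) {
    assert(base.isPointerTy() && "a GEP base operand is a pointer");
    return base.elem != nullptr && base.elem->isArrayTy();
}
```

For example, a base of type [15 x i8]* (like @.str in the printf call) points to an array, while i8** (argv) and i8* do not, which is exactly why the pass finds nothing in the argv example.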