LLVM IR for addition and subtraction - llvm

I use "builder->CreateSub" and "builder->CreateAdd" to generate LLVM-IR for subtraction and addition.
left = this->builder->CreateAlloca(llvm::Type::getInt32Ty(*this->llvm_context), nullptr, std::string("a1"));
right = this->builder->CreateAlloca(llvm::Type::getInt32Ty(*this->llvm_context), nullptr, std::string("b2"));
builder->CreateSub(left, right, "sub");
builder->CreateAdd(left, right, "add");
However, the generated IR (shown below) cannot be compiled or interpreted by LLVM (lli):
%a1 = alloca i32, align 4
store i32 100, ptr %a1, align 4
%b2 = alloca i32, align 4
store i32 10, ptr %b2, align 4
%add = add ptr %a1, %b2
%sub = sub ptr %c3, i32 2
%sub1 = sub ptr %add, %sub
This gives an error on the ptr operands:
lli: lli: test.ll:14:14: error: invalid operand type for instruction
%add = add ptr %a1, %b2

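As pointed out by Nick Lewycky, a load instruction is needed to read the values from the stack slots that the allocas create: CreateAlloca returns a pointer (ptr), whereas add and sub require integer operands. I added a CreateLoad for each of the left and right variables above and used the loaded values as the operands: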
builder->CreateLoad(llvm::Type::getInt32Ty(*this->llvm_context), alloca);

Related

How to create instruction in function without basic block by LLVM C++ API?

I want to insert instructions into function without basic block, for example:
define void @_Z2f2v() nounwind {
%a = alloca i32, align 4
%b = alloca i32, align 4
store i32 2, i32* %a, align 4
%1 = load i32* %a, align 4
%2 = icmp sgt i32 %1, 0
ret void
}
But from reading the LLVM documentation, the only C++ APIs I can find are:
BasicBlock *bb = BasicBlock::Create(...);
irBuilder.SetInsertPoint(bb);
irBuilder.CreateXXXInst(...);
or
Instruction *inst = new XXXInst(..., Instruction *insertBefore);
Instruction *inst = new XXXInst(..., BasicBlock *insertAtEnd);
It seems that I must create a BasicBlock at the beginning of a function.
How can I create an instruction in a function without a BasicBlock using the C++ API?
That function contains exactly one basic block, not zero. To create a function like that, you add all of your instructions to the function's entry block.
As for creating an instruction in a function without a BasicBlock: you can't, neither with the C++ API nor in any other way. Every instruction has to be part of a basic block by definition.
Basic blocks are the nodes in the CFG, so if you had an instruction without a basic block, it would not be part of the CFG and could therefore never be executed, which would be pointless.

Find local variables in certain function llvm

Given a certain function in LLVM bitcode, how can I identify its local variables?
For example, in the following snippet from the GNU coreutils echo utility, I don't know how to find the variable do_v9 in the IR of main.
int main (int argc, char **argv)
{
bool display_return = true;
bool posixly_correct = getenv ("POSIXLY_CORRECT");
....
bool do_v9 = false;
}
I noticed that LLVM creates metadata for local variables, called DILocalVariable, where the variable is replaced with a number that starts with the letter i.
!686 = !DILocalVariable(name: "posixly_correct", scope: !678, file: !10, line: 114, type: !64)
!688 = !DILocalVariable(name: "do_v9", scope: !678, file: !10, line: 122, type: !64)
However, the IR of main contains neither the variable do_v9 nor its corresponding metadata !688, apart from the value beside the definition of the main function. My analysis loops over the instructions in the main function, but I don't know how to find this local variable during the iteration. I'm using LLVM 6.0.
; Function Attrs: nounwind uwtable
define i32 @main(i32, i8**) #9 !dbg !678 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca i8, align 1
%7 = alloca i8, align 1
%8 = alloca i8, align 1
%9 = alloca i8, align 1
%10 = alloca i32
%11 = alloca i8*, align 8
%12 = alloca i64, align 8
%13 = alloca i8*, align 8
%14 = alloca i8, align 1
%15 = alloca i8, align 1
If you want to identify a local variable from your source code in LLVM IR using the debug information emitted by the compiler, look at the calls to the @llvm.dbg.declare or @llvm.dbg.addr intrinsics in the IR. You will have one or the other, but not both, present once for each local variable in your function (llvm.dbg.addr was introduced as the intended replacement for llvm.dbg.declare in newer versions of LLVM). For example, if you have the following:
%1 = alloca i32, align 4
call void @llvm.dbg.addr(metadata i32* %1, metadata !2, metadata ...), !dbg ...
!2 = !DILocalVariable(name: "i", ...)
This tells us that local variable i corresponds to the stack location allocated by the alloca whose address is %1.
Note that the ... above just represents stuff we don't care about in this context.

Optimize add zero with llvm pass

int func(int i){
int j;
j = i + 0;
return j;
}
I want to practise and learn how to write an LLVM transformation pass.
For the simple C function above, I want to implement the algebraic identity optimization X+0 -> X.
I expect the optimized program to be
int func(int i){
int j;
j = i; // remove the add instruction
return j;
}
I read about the IRBuilder, which can create Add, Sub, Mul and many other instructions, but I cannot find anything in it that handles the case above.
What should I do to handle it?
I also wonder whether I can simply remove the instruction.
The program would then be
int func(int i){
return i;
}
I am not sure whether LLVM will do this automatically once I remove the useless add instruction.
Running clang -O0 -S -emit-llvm -o - test.c on your code produces the following IR:
define i32 @func(i32 %i) #0 {
entry:
%i.addr = alloca i32, align 4
%j = alloca i32, align 4
store i32 %i, i32* %i.addr, align 4
%0 = load i32, i32* %i.addr, align 4
%add = add nsw i32 %0, 0
store i32 %add, i32* %j, align 4
%1 = load i32, i32* %j, align 4
ret i32 %1
}
As you can see, there is an add nsw i32 %0, 0 instruction. This means that clang doesn't optimize it away immediately (at least at -O0), and this is the instruction our pass is going to process.
I'll omit boilerplate code that is required to add your own pass, as it is thoroughly described in the LLVM documentation.
The pass should do something like this (pseudo-code):
runOnFunction(Function& F)
{
    for(each instruction in F)
        if(isa<BinaryOperator>(instruction))
            if(instruction.getOpcode() == Instruction::Add)
                if(isa<ConstantInt>(instruction.getOperand(1)))
                    if(extract value from constant operand and check if it is 0)
                        // replace the uses of the result with operand 0 before erasing
                        instruction.eraseFromParent()
}
To implement the add-zero elimination,
the required steps are:
Find the instruction where y = x + 0
Replace ALL the uses of y with x
Record a pointer to the instruction
Erase it afterwards, once the iteration over the instructions is finished
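The four steps above can be sketched on a toy instruction list. This is plain C++ and deliberately not the LLVM API: ToyInst, kZero, removeAddZero and the operand encoding are hypothetical names invented for the illustration. In a real pass, step 2 would typically be a call to replaceAllUsesWith and step 4 a call to eraseFromParent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an SSA instruction; NOT the LLVM API.
// Operand >= 0: index of the instruction that defines the value.
// Operand <  0: an immediate; only the constant 0 (kZero) is used here.
struct ToyInst {
    std::string op;             // "arg", "add", "ret", ...
    std::vector<int> operands;
    bool erased;                // set in step 4 instead of shifting indices
};

const int kZero = -1;           // assumed encoding of the constant 0

// Removes every "x + 0" and returns how many instructions were erased.
int removeAddZero(std::vector<ToyInst>& code) {
    std::vector<std::size_t> dead;   // step 3: record now, erase later
    for (std::size_t i = 0; i < code.size(); ++i) {
        ToyInst& inst = code[i];
        // Step 1: find y = x + 0 (constant on the right-hand side).
        if (inst.op != "add" || inst.operands.size() != 2 ||
            inst.operands[1] != kZero)
            continue;
        const int x = inst.operands[0];
        assert(x != static_cast<int>(i) && "an add cannot use its own result");
        // Step 2: replace ALL uses of y with x.
        for (ToyInst& user : code)
            for (int& operand : user.operands)
                if (operand == static_cast<int>(i))
                    operand = x;
        dead.push_back(i);
    }
    // Step 4: erase only after the walk over the instructions is done.
    for (std::size_t i : dead)
        code[i].erased = true;
    return static_cast<int>(dead.size());
}
```

For the list "arg; add %0, 0; ret %1", the call rewrites the ret to use %0 directly and marks the add as erased. Recording the dead instructions first keeps the loop from modifying the container it is walking.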

Erasing redundant expression with llvm and local value numbering algorithm

So my C code is:
#include <stdio.h>
void main(){
int a, b,c, d;
b = 18, c = 112;
b = a - d;
d = a - d;
}
and part of its IR is:
%5 = load i32, i32* %1, align 4
%6 = load i32, i32* %4, align 4
%7 = sub nsw i32 %5, %6
store i32 %7, i32* %2, align 4
%8 = load i32, i32* %1, align 4
%9 = load i32, i32* %4, align 4
%10 = sub nsw i32 %8, %9
store i32 %10, i32* %4, align 4
I have implemented the LVN (local value numbering) algorithm to detect the redundant expression, which is d = a - d. Now, for the optimization, I need to manipulate the instruction and turn it into d = b. I am not sure how to do this with LLVM or how to manipulate the IR.
I am new to LLVM, so this might be a silly question, but I am really confused. Since LLVM works on IR, I understand that when it sees "d = a - d" it will first load a and d, but the binary operation and the store instruction in the IR need to be changed so that %4 gets the value from %2. Can anyone help me check whether I understand this correctly, and explain how I can manipulate the IR to optimize the code?
First of all, let's replace your example program with one that does not invoke undefined behaviour (due to accessing uninitialized variables), so that the UB does not confuse the issue:
void f(int a, int b, int c, int d){
b = a - d;
d = a - d;
// Code that uses b and d
}
(I've also removed the two assignments as they didn't have any effect and will disappear after mem2reg anyway.)
Now to actually answer your question: Most optimizations run after the mem2reg pass, which converts memory accesses to registers where possible. This is important because, unlike memory locations, LLVM registers can only be assigned from a single point in the source, so mem2reg turns the code into SSA form, which is required for many optimizations to work.
If we apply mem2reg to the example code, we get:
define void @f(i32, i32, i32, i32) #0 {
%5 = sub nsw i32 %0, %3
%6 = sub nsw i32 %0, %3
; Code that uses b and d
}
So now we'd apply your analysis to find out that %6 is equivalent to %5. With that information we can remove the definition of %6 and replace all the occurrences of %6 with %5 (note that this would be more complicated if %5 and %6 were in different basic blocks where one didn't dominate the other). To do that you can find all uses of %6 using the uses() method, which tells you which instructions have %6 as which operand. Then you can just set that operand to be a reference to %5 instead. Calling replaceAllUsesWith on %6 does the same thing in one step.
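As a rough illustration of the idea, here is local value numbering on a toy single-block instruction list. It is plain C++, not the LLVM API: ToyInst, localValueNumbering and the id scheme are hypothetical. The sketch assumes pure instructions and treats operands as ordered, so it does not recognize a + b and b + a as the same expression.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy stand-in for one SSA instruction in a single basic block; NOT the LLVM API.
// Operands are value ids: ids below numArgs are function arguments, and
// instruction k defines id numArgs + k.
struct ToyInst {
    std::string op;   // e.g. "sub", "add"
    int lhs;
    int rhs;
};

// Returns, for each instruction, the id to use in its place: its own id if it
// is the first to compute the expression, otherwise the id of the earlier,
// equivalent instruction. Operands in the block are rewritten along the way.
std::vector<int> localValueNumbering(std::vector<ToyInst>& block, int numArgs) {
    std::map<std::tuple<std::string, int, int>, int> seen;  // expression -> id
    std::vector<int> replacement(block.size());
    for (std::size_t k = 0; k < block.size(); ++k) {
        ToyInst& inst = block[k];
        const int id = numArgs + static_cast<int>(k);
        // Rewrite operands first, so later expressions see canonical ids.
        if (inst.lhs >= numArgs) inst.lhs = replacement[inst.lhs - numArgs];
        if (inst.rhs >= numArgs) inst.rhs = replacement[inst.rhs - numArgs];
        assert(inst.lhs < id && inst.rhs < id && "operands must be defined earlier");
        auto result = seen.emplace(std::make_tuple(inst.op, inst.lhs, inst.rhs), id);
        replacement[k] = result.first->second;  // the earlier id if already seen
    }
    return replacement;
}
```

With four arguments (ids 0 to 3), the two sub instructions from the example become ids 4 and 5 in this scheme, and the returned vector maps both to 4. Any later instruction that used id 5 is rewritten to use id 4, after which the second sub has no users and can be erased.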

Address Problems in LLVM Interpreter

I'm using the LLVM interpreter. I tried to find the values of variables by instrumenting ExecutionEngine.cpp and Execution.cpp. I can find the current values, but I have another problem: different binary operations report the same operand addresses. I think they use temporary addresses, but why the same ones? I need them to be distinct to get my results.
To be more clear:
Here are different instructions from different basic blocks, with the interpreter information I collected for them.
First Instruction is:
About to interpret: %BB1 = add i32 %4, 1
Basic Block Name: CBB1
arg0: %4 = load i32, i32* @BB1
arg1: i32 1
visitBinaryOperator
Source1 Current Input ::: 10
Source1 Current Address ::: 0x7fff4582cd90
Source1 [ 0, 41 ]
Source1 Current Input::: 1
Source1 Current Input ::: 10
Sourc2 Current Address::: 0x7fff4582cdc0
Second Instruction is
About to interpret: %10 = add nsw i32 %9, %8
BasicBlock Name: CBB2
arg0: %9 = load i32, i32* %sum, align 4
arg1: %8 = load i32, i32* %7, align 4
visitBinaryOperator
Source1 Current Input ::: 0
Source1 Current Address ::: 0x7fff4582cd90
Source2 Current Input ::: 0
Source2 Current Address::: 0x7fff4582cdc0
Part of BasicBlock CBB2
%8 = load i32, i32* %7, align 4
%9 = load i32, i32* %sum, align 4
%10 = add nsw i32 %9, %8
store i32 %10, i32* %sum, align 4
I need to perform an analysis that uses the current and previous values of each instruction's inputs and outputs.
I get the values in the function Interpreter::visitBinaryOperator:
ExecutionContext &SF = ECStack.back();
Type *Ty = I.getOperand(0)->getType();
GenericValue Src1 = getOperandValue(I.getOperand(0), SF);
GenericValue Src2 = getOperandValue(I.getOperand(1), SF);
GenericValue R; // Result
Getting addresses y1, y2, r and values v1, v2, rv:
uint8_t *y1 = reinterpret_cast<uint8_t *>(
const_cast<uint64_t *>(Src1.IntVal.getRawData()));
int v1 = *reinterpret_cast<int *>(y1);
uint8_t *y2 = reinterpret_cast<uint8_t *>(
const_cast<uint64_t *>(Src2.IntVal.getRawData()));
int v2 = *reinterpret_cast<int *>(y2);
uint8_t *r = reinterpret_cast<uint8_t *>(
const_cast<uint64_t *>(R.IntVal.getRawData()));
int rv = *reinterpret_cast<int *>(r);
I have another idea: if I cannot solve the problem for the add instruction, could I use the load instructions before the add and the store instruction after it to get my results? I can already get valid results from each of them individually, but how do I connect them to the arguments of the binary operation?