Selection DAG from LLVM IR? - llvm

I generated LLVM IR with
clang -S -emit-llvm demo.c
where demo.c is as follows
int demo(int a, int b){
int c = a+b;
return c;
}
The IR looks like
define dso_local i32 @demo(i32 %0, i32 %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i32, align 4
store i32 %0, i32* %3, align 4
store i32 %1, i32* %4, align 4
%6 = load i32, i32* %3, align 4
%7 = load i32, i32* %4, align 4
%8 = add nsw i32 %6, %7
store i32 %8, i32* %5, align 4
%9 = load i32, i32* %5, align 4
ret i32 %9
}
From here I only need to extract the target-independent SelectionDAG information so that I can feed it into my own pipeline. Is there a way to run LLVM only up to SelectionDAG construction, or do I need to build my own selection DAG? If so, how exactly should I go about it? Any suggestion would be helpful.
I built LLVM with
cmake -DCMAKE_BUILD_TYPE=Debug ..
cmake --build .
Then, after compiling with clang, I ran llc on the resulting IR file with
llc -fast-isel=false -view-dag-combine1-dags foo.ll
and
llc -debug foo.ll
but both commands give the same error.

Related

How do I eliminate LLVM function calls and replace them with basic instructions?

My Problem
I am new to LLVM and C++.
I am currently creating an LLVM backend compiler and need to replace LLVM function calls with the instructions in its definition.
Is there already an existing pass that accomplishes this?
Examples
For example, I have the following C code, compiled to LLVM IR with clang-14 -S -emit-llvm.
int add(int a, int b) {
return a + b;
}
int main() {
int a = 10;
int b = 20;
int c = add(a, b);
return c;
}
This gives the following LLVM IR:
define dso_local i32 @add(i32 noundef %a, i32 noundef %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4
%0 = load i32, i32* %a.addr, align 4
%1 = load i32, i32* %b.addr, align 4
%add = add nsw i32 %0, %1
ret i32 %add
}
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%a = alloca i32, align 4
%b = alloca i32, align 4
%c = alloca i32, align 4
store i32 0, i32* %retval, align 4
store i32 10, i32* %a, align 4
store i32 20, i32* %b, align 4
%0 = load i32, i32* %a, align 4
%1 = load i32, i32* %b, align 4
%call = call i32 @add(i32 noundef %0, i32 noundef %1)
store i32 %call, i32* %c, align 4
%2 = load i32, i32* %c, align 4
ret i32 %2
}
Using the opt command, I want to replace the call to @add in the code above with the instructions from its definition, and emit the following new code:
define dso_local i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%a = alloca i32, align 4
%b = alloca i32, align 4
%c = alloca i32, align 4
store i32 0, i32* %retval, align 4
store i32 10, i32* %a, align 4
store i32 20, i32* %b, align 4
%0 = load i32, i32* %a, align 4
%1 = load i32, i32* %b, align 4
%add = add nsw i32 %0, %1
store i32 %add, i32* %c, align 4
%2 = load i32, i32* %c, align 4
ret i32 %2
}
I searched the following page for such a pass, but could not find a suitable one.
https://llvm.org/docs/Passes.html#loops-natural-loop-information

Is there any way to avoid deletion of duplicate load instruction when compiled using LLVM

I am creating an LLVM front-end module pass. Basically, I need to duplicate every load instruction and store the result in a different register. Even at -O0 for clang, opt, and llc, the duplicated load instructions are removed: looking at the final assembly with objdump, I can see that they are gone. I want a way to keep the duplicate load instructions from being deleted.
The actual C program is:
int main(){
int* p = (int *)(0x600000);//Some address
int x=0x01, y=0x01;
int z;
z=x+y;
*p=z;
}
The corresponding IR is:
define i32 @main() #0 {
entry:
%p = alloca i32*, align 8
%x = alloca i32, align 4
%y = alloca i32, align 4
%z = alloca i32, align 4
store i32* inttoptr (i64 6291456 to i32*), i32** %p, align 8
store i32 1, i32* %x, align 4
store i32 1, i32* %y, align 4
%0 = load i32, i32* %x, align 4
%1 = load i32, i32* %y, align 4
%add = add nsw i32 %0, %1
store i32 %add, i32* %z, align 4
%2 = load i32, i32* %z, align 4
%3 = load i32*, i32** %p, align 8
store i32 %2, i32* %3, align 4
ret i32 0
}
With my pass enabled, the IR changes: each load instruction is duplicated, and the duplicate reads from the same address as the original.
The changed IR is:
define i32 @main() #0 {
entry:
%p = alloca i32*, align 8
%x = alloca i32, align 4
%y = alloca i32, align 4
%z = alloca i32, align 4
store i32* inttoptr (i64 6291456 to i32*), i32** %p, align 8
store i32 1, i32* %x, align 4
store i32 1, i32* %y, align 4
%0 = load i32, i32* %x, align 4
%1 = load i32, i32* %y, align 4
%2 = load i32, i32* %x, align 4 ; Added
%3 = load i32, i32* %y, align 4 ; Added
%add = add nsw i32 %0, %1
store i32 %add, i32* %z, align 4
%4 = load i32, i32* %z, align 4
%5 = load i32*, i32** %p, align 8
%6 = load i32, i32* %z, align 4 ; Added
%7 = load i32*, i32** %p, align 8 ; Added
store i32 %4, i32* %5, align 4
ret i32 0
}
I can see the changed IR at the IR level, but not in the final assembly after llc. I think llc is removing all the duplicated loads. How do I stop llc from removing them?
Note: I tried making all variables volatile. That works: the duplicated loads are still there after llc. But this is not a proper solution, since I cannot make thousands of variables volatile.

LLVM: Instruction does not dominate all uses - No control flow

I implemented a function pass which iterates over basic block instructions and tracks all instructions that have a type of IntegerTy.
Here is the snippet of the pass that does it:
if (!I->isTerminator()) {
    Type::TypeID datatype = I->getType()->getTypeID();
    if (datatype == llvm::Type::IntegerTyID) {
        IRBuilder<> IRB(I);
        Value *v_value = IRB.CreateZExt(I, IRB.getInt64Ty());
        Value *args[] = {v_value};
        IRB.CreateCall(NNT_log_int, args);
    }
}
However, the IRB.CreateZExt(I, IRB.getInt64Ty()); call seems to cause an "Instruction does not dominate all uses!" error.
I understand the nature of the issue (here and here there are similar problems).
My point of confusion is that I apply this pass to a toy program with no if statements or any other control flow, yet I still encounter this problem.
The error message:
Instruction does not dominate all uses!
%2 = load i32, i32* %y, align 4
%1 = zext i32 %2 to i64
Instruction does not dominate all uses!
%4 = load i32, i32* %y, align 4
%3 = zext i32 %4 to i64
Note that each inserted zext instruction is given a number lower than the instruction it uses. I think this is the problem, but I have no idea why my pass does this!
Here is the IR of my toy program before the application of the pass:
; Function Attrs: noinline nounwind optnone uwtable
define i32 @_Z3fooi(i32 %x) #4 {
entry:
%x.addr = alloca i32, align 4
%y = alloca i32, align 4
%z = alloca i32, align 4
store i32 %x, i32* %x.addr, align 4
store i32 0, i32* %y, align 4
%0 = load i32, i32* %x.addr, align 4
%add = add nsw i32 %0, 3
store i32 %add, i32* %y, align 4
%1 = load i32, i32* %y, align 4
store i32 %1, i32* %x.addr, align 4
%2 = load i32, i32* %y, align 4
ret i32 %2
}
; Function Attrs: noinline nounwind optnone uwtable
define i32 @_Z3bari(i32 %panos) #4 {
entry:
%panos.addr = alloca i32, align 4
%y = alloca i32, align 4
store i32 %panos, i32* %panos.addr, align 4
%0 = load i32, i32* %panos.addr, align 4
%add = add nsw i32 %0, 2
store i32 %add, i32* %y, align 4
%1 = load i32, i32* %y, align 4
ret i32 %1
}
Also, note that the problematic instructions come right before a terminator. Again, I think this is related.
Any ideas would be highly appreciated!
Your zext instruction uses I, but you're inserting it before I. When you create the IRBuilder, you should pass in the instruction after I as the insert point. For example like this:
IRBuilder<> IRB(I->getNextNode());
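To see why the insertion point matters, here is a minimal, self-contained sketch that does not use LLVM at all. It models a basic block as an ordered list of toy instructions and checks the single-block form of the dominance rule: every operand must be defined earlier in the block. The names ToyInst, definedBeforeUse, insertBefore, insertAfter, and sampleBlock are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an instruction: the name it defines and the names it uses.
// This is NOT llvm::Instruction, only an illustration.
struct ToyInst {
    std::string def;                // empty for instructions such as ret
    std::vector<std::string> uses;  // names of the operands
};

using ToyBlock = std::vector<ToyInst>;

// Single-block dominance rule: each use must refer to a name defined earlier.
// liveIn holds names defined outside the block (for example, allocas).
bool definedBeforeUse(const ToyBlock &block, std::set<std::string> liveIn = {}) {
    for (const ToyInst &inst : block) {
        for (const std::string &u : inst.uses)
            if (liveIn.count(u) == 0) return false;
        if (!inst.def.empty()) liveIn.insert(inst.def);
    }
    return true;
}

// Position of the instruction defining name, or block.size() if absent.
std::size_t findDef(const ToyBlock &block, const std::string &name) {
    for (std::size_t i = 0; i < block.size(); ++i)
        if (block[i].def == name) return i;
    return block.size();
}

// Analogous to IRBuilder<> IRB(I): the new instruction lands before target.
ToyBlock insertBefore(ToyBlock block, const std::string &target, const ToyInst &inst) {
    std::size_t pos = findDef(block, target);
    if (pos < block.size())
        block.insert(block.begin() + static_cast<std::ptrdiff_t>(pos), inst);
    return block;
}

// Analogous to IRBuilder<> IRB(I->getNextNode()): lands right after target.
ToyBlock insertAfter(ToyBlock block, const std::string &target, const ToyInst &inst) {
    std::size_t pos = findDef(block, target);
    if (pos < block.size())
        block.insert(block.begin() + static_cast<std::ptrdiff_t>(pos + 1), inst);
    return block;
}

// A block shaped like the end of foo: a load of %y followed by the return.
ToyBlock sampleBlock() {
    ToyBlock block;
    block.push_back(ToyInst{"y.load", {"y"}});
    block.push_back(ToyInst{"", {"y.load"}});
    return block;
}
```

In this model, inserting a toy zext that uses y.load before the load fails the check, while inserting it after the load passes. In real LLVM code, keep in mind that I->getNextNode() returns null when I is the last instruction of its block; the snippet in the question already skips terminators, so that case does not arise there.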

How Clang generates code for function parameters?

In a function, I want to know how the parameters are passed into the function body, so that I can track the flow of the parameters. I tried a simple example and found what seems to be an alloca-store pattern for every parameter. Is that always the case?
A demo code is
int add(int x, int y){
return x+y;
}
The LLVM IR generated is:
; Function Attrs: nounwind uwtable
define i32 @add(i32 %x, i32 %y) #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 %x, i32* %1, align 4
store i32 %y, i32* %2, align 4
%3 = load i32, i32* %1, align 4
%4 = load i32, i32* %2, align 4
%5 = add nsw i32 %3, %4
ret i32 %5
}
In the example we can see that:
For every parameter, Clang uses an alloca instruction to define a local variable.
Following the alloca instructions, store instructions are used to assign the parameter values.
My questions are:
Is the LLVM IR for every function generated with this alloca-and-store pattern? If not, what exactly does LLVM do with the parameters?
Is the order of the parameters determined by the calling convention used?
I think this pattern holds for code that has no compile-time optimizations; however, if you instead compile the code with -O3 (or anything that applies the mem2reg optimization), this pattern is optimized out:
(clang -emit-llvm -S -O0 add.c)
define i32 @add(i32 %x, i32 %y) #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 %x, i32* %1, align 4
store i32 %y, i32* %2, align 4
%3 = load i32, i32* %1, align 4
%4 = load i32, i32* %2, align 4
%5 = add nsw i32 %3, %4
ret i32 %5
}
(opt -S -mem2reg add.ll -o add_m.ll)
define i32 @add(i32 %x, i32 %y) #0 {
%1 = add nsw i32 %y, %x
ret i32 %1
}
So if you are controlling all of the code that you are analyzing, then you can rely on this pattern. I would instead recommend that you use the LLVM APIs to get the function arguments. The following code iterates through the arguments to a function F and prints them after casting to values.
for (auto AI = F->arg_begin(), AE = F->arg_end(); AI != AE; ++AI)
{
    Value* v = &*AI;
    errs() << *v << "\n";
}
The values in the above sample are usable in the same way as any other value in the IR.

About Variables Used Within BasicBlock

I have a question about LLVM IR. Within a basic block, variables seem to always be loaded before use and stored after use. Two example basic blocks are as follows:
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i8**, align 8
%i = alloca i32, align 4
%fact = alloca i32, align 4
%n = alloca i32, align 4
store i32 0, i32* %1
store i32 %argc, i32* %2, align 4
store i8** %argv, i8*** %3, align 8
%4 = load i8*** %3, align 8
%5 = getelementptr inbounds i8** %4, i64 1
%6 = load i8** %5, align 8
%7 = call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %6)
store i32 %7, i32* %n, align 4
store i32 1, i32* %fact, align 4
store i32 1, i32* %i, align 4
br label %8
%9 = load i32* %i, align 4
%10 = load i32* %n, align 4
%11 = icmp sle i32 %9, %10
br i1 %11, label %12, label %19
For control flow, call the first basic block A and the second basic block B; control flows from A to B.
Regarding the use of %7: the program stores %7 through the %n pointer in A, then loads from %n into %10 in B to access it, like this:
store i32 %7, i32* %n, align 4
%10 = load i32* %n, align 4
%11 = icmp sle i32 %9, %10
I wonder if I could just drop the store and load instructions and use the value %7 directly, as follows:
%11 = icmp sle i32 %9, %7
Is this OK? Could anyone explain the reasoning behind it?
My description may be unclear; I can explain further if you have questions.
Thanks
It is possible to refer to virtual registers from other basic blocks.
Since you provided an incomplete example, I can only speculate whether %7 can be used directly in the comparison:
If you optimize the code with LLVM's opt tool, the register will probably not be stored and reloaded and the comparison will directly use %7 (or a phi function dependent on the value).
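You can try the mem2reg pass, which promotes memory (allocas) to registers:
opt -S -mem2reg <your file>.ll -o <target file>.ll
On recent LLVM releases that use the new pass manager, the equivalent spelling is:
opt -S -passes=mem2reg <your file>.ll -o <target file>.ll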
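As a rough illustration of what this kind of promotion does, the following self-contained sketch forwards stored values to later loads of the same slot and then drops the stores and loads. It is a toy model, not the real mem2reg algorithm: it handles only a single straight-line block (no phi nodes), and it assumes every slot is a private alloca that is stored to before it is loaded. The names ToyInst, Kind, promoteSlots, and sampleBlock are hypothetical, and the store/load/icmp sequence from the question is flattened into one block for simplicity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Kind { Store, Load, Other };

// Toy instruction, not an LLVM class.
//   Store: operands = {value, slot}
//   Load : result = loaded name, operands = {slot}
//   Other: result = defined name (may be empty), operands = inputs
struct ToyInst {
    Kind kind;
    std::string result;
    std::vector<std::string> operands;
};

// Forward each stored value to later loads of the same slot within one
// straight-line block, then drop the stores and loads.
// Assumption: every load is preceded by a store to the same slot.
std::vector<ToyInst> promoteSlots(const std::vector<ToyInst> &block) {
    std::map<std::string, std::string> slotValue;  // slot -> current value
    std::map<std::string, std::string> replaced;   // load result -> value
    std::vector<ToyInst> out;

    // Map a name to the value that replaces it, if it was a removed load.
    auto resolve = [&](const std::string &name) {
        auto it = replaced.find(name);
        return it == replaced.end() ? name : it->second;
    };

    for (const ToyInst &inst : block) {
        switch (inst.kind) {
        case Kind::Store:
            slotValue[inst.operands[1]] = resolve(inst.operands[0]);
            break;
        case Kind::Load:
            replaced[inst.result] = slotValue.at(inst.operands[0]);
            break;
        case Kind::Other: {
            ToyInst copy = inst;
            for (std::string &op : copy.operands) op = resolve(op);
            out.push_back(copy);
            break;
        }
        }
    }
    return out;
}

// store %7 -> %n; %10 = load %n; %11 = icmp %9, %10 (flattened into one block).
std::vector<ToyInst> sampleBlock() {
    std::vector<ToyInst> block;
    block.push_back(ToyInst{Kind::Store, "", {"%7", "%n"}});
    block.push_back(ToyInst{Kind::Load, "%10", {"%n"}});
    block.push_back(ToyInst{Kind::Other, "%11", {"%9", "%10"}});
    return block;
}
```

Running promoteSlots on sampleBlock() leaves a single toy instruction defining %11 whose operands are %9 and %7, which matches the hand-written rewrite in the question. Across real basic blocks, and especially in loops, the actual pass may need to insert phi nodes instead of a direct reference.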