Find local variables in certain function llvm

Given a certain function in LLVM bitcode, how can I identify its local variables?
For example, in the following snippet from the GNU coreutils echo utility, I don't know how to find the variable do_v9 within the IR of main.
int main (int argc, char **argv)
{
bool display_return = true;
bool posixly_correct = getenv ("POSIXLY_CORRECT");
....
bool do_v9 = false;
}
I noticed that LLVM creates a metadata node for each local variable, called DILocalVariable, and that in the IR the variable is replaced with a number that starts with the letter i.
!686 = !DILocalVariable(name: "posixly_correct", scope: !678, file: !10, line: 114, type: !64)
!688 = !DILocalVariable(name: "do_v9", scope: !678, file: !10, line: 122, type: !64)
So the IR of main contains neither the variable do_v9 nor its corresponding metadata !688; the only metadata reference I can see is the value beside the definition of the main function (!dbg !678). My analysis loops over the instructions in the main function, but I don't know how to find this local variable within my iteration. I'm using LLVM 6.0.
; Function Attrs: nounwind uwtable
define i32 @main(i32, i8**) #9 !dbg !678 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca i8, align 1
%7 = alloca i8, align 1
%8 = alloca i8, align 1
%9 = alloca i8, align 1
%10 = alloca i32
%11 = alloca i8*, align 8
%12 = alloca i64, align 8
%13 = alloca i8*, align 8
%14 = alloca i8, align 1
%15 = alloca i8, align 1

If you want to identify a local variable from your source code in LLVM IR using the debug information emitted by the compiler, look at the calls to the @llvm.dbg.declare or @llvm.dbg.addr intrinsics in the function's IR. One or the other (but not both) appears once for each local variable in the function; llvm.dbg.addr was introduced as the intended replacement for llvm.dbg.declare, although unoptimized clang output typically still uses llvm.dbg.declare. For example, if you have the following:
%1 = alloca i32, align 4
call void @llvm.dbg.addr(metadata i32* %1, metadata !2, metadata ...), !dbg ...
!2 = !DILocalVariable(name: "i", ...)
This tells us that local variable i corresponds to the stack location allocated by the alloca whose address is %1.
Note that the ... above just represents stuff we don't care about in this context.
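To make the correspondence concrete, here is a rough sketch that recovers it from the textual IR (a .ll file) using only the C++ standard library. It is not the LLVM C++ API; the helper name localVariableSlots is hypothetical, and the regular expressions assume the intrinsic call and the !DILocalVariable node are each printed on a single line, as in the snippets above.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: maps a stack slot such as "%8" to the source-level
// name stored in the !DILocalVariable node that its dbg intrinsic refers to.
std::map<std::string, std::string> localVariableSlots(const std::string &ir) {
  // Matches: call void @llvm.dbg.declare(metadata i8* %8, metadata !688, ...
  static const std::regex dbgCall(
      R"re(@llvm\.dbg\.(?:declare|addr)\(metadata .*?(%[\w.]+), metadata (![0-9]+))re");
  // Matches: !688 = !DILocalVariable(name: "do_v9", ...
  static const std::regex diVar(
      R"re((![0-9]+) = !DILocalVariable\(name: "([^"]+)")re");

  std::map<std::string, std::string> slotToMeta;  // "%8"   -> "!688"
  std::map<std::string, std::string> metaToName;  // "!688" -> "do_v9"
  std::istringstream in(ir);
  std::string line;
  std::smatch m;
  while (std::getline(in, line)) {
    if (std::regex_search(line, m, dbgCall)) {
      assert(m.size() == 3);  // whole match plus two capture groups
      slotToMeta[m[1].str()] = m[2].str();
    } else if (std::regex_search(line, m, diVar)) {
      metaToName[m[1].str()] = m[2].str();
    }
  }

  // Join the two tables on the metadata id.
  std::map<std::string, std::string> result;
  for (const auto &entry : slotToMeta) {
    auto it = metaToName.find(entry.second);
    if (it != metaToName.end()) result[entry.first] = it->second;
  }
  return result;
}
```

Inside a pass you would normally get the same information by checking each instruction for the debug intrinsic rather than parsing text; the sketch only illustrates which two pieces of IR have to be joined.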

Related

How to create instruction in function without basic block by LLVM C++ API?

I want to insert instructions into function without basic block, for example:
define void @_Z2f2v() nounwind {
%a = alloca i32, align 4
%b = alloca i32, align 4
store i32 2, i32* %a, align 4
%1 = load i32* %a, align 4
%2 = icmp sgt i32 %1, 0
ret void
}
But from reading the LLVM documentation, the only C++ APIs I can find are:
BasicBlock *bb = BasicBlock::Create(...);
irBuilder.SetInsertPoint(bb);
irBuilder.CreateXXXInst(...);
or
Instruction *inst = new XXXInst(..., Instruction *insertBefore);
Instruction *inst = new XXXInst(..., BasicBlock *insertAtEnd);
It seems that I must create a BasicBlock at the beginning of a function.
How can I create an instruction in a function without a BasicBlock using the C++ API?

I want to insert instructions into function without basic block, for example:
define void @_Z2f2v() nounwind {
%a = alloca i32, align 4
%b = alloca i32, align 4
store i32 2, i32* %a, align 4
%1 = load i32* %a, align 4
%2 = icmp sgt i32 %1, 0
ret void
}
That function contains exactly one basic block, not zero. To create a function like that, you add all of your instructions to the function's entry block.
How could I create instruction into function without BasicBlock by C++ API ?
You can't - neither using the C++ API nor any other way. Every instruction has to be part of a basic block by definition.
Basic blocks are the nodes in the CFG, so if you had an instruction without a basic block, it would not be part of the CFG and could therefore never be executed, which would be pointless.

Erasing redundant expression with llvm and local value numbering algorithm

So my C code is:
#include <stdio.h>
void main(){
int a, b,c, d;
b = 18, c = 112;
b = a - d;
d = a - d;
}
and part of its IR is:
%5 = load i32, i32* %1, align 4
%6 = load i32, i32* %4, align 4
%7 = sub nsw i32 %5, %6
store i32 %7, i32* %2, align 4
%8 = load i32, i32* %1, align 4
%9 = load i32, i32* %4, align 4
%10 = sub nsw i32 %8, %9
store i32 %10, i32* %4, align 4
I have implemented the LVN algorithm to detect the redundant expression, which is d = a - d. Now, for the optimization, I need to manipulate the instruction and turn it into d = b. I am not sure how to do this with LLVM or how to manipulate the IR.
I am new to LLVM, so this might be a silly question, but I am really confused. Since LLVM works on IR, I understand that when it sees "d = a - d" it will first load a and d, but the binary operation and store instruction in the IR need to be changed so that %4 gets the value from %2. Can anyone help me check whether I am understanding this correctly, and show how I can manipulate the IR to optimize the code?

First of all, let's replace your example program with one that does not invoke undefined behaviour (due to accessing uninitialized variables), so that the UB does not confuse the issue:
void f(int a, int b, int c, int d){
b = a - d;
d = a - d;
// Code that uses b and d
}
(I've also removed the two assignments as they didn't have any effect and will disappear after mem2reg anyway.)
Now to actually answer your question: Most optimizations run after the mem2reg pass, which converts memory accesses to registers where possible. This is important because, unlike memory locations, LLVM registers can only be assigned from a single point in the source, so mem2reg turns the code into SSA form, which is required for many optimizations to work.
If we apply mem2reg to the example code, we get:
define void @f(i32, i32, i32, i32) #0 {
%5 = sub nsw i32 %0, %3
%6 = sub nsw i32 %0, %3
; Code that uses b and d
}
So now we'd apply your analysis to find out that %6 is equivalent to %5. With that information we can remove the definition of %6 and replace all the occurrences of %6 with %5 (note that this would be more complicated if %5 and %6 were in different basic blocks where one didn't dominate the other). To do that you can find all uses of %6 using the uses() method, which tells you which instructions have %6 as which operand, and then set each of those operands to refer to %5 instead. The replaceAllUsesWith() method on the value does this for every use in one call, after which the now-unused instruction can be erased with eraseFromParent().
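As an illustration of the idea only, the following sketch runs value numbering over a toy instruction list instead of real LLVM IR. The Inst struct and eliminateRedundant function are hypothetical stand-ins; it assumes a single basic block already in SSA form, with pure operations and no commutativity handling.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy stand-in for an SSA instruction of the form: result = op lhs, rhs
struct Inst {
  std::string result, op, lhs, rhs;
};

// Drops every instruction that recomputes an earlier (op, lhs, rhs) triple
// and rewrites later operands to use the surviving result instead.
std::vector<Inst> eliminateRedundant(const std::vector<Inst> &block) {
  using Key = std::tuple<std::string, std::string, std::string>;
  std::map<Key, std::string> available;            // expression -> first result
  std::map<std::string, std::string> replacement;  // removed -> surviving
  std::vector<Inst> out;

  auto resolve = [&](const std::string &value) -> std::string {
    auto it = replacement.find(value);
    return it == replacement.end() ? value : it->second;
  };

  for (Inst inst : block) {
    // Operands are rewritten first, so removed values never appear in a key.
    inst.lhs = resolve(inst.lhs);
    inst.rhs = resolve(inst.rhs);
    Key key = std::make_tuple(inst.op, inst.lhs, inst.rhs);
    auto found = available.find(key);
    if (found != available.end()) {
      // Redundant: remember that every later use must be redirected.
      replacement[inst.result] = found->second;
      continue;
    }
    available.emplace(key, inst.result);
    out.push_back(inst);
  }
  return out;
}
```

Given the two sub instructions from the example followed by a hypothetical add that uses %5 and %6, the second sub is dropped and the add ends up using %5 for both operands. In real LLVM code the rewriting step is what replacing the uses of %6 accomplishes.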

LLVM IR temporaries use

I'm trying to find out whether LLVM IR temporaries can be used outside a loop in which they were defined. For that, I compiled the following simple C code:
while (*s == 'a')
{
c = *s++;
}
*s = c;
and, as I suspected, the final write outside the loop (*s = c) is done
with a different temporary (%tmp5) than the one loaded inside the loop (%tmp4):
while.body: ; preds = %while.cond
%tmp3 = load i8*, i8** %s.addr, align 8
%incdec.ptr = getelementptr inbounds i8, i8* %tmp3, i32 1
store i8* %incdec.ptr, i8** %s.addr, align 8
%tmp4 = load i8, i8* %tmp3, align 1
store i8 %tmp4, i8* %c, align 1
br label %while.cond
while.end: ; preds = %while.cond
%tmp5 = load i8, i8* %c, align 1
%tmp6 = load i8*, i8** %s.addr, align 8
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; store i8 %tmp4, i8* %tmp6, align 1 ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
store i8 %tmp5, i8* %tmp6, align 1
When I edit the *.ll file and manually replace %tmp5 with %tmp4,
then llvm-as is unhappy:
$ llvm-as modified.ll
Instruction does not dominate all uses!
%tmp4 = load i8, i8* %tmp3, align 1
store i8 %tmp4, i8* %tmp6, align 1
Is there any example where a temporary will be defined
inside a loop and used outside of it? Thanks!

LLVM doesn't really have temporaries, it uses SSA. That's short for static single assignment, and the key word here is single. Everything is a value and the value must always be assigned once.
Any instruction can use any value that has necessarily been assigned by the time it's used. "Dominates" in the error message you got means "provably comes before": LLVM sees that if the input string is "b", the code will jump straight from while.cond to while.end, past while.body, so %tmp4 would never have been assigned.
When you do use values from within loop after the end of the loop, things can get a little confusing. You may need to think hard and close the Slack and Facebook tabs. But LLVM doesn't mind.

The while.end basic block has only while.cond block as its predecessor. Thus, you can't access variables defined in while.body. It is like you want to access a variable defined in one branch from another:
if(...)
int x = ...;
else
print(x);
Instead, declare whatever variables you need in the loop's entry block and then use them from both while.body and while.end.
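The dominance relation behind that error message can be checked on a toy model of this CFG. The sketch below is plain C++ with no LLVM involved; the block name "entry" is an assumption (the snippet in the question does not show the first block), and dominators and dominates are hypothetical helper names. It uses the standard iterative formulation: the entry block is dominated only by itself, and every other block is dominated by itself plus whatever dominates all of its predecessors.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

using Cfg = std::map<std::string, std::vector<std::string>>;  // block -> successors
using DomSets = std::map<std::string, std::set<std::string>>;  // block -> its dominators

// Iterates to a fixed point; assumes every block is reachable from entry.
DomSets dominators(const Cfg &cfg, const std::string &entry) {
  std::set<std::string> all;
  std::map<std::string, std::vector<std::string>> preds;
  for (const auto &node : cfg) {
    all.insert(node.first);
    for (const auto &succ : node.second) {
      all.insert(succ);
      preds[succ].push_back(node.first);
    }
  }
  assert(all.count(entry) == 1);

  DomSets dom;
  for (const auto &b : all) dom[b] = all;  // start from "everything"
  dom[entry] = {entry};

  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &b : all) {
      if (b == entry) continue;
      std::set<std::string> next = all;
      for (const auto &p : preds[b]) {  // intersect over all predecessors
        std::set<std::string> tmp;
        std::set_intersection(next.begin(), next.end(), dom[p].begin(),
                              dom[p].end(), std::inserter(tmp, tmp.begin()));
        next.swap(tmp);
      }
      next.insert(b);
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// True if block a dominates block b.
bool dominates(const DomSets &dom, const std::string &a, const std::string &b) {
  return dom.at(b).count(a) > 0;
}
```

For the edges entry -> while.cond, while.cond -> while.body, while.cond -> while.end and while.body -> while.cond, while.cond dominates while.end but while.body does not, which is why a value defined in while.body cannot be used in while.end.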

Create a LLVM function with a reference argument (e.g. double &x)

I want to create, from scratch, a new function in LLVM IR. The LLVM code should correspond to a C++ function with a reference argument, say
void foo(double &x){
x=0;
}
Tutorials such as http://llvm.org/releases/2.6/docs/tutorial/JITTutorial1.html are too old (LLVM 2.6) and do not cover pass-by-reference functions.
Any hint on how to do this? Thanks.

In LLVM, reference types are typically implemented as pointer types. For the following C++ source code,
int foo(int & i) {
return i;
}
int bar(int *i) {
return *i;
}
void baz(int i) {
foo(i);
bar(&i);
}
The corresponding IR is:
; Function Attrs: nounwind
define i32 @_Z3fooRi(i32* dereferenceable(4) %i) #0 {
entry:
%i.addr = alloca i32*, align 8
store i32* %i, i32** %i.addr, align 8
%0 = load i32*, i32** %i.addr, align 8
%1 = load i32, i32* %0, align 4
ret i32 %1
}
; Function Attrs: nounwind
define i32 @_Z3barPi(i32* %i) #0 {
entry:
%i.addr = alloca i32*, align 8
store i32* %i, i32** %i.addr, align 8
%0 = load i32*, i32** %i.addr, align 8
%1 = load i32, i32* %0, align 4
ret i32 %1
}
; Function Attrs: nounwind
define void @_Z3bazi(i32 %i) #0 {
entry:
%i.addr = alloca i32, align 4
store i32 %i, i32* %i.addr, align 4
%call = call i32 @_Z3fooRi(i32* dereferenceable(4) %i.addr)
%call1 = call i32 @_Z3barPi(i32* %i.addr)
ret void
}
You can find that there is no essential difference for i between functions foo and bar: dereferenceable is just a parameter attribute that you can add yourself during the code generation from the frontend.

How to know the type of a variable in an llvm code

Is there any method to know the type of the variables in the LLVM code?
For example, I have the following code:
%i = alloca i32, align 4
store i32 1, i32* %i, align 4
%n = add i32 6, 1
br label %2
And I want a function that returns the type of each of the variables %i, %n and %2, i.e. i32*, i32 and label respectively.
Any suggestions?

Type* var_type = cur_instruction->getType();

%i = alloca i32, align 4; store i32 1, i32* %i, align 4; and %n = add i32 6, 1 are instructions. You can query the type of each via its getType method (the store produces no value, so its type is void).
%2 is a basic block and has label type. You can check whether a value is a basic block by using isa<BasicBlock>.
