Identify array type in IR - llvm

I have been trying to identify array accesses in IR using the following code:
for (BasicBlock::iterator ii = BB->begin(), ii2; ii != BB->end(); ii++) {
    Instruction *I=ii;
    if(GetElementPtrInst *getElePntr = dyn_cast<GetElementPtrInst>(&*I))
    {
        Value *valAlloc = (getElePntr->getOperand(0));
        if(getElePntr->getOperand(0)->getType()->isArrayTy())
        {
            errs()<<"\tarray found";
        }
    }
}
This code identifies the getElementPtr instruction, but it does not identify whether its first operand is an array type or not. Please let me know what the problem with my code is.

The first operand of a GEP (getelementptr instruction) is a pointer, not an array. That pointer may point to an array, or it may not (see below). So you need to look at what this pointer points to.
Here's a sample BasicBlockPass visitor:
virtual bool runOnBasicBlock(BasicBlock &BB) {
  for (BasicBlock::iterator ii = BB.begin(), ii_e = BB.end(); ii != ii_e; ++ii) {
    if (GetElementPtrInst *gep = dyn_cast<GetElementPtrInst>(&*ii)) {
      // Dump the GEP instruction
      gep->dump();
      Value* firstOperand = gep->getOperand(0);
      Type* type = firstOperand->getType();
      // Figure out whether the first operand points to an array.
      // (This relies on typed pointers; with opaque pointers in newer LLVM
      // releases, gep->getSourceElementType() gives the pointee type instead.)
      if (PointerType *pointerType = dyn_cast<PointerType>(type)) {
        Type* elementType = pointerType->getElementType();
        errs() << "The element type is: " << *elementType << "\n";
        if (elementType->isArrayTy()) {
          errs() << " .. points to an array!\n";
        }
      }
    }
  }
  // Nothing was modified, so report "no change"
  return false;
}
Note, however, that many "arrays" in C/C++ are actually pointers so you may not get the array type where you expect.
For example, if you compile this code:
int main(int argc, char **argv) {
return (int)argv[1][8];
}
You get the IR:
define i32 @main(i32 %argc, i8** %argv) nounwind uwtable {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8
  %4 = load i8*** %3, align 8
  %5 = getelementptr inbounds i8** %4, i64 1
  %6 = load i8** %5
  %7 = getelementptr inbounds i8* %6, i64 8
  %8 = load i8* %7
  %9 = sext i8 %8 to i32
  ret i32 %9
}
Although argv is treated as an array, the compiler thinks of it as a pointer, so there is no array type in sight. The pass I pasted above won't recognize an array here, because the first operand of the GEP is a pointer to a pointer.
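The same distinction can be seen at the C++ source level, without LLVM. The following is only a sketch using standard type traits; the helper names (pointee_t, points_to_array, param_is_pointer) are made up for this illustration. It mirrors the check in the pass: strip one pointer level and ask whether what remains is an array type.

```cpp
#include <cassert>
#include <type_traits>

// Pointee type of a pointer type, e.g. int (*)[4] -> int[4].
// (Hypothetical helper, not part of any library.)
template <typename P>
using pointee_t = typename std::remove_pointer<P>::type;

// Source-level analogue of "the GEP's first operand points to an array".
template <typename P>
constexpr bool points_to_array() {
    return std::is_pointer<P>::value && std::is_array<pointee_t<P>>::value;
}

// An array-typed parameter is adjusted to a pointer by the language,
// so inside the function `values` has type int*, not int[8].
bool param_is_pointer(int values[8]) {
    return std::is_same<decltype(values), int *>::value;
}
```

Here points_to_array<int (*)[4]>() is true (the address of a real int[4] object), while points_to_array<char **>() is false, which is the argv situation described above.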

Related

LLVM IR: getting the value of an address

I'm trying to write a LLVM pass to analyse the following IR:
%d = alloca i32, align 4
store i32 0, i32* %d, align 4
%1 = load i32* %d, align 4
%2 = add nsw i32 %1, 2
store i32 %2, i32* %d, align 4
What I need to do is to figure out the final value of d.
For the store i32 0, i32* %d, align 4 I cast operand 0 to ConstantInt and found the value assigned to d (which is 0). But I'm struggling to find the value of d in the last store instruction:
store i32 %2, i32* %d, align 4
As far as I know, %2 is a pointer to the result of the instruction %2 = add nsw i32 %1, 2, and similarly for %1.
Do I need to backtrack from %2 to find its value, or is there a simpler method for this?
EDIT:
Following is the code I used so far:
void analyse(BasicBlock* BB)
{
    for (auto &I: *BB)
    {
        if (isa<StoreInst>(I))
        {
            Value *v = I.getOperand(0);
            Instruction *i = dyn_cast<Instruction>(I.getOperand(1));
            if (isa<ConstantInt>(v))
            {
                llvm::ConstantInt *CI = dyn_cast<llvm::ConstantInt>(v);
                int value = CI->getZExtValue();
                std::string ope = i->getName().str().c_str();
                std::cout << "ope " << value << " \n";
            }
        }
    }
}
The way to solve this is to backtrack. In this case:
store i32 %2, i32* %d, align 4
%2 = add nsw i32 %1, 2
%1 = load i32* %d, align 4
So check whether the operand is an instruction and, if it is, check which kind of instruction it is (for example isa<LoadInst>(v) or isa<BinaryOperator>(v)), and then work out the value from that instruction's own operands.
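To make the backtracking idea concrete, here is a toy model of it in plain C++. It deliberately does not use the LLVM API: Op, Inst, evaluate, slotValueBefore, finalSlotValue and exampleBlock are all invented for this sketch, constants are modelled as their own entries, and it assumes a single basic block with no aliasing. In a real pass the same recursion would walk Value operands instead of vector indices.

```cpp
#include <cassert>
#include <vector>

enum class Op { Const, Load, Add, Store };

struct Inst {
    Op op;
    int imm;   // Const: the literal value
    int lhs;   // Add: index of left operand; Store: index of the stored value
    int rhs;   // Add: index of right operand
    int slot;  // Load/Store: which stack slot (stands in for %d)
};

bool evaluate(const std::vector<Inst> &bb, int idx, int &out);

// Value held by `slot` just before position `pos`: walk backwards to the
// most recent store to that slot and evaluate what it stored.
bool slotValueBefore(const std::vector<Inst> &bb, int slot, int pos, int &out) {
    for (int i = pos - 1; i >= 0; --i)
        if (bb[i].op == Op::Store && bb[i].slot == slot)
            return evaluate(bb, bb[i].lhs, out);
    return false;  // no store found in this block: value unknown
}

// Backtrack through the definition of value `idx`.
bool evaluate(const std::vector<Inst> &bb, int idx, int &out) {
    const Inst &in = bb[idx];
    if (in.op == Op::Const) { out = in.imm; return true; }
    if (in.op == Op::Load) return slotValueBefore(bb, in.slot, idx, out);
    if (in.op == Op::Add) {
        int l = 0, r = 0;
        if (!evaluate(bb, in.lhs, l) || !evaluate(bb, in.rhs, r)) return false;
        out = l + r;
        return true;
    }
    return false;  // a store does not produce a value
}

// Final value of `slot` at the end of the block.
bool finalSlotValue(const std::vector<Inst> &bb, int slot, int &out) {
    return slotValueBefore(bb, slot, static_cast<int>(bb.size()), out);
}

// Models: store 0, %d; %1 = load %d; %2 = add %1, 2; store %2, %d
std::vector<Inst> exampleBlock() {
    return {
        {Op::Const, 0, -1, -1, -1},  // 0: constant 0
        {Op::Store, 0, 0, -1, 0},    // 1: store value #0 into slot 0
        {Op::Load, 0, -1, -1, 0},    // 2: %1 = load slot 0
        {Op::Const, 2, -1, -1, -1},  // 3: constant 2
        {Op::Add, 0, 2, 3, -1},      // 4: %2 = add %1, 2
        {Op::Store, 0, 4, -1, 0},    // 5: store %2 into slot 0
    };
}
```

Calling finalSlotValue(exampleBlock(), 0, v) follows the chain store, add, load, store back to the constant and leaves 2 in v. When the chain reaches something that cannot be resolved inside the block, the functions return false rather than guessing.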

LLVM check if array allocation has dynamic size or constant size

I want to check whether a stack allocation of an array has a constant size or a dynamic size (calculated at runtime). For example:
int myInt;
scanf("%d", &myInt);
int buffer[myInt]; //dynamic sized array
The dynamically sized array is converted to LLVM IR like this:
%myInt = alloca i32, align 4
%saved_stack = alloca i8*
%call = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i32 0, i32 0), i32* %myInt)
%0 = load i32, i32* %myInt, align 4
%1 = zext i32 %0 to i64
%2 = call i8* @llvm.stacksave()
store i8* %2, i8** %saved_stack
%vla = alloca i32, i64 %1, align 16 ; allocation
%3 = load i8*, i8** %saved_stack
call void @llvm.stackrestore(i8* %3)
A constant sized array:
int buffer2[123];
LLVM IR:
%buffer2 = alloca [123 x i32], align 16
How can I identify if an alloca instruction allocates a dynamically sized array or a constant sized array?
Look at class AllocaInst in "include/llvm/IR/Instructions.h". It contains a method that returns the size of the allocated array:
/// Get the number of elements allocated. For a simple allocation of a single
/// element, this will return a constant 1 value.
const Value *getArraySize() const { return getOperand(0); }
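Once you have the Value * for the size of the array, you can check whether it is a constant by using dyn_cast<ConstantInt> (grep for this expression; it is widely used in the LLVM code base). For the constant-sized buffer2 above, the array lives in the allocated type and the array size operand is the constant 1; for the VLA it is the non-constant %1.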

how to see content of a method pointer?

typedef int (D::*fptr)(void);
fptr bfunc;
bfunc=&D::Bfunc;
cout<<(reinterpret_cast<unsigned long long>(bfunc)&0xffffffff00000000)<<endl;
complete code available at : https://ideone.com/wRVyTu
I am trying to use reinterpret_cast, but the compiler reports an error:
prog.cpp: In function 'int main()':
prog.cpp:49:51: error: invalid cast from type 'fptr {aka int (D::*)()}' to type 'long long unsigned int'
cout<<(reinterpret_cast<unsigned long long>(bfunc)&0xffffffff00000000)<<endl;
My questions are:
Why is reinterpret_cast not suitable for this occasion?
Is there another way I can see the contents of the method pointer?
Using clang++ to compile a slightly modified version of your code (removed all the cout to not get thousands of lines...), we get this for main:
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %bfunc = alloca { i64, i64 }, align 8
  %dfunc = alloca { i64, i64 }, align 8
  store i32 0, i32* %retval, align 4
  store { i64, i64 } { i64 1, i64 16 }, { i64, i64 }* %bfunc, align 8
  store { i64, i64 } { i64 9, i64 0 }, { i64, i64 }* %dfunc, align 8
  ret i32 0
}
Note that bfunc and dfunc are each stored as a pair of 64-bit integers. If I compile for 32-bit x86, each is a pair of i32 (32-bit integers) instead.
So, if we make main look like this:
int main() {
    // your code goes here
    typedef int (D::*fptr)(void);
    fptr bfunc;
    fptr dfunc;
    bfunc=&D::Bfunc;
    dfunc=&D::Dfunc;
    D d;
    (d.*bfunc)();
    return 0;
}
the generated code looks like this:
; Function Attrs: norecurse uwtable
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %bfunc = alloca { i64, i64 }, align 8
  %dfunc = alloca { i64, i64 }, align 8
  %d = alloca %class.D, align 8
  store i32 0, i32* %retval, align 4
  store { i64, i64 } { i64 1, i64 16 }, { i64, i64 }* %bfunc, align 8
  store { i64, i64 } { i64 9, i64 0 }, { i64, i64 }* %dfunc, align 8
  call void @_ZN1DC2Ev(%class.D* %d) #3
  %0 = load { i64, i64 }, { i64, i64 }* %bfunc, align 8
  %memptr.adj = extractvalue { i64, i64 } %0, 1
  %1 = bitcast %class.D* %d to i8*
  %2 = getelementptr inbounds i8, i8* %1, i64 %memptr.adj
  %this.adjusted = bitcast i8* %2 to %class.D*
  %memptr.ptr = extractvalue { i64, i64 } %0, 0
  %3 = and i64 %memptr.ptr, 1
  %memptr.isvirtual = icmp ne i64 %3, 0
  br i1 %memptr.isvirtual, label %memptr.virtual, label %memptr.nonvirtual
memptr.virtual: ; preds = %entry
  %4 = bitcast %class.D* %this.adjusted to i8**
  %vtable = load i8*, i8** %4, align 8
  %5 = sub i64 %memptr.ptr, 1
  %6 = getelementptr i8, i8* %vtable, i64 %5
  %7 = bitcast i8* %6 to i32 (%class.D*)**
  %memptr.virtualfn = load i32 (%class.D*)*, i32 (%class.D*)** %7, align 8
  br label %memptr.end
memptr.nonvirtual: ; preds = %entry
  %memptr.nonvirtualfn = inttoptr i64 %memptr.ptr to i32 (%class.D*)*
  br label %memptr.end
memptr.end: ; preds = %memptr.nonvirtual, %memptr.virtual
  %8 = phi i32 (%class.D*)* [ %memptr.virtualfn, %memptr.virtual ], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
  %call = call i32 %8(%class.D* %this.adjusted)
  ret i32 0
}
This is not entirely trivial to follow, but in essence:
%memptr.adj = Read adjustment from bfunc[1]
%2 = %d[%memptr.adj]
cast %2 to D*
%memptr.ptr = bfunc[0]
if (%memptr.ptr & 1) goto is_virtual else goto is_non_virtual
is_virtual:
    %memptr.virtual = vtable[%memptr.ptr-1]
    goto common
is_non_virtual:
    %memptr.non_virtual = %memptr.ptr
common:
    if we came from
        is_non_virtual: %8 = %memptr.non_virtual
        is_virtual: %8 = %memptr.virtual
    call %8
I skipped some type casts and similar details to make it simpler.
NOTE: This is NOT meant to say "this is how it is always implemented". It's one example of what the compiler MAY do. Different compilers will do this subtly differently. But if the function may or may not be virtual, the compiler first has to figure out which. [In the above example, I'm fairly sure we can turn on optimisation and get much better code, but it would presumably just figure out exactly what's going on and remove all of the code, which is pointless for understanding how it works.]
There is a very simple answer to this. Pointers-to-methods are not 'normal' pointers and cannot be cast to them, even through reinterpret_cast. One can cast first to void*, and then to long long, but this is really ill-advised.
Remember, the size of a pointer-to-method is not necessarily (and usually is not!) equal to the size of a 'normal' pointer. The way most compilers implement pointer-to-method, it is twice the size of a 'normal' pointer.
GCC will complain about the pointer-to-method to void* cast in pedantic mode, but will still generate code.
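If the goal is only to look at the bits, one option that stays within standard C++ is to copy the object representation into a byte buffer with std::memcpy instead of casting. The sketch below assumes a stand-in class D (the original definition is only on ideone, so this one is hypothetical) and an invented helper rawBytes. What the bytes mean is entirely implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the class from the question.
struct D {
    int Bfunc() { return 1; }
    virtual int Dfunc() { return 2; }
    virtual ~D() {}
};

typedef int (D::*fptr)();

// Copy the object representation of a pointer-to-member-function into bytes.
// No cast to an integer or to void* is involved.
std::vector<unsigned char> rawBytes(fptr p) {
    std::vector<unsigned char> bytes(sizeof p);
    std::memcpy(bytes.data(), &p, sizeof p);
    return bytes;
}
```

rawBytes(&D::Bfunc).size() equals sizeof(fptr), which can then be compared against sizeof(void*) to see the size difference mentioned above on a given compiler. The bytes can be printed in hex, but interpreting them (function address, vtable offset, this-adjustment) depends on the ABI in use.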

Create a LLVM function with a reference argument (e.g. double &x)

I want to create, from scratch, a new function in LLVM IR. The LLVM code should correspond to a C++ function with a reference argument, say
void foo(double &x){
x=0;
}
Tutorials such as http://llvm.org/releases/2.6/docs/tutorial/JITTutorial1.html are too old (LLVM 2.6) and do not cover pass-by-reference functions.
Any hints on how to do this? Thanks.
In LLVM, reference types are typically implemented with pointer types. For the following C++ source code,
int foo(int & i) {
    return i;
}
int bar(int *i) {
    return *i;
}
void baz(int i) {
    foo(i);
    bar(&i);
}
The corresponding IR is:
; Function Attrs: nounwind
define i32 @_Z3fooRi(i32* dereferenceable(4) %i) #0 {
entry:
  %i.addr = alloca i32*, align 8
  store i32* %i, i32** %i.addr, align 8
  %0 = load i32*, i32** %i.addr, align 8
  %1 = load i32, i32* %0, align 4
  ret i32 %1
}
; Function Attrs: nounwind
define i32 @_Z3barPi(i32* %i) #0 {
entry:
  %i.addr = alloca i32*, align 8
  store i32* %i, i32** %i.addr, align 8
  %0 = load i32*, i32** %i.addr, align 8
  %1 = load i32, i32* %0, align 4
  ret i32 %1
}
; Function Attrs: nounwind
define void @_Z3bazi(i32 %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %call = call i32 @_Z3fooRi(i32* dereferenceable(4) %i.addr)
  %call1 = call i32 @_Z3barPi(i32* %i.addr)
  ret void
}
You can find that there is no essential difference for i between functions foo and bar: dereferenceable is just a parameter attribute that you can add yourself during the code generation from the frontend.
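The same equivalence can be checked from the C++ side. The small sketch below (all four function names are invented for illustration) shows that a reference parameter and a pointer parameter reach the same object, which suggests that a function like foo(double &x) could be modelled in IR as taking a pointer to double and storing through it.

```cpp
#include <cassert>

// Address the reference parameter is bound to.
const int *addressViaReference(int &i) { return &i; }

// Address carried by the pointer parameter.
const int *addressViaPointer(int *i) { return i; }

// The function from the question, and its pointer-taking counterpart.
void zeroViaReference(double &x) { x = 0; }
void zeroViaPointer(double *x) { *x = 0; }
```

For a local int n, addressViaReference(n) and addressViaPointer(&n) both return &n, and the two zero functions have the same effect on the caller's variable. Whether to also attach dereferenceable to the parameter is a frontend choice, as noted above.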

LLVM Can't find getelementptr instruction

I have this byte code fragment:
define void @setGlobal(i32 %a) #0 {
entry:
  %a.addr = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  %0 = load i32* %a.addr, align 4
  store i32 %0, i32* @Global, align 4
  %1 = load i32* %a.addr, align 4
  store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4
  store i32 2, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 2), align 4
  ret void
}
I am using this code to find the getelementptr from "store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4":
for (Module::iterator F = p_Module.begin(), endF = p_Module.end(); F != endF; ++F) {
    for (Function::iterator BB = F->begin(), endBB = F->end(); BB != endBB; ++BB) {
        for (BasicBlock::iterator I = BB->begin(), endI = BB->end(); I != endI; ++I) {
            if (StoreInst* SI = dyn_cast<StoreInst>(I)) {
                if (Instruction *gep = dyn_cast<Instruction>(SI->getOperand(1)))
                {
                    if (gep->getOpcode() == Instruction::GetElementPtr)
                    {
                        //do something
                    }
                }
            }
        }
    }
}
This code can't find the getelementptr. What am I doing wrong?
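There are no getelementptr instructions in your bitcode snippet, which is why you can't find them.
The two cases that look like getelementptr instructions are actually constant expressions. The telltale sign is that they appear as part of another instruction (store), which is not something you can do with regular instructions.
So if you want to search for that expression, you need to look for a constant expression, not a GetElementPtrInst: the store's pointer operand is a ConstantExpr whose getOpcode() is Instruction::GetElementPtr (the GetElementPtrConstantExpr class that represents it is internal to LLVM and is not exposed in the public headers). To handle both the instruction and the constant-expression form, you can dyn_cast the operand to GEPOperator.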