From an LLVM pass, I need to print an LLVM instruction (type llvm::Instruction) to the screen, just as it appears in the human-readable form of the bitcode file. My compilation is crashing and never reaches the point where the bitcode file is generated, so for debugging I want to print some instructions to find out what is going wrong.
Assuming I is your instruction (an llvm::Instruction&):
I.print(errs());
That is, simply use its print method. If I is an Instruction*, write I->print(errs()) instead.
For a simple Hello World program, you can print every instruction of a function F using C++11 range-based for loops:
for (auto &B : F) {
    for (auto &I : B) {
        errs() << I << "\n";
    }
}
This gives the output:
  %3 = alloca i32, align 4
  %4 = alloca i8**, align 8
  store i32 %0, i32* %3, align 4
  store i8** %1, i8*** %4, align 8
  %5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i64 0, i64 0))
  ret i32 0
I'm playing with LLVM and tried to compile some simple C++ code with it:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int test = rand();
    if (test % 2)
        test += 522;
    else
        test *= 333;
    printf("test %d\n", test);
}
In particular, I wanted to test how LLVM handles branches.
The result I got is very strange: it gives the correct result when executed, but it looks inefficient:
; Function Attrs: nounwind
define i32 @main() local_unnamed_addr #0 {
  %1 = tail call i32 @rand() #3
  %2 = and i32 %1, 1
  %3 = icmp eq i32 %2, 0
  %4 = add nsw i32 %1, 522
  %5 = mul nsw i32 %1, 333
  %6 = select i1 %3, i32 %5, i32 %4
  %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([9 x i8], [9 x i8]* @.str, i64 0, i64 0), i32 %6)
  ret i32 0
}
It looks like it executes both paths even though only one is needed.
My question is: shouldn't LLVM generate labels (branches) in this case, and why doesn't it?
Thank you
P.S. I'm using http://ellcc.org/demo/index.cgi for this test
Branches can be expensive (a mispredicted branch stalls the pipeline), so generating branch-free code at the cost of one unnecessary add or mul instruction will usually be faster in practice.
If you make the branches of your if longer, you'll see that it'll eventually become a proper branch instead of a select.
The compiler tends to have a good understanding of which option is faster in which case, so I'd trust it unless you have specific benchmarks that show the version with select to be slower than a version that branches.
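To make the trade-off concrete, here is a small C++ sketch of the two shapes of the same computation. The function names are made up for illustration: with_branch mirrors the source's if/else, while with_select mirrors the IR above by computing both arms unconditionally and then picking one. It shows that the two are equivalent in meaning; whether a compiler actually emits a branch or a select/cmov for either one is decided by its cost model, so this is not a statement about the generated code.

```cpp
// Sketch only. Assumes |test| is small enough that test * 333 fits in an int
// (roughly below 6.4 million). Unlike LLVM IR, C++ has no poison value, so
// eagerly computing an overflowing arm would be undefined behaviour here.

// Mirrors the original if/else: only one arm is evaluated.
int with_branch(int test) {
    if (test % 2)
        test += 522;
    else
        test *= 333;
    return test;
}

// Mirrors the IR: both arms are computed, then one result is selected.
int with_select(int test) {
    bool is_even = (test & 1) == 0;        // %2 = and ..., 1 ; %3 = icmp eq %2, 0
    int added = test + 522;                // %4 = add nsw %1, 522
    int multiplied = test * 333;           // %5 = mul nsw %1, 333
    return is_even ? multiplied : added;   // %6 = select %3, %5, %4
}
```

Computing the unused arm costs one extra add or mul, but it removes the data-dependent branch, which is exactly the trade the answer describes.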
I have this bitcode fragment:
define void @setGlobal(i32 %a) #0 {
entry:
  %a.addr = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  %0 = load i32* %a.addr, align 4
  store i32 %0, i32* @Global, align 4
  %1 = load i32* %a.addr, align 4
  store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4
  store i32 2, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 2), align 4
  ret void
}
I am using this code to find the getelementptr in "store i32 %1, i32* getelementptr inbounds ([5 x i32]* @GlobalVec, i32 0, i64 0), align 4":
for (Module::iterator F = p_Module.begin(), endF = p_Module.end(); F != endF; ++F) {
    for (Function::iterator BB = F->begin(), endBB = F->end(); BB != endBB; ++BB) {
        for (BasicBlock::iterator I = BB->begin(), endI = BB->end(); I != endI; ++I) {
            if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
                if (Instruction *gep = dyn_cast<Instruction>(SI->getOperand(1))) {
                    if (gep->getOpcode() == Instruction::GetElementPtr) {
                        // do something
                    }
                }
            }
        }
    }
}
This code can't find the getelementptr. What am I doing wrong?
There are no getelementptr instructions in your bitcode snippet, which is why you can't find them.
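The two things that look like getelementptr instructions are actually constant expressions. The telltale sign is that they appear as an operand of another instruction (the store), which is not possible with regular instructions.
So if you want to find that expression, look for a ConstantExpr whose getOpcode() is Instruction::GetElementPtr (internally a GetElementPtrConstantExpr), not a GetElementPtrInst. In the code above, that means using dyn_cast<ConstantExpr> instead of dyn_cast<Instruction>; alternatively, dyn_cast<GEPOperator> matches both the instruction and the constant-expression form.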
I am trying to use GEP to get a pointer to an i32 element of an array.
The problem is that I don't know the size of the array.
The IR reference on llvm.org says that GEP just adds the offsets to the base address with silently-wrapping two’s complement arithmetic.
So I'd like some advice.
Is it safe to do it like this:
%v1 = alloca i32
store i32 5, i32* %v1
%6 = load i32* %v1
%7 = bitcast i32* %v0 to [1 x i32]*
%8 = getelementptr [1 x i32]* %7, i32 0, i32 %6
%9 = load i32* %8
store i32 %9, i32* %v0
The type of %v0 is i32*, and I know %v0 points to an array in memory, but its size is 9, not 1.
I then GEP from %7, treating it as a [1 x i32] rather than a [9 x i32], but with an offset of 5 (%6).
So is there any problem? Is it unsafe, or just bad style but basically OK?
First of all, the entire code you wrote is equivalent to:
%x = getelementptr i32* %v0, i32 5
%y = load i32* %x
store i32 %y, i32* %v0
There's no reason to bitcast the pointer to [1 x i32]*; just use it as-is.
Regarding your question: using a GEP to compute the pointer is always safe, in the sense that it is well-defined and will never crash (as long as the GEP is not marked inbounds; an out-of-bounds inbounds GEP yields a poison value). However, nothing stops it from evaluating to a pointer beyond the bounds of the array, and in that case accessing the memory (as your subsequent load does) is undefined behavior.
Also, this link might be of interest: http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
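The address arithmetic discussed above can be modelled in plain C++. The sketch below uses hypothetical helper names and computes the byte address a non-inbounds getelementptr would produce, assuming 4-byte i32 elements with no padding and using unsigned 64-bit arithmetic to mimic GEP's silent two's complement wrap-around. With a leading index of 0, the declared array length ([1 x i32] versus [9 x i32]) drops out of the formula entirely; only the later load depends on the real size of the allocation.

```cpp
#include <cstdint>

// Hypothetical model of LLVM GEP address arithmetic (non-inbounds GEP).
// Assumes i32 is 4 bytes with no padding between array elements.
constexpr std::uint64_t kI32Size = 4;

// getelementptr [n x i32]* base, i32 i0, i32 i1
//   address = base + i0 * sizeof([n x i32]) + i1 * sizeof(i32)
// Unsigned arithmetic wraps modulo 2^64, like GEP's two's complement wrap.
std::uint64_t gep_array(std::uint64_t base, std::uint64_t n,
                        std::int64_t i0, std::int64_t i1) {
    return base + static_cast<std::uint64_t>(i0) * (n * kI32Size)
                + static_cast<std::uint64_t>(i1) * kI32Size;
}

// getelementptr i32* base, i32 i
//   address = base + i * sizeof(i32)
std::uint64_t gep_scalar(std::uint64_t base, std::int64_t i) {
    return base + static_cast<std::uint64_t>(i) * kI32Size;
}

// Computing an address never faults; only a later load/store needs the
// 4-byte element at `addr` to lie inside the real allocation of `len` i32s.
bool load_in_bounds(std::uint64_t base, std::uint64_t len, std::uint64_t addr) {
    return addr >= base && addr - base + kI32Size <= len * kI32Size;
}
```

For a base address of 1000, gep_array(1000, 1, 0, 5), gep_array(1000, 9, 0, 5) and gep_scalar(1000, 5) all give the same address, so the bitcast in the question changes nothing about the pointer. Whether the following load is defined depends only on the real 9-element allocation, which is what load_in_bounds checks.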
I'm trying to figure out how to use the trampoline intrinsics in LLVM. The documentation mentions that some amount of storage is needed to hold the trampoline, and that this amount is platform dependent. My question is: how do I figure out how much is needed?
I found this example, which picks 32 bytes for no apparent reason. How does one choose a good value?
declare void @llvm.init.trampoline(i8*, i8*, i8*);
declare i8* @llvm.adjust.trampoline(i8*);
define i32 @foo(i32* nest %ptr, i32 %val)
{
  %x = load i32* %ptr
  %sum = add i32 %x, %val
  ret i32 %sum
}
define i32 @main(i32, i8**)
{
  %closure = alloca i32
  store i32 13, i32* %closure
  %closure_ptr = bitcast i32* %closure to i8*
  %tramp_buf = alloca [32 x i8], align 4
  %tramp_ptr = getelementptr [32 x i8]* %tramp_buf, i32 0, i32 0
  call void @llvm.init.trampoline(
      i8* %tramp_ptr,
      i8* bitcast (i32 (i32*, i32)* @foo to i8*),
      i8* %closure_ptr)
  %ptr = call i8* @llvm.adjust.trampoline(i8* %tramp_ptr)
  %fp = bitcast i8* %ptr to i32(i32)*
  %val2 = call i32 %fp (i32 13)
  ; %val = call i32 @foo(i32* %closure, i32 42);
  ret i32 %val2
}
Yes, trampolines are used to generate code on the fly. It's unclear why you need these intrinsics at all: they exist to implement GCC's nested-function extension (in particular, the case where the address of a nested function is taken and that function accesses variables of its enclosing function).
The best way to figure out the necessary size and alignment of the trampoline buffer is to grep the GCC sources for "TRAMPOLINE_SIZE" and "TRAMPOLINE_ALIGNMENT".
As far as I can see, at the time of this writing, a buffer of 72 bytes with 16-byte alignment should be enough for all the platforms GCC and LLVM support.
I wish I could test this myself, but unfortunately that's impossible in my current situation. Could anyone shed some light on this?
It returns the address of the value it is referencing.
Unlike a pointer, a reference isn't actually an object. It doesn't have an address of its own; it's easier to think of it as an alias for an object, which the compiler resolves for you.
Anyway, you can always test code online with Ideone:
#include <stdio.h>
void func(int &z)
{
    printf("&z: %p", &z);
}
int main(int argc, char **argv)
{
    int x = 0;
    int &y = x;
    printf("&x: %p\n", &x);
    printf("&y: %p\n", &y);
    func(x);
    return 0;
}
Executed online, with results.
Address of the value.
(There is no such thing as the address of a reference.)
You can always check on codepad: http://codepad.org/I1aBfWAQ
A further experiment: there are online compilers that allow inspection of intermediate results!
Running Mahmoud's example on the Try out LLVM and Clang page gives the following LLVM IR (truncated):
define i32 @main() nounwind uwtable {
  %x = alloca i32, align 4
  store i32 0, i32* %x, align 4, !tbaa !0
  %1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([8 x i8]* @.str, i64 0, i64 0), i32* %x)
  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([8 x i8]* @.str1, i64 0, i64 0), i32* %x)
  %3 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([7 x i8]* @.str2, i64 0, i64 0), i32* %x) nounwind
  ret i32 0
}
The syntax is not too important; what matters is that only a single variable has been declared: %x (whose name is suspiciously similar to that of the C variable x).
This means the compiler has elided the variables y and z: no storage is ever allocated for them.
I will spare you the assembly listing ;)