Will GetElementPtr work as expected - llvm

I am writing LLVM code using C++. In one place in my code, the following scenario happens:
1. %117 = phi <2 x double>* [ %105, %aligned ], [ %159, %116 ]
7. %123 = getelementptr <2 x double>* %117, i32 0
8. %127 = getelementptr <2 x double>* %123, i32 0
9. %128 = load <2 x double>* %127
10. %129 = getelementptr <2 x double>* %123, i32 1
11. %130 = load <2 x double>* %129
12. %131 = shufflevector <2 x double> %128, <2 x double> %130, <2 x i32> <i32 1, i32 3>
In lines 7 and 8 I compute the same address twice, pointing to the same data type, but with a different address operand each time. Is it safe to do this, or will it lead to undefined results?

The instruction
%x = getelementptr %anytype* %y, i32 0
is effectively a no-op; it's as if you had written (the illegal):
%x = %y
So yes, %123 and %127 both point to the same memory as %117. It's safe, but redundant: you can just use %117 directly wherever %123 or %127 is used. The only odd thing in your snippet is that the value numbering is not sequential, but I assume that's because you pasted only parts of the code here.
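One way to picture this is ordinary C pointer arithmetic: a single-index getelementptr on a T* computes the same address as p + idx in C. The sketch below is only a model, not LLVM code; V2d is a hypothetical stand-in for <2 x double>, assumed to be 16 bytes with no padding (as on common targets), and gep is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for <2 x double>: two doubles, 16 bytes on common targets.
struct V2d {
    double lane[2];
};

// Model of `getelementptr <2 x double>* %p, i32 idx`: plain pointer arithmetic.
// idx == 0 returns p unchanged; idx == 1 moves one whole V2d (16 bytes) forward.
V2d* gep(V2d* p, std::ptrdiff_t idx) {
    return p + idx;
}
```

Applied to the snippet: gep(p, 0) (like %123) and gep(gep(p, 0), 0) (like %127) both return p, while gep(p, 1) (like %129) points at the next <2 x double>.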

Related

How to use CreateInBoundsGEP in cpp api of llvm to access the element of an array?

I am new to LLVM programming, and I am trying to write C++ code that generates LLVM IR for a simple C snippet like this:
int a[10];
a[0] = 1;
I want to generate something like this to store 1 into a[0]:
%3 = getelementptr inbounds [10 x i32], [10 x i32]* %2, i64 0, i64 0
store i32 1, i32* %3, align 16
I tried CreateInBoundsGEP: auto arrayPtr = builder.CreateInBoundsGEP(var, num); where var and num are both of type llvm::Value*, but I only get:
%1 = getelementptr inbounds [10 x i32], [10 x i32]* %0, i32 0
store i32 1, [10 x i32]* %1
I searched Google for a long time and looked through the LLVM manual, but I still don't know which C++ API to use or how to use it.
Really appreciate it if you can help!
Note that the second argument to IRBuilder::CreateInBoundsGEP (first overload) is actually an ArrayRef<Value *>, which means it accepts an array of Value * values (including C-style arrays, std::vector<Value *>, std::array<Value *, LEN> and others).
To generate a GEP instruction with multiple indices, pass an array of Value * as the second argument:
Value *i32zero = ConstantInt::get(context, APInt(32, 0));
Value *indices[2] = {i32zero, i32zero};
builder.CreateInBoundsGEP(var, ArrayRef<Value *>(indices, 2));
This yields:
%1 = getelementptr inbounds [10 x i32], [10 x i32]* %0, i32 0, i32 0
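Here %1 is of type i32*, pointing to the first element of the array pointed to by %0. (Newer LLVM releases also expect the source element type as the first argument, as in builder.CreateInBoundsGEP(arrayTy, var, indices), where arrayTy stands for the [10 x i32] type.)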
LLVM documentation on GEP instruction: https://llvm.org/docs/GetElementPtr.html
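As a rough C++ analogy (a model, not LLVM API code): with one index, the GEP on a [10 x i32]* only steps over whole arrays, so the result still has the array-pointer type; the second index is what selects an i32 element. The helper names oneIndex and twoIndices below are hypothetical. Note that writing *oneIndex(&a) = 1 would not even compile in C++, which mirrors the mismatched store i32 1, [10 x i32]* in the question.

```cpp
#include <cassert>
#include <cstddef>

using Arr10 = int[10];  // C++ counterpart of [10 x i32]

Arr10 a;  // the global from the question: int a[10];

// Like `getelementptr inbounds [10 x i32], [10 x i32]* %p, i32 0`:
// one index only, so the result is still a pointer to the whole array.
Arr10* oneIndex(Arr10* p) {
    return &p[0];
}

// Like `getelementptr inbounds [10 x i32], [10 x i32]* %p, i32 0, i32 i`:
// the second index selects element i, giving an int* (an i32*).
int* twoIndices(Arr10* p, std::size_t i) {
    return &p[0][i];
}
```

Both helpers return the same address for i == 0; only the pointee type differs, and that type is what the store needs.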

LLVM: How to create char array reference

I am trying to implement an opt pass that will compress string constants in ROM and then, at the cost of CPU + RAM, re-materialize the values at runtime. Before implementing compression, I just want to place all strings in a table, and do a lookup.
Example:
printf("Hello");
Would become the equivalent of
char placeholder[6];
int strID = 0;
tableLookup(placeholder, 0 /*ID*/); // Fill array
printf(placeholder);
The LLVM IR I was able to generate looks like the following:
%fakeString = alloca [10 x i8], align 1
call void @llvm.dbg.declare(metadata [10 x i8]* %fakeString, metadata !60, metadata !25), !dbg !64
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %fakeString, i32 0, i32 0, !dbg !65
call void @tableLookup(i8* %arraydecay, i32 0) #2, !dbg !66
How can I create this programmatically? The two main pieces I am missing:
1. How to create the array reference (after creating the alloca instruction)
2. How to get result from tableLookup and replace the operand in the old printf()
Any help would be much appreciated!
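For reference, the source-level rewrite described above can be sketched as runnable C++. The string table kStringTable and the body of tableLookup are hypothetical (the question only shows the call and its signature, void tableLookup(i8*, i32)); this sketch assumes the helper simply copies entry id into a caller-provided buffer, with compression to be added later.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical table the pass would emit (uncompressed for now).
static const char* const kStringTable[] = {"Hello"};

// Hypothetical runtime helper matching `void tableLookup(i8*, i32)`.
// The caller owns dst and must size it for the string plus its NUL byte.
void tableLookup(char* dst, int id) {
    assert(id >= 0 && id < static_cast<int>(sizeof kStringTable / sizeof kStringTable[0]));
    std::strcpy(dst, kStringTable[id]);
}

// What `printf("Hello");` would be rewritten into.
void printHelloViaTable() {
    char placeholder[6];           // "Hello" plus the terminating NUL
    tableLookup(placeholder, 0);   // fill the array from the table
    // "%s" keeps the table contents from being interpreted as a format string.
    std::printf("%s", placeholder);
}
```

In IR terms, placeholder corresponds to the alloca, and the pointer passed to tableLookup is the %arraydecay GEP with indices 0, 0.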

LLVM pass to count vector type instructions

I am trying to write an LLVM pass that counts instructions of vector type, for instructions like:
%24 = or <2 x i64> %21, %23
%25 = bitcast <16 x i8> %12 to <8 x i16>
%26 = shl <8 x i16> %25, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
%27 = bitcast <8 x i16> %26 to <2 x i64>
I wrote this code:
for (auto &F : M) {
  for (auto &B : F) {
    for (auto &I : B) {
      if (auto *VI = dyn_cast<InsertElementInst>(&I)) {
        Value *op = VI->getOperand(0);
        if (op->getType()->isVectorTy()) {
          ++vcount;
        }
      }
    }
  }
}
But for some reason if (auto* VI = dyn_cast<InsertElementInst>(&I)) is never satisfied.
Any idea why?
Thanks in advance.
InsertElementInst is one specific instruction (it inserts an element into a vector), and there is none in your list of instructions.
You probably don't need a dyn_cast at all: just use the Instruction in I as it is.
[I personally would use one of the function or module pass classes as a base, so you only need to implement the inner loops of your code, but that's more of an "it's how you're supposed to do things" point, not something you HAVE to do to make it work.]
In LLVM, an instruction is the same as its result. For example, for
%25 = bitcast <16 x i8> %12 to <8 x i16>
when you cast Instruction I to a Value you get %25:
Value* psVal = cast<Value>(&I);
You can then check whether it is of vector type with getType()->isVectorTy().
I also suggest looking at the inheritance diagram of llvm::Value for more clarification,
here: http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html
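To make the difference concrete, here is a toy model in plain C++. The classes below are hypothetical stand-ins that only imitate the shape of llvm::Type, llvm::Value, llvm::Instruction and llvm::InsertElementInst; they are not the real LLVM classes. It shows that a cast to the InsertElementInst subclass never matches or/bitcast/shl, while checking the instruction's own result type counts all four vector instructions from the question.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: NOT the real llvm::Type / Value / Instruction classes.
struct Type {
    unsigned numElements;  // 0 = scalar; N > 0 = vector type <N x ...>
    bool isVectorTy() const { return numElements > 0; }
};

struct Value {
    Type ty;
    explicit Value(Type t) : ty(t) {}
    virtual ~Value() = default;
    const Type* getType() const { return &ty; }
};

// An instruction *is* the value it produces, so it derives from Value.
struct Instruction : Value {
    std::string opcode;
    Instruction(std::string op, Type t) : Value(t), opcode(std::move(op)) {}
};

// One specific opcode, mirroring llvm::InsertElementInst.
struct InsertElementInst : Instruction {
    explicit InsertElementInst(Type t) : Instruction("insertelement", t) {}
};

using Block = std::vector<std::unique_ptr<Instruction>>;

// The question's approach: only matches insertelement instructions.
int countInsertElements(const Block& insts) {
    int n = 0;
    for (const auto& I : insts)
        if (dynamic_cast<const InsertElementInst*>(I.get()))
            ++n;
    return n;
}

// The suggested approach: look at the instruction's own (result) type.
int countVectorInsts(const Block& insts) {
    int vcount = 0;
    for (const auto& I : insts)
        if (I->getType()->isVectorTy())
            ++vcount;
    return vcount;
}

// The four instructions from the question, plus one hypothetical scalar add.
Block questionInsts() {
    Block b;
    b.push_back(std::make_unique<Instruction>("or", Type{2}));       // %24 : <2 x i64>
    b.push_back(std::make_unique<Instruction>("bitcast", Type{8}));  // %25 : <8 x i16>
    b.push_back(std::make_unique<Instruction>("shl", Type{8}));      // %26 : <8 x i16>
    b.push_back(std::make_unique<Instruction>("bitcast", Type{2}));  // %27 : <2 x i64>
    b.push_back(std::make_unique<Instruction>("add", Type{0}));      // hypothetical scalar i32 add
    return b;
}
```

Because Instruction derives from Value here (as it does in LLVM), no cast is needed to ask for the result type.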

What does an overlong bitshift on a LLVM vector yield?

The LLVM documentation for 'shl' says that
<result> = shl i32 1, 32
is an undefined value because it's shifting by greater than or equal to the number of bits in an i32. However, it's not clear to me what happens with
<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 32>
Is only the second element of the result undefined (result=<2 x i32> < i32 2, i32 undef>), or is the result as a whole undefined (result=<2 x i32> undef)?
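The two readings can be modeled lane by lane in plain C++. The sketch below assumes the per-element reading (each lane is shifted by its own amount, and only an overlong lane is undefined, shown as std::nullopt); that assumption is labeled in the code and is not a quote of the LLVM documentation. It also shows a defensive alternative a compiler writer can emit regardless of the answer: mask each amount to its low 5 bits (the rule Java uses for int shifts), so no lane is ever overlong.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// ASSUMPTION: per-lane semantics. Each lane is shifted by its own amount;
// a lane whose amount is >= 32 is modeled as undefined (std::nullopt).
template <std::size_t N>
std::array<std::optional<std::uint32_t>, N>
shlPerLane(const std::array<std::uint32_t, N>& a, const std::array<std::uint32_t, N>& b) {
    std::array<std::optional<std::uint32_t>, N> out{};  // all lanes start empty
    for (std::size_t i = 0; i < N; ++i) {
        if (b[i] < 32)
            out[i] = a[i] << b[i];
        // else: out[i] stays empty, standing in for an undefined lane
    }
    return out;
}

// Defensive alternative: mask each shift amount to 0..31 first,
// so every lane has a defined result.
template <std::size_t N>
std::array<std::uint32_t, N>
shlMasked(const std::array<std::uint32_t, N>& a, const std::array<std::uint32_t, N>& b) {
    std::array<std::uint32_t, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = a[i] << (b[i] & 31u);
    return out;
}
```

For the vectors in the question, <i32 1, i32 1> shifted by <i32 1, i32 32>, the per-lane model gives lane 0 = 2 with lane 1 empty, and the masked version gives <2, 1>.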

LLVM IR: efficiently summing a vector

I'm writing a compiler that's generating LLVM IR instructions. I'm working extensively with vectors.
I would like to be able to sum all the elements in a vector. Right now I'm just extracting each element individually and adding them up manually, but it strikes me that this is precisely the sort of thing that the hardware should be able to help with (as it sounds like a pretty common operation). But there doesn't seem to be an intrinsic to do it.
What's the best way to do this? I'm using LLVM 3.2.
First of all, even without using intrinsics, you can generate about log2(n) additions (with n being the vector length) instead of n - 1 scalar additions. Here's an example with vector size 8:
define i32 @sum(<8 x i32> %a) {
%v1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v2 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%sum1 = add <4 x i32> %v1, %v2
%v3 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
%v4 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%sum2 = add <2 x i32> %v3, %v4
%v5 = extractelement <2 x i32> %sum2, i32 0
%v6 = extractelement <2 x i32> %sum2, i32 1
%sum3 = add i32 %v5, %v6
ret i32 %sum3
}
If your target supports these vector additions, it seems highly likely that the above will be lowered to use those instructions, giving you good performance.
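The halving scheme in the IR above can be modeled with a scalar C++ loop: add the high half of the vector onto the low half, keep the low half, and repeat until one lane is left. For 8 lanes that is 3 rounds, matching %sum1, %sum2 and %sum3. This is only a sketch of the technique; it uses std::uint32_t so overflow wraps like LLVM's plain add, and it assumes a power-of-two length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the shufflevector/add reduction tree for an i32 vector.
// std::uint32_t makes overflow wrap, like LLVM's `add` without nsw/nuw.
std::uint32_t treeSum(std::vector<std::uint32_t> v) {
    // The halving scheme assumes a power-of-two length (e.g. <8 x i32>).
    assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);
    while (v.size() > 1) {
        const std::size_t half = v.size() / 2;
        // Low half + high half, like %sum1 = add <4 x i32> %v1, %v2.
        for (std::size_t i = 0; i < half; ++i)
            v[i] += v[i + half];
        v.resize(half);  // keep only the partial sums for the next round
    }
    return v[0];
}
```

For example, treeSum({1, 2, 3, 4, 5, 6, 7, 8}) returns 36 after three rounds.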
Regarding intrinsics, LLVM 3.2 has no target-independent intrinsic for this (later releases added llvm.vector.reduce.add). If you're compiling to x86, though, you do have access to the horizontal-add intrinsics (e.g. llvm.x86.ssse3.phadd.d.128, which adds adjacent pairs of elements from two <4 x i32> vectors). You'll still have to do something similar to the above; only the add instructions could be replaced.
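To illustrate what "replacing the adds" looks like, the sketch below models the lane behavior of SSSE3 PHADDD in plain C++ (adjacent pairs of the first operand fill the low two lanes, adjacent pairs of the second fill the high two) and uses it to sum an <8 x i32> held as two <4 x i32> halves. The names phaddd and sum8 are hypothetical; this is a model of the instruction's semantics, not a call into LLVM or x86 intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Lane model of PHADDD (llvm.x86.ssse3.phadd.d.128 on <4 x i32>):
// {a0+a1, a2+a3, b0+b1, b2+b3}, with wrap-around on overflow.
V4 phaddd(const V4& a, const V4& b) {
    return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Sum an <8 x i32> given as low and high <4 x i32> halves:
// two horizontal adds, then one final scalar add.
std::uint32_t sum8(const V4& lo, const V4& hi) {
    V4 h1 = phaddd(lo, hi);  // {lo0+lo1, lo2+lo3, hi0+hi1, hi2+hi3}
    V4 h2 = phaddd(h1, h1);  // lane 0 = sum of lo, lane 1 = sum of hi
    return h2[0] + h2[1];
}
```

Whether this beats the plain shufflevector/add version depends on the target; horizontal adds are not always faster than shuffles plus vertical adds.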
For more information, search for "horizontal sum" or "horizontal vector sum"; for instance, here are some relevant Stack Overflow questions about horizontal sums on x86:
horizontal sum of 8 packed 32bit floats
Fastest way to do horizontal vector sum with AVX instructions
Fastest way to do horizontal float vector sum on x86