LLVM-IR array pointer assignment - c++

In C++/C you can do this:
unsigned char A[12];
unsigned int *B;
int *C;
B = malloc(sizeof(unsigned int));
C = malloc(2*sizeof(int));
A[0] = *B;
A[4] = *C;
//Then go on to access A byte by byte.
I was wondering whether this is possible in LLVM-IR, or whether it would immediately complain about a type mismatch. I was about to dive into this, but thought I would first see if anyone has tried this particular example. Would I GEP A's 0th location as an i8* and then B's and C's as i32*? I'm a bit confused about how to proceed, if this is possible at all.
Thanks ahead of time.
UPDATE:
OK, if I instead added initialization for *B and C[0], C[1], would the answer change for LLVM-IR / C / C++?

LLVM has the bitcast instruction which is often used to convert one type of pointer to another type of pointer - e.g., i32* to i8*.
So for example, if you want to access the 3rd byte of a 4-byte number, doing the following is perfectly legitimate:
%bytes = bitcast i32* %num to i8*
%third_byte = getelementptr i8* %bytes, i32 2
Just keep the endianness in mind when you do things like that.
And yes, you can use this technique to obtain pointers to specific locations in an array and to store and load values there, which lets you duplicate your entire example.
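To make the endianness caveat concrete, here is a minimal C++ sketch of the same idea as the bitcast-plus-GEP above: viewing a 32-bit value as bytes and picking one by index. The helper names `byte_in_memory` and `is_little_endian` are made up for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns byte `index` (0..3) of `value` as it is laid out in memory.
// This mirrors bitcasting an i32* to an i8* and indexing it with a GEP.
inline std::uint8_t byte_in_memory(std::uint32_t value, std::size_t index) {
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    // memcpy is the well-defined way to read an object's byte representation.
    std::memcpy(bytes, &value, sizeof value);
    return bytes[index];
}

// True when the least significant byte is stored first.
inline bool is_little_endian() {
    return byte_in_memory(0x01020304u, 0) == 0x04;
}
```

Which byte you get at index 2 depends on the target: for 0xAABBCCDD it is 0xBB on a little-endian machine and 0xCC on a big-endian one.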

No. This is not possible in C/C++ either. You should not assign the value of an uninitialized variable to another variable; doing so invokes undefined behavior. *B and *C are uninitialized.
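For the updated question (with *B, C[0] and C[1] initialized), a well-defined C++ version could look roughly like the sketch below. Note that `A[0] = *B` in the original stores only a single byte (the value converted to unsigned char), so this sketch uses memcpy to copy all the bytes, which is what accessing A byte by byte seems to intend. The function name `fill_buffer` is hypothetical, and the sketch assumes a 4-byte int.

```cpp
#include <cassert>
#include <cstring>

// Copies one initialized unsigned int and two initialized ints into a
// 12-byte buffer so that the buffer can then be read byte by byte.
inline void fill_buffer(unsigned char (&A)[12], unsigned int b, int c0, int c1) {
    static_assert(sizeof(unsigned int) == 4 && sizeof(int) == 4,
                  "this sketch assumes 4-byte int");
    std::memcpy(A + 0, &b, sizeof b);    // bytes 0..3
    std::memcpy(A + 4, &c0, sizeof c0);  // bytes 4..7
    std::memcpy(A + 8, &c1, sizeof c1);  // bytes 8..11
}
```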

Related

Allocation and access of heap arrays with LLVM

Starting with the Kaleidoscope tutorial and a Stack Exchange question, I tried to output some array-creation and access code with LLVM. The idea is to have an "alloca" stack variable "a" which holds a double* pointing to an array allocated with malloc.
The generated code fails, and I believe that the main problem is my Call to "CreateInBoundsGEP" in C++.
So my main question in one sentence is "How to Call CreateInBoundsGEP so that it outputs the right IR code?"
What I tried is the following:
My allocation code is created as output of the llvm c++ interface's "CreateMalloc" call from the question referenced above.
%a = alloca double*, align 8
%malloccall = tail call i8* @malloc(i32 48)
%0 = bitcast i8* %malloccall to double*
store double* %0, double** %a, align 8
This code looks good to me, but it already leads to an error/warning when checked with verifyFunction().
Call parameter type does not match function signature!
i32 48
Sadly, it does not tell me what the right parameter type would be (i64?). The IR reference does not mention the "malloc" function call at all, but describes a "malloc" IR operation instead!
My main problem (also leading to memory errors if not caught before) occurs with write access to the array.
My first try was copying (more or less) directly from the referenced stack exchange question 1:
//ret is the base address of the pointer, ie "a"
//rhs is the Value* to the right hand side that is assigned
//index is the Value* to the array index
auto element_ptr = Builder->CreateInBoundsGEP(ret, index, "acc_tmp");
Builder->CreateStore(rhs, element_ptr);
Which outputs (for a[1]=5 as input code)
%acc_tmp = getelementptr inbounds double*, double** %a, i32 1
store double 5.000000e+00, double** %acc_tmp, align 8
This creates a "verifyFunction" error and I can see that "double**" should probably be "double*".
Since I also got a deprecation warning, I decided to try the CreateInBoundsGEP with a type parameter.
Since the documentation does not tell me whether "Type" should be the element or pointer type, I tried both
auto element_ptr = Builder->CreateInBoundsGEP(rhs->getType()->getPointerTo(), ret, index, "acc_tmp");
Or
auto element_ptr = Builder->CreateInBoundsGEP(rhs->getType(), ret, index, "acc_tmp");
Neither works: the first version outputs the same code as without passing a type, and the second version leads to
static llvm::GetElementPtrInst *llvm::GetElementPtrInst::Create(llvm::Type *, llvm::Value *, ArrayRef<llvm::Value *>, const llvm::Twine &, llvm::Instruction *): Assertion `cast<PointerType>(Ptr->getType()->getScalarType()) ->isOpaqueOrPointeeTypeMatches(PointeeType)' failed.
As I noticed in my original question, there is one pointer level too many in my instruction. Initially I did not understand why this is the case, but then I found the answer to my problem in a seemingly unrelated question 1:
If you directly use the return value of "CreateMalloc" as argument for a "CreateInBoundsGEP", the Code that I originally copied from 2 will work.
However in my case there is one more step involved: I store the "CreateMalloc" return value in a local variable, which in turn is referenced by a pointer allocated with "alloca". Because of this, I need one additional dereferencing step compared to the original Code Snippet to access my array elements.
As mentioned in 1 a dereference in LLVM-IR is just a "load". So a correct array access code looks like
//ret is the pointer to(!) the base address of the array, ie "a"
//rhs is the Value* to the right hand side that is assigned
//index is the Value* holding the array index
llvm::Value* index = visit(ctx->index).as<llvm::Value*>();
llvm::Value* ret_deref = Builder->CreateLoad(llvm::Type::getDoubleTy(*TheContext)->getPointerTo(),ret,"deref_tmp");
auto element_ptr = Builder->CreateInBoundsGEP(rhs->getType(), ret_deref, index, "acc_tmp");
Builder->CreateStore(rhs, element_ptr);
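The extra dereference may be easier to see in plain C++. The sketch below is only an analogy with no LLVM calls: `slot` stands in for the alloca'd variable "a", and each statement is annotated with the IR step it roughly corresponds to. The function name `store_element` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// `slot` plays the role of the alloca'd variable "a" (a double** in IR terms).
// Storing to element `index` takes two steps: load the base pointer out of
// the slot, then index from that base pointer.
inline void store_element(double** slot, std::size_t index, double rhs) {
    assert(slot != nullptr && *slot != nullptr);
    double* base = *slot;            // the CreateLoad ("deref_tmp")
    double* element = base + index;  // the CreateInBoundsGEP ("acc_tmp")
    *element = rhs;                  // the CreateStore
}
```

Indexing directly from `slot` instead of from `base` is the C++ counterpart of the failing IR above: it would address memory next to the stack variable rather than inside the heap array.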

llvm::VectorType with size from runtime constant

Problem
I am trying to create a vector type in LLVM (version 12) to exploit the SIMD feature associated with this type. However, the required size of the array is stored in an integer variable. The desired LLVM IR code could possibly look like,
;; Pseudo-code for the desired LLVM IR
%0 = load i64, i64* %a
%vec = alloca <%0 x double>, align 16
Generating such IR code seems to be impossible though.
It is possible to generate vector alloca with compile-time constants, e.g. a vector of size 4 could be generated as
%vec = alloca <4 x double>, align 16
using the LLVM C++ API as
llvm::Type* I = llvm::Type::getDoubleTy(TheContext);
auto arr_type = llvm::VectorType::get(I,4,false);
llvm::AllocaInst* arr_alloc = Builder.CreateAlloca(arr_type, 0 , "vec" );
however using a runtime constant obtained from a variable seems to be a problem, since the llvm::VectorType::get interface only allows the size to be specified as an unsigned int. I.e. the available interface looks like
static VectorType* llvm::VectorType::get ( Type * ElementType,
unsigned NumElements,
bool Scalable
)
However, if I load the variable's value from %a, I can't create a vector type from it using
llvm::Value *SIZE = Builder.CreateLoad(IntType,Address_Of_Variable_A,"a");
auto arr_type = llvm::VectorType::get(I,SIZE,false); // this line fails to compile (since SIZE is not an unsigned int)
I also could not typecast the Value* pointer to a llvm::ConstantInt* pointer to get the integer value back from the Value* as done in https://stackoverflow.com/a/5315581/2940917 .
This happens due to the fact that SIZE is in this case a LoadInst* as opposed to being created from a ConstantInt::get as done in the linked question.
Is there a way to achieve this? This seems like an essential operation for many cases. It would be surprising if there is no way to declare a vector size from the runtime constant.
Could someone point me to the right information source/idea?

LLVM IR: How to get size of array, in llvm ir-code, when passed as argument to function?

I have a function that takes an array as argument and I need to get the size of the array first thing in the function. I need to do this in LLVM IR. Is this possible? I can access the array but I don't know the size.
void test(int[] a) {
}
is right now translating to
define void @test(i32* %__p__a) {
entry:
%a = alloca i32*, align 4
store i32* %__p__a , i32** %a, align 4
ret void
}
I need to get the size of the array first thing in the function. I need to do this in LLVM IR. Is this possible?
If all you have is an i32* with no additional information about what it points to, then no, it's not possible. In order to get an array's size, you'll need to store that information somewhere where the test function can access it.
Since this is your own language and you control what the generated LLVM IR looks like, you could for example represent arrays as structs that contain the array's size and a pointer to the data.
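As a rough illustration of that suggestion, here is what such a representation could look like, written in C++ rather than IR. The struct `IntArray` and its helpers are hypothetical names; in LLVM IR the equivalent would be a struct type along the lines of { i64, i32* }, passed to the function instead of a bare i32*.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical runtime representation of an array value:
// the element count travels together with the data pointer.
struct IntArray {
    std::int64_t length;  // number of elements
    std::int32_t* data;   // pointer to the first element
};

// Getting the size is now just a field read.
inline std::int64_t array_length(const IntArray& a) { return a.length; }

// Example consumer that needs the size to iterate safely.
inline std::int64_t sum(const IntArray& a) {
    assert(a.length == 0 || a.data != nullptr);
    std::int64_t total = 0;
    for (std::int64_t i = 0; i < a.length; ++i) {
        total += a.data[i];
    }
    return total;
}
```

The trade-off is that every array value becomes two words instead of one, and every place that creates an array has to fill in the length.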

Any difference between vector and array in memory layout of LLVM?

I have an array and a vector, and both of them hold the same data, like 0, 1, 2, 3, 4.
Then I use GEP to get the ptr of array,
%0 = getelementptr [5 x i32]* %arr, i32 0, i32 3
%1 = load i32* %0
so, %0 is the pointer of the 4th element in the array pointed by %arr, and the value of %1 is 3.
But now, I bitcast the pointer to the vector into a pointer to i32:
%2 = bitcast <5 x i32>* %Vec to i32*
and:
%3 = getelementptr i32* %2, i32 3
%4 = load i32* %3
I don't know exactly if there is any difference of the layout in the memory between array and vector.
If there is no difference, I think that way to get the element from a vector is ok.
So, am I on the right way to do like that?
According to "The Often Misunderstood GEP Instruction" (http://llvm.org/docs/GetElementPtr.html) question "Can GEP index into vector elements?", "This hasn’t always been forcefully disallowed, though it’s not recommended. It leads to awkward special cases in the optimizers, and fundamental inconsistency in the IR. In the future, it will probably be outright disallowed."
So it's probably not a good idea to use GEP against vectors, but it's doable.
And in http://llvm.org/docs/doxygen/html/classllvm_1_1SequentialType.html, it says "All of these represent "arrays" in memory. The array type represents a specifically sized array, pointer types are unsized/unknown size arrays, vector types represent specifically sized arrays that allow for use of SIMD instructions. "
So it's better to decide first whether a vector is really what you want. If it is, then the 'extractelement' instruction is probably the better choice (http://llvm.org/docs/LangRef.html#extractelement-instruction).
You can use a bitcast followed by a gep to get the 4th item in a vector, but it's redundant - you can just use a gep by itself, in exactly the same way as you have done with the array.
When using gep, you don't need to know anything about the memory layouts. In any case the memory for vectors is always laid out sequentially, as can be inferred from how bitcast behaves between vectors and integers; and since you can't bitcast arrays, their memory layout is immaterial.

Get pointer to llvm::Value previously allocated for CreateLoad function

I'm new to llvm and I'm writing a small llvm IR Builder.
I use the IRBuilder and all these Create* functions to generate my IR.
What I'm trying to do is to create a load instruction which create a new SSA local variable with value of a previously allocated llvm::Value.
What I expected to have :
%2 = load i32* %1
With %2 results of load instruction and %1 my previously allocated Value (CreateAlloca)
Here is what I tried :
// Get Ptr from Val
Value* ptr = ConstantExpr::getIntToPtr((Constant*)loc[n],PointerType::getUnqual(builder->getInt32Ty()));
// Generate load instruction with the new Ptr
builder->CreateLoad(ptr);
And here is what I have :
%2 = load i32* null
loc is an array which contains all my llvm::Value*
Can you please tell me what I'm doing wrong? Or am I on the wrong track altogether?
Thanks.
ConstantExpr::getIntToPtr() creates a constant expression. So in effect, what you're trying to generate is equivalent to this IR:
%2 = load i32* inttoptr (i32 %1 to i32*)
But this is illegal since a constant expression, as hinted by its name, only supports constants, and %1 isn't a constant. ConstantExpr::getIntToPtr() requires a Constant as a first argument to verify it, but you passed it a non-constant value which was forcefully cast to a constant.
The correct way to convert a non-constant integer to a pointer is with IRBuilder::CreateIntToPtr. However, since you say the previous value (loc[n]) was created via an alloca, it is already a pointer and you don't need to perform any conversion: just do builder->CreateLoad(loc[n]).
By the way, the proper way to cast a Value to a Constant in LLVM is not via a c-style cast but via cast<>, like so: cast<Constant>(loc[n]).