Allocation and access of heap arrays with LLVM - c++

Starting with the caleidoscope tutorial and a stack exchange question (question) I tried to output some array-creation and access code with LLVM. The idea is to have an "alloca" stack variable "a" which holds a double* pointing to an array allocated with malloc.
The generated code fails, and I believe that the main problem is my Call to "CreateInBoundsGEP" in C++.
So my main question in one sentence is "How to Call CreateInBoundsGEP so that it outputs the right IR code?"
What i tried is the following:
My allocation code is created as output of the llvm c++ interface's "CreateMalloc" call from the question referenced above.
%a = alloca double*, align 8
%malloccall = tail call i8* #malloc(i32 48)
%0 = bitcast i8* %malloccall to double*
store double* %0, double** %a, align 8
This code looks good to me, but it already leads to an error/warning when checked with verifyFunction().
Call parameter type does not match function signature!
i32 48
Sadly it does not tell me, what the right parameter type would be (i64?). The IR reference does not refer to the "malloc"-function call at all but mentions a "malloc" IR-operation instead (reference)!
My main problem (also leading to memory errors if not caught before) occurs with write access to the array.
My first try was copying (more or less) directly from the referenced stack exchange question 1:
//ret is the base adress of the pointer, ie "a"
//rhs is the Value* to the right hand side that is assigned
//index is the Value* to the array index
auto element_ptr = Builder->CreateInBoundsGEP(ret, index, "acc_tmp");
Builder->CreateStore(rhs, element_ptr);
Which outputs (for a[1]=5 as input code)
%acc_tmp = getelementptr inbounds double*, double** %a, i32 1
store double 5.000000e+00, double** %acc_tmp, align 8
This creates a "verifyFunction" error and I can see that "double**" should probably be "double*".
Since I also got a deprecation warning, I decided to try the CreateInBoundsGEP with a type parameter.
Since the documentation does not tell me whether "Type" should be the element or pointer type, I tried both
auto element_ptr = Builder->CreateInBoundsGEP(rhs->getType()->getPointerTo(), ret, index, "acc_tmp");
Or
auto element_ptr = Builder->CreateInBoundsGEP(rhs->getType(), ret, index, "acc_tmp");
Both do not work, the first version outputs the same code as without passing a type, the second version leads to
static llvm::GetElementPtrInst *llvm::GetElementPtrInst::Create(llvm::Type *, llvm::Value *, ArrayRef<llvm::Value *>, const llvm::Twine &, llvm::Instruction *): Assertion `cast<PointerType>(Ptr->getType()->getScalarType()) ->isOpaqueOrPointeeTypeMatches(PointeeType)' failed.

As I noticed in my original question, there is one pointer* too much in my instruction. Initially I did not understand why this is the case, but then I found the answer to my problem in a seemingly unrelated question 1:
If you directly use the return value of "CreateMalloc" as argument for a "CreateInBoundsGEP", the Code that I originally copied from 2 will work.
However in my case there is one more step involved: I store the "CreateMalloc" return value in a local variable, which in turn is referenced by a pointer allocated with "alloca". Because of this, I need one additional dereferencing step compared to the original Code Snippet to access my array elements.
As mentioned in 1 a dereference in LLVM-IR is just a "load". So a correct array access code looks like
//ret is the pointer to(!) the base adress of the array, ie "a"
//rhs is the Value* to the right hand side that is assigned
//index is the Value* holding the array index
llvm::Value* index = visit(ctx->index).as<llvm::Value*>();
llvm::Value* ret_deref = Builder->CreateLoad(llvm::Type::getDoubleTy(*TheContext)->getPointerTo(),ret,"deref_tmp");
auto element_ptr = Builder->CreateInBoundsGEP(rhs->getType(), ret_deref, index, "acc_tmp");
Builder->CreateStore(rhs, element_ptr);

Related

llvm::VectorType with size from runtime constant

Problem
I am trying to create a vector type in LLVM (version 12) to exploit the SIMD feature associated with this type. However, the required size of the array is stored in an integer variable. The desired LLVM IR code could possibly look like,
;; Pseudo-code for the desired LLVM IR
%0 = load i64, i64* %a
%vec = alloca <%0 x double>, align 16
Generating such IR code seems to be impossible though.
It is possible to generate vector alloca with compile-time constants, e.g. a vector of size 4 could be generated as
%vec = alloca <4 x double>, align 16
using the LLVM C++ API as
llvm::Type* I = llvm::Type::getDoubleTy(TheContext);
auto arr_type = llvm::VectorType::get(I,4,false);
llvm::AllocaInst* arr_alloc = Builder.CreateAlloca(arr_type, 0 , "vec" );
however using a runtime constant obtained from a variable seems to be a problem, since the llvm::VectorType::get interface only allows the size to be specified as an unsigned int. I.e. the available interface looks like
static VectorType* llvm::VectorType::get ( Type * ElementType,
unsigned NumElements,
bool Scalable
)
However, if I load the variable value from %a and I cant create a vector type from it using,
llvm::Value *SIZE = Builder.CreateLoad(IntType,Address_Of_Variable_A,"a");
auto arr_type = llvm::VectorType::get(I,SIZE,false); // this line fails to compile (since SIZE is not an unsigned int)
I also could not typecast the Value* pointer to a llvm::ConstantInt* pointer to get the integer value back from the Value* as done in https://stackoverflow.com/a/5315581/2940917 .
This happens due to the fact that SIZE is in this case a LoadInst* as opposed to being created from a ConstantInt::get as done in the linked question.
Is there a way to achieve this? This seems like an essential operation for many cases. It would be surprising if there is no way to declare a vector size from the runtime constant.
Could someone point me to the right information source/idea?

LLVM IR: How to get size of array, in llvm ir-code, when passed as argument to function?

I have a function that takes an array as argument and I need to get the size of the array first thing in the function. I need to do this in LLVM IR. Is this possible? I can access the array but I don't know the size.
void test(int[] a) {
}
is right now translating to
define void #test(i32* %__p__a) {
entry:
%a = alloca i32*, align 4
store i32* %__p__a , i32** %a, align 4
ret void
}
I need to get the size of the array first thing in the function. I need to do this in LLVM IR. Is this possible?
If all you have is an i32* with no additional information about what it points to, then no, it's not possible. In order to get an array's size, you'll need to store that information somewhere where the test function can access it.
Since this is your own language and you control what the generated LLVM IR looks like, you could for example represent arrays as structs that contain the array's size and a pointer to the data.

LLVM IR: Get LVALUE operand

I have following instruction:
%ptrA = getelementptr float, float addrspace(1)* %A, i32 %id
I can get the operands %A and %id using getOperand(0) and getOperand(1). I was wondering if getOperand will work on %ptrA? If yes, would it be getOperand(3)?
------------------------------------Edit----------------------------
So I changed my code as follows:
for (Instruction &I : instructions(F)){
if (cast<Operator>(I).getOpcode() == Instruction::GetElementPtr){
Value* AddrPointer = cast<Value>(I);
I keep getting error:
error: cannot convert ‘llvm::Value’ to ‘llvm::Value*’ in initialization
Value* AddrPointer = cast<Value>(I);
^
I see that there is some problem with type mismatch.
Thank you.
Your question lacks quite a bit of context, but I will assume you're working with an llvm::Instruction * representing that particular getelementptr instruction. No, getOperand() will not allow you to access %ptrA. In general, getOperand() only allows access to the instruction's operands, or arguments, but not its return value. In IR, %ptrA is not so much an operand of the instruction like in traditional assembly, but can be thought of more like the return value of the instruction.
The syntax for what you're trying to do is actually very convenient. An llvm::Instruction object itself represents its own return value. In fact, llvm::Instruction is a derived class of llvm::Value. You can use llvm::cast, with llvm::Value as the template argument and the result will actually be an llvm::Value * which represents the return value of getelementptr.
llvm::Instruction * instruc;
//next line assumes instruc has your getelementptr instruction
llvm::Value * returnval = llvm::cast<llvm::Value>(instruc);
//returnval now contains the result of the instruction
//you could potentially create new instructions with IRBuilder using returnval as an argument, and %ptrA would actually be passed as an operand to those instructions
Furthermore, many of the functions that actually create instructions (the llvm::IRBuilder::Create* instructions, for instance) don't even return llvm::Instruction *s but rather llvm::Value *s. This is very convenient, because most of the time if you need to feed the return value of an instruction into another instruction, you can simply pass the return value of whatever Create function you called into the next Create function, without needing to do any casting.

LLVM-IR array pointer assignment

In C++/C you can do this:
unsigned char A[12];
unsigned int *B;
int *C;
B = malloc(sizeof(unsigned int));
C = malloc(2*sizeof(int));
A[0] = *B;
A[4] = *C;
//Then go on to access A byte by byte.
I was wondering if this was possible in LLVM-IR, or would it immediately complain of a problem with types. Was about to dive into this, but thought I would see if anyone has tried this particular example. Would I GEP A's 0th location as a i8* and then B and C's as a i32*. I'm a bit confused as how to proceed, if this is at all possible.
Thanks ahead of time.
UPDATE:
Ok, if I instead added initialization for *B and C[0], C[1], it would the answer change for LLVM-IR /C / C++?
LLVM has the bitcast instruction which is often used to convert one type of pointer to another type of pointer - e.g., i32* to i8*.
So for example, if you want to access the 3rd byte of a 4-byte number, doing the following is perfectly legitimate:
%bytes = bitcast i32* %num to i8*
%third_byte = getelementptr i8* %bytes, i32 2
Just keep in mind the endianess when you do stuff like that.
And yes, you can use this technique to obtain pointers to specific locations in an array and store and load values from there, enabling you to do duplicate your entire example.
No. This is not possible in C/C++ too. You should not assign an uninitialized variable to another variable. It invokes undefined behavior. *B and *c are uninitialized.

Get pointer to llvm::Value previously allocated for CreateLoad function

I'm new to llvm and I'm writing a small llvm IR Builder.
I use the IRBuilder and all these Create* functions to generate my IR.
What I'm trying to do is to create a load instruction which create a new SSA local variable with value of a previously allocated llvm::Value.
What I expected to have :
%2 = load i32* %1
With %2 results of load instruction and %1 my previously allocated Value (CreateAlloca)
Here is what I tried :
// Get Ptr from Val
Value* ptr = ConstantExpr::getIntToPtr((Constant*)loc[n],PointerType::getUnqual(builder->getInt32Ty()));
// Générate load instruction with the new Ptr
builder->CreateLoad(ptr);
And here is what I have :
%2 = load i32* null
loc is an array which contains all my llvm::Value*
Can you please tell me what I'm doing wrong ? Or maybe if I'm on a bad way ?
Thanks.
ConstantExpr::getIntToPtr() creates a constant expression. So in effect, what you're trying to generate is equivalent to this IR:
%2 = load i32* inttoptr (i32 %1 to i32*)
But this is illegal since a constant expression, as hinted by its name, only supports constants, and %1 isn't a constant. ConstantExpr::getIntToPtr() requires a Constant as a first argument to verify it, but you passed it a non-constant value which was forcefully cast to a constant.
The correct way to convert a non-constant integer to a pointer is with IRBuilder::createIntToPtr. However, since you say the previous value (loc[n]) was created via an alloca then it's already a pointer, and you don't need to perform any conversion: just do builder->CreateLoad(loc[n]).
By the way, the proper way to cast a Value to a Constant in LLVM is not via a c-style cast but via cast<>, like so: cast<Constant>(loc[n]).