ConstantStruct or ConstantArray read memory content

I'm writing an LLVM (3.7) pass, and I need some information about the Constant classes.
In the LLVM IR I'm parsing, there are "ConstantStruct" or "ConstantArray" instances which are used as initializers for global variables. For example:
%struct.S = type { i32, i32, i32, i32 }
@s = global [2 x %struct.S] [%struct.S { i32 6, i32 8, i32 -8, i32 -5 }, %struct.S { i32 0, i32 2, i32 -1, i32 2 }], align 4
My work would be much easier if I could read the memory of those constants, without having to recursively go through all elements (which can themselves be ConstantStructs or ConstantArrays).
For example, I'd need a function such as:
llvm::ConstantStruct* initializer = globalVar->getInitializer();
void* memoryContent;
int sizeInBytes = initializer->getMemoryContent(&memoryContent);
So far, I have to read elements one by one, which is a painful (and bug prone) process.
Any hint will be appreciated.

I just realized it's not possible in general, because the initializer operands may contain ConstantExprs.
However, I managed to get a nice recursive implementation with a big switch on the value ID.
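A minimal sketch of that recursive approach, written against a hypothetical, simplified constant tree so it compiles without LLVM. `Const`, `Kind`, `i32`, `aggregate`, `expr` and `flatten` are illustrative names, not LLVM API. In a real pass the switch would be on `llvm::Value::getValueID()` (or `dyn_cast` checks) over classes such as `ConstantInt`, `ConstantStruct` and `ConstantArray`, and element offsets would come from the module's `DataLayout`. This sketch assumes i32-only leaves, no padding and little-endian output, which matches the `@s` example above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for LLVM's Constant hierarchy. A real pass would
// switch on llvm::Value::getValueID() instead of Kind.
enum class Kind { Int32, Aggregate, Expr };

struct Const;
using ConstPtr = std::shared_ptr<Const>;

struct Const {
    Kind kind;
    std::int32_t value;              // meaningful only for Kind::Int32
    std::vector<ConstPtr> elements;  // meaningful only for Kind::Aggregate
};

ConstPtr i32(std::int32_t v) { return std::make_shared<Const>(Const{Kind::Int32, v, {}}); }

ConstPtr aggregate(std::vector<ConstPtr> elems) {
    return std::make_shared<Const>(Const{Kind::Aggregate, 0, std::move(elems)});
}

ConstPtr expr() { return std::make_shared<Const>(Const{Kind::Expr, 0, {}}); }

// Appends the bytes of c to out in little-endian order, recursing into
// structs/arrays. Assumes no padding between elements, which holds for
// { i32, i32, i32, i32 } but not for every aggregate.
void flatten(const Const& c, std::vector<std::uint8_t>& out) {
    switch (c.kind) {
    case Kind::Int32: {
        const auto bits = static_cast<std::uint32_t>(c.value);
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
        break;
    }
    case Kind::Aggregate:
        for (const ConstPtr& e : c.elements)
            flatten(*e, out);
        break;
    case Kind::Expr:
        // E.g. a ptrtoint of a global: its bytes are not known until link or
        // load time, so no generic "getMemoryContent" can produce them.
        throw std::runtime_error("constant expression has no compile-time bytes");
    }
}
```

For the `@s` initializer, `aggregate({aggregate({i32(6), i32(8), i32(-8), i32(-5)}), aggregate({i32(0), i32(2), i32(-1), i32(2)})})` flattens to 32 bytes beginning `06 00 00 00`, while any constant expression in the tree makes `flatten` throw, which mirrors why a single "read the memory" call cannot exist.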

Related

llvm - Access And Call Function Pointer In A Global Array Without Horrible Pointer Hacking

I am having quite some trouble programmatically accessing a function pointer in a global array. I have a global array of function pointers, my "lookup table", which I am basically using for "overloads". Every time I try to GetElementPtr (GEP)/getelementptr an element in this array with the desired type, I get a runtime assertion:
warp_compiler: /root/.conan/data/llvm-core/13.0.0/_/_/package/6efbb14f313e71b5e1dbf77c1c011f47614b7c7c/include/llvm/IR/Instructions.h:960: static llvm::GetElementPtrInst* llvm::GetElementPtrInst::Create(llvm::Type*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, const llvm::Twine&, llvm::Instruction*): Assertion `cast<PointerType>(Ptr->getType()->getScalarType())->isOpaqueOrPointeeTypeMatches(PointeeType)' failed.
Aborted (core dumped)
Now, the type of the array when compiled is [3 x i32 (i32)*]. By default it tries to do a GEP on [3 x i32 (i32)*]* with element type [3 x i32 (i32)*], which does not work.
If I manually edit the code to be:
%option_address = getelementptr i32 (i32)*, [3 x i32 (i32)*]* @my_function_1_table, i32 %7
Or to:
%option_address = getelementptr i32 (i32)*, [3 x i32 (i32)*] @my_function_1_table, i32 %7
it works just dandy; the latter is really what I am looking to do. But I can't seem to do it programmatically because of this exception.
I have tried casting the array to i32 (i32)* with:
auto first_element = context->builder.CreatePointerBitCastOrAddrSpaceCast(
(llvm::Value*) lookup_table_global,
(llvm::Type*) function->getType(),
"cast"
);
Then trying to access the elements with something like:
auto element = context->builder.CreateGEP(
(llvm::Type*) function->getType(),
first_element,
index_array,
"option_address"
);
But I get that exception again, even though it does work if I type it manually into the IR:
%option_address = getelementptr i32 (i32)*, i32 (i32)* @my_function_1_table, i32 %7
Seems like a pretty regular way to access an array, right?
But I can't seem to do it programmatically because of the assertion. I even tried to work around it by inheriting from GetElementPtrInst directly and omitting the assertion, but couldn't (because its constructor is private).
Currently, my solution is to cast the array to i32 (i32)*, then to [1 x i32 (i32)*], and then do the GEP on a [1 x i32 (i32)*]* with element type [1 x i32 (i32)*]:
%option_address = getelementptr [1 x i32 (i32)*], [1 x i32 (i32)*]* bitcast ([3 x i32 (i32)*]* @my_function_1_table to [1 x i32 (i32)*]*), i32 %7
This is horrible.
Does anyone know how I can simply access the function pointers I need from a global (constant) array so they can be called?
Also is my current solution portable?
Thank you!

Sorry you've run into this challenging aspect of LLVM. It definitely causes confusion.
There is an entire page, the LLVM GetElementPtr FAQ (http://llvm.org/docs/GetElementPtr.html), dedicated to helping folks understand the counter-intuitive design of this instruction. While the design is well motivated from within LLVM, it causes lots of folks confusion and frustration when they first encounter it.
The challenge you're hitting is because a GEP instruction in LLVM always operates on a pointer, and with global variables, that pointer is to the variable. When the global variable is an array as in your case, this is extra confusing -- GEP has to go through an extra layer of pointer before it gets to the array you're trying to index with it.
The first section of the GEP site I mentioned above specifically explains how the first index to a GEP works -- it indexes the base pointer directly.
The second section then clarifies why global variables end up surprising here. In IR, the name of a global variable, @my_function_1_table in your case, is a pointer to the variable's storage, so its type is [3 x i32 (i32)*]*, not [3 x i32 (i32)*]. You have to index that pointer with a simple i32 0 index first. Then you can add an additional index into the array that the global variable points to.
So for a global variable of type [3 x i32 (i32)*], if you want the address of the third element of the array (index 2), you need:
%fptr = getelementptr [3 x i32 (i32)*], [3 x i32 (i32)*]* @my_function_1_table, i32 0, i32 2
The first index, i32 0, steps through the pointer to the global itself. The second index, i32 2, indexes into the array. The result is the address of the element; you still need a load to get the function pointer itself before calling it.
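The same two-level indexing can be mirrored in plain C++, which may make the leading zero index less mysterious. This is only an analogy, assuming the C++ array `table` plays the role of `@my_function_1_table`; `add_one`, `twice`, `negate`, `element_address` and `call_entry` are hypothetical names. Taking `&table` gives a pointer to the whole array, just like the IR global, so it must be indexed with `[0]` before `[i]` selects an element.

```cpp
#include <cassert>
#include <cstddef>

using FPtrT = int (*)(int);

// Hypothetical table entries, standing in for the real overloads.
int add_one(int x) { return x + 1; }
int twice(int x) { return 2 * x; }
int negate(int x) { return -x; }

// Plays the role of @my_function_1_table : [3 x i32 (i32)*].
FPtrT table[3] = {add_one, twice, negate};

// The IR name @my_function_1_table is a pointer to the whole array, like
// &table (type FPtrT (*)[3]) in C++.
FPtrT* element_address(int i) {
    FPtrT (*base)[3] = &table;
    // base[0] selects the array the pointer points to (GEP index i32 0);
    // [i] then selects the element (GEP index i32 %i). Using base[i] alone
    // would step over whole 3-element arrays instead.
    return &base[0][i];
}

// Equivalent of the separate load + call in the Clang output.
int call_entry(int i, int arg) {
    FPtrT fptr = *element_address(i);
    return fptr(arg);
}
```

Here `element_address(2)` is the address of the third entry, two pointer sizes past the start of `table`, and `call_entry(2, 42)` calls `negate`.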
You can also use Clang to get example LLVM IR that can help explain how to do things. For example, here is some C++ that does something similar to what you're trying to do:
using FPtrT = int (*)(int);
extern FPtrT function_ptrs[3];
int test(int i) {
FPtrT fptr = function_ptrs[i];
return (*fptr)(42);
}
And this turns into the following LLVM IR after some basic optimizations (-O1):
@function_ptrs = external dso_local local_unnamed_addr global [3 x i32 (i32)*], align 16
define dso_local i32 @_Z4testi(i32 %0) local_unnamed_addr #0 {
%2 = sext i32 %0 to i64
%3 = getelementptr inbounds [3 x i32 (i32)*], [3 x i32 (i32)*]* @function_ptrs, i64 0, i64 %2
%4 = load i32 (i32)*, i32 (i32)** %3, align 8, !tbaa !4
%5 = call i32 %4(i32 42)
ret i32 %5
}
attributes #0 = { mustprogress uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
Here you can see %3 is doing a dynamic (and inbounds, but that's orthogonal) version of this indexing.
You can play with this kind of IR generation using Compiler Explorer: https://cpp.compiler-explorer.com/z/ETa8nvTvh
Once you're using the API to create this two index GEP it should start working for you.
Also, just so you (or others reading this) don't get confused: the LLVM IR syntax changed here recently, so the latest versions of LLVM don't look quite the same. With opaque pointers (the default since LLVM 15), typed pointer types such as [3 x i32 (i32)*]* are all spelled ptr. You can switch from Clang v13 to a more recent one to see what it looks like, for example here: https://cpp.compiler-explorer.com/z/Kc4er413G

How can I check if two GEP instructions are semantically equal or not?

I have two GEP instructions looking like below:
%size = getelementptr inbounds %struct.ArrayInfo, %struct.ArrayInfo* %0, i32 0, i32 0
...
%size = getelementptr inbounds %struct.ArrayInfo, %struct.ArrayInfo* %1, i32 0, i32 0
Essentially these two are accessing the same struct field. Is there a way to check if these two instructions are equivalent in llvm? I tried comparing pointers of GEPOperator (GEPOperator*), but it looks like they are different.

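Try isSameOperationAs(). If you cast both size values in your example to llvm::Instruction and call this method on one with the other as the argument, it returns true. Keep in mind that isSameOperationAs() checks that the opcode, the types, the operand types and any other properties of the operation match, but it does not require the operands themselves to be identical, so it would also return true for two GEPs into the same struct type with different field indices. If the exact field matters, also compare the index operands (everything after the base pointer). isIdenticalTo() would return false here, because it also compares the base pointer operands %0 and %1.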

why align 8 for int?

When I compile a C++ class to LLVM IR, the store for the int member is emitted with align 8 when another member is a double. Why? When all members are of type int, it is align 4.
class _AA_ {
public:
int a = 11;
double b = 22;
};
This will turn into:
define linkonce_odr void @_ZN4_AA_C2Ev(%class._AA_*) unnamed_addr #1 align 2 {
%2 = alloca %class._AA_*, align 8
store %class._AA_* %0, %class._AA_** %2, align 8
%3 = load %class._AA_*, %class._AA_** %2, align 8
%4 = getelementptr inbounds %class._AA_, %class._AA_* %3, i32 0, i32 0
store i32 11, i32* %4, align 8
%5 = getelementptr inbounds %class._AA_, %class._AA_* %3, i32 0, i32 1
store double 2.200000e+01, double* %5, align 8
ret void
}

Since you are wrapping the two variables in a structure (class and struct are the same for this purpose), their relative positions must always be the same. So the structure as a whole must have the highest alignment of all its members, which is 8, from the double.
And since you have a standard-layout object, the first declared member must be at offset 0, so it also has the alignment of the containing object.
If you only have ints, their alignment is just 4, so the object's alignment is only 4 as well.
Members other than the first generally get only their natural alignment, since they are placed after the preceding member with just enough padding to satisfy it. Only the first member inherits the alignment of the containing object.
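These layout rules can be checked at compile time with `alignof` and `offsetof`. A small sketch, assuming a mainstream target such as x86-64, where `double` is 8-aligned and `int` is 4-aligned; the class is renamed from `_AA_` to the hypothetical `Mixed` because identifiers starting with an underscore and a capital letter are reserved in C++, and `IntsOnly` and `first_member_fully_aligned` are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Same layout as the question's _AA_.
class Mixed {
public:
    int a = 11;
    double b = 22;
};

struct IntsOnly {
    int a = 11;
    int b = 22;
};

constexpr std::size_t max_align(std::size_t x, std::size_t y) { return x > y ? x : y; }

static_assert(std::is_standard_layout<Mixed>::value, "offsetof needs standard layout");
// The object takes the strictest member alignment (8 on typical x86-64 targets).
static_assert(alignof(Mixed) == max_align(alignof(int), alignof(double)), "object alignment");
// The first member sits at offset 0, so it is as aligned as the whole object.
static_assert(offsetof(Mixed, a) == 0, "first member at offset 0");
// Later members only need their own alignment, reached via padding.
static_assert(offsetof(Mixed, b) % alignof(double) == 0, "b naturally aligned");
// With only ints, the object needs just int alignment (typically 4).
static_assert(alignof(IntsOnly) == alignof(int), "ints-only alignment");

// True when m.a's address is a multiple of the whole object's alignment,
// which is what allows Clang to emit "store i32 11, i32* %4, align 8".
bool first_member_fully_aligned(const Mixed& m) {
    return reinterpret_cast<std::uintptr_t>(&m.a) % alignof(Mixed) == 0;
}
```

The `static_assert`s fail to compile if a target lays things out differently, which makes the assumption explicit rather than silent.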

The GEP Instruction: i32 vs i64

I've been trying to understand the LLVM’s GetElementPtr (GEP) instruction and came across this document:
http://llvm.org/docs/GetElementPtr.html
It's very helpful, but there's a few things that I find confusing. In particular, in section 'What is dereferenced by GEP?' (http://llvm.org/docs/GetElementPtr.html#id6) the following code is discussed:
%MyVar = uninitialized global { [40 x i32 ]* }
...
%idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0, i64 0, i64 17
%MyVar is a global variable that is a pointer to a structure containing a pointer to an array of 40 ints. This is clear. I understand that arguments after %MyVar are indices into it, but I don't see why some of them are declared as i64 and others as i32.
My understanding is that this code was written for a 64 bit machine and that pointers are assumed to be 64 bits wide. The contents of the array pointed to by %MyVar are 32 bits wide. Why then is the last index i64 17 rather than i32 17?
I should also point out that this example illustrates illegal usage of GEP (the pointer in the structure must be dereferenced in order to index into the array of 40 ints) and I am trying to get a very good grasp of why this is the case.

The answer to the question "what is dereferenced by GEP?" is: nothing. GEP never dereferences pointers; it only computes new addresses based on the pointer you pass it. It never reads any memory.
Look at the example:
%idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0, i64 0, i64 17
We start with %MyVar, which is a { [40 x i32]* }*: a pointer to a struct containing a pointer to an array.
After the first index, i64 0, we refer to the struct { [40 x i32]* } itself. %MyVar already pointed to it, so no dereferencing is necessary.
After the second index, i32 0, we refer to the [40 x i32]*, the only member of the struct. It has the same memory location as the struct itself, which is at %MyVar.
The third index, i64 0, would now refer to the [40 x i32] array itself. This is illegal: GEP would need to dereference (load) the pointer obtained in the previous step to obtain this memory address. In general, GEP can never index "through" a pointer, with the obvious exception that the initial value you pass to it is always a pointer.
I will also point out that i32 0 and i64 0 are the same for the purposes of indexing into an array or through the base pointer; both refer to the first element, and the same holds for the constant 17 that you mentioned, because such indices are sign-extended or truncated to the pointer's index width. Struct field indices are the exception: they must be i32 constants, which is why the struct index in the example is written as i32 0.
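To make the "GEP never loads" point concrete, here is a C++ analogue of the legal way to reach element 17: one address computation to the struct member, an explicit load of the stored pointer, then a second address computation into the array. The names `Arr40`, `Holder`, `storage`, `MyVar`, `element17` and `same_element_for_i32_and_i64` are hypothetical, and the IR in the comments is an informal mapping rather than compiler output.

```cpp
#include <cassert>
#include <cstdint>

using Arr40 = std::int32_t[40];  // [40 x i32]

// Hypothetical counterpart of the FAQ's global { [40 x i32]* }.
struct Holder {
    Arr40* arr;  // the struct's only member: a pointer to the array
};

Arr40 storage = {};
Holder my_var_storage = {&storage};
Holder* MyVar = &my_var_storage;  // like %MyVar : { [40 x i32]* }*

// The legal way to reach element 17: two GEPs with a load in between.
std::int32_t* element17() {
    // GEP { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0
    // Pure address arithmetic; nothing is read.
    Arr40** field = &MyVar[0].arr;
    // load [40 x i32]*, [40 x i32]** %field: the only memory read.
    Arr40* arr = *field;
    // GEP [40 x i32], [40 x i32]* %arr, i64 0, i64 17
    return &arr[0][17];
}

// Index width does not change the address: 17 as int32 or int64 selects the
// same element.
bool same_element_for_i32_and_i64() {
    return &storage[std::int32_t{17}] == &storage[std::int64_t{17}];
}
```

The load in the middle is exactly the step the original single GEP tried to skip, which is why it had to be split.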

Passing an array to an external function

I am new to LLVM, and I am learning how to use LLVM for profiling. I need to pass an array to an external method, and insert a call instruction to the method in the code. I am currently using the following code, which on execution gives a segmentation fault.
std::vector<Value*> Args(1);
//Vector with array values
SmallVector<Constant*, 2> counts;
counts.push_back(ConstantInt::get(Type::getInt32Ty(BB->getContext()),32, false));
counts.push_back(ConstantInt::get(Type::getInt32Ty(BB->getContext()),12, false));
//Array with 2 integers
Args[0]= ConstantArray::get(llvm::ArrayType::get(llvm::Type::getInt32Ty(BI->getContext()),2), counts);
Here, the external function 'hook' is defined as M.getOrInsertFunction("hook", Type::getVoidTy(M.getContext()),
llvm::ArrayType::get(llvm::Type::getInt32Ty(BI->getContext()),2),
(Type*)0);
After reading a few source files, I've tried using GetElementPtrInst to pass the array
std::vector<Value*> ids(1);
ids.push_back(ConstantInt::get(Type::getInt32Ty(BB->getContext()),0));
Constant* array = ConstantArray::get(llvm::ArrayType::get(llvm::Type::getInt32Ty(BI->getContext()),2), counts);
Args[0] = ConstantExpr::getGetElementPtr(&(*array), ids, false);
but it fails with
7 opt 0x00000000006c59f5 bool llvm::isa<llvm::Constant, llvm::Value*>(llvm::Value* const&) + 24
8 opt 0x00000000006c5a0f llvm::cast_retty<llvm::Constant, llvm::Value*>::ret_type llvm::cast<llvm::Constant, llvm::Value*>(llvm::Value* const&) + 24
9 opt 0x0000000000b2b22f
10 opt 0x0000000000b2a4fe llvm::ConstantFoldGetElementPtr(llvm::Constant*, bool, llvm::ArrayRef<llvm::Value*>) + 55
11 opt 0x0000000000b33df2 llvm::ConstantExpr::getGetElementPtr(llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool) + 82
Also, in this case, 'hook' is defined as M.getOrInsertFunction("hook", Type::getVoidTy(M.getContext()),
PointerType::get(Type::getInt32PtrTy(M.getContext()),0), //when using GEP
(Type*)0);
Could someone kindly give me a few pointers on passing arrays to an external function (say with the signature void hook(int abc[]))? I am probably wrong all the way through, and would really appreciate some help.

A good place to start with "how do I do this c-like thing in LLVM IR" questions is to first write what you want to do in C, then compile it to LLVM IR via Clang and take a look at the result.
In your particular instance, the file:
void f(int a[2]);
void g() {
int x[2];
x[0] = 1;
x[1] = 3;
f(x);
}
Will compile to:
define void @g() nounwind {
%x = alloca [2 x i32], align 4
%1 = getelementptr inbounds [2 x i32]* %x, i32 0, i32 0
store i32 1, i32* %1, align 4
%2 = getelementptr inbounds [2 x i32]* %x, i32 0, i32 1
store i32 3, i32* %2, align 4
%3 = getelementptr inbounds [2 x i32]* %x, i32 0, i32 0
call void @f(i32* %3)
ret void
}
declare void @f(i32*)
So we can see that Clang declares f as taking an i32*, not an array. That means you need a way to get the address of the first element of the array from the array itself, and a getelementptr instruction is a straightforward way of doing that.
Notice, however, that you want to generate a GEP instruction (getelementptr), for example via GetElementPtrInst::Create. A GEP constant expression, which is what you're trying to generate here, is something else: it only works on compile-time constants, and its base operand must be a constant pointer (such as a global variable), whereas a ConstantArray is an array value, not a pointer.
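The same adjustment is visible in plain C++, which is why the IR signature ends up with `i32*`: a parameter declared as `int abc[]` or `int a[2]` is really a pointer parameter, and passing an array passes the address of its first element. A small sketch, where `hook_sum` and `caller` are hypothetical names (`hook_sum` returns a value only so the result can be checked):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the external "hook". The [2] in the parameter is
// ignored by the language; the parameter type is really int*.
int hook_sum(int abc[2]) { return abc[0] + abc[1]; }

static_assert(std::is_same<decltype(hook_sum), int(int*)>::value,
              "an array parameter is adjusted to a pointer parameter");

int caller() {
    int x[2];
    x[0] = 1;
    x[1] = 3;
    // x decays to &x[0], the address Clang computes with
    // getelementptr inbounds [2 x i32]* %x, i32 0, i32 0.
    return hook_sum(x);
}
```

So when building the call from a pass, the declared parameter type of the hook should be a pointer to the element type, and the argument should be the address of element 0.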

You should use Clang to compile it. Then check the bounds of the array and whether all of its elements are initialized.
