Understanding how memory allocation works (LLVM) - c++

I'm making progress on a toy compiler (first time), and trying to understand how to allocate/construct an LLVM struct type.
The Kaleidoscope tutorial doesn't include or even mention this and I don't know what I'm looking for in the LLVM source/tests to find possible examples.
So I've written a simple C++ example and dumped the IR with clang to try to understand what it produces, but to be honest I don't follow all of it. The parts that are obvious to me are the function definitions/declarations, some function calls, and a memset call, so I get pieces of it, but it doesn't all come together for me yet. (P.S. my interpretation of the alloca instruction docs is that anything created with it gets freed on return, so I can't use that, right? It's essentially only for local variables?)
What I've done is:
alloc.cpp
struct Alloc {
int age;
};
//Alloc allocCpy() {
// return *new Alloc();
//}
Alloc *allocPtr() {
return new Alloc();
}
int main() {
Alloc *ptr = allocPtr();
// ptr->name = "Courtney";
// Alloc cpy = allocCpy();
// cpy.name = "Robinson";
// std::cout << ptr->name << std::endl;
// std::cout << cpy.name << std::endl;
return 0;
}
Then run clang -S -emit-llvm alloc.cpp to produce alloc.ll
; ModuleID = 'alloc.cpp'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.11.0"
%struct.Alloc = type { i32 }
; Function Attrs: ssp uwtable
define %struct.Alloc* @_Z8allocPtrv() #0 {
entry:
%call = call noalias i8* @_Znwm(i64 4) #3
%0 = bitcast i8* %call to %struct.Alloc*
%1 = bitcast %struct.Alloc* %0 to i8*
call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 4, i32 4, i1 false)
ret %struct.Alloc* %0
}
; Function Attrs: nobuiltin
declare noalias i8* @_Znwm(i64) #1
; Function Attrs: nounwind
declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) #2
; Function Attrs: ssp uwtable
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%ptr = alloca %struct.Alloc*, align 8
store i32 0, i32* %retval
%call = call %struct.Alloc* @_Z8allocPtrv()
store %struct.Alloc* %call, %struct.Alloc** %ptr, align 8
ret i32 0
}
attributes #0 = { ssp uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core2" "target-features"="+cx16,+sse,+sse2,+sse3,+ssse3" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nobuiltin "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core2" "target-features"="+cx16,+sse,+sse2,+sse3,+ssse3" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
attributes #3 = { builtin }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 3.7.0 (tags/RELEASE_370/final)"}
Can someone explain what's happening in this IR and how it maps back to the C++? Or, ignoring this specific example, how should one go about allocating heap memory for an LLVM StructType that outlives the function in which it was created (and, if you're feeling generous, how to release the memory later)?
The bits I've commented out are from my original example but being a total novice the IR from that was even less insightful...

my interpretation of the alloca instruction docs is that anything
created from that gets freed on return so I can't use that right, it's
essentially only for local variables?
Yes. Furthermore, the current advice on LLVM IR is that although alloca works as you expect, the optimizers are another matter. The advice is to alloca all of your locals in the entry block right away, even if you don't give the user access to them or they don't always contain meaningful data.
Heap allocation is a library feature, not a feature of LLVM or the compiler. When you use new T(), the compiler simply calls operator new to get the memory and then constructs T there. There is no magic involved. Most of the junk you see there is C++ ABI specific rather than anything LLVM requires. It eventually lowers into something like void* p = malloc(size); new(p) T();. For pretty much all types T, this boils down to a series of stores into p or a call to a user-defined function.
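To make the "no magic" point concrete, here is a minimal sketch of that lowering written out by hand in C++. The helper names allocByHand and freeByHand are made up for illustration, and malloc/free stand in for whatever allocator operator new actually uses on your platform.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct Alloc {
    int age;
};

// Hypothetical helper: roughly what `new Alloc()` does, spelled out.
Alloc *allocByHand() {
    // Step 1: ask the runtime library for raw bytes.
    void *raw = std::malloc(sizeof(Alloc));
    assert(raw != nullptr);
    // Step 2: construct the object in place. `Alloc()` value-initializes,
    // which for this struct just means storing 0 into `age`.
    return new (raw) Alloc();
}

// Hypothetical helper: roughly what `delete p` does, spelled out.
void freeByHand(Alloc *p) {
    p->~Alloc();   // run the (trivial) destructor
    std::free(p);  // hand the bytes back to the runtime library
}
```

The pointer returned by allocByHand stays valid after the function returns, which is the property alloca cannot give you. Releasing it later is just another library call.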
You can use the memory allocation function from the runtime library of your choice.
trying to understand how to allocate/construct an LLVM struct type
The LLVM type system does not include the notion of construction. That is a notion of the source language.
As far as LLVM is concerned, a struct is just a bunch of bits, and all memory locations are more-or-less the same. If you want the bits to be a particular thing, then store the bits you want to that location. If you want to put the bits on the heap, then call a runtime library heap allocation function and store the bits into that location.
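As a rough illustration of "call a runtime function, then store the bits", the sketch below shows the kind of tiny runtime a toy language could link against. The names rt_alloc, rt_free, storeI32 and loadI32 are hypothetical, and calloc/free are just one possible choice of allocator. Generated IR would declare rt_alloc and rt_free as external functions, call them, and emit plain store/load instructions where this sketch uses memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical runtime entry point: returns `size` zeroed bytes on the heap.
void *rt_alloc(std::size_t size) {
    void *p = std::calloc(1, size);
    assert(p != nullptr);
    return p;
}

// Hypothetical runtime entry point: releases memory obtained from rt_alloc.
void rt_free(void *p) {
    std::free(p);
}

// Store a 32-bit value at a byte offset inside the allocation
// (the hand-written counterpart of a GEP followed by a store).
void storeI32(void *base, std::size_t offset, std::int32_t value) {
    std::memcpy(static_cast<char *>(base) + offset, &value, sizeof value);
}

// Load a 32-bit value from a byte offset inside the allocation.
std::int32_t loadI32(const void *base, std::size_t offset) {
    std::int32_t value;
    std::memcpy(&value, static_cast<const char *>(base) + offset, sizeof value);
    return value;
}
```

For a struct like { i32 }, the "constructor" is then nothing more than rt_alloc(4) followed by a store at offset 0.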
Note that garbage collection, however, is a somewhat different story, as there is some awkward stuff going on w.r.t. finding locals on the stack for marking.
For the record, you will not get far trying to understand Clang's LLVM IR. I've been doing that for several years now and it is batshit crazy and will take you that long to start to get a grip, not to mention full of C++-specific ABI details that you don't want to know about. You will get a lot further asking in #llvm in their IRC channel or asking specific questions here than in trying to reverse-engineer that.

I don't recommend looking at unoptimized IR emitted by Clang - it's way too verbose. -O1 makes it a lot more readable - here's the -O1 version with comments annotating the lines (also I've reordered two lines to make it slightly more readable):
%struct.Alloc = type { i32 } ; Define the Alloc type.
define noalias %struct.Alloc* @_Z8allocPtrv() #0 {
%1 = tail call noalias i8* @_Znwj(i32 4) #2 ; Call _Znwj(4). This returns i8*.
%3 = bitcast i8* %1 to i32* ; Cast the returned value to i32* (int*)...
store i32 0, i32* %3, align 4 ; ...and zero its content.
%2 = bitcast i8* %1 to %struct.Alloc* ; Cast the returned value to Alloc*...
ret %struct.Alloc* %2 ; ...and return it.
}
; Declare the _Znwj function. This doesn't need to be defined since it's already defined
; in libstdc++: this is 'operator new'. You can see this by passing this string through a
; C++ demangler, for example the one at http://demangler.com/.
declare noalias i8* @_Znwj(i32) #1
define i32 @main() #0 {
%1 = tail call %struct.Alloc* @_Z8allocPtrv() ; Call _Z8allocPtrv (defined above).
ret i32 0
}
This is a call to new, not a local allocation, so the memory is not freed when leaving @_Z8allocPtrv. Local allocations are indeed performed in LLVM IR with the alloca instruction, not with a call to new.
If you're curious how new works, I believe its standard implementation uses malloc, which is translated by the compiler that compiled the library to some function that includes system call(s).

Related

How to get a function pointer (in order to get function address at runtime) in llvm

In C or C++, I can get the address of function func_1 at run time with the following code.
#include <stdio.h>
#include <string.h>
void func_1()
{
printf("this is func_1\n");
}
int main()
{
printf("this is main\n");
func_1();
void *addr = (void*)func_1; // get the addr of func_1
return 0;
}
When I convert this code to IR, I can see that the corresponding statement to get the address of the function is:
store i8* bitcast (void ()* @func_1 to i8*), i8** %addr, align 8
%2 = load i8*, i8** %addr, align 8
%call1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([18 x i8], [18 x i8]* @.str.3, i64 0, i64 0), i8* %2)
Now I want to write an LLVM optimization pass that adds one such instruction for each function in the target program, to get that function's address. How do I achieve this? Maybe with a StoreInst and a LoadInst, but how do I set the operands? In particular, how do I build void ()* @func_1 to i8*?

How to check if a target of an LLVM AllocaInst is a function pointer

%pointer = alloca void (i32)*, align 8
How can I check whether %pointer is a function pointer? Can I get the parameter list of the function pointer?
Let's create a function that checks whether the type of an alloca instruction is a function pointer.
bool isFunctionPointerType(Type *type){
  // Strip one level of pointer and check the pointee type recursively.
  if(PointerType *pointerType=dyn_cast<PointerType>(type)){
    return isFunctionPointerType(pointerType->getElementType());
  }
  // Exit condition: we have reached a function type.
  else if(type->isFunctionTy()){
    return true;
  }
  return false;
}
In your runOnModule/runOnFunction pass:
if(AllocaInst *allocaInst=dyn_cast<AllocaInst>(inst)){
  if(isFunctionPointerType(allocaInst->getType())){
    errs()<<"Function Pointer Type\n";
  }
}
The above pass was tested on the following source.c code:
#include <stdio.h>
void fun(int a)
{
printf("Value of a is %d\n", a);
}
int main()
{
void (*fun_ptr)(int) = &fun;
(*fun_ptr)(10);
return 0;
}
Corresponding LLVM IR without any optimization:
entry:
%retval = alloca i32, align 4
%fun_ptr = alloca void (i32)*, align 8
store i32 0, i32* %retval, align 4
call void @llvm.dbg.declare(metadata void (i32)** %fun_ptr, metadata !11,
... metadata !15), !dbg !16
store void (i32)* @_Z3funi, void (i32)** %fun_ptr, align 8, !dbg !16
%0 = load void (i32)*, void (i32)** %fun_ptr, align 8, !dbg !17
call void %0(i32 10), !dbg !18
ret i32 0, !dbg !19
This successfully detects fun_ptr as a function pointer.
Note that the code uses recursion to strip nested pointer types.
Another way is to track the uses of fun_ptr through the def-use chain in LLVM, i.e. by finding the StoreInst and checking whether the source operand is a pointer to a function. I haven't tried that yet.

what llvm store instruction pattern do i need?

I'm trying to make an LLVM backend and I don't know what I need to do to fix this error:
LLVM ERROR: Cannot select: t5: ch = store<ST4[%retval]> t0, Constant:i32<0>, FrameIndex:i64<0>, undef:i64
This is the IR I'm trying to process:
define i32 @main() #0 {
%retval = alloca i32, align 4
store i32 0, i32* %retval, align 4
ret i32 0
}
but I don't know what DAG pattern I need to be able to match it.
A TableGen file that contains some of the instructions my arch supports is here: https://github.com/jfmherokiller/customllvm/blob/master/llvm/lib/Target/ZCPU/zcpuInstr.td
I just figured it out: I was looking at the issue the wrong way.
store<ST4[%retval]> t0, Constant:i32<0>, FrameIndex:i64<0>, undef:i64
can be expressed in function form as store(Constant:i32<0>, FrameIndex:i64<0>), or "store the constant i32 0 in stack frame index 0".
The information I wasn't getting was that FrameIndex:i64<0> relates directly to this line in TargetSelectionDAG.td:
def frameindex :SDNode<"ISD::FrameIndex",SDTPtrLeaf, [],"FrameIndexSDNode">;
so FrameIndex = frameindex.

why it is asking for a token?

I've written a very simple llvm IR code. However when I try to run it through llc, I get the following error:
llc: add_test.ll:10:16: error: expected value token
%r = load i32, i32* %retval
^
Here is the code:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
store i32 0, i32* %retval
%r = load i32, i32* %retval
ret i32 0
}
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5.0 "}
The command that I'm running is llc add-test.ll
Does anybody know what could be the problem?
The syntax for load (among others) was changed in LLVM version 3.7. The syntax you're using is the new one. Since you're using version 3.5, you need to use the old syntax, which is:
%r = load i32* %retval
In other words you only specify the type of the parameter, not of the result.
I assume the problem occurred because you're using the current version of the documentation while using an old version of LLVM. The documentation for LLVM 3.5.0 can be found here.

How can I add reference to a variable defined in previous function in LLVM IR?

I'm new to LLVM IR and I'm implementing PL0 language. http://en.wikipedia.org/wiki/PL/0
My test file is as follows:
const a = 10;
var b, c;
procedure check1;
var dd;
procedure check2;
c := 2;
begin
dd := 1
end;
begin
b := -1024+53*(-514-766)/93+100;
c := b
end.
And the LLVM IR I generated is like this:
; ModuleID = 'LLVM Module'
define void @__global_main_entry__() {
BlockUnitEntry:
%b = alloca i32
%c = alloca i32
store i32 -1653, i32* %b
%b1 = load i32* %b
store i32 %b1, i32* %c
ret void
}
define void @check1() {
ProcedureEntry:
%dd = alloca i32
store i32 1, i32* %dd
ret void
}
define void @check2() {
ProcedureEntry:
store i32 2, i32* %c
ret void
}
I got a painful error here (at destruction):
While deleting: i32* %c
Use still stuck around after Def is destroyed: store i32 2, i32* %c
test004_llvm_generate: /files/Install/LLVM_Framework/llvm/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.
I guess that using variable c (defined in __global_main_entry__) in procedure check2 adds a use to the llvm::Value, and when __global_main_entry__ is destroyed, the remaining use in check2 causes the error.
I don't know how to solve the problem, so please be specific if you have the time.
(Also, apart from the official LLVM documentation, are there any other resources on LLVM? I found that most tutorials are outdated.)
My full list of code is here: https://github.com/adamcavendish/PL0Compiler
Thanks in advance.
Your IR is malformed: you cannot refer to an instruction from the body of a function other than the one in which the instruction appears, so referring to %c in @check2 is illegal. The failure just happened to occur during module destruction, but it can occur in other circumstances as well.
In general, I recommend running opt -verify on your IR if you're not sure it's legal, it will give you nice error messages. My Eclipse plugin might also help if you want to experiment with IR to see when it is and isn't legal.
As for a solution, it looks like you should create a global variable to represent c, not an instruction. Then you can store into it and load from it in every function in the module.
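As a source-level analogy only (this is not the asker's generator code), the suggested fix corresponds to something like the C++ below: b and c live at module scope, so every procedure can store to and load from them, while dd stays a true local. The function names are illustrative. In the generated IR, the counterpart would be a module-level global such as @c = global i32 0 in place of the %c alloca.

```cpp
#include <cassert>

// Assumption for this sketch: the PL/0 variables b and c are modeled as
// module-level globals rather than stack slots owned by one function.
static int b = 0;
static int c = 0;

// Corresponds to procedure check2: it may touch c because c is global.
void check2() {
    c = 2;
}

// Corresponds to procedure check1: dd is a genuine local (an alloca in IR).
void check1() {
    int dd = 1;
    (void)dd;  // silence the unused-variable warning
}

// Corresponds to the main block of the PL/0 program.
void globalMainEntry() {
    b = -1024 + 53 * (-514 - 766) / 93 + 100;  // integer arithmetic, as in PL/0
    c = b;
}
```

Calling globalMainEntry() leaves both b and c at -1653, the same constant the original IR stores into %b, and a later check2() call can overwrite c without referring to anything owned by another function.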