I am learning LLVM basics. I am trying to get into the builder framework and have set up the module, a function header, etc., but I have not yet been able to figure out how to create a simple sequence like this with the builder:
%0 = 41
%1 = add i32 42, %0
Meaning how can I use the pseudo register notation through the builder framework?
I have tried to create an add instruction based on two constants. The core line I'm using to generate the (integer) addition is:
Value *L = (Value *)m_left->Create_LLVM();
Value *R = (Value *)m_right->Create_LLVM();
if ( L == 0 || R == 0 ) return 0;
llvm::Value *p_instruction = Get_Builder().CreateAdd( L, R, "addtmp" );
This involves lots of my own functions, but I guess the basics are clear: I get a Value pointer for the left and right operands, which are both constants, and then create an add operation with the builder framework. Again, the module and builder are set up correctly; when I call dump() I see all the other stuff I do, but the line above does not create any IR code.
I would expect it to create something like
%4 = add i32 %3, %2
or something similar. Am I misunderstanding something fundamental about the way operations are to be constructed with the builder or is it just some small oversight of some detail?
Thanks
Hard to say what you are doing wrong without seeing the fancy Create_LLVM() functions, but in general, for adding two constants:
You have to create two ConstantInts:
auto& ctx = getGlobalContext(); // just your LLVMContext (note: a non-const reference, Type::getInt32Ty and the builder need one)
auto* L = ConstantInt::get(Type::getInt32Ty(ctx), 41);
auto* R = ConstantInt::get(Type::getInt32Ty(ctx), 42);
auto& builder = Get_Builder();
builder.Insert(L); // just a no-op in the standard builder impl
builder.Insert(R); // just a no-op in the standard builder impl
builder.CreateAdd(L, R, "addtmp");
You should get:
%addtmp = add i32 41, 42
Note, however, that the stock IRBuilder constant-folds: when both operands of CreateAdd are constants, it simply returns the folded Constant (here i32 83) without inserting any instruction. That is most likely why this line produces no IR for you.
You said that your builder is set up correctly, so it will append the add at the end of the BasicBlock it currently operates on. And I assume you have already created a Function with at least one BasicBlock.
Edit:
What will give you an add instruction in any case is to create it directly through the C++ API, without the builder:
BinaryOperator* add = BinaryOperator::Create(Instruction::Add, L, R, "addtmp", BB);
where BB is the current BasicBlock.
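If you would rather keep the builder interface, you can also suppress the folding by instantiating the builder with the NoFolder policy. A minimal sketch (the IRBuilder template parameters vary between LLVM versions; older releases spell it IRBuilder<true, llvm::NoFolder>):
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/NoFolder.h"

// A builder that never constant-folds: CreateAdd(41, 42) now emits a real
// add instruction instead of returning the folded constant i32 83.
llvm::IRBuilder<llvm::NoFolder> builder(BB); // BB: your current BasicBlock
auto* sum = builder.CreateAdd(L, R, "addtmp");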
To get something more sophisticated (adding two variables), the canonical way is this:
First you need some memory. The AllocaInst allocates memory on the stack.
You can use the builder for this:
auto* A = builder.CreateAlloca(Type::getInt32Ty(ctx), nullptr, "a");
auto* B = builder.CreateAlloca(Type::getInt32Ty(ctx), nullptr, "b");
For simplicity I'll just take the constants from above and store them in A and B.
To store values we need StoreInst:
builder.CreateStore(L, A, /*isVolatile=*/false);
builder.CreateStore(R, B, /*isVolatile=*/false);
For the addition, we load the values from memory into registers using LoadInst:
auto* addLHS = builder.CreateLoad(A);
auto* addRHS = builder.CreateLoad(B);
Finally the addition as above:
auto* add = builder.CreateAdd(addLHS, addRHS, "add");
And with the pointer to the add you can go on, e.g., return it or store it to another variable.
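For example, returning it is a single builder call (this is what produces the ret in the IR below):
builder.CreateRet(add); // emits: ret i32 %add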
The IR should look like this:
define i32 @foo() {
entry:
%a = alloca i32, align 4
%b = alloca i32, align 4
store i32 41, i32* %a, align 4
store i32 42, i32* %b, align 4
%0 = load i32* %a, align 4
%1 = load i32* %b, align 4
%add = add i32 %0, %1
ret i32 %add
}
I want to insert a load and a store instruction before the first instruction of the first basic block of a function (used to simulate the performance overhead of our work). The LLVM pass is written as follows:
Value *One = llvm::ConstantInt::get(Type::getInt32Ty(Context), 1);
for (Function::iterator bb = tmp->begin(); bb != tmp->end(); ++bb) {
  // for every instruction of the block
  for (BasicBlock::iterator inst = bb->begin(); inst != bb->end(); ++inst) {
    if (inst == bb->begin() && bb == tmp->begin()) {
      BasicBlock *bp = &*bb;
      Instruction *pinst = &*inst;
      AllocaInst *pa = new AllocaInst(Type::getInt32Ty(Context), "buf", pinst);
      StoreInst *newstore = new StoreInst(One, pa, pinst);
      LoadInst *newload = new LoadInst(pa, "loadvalue", pinst);
    }
  }
}
The inserted load and store instructions can be seen in the xx.ll file:
define i32 @fun() #0 {
entry:
%buf = alloca i32
%loadvalue = load i32, i32* %buf
store i32 %loadvalue, i32* %buf
%call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i64 0, i64 0))
ret i32 1
}
However, the inserted instructions disappeared in the target executable file.
How can I fix this problem?
They were probably eliminated by optimization because they don't have any visible effect. Try marking the load and store as volatile.
In general your algorithm won't work because LLVM expects all of the allocas in a function to be the first instructions. You should scan through the first basic block and find the first non-alloca instruction and insert new code there, making sure to add allocas first (as you did).
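A minimal sketch of both suggestions, reusing One, Context, and the Function pointer tmp from the question (LLVM 3.x-era constructors, matching the code above):
BasicBlock &Entry = tmp->getEntryBlock();
BasicBlock::iterator IP = Entry.begin();
while (isa<AllocaInst>(&*IP)) ++IP; // skip the leading alloca cluster
Instruction *InsertPt = &*IP;       // first non-alloca instruction

// Inserting before InsertPt keeps the new alloca grouped with the others
// at the top of the entry block.
AllocaInst *Buf = new AllocaInst(Type::getInt32Ty(Context), "buf", InsertPt);
// Volatile accesses have a visible effect, so the optimizer must keep them.
new StoreInst(One, Buf, /*isVolatile=*/true, InsertPt);
new LoadInst(Buf, "loadvalue", /*isVolatile=*/true, InsertPt);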
In the first answer here, the following was mentioned about the stack memory in C++:
When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data.
This makes perfect sense at the top level, and it makes me curious how smart compilers are when allocating this memory, given the context of this question: since braces themselves are not a stack frame in C (I assume this holds true for C++ as well), I want to check whether compilers optimize reserved memory based on variable scopes within a single function.
In the following I'm assuming that the stack looks like this before a function call:
--------
|main()|
-------- <- stack pointer: space above it is used for current scope
| |
| |
| |
| |
--------
And then the following after invoking a function f():
--------
|main()|
-------- <- old stack pointer (osp)
| f() |
-------- <- stack pointer, variables will now be placed between here and osp upon reaching their declarations
| |
| |
| |
| |
--------
For example, given this function
void f() {
int x = 0;
int y = 5;
int z = x + y;
}
Presumably, this will just allocate 3*sizeof(int) + some extra overhead for bookkeeping.
However, what about this function:
void g() {
for (int i = 0; i < 100000; i++) {
int x = 0;
}
{
MyObject myObject[1000];
}
{
MyObject myObject[1000];
}
}
Ignoring compiler optimizations that may elide a lot of the code above (since the variables really do nothing), I'm curious about the following in the second example:
For the for loop: will the stack space be large enough to fit all 100000 ints?
On top of that, will the stack space contain 1000*sizeof(MyObject) or 2000*sizeof(MyObject)?
In general: does the compiler take variable scope into account when determining how much memory it will need for the new stack frame, before invoking a certain function? If this is compiler-specific, how do some well-known compilers do it?
The compiler will allocate space as needed (typically for all items at the beginning of the function), but not once per iteration of the loop.
For example, this is what Clang produces, as LLVM IR:
define void @_Z1gv() #0 {
%i = alloca i32, align 4
%x = alloca i32, align 4
%myObject = alloca [1000 x %class.MyObject], align 16
%myObject1 = alloca [1000 x %class.MyObject], align 16
store i32 0, i32* %i, align 4
br label %1
; <label>:1: ; preds = %5, %0
%2 = load i32, i32* %i, align 4
%3 = icmp slt i32 %2, 100000
br i1 %3, label %4, label %8
; <label>:4: ; preds = %1
store i32 0, i32* %x, align 4
br label %5
; <label>:5: ; preds = %4
%6 = load i32, i32* %i, align 4
%7 = add nsw i32 %6, 1
store i32 %7, i32* %i, align 4
br label %1
; <label>:8: ; preds = %1
ret void
}
This is the result of:
class MyObject
{
public:
int x, y;
};
void g() {
for (int i = 0; i < 100000; i++)
{
int x = 0;
}
{
MyObject myObject[1000];
}
{
MyObject myObject[1000];
}
}
So, as you can see, x is allocated only once, not 100000 times, because only ONE of those variables exists at any given time.
(The compiler could reuse the space of myObject[1000] for x and for the second myObject[1000] - and probably would do so in an optimised build, but in that case it would also completely remove these variables, since they are unused, so it wouldn't show very well.)
In a modern compiler, the function is first transformed into a flow graph. On every arc of the flow graph, the compiler knows how many variables are live - that is to say, holding a visible value. Some of those will live in registers, and for the others the compiler will need to reserve stack space.
Things get a bit more complicated as the optimizer gets further involved, because it may prefer not to move stack variables around. That's not free.
Still, in the end the compiler has all the assembly operations ready, and can just count how many unique stack addresses are used.
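As a hypothetical illustration (use() is just a stand-in for anything that keeps the arrays alive), the two arrays below are never live at the same time, so the counting described above lets them share one stack address:
void use(char *p); // hypothetical: keeps the arrays from being elided

void h(bool flag) {
    if (flag) {
        char a[1024];
        use(a); // a is live only on this arc of the flow graph
    } else {
        char b[2048];
        use(b); // b is live only here, so it may reuse a's stack address
    }
} // the frame needs about 2048 bytes, not 1024 + 2048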
I was challenged with a rather educational task: to extend LLVM in the following way:
Add register XACC and instructions LRXACC(), SRXACC(arg1), XMAC(arg1,arg2) to the SPARC back-end. Instructions do the following:
LRXACC: load value from XACC,
SRXACC: write to XACC,
XMAC: XACC+=(arg1*arg2)>>31
Provide builtins in the Clang front-end for all three of them.
My source for testing is:
int main() {
int acc = 0;
__builtin___srxacc(acc);
__builtin___xmac(12345,6789);
acc = __builtin___lrxacc();
return 0;
}
I was able to support the conversion of the builtins into intrinsic functions. The IR file I get from clang looks fine:
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%acc = alloca i32, align 4
store i32 0, i32* %retval
store i32 0, i32* %acc, align 4
%0 = load i32* %acc, align 4
call void @llvm.sparc.srxacc(i32 %0)
call void @llvm.sparc.xmac(i32 12345, i32 6789)
%1 = call i32 @llvm.sparc.lrxacc()
store i32 %1, i32* %acc, align 4
ret i32 0
}
The issue appears during the DAG combining step, and the final output code looks like this:
.text
.file "file22.ll"
.globl main
.align 4
.type main,#function
main: ! @main
! BB#0: ! %entry
add %sp, -104, %sp
st %g0, [%sp+100]
st %g0, [%sp+96]
lrxacc %xacc, %o0
st %o0, [%sp+96]
sethi 0, %o0
retl
add %sp, 104, %sp
.Ltmp0:
.size main, .Ltmp0-main
.ident "clang version 3.6.0 (trunk)"
DAGCombiner deletes the srxacc and xmac instructions as redundant. (In its ::Run method it checks whether a node is use_empty() and deletes it if so.)
The combiner does that because these instructions communicate their result through a physical register, so the graph does not show that one of them depends on the other.
I would appreciate any suggestions on how to avoid removal of my instructions.
Thank you!
Edit
To simplify and concretize: instructions that are represented in the IR like void @llvm.sparc.srxacc(i32 %0) look to the combiner as if they don't affect the computation, so the corresponding SDNodes receive an empty UseList. How do I get around that?
You may use chain tokens to represent control dependencies between SDNodes. This way you can add a fake dependency between two instructions even if the second one doesn't consume any output of the first.
You may use CopyToReg and CopyFromReg to cope with predefined physical registers.
You may use Glue to glue several simple instructions into a complex pseudo-instruction.
Consider the following simple example compiled for x86:
int foo(int aa, int bb) {
return aa / bb;
}
You may also want to investigate a more complicated example, though its DAG picture is too big to post here (you can view the DAG with the -view-sched-dags option):
void foo(int aa, int bb, int *x, int *y, int *z) {
*x = aa / bb;
*y = aa % bb;
*z = aa * bb;
}
Take a look at this post too.
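To make the first two points concrete, here is a hypothetical sketch of how srxacc could be lowered so that its side effect stays visible to the DAG. SP::XACC stands for the register you added to the SPARC register definitions; it is not an existing LLVM name:
// Called from the target's lowering code for the srxacc intrinsic.
static SDValue LowerSRXACC(SDValue Op, SelectionDAG &DAG) {
  SDLoc DL(Op);
  SDValue Chain = Op.getOperand(0); // INTRINSIC_VOID nodes carry a chain
  SDValue Val = Op.getOperand(2);   // operand 1 is the intrinsic ID
  // CopyToReg returns a new chain; nodes that later read XACC must take this
  // chain (or a TokenFactor including it), so the combiner sees the
  // dependency and can no longer delete the write as dead.
  return DAG.getCopyToReg(Chain, DL, SP::XACC, Val);
}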
I've been going in circles through the LLVM documentation and Stack Overflow and cannot figure out how an integer global variable should be initialized to 0 (first time using LLVM). This is some of my current code:
TheModule = (argc > 1) ? new Module(argv[1], Context) : new Module("Filename", Context);
// Unrelated code
// currentGlobal->id is just a string
TheModule->getOrInsertGlobal(currentGlobal->id, Builder.getInt32Ty());
llvm::GlobalVariable* gVar = TheModule->getNamedGlobal(currentGlobal->id);
gVar->setLinkage(llvm::GlobalValue::CommonLinkage);
gVar->setAlignment(4);
// What replaces "???" below?
//gVar->setInitializer(???);
This almost does what I want; an example of the output it can produce:
@a = common global i32, align 4
@b = common global i32, align 4
@c = common global i32, align 4
However, clang foo.c -S -emit-llvm produces this which I want as well:
@a = common global i32 0, align 4
@b = common global i32 0, align 4
@c = common global i32 0, align 4
As far as I can tell I need a Constant* where I have "???", but I'm not sure how to create one: http://llvm.org/docs/doxygen/html/classllvm_1_1GlobalVariable.html#a095f8f031d99ce3c0b25478713293dea
Use one of the APInt constructors to get a 0-valued ConstantInt (AP stands for Arbitrary Precision):
ConstantInt* const_int_val = ConstantInt::get(module->getContext(), APInt(32,0));
Then set your initializer value (ConstantInt is a Constant subclass):
global_var->setInitializer(const_int_val);
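Putting it together with the code from the question (TheModule, Builder, and currentGlobal are the question's own names), a sketch:
TheModule->getOrInsertGlobal(currentGlobal->id, Builder.getInt32Ty());
llvm::GlobalVariable* gVar = TheModule->getNamedGlobal(currentGlobal->id);
// Common linkage requires a zero initializer anyway.
gVar->setLinkage(llvm::GlobalValue::CommonLinkage);
gVar->setAlignment(4);
gVar->setInitializer(llvm::ConstantInt::get(
    TheModule->getContext(), llvm::APInt(32, 0)));

This produces the desired @a = common global i32 0, align 4.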
After compiling the following snippet of code with clang -O2 (or with the online demo):
#include <stdio.h>
#include <stdlib.h>
int flop(int x);
int flip(int x) {
if (x == 0) return 1;
return (x+1)*flop(x-1);
}
int flop(int x) {
if (x == 0) return 1;
return (x+0)*flip(x-1);
}
int main(int argc, char **argv) {
printf("%d\n", flip(atoi(argv[1])));
}
I get the following snippet of LLVM assembly in flip:
bb1.i: ; preds = %bb1
%4 = add nsw i32 %x, -2 ; <i32> [#uses=1]
%5 = tail call i32 @flip(i32 %4) nounwind ; <i32> [#uses=1]
%6 = mul nsw i32 %5, %2 ; <i32> [#uses=1]
br label %flop.exit
I thought that tail call meant dropping the current stack frame (i.e., the return goes to the upper frame, so the next instruction should be ret %5), but according to this code it will do a mul afterwards. And in the native assembly there is a simple call without tail-call optimisation (even with the appropriate flag for llc).
Can somebody explain why clang generates such code?
Also, I can't understand why LLVM has a tail marker at all, if it could simply check whether the next ret uses the result of the previous call and then perform the appropriate optimisation or generate the native equivalent of a tail-call instruction.
Take a look at the 'call' instruction in the LLVM Assembly Language Reference Manual. It says:
The optional "tail" marker indicates that the callee function does not access any allocas or varargs in the caller. Note that calls may be marked "tail" even if they do not occur before a ret instruction.
It's likely that one of the LLVM optimization passes in Clang analyzes whether or not the callee accesses any allocas or varargs in the caller. If it doesn't, the pass marks the call as a tail call and lets another part of the LLVM figure out what to do with the "tail" marker. Maybe the function can't be a real tail call right now, but after further transformations it could be. I'm guessing it's done this way to make the ordering of the passes less important.