Equivalent of #include for LLVM IR - llvm

I have found myself with a reasonably large number of useful functions and constants written in LLVM's IR. I can use this pseudo-library by combining it with hand written IR, provided said hand written IR starts with a potentially lengthy list of declarations. I'm aware that IR isn't necessarily designed as a general purpose programming language to write stuff in.
This is much like writing a lot of C functions in one file then redeclaring them wherever they are used. In C this is worked around using #include and header files. That's not perfect, but it beats writing out the prototypes repeatedly.
What's the least nasty way to achieve something similar in IR? It only has to beat typing the stuff out over and over again (which I currently do in copy & paste fashion) and using cat as a custom build step.
Thanks!

Sadly there is no such thing in LLVM IR.
LLVM IR isn't designed to have large amounts of it written by hand. Therefore it doesn't have a #include mechanism. The job of handling that kind of stuff falls onto the compiler using the LLVM API.
If you want the same effect, you have two options: check whether an existing preprocessor works for your files, or write a custom preprocessor yourself.
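A custom preprocessor of the kind suggested above can be quite small. The following is a minimal sketch, not an LLVM feature: the `;#include "file"` directive is a made-up convention (written as an IR comment so an unprocessed file still parses), and `expand_includes` is a hypothetical helper name. Each file is emitted at most once, which plays the role of an include guard.

```python
import re
from pathlib import Path

# Hypothetical directive (an assumption, not part of LLVM IR):
#   ;#include "decls.ll"
INCLUDE_RE = re.compile(r'^\s*;#include\s+"([^"]+)"\s*$')


def expand_includes(path, seen=None):
    """Return the text of `path` with every include directive replaced by
    the recursively expanded contents of the named file.

    Included paths are resolved relative to the including file. A file that
    has already been emitted is skipped, so repeated includes are harmless.
    """
    path = Path(path).resolve()
    seen = set() if seen is None else seen
    if path in seen:
        return ""
    seen.add(path)

    out = []
    for line in path.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            text = expand_includes(path.parent / match.group(1), seen)
            if text:
                out.append(text)
        else:
            out.append(line)
    return "\n".join(out)
```

The expanded text could then be written to a temporary .ll file and handed to the usual tools as a build step, in place of the manual copy and paste.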

You can use llvm-link for combining different IRs together.
For example, given the following two files:
; file: f1.ll
define i32 @f1(i32 %a) {
entry:
  ret i32 %a
}
; file: f2.ll
; @f1 is defined in f1.ll, so f2.ll only declares it.
declare i32 @f1(i32)
define i32 @f2(i32 %a) {
entry:
  %call = tail call i32 @f1(i32 %a)
  ret i32 %call
}
Then you can call
llvm-link f1.ll f2.ll -S -o ffinal.ll
ffinal.ll will then contain the IR from both files.

Related

What does a ".number" following a function name mean in LLVM IR?

In LLVM IR, I see function names followed by a "." and a number, such as
@kmalloc.2670, @kmalloc.19
What does this number mean?
The same function name often appears with different numbers, even though the definitions of the two functions are identical.
Can anybody help me?
define internal i8* @kmalloc.2670(i64 %size, i32 %flags) #5 !dbg !436635
define internal i8* @kmalloc.19(i64 %size, i32 %flags) #5 !dbg !1202009
Is this right?
LLVM docs:
One nice thing about LLVM is that the name is just a hint. For
instance, if the code above emits multiple “addtmp” variables, LLVM
will automatically provide each one with an increasing, unique numeric
suffix. Local value names for instructions are purely optional, but it
makes it much easier to read the IR dumps.

LLVM OPT not giving optimised file as output.

The man page for opt says: "It takes LLVM source files as input, runs the specified optimizations or analyses on it, and then outputs the optimized file or the analysis results".
My goal: to use the built-in optimisation pass -dce available in opt. This pass performs dead code elimination.
My Source file foo.c:
int foo(void)
{
    int a = 24;
    int b = 25; /* Assignment to dead variable -- dead code */
    int c;
    c = a * 4;
    return c;
}
Here is what I did:
1. clang-7.0 -S -emit-llvm foo.c -o foo.ll
2. opt -dce -S foo.ll -o fooOpt.ll
What I expect: a .ll file in which the dead code (the line marked with the comment in the source) is eliminated.
What I get: fooOpt.ll is identical to the unoptimised foo.ll.
I have already seen this SO answer, but I didn't get optimised code.
Am I missing something here? Can someone please guide me on the right path.
Thank you.
If you look at the .ll file generated by clang, it will contain a line like this:
attributes #0 = { noinline nounwind optnone sspstrong uwtable ...}
You should remove the optnone attribute here. Whenever a function has the optnone attribute, opt won't touch that function at all.
Now if you try again, you'll notice ... nothing. It still does not work.
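This time the problem is that the code is working on memory, not registers. At -O0, clang gives every local variable a stack slot (an alloca) and accesses it through loads and stores, and -dce does not remove stores. What we need to do is convert the allocas to registers using -mem2reg. In fact, doing this will already optimize away b, so you don't even need the -dce flag.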
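Removing optnone by hand gets tedious when there are several attribute groups. As a rough sketch, the edit can be scripted; `strip_optnone` is a hypothetical helper, and it assumes the attribute only appears on `attributes #N = { ... }` lines, as in the clang output shown above. It does not handle attributes written inline on a `define` line.

```python
import re


def strip_optnone(ir_text):
    """Remove the `optnone` keyword from every attribute-group line.

    Only lines starting with `attributes #` are touched; everything else
    is passed through unchanged.
    """
    out = []
    for line in ir_text.splitlines():
        if line.startswith("attributes #"):
            line = re.sub(r"\boptnone\s*", "", line)
        out.append(line)
    return "\n".join(out)


example = "attributes #0 = { noinline nounwind optnone uwtable }"
print(strip_optnone(example))
```

The result can be saved back to foo.ll before running opt with -mem2reg as described above.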

LLVM IR types being collapsed wrongly when linking (C++ API)

Straight to the point -- I'm trying to link two (or more) llvm modules together, and I'm facing a certain odd error from LLVM.
I don't want to post too much code, so I'll use a bunch of pseudo here.
I have 3 modules, let's say A, B, and C. A is the main module; I initialise llvm::Linker with it. B and C are secondary modules; I call linker.linkInModule(B and C).
All 3 modules have, among other things, these two types defined:
%String = type { i8*, i64 }
%Character = type { i8*, i64 }
Note that they have the same member types. Furthermore, a function foo is defined as such (in module B):
define i1 @_ZN9Character7hasDataEv(%Character*) { }
This function is declared in modules A and C. Now, all seems well and good -- this function is called from both modules A and C, and the IR looks normal, like so:
%21 = call i1 @_ZN9Character7hasDataEv(%Character* %4)
Here comes the problem: when all 3 modules are linked together, something happens to these types:
They lose their name, becoming %2 (%String) and %3 (%Character).
They appear to be merged together.
Strangely, while this transformation occurs in both modules A and C, the bug only occurs in C -- note that A is the so-called "main" module.
The function definition of the linked file is now
define i1 @_ZN9Character7hasDataEv(%2*)
Note how %Character, or %3, got turned into %2. Furthermore, at the callsite, in what is presumably an attempt to un-merge the types, I get this:
%10 = call i1 bitcast (i1 (%2*)* @_ZN9Character7hasDataEv to i1 (%3*)*)(%2* %2)
Curiously, although the function was cast from i1 (%2*) to i1 (%3*), the argument passed (arg. 1) is still of type %2*. What's going on?
Note that in module A, whatever is going on is done properly, and there is no error. This happens for a number of functions, but only in module C.
I've tried reproducing it by copy-pasting these to .ll files and calling llvm-link followed by llvm-dis, but 1. the types are not merged, and 2. there is no such bug.
Thanks...?
Okay, after some poking around in the LLVM IRC channel, it turns out that llvm::Linker is meant to be used with an empty llvm::Module as the starting module.
Also, in my use-case I am reusing the same llvm::Type (the actual thing in memory) across different modules that I link together. They said it wasn't illegal, but that it was never tested, so... ¯\_(ツ)_/¯
So anyway, the problem was fixed by starting with an empty module to pass to the linker.

Clang does not inline calls having pointer casts (indirect function calls)

I was trying to inline functions in llvm using this command:
opt -inline -inline-threshold=1000000 a.bc -o a.inline.bc
The (indirect) function calls involving pointer casts could not be inlined. For example:
%call4 = call i32 (...)* bitcast (i32 (%struct.token_type*)* @print_token to i32 (...)*)(%struct.token_type* %5)
But function calls like the one below are inlined:
%call49 = call i32 @special(i32 %43)
Can I inline all function calls, regardless of whether they are direct or indirect?
Thanks!
You can't inline something if you don't know what it is, and a function pointer that is assigned at run time cannot be known at any point during the build process. If it is defined in such a way as to be reassignable, then it can't possibly be inlined. Calling code can be inlined, but calls through function pointers can't.
It is possible that there are some scenarios that could be inlined but that LLVM is overly cautious about; that would probably be an issue for the LLVM dev list.
You also haven't given a concrete example for someone wiser than me to look at, to tell whether inlining should be possible in your scenario.

convert to LLVM IR: how to create virtual register instead of allocate a stack variable?

I am working on converting another IR to llvm IR.
My IR is like this:
a = 1;
b = a;
a = a + 1;
For now, I am using alloca to create each variable in my IR (here, "a" and "b").
However, alloca is probably too heavy: it introduces lots of load and store instructions, which will be a problem if the function is huge. In my case, most of the variables are register-width, so I just want each of them to be a named virtual register.
Does anybody know how to create a virtual register (variable) instead of a memory variable?
In other words, how do I avoid using "alloca"?
You are not supposed to. Generating SSA form directly is quite a hard problem, so it is solved once for all frontends in LLVM's passes. You are supposed to use alloca and load/store, and then run the mem2reg pass to convert those into SSA values. Clang also does this (put your example code in a C function and compile it with no optimizations to see).
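To make the alloca approach concrete, here is a minimal sketch of a toy emitter for the three statements in the question. Everything in it is an assumption for illustration: the tuple encoding of statements, the helper name `emit_function`, the fixed i32 type, and the function name @example. It prints textual IR using the typed-pointer syntax (`i32*`) seen elsewhere on this page; newer LLVM releases spell pointers as `ptr`.

```python
def emit_function(statements, result):
    """Emit textual LLVM IR for a list of (dest, expr) assignments.

    expr is an int constant, a variable name, or ("add", var, int).
    Every variable gets one alloca; each read is a load and each write is
    a store, so the emitter never has to construct SSA form itself.
    """
    variables = []
    for dest, _ in statements:
        if dest not in variables:
            variables.append(dest)

    body = []
    counter = 0

    def load(var):
        nonlocal counter
        tmp = f"%t{counter}"
        counter += 1
        body.append(f"  {tmp} = load i32, i32* %{var}")
        return tmp

    for dest, expr in statements:
        if isinstance(expr, int):
            value = str(expr)
        elif isinstance(expr, str):
            value = load(expr)
        else:
            _, var, const = expr
            lhs = load(var)
            value = f"%t{counter}"
            counter += 1
            body.append(f"  {value} = add i32 {lhs}, {const}")
        body.append(f"  store i32 {value}, i32* %{dest}")

    ret = load(result)
    lines = ["define i32 @example() {", "entry:"]
    lines += [f"  %{v} = alloca i32" for v in variables]
    lines += body
    lines += [f"  ret i32 {ret}", "}"]
    return "\n".join(lines)


# a = 1; b = a; a = a + 1; then return a
ir = emit_function([("a", 1), ("b", "a"), ("a", ("add", "a", 1))], "a")
print(ir)
```

Running mem2reg over output like this is what turns the stack slots into SSA registers, which is why the frontend can stay this simple.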