Clang does not inline calls having pointer casts (indirect function calls) - llvm

I was trying to inline functions in llvm using this command:
opt -inline -inline-threshold=1000000 a.bc -o a.inline.bc
The (indirect) function calls involving pointer casts are not being inlined. For example:
%call4 = call i32 (...)* bitcast (i32 (%struct.token_type*)* @print_token to i32 (...)*)(%struct.token_type* %5)
But function calls like the one below are being inlined:
%call49 = call i32 @special(i32 %43)
Can I inline all function calls, regardless of whether they are direct or indirect?
Thanks!

You can't inline something if you don't know what it is, and a function pointer that is assigned at run time cannot be known at any point during the build process. If it is defined in such a way that it can be reassigned, it cannot possibly be inlined. The calling code can be inlined, but calls through function pointers can't be.
It is possible that there are scenarios LLVM is overly cautious about that could in fact be inlined, but that would be an issue for the llvm-dev list.
You also haven't given a concrete example for someone wiser than me to look at, to judge whether inlining should be possible in your scenario.
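As an aside, the reason the inliner gives up on that first call is that, at the IR level, the callee operand is a constant bitcast rather than a plain function, so CallBase::getCalledFunction() returns null and the call is treated as indirect even though the target is statically known. A rough sketch (assuming a reasonably recent LLVM; the helper name is made up) of how a pass could look through the cast:

#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// Hypothetical helper: find the callee even when it hides behind a constant
// pointer cast, which is the case getCalledFunction() gives up on.
static Function *calleeThroughCasts(CallBase &CB) {
  if (Function *F = CB.getCalledFunction())
    return F;                       // plain direct call, e.g. @special
  // Strip the constant bitcast to reveal @print_token underneath.
  return dyn_cast<Function>(CB.getCalledOperand()->stripPointerCasts());
}

In practice, running an instruction-combining/simplification pass before the inliner will often fold such a cast back into a direct call when the prototypes are compatible, which the inliner can then handle; whether that happens in your case depends on the actual prototypes.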

Related

LLVM Pass: to change the function call's argument values

As part of my project, based on some analysis, I have to change a function call's arguments. I am doing this at the LLVM IR level, something like this:
doWork("work",functionBefore)
Based on my results, my LLVM pass should transform the function pointer passed to that call, like this:
doWork("work",functionAfter)
Assume both functionBefore and functionAfter have the same return type.
Is it possible to change the arguments using an LLVM pass? Or should I delete the instruction and recreate the one I need?
Please give some suggestions or directions on how to do this.
The LLVM IR for the call to the function would be something like this:
invoke void @_Z7processNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPFvS4_E(%"class.std::__cxx11::basic_string"* nonnull %1, void (%"class.std::__cxx11::basic_string"*)* nonnull @_Z9functionBNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE) to label %7 unwind label %13
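A hedged sketch of the in-place approach (assuming a recent LLVM; retargetCallback and FunctionAfter are made-up names standing in for your pass logic and the functionAfter symbol): both CallInst and InvokeInst derive from CallBase, so you can overwrite the argument operand directly instead of deleting and recreating the instruction.

#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Replace the function-pointer argument of doWork("work", functionBefore)
// with functionAfter, leaving the rest of the call/invoke untouched.
static void retargetCallback(CallBase &CB, Function *FunctionAfter) {
  // Index 1 is the second argument, i.e. the callback pointer.
  if (CB.arg_size() > 1 &&
      CB.getArgOperand(1)->getType() == FunctionAfter->getType())
    CB.setArgOperand(1, FunctionAfter);
}

Since the invoke above passes the callback as its second operand, this applies to it unchanged; only if the new function's type differed would you need to insert a cast or rebuild the instruction.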

What does a ".number" following a function name mean in LLVM IR?

In LLVM IR, I sometimes see a "." and a number following a function name, such as:
@kmalloc.2670, @kmalloc.19
What does this number mean?
It often happens that the same function name appears with different numeric suffixes, even though the definitions of the two functions are identical.
Can anybody help me?
define internal i8* @kmalloc.2670(i64 %size, i32 %flags) #5 !dbg !436635
define internal i8* @kmalloc.19(i64 %size, i32 %flags) #5 !dbg !1202009
Is this right?
LLVM docs:
One nice thing about LLVM is that the name is just a hint. For
instance, if the code above emits multiple “addtmp” variables, LLVM
will automatically provide each one with an increasing, unique numeric
suffix. Local value names for instructions are purely optional, but it
makes it much easier to read the IR dumps.
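A small sketch that reproduces the suffixing (the module ID and names here are made up): when a second value asks for a name that is already taken in a module's symbol table, LLVM keeps the name as a hint and appends a unique numeric suffix. This is typically what you see after linking several translation units that each define their own internal kmalloc.

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);
  FunctionType *FT = FunctionType::get(
      PointerType::getUnqual(Type::getInt8Ty(Ctx)), false);
  // Two functions both asking for the name "kmalloc".
  Function *A = Function::Create(FT, GlobalValue::InternalLinkage, "kmalloc", &M);
  Function *B = Function::Create(FT, GlobalValue::InternalLinkage, "kmalloc", &M);
  // A keeps "kmalloc"; B is renamed to something like "kmalloc.1".
  errs() << A->getName() << " / " << B->getName() << "\n";
}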

Create debug location for function calls in LLVM function pass

I have created an optimization (function) pass that instruments specific instructions and creates function calls before target instructions. It works fine, but I cannot enable debug symbols (-g) due to not having a debug location for my custom function calls.
%381 = call i8* @my_function(i64* %375)
inlinable function call in a function with debug info must have a !dbg location
How can I create a debug location for a custom function call (e.g., my_function) in an LLVM optimization pass?
That restriction only applies to inlinable function calls. If your function doesn't need to be inlined, you can mark it noinline and avoid the problem: my_function->addAttribute(AttributeList::FunctionIndex, Attribute::NoInline);
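The other route is to give the inserted call a debug location of its own. A hedged sketch (assuming a recent LLVM; insertInstrumentation, TargetInst and MyFunction are placeholder names): copy the !dbg location from the instruction you are instrumenting onto the new call, which satisfies the verifier rule quoted above.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Insert a call to my_function right before TargetInst, reusing its !dbg.
static CallInst *insertInstrumentation(Instruction *TargetInst,
                                       FunctionCallee MyFunction,
                                       Value *Arg) {
  IRBuilder<> Builder(TargetInst);                 // insert before the target
  CallInst *Call = Builder.CreateCall(MyFunction, {Arg});
  Call->setDebugLoc(TargetInst->getDebugLoc());    // attach a !dbg location
  return Call;
}

If the target instruction itself has no debug location (possible after other transforms), you would still need a fallback, e.g. a line-0 location in the enclosing function's scope.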

Equivalent of #include for LLVM IR

I have found myself with a reasonably large number of useful functions and constants written in LLVM's IR. I can use this pseudo-library by combining it with hand written IR, provided said hand written IR starts with a potentially lengthy list of declarations. I'm aware that IR isn't necessarily designed as a general purpose programming language to write stuff in.
This is much like writing a lot of C functions in one file then redeclaring them wherever they are used. In C this is worked around using #include and header files. That's not perfect, but it beats writing out the prototypes repeatedly.
What's the least nasty way to achieve something similar in IR? It only has to beat typing the stuff out over and over again (which I currently do in copy & paste fashion) and using cat as a custom build step.
Thanks!
Sadly there is no such thing in LLVM IR.
LLVM IR isn't designed to have large amounts of it written by hand. Therefore it doesn't have a #include mechanism. The job of handling that kind of stuff falls onto the compiler using the LLVM API.
If you want to achieve the same effect, you could either see whether someone else's preprocessor works for what you're trying to do, or write a custom preprocessor yourself.
You can use llvm-link to combine different IR files.
For example, suppose you have the following two files.
; file: f1.ll
define i32 @f1(i32 %a) {
entry:
  ret i32 %a
}

; file: f2.ll
declare i32 @f1(i32)

define i32 @f2(i32 %a) {
entry:
  %call = tail call i32 @f1(i32 %a)
  ret i32 %call
}
Then you can call
llvm-link f1.ll f2.ll -S -o ffinal.ll
ffinal.ll will then contain the IR from both files.
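For completeness, a sketch of the same combination done through the C++ API rather than the llvm-link binary (assuming a recent LLVM; error handling trimmed):

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"
#include <memory>

using namespace llvm;

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;
  std::unique_ptr<Module> Dest = parseIRFile("f1.ll", Err, Ctx);
  std::unique_ptr<Module> Src  = parseIRFile("f2.ll", Err, Ctx);
  if (!Dest || !Src)
    return 1;
  // Pull f2.ll's definitions into the f1.ll module, as llvm-link does.
  if (Linker::linkModules(*Dest, std::move(Src)))
    return 1;                        // linkModules returns true on error
  Dest->print(outs(), nullptr);      // roughly the -S output to stdout
}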

LLVM IR types being collapsed wrongly when linking (C++ API)

Straight to the point -- I'm trying to link two (or more) llvm modules together, and I'm facing a certain odd error from LLVM.
I don't want to post too much code, so I'll use a bunch of pseudo here.
I have 3 modules, let's say A, B, and C. A is the main module; I initialise llvm::Linker with it. B and C are secondary modules, and I call linker.linkInModule() on each of them.
All 3 modules have, among other things, these two types defined:
%String = type { i8*, i64 }
%Character = type { i8*, i64 }
Note that they have the same member types. Furthermore, a function foo is defined as such (in module B):
define i1 @_ZN9Character7hasDataEv(%Character*) { }
This function is declared in modules A and C. Now, all seems well and good -- this function is called from both modules A and C, and the IR looks normal, like so:
%21 = call i1 @_ZN9Character7hasDataEv(%Character* %4)
Here comes the problem: when all 3 modules are linked together, something happens to these types:
They lose their name, becoming %2 (%String) and %3 (%Character).
They appear to be merged together.
Strangely, while this transformation occurs in both modules A and C, the bug only occurs in C -- note that A is the so-called "main" module.
The function definition of the linked file is now
define i1 @_ZN9Character7hasDataEv(%2*)
Note how %Character, or %3, got turned into %2. Furthermore, at the callsite, in what is presumably an attempt to un-merge the types, I get this:
%10 = call i1 bitcast (i1 (%2*)* @_ZN9Character7hasDataEv to i1 (%3*)*)(%2* %2)
Curiously, although the function is cast from i1 (%2*)* to i1 (%3*)*, the argument passed (arg. 1) is still of type %2*. What's going on?
Note that in module A, whatever is going on is done properly, and there is no error. This happens for a number of functions, but only in module C.
I've tried reproducing this by copy-pasting the snippets into .ll files and running llvm-link followed by llvm-dis, but (1) the types are not merged, and (2) there is no such bug.
Thanks...?
Okay, it turns out (after some poking around in the LLVM IRC channel) that llvm::Linker is meant to be used with an empty llvm::Module as the starting module.
Also, in my use-case I am reusing the same llvm::Type (the actual thing in memory) across different modules that I link together. They said it wasn't illegal, but that it was never tested, so... ¯\_(ツ)_/¯
So anyway, the problem was fixed by starting with an empty module to pass to the linker.
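For reference, a sketch of the fix described above (assuming a recent LLVM; linkAll and the module names are placeholders for the modules in the question): construct the Linker over a fresh, empty module and link A, B and C into it, instead of initialising it with module A.

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Linker/Linker.h"
#include <memory>

using namespace llvm;

std::unique_ptr<Module> linkAll(LLVMContext &Ctx,
                                std::unique_ptr<Module> A,
                                std::unique_ptr<Module> B,
                                std::unique_ptr<Module> C) {
  // Start from an empty composite module rather than from A itself.
  auto Composite = std::make_unique<Module>("composite", Ctx);
  Linker L(*Composite);
  // linkInModule returns true on error.
  if (L.linkInModule(std::move(A)) ||
      L.linkInModule(std::move(B)) ||
      L.linkInModule(std::move(C)))
    return nullptr;
  return Composite;
}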