Identify whether a function is inlined in the LLVM IR - llvm

We are instrumenting the source code at compile-time, based on the LLVM IR. In this procedure, we want to skip the functions that are already inlined (e.g., due to compile-time optimization).
How can we determine whether a function has been inlined in our LLVM pass?

This seems rather vague and open to many interpretations...
One way to see whether foo() is inlined into bar() is to loop over the instructions in bar() and see whether any of them is a direct call to @foo (a call, an invoke, or similar). If that is the case, then at least one call wasn't inlined, even if other calls may have been.
Another way is to look at the debug info. Suppose that foo() originates in foo.c lines 10 to 20. You can look at the debug info for all instructions in bar() and check whether any refer to lines 10-20 of foo.c. If any do, then at least one call was inlined, even if others were not.
I can think of at least two more ways, too, and I'm sure there are more. (Edit: I can think of three, including one quite nice way: Attach some unique metadata to the instructions in foo() early in the compilation and see where that metadata is found just before native codegen.)
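Inside a real pass, the first approach above (looking for remaining calls) would be written against the LLVM C++ API. As a self-contained stand-in, here is a minimal sketch that performs the same check on textual IR, such as the output of clang -S -emit-llvm. The function name hasDirectCall and the plain substring matching are assumptions made for illustration; the sketch only recognizes direct call/invoke instructions with unquoted function names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if the textual IR body of `caller` still contains a direct
// call or invoke of `callee`. Both names are given without the leading '@'.
bool hasDirectCall(const std::string &ir, const std::string &caller,
                   const std::string &callee) {
    assert(!caller.empty() && !callee.empty());
    const std::string defTag = "@" + caller + "(";
    const std::string callTag = "@" + callee + "(";

    std::istringstream in(ir);
    std::string line;
    bool inCaller = false;
    while (std::getline(in, line)) {
        if (!inCaller) {
            // A definition line starts with "define" and names the function.
            if (line.compare(0, 6, "define") == 0 &&
                line.find(defTag) != std::string::npos) {
                inCaller = true;
            }
            continue;
        }
        if (line == "}") {
            break;  // end of the caller's body
        }
        const bool isCall = line.find("call ") != std::string::npos ||
                            line.find("invoke ") != std::string::npos;
        if (isCall && line.find(callTag) != std::string::npos) {
            return true;  // at least one call site was not inlined
        }
    }
    return false;
}
```

A result of false only says that no direct call remains in that caller; it does not distinguish "every call was inlined" from "bar() never called foo() in the first place".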

Well, we gradually realized that inlining is decided per call site rather than per callee function, so a function does not carry a single "inlined" attribute. A given function may be inlined at some call sites but called normally at others, so we shouldn't skip such functions in our instrumentation pass.

Related

If a function is only called from one place, is it always better to inline it?

If a function is only used in one place and some profiling shows that it's not being inlined, will there always be a performance advantage in forcing the compiler to inline it?
Obviously "profile and see" (and in the case of the function in question, it did prove to be a small perf boost). I'm mostly asking out of curiosity -- are there any performance disadvantages to this with a reasonably smart compiler?
No, there are notable exceptions. Take this code for example:
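    static unsigned long x;          /* call counter shared by all callers */
    void do_a_lot_of_work(void);     /* defined elsewhere */

    void do_something_often(void) {
        x++;
        if (x == 100000000) {
            do_a_lot_of_work();
        }
    }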
Let's say do_something_often() is called very often and from many places. do_a_lot_of_work() is called very rarely (one out of every one hundred million calls). Inlining do_a_lot_of_work() into do_something_often() doesn't gain you anything. Since do_something_often() does almost nothing, it would be much better if it got inlined into the functions that call it, and in the rare case that they need to call do_a_lot_of_work(), they call it out of line. In that way, they are saving a function call almost every time, and saving code bloat at every call site.
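A minimal sketch of that layout, assuming a GCC/Clang or MSVC toolchain. NOINLINE, kThreshold, and slow_calls are hypothetical names introduced here, and the threshold is lowered from one hundred million so the example is quick to run. The attribute keeps the rare path out of line, while the tiny hot function stays a good inlining candidate.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

static const unsigned long kThreshold = 1000;  // hypothetical; the text uses 100000000
static unsigned long x = 0;                    // call counter
static unsigned long slow_calls = 0;           // how often the rare path ran

// Rare, expensive path: kept out of line so it does not bloat every caller.
NOINLINE void do_a_lot_of_work() {
    assert(x >= kThreshold);  // only reached once the counter hits the threshold
    ++slow_calls;
}

// Hot, tiny path: cheap to inline into each of its many callers.
inline void do_something_often() {
    if (++x == kThreshold) {
        do_a_lot_of_work();
    }
}
```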
One legitimate case where it makes sense not to inline a function, even if it's only called from a single location, is if the call to the function is rare and almost always skipped. Keeping the instructions before the function call and the instructions after the function call closely together in memory may allow those instructions to be kept in the processor cache, when that would be impossible if those blocks of instructions were separated in memory.
It would still be possible for the compiler to compile the function call as if using goto, avoiding having to keep track of a return address, but if the compiler has already determined that the function call is rare, then it makes sense not to spend as much effort optimising that call.
You can't "force" the compiler to inline it, unless you are considering some implementation-specific tools that you have not mentioned, so the question is entirely moot.
If your compiler is already not doing so then it has a reason.
If the function is called only once, there should be no performance disadvantages in inlining it. However, that does not mean you should blindly inline all functions. For example, if the code in question is Linux kernel code and you're using the BUG_ON or WARN_ON statement to print a stack trace, you don't get the full stack trace which includes the inline function. Instead, the stack trace contains only the name of the calling function.
And, as the other answer explained, the "inline" doesn't actually force the compiler to inline the function, it just is a hint to the compiler. However, there is actually an attribute __attribute__((always_inline)) in GCC which should force the compiler to inline the function.
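As a rough illustration of the attribute mentioned above, here is a sketch of a macro that selects the GCC/Clang attribute or MSVC's __forceinline. FORCE_INLINE, clamp_int, and to_percent are names made up for this example. Even with these annotations a compiler can still refuse in some cases (for instance, for recursive functions), so treat this as a strong request rather than a guarantee.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Small helper that we want expanded at every call site.
FORCE_INLINE int clamp_int(int v, int lo, int hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

// Ordinary caller; the clamp_int() body should be expanded in place here.
int to_percent(int v) { return clamp_int(v, 0, 100); }
```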
Make sure that the function definition is not exported. If it is, it obviously needs to be compiled, and that means that if your function is big probably the call will not be inlined. (Remember, it's the call that gets inlined, not the function. A function might get inlined in one place and called in another, etc.)
So even if you know that the function is called only from one place, the compiler might not. Make sure to hide the definition of your function from the other object files, for example by defining it in an anonymous namespace.
That being said, even if it is called from only one place, it does not mean that it is always a good idea to inline it. If your function is called rarely, it might waste a lot of memory in the CPU cache.
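Hiding the definition, as suggested above, might look like the following sketch. The names percent_of and report_progress are hypothetical. In C, declaring the helper static has the same effect on linkage.

```cpp
#include <cassert>

namespace {
// Internal linkage: no other object file can call this function, so the
// compiler can see every call site within this translation unit.
int percent_of(int part, int whole) {
    assert(whole != 0);
    return part * 100 / whole;
}
}  // namespace

// The only externally visible function in this translation unit.
int report_progress(int done, int total) { return percent_of(done, total); }
```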
Depending on how you wrote your function.
In some cases, yes!
    int someCalculations(int value);   /* defined elsewhere */

    void doSomething(int *src, int *dst,
                     const int loopCountInner, const int loopCountOuter)
    {
        int i, j;
        for (i = 0; i < loopCountOuter; i++) {
            for (j = 0; j < loopCountInner; j++) {
                *dst = someCalculations(*src);
                src++;
                dst++;
            }
        }
    }
In this example, if this function is compiled as non-inlined, then the compiler basically has no knowledge about the trip count of the two loops. This is a big deal for implementations that rely strongly on compile-time optimizations.
I came across an even worse case: the compiler assumed loopCountInner to be a large value and optimized for that case, but loopCountInner was actually 3 or 5, so the best choice would have been to fully unroll the inner loop!
For C++ probably the best way to do it is to make them template parameters, but for C, the only way to generate differently optimized code for different use cases is to inline the function.
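A sketch of the C++ template variant, assuming only the inner trip count needs to be a compile-time constant. The body of someCalculations here is a hypothetical stand-in, since the original does not show it.

```cpp
#include <cassert>

// Hypothetical stand-in for the someCalculations() used above.
inline int someCalculations(int v) { return v * 2; }

// The inner trip count is a template parameter, so each instantiation is
// compiled with a known constant, whether or not the call is inlined.
template <int LoopCountInner>
void doSomething(const int *src, int *dst, int loopCountOuter) {
    assert(src != nullptr && dst != nullptr);
    for (int i = 0; i < loopCountOuter; ++i) {
        for (int j = 0; j < LoopCountInner; ++j) {
            *dst++ = someCalculations(*src++);
        }
    }
}
```

A call such as doSomething<3>(src, dst, rows) gives the optimizer a constant inner bound to unroll against; each distinct value creates a separate instantiation, which trades code size for that knowledge.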
No, if the code is a rarely used function then keeping it off the 'hot path' will be beneficial. An inlined function will use up instruction cache space whether or not the code is actually used. Tools like LTCG combined with profile-guided optimisation (in the MSFT world, not sure about Linux) go to great pains to keep rarely used code off the hot path, and this can make a significant difference.

How do I use gcc's inline report (-Winline)

Enabling -Winline on my project produces a whole lot of output which I don't really understand. Does anyone know how to use this output to figure out why my particular function wasn't inlined?
Well, according to my gcc man page...
The compiler uses a variety of heuristics to determine whether or not to inline a function. For example, the compiler takes into account the size of the function being inlined and the amount of inlining that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by -Winline to appear or disappear.
I don't believe that you can force the compiler to inline your function; it's an implementation detail that could even change when the compiler is updated. Besides, as long as the compiler's choice causes your function to run faster, is there any particular reason that you care whether the function is actually inlined or not?
Of course, if you really want to inline your function for some reason, you could probably just use a macro to do so.

Inline Functions

I know the compiler may or may not perform inline expansion of a function, whether requested by the programmer or not.
I was just curious to know, is there any way by which programmer can know for sure that compiler has inlined a particular function?
Other than by looking at the generated code, no. Some implementations may provide that information but it's not required by the standard.
Things like inline or register (shudder) are suggestions to the compiler and it's free to accept them, ignore them or even lie to you that it's done it while secretly going behind your back and not doing it :-)
I tend not to use features like that since I suspect the compiler often knows better than I do how to wring the most performance out of my code.
You can profile your code and see if the function of interest shows up in the call stack. Although, I suppose there is no guarantee if your stack sampling rate is not high enough.
But it can prove that a function was inlined: if you know A calls B, B calls C, and A never calls C directly, then seeing A as the immediate caller of C on the call stack tells you B was inlined for that call.
Set your compiler to generate assembler code and check there.
Read the disassembly of the object file.
There is no way to know except to look at the output assembler.
Compilers these days are 'smart' and they decide what functions to inline and in what cases.
Just like the register keyword, compilers do the picking these days and really ignore your requests.
I don't think there is a way to find out what you want, but you can increase the chances of the function being inlined by making its definition visible in the translation unit in which it is called, i.e. you have to put the definition of an inline function in a header file.
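A sketch of what that looks like in practice; the header name wrap_index.h and the function wrap_index are made up for illustration.

```cpp
// wrap_index.h (hypothetical header name)
#ifndef WRAP_INDEX_H
#define WRAP_INDEX_H

#include <cassert>

// Defined in the header and marked inline, so every translation unit that
// includes it sees the body, and the repeated definitions do not violate
// the one-definition rule.
inline int wrap_index(int i, int n) {
    assert(n > 0);
    return ((i % n) + n) % n;  // maps any i into the range [0, n)
}

#endif  // WRAP_INDEX_H
```

Seeing the body is only a precondition: the compiler still decides per call site whether expanding it is worthwhile.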

Inline Function (When to insert)?

Inline functions are just a request to the compiler to insert the complete body of the function at every place in the code where that function is used.
But how does the compiler decide whether it should insert it or not? Which algorithm or mechanism does it use to decide?
Thanks,
Naveen
Some common aspects:
1. Compiler options (debug builds usually don't inline, and most compilers have options to override the inline declaration to try to inline all, or none)
2. A suitable calling convention (e.g. varargs functions usually aren't inlined)
3. Suitability for inlining: depends on the size of the function, the call frequency of the function, gains through inlining, and optimization settings (speed vs. code size). Often, tiny functions benefit the most, but a huge function may be inlined if it is called just once
4. Inline call depth and recursion settings
The third is probably the core of your question, but that's really "compiler specific heuristics" - you need to check the compiler docs, but usually they won't give many guarantees. MSDN has some (limited) information for MSVC.
Beyond trivialities (e.g. simple getters and very primitive functions), inlining as such isn't very helpful anymore. The cost of the call instruction has gone down, and branch prediction has greatly improved.
The great opportunity for inlining is removing code paths that the compiler knows won't be taken - as an extreme example:
    inline int Foo(bool refresh = false)
    {
        if (refresh)
        {
            // ...extensive code to update m_foo
        }
        return m_foo;
    }
A good compiler would inline Foo(false), but not Foo(true).
With Link Time Code Generation, Foo could reside in a .cpp (without an inline declaration), and Foo(false) would still be inlined, so again inline has only marginal effects here.
To summarize: There are few scenarios where you should attempt to take manual control of inlining by placing (or omitting) inline statements.
The following is in the FAQ for the Sun Studio 11 compiler:
The compiler generates an inline function as an ordinary callable function (out of line) when any of the following is true:
You compile with +d.
You compile with -g.
The function's address is needed (as with a virtual function).
The function contains control structures the compiler can't generate inline.
The function is too complex.
According to the response to this post by 'clamage45' the "control structures that the compiler can't generate inline" are:
the function contains forbidden constructs, like loop, switch, or goto
Another list can be found here. As most other answers have said, the heuristics are going to be 100% compiler specific. From what I've read, to ensure that a function is actually inlined you need to avoid:
local static variables
loop constructs
switch statements
try/catch
goto
recursion
and of course too complex (whatever that means)
All I know about inline functions (and a lot of other c++ stuff) is here.
Also, if you're focusing on the heuristics each compiler uses to decide whether or not to inline a function, that's implementation dependent and you should look at each compiler's documentation. Keep in mind that the heuristics could also change depending on the optimization level.
I'm pretty sure most compilers decide based on the length of the function (when compiled) in bytes and how often it is used vs the optimization type (speed vs size).
I only know a couple of criteria:
If inline meets recursion, inline will be ignored.
switch/while/for in most cases cause the compiler to ignore inline.
It depends on the compiler. Here's (the first part of) what the GCC manual says:
-finline-limit=n
By default, GCC limits the size of functions that can be inlined.
This flag allows the control of this limit for functions that are
explicitly marked as inline (i.e., marked with the inline keyword
or defined within the class definition in c++). n is the size of
functions that can be inlined in number of pseudo instructions (not
counting parameter handling). The default value of n is 600.
Increasing this value can result in more inlined code at the cost
of compilation time and memory consumption. Decreasing usually
makes the compilation faster and less code will be inlined (which
presumably means slower programs). This option is particularly
useful for programs that use inlining heavily such as those based
on recursive templates with C++.
Inlining is actually controlled by a number of parameters, which
may be specified individually by using --param name=value. The
-finline-limit=n option sets some of these parameters as follows:
max-inline-insns-single
is set to n/2.
max-inline-insns-auto
is set to n/2.
min-inline-insns
is set to 130 or n/4, whichever is smaller.
max-inline-insns-rtl
is set to n.
See below for a documentation of the individual parameters
controlling inlining.
Note: pseudo instruction represents, in this particular context, an
abstract measurement of function's size. In no way, it represents
a count of assembly instructions and as such its exact meaning
might change from one release to an another.
Doesn't it insert the body if you write "inline" at the beginning of the function?

MSVC - Any way to check if function is actually inlined?

I have to check whether a function is being inlined by the compiler. Is there any way to do this without looking at assembly (which I don't read)? I have no choice in figuring this out, so I would prefer if we could not discuss the wisdom of doing this. Thanks!
If you enable warnings C4714, C4710, and C4711, it should give you fairly detailed information about which functions are and aren't inlined.
Each call site may potentially be different.
The compiler may decide that inlining is worthwhile in some parent methods and not in others. Thus you cannot determine the real answer without examining the assembly at each call site.
As a result, any tool you use could give you a misleading answer. If you use a tool that checks for the existence of a symbol, the symbol may be there because some call sites need it even though the function is inlined at others. Conversely, the lack of a symbol does not mean the function was inlined: it may be static (as in file static), so the compiler does not need to keep the symbol around even though it was not inlined.
Using the /FAs compiler option to dump the asm with source code is the only way that I know of to be sure.
Note: if you want to force a function to be inline, just use __forceinline.
Generate a "MAP" file. This gives you the addresses of all non-inlined functions. If your function appears in this list, it's not inlined, otherwise it's either inlined or optimized out entirely (e.g. when it's not called at all).
If you really don't want to jump into assembly, declare the function as __forceinline, and if the executable gets larger, you know it wasn't being inlined.