big functions do not inline llvm -inline pass - llvm

It seems like llvm -inline pass only inlines small functions. Is there a way to inline all functions, no matter how big they are?

You can use the -inline-threshold flag to change the "cost" up to which LLVM will inline a function. A higher value means more functions will be inlined.
opt -inline -inline-threshold=10000 ...
Obviously functions can not always be inlined, particularly when the call graph contains cycles (recursive calls).

Related

GCC/Clang equivalent to Intel ICC #pragma forceinline recursive

Do clang/gcc have a way to do what Intel icc does with #pragma forceinline recursive, or anything close to that?
#pragma forceinline [recursive]
Specifies inlining of all calls in a statement.
The forceinline pragma indicates that the calls in question should be inlined whenever the compiler is capable of doing so.
recursive indicates that the pragma applies to all of the calls that are called by these calls, recursively, down the call chain
These are statement-specific inlining pragmas. Each can be placed before a C/C++ statement, and will then apply to all of the calls within a statement and all calls within statements nested within that statement.
(https://software.intel.com/en-us/node/524498)
I would prefer not to use __attribute__((always_inline)) since it applies to functions, and the function in my case is not exactly small. I would like to leave it to the compiler to determine when to inline it in most cases. Besides, though this attribute is very strong, it still is not "always". Changing global options such as inline depth and size limits is also not ideal.
However, in some particular cases, I know for a fact that inlining the function can have a big impact on performance (in my case, by a factor of 2). Intel's #pragma inline/forceinline [recursive] applies to an individual statement and has exactly the effect that I want. Does GCC or clang have anything similar that can affect only a particular call site of a function?
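One possible workaround, sketched here under the assumption of a GCC or Clang toolchain, is to keep the body in a helper marked always_inline and expose an ordinary wrapper next to it. Call sites that must be inlined call the helper directly; all others call the wrapper and leave the decision to the compiler. This is not equivalent to the ICC pragma (it does not recurse down the call chain), and the names work_impl, work and hot_loop are hypothetical.

```cpp
// Hypothetical names; GCC/Clang only (uses __attribute__((always_inline))).

// The real body lives here and is force-inlined wherever it is called directly.
__attribute__((always_inline)) static inline long work_impl(long x) {
    long acc = 0;
    for (long i = 0; i < x; ++i) acc += i * i;
    return acc;
}

// Ordinary entry point: the body is inlined into this wrapper once, and the
// compiler's usual heuristics decide whether calls to work() get inlined.
long work(long x) { return work_impl(x); }

// A hot call site opts in to forced inlining by calling work_impl directly.
long hot_loop(long n) {
    long total = 0;
    for (long k = 0; k < n; ++k) total += work_impl(k);
    return total;
}
```

The cost of this design is that the choice is made by picking a function name at each call site rather than by annotating a statement.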

Does it make sense to define an inline function which makes calls to other function(s)?

From what I know an inline function is an optimization to enhance performance, thus it should run as fast as a macro. Inline function's code should be as short as possible.
I wonder if it makes sense to embed function calls inside an inline function. If the answer is yes, in which context and what are the restrictions?
Actually, I am asking this question because I looked at a code of someone who is calling functions such as "socket()", "sendto()" and "memset()" from inline functions; something that overrides the purpose of an inline function in my opinion.
Note: In the code I have there is no use of any conditional calls to the functions, the inline function just passes arguments to the called functions.
I wonder if it makes sense to embed function calls inside an inline function.
Of course it does. Inlining a call to your function is still an optimisation, removing the cost of that function call and allowing further optimisations in the context of the calling function, whether or not it in turn calls other functions.
in which context and what are the restrictions?
I've no idea what you mean by "context"; but there are no restrictions on what you can do in an inline function. The only semantic effects of declaring a function inline are to allow multiple identical definitions of the function, and to require a definition in any translation unit that uses the function. In all other respects, it's the same as any other function definition.
Comment posted as answer, by request:
If the guy who wrote the code believed that inline had any meaningful impact on performance, he was manifestly NOT 'doing the right thing'.
Performance comes from correct algorithm selection and avoiding cache misses.
Headaches come from naive premature optimisation techniques that may have worked in 1991
I see no a priori reason why inline code could not contain function calls.
Argument passing aside, inlining inserts the lines of code as they stand, reducing call overhead and allowing local/ad-hoc optimizations.
For instance, inline void MyInline(bool Perform) { if (Perform) memset(); } could very well be skipped when invoked with MyInline(false).
Inlining could also allow inlining of the internal function calls, resulting in even more (micro)optimization opportunities.
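A self-contained version of the MyInline idea above might look like the sketch below. The names clear_if, keep and wipe are hypothetical, and whether the dead branch is actually removed depends on the compiler and the optimization level.

```cpp
#include <cstring>

// Hypothetical helper: clears the buffer only when asked to.
inline void clear_if(bool perform, char* buf, std::size_t len) {
    if (perform) std::memset(buf, 0, len);
}

// After inlining, the compiler sees `if (false)` here and can drop the
// memset call entirely; whether it does depends on optimization settings.
inline void keep(char* buf, std::size_t len) { clear_if(false, buf, len); }

// Here the branch is known to be taken, so only the memset remains.
inline void wipe(char* buf, std::size_t len) { clear_if(true, buf, len); }
```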
The compiler will choose when to inline. And you should avoid attempting premature optimisation at the expense of exposing your implementation.
The compiler may be able to optimise away the forwarding of the functions you are calling. It might do that anyway with optimisation flags even if you do not use the inline keyword.
The time to use the inline keyword is when you want to make a header-only file to use in multiple projects without having to use a link-library. In reality this doesn't really mean "inline" at all, it means "one definition only" even across compilation units calling the function.
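As a minimal sketch of that header-only use, assume a hypothetical header util.hpp with a single helper clamp_to_byte. The inline keyword is what allows the header to be included from several .cpp files without a multiple-definition error at link time.

```cpp
// util.hpp (hypothetical header-only utility)
#ifndef UTIL_HPP
#define UTIL_HPP

// `inline` lets this definition appear in every translation unit that
// includes the header without causing a multiple-definition link error.
inline int clamp_to_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

#endif // UTIL_HPP
```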
In any case you should look at this wiki question / answer:
Benefits of inline functions in C++?
It makes perfect sense.
Consider a function that consists of two possible branches of execution — a fast path which is activated when certain condition holds (most of the time) and a slow path.
Inlining the whole thing would result in growing the size of the code for little benefit. The slow path complexity may prevent the compiler from inlining the function.
If you make the slow path into a separate function an interesting opportunity opens.
It makes it possible to inline the condition and the fast path while the slow path remains a function call. Inlining the fast path avoids the function call overhead most of the time. The slow path is already slow, hence the call overhead is negligible.
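The split could be sketched as follows. The example is hypothetical (push and push_slow are made-up names), and __attribute__((noinline)) assumes GCC or Clang; other compilers need their own spelling or can simply define the slow path in a separate source file.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: append to a vector that usually has spare capacity.

// Slow path: kept out of line so it does not bloat every call site.
// __attribute__((noinline)) is a GCC/Clang extension (assumed toolchain).
__attribute__((noinline)) void push_slow(std::vector<int>& v, int x) {
    v.reserve(v.capacity() == 0 ? 16 : v.capacity() * 2);
    v.push_back(x);
}

// Fast path: a cheap check plus an append that cannot reallocate,
// small enough to be inlined at every call site.
inline void push(std::vector<int>& v, int x) {
    if (v.size() < v.capacity()) {
        v.push_back(x);        // common case, no reallocation
    } else {
        push_slow(v, x);       // rare case, pays for a real call
    }
}
```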
First, in C++, an "inline" function (one declared in the header file or labeled as such) is just a suggestion to the compiler. The compiler itself will make the decision on whether or not to actually make it inline.
There are three reasons why to inline a function:
Pushing another round of variables onto the stack is expensive, as is branching to the new point in the program.
Sometimes we can do local optimizations with the intermediates of the function (though I wouldn't count on it!)
Put the definition of a function in the header file (work around).
Take the following example
void NonInlinable(int x);
inline void Inline() { NonInlinable(10);}
This makes a ton of sense to inline. I remove 1 function call, so if NonInlinable is pretty fast, then this could be a huge speedup. So regardless of whether or not I'm calling functions, I could still want to inline the call.
Now another example:
inline int Inline(int y) {return y/10;}
//main
...
int x = 5;
int y = Inline(x);
int z = x % 10;
The modulo and divide operations are usually computed by the same instruction. A really good compiler can compute y and z with one assembly instruction. Magic!
So in my mind, a better question to ask is when should I not use inline functions:
When I want to separate definition from declaration (very good practice for readability, and premature optimization is the root of all evil).
When I want to hide my implementation/use good encapsulation.

Can overuse of macros hurt performance?

I have a very long piece of code, which is being called millions of times.
I have noticed that if I change all the macros into inline functions the code runs a lot faster.
Can you explain why this is? Aren't macros only a text replacement? As opposed to inline functions which can be a call to a function?
A macro is a text substitution and will as such generally produce more executable code. Every time you invoke a macro, code is inserted (well, not necessarily, the macro could be empty... but in principle).
Inline functions, on the other hand, may work the same as macros, but they might also not be inlined at all.
In general, the inline keyword is rather a weak hint than a requirement anyway, compilers will nowadays judiciously inline functions (or will abstain from doing so) based on heuristics, mostly the number of pseudo-instructions.
Inline functions may thus cause the compiler to not inline the function at all, or inline it a couple of times and then call it non-inlined in addition.
Surprisingly, not inlining may actually be faster than inlining, since it reduces overall code size and thus the number of cache and TLB misses.
This will depend on the particular macro and function call that you are using. A particular macro can actually compile to a longer set of operations than the inline function. It is often better not to use a macro for certain processes. The inline function will allow the compiler to type check and optimize the various processes. Macros will be subject to a number of errors and can actually cause various inefficiencies (such as by having to move variables in and out of storage).
In any case, since you actually see this happening in your code, you can tell that the compiler is able to optimize your inline code rather than blindly put in the text expansion.
Note that a google search 'macros vs inline' shows a number of discussions of this.
Apart from forcing inlining, macros can also be detrimental to speed if they are not carefully written not to evaluate their arguments twice. Take for example this little function-like macro and its inline function equivalent:
#define square(x) ((x)*(x))
inline long square(long x) { return x*x; }
Now, when you call them with a variable, square(foo), they are equivalent. The macro version expands to ((foo)*(foo)), which is one multiplication, just like the function if it's inlined.
However, if you call them with square(expensiveComputation(foo)), the result of the macro is that expensiveComputation() is called twice. The inline function, in contrast, behaves like any function: its argument is evaluated once before the body of the function is executed.
Of course, you could write the macro using the GNU extension of compound statements (see http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html for documentation on this) to avoid double evaluation like this:
#define square(x) ({ \
long square_temp_variable = (x); \
square_temp_variable*square_temp_variable; \
})
But this is a lot of hassle, and it makes the code unportable. So, better stick with inline functions.
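The double evaluation can be made visible with a call counter. This is only a sketch: expensive, SQUARE_MACRO, square_fn and the two calls_with_* helpers are hypothetical names introduced for the demonstration.

```cpp
// Counts how often the "expensive" function runs (hypothetical names).
static int g_calls = 0;

static long expensive(long v) {
    ++g_calls;
    return v + 1;
}

#define SQUARE_MACRO(x) ((x) * (x))

static inline long square_fn(long x) { return x * x; }

// Returns the number of expensive() calls made by the macro version.
static int calls_with_macro() {
    g_calls = 0;
    long r = SQUARE_MACRO(expensive(2));  // expands to two calls
    (void)r;
    return g_calls;
}

// Returns the number of expensive() calls made by the function version.
static int calls_with_function() {
    g_calls = 0;
    long r = square_fn(expensive(2));     // argument evaluated once
    (void)r;
    return g_calls;
}
```

Here calls_with_macro() reports 2 and calls_with_function() reports 1, because the compiler may not drop a call that has a visible side effect.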
In general it is good advice to replace function-style macros with inline functions wherever this is possible.
Not only do you get rid of some nasty traps, a = MIN(i++, 50) for example, you also gain type safety and, as already stated in some comments, you avoid multiple evaluation of arguments, which may have a very bad influence on performance.
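The MIN(i++, 50) trap can be reproduced with a short sketch. MIN_MACRO is a hypothetical macro written in the usual ternary style, and std::min stands in for the inline function replacement.

```cpp
#include <algorithm>

#define MIN_MACRO(a, b) ((a) < (b) ? (a) : (b))

// Macro version: `i++` appears twice in the expansion, so when i < 50 it is
// incremented twice and the value after the first increment is returned.
inline int min_macro(int& i) { return MIN_MACRO(i++, 50); }

// Function version: `i++` is evaluated exactly once before std::min runs.
inline int min_function(int& i) { return std::min(i++, 50); }
```

Starting from i = 10, the macro version returns 11 and leaves i at 12, while the function version returns 10 and leaves i at 11.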

What are any real sets of rules compilers use to decide whether to inline a function?

We have a macro for signalling errors in a common utilities library that goes like this:
#define OurMacro( condition ) \
if( condition ) { \
} else { \
CallExternalFunctionThatWillThrowAnException( parametersListHere ); \
}
What I refer to as parametersListHere is a comma-separated list of constants and macros that is populated by the compiler at each macro expansion.
That function call always resolves into a call - the function implementation is not exposed to the compiler. The function has six parameters and in debug configuration all of them have meaningful values, while in release configuration only two have meaningful values and others are passed the same default values.
Normally the condition will hold true, so I don't care how fast the invocation is, I only care about the code bloat. Calling that function with 6 parameters requires seven x86 instructions (6 pushes and one call), and clearly 4 of those pushes can be avoided if the function signature is changed to have two parameters only. This can be done by introducing an intermediate "gate" function implemented in such a way that its implementation is not visible to the compiler.
I need to estimate whether I should insist on that change. So far the primary improvement I expect is that reducing the number of parameters will drop 4 instructions on each invocation, which means that the code surrounding the macro expansion will become smaller and the compiler will be more likely to inline it and optimize the emitted code further.
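The proposed gate could be sketched like this. Every name is a hypothetical stand-in, and the sketch assumes that the two meaningful release-mode parameters are the file and the line, which the question does not state. The macro also uses the do/while idiom rather than the if/else form of the original.

```cpp
// All names are hypothetical stand-ins for the library described above.
struct ReportedError {
    const char* file; int line; const char* func;
    const char* expr; int code; int severity;
};

// Stand-in for the six-parameter external function that throws.
[[noreturn]] void ThrowWithSix(const char* file, int line, const char* func,
                               const char* expr, int code, int severity) {
    throw ReportedError{file, line, func, expr, code, severity};
}

// The "gate": in a real build it would be defined in another translation
// unit so the compiler cannot inline it back into every call site.
[[noreturn]] void ThrowGate(const char* file, int line) {
    ThrowWithSix(file, line, "", "", 0, 0);  // defaults for the unused four
}

// Each expansion now passes two arguments instead of six.
#define OUR_CHECK(condition) \
    do { if (!(condition)) ThrowGate(__FILE__, __LINE__); } while (0)

inline int checked_div(int a, int b) {
    OUR_CHECK(b != 0);
    return a / b;
}
```

Whether the smaller call sequence changes any inlining decision still has to be checked in the emitted code of a real build.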
How can I estimate that without actually trying and recompiling all our code and carefully analyzing the emitted code? Every time I read about inline there's a statement that the compiler decides whether to inline the function.
Can I see some exact set of rules of how the function internals influence compiler decision on inlining?
GCC has a fairly large set of options that expose how their process works, documented here. It's of course not exact, given that it will be tweaked over time and it's CPU-dependent.
The first rule is "their body is smaller than expected function call code".
A second rule is "static functions called once".
There are also parameters affecting the inlining process, e.g. max-inline-insns-single. An insn is a pseudo-instruction in the GCC compiler, and is used here as a measure of function complexity. The documentation of parameter max-inline-insns-auto makes it clear that manually declaring a function inline might cause it to be considered for inlining even if it is too big for automatic inlining.
Inlining isn't an all-or-nothing process, since there's a -fpartial-inlining flag.
Of course, you can't consider inlining in isolation. Common Subexpression Elimination (CSE) makes code simpler. It's an optimization pass that may make a function small enough to be inlined. After inlining, new common subexpressions may be discovered so the CSE pass should be run again, which in turn might trigger further inlining. And CSE isn't the only optimization that needs rerunning.
The rules on what functions get inlined and under what conditions (e.g. selected optimization level) are specific to each compiler, so I suggest you check your compiler's documentation. However, a function that just forwards to another function (as you propose) should be a good candidate for inlining by any compiler that supports it.
Some compilers have a mechanism whereby you can flag that you really want a function to be inlined, e.g. MSVC++ has __forceinline.
If you are using Visual C++, you can use __forceinline to force the compiler to inline a function.
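Code that has to build with more than one compiler often hides these spellings behind a macro. A minimal sketch, assuming the hypothetical macro name FORCE_INLINE and falling back to plain inline on compilers it does not recognize:

```cpp
// Hypothetical portability macro; the name FORCE_INLINE is not standard.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   // fall back to the plain hint
#endif

FORCE_INLINE int add_one(int x) { return x + 1; }
```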

Inline Function (When to insert)?

Inline functions are just a request to the compiler to insert the complete body of the inline function in every place in the code where that function is used.
But how does the compiler decide whether it should insert it or not? Which algorithm/mechanism does it use to decide?
Thanks,
Naveen
Some common aspects:
Compiler option (debug builds usually don't inline, and most compilers have options to override the inline declaration to try to inline all, or none)
suitable calling convention (e.g. varargs functions usually aren't inlined)
suitable for inlining: depends on size of the function, call frequency of the function, gains through inlining, and optimization settings (speed vs. code size). Often, tiny functions have the most benefits, but a huge function may be inlined if it is called just once
inline call depth and recursion settings
The 3rd is probably the core of your question, but that's really "compiler specific heuristics" - you need to check the compiler docs, but usually they won't give many guarantees. MSDN has some (limited) information for MSVC.
Beyond trivialities (e.g. simple getters and very primitive functions), inlining as such isn't very helpful anymore. The cost of the call instruction has gone down, and branch prediction has greatly improved.
The great opportunity for inlining is removing code paths that the compiler knows won't be taken - as an extreme example:
inline int Foo(bool refresh = false)
{
if (refresh)
{
// ...extensive code to update m_foo
}
return m_foo;
}
A good compiler would inline Foo(false), but not Foo(true).
With Link Time Code Generation, Foo could reside in a .cpp (without an inline declaration), and Foo(false) would still be inlined, so again inline has only marginal effects here.
To summarize: There are few scenarios where you should attempt to take manual control of inlining by placing (or omitting) inline statements.
The following is in the FAQ for the Sun Studio 11 compiler:
The compiler generates an inline function as an ordinary callable function (out of line) when any of the following is true:
You compile with +d.
You compile with -g.
The function's address is needed (as with a virtual function).
The function contains control structures the compiler can't generate inline.
The function is too complex.
According to the response to this post by 'clamage45' the "control structures that the compiler can't generate inline" are:
the function contains forbidden constructs, like loop, switch, or goto
Another list can be found here. As most other answers have said, the heuristics are going to be 100% compiler specific. From what I've read, to ensure that a function is actually inlined you need to avoid:
local static variables
loop constructs
switch statements
try/catch
goto
recursion
and of course too complex (whatever that means)
All I know about inline functions (and a lot of other c++ stuff) is here.
Also, if you're focusing on the heuristics each compiler uses to decide whether or not to inline a function, that's implementation dependent and you should look at each compiler's documentation. Keep in mind that the heuristics could also change depending on the level of optimization.
I'm pretty sure most compilers decide based on the length of the function (when compiled) in bytes and how often it is used vs the optimization type (speed vs size).
I know only a couple of criteria:
If inline meets recursion, inline will be ignored.
switch/while/for in most cases cause compiler to ignore inline
It depends on the compiler. Here's (the first part of) what the GCC manual says:
-finline-limit=n
By default, GCC limits the size of functions that can be inlined.
This flag allows the control of this limit for functions that are
explicitly marked as inline (i.e., marked with the inline keyword
or defined within the class definition in c++). n is the size of
functions that can be inlined in number of pseudo instructions (not
counting parameter handling). The default value of n is 600.
Increasing this value can result in more inlined code at the cost
of compilation time and memory consumption. Decreasing usually
makes the compilation faster and less code will be inlined (which
presumably means slower programs). This option is particularly
useful for programs that use inlining heavily such as those based
on recursive templates with C++.
Inlining is actually controlled by a number of parameters, which
may be specified individually by using --param name=value. The
-finline-limit=n option sets some of these parameters as follows:
max-inline-insns-single
is set to n/2.
max-inline-insns-auto
is set to n/2.
min-inline-insns
is set to 130 or n/4, whichever is smaller.
max-inline-insns-rtl
is set to n.
See below for a documentation of the individual parameters
controlling inlining.
Note: pseudo instruction represents, in this particular context, an
abstract measurement of function's size. In no way, it represents
a count of assembly instructions and as such its exact meaning
might change from one release to an another.
It inserts the body if you write "inline" at the beginning of the function?