Say I have some functions, each of about two simple lines of code, and they call each other like this: A calls B calls C calls D ... calls K. (So basically it's a long series of short function calls.) How deep will compilers usually go in the call tree to inline these functions?
The question is not meaningful.
If you think about inlining, and its consequences, you'll realise it:
Avoids a function call (with all the register saving/frame adjustment)
Exposes more context to the optimizer (dead stores, dead code, common sub-expression elimination...)
Duplicates code (bloating the instruction cache and the executable size, among other things)
When deciding whether or not to inline, the compiler thus performs a balancing act between the potential bloat created and the speed gain expected. This balancing act is affected by options: for gcc, -O3 means optimize for speed while -Os means optimize for size, and on inlining they have quasi-opposite behaviors!
Therefore, what matters is not the "nesting level", it is the number of instructions (possibly weighted, as not all instructions are created equal).
This means that a simple forwarding function:
int foo(int a, int b, int c);                    // the overload being forwarded to
int foo(int a, int b) { return foo(a, b, 3); }
is essentially "transparent" from the inlining point of view.
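To tie this back to the original question, here is a purely illustrative chain of such forwarders; with optimization enabled, a typical compiler simply collapses the whole chain, so the nesting depth as such never really enters into it:
// Hypothetical chain of trivial forwarders (names are illustrative).
// Each body adds essentially zero instructions, so a call to a(x) usually
// compiles down to the body of d(x) alone, with no calls left at all.
static int d(int x) { return x * 2 + 1; }
static int c(int x) { return d(x); }
static int b(int x) { return c(x); }
static int a(int x) { return b(x); }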
On the other hand, a function counting a hundred lines of code is unlikely to get inlined. Except that a static free function called only once is quasi-systematically inlined, as inlining it does not create any duplication in that case.
From these two examples we get a hunch of how the heuristics behave:
the fewer instructions the function has, the better for inlining
the less often it is called, the better for inlining
After that, there are parameters you should be able to set to influence things one way or another (MSVC has __forceinline, which hints strongly at inlining; gcc has the -finline-limit flag to "raise" the threshold on the instruction count, etc.).
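A rough sketch of what such hints look like in source; the function name is made up, while the keyword and attribute spellings are the real MSVC and GCC/Clang ones:
#if defined(_MSC_VER)
// MSVC: __forceinline overrides most of the usual cost/benefit analysis
// (the compiler can still refuse, e.g. for recursive functions).
__forceinline int square(int x) { return x * x; }
#else
// GCC/Clang: always_inline has a similar effect; GCC's -finline-limit=N
// instead just raises the size threshold for ordinary inline candidates.
inline __attribute__((always_inline)) int square(int x) { return x * x; }
#endif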
On a tangent: do you know about partial inlining?
It was introduced in gcc 4.6. The idea, as the name suggests, is to partially inline a function: mostly, to avoid the overhead of a function call when the function is "guarded" and may (in some cases) return nearly immediately.
For example:
void foo(Bar* x) {
    if (not x) { return; } // null pointer, pfff!
    // ... BIG BLOCK OF STATEMENTS ...
}
void bar(Bar* x) {
    // DO 1
    foo(x);
    // DO 2
}
could get "optimized" as:
void foo#0(Bar* x) {
    // ... BIG BLOCK OF STATEMENTS ...
}
void bar(Bar* x) {
    // DO 1
    if (x) { foo#0(x); }
    // DO 2
}
Of course, once again the heuristics for inlining apply, but they apply more discriminately!
And finally, unless you use WPO (Whole Program Optimization) or LTO (Link Time Optimization), functions can only be inlined if their definition is in the same TU (Translation Unit) as the call site.
I've seen compilers inline more than 5 functions deep. But at some point, it basically becomes a space-versus-speed trade-off that the compiler makes. Every compiler is different in this respect. Visual Studio is very conservative with inlining. GCC (under -O3) and the Intel Compiler love to inline...
Related
I have two functions, void f(int x){...} and void g(int x){f(x);}. I know that 99% of the time g() receives 3 or 5. In f(), x never changes and controls lots of loops and conditional branches. Would the following be faster than my original code?
void g(int x)
{
    if (x == 3) f(3);
    else if (x == 5) f(5);
    else f(x);
}
Would the compiler (g++ -Ofast) compile f(3) and f(5) separately from f(x), analogous to instantiating two template parameters? What else should I do to let the compiler acknowledge the optimization opportunity more easily? Is declaring void f(const int &x){...} helpful or necessary?
Answers to such questions are ultimately misleading, because they depend not only on the exact environment you use, but also on the other code your project will link with, should link-time optimization be used. Furthermore, the compiler can generate multiple versions, some more "optimal" than others, and then the "optimality" depends on who is calling g(). If g() can be constexpr, make it so; the compiler could use that fact to guide optimizations.
In any case: you need to look at the output of your compiler, with the code as it is compiled into your project. Only then can you tell. As a prelude, you should head to Compiler Explorer at https://godbolt.org and see for yourself in an isolated environment.
If this is a performance-critical function, and 99% of the time f(3) or f(5) is called, and you are trying to optimize, you should measure the difference between such calls. If f() is an inline function, the optimizer may be able to work with your constant better than with a variable and make some of its functionality evaluate at compile time (constant folding, strength reduction, etc.). It might be useful to look at the assembly on godbolt.org and see whether any obvious improvements occur. LTO may also help even if f() is not inlined, though different people report different levels of success with this.
If you don't see much improvement but think there could be some from knowing x in advance, you could also consider writing different specialized versions of f(), such as f3() and f5(), which are optimized for those cases (though you might also end up with a larger instruction footprint and icache issues). It all comes down to measuring what you try and seeing where the benefits (and losses) are. The most important thing is to measure; it's no fun making code complicated for no gain (or worse, slowing it down in the name of optimization).
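If you do go down that road, one hedged sketch is to let a template carry the constant so the compiler sees a genuine compile-time value; f_impl below is a made-up placeholder standing in for the body of your real f():
// Sketch only: f_impl stands in for the real f(); a trivial body keeps
// the example self-contained.
inline int f_impl(int x) { return x * x + 1; }

// X is a compile-time constant in each instantiation, so once f_impl is
// inlined the compiler can fold the common cases down to constants.
template <int X>
int f_fixed() { return f_impl(X); }

int g(int x) {
    switch (x) {
        case 3:  return f_fixed<3>();   // specialized common case
        case 5:  return f_fixed<5>();
        default: return f_impl(x);      // generic fallback
    }
}
Whether this beats the plain if (x == 3) f(3); version is exactly the kind of thing only measurement (and a look at the generated assembly) can settle.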
From what I know, an inline function is an optimization to enhance performance; it should run as fast as a macro, and an inline function's code should be as short as possible.
I wonder if it makes sense to embed function calls inside an inline function. If the answer is yes, in which context, and what are the restrictions?
Actually, I am asking this question because I looked at someone's code that calls functions such as socket(), sendto() and memset() from inline functions; something that defeats the purpose of an inline function, in my opinion.
Note: In the code I have there is no use of any conditional calls to the functions, the inline function just passes arguments to the called functions.
I wonder if it makes sense to embed function calls inside an inline function.
Of course it does. Inlining a call to your function is still an optimisation, removing the cost of that function call and allowing further optimisations in the context of the calling function, whether or not it in turn calls other functions.
in which context and what are the restrictions?
I've no idea what you mean by "context"; but there are no restrictions on what you can do in an inline function. The only semantic effects of declaring a function inline are to allow multiple identical definitions of the function, and to require a definition in any translation unit that uses the function. In all other respects, it's the same as any other function definition.
Comment posted as answer, by request:
If the guy who wrote the code believed that inline had any meaningful impact on performance, he was manifestly NOT 'doing the right thing'.
Performance comes from correct algorithm selection and avoiding cache misses.
Headaches come from naive premature optimisation techniques that may have worked in 1991
I see no a priori reason why inline code could not contain function calls.
Argument passing aside, inlining inserts the lines of code as they stand, reducing call overhead and allowing local/ad-hoc optimizations.
For instance, inline void MyInline(bool Perform) { if (Perform) memset(/* ... */); } could very well have the memset call skipped entirely when invoked as MyInline(false).
Inlining could also allow inlining of the internal function calls, resulting in even more (micro)optimization opportunities.
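To make that concrete, here is a self-contained variant of the same idea; the names are illustrative, not taken from the code under discussion:
#include <cstddef>
#include <cstring>

// Once this is inlined at a call site that passes perform == false,
// the compiler can drop both the branch and the memset call entirely.
inline void clear_if(void* p, std::size_t n, bool perform) {
    if (perform) {
        std::memset(p, 0, n);
    }
}

// After inlining, this reduces to an empty function.
inline void example(char (&buf)[64]) {
    clear_if(buf, sizeof buf, false);
}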
The compiler will choose when to inline. And you should avoid attempting premature optimisation at the expense of exposing your implementation.
The compiler may be able to optimise away the forwarding of the functions you are calling. It might do that anyway with optimisation flags even if you do not use the inline keyword.
The time to use the inline keyword is when you want to make a header-only file to use in multiple projects without having to use a link library. In reality this doesn't really mean "inline" at all; it means "one definition only", even across compilation units calling the function.
In any case you should look at this wiki question / answer:
Benefits of inline functions in C++?
It makes perfect sense.
Consider a function that consists of two possible branches of execution: a fast path, taken when a certain condition holds (most of the time), and a slow path.
Inlining the whole thing would grow the size of the code for little benefit, and the slow path's complexity may prevent the compiler from inlining the function at all.
If you make the slow path into a separate function, an interesting opportunity opens up.
It becomes possible to inline the condition and the fast path while the slow path remains a function call. Inlining the fast path avoids the function call overhead most of the time; the slow path is already slow, hence its call overhead is negligible.
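A minimal sketch of that split, with hypothetical names (the slow path is assumed to be large and defined elsewhere):
// The guard and the fast path are tiny, so they are good inlining
// candidates; the rarely taken slow path stays behind an ordinary call.
int recompute_slowly(int x);            // assumed large, defined elsewhere

inline int cached_or_recompute(int x, int cached, bool valid) {
    if (valid) {                        // fast path: usually taken, inlined
        return cached;
    }
    return recompute_slowly(x);         // slow path: a real function call
}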
First, in C++, an "inline" function (one defined inside a class definition or explicitly marked with the inline keyword) is just a suggestion to the compiler. The compiler itself will make the decision on whether or not to actually inline it.
There are three reasons to inline a function:
Pushing another round of variables onto the stack is expensive, as is branching to the new point in the program.
Sometimes we can do local optimizations with the intermediates of the function (though I wouldn't count on it!)
Putting the definition of a function in the header file (as a workaround).
Take the following example:
void NonInlinable(int x);
inline void Inline() { NonInlinable(10);}
Inlining this makes a ton of sense: I remove one function call, so if NonInlinable is pretty fast, this could be a big relative speedup. So regardless of whether or not I'm calling other functions, I could still want to inline the call.
Now another example:
inline int Inline(int y) {return y/10;}
//main
...
int x = 5;
int y = Inline(x);
int z = x % 10;
The modulo and divide operations are usually calculated by the same instruction. A really nice compiler can compute y and z with one assembly instruction! Magic.
So in my mind, a better question to ask is when should I not use inline functions:
When I want to separate definition from declaration (very good practice for readability, and premature optimization is the root of all evil).
When I want to hide my implementation/use good encapsulation.
It is my understanding that modern C++ compilers take shortcuts on things like:
if(true)
{do stuff}
But how about something like:
bool foo(){ return true; }
...
if(foo())
{do stuff}
Or:
class Functor
{
public:
    bool operator() () { return true; }
};
...
Functor f;
if(f()){do stuff}
It depends on whether the compiler can see foo() in the same compilation unit.
With optimization enabled, if foo() is in the same compilation unit as the callers, it will probably inline the call to foo() and then optimization is simplified to the same if (true) check as before.
If you move foo() to a separate compilation unit, the inlining can no longer happen, so most compilers will no longer be able to optimize this code. (Link-time optimization can optimize across compilation units, but it's a lot less common: not all compilers support it, and in general it's less effective.)
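An illustrative split (the file names are hypothetical) of why the translation unit matters here:
// foo.cpp -- a separate translation unit.
bool foo() { return true; }

// main.cpp -- only the declaration is visible, so without LTO the compiler
// must emit a real call and cannot reduce the condition to if (true).
bool foo();
void do_stuff();

void run() {
    if (foo()) {
        do_stuff();
    }
}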
I've just tried g++ 4.7.2 with -O3, and in both examples it optimizes out the call. Without -O, it doesn't.
Modern compilers are incredibly clever, and often do "whole program optimization". So as long as you do sensible things, the compiler will definitely optimise away function calls that just return a constant value. The compiler will also inline code that is only called once [even if it is very large], so writing small functions instead of large ones is definitely worth doing. Of course, if you use the function in multiple places, it may not inline it, but then you get a better cache hit rate from having the same function called from two places, and smaller code overall.
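For the "called only once" case, the usual shape is a file-local helper; a sketch with made-up names:
// A static (internal-linkage) helper with a single call site is a prime
// candidate for being folded into its caller, however large it is, since
// inlining it duplicates nothing: the standalone copy can then be dropped.
static int big_helper(int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += i * i;                   // stands in for a large body
    }
    return acc;
}

int entry(int n) { return big_helper(n); }   // the only call site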
Let's say you have some functions in some classes that call each other like this:
void myclass::render(int offset_x, int offset_y)
{
    otherClass.render(offset_x, offset_y);
}
This pattern will repeat for a while, possibly through 10+ classes, so my question is:
Are modern C++ compilers smart enough to recognise that wherever the program stores function parameters (from what Wikipedia tells me this seems to vary with parameter size, but for a two-parameter function the processor registers seem likely), those locations don't need to be overwritten with new values?
If not I might need to look at implementing my own methods
I think it's more likely that the compiler will make a larger-scale optimization. You'd have to examine the actual machine code produced, but for example the following trivial attempt:
#include <iostream>

class B {
public:
    void F( int x, int y ) {
        std::cout << x << ", " << y << std::endl;
    }
};

class A {
    B b;
public:
    void F( int x, int y ) {
        b.F( x, y );
    }
};

int main() {
    A a;
    a.F( 32, 64 );
}
causes the compiler (cl.exe from VS 2010, empty project, vanilla 'Release' configuration) to produce assembly that completely inlines the call tree; you basically get "push 40h, push 20h, call std::operator<<."
Abusing __declspec(noinline) causes cl.exe to realize that A::F just forwards to B::F, and the definition of A::F is then nothing but "call B::F", without stack or register manipulation at all (so in that case, it has performed the optimization you're asking about). But do note that my example is extremely contrived, and so says nothing about the compiler's ability to do this well in general, only that it can be done.
In your real-world scenario, you'll have to examine the disassembly yourself. In particular, the 'this' parameter needs to be accounted for (cl.exe usually passes it via the ECX register); if you do any manipulation of the class member variables, that may impact the results.
Yes, it is. The compiler performs dataflow analysis before register allocation, keeping track of which data is where at which time. And it will see that the arg0 location contains the value that needs to be in the arg0 location in order to call the next function, and so it doesn't need to move the data around.
I'm not a specialist, but it looks a lot like the perfect forwarding problem that will be solved in the next standard (C++0x) by using rvalue-references.
Currently I'd say it depends on the compiler, but I guess that if the function and the parameters are simple enough, then yes, the function will serve as a shortcut.
If this function is implemented directly in the class definition (thereby implicitly becoming a candidate for inlining), it might be inlined, making the call go directly to the wanted function instead of having two runtime calls.
In spite of your comment, I think that inlining is germane to this discussion. I don't believe that C++ compilers will do what you're asking (reuse parameters on the stack) UNLESS they also inline the method completely.
The reason is that if it's making a real function call, it still has to put the return address onto the stack, so the previous call's parameters are no longer at the expected place on the stack. It thus has to put the parameters back on the stack a second time.
However, I really wouldn't worry about that. Unless you're making a ridiculous number of function calls like this AND profiling shows that a large proportion of time is spent on these calls, the overhead is probably minimal and you shouldn't worry about it. For a function that small, however, mark it inline and let the compiler decide if it can inline it away completely.
If I understand the question correctly, you are asking "Are most compilers smart enough to inline a simple function like this?", and the answer to that question is yes. Note however the implicit this parameter which is part of your function (because your function is a class member), so it might not be completely inlineable if the call level is deep enough.
The problem with inlining is that the compiler will probably only be able to do this within a given compilation unit. The linker is less likely to be clever enough to inline from one compilation unit to another.
But given the total trivial nature of the function and that both functions have exactly the same arguments in the same order, the cost of the function call will probably be only one machine instruction viz. an additional branch (or jump) to the true implementation. There is no need to even push the return address onto the stack.
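A sketch of that situation with free functions for simplicity (with the implicit this parameter the picture is similar, as long as it is passed the same way in both calls):
// With identical parameter lists in the same order, many compilers turn
// the forwarder into a single jump (a tail call) to the real implementation:
// the arguments are already sitting in the right registers or stack slots.
void other_render(int offset_x, int offset_y);   // the "true" implementation

void render(int offset_x, int offset_y) {
    other_render(offset_x, offset_y);            // often emitted as: jmp other_render
}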
Inline functions are just a request to the compiler to insert the complete body of the inline function at every place in the code where that function is used.
But how does the compiler decide whether it should insert it or not? Which algorithm/mechanism does it use to decide?
Some common aspects:
Compiler option (debug builds usually don't inline, and most compilers have options to override the inline declaration to try to inline all, or none)
suitable calling convention (e.g. varargs functions usually aren't inlined)
suitable for inlining: depends on size of the function, call frequency of the function, gains through inlining, and optimization settings (speed vs. code size). Often, tiny functions have the most benefits, but a huge function may be inlined if it is called just once
inline call depth and recursion settings
The third point is probably the core of your question, but that's really "compiler-specific heuristics": you need to check the compiler docs, and usually they won't give many guarantees. MSDN has some (limited) information for MSVC.
Beyond trivialities (e.g. simple getters and very primitive functions), inlining as such isn't very helpful anymore. The cost of the call instruction has gone down, and branch prediction has greatly improved.
The great opportunity for inlining is removing code paths that the compiler knows won't be taken - as an extreme example:
inline int Foo(bool refresh = false)
{
    if (refresh)
    {
        // ...extensive code to update m_foo
    }
    return m_foo;
}
A good compiler would inline Foo(false), but not Foo(true).
With Link Time Code Generation, Foo could reside in a .cpp (without an inline declaration), and Foo(false) would still be inlined, so again inline has only marginal effects here.
To summarize: there are few scenarios where you should attempt to take manual control of inlining by placing (or omitting) the inline keyword.
The following is in the FAQ for the Sun Studio 11 compiler:
The compiler generates an inline function as an ordinary callable function (out of line) when any of the following is true:
You compile with +d.
You compile with -g.
The function's address is needed (as with a virtual function).
The function contains control structures the compiler can't generate inline.
The function is too complex.
According to the response to this post by 'clamage45' the "control structures that the compiler can't generate inline" are:
the function contains forbidden constructs, like loop, switch, or goto
Another list can be found here. As most other answers have specified, the heuristics are going to be 100% compiler specific; from what I've read, I think that to ensure a function is actually inlined you need to avoid:
local static variables
loop constructs
switch statements
try/catch
goto
recursion
and of course too complex (whatever that means)
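One of the conditions quoted above, the function's address being needed, is easy to picture; a minimal sketch:
// Even if every direct call to tiny() is inlined, taking its address forces
// the compiler to emit an out-of-line copy for the pointer to refer to.
inline int tiny(int x) { return x + 1; }

int (*fp)(int) = &tiny;                 // address needed -> out-of-line copy
int direct(int x) { return tiny(x); }   // this call can still be inlined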
All I know about inline functions (and a lot of other c++ stuff) is here.
Also, if you're focusing on the heuristics each compiler uses to decide whether or not to inline a function, that's implementation dependent and you should look at each compiler's documentation. Keep in mind that the heuristics could also change depending on the optimization level.
I'm pretty sure most compilers decide based on the length of the function (when compiled) in bytes and how often it is used vs the optimization type (speed vs size).
I know only a couple of criteria:
If inline meets recursion, the inline request will be ignored.
switch/while/for in most cases cause the compiler to ignore inline.
It depends on the compiler. Here's (the first part of) what the GCC manual says:
-finline-limit=n
By default, GCC limits the size of functions that can be inlined.
This flag allows the control of this limit for functions that are
explicitly marked as inline (i.e., marked with the inline keyword
or defined within the class definition in c++). n is the size of
functions that can be inlined in number of pseudo instructions (not
counting parameter handling). The default value of n is 600.
Increasing this value can result in more inlined code at the cost
of compilation time and memory consumption. Decreasing usually
makes the compilation faster and less code will be inlined (which
presumably means slower programs). This option is particularly
useful for programs that use inlining heavily such as those based
on recursive templates with C++.
Inlining is actually controlled by a number of parameters, which
may be specified individually by using --param name=value. The
-finline-limit=n option sets some of these parameters as follows:
max-inline-insns-single is set to n/2.
max-inline-insns-auto is set to n/2.
min-inline-insns is set to 130 or n/4, whichever is smaller.
max-inline-insns-rtl is set to n.
See below for a documentation of the individual parameters
controlling inlining.
Note: pseudo instruction represents, in this particular context, an
abstract measurement of function's size. In no way, it represents
a count of assembly instructions and as such its exact meaning
might change from one release to another.
Does it insert it if you write "inline" at the beginning of the function?