Can I selectively (force) inline a function? - c++

In the book Clean Code (and a couple of others I have come across and read) it is suggested to keep the functions small and break them up if they become large. It also suggests that functions should do one thing and one thing only.
In Optimizing software in C++ Agner Fog states that he does not like the rule of breaking up a function just because it crosses a certain threshold of a number of lines. He states that this results in unnecessary jumps which degrade performance.
First off, I understand that it will not matter if the code I am working on is not in a tight loop and that the functions are heavy so that the time it takes to call them is dwarfed by the time the code in the function takes to execute. But let's assume that I am working with functions that are, most of the time, used by other objects/functions and are performing relatively trivial tasks. These functions follow the suggestions listed in the first paragraph (that is, perform one single function and are small/comprehensible). Then I start programming a performance critical function that utilizes these other functions in a tight loop and is essentially a frame function. Lastly, assume that in-lining them has a benefit for the performance critical function but no benefit whatsoever to any other function (yes, I have profiled this, albeit with a lot of copying and pasting which I want to avoid).
Immediately, one can say that tag the function inline and let the compiler choose. But what if I don't want all those functions to be in a `.inl file or exposed in the header? In my current situation, the performance critical functions and the other functions it uses are all in the same source file.
To sum it up, can I selectively (force) inline a function(s) for a single function so that the end code behaves like it is one big function instead of several calls to other functions.

There is nothing that prevents you to put inline in a static function in a .cpp file.
Some compilers have the option to force an inline function, see e.g. the GCC attribute((always_inline)) and a ton of options to fine tune the inlining optimizations (see -minline-* parameters).
My recommendation is to use inline or even better static inline wherever you see fit, and let the compiler decide. They usually do it pretty well.

You cannot force the inline. Also, function calls are pretty cheap on modern CPUs, compared to the cost of the work done. If your functions are large enough to need to be broken down, the additional time taken to do the call will be essentially nothing.
Failing that, you could ... try ... to use a macro.

No, inline is a recommendation to the compiler ; it does not force it to do anything. Also, if you're working with MSVC++, note that __forceinline is a misnomer as well ; it's just a stronger recommendation than inline.

This is as much about good old fashioned straight C as it is about C++. I was pondering this the other day, because in an embedded world, where both speed and space need to be carefully managed, this can really matter (as opposed to the all too oft "don't worry about it, your compiler is smart and memory is cheap prevalent in desktop/server development).
A possible solution that I have yet to vet is to basically use two names for the different variants, something like
inline int _max(int a, int b) {
return a > b ? a : b;
}
and then
int max(int a, int b) {
return _max(a, b);
}
This would give one the ability to selectively call either _max() or max() and yet still having the algorithm defined once-and-only-once.

Inlining – For example, if there exists a function A that frequently calls function B, and function B is relatively small, then profile-guided optimizations will inline function B in function A.
VS Profile-Guided Optimizations
You can use the automated Profile Guided Optimization for Visual C++ plug-in in the Performance and Diagnostics Hub to simplify and streamline the optimization process within Visual Studio, or you can perform the optimization steps manually in Visual Studio or on the command line. We recommend the plug-in because it is easier to use. For information on how to get the plug-in and use it to optimize your app, see Profile Guided Optimization Plug-In.

If you have a known-hot function an want the compiler inline more aggressively than usual the flatten attribute offered by gcc/clang might be something to look into. In contrast to the inline keyword and attributes it applies to inlining decisions regarding the functions called in the marked function.
__attribute__((flatten)) void hot_code() {
// functions called here will be inlined if possible
}
See https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html and https://clang.llvm.org/docs/AttributeReference.html#flatten for official documentation.

Compilers are actually really really good at generating optimized code.
I would suggest just organizing your code into logical groupings (using additional functions if that enhanced readability), marking them inline if appropriate, and letting the compiler decide what code to optimally generate.

Quite surprised this hasn't been mention yet but as of now you can tell the compiler (I believe it may only work with GCC/G++) to force inline a function and ignore a couple restrictions associated with it.
You can do so via __attribute__((always_inline)).
Example of it in use:
inline __attribute__((always_inline)) int pleaseInlineThis() {
return 5;
}
Normally you should avoid forcing an inline as the compiler knows what's best better than you; however there are several use cases such as in OS/MicroController development where you need to inline calls where if it is instead called, would break the functionality.
C++ compilers usually aren't very friendly to controlled environments such as those without some hacks.

As people mentioned, you should avoid doing that as the compiler usually makes better decisions. There are several optimizations that you can enable to improve performance. These will inline the functions if needed:
LTO: link-time optimization or interprocedural optimization
Profile guided optimization: optimizations based on a runtime profile
BOLT: Binary Optimization and Layout Tool
Polly: a high-level loop and data-locality optimizer

Related

Is link time optimization in gcc 5.1 good enough to give up on inlining simple functions?

Out of habit I often write function definitions inline for simple functions such as this (contrived example)
class PositiveInteger
{
private:
long long unsigned m_i;
public:
PositiveInteger (int i);
};
inline PositiveInteger :: PositiveInteger (int i)
: m_i (i)
{
if (i < 0)
throw "oops";
}
I generally like to separate interface files and implementation files but, nevertheless, this is my habit for those functions which the voice in my head tells me will probably be hit a lot in hot spots.
I know the advice is "profile first" and I agree but I could avoid a whole load of profiling effort if I knew a priori that the compiler would produce identical final object code whether functions like this were inlined at compilation or link time. (Also, I believe the injected profiling code itself can cause a change in timing which swamps the effect of very simple functions such as the one above.)
GCC 5.1 has just been released advertising LTO (link time optimization) improvements. How good are they really? What kinds of functions can I safely un-inline knowing the final executable will not be affected?
You already answered your own question: Unless you're targeting an embedded system of some sort with restricted resources, write the code for clarity and maintainability first. Then if performance isn't acceptable you can profile and target your efforts towards the actual hotspots. Think about it: If you write clearer code that takes an extra 250ns that's not noticeable in your use case then the extra time doesn't matter.
GCC with LTO does cross-module inlining, so most of the time you should not see difference in code quality. Offline function declarations are also not duplicated across translation units and compile faster/produce smaller object files.
GCC's inlining heuristics however consider "inline" keyword as a hint that function is probably good to be inlined and increase limits on function size. Similarly it will take bit of extra hint for functions declared in the same translation unit as called. For small functions like one in your example this should not however make any difference.

Does it makes sense to define an inline function which makes calls to other function(s)?

From what I know an inline function is an optimization to enhance performance, thus it should run as fast as a macro. Inline function's code should be as short as possible.
I wonder if it make sense to embed functions calls inside an inline function. If the answer is yes, in which context and what are the restrictions?
Actually, I am asking this question because I looked at a code of someone who is calling functions such as "socket()", "sendto()" and "memset()" from inline functions; something that overrides the purpose of an inline function in my opinion.
Note: In the code I have there is no use of any conditional calls to the functions, the inline function just passes arguments to the called functions.
I wonder if it make sense to embed functions calls inside an inline function.
Of course it does. Inlining a call to your function is still an optimisation, removing the cost of that function call and allowing further optimisations in the context of the calling function, whether or not it in turn calls other functions.
in which context and what are the restrictions?
I've no idea what you mean by "context"; but there are no restrictions on what you can do in an inline function. The only semantic effects of declaring a function inline are to allow multiple identical definitions of the, and require a definition in any translation unit that uses the function. In all other respects, it's the same as any other function definition.
Comment posted as answer, by request:
If the guy who wrote the code believed that inline had any meaningful impact on performance, he was manifestly NOT 'doing the right thing'.
Performance comes from correct algorithm selection and avoiding cache misses.
Headaches come from naive premature optimisation techniques that may have worked in 1991
I see no a priori reason why inline code could not contain function calls.
Argument passing aside, inlining inserts the lines of code as they stand, reducing call overhead and allowing local/ad-hoc optimizations.
For instance, inline void MyInline(bool Perform) { if (Perform) memset(); } could very well be skipped when invoked with MyInline(false).
Inlining could also allow inlining of the internal function calls, resulting in even more (micro)optimization opportunities.
The compiler will choose when to inline. And you should avoid attempting premature optimisation at the expense of exposing your implementation.
The compiler may be able to optimise away the forwarding of the functions you are calling. It might do that anyway with optimisation flags even if you do not use the inline keyword.
The time to use the inline keyword is when you want to make a header-only file to use in multiple projects without having to use a link-library. In reality this doesn't really mean "inline" at all, it means "one definition only" even across compilation units calling the function.
In any case you should look at this wiki question / answer:
Benefits of inline functions in C++?
It makes a perfect sense.
Consider a function that consists of two possible branches of execution — a fast path which is activated when certain condition holds (most of the time) and a slow path.
Inlining the whole thing would result in growing the size of the code for little benefit. The slow path complexity may prevent the compiler from inlining the function.
If you make the slow path into a separate function an interesting opportunity opens.
It makes it possible to inline the condition and the fast path while a slow path remains a function call. Inlining the fast path allows to avoid function call overhead most of the time. The slow path is already slow hence the call overhead is negligible.
First, in C++, an "inline" function (one declared in the header file or labeled as such) is just a suggestion to the compiler. The compiler itself will make the decision on whether or not to actually make it inline.
There are three reasons why to inline a function:
Pushing another round of variables onto the stack is expensive, as is branching to the new point in the program.
Sometimes we can do local optimizations with the intermediates of the function (though I wouldn't count on it!)
Put the definition of a function in the header file (work around).
Take the following example
void NonInlinable(int x);
inline void Inline() { NonInlinable(10);}
This makes a ton of sense to inline. I remove 1 function call, so if NonInlinable is pretty fast, then this could be a huge speedup. So regardless of whether or not I'm calling functions, I could still want to inline the call.
Now another example:
inline int Inline(int y) {return y/10;}
//main
...
int x = 5;
int y = Inline(5);
int z = x % 10;
The modulo and devise operations are usually calculated by the same instruction. A really nice compiler, can compute y and z in 1 assembly instruction! magic
So in my mind, a better question to ask is when should I not use inline functions:
When I want to separate definition from declaration (very good practice for readability, and premature optimization is the root of all evil).
When I want to hide my implementation/use good encapsulation.

Inline Function (When to insert)?

Inline functions are just a request to compilers that insert the complete body of the inline function in every place in the code where that function is used.
But how the compiler decides whether it should insert it or not? Which algorithm/mechanism it uses to decide?
Thanks,
Naveen
Some common aspects:
Compiler option (debug builds usually don't inline, and most compilers have options to override the inline declaration to try to inline all, or none)
suitable calling convention (e.g. varargs functions usually aren't inlined)
suitable for inlining: depends on size of the function, call frequency of the function, gains through inlining, and optimization settings (speed vs. code size). Often, tiny functions have the most benefits, but a huge function may be inlined if it is called just once
inline call depth and recursion settings
The 3rd is probably the core of your question, but that's really "compiler specific heuristics" - you need to check the compiler docs, but usually they won't give much guarantees. MSDN has some (limited) information for MSVC.
Beyond trivialities (e.g. simple getters and very primitive functions), inlining as such isn't very helpful anymore. The cost of the call instruction has gone down, and branch prediction has greatly improved.
The great opportunity for inlining is removing code paths that the compiler knows won't be taken - as an extreme example:
inline int Foo(bool refresh = false)
{
if (refresh)
{
// ...extensive code to update m_foo
}
return m_foo;
}
A good compiler would inline Foo(false), but not Foo(true).
With Link Time Code Generation, Foo could reside in a .cpp (without a inline declararion), and Foo(false) would still be inlined, so again inline has only marginal effects here.
To summarize: There are few scenarios where you should attempt to take manual control of inlining by placing (or omitting) inline statements.
The following is in the FAQ for the Sun Studio 11 compiler:
The compiler generates an inline function as an ordinary callable function (out of line) when any of the following is true:
You compile with +d.
You compile with -g.
The function's address is needed (as with a virtual function).
The function contains control structures the compiler can't generate inline.
The function is too complex.
According to the response to this post by 'clamage45' the "control structures that the compiler can't generate inline" are:
the function contains forbidden constructs, like loop, switch, or goto
Another list can be found here. As most other answers have specified the heuristics are going to be 100% compiler specific, from what I've read I think to ensure that a function is actually inlined you need to avoid:
local static variables
loop constructs
switch statements
try/catch
goto
recursion
and of course too complex (whatever that means)
All I know about inline functions (and a lot of other c++ stuff) is here.
Also, if you're focusing on the heuristics of each compiler to decide wether or not inlie a function, that's implementation dependant and you should look at each compiler's documentation. Keep in mind that the heuristic could also change depending on the level of optimitation.
I'm pretty sure most compilers decide based on the length of the function (when compiled) in bytes and how often it is used vs the optimization type (speed vs size).
I know only couple criteria:
If inline meets recursion - inline will be ignored.
switch/while/for in most cases cause compiler to ignore inline
It depends on the compiler. Here's (the first part of) what the GCC manual says:
-finline-limit=n
By default, GCC limits the size of functions that can be inlined.
This flag allows the control of this limit for functions that are
explicitly marked as inline (i.e., marked with the inline keyword
or defined within the class definition in c++). n is the size of
functions that can be inlined in number of pseudo instructions (not
counting parameter handling). The default value of n is 600.
Increasing this value can result in more inlined code at the cost
of compilation time and memory consumption. Decreasing usually
makes the compilation faster and less code will be inlined (which
presumably means slower programs). This option is particularly
useful for programs that use inlining heavily such as those based
on recursive templates with C++.
Inlining is actually controlled by a number of parameters, which
may be specified individually by using --param name=value. The
-finline-limit=n option sets some of these parameters as follows:
#item max-inline-insns-single
is set to I/2.
#item max-inline-insns-auto
is set to I/2.
#item min-inline-insns
is set to 130 or I/4, whichever is smaller.
#item max-inline-insns-rtl
is set to I.
See below for a documentation of the individual parameters
controlling inlining.
Note: pseudo instruction represents, in this particular context, an
abstract measurement of function's size. In no way, it represents
a count of assembly instructions and as such its exact meaning
might change from one release to an another.
it inserts if you write "inline" to beginning of the function?

When should I use __forceinline instead of inline?

Visual Studio includes support for __forceinline. The Microsoft Visual Studio 2005 documentation states:
The __forceinline keyword overrides
the cost/benefit analysis and relies
on the judgment of the programmer
instead.
This raises the question: When is the compiler's cost/benefit analysis wrong? And, how am I supposed to know that it's wrong?
In what scenario is it assumed that I know better than my compiler on this issue?
You know better than the compiler only when your profiling data tells you so.
The one place I am using it is licence verification.
One important factor to protect against easy* cracking is to verify being licenced in multiple places rather than only one, and you don't want these places to be the same function call.
*) Please don't turn this in a discussion that everything can be cracked - I know. Also, this alone does not help much.
The compiler is making its decisions based on static code analysis, whereas if you profile as don says, you are carrying out a dynamic analysis that can be much farther reaching. The number of calls to a specific piece of code is often largely determined by the context in which it is used, e.g. the data. Profiling a typical set of use cases will do this. Personally, I gather this information by enabling profiling on my automated regression tests. In addition to forcing inlines, I have unrolled loops and carried out other manual optimizations on the basis of such data, to good effect. It is also imperative to profile again afterwards, as sometimes your best efforts can actually lead to decreased performance. Again, automation makes this a lot less painful.
More often than not though, in my experience, tweaking alogorithms gives much better results than straight code optimization.
I've developed software for limited resource devices for 9 years or so and the only time I've ever seen the need to use __forceinline was in a tight loop where a camera driver needed to copy pixel data from a capture buffer to the device screen. There we could clearly see that the cost of a specific function call really hogged the overlay drawing performance.
The only way to be sure is to measure performance with and without. Unless you are writing highly performance critical code, this will usually be unnecessary.
For SIMD code.
SIMD code often uses constants/magic numbers. In a regular function, every const __m128 c = _mm_setr_ps(1,2,3,4); becomes a memory reference.
With __forceinline, compiler can load it once and reuse the value, unless your code exhausts registers (usually 16).
CPU caches are great but registers are still faster.
P.S. Just got 12% performance improvement by __forceinline alone.
The inline directive will be totally of no use when used for functions which are:
recursive,
long,
composed of loops,
If you want to force this decision using __forceinline
Actually, even with the __forceinline keyword. Visual C++ sometimes chooses not to inline the code. (Source: Resulting assembly source code.)
Always look at the resulting assembly code where speed is of importance (such as tight inner loops needed to be run on each frame).
Sometimes using #define instead of inline will do the trick. (of course you loose a lot of checking by using #define, so use it only when and where it really matters).
Actually, boost is loaded with it.
For example
BOOST_CONTAINER_FORCEINLINE flat_tree& operator=(BOOST_RV_REF(flat_tree) x)
BOOST_NOEXCEPT_IF( (allocator_traits_type::propagate_on_container_move_assignment::value ||
allocator_traits_type::is_always_equal::value) &&
boost::container::container_detail::is_nothrow_move_assignable<Compare>::value)
{ m_data = boost::move(x.m_data); return *this; }
BOOST_CONTAINER_FORCEINLINE const value_compare &priv_value_comp() const
{ return static_cast<const value_compare &>(this->m_data); }
BOOST_CONTAINER_FORCEINLINE value_compare &priv_value_comp()
{ return static_cast<value_compare &>(this->m_data); }
BOOST_CONTAINER_FORCEINLINE const key_compare &priv_key_comp() const
{ return this->priv_value_comp().get_comp(); }
BOOST_CONTAINER_FORCEINLINE key_compare &priv_key_comp()
{ return this->priv_value_comp().get_comp(); }
public:
// accessors:
BOOST_CONTAINER_FORCEINLINE Compare key_comp() const
{ return this->m_data.get_comp(); }
BOOST_CONTAINER_FORCEINLINE value_compare value_comp() const
{ return this->m_data; }
BOOST_CONTAINER_FORCEINLINE allocator_type get_allocator() const
{ return this->m_data.m_vect.get_allocator(); }
BOOST_CONTAINER_FORCEINLINE const stored_allocator_type &get_stored_allocator() const
{ return this->m_data.m_vect.get_stored_allocator(); }
BOOST_CONTAINER_FORCEINLINE stored_allocator_type &get_stored_allocator()
{ return this->m_data.m_vect.get_stored_allocator(); }
BOOST_CONTAINER_FORCEINLINE iterator begin()
{ return this->m_data.m_vect.begin(); }
BOOST_CONTAINER_FORCEINLINE const_iterator begin() const
{ return this->cbegin(); }
BOOST_CONTAINER_FORCEINLINE const_iterator cbegin() const
{ return this->m_data.m_vect.begin(); }
There are several situations where the compiler is not able to determine categorically whether it is appropriate or beneficial to inline a function. Inlining may involve trade-off's that the compiler is unwilling to make, but you are (e.g,, code bloat).
In general, modern compilers are actually pretty good at making this decision.
When you know that the function is going to be called in one place several times for a complicated calculation, then it is a good idea to use __forceinline. For instance, a matrix multiplication for animation may need to be called so many times that the calls to the function will start to be noticed by your profiler. As said by the others, the compiler can't really know about that, especially in a dynamic situation where the execution of the code is unknown at compile time.
wA Case For noinline
I wanted to pitch in with an unusual suggestion and actually vouch for __noinline in MSVC or the noinline attribute/pragma in GCC and ICC as an alternative to try out first over __forceinline and its equivalents when staring at profiler hotspots. YMMV but I've gotten so much more mileage (measured improvements) out of telling the compiler what to never inline than what to always inline. It also tends to be far less invasive and can produce much more predictable and understandable hotspots when profiling the changes.
While it might seem very counter-intuitive and somewhat backward to try to improve performance by telling the compiler what not to inline, I'd claim based on my experience that it's much more harmonious with how optimizing compilers work and far less invasive to their code generation. A detail to keep in mind that's easy to forget is this:
Inlining a callee can often result in the caller, or the caller of the caller, to cease to be inlined.
This is what makes force inlining a rather invasive change to the code generation that can have chaotic results on your profiling sessions. I've even had cases where force inlining a function reused in several places completely reshuffled all top ten hotspots with the highest self-samples all over the place in very confusing ways. Sometimes it got to the point where I felt like I'm fighting with the optimizer making one thing faster here only to exchange a slowdown elsewhere in an equally common use case, especially in tricky cases for optimizers like bytecode interpretation. I've found noinline approaches so much easier to use successfully to eradicate a hotspot without exchanging one for another elsewhere.
It would be possible to inline functions much less invasively if we
could inline at the call site instead of determining whether or not
every single call to a function should be inlined. Unfortunately, I've
not found many compilers supporting such a feature besides ICC. It
makes much more sense to me if we are reacting to a hotspot to respond
by inlining at the call site instead of making every single call of a
particular function forcefully inlined. Lacking this wide support
among most compilers, I've gotten far more successful results with
noinline.
Optimizing With noinline
So the idea of optimizing with noinline is still with the same goal in mind: to help the optimizer inline our most critical functions. The difference is that instead of trying to tell the compiler what they are by forcefully inlining them, we are doing the opposite and telling the compiler what functions definitely aren't part of the critical execution path by forcefully preventing them from being inlined. We are focusing on identifying the rare-case non-critical paths while leaving the compiler still free to determine what to inline in the critical paths.
Say you have a loop that executes for a million iterations, and there is a function called baz which is only very rarely called in that loop once every few thousand iterations on average in response to very unusual user inputs even though it only has 5 lines of code and no complex expressions. You've already profiled this code and the profiler shows in the disassembly that calling a function foo which then calls baz has the largest number of samples with lots of samples distributed around calling instructions. The natural temptation might be to force inline foo. I would suggest instead to try marking baz as noinline and time the results. I've managed to make certain critical loops execute 3 times faster this way.
Analyzing the resulting assembly, the speedup came from the foo function now being inlined as a result of no longer inlining baz calls into its body.
I've often found in cases like these that marking the analogical baz with noinline produces even bigger improvements than force inlining foo. I'm not a computer architecture wizard to understand precisely why but glancing at the disassembly and the distribution of samples in the profiler in such cases, the result of force inlining foo was that the compiler was still inlining the rarely-executed baz on top of foo, making foo more bloated than necessary by still inlining rare-case function calls. By simply marking baz with noinline, we allow foo to be inlined when it wasn't before without actually also inlining baz. Why the extra code resulting from inlining baz as well slowed down the overall function is still not something I understand precisely; in my experience, jump instructions to more distant paths of code always seemed to take more time than closer jumps, but I'm at a loss as to why (maybe something to do with the jump instructions taking more time with larger operands or something to do with the instruction cache). What I can definitely say for sure is that favoring noinline in such cases offered superior performance to force inlining and also didn't have such disruptive results on the subsequent profiling sessions.
So anyway, I'd suggest to give noinline a try instead and reach for it first before force inlining.
Human vs. Optimizer
In what scenario is it assumed that I know better than my compiler on
this issue?
I'd refrain from being so bold as to assume. At least I'm not good enough to do that. If anything, I've learned over the years the humbling fact that my assumptions are often wrong once I check and measure things I try with the profiler. I have gotten past the stage (over a couple of decades of making my profiler my best friend) to avoid completely blind stabs at the dark only to face humbling defeat and revert my changes, but at my best, I'm still making, at most, educated guesses. Still, I've always known better than my compiler, and hopefully, most of us programmers have always known this better than our compilers, how our product is supposed to be designed and how it is is going to most likely be used by our customers. That at least gives us some edge in the understanding of common-case and rare-case branches of code that compilers don't possess (at least without PGO and I've never gotten the best results with PGO). Compilers don't possess this type of runtime information and foresight of common-case user inputs. It is when I combine this user-end knowledge, and with a profiler in hand, that I've found the biggest improvements nudging the optimizer here and there in teaching it things like what to inline or, more commonly in my case, what to never inline.

What is wrong with using inline functions?

While it would be very convenient to use inline functions at some situations,
Are there any drawbacks with inline functions?
Conclusion:
Apparently, There is nothing wrong with using inline functions.
But it is worth noting the following points!
Overuse of inlining can actually make programs slower. Depending on a function's size, inlining it can cause the code size to increase or decrease. Inlining a very small accessor function will usually decrease code size while inlining a very large function can dramatically increase code size. On modern processors smaller code usually runs faster due to better use of the instruction cache. - Google Guidelines
The speed benefits of inline functions tend to diminish as the function grows in size. At some point the overhead of the function call becomes small compared to the execution of the function body, and the benefit is lost - Source
There are few situations where an inline function may not work:
For a function returning values; if a return statement exists.
For a function not returning any values; if a loop, switch or goto statement exists.
If a function is recursive. -Source
The __inline keyword causes a function to be inlined only if you specify the optimize option. If optimize is specified, whether or not __inline is honored depends on the setting of the inline optimizer option. By default, the inline option is in effect whenever the optimizer is run. If you specify optimize , you must also specify the noinline option if you want the __inline keyword to be ignored. -Source
It worth pointing out that the inline keyword is actually just a hint to the compiler. The compiler may ignore the inline and simply generate code for the function someplace.
The main drawback to inline functions is that it can increase the size of your executable (depending on the number of instantiations). This can be a problem on some platforms (eg. embedded systems), especially if the function itself is recursive.
I'd also recommend making inline'd functions very small - The speed benefits of inline functions tend to diminish as the function grows in size. At some point the overhead of the function call becomes small compared to the execution of the function body, and the benefit is lost.
It could increase the size of the
executable, and I don't think
compilers will always actually make
them inline even though you used the
inline keyword. (Or is it the other
way around, like what Vaibhav
said?...)
I think it's usually OK if the
function has only 1 or 2 statements.
Edit: Here's what the linux CodingStyle document says about it:
Chapter 15: The inline disease
There appears to be a common
misperception that gcc has a magic
"make me faster" speedup option called
"inline". While the use of inlines can
be appropriate (for example as a means
of replacing macros, see Chapter 12),
it very often is not. Abundant use of
the inline keyword leads to a much
bigger kernel, which in turn slows the
system as a whole down, due to a
bigger icache footprint for the CPU
and simply because there is less
memory available for the pagecache.
Just think about it; a pagecache miss
causes a disk seek, which easily takes
5 miliseconds. There are a LOT of cpu
cycles that can go into these 5
miliseconds.
A reasonable rule of thumb is to not
put inline at functions that have more
than 3 lines of code in them. An
exception to this rule are the cases
where a parameter is known to be a
compiletime constant, and as a result
of this constantness you know the
compiler will be able to optimize most
of your function away at compile time.
For a good example of this later case,
see the kmalloc() inline function.
Often people argue that adding inline
to functions that are static and used
only once is always a win since there
is no space tradeoff. While this is
technically correct, gcc is capable of
inlining these automatically without
help, and the maintenance issue of
removing the inline when a second user
appears outweighs the potential value
of the hint that tells gcc to do
something it would have done anyway.
There is a problem with inline - once you defined a function in a header file (which implies inline, either explicit or implicit by defining a body of a member function inside class) there is no simple way to change it without forcing your users to recompile (as opposed to relink). Often this causes problems, especially if the function in question is defined in a library and header is part of its interface.
I agree with the other posts:
inline may be superfluous because the compiler will do it
inline may bloat your code
A third point is it may force you to expose implementation details in your headers, .e.g.,
class OtherObject;
class Object {
public:
void someFunc(OtherObject& otherObj) {
otherObj.doIt(); // Yikes requires OtherObj declaration!
}
};
Without the inline a forward declaration of OtherObject was all you needed. With the inline your
header needs the definition for OtherObject.
As others have mentioned, the inline keyword is only a hint to the compiler. In actual fact, most modern compilers will completely ignore this hint. The compiler has its own heuristics to decide whether to inline a function, and quite frankly doesn't want your advice, thank you very much.
If you really, really want to make something inline, if you've actually profiled it and looked at the disassembly to ensure that overriding the compiler heuristic actually makes sense, then it is possible:
In VC++, use the __forceinline keyword
In GCC, use __attribute__((always_inline))
The inline keyword does have a second, valid purpose however - declaring functions in header files but not inside a class definition. The inline keyword is needed to tell the compiler not to generate multiple definitions of the function.
I doubt it. Even the compiler automatically inlines some functions for optimization.
I don't know if my answer's related to the question but:
Be very careful about inline virtual methods! Some buggy compilers (previous versions of Visual C++ for example) would generate inline code for virtual methods where the standard behaviour was to do nothing but go down the inheritance tree and call the appropriate method.
You should also note that the inline keyword is only a request. The compiler may choose not to inline it, likewise the compiler may choose to make a function inline that you did not define as inline if it thinks the speed/size tradeoff is worth it.
This decision is generaly made based on a number of things, such as the setting between optimise for speed(avoids the function call) and optimise for size (inlining can cause code bloat, so isn't great for large repeatedly used functions).
with the VC++ compiler you can overide this decision by using __forceinline
SO in general:
Use inline if you really want to have a function in a header, but elsewhere theres little point because if your going to gain anything from it, a good compiler will be making it inline for you anyway.
Inlining larger functions can make the program larger, resulting in more instruction cache misses and making it slower.
Deciding when a function is small enough that inlining will increase performance is quite tricky. Google's C++ Style Guide recommends only inlining functions of 10 lines or less.
(Simplified) Example:
Imagine a simple program that just calls function "X" 5 times.
If X is small and all calls are inlined: Potentially all instructions will be prefetched into the instruction cache with a single main memory access - great!
If X is large, let's say approaching the capacity of the instruction cache:
Inlining X will potentially result in fetching instructions from memory once for each inline instance of X.
If X isn't inlined, instructions may be fetched from memory on the first call to X, but could potentially remain in the cache for subsequent calls.
Excessive inlining of functions can increase size of compiled executable which can have negative impact on cache performance, but nowadays compiler decide about function inlining on their own (depending on many criterias) and ignore inline keyword.
Among other issues with inline functions, which I've seen heavily overused (I've seen inline functions of 500 lines), what you have to be aware of are:
build instability
Changing the source of an inline function causes all the users of the header to recompile
#includes leak into the client. This can be very nasty if you rework an inlined function and remove a no-longer used header which some client has relied on.
executable size
Every time an inline is inlined instead of a call instruction the compiler has to generate the whole code of the inline. This is OK if the code of the function is short (one or two lines), not so good if the function is long
Some functions can produce a lot more code than at first appears. I case in point is a 'trivial' destructor of a class that has a lot of non-pod member variables (or two or 3 member variables with rather messy destructors). A call has to be generated for each destructor.
execution time
this is very dependent on your CPU cache and shared libraries, but locality of reference is important. If the code you might be inlining happens to be held in cpu cache in one place, a number of clients can find the code an not suffer from a cache miss and the subsequent memory fetch (and worse, should it happen, a disk fetch). Sadly this is one of those cases where you really need to do performance analysis.
The coding standard where I work limit inline functions to simple setters/getters, and specifically say destructors should not be inline, unless you have performance measurements to show the inlining confers a noticeable advantage.
In addition to other great answers, at least once I saw a case where forced inlining actually slowed down the affected code by 1.5x. There was a nested loop inside (pretty small one) and when this function was compiled as a separate unit, compiler managed to efficiently unroll and optimize it. But when same function was inlined into much bigger outer function, compiler (MSVC 2017) failed to optimize this loop.
As other people said that inline function can create a problem if the the code is large.As each instruction is stored in a specific memory location ,so overloading of inline function make a code to take more time to get exicuted.
there are few other situations where inline may not work
does not work in case of recursive function.
It may also not work with static variable.
it also not work in case there is use of a loop,switch etc.or we can say that with multiple statements.
And the function main cannot work as inline function.