In my code (either C or C++; let's say it's C++) I have a one-liner inline function foo() which gets called from many places in the code. I'm using a profiling tool which gathers statistics per location in the object code and translates them into per-source-line statistics using the source-line debug information (which we get with -g in clang or GCC). Thus the profiler can't distinguish between calls to foo() from different places.
I would like the stats to be counted separately for the different places where foo() gets called. For this to happen, I need the compiler to "fully" inline foo(), including forgetting about it when it comes to the source location information.
Now, I know I can achieve this by using a macro: that way, there is no function, and the code is just pasted where I use it. But that won't work for operators, for example; and it may be a problem with templates. So, can I tell the compiler to do what I described?
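A minimal sketch of the trade-off described in the question; `FOO`, `foo`, `Vec2` and `twice_plus_one` are hypothetical names chosen for illustration. The macro is expanded textually at each use, so GCC and Clang attribute its instructions to the line where it is invoked, whereas an inlined function's instructions normally keep the function's own source line. Operators and templates, however, can only be written as functions.

```cpp
// Hypothetical names throughout; this only illustrates the question's trade-off.

// Macro: pasted at each use, so the debug line info of every expansion
// points at the line where FOO(...) is written.
#define FOO(x) ((x) * 2 + 1)

// Inline function: even when inlined, its instructions are normally
// attributed to this line, not to the individual call sites.
inline int foo(int x) { return x * 2 + 1; }

struct Vec2 {
    int x;
    int y;
};

// An overloaded operator has to be a function; there is no way to make
// `a + b` expand a macro.
inline Vec2 operator+(Vec2 a, Vec2 b) { return Vec2{a.x + b.x, a.y + b.y}; }

// A template gets type deduction and overload resolution, which a macro
// cannot participate in.
template <typename T>
inline T twice_plus_one(T v) { return v * 2 + 1; }
```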
Notes:
Compiler-specific answers are relevant; I'm mainly interested in GCC and clang.
I'm not compiling a debug build, i.e. optimizations are turned on.
I know that an inline function is either expanded where it is called or behaves as a normal function.
But how can I know whether an inline function is actually expanded at the place where it is called, given that the decision to treat it as inline is made at compile time?
Programmatically, at run time, you cannot.
And the truth of the matter is: you don't need to know.
A compiler can choose to inline functions that are not marked inline, or ignore functions explicitly marked inline; it is completely up to the wish (read: wisdom) of the compiler, and you should trust the compiler to do its job judiciously. Most of the mainstream compilers will do their job nicely.
If your question is purely from an academic point of view, then there are a couple of options available:
Analyze the generated assembly code:
You can check the assembly code to see whether the function body is inlined at the point of call.
How to generate the assembly code?
For gcc:
Use the -S switch when compiling.
For example:
g++ -S FileName.cpp
The generated assembly code is created as file FileName.s.
For MSVC:
Use the /FA switch from the command line.
In the generated assembly code, look for a call instruction to the particular function.
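As a sketch of that workflow, assuming GCC and a hypothetical file named square.cpp: compile with `g++ -O2 -S square.cpp` and inspect the body of `compute` in square.s. With optimization on, GCC will typically fold `square` into `compute`, leaving no `call` to it; at -O0 the call is normally kept.

```cpp
// square.cpp (hypothetical file name)
// Generate assembly with:   g++ -O2 -S square.cpp    -> square.s
// Then look for a "call" to square's mangled name (_Z6squarei on
// Itanium-ABI platforms such as Linux) inside compute's body.

inline int square(int x) { return x * x; }

int compute(int v) {
    // With optimization on, this call is typically replaced by the
    // multiplication itself; with -O0 a real call instruction is emitted.
    return square(v) + 1;
}
```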
Use compiler-specific warnings and diagnostics:
Some compilers will emit a warning if they fail to honor an inline function request.
For example, in gcc, the -Winline option will emit a warning if the compiler does not inline a function that was declared inline.
Check the GCC documentation for more detail:
-Winline
Warn if a function that is declared as inline cannot be inlined. Even with this option, the compiler does not warn about failures to inline functions declared in system headers.
The compiler uses a variety of heuristics to determine whether or not to inline a function. For example, the compiler takes into account the size of the function being inlined and the amount of inlining that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by -Winline to appear or disappear.
Check the generated code. If the function is expanded, you'll see its body, as opposed to a call or similar instruction.
You can use tools for listing symbols from object files, such as nm on Linux. If the function was inlined at every call site and no out-of-line copy was needed, it will not be listed in the nm output: it became part of some other function. Also, you may not be able to set a breakpoint on this function by name in a debugger.
If you need to make sure that a function is inlined, and it's OK to use a proprietary MS VC++ extension, check out the __forceinline declarator. The compiler will either inline the function or, if it falls into the list of documented special cases, you will get a warning, so you will know the inlining status.
Not endorsing it in any way.
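If the same code also has to build with GCC or Clang, a common pattern is a wrapper macro; `FORCE_INLINE` and `clamp_to_byte` are hypothetical names. The GCC/Clang counterpart of `__forceinline` is the `always_inline` function attribute, applied on top of `inline`. Note that GCC reports an error, not just a warning, when it cannot honor `always_inline`.

```cpp
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // fallback: an ordinary inline hint
#endif

// Hypothetical example function using the macro.
FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```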
With gdb, if you cannot call a function, one possible reason is that the function was inlined. Conversely, if you can call a function from gdb, an out-of-line copy of it exists (although individual call sites may still have been inlined).
The decision whether or not to inline a function is made by the compiler. And since it is made by the compiler, YES, it can only be made at compile time.
So, if you look at the assembly code (gcc -S produces it), you can see whether your function has been inlined or not.
There is a way to determine if a function is inline programmatically, without looking at the assembly code. This answer is taken from here.
Say you want to check whether a specific call is inlined. You would go about it like this. The compiler inlines functions, but for functions that are exported (and almost all functions are exported) it needs to maintain a non-inlined, addressable copy of the function's code that can be called from the outside world.
To check whether your function my_function is inlined, you need to compare the my_function function pointer (which points to the non-inlined copy) to the current value of the PC. Here is how I did it in my environment (GCC 7, x86_64):
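#include <stdio.h>

// noinline: get_pc always gets its own frame, so its return address is
// the PC right after the call site in whatever code called it
void * __attribute__((noinline)) get_pc() { return __builtin_return_address(0); }

void my_function() {
    void* pc = get_pc();
    asm volatile("" : : : "memory");  // compiler barrier
    printf("Function pointer = %p, current pc = %p\n", (void*)&my_function, pc);
}

int main() {
    my_function();
    return 0;
}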
If the function is not inlined, the difference between the current value of the PC and the value of the function pointer should be small; otherwise it will be larger. On my system, when my_function is not inlined I get the following output:
Function pointer = 0x55fc17902500, pc = 0x55fc1790257b
If the function is inlined, I get:
Function pointer = 0x55ddcffc6560, pc = 0x55ddcffc4c6a
For the non-inlined version difference is 0x7b and for the inlined version difference is 0x181f.
Look at the size of the object files; it differs between the inlined and non-inlined versions.
Use nm "obj_file" | grep "fun_name"; the output also differs.
Use gcc -Winline -O1.
Compare with the assembly code.
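A sketch for the nm check, assuming GCC or Clang on Linux and a hypothetical file helpers.cpp. After `g++ -O2 -c helpers.cpp`, `nm -C helpers.o` will typically list `api_entry` but not `helper`: the static function was inlined into its only caller and no out-of-line copy was kept. At -O0 both symbols normally appear.

```cpp
// helpers.cpp (hypothetical file name)
// Build:  g++ -O2 -c helpers.cpp && nm -C helpers.o

// Internal linkage: if every call is inlined, the compiler has no reason
// to keep an out-of-line copy, so the symbol usually vanishes.
static int helper(int x) { return x * 3; }

// External linkage: always emitted, because other object files may call it.
int api_entry(int x) { return helper(x) + 1; }
```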
The answers above are very useful; I am just adding some points to keep in mind while writing inline functions.
Remember, inlining is only a request to the compiler, not a command. The compiler can ignore the request for inlining. The compiler may not perform inlining in circumstances such as:
1) If a function contains a loop. (for, while, do-while)
2) If a function contains static variables.
3) If a function is recursive.
4) If a function's return type is other than void and the return statement doesn't exist in the function body.
5) If a function contains a switch or goto statement.
Complete info: https://www.geeksforgeeks.org/inline-functions-cpp/
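Two of the cases above, sketched with hypothetical functions. For recursion, the compiler cannot expand every call, since each copy of the body contains another call (it may still expand a few levels or turn the recursion into a loop). For a function-local static, the language guarantees that an inline function with external linkage has a single instance of that variable, so any inlining the compiler does must preserve that sharing.

```cpp
// Recursive: full expansion would never terminate, so at least one real
// call (or an equivalent loop) has to remain.
inline unsigned long factorial(unsigned n) {
    return n <= 1 ? 1UL : n * factorial(n - 1);
}

// Function-local static: every call site, inlined or not and in any
// translation unit, shares the same counter.
inline int next_id() {
    static int counter = 0;
    return ++counter;
}
```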
The compiler does not make a function inline if the function returns an address.
Will the C++ linker automatically inline "pass-through" functions, which are NOT defined in the header, and NOT explicitly requested to be "inlined" through the inline keyword?
For example, the following happens so often, and should always benefit from inlining, that it seems every compiler vendor should have handled it automatically by "inlining" through the linker (in those cases where it is possible):
//FILE: MyA.hpp
#pragma once
class MyA
{
public:
    int foo(void) const;
};
//FILE: MyB.hpp
#pragma once
#include "MyA.hpp"
class MyB
{
private:
    MyA my_a_;
public:
    int foo(void) const;
};
//FILE: MyB.cpp
#include "MyB.hpp"
// PLEASE SAY THIS FUNCTION IS "INLINED" BY THE LINKER, EVEN THOUGH
// IT WAS NOT IMPLICITLY/EXPLICITLY REQUESTED TO BE "INLINED"?
int MyB::foo(void) const
{
    return my_a_.foo();
}
I'm aware the MSVS linker will perform some "inlining" through its Link Time Code Generation (LTCG), and that the GCC toolchain also supports Link Time Optimization (LTO) (see: Can the linker inline functions?).
Further, I'm aware that there are cases where this cannot be "inlined", such as when the implementation is not "available" to the linker (e.g., across shared library boundaries, where separate linking occurs).
However, if this code is linked into a single executable that does not cross DLL/shared-library boundaries, I'd expect the compiler/linker vendor to inline the function automatically, as a simple and obvious optimization (benefiting both performance and size).
Are my hopes too naive?
Here's a quick test of your example (with a MyA::foo() implementation that simply returns 42). All these tests were with 32-bit targets - it's possible that different results might be seen with 64-bit targets. It's also worth noting that using the -flto option (GCC) or the /GL option (MSVC) results in full optimization - wherever MyB::foo() is called, it's simply replaced with 42.
With GCC (MinGW 4.5.1):
gcc -g -O3 -o test.exe myb.cpp mya.cpp test.cpp
the call to MyB::foo() was not optimized away. MyB::foo() itself was slightly optimized to:
Dump of assembler code for function MyB::foo() const:
0x00401350 <+0>: push %ebp
0x00401351 <+1>: mov %esp,%ebp
0x00401353 <+3>: sub $0x8,%esp
=> 0x00401356 <+6>: leave
0x00401357 <+7>: jmp 0x401360 <MyA::foo() const>
That is, the entry prologue is left in place but immediately undone (the leave instruction), and the code jumps to MyA::foo() to do the real work. However, this is an optimization that the compiler (not the linker) is doing, since it realizes that MyB::foo() simply returns whatever MyA::foo() returns. I'm not sure why the prologue is left in.
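The pass-through pattern can be reproduced in one file with GCC or Clang; `callee` and `pass_through` are hypothetical, and the `noinline` attribute stands in for a definition that lives in another translation unit. Compiled with `g++ -O2 -S` on x86-64, `pass_through` typically becomes a single `jmp` to `callee` (a tail call) rather than a `call` followed by `ret`, which matches the jump seen in the dump above.

```cpp
// noinline keeps callee out of line, roughly mimicking a function defined
// in another translation unit (GCC/Clang-specific attribute).
__attribute__((noinline)) int callee(int x) { return x + 42; }

// Only forwards its argument. With optimization on x86-64, GCC and Clang
// usually emit "jmp callee" here instead of "call callee; ret".
int pass_through(int x) { return callee(x); }
```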
MSVC 16 (from VS 2010) handled things a little differently:
MyB::foo() ended up as two jumps - one to a 'thunk' of some sort:
0:000> u myb!MyB::foo
myb!MyB::foo:
001a1030 e9d0ffffff jmp myb!ILT+0(?fooMyAQBEHXZ) (001a1005)
And the thunk simply jumped to MyA::foo():
myb!ILT+0(?fooMyAQBEHXZ):
001a1005 e936000000 jmp myb!MyA::foo (001a1040)
Again - this was largely (entirely?) performed by the compiler, since if you look at the object code produced before linking, MyB::foo() is compiled to a plain jump to MyA::foo().
So to boil all this down - it looks like without explicitly invoking LTO/LTCG, linkers today are unwilling/unable to perform the optimization of removing the call to MyB::foo() altogether, even if MyB::foo() is a simple jump to MyA::foo().
So I guess if you want link time optimization, use the -flto (for GCC) or /GL (for the MSVC compiler) and /LTCG (for the MSVC linker) options.
Is it common? Yes, for mainstream compilers.
Is it automatic? Generally not. MSVC requires the /GL switch; gcc and clang require the -flto flag.
How does it work? (gcc only)
The traditional linker used in the gcc toolchain is ld, and it's kind of dumb. Therefore, perhaps surprisingly, link-time optimization is not performed by the linker in the gcc toolchain.
Gcc performs its optimizations on a language-agnostic intermediate representation called GIMPLE. When compiling a source file with -flto (which activates LTO), it saves this intermediate representation in a specific section of the object file.
When invoking the linker driver (note: NOT the linker directly) with -flto, the driver will read those specific sections, bundle them together into a big chunk, and feed this bundle to the compiler. The compiler reapplies the optimizations as it usually does for a regular compilation (constant propagation, inlining, and this may expose new opportunities for dead code elimination, loop transformations, etc...) and produces a single big object file.
This big object file is finally fed to the regular linker of the toolchain (probably ld, unless you're experimenting with gold), which performs its linker magic.
Clang works similarly, and I surmise that MSVC uses a similar trick.
It depends. Most compilers (linkers, really) support this kind of optimization. But in order for it to be done, the entire code-generation phase pretty much has to be deferred to link time. MSVC calls the option link-time code generation (LTCG), and IIRC it is enabled by default in release builds.
GCC has a similar option under a different name, but I can't remember which -O levels, if any, enable it, or whether it has to be enabled explicitly.
However, "traditionally", C++ compilers have compiled a single translation unit in isolation, after which the linker has merely tied up the loose ends, ensuring that when translation unit A calls a function defined in translation unit B, the correct function address is looked up and inserted into the calling code.
If you follow this model, then it is impossible to inline functions defined in another translation unit.
It is not just some "simple" optimization that can be done "on the fly", like, say, loop unrolling. It requires the linker and compiler to cooperate, because the linker will have to take over some of the work normally done by the compiler.
Note that the compiler will gladly inline functions that are not marked with the inline keyword. But only if it is aware of how the function is defined at the site where it is called. If it can't see the definition, then it can't inline the call. That is why you normally define such small trivial "intended-to-be-inlined" functions in headers, making their definitions visible to all callers.
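Applied to the MyA/MyB example from the question, a sketch of that approach (shown in one block for brevity; the file split in the comments is an assumption). Defining `MyA::foo()` inside the class body makes it implicitly inline and visible to every caller that includes the header, so `MyB::foo()` and its callers can be inlined without LTO. The body returning 42 mirrors the test implementation mentioned earlier and is otherwise arbitrary.

```cpp
// MyA.hpp: the definition is in the class body, so it is implicitly
// inline and every translation unit that includes this header can see it.
class MyA {
public:
    int foo() const { return 42; }  // placeholder body
};

// MyB.hpp: same idea one level up.
class MyB {
    MyA my_a_;
public:
    int foo() const { return my_a_.foo(); }  // inlinable at each call site
};
```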
Inlining is not a linker function.
The toolchains that support whole program optimization (cross-TU inlining) do so by not actually compiling anything, just parsing and storing an intermediate representation of the code, at compile time. And then the linker invokes the compiler, which does the actual inlining.
This is not done by default, you have to request it explicitly with appropriate command-line options to the compiler and linker.
One reason it is not and should not be default, is that it increases dependency-based rebuild times dramatically (sometimes by several orders of magnitude, depending on code organization).
Yes, any decent compiler is fully capable of inlining that function if you have the proper optimisation flags set and the compiler deems it a performance bonus.
If you really want to know, add a breakpoint before your function is called, compile your program, and look at the assembly. It will be very clear if you do that.
The compiler must be able to see the function's definition for a call to have any chance of being inlined. You can increase the chance of this happening by using unity files and LTCG.
The inline keyword only acts as guidance for the compiler to inline functions when optimizing. In g++, the optimization levels -O2 and -O3 enable different levels of inlining. The g++ documentation specifies the following: (i) -O2 turns on -finline-small-functions. (ii) -O3 turns on -finline-functions, along with all the options enabled by -O2 (since GCC 10, -finline-functions is also enabled at -O2). (iii) There is one more relevant option, -fno-default-inline, which makes member functions inline only if the inline keyword is added.
Typically, the size of the function (number of instructions in the assembly) and whether it is recursive determine whether inlining happens. There are plenty more options for g++ described at the link below:
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Please take a look and see which ones you are using, because ultimately the options you use determine whether your function is inlined.
Here is my understanding of what the compiler will do with functions:
If the function definition is inside the class definition, and assuming no scenarios which prevent "inline-ing" the function, such as recursion, exist, the function will be "inline-d".
If the function definition is outside the class definition, the function will not be "inline-d" unless the function definition explicitly includes the inline keyword.
Here is an excerpt from Ivor Horton's Beginning Visual C++ 2010:
Inline Functions
With an inline function, the compiler tries to expand the code in the body of the function in place of a call to the function. This avoids much of the overhead of calling the function and, therefore, speeds up your code.
The compiler may not always be able to insert the code for a function inline (such as with recursive functions or functions for which you have obtained an address), but generally, it will work. It's best used for very short, simple functions, such as our Volume() in the CBox class, because such functions execute faster and inserting the body code does not significantly increase the size of the executable module.
With function definitions outside of the class definition, the compiler treats the functions as a normal function, and a call of the function will work in the usual way; however, it's also possible to tell the compiler that, if possible, you would like the function to be considered as inline. This is done by simply placing the keyword inline at the beginning of the function header. So, for this function, the definition would be as follows:
inline double CBox::Volume()
{
return l * w * h;
}
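A compilable version of the excerpt's example. The member names `l`, `w` and `h` come from the excerpt; the constructor, the `Area()` member and the `const` qualifiers are assumptions added for illustration. `Area()` is defined inside the class and is therefore implicitly inline, while `Volume()` is defined outside and needs the `inline` keyword if the definition lives in a header included by several source files.

```cpp
class CBox {
public:
    CBox(double lv, double wv, double hv) : l(lv), w(wv), h(hv) {}

    // Defined inside the class: implicitly inline.
    double Area() const { return l * w; }

    // Declared here, defined below.
    double Volume() const;

private:
    double l;
    double w;
    double h;
};

// Defined outside the class: without "inline", putting this in a header
// that several .cpp files include would violate the one-definition rule.
inline double CBox::Volume() const {
    return l * w * h;
}
```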
Enabling -Winline on my project produces a whole lot of output which I don't really understand. Does anyone know how to use this output to figure out why my particular function wasn't inlined?
Well, according to my gcc man page...
The compiler uses a variety of heuristics to determine whether or not to inline a function. For example, the compiler takes into account the size of the function being inlined and the amount of inlining that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by -Winline to appear or disappear.
I don't believe that you can force the compiler to inline your function; it's an implementation detail that could even change when the compiler is updated. Besides, as long as the compiler's choice causes your function to run faster, is there any particular reason that you care whether the function is actually inlined or not?
Of course, if you really want to inline your function for some reason, you could probably just use a macro to do so.
According to MSDN, Visual C++ can emit warning C4711 (function X selected for inline expansion) if the compiler decides to inline a function that was not marked inline.
I don't see how this warning can be useful. Suppose I compile my code and see this warning. Now what? Why would I care?
It isn't on by default. You can turn it on if for some reason you'd like to know when functions are inlined. This can be relevant if, say, code size is at a severe premium, or you were expecting to jump into the function from outside the module, or you need the assembly to look a certain way. It can help track down code generation bugs as well.
It's purely informational.
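If you do want the message, a sketch of enabling it from source; `scale` and `use_scale` are hypothetical functions. Microsoft's documentation describes `#pragma warning(default : n)` as also turning on a warning that is off by default, and the `/w14711` command-line option as an alternative. Other compilers skip the guarded block entirely.

```cpp
#if defined(_MSC_VER)
// Turn on C4711 ("selected for automatic inline expansion"),
// which MSVC leaves off by default.
#pragma warning(default : 4711)
#endif

// Not marked inline; with optimization MSVC may still inline calls to it,
// and with C4711 enabled it reports that decision.
static int scale(int x) { return x * 10; }

int use_scale(int y) { return scale(y) + scale(y + 1); }
```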