not used const static variable in class optimized out? - c++

Can a reasonably decent compiler discard this const static variable
class A{
const static int a = 3;
};
if it is nowhere used in the compiled binary, or does it show up in the binary anyway?

Short answer: Maybe. The standard does not say the compiler HAS to keep the constants (or strings, or functions, or anything else), if it's never used.
Long answer: It very much depends on the circumstances. If the compiler can clearly determine that it is not used, it will remove unused constants. If it can't make that determination, it cannot remove the unused constant, since the constant COULD be used by something that isn't currently known to the compiler (e.g. another source file).
For example, if class A is in an unnamed namespace, the compiler knows this class cannot be used from another source file, and if the constant isn't used in this file, then it's not used anywhere. (A class declared inside a function would be even more local, but local classes cannot have static data members.) If the class is in a "global" space, such that it could be used somewhere else, then the compiler will need to keep the constant.
This gets even more interesting with "whole program optimization" or "link time optimization" (both from now on called LTO), where all the code is actually optimized as one large lump, and of course, "used" or "not used" can be determined for all possible uses.
As you can imagine, the result will also depend on how clever the compiler (and linker for the LTO) is. All compilers should follow the principle of "if in doubt keep it".
You can of course experiment, and write some code where you use the variable, and then remove the use, and see what difference it makes to the assembly code (e.g. g++ -S x.cpp or clang++ -S x.cpp, and look at the resulting x.s file).
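A minimal sketch of such an experiment, assuming a file named x.cpp. The function name use_a is made up for illustration. Compile once as shown, then delete use_a and compile again, and compare the two x.s files.

```cpp
// x.cpp (hypothetical file name): build with  g++ -O2 -S x.cpp  and inspect x.s
class A {
public:
    const static int a = 3;   // in-class initializer, no out-of-class definition
};

// Hypothetical helper that reads the constant by value.
// An optimizing compiler is expected to fold this to "return 6"
// without emitting any storage for A::a.
int use_a() {
    return A::a * 2;
}
```

Reading the value like this does not require a definition of A::a, so the program links even though none is provided.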

When optimizations are disabled, the answer is compiler-dependent. But when optimizations are enabled, the end result is much the same across mainstream compilers. Let's assume that optimizations are enabled.
The compiler will not emit a definition for a const static field in the generated object file when both of the following conditions hold:
It can resolve all uses of the field with the constant value it was initialized to.
There is at most one source code file that has used the field (there is an exception which I'll discuss at the end).
I'll discuss the second condition later. But now let's focus on the first. Let's see an example. Suppose that the target platform is 32-bit and that we have defined the following type:
// In MyClassA.h
class MyClassA{
public:
const static int MyClassAField;
};
// In MyClassA.cpp (or in MyClassA.h if it was included in at most one cpp file)
const int MyClassA::MyClassAField = 2;
Most compilers consider int to be a 32-bit signed integer. Therefore, on a 32-bit processor, most instructions can handle a 32-bit constant. In this case, the compiler will be able to replace any uses of MyClassAField with the constant 2 and that field will not exist in the object file.
On the other hand, if the field was of type double, on a 32-bit platform, instructions cannot handle 64-bit immediate values. In this case, most compilers emit the field in the object file and use SSE instructions and registers to load the 64-bit value from memory and process it.
Now I'll explain the second condition. If there is more than one source code file that is using the field, it cannot be eliminated (irrespective of whether Whole Program Optimization (WPO) is enabled or not) because some object file has to include the definition of the field so that the linker can use it for other object files. However, the linker, if you specified the right switch, can eliminate the field from the generated binary.
This is a linker optimization enabled with /OPT:REF for VC++ and --gc-sections for gcc (passed through the compiler driver as -Wl,--gc-sections). For icc, the names of the switches are the same (/OPT:REF on Windows and --gc-sections on Linux and OS X). However, the compiler has to emit every function and static or global field in a separate section in the object file so that the linker can eliminate it (for gcc, this is what -ffunction-sections and -fdata-sections do).
There is a catch, however. If the field has been defined inline as follows:
class MyClassA{
public:
const static int MyClassAField = 2;
};
Then the compiler itself will eliminate the definition of this field from every object file that uses it. That's because every source code file that uses it includes its own copy of the initialized declaration. Since each of them is compiled separately, the compiler itself will optimize the field away using an optimization called constant propagation. In fact, the VC++ compiler performs this optimization even if optimizations are disabled.
When optimizations are disabled, whether a const static field will be eliminated or not depends on the compiler, but probably it will not be eliminated.
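As a rough sketch of the difference between a use that constant propagation can resolve and one that needs the field to exist in memory: the helper names by_value and by_address below are hypothetical, and the out-of-class definition is added only so that taking the address links.

```cpp
class MyClassA {
public:
    const static int MyClassAField = 2;   // initialized inside the class
};

// Out-of-class definition (no initializer here). It is only needed
// because by_address() below takes the field's address.
const int MyClassA::MyClassAField;

// The value is known at compile time, so the compiler can replace the
// read with the constant 2 and needs no storage for this use.
int by_value() {
    return MyClassA::MyClassAField + 1;
}

// Taking the address cannot be resolved by substituting the constant,
// so the field has to be present in some object file.
const int* by_address() {
    return &MyClassA::MyClassAField;
}
```

If by_address is removed, the remaining use satisfies the first condition above and the compiler is free to drop the field.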

Related

c++ function inlining decided at compile time

It is often said a function can be inlined only if it is known at the compile time how it will be called, or something to that effect. (pls correct/clarify if I am wrong)
So say if I have a function like this
void Calling()
{
if (m_InputState == State::Fast) //m_inputstate is a class member and is set by user
{
CallFastFunction();
}
else if (m_InputState == State::Slow)
{
CallSlowFunction();
}
}
As m_InputState is set by the end user, can we say this variable is not known at compile time, hence Calling() can't be inlined?
Inlining has nothing to do with the function input being known at compile time.
The compiler can (and almost certainly WILL) inline your Calling function. Depending on the availability of the source of CallFastFunction and CallSlowFunction, these may also be inlined.
If the compiler can determine the value of m_InputState, it will remove the if - but only if it's definitive that the value is one value.
For example, thing.m_InputState = State::Slow; thing.Calling(); will only compile in the "slow" call, without any conditional code, whereas std::cin >> thing.m_InputState; thing.Calling(); will of course not.
If, through profile-based optimisation, the compiler knows how often each of the cases is chosen, it may also select which path ends up "next" in the order of the code, such that the most likely comes first (and it may also provide a prefix or other indication to the processor that "you're likely going this way").
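The two cases can be sketched as follows. The Thing struct, the return values, and the functions known_state and unknown_state are assumptions made up for illustration; only Calling(), m_InputState and the State enumerators come from the question.

```cpp
enum class State { Fast, Slow };

struct Thing {
    State m_InputState = State::Fast;

    int CallFastFunction() const { return 1; }
    int CallSlowFunction() const { return 2; }

    // Same shape as the Calling() in the question, returning a value
    // so the chosen path is observable.
    int Calling() const {
        if (m_InputState == State::Fast) {
            return CallFastFunction();
        } else if (m_InputState == State::Slow) {
            return CallSlowFunction();
        }
        return 0;
    }
};

// The state is a compile-time-visible constant here: after inlining,
// an optimizer can drop the branch and keep only the "slow" path.
int known_state() {
    Thing thing;
    thing.m_InputState = State::Slow;
    return thing.Calling();
}

// The state arrives at run time: Calling() can still be inlined,
// but the comparison has to stay.
int unknown_state(State s) {
    Thing thing;
    thing.m_InputState = s;
    return thing.Calling();
}
```

In both functions the call itself is a candidate for inlining; only the removal of the if depends on the value being known.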
But inlining itself happens based on:
The code itself being available.
The size and number of calls to the function.
Arbitrary compiler decisions that we can't know (unless we're very familiar with the source-code of the compiler).
Modern compilers also support "link time optimisation", where the object files produced are only "half-done" and the link step produces the final code. It can move code around and inline across the entire code that makes up the executable (all the code built with link-time optimisation), just as the old-fashioned compiler does within one file, so functions that are neither in the same source file nor in a header can still be inlined.
The fact that m_InputState is not known does not stop Calling() from being inlined.
Typically, if you want a function to be inlinable, its definition has to be visible in the same .cpp file that it is called in (a bare function prototype is not enough). This depends a bit on the compiler though.

Why C++ compiler isn't optimizing unused reference variables?

Consider following program:
#include <iostream>
struct Test
{
int& ref1;
int& ref2;
int& ref3;
};
int main()
{
std::cout<<sizeof(Test)<<'\n';
}
I know that a C++ compiler can optimize reference variables away entirely so that they take no space in memory at all.
I tested the demo program above to see the output.
But when I compile and run it with g++ 4.8.1, it gives me the output 12.
It looks like the compiler isn't optimizing the reference variables. I was expecting the size of the Test struct to be 1.
I've also used the -Os command line option, but it still gives me the output 12. I have also tried this program on MSVS 2010 compiled with the /Ox command line option, but it looks like the Microsoft compiler isn't performing any optimization either.
The three reference variables are unused and aren't associated with any other variable. So why aren't compilers optimizing them away?
The size of the struct stays the same, there is nothing to optimize. If you would like to create an array of Test it should allocate the right size for each Test. The compiler cannot know which will be used or not. That's why there is no such optimization.
Unused variables would be, for example, a local int& inside your main function. If it is unused, the optimizer will optimize it away.
Theoretically, if the world consisted only of simple programs, the compiler could optimize the sizeof of this struct to 1, because the sizeof of a struct is unspecified.
But in our real world, we have separate compilation of shared libraries that the compiler when compiling your code has no clue about (for example you could LoadLibrary or dlopen) that also happen to define your struct and where the sizeof should better agree with that in your program.
So actually a compiler had better not optimize the sizeof to 1 :)
In 8.3.2/4 of the C++ standard, it is said
It is unspecified whether or not a reference requires storage
So, the standard leaves it open to the implementation how references should be implemented. This implies that the size of your struct can be non-zero.
If the compiler would remove the references from the struct, you would not be able to link code compiled with different compiler settings. Imagine you compile one translation unit with optimizations, the other one without and link them together and pass an object from one TU to the other. Which size should the code assume? A function in TU 1 allocates 12 bytes on the stack, while a function in TU 2 allocates some other space.
The compiler can optimize your program and e.g. remove temporary objects, assignments etc. It may be that you create an object of your struct somewhere in your source code and use it, but it will not be seen in the assembler code because it is not needed. What compilers also frequently do is remove indirections, e.g. by replacing references with direct access.
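A small sketch of the distinction drawn above, under the assumption of a mainstream ABI where a reference member is stored like a pointer (the standard does not promise this). The function names test_size and local_ref_sum are hypothetical.

```cpp
#include <cstddef>

struct Test {
    int& ref1;
    int& ref2;
    int& ref3;
};

// The layout of Test is part of the ABI, so its size is fixed no matter
// whether any particular program reads the members.
std::size_t test_size() {
    return sizeof(Test);
}

// A reference that lives only inside one function is a different matter:
// the optimizer can treat r as another name for x and emit no storage
// or indirection for it.
int local_ref_sum(int x) {
    int& r = x;
    return r + 1;
}
```

On the 32-bit targets in the question, test_size() gives the 12 that was observed; on a typical 64-bit target it gives 24.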

Does the Intel C++ compiler optimize out functions that are never called in the code?

Just some optimization considerations:
Does anyone know for sure whether the Intel C++ compiler (such as ICC 13.0, and of course, with optimization options like /O3 enabled) will automatically optimize out any unused/uncalled structs/classes/functions/variables in the code, like examplefunc() below:
//...defining examplefunc()....//
const int a=0;
if (a>0)
int b=examplefunc();
The compiler does not usually optimize out unused functions unless they are static and, therefore, only accessible within a specific module. However, the linker might dead strip the function if linking is done at the function level and not the module level.
You can check the assembly output, linker map, or use something like objdump to check if the function was included in the linked binary.
I don't think the question is correctly stated. The question literally asks whether the compiler will optimize away a function that is not used, but that is something only the linker can do.
So what can the compiler do? The compiler can optimize away dead code, so for example in your code above, and because a is known to be 0, the compiler can remove the if statement altogether. For most uses, that is good enough (whether a function makes it to the executable or not won't affect performance much, whether a branch is avoided or not will affect the performance of the function --in particular with branch mispredictions).
Additionally, if the compiler optimizes away the branch above, there will be one less reference to the examplefunc function in the program, and when the linker processes the generated object files, if there is no reference to a function in the whole program, it can drop the symbol altogether. Note that this can only be done when linking a program; for libraries, even if the function is not called now, a program linked with the library at a later time might use it.
So getting back to the original question, the compiler will optimize away the branch, and the linker might or not remove the function from the binary, but that should not matter as much.
Regarding the other constructs, for struct and class, the only thing that makes it to the binary are the member functions, and the same thing applies there: if you are linking the program and none of the functions is used, the linker can drop the symbols.
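The fragment from the question can be turned into a compilable sketch like the one below. The body of examplefunc, the wrapper function run, and the initial value of b are assumptions made for illustration. Declaring examplefunc static gives it internal linkage, which is the case where the compiler itself (not only the linker) may drop the function once the dead branch is gone.

```cpp
// Internal linkage: nothing outside this file can call it.
static int examplefunc() {
    return 42;
}

// Hypothetical wrapper around the fragment from the question.
int run() {
    const int a = 0;
    int b = -1;
    if (a > 0) {            // always false: a is a known constant
        b = examplefunc();  // dead code, a candidate for removal
    }
    return b;
}
```

Whether examplefunc actually disappears can be checked in the assembly output or with objdump, as suggested above.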

Will C++ linker automatically inline functions (without "inline" keyword, without implementation in header)?

Will the C++ linker automatically inline "pass-through" functions, which are NOT defined in the header, and NOT explicitly requested to be "inlined" through the inline keyword?
For example, the following happens so often, and should always benefit from "inlining", that it seems every compiler vendor should have "automatically" handled it through "inlining" through the linker (in those cases where it is possible):
//FILE: MyA.hpp
class MyA
{
public:
int foo(void) const;
};
//FILE: MyB.hpp
class MyB
{
private:
MyA my_a_;
public:
int foo(void) const;
};
//FILE: MyB.cpp
// PLEASE SAY THIS FUNCTION IS "INLINED" BY THE LINKER, EVEN THOUGH
// IT WAS NOT IMPLICITLY/EXPLICITLY REQUESTED TO BE "INLINED"?
int MyB::foo(void) const
{
return my_a_.foo();
}
I'm aware the MSVS linker will perform some "inlining" through its Link Time Code Generation (LTCG), and that the GCC toolchain also supports Link Time Optimization (LTO) (see: Can the linker inline functions?).
Further, I'm aware that there are cases where this cannot be "inlined", such as when the implementation is not "available" to the linker (e.g., across shared library boundaries, where separate linking occurs).
However, if this code is linked into a single executable that does not cross DLL/shared-lib boundaries, I'd expect the compiler/linker vendor to automatically inline the function, as a simple-and-obvious optimization (benefiting both performance and size)?
Are my hopes too naive?
Here's a quick test of your example (with a MyA::foo() implementation that simply returns 42). All these tests were with 32-bit targets - it's possible that different results might be seen with 64-bit targets. It's also worth noting that using the -flto option (GCC) or the /GL option (MSVC) results in full optimization - wherever MyB::foo() is called, it's simply replaced with 42.
With GCC (MinGW 4.5.1):
gcc -g -O3 -o test.exe myb.cpp mya.cpp test.cpp
the call to MyB::foo() was not optimized away. MyB::foo() itself was slightly optimized to:
Dump of assembler code for function MyB::foo() const:
0x00401350 <+0>: push %ebp
0x00401351 <+1>: mov %esp,%ebp
0x00401353 <+3>: sub $0x8,%esp
=> 0x00401356 <+6>: leave
0x00401357 <+7>: jmp 0x401360 <MyA::foo() const>
That is, the entry prologue is left in place, but immediately undone (the leave instruction) and the code jumps to MyA::foo() to do the real work. However, this is an optimization that the compiler (not the linker) is doing since it realizes that MyB::foo() is simply returning whatever MyA::foo() returns. I'm not sure why the prologue is left in.
MSVC 16 (from VS 2010) handled things a little differently:
MyB::foo() ended up as two jumps - one to a 'thunk' of some sort:
0:000> u myb!MyB::foo
myb!MyB::foo:
001a1030 e9d0ffffff jmp myb!ILT+0(?fooMyAQBEHXZ) (001a1005)
And the thunk simply jumped to MyA::foo():
myb!ILT+0(?fooMyAQBEHXZ):
001a1005 e936000000 jmp myb!MyA::foo (001a1040)
Again - this was largely (entirely?) performed by the compiler, since if you look at the object code produced before linking, MyB::foo() is compiled to a plain jump to MyA::foo().
So to boil all this down - it looks like without explicitly invoking LTO/LTCG, linkers today are unwilling/unable to perform the optimization of removing the call to MyB::foo() altogether, even if MyB::foo() is a simple jump to MyA::foo().
So I guess if you want link time optimization, use the -flto (for GCC) or /GL (for the MSVC compiler) and /LTCG (for the MSVC linker) options.
Is it common ? Yes, for mainstream compilers.
Is it automatic ? Generally not. MSVC requires the /GL switch, gcc and clang the -flto flag.
How does it work ? (gcc only)
The traditional linker used in the gcc toolchain is ld, and it's kind of dumb. Therefore, and it might be surprising, link-time optimization is not performed by the linker in the gcc toolchain.
Gcc has a specific intermediate representation on which the optimizations are performed that is language agnostic: GIMPLE. When compiling a source file with -flto (which activates the LTO), it saves the intermediate representation in a specific section of the object file.
When invoking the linker driver (note: NOT the linker directly) with -flto, the driver will read those specific sections, bundle them together into a big chunk, and feed this bundle to the compiler. The compiler reapplies the optimizations as it usually does for a regular compilation (constant propagation, inlining, and this may expose new opportunities for dead code elimination, loop transformations, etc...) and produces a single big object file.
This big object file is finally fed to the regular linker of the toolchain (probably ld, unless you're experimenting with gold), which performs its linker magic.
Clang works similarly, and I surmise that MSVC uses a similar trick.
It depends. Most compilers (linkers, really) support this kind of optimizations. But in order for it to be done, the entire code-generation phase pretty much has to be deferred to link-time. MSVC calls the option link-time code generation (LTCG), and it is by default enabled in release builds, IIRC.
GCC has a similar option, under a different name, but I can't remember which -O levels, if any, enables it, or if it has to be enabled explicitly.
However, "traditionally", C++ compilers have compiled a single translation unit in isolation, after which the linker has merely tied up the loose ends, ensuring that when translation unit A calls a function defined in translation unit B, the correct function address is looked up and inserted into the calling code.
If you follow this model, then it is impossible to inline functions defined in another translation unit.
It is not just some "simple" optimization that can be done "on the fly", like, say, loop unrolling. It requires the linker and compiler to cooperate, because the linker will have to take over some of the work normally done by the compiler.
Note that the compiler will gladly inline functions that are not marked with the inline keyword. But only if it is aware of how the function is defined at the site where it is called. If it can't see the definition, then it can't inline the call. That is why you normally define such small trivial "intended-to-be-inlined" functions in headers, making their definitions visible to all callers.
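A sketch of the header-defined variant of the MyA/MyB example, assuming (as in the quick test above) that MyA::foo() simply returns 42. The free function call_through is hypothetical and stands in for any caller in another source file that includes these class definitions.

```cpp
// Both definitions are written inside the class bodies, as they would be
// in MyA.hpp and MyB.hpp, so every caller that includes them sees the code.
class MyA {
public:
    int foo() const { return 42; }
};

class MyB {
private:
    MyA my_a_;
public:
    int foo() const { return my_a_.foo(); }   // pass-through
};

// With the definitions visible, an optimizing compiler can collapse
// this call chain without any help from the linker.
int call_through(const MyB& b) {
    return b.foo();
}
```

The trade-off is that any change to these function bodies now forces every file that includes the headers to be recompiled.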
Inlining is not a linker function.
The toolchains that support whole program optimization (cross-TU inlining) do so by not actually compiling anything, just parsing and storing an intermediate representation of the code, at compile time. And then the linker invokes the compiler, which does the actual inlining.
This is not done by default, you have to request it explicitly with appropriate command-line options to the compiler and linker.
One reason it is not and should not be default, is that it increases dependency-based rebuild times dramatically (sometimes by several orders of magnitude, depending on code organization).
Yes, any decent compiler is fully capable of inlining that function if you have the proper optimisation flags set and the compiler deems it a performance bonus.
If you really want to know, add a breakpoint before your function is called, compile your program, and look at the assembly. It will be very clear if you do that.
The compiler must be able to see the body of the function for there to be a chance of inlining. The chance of this happening can be increased through the use of unity builds and LTCG.
The inline keyword only acts as guidance for the compiler to inline functions when optimizing. In g++, the optimization levels -O2 and -O3 enable different levels of inlining. The g++ documentation specifies the following: (i) If -O2 is specified, -finline-small-functions is turned on. (ii) If -O3 is specified, -finline-functions is turned on along with all options for -O2. (iii) There is one more relevant option, -fno-default-inline, which makes member functions inline only if the inline keyword is added.
Typically, the size of the function (number of instructions in the assembly) and whether recursive calls are used determine whether inlining happens. There are plenty more options for g++ described at the link below:
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Please take a look and see which ones you are using, because ultimately the options you use determine whether your function is inlined.
Here is my understanding of what the compiler will do with functions:
If the function definition is inside the class definition, and assuming no scenarios which prevent "inline-ing" the function, such as recursion, exist, the function will be "inline-d".
If the function definition is outside the class definition, the function will not be "inline-d" unless the function definition explicitly includes the inline keyword.
Here is an excerpt from Ivor Horton's Beginning Visual C++ 2010:
Inline Functions
With an inline function, the compiler tries to expand the code in the body of the function in place of a call to the function. This avoids much of the overhead of calling the function and, therefore, speeds up your code.
The compiler may not always be able to insert the code for a function inline (such as with recursive functions or functions for which you have obtained an address), but generally, it will work. It's best used for very short, simple functions, such as our Volume() in the CBox class, because such functions execute faster and inserting the body code does not significantly increase the size of the executable module.
With function definitions outside of the class definition, the compiler treats the functions as a normal function, and a call of the function will work in the usual way; however, it's also possible to tell the compiler that, if possible, you would like the function to be considered as inline. This is done by simply placing the keyword inline at the beginning of the function header. So, for this function, the definition would be as follows:
inline double CBox::Volume()
{
return l * w * h;
}

Inline Function (When to insert)?

Inline functions are just a request to the compiler to insert the complete body of the inline function at every place in the code where that function is used.
But how does the compiler decide whether it should insert it or not? Which algorithm/mechanism does it use to decide?
Thanks,
Naveen
Some common aspects:
Compiler option (debug builds usually don't inline, and most compilers have options to override the inline declaration to try to inline all, or none)
suitable calling convention (e.g. varargs functions usually aren't inlined)
suitable for inlining: depends on size of the function, call frequency of the function, gains through inlining, and optimization settings (speed vs. code size). Often, tiny functions have the most benefits, but a huge function may be inlined if it is called just once
inline call depth and recursion settings
The 3rd is probably the core of your question, but that's really "compiler specific heuristics" - you need to check the compiler docs, but usually they won't give many guarantees. MSDN has some (limited) information for MSVC.
Beyond trivialities (e.g. simple getters and very primitive functions), inlining as such isn't very helpful anymore. The cost of the call instruction has gone down, and branch prediction has greatly improved.
The great opportunity for inlining is removing code paths that the compiler knows won't be taken - as an extreme example:
inline int Foo(bool refresh = false)
{
if (refresh)
{
// ...extensive code to update m_foo
}
return m_foo;
}
A good compiler would inline Foo(false), but not Foo(true).
With Link Time Code Generation, Foo could reside in a .cpp (without an inline declaration), and Foo(false) would still be inlined, so again inline has only marginal effects here.
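A self-contained version of the Foo example might look like this. The Cache struct, the refresh_count member, the stand-in update logic, and the two caller functions are assumptions added so the snippet compiles on its own.

```cpp
struct Cache {
    int m_foo = 0;
    int refresh_count = 0;

    inline int Foo(bool refresh = false) {
        if (refresh) {
            // Stand-in for the "extensive code to update m_foo".
            ++refresh_count;
            m_foo = refresh_count * 10;
        }
        return m_foo;
    }
};

// The argument is the constant false: once Foo is inlined here, the whole
// refresh branch is dead and only the read of m_foo remains.
int read_only(Cache& c) {
    return c.Foo();
}

// The argument is the constant true: the branch body is needed, so
// inlining would copy the expensive part into the caller.
int read_fresh(Cache& c) {
    return c.Foo(true);
}
```

Comparing the assembly of read_only and read_fresh at -O2 is one way to see which call sites a given compiler chooses to inline.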
To summarize: There are few scenarios where you should attempt to take manual control of inlining by placing (or omitting) inline statements.
The following is in the FAQ for the Sun Studio 11 compiler:
The compiler generates an inline function as an ordinary callable function (out of line) when any of the following is true:
You compile with +d.
You compile with -g.
The function's address is needed (as with a virtual function).
The function contains control structures the compiler can't generate inline.
The function is too complex.
According to the response to this post by 'clamage45' the "control structures that the compiler can't generate inline" are:
the function contains forbidden constructs, like loop, switch, or goto
Another list can be found here. As most other answers have said, the heuristics are going to be 100% compiler specific. From what I've read, I think that to ensure a function is actually inlined you need to avoid:
local static variables
loop constructs
switch statements
try/catch
goto
recursion
and of course too complex (whatever that means)
All I know about inline functions (and a lot of other c++ stuff) is here.
Also, if you're focusing on the heuristics each compiler uses to decide whether or not to inline a function, that's implementation dependent and you should look at each compiler's documentation. Keep in mind that the heuristics could also change depending on the level of optimization.
I'm pretty sure most compilers decide based on the length of the function (when compiled) in bytes and how often it is used vs the optimization type (speed vs size).
I know only a couple of criteria:
If inline meets recursion - inline will be ignored.
switch/while/for in most cases cause compiler to ignore inline
It depends on the compiler. Here's (the first part of) what the GCC manual says:
-finline-limit=n
By default, GCC limits the size of functions that can be inlined.
This flag allows the control of this limit for functions that are
explicitly marked as inline (i.e., marked with the inline keyword
or defined within the class definition in c++). n is the size of
functions that can be inlined in number of pseudo instructions (not
counting parameter handling). The default value of n is 600.
Increasing this value can result in more inlined code at the cost
of compilation time and memory consumption. Decreasing usually
makes the compilation faster and less code will be inlined (which
presumably means slower programs). This option is particularly
useful for programs that use inlining heavily such as those based
on recursive templates with C++.
Inlining is actually controlled by a number of parameters, which
may be specified individually by using --param name=value. The
-finline-limit=n option sets some of these parameters as follows:
max-inline-insns-single
is set to n/2.
max-inline-insns-auto
is set to n/2.
min-inline-insns
is set to 130 or n/4, whichever is smaller.
max-inline-insns-rtl
is set to n.
See below for a documentation of the individual parameters
controlling inlining.
Note: pseudo instruction represents, in this particular context, an
abstract measurement of function's size. In no way, it represents
a count of assembly instructions and as such its exact meaning
might change from one release to an another.
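The arithmetic in the excerpt can be written out as below, taking the value in the excerpt's formulas to be the n passed to -finline-limit (an assumption about the quoted text). The function names simply mirror the parameter names; they are not a GCC API, and newer GCC releases document different parameters and defaults.

```cpp
// Derived --param values for a given -finline-limit=n,
// following only the formulas in the quoted manual excerpt.
constexpr int max_inline_insns_single(int n) { return n / 2; }
constexpr int max_inline_insns_auto(int n)   { return n / 2; }

// "130 or n/4, whichever is smaller"
constexpr int min_inline_insns(int n) {
    return (n / 4 < 130) ? n / 4 : 130;
}

constexpr int max_inline_insns_rtl(int n)    { return n; }
```

For the quoted default of n = 600 this gives 300, 300, 130 and 600 respectively; with n = 400 the third value drops to 100.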
Does it insert the body if you write "inline" at the beginning of the function?