Inline functions mechanism - C++

I know that an inline function does not push its parameters on the stack; instead, the function body is substituted wherever the function is called.
Consider these two functions:
inline void add(int a) {
    a++;
} // does nothing, a won't be changed

inline void add(int &a) {
    a++;
} // changes the value of a
If the stack is not used for sending the parameters, how does the compiler know whether a variable will be modified? What does the code look like after the calls to these two functions are replaced?

What makes you think there is a stack? And even if there is, what makes you think it would be used for passing parameters?
You have to understand that there are two levels of reasoning:
the language level: where the semantics of what should happen are defined
the machine level: where said semantics, encoded into CPU instructions, are carried out
At the language level, if you pass a parameter by non-const reference, it might be modified by the function. The language level knows not what this mysterious "stack" is. Note: the inline keyword has little to no effect on whether a function call is inlined; it just says that the definition is in-line.
At machine level... there are many ways to achieve this. When making a function call, you have to obey a calling convention. This convention defines how the function parameters (and return types) are exchanged between caller and callee and who among them is responsible for saving/restoring the CPU registers. In general, because it is so low-level, this convention changes on a per CPU family basis.
For example, on x86, a couple parameters will be passed directly in CPU registers (if they fit) whilst remaining parameters (if any) will be passed on the stack.

I checked what GCC, at least, does with this if you force it to inline the functions:
inline static void add1(int a) __attribute__((always_inline));
void add1(int a) {
    a++;
} // does nothing, a won't be changed

inline static void add2(int &a) __attribute__((always_inline));
void add2(int &a) {
    a++;
} // changes the value of a

int main() {
label1:
    int b = 0;
    add1(b);
label2:
    int a = 0;
    add2(a);
    return 0;
}
The assembly output for this looks like:
.file "test.cpp"
.text
.globl main
.type main, #function
main:
.LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
.L2:
movl $0, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, -8(%ebp)
addl $1, -8(%ebp)
.L3:
movl $0, -12(%ebp)
movl -12(%ebp), %eax
addl $1, %eax
movl %eax, -12(%ebp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE2:
Interestingly, even the first call, add1(b), which has no effect outside of the function call, isn't optimized out.

If the stack is not used for sending the parameters, how does the compiler know whether a variable will be modified?
As Matthieu M. already pointed out, the language construct itself knows nothing about the stack. You specify the inline keyword on a function just to give the compiler a hint and express a wish that you would prefer this routine to be inlined. Whether this happens depends entirely on the compiler.
The compiler tries to predict what the advantages of inlining might be in the particular circumstances. If it decides that inlining would make the code slower, or unacceptably larger, it will not inline. It may also be simply unable to, because of a syntactic dependency: for example, other code using the function's address for callbacks, or the function being exported externally, as in a dynamic/static library.
What does the code look like after the calls to these two functions are replaced?
At the moment, none of these functions is inlined when compiled with
g++ -finline-functions -S main.cpp
as you can see in the disassembly of main. Given this source:
void add1(int a) {
    a++;
}
void add2(int &a) {
    a++;
}
inline void add3(int a) {
    a++;
} // does nothing, a won't be changed
inline void add4(int &a) {
    a++;
} // changes the value of a
inline int f() { return 43; }

int main(int argc, char** argv) {
    int a = 31;
    add1(a);
    add2(a);
    add3(a);
    add4(a);
    return 0;
}
we see a call to each routine being made:
main:
.LFB8:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $31, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add1i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add2Ri // function call
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add3i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add4Ri // function call
movl $0, %eax
leave
ret
.cfi_endproc
Compiling with -O1 removes all of these functions from the program entirely, because they do nothing.
However addition of
__attribute__((always_inline))
allows us to see what happens when code is inlined:
void add1(int a) {
    a++;
}
void add2(int &a) {
    a++;
}
inline static void add3(int a) __attribute__((always_inline));
inline void add3(int a) {
    a++;
} // does nothing, a won't be changed

inline static void add4(int& a) __attribute__((always_inline));
inline void add4(int &a) {
    a++;
} // changes the value of a

int main(int argc, char** argv) {
    int a = 31;
    add1(a);
    add2(a);
    add3(a);
    add4(a);
    return 0;
}
Now g++ -finline-functions -S main.cpp results in:
main:
.LFB9:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $31, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add1i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add2Ri // function call
movl -4(%rbp), %eax
movl %eax, -8(%rbp)
addl $1, -8(%rbp) // addition is here, there is no call
movl -4(%rbp), %eax
addl $1, %eax // addition is here, no call again
movl %eax, -4(%rbp)
movl $0, %eax
leave
ret
.cfi_endproc

The inline keyword has two key effects. One effect is that it is a hint to the implementation that "inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism." This usage is a hint, not a mandate, because "an implementation is not required to perform this inline substitution at the point of call".
The other principal effect is how it modifies the one definition rule. Per the ODR, a program must contain exactly one definition of any given non-inline function that is odr-used in the program. That doesn't quite work with an inline function because "An inline function shall be defined in every translation unit in which it is odr-used ...". Use the same inline function in one hundred different translation units and the linker will be confronted with one hundred definitions of the function. This isn't a problem because those multiple implementations of the same function "... shall have exactly the same definition in every case." One way to look at this: There still is only one definition; it just looks like there are a whole bunch to the linker.
Note: All quoted material are from section 7.1.2 of the C++11 standard.

Related

Does const ref lvalue to non-const func return value specifically reduce copies?

I have encountered a C++ habit that I have tried to research in order to understand its impact and validate its usage. But I can't seem to find the exact answer.
std::vector< Thing > getThings();

void use() { // renamed from 'do', which is a reserved keyword
    const std::vector< Thing > &things = getThings();
}
Here we have a function that returns a std::vector<Thing> by value (not by reference). The habit I am seeing is binding the returned value to a const& lvalue reference. The proposed reasoning for this habit is that it avoids a copy.
Now I have been researching RVO (Return Value Optimization), copy elision, and C++11 move semantics. I realize that a given compiler could choose to prevent a copy via RVO regardless of the use of const& here. But does the usage of a const& lvalue here have any kind of effect on non-const& return values in terms of preventing copies? And I am specifically asking about pre-C++11 compilers, before move semantics.
My assumption is that either the compiler implements RVO or it does not, and that saying the lvalue should be const& doesn't hint or force a copy-free situation.
Edit
I am specifically asking about whether const& usage here reduces a copy, and not about the lifetime of the temporary object, as described in "the most important const"
Further clarification of question
Is this:
const std::vector< Thing > &things = getThings();
any different than this:
std::vector< Thing > things = getThings();
in terms of reducing copies? Or does it not have any influence on whether the compiler can reduce copies, such as via RVO?
Semantically, the compiler needs an accessible copy-constructor, at the call site, even if later on, the compiler elides the call to the copy-constructor — that optimization is done later in the compilation phase after the semantic-analysis phase.
After reading your comments, I think I understand your question better. Now let me answer it in detail.
Imagine that the function has this return statement:
return items;
Semantically speaking, the compiler needs an accessible copy-constructor (or move-constructor) here, and that copy can be elided. However, just for the sake of argument, assume that it makes a copy here and stores it in __temp_items, which I express as:
__temp_items <= return items; //first copy:
Now at the call site, assume that you have not used const &, so it becomes this:
std::vector<Thing> things = __temp_items; //second copy
Now as you can see yourself, there are two copies. Compilers are allowed to elide both of them.
However, your actual code uses const &, so it becomes this:
const std::vector<Thing> & things = __temp_items; //no copy anymore.
Now, semantically there is only one copy, which can still be elided by the compiler. As for the second copy, I won't say const& "prevented" it in the sense that the compiler optimised it away; rather, the language never requires it to begin with.
But interestingly, no matter how many times the compiler makes copies while returning, or elides some (or all) of them, the return value is a temporary. If that is so, then how does binding a reference to a temporary work? If that is also part of your question, then yes, it works, and that is guaranteed by the language.
As explained in great detail in the article "the most important const": if a const reference binds to a temporary, the lifetime of the temporary is extended to the scope of the reference, irrespective of the type of the object.
In C++11, there is another way to extend the lifetime of a temporary, which is rvalue-reference:
std::vector<Thing> && things = getThings();
It has the same effect, but the advantage (or disadvantage, depending on the context) is that you can also modify the contents.
I personally prefer to write this as:
auto && things = getThings();
but then that is not necessarily an rvalue-reference: if you change the function's return type to return a reference, then things ends up binding to an lvalue reference. That, however, is a whole different topic.
Hey so your question is:
"When a function returns a class instance by value, and you assign it to a const reference, does that avoid a copy constructor call?"
Ignoring the lifetime of the temporary, as that’s not the question you’re asking, we can get a feel for what happens by looking at the assembly output. I’m using clang, llvm 7.0.2.
Here’s something bog-standard. Return by value, nothing fancy.
Test A
class MyClass
{
public:
    MyClass();
    MyClass(const MyClass & source);

    long int m_tmp;
};

MyClass createMyClass();

int main()
{
    const MyClass myClass = createMyClass();
    return 0;
}
If I compile with “-O0 -S -fno-elide-constructors” I get this.
_main:
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $32, %rsp # Reserve 32 bytes for stack frame
leaq -24(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-24
movl $0, -4(%rbp) # rbp-4 = 0, no idea why this happens
callq __Z13createMyClassv # createMyClass(arg0)
leaq -16(%rbp), %rdi # arg0 = & myClass
leaq -24(%rbp), %rsi # arg1 = &__temp_items
callq __ZN7MyClassC1ERKS_ # MyClass::MyClass(arg0, arg1)
xorl %eax, %eax # eax = 0, the return value for main
addq $32, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
We are looking at only the calling code. We’re not interested in the implementation of createMyClass. That’s compiled somewhere else.
So createMyClass creates the class inside a temporary and then that gets copied into myClass.
Simples.
What about the const ref version ?
Test B
class MyClass
{
public:
    MyClass();
    MyClass(const MyClass & source);

    long int m_tmp;
};

MyClass createMyClass();

int main()
{
    const MyClass & myClass = createMyClass();
    return 0;
}
Same compiler options.
_main: # Boiler plate
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $32, %rsp # Reserve 32 bytes for the stack frame
leaq -24(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-24
movl $0, -4(%rbp) # *(rbp-4) = 0, no idea what this is for
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, the return value for main
leaq -24(%rbp), %rdi # rdi = &___temp_items
movq %rdi, -16(%rbp) # &myClass = rdi = &___temp_items;
addq $32, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
No copy constructor call, and therefore more optimal, right?
What happens if we turn off “-fno-elide-constructors” for both versions? Still keeping -O0.
Test A
_main:
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $16, %rsp # Reserve 16 bytes for the stack frame
leaq -16(%rbp), %rdi # arg0 = &myClass = rdi = rbp-16
movl $0, -4(%rbp) # rbp-4 = 0, no idea what this is
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, return value for main
addq $16, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
Clang has removed the copy constructor call.
Test B
_main: # Boiler plate
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $32, %rsp # Reserve 32 bytes for the stack frame
leaq -24(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-24
movl $0, -4(%rbp) # rbp-4 = 0, no idea what this is
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, return value for main
leaq -24(%rbp), %rdi # rdi = &__temp_items
movq %rdi, -16(%rbp) # &myClass = rdi
addq $32, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
Test B (assign to const reference) is the same as it was before. It now has more instructions than Test A.
What if we set optimisation to -O1 ?
_main:
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $16, %rsp # Reserve 16 bytes for the stack frame
leaq -8(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-8
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # ex = 0, return value for main
addq $16, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
Both source files turn into this when compiled with -O1.
They result in exactly the same assembler.
This is also true for -O4.
The compiler doesn’t know about the contents of createMyClass so it can’t do anything more to optimise.
With the compiler I'm using, you get no performance gain from assigning to a const ref.
I imagine it's a similar situation for g++ and intel although it's always good to check.

Why does GCC 4.8.2 not propagate 'unused but set' optimization?

If a variable is never read, it is obviously optimized out. However, the only store to that variable is the result of the only read of another variable, so that second variable should also be optimized out. Why is this not being done?
int main() {
    timeval a, b, c;
    // First and only logical use of a
    gettimeofday(&a, NULL);
    // Junk function
    foo();
    // First and only logical use of b
    gettimeofday(&b, NULL);
    // This gets optimized out as c is never read from.
    c.tv_sec = a.tv_sec - b.tv_sec;
    //std::cout << c;
}
Assembly (gcc 4.8.2 with -O3):
subq $40, %rsp
xorl %esi, %esi
movq %rsp, %rdi
call gettimeofday
call foo()
leaq 16(%rsp), %rdi
xorl %esi, %esi
call gettimeofday
xorl %eax, %eax
addq $40, %rsp
ret
subq $8, %rsp
Edit: The results are the same when using rand().
There's no store operation! There are two calls to gettimeofday, yes, but those are visible effects. And visible effects are precisely the things that may not be optimized away.

Return value by two copies

I am analyzing how a value is returned from a function. See below for the actual question:
struct S { int a, b, c, d; };

S f() {
    S s;
    s.b = 5;
    return s;
}

int main() {
    S s = f();
    s.a = 2;
}
compiler output from gcc 5.3 -O0 -fno-elide-constructors on godbolt:
f():
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
movl $5, -28(%rbp)
leaq -32(%rbp), %rdx
leaq -16(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call S::S(S const&)
movq -16(%rbp), %rax
movq -8(%rbp), %rdx
leave
ret
main:
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
call f()
movq %rax, -16(%rbp)
movq %rdx, -8(%rbp)
leaq -16(%rbp), %rdx
leaq -32(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call S::S(S const&)
movl $2, -32(%rbp)
movl $0, %eax
leave
ret
What is my problem?
Why are two copy constructors called? I don't understand why this is needed.
The copy constructor inside f() is called because the object is returned with "return": this creates a copy of it into a temporary, which is then passed back to main. The copy constructor inside main is obvious: f() returns an S object, and it gets copied into "s".
One of the two calls builds the temporary object used as the return value of f(). If you want to avoid this additional overhead, you have to rely on C++11 move constructors.

how do static variables inside functions work?

In the following code:
int count() {
    static int n(5);
    n = n + 1;
    return n;
}
the variable n is initialized only once, at the first call to the function.
There must be a flag or something so that the variable is initialized only once. I tried to look at the assembly code generated by gcc, but didn't get any clue.
How does the compiler handle this?
This is, of course, compiler-specific.
The reason you didn't see any checks in the generated assembly is that, since n is an int variable, g++ simply treats it as a global variable pre-initialized to 5.
Let's see what happens if we do the same with a std::string:
#include <string>

void count() {
    static std::string str;
    str += ' ';
}
The generated assembly goes like this:
_Z5countv:
.LFB544:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
.cfi_lsda 0x3,.LLSDA544
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
pushq %r13
pushq %r12
pushq %rbx
subq $8, %rsp
movl $_ZGVZ5countvE3str, %eax
movzbl (%rax), %eax
testb %al, %al
jne .L2 ; <======= bypass initialization
.cfi_offset 3, -40
.cfi_offset 12, -32
.cfi_offset 13, -24
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_acquire ; acquire the lock
testl %eax, %eax
setne %al
testb %al, %al
je .L2 ; check again
movl $0, %ebx
movl $_ZZ5countvE3str, %edi
.LEHB0:
call _ZNSsC1Ev ; call the constructor
.LEHE0:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_release ; release the lock
movl $_ZNSsD1Ev, %eax
movl $__dso_handle, %edx
movl $_ZZ5countvE3str, %esi
movq %rax, %rdi
call __cxa_atexit ; schedule the destructor to be called at exit
jmp .L2
.L7:
.L3:
movl %edx, %r12d
movq %rax, %r13
testb %bl, %bl
jne .L5
.L4:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_abort
.L5:
movq %r13, %rax
movslq %r12d,%rdx
movq %rax, %rdi
.LEHB1:
call _Unwind_Resume
.L2:
movl $32, %esi
movl $_ZZ5countvE3str, %edi
call _ZNSspLEc
.LEHE1:
addq $8, %rsp
popq %rbx
popq %r12
popq %r13
leave
ret
.cfi_endproc
The line I've marked with the bypass initialization comment is the conditional jump instruction that skips the construction if the variable already points to a valid object.
This is entirely up to the implementation; the language standard says nothing about that.
In practice, the compiler will usually include a hidden flag variable somewhere that indicates whether the static variable has already been initialized. The static variable and the flag will probably live in the program's static storage area (e.g. the data segment, not the stack segment), not in function-scope memory, so you may have to look around in the assembly. (The variable can't go on the call stack, for obvious reasons, so it really is like a global variable; "static allocation" covers all sorts of static variables!)
Update: As #aix points out, if the static variable is initialized to a constant expression, you may not even need a flag, because the initialization can be performed at load time rather than at the first function call. In C++11 you should be able to take advantage of that better than in C++03 thanks to the wider availability of constant expressions.
It's quite likely that this variable will be handled just like an ordinary global variable by gcc. That means the initialization will be done statically, directly in the binary.
This is possible because you initialize it with a constant. If you initialized it with, e.g., another function's return value, the compiler would add a flag and skip the initialization based on it.

Mapping class to instance

I'm trying to build a simple entity/component system in C++, based on the second answer to this question: Best way to organize entities in a game?
Now, I would like to have a static std::map that returns (and, if possible, automatically creates) the instance associated with each class.
What I am thinking is something along the lines of:
PositionComponent *pos = systems[PositionComponent].instances[myEntityID];
What would be the best way to achieve that?
Maybe this?
std::map< std::type_info*, Something > systems;
Then you can do:
Something sth = systems[ &typeid(PositionComponent) ];
Just out of curiosity, I checked the assembler code of this C++ code
#include <typeinfo>
#include <cstdio>

class Foo {
    virtual ~Foo() {}
};

int main() {
    printf("%p\n", &typeid(Foo));
}
to be sure that it is really a constant. Assembler (stripped) output by GCC (without any optimisations):
.globl _main
_main:
LFB27:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
pushl %ebx
LCFI2:
subl $20, %esp
LCFI3:
call L3
"L00000000001$pb":
L3:
popl %ebx
leal L__ZTI3Foo$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
leal LC0-"L00000000001$pb"(%ebx), %eax
movl %eax, (%esp)
call _printf
movl $0, %eax
addl $20, %esp
popl %ebx
leave
ret
So it actually has to read the L__ZTI3Foo$non_lazy_ptr symbol (I'm surprised this is not constant; maybe with other compiler options or with other compilers it is). So a constant may be slightly faster (if the compiler sees the constant at compile time) because you save a read.
You should create some integer constants (e.g. POSITION_COMPONENT = 1) and then map those integers to instances.