Since I couldn't find any question about chained member access, only about chained function calls, I would like to ask a couple of questions about it.
I have the following situation:
for(int i = 0; i < largeNumber; ++i)
{
//do calculations with the same chained struct:
//myStruct1.myStruct2.myStruct3.myStruct4.member1
//myStruct1.myStruct2.myStruct3.myStruct4.member2
//etc.
}
It is obviously possible to break this down using a pointer:
MyStruct4* myStruct4_pt = &myStruct1.myStruct2.myStruct3.myStruct4;
for(int i = 0; i < largeNumber; ++i)
{
//do calculations with pointer:
//(*myStruct4_pt).member1
//(*myStruct4_pt).member2
//etc.
}
Is there a difference between member access (.) and a function call that, e.g., returns a pointer to a private member?
Will/Can the first example be optimized by the compiler and does that strongly depend on the compiler?
If no optimizations are done during compilation time, will/can the CPU optimize the behaviour (e.g. keeping it in the L1 cache)?
Does a chained member access make a difference at all in terms of performance, since variables are "wildly reassigned" during compilation time anyway?
I would kindly ask to leave out discussions regarding readability and maintainability of code, as the chained access is, for my purposes, clearer.
Update:
Everything is running in a single thread.
This is a constant offset that you're accessing; a modern compiler will realize that.
But - don't trust me, let's ask a compiler.
#include <stdio.h>
struct D { float _; int i; int j; };
struct C { double _; D d; };
struct B { char _; C c; };
struct A { int _; B b; };
int bar(int i);
int foo(int i);
void foo(A &a) {
for (int i = 0; i < 10; i++) {
a.b.c.d.i += bar(i);
a.b.c.d.j += foo(i);
}
}
Compiles to
foo(A&):
pushq %rbp
movq %rdi, %rbp
pushq %rbx
xorl %ebx, %ebx
subq $8, %rsp
.L3:
movl %ebx, %edi
call bar(int)
addl %eax, 28(%rbp)
movl %ebx, %edi
addl $1, %ebx
call foo(int)
addl %eax, 32(%rbp)
cmpl $10, %ebx
jne .L3
addq $8, %rsp
popq %rbx
popq %rbp
ret
As you see, the chaining has been translated to a single offset in both cases: 28(%rbp) and 32(%rbp).
I understand that arrays are a primitive type and therefore do not have built-in methods to detect out-of-range errors. However, the vector class has the built-in function .at(), which does detect these errors. Anyone can overload operator[] in a vector-like class of their own to act like .at() by throwing an error when a value outside the vector's range is accessed. My question is this: why is this functionality not the default in C++?
EDIT: Below is an example in pseudocode (I believe - correct me if needed) of overloading the vector operator[]:
Item_Type& operator[](std::size_t index) {
    // Verify that the index is legal. index is unsigned (size_t),
    // so only the upper bound needs checking.
    if (index >= num_items) {
        throw std::out_of_range
            ("index to operator[] is out of range");
    }
    return the_data[index];
}
I believe this function can be written into a user-defined vector class and is reasonably easy to implement. If this is true, why is it not the default?
For something that's normally as cheap as [], bounds checking adds a significant overhead.
Consider
int f1(const std::vector<int> & v, std::size_t s) { return v[s]; }
this function translates to just three lines of assembly:
movq (%rdi), %rax
movl (%rax,%rsi,4), %eax
ret
Now consider the bounds-checking version using at():
int f2(const std::vector<int> & v, std::size_t s) { return v.at(s); }
This becomes
movq (%rdi), %rax
movq 8(%rdi), %rdx
subq %rax, %rdx
sarq $2, %rdx
cmpq %rdx, %rsi
jae .L6
movl (%rax,%rsi,4), %eax
ret
.L6:
pushq %rax
movl $.LC1, %edi
xorl %eax, %eax
call std::__throw_out_of_range_fmt(char const*, ...)
Even in the normal (non-throwing) code path, that's 8 lines of assembly - almost three times as many.
C++ has a principle of "you only pay for what you use". Therefore unchecked operations definitely have their place; just because you're too lazy to be careful about your bounds doesn't mean I should have to pay a performance penalty.
Historically array [] has been unchecked in both C and C++. Just because languages 10-20 years younger made that a checked operation doesn't mean C++ needs to make such a fundamental backward-incompatible change.
I have encountered a C++ habit that I have tried to research in order to understand its impact and validate its usage. But I can't seem to find the exact answer.
std::vector< Thing > getThings();

void doWork() { // "do" is a keyword, so the function is renamed here
    const std::vector< Thing > &things = getThings();
}
Here we have a function that returns by value. The habit I am seeing is binding the return value to a const& at the call site. The proposed reasoning for this habit is that it avoids a copy.
Now I have been researching RVO (Return Value Optimization), copy elision, and C++11 move semantics. I realize that a given compiler could choose to prevent a copy via RVO regardless of the use of const& here. But does the usage of a const& lvalue here have any kind of effect on non-const& return values in terms of preventing copies? And I am specifically asking about pre-C++11 compilers, before move semantics.
My assumption is that either the compiler implements RVO or it does not, and that saying the lvalue should be const& doesn't hint or force a copy-free situation.
Edit
I am specifically asking about whether const& usage here reduces a copy, and not about the lifetime of the temporary object, as described in "the most important const"
Further clarification of question
Is this:
const std::vector< Thing > &things = getThings();
any different than this:
std::vector< Thing > things = getThings();
in terms of reducing copies? Or does it not have any influence on whether the compiler can reduce copies, such as via RVO?
Semantically, the compiler needs an accessible copy-constructor at the call site, even if it later elides the call to it; that optimization is done later in the compilation, after the semantic-analysis phase.
After reading your comments, I think I understand your question better. Now let me answer it in detail.
Imagine that the function has this return statement:
return items;
Semantically speaking, the compiler needs an accessible copy-constructor (or move-constructor) here, which can be elided. However, just for the sake of argument, assume that it makes a copy here and the copy is stored in __temp_items which I expressed this as:
__temp_items <= return items; //first copy:
Now at the call site, assume that you have not used const &, so it becomes this:
std::vector<Thing> things = __temp_items; //second copy
Now as you can see yourself, there are two copies. Compilers are allowed to elide both of them.
However, your actual code uses const &, so it becomes this:
const std::vector<Thing> & things = __temp_items; //no copy anymore.
Now, semantically there is only one copy, which can still be elided by the compiler. As for the second copy, I won't say const& "prevented" it in the sense that the compiler optimised it away; rather, the language never creates it to begin with.
But interestingly, no matter how many copies the compiler makes while returning, or how many of them it elides, the return value is a temporary. How, then, does binding a reference to a temporary work? It does work, and that is guaranteed by the language.
As explained in detail in the article "the most important const", if a const reference binds to a temporary, the lifetime of the temporary is extended to the scope of the reference, irrespective of the type of the object.
In C++11, there is another way to extend the lifetime of a temporary: an rvalue reference:
std::vector<Thing> && things = getThings();
It has the same effect, but the advantage (or disadvantage, depending on the context) is that you can also modify the contents.
I personally prefer to write this as:
auto && things = getThings();
but then that is not necessarily an rvalue reference: if you change the function's return type to a reference, things ends up binding as an lvalue reference. But that is a whole different topic.
Hey so your question is:
"When a function returns a class instance by value, and you assign it to a const reference, does that avoid a copy constructor call?"
Ignoring the lifetime of the temporary, as that’s not the question you’re asking, we can get a feel for what happens by looking at the assembly output. I’m using clang, llvm 7.0.2.
Here’s something bog-standard. Return by value, nothing fancy.
Test A
class MyClass
{
public:
MyClass();
MyClass(const MyClass & source);
long int m_tmp;
};
MyClass createMyClass();
int main()
{
const MyClass myClass = createMyClass();
return 0;
}
If I compile with “-O0 -S -fno-elide-constructors” I get this.
_main:
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $32, %rsp # Reserve 32 bytes for stack frame
leaq -24(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-24
movl $0, -4(%rbp) # rbp-4 = 0, no idea why this happens
callq __Z13createMyClassv # createMyClass(arg0)
leaq -16(%rbp), %rdi # arg0 = & myClass
leaq -24(%rbp), %rsi # arg1 = &__temp_items
callq __ZN7MyClassC1ERKS_ # MyClass::MyClass(arg0, arg1)
xorl %eax, %eax # eax = 0, the return value for main
addq $32, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
We are looking at only the calling code. We’re not interested in the implementation of createMyClass. That’s compiled somewhere else.
So createMyClass creates the class inside a temporary and then that gets copied into myClass.
Simples.
What about the const ref version?
Test B
class MyClass
{
public:
MyClass();
MyClass(const MyClass & source);
long int m_tmp;
};
MyClass createMyClass();
int main()
{
const MyClass & myClass = createMyClass();
return 0;
}
Same compiler options.
_main: # Boiler plate
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $32, %rsp # Reserve 32 bytes for the stack frame
leaq -24(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-24
movl $0, -4(%rbp) # *(rbp-4) = 0, no idea what this is for
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, the return value for main
leaq -24(%rbp), %rdi # rdi = &___temp_items
movq %rdi, -16(%rbp) # &myClass = rdi = &___temp_items;
addq $32, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
No copy constructor call, and therefore more optimal, right?
What happens if we turn off “-fno-elide-constructors” for both versions? Still keeping -O0.
Test A
_main:
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $16, %rsp # Reserve 16 bytes for the stack frame
leaq -16(%rbp), %rdi # arg0 = &myClass = rdi = rbp-16
movl $0, -4(%rbp) # rbp-4 = 0, no idea what this is
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, return value for main
addq $16, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
Clang has removed the copy constructor call.
Test B
_main: # Boiler plate
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $32, %rsp # Reserve 32 bytes for the stack frame
leaq -24(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-24
movl $0, -4(%rbp) # rbp-4 = 0, no idea what this is
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, return value for main
leaq -24(%rbp), %rdi # rdi = &__temp_items
movq %rdi, -16(%rbp) # &myClass = rdi
addq $32, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
Test B (assign to const reference) is the same as it was before. It now has more instructions than Test A.
What if we set optimisation to -O1?
_main:
pushq %rbp # Boiler plate
movq %rsp, %rbp # Boiler plate
subq $16, %rsp # Reserve 16 bytes for the stack frame
leaq -8(%rbp), %rdi # arg0 = &___temp_items = rdi = rbp-8
callq __Z13createMyClassv # createMyClass(arg0)
xorl %eax, %eax # eax = 0, return value for main
addq $16, %rsp # Pop stack frame
popq %rbp # Boiler plate
retq
Both source files turn into this when compiled with -O1.
They result in exactly the same assembler.
This is also true for -O4.
The compiler doesn’t know about the contents of createMyClass so it can’t do anything more to optimise.
With the compiler I'm using, you get no performance gain from assigning to a const ref.
I imagine it's a similar situation for g++ and intel although it's always good to check.
If a variable is never read, it is obviously optimized out. However, the only store to that variable is the result of the only read of another variable. So this second variable should also be optimized out. Why is this not being done?
#include <sys/time.h> // for gettimeofday

void foo(); // junk function, defined elsewhere

int main() {
    timeval a, b, c;
    // First and only logical use of a
    gettimeofday(&a, NULL);
    // Junk function
    foo();
    // First and only logical use of b
    gettimeofday(&b, NULL);
    // This gets optimized out as c is never read from.
    c.tv_sec = a.tv_sec - b.tv_sec;
    //std::cout << c;
}
Assembly (gcc 4.8.2 with -O3):
subq $40, %rsp
xorl %esi, %esi
movq %rsp, %rdi
call gettimeofday
call foo()
leaq 16(%rsp), %rdi
xorl %esi, %esi
call gettimeofday
xorl %eax, %eax
addq $40, %rsp
ret
Edit: The results are the same when using rand().
There's no store operation! There are 2 calls to gettimeofday, yes, but that is a visible effect. And visible effects are precisely the things that may not be optimized away.
From what I know, references are just another name for a variable, whilst pointers are their own variables and take up space. People often say "use a reference or pointer" but don't say which is better. If references take up no memory of their own, then references win in that department. What I don't know is whether the compiler makes a distinction between references and normal variables: if you do operations on a reference, does it compile to the same code as a normal variable?
Internally, references are also implemented in terms of pointers, so it's difficult to say whether a pointer or a reference is faster.
It's the usage of these two that makes a difference.
For example, you want to pass a parameter to a function by reference.
void func(int& a) // case 1
{
    //No need to check for a null reference...
}
void func(int* a) // case 2
{
    //Need to check that the pointer is not NULL
}
In case 2 you have to explicitly check that the pointer is not NULL before dereferencing it, whereas that's not the case with references, because a reference is always initialized to something.
The assumption is that you are playing the game in a civilized manner, i.e. you are not doing something like:
int*p = NULL;
int &a = *p;
Here are my two test programs:
Reference:
int i = 0;
int& r = i;
++r;
int j = 0;
++j;
Pointer:
int i = 0;
int* r = &i;
++(*r);
int j = 0;
++j;
My compiler wrote the EXACT same assembly code for both.
movl $0, -16(%rbp) #, i
leaq -16(%rbp), %rax #, tmp87
movq %rax, -8(%rbp) # tmp87, r
movq -8(%rbp), %rax # r, tmp88
movl (%rax), %eax # *r_1, D.31036
leal 1(%rax), %edx #, D.31036
movq -8(%rbp), %rax # r, tmp89
movl %edx, (%rax) # D.31036, *r_1
movl $0, -12(%rbp) #, j
addl $1, -12(%rbp) #, j
movl $0, %eax #, D.31036
They are the same: a reference is just a language mechanism for a pointer that cannot be null. The difference exists only at compile time, where you will get a complaint if you try to do something illegal.
I know that an inline function does not use the stack for copying the parameters; instead, the body of the function replaces the call wherever it is called.
Consider these two functions:
inline void add(int a) {
a++;
} // does nothing, a won't be changed
inline void add(int &a) {
a++;
} // changes the value of a
If the stack is not used for sending the parameters, how does the compiler know whether a variable will be modified or not? What does the code look like after replacing the calls to these two functions?
What makes you think there is a stack? And even if there is, what makes you think it would be used for passing parameters?
You have to understand that there are two levels of reasoning:
the language level: where the semantics of what should happen are defined
the machine level: where said semantics, encoded into CPU instructions, are carried out
At the language level, if you pass a parameter by non-const reference it might be modified by the function. The language level knows not what this mysterious "stack" is. Note: the inline keyword has little to no effect on whether a function call is inlined, it just says that the definition is in-line.
At machine level... there are many ways to achieve this. When making a function call, you have to obey a calling convention. This convention defines how the function parameters (and return types) are exchanged between caller and callee and who among them is responsible for saving/restoring the CPU registers. In general, because it is so low-level, this convention changes on a per CPU family basis.
For example, on x86, a couple parameters will be passed directly in CPU registers (if they fit) whilst remaining parameters (if any) will be passed on the stack.
I have checked what at least GCC does with it if you force it to inline the methods:
inline static void add1(int a) __attribute__((always_inline));
void add1(int a) {
a++;
} // does nothing, a won't be changed
inline static void add2(int &a) __attribute__((always_inline));
void add2(int &a) {
a++;
} // changes the value of a
int main() {
label1:
int b = 0;
add1(b);
label2:
int a = 0;
add2(a);
return 0;
}
The assembly output for this looks like:
.file "test.cpp"
.text
.globl main
.type main, #function
main:
.LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
.L2:
movl $0, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, -8(%ebp)
addl $1, -8(%ebp)
.L3:
movl $0, -12(%ebp)
movl -12(%ebp), %eax
addl $1, %eax
movl %eax, -12(%ebp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE2:
Interestingly, even the first call of add1(), which effectively does nothing outside of the function call, isn't optimized out.
If the stack is not used for sending the parameters, how does the compiler know if a variable will be modified or not?
As Matthieu M. already pointed out, the language construct itself knows nothing about the stack. You specify the inline keyword on a function only to give the compiler a hint and express a wish that you would prefer this routine to be inlined. Whether this happens depends completely on the compiler.
The compiler tries to predict what the advantages of inlining might be under the particular circumstances. If the compiler decides that inlining the function would make the code slower, or unacceptably larger, it will not inline it. Or it may simply be unable to, because of a syntactical dependency, such as other code using a function pointer to it for callbacks, or the function being exported externally as in a dynamic/static library.
What does the code look like after replacing the calls to these two functions?
At the moment, none of these functions is inlined when compiled with
g++ -finline-functions -S main.cpp
and you can see that in the disassembly of main. For this source:
void add1(int a) {
a++;
}
void add2(int &a) {
a++;
}
inline void add3(int a) {
a++;
} // does nothing, a won't be changed
inline void add4(int &a) {
a++;
} // changes the value of a
inline int f() { return 43; }
int main(int argc, char** argv) {
int a = 31;
add1(a);
add2(a);
add3(a);
add4(a);
return 0;
}
we see a call to each routine being made:
main:
.LFB8:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $31, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add1i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add2Ri // function call
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add3i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add4Ri // function call
movl $0, %eax
leave
ret
.cfi_endproc
Compiling with -O1 will remove all the functions from the program entirely, because they do nothing.
However, adding
__attribute__((always_inline))
allows us to see what happens when the code is inlined:
void add1(int a) {
a++;
}
void add2(int &a) {
a++;
}
inline static void add3(int a) __attribute__((always_inline));
inline void add3(int a) {
a++;
} // does nothing, a won't be changed
inline static void add4(int& a) __attribute__((always_inline));
inline void add4(int &a) {
a++;
} // changes the value of a
int main(int argc, char** argv) {
int a = 31;
add1(a);
add2(a);
add3(a);
add4(a);
return 0;
}
Now g++ -finline-functions -S main.cpp results in:
main:
.LFB9:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $31, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add1i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add2Ri // function call
movl -4(%rbp), %eax
movl %eax, -8(%rbp)
addl $1, -8(%rbp) // addition is here, there is no call
movl -4(%rbp), %eax
addl $1, %eax // addition is here, no call again
movl %eax, -4(%rbp)
movl $0, %eax
leave
ret
.cfi_endproc
The inline keyword has two key effects. One effect is that it is a hint to the implementation that "inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism." This usage is a hint, not a mandate, because "an implementation is not required to perform this inline substitution at the point of call".
The other principal effect is how it modifies the one definition rule. Per the ODR, a program must contain exactly one definition of any given non-inline function that is odr-used in the program. That doesn't quite work with an inline function because "An inline function shall be defined in every translation unit in which it is odr-used ...". Use the same inline function in one hundred different translation units and the linker will be confronted with one hundred definitions of the function. This isn't a problem because those multiple implementations of the same function "... shall have exactly the same definition in every case." One way to look at this: There still is only one definition; it just looks like there are a whole bunch to the linker.
Note: All quoted material are from section 7.1.2 of the C++11 standard.