Why does GCC 4.8.2 not propagate 'unused but set' optimization?

Why does GCC 4.8.2 not propagate 'unused but set' optimization? - c++

If a variable is not read from ever, it is obviously optimized out. However, the only store operation on that variable is the result of the only read operation of another variable. So, this second variable should also be optimized out. Why is this not being done?
int main() {
timeval a,b,c;
// First and only logical use of a
gettimeofday(&a,NULL);
// Junk function
foo();
// First and only logical use of b
gettimeofday(&b,NULL);
// This gets optimized out as c is never read from.
c.tv_sec = a.tv_sec - b.tv_sec;
//std::cout << c;
}
Aseembly (gcc 4.8.2 with -O3):
subq $40, %rsp
xorl %esi, %esi
movq %rsp, %rdi
call gettimeofday
call foo()
leaq 16(%rsp), %rdi
xorl %esi, %esi
call gettimeofday
xorl %eax, %eax
addq $40, %rsp
ret
subq $8, %rsp
Edit: The results are the same for using rand() .

There's no store operation! There are 2 calls to gettimeofday, yes, but that is a visible effect. And visible effects are precisely the things that may not be optimized away.

Related

Increment variable by N inside array index

Could someone please tell me whether or not such a construction is valid (i.e not an UB) in C++. I have some segfaults because of that and spent couple of days trying to figure out what is going on there.
// Synthetic example
int main(int argc, char** argv)
{
int array[2] = {99, 99};
/*
The point is here. Is it legal? Does it have defined behaviour?
Will it increment first and than access element or vise versa?
*/
std::cout << array[argc += 7]; // Use argc just to avoid some optimisations
}
So, of course I did some analysis, both GCC(5/7) and clang(3.8) generate same code. First add than access.
Clang(3.8): clang++ -O3 -S test.cpp
leal 7(%rdi), %ebx
movl .L_ZZ4mainE5array+28(,%rax,4), %esi
movl $_ZSt4cout, %edi
callq _ZNSolsEi
movl $.L.str, %esi
movl $1, %edx
movq %rax, %rdi
callq _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
GCC(5/7) g++-7 -O3 -S test.cpp
leal 7(%rdi), %ebx
movl $_ZSt4cout, %edi
subq $16, %rsp
.cfi_def_cfa_offset 32
movq %fs:40, %rax
movq %rax, 8(%rsp)
xorl %eax, %eax
movabsq $425201762403, %rax
movq %rax, (%rsp)
movslq %ebx, %rax
movl (%rsp,%rax,4), %esi
call _ZNSolsEi
movl $.LC0, %esi
movq %rax, %rdi
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
movl %ebx, %esi
So, can I assume such a baheviour is a standard one?

By itself array[argc += 7] is OK, the result of argc + 7 will be used as an index to array.
However, in your example array has just 2 elements, and argc is never negative, so your code will always result in UB due to an out-of-bounds array access.

In case of a[i+=N] the expression i += N will always be evaluated first before accessing the index. But the example that you provided invokes UB as your example array contains only two elements and thus you are accessing out of bounds of the array.

Your case is clearly undefined behaviour, since you will exceed array bounds for the following reasons:
First, expression array[argc += 7] is equal to *((array)+(argc+=7)), and the values of the operands will be evaluated before + is evaluated (cf. here); Operator += is an assignment (and not a side effect), and the value of an assignment is the result of argc (in this case) after the assignment (cf. here). Hence, the +=7 gets effective for subscripting;
Second, argc is defined in C++ to be never negative (cf. here); So argc += 7 will always be >=7 (or a signed integer overflow in very unrealistic scenarious, but still UB then).
Hence, UB.

It's normal behavior. Name of array actualy is a pointer to first element of array. And array[n] is the same as *(array+n)

Inline functions mechanism

I know that an inline function does not use the stack for copying the parameters but it just replaces the body of the function wherever it is called.
Consider these two functions:
inline void add(int a) {
a++;
} // does nothing, a won't be changed
inline void add(int &a) {
a++;
} // changes the value of a
If the stack is not used for sending the parameters, how does the compiler know if a variable will be modified or not? What does the code looks like after replacing the calls of these two functions?

What makes you think there is a stack ? And even if there is, what makes you think it would be use for passing parameters ?
You have to understand that there are two levels of reasoning:
the language level: where the semantics of what should happen are defined
the machine level: where said semantics, encoded into CPU instructions, are carried out
At the language level, if you pass a parameter by non-const reference it might be modified by the function. The language level knows not what this mysterious "stack" is. Note: the inline keyword has little to no effect on whether a function call is inlined, it just says that the definition is in-line.
At machine level... there are many ways to achieve this. When making a function call, you have to obey a calling convention. This convention defines how the function parameters (and return types) are exchanged between caller and callee and who among them is responsible for saving/restoring the CPU registers. In general, because it is so low-level, this convention changes on a per CPU family basis.
For example, on x86, a couple parameters will be passed directly in CPU registers (if they fit) whilst remaining parameters (if any) will be passed on the stack.

I have checked what at least GCC does with it if you force it to inline the methods:
inline static void add1(int a) __attribute__((always_inline));
void add1(int a) {
a++;
} // does nothing, a won't be changed
inline static void add2(int &a) __attribute__((always_inline));
void add2(int &a) {
a++;
} // changes the value of a
int main() {
label1:
int b = 0;
add1(b);
label2:
int a = 0;
add2(a);
return 0;
}
The assembly output for this looks like:
.file "test.cpp"
.text
.globl main
.type main, #function
main:
.LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
.L2:
movl $0, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, -8(%ebp)
addl $1, -8(%ebp)
.L3:
movl $0, -12(%ebp)
movl -12(%ebp), %eax
addl $1, %eax
movl %eax, -12(%ebp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE2:
Interestingly even the first call of add1() that effectively does nothing as a result outside of the function call, isn't optimized out.

If the stack is not used for sending the parameters, how does the
compiler know if a variable will be modified or not?
As Matthieu M. already pointed out the language construction itself knows nothing about stack.You specify inline keyword to the function just to give a compiler a hint and express a wish that you would prefer this routine to be inlined. If this happens depends completely on the compiler.
The compiler tries to predict what the advantages of this process given particular circumstances might be. If the compiler decides that inlining the function will make the code slower, or unacceptably larger, it will not inline it. Or, if it simply cannot because of a syntactical dependency, such as other code using a function pointer for callbacks, or exporting the function externally as in a dynamic/static code library.
What does the code looks like after replacing the calls of these two
functions?
At he moment none of this function is being inlined when compiled with
g++ -finline-functions -S main.cpp
and you can see it because in disassembly of main
void add1(int a) {
a++;
}
void add2(int &a) {
a++;
}
inline void add3(int a) {
a++;
} // does nothing, a won't be changed
inline void add4(int &a) {
a++;
} // changes the value of a
inline int f() { return 43; }
int main(int argc, char** argv) {
int a = 31;
add1(a);
add2(a);
add3(a);
add4(a);
return 0;
}
we see a call to each routine being made:
main:
.LFB8:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $31, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add1i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add2Ri // function call
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add3i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add4Ri // function call
movl $0, %eax
leave
ret
.cfi_endproc
compiling with -O1 will remove all functions from program at all because they do nothing.
However addition of
__attribute__((always_inline))
allows us to see what happens when code is inlined:
void add1(int a) {
a++;
}
void add2(int &a) {
a++;
}
inline static void add3(int a) __attribute__((always_inline));
inline void add3(int a) {
a++;
} // does nothing, a won't be changed
inline static void add4(int& a) __attribute__((always_inline));
inline void add4(int &a) {
a++;
} // changes the value of a
int main(int argc, char** argv) {
int a = 31;
add1(a);
add2(a);
add3(a);
add4(a);
return 0;
}
now: g++ -finline-functions -S main.cpp results with:
main:
.LFB9:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $31, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %edi
call _Z4add1i // function call
leaq -4(%rbp), %rax
movq %rax, %rdi
call _Z4add2Ri // function call
movl -4(%rbp), %eax
movl %eax, -8(%rbp)
addl $1, -8(%rbp) // addition is here, there is no call
movl -4(%rbp), %eax
addl $1, %eax // addition is here, no call again
movl %eax, -4(%rbp)
movl $0, %eax
leave
ret
.cfi_endproc

The inline keyword has two key effects. One effect is that it is a hint to the implementation that "inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism." This usage is a hint, not a mandate, because "an implementation is not required to perform this inline substitution at the point of call".
The other principal effect is how it modifies the one definition rule. Per the ODR, a program must contain exactly one definition of any given non-inline function that is odr-used in the program. That doesn't quite work with an inline function because "An inline function shall be defined in every translation unit in which it is odr-used ...". Use the same inline function in one hundred different translation units and the linker will be confronted with one hundred definitions of the function. This isn't a problem because those multiple implementations of the same function "... shall have exactly the same definition in every case." One way to look at this: There still is only one definition; it just looks like there are a whole bunch to the linker.
Note: All quoted material are from section 7.1.2 of the C++11 standard.

how do static variables inside functions work?

In the following code:
int count(){
static int n(5);
n = n + 1;
return n;
}
the variable n is instantiated only once at the first call to the function.
There should be a flag or something so it initialize the variable only once.. I tried to look on the generated assembly code from gcc, but didn't have any clue.
How does the compiler handle this?

This is, of course, compiler-specific.
The reason you didn't see any checks in the generated assembly is that, since n is an int variable, g++ simply treats it as a global variable pre-initialized to 5.
Let's see what happens if we do the same with a std::string:
#include <string>
void count() {
static std::string str;
str += ' ';
}
The generated assembly goes like this:
_Z5countv:
.LFB544:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
.cfi_lsda 0x3,.LLSDA544
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
pushq %r13
pushq %r12
pushq %rbx
subq $8, %rsp
movl $_ZGVZ5countvE3str, %eax
movzbl (%rax), %eax
testb %al, %al
jne .L2 ; <======= bypass initialization
.cfi_offset 3, -40
.cfi_offset 12, -32
.cfi_offset 13, -24
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_acquire ; acquire the lock
testl %eax, %eax
setne %al
testb %al, %al
je .L2 ; check again
movl $0, %ebx
movl $_ZZ5countvE3str, %edi
.LEHB0:
call _ZNSsC1Ev ; call the constructor
.LEHE0:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_release ; release the lock
movl $_ZNSsD1Ev, %eax
movl $__dso_handle, %edx
movl $_ZZ5countvE3str, %esi
movq %rax, %rdi
call __cxa_atexit ; schedule the destructor to be called at exit
jmp .L2
.L7:
.L3:
movl %edx, %r12d
movq %rax, %r13
testb %bl, %bl
jne .L5
.L4:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_abort
.L5:
movq %r13, %rax
movslq %r12d,%rdx
movq %rax, %rdi
.LEHB1:
call _Unwind_Resume
.L2:
movl $32, %esi
movl $_ZZ5countvE3str, %edi
call _ZNSspLEc
.LEHE1:
addq $8, %rsp
popq %rbx
popq %r12
popq %r13
leave
ret
.cfi_endproc
The line I've marked with the bypass initialization comment is the conditional jump instruction that skips the construction if the variable already points to a valid object.

This is entirely up to the implementation; the language standard says nothing about that.
In practice, the compiler will usually include a hidden flag variable somewhere that indicates whether the static variable has already been instantiated or not. The static variable and the flag will probably be in the static storage area of the program (e.g. the data segment, not the stack segment), not in the function scope memory, so you may have to look around about in the assembly. (The variable can't go on the call stack, for obvious reasons, so it's really like a global variable. "static allocation" really covers all sorts of static variables!)
Update: As #aix points out, if the static variable is initialized to a constant expression, you may not even need a flag, because the initialization can be performed at load time rather than at the first function call. In C++11 you should be able to take advantage of that better than in C++03 thanks to the wider availability of constant expressions.

It's quite likely that this variable will be handled just as ordinary global variable by gcc. That means the initialization will be statically initialized directly in the binary.
This is possible, since you initialize it by a constant. If you initialized it eg. with another function return value, the compiler would add a flag and skip the initialization based on the flag.

gcc stack optimization

Hi I have a question on possible stack optimization by gcc (or g++)..
Sample code under FreeBSD (does UNIX variance matter here?):
void main() {
char bing[100];
..
string buffer = ....;
..
}
What I found in gdb for a coredump of this program is that the address
of bing is actually lower than that buffer (namely, &bing[0] < &buffer).
I think this is totally the contrary of was told in textbook. Could there
be some compiler optimization that re-organize the stack layout in such a
way?
This seems to be only possible explanation but I'm not sure..
In case you're interested, the coredump is due to the buffer overflow by
bing to buffer (but that also confirms &bing[0] < &buffer).
Thanks!

Compilers are free to organise stack frames (assuming they even use stacks) any way they wish.
They may do it for alignment reasons, or for performance reasons, or for no reason at all. You would be unwise to assume any specific order.
If you hadn't invoked undefined behavior by overflowing the buffer, you probably never would have known, and that's the way it should be.
A compiler can not only re-organise your variables, it can optimise them out of existence if it can establish they're not used. With the code:
#include <stdio.h>
int main (void) {
char bing[71];
int x = 7;
bing[0] = 11;
return 0;
}
Compare the normal assembler output:
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $80, %esp
movl %gs:20, %eax
movl %eax, 76(%esp)
xorl %eax, %eax
movl $7, (%esp)
movb $11, 5(%esp)
movl $0, %eax
movl 76(%esp), %edx
xorl %gs:20, %edx
je .L3
call __stack_chk_fail
.L3:
leave
ret
with the insanely optimised:
main:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
Notice anything missing from the latter? Yes, there are no stack manipulations to create space for either bing or x. They don't exist. In fact, the entire code sequence boils down to:
set return code to 0.
return.

A compiler is free to layout local variables on the stack (or keep them in register or do something else with them) however it sees fit: the C and C++ language standards don't say anything about these implementation details, and neither does POSIX or UNIX. I doubt that your textbook told you otherwise, and if it did, I would look for a new textbook.

Mapping class to instance

I'm trying to build a simple entity/component system in c++ base on the second answer to this question : Best way to organize entities in a game?
Now, I would like to have a static std::map that returns (and even automatically create, if possible).
What I am thinking is something along the lines of:
PositionComponent *pos = systems[PositionComponent].instances[myEntityID];
What would be the best way to achieve that?

Maybe this?
std::map< std::type_info*, Something > systems;
Then you can do:
Something sth = systems[ &typeid(PositionComponent) ];
Just out of curiosity, I checked the assembler code of this C++ code
#include <typeinfo>
#include <cstdio>
class Foo {
virtual ~Foo() {}
};
int main() {
printf("%p\n", &typeid(Foo));
}
to be sure that it is really a constant. Assembler (stripped) outputted by GCC (without any optimisations):
.globl _main
_main:
LFB27:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
pushl %ebx
LCFI2:
subl $20, %esp
LCFI3:
call L3
"L00000000001$pb":
L3:
popl %ebx
leal L__ZTI3Foo$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
leal LC0-"L00000000001$pb"(%ebx), %eax
movl %eax, (%esp)
call _printf
movl $0, %eax
addl $20, %esp
popl %ebx
leave
ret
So it actually has to read the L__ZTI3Foo$non_lazy_ptr symbol (I wonder though that this is not constant -- maybe with other compiler options or with other compilers, it is). So a constant may be slightly faster (if the compiler sees the constant at compile time) because you save a read.

You should create some constants (ex. POSITION_COMPONENT=1) and then map those integers to instances.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Why does GCC 4.8.2 not propagate 'unused but set' optimization? - c++

There's no store operation! There are 2 calls to gettimeofday, yes, but that is a visible effect. And visible effects are precisely the things that may not be optimized away.

Related

Increment variable by N inside array index

Inline functions mechanism

how do static variables inside functions work?

gcc stack optimization

Mapping class to instance

Categories

Resources