gcc stack optimization

gcc stack optimization - c++

Hi I have a question on possible stack optimization by gcc (or g++)..
Sample code under FreeBSD (does UNIX variance matter here?):
void main() {
char bing[100];
..
string buffer = ....;
..
}
What I found in gdb for a coredump of this program is that the address
of bing is actually lower than that buffer (namely, &bing[0] < &buffer).
I think this is totally the contrary of was told in textbook. Could there
be some compiler optimization that re-organize the stack layout in such a
way?
This seems to be only possible explanation but I'm not sure..
In case you're interested, the coredump is due to the buffer overflow by
bing to buffer (but that also confirms &bing[0] < &buffer).
Thanks!

Compilers are free to organise stack frames (assuming they even use stacks) any way they wish.
They may do it for alignment reasons, or for performance reasons, or for no reason at all. You would be unwise to assume any specific order.
If you hadn't invoked undefined behavior by overflowing the buffer, you probably never would have known, and that's the way it should be.
A compiler can not only re-organise your variables, it can optimise them out of existence if it can establish they're not used. With the code:
#include <stdio.h>
int main (void) {
char bing[71];
int x = 7;
bing[0] = 11;
return 0;
}
Compare the normal assembler output:
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $80, %esp
movl %gs:20, %eax
movl %eax, 76(%esp)
xorl %eax, %eax
movl $7, (%esp)
movb $11, 5(%esp)
movl $0, %eax
movl 76(%esp), %edx
xorl %gs:20, %edx
je .L3
call __stack_chk_fail
.L3:
leave
ret
with the insanely optimised:
main:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
Notice anything missing from the latter? Yes, there are no stack manipulations to create space for either bing or x. They don't exist. In fact, the entire code sequence boils down to:
set return code to 0.
return.

A compiler is free to layout local variables on the stack (or keep them in register or do something else with them) however it sees fit: the C and C++ language standards don't say anything about these implementation details, and neither does POSIX or UNIX. I doubt that your textbook told you otherwise, and if it did, I would look for a new textbook.

Related

Why does GCC 4.8.2 not propagate 'unused but set' optimization?

If a variable is not read from ever, it is obviously optimized out. However, the only store operation on that variable is the result of the only read operation of another variable. So, this second variable should also be optimized out. Why is this not being done?
int main() {
timeval a,b,c;
// First and only logical use of a
gettimeofday(&a,NULL);
// Junk function
foo();
// First and only logical use of b
gettimeofday(&b,NULL);
// This gets optimized out as c is never read from.
c.tv_sec = a.tv_sec - b.tv_sec;
//std::cout << c;
}
Aseembly (gcc 4.8.2 with -O3):
subq $40, %rsp
xorl %esi, %esi
movq %rsp, %rdi
call gettimeofday
call foo()
leaq 16(%rsp), %rdi
xorl %esi, %esi
call gettimeofday
xorl %eax, %eax
addq $40, %rsp
ret
subq $8, %rsp
Edit: The results are the same for using rand() .

There's no store operation! There are 2 calls to gettimeofday, yes, but that is a visible effect. And visible effects are precisely the things that may not be optimized away.

Why does this loop produce "warning: iteration 3u invokes undefined behavior" and output more than 4 lines?

Compiling this:
#include <iostream>
int main()
{
for (int i = 0; i < 4; ++i)
std::cout << i*1000000000 << std::endl;
}
and gcc produces the following warning:
warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
std::cout << i*1000000000 << std::endl;
^
I understand there is a signed integer overflow.
What I cannot get is why i value is broken by that overflow operation?
I've read the answers to Why does integer overflow on x86 with GCC cause an infinite loop?, but I'm still not clear on why this happens - I get that "undefined" means "anything can happen", but what's the underlying cause of this specific behavior?
Online: http://ideone.com/dMrRKR
Compiler: gcc (4.8)

Signed integer overflow (as strictly speaking, there is no such thing as "unsigned integer overflow") means undefined behaviour. And this means anything can happen, and discussing why does it happen under the rules of C++ doesn't make sense.
C++11 draft N3337: §5.4:1
If during the evaluation of an expression, the result is not mathematically deﬁned or not in the range of
representable values for its type, the behavior is undeﬁned. [ Note: most existing implementations of C++
ignore integer overﬂows. Treatment of division by zero, forming a remainder using a zero divisor, and all
ﬂoating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
Your code compiled with g++ -O3 emits warning (even without -Wall)
a.cpp: In function 'int main()':
a.cpp:11:18: warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
std::cout << i*1000000000 << std::endl;
^
a.cpp:9:2: note: containing loop
for (int i = 0; i < 4; ++i)
^
The only way we can analyze what the program is doing, is by reading the generated assembly code.
Here is the full assembly listing:
.file "a.cpp"
.section .text$_ZNKSt5ctypeIcE8do_widenEc,"x"
.linkonce discard
.align 2
LCOLDB0:
LHOTB0:
.align 2
.p2align 4,,15
.globl __ZNKSt5ctypeIcE8do_widenEc
.def __ZNKSt5ctypeIcE8do_widenEc; .scl 2; .type 32; .endef
__ZNKSt5ctypeIcE8do_widenEc:
LFB860:
.cfi_startproc
movzbl 4(%esp), %eax
ret $4
.cfi_endproc
LFE860:
LCOLDE0:
LHOTE0:
.section .text.unlikely,"x"
LCOLDB1:
.text
LHOTB1:
.p2align 4,,15
.def ___tcf_0; .scl 3; .type 32; .endef
___tcf_0:
LFB1091:
.cfi_startproc
movl $__ZStL8__ioinit, %ecx
jmp __ZNSt8ios_base4InitD1Ev
.cfi_endproc
LFE1091:
.section .text.unlikely,"x"
LCOLDE1:
.text
LHOTE1:
.def ___main; .scl 2; .type 32; .endef
.section .text.unlikely,"x"
LCOLDB2:
.section .text.startup,"x"
LHOTB2:
.p2align 4,,15
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB1084:
.cfi_startproc
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x70,0x6
.cfi_escape 0x10,0x7,0x2,0x75,0x7c
.cfi_escape 0x10,0x6,0x2,0x75,0x78
.cfi_escape 0x10,0x3,0x2,0x75,0x74
xorl %edi, %edi
subl $24, %esp
call ___main
L4:
movl %edi, (%esp)
movl $__ZSt4cout, %ecx
call __ZNSolsEi
movl %eax, %esi
movl (%eax), %eax
subl $4, %esp
movl -12(%eax), %eax
movl 124(%esi,%eax), %ebx
testl %ebx, %ebx
je L15
cmpb $0, 28(%ebx)
je L5
movsbl 39(%ebx), %eax
L6:
movl %esi, %ecx
movl %eax, (%esp)
addl $1000000000, %edi
call __ZNSo3putEc
subl $4, %esp
movl %eax, %ecx
call __ZNSo5flushEv
jmp L4
.p2align 4,,10
L5:
movl %ebx, %ecx
call __ZNKSt5ctypeIcE13_M_widen_initEv
movl (%ebx), %eax
movl 24(%eax), %edx
movl $10, %eax
cmpl $__ZNKSt5ctypeIcE8do_widenEc, %edx
je L6
movl $10, (%esp)
movl %ebx, %ecx
call *%edx
movsbl %al, %eax
pushl %edx
jmp L6
L15:
call __ZSt16__throw_bad_castv
.cfi_endproc
LFE1084:
.section .text.unlikely,"x"
LCOLDE2:
.section .text.startup,"x"
LHOTE2:
.section .text.unlikely,"x"
LCOLDB3:
.section .text.startup,"x"
LHOTB3:
.p2align 4,,15
.def __GLOBAL__sub_I_main; .scl 3; .type 32; .endef
__GLOBAL__sub_I_main:
LFB1092:
.cfi_startproc
subl $28, %esp
.cfi_def_cfa_offset 32
movl $__ZStL8__ioinit, %ecx
call __ZNSt8ios_base4InitC1Ev
movl $___tcf_0, (%esp)
call _atexit
addl $28, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc
LFE1092:
.section .text.unlikely,"x"
LCOLDE3:
.section .text.startup,"x"
LHOTE3:
.section .ctors,"w"
.align 4
.long __GLOBAL__sub_I_main
.lcomm __ZStL8__ioinit,1,1
.ident "GCC: (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.0"
.def __ZNSt8ios_base4InitD1Ev; .scl 2; .type 32; .endef
.def __ZNSolsEi; .scl 2; .type 32; .endef
.def __ZNSo3putEc; .scl 2; .type 32; .endef
.def __ZNSo5flushEv; .scl 2; .type 32; .endef
.def __ZNKSt5ctypeIcE13_M_widen_initEv; .scl 2; .type 32; .endef
.def __ZSt16__throw_bad_castv; .scl 2; .type 32; .endef
.def __ZNSt8ios_base4InitC1Ev; .scl 2; .type 32; .endef
.def _atexit; .scl 2; .type 32; .endef
I can barely even read assembly, but even I can see the addl $1000000000, %edi line.
The resulting code looks more like
for(int i = 0; /* nothing, that is - infinite loop */; i += 1000000000)
std::cout << i << std::endl;
This comment of #T.C.:
I suspect that it's something like: (1) because every iteration with i of any value larger than 2 has undefined behavior -> (2) we can assume that i <= 2 for optimization purposes -> (3) the loop condition is always true -> (4) it's optimized away into an infinite loop.
gave me idea to compare the assembly code of the OP's code to the assembly code of the following code, with no undefined behaviour.
#include <iostream>
int main()
{
// changed the termination condition
for (int i = 0; i < 3; ++i)
std::cout << i*1000000000 << std::endl;
}
And, in fact, the correct code has termination condition.
; ...snip...
L6:
mov ecx, edi
mov DWORD PTR [esp], eax
add esi, 1000000000
call __ZNSo3putEc
sub esp, 4
mov ecx, eax
call __ZNSo5flushEv
cmp esi, -1294967296 // here it is
jne L7
lea esp, [ebp-16]
xor eax, eax
pop ecx
; ...snip...
Unfortunately this is the consequences of writing buggy code.
Fortunately you can make use of better diagnostics and better debugging tools - that's what they are for:
enable all warnings
-Wall is the gcc option that enables all useful warnings with no false positives. This is a bare minimum that you should always use.
gcc has many other warning options, however, they are not enabled with -Wall as they may warn on false positives
Visual C++ unfortunately is lagging behind with the ability to give useful warnings. At least the IDE enables some by default.
use debug flags for debugging
for integer overflow -ftrapv traps the program on overflow,
Clang compiler is excellent for this: -fcatch-undefined-behavior catches a lot of instances of undefined behaviour (note: "a lot of" != "all of them")
I have a spaghetti mess of a program not written by me that needs to be shipped tomorrow! HELP!!!!!!111oneone
Use gcc's -fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation.
1 - this rule does not apply to "unsigned integer overflow", as §3.9.1.4 says that
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number
of bits in the value representation of that particular size of integer.
and e.g. result of UINT_MAX + 1 is mathematically defined - by the rules of arithmetic modulo 2n

Short answer, gcc specifically has documented this problem, we can see that in the gcc 4.8 release notes which says (emphasis mine going forward):
GCC now uses a more aggressive analysis to derive an upper bound for
the number of iterations of loops using constraints imposed by
language standards. This may cause non-conforming programs to no
longer work as expected, such as SPEC CPU 2006 464.h264ref and
416.gamess. A new option, -fno-aggressive-loop-optimizations, was added to disable this aggressive analysis. In some loops that have
known constant number of iterations, but undefined behavior is known
to occur in the loop before reaching or during the last iteration, GCC
will warn about the undefined behavior in the loop instead of deriving
lower upper bound of the number of iterations for the loop. The
warning can be disabled with -Wno-aggressive-loop-optimizations.
and indeed if we use -fno-aggressive-loop-optimizations the infinite loop behavior should cease and it does in all the cases I have tested.
The long answer starts with knowing that signed integer overflow is undefined behavior by looking at the draft C++ standard section 5 Expressions paragraph 4 which says:
If during the evaluation of an expression, the result is not
mathematically defined or not in the range of representable values for
its type, the behavior is undefined. [ Note: most existing
implementations of C++ ignore integer overflows. Treatment of division
by zero, forming a remainder using a zero divisor, and all floating
point exceptions vary among machines, and is usually adjustable by a
library function. —end note
We know that the standard says undefined behavior is unpredictable from the note that come with the definition which says:
[ Note: Undefined behavior may be expected when this International
Standard omits any explicit definition of behavior or when a program
uses an erroneous construct or erroneous data. Permissible undefined
behavior ranges from ignoring the situation completely with
unpredictable results, to behaving during translation or program
execution in a documented manner characteristic of the environment
(with or without the issuance of a diagnostic message), to terminating
a translation or execution (with the issuance of a diagnostic
message). Many erroneous program constructs do not engender undefined
behavior; they are required to be diagnosed. —end note ]
But what in the world can the gcc optimizer be doing to turn this into an infinite loop? It sounds completely wacky. But thankfully gcc gives us a clue to figuring it out in the warning:
warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
std::cout << i*1000000000 << std::endl;
^
The clue is the Waggressive-loop-optimizations, what does that mean? Fortunately for us this is not the first time this optimization has broken code in this way and we are lucky because John Regehr has documented a case in the article GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks which shows the following code:
int d[16];
int SATD (void)
{
int satd = 0, dd, k;
for (dd=d[k=0]; k<16; dd=d[++k]) {
satd += (dd < 0 ? -dd : dd);
}
return satd;
}
the article says:
The undefined behavior is accessing d[16] just before exiting the
loop. In C99 it is legal to create a pointer to an element one
position past the end of the array, but that pointer must not be
dereferenced.
and later on says:
In detail, here is what’s going on. A C compiler, upon seeing d[++k],
is permitted to assume that the incremented value of k is within the
array bounds, since otherwise undefined behavior occurs. For the code
here, GCC can infer that k is in the range 0..15. A bit later, when
GCC sees k<16, it says to itself: “Aha– that expression is always
true, so we have an infinite loop.” The situation here, where the
compiler uses the assumption of well-definedness to infer a useful
dataflow fact,
So what the compiler must be doing in some cases is assuming since signed integer overflow is undefined behavior then i must always be less than 4 and thus we have an infinite loop.
He explains this is very similar to the infamous Linux kernel null pointer check removal where in seeing this code:
struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;
gcc inferred that since s was deferenced in s->f; and since dereferencing a null pointer is undefined behavior then s must not be null and therefore optimizes away the if (!s) check on the next line.
The lesson here is that modern optimizers are very aggressive about exploiting undefined behavior and most likely will only get more aggressive. Clearly with just a few examples we can see the optimizer does things that seem completely unreasonable to a programmer but in retrospect from the optimizers perspective make sense.

tl;dr The code generates a test that integer + positive integer == negative integer. Usually the optimizer does not optimize this out, but in the specific case of std::endl being used next, the compiler does optimize this test out. I haven't figured out what's special about endl yet.
From the assembly code at -O1 and higher levels, it is clear that gcc refactors the loop to:
i = 0;
do {
cout << i << endl;
i += NUMBER;
}
while (i != NUMBER * 4)
The biggest value that works correctly is 715827882, i.e. floor(INT_MAX/3). The assembly snippet at -O1 is:
L4:
movsbl %al, %eax
movl %eax, 4(%esp)
movl $__ZSt4cout, (%esp)
call __ZNSo3putEc
movl %eax, (%esp)
call __ZNSo5flushEv
addl $715827882, %esi
cmpl $-1431655768, %esi
jne L6
// fallthrough to "return" code
Note, the -1431655768 is 4 * 715827882 in 2's complement.
Hitting -O2 optimizes that to the following:
L4:
movsbl %al, %eax
addl $715827882, %esi
movl %eax, 4(%esp)
movl $__ZSt4cout, (%esp)
call __ZNSo3putEc
movl %eax, (%esp)
call __ZNSo5flushEv
cmpl $-1431655768, %esi
jne L6
leal -8(%ebp), %esp
jne L6
// fallthrough to "return" code
So the optimization that has been made is merely that the addl was moved higher up.
If we recompile with 715827883 instead then the -O1 version is identical apart from the changed number and test value. However, -O2 then makes a change:
L4:
movsbl %al, %eax
addl $715827883, %esi
movl %eax, 4(%esp)
movl $__ZSt4cout, (%esp)
call __ZNSo3putEc
movl %eax, (%esp)
call __ZNSo5flushEv
jmp L2
Where there was cmpl $-1431655764, %esi at -O1, that line has been removed for -O2. The optimizer must have decided that adding 715827883 to %esi can never equal -1431655764.
This is pretty puzzling. Adding that to INT_MIN+1 does generate the expected result, so the optimizer must have decided that %esi can never be INT_MIN+1 and I'm not sure why it would decide that.
In the working example it seems it'd be equally valid to conclude that adding 715827882 to a number cannot equal INT_MIN + 715827882 - 2 ! (this is only possible if wraparound does actually occur), yet it does not optimize the line out in that example.
The code I was using is:
#include <iostream>
#include <cstdio>
int main()
{
for (int i = 0; i < 4; ++i)
{
//volatile int j = i*715827883;
volatile int j = i*715827882;
printf("%d\n", j);
std::endl(std::cout);
}
}
If the std::endl(std::cout) is removed then the optimization no longer occurs. In fact replacing it with std::cout.put('\n'); std::flush(std::cout); also causes the optimization to not happen, even though std::endl is inlined.
The inlining of std::endl seems to affect the earlier part of the loop structure (which I don't quite understand what it is doing but I'll post it here in case someone else does):
With original code and -O2:
L2:
movl %esi, 28(%esp)
movl 28(%esp), %eax
movl $LC0, (%esp)
movl %eax, 4(%esp)
call _printf
movl __ZSt4cout, %eax
movl -12(%eax), %eax
movl __ZSt4cout+124(%eax), %ebx
testl %ebx, %ebx
je L10
cmpb $0, 28(%ebx)
je L3
movzbl 39(%ebx), %eax
L4:
movsbl %al, %eax
addl $715827883, %esi
movl %eax, 4(%esp)
movl $__ZSt4cout, (%esp)
call __ZNSo3putEc
movl %eax, (%esp)
call __ZNSo5flushEv
jmp L2 // no test
With mymanual inlining of std::endl, -O2:
L3:
movl %ebx, 28(%esp)
movl 28(%esp), %eax
addl $715827883, %ebx
movl $LC0, (%esp)
movl %eax, 4(%esp)
call _printf
movl $10, 4(%esp)
movl $__ZSt4cout, (%esp)
call __ZNSo3putEc
movl $__ZSt4cout, (%esp)
call __ZNSo5flushEv
cmpl $-1431655764, %ebx
jne L3
xorl %eax, %eax
One difference between these two is that %esi is used in the original , and %ebx in the second version; is there any difference in semantics defined between %esi and %ebx in general? (I don't know much about x86 assembly).

Another example of this error being reported in gcc is when you have a loop that executes for a constant number of iterations, but you are using the counter variable as an index into an array that has less than that number of items, such as:
int a[50], x;
for( i=0; i < 1000; i++) x = a[i];
The compiler can determine that this loop will try to access memory outside of the array 'a'. The compiler complains about this with this rather cryptic message:
iteration xxu invokes undefined behavior [-Werror=aggressive-loop-optimizations]

What I cannot get is why i value is broken by that overflow operation?
It seems that integer overflow occurs in 4th iteration (for i = 3).
signed integer overflow invokes undefined behavior. In this case nothing can be predicted. The loop may iterate only 4 times or it may go to infinite or anything else!
Result may vary compiler to compiler or even for different versions of same compiler.
C11: 1.3.24 undefined behavior:
behavior for which this International Standard imposes no requirements
[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.
—end note ]

GCC/g++ cout << vs. printf()

Why does printf("hello world") ends up using more CPU instructions in the assembled code (not considering the standard library used) than cout << "hello world"?
For C++ we have:
movl $.LC0, %esi
movl $_ZSt4cout, %edi
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
For C:
movl $.LC0, %eax
movq %rax, %rdi
movl $0, %eax
call printf
WHAT are line 2 from the C++ code and lines 2,3 from the C code for?
I'm using gcc version 4.5.2

For 64bit gcc -O3 (4.5.0) on Linux x86_64, this reads for: cout << "Hello World"
movl $11, %edx ; String length in EDX
movl $.LC0, %esi ; String pointer in ESI
movl $_ZSt4cout, %edi ; load virtual table entry of "cout" for "ostream"
call _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
and, for printf("Hello World")
movl $.LC0, %edi ; String pointer to EDI
xorl %eax, %eax ; clear EAX (maybe flag for printf=>no stack arguments)
call printf
which means, your sequence depends entirely on any specific
compiler implementation, its version and probably compiler options.
Your Edit states,you use gcc 4.5.2 (which is fairly new).
Seems like 4.5.2 introduces additional 64bit register fiddling in
this sequence for whatever reason. It saves the 64bit RAX to RDI
before zeroing it out - which makes absolutely no sense (at least for me).
Much more interesting: 3 Argument call sequence (g++ -O1 -S source.cpp):
void c_proc()
{
printf("%s %s %s", "Hello", "World", "!") ;
}
void cpp_proc()
{
std::cout << "Hello " << "World " << "!";
}
leads to (c_proc):
movl $.LC0, %ecx
movl $.LC1, %edx
movl $.LC2, %esi
movl $.LC3, %edi
movl $0, %eax
call printf
with .LCx being the strings, and no stack pointer involved!
For cpp_proc:
movl $6, %edx
movl $.LC4, %esi
movl $_ZSt4cout, %edi
call _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl $6, %edx
movl $.LC5, %esi
movl $_ZSt4cout, %edi
call _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl $1, %edx
movl $.LC0, %esi
movl $_ZSt4cout, %edi
call _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
You see now what this is all about.
Regards
rbo

The caller code is most of the time irrelevant to performance.
I guess the line 2 of the C++ code stores the address of std::cout as the implicit 'this' argument of the operator<< method.
and i might be wrong on the C part, but it seems to me that it is incomplete. the 32bit upper part of rax is not initialized in this snippet, it might be initialized earlier. (no, i'm wrong here).
from what i understand (i might be wrong), the problem with 64bit registers, is that most of the time they cannot be initialized by immediates, so you have to play with 32bit operations to get the desired result. so the compiler plays with 32bit registers to initialize the 64bit rdi register.
And it seems that printf takes the value of al (the LSB of eax) as an input that tells printf() how many xmm 128 registers are used as input. It looks like an optimization to be able to pass the input string into the xmm registers or some other funny business.

int printf( const char*, ...) is a variadic function that can take one or more arguments; whereas ostream& operator<< (ostream&, signed char*) takes exactly two. I believe that that accounts for the difference in instructions needed to invoke them.
Line 2 in the C++ disassembly is where it passes the ostream& (in this case cout). so the function knows what stream object it is outputting to.
Since both end up making a function call, the comparison is largely irrelevant; the code executed within the function call will be far more significant. The operator<< is overloaded for a number of right-hand-side types, and is resolved at compile time; printf() on the other hand must parse the format string at runtime to determine the data type so may incur additional overhead. Either way the amount of code executed within the functions will swamp the call overhead in terms of instructions executed, and will almost certainly be dominated by the OS code required to render the text on a graphical display. So in short you are sweating the small stuff.

movl is move long, 32-bit move
movq is move quad, 64-bit move
printf has a return value, either the number of characters written or -1 on failure, and that value is stored into %eax, that's all the extra line is worrying about.

Mapping class to instance

I'm trying to build a simple entity/component system in c++ base on the second answer to this question : Best way to organize entities in a game?
Now, I would like to have a static std::map that returns (and even automatically create, if possible).
What I am thinking is something along the lines of:
PositionComponent *pos = systems[PositionComponent].instances[myEntityID];
What would be the best way to achieve that?

Maybe this?
std::map< std::type_info*, Something > systems;
Then you can do:
Something sth = systems[ &typeid(PositionComponent) ];
Just out of curiosity, I checked the assembler code of this C++ code
#include <typeinfo>
#include <cstdio>
class Foo {
virtual ~Foo() {}
};
int main() {
printf("%p\n", &typeid(Foo));
}
to be sure that it is really a constant. Assembler (stripped) outputted by GCC (without any optimisations):
.globl _main
_main:
LFB27:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
pushl %ebx
LCFI2:
subl $20, %esp
LCFI3:
call L3
"L00000000001$pb":
L3:
popl %ebx
leal L__ZTI3Foo$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
leal LC0-"L00000000001$pb"(%ebx), %eax
movl %eax, (%esp)
call _printf
movl $0, %eax
addl $20, %esp
popl %ebx
leave
ret
So it actually has to read the L__ZTI3Foo$non_lazy_ptr symbol (I wonder though that this is not constant -- maybe with other compiler options or with other compilers, it is). So a constant may be slightly faster (if the compiler sees the constant at compile time) because you save a read.

You should create some constants (ex. POSITION_COMPONENT=1) and then map those integers to instances.

Does the evil cast get trumped by the evil compiler?

This is not academic code or a hypothetical quesiton. The original problem was converting code from HP11 to HP1123 Itanium. Basically it boils down to a compile error on HP1123 Itanium. It has me really scratching my head when reproducing it on Windows for study. I have stripped all but the most basic aspects... You may have to press control D to exit a console window if you run it as is:
#include "stdafx.h"
#include <iostream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
char blah[6];
const int IAMCONST = 3;
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
(*pTOCONST) = 7;
printf("IAMCONST %d \n",IAMCONST);
printf("WHATISPOINTEDAT %d \n",(*pTOCONST));
printf("Address of IAMCONST %x pTOCONST %x\n",&IAMCONST, (pTOCONST));
cin >> blah;
return 0;
}
Here is the output
IAMCONST 3
WHATISPOINTEDAT 7
Address of IAMCONST 35f9f0 pTOCONST 35f9f0
All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example.
Update:
Indeed after searching for a while the Menu Debug >> Windows >> Disassembly had exactly the optimization that was described below.
printf("IAMCONST %d \n",IAMCONST);
0024360E mov esi,esp
00243610 push 3
00243612 push offset string "IAMCONST %d \n" (2458D0h)
00243617 call dword ptr [__imp__printf (248338h)]
0024361D add esp,8
00243620 cmp esi,esp
00243622 call #ILT+325(__RTC_CheckEsp) (24114Ah)
Thank you all!

Looks like the compiler is optimizing
printf("IAMCONST %d \n",IAMCONST);
into
printf("IAMCONST %d \n",3);
since you said that IAMCONST is a const int.
But since you're taking the address of IAMCONST, it has to actually be located on the stack somewhere, and the constness can't be enforced, so the memory at that location (*pTOCONST) is mutable after all.
In short: you casted away the constness, don't do that. Poor, defenseless C...
Addendum
Using GCC for x86, with -O0 (no optimizations), the generated assembly
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $36, %esp
movl $3, -12(%ebp)
leal -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
movl $7, (%eax)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
copies from *(bp-12) on the stack to printf's arguments. However, using -O1 (as well as -Os, -O2, -O3, and other optimization levels),
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $3, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $7, 4(%esp)
movl $.LC1, (%esp)
call printf
you can clearly see that the constant 3 is used instead.
If you are using Visual Studio's CL.EXE, /Od disables optimization. This varies from compiler to compiler.
Be warned that the C specification allows the C compiler to assume that the target of any int * pointer never overlaps the memory location of a const int, so you really shouldn't be doing this at all if you want predictable behavior.

The constant value IAMCONST is being inlined into the printf call.
What you're doing is at best wrong and in all likelihood is undefined by the C++ standard. My guess is that the C++ standard leaves the compiler free to inline a const primitive which is local to a function declaration. The reason being that the value should not be able to change.
Then again, it's C++ where should and can are very different words.

You are lucky the compiler is doing the optimization. An alternative treatment would be to place the const integer into read-only memory, whereupon trying to modify the value would cause a core dump.

Writing to a const object through a cast that removes the const is undefined behavior - so at the point where you do this:
(*pTOCONST) = 7;
all bets are off.
From the C++ standard 7.1.5.1 (The cv-qualifiers):
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
Because of this, the compiler is free to assume that the value of IAMCONST will not change, so it can optimize away the access to the actual storage. In fact, if the address of the const object is never taken, the compiler may eliminate the storage for the object altogether.
Also note that (again in 7.1.5.1):
A variable of non-volatile const-qualified integral or enumeration type initialized by an integral constant expression can be used in integral constant expressions (5.19).
Which means IAMCONST can be used in compile-time constant expressions (ie., to provide a value for an enumeration or the size of an array). What would it even mean to change that at runtime?

It doesn't matter if the compiler optimizes or not. You asked for trouble and you're lucky you got the trouble yourself instead of waiting for customers to report it to you.
"All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example."
If you really believe that then you need to switch to a language you can understand, or change professions. For the sake of yourself and your customers, stop using C or C++ or C#.
const int IAMCONST = 3;
You said it.
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
Guess why the compiler complained if you omitted your evil cast. The compiler might have been telling the truth before you lied to it.
"Does the evil cast get trumped by the evil compiler?"
No. The evil cast gets trumped by itself. Whether or not your compiler tried to tell you the truth, the compiler was not evil.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

gcc stack optimization - c++

Related

Why does GCC 4.8.2 not propagate 'unused but set' optimization?

Why does this loop produce "warning: iteration 3u invokes undefined behavior" and output more than 4 lines?

GCC/g++ cout << vs. printf()

Mapping class to instance

Does the evil cast get trumped by the evil compiler?

Categories

Resources