In the C++11 standard there is a note regarding the array backing the uniform initialisation that states:
The implementation is free to allocate the array in read-only memory if an explicit array with the same initializer could be so allocated.
Do GCC, Clang, and VS take advantage of this? Or does every initialisation using this feature incur additional data on the stack, and additional initialisation time for this hidden array?
For instance, given the following example:
void function()
{
std::vector<std::string> values = { "First", "Second" };
...
Would each of the compilers mentioned above store the backing array for the uniform initialisation in the same memory as a variable declared static const? And would each of the compilers initialise the backing array when the function is called, or on application initialisation? (I'm not talking about the std::initializer_list<std::string> that would be created, but rather the "hidden array" it refers to.)
This is my attempt to answer my own question for at least GCC. My understanding of the assembler output of gcc is not fantastic, so please correct as necessary.
Using initializer_test.cpp:
#include <vector>
int main()
{
std::vector<long> values = { 123456, 123457, 123458 };
return 0;
}
And compiling with gcc v4.6.3 using the following command line:
g++ -Wa,-adhln -g initializer_test.cpp -masm=intel -std=c++0x -fverbose-asm | c++filt | view -
I get the following output (cut down to the hopefully relevant bits):
5:initializer_test.cpp **** std::vector<long> values = { 123456, 123457, 123458 };
100 .loc 2 5 0
101 0009 488D45EF lea rax, [rbp-17] # tmp62,
102 000d 4889C7 mov rdi, rax #, tmp62
103 .cfi_offset 3, -24
104 0010 E8000000 call std::allocator<long>::allocator() #
104 00
105 0015 488D45D0 lea rax, [rbp-48] # tmp63,
106 0019 BA030000 mov edx, 3 #, <-- Parameter 3
106 00
107 001e BE000000 mov esi, OFFSET FLAT:._42 #, <-- Parameter 2
107 00
108 0023 4889C7 mov rdi, rax #, tmp63 <-- Parameter 1
109 0026 E8000000 call std::initializer_list<long>::initializer_list(long const*, unsigned long) #
109 00
110 002b 488D4DEF lea rcx, [rbp-17] # tmp64,
111 002f 488B75D0 mov rsi, QWORD PTR [rbp-48] # tmp65, D.10602
112 0033 488B55D8 mov rdx, QWORD PTR [rbp-40] # tmp66, D.10602
113 0037 488D45B0 lea rax, [rbp-80] # tmp67,
114 003b 4889C7 mov rdi, rax #, tmp67
115 .LEHB0:
116 003e E8000000 call std::vector<long, std::allocator<long> >::vector(std::initializer_list<long>, std::allocator<long> const&) #
116 00
117 .LEHE0:
118 .loc 2 5 0 is_stmt 0 discriminator 1
119 0043 488D45EF lea rax, [rbp-17] # tmp68,
120 0047 4889C7 mov rdi, rax #, tmp68
121 004a E8000000 call std::allocator<long>::~allocator() #
and
1678 .section .rodata
1679 0002 00000000 .align 16
1679 00000000
1679 00000000
1679 0000
1682 ._42:
1683 0010 40E20100 .quad 123456
1683 00000000
1684 0018 41E20100 .quad 123457
1684 00000000
1685 0020 42E20100 .quad 123458
1685 00000000
Now, if I'm understanding the call on line 109 correctly in the context of the System V AMD64 ABI calling convention (I've annotated the parameters in the code listing), this shows that the backing array is stored in .rodata, which I take to be the same memory as static const data. At least for gcc 4.6, anyway.
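To make the annotated call concrete: the constructor in the listing receives a pointer and a length, which suggests that std::initializer_list<long> is a lightweight view over the hidden array rather than a container of its own. A minimal sketch of using that view follows; the helper names sum_elements and element_count are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <initializer_list>

// Sums the elements by walking the hidden array through the view's
// begin()/end() pointers.
long sum_elements(std::initializer_list<long> il)
{
    long total = 0;
    for (const long* p = il.begin(); p != il.end(); ++p)
        total += *p;
    return total;
}

// Number of elements in the hidden array, as recorded in the view.
std::size_t element_count(std::initializer_list<long> il)
{
    return il.size();
}
```

A call such as sum_elements({ 123456, 123457, 123458 }) only reads the array. Where the array itself lives (.rodata or the stack) is left to the implementation and is not observable through this interface.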
Performing a similar test but with optimisations turned on (-O2), it seems the initializer_list is optimised out:
70 .file 2 "/usr/include/c++/4.6/ext/new_allocator.h"
71 .loc 2 92 0
72 0004 BF180000 mov edi, 24 #,
72 00
73 0009 E8000000 call operator new(unsigned long) #
73 00
74 .LVL1:
75 .file 3 "/usr/include/c++/4.6/bits/stl_algobase.h"
76 .loc 3 366 0
77 000e 488B1500 mov rdx, QWORD PTR ._42[rip] # ._42, ._42
77 000000
90 .file 4 "/usr/include/c++/4.6/bits/stl_vector.h"
91 .loc 4 155 0
92 0015 4885C0 test rax, rax # D.11805
105 .loc 3 366 0
106 0018 488910 mov QWORD PTR [rax], rdx #* D.11805, ._42
107 001b 488B1500 mov rdx, QWORD PTR ._42[rip+8] # ._42, ._42
107 000000
108 0022 48895008 mov QWORD PTR [rax+8], rdx #, ._42
109 0026 488B1500 mov rdx, QWORD PTR ._42[rip+16] # ._42, ._42
109 000000
110 002d 48895010 mov QWORD PTR [rax+16], rdx #, ._42
124 .loc 4 155 0
125 0031 7408 je .L8 #,
126 .LVL3:
127 .LBB342:
128 .LBB343:
129 .loc 2 98 0
130 0033 4889C7 mov rdi, rax #, D.11805
131 0036 E8000000 call operator delete(void*) #
All in all, std::initializer_list is looking pretty optimal in gcc.
First of all: VC++, as of VS11 (VS2012) in its initial release, does not support initializer lists, so the question is a bit moot for VS at the moment. As I'm sure they'll patch this up, it should become relevant in a few months (or years).
As additional info, I'll add what VS 2012 does with local array initialization; everybody may draw their own conclusions as to what that means for when initializer lists are implemented:
Here's what VC++ 2012 emits for the initialization of a built-in array in the compiler's default release mode:
int _tmain(int argc, _TCHAR* argv[])
{
00B91002 in al,dx
00B91003 sub esp,28h
00B91006 mov eax,dword ptr ds:[00B94018h]
00B9100B xor eax,ebp
00B9100D mov dword ptr [ebp-4],eax
00B91010 push esi
int numbers[] = {1,2,3,4,5,6,7,8,9};
00B91011 mov dword ptr [numbers],1
00B91018 mov dword ptr [ebp-24h],2
00B9101F mov dword ptr [ebp-20h],3
00B91026 mov dword ptr [ebp-1Ch],4
00B9102D mov dword ptr [ebp-18h],5
00B91034 mov dword ptr [ebp-14h],6
00B9103B mov dword ptr [ebp-10h],7
00B91042 mov dword ptr [ebp-0Ch],8
00B91049 mov dword ptr [ebp-8],9
...
So this array is created/filled at function execution, no "static" storage involved as such.
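For comparison, here is a sketch of the two alternatives being contrasted: an automatic array, whose initialisers are written when the function runs (as in the disassembly above), and a static const array, which a compiler is allowed to place in read-only storage. The function names are hypothetical, and an optimising compiler may well transform either form.

```cpp
// Automatic array: formally re-created and filled on every call.
int sum_local()
{
    int numbers[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int total = 0;
    for (int n : numbers)
        total += n;
    return total;
}

// Static const array: a single object for the whole program, eligible
// for read-only storage; callers only ever read it.
const int* static_numbers()
{
    static const int numbers[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    return numbers;
}
```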
I'm using a very simple sample that uses an int pointer to point to a structure of longs. Granted, it isn't the preferred method, but it is done to mimic other code. The objective is to view the data in the register before the free call.
This is the code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memset */
//#include <unistd.h>
typedef struct
{
unsigned long x;
unsigned long y;
unsigned long z;
}
myStruct;
int main () {
int *p_Struct = (int *)0;
int size = sizeof (myStruct);
printf("Size of (bytes)...\n");
printf(" myStruct : %zu\n", sizeof (myStruct));
p_Struct = ( int*) malloc(size);
memset((int *)p_Struct, 0, size);
((myStruct *)p_Struct)->x = 111;
((myStruct *)p_Struct)->y = 222;
((myStruct *)p_Struct)->z = 333;
free(p_Struct);
return(0);
}
Using the following gdb version to step through the code.
Using > gdb
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
gdb is used to start the application, and main is then disassembled to find the address of the call to free.
(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040064d <+0>: push %rbp
0x000000000040064e <+1>: mov %rsp,%rbp
0x0000000000400651 <+4>: sub $0x10,%rsp
=> 0x0000000000400655 <+8>: movq $0x0,-0x8(%rbp)
0x000000000040065d <+16>: movl $0x18,-0xc(%rbp)
0x0000000000400664 <+23>: mov $0x400770,%edi
0x0000000000400669 <+28>: callq 0x400500 <puts@plt>
0x000000000040066e <+33>: mov $0x18,%esi
0x0000000000400673 <+38>: mov $0x400783,%edi
0x0000000000400678 <+43>: mov $0x0,%eax
0x000000000040067d <+48>: callq 0x400510 <printf@plt>
0x0000000000400682 <+53>: mov -0xc(%rbp),%eax
0x0000000000400685 <+56>: cltq
0x0000000000400687 <+58>: mov %rax,%rdi
0x000000000040068a <+61>: callq 0x400550 <malloc@plt>
0x000000000040068f <+66>: mov %rax,-0x8(%rbp)
0x0000000000400693 <+70>: mov -0xc(%rbp),%eax
0x0000000000400696 <+73>: movslq %eax,%rdx
0x0000000000400699 <+76>: mov -0x8(%rbp),%rax
0x000000000040069d <+80>: mov $0x0,%esi
0x00000000004006a2 <+85>: mov %rax,%rdi
0x00000000004006a5 <+88>: callq 0x400520 <memset@plt>
0x00000000004006aa <+93>: mov -0x8(%rbp),%rax
0x00000000004006ae <+97>: movq $0x6f,(%rax)
0x00000000004006b5 <+104>: mov -0x8(%rbp),%rax
0x00000000004006b9 <+108>: movq $0xde,0x8(%rax)
0x00000000004006c1 <+116>: mov -0x8(%rbp),%rax
0x00000000004006c5 <+120>: movq $0x14d,0x10(%rax)
0x00000000004006cd <+128>: mov -0x8(%rbp),%rax
0x00000000004006d1 <+132>: mov %rax,%rdi
0x00000000004006d4 <+135>: callq 0x4004f0 <free@plt>
0x00000000004006d9 <+140>: mov $0x0,%eax
0x00000000004006de <+145>: leaveq
0x00000000004006df <+146>: retq
End of assembler dump.
Using that address, a breakpoint is set on the call to free.
(gdb) break *0x00000000004006d4
Continue until the code breaks on the free command.
(gdb) continue
Continuing.
Size of (bytes)...
myStruct : 24
Breakpoint 2, 0x00000000004006d4 in main () at freeQuestion.c:28
28 free(p_Struct);
Display the available registers.
(gdb) info reg
rax 0x602010 6299664
rbx 0x0 0
rcx 0x602010 6299664
rdx 0x18 24
rsi 0x0 0
rdi 0x602010 6299664
rbp 0x7fffffffc160 0x7fffffffc160
rsp 0x7fffffffc150 0x7fffffffc150
r8 0x602000 6299648
r9 0x18 24
r10 0x7fffffffbed0 140737488338640
r11 0x2aaaaad56700 46912498919168
r12 0x400560 4195680
r13 0x7fffffffc240 140737488339520
r14 0x0 0
r15 0x0 0
rip 0x4006d4 0x4006d4 <main+135>
eflags 0x283 [ CF SF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb)
I assume that the rdi register holds the address of the data being freed, 0x602010. To be sure all the data is visible, the examine command is used to display 80 bytes starting 16 bytes earlier.
(gdb) x/80d 0x602000
0x602000: 0 0 0 0 0 0 0 0
0x602008: 33 0 0 0 0 0 0 0
0x602010: 111 0 0 0 0 0 0 0
0x602018: -34 0 0 0 0 0 0 0
0x602020: 77 1 0 0 0 0 0 0
0x602028: -31 15 2 0 0 0 0 0
0x602030: 0 0 0 0 0 0 0 0
0x602038: 0 0 0 0 0 0 0 0
0x602040: 0 0 0 0 0 0 0 0
0x602048: 0 0 0 0 0 0 0 0
(gdb)
From the above, 111 is visible but not 222, or 333.
How can all the data (111,222,333) be viewed prior to the free command being executed?
From the above, 111 is visible but not 222, or 333.
There is no way that you could have observed this output while stopped before the CALL free instruction. We clearly see that values 0x6f == 111, 0xde == 222 and 0x14d == 333 are loaded at offset 0, 8 and 16 from $RAX:
0x00000000004006ae <+97>: movq $0x6f,(%rax)
0x00000000004006b9 <+108>: movq $0xde,0x8(%rax)
0x00000000004006c5 <+120>: movq $0x14d,0x10(%rax)
and then $RAX is copied to $RDI just before the call to free:
0x00000000004006d1 <+132>: mov %rax,%rdi
0x00000000004006d4 <+135>: callq 0x4004f0 <free@plt>
Here is the expected output (that I observe with your program):
(gdb) p/x $rdi
$1 = 0x602420
(gdb) x/6d $rdi
0x602420: 111 0 222 0
0x602430: 333 0
But if you execute nexti (to step over the call to free), then the values can be overwritten (you can't expect the contents of now freed memory to be anything in particular).
After nexti, I observe:
(gdb) x/6d 0x602420
0x602420: 0 0 222 0
0x602430: 333 0
but it could just as easily be 111 0 0 0 0 0 that you observed.
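One detail worth checking in the question's dump: it appears to print one signed byte per column (eight columns per 8-byte row). If so, 222 and 333 would not show up as those decimal numbers even when they are present in memory. The sketch below, assuming a little-endian target such as x86-64, computes how a 64-bit value looks when listed as signed bytes; signed_bytes is a hypothetical helper, not a gdb facility.

```cpp
#include <array>
#include <cstdint>

// Returns the 8 bytes of `value` in little-endian memory order, each
// interpreted as a signed 8-bit quantity, the way a byte-wise decimal
// dump would print them.
std::array<int, 8> signed_bytes(std::uint64_t value)
{
    std::array<int, 8> out{};
    for (int i = 0; i < 8; ++i) {
        std::uint8_t byte = static_cast<std::uint8_t>(value >> (8 * i));
        out[i] = byte < 128 ? static_cast<int>(byte)
                            : static_cast<int>(byte) - 256;
    }
    return out;
}
```

Under that assumption, 222 (0xde) lists as -34 followed by zeros, and 333 (0x14d) lists as 77 1 followed by zeros, which matches the rows at 0x602018 and 0x602020 in the question. A word-sized format such as x/3gd $rdi may make the three values easier to read.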
I'm in the process of writing a compiler purely as a learning experience. I'm currently learning about stack frames by compiling simple C++ code and then studying the output asm produced by gcc 4.9.2 for Windows x86.
My simple C++ code is:
#include <iostream>
using namespace std;
int globalVar;
void testStackStuff(void);
void testPassingOneInt32(int v);
void forceStackFrameCreation(int v);
int main()
{
globalVar = 0;
testStackStuff();
std::cout << globalVar << std::endl;
}
void testStackStuff(void)
{
testPassingOneInt32(666);
}
void testPassingOneInt32(int v)
{
globalVar = globalVar + v;
forceStackFrameCreation(v);
}
void forceStackFrameCreation(int v)
{
globalVar = globalVar + v;
}
Ok, when this is compiled with -mpreferred-stack-boundary=4 I was expecting to see a stack aligned to 16 bytes (technically it is aligned to 16 bytes but with an extra 16 bytes of unused stack space). The prologue for main as produced by gcc is:
22 .loc 1 12 0
23 .cfi_startproc
24 0000 8D4C2404 lea ecx, [esp+4]
25 .cfi_def_cfa 1, 0
26 0004 83E4F0 and esp, -16
27 0007 FF71FC push DWORD PTR [ecx-4]
28 000a 55 push ebp
29 .cfi_escape 0x10,0x5,0x2,0x75,0
30 000b 89E5 mov ebp, esp
31 000d 51 push ecx
32 .cfi_escape 0xf,0x3,0x75,0x7c,0x6
33 000e 83EC14 sub esp, 20
34 .loc 1 12 0
35 0011 E8000000 call ___main
35 00
36 .loc 1 13 0
37 0016 C7050000 mov DWORD PTR _globalVar, 0
38 .loc 1 15 0
39 0020 E8330000 call __Z14testStackStuffv
line 26 rounds esp down to the nearest 16 byte boundary.
lines 27, 28 and 31 push a total of 12 bytes onto the stack, then
line 33 subtracts another 20 bytes from esp, giving a total of 32 bytes!
Why?
line 39 then calls testStackStuff.
NOTE - this call pushes the return address (4 bytes).
Now, let's look at the prologue for testStackStuff, keeping in mind that the stack is now 4 bytes closer to the next 16 byte boundary.
67 0058 55 push ebp
68 .cfi_def_cfa_offset 8
69 .cfi_offset 5, -8
70 0059 89E5 mov ebp, esp
71 .cfi_def_cfa_register 5
72 005b 83EC18 sub esp, 24
73 .loc 1 22 0
74 005e C704249A mov DWORD PTR [esp], 666
line 67 pushes another 4 bytes (now 8 bytes towards the boundary).
line 72 subtracts another 24 bytes (total 32 bytes).
At this point the stack is now aligned correctly on a 16 byte boundary. But why the multiple of 2?
If I change the compiler flags to -mpreferred-stack-boundary=5 I would expect a stack aligned to 32 bytes, but again gcc seems to produce stack frames aligned to 64 bytes, twice the amount I was expecting.
Prologue for main
23 .cfi_startproc
24 0000 8D4C2404 lea ecx, [esp+4]
25 .cfi_def_cfa 1, 0
26 0004 83E4E0 and esp, -32
27 0007 FF71FC push DWORD PTR [ecx-4]
28 000a 55 push ebp
29 .cfi_escape 0x10,0x5,0x2,0x75,0
30 000b 89E5 mov ebp, esp
31 000d 51 push ecx
32 .cfi_escape 0xf,0x3,0x75,0x7c,0x6
33 000e 83EC34 sub esp, 52
34 .loc 1 12 0
35 0011 E8000000 call ___main
35 00
36 .loc 1 13 0
37 0016 C7050000 mov DWORD PTR _globalVar, 0
37 00000000
37 0000
38 .loc 1 15 0
39 0020 E8330000 call __Z14testStackStuffv
line 26 rounds esp down to the nearest 32 byte boundary
lines 27, 28 and 31 push a total of 12 bytes onto the stack, then
line 33 subtracts another 52 bytes from esp, giving a total of 64 bytes!
and the prologue for testStackStuff is
66 .cfi_startproc
67 0058 55 push ebp
68 .cfi_def_cfa_offset 8
69 .cfi_offset 5, -8
70 0059 89E5 mov ebp, esp
71 .cfi_def_cfa_register 5
72 005b 83EC38 sub esp, 56
73 .loc 1 22 0
(4 bytes on stack from) call __Z14testStackStuffv
(4 bytes on stack from) push ebp
(56 bytes on stack from) sub esp,56
total 64 bytes.
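The byte counts tallied above can be checked mechanically. This is only a sketch of the arithmetic taken from the listings (return address, saved ebp, and the sub esp amount); the function name bytes_below_caller is hypothetical.

```cpp
// Bytes placed below the caller's aligned stack pointer once the callee's
// prologue has finished: return address + saved ebp + reserved space.
constexpr int bytes_below_caller(int sub_esp_amount)
{
    return 4 /* return address pushed by call */
         + 4 /* push ebp */
         + sub_esp_amount;
}

// -mpreferred-stack-boundary=4: sub esp, 24
static_assert(bytes_below_caller(24) == 32, "4 + 4 + 24");
static_assert(bytes_below_caller(24) % 16 == 0, "16-byte alignment kept");

// -mpreferred-stack-boundary=5: sub esp, 56
static_assert(bytes_below_caller(56) == 64, "4 + 4 + 56");
static_assert(bytes_below_caller(56) % 32 == 0, "32-byte alignment kept");
```

By the same arithmetic, sub esp, 8 (for the 16-byte case) or sub esp, 24 (for the 32-byte case) would already restore alignment, so each listing reserves exactly one alignment unit more than the minimum.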
Does anybody know why gcc is creating this extra stack space or have I overlooked something obvious?
Thanks for any help you can offer.
In order to resolve this mystery, you will need to look at the documentation of gcc to find out exactly which flavor of Application Binary Interface (ABI) it uses, and then go find the specification of that ABI and read it. If you are "in the process of writing a compiler purely as a learning experience" you will definitely need it.
In short, and in broad terms, what is happening is that the ABI mandates that this extra space be reserved by the current function, for the purpose of passing parameters to functions invoked by the current function. The decision of how much space to reserve depends primarily on the amount of parameter passing that the function intends to do, but it is a bit more nuanced than that, and the ABI is the document which explains it in detail.
In the old style of stack frames, we would PUSH parameters to the stack, and then invoke a function.
In the new style of stack frames, EBP is no longer used for this (I'm not sure why it is still preserved and copied from ESP); parameters are placed on the stack at a specific offset from ESP, and then the function is invoked. This is evidenced by the fact that mov DWORD PTR [esp], 666 is used to pass the argument 666 in the call testPassingOneInt32(666);.
For why it's doing the push DWORD PTR [ecx-4] to copy the return address, see this partial duplicate. IIRC, it's constructing a complete copy of the return-address / saved-ebp pair.
but again gcc seems to produce stack frames aligned to 64 bytes
No, it used and esp, -32. The stack frame size looks like 64 bytes, but its alignment is only 32B.
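To illustrate the distinction between alignment and frame size: and esp, -32 clears the low five bits of the stack pointer, which rounds it down to a multiple of 32 regardless of how many bytes are reserved afterwards. A small sketch follows; the address values are made up and align_down is a hypothetical helper.

```cpp
#include <cstdint>

// Rounds `sp` down to a multiple of `boundary` (which must be a power of
// two), mirroring what `and esp, -boundary` does to the stack pointer.
constexpr std::uint32_t align_down(std::uint32_t sp, std::uint32_t boundary)
{
    return sp & ~(boundary - 1);
}

static_assert(align_down(0x0028FF3Cu, 32) == 0x0028FF20u, "and esp, -32");
static_assert(align_down(0x0028FF3Cu, 16) == 0x0028FF30u, "and esp, -16");
static_assert(align_down(0x0028FF20u, 32) == 0x0028FF20u, "already aligned");
```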
I'm not sure why it leaves so much extra space in the stack frame. It's not very interesting to guess why gcc -O0 does what it does, because it's not even trying to be optimal.
You obviously compiled without optimization, which makes the whole thing less interesting. This tells you more about gcc internals and what was convenient for gcc, not that the code it emitted was necessary or does anything useful. Also, use http://gcc.godbolt.org/ to get nice asm output without the CFI directives and other noise. (Please tidy up the asm code blocks in your question with output from that. All the noise makes them harder to read.)
I am trying to debug a tricky core dump (from an -O2 optimized binary).
// Caller Function
void caller(Container* c)
{
std::list < Message*> msgs;
if(!decoder.called(c->buf_, msgs))
{
....
.....
}
// Called Function
bool
Decoder::called(Buffer* buf, list < Message*>& msgs)
{
add_data(buf); // Inlined code to append buf to decoders buf chain
while(m_data_in && m_data_in->length() > 0)
{
.....
}
}
In both the caller and the callee, the first argument is optimized out, which means it must be in a register somewhere.
Caller Disassembly:
push %r15
mov %rdi,%r15
push %r14
push %r13
push %r12
push %rbp
push %rbx
sub $0x68,%rsp
test %rsi,%rsi
je 0x8ccd62
cmpq $0x0,(%rsi)
je 0x8ccd62
lea 0x40(%rsp),%rax
lea 0x1b8(%rdi),%rdi
mov %rax,(%rsp)
mov %rax,0x40(%rsp)
mov %rax,%rdx
mov %rax,0x48(%rsp)
mov (%rsi),%rsi
callq 0x8cc820
Caller Register Info:
rax 0x7fbfffc7e0 548682057696
rbx 0x2a97905ba0 182931446688
rcx 0x0 0
rdx 0x2 2
rsi 0x1 1
rdi 0x7fbfffc7e2 548682057698
rbp 0x4f 0x4f
rsp 0x7fbfffc870 0x7fbfffc870
r8 0x40 64
r9 0x20 32
r10 0x7fbfffc7e0 548682057696
r11 0x2abe466600 183580911104
r12 0x7fbfffd910 548682062096 // THIS IS HOLDING buf_
r13 0x7fbfffdec0 548682063552
r14 0x5dc 1500
r15 0x2a97905ba0 182931446688
rip 0x8cca89 0x8cca89
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
Called function Disassembly:
push %r14
push %r13
mov %rdx,%r13
push %r12
mov %rdi,%r12
push %rbp
push %rbx
sub $0x10,%rsp
mov 0x8(%rdi),%rdx
test %rdx,%rdx
jne 0x8cc843
jmpq 0x8cc9cb
mov %rax,%rdx
mov 0x8(%rdx),%rax
test %rax,%rax
mov %rsi,0x8(%rdx)
mov 0x8(%r12),%rax
test %rax,%rax
xor %edx,%edx
add 0x4(%rax),%edx
mov 0x8(%rax),%rax
lea 0x8(%rsp),%rsi
mov %r12,%rdi
movq $0x0,0x8(%rsp)
Called function Register Info :
rax 0x7fbfffc7e0 548682057696
rbx 0x2abc49f9c0 183547591104
rcx 0x0 0
rdx 0x2 2
rsi 0x1 1
rdi 0x7fbfffc7e2 548682057698
rbp 0xffffffff 0xffffffff
rsp 0x7fbfffc830 0x7fbfffc830
r8 0x40 64
r9 0x20 32
r10 0x7fbfffc7e0 548682057696
r11 0x2abe466600 183580911104
r12 0x2a97905d58 182931447128
r13 0x7fbfffc8b0 548682057904
r14 0x5dc 1500
r15 0x2a97905ba0 182931446688
rip 0x8cc88a 0x8cc88a
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
The issue is that, in the called function, "add_data" appears to have achieved nothing.
So I wanted to know whether the disassembly of the called function shows the "buf_" pointer being used anywhere (register r12 in the callee).
I understand assembly to some level, but all the code inlining has left me confused.
I would appreciate some help in demystifying the called function's disassembly.
UPDATE:
add_data does the following:
if (m_data_in) {
m_data_in->next = data;
} else {
m_data_in = data;
}
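For reference while reading the disassembly, here is a self-contained sketch of that logic. The Buffer layout and the Decoder members shown here are assumptions made for illustration; the real classes are not shown in the question.

```cpp
// Hypothetical stand-ins for the real types.
struct Buffer {
    int     len  = 0;
    Buffer* next = nullptr;
};

struct Decoder {
    Buffer* m_data_in = nullptr;

    // Mirrors the snippet above: link the new buffer after the head if a
    // head exists, otherwise make the new buffer the head.
    void add_data(Buffer* data)
    {
        if (m_data_in) {
            m_data_in->next = data;
        } else {
            m_data_in = data;
        }
    }
};
```

Note that, as written, a third call overwrites the head's next pointer rather than appending to the tail. Whether the real code walks the chain first cannot be determined from the snippet alone.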
This looks like if (m_data_in)
mov 0x8(%rdi),%rdx
test %rdx,%rdx
jne 0x8cc843
jmpq 0x8cc9cb
Now, I don't quite know where 0x8cc843 and 0x8cc9cb are located in your code, so can't really follow the code further. There is still not enough code & information to say exactly what is going on in the original question. I'm happy to fill in more of this answer if more information is provided.
What are the pros and cons of both approaches:
Case#1: const int& x = 5;
vs
Case#2: const int x = 5;
I know that in Case#1, x refers to a location in the executable's text, and I believe so does x in Case#2.
Why would you use one and not the other?
Using gcc explorer:
Case 1 (-O3):
main: # @main
xorl %eax, %eax
ret
global_constant:
.quad reference temporary for global_constant
reference temporary for global_constant:
.long 2048 # 0x800
Case 2 (-O3):
main: # @main
xorl %eax, %eax
ret
Using MinGW 4.8.1 g++ on Windows 8 (it is the same result, it just looks messier):
Case 1 (-O3):
.file "TestConstantsAsm.cpp"
.def __main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
.p2align 4,,15
.def _GLOBAL__sub_I_global_constant; .scl 3; .type 32; .endef
.seh_proc _GLOBAL__sub_I_global_constant
_GLOBAL__sub_I_global_constant:
.seh_endprologue
leaq _ZGR15global_constant0(%rip), %rax
movq %rax, global_constant(%rip)
ret
.seh_endproc
.section .ctors,"w"
.align 8
.quad _GLOBAL__sub_I_global_constant
.globl global_constant
.bss
.align 16
global_constant:
.space 8
.data
.align 4
_ZGR15global_constant0:
.long 2048
.ident "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
Case 2 (-O3):
.file "TestConstantsAsm.cpp"
.def __main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
Without -O3, the second case is still better than the first. It always is.
The commands I used were:
g++ -S -O3 TestConstantsAsm.cpp
g++ -S -O2 TestConstantsAsm.cpp
g++ -S -O TestConstantsAsm.cpp
and the program was:
const int& global_constant = 2048;
int main()
{
}
and
const int global_constant = 2048;
int main()
{
}
Even if you std::cout<<global_constant<<"\n"; within main (in both cases), the assembly for case 2 is still cleaner than case 1.
Using Visual C++ 2013, compiled in debug mode.
Case 1
The following code:
const int& VALUE = 2048;
int main() {
int A = VALUE;
return 0;
}
...produces this assembly output:
1: const int& VALUE = 2048;
013F1380 55 push ebp
013F1381 8B EC mov ebp,esp
013F1383 81 EC C0 00 00 00 sub esp,0C0h
013F1389 53 push ebx
013F138A 56 push esi
013F138B 57 push edi
013F138C 8D BD 40 FF FF FF lea edi,[ebp-0C0h]
013F1392 B9 30 00 00 00 mov ecx,30h
013F1397 B8 CC CC CC CC mov eax,0CCCCCCCCh
013F139C F3 AB rep stos dword ptr es:[edi]
013F139E C7 05 34 91 3F 01 00 08 00 00 mov dword ptr ds:[13F9134h],800h
013F13A8 C7 05 30 91 3F 01 34 91 3F 01 mov dword ptr ds:[13F9130h],13F9134h
013F13B2 5F pop edi
013F13B3 5E pop esi
013F13B4 5B pop ebx
013F13B5 8B E5 mov esp,ebp
013F13B7 5D pop ebp
013F13B8 C3 ret
(...)
2:
3: int main() {
00C42280 55 push ebp
00C42281 8B EC mov ebp,esp
00C42283 81 EC CC 00 00 00 sub esp,0CCh
00C42289 53 push ebx
00C4228A 56 push esi
00C4228B 57 push edi
00C4228C 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
00C42292 B9 33 00 00 00 mov ecx,33h
00C42297 B8 CC CC CC CC mov eax,0CCCCCCCCh
00C4229C F3 AB rep stos dword ptr es:[edi]
4: int A = VALUE;
00C4229E A1 30 91 C4 00 mov eax,dword ptr ds:[00C49130h]
00C422A3 8B 08 mov ecx,dword ptr [eax]
00C422A5 89 4D F8 mov dword ptr [A],ecx
5: return 0;
00C422A8 33 C0 xor eax,eax
6: }
00C422AA 5F pop edi
6: }
00C422AB 5E pop esi
00C422AC 5B pop ebx
00C422AD 8B E5 mov esp,ebp
00C422AF 5D pop ebp
00C422B0 C3 ret
In case 1, the value 2048 (0x800) is stored in memory, which is accessed when the constant is referenced in the main code.
Case 2
The following code:
const int VALUE = 2048;
int main() {
int A = VALUE;
return 0;
}
...produces this assembly output:
1: const int VALUE = 2048;
2:
3: int main() {
00F02280 55 push ebp
00F02281 8B EC mov ebp,esp
00F02283 81 EC CC 00 00 00 sub esp,0CCh
00F02289 53 push ebx
00F0228A 56 push esi
00F0228B 57 push edi
00F0228C 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
00F02292 B9 33 00 00 00 mov ecx,33h
00F02297 B8 CC CC CC CC mov eax,0CCCCCCCCh
00F0229C F3 AB rep stos dword ptr es:[edi]
4: int A = VALUE;
00F0229E C7 45 F8 00 08 00 00 mov dword ptr [A],800h
5: return 0;
00F022A5 33 C0 xor eax,eax
6: }
00F022A7 5F pop edi
00F022A8 5E pop esi
00F022A9 5B pop ebx
00F022AA 8B E5 mov esp,ebp
00F022AC 5D pop ebp
00F022AD C3 ret
In case 2, no code is generated for the global constant declaration itself. When the constant is used, it is simply a "move immediate" instruction which inserts the value 2048 (0x800) directly into the variable. If the constant is not used, it will not appear in the assembly at all.
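One practical consequence of this difference, sketched below: a const int initialised from a literal is a constant expression and can be used where the compiler needs a value at compile time, while reading through the reference is, at least formally, a load from the object it is bound to. The names kValue, kValueRef, and the two reader functions are hypothetical.

```cpp
const int  kValue    = 2048;  // Case 2: usable as a compile-time constant
const int& kValueRef = 2048;  // Case 1: bound to a temporary with static lifetime

// Only the plain const int is used where a constant expression is required.
static_assert(kValue == 2048, "known at compile time");
int table[kValue];

int read_value()     { return kValue; }     // can become an immediate
int read_reference() { return kValueRef; }  // formally reads through the reference
```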
Is there any difference in computational cost of
if(something){
return something;
}else{
return somethingElse;
}
and
if(something){
return something;
}
//else (put in comments for readability purposes)
return somethingElse;
In theory we have an extra keyword (else), but it doesn't seem that it should make an actual difference.
Edit:
After running the code for different set sizes, I found that there actually is a difference: the code without else appears to be about 1.5% more efficient. But it most likely depends on the compiler, as stated by many people below. The code I tested it on:
#include <ctime>    // clock, clock_t
#include <iostream> // cout, endl
int withoutElse(bool a){
if(a)
return 0;
return 1;
}
int withElse(bool a){
if(a)
return 0;
else
return 1;
}
int main(){
using namespace std;
bool a=true;
clock_t begin,end;
begin= clock();
for(__int64 i=0;i<1000000000;i++){
a=!a;
withElse(a);
}
end = clock();
cout<<end-begin<<endl;
begin= clock();
for(__int64 i=0;i<1000000000;i++){
a=!a;
withoutElse(a);
}
end = clock();
cout<<end-begin<<endl;
return 0;
}
Checked on loops from 1 000 000 to 1 000 000 000, and the results were consistently different.
Edit 2:
Assembly code (once again, generated using Visual Studio 2010) also shows a small difference (apparently, I'm no good with assemblers :()
?withElse@@YAH_N@Z PROC ; withElse, COMDAT
; Line 12
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-192]
mov ecx, 48 ; 00000030H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 13
movzx eax, BYTE PTR _a$[ebp]
test eax, eax
je SHORT $LN2@withElse
; Line 14
xor eax, eax
jmp SHORT $LN3@withElse
; Line 15
jmp SHORT $LN3@withElse
$LN2@withElse:
; Line 16
mov eax, 1
$LN3@withElse:
; Line 17
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?withElse@@YAH_N@Z ENDP ; withElse
and
?withoutElse@@YAH_N@Z PROC ; withoutElse, COMDAT
; Line 4
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-192]
mov ecx, 48 ; 00000030H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 5
movzx eax, BYTE PTR _a$[ebp]
test eax, eax
je SHORT $LN1@withoutEls
; Line 6
xor eax, eax
jmp SHORT $LN2@withoutEls
$LN1@withoutEls:
; Line 7
mov eax, 1
$LN2@withoutEls:
; Line 9
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?withoutElse@@YAH_N@Z ENDP ; withoutElse
In general the two forms are different, but the compiler may decide to emit the same jump in both cases (in practice it nearly always does).
The best way to see what a compiler does is to read the assembly. Assuming that you are using gcc, you can try:
gcc -g -c -fverbose-asm myfile.c; objdump -d -M intel -S myfile.o > myfile.s
which creates a mix of assembler/c code and makes the job easier at the beginning.
As for your example it is:
CASE1
if(something){
23: 83 7d fc 00 cmp DWORD PTR [ebp-0x4],0x0
27: 74 05 je 2e <main+0x19>
return something;
29: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
2c: eb 05 jmp 33 <main+0x1e>
}else{
return 0;
2e: b8 00 00 00 00 mov eax,0x0
}
CASE2
if(something){
23: 83 7d fc 00 cmp DWORD PTR [ebp-0x4],0x0
27: 74 05 je 2e <main+0x19>
return something;
29: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
2c: eb 05 jmp 33 <main+0x1e>
return 0;
2e: b8 00 00 00 00 mov eax,0x0
As you could imagine there are no differences!
It won't compile if you type `return
Keep in mind that once the code is compiled, all ifs, elses and loops are turned into gotos.
if (cond) { code A } code B
turns to
if cond is false jump to code b
code A
code B
and
if (cond) { code A } else { code B } code C
turns to
if cond is false jump to code B
code A
ALWAYS jump to code C
code B
code C
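The second lowering can be written out in C++ with explicit gotos to check that it behaves the same as the structured form. This only illustrates the control flow described above, not what any particular compiler emits; the function names are hypothetical.

```cpp
// if (cond) { return 0; } else { return 1; } written with explicit jumps.
int with_else_lowered(bool cond)
{
    int result;
    if (!cond) goto code_b;   // if cond is false jump to code B
    result = 0;               // code A
    goto code_c;              // ALWAYS jump to code C
code_b:
    result = 1;               // code B
code_c:
    return result;            // code C
}

// Structured version for comparison.
int with_else(bool cond)
{
    if (cond) {
        return 0;
    } else {
        return 1;
    }
}
```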
Most processors 'guess' whether they're going to jump or not before checking whether they actually jump. Depending on the processor, a failed guess might affect performance.
So the answer is YES (unless there's an ALWAYS jump at the end of the first comparison): the ALWAYS jump, which isn't present in the first if, will take 2-3 cycles.