So I'm taking a class where I'm given a single object file and need to reverse engineer it into C++ code. The command I'm told to use is "gdb assignment6_1.o" to open it in gdb, and "disass main" to see the assembly code.
I'm also using "objdump -dr assignment6_1.o" myself, since it outputs a little more information.
The problem I'm running into is that, using objdump, I can see that the program is trying to access what I believe is a variable or maybe a string, ".rodata+0x41". There are multiple .rodata references; that's just one example.
Is there a command or somewhere I can look to see what that's referencing? I also have access to the "Bless" program.
Below is a snippet of the disassembled code I have.
a3: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # aa <main+0x31>
a6: R_X86_64_PC32 .rodata+0x41
aa: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b1 <main+0x38>
ad: R_X86_64_PC32 _ZSt4cout-0x4
b1: e8 00 00 00 00 callq b6 <main+0x3d>
b2: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
b6: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # bd <main+0x44>
b9: R_X86_64_PC32 .rodata+0x53
bd: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # c4 <main+0x4b>
c0: R_X86_64_PC32 _ZSt4cout-0x4
c4: e8 00 00 00 00 callq c9 <main+0x50>
c5: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
c9: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # d0 <main+0x57>
cc: R_X86_64_PC32 .rodata+0x5e
d0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # d7 <main+0x5e>
d3: R_X86_64_PC32 _ZSt4cout-0x4
d7: e8 00 00 00 00 callq dc <main+0x63>
d8: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
dc: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # e3 <main+0x6a>
df: R_X86_64_PC32 .rodata+0x6e
e3: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # ea <main+0x71>
e6: R_X86_64_PC32 _ZSt4cout-0x4
ea: e8 00 00 00 00 callq ef <main+0x76>
eb: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
Is there a way to see what's inside a ".rodata+(memory location)" in an object file?
Sure. Both objdump and readelf can dump contents of any section.
Example:
// x.c
#include <stdio.h>
int foo() { return printf("AA.\n") + printf("BBBB.\n"); }
gcc -c x.c
objdump -dr x.o
...
9: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 10 <foo+0x10>
c: R_X86_64_PC32 .rodata-0x4
...
1f: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 26 <foo+0x26>
22: R_X86_64_PC32 .rodata+0x1
...
Note that because the RIP used in these instructions is the address of the next instruction, the actual data we care about is at .rodata+0 and .rodata+5: the effective address is the relocation addend plus 4, since the relocated 4-byte displacement field ends exactly where the next instruction (the RIP value) begins. The same adjustment applies to your original disassembly, so you care about .rodata+0x45, not .rodata+0x41.
So what's there?
objdump -sj.rodata x.o
x.o: file format elf64-x86-64
Contents of section .rodata:
0000 41412e0a 00424242 422e0a00 AA...BBBB...
or, using readelf:
readelf -x .rodata x.o
Hex dump of section '.rodata':
0x00000000 41412e0a 00424242 422e0a00 AA...BBBB...
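The same commands work on your file: "objdump -sj.rodata assignment6_1.o" (or "readelf -x .rodata assignment6_1.o") will dump the string data that those .rodata relocations point into.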
Will (or can?) a switch statement based on a template argument be removed by the compiler?
See for example the following function, in which the template argument funcStep1 is used to select the appropriate function. Will this switch statement be removed, since its argument is known at compile time? I tried to work this out from the assembly code (given below), but I have no experience reading assembly yet, so it is not really a feasible task for me.
This question has become relevant for me now that the newly introduced if constexpr provides an alternative that is guaranteed to be evaluated at compile time.
template<int funcStep1>
double Mdp::expectedCost(int sdx0, int x0, int x1, int adx0, int adx1) const
{
switch (funcStep1)
{
case 1:
case 3:
return expectedCost_exact(sdx0, x0, x1, adx0, adx1);
case 2:
case 4:
return expectedCost_approx(sdx0, x0, x1, adx0, adx1);
default:
throw string("Unknown value (Mdp::expectedCost.cc)");
}
}
// Define all the template types we need
template double Mdp::expectedCost<1>(int, int, int, int, int) const;
template double Mdp::expectedCost<2>(int, int, int, int, int) const;
template double Mdp::expectedCost<3>(int, int, int, int, int) const;
template double Mdp::expectedCost<4>(int, int, int, int, int) const;
Here is the output of 'objdump -D' when the above code is compiled with gcc -O2 -ffunction-sections:
expectedCost.o: file format elf64-x86-64
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 08 00 or %al,(%rax)
6: 00 00 add %al,(%rax)
8: 09 00 or %eax,(%rax)
...
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 0a 00 or (%rax),%al
6: 00 00 add %al,(%rax)
8: 0b 00 or (%rax),%eax
...
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 0c 00 or $0x0,%al
6: 00 00 add %al,(%rax)
8: 0d .byte 0xd
9: 00 00 add %al,(%rax)
...
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 0e (bad)
5: 00 00 add %al,(%rax)
7: 00 0f add %cl,(%rdi)
9: 00 00 add %al,(%rax)
...
Disassembly of section .bss:
0000000000000000 <_ZStL8__ioinit>:
...
Disassembly of section .text._ZNK3Mdp12expectedCostILi1EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi1EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi1EEEdiiiii+0x5>
Disassembly of section .text._ZNK3Mdp12expectedCostILi2EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi2EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi2EEEdiiiii+0x5>
Disassembly of section .text._ZNK3Mdp12expectedCostILi3EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi3EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi3EEEdiiiii+0x5>
Disassembly of section .text._ZNK3Mdp12expectedCostILi4EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi4EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi4EEEdiiiii+0x5>
Disassembly of section .text.startup._GLOBAL__sub_I_expectedCost.cc:
0000000000000000 <_GLOBAL__sub_I_expectedCost.cc>:
0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 7 <_GLOBAL__sub_I_expectedCost.cc+0x7>
7: 48 83 ec 08 sub $0x8,%rsp
b: e8 00 00 00 00 callq 10 <_GLOBAL__sub_I_expectedCost.cc+0x10>
10: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 17 <_GLOBAL__sub_I_expectedCost.cc+0x17>
17: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # 1e <_GLOBAL__sub_I_expectedCost.cc+0x1e>
1e: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # 25 <_GLOBAL__sub_I_expectedCost.cc+0x25>
25: 48 83 c4 08 add $0x8,%rsp
29: e9 00 00 00 00 jmpq 2e <_GLOBAL__sub_I_expectedCost.cc+0x2e>
Disassembly of section .init_array:
0000000000000000 <.init_array>:
...
Disassembly of section .comment:
0000000000000000 <.comment>:
0: 00 47 43 add %al,0x43(%rdi)
3: 43 3a 20 rex.XB cmp (%r8),%spl
6: 28 55 62 sub %dl,0x62(%rbp)
9: 75 6e jne 79 <_ZStL8__ioinit+0x79>
b: 74 75 je 82 <_ZStL8__ioinit+0x82>
d: 20 37 and %dh,(%rdi)
f: 2e 33 2e xor %cs:(%rsi),%ebp
12: 30 2d 32 37 75 62 xor %ch,0x62753732(%rip) # 6275374a <_ZStL8__ioinit+0x6275374a>
18: 75 6e jne 88 <_ZStL8__ioinit+0x88>
1a: 74 75 je 91 <_ZStL8__ioinit+0x91>
1c: 31 7e 31 xor %edi,0x31(%rsi)
1f: 38 2e cmp %ch,(%rsi)
21: 30 34 29 xor %dh,(%rcx,%rbp,1)
24: 20 37 and %dh,(%rdi)
26: 2e 33 2e xor %cs:(%rsi),%ebp
29: 30 00 xor %al,(%rax)
Disassembly of section .eh_frame:
0000000000000000 <.eh_frame>:
0: 14 00 adc $0x0,%al
2: 00 00 add %al,(%rax)
4: 00 00 add %al,(%rax)
6: 00 00 add %al,(%rax)
8: 01 7a 52 add %edi,0x52(%rdx)
b: 00 01 add %al,(%rcx)
d: 78 10 js 1f <.eh_frame+0x1f>
f: 01 1b add %ebx,(%rbx)
11: 0c 07 or $0x7,%al
13: 08 90 01 00 00 10 or %dl,0x10000001(%rax)
19: 00 00 add %al,(%rax)
1b: 00 1c 00 add %bl,(%rax,%rax,1)
1e: 00 00 add %al,(%rax)
20: 00 00 add %al,(%rax)
22: 00 00 add %al,(%rax)
24: 05 00 00 00 00 add $0x0,%eax
29: 00 00 add %al,(%rax)
2b: 00 10 add %dl,(%rax)
2d: 00 00 add %al,(%rax)
2f: 00 30 add %dh,(%rax)
31: 00 00 add %al,(%rax)
33: 00 00 add %al,(%rax)
35: 00 00 add %al,(%rax)
37: 00 05 00 00 00 00 add %al,0x0(%rip) # 3d <.eh_frame+0x3d>
3d: 00 00 add %al,(%rax)
3f: 00 10 add %dl,(%rax)
41: 00 00 add %al,(%rax)
43: 00 44 00 00 add %al,0x0(%rax,%rax,1)
47: 00 00 add %al,(%rax)
49: 00 00 add %al,(%rax)
4b: 00 05 00 00 00 00 add %al,0x0(%rip) # 51 <.eh_frame+0x51>
51: 00 00 add %al,(%rax)
53: 00 10 add %dl,(%rax)
55: 00 00 add %al,(%rax)
57: 00 58 00 add %bl,0x0(%rax)
5a: 00 00 add %al,(%rax)
5c: 00 00 add %al,(%rax)
5e: 00 00 add %al,(%rax)
60: 05 00 00 00 00 add $0x0,%eax
65: 00 00 add %al,(%rax)
67: 00 14 00 add %dl,(%rax,%rax,1)
6a: 00 00 add %al,(%rax)
6c: 6c insb (%dx),%es:(%rdi)
6d: 00 00 add %al,(%rax)
6f: 00 00 add %al,(%rax)
71: 00 00 add %al,(%rax)
73: 00 2e add %ch,(%rsi)
75: 00 00 add %al,(%rax)
77: 00 00 add %al,(%rax)
79: 4b 0e rex.WXB (bad)
7b: 10 5e 0e adc %bl,0xe(%rsi)
7e: 08 00 or %al,(%rax)
Yes, this is optimized. There are a few things that make reading assembly easier, such as demangling names (example: _ZNK3Mdp12expectedCostILi1EEEdiiiii is the mangled form of double Mdp::expectedCost<1>(int, int, int, int, int) const), stripping comments and directives, and using Intel syntax:
double expectedCost<1>(int, int, int, int, int): # #double expectedCost<1>(int, int, int, int, int)
jmp expectedCost_exact(int, int, int, int, int) # TAILCALL
double expectedCost<2>(int, int, int, int, int): # #double expectedCost<2>(int, int, int, int, int)
jmp expectedCost_approx(int, int, int, int, int) # TAILCALL
double expectedCost<3>(int, int, int, int, int): # #double expectedCost<3>(int, int, int, int, int)
jmp expectedCost_exact(int, int, int, int, int) # TAILCALL
double expectedCost<4>(int, int, int, int, int): # #double expectedCost<4>(int, int, int, int, int)
jmp expectedCost_approx(int, int, int, int, int) # TAILCALL
https://godbolt.org/z/ZtoKFH
The above site simplifies this whole process for you.
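If you'd rather demangle locally, the binutils c++filt tool does the same job (a quick sketch):
echo _ZNK3Mdp12expectedCostILi1EEEdiiiii | c++filt
double Mdp::expectedCost<1>(int, int, int, int, int) const
You can also pipe a whole disassembly through it (objdump -d expectedCost.o | c++filt), or pass -C to objdump to demangle directly.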
In this case I didn't provide definitions for expectedCost_exact and expectedCost_approx, so the compiler just leaves a jump. But in any case, compilers are definitely smart enough to realize that each template instantiation has a constant value in the switch.
The answer to your question is: Yes, any moderately useful compiler will perform dead code elimination.
if constexpr is not so much about forcing compile time evaluation for reasons of performance. In terms of performance, there's not really going to be any difference between if constexpr and a normal if when the given expression is a compile-time constant because compilers will end up optimizing the unused branch away either way. What if constexpr enables is to have code in the inactive branch that must not be instantiated with the given template arguments (e.g., because it would be invalid in that particular case). For your switch above, the whole code will be instantiated for all cases. Only afterwards will the unused code be removed by the optimizer. if constexpr on the other hand, guarantees that the code in the unused branch will never be instantiated to begin with. See, e.g., here for more on that…
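A minimal sketch of that difference (hypothetical names, C++17): with if constexpr, the discarded branch may contain code that would not even compile for the given type, because it is never instantiated:
#include <cstddef>
#include <string>
#include <type_traits>

template <typename T>
std::size_t describe(const T& value)
{
    if constexpr (std::is_same_v<T, std::string>)
        return value.size();  // discarded (never instantiated) for T = int
    else
        return sizeof(T);
}

// describe(42) compiles fine. With a plain `if`, the value.size() branch
// would be instantiated for T = int and fail to compile.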
We don't have switch constexpr, and there is no guarantee of branch elimination for a plain switch, even with a constexpr value (the same is true of a regular if, in fact), but I would expect the compiler to remove the dead branches at a suitable optimization level.
Notice also that your unused branches would still instantiate any template methods/objects they use, whereas if constexpr would not.
So if you want a guarantee that only the relevant code is present, or you want to avoid unneeded instantiations, use if constexpr. Otherwise, use whichever you find clearer.
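For reference, a sketch of the function from the question rewritten with if constexpr (same names as in the question; untested):
template<int funcStep1>
double Mdp::expectedCost(int sdx0, int x0, int x1, int adx0, int adx1) const
{
    if constexpr (funcStep1 == 1 || funcStep1 == 3)
        return expectedCost_exact(sdx0, x0, x1, adx0, adx1);
    else if constexpr (funcStep1 == 2 || funcStep1 == 4)
        return expectedCost_approx(sdx0, x0, x1, adx0, adx1);
    else
        throw string("Unknown value (Mdp::expectedCost.cc)");
}
Here the branches that don't match funcStep1 are discarded at compile time instead of being instantiated and then removed by the optimizer.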
I'm debugging a full memory dump (procdump -ma ...), and I'm investigating the call stack corresponding to the following piece of source code:
unsigned int __stdcall ExecutionThread(void* pArg)
{
__try
{
BOOL bRunning = TRUE;
CInternalManagerObject* pInternalManagerObject = (CInternalManagerObject*) pArg;
pInternalManagerObject->Init();
CInternaStartlManagerObject* pInternaStartlManagerObject = pInternalManagerObject->GetInternaStartlManagerObject();
while(bRunning)
{
bRunning = pInternalManagerObject->Poll(pInternaStartlManagerObject);
if (CSLGlobal::IsValidHandle(_Module.m_hNeverEvent))
WaitForSingleObject(_Module.m_hNeverEvent, 15);
} <<<<<<<<<<<<<<<<============== here is the call stack pointer
pInternalManagerObject->DeInit();
As you can see, pArg is typecast and then used, so it should be impossible for pArg to be NULL, and yet this is exactly what the watch window is telling me. On top of this, the internal variables seem not to be known (as also mentioned in the watch window).
Watch-window content :
pArg 0x0000000000000000 void *
bRunning identifier "bRunning" is undefined
pInternalManagerObject identifier "pInternalManagerObject" is undefined
I can understand bRunning being optimised away, as this variable is not used any more, but that doesn't explain pInternalManagerObject, which is still used on the following line.
The symbols seem to be loaded fine.
I'm viewing this using Visual Studio Professional 2017, version 15.8.8.
Does anybody have a clue what might be causing this weird behaviour and what I can do in order to get a dump with correct values for the internal variables?
Edit after question for generated assembly code
The generated assembly is:
27:
28: unsigned int __stdcall ExecutionThread(void* pArg)
29: {
00007FF69C7A1690 48 89 5C 24 08 mov qword ptr [rsp+8],rbx
00007FF69C7A1695 48 89 74 24 10 mov qword ptr [rsp+10h],rsi
00007FF69C7A169A 57 push rdi
00007FF69C7A169B 48 83 EC 20 sub rsp,20h
00007FF69C7A169F 48 8B F9 mov rdi,rcx
30: __try
31: {
32: BOOL bRunning = TRUE;
00007FF69C7A16A2 BB 01 00 00 00 mov ebx,1
33: CInternalManagerObject* pInternalManagerObject = (CInternalManagerObject*) pArg;
34:
35: pInternalManagerObject->Init();
00007FF69C7A16A7 E8 64 EA FD FF call CInternalManagerObject::Init (07FF69C780110h)
36:
37: CBaseManager* pBaseManager = pInternalManagerObject->GetBaseManager();
00007FF69C7A16AC 48 8B CF mov rcx,rdi
00007FF69C7A16AF E8 0C E9 FD FF call CInternalManagerObject::GetBaseManager (07FF69C77FFC0h)
00007FF69C7A16B4 48 8B F0 mov rsi,rax
40: {
41: bRunning = pInternalManagerObject->Poll(pBaseManager);
00007FF69C7A16B7 48 8B CF mov rcx,rdi
38:
39: while(bRunning)
00007FF69C7A16BA 85 DB test ebx,ebx
00007FF69C7A16BC 74 2E je ExecutionThread+5Ch (07FF69C7A16ECh)
40: {
41: bRunning = pInternalManagerObject->Poll(pBaseManager);
00007FF69C7A16BE 48 8B D6 mov rdx,rsi
40: {
41: bRunning = pInternalManagerObject->Poll(pBaseManager);
00007FF69C7A16C1 E8 7A ED FD FF call CInternalManagerObject::Poll (07FF69C780440h)
00007FF69C7A16C6 8B D8 mov ebx,eax
42:
43: if (CSLGlobal::IsValidHandle(_Module.m_hNeverEvent))
00007FF69C7A16C8 48 8D 0D C1 13 0E 00 lea rcx,[_Module+550h (07FF69C882A90h)]
00007FF69C7A16CF E8 3C F2 FB FF call __Skyline_Global::CSLGlobal::IsValidHandle (07FF69C760910h)
00007FF69C7A16D4 85 C0 test eax,eax
00007FF69C7A16D6 74 12 je ExecutionThread+5Ah (07FF69C7A16EAh)
44: WaitForSingleObject(_Module.m_hNeverEvent, 15);
00007FF69C7A16D8 BA 0F 00 00 00 mov edx,0Fh
00007FF69C7A16DD 48 8B 0D AC 13 0E 00 mov rcx,qword ptr [_Module+550h (07FF69C882A90h)]
00007FF69C7A16E4 FF 15 16 0B 08 00 call qword ptr [__imp_WaitForSingleObject (07FF69C822200h)]
45: }
00007FF69C7A16EA EB CB jmp ExecutionThread+27h (07FF69C7A16B7h)
46:
47: pInternalManagerObject->DeInit();
00007FF69C7A16EC E8 FF E7 FD FF call CInternalManagerObject::DeInit (07FF69C77FEF0h)
48: }
I suppose this means that the correct value of pArg can be found in register RDI.
The Register window gives me following information:
RAX = 0000000000000000
RBX = 0000000000000001
RCX = 0000000000000000
RDX = 0000000000000000
RSI = 00000072A1E83220
RDI = 00000072A14A9990
...
Having a look into the memory at the mentioned place, I see hexadecimal values like:
0x00000072A14A9990 98 59 82 9c f6 7f 00 00 01 00 00 00 00 00 08 00 28 d2 28 62 f9 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50 0e 78 a2 72 00 ˜Y.œö...........(Ò(bù...................................P.x¢r.
0x00000072A14A99CE 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 5c 07 00 00 00 00 00 00 d0 07 00 02 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ..ÿÿÿÿ............\.......Ð.......ÿÿÿÿÿÿÿÿÿÿÿÿ................
0x00000072A14A9A0C 00 00 00 00 d0 07 00 02 00 00 00 00 38 59 82 9c f6 7f 00 00 f0 90 60 a2 72 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 68 59 82 9c f6 7f 00 00 00 00
Does this mean that pArg is not NULL after all? (Sorry, but I'm not experienced in assembly debugging.)
Does this mean that pArg is not NULL after all?
No, it does not mean that: pArg is null. The watch window tells you so, and so do the registers.
As you can see, pArg is typecast and then used, so it should be impossible for pArg to be NULL.
That's not correct; that's not what the cast does. If the variable is null, then the result of the cast will also be null.
https://en.cppreference.com/w/c/language/cast
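For example (using the type from your own code), the cast just relabels the pointer value; it performs no check:
void* pArg = nullptr;
CInternalManagerObject* p = (CInternalManagerObject*)pArg; // p is still null
p->Init(); // dereferencing it is undefined behaviour (typically a crash)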
I suppose this means that the correct value of pArg can be found in
register RDI.
No; pArg is passed in rcx (the first argument register in the x64 calling convention), and mov copies right to left (source on the right, destination on the left):
mov rcx,rdi
RCX = 0000000000000000
https://c9x.me/x86/html/file_module_x86_id_176.html
I can understand bRunning being optimised away, as this variable is not used any more, but that doesn't explain pInternalManagerObject, which is still used on the following line.
My guess is that you observed the watch window while the program counter was on the first line of your function, where bRunning and pInternalManagerObject are not yet in scope (although they could also have been stripped by optimisation). Note that if a variable is stripped, you won't be able to see it even where it is used.
Thoughts
Program defensively: call assert (or whatever assertion macro the codebase uses) to check the value of pArg (or any other pointer) before dereferencing it. If it's an error you could reasonably see in production, go one step further: log the unexpected behaviour and early-out of the function. http://www.cplusplus.com/reference/cassert/assert/
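A minimal sketch of that pattern applied to your thread function (the logging call is a placeholder for whatever the codebase uses):
#include <cassert>

unsigned int __stdcall ExecutionThread(void* pArg)
{
    assert(pArg != nullptr); // fail fast in debug builds

    // Defensive early-out for release builds, where assert compiles to nothing.
    if (pArg == nullptr)
    {
        // LogError("ExecutionThread called with null pArg"); // hypothetical logger
        return 1;
    }

    CInternalManagerObject* pInternalManagerObject =
        static_cast<CInternalManagerObject*>(pArg);
    pInternalManagerObject->Init();
    // ... rest of the thread loop as in the original code ...
    return 0;
}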
KISS: Whilst I'd commend anyone who's willing to get their "hands dirty", in this case it's just not necessary to start cracking open the disassembly; the answer's right there. https://en.wikipedia.org/wiki/KISS_principle
Additionally, you get a better response on SO if the question is phrased in a way that's easy to read. Remember to explain what you are doing and what the problem is before going into code, and state the fault you are facing (along with any error output) as part of asking the question. https://stackoverflow.com/help/how-to-ask
I created code with two functions, returnValues and returnValuesVoid. One returns a tuple of two values, and the other takes references to the arguments.
#include <iostream>
#include <tuple>
std::tuple<int, int> returnValues(const int a, const int b) {
return std::tuple(a,b);
}
void returnValuesVoid(int &a,int &b) {
a += 100;
b += 100;
}
int main() {
auto [x,y] = returnValues(10,20);
std::cout << x ;
std::cout << y ;
int a = 10, b = 20;
returnValuesVoid(a, b);
std::cout << a ;
std::cout << b ;
}
I read about http://en.cppreference.com/w/cpp/language/structured_binding
which can destruct tuple to auto [x,y] variables.
Is auto [x,y] = returnValues(10,20); better than passing by reference? As far as I know it's slower, because it has to return a tuple object, while the reference version works directly on the original variables passed to the function, so there's no reason to use it except cleaner code.
Since auto [x,y] is only available from C++17, do people use it in production? It looks cleaner than returnValuesVoid, which returns void, but does it have other advantages over passing by reference?
Look at the disassembly (compiled with gcc -O3). It takes more instructions to implement the tuple version of the function:
0000000000000000 <returnValues(int, int)>:
0: 83 c2 64 add $0x64,%edx
3: 83 c6 64 add $0x64,%esi
6: 48 89 f8 mov %rdi,%rax
9: 89 17 mov %edx,(%rdi)
b: 89 77 04 mov %esi,0x4(%rdi)
e: c3 retq
f: 90 nop
0000000000000010 <returnValuesVoid(int&, int&)>:
10: 83 07 64 addl $0x64,(%rdi)
13: 83 06 64 addl $0x64,(%rsi)
16: c3 retq
But fewer instructions on the caller side for the tuple version:
0000000000000000 <callTuple()>:
0: 48 83 ec 18 sub $0x18,%rsp
4: ba 14 00 00 00 mov $0x14,%edx
9: be 0a 00 00 00 mov $0xa,%esi
e: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
13: e8 00 00 00 00 callq 18 <callTuple()+0x18> // call returnValues
18: 8b 74 24 0c mov 0xc(%rsp),%esi
1c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
23: e8 00 00 00 00 callq 28 <callTuple()+0x28> // std::cout::operator<<
28: 8b 74 24 08 mov 0x8(%rsp),%esi
2c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
33: e8 00 00 00 00 callq 38 <callTuple()+0x38> // std::cout::operator<<
38: 48 83 c4 18 add $0x18,%rsp
3c: c3 retq
3d: 0f 1f 00 nopl (%rax)
0000000000000040 <callRef()>:
40: 48 83 ec 18 sub $0x18,%rsp
44: 48 8d 74 24 0c lea 0xc(%rsp),%rsi
49: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
4e: c7 44 24 08 0a 00 00 movl $0xa,0x8(%rsp)
55: 00
56: c7 44 24 0c 14 00 00 movl $0x14,0xc(%rsp)
5d: 00
5e: e8 00 00 00 00 callq 63 <callRef()+0x23> // call returnValuesVoid
63: 8b 74 24 08 mov 0x8(%rsp),%esi
67: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
6e: e8 00 00 00 00 callq 73 <callRef()+0x33> // std::cout::operator<<
73: 8b 74 24 0c mov 0xc(%rsp),%esi
77: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
7e: e8 00 00 00 00 callq 83 <callRef()+0x43> // std::cout::operator<<
83: 48 83 c4 18 add $0x18,%rsp
87: c3 retq
I don't think there is any considerable performance difference, and the tuple one is clearer and more readable.
I also tried inlining the calls: there is absolutely no difference at all. Both of them generate exactly the same assembly code.
0000000000000000 <callTuple()>:
0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
7: 48 83 ec 08 sub $0x8,%rsp
b: be 6e 00 00 00 mov $0x6e,%esi
10: e8 00 00 00 00 callq 15 <callTuple()+0x15>
15: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
1c: be 78 00 00 00 mov $0x78,%esi
21: 48 83 c4 08 add $0x8,%rsp
25: e9 00 00 00 00 jmpq 2a <callTuple()+0x2a> // TCO, optimized way to call a function and also return
2a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000030 <callRef()>:
30: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
37: 48 83 ec 08 sub $0x8,%rsp
3b: be 6e 00 00 00 mov $0x6e,%esi
40: e8 00 00 00 00 callq 45 <callRef()+0x15>
45: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
4c: be 78 00 00 00 mov $0x78,%esi
51: 48 83 c4 08 add $0x8,%rsp
55: e9 00 00 00 00 jmpq 5a <callRef()+0x2a> // TCO, optimized way to call a function and also return
Focus on what's more readable and which approach gives the reader better intuition, and keep the performance issues you think might arise in the background.
A function that returns a tuple (or a pair, a struct, etc.) announces clearly that it returns something, which almost always has some meaning the caller should take into account.
A function that hands its results back through reference parameters may slip past the attention of a tired reader.
So, in general, prefer to return the results in a tuple.
Mike van Dyke pointed to this link:
F.21: To return multiple "out" values, prefer returning a tuple or struct
Reason
A return value is self-documenting as an "output-only"
value. Note that C++ does have multiple return values, by convention
of using a tuple (including pair), possibly with the extra convenience
of tie at the call site.
[...]
Exception
Sometimes, we need to pass an object to a function to manipulate its state. In such cases, passing the object by reference T& is usually the right technique.
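Following the guideline, here is a sketch of the struct variant it mentions (hypothetical type name); the named fields document the outputs, and structured bindings still work at the call site:
struct Values { int x; int y; };

Values returnValuesStruct(const int a, const int b)
{
    return {a, b}; // named fields document what comes back
}

// At the call site:
// auto [x, y] = returnValuesStruct(10, 20);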
Using another compiler (VS 2017), the resulting code shows no difference, as the function calls are just optimized away.
int main() {
00007FF6A9C51E50 sub rsp,28h
auto [x,y] = returnValues(10,20);
std::cout << x ;
00007FF6A9C51E54 mov edx,0Ah
00007FF6A9C51E59 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
std::cout << y ;
00007FF6A9C51E5E mov edx,14h
00007FF6A9C51E63 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
int a = 10, b = 20;
returnValuesVoid(a, b);
std::cout << a ;
00007FF6A9C51E68 mov edx,6Eh
00007FF6A9C51E6D call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
std::cout << b ;
00007FF6A9C51E72 mov edx,78h
00007FF6A9C51E77 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
}
00007FF6A9C51E7C xor eax,eax
00007FF6A9C51E7E add rsp,28h
00007FF6A9C51E82 ret
So using clearer code seems to be the obvious choice.
What Zang said is true, but not the whole picture. I ran the code provided in the question with chrono to measure the time, and I think the answer needs editing after observing what happened.
For 1M iterations, the call by reference took 3 ms, while the call via std::tie combined with std::tuple took about 94 ms.
Though the difference looks small in practice, the tuple version will still perform slightly slower. Hence, for performance-intensive systems, I suggest using call by reference.
My code:
#include <iostream>
#include <tuple>
#include <chrono>
std::tuple<int, int> returnValues(const int a, const int b)
{
return std::tuple<int, int>(a, b);
}
void returnValuesVoid(int &a, int &b)
{
a += 100;
b += 100;
}
int main()
{
int a = 10, b = 20;
auto begin = std::chrono::high_resolution_clock::now();
int x, y;
for (int i = 0; i < 1000000; i++)
{
std::tie(x, y) = returnValues(a, b);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << double(std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count()) << '\n';
a = 10;
b = 20;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; i++)
{
returnValuesVoid(a, b);
}
auto stop = std::chrono::high_resolution_clock::now();
std::cout << double(std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()) << '\n';
}
void a() { ... }
void b() { ... }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
...
}
Assume f is called multiple times from various threads (potentially contended) after the entry of main (and, of course, that the only calls to a and b are those seen above).
When the above code is compiled with g++ 4.6 in -std=gnu++0x mode:
Q1. Is it guaranteed that a() will be called at least once and return before b() is called? That is to ask: on the first call to f(), is the constructor of x called at the same time an automatic-duration (non-static) local variable would be (and not, for example, at global static initialization time)?
Q2. Is it guaranteed that b() will be called exactly once, even if two threads execute f for the first time at the same time on different cores? If yes, by which specific mechanism does the GCC-generated code provide synchronization? Edit: Additionally, could one of the threads calling f() obtain access to x before the constructor of X returns?
Update: I am trying to compile an example and disassemble it to investigate the mechanism...
test.cpp:
struct X;
void ext1(int x);
void ext2(X& x);
void a() { ext1(1); }
void b() { ext1(2); }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
ext2(x);
}
Then:
$ g++ -std=gnu++0x -c -o test.o ./test.cpp
$ objdump -d test.o -M intel > test.dump
test.dump:
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z1av>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: bf 01 00 00 00 mov edi,0x1
9: e8 00 00 00 00 call e <_Z1av+0xe>
e: 5d pop rbp
f: c3 ret
0000000000000010 <_Z1bv>:
10: 55 push rbp
11: 48 89 e5 mov rbp,rsp
14: bf 02 00 00 00 mov edi,0x2
19: e8 00 00 00 00 call 1e <_Z1bv+0xe>
1e: 5d pop rbp
1f: c3 ret
0000000000000020 <_Z1fv>:
20: 55 push rbp
21: 48 89 e5 mov rbp,rsp
24: 41 54 push r12
26: 53 push rbx
27: e8 00 00 00 00 call 2c <_Z1fv+0xc>
2c: b8 00 00 00 00 mov eax,0x0
31: 0f b6 00 movzx eax,BYTE PTR [rax]
34: 84 c0 test al,al
36: 75 2d jne 65 <_Z1fv+0x45>
38: bf 00 00 00 00 mov edi,0x0
3d: e8 00 00 00 00 call 42 <_Z1fv+0x22>
42: 85 c0 test eax,eax
44: 0f 95 c0 setne al
47: 84 c0 test al,al
49: 74 1a je 65 <_Z1fv+0x45>
4b: 41 bc 00 00 00 00 mov r12d,0x0
51: bf 00 00 00 00 mov edi,0x0
56: e8 00 00 00 00 call 5b <_Z1fv+0x3b>
5b: bf 00 00 00 00 mov edi,0x0
60: e8 00 00 00 00 call 65 <_Z1fv+0x45>
65: bf 00 00 00 00 mov edi,0x0
6a: e8 00 00 00 00 call 6f <_Z1fv+0x4f>
6f: 5b pop rbx
70: 41 5c pop r12
72: 5d pop rbp
73: c3 ret
74: 48 89 c3 mov rbx,rax
77: 45 84 e4 test r12b,r12b
7a: 75 0a jne 86 <_Z1fv+0x66>
7c: bf 00 00 00 00 mov edi,0x0
81: e8 00 00 00 00 call 86 <_Z1fv+0x66>
86: 48 89 d8 mov rax,rbx
89: 48 89 c7 mov rdi,rax
8c: e8 00 00 00 00 call 91 <_Z1fv+0x71>
Disassembly of section .text._ZN1XC2Ev:
0000000000000000 <_ZN1XC1Ev>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 83 ec 10 sub rsp,0x10
8: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi
c: e8 00 00 00 00 call 11 <_ZN1XC1Ev+0x11>
11: c9 leave
12: c3 ret
I don't see the synchronization mechanism. Or is it added at link time?
Update 2: OK, when I link it I can see it...
400973: 84 c0 test %al,%al
400975: 75 2d jne 4009a4 <_Z1fv+0x45>
400977: bf 98 20 40 00 mov $0x402098,%edi
40097c: e8 1f fe ff ff callq 4007a0 <__cxa_guard_acquire@plt>
400981: 85 c0 test %eax,%eax
400983: 0f 95 c0 setne %al
400986: 84 c0 test %al,%al
400988: 74 1a je 4009a4 <_Z1fv+0x45>
40098a: 41 bc 00 00 00 00 mov $0x0,%r12d
400990: bf a0 20 40 00 mov $0x4020a0,%edi
400995: e8 a6 00 00 00 callq 400a40 <_ZN1XC1Ev>
40099a: bf 98 20 40 00 mov $0x402098,%edi
40099f: e8 0c fe ff ff callq 4007b0 <__cxa_guard_release@plt>
4009a4: bf a0 20 40 00 mov $0x4020a0,%edi
4009a9: e8 72 ff ff ff callq 400920 <_Z4ext2R1X>
4009ae: 5b pop %rbx
4009af: 41 5c pop %r12
4009b1: 5d pop %rbp
It surrounds the initialisation with __cxa_guard_acquire and __cxa_guard_release, whatever they do.
Q1. Yes. According to C++11, 6.7/4:
such a variable is initialized the first time control passes through its declaration
so it will be initialised after the first call to a().
Q2. Under GCC, and any compiler that supports the C++11 thread model: yes, initialisation of local static variables is thread safe. Other compilers might not give that guarantee. The exact mechanism is an implementation detail. I believe GCC uses an atomic flag to indicate whether it's initialised, and a mutex to protect initialisation when the flag is not set, but I could be wrong. Certainly, this thread implies that it was originally implemented like that.
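For illustration only, here is the gist of that mechanism in portable C++. This is not GCC's actual implementation (which uses the __cxa_guard_* runtime calls shown below); it just sketches the flag-plus-mutex idea:
#include <atomic>
#include <mutex>
#include <new>

struct X { X() { /* runs b() in the question's code */ } };

alignas(X) static unsigned char storage[sizeof(X)]; // uninitialised space for x
static std::atomic<bool> initialised{false};        // the "guard" flag
static std::mutex init_mutex;                       // protects the slow path

X& lazy_x()
{
    if (!initialised.load(std::memory_order_acquire))      // fast path: flag check
    {
        std::lock_guard<std::mutex> lock(init_mutex);      // slow path: serialize
        if (!initialised.load(std::memory_order_relaxed))  // re-check under lock
        {
            new (storage) X();                             // construct exactly once
            initialised.store(true, std::memory_order_release);
        }
    }
    return *reinterpret_cast<X*>(storage);
}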
UPDATE: your code does indeed contain the initialisation code. You can see it more clearly if you link it, and then disassemble the program, so that you can see which functions are being called. I also used objdump -SC to interleave the source and demangle C++ names. It uses internal locking functions __cxa_guard_acquire and __cxa_guard_release, to make sure only one thread executes the initialisation code.
#void f()
#{
400724: push rbp
400725: mov rbp,rsp
400728: push r13
40072a: push r12
40072c: push rbx
40072d: sub rsp,0x8
# a();
400731: call 400704 <a()>
# static X x;
# if (!guard) {
400736: mov eax,0x601050
40073b: movzx eax,BYTE PTR [rax]
40073e: test al,al
400740: jne 400792 <f()+0x6e>
# if (__cxa_guard_acquire(&guard)) {
400742: mov edi,0x601050
400747: call 4005c0 <__cxa_guard_acquire@plt>
40074c: test eax,eax
40074e: setne al
400751: test al,al
400753: je 400792 <f()+0x6e>
# // initialise x
400755: mov ebx,0x0
40075a: mov edi,0x601058
40075f: call 4007b2 <X::X()>
# __cxa_guard_release(&guard);
400764: mov edi,0x601050
400769: call 4005e0 <__cxa_guard_release@plt>
# } else {
40076e: jmp 400792 <f()+0x6e>
# // already initialised
400770: mov r12d,edx
400773: mov r13,rax
400776: test bl,bl
400778: jne 400784 <f()+0x60>
40077a: mov edi,0x601050
40077f: call 4005f0 <__cxa_guard_abort@plt>
400784: mov rax,r13
400787: movsxd rdx,r12d
40078a: mov rdi,rax
40078d: call 400610 <_Unwind_Resume@plt>
# }
# }
# ext2(x);
400792: mov edi,0x601058
400797: call 4007d1 <_Z4ext2R1X>
#}
As far as I know it is guaranteed that b is only called once. However, before C++11 the standard gave no guarantee that the initialisation is performed in a thread-safe way, which means another thread could potentially work with a half-initialized or uninitialized x. (That's kind of funny, because static mutexes are basically useless that way.)