I have been learning C++ for the past couple of months. I know with functions your first declare parameters like so:
int myFunc(int funcVar);
and then you can pass in an integer variable to that function like so:
int x = 5;
myFunc(x);
When passing an argument to a function I would usually think of it like assigning and copying the value of x into the parameter of myFunc, which in C++ would look like this:
funcVar = x;
However, I noticed when declaring functions which have parameters of references (or pointers):
int myFunc(int & funcVar);
that I can either pass in the variable x to myFunc:
myFunc(x);
which would look like (in my mind):
&funcVar = x;
or you can pass in an actual reference as the argument
int & rX = x;
myFunc(rX);
and the function would work as well which with my thinking would look like this statement in C++
int & funcVar = rX
which would not make sense assigning a reference to a reference. My question is then how does the compiler actually load in arguments in a function? Should I not think of it like assigning the value of the variable to the parameter of the function?
When you call a function, each parameter of the function is initialized (not assigned). The rules for this are the same as the rules for any other copy-initialization. So if you have
int myFunc(int funcVar);
int x = 5;
myFunc(x);
then funcVar is initialized as though by a statement like this:
int funcVar = x;
and if you have
int myFunc(int & funcVar);
myFunc(x);
int & rX = x;
myFunc(rX);
then funcVar is initialized (and not assigned) as though by statements like this:
int & funcVar = x;
int & funcVar = rX;
The initialization of a reference binds it to the object or function denoted by the initializer. The second initialization does make sense---the expression rX denotes the object x because rX is a reference bound to x. Therefore, initializing a reference with rX has the same effect as initializing a reference with x.
Let us make easy code and disassemble.
int by_value(int x) { return x; }
int by_reference(int &x) { return x; }
int by_pointer(int *x) { return *x; }
int main()
{
int x = 1;
by_value(x);
by_reference(x);
by_pointer(&x);
return 0;
}
$ g++ -g -O0 a.cpp ; objdump -dS a.out
In my environment (x86_64, g++ (SUSE Linux) 4.8.3 20140627), result is as following.
(full text is here http://ideone.com/Z5G8yz)
00000000004005dd <_Z8by_valuei>:
int by_value(int x) { return x; }
4005dd: 55 push %rbp
4005de: 48 89 e5 mov %rsp,%rbp
4005e1: 89 7d fc mov %edi,-0x4(%rbp)
4005e4: 8b 45 fc mov -0x4(%rbp),%eax
4005e7: 5d pop %rbp
4005e8: c3 retq
00000000004005e9 <_Z12by_referenceRi>:
int by_reference(int &x) { return x; }
4005e9: 55 push %rbp
4005ea: 48 89 e5 mov %rsp,%rbp
4005ed: 48 89 7d f8 mov %rdi,-0x8(%rbp)
4005f1: 48 8b 45 f8 mov -0x8(%rbp),%rax
4005f5: 8b 00 mov (%rax),%eax
4005f7: 5d pop %rbp
4005f8: c3 retq
00000000004005f9 <_Z10by_pointerPi>:
int by_pointer(int *x) { return *x; }
4005f9: 55 push %rbp
4005fa: 48 89 e5 mov %rsp,%rbp
4005fd: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400601: 48 8b 45 f8 mov -0x8(%rbp),%rax
400605: 8b 00 mov (%rax),%eax
400607: 5d pop %rbp
400608: c3 retq
0000000000400609 <main>:
int main()
{
400609: 55 push %rbp
40060a: 48 89 e5 mov %rsp,%rbp
40060d: 48 83 ec 10 sub $0x10,%rsp
int x = 1;
400611: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
by_value(x);
400618: 8b 45 fc mov -0x4(%rbp),%eax
40061b: 89 c7 mov %eax,%edi
40061d: e8 bb ff ff ff callq 4005dd <_Z8by_valuei>
by_reference(x);
400622: 48 8d 45 fc lea -0x4(%rbp),%rax
400626: 48 89 c7 mov %rax,%rdi
400629: e8 bb ff ff ff callq 4005e9 <_Z12by_referenceRi>
by_pointer(&x);
40062e: 48 8d 45 fc lea -0x4(%rbp),%rax
400632: 48 89 c7 mov %rax,%rdi
400635: e8 bf ff ff ff callq 4005f9 <_Z10by_pointerPi>
return 0;
40063a: b8 00 00 00 00 mov $0x0,%eax
}
by_reference(x) is as same as by_pointer(&x) !
It makes perfect sense to assign a reference to another reference (when first defining it, i.e. at initialization), and that's what actually happens. A reference is just an alias, so when you assign a reference to another reference you are just saying that the first one aliases the one you assigned. Example
int x = 42;
int& rx = x;
int& ry = rx;
++ry;
std::cout << x; // displays 43
Live on Coliru
Related
Consider this simple function:
struct Foo {
int a;
int b;
int c;
int d;
int e;
int f;
};
Foo foo() {
Foo f;
f.a = 1;
f.b = 2;
f.c = 3;
f.d = 4;
f.e = 5;
f.f = 6;
return f;
}
It generates the following assembly:
0000000000400500 <foo()>:
400500: 48 ba 01 00 00 00 02 movabs rdx,0x200000001
400507: 00 00 00
40050a: 48 b9 03 00 00 00 04 movabs rcx,0x400000003
400511: 00 00 00
400514: 48 be 05 00 00 00 06 movabs rsi,0x600000005
40051b: 00 00 00
40051e: 48 89 17 mov QWORD PTR [rdi],rdx
400521: 48 89 4f 08 mov QWORD PTR [rdi+0x8],rcx
400525: 48 89 77 10 mov QWORD PTR [rdi+0x10],rsi
400529: 48 89 f8 mov rax,rdi
40052c: c3 ret
40052d: 0f 1f 00 nop DWORD PTR [rax]
Based on the assembly, I understand that the caller created space for Foo on its stack, and passed that information in rdi to the callee.
I am trying to find documentation for this convention. Calling convention in linux states that rdi contains the first integer argument. In this case, foo doesn't have any arguments.
Moreover, if I make foo take one integer argument, that is now passed as rsi (register for second argument) with rdi used for address of the return object.
Can anyone provide some documentation and clarity on how rdi is used in system V ABI?
See section 3.2.3 Parameter Passing in the ABI docs which says:
If the type has class MEMORY, then the caller provides space for the
return value and passes the address of this storage in %rdi as if it
were the first argument to the function. In effect, this address
becomes a "hidden" first argument.
On return %rax will contain the address that has been passed in by the
caller in %rdi.
I am trying to figure out how the C++ binary code looks like, especially for virtual function calls. I have come up with few curious things. I have this following C++ code:
#include <iostream>
using namespace std;
class Base {
public:
virtual void print() { cout << "from base" << endl; }
};
class Derived : public Base {
public:
virtual void print() { cout << "from derived" << endl; }
};
int main() {
Base *b;
Derived d;
d.print();
b = &d;
b->print();
return 0;
}
I compiled it with clang++, and then use objdump:
00000000004008b0 <main>:
4008b0: 55 push rbp
4008b1: 48 89 e5 mov rbp,rsp
4008b4: 48 83 ec 20 sub rsp,0x20
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008bc: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4008c3: e8 28 00 00 00 call 4008f0 <Derived::Derived()>
4008c8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008cc: e8 5f 00 00 00 call 400930 <Derived::print()>
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d5: 48 89 7d f0 mov QWORD PTR [rbp-0x10],rdi
4008d9: 48 8b 7d f0 mov rdi,QWORD PTR [rbp-0x10]
4008dd: 48 8b 07 mov rax,QWORD PTR [rdi]
4008e0: ff 10 call QWORD PTR [rax]
4008e2: 31 c0 xor eax,eax
4008e4: 48 83 c4 20 add rsp,0x20
4008e8: 5d pop rbp
4008e9: c3 ret
4008ea: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
My question is why in assembly code, we have the following code:
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
The local variable d in main() is stored at location [rbp-0x18]. This is in the automatic storage allocated on the stack for main().
lea rdi,[rbp-0x18]
This line loads the address of d into the rdi register. By convention, member functions of Derived treat rdi as the this pointer.
This question already has answers here:
C++ performance of accessing member variables versus local variables
(11 answers)
Closed 2 years ago.
I am trying to write a code as efficient as possible and I encountered the following situation:
int foo(int a, int b, int c)
{
return (a + b) % c;
}
All good! But what if I want to check if the result of the expression to be different of a constant lets say myConst. Lets say I can afford a temporary variable.
What method is the fastest of the following:
int foo(int a, int b, int c)
{
return (((a + b) % c) != myConst) ? (a + b) % c : myException;
}
or
int foo(int a, int b, int c)
{
int tmp = (a + b) % c
return (tmp != myConst) ? tmp : myException;
}
I can't decide. Where is the 'line' where recalculation is more expensive than allocating and deallocating a temporary variable or the other way around.
Don't worry about it, write concise code and leave micro-optimizations to the compiler.
In your example writing the same calculation twice is error prone - so do not do this. In your specific example, compiler is more than likely to avoid creating a temporary on the stack at all!
Your example can (does on my compiler) produce following assembly (i have replaced myConst with constexpr 42 and myException with 0):
foo(int, int, int):
leal (%rdi,%rsi), %eax # this adds a and b, puts result to eax
movl %edx, %ecx # loads c
cltd
idivl %ecx # performs division, puts result into edx
movl $0, %eax #, prepares to return exception value
cmpl $42, %edx #, compares result of division with magic const
cmovne %edx, %eax # overwrites pessimized exception if all is cool
ret
As you see, there is no temporary anywhere in sight!
Use the later.
You're not computing the same value twice.
The code is more clear.
Creating local variables on the stack doesn't take any significant amount of time.
Check the assembler code this generates for both versions. You most likely want the highest optimization settings for your compiler.
You may very well find out the compiler itself can figure out the intermediate value is used twice, but only inside the function, so safe to store in a register.
To add to what has already been posted, ease of debugging is at least as important as code efficiency, (if there is any effect on code efficiency which, as others have posted, is unlikely with optimization on).
Go with the easiest to follow, test and debug.
Use a temp var.
If more developers used simpler, non-compound expressions and more temp vars, there would be far fewer 'Help - I cannot debug my code!' posts to SO.
Hard coded values
The following only applies to hard coded values (even if they aren't const or constexpr)
In the following example on MSVC 2015 they were optimized away completely and replaced only with mov edx, result (=1 in this example):
#include <iostream>
#include <exception>
int myConst{4};
int myException{2};
int foo1(int a,int b,int c)
{
return (((a + b) % c) != myConst) ? (a + b) % c : myException;
}
int foo2(int a,int b,int c)
{
int tmp = (a + b) % c;
return (tmp != myConst) ? tmp : myException;
}
int main()
{
00007FF71F0E1000 48 83 EC 28 sub rsp,28h
auto test1{foo1(5,2,3)};
auto test2{foo2(5,2,3)};
std::cout << test1 <<'\n';
00007FF71F0E1004 48 8B 0D 75 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF71F0E3080h)]
00007FF71F0E100B BA 01 00 00 00 mov edx,1
00007FF71F0E1010 FF 15 72 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF71F0E3088h)]
00007FF71F0E1016 48 8B C8 mov rcx,rax
00007FF71F0E1019 E8 B2 00 00 00 call std::operator<<<std::char_traits<char> > (07FF71F0E10D0h)
std::cout << test2 <<'\n';
00007FF71F0E101E 48 8B 0D 5B 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF71F0E3080h)]
00007FF71F0E1025 BA 01 00 00 00 mov edx,1
00007FF71F0E102A FF 15 58 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF71F0E3088h)]
00007FF71F0E1030 48 8B C8 mov rcx,rax
00007FF71F0E1033 E8 98 00 00 00 call std::operator<<<std::char_traits<char> > (07FF71F0E10D0h)
return 0;
00007FF71F0E1038 33 C0 xor eax,eax
}
00007FF71F0E103A 48 83 C4 28 add rsp,28h
00007FF71F0E103E C3 ret
At this point other have pointed out that the optimization will not happen if the values were passed, or if we had separate files, but it seems even if the code is in separate compilation units the optimization is still done and we don't get any instructions for these functions:
#include <iostream>
#include <exception>
#include "Header.h"
int main()
{
00007FF667BF1000 48 83 EC 28 sub rsp,28h
int var1{5},var2{2},var3{3};
auto test1{foo1(var1,var2,var3)};
auto test2{foo2(var1,var2,var3)};
std::cout << test1 <<'\n';
00007FF667BF1004 48 8B 0D 75 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF667BF3080h)]
00007FF667BF100B BA 01 00 00 00 mov edx,1
00007FF667BF1010 FF 15 72 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF667BF3088h)]
00007FF667BF1016 48 8B C8 mov rcx,rax
00007FF667BF1019 E8 B2 00 00 00 call std::operator<<<std::char_traits<char> > (07FF667BF10D0h)
std::cout << test2 <<'\n';
00007FF667BF101E 48 8B 0D 5B 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF667BF3080h)]
00007FF667BF1025 BA 01 00 00 00 mov edx,1
00007FF667BF102A FF 15 58 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF667BF3088h)]
00007FF667BF1030 48 8B C8 mov rcx,rax
00007FF667BF1033 E8 98 00 00 00 call std::operator<<<std::char_traits<char> > (07FF667BF10D0h)
return 0;
00007FF667BF1038 33 C0 xor eax,eax
}
00007FF667BF103A 48 83 C4 28 add rsp,28h
00007FF667BF103E C3 ret
I have the following class:
template<ItType I, LockType L>
class ArcItBase;
with a (one of them) constructor:
ArcItBase ( StableRootedDigraph& g_, Node const n_ ) noexcept :
srd ( g_ ),
arc ( I == ItType::in
? srd.nodes [ n_ ].head_in
: srd.nodes [ n_ ].head_out ) { }
The question is (which I don't see how to test) whether the value of the expression for the constructor of arc will be determined at compile-time or at run-time (Release, full optimization, clang-cl and VC14), given that I == ItType::in can be evaluated (is known, I is either ItType::in or ItType::out) at compile-time to either true or false?
It is not possible to have your code compiling without knowing the ItType at compile time.
The template parameter is evaluated at compile time and the conditional is a core constant expression, standard reference is C++11 5.19/2.
In the contrasting case the compiler would have to generate code that is equivalent to
arc(true ? : )
Which if you would actually write it would be optimized. However the rest of the conditional will not be optimized since you are accessing a what seems to be a non static member and cannot be evaluated as a core constant expression.
However, compilers may not always work as we expect so if you would actually want to test this you should dump the disassembled object file
objdump -DS file.o
and then you can better navigate the output.
Another option would be to launch the debugger and inspect the code.
Don't forget that you can always have your symbols even in case of optimizing, e.g.
g++ -O3 -g -c foo.cpp
Below you will find a toy implementation . In the first case values are given to the constructor of arcbase is called as:
arcbase<true> a(10,9);
Whereas in the second it is given non const random values that cannot be known at compile time.
After compiling with g++ --stc=c++11 -c -O3 -g the first case creates:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 89 75 f4 mov %esi,-0xc(%rbp)
f: 89 55 f0 mov %edx,-0x10(%rbp)
12: 48 8b 45 f8 mov -0x8(%rbp),%rax
16: 8b 55 f0 mov -0x10(%rbp),%edx
19: 8b 4d f4 mov -0xc(%rbp),%ecx
1c: 89 ce mov %ecx,%esi
1e: 48 89 c7 mov %rax,%rdi
21: e8 00 00 00 00 callq 26 <arcbase<true>::arcbase(int, int)+0x26>
26: 48 8b 45 f8 mov -0x8(%rbp),%rax
2a: 8b 00 mov (%rax),%eax
2c: 48 8b 55 f8 mov -0x8(%rbp),%rdx
30: 48 83 c2 08 add $0x8,%rdx
34: 89 c6 mov %eax,%esi
36: 48 89 d7 mov %rdx,%rdi
39: e8 00 00 00 00 callq 3e <arcbase<true>::arcbase(int, int)+0x3e>
3e: c9 leaveq
3f: c3 retq
Whereas the second case:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 53 push %rbx
1: 48 89 fb mov %rdi,%rbx
4: e8 00 00 00 00 callq 9 <arcbase<true>::arcbase(int, int)+0x9>
9: 48 8d 7b 08 lea 0x8(%rbx),%rdi
d: 8b 33 mov (%rbx),%esi
f: 5b pop %rbx
10: e9 00 00 00 00 jmpq 15 <arcbase<true>::arcbase(int, int)+0x15>
Looking at the dissasembly you should notice that even in the first case the value of 10 is not directly passed as is to the constructor, but instead only placed in the register from where is is retrieved.
Here is the output from gdb :
0x400910 <_ZN3arcC2Ei> mov %esi,(%rdi)
0x400912 <_ZN3arcC2Ei+2> retq
0x400913 nop
0x400914 nop
0x400915 nop
0x400916 nop
0x400917 nop
0x400918 nop
0x400919 nop
0x40091a nop
0x40091b nop
0x40091c nop
0x40091d nop
0x40091e nop
0x40091f nop
0x400920 <_ZN7arcbaseILb1EEC2Eii> push %rbx
0x400921 <_ZN7arcbaseILb1EEC2Eii+1> mov %rdi,%rbx
0x400924 <_ZN7arcbaseILb1EEC2Eii+4> callq 0x400900 <_ZN3srdC2Eii>
0x400929 <_ZN7arcbaseILb1EEC2Eii+9> lea 0x8(%rbx),%rdi
0x40092d <_ZN7arcbaseILb1EEC2Eii+13> mov (%rbx),%esi
0x40092f <_ZN7arcbaseILb1EEC2Eii+15> pop %rbx
0x400930 <_ZN7arcbaseILb1EEC2Eii+16> jmpq 0x400910 <_ZN3arcC2Ei>
The code for the second case is :
struct llist
{
int head_in;
int head_out;
llist(int a , int b ) : head_in(a), head_out(b) {}
};
struct srd
{
llist nodes;
srd(int a, int b) : nodes(a,b) {}
};
struct arc
{
int y;
arc( int x):y(x) {}
};
template< bool I > class arcbase
{
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
void print()
{
std::cout << iarc.y << std::endl;
}
};
int main(void)
{
std::srand(time(0));
volatile int a_ = std::rand()%100;
volatile int b_ = std::rand()%4;
arcbase<true> a(a_,b_);
a.print();
return 0;
}
I wanted to see what was happening behind the scenes when an unsigned long long was assigned the value of an unsigned int. I made a simple C++ program to try it out and moved all the io out of main():
#include <iostream>
#include <stdlib.h>
void usage() {
std::cout << "Usage: ./u_to_ull <unsigned int>\n";
exit(0);
}
void atoiWarning(int foo) {
std::cout << "WARNING: atoi() returned " << foo << " and (unsigned int)foo is " <<
((unsigned int)foo) << "\n";
}
void result(unsigned long long baz) {
std::cout << "Result as unsigned long long is " << baz << "\n";
}
int main(int argc, char** argv) {
if (argc != 2) usage();
int foo = atoi(argv[1]);
if (foo < 0) atoiWarning(foo);
// Signed to unsigned
unsigned int bar = foo;
// Conversion
unsigned long long baz = -1;
baz = bar;
result(baz);
return 0;
}
The resulting assembly produced this for main:
0000000000400950 <main>:
400950: 55 push %rbp
400951: 48 89 e5 mov %rsp,%rbp
400954: 48 83 ec 20 sub $0x20,%rsp
400958: 89 7d ec mov %edi,-0x14(%rbp)
40095b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40095f: 83 7d ec 02 cmpl $0x2,-0x14(%rbp)
400963: 74 05 je 40096a <main+0x1a>
400965: e8 3a ff ff ff callq 4008a4 <_Z5usagev>
40096a: 48 8b 45 e0 mov -0x20(%rbp),%rax
40096e: 48 83 c0 08 add $0x8,%rax
400972: 48 8b 00 mov (%rax),%rax
400975: 48 89 c7 mov %rax,%rdi
400978: e8 0b fe ff ff callq 400788 <atoi#plt>
40097d: 89 45 f0 mov %eax,-0x10(%rbp)
400980: 83 7d f0 00 cmpl $0x0,-0x10(%rbp)
400984: 79 0a jns 400990 <main+0x40>
400986: 8b 45 f0 mov -0x10(%rbp),%eax
400989: 89 c7 mov %eax,%edi
40098b: e8 31 ff ff ff callq 4008c1 <_Z11atoiWarningi>
400990: 8b 45 f0 mov -0x10(%rbp),%eax
400993: 89 45 f4 mov %eax,-0xc(%rbp)
400996: 48 c7 45 f8 ff ff ff movq $0xffffffffffffffff,-0x8(%rbp)
40099d: ff
40099e: 8b 45 f4 mov -0xc(%rbp),%eax
4009a1: 48 89 45 f8 mov %rax,-0x8(%rbp)
4009a5: 48 8b 45 f8 mov -0x8(%rbp),%rax
4009a9: 48 89 c7 mov %rax,%rdi
4009ac: e8 66 ff ff ff callq 400917 <_Z6resulty>
4009b1: b8 00 00 00 00 mov $0x0,%eax
4009b6: c9 leaveq
4009b7: c3 retq
The -1 from the C++ makes it clear that -0x8(%rbp) corresponds to baz (due to $0xffffffffffffffff). -0x8(%rbp) is written to by %rax, but the top four bytes of %rax appear to not have been assigned, %eaxwas assigned
Does this suggest that the top 4 bytes of -0x8(%rbp) are undefined?
In the IntelĀ® 64 and IA-32 Architectures Software Developer Manuals, volume 1, chapter 3.4.1.1 (General-Purpose Registers in 64-Bit Mode), it says
32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
So after mov -0xc(%rbp),%eax, the upper half of rax is defined, and it's zero.
This also applies to the 87 C0 encoding of xchg eax, eax, but not to its 90 encoding (which is defined as nop, overruling the rule quoted above).
From C++98 (and C++11 seems to be unchanged) 4.7/2 (integral conversions - no promotions are relevant) we learn:
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2n where n is
the number of bits used to represent the unsigned type).
This clearly shows that as long as the source and destination are unsigned and the destination is at least as large as the source, the value will be unchanged. If the compiler generated code that failed to make the larger value equal, the compiler is buggy.