Operator[] long and short versions - C++

What is the advantage of using the longer version (something).operator[](i) instead of simply (something)[i]?
For example:
std::array<int, 10> arr1;
std::array<int, 10> arr2;

for (int i = 0; i < arr1.size(); i++)
    std::cout << arr1[i] << ' ';
std::cout << std::endl;

for (int i = 0; i < arr2.size(); i++)
    std::cout << arr2.operator[](i) << ' ';
std::cout << std::endl;

There is none. For a user-defined type, arr[i] is just syntactic sugar for arr.operator[](i). You only need the explicit operator syntax when you define these functions yourself. The same goes for all the other operators: operator(), operator[], operator new, operator=, and so on.
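As a quick illustration (my own sketch, not part of the original answer), here is a type that defines operator[]; both call forms below end up in the same member function:

#include <cstddef>
#include <iostream>

struct Buffer {
    int data[10] = {};

    // The explicit "operator" syntax is required here, at the definition.
    int& operator[](std::size_t i) { return data[i]; }
};

int main() {
    Buffer b;
    b[3] = 42;              // sugar: calls b.operator[](3)
    b.operator[](4) = 7;    // explicit form, same function
    std::cout << b[3] << ' ' << b[4] << '\n';   // prints: 42 7
}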

To show that [] really is just syntactic sugar, here is what both call forms compile to (compiled with g++ -g -std=gnu++0x ...):
0000000000400554 <main>:
#include <array>
int main() {
400554: 55 push %rbp
400555: 48 89 e5 mov %rsp,%rbp
400558: 48 83 ec 60 sub $0x60,%rsp
std::array<int, 10> arr1;
std::array<int, 10> arr2;
arr1[6];
40055c: 48 8d 45 d0 lea -0x30(%rbp),%rax
400560: be 06 00 00 00 mov $0x6,%esi
400565: 48 89 c7 mov %rax,%rdi
400568: e8 19 00 00 00 callq 400586 <std::array<int, 10ul>::operator[](unsigned long)>
arr2.operator[](6);
40056d: 48 8d 45 a0 lea -0x60(%rbp),%rax
400571: be 06 00 00 00 mov $0x6,%esi
400576: 48 89 c7 mov %rax,%rdi
400579: e8 08 00 00 00 callq 400586 <std::array<int, 10ul>::operator[](unsigned long)>
40057e: b8 00 00 00 00 mov $0x0,%eax
}
400583: c9 leaveq
400584: c3 retq
400585: 90 nop

Related

Temporary object for by-value function creation scope

I have the following code, which I cannot understand:
#include <cstdio>
#include <iostream>
using namespace std;

class A
{
public:
    int t = 0;
    A() {
        cout << "constructed" << t << endl;
    }
    A(A&& a) {
        cout << "in move ctor, moving" << a.t << endl;
    }
    ~A() {
        cout << "deleting" << t << endl;
    }
};

A f1(A a)
{
    a.t = 1;
    std::cout << "f1: " << endl;
    return a;
}

int main() {
    A a = f1(A());
    printf("what is happening\n");
}
and the output is
constructed0
in move ctor, moving0
f1:
in move ctor, moving1
in move ctor, moving0
deleting0
deleting1
deleting0
what is happening
deleting0
What I cannot understand is the point at which the temporary object created for f1 (the one with a.t = 1) is destroyed.
From the output I assume it is destroyed at the end of the line A a = f1(A());, whereas I thought it was created inside f1 and for f1, and would therefore be destroyed when exiting the function, before the first deleting0 is printed.
What am I missing?
So after a bit of research I have the answer.
Here is the disassembly of the code (changed the move constructor to a copy constructor for readability):
int A::counter = 0;
A f1 (A a)
{
400a18: 55 push %rbp
400a19: 48 89 e5 mov %rsp,%rbp
400a1c: 48 83 ec 10 sub $0x10,%rsp
400a20: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400a24: 48 89 75 f0 mov %rsi,-0x10(%rbp)
cout << __LINE__ << endl;
400a28: be 1d 00 00 00 mov $0x1d,%esi
400a2d: bf 80 13 60 00 mov $0x601380,%edi
400a32: e8 c1 fd ff ff callq 4007f8 <_ZNSolsEi#plt>
400a37: be 78 08 40 00 mov $0x400878,%esi
400a3c: 48 89 c7 mov %rax,%rdi
400a3f: e8 24 fe ff ff callq 400868 <_ZNSolsEPFRSoS_E#plt>
a.t = 1;
400a44: 48 8b 45 f0 mov -0x10(%rbp),%rax
400a48: c7 00 01 00 00 00 movl $0x1,(%rax)
std::cout << "f1: " << endl;
400a4e: be ce 0e 40 00 mov $0x400ece,%esi
400a53: bf 80 13 60 00 mov $0x601380,%edi
400a58: e8 fb fd ff ff callq 400858 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc#plt>
400a5d: be 78 08 40 00 mov $0x400878,%esi
400a62: 48 89 c7 mov %rax,%rdi
400a65: e8 fe fd ff ff callq 400868 <_ZNSolsEPFRSoS_E#plt>
cout << __LINE__ << endl;
400a6a: be 20 00 00 00 mov $0x20,%esi
400a6f: bf 80 13 60 00 mov $0x601380,%edi
400a74: e8 7f fd ff ff callq 4007f8 <_ZNSolsEi#plt>
400a79: be 78 08 40 00 mov $0x400878,%esi
400a7e: 48 89 c7 mov %rax,%rdi
400a81: e8 e2 fd ff ff callq 400868 <_ZNSolsEPFRSoS_E#plt>
return a;
400a86: 48 8b 55 f0 mov -0x10(%rbp),%rdx
400a8a: 48 8b 45 f8 mov -0x8(%rbp),%rax
400a8e: 48 89 d6 mov %rdx,%rsi
400a91: 48 89 c7 mov %rax,%rdi
400a94: e8 dd 01 00 00 callq 400c76 <_ZN1AC1ERKS_>
}
400a99: 48 8b 45 f8 mov -0x8(%rbp),%rax
400a9d: c9 leaveq
400a9e: c3 retq
0000000000400a9f <main>:
int main() {
400a9f: 55 push %rbp
400aa0: 48 89 e5 mov %rsp,%rbp
400aa3: 53 push %rbx
400aa4: 48 83 ec 48 sub $0x48,%rsp
A a = f1(A()) ;
400aa8: 48 8d 45 e0 lea -0x20(%rbp),%rax
400aac: 48 89 c7 mov %rax,%rdi
400aaf: e8 2a 01 00 00 callq 400bde <_ZN1AC1Ev>
400ab4: 48 8d 55 e0 lea -0x20(%rbp),%rdx
400ab8: 48 8d 45 d0 lea -0x30(%rbp),%rax
400abc: 48 89 d6 mov %rdx,%rsi
400abf: 48 89 c7 mov %rax,%rdi
400ac2: e8 af 01 00 00 callq 400c76 <_ZN1AC1ERKS_>
400ac7: 48 8d 45 c0 lea -0x40(%rbp),%rax
400acb: 48 8d 55 d0 lea -0x30(%rbp),%rdx
400acf: 48 89 d6 mov %rdx,%rsi
400ad2: 48 89 c7 mov %rax,%rdi
400ad5: e8 3e ff ff ff callq 400a18 <_Z2f11A>
400ada: 48 8d 55 c0 lea -0x40(%rbp),%rdx
400ade: 48 8d 45 b0 lea -0x50(%rbp),%rax
400ae2: 48 89 d6 mov %rdx,%rsi
400ae5: 48 89 c7 mov %rax,%rdi
400ae8: e8 89 01 00 00 callq 400c76 <_ZN1AC1ERKS_>
400aed: 48 8d 45 c0 lea -0x40(%rbp),%rax
400af1: 48 89 c7 mov %rax,%rdi
400af4: e8 31 02 00 00 callq 400d2a <_ZN1AD1Ev>
400af9: 48 8d 45 d0 lea -0x30(%rbp),%rax
400afd: 48 89 c7 mov %rax,%rdi
400b00: e8 25 02 00 00 callq 400d2a <_ZN1AD1Ev>
400b05: 48 8d 45 e0 lea -0x20(%rbp),%rax
400b09: 48 89 c7 mov %rax,%rdi
400b0c: e8 19 02 00 00 callq 400d2a <_ZN1AD1Ev>
printf("what is happening\n");
400b11: bf d3 0e 40 00 mov $0x400ed3,%edi
400b16: e8 ed fc ff ff callq 400808 <puts#plt>
cout << __LINE__ << endl;
return a;
}
The copy constructor is the symbol _ZN1AC1ERKS_ after name mangling.
As we can see, the temporary object created as the argument of f1 is constructed before the function call, in main, and not, as I expected, in f1's scope.
The meaning is as follows:
Temporary objects created for by-value function parameters are not created in the function's scope, but at the call site; they are therefore destroyed at the end of the full expression containing the call, before the next line executes, in the ordinary first-created, last-destroyed order.
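A minimal sketch of my own (not from the original post) that shows the same thing without reading assembly; the comments describe the output I would expect from g++, which follows the Itanium C++ ABI:

#include <iostream>

struct Tracer {
    ~Tracer() { std::cout << "destroyed\n"; }
};

Tracer pass_through(Tracer t) {   // 't' is a by-value parameter
    std::cout << "inside pass_through\n";
    return t;                     // copied into the caller's object
}

int main() {
    Tracer kept = pass_through(Tracer{});
    std::cout << "next statement\n";
}
// Expected output with g++ (Itanium C++ ABI):
//   inside pass_through
//   destroyed          <- the parameter 't': destroyed in main, at the end of
//                         the full expression, not inside pass_through
//   next statement
//   destroyed          <- 'kept', at the end of main
// The standard leaves the exact destruction point of a by-value parameter
// implementation-defined, so e.g. MSVC destroys 't' inside the callee instead.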

Which is better: returning tuple or passing arguments to function as references?

I created code with two functions, returnValues and returnValuesVoid. One returns a tuple of two values and the other takes references to its arguments.
#include <iostream>
#include <tuple>

std::tuple<int, int> returnValues(const int a, const int b) {
    return std::tuple(a, b);
}

void returnValuesVoid(int &a, int &b) {
    a += 100;
    b += 100;
}

int main() {
    auto [x, y] = returnValues(10, 20);
    std::cout << x;
    std::cout << y;

    int a = 10, b = 20;
    returnValuesVoid(a, b);
    std::cout << a;
    std::cout << b;
}
I read about structured bindings (http://en.cppreference.com/w/cpp/language/structured_binding), which can destructure a tuple into auto [x,y] variables.
Is auto [x,y] = returnValues(10,20); better than passing by reference? As far as I know it is slower, because it has to return a tuple object, whereas the reference version works directly on the original variables passed to the function, so there seems to be no reason to use it except cleaner code.
Since auto [x,y] is only available from C++17, do people use it in production? It looks cleaner than returnValuesVoid, which returns void, but does it have other advantages over passing by reference?
Look at the disassembly (compiled with GCC -O3).
It takes more instructions to implement the tuple-returning callee:
0000000000000000 <returnValues(int, int)>:
0: 83 c2 64 add $0x64,%edx
3: 83 c6 64 add $0x64,%esi
6: 48 89 f8 mov %rdi,%rax
9: 89 17 mov %edx,(%rdi)
b: 89 77 04 mov %esi,0x4(%rdi)
e: c3 retq
f: 90 nop
0000000000000010 <returnValuesVoid(int&, int&)>:
10: 83 07 64 addl $0x64,(%rdi)
13: 83 06 64 addl $0x64,(%rsi)
16: c3 retq
But fewer instructions for the tuple caller:
0000000000000000 <callTuple()>:
0: 48 83 ec 18 sub $0x18,%rsp
4: ba 14 00 00 00 mov $0x14,%edx
9: be 0a 00 00 00 mov $0xa,%esi
e: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
13: e8 00 00 00 00 callq 18 <callTuple()+0x18> // call returnValues
18: 8b 74 24 0c mov 0xc(%rsp),%esi
1c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
23: e8 00 00 00 00 callq 28 <callTuple()+0x28> // std::cout::operator<<
28: 8b 74 24 08 mov 0x8(%rsp),%esi
2c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
33: e8 00 00 00 00 callq 38 <callTuple()+0x38> // std::cout::operator<<
38: 48 83 c4 18 add $0x18,%rsp
3c: c3 retq
3d: 0f 1f 00 nopl (%rax)
0000000000000040 <callRef()>:
40: 48 83 ec 18 sub $0x18,%rsp
44: 48 8d 74 24 0c lea 0xc(%rsp),%rsi
49: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
4e: c7 44 24 08 0a 00 00 movl $0xa,0x8(%rsp)
55: 00
56: c7 44 24 0c 14 00 00 movl $0x14,0xc(%rsp)
5d: 00
5e: e8 00 00 00 00 callq 63 <callRef()+0x23> // call returnValuesVoid
63: 8b 74 24 08 mov 0x8(%rsp),%esi
67: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
6e: e8 00 00 00 00 callq 73 <callRef()+0x33> // std::cout::operator<<
73: 8b 74 24 0c mov 0xc(%rsp),%esi
77: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
7e: e8 00 00 00 00 callq 83 <callRef()+0x43> // std::cout::operator<<
83: 48 83 c4 18 add $0x18,%rsp
87: c3 retq
I don't think there is any considerable performance difference, but the tuple version is clearer and more readable.
I also tried with the calls inlined: there is absolutely no difference at all. Both of them generate exactly the same assembly code.
0000000000000000 <callTuple()>:
0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
7: 48 83 ec 08 sub $0x8,%rsp
b: be 6e 00 00 00 mov $0x6e,%esi
10: e8 00 00 00 00 callq 15 <callTuple()+0x15>
15: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
1c: be 78 00 00 00 mov $0x78,%esi
21: 48 83 c4 08 add $0x8,%rsp
25: e9 00 00 00 00 jmpq 2a <callTuple()+0x2a> // TCO, optimized way to call a function and also return
2a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000030 <callRef()>:
30: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
37: 48 83 ec 08 sub $0x8,%rsp
3b: be 6e 00 00 00 mov $0x6e,%esi
40: e8 00 00 00 00 callq 45 <callRef()+0x15>
45: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
4c: be 78 00 00 00 mov $0x78,%esi
51: 48 83 c4 08 add $0x8,%rsp
55: e9 00 00 00 00 jmpq 5a <callRef()+0x2a> // TCO, optimized way to call a function and also return
Focus on what is more readable and which approach gives the reader better intuition, and keep any performance issues you think might arise in the background.
A function that returns a tuple (or a pair, a struct, etc.) is telling its caller loudly that it returns something, which almost always has some meaning the user can take into account.
A function that hands back its results through variables passed by reference may escape the attention of a tired reader.
So, in general, prefer to return the results by a tuple.
Mike van Dyke pointed to this link:
F.21: To return multiple "out" values, prefer returning a tuple or struct
Reason
A return value is self-documenting as an "output-only" value. Note that C++ does have multiple return values, by convention of using a tuple (including pair), possibly with the extra convenience of tie at the call site.
[...]
Exception
Sometimes, we need to pass an object to a function to manipulate its state. In such cases, passing the object by reference T& is usually the right technique.
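As a side illustration of the guideline (my own sketch, not part of the quoted text): a small named struct often reads even better than a tuple, because the results keep their names at the call site, and structured bindings still work:

#include <iostream>

// Hypothetical example: returning two named results instead of a tuple.
struct MinMax {
    int min;
    int max;
};

MinMax minmax(int a, int b) {
    return (a < b) ? MinMax{a, b} : MinMax{b, a};
}

int main() {
    auto [lo, hi] = minmax(20, 10);              // structured bindings work on structs too
    std::cout << lo << ' ' << hi << '\n';        // prints: 10 20

    MinMax r = minmax(3, 7);
    std::cout << r.min << ' ' << r.max << '\n';  // or keep the field names
}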
Using another compiler (VS 2017) the resulting code shows no difference, as the function calls are just optimized away.
int main() {
00007FF6A9C51E50 sub rsp,28h
auto [x,y] = returnValues(10,20);
std::cout << x ;
00007FF6A9C51E54 mov edx,0Ah
00007FF6A9C51E59 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
std::cout << y ;
00007FF6A9C51E5E mov edx,14h
00007FF6A9C51E63 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
int a = 10, b = 20;
returnValuesVoid(a, b);
std::cout << a ;
00007FF6A9C51E68 mov edx,6Eh
00007FF6A9C51E6D call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
std::cout << b ;
00007FF6A9C51E72 mov edx,78h
00007FF6A9C51E77 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
}
00007FF6A9C51E7C xor eax,eax
00007FF6A9C51E7E add rsp,28h
00007FF6A9C51E82 ret
So using clearer code seems to be the obvious choice.
What Zang said is true, but not quite to the point. I ran the code provided in the question with chrono to measure time, and I think the conclusion needs to be revised after observing what happened.
For 1M iterations, the function call via references took about 3 ms, while the call via std::tie combined with std::tuple took about 94 ms.
Though the difference may seem small in practice, the tuple version will still perform slightly slower. Hence, for performance-intensive systems, I suggest passing by reference.
My code:
#include <iostream>
#include <tuple>
#include <chrono>

std::tuple<int, int> returnValues(const int a, const int b)
{
    return std::tuple<int, int>(a, b);
}

void returnValuesVoid(int &a, int &b)
{
    a += 100;
    b += 100;
}

int main()
{
    int a = 10, b = 20;

    auto begin = std::chrono::high_resolution_clock::now();
    int x, y;
    for (int i = 0; i < 1000000; i++)
    {
        std::tie(x, y) = returnValues(a, b);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << double(std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count()) << '\n';

    a = 10;
    b = 20;
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 1000000; i++)
    {
        returnValuesVoid(a, b);
    }
    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << double(std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()) << '\n';
}

Declaring a variable many times in the same code in Rcpp (C++)

For an R user who has just begun using Rcpp, declaring variables is a new thing. My question is what actually happens when a variable with the same name is declared many times. In many examples, I see that the index of a for loop is declared each time.
cppFunction('
int add1( const int n ){
int y = 0;
for(int i=0; i<n; i++){
for(int j=0; j<n; j++) y++;
for(int j=0; j<(n*2); j++) y++;
}
return y ;
}
')
instead of
cppFunction('
int add2( const int n ){
int y = 0;
int i, j;
for(i=0; i<n; i++){
for(j=0; j<n; j++) y++;
for(j=0; j<(n*2); j++) y++;
}
return y ;
}
')
Both seem to give the same answer. But is it generally OK to declare a variable (of the same name) many times in the same program? If it is not OK, when is it not OK? Or maybe I don't understand what 'declare' means, and, e.g., the two functions above are identical (i.e., nothing is declared many times even in the first function).
Overview
Alrighty, let's take a looksie at the assembly code after a compiler has transformed both statements. The compiler in this case should ideally provide the same optimization (we may want to run with the -O2 flag).
Test Case
I've written up your file using pure C++. That is, I've opted to compile directly from the terminal rather than rely on the Rcpp black magic that slips in #include <Rcpp.h> during every compilation.
test.cpp
#include <iostream>
int add2( const int n ){
int y = 0;
int i, j;
for(i=0; i<n; i++){
for(j=0; j<n; j++) y++;
for(j=0; j<(n*2); j++) y++;
}
return y ;
}
int add1( const int n ){
int y = 0;
for(int i=0; i<n; i++){
for(int j=0; j<n; j++) y++;
for(int j=0; j<(n*2); j++) y++;
}
return y ;
}
int main(){
std::cout << add1(2) << std::endl;
std::cout << add2(2) << std::endl;
}
Decomposing the Binary
To see how the C++ code was translated into assembly, I've opted to use objdump over the built-in otool on macOS. (Someone is more than welcome to provide that output as well.)
In macOS, I did:
gcc -g -c test.cpp
# brew install binutils # required for (g)objdump
gobjdump -d -M intel -S test.o
This gives the following annotated output that I've chunked at the end of the post. In a nutshell, the assembly for both versions is exactly the same.
Benchmarks are King
Another way to verify would be to do a simple microbenchmark. If there were a significant difference between the two, that would provide evidence of different optimizations.
# install.packages("microbenchmark")
library("microbenchmark")
microbenchmark(a = add1(100L), b = add2(100L))
Gives:
Unit: microseconds
expr min lq mean median uq max neval
a 53.081 53.268 55.35613 53.576 53.8825 92.078 100
b 53.069 53.261 56.28195 53.431 53.6795 169.841 100
Switching the order:
microbenchmark(b = add2(100L), a = add1(100L))
Gives:
Unit: microseconds
expr min lq mean median uq max neval
b 53.112 53.3215 60.14641 55.0575 60.7685 196.865 100
a 53.130 53.6850 58.72041 55.2845 60.6005 93.401 100
In essence, the benchmarks themselves indicate no significant difference between the two methods.
Appendix
Long Output
Long output for add1
int add1( const int n ){
a0: 55 push rbp
a1: 48 89 e5 mov rbp,rsp
a4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
int y = 0;
a7: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
for(int i=0; i<n; i++){
ae: c7 45 f4 00 00 00 00 mov DWORD PTR [rbp-0xc],0x0
b5: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
b8: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
bb: 0f 8d 76 00 00 00 jge 137 <__Z4add1i+0x97>
for(int j=0; j<n; j++) y++;
c1: c7 45 f0 00 00 00 00 mov DWORD PTR [rbp-0x10],0x0
c8: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
cb: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
ce: 0f 8d 1b 00 00 00 jge ef <__Z4add1i+0x4f>
d4: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
d7: 05 01 00 00 00 add eax,0x1
dc: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
df: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
e2: 05 01 00 00 00 add eax,0x1
e7: 89 45 f0 mov DWORD PTR [rbp-0x10],eax
ea: e9 d9 ff ff ff jmp c8 <__Z4add1i+0x28>
for(int j=0; j<(n*2); j++) y++;
ef: c7 45 ec 00 00 00 00 mov DWORD PTR [rbp-0x14],0x0
f6: 8b 45 ec mov eax,DWORD PTR [rbp-0x14]
f9: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4]
fc: c1 e1 01 shl ecx,0x1
ff: 39 c8 cmp eax,ecx
101: 0f 8d 1b 00 00 00 jge 122 <__Z4add1i+0x82>
107: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
10a: 05 01 00 00 00 add eax,0x1
10f: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
112: 8b 45 ec mov eax,DWORD PTR [rbp-0x14]
115: 05 01 00 00 00 add eax,0x1
11a: 89 45 ec mov DWORD PTR [rbp-0x14],eax
11d: e9 d4 ff ff ff jmp f6 <__Z4add1i+0x56>
}
122: e9 00 00 00 00 jmp 127 <__Z4add1i+0x87>
return y ;
}
Long Output for add2
int add2( const int n ){
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
int y = 0;
7: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
int i, j;
for(i=0; i<n; i++){
e: c7 45 f4 00 00 00 00 mov DWORD PTR [rbp-0xc],0x0
15: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
18: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
1b: 0f 8d 76 00 00 00 jge 97 <__Z4add2i+0x97>
for(j=0; j<n; j++) y++;
21: c7 45 f0 00 00 00 00 mov DWORD PTR [rbp-0x10],0x0
28: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
2b: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
2e: 0f 8d 1b 00 00 00 jge 4f <__Z4add2i+0x4f>
34: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
37: 05 01 00 00 00 add eax,0x1
3c: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
3f: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
42: 05 01 00 00 00 add eax,0x1
47: 89 45 f0 mov DWORD PTR [rbp-0x10],eax
4a: e9 d9 ff ff ff jmp 28 <__Z4add2i+0x28>
for(j=0; j<(n*2); j++) y++;
4f: c7 45 f0 00 00 00 00 mov DWORD PTR [rbp-0x10],0x0
56: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
59: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4]
5c: c1 e1 01 shl ecx,0x1
5f: 39 c8 cmp eax,ecx
61: 0f 8d 1b 00 00 00 jge 82 <__Z4add2i+0x82>
67: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
6a: 05 01 00 00 00 add eax,0x1
6f: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
72: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
75: 05 01 00 00 00 add eax,0x1
7a: 89 45 f0 mov DWORD PTR [rbp-0x10],eax
7d: e9 d4 ff ff ff jmp 56 <__Z4add2i+0x56>
}
82: e9 00 00 00 00 jmp 87 <__Z4add2i+0x87>
Short output
Short output for add1
int add1( const int n ){
int y = 0;
for(int i=0; i<n; i++){
127: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
12a: 05 01 00 00 00 add eax,0x1
12f: 89 45 f4 mov DWORD PTR [rbp-0xc],eax
132: e9 7e ff ff ff jmp b5 <__Z4add1i+0x15>
for(int j=0; j<n; j++) y++;
for(int j=0; j<(n*2); j++) y++;
}
return y ;
137: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
13a: 5d pop rbp
13b: c3 ret
13c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000000140 <_main>:
}
Short output for add2
int add2( const int n ){
int y = 0;
int i, j;
for(i=0; i<n; i++){
87: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
8a: 05 01 00 00 00 add eax,0x1
8f: 89 45 f4 mov DWORD PTR [rbp-0xc],eax
92: e9 7e ff ff ff jmp 15 <__Z4add2i+0x15>
for(j=0; j<n; j++) y++;
for(j=0; j<(n*2); j++) y++;
}
return y ;
97: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
9a: 5d pop rbp
9b: c3 ret
9c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000000a0 <__Z4add1i>:
}
In the 2 examples you give, it will not make much difference which one you choose - the compiler is almost certain to optimise them identically.
Both are perfectly legal. The first case you cite is fine because each variable is confined to the scope of its for loop.
Personally, I will always write my loops as in your first example, unless the index of the loop is related to some other pre-existing variable. I think it is a neater solution and complies with the idea of declaring variables where you need them.
C/C++ will allow you to do something which is not completely intuitive - it will allow you to redefine the same variable name in a nested scope and then things can start to get messy:
for (int i = 0; i < 10; i++) {
for (int i = 10; i < 100; i++) {
// Be careful what you do here!
}
}
In the inner loop any reference to 'i' will refer to the 'i' declared in the inner loop - the outer loop 'i' is now inaccessible. I have seen so many bugs based on this and they can be hard to spot because it is almost never a deliberate choice by the programmer.
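A small sketch of my own (not from the answer) that makes the shadowing visible; note that compilers can flag this if asked, e.g. with g++/clang's -Wshadow warning:

#include <iostream>

int main() {
    for (int i = 0; i < 2; i++) {
        for (int i = 10; i < 12; i++) {               // shadows the outer i (-Wshadow warns here)
            std::cout << "inner i = " << i << '\n';   // always the inner i: 10, 11
        }
        std::cout << "outer i = " << i << '\n';       // the outer i again: 0, then 1
    }
}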
It is because of scoping.
What you can try is:
for (int i = 0; i < 5; i++) { std::cout << i << std::endl; }
std::cout << i << std::endl;   // error: 'i' was not declared in this scope
This code does not compile because i is declared only inside the for loop.
Likewise, if you have an if statement, every variable declared inside it is limited to that scope and no longer exists outside the if statement.
Braces { } also introduce a new scope that limits the variables declared inside them.
What you can't do is:
int i = 5;
int i = 4;   // error: redeclaration of 'i' in the same scope
Here you try to declare the same variable again in the same scope, which gives you an error.

Member initializer list, pointer initialization without argument

In a large framework which used to use many smart pointers and now uses raw pointers, I come across situations like this quite often:
class A {
public:
    int* m;
    A() : m() {}
};
The reason is because int* m used to be a smart pointer and so the initializer list called a default constructor. Now that int* m is a raw pointer I am not certain if this is equivalent to:
class A {
public:
    int* m;
    A() : m(nullptr) {}
};
Without the explicit nullptr, is A::m still initialized to zero? A look at unoptimized objdump -d output makes it appear so, but I am not certain. The reason I feel the answer is yes is this line in the objdump -d output (I posted more of it below):
400644: 48 c7 00 00 00 00 00 movq $0x0,(%rax)
Little program that tries to find undefined behavior:
class A {
public:
    int* m;
    A() : m(nullptr) {}
};

int main() {
    A buf[1000000];
    unsigned int count = 0;
    for (unsigned int i = 0; i < 1000000; ++i) {
        count += buf[i].m ? 1 : 0;
    }
    return count;
}
Compilation, execution, and return value:
g++ -std=c++14 -O0 foo.cpp
./a.out; echo $?
0
Relevant assembly sections from objdump -d:
00000000004005b8 <main>:
4005b8: 55 push %rbp
4005b9: 48 89 e5 mov %rsp,%rbp
4005bc: 41 54 push %r12
4005be: 53 push %rbx
4005bf: 48 81 ec 10 12 7a 00 sub $0x7a1210,%rsp
4005c6: 48 8d 85 e0 ed 85 ff lea -0x7a1220(%rbp),%rax
4005cd: bb 3f 42 0f 00 mov $0xf423f,%ebx
4005d2: 49 89 c4 mov %rax,%r12
4005d5: eb 10 jmp 4005e7 <main+0x2f>
4005d7: 4c 89 e7 mov %r12,%rdi
4005da: e8 59 00 00 00 callq 400638 <_ZN1AC1Ev>
4005df: 49 83 c4 08 add $0x8,%r12
4005e3: 48 83 eb 01 sub $0x1,%rbx
4005e7: 48 83 fb ff cmp $0xffffffffffffffff,%rbx
4005eb: 75 ea jne 4005d7 <main+0x1f>
4005ed: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
4005f4: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)
4005fb: eb 23 jmp 400620 <main+0x68>
4005fd: 8b 45 e8 mov -0x18(%rbp),%eax
400600: 48 8b 84 c5 e0 ed 85 mov -0x7a1220(%rbp,%rax,8),%rax
400607: ff
400608: 48 85 c0 test %rax,%rax
40060b: 74 07 je 400614 <main+0x5c>
40060d: b8 01 00 00 00 mov $0x1,%eax
400612: eb 05 jmp 400619 <main+0x61>
400614: b8 00 00 00 00 mov $0x0,%eax
400619: 01 45 ec add %eax,-0x14(%rbp)
40061c: 83 45 e8 01 addl $0x1,-0x18(%rbp)
400620: 81 7d e8 3f 42 0f 00 cmpl $0xf423f,-0x18(%rbp)
400627: 76 d4 jbe 4005fd <main+0x45>
400629: 8b 45 ec mov -0x14(%rbp),%eax
40062c: 48 81 c4 10 12 7a 00 add $0x7a1210,%rsp
400633: 5b pop %rbx
400634: 41 5c pop %r12
400636: 5d pop %rbp
400637: c3 retq
0000000000400638 <_ZN1AC1Ev>:
400638: 55 push %rbp
400639: 48 89 e5 mov %rsp,%rbp
40063c: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400640: 48 8b 45 f8 mov -0x8(%rbp),%rax
400644: 48 c7 00 00 00 00 00 movq $0x0,(%rax)
40064b: 5d pop %rbp
40064c: c3 retq
40064d: 0f 1f 00 nopl (%rax)
An empty () initializer stands for default-initialization in C++98 and for value-initialization in C++03 and later. For scalar types (including pointers), both of these lead to zero-initialization.
Which means that in your case m() and m(nullptr) have exactly the same effect: in both cases m is initialized as a null pointer. In C++ it has been like that since the beginning of standardized times.
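A minimal sketch of my own to check this (assuming a C++11 or later compiler): both initializer forms leave m equal to nullptr, whereas leaving m out of the initializer list entirely leaves it uninitialized:

#include <cassert>

struct WithEmpty   { int* m; WithEmpty()   : m() {} };          // value-initializes m -> nullptr
struct WithNullptr { int* m; WithNullptr() : m(nullptr) {} };   // explicit nullptr
struct WithNothing { int* m; WithNothing() {} };                // m is left uninitialized!

int main() {
    WithEmpty a;
    WithNullptr b;
    assert(a.m == nullptr);
    assert(b.m == nullptr);
    // WithNothing c; reading c.m here would be undefined behavior.
}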

Unsigned int to unsigned long long well defined?

I wanted to see what was happening behind the scenes when an unsigned long long was assigned the value of an unsigned int. I made a simple C++ program to try it out and moved all the I/O out of main():
#include <iostream>
#include <stdlib.h>

void usage() {
    std::cout << "Usage: ./u_to_ull <unsigned int>\n";
    exit(0);
}

void atoiWarning(int foo) {
    std::cout << "WARNING: atoi() returned " << foo << " and (unsigned int)foo is "
              << ((unsigned int)foo) << "\n";
}

void result(unsigned long long baz) {
    std::cout << "Result as unsigned long long is " << baz << "\n";
}

int main(int argc, char** argv) {
    if (argc != 2) usage();
    int foo = atoi(argv[1]);
    if (foo < 0) atoiWarning(foo);

    // Signed to unsigned
    unsigned int bar = foo;

    // Conversion
    unsigned long long baz = -1;
    baz = bar;

    result(baz);
    return 0;
}
The resulting assembly produced this for main:
0000000000400950 <main>:
400950: 55 push %rbp
400951: 48 89 e5 mov %rsp,%rbp
400954: 48 83 ec 20 sub $0x20,%rsp
400958: 89 7d ec mov %edi,-0x14(%rbp)
40095b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40095f: 83 7d ec 02 cmpl $0x2,-0x14(%rbp)
400963: 74 05 je 40096a <main+0x1a>
400965: e8 3a ff ff ff callq 4008a4 <_Z5usagev>
40096a: 48 8b 45 e0 mov -0x20(%rbp),%rax
40096e: 48 83 c0 08 add $0x8,%rax
400972: 48 8b 00 mov (%rax),%rax
400975: 48 89 c7 mov %rax,%rdi
400978: e8 0b fe ff ff callq 400788 <atoi#plt>
40097d: 89 45 f0 mov %eax,-0x10(%rbp)
400980: 83 7d f0 00 cmpl $0x0,-0x10(%rbp)
400984: 79 0a jns 400990 <main+0x40>
400986: 8b 45 f0 mov -0x10(%rbp),%eax
400989: 89 c7 mov %eax,%edi
40098b: e8 31 ff ff ff callq 4008c1 <_Z11atoiWarningi>
400990: 8b 45 f0 mov -0x10(%rbp),%eax
400993: 89 45 f4 mov %eax,-0xc(%rbp)
400996: 48 c7 45 f8 ff ff ff movq $0xffffffffffffffff,-0x8(%rbp)
40099d: ff
40099e: 8b 45 f4 mov -0xc(%rbp),%eax
4009a1: 48 89 45 f8 mov %rax,-0x8(%rbp)
4009a5: 48 8b 45 f8 mov -0x8(%rbp),%rax
4009a9: 48 89 c7 mov %rax,%rdi
4009ac: e8 66 ff ff ff callq 400917 <_Z6resulty>
4009b1: b8 00 00 00 00 mov $0x0,%eax
4009b6: c9 leaveq
4009b7: c3 retq
The -1 in the C++ source makes it clear that -0x8(%rbp) corresponds to baz (because of the $0xffffffffffffffff). -0x8(%rbp) is written from %rax, but the top four bytes of %rax appear never to have been assigned; only %eax was assigned.
Does this suggest that the top 4 bytes of -0x8(%rbp) are undefined?
In the Intel® 64 and IA-32 Architectures Software Developer's Manuals, Volume 1, Section 3.4.1.1 (General-Purpose Registers in 64-Bit Mode), it says
32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
So after mov -0xc(%rbp),%eax, the upper half of rax is defined, and it's zero.
This also applies to the 87 C0 encoding of xchg eax, eax, but not to its 90 encoding (which is defined as nop, overruling the rule quoted above).
From C++98 (and C++11 seems to be unchanged) 4.7/2 (integral conversions - no promotions are relevant) we learn:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
This clearly shows that as long as the source and destination are unsigned and the destination is at least as large as the source, the value will be unchanged. If the compiler generated code that failed to make the larger value equal, the compiler is buggy.
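A short sketch of my own illustrating that rule (assuming 32-bit unsigned int and 64-bit unsigned long long): widening an unsigned value preserves it exactly, and the earlier signed-to-unsigned step is also well defined, modulo 2^n:

#include <cassert>
#include <limits>

int main() {
    unsigned int u = std::numeric_limits<unsigned int>::max();   // 0xFFFFFFFF for a 32-bit unsigned int
    unsigned long long ull = u;            // widening unsigned -> unsigned: value preserved
    assert(ull == 0xFFFFFFFFull);          // the top 32 bits are guaranteed to be zero

    int negative = -1;
    unsigned int wrapped = negative;       // well defined: -1 modulo 2^32
    assert(wrapped == std::numeric_limits<unsigned int>::max());

    unsigned long long widened = wrapped;  // still 0xFFFFFFFF, not 0xFFFFFFFFFFFFFFFF
    assert(widened == 0xFFFFFFFFull);
}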