Declaring a variable many times in the same code in Rcpp (C++)

For an R user who has just started using Rcpp, declaring variables is a new thing. My question is what actually happens when a variable with the same name is declared many times. In many examples, I see that the index variables of for loops are declared each time.
cppFunction('
int add1( const int n ){
int y = 0;
for(int i=0; i<n; i++){
for(int j=0; j<n; j++) y++;
for(int j=0; j<(n*2); j++) y++;
}
return y ;
}
')
instead of
cppFunction('
int add2( const int n ){
int y = 0;
int i, j;
for(i=0; i<n; i++){
for(j=0; j<n; j++) y++;
for(j=0; j<(n*2); j++) y++;
}
return y ;
}
')
Both seem to give the same answer. But is it generally ok to declare a variable (of the same name) many times in the same program? If not, when is it not ok? Or maybe I don't understand what 'declare' means, and the two functions above are in fact identical (e.g., nothing is declared many times even in the first function).

Overview
Alrighty, let's take a looksie at the assembly code after a compiler has transformed both statements. The compiler in this case should ideally provide the same optimization (we may want to run with the -O2 flag).
Test Case
I've written up your file using pure C++. That is, I've opted to compile directly from the terminal rather than rely on the Rcpp black magic that slips in #include <Rcpp.h> during every compilation.
test.cpp
#include <iostream>
int add2( const int n ){
int y = 0;
int i, j;
for(i=0; i<n; i++){
for(j=0; j<n; j++) y++;
for(j=0; j<(n*2); j++) y++;
}
return y ;
}
int add1( const int n ){
int y = 0;
for(int i=0; i<n; i++){
for(int j=0; j<n; j++) y++;
for(int j=0; j<(n*2); j++) y++;
}
return y ;
}
int main(){
std::cout << add1(2) << std::endl;
std::cout << add2(2) << std::endl;
}
Decomposing the Binary
To see how the C++ code was translated into assembly, I've opted to use objdump over the built-in otool on macOS. (Someone is more than welcome to provide that output as well.)
In macOS, I did:
gcc -g -c test.cpp
# brew install binutils # required for (g)objdump
gobjdump -d -M intel -S test.o
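This gives the annotated output that I've placed at the end of the post. In a nutshell, the assembly for both versions is essentially the same. The one difference in this unoptimized build is that add1 gives its second j a separate stack slot ([rbp-0x14]), while add2 reuses [rbp-0x10] for both inner loops.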
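A behavioural check can complement the disassembly. The sketch below restates both functions and compares them against the closed form 3 * n * n (n increments plus 2n increments per outer pass, n outer passes). The helper names versions_agree and self_check are hypothetical and not part of the original post, and the closed form only applies for n >= 0.

```cpp
#include <cassert>

// Index variables declared inside each for statement.
int add1(const int n) {
    int y = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) y++;
        for (int j = 0; j < (n * 2); j++) y++;
    }
    return y;
}

// Index variables declared once at the top of the function.
int add2(const int n) {
    int y = 0;
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) y++;
        for (j = 0; j < (n * 2); j++) y++;
    }
    return y;
}

// Hypothetical helper: true when both versions agree with each other
// and with the closed form 3 * n * n (valid for n >= 0 only).
bool versions_agree(const int n) {
    return add1(n) == add2(n) && add1(n) == 3 * n * n;
}

// Hypothetical usage: check a small range of inputs.
void self_check() {
    for (int n = 0; n <= 10; n++) {
        assert(versions_agree(n));
    }
}
```

Agreement on a handful of inputs is not a proof, but it is a cheap sanity check before reading assembly.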
Benchmarks are King
Another way to verify would be to run a simple microbenchmark. If there were a significant difference between the two, that would suggest the compiler optimized them differently.
# install.packages("microbenchmark")
library("microbenchmark")
microbenchmark(a = add1(100L), b = add2(100L))
Gives:
Unit: microseconds
expr min lq mean median uq max neval
a 53.081 53.268 55.35613 53.576 53.8825 92.078 100
b 53.069 53.261 56.28195 53.431 53.6795 169.841 100
Switching the order:
microbenchmark(b = add2(100L), a = add1(100L))
Gives:
Unit: microseconds
expr min lq mean median uq max neval
b 53.112 53.3215 60.14641 55.0575 60.7685 196.865 100
a 53.130 53.6850 58.72041 55.2845 60.6005 93.401 100
In essence, the benchmarks themselves indicate no significant difference between either method.
Appendix
Long Output
Long output add1
int add1( const int n ){
a0: 55 push rbp
a1: 48 89 e5 mov rbp,rsp
a4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
int y = 0;
a7: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
for(int i=0; i<n; i++){
ae: c7 45 f4 00 00 00 00 mov DWORD PTR [rbp-0xc],0x0
b5: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
b8: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
bb: 0f 8d 76 00 00 00 jge 137 <__Z4add1i+0x97>
for(int j=0; j<n; j++) y++;
c1: c7 45 f0 00 00 00 00 mov DWORD PTR [rbp-0x10],0x0
c8: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
cb: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
ce: 0f 8d 1b 00 00 00 jge ef <__Z4add1i+0x4f>
d4: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
d7: 05 01 00 00 00 add eax,0x1
dc: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
df: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
e2: 05 01 00 00 00 add eax,0x1
e7: 89 45 f0 mov DWORD PTR [rbp-0x10],eax
ea: e9 d9 ff ff ff jmp c8 <__Z4add1i+0x28>
for(int j=0; j<(n*2); j++) y++;
ef: c7 45 ec 00 00 00 00 mov DWORD PTR [rbp-0x14],0x0
f6: 8b 45 ec mov eax,DWORD PTR [rbp-0x14]
f9: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4]
fc: c1 e1 01 shl ecx,0x1
ff: 39 c8 cmp eax,ecx
101: 0f 8d 1b 00 00 00 jge 122 <__Z4add1i+0x82>
107: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
10a: 05 01 00 00 00 add eax,0x1
10f: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
112: 8b 45 ec mov eax,DWORD PTR [rbp-0x14]
115: 05 01 00 00 00 add eax,0x1
11a: 89 45 ec mov DWORD PTR [rbp-0x14],eax
11d: e9 d4 ff ff ff jmp f6 <__Z4add1i+0x56>
}
122: e9 00 00 00 00 jmp 127 <__Z4add1i+0x87>
return y ;
}
Long Output for add2
int add2( const int n ){
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
int y = 0;
7: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
int i, j;
for(i=0; i<n; i++){
e: c7 45 f4 00 00 00 00 mov DWORD PTR [rbp-0xc],0x0
15: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
18: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
1b: 0f 8d 76 00 00 00 jge 97 <__Z4add2i+0x97>
for(j=0; j<n; j++) y++;
21: c7 45 f0 00 00 00 00 mov DWORD PTR [rbp-0x10],0x0
28: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
2b: 3b 45 fc cmp eax,DWORD PTR [rbp-0x4]
2e: 0f 8d 1b 00 00 00 jge 4f <__Z4add2i+0x4f>
34: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
37: 05 01 00 00 00 add eax,0x1
3c: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
3f: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
42: 05 01 00 00 00 add eax,0x1
47: 89 45 f0 mov DWORD PTR [rbp-0x10],eax
4a: e9 d9 ff ff ff jmp 28 <__Z4add2i+0x28>
for(j=0; j<(n*2); j++) y++;
4f: c7 45 f0 00 00 00 00 mov DWORD PTR [rbp-0x10],0x0
56: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
59: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4]
5c: c1 e1 01 shl ecx,0x1
5f: 39 c8 cmp eax,ecx
61: 0f 8d 1b 00 00 00 jge 82 <__Z4add2i+0x82>
67: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
6a: 05 01 00 00 00 add eax,0x1
6f: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
72: 8b 45 f0 mov eax,DWORD PTR [rbp-0x10]
75: 05 01 00 00 00 add eax,0x1
7a: 89 45 f0 mov DWORD PTR [rbp-0x10],eax
7d: e9 d4 ff ff ff jmp 56 <__Z4add2i+0x56>
}
82: e9 00 00 00 00 jmp 87 <__Z4add2i+0x87>
Short Output
Short output for add1
int add1( const int n ){
int y = 0;
for(int i=0; i<n; i++){
127: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
12a: 05 01 00 00 00 add eax,0x1
12f: 89 45 f4 mov DWORD PTR [rbp-0xc],eax
132: e9 7e ff ff ff jmp b5 <__Z4add1i+0x15>
for(int j=0; j<n; j++) y++;
for(int j=0; j<(n*2); j++) y++;
}
return y ;
137: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
13a: 5d pop rbp
13b: c3 ret
13c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000000140 <_main>:
}
Short output for add2
int add2( const int n ){
int y = 0;
int i, j;
for(i=0; i<n; i++){
87: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
8a: 05 01 00 00 00 add eax,0x1
8f: 89 45 f4 mov DWORD PTR [rbp-0xc],eax
92: e9 7e ff ff ff jmp 15 <__Z4add2i+0x15>
for(j=0; j<n; j++) y++;
for(j=0; j<(n*2); j++) y++;
}
return y ;
97: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
9a: 5d pop rbp
9b: c3 ret
9c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000000a0 <__Z4add1i>:
}

In the 2 examples you give, it will not make much difference which one you choose - the compiler is almost certain to optimise them identically.
Both are perfectly legal. The first case you cite (declaring the index inside each for statement) is fine because each variable is confined to the scope of its for loop.
Personally, I always write my loops as in your first example unless the index of the loop is related to some other pre-existing variable. I think this is a neater solution and complies with the idea of declaring variables where you need them.
C/C++ will allow you to do something which is not completely intuitive: it will allow you to redeclare the same variable name in a nested scope (this is known as shadowing), and then things can start to get messy:
for (int i = 0; i < 10; i++) {
for (int i = 10; i < 100; i++) {
// Be careful what you do here!
}
}
In the inner loop any reference to 'i' will refer to the 'i' declared in the inner loop - the outer loop 'i' is now inaccessible. I have seen so many bugs based on this and they can be hard to spot because it is almost never a deliberate choice by the programmer.
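To make the shadowing behaviour concrete, here is a minimal sketch. The function names count_with_shadowing and sum_with_bug are hypothetical, chosen only for illustration. The first function shows that the inner i never disturbs the outer one; the second shows the kind of accidental bug described above.

```cpp
#include <cassert>

// Counts inner-loop iterations while the inner loop shadows the outer `i`.
int count_with_shadowing() {
    int inner_iterations = 0;
    for (int i = 0; i < 10; i++) {
        for (int i = 10; i < 100; i++) {
            // `i` here is the inner variable (10..99); the outer `i` is hidden.
            inner_iterations++;
        }
        // Back in the outer scope, `i` is the outer counter again and was
        // never modified by the inner loop.
    }
    return inner_iterations;  // 10 outer passes * 90 inner passes
}

// A typical accident: the loop body declares a new `total` instead of
// updating the one declared outside the loop.
int sum_with_bug(const int n) {
    int total = 0;
    for (int k = 1; k <= n; k++) {
        int total = k;  // shadows the outer `total`
        (void)total;    // silence the unused-variable warning
    }
    return total;  // the outer `total` was never changed
}

// Hypothetical usage.
void self_check() {
    assert(count_with_shadowing() == 900);
    assert(sum_with_bug(5) == 0);
}
```

GCC and Clang can flag declarations like these when compiled with -Wshadow, which is one way to catch them early.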

It is because of scoping.
What you can try is
for(int i = 0; i<5;i++) {std::cout<<i<<std::endl; }
std::cout<<i<<std::endl;
This code does not compile because "i" is only declared inside the for loop.
Likewise, every variable declared inside an if statement is local to it and no longer exists outside the if statement.
Braces { } also limit the scope of the variables declared inside them.
What you can't do is
int i = 5;
int i = 4;
Here you try to declare the same variable again in the same scope, which gives you an error.
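The scope rules described above can be collected into one compilable sketch. The function name scope_demo is hypothetical; the two lines that would fail to compile are left as comments so that the rest builds.

```cpp
#include <cassert>

// Each for statement, if statement, and pair of braces opens its own scope.
int scope_demo() {
    int result = 0;

    for (int i = 0; i < 5; i++) { result += i; }   // this `i` lives only in the loop
    // result += i;   // would not compile: `i` is out of scope here

    for (int i = 0; i < 3; i++) { result += 10; }  // a different `i` in a sibling scope

    {
        int tmp = 100;  // exists only inside these braces
        result += tmp;
    }

    if (result > 0) {
        int bonus = 1;  // local to the if block
        result += bonus;
    }

    // int result = 4;  // would not compile: redeclaration in the same scope
    return result;      // 10 + 30 + 100 + 1
}

// Hypothetical usage.
void self_check() {
    assert(scope_demo() == 141);
}
```

Declaring the same name in two sibling scopes is therefore not a redeclaration at all: the first variable has already ceased to exist when the second is introduced.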

Related

C++ assembly code analysis (compiled with clang)

I am trying to figure out what C++ binary code looks like, especially for virtual function calls, and I have come across a few curious things. I have the following C++ code:
#include <iostream>
using namespace std;
class Base {
public:
virtual void print() { cout << "from base" << endl; }
};
class Derived : public Base {
public:
virtual void print() { cout << "from derived" << endl; }
};
int main() {
Base *b;
Derived d;
d.print();
b = &d;
b->print();
return 0;
}
I compiled it with clang++ and then used objdump:
00000000004008b0 <main>:
4008b0: 55 push rbp
4008b1: 48 89 e5 mov rbp,rsp
4008b4: 48 83 ec 20 sub rsp,0x20
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008bc: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4008c3: e8 28 00 00 00 call 4008f0 <Derived::Derived()>
4008c8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008cc: e8 5f 00 00 00 call 400930 <Derived::print()>
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d5: 48 89 7d f0 mov QWORD PTR [rbp-0x10],rdi
4008d9: 48 8b 7d f0 mov rdi,QWORD PTR [rbp-0x10]
4008dd: 48 8b 07 mov rax,QWORD PTR [rdi]
4008e0: ff 10 call QWORD PTR [rax]
4008e2: 31 c0 xor eax,eax
4008e4: 48 83 c4 20 add rsp,0x20
4008e8: 5d pop rbp
4008e9: c3 ret
4008ea: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
My question is why the assembly contains the following lines:
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
The local variable d in main() is stored at location [rbp-0x18]. This is in the automatic storage allocated on the stack for main().
lea rdi,[rbp-0x18]
This line loads the address of d into the rdi register. By convention, member functions of Derived treat rdi as the this pointer.
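Registers cannot be observed from portable C++, but the source-level consequence can: whatever address is passed as the hidden first argument is what the member function sees as this. The sketch below assumes a simplified variant of the classes above; the member functions self and name and the free functions are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstring>

struct Base {
    virtual ~Base() = default;
    // Returns the address this member function received as `this`.
    virtual const void* self() const { return this; }
    virtual const char* name() const { return "from base"; }
};

struct Derived : Base {
    const void* self() const override { return this; }
    const char* name() const override { return "from derived"; }
};

// True when a direct call and a call through a Base pointer both see &d.
bool this_is_address_of_object() {
    Derived d;
    Base* b = &d;
    const void* address = &d;
    return d.self() == address && b->self() == address;
}

// The call through the Base pointer is dispatched virtually (a compiler may
// devirtualize it when it can prove the dynamic type).
const char* name_through_base_pointer() {
    Derived d;
    Base* b = &d;
    return b->name();  // points to a string literal, so it stays valid
}

// Hypothetical usage.
void self_check() {
    assert(this_is_address_of_object());
    assert(std::strcmp(name_through_base_pointer(), "from derived") == 0);
}
```

In the disassembly above, the same address is loaded into rdi before the constructor call, before d.print(), and before the indirect call through the vtable, which matches what this sketch observes at the source level.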

Temporary object for by-value function creation scope

I have the following code, which I cannot understand:
#include <cstdio>
#include <iostream>
using namespace std;
class A
{
public:
int t = 0;
A(){
cout << "constructed" << t<< endl;
}
A (A&& a) {
cout << "in move ctor, moving"<< a.t << endl;
}
~A() {
cout << "deleting"<< t << endl;
}
};
A f1 (A a)
{
a.t = 1;
std::cout << "f1: " << endl;
return a;
}
int main() {
A a = f1(A()) ;
printf("what is happening\n");
}
and the output is
constructed0
in move ctor, moving0
f1:
in move ctor, moving1
in move ctor, moving0
deleting0
deleting1
deleting0
what is happening
deleting0
What I cannot understand is the point at which the temporary object created for f1 (the one with a.t=1) is destroyed.
From the output I assume it is destroyed at the end of the line A a = f1(A()) ; whereas I thought it was created inside f1 and for f1, and would therefore be destroyed when exiting the function, before deleting0 is printed.
What am I missing?
So after a bit of research I have the answer.
Here is the disassembly of the code (changed the move constructor to a copy constructor for readability):
int A::counter = 0;
A f1 (A a)
{
400a18: 55 push %rbp
400a19: 48 89 e5 mov %rsp,%rbp
400a1c: 48 83 ec 10 sub $0x10,%rsp
400a20: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400a24: 48 89 75 f0 mov %rsi,-0x10(%rbp)
cout << __LINE__ << endl;
400a28: be 1d 00 00 00 mov $0x1d,%esi
400a2d: bf 80 13 60 00 mov $0x601380,%edi
400a32: e8 c1 fd ff ff callq 4007f8 <_ZNSolsEi@plt>
400a37: be 78 08 40 00 mov $0x400878,%esi
400a3c: 48 89 c7 mov %rax,%rdi
400a3f: e8 24 fe ff ff callq 400868 <_ZNSolsEPFRSoS_E@plt>
a.t = 1;
400a44: 48 8b 45 f0 mov -0x10(%rbp),%rax
400a48: c7 00 01 00 00 00 movl $0x1,(%rax)
std::cout << "f1: " << endl;
400a4e: be ce 0e 40 00 mov $0x400ece,%esi
400a53: bf 80 13 60 00 mov $0x601380,%edi
400a58: e8 fb fd ff ff callq 400858 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
400a5d: be 78 08 40 00 mov $0x400878,%esi
400a62: 48 89 c7 mov %rax,%rdi
400a65: e8 fe fd ff ff callq 400868 <_ZNSolsEPFRSoS_E@plt>
cout << __LINE__ << endl;
400a6a: be 20 00 00 00 mov $0x20,%esi
400a6f: bf 80 13 60 00 mov $0x601380,%edi
400a74: e8 7f fd ff ff callq 4007f8 <_ZNSolsEi@plt>
400a79: be 78 08 40 00 mov $0x400878,%esi
400a7e: 48 89 c7 mov %rax,%rdi
400a81: e8 e2 fd ff ff callq 400868 <_ZNSolsEPFRSoS_E@plt>
return a;
400a86: 48 8b 55 f0 mov -0x10(%rbp),%rdx
400a8a: 48 8b 45 f8 mov -0x8(%rbp),%rax
400a8e: 48 89 d6 mov %rdx,%rsi
400a91: 48 89 c7 mov %rax,%rdi
400a94: e8 dd 01 00 00 callq 400c76 <_ZN1AC1ERKS_>
}
400a99: 48 8b 45 f8 mov -0x8(%rbp),%rax
400a9d: c9 leaveq
400a9e: c3 retq
0000000000400a9f <main>:
int main() {
400a9f: 55 push %rbp
400aa0: 48 89 e5 mov %rsp,%rbp
400aa3: 53 push %rbx
400aa4: 48 83 ec 48 sub $0x48,%rsp
A a = f1(A()) ;
400aa8: 48 8d 45 e0 lea -0x20(%rbp),%rax
400aac: 48 89 c7 mov %rax,%rdi
400aaf: e8 2a 01 00 00 callq 400bde <_ZN1AC1Ev>
400ab4: 48 8d 55 e0 lea -0x20(%rbp),%rdx
400ab8: 48 8d 45 d0 lea -0x30(%rbp),%rax
400abc: 48 89 d6 mov %rdx,%rsi
400abf: 48 89 c7 mov %rax,%rdi
400ac2: e8 af 01 00 00 callq 400c76 <_ZN1AC1ERKS_>
400ac7: 48 8d 45 c0 lea -0x40(%rbp),%rax
400acb: 48 8d 55 d0 lea -0x30(%rbp),%rdx
400acf: 48 89 d6 mov %rdx,%rsi
400ad2: 48 89 c7 mov %rax,%rdi
400ad5: e8 3e ff ff ff callq 400a18 <_Z2f11A>
400ada: 48 8d 55 c0 lea -0x40(%rbp),%rdx
400ade: 48 8d 45 b0 lea -0x50(%rbp),%rax
400ae2: 48 89 d6 mov %rdx,%rsi
400ae5: 48 89 c7 mov %rax,%rdi
400ae8: e8 89 01 00 00 callq 400c76 <_ZN1AC1ERKS_>
400aed: 48 8d 45 c0 lea -0x40(%rbp),%rax
400af1: 48 89 c7 mov %rax,%rdi
400af4: e8 31 02 00 00 callq 400d2a <_ZN1AD1Ev>
400af9: 48 8d 45 d0 lea -0x30(%rbp),%rax
400afd: 48 89 c7 mov %rax,%rdi
400b00: e8 25 02 00 00 callq 400d2a <_ZN1AD1Ev>
400b05: 48 8d 45 e0 lea -0x20(%rbp),%rax
400b09: 48 89 c7 mov %rax,%rdi
400b0c: e8 19 02 00 00 callq 400d2a <_ZN1AD1Ev>
printf("what is happening\n");
400b11: bf d3 0e 40 00 mov $0x400ed3,%edi
400b16: e8 ed fc ff ff callq 400808 <puts@plt>
cout << __LINE__ << endl;
return a;
}
The copy constructor is called _ZN1AC1ERKS_ after name mangling.
As we can see, the temporary object created for f1 is constructed in main, before the function call, and not, as I expected, in f1's scope.
The meaning is as follows:
Objects created for by-value function parameters are not constructed in the function's scope but by the caller, on the line that calls the function. They are therefore destroyed before the next line executes, in the ordinary first-created, last-destroyed order.
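The conclusion above can be checked without reading assembly by logging constructor and destructor calls. This is a minimal sketch: the Tracer type, the global events log, and the helper functions are all hypothetical and stand in for class A from the question. The exact number of copies or moves depends on the language standard and on copy elision, so the checks only rely on the ordering relative to the next statement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical global event log, used only for this illustration.
std::vector<std::string> events;

struct Tracer {
    Tracer() { events.push_back("ctor"); }
    Tracer(const Tracer&) { events.push_back("copy"); }
    Tracer(Tracer&&) noexcept { events.push_back("move"); }
    ~Tracer() { events.push_back("dtor"); }
};

Tracer f1(Tracer a) {
    events.push_back("f1 body");
    return a;  // the parameter is moved (or copied) into the return value
}

void run() {
    events.clear();
    Tracer r = f1(Tracer());
    (void)r;
    events.push_back("next statement");
}  // r is destroyed here, after "next statement"

// Position of the first event equal to `label` (events.size() if absent).
std::size_t index_of(const std::string& label) {
    return static_cast<std::size_t>(
        std::find(events.begin(), events.end(), label) - events.begin());
}

// How many of the first `limit` events are destructor calls.
std::size_t dtors_before(const std::size_t limit) {
    return static_cast<std::size_t>(std::count(
        events.begin(), events.begin() + static_cast<std::ptrdiff_t>(limit),
        std::string("dtor")));
}

// True when the by-value parameter is gone before the next statement runs
// and only r's destructor is logged afterwards.
bool parameter_dies_before_next_statement() {
    run();
    const std::size_t next = index_of("next statement");
    return index_of("f1 body") < next
        && dtors_before(next) >= 1
        && events.size() == next + 2
        && events.back() == "dtor";
}

// Hypothetical usage.
void self_check() {
    assert(parameter_dies_before_next_statement());
}
```

Whether the parameter is destroyed when f1 returns or at the end of the full expression is left to the implementation by the C++ standard; either way it happens before "next statement" is logged, which is what the sketch checks.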

Memory allocation for local array when only one entry/index is initialized?

I'm learning C++ from the basics (using Visual Studio Community 2015).
While working with arrays I came across the following:
int main()
{
int i[10] = {};
}
The assembly code for this is :
18: int i[10] = {};
008519CE 33 C0 xor eax,eax
008519D0 89 45 D4 mov dword ptr [ebp-2Ch],eax
008519D3 89 45 D8 mov dword ptr [ebp-28h],eax
008519D6 89 45 DC mov dword ptr [ebp-24h],eax
008519D9 89 45 E0 mov dword ptr [ebp-20h],eax
008519DC 89 45 E4 mov dword ptr [ebp-1Ch],eax
008519DF 89 45 E8 mov dword ptr [ebp-18h],eax
008519E2 89 45 EC mov dword ptr [ebp-14h],eax
008519E5 89 45 F0 mov dword ptr [ebp-10h],eax
008519E8 89 45 F4 mov dword ptr [ebp-0Ch],eax
008519EB 89 45 F8 mov dword ptr [ebp-8],eax
Here, since initialisation is used, every int is initialised to 0 (xor eax,eax). This is clear.
From what I have learnt, a variable is allocated memory only if it is used (at least in modern compilers), and if any one element of an array is initialised, the complete array is allocated memory, as follows:
int main()
{
int i[10];
i[0] = 20;
int j = 20;
}
Assembly generated:
18: int i[10];
19: i[0] = 20;
00A319CE B8 04 00 00 00 mov eax,4
00A319D3 6B C8 00 imul ecx,eax,0
00A319D6 C7 44 0D D4 14 00 00 00 mov dword ptr [ebp+ecx-2Ch],14h
20: int j = 20;
00A319DE C7 45 C8 14 00 00 00 mov dword ptr [ebp-38h],14h
Here, the compiler wrote 4 bytes (to copy the value 20 to i[0]), but from what I have learnt the memory for the entire array should be allocated at line 19. Yet the compiler hasn't produced any machine code for this. And where would it store the information that the memory for the other nine elements [1-9] of array i cannot be used by other variables?
Please help!

Operator [] long and short versions

What is the advantage of using the longer version (something).operator[]() instead of simply (something)[]?
For example :
std::array<int, 10> arr1;
std::array<int, 10> arr2;
for(int i = 0; i < arr1.size(); i++)
std::cout << arr1[i] << ' ';
std::cout << std::endl;
for(int i = 0; i < arr2.size(); i++)
std::cout << arr2.operator[](i) << ' ';
std::cout << std::endl;
There is none. The [] is just syntactic sugar for operator[] on user-defined types. You only need the operator syntax when you define these functions yourself. This goes for all operators like operator(), operator[], operator new, operator=, ...
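A small user-defined type makes the equivalence easy to observe. The Squares struct below is hypothetical and exists only to count how often its operator[] runs; both spellings reach the same member function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal type, only here to show the two call syntaxes.
struct Squares {
    int calls = 0;  // counts how often operator[] ran

    int operator[](const std::size_t i) {
        ++calls;
        return static_cast<int>(i * i);
    }
};

// True when both spellings call the same member function.
bool same_function() {
    Squares s;
    const int a = s[6];             // operator syntax
    const int b = s.operator[](6);  // explicit member-function call
    return a == 36 && b == 36 && s.calls == 2;
}

// Note: built-in arrays have no member functions, so for `int raw[3];`
// the expression `raw.operator[](0)` would not compile; only raw[0] works.

// Hypothetical usage.
void self_check() {
    assert(same_function());
}
```

The explicit form occasionally shows up in generic code or when calling an operator through a pointer, as in p->operator[](i), but (*p)[i] does the same thing.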
It is just syntactic sugar. Compiled with g++ -g -std=gnu++0x ..., both forms call the same function:
0000000000400554 <main>:
#include <array>
int main() {
400554: 55 push %rbp
400555: 48 89 e5 mov %rsp,%rbp
400558: 48 83 ec 60 sub $0x60,%rsp
std::array<int, 10> arr1;
std::array<int, 10> arr2;
arr1[6];
40055c: 48 8d 45 d0 lea -0x30(%rbp),%rax
400560: be 06 00 00 00 mov $0x6,%esi
400565: 48 89 c7 mov %rax,%rdi
400568: e8 19 00 00 00 callq 400586 <std::array<int, 10ul>::operator[](unsigned long)>
arr2.operator[](6);
40056d: 48 8d 45 a0 lea -0x60(%rbp),%rax
400571: be 06 00 00 00 mov $0x6,%esi
400576: 48 89 c7 mov %rax,%rdi
400579: e8 08 00 00 00 callq 400586 <std::array<int, 10ul>::operator[](unsigned long)>
40057e: b8 00 00 00 00 mov $0x0,%eax
}
400583: c9 leaveq
400584: c3 retq
400585: 90 nop

Local variable vs. array access

Which of these would be more computationally efficient, and why?
A) Repeated array access:
for(i=0; i<numbers.length; i++) {
result[i] = numbers[i] * numbers[i] * numbers[i];
}
B) Setting a local variable:
for(i=0; i<numbers.length; i++) {
int n = numbers[i];
result[i] = n * n * n;
}
Would not the repeated array access version have to be calculated (using pointer arithmetic), making the first option slower because it is doing this?:
for(i=0; i<numbers.length; i++) {
result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i);
}
Any sufficiently sophisticated compiler will generate the same code for all three solutions. I turned your three versions into a small C program (with a minor adjustment: I changed the access numbers.length to a macro invocation that gives the length of an array):
#include <stddef.h>
size_t i;
static const int numbers[] = { 0, 1, 2, 4, 5, 6, 7, 8, 9 };
#define ARRAYLEN(x) (sizeof((x)) / sizeof(*(x)))
static int result[ARRAYLEN(numbers)];
void versionA(void)
{
for(i=0; i<ARRAYLEN(numbers); i++) {
result[i] = numbers[i] * numbers[i] * numbers[i];
}
}
void versionB(void)
{
for(i=0; i<ARRAYLEN(numbers); i++) {
int n = numbers[i];
result[i] = n * n * n;
}
}
void versionC(void)
{
for(i=0; i<ARRAYLEN(numbers); i++) {
result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i);
}
}
I then compiled it using optimizations (and debug symbols, for prettier disassembly) with Visual Studio 2012:
C:\Temp>cl /Zi /O2 /Wall /c so19244189.c
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50727.1 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
so19244189.c
Finally, here's the disassembly:
C:\Temp>dumpbin /disasm so19244189.obj
[..]
_versionA:
00000000: 33 C0 xor eax,eax
00000002: 8B 0C 85 00 00 00 mov ecx,dword ptr _numbers[eax*4]
00
00000009: 8B D1 mov edx,ecx
0000000B: 0F AF D1 imul edx,ecx
0000000E: 0F AF D1 imul edx,ecx
00000011: 89 14 85 00 00 00 mov dword ptr _result[eax*4],edx
00
00000018: 40 inc eax
00000019: 83 F8 09 cmp eax,9
0000001C: 72 E4 jb 00000002
0000001E: A3 00 00 00 00 mov dword ptr [_i],eax
00000023: C3 ret
_versionB:
00000000: 33 C0 xor eax,eax
00000002: 8B 0C 85 00 00 00 mov ecx,dword ptr _numbers[eax*4]
00
00000009: 8B D1 mov edx,ecx
0000000B: 0F AF D1 imul edx,ecx
0000000E: 0F AF D1 imul edx,ecx
00000011: 89 14 85 00 00 00 mov dword ptr _result[eax*4],edx
00
00000018: 40 inc eax
00000019: 83 F8 09 cmp eax,9
0000001C: 72 E4 jb 00000002
0000001E: A3 00 00 00 00 mov dword ptr [_i],eax
00000023: C3 ret
_versionC:
00000000: 33 C0 xor eax,eax
00000002: 8B 0C 85 00 00 00 mov ecx,dword ptr _numbers[eax*4]
00
00000009: 8B D1 mov edx,ecx
0000000B: 0F AF D1 imul edx,ecx
0000000E: 0F AF D1 imul edx,ecx
00000011: 89 14 85 00 00 00 mov dword ptr _result[eax*4],edx
00
00000018: 40 inc eax
00000019: 83 F8 09 cmp eax,9
0000001C: 72 E4 jb 00000002
0000001E: A3 00 00 00 00 mov dword ptr [_i],eax
00000023: C3 ret
Note how the assembly is exactly the same in all cases. So the correct answer to your question
Which of these would be more computationally efficient, and why?
for this compiler is: mu. Your question cannot be answered because it's based on incorrect assumptions. None of the answers is faster than any other.
The theoretical answer:
A reasonably good optimizing compiler should convert version A to version B, and perform only one load from memory. There should be no performance difference if optimization is enabled.
If optimization is disabled, version A will be slower, because the address must be computed 3 times and there are 3 memory loads (2 of them are cached and very fast, but it's still slower than reusing a register).
In practice, the answer will depend on your compiler, and you should check this by benchmarking.
It depends on the compiler, but all of them should be the same.
First, let's look at case B. A smart compiler will generate code that loads the value into a register only once, so it doesn't matter whether you use an additional variable or not: the compiler emits a single mov instruction and keeps the value in a register. So B is the same as A.
Now let's compare A and C. We should look at how the built-in operator [] is defined: a[b] is actually *(a + b), so *(numbers + i) is the same as numbers[i]. That means cases A and C are the same.
So we have (A==B) && (A==C), and all in all (A==B==C), if you know what I mean :).
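The equivalence argued above can be checked at the source level. This sketch reuses the numbers array from the earlier answer; the function names cube_a, cube_b, cube_c, and all_equal are hypothetical. It only shows that the three forms compute the same values and address the same element, not that a given compiler emits the same instructions.

```cpp
#include <cassert>
#include <cstddef>

const int numbers[] = {0, 1, 2, 4, 5, 6, 7, 8, 9};
const std::size_t numbers_len = sizeof(numbers) / sizeof(numbers[0]);

// Version A: repeated array access.
int cube_a(const std::size_t i) { return numbers[i] * numbers[i] * numbers[i]; }

// Version B: load once into a local variable.
int cube_b(const std::size_t i) {
    const int n = numbers[i];
    return n * n * n;
}

// Version C: explicit pointer arithmetic.
int cube_c(const std::size_t i) {
    return *(numbers + i) * *(numbers + i) * *(numbers + i);
}

// True when all three versions agree for every element and
// numbers[i] names the same object as *(numbers + i).
bool all_equal() {
    for (std::size_t i = 0; i < numbers_len; i++) {
        if (cube_a(i) != cube_b(i) || cube_a(i) != cube_c(i)) return false;
        if (&numbers[i] != numbers + i) return false;
    }
    return true;
}

// Hypothetical usage.
void self_check() {
    assert(all_equal());
}
```

Since the language defines a[b] as *(a + b) for built-in arrays and pointers, versions A and C are the same expression written two ways; version B differs only in naming the loaded value.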