Allocating a temporary variable or computing the expression twice [duplicate] - c++

I am trying to write code that is as efficient as possible, and I encountered the following situation:
int foo(int a, int b, int c)
{
return (a + b) % c;
}
All good! But what if I want to check whether the result of the expression differs from a constant, let's say myConst? Let's say I can afford a temporary variable.
Which of the following methods is the fastest:
int foo(int a, int b, int c)
{
return (((a + b) % c) != myConst) ? (a + b) % c : myException;
}
or
int foo(int a, int b, int c)
{
int tmp = (a + b) % c;
return (tmp != myConst) ? tmp : myException;
}
I can't decide. Where is the 'line' at which recalculation becomes more expensive than allocating and deallocating a temporary variable, or the other way around?

Don't worry about it: write concise code and leave micro-optimizations to the compiler.
In your example, writing the same calculation twice is error-prone, so do not do this. In your specific example, the compiler is more than likely to avoid creating a temporary on the stack at all!
Your example can produce (and on my compiler does produce) the following assembly (I have replaced myConst with a constexpr 42 and myException with 0):
foo(int, int, int):
leal (%rdi,%rsi), %eax # this adds a and b, puts result to eax
movl %edx, %ecx # loads c
cltd
idivl %ecx # performs division, puts result into edx
movl $0, %eax #, prepares to return exception value
cmpl $42, %edx #, compares result of division with magic const
cmovne %edx, %eax # overwrites pessimized exception if all is cool
ret
As you see, there is no temporary anywhere in sight!
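As a minimal sketch of that point, the two variants from the question can be written side by side and checked against each other. The values 42 and 0 mirror the substitutions mentioned above; the names kMyConst, kMyException, foo_twice, foo_tmp and check_variants are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Stand-ins for myConst and myException, using the values substituted above.
constexpr int kMyConst = 42;
constexpr int kMyException = 0;

// Variant 1: the expression is written out twice.
int foo_twice(int a, int b, int c)
{
    return (((a + b) % c) != kMyConst) ? (a + b) % c : kMyException;
}

// Variant 2: the expression is evaluated once and given a name.
int foo_tmp(int a, int b, int c)
{
    const int tmp = (a + b) % c;
    return (tmp != kMyConst) ? tmp : kMyException;
}

// Asserts that both variants return the same value over a small input range.
void check_variants()
{
    for (int a = 0; a < 50; ++a)
        for (int b = 0; b < 50; ++b)
            assert(foo_twice(a, b, 97) == foo_tmp(a, b, 97));
}
```

Compiling both functions with optimization enabled (for example -O2 with GCC or Clang) and comparing the generated assembly is a reasonable way to confirm whether a given compiler treats them identically.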

Use the latter.
You're not computing the same value twice.
The code is more clear.
Creating local variables on the stack doesn't take any significant amount of time.

Check the assembler code this generates for both versions. You most likely want the highest optimization settings for your compiler.
You may very well find out the compiler itself can figure out the intermediate value is used twice, but only inside the function, so safe to store in a register.

To add to what has already been posted, ease of debugging is at least as important as code efficiency, (if there is any effect on code efficiency which, as others have posted, is unlikely with optimization on).
Go with the easiest to follow, test and debug.
Use a temp var.
If more developers used simpler, non-compound expressions and more temp vars, there would be far fewer 'Help - I cannot debug my code!' posts to SO.

Hard coded values
The following only applies to hard coded values (even if they aren't const or constexpr)
In the following example on MSVC 2015 they were optimized away completely and replaced only with mov edx, result (=1 in this example):
#include <iostream>
#include <exception>
int myConst{4};
int myException{2};
int foo1(int a,int b,int c)
{
return (((a + b) % c) != myConst) ? (a + b) % c : myException;
}
int foo2(int a,int b,int c)
{
int tmp = (a + b) % c;
return (tmp != myConst) ? tmp : myException;
}
int main()
{
00007FF71F0E1000 48 83 EC 28 sub rsp,28h
auto test1{foo1(5,2,3)};
auto test2{foo2(5,2,3)};
std::cout << test1 <<'\n';
00007FF71F0E1004 48 8B 0D 75 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF71F0E3080h)]
00007FF71F0E100B BA 01 00 00 00 mov edx,1
00007FF71F0E1010 FF 15 72 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF71F0E3088h)]
00007FF71F0E1016 48 8B C8 mov rcx,rax
00007FF71F0E1019 E8 B2 00 00 00 call std::operator<<<std::char_traits<char> > (07FF71F0E10D0h)
std::cout << test2 <<'\n';
00007FF71F0E101E 48 8B 0D 5B 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF71F0E3080h)]
00007FF71F0E1025 BA 01 00 00 00 mov edx,1
00007FF71F0E102A FF 15 58 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF71F0E3088h)]
00007FF71F0E1030 48 8B C8 mov rcx,rax
00007FF71F0E1033 E8 98 00 00 00 call std::operator<<<std::char_traits<char> > (07FF71F0E10D0h)
return 0;
00007FF71F0E1038 33 C0 xor eax,eax
}
00007FF71F0E103A 48 83 C4 28 add rsp,28h
00007FF71F0E103E C3 ret
At this point others have pointed out that the optimization will not happen if the values were passed in, or if we had separate files, but it seems that even when the code is in separate compilation units the optimization is still done and we don't get any instructions for these functions:
#include <iostream>
#include <exception>
#include "Header.h"
int main()
{
00007FF667BF1000 48 83 EC 28 sub rsp,28h
int var1{5},var2{2},var3{3};
auto test1{foo1(var1,var2,var3)};
auto test2{foo2(var1,var2,var3)};
std::cout << test1 <<'\n';
00007FF667BF1004 48 8B 0D 75 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF667BF3080h)]
00007FF667BF100B BA 01 00 00 00 mov edx,1
00007FF667BF1010 FF 15 72 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF667BF3088h)]
00007FF667BF1016 48 8B C8 mov rcx,rax
00007FF667BF1019 E8 B2 00 00 00 call std::operator<<<std::char_traits<char> > (07FF667BF10D0h)
std::cout << test2 <<'\n';
00007FF667BF101E 48 8B 0D 5B 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF667BF3080h)]
00007FF667BF1025 BA 01 00 00 00 mov edx,1
00007FF667BF102A FF 15 58 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF667BF3088h)]
00007FF667BF1030 48 8B C8 mov rcx,rax
00007FF667BF1033 E8 98 00 00 00 call std::operator<<<std::char_traits<char> > (07FF667BF10D0h)
return 0;
00007FF667BF1038 33 C0 xor eax,eax
}
00007FF667BF103A 48 83 C4 28 add rsp,28h
00007FF667BF103E C3 ret
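The folding seen in the listings above can also be requested explicitly. The sketch below is an illustration under one stated assumption: the constants are declared constexpr (the original code uses plain globals), which lets the call appear in a static_assert. The names kMyConst, kMyException and folded_result are hypothetical.

```cpp
// Assumption: constexpr constants instead of the plain globals used above,
// so that the function can be evaluated at compile time.
constexpr int kMyConst = 4;
constexpr int kMyException = 2;

constexpr int foo2(int a, int b, int c)
{
    // A local variable inside a constexpr function requires C++14 or later.
    const int tmp = (a + b) % c;
    return (tmp != kMyConst) ? tmp : kMyException;
}

// (5 + 2) % 3 == 1, and 1 != 4, so the call evaluates to 1 at compile time.
static_assert(foo2(5, 2, 3) == 1, "foo2(5, 2, 3) should evaluate to 1");

// Runtime wrapper around the same call, for use in ordinary code.
int folded_result()
{
    return foo2(5, 2, 3);
}
```

Unlike the optimizer's behaviour, which depends on the compiler and its settings, a static_assert either compiles or it does not, so it documents the expected value directly in the source.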

Related

Why do these lambda captured values have different types?

Two int variables, one a reference, when captured by lambdas, differ in type in MSVC's c++20. Why?
Scott Meyers' technique to determine a type at compile time produces the expected results for C++14 and C++17 but not for C++20. However, it seems this odd difference occurs in other compilers for earlier versions too.
Specifically, two ints, one of them a reference, produce differing types (int and int &) when captured either by value or by reference. Yet earlier versions produce the expected types: when captured by value both were typed int, and when captured by reference both were int &.
// Scott Meyers' compile-time type deduction technique
template<typename T>
class TD;
int main()
{
int i{ 1 };
int& ri = i;
[i, ri]() mutable {
TD<decltype(i)> WhatAmI1; // c++14, c++17 TD<int>, c++20 TD<int>
TD<decltype(ri)> WhatAmI2; // c++14, c++17 TD<int>, c++20 TD<int &>
}();
[&i, &ri]() mutable {
TD<decltype(i)> WhatAmI3; // c++14, c++17 TD<int&>, c++20 TD<int>
TD<decltype(ri)> WhatAmI4; // c++14, c++17 TD<int&>, c++20 TD<int &>
}();
}
Exploring this further, the code produced is as expected and is the same for both captured values and captured references.
int main()
{
int i{ 1 };
int& ri = i;
[i, ri]() mutable {
i += 2;
ri += 3;
}();
[&i, &ri]() {
i += 2;
ri += 3;
}();
}
[i, ri]() mutable {
i += 2;
00007FF7F506187F 48 8B 85 E0 00 00 00 mov rax,qword ptr [this]
00007FF7F5061886 8B 00 mov eax,dword ptr [rax]
00007FF7F5061888 83 C0 02 add eax,2
00007FF7F506188B 48 8B 8D E0 00 00 00 mov rcx,qword ptr [this]
00007FF7F5061892 89 01 mov dword ptr [rcx],eax
ri += 3;
00007FF7F5061894 48 8B 85 E0 00 00 00 mov rax,qword ptr [this]
00007FF7F506189B 8B 40 04 mov eax,dword ptr [rax+4]
00007FF7F506189E 83 C0 03 add eax,3
00007FF7F50618A1 48 8B 8D E0 00 00 00 mov rcx,qword ptr [this]
00007FF7F50618A8 89 41 04 mov dword ptr [rcx+4],eax
[&i, &ri]() {
i += 2;
00007FF7F506190F 48 8B 85 E0 00 00 00 mov rax,qword ptr [this]
00007FF7F5061916 48 8B 00 mov rax,qword ptr [rax]
00007FF7F5061919 8B 00 mov eax,dword ptr [rax]
00007FF7F506191B 83 C0 02 add eax,2
00007FF7F506191E 48 8B 8D E0 00 00 00 mov rcx,qword ptr [this]
00007FF7F5061925 48 8B 09 mov rcx,qword ptr [rcx]
00007FF7F5061928 89 01 mov dword ptr [rcx],eax
ri += 3;
00007FF7F506192A 48 8B 85 E0 00 00 00 mov rax,qword ptr [this]
00007FF7F5061931 48 8B 40 08 mov rax,qword ptr [rax+8]
00007FF7F5061935 8B 00 mov eax,dword ptr [rax]
00007FF7F5061937 83 C0 03 add eax,3
00007FF7F506193A 48 8B 8D E0 00 00 00 mov rcx,qword ptr [this]
00007FF7F5061941 48 8B 49 08 mov rcx,qword ptr [rcx+8]
00007FF7F5061945 89 01 mov dword ptr [rcx],eax
}();
Link to compiler explorer https://godbolt.org/z/oe9KPcWWv
The behavior of Clang, GCC and MSVC in C++20 mode is correct for all standard versions supporting lambdas. decltype(i) and decltype(ri) yield the type of the named variables. It is not rewritten to refer to the members of the closure object, as would be the case for decltype((ri)). (see e.g. [expr.prim.lambda.capture]/14 in C++17 draft N4659 handling this specifically)
Apparently MSVC's default behavior for standard modes before C++20 is non-conforming in how lambdas are handled. According to the documentation the flag /Zc:lambda must be given to handle lambdas standard-conforming. With this option MSVC produces the same result as in C++20 mode for C++17 and C++14 mode as well.
See for example this bug report which was closed as not-a-bug with instruction to use this flag. Also note that it doesn't seem to be included in /permissive-.

Purpose of rdi register for no argument function

Consider this simple function:
struct Foo {
int a;
int b;
int c;
int d;
int e;
int f;
};
Foo foo() {
Foo f;
f.a = 1;
f.b = 2;
f.c = 3;
f.d = 4;
f.e = 5;
f.f = 6;
return f;
}
It generates the following assembly:
0000000000400500 <foo()>:
400500: 48 ba 01 00 00 00 02 movabs rdx,0x200000001
400507: 00 00 00
40050a: 48 b9 03 00 00 00 04 movabs rcx,0x400000003
400511: 00 00 00
400514: 48 be 05 00 00 00 06 movabs rsi,0x600000005
40051b: 00 00 00
40051e: 48 89 17 mov QWORD PTR [rdi],rdx
400521: 48 89 4f 08 mov QWORD PTR [rdi+0x8],rcx
400525: 48 89 77 10 mov QWORD PTR [rdi+0x10],rsi
400529: 48 89 f8 mov rax,rdi
40052c: c3 ret
40052d: 0f 1f 00 nop DWORD PTR [rax]
Based on the assembly, I understand that the caller created space for Foo on its stack, and passed that information in rdi to the callee.
I am trying to find documentation for this convention. Calling convention in linux states that rdi contains the first integer argument. In this case, foo doesn't have any arguments.
Moreover, if I make foo take one integer argument, that is now passed as rsi (register for second argument) with rdi used for address of the return object.
Can anyone provide some documentation and clarity on how rdi is used in system V ABI?
See section 3.2.3 Parameter Passing in the ABI docs which says:
If the type has class MEMORY, then the caller provides space for the
return value and passes the address of this storage in %rdi as if it
were the first argument to the function. In effect, this address
becomes a "hidden" first argument.
On return %rax will contain the address that has been passed in by the
caller in %rdi.
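The quoted rule can be approximated at the source level. The sketch below is a hand-written illustration, not compiler output: foo_lowered is a hypothetical function that takes the "hidden" pointer as an explicit first argument and returns it, mirroring the role of %rdi and %rax. It assumes a 4-byte int, so that Foo is 24 bytes and therefore too large to be returned in registers.

```cpp
#include <cassert>

struct Foo {
    int a, b, c, d, e, f;
};

// Assumption: 4-byte int. Aggregates larger than 16 bytes are returned in memory
// under the System V x86-64 ABI.
static_assert(sizeof(Foo) > 16, "Foo is expected to exceed two eightbytes");

// Source-level form, as in the question.
Foo foo()
{
    Foo f;
    f.a = 1; f.b = 2; f.c = 3;
    f.d = 4; f.e = 5; f.f = 6;
    return f;
}

// Hypothetical lowered form: the caller supplies the storage (what %rdi carries),
// and the same address is handed back (what %rax carries on return).
Foo* foo_lowered(Foo* result)
{
    result->a = 1; result->b = 2; result->c = 3;
    result->d = 4; result->e = 5; result->f = 6;
    return result;
}

// Calls both forms and checks that they fill the object the same way.
void check_lowering()
{
    Foo storage{};
    Foo* returned = foo_lowered(&storage);
    assert(returned == &storage);
    const Foo direct = foo();
    assert(direct.a == storage.a && direct.f == storage.f);
}
```

This also matches the observation in the question: once the hidden pointer occupies the first argument register, an explicit integer parameter moves to the second one.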

Compiler mov'es this pointer to wrong address

I have a simple polymorphic construction, with one pure virtual function Foo.
The only flaw in the big project it's used in is that the project uses a couple of global statics for centralized parameter loading and for event logging (can't easily get rid of that legacy code).
Project info:
Platform toolset: v110_xp
MFC in static library
MBCS charset
calling convention: __cdecl
All optimizations disabled
Warning level 4, no warnings on the whole project
Code:
class Base
{
public:
Base(){}
virtual ~Base(void){}
virtual void Foo(void) = 0;
};
class Derived
: public Base
{
public:
Derived(void) : Base(){}
virtual void Foo(void) override
{
double a = sqrt(4.9);
double b = -a;
}
};
Calling code (doesn't really matter, same behaviour everywhere)
BOOL MainMFCApp::InitInstance()
{
Derived* d = new Derived();
d->Foo();
delete d;
...
}
The problem is that when run in debug (not tested with release), and when we end up inside function Foo, the this pointer is 'corrupted':
this = 0xcccccccc
this.__vfptr = <unable to read memory>
When I dive into the assembly code entering the function I see the following:
13:
14:
15: void Derived::Foo(void)
16: {
015E3570 55 push ebp
015E3571 8B EC mov ebp,esp
015E3573 83 E4 F8 and esp,0FFFFFFF8h
015E3576 81 EC EC 00 00 00 sub esp,0ECh
015E357C 53 push ebx
015E357D 56 push esi
015E357E 57 push edi
015E357F 51 push ecx
015E3580 8D BD 14 FF FF FF lea edi,[ebp-0ECh]
015E3586 B9 3B 00 00 00 mov ecx,3Bh
015E358B B8 CC CC CC CC mov eax,0CCCCCCCCh
015E3590 F3 AB rep stos dword ptr es:[edi]
015E3592 59 pop ecx
015E3593 89 8C 24 F0 00 00 00 mov dword ptr [esp+0F0h],ecx
17: double a = sqrt(4.9);
015E359A F2 0F 10 05 00 55 09 02 movsd xmm0,mmword ptr ds:[2095500h]
015E35A2 E8 72 D4 FD FF call __libm_sse2_sqrt_precise (015C0A19h)
015E35A7 F2 0F 11 84 24 E0 00 00 00 movsd mmword ptr [esp+0E0h],xmm0
18: double b = -a;
015E35B0 F2 0F 10 84 24 E0 00 00 00 movsd xmm0,mmword ptr [esp+0E0h]
015E35B9 66 0F 57 05 10 55 09 02 xorpd xmm0,xmmword ptr ds:[2095510h]
015E35C1 F2 0F 11 84 24 D0 00 00 00 movsd mmword ptr [esp+0D0h],xmm0
19: return;
20: }
015E35CA 5F pop edi
015E35CB 5E pop esi
015E35CC 5B pop ebx
015E35CD 8B E5 mov esp,ebp
015E35CF 5D pop ebp
015E35D0 C3 ret
--- No source file -------------------------------------------------------------
015E35D1 CC int 3
...
015E35EF CC int 3
Breakpoint at line 17, right before entering the function body: Using the watch window to inspect the object behind register ecx (with a cast to Derived*) shows that ecx contains the pointer I need (to the object), but for some reason it is mov'ed to the seemingly random address [esp+0F0h].
And now the really interesting/flabbergasting part: When I change
double b = -a;
to
double b = -1.0 * a;
and compile again, everything magically works. The function assembly has now changed to:
13:
14:
15: void Derived::Foo(void)
16: {
00863570 55 push ebp
00863571 8B EC mov ebp,esp
00863573 81 EC EC 00 00 00 sub esp,0ECh
00863579 53 push ebx
0086357A 56 push esi
0086357B 57 push edi
0086357C 51 push ecx
0086357D 8D BD 14 FF FF FF lea edi,[ebp-0ECh]
00863583 B9 3B 00 00 00 mov ecx,3Bh
00863588 B8 CC CC CC CC mov eax,0CCCCCCCCh
0086358D F3 AB rep stos dword ptr es:[edi]
0086358F 59 pop ecx
00863590 89 4D F8 mov dword ptr [this],ecx
17: double a = sqrt(4.9);
00863593 F2 0F 10 05 00 55 31 01 movsd xmm0,mmword ptr ds:[1315500h]
0086359B E8 79 D4 FD FF call __libm_sse2_sqrt_precise (0840A19h)
008635A0 F2 0F 11 45 E8 movsd mmword ptr [a],xmm0
18: double b = -1.0 * a;
008635A5 F2 0F 10 05 10 55 31 01 movsd xmm0,mmword ptr ds:[1315510h]
008635AD F2 0F 59 45 E8 mulsd xmm0,mmword ptr [a]
008635B2 F2 0F 11 45 D8 movsd mmword ptr [b],xmm0
19: return;
20: }
008635B7 5F pop edi
008635B8 5E pop esi
008635B9 5B pop ebx
008635BA 81 C4 EC 00 00 00 add esp,0ECh
008635C0 3B EC cmp ebp,esp
008635C2 E8 32 CD FC FF call __RTC_CheckEsp (08302F9h)
008635C7 8B E5 mov esp,ebp
008635C9 5D pop ebp
008635CA C3 ret
--- No source file -------------------------------------------------------------
008635CB CC int 3
...
008635EF CC int 3
Now the generated code nicely moves the pointer in register ecx to this. Other difference:
different memory addresses/offsets
mulsd instead of xorpd to negate the variable
and esp,0FFFFFFF8h disappeared (?? used to align the stack pointer esp ??)
more cleanup (after the function body)?? (add cmp call)
The assembly part where the parameters get pushed to the stack is the same for both situations:
53: d->Foo();
011A500B 8B 45 E0 mov eax,dword ptr [d]
011A500E 8B 10 mov edx,dword ptr [eax]
011A5010 8B F4 mov esi,esp
011A5012 8B 4D E0 mov ecx,dword ptr [d]
011A5015 8B 42 04 mov eax,dword ptr [edx+4]
011A5018 FF D0 call eax
Of course, when I try to replicate this with a Minimal, Complete, and Verifiable example, everything works as intended. But in my big project, it fails consistently.
I'm not sure which parameters can influence compilation, and I don't know enough assembly to even see what's going on there;
therefore I'm asking here in the hope that someone has seen this before or recognizes this behaviour.
Note: it also works again, when I remove the sqrt call.
Update:
No problems in release
VS2012 SP4 (v11.0.61030.00)
problem persists when referencing member variables (instead of no member references)
TODO: try without global statics

Local variable vs. array access

Which of these would be more computationally efficient, and why?
A) Repeated array access:
for(i=0; i<numbers.length; i++) {
result[i] = numbers[i] * numbers[i] * numbers[i];
}
B) Setting a local variable:
for(i=0; i<numbers.length; i++) {
int n = numbers[i];
result[i] = n * n * n;
}
Would not the repeated array access version have to be calculated (using pointer arithmetic), making the first option slower because it is doing this?:
for(i=0; i<numbers.length; i++) {
result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i);
}
Any sufficiently sophisticated compiler will generate the same code for all three solutions. I turned your three versions into a small C program (with a minor adjustment: I changed the access numbers.length to a macro invocation which gives the length of an array):
#include <stddef.h>
size_t i;
static const int numbers[] = { 0, 1, 2, 4, 5, 6, 7, 8, 9 };
#define ARRAYLEN(x) (sizeof((x)) / sizeof(*(x)))
static int result[ARRAYLEN(numbers)];
void versionA(void)
{
for(i=0; i<ARRAYLEN(numbers); i++) {
result[i] = numbers[i] * numbers[i] * numbers[i];
}
}
void versionB(void)
{
for(i=0; i<ARRAYLEN(numbers); i++) {
int n = numbers[i];
result[i] = n * n * n;
}
}
void versionC(void)
{
for(i=0; i<ARRAYLEN(numbers); i++) {
result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i);
}
}
I then compiled it using optimizations (and debug symbols, for prettier disassembly) with Visual Studio 2012:
C:\Temp>cl /Zi /O2 /Wall /c so19244189.c
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50727.1 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
so19244189.c
Finally, here's the disassembly:
C:\Temp>dumpbin /disasm so19244189.obj
[..]
_versionA:
00000000: 33 C0 xor eax,eax
00000002: 8B 0C 85 00 00 00 mov ecx,dword ptr _numbers[eax*4]
00
00000009: 8B D1 mov edx,ecx
0000000B: 0F AF D1 imul edx,ecx
0000000E: 0F AF D1 imul edx,ecx
00000011: 89 14 85 00 00 00 mov dword ptr _result[eax*4],edx
00
00000018: 40 inc eax
00000019: 83 F8 09 cmp eax,9
0000001C: 72 E4 jb 00000002
0000001E: A3 00 00 00 00 mov dword ptr [_i],eax
00000023: C3 ret
_versionB:
00000000: 33 C0 xor eax,eax
00000002: 8B 0C 85 00 00 00 mov ecx,dword ptr _numbers[eax*4]
00
00000009: 8B D1 mov edx,ecx
0000000B: 0F AF D1 imul edx,ecx
0000000E: 0F AF D1 imul edx,ecx
00000011: 89 14 85 00 00 00 mov dword ptr _result[eax*4],edx
00
00000018: 40 inc eax
00000019: 83 F8 09 cmp eax,9
0000001C: 72 E4 jb 00000002
0000001E: A3 00 00 00 00 mov dword ptr [_i],eax
00000023: C3 ret
_versionC:
00000000: 33 C0 xor eax,eax
00000002: 8B 0C 85 00 00 00 mov ecx,dword ptr _numbers[eax*4]
00
00000009: 8B D1 mov edx,ecx
0000000B: 0F AF D1 imul edx,ecx
0000000E: 0F AF D1 imul edx,ecx
00000011: 89 14 85 00 00 00 mov dword ptr _result[eax*4],edx
00
00000018: 40 inc eax
00000019: 83 F8 09 cmp eax,9
0000001C: 72 E4 jb 00000002
0000001E: A3 00 00 00 00 mov dword ptr [_i],eax
00000023: C3 ret
Note how the assembly is exactly the same in all cases. So the correct answer to your question
Which of these would be more computationally efficient, and why?
for this compiler is: mu. Your question cannot be answered because it's based on incorrect assumptions. None of the answers is faster than any other.
The theoretical answer:
A reasonably good optimizing compiler should convert version A to version B, and perform only one load from memory. There should be no performance difference if optimization is enabled.
If optimization is disabled, version A will be slower, because the address must be computed 3 times and there are 3 memory loads (2 of them are cached and very fast, but it's still slower than reusing a register).
In practice, the answer will depend on your compiler, and you should check this by benchmarking.
It depends on the compiler, but all of them should be the same.
First, let's look at case B. A smart compiler will generate code that loads the value into a register only once, so it doesn't matter whether you use an additional variable or not: the compiler emits a mov instruction and the value is in a register. So B is the same as A.
Now let's compare A and C. We should look at how operator [] is defined: a[b] is actually *(a + b), so *(numbers + i) is the same as numbers[i]. That means cases A and C are the same.
So we have (A==B) && (A==C), and all in all (A==B==C), if you know what I mean :).
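The equivalence argued above can be sketched in C++ as three small functions that must agree for every index. The array contents are taken from the C program earlier in this thread; the names kNumbers, cube_a, cube_b, cube_c and check_cube_versions are hypothetical. This only demonstrates that the results are equal, not that the generated code is.

```cpp
#include <cassert>
#include <cstddef>

// Same values as the numbers array in the C program above.
static const int kNumbers[] = { 0, 1, 2, 4, 5, 6, 7, 8, 9 };
static const std::size_t kCount = sizeof(kNumbers) / sizeof(*kNumbers);

// Version A: repeated array access.
int cube_a(const int* numbers, std::size_t i)
{
    return numbers[i] * numbers[i] * numbers[i];
}

// Version B: the element is read once into a local variable.
int cube_b(const int* numbers, std::size_t i)
{
    const int n = numbers[i];
    return n * n * n;
}

// Version C: explicit pointer arithmetic, which is what numbers[i] means.
int cube_c(const int* numbers, std::size_t i)
{
    return *(numbers + i) * *(numbers + i) * *(numbers + i);
}

// Asserts that all three versions agree for every element.
void check_cube_versions()
{
    for (std::size_t i = 0; i < kCount; ++i) {
        assert(&kNumbers[i] == kNumbers + i);  // a[b] and *(a + b) name the same object
        assert(cube_a(kNumbers, i) == cube_b(kNumbers, i));
        assert(cube_a(kNumbers, i) == cube_c(kNumbers, i));
    }
}
```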

Is accessing c++ member class through "this->member" faster/slower than implicit call to "member"

After some searching on our friend google, I could not get a clear view on the following point.
I'm used to calling class members with this->. Even if not needed, I find it more explicit, as it helps when maintaining some heavy piece of algorithm with loads of vars.
As I'm working on a supposed-to-be-optimised algorithm, I was wondering whether using this-> would alter runtime performance or not.
Does it?
No, the call is exactly the same in both cases.
It doesn't make any difference. Here's a demonstration with GCC. The source is simple class, but I've restricted this post to the difference for clarity.
% diff -s with-this.cpp without-this.cpp
7c7
< this->x = 5;
---
> x = 5;
% g++ -c with-this.cpp without-this.cpp
% diff -s with-this.o without-this.o
Files with-this.o and without-this.o are identical
Answer has been given by zennehoy and here's assembly code (generated by Microsoft C++ compiler) for a simple test class:
class C
{
int n;
public:
void boo(){n = 1;}
void goo(){this->n = 2;}
};
int main()
{
C c;
c.boo();
c.goo();
return 0;
}
Disassembly Window in Visual Studio shows that assembly code is the same for both functions:
class C
{
int n;
public:
void boo(){n = 1;}
001B2F80 55 push ebp
001B2F81 8B EC mov ebp,esp
001B2F83 81 EC CC 00 00 00 sub esp,0CCh
001B2F89 53 push ebx
001B2F8A 56 push esi
001B2F8B 57 push edi
001B2F8C 51 push ecx
001B2F8D 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
001B2F93 B9 33 00 00 00 mov ecx,33h
001B2F98 B8 CC CC CC CC mov eax,0CCCCCCCCh
001B2F9D F3 AB rep stos dword ptr es:[edi]
001B2F9F 59 pop ecx
001B2FA0 89 4D F8 mov dword ptr [ebp-8],ecx
001B2FA3 8B 45 F8 mov eax,dword ptr [this]
001B2FA6 C7 00 01 00 00 00 mov dword ptr [eax],1
001B2FAC 5F pop edi
001B2FAD 5E pop esi
001B2FAE 5B pop ebx
001B2FAF 8B E5 mov esp,ebp
001B2FB1 5D pop ebp
001B2FB2 C3 ret
...
--- ..\main.cpp -----------------------------
void goo(){this->n = 2;}
001B2FC0 55 push ebp
001B2FC1 8B EC mov ebp,esp
001B2FC3 81 EC CC 00 00 00 sub esp,0CCh
001B2FC9 53 push ebx
001B2FCA 56 push esi
001B2FCB 57 push edi
001B2FCC 51 push ecx
001B2FCD 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
001B2FD3 B9 33 00 00 00 mov ecx,33h
001B2FD8 B8 CC CC CC CC mov eax,0CCCCCCCCh
001B2FDD F3 AB rep stos dword ptr es:[edi]
001B2FDF 59 pop ecx
001B2FE0 89 4D F8 mov dword ptr [ebp-8],ecx
001B2FE3 8B 45 F8 mov eax,dword ptr [this]
001B2FE6 C7 00 02 00 00 00 mov dword ptr [eax],2
001B2FEC 5F pop edi
001B2FED 5E pop esi
001B2FEE 5B pop ebx
001B2FEF 8B E5 mov esp,ebp
001B2FF1 5D pop ebp
001B2FF2 C3 ret
And the code in the main:
C c;
c.boo();
001B2F0E 8D 4D F8 lea ecx,[c]
001B2F11 E8 00 E4 FF FF call C::boo (1B1316h)
c.goo();
001B2F16 8D 4D F8 lea ecx,[c]
001B2F19 E8 29 E5 FF FF call C::goo (1B1447h)
The Microsoft compiler uses the __thiscall calling convention by default for member function calls on x86, and the this pointer is passed in the ECX register.
There are several layers involved in the compilation of a language.
The difference between accessing member as member, this->member, MyClass::member etc... is a syntactic difference.
More precisely, it's a matter of name lookup, and how the front-end of the compiler will "find" the exact element you are referring to. Therefore, you might speed up compilation by being more precise... though it will be unnoticeable (there are much more time-consuming tasks involved in C++, like opening all those includes).
Since (in this case) you are referring to the same element, it should not matter.
Now, an interesting parallel can be done with interpreted languages. In an interpreted language, the name lookup will be delayed to the moment where the line (or function) is called. Therefore, it could have an impact at runtime (though once again, probably not really noticeable).
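To make the name-lookup point concrete, here is a small sketch in which the three spellings mentioned above (member, this->member, MyClass::member) are shown to refer to the same object. The accessor names and check_same_member are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

class MyClass {
public:
    int member = 0;

    // Three spellings of the same member; only name lookup differs.
    int& implicit_access()  { return member; }
    int& this_access()      { return this->member; }
    int& qualified_access() { return MyClass::member; }
};

// Asserts that all three accessors refer to the same int object.
void check_same_member()
{
    MyClass obj;
    assert(&obj.implicit_access() == &obj.this_access());
    assert(&obj.this_access() == &obj.qualified_access());

    obj.this_access() = 5;
    assert(obj.implicit_access() == 5);
}
```

Since every accessor resolves to the same address, there is nothing left for the back end to treat differently once lookup has finished.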