Using a C++ reference in inline assembly with GCC - c++

I have a spin lock built on the xchg instruction. The C++ function takes the resource to be locked by reference.
The code follows:
void SpinLock::lock( u32& resource )
{
__asm__ __volatile__
(
"mov ebx, %0\n\t"
"InUseLoop:\n\t"
"mov eax, 0x01\n\t" /* 1=In Use*/
"xchg eax, [ebx]\n\t"
"cmp eax, 0x01\n\t"
"je InUseLoop\n\t"
:"=r"(resource)
:"r"(resource)
:"eax","ebx"
);
}
void SpinLock::unlock(u32& resource )
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"mov ebx, %0\n\t"
"mov DWORD PTR [ebx], 0x00\n\t"
:"=r"(resource)
:"r"(resource)
: "ebx"
);
}
This code is compiled with gcc 4.5.2 and -masm=intel on a 64-bit Intel machine.
objdump produces the following assembly for the above functions:
0000000000490968 <_ZN8SpinLock4lockERj>:
490968: 55 push %rbp
490969: 48 89 e5 mov %rsp,%rbp
49096c: 53 push %rbx
49096d: 48 89 7d f0 mov %rdi,-0x10(%rbp)
490971: 48 8b 45 f0 mov -0x10(%rbp),%rax
490975: 8b 10 mov (%rax),%edx
490977: 89 d3 mov %edx,%ebx
0000000000490979 <InUseLoop>:
490979: b8 01 00 00 00 mov $0x1,%eax
49097e: 67 87 03 addr32 xchg %eax,(%ebx)
490981: 83 f8 01 cmp $0x1,%eax
490984: 74 f3 je 490979 <InUseLoop>
490986: 48 8b 45 f0 mov -0x10(%rbp),%rax
49098a: 89 10 mov %edx,(%rax)
49098c: 5b pop %rbx
49098d: c9 leaveq
49098e: c3 retq
49098f: 90 nop
0000000000490990 <_ZN8SpinLock6unlockERj>:
490990: 55 push %rbp
490991: 48 89 e5 mov %rsp,%rbp
490994: 53 push %rbx
490995: 48 89 7d f0 mov %rdi,-0x10(%rbp)
490999: 48 8b 45 f0 mov -0x10(%rbp),%rax
49099d: 8b 00 mov (%rax),%eax
49099f: 89 d3 mov %edx,%ebx
4909a1: 67 c7 03 00 00 00 00 addr32 movl $0x0,(%ebx)
4909a8: 48 8b 45 f0 mov -0x10(%rbp),%rax
4909ac: 89 10 mov %edx,(%rax)
4909ae: 5b pop %rbx
4909af: c9 leaveq
4909b0: c3 retq
4909b1: 90 nop
The code dumps core when executing the lock operation.
Is there something grossly wrong here?
Regards,
-J

First, why are you using truncated 32-bit addresses in your assembly code whereas the rest of the program is compiled to execute in 64-bit mode and operate with 64-bit addresses/pointers? I'm referring to ebx. Why is it not rbx?
Second, why are you trying to return a value from the assembly code with "=r"(resource)? Your functions change the in-memory value with xchg eax, [ebx] and mov DWORD PTR [ebx], 0x00 and return void. Remove "=r"(resource).
Lastly, if you look closely at the disassembly of SpinLock::lock(), can't you see something odd about ebx?
mov %rdi,-0x10(%rbp)
mov -0x10(%rbp),%rax
mov (%rax),%edx
mov %edx,%ebx
<InUseLoop>:
mov $0x1,%eax
addr32 xchg %eax,(%ebx)
In this code, the value in ebx, which is used as an address/pointer, does not come directly from the function's parameter (rdi). The parameter is first dereferenced with mov (%rax),%edx. Why? If you set aside the confusing C++ reference syntax, the function technically receives a pointer to u32, not a pointer to a pointer to u32, so no extra dereference is needed anywhere.
The problem is here: "r"(resource). It must be "r"(&resource).
A small 32-bit test app demonstrates this problem:
#include <iostream>
using namespace std;
void unlock1(unsigned& resource)
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"movl %0, %%ebx\n\t"
"movl $0, (%%ebx)\n\t"
:
:"r"(resource)
:"ebx"
);
}
void unlock2(unsigned& resource)
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"movl %0, %%ebx\n\t"
"movl $0, (%%ebx)\n\t"
:
:"r"(&resource)
:"ebx"
);
}
unsigned blah;
int main(void)
{
blah = 3456789012u;
cout << "before unlock2() blah=" << blah << endl;
unlock2(blah);
cout << "after unlock2() blah=" << blah << endl;
blah = 3456789012u;
cout << "before unlock1() blah=" << blah << endl;
unlock1(blah); // may crash here, but if it doesn't, it won't change blah
cout << "after unlock1() blah=" << blah << endl;
return 0;
}
Output:
before unlock2() blah=3456789012
after unlock2() blah=0
before unlock1() blah=3456789012
Exiting due to signal SIGSEGV
General Protection Fault at eip=000015eb
eax=ce0a6a14 ...
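The three fixes above can be combined into a corrected lock/unlock pair. This is a minimal sketch under stated assumptions, not the poster's code. spin_lock and spin_unlock are hypothetical free-function stand-ins for SpinLock::lock/unlock. The templates use GCC's default AT&T syntax, so they will not assemble under -masm=intel. The object is passed as a memory operand ("+m"/"=m"), which lets the compiler supply the full 64-bit address itself. That removes both the truncated ebx and the extra dereference. On x86, xchg with a memory operand is implicitly locked. The non-x86 branch uses GCC's __atomic builtins only so that the sketch still compiles on other targets.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t u32;

// Hypothetical free-function versions of SpinLock::lock/unlock.
// Assumes GCC/Clang with the default AT&T assembler syntax (no -masm=intel).
void spin_lock(u32& resource)
{
#if defined(__x86_64__) || defined(__i386__)
    u32 previous;
    do {
        previous = 1;  // 1 = in use
        // "+m"(resource): the compiler substitutes the object's real address,
        // so there is no hand-loaded 32-bit ebx and no extra dereference.
        __asm__ __volatile__("xchgl %0, %1"
                             : "+r"(previous), "+m"(resource)
                             :
                             : "memory");
    } while (previous == 1);  // the lock was already held: try again
#else
    // Portable fallback so the sketch still builds on non-x86 targets.
    while (__atomic_exchange_n(&resource, 1u, __ATOMIC_ACQUIRE) == 1u) {
    }
#endif
}

void spin_unlock(u32& resource)
{
    assert(resource == 1u && "unlocking a lock that is not held");
#if defined(__x86_64__) || defined(__i386__)
    // An aligned 32-bit store releases the lock on x86; the "memory" clobber
    // stops the compiler from moving other memory accesses across it.
    __asm__ __volatile__("movl $0, %0" : "=m"(resource) : : "memory");
#else
    __atomic_store_n(&resource, 0u, __ATOMIC_RELEASE);
#endif
}
```

In new code, std::atomic<u32>::exchange or std::atomic_flag provides the same behaviour without any inline assembly.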

Related

C++ assembly code analysis (compiled with clang)

I am trying to figure out what C++ binary code looks like, especially for virtual function calls. I have come across a few curious things. I have the following C++ code:
#include <iostream>
using namespace std;
class Base {
public:
virtual void print() { cout << "from base" << endl; }
};
class Derived : public Base {
public:
virtual void print() { cout << "from derived" << endl; }
};
int main() {
Base *b;
Derived d;
d.print();
b = &d;
b->print();
return 0;
}
I compiled it with clang++ and then used objdump:
00000000004008b0 <main>:
4008b0: 55 push rbp
4008b1: 48 89 e5 mov rbp,rsp
4008b4: 48 83 ec 20 sub rsp,0x20
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008bc: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4008c3: e8 28 00 00 00 call 4008f0 <Derived::Derived()>
4008c8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008cc: e8 5f 00 00 00 call 400930 <Derived::print()>
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d5: 48 89 7d f0 mov QWORD PTR [rbp-0x10],rdi
4008d9: 48 8b 7d f0 mov rdi,QWORD PTR [rbp-0x10]
4008dd: 48 8b 07 mov rax,QWORD PTR [rdi]
4008e0: ff 10 call QWORD PTR [rax]
4008e2: 31 c0 xor eax,eax
4008e4: 48 83 c4 20 add rsp,0x20
4008e8: 5d pop rbp
4008e9: c3 ret
4008ea: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
My question is why the assembly contains the following two instructions:
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
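The local variable d in main() is stored at [rbp-0x18], in the automatic storage allocated on the stack for main().
lea rdi,[rbp-0x18]
This instruction loads the address of d into the rdi register. Under the System V x86-64 calling convention the first argument is passed in rdi, and for a member function of Derived that first (hidden) argument is the this pointer. The lea is repeated because rdi is a caller-saved register: any call (the constructor, Derived::print()) may overwrite it, and at -O0 the compiler simply recomputes the address each time it needs it.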
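Conceptually, a non-static member function behaves like a free function whose first parameter is the object's address. The sketch below makes that hidden argument explicit; Widget and Widget_get are made-up names for illustration. Under the System V x86-64 ABI, both functions would receive the pointer in rdi, which is what the lea rdi,[rbp-0x18] before each call prepares.

```cpp
#include <cassert>

struct Widget {
    int value;
    // Member function: the compiler passes &object as a hidden first argument.
    int get() const { return value; }
};

// Hypothetical hand-written equivalent with the hidden argument made explicit.
int Widget_get(const Widget* self) { return self->value; }
```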

How to find VPTR in C++ assembly code?

class Base {
public:
Base() {}
virtual void Get() { }
};
class Derivered : public Base {
public:
virtual void Get() { }
};
int main() {
Base* base = new Derivered();
base->Get();
return 0;
}
I use gcc 5.4.0 to compile the code and objdump -S a.out to disassemble the binary. I want to find Base's vptr, but the disassembly only shows an unknown address, 0x80487d4. The highest address in the listing is 0x80487b7, so I can't understand where 0x80487d4 comes from.
Commands: g++ test.cpp -O0; objdump -S a.out
080486fe <_ZN4BaseC1Ev>:
80486fe: 55 push %ebp
80486ff: 89 e5 mov %esp,%ebp
8048701: ba d4 87 04 08 mov $0x80487d4,%edx
8048706: 8b 45 08 mov 0x8(%ebp),%eax
8048709: 89 10 mov %edx,(%eax)
Is...
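push %ebp               ; save the caller's frame pointer
mov %esp, %ebp          ; copy esp into ebp: ebp is now the frame pointer
mov $0x80487d4, %edx    ; load the vptr value (an address inside Base's vtable) into edx
mov 0x8(%ebp), %eax     ; load eax with the this pointer (the first stack argument)
mov %edx,(%eax)         ; store the vptr at offset 0 of the object
The vtable is data, not code. It is typically placed in a read-only data section such as .rodata, which comes after .text, so its address is higher than any instruction address that objdump -S shows.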
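To confirm what 0x80487d4 points to, running nm -C a.out | grep vtable should list "vtable for Base" at a nearby address. Under the Itanium C++ ABI that GCC uses, the vptr typically points a couple of entries past the start of that symbol, after the offset-to-top and typeinfo slots. The same idea can be sketched in C++ by reading the first pointer-sized bytes of an object. This is implementation-specific and not portable C++. It assumes GCC/Clang, where the vptr of a simple polymorphic class like this one sits at offset 0. read_vptr is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>

struct Base {
    virtual void Get() {}
    virtual ~Base() {}
};
struct Derived : Base {
    void Get() override {}
};

// Implementation-specific: assumes the vptr is stored at offset 0 of the object,
// as GCC and Clang do for simple single-inheritance classes like these.
const void* read_vptr(const Base* object)
{
    const void* vptr;
    std::memcpy(&vptr, object, sizeof vptr);  // copy the first pointer-sized bytes
    return vptr;
}
```

All complete Base objects share one vtable, so their vptrs compare equal. A Derived object carries a different vptr.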

Compiler mov'es this pointer to wrong address

I have a simple polymorphic construct with one pure virtual function, Foo.
The only unusual thing about the big project it's used in is that the project uses a couple of global statics for centralized parameter loading and for event logging (I can't easily get rid of that legacy code).
Project info:
Platform toolset: v110_xp
MFC in static library
MBCS charset
calling convention: __cdecl
All optimizations disabled
Warning level 4, no warnings on the whole project
Code:
class Base
{
public:
Base(){}
virtual ~Base(void){}
virtual void Foo(void) = 0;
};
class Derived
: public Base
{
public:
Derived(void) : Base(){}
virtual void Foo(void) override
{
double a = sqrt(4.9);
double b = -a;
}
};
Calling code (it doesn't really matter; the behaviour is the same everywhere):
BOOL MainMFCApp::InitInstance()
{
Derived* d = new Derived();
d->Foo();
delete d;
...
}
The problem is that when running the Debug build (not tested with Release), once we end up inside the function Foo, the this pointer is 'corrupted':
this = 0xcccccccc
this.__vfptr = <unable to read memory>
When I dive into the assembly code entering the function I see the following:
13:
14:
15: void Derived::Foo(void)
16: {
015E3570 55 push ebp
015E3571 8B EC mov ebp,esp
015E3573 83 E4 F8 and esp,0FFFFFFF8h
015E3576 81 EC EC 00 00 00 sub esp,0ECh
015E357C 53 push ebx
015E357D 56 push esi
015E357E 57 push edi
015E357F 51 push ecx
015E3580 8D BD 14 FF FF FF lea edi,[ebp-0ECh]
015E3586 B9 3B 00 00 00 mov ecx,3Bh
015E358B B8 CC CC CC CC mov eax,0CCCCCCCCh
015E3590 F3 AB rep stos dword ptr es:[edi]
015E3592 59 pop ecx
015E3593 89 8C 24 F0 00 00 00 mov dword ptr [esp+0F0h],ecx
17: double a = sqrt(4.9);
015E359A F2 0F 10 05 00 55 09 02 movsd xmm0,mmword ptr ds:[2095500h]
015E35A2 E8 72 D4 FD FF call __libm_sse2_sqrt_precise (015C0A19h)
015E35A7 F2 0F 11 84 24 E0 00 00 00 movsd mmword ptr [esp+0E0h],xmm0
18: double b = -a;
015E35B0 F2 0F 10 84 24 E0 00 00 00 movsd xmm0,mmword ptr [esp+0E0h]
015E35B9 66 0F 57 05 10 55 09 02 xorpd xmm0,xmmword ptr ds:[2095510h]
015E35C1 F2 0F 11 84 24 D0 00 00 00 movsd mmword ptr [esp+0D0h],xmm0
19: return;
20: }
015E35CA 5F pop edi
015E35CB 5E pop esi
015E35CC 5B pop ebx
015E35CD 8B E5 mov esp,ebp
015E35CF 5D pop ebp
015E35D0 C3 ret
--- No source file -------------------------------------------------------------
015E35D1 CC int 3
...
015E35EF CC int 3
With a breakpoint at line 17, right before the function body runs, inspecting the object behind register ecx in the Watch window (cast to Derived*) shows that ecx contains the pointer I need (to the object), but for some reason it is stored to the seemingly random address [esp+0F0h].
And now the really interesting/flabbergasting part: When I change
double b = -a;
to
double b = -1.0 * a;
and compile again, everything magically works. The function assembly has now changed to:
13:
14:
15: void Derived::Foo(void)
16: {
00863570 55 push ebp
00863571 8B EC mov ebp,esp
00863573 81 EC EC 00 00 00 sub esp,0ECh
00863579 53 push ebx
0086357A 56 push esi
0086357B 57 push edi
0086357C 51 push ecx
0086357D 8D BD 14 FF FF FF lea edi,[ebp-0ECh]
00863583 B9 3B 00 00 00 mov ecx,3Bh
00863588 B8 CC CC CC CC mov eax,0CCCCCCCCh
0086358D F3 AB rep stos dword ptr es:[edi]
0086358F 59 pop ecx
00863590 89 4D F8 mov dword ptr [this],ecx
17: double a = sqrt(4.9);
00863593 F2 0F 10 05 00 55 31 01 movsd xmm0,mmword ptr ds:[1315500h]
0086359B E8 79 D4 FD FF call __libm_sse2_sqrt_precise (0840A19h)
008635A0 F2 0F 11 45 E8 movsd mmword ptr [a],xmm0
18: double b = -1.0 * a;
008635A5 F2 0F 10 05 10 55 31 01 movsd xmm0,mmword ptr ds:[1315510h]
008635AD F2 0F 59 45 E8 mulsd xmm0,mmword ptr [a]
008635B2 F2 0F 11 45 D8 movsd mmword ptr [b],xmm0
19: return;
20: }
008635B7 5F pop edi
008635B8 5E pop esi
008635B9 5B pop ebx
008635BA 81 C4 EC 00 00 00 add esp,0ECh
008635C0 3B EC cmp ebp,esp
008635C2 E8 32 CD FC FF call __RTC_CheckEsp (08302F9h)
008635C7 8B E5 mov esp,ebp
008635C9 5D pop ebp
008635CA C3 ret
--- No source file -------------------------------------------------------------
008635CB CC int 3
...
008635EF CC int 3
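Now the generated code stores the pointer in register ecx into this ([ebp-8]), as expected. Other differences:
different memory addresses/offsets
mulsd instead of xorpd to negate the variable
and esp,0FFFFFFF8h disappeared (this instruction rounds esp down to a multiple of 8, i.e. it aligns the stack pointer); in the failing version, locals are addressed relative to esp ([esp+0E0h]) instead of ebp
more cleanup after the function body (add esp,0ECh / cmp ebp,esp / call __RTC_CheckEsp)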
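Two of the changed instructions do simple bit arithmetic, sketched below in C++. The first helper mirrors and esp,0FFFFFFF8h, which clears the low three bits and so rounds an address down to an 8-byte boundary. The second mirrors xorpd with a sign mask, which flips the IEEE-754 sign bit of a double. It assumes, without having checked, that the constant at ds:[2095510h] is the mask 0x8000000000000000. Both helper names are hypothetical. For ordinary (non-NaN) values, flipping the sign bit gives the same result as -1.0 * a, including for zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// What "and esp,0FFFFFFF8h" computes: round an address down to a multiple of 8.
std::uint32_t align_down_8(std::uint32_t esp) { return esp & 0xFFFFFFF8u; }

// What "xorpd xmm0, <sign mask>" computes: flip the sign bit of a double.
// Assumes the mask constant is 0x8000000000000000.
double negate_by_sign_bit(double x)
{
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the double's bits
    bits ^= 0x8000000000000000ull;        // toggle only the sign bit
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```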
The call-site assembly, which loads d into ecx (the this pointer under __thiscall) and calls Foo through the vtable, is the same in both situations:
53: d->Foo();
011A500B 8B 45 E0 mov eax,dword ptr [d]
011A500E 8B 10 mov edx,dword ptr [eax]
011A5010 8B F4 mov esi,esp
011A5012 8B 4D E0 mov ecx,dword ptr [d]
011A5015 8B 42 04 mov eax,dword ptr [edx+4]
011A5018 FF D0 call eax
Of course, when I try to reproduce this with a Minimal, Complete, and Verifiable example, everything works as intended. But in my big project, it fails consistently.
I'm not sure which settings can influence code generation, and I don't know enough assembly to even see what's going on there;
therefore I'm asking here in the hope that someone has seen this before or recognizes this behaviour.
Note: it also works again when I remove the sqrt call.
Update:
No problems in release
VS2012 SP4 (v11.0.61030.00)
The problem persists when Foo references member variables (as opposed to no member references)
TODO: try without global statics

SIGSEGV When accessing array element using assembly

Background:
I am new to assembly. When I was learning programming, I made a program that generates multiplication tables up to 1000 * 1000. The tables are formatted so that each answer is on line factor1 << 10 | factor2 (I know, I know, it isn't pretty). These tables are then loaded into an array: int* tables. Empty lines are filled with 0. Here is a link to the file for the tables (7.3 MB). I know using assembly won't speed this up by much, but I just wanted to do it for fun (and a bit of practice).
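The row layout can be sketched as a small helper; table_index is a hypothetical name. As long as factor2 is below 1024, factor1 << 10 | factor2 equals factor1 * 1024 + factor2, because the shift leaves the low 10 bits free for factor2.

```cpp
#include <cassert>

// Row index for (factor1, factor2) in the question's layout.
// Only valid while factor2 < 1024; a larger factor2 would overlap factor1's bits.
int table_index(int factor1, int factor2)
{
    assert(factor2 >= 0 && factor2 < 1024);
    return factor1 << 10 | factor2;  // << binds tighter than |
}
```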
Question:
I'm trying to convert this code into inline assembly (tables is a global):
int answer;
// ...
answer = tables [factor1 << 10 | factor2];
This is what I came up with:
asm volatile ( "shll $10, %1;"
"orl %1, %2;"
"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );
My C++ code works fine, but my assembly fails. What is wrong with my assembly (especially the movl _tables(,%2,4), %0; part), compared to my C++?
What I have done to solve it:
I used some arbitrary numbers, 89 and 786, as factor1 and factor2. I know that there is an element at 89 << 10 | 786 (which is 91922); I verified this with C++. When I run it with gdb, I get a SIGSEGV:
Program received signal SIGSEGV, Segmentation fault.
at this line:
"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );
I added two methods around my asm, which is how I know where the asm block is in the disassembly.
Disassembly of my asm block:
The disassembly from objdump -M att -d looks fine (although I'm not sure, I'm new to assembly, as I said):
402096: 8b 45 08 mov 0x8(%ebp),%eax
402099: 8b 55 0c mov 0xc(%ebp),%edx
40209c: c1 e0 0a shl $0xa,%eax
40209f: 09 c2 or %eax,%edx
4020a1: 8b 04 95 18 e0 47 00 mov 0x47e018(,%edx,4),%eax
4020a8: 89 45 ec mov %eax,-0x14(%ebp)
The disassembly from objdump -M intel -d:
402096: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
402099: 8b 55 0c mov edx,DWORD PTR [ebp+0xc]
40209c: c1 e0 0a shl eax,0xa
40209f: 09 c2 or edx,eax
4020a1: 8b 04 95 18 e0 47 00 mov eax,DWORD PTR [edx*4+0x47e018]
4020a8: 89 45 ec mov DWORD PTR [ebp-0x14],eax
From what I understand, it moves the first parameter of my void calc(int factor1, int factor2) function into eax, then moves the second parameter into edx. Then it shifts eax left by 10 and ORs it into edx. A 32-bit integer is 4 bytes, hence [edx*4+base_address]. It loads the result into eax and then stores eax into int answer (which, I'm guessing, is at -0x14(%ebp) on the stack). I don't really see a problem.
Disassembly of the compiler's .exe:
When I replace the asm block with plain C++ (answer = tables [factor1 << 10 | factor2];) and disassemble it this is what I get in Intel syntax:
402096: a1 18 e0 47 00 mov eax,ds:0x47e018
40209b: 8b 55 08 mov edx,DWORD PTR [ebp+0x8]
40209e: c1 e2 0a shl edx,0xa
4020a1: 0b 55 0c or edx,DWORD PTR [ebp+0xc]
4020a4: c1 e2 02 shl edx,0x2
4020a7: 01 d0 add eax,edx
4020a9: 8b 00 mov eax,DWORD PTR [eax]
4020ab: 89 45 ec mov DWORD PTR [ebp-0x14],eax
AT&T syntax:
402096: a1 18 e0 47 00 mov 0x47e018,%eax
40209b: 8b 55 08 mov 0x8(%ebp),%edx
40209e: c1 e2 0a shl $0xa,%edx
4020a1: 0b 55 0c or 0xc(%ebp),%edx
4020a4: c1 e2 02 shl $0x2,%edx
4020a7: 01 d0 add %edx,%eax
4020a9: 8b 00 mov (%eax),%eax
4020ab: 89 45 ec mov %eax,-0x14(%ebp)
I am not really familiar with the Intel syntax, so I am just going to try and understand the AT&T syntax:
It first moves the base address of the tables array into %eax. Then it moves the first parameter into %edx. It shifts %edx left by 10, then ORs it with the second parameter. Then, by shifting %edx left by two, it multiplies %edx by 4. Then it adds that to %eax (the base address of the array). So, basically, it just did this: [edx*4+0x47e018] (Intel syntax) or 0x47e018(,%edx,4) (AT&T). It moves the value of the element it got into %eax and puts it into int answer. This method is more "expanded", but it does the same thing as my hand-written assembly! So why does mine cause a SIGSEGV while the compiler's works fine?
I bet (from the disassembly) that tables is a pointer to an array, not the array itself.
So you need:
asm volatile ( "shll $10, %1;"
"movl _tables, %%eax;"
"orl %1, %2;"
"movl (%%eax,%2,4), %0;"
: "=r" (answer), "+r" (factor1), "+r" (factor2) /* %1 and %2 are modified, so they are read-write operands */
:
: "eax" );
(Don't forget the extra clobber in the last line).
There are of course variations. This one may be more efficient if the code is in a loop, since the compiler can keep tables in a register:
asm volatile ( "shll $10, %1;"
"orl %1, %2;"
"movl (%3,%2,4), %0;"
: "=r" (answer), "+r" (factor1), "+r" (factor2) : "r" (tables) );
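In the same spirit as the register-operand variant, here is a sketch that also assembles as 64-bit code. The pointer value of tables is passed in a register, and the index is widened to pointer size so that base and index registers have the same width. tables and lookup are hypothetical names, the template assumes AT&T syntax, and the non-x86 branch is plain C++ so that the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>

int* tables = nullptr;  // hypothetical: points at the loaded table, as in the question

int lookup(int factor1, int factor2)
{
    // Pointer-sized index, so base and index registers have the same width.
    std::intptr_t index = (static_cast<std::intptr_t>(factor1) << 10) | factor2;
    int answer;
#if defined(__x86_64__) || defined(__i386__)
    // tables is a pointer, so its value (the array's address) goes in a register.
    // The "memory" clobber tells the compiler the asm reads memory it cannot see.
    __asm__("movl (%1,%2,4), %0"
            : "=r"(answer)
            : "r"(tables), "r"(index)
            : "memory");
#else
    answer = tables[index];
#endif
    return answer;
}
```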
This is intended as an addition to Mats Petersson's answer. I wrote it simply because it wasn't immediately clear to me why the OP's analysis of the disassembly (that his assembly and the compiler-generated code were equivalent) was incorrect.
As Mats Petersson explains, the problem is that tables is actually a pointer to an array, so to access an element, you have to dereference twice. Now to me, it wasn't immediately clear where this happens in the compiler-generated code. The culprit is this innocent-looking line:
a1 18 e0 47 00 mov 0x47e018,%eax
To the untrained eye (that includes mine), this might look like the value 0x47e018 is moved to eax, but it's actually not. The Intel-syntax representation of the same opcodes gives us a clue:
a1 18 e0 47 00 mov eax,ds:0x47e018
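Ah, ds:! So 0x47e018 is not an immediate value but an address. Opcode a1 is the memory-offset form of mov, which loads eax from the absolute address encoded in the instruction, i.e. it reads the pointer stored in tables.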
For anyone who is wondering, the following would be the opcode and AT&T syntax for moving the value 0x47e018 itself into eax:
b8 18 e0 47 00 mov $0x47e018,%eax
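The array-versus-pointer distinction can be seen from C++ as well. This is a minimal sketch with hypothetical names. Indexing a true array needs only the element load, because the array's address is known without reading memory. Indexing through a pointer variable first has to load the pointer, which is the mov 0x47e018,%eax step, and only then the element.

```cpp
#include <cassert>
#include <cstddef>

int table_array[4] = {10, 20, 30, 40};  // the symbol IS the array
int* table_pointer = table_array;       // the symbol HOLDS the array's address

// Conceptually one load: the element at table_array's address + i * sizeof(int).
int read_array(std::size_t i) { return table_array[i]; }

// Conceptually two loads: first the pointer stored in table_pointer,
// then the element it points to.
int read_pointer(std::size_t i) { return table_pointer[i]; }
```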

Is accessing c++ member class through "this->member" faster/slower than implicit call to "member"

After some searching on Google, I could not find a clear answer to the following question.
I'm used to accessing class members with this->. Even when it isn't needed, I find it more explicit, as it helps when maintaining a heavy piece of algorithmic code with loads of variables.
As I'm working on a supposed-to-be-optimised algorithm, I was wondering whether using this-> would alter runtime performance or not.
Does it?
No, the call is exactly the same in both cases.
It doesn't make any difference. Here's a demonstration with GCC. The source is a simple class, but I've restricted this post to the diff for clarity.
% diff -s with-this.cpp without-this.cpp
7c7
< this->x = 5;
---
> x = 5;
% g++ -c with-this.cpp without-this.cpp
% diff -s with-this.o without-this.o
Files with-this.o and without-this.o are identical
zennehoy has already given the answer; here's the assembly code (generated by the Microsoft C++ compiler) for a simple test class:
class C
{
int n;
public:
void boo(){n = 1;}
void goo(){this->n = 2;}
};
int main()
{
C c;
c.boo();
c.goo();
return 0;
}
The Disassembly window in Visual Studio shows that the assembly code is the same for both functions, apart from the stored constant (1 vs. 2):
class C
{
int n;
public:
void boo(){n = 1;}
001B2F80 55 push ebp
001B2F81 8B EC mov ebp,esp
001B2F83 81 EC CC 00 00 00 sub esp,0CCh
001B2F89 53 push ebx
001B2F8A 56 push esi
001B2F8B 57 push edi
001B2F8C 51 push ecx
001B2F8D 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
001B2F93 B9 33 00 00 00 mov ecx,33h
001B2F98 B8 CC CC CC CC mov eax,0CCCCCCCCh
001B2F9D F3 AB rep stos dword ptr es:[edi]
001B2F9F 59 pop ecx
001B2FA0 89 4D F8 mov dword ptr [ebp-8],ecx
001B2FA3 8B 45 F8 mov eax,dword ptr [this]
001B2FA6 C7 00 01 00 00 00 mov dword ptr [eax],1
001B2FAC 5F pop edi
001B2FAD 5E pop esi
001B2FAE 5B pop ebx
001B2FAF 8B E5 mov esp,ebp
001B2FB1 5D pop ebp
001B2FB2 C3 ret
...
--- ..\main.cpp -----------------------------
void goo(){this->n = 2;}
001B2FC0 55 push ebp
001B2FC1 8B EC mov ebp,esp
001B2FC3 81 EC CC 00 00 00 sub esp,0CCh
001B2FC9 53 push ebx
001B2FCA 56 push esi
001B2FCB 57 push edi
001B2FCC 51 push ecx
001B2FCD 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
001B2FD3 B9 33 00 00 00 mov ecx,33h
001B2FD8 B8 CC CC CC CC mov eax,0CCCCCCCCh
001B2FDD F3 AB rep stos dword ptr es:[edi]
001B2FDF 59 pop ecx
001B2FE0 89 4D F8 mov dword ptr [ebp-8],ecx
001B2FE3 8B 45 F8 mov eax,dword ptr [this]
001B2FE6 C7 00 02 00 00 00 mov dword ptr [eax],2
001B2FEC 5F pop edi
001B2FED 5E pop esi
001B2FEE 5B pop ebx
001B2FEF 8B E5 mov esp,ebp
001B2FF1 5D pop ebp
001B2FF2 C3 ret
And the code in main:
C c;
c.boo();
001B2F0E 8D 4D F8 lea ecx,[c]
001B2F11 E8 00 E4 FF FF call C::boo (1B1316h)
c.goo();
001B2F16 8D 4D F8 lea ecx,[c]
001B2F19 E8 29 E5 FF FF call C::goo (1B1447h)
The Microsoft compiler uses the __thiscall calling convention by default for member functions on 32-bit x86, and the this pointer is passed in the ECX register.
There are several layers involved in the compilation of a language.
The difference between accessing member as member, this->member, MyClass::member etc... is a syntactic difference.
More precisely, it's a matter of name lookup, and of how the compiler front-end "finds" the exact element you are referring to. Therefore, you might speed up compilation by being more precise, though the effect will be unnoticeable (there are far more time-consuming tasks involved in compiling C++, like opening all those includes).
Since (in this case) you are referring to the same element, it should not matter.
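There is one case where the spelling does matter to name lookup, though never to the generated code: a class template with a dependent base class. In that situation plain member is not found, and this->member (or Base<T>::member) is required. A minimal sketch with illustrative names:

```cpp
#include <cassert>

template <typename T>
struct Base {
    T member = T();
};

template <typename T>
struct Derived : Base<T> {
    // Plain "member" would not compile here: names from a dependent base class
    // are not found by unqualified lookup when the template is defined.
    // "this->member" makes the name dependent, so lookup waits for instantiation.
    void set(T value) { this->member = value; }
    T get() const { return this->member; }
};
```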
Now, an interesting parallel can be drawn with interpreted languages. In an interpreted language, name lookup may be delayed until the line (or function) is executed. Therefore, it could have an impact at runtime (though, once again, probably not a noticeable one).