Let's say I've got the following in an x64 release configuration.
It's an obfuscated code snippet...
// Hdr1.h
// Dozens of includes
class Cls1
{
public:
Cls1();
virtual void bar();
// ...
protected:
// about 7 fields where some of them are of complex template type.
bool isFlag1 : 1;
bool isFlag2 : 1;
};
// Hdr2.h
// Dozens of includes
class Cls2
{
public:
// ...
void foo();
};
I've got separate translation units implementing these classes. When, from foo, I try to call the virtual method Cls1::bar, I get a crash (access violation).
void Cls2::foo()
{
//...
Cls1 * pCls1 = // somehow I get this goddamn pointer
pCls1->bar(); // Here I crash
}
From the disassembly I see that Cls1::Cls1 puts the vtable pointer at offset 8 from the very beginning of this. From the disassembly of Cls2::foo I see that it reads the vtable pointer from offset zero. The debugger is also unable to see this vtable correctly. If I manually read the vtable at offset 8, the addresses in the table appear to be correct.
The question is: why could this happen? What pragma (or anything else) could lead to this? Compilation flags are the same for both translation units.
Below I add a bit of disassembly:
This is the normal case I see across the code:
Module1!CSomeOkClass::CreateObjInstance:
sub rsp,28h
mov edx,4 ; own inlined operator new
lea ecx,[rdx+34h] ; own inlined operator new
call OwnMemoryRoutines!OwnMalloc (someAddr) ; own inlined operator new
xor edx,edx
test rax,rax
je Module1!CSomeOkClass::CreateObjInstance+0x40 (someAddr)
**lea rcx,[Module1!CSomeOkClass::`vftable' (someAddr)] ; Inlined CSomeOkClass::CSomeOkClass < vtable ptr**
mov qword ptr [rax+8],rdx ; Inlined CSomeOkClass::CSomeOkClass
mov qword ptr [rax+10h],rdx ; Inlined CSomeOkClass::CSomeOkClass
mov qword ptr [rax+18h],rdx ; Inlined CSomeOkClass::CSomeOkClass
mov byte ptr [rax+20h],dl ; Inlined CSomeOkClass::CSomeOkClass
mov qword ptr [rax+28h],rdx ; Inlined CSomeOkClass::CSomeOkClass
**mov qword ptr [rax],rcx ; Inlined CSomeOkClass::CSomeOkClass < offset zero**
Now let's see what I've got for Cls1::Cls1:
Module1!Cls1::Cls1:
mov qword ptr [rsp+8],rbx
push rdi
sub rsp,20h
**lea rax,[Module1!Cls1::`vftable' (someAddress)] ; vtable address**
mov rbx,rdx
mov rdi,rcx
**mov qword ptr [rcx+8],rax ; Places at offset 8**
I assure you that Cls2 expects pointer to vtable to be at offset zero.
Compilation options are:
/nologo /WX /W3 /MD /c /Zc:wchar_t /Zc:forScope /Zm192 /bigobj /d2Zi+ /Zi /Oi /GS- /GF /Oy- /fp:fast /Gm- /Ox /Gy /Ob2 /GR- /Os
I noticed that Cls1::Cls1 heavily uses SSE instructions inlined from intrinsics.
Compiler version:
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50727.1 for x64
Please note that this code works fine on other platforms/compilers.
I managed to figure out that the problem was in fact the bitfield at the very end of the Cls1 definition. The generated ctor places the pointer to the vtable at offset zero if I make isFlag1 and isFlag2 ordinary bools. These flags are initialized in the ctor's initializer list. By commenting out the class's code line by line I narrowed the problem down to this bitfield. To investigate, I used WinDbg, the /P compiler option, and compiled the cpp unit manually with the original flags plus /FAs /Fa. It appears to be a compiler bug.
TL;DR
A subclass redefines a virtual function of its superclass (base class) in the scope of the superclass, because the dynamic loader requires it to. This doesn't make any sense to me.
Example:
class IO80211Controller : public IOEthernetController
{
virtual IOReturn enablePacketTimestamping(); // Implemented in binary, I can see the disassembly.
};
// .cpp - Redefinition with superclass namespace.
IOReturn IO80211Controller::enablePacketTimestamping()
{
return kIOReturnUnsupported; // This is from the disassembly of IO80211Controller
}
The above isn't the real header, I hope it's close to what it should be - no header is available.
// .hpp
class AirPortBrcm4331 : public IO80211Controller
{
// Subclass stuff goes here
};
// .cpp - Redefinition with superclass namespace.
IOReturn IO80211Controller::enablePacketTimestamping()
{
return kIOReturnUnsupported; // This is from the disassembly of AirPortBrcm4331
}
Background
I'm researching IO80211Family.kext (which there are no headers available for), and IO80211Controller class in particular - I'm in the process of reversing the header so it will be possible to inherit from this class and create custom 802.11 drivers.
Discovering the problem
IO80211Controller defines many virtual member functions, which I need to declare on my reversed header file. I created a header file with all virtual functions (extracted from IO80211Controller's vtable) and used it for my subclass.
When loading my new kext (with the subclass), there were linking errors:
kxld[com.osxkernel.MyWirelessDriver]: The following symbols are unresolved for this kext:
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::enableFeature(IO80211FeatureCode, void*)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::flowIdSupported()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::apple80211_ioctl(IO80211Interface*, __ifnet*, unsigned long, void*)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::enablePacketTimestamping()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::hardwareOutputQueueDepth(IO80211Interface*)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::disablePacketTimestamping()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::performCountryCodeOperation(IO80211Interface*, IO80211CountryCodeOp)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::requiresExplicitMBufRelease()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::_RESERVEDIO80211Controllerless7()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::stopDMA()
Link failed (error code 5).
The reversed header of the superclass contains over 50 virtual member functions, so if there were any linking problems, I would expect them to be all-or-nothing. When I add a simple implementation for these functions (using the superclass namespace), the linking errors go away.
Two questions arise
How can multiple implementations of the same functions co-exist? They both live in the kernel address space.
What makes these specific functions so special, while the other 50 are ok without a weird reimplementing demand?
Hypothesis
I can't answer the first question, but I have started researching the second.
I looked into the IO80211Family Mach-O symbol table: all the functions with linking errors lack the N_EXT bit in their type field - meaning they are not external symbols - while the other functions do have the N_EXT bit set.
I wasn't sure how this affects the kext loading procedure, so I dived into the XNU source and looked for the kext loading code. There's a major player here called vtable patching, which might shed some light on my first question.
Anyway, there's a predicate function called kxld_sym_is_unresolved which checks whether a symbol is unresolved. kxld calls this function on all symbols to verify they are all ok.
boolean_t
kxld_sym_is_unresolved(const KXLDSym *sym)
{
return ((kxld_sym_is_undefined(sym) && !kxld_sym_is_replaced(sym)) ||
kxld_sym_is_indirect(sym) || kxld_sym_is_common(sym));
}
In my case this function's result comes down to the return value of kxld_sym_is_replaced, which simply checks whether the symbol has been patched (vtable patching). I don't understand well enough what that is and how it affects me...
The Grand Question
Why did Apple choose to make these functions non-external? Are they implying that they should be implemented by others - and if so, why in the same scope as the superclass? I jumped into the source to find an answer but didn't. This is what disturbs me most - it doesn't follow my logic. I understand that a fully comprehensive answer is probably too complicated, so at least help me understand, at a higher level, what's going on here: what's the logic behind not letting the subclass get the implementation of these specific functions, in such a weird way (why not pure virtual)?
Thank you so much for reading this!
The immediate explanation is indeed that the symbols are not exported by the IO80211 kext. The likely reason behind this however is that the functions are implemented inline, like so:
class IO80211Controller : public IOEthernetController
{
//...
virtual IOReturn enablePacketTimestamping()
{
return kIOReturnUnsupported;
}
//...
};
For example, if I build this code:
#include <cstdio>
class MyClass
{
public:
virtual void InlineVirtual() { printf("MyClass::InlineVirtual\n"); }
virtual void RegularVirtual();
};
void MyClass::RegularVirtual()
{
printf("MyClass::RegularVirtual\n");
}
int main()
{
MyClass a;
a.InlineVirtual();
a.RegularVirtual();
}
using the command
clang++ -std=gnu++14 inline-virtual.cpp -o inline-virtual
and then inspect the symbols using nm:
$ nm ./inline-virtual
0000000100000f10 t __ZN7MyClass13InlineVirtualEv
0000000100000e90 T __ZN7MyClass14RegularVirtualEv
0000000100000ef0 t __ZN7MyClassC1Ev
0000000100000f40 t __ZN7MyClassC2Ev
0000000100001038 S __ZTI7MyClass
0000000100000faf S __ZTS7MyClass
0000000100001018 S __ZTV7MyClass
U __ZTVN10__cxxabiv117__class_type_infoE
0000000100000000 T __mh_execute_header
0000000100000ec0 T _main
U _printf
U dyld_stub_binder
You can see that MyClass::InlineVirtual has hidden visibility (t), while MyClass::RegularVirtual is exported (T). The implementation of a function declared inline (either explicitly with the keyword or implicitly by placing it inside the class definition) must be provided in every translation unit that calls it, so it makes sense that it wouldn't have external linkage.
You're hitting a very simple phenomenon: unexported symbols.
$ nm /System/Library/Extensions/IO80211Family.kext/Contents/MacOS/IO80211Family | fgrep __ZN17IO80211Controller | egrep '\w{16} t'
00000000000560c6 t __ZN17IO80211Controller13enableFeatureE18IO80211FeatureCodePv
00000000000560f6 t __ZN17IO80211Controller15flowIdSupportedEv
0000000000055fd4 t __ZN17IO80211Controller16apple80211_ioctlEP16IO80211InterfaceP7__ifnetmPv
0000000000055f74 t __ZN17IO80211Controller21monitorModeSetEnabledEP16IO80211Interfacebj
0000000000056154 t __ZN17IO80211Controller24enablePacketTimestampingEv
0000000000056008 t __ZN17IO80211Controller24hardwareOutputQueueDepthEP16IO80211Interface
0000000000056160 t __ZN17IO80211Controller25disablePacketTimestampingEv
0000000000056010 t __ZN17IO80211Controller27performCountryCodeOperationEP16IO80211Interface20IO80211CountryCodeOp
00000000000560ee t __ZN17IO80211Controller27requiresExplicitMBufReleaseEv
0000000000055ffc t __ZN17IO80211Controller7stopDMAEv
0000000000057452 t __ZN17IO80211Controller9MetaClassD0Ev
0000000000057448 t __ZN17IO80211Controller9MetaClassD1Ev
Save for the two MetaClass destructors, the only difference from your list of linking errors is monitorModeSetEnabled (any chance you're overriding that?).
Now on my system I have exactly one class extending IO80211Controller, which is AirPort_BrcmNIC, implemented by com.apple.driver.AirPort.BrcmNIC. So let's look at how that handles it:
$ nm /System/Library/Extensions/AirPortBrcmNIC-MFG.kext/Contents/MacOS/AirPortBrcmNIC-MFG | egrep '13enableFeatureE18IO80211FeatureCodePv|15flowIdSupportedEv|16apple80211_ioctlEP16IO80211InterfaceP7__ifnetmPv|21monitorModeSetEnabledEP16IO80211Interfacebj|24enablePacketTimestampingEv|24hardwareOutputQueueDepthEP16IO80211Interface|25disablePacketTimestampingEv|27performCountryCodeOperationEP16IO80211Interface20IO80211CountryCodeOp|27requiresExplicitMBufReleaseEv|7stopDMAEv'
0000000000046150 t __ZN17IO80211Controller15flowIdSupportedEv
0000000000046120 t __ZN17IO80211Controller16apple80211_ioctlEP16IO80211InterfaceP7__ifnetmPv
0000000000046160 t __ZN17IO80211Controller24enablePacketTimestampingEv
0000000000046170 t __ZN17IO80211Controller25disablePacketTimestampingEv
0000000000046140 t __ZN17IO80211Controller27requiresExplicitMBufReleaseEv
000000000003e880 T __ZN19AirPort_BrcmNIC_MFG13enableFeatureE18IO80211FeatureCodePv
0000000000025b10 T __ZN19AirPort_BrcmNIC_MFG21monitorModeSetEnabledEP16IO80211Interfacebj
0000000000025d20 T __ZN19AirPort_BrcmNIC_MFG24hardwareOutputQueueDepthEP16IO80211Interface
0000000000038cf0 T __ZN19AirPort_BrcmNIC_MFG27performCountryCodeOperationEP16IO80211Interface20IO80211CountryCodeOp
000000000003e7d0 T __ZN19AirPort_BrcmNIC_MFG7stopDMAEv
So one bunch of methods they've overridden, and the rest... they re-implemented locally. Firing up a disassembler, we can see that these are really just stubs:
;-- IO80211Controller::apple80211_ioctl(IO80211Interface*,__ifnet*,unsignedlong,void*):
;-- method.IO80211Controller.apple80211_ioctl_IO80211Interface____ifnet__unsignedlong_void:
0x00046120 55 push rbp
0x00046121 4889e5 mov rbp, rsp
0x00046124 4d89c1 mov r9, r8
0x00046127 4989c8 mov r8, rcx
0x0004612a 4889d1 mov rcx, rdx
0x0004612d 488b17 mov rdx, qword [rdi]
0x00046130 488b82900c00. mov rax, qword [rdx + 0xc90]
0x00046137 31d2 xor edx, edx
0x00046139 5d pop rbp
0x0004613a ffe0 jmp rax
0x0004613c 0f1f4000 nop dword [rax]
;-- IO80211Controller::requiresExplicitMBufRelease():
;-- method.IO80211Controller.requiresExplicitMBufRelease:
0x00046140 55 push rbp
0x00046141 4889e5 mov rbp, rsp
0x00046144 31c0 xor eax, eax
0x00046146 5d pop rbp
0x00046147 c3 ret
0x00046148 0f1f84000000. nop dword [rax + rax]
;-- IO80211Controller::flowIdSupported():
;-- method.IO80211Controller.flowIdSupported:
0x00046150 55 push rbp
0x00046151 4889e5 mov rbp, rsp
0x00046154 31c0 xor eax, eax
0x00046156 5d pop rbp
0x00046157 c3 ret
0x00046158 0f1f84000000. nop dword [rax + rax]
;-- IO80211Controller::enablePacketTimestamping():
;-- method.IO80211Controller.enablePacketTimestamping:
0x00046160 55 push rbp
0x00046161 4889e5 mov rbp, rsp
0x00046164 b8c70200e0 mov eax, 0xe00002c7
0x00046169 5d pop rbp
0x0004616a c3 ret
0x0004616b 0f1f440000 nop dword [rax + rax]
;-- IO80211Controller::disablePacketTimestamping():
;-- method.IO80211Controller.disablePacketTimestamping:
0x00046170 55 push rbp
0x00046171 4889e5 mov rbp, rsp
0x00046174 b8c70200e0 mov eax, 0xe00002c7
0x00046179 5d pop rbp
0x0004617a c3 ret
0x0004617b 0f1f440000 nop dword [rax + rax]
Which roughly corresponds to this:
uint32_t IO80211Controller::apple80211_ioctl(IO80211Interface *intf, __ifnet *net, unsigned long some, void *whatev)
{
return this->apple80211_ioctl(intf, (IO80211VirtualInterface*)NULL, net, some, whatev);
}
bool IO80211Controller::requiresExplicitMBufRelease()
{
return false;
}
bool IO80211Controller::flowIdSupported()
{
return false;
}
IOReturn IO80211Controller::enablePacketTimestamping()
{
return kIOReturnUnsupported;
}
IOReturn IO80211Controller::disablePacketTimestamping()
{
return kIOReturnUnsupported;
}
I didn't try to compile the above, but that should get you on the right track. :)
I'm catching a link error when compiling and linking a source file with inline assembly.
Here are the test files:
via:$ cat test.cxx
extern int libtest();
int main(int argc, char* argv[])
{
return libtest();
}
$ cat lib.cxx
#include <stdint.h>
int libtest()
{
uint32_t rnds_00_15;
__asm__ __volatile__
(
".intel_syntax noprefix ;\n\t"
"mov DWORD PTR [rnds_00_15], 1 ;\n\t"
"cmp DWORD PTR [rnds_00_15], 1 ;\n\t"
"je done ;\n\t"
"done: ;\n\t"
".att_syntax noprefix ;\n\t"
:
: [rnds_00_15] "m" (rnds_00_15)
: "memory", "cc"
);
return 0;
}
Compiling and linking the program results in:
via:$ g++ -fPIC test.cxx lib.cxx -c
via:$ g++ -fPIC lib.o test.o -o test.exe
lib.o: In function `libtest()':
lib.cxx:(.text+0x1d): undefined reference to `rnds_00_15'
lib.cxx:(.text+0x27): undefined reference to `rnds_00_15'
collect2: error: ld returned 1 exit status
The real program is more complex. The routine is out of registers, so the flag rnds_00_15 must be a memory operand. Use of rnds_00_15 is local to the asm block; it is declared in the C code only to ensure the memory is allocated on the stack, nothing more. We don't read from it or write to it as far as the C code is concerned. We list it as a memory input so GCC knows we use it, and wire up the "C variable name" in the extended asm.
Why am I receiving a link error, and how do I fix it?
Compile with gcc -masm=intel and don't try to switch modes inside the asm template string. AFAIK there was no equivalent before clang 14. (Note: macOS installs clang as gcc / g++ by default.)
Also, of course you need to use valid GNU C inline asm, using operands to tell the compiler which C objects you want to read and write.
Can I use Intel syntax of x86 assembly with GCC? (clang 14 supports -masm=intel like GCC.)
How to set gcc to use intel syntax permanently? (clang 13 and earlier didn't.)
I don't believe Intel syntax uses the percent sign. Perhaps I am missing something?
You're getting mixed up between %operand substitutions into the Extended-Asm template (which use a single %), vs. the final asm that the assembler sees.
You need %% to use a literal % in the final asm. You wouldn't use "mov %%eax, 1" in Intel-syntax inline asm, but you do still use "mov %0, 1" or %[named_operand].
See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html. In Basic asm (no operands), there is no substitution and % isn't special in the template, so you'd write mov $1, %eax in Basic asm vs. mov $1, %%eax in Extended, if for some reason you weren't using an operand like mov $1, %[tmp] or mov $1, %0.
uint32_t rnds_00_15; is a local with automatic storage, so of course there's no asm symbol with that name.
Use %[rnds_00_15] and compile with -masm=intel. (And remove the .att_syntax at the end; that would break the compiler-generated asm that comes after.)
You also need to remove the DWORD PTR, because the operand expansion already includes that, e.g. DWORD PTR [rsp - 4], and clang errors on DWORD PTR DWORD PTR [rsp - 4]. (GAS accepts it just fine, but the second one takes precedence, so it's pointless and potentially misleading.)
And you'll want a "=m" output operand if you want the compiler to reserve you some scratch space on the stack. You must not modify input-only operands, even if the value is unused in the C code. Maybe the compiler decides it can overlap something else because the variable is never written and not initialized (i.e. UB). (I'm not sure whether your "memory" clobber makes it safe, but there's no reason not to use an early-clobber output operand here.)
And you'll want to avoid label name conflicts by using %= to get a unique number.
Working example (GCC and ICC, but not clang unfortunately), on the Godbolt compiler explorer (which uses -masm=intel depending on options in the dropdown). You can use "binary mode" (the 11010 button) to prove that it actually assembles after compiling to asm without warnings.
int libtest_intel()
{
uint32_t rnds_00_15;
// Intel syntax operand-size can only be overridden with operand modifiers
// because the expansion includes an explicit DWORD PTR
__asm__ __volatile__
( // ".intel_syntax noprefix \n\t"
"mov %[rnds_00_15], 1 \n\t"
"cmp %[rnds_00_15], 1 \n\t"
"je .Ldone%= \n\t"
".Ldone%=: \n\t"
: [rnds_00_15] "=&m" (rnds_00_15)
:
: // no clobbers
);
return 0;
}
Compiles (with gcc -O3 -masm=intel) to this asm. Also works with gcc -m32 -masm=intel of course:
libtest_intel:
mov DWORD PTR [rsp-4], 1
cmp DWORD PTR [rsp-4], 1
je .Ldone8
.Ldone8:
xor eax, eax
ret
I couldn't get this to work with clang: It choked on .intel_syntax noprefix when I left that in explicitly.
Operand-size overrides:
You have to use %b[tmp] to get the compiler to substitute in BYTE PTR [rsp-4] to only access the low byte of a dword input operand. I'd recommend AT&T syntax if you want to do much of this.
Using %[rnds_00_15] results in Error: junk '(%ebp)' after expression.
That's because you switched to Intel syntax without telling the compiler. If you want it to use Intel addressing modes, compile with -masm=intel so the compiler can substitute into the template with the correct syntax.
This is why I avoid that crappy GCC inline assembly at nearly all costs. Man I despise this crappy tool.
You're just using it wrong. It's a bit cumbersome, but makes sense and mostly works well if you understand how it's designed.
Repeat after me: The compiler doesn't parse the asm string at all, except to do text substitutions of %operand. This is why it doesn't notice your .intel_syntax noprefix and keeps substituting AT&T syntax.
It does work better and more easily with AT&T syntax though, e.g. for overriding the operand-size of a memory operand, or adding an offset. (e.g. 4 + %[mem] works in AT&T syntax).
Dialect alternatives:
If you want to write inline asm that doesn't depend on -masm=intel or not, use Dialect alternatives (which makes your code super-ugly; not recommended for anything other than wrapping one or two instructions):
Also demonstrates operand-size overrides
#include <stdint.h>
int libtest_override_operand_size()
{
uint32_t rnds_00_15;
// Intel syntax operand-size can only be overridden with operand modifiers
// because the expansion includes an explicit DWORD PTR
__asm__ __volatile__
(
"{movl $1, %[rnds_00_15] | mov %[rnds_00_15], 1} \n\t"
"{cmpl $1, %[rnds_00_15] | cmp %k[rnds_00_15], 1} \n\t"
"{cmpw $1, %[rnds_00_15] | cmp %w[rnds_00_15], 1} \n\t"
"{cmpb $1, %[rnds_00_15] | cmp %b[rnds_00_15], 1} \n\t"
"je .Ldone%= \n\t"
".Ldone%=: \n\t"
: [rnds_00_15] "=&m" (rnds_00_15)
);
return 0;
}
With Intel syntax, gcc compiles it to:
mov DWORD PTR [rsp-4], 1
cmp DWORD PTR [rsp-4], 1
cmp WORD PTR [rsp-4], 1
cmp BYTE PTR [rsp-4], 1
je .Ldone38
.Ldone38:
xor eax, eax
ret
With AT&T syntax, compiles to:
movl $1, -4(%rsp)
cmpl $1, -4(%rsp)
cmpw $1, -4(%rsp)
cmpb $1, -4(%rsp)
je .Ldone38
.Ldone38:
xorl %eax, %eax
ret
Using this C++ code:
struct Class
{
void* vtable[1];
};
struct Object
{
Class *klass;
};
extern Object* obj;
typedef Object* (*ObjectFunction)(void*);
extern "C" Object* DoStuff ()
{
return ((ObjectFunction)obj->klass->vtable[3])(nullptr);
}
With compiler version: Microsoft (R) C/C++ Optimizing Compiler Version 19.11.25547 for x86
And compiler args: cl.exe source.cpp /MD /c /bigobj /Ox /Oy- /Fasource.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.11.25547.0
TITLE C:\Users\xxxxx\AppData\Local\Temp\source.cpp
.686P
.XMM
include listing.inc
.model flat
INCLUDELIB MSVCRT
INCLUDELIB OLDNAMES
PUBLIC _DoStuff
EXTRN ?obj@@3PAUObject@@A:DWORD ; obj
; Function compile flags: /Ogtp
_TEXT SEGMENT
_DoStuff PROC
mov eax, DWORD PTR ?obj@@3PAUObject@@A ; obj
push 0
mov eax, DWORD PTR [eax]
mov eax, DWORD PTR [eax+12]
call eax
add esp, 4
ret 0
_DoStuff ENDP
_TEXT ENDS
END
Notice that the ebp register isn't pushed onto the stack and no frame pointer is set up in ebp. This makes the CaptureStackBacktrace function unable to see the caller of this function in the stack trace. Why does the compiler not emit a function prologue here?
In GCC I can selectively set optimization flags for a specific function, so this:
void func() {}
generates:
func():
push rbp
mov rbp, rsp
nop
pop rbp
ret
And this:
__attribute__((optimize("-fomit-frame-pointer")))
void func() {}
generates:
func():
nop
ret
How can I do the same in Visual Studio?
There's a command-line parameter to the compiler, /Oy, which makes the compiler omit frame pointers. You can achieve the same with a #pragma:
#pragma optimize("y", on)
int foo(int a) { // foo will be compiled with omitted frame pointers
return a;
}
#pragma optimize("y", off)
Here, foo() will be compiled with omitted frame pointers.
Note: as far as I can see, you have to build an optimized build for this option to have an effect. So either supply some optimization flag to the compiler (like /Og), or include "g" in the pragma: #pragma optimize("gy", ...)
(I've checked this with Visual Studio 2015)
I've been searching for answers to this problem for the past hour but can't find a solution that works. I'm trying to use function pointers to call a non-static member function of a specific object. My code compiles fine, but during runtime I get a nasty runtime exception that says:
Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
A lot of websites said to specify the calling convention in the method header, so I added __cdecl before it. However, my code hit the same runtime exception after the change (I tried other calling conventions as well). I'm not sure why I have to specify cdecl in the first place, because my project settings are set to cdecl. I am using some external libraries, but those were working fine before I added this function-pointer code.
I'm following this: https://stackoverflow.com/a/151449
My code:
A.h
#pragma once
class B;
typedef void (B::*ReceiverFunction)();
class A
{
public:
A();
~A();
void addEventListener(ReceiverFunction receiverFunction);
};
A.cpp
#include "A.h"
A::A(){}
A::~A(){}
void A::addEventListener(ReceiverFunction receiverFunction)
{
//Do nothing
}
B.h
#pragma once
#include <iostream>
#include "A.h"
class B
{
public:
B();
~B();
void testFunction();
void setA(A* a);
void addEvent();
private:
A* a;
};
B.cpp
#include "B.h"
B::B(){}
B::~B(){}
void B::setA(A* a)
{
this->a = a;
}
void B::addEvent()
{
a->addEventListener(&B::testFunction); //This is the offending line for the runtime exception
}
void B::testFunction()
{
//Nothing here
}
main.cpp
#include "A.h"
#include "B.h"
int main()
{
A* a = new A();
B* b = new B();
b->setA(a);
b->addEvent();
}
I'm running with Visual Studio 2010, but I'd like my code to work on other platforms with minimal changes.
This is a known problem; the necessary ingredients are a member-pointer declaration that uses an incomplete class, and having that type used in different translation units. It's caused by an optimization in the MSVC compiler: it uses different internal representations for member pointers depending on the class's inheritance.
The workaround is to compile with /vmg or to declare the inheritance explicitly:
class __single_inheritance B;
typedef void (B::*ReceiverFunction)();
Since it seems not many have reproduced the problem, I'll first show the behavior of VS2010 on this piece of code. (Debug build, 32-bit OS)
The problem is in B::addEvent() and A::addEventListener(). To give me a reference point for checking the ESP value, two additional statements are added to B::addEvent().
// in B.cpp, where B is complete
void B::addEvent()
{
00411580 push ebp
00411581 mov ebp,esp
00411583 sub esp,0D8h
00411589 push ebx
0041158A push esi
0041158B push edi
0041158C push ecx
0041158D lea edi,[ebp-0D8h]
00411593 mov ecx,36h
00411598 mov eax,0CCCCCCCCh
0041159D rep stos dword ptr es:[edi]
0041159F pop ecx
004115A0 mov dword ptr [ebp-8],ecx
int i = sizeof(ReceiverFunction); // added, sizeof(ReceiverFunction) is 4
004115A3 mov dword ptr [i],4
a->addEventListener(&B::testFunction); //This is the offending line for the runtime exception
004115AA push offset B::testFunction (411041h)
004115AF mov eax,dword ptr [this]
004115B2 mov ecx,dword ptr [eax]
004115B4 call A::addEventListener (4111D6h)
i = 5; // added
004115B9 mov dword ptr [i],5
}
004115C0 pop edi
004115C1 pop esi
004115C2 pop ebx
004115C3 add esp,0D8h
004115C9 cmp ebp,esp
004115CB call #ILT+330(__RTC_CheckEsp) (41114Fh)
004115D0 mov esp,ebp
004115D2 pop ebp
004115D3 ret
// In A.cpp, where B is not complete
void A::addEventListener(ReceiverFunction receiverFunction)
{
00411470 push ebp
00411471 mov ebp,esp
00411473 sub esp,0D8h
00411479 push ebx
0041147A push esi
0041147B push edi
0041147C push ecx
0041147D lea edi,[ebp-0D8h]
00411483 mov ecx,36h
00411488 mov eax,0CCCCCCCCh
0041148D rep stos dword ptr es:[edi]
0041148F pop ecx
00411490 mov dword ptr [ebp-8],ecx
int i = sizeof(receiverFunction); // added, sizeof(receiverFunction) is 10h
00411493 mov dword ptr [i],10h
//Do nothing
}
0041149A pop edi
0041149B pop esi
0041149C pop ebx
0041149D mov esp,ebp
0041149F pop ebp
004114A0 ret 10h
A::addEventListener() used ret 10h to clear the stack, but only 4 bytes were pushed onto the stack (push offset B::testFunction), which corrupts the stack frame.
It seems that, depending on whether B is complete or not, the size of a pointer-to-member-function of B changes in VS2010. In the OP's code, B is not complete in A.cpp, and the size is 10h. At the call site in B.cpp, where B is already complete, the size becomes 04h. (This can be checked with sizeof(ReceiverFunction), as shown in the code above.) So the call site and the actual code of A::addEventListener() disagree about the size of the argument/parameter, which corrupts the stack.
I changed the order of inclusion to make sure B is complete in every translation unit, and the runtime error disappears.
This should be a VS2010 bug ...
Compiler Command Line:
/ZI /nologo /W3 /WX- /Od /Oy- /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /GS /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Debug\test.pch" /Fa"Debug\" /Fo"Debug\" /Fd"Debug\vc100.pdb" /Gd /analyze- /errorReport:queue
Linker Command Line:
/OUT:"...\test.exe" /INCREMENTAL /NOLOGO "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MANIFEST /ManifestFile:"Debug\test.exe.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"...\test.pdb" /SUBSYSTEM:CONSOLE /PGD:"...\test.pgd" /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X86 /ERRORREPORT:QUEUE
I hid some paths in the command line.
Using /vmg as a compiler option fixed the problem.
However, I decided to use a delegate library instead (http://www.codeproject.com/KB/cpp/ImpossiblyFastCppDelegate.aspx), and it works well!