Some kext member functions must be redefined to avoid unresolved symbols - C++

TL;DR
A subclass is reimplementing (redefining) a virtual function of its superclass (base class) in the superclass's scope, because the dynamic loader requires it to. This doesn't make any sense to me.
Example:
class IO80211Controller : public IOEthernetController
{
virtual IOReturn enablePacketTimestamping(); // Implemented in binary, I can see the disassembly.
};
// .cpp - Redefinition with superclass namespace.
IOReturn IO80211Controller::enablePacketTimestamping()
{
return kIOReturnUnsupported; // This is from the disassembly of IO80211Controller
}
The above isn't the real header; I hope it's close to what it should be - no header is available.
// .hpp
class AirPortBrcm4331 : public IO80211Controller
{
// Subclass stuff goes here
};
// .cpp - Redefinition with superclass namespace.
IOReturn IO80211Controller::enablePacketTimestamping()
{
return kIOReturnUnsupported; // This is from the disassembly of AirPortBrcm4331
}
Background
I'm researching IO80211Family.kext (for which no headers are available), and the IO80211Controller class in particular - I'm in the process of reversing the header so that it will be possible to inherit from this class and create custom 802.11 drivers.
Discovering the problem
IO80211Controller defines many virtual member functions, which I need to declare in my reversed header file. I created a header file with all the virtual functions (extracted from IO80211Controller's vtable) and used it for my subclass.
When loading my new kext (with the subclass), there were linking errors:
kxld[com.osxkernel.MyWirelessDriver]: The following symbols are unresolved for this kext:
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::enableFeature(IO80211FeatureCode, void*)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::flowIdSupported()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::apple80211_ioctl(IO80211Interface*, __ifnet*, unsigned long, void*)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::enablePacketTimestamping()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::hardwareOutputQueueDepth(IO80211Interface*)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::disablePacketTimestamping()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::performCountryCodeOperation(IO80211Interface*, IO80211CountryCodeOp)
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::requiresExplicitMBufRelease()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::_RESERVEDIO80211Controllerless7()
kxld[com.osxkernel.MyWirelessDriver]: IO80211Controller::stopDMA()
Link failed (error code 5).
The reversed header of the superclass contains over 50 virtual member functions, so if there were any linking problems, I would assume it would be all-or-nothing. When I add a simple implementation for these functions (in the superclass's scope), the linking errors are gone.
Two questions arise
How can multiple implementations of the same functions co-exist? They both live in the kernel address space.
What makes these specific functions so special, while the other 50 are fine without this odd reimplementation requirement?
Hypothesis
I can't answer the first question, but I have started researching the second.
I looked into the IO80211Family Mach-O symbol table, and all the functions with linking errors lack the N_EXT bit in their type field - meaning they are not external symbols - while the other functions do have the N_EXT bit set.
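For reference, the check itself is trivial once the symbol table is in hand - a sketch (parsing the Mach-O into the nlist_64 array and string table is omitted here):
#include <mach-o/nlist.h>
#include <cstdio>

// Print whether each symbol in a 64-bit Mach-O symbol table is external:
// externality is just the N_EXT bit of the n_type field.
static void dumpExternality(const struct nlist_64 *syms, uint32_t nsyms, const char *strtab)
{
    for (uint32_t i = 0; i < nsyms; ++i) {
        bool isExternal = (syms[i].n_type & N_EXT) != 0;
        printf("%-60s %s\n", strtab + syms[i].n_un.n_strx, isExternal ? "external (N_EXT)" : "local");
    }
}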
I wasn't sure how this affects the kext loading procedure, so I dove into the XNU source and looked for the kext loading code. A major player there is something called vtable patching, which might shed some light on my first question.
Anyway, there's a predicate function called kxld_sym_is_unresolved which checks whether a symbol is unresolved. kxld calls this function on all symbols to verify they are all OK.
boolean_t
kxld_sym_is_unresolved(const KXLDSym *sym)
{
return ((kxld_sym_is_undefined(sym) && !kxld_sym_is_replaced(sym)) ||
kxld_sym_is_indirect(sym) || kxld_sym_is_common(sym));
}
In my case, this function's result comes down to the return value of kxld_sym_is_replaced, which simply checks whether the symbol has been patched (vtable patching). I don't understand well enough what that is and how it affects me...
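For reference, kxld_sym_is_replaced is essentially the following (paraphrased from libkern/kxld/kxld_sym.c from memory - the field and helper names here are approximations, not verbatim source):
// A symbol counts as "replaced" only if it is defined and has been
// redirected by vtable patching.
boolean_t
kxld_sym_is_replaced(const KXLDSym *sym)
{
    return kxld_sym_is_defined(sym) && sym->is_replaced;
}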
The Grand Question
Why did Apple choose to make these functions non-external? Are they implying that they should be implemented by others? And if so, why in the superclass's scope? I jumped into the source to find an answer but didn't find one. This is what disturbs me most - it doesn't follow my logic. I understand that a fully comprehensive answer is probably too complicated, so at least help me understand, at a higher level, what's going on here: what's the logic behind not letting the subclass get the implementation of these specific functions, and demanding this in such a weird way (why not pure virtual)?
Thank you so much for reading this!

The immediate explanation is indeed that the symbols are not exported by the IO80211 kext. The likely reason behind this, however, is that the functions are implemented inline, like so:
class IO80211Controller : public IOEthernetController
{
//...
virtual IOReturn enablePacketTimestamping()
{
return kIOReturnUnsupported;
}
//...
};
For example, if I build this code:
#include <cstdio>
class MyClass
{
public:
virtual void InlineVirtual() { printf("MyClass::InlineVirtual\n"); }
virtual void RegularVirtual();
};
void MyClass::RegularVirtual()
{
printf("MyClass::RegularVirtual\n");
}
int main()
{
MyClass a;
a.InlineVirtual();
a.RegularVirtual();
}
using the command
clang++ -std=gnu++14 inline-virtual.cpp -o inline-virtual
and then inspect the symbols using nm:
$ nm ./inline-virtual
0000000100000f10 t __ZN7MyClass13InlineVirtualEv
0000000100000e90 T __ZN7MyClass14RegularVirtualEv
0000000100000ef0 t __ZN7MyClassC1Ev
0000000100000f40 t __ZN7MyClassC2Ev
0000000100001038 S __ZTI7MyClass
0000000100000faf S __ZTS7MyClass
0000000100001018 S __ZTV7MyClass
U __ZTVN10__cxxabiv117__class_type_infoE
0000000100000000 T __mh_execute_header
0000000100000ec0 T _main
U _printf
U dyld_stub_binder
You can see that MyClass::InlineVirtual is a local symbol (lowercase t), while MyClass::RegularVirtual is exported (uppercase T). The implementation of a function declared inline (either explicitly with the keyword or implicitly by placing it inside the class definition) must be provided in every compilation unit that calls it, so it makes sense that such functions don't get external linkage.
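A practical consequence for your reversed header: if you give those non-exported virtuals inline stub bodies in the header, each kext that includes it compiles its own local copy, which reproduces what the original binary appears to do. A sketch (return values taken from the disassembly in your question; I haven't verified the exact signatures):
class IO80211Controller : public IOEthernetController
{
public:
    // Non-exported virtuals: inline stubs, one local copy per including kext.
    virtual IOReturn enablePacketTimestamping()  { return kIOReturnUnsupported; }
    virtual IOReturn disablePacketTimestamping() { return kIOReturnUnsupported; }
    virtual bool requiresExplicitMBufRelease()   { return false; }
    virtual bool flowIdSupported()               { return false; }
    // ... the ~50 exported virtuals stay declaration-only and resolve
    // against IO80211Family's symbol table at load time.
};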

You're hitting a very simple phenomenon: unexported symbols.
$ nm /System/Library/Extensions/IO80211Family.kext/Contents/MacOS/IO80211Family | fgrep __ZN17IO80211Controller | egrep '\w{16} t'
00000000000560c6 t __ZN17IO80211Controller13enableFeatureE18IO80211FeatureCodePv
00000000000560f6 t __ZN17IO80211Controller15flowIdSupportedEv
0000000000055fd4 t __ZN17IO80211Controller16apple80211_ioctlEP16IO80211InterfaceP7__ifnetmPv
0000000000055f74 t __ZN17IO80211Controller21monitorModeSetEnabledEP16IO80211Interfacebj
0000000000056154 t __ZN17IO80211Controller24enablePacketTimestampingEv
0000000000056008 t __ZN17IO80211Controller24hardwareOutputQueueDepthEP16IO80211Interface
0000000000056160 t __ZN17IO80211Controller25disablePacketTimestampingEv
0000000000056010 t __ZN17IO80211Controller27performCountryCodeOperationEP16IO80211Interface20IO80211CountryCodeOp
00000000000560ee t __ZN17IO80211Controller27requiresExplicitMBufReleaseEv
0000000000055ffc t __ZN17IO80211Controller7stopDMAEv
0000000000057452 t __ZN17IO80211Controller9MetaClassD0Ev
0000000000057448 t __ZN17IO80211Controller9MetaClassD1Ev
Save for the two MetaClass destructors, the only difference to your list of linking errors is monitorModeSetEnabled (any chance you're overriding that?).
Now on my system I have exactly one class extending IO80211Controller, which is AirPort_BrcmNIC, implemented by com.apple.driver.AirPort.BrcmNIC. So let's look at how that handles it:
$ nm /System/Library/Extensions/AirPortBrcmNIC-MFG.kext/Contents/MacOS/AirPortBrcmNIC-MFG | egrep '13enableFeatureE18IO80211FeatureCodePv|15flowIdSupportedEv|16apple80211_ioctlEP16IO80211InterfaceP7__ifnetmPv|21monitorModeSetEnabledEP16IO80211Interfacebj|24enablePacketTimestampingEv|24hardwareOutputQueueDepthEP16IO80211Interface|25disablePacketTimestampingEv|27performCountryCodeOperationEP16IO80211Interface20IO80211CountryCodeOp|27requiresExplicitMBufReleaseEv|7stopDMAEv'
0000000000046150 t __ZN17IO80211Controller15flowIdSupportedEv
0000000000046120 t __ZN17IO80211Controller16apple80211_ioctlEP16IO80211InterfaceP7__ifnetmPv
0000000000046160 t __ZN17IO80211Controller24enablePacketTimestampingEv
0000000000046170 t __ZN17IO80211Controller25disablePacketTimestampingEv
0000000000046140 t __ZN17IO80211Controller27requiresExplicitMBufReleaseEv
000000000003e880 T __ZN19AirPort_BrcmNIC_MFG13enableFeatureE18IO80211FeatureCodePv
0000000000025b10 T __ZN19AirPort_BrcmNIC_MFG21monitorModeSetEnabledEP16IO80211Interfacebj
0000000000025d20 T __ZN19AirPort_BrcmNIC_MFG24hardwareOutputQueueDepthEP16IO80211Interface
0000000000038cf0 T __ZN19AirPort_BrcmNIC_MFG27performCountryCodeOperationEP16IO80211Interface20IO80211CountryCodeOp
000000000003e7d0 T __ZN19AirPort_BrcmNIC_MFG7stopDMAEv
So one batch of methods they've overridden, and the rest... they re-implemented locally. Firing up a disassembler, we can see that these are really just stubs:
;-- IO80211Controller::apple80211_ioctl(IO80211Interface*,__ifnet*,unsignedlong,void*):
;-- method.IO80211Controller.apple80211_ioctl_IO80211Interface____ifnet__unsignedlong_void:
0x00046120 55 push rbp
0x00046121 4889e5 mov rbp, rsp
0x00046124 4d89c1 mov r9, r8
0x00046127 4989c8 mov r8, rcx
0x0004612a 4889d1 mov rcx, rdx
0x0004612d 488b17 mov rdx, qword [rdi]
0x00046130 488b82900c00. mov rax, qword [rdx + 0xc90]
0x00046137 31d2 xor edx, edx
0x00046139 5d pop rbp
0x0004613a ffe0 jmp rax
0x0004613c 0f1f4000 nop dword [rax]
;-- IO80211Controller::requiresExplicitMBufRelease():
;-- method.IO80211Controller.requiresExplicitMBufRelease:
0x00046140 55 push rbp
0x00046141 4889e5 mov rbp, rsp
0x00046144 31c0 xor eax, eax
0x00046146 5d pop rbp
0x00046147 c3 ret
0x00046148 0f1f84000000. nop dword [rax + rax]
;-- IO80211Controller::flowIdSupported():
;-- method.IO80211Controller.flowIdSupported:
0x00046150 55 push rbp
0x00046151 4889e5 mov rbp, rsp
0x00046154 31c0 xor eax, eax
0x00046156 5d pop rbp
0x00046157 c3 ret
0x00046158 0f1f84000000. nop dword [rax + rax]
;-- IO80211Controller::enablePacketTimestamping():
;-- method.IO80211Controller.enablePacketTimestamping:
0x00046160 55 push rbp
0x00046161 4889e5 mov rbp, rsp
0x00046164 b8c70200e0 mov eax, 0xe00002c7
0x00046169 5d pop rbp
0x0004616a c3 ret
0x0004616b 0f1f440000 nop dword [rax + rax]
;-- IO80211Controller::disablePacketTimestamping():
;-- method.IO80211Controller.disablePacketTimestamping:
0x00046170 55 push rbp
0x00046171 4889e5 mov rbp, rsp
0x00046174 b8c70200e0 mov eax, 0xe00002c7
0x00046179 5d pop rbp
0x0004617a c3 ret
0x0004617b 0f1f440000 nop dword [rax + rax]
Which roughly corresponds to this:
uint32_t IO80211Controller::apple80211_ioctl(IO80211Interface *intf, __ifnet *net, unsigned long some, void *whatev)
{
    // Tail-calls the 5-argument overload through the vtable, inserting a null interface.
    return this->apple80211_ioctl(intf, (IO80211VirtualInterface*)NULL, net, some, whatev);
}
bool IO80211Controller::requiresExplicitMBufRelease()
{
    return false;
}
bool IO80211Controller::flowIdSupported()
{
    return false;
}
IOReturn IO80211Controller::enablePacketTimestamping()
{
    return kIOReturnUnsupported; // 0xe00002c7, as seen in the disassembly
}
IOReturn IO80211Controller::disablePacketTimestamping()
{
    return kIOReturnUnsupported; // 0xe00002c7, as seen in the disassembly
}
I didn't try to compile the above, but that should get you on the right track. :)

Related

How to reduce the size of the executable?

When I compile this code using the {fmt} library, the executable size becomes 255 KiB, whereas using only the iostream header it becomes 65 KiB (using GCC v11.2).
time_measure.cpp
#include <iostream>
#include "core.h"
#include <string_view>
int main( )
{
// std::cout << std::string_view( "Oh hi!" );
fmt::print( "{}", std::string_view( "Oh hi!" ) );
return 0;
}
Here is my build command:
g++ -std=c++20 -Wall -O3 -DNDEBUG time_measure.cpp -I include format.cc -o runtime_measure.exe
Isn't the {fmt} library supposed to be lightweight compared to iostream? Or maybe I'm doing something wrong?
Edit: By adding -s to the command, to strip all symbol table and relocation information from the executable, it becomes 156 KiB - but that's still ~2.5x larger than the iostream version.
As with any other library, there is a fixed cost and a per-call cost. The fixed cost of the {fmt} library is indeed around 100-150k without debug info (it depends on the compiler flags). In your example you are comparing exactly this fixed cost of linking with the library; the reason iostreams appears smaller is that it is part of the standard library itself, which is linked dynamically and therefore not counted toward the binary size of the executable.
Note that a large part of this size comes from floating-point formatting functionality which doesn't even exist in iostreams (shortest round-trip representation).
If you want to compare per-call binary size, which matters more for real-world code with a large number of formatting function calls, you can look at object files or generated assembly. For example:
#include <fmt/core.h>
int main() {
fmt::print("Oh hi!");
}
generates (https://godbolt.org/z/qWTKEMqoG)
.LC0:
.string "Oh hi!"
main:
sub rsp, 24
pxor xmm0, xmm0
xor edx, edx
mov edi, OFFSET FLAT:.LC0
mov rcx, rsp
mov esi, 6
movaps XMMWORD PTR [rsp], xmm0
call fmt::v8::vprint(fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)
xor eax, eax
add rsp, 24
ret
while
#include <iostream>
int main() {
std::cout << "Oh hi!";
}
generates (https://godbolt.org/z/frarWvzhP)
.LC0:
.string "Oh hi!"
main:
sub rsp, 8
mov edx, 6
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
xor eax, eax
add rsp, 8
ret
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Other than the static initialization for cout there is not much difference, because there is virtually no formatting here - it's just one function call in both cases. Once you add formatting you'll quickly see the benefits of {fmt}; see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0645r10.html#BinaryCode.
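To see the per-call difference concretely, here is the same formatted output in both styles (a small illustration of my own, not taken from the paper above):
#include <fmt/core.h>
#include <iostream>
#include <iomanip>

int main() {
    double x = 3.14159;
    fmt::print("{:9.3f}\n", x);                      // compiles to a single vprint call
    std::cout << std::setw(9) << std::fixed
              << std::setprecision(3) << x << '\n';  // several ostream/manipulator calls
}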
You forget that iostreams is already included in libstdc++.so, which is not counted toward the binary size since it is (usually) a shared library. I believe fmt is built as a static library by default, so it increases the binary size. You need to compile fmt as a shared library with -DBUILD_SHARED_LIBS=TRUE, as explained in the building instructions.
In your build/link command, why don't you use the -Os option (optimize for size)?

Calling a standard-library-function in MASM

I want to get started with MASM in a mixed C++/assembly way.
I am currently trying to call a standard library function (e.g. printf) from a PROC in assembly, which I then call from C++.
I have the code working after declaring printf's signature in my cpp file. But I do not understand why I have to do this, and whether I can avoid it.
My cpp-file:
#include <stdio.h>
extern "C" {
extern int __stdcall foo(int, int);
}
extern int __stdcall printf(const char*, ...); // When I remove this line I get Linker-Error "LNK2019: unresolved external symbol"
int main()
{
foo(5, 5);
}
My asm-file:
.model flat, stdcall
EXTERN printf :PROC ; declare printf
.data
tstStr db "Mult: %i",0Ah,"Add: %i",0 ; 0Ah is the backslash - escapes are not supported
.code
foo PROC x:DWORD, y:DWORD
mov eax, x
mov ebx, y
add eax, ebx
push eax
mov eax, x
mul ebx
push eax
push OFFSET tstStr
call printf
ret
foo ENDP
END
Some Updates
In response to the comments I reworked the code to use the cdecl calling convention. Unfortunately this did not solve the problem (the code runs fine with the extern declaration, but throws an error without it).
But by trial and error I found out that the extern seems to force external linkage, even though the keyword should not be needed, because external linkage is the default for function declarations.
I can omit the declaration by using the function in my cpp code (i.e. if I add a printf("\0"); somewhere in the source file, the linker is fine with it and everything works correctly).
The new (but not really better) cpp-file:
#include <stdio.h>
extern "C" {
extern int __cdecl foo(int, int);
}
extern int __cdecl printf(const char*, ...); // omitting the extern results in a linker error
int main()
{
//printf("\0"); // this would replace the declaration
foo(5, 5);
return 0;
}
The asm-file:
.model flat, c
EXTERN printf :PROC
.data
tstStr db "Mult: %i",0Ah,"Add: %i",0Ah,0 ; 0Ah is the backslash - escapes are not supported
.code
foo PROC
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ebx, [ebp+12]
add eax, ebx
push eax
mov eax, [ebp+8]
mul ebx
push eax
push OFFSET tstStr
call printf
add esp, 12
pop ebp
ret
foo ENDP
END
My best guess is that this has to do with the fact that Microsoft refactored the C library starting with VS 2015: some of the C library is now inlined (including printf) and isn't actually in the default .lib files.
My guess is in this declaration:
extern int __cdecl printf(const char*, ...);
extern forces the old legacy libraries to be included in the link process. Those libraries contain the non-inlined printf function. If the C++ code doesn't force the MS linker to include the legacy C library, the MASM code's use of printf becomes unresolved.
I believe this is related to this Stack Overflow question and my answer from 2015. If you want to remove extern int __cdecl printf(const char*, ...); from the C++ code, you may wish to consider adding this line to your MASM code:
includelib legacy_stdio_definitions.lib
Your MASM code would look like this if you are using the CDECL calling convention and mixing C/C++ with assembly:
.model flat, C ; Default to C language
includelib legacy_stdio_definitions.lib
EXTERN printf :PROC ; declare printf
.data
tstStr db "Mult: %i",0Ah,"Add: %i",0 ; 0Ah is the backslash - escapes are not supported
.code
foo PROC x:DWORD, y:DWORD
mov eax, x
mov ebx, y
add eax, ebx
push eax
mov eax, x
mul ebx
push eax
push OFFSET tstStr
call printf
ret
foo ENDP
END
Your C++ code would be:
#include <stdio.h>
extern "C" {
extern int foo(int, int); /* __cdecl removed since it is the default */
}
int main()
{
//printf("\0"); // this would replace the declaration
foo(5, 5);
return 0;
}
The alternative to the includelib line in the assembly code is to add legacy_stdio_definitions.lib to the dependency list in the linker options of your Visual Studio project, or to the command-line options if you invoke the linker manually.
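If you'd rather keep everything in the C++ source, MSVC's comment pragma achieves the same effect as the project setting:
#pragma comment(lib, "legacy_stdio_definitions.lib") // ask the linker to pull in the legacy printf definitions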
Calling Convention Bug in your MASM Code
You can read about the CDECL calling convention for 32-bit Windows code in the Microsoft documentation as well as this Wiki article. Microsoft summarizes the CDECL calling convention as:
On x86 platforms, all arguments are widened to 32 bits when they are passed. Return values are also widened to 32 bits and returned in the EAX register, except for 8-byte structures, which are returned in the EDX:EAX register pair. Larger structures are returned in the EAX register as pointers to hidden return structures. Parameters are pushed onto the stack from right to left. Structures that are not PODs will not be returned in registers.
The compiler generates prologue and epilogue code to save and restore the ESI, EDI, EBX, and EBP registers, if they are used in the function.
The last paragraph is important in relation to your code. The ESI, EDI, EBX, and EBP registers are non-volatile and must be saved and restored by the called function if it modifies them. Your code clobbers EBX, so you must save and restore it. You can get MASM to do that by using the USES directive in the PROC statement:
foo PROC uses EBX x:DWORD, y:DWORD
mov eax, x
mov ebx, y
add eax, ebx
push eax
mov eax, x
mul ebx
push eax
push OFFSET tstStr
call printf
add esp, 12 ; Remove the parameters pushed on the stack for
; the printf call. The stack needs to be
; properly restored. If it isn't, the function
; epilogue can't properly restore EBX
; (and any other registers listed by USES)
ret
foo ENDP
uses EBX tells MASM to generate extra prologue and epilogue code that saves EBX at the start and restores it when the function executes a ret instruction. The generated instructions would look something like:
0000 _foo:
0000 55 push ebp
0001 8B EC mov ebp,esp
0003 53 push ebx
0004 8B 45 08 mov eax,0x8[ebp]
0007 8B 5D 0C mov ebx,0xc[ebp]
000A 03 C3 add eax,ebx
000C 50 push eax
000D 8B 45 08 mov eax,0x8[ebp]
0010 F7 E3 mul ebx
0012 50 push eax
0013 68 00 00 00 00 push tstStr
0018 E8 00 00 00 00 call _printf
001D 83 C4 0C add esp,0x0000000c
0020 5B pop ebx
0021 C9 leave
0022 C3 ret
That's indeed a bit pointless, isn't it?
Linkers are often pretty dumb things. They need to be told that an object file requires printf. Linkers can't figure that out from a missing printf symbol, stupidly enough.
The C++ compiler will tell the linker that it needs printf when you write extern int __stdcall printf(const char*, ...);. Or, and that's the normal way, the compiler will tell the linker so when you actually call printf. But your C++ code doesn't call it!
Assemblers are also pretty dumb. Your assembler apparently fails to tell the linker that it needs printf.
The general solution is not to do complex things in assembly. That's just not what assembly is good for. Calls from C to assembly generally work well; calls the other way around are problematic.

namespace in debug symbols of in-class defined friend functions

I'm dealing with a class that defines a friend function inside the class, without an outside declaration:
namespace our_namespace {
template <typename T>
struct our_container {
friend our_container set_union(our_container const &, our_container const &) {
// meaningless for the example here, just a valid definition
// no valid semantics
return our_container{};
}
};
} // namespace our_namespace
As discussed (e.g. here or here), the function set_union is not in the our_namespace namespace, but it will be found by argument-dependent lookup:
auto foo(std::vector<our_namespace::our_container<float>> in) {
// works:
return set_union(in[0], in[1]);
}
I noticed, however, that in the debug symbols set_union appears to be in the our_namespace namespace:
mov rdi, qword ptr [rbp - 40] # 8-byte Reload
mov rsi, rax
call our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&)
add rsp, 48
pop rbp
ret
our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&): # #our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&)
push rbp
mov rbp, rsp
mov qword ptr [rbp - 16], rdi
mov qword ptr [rbp - 24], rsi
pop rbp
ret
although I can't call it as our_namespace::set_union
auto foo(std::vector<our_namespace::our_container<float>> in) {
// fails:
return our_namespace::set_union(in[0], in[1]);
}
Any hints on how this debug information should be understood?
EDIT: The set_union function body is only a straw-man example here, to have a valid definition.
The C++ standard only defines compiler behavior with regard to code compilation and the behavior of the resulting program. It doesn't define every aspect of code generation, and in particular it doesn't define debug symbols.
So your compiler correctly (per the Standard) disallows calling the function through a namespace it is not in. But since the function does exist and you should be able to debug it, the compiler needs to put the debug symbol somewhere, and the enclosing namespace seems a reasonable choice.
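A condensed repro of the lookup behavior (the same idea as the question's code, trimmed to a self-contained example):
namespace our_namespace {
struct S {
    // Hidden friend: defined in-class, visible to argument-dependent lookup only.
    friend int twice(S) { return 2; }
};
} // namespace our_namespace

int main() {
    our_namespace::S s;
    int r = twice(s);             // OK: found via ADL on the argument's type
    // our_namespace::twice(s);   // error: not visible to qualified lookup
    return r;
}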

d2d1debug3.dll!DebugRenderTarget::EndDraw Access violation

This question is similar to Intermittent Access Violation ID2D1RenderTarget::EndDraw, but I've done everything suggested in that question and am still nowhere near a solution.
My application (a Windows Store application) sometimes throws a memory access violation exception (not all the time - usually after a few minutes of heavy use, and only on Windows 10 devices) when running ID2D1RenderTarget::EndDraw. According to the dump file: "The thread tried to read from or write to a virtual address for which it does not have the appropriate access."
There are a few variants, but it comes down to an access violation during the EndDraw call. Here are the disassemblies and call stacks:
d3d10warp.dll!UMDevice::DestroyResource(struct D3D10DDI_HDEVICE,struct D3D10DDI_HRESOURCE) Unknown
d3d11.dll!NDXGI::CDeviceChild<IDXGIResource1,IDXGISwapChainInternal>::FinalRelease() Unknown
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::UCDestroy() Unknown
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::UCReleaseUse() Unknown
d3d11.dll!NDXGI::CDeviceChild<IDXGISurface,IUnknown>::FinalRelease() Unknown
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::UCDestroy() Unknown
d3d11.dll!CDevCtxInterface::CDevCtxInterface<CContext>() Unknown
d3d11.dll!CContext::TID3D11DeviceContext_SetShaderResources_Amortized<0,4>() Unknown
d2d1.dll!CD3DDeviceLevel1::ProcessDeferredOperations() Unknown
d2d1.dll!CHwSurfaceRenderTarget::FlushQueuedOperations() Unknown
d2d1.dll!CHwSurfaceRenderTarget::EndProcessBatch() Unknown
d2d1.dll!CHwSurfaceRenderTarget::ProcessBatch() Unknown
d2d1.dll!CBatchSerializer::FlushInternal() Unknown
d2d1.dll!CBatchSerializer::Flush() Unknown
d2d1.dll!DrawingContext::FlushBatch() Unknown
d2d1.dll!DrawingContext::EndDraw() Unknown
d2d1.dll!D2DDeviceContextBase<ID2D1RenderTarget,ID2D1DeviceContext3,ID2D1DeviceContext3>::EndDraw() Unknown
d2d1debug3.dll!DebugRenderTarget::EndDraw(class DebugLayer &,struct ID2D1RenderTarget *,unsigned __int64 *,unsigned __int64 *) Unknown
d2d1debug3.dll!DebugRenderTargetGenerated<struct ID2D1BitmapRenderTarget>::EndDraw(unsigned __int64 *,unsigned __int64 *) Unknown
OZDebugApp_wrt_2013.exe!OZXCanvasD2D::~OZXCanvasD2D() Line 203 C++
Disassembly:
679F0EC1 mov eax,dword ptr [ebx+4]
679F0EC4 mov dword ptr [ecx+4],eax
679F0EC7 jmp UMDevice::DestroyResource+1CAh (679F0E6Ah)
679F0EC9 mov eax,dword ptr [edi+220h]
679F0ECF mov eax,dword ptr [eax+3Ch]
679F0ED2 test eax,eax
679F0ED4 je UMDevice::DestroyResource+135h (679F0DD5h)
679F0EDA cmp eax,0FFBADBADh
679F0EDF je UMDevice::DestroyResource+135h (679F0DD5h)
679F0EE5 mov cl,byte ptr ds:[67B58280h]
>> 679F0EEB movzx eax,byte ptr [eax]
679F0EEE add ecx,eax
679F0EF0 mov byte ptr ds:[67B58280h],cl
679F0EF6 jmp UMDevice::DestroyResource+135h (679F0DD5h)
679F0EFB push 1
This is another variant:
msvcrt.dll!__VEC_memcpy() Unknown
msvcrt.dll!__VEC_memcpy() Unknown
d2d1.dll!DrawingContext::EndDraw() Unknown
D2D1Debug3.dll!DebugRenderTarget::EndDraw(class DebugLayer &,struct ID2D1RenderTarget *,unsigned __int64 *,unsigned __int64 *) Unknown
D2D1Debug3.dll!DebugRenderTargetGenerated<struct ID2D1BitmapRenderTarget>::EndDraw(unsigned __int64 *,unsigned __int64 *) Unknown
OZDebugApp_wrt_2013.exe!OZXCanvasD2D::~OZXCanvasD2D() Line 203 C++
Disassembly:
76C7A3A7 mov dword ptr [ebp-8],esi
76C7A3AA mov esi,dword ptr [ebp+0Ch]
76C7A3AD mov edi,dword ptr [ebp+8]
76C7A3B0 mov ecx,dword ptr [ebp+10h]
76C7A3B3 shr ecx,7
76C7A3B6 jmp __VEC_memcpy+108h (76C7A3BEh)
76C7A3B8 lea ebx,[ebx]
>> 76C7A3BE movdqa xmm0,xmmword ptr [esi]
76C7A3C2 movdqa xmm1,xmmword ptr [esi+10h]
76C7A3C7 movdqa xmm2,xmmword ptr [esi+20h]
76C7A3CC movdqa xmm3,xmmword ptr [esi+30h]
76C7A3D1 movdqa xmmword ptr [edi],xmm0
76C7A3D5 movdqa xmmword ptr [edi+10h],xmm1
76C7A3DA movdqa xmmword ptr [edi+20h],xmm2
76C7A3DF movdqa xmmword ptr [edi+30h],xmm3
Things I've tried:
I checked multithreading. My application uses one D2D1 factory with the multithreaded property and multiple render targets, so drawing calls should be interleaved by default. On top of that, I tried adding locks so that each BeginDraw, EndDraw, render target creation, and all DXGI-related calls are in a critical section.
Ran with the debug layer enabled, both via the DirectX control panel and in code (shown just after this list).
Generated a crash dump file, but analyzing it seems to give exactly the same information as just debugging the remote machine?
Implemented a logger that records composition/drawing calls and their arguments for each render target. Each run generated around 40 MB of logs. I checked the render targets that crashed, and their drawings are identical to some of the earlier drawings, i.e. I honestly can't see what's going wrong at the drawing level.
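For reference, enabling the debug layer in code looks like this in my setup (a sketch; ID2D1Factory1 is just the interface my code happens to request):
#include <d2d1_1.h>
#include <wrl/client.h>

Microsoft::WRL::ComPtr<ID2D1Factory1> d2dFactory;
D2D1_FACTORY_OPTIONS options = {};
options.debugLevel = D2D1_DEBUG_LEVEL_INFORMATION;   // turn on the Direct2D debug layer

HRESULT hr = D2D1CreateFactory(
    D2D1_FACTORY_TYPE_MULTI_THREADED,                // matches the multithreaded setup above
    __uuidof(ID2D1Factory1),
    &options,
    reinterpret_cast<void **>(d2dFactory.GetAddressOf()));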
None of the above worked, I'd really appreciate any help.
It turns out that I didn't set up the critical section for one instance of CreateWicBitmapRenderTarget. Creating the D2D factory with the multithreaded property does interleave all Direct2D calls, but that doesn't cover WIC, DXGI, or other D3D calls. I had to use:
Microsoft::WRL::ComPtr<ID2D1Multithread> d2DMultithread;
d2dFactory.As(&d2DMultithread);
d2DMultithread->Enter();
HRESULT targetCreationResult = d2dFactory->CreateWicBitmapRenderTarget(m_wicBitmap.Get(), &prop, &target);
d2DMultithread->Leave();
I did the same for the other render target creations and for Begin/EndDraw. I noticed the problem by looking at the Threads view while debugging.
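To make it harder to forget a Leave() on an early return, the Enter/Leave pair can also be wrapped in a small RAII guard (a sketch - the guard class is my own; ID2D1Multithread is the real interface):
class D2DLockGuard
{
public:
    explicit D2DLockGuard(ID2D1Multithread *mt) : m_mt(mt) { m_mt->Enter(); }
    ~D2DLockGuard() { m_mt->Leave(); }
    D2DLockGuard(const D2DLockGuard &) = delete;
    D2DLockGuard &operator=(const D2DLockGuard &) = delete;
private:
    ID2D1Multithread *m_mt;
};

// Usage:
// {
//     D2DLockGuard lock(d2DMultithread.Get());
//     hr = d2dFactory->CreateWicBitmapRenderTarget(m_wicBitmap.Get(), &prop, &target);
// }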

mixed c/asm project (x64) yields one "unresolved external" (fixed all the others!)

I have a Visual Studio project that I want to build in both 32-bit and 64-bit variants. It is 99% written in C/C++, but has one function written in assembly. In 32-bit, I have the asm code in a cpp file as follows:
#ifndef X64
__declspec( naked ) void ERR ( )
{
SeqNum = 0;
SeqTimeStamp++;
__asm
{
mov EAX, MinusTwo
mov EBP, SaveBP
sub EBP, 4
mov ESP, EBP
pop EBP
ret 8
}
}
#endif
The referenced globals are defined at the top of the same file as follows:
extern "C"
{
DWORD SeqTimeStamp, SeqNum;
void *SaveBP;
}
This compiles and builds fine. (And works, too! :-) )
For the 64-bit build, which has no inline asm support, the same basic algorithm is coded in a .ASM file. I have Visual Studio (2010) building this file just fine and including it in the call to the linker. That code looks like this:
EXTERN SaveBP:PTR
EXTERN SeqNum:DWORD
EXTERN SeqTimeStamp:DWORD
.CODE
ERR PROC PUBLIC FRAME
PUSH RBP
MOV RBP, RSP
.ENDPROLOG
MOV SeqNum, 0
MOV EAX, SeqTimeStamp
INC EAX
MOV SeqTimeStamp, EAX
MOV RAX, 0FFFFFFFEh
MOV RBP, SaveBP
LEA RSP,[RBP+0]
POP RBP
RET 0
ERR ENDP
END
I get a single undefined external in this build:
ERR.obj : error LNK2019: unresolved external symbol SaveBP referenced in function ERR
I've tried a number of different ways of declaring and referencing SaveBP, but I haven't found a winning combination. Has anyone else run into a similar situation, or does anyone know how to solve it?