d2d1debug3.dll!DebugRenderTarget::EndDraw Access violation - c++

This question is similar to "Intermittent Access Violation ID2D1RenderTarget::EndDraw", but I have done everything suggested there and am still nowhere near a solution.
My application (a Windows Store application) sometimes throws a memory access violation exception when calling ID2D1RenderTarget::EndDraw. It does not happen every time: usually after a few minutes of heavy use, and only on Win10 devices. According to the dump file: "The thread tried to read from or write to a virtual address for which it does not have the appropriate access."
There are a few variants, but they all come down to an access violation during the EndDraw call. Here are the disassemblies and call stacks:
d3d10warp.dll!UMDevice::DestroyResource(struct D3D10DDI_HDEVICE,struct D3D10DDI_HRESOURCE) Unknown
d3d11.dll!NDXGI::CDeviceChild<IDXGIResource1,IDXGISwapChainInternal>::FinalRelease() Unknown
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::UCDestroy() Unknown
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::UCReleaseUse() Unknown
d3d11.dll!NDXGI::CDeviceChild<IDXGISurface,IUnknown>::FinalRelease() Unknown
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::UCDestroy() Unknown
d3d11.dll!CDevCtxInterface::CDevCtxInterface<CContext>() Unknown
d3d11.dll!CContext::TID3D11DeviceContext_SetShaderResources_Amortized<0,4>() Unknown
d2d1.dll!CD3DDeviceLevel1::ProcessDeferredOperations() Unknown
d2d1.dll!CHwSurfaceRenderTarget::FlushQueuedOperations() Unknown
d2d1.dll!CHwSurfaceRenderTarget::EndProcessBatch() Unknown
d2d1.dll!CHwSurfaceRenderTarget::ProcessBatch() Unknown
d2d1.dll!CBatchSerializer::FlushInternal() Unknown
d2d1.dll!CBatchSerializer::Flush() Unknown
d2d1.dll!DrawingContext::FlushBatch() Unknown
d2d1.dll!DrawingContext::EndDraw() Unknown
d2d1.dll!D2DDeviceContextBase<ID2D1RenderTarget,ID2D1DeviceContext3,ID2D1DeviceContext3>::EndDraw() Unknown
d2d1debug3.dll!DebugRenderTarget::EndDraw(class DebugLayer &,struct ID2D1RenderTarget *,unsigned __int64 *,unsigned __int64 *) Unknown
d2d1debug3.dll!DebugRenderTargetGenerated<struct ID2D1BitmapRenderTarget>::EndDraw(unsigned __int64 *,unsigned __int64 *) Unknown
OZDebugApp_wrt_2013.exe!OZXCanvasD2D::~OZXCanvasD2D() Line 203 C++
Disassembly:
679F0EC1 mov eax,dword ptr [ebx+4]
679F0EC4 mov dword ptr [ecx+4],eax
679F0EC7 jmp UMDevice::DestroyResource+1CAh (679F0E6Ah)
679F0EC9 mov eax,dword ptr [edi+220h]
679F0ECF mov eax,dword ptr [eax+3Ch]
679F0ED2 test eax,eax
679F0ED4 je UMDevice::DestroyResource+135h (679F0DD5h)
679F0EDA cmp eax,0FFBADBADh
679F0EDF je UMDevice::DestroyResource+135h (679F0DD5h)
679F0EE5 mov cl,byte ptr ds:[67B58280h]
>> 679F0EEB movzx eax,byte ptr [eax]
679F0EEE add ecx,eax
679F0EF0 mov byte ptr ds:[67B58280h],cl
679F0EF6 jmp UMDevice::DestroyResource+135h (679F0DD5h)
679F0EFB push 1
This is another variant
msvcrt.dll!__VEC_memcpy() Unknown
msvcrt.dll!__VEC_memcpy() Unknown
d2d1.dll!DrawingContext::EndDraw() Unknown
D2D1Debug3.dll!DebugRenderTarget::EndDraw(class DebugLayer &,struct ID2D1RenderTarget *,unsigned __int64 *,unsigned __int64 *) Unknown
D2D1Debug3.dll!DebugRenderTargetGenerated<struct ID2D1BitmapRenderTarget>::EndDraw(unsigned __int64 *,unsigned __int64 *) Unknown
OZDebugApp_wrt_2013.exe!OZXCanvasD2D::~OZXCanvasD2D() Line 203 C++
Disassembly:
76C7A3A7 mov dword ptr [ebp-8],esi
76C7A3AA mov esi,dword ptr [ebp+0Ch]
76C7A3AD mov edi,dword ptr [ebp+8]
76C7A3B0 mov ecx,dword ptr [ebp+10h]
76C7A3B3 shr ecx,7
76C7A3B6 jmp __VEC_memcpy+108h (76C7A3BEh)
76C7A3B8 lea ebx,[ebx]
>> 76C7A3BE movdqa xmm0,xmmword ptr [esi]
76C7A3C2 movdqa xmm1,xmmword ptr [esi+10h]
76C7A3C7 movdqa xmm2,xmmword ptr [esi+20h]
76C7A3CC movdqa xmm3,xmmword ptr [esi+30h]
76C7A3D1 movdqa xmmword ptr [edi],xmm0
76C7A3D5 movdqa xmmword ptr [edi+10h],xmm1
76C7A3DA movdqa xmmword ptr [edi+20h],xmm2
76C7A3DF movdqa xmmword ptr [edi+30h],xmm3
Things I've tried:
I checked multithreading. My application uses one ID2D1Factory created as multithreaded, with multiple render targets, so drawing calls should be interleaved by default. On top of that, I added locks so that every BeginDraw, EndDraw, render target creation, and DXGI-related call is inside a critical section.
Ran with the debug layer enabled, both through the DirectX control panel and in code.
Generated a crash dump file, but using it seems to be exactly the same as just debugging on the remote machine.
Implemented a logger that records the composition/drawing calls and arguments for each render target. Each run generated around 40 MB of logs. I checked the render targets that crashed, and their drawing calls are identical to some of the earlier ones, i.e. I honestly can't see anything going wrong at the drawing level.
None of the above worked. I'd really appreciate any help.

It turns out that I had not put one call to CreateWicBitmapRenderTarget inside a critical section. Creating the D2D factory as multithreaded does interleave all Direct2D calls, but it does not cover WIC, DXGI, or other D3D calls. I had to use:
Microsoft::WRL::ComPtr<ID2D1Multithread> d2DMultithread;
d2dFactory.As(&d2DMultithread);
d2DMultithread->Enter();
HRESULT targetCreationResult = d2dFactory->CreateWicBitmapRenderTarget(m_wicBitmap.Get(), &prop, &target);
d2DMultithread->Leave();
I did the same for the other render target creation calls and for Begin/EndDraw. I noticed the problem by looking at the thread view in the debugger.
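Since every Enter() must be matched by a Leave(), even on early returns, a small scoped guard may be less error-prone than calling the pair by hand. The following is only a sketch: ScopedEnter and FakeMultithread are hypothetical names, and FakeMultithread is a stand-in used here so the example compiles without the Direct2D headers. In the real application the guard would wrap the ID2D1Multithread object obtained from the factory.

```cpp
#include <cassert>

// Scoped guard (hypothetical helper): calls Enter() on construction and
// Leave() on destruction, so the lock is released on every exit path.
template <typename Lockable>
class ScopedEnter {
public:
    explicit ScopedEnter(Lockable& lock) : lock_(lock) { lock_.Enter(); }
    ~ScopedEnter() { lock_.Leave(); }

    // Non-copyable: copying would call Leave() twice.
    ScopedEnter(const ScopedEnter&) = delete;
    ScopedEnter& operator=(const ScopedEnter&) = delete;

private:
    Lockable& lock_;
};

// Stand-in for ID2D1Multithread (assumption for illustration only);
// it just counts how deeply the lock is currently held.
struct FakeMultithread {
    int depth = 0;
    int enters = 0;
    void Enter() { ++depth; ++enters; }
    void Leave() { --depth; }
};

// Returns the lock depth observed while the guard is alive.
int depth_inside_guard(FakeMultithread& mt)
{
    ScopedEnter<FakeMultithread> guard(mt);
    // Render target creation or BeginDraw/EndDraw would go here.
    return mt.depth;
}
```

With the real interface, the guarded region would contain the CreateWicBitmapRenderTarget call shown above, and Leave() would run automatically when the guard goes out of scope.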


How to reduce the size of the executable?

When I compile this code using the {fmt} lib, the executable size becomes 255 KiB whereas by using only iostream header it becomes 65 KiB (using GCC v11.2).
time_measure.cpp
#include <iostream>
#include "core.h"
#include <string_view>
int main( )
{
// std::cout << std::string_view( "Oh hi!" );
fmt::print( "{}", std::string_view( "Oh hi!" ) );
return 0;
}
Here is my build command:
g++ -std=c++20 -Wall -O3 -DNDEBUG time_measure.cpp -I include format.cc -o runtime_measure.exe
Isn't the {fmt} library supposed to be lightweight compared to iostream? Or maybe I'm doing something wrong?
Edit: By adding -s to the command in order to remove all symbol table and relocation information from the executable, it becomes 156 KiB. But still ~2.5X more than the iostream version.
As with any other library, there is a fixed cost and a per-call cost. The fixed cost for the {fmt} library is indeed around 100-150k without debug info (it depends on the compiler flags). In your example you are comparing this fixed cost of linking with the library, and iostreams appears smaller because it is part of the standard library itself, which is linked dynamically and not counted toward the size of the executable.
Note that a large part of this size comes from floating-point formatting functionality which doesn't even exist in iostreams (shortest round-trip representation).
If you want to compare per-call binary size which is more important for real-world code with large number of formatting function calls, you can look at object files or generated assembly. For example:
#include <fmt/core.h>
int main() {
fmt::print("Oh hi!");
}
generates (https://godbolt.org/z/qWTKEMqoG)
.LC0:
.string "Oh hi!"
main:
sub rsp, 24
pxor xmm0, xmm0
xor edx, edx
mov edi, OFFSET FLAT:.LC0
mov rcx, rsp
mov esi, 6
movaps XMMWORD PTR [rsp], xmm0
call fmt::v8::vprint(fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)
xor eax, eax
add rsp, 24
ret
while
#include <iostream>
int main() {
std::cout << "Oh hi!";
}
generates (https://godbolt.org/z/frarWvzhP)
.LC0:
.string "Oh hi!"
main:
sub rsp, 8
mov edx, 6
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
xor eax, eax
add rsp, 8
ret
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Other than static initialization for cout there is not much difference because there is virtually no formatting here, so it's just one function call in both cases. Once you add formatting you'll quickly see the benefits of {fmt}, see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0645r10.html#BinaryCode.
You forget that iostreams is already included in libstdc++.so, which is (usually) a shared library and is therefore not counted towards the binary size. I believe fmt is built as a static library by default, so it increases the binary size. You need to build fmt as a shared library with -DBUILD_SHARED_LIBS=TRUE, as explained in the building instructions.
In your build/link command, why not use the -Os option (optimize for size)?

Using C++ namespaces inside of inline assembly code

I just have a quick question about whether this will work or not.
void __declspec(naked) HookProcessEventProxy() {
__asm {
mov CallObjectPointer, ecx
push edx
mov edx, dword ptr[esp + 0x8]
mov UFunctionPointer, edx
mov edx, dword ptr[esp + 0xC]
mov ParamsPointer, edx
pop edx
pushfd
pushad
}
ProcessEventProxy();
__asm {
popad
popfd
jmp[Pointers::OldProcessEvent] // This is the line in question.
}
}
Does the Pointers namespace qualifier make the jump go to Pointers::OldProcessEvent, or will it go to the ProcessEvent I have inside of my DLLMain?
The HookProcessEventProxy is inside my DLLMain.
From the vendor-specific extensions in the code, it seems that you are compiling this on MSVC. If so, then this is not a problem. The inline assembler understands C++ scoping rules and identifiers.
You can easily verify this for yourself by analyzing the object code produced by the compiler. Either disassemble the binary using dumpbin /disasm, or throw the /FA switch when running the compiler to get a separate listing. What you'll see is that the compiler emits your inline assembly in a very literal fashion:
?HookProcessEventProxy@@YAXXZ PROC ; HookProcessEventProxy, COMDAT
mov DWORD PTR ?CallObjectPointer@@3HA, ecx ; CallObjectPointer
push edx
mov edx, DWORD PTR [esp+8]
mov DWORD PTR ?UFunctionPointer@@3HA, edx ; UFunctionPointer
mov edx, DWORD PTR [esp+12]
mov DWORD PTR ?ParamsPointer@@3HA, edx ; ParamsPointer
pop edx
pushfd
pushad
call ?ProcessEventProxy@@YAXXZ ; ProcessEventProxy
popad
popfd
jmp ?OldProcessEvent@Pointers@@YAXXZ ; Pointers::OldProcessEvent
?HookProcessEventProxy@@YAXXZ ENDP ; HookProcessEventProxy
The above listing is from the file generated by the compiler when the /FA switch is used. The comments out to the right indicate the corresponding C++ object.
Note that you do not need the brackets around the branch target. Although the inline assembler ignores them, it is confusing to include them. Just write:
jmp Pointers::OldProcessEvent

mixed c/asm project (x64) yields one "unresolved external" (fixed all the others!)

I have a Visual Studio project that I want to build in both 32 and 64 bit variants. It is 99% written in C/C++, but has one function written in assembler. In 32 bit, I have the asm code in a cpp file as follows:
#ifndef X64
__declspec( naked ) void ERR ( )
{
SeqNum = 0;
SeqTimeStamp++;
__asm
{
mov EAX, MinusTwo
mov EBP, SaveBP
sub EBP, 4
mov ESP, EBP
pop EBP
ret 8
}
}
#endif
The referenced globals are defined at the top of the same file as follows:
extern "C"
{
DWORD SeqTimeStamp, SeqNum;
void *SaveBP;
}
This compiles and builds fine. (And works, too! :-) )
For the 64-bit build, with no inline asm support, the same basic algorithm is coded in a .ASM file. I have Visual Studio (2010) building this file just fine, and including it in the call to link. That code looks like this:
EXTERN SaveBP:PTR
EXTERN SeqNum:DWORD
EXTERN SeqTimeStamp:DWORD
.CODE
ERR PROC PUBLIC FRAME
PUSH RBP
MOV RBP, RSP
.ENDPROLOG
MOV SeqNum, 0
MOV EAX, SeqTimeStamp
INC EAX
MOV SeqTimeStamp, EAX
MOV RAX, 0FFFFFFFEh
MOV RBP, SaveBP
LEA RSP,[RBP+0]
POP RBP
RET 0
ERR ENDP
END
I get a single undefined external in this build:
ERR.obj : error LNK2019: unresolved external symbol SaveBP referenced in function ERR
I've tried a number of different ways of declaring and referencing SaveBP, but I haven't found a winning combination. Has anyone else run into a similar situation, or might know how to solve it?

Inline Assembly GCD won't work

I've been writing a simple C++ program that uses inline assembly to compute the GCD of 2 numbers and output it, based on an example from a tutorial I watched. I understand what it's doing, but I don't understand why it won't work.
EDIT: Should add that when it runs, it doesn't output anything at all.
#include <iostream>
using namespace std;
int gcd(int a, int b)
{
int result;
_asm
{
push ebp
mov ebp, esp
mov eax, a
mov ebx, b
looptop:
cmp eax, 0
je goback
cmp eax, ebx
jge modulo
xchg eax, ebx
modulo:
idiv ebx
mov eax, edx
jmp looptop
goback:
mov eax, ebx
mov esp, ebp
pop ebp
mov result, edx
}
return result;
}
int main()
{
cout << gcd(46,90) << endl;
return 0;
}
I'm running it on a 32-bit Windows system; any help would be appreciated. When compiling, I get 4 warnings:
warning C4731: 'gcd' : frame pointer register 'ebp' modified by inline assembly code
warning C4731: 'gcd' : frame pointer register 'ebp' modified by inline assembly code
warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code
warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code
The compiler will insert these or equivalent instructions for you at the beginning and end of the function:
push ebp
mov ebp, esp
...
mov esp, ebp
pop ebp
If you add them manually, you won't be able to access the function's parameters through ebp, which is why the compiler is issuing warnings.
Remove these 4 instructions.
Also, start using the debugger. Today.
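When stepping through the assembly in the debugger, it can help to have a plain C++ version of the same Euclidean loop to compare against. The following is a sketch, not part of the original answer: the name gcd_reference is made up for illustration, and it assumes both inputs are non-negative.

```cpp
#include <cassert>

// Reference Euclidean algorithm (hypothetical helper for comparison).
// Assumes a >= 0 and b >= 0.
int gcd_reference(int a, int b)
{
    while (b != 0) {
        int remainder = a % b; // plays the role of edx after idiv
        a = b;
        b = remainder;
    }
    return a;
}
```

gcd_reference(46, 90) returns 2, which is the value the program in the question is expected to print.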

error C2400: inline assembler syntax error in 'second operand'; found 'register'

I am getting a compilation error when using assembly instructions in VC++ inside a macro-based inline assembler block.
error C2400: inline assembler syntax error in 'second operand'; found 'register'
Here is the code:
_asm {\
mov esi,dword ptr [pMemBlock]\
sub esp,sizeOfblock\
mov ebx,sizeOfblock\
mov shrResult,ebx\
shr shrResult,2\
mov ecx,shrResult\
mov shrResult,0\
mov edi,esp\
rep movs dword ptr es:[edi],dword ptr[esi]\
}
Regards
Usman
That blank line after the _asm { line will complete the macro. It should be deleted or have \ on it.
It should be
_asm {\
__asm mov esi,dword ptr [pMemBlock]\
__asm sub esp,sizeOfblock\
...
See this msdn page.
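The underlying rule is the same for any multi-line macro: a backslash at the end of a line splices it onto the next one, so the whole body becomes a single logical line, and the first line without a backslash ends the definition. That is why each instruction needs its own __asm prefix as a separator. The sketch below shows the continuation rule with ordinary C++ statements instead of inline assembly; SWAP_INTS and first_after_swap are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Every line of the body except the last ends with a backslash.
// Dropping one backslash, or leaving a blank line, would end the macro there.
#define SWAP_INTS(a, b) \
    do {                \
        int tmp_ = (a); \
        (a) = (b);      \
        (b) = tmp_;     \
    } while (0)

// Swaps the two local copies and returns the new value of the first.
int first_after_swap(int x, int y)
{
    SWAP_INTS(x, y);
    return x;
}
```

After preprocessing, the body of SWAP_INTS sits on one line, which is harmless here because the statements are separated by semicolons.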