I have a performance-critical algorithm that I implemented with inline assembly. It gave a 40% performance gain, so I want to keep using it; however, there is one issue.
I use a naked declaration, which leaves everything up to me: the compiler doesn't add anything to the function body. My function looks like this:
bool __declspec(naked) Foo(...)
{
__asm
{
push ebp
mov ebp, esp
sub esp, 0x10000
/*some code here*/
ret
}
}
As you can see, I have 64KB of locals, so it can crash because of stack corruption. What the compiler normally does is insert a call to the _chkstk routine, which expands the stack.
From what I found, the routine is implemented in this file:
C:\Program Files (x86)\Microsoft Visual Studio [version]\VC\crt\src\intel\chkstk.asm
I would like to ask how I can import and use the routine from there, and whether there is another way of preventing crashes.
What I can think of is adding my own implementation, something like this:
mov ecx, 0x10
guard_page_testing_starts:
test [esp], eax
sub esp, 0x1000
dec ecx
jnz guard_page_testing_starts
But that raises another question: does it have any negative impact?
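For reference, the idea behind that loop can be modeled in portable C++. This is only a sketch of the page-by-page probing order, not a replacement for _chkstk: probe_pages_downward and its parameters are hypothetical names, and the 0x1000 page size is an assumption taken from the loop above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: read one byte in every page of the range
// [top - bytes, top), walking from high addresses to low. That is the
// order in which a downward-growing stack has to reach its guard pages.
// Returns the number of reads performed.
std::size_t probe_pages_downward(const volatile char* top,
                                 std::size_t bytes,
                                 std::size_t page_size = 0x1000) {
    assert(page_size != 0);
    std::size_t touched = 0;
    std::size_t offset = 0;
    while (offset < bytes) {
        // Step down by at most one page, and never below the range.
        const std::size_t remaining = bytes - offset;
        offset += (remaining < page_size) ? remaining : page_size;
        // Volatile read, so the compiler cannot drop the access.
        const char value = *(top - offset);
        (void)value;
        ++touched;
    }
    return touched;
}
```

With a 64KB range and 4KB pages this performs 16 reads, the same count as the mov ecx, 0x10 loop.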
This is a follow-up from another question.
I think the following code should not use monotonic_buffer_resource, but in the generated assembly there are references to it.
void default_pmr_alloc(std::pmr::polymorphic_allocator<int>& alloc) {
(void)alloc.allocate(1);
}
godbolt
I looked into the source code of the header files and libstdc++, but could not find how monotonic_buffer_resource was selected to be used by the default pmr allocator.
The assembly tells the story. In particular, this:
cmp rax, OFFSET FLAT:_ZNSt3pmr25monotonic_buffer_resource11do_allocateEmm
jne .L11
This appears to be a test to see if the memory resource is a monotonic_buffer_resource. This seems to be done by checking the do_allocate entry of the vtable. If it is not such a resource (i.e., if do_allocate in the memory resource is not the monotonic one), then it jumps down to this:
.L11:
mov rdi, rbx
mov edx, 4
mov esi, 4
pop rbx
jmp rax
This appears to be a vtable call.
The rest of the assembly appears to be an inlined version of monotonic_buffer_resource::do_allocate, which is why it conditionally calls std::pmr::monotonic_buffer_resource::_M_new_buffer.
So overall, this implementation of polymorphic_allocator::allocate seems to have some built-in inlining of monotonic_buffer_resource::do_allocate if the resource is appropriate for that. That is, it won't do a vtable call if it can determine that it should call monotonic_buffer_resource::do_allocate.
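One way to convince yourself that the fast path is only an optimization is to plug in a resource that is not a monotonic_buffer_resource and check that its do_allocate is still reached. A minimal sketch, assuming a C++17 standard library that ships <memory_resource>; CountingResource and pmr_alloc_one are hypothetical names, not part of libstdc++:

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical resource that records each request and forwards it to
// the default new/delete resource.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t calls = 0;
    std::size_t last_bytes = 0;
    std::size_t last_alignment = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++calls;
        last_bytes = bytes;
        last_alignment = alignment;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes,
                       std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(
        const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Same shape as default_pmr_alloc from the question, but returning the
// pointer so the caller can release it.
int* pmr_alloc_one(std::pmr::polymorphic_allocator<int>& alloc) {
    return alloc.allocate(1);
}
```

The request arrives as 4 bytes with 4-byte alignment, which matches the mov esi, 4 and mov edx, 4 before the jmp rax in the listing.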
Hello people,
I'm kind of a newbie with C++, but I have managed to create my own DLL and inject it into my gameserver.exe.
I have tried for days to hook a user call function, but I always fail since it involves assembly :(
I would like you to show me how to write the proper lines to hook this function:
0048C1AF |. 8B9B 4C010000 MOV EBX,DWORD PTR DS:[EBX+14C]
0048C1B5 |. 8B13 MOV EDX,DWORD PTR DS:[EBX]
0048C1B7 |. 8B82 EC000000 MOV EAX,DWORD PTR DS:[EDX+EC]
0048C1BD |. 8BCB MOV ECX,EBX
0048C1BF |. FFD0 CALL EAX
0048C1C1 |. 8BF8 MOV EDI,EAX
0048C1C3 |. E8 789EF8FF CALL SR_GameS.00444040
0048C1C8 |. 8B7C24 1C MOV EDI,DWORD PTR SS:[ESP+1C]
0048C1CC |. 8BF0 MOV ESI,EAX
0048C1CE |. E8 6D9EF8FF CALL SR_GameS.00444040
What I have written in C++ so far is:
void __cdecl Global()
{
__asm
{
mov msg, edi; //msg
push ebx;
mov ebx, dword ptr[esp+1C]; //playername
mov playername, ebx;
pop ebx;
}
printf("Global [%s] -> %s\n", playername, msg);
//then calling func entry
CALL((DWORD)0x00444040);
}
Whenever 0048C1CE gets called, I redirect it into my C++ code and move its parameters into Global().
Up to here everything goes fine, but inside Global() I can't pass the parameters back to 0x00444040 successfully; the console window even shows strange values, and sometimes only part of the player's message.
P.S. If possible, please include an explanation of how the assembly lines work.
Sorry for my English. Thanks in advance.
I'll leave aside the question as to why you would want to do this. It's probably someone else's software and they probably didn't give you permission. You may be in breach of a licence somewhere.
Your description is pretty tangled. The lines of assembler are not a function, they are code with 3 function calls. I'll guess that what you meant to say is that you want to intercept the call to function 0x00444040 in order to execute your own code. You haven't shown how you do that.
The C++ code needs to do roughly three things.
On entry, it must conform to the calling sequence expected by its caller. It appears there are two arguments, in ESI and EDI.
If you want to call C++ library functions then you must save all registers that might be affected by making those calls and restore them afterwards.
When you exit, you should restore the stack and registers exactly how they were on entry, and branch (JMP not CALL) to the hooked function, so that it can return to the original caller, not to your hooking code.
At the debugger level, just make sure that every register (including the stack pointer) is the same as it was on entry, just before you branch to the hooked function.
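The forwarding idea in the last step can be modeled in plain C++ with a function pointer, leaving out the register-level details. This is only a sketch under assumptions: original_handler, g_call_slot, logging_hook, and install_hook are hypothetical names, and a real hook on 0x00444040 would still need assembly glue to preserve the registers and the stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

using HandlerFn = int (*)(const char* player, const char* message);

// Stand-in for the hooked function (hypothetical behavior).
int original_handler(const char* player, const char* message) {
    return static_cast<int>(std::string(player).size() +
                            std::string(message).size());
}

HandlerFn g_call_slot = &original_handler;  // slot that callers go through
HandlerFn g_saved_original = nullptr;       // filled in by install_hook
std::vector<std::string> g_log;

// Do the extra work, then hand the untouched arguments to the original
// and return its result unchanged.
int logging_hook(const char* player, const char* message) {
    g_log.push_back(std::string("Global [") + player + "] -> " + message);
    return g_saved_original(player, message);
}

// Remember the original target, then redirect the slot to the hook.
void install_hook() {
    assert(g_saved_original == nullptr);  // install only once
    g_saved_original = g_call_slot;
    g_call_slot = &logging_hook;
}
```

The caller sees the same result before and after install_hook(); the only visible difference is the extra log entry.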
Morality and legality aside, I am just going to focus on the technical aspects of your question, but I do feel you should give sincere thought to the points david.pfx raised.
Having written a few projects that do similar things to what you described, for personal knowledge only, I would recommend a general-purpose hooking library. I worked with the Source engine (of Half-Life 2 fame), and used a library called SourceHook. SourceHook is part of AlliedModders' Metamod project, which is used inside SourceMod.
When I tried writing general-purpose hooks outside of Source engine projects, I found SourceHook still useful, but also explored other options. I was pleased with mHook, another general-purpose hooking library.
It's important to know the calling convention of the methods you are hooking, as restoring the registers correctly is critical to safe execution of your hooks.
While inspecting the disassembly of the function below,
void * malloc_float_align(size_t n, unsigned int a, float *& dizi)
{
void * adres=NULL;
void * adres2=NULL;
adres=malloc(n*sizeof(float)+a);
size_t adr=(size_t)adres;
size_t adr2=adr+a-(adr&(a-1u));
adres2=(void * ) adr2;
dizi=(float *)adres2;
return adres;
}
I noticed that built-in functions are not inlined even with the inline optimization flag set.
; Line 26
$LN4:
push rbx
sub rsp, 32 ; 00000020H
; Line 29
mov ecx, 160 ; 000000a0H
mov rbx, r8
call QWORD PTR __imp_malloc ; <-- this is not inlined
; Line 31
mov rcx, rax
; Line 33
mov rdx, rax
and ecx, 31
sub rdx, rcx
add rdx, 32 ; 00000020H
mov QWORD PTR [rbx], rdx
; Line 35
add rsp, 32 ; 00000020H
pop rbx
ret 0
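The and / sub / add 32 sequence in the listing is the adr2 expression from the source with a == 32. A small sketch of that arithmetic on its own, with next_aligned_after as a hypothetical name; note that an already aligned address still moves forward by a full a bytes:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors adr + a - (adr & (a - 1)) from malloc_float_align.
// a must be a power of two for the mask (a - 1) to work.
std::uintptr_t next_aligned_after(std::uintptr_t adr, std::uintptr_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return adr + a - (adr & (a - 1));
}
```

For example, both 0x1000 and 0x1001 map to 0x1020 when a is 32.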
Question: is this a must-have property of functions like malloc? Can we inline it in some way to inspect it (or any other function like strcmp/new/free/delete)? Is this forbidden?
Typically the compiler will inline functions when it has the source code available during compilation (in other words, the function is defined, rather than just declared as a prototype in a header file).
However, in this case, the function (malloc) is in a DLL, so clearly the source code is not available to the compiler during the compilation of your code. It has nothing to do with what malloc does (etc). However, it's also likely that malloc won't be inlined anyway, since it is a fairly large function [at least it often is], which prevents it from being inlined even if the source code is available.
If you are using Visual Studio, you can almost certainly find the source code for your runtime library, as it is supplied with the Visual Studio package.
(The C runtime functions are in a DLL because many different programs in the system use the same functions, so putting them in a DLL that is loaded once for all "users" of the functionality will give a good saving on the size of all the code in the system. Although malloc is perhaps only a few hundred bytes, a function like printf can easily add some 5-25KB to the size of an executable. Multiply that by the number of "users" of printf, and there is likely several hundred kilobytes just from that one function "saved" - and of course, all other functions such as fopen, fclose, malloc, calloc, free, and so on all add a little bit each to the overall size)
A C compiler is allowed to inline malloc (or, as you see in your example, part of it), but it is not required to inline anything. The heuristics it uses need not be documented, and they're usually quite complex, but normally only short functions will be inlined, since otherwise code-bloat is likely.
malloc and friends are implemented in the runtime library, so they're not available for inlining. They would need to have their implementation in their header files for that to happen.
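To illustrate the difference, here is a sketch of an allocator whose definition is fully visible, as it would be in a header. Arena and arena_alloc are hypothetical names, and this is in no way how malloc is implemented; the point is only that the compiler can see the body at each call site, so it is at least a candidate for inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size arena. The storage is 64-byte aligned, so
// offsets aligned to any power of two up to 64 are aligned addresses.
struct Arena {
    alignas(64) unsigned char storage[4096];
    std::size_t used = 0;
};

// Bump allocation: round the current offset up to the alignment, then
// advance it. Returns nullptr when the arena is exhausted.
inline void* arena_alloc(Arena& arena, std::size_t bytes,
                         std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t start =
        (arena.used + alignment - 1) & ~(alignment - 1);
    if (start > sizeof(arena.storage) ||
        bytes > sizeof(arena.storage) - start) {
        return nullptr;
    }
    arena.used = start + bytes;
    return arena.storage + start;
}
```

Whether a given compiler actually inlines arena_alloc still depends on its heuristics and optimization flags.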
If you want to see their disassembly, you could step into them with a debugger. Or, depending on the compiler and runtime you're using, the source code might be available. It is available for both gcc and msvc, for example.
The main thing stopping the inlining of malloc() et al is their complexity, together with the obvious fact that no inline definition of the function is provided. Besides, you may need different versions of the function at different times; it would be harder (messier) for tools like valgrind to work, and you could not arrange to use a debugging version of the functions if their code is expanded inline.
I have a program written in Visual C++ 2012, and I was trying to call a function written in Delphi (for which I don't have the source code). Here is the code in Visual C++:
int (_fastcall *test)(void*) = (int(_fastcall *)(void*))0x00489A7D;
test((void *)0x12345678);
But in the compiled code it actually was:
.text:1000113B mov eax, 489A7Dh
.text:10001140 mov ecx, 12345678h
.text:10001145 call eax
And what I am expecting is:
.text:1000113B mov ebx, 489A7Dh
.text:10001140 mov eax, 12345678h
.text:10001145 call ebx
I know 'fastcall' uses EAX, ECX, EDX for parameters, but I don't know why the Visual C++ compiler uses EAX for the entry point. Shouldn't EAX hold the first parameter (12345678h)?
I tried calling the Delphi function from assembly code and it works, but I really want to know how to do that without using assembly.
So is it possible to make the Visual C++ compiler generate the code I am expecting? If yes, how?
Delphi's register calling convention, also known as Borland fastcall, on x86 uses EAX, EDX and ECX registers, in that order.
However, Microsoft's fastcall calling convention uses different registers. It does not use EAX at all. Instead it uses ECX and EDX registers for first two parameters, as described by the documentation.
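The difference between the two conventions can be written down as a small lookup. This is just a sketch that restates the register order described above for 32-bit integer or pointer parameters on x86; Convention and argument_location are hypothetical names, and anything beyond the register parameters is simply reported as "stack".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Convention { BorlandRegister, MicrosoftFastcall };

// Where the index-th (0-based) 32-bit integer or pointer argument is
// passed on x86 under each convention.
std::string argument_location(Convention conv, std::size_t index) {
    static const char* const borland[] = {"EAX", "EDX", "ECX"};
    static const char* const microsoft[] = {"ECX", "EDX"};
    if (conv == Convention::BorlandRegister) {
        return index < 3 ? borland[index] : "stack";
    }
    return index < 2 ? microsoft[index] : "stack";
}
```

For the single-parameter call in the question, the lookup gives EAX for Delphi's register convention and ECX for Microsoft's fastcall, which is exactly the mismatch seen in the compiled code.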
So, with that information you could probably write some assembler to make a Delphi register function call from C++, by moving the parameter into the EAX register. However, it's going to be so much easier to let the Delphi compiler do that. Especially as I imagine that your real problem involves multiple functions and more than a single parameter.
I suggest that you write some Pascal code to adapt between stdcall and register.
function FuncRegister(param: Pointer): Integer; register; external '...';
function FuncStdcall(param: Pointer): Integer; stdcall;
begin
Result := FuncRegister(param);
end;
exports
FuncStdcall;
Then you can call FuncStdcall from your C++ code and let the Delphi compiler handle the parameter passing.
I have a bit of code which is calling a method from a COM object (IDirect3D9), but every call causes a run-time check failure #0. The failure is caused by ESP not being properly preserved across the call, so some kind of stack issue (as COM methods are all __stdcall). The unusual part is the simplicity of the method signature and the circumstances.
The code is built in 32-bit mode only, with MSVC 10 (VS 2010 SP1), using the DirectX SDK (June 2010) headers and libs. I've reinstalled the SDK to make sure the headers weren't corrupt, without luck.
I've run the code with both VS' debugger and WinDBG attached, as well as multiple times after reboots/updated drivers. The problem occurs every time, and is identical. Enabling heap validation (and most other options) in gflags doesn't seem to provide any more information, nor does running with Application Verifier. Both simply report the same error as the popup, or the segfault caused shortly after.
Without the call (returning a constant value instead), the program runs as expected. I'm out of ideas on what could be going wrong here.
The function in question is IDirect3D9::GetAdapterModeCount, called from a D3D8-to-9 wrapper (part of a graphics upgrade project for old games). For more general info, the full file is here.
I've tried all the following forms of the call:
UINT r = m_Object->GetAdapterModeCount(D3DADAPTER_DEFAULT, D3DFMT_X8R8G8B8);
UINT r = m_Object->GetAdapterModeCount(0, (D3DFORMAT)22);
UINT adapter = D3DADAPTER_DEFAULT;
D3DFORMAT format = D3DFMT_X8R8G8B8; // and other values
UINT r = m_Object->GetAdapterModeCount(adapter, format);
All of which cause the check failure. m_Object is a valid IDirect3D9, and is used previously for a variety of other calls, specifically:
201, 80194887, Voodoo3D8, CVoodoo3D8::GetAdapterCount() == 3
201, 80195309, Voodoo3D8, CVoodoo3D8::GetAdapterIdentifier(0, 2, 0939CBAC) == 0
201, 80195309, Voodoo3D8, CVoodoo3D8::GetAdapterDisplayMode(0, 0018F5B4) == 0
201, 80196541, Voodoo3D8, CVoodoo3D8::GetAdapterModeCount(0, D3DFMT_X8R8G8B8) == 80
The sequence is logged by debug trace code, and appears to be correct and returning the expected values (3 monitors and so forth). The first 3 calls, by the same object on my part (a single instance of CVoodoo3D8), all succeed with no stack warnings. The fourth does not.
If I reorder the calls, to cause GetAdapterModeCount to be called immediately before any of the others in the same object, the same run-time check failure appears. From testing, this seems to rule out an immediately-previous call breaking the stack; the 4 methods calling those 4 functions all occur at different places, and calling GetAdapterModeCount anywhere from within this file causes the issue.
Which brings us to the unusual part. A different class (CVoodoo3D9) also calls the same sequence of IDirect3D9 methods, with similar parameters, but does not fail (it is the equivalent wrapper class for D3D9). The objects are not used at the same time (the code picks one or the other depending on the render process I need), but both give the same behavior every time. The code for the other class is held in another file, which led me to suspect preprocessor issues (more on that shortly).
After that didn't provide any information, I examined the calling conventions of my code and parameters. Again, nothing came to light. The codebase compiles with /W4 /WX and has for some time, with SAL on most functions and all PREfast rules enabled (and passing).
In particular, the call fails when called within this class, whether the call to my method comes from my code or another program using the object. It fails regardless of where it is called, but only within this file.
The full method is:
UINT STDMETHODCALLTYPE CVoodoo3D8::GetAdapterModeCount(UINT Adapter)
{
UINT r = m_Object->GetAdapterModeCount(D3DADAPTER_DEFAULT, D3DFMT_X8R8G8B8);
gpVoodooLogger->LogMessage(LL_Debug, VOODOO_D3D_NAME, Format("CVoodoo3D8::GetAdapterModeCount(%d, D3DFMT_X8R8G8B8) == %d") << Adapter << r);
return r;
}
The check failure occurs immediately after the call to GetAdapterModeCount and again as my method returns, if allowed to execute to that point.
The preprocessor output, as given by the preprocess-to-file option, has the method declaration (from d3d9.h) correctly as:
virtual __declspec(nothrow) UINT __stdcall GetAdapterModeCount( UINT Adapter,D3DFORMAT Format) = 0;
The declaration of my method is essentially identical:
virtual __declspec(nothrow) UINT __stdcall GetAdapterModeCount(UINT Adapter);
My method hardly expands, becoming:
UINT __stdcall CVoodoo3D8::GetAdapterModeCount(UINT Adapter)
{
UINT r = m_Object->GetAdapterModeCount(D3DADAPTER_DEFAULT, D3DFMT_X8R8G8B8);
gpVoodooLogger->LogMessage(LL_Debug, L"Voodoo3D8", Format("CVoodoo3D8::GetAdapterModeCount(%d, D3DFMT_X8R8G8B8) == %d") << Adapter << r);
return r;
}
The preprocessor output seems correct for both methods, in the declaration and definition.
The assembly listing up to the point of failure is:
UINT STDMETHODCALLTYPE CVoodoo3D8::GetAdapterModeCount(UINT Adapter)
{
642385E0 push ebp
642385E1 mov ebp,esp
642385E3 sub esp,1Ch
642385E6 push ebx
642385E7 push esi
642385E8 push edi
642385E9 mov eax,0CCCCCCCCh
642385EE mov dword ptr [ebp-1Ch],eax
642385F1 mov dword ptr [ebp-18h],eax
642385F4 mov dword ptr [ebp-14h],eax
642385F7 mov dword ptr [ebp-10h],eax
642385FA mov dword ptr [ebp-0Ch],eax
642385FD mov dword ptr [ebp-8],eax
64238600 mov dword ptr [ebp-4],eax
UINT r = m_Object->GetAdapterModeCount(D3DADAPTER_DEFAULT, D3DFMT_X8R8G8B8);
64238603 mov esi,esp
64238605 push 16h
64238607 push 0
64238609 mov eax,dword ptr [this]
6423860C mov ecx,dword ptr [eax+8]
6423860F mov edx,dword ptr [this]
64238612 mov eax,dword ptr [edx+8]
64238615 mov ecx,dword ptr [ecx]
64238617 push eax
64238618 mov edx,dword ptr [ecx+18h]
6423861B call edx
6423861D cmp esi,esp
6423861F call _RTC_CheckEsp (6424B520h)
64238624 mov dword ptr [r],eax
For clarification, the error comes at 6423861F (the call to _RTC_CheckEsp), suggesting that the call or preparation broke the stack. I am working with the assumption that since the same call works in other places, it is not something within the call breaking things.
To my untrained eye, the only unusual part is the pair of mov register, dword ptr [register+8]. As it is a 32-bit system, I'm not sure if +8 could be incrementing too far, or how it could be getting into the build if so.
Shortly after my method returns, apparently due to the call breaking ESP, the program segfaults. If I don't call GetAdapterModeCount and simply return a value, the program executes as expected.
Additionally, a release build (no RTC) segfaults at a similar point, with the stack:
d3d8.dll!CEnum::EnumAdapterModes() + 0x13b bytes
Voodoo_DX89.dll!ClassCreate() + 0x963 bytes
Although I'm not sure of the implications of the address. It is not, so far as I can tell, the same place that segfaults in debug builds; those are within the program after my method returns, while this appears to be during one of my methods which retrieves data from D3D8. Edit: The segfault occurs in a later call, which I'm currently debugging.
At this point, I'm at a complete loss as to what is going wrong or how, and am out of things to check.
I don't see anything wrong with what you're doing or with your generated assembly code.
I can answer your one concern, though.
mov ecx,dword ptr [eax+8]
What this is doing is moving the address of m_Object into the ecx register. The +8 is the offset within your class to m_Object, which is probably correct.
Something to look at. Step through the assembly code until you reach this point:
6423861B call edx
6423861D cmp esi,esp
At that point check the esi and esp registers (in VS just hover your mouse over the register names).
Before the call is executed, ESI should be 12 higher than ESP. After the call, they should be equal. If they are not, post what they are.
Update:
So what is catching my eye is that of the 4 methods you are showing that you are calling, only GetAdapterModeCount has a different signature between D3D8 and D3D9, and that signature is different by 4 bytes, which is the difference in your stack.
How is m_Object obtained? Since this is some kind of adapter between D3D8 and D3D9, is it possible that your m_Object is actually an IDirect3D8 object that is being cast as IDirect3D9 at some point? That would explain the error, and why it works in another context, if you are obtaining the D3D object in a different way.
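The 4-byte figure can be checked with a little arithmetic. A sketch, assuming 32-bit __stdcall COM methods where the caller pushes this plus each argument as one 4-byte slot and the callee pops whatever its own signature declares; bytes_pushed and esp_imbalance are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::ptrdiff_t kSlotBytes = 4;  // one 32-bit stack slot

// Bytes on the stack for a COM method call: the hidden `this` pointer
// plus one slot per explicit argument.
constexpr std::ptrdiff_t bytes_pushed(std::ptrdiff_t explicit_args) {
    return (explicit_args + 1) * kSlotBytes;
}

// ESP imbalance after a __stdcall callee has popped what its own
// signature declares. Positive means bytes were left on the stack.
constexpr std::ptrdiff_t esp_imbalance(std::ptrdiff_t args_caller_passed,
                                       std::ptrdiff_t args_callee_expects) {
    return bytes_pushed(args_caller_passed) -
           bytes_pushed(args_callee_expects);
}

// The three pushes in the listing: 16h, 0, and the object pointer.
static_assert(bytes_pushed(2) == 12, "two arguments plus this");
```

Calling with the two-argument D3D9 signature on a method that only expects the one-argument D3D8 signature leaves 4 bytes on the stack, which is the kind of mismatch _RTC_CheckEsp reports.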