After a WPO patch, the way a function I call from an injected DLL is invoked has changed.
The function is a __fastcall.
The original call looked like this:
PUSH EAX
MOV EAX,DWORD PTR SS:[ESP]
PUSH EAX
LEA EBX,[ARG.22]
LEA EDI,[ARG.23]
CALL Function
So I could call it via:
Push ebx
Push edi
Push 0
Push 0
lea ebx,dword ptr ss:[ecx]
lea edi,dword ptr ss:[edx]
call Function
Pop edi
Pop ebx
retn
The function only needed 2 ASCII strings.
Now, after the WPO patch, the call changed to
PUSH 0
LEA EDX,[LOCAL.22]
PUSH EDX
LEA EDX,[LOCAL.23]
XOR ECX,ECX
CALL Function
This is an ordinary fastcall, which looks simpler. The problem is that the ebp register now carries a number, while esi and edi point to the same strings in Unicode.
So although the call still takes only 2 arguments, the registers carry additional data that the function requires.
Instead of passing 2 ASCII strings in ecx and edx, I wrote a struct that contains the strings in both ASCII and Unicode.
My attempt to solve it looks like this:
pushad
push 0
lea edi,dword ptr ss:[ecx+0x20]
lea esi,dword ptr ss:[ecx]
mov ebp, 100
lea edx,dword ptr ss:[ecx+0x50]
push edx
lea edx,dword ptr ss:[ecx+0x40]
xor ecx, ecx
call Function
pop edx
popad
retn
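For reference, the offsets used in the stub above (0x00, 0x20, 0x40, 0x50) imply a struct along the following lines. This is only a sketch: the name CallStrings, the field names, the 16-element sizes and the guess that the first two fields are the Unicode copies are all assumptions derived from the offsets, not something taken from the target program.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout inferred only from the offsets in the stub.
// Which fields are ASCII and which are Unicode is an assumption.
struct CallStrings {
    char16_t wide_first[16];   // offset 0x00, loaded into esi
    char16_t wide_second[16];  // offset 0x20, loaded into edi
    char ascii_first[16];      // offset 0x40, passed in edx
    char ascii_second[16];     // offset 0x50, pushed on the stack
};

// Compile-time checks that the fields land where the stub expects them.
static_assert(offsetof(CallStrings, wide_first) == 0x00, "esi offset");
static_assert(offsetof(CallStrings, wide_second) == 0x20, "edi offset");
static_assert(offsetof(CallStrings, ascii_first) == 0x40, "edx offset");
static_assert(offsetof(CallStrings, ascii_second) == 0x50, "stack argument offset");

// Small usage check: the whole block is 0x60 bytes with no padding.
inline void CheckCallStringsLayout() {
    assert(sizeof(CallStrings) == 0x60);
}
```

Pinning the offsets with static_assert keeps the C++ side and the hand-written stub from drifting apart if the struct is edited later.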
I followed it in the debugger and the call is processed as it should be, but after the function returns to my asm stub, which then returns to my C++ code, my code raises an exception on a write.
Did I make a fundamental asm mistake, such as messing up the order, that causes the exception?
Related
I am trying to implement a stackful coroutine in C++17 on Windows x64, but I have run into a problem: I can't throw an exception inside my coroutine. If I do, the program terminates immediately with a bad exit code.
Implementation
First, I allocate a stack for the new coroutine. The code looks something like this:
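#include <cstddef>  // std::size_t, std::byte
#include <new>      // ::operator new

void* Allocate() {
    static constexpr std::size_t kStackSize{524'288};  // 512 KiB
    auto new_stack{::operator new(kStackSize)};
    // The stack grows downwards, so return the address one past the end.
    return static_cast<std::byte *>(new_stack) + kStackSize;
}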
The next step is to set up a trampoline function on the newly allocated stack. The code is written in MASM, since I use MSVC (I would like to use GCC and NASM, but I have a problem with thread_local variables; see this question if you are interested):
SetTrampoline PROC
mov rax, rsp ; saves the current stack pointer
mov rsp, [rcx] ; sets the new stack pointer
sub rsp, 20h ; shadow space
push rdx ; saves the function pointer
; place for nonvolatile registers
sub rsp, 0e0h
mov [rcx], rsp ; saves the moved stack pointer
mov rsp, rax ; returns the initial stack pointer
ret
SetTrampoline ENDP
Then I switch machine context with this assembly function (I followed this calling convention):
SwitchContext PROC
; saves all nonvolatile registers to the caller stack
push rbx
push rbp
push rdi
push rsi
push r12
push r13
push r14
push r15
sub rsp, 10h
movdqu [rsp], xmm6
; ... pushes xmm7 - xmm14 in here, removed for brevity
sub rsp, 10h
movdqu [rsp], xmm15
mov [rdx], rsp ; saves the caller stack pointer
SwitchContextFinally PROC
mov rsp, [rcx] ; sets the callee stack pointer
; takes out the callee registers
movdqu xmm15, [rsp]
add rsp, 10h
; ... pops xmm7 - xmm14 in here, removed for brevity
movdqu xmm6, [rsp]
add rsp, 10h
pop r15
pop r14
pop r13
pop r12
pop rsi
pop rdi
pop rbp
pop rbx
ret
SwitchContextFinally ENDP
SwitchContext ENDP
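As a sanity check on the frame that SetTrampoline builds, the offsets can be recomputed in plain C++. The sketch below only redoes the arithmetic from the two procedures above. The constant and function names are hypothetical, and it assumes the top of the allocated stack is 16-byte aligned (the usual default alignment of operator new on x64, which is not guaranteed here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes taken from SetTrampoline and SwitchContext.
constexpr std::size_t kShadowSpace = 0x20;    // sub rsp, 20h
constexpr std::size_t kFunctionPointer = 8;   // push rdx
constexpr std::size_t kSavedGprs = 8 * 8;     // rbx, rbp, rdi, rsi, r12-r15
constexpr std::size_t kSavedXmms = 10 * 16;   // xmm6-xmm15
constexpr std::size_t kSavedRegisters = kSavedGprs + kSavedXmms;

static_assert(kSavedRegisters == 0xe0, "matches sub rsp, 0e0h");

// Distance from the top of the stack to the rsp value SetTrampoline stores.
constexpr std::size_t kInitialFrame =
    kShadowSpace + kFunctionPointer + kSavedRegisters;

// rsp at the first instruction of the passed function: SwitchContextFinally
// pops the saved registers, then ret pops the function pointer.
constexpr std::uintptr_t EntryRsp(std::uintptr_t stack_top) {
    return stack_top - kInitialFrame + kSavedRegisters + kFunctionPointer;
}

// The Windows x64 convention expects rsp + 8 to be a multiple of 16 at
// function entry, as it would be right after a call instruction.
constexpr bool EntryMatchesCallAlignment(std::uintptr_t stack_top) {
    return (EntryRsp(stack_top) + 8) % 16 == 0;
}

// Usage check with a made-up, 16-byte-aligned stack top.
inline void CheckTrampolineFrame() {
    assert(kInitialFrame == 0x108);
    assert(!EntryMatchesCallAlignment(0x100000));
}
```

With a 16-byte-aligned top the helper reports that the entry rsp does not look like the state after a call. That is worth checking, but it is only one possible factor and does not by itself explain the exception behaviour.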
Problem
Inside the trampoline I just invoke whatever function was passed in, and within those functions I can't throw exceptions and catch them immediately in the same function. What have I done wrong? Is it possible to throw exceptions in my case? Should I have shadow space in SetTrampoline?
Also, I guarantee that thrown exceptions don't propagate outside the trampoline function.
I'm new to assembly and I'm trying to figure out how C++ handles dynamic dispatch in assembly.
When looking through assembly code, I saw that there were 2 unusual calls:
call _Znwm
call _ZdlPv
These did not have a subroutine that I could trace them to. From examining the code, Znwm seemed to return the address of the object when its constructor was called, but I'm not sure about that. ZdlPv was in a block of code that could never be reached (it was jumped over).
C++:
Fruit * f;
f = new Apple();
x86:
# BB#1:
mov eax, 8
mov edi, eax
call _Znwm
mov rdi, rax
mov rcx, rax
.Ltmp6:
mov qword ptr [rbp - 48], rdi # 8-byte Spill
mov rdi, rax
mov qword ptr [rbp - 56], rcx # 8-byte Spill
call _ZN5AppleC2Ev
Any advice would be appreciated.
Thanks.
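_Znwm is operator new(unsigned long). It allocates the 8 bytes for the Apple object and returns the address in rax, which is then passed to the constructor.
_ZdlPv is operator delete(void*). It sits in the cleanup path that runs only if the constructor throws, which is why the normal path jumps over it.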
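These names are mangled according to the Itanium C++ ABI, so they can be decoded programmatically. Here is a minimal sketch using abi::__cxa_demangle from <cxxabi.h>, which is available with GCC and Clang (not MSVC); the helper name Demangle is just for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name,
// or the input unchanged if demangling fails.
std::string Demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc; the caller must free it.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}

// Usage check on the constructor symbol from the listing.
inline void CheckDemangle() {
    assert(Demangle("_ZN5AppleC2Ev") == "Apple::Apple()");
}
```

The c++filt command-line tool does the same thing if you only need to decode a few names by hand.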
(This question is specific to my machine's architecture and calling conventions, Windows x86_64)
I don't exactly remember where I had read this, or if I had recalled it correctly, but I had heard that, when a function should return some struct or object by value, it will either stuff it in rax (if the object can fit in the register width of 64 bits) or be passed a pointer to where the resulting object would be (I'm guessing allocated in the calling function's stack frame) in rcx, where it would do all the usual initialization, and then a mov rax, rcx for the return trip. That is, something like
extern some_struct create_it(); // implemented in assembly
would really have a secret parameter like
extern some_struct create_it(some_struct* secret_param_pointing_to_where_i_will_be);
Did my memory serve me right, or am I incorrect? How are large objects (i.e. wider than the register width) returned by value from functions?
Here's a simple disassembly of some code that illustrates what you're describing:
typedef struct
{
int b;
int c;
int d;
int e;
int f;
int g;
char x;
} A;
A foo(int b, int c)
{
A myA = {b, c, 5, 6, 7, 8, 10};
return myA;
}
int main()
{
A myA = foo(5,9);
return 0;
}
and here's the disassembly of the foo function, and the main function calling it
main:
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 30h
call ___main
lea eax, [esp+20] ; placing the addr of myA in eax
mov dword ptr [esp+8], 9 ; param passing
mov dword ptr [esp+4], 5 ; param passing
mov [esp], eax ; passing myA addr as a param
call _foo
mov eax, 0
leave
retn
foo:
push ebp
mov ebp, esp
sub esp, 20h
mov eax, [ebp+12]
mov [ebp-28], eax
mov eax, [ebp+16]
mov [ebp-24], eax
mov dword ptr [ebp-20], 5
mov dword ptr [ebp-16], 6
mov dword ptr [ebp-12], 7
mov dword ptr [ebp-8], 8
mov byte ptr [ebp-4], 0Ah
mov eax, [ebp+8]
mov edx, [ebp-28]
mov [eax], edx
mov edx, [ebp-24]
mov [eax+4], edx
mov edx, [ebp-20]
mov [eax+8], edx
mov edx, [ebp-16]
mov [eax+0Ch], edx
mov edx, [ebp-12]
mov [eax+10h], edx
mov edx, [ebp-8]
mov [eax+14h], edx
mov edx, [ebp-4]
mov [eax+18h], edx
mov eax, [ebp+8]
leave
retn
Now let's go through what just happened. When calling foo, the parameters were passed as follows: 9 is at the highest address, then 5, then the address where main's myA begins:
lea eax, [esp+20] ; placing the addr of myA in eax
mov dword ptr [esp+8], 9 ; param passing
mov dword ptr [esp+4], 5 ; param passing
mov [esp], eax ; passing myA addr as a param
Within foo there is a local myA stored in foo's stack frame. Since the stack grows downwards, the lowest address of myA is [ebp-28]. The 28 comes from struct alignment: the six ints take 24 bytes and the char is padded to 4 bytes, so the struct is 28 bytes rather than the 25 you might expect. After foo's local myA has been created and filled with the parameters and immediate values, it is copied to the address of myA passed from main (this is what return by value actually means here):
mov eax, [ebp+8]
mov edx, [ebp-28]
[ebp+8] is where the address of main::myA was stored: above ebp sit the saved old ebp (4 bytes) and the return address (4 bytes), so the first argument is at ebp+8. As said earlier, foo::myA starts at [ebp-28], since the stack grows downwards.
mov [eax], edx
This places foo::myA.b at the address of the first data member of main::myA, which is main::myA.b.
mov edx, [ebp-24]
mov [eax+4], edx
This loads foo::myA.c into edx and stores it at the address of main::myA.b plus 4 bytes, which is main::myA.c.
As you can see, this process repeats throughout the function:
mov edx, [ebp-20]
mov [eax+8], edx
mov edx, [ebp-16]
mov [eax+0Ch], edx
mov edx, [ebp-12]
mov [eax+10h], edx
mov edx, [ebp-8]
mov [eax+14h], edx
mov edx, [ebp-4]
mov [eax+18h], edx
mov eax, [ebp+8]
This shows that when a struct too large to fit in a register is returned by value, the address where the return value should reside is passed to the function as a hidden parameter, and the called function copies the values of the returned struct to that address.
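The same idea can be written out in source form. The sketch below is a hand-written stand-in for what the compiler does with foo, not its actual output: foo_lowered is a hypothetical name, and the 28-byte size assumes a 4-byte int with 4-byte alignment, as in the disassembly.

```cpp
#include <cassert>

struct A {
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    char x;
};

// Six 4-byte ints plus one char, padded up to the struct's 4-byte alignment.
static_assert(sizeof(A) == 28, "24 bytes of ints + 1 char + 3 bytes padding");

// Explicit version of `A foo(int b, int c)`: the caller supplies the address
// of the result as a first parameter, and that address is returned.
A* foo_lowered(A* result, int b, int c) {
    A local = {b, c, 5, 6, 7, 8, 10};
    *result = local;  // the member-by-member copy seen in the disassembly
    return result;    // corresponds to the final mov eax, [ebp+8]
}

// Usage check mirroring main: the caller owns the storage for the result.
inline void CheckHiddenPointerReturn() {
    A myA;
    A* returned = foo_lowered(&myA, 5, 9);
    assert(returned == &myA);
    assert(myA.b == 5 && myA.c == 9);
}
```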
I hope this example helped you visualize what happens under the hood a little better :)
EDIT
I hope you've noticed that my example uses 32-bit assembly. I know you asked about x86-64, but I'm currently unable to disassemble code on a 64-bit machine, so I hope you'll take my word that the concept is the same for 64-bit and 32-bit, and that the calling convention is nearly the same (on Windows x64 the hidden pointer travels in rcx rather than on the stack).
That is exactly correct. The caller passes an extra argument which is the address of the return value. Normally it will be on the caller's stack frame but there are no guarantees.
The precise mechanics are specified by the platform ABI, but this mechanism is very common.
Various commentators have left useful links with documentation for calling conventions, so I'll hoist some of them into this answer:
Wikipedia article on x86 calling conventions
Agner Fog's collection of optimization resources, including a summary of calling conventions (Direct link to 57-page PDF document.)
Microsoft Developer Network (MSDN) documentation on calling conventions.
StackOverflow x86 tag wiki has lots of useful links.
I'm trying to get the prototype of an asm function to call it from my injected c++ dll.
Here is the function:
PUSH EBP
MOV EBP,ESP
PUSH -1
PUSH Program.0151A5BB
MOV EAX,DWORD PTR FS:[0]
PUSH EAX
SUB ESP,0F8
MOV EAX,DWORD PTR DS:[167D380]
XOR EAX,EBP
MOV DWORD PTR SS:[EBP-14],EAX
PUSH EBX
PUSH ESI
PUSH EDI
PUSH EAX
LEA EAX,DWORD PTR SS:[EBP-C]
MOV DWORD PTR FS:[0],EAX
MOV DWORD PTR SS:[EBP-10],ESP
MOV EDI,EDX
MOV ESI,ECX
MOV DWORD PTR SS:[EBP-4],0
CMP ESI,0FFFF
JE SHORT Program.0117DFC9
CALL Program.01205130
MOV ECX,82
CALL Program.012F2AE0
MOV ECX,ESI
CALL Program.012F3050
MOV ECX,EDI
CALL Program.012F3050
MOV ECX,DWORD PTR SS:[EBP+8]
CALL Program.012F2EA0
MOV ECX,DWORD PTR SS:[EBP+C]
CALL Program.012F3050
MOV ECX,DWORD PTR SS:[EBP+10]
CALL Program.012F2EA0
MOV ECX,DWORD PTR SS:[EBP+14]
CALL Program.012F2EA0
MOV CL,1
CALL Program.012F39B0
MOV DWORD PTR SS:[EBP-4],-1
MOV ECX,DWORD PTR SS:[EBP-C]
MOV DWORD PTR FS:[0],ECX
POP ECX
POP EDI
POP ESI
POP EBX
MOV ECX,DWORD PTR SS:[EBP-14]
XOR ECX,EBP
CALL Program.014BB1AC
MOV ESP,EBP
POP EBP
RETN
And here is an example of a call to this function
JMP Program.001CDD83
CALL Program.000930A0
MOV ECX,EAX
CALL Program.0024EC10
PUSH EAX ; /Arg4
PUSH DWORD PTR SS:[EBP-168] ; |Arg3
PUSH DWORD PTR DS:[EDI+8] ; |Arg2
PUSH DWORD PTR SS:[EBP-160] ; |Arg1
MOV EDX,DWORD PTR SS:[EBP-16C] ; |
MOV ECX,DWORD PTR SS:[EBP-164] ; |
CALL Program.0006DF80 ; \<---- TARGET FUNCTION
ADD ESP,10
JMP Program.001CDD83
TEST EAX,800
JE SHORT Program.001CDF6D
TEST ESI,ESI
JE Program.001CDD83
CMP ESI,DWORD PTR DS:[72202C]
JE Program.001CDD83
CMP ESI,DWORD PTR DS:[584684]
From the call site I deduced that this is a __fastcall function, since it uses the ECX and EDX registers and takes 4 additional parameters on the stack.
Checking the stack and the registers at the moment of the call, I determined that all 6 parameters are numbers.
Here is a picture of the state at the moment of the call.
With all this in mind I made this definition
typedef void(__fastcall *_programFunction)(DWORD ECX, DWORD EDX, DWORD param1, DWORD param2, DWORD param3, DWORD param4);
And it calls the function and the function works in my target program but my DLL crashes displaying this error:
"Debug Error!
Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention."
I'm pretty sure this is a __fastcall function, since it is the only convention that prioritises ECX and EDX over the stack. Plus, the caller isn't cleaning the stack, which is another hint for __fastcall.
Is there any trick to deduce the function prototype from asm code?
Is there something wrong with my thinking?
Thank you!!
EDIT:
I checked what mainactual said
ADD ESP, 10 after your function call seems more __cdecl to me: the caller cleans the stack. If it were a __fastcall you should find RET 10 at the end.
and it works when I manually load the first two parameters into the ECX and EDX registers,
like this:
typedef void(__cdecl *_targetFunction)(DWORD param1, DWORD param2, DWORD param3, DWORD param4);
_targetFunction fcall= (_targetFunction)(ADD_TARGET_FUNCTION);
__asm
{
mov ECX, ECX_PARAM
mov EDX, EDX_PARAM
}
fcall(param1, param2, param3, param4);
Thank you! But why do I have to do this? Is there any way to set the registers automatically?
Thank you!
Due to optimizations, you will occasionally find functions which do not perfectly match the normal calling conventions.
In this situation, the solution is to use inline assembly which you have already accomplished in your question:
typedef void(__cdecl *_targetFunction)(DWORD param1, DWORD param2, DWORD param3, DWORD param4);
_targetFunction fcall= (_targetFunction)(ADD_TARGET_FUNCTION);
__asm
{
mov ECX, ECX_PARAM
mov EDX, EDX_PARAM
}
fcall(param1, param2, param3, param4);
Sometimes that's just the way it goes.
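The rule of thumb from the comment (RET n in the callee versus ADD ESP, n in the caller) can be written down as a small helper. This is only an illustrative sketch: the enum and function names are made up, and real code can hide the cleanup in other ways, for example by folding several adjustments together.

```cpp
#include <cassert>

// Who removes the stack arguments after the call.
enum class StackCleanup { Callee, Caller, None };

// callee_ret_bytes:     operand of the callee's final RET/RETN (0 for a plain RETN)
// caller_add_esp_bytes: immediate of an ADD ESP right after the CALL (0 if absent)
constexpr StackCleanup ClassifyCleanup(unsigned callee_ret_bytes,
                                       unsigned caller_add_esp_bytes) {
    if (callee_ret_bytes != 0) {
        return StackCleanup::Callee;  // __stdcall or __fastcall style
    }
    if (caller_add_esp_bytes != 0) {
        return StackCleanup::Caller;  // __cdecl style
    }
    return StackCleanup::None;        // no stack arguments to remove
}

// Usage check with the values from the question: plain RETN, ADD ESP,10 (hex).
inline void CheckTargetFunction() {
    assert(ClassifyCleanup(0, 0x10) == StackCleanup::Caller);
}
```

For the target function this gives caller cleanup for the four stack arguments, with ECX and EDX carrying two extra values on top, which is why no standard convention keyword matches it exactly.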
Using Visual Studio, I have made a very simple Class in C++ called Watertank, which has a member function:
double Watertank::getcapacity() const{
return capacity;
}
When I run the code:
Watertank wt = Watertank(100);
double capacity = wt.getcapacity();
the double capacity = wt.getcapacity(); generates the following assembly:
push ebp
mov ebp, esp
mov ecx, 0F2E320h
call Watertank::getcapacity(0F21073h)
fstp qword ptr ds:[0F2E330h]
cmp ebp,esp
call _RTC_CheckEsp (0F250B0h)
pop ebp
ret
And the assembly generated for the double Watertank::getcapacity() const body is:
push ebp
mov ebp,esp
push ecx
mov dword ptr [this],0CCCCCCCCh
mov dword ptr [this],ecx
mov eax,dword ptr [this]
fld qword ptr [eax]
mov esp,ebp
pop ebp
ret
Now, as I see it, when calling the wt.getcapacity() function, the base pointer is pushed onto the stack and the base pointer is updated to the current stack pointer. The function can then be executed, and the base pointer can be popped off the stack to return to the state before entering the function.
What I don't understand is why the function body also pushes a base pointer and pops it? I assume it has something to do with the use of the ecx register, but I don't know what that is used for.
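For context on ecx: with MSVC on 32-bit x86, non-static member functions use __thiscall by default, which passes the this pointer in ecx. The sketch below shows the same call with this made explicit. The free function getcapacity_lowered is a hypothetical illustration, not something the compiler emits under that name, and the constructor is assumed since the original class definition is not shown.

```cpp
#include <cassert>

class Watertank {
public:
    explicit Watertank(double c) : capacity(c) {}
    double getcapacity() const { return capacity; }

private:
    double capacity;
    friend double getcapacity_lowered(const Watertank* self);
};

// Free-function view of the member function: 'self' plays the role of the
// this pointer that arrives in ecx, and the load of self->capacity
// corresponds to fld qword ptr [eax] in the listing.
double getcapacity_lowered(const Watertank* self) {
    return self->capacity;
}

// Usage check: both forms read the same member of the same object.
inline void CheckThisPointer() {
    Watertank wt(100);
    assert(getcapacity_lowered(&wt) == wt.getcapacity());
}
```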