I'm trying to work out the prototype of an asm function so I can call it from my injected C++ DLL.
Here is the function:
PUSH EBP
MOV EBP,ESP
PUSH -1
PUSH Program.0151A5BB
MOV EAX,DWORD PTR FS:[0]
PUSH EAX
SUB ESP,0F8
MOV EAX,DWORD PTR DS:[167D380]
XOR EAX,EBP
MOV DWORD PTR SS:[EBP-14],EAX
PUSH EBX
PUSH ESI
PUSH EDI
PUSH EAX
LEA EAX,DWORD PTR SS:[EBP-C]
MOV DWORD PTR FS:[0],EAX
MOV DWORD PTR SS:[EBP-10],ESP
MOV EDI,EDX
MOV ESI,ECX
MOV DWORD PTR SS:[EBP-4],0
CMP ESI,0FFFF
JE SHORT Program.0117DFC9
CALL Program.01205130
MOV ECX,82
CALL Program.012F2AE0
MOV ECX,ESI
CALL Program.012F3050
MOV ECX,EDI
CALL Program.012F3050
MOV ECX,DWORD PTR SS:[EBP+8]
CALL Program.012F2EA0
MOV ECX,DWORD PTR SS:[EBP+C]
CALL Program.012F3050
MOV ECX,DWORD PTR SS:[EBP+10]
CALL Program.012F2EA0
MOV ECX,DWORD PTR SS:[EBP+14]
CALL Program.012F2EA0
MOV CL,1
CALL Program.012F39B0
MOV DWORD PTR SS:[EBP-4],-1
MOV ECX,DWORD PTR SS:[EBP-C]
MOV DWORD PTR FS:[0],ECX
POP ECX
POP EDI
POP ESI
POP EBX
MOV ECX,DWORD PTR SS:[EBP-14]
XOR ECX,EBP
CALL Program.014BB1AC
MOV ESP,EBP
POP EBP
RETN
And here is an example of a call to this function
JMP Program.001CDD83
CALL Program.000930A0
MOV ECX,EAX
CALL Program.0024EC10
PUSH EAX ; /Arg4
PUSH DWORD PTR SS:[EBP-168] ; |Arg3
PUSH DWORD PTR DS:[EDI+8] ; |Arg2
PUSH DWORD PTR SS:[EBP-160] ; |Arg1
MOV EDX,DWORD PTR SS:[EBP-16C] ; |
MOV ECX,DWORD PTR SS:[EBP-164] ; |
CALL Program.0006DF80 ; \<---- TARGET FUNCTION
ADD ESP,10
JMP Program.001CDD83
TEST EAX,800
JE SHORT Program.001CDF6D
TEST ESI,ESI
JE Program.001CDD83
CMP ESI,DWORD PTR DS:[72202C]
JE Program.001CDD83
CMP ESI,DWORD PTR DS:[584684]
From the call site I deduced that this is a __fastcall function, since it uses the ECX and EDX registers and takes 4 additional parameters on the stack.
By checking the stack and the registers at the moment of the call, I determined that all 6 parameters are numbers.
Here is a picture of the state at the moment of the function call.
With all this in mind I made this definition
typedef void(__fastcall *_programFunction)(DWORD ECX, DWORD EDX, DWORD param1, DWORD param2, DWORD param3, DWORD param4);
This calls the function, and the function works in my target program, but my DLL crashes with this error:
"Debug Error!
Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention."
I'm pretty sure this is a __fastcall function, since it is the only convention that prioritises ECX and EDX over the stack. Plus, the caller function isn't cleaning the stack, which is another hint for __fastcall.
Is there any trick to deduce the function prototype from asm code?
Is there something wrong with my thinking?
Thank you!!
EDIT:
I checked what mainactual said:
"ADD ESP, 10 after your function call seems more __cdecl to me: the caller cleans the stack. If it were a __fastcall you should find RET 10 at the end."
and it works when I manually load the first two parameters into the ECX and EDX registers,
like this:
typedef void(__cdecl *_targetFunction)(DWORD param1, DWORD param2, DWORD param3, DWORD param4);
_targetFunction fcall= (_targetFunction)(ADD_TARGET_FUNCTION);
__asm
{
mov ECX, ECX_PARAM
mov EDX, EDX_PARAM
}
fcall(param1, param2, param3, param4);
Thank you! But why do I have to do this? Is there any way to set the registers automatically?
Thank you!
Due to optimizations, you will occasionally find functions which do not perfectly match the normal calling conventions.
In this situation, the solution is to use inline assembly which you have already accomplished in your question:
typedef void(__cdecl *_targetFunction)(DWORD param1, DWORD param2, DWORD param3, DWORD param4);
_targetFunction fcall= (_targetFunction)(ADD_TARGET_FUNCTION);
__asm
{
mov ECX, ECX_PARAM
mov EDX, EDX_PARAM
}
fcall(param1, param2, param3, param4);
Sometimes that's just the way it goes.
T.hpp
class T
{
int _i;
public:
int get() const;
int some_fun();
};
T.cpp
#include "T.hpp"
int T::get() const
{ return _i; }
int T::some_fun()
{
// noise
int i = get(); // (1)
// noise
return i;
}
get() is a non-inline function, however, it's defined in the same module as some_fun. Since the compiler can see the definition of get in the context of some_fun, do compilers, in optimized builds at least, apply the optimization of replacing get() by just _i in line (1)?
If I'm not wrong, I think that, with the exception of templates, the compiler only does a one-pass parsing. What if get is defined after some_fun?
Ok, I answered myself. I thought I didn't speak assembly but it wasn't that hard to try.
Code:
class T
{
int _i = 5;
public:
int get() const;
int some_fun();
};
int T::get() const { return _i; }
int T::some_fun()
{
int i = get();
return i;
}
int main()
{
T o;
return o.some_fun();
}
Non-optimized assembly output (using godbolt.org). A lot of stuff but you can see the explicit calls:
T::get() const:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
pop rbp
ret
T::some_fun():
push rbp
mov rbp, rsp
sub rsp, 24
mov QWORD PTR [rbp-24], rdi
mov rax, QWORD PTR [rbp-24]
mov rdi, rax
call T::get() const // !!!!
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-4]
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 5
lea rax, [rbp-4]
mov rdi, rax
call T::some_fun() // !!!!
nop
leave
ret
Optimized output (-O3):
T::get() const:
mov eax, DWORD PTR [rdi]
ret
T::some_fun():
mov eax, DWORD PTR [rdi]
ret
main:
mov eax, 5
ret
Here, some_fun has inlined the call to get (the call instruction has been removed and its definition is the same as get now), but the get function is still defined.
main went even further by doing an inline substitution of the call to some_fun and then realizing that o hasn't changed and at that point it still retains its default value of 5, so main directly returns 5 without even creating o.
Here is a piece of C++ code.
In this example, many code blocks look like constructor calls.
Unfortunately, code block #3 is not one (you can check it using https://godbolt.org/z/q3rsxn and https://cppinsights.io).
I think it is an old C++ notation, and it could explain the introduction of the new C++11 construction notation using {} (cf. #4).
Do you have an explanation for the meaning of T(i), which looks so close to constructor notation yet behaves so differently?
struct T {
T() { }
T(int i) { }
};
int main() {
int i = 42;
{ // #1
T t(i); // new T named t using int ctor
}
{ // #2
T t = T(i); // new T named t using int ctor
}
{ // #3
T(i); // new T named i using default ctor
}
{ // #4
T{i}; // new T using int ctor (unnamed result)
}
{ // #5
T(2); // new T using int ctor (unnamed result)
}
}
NB: thus, T(i) (#3) is equivalent to T i = T();
The statement:
T(i);
is equivalent to:
T i;
In other words, it declares a variable named i with type T. This is because parentheses are allowed in declarations in some places (in order to change the binding of declarators) and since this statement can be parsed as a declaration, it is a declaration (even though it might make more sense as an expression).
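The difference can also be observed at run time. The following is a minimal sketch, assuming a hypothetical counting variant of T whose constructors record that they ran (the counters and the function name declaration_vs_temporary are not in the original code). It shows that T(i); declares a new object named i and runs the default constructor, while T{i}; builds an unnamed temporary from the outer int.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical counting variant of T: each constructor records that it ran.
struct T {
    static int default_calls;
    static int int_calls;
    T() { ++default_calls; }
    T(int) { ++int_calls; }
};
int T::default_calls = 0;
int T::int_calls = 0;

// Runs the two look-alike statements and reports whether exactly one default
// construction and one int construction were observed.
bool declaration_vs_temporary() {
    T::default_calls = 0;
    T::int_calls = 0;
    int i = 42;
    {
        T(i);  // declaration: same as `T i;`, shadows the outer int i
        static_assert(std::is_same<decltype(i), T>::value,
                      "inside this block, i names a T object");
    }
    {
        T{i};  // expression: unnamed temporary built with T(int)
    }
    return T::default_calls == 1 && T::int_calls == 1;
}
```

Calling declaration_vs_temporary() from main should return true. Some compilers warn about the redundant parentheses in the first block, which is a useful hint that the statement was parsed as a declaration.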
You can use Compiler Explorer to see what happens in assembler.
You can see that #1, #2, #4 and #5 do the same thing, but strangely #3 calls the other constructor (the base object constructor).
Does anyone have an explanation?
Assembler code :
T::T() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
nop
pop rbp
ret
T::T(int):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov DWORD PTR [rbp-12], esi
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 42
// #1
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-7]
mov esi, edx
mov rdi, rax
call T::T(int)
// #2
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-8]
mov esi, edx
mov rdi, rax
call T::T(int)
// #3
lea rax, [rbp-9]
mov rdi, rax
call T::T() [complete object constructor]
// #4
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-6]
mov esi, edx
mov rdi, rax
call T::T(int)
// #5
lea rax, [rbp-5]
mov esi, 2
mov rdi, rax
call T::T(int)
mov eax, 0
leave
ret
(This question is specific to my machine's architecture and calling conventions, Windows x86_64)
I don't remember exactly where I read this, or whether I recall it correctly, but I heard that when a function returns a struct or object by value, it either places it in rax (if the object fits in the 64-bit register width) or is passed a pointer in rcx to where the resulting object should go (I'm guessing allocated in the calling function's stack frame). The function does all the usual initialization there and then executes mov rax, rcx for the return trip. That is, something like
extern some_struct create_it(); // implemented in assembly
would really have a secret parameter like
extern some_struct create_it(some_struct* secret_param_pointing_to_where_i_will_be);
Did my memory serve me right, or am I incorrect? How are large objects (i.e. wider than the register width) returned by value from functions?
Here's a simple disassembly of code that illustrates what you're describing:
typedef struct
{
int b;
int c;
int d;
int e;
int f;
int g;
char x;
} A;
A foo(int b, int c)
{
A myA = {b, c, 5, 6, 7, 8, 10};
return myA;
}
int main()
{
A myA = foo(5,9);
return 0;
}
and here's the disassembly of the foo function, and the main function calling it
main:
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 30h
call ___main
lea eax, [esp+20] ; placing the addr of myA in eax
mov dword ptr [esp+8], 9 ; param passing
mov dword ptr [esp+4], 5 ; param passing
mov [esp], eax ; passing myA addr as a param
call _foo
mov eax, 0
leave
retn
foo:
push ebp
mov ebp, esp
sub esp, 20h
mov eax, [ebp+12]
mov [ebp-28], eax
mov eax, [ebp+16]
mov [ebp-24], eax
mov dword ptr [ebp-20], 5
mov dword ptr [ebp-16], 6
mov dword ptr [ebp-12], 7
mov dword ptr [ebp-8], 8
mov byte ptr [ebp-4], 0Ah
mov eax, [ebp+8]
mov edx, [ebp-28]
mov [eax], edx
mov edx, [ebp-24]
mov [eax+4], edx
mov edx, [ebp-20]
mov [eax+8], edx
mov edx, [ebp-16]
mov [eax+0Ch], edx
mov edx, [ebp-12]
mov [eax+10h], edx
mov edx, [ebp-8]
mov [eax+14h], edx
mov edx, [ebp-4]
mov [eax+18h], edx
mov eax, [ebp+8]
leave
retn
Now let's go through what just happened. When foo was called, the parameters were passed in the following way: 9 was at the highest address, then 5, then the address where main's myA begins:
lea eax, [esp+20] ; placing the addr of myA in eax
mov dword ptr [esp+8], 9 ; param passing
mov dword ptr [esp+4], 5 ; param passing
mov [esp], eax ; passing myA addr as a param
Within foo there is a local myA stored in foo's stack frame. Since the stack grows downwards, the lowest address of myA is [ebp-28]. The -28 offset is probably caused by struct alignment, so I'm guessing the size of the struct is 28 bytes here and not 25 as one might expect. As we can see, after foo's local myA has been created and filled with the parameters and immediate values, it is copied to the address of myA passed from main (this is what return by value actually means here):
mov eax, [ebp+8]
mov edx, [ebp-28]
[ebp+8] is where the address of main::myA was stored (addresses grow upwards, so the saved old ebp (4 bytes) and the return address (4 bytes) sit between ebp and the first argument, which puts the pointer to main::myA at ebp+8). As said earlier, foo::myA starts at [ebp-28] because the stack grows downwards.
mov [eax], edx
place foo::myA.b in the address of the first data member of main::myA which is main::myA.b
mov edx, [ebp-24]
mov [eax+4], edx
place the value that resides in the address of foo::myA.c in edx, and place that value within the address of main::myA.b + 4 bytes which is main::myA.c
as you can see this process repeats itself through out the function
mov edx, [ebp-20]
mov [eax+8], edx
mov edx, [ebp-16]
mov [eax+0Ch], edx
mov edx, [ebp-12]
mov [eax+10h], edx
mov edx, [ebp-8]
mov [eax+14h], edx
mov edx, [ebp-4]
mov [eax+18h], edx
mov eax, [ebp+8]
This basically shows that when a struct that does not fit in a register is returned by value, the address where the return value should reside is passed as a parameter to the function, and the called function copies the values of the returned struct to that address.
Hope this example helped you visualize what happens under the hood a little bit better :)
EDIT
I hope you've noticed that my example uses 32-bit assembler. I KNOW you asked about x86-64, but I'm currently unable to disassemble code on a 64-bit machine, so I hope you'll take my word for it that the concept is exactly the same for both 64-bit and 32-bit, and that the calling convention is nearly the same.
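The 28-versus-25 guess about the struct size can be checked directly. This is a small sketch, assuming a typical ABI where int is 4 bytes and 4-byte aligned; the helper names payload_bytes and padding_bytes are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Same members, in the same order, as the struct A in the example above.
struct A {
    int b, c, d, e, f, g;
    char x;
};

// Bytes that the members themselves need: six ints plus one char.
constexpr std::size_t payload_bytes() {
    return 6 * sizeof(int) + sizeof(char);
}

// Padding the compiler adds so that sizeof(A) is a multiple of alignof(A),
// which keeps every element of an A array correctly aligned.
constexpr std::size_t padding_bytes() {
    return sizeof(A) - payload_bytes();
}
```

Under the stated assumption, payload_bytes() is 25, sizeof(A) is 28 and offsetof(A, x) is 24, which is consistent with the local copy starting at [ebp-28] and the char sitting at [ebp-4].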
That is exactly correct. The caller passes an extra argument which is the address of the return value. Normally it will be on the caller's stack frame but there are no guarantees.
The precise mechanics are specified by the platform ABI, but this mechanism is very common.
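To make the hidden argument concrete, here is a rough sketch that writes it out by hand. The names some_struct and create_it come from the question; the fields of some_struct and the function create_it_explicit are assumptions for illustration only. The second function is a source-level picture of the mechanism, not a claim about the exact code any particular compiler emits.

```cpp
#include <cassert>

// Hypothetical layout: large enough that it does not fit in one register.
struct some_struct {
    int b, c, d, e, f, g;
    char x;
};

// What the programmer writes: an ordinary by-value return.
some_struct create_it() {
    some_struct result = {5, 9, 5, 6, 7, 8, 10};
    return result;
}

// Roughly what happens underneath: the caller supplies the destination,
// the callee fills it in and hands the same address back.
some_struct* create_it_explicit(some_struct* out) {
    *out = some_struct{5, 9, 5, 6, 7, 8, 10};
    return out;
}
```

A caller would use the explicit form as some_struct s; create_it_explicit(&s);, which leaves s with the same contents as some_struct s = create_it();.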
Various commentators have left useful links with documentation for calling conventions, so I'll hoist some of them into this answer:
Wikipedia article on x86 calling conventions
Agner Fog's collection of optimization resources, including a summary of calling conventions (Direct link to 57-page PDF document.)
Microsoft Developer Network (MSDN) documentation on calling conventions.
StackOverflow x86 tag wiki has lots of useful links.
After a WPO patch, a function that I call through an injected DLL changed.
The function is a __fastcall
The original function looked like
PUSH EAX
MOV EAX,DWORD PTR SS:[ESP]
PUSH EAX
LEA EBX,[ARG.22]
LEA EDI,[ARG.23]
CALL Function
So I could call it via:
Push ebx
Push edi
Push 0
Push 0
lea ebx,dword ptr ss:[ecx]
lea edi,dword ptr ss:[edx]
call Function
Pop edi
Pop ebx
retn
The function only needed 2 ascii strings.
Now after the WPO the function changed to
PUSH 0
LEA EDX,[LOCAL.22]
PUSH EDX
LEA EDX,[LOCAL.23]
XOR ECX,ECX
CALL Function
This is a common fastcall, which looks simpler. The problem is that the ebp register now carries a number, while esi and edi hold the same strings but in Unicode.
So although the call still takes only 2 arguments, the registers contain additional data that the function requires.
So instead of calling the function with 2 ASCII strings in ecx and edx, I wrote a struct which contains the strings as both ASCII and Unicode.
My attempt to solve it looked like
pushad
push 0
lea edi,dword ptr ss:[ecx+0x20]
lea esi,dword ptr ss:[ecx]
mov ebp, 100
lea edx,dword ptr ss:[ecx+0x50]
push edx
lea edx,dword ptr ss:[ecx+0x40]
xor ecx, ecx
call Function
pop edx
popad
retn
I followed it in the debugger and the call is processed as it should be, but after the function returns to my asm stub and the stub returns to my C++ code, my code raises an exception on a write.
Did I make a fundamental asm mistake, such as messing up the order, which causes the exception?
Using Visual Studio, I have made a very simple class in C++ called Watertank, which has a member function:
double Watertank::getcapacity() const{
return capacity;
}
When I run the code:
Watertank wt = Watertank(100);
double capacity = wt.getcapacity();
the line double capacity = wt.getcapacity(); generates the following assembly:
push ebp
mov ebp, esp
mov ecx, 0F2E320h
call Watertank::getcapacity(0F21073h)
fstp qword ptr ds:[0F2E330h]
cmp ebp,esp
call _RTC_CheckEsp (0F250B0h)
pop ebp
ret
And the assembly generated for the double Watertank::getcapacity() const body is:
push ebp
mov ebp,esp
push ecx
mov dword ptr [this],0CCCCCCCCh
mov dword ptr [this],ecx
mov eax,dword ptr [this]
fld qword ptr [eax]
mov esp,ebp
pop ebp
ret
Now, as I see it, when calling the wt.getcapacity() function, the base pointer is pushed onto the stack and the base pointer is updated to the current stack pointer. The function can then be executed, and the base pointer can be popped off the stack to return to the state before entering the function.
What I don't understand is why the function body also pushes a base pointer and pops it? I assume it has something to do with the use of the ecx register, but I don't know what that is used for.
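On the ecx part: with 32-bit MSVC, non-static member functions use the __thiscall convention by default, which passes the address of the object (the this pointer) in ecx. That fits the mov ecx, 0F2E320h before the call and the mov dword ptr [this],ecx inside the body. The sketch below spells that implicit argument out as an ordinary parameter. The constructor and the free function getcapacity_explicit are assumptions added for illustration; only the capacity member and the getter come from the question.

```cpp
#include <cassert>

// Minimal stand-in for the Watertank class from the question.
class Watertank {
    double capacity;
public:
    explicit Watertank(double c) : capacity(c) {}
    double getcapacity() const { return capacity; }
    friend double getcapacity_explicit(const Watertank* self);
};

// Hypothetical free-function spelling of the same getter: the object address
// that the member call passes implicitly is a visible parameter here.
double getcapacity_explicit(const Watertank* self) {
    return self->capacity;
}
```

With Watertank wt(100);, both wt.getcapacity() and getcapacity_explicit(&wt) read the same double through the object's address, which is the fld qword ptr [eax] in the listing.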