Variable arguments in _stdcall, C++ / Inline ASM - c++

I need to mock up a __stdcall function using C++ and inline ASM, but one that takes a variable number of arguments. Normally it wouldn't know how many arguments to pop from the stack when it returns control to its caller, so it wouldn't work, but I'm hoping to tell it via a global variable how many parameters it should have and then get it to pop them off accordingly.
Is that actually possible? If so, can someone start me off in the right direction? I'm specifically stuck with the epilog code I would need.
My objective is to make a function which can be used as a callback for any function that requires one (like EnumWindows), so long as the user tells it at runtime how long the args list has to be. The idea is for it to integrate with some code elsewhere so it basically runs a trigger each time the callback is called and provides a link to a place where the variables that were returned can be read and viewed by the user.
Does that make sense?

Doesn't make sense. __stdcall doesn't allow variadic parameters, as the total size of all parameters is decorated into the function name (from msdn):
Name-decoration convention
An underscore (_) is prefixed to the name. The name is followed by the at sign (@) followed by the number of bytes (in decimal) in the argument list. Therefore, the function declared as int func( int a, double b ) is decorated as follows: _func@12
This quote tells you how variadic __stdcall functions are implemented:
The __stdcall calling convention is used to call Win32 API functions. The callee cleans the stack, so the compiler makes vararg functions __cdecl. Functions that use this calling convention require a function prototype.
(emphasis mine)
So, there are no __stdcall functions with variadic parameters, they silently get changed to __cdecl. :)
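To make the byte count in the decorated name concrete, here is a minimal C++ sketch. The helper stdcallDecoratedName is hypothetical (it is not a compiler or Windows API), and it assumes a 32-bit target where every argument occupies a whole number of 4-byte stack slots.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an MSVC API): builds the 32-bit __stdcall
// decorated name from the byte sizes of the parameters.
std::string stdcallDecoratedName(const std::string& name,
                                 const std::vector<std::size_t>& paramSizes) {
    std::size_t total = 0;
    for (std::size_t size : paramSizes) {
        // Assumption: each argument is rounded up to a multiple of 4 bytes.
        total += (size + 3) / 4 * 4;
    }
    return "_" + name + "@" + std::to_string(total);
}
```

Calling stdcallDecoratedName("func", {4, 8}) models int func(int a, double b) from the quote. A variadic function has no fixed total to put after the at sign, which is why the callee cannot know how much to clean up.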

You can do something like the following (hacked up code):
static int NumberOfParameters = 0;
__declspec(naked) void GenericCallback()
{
// prologue
__asm push ebp
__asm mov ebp, esp
// TODO: do something with parameters on stack
// manual stack unwinding for 2 parameters
// obviously you would adjust for the appropriate number of parameters
// (e.g. NumberOfParameters) instead of hard-coding it for 2
// fixup frame pointer
__asm mov eax, [ebp + 0]
__asm mov [ebp + 8], eax // NumberOfParameters * 4 (assuming dword-sized parameters)
// fixup return address
__asm mov eax, [ebp + 4]
__asm mov [ebp + 12], eax // (NumberOfParameters + 1) * 4
// return TRUE
__asm mov eax, 1
// epilogue
__asm mov esp, ebp
__asm pop ebp
// fixup stack pointer
__asm add esp, 8 // NumberOfParameters * 4
__asm ret 0
}
int main(int argc, _TCHAR* argv[])
{
NumberOfParameters = 2;
EnumWindows((WNDENUMPROC)GenericCallback, NULL);
return 0;
}
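The stack arithmetic of that epilogue can be checked without running any assembly. The sketch below is a plain C++ model, not the callback itself: simulateStdcallReturn and ReturnState are hypothetical names, the stack is a vector of 32-bit words indexed from the lowest address, and all parameters are assumed to be dword-sized.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct ReturnState {
    std::uint32_t returnAddress; // value popped by the final ret
    std::size_t espAfterRet;     // word index esp points at after ret
};

// Hypothetical model of the naked callback's epilogue.
// Word index 0 is the lowest address; the stack grows toward index 0.
ReturnState simulateStdcallReturn(std::uint32_t returnAddress,
                                  const std::vector<std::uint32_t>& params) {
    const std::size_t n = params.size();
    std::vector<std::uint32_t> stack(n + 2, 0);
    stack[1] = returnAddress;                 // [esp] on entry
    for (std::size_t k = 0; k < n; ++k) {
        stack[2 + k] = params[k];             // [esp+4], [esp+8], ...
    }
    std::size_t esp = 1;

    // push ebp / mov ebp, esp
    stack[--esp] = 0;                         // placeholder for the saved ebp
    const std::size_t ebp = esp;

    // Copy the return address from [ebp+4] to [ebp + (n+1)*4].
    stack[ebp + n + 1] = stack[ebp + 1];

    // mov esp, ebp / pop ebp (reads the saved ebp from its original slot)
    esp = ebp + 1;
    // add esp, n*4
    esp += n;
    // ret 0
    const std::uint32_t popped = stack[esp++];
    return {popped, esp};
}
```

For two parameters the model pops the original return address and leaves esp one word past the last argument, which is what a __stdcall caller expects.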

Related

MSVC Assembly function arguments C++ vs _asm

I have a function which takes 3 arguments, dest, src0, src1, each a pointer to data of size 12. I made two versions. One is written in C and optimized by the compiler, the other one is fully written in _asm. So yeah. 3 arguments? I naturally do something like:
mov ecx, [src0]
mov edx, [src1]
mov eax, [dest]
I am a bit confused by the compiler, as it saw fit to add the following:
_src0$ = -8 ; size = 4
_dest$ = -4 ; size = 4
_src1$ = 8 ; size = 4
?vm_vec_add_scalar_asm@@YAXPAUvec3d@@PBU1@1@Z PROC ; vm_vec_add_scalar_asm
; _dest$ = ecx
; _src0$ = edx
; 20 : {
sub esp, 8
mov DWORD PTR _src0$[esp+8], edx
mov DWORD PTR _dest$[esp+8], ecx
; 21 : _asm
; 22 : {
; 23 : mov ecx, [src0]
mov ecx, DWORD PTR _src0$[esp+8]
; 24 : mov edx, [src1]
mov edx, DWORD PTR _src1$[esp+4]
; 25 : mov eax, [dest]
mov eax, DWORD PTR _dest$[esp+8]
Function body etc.
add esp, 8
ret 0
What does _src0$[esp+8] etc. even mean? Why does it do all this before my code? Why does it insist on (apparently) putting everything on the stack?
In comparison, the C++ version has only the following before its body, which is pretty similar:
_src1$ = 8 ; size = 4
?vm_vec_add@@YAXPAUvec3d@@PBU1@1@Z PROC ; vm_vec_add
; _dest$ = ecx
; _src0$ = edx
mov eax, DWORD PTR _src1$[esp-4]
Why is this little sufficient?
The answer of Mats Petersson explained __fastcall. But I guess that is not exactly what you're asking ...
Actually _src0$[esp+8] just means [_src0$ + esp + 8], and _src0$ is defined above:
_src0$ = -8 ; size = 4
So, the whole expression _src0$[esp+8] is nothing but [esp] ...
To see why it does all this, you should first understand what Mats Petersson described in his post: __fastcall, or more generally, what a calling convention is. See the link in his post for detailed information.
Assuming that you have understood __fastcall, let's see what happens to your code. The compiler is using __fastcall. Your callee function is f(dst, src0, src1), which takes 3 parameters, so according to the calling convention, when a caller calls f, it does the following:
Move dst to ecx and src0 to edx
Push src1 onto the stack
Push the 4 bytes return address onto the stack
Go to the starting address of the function f
And the callee f, when its code begins, then knows where the parameters are: dst and src0 are in the registers ecx and edx, respectively; esp is pointing to the 4 bytes return address, but the 4 bytes below it (i.e. DWORD PTR[esp+4]) is exactly src1.
So, in your "C++ version", the function f just does what it should do:
mov eax, DWORD PTR _src1$[esp-4]
Here _src1$ = 8, so _src1$[esp-4] is exactly [esp+4]. See, it just retrieves the parameter src1 and stores it in eax.
There is however a tricky point here. In the code of f, if you want to use the parameter src1 multiple times, you can certainly do that, because it's always stored in the stack, right below the return address; but what if you want to use dst and src0 multiple times? They are in the registers, and can be destroyed at any time.
So in that case, the compiler should do the following: right after entering the function f, it should remember the current values of ecx and edx (by pushing them onto the stack). These 8 bytes are the so-called "shadow space". It is not done in your "C++ version", probably because the compiler knows for sure that these two parameters will not be used multiple times, or that it can handle it properly some other way.
Now, what happens to your _asm version? The problem here is that you are using inline assembly. The compiler then loses control over the registers, and it cannot assume that ecx and edx are safe in your _asm block (they actually are not, since you overwrite them there). Thus it is forced to save them at the beginning of the function.
The saving goes as follows: it first reserves 8 bytes on the stack (sub esp, 8), then moves edx and ecx to [esp] and [esp+4] respectively.
And then it can enter safely your _asm block. Now in its mind (if it has one), the picture is that [esp] is src0, [esp+4] is dst, [esp+8] is the 4 byte return address, and [esp+12] is src1. It no longer thinks about ecx and edx.
Thus your first instruction in the _asm block, mov ecx, [src0], should be interpreted as mov ecx, [esp], which is the same as
mov ecx, DWORD PTR _src0$[esp+8]
and the same for the other two instructions.
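The offset arithmetic in the listing can be written out as a small compile-time check. This is only an illustration: espOffset and the k-prefixed constants are hypothetical names standing in for the assembler symbols _src0$, _dest$ and _src1$.

```cpp
// Values of the assembler symbols from the listing.
constexpr int kSrc0 = -8;  // _src0$
constexpr int kDest = -4;  // _dest$
constexpr int kSrc1 = 8;   // _src1$

// symbol[esp+disp] addresses [esp + symbol + disp].
constexpr int espOffset(int symbol, int displacement) {
    return symbol + displacement;
}

// _asm version, after sub esp, 8:
static_assert(espOffset(kSrc0, 8) == 0, "src0 is saved at [esp]");
static_assert(espOffset(kDest, 8) == 4, "dest is saved at [esp+4]");
static_assert(espOffset(kSrc1, 4) == 12, "src1 sits above the return address");
// C++ version, no stack space reserved:
static_assert(espOffset(kSrc1, -4) == 4, "src1 is at [esp+4]");
```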
At this point, you might say, aha it's doing stupid things, I don't want it to waste time and space on that, is there a way?
Well, there is a way: do not use inline assembly. It's convenient, but it comes with a trade-off.
You can write the assembly function f in a .asm source file and declare it public. In the C/C++ code, declare it as extern "C" f(...). Then, when your assembly function f begins, you can work directly with ecx and edx.
The compiler has decided to use a calling convention that uses "pass arguments in registers" aka __fastcall. This allows the compiler to pass some of the arguments in registers, instead of pushing onto stack, and this can reduce the overhead in the call, because moving from a variable to a register is faster than pushing onto the stack, and it's now already in a register when we get to the callee function, so no need to read it from the stack.
There is a lot more information about how calling conventions work on the web. The wikipedia article on x86 calling conventions is a good starting point.

C++ Passing arguments to inline assembler function

I have a problem with inline asm in C++. I'm trying to implement a fast strlen, but it is not working: when I use the __declspec(naked) keyword, the debugger shows the address of input as 0x000000; when I don't use that keyword, eax points to some garbage and the function returns various values.
Here's code:
int fastStrlen(char *input) // I know that function does not calculate strlen
{ // properly, but I just want to know why it crashes
_asm // access violation when I try to write to variable x
{
mov ecx, dword ptr input
xor eax, eax
start:
mov bx, [ecx]
cmp bl, '\0'
je Sxend
inc eax
cmp bh, '\0'
je Sxend
inc eax
add ecx, 2
jmp start
Sxend:
ret
}
}
int _tmain(int argc, _TCHAR* argv[])
{
char* test = "test";
int x = fastStrlen(test);
cout << x;
return 0;
}
Can anybody point out what I am doing wrong?
Don't use __declspec(naked), since in that case the compiler doesn't generate prologue and epilogue instructions, and you need to write a prologue exactly as the compiler expects if you want to access the argument of fastStrlen. Since you don't know what the compiler expects, you should just let it generate the prologue.
This means you can't just use ret to return to the caller, because that amounts to supplying your own epilogue. Since you don't know what prologue the compiler used, you don't know what epilogue you need to implement to reverse it. Instead, assign the return value to a C variable that you declare inside the function before the inline assembly statement, and return that variable in a normal C return statement. For example:
int fastStrlen(char *input)
{
int retval;
_asm
{
mov ecx, dword ptr input
...
Sxend:
mov retval,eax
}
return retval;
}
As noted in your comments your code will not be able to improve on the strlen implementation in your compiler's runtime library. It also reads past the end of strings of even lengths, which will cause a memory fault if the byte past the end of a string isn't mapped into memory.
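For comparison, the same two-characters-per-iteration idea can be written in plain C++ so that the compiler handles the prologue, epilogue and return value. This is a sketch only: twoStepStrlen is a hypothetical name, and it is not claimed to be faster than the runtime library's strlen. It tests the first byte before touching the second, so it never reads beyond the terminator.

```cpp
#include <cstddef>

// Hypothetical plain C++ stand-in for the assembly loop.
std::size_t twoStepStrlen(const char* input) {
    std::size_t length = 0;
    for (;;) {
        if (input[length] == '\0') {
            return length;          // terminator in the first of the two bytes
        }
        // Safe to read: the previous byte was not the terminator.
        if (input[length + 1] == '\0') {
            return length + 1;      // terminator in the second byte
        }
        length += 2;
    }
}
```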

How to use variables in __asm?

I'm compiling this C++ code with the VC compiler. I'm trying to call a function that takes two WORD (aka unsigned short) parameters using the __asm statement, like this:
__declspec(naked) void __stdcall CallFunction(WORD a, WORD b)
{
__asm {
PUSH EBP
MOV EBP, ESP
PUSH a
PUSH b
CALL functionAddress
LEAVE
RETN
}
}
The function at functionAddress simply outputs the result of doing a + b. Calling CallFunction(5, 5); then prints "64351" or something like that. The problem lies in using the a and b variables inside the __asm statement, because this works:
PUSH EBP
MOV EBP, ESP
PUSH 5
PUSH 5
CALL functionAddress
LEAVE
This is the function at functionAddress:
void __stdcall Add(WORD a, WORD b)
{
WORD c;
c = a + b;
printf("The result is %d\n", c);
}
How can I do this the right way, so that the __asm statement interprets the a and b values?
Since you're using __declspec(naked) and setting up your own stack frame, I don't believe the compiler will let you refer to a and b by name. Using __declspec(naked) basically means you're responsible for dealing with the stack frame, parameters, etc., on your own.
You probably want code more on this general order:
__asm {
PUSH EBP
MOV EBP, ESP
mov eax, [ebp+8] ; a
mov ecx, [ebp+12] ; b (ecx is caller-saved, unlike ebx)
push ecx ; push right to left: b first
push eax ; then a
CALL functionAddress
LEAVE
RETN
}
It's been a while since I've handled things like this by hand, so you might want to re-check those offsets, but if I recall correctly, the return address should be at [ebp+4]. Parameters are (usually) pushed from right to left, so the left-most parameter should be next at [ebp+8], and the next parameter at [ebp+12] (keeping in mind that the stack grows downward).
Edit: [I should have looked more carefully at the function heading.]
You've marked CallFunction as using the __stdcall calling convention. That means it's required to clean up the parameters that were passed to it. So, since it receives 8 bytes of parameters, it needs to remove 8 bytes from the stack as it returns:
PUSH EBP
MOV EBP, ESP
mov eax, [ebp+8] ; a
mov ecx, [ebp+12] ; b (ecx is caller-saved, unlike ebx)
push ecx ; push right to left: b first
push eax ; then a
CALL Add_f
LEAVE
RET 8

Inline assembly language

I am doing a 64-bit migration and I need to port inline assembly code to C++. Here is the code:
void ExternalFunctionCall::callFunction(ArgType resultType, void* resultBuffer)
{
// I386
// just copy the args buffer to the stack (it's already layed out correctly)
int* begin = m_argsBegin;
int* ptr = m_argsEnd;
while (ptr > begin) {
int val = *(--ptr);
__asm push val
}
}
I want to migrate this __asm push val to C++. This function is called four times, and for every call we get different values of m_argsBegin and m_argsEnd (both m_argsBegin and m_argsEnd are dynamic arrays).
The while loop executes 4 times for every call of this callFunction function. So in total 4x4 = 16 values are to be stored in a contiguous memory location, which is what __asm push val does, I guess. I need to implement this in C++. I tried every possible way (stack, array, linked list, queue, even moving it into a separate asm file), but none of them work.
Can anyone help?
I moved this inline assembly function into a separate assembly file. Here is the code:
.386
.model c,flat
public callFunction_asm
CSEG segment public 'CODE'
callFunction_asm PROC
push ebp
mov ebp, esp
mov ecx, [ebp+8] ;val
push dword ptr [ecx]
mov esp, ebp
pop ebp
RETN
callFunction_asm ENDP
CSEG ends
END
where callFunction_asm is an extern function. I declared it as:
extern "C"
void callFunction_asm(int val);
and I am calling this function as:
while (ptr > begin) {
int val = *(--ptr);
callFunction_asm(val); //possible replacement
}
But even this is not working. Can anyone tell me where I am going wrong? I am new to assembly coding.
push puts its operand on the stack, as well as decrementing the stack pointer.
If you looked at the stack pointer plus 1 (1($sp)), you should see the value (but if you wanted it back, you'd typically use pop).
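To see what the push loop leaves in memory, the loop can be modelled on a software stack. This sketch is an assumption-laden illustration, not a port: simulatePushes is a hypothetical helper, and it only reproduces the memory layout. It does not place anything on the real machine stack, so by itself it cannot stand in for the pushes that set up a call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: replays the question's loop against a vector that
// plays the role of stack memory (index 0 is the lowest address).
std::vector<int> simulatePushes(const int* begin, const int* end) {
    std::vector<int> memory(static_cast<std::size_t>(end - begin));
    std::size_t sp = memory.size();  // model stack pointer, starts at the top
    const int* ptr = end;
    while (ptr > begin) {
        int val = *(--ptr);
        memory[--sp] = val;          // push: decrement sp, then store
    }
    return memory;
}
```

Because the loop walks the buffer backwards while the stack grows downwards, the values end up in memory in their original order, with the first argument at the lowest address.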

Accessing function parameters in C++ from assembly in IA-32

I have been learning IA-32 assembly programming. So I would like to write a function in assembly and call it from C++.
The tutorial I am following is actually for x64 assembly. But I am working on IA-32. In x64, it says function arguments are stored in registers like RCX, RDX, R8, R9 etc.
But on searching a little bit, I could understand in IA-32, arguments are stored in stack, not in registers.
Below is my C++ code :
#include <iostream>
#include <conio.h>
using namespace std;
extern "C" int PassParam(int a,int b);
int main()
{
cout << "z is " << PassParam(15,13) << endl;
_getch();
return 0;
}
Below is assembly code for PassParam() function (it just add two arguments, that's all. It is only for learning purpose) :
PassParam() in assembly :
.model C,flat
.code
PassParam proc
mov eax,[ebp-212]
add eax,[ebp-216]
ret
PassParam endp
end
In my assembly code, you can see I moved first argument from [ebp-212] to eax. That value is obtained as follows :
I wrote the PassParam() function in C++ itself and disassembled it. Then I checked where ebp is and where the second argument is stored (arguments are stored from right to left). I could see there is a difference of 212, so that is how I got that value. Then, as usual, the first argument is stored 4 bytes later. And it works fine.
Question :
Is this the correct method to access arguments from assembly ? I mean, is it always [ebp-212] where argument stored?
If not, can anyone explain the correct method to pass arguments from C++ to assembly ?
Note :
I am working with Visual C++ 2010, on Windows 7 machine.
On 32-bit architectures, it depends on the calling convention. Windows, for example, has both __fastcall and __thiscall, which use register and stack args, and __cdecl and __stdcall, which use stack args but differ in who does the cleanup. MSDN has a nice listing here (or the more assembly orientated version). Note that FPU/SSE operations also have their own conventions.
For ease and simplicity, try to use __stdcall for everything. This allows you to use stack frames to access args via MOV r32,[EBP+4+(arg_index * 4)] or, if you aren't using stack frames, MOV r32,[ESP+local_stack_offset+(arg_index * 4)], where arg_index starts at 1. The annotated C++ -> x86 Assembly example here should be of help.
So as a simple example, let's say we have the function MulAdd in assembly, with the C++ prototype int __stdcall MulAdd(int base, int mul, int add). It would look something like:
MOV EAX,[ESP+4] //get the first arg('base') off the stack
MOV ECX,[ESP+8] //get the second arg('mul') off the stack
IMUL EAX,ECX //base * mul
MOV ECX,[ESP+12] //get arg 3 off the stack
ADD EAX,ECX
RETN 12 //cleanup the 3 args and return
Or if you use a stack frame:
PUSH EBP
MOV EBP,ESP //save the stack
MOV EAX,[EBP+8] //get the first arg('base') off the stack
MOV ECX,[EBP+12] //get the second arg('mul') off the stack
IMUL EAX,ECX //base * mul
MOV ECX,[EBP+16] //get arg 3 off the stack
ADD EAX,ECX
MOV ESP,EBP //restore the stack
POP EBP
RETN 12 //cleanup the 3 args and return to caller
Using the stack frame avoids needing to adjust for changes made to the stack by PUSH'ing of args, spilling of registers, or stack allocations made for local variables. Its downside is that it reduces the number of registers you have to work with.
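The offset formulas and the MulAdd arithmetic can be cross-checked in plain C++. This is a sketch with hypothetical helper names (framedArgOffset, framelessArgOffset, mulAddReference), assuming dword-sized arguments and a 1-based arg_index.

```cpp
// Offset of argument argIndex (1-based) from EBP once a stack frame is set up.
constexpr int framedArgOffset(int argIndex) {
    return 4 + argIndex * 4;
}

// Offset from ESP when no frame is used; localStackOffset is the number of
// bytes pushed or reserved since function entry.
constexpr int framelessArgOffset(int localStackOffset, int argIndex) {
    return localStackOffset + argIndex * 4;
}

static_assert(framedArgOffset(1) == 8, "base is at [EBP+8]");
static_assert(framedArgOffset(3) == 16, "add is at [EBP+16]");
static_assert(framelessArgOffset(0, 1) == 4, "base is at [ESP+4] on entry");
static_assert(framelessArgOffset(0, 3) == 12, "add is at [ESP+12] on entry");

// What the assembly computes: base * mul + add.
constexpr int mulAddReference(int base, int mul, int add) {
    return base * mul + add;
}
```

The frameless variant shows the bookkeeping cost mentioned above: every PUSH or local allocation changes localStackOffset, while the EBP-based offsets stay fixed for the whole function.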