Retrieve arguments of a x64 masm assembly procedure - c++

I have a function with the signature :
extern "C" int foo(int a, int b, int c, int d, int e);
which is in fact written in assembly.
With ml(32 bits), using standard calling convention you can pretty much write
.code
foo PROC a: DWORD, b: DWORD ,c: DWORD, d: DWORD, e: DWORD
mov eax, d
mov ebx, e
and start using those labels to access your arguments
With ml64 (64 bits) the fastcall is the only convention available. I have no trouble accessing the first arguments stored in the registers, but issues to access the ones in the stack (e in this example): I tried
.code
foo PROC a: DWORD, b: DWORD ,c: DWORD, d: DWORD, e: DWORD
and
.code
foo PROC e: DWORD
but the value in e is garbage.
I found that if I use the stack address directly I find the value.
.code
foo PROC e: DWORD
mov eax, r9 ; d
mov ebx, DWORD PTR[rbp + 48] ; e
Is there another way?

Documentation explains everything... In Windows, the first four integer parameters are passed in registers RCX, RDX, R8, R9 and floating point in XMM0, XMM1, XMM2, XMM3, anything more than four parameters are passed on the stack above the shadow space. For Unix type OS's it is a bit different.
So, your example is correct - mov ebx, DWORD PTR[rbp + 48] ; e
Shadow space = 32 + saved rbp = 40 + 5th parameter = 48

given
extern "C" int foo(int a, int b, int c, int d, int e);
I found out that visual studio 2010 doesn't save the base pointer RBP if
.code
foo PROC
but save the base pointer if
.code
foo PROC e: DWORD
Later versions (vs2015) don't allow the second code.
There is an optional optimization in x64 systems where RBP is not used (found out the hard way). It says :
The conventional use of %rbp as a frame pointer for the stack frame
may be avoided by using %rsp (the stack pointer) to index into the
stack frame. This technique saves two instructions in the prologue and
epilogue and makes one additional general-purpose register (%rbp)
available.
So it is possible that either foo PROC e: DWORD doesnt compile (vs2015), or foo PROC crashes because RBP is null.
The correct way to retrieve stack arguments is to use the RSP stack pointer given that
RBP = RSP + 8 * num_saved_reg
Where num_saved_reg is the number of registers specified in the PROC directive. So when rbp is not saved (otherwise add 8)
PROC -> DWORD PTR[rsp + 40]
PROC use RDI -> DWORD PTR[rsp + 40 + 8]
PROC use RDI RSI RBX -> DWORD PTR[rsp + 40 + 24]

Related

Create constexpr-like local variable similarly to a const global variable which can still be modified

If I have a struct and create a global variable of that type, the compiler embeds the struct and all of its values in the .data section of the binary, rather than creating it at runtime. For example, the code:
// everything is in global scope
struct test
{
int x;
int y;
float z;
};
test t{34, 1, 394.6};
creates this compiler output (from compiler explorer, MSVC x64 latest, C++20 /O2):
_DATA SEGMENT
test t DD 022H ; t
DD 01H
DD 043c54ccdr ; 394.6
_DATA ENDS
...
I can modify just one field of this global variable, because it is both readable and writable. constexpr based variables will create the same compile-time structure in the binary, but they will be read only.
I have a large structure (TASKDIALOGCONFIG from the win32 api) and I want the same benefit that global initialization provides, that is, everything will be already created in the object file, however I don't want to make the structure a global variable. I have tried using constexpr, but obviously I can't modify the one or two field members I need to change (they are set to 0 when constructed using constexpr).
I could make the local variable static, as static will create the same compiler output as a global variable, but I'm trying to avoid using static because the function will get called again and the struct needs to be changed differently every time it's called.
Modifying the fields I need to change to be mutable would work, as this code:
struct test
{
mutable int x;
int y;
float z;
};
void sample()
{
constexpr test xx{1, 2, 3};
xx.x = 23;
// fake TASKDIALOGCONFIG struct
TaskDialogIndirect(&xx);
}
Creates this output:
__real#40400000 DD 040400000r ; 3
voltbl SEGMENT
_volmd DD 0ffffffffH
DDSymXIndex: FLAT:void sample(void)
DD 0eH
DD 043H
voltbl ENDS
xx$ = 32
__$ArrayPad$ = 48
void sample(void) PROC ; sample
$LN3:
sub rsp, 72 ; 00000048H
mov rax, QWORD PTR __security_cookie
xor rax, rsp
mov QWORD PTR __$ArrayPad$[rsp], rax
mov DWORD PTR xx$[rsp], 1
mov DWORD PTR xx$[rsp+4], 2
movss xmm0, DWORD PTR __real#40400000
movss DWORD PTR xx$[rsp+8], xmm0
mov DWORD PTR xx$[rsp], 23
lea rcx, QWORD PTR xx$[rsp]
call void TaskDialogIndirect(test const *) ; TaskDialogIndirect
mov rcx, QWORD PTR __$ArrayPad$[rsp]
xor rcx, rsp
call __security_check_cookie
add rsp, 72 ; 00000048H
ret 0
void sample(void) ENDP
But the problem is, 1) I can't modify the struct definition; it's part of the win32 api header files, and 2) as the compiler listing shows, the creation of the constexpr variable still happens at runtime, for some reason.
Is there any way to do what I want, or do I have to resort to using static and/or global variables?

Why are parameters allocated below the frame pointer instead of above?

I have tried to understand this basing on a square function in c++ at godbolt.org . Clearly, return, parameters and local variables use “rbp - alignment” for this function.
Could someone please explain how this is possible?
What then would rbp + alignment do in this case?
int square(int num){
int n = 5;// just to test how locals are treated with frame pointer
return num * num;
}
Compiler (x86-64 gcc 11.1)
Generated Assembly:
square(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi. ;\\Both param and local var use rbp-*
mov DWORD PTR[rbp-4], 5. ;//
mov eax, DWORD PTR [rbp-20]
imul eax, eax
pop rbp
ret
This is one of those cases where it’s handy to distinguish between parameters and arguments. In short: arguments are the values given by the caller, while parameters are the variables holding them.
When square is called, the caller places the argument in the rdi register, in accordance with the standard x86-64 calling convention. square then allocates a local variable, the parameter, and places the argument in the parameter. This allows the parameter to be used like any other variable: be read, written into, having its address taken, and so on. Since in this case it’s the callee that allocated the memory for the parameter, it necessarily has to reside below the frame pointer.
With an ABI where arguments are passed on the stack, the callee would be able to reuse the stack slot containing the argument as the parameter. This is exactly what happens on x86-32 (pass -m32 to see yourself):
square(int): # #square(int)
push ebp
mov ebp, esp
push eax
mov eax, dword ptr [ebp + 8]
mov dword ptr [ebp - 4], 5
mov eax, dword ptr [ebp + 8]
imul eax, dword ptr [ebp + 8]
add esp, 4
pop ebp
ret
Of course, if you enabled optimisations, the compiler would not bother with allocating a parameter on the stack in the callee; it would just use the value in the register directly:
square(int): # #square(int)
mov eax, edi
imul eax, edi
ret
GCC allows "leaf" functions, those that don't call other functions, to not bother creating a stack frame. The free stack is fair game to do so as these fns wish.

C/C++ returning struct by value under the hood

(This question is specific to my machine's architecture and calling conventions, Windows x86_64)
I don't exactly remember where I had read this, or if I had recalled it correctly, but I had heard that, when a function should return some struct or object by value, it will either stuff it in rax (if the object can fit in the register width of 64 bits) or be passed a pointer to where the resulting object would be (I'm guessing allocated in the calling function's stack frame) in rcx, where it would do all the usual initialization, and then a mov rax, rcx for the return trip. That is, something like
extern some_struct create_it(); // implemented in assembly
would really have a secret parameter like
extern some_struct create_it(some_struct* secret_param_pointing_to_where_i_will_be);
Did my memory serve me right, or am I incorrect? How are large objects (i.e. wider than the register width) returned by value from functions?
Here's a simple disassembling of a code exampling what you're saying
typedef struct
{
int b;
int c;
int d;
int e;
int f;
int g;
char x;
} A;
A foo(int b, int c)
{
A myA = {b, c, 5, 6, 7, 8, 10};
return myA;
}
int main()
{
A myA = foo(5,9);
return 0;
}
and here's the disassembly of the foo function, and the main function calling it
main:
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 30h
call ___main
lea eax, [esp+20] ; placing the addr of myA in eax
mov dword ptr [esp+8], 9 ; param passing
mov dword ptr [esp+4], 5 ; param passing
mov [esp], eax ; passing myA addr as a param
call _foo
mov eax, 0
leave
retn
foo:
push ebp
mov ebp, esp
sub esp, 20h
mov eax, [ebp+12]
mov [ebp-28], eax
mov eax, [ebp+16]
mov [ebp-24], eax
mov dword ptr [ebp-20], 5
mov dword ptr [ebp-16], 6
mov dword ptr [ebp-12], 7
mov dword ptr [ebp-8], 9
mov byte ptr [ebp-4], 0Ah
mov eax, [ebp+8]
mov edx, [ebp-28]
mov [eax], edx
mov edx, [ebp-24]
mov [eax+4], edx
mov edx, [ebp-20]
mov [eax+8], edx
mov edx, [ebp-16]
mov [eax+0Ch], edx
mov edx, [ebp-12]
mov [eax+10h], edx
mov edx, [ebp-8]
mov [eax+14h], edx
mov edx, [ebp-4]
mov [eax+18h], edx
mov eax, [ebp+8]
leave
retn
now let's go through what just happened, so when calling foo the paramaters were passed in the following way, 9 was at highest address, then 5 then the address the myA in main begins
lea eax, [esp+20] ; placing the addr of myA in eax
mov dword ptr [esp+8], 9 ; param passing
mov dword ptr [esp+4], 5 ; param passing
mov [esp], eax ; passing myA addr as a param
within foo there is some local myA which is stored on the stack frame, since the stack is going downwards, the lowest address of myA begins in [ebp - 28], the -28 offset could be caused by struct alignments so I'm guessing the size of the struct should be 28 bytes here and not 25 as expected. and as we can see in foo after the local myA of foo was created and filled with parameters and immediate values, it is copied and re-written to the address of myA passed from main ( this is the actual meaning of return by value )
mov eax, [ebp+8]
mov edx, [ebp-28]
[ebp + 8] is where the address of main::myA was stored ( memory address go upwards hence ebp + old ebp ( 4 bytes ) + return address ( 4 bytes )) at overall ebp + 8 to get to the first byte of main::myA, as said earlier foo::myA is stored within [ebp-28] as stack goes downwards
mov [eax], edx
place foo::myA.b in the address of the first data member of main::myA which is main::myA.b
mov edx, [ebp-24]
mov [eax+4], edx
place the value that resides in the address of foo::myA.c in edx, and place that value within the address of main::myA.b + 4 bytes which is main::myA.c
as you can see this process repeats itself through out the function
mov edx, [ebp-20]
mov [eax+8], edx
mov edx, [ebp-16]
mov [eax+0Ch], edx
mov edx, [ebp-12]
mov [eax+10h], edx
mov edx, [ebp-8]
mov [eax+14h], edx
mov edx, [ebp-4]
mov [eax+18h], edx
mov eax, [ebp+8]
which basically proves that when returning a struct by val, that could not be placed in as a param, what happens is that the address of where the return value should reside in is passed as a param to the function and within the function being called the values of the returned struct are copied into the address passed as a parameter...
hope this exampled helped you visualize what happens under the hood a little bit better :)
EDIT
I hope that you've noticed that my example was using 32 bit assembler and I KNOW you've asked regarding x86-64, but I'm currently unable to disassemble code on a 64 bit machine so I hope you take my word on it that the concept is exactly the same both for 64 bit and 32 bit, and that the calling convention is nearly the same
That is exactly correct. The caller passes an extra argument which is the address of the return value. Normally it will be on the caller's stack frame but there are no guarantees.
The precise mechanics are specified by the platform ABI, but this mechanism is very common.
Various commentators have left useful links with documentation for calling conventions, so I'll hoist some of them into this answer:
Wikipedia article on x86 calling conventions
Agner Fog's collection of optimization resources, including a summary of calling conventions (Direct link to 57-page PDF document.)
Microsoft Developer Network (MSDN) documentation on calling conventions.
StackOverflow x86 tag wiki has lots of useful links.

MSVC Assembly function arguments C++ vs _asm

I have a function which takes 3 arguments, dest, src0, src1, each a pointer to data of size 12. I made two versions. One is written in C and optimized by the compiler, the other one is fully written in _asm. So yeah. 3 arguments? I naturally do something like:
mov ecx, [src0]
mov edx, [src1]
mov eax, [dest]
I am a bit confused by the compiler, as it saw fit to add the following:
_src0$ = -8 ; size = 4
_dest$ = -4 ; size = 4
_src1$ = 8 ; size = 4
?vm_vec_add_scalar_asm##YAXPAUvec3d##PBU1#1#Z PROC ; vm_vec_add_scalar_asm
; _dest$ = ecx
; _src0$ = edx
; 20 : {
sub esp, 8
mov DWORD PTR _src0$[esp+8], edx
mov DWORD PTR _dest$[esp+8], ecx
; 21 : _asm
; 22 : {
; 23 : mov ecx, [src0]
mov ecx, DWORD PTR _src0$[esp+8]
; 24 : mov edx, [src1]
mov edx, DWORD PTR _src1$[esp+4]
; 25 : mov eax, [dest]
mov eax, DWORD PTR _dest$[esp+8]
Function body etc.
add esp, 8
ret 0
What does the _src0$[esp+8] etc. even means? Why does it do all this stuff before my code? Why does it try to [apparently]stack anything so badly?
In comparison, the C++ version has only the following before its body, which is pretty similar:
_src1$ = 8 ; size = 4
?vm_vec_add##YAXPAUvec3d##PBU1#1#Z PROC ; vm_vec_add
; _dest$ = ecx
; _src0$ = edx
mov eax, DWORD PTR _src1$[esp-4]
Why is this little sufficient?
The answer of Mats Petersson explained __fastcall. But I guess that is not exactly what you're asking ...
Actually _src0$[esp+8] just means [_src0$ + esp + 8], and _src0$ is defined above:
_src0$ = -8 ; size = 4
So, the whole expression _src0$[esp+8] is nothing but [esp] ...
To see why it does all these stuff, you should probably first understand what Mats Petersson said in his post, the __fastcall, or more generally, what is a calling convention. See the link in his post for detailed informations.
Assuming that you have understood __fastcall, now let's see what happens to your codes. The compiler is using __fastcall. Your callee function is f(dst, src0, src1), which requires 3 parameters, so according to the calling convention, when a caller calls f, it does the following:
Move dst to ecx and src0 to edx
Push src1 onto the stack
Push the 4 bytes return address onto the stack
Go to the starting address of the function f
And the callee f, when its code begins, then knows where the parameters are: dst and src0 are in the registers ecx and edx, respectively; esp is pointing to the 4 bytes return address, but the 4 bytes below it (i.e. DWORD PTR[esp+4]) is exactly src1.
So, in your "C++ version", the function f just does what it should do:
mov eax, DWORD PTR _src1$[esp-4]
Here _src1$ = 8, so _src1$[esp-4] is exactly [esp+4]. See, it just retrieves the parameter src1 and stores it in eax.
There is however a tricky point here. In the code of f, if you want to use the parameter src1 multiple times, you can certainly do that, because it's always stored in the stack, right below the return address; but what if you want to use dst and src0 multiple times? They are in the registers, and can be destroyed at any time.
So in that case, the compiler should do the following: right after entering the function f, it should remember the current values of ecx and edx (by pushing them onto the stack). These 8 bytes are the so-called "shadow space". It is not done in your "C++ version", probably because the compiler knows for sure that these two parameters will not be used multiple times, or that it can handle it properly some other way.
Now, what happens to your _asm version? The problem here is that you are using inline assembly. The compiler then loses its control to the registers, and it cannot assume that the registers ecx and edx are safe in your _asm block (they are actually not, since you used them in the _asm block). Thus it is forced to save them at the beginning of the function.
The saving goes as follows: it first raises esp by 8 bytes (sub esp, 8), then move edx and ecx to [esp] and [esp+4] respectively.
And then it can enter safely your _asm block. Now in its mind (if it has one), the picture is that [esp] is src0, [esp+4] is dst, [esp+8] is the 4 byte return address, and [esp+12] is src1. It no longer thinks about ecx and edx.
Thus your first instruction in the _asm block, mov ecx, [src0], should be interpreted as mov ecx, [esp], which is the same as
mov ecx, DWORD PTR _src0$[esp+8]
and the same for the other two instructions.
At this point, you might say, aha it's doing stupid things, I don't want it to waste time and space on that, is there a way?
Well there is a way - do not use inline assembly... it's convenient, but there is a compromise.
You can write the assembly function f in a .asm source file and public it. In the C/C++ code, declare it as extern 'C' f(...). Then, when you begin your assembly function f, you can play directly with your ecx and edx.
The compiler has decided to use a calling convention that uses "pass arguments in registers" aka __fastcall. This allows the compiler to pass some of the arguments in registers, instead of pushing onto stack, and this can reduce the overhead in the call, because moving from a variable to a register is faster than pushing onto the stack, and it's now already in a register when we get to the callee function, so no need to read it from the stack.
There is a lot more information about how calling conventions work on the web. The wikipedia article on x86 calling conventions is a good starting point.

How to use variables in __asm?

I'm compiling this C++ code with the VC compiler. I'm trying to call a function that takes two WORD (aka unsigned short) parameters using the __asm statement, like this:
__declspec(naked) void __stdcall CallFunction(WORD a, WORD b)
{
__asm {
PUSH EBP
MOV EBP, ESP
PUSH a
PUSH b
CALL functionAddress
LEAVE
RETN
}
}
The function at functionAddress simply outputs the result of doing a + b. Then calling CallFuncion(5, 5); prints "64351" or something like that. The problem is when using the a and b variables inside the __asm statement because this works:
PUSH EBP
MOV EBP, ESP
PUSH 5
PUSH 5
CALL functionAddress
LEAVE
This is the function at functionAddress:
void __stdcall Add(WORD a, WORD b)
{
WORD c;
c = a + b;
printf("The result is %d\n", c);
}
How can I do this the right way? So the __asm statement interpretate the a and b values?
Since you're using __declspec(naked) and setting up your own stack frame, I don't believe the compiler will let you refer to a and b by name. Using __declspec(naked) basically means you're responsible for dealing with the stack frame, parameters, etc., on your own.
You probably want code more on this general order:
__asm {
PUSH EBP
MOV EBP, ESP
mov eax, [ebp+8]
mov ebx, [ebp+12]
push eax
push ebx
CALL functionAddress
LEAVE
RETN
}
I'ts been a while since I've handled things like this by hand, so you might want to re-check those offsets, but if I recall correctly, the return address should be at [ebp+4]. Parameters are (usually) pushed from right to left, so the the left-most parameter should be next at [ebp+8], and the next parameter at [ebp+12] (keeping in mind that the stack grows downward).
Edit: [I should have looked more carefully at the function heading.]
You've marked CallFunction as using the __stdcall calling convention. That means it's required to clean up the parameters that were passed to it. So, since it receives 8 bytes of parameters, it needs to remove 8 bytes from the stack as it returns:
PUSH EBP
MOV EBP, ESP
mov eax, [ebp+8]
mov ebx, [ebp+12]
push eax
push ebx
CALL Add_f
LEAVE
RET 8