I know what the differences between __cdecl and __stdcall are, but I'm not quite sure as to why __stdcall is ignored by the compiler in x64 builds.
The functions in the following code
int __stdcall stdcallFunc(int a, int b, int c, int d, int e, int f, int g)
{
return a + b + c + d + e + f + g;
}
int __cdecl cdeclFunc(int a, int b, int c, int d, int e, int f, int g)
{
return a + b + c + d + e + f + g;
}
int main()
{
stdcallFunc(1, 2, 3, 4, 5, 6, 7);
cdeclFunc(1, 2, 3, 4, 5, 6, 7);
return 0;
}
have enough parameters to exceed the available CPU registers. Therefore, some arguments must be passed via the stack. I'm not fluent in assembly but I noticed some differences between x86 and x64 assembly.
x64
main PROC
$LN3:
sub rsp, 72 ; 00000048H
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?stdcallFunc@@YAHHHHHHHH@Z ; stdcallFunc
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?cdeclFunc@@YAHHHHHHHH@Z ; cdeclFunc
xor eax, eax
add rsp, 72 ; 00000048H
ret 0
main ENDP
x86
_main PROC
push ebp
mov ebp, esp
push 7
push 6
push 5
push 4
push 3
push 2
push 1
call ?stdcallFunc@@YGHHHHHHHH@Z ; stdcallFunc
push 7
push 6
push 5
push 4
push 3
push 2
push 1
call ?cdeclFunc@@YAHHHHHHHH@Z ; cdeclFunc
add esp, 28 ; 0000001cH
xor eax, eax
pop ebp
ret 0
_main ENDP
The first 4 arguments are, as expected, passed via registers in x64.
The remaining arguments are put on the stack in the same order as in x86.
Contrary to x86, in x64 we don't use push instructions. Instead we reserve enough stack space at the beginning of main and use mov instructions to add the arguments to the stack.
In x64, no stack cleanup is happening after both calls, but at the end of main.
This brings me to my questions:
Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.
Why is there no stack cleanup after the call instructions in x64?
What's the reason that Microsoft chose to ignore __stdcall in x64 assembly?
From the docs:
On ARM and x64 processors, __stdcall is accepted and ignored by the compiler
Here is the example code and assembly.
Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.
That is not the reason. Both of these instructions also exist in x86 assembly language.
The reason why your compiler is not emitting a push instruction for the x64 code is probably that it must adjust the stack pointer directly anyway, in order to create 32 bytes of "shadow space" for the called function. See this link (which was provided by @NateEldredge) for further information on "shadow space".
Allocating 32 bytes of "shadow space" with push instructions would take 4 64-bit push instructions, but only one sub instruction. That is why it prefers to use the sub instruction. Since it is using the sub instruction anyway to create 32 bytes of shadow space, there is no penalty for changing the operand of the sub instruction from 32 to 72, which allocates 72 bytes of memory on the stack. That is enough to also pass 3 parameters on the stack (the other 4 are passed in CPU registers).
I don't understand why it is allocating 72 bytes on the stack, though, as, according to my calculations, it only has to be 56 bytes (32 bytes of "shadow space" and 24 bytes for the 3 parameters that are passed on the stack). Possibly, the compiler is reserving those extra 16 bytes for local variables or for exception handling, which may be optimized away when compiler optimizations are active.
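The arithmetic behind the 56-byte figure can be sketched in C++. This is only an illustration: the helper names are made up for this example, and it assumes main pushes no registers before the sub rsp instruction, which matches the listing above.

```cpp
#include <cassert>

// Bytes a caller must reserve for one call under the Windows x64 convention:
// 32 bytes of shadow space plus 8 bytes per argument beyond the fourth.
constexpr int outgoing_arg_bytes(int n_args) {
    return 32 + 8 * (n_args > 4 ? n_args - 4 : 0);
}

// On entry to a function, rsp is 8 bytes off a 16-byte boundary because the
// call instruction pushed a return address. A frame keeps rsp 16-byte aligned
// at the next call only if (8 + frame_bytes) is a multiple of 16.
constexpr bool keeps_call_alignment(int frame_bytes) {
    return (8 + frame_bytes) % 16 == 0;
}

// Checks the values discussed above (hypothetical helper for illustration).
void check_main_frame() {
    assert(outgoing_arg_bytes(7) == 56);  // 32 shadow + 3 * 8 stack arguments
    assert(keeps_call_alignment(56));     // 56 bytes would already be aligned
    assert(keeps_call_alignment(72));     // the emitted 72 bytes is aligned too
}
```

Under these assumptions both 56 and 72 satisfy the alignment rule, so alignment alone does not explain the extra 16 bytes.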
Why is there no stack cleanup after the call instructions in x64?
There is stack cleanup after the call instructions. This is what the line
add rsp, 72
does.
However, for some reason (probably increased performance), the x64 compiler only performs the cleanup at the end of the calling function, instead of after every function call. This means that with the x64 compiler, all function calls share the same stack space for their parameters, whereas with the x86 compiler, the stack space is allocated and cleaned up at every function call.
What's the reason that Microsoft chose to ignore __stdcall in x64 assembly?
The keywords __stdcall and __cdecl specify 32-bit calling conventions. That's why they are not relevant for 64-bit programs (i.e. x64). On x64, there is only the standard calling convention and the extended __vectorcall calling convention.
Related
So, I have a task where I need to convert a string, for example "542215", to the int 542215, or do the same with any other ASCII symbols. I am kind of new to assembly programming, so I have almost no clue what I am doing with it, but I wrote my experimental code.
int main(int argc, char** argv)
{
int Letters;
char* argv1 = argv[1];
if (argc < 2) {
printf("Parameter not provided*/\n");
return(0);
}
__asm {
push eax
push ebx
push ecx
xor ecx, ecx
mov eax, argv1
NextChar:
movzx eax, byte ptr [esi + ecx]
test eax, eax
jz Done
push ecx
sub eax, 48
push eax
call PrintIt
pop ecx
inc ecx
jmp NextChar
Done:
pop ecx
pop ebx
pop eax
PrintIt:
mov[Letters], eax
pop ecx
pop ebx
pop eax
};
printf("Count of letters in string %s is %d\n", argv[1], Letters);
return(0);
}
and I am getting the error "Run-time check failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention."
This error basically gives me no idea what is going on, or what ESP even is, so any help with my code would be appreciated.
You don't need to push/pop registers around your whole asm statement; MSVC already notices which registers you modify.
You're popping the registers twice at the end, when execution falls through from Done: to PrintIt. Or even worse, on strings of non-zero length, you're leaving the asm statement via a path of execution that does 3 pushes at the start, push ecx, push eax, call PrintIt, and then 3 pops. So one of your register-restores is actually reloading the return address pushed by call, and leaving 3 pushed values on the stack.
If you don't know that ESP is the stack pointer, one of x86's 8 general-purpose integer registers, you need to continue reading an intro to basic x86 asm before using stack instructions like push and call.
You should also read up on the fact that labels aren't the same thing as functions, and that execution simply continues to the next instruction.
And BTW, string->int will involve a multiply by 10. See NASM Assembly convert input to integer? for the standard algorithm and an asm implementation.
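The multiply-by-10 algorithm mentioned above can be sketched in C++ before translating it to assembly. The function name parse_decimal is made up for this illustration; the sketch stops at the first non-digit character and does not handle overflow or a sign.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a run of ASCII digits to an unsigned value.
// For each digit: total = total * 10 + (character - '0').
std::uint32_t parse_decimal(const char* s) {
    std::uint32_t total = 0;
    for (; *s >= '0' && *s <= '9'; ++s) {
        total = total * 10 + static_cast<std::uint32_t>(*s - '0');
    }
    return total;
}

// Small self-check using the string from the question.
void parse_decimal_example() {
    assert(parse_decimal("542215") == 542215u);
}
```

In assembly the loop body maps to one multiply (or an lea/add pair) and one add per character, with the running total kept in a register.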
I am searching for a way to define variables in c++ inline assembly. I found an interesting way to do it. But it confuses me, how this can work.
__asm
{
push ebp
mov ebp, esp
add esp, 4
mov [ebp - 4], 2
mov esp, ebp
pop ebp
}
I read this code as: push the base pointer address onto the stack, then move the stack pointer address into the base pointer (logically the stack should collapse here, because this looks like the common function epilogue that cleans up the stack). Then we add 4 to the esp address (not even the value), and then remove that 4 from esp again, so we get back to the same esp address. The strange thing for me is that it even compiles, and it works. But when I try to test it by outputting the value
uint32_t output;
__asm
{
push ebp
mov ebp, esp
add esp, 4
mov [ebp - 4], 2
mov output,[ebp-4]
mov esp, ebp
pop ebp
}
std::cout << output;
It does not compile, showing "Operand size conflict", which seems weird to me, because I use 32 bit integer and register is also 32 bit. When using [ebp-4] without [], it gives garbage values, as expected.
So, maybe someone could explain how this works without giving error :)
And one additional question: why does db not work in inline assembly?
It doesn't work, that doesn't define a C++ variable.
It just messes with the stack to reserve some new storage below the stack frame created by the compiler. And you modify EBP so compiler-generated addressing modes that use EBP will be broken (see footnote 1).
If you want to define or declare a C++ variable, do it with C++ syntax like int tmp.
asm doesn't really have variables. It has registers and memory. Keep track of where values are using comments. If you want to use some extra stack space from MSVC inline asm, I think that's safe, but don't modify EBP if you also want to reference C++ local variables.
Footnote 1:
That would be the case if your code assembled at all, which it won't because mov output,[ebp-4] has 2 explicit memory operands. MSVC inline asm can't allocate C++ variables in registers.
Also mov [ebp - 4], 2 has ambiguous operand-size: neither operand has a size associated with it because neither is a register. Maybe you want mov dword ptr [ebp - 4], 2
I have a function which takes 3 arguments, dest, src0, src1, each a pointer to data of size 12. I made two versions. One is written in C and optimized by the compiler, the other one is fully written in _asm. So yeah. 3 arguments? I naturally do something like:
mov ecx, [src0]
mov edx, [src1]
mov eax, [dest]
I am a bit confused by the compiler, as it saw fit to add the following:
_src0$ = -8 ; size = 4
_dest$ = -4 ; size = 4
_src1$ = 8 ; size = 4
?vm_vec_add_scalar_asm@@YAXPAUvec3d@@PBU1@1@Z PROC ; vm_vec_add_scalar_asm
; _dest$ = ecx
; _src0$ = edx
; 20 : {
sub esp, 8
mov DWORD PTR _src0$[esp+8], edx
mov DWORD PTR _dest$[esp+8], ecx
; 21 : _asm
; 22 : {
; 23 : mov ecx, [src0]
mov ecx, DWORD PTR _src0$[esp+8]
; 24 : mov edx, [src1]
mov edx, DWORD PTR _src1$[esp+4]
; 25 : mov eax, [dest]
mov eax, DWORD PTR _dest$[esp+8]
Function body etc.
add esp, 8
ret 0
What does the _src0$[esp+8] etc. even mean? Why does it do all this stuff before my code? Why does it [apparently] try so hard to put everything on the stack?
In comparison, the C++ version has only the following before its body, which is pretty similar:
_src1$ = 8 ; size = 4
?vm_vec_add@@YAXPAUvec3d@@PBU1@1@Z PROC ; vm_vec_add
; _dest$ = ecx
; _src0$ = edx
mov eax, DWORD PTR _src1$[esp-4]
Why is this little sufficient?
The answer of Mats Petersson explained __fastcall. But I guess that is not exactly what you're asking ...
Actually _src0$[esp+8] just means [_src0$ + esp + 8], and _src0$ is defined above:
_src0$ = -8 ; size = 4
So, the whole expression _src0$[esp+8] is nothing but [esp] ...
To see why it does all this stuff, you should probably first understand what Mats Petersson said in his post about __fastcall, or more generally, what a calling convention is. See the link in his post for detailed information.
Assuming that you have understood __fastcall, now let's see what happens to your code. The compiler is using __fastcall. Your callee function is f(dst, src0, src1), which requires 3 parameters, so according to the calling convention, when a caller calls f, it does the following:
Move dst to ecx and src0 to edx
Push src1 onto the stack
Push the 4 bytes return address onto the stack
Go to the starting address of the function f
And the callee f, when its code begins, then knows where the parameters are: dst and src0 are in the registers ecx and edx, respectively; esp is pointing to the 4-byte return address, and the 4 bytes right after it in memory (i.e. DWORD PTR [esp+4]) hold exactly src1.
So, in your "C++ version", the function f just does what it should do:
mov eax, DWORD PTR _src1$[esp-4]
Here _src1$ = 8, so _src1$[esp-4] is exactly [esp+4]. See, it just retrieves the parameter src1 and stores it in eax.
There is however a tricky point here. In the code of f, if you want to use the parameter src1 multiple times, you can certainly do that, because it's always stored on the stack, right next to the return address; but what if you want to use dst and src0 multiple times? They are in the registers, and can be destroyed at any time.
So in that case, the compiler should do the following: right after entering the function f, it should remember the current values of ecx and edx (by storing them on the stack). These 8 bytes play a role similar to the so-called "shadow space" of the x64 calling convention. It is not done in your "C++ version", probably because the compiler knows for sure that these two parameters will not be used multiple times, or that it can handle it properly some other way.
Now, what happens to your _asm version? The problem here is that you are using inline assembly. The compiler then loses its control to the registers, and it cannot assume that the registers ecx and edx are safe in your _asm block (they are actually not, since you used them in the _asm block). Thus it is forced to save them at the beginning of the function.
The saving goes as follows: it first reserves 8 bytes on the stack by lowering esp (sub esp, 8), then moves edx and ecx to [esp] and [esp+4] respectively.
And then it can enter safely your _asm block. Now in its mind (if it has one), the picture is that [esp] is src0, [esp+4] is dst, [esp+8] is the 4 byte return address, and [esp+12] is src1. It no longer thinks about ecx and edx.
Thus your first instruction in the _asm block, mov ecx, [src0], should be interpreted as mov ecx, [esp], which is the same as
mov ecx, DWORD PTR _src0$[esp+8]
and the same for the other two instructions.
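The offset arithmetic in the listing can be checked with a few lines of C++. This is just a sketch: effective_offset is a made-up helper, and the symbol values are copied from the listing in the question.

```cpp
#include <cassert>

// In the MASM listing, sym[reg+disp] means [reg + sym + disp].
// Returns the resulting byte offset from the register.
constexpr int effective_offset(int symbol_value, int displacement) {
    return symbol_value + displacement;
}

// Symbol values as printed at the top of the _asm version's listing.
constexpr int src0_sym = -8;
constexpr int dest_sym = -4;
constexpr int src1_sym = 8;

void check_inline_asm_frame() {
    assert(effective_offset(src0_sym, 8) == 0);   // _src0$[esp+8] -> [esp]
    assert(effective_offset(dest_sym, 8) == 4);   // _dest$[esp+8] -> [esp+4]
    assert(effective_offset(src1_sym, 4) == 12);  // _src1$[esp+4] -> [esp+12]
    // The C++ version has no sub esp, 8, so src1 sits right after the
    // return address: _src1$[esp-4] -> [esp+4].
    assert(effective_offset(src1_sym, -4) == 4);
}
```

The results match the picture described above: [esp] is src0, [esp+4] is dst, and [esp+12] is src1 once the 8 bytes have been reserved.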
At this point, you might say, aha it's doing stupid things, I don't want it to waste time and space on that, is there a way?
Well there is a way - do not use inline assembly... it's convenient, but there is a compromise.
You can write the assembly function f in a .asm source file and make it PUBLIC. In the C/C++ code, declare it as extern "C" f(...). Then, when you begin your assembly function f, you can work directly with ecx and edx.
The compiler has decided to use a calling convention that uses "pass arguments in registers" aka __fastcall. This allows the compiler to pass some of the arguments in registers, instead of pushing onto stack, and this can reduce the overhead in the call, because moving from a variable to a register is faster than pushing onto the stack, and it's now already in a register when we get to the callee function, so no need to read it from the stack.
There is a lot more information about how calling conventions work on the web. The wikipedia article on x86 calling conventions is a good starting point.
I have a function with the signature :
extern "C" int foo(int a, int b, int c, int d, int e);
which is in fact written in assembly.
With ml (32 bits), using the standard calling convention, you can pretty much write
.code
foo PROC a: DWORD, b: DWORD ,c: DWORD, d: DWORD, e: DWORD
mov eax, d
mov ebx, e
and start using those labels to access your arguments
With ml64 (64 bits), fastcall is the only convention available. I have no trouble accessing the first arguments stored in the registers, but I have issues accessing the ones on the stack (e in this example). I tried
.code
foo PROC a: DWORD, b: DWORD ,c: DWORD, d: DWORD, e: DWORD
and
.code
foo PROC e: DWORD
but the value in e is garbage.
I found that if I use the stack address directly I find the value.
.code
foo PROC e: DWORD
mov eax, r9d ; d
mov ebx, DWORD PTR[rbp + 48] ; e
Is there another way?
The documentation explains everything... On Windows, the first four integer parameters are passed in registers RCX, RDX, R8, R9 and floating-point ones in XMM0, XMM1, XMM2, XMM3; any parameters beyond the first four are passed on the stack above the shadow space. For Unix-type OSes it is a bit different.
So, your example is correct - mov ebx, DWORD PTR[rbp + 48] ; e
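After the prologue push rbp / mov rbp, rsp: saved rbp (8 bytes) + return address (8 bytes) + shadow space (32 bytes) = 48, so the 5th parameter is at [rbp + 48].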
given
extern "C" int foo(int a, int b, int c, int d, int e);
I found out that visual studio 2010 doesn't save the base pointer RBP if
.code
foo PROC
but save the base pointer if
.code
foo PROC e: DWORD
Later versions (vs2015) don't allow the second code.
There is an optional optimization in x64 systems where RBP is not used (found out the hard way). It says :
The conventional use of %rbp as a frame pointer for the stack frame
may be avoided by using %rsp (the stack pointer) to index into the
stack frame. This technique saves two instructions in the prologue and
epilogue and makes one additional general-purpose register (%rbp)
available.
So it is possible that either foo PROC e: DWORD doesn't compile (vs2015), or foo PROC crashes because RBP is null.
The correct way to retrieve stack arguments is to use the RSP stack pointer given that
RBP = RSP + 8 * num_saved_reg
where num_saved_reg is the number of registers listed in the USES clause of the PROC directive. So, when rbp is not saved (otherwise add 8):
PROC -> DWORD PTR[rsp + 40]
PROC USES RDI -> DWORD PTR[rsp + 40 + 8]
PROC USES RDI RSI RBX -> DWORD PTR[rsp + 40 + 24]
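The offsets listed above follow one formula, sketched here in C++. The function name is hypothetical; it assumes the Windows x64 convention, 1-based argument numbers of 5 or higher, and that the only stack adjustment so far is the pushes generated for the saved registers.

```cpp
#include <cassert>

// Byte offset from rsp to stack argument number arg_number (5, 6, ...)
// after pushed_regs registers have been pushed in the prologue.
constexpr int stack_arg_offset_from_rsp(int arg_number, int pushed_regs) {
    return 8                        // return address pushed by call
         + 32                       // shadow space for the 4 register arguments
         + 8 * (arg_number - 5)     // earlier stack arguments
         + 8 * pushed_regs;         // registers saved by the prologue
}

void check_fifth_argument_offsets() {
    assert(stack_arg_offset_from_rsp(5, 0) == 40);  // PROC
    assert(stack_arg_offset_from_rsp(5, 1) == 48);  // PROC USES RDI
    assert(stack_arg_offset_from_rsp(5, 3) == 64);  // PROC USES RDI RSI RBX
}
```

The one-push case also covers a prologue of push rbp followed by mov rbp, rsp, which is where the [rbp + 48] address for e comes from.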
I have been learning IA-32 assembly programming. So I would like to write a function in assembly and call it from C++.
The tutorial I am following is actually for x64 assembly. But I am working on IA-32. In x64, it says function arguments are stored in registers like RCX, RDX, R8, R9 etc.
But on searching a little bit, I could understand in IA-32, arguments are stored in stack, not in registers.
Below is my C++ code :
#include <iostream>
#include <conio.h>
using namespace std;
extern "C" int PassParam(int a,int b);
int main()
{
cout << "z is " << PassParam(15,13) << endl;
_getch();
return 0;
}
Below is assembly code for PassParam() function (it just add two arguments, that's all. It is only for learning purpose) :
PassParam() in assembly :
.model C,flat
.code
PassParam proc
mov eax,[ebp-212]
add eax,[ebp-216]
ret
PassParam endp
end
In my assembly code, you can see I moved first argument from [ebp-212] to eax. That value is obtained as follows :
I wrote the PassParam() function in C++ itself and disassembled it. Then I checked where ebp is and where the second argument is stored (arguments are stored from right to left). I could see there is a difference of 212, so that is how I got that value. Then, as usual, the first argument is stored 4 bytes later. And it works fine.
Question :
Is this the correct method to access arguments from assembly ? I mean, is it always [ebp-212] where argument stored?
If not, can anyone explain the correct method to pass arguments from C++ to assembly ?
Note :
I am working with Visual C++ 2010, on Windows 7 machine.
On 32bit architectures, it depends on the calling convention, Windows for example has both __fastcall and __thiscall that use register and stack args, and __cdecl and __stdcall that use stack args but differ in who does the cleanup. MSDN has a nice listing here (or the more assembly orientated version). Note that FPU/SSE operations also have their own conventions.
For ease and simplicity, try to use __stdcall for everything; this allows you to use stack frames to access args via MOV r32,[EBP+4+(arg_index * 4)] (with arg_index starting at 1), or if you aren't using stack frames, you can use MOV r32,[ESP+local_stack_offset+(arg_index * 4)]. The annotated C++ -> x86 Assembly example here should be of help.
So as a simple example, lets say we have the function MulAdd in assembly, with the C++ prototype int __stdcall MulAdd(int base, int mul, int add), it would look something like:
MOV EAX,[ESP+4] //get the first arg('base') off the stack
MOV ECX,[ESP+8] //get the second arg('mul') off the stack
IMUL EAX,ECX //base * mul
MOV ECX,[ESP+12] //get arg 3 off the stack
ADD EAX,ECX
RETN 12 //cleanup the 3 args and return
Or if you use a stack frame:
PUSH EBP
MOV EBP,ESP //save the stack
MOV EAX,[EBP+8] //get the first arg('base') off the stack
MOV ECX,[EBP+12] //get the second arg('mul') off the stack
IMUL EAX,ECX //base * mul
MOV ECX,[EBP+16] //get arg 3 off the stack
ADD EAX,ECX
MOV ESP,EBP //restore the stack
POP EBP
RETN 12 //cleanup the 3 args and return
Using the stack frame avoids needing to adjust for changes made to the stack by PUSH'ing of args, spilling of registers or stack allocations made for local variables. Its downside is that it reduces the number of registers you have to work with.
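To make the [ESP+4], [ESP+8], [ESP+12] offsets concrete, here is a small C++ model of the frameless MulAdd version. It is only a simulation for illustration: the stack is a plain byte buffer, the return address is a placeholder zero, and all function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value at a byte offset from a simulated stack pointer.
std::int32_t load32(const unsigned char* esp, int offset) {
    std::int32_t value;
    std::memcpy(&value, esp + offset, sizeof value);
    return value;
}

// Mirrors the frameless assembly: args at [ESP+4], [ESP+8], [ESP+12].
std::int32_t mul_add_from_stack(const unsigned char* esp) {
    std::int32_t eax = load32(esp, 4);   // MOV EAX,[ESP+4]   ('base')
    std::int32_t ecx = load32(esp, 8);   // MOV ECX,[ESP+8]   ('mul')
    eax *= ecx;                          // IMUL EAX,ECX
    ecx = load32(esp, 12);               // MOV ECX,[ESP+12]  ('add')
    return eax + ecx;                    // ADD EAX,ECX
}

// Builds the stack image seen by the callee on entry: the return address at
// [ESP], then the arguments in order (they were pushed right to left).
std::int32_t call_mul_add(std::int32_t base, std::int32_t mul, std::int32_t add) {
    const std::int32_t image[4] = {0 /* return address placeholder */, base, mul, add};
    unsigned char bytes[sizeof image];
    std::memcpy(bytes, image, sizeof image);
    return mul_add_from_stack(bytes);
}

void mul_add_example() {
    assert(call_mul_add(3, 4, 5) == 17);  // 3 * 4 + 5
}
```

The 12 in RETN 12 corresponds to the three 4-byte argument slots in this image; the return address slot is consumed by the return itself.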