MSVC Assembly function arguments C++ vs _asm

MSVC Assembly function arguments C++ vs _asm - c++

I have a function which takes 3 arguments, dest, src0, src1, each a pointer to data of size 12. I made two versions. One is written in C and optimized by the compiler, the other one is fully written in _asm. So yeah. 3 arguments? I naturally do something like:
mov ecx, [src0]
mov edx, [src1]
mov eax, [dest]
I am a bit confused by the compiler, as it saw fit to add the following:
_src0$ = -8 ; size = 4
_dest$ = -4 ; size = 4
_src1$ = 8 ; size = 4
?vm_vec_add_scalar_asm##YAXPAUvec3d##PBU1#1#Z PROC ; vm_vec_add_scalar_asm
; _dest$ = ecx
; _src0$ = edx
; 20 : {
sub esp, 8
mov DWORD PTR _src0$[esp+8], edx
mov DWORD PTR _dest$[esp+8], ecx
; 21 : _asm
; 22 : {
; 23 : mov ecx, [src0]
mov ecx, DWORD PTR _src0$[esp+8]
; 24 : mov edx, [src1]
mov edx, DWORD PTR _src1$[esp+4]
; 25 : mov eax, [dest]
mov eax, DWORD PTR _dest$[esp+8]
Function body etc.
add esp, 8
ret 0
What does the _src0$[esp+8] etc. even means? Why does it do all this stuff before my code? Why does it try to [apparently]stack anything so badly?
In comparison, the C++ version has only the following before its body, which is pretty similar:
_src1$ = 8 ; size = 4
?vm_vec_add##YAXPAUvec3d##PBU1#1#Z PROC ; vm_vec_add
; _dest$ = ecx
; _src0$ = edx
mov eax, DWORD PTR _src1$[esp-4]
Why is this little sufficient?

The answer of Mats Petersson explained __fastcall. But I guess that is not exactly what you're asking ...
Actually _src0$[esp+8] just means [_src0$ + esp + 8], and _src0$ is defined above:
_src0$ = -8 ; size = 4
So, the whole expression _src0$[esp+8] is nothing but [esp] ...
To see why it does all these stuff, you should probably first understand what Mats Petersson said in his post, the __fastcall, or more generally, what is a calling convention. See the link in his post for detailed informations.
Assuming that you have understood __fastcall, now let's see what happens to your codes. The compiler is using __fastcall. Your callee function is f(dst, src0, src1), which requires 3 parameters, so according to the calling convention, when a caller calls f, it does the following:
Move dst to ecx and src0 to edx
Push src1 onto the stack
Push the 4 bytes return address onto the stack
Go to the starting address of the function f
And the callee f, when its code begins, then knows where the parameters are: dst and src0 are in the registers ecx and edx, respectively; esp is pointing to the 4 bytes return address, but the 4 bytes below it (i.e. DWORD PTR[esp+4]) is exactly src1.
So, in your "C++ version", the function f just does what it should do:
mov eax, DWORD PTR _src1$[esp-4]
Here _src1$ = 8, so _src1$[esp-4] is exactly [esp+4]. See, it just retrieves the parameter src1 and stores it in eax.
There is however a tricky point here. In the code of f, if you want to use the parameter src1 multiple times, you can certainly do that, because it's always stored in the stack, right below the return address; but what if you want to use dst and src0 multiple times? They are in the registers, and can be destroyed at any time.
So in that case, the compiler should do the following: right after entering the function f, it should remember the current values of ecx and edx (by pushing them onto the stack). These 8 bytes are the so-called "shadow space". It is not done in your "C++ version", probably because the compiler knows for sure that these two parameters will not be used multiple times, or that it can handle it properly some other way.
Now, what happens to your _asm version? The problem here is that you are using inline assembly. The compiler then loses its control to the registers, and it cannot assume that the registers ecx and edx are safe in your _asm block (they are actually not, since you used them in the _asm block). Thus it is forced to save them at the beginning of the function.
The saving goes as follows: it first raises esp by 8 bytes (sub esp, 8), then move edx and ecx to [esp] and [esp+4] respectively.
And then it can enter safely your _asm block. Now in its mind (if it has one), the picture is that [esp] is src0, [esp+4] is dst, [esp+8] is the 4 byte return address, and [esp+12] is src1. It no longer thinks about ecx and edx.
Thus your first instruction in the _asm block, mov ecx, [src0], should be interpreted as mov ecx, [esp], which is the same as
mov ecx, DWORD PTR _src0$[esp+8]
and the same for the other two instructions.
At this point, you might say, aha it's doing stupid things, I don't want it to waste time and space on that, is there a way?
Well there is a way - do not use inline assembly... it's convenient, but there is a compromise.
You can write the assembly function f in a .asm source file and public it. In the C/C++ code, declare it as extern 'C' f(...). Then, when you begin your assembly function f, you can play directly with your ecx and edx.

The compiler has decided to use a calling convention that uses "pass arguments in registers" aka __fastcall. This allows the compiler to pass some of the arguments in registers, instead of pushing onto stack, and this can reduce the overhead in the call, because moving from a variable to a register is faster than pushing onto the stack, and it's now already in a register when we get to the callee function, so no need to read it from the stack.
There is a lot more information about how calling conventions work on the web. The wikipedia article on x86 calling conventions is a good starting point.

Related

Why are parameters allocated below the frame pointer instead of above?

I have tried to understand this basing on a square function in c++ at godbolt.org . Clearly, return, parameters and local variables use “rbp - alignment” for this function.
Could someone please explain how this is possible?
What then would rbp + alignment do in this case?
int square(int num){
int n = 5;// just to test how locals are treated with frame pointer
return num * num;
}
Compiler (x86-64 gcc 11.1)
Generated Assembly:
square(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi. ;\\Both param and local var use rbp-*
mov DWORD PTR[rbp-4], 5. ;//
mov eax, DWORD PTR [rbp-20]
imul eax, eax
pop rbp
ret

This is one of those cases where it’s handy to distinguish between parameters and arguments. In short: arguments are the values given by the caller, while parameters are the variables holding them.
When square is called, the caller places the argument in the rdi register, in accordance with the standard x86-64 calling convention. square then allocates a local variable, the parameter, and places the argument in the parameter. This allows the parameter to be used like any other variable: be read, written into, having its address taken, and so on. Since in this case it’s the callee that allocated the memory for the parameter, it necessarily has to reside below the frame pointer.
With an ABI where arguments are passed on the stack, the callee would be able to reuse the stack slot containing the argument as the parameter. This is exactly what happens on x86-32 (pass -m32 to see yourself):
square(int): # #square(int)
push ebp
mov ebp, esp
push eax
mov eax, dword ptr [ebp + 8]
mov dword ptr [ebp - 4], 5
mov eax, dword ptr [ebp + 8]
imul eax, dword ptr [ebp + 8]
add esp, 4
pop ebp
ret
Of course, if you enabled optimisations, the compiler would not bother with allocating a parameter on the stack in the callee; it would just use the value in the register directly:
square(int): # #square(int)
mov eax, edi
imul eax, edi
ret

GCC allows "leaf" functions, those that don't call other functions, to not bother creating a stack frame. The free stack is fair game to do so as these fns wish.

How to convert string to int in assembly MSVC

So, I have a task that I need to convert from string for example string "542215" to int 542215 or with any other ASCII symbols. I am kindof new to assembly programming so, I have almost no clue to what I am doing with it, but I wrote my experimental code.
int main(int argc, char** argv)
{
int Letters;
char* argv1 = argv[1];
if (argc < 2) {
printf("Nepateiktas parametras*/\n");
return(0);
}
__asm {
push eax
push ebx
push ecx
xor ecx, ecx
mov eax, argv1
NextChar:
movzx eax, byte ptr [esi + ecx]
test eax, eax
jz Done
push ecx
sub eax, 48
push eax
call PrintIt
pop ecx
inc ecx
jmp NextChar
Done:
pop ecx
pop ebx
pop eax
PrintIt:
mov[Letters], eax
pop ecx
pop ebx
pop eax
};
printf("Count of letters in string %s is %d\n", argv[1], Letters);
return(0);
}
and I am getting error that "Run-time check failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
This error basicly gives me 0 ideas what is going on, or what even ESP is, so any help would be appreciated with my code.

you don't need to push/pop registers around your whole asm statement; MSVC already notices which registers you modify.
You're popping the registers twice at the end, when execution falls through from Done: to PrintIt. Or even worse, on strings of non-zero length, you're leaving the asm statement via a path of execution that does 3 pushes at the start, push ecx, push eax, call PrintInt, and then 3 pops. So one of your register-restores is actually reloading the return address pushed by call, and leaving 3 pushed values on the stack.
If you don't know that ESP is the stack pointer, one of x86's 8 general-purpose integer registers, you need to continue reading an intro to basic x86 asm before using stack instructions like push and call.
Also something about the fact that labels aren't the same thing as functions, and how execution continues to the next instruction.
And BTW, int->string will involve a multiply by 10. See NASM Assembly convert input to integer? for the standard algorithm and an asm implementation.

Decoding assembly from MSVC 32-bit release (homework). What does no-op do?

Hi heads up this is a homework. I'm given an assembly generated by MSVC 32-bit Release with optimizations on, and I'm supposed to decode it back into C++. I've included the top of the function to the line I'm having problems with. The comments are mine, which I'm wrote while trying to understand this.
Note: Code is supposedly generated from C++. Not traditional ASM.
Note 2: There is one area of undefined behavior in the code.
Here are the lines I'm stuck with
TheFunction: ; TheFunction(int* a, int s);
0F2D4670 push ebp ; Push/clear/save ebp
0F2D4671 mov ebp,esp ; ebp now points to top of stack
0F2D4673 push ecx ; Push/clear/save ecx
0F2D4674 push ebx ; Push/clear/save ebs
0F2D4675 push esi ; Push/clear/save esi
0F2D4676 mov ebx,edx ; ebx = int s
0F2D4678 mov esi,1 ; esi = 1
0F2D467D push edi ; calling convention ; Push/clear/save edi
0F2D467E mov edi,dword ptr [a (0F2D95E8h)] ; edi = a[0]
0F2D4684 cmp ebx,esi ; if(s < 1)
0F2D4686 jl SomeFunction+3Ch (0F2D46ACh) ; Jump to return
0F2D4688 nop dword ptr [eax+eax] ; !! <-- No op involving dereferencing? What does this do?
0F2D4690 mov eax,dword ptr [edi+esi*4-4] ; !! <-- edi is *a, while esi is 1. There is no address
here!
..... More code but I've figured these out ....
I've more or less got the gist of the function. Its a function that takes a pointer to an int, with an underlying array, and a size. It then goes through each element in the array from last to first, adding to each subsequent one and printing it out. However, I still haven't got the details down and need help
Two questions, both at the end of the code snippet. What does no op on a dereference pointer do, and am I reading the last line in that its attempting to dereference something not in memory?

The nop dword ptr [eax+eax] instruciton does nothing. It doesn't even access the memory location given by the operand. It literally performs no operation.
It's just there so the next instruction is aligned to a 16-byte boundary. You'll notice that next instruction address is 0F2D4690 which ends with 0 which means it's 16-byte aligned. This can improve the performance of loops. Somewhere there will be an instruction that jumps back to 0F2D4690 as part of a loop. This particular form of a NOP instruction is used because it encodes a single NOP instruction in 8 bytes.
There is no corresponding C++ code for this instruction. You shouldn't try to represent it in your C++ code, just ignore it.
Also note that your comment for mov edi,dword ptr [a (0F2D95E8h)] is incorrect. Instead of being edi = a[0] it's simply edi = a. The variable a isn't a parameter at all, instead it's a global (or file level static) variable located at memory location 0F2D95E8h. This instruction just loads the value from memory.

Defining variables in c++ inline assembly

I am searching for a way to define variables in c++ inline assembly. I found an interesting way to do it. But it confuses me, how this can work.
__asm
{
push ebp
mov ebp, esp
add esp, 4
mov [ebp - 4], 2
mov esp, ebp
pop ebp
}
I see this code as - Push base pointer address to the stack, move stack pointer address into base pointer's (Stack logically should collapse here, because this is common epilogue function of cleaning the stack). Then we move 4 to the esp address (Not even the value) And then remove that 4 from esp. So we get back to the same esp address. Strange fact for me is that, it even compiles, and it works. But when I try to test it by outputting the value
uint32_t output;
__asm
{
push ebp
mov ebp, esp
add esp, 4
mov [ebp - 4], 2
mov output,[ebp-4]
mov esp, ebp
pop ebp
}
std::cout << output;
It does not compile, showing "Operand size conflict", which seems weird to me, because I use 32 bit integer and register is also 32 bit. When using [ebp-4] without [], it gives garbage values, as expected.
So, maybe someone could explain how this works without giving error :)
And one additional question, why does db does not work in inline assembly?

It doesn't work, that doesn't define a C++ variable.
It just messes with the stack to reserve some new storage below the stack frame created by the compiler. And you modify EBP so compiler-generated addressing modes that use EBP will be broken.1
If you want to define or declare a C++ variable, do it with C++ syntax like int tmp.
asm doesn't really have variables. It has registers and memory. Keep track of where values are using comments. If you want to use some extra stack space from MSVC inline asm, I think that's safe, but don't modify EBP if you also want to reference C++ local variables.
Footnote 1:
That would be the case if your code assembled at all, which it won't because mov output,[ebp-4] has 2 explicit memory operands. MSVC inline asm can't allocate C++ variables in register.
Also mov [ebp - 4], 2 has ambiguous operand-size: neither operand has a size associated with it because neither is a register. Maybe you want mov dword ptr [ebp - 4], 2

Proper way move register values in assembly from one location to another

I'm currently having some unexplained register issues which I can't figure my guess is I'm moving registers improperly from one register to another one.
I'm trying to get the value of EDX into a test.myEDX which mostly always puts the wrong value of EDX into test.myEDX, but part of that EDX value seems right which is very strange some kind of Hi/Lo Bits issues I assume.
Let me explain what I'm trying to do.
First I hook a location in memory which contains some arbitrary assembly code.
I hook it by placing a hardware breakpoint at that location then I can see all the register values at that location in time.
Now I made a simple program that everytime the cpu goes into that breakpointed location I put the EDX value into a struct in C++.
As far as i am aware nothing should modify the EDX register while I am putting it into the struct unless putting values into the struct modifies the EDX which I tested isn't the case, I tested by moving the EDX first into EAX then passing EAX into struct which should have the same value as EDX and it's still wrong maybe EAX is used as well when putting data into the struct? I didn't go further then that in testing all registers to find which one isn't used, doubt that's even the problem anyways.
Another thing to take into consideration to actually do any operations like putting EDX into the struct I have to make in C++ the current EIP go into my naked function, I know from previously doing this stuff that naked functions don't modify any registers at all, they are simply used as a way to extend the current asm code at that EIP into much larger asm code without any trash that C++ would add when going into subroutines.
I also use PUSHAD and PUSHFD to restore previously set register values when I finish dumping EDX into struct it's all in a safe environment (as long as I don't use PUSH/POP's incorrectly of course) before I call POPFD then POPAD
I barely know any ASM but from looking at many examples I never see registers moved like this. (which obviously wouldn't make any sense to duplicate a register, but even the same register moved into 2 different addresses one after the one I haven't seen that).
MOV EBX, EDX
MOV ECX, EDX
But in fact I see something like this (which I thought was my problem why it wasn't working), (it wasn't, I am still clueless what I am doing wrong)
MOV EBX, EDX
MOV EAX, EDX //Theory: in order to move EDX a second time into ECX, I must not move EDX directly into ECX.
MOV ECX, EAX
I don't see any difference except now EAX is lost, but I still figured I have to re-route every multiple moving of the same register to multiple locations by moving it over and over before actually moving to the original location, silly but that's what I thought you had to do I still think this is the proper way to this job.
Either way this still hasn't helped me I still get wrong values.
I think either those option's AL BL registers screw up EDX or maybe TEST or JE messes it up I can't really tell, I pasted this code into OllyDBG and seems like nothing modified EDX mystery why EDX is wrong, Maybe the address updates it's value too slow? since it's based on RAM speed which doesn't sync up with CPU speed (all stupid theories of course).
Anyways that's about everything I can explain here is the code.
struct TestStruct {
int myEDX;
int mySetEDX;
} test;
extern bool optionOne = false;
extern bool optionTwo = false;
DWORD gotoDumpBackAddress = 0x40012345;
void __declspec( naked ) dump(void) {
__asm {
PUSHAD
PUSHFD
XOR EAX, EAX //zero EAX, used below as AL (optionOne)
XOR EBX, EBX //zero EBX, used below as BL (optionTwo)
MOV AL, optionOne //optionOne set
MOV BL, optionTwo //optionTwo set
TEST EAX, EAX //Tests if optionOne equals == 0, then je will be equal.
je gotoOptionOne //Jumps if optionOne equals 0.
TEST EBX, EBX //Tests if optionTwo equals == 0, then je will be equal.
je gotoOptionTwo //Jumps if optionTwo equals 0.
gotoOptionOne:
//This the default case (improper value in EDX..., I could just use address at [ESI+0x2] which is old EDX, which is risky since it's delayed (outdated)
MOV DWORD PTR DS:[ESI+0x2], EDX //(normal operation)
MOV test.myEDX, EDX //Stores freshest EDX to test.myEDX (wrong EDX value)
JMP finish //Clear temporary used registers and go back to next asm code
//Special case. (works mostly properly)
//Thing is EDX gets updated very frequently, Causes no side-effect only because
//[ESI+0x2] gets updated in a different location as well a bit later to renew the value.
//So it's not even noticeable, but when I run this at it's peak speeds, you start to see the flickering effect, which isn't normal, if I run peak speeds without hook.
//I eliminated the problem that the hook could cause the flicker effect since after
//I call a empty naked Hook with just return address to same location and disable hook
//Then re-enable hook and repeat step above (no flicker occurs).
gotoOptionTwo:
//MOV DWORD PTR DS:[ESI+0x2], EDX //(normal operation), omitted
MOV EAX, DWORD PTR DS:[ESI+0x2] //Old EDX, but atleast it's correct.
MOV test.myEDX, EAX //Stores old EDX into struct test.myEDX
MOV EAX, test.mySetEDX //Replace old EDX with what I wish it should be.
MOV DWORD PTR DS:[ESI+0x2], EAX //nySetEDX into what EDX should of did.
JMP finish //Clear temporary used registers and go back to next asm code
finish:
POPFD
POPAD
JMP gotoDumpBackAddress //return to starting location before dump + 1.
}
}
EDIT: Okay I haven't explained how I test this code out.
I'm not going to explain how I do the Hardware breakpoint, I don't want this method to be public on the internet, for my own security reasons in the future.
But it works by calling another DLL which communicates with a system driver.
But this should explain how I test it.
I run in a different thread in this DLL Injection.
void diffThread() {
while(1) {
if(GetAsyncKeyState(VK_NUMPAD0)) {
optionOne != optionOne;
Sleep(1000);
}
if(GetAsyncKeyState(VK_NUMPAD1)) {
optionTwo != optionTwo;
Sleep(1000);
}
Sleep(10);
}
}
bool __stdcall DllMain(HINSTANCE hInst,DWORD uReason,void* lpReserved)
{
if(uReason == DLL_PROCESS_ATTACH)
{
CreateThread(NULL,0,(LPTHREAD_START_ROUTINE)&diffThread,0 ,NULL,NULL);
}
else if(uReason == DLL_PROCESS_DETACH)
{
ExitThread(0);
}
return true;
}

Okay honestly this is wrong on so many levels, I mean the code itself.
What you are trying to do is to force your way into C territory telling the compiler to back off and you do your own thing by handwriting the code. This may come handy sometimes but you have to keep in mind that you also loose some C features especially things that are done for you automatically and as far as I know the Microsoft compile just doesnt like you messing around in inline assmebly ( don't get me wrong the MASM compiler is great but it just doesn't play along well with the C compiler )
// doing this is not the best since the compiler may
// do some nasty stuff like aligning the struct that
// you have to take watch out for in asm
struct TestStruct {
int myEDX;
int mySetEDX;
} test;
extern bool optionOne = false;
extern bool optionTwo = false;
DWORD gotoDumpBackAddress = 0x40012345; // static memory address? this should fail like instantly, I dont really know what this is supposed to be
void __declspec( naked ) dump(void) {
__asm {
PUSHAD // you shouldn't normally push all registers only the ones you actually used!
PUSHFD
XOR EAX, EAX // this is how you zero out but honestly you can use EAX's low and high parts as 2 16bit regsiters instead of using 2 32bits for 2 bits of data
XOR EBX, EBX
MOV AL, optionOne //optionOne set
MOV BL, optionTwo //optionTwo set
TEST EAX, EAX // This check is meaning less because if gotoOptionTwo is false you move on regardless of gotoOpeionOne's value
je gotoOptionOne
TEST EBX, EBX // This test should always be true isn't it?
je gotoOptionTwo
gotoOptionOne:
// Assuming that ESI is coming from the system as you said that there is a
// breakpoint that invokes this code, do you even have access to that part of the memory?
MOV DWORD PTR DS:[ESI+0x2], EDX
MOV test.myEDX, EDX // this is just a C idom 'struct' this doesnt really exist in ASM
JMP finish
gotoOptionTwo:
MOV EAX, DWORD PTR DS:[ESI+0x2]
MOV test.myEDX, EAX
MOV EAX, test.mySetEDX
MOV DWORD PTR DS:[ESI+0x2], EAX
JMP finish
finish:
POPFD
POPAD
JMP gotoDumpBackAddress //return to starting location before dump + 1.
}
}
So to answer your question,
I think the biggest problem here is this line
MOV test.myEDX, EDX
Generally in ASM there is no such thing as a scope, like you're telling it to get you a value of myEDX from test which means get the pointer to the value test and then get the value by the pointer. It may work but it's up to the C compiler to help you out.
So normally you would need to get a pointer and use that to move the data like
LEA ecx,test // get the address of test
MOV DWORD PTR[ecx],edx // use that address to move the data into the struct
Now you are telling it to put that data into the struct, however this makes some assumptions, like here we know that the first element is the one we want to put the data into ( myEDX ) and we assume that a DWORD is the same as an int in C ( on Windows it usually is ) but again assumptions are bad!

Some things that I noticed in your code:
First of all, I don't think that you really need to do the pushad and pushfd You are only using one register really, because ebx wouldn't even be needed, and eax is not considered by the compiler to be preserved anyway.
XOR EAX, EAX //zero EAX, used below as AL (optionOne)
MOV AL, optionOne //optionOne set
TEST EAX, EAX //Tests if optionOne equals == 0, then je will be equal.
je gotoOptionOne //Jumps if optionOne equals 0.
MOV AL, optionTwo //optionTwo set
TEST EAX, EAX //Tests if optionTwo equals == 0, then je will be equal.
je gotoOptionTwo //Jumps if optionTwo equals 0.
Your assumption
MOV EBX, EDX
MOV EAX, EDX //Theory: in order to move EDX a second time into ECX, I must not move EDX directly into ECX.
MOV ECX, EAX
is simply wrong. You can move the same register as often as you like, there is no limitation. The only thing that MIGHT be of consideration is when you are looking for maximum performance because of the instruction scheduling, but this is an entirely different issue and doesn't apply here anyway.
So simply doing
MOV EBX, EDX
MOV ECX, EDX
would be fine.
You are using ESI but it is nowhere initialized. Maybe this is correct, as I don't really understand what your code is supposed to do.
The you are using the variable test here
MOV test.myEDX, EAX
MOV EAX, test.mySetEDX
but this variable isn't even declared, so where should the compiler know it's address from? And for using it as an offset then you would have to use a register as the base address.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js