Calling Windows functions from machine code - C++

Here is the walkthrough I'm using: https://i.imgur.com/LIImg.jpg
From what I can see, to call a Windows function you put the arguments in specific registers. But where is it documented which registers to use, and in what order?
Look at the code section of that image: it just seems to use r8d, r9, edx and ecx. Does that mean the order is edx, ecx, r8d, r9d, r10d, etc.? What happens when you run out of registers for a function with many parameters?
Also, why does it have to subtract from the stack pointer? And why 0x28?
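For reference, the Microsoft x64 calling convention passes the first four integer/pointer arguments in RCX, RDX, R8, R9 (ECX, EDX, R8D, R9D are just their low 32-bit halves) and the rest on the stack. The caller also reserves 32 bytes of "shadow space" and must have RSP 16-byte aligned at the call. A rough sketch of the arithmetic behind 0x28, assuming a function with no locals and no other pushes (the helper names are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes to subtract from RSP in the prologue before
// calling a function that takes `arg_count` integer/pointer arguments.
// Assumes the caller has no locals and pushes nothing else.
std::size_t stack_adjustment(std::size_t arg_count) {
    // Four 8-byte home slots (shadow space) are always reserved, even when
    // the callee takes fewer than four arguments.
    std::size_t slots = arg_count < 4 ? 4 : arg_count;
    std::size_t bytes = slots * 8;
    // On entry RSP is 8 mod 16 because the return address was just pushed,
    // so the adjustment must also be 8 mod 16 to realign RSP for the call.
    if (bytes % 16 == 0) bytes += 8;
    return bytes;
}

// Offset from RSP, just before the call instruction, of outgoing argument
// number `n` (1-based). Only arguments 5 and up are actually stored there;
// the first four slots are the shadow space for RCX, RDX, R8, R9.
std::size_t outgoing_arg_offset(std::size_t n) {
    assert(n >= 1);
    return (n - 1) * 8;
}
```

With up to five arguments this gives 0x28: 0x20 of shadow space plus 8 bytes that either realign the stack or hold the fifth argument.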

Related

Why is the gs segment register set to 0x0000000000000000 in Visual Studio x64 (MASM)?

I am currently reading "The Ultimate Anti Debugging Reference" and I am trying to implement some of the techniques.
To check the value of NtGlobalFlag they use this code:
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
I made all the adjustments needed for running x64 code in Visual Studio 2017, following this tutorial.
I used this instruction to access NtGlobalFlag:
lodsq gs:[rsi]
because their syntax didn't work in Visual Studio.
But it still didn't work.
While debugging I noticed that the value of the gs register is 0x0000000000000000, while the fs register holds a real value, 0x0000007595377000.
I don't understand why GS is zero, because it should have a value on x64.
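A zero shown for gs in the debugger does not mean gs is unusable. In 64-bit mode the base address used for fs: and gs: accesses comes from a model-specific register rather than from the selector value, so the register window is not necessarily showing the address that gs:[...] resolves against. On 64-bit Windows the user-mode gs base points at the Thread Environment Block (TEB), and gs:[60h] holds the pointer to the Process Environment Block (PEB); fs plays that role in 32-bit processes, where fs:[30h] is the PEB pointer. I don't know which variables are kept in "per thread" memory, other than the seed value for rand(). You could debug a program that uses rand(), and step through it in a disassembly window, to see how it is accessed.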
The success of adding an anti-debugger feature to a program will depend on how much motivation there is to defeat it. The main issue is Windows remote debugging, and/or using a hacker installed device driver running in kernel mode to defeat an anti-debugger feature.
So I still don't understand why the code posted here caused so many problems. As I said, I just copied it from The "Ultimate" Anti-Debugging Reference:
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
But I've found a simpler solution that works perfectly.
As Peter Cordes said, I should be fine just accessing the value without lodsq, like so:
mov rax, gs:[60h]
And after further investigation, I found this reference,
Code -
mov rax, gs:[60h]
mov al, [rax+BCh]
and al, 70h
cmp al, 70h
jz being_debugged
And I modified it a little bit for my program -
.code
GetValueFromASM proc
mov rax, gs:[60h]
mov al, [rax+0BCh]
and al, 70h
cmp al, 70h
jz being_debugged
mov rax,0
ret
being_debugged:
mov rax, 1
ret
GetValueFromASM endp
end
Just one thing to note -
When running inside Visual Studio 2017 the result returned was 0, meaning no debugger attached, which is wrong (I was using the Local Windows Debugger).
But when launching the process with WinDbg it did return 1, which means that it works.
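To see why the lodsq version and the [rax+0BCh] version read the same byte: lodsq loads from gs:[rsi] and then advances rsi by 8 (with the direction flag clear), so rsi is 68h afterwards and rsi*2-14h works out to 0BCh. The 70h mask covers the three heap-checking bits of NtGlobalFlag. A small C++ sketch of both pieces of arithmetic (function and constant names are made up here; the bit values are the documented FLG_HEAP_* values):

```cpp
#include <cassert>
#include <cstdint>

// NtGlobalFlag bits typically set when a process is created under a debugger.
constexpr std::uint32_t kHeapEnableTailCheck    = 0x10; // FLG_HEAP_ENABLE_TAIL_CHECK
constexpr std::uint32_t kHeapEnableFreeCheck    = 0x20; // FLG_HEAP_ENABLE_FREE_CHECK
constexpr std::uint32_t kHeapValidateParameters = 0x40; // FLG_HEAP_VALIDATE_PARAMETERS
constexpr std::uint32_t kDebugHeapMask =
    kHeapEnableTailCheck | kHeapEnableFreeCheck | kHeapValidateParameters; // 0x70

// Displacement from the PEB pointer that `mov al, [rsi*2+rax-14h]` ends up
// using, given the value rsi had before `gs:lodsq` executed.
std::uint64_t lodsq_displacement(std::uint64_t rsi_before) {
    std::uint64_t rsi_after = rsi_before + 8; // lodsq advances rsi by 8 when DF = 0
    return rsi_after * 2 - 0x14;
}

// Same test as `and al, 70h` / `cmp al, 70h`: all three bits must be set.
bool flags_indicate_debugger(std::uint32_t nt_global_flag) {
    return (nt_global_flag & kDebugHeapMask) == kDebugHeapMask;
}
```

Because the check only looks at heap-debugging flags, it reports 0 whenever those flags are not set, regardless of whether a debugger is attached.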

Understanding a function call that uses EAX before and after for the return value

I have been trying to hook a function which is mostly optimized by the compiler. It initializes EAX before the call and its return value is stored in EAX.
Here is some code:
mov eax,dword ptr ds:[0xA6DD08]
push 0x3DC
add eax,0x800
call 0x48A2B4
mov esi,eax
At first, 0xA6DD08 is a pointer to some data in memory, but after adding 0x800, EAX points to a value of zero; the next few DWORDs store pointers to pointers or a data array. The function's purpose is to look up and return a specific object that has a DWORD member equal to the given value, which is 0x3DC.
When using __asm to call the function from my DLL, it works perfectly, but I am trying to write it in C++, something like
Class1* pClass = reinterpret_cast<Class1*(__stdcall*)(DWORD)>(0x48A2B4)(988);
From what I read, I believe that only __stdcall uses EAX to store its return value, and that's why I chose the __stdcall calling convention. What I do not understand is the purpose of EAX before calling the function.
add eax,0x800 right before a call wouldn't make sense unless EAX is an input to the called function.
Passing 1 arg in EAX and another on the stack looks to me like GCC's regparm=1 calling convention. Or if other regs are set before this, regparm=3 passes in EAX, EDX, and ECX (in that order).
32-bit x86 builds of the Linux kernel are typically built with -mregparm=3, but user-space GNU/Linux code typically follows the clunky old i386 System V convention which passes all args on the stack.
According to https://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions, a couple other obscure calling conventions also pass a first arg in EAX:
Delphi and Free Pascal register: EAX, EDX, ECX (left-to-right Pascal-style arg passing, so the left-most arg goes in EAX; stack args are also pushed left to right, unlike GCC regparm)
Watcom compiler: EAX, EDX, EBX, ECX. Unless you left out some setting of EDX, EBX, and ECX before pushing a stack arg, we can rule that out.
"only __stdcall uses EAX to store its return value"
Actually, all x86 calling conventions do that for integer return values, across the board. So do both x86-64 conventions. See Agner Fog's calling convention guide.

Mixing C++ and assembly: can't pass multiple parameters from a C++ function to assembly

I've been frustrated by passing parameters from a c++ function to assembly. I couldn't find anything that helped on Google and would really like your help. I am using Visual Studio 2017 and masm to compile my assembly code.
This is a simplified version of my c++ file where I call the assembly procedure set_clock
int main()
{
TimeInfo localTime;
char clock[4] = { 0,0,0,0 };
set_clock(clock,&localTime);
system("pause");
return 0;
}
I run into problems in the assembly file. I can't figure out why the second parameter passed to the function turns out huge. I was going off my textbook, which shows similar code with PROC followed by parameters. I don't know why the first parameter is passed successfully and the second one isn't. Can someone tell me the correct way to pass multiple parameters?
.code
set_clock PROC,
array:qword,address:qword
mov rdx,array ; works fine memory address: 0x1052440000616
mov rdi,address ; value of rdi is 14757395258967641292
mov al, [rdx]
mov [rdi],al ; ERROR: cant access that memory location
ret
set_clock ENDP
END
MASM's high-level crap is biting you in the ass. x64 Windows passes the first 4 args in rcx, rdx, r8, r9 (for any of those 4 that are integer/pointer).
mov rdx,array
mov rdi,address
assembles to
mov rdx, rcx ; clobber 2nd arg with a copy of the 1st
mov rdi, rdx ; copy array again
Use a disassembler to check for yourself. It's always a good idea to check the real machine code, by disassembling or by using your debugger's disassembly view instead of source mode, if anything weird is happening with assembler macros.
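I'm not sure why this would result in an inaccessible memory location. If both args really are pointers to locals, then it should just be loading and storing back into the same stack location. But if char clock[4] is a const in static storage, it might be in a read-only memory page, which would explain the store failing. Note, though, that the reported value 14757395258967641292 is 0xCCCCCCCCCCCCCCCC, the pattern MSVC debug builds use to fill uninitialized stack memory. That suggests address may actually have been loaded from stack memory nobody had written (for example an unfilled shadow-space slot) rather than copied from a register.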
Either way, use a debugger and find out.
BTW, rdi is a call-preserved (aka non-volatile) register in the x64 Windows convention. (https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx). Use call-clobbered registers for scratch regs unless you run out and need to save/restore some call-preserved regs. See also Agner Fog's calling conventions doc (http://agner.org/optimize/), and other links in the x86 tag wiki.
It's call-clobbered in x86-64 System V, which also passes args in different registers. Maybe you were looking at a different example?
Hopefully-fixed version, using movzx to avoid a false dependency on RAX when loading a byte.
set_clock PROC,
array:qword,address:qword
movzx eax, byte ptr [array]
mov [address], al
ret
set_clock ENDP
I don't use MASM, but I think array:qword makes array an alias for rcx. Or you could skip declaring the parameters and just use rcx and rdx directly, and document it with comments. That would be easier for everyone to understand.
You definitely don't want useless mov reg,reg instructions cluttering your code; if you're writing in asm in the first place, wasted instructions would cut into any speedups you're getting.
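On the C++ side, the question doesn't show how set_clock is declared. A minimal sketch, assuming it is declared extern "C" so the linker looks for the unmangled symbol name that a MASM PROC produces. The TimeInfo layout below is hypothetical, and the function body is only a C++ stand-in for the assembly routine so the sketch links without MASM:

```cpp
#include <cassert>

// Hypothetical layout; the question does not show the real TimeInfo.
struct TimeInfo {
    char hour;
    char minute;
    char second;
    char reserved;
};

// extern "C" turns off C++ name mangling, so the symbol is plain `set_clock`.
// Under the x64 Windows convention `array` arrives in RCX and `address` in RDX.
extern "C" void set_clock(char* array, TimeInfo* address);

// C++ stand-in for the assembly body. It does what the question's code
// attempts: copy the first byte of `array` to the first byte of `*address`.
extern "C" void set_clock(char* array, TimeInfo* address) {
    assert(array != nullptr && address != nullptr);
    address->hour = array[0];
}
```

In a real build the second definition would be dropped and the object file produced by ml64 linked in instead.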

fastcall how to use for more than 4 parameters

I was trying to build a function in assembly (FASM) that uses more than 4 parameters. In x86 it works fine, but I know that in x64 with fastcall you have to spill the parameters into the shadow space in the order rcx, rdx, r8, r9. I read that the 5th and later parameters are passed on the stack, but I don't know how to access them. This is what I tried, but it keeps saying invalid operand. I know I am handling the first 4 parameters correctly because I have written x64 functions before; it is the last 3 that I don't know how to spill.
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
If I try
mov [buffer3],rsp+8*4
it says extra characters on the line.
I also saw that some people use rsp+20h, rsp+28h, etc., but that does not work either.
How do I handle more than 4 parameters using fastcall on x64?
Also, do I have to make room on the stack? I saw some people put add rsp,20h right before their spill code. I tried that and it did not fix the invalid operand error.
thanks
update
After playing around with it for a bit, I found that the only way it seems to work is if I spill the first 4 parameters and ignore the rest (5 and up).
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
;start the regular code. ignore spilling buffer3,startposition and length
x86/x64 CPUs have no mov instruction that takes two memory operands, so the following instructions do not exist:
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
As a workaround, use the rax register as a temporary: read each value from memory into rax, then write rax to the destination memory location:
mov rax,[rsp+8*4]
mov [buffer3],rax
mov rax,[rsp+8*5]
mov [startposition],rax
mov rax,[rsp+8*6]
mov [length],rax
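Which rsp offsets are right depends on where rsp points when the copy runs. As a sketch of the layout under the Microsoft x64 convention: at the callee's first instruction [rsp] holds the return address, the next four slots are the shadow space for rcx, rdx, r8, r9, and the 5th parameter follows. The helpers below are hypothetical, and the second one assumes the prologue is exactly push rbp / mov rbp, rsp, which I have not checked against FASM's proc macro:

```cpp
#include <cassert>
#include <cstddef>

// Offset from RSP, at the first instruction of the callee, of the stack slot
// belonging to parameter `n` (1-based). [rsp+0] is the return address,
// slots 1..4 are shadow space, slots 5 and up hold the stack arguments.
std::size_t offset_from_rsp_at_entry(std::size_t n) {
    assert(n >= 1);
    return 8 * n;
}

// The same slot relative to RBP after `push rbp` / `mov rbp, rsp`.
// The pushed RBP moves everything 8 bytes further away.
std::size_t offset_from_rbp_after_prologue(std::size_t n) {
    return offset_from_rsp_at_entry(n) + 8;
}
```

So at entry the 5th, 6th and 7th parameters sit at [rsp+28h], [rsp+30h] and [rsp+38h]. If the parameter names already refer to those frame slots, parameters 5 and up are in memory from the start and only the four register parameters need spilling, which would match what the update observed.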

What is the lea instruction before a method call doing?

In looking at my disassembled code I see a lot of the following:
00B442E9 push 4
00B442EB push 3
00B442ED lea ecx,[ebp-24h]
00B442F0 call Foo::Bar (0B41127h)
I understand pushing the parameters before the call, but what's the lea doing here?
In the thiscall calling convention used by Visual C++ for x86, the this pointer is passed in the ecx register. This lea instruction computes the address of the object (here a local at ebp-24h) and puts it in ecx as the this pointer before calling the member function.
You can read all about the lea instruction in the Stack Overflow question "What's the purpose of the LEA instruction?"
I think it's just an optimized form of
mov ecx, ebp
sub ecx, 24h
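A source-level sketch of what probably produced that disassembly, assuming Foo is a class with a local instance on the stack (the class body and helper names here are invented for illustration). The address of the local is what ends up in ecx, and lea produces the same value as the mov/sub pair in one instruction:

```cpp
#include <cassert>
#include <cstdint>

struct Foo {
    int sum = 0;
    // Returns the `this` pointer it was called with so the caller can compare.
    const Foo* Bar(int a, int b) {
        sum = a + b;
        return this;
    }
};

// Mirrors the call site: `local` lives in the stack frame (at [ebp-24h] in
// the question), the arguments are pushed right to left (4, then 3), and the
// address of `local` is passed as `this`.
bool this_is_address_of_local() {
    Foo local;
    return local.Bar(3, 4) == &local;
}

// lea ecx, [ebp-24h]
std::uint32_t lea_ecx(std::uint32_t ebp) {
    return ebp - 0x24;
}

// mov ecx, ebp / sub ecx, 24h
std::uint32_t mov_then_sub(std::uint32_t ebp) {
    std::uint32_t ecx = ebp;
    ecx -= 0x24;
    return ecx;
}
```

One difference the arithmetic model doesn't show: sub modifies the flags register, while lea leaves flags untouched.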