Calling putchar using x64 assembly through C++ - c++

So, I wrote a little library that allows me to execute raw bytecode, as in assembly instructions, in C++.
I thought I'd write a brainfuck-to-x64 compiler with it. Everything worked until I had to implement the . brainfuck instruction, which prints a character to stdout.
I know I need to pass the (only) argument through rcx (according to cdecl). But I don't know how to set up the stack or clean up after a function call. My ASM code is as follows:
push rbp ; This is the only thing I tried doing as a prologue
mov rcx, QWORD PTR [rbx+rax*4] ; rbx contains the address of an array (32-bit elements), and rax contains the index, the character byte is saved in that address
push rax ; Retrieve rax after it gets clobbered by putchar
push rcx ; Push rcx to use it as an argument
call r10 ; r10 contains the address of putchar
pop rcx ; Restore all clobbered registers
pop rax
pop rbp
This snippet of code works, a character gets put into stdout, but after that, I just get "Access violation executing location 0x0000000000000000."
What am I missing?
Sounds like putchar is not returning correctly due to rsp being corrupted, or something
By the way, I got the address of putchar like this:
#include <cstdint>
#include <cstdio>
std::uint_least64_t putchar_addr = (std::uint_least64_t)&std::putchar;
I need to get the pointer as an integer so I can append it to the code buffer as bytecode later.
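Appending that address to the code buffer could look roughly like the sketch below. The helper names (`emit_u64`, `emit_mov_r10_imm64`, `emit_call_r10_win64`, `putchar_address`) and the `CodeBuffer` alias are hypothetical, not part of the library described above. The call helper also assumes the Windows x64 convention, where the caller reserves 32 bytes of shadow space and rsp is 16-byte aligned at the call; the 40-byte adjustment is only correct if rsp is 8 bytes off a 16-byte boundary when the sequence starts, as it is on function entry.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

using CodeBuffer = std::vector<std::uint8_t>;  // assumed buffer type

// Append a 64-bit value in little-endian byte order.
void emit_u64(CodeBuffer& code, std::uint64_t value) {
    for (int i = 0; i < 8; ++i)
        code.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// mov r10, imm64  ->  49 BA <8-byte immediate>
void emit_mov_r10_imm64(CodeBuffer& code, std::uint64_t imm) {
    code.push_back(0x49);  // REX.W + REX.B
    code.push_back(0xBA);  // B8 + low 3 bits of r10
    emit_u64(code, imm);
}

// sub rsp, 40 / call r10 / add rsp, 40
// (32 bytes of shadow space plus 8 bytes to restore 16-byte alignment)
void emit_call_r10_win64(CodeBuffer& code) {
    const std::uint8_t bytes[] = {
        0x48, 0x83, 0xEC, 0x28,  // sub rsp, 40
        0x41, 0xFF, 0xD2,        // call r10
        0x48, 0x83, 0xC4, 0x28,  // add rsp, 40
    };
    code.insert(code.end(), bytes, bytes + sizeof bytes);
}

// Address of putchar as an integer, ready to be emitted as an immediate.
std::uint64_t putchar_address() {
    return reinterpret_cast<std::uintptr_t>(&std::putchar);
}
```

With these, `emit_mov_r10_imm64(code, putchar_address())` followed by `emit_call_r10_win64(code)` would replace the hand-written push/call/pop sequence, provided the argument is already in rcx and any registers you need afterwards are call-preserved.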

Related

asm inspection of c++ compiled object. what is the meaning of this cs: part [duplicate]

I am writing simple programs and then analyzing them.
Today I've written this:
#include <stdio.h>
int x;
int main(void){
printf("Enter X:\n");
scanf("%d",&x);
printf("You enter %d...\n",x);
return 0;
}
It's compiled into this:
push rbp
mov rbp, rsp
lea rdi, s ; "Enter X:"
call _puts
lea rsi, x
lea rdi, aD ; "%d"
mov eax, 0
call ___isoc99_scanf
mov eax, cs:x <- don't understand this
mov esi, eax
lea rdi, format ; "You enter %d...\n"
mov eax, 0
call _printf
mov eax, 0
pop rbp
retn
I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA pro 7.6.
TL;DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.
In IDA mov eax, x means mov eax, DWORD [x] which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x means mov rax, x (i.e. putting the address of x in rax).
In 64-bit mode, displacements are still 32 bits wide, so it's not always possible to address a variable by encoding its absolute address (the address is 64-bit and may not fit into a 32-bit field). In position-independent code, such as a Position Independent Executable, absolute addresses aren't desirable anyway.
Instead, RIP-relative addressing is used.
In NASM, RIP-relative addressing takes the form mov eax, [REL x], in gas it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x] which is identical to the 32-bit syntax.
Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.
;IDA listing, 64-bit code
mov eax, x ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded <= 4GiB
mov eax, cs:x ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
Of course, as the listing above shows, the use or absence of RIP-relative addressing tells you whether the program can be loaded anywhere or only below 4 GiB.
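To make the RIP-relative form concrete, here is a small sketch of how the effective address is derived from the encoded 32-bit displacement. The function name `rip_relative_target` is made up for illustration. The displacement is added to the address of the next instruction, not the current one.

```cpp
#include <cassert>
#include <cstdint>

// Address referenced by a RIP-relative operand. When the displacement is
// applied, RIP already points at the instruction that follows.
std::uint64_t rip_relative_target(std::uint64_t instr_addr,
                                  std::uint64_t instr_len,
                                  std::int32_t disp32) {
    const std::uint64_t next_rip = instr_addr + instr_len;
    // Sign-extend the 32-bit displacement to 64 bits before adding.
    return next_rip +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp32));
}
```

For example, a 6-byte `mov eax, [rel x]` (8B 05 disp32) at address 0x1000 with a displacement of 0x2EFA refers to 0x1006 + 0x2EFA = 0x3F00, wherever the image happens to be loaded.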
The cs prefix shown by IDA threw me off.
I can see that it could mentally resemble "code" and thus the rip register but I don't think the RIP-relative addressing implies a cs segment override.
In 32-bit mode, the code segment is usually read-only, so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
Still cs: is misleading in my opinion - a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.

Understanding a function call that uses EAX before and after for the return value

I have been trying to hook a function which is mostly optimized by the compiler. It initializes EAX before the call and its return value is stored in EAX.
Here is some code:
mov eax,dword ptr ds:[0xA6DD08]
push 0x3DC
add eax,0x800
call 0x48A2B4
mov esi,eax
0xA6DD08 holds a pointer to some data in memory. After adding 0x800, EAX points to a DWORD that is zero, but the next few DWORDs hold pointers to pointers or to a data array. The function's purpose is to look up and return the object whose DWORD member equals the given value, here 0x3DC.
When I use __asm to call the function from my DLL, it works perfectly, but I am trying to write it in C++, something like
Class1* pClass = reinterpret_cast<Class1*(__stdcall*)(DWORD)>(0x48A2B4)(988);
From what I read, I believe only __stdcall uses EAX to store its return value, and that's why I chose the __stdcall calling convention. What I do not understand is the purpose of EAX before calling the function.
add eax,0x800 right before a call wouldn't make sense unless EAX is an input to the called function.
Passing 1 arg in EAX and another on the stack looks to me like GCC's regparm=1 calling convention. Or if other regs are set before this, regparm=3 passes in EAX, EDX, and ECX (in that order).
32-bit x86 builds of the Linux kernel are typically built with -mregparm=3, but user-space GNU/Linux code typically follows the clunky old i386 System V convention which passes all args on the stack.
According to https://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions, a couple other obscure calling conventions also pass a first arg in EAX:
Delphi and Free Pascal register: EAX, EDX, ECX (Left-to-right Pascal style arg passing, right-most arg in EAX I guess? Unlike GCC regparm)
Watcom compiler: EAX, EDX, EBX, ECX. Unless you left out some setting of EDX, EBX, and ECX before pushing a stack arg, we can rule that out.
only __stdcall uses EAX to store its return value
Actually, all x86 calling conventions do that for integer return values, across the board, as do both x86-64 conventions. See Agner Fog's calling convention guide.
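As an illustration of the regparm idea only (the asker's toolchain is MSVC, which has no such attribute), this is how a function taking its first argument in EAX and the second on the stack could be declared with GCC or Clang for 32-bit x86. `Object`, `find_object`, and the zero-terminated table are hypothetical stand-ins, not the real function at 0x48A2B4.

```cpp
#include <cassert>
#include <cstddef>

// regparm only exists for 32-bit x86 GCC/Clang targets; elsewhere the macro
// expands to nothing and the platform's normal convention is used.
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM1 __attribute__((regparm(1)))
#else
#define REGPARM1
#endif

struct Object {   // hypothetical layout
    unsigned id;
    int payload;
};

// With regparm(1) on i386, `table` is passed in EAX and `id` on the stack.
// The table is assumed to end with an entry whose id is 0.
REGPARM1 const Object* find_object(const Object* table, unsigned id) {
    for (const Object* p = table; p->id != 0; ++p)
        if (p->id == id) return p;
    return nullptr;
}
```

Either way the result comes back in EAX, which is why `mov esi,eax` follows the call in the disassembly.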

Mixing C++ and assembly: can't pass multiple parameters from a C++ function to assembly

I've been frustrated by passing parameters from a C++ function to assembly. I couldn't find anything that helped on Google and would really like your help. I am using Visual Studio 2017 and MASM to compile my assembly code.
This is a simplified version of my C++ file, where I call the assembly procedure set_clock:
int main()
{
TimeInfo localTime;
char clock[4] = { 0,0,0,0 };
set_clock(clock,&localTime);
system("pause");
return 0;
}
I run into problems in the assembly file. I can't figure out why the second parameter passed to the function turns out huge. I was going off my textbook, which shows similar code with PROC followed by parameters. I don't know why the first parameter is passed successfully and the second one isn't. Can someone tell me the correct way to pass multiple parameters?
.code
set_clock PROC,
array:qword,address:qword
mov rdx,array ; works fine memory address: 0x1052440000616
mov rdi,address ; value of rdi is 14757395258967641292
mov al, [rdx]
mov [rdi],al ; ERROR: can't access that memory location
ret
set_clock ENDP
END
MASM's high-level crap is biting you in the ass. x64 Windows passes the first 4 args in rcx, rdx, r8, r9 (for any of those 4 that are integer/pointer).
mov rdx,array
mov rdi,address
assembles to
mov rdx, rcx ; clobber 2nd arg with a copy of the 1st
mov rdi, rdx ; copy array again
Use a disassembler to check for yourself. It's always a good idea to check the real machine code by disassembling, or by using your debugger's disassembly view instead of source mode, if anything weird is happening with assembler macros.
I'm not sure why this would result in an inaccessible memory location. If both args really are pointers to locals, then it should just be loading and storing back into the same stack location. But if char clock[4] is a const in static storage, it might be in a read-only memory page, which would explain the store failing. Note also that 14757395258967641292 is 0xCCCCCCCCCCCCCCCC, the pattern MSVC debug builds write into uninitialized stack memory, so rdi may actually hold a value read from an uninitialized stack slot.
Either way, use a debugger and find out.
BTW, rdi is a call-preserved (aka non-volatile) register in the x64 Windows convention. (https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx). Use call-clobbered registers for scratch regs unless you run out and need to save/restore some call-preserved regs. See also Agner Fog's calling conventions doc (http://agner.org/optimize/), and other links in the x86 tag wiki.
It's call-clobbered in x86-64 System V, which also passes args in different registers. Maybe you were looking at a different example?
Hopefully-fixed version, using movzx to avoid a false dependency on RAX when loading a byte.
set_clock PROC,
array:qword,address:qword
movzx eax, byte ptr [array]
mov [address], al
ret
set_clock ENDP
I don't use MASM, but I think array:qword makes array an alias for rcx. Or you could skip declaring the parameters and just use rcx and rdx directly, and document it with comments. That would be easier for everyone to understand.
You definitely don't want useless mov reg,reg instructions cluttering your code; if you're writing in asm in the first place, wasted instructions would cut into any speedups you're getting.
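When checking the assembly in a debugger, it can help to have a C++ stand-in with the same signature to compare against. The sketch below is such a reference, not the asker's code: `set_clock_reference` is a made-up name, and the `TimeInfo` layout is an assumption because the real one isn't shown. Declaring the real routine as `extern "C"` in the same way keeps the C++ compiler from mangling its name.

```cpp
#include <cassert>

struct TimeInfo {   // hypothetical layout; the real struct isn't shown
    unsigned char hours;
    unsigned char minutes;
};

// C++ equivalent of the fixed routine: copy the first byte of `array`
// to the first byte of the object that `address` points to.
// On x64 Windows, `array` arrives in rcx and `address` in rdx.
extern "C" void set_clock_reference(const char* array, TimeInfo* address) {
    *reinterpret_cast<unsigned char*>(address) =
        static_cast<unsigned char>(array[0]);
}
```

Calling it as `set_clock_reference(clock, &localTime)` and comparing the result with what the MASM version produces shows quickly whether the arguments are being read from the right place.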

Injecting 64 Bit DLL using code cave

I'm trying to inject a 64 Bit DLL into 64 Bit Process (explorer for the matter).
I've tried the remote-thread and window-hook techniques, but some antivirus products flag my loader (a false positive).
After reading this article : Dll Injection by Darawk, I decided to use code caves.
It worked great for 32bit but because VS doesn't support inline assembly for 64 Bit I had to write the op-codes and operands explicitly.
I looked at this article : 64Bit injection using code cave, as the article states, there are some differences:
There are several differences that had to be incorporated here:
MASM64 uses fastcall, so the function's argument has to be passed in a register and not on the stack.
The length of the addresses - 32 vs. 64 bit - must be taken into account.
MASM64 has no instruction that pushes all registers on the stack (like pushad in 32bit) so this had to be done by pushing all the registers explicitly.
I followed those guidelines and ran the article's example but none of what I did worked.
The target process crashed the moment I resumed the main thread, and I don't know how to look into it because OllyDbg has no 64-bit support.
This is how the code looks before I injected it:
codeToInject:
000000013FACD000 push 7741933Ah
000000013FACD005 pushfq
000000013FACD006 push rax
000000013FACD007 push rcx
000000013FACD008 push rdx
000000013FACD009 push rbx
000000013FACD00A push rbp
000000013FACD00B push rsi
000000013FACD00C push rdi
000000013FACD00D push r8
000000013FACD00F push r9
000000013FACD011 push r10
000000013FACD013 push r11
000000013FACD015 push r12
000000013FACD017 push r13
000000013FACD019 push r14
000000013FACD01B push r15
000000013FACD01D mov rcx,2CA0000h
000000013FACD027 mov rax,76E36F80h
000000013FACD031 call rax
000000013FACD033 pop r15
000000013FACD035 pop r14
000000013FACD037 pop r13
000000013FACD039 pop r12
000000013FACD03B pop r11
000000013FACD03D pop r10
000000013FACD03F pop r9
000000013FACD041 pop r8
000000013FACD043 pop rdi
000000013FACD044 pop rsi
000000013FACD045 pop rbp
000000013FACD046 pop rbx
000000013FACD047 pop rdx
000000013FACD048 pop rcx
000000013FACD049 pop rax
000000013FACD04A popfq
000000013FACD04B ret
Seems fine to me but I guess I'm missing something.
My complete code can be found here : Source code
Any ideas\suggestions\alternatives?
The first push, which stores the return address, only takes a 32-bit immediate (sign-extended to 64 bits), so the upper half of the address is lost. dwOldIP in your code is a DWORD as well; it should be a DWORD64. Having to cast ctx.Rip to DWORD should've been enough of a hint ;)
Also, make sure the stack is 16-byte aligned upon entering the call to LoadLibrary. Some APIs throw exceptions if the stack is not aligned properly.
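One way to get a full 64-bit return address onto the stack without clobbering a register is to push the low half and then overwrite the upper four bytes of the slot. The sketch below emits those bytes. `emit_u32` and `emit_push_u64` are hypothetical helper names, and the buffer type is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian byte order.
void emit_u32(std::vector<std::uint8_t>& code, std::uint32_t value) {
    for (int i = 0; i < 4; ++i)
        code.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Push a full 64-bit constant:
//   68 <lo32>            push lo32 (sign-extended into an 8-byte slot)
//   C7 44 24 04 <hi32>   mov dword ptr [rsp+4], hi32
void emit_push_u64(std::vector<std::uint8_t>& code, std::uint64_t value) {
    code.push_back(0x68);
    emit_u32(code, static_cast<std::uint32_t>(value));
    const std::uint8_t mov_hi[] = {0xC7, 0x44, 0x24, 0x04};
    code.insert(code.end(), mov_hi, mov_hi + sizeof mov_hi);
    emit_u32(code, static_cast<std::uint32_t>(value >> 32));
}
```

The sequence is 13 bytes instead of 5, so any offsets into the code cave that were computed for the original `push` need to be adjusted.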
Apparently, the main problem was that I allocated the code cave without the PAGE_EXECUTE_READWRITE permission, so the memory was treated as data and not as executable code.

Help: Application crashes on accessing source code

Here is some simple asm code I have inserted in a VC++ project. addr_curr_ebp is the current value of the EBP register. It points to the old EBP value inside the stack frame. 4 bytes after this is the return address inside the application function. I extract a single byte from the code section. I run my code along with other applications like gtalk, vlc etc. The application always crashes when I include ProbStat 1 and 2 in my code. When I remove these statements everything works fine. What do you think this is?
__asm{
push eax
push ebx
push cx
mov ebx, addr_curr_ebp
mov eax, [ebx + 4]
mov cl, BYTE PTR [eax - 5] //ProbStat 1
mov ret_5, cl // ProbStat 2
pop cx
pop ebx
pop eax
}
Your code snippet isn't good enough to see where "ret_5" is located. You'll get an automatic crash if it is a member of a class. The ecx register stores the "this" pointer, you're messing it up.
I'm not sure what this does, but it sounds to me like you need the _ReturnAddress intrinsic. It returns the address of the instruction after the call instruction that called this code. Assign it to an unsigned char*; no assembly is needed this way.
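A minimal sketch of that suggestion follows. `caller_return_address` is a made-up name; on MSVC it uses `_ReturnAddress` from `<intrin.h>`, and on GCC/Clang it falls back to `__builtin_return_address(0)`, which is an addition here and not something the answer mentions.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#pragma intrinsic(_ReturnAddress)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Returns the address this function will return to, i.e. the instruction
// right after the call that invoked it. Kept out of line so that it has
// its own return address.
NOINLINE const unsigned char* caller_return_address() {
#if defined(_MSC_VER)
    return static_cast<const unsigned char*>(_ReturnAddress());
#else
    return static_cast<const unsigned char*>(__builtin_return_address(0));
#endif
}
```

Indexing the result with `[-5]` reads the same byte the asm fetches with `[eax - 5]`. That byte is only the call opcode if the call site was a 5-byte direct call.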