Injecting a 64-bit DLL using a code cave - c++

I'm trying to inject a 64-bit DLL into a 64-bit process (explorer.exe, in this case).
I've tried the remote-thread and window-hook techniques, but some antivirus products flag my loader as a false positive.
After reading the article Dll Injection by Darawk, I decided to use code caves.
That worked great for 32-bit, but because Visual Studio doesn't support inline assembly for 64-bit targets, I had to write the opcodes and operands explicitly.
I then looked at the article 64Bit injection using code cave; as it states, there are some differences:
There are several differences that had to be incorporated here:
- MASM64 uses fastcall, so the function's argument has to be passed in a register and not on the stack.
- The length of the addresses (32 vs. 64 bit) must be taken into account.
- MASM64 has no instruction that pushes all registers on the stack (like pushad in 32-bit), so this had to be done by pushing all the registers explicitly.
I followed those guidelines and ran the article's example, but none of it worked.
The target process simply crashed the moment I resumed its main thread, and I don't know how to investigate further because OllyDbg has no 64-bit support.
This is how the code looks before I injected it:
codeToInject:
000000013FACD000 push 7741933Ah
000000013FACD005 pushfq
000000013FACD006 push rax
000000013FACD007 push rcx
000000013FACD008 push rdx
000000013FACD009 push rbx
000000013FACD00A push rbp
000000013FACD00B push rsi
000000013FACD00C push rdi
000000013FACD00D push r8
000000013FACD00F push r9
000000013FACD011 push r10
000000013FACD013 push r11
000000013FACD015 push r12
000000013FACD017 push r13
000000013FACD019 push r14
000000013FACD01B push r15
000000013FACD01D mov rcx,2CA0000h
000000013FACD027 mov rax,76E36F80h
000000013FACD031 call rax
000000013FACD033 pop r15
000000013FACD035 pop r14
000000013FACD037 pop r13
000000013FACD039 pop r12
000000013FACD03B pop r11
000000013FACD03D pop r10
000000013FACD03F pop r9
000000013FACD041 pop r8
000000013FACD043 pop rdi
000000013FACD044 pop rsi
000000013FACD045 pop rbp
000000013FACD046 pop rbx
000000013FACD047 pop rdx
000000013FACD048 pop rcx
000000013FACD049 pop rax
000000013FACD04A popfq
000000013FACD04B ret
It seems fine to me, but I guess I'm missing something.
My complete code can be found here: Source code
Any ideas, suggestions, or alternatives?

The first push, which stores the return address, only pushes a 32-bit value. dwOldIP in your code is a DWORD as well; it should be a DWORD64. Having to cast ctx.Rip to DWORD should have been enough of a hint ;)
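In 64-bit mode, `push imm32` still pushes 8 bytes, but only 32 bits come from the instruction (sign-extended), so a truncated DWORD cannot represent an address above 4 GB. One common way to keep the whole address is to push the low half and then overwrite the high half in place. The sketch below is a hypothetical helper (`emitPushReturnAddress` is not from the original source) that emits those bytes; `oldRip` stands for `ctx.Rip` kept as a DWORD64, and the emitted bytes would replace the cave's first `push`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian order.
static void emit32(std::vector<std::uint8_t>& code, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        code.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Emit code that leaves a full 64-bit return address on the stack,
// so the cave's final `ret` jumps back to the original RIP:
//   push imm32                     ; 68 id          8 bytes pushed, high half = sign bits
//   mov dword ptr [rsp+4], imm32   ; C7 44 24 04 id overwrite the high half
void emitPushReturnAddress(std::vector<std::uint8_t>& code, std::uint64_t oldRip) {
    const std::uint32_t lo = static_cast<std::uint32_t>(oldRip);
    const std::uint32_t hi = static_cast<std::uint32_t>(oldRip >> 32);
    const std::size_t start = code.size();

    code.push_back(0x68);                              // push imm32
    emit32(code, lo);

    code.insert(code.end(), {0xC7, 0x44, 0x24, 0x04}); // mov dword ptr [rsp+4], imm32
    emit32(code, hi);

    assert(code.size() - start == 13);
}
```

Casting `0x00007FF612345678` to a 32-bit type leaves only `0x12345678`, which is exactly the information the original single `push` loses.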
Also, make sure the stack is 16-byte aligned at the point where you call LoadLibrary. Some APIs throw exceptions if the stack is not aligned properly.
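Because the thread is hijacked at an arbitrary instruction, the cave cannot know RSP's alignment in advance, so it is safer to align explicitly. The sketch below is a hypothetical replacement for the listing's `mov rcx / mov rax / call rax` bytes. It also reserves the 32-byte shadow space that the Win64 calling convention requires the caller to provide; the listing omits it, so the callee is free to overwrite the saved r12 to r15 values just above the return address. It assumes RBX has already been pushed by the cave (as in the listing), which makes it safe to use as scratch; RBX is nonvolatile in the Win64 convention, so LoadLibrary preserves it. The two constants in the test are the ones from the listing (presumably the remote path string and LoadLibrary).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 64-bit value in little-endian order.
static void appendImm64(std::vector<std::uint8_t>& code, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        code.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Call fn(arg) with RSP 16-byte aligned and 32 bytes of shadow space,
// then restore the exact RSP the cave had before, so the pops still match.
void emitAlignedCall(std::vector<std::uint8_t>& code,
                     std::uint64_t arg, std::uint64_t fn) {
    const std::size_t start = code.size();
    code.insert(code.end(), {0x48, 0x89, 0xE3});       // mov rbx, rsp   (remember RSP)
    code.insert(code.end(), {0x48, 0x83, 0xE4, 0xF0}); // and rsp, -16   (align)
    code.insert(code.end(), {0x48, 0x83, 0xEC, 0x20}); // sub rsp, 20h   (shadow space)
    code.insert(code.end(), {0x48, 0xB9});             // mov rcx, imm64 (first argument)
    appendImm64(code, arg);
    code.insert(code.end(), {0x48, 0xB8});             // mov rax, imm64 (target function)
    appendImm64(code, fn);
    code.insert(code.end(), {0xFF, 0xD0});             // call rax
    code.insert(code.end(), {0x48, 0x89, 0xDC});       // mov rsp, rbx   (undo alignment)
    assert(code.size() - start == 36);
}
```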

Apparently, the main problem was that I allocated the code cave without PAGE_EXECUTE_READWRITE protection, so the memory was treated as data and not as executable code.

Related

Why is the gs segment register set to 0x0000000000000000 in Visual Studio x64 (MASM)?

I am currently reading "The Ultimate Anti-Debugging Reference" and trying to implement some of its techniques.
To check the value of NtGlobalFlag, they use this code:
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
I made all the adjustments needed to run x64 assembly in Visual Studio 2017, following this tutorial.
Because their syntax didn't assemble in Visual Studio, I replaced their gs:lodsq line with this instruction:
lodsq gs:[rsi]
But it still didn't work.
While debugging, I noticed that the gs register is 0x0000000000000000, while the fs register holds a real value, 0x0000007595377000.
I don't understand why GS is zero, because it should be set on x64.
64-bit Windows apparently uses fs to point to "per thread" memory, since gs is zero. I don't know which variables are kept in "per thread" memory, other than the seed value for rand(). You could debug a program that uses rand() and step through it in a disassembly window to see how that memory is accessed.
How well an anti-debugging feature works depends on how motivated someone is to defeat it. The main issues are Windows remote debugging and hacker-installed device drivers running in kernel mode that defeat anti-debugging features.
So I still don't understand why the code posted in my question caused so many problems; as I said, I copied it verbatim from The “Ultimate” Anti-Debugging Reference.
But I've found a simpler solution that works perfectly.
As Peter Cordes said, I should be fine just reading the value directly, without lodsq, like so:
mov rax, gs:[60h]
After further investigation, I found this reference code:
mov rax, gs:[60h]
mov al, [rax+BCh]
and al, 70h
cmp al, 70h
jz being_debugged
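The original snippet and this one read the same byte; the first just hides the offset in address arithmetic. After `push 60h / pop rsi`, `gs:lodsq` loads the PEB pointer from gs:[60h] into RAX and advances RSI by 8 (assuming the direction flag is clear, as the ABI guarantees at function boundaries), so `[rsi*2+rax-14h]` is `[rax + 0x68*2 - 0x14]`, i.e. `[rax+0BCh]`. The sketch below checks that arithmetic; the names are hypothetical, and a plain buffer stands in for the PEB so it runs anywhere.

```cpp
#include <cassert>
#include <cstdint>

// gs:[60h] in the x64 TEB holds the PEB pointer; hence `push 60h / pop rsi`.
constexpr std::uint64_t kTebPebSlot = 0x60;

// `gs:lodsq` loads the qword at gs:[rsi] into RAX and advances RSI by 8.
constexpr std::uint64_t kRsiAfterLodsq = kTebPebSlot + 8;                  // 0x68

// `mov al, [rsi*2+rax-14h]` therefore reads PEB + (0x68 * 2 - 0x14).
constexpr std::uint64_t kNtGlobalFlagOffset = kRsiAfterLodsq * 2 - 0x14;

static_assert(kRsiAfterLodsq == 0x68, "lodsq advances RSI by one qword");
static_assert(kNtGlobalFlagOffset == 0xBC, "same byte as mov al, [rax+0BCh]");

// Reads the byte the original snippet reads, given a stand-in PEB buffer.
std::uint8_t readNtGlobalFlagLikeSnippet(const std::uint8_t* peb) {
    assert(peb != nullptr);
    return peb[kRsiAfterLodsq * 2 - 0x14];
}
```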
I modified it slightly for my program:
.code
GetValueFromASM proc
mov rax, gs:[60h]
mov al, [rax+0BCh]
and al, 70h
cmp al, 70h
jz being_debugged
mov rax,0
ret
being_debugged:
mov rax, 1
ret
GetValueFromASM endp
end
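The mask 70h is the combination of three heap-debugging flags in NtGlobalFlag, which are typically set when a process is created by a debugger. A C++ mirror of the `and al, 70h / cmp al, 70h / jz being_debugged` test might look like the sketch below; the function name is hypothetical, and the input is the NtGlobalFlag byte rather than a live PEB read, so the logic can be checked without Windows.

```cpp
#include <cassert>
#include <cstdint>

// Heap-debugging flags tested by the assembly (0x10 | 0x20 | 0x40 == 0x70).
constexpr std::uint32_t FLG_HEAP_ENABLE_TAIL_CHECK   = 0x10;
constexpr std::uint32_t FLG_HEAP_ENABLE_FREE_CHECK   = 0x20;
constexpr std::uint32_t FLG_HEAP_VALIDATE_PARAMETERS = 0x40;
constexpr std::uint32_t kDebugHeapMask =
    FLG_HEAP_ENABLE_TAIL_CHECK | FLG_HEAP_ENABLE_FREE_CHECK | FLG_HEAP_VALIDATE_PARAMETERS;

static_assert(kDebugHeapMask == 0x70, "mask used by the assembly");

// Returns 1 only when all three flags are set, like GetValueFromASM.
int ntGlobalFlagSaysDebugged(std::uint8_t ntGlobalFlagLowByte) {
    return (ntGlobalFlagLowByte & kDebugHeapMask) == kDebugHeapMask ? 1 : 0;
}
```

Because the check requires all three bits, a debugger that starts the process without enabling the debug heap leaves the result at 0.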
Just one thing to note:
When running inside Visual Studio 2017, the function returned 0, meaning no debugger attached, which is wrong (I was using the Local Windows Debugger).
But when launching the process from WinDbg, it did return 1, which means the check works.

Calling putchar using x64 assembly through C++

So, I wrote a little library that lets me execute raw machine code (assembled instructions) from C++.
I thought I'd write a brainfuck-to-x64 compiler with it. Everything worked until I had to implement the . brainfuck instruction, which prints a character to stdout.
I know I need to pass the (only) argument in rcx (according to the Windows x64 calling convention). But I don't know how to set up the stack, or clean up after a function call. My ASM code is as follows:
push rbp ; This is the only thing I tried doing as a prolog
mov rcx, QWORD PTR [rbx+rax*4] ; rbx contains the address of an array (32-bit elements), and rax contains the index, the character byte is saved in that address
push rax ; Save rax so it can be restored after putchar clobbers it
push rcx ; Push rcx to use it as an argument
call r10 ; r10 contains the address of putchar
pop rcx ; Restore all clobbered registers
pop rax
pop rbp
This snippet works: a character gets written to stdout, but right after that I get "Access violation executing location 0x0000000000000000."
What am I missing?
It sounds like putchar isn't returning correctly, perhaps because rsp is being corrupted.
By the way, I got the address of putchar like this:
#include <cstdio>
uint_least64_t putchar_addr = (uint_least64_t)&std::putchar;
I need to get the pointer as an integer so I can append it to the code buffer as bytecode later.
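A sketch of one way to emit the `.` instruction so the call follows the Windows x64 convention: the argument only needs to be in RCX (pushing RCX passes nothing), the stack must be 16-byte aligned at the `call`, and the caller must reserve 32 bytes of shadow space, which the snippet above never does. It assumes, as in the question, that RBX holds the 32-bit cell array and RAX the index; RBP is used as a frame pointer because it is nonvolatile, so putchar preserves it. `movzx` loads just the character byte instead of a qword that also spans the next cell. The helper name `emitPutcharCall` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 64-bit value in little-endian order.
static void appendQword(std::vector<std::uint8_t>& code, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        code.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Emit putchar(cells[rax]) with cells (32-bit elements) addressed by RBX.
void emitPutcharCall(std::vector<std::uint8_t>& code, std::uint64_t putcharAddr) {
    const std::size_t start = code.size();
    code.push_back(0x50);                              // push rax  (volatile: save index)
    code.push_back(0x55);                              // push rbp
    code.insert(code.end(), {0x48, 0x89, 0xE5});       // mov rbp, rsp
    code.insert(code.end(), {0x48, 0x83, 0xE4, 0xF0}); // and rsp, -16 (align for the call)
    code.insert(code.end(), {0x48, 0x83, 0xEC, 0x20}); // sub rsp, 20h (shadow space)
    code.insert(code.end(), {0x0F, 0xB6, 0x0C, 0x83}); // movzx ecx, byte ptr [rbx+rax*4]
    code.insert(code.end(), {0x49, 0xBA});             // mov r10, imm64
    appendQword(code, putcharAddr);
    code.insert(code.end(), {0x41, 0xFF, 0xD2});       // call r10
    code.insert(code.end(), {0x48, 0x89, 0xEC});       // mov rsp, rbp (drop alignment + shadow)
    code.push_back(0x5D);                              // pop rbp
    code.push_back(0x58);                              // pop rax
    assert(code.size() - start == 35);
}
```

The address can come from the same cast the question already uses on `&std::putchar`.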

Calling windows functions from machine code

Here is the walkthrough I'm using: https://i.imgur.com/LIImg.jpg
From what I can see, to call a Windows function you put the arguments in specific registers. But where is it documented which registers to use, and in what order?
Looking at the code section of that image, it seems to use r8d, r9, edx and ecx. Does that mean it uses edx, ecx, r8d, r9d, r10d, etc.? What happens when you run out of registers for a function with many parameters?
Also, why does it have to subtract from the stack, and why 0x28?
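The order is fixed by the Microsoft x64 calling convention: the first four integer or pointer arguments go in RCX, RDX, R8 and R9 (edx, ecx and r8d are just the 32-bit halves of those registers, used for 32-bit arguments; floating-point arguments use XMM0 to XMM3 instead). R10 is not an argument register; the fifth and later arguments go on the stack, above a 32-byte "shadow space" that the caller must reserve. The sketch below uses hypothetical helper names to compute where each argument goes and how much a function should subtract from RSP, assuming it pushes nothing else and has no locals of its own, which is likely the situation behind the 0x28 in the image.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Where the caller places the i-th (0-based) integer/pointer argument
// just before executing `call`.
std::string win64ArgLocation(std::size_t i) {
    static const char* const regs[] = {"rcx", "rdx", "r8", "r9"};
    if (i < 4) return regs[i];
    // Arguments 5, 6, ... go on the stack, right above the 32-byte shadow space.
    std::ostringstream s;
    s << "[rsp+0x" << std::hex << (0x20 + 8 * (i - 4)) << "]";
    return s.str();
}

// Bytes to subtract from RSP in the prologue so the function can call a
// callee taking `calleeArgs` integer arguments.
std::size_t win64FrameSize(std::size_t calleeArgs) {
    const std::size_t stackArgs = calleeArgs > 4 ? calleeArgs - 4 : 0;
    std::size_t size = 0x20 + 8 * stackArgs;   // shadow space + outgoing stack args
    // On entry RSP % 16 == 8 (the call pushed an 8-byte return address),
    // and RSP must be 16-byte aligned again at the next call.
    if ((8 + size) % 16 != 0) size += 8;
    assert((8 + size) % 16 == 0);
    return size;
}
```

For four or fewer arguments this yields 0x20 of shadow space plus 8 bytes of padding, i.e. 0x28.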

fastcall how to use for more than 4 parameters

I was trying to build a function in assembly (FASM) that takes more than 4 parameters. In x86 it works fine, but I know that in x64 fastcall you have to spill the parameters into the shadow space in the order rcx, rdx, r8, r9. I read that the 5th and later parameters are passed on the stack, but I don't know how to handle them. This is what I tried, but it keeps saying "invalid operand". I know I'm handling the first 4 parameters correctly, because I have written x64 functions before; it's the last 3 I don't know how to spill:
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
If I try
mov [buffer3],rsp+8*4
it says "extra characters on the line".
I also saw that some people use rsp+20h, rsp+28h, etc., but that does not work either.
How do I handle more than 4 parameters using fastcall on x64?
Also, do I have to make room on the stack? I saw some people put add rsp,20h right before their spill code. I tried that, and it did not fix the invalid operand error.
Thanks
Update
After playing around with it for a bit, I found that the only way it seems to work is to spill the first 4 parameters and ignore the rest (the 5th onward):
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
;start the regular code. ignore spilling buffer3,startposition and length
On x86/x64 CPUs, the following instructions do not exist (mov has no memory-to-memory form):
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
As a workaround, use the rax register to read each value from memory and then write it to its destination:
mov rax,[rsp+8*4]
mov [buffer3],rax
mov rax,[rsp+8*5]
mov [startposition],rax
mov rax,[rsp+8*6]
mov [length],rax
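Two details complement the answer; the sketch below uses hypothetical helper names. Measured from RSP at function entry (before any push), every argument has an 8-byte slot at [rsp+8+8*i]: the home slots for RCX, RDX, R8 and R9 sit at [rsp+8] to [rsp+20h], and the fifth argument is at [rsp+28h], so at entry [rsp+8*4] addresses R9's home slot rather than buffer3. Also, arguments 5 and up are already in memory when the function starts, so there is nothing to spill; that is likely why the update's approach works, assuming FASM's proc macro maps the named parameters onto these slots.

```cpp
#include <cassert>
#include <cstddef>

// Offset of the i-th (0-based) argument's slot relative to RSP at function
// entry: [rsp] is the return address, [rsp+8]..[rsp+20h] are the home slots
// for rcx, rdx, r8, r9, and stack arguments start at [rsp+28h].
std::size_t win64ArgOffsetAtEntry(std::size_t i) {
    return 8 + 8 * i;
}

// The same slot relative to RBP after a `push rbp / mov rbp, rsp` prologue.
std::size_t win64ArgOffsetFromRbp(std::size_t i) {
    return win64ArgOffsetAtEntry(i) + 8;
}
```

For the question's proc, buffer3, startposition and length (i = 4, 5, 6) are at [rsp+28h], [rsp+30h] and [rsp+38h] at entry.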

Help: Application crashes on accessing source code

Here is a simple piece of asm code I inserted in a VC++ project. addr_curr_ebp holds the current value of EBP, which points to the saved (old) EBP inside the stack frame; 4 bytes above it is the return address into the application function. I extract a single byte from the code section. I run my code together with other applications such as gtalk, vlc, etc. The application always crashes when I include ProbStat 1 and 2 in my code; when I remove these statements, everything works fine. What do you think causes this?
__asm{
push eax
push ebx
push cx
mov ebx, addr_curr_ebp
mov eax, [ebx + 4]
mov cl, BYTE PTR [eax - 5] //ProbStat 1
mov ret_5, cl // ProbStat 2
pop cx
pop ebx
pop eax
}
Your code snippet isn't enough to see where ret_5 is located. You'll get an automatic crash if it is a member of a class: the ecx register holds the "this" pointer, and you're messing it up.
I'm not sure what this is for, but it sounds like you need the _ReturnAddress intrinsic. It returns the address of the instruction after the call instruction that called this code. Assign it to an unsigned char*; no assembly is needed this way.
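A minimal sketch of that suggestion: read the code byte 5 bytes before the return address without inline assembly. `_ReturnAddress` is MSVC-only (declared in `<intrin.h>`); on GCC and Clang the closest equivalent is `__builtin_return_address(0)`, swapped in here so the sketch also builds there. The function name `byteBeforeReturn` is hypothetical. On x86/x64 a direct `call rel32` is 5 bytes long and starts with opcode E8, which is presumably why the question reads `[eax - 5]`; other call forms have other lengths.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#pragma intrinsic(_ReturnAddress)
#define NOINLINE __declspec(noinline)
#define RETURN_ADDRESS() _ReturnAddress()
#else
#define NOINLINE __attribute__((noinline))
#define RETURN_ADDRESS() __builtin_return_address(0)
#endif

// Returns the code byte 5 bytes before this function's return address and,
// optionally, the return address itself. noinline keeps a real call site.
NOINLINE unsigned char byteBeforeReturn(const unsigned char** retOut) {
    const unsigned char* ret =
        static_cast<const unsigned char*>(RETURN_ADDRESS());
    assert(ret != nullptr);
    if (retOut != nullptr) *retOut = ret;
    return ret[-5];
}
```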