Call functions from x86_64 assembly - c++

I am trying to create my own JIT and have so far managed to run very simple assembly code (as machine code), but I am having trouble figuring out how to call functions this way. In Visual Studio I can see the functions in the disassembly window.
Another related question: how do I call the Win32 MessageBox() from machine code?
Next question: how do I call external DLL/LIB functions in this manner?
Also, are there any books or tutorials that could teach me more about this subject? I have tried searching for it, but I get results like .NET, JVM and LLVM, which I think are not really what I am looking for.
Here is a simplified version of the code that I am working on:
#include <cstdio> // printf
#include <cstring> // memcpy
#include <iostream>
#include <Windows.h>
int main(int argc, char* argv[])
{
// b8 03 00 00 00 83 c0 02 c3
unsigned char code[] = {
0xb8, // mov eax, 3
0x03, 0x00, 0x00, 0x00, // 3 (32 bit)
0x83, // add eax, 2 // 0x83 = add,
0xc0, // ModR/M with immediate 8 bit value
0x02, // 2 (8 bit)
0xc3 // ret
};
void* mem = VirtualAlloc(0, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(mem, code, sizeof(code));
DWORD old;
VirtualProtect(mem, sizeof(code), PAGE_EXECUTE_READ, &old); // size of the code buffer, not of the pointer
int(*func)() = reinterpret_cast<int(*)()>(mem);
printf("Number is %d\n", func());
VirtualFree(mem, 0, MEM_RELEASE);
return 0;
}
Is it possible to have the JIT assembly code to call a C++ function?
Before this project I made a byte-code interpreter in C++, but I wasn't really happy with its speed compared to an equivalent test program in C#: the C# version was roughly 25 times faster. So I stumbled on JIT compilation as a way to make it faster. I hope you can all see where I am taking this JIT project. If possible, I would also like to make it handle a GUI.

You can probably find some tutorials about writing a compiler/linker; they may help with implementing and calling dynamic libraries.
I'm not sure exactly what you mean by calling C++ functions. Anyway, I wrote the following demo program; take a look and see whether it helps.
#include <Windows.h>
#include <iostream>
using namespace std;
__int64 sub(__int64 a, __int64 b)
{
return a - b;
}
int main(int argc, char **argv)
{
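// unsigned char avoids narrowing errors for byte values above 0x7F
unsigned char code[] =
{
// setFunc: returns its argument, which leaves the target address in rax
0x48, 0x89, 0xC8, // mov rax, rcx
0xC3, // ret
// callFunc: calls the address still held in rax; rcx and rdx pass through unchanged
0x48, 0x83, 0xEC, 0x28, // sub rsp, 0x28 (32-byte shadow space + 8 to keep rsp 16-byte aligned)
0xFF, 0xD0, // call rax
0x48, 0x83, 0xC4, 0x28, // add rsp, 0x28
0xC3 // ret
};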
char *mem = static_cast<char *>(VirtualAlloc(0, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE));
MoveMemory(mem, code, sizeof(code));
auto setFunc = reinterpret_cast<void *(*)(void *)>(mem);
auto callFunc = reinterpret_cast<__int64 (*)(__int64, __int64)>(mem + 4);
setFunc(reinterpret_cast<void *>(sub)); // function pointers do not convert to void * implicitly in standard C++
__int64 r = callFunc(0, 1);
cout << "r = " << r << endl;
VirtualFree(mem, 0, MEM_RELEASE);
cin.ignore();
return 0;
}
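The demo hands the target address over in rax across two separate calls, which only works while the compiler happens to leave rax untouched in between. A more common pattern is to embed the address in the generated code itself. The sketch below only builds the bytes: make_call_thunk is a hypothetical helper name, and it assumes the Windows x64 calling convention (arguments already in rcx, rdx, r8, r9) and a 64-bit build. Executing the result still needs the VirtualAlloc/memcpy steps used in the programs above. The same thunk should in principle work for an address obtained from GetProcAddress (for example for MessageBoxA), though that is not tested here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original answer): builds a thunk that
// forwards its register arguments to `target` and returns target's result.
std::vector<std::uint8_t> make_call_thunk(const void* target) {
    std::vector<std::uint8_t> code = {
        0x48, 0x83, 0xEC, 0x28,  // sub rsp, 0x28 (shadow space + alignment)
        0x48, 0xB8,              // mov rax, imm64
        0, 0, 0, 0, 0, 0, 0, 0,  // placeholder for the 64-bit address
        0xFF, 0xD0,              // call rax
        0x48, 0x83, 0xC4, 0x28,  // add rsp, 0x28
        0xC3                     // ret
    };
    const std::size_t immOffset = 6;  // first byte of the imm64 operand
    const std::uint64_t addr = reinterpret_cast<std::uintptr_t>(target);
    // x86-64 immediates are little-endian: lowest byte first.
    for (int i = 0; i < 8; ++i) {
        code[immOffset + i] = static_cast<std::uint8_t>(addr >> (8 * i));
    }
    return code;
}
```

With this layout the JIT never depends on register contents surviving between calls; each generated call site carries its own target address.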

Related

Find Two-byte Illegal Opcodes for x86-64

My goal is to find what two-byte opcodes generate an illegal instruction exception.
For example, the opcode 0F 0B (UD2) raises an invalid opcode exception. The UD2 instruction is provided for software testing, to explicitly generate an invalid opcode.
Warning: snake-oil code ahead, as I'm not familiar with Windows internals.
The code below allocates a 4K page with read/write/execute permissions and, using UD2 as a starting point, tries to determine all the possible illegal two-byte opcodes.
First, it copies the two opcode bytes to the last two bytes of the 4K page,
then executes them and checks the exception code.
I figured that executing the last two bytes of the page would either:
generate an illegal-instruction exception (EXCEPTION_ILLEGAL_INSTRUCTION) if the instruction is exactly two bytes long, or
generate an access violation (EXCEPTION_ACCESS_VIOLATION) if the instruction extends beyond the 4K page.
Running the code below shows interesting instructions plus many unknowns too:
Illegal opcodes 0x0f 0x0b (error 0xc000001d)
ud2 - Generates an invalid opcode.
Illegal opcodes 0x0f 0x37 (error 0xc000001d)
getsec - Exit authenticated code execution mode.
Illegal opcodes 0x0f 0xaa (error 0xc000001d)
rsm - Resume operation of interrupted program.
Question
The hackish code runs fine for this opcode range:
Executing opcodes 0x0f 0x0b ... Executing opcodes 0x0f 0xcb
until it encounters these two opcode bytes:
0x0f 0xcc bswap esp
It seems that anything that manipulates the stack pointer causes issues: execution gets stuck at this point (clicking Continue just repeats the message).
I've tried moving the opcode execution into its own thread since they have their own stack, but that didn't help!
Is there a way to preserve the stack pointers RSP and RBP or maybe there's a simple fix to resolve it?
(Built using Microsoft Visual C++ 2019)
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <intrin.h>
// The UD2 (0x0F, 0x0B) instruction is guaranteed to generate an invalid opcode exception.
DWORD InstructionResult;
void ExecuteOpcodes(LPVOID mem)
{
__try
{
// Execute opcodes...
((void(*)())((unsigned char*)mem + 0xFFE))();
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
InstructionResult = GetExceptionCode();
}
}
int main()
{
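// The size is rounded up to a whole 4K page, so offsets 0xFFE and 0xFFF are valid.
LPVOID mem = VirtualAlloc(NULL, 2, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
// VirtualProtect returns a BOOL; the previous protection is written to oldProtect.
DWORD oldProtect = 0;
VirtualProtect(mem, 2, PAGE_EXECUTE_READWRITE, &oldProtect);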
// Start searching at the UD2 (0x0F, 0x0B) instruction which is guaranteed to generate an invalid opcode exception.
for (int i = 15; i <= 255; i++)
{
for (int j = 11; j <= 255; j++)
{
// Write two byte opcodes at the 4K page end.
*((unsigned char*)mem + 0xFFE) = i;
*((unsigned char*)mem + 0xFFF) = j;
printf("Executing opcodes 0x%02x 0x%02x\n",i,j);
HANDLE hThread = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)ExecuteOpcodes, mem, 0, 0);
WaitForSingleObject(hThread, INFINITE);
CloseHandle(hThread);
if (InstructionResult == EXCEPTION_ILLEGAL_INSTRUCTION)
{
printf("Illegal opcodes 0x%02x 0x%02x (error 0x%08x)\n", i, j, InstructionResult);
}
}
}
VirtualFree(mem, 0, MEM_RELEASE);
return 0;
}
UPDATE
Based upon the good answer about creating a child process, I've pasted the updated code here: pastebin.com/j3NkL44q
#define _CRT_SECURE_NO_WARNINGS
#include <windows.h>
#include <stdio.h>
#include <stdlib.h> // exit
// Array of illegal single-byte opcodes
int IllegalSingleByteOpcodes[] = { 0x06 ,0x07, 0x0e, 0x16, 0x17, 0x1e, 0x1f, 0x27, 0x2f, 0x37, 0x3f, 0x60, 0x61, 0xce, 0xd6 };
// Check if a given opcode is illegal
bool is_illegal_opcode(int opcode) {
for (size_t i = 0; i < sizeof(IllegalSingleByteOpcodes) / sizeof(IllegalSingleByteOpcodes[0]); i++) {
if (IllegalSingleByteOpcodes[i] == opcode) {
return true;
}
}
return false;
}
// Create and wait for a process with the given opcodes
void create_and_wait_for_process(int opcode1, int opcode2) {
// Set up startup and process info
STARTUPINFOA si;
PROCESS_INFORMATION pi;
ZeroMemory(&si, sizeof(si));
si.cb = sizeof(si);
ZeroMemory(&pi, sizeof(pi));
// Create command line string
char cmdline[256];
snprintf(cmdline, sizeof(cmdline), "IllegalOpcodes.exe %d %d", opcode1, opcode2);
// Create process; the explicit ANSI variant matches the char command line even in Unicode builds
if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
printf("CreateProcess failed (%lu).\n", GetLastError());
exit(1);
}
// Wait for process to finish
if (WaitForSingleObject(pi.hProcess, 1000) != WAIT_OBJECT_0) {
TerminateProcess(pi.hProcess, 0);
}
// Clean up handles
CloseHandle(pi.hProcess);
CloseHandle(pi.hThread);
}
int main() {
// Iterate through all possible opcode pairs
for (int opcode1 = 0; opcode1 <= 255; opcode1++) {
// Skip illegal opcodes
if (is_illegal_opcode(opcode1)) {
printf("\nSkipping Illegal Opcode 0x%02x ...\n", opcode1);
continue;
}
printf("\nChecking Opcode 0x%02x ...\n", opcode1);
for (int opcode2 = 0; opcode2 <= 255; opcode2++) {
create_and_wait_for_process(opcode1, opcode2);
}
}
return 0;
}
------------------- IllegalOpcodes.cpp -------------------
#include <windows.h>
#include <stdio.h>
#include <stdlib.h> // atoi
// Offset to write opcodes at the 4K page end
#define CODE_PAGE_END_OFFSET 0xFFE
// Write two byte opcodes at the 4K page end and execute them
void write_and_execute_opcodes(LPVOID code_page_mem, int opcode1, int opcode2)
{
*((unsigned char*)code_page_mem + CODE_PAGE_END_OFFSET) = opcode1;
*((unsigned char*)code_page_mem + CODE_PAGE_END_OFFSET + 1) = opcode2;
__try
{
// Execute opcodes...
((void(*)())((unsigned char*)code_page_mem + CODE_PAGE_END_OFFSET))();
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
switch (GetExceptionCode()) {
case EXCEPTION_ILLEGAL_INSTRUCTION:
printf("{0x%02x,0x%02x},", opcode1, opcode2);
break;
default:
// Ignore other exceptions
break;
}
}
}
int main(int argc, char* const argv[])
{
// Both opcode bytes are required on the command line.
if (argc < 3)
{
return 1;
}
int opcode1 = atoi(argv[1]);
int opcode2 = atoi(argv[2]);
LPVOID code_page_mem = VirtualAlloc(NULL, 2, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
DWORD old_protect = 0;
VirtualProtect(code_page_mem, 2, PAGE_EXECUTE_READWRITE, &old_protect);
write_and_execute_opcodes(code_page_mem, opcode1, opcode2);
VirtualFree(code_page_mem, 0, MEM_RELEASE);
return 0;
}
It now includes checks to avoid these single-byte illegal opcodes:
Opcode  32-bit (legal)  64-bit (illegal)
06      push es         ???
07      pop es          ???
0e      push cs         ???
16      push ss         ???
17      pop ss          ???
1e      push ds         ???
1f      pop ds          ???
27      daa             ???
2f      das             ???
37      aaa             ???
3f      aas             ???
60      pushad          ???
61      popad           ???
ce      into            ???
d6      ???             ???   <--- http://ref.x86asm.net/coder64.html#xD6
"maybe there's a simple fix to resolve it?"
The UNIX-standard way to resolve this is to do all the test execution in a child process.
When I last worked on Windows 15 years ago, creating a child process was very expensive (slow). But since you have fewer than 64K byte combinations to try, even a slow mechanism will get you all the answers in at most a few hours.

How to use sys headers in Windows (or find their MSVC counterparts)?

I was learning how to build a JIT compiler and stumbled across this piece of code (attached below):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
// Allocates RWX memory of given size and returns a pointer to it. On failure,
// prints out the error and returns NULL.
void* alloc_executable_memory(size_t size) {
void* ptr = mmap(0, size,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (ptr == (void*)-1) {
perror("mmap");
return NULL;
}
return ptr;
}
void emit_code_into_memory(unsigned char* m) {
unsigned char code[] = {
0x48, 0x89, 0xf8, // mov %rdi, %rax
0x48, 0x83, 0xc0, 0x04, // add $4, %rax
0xc3 // ret
};
memcpy(m, code, sizeof(code));
}
const size_t SIZE = 1024;
typedef long (*JittedFunc)(long);
// Allocates RWX memory directly.
void run_from_rwx() {
void* m = alloc_executable_memory(SIZE);
emit_code_into_memory(m);
JittedFunc func = m;
int result = func(2);
printf("result = %d\n", result);
}
Before littering my terminal with error messages, I searched MSDN for these functions and, to my surprise, none of them turned up. These are apparently POSIX header files that are unavailable on Windows. My question is: do MSVC alternatives to these headers exist?
I have installed Cygwin, but I get a header-not-found error.
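For orientation, the other programs in this thread allocate executable memory on Windows with VirtualAlloc and PAGE_EXECUTE_READWRITE from <windows.h>, which plays roughly the role that mmap with PROT_EXEC plays here. A minimal sketch of a wrapper that compiles on both platforms follows; the wrapper names are reused from the snippet above for illustration, free_executable_memory is a hypothetical addition, and the Windows branch is an assumption based on those other programs rather than something tested here.

```cpp
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Returns a pointer to readable, writable and executable memory,
// or nullptr on failure.
void* alloc_executable_memory(std::size_t size) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE,
                        PAGE_EXECUTE_READWRITE);
#else
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Hypothetical counterpart that releases the memory again.
void free_executable_memory(void* p, std::size_t size) {
#ifdef _WIN32
    (void)size;  // VirtualFree with MEM_RELEASE requires a size of 0
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, size);
#endif
}
```

Note that the machine code in emit_code_into_memory reads its argument from rdi (the System V convention); under the Windows x64 convention the first argument arrives in rcx, so the emitted bytes would need adjusting as well.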

Cast shellcode inside function pointer

I am trying to perform a system call on 32-bit, but there is an issue.
I originally had a naked function as my stub and used inline assembly, but when I tried to turn it into shellcode, it stopped working (access violation executing NULL), despite being a 1-to-1 copy of the naked function when viewed in Visual Studio's disassembly.
It worked perfectly with the naked function, by the way.
Here is the shellcode I wrote:
0: b8 26 00 00 00 mov eax,0x26
5: 64 ff 15 c0 00 00 00 call DWORD PTR fs:0xc0
c: c3 ret
And here is the code. Memory is allocated successfully; the problem is that whenever I call NtOpenProcess, it attempts to execute a null pointer, resulting in an access violation.
typedef NTSTATUS(NTAPI * f_NtOpenProcess)(PHANDLE, ACCESS_MASK, OBJECT_ATTRIBUTES *, CLIENT_ID *);
INT
main(
VOID
)
{
HANDLE hProcess = NULL;
OBJECT_ATTRIBUTES oaAttributes;
memset(&oaAttributes,
NULL,
sizeof(oaAttributes));
oaAttributes.Length = sizeof(oaAttributes);
CLIENT_ID ciClient;
ciClient.UniqueProcess = GetCurrentProcessId();
ciClient.UniqueThread = NULL;
BYTE Stub[] = { 0xB8, 0x00, 0x00, 0x00, 0x00, 0x64, 0xFF, 0x15, 0x0C, 0x00, 0x00, 0x00, 0xC3 };
*(DWORD*)(Stub + 1) = 0x26;
PVOID Mem = VirtualAlloc(NULL, sizeof(Stub), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
memcpy(Mem, &Stub, sizeof(Stub));
DWORD Old = NULL;
VirtualProtect(Mem, sizeof(Stub), PAGE_EXECUTE, &Old);
f_NtOpenProcess NtOpenProcess = (f_NtOpenProcess)Mem;
DWORD Status = NtOpenProcess(&hProcess,
PROCESS_ALL_ACCESS,
&oaAttributes,
&ciClient);
printf("Status: 0x%08X\nHandle: 0x%08X\n",
Status,
hProcess);
getchar();
return NULL;
}
If anyone is wondering why am I doing this, I am really bored and I like to mess around with code when I am :)
As noted by Michael Petch, the shellcode was wrong.
I got only one byte wrong: 0x0C should have been 0xC0.
If anyone ever attempts something as stupid and useless as I did, double-check your shellcode first!
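One way to catch a transposed byte like this is to compare the stub against the bytes of the disassembly listing before executing anything. The sketch below does only that comparison and runs nothing; make_syscall_stub, kListing and stub_matches_listing are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstdint>

using Stub = std::array<std::uint8_t, 13>;

// Hypothetical helper: builds the 13-byte stub for a given system call number.
Stub make_syscall_stub(std::uint32_t number) {
    Stub s = {
        0xB8, 0, 0, 0, 0,                          // mov eax, imm32
        0x64, 0xFF, 0x15, 0xC0, 0x00, 0x00, 0x00,  // call dword ptr fs:[0xC0]
        0xC3                                       // ret
    };
    // Patch the little-endian imm32 operand of mov eax.
    for (int i = 0; i < 4; ++i) {
        s[1 + i] = static_cast<std::uint8_t>(number >> (8 * i));
    }
    return s;
}

// Bytes copied from the disassembly listing in the question.
const Stub kListing = {
    0xB8, 0x26, 0x00, 0x00, 0x00,
    0x64, 0xFF, 0x15, 0xC0, 0x00, 0x00, 0x00,
    0xC3
};

// True when the stub is byte-for-byte identical to the listing.
bool stub_matches_listing(const Stub& s) {
    return s == kListing;
}
```

Calling stub_matches_listing on the array from the question, with 0x0C at index 8, would return false and point at the mistake before the call is ever made.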

Application crashes when using hooked function with cout

I'm trying to hook a function using its address, and so far it is overall working great!
The only problem is that when std::cout is used while MessageBoxA from the WinAPI is hooked, it crashes! The weird thing is that it only crashes in that specific case, not when the call is combined with printf or written simply as int i = MessageBoxA(...);
For testing, I made the instructions at the function address return 32 directly. Not much of a hook, I know, but this is just for testing.
// mov eax, 32
// ret
const DWORD instructionCount = 6;
BYTE instructions[instructionCount] = { 0xB8, 0x20, 0x00, 0x00, 0x00, 0xC3 };
Apart from changing the protection on the region with VirtualProtect(),
I basically just do
memcpy(MessageBoxA, instructions, instructionCount);
Now testing it using this:
int i = MessageBoxA(NULL, "Hello World", "Injector", MB_OK);
printf("Works: %d\n", i);
printf("Works: %d\n", MessageBoxA(NULL, "Hello World", "Injector", MB_OK));
std::cout << "Works: " << i << std::endl;
std::cout << "Doesn't: " << MessageBoxA(NULL, "Hello World", "Injector", MB_OK) << std::endl;
printf("Hello World\n");
It then crashes just after std::cout << MessageBoxA(...). Remove that line and everything works!
Note that it successfully prints 32; it crashes when reaching the next statement.
Again, it is only in that case that it doesn't work. Using this instead:
__declspec(noinline) int testFunction(int i)
{
return i;
}
and then reusing the above code with MessageBoxA changed to testFunction (and the arguments adjusted), all 4 statements work!
Bottom line: does anybody have any idea what is causing the crash in that specific case, when the other cases work perfectly fine?
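I think the issue is that you're corrupting the stack pointer. Assuming a 32-bit build, MessageBoxA is __stdcall, so the callee has to pop its four arguments (16 bytes) itself; a bare ret (0xC3) leaves them on the stack. Try using the following: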
const DWORD instructionCount = 8;
BYTE instructions[instructionCount] = { 0xB8, 0x20, 0x00, 0x00, 0x00, 0xC2, 0x10, 0x0 };
That will pop the args off the stack, as Peter Cordes mentioned (0xC2 0x10 0x00 encodes ret 16). Hope that helps.
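The 0x10 in the replacement bytes is simply the argument count times four. A small sketch that derives the stub for any 32-bit __stdcall function with DWORD-sized arguments could look like this; make_stdcall_return_stub is a hypothetical name, and the sketch only builds the bytes without patching anything.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: builds "mov eax, value; ret argCount*4" for a 32-bit
// __stdcall function whose arguments each occupy 4 bytes on the stack.
std::array<std::uint8_t, 8> make_stdcall_return_stub(std::uint32_t value,
                                                     std::uint16_t argCount) {
    const std::uint16_t argBytes = static_cast<std::uint16_t>(argCount * 4);
    std::array<std::uint8_t, 8> stub = {
        0xB8,                                     // mov eax, imm32
        static_cast<std::uint8_t>(value),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 24),
        0xC2,                                     // ret imm16
        static_cast<std::uint8_t>(argBytes),
        static_cast<std::uint8_t>(argBytes >> 8)
    };
    return stub;
}
```

For MessageBoxA, make_stdcall_return_stub(32, 4) yields the same eight bytes as the array above. A cdecl function such as testFunction needs no such cleanup, because the caller removes the arguments.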

C++ Detour on winsock recv hooking - custom packet

I'm trying to inject an additional packet in my MyRecv function, but I don't know why it isn't working. I tried parsing incoming packets, and that works fine.
So my way of sending a custom packet to the application is probably wrong.
In general, I just want to send a prepared packet to the application.
I took this packet from WPE PRO.
Code with MyRecv function:
INT WINAPI MyRecv(SOCKET sock, CHAR* buf, INT len, INT flags) {
CHAR buffer[256];
char msg2[] = { 0x1B, 0, 0x04, 0x06, 0, 0x5A, 0x65, 0x6E, 0x74, 0x61,
0x78, 0x06, 0, 0x5A, 0x65, 0x6E, 0x74, 0x61, 0x78, 0x05, 0x07, 0,
0x66, 0x61, 0x6A, 0x6E, 0x69, 0x65, 0x65 };
int ret = precv(sock, buf, len, flags);
if (ret <= 0) {
return ret;
}
if (fake_recv) {
char tmp[256];
fake_recv = false;
printf("Fake1-> Lenght:%d Size:%d", len, strlen(buf));
strcat(buf, msg2);
printf("Fake2-> Lenght:%d Size:%d", len, strlen(buf));
return ret;
}
return ret;
}
msg2 isn't a null-terminated string. In fact it has an interior null. So using strlen() and strcat() with it is never going to work.
Similarly you neither know nor care what's already in buf, so calling strcat() and strlen() on that is both pointless and dangerous: if it contains no nulls at all you will over-run it, and at best over-report the length, and at worst crash.
And you're not adjusting ret for the extra data added into the buffer.
And no useful purpose is accomplished by declaring the unused tmp[] variable.
Try this:
if (fake_recv) {
fake_recv = false;
printf("Fake1-> Length:%d Received:%d", len, ret);
int len2 = min(len - ret, (int)sizeof(msg2)); // compare as int to avoid a signed/unsigned mismatch
memcpy(&buf[ret], msg2, len2);
ret += len2;
printf("Fake2-> Length:%d Received:%d", len, ret);
return ret;
}
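To make the strlen() point concrete, here is a small standalone sketch using the same bytes as msg2. kPacket and append_bytes are hypothetical names, and no sockets are involved; the helper only shows the bounded copy on a plain buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Same bytes as msg2 in the question: 29 bytes, with a 0 at index 1.
const char kPacket[] = { 0x1B, 0, 0x04, 0x06, 0, 0x5A, 0x65, 0x6E, 0x74, 0x61,
                         0x78, 0x06, 0, 0x5A, 0x65, 0x6E, 0x74, 0x61, 0x78, 0x05,
                         0x07, 0, 0x66, 0x61, 0x6A, 0x6E, 0x69, 0x65, 0x65 };

// Hypothetical helper: appends up to extraLen bytes after the `used` bytes
// already in buf (capacity `cap`) and returns the new byte count.
int append_bytes(char* buf, int cap, int used, const char* extra, int extraLen) {
    const int n = std::min(cap - used, extraLen);
    if (n <= 0) {
        return used;  // no room left, nothing copied
    }
    std::memcpy(buf + used, extra, static_cast<std::size_t>(n));
    return used + n;
}
```

Here std::strlen(kPacket) stops at the 0 in the second byte and reports 1, while sizeof kPacket is 29, which is why the length has to travel alongside the data instead of being recovered with string functions.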