I'm trying to write a thunk for __thiscall using a struct.
I've tested this struct and it works:
#pragma pack(push, 1)
struct Thunk
{
unsigned short leaECX;
unsigned long pThis;
unsigned char movEAX;
unsigned long pMemFunc;
unsigned short jmpEAX;
};
#pragma pack(pop)
I fill this struct with the following bytecode (which I found online):
//Load effective address of this to ECX
//because __thiscall expect to get 'this' in ECX
leaECX = 0x0D8D;
pThis = here goes 'this' pointer;
//Move member function pointer to EAX
movEAX = 0xB8;
pMemFunc = here goes pointer to member function;
//Jump to member function
jmpEAX = 0xE0FF;
My question is: can the movEAX and jmpEAX instructions be replaced with the bytecode for an assembly call instruction?
If so, how do I do it?
I'm allocating this struct using VirtualAlloc with the flags MEM_COMMIT | MEM_RESERVE and PAGE_EXECUTE_READWRITE.
Is this a compact approach, or does it waste memory (allocating a whole page instead of sizeof(Thunk))?
You can use call, but then execution will return to your thunk, so you need more code after it. Also, if you get rid of the mov, I assume you will want the call address variant (call rel32); be mindful that it uses relative encoding, so you can't just poke the absolute address into memory.
You can switch to relative jump to get rid of the mov, using something like this:
#pragma pack(push, 1)
struct Thunk
{
unsigned short leaECX;
unsigned long pThis;
unsigned char jmp;
unsigned long pOffset;
};
#pragma pack(pop)
//Load effective address of this to ECX
//because __thiscall expect to get 'this' in ECX
leaECX = 0x0D8D;
pThis = here goes 'this' pointer;
jmp = 0xE9;
pOffset = (char*)address_of_member - (char*)&thunk.pOffset - 4;
Since memory protections are page granular you will need at least a page (VirtualAlloc does round up for you automatically). If you have multiple thunks you can of course pack them into the same page.
How do I find the index in a VMT hook? I'm sorry, I'm just starting to learn about another hooking method. I already know about HWBP hooks, Detours, etc., but VMT hooking confuses me, and I can't find any forum that can help.
This is the asm that I want to hook: http://prntscr.com/siz04i and this is my main function:
float Delay = 0;
_declspec (naked) void MainFunction()
{
_asm
{
movss xmm0, Delay
movss [esi + 0x58], xmm0
jmp HookFunctionCall
}
}
And this is the VMT hook function that I want to use:
void* HookVTableFunction(void* pVTable, void* fnHookFunc, int nOffset)
{
intptr_t ptrVtable = *((intptr_t*)pVTable); // Pointer to our chosen vtable
intptr_t ptrFunction = ptrVtable + sizeof(intptr_t) * nOffset; // Address of the slot (the vtable is a zero-indexed array of pointer-sized entries: 4 bytes on x86, 8 on x64)
intptr_t ptrOriginal = *((intptr_t*)ptrFunction); // Save original address
// Edit the memory protection so we can modify it
MEMORY_BASIC_INFORMATION mbi;
VirtualQuery((LPCVOID)ptrFunction, &mbi, sizeof(mbi));
VirtualProtect(mbi.BaseAddress, mbi.RegionSize, PAGE_EXECUTE_READWRITE, &mbi.Protect);
// Overwrite the old function with our new one
*((intptr_t*)ptrFunction) = (intptr_t)fnHookFunc;
// Restore the protection
VirtualProtect(mbi.BaseAddress, mbi.RegionSize, mbi.Protect, &mbi.Protect);
// Return the original function address in case we want to call it
return (void*)ptrOriginal;
}
To get the index of a vtable function, subtract the address of the vtable from the address of the function's slot (the function pointer inside the vtable), then divide the result by the pointer size: 4 on x86, 8 on x64.
You can find the address of the vtable easily: a pointer to it is stored at offset 0x0 of every object of that class (with multiple inheritance there may be additional vtable pointers at other offsets).
Vtables only exist if the class has virtual functions.
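The index arithmetic above can be sketched on a hand-built table of function pointers. This is only a model: FakeObject, vtableIndex and hookSlot are hypothetical names, and the table here is an ordinary writable array, whereas a real compiler-generated vtable usually sits in read-only memory and needs the VirtualProtect step shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Slot = void (*)();

// Index = (address of the slot - address of the table) / pointer size.
inline std::size_t vtableIndex(const void* vtableBase, const void* slotAddress) {
    const auto base = reinterpret_cast<std::uintptr_t>(vtableBase);
    const auto slot = reinterpret_cast<std::uintptr_t>(slotAddress);
    assert(slot >= base);
    return (slot - base) / sizeof(void*);
}

// Stand-in for an object: the pointer to the table sits at offset 0.
struct FakeObject {
    Slot* vtable;
};

// Records which function ran last, so the effect of the hook is observable.
inline int& lastCalled() {
    static int value = 0;
    return value;
}
inline void slotA()   { lastCalled() = 1; }
inline void slotB()   { lastCalled() = 2; }
inline void hookedB() { lastCalled() = 42; }

// Swap one slot and hand back the original so it can still be called.
inline Slot hookSlot(FakeObject& object, std::size_t index, Slot replacement) {
    Slot original = object.vtable[index];
    object.vtable[index] = replacement;
    return original;
}
```

Calling through the table after hookSlot reaches the replacement, while the returned pointer still reaches the original function.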
I want to print the return value in my tracer. I have two questions:
How do I get the return address?
Is the return position updated before or after ~Tracer() runs?
struct Tracer
{
int* _retval;
~Tracer()
{ printf("return value is %d", *_retval); }
};
int foo()
{
Tracer __tracter = { __Question_1_how_to_get_return_address_here__ };
if(cond) {
return 0;
} else {
return 99;
}
//Question-2:
// is the return position updated before or after ~Tracer() is called?
}
I found some hints for Question 1; checking the VC++ code now.
For gcc, __builtin_return_address
http://gcc.gnu.org/onlinedocs/gcc/Return-Address.html
For Visual C++, _ReturnAddress
You can't portably or reliably do this in C++. The return value may be in memory or in a register and may or may not be indirected in different cases.
You could probably use inline assembly to make something work on certain hardware/compilers.
One possible way is to make your Tracer a template that takes a reference to a return value variable (when appropriate) and prints that out before destructing.
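A minimal sketch of that template idea follows. The Tracer here holds a reference to a local result variable and prints it from its destructor; the stream parameter and the helper traceOf are additions for the example, not part of the original question. It relies on the fact that the operand of return is copied into the return value before local destructors run, so the tracer still sees the final value.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

template <typename T>
class Tracer {
public:
    Tracer(const T& retval, std::ostream& out) : retval_(retval), out_(out) {}
    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;

    // Runs when the traced function leaves scope, after the return
    // expression has already been evaluated.
    ~Tracer() { out_ << "return value is " << retval_ << '\n'; }

private:
    const T& retval_;
    std::ostream& out_;
};

inline int foo(bool cond, std::ostream& out) {
    int result = 0;                    // every return path goes through this variable
    Tracer<int> tracer(result, out);   // declared after 'result', so destroyed first
    if (cond) {
        result = 0;
    } else {
        result = 99;
    }
    return result;
}

// Convenience helper: run foo and capture what the tracer printed.
inline std::string traceOf(bool cond) {
    std::ostringstream log;
    const int value = foo(cond, log);
    assert(value == 0 || value == 99);
    return log.str();
}
```

The cost of this design is that every return statement has to go through the traced variable; an early `return 5;` would bypass it.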
Also note that identifiers with __ (double underscore) are reserved for the implementation.
Your question is rather confusing, you're interchangeably using the terms "address" and "value", which are not interchangeable.
Return value is what the function spits out, in x86(_64) that comes in the form of a 4/8 byte value in E/RAX, or EDX:EAX, or XMM0, etc, you can read more about it here.
The return address, on the other hand, is what E/RSP points to right after a call is made (the value on top of the stack); it holds the address the function "jumps" back to when it's done (which is, by definition, returning).
Now I don't even know what a tracer is tbh, but I can tell you how you'd get either, it's all about hooks.
For the value, and assuming you're doing it internally, just hook the function with one with the same definition, and once it returns you'll have your result.
For the address it's a bit more complicated because you'll have to go a bit lower and possibly do some asm shenanigans. I really have no idea what exactly you are looking to accomplish, but I made a little "stub", if you will, to provide the callee with the return pointer.
Here it is:
void __declspec(noinline) __declspec(naked) __stdcall _replaceFirstArgWithRetPtrAndJump_() {
__asm { //let's call the function we jump to "callee", and the function that called us "caller"
push ebp //save ebp, ESP IS NOW -4
mov ebp, [esp + 4] //save return address
mov eax, [esp + 8] //get callee's address (which is the first param) - eax is volatile so iz fine
mov [esp + 8], ebp //put the return address where the callee's address was (to the callee, it will be the caller)
pop ebp //restore ebp
jmp eax //jump to callee
} }
#define CallFunc_RetPtr(Function, ...) ((decltype(&Function))_replaceFirstArgWithRetPtrAndJump_)(Function, __VA_ARGS__)
unsigned __declspec(noinline) __stdcall printCaller(void* caller, unsigned param1, unsigned param2) {
printf("I'm printCaller, Called By %p; Param1: %u, Param2: %u\n", caller, param1, param2);
return 20;
}
void __declspec(noinline) doshit() {
printf("us: %p\nFunction we're calling: %p\n", doshit, printCaller);
CallFunc_RetPtr(printCaller, 69, 420);
}
Now sure, you could and maybe should use _ReturnAddress() or any different compiler's intrinsics, but if that's not available (which should be a really rare scenario depending on your work) and you know your ASM, this concept should work for any architecture, since however different the instruction set may be, they all follow the same Program Counter design.
I wrote this more because I was looking for an answer for this quite a long time ago for a certain purpose, and I couldn't find a good one since most people just go "hurr durr it's not possible or portable or whatever", and I feel like this would have helped.
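For comparison, the intrinsic route mentioned above can be wrapped so the same code builds with MSVC and with GCC/Clang. This is a sketch: the macro names TRACE_RETURN_ADDRESS and TRACE_NOINLINE and the function whoCalledMe are invented for the example; only _ReturnAddress and __builtin_return_address are the compilers' own.

```cpp
#include <cassert>
#include <cstdio>

#if defined(_MSC_VER)
#include <intrin.h>
#pragma intrinsic(_ReturnAddress)
#define TRACE_RETURN_ADDRESS() _ReturnAddress()
#define TRACE_NOINLINE __declspec(noinline)
#else
// GCC and Clang: level 0 is the return address of the current function.
#define TRACE_RETURN_ADDRESS() __builtin_return_address(0)
#define TRACE_NOINLINE __attribute__((noinline))
#endif

// Kept out of line so that "the caller" is a real call site.
TRACE_NOINLINE static void* whoCalledMe() {
    void* caller = TRACE_RETURN_ADDRESS();
    std::printf("called from %p\n", caller);
    return caller;
}
```

The printed address lies inside the calling function, just after the call instruction, so it can be matched against a map file or debugger symbols.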
I am currently working on using some ASM in C/C++
I have the following
__declspec(naked) unsigned long
someFunction( unsigned long inputDWord )
{
__asm
{
}
}
How, in asm, would I return the unsigned long?
Do I need to push something onto the stack and then call ret?
I haven't used Asm in a long time, and never inside C++ before.
Thanks!
EDIT: Thanks to @Matteo Italia, I've corrected the usage of ret.
Put the return value in the eax register; this is what both the __cdecl and __stdcall conventions expect.
Then, depending on the calling convention, use the appropriate variant of the ret instruction:
In case of the __cdecl convention (or similar), use ret. At the machine level this pops the return address from the stack and jumps to it. The caller is responsible for removing all the function parameters from the stack.
In case of the __stdcall convention (or similar), use ret X, where X is the total size in bytes of all the function arguments on the stack.
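Both variants might look like the sketch below for 32-bit MSVC. The function names addOneCdecl and addOneStdcall are made up for the example, and the plain C++ fallback exists only so the snippet still compiles on other compilers or on x64, where __declspec(naked) with inline asm is not available.

```cpp
#include <cassert>

#if defined(_MSC_VER) && defined(_M_IX86)

// __cdecl: plain ret, the caller removes the argument afterwards.
__declspec(naked) unsigned long __cdecl addOneCdecl(unsigned long inputDWord) {
    __asm {
        mov eax, [esp + 4]   // first argument sits just above the return address
        add eax, 1           // the result is returned in eax
        ret
    }
}

// __stdcall: ret 4 also pops the single 4-byte argument.
__declspec(naked) unsigned long __stdcall addOneStdcall(unsigned long inputDWord) {
    __asm {
        mov eax, [esp + 4]
        add eax, 1
        ret 4
    }
}

#else

// Fallback with the same behaviour for compilers without naked inline asm.
inline unsigned long addOneCdecl(unsigned long inputDWord)   { return inputDWord + 1; }
inline unsigned long addOneStdcall(unsigned long inputDWord) { return inputDWord + 1; }

#endif
```

Because the functions are naked, there is no compiler-generated prologue, which is why the argument is read relative to esp rather than ebp.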
I'm doing reverse-engineery stuff and patching a game's memory via DLL. Usually I stick to the same old way of patching everything in a single or several functions. But it feels like it could be pulled off better by using a struct array which defines the memory writes that need to take place and looping through them all in one go. Much easier to manage, IMO.
I wanna make it constant, though. So the data is all there in one go (in .rdata) instead of having to dynamically allocate memory for such things each patch, which is a simple task with 'bytesize' data, for example:
struct struc_patch
{
BYTE val[8]; // max size of each patch (usually I only use 5 bytes anyway for call and jmp writes)
// I can of course increase this if really needed
void *dest;
char size;
} patches[] =
{
// simply write "01 02 03 04" to 0x400000
{{0x1, 0x2, 0x3, 0x4}, (void*)0x400000, 4},
};
//[...]
for each(struc_patch p in patches)
{
memcpy(p.dest, p.val, p.size);
}
But when I want to get fancier with the types, I find no way to specify an integer like "0x90909090" as the byte array "90 90 90 90". So this won't work:
struct struc_patch
{
BYTE val[8]; // max size of each patch (usually I only use 5 bytes anyway for call and jmp writes)
// I can of course increase this if really needed
void *dest;
char size;
} patches[] =
{
// how to write "jmp MyHook"? Here, the jmp offset will be truncated instead of overlapping in the array. Annoying.
{{0xE9, (DWORD)&MyHook - 0x400005}, (void*)0x400000, 5},
};
Of course the major problem is that &MyHook has to be resolved by the compiler. Any other way to get the desired result and keep it const?
I've got little experience with STL, to be honest. So if there is a solution using that, I might need it explained in detail in order to understand the code properly. I'm a big C/C++/WinAPI junkie lol, but it's for a game written in a similar nature, so it fits.
I don't think anything from the STL will help you with this, at least not at compile time.
There might be a fancy way of doing with templates what you did with macros (comma-separating the bytes).
But I recommend doing something simple like this:
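#pragma pack(push, 1) // without packing, 3 padding bytes would sit between opcode and addr
struct jump_insn
{
unsigned char opcode;
unsigned long addr;
} jump_insns[] = {
{0xe9, (unsigned long)&MyHook - 0x400005}
};
#pragma pack(pop)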
struct mem
{
unsigned char val[8];
} mems[] = {
{1,2,3,4}
};
struct struc_patch
{
unsigned char *val; // pointer to the patch bytes (no longer limited to 8 bytes)
void *dest;
char size;
} patches[] =
{
// simply write "01 02 03 04" to 0x400000
{(unsigned char*)(&mems[0]), (void*)0x400000, 4},
// write "jmp MyHook": the opcode and rel32 offset come from jump_insns[0]
{(unsigned char*)(&jump_insns[0]), (void*)0x400000, 5},
};
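You can't do everything inline, and you will need a new type for each kind of patch, but patches can be arbitrarily long (not just 8 bytes), and the data can live in the read-only section (.rdata in a PE file), provided the arrays are declared const and the compiler can initialize them statically.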
A better way to handle that is to calculate the address difference on the fly. For instance (source):
#define INST_CALL 0xE8
void InterceptLocalCode(BYTE bInst, DWORD pAddr, DWORD pFunc, DWORD dwLen)
{
BYTE *bCode = new BYTE[dwLen];
::memset(bCode, 0x90, dwLen);
DWORD dwFunc = pFunc - (pAddr + 5);
bCode[0] = bInst;
*(DWORD *)&bCode[1] = dwFunc;
WriteBytes((void*)pAddr, bCode, dwLen);
delete[] bCode;
}
void PatchCall(DWORD dwAddr, DWORD dwFunc, DWORD dwLen)
{
InterceptLocalCode(INST_CALL, dwAddr, dwFunc, dwLen);
}
dwAddr is the address to put the call instruction in, dwFunc is the function to call and dwLen is the length of the instruction to replace (basically used to calculate how many NOPs to put in).
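The same calculation can be tried without touching process memory by building the patch bytes in a buffer first. The sketch below is an illustration only: encodeRel32Patch and the constant names are hypothetical, addresses are modelled as 32-bit integers, and writing the bytes to the target (the WriteBytes step above) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kInstCall = 0xE8;  // call rel32
constexpr std::uint8_t kInstJmp  = 0xE9;  // jmp rel32
constexpr std::uint8_t kInstNop  = 0x90;

// Builds 'length' bytes: one opcode, a 4-byte relative offset, then NOP padding.
inline std::vector<std::uint8_t> encodeRel32Patch(std::uint8_t opcode,
                                                  std::uint32_t patchAddress,
                                                  std::uint32_t target,
                                                  std::size_t length) {
    assert(length >= 5);  // opcode + rel32 need five bytes
    std::vector<std::uint8_t> code(length, kInstNop);

    // The offset is relative to the instruction that follows the call/jmp.
    const std::uint32_t rel = target - (patchAddress + 5);

    code[0] = opcode;
    for (std::size_t i = 0; i < 4; ++i) {
        // x86 stores the displacement least significant byte first.
        code[1 + i] = static_cast<std::uint8_t>((rel >> (8 * i)) & 0xFF);
    }
    return code;
}
```

For example, a 7-byte patch at 0x400000 calling 0x401000 encodes as E8 FB 0F 00 00 90 90.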
To summarize, my solution (thanks to Nicolas' suggestion):
#pragma pack(push)
#pragma pack(1)
#define POFF(d,a) ((DWORD)(d) - ((a) + 5))
struct jump_insn
{
const BYTE opcode = 0xE9;
DWORD offset;
};
struct jump_short_insn
{
const BYTE opcode = 0xEB;
BYTE offset;
};
struct struc_patch
{
void *data;
void *dest;
char size;
};
#pragma pack(pop)
And in use:
// Patches
jump_insn JMP_HOOK_LoadButtonTextures = {POFF(&HOOK_LoadButtonTextures, 0x400000)};
struc_patch patches[] =
{
{&JMP_HOOK_LoadButtonTextures, IntToPtr(0x400000), sizeof(jump_insn)},
};
Using const class members with default initializers, I can define everything more easily and cleanly, and it can all simply be memcpy'd. The pack pragma is of course required so that memcpy doesn't copy the 3 alignment bytes that would otherwise sit between the BYTE opcode and the DWORD offset.
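The effect of the pack pragma and the table-driven loop can be checked on a scratch buffer instead of a live process. In this sketch JumpInsn, Patch and applyPatches are illustrative names, and destOffset is an offset into the buffer standing in for the absolute destination address used in the real patch table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct JumpInsn {
    std::uint8_t  opcode;  // 0xE9 for jmp rel32
    std::uint32_t offset;  // rel32 displacement
};
#pragma pack(pop)

// With packing the struct is exactly the 5 bytes of the instruction;
// without it, sizeof would typically be 8.
static_assert(sizeof(JumpInsn) == 5, "jump instruction must have no padding");

struct Patch {
    const void* data;        // bytes to write
    std::size_t destOffset;  // where to write them (offset into the image)
    std::size_t size;        // how many bytes to copy
};

// Applies every patch in the table to 'image' with a plain memcpy.
inline void applyPatches(std::uint8_t* image, std::size_t imageSize,
                         const Patch* patches, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const Patch& p = patches[i];
        assert(p.destOffset + p.size <= imageSize);
        std::memcpy(image + p.destOffset, p.data, p.size);
    }
}
```

Storing an explicit size per entry keeps the loop generic: raw byte patches and instruction structs go through the same memcpy.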
Thanks all, helped me make my patching methods a lot more robust.
I've been trying to use 'thunking' so I can pass member functions to legacy APIs which expect a C function. I'm trying to use a solution similar to this. This is my thunk structure so far:
struct Thunk
{
byte mov; // ↓
uint value; // mov esp, 'value' <-- replace the return address with 'this' (since this thunk was called with 'call', we can replace the 'pushed' return address with 'this')
byte call; // ↓
int offset; // call 'offset' <-- we want to return here for ESP alignment, so we use call instead of 'jmp'
byte sub; // ↓
byte esp; // ↓
byte num; // sub esp, 4 <-- pop the 'this' pointer from the stack
//perhaps I should use 'ret' here as well/instead?
} __attribute__((packed));
The following code is a test of mine which uses this thunk structure (but it does not yet work):
#include <iostream>
#include <sys/mman.h>
#include <cstdio>
typedef unsigned char byte;
typedef unsigned short ushort;
typedef unsigned int uint;
typedef unsigned long ulong;
#include "thunk.h"
template<typename Target, typename Source>
inline Target brute_cast(const Source s)
{
static_assert(sizeof(Source) == sizeof(Target));
union { Target t; Source s; } u;
u.s = s;
return u.t;
}
void Callback(void (*cb)(int, int))
{
std::cout << "Calling...\n";
cb(34, 71);
std::cout << "Called!\n";
}
struct Test
{
int m_x = 15;
void Hi(int x, int y)
{
printf("X: %d | Y: %d | M: %d\n", x, y, m_x);
}
};
int main(int argc, char * argv[])
{
std::cout << "Begin Execution...\n";
Test test;
Thunk * thunk = static_cast<Thunk*>(mmap(nullptr, sizeof(Thunk),
PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0));
thunk->mov = 0xBC; // mov esp
thunk->value = reinterpret_cast<uint>(&test);
thunk->call = 0xE8; // call
thunk->offset = brute_cast<uint>(&Test::Hi) - reinterpret_cast<uint>(thunk);
thunk->offset -= 10; // Adjust the relative call
thunk->sub = 0x83; // sub
thunk->esp = 0xEC; // esp
thunk->num = 0x04; // 'num'
// Call the function
Callback(reinterpret_cast<void (*)(int, int)>(thunk));
std::cout << "End execution\n";
}
If I use that code, I get a segmentation fault inside the Test::Hi function. The reason is obvious (once you analyze the stack in GDB), but I do not know how to fix it: the stack is not aligned properly.
The x argument contains garbage but the y argument contains the this pointer (see the Thunk code). That means the stack is misaligned by 8 bytes, but I still don't know why this is the case. Can anyone tell why this is happening? x and y should contain 34 and 71 respectively.
NOTE: I'm aware that this does not work in all scenarios (such as MI and the VC++ thiscall convention), but I want to see if I can get it to work, since I would benefit from it a lot!
EDIT: Obviously I also know that I can use static functions, but I see this more as a challenge...
Suppose you have a standalone (non-member, or maybe static) cdecl function:
void Hi_cdecl(int x, int y)
{
printf("X: %d | Y: %d\n", x, y);
}
Another function calls it this way:
push 71
push 34
push (return-address)
call (address-of-hi)
add esp, 8 (stack cleanup)
You want to replace this by the following:
push 71
push 34
push this
push (return-address)
call (address-of-hi)
add esp, 4 (cleanup of this from stack)
add esp, 8 (stack cleanup)
For this, you have to read the return-address from the stack, push this, and then, push the return-address. And for the cleanup, add 4 (not subtract) to esp.
Regarding the return address - since the thunk must do some cleanup after the callee returns, it must store the original return-address somewhere, and push the return-address of the cleanup part of the thunk. So, where to store the original return-address?
In a global variable - might be an acceptable hack (since you probably don't need your solution to be reentrant)
On the stack - requires moving the whole block of parameters (using a machine-language equivalent of memmove), whose length is pretty much unknown
Please also note that the resulting stack is not 16-byte-aligned; this can lead to crashes if the function uses certain types (those that require 8-byte and 16-byte alignment - the SSE ones, for example; also maybe double).
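The stack shuffling described above (take the return address off, push this, put the return address back) can be modelled without any machine code. The sketch below is only a simulation: FakeStack and insertThis are made-up names, and each entry stands for one 32-bit stack slot, under the assumption that the member function takes this as its first stack argument, as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downward-growing stack modelled as a vector; the last element is [esp].
class FakeStack {
public:
    void push(std::uint32_t value) { slots_.push_back(value); }

    std::uint32_t pop() {
        assert(!slots_.empty());
        const std::uint32_t value = slots_.back();
        slots_.pop_back();
        return value;
    }

    // at(0) is [esp], at(1) is [esp + 4], at(2) is [esp + 8], ...
    std::uint32_t at(std::size_t n) const {
        assert(n < slots_.size());
        return slots_[slots_.size() - 1 - n];
    }

    std::size_t depth() const { return slots_.size(); }

private:
    std::vector<std::uint32_t> slots_;
};

// What the thunk has to do on entry so that the callee finds 'this'
// directly above the return address.
inline void insertThis(FakeStack& stack, std::uint32_t thisPtr) {
    const std::uint32_t returnAddress = stack.pop();
    stack.push(thisPtr);
    stack.push(returnAddress);
}
```

After insertThis, the callee sees this at [esp + 4], 34 at [esp + 8] and 71 at [esp + 12]. The extra slot is what the thunk must remove again after the call, because the cdecl caller only cleans up its own 8 bytes.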