I made some computations to get a relative virtual address(RVA).
I compute a correct RVA (according to the .map file) and I want to translate it to a callable function pointer.
Is there a way to translate it to a physical address?
I have thought of taking the image base address and add it. According to this thread it should be possible via GetModuleHandle(NULL), but the result is "wrong". I only have a good result when I subtract a pointer from a function from its RVA defined in the .map file.
Is there a WINAPI to either convert a RVA to a physical address, or get the image base address, or get the RVA of a physical address?
Here's some example code:
#include <stdio.h>
#include <Windows.h>
#include <WinNT.h>
static int count = 0;
void reference() // According to .map RVA = 0x00401000
{
}
void toCall() // According to .map RVA = 0x00401030
{
printf("Called %d\n", ++count);
}
int main()
{
typedef void (*fnct_t)();
fnct_t fnct;
fnct = (fnct_t) (0x00401030 + (((int) reference) - 0x00401000));
fnct(); // works
fnct = (fnct_t) (0x00401030 + ((int) GetModuleHandle(NULL)) - 0x00400000);
fnct(); // often works
return 0;
}
My main concern is that it seems that sometimes (maybe in threaded contexts) GetModuleHandle(NULL) isn't correct.
To get the image base without the entry point being predefined directly at compile-time you can do, a simple search from aligning eax#1000h from the current VA and looping until a valid PE signature 'MZ' is found in the current memory page.
Assuming the base address is not relocated into another PE image. I've prepared a function for you:
DWORD FindImageBase() {
DWORD* VA = (DWORD *) &FindImageBase, ImageBase;
__asm {
mov eax, VA
and eax, 0FFFF0000h
search:
cmp word ptr [eax], 0x5a4d
je stop
sub eax, 00010000h
jmp search
stop:
mov [ImageBase], 0
mov [ImageBase], eax
}
return ImageBase;
}
Related
What I'm trying to do is intercept a register value at a given address in a x86 asm and put it into a variable. To do that, I'm injecting a dll into my program, here it is:
#include <Windows.h>
#include <stdio.h>
void test()
{
int randomNum;
DWORD address = 0x088AD6D; // return address (used only with JMP)
__asm
{
//rewriting what I erased with my detour
movsx eax, byte ptr ds:[esi+4]
push ebp
push eax
// moving ecx to my variable
mov randomNum, ecx
}
printf("number: %d\n", randomNum); // This correctly print the value I detoured
// this part is only used if I use JMP instead of call
__asm
{
JMP address // return to where the program should have been
}
}
void DetourAddress(void* funcPtr, void* hook)
{
// write jmp
BYTE cmd[5] =
{
0xE9, //jmp
0x00, 0x00, 0x00, 0x00 //address
};
/*
// write call
BYTE cmd[5] =
{
0xE8, // call
0x00, 0x00, 0x00, 0x00 // our function address
};
*/
DWORD dwProtect;
VirtualProtect(funcPtr, 5, PAGE_EXECUTE_READWRITE, &dwProtect); // make memory writable
DWORD offset = ((DWORD)hook - (DWORD)funcPtr - 5); //((to)-(from)-5)
memcpy(&cmd[1], &offset, 4); // write address into jmp
WriteProcessMemory(GetCurrentProcess(), (LPVOID)funcPtr, cmd, 5, 0); // write jmp / call
VirtualProtect(funcPtr, 5, dwProtect, NULL); // reprotect
}
BOOL APIENTRY DllMain( HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved )
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
AllocConsole();
freopen("CONOUT$", "w", stdout);
DetourAddress((void*)0x088AD68, (void*)&test);
break;
case DLL_PROCESS_DETACH:
FreeConsole();
break;
}
return TRUE;
}
The DetourAddress function is where all the magic is supposed to happen, I write a jmp command at the address I want to detour to my test() function. The test function replicate the code that was erased and I move the ecx value into my variable randomNum. I then jmp back to where the code should have been at if I didn't detour it. In theory, it works fine, but the problem is that during my test() function, the registers value get changed and when I jmp back to the original code, not only do my program stop working as intended, but it also crashes soon enough because of access violation...
For those of you who are visual, here are what happens in ollydbg:
Step 1 (the original code):
Step 2, what it looks like after I detoured the code:
Step 3, the function test() in ollydbg:
Step 4, the jmp back to the original code:
As you can see, when we go back to the original code the registers value are completely changed which leads to all sort of problem... I'm looking for a way to prevent that.
I found this question (C++ mid-function hook: get register values and jump back [x86 assembly on windows]) that is similar (or even identical...) to mine but the answer given doesn't help me. I tried using a call command instead of jmp and not only did the registers value kept being changed, but it also lead to one more problem: When I returned from my test() function, it returned in non readable memory instead of the original function...
Help please?
Rather hook the call a few instructions down, this will allow you to correctly preserve the registers while avoiding any inline assembly all together. building the pass-though and capture would look something like (assuming it returns nothing):
typedef void (__stdcall * hookfn)(const char* str, DWORD dw1, DWORD dw2, DWORD dw3, DWORD dw4);
void __stdcall Intercept(const char* str, DWORD dw1, DWORD dw2, DWORD dw3, DWORD dw4)
{
randomNum = dw1;
printf("%d\n",randomNum);
hookfn fn = (hookfn)((DWORD)BaseOfTargetModule + RVAOfCall);
fn(str,dw1,dw2,dw3,dw4);
}
You'd use your same patching mechanism to overwrite the relative call address. Its also a good idea to account for relocation by using Base+RVA rather than a fixed virtual address.
If you still want your hook, you need to preserve all the registers, this is easily done using PUSHAD on entry and POPAD on exit (these save/restore all registers bar the flags). Also, you should only perform the instructions that you overwrote just before you jump back (but after the POPAD), else they can cause issues with whatever your hook might be doing.
So I've asked this before but with significantly less detail. The question title accurately describes the problem: I have a method in C++ that I am trying to call from assembly (x86) that has both parameters and a return value. I have a rough understanding, at best, of assembly and a fairly solid understanding of C++ (otherwise I would not have undertaken this problem). Here's what I have as far as code goes:
// methodAddr is a pointer to the method address
void* methodAddr = method->Address;
// buffer is an int array of parameter values. The parameters can be anything (of any type)
// but are copied into an int array so they can be pushed onto the stack in reverse order
// 4 bytes at a time (as in push (int)). I know there's an issue here that is irrelevent to my baseline testing, in that if any parameter is over 4 bytes it will be broken and
// reversed (which is not good) but for basic testing this isn't an issue, so overlook this.
for (int index = bufferElementCount - 1; index >= 0; index--)
{
int val = buffer[index];
__asm
{
push val
}
}
int returnValueCount = 0;
// if there is a return value, allocate some space for it and push that onto the stack after
// the parameters have been pushed on
if (method->HasReturnValue)
{
*returnSize = method->ReturnValueSize;
outVal = new char[*returnSize];
returnValueCount = (*returnSize / 4) + (*returnSize % 4 != 0 ? 1 : 0);
memset(outVal, 0, *returnSize);
for (int index = returnValueCount - 1; index >= 0; index--)
{
char* addr = ((char*)outVal) + (index * 4);
__asm
{
push addr
}
}
}
// calculate the stack pointer offset so after the call we can pop the parameters and return value
int espOffset = (bufferElementCount + returnValueCount) * 4;
// call the method
__asm
{
call methodAddr;
add esp, espOffset
};
For my basic testing I am using a method with the following signature:
Person MyMethod3( int, char, int );
The problem is this: when omit the return value from the method signature, all of the parameter values are properly passed. But when I leave the method as is, the parameter data that is passed is incorrect but the value returned is correct. So my question, obviously, is what is wrong? I've tried pushing the return value space onto the stack before the parameters. The person structure is as follows:
class Person
{
public:
Text Name;
int Age;
float Cash;
ICollection<Person*>* Friends;
};
Any help would be greatly appreciated. Thanks!
I'm using Visual Studio 2013 with the November 2013 CTP compiler for C++, targeting x86.
As it relates to disassembly, this is the straight method call:
int one = 876;
char two = 'X';
int three = 9738;
Person p = MyMethod3(one, two, three);
And here is the disassembly for that:
00CB0A20 mov dword ptr [one],36Ch
char two = 'X';
00CB0A27 mov byte ptr [two],58h
int three = 9738;
00CB0A2B mov dword ptr [three],260Ah
Person p = MyMethod3(one, two, three);
00CB0A32 push 10h
00CB0A34 lea ecx,[p]
00CB0A37 call Person::__autoclassinit2 (0C6AA2Ch)
00CB0A3C mov eax,dword ptr [three]
00CB0A3F push eax
00CB0A40 movzx ecx,byte ptr [two]
00CB0A44 push ecx
00CB0A45 mov edx,dword ptr [one]
00CB0A48 push edx
00CB0A49 lea eax,[p]
00CB0A4C push eax
00CB0A4D call MyMethod3 (0C6B783h)
00CB0A52 add esp,10h
00CB0A55 mov dword ptr [ebp-4],0
My interpretation of this is as follows:
Execute the assignments to the local variables. Then create the output register. Then put the parameters in a particular register (the order here happens to be eax, ecx, and edx, which makes sense (eax and ebx are for one, ecx is for two, and edx and some other register for the last parameter?)). Then call LEA (load-effective address) which I don't understand but have understood to be a MOV. Then it calls the method with an address as the parameter? And then moves the stack pointer to pop the parameters and return value.
Any further explanation is appreciated, as I'm sure my understanding here is somewhat flawed.
From http://lastfrag.com/hotpatching-and-inline-hooking-explained/,
Q1) Does code proceed from high memory to low memory or vice versa?
Q2) More importantly, during the calculation of the replacement offset, why is it that you have to minus the function preamble? Is it because the offset starts from the end of the instruction and not the beginning?
DWORD ReplacementAddressOffset = ReplacementAddress - OriginalAddress - 5;
Full Code:
void HookAPI(wchar_t *Module, char *API, DWORD Function)
{
HMODULE hModule = LoadLibrary(Module);
DWORD OriginalAddress = (DWORD)GetProcAddress(hModule, API);
DWORD ReplacementAddress = (DWORD)Function;
DWORD ReplacementAddressOffset = ReplacementAddress - OriginalAddress - 5;
LPBYTE pOriginalAddress = (LPBYTE)OriginalAddress;
LPBYTE pReplacementAddressOffset = (LPBYTE)(&ReplacementAddressOffset);
DWORD OldProtect = 0;
DWORD NewProtect = PAGE_EXECUTE_READWRITE;
VirtualProtect((PVOID)OriginalAddress, 5, NewProtect, &OldProtect);
for (int i = 0; i < 5; i++)
Store[i] = pOriginalAddress[i];
pOriginalAddress[0] = (BYTE)0xE9;
for (int i = 0; i < 4; i++)
pOriginalAddress[i + 1] = pReplacementAddressOffset[i];
VirtualProtect((PVOID)OriginalAddress, 5, OldProtect, &NewProtect);
FlushInstructionCache(GetCurrentProcess(), NULL, NULL);
FreeLibrary(hModule);
}
Q3) In this code, the relative address of a jmp instruction is being replaced; relAddrSet is a pointer to the original destination; to is a pointer to the new destination. I don't understand the calculation of the to address, why is it that you have to add the original destination to the functionForHook + opcodeOffset?
DWORD *relAddrSet = (DWORD *)(currentOpcode + 1);
DWORD_PTR to = (*relAddrSet) + ((DWORD_PTR)functionForHook + opcodeOffset);
*relAddrSet = (DWORD)(to - ((DWORD_PTR)originalFunction + opcodeOffset));
Yes the relative address is the the offset after the instructions, that's why you have to substract 5.
But, in my opinion, you should just forget the idea of the relative jump and try absolute jump.
Why ? Because it is a lot easier and x86-64 compatible (relative jumps are limited to +/-2GB).
The absolute jump is (x64) :
48 b8 ef cd ab 89 67 45 23 01 mov rax, 0x0123456789abcdef
ff e0 jmp rax
And for x86 :
b8 67 45 23 01 mov eax, 0x01234567
ff e0 jmp eax
Here is the modified code (the loader is now 7 bytes instead of 5):
void HookAPI(wchar_t *Module, char *API, DWORD Function)
{
HMODULE hModule = LoadLibrary(Module);
DWORD OriginalAddress = (DWORD)GetProcAddress(hModule, API);
DWORD OldProtect = 0;
DWORD NewProtect = PAGE_EXECUTE_READWRITE;
VirtualProtect((PVOID)OriginalAddress, 7, NewProtect, &OldProtect);
memcpy(Store, OriginalAddress, 7);
memcpy(OriginalAddress, "\xb8\x00\x00\x00\x00\xff\xe0", 7);
memcpy(OriginalAddress+1, &ReplacementAddress, sizeof(void*));
VirtualProtect((PVOID)OriginalAddress, 7, OldProtect, &NewProtect);
FlushInstructionCache(GetCurrentProcess(), NULL, NULL);
FreeLibrary(hModule);
}
The code is the same for x64 but you have to add 2 nops (90) at the beginning or the end in order match the size of the following instructions, so the loader is "\x48\xb8<8-bytes addr>\xff\xe0\x90\x90" (14 bytes)
Q1) The program runs from lower to highest addresses (i.e. the program counter gets increased by the size of each instruction, unless in case of jumps, calls or ret). But I am probably missing the point of the question.
Q2) Yes, on x86 the jumps are executed after the program counter has been increased by the size of the jump instruction (5 bytes); when the CPU adds the jump offset to the program counter to calculate the target address, the program counter has already been increased of 5.
Q3) This code is quite weird, but it may work. I suppose that *relAddrset initially contains a jump offset to originalFunction (i.e. *relAddSet==originalFunction-relativeOffset). If this is true, the final result is that *reladdrSet contains a jump offset to functionFoHook. Indeed the last instruction becomes:
*relAddrSet=(originalFunction-relativeOffset)+functionForHook-originalFunction
== functionForHook-relativeOffset
Yes, code runs "forward" if I understand this question correctly. One instruction is executed after another if it is not branching.
An instruction that does a relative jump (JMP, CALL) does the jump relative to the start of the next instruction. That's why you have to subtract the length of the instruction (here: 5) from the difference.
I can't answer your third question. Please give some context and what the code is supposed to do.
I have a C++ app that uses large arrays of data, and have noticed while testing that it is running out of memory, while there is still plenty of memory available. I have reduced the code to a sample test case as follows;
void MemTest()
{
size_t Size = 500*1024*1024; // 512mb
if (Size > _HEAP_MAXREQ)
TRACE("Invalid Size");
void * mem = malloc(Size);
if (mem == NULL)
TRACE("allocation failed");
}
If I create a new MFC project, include this function, and run it from InitInstance, it works fine in debug mode (memory allocated as expected), yet fails in release mode (malloc returns NULL). Single stepping through release into the C run times, my function gets inlined I get the following
// malloc.c
void * __cdecl _malloc_base (size_t size)
{
void *res = _nh_malloc_base(size, _newmode);
RTCCALLBACK(_RTC_Allocate_hook, (res, size, 0));
return res;
}
Calling _nh_malloc_base
void * __cdecl _nh_malloc_base (size_t size, int nhFlag)
{
void * pvReturn;
// validate size
if (size > _HEAP_MAXREQ)
return NULL;
'
'
And (size > _HEAP_MAXREQ) returns true and hence my memory doesn't get allocated. Putting a watch on size comes back with the exptected 512MB, which suggests the program is linking into a different run-time library with a much smaller _HEAP_MAXREQ. Grepping the VC++ folders for _HEAP_MAXREQ shows the expected 0xFFFFFFE0, so I can't figure out what is happening here. Anyone know of any CRT changes or versions that would cause this problem, or am I missing something way more obvious?
Edit: As suggested by Andreas, looking at this under this assembly view shows the following;
--- f:\vs70builds\3077\vc\crtbld\crt\src\malloc.c ------------------------------
_heap_alloc:
0040B0E5 push 0Ch
0040B0E7 push 4280B0h
0040B0EC call __SEH_prolog (40CFF8h)
0040B0F1 mov esi,dword ptr [size]
0040B0F4 cmp dword ptr [___active_heap (434660h)],3
0040B0FB jne $L19917+7 (40B12Bh)
0040B0FD cmp esi,dword ptr [___sbh_threshold (43464Ch)]
0040B103 ja $L19917+7 (40B12Bh)
0040B105 push 4
0040B107 call _lock (40DE73h)
0040B10C pop ecx
0040B10D and dword ptr [ebp-4],0
0040B111 push esi
0040B112 call __sbh_alloc_block (40E736h)
0040B117 pop ecx
0040B118 mov dword ptr [pvReturn],eax
0040B11B or dword ptr [ebp-4],0FFFFFFFFh
0040B11F call $L19916 (40B157h)
$L19917:
0040B124 mov eax,dword ptr [pvReturn]
0040B127 test eax,eax
0040B129 jne $L19917+2Ah (40B14Eh)
0040B12B test esi,esi
0040B12D jne $L19917+0Ch (40B130h)
0040B12F inc esi
0040B130 cmp dword ptr [___active_heap (434660h)],1
0040B137 je $L19917+1Bh (40B13Fh)
0040B139 add esi,0Fh
0040B13C and esi,0FFFFFFF0h
0040B13F push esi
0040B140 push 0
0040B142 push dword ptr [__crtheap (43465Ch)]
0040B148 call dword ptr [__imp__HeapAlloc#12 (425144h)]
0040B14E call __SEH_epilog (40D033h)
0040B153 ret
$L19914:
0040B154 mov esi,dword ptr [ebp+8]
$L19916:
0040B157 push 4
0040B159 call _unlock (40DDBEh)
0040B15E pop ecx
$L19929:
0040B15F ret
_nh_malloc:
0040B160 cmp dword ptr [esp+4],0FFFFFFE0h
0040B165 ja _nh_malloc+29h (40B189h)
With the registers as follows;
EAX = 009C8AF0 EBX = FFFFFFFF ECX = 009C8A88 EDX = 00747365 ESI = 00430F80
EDI = 00430F80 EIP = 0040B160 ESP = 0013FDF4 EBP = 0013FFC0 EFL = 00000206
So the compare does appear to be against the correct constant, i.e. #040B160 cmp dword ptr [esp+4],0FFFFFFE0h, also esp+4 = 0013FDF8 = 1F400000 (my 512mb)
Second edit: Problem was actually in HeapAlloc, as per Andreas' post. Changing to a new seperate heap for large objects, using HeapCreate & HeapAlloc, did not help alleviate the problem, nor did an attempt to use VirtualAlloc with various parameters. Some further experimentation has shown that where allocation one large section of contiguous memory fails, two smaller blocks yielding the same total memory is ok. e.g. where a 300MB malloc fails, 2 x 150MB mallocs work ok. So it looks like I'll need a new array class that can live in a number of biggish memory fragments rather than a single contiguous block. Not a major problem, but I would have expected a bit more out of Win32 in this day and age.
Last edit: The following yielded 1.875GB of space, albeit non-contiguous
#define TenMB 1024*1024*10
void SmallerAllocs()
{
size_t Total = 0;
LPVOID p[200];
for (int i = 0; i < 200; i++)
{
p[i] = malloc(TenMB);
if (p[i])
Total += TenMB; else
break;
}
CString Msg;
Msg.Format("Allocated %0.3lfGB",Total/(1024.0*1024.0*1024.0));
AfxMessageBox(Msg,MB_OK);
}
May it be the cast that the debugger is playing a trick on you in release-mode? Neither single stepping nor the values of variables are reliable in release-mode.
I tried your example in VS2003 in release mode, and when single stepping it does at first look like the code is landing on the return NULL line, but when I continue stepping it eventually continues into HeapAlloc, I would guess that it's this function that's failing, looking at the disassembly if (size > _HEAP_MAXREQ) reveals the following:
00401078 cmp dword ptr [esp+4],0FFFFFFE0h
so I don't think it's a problem with _HEAP_MAXREQ.
How do I obtain a stack trace of addresses on Windows without using dbghelp.dll?
I don't need to know what the symbols or function names associated with the addresses, I just want the list of addresses -- something similar to backtrace of *nix systems.
Thanks!
Check out the CaptureStackBackTrace() function, which is in Kernel32.dll. This should do all that you need.
Captures a stack back trace by walking up the stack and recording the information for each frame.
USHORT WINAPI CaptureStackBackTrace(
__in ULONG FramesToSkip,
__in ULONG FramesToCapture,
__out PVOID *BackTrace,
__out_opt PULONG BackTraceHash
);
If you want to do this extremely non-portably, you can just read the EBP register and walk the stack yourself. This will only work for the x86 architecture, and it also assumes that the C runtime you're using properly initializes EBP to 0 before calling the first function.
uint32_t read_ebp(void)
{
uint32_t my_ebp;
__asm
{
mov ebp, my_ebp
}
return my_ebp;
}
void backtrace(void)
{
uint32_t ebp = read_ebp();
printf("backtrace:\n");
while(ebp != 0)
{
printf("0x%08x\n", ebp);
ebp = ((uint32_t *)ebp)[1];
}
}
Previous variant doesn't work for me(msvc 6), so:
unsigned long prev;
unsigned long addr;
__asm { mov prev, ebp }
while(addr!=0) {
addr = ((unsigned long *)prev)[1];
printf("0x%08x\n", addr);
prev = ((unsigned long *)prev)[0];
}
Adam, thanks for highlighting the way!