Replacing assembly instruction changes other instructions - c++

I am trying to place a call instruction at a function to simulate a hook, so i should be replacing 6 bytes in the beginning of the function to place my call which is 2 bytes for the opcode and a dword for the address. however here is the disassembly of the function before i hook it
void realFunction()
{
00B533C0 push ebp
00B533C1 mov ebp,esp
00B533C3 sub esp,0C0h
00B533C9 push ebx
00B533CA push esi
00B533CB push edi
00B533CC lea edi,[ebp-0C0h]
00B533D2 mov ecx,30h
00B533D7 mov eax,0CCCCCCCCh
00B533DC rep stos dword ptr es:[edi]
MessageBox(NULL, "realFunction()", "Trace", MB_OK);
00B533DE mov esi,esp
00B533E0 push 0
00B533E2 push 0B56488h
00B533E7 push 0B56490h
00B533EC push 0
00B533EE call dword ptr ds:[0B5613Ch]
00B533F4 cmp esi,esp
00B533F6 call _RTC_CheckEsp (0B53A10h)
}
and strangely here is it after i just replace 6 bytes
void realFunction()
{
00B533C0 call fakeFunction (0B52EF0h)
00B533C5 rol byte ptr [eax],0 <--
00B533C8 add byte ptr [ebx+56h],dl <--
00B533CB push edi <--
00B533CC lea edi,[ebp-0C0h] <--
00B533D2 mov ecx,30h
00B533D7 mov eax,0CCCCCCCCh
00B533DC rep stos dword ptr es:[edi]
MessageBox(NULL, "realFunction()", "Trace", MB_OK);
00B533DE mov esi,esp
00B533E0 push 0
00B533E2 push 0B56488h
00B533E7 push 0B56490h
00B533EC push 0
00B533EE call dword ptr ds:[0B5613Ch]
00B533F4 cmp esi,esp
00B533F6 call _RTC_CheckEsp (0B53A10h)
}
code for the hook
#include <iostream>
#include <windows.h>
using namespace std;
void realFunction()
{
MessageBox(NULL, "realFunction()", "Trace", MB_OK);
}
__declspec(naked) void fakeFunction()
{
__asm {
pushad;
pushfd;
}
MessageBox(NULL, "fakeFunction()", "Trace", MB_OK);
__asm{
popfd;
popad;
ret; //This should return back and resumes the execution of the original function;
}
}
void main()
{
DWORD size = sizeof(double);
DWORD oldProtection;
DWORD realFunctionAddr = (DWORD)realFunction;
DWORD fakeFunctionAddr = (DWORD)fakeFunction;
VirtualProtect((LPVOID)realFunctionAddr, size, PAGE_EXECUTE_READWRITE, &oldProtection);
*((PBYTE)(realFunctionAddr)) = 0xE8;
*((PDWORD)(realFunctionAddr + 1)) = fakeFunctionAddr - realFunctionAddr - 5;
VirtualProtect((LPVOID)fakeFunctionAddr, size, oldProtection, &oldProtection);
realFunction();
while (true){
cin.get();
}
}
I want to understand why this happens, why not just the 6 bytes i replaced are changed ?

As you can see, the sub esp,0C0h instruction begins at address 00B533C3, but the next instruction push ebx begins at address 00B533C9. You have overwritten addresses 00B533C0 through 00B533C5, so immediately after your 6 bytes you are in the middle of the sub esp,0C0h instruction.
The disassembler has no way of knowing that a certain byte is garbage and not an instruction, so it tries to interpret the bytes as instructions, as best as it can, and what you see is, of course, nonsensical instructions. After a while it just so happens (by coincidence) that the end of a nonsensical instruction coincides with the end of an actual instruction that used to be there, so from that point on the disassembler interprets instructions successfully, that's why the remainder of your function looks okay.
If you look at the actual bytes, and not at the assembly language mnemonic interpretations of these bytes, you will see that nothing funky is going on.
(Except, perhaps, for the fact that you appear to have replaced 5, not 6 bytes.)

Related

c++ stack memory scope with curly braces

I'm looking for a double check on my understanding. I ran across code of this form:
#define BUFLEN_256 256
int main()
{
const char* charPtr = "";
if (true /* some real test here */)
{
char buf[BUFLEN_256] = { 0 };
snprintf(buf, BUFLEN_256, "Some string goes here..");
charPtr = buf;
}
std::cout << charPtr << std::endl; // Is accessing charPtr technically dangerous here?
}
My immediate thought was bug, the stack memory assigned to buf[] is no longer guaranteed to belong to the array once you exit the if(){}. But the code builds and runs without problem, and in double checking myself I got confused. I'm not good at assembly, but if I'm reading it correctly it does not appear that the stack pointer is reset after leaving the curly braces. Can someone double check me on that and chime in as to whether this code is technically valid? Here is the code with the assembly (built with Visual Studio 2019). My thought is this code is not OK, but I've been wrong on odd issues before.
#define BUFLEN_256 256
int main()
{
00DA25C0 push ebp
00DA25C1 mov ebp,esp
00DA25C3 sub esp,1D8h
00DA25C9 push ebx
00DA25CA push esi
00DA25CB push edi
00DA25CC lea edi,[ebp-1D8h]
00DA25D2 mov ecx,76h
00DA25D7 mov eax,0CCCCCCCCh
00DA25DC rep stos dword ptr es:[edi]
00DA25DE mov eax,dword ptr [__security_cookie (0DAC004h)]
00DA25E3 xor eax,ebp
00DA25E5 mov dword ptr [ebp-4],eax
00DA25E8 mov ecx,offset _1FACD15F_scratch#cpp (0DAF029h)
00DA25ED call #__CheckForDebuggerJustMyCode#4 (0DA138Eh)
const char* charPtr = "";
00DA25F2 mov dword ptr [charPtr],offset string "" (0DA9B30h)
if (true /* some real test here */)
00DA25F9 mov eax,1
00DA25FE test eax,eax
00DA2600 je main+7Ah (0DA263Ah)
{
char buf[BUFLEN_256] = { 0 };
00DA2602 push 100h
00DA2607 push 0
00DA2609 lea eax,[ebp-114h]
00DA260F push eax
00DA2610 call _memset (0DA1186h)
00DA2615 add esp,0Ch
snprintf(buf, BUFLEN_256, "Some string goes here..");
00DA2618 push offset string "Some string goes here.." (0DA9BB8h)
00DA261D push 100h
00DA2622 lea eax,[ebp-114h]
00DA2628 push eax
00DA2629 call _snprintf (0DA1267h)
00DA262E add esp,0Ch
charPtr = buf;
00DA2631 lea eax,[ebp-114h]
00DA2637 mov dword ptr [charPtr],eax
}
std::cout << charPtr << std::endl;
00DA263A mov esi,esp
00DA263C push offset std::endl<char,std::char_traits<char> > (0DA103Ch)
00DA2641 mov eax,dword ptr [charPtr]
00DA2644 push eax
00DA2645 mov ecx,dword ptr [__imp_std::cout (0DAD0D4h)]
00DA264B push ecx
00DA264C call std::operator<<<std::char_traits<char> > (0DA11AEh)
00DA2651 add esp,8
00DA2654 mov ecx,eax
00DA2656 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0DAD0A0h)]
00DA265C cmp esi,esp
00DA265E call __RTC_CheckEsp (0DA129Eh)
}
00DA2663 xor eax,eax
00DA2665 push edx
00DA2666 mov ecx,ebp
00DA2668 push eax
00DA2669 lea edx,ds:[0DA2694h]
00DA266F call #_RTC_CheckStackVars#8 (0DA1235h)
00DA2674 pop eax
00DA2675 pop edx
00DA2676 pop edi
00DA2677 pop esi
00DA2678 pop ebx
00DA2679 mov ecx,dword ptr [ebp-4]
00DA267C xor ecx,ebp
00DA267E call #__security_check_cookie#4 (0DA1181h)
00DA2683 add esp,1D8h
00DA2689 cmp ebp,esp
00DA268B call __RTC_CheckEsp (0DA129Eh)
00DA2690 mov esp,ebp
00DA2692 pop ebp
00DA2693 ret
00DA2694 add dword ptr [eax],eax
00DA2696 add byte ptr [eax],al
00DA2698 pushfd
00DA2699 fiadd dword ptr es:[eax]
00DA269C in al,dx
00DA269D ?? ??????
00DA269E ?? ??????
}
My immediate thought was bug, the stack memory assigned to buf[] is no longer guaranteed to belong to the array once you exit the if(){}.
That is correct.
But the code builds and runs without problem
Undefined Behavior. In the cout << charPtr statement, charPtr is a dangling pointer to invalid memory. Whether or not the memory has been physically freed is irrelevent. The memory has gone out of scope.
I'm not good at assembly, but if I'm reading it correctly it does not appear that the stack pointer is reset after leaving the curly braces.
That is correct.
The memory for the array is being pre-allocated at the top of the stack frame when the function is entered (as part of the sub esp, 1D8h instruction), and then gets released during cleanup of the stack frame when the function exits (as part of the add esp, 1D8h instruction).
As you can see, when the if is entered, the very first thing it does is to call _memset() to zero out an array which already exists at [ebp-114h].
But that is an implementation detail, don't rely on that.
Can someone double check me on that and chime in as to whether this code is technically valid?
It is not.
What you're seeing is 'undefined' behavior. Stack memory is typically allocated all in one go at the start. So when a variable goes out-of-scope on the stack, that memory becomes available for re-use. Since you're not overwriting the stack with anything after the if statement, the data previously stored there is still intact. If you were to allocate additional memory/data to the stack after the if statement, you'd see a much different result.
See this post here:
What happens when a variable goes out of scope?
Edit:
To elaborate and demonstrate this, consider the following modification of your code (Compiled on VS2019 v142 x64):
#include <iostream>
#define BUFLEN_256 256
int main()
{
char* charPtr;
char other_buf[BUFLEN_256] = { 0 };
char* charPtr2 = other_buf;
if (true /* some real test here */)
{
char buf[BUFLEN_256] = { 0 };
snprintf(buf, BUFLEN_256, "Some string goes here..");
charPtr = buf;
}
std::cout << charPtr << std::endl;
for (int n = 0; n < 3000; ++n)
{
*charPtr2 = 'a';
charPtr2++;
}
std::cout << charPtr << std::endl;
}
Output
Some string goes here..
Some string goes haaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaca
Of course, keeping in mind that every compiler handles optimizations differently, and this may or may not happen in every case. That is why the behavior is 'undefined'. This example is more-so demonstrating overrunning the stack intentionally (buffer-overrun), but it illustrates the same effect. I'd produce a more direct example of legitimate cases where this could happen, but ironically undefined behavior is difficult to intentionally reproduce.
Yes, accessing charPtr in this way is undefined behaviour - and hence dangerous - because buf goes out of scope at the closing brace.
In practise, the code may work (or appear to work) because the memory used for buf is not re-used immediately but you should not, of course, rely on that. Whoever wrote this code made a mistake.

Storing register values from a hook in assembly/c++ and save to array

I'm trying to make a hook that takes the rdi from a function and stores it in an array.
I know how to do it using inline assembly as by the code example:
...
uintptr_t objStart = 0x0;
uintptr_t jmpBack = 0x0;
Ent* ents[255];
Ent* entsptr;
void hook()
{
__asm {
movss xmm10, [rdi + 0x54]
movss xmm11, [rdi + 0x58]
mov EntityObjStart, rdi
push rax
push rcx
push rdx
push rbx
push rsp
push rbp
push rsi
push rdi
}
__asm {
mov rax, ObjStart
mov[entsptr], rax
}
...
for (int i = 0; i < 254; i++)
{
if (ents[i] == 0)
{
ents[i] = entsptr;
break;
}
}
...
Sadly this won't work because MSVS doesn't support inline assembly for 64 bit. I tried doing a separate asm file and link that together with the project, which works, but I don't know how to pass in the pointers to the assembly file or get the RDI out so I can store it in the array...
If you guys have any suggestions, that would be amazing!

Trying to create a windows 8 syscall callgate function

I have a windows 7 callgate function that I use to call NT functions directly:
//Windows 7 syscall
__declspec(naked)
NTSTATUS __fastcall wow64 ( DWORD ecxId, char *edxArgs )
{
__asm
{
mov eax, ecx;
mov ecx, m_param;
call DWORD ptr fs:[0xc0];
add esp, 0x4;
retn;
};
}
NTSTATUS callGate ( DWORD id, ... )
{
va_list valist;
va_start(valist,id);
return wow64(id,valist);
}
//Example NTClose function
NTSTATUS closeHandle ( void *object )
{
m_param = 0;
return callGate ( 0xc, object );
}
I am trying to do the same thing for windows 8.1. I have updated all of the function call indexes; however I noticed the actual callgate function is quite different on windows 8.1:
Here is what the actual call gate looks like (located in ntdll.dll) for the function ZwCreateThreadEx
mov eax, 0xA5 //the call index
xor ecx, ecx //(m_param)
lea edx, dword ptr ss:[esp + 0x4] //this causes an sp-analysis failure in IDA
call dword ptr fs:[0xC0]
add esp, 0x4
retn 0x2C
Now here is the EXACT same NT function (ZwCreateThreadEx) on windows 8.1
mov eax, 0xB0 //the call index
call dword ptr fs:[0xC0]
retn 0x2C //2c/4 = 11 parameters
I have been trying all kinds of stuff to get this working on windows 8.1 but have had no avail. I cannot explain what the issue is or what is going wrong, all I know is I am doing it correctly on windows 7.
From the looks of the W8.1 function, I have attempted to come up with this single function (Does not work):
DWORD dwebp,dwret,dwparams; //for saving stuff
NTSTATUS __cdecl callGate ( DWORD id, DWORD numparams, ... )
{
_asm
{
pop dwebp; //save ebp off stack
pop dwret; //save return address
pop eax; //save id
pop dwparams; //save param count
push dwret; //push return addy back onto stack cuz thats how windows has it
JMP DWORD ptr fs:[0xc0]; //call with correct stackframe (i think)
mov ecx, numparams; //store num params
imul ecx, 4; //multiply numparams by sizeof(int)
add esp, ecx; //add to esp
ret;
};
}
Any help would be appreciated greatly.
Your new callGate function doesn't set up the stack frame you want, the return address at the top of the stack is return address of callGate not the instruction after the call.
This is what the stack looks like after the CALL instruction is executed in your example ZwCreateThreadEx from Windows 8.1:
return address (retn 0x2c instruction)
return address (caller of ZwCreateThreadEx)
arguments (11 DWORDs)
Here's what the stack looks like after the JMP instruction is executed in your new callGate function:
return address (caller of callGate)
arguments
There are other problems with your new callGate function. It saves values in global variables which means you function isn't thread safe. Two threads can't call callBack at the same time without trashing these saved values. It uses inline assembly which both makes your code more complicated that it needs to be and make its dependent on undocumented behaviour: how the compiler will set up the stack for the function.
Here's how I write your Windows 8.1 version of callGate in MASM:
_text SEGMENT
MAXARGS = 16
do_call MACRO argcount
##call&argcount:
call DWORD PTR fs:[0C0h]
ret argcount * 4
ENDM
call_table_entry MACRO argcount
DD OFFSET ##call&argcount
ENDM
_callGate PROC
pop edx ; return address
pop eax ; id
pop ecx ; numparams
push edx ; return address
cmp ecx, MAXARGS
jg ##fail
jmp [##call_table + ecx * 4]
##args = 0
REPT MAXARGS + 1
do_call %##args
##args = ##args + 1
ENDM
##fail:
; add better error handling
int 3
jmp ##fail
##call_table:
##args = 0
REPT MAXARGS + 1
call_table_entry %##args
##args = ##args + 1
ENDM
_callGate ENDP
_TEXT ENDS
END
This implementation is limited to MAXARGS arguments (change the value if any Windows system call takes more than 16 arguments). It uses macros generate a table of CALL/RET code blocks to avoid having to store the number of arguments somewhere across the call. I have a version that supports any number of arguments but it's more complicated and a fair bit slower. This implementation is untested, I don't have Windows 8.1.

Visual C++ - enable optimizations and browse the optimized code

How I can compile my project with optimizations turned on and see what optimizations changed in my code.
For example:
My original code:
printf("Test: %d",52);
for (int empty=0;i<100000;i++) {
//Nothing here
}
Now when I compile my code with optimzations ,I want to see: (I think it'll be like that)
printf("Test: 52");
The compiler doesn't optimize modify the source code (what you've shown in your question), but the binary, which consists of asm instructions.
How you turn optimization on and off depends on the compiler, so you'll have to refer to its documentation.
In MSVS, you can choose this option from the top toolbar - look for Debug (un-optimized) vs Release (optimized). You can see the binary code by stepping through the code with the debugger, right click -> show dissasembly.
Your code, for example, generates:
Without optimizations:
printf("Test: %d",52);
0097171E mov esi,esp
00971720 push 34h
00971722 push offset string "Test: %d" (9788C8h)
00971727 call dword ptr [__imp__printf (97C3E0h)]
0097172D add esp,8
00971730 cmp esi,esp
00971732 call #ILT+575(__RTC_CheckEsp) (971244h)
for (int i=0;i<100000;i++) {
00971737 mov dword ptr [i],0
0097173E jmp wmain+49h (971749h)
00971740 mov eax,dword ptr [i]
00971743 add eax,1
00971746 mov dword ptr [i],eax
00971749 cmp dword ptr [i],186A0h
00971750 jge wmain+54h (971754h)
//Nothing here
}
00971752 jmp wmain+40h (971740h)
With optimizations:
printf("Test: %d",52);
013A1000 push 34h
013A1002 push offset string "Test: %d" (13A20F4h)
013A1007 call dword ptr [__imp__printf (13A209Ch)]
013A100D add esp,8
for (int i=0;i<100000;i++) {
//Nothing here
}

Unusual heap size limitations in VS2003 C++

I have a C++ app that uses large arrays of data, and have noticed while testing that it is running out of memory, while there is still plenty of memory available. I have reduced the code to a sample test case as follows;
void MemTest()
{
size_t Size = 500*1024*1024; // 512mb
if (Size > _HEAP_MAXREQ)
TRACE("Invalid Size");
void * mem = malloc(Size);
if (mem == NULL)
TRACE("allocation failed");
}
If I create a new MFC project, include this function, and run it from InitInstance, it works fine in debug mode (memory allocated as expected), yet fails in release mode (malloc returns NULL). Single stepping through release into the C run times, my function gets inlined I get the following
// malloc.c
void * __cdecl _malloc_base (size_t size)
{
void *res = _nh_malloc_base(size, _newmode);
RTCCALLBACK(_RTC_Allocate_hook, (res, size, 0));
return res;
}
Calling _nh_malloc_base
void * __cdecl _nh_malloc_base (size_t size, int nhFlag)
{
void * pvReturn;
// validate size
if (size > _HEAP_MAXREQ)
return NULL;
'
'
And (size > _HEAP_MAXREQ) returns true and hence my memory doesn't get allocated. Putting a watch on size comes back with the exptected 512MB, which suggests the program is linking into a different run-time library with a much smaller _HEAP_MAXREQ. Grepping the VC++ folders for _HEAP_MAXREQ shows the expected 0xFFFFFFE0, so I can't figure out what is happening here. Anyone know of any CRT changes or versions that would cause this problem, or am I missing something way more obvious?
Edit: As suggested by Andreas, looking at this under this assembly view shows the following;
--- f:\vs70builds\3077\vc\crtbld\crt\src\malloc.c ------------------------------
_heap_alloc:
0040B0E5 push 0Ch
0040B0E7 push 4280B0h
0040B0EC call __SEH_prolog (40CFF8h)
0040B0F1 mov esi,dword ptr [size]
0040B0F4 cmp dword ptr [___active_heap (434660h)],3
0040B0FB jne $L19917+7 (40B12Bh)
0040B0FD cmp esi,dword ptr [___sbh_threshold (43464Ch)]
0040B103 ja $L19917+7 (40B12Bh)
0040B105 push 4
0040B107 call _lock (40DE73h)
0040B10C pop ecx
0040B10D and dword ptr [ebp-4],0
0040B111 push esi
0040B112 call __sbh_alloc_block (40E736h)
0040B117 pop ecx
0040B118 mov dword ptr [pvReturn],eax
0040B11B or dword ptr [ebp-4],0FFFFFFFFh
0040B11F call $L19916 (40B157h)
$L19917:
0040B124 mov eax,dword ptr [pvReturn]
0040B127 test eax,eax
0040B129 jne $L19917+2Ah (40B14Eh)
0040B12B test esi,esi
0040B12D jne $L19917+0Ch (40B130h)
0040B12F inc esi
0040B130 cmp dword ptr [___active_heap (434660h)],1
0040B137 je $L19917+1Bh (40B13Fh)
0040B139 add esi,0Fh
0040B13C and esi,0FFFFFFF0h
0040B13F push esi
0040B140 push 0
0040B142 push dword ptr [__crtheap (43465Ch)]
0040B148 call dword ptr [__imp__HeapAlloc#12 (425144h)]
0040B14E call __SEH_epilog (40D033h)
0040B153 ret
$L19914:
0040B154 mov esi,dword ptr [ebp+8]
$L19916:
0040B157 push 4
0040B159 call _unlock (40DDBEh)
0040B15E pop ecx
$L19929:
0040B15F ret
_nh_malloc:
0040B160 cmp dword ptr [esp+4],0FFFFFFE0h
0040B165 ja _nh_malloc+29h (40B189h)
With the registers as follows;
EAX = 009C8AF0 EBX = FFFFFFFF ECX = 009C8A88 EDX = 00747365 ESI = 00430F80
EDI = 00430F80 EIP = 0040B160 ESP = 0013FDF4 EBP = 0013FFC0 EFL = 00000206
So the compare does appear to be against the correct constant, i.e. #040B160 cmp dword ptr [esp+4],0FFFFFFE0h, also esp+4 = 0013FDF8 = 1F400000 (my 512mb)
Second edit: Problem was actually in HeapAlloc, as per Andreas' post. Changing to a new seperate heap for large objects, using HeapCreate & HeapAlloc, did not help alleviate the problem, nor did an attempt to use VirtualAlloc with various parameters. Some further experimentation has shown that where allocation one large section of contiguous memory fails, two smaller blocks yielding the same total memory is ok. e.g. where a 300MB malloc fails, 2 x 150MB mallocs work ok. So it looks like I'll need a new array class that can live in a number of biggish memory fragments rather than a single contiguous block. Not a major problem, but I would have expected a bit more out of Win32 in this day and age.
Last edit: The following yielded 1.875GB of space, albeit non-contiguous
#define TenMB 1024*1024*10
void SmallerAllocs()
{
size_t Total = 0;
LPVOID p[200];
for (int i = 0; i < 200; i++)
{
p[i] = malloc(TenMB);
if (p[i])
Total += TenMB; else
break;
}
CString Msg;
Msg.Format("Allocated %0.3lfGB",Total/(1024.0*1024.0*1024.0));
AfxMessageBox(Msg,MB_OK);
}
May it be the cast that the debugger is playing a trick on you in release-mode? Neither single stepping nor the values of variables are reliable in release-mode.
I tried your example in VS2003 in release mode, and when single stepping it does at first look like the code is landing on the return NULL line, but when I continue stepping it eventually continues into HeapAlloc, I would guess that it's this function that's failing, looking at the disassembly if (size > _HEAP_MAXREQ) reveals the following:
00401078 cmp dword ptr [esp+4],0FFFFFFE0h
so I don't think it's a problem with _HEAP_MAXREQ.