How I can compile my project with optimizations turned on and see what optimizations changed in my code.
For example:
My original code:
printf("Test: %d",52);
for (int empty=0;i<100000;i++) {
//Nothing here
}
Now when I compile my code with optimzations ,I want to see: (I think it'll be like that)
printf("Test: 52");
The compiler doesn't optimize modify the source code (what you've shown in your question), but the binary, which consists of asm instructions.
How you turn optimization on and off depends on the compiler, so you'll have to refer to its documentation.
In MSVS, you can choose this option from the top toolbar - look for Debug (un-optimized) vs Release (optimized). You can see the binary code by stepping through the code with the debugger, right click -> show dissasembly.
Your code, for example, generates:
Without optimizations:
printf("Test: %d",52);
0097171E mov esi,esp
00971720 push 34h
00971722 push offset string "Test: %d" (9788C8h)
00971727 call dword ptr [__imp__printf (97C3E0h)]
0097172D add esp,8
00971730 cmp esi,esp
00971732 call #ILT+575(__RTC_CheckEsp) (971244h)
for (int i=0;i<100000;i++) {
00971737 mov dword ptr [i],0
0097173E jmp wmain+49h (971749h)
00971740 mov eax,dword ptr [i]
00971743 add eax,1
00971746 mov dword ptr [i],eax
00971749 cmp dword ptr [i],186A0h
00971750 jge wmain+54h (971754h)
//Nothing here
}
00971752 jmp wmain+40h (971740h)
With optimizations:
printf("Test: %d",52);
013A1000 push 34h
013A1002 push offset string "Test: %d" (13A20F4h)
013A1007 call dword ptr [__imp__printf (13A209Ch)]
013A100D add esp,8
for (int i=0;i<100000;i++) {
//Nothing here
}
Related
I'm looking for a double check on my understanding. I ran across code of this form:
#define BUFLEN_256 256
int main()
{
const char* charPtr = "";
if (true /* some real test here */)
{
char buf[BUFLEN_256] = { 0 };
snprintf(buf, BUFLEN_256, "Some string goes here..");
charPtr = buf;
}
std::cout << charPtr << std::endl; // Is accessing charPtr technically dangerous here?
}
My immediate thought was bug, the stack memory assigned to buf[] is no longer guaranteed to belong to the array once you exit the if(){}. But the code builds and runs without problem, and in double checking myself I got confused. I'm not good at assembly, but if I'm reading it correctly it does not appear that the stack pointer is reset after leaving the curly braces. Can someone double check me on that and chime in as to whether this code is technically valid? Here is the code with the assembly (built with Visual Studio 2019). My thought is this code is not OK, but I've been wrong on odd issues before.
#define BUFLEN_256 256
int main()
{
00DA25C0 push ebp
00DA25C1 mov ebp,esp
00DA25C3 sub esp,1D8h
00DA25C9 push ebx
00DA25CA push esi
00DA25CB push edi
00DA25CC lea edi,[ebp-1D8h]
00DA25D2 mov ecx,76h
00DA25D7 mov eax,0CCCCCCCCh
00DA25DC rep stos dword ptr es:[edi]
00DA25DE mov eax,dword ptr [__security_cookie (0DAC004h)]
00DA25E3 xor eax,ebp
00DA25E5 mov dword ptr [ebp-4],eax
00DA25E8 mov ecx,offset _1FACD15F_scratch#cpp (0DAF029h)
00DA25ED call #__CheckForDebuggerJustMyCode#4 (0DA138Eh)
const char* charPtr = "";
00DA25F2 mov dword ptr [charPtr],offset string "" (0DA9B30h)
if (true /* some real test here */)
00DA25F9 mov eax,1
00DA25FE test eax,eax
00DA2600 je main+7Ah (0DA263Ah)
{
char buf[BUFLEN_256] = { 0 };
00DA2602 push 100h
00DA2607 push 0
00DA2609 lea eax,[ebp-114h]
00DA260F push eax
00DA2610 call _memset (0DA1186h)
00DA2615 add esp,0Ch
snprintf(buf, BUFLEN_256, "Some string goes here..");
00DA2618 push offset string "Some string goes here.." (0DA9BB8h)
00DA261D push 100h
00DA2622 lea eax,[ebp-114h]
00DA2628 push eax
00DA2629 call _snprintf (0DA1267h)
00DA262E add esp,0Ch
charPtr = buf;
00DA2631 lea eax,[ebp-114h]
00DA2637 mov dword ptr [charPtr],eax
}
std::cout << charPtr << std::endl;
00DA263A mov esi,esp
00DA263C push offset std::endl<char,std::char_traits<char> > (0DA103Ch)
00DA2641 mov eax,dword ptr [charPtr]
00DA2644 push eax
00DA2645 mov ecx,dword ptr [__imp_std::cout (0DAD0D4h)]
00DA264B push ecx
00DA264C call std::operator<<<std::char_traits<char> > (0DA11AEh)
00DA2651 add esp,8
00DA2654 mov ecx,eax
00DA2656 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0DAD0A0h)]
00DA265C cmp esi,esp
00DA265E call __RTC_CheckEsp (0DA129Eh)
}
00DA2663 xor eax,eax
00DA2665 push edx
00DA2666 mov ecx,ebp
00DA2668 push eax
00DA2669 lea edx,ds:[0DA2694h]
00DA266F call #_RTC_CheckStackVars#8 (0DA1235h)
00DA2674 pop eax
00DA2675 pop edx
00DA2676 pop edi
00DA2677 pop esi
00DA2678 pop ebx
00DA2679 mov ecx,dword ptr [ebp-4]
00DA267C xor ecx,ebp
00DA267E call #__security_check_cookie#4 (0DA1181h)
00DA2683 add esp,1D8h
00DA2689 cmp ebp,esp
00DA268B call __RTC_CheckEsp (0DA129Eh)
00DA2690 mov esp,ebp
00DA2692 pop ebp
00DA2693 ret
00DA2694 add dword ptr [eax],eax
00DA2696 add byte ptr [eax],al
00DA2698 pushfd
00DA2699 fiadd dword ptr es:[eax]
00DA269C in al,dx
00DA269D ?? ??????
00DA269E ?? ??????
}
My immediate thought was bug, the stack memory assigned to buf[] is no longer guaranteed to belong to the array once you exit the if(){}.
That is correct.
But the code builds and runs without problem
Undefined Behavior. In the cout << charPtr statement, charPtr is a dangling pointer to invalid memory. Whether or not the memory has been physically freed is irrelevent. The memory has gone out of scope.
I'm not good at assembly, but if I'm reading it correctly it does not appear that the stack pointer is reset after leaving the curly braces.
That is correct.
The memory for the array is being pre-allocated at the top of the stack frame when the function is entered (as part of the sub esp, 1D8h instruction), and then gets released during cleanup of the stack frame when the function exits (as part of the add esp, 1D8h instruction).
As you can see, when the if is entered, the very first thing it does is to call _memset() to zero out an array which already exists at [ebp-114h].
But that is an implementation detail, don't rely on that.
Can someone double check me on that and chime in as to whether this code is technically valid?
It is not.
What you're seeing is 'undefined' behavior. Stack memory is typically allocated all in one go at the start. So when a variable goes out-of-scope on the stack, that memory becomes available for re-use. Since you're not overwriting the stack with anything after the if statement, the data previously stored there is still intact. If you were to allocate additional memory/data to the stack after the if statement, you'd see a much different result.
See this post here:
What happens when a variable goes out of scope?
Edit:
To elaborate and demonstrate this, consider the following modification of your code (Compiled on VS2019 v142 x64):
#include <iostream>
#define BUFLEN_256 256
int main()
{
char* charPtr;
char other_buf[BUFLEN_256] = { 0 };
char* charPtr2 = other_buf;
if (true /* some real test here */)
{
char buf[BUFLEN_256] = { 0 };
snprintf(buf, BUFLEN_256, "Some string goes here..");
charPtr = buf;
}
std::cout << charPtr << std::endl;
for (int n = 0; n < 3000; ++n)
{
*charPtr2 = 'a';
charPtr2++;
}
std::cout << charPtr << std::endl;
}
Output
Some string goes here..
Some string goes haaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaca
Of course, keeping in mind that every compiler handles optimizations differently, and this may or may not happen in every case. That is why the behavior is 'undefined'. This example is more-so demonstrating overrunning the stack intentionally (buffer-overrun), but it illustrates the same effect. I'd produce a more direct example of legitimate cases where this could happen, but ironically undefined behavior is difficult to intentionally reproduce.
Yes, accessing charPtr in this way is undefined behaviour - and hence dangerous - because buf goes out of scope at the closing brace.
In practise, the code may work (or appear to work) because the memory used for buf is not re-used immediately but you should not, of course, rely on that. Whoever wrote this code made a mistake.
I'm trying to make a hook that takes the rdi from a function and stores it in an array.
I know how to do it using inline assembly as by the code example:
...
uintptr_t objStart = 0x0;
uintptr_t jmpBack = 0x0;
Ent* ents[255];
Ent* entsptr;
void hook()
{
__asm {
movss xmm10, [rdi + 0x54]
movss xmm11, [rdi + 0x58]
mov EntityObjStart, rdi
push rax
push rcx
push rdx
push rbx
push rsp
push rbp
push rsi
push rdi
}
__asm {
mov rax, ObjStart
mov[entsptr], rax
}
...
for (int i = 0; i < 254; i++)
{
if (ents[i] == 0)
{
ents[i] = entsptr;
break;
}
}
...
Sadly this won't work because MSVS doesn't support inline assembly for 64 bit. I tried doing a separate asm file and link that together with the project, which works, but I don't know how to pass in the pointers to the assembly file or get the RDI out so I can store it in the array...
If you guys have any suggestions, that would be amazing!
I got the snippet below from this comment by #CaffeineAddict.
#include <iostream>
template<typename base_t, typename expo_t>
constexpr base_t POW(base_t base, expo_t expo)
{
return (expo != 0) ? base * POW(base, expo - 1) : 1;
}
int main(int argc, char** argv)
{
std::cout << POW((unsigned __int64)2, 63) << std::endl;
return 0;
}
with the following disassembly obtained from VS2015:
int main(int argc, char** argv)
{
009418A0 push ebp
009418A1 mov ebp,esp
009418A3 sub esp,0C0h
009418A9 push ebx
009418AA push esi
009418AB push edi
009418AC lea edi,[ebp-0C0h]
009418B2 mov ecx,30h
009418B7 mov eax,0CCCCCCCCh
009418BC rep stos dword ptr es:[edi]
std::cout << POW((unsigned __int64)2, 63) << std::endl;
009418BE mov esi,esp
009418C0 push offset std::endl<char,std::char_traits<char> > (0941064h)
009418C5 push 3Fh
009418C7 push 0
009418C9 push 2
009418CB call POW<unsigned __int64,int> (09410FAh) <<========
009418D0 add esp,0Ch
009418D3 mov edi,esp
009418D5 push edx
009418D6 push eax
009418D7 mov ecx,dword ptr [_imp_?cout#std##3V?$basic_ostream#DU?$char_traits#D#std###1#A (094A098h)]
009418DD call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (094A0ACh)]
009418E3 cmp edi,esp
009418E5 call __RTC_CheckEsp (0941127h)
009418EA mov ecx,eax
009418EC call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (094A0B0h)]
009418F2 cmp esi,esp
009418F4 call __RTC_CheckEsp (0941127h)
return 0;
009418F9 xor eax,eax
}
which shows (see the characters "<<======" introduced by me in the disassembly) that the compiler didn't evaluate the function POW at compile time. From his comment, #CaffeineAddict seemed to expect this behavior from the compiler. But I still can't understand why was this expected at all?
Two reasons.
First, a constexpr function isn't guaranteed to be called at compile time. To force the compiler to call it at compile time, you must store it in a constexpr variable first, i.e.
constexpr auto pow = POW((unsigned __int64)2, 63);
std::cout << pow << std::endl;
Secondly, you must build the project in Release configuration. In VS, you'll find you can break-point through constexpr functions if you have built the project using Debug configuration.
I am trying to place a call instruction at a function to simulate a hook, so i should be replacing 6 bytes in the beginning of the function to place my call which is 2 bytes for the opcode and a dword for the address. however here is the disassembly of the function before i hook it
void realFunction()
{
00B533C0 push ebp
00B533C1 mov ebp,esp
00B533C3 sub esp,0C0h
00B533C9 push ebx
00B533CA push esi
00B533CB push edi
00B533CC lea edi,[ebp-0C0h]
00B533D2 mov ecx,30h
00B533D7 mov eax,0CCCCCCCCh
00B533DC rep stos dword ptr es:[edi]
MessageBox(NULL, "realFunction()", "Trace", MB_OK);
00B533DE mov esi,esp
00B533E0 push 0
00B533E2 push 0B56488h
00B533E7 push 0B56490h
00B533EC push 0
00B533EE call dword ptr ds:[0B5613Ch]
00B533F4 cmp esi,esp
00B533F6 call _RTC_CheckEsp (0B53A10h)
}
and strangely here is it after i just replace 6 bytes
void realFunction()
{
00B533C0 call fakeFunction (0B52EF0h)
00B533C5 rol byte ptr [eax],0 <--
00B533C8 add byte ptr [ebx+56h],dl <--
00B533CB push edi <--
00B533CC lea edi,[ebp-0C0h] <--
00B533D2 mov ecx,30h
00B533D7 mov eax,0CCCCCCCCh
00B533DC rep stos dword ptr es:[edi]
MessageBox(NULL, "realFunction()", "Trace", MB_OK);
00B533DE mov esi,esp
00B533E0 push 0
00B533E2 push 0B56488h
00B533E7 push 0B56490h
00B533EC push 0
00B533EE call dword ptr ds:[0B5613Ch]
00B533F4 cmp esi,esp
00B533F6 call _RTC_CheckEsp (0B53A10h)
}
code for the hook
#include <iostream>
#include <windows.h>
using namespace std;
void realFunction()
{
MessageBox(NULL, "realFunction()", "Trace", MB_OK);
}
__declspec(naked) void fakeFunction()
{
__asm {
pushad;
pushfd;
}
MessageBox(NULL, "fakeFunction()", "Trace", MB_OK);
__asm{
popfd;
popad;
ret; //This should return back and resumes the execution of the original function;
}
}
void main()
{
DWORD size = sizeof(double);
DWORD oldProtection;
DWORD realFunctionAddr = (DWORD)realFunction;
DWORD fakeFunctionAddr = (DWORD)fakeFunction;
VirtualProtect((LPVOID)realFunctionAddr, size, PAGE_EXECUTE_READWRITE, &oldProtection);
*((PBYTE)(realFunctionAddr)) = 0xE8;
*((PDWORD)(realFunctionAddr + 1)) = fakeFunctionAddr - realFunctionAddr - 5;
VirtualProtect((LPVOID)fakeFunctionAddr, size, oldProtection, &oldProtection);
realFunction();
while (true){
cin.get();
}
}
I want to understand why this happens, why not just the 6 bytes i replaced are changed ?
As you can see, the sub esp,0C0h instruction begins at address 00B533C3, but the next instruction push ebx begins at address 00B533C9. You have overwritten addresses 00B533C0 through 00B533C5, so immediately after your 6 bytes you are in the middle of the sub esp,0C0h instruction.
The disassembler has no way of knowing that a certain byte is garbage and not an instruction, so it tries to interpret the bytes as instructions, as best as it can, and what you see is, of course, nonsensical instructions. After a while it just so happens (by coincidence) that the end of a nonsensical instruction coincides with the end of an actual instruction that used to be there, so from that point on the disassembler interprets instructions successfully, that's why the remainder of your function looks okay.
If you look at the actual bytes, and not at the assembly language mnemonic interpretations of these bytes, you will see that nothing funky is going on.
(Except, perhaps, for the fact that you appear to have replaced 5, not 6 bytes.)
I have a C++ app that uses large arrays of data, and have noticed while testing that it is running out of memory, while there is still plenty of memory available. I have reduced the code to a sample test case as follows;
void MemTest()
{
size_t Size = 500*1024*1024; // 512mb
if (Size > _HEAP_MAXREQ)
TRACE("Invalid Size");
void * mem = malloc(Size);
if (mem == NULL)
TRACE("allocation failed");
}
If I create a new MFC project, include this function, and run it from InitInstance, it works fine in debug mode (memory allocated as expected), yet fails in release mode (malloc returns NULL). Single stepping through release into the C run times, my function gets inlined I get the following
// malloc.c
void * __cdecl _malloc_base (size_t size)
{
void *res = _nh_malloc_base(size, _newmode);
RTCCALLBACK(_RTC_Allocate_hook, (res, size, 0));
return res;
}
Calling _nh_malloc_base
void * __cdecl _nh_malloc_base (size_t size, int nhFlag)
{
void * pvReturn;
// validate size
if (size > _HEAP_MAXREQ)
return NULL;
'
'
And (size > _HEAP_MAXREQ) returns true and hence my memory doesn't get allocated. Putting a watch on size comes back with the exptected 512MB, which suggests the program is linking into a different run-time library with a much smaller _HEAP_MAXREQ. Grepping the VC++ folders for _HEAP_MAXREQ shows the expected 0xFFFFFFE0, so I can't figure out what is happening here. Anyone know of any CRT changes or versions that would cause this problem, or am I missing something way more obvious?
Edit: As suggested by Andreas, looking at this under this assembly view shows the following;
--- f:\vs70builds\3077\vc\crtbld\crt\src\malloc.c ------------------------------
_heap_alloc:
0040B0E5 push 0Ch
0040B0E7 push 4280B0h
0040B0EC call __SEH_prolog (40CFF8h)
0040B0F1 mov esi,dword ptr [size]
0040B0F4 cmp dword ptr [___active_heap (434660h)],3
0040B0FB jne $L19917+7 (40B12Bh)
0040B0FD cmp esi,dword ptr [___sbh_threshold (43464Ch)]
0040B103 ja $L19917+7 (40B12Bh)
0040B105 push 4
0040B107 call _lock (40DE73h)
0040B10C pop ecx
0040B10D and dword ptr [ebp-4],0
0040B111 push esi
0040B112 call __sbh_alloc_block (40E736h)
0040B117 pop ecx
0040B118 mov dword ptr [pvReturn],eax
0040B11B or dword ptr [ebp-4],0FFFFFFFFh
0040B11F call $L19916 (40B157h)
$L19917:
0040B124 mov eax,dword ptr [pvReturn]
0040B127 test eax,eax
0040B129 jne $L19917+2Ah (40B14Eh)
0040B12B test esi,esi
0040B12D jne $L19917+0Ch (40B130h)
0040B12F inc esi
0040B130 cmp dword ptr [___active_heap (434660h)],1
0040B137 je $L19917+1Bh (40B13Fh)
0040B139 add esi,0Fh
0040B13C and esi,0FFFFFFF0h
0040B13F push esi
0040B140 push 0
0040B142 push dword ptr [__crtheap (43465Ch)]
0040B148 call dword ptr [__imp__HeapAlloc#12 (425144h)]
0040B14E call __SEH_epilog (40D033h)
0040B153 ret
$L19914:
0040B154 mov esi,dword ptr [ebp+8]
$L19916:
0040B157 push 4
0040B159 call _unlock (40DDBEh)
0040B15E pop ecx
$L19929:
0040B15F ret
_nh_malloc:
0040B160 cmp dword ptr [esp+4],0FFFFFFE0h
0040B165 ja _nh_malloc+29h (40B189h)
With the registers as follows;
EAX = 009C8AF0 EBX = FFFFFFFF ECX = 009C8A88 EDX = 00747365 ESI = 00430F80
EDI = 00430F80 EIP = 0040B160 ESP = 0013FDF4 EBP = 0013FFC0 EFL = 00000206
So the compare does appear to be against the correct constant, i.e. #040B160 cmp dword ptr [esp+4],0FFFFFFE0h, also esp+4 = 0013FDF8 = 1F400000 (my 512mb)
Second edit: Problem was actually in HeapAlloc, as per Andreas' post. Changing to a new seperate heap for large objects, using HeapCreate & HeapAlloc, did not help alleviate the problem, nor did an attempt to use VirtualAlloc with various parameters. Some further experimentation has shown that where allocation one large section of contiguous memory fails, two smaller blocks yielding the same total memory is ok. e.g. where a 300MB malloc fails, 2 x 150MB mallocs work ok. So it looks like I'll need a new array class that can live in a number of biggish memory fragments rather than a single contiguous block. Not a major problem, but I would have expected a bit more out of Win32 in this day and age.
Last edit: The following yielded 1.875GB of space, albeit non-contiguous
#define TenMB 1024*1024*10
void SmallerAllocs()
{
size_t Total = 0;
LPVOID p[200];
for (int i = 0; i < 200; i++)
{
p[i] = malloc(TenMB);
if (p[i])
Total += TenMB; else
break;
}
CString Msg;
Msg.Format("Allocated %0.3lfGB",Total/(1024.0*1024.0*1024.0));
AfxMessageBox(Msg,MB_OK);
}
May it be the cast that the debugger is playing a trick on you in release-mode? Neither single stepping nor the values of variables are reliable in release-mode.
I tried your example in VS2003 in release mode, and when single stepping it does at first look like the code is landing on the return NULL line, but when I continue stepping it eventually continues into HeapAlloc, I would guess that it's this function that's failing, looking at the disassembly if (size > _HEAP_MAXREQ) reveals the following:
00401078 cmp dword ptr [esp+4],0FFFFFFE0h
so I don't think it's a problem with _HEAP_MAXREQ.