malloc() returning address that I cannot access - c++

I have a C++ program that calls some C routines that are generated by Flex / Bison.
When I target a Windows 8.1 64-bit platform, I hit the following exception at runtime:
Unhandled exception at 0x0007FFFA70F2C39 (libapp.dll) in application.exe: 0xC0000005:
Access violation writing location 0x000000005A818118.
I traced this exception to the following piece of code:
YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
{
YY_BUFFER_STATE b;
b = (YY_BUFFER_STATE) yy_flex_alloc( sizeof( struct yy_buffer_state ) );
if ( ! b )
YY_FATAL_ERROR( "out of dynamic memory in yy_create_buffer()" );
b->yy_buf_size = size; // This access is what throws the exception
// ... rest of the function omitted ...
}
For reference, elsewhere in the code (also generated by Flex / Bison), we have:
typedef struct yy_buffer_state *YY_BUFFER_STATE;
struct yy_buffer_state
{
FILE *yy_input_file;
char *yy_ch_buf;
char *yy_buf_pos;
yy_size_t yy_buf_size;
// ... other fields omitted,
// total struct size is 56 bytes
};
static void *yy_flex_alloc( yy_size_t size )
{
return (void *) malloc( size );
}
I traced back to the malloc call and observed that malloc itself is returning the address 0x000000005A818118. I also checked errno, but it is not set after the call to malloc.
My question is: why does malloc give me an address that I don't have access to, and how can I make it give me a correct address?
Note: I only observe this behavior on Windows 8.1 64-bit. The code passes on 32-bit Windows variants, including Windows 7 32-bit.
Compilation information: I am compiling this on a 64-bit Windows 8.1 machine using Visual Studio 2012.
If it helps, here is the disassembled code:
// b = (YY_BUFFER_STATE) yy_flex_alloc( ... )
0007FFFA75E2C12 call yy_flex_alloc (07FFFA75E3070h)
0007FFFA75E2C17 mov qword ptr [b],rax
// if ( ! b ) YY_FATAL_ERROR( ... )
0007FFFA75E2C1C cmp qword ptr [b],0
0007FFFA75E2C22 jne yy_create_buffer+30h (07FFFA75E2C30h)
0007FFFA75E2C24 lea rcx,[yy_chk+58h (07FFFA7646A28h)]
0007FFFA75E2C2B call yy_fatal_error (07FFFA75E3770h)
// b->yy_buf_size = size
0007FFFA75E2C30 mov rax,qword ptr [b]
0007FFFA75E2C35 mov ecx,dword ptr [size]
0007FFFA75E2C39 mov dword ptr [rax+18h],ecx
Thanks!

The real answer is:
When Visual Studio compiles the flex-generated .c source, it does not include stdlib.h (where malloc is declared as returning void*), so the compiler falls back to an implicit declaration in which malloc returns int (a pre-C99 compatibility behavior).
Visual Studio prints:
warning C4013: 'malloc' undefined; assuming extern returning int
sizeof(int) == 4, but pointer values on x64 systems routinely exceed 4 bytes, so the returned pointer is simply truncated to its low 4 bytes.
This problem only shows up when compiling .c files with the x64 Visual Studio compiler; in a 32-bit build the truncated value happens to be the whole pointer.
So the solution is: include stdlib.h yourself, or define the macros that cause the flex-generated source to include stdlib.h.

Under normal circumstances malloc() will return a pointer to valid, accessible memory or else NULL. So, your symptoms indicate that malloc() is behaving in an unspecified way. I suspect that, at some point earlier, your program wrote outside of its valid memory, thereby corrupting the data structures used internally by malloc().
Examining your process with a run-time memory analysis tool should help you identify the source of the issue. [See this post for suggestions on memory analysis tools for Windows: Is there a good Valgrind substitute for Windows? ]

Related

How can I make single object larger than 2GB using new operator?

I'm trying to create a single object larger than 2GB using the new operator.
But if the size of the object is larger than 0x7fffffff, the size of the memory to be allocated becomes wrong.
I think it is done by the compiler, because the assembly code itself uses a strange allocation size.
I'm using Visual Studio 2015, configuration Release, x64.
Is this a bug in VS2015? If not, I want to know why the limitation exists.
The example code is as below with assembly code.
struct chunk1MB
{
char data[1024 * 1024];
};
class chunk1
{
chunk1MB data1[1024];
chunk1MB data2[1023];
char data[1024 * 1024 - 1];
};
class chunk2
{
chunk1MB data1[1024];
chunk1MB data2[1024];
};
auto* ptr1 = new chunk1;
00007FF668AF1044 mov ecx,7FFFFFFFh
00007FF668AF1049 call operator new (07FF668AF13E4h)
auto* ptr2 = new chunk2;
00007FF668AF104E mov rcx,0FFFFFFFF80000000h // must be 080000000h
00007FF668AF1055 mov rsi,rax
00007FF668AF1058 call operator new (07FF668AF13E4h)
Use a compiler like clang-cl that isn't broken, or that doesn't have intentional signed-32-bit implementation limits on max object size, whichever it is for MSVC. (Could this be affected by a largeaddressaware option?)
Current MSVC (19.33 on Godbolt) has the same bug, although it does seem to handle 2GiB static objects. But not 3GiB static objects; adding another 1GiB member leads to wrong code when accessing a byte more than 2GiB from the object's start (Godbolt -
mov BYTE PTR chunk2 static_chunk2-1073741825, 2 - note the negative offset.)
GCC targeting Linux makes correct code for the case of a 3GiB object, using mov r64, imm64 to get the absolute address into a register, since a RIP-relative addressing mode isn't usable. (In general you'd need gcc -mcmodel=medium to work correctly when some .data / .bss addresses are linked outside the low 2GiB and/or more than 2GiB away from code.)
MSVC seems to have internally truncated the size to signed 32-bit, and then sign-extended. Note the arg it passes to new: mov rcx, 0FFFFFFFF80000000h instead of mov ecx, 80000000h (which would set RCX = 0000000080000000h by implicit zero-extension when writing a 32-bit register.)
In a function that returns sizeof(chunk2); as a size_t, it works correctly, but interestingly prints the size as a negative number in the generated asm. That might be innocent: e.g. after realizing that the value fits in a 32-bit zero-extended value, MSVC's asm printing code might just always print 32-bit integers as signed decimal, with unsigned hex in a comment.
It's clearly different from how it passes the arg to new; in that case it used 64-bit operand-size in the machine code, so the same 32-bit immediate gets sign-extended to 64-bit, to a huge value near SIZE_MAX, which is of course vastly larger than any possible max object size for x86-64. (The 48-bit virtual address space is 1/65536th of the 64-bit value-range of size_t.)
unsigned __int64 sizeof_chunk2(void) PROC ; sizeof_chunk2, COMDAT
mov eax, -2147483648 ; 80000000H
ret 0
unsigned __int64 sizeof_chunk2(void) ENDP ; sizeof_chunk2
This looks like a compiler bug or intentional implementation limit; report it to Microsoft if it's not already known.
I'm not sure how to completely solve your issue, as it's not properly answered anywhere I've seen.
Memory models are tricky, and up until x64, 2GB was pretty much the limit.
As far as I know, no basic memory model in Windows supports allocations that large. Huge pages support 1GB of memory.
However, I want to point in a few different directions.
Three ways I found to achieve something similar:
The obvious answer: split your allocations into smaller chunks; it's also more memory-efficient.
Use a different kind of swap: you can write memory to files.
Use virtual memory (not sure if it's helpful to you), via the Windows API VirtualAlloc:
const static SIZE_T giga = 1024 * 1024 * 1024;
const static SIZE_T size = 4 * giga;
BYTE* ptr = static_cast<BYTE*>(VirtualAlloc(nullptr, (SIZE_T)size, MEM_COMMIT, PAGE_READWRITE));
VirtualFree(ptr, 0, MEM_RELEASE);
Best of luck.

How does a compiler store information about an array's size?

Recently I read on IsoCpp about how the compiler knows the size of an array created with new. The FAQ describes two ways of implementing this, but only at a basic level, without any internal details. I tried to find the implementation of these mechanisms in the STL sources from Microsoft and GCC, but as far as I can see, both of them just call malloc internally. I went deeper and found an implementation of the malloc function in GCC, but I couldn't figure out where the magic happens.
Is it possible to find out how this works, or is it implemented in the system runtime libraries?
Here is where the compiler stores the size in the source code for GCC: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/init.c#L3319-L3325
And the equivalent place in the source code for Clang: https://github.com/llvm/llvm-project/blob/c11051a4001c7f89e8655f1776a75110a562a45e/clang/lib/CodeGen/ItaniumCXXABI.cpp#L2183-L2185
What the compilers do is store a "cookie" which is the number of elements allocated (the N in new T[N]) immediately before the pointer that new T[N] returns. This in turn means that a few extra bytes have to be allocated in the call to operator new[]. The compiler generates code to do this at runtime.
operator new[](std::size_t x) itself does no work: It simply allocates x bytes. The compiler makes new T[N] call operator new[](sizeof(T) * N + cookie_size).
The compiler does not "know" the size (it's a run-time value), but it knows how to generate code to retrieve the size on a subsequent delete[] p.
At least for GCC targeting x86_64, it is possible to investigate this question by looking at the assembly GCC generates for this simple program:
#include <iostream>
struct Foo
{
int x, y;
~Foo() { std::cout << "Delete foo " << this << std::endl; }
};
Foo * create()
{
return new Foo[8];
}
void destroy(Foo * p)
{
delete[] p;
}
int main()
{
destroy(create());
}
Using Compiler Explorer, we see this code generated for the create function:
create():
sub rsp, 8
mov edi, 72
call operator new[](unsigned long)
mov QWORD PTR [rax], 8
add rax, 8
add rsp, 8
ret
It looks to me like the compiler is calling operator new[] to allocate 72 bytes of memory, which is 8 bytes more than is needed for the storage of the objects (8 * 8 = 64). Then it is storing the object count (8) at the beginning of this allocation, and adding 8 bytes to the pointer before returning it, so the pointer points to the first object.
This is one of the methods that was listed in the document you linked to:
Over-allocate the array and put n just to the left of the first Fred object.
I searched a little bit in the source code of libstdc++ to see whether this was implemented by the standard library or the compiler, and I think it's actually implemented by the compiler itself, though I could be wrong.

How to allocate an executable memory buffer?

I would like to allocate a buffer that I can execute in a Win32 program, but I get an exception in Visual Studio because malloc returns a non-executable memory zone. I read that there is an NX flag to disable... My goal is to convert bytecode to x86 assembly on the fly, keeping performance in mind.
Can someone help me?
You don't use malloc for that. Why would you anyway, in a C++ program? You also don't use new for executable memory, however. There's the Windows-specific VirtualAlloc function to reserve memory which you then mark as executable with the VirtualProtect function applying, for instance, the PAGE_EXECUTE_READ flag.
When you have done that, you can cast the pointer to the allocated memory to an appropriate function pointer type and just call the function. Don't forget to call VirtualFree when you are done.
Here is some very basic example code with no error handling or other sanity checks, just to show you how this can be accomplished in modern C++ (the program prints 5):
#include <windows.h>
#include <vector>
#include <iostream>
#include <cstring>
int main()
{
std::vector<unsigned char> const code =
{
0xb8, // move the following value to EAX:
0x05, 0x00, 0x00, 0x00, // 5
0xc3 // return what's currently in EAX
};
SYSTEM_INFO system_info;
GetSystemInfo(&system_info);
auto const page_size = system_info.dwPageSize;
// prepare the memory in which the machine code will be put (it's not executable yet):
auto const buffer = VirtualAlloc(nullptr, page_size, MEM_COMMIT, PAGE_READWRITE);
// copy the machine code into that memory:
std::memcpy(buffer, code.data(), code.size());
// mark the memory as executable:
DWORD dummy;
VirtualProtect(buffer, code.size(), PAGE_EXECUTE_READ, &dummy);
// interpret the beginning of the (now) executable memory as the entry
// point of a function taking no arguments and returning a 4-byte int:
auto const function_ptr = reinterpret_cast<std::int32_t(*)()>(buffer);
// call the function and store the result in a local std::int32_t object:
auto const result = function_ptr();
// free the executable memory:
VirtualFree(buffer, 0, MEM_RELEASE);
// use your std::int32_t:
std::cout << result << "\n";
}
It's very unusual compared to normal C++ memory management, but not really rocket science. The hard part is to get the actual machine code right. Note that my example here is just very basic x64 code.
Extending the above answer, a good practice is:
Allocate memory with VirtualAlloc and read-write access.
Fill that region with your code.
Change that region's protection with VirtualProtect to execute-read access.
Jump to / call the entry point in this region.
So it could look like this:
adr = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// write code to the region
ok = VirtualProtect(adr, size, PAGE_EXECUTE_READ, &oldProtection);
// execute the code in the region
As stated in the documentation for VirtualAlloc:
flProtect [in]
The memory protection for the region of pages to be allocated. If the pages are being committed, you can specify any one of the memory protection constants.
one of them is:
PAGE_EXECUTE
0x10
Enables execute access to the committed region of pages. An attempt to write to the committed region results in an access violation.
This flag is not supported by the CreateFileMapping function.
PAGE_EXECUTE_READ
0x20
Enables execute or read-only access to the committed region of pages. An attempt to write to the committed region results in an access violation.
Windows Server 2003 and Windows XP: This attribute is not supported by the CreateFileMapping function until Windows XP with SP2 and Windows Server 2003 with SP1.
PAGE_EXECUTE_READWRITE
0x40
Enables execute, read-only, or read/write access to the committed region of pages.
Windows Server 2003 and Windows XP: This attribute is not supported by the CreateFileMapping function until Windows XP with SP2 and Windows Server 2003 with SP1.
and so on from here
C version based off of Christian Hackl's answer
I think SIZE_T dwSize of VirtualAlloc should be the size of the code in bytes, not system_info.dwPageSize (what if sizeof code is bigger than system_info.dwPageSize?).
I don't know C enough to know if sizeof(code) is the "correct" way of getting the size of the machine code
this compiles under c++ so I guess it's not off topic lol
#include <Windows.h>
#include <stdio.h>
int main()
{
// double add(double a, double b) {
// return a + b;
// }
unsigned char code[] = { //Antonio Cuni - How to write a JIT compiler in 30 minutes: https://www.youtube.com/watch?v=DKns_rH8rrg&t=118s
0xf2,0x0f,0x58,0xc1, //addsd %xmm1,%xmm0
0xc3, //ret
};
LPVOID buffer = VirtualAlloc(NULL, sizeof(code), MEM_COMMIT, PAGE_READWRITE);
memcpy(buffer, code, sizeof(code));
//protect after write, because protect will prevent writing.
DWORD oldProtection;
VirtualProtect(buffer, sizeof(code), PAGE_EXECUTE_READ, &oldProtection);
double (*function_ptr)(double, double) = (double (*)(double, double))buffer; //is there a cleaner way to write this ?
// (*function_ptr)(2, 234) also works: calling through a function pointer
// implicitly dereferences it, so both spellings are equivalent.
double result = function_ptr(2, 234);
VirtualFree(buffer, 0, MEM_RELEASE);
printf("%f\n", result);
}
At link time, the linker will organize your program's memory footprint by allocating memory into data sections and code sections. The CPU will make sure that the program counter (the hard CPU register) value remains within a code section, or the CPU will throw a hardware exception for violating the memory bounds. This provides some security by making sure your program only executes valid code. malloc is intended for allocating data memory. Your application has a heap, whose size is established by the linker and which is marked as data memory, so at runtime malloc is just grabbing some of the virtual memory from your heap, which will always be data.
I hope this helps you have a better understanding what's going on, though it might not be enough to get you where you need to be. Perhaps you can pre-allocate a "code heap" or memory pool for your runtime-generated code. You will probably need to fuss with the linker to accomplish this but I don't know any of the details.

How does Visual Studio 2013 detect buffer overrun

Visual Studio 2013 C++ projects have a /GS switch to enable buffer security check validation at runtime. We are encountering many more STATUS_STACK_BUFFER_OVERRUN errors since upgrading to VS 2013, and suspect it has something to do with improved checking of buffer overrun in the new compiler. I've been trying to verify this and better understand how buffer overrun is detected. I'm befuddled by the fact that buffer overrun is reported even when the memory updated by a statement only changes the contents of another local variable on the stack in the same scope! So it must be checking not only that the change doesn't corrupt memory not "owned" by a local variable, but that the change doesn't affect any local variable other than that allocated to the one referenced by the individual update statement. How does this work? Has it changed since VS 2010?
Edit:
Here's an example illustrating a case that Mysticial's explanation doesn't cover:
void TestFunc1();
int _tmain(int argc, _TCHAR* argv[])
{
TestFunc1();
return 0;
}
void TestFunc1()
{
char buffer1[4] = ("123");
char buffer2[4] = ("456");
int diff = buffer1 - buffer2;
printf("%d\n", diff);
getchar();
buffer2[4] = '\0';
}
The output is 4 indicating that the memory about to be overwritten is within the bounds of buffer1 (immediately after buffer2), but then the program terminates with a buffer overrun. Technically it should be considered a buffer overrun, but I don't know how it's being detected since it's still within the local variables' storage and not really corrupting anything outside local variables.
This screenshot with memory layout proves it. After stepping one line the program aborted with the buffer overrun error.
I just tried the same code in VS 2010, and although debug mode caught the buffer overrun (with a buffer offset of 12), in release mode it did not catch it (with a buffer offset of 8). So I think VS 2013 tightened the behavior of the /GS switch.
Edit 2:
I managed to sneak past even VS 2013 range checking with this code. It still did not detect that an attempt to update one local variable actually updated another:
void TestFunc()
{
char buffer1[4] = "123";
char buffer2[4] = "456";
int diff;
if (buffer1 < buffer2)
{
puts("Sequence 1,2");
diff = buffer2 - buffer1;
}
else
{
puts("Sequence 2,1");
diff = buffer1 - buffer2;
}
printf("Offset: %d\n", diff);
switch (getchar())
{
case '1':
puts("Updating buffer 1");
buffer1[diff] = '!';
break;
case '2':
puts("Updating buffer 2");
buffer2[diff] = '!';
break;
}
getchar(); // Eat enter keypress
printf("%s,%s\n", buffer1, buffer2);
}
You are seeing an improvement to the /GS mechanism, first added to VS2012. Originally /GS could detect buffer overflows, but there was still a loophole through which attacking code could stomp the stack yet bypass the cookie. Roughly like this:
void foo(int index, char value) {
char buf[256];
buf[index] = value;
}
If the attacker can manipulate the value of index then the cookie doesn't help. This code is now rewritten to:
void foo(int index, char value) {
char buf[256];
buf[index] = value;
if (index >= 256) __report_rangefailure();
}
Just plain index checking. Which, when triggered, instantly terminates the app with __fastfail() if no debugger is attached. Backgrounder is here.
From the MSDN page on /GS in Visual Studio 2013 :
Security Checks
On functions that the compiler recognizes as subject to buffer overrun problems, the compiler allocates space on the stack before the return address. On function entry, the allocated space is loaded with a security cookie that is computed once at module load. On function exit, and during frame unwinding on 64-bit operating systems, a helper function is called to make sure that the value of the cookie is still the same. A different value indicates that an overwrite of the stack may have occurred. If a different value is detected, the process is terminated.
for more details, the same page refers to Compiler Security Checks In Depth:
What /GS Does
The /GS switch provides a "speed bump," or cookie, between the buffer and the return address. If an overflow writes over the return address, it will have to overwrite the cookie put in between it and the buffer, resulting in a new stack layout:
Function parameters
Function return address
Frame pointer
Cookie
Exception Handler frame
Locally declared variables and buffers
Callee save registers
The cookie will be examined in more detail later. The function's execution does change with these security checks. First, when a function is called, the first instructions to execute are in the function’s prolog. At a minimum, a prolog allocates space for the local variables on the stack, such as the following instruction:
sub esp, 20h
This instruction sets aside 32 bytes for use by local variables in the function. When the function is compiled with /GS, the function's prolog will set aside an additional four bytes and add three more instructions, as follows:
sub esp,24h
mov eax,dword ptr [___security_cookie (408040h)]
xor eax,dword ptr [esp+24h]
mov dword ptr [esp+20h],eax
The prolog contains an instruction that fetches a copy of the cookie, followed by an instruction that does a logical xor of the cookie and the return address, and then finally an instruction that stores the cookie on the stack directly below the return address. From this point forward, the function will execute as it does normally. When a function returns, the last thing to execute is the function’s epilog, which is the opposite of the prolog. Without security checks, it will reclaim the stack space and return, such as the following instructions:
add esp,20h
ret
When compiled with /GS, the security checks are also placed in the epilog:
mov ecx,dword ptr [esp+20h]
xor ecx,dword ptr [esp+24h]
add esp,24h
jmp __security_check_cookie (4010B2h)
The stack's copy of the cookie is retrieved and then follows with the XOR instruction with the return address. The ECX register should contain a value that matches the original cookie stored in the __security_cookie variable. The stack space is then reclaimed, and then, instead of executing the RET instruction, the JMP instruction to the __security_check_cookie routine is executed.
The __security_check_cookie routine is straightforward: if the cookie was unchanged, it executes the RET instruction and ends the function call. If the cookie fails to match, the routine calls report_failure. The report_failure function then calls __security_error_handler(_SECERR_BUFFER_OVERRUN, NULL). Both functions are defined in the seccook.c file of the C run-time (CRT) source files.

32-bit C++ to 64-bit C++ using Visual Studio 2010

I recently wanted to convert a 32-bit C++ project to 64-bit, but I got stuck on the first try. Could you point out any suggestions/checklists/points to consider when converting 32-bit C++ to 64-bit in VS (like converting 32-bit Delphi to 64-bit)?
int GetVendorID_0(char *pVendorID,int iLen)
{
#ifdef WIN64 // why WIN64 is not defined switching to Active (x64) ?
// what to put here?
#else
DWORD dwA,dwB,dwC,dwD;
__asm
{
PUSHAD
MOV EAX,0
CPUID //CPUID(EAX=0),
MOV dwA,EAX
MOV dwC,ECX
MOV dwD,EDX
MOV dwB,EBX
POPAD
}
memset( pVendorID, 0,iLen);
memcpy( pVendorID, &dwB,4);
memcpy(&pVendorID[4], &dwD,4);
memcpy(&pVendorID[8], &dwC,4);
return dwA;
#endif
}
Microsoft's compilers (some of them, anyway) have a flag to point out at least some common problems where code will probably need modification to work as 64-bit code. (Incidentally, the macro the compiler predefines is _WIN64, with a leading underscore; plain WIN64 is only defined if your project settings define it, which is why your #ifdef never triggers.)
As far as your GetVendorID_0 function goes, I'd use Microsoft's __cpuid intrinsic (declared in intrin.h; note it takes the array first and fills it in EAX, EBX, ECX, EDX order, while the vendor string is laid out EBX, EDX, ECX), something like this:
int GetVendorID_0(char *pVendorID, int iLen) {
    int data[4];                         // EAX, EBX, ECX, EDX
    __cpuid(data, 0);
    memset(pVendorID, 0, iLen);
    memcpy(pVendorID,     &data[1], 4);  // EBX
    memcpy(&pVendorID[4], &data[3], 4);  // EDX
    memcpy(&pVendorID[8], &data[2], 4);  // ECX
    return data[0];
}
That obviously doesn't replace all instances of inline assembly language. Your choices are fairly simple (though not necessarily easy). One is to find an intrinsic like this to do the job. Another is to move the assembly code into a separate file and link it with your C++ (and learn the x64 calling convention, since the 64-bit MSVC compiler doesn't support inline assembly at all). The third is to simply forgo what you're doing now and write the closest equivalent you can in more portable code.