mach_header 64bit and __PAGEZERO segment 64bit

mach_header 64bit and __PAGEZERO segment 64bit - c++

const struct mach_header *mach = _dyld_get_image_header(0);
struct load_command *lc;
struct segment_command_64 *sc64;
struct segment_command *sc;
if (mach->magic == MH_MAGIC_64) {
lc = (struct load_command *)((unsigned char *)mach + sizeof(struct mach_header_64));
printf("[+] detected 64bit ARM binary in memory.\n");
} else {
lc = (struct load_command *)((unsigned char *)mach + sizeof(struct mach_header));
printf("[+] detected 32bit ARM binary in memory.\n");
}
for (int i = 0; i < mach->ncmds; i++) {
if (lc->cmd == LC_SEGMENT) {
sc = (struct segment_command *)lc;
NSLog(#"32Bit: %s (%x - 0x%x)",sc->segname,sc->vmaddr,sc->vmsize);
} else if (lc->cmd == LC_SEGMENT_64) {
sc64 = (struct segment_command_64 *)lc;
NSLog(#"64Bit: %s (%llx - 0x%llx)",sc64->segname,sc64->vmaddr,sc64->vmsize);
}
lc = (struct load_command *)((unsigned char *)lc+lc->cmdsize);
}
When I run this code in 32Bit I get normal outputs:
__PAGEZERO (0 - 0x1000)
But on 64Bit: __PAGEZERO (0 - 0x100000000)
__PAGEZERO goes from 0x1000 to over 0x100000000 in size, is there any fix for it or any solution why this occurs?

Making a big __PAGEZERO in a 64-bit architecture makes a whole lot of sense. The address range of a 64-bit system, even when the upper 16 bits are "cropped off" like that of x86_64, allows for a huge amount of memory (the 48-bit address space of x86_64 is 256TB of memory address space). It is highly likely that this will be thought of as "small" at some point in the future, but right now, the biggest servers have 1-4TB, so there's plenty of room to grow, and more ordinary machines have 16-32GB.
Note also that no memory is actually OCCUPIED. It's just "reserved virtual space" (that is, "it will never be used"). It takes up absolutely zero resources, because it's not mapped in the page-table, it's not there physically. It's just an entry in the file, which tells the loader to reserve this space to it can never be used, and thus "safeguarded". The actual "data" of this section is zero in size, since, again, there's actually nothing there, just a "make sure this is not used". So your actual file size won't be any larger or smaller if this section is changed in size. It would be a few bytes smaller (the size of the section description) if it didn't exist at all. But that's really the only what it would make any difference at all.
The purpose of a __PAGEZERO is to catch NULL pointer dereferences. By reserving a large section of memory at the beginning of memory, any access through a NULL pointer will be caught and the application aborted. In a 32-bit architecture, something like:
int *p = NULL;
int x = p[0x100000];
is likely to succeed, because at 0x400000 (4MB) the code-space starts (trying to write to such a location is likely to crash, but reading will work - assuming of course the code-space actually starts there and not someplace else in the address range.
Edit:
This presentation shows that ARM, the latest entrant into the 64-bit processor sapce, is also using 48-bit virtual address space, and enforces canonical addresses (top 16 bits need to all be the same value) so it can be expanded in the future. In other words, the virtual space available on a 64-bit ARM processor is also 256TB.

Related

How to tell a C++ program to get the system memory? [duplicate]

I want to allocate my buffers according to memory available. Such that, when I do processing and memory usage goes up, but still remains in available memory limits. Is there a way to get available memory (I don't know will virtual or physical memory status will make any difference ?). Method has to be platform Independent as its going to be used on Windows, OS X, Linux and AIX. (And if possible then I would also like to allocate some of available memory for my application, someone it doesn't change during the execution).
Edit: I did it with configurable memory allocation.
I understand it is not good idea, as most OS manage memory for us, but my application was an ETL framework (intended to be used on server, but was also being used on desktop as a plugin for Adobe indesign). So, I was running in to issue of because instead of using swap, windows would return bad alloc and other applications start to fail. And as I was taught to avoid crashes and so, was just trying to degrade gracefully.

On UNIX-like operating systems, there is sysconf.
#include <unistd.h>
unsigned long long getTotalSystemMemory()
{
long pages = sysconf(_SC_PHYS_PAGES);
long page_size = sysconf(_SC_PAGE_SIZE);
return pages * page_size;
}
On Windows, there is GlobalMemoryStatusEx:
#include <windows.h>
unsigned long long getTotalSystemMemory()
{
MEMORYSTATUSEX status;
status.dwLength = sizeof(status);
GlobalMemoryStatusEx(&status);
return status.ullTotalPhys;
}
So just do some fancy #ifdefs and you'll be good to go.

There are reasons to do want to do this in HPC for scientific software. (Not game, web, business or embedded software). Scientific software routinely go through terabytes of data to get through one computation (or run) (and run for hours or weeks) -- all of which cannot be stored in memory (and if one day you tell me a terabyte is standard for any PC or tablet or phone it will be the case that the scientific software will be expected to handle petabytes or more). The amount of memory can also dictate the kind of method/algorithm that makes sense. The user does not always want to decide the memory and method - he/she has other things to worry about. So the programmer should have a good idea of what is available (4Gb or 8Gb or 64Gb or thereabouts these days) to decide whether a method will automatically work or a more laborious method is to be chosen. Disk is used but memory is preferable. And users of such software are not encouraged to be doing too many things on their computer when running such software -- in fact, they often use dedicated machines/servers.

There is no platform independent way to do this, different operating systems use different memory management strategies.
These other stack overflow questions will help:
How to get memory usage at run time in c++?
C/C++ memory usage API in Linux/Windows
You should watch out though: It is notoriously difficult to get a "real" value for available memory in linux. What the operating system displays as used by a process is no guarantee of what is actually allocated for the process.
This is a common issue when developing embedded linux systems such as routers, where you want to buffer as much as the hardware allows. Here is a link to an example showing how to get this information in a linux (in C):
http://www.unix.com/programming/25035-determining-free-available-memory-mv-linux.html

Having read through these answers I'm astonished that so many take the stance that OP's computer memory belongs to others. It's his computer and his memory to do with as he sees fit, even if it breaks other systems taking a claim it. It's an interesting question. On a more primitive system I had memavail() which would tell me this. Why shouldn't the OP take as much memory as he wants without upsetting other systems?
Here's a solution that allocates less than half the memory available, just to be kind. Output was:
Required FFFFFFFF
Required 7FFFFFFF
Required 3FFFFFFF
Memory size allocated = 1FFFFFFF
#include <stdio.h>
#include <stdlib.h>
#define MINREQ 0xFFF // arbitrary minimum
int main(void)
{
unsigned int required = (unsigned int)-1; // adapt to native uint
char *mem = NULL;
while (mem == NULL) {
printf ("Required %X\n", required);
mem = malloc (required);
if ((required >>= 1) < MINREQ) {
if (mem) free (mem);
printf ("Cannot allocate enough memory\n");
return (1);
}
}
free (mem);
mem = malloc (required);
if (mem == NULL) {
printf ("Cannot enough allocate memory\n");
return (1);
}
printf ("Memory size allocated = %X\n", required);
free (mem);
return 0;
}

Mac OS X example using sysctl (man 3 sysctl):
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
int main(void)
{
int mib[2] = { CTL_HW, HW_MEMSIZE };
u_int namelen = sizeof(mib) / sizeof(mib[0]);
uint64_t size;
size_t len = sizeof(size);
if (sysctl(mib, namelen, &size, &len, NULL, 0) < 0)
{
perror("sysctl");
}
else
{
printf("HW.HW_MEMSIZE = %llu bytes\n", size);
}
return 0;
}
(may also work on other BSD-like operating systems ?)

The code below gives the total and free memory in Megabytes. Works for FreeBSD, but you should be able to use same/similar sysctl tunables on your platform and do to the same thing (Linux & OS X have sysctl at least)
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/vmmeter.h>
int main(){
int rc;
u_int page_size;
struct vmtotal vmt;
size_t vmt_size, uint_size;
vmt_size = sizeof(vmt);
uint_size = sizeof(page_size);
rc = sysctlbyname("vm.vmtotal", &vmt, &vmt_size, NULL, 0);
if (rc < 0){
perror("sysctlbyname");
return 1;
}
rc = sysctlbyname("vm.stats.vm.v_page_size", &page_size, &uint_size, NULL, 0);
if (rc < 0){
perror("sysctlbyname");
return 1;
}
printf("Free memory : %ld\n", vmt.t_free * (u_int64_t)page_size);
printf("Available memory : %ld\n", vmt.t_avm * (u_int64_t)page_size);
return 0;
}
Below is the output of the program, compared with the vmstat(8) output on my system.
~/code/memstats % cc memstats.c
~/code/memstats % ./a.out
Free memory : 5481914368
Available memory : 8473378816
~/code/memstats % vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 ad1 in sy cs us sy id
0 0 0 8093M 5228M 287 0 1 0 304 133 0 0 112 9597 1652 2 1 97

Linux currently free memory: sysconf(_SC_AVPHYS_PAGES) and get_avphys_pages()
The total RAM was covered at https://stackoverflow.com/a/2513561/895245 with sysconf(_SC_PHYS_PAGES);.
Both sysconf(_SC_AVPHYS_PAGES) and get_avphys_pages() are glibc extensions to POSIX that give instead the total currently available RAM pages.
You then just have to multiply them by sysconf(_SC_PAGE_SIZE) to obtain the current free RAM.
Minimal runnable example at: C - Check available free RAM?

The "official" function for this is was std::get_temporary_buffer(). However, you might want to test whether your platform has a decent implemenation. I understand that not all platforms behave as desired.

Instead of trying to guess, have you considered letting the user configure how much memory to use for buffers, as well as assuming somewhat conservative defaults? This way you can still run (possibly slightly slower) with no override, but if the user know there is X memory available for the app they can improve performance by configuring that amount.

Here is a proposal to get available memory on Linux platform:
/// Provides the available RAM memory in kibibytes (1 KiB = 1024 B) on Linux platform (Available memory in /proc/meminfo)
/// For more info about /proc/meminfo : https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo
long long getAvailableMemory()
{
long long memAvailable = -1;
std::ifstream meminfo("/proc/meminfo");
std::string line;
while (std::getline(meminfo, line))
{
if (line.find("MemAvailable:") != std::string::npos)
{
const std::size_t firstWhiteSpacePos = line.find_first_of(' ');
const std::size_t firstNonWhiteSpaceChar = line.find_first_not_of(' ', firstWhiteSpacePos);
const std::size_t nextWhiteSpace = line.find_first_of(' ', firstNonWhiteSpaceChar);
const std::size_t numChars = nextWhiteSpace - firstNonWhiteSpaceChar;
const std::string memAvailableStr = line.substr(firstNonWhiteSpaceChar, numChars);
memAvailable = std::stoll(memAvailableStr);
break;
}
}
return memAvailable;
}

What is the most reliable / portable way to allocate memory at low addresses on 64-bit systems?

I need to allocate large blocks of memory (to be used by my custom allocator) that fall within the first 32GB of virtual address space.
I imagine that if I needed, say, 1MB blocks, I could iterate using mmap and MAP_FIXED_NOREPLACE (or VirtualAlloc) from low addresses onwards in increments of, say, 1MB, until the call succeeds. Continue from the last successful block for the next one.
This sounds clumsy, but at least it will be somewhat robust against OS address space layout changes, and ASLR algorithm changes. From my understanding of current OS layouts, there should be plenty of memory available in the first 32GB this way, but maybe I am missing something?
Is there anything in Windows, Linux, OS X, iOS or Android that would defeat this scheme? Is there a better way?
Just in case you're wondering, this is for the implementation of a VM for a programming language where fitting all pointers in a 32-bit value on a 64-bit system could give huge memory usage advantages and even speed gains. Since all objects are at least 8-byte aligned, the lower 3 bits can be shifted out, expanding the pointer range from 4GB to 32GB.

for restrict allocated memory range in windows we can use NtAllocateVirtualMemory function - this api is avaible for use in both user and kernel mode. in user mode it exported by ntdll.dll (use ntdll.lib or ntdllp.lib from wdk). in this api exist parameter - ZeroBits - The number of high-order address bits that must be zero in the base address of the section view. but in msdn link next words about ZeroBits is incorrect. correct is:
ZeroBits
Supplies the number of high order address bits that must be zero in
the base address of the section view. The value of this argument must
be less than or equal to the maximum number of zero bits and is only
used when memory management determines where to allocate the view
(i.e. when BaseAddress is null).
If ZeroBits is zero, then no zero bit constraints are applied.
If ZeroBits is greater than 0 and less than 32, then it is the
number of leading zero bits from bit 31. Bits 63:32 are also required
to be zero. This retains compatibility with 32-bit systems.
If ZeroBits is greater than 32, then it is considered as a mask and then number of leading zero are counted out
in the mask. This then becomes the zero bits argument.
so really we can use ZeroBits as mask - this is most power usage. but can be use and as zero bit count from 31 bit (in this case 63-32 bits will be always equal to 0). because allocation granularity (currently 64kb - 0x10000) - low 16 bits is always 0. so valid value for ZeroBits in bits-number mode - from 1 to 15 (=31-16). for better understand how this parameter work - look for example code. for better demo effect i will be use
MEM_TOP_DOWN
The specified region should be created at the highest virtual address
possible based on ZeroBits.
PVOID BaseAddress;
ULONG_PTR ZeroBits;
SIZE_T RegionSize = 1;
NTSTATUS status;
for (ZeroBits = 0xFFFFFFFFFFFFFFFF;;)
{
if (0 <= (status = NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
ZeroBits, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)))
{
DbgPrint("%p:%p\n", ZeroBits, BaseAddress);
NtFreeVirtualMemory(NtCurrentProcess(), &BaseAddress, &RegionSize, MEM_RELEASE);
ZeroBits >>= 1;
}
else
{
DbgPrint("%x\n", status);
break;
}
}
for(ZeroBits = 0;;)
{
if (0 <= (status = NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
ZeroBits, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)))
{
DbgPrint("%x:%p\n", ZeroBits++, BaseAddress);
NtFreeVirtualMemory(NtCurrentProcess(), &BaseAddress, &RegionSize, MEM_RELEASE);
}
else
{
DbgPrint("%x\n", status);
break;
}
}
and output:
FFFFFFFFFFFFFFFF:00007FF735B40000
7FFFFFFFFFFFFFFF:00007FF735B40000
3FFFFFFFFFFFFFFF:00007FF735B40000
1FFFFFFFFFFFFFFF:00007FF735B40000
0FFFFFFFFFFFFFFF:00007FF735B40000
07FFFFFFFFFFFFFF:00007FF735B40000
03FFFFFFFFFFFFFF:00007FF735B40000
01FFFFFFFFFFFFFF:00007FF735B40000
00FFFFFFFFFFFFFF:00007FF735B40000
007FFFFFFFFFFFFF:00007FF735B40000
003FFFFFFFFFFFFF:00007FF735B40000
001FFFFFFFFFFFFF:00007FF735B40000
000FFFFFFFFFFFFF:00007FF735B40000
0007FFFFFFFFFFFF:00007FF735B40000
0003FFFFFFFFFFFF:00007FF735B40000
0001FFFFFFFFFFFF:00007FF735B40000
0000FFFFFFFFFFFF:00007FF735B40000
00007FFFFFFFFFFF:00007FF735B40000
00003FFFFFFFFFFF:00003FFFFFFF0000
00001FFFFFFFFFFF:00001FFFFFFF0000
00000FFFFFFFFFFF:00000FFFFFFF0000
000007FFFFFFFFFF:000007FFFFFF0000
000003FFFFFFFFFF:000003FFFFFF0000
000001FFFFFFFFFF:000001FFFFFF0000
000000FFFFFFFFFF:000000FFFFFF0000
0000007FFFFFFFFF:0000007FFFFF0000
0000003FFFFFFFFF:0000003FFFFF0000
0000001FFFFFFFFF:0000001FFFFF0000
0000000FFFFFFFFF:0000000FFFFF0000
00000007FFFFFFFF:00000007FFFF0000
00000003FFFFFFFF:00000003FFFF0000
00000001FFFFFFFF:00000001FFFF0000
00000000FFFFFFFF:00000000FFFF0000
000000007FFFFFFF:000000007FFF0000
000000003FFFFFFF:000000003FFF0000
000000001FFFFFFF:000000001FFF0000
000000000FFFFFFF:000000000FFF0000
0000000007FFFFFF:0000000007FF0000
0000000003FFFFFF:0000000003FF0000
0000000001FFFFFF:0000000001FF0000
0000000000FFFFFF:0000000000FF0000
00000000007FFFFF:00000000007F0000
00000000003FFFFF:00000000003F0000
00000000001FFFFF:00000000001F0000
00000000000FFFFF:00000000000F0000
000000000007FFFF:0000000000070000
000000000003FFFF:0000000000030000
000000000001FFFF:0000000000010000
c0000017
0:00007FF735B40000
1:000000007FFF0000
2:000000003FFF0000
3:000000001FFF0000
4:000000000FFF0000
5:0000000007FF0000
6:0000000003FF0000
7:0000000001FF0000
8:0000000000FF0000
9:00000000007F0000
a:00000000003F0000
b:00000000001F0000
c:00000000000F0000
d:0000000000070000
e:0000000000030000
f:0000000000010000
c0000017
so if we say want restrict memory allocation to 32Gb(0x800000000) - we can use ZeroBits = 0x800000000 - 1:
NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
0x800000000 - 1, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)
as result memory will be allocated in range [0, 7FFFFFFFF] (actually [0, 7FFFF0000] because allocation granularity low 16 bits of address always is 0)
than you can create heap by RtlCreateHeap on allocated region range and allocate memory from this heap (note - this is also user mode api - use ntdll[p].lib for linker input)
PVOID BaseAddress = 0;
SIZE_T RegionSize = 0x10000000;// reserve 256Mb
if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &BaseAddress,
0x800000000 - 1, &RegionSize, MEM_RESERVE, PAGE_READWRITE))
{
if (PVOID hHeap = RtlCreateHeap(0, BaseAddress, RegionSize, 0, 0, 0))
{
HeapAlloc(hHeap, 0, <somesize>);
RtlDestroyHeap(hHeap);
}
VirtualFree(BaseAddress, 0, MEM_RELEASE);
}

Is there a reason my compiler aligns chars to 4 bytes in debug and 8 bytes in release mode?

In debug mode whenever I create chars like this it's 4-byte aligned in debug mode, and in release mode it's always 8-byte aligned. This happens whether I run it in 32 bit build mode or 64 bit. I am using Visual Studio 2017. But I have tried it at Ideone.com online compiler, and it always passes this test:
char charbuff[] = "h";
if ((unsigned long long)&charbuff[0] % 8ULL != 0) throw;
I know that string literals have an extra byte, I tried "hh" and the only difference is that in 32-bit build mode sometimes it fails (the 8-byte alignment).
Are new variables never allocated to memory that isn't at least 4-byte aligned? Is this a feature of the language or the operating system allocator?
Edit: I am placing it on the stack in the main function:
int main()
{
char charbuff[] = "hh";
char charbuff2[] = "hh";
char charbuff3[] = "hh";
if ((unsigned long long)&charbuff[0] % 8ULL != 0) throw;
if ((unsigned long long)&charbuff2[0] % 8ULL != 0) throw;
if ((unsigned long long)&charbuff3[0] % 8ULL != 0) throw;
}
This always passes in release mode. In debug mode it'll always pass a 4-byte aligned. On Ideone.com it always passes this.

C++ VirtualQueryEx infinite loop

I'm currently re-creating a memory modifier application using C++, the original was in C#.
All credit goes to "gimmeamilk" who's tutorials Ive been following on YouTube(video 1 of 8). I would highly recommend these tutorials for anyone attempting to create a similar application.
The problem I have is that my VirtualQueryEx seems to run forever. The process I'm scanning is "notepad.exe" and I am passing to the application via command line parameter.
std::cout<<"Create scan started\n";
#define WRITABLE (PAGE_READWRITE | PAGE_WRITECOPY | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY) //These are all the flags that will be used to determin if a memory block is writable.
MEMBLOCK * mb_list = NULL; //pointer to the head of the link list to be returned
MEMORY_BASIC_INFORMATION meminfo; //holder for the VirtualQueryEx return struct
unsigned char *addr = 0; //holds the value to pass to VirtualQueryEx
HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS,false, pid);
if(hProc)
{
while(1)
{
if(VirtualQueryEx(hProc,addr, &meminfo, sizeof(meminfo)) == 0)
{
break;
}
if((meminfo.State & MEM_COMMIT) && (meminfo.Protect & WRITABLE)) //((binary comparison of meminfos state and MEM_COMMIT, this is basically filtering out memory that the process has reserved but not used)())
{
MEMBLOCK * mb = create_memblock(hProc, &meminfo);
if(mb)
{
mb->next = mb_list;
mb_list = mb;
}
}
addr = (unsigned char *)meminfo.BaseAddress + meminfo.RegionSize;//move the adress along by adding on the length of the current block
}
}
else
{
std::cout<<"Failed to open process\n";
}
std::cout<<"Create scan finished\n";
return mb_list;
The output from this code results in
Create scan started on process:7228
Then it does not return anything else to the console. Unfortunately the example source code linked to via the Youtube video is no longer available.
(7228 will change based on the current pid of notepad.exe)
edit-reply to question #Hans Passant
I still don't understand, what I think Im doing is
Starting a infinite loop
{
Testing using vqx if the address is valid and populating my MEM_BASIC_etc..
{
(has the process commited to using that addr of memory)(is the memory writeable)
{
create memblock etc
}
}
move the address along by the size of the current block
}
My program is x32 and so is notepad (as far as I'm aware).
Is my problem that because I'm using a x64 bit OS that I'm actually inspecting half of a block (a block here meaning the unit assigned by the OS in memory) and its causing it to loop?
Big thanks for your help! I want to understand my problem as well as fix it.

Your problem is you're compiling a 32 bit program and using it to parse the memory of a 64 bit program. You define 'addr' as a unsigned char pointer, which in this case is 32 bits in size. It cannot contain a 64 bit address, which is the cause of your problem.
If your target process is 64 bit, compile your program as 64 bit as well. For 32 bit target processes, compile for 32 bit. This is typically the best technique for dealing with the memory of external processes and is the fastest solution.
Depending on what you're doing, you can also use #ifdef and other conditionals to use 64 bit variables depending on the target, but the original solution is usually easier.

Contiguous VirtualAlloc behaviour on Windows Mobile

I have been optimising memory performance on a Windows Mobile application and have encountered some differences in behaviour between VirtualAlloc on Win32 and Windows CE.
Consider the following test:
// Allocate 64k of memory
BYTE *a = (BYTE*)VirtualAlloc(0, 65536,
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
// Allocate a second contiguous 64k of memory
BYTE *b = (BYTE*)VirtualAlloc(a+65536, 65536,
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
BYTE *c = a + 65528; // Set a pointer near the end of the first allocation
BOOL valid1 = !IsBadWritePtr(c, 8); // Expect TRUE
BOOL valid2 = !IsBadWritePtr(c+8, 4088); // Expect TRUE
BOOL valid3 = !IsBadWritePtr(c, 4096); // TRUE on Win32, FALSE on WinCE
The code "allocates" 4096 of data starting at "c". On Win32 this works. I can find no mention in the VirtualAlloc documentation whether it is legal or coincidence but there are many examples of code that I have found via google that expect this behaviour.
On Windows CE 5.0/5.2 if I use the memory block at "c", in 99% of cases there are no problems, however on some (not all) Windows Mobile 6 devices, ReadFile & WriteFile will fail with error 87 (The parameter is incorrect.). I assume ReadFile is calling IsBadWritePtr or similar and return false due to this. If I perform two ReadFile calls then everything works fine. (There may of course be other API calls that will also fail.)
I am looking for a way to extend the memory returned by VirtualAlloc so that I can make the above work. Reserving a large amount of memory on Windows CE is problematic as each process only gets 32MB and due to other items being loaded it is not possible to reserve a large region of memory without causing other problems. (It is possible to reserve a larger amount of memory in the shared region but this also has other problems.)
Is there a way to get VirtualAlloc to enlarge or combine regions without reserving it up front?
I suspect it may be problematic given the following examples:
HANDLE hHeap1 = HeapCreate(0, 0, 0); // Heap defaults to 192k
BYTE * a1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +96 bytes from start of heap
BYTE * b1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +16 bytes from end of a1
BYTE * c1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +16 bytes from end of b1
BYTE * d1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +4528 bytes from end of c1
HANDLE hHeap2 = HeapCreate(0, 4*1024*1024, 4*1024*1024); // 4MB Heap
BYTE * a2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +96 bytes from start of heap
BYTE * b2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +16 bytes from end of a2
BYTE * c2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +16 bytes from end of b2
BYTE * d2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +16 bytes from end of c2

No, it's not possible.

I wouldn't trust IsBadWritePtr, see this:
http://support.microsoft.com/kb/960154
I think it does something like tries to write to the memory, and sees if it gets a hardware exception. It handles the exception and then returns true or false. But, I've heard that some hardware will only raise an exception once for an attempt to write to one page, or something like that.
Is it not possible for you to VirtualAlloc without a MEM_COMMIT, so you just reserve the virtual memory addresses. Then when you actually want to use the memory, call VirtualAlloc again to do the commit. As you already reserved the pages, you should be able to commit them contiguously.
You say you want to do it without reserving up front. But, what's wrong with reserving up front. Reserving doesn't actually use any physical memory, it reserves a range of virtual addresses, it'll only use physical memory when you commit the pages.

Ah, ok. In that case have you heard of the large memory area, I think that's Microsoft's workaround/fix to your problem.
Basically, if you VirtualAlloc more than 2 meg, it goes into the large memory area of the virtual memory and this lets you exceed the 32mb limit.
There's quite a good discussion of how it all works here:
http://msdn.microsoft.com/en-us/library/ms836325.aspx
I used it myself on a previous product, and it did the trick for me.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js