Contiguous VirtualAlloc behaviour on Windows Mobile - c++

I have been optimising memory performance on a Windows Mobile application and have encountered some differences in behaviour between VirtualAlloc on Win32 and Windows CE.
Consider the following test:
// Allocate 64k of memory
BYTE *a = (BYTE*)VirtualAlloc(0, 65536,
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
// Allocate a second contiguous 64k of memory
BYTE *b = (BYTE*)VirtualAlloc(a+65536, 65536,
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
BYTE *c = a + 65528; // Set a pointer near the end of the first allocation
BOOL valid1 = !IsBadWritePtr(c, 8); // Expect TRUE
BOOL valid2 = !IsBadWritePtr(c+8, 4088); // Expect TRUE
BOOL valid3 = !IsBadWritePtr(c, 4096); // TRUE on Win32, FALSE on WinCE
The code "allocates" 4096 of data starting at "c". On Win32 this works. I can find no mention in the VirtualAlloc documentation whether it is legal or coincidence but there are many examples of code that I have found via google that expect this behaviour.
On Windows CE 5.0/5.2 if I use the memory block at "c", in 99% of cases there are no problems, however on some (not all) Windows Mobile 6 devices, ReadFile & WriteFile will fail with error 87 (The parameter is incorrect.). I assume ReadFile is calling IsBadWritePtr or similar and return false due to this. If I perform two ReadFile calls then everything works fine. (There may of course be other API calls that will also fail.)
I am looking for a way to extend the memory returned by VirtualAlloc so that I can make the above work. Reserving a large amount of memory on Windows CE is problematic as each process only gets 32MB and due to other items being loaded it is not possible to reserve a large region of memory without causing other problems. (It is possible to reserve a larger amount of memory in the shared region but this also has other problems.)
Is there a way to get VirtualAlloc to enlarge or combine regions without reserving it up front?
I suspect it may be problematic given the following examples:
HANDLE hHeap1 = HeapCreate(0, 0, 0); // Heap defaults to 192k
BYTE * a1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +96 bytes from start of heap
BYTE * b1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +16 bytes from end of a1
BYTE * c1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +16 bytes from end of b1
BYTE * d1 = (BYTE*)HeapAlloc(hHeap1, 0, 64000); // +4528 bytes from end of c1
HANDLE hHeap2 = HeapCreate(0, 4*1024*1024, 4*1024*1024); // 4MB Heap
BYTE * a2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +96 bytes from start of heap
BYTE * b2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +16 bytes from end of a2
BYTE * c2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +16 bytes from end of b2
BYTE * d2 = (BYTE*)HeapAlloc(hHeap2, 0, 64000); // +16 bytes from end of c2

No, it's not possible.

I wouldn't trust IsBadWritePtr, see this:
http://support.microsoft.com/kb/960154
I think it does something like tries to write to the memory, and sees if it gets a hardware exception. It handles the exception and then returns true or false. But, I've heard that some hardware will only raise an exception once for an attempt to write to one page, or something like that.
Is it not possible for you to VirtualAlloc without a MEM_COMMIT, so you just reserve the virtual memory addresses. Then when you actually want to use the memory, call VirtualAlloc again to do the commit. As you already reserved the pages, you should be able to commit them contiguously.
You say you want to do it without reserving up front. But, what's wrong with reserving up front. Reserving doesn't actually use any physical memory, it reserves a range of virtual addresses, it'll only use physical memory when you commit the pages.

Ah, ok. In that case have you heard of the large memory area, I think that's Microsoft's workaround/fix to your problem.
Basically, if you VirtualAlloc more than 2 meg, it goes into the large memory area of the virtual memory and this lets you exceed the 32mb limit.
There's quite a good discussion of how it all works here:
http://msdn.microsoft.com/en-us/library/ms836325.aspx
I used it myself on a previous product, and it did the trick for me.

Related

What is the most reliable / portable way to allocate memory at low addresses on 64-bit systems?

I need to allocate large blocks of memory (to be used by my custom allocator) that fall within the first 32GB of virtual address space.
I imagine that if I needed, say, 1MB blocks, I could iterate using mmap and MAP_FIXED_NOREPLACE (or VirtualAlloc) from low addresses onwards in increments of, say, 1MB, until the call succeeds. Continue from the last successful block for the next one.
This sounds clumsy, but at least it will be somewhat robust against OS address space layout changes, and ASLR algorithm changes. From my understanding of current OS layouts, there should be plenty of memory available in the first 32GB this way, but maybe I am missing something?
Is there anything in Windows, Linux, OS X, iOS or Android that would defeat this scheme? Is there a better way?
Just in case you're wondering, this is for the implementation of a VM for a programming language where fitting all pointers in a 32-bit value on a 64-bit system could give huge memory usage advantages and even speed gains. Since all objects are at least 8-byte aligned, the lower 3 bits can be shifted out, expanding the pointer range from 4GB to 32GB.
for restrict allocated memory range in windows we can use NtAllocateVirtualMemory function - this api is avaible for use in both user and kernel mode. in user mode it exported by ntdll.dll (use ntdll.lib or ntdllp.lib from wdk). in this api exist parameter - ZeroBits - The number of high-order address bits that must be zero in the base address of the section view. but in msdn link next words about ZeroBits is incorrect. correct is:
ZeroBits
Supplies the number of high order address bits that must be zero in
the base address of the section view. The value of this argument must
be less than or equal to the maximum number of zero bits and is only
used when memory management determines where to allocate the view
(i.e. when BaseAddress is null).
If ZeroBits is zero, then no zero bit constraints are applied.
If ZeroBits is greater than 0 and less than 32, then it is the
number of leading zero bits from bit 31. Bits 63:32 are also required
to be zero. This retains compatibility with 32-bit systems.
If ZeroBits is greater than 32, then it is considered as a mask and then number of leading zero are counted out
in the mask. This then becomes the zero bits argument.
so really we can use ZeroBits as mask - this is most power usage. but can be use and as zero bit count from 31 bit (in this case 63-32 bits will be always equal to 0). because allocation granularity (currently 64kb - 0x10000) - low 16 bits is always 0. so valid value for ZeroBits in bits-number mode - from 1 to 15 (=31-16). for better understand how this parameter work - look for example code. for better demo effect i will be use
MEM_TOP_DOWN
The specified region should be created at the highest virtual address
possible based on ZeroBits.
PVOID BaseAddress;
ULONG_PTR ZeroBits;
SIZE_T RegionSize = 1;
NTSTATUS status;
for (ZeroBits = 0xFFFFFFFFFFFFFFFF;;)
{
if (0 <= (status = NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
ZeroBits, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)))
{
DbgPrint("%p:%p\n", ZeroBits, BaseAddress);
NtFreeVirtualMemory(NtCurrentProcess(), &BaseAddress, &RegionSize, MEM_RELEASE);
ZeroBits >>= 1;
}
else
{
DbgPrint("%x\n", status);
break;
}
}
for(ZeroBits = 0;;)
{
if (0 <= (status = NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
ZeroBits, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)))
{
DbgPrint("%x:%p\n", ZeroBits++, BaseAddress);
NtFreeVirtualMemory(NtCurrentProcess(), &BaseAddress, &RegionSize, MEM_RELEASE);
}
else
{
DbgPrint("%x\n", status);
break;
}
}
and output:
FFFFFFFFFFFFFFFF:00007FF735B40000
7FFFFFFFFFFFFFFF:00007FF735B40000
3FFFFFFFFFFFFFFF:00007FF735B40000
1FFFFFFFFFFFFFFF:00007FF735B40000
0FFFFFFFFFFFFFFF:00007FF735B40000
07FFFFFFFFFFFFFF:00007FF735B40000
03FFFFFFFFFFFFFF:00007FF735B40000
01FFFFFFFFFFFFFF:00007FF735B40000
00FFFFFFFFFFFFFF:00007FF735B40000
007FFFFFFFFFFFFF:00007FF735B40000
003FFFFFFFFFFFFF:00007FF735B40000
001FFFFFFFFFFFFF:00007FF735B40000
000FFFFFFFFFFFFF:00007FF735B40000
0007FFFFFFFFFFFF:00007FF735B40000
0003FFFFFFFFFFFF:00007FF735B40000
0001FFFFFFFFFFFF:00007FF735B40000
0000FFFFFFFFFFFF:00007FF735B40000
00007FFFFFFFFFFF:00007FF735B40000
00003FFFFFFFFFFF:00003FFFFFFF0000
00001FFFFFFFFFFF:00001FFFFFFF0000
00000FFFFFFFFFFF:00000FFFFFFF0000
000007FFFFFFFFFF:000007FFFFFF0000
000003FFFFFFFFFF:000003FFFFFF0000
000001FFFFFFFFFF:000001FFFFFF0000
000000FFFFFFFFFF:000000FFFFFF0000
0000007FFFFFFFFF:0000007FFFFF0000
0000003FFFFFFFFF:0000003FFFFF0000
0000001FFFFFFFFF:0000001FFFFF0000
0000000FFFFFFFFF:0000000FFFFF0000
00000007FFFFFFFF:00000007FFFF0000
00000003FFFFFFFF:00000003FFFF0000
00000001FFFFFFFF:00000001FFFF0000
00000000FFFFFFFF:00000000FFFF0000
000000007FFFFFFF:000000007FFF0000
000000003FFFFFFF:000000003FFF0000
000000001FFFFFFF:000000001FFF0000
000000000FFFFFFF:000000000FFF0000
0000000007FFFFFF:0000000007FF0000
0000000003FFFFFF:0000000003FF0000
0000000001FFFFFF:0000000001FF0000
0000000000FFFFFF:0000000000FF0000
00000000007FFFFF:00000000007F0000
00000000003FFFFF:00000000003F0000
00000000001FFFFF:00000000001F0000
00000000000FFFFF:00000000000F0000
000000000007FFFF:0000000000070000
000000000003FFFF:0000000000030000
000000000001FFFF:0000000000010000
c0000017
0:00007FF735B40000
1:000000007FFF0000
2:000000003FFF0000
3:000000001FFF0000
4:000000000FFF0000
5:0000000007FF0000
6:0000000003FF0000
7:0000000001FF0000
8:0000000000FF0000
9:00000000007F0000
a:00000000003F0000
b:00000000001F0000
c:00000000000F0000
d:0000000000070000
e:0000000000030000
f:0000000000010000
c0000017
so if we say want restrict memory allocation to 32Gb(0x800000000) - we can use ZeroBits = 0x800000000 - 1:
NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
0x800000000 - 1, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)
as result memory will be allocated in range [0, 7FFFFFFFF] (actually [0, 7FFFF0000] because allocation granularity low 16 bits of address always is 0)
than you can create heap by RtlCreateHeap on allocated region range and allocate memory from this heap (note - this is also user mode api - use ntdll[p].lib for linker input)
PVOID BaseAddress = 0;
SIZE_T RegionSize = 0x10000000;// reserve 256Mb
if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &BaseAddress,
0x800000000 - 1, &RegionSize, MEM_RESERVE, PAGE_READWRITE))
{
if (PVOID hHeap = RtlCreateHeap(0, BaseAddress, RegionSize, 0, 0, 0))
{
HeapAlloc(hHeap, 0, <somesize>);
RtlDestroyHeap(hHeap);
}
VirtualFree(BaseAddress, 0, MEM_RELEASE);
}

HeapWalk not working as expected in Release mode

So I used this example of the HeapWalk function to implement it into my app. I played around with it a bit and saw that when I added
HANDLE d = HeapAlloc(hHeap, 0, sizeof(int));
int* f = new(d) int;
after creating the heap then some new output would be logged:
Allocated block Data portion begins at: 0X037307E0
Size: 4 bytes
Overhead: 28 bytes
Region index: 0
So seeing this I thought I could check Entry.wFlags to see if it was set as PROCESS_HEAP_ENTRY_BUSY to keep a track of how much allocated memory I'm using on the heap. So I have:
HeapLock(heap);
int totalUsedSpace = 0, totalSize = 0, largestFreeSpace = 0, largestCounter = 0;
PROCESS_HEAP_ENTRY entry;
entry.lpData = NULL;
while (HeapWalk(heap, &entry) != FALSE)
{
int entrySize = entry.cbData + entry.cbOverhead;
if ((entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) != 0)
{
// We have allocated memory in this block
totalUsedSpace += entrySize;
largestCounter = 0;
}
else
{
// We do not have allocated memory in this block
largestCounter += entrySize;
if (largestCounter > largestFreeSpace)
{
// Save this value as we've found a bigger space
largestFreeSpace = largestCounter;
}
}
// Keep a track of the total size of this heap
totalSize += entrySize;
}
HeapUnlock(heap);
And this appears to work when built in debug mode (totalSize and totalUsedSpace are different values). However, when I run it in Release mode totalUsedSpace is always 0.
I stepped through it with the debugger while in Release mode and for each heap it loops three times and I get the following flags in entry.wFlags from calling HeapWalk:
1 (PROCESS_HEAP_REGION)
0
2 (PROCESS_HEAP_UNCOMMITTED_RANGE)
It then exits the while loop and GetLastError() returns ERROR_NO_MORE_ITEMS as expected.
From here I found that a flag value of 0 is "the committed block which is free, i.e. not being allocated or not being used as control structure."
Does anyone know why it does not work as intended when built in Release mode? I don't have much experience of how memory is handled by the computer, so I'm not sure where the error might be coming from. Searching on Google didn't come up with anything so hopefully someone here knows.
UPDATE: I'm still looking into this myself and if I monitor the app using vmmap I can see that the process has 9 heaps, but when calling GetProcessHeaps it returns that there are 22 heaps. Also, none of the heap handles it returns matches to the return value of GetProcessHeap() or _get_heap_handle(). It seems like GetProcessHeaps is not behaving as expected. Here is the code to get the list of heaps:
// Count how many heaps there are and allocate enough space for them
DWORD numHeaps = GetProcessHeaps(0, NULL);
HANDLE* handles = new HANDLE[numHeaps];
// Get a handle to known heaps for us to compare against
HANDLE defaultHeap = GetProcessHeap();
HANDLE crtHeap = (HANDLE)_get_heap_handle();
// Get a list of handles to all the heaps
DWORD retVal = GetProcessHeaps(numHeaps, handles);
And retVal is the same value as numHeaps, which indicates that there was no error.
Application Verifier had been set up previously to do a full page heap verifying of my executable and was interfering with the heaps returned by GetProcessHeaps. I'd forgotten about it being set up as it was done for a different issue several days ago and then closed without clearing the tests. It wasn't happening in debug build because the application builds to a different file name for debug builds.
We managed to detect this by adding a breakpoint and looking at the callstack of the thread. We could see the AV DLL had been injected in and that let us know where to look.

mach_header 64bit and __PAGEZERO segment 64bit

const struct mach_header *mach = _dyld_get_image_header(0);
struct load_command *lc;
struct segment_command_64 *sc64;
struct segment_command *sc;
if (mach->magic == MH_MAGIC_64) {
lc = (struct load_command *)((unsigned char *)mach + sizeof(struct mach_header_64));
printf("[+] detected 64bit ARM binary in memory.\n");
} else {
lc = (struct load_command *)((unsigned char *)mach + sizeof(struct mach_header));
printf("[+] detected 32bit ARM binary in memory.\n");
}
for (int i = 0; i < mach->ncmds; i++) {
if (lc->cmd == LC_SEGMENT) {
sc = (struct segment_command *)lc;
NSLog(#"32Bit: %s (%x - 0x%x)",sc->segname,sc->vmaddr,sc->vmsize);
} else if (lc->cmd == LC_SEGMENT_64) {
sc64 = (struct segment_command_64 *)lc;
NSLog(#"64Bit: %s (%llx - 0x%llx)",sc64->segname,sc64->vmaddr,sc64->vmsize);
}
lc = (struct load_command *)((unsigned char *)lc+lc->cmdsize);
}
When I run this code in 32Bit I get normal outputs:
__PAGEZERO (0 - 0x1000)
But on 64Bit: __PAGEZERO (0 - 0x100000000)
__PAGEZERO goes from 0x1000 to over 0x100000000 in size, is there any fix for it or any solution why this occurs?
Making a big __PAGEZERO in a 64-bit architecture makes a whole lot of sense. The address range of a 64-bit system, even when the upper 16 bits are "cropped off" like that of x86_64, allows for a huge amount of memory (the 48-bit address space of x86_64 is 256TB of memory address space). It is highly likely that this will be thought of as "small" at some point in the future, but right now, the biggest servers have 1-4TB, so there's plenty of room to grow, and more ordinary machines have 16-32GB.
Note also that no memory is actually OCCUPIED. It's just "reserved virtual space" (that is, "it will never be used"). It takes up absolutely zero resources, because it's not mapped in the page-table, it's not there physically. It's just an entry in the file, which tells the loader to reserve this space to it can never be used, and thus "safeguarded". The actual "data" of this section is zero in size, since, again, there's actually nothing there, just a "make sure this is not used". So your actual file size won't be any larger or smaller if this section is changed in size. It would be a few bytes smaller (the size of the section description) if it didn't exist at all. But that's really the only what it would make any difference at all.
The purpose of a __PAGEZERO is to catch NULL pointer dereferences. By reserving a large section of memory at the beginning of memory, any access through a NULL pointer will be caught and the application aborted. In a 32-bit architecture, something like:
int *p = NULL;
int x = p[0x100000];
is likely to succeed, because at 0x400000 (4MB) the code-space starts (trying to write to such a location is likely to crash, but reading will work - assuming of course the code-space actually starts there and not someplace else in the address range.
Edit:
This presentation shows that ARM, the latest entrant into the 64-bit processor sapce, is also using 48-bit virtual address space, and enforces canonical addresses (top 16 bits need to all be the same value) so it can be expanded in the future. In other words, the virtual space available on a 64-bit ARM processor is also 256TB.

Why this app doesn't consume as much memory as expected

I wrote a simple application to test memory consumption. In this test application, I created four processes to continually consume memory, those processes won't release the memory unless the process exits.
I expected this test application to consume the most memory of RAM and cause the other application to slow down or crash. But the result is not the same as expected. Below is the code:
#include <stdio.h>
#include <unistd.h>
#include <list>
#include <vector>
using namespace std;
unsigned short calcrc(unsigned char *ptr, int count)
{
unsigned short crc;
unsigned char i;
//high cpu-consumption code
//implements the CRC algorithm
//CRC is Cyclic Redundancy Code
}
void* ForkChild(void* param){
vector<unsigned char*> MemoryVector;
pid_t PID = fork();
if (PID > 0){
const int TEN_MEGA = 10 * 10 * 1024 * 1024;
unsigned char* buffer = NULL;
while(1){
buffer = NULL;
buffer = new unsigned char [TEN_MEGA];
if (buffer){
try{
calcrc(buffer, TEN_MEGA);
MemoryVector.push_back(buffer);
} catch(...){
printf("An error was throwed, but caught by our app!\n");
delete [] buffer;
buffer = NULL;
}
}
else{
printf("no memory to allocate!\n");
try{
if (MemoryVector.size()){
buffer = MemoryVector[0];
calcrc(buffer, TEN_MEGA);
buffer = NULL;
} else {
printf("no memory ever allocated for this Process!\n");
continue;
}
} catch(...){
printf("An error was throwed -- branch 2,"
"but caught by our app!\n");
buffer = NULL;
}
}
} //while(1)
} else if (PID == 0){
} else {
perror("fork error");
}
return NULL;
}
int main(){
int children = 4;
while(--children >= 0){
ForkChild(NULL);
};
while(1) sleep(1);
printf("exiting main process\n");
return 0;
}
TOP command
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2775 steve 20 0 1503m 508 312 R 99.5 0.0 1:00.46 test
2777 steve 20 0 1503m 508 312 R 96.9 0.0 1:00.54 test
2774 steve 20 0 1503m 904 708 R 96.6 0.0 0:59.92 test
2776 steve 20 0 1503m 508 312 R 96.2 0.0 1:00.57 test
Though CPU is high, but memory percent remains 0.0. How can it be possible??
Free command
free shared buffers cached
Mem: 3083796 0 55996 428296
Free memory is more than 3G out of 4G RAM.
Does there anybody know why this test app just doesn't work as expected?
Linux uses optimistic memory allocation: it will not physically allocate a page of memory until that page is actually written to. For that reason, you can allocate much more memory than what is available, without increasing memory consumption by the system.
If you want to force the system to allocate (commit) a physical page , then you have to write to it.
The following line does not issue any write, as it is default-initialization of unsigned char, which is a no-op:
buffer = new unsigned char [TEN_MEGA];
If you want to force a commit, use zero-initialization:
buffer = new unsigned char [TEN_MEGA]();
To make the comments into an answer:
Linux will not allocate memory pages for a process until it writes to them (copy-on-write).
Additionally, you are not writing to your buffer anywhere, as the default constructor for unsigned char does not perform any initializations, and new[] default-initializes all items.
fork() returns the PID in the parent, and 0 in the child. Your ForkChild as written will execute all the work in the parent, not the child.
And the standard new operator will never return null; it will throw if it fails to allocate memory (but due to overcommit it won't actually do that either in Linux). This means your test of buffer after the allocation is meaningless: it will always either take the first branch or never reach the test. If you want a null return, you need to write new (std::nothrow) .... Include <new> for that to work.
But your program is infact doing what you expected it to do. As an answer has pointed out (# Michael Foukarakis's answer), memory not used is not allocated. In your output of the top program, I noticed that the column virt had a large amount of memory on it for each process running your program. A little googling later, I saw what this was:
VIRT -- Virtual Memory Size (KiB). The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used.
So as you can see, your program does in fact generate memory for itself, but in the form of pages and stored as virtual memory. And I think that is a smart thing to do
A snippet from this wiki page
A page, memory page, or virtual page -- a fixed-length contiguous block of virtual memory, and it is the smallest unit of data for the following:
memory allocation performed by the operating system for a program; and
transfer between main memory and any other auxiliary store, such as a hard disk drive.
...Thus a program can address more (virtual) RAM than physically exists in the computer. Virtual memory is a scheme that gives users the illusion of working with a large block of contiguous memory space (perhaps even larger than real memory), when in actuality most of their work is on auxiliary storage (disk). Fixed-size blocks (pages) or variable-size blocks of the job are read into main memory as needed.
Sources:
http://www.computerhope.com/unix/top.htm
https://stackoverflow.com/a/18917909/2089675
http://en.wikipedia.org/wiki/Page_(computer_memory)
If you want to gobble up a lot of memory:
int mb = 0;
char* buffer;
while (1) {
buffer = malloc(1024*1024);
memset(buffer, 0, 1024*1024);
mb++;
}
I used something like this to make sure the file buffer cache was empty when taking some file I/O timing measurements.
As other answers have already mentioned, your code doesn't ever write to the buffer after allocating it. Here memset is used to write to the buffer.

MapViewOfFile and VirtualLock

Will the following code load data from file into system memory so that access to the resulting pointer will never block threads?
auto ptr = VirtualLock(MapViewOfFile(file_map, FILE_MAP_READ, high, low, size), size); // Map file to memory and wait for DMA transfer to finish.
int val0 = reinterpret_cast<int*>(ptr)[0]; // Will not block thread?
int val1 = reinterpret_cast<int*>(ptr)[size-4]; // Will not block thread?
VirtualUnlock(ptr);
UnmapViewOfFile(ptr);
EDIT:
Updated after Dammons answer.
auto ptr = MapViewOfFile(file_map, FILE_MAP_READ, high, low, size);
#pragma optimize("", off)
char dummy;
for(int n = 0; n < size; n += 4096)
dummy = reinterpret_cast<char*>(ptr)[n];
#pragma optimize("", on)
int val0 = reinterpret_cast<int*>(ptr)[0]; // Will not block thread?
int val1 = reinterpret_cast<int*>(ptr)[size-4]; // Will not block thread?
UnmapViewOfFile(ptr);
If the file's size is less than the ridiculously small maximum working set size (or, if you have modified your working set size accordingly) then in theory yes. If you exceed your maximum working set size, VirtualLock will simply do nothing (that is, fail).
(In practice, I've seen VirtualLock being rather... liberal... at interpreting what it's supposed to do as opposed to what it actually does, at least under Windows XP -- might be different under more modern versions)
I've been trying similar things in the past, and I'm now simply touching all pages that I want in RAM with a simple for loop (reading one byte). This leaves no questions open and works, with the sole possible exception that a page might in theory get swapped out again after touched. In practice, this never happens (unless the machine is really really low on RAM, and then it's ok to happen).