C++ Memory leaks on Windows 7

I'm writing a program (C++, MinGW 32 bit) to batch process images using OpenCV functions, using AngelScript as a scripting language. As of right now, my software has some memory leaks that add up pretty quickly (the images are 100-200 MB each, and I'm processing thousands at once), but I'm running into an issue where Windows doesn't seem to release the memory used by my program until I reboot.
If I run it on a large set of images, it runs for a while and eventually OpenCV throws an exception saying that it's out of memory. At that point, I close the program, and Task Manager's physical memory meter drops back down to where it was before I started. But here's the catch - every time I try to run the program again, it will fail right off the bat to allocate memory to OpenCV, until I reboot the computer, at which point it will work just great for a few hundred images again.
Is there some way Windows could be holding on to that memory? Or is there another reason why Windows would fail to allocate memory to my program until a reboot occurs? This doesn't make sense to me.
EDIT: The computer I'm running this program on is Windows 7 64 bit with 32 GB of ram, so even with my program's memory issues, it's only using a small amount of the available memory. Normally the program maxes out at a little over 1 GB of ram before it quits.
EDIT 2: I'm also using FreeImage to load the images, I forgot to mention that. Here's the basis of my processing code:
//load bitmap with FreeImage
FIBITMAP *bitmap = NULL;
FREE_IMAGE_FORMAT fif = FIF_UNKNOWN;
fif = FreeImage_GetFileType(filename.c_str(), 0);
bitmap = FreeImage_Load(fif, filename.c_str(), 0);
if (!bitmap) {
    LogString("ScriptEngine: input file is not readable.");
    processingFile = false;
    return false;
}

//convert FreeImage bitmap to my custom wrapper for OpenCV::Mat
ScriptImage img;
img.image = fi2cv(bitmap);
FreeImage_Unload(bitmap);

try {
    //this executes the AngelScript code
    r = ctx->Execute();
} catch (const std::exception& e) {
    std::cout << "Exception in " << __FILE__ << ", line " << __LINE__ << ", " << __FUNCTION__ << ": " << e.what() << std::endl;
}

try {
    engine->GarbageCollect(asGC_FULL_CYCLE | asGC_DESTROY_GARBAGE);
} catch (const std::exception& e) {
    std::cout << "Exception in " << __FILE__ << ", line " << __LINE__ << ", " << __FUNCTION__ << ": " << e.what() << std::endl;
}
As you can see, the only pointer is to the FIBITMAP, which is freed.

It is very likely that you are making a copy of the image data on this line:
img.image = fi2cv(bitmap);
Since you are immediately freeing the bitmap afterwards, that data must persist after the free.
Check if there is a resource release for ScriptImage objects.
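For illustration, here is a minimal sketch of the kind of deterministic release a wrapper like ScriptImage needs, using a reference-counted buffer as a stand-in for cv::Mat (ScriptImageSketch, ImageBuffer, and release() are hypothetical names; the real fi2cv and ScriptImage are not shown in the question):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// ImageBuffer is a hypothetical stand-in for the pixel data that
// fi2cv() would copy out of the FIBITMAP (the real type is cv::Mat,
// which reference-counts its data much like shared_ptr does here).
struct ImageBuffer {
    std::vector<unsigned char> pixels;
    explicit ImageBuffer(std::size_t n) : pixels(n) {}
};

// Sketch of the wrapper: as long as any owner (the wrapper itself, or
// a copy handed to the script engine) holds the shared_ptr, the
// 100-200 MB buffer stays alive.
struct ScriptImageSketch {
    std::shared_ptr<ImageBuffer> image;

    // Deterministic release point: call this when the script is done,
    // rather than waiting for a garbage collector to run.
    void release() { image.reset(); }
};
```

If the real ScriptImage keeps cv::Mat copies alive inside the script engine, each processed image stays resident until those references are dropped, which would match the observed growth.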

Related

Windows DLL function behaviour is different if DLL is moved to different location

I'm attempting to debug some very opaque issues with DLLs in Unreal on a CI machine (see Unreal: Diagnosing why Windows cannot load a DLL for more information). glu32.dll seems to be the DLL at which the Unreal process falls over, and as Windows Server doesn't contain all the graphics-related DLLs that normal Windows 10 does, I was recommended to upload certain DLLs from my machine/Microsoft redistributables in order to make sure the Unreal build process could run.
For sanity purposes, I've written a small utility program to test whether glu32.dll on my machine can be dynamically loaded and can have its functions called correctly. I'm planning to run this executable on the troublesome CI machine soon to see what happens.
The code for the program is below:
#include <windows.h>
#include <iostream>
#include <GL/gl.h>

extern "C"
{
    typedef const GLubyte* (__stdcall *ErrorStringFunc)(GLenum error);
}

int main(int argc, char** argv)
{
    if (argc < 2)
    {
        std::cerr << "Usage: GLU32Loader.exe <path to glu32.dll>" << std::endl;
        return 1;
    }

    const char* path = argv[1];
    std::cout << "Attempting to load: " << path << std::endl;

    HMODULE dllHandle = LoadLibraryA(path);
    if (!dllHandle)
    {
        std::cerr << "Could not load " << path << std::endl;
        return 1;
    }

    std::cout << "Successfully loaded DLL: 0x" << dllHandle << std::endl;

    const char* funcName = "gluErrorString";
    std::cout << "Looking up function: " << funcName << std::endl;

    ErrorStringFunc func = reinterpret_cast<ErrorStringFunc>(GetProcAddress(dllHandle, funcName));
    if (func)
    {
        std::cout << "Successfully loaded function: 0x" << func << std::endl;
        const GLubyte* str = (*func)(100902);
        std::cout << "Error string for value 100902: \"" << str << "\" (0x" << static_cast<const void*>(str) << ")" << std::endl;
    }
    else
    {
        std::cerr << "Failed to load function " << funcName << std::endl;
    }

    FreeLibrary(dllHandle);
    return 0;
}
When I run the executable and point it to glu32.dll in the System32 folder, I get expected output:
> GLU32Loader.exe "C:\Windows\System32\glu32.dll"
Attempting to load: C:\Windows\System32\glu32.dll
Successfully loaded DLL: 0x00007FFC7A350000
Looking up function: gluErrorString
Successfully loaded function: 0x00007FFC7A35C650
Error string for value 100902: "out of memory" (0x000001E5757F51D0)
However, if I copy the DLL to my desktop and run the program again, although the DLL and function appear to be loaded, the string returned from the function is empty:
> GLU32Loader.exe "C:\Users\Jonathan\Desktop\glu32.dll"
Attempting to load: C:\Users\Jonathan\Desktop\glu32.dll
Successfully loaded DLL: 0x00007FFC8DDB0000
Looking up function: gluErrorString
Successfully loaded function: 0x00007FFC8DDBC650
Error string for value 100902: "" (0x0000025C5236E520)
Why would this be? It's exactly the same DLL, just in a different folder, and I would have thought that any other dependent DLLs that it references should still be available because they're all in System32. Is there some mystical property of Windows DLLs that I'm not familiar with that might cause this to happen?
This is an example of why one should not mess around with system DLLs.
The DLL in question, like many Microsoft DLLs, uses MUI (Multilingual User Interface).
If you look at its resources, it has no resources except a MUI type resource, pointing to a folder containing the corresponding .mui file, which contains its actual (internationalized) resources.
So, if you still want to copy it, at least also copy the corresponding .mui file:
System32\glu32.dll → <my_files>\glu32.dll
System32\en-US\glu32.dll.mui → <my_files>\en-US\glu32.dll.mui
The en-US part may be different on your system depending on the default locale.
EDIT: I only just saw from your log that you didn't rename the file, so I'm not sure what the cause is. I'll leave this explanation up anyway, because it is also what would happen if one renamed that file, so maybe it is helpful to someone else...
It seems to me as if you renamed the DLL file (not just loaded it from another location but with another filename as well).
glu32.dll doesn't like to be renamed, because in some places code like GetModuleHandle("glu32.dll") is used instead of saving the hInstance received in DllMain into a global variable and using that handle (which is what should have been done, but unfortunately it isn't what Microsoft did). If you rename the DLL, this call will return NULL. Now unfortunately there also isn't much error handling going on in glu32 in that case.
The error strings are stored in global arrays of some sort, but they are lazy-loaded from string table resources. The first time you call gluErrorString, the error strings are loaded using LoadString, which takes the hInstance of the DLL. With the renamed DLL, this will be the bogus NULL handle, and calling LoadString(NULL, ...) will return 0, indicating an error. Normally the number returned is the length of the string. glu32 doesn't handle the zero case in any special way: it just copies zero characters into the array and happily returns an empty string to you at the end.
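The failure mode described above, a lazily initialized cache that silently stores an empty string when its loader fails, can be sketched in portable C++ (fakeLoadString is a hypothetical stand-in for the Win32 LoadString call; this illustrates the pattern, not glu32's actual code):

```cpp
#include <array>
#include <cassert>
#include <string>

// fakeLoadString stands in for LoadString: it fails (returns 0) when
// the module handle is invalid, just as LoadString(NULL, ...) does
// for the renamed DLL.
static int fakeLoadString(bool moduleHandleValid, char* buf, int bufSize) {
    if (!moduleHandleValid)
        return 0;                       // failure: buf is left untouched
    const char src[] = "out of memory";
    int n = static_cast<int>(sizeof(src) - 1);
    if (n >= bufSize)
        n = bufSize - 1;
    for (int i = 0; i < n; ++i)
        buf[i] = src[i];
    buf[n] = '\0';
    return n;                           // success: length of the string
}

// The buggy pattern: the result is cached on first use, and a zero
// (failure) return is never checked, so the empty string sticks.
std::string errorString(bool moduleHandleValid) {
    static std::array<char, 64> cache{};  // zero-initialized
    static bool loaded = false;
    if (!loaded) {
        fakeLoadString(moduleHandleValid, cache.data(),
                       static_cast<int>(cache.size()));
        loaded = true;
    }
    return std::string(cache.data());
}
```

Once the first call fails, every later call returns the cached empty string, exactly the symptom seen in the log.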

Why is the point-cloud-library's loadPCDFile so slow?

I am reading 2.2 million points from a PCD file, and loadPCDFile takes about 13 seconds in both Release and Debug mode. Given that visualization programs like CloudCompare can read the file in what seems like milliseconds, I suspect I am doing something the hard way.
What am I doing wrong?
The top of my PCD file:
# .PCD v0.7 - Point Cloud Data file format
VERSION 0.7
FIELDS rgb x y z _
SIZE 4 4 4 4 1
TYPE F F F F U
COUNT 1 1 1 1 4
WIDTH 2206753
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 2206753
DATA binary
¥•ÃöèÝÃájfD ®§”ÃÍÌÝÃá:fD H”ø¾ÝÃH!fD .....
From my code, reading the file:
#include <iostream>
#include <vector>
#include <pcl/common/common.h>
#include <pcl/common/common_headers.h>
#include <pcl/common/angles.h>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/visualization/pcl_visualizer.h>
#include <pcl/console/parse.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/features/normal_3d.h>
#include <boost/thread/thread.hpp>
int main() {
    // (...)
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr largeCloud(new pcl::PointCloud<pcl::PointXYZRGB>);
    largeCloud->points.resize(3000000); //Tried to force resizing only once. Did not help much.

    if (pcl::io::loadPCDFile<pcl::PointXYZRGB>("MY_POINTS.pcd", *largeCloud) == -1) {
        PCL_ERROR("Couldn't read file MY_POINTS.pcd\n");
        return -1;
    }
    // (...)
    return 0;
}
(Using PCL 1.8 and Visual Studio 2015)
Summary of below...
PCL is slightly slower at loading CloudCompare-formatted PCD files. Looking at the headers, CloudCompare seems to add an extra "_" variable to each point that PCL doesn't like and has to format out, but this only accounts for a 30%-40% difference in load time.
With the same size of point cloud (3M points), my computer took 13 seconds to load the CloudCompare file when the program was compiled in Debug mode and only 0.25 s in Release mode, so I think you are effectively running in Debug mode. Depending on how you compiled/installed PCL, you may need to rebuild PCL to generate the appropriate Release build. My guess is that whatever you think you are doing to switch from Debug to Release is not in fact engaging the PCL Release library.
In PCL, across almost all functions, moving from Debug to Release will often give you one to two orders of magnitude faster processing (due to PCL's heavy usage of large array objects that have to be managed differently in Debug mode for visibility)
Testing PCL with cloud compare files
Here is the code that I ran to produce the following outputs:
std::cout << "Press enter to load cloud compare sample" << std::endl;
std::cin.get();
TimeStamp stopWatch = TimeStamp();
pcl::PointCloud<pcl::PointXYZRGB>::Ptr tempCloud2(new pcl::PointCloud<pcl::PointXYZRGB>);
pcl::io::loadPCDFile("C:/SO/testTorusColor.pcd", *tempCloud2);
stopWatch.fullStamp(true);
std::cout <<"Points loaded: "<< tempCloud2->points.size() << std::endl;
std::cout << "Sample point: " << tempCloud2->points.at(0) << std::endl;
std::cout << std::endl;
std::cout << "Press enter to save cloud in pcl format " << std::endl;
std::cin.get();
pcl::io::savePCDFileBinary("C:/SO/testTorusColorPCLFormatted.pcd", *tempCloud2);
std::cout << "Press enter to load formatted cloud" << std::endl;
std::cin.get();
stopWatch = TimeStamp();
pcl::PointCloud<pcl::PointXYZRGB>::Ptr tempCloud3(new pcl::PointCloud<pcl::PointXYZRGB>);
pcl::io::loadPCDFile("C:/SO/testTorusColorPCLFormatted.pcd", *tempCloud3);
stopWatch.fullStamp(true);
std::cout << "Points loaded: " << tempCloud3->points.size() << std::endl;
std::cout << "Sample point: " << tempCloud3->points.at(0) << std::endl;
std::cout << std::endl;
std::cin.get();
CloudCompare-generated colored cloud (3M points with color): running in Debug reproduced your approximate load time with a 3M pt cloud; running in Release it loaded in about 0.25 s (screenshots omitted).
I was running into exactly this situation.
It simply comes down to file storage style. Your file (taking that long to load) is almost certainly an ASCII style point cloud file. If you want to be able to load it much faster (x100) then convert it to binary format. For reference, I load a 1M pt cloud in about a quarter second (but that is system dependent)
pcl::PointCloud<pcl::PointXYZ>::Ptr tempCloud(new pcl::PointCloud<pcl::PointXYZ>);
The load call is the same:
pcl::io::loadPCDFile(fp, *tempCloud);
but in order to save as binary use this:
pcl::io::savePCDFileBinary(fp, *tempCloud);
Just in case it helps, here is a snippet of the code I use to load and save clouds. I structure them a bit, but it is likely based on an example, so I don't know how important that is; you may want to play with it if you switch to binary and are still seeing long load times.
//save pt cloud
std::string filePath = getUserInput("Enter file name here");
int fileType = stoi(getUserInput("0: binary, 1:ascii"));
if (filePath.size() == 0)
    printf("failed file save!\n");
else
{
    pcl::PointCloud<pcl::PointXYZ> tempCloud;
    copyPointCloud(*currentWorkingCloud, tempCloud);
    tempCloud.width = currentWorkingCloud->points.size();
    tempCloud.height = 1;
    tempCloud.is_dense = false;
    filePath = "../PointCloudFiles/" + filePath;
    std::cout << "Cloud saved to:_" << filePath << std::endl;
    if (fileType == 0)
        pcl::io::savePCDFileBinary(filePath, tempCloud);
    else
        pcl::io::savePCDFileASCII(filePath, tempCloud);
}

//load pt cloud
std::string filePath = getUserInput("Enter file name here");
if (filePath.size() == 0)
    printf("failed user input!\n");
else
{
    filePath = "../PointCloudFiles/" + filePath;
    pcl::PointCloud<pcl::PointXYZ>::Ptr tempCloud(new pcl::PointCloud<pcl::PointXYZ>);
    if (pcl::io::loadPCDFile(filePath, *tempCloud) == -1) //* load the file
    {
        printf("failed file load!\n");
    }
    else
    {
        copyPointCloud(*tempCloud, *currentWorkingCloud);
        std::cout << "Cloud loaded from:_" << filePath << std::endl;
    }
}
This looks correct when compared with a PCL example. I think the main work of loadPCDFile is done in pcl::PCDReader::read, located in the file pcd_io.cpp. Checking the code path for binary data, as in your case, there are 3 nested for loops which check whether the numerical data of each field is valid. The exact code comment is
// Once copied, we need to go over each field and check if it has NaN/Inf values and assign cloud
That could be time consuming. However, I am speculating.
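A simplified sketch of that kind of per-field validity pass (not PCL's actual implementation) shows why the cost grows with points times fields and can be noticeable on multi-million-point clouds:

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <vector>

struct PointXYZ { float x, y, z; };

// Hypothetical sketch of a NaN/Inf validity pass: every coordinate of
// every point is inspected once, so a 2.2M-point cloud means ~6.6M
// std::isfinite checks before the cloud can be marked dense.
bool cloudIsDense(const std::vector<PointXYZ>& pts) {
    for (const PointXYZ& p : pts)
        for (float v : {p.x, p.y, p.z})
            if (!std::isfinite(v))
                return false;   // cloud must be marked non-dense
    return true;
}
```

In Release builds this loop is cheap per element; in Debug builds, with iterator checks and no vectorization, it is plausibly a large share of the 13 seconds.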

Why "open file failed" when I open many files?

This is my C++ code:
const int num_of_file = 1024;
std::ifstream data("data.txt");
std::vector<std::ofstream> files(num_of_file);
for (int i = 0; i < num_of_file; ++i)
{
    files[i].open(std::to_string(i) + ".txt");
    if (files[i].is_open() == false)
    {
        std::cerr << "open " << std::to_string(i) << ".txt fail" << std::endl;
        exit(0);
    }
}
But I get "open 509.txt fail" every time I run the code.
After a little bit of research, the limitation seems to stem from the stream implementation.
fstream is built on top of C streams (fopen, fread, etc.), and those functions use a shared table of FILE handles whose maximum size is burned into the VC++ runtime library. The default limit is 512 open stdio streams (it can be raised with _setmaxstdio), and stdin, stdout, and stderr already occupy three of those slots, which matches the failure appearing at file number 509.
You should keep an internal buffer and after a certain limit has been reached open, write and close the file.
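A sketch of that approach, assuming the per-file output can be buffered in memory (flushBuffers is a hypothetical helper): keep at most one ofstream open at a time.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// The text destined for each numbered file is accumulated in memory
// first; then each file is opened, written, and closed in turn, so
// only one ofstream (one CRT FILE slot) is in use at any moment.
void flushBuffers(const std::vector<std::string>& buffers,
                  const std::string& dir) {
    for (std::size_t i = 0; i < buffers.size(); ++i) {
        std::ofstream out(dir + "/" + std::to_string(i) + ".txt");
        out << buffers[i];
    }   // 'out' is destroyed (and closed) here, before the next open
}
```

If the buffers grow too large for memory, the same idea works incrementally: append to each file in open/write/close cycles whenever its buffer passes a size threshold.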

Error Using CvMoments to calculate the HU moments

I am doing my FYP and I am a bit new to both OpenCV and C++. I have looked for info regarding CvMoments, and everything I found (examples that work in theory) does not solve my problem. I want to load a set of images ("1.png" to "5.png") and write the Hu moments into a text file. The code is as follows:
CvMoments moments;
CvHuMoments hu_moments;
char filename[80] = "";

ofstream myfile;
myfile.open("HU_moments.txt");

for (int i = 0; i < 5; i++) {
    sprintf(filename, "%u.png", i);
    IplImage* image = cvLoadImage(filename);

    cvMoments(image, &moments);
    cvGetHuMoments(&moments, &hu_moments);

    myfile << "Hu1: " << hu_moments.hu1 <<
              "Hu2: " << hu_moments.hu2 <<
              "Hu3: " << hu_moments.hu3 <<
              "Hu4: " << hu_moments.hu4 <<
              "Hu5: " << hu_moments.hu5 <<
              "Hu6: " << hu_moments.hu6 <<
              "Hu7: " << hu_moments.hu7 << ".\n";

    cvReleaseImage(&image);
}
myfile.close();
The problem occurs when I get to cvMoments(image, &moments). I get:
Unhandled exception at 0x759fb9bc in Viewer.exe: Microsoft C++ exception: cv::Exception at memory location 0x002fce00.
I have tried declaring moments as a pointer (with its corresponding malloc), but I still get the same error. The funny thing is that if I choose the option to continue debugging (5 times, once per loop iteration), I get results printed into my text file. I am using Visual Studio 2008.
I hope someone knows what is going on here and how to solve it.
You are calling it right, but I suspect that your problem is that the previous call is failing:
IplImage* image = cvLoadImage(filename);
You need to check the result of cvLoadImage() and make sure that you are passing a valid argument to cvMoments():
IplImage* image = cvLoadImage(filename);
if (!image)
{
    cout << "cvLoadImage() failed!" << endl;
    // deal with error! return, exit or whatever
}

cvMoments(image, &moments);
It's also a good idea to check if myfile was successfully opened.
The root of the problem, according to my crystal ball, is a misconception about where you should put the files the application needs to load when it's executed from Visual Studio: that would be the directory where your source code files are.
If you put the image files in the same directory as the source code, you should be OK.
On the other hand, when you execute your application manually (by double-clicking the executable), the image files need to be in the same directory as the executable.
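If in doubt, printing the working directory shows where relative paths like "1.png" are actually resolved from (a small C++17 sketch; the function name is hypothetical):

```cpp
#include <cassert>
#include <filesystem>

// Returns the current working directory: the base against which
// relative file names passed to cvLoadImage() are resolved. Comparing
// this with where the .png files actually live usually explains why
// an image loads when launched one way but not the other.
std::filesystem::path whereRelativePathsResolve() {
    return std::filesystem::current_path();
}
```

Printing this once at startup, from both launch methods, makes any mismatch obvious.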
EDIT:
I'm convinced that cvMoments() takes a single-channel image as input. This example doesn't throw any exceptions:
CvMoments moments;
CvHuMoments hu_moments;

IplImage* image = cvLoadImage(argv[1]);
if (!image)
{
    std::cout << "Failed cvLoadImage\n";
    return -1;
}

IplImage* gray = cvCreateImage(cvSize(image->width, image->height), image->depth, 1);
if (!gray)
{
    std::cout << "Failed cvCreateImage\n";
    return -1;
}

cvCvtColor(image, gray, CV_RGB2GRAY);
cvMoments(gray, &moments, 0);
cvGetHuMoments(&moments, &hu_moments);

cvReleaseImage(&image);
cvReleaseImage(&gray);
Before converting the colored image to gray, I was getting:
OpenCV Error: Bad argument (Invalid image type) in cvMoments, file OpenCV-2.3.0/modules/imgproc/src/moments.cpp, line 373
terminate called after throwing an instance of 'cv::Exception'
OpenCV-2.3.0/modules/imgproc/src/moments.cpp:373: error: (-5) Invalid image type in function cvMoments

Memory (and other resources) used by individual VirtualAlloc allocation

How much memory and other resources are used by an individual VirtualAlloc(xxxx, yyy, MEM_RESERVE, zzz) call?
Is there any difference in resource consumption (e.g. kernel paged/nonpaged pool) when I allocated one large block, like this:
VirtualAlloc( xxxx, 1024*1024, MEM_RESERVE, PAGE_READWRITE )
or multiple smaller blocks, like this:
VirtualAlloc( xxxx, 64*1024, MEM_RESERVE, PAGE_READWRITE );
VirtualAlloc( xxxx+1*64*1024, 64*1024, MEM_RESERVE, PAGE_READWRITE );
VirtualAlloc( xxxx+2*64*1024, 64*1024, MEM_RESERVE, PAGE_READWRITE );
...
VirtualAlloc( xxxx+15*64*1024, 64*1024, MEM_RESERVE, PAGE_READWRITE );
If you do not know the answer but can suggest an experiment to check it, that would be helpful as well.
The motivation is that I want to implement returning memory back to the OS for TCMalloc under Windows. My idea is to replace individual large VirtualAlloc calls with a sequence of small (allocation-granularity) calls, so that I can call VirtualFree on each of them. I am aware that this way the allocation of large blocks will be slower, but are there any resource consumption penalties to be expected?
Just FYI, you can use GetProcessMemoryInfo and GlobalMemoryStatusEx to get some memory usage measurements.
void DisplayMemoryUsageInformation()
{
    HANDLE hProcess = GetCurrentProcess();

    PROCESS_MEMORY_COUNTERS pmc;
    ZeroMemory(&pmc, sizeof(pmc));
    GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc));
    std::cout << "PageFaultCount: " << pmc.PageFaultCount << std::endl;
    std::cout << "PeakWorkingSetSize: " << pmc.PeakWorkingSetSize << std::endl;
    std::cout << "WorkingSetSize: " << pmc.WorkingSetSize << std::endl;
    std::cout << "QuotaPeakPagedPoolUsage: " << pmc.QuotaPeakPagedPoolUsage << std::endl;
    std::cout << "QuotaPagedPoolUsage: " << pmc.QuotaPagedPoolUsage << std::endl;
    std::cout << "QuotaPeakNonPagedPoolUsage: " << pmc.QuotaPeakNonPagedPoolUsage << std::endl;
    std::cout << "QuotaNonPagedPoolUsage: " << pmc.QuotaNonPagedPoolUsage << std::endl;
    std::cout << "PagefileUsage: " << pmc.PagefileUsage << std::endl;
    std::cout << "PeakPagefileUsage: " << pmc.PeakPagefileUsage << std::endl;

    MEMORYSTATUSEX msx;
    ZeroMemory(&msx, sizeof(msx));
    msx.dwLength = sizeof(msx);
    GlobalMemoryStatusEx(&msx);
    std::cout << "MemoryLoad: " << msx.dwMemoryLoad << std::endl;
    std::cout << "TotalPhys: " << msx.ullTotalPhys << std::endl;
    std::cout << "AvailPhys: " << msx.ullAvailPhys << std::endl;
    std::cout << "TotalPageFile: " << msx.ullTotalPageFile << std::endl;
    std::cout << "AvailPageFile: " << msx.ullAvailPageFile << std::endl;
    std::cout << "TotalVirtual: " << msx.ullTotalVirtual << std::endl;
    std::cout << "AvailVirtual: " << msx.ullAvailVirtual << std::endl;
    std::cout << "AvailExtendedVirtual: " << msx.ullAvailExtendedVirtual << std::endl;
}
Zero, or practically zero, memory is used by making a VirtualAlloc call with the reserve param. This just reserves the address space within the process. The memory will not be used until you actually back the address range with pages by calling VirtualAlloc with the commit param.
This is essentially the difference between virtual bytes, the amount of address space taken, and private bytes, the amount of committed memory.
Both of your uses of VirtualAlloc() will reserve the same amount of memory so they are equivalent from the resource consumption side.
I suggest that you do some reading on this before deciding to write your own allocator. One of the best sources is Mark Russinovich; you should check his blog. He has written a few entries called "Pushing the Limits" which cover some of this. If you want the real nitty-gritty details, then you should read his book, Microsoft Windows Internals. It is by far the best reference that I have read on how Windows manages memory (and everything else).
(Edit) Additional Information:
The relevant pieces are the "Page Directory" and the "Page Table". According to my older copy of Microsoft Windows Internals... On x86, there is a single Page Directory for each process with 1024 entries. There are up to 512 page tables. Each 32 bit pointer used in the process is broken into 3 pieces [31-22]Page Directory Index, [21-12] is the Page Table Index, and [11-0] is the byte index in the page.
When you use VirtualAlloc with the reserve param, the Page Directory Entry (32 bits) and the Page Table Entry (32 bits) are created. At this time, no page is created for the reserved memory.
The best way to see this information is to use the Kernel Debugger. I would suggest using LiveKD (sysinternals). You can use liveKD without attaching a remote computer, but it doesn't allow live debugging. Load LiveKD, and select your process. Then you can run the !PTE command to examine the page table for the process.
Again, I would suggest reading Inside Windows Internals. In my version (4th ed) there is a chapter(over 100 pages) that covers all of this with examples for walking through the various data structures in liveKD.
In my understanding of the page table, you have chunks of e.g. 1024 pages, with one word per page. In any case, it's the number of pages, not the number of allocations, that costs. However, there might be other mechanisms that cost "extra" per allocation (I just don't know).
Still: using VirtualFree you can selectively decommit individual pages or page ranges. For a decommitted page, the virtual address range (within your process) is still reserved, but no physical memory (RAM or swap file) is assigned to it. You can later use VirtualAlloc to commit these pages again.
So unless you need to free up address space for other allocators within your process, you can use this mechanism to selectively request and return memory to the OS.
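For experimenting outside Windows, the reserve/commit/decommit cycle described above has a close Linux analog: mmap with PROT_NONE reserves address space without backing it, mprotect makes it usable, and madvise(MADV_DONTNEED) returns the physical pages while keeping the reservation. A sketch of the lifecycle (Linux-specific, and an analogy rather than the Windows mechanism itself):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Reserve, commit, touch, decommit, release: the same lifecycle the
// answer describes for VirtualAlloc/VirtualFree, expressed with the
// Linux equivalents. Each step checks for failure.
bool reserveCommitDecommit(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);    // "MEM_RESERVE"
    if (p == MAP_FAILED)
        return false;
    if (mprotect(p, bytes, PROT_READ | PROT_WRITE) != 0)   // "MEM_COMMIT"
        return false;
    static_cast<char*>(p)[0] = 1;                          // fault a page in
    if (madvise(p, bytes, MADV_DONTNEED) != 0)             // "MEM_DECOMMIT"
        return false;
    return munmap(p, bytes) == 0;                          // "MEM_RELEASE"
}
```

Running such a cycle while watching the process counters (working set vs. virtual size) makes the reserve/commit distinction easy to observe on either platform.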
[edit]
Measuring
For measuring, I thought of comparing the performance of both algorithms under one or more typical loads (an artificial/random allocation pattern, an allocation-heavy "real world" application, etc.). Advantage: you get the "whole story", kernel resources, page fragmentation, application performance, and so on. Disadvantage: you have to implement both algorithms, you don't learn the underlying reason, and you probably need very special cases for a measurable difference that sticks out from the noise.
Address space fragmentation warning: be careful with your return algorithm. When returning individual pages in a "whichever is free" fashion, you might end up with a fragmented address space that has 80% of its memory free but not 100 KB of it contiguous.
You can try using "perfmon" and adding counters (e.g. Memory) to start getting a feel for what resources are being used by VirtualAlloc. You will have to take a snapshot before and after the call to VirtualAlloc.
Another option could be to debug the process making call to VirtualAlloc under WinDBG and use the memory related commands http://windbg.info/doc/1-common-cmds.html#20_memory_heap to get an idea of what is actually happening.