Testing C++ code and IsBadWritePtr - c++

I am currently writing some basic tests for some functions of my C++ code (it is a game engine that I am writing mostly for educational purposes). One of the features I want to test is the memory allocation code. The tests currently consist of a function that runs every startup if the code is in debug mode. This forces me to always test the code when I am debugging.
To test my memory allocation code my instinct is to do something like this:
int* test = MemoryManager::AllocateMemory<int>();
assert(!IsBadWritePtr(test, sizeof(int)) && "Memory allocation test failed: allocate");
MemoryManager::FreeMemory(test);
assert(IsBadWritePtr(test, sizeof(int)) && "Memory free test failed: free");
This code is working fine, but all the resources I can find say not to use the IsBadWritePtr function (this is a WinAPI function for those unfamiliar). Is the use of this function OK in this case? The three main warnings against using it I found were:
This might cause issues with guard pages
This isn't an issue as the memory allocation code is right there and I know I am not allocating a guard page.
It's better to fail earlier
This isn't a consideration as I am literally using it to fail as early as possible.
It isn't thread safe
The test code is executed right at the beginning of execution, long before any other threads exist. It also acts on memory to which no other pointers are created, and therefore could not exist in other threads.
So basically I want to know whether the use of this function is a good idea in this case, and whether there is anything I am missing about it. I am also aware that a pointer to the wrong place will still pass this test, but it at least detects most memory allocation errors, right? (What are the chances of getting a pointer to valid memory if the allocation fails?)

I was going to write this as a comment but it's too long.
I'll bring up the elephant in the room: why don't you just test for failure in the traditional way (by returning null or throwing an exception)?
Never mind the fact that IsBadWritePtr is so frowned upon (even its documentation says that it's obsolete and you shouldn't use it): your use case doesn't even seem appropriate. From the MSDN documentation:
This function is typically used when working with pointers returned from third-party libraries, where you cannot determine the memory management behavior in the third-party DLL.
But you are not using it to test anything passed/returned from a DLL, you just seem to be using it to test for allocation success, which is not only unnecessary (because you already know that from the return value of HeapAlloc, GlobalAlloc, etc.), but it's not what IsBadWritePtr is intended for.
Also, testing for allocation success is not something you should only do in debug mode, or with asserts, as it's obviously out of your control and you can't try to "fix" it by debugging.

Building on @user1610015's answer, there is one reason why IsBadWritePtr should NOT work in your scenario.
Basically, IsBadWritePtr works at whole-page granularity. This means that for the above code to be correct, each and every allocation you make would have to consume a whole page (4 KB minimum).
Modern allocators use a variety of tricks to pack lots of allocations into pages (low-fragmentation bucket heaps, linked lists of allocations, etc.). If you don't pack small allocations like this, then things like STL maps and other libraries which use lots of small allocations will absolutely kill your game (both in memory use and because cache coherency will be killed by so much unused padding).
As a side note, your last comment about thread safety is dangerous. Lots of apps and libraries you link against can spawn threads from global object constructors (which run before main is called) and use other tricks to insert code into your process. So I would definitely check that this holds for your code right now, and more importantly, re-check it later as you add third-party libraries to your code.

Related

Memory leak detection in C/C++ compiler

Can heap memory-leak detection be built into a C/C++ compiler? For example, in its simplest form, during semantic analysis it would simply count allocated memory segments (with new/malloc or whatever) and the delete/free calls for each one, then give a compile-time warning about any mismatch.
See the C++ Core Guidelines, which come with a checker tool that parses code to find deviations from the guidelines and the GSL. The guidelines include statically enforceable coding rules that preclude the possibility of memory leaks.
https://isocpp.org/blog/2015/09/bjarne-stroustrup-announces-cpp-core-guidelines
Well, you can probably try to make it, but you should take the following things into account:
You would have to build your application in whole-program mode (otherwise you can only catch very trivial cases).
You would need very complex data-flow analysis for pointers: just imagine that you allocate memory for some object and then assign that pointer to another one, and you free only one of them somewhere in the program. Or you put the pointer into some container.
There can be hundreds of thousands of pointers in a big application, so the analysis needs a lot of memory and time, far more than a user is willing to spend.
If you make a cheap and quick analysis, it will be inaccurate, so you would either print a lot of false-positive warnings or report only exact warnings for the most trivial cases.
So as academic research it would probably be very interesting, but in real life I doubt such an analysis can be made with satisfying quality.
Well, there are the clang (LLVM) sanitizers. They're not compile-time, but add runtime instrumentation to the emitted program.
In general, it's not possible to do a good job entirely statically, for code whose behaviour depends on inputs. Standalone static analysis tools have some heuristics for tracing various code paths, which is already sort-of dynamic-ish.
Just instrumenting the code and running it (or emulating like valgrind) is conceptually simpler, although of course it does slow down execution.
Our CheckPointer tool instruments MS or GCC C programs to find a wide variety of pointer management errors that occur at runtime.
It detects dangling pointers, pointers that step outside the language-defined memory region for which they originated, including pointers to areas inside the stack (these are issues that Valgrind cannot find because it believes the entire stack is a valid place for a pointer to reference). It also handles thread-local storage.
It reports an error at the earliest time that such an error could be reported, which makes finding the cause considerably simpler.
At the end of a run, it produces a list of allocated but not freed blocks of storage to help track down memory leaks.

Allocation numbers in C++ (Windows) and their predictability

I am using _CrtDumpMemoryLeaks to identify memory leaks in our software. We are using a third-party library in a multi-threaded application. This library has memory leaks, and therefore in our tests we want to identify the leaks that are ours and discard those we do not have any control over.
We use continuous integration so new functions/algorithms/bug fixes get added all the time.
So the question is: is there a safe way of identifying which leaks are ours and which are the third-party library's? We thought about using allocation numbers, but is that safe?
In a big application I worked on, the global new and delete operators were overridden (e.g. see How to properly replace global new & delete operators) and used private heaps (e.g. HeapCreate). Third-party libraries would use the process heap, so the allocations were clearly separated.
Frankly, I don't think you can get far with allocation numbers. Using explicit separate heaps for the app and the libraries (and maybe even separate per-component heaps within your own app) would be much more manageable. Consider that you can add your own app-specific header to each allocated block and thus enable very fancy memory tracking: capturing the entire call stack of each allocation for debugging, per-component accounting, etc.
You might be able to do this using Microsoft's heap debugging library without any third-party solutions. Based on what I learned from a previous question here, you should just make sure that all memory allocated in your code is allocated through a call to _malloc_dbg with the second argument set to _CLIENT_BLOCK. Then you can set a callback function with _CrtSetDumpClient, and that callback will only receive information about the client blocks that were allocated, not the other ones.
You can easily use the preprocessor to convert all the calls to malloc and free to actually call their debugging versions (e.g. _malloc_dbg); just look at how it's done in crtdbg.h which comes with Visual Studio.
The tricky part for me would be figuring out how to override the new and delete operators to call debugging functions like _malloc_dbg. It might be hard to find a solution where only the news and deletes in your own code are affected, and not in the third-party library.
You may want to use the DebugDiag tool provided by Microsoft. For complete information about the tool, see: http://www.microsoft.com/en-sg/download/details.aspx?id=40336
DebugDiag can be used to identify various issues. We can follow these steps to track down the leaks (ours and third-party modules):
Configure DebugDiag under the rule type "Native (non-.NET) Memory and Handle Leak".
Re-run the application for some time and capture dump files. DebugDiag can also be configured to capture a dump file after a specified interval.
Open/analyze the captured dump file in DebugDiag under "Performance Analyzers".
Once the analysis is complete, DebugDiag automatically generates a report giving the modules/DLLs where a leak is possible (with probability). Once we have the module information from the report, we can concentrate on that particular module with static code analysis. If the module belongs to a third-party DLL, we can share the DebugDiag report with its authors. In addition, if you run/attach your application with the appropriate PDB files, DebugDiag also provides the call stack from which the memory leak most likely originates.
This information was very useful in the past while debugging memory leaks in Windows applications. Hopefully it will be useful to you as well.
The answer would REALLY depend on the actual implementation of the third partly library. Does it only leak a consistent number of items, or does that depend on, for example, the number of threads, what functions are used within the library, or some such? When are the allocations made?
Even then, if it's a consistent number of leaks regardless of library usage, I'd be hesitant to rely on the allocation number. By all means, give it a try. If all the allocations are made very early on, and they don't depend on any of "your" code, then it could work - and it is a REALLY simple thing. But try adding, for example, a static std::vector<int>(100) to see if memory allocations in static variables affect the allocation number... If they do, this method is probably doomed (unless you have very strict rules on static objects).
Using a separate heap (with new/delete operators replaced) would be the correct solution, as this can probably be expanded to gather other statistics too [like number of allocations made, to detect parts of the code that makes excessive allocations - of course, this has to be analysed based on what the code actually does].
Newer versions of Doug Lea's malloc include the mspace abstraction. An mspace is a separate heap. In our application of a couple hundred thousand NCSL, we use a dozen different mspaces for different parts of the code. We use allocators to have STL containers allocate memory from the right mspace.
Some of the benefits
3rd-party code does not use mspaces, so their allocations (and leaks) do not mix with ours
We can look at the memory usage of each mspace to see which piece of code might have memory leaks
Any memory corruption is contained within one mspace thus limiting the amount of code we need to look at for debugging.

How do I replace global operator new and delete in an MFC application (debug only)

I've avoided trying to do anything with operator new for many years, due to my sense that it is a quagmire on Windows (especially using MFC). Not to mention, unless one has a very compelling reason to mess with global (or even class) new and delete, one should not.
However, I have a nasty little memory corruption bug, and I'd very much like to track it down. I get messages from the CRT debug allocator indicating that previously freed memory was overwritten. This message is displayed only during a later allocation call, when it tries to reuse a block (I believe this is how it works, anyway).
Due to the portion of code in question, the error message and point of corruption are very unrelated. All I know is "something somewhere overwrote some memory that was previously freed with a single null byte." (I ascertained this by using the debugger and watching the memory referred to by the debug heap over several different runs).
Having exhausted the obvious ideas as to where the culprit might be, I'm left with trying something more rigorous. It occurred to me that it would be ideal if I could cause each freed block to become a no-access page of memory, so that the writer would be immediately caught by the CPU's MMU! A little bit of searching later, I found someone who had implemented something along those lines:
http://www.codeproject.com/Articles/38340/Immediate-memory-corruption-detection
His code is buried under a ton of reinvent-the-wheel code, but extracting the core concept was pretty easy, and I have done so.
The problem I have now is that MFC redefines new as DEBUG_NEW, and then further defines a slew of debug interfaces down to the CRT. In addition, it defines the global operator new and delete. Hence, as far as C++ is concerned, the "user" is trying to replace global operator new and delete twice, and I get a linker error to the effect of 'symbol already defined.'
Looking around the internet, and SO, I see a few promising articles, but none which ultimately have anything positive to say about replacing global operator new/delete for MFC.
How to properly replace global new & delete operators
Is it possible to replace the memory allocator in a debug build of an MFC application?
I am already aware:
MFC/CRT already provides rich debugging tools for memory allocation.
Well, it provides what it provides - such as the message that got me rolling down this path in the first place. I now know that a corruption is occurring, but that's awfully weak-sauce!
What I would like to supply is guarded-allocation (or even just guarded deallocation). This is clearly possible by using a lot of virtual address space and placing every allocation in isolation, which is horribly wasteful of memory. Okay, yeah, can't see the down side when this is debug-only code useful for special-purpose moments like now.
So, I'm desperately seeking solutions to the following
Force the compiler to be copacetic with my global operator new/delete despite the CRT/MFC supplied one.
Find another way to hook the MFC/CRT _heap_alloc_dbg chain to bottom out at my own code in place of theirs, for the penultimate allocation (i.e. I'll allocate via the OS's VirtualAlloc/VirtualFree to supply memory for new and/or malloc).
Does anyone know of answers, or good articles to read that might shed some light on how that these may be accomplished?
Other ideas:
Replace the CRT's new/delete at runtime using a thunk technique.
Some other approach entirely?!
Further investigation:
This article is pretty cool... it gives a way for me to patch the global new/delete operators at runtime. However, as the article points out, it's a bit hackish (however, since I only need this for debug builds, that's not a big deal) http://zeuxcg.blogspot.com/2009/03/fighting-against-crt-heap-and-winning.html
So although this is getting at what I want (a mechanism to replace the CRT memory-allocation functions), this implementation is pretty far out of date, and so far my attempts to make it work have run into myriad issues. I think it's just too tightly tied to the version it was originally created for, and only for relatively simple console use (i.e. C, not even C++, and jettisoning most of the debugging features provided by the Microsoft CRT). Hence, although a super-cool idea, it is one that ultimately would cost many hours of effort to make work with the current VS2010 dev studio, and hence not worth it (to me).
Apparently there is a well-known version of this idea: http://en.wikipedia.org/wiki/Electric_Fence Unfortunately, even the Windows port I found (http://code.google.com/p/electric-fence-win32/) fails to override the CRT properly and instead asks you to modify all of your source code to call the Electric Fence heap-allocation code. :(
Update 5/3/2012:
And now I discover that Windows already provides an implementation of Electric Fence, accessible via the GFLAGS debugging tool (http://support.microsoft.com/kb/286470). This can be turned on and off externally to the application being tested. It's essentially the same technology as I was interested in, and it has the features of the DUMA project (a branch of Electric Fence: http://duma.sourceforge.net/).
The MSVCRT debug heap is actually pretty good and has some useful features you can use, such as breakpoint on the nth allocation etc.
http://msdn.microsoft.com/en-us/library/974tc9t1(v=VS.80).aspx
Among other things you can insert an allocation hook which outputs debugging information etc which you can use to debug this sort of issue.
http://msdn.microsoft.com/en-us/library/z2zscsc2(v=vs.80).aspx
In your case all you really need to do is output the address, and file and line of each allocation. Then when you experience a corrupt block, find the block whose address immediately precedes it, which will almost certainly be the one which overran. You can use the memory view in the Visual Studio debugger to look at the memory address which was corrupted and see the preceding blocks. That should tell you all you need to know to find out when it was allocated.
The debug heap also has a numerical allocation ID on each allocated block, and can break on the nth allocation, so if you can get a reasonably consistent repro (so the same numerical block is corrupted each time), you should be able to use the "break on nth" functionality to get a full call stack for the time it was allocated.
You may also find _CrtCheckMemory useful to find out if corruption has occurred much earlier. Just call it periodically, and once you have the bug bracketed (error didn't occur in one, did occur in the other) move them closer and closer together.

Shielding app from library leaks

I have to use a function from a shared library which leaks a small amount of memory (let's assume I can't modify the library). Unfortunately, I have to call this function a huge number of times, which obviously makes the leak catastrophic.
Is there any method to fix this problem? If yes, is there a fast way of doing it? (The function must be called a few hundred thousand times; the leak becomes problematic after about 10k calls.)
I can think of a couple of approaches, but I don't know what will work for you.
Switch to a garbage-collecting memory allocator like Boehm's gc. This can sweep up those leaks, and may even be a performance gain because free() becomes a no-op.
exit(): The Ultimate Deallocator. Fork off a subprocess, run it 10k times, pass the results back to the parent process. Apache's web server does this to contain damage from third-party library leaks.
I'm not sure this is easier than rewriting the function yourself, but you could write your own small memory allocator specific to your task, which would work somewhat as follows (it would have to replace the default memory-allocation calls, including those made inside the library's functions):
1) Provide a way to enter a leak-reverting mode, which, for example, disposes of everything allocated while in that mode.
2) Before the function processes anything, enter leak-reverting mode, and exit it when the function finishes.
Basically, if the dependencies in your code aren't too tight, this would help.
Another way would be making a second application and pairing it with the main one. When the second one exits, its memory is automatically disposed of. You may want to see how the googletest framework runs its child tests and how the pipes are constructed there.
In short, no. If you have time, you can rewrite the function yourself. Catastrophic usually means this is the way to go. One other possibility, can you load and unload the library (like a .so)? It's possible that this will release the leaked memory.

Heap corruption under Win32; how to locate?

I'm working on a multithreaded C++ application that is corrupting the heap. The usual tools to locate this corruption seem to be inapplicable. Old builds (18 months old) of the source code exhibit the same behaviour as the most recent release, so this has been around for a long time and just wasn't noticed; on the downside, source deltas can't be used to identify when the bug was introduced - there are a lot of code changes in the repository.
The prompt for crashing behaviour is to generate throughput in this system - socket transfer of data which is munged into an internal representation. I have a set of test data that will periodically cause the app to throw an exception (in various places, for various causes - including heap alloc failing, thus: heap corruption).
The behaviour seems related to CPU power or memory bandwidth; the more of each the machine has, the easier it is to crash. Disabling a hyper-threading core or a dual-core core reduces the rate of (but does not eliminate) corruption. This suggests a timing related issue.
Now here's the rub:
When it's run under a lightweight debug environment (say Visual Studio 98 / AKA MSVC6) the heap corruption is reasonably easy to reproduce - ten or fifteen minutes pass before something fails horrendously and throws an exception, such as a failing alloc; when running under a sophisticated debug environment (Rational Purify, VS2008/MSVC9 or even Microsoft Application Verifier) the system becomes memory-speed bound and doesn't crash (memory-bound: CPU is not getting above 50%, disk light is not on, the program's going as fast as it can, box consuming 1.3G of 2G of RAM). So I've got a choice between being able to reproduce the problem (but not identify the cause) or being able to identify the cause of a problem I can't reproduce.
My current best guesses as to where to next is:
Get an insanely grunty box (to replace the current dev box: 2Gb RAM in an E6550 Core2 Duo); this will make it possible to repro the crash-causing misbehaviour when running under a powerful debug environment; or
Rewrite operators new and delete to use VirtualAlloc and VirtualProtect to mark memory as read-only as soon as it's done with. Run under MSVC6 and have the OS catch the bad-guy who's writing to freed memory. Yes, this is a sign of desperation: who the hell rewrites new and delete?! I wonder if this is going to make it as slow as under Purify et al.
And, no: Shipping with Purify instrumentation built in is not an option.
A colleague just walked past and asked "Stack Overflow? Are we getting stack overflows now?!?"
And now, the question: How do I locate the heap corruptor?
Update: balancing new[] and delete[] seems to have gotten a long way towards solving the problem. Instead of 15mins, the app now goes about two hours before crashing. Not there yet. Any further suggestions? The heap corruption persists.
Update: a release build under Visual Studio 2008 seems dramatically better; current suspicion rests on the STL implementation that ships with VS98.
Reproduce the problem. Dr Watson will produce a dump that might be helpful in further analysis.
I'll take a note of that, but I'm concerned that Dr Watson will only be tripped up after the fact, not when the heap is getting stomped on.
Another try might be using WinDebug as a debugging tool which is quite powerful being at the same time also lightweight.
Got that going at the moment, again: not much help until something goes wrong. I want to catch the vandal in the act.
Maybe these tools will allow you at least to narrow the problem to certain component.
I don't hold much hope, but desperate times call for...
And are you sure that all the components of the project have correct runtime library settings (C/C++ tab, Code Generation category in VS 6.0 project settings)?
No I'm not, and I'll spend a couple of hours tomorrow going through the workspace (58 projects in it) and checking they're all compiling and linking with the appropriate flags.
Update: This took 30 seconds. Select all projects in the Settings dialog, unselect until you find the project(s) that don't have the right settings (they all had the right settings).
My first choice would be a dedicated heap tool such as pageheap.exe.
Rewriting new and delete might be useful, but that doesn't catch the allocs committed by lower-level code. If this is what you want, better to Detour the low-level alloc APIs using Microsoft Detours.
Also sanity checks such as: verify your run-time libraries match (release vs. debug, multi-threaded vs. single-threaded, dll vs. static lib), look for bad deletes (eg, delete where delete [] should have been used), make sure you're not mixing and matching your allocs.
Also try selectively turning off threads and see when/if the problem goes away.
What does the call stack etc look like at the time of the first exception?
I have the same problems in my work (we also use VC6 sometimes), and there is no easy solution for it. I have only some hints:
Try automatic crash dumps on the production machine (see Process Dumper). My experience says Dr. Watson is not perfect for dumping.
Remove all catch(...) from your code. They often hide serious memory exceptions.
Check Advanced Windows Debugging - there are lots of great tips for problems like yours. I recommend it with all my heart.
If you use STL, try STLport and checked builds. Invalid iterators are hell.
Good luck. Problems like yours take us months to solve. Be ready for that...
We've had pretty good luck by writing our own malloc and free functions. In production, they just call the standard malloc and free, but in debug, they can do whatever you want. We also have a simple base class that does nothing but override the new and delete operators to use these functions, then any class you write can simply inherit from that class. If you have a ton of code, it may be a big job to replace calls to malloc and free to the new malloc and free (don't forget realloc!), but in the long run it's very helpful.
In Steve Maguire's book Writing Solid Code (highly recommended), there are examples of debug stuff that you can do in these routines, like:
Keep track of allocations to find leaks
Allocate more memory than necessary and put markers at the beginning and end of memory -- during the free routine, you can ensure these markers are still there
memset the memory with a marker on allocation (to find usage of uninitialized memory) and on free (to find usage of free'd memory)
Another good idea is to never use things like strcpy, strcat, or sprintf -- always use strncpy, strncat, and snprintf. We've written our own versions of these as well, to make sure we don't write off the end of a buffer, and these have caught lots of problems too.
Run the original application with ADplus -crash -pn appnename.exe
When the memory issue pops-up you will get a nice big dump.
You can analyze the dump to figure what memory location was corrupted.
If you are lucky, the overwritten memory is a unique string and you can figure out where it came from. If you are not lucky, you will need to dig into the Win32 heap and figure out what the original memory characteristics were (!heap -x might help).
After you know what was messed-up, you can narrow appverifier usage with special heap settings. i.e. you can specify what DLL you monitor, or what allocation size to monitor.
Hopefully this will speedup the monitoring enough to catch the culprit.
In my experience, I never needed full heap verifier mode, but I spent a lot of time analyzing the crash dump(s) and browsing sources.
P.S:
You can use DebugDiag to analyze the dumps.
It can point out the DLL owning the corrupted heap, and give you other useful details.
You should attack this problem with both runtime and static analysis.
For static analysis, consider compiling with PREfast (cl.exe /analyze). It detects mismatched delete and delete[], buffer overruns and a host of other problems. Be prepared, though, to wade through many kilobytes of level 6 (/analyze) warnings, especially if your project still has unfixed level 4 warnings.
PREfast is available with Visual Studio Team System and, apparently, as part of Windows SDK.
Is this in low memory conditions? If so it might be that new is returning NULL rather than throwing std::bad_alloc. Older VC++ compilers didn't properly implement this. There is an article about Legacy memory allocation failures crashing STL apps built with VC6.
The apparent randomness of the memory corruption sounds very much like a thread-synchronization issue - the bug reproduces depending on machine speed. If objects (chunks of memory) are shared among threads and the synchronization primitives (critical section, mutex, semaphore, etc.) are not used on a per-object basis, then it is possible to end up in a situation where a chunk of memory is deleted/freed while in use, or used after being deleted/freed.
As a test for that, you could add synchronization primitives to each class and method. This will make your code slower because many objects will have to wait for each other, but if it eliminates the heap corruption, your heap-corruption problem will become a code-optimization one.
You tried old builds, but is there a reason you can't keep going further back in the repository history and seeing exactly when the bug was introduced?
Otherwise, I would suggest adding simple logging of some kind to help track down the problem, though I am at a loss of what specifically you might want to log.
If you can find out what exactly CAN cause this problem, via google and documentation of the exceptions you are getting, maybe that will give further insight on what to look for in the code.
My first action would be as follows:
Build the binaries in "Release" version but creating debug info file (you will find this possibility in project settings).
Use Dr. Watson as the default debugger (DrWtsn32 -I) on a machine on which you want to reproduce the problem.
Reproduce the problem. Dr. Watson will produce a dump that might be helpful in further analysis.
Another try might be using WinDebug as a debugging tool which is quite powerful being at the same time also lightweight.
Maybe these tools will allow you at least to narrow the problem to certain component.
And are you sure that all the components of the project have correct runtime library settings (C/C++ tab, Code Generation category in VS 6.0 project settings)?
So from the limited information you have, this can be a combination of one or more things:
Bad heap usage, i.e., double frees, read after free, write after free, setting the HEAP_NO_SERIALIZE flag with allocs and frees from multiple threads on the same heap
Out of memory
Bad code (i.e., buffer overflows, buffer underflows, etc.)
"Timing" issues
If it's either of the first two, but not the last, you should have caught it by now with pageheap.exe.
Which most likely means it is due to how the code is accessing shared memory. Unfortunately, tracking that down is going to be rather painful. Unsynchronized access to shared memory often manifests as weird "timing" issues. Things like not using acquire/release semantics for synchronizing access to shared memory with a flag, not using locks appropriately, etc.
At the very least, it would help to be able to track allocations somehow, as was suggested earlier. At least then you can view what actually happened up until the heap corruption and attempt to diagnose from that.
Also, if you can easily redirect allocations to multiple heaps, you might want to try that to see if it either fixes the problem or results in more reproducible buggy behavior.
When you were testing with VS2008, did you run with HeapVerifier with Conserve Memory set to Yes? That might reduce the performance impact of the heap allocator. (Plus, you have to run with it Debug->Start with Application Verifier, but you may already know that.)
You can also try debugging with Windbg and various uses of the !heap command.
MSN
Graeme's suggestion of custom malloc/free is a good idea. See if you can characterize some pattern about the corruption to give you a handle to leverage.
For example, if it is always a block of the same size (say 64 bytes), then change your malloc/free pair to always allocate 64-byte chunks in their own page. When you free a 64-byte chunk, set the memory-protection bits on that page to prevent reads and writes (using VirtualProtect). Then anyone attempting to access this memory will generate an exception rather than corrupting the heap.
This does assume that the number of outstanding 64 byte chunks is only moderate or you have a lot of memory to burn in the box!
If you choose to rewrite new/delete, I have done this and have simple source code at:
http://gandolf.homelinux.org/~smhanov/blog/?id=10
This catches memory leaks and also inserts guard data before and after the memory block to capture heap corruption. You can just integrate with it by putting #include "debug.h" at the top of every CPP file, and defining DEBUG and DEBUG_MEM.
Some time ago I had a little time to solve a similar problem. If the problem still exists, I suggest you do this:
Monitor all calls to new/delete and malloc/calloc/realloc/free.
I made a single DLL exporting a function that registers all calls. This function receives parameters identifying your source-code location and a pointer to the allocated area, together with the type of call, and saves this information in a table. Every matched allocate/free pair is eliminated; at the end (or whenever you need it) you call another function to create a report of the remaining entries.
With this you can identify mismatched calls (new/free or malloc/delete) or missing ones. If a buffer is overwritten somewhere in your code, the saved information can be wrong, but each test may detect or narrow down the failure; many runs help identify the errors.
Good luck.
Do you think this is a race condition? Are multiple threads sharing one heap? Can you give each thread a private heap with HeapCreate, then they can run fast with HEAP_NO_SERIALIZE. Otherwise, a heap should be thread safe, if you're using the multi-threaded version of the system libraries.
A couple of suggestions. You mention the copious warnings at W4 - I would suggest taking the time to fix your code to compile cleanly at warning level 4 - this will go a long way to preventing subtle hard to find bugs.
Second - for the /analyze switch - it does indeed generate copious warnings. To use this switch in my own project, what I did was to create a new header file that used #pragma warning to turn off all the additional warnings generated by /analyze. Then, further down in the file, I turn on only those warnings I care about. Then I use the /FI compiler switch to force this header file to be included first in all compilation units. This should allow you to use the /analyze switch while controlling the output.
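The force-included header described above might look like this sketch (the warning numbers are illustrative examples from /analyze's C6xxx range, not a vetted list):

```cpp
// analyze_warnings.h - force-included via /FI so it precedes every
// translation unit. Hypothetical selection of warnings; tune to taste.
#pragma once

#ifdef _MSC_VER
#pragma warning(disable : 6001)  // example: using uninitialized memory
#pragma warning(disable : 6011)  // example: dereferencing a possibly null pointer
// ...disable the rest of the C6xxx noise here...

#pragma warning(default : 6386)  // example: re-enable a buffer-overrun check we care about
#endif
```

Build each project with `/analyze /FI analyze_warnings.h` and the suppressions apply everywhere without touching individual source files.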