MSVC Address Sanitizer - Any reason to use it in Release builds? - c++

Microsoft recently brought Address Sanitizer (ASan) to Microsoft Visual Studio 2019, and I've been experimenting with it. This question is specific to C and C++. My question is this: Is there any reason to have ASan enabled for Release builds, as opposed to having it enabled only for Debug builds? Having ASan turned on is devastating to the performance and memory usage of my program. (CPU performance worse than halved, memory usage tripled.) Therefore my hope is that if I enable ASan just to check for potential issues, and it does not detect any problems in a Debug build, then I can safely assume that there would not be any problems in the Release build that would have otherwise been caught by ASan? Certainly we are not meant to leave ASan enabled in Release/Production builds?
Thanks for any insight.

Therefore my hope is that if I enable ASan just to check for potential issues, and it does not detect any problems in a Debug build, then I can safely assume that there would not be any problems in the Release build that would have otherwise been caught by ASan?
That's not likely (especially for larger complex programs). Consider a function like:
#include <cstdint>

int array[250];

int test1(uint16_t a) {
    return array[(1016 / (a + 1)) & 0xFF];
}
You can test it with lots of values of a (e.g. 0, 2, 4, 5, ...) and it will be fine, and it will pass the address sanitizer's test during debug. Then, after release, it might crash due to a different (untested) value of a (e.g. 1) that triggers the "array index out of range" bug.
To guarantee that something like that can't happen you could test every possible combination of parameters for every function (which would be ludicrously expensive); but (in the presence of global variables/state) that might still not be enough, and (in the presence of multiple threads) it won't find hidden race conditions (e.g. code that almost always works because threadA finishes before threadB, that crashes when subtle differences in timing cause threadB to finish before threadA).
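(For what it's worth, a single 16-bit parameter is one of the rare cases where exhaustive input testing is affordable. Below is a minimal sketch of a harness that, appended to the snippet above, would let ASan flag the bad index on its own; in real code the input space is almost never this small.)

#include <cstdint>

int main() {
    // Drive test1 with every possible argument; under ASan the out-of-range
    // index (e.g. a == 1 yields index 252 into array[250]) aborts the run
    // with a global-buffer-overflow report.
    long sum = 0;
    for (unsigned a = 0; a <= UINT16_MAX; ++a) {
        sum += test1(static_cast<uint16_t>(a));
    }
    return static_cast<int>(sum & 1); // keep the calls from being optimized away
}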
Certainly we are not meant to leave ASan enabled in Release/Production builds?
Correct - it's too expensive to leave enabled.
The general idea is to minimize the risk of bugs (because you can't guarantee that all bugs were found and eliminated). Without much testing you might be able to say that you're "80% sure" there are no bugs in the released version; and with some rigorous testing with ASan (and optimizations) enabled you might be able to say that you're "95% sure" there are no bugs in the released version.
What this really comes down to is economics (money). It's a "cost of software bugs (lost sales, support costs, etc.)" vs. "cost of finding bugs" compromise: something like a computer game might be fine with less testing (because users just assume that games crash occasionally anyway), while for something like a life-support machine (where bugs cause deaths) you might refuse to use C or C++ in the first place.
The other thing worth considering is that users don't care much why your code failed - if it failed because of a bug then the user will be annoyed; and if it failed because ASan was enabled and detected a bug then the user will be equally annoyed. Leaving ASan enabled in release builds would help the user submit better information in a bug report, but most people don't submit bug reports (they just sit there annoyed; wondering if it was a temporary hardware glitch because they're too cheap to buy ECC RAM, or if it was a bug in the OS or a driver; and then restart the software).

Hard facts up front: A program's shape - broadly speaking - falls into one of two categories.
Its logical representation expressed as its source code.
Its binary representation, translated by a compiler (and linker) into something a machine can execute.
1. is what the programmer intended the program to do, and 2. is what the toolchain translated that intent into.
Ideally, 1. is perfectly within the programming language's specification, and both representations have identical observable behavior. In reality, 1. is wrong, and 2. does whatever.
The exercise is thus: evaluate how badly 2. is broken. To do that, you absolutely want a 2. that's as close as possible to what your clients will run. Fuzzing a 2. compiled as a Release configuration with ASan enabled gets you there. If you don't feel like reading along, fuzz this target as hard as humanly imaginable, and go your merry way.
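A minimal sketch of what such a fuzz target can look like, assuming a libFuzzer-style entry point (recent MSVC toolchains enable ASan with /fsanitize=address and ship libFuzzer support behind /fsanitize=fuzzer; ParseRecord is a made-up stand-in for your program's real input-handling entry point):

#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the code under test.
static bool ParseRecord(const std::string &raw) {
    return !raw.empty() && raw[0] == '#';
}

// Standard libFuzzer entry point: called millions of times with mutated
// inputs while ASan watches every allocation and access.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    ParseRecord(std::string(reinterpret_cast<const char *>(data), size));
    return 0; // non-zero return values are reserved by libFuzzer
}

Built against an optimized, Release-like configuration, this exercises roughly the binary your clients will get, while ASan turns silent corruption into immediate, actionable reports.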
That covers the unquestionable facts. What follows is strong opinions on what you (we, us) should really be doing. Stop reading, if you don't like opinions, strong opinions, or ethics.
<Ramblings, not quite done rambling, yet>

Related

In what situation must you work on a debug build?

Given that you can debug a release build, as mentioned in http://msdn.microsoft.com/en-gb/library/fsk896zz.aspx, in what situation do you really need to build a debug build in the development process?
While you can debug the release configuration, the settings in the release configuration are for the release build (and probably should be seen/maintained as such, through the development lifecycle).
Changing the release settings as that article describes is a step that you will probably have to revert at some point, unless shipping debugging information to your clients is what you want to do.
In some projects there are three maintained build configurations:
debug: no optimizations and full diagnostic information (optimized for code maintenance by the developers)
release: build what the clients will see/buy
release with debug symbols (similar to the link you ask about): this is for testing; the QA team will test something as similar as possible to what the clients will see, but in case it doesn't work, developers should have enough context information to investigate the issue.
A lot depends on the type of application, but normally, you won't want two different builds; you want to work and debug on the same build you deliver. What you call it is up to you; you generally don't need a name for it, since it is the only configuration you use.
This will typically be fairly close to what Microsoft calls the Debug build; it will have assert active, for example, and not do much optimizing.
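To make the assert point concrete: the standard assert macro compiles to nothing once NDEBUG is defined, which optimized/release configurations typically do, so the same source simply checks less the closer you get to the shipping build. A minimal illustration:

#include <cassert>
#include <cstddef>

int sum(const int *values, std::size_t count) {
    assert(values != NULL);          // checked here; compiled out when NDEBUG is defined
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

int main() {
    return sum(NULL, 0);             // aborts at the assert in a checked build;
                                     // quietly returns 0 with NDEBUG defined
}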
The exception is when performance is an issue. If you find yourself in a case where you cannot afford to leave asserts active, bounds checking in arrays, etc., and you need optimization in the code you deliver, you will probably want to have two builds, one for testing and debugging, and one that you deliver (which will also require testing). The reason is, of course, that it is very difficult to debug optimized code, since the generated code doesn't always correspond too closely to what you have written. Also, a lot of debuggers (including both VS and gdb, I think) are incapable of showing the values of variables that the compiler has optimized into a register.
In many such cases, you may also want to create three builds; iterator validation can be very, very expensive, and you may want to have a build which turns that off, but still does no optimizing. (It's very painful to debug if you need to wait 20 minutes to reach the spot where the program fails.)
The optimizations performed for the Release build make debugging harder.
For example,
a = c;
b = d;
In a Debug build, the compiled code will consist of four instructions:
read c
write a
read d
write b
That is fairly straightforward, and when stepping through the program line by line, executing the first line will run instructions 1 and 2, allowing me to examine the program state at this point afterwards.
In contrast, a Release build might notice that a and b are right next to each other in memory, as are c and d, so it can use a single access to read both c and d in one go, and a single access to write a and b.
Now we have only two instructions, and there is no clear mapping between source code lines and machine instructions. If you ask the debugger to step over the first line, it will execute both instructions (so you can see the result), but you never get the exact state between the two lines.
This is a simple example. Typically, the optimizer will try to interleave and reorder instructions so that the CPU's execution units are kept optimally loaded.
This especially means pulling memory reads to the front so they have a chance of being executed before the data is used in a calculation (otherwise, the CPU has to stop there and wait for the memory access to complete), mixing floating-point and integer operations (because these are run on different circuitry and can be parallelized), and calculating conditions for conditional jumps as early as possible (so the instruction prefetch mechanism knows whether to follow the jump or not).
In short, while debugging using a Release build is possible and sometimes necessary to reproduce customer bug reports, you really want the predictable behavior of a Debug build most of the time.

Tools for Isolating a Stack smashing bug

To put it mildly I have a small memory issue and am running out of tools and ideas to isolate the cause.
I have a highly multi-threaded (pthreads) C/C++ program that has developed a stack smashing issue under optimized compiles with GCC after 4.4.4 and prior to 4.7.1.
The symptom is that during the creation of one of the threads, I get a full stack smash: not just %RIP, but all parent frames and most of the registers are 0x00 or other nonsense addresses.
Which thread causes the issue is seemingly random; however, judging by log messages it seems to be isolated to the same hunk of code, and seems to come at a semi-repeatable point in the creation of the new thread.
This has made it very hard to trap and isolate the offending code more narrowly than to a single compilation unit of many thousand lines, since printf()s within the offending file have so far proved unreliable in trying to narrow down the active section.
The thread creation that leads off the thread that eventually smashes the stack is:
extern "C"
{
static ThreadReturnVal ThreadAPI WriterThread(void *act)
{
Recorder *rec = reinterpret_cast (act);
xuint64 writebytes;
LoggerHandle m_logger = XXGetLogger("WriterThread");
if (SetThreadAffinity(rec->m_cpu_mask))
{ ... }
SetThreadPrio((xint32)rec->m_thread_priority);
while (true)
{
... poll a ring buffer ... Hard Spin 100% use on a single core, this is that sort of crazy code.
}
}
I have tried a debug build, but the symptom is only present in optimized builds, -O2 or better.
I have tried Valgrind/memcheck and DRD, but both fail to find any issue before the stack is blown away (and it takes about 12 hours to reach the failure).
A compile with -O2 -Wstack-protector sees nothing wrong;
however, a build with -fstack-protector-all does protect me from the bug, but emits no errors.
Electric-Fence also traps, but only after the stack is gone.
Question: What other tools or techniques would be useful in narrowing down the offending section ?
Many thanks,
--Bill
A couple of options for approaching this sort of problem:
You could try setting a hardware breakpoint on a stack address before the corruption occurs and hope the debugger breaks early enough in the corruption to provide a vaguely useful debugging state. The tricky part here is choosing the right stack address; depending on how random the 'choice' of offending thread is, this might not be practical. But from one of your comments it sounds like it is often the newly created thread that gets smashed, so this might be doable. Try to break during thread creation, grab the thread's stack location, offset by some wild guess, set the hardware BP, and continue. Based on whether you break too early, too late, or not at all, adjust your offset, rinse, and repeat. This is basically advanced guess and check, and can be heavily hindered or outright impractical if the corruption pattern is too random, but it is surprising how often this can lead to a semi-legible stack and successful debugging efforts.
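One way to get a starting point for that guess on a pthreads/glibc system (pthread_getattr_np is a glibc extension; the tag parameter and the gdb watch syntax are just illustrative):

#include <cstdio>
#include <pthread.h>

// Call this at the top of the newly created thread: it prints the thread's
// stack range so you can place a hardware watchpoint at a guessed offset,
// e.g. "watch *(long *) 0x7f..." in gdb, or a data breakpoint in the IDE.
static void report_stack_range(const char *tag) {
    pthread_attr_t attr;
    void *stack_base = nullptr;
    size_t stack_size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstack(&attr, &stack_base, &stack_size);
        pthread_attr_destroy(&attr);
        std::fprintf(stderr, "[%s] stack %p .. %p (%zu bytes)\n", tag, stack_base,
                     static_cast<void *>(static_cast<char *>(stack_base) + stack_size),
                     stack_size);
    }
}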
Another option would be to start collecting crash dumps. Try to look for patterns between the crash dumps that might help bring you closer to the source of the corruption. Perhaps you'll get lucky and one of the crash dumps will crash 'faster'/'closer to the source'.
Unfortunately, both of these techniques are more art than science; they're non-deterministic, rely on a healthy dose of luck, etc. (at least in my experience.. that being said, there are people out there who can do amazing things with crash dumps, but it takes a lot of time to get to that level of skill).
One more side note: as others have pointed out, uninitialized memory is a very typical source of debug vs release differences, and could easily be your problem here. However, another possibility to keep in mind is timing differences. The order that threads get scheduled in, and for how long, is often dramatically different in debug vs release, and can easily lead to synchronization bugs being masked in one but not the other. These differences can be just due to execution speed differences, but I think some runtimes intentionally mess with thread scheduling in a debug environment.
You can use a static analysis tool to check for some subtle errors; maybe one of the errors found will be the cause of your bug. You can find some information on these tools here.

VS 2005 - Command Line Program Crashes

There is a command-line program developed in VS 2005. It processes a file and creates an output file. There is one input file which causes a crash, but only in some cases. If the program is started from the command line (whether the release or the debug build), it crashes while processing that file. But if it is started from VS 2005 by pressing F5 (Debug mode), it works fine: it doesn't crash and the result is correct. Any hint?
Thanks.
You could look at destructors or copy constructors.
Building in release mode can optimize away things like unnecessary object copies.
What happens when you start the program from the command line and attach the debugger to it afterwards?
While there are various kinds of undefined behaviors that can magically work fine in debug but not in release, or on one system but not another, or maybe only triggers noticeable behavior once in a full moon, probably the most common culprit for single-threaded code is uninitialized memory.
Most of the time this would be an uninitialized variable. It could also be a memory block that was allocated (like a buffer full of garbage) but never filled, yet the code assumed it was. Debug builds of some popular compilers have a tendency to fill newly allocated memory, whether on the stack or heap, with a known pattern (or zero it out), while release builds don't. There are even debugging tools out there that deliberately fill memory with garbage to help catch these kinds of errors at runtime.
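A minimal illustration of the pattern (the exact fill values are an implementation detail and nothing to rely on; the function here is made up):

#include <cstdio>

int divide_or_default(int numerator, int denominator) {
    int result;                         // never initialized on the error path
    if (denominator != 0) {
        result = numerator / denominator;
    }
    return result;                      // undefined behavior when denominator == 0
}

int main() {
    // In a Debug build the stack slot often holds a consistent fill value, so the
    // bug hides; in an optimized Release build "result" may live in whatever
    // register was last used, so the same call returns different garbage.
    std::printf("%d\n", divide_or_default(10, 0));
    return 0;
}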
We're plagued by these in a legacy C system we work on. I'd say about 80% of the time, when we encounter such situations in single-threaded code, it's due to uninitialized memory of some sort (typically uninitialized variable). For multithreaded code which tends to exhibit timing-specific problems, that is a data race more often than not.
It's very important to follow safe practices to avoid undefined behavior like this, since, as you can see, it can become quite a pain to reproduce the problem in the first place, let alone narrow down where the problem is in the code. Undefined behavior really is undefined, which is what makes it so dangerous - it might work sometimes and sometimes not, on some systems and not others, and the fact that it works sometimes is what makes these bugs the nastiest (something that fails every time would actually be a whole lot better).
Another common beginner one that can be quite a head-scratcher is failing to make a base class destructor virtual while deleting through a base pointer. While not nearly as common, it can certainly lead to some very perplexing behavior on the systems I've tested. Again, it's hard to know in advance what your problem is from such a vague description, but it's typically going to be undefined behavior of some sort.
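For reference, that case looks like the sketch below; whether it appears to work, leaks, or corrupts the heap depends entirely on the implementation, which is what makes it so perplexing (the class names are made up):

#include <string>

struct Handler {
    ~Handler() {}            // should be: virtual ~Handler() {}
};

struct FileHandler : Handler {
    std::string path;        // derived part owns a resource
};

int main() {
    Handler *h = new FileHandler();
    delete h;                // undefined behavior: only ~Handler() is considered,
                             // so the FileHandler part (and its std::string) is
                             // never destroyed, and the deallocation itself may
                             // use the wrong pointer/size
    return 0;
}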

Why does a C/C++ program often have optimization turned off in debug mode?

In most C or C++ environments, there is a "debug" mode and a "release" mode compilation.
Looking at the difference between the two, you find that the debug mode adds the debug symbols (often the -g option on lots of compilers) but it also disables most optimizations.
In "release" mode, you usually have all sorts of optimizations turned on.
Why the difference?
Without any optimization on, the flow through your code is linear. If you are on line 5 and single step, you step to line 6. With optimization on, you can get instruction re-ordering, loop unrolling and all sorts of optimizations.
For example:
void foo() {
1:    int i;
2:    for (i = 0; i < 2; )
3:        i++;
4:    return;
}
In this example, without optimization, you could single step through the code and hit lines 1, 2, 3, 2, 3, 2, 4
With optimization on, you might get an execution path that looks like: 2, 3, 3, 4 or even just 4! (The function does nothing after all...)
Bottom line, debugging code with optimization enabled can be a royal pain! Especially if you have large functions.
Note that turning on optimization changes the code! In certain environment (safety critical systems), this is unacceptable and the code being debugged has to be the code shipped. Gotta debug with optimization on in that case.
While the optimized and non-optimized code should be "functionally" equivalent, under certain circumstances, the behavior will change.
Here is a simplistic example:
int* ptr = (int*)0xdeadbeef; // some address of a memory-mapped I/O device
*ptr = 0; // setup hardware device
while(*ptr == 1) { // loop until hardware device is done
// do something
}
With optimization off, this is straightforward, and you kinda know what to expect.
However, if you turn optimization on, a couple of things might happen:
The compiler might optimize the while block away (we init to 0, it'll never be 1)
Instead of re-reading memory on every iteration, the pointer access might be hoisted into a register, so the loop never sees the I/O update
memory access might be cached (not necessarily compiler optimization related)
In all these cases, the behavior would be drastically different and most likely wrong.
Another crucial difference between debug and release is how local variables are stored. Conceptually, local variables are allocated storage in a function's stack frame. The symbol file generated by the compiler tells the debugger the offset of the variable in the stack frame, so the debugger can show it to you. The debugger peeks at the memory location to do this.
However, this means that every time a local variable is changed, the generated code for that source line has to write the value back to the correct location on the stack. This is very inefficient because of the extra memory traffic.
In a release build the compiler may assign a local variable to a register for a portion of a function. In some cases it may not assign stack storage for it at all (the more registers a machine has the easier this is to do).
However, the debugger doesn't know how registers map to local variables for a particular point in the code (I'm not aware of any symbol format that includes this information), so it can't show it to you accurately as it doesn't know where to go looking for it.
Another optimization would be function inlining. In optimized builds the compiler may replace a call to foo() with the actual code for foo everywhere it is used because the function is small enough. However, when you try to set a breakpoint on foo() the debugger wants to know the address of the instructions for foo(), and there is no longer a simple answer to this -- there may be thousands of copies of the foo() code bytes spread over your program. A debug build will guarantee that there is somewhere for you to put the breakpoint.
Optimizing code is an automated process that improves the runtime performance of the code while preserving semantics. This process can remove intermediate results which are unnecessary to complete an expression or function evaluation, but may be of interest to you when debugging. Similarly, optimizations can alter the apparent control flow so that things may happen in a slightly different order than what appears in the source code. This is done to skip unnecessary or redundant calculations. This rejiggering of code can mess with the mapping between source code line numbers and object code addresses making it hard for a debugger to follow the flow of control as you wrote it.
Debugging in unoptimized mode allows you to see everything you've written as you've written it without the optimizer removing or reordering things.
Once you are happy that your program is working correctly you can turn on optimizations to get improved performance. Even though optimizers are pretty trustworthy these days, it's still a good idea to build a good quality test suite to ensure that your program runs identically (from a functional point of view, not considering performance) in both optimized and unoptimized mode.
The expectation is for the debug version to be - debugged! Setting breakpoints, single-stepping while watching variables, stack traces, and everything else you do in a debugger (IDE or otherwise) make sense if every line of non-empty, non-comment source code matches some machine code instruction.
Most optimizations mess with the order of machine code. Loop unrolling is a good example. Common subexpressions can be lifted out of loops. With optimization turned on, even at the simplest level, you may be trying to set a breakpoint on a line that, at the machine code level, doesn't exist. Sometimes you can't monitor a local variable because it is kept in a CPU register, or perhaps even optimized out of existence!
If you're debugging at the instruction level rather than the source level, it's an awful lot easier to map unoptimized instructions back to the source. Also, compilers are occasionally buggy in their optimizers.
In the Windows division at Microsoft, all release binaries are built with debugging symbols and full optimizations. The symbols are stored in separate PDB files and do not affect the performance of the code. They don't ship with the product, but most of them are available at the Microsoft Symbol Server.
Another of the issues with optimizations is inlined functions, in the sense that you will always end up single-stepping through them.
With GCC, with debugging and optimizations enabled together, if you don't know what to expect you will think that the code is misbehaving and re-executing the same statement multiple times - it happened to a couple of my colleagues.
Also, the debugging info emitted by GCC with optimizations on tends to be of poorer quality than it could be, actually.
However, in languages hosted by a Virtual Machine like Java, optimizations and debugging can coexist - even during debugging, JIT compilation to native code continues, and only the code of debugged methods is transparently converted to an unoptimized version.
I would like to emphasize that optimization should not change the behaviour of the code, unless the used optimizer is buggy, or the code itself is buggy and relies on partially undefined semantics; the latter is more common in multithreaded programming or when inline assembly is also used.
Code with debugging symbols is larger, which may mean more cache misses, i.e. slower, which may be an issue for server software.
At least on Linux (and there's no reason why Windows should be different) debug info is packaged in a separate section of the binary, and is not loaded during normal execution. It can be split into a different file to be used for debugging.
Also, on some compilers (including GCC, and I guess also Microsoft's C compiler) debugging info and optimizations can both be enabled together. If not, obviously the code is going to be slower.

Heap corruption under Win32; how to locate?

I'm working on a multithreaded C++ application that is corrupting the heap. The usual tools to locate this corruption seem to be inapplicable. Old builds (18 months old) of the source code exhibit the same behaviour as the most recent release, so this has been around for a long time and just wasn't noticed; on the downside, source deltas can't be used to identify when the bug was introduced - there are a lot of code changes in the repository.
The prompt for crashing behaviour is to generate throughput in this system - socket transfer of data which is munged into an internal representation. I have a set of test data that will periodically cause the app to fail with an exception (various places, various causes - including heap alloc failing; thus: heap corruption).
The behaviour seems related to CPU power or memory bandwidth: the more of each the machine has, the easier it is to crash. Disabling hyper-threading, or one core of a dual-core machine, reduces the rate of (but does not eliminate) corruption. This suggests a timing-related issue.
Now here's the rub:
When it's run under a lightweight debug environment (say Visual Studio 98 / AKA MSVC6) the heap corruption is reasonably easy to reproduce - ten or fifteen minutes pass before something fails horrendously and throws an exception, like an alloc failing; when running under a sophisticated debug environment (Rational Purify, VS2008/MSVC9 or even Microsoft Application Verifier) the system becomes memory-speed bound and doesn't crash (memory-bound: CPU is not getting above 50%, disk light is not on, the program's going as fast it can, box consuming 1.3G of 2G of RAM). So, I've got a choice between being able to reproduce the problem (but not identify the cause) or being able to identify the cause of a problem I can't reproduce.
My current best guesses as to where to next is:
Get an insanely grunty box (to replace the current dev box: 2Gb RAM in an E6550 Core2 Duo); this will make it possible to repro the crash-causing misbehaviour when running under a powerful debug environment; or
Rewrite operators new and delete to use VirtualAlloc and VirtualProtect to mark memory as read-only as soon as it's done with. Run under MSVC6 and have the OS catch the bad-guy who's writing to freed memory. Yes, this is a sign of desperation: who the hell rewrites new and delete?! I wonder if this is going to make it as slow as under Purify et al.
And, no: Shipping with Purify instrumentation built in is not an option.
A colleague just walked past and asked "Stack Overflow? Are we getting stack overflows now?!?"
And now, the question: How do I locate the heap corruptor?
Update: balancing new[] and delete[] seems to have gotten a long way towards solving the problem. Instead of 15mins, the app now goes about two hours before crashing. Not there yet. Any further suggestions? The heap corruption persists.
Update: a release build under Visual Studio 2008 seems dramatically better; current suspicion rests on the STL implementation that ships with VS98.
Reproduce the problem. Dr Watson will produce a dump that might be helpful in further analysis.
I'll take a note of that, but I'm concerned that Dr Watson will only be tripped up after the fact, not when the heap is getting stomped on.
Another try might be using WinDebug as a debugging tool which is quite powerful being at the same time also lightweight.
Got that going at the moment, again: not much help until something goes wrong. I want to catch the vandal in the act.
Maybe these tools will allow you at least to narrow the problem to certain component.
I don't hold much hope, but desperate times call for...
And are you sure that all the components of the project have correct runtime library settings (C/C++ tab, Code Generation category in VS 6.0 project settings)?
No I'm not, and I'll spend a couple of hours tomorrow going through the workspace (58 projects in it) and checking they're all compiling and linking with the appropriate flags.
Update: This took 30 seconds. Select all projects in the Settings dialog, unselect until you find the project(s) that don't have the right settings (they all had the right settings).
My first choice would be a dedicated heap tool such as pageheap.exe.
Rewriting new and delete might be useful, but that doesn't catch the allocs committed by lower-level code. If this is what you want, better to Detour the low-level alloc APIs using Microsoft Detours.
Also sanity checks such as: verify your run-time libraries match (release vs. debug, multi-threaded vs. single-threaded, dll vs. static lib), look for bad deletes (eg, delete where delete [] should have been used), make sure you're not mixing and matching your allocs.
Also try selectively turning off threads and see when/if the problem goes away.
What does the call stack etc look like at the time of the first exception?
I have the same problems in my work (we also use VC6 sometimes). And there is no easy solution for it. I have only some hints:
Try with automatic crash dumps on production machine (see Process Dumper). My experience says Dr. Watson is not perfect for dumping.
Remove all catch(...) from your code. They often hide serious memory exceptions.
Check Advanced Windows Debugging - there are lots of great tips for problems like yours. I recommend it with all my heart.
If you use the STL, try STLPort and checked builds. Invalid iterators are hell.
Good luck. Problems like yours take us months to solve. Be ready for this...
We've had pretty good luck by writing our own malloc and free functions. In production, they just call the standard malloc and free, but in debug, they can do whatever you want. We also have a simple base class that does nothing but override the new and delete operators to use these functions, then any class you write can simply inherit from that class. If you have a ton of code, it may be a big job to replace calls to malloc and free to the new malloc and free (don't forget realloc!), but in the long run it's very helpful.
In Steve Maguire's book Writing Solid Code (highly recommended), there are examples of debug stuff that you can do in these routines, like:
Keep track of allocations to find leaks
Allocate more memory than necessary and put markers at the beginning and end of memory -- during the free routine, you can ensure these markers are still there
memset the memory with a marker on allocation (to find usage of uninitialized memory) and on free (to find usage of free'd memory)
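A bare-bones sketch of the marker idea (the names, fill bytes and magic values are arbitrary; a real version would record the size in a block header instead of making the caller pass it back to the free routine, and would also track file/line and a list of live blocks):

#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

static const std::uint32_t kGuard = 0xDEADBEEF;

// Layout: [guard][requested bytes][guard]; the pointer handed out points
// just past the leading guard.
void *debug_malloc(std::size_t size) {
    unsigned char *raw =
        static_cast<unsigned char *>(std::malloc(size + 2 * sizeof(kGuard)));
    if (!raw) return NULL;
    std::memcpy(raw, &kGuard, sizeof(kGuard));
    std::memcpy(raw + sizeof(kGuard) + size, &kGuard, sizeof(kGuard));
    std::memset(raw + sizeof(kGuard), 0xCD, size);      // poison "uninitialized" memory
    return raw + sizeof(kGuard);
}

void debug_free(void *p, std::size_t size) {
    if (!p) return;
    unsigned char *raw = static_cast<unsigned char *>(p) - sizeof(kGuard);
    std::uint32_t head, tail;
    std::memcpy(&head, raw, sizeof(head));
    std::memcpy(&tail, raw + sizeof(kGuard) + size, sizeof(tail));
    assert(head == kGuard && tail == kGuard);           // fires if the block was over- or under-run
    std::memset(raw, 0xDD, size + 2 * sizeof(kGuard));  // poison freed memory
    std::free(raw);
}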
Another good idea is to never use things like strcpy, strcat, or sprintf -- always use strncpy, strncat, and snprintf. We've written our own versions of these as well, to make sure we don't write off the end of a buffer, and these have caught lots of problems too.
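One possible shape for such a wrapper (the name and the truncation policy are illustrative, not from the book):

#include <cassert>
#include <cstddef>
#include <cstring>

// Copy src into dst (capacity dst_size), always null-terminating; asserts in
// debug builds if the source would not have fit, truncates in release builds.
std::size_t safe_strcpy(char *dst, std::size_t dst_size, const char *src) {
    const std::size_t src_len = std::strlen(src);
    assert(dst_size > 0 && src_len < dst_size);
    if (dst_size == 0) return src_len;
    const std::size_t copy_len = (src_len < dst_size - 1) ? src_len : dst_size - 1;
    std::memcpy(dst, src, copy_len);
    dst[copy_len] = '\0';
    return src_len;   // like snprintf: a return >= dst_size means truncation
}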
Run the original application with ADplus -crash -pn appname.exe
When the memory issue pops-up you will get a nice big dump.
You can analyze the dump to figure what memory location was corrupted.
If you are lucky, the overwritten memory is a unique string and you can figure out where it came from. If you are not lucky, you will need to dig into the Win32 heap and figure out what the original memory characteristics were. (!heap -x might help.)
After you know what was messed up, you can narrow Application Verifier usage with special heap settings, i.e. you can specify which DLL you monitor, or which allocation size to monitor.
Hopefully this will speed up the monitoring enough to catch the culprit.
In my experience, I never needed full heap verifier mode, but I spent a lot of time analyzing the crash dump(s) and browsing sources.
P.S:
You can use DebugDiag to analyze the dumps.
It can point out the DLL owning the corrupted heap, and give you other useful details.
You should attack this problem with both runtime and static analysis.
For static analysis, consider compiling with PREfast (cl.exe /analyze). It detects mismatched delete and delete[], buffer overruns and a host of other problems. Be prepared, though, to wade through many kilobytes of L6 warnings, especially if your project still has L4 warnings not fixed.
PREfast is available with Visual Studio Team System and, apparently, as part of Windows SDK.
Is this in low memory conditions? If so it might be that new is returning NULL rather than throwing std::bad_alloc. Older VC++ compilers didn't properly implement this. There is an article about Legacy memory allocation failures crashing STL apps built with VC6.
The apparent randomness of the memory corruption sounds very much like a thread synchronization issue - a bug that reproduces depending on machine speed. If objects (chunks of memory) are shared among threads and synchronization primitives (critical section, mutex, semaphore, etc.) are not used on a per-object basis, then it is possible to come to a situation where a chunk of memory is deleted/freed while in use, or used after being deleted/freed.
As a test for that, you could add synchronization primitives to each class and method. This will make your code slower because many objects will have to wait for each other, but if this eliminates the heap corruption, your heap-corruption problem will become a code optimization one.
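As a rough Win32-flavoured illustration of "add synchronization primitives to each class and method" (the class and its contents are made up; every method takes the object's own lock, and if the corruption disappears afterwards, the race is somewhere among those objects):

#include <windows.h>
#include <cstddef>
#include <vector>

class SharedBuffer {  // hypothetical object shared between threads
public:
    SharedBuffer()  { InitializeCriticalSection(&m_lock); }
    ~SharedBuffer() { DeleteCriticalSection(&m_lock); }

    void Append(const char *data, std::size_t len) {
        EnterCriticalSection(&m_lock);   // serialize every touch of shared state
        m_bytes.insert(m_bytes.end(), data, data + len);
        LeaveCriticalSection(&m_lock);
    }

    std::size_t Size() {
        EnterCriticalSection(&m_lock);
        std::size_t n = m_bytes.size();
        LeaveCriticalSection(&m_lock);
        return n;
    }

private:
    CRITICAL_SECTION m_lock;
    std::vector<char> m_bytes;
};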
You tried old builds, but is there a reason you can't keep going further back in the repository history and seeing exactly when the bug was introduced?
Otherwise, I would suggest adding simple logging of some kind to help track down the problem, though I am at a loss of what specifically you might want to log.
If you can find out what exactly CAN cause this problem, via google and documentation of the exceptions you are getting, maybe that will give further insight on what to look for in the code.
My first action would be as follows:
Build the binaries in the "Release" version but create a debug info file (you will find this possibility in the project settings).
Use Dr Watson as the default debugger (DrWtsn32 -I) on a machine on which you want to reproduce the problem.
Reproduce the problem. Dr Watson will produce a dump that might be helpful in further analysis.
Another try might be using WinDebug as a debugging tool which is quite powerful being at the same time also lightweight.
Maybe these tools will allow you at least to narrow the problem to certain component.
And are you sure that all the components of the project have correct runtime library settings (C/C++ tab, Code Generation category in VS 6.0 project settings)?
So from the limited information you have, this can be a combination of one or more things:
Bad heap usage, i.e., double frees, read after free, write after free, setting the HEAP_NO_SERIALIZE flag with allocs and frees from multiple threads on the same heap
Out of memory
Bad code (i.e., buffer overflows, buffer underflows, etc.)
"Timing" issues
If it's at all the first two but not the last, you should have caught it by now with pageheap.exe.
Which most likely means it is due to how the code is accessing shared memory. Unfortunately, tracking that down is going to be rather painful. Unsynchronized access to shared memory often manifests as weird "timing" issues. Things like not using acquire/release semantics for synchronizing access to shared memory with a flag, not using locks appropriately, etc.
At the very least, it would help to be able to track allocations somehow, as was suggested earlier. At least then you can view what actually happened up until the heap corruption and attempt to diagnose from that.
Also, if you can easily redirect allocations to multiple heaps, you might want to try that to see if it either fixes the problem or results in more reproducible buggy behavior.
When you were testing with VS2008, did you run with HeapVerifier with Conserve Memory set to Yes? That might reduce the performance impact of the heap allocator. (Plus, you have to run it with Debug->Start with Application Verifier, but you may already know that.)
You can also try debugging with Windbg and various uses of the !heap command.
MSN
Graeme's suggestion of custom malloc/free is a good idea. See if you can characterize some pattern about the corruption to give you a handle to leverage.
For example, if it is always in a block of the same size (say 64 bytes) then change your malloc/free pair to always allocate 64-byte chunks in their own page. When you free a 64-byte chunk, set the memory protection bits on that page to prevent reads and writes (using VirtualProtect). Then anyone attempting to access this memory will generate an exception rather than corrupting the heap.
This does assume that the number of outstanding 64 byte chunks is only moderate or you have a lot of memory to burn in the box!
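A sketch of that guard-page trick (the helper names are made up; every allocation gets its own page or pages, which is exactly the "memory to burn" trade-off mentioned above):

#include <windows.h>

// Hand out whole pages per allocation so each block sits in its own page.
void *guarded_alloc(SIZE_T size) {
    return VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}

// On "free", make the pages inaccessible instead of returning them to a heap;
// any later read or write through a stale pointer faults at the guilty line.
void guarded_free(void *p, SIZE_T size) {
    DWORD old_protect = 0;
    VirtualProtect(p, size, PAGE_NOACCESS, &old_protect);
    // Deliberately never VirtualFree: address reuse would hide the culprit.
}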
If you choose to rewrite new/delete, I have done this and have simple source code at:
http://gandolf.homelinux.org/~smhanov/blog/?id=10
This catches memory leaks and also inserts guard data before and after the memory block to capture heap corruption. You can just integrate with it by putting #include "debug.h" at the top of every CPP file, and defining DEBUG and DEBUG_MEM.
I had little time when I had to solve a similar problem.
If the problem still exists, I suggest you do this:
Monitor all calls to new/delete and malloc/calloc/realloc/free.
I made a single DLL exporting a function to register all calls. This function receives parameters identifying your source location, a pointer to the allocated area, and the type of call, and saves this information in a table.
Each matched allocate/free pair is eliminated from the table. At the end, or whenever you need it, you call another function to create a report of the remaining entries.
With this you can identify mismatched calls (new/free or malloc/delete) or missing frees.
If a buffer is overwritten somewhere in your code, the saved information can itself be corrupted, but each test run may detect or narrow down a failure. Many runs help identify the errors.
Good luck.
Do you think this is a race condition? Are multiple threads sharing one heap? Can you give each thread a private heap with HeapCreate, then they can run fast with HEAP_NO_SERIALIZE. Otherwise, a heap should be thread safe, if you're using the multi-threaded version of the system libraries.
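A minimal sketch of the private-heap idea (the thread function is illustrative; since no other thread ever allocates from this heap, HEAP_NO_SERIALIZE is safe and skips the per-heap lock):

#include <windows.h>
#include <cstring>

DWORD WINAPI Worker(LPVOID)
{
    // Growable heap private to this thread; 0 initial/maximum size = defaults.
    HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
    if (!heap) return 1;

    char *buf = static_cast<char *>(HeapAlloc(heap, 0, 4096));
    if (buf) {
        std::memset(buf, 0, 4096);   // ... this thread's work only ...
        HeapFree(heap, 0, buf);
    }

    HeapDestroy(heap);               // also releases anything left allocated
    return 0;
}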
A couple of suggestions. You mention the copious warnings at W4 - I would suggest taking the time to fix your code so it compiles cleanly at warning level 4 - this will go a long way towards preventing subtle, hard-to-find bugs.
Second - for the /analyze switch - it does indeed generate copious warnings. To use this switch in my own project, what I did was to create a new header file that used #pragma warning to turn off all the additional warnings generated by /analyze. Then, further down in the file, I turn on only those warnings I care about. Then use the /FI compiler switch to force this header file to be included first in all your compilation units. This should allow you to use the /analyze switch while controlling the output.