C++: Any advice on tracking down access violations?

I've been having issues trying to track down an Access Violation in my program. It occurs when the destructor is called for the third time, exactly when the destructor appears to finish.
I've spent hours trying to track this down so I'm looking for further advice on things I can do. I'm creating the class instance with new and delete operators. The Visual Studio output window shows:
First-chance exception at 0x60e3ad84 (msvcp100d.dll) in WebCollationAgent.exe: 0xC0000005: Access violation writing location 0xabababab.
Unhandled exception at 0x60e3ad84 (msvcp100d.dll) in WebCollationAgent.exe: 0xC0000005: Access violation writing location 0xabababab.
Is there anything I can do to try and find out what was in those memory locations?
The call stack window shows the following (I've reversed it into chronological order, earliest to latest):
Program.exe!Network::`scalar deleting destructor'() + 0x2b bytes C++
Program.exe!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::~basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >() Line 754 + 0xf bytes C++
Program.exe!std::_String_val<wchar_t,std::allocator<wchar_t> >::~_String_val<wchar_t,std::allocator<wchar_t> >() Line 478 + 0xb bytes C++
msvcp100d.dll!std::_Container_base12::_Orphan_all() Line 214 + 0x5 bytes C++
My best guess from this information is that some string variable is causing the issue. Does anyone have any advice on interpreting it?
Any other pieces of advice would also be useful, thanks in advance.
I'm coding under Windows 7 and using Visual Studio 2010 Professional.

I've had success tracking memory bugs before using BoundsChecker (now part of Borland DevPartner). There are a number of similar products that might also help: HeapAgent and Rational Purify. These seem to be much like Valgrind, but work on Windows.
Here are 3 open source alternatives that might assist:
DUMA (by the looks of it, you'll have to build it for Windows yourself, but the README contains some notes on doing that)
XMEM
elephant
I have no idea how these perform, but they sound very promising, and look like they all work on Windows in one way or another.
This Microsoft page on managing memory errors might also help; it also links to instructions for setting memory breakpoints, which might help you find out when your data is first deleted or altered. Good luck!

Use the Microsoft debug heap features and hope yours is one of the cases for which they were designed. Purify would be the next step after that.
It's built into Visual Studio and is helpful in some cases. If it works it sure beats giving IBM two or three pockets full of cash for Purify.
You can find the info here
TL;DR
In main, do this:
#include <crtdbg.h>
int main()
{
    int tmpFlag = _CrtSetDbgFlag( _CRTDBG_REPORT_FLAG );
    // Turn On (OR) - Keep freed memory blocks in the
    // heap's linked list and mark them as freed
    tmpFlag |= _CRTDBG_DELAY_FREE_MEM_DF;
    // Turn on memory checking at each heap operation
    tmpFlag |= _CRTDBG_CHECK_ALWAYS_DF;
    // Set the new state for the flag
    _CrtSetDbgFlag( tmpFlag );
    // ... rest of the program (these flags only take effect in debug builds) ...
    return 0;
}
You can toggle the _CRTDBG_CHECK_ALWAYS_DF flag at different places if it runs too slowly. However, I'd run it a few times with every heap operation checked to get a feel for where the problem occurs.
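If checking every heap operation is too slow, a cheaper option is to validate the heap only at chosen points. The sketch below wraps the CRT's `_CrtCheckMemory` in a hypothetical `heap_checkpoint` helper; the name and the message format are assumptions, and outside an MSVC debug build it compiles to a no-op so the same source builds everywhere.

```cpp
#include <cstdio>
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

// Validates every block on the CRT debug heap. Prints the checkpoint label and
// returns false if corruption is found; returns true otherwise.
bool heap_checkpoint(const char* label)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    if (!_CrtCheckMemory()) {
        std::fprintf(stderr, "heap corruption detected at checkpoint: %s\n", label);
        return false;
    }
#else
    (void)label;   // no debug heap in this build: nothing to check
#endif
    return true;
}
```

Bracketing each `delete` in a suspect destructor with two checkpoints tells you whether the heap was already damaged before that destructor ran.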

I wrote a blog post with some tips:
http://www.atalasoft.com/cs/blogs/loufranco/archive/2007/02/06/6-_2200_Pointers_2200_-on-Debugging-Unmanaged-Code.aspx
The main thing you need to do is to get the crash or exception to happen while the code with the bug is still on the stack. Often you get the access violation some time after the buggy code has executed and returned, and that may be a long time later (in computer time). In that case, it's almost impossible to figure out.
In your case, with the problem flagged in delete, it's a strong indicator that the heap is corrupt, and two common reasons in C++ are double-delete and mixing up array deletion (using delete when you should use delete[] or the other way around).
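Both mistakes can be made harder to commit. In this sketch the helper names are hypothetical: the templates pair each allocation form with its matching delete and then null the pointer (deleting a null pointer is a no-op), and `std::unique_ptr<int[]>` shows the standard-library alternative that always calls `delete[]`. Note that nulling only protects that one pointer variable; copies of the pointer held elsewhere still dangle.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helpers: delete with the matching form, then null the pointer so an
// accidental second call through the same variable deletes nullptr, which is a no-op.
template <typename T>
void delete_and_null(T*& p)
{
    delete p;       // for objects created with `new T`
    p = nullptr;
}

template <typename T>
void delete_array_and_null(T*& p)
{
    delete[] p;     // for arrays created with `new T[n]`
    p = nullptr;
}

// Standard alternative: unique_ptr<T[]> releases with delete[], exactly once.
std::unique_ptr<int[]> make_table(std::size_t n)
{
    return std::unique_ptr<int[]>(new int[n]());   // () value-initializes the ints to 0
}
```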
If you can reproduce it with simple code, I would look for the two problems above. Otherwise, download the Debugging Tools for Windows and use gflags /i program.exe +hpa to enable full page heap, which makes the heap much more sensitive to corruption (it will report the error much sooner).

Related

Out of virtual memory address space (Borland C++ Builder 6 program)

I have a problem with an application written in C++ Builder 6. After some time running (a week, a month), the application crashes and closes without any error message. In my application log, shortly before the crash, I get many "Out of memory" exceptions.
I looked at the process while it was throwing out-of-memory exceptions (screenshot below), and it has lots of reserved but uncommitted private memory. What could be the reason for such behavior?
I had this problem once, a couple of years ago. The reason was that the "use dynamic libraries" option was unchecked in the linker options. When I checked it, the problem disappeared, and vice versa. The test application I made just called "new char[1000000]" and then delete. The memory was freed every time (no rise in committed memory in Windows Task Manager), but after some time I got an out-of-memory error, and VMMap showed exactly the same thing: lots of reserved private memory, most of it uncommitted.
Now the problem has returned, but I can't fix it the same way. I don't know if this was the reason, but I had Builder 6 and 2010 installed on the same machine. Now I have only Builder 6, and I cannot seem to reproduce the error with a test application like before. Either way, it seems that there is some memory manager error or something similar. CodeGuard doesn't show any memory leaks. When I create a memory block with "new", it instantly shows up in "memory commit size", and when I delete it the memory usage decreases, so I assume memory leaks are not the cause; Task Manager doesn't show a large "memory commit size".
Is there anything I can do? Is there any way I can release uncommitted memory? How to diagnose the problem any further?
The screenshot:
http://i.stack.imgur.com/UKuTZ.jpg
I found a way to simulate this problem and solution.
for (int i = 0; i < 100; i++)
{
    char* b = new char[100000000];   // ~100 MB block
    new char;                        // deliberately leaked 1-byte allocation
    delete[] b;                      // array form, to match new[]
}
The Borland memory manager reserves blocks of memory whose size is a multiple of one page (4 kB). When the allocated size is not a multiple of 4 kB, some free space is left over, which Borland may use to allocate another memory chunk. When the first chunk is deallocated, the second one still keeps the whole memory block reserved.
At first glance, the code should cause just a 100-byte memory leak, but in fact it causes a memory allocation exception in fewer than 16 iterations.
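The arithmetic behind that explanation can be checked directly. This is a simplified model assuming the 4 kB page size stated above; it is not Borland's allocator code, and `reserved_bytes`/`slack_bytes` are hypothetical names. A 100,000,000-byte request rounds up to 24,415 pages, leaving 3,840 unused bytes in the last page, more than enough room for the 1-byte `new char` that then keeps the whole block reserved.

```cpp
#include <cstddef>

// Page size stated in the answer above (an assumption of this model).
constexpr std::size_t kPageSize = 4096;

// Bytes reserved when a request is rounded up to whole pages.
constexpr std::size_t reserved_bytes(std::size_t request)
{
    return (request + kPageSize - 1) / kPageSize * kPageSize;
}

// Unused tail of the last page: room a later small allocation can occupy,
// which keeps the whole block reserved after the big chunk is freed.
constexpr std::size_t slack_bytes(std::size_t request)
{
    return reserved_bytes(request) - request;
}

static_assert(reserved_bytes(100000000) == 24415 * kPageSize, "rounds up to 24,415 pages");
static_assert(slack_bytes(100000000) == 3840, "3,840 spare bytes in the last page");
```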
I have found two solutions for this problem. One is FastMM; it works, but it brings some problems of its own.
The second solution is to replace borlndmm.dll with the one from Embarcadero RAD Studio 2010. I haven't tested it thoroughly yet, but it seems to work without any problem.
I should move the whole project to RAD 2010, but for various reasons I'm stuck with Borland 6.
Prologue
Hmm, interesting behaviour... I have to add some things I learned the hard way. I dismissed BCB6 right after trying it a few times, because it had too many bugs for my taste (in comparison with BCB5, especially in AnsiString handling). So I stayed with BCB5 for a long time without any problems. I used it even for very big projects like CAD/CAM.
A few years later I had to move to BDS2006 because of my employer, and the problems started (some may be similar to yours). Aside from the minor IDE and trace/breakpoint/CodeGuard bugs, there are more important things like:
memory manager
delete/delete[] corrupts the memory manager if called twice for the same pointer, without throwing any exception to notify you ...
a compiler bug producing wrong default constructors/destructors for structs was the biggest problem I had (in combination with delete)
wrong or missing member functions in classes can cause multiple delete calls!!! due to a bug either in the compiler or in the C++ engine,
but I was lucky and solved it here: bds 2006 C hidden memory manager conflicts (class new / delete[] vs. AnsiString)
wrong compile
Sometimes the app is compiled wrongly: no error is thrown, but some lines of code are missing from the exe and/or are in a different order than in the source code. I saw this occasionally in BCB 5 and 6 as well. To solve that:
delete all temporary files (~, .obj, .tds, .map, .exe, ...)
close the IDE and open it again just to be sure (sometimes viewing local variables, mostly big arrays, corrupts IDE memory)
compile again
beware: breakpoints/tracing/CodeGuard behave differently than the raw app
Especially with multithreading, the app behaves differently when traced than when not. CodeGuard also makes a big difference (and I do not mean the slowdown of execution, which breaks sensitive timing). For example, CodeGuard has a nasty habit of sometimes throwing out-of-memory exceptions for no reason, so some parts of the code must be run over and over until they get through, even when memory usage is unchanged and far from running out.
AnsiString operators
There are two types of AnsiString in VCL: normal ones and component properties. It is wise to take that into account, because operators behave differently for component-property AnsiStrings. Try, for example, something like
Edit1->Text+="xxx";
There are also still AnsiString operator bugs like this:
AnsiString version="aaa"+AnsiString("aaa")+"aaa"; // codeguard: array access violation
Import older BCB projects
Avoid direct import if possible; it often creates unexplained allocation and memory-leak errors. I am not sure why, but I suspect that imported window classes are handled differently and that the memory leaks are related to the first bullet (memory manager). A better way is to create a new app and create/copy the components and code manually. I know it is backward, but it is the only safe way to avoid the problems. I still don't know where the problem is, but simply replacing the *.bdsproj will not help!!! And I have not seen anything suspicious in the *.dfm.
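The "missing member functions can cause multiple delete calls" item above is also the classic symptom of a class that owns a raw pointer but relies on the compiler-generated copy operations: two objects end up owning one buffer, and the second destructor deletes it again. This is one common way to get a destructor that crashes on a later run. The sketch uses a hypothetical `Buffer` class to show the rule of three (a class with a user-written destructor usually needs its own copy constructor and copy assignment too); in real code, a `std::string` or `std::vector` member avoids writing any of the three.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical class owning a raw buffer. With only the destructor, the default
// copy constructor would copy the pointer and cause a double delete[].
class Buffer {
public:
    explicit Buffer(const char* text)
        : size_(std::strlen(text) + 1), data_(new char[size_])
    {
        std::memcpy(data_, text, size_);
    }

    ~Buffer() { delete[] data_; }        // delete[] because data_ came from new[]

    // Rule of three, part 2: every copy gets its own buffer.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new char[other.size_])
    {
        std::memcpy(data_, other.data_, size_);
    }

    // Rule of three, part 3: copy-and-swap keeps self-assignment and exceptions safe.
    Buffer& operator=(Buffer other)
    {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;                    // `other` now frees the old buffer
    }

    const char* c_str() const { return data_; }

private:
    std::size_t size_;                   // declared before data_, so initialized first
    char* data_;
};
```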

Why do certain things never crash with the debugger on?

My application uses GLUTesselator to tessellate complex concave polygons. It randomly crashes when I run the plain release exe, but it never crashes if I start debugging in VS. I found this explanation, which describes basically my problem:
The multi-thread debug CRT (/MTd) masks the problem, because, like Windows does with processes spawned by a debugger, it provides to your program a debug heap, that is initialized to the 0xCD pattern. Probably somewhere you use some uninitialized area of memory from the heap as a pointer and you dereference it; with the two debug heaps you get away with it for some reason (maybe because at address 0xbaadf00d and 0xcdcdcdcd there's valid allocated memory), but with the "normal" heap (which is often initialized to 0) you get an access violation, because you dereference a NULL pointer.
The problem is that the crash occurs in GLU32.dll, and I have no way to find out why it's sometimes trying to dereference a null pointer. It seems to do this when my polygons get fairly large and have lots of points. What can I do?
Thanks
It's a fact of life that sometimes programs behave differently in the debugger. In your case, some memory is initialized differently, and it's probably laid out differently as well. Another common case in concurrent programs is that the timing is different, and race conditions often happen less often in a debugger.
You could try to manually initialize the heap to a different value (or see if there is an option for this in Visual Studio). Usually initializing to nonzero catches more bugs, but that may not be the case in your situation. You could also try to play with your program's memory mapping to arrange that the page 0xcdcdc000 is unmapped.
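One way to do that without IDE support is to replace the global allocation functions so that every new block is filled with a known byte. This is a minimal sketch assuming a malloc-based replacement is acceptable for a diagnostic build; `kFillByte` is a hypothetical setting. Run once with 0x00 and once with a nonzero value such as 0xCD: if the crash comes and goes with the fill value, the code is reading heap memory it never initialized.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical diagnostic setting: build once with 0x00 and once with a nonzero byte.
constexpr unsigned char kFillByte = 0xCD;

// Replacement global operator new: every block starts out filled with kFillByte,
// so code that reads memory it never initialized sees the same bytes on every run.
void* operator new(std::size_t size)
{
    if (size == 0)
        size = 1;                        // must still return a unique, non-null pointer
    void* p = std::malloc(size);
    if (p == nullptr)
        throw std::bad_alloc();
    std::memset(p, kFillByte, size);
    return p;
}

// The matching deallocation functions are replaced too, so blocks go back to free().
void operator delete(void* p) noexcept
{
    std::free(p);
}

void operator delete(void* p, std::size_t) noexcept   // sized form (C++14)
{
    std::free(p);
}

// new[] and delete[] are not replaced: their default versions forward to the functions above.
```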
Visual Studio can set a data breakpoint that fires when a particular memory address is written; you could try this (it may slow your program significantly more than a variable breakpoint).
but it never crashes if I do start debugging in VS.
Well, I'm not sure exactly why, but while debugging in Visual Studio a program can sometimes get away with accessing memory regions that would crash it without the debugger. I don't know the exact reasons, and sometimes 0xcdcdcdcd and 0xbaadf00d have nothing to do with it; accessing certain addresses just doesn't cause problems. When this happens, you'll need to find alternative ways of guessing the problem.
What can I do?
Possible solutions:
Install an exception handler in your program (_set_se_translator, if I remember correctly). On an access violation, try MiniDumpWriteDump. Debug the dump later using Visual Studio (AFAIK, crash dump debugging is not available in the Express edition) or WinDbg.
Use just-in-time debuggers. Non-Express editions of Visual Studio have this feature. There are probably alternatives.
Write a custom memory manager (one that overrides new/delete and provides malloc/free alternatives, if you use them) that grabs a large chunk of memory and locks all unused memory with VirtualProtect. Then every invalid access will cause a crash, even in debug mode. You'll need a lot of memory for such a memory manager, because each block must be page-aligned to be locked.
Add extensive logging to all suspicious function calls. Dump a lot of text/debug information into a file (or stderr): parameter values, arrays, everything you suspect could be related to the crash. Flush after every write to the file, otherwise some information will be lost during the crash. This way you'll be able to guess what happened before the program crashed.
Try debugging the release build. You should be able to do it to some extent if you enable "debug information" for the release build in project settings.
Try switching "Basic Runtime Checks" and "Buffer Security Check" on and off in project properties (Configuration Properties -> C/C++ -> Code Generation).
Try to find some kind of external tool, something like Valgrind or BoundsChecker. In my experience, though, the custom memory manager (#3) is more reliable than that approach, although that really depends on the problem.
A link to an earlier question and two thoughts.
First off you may want to look at a previous question about valgrind substitutes for windows. Lots of good hints on programs that will help you.
Now the thoughts:
1) The debugger may stop your program from crashing in the code you're testing, but it's not fixing the problem. At worst you're just kicking the can down the street, there's still corruption but it's not evident from the way you're running. When you ship you can be assured someone will run into the problem again.
2) What often happens in cases like this is that the error isn't near where the problem occurs. While you may be noticing the problem in GLU32.dll, there was probably corruption earlier, maybe even in a different thread or function, which didn't cause a problem and at some later point the program came back to the corrupted region and failed.

Software to track several memory errors in old project?

I have been programming a game for two years.
Sometimes memory errors (i.e., a function returning junk instead of what it was supposed to return, or a crash that only happens on Linux and never under GDB or on Windows) happen seemingly at random. I try to fix them, and some months later the same errors return to haunt me.
Is there software (not Valgrind; I already tried it and it does not find the errors) that can help me with this problem? Or a method of solving these errors? I want to fix them permanently.
On Windows, you can automatically capture a crashing exception in a production environment and analyze it as if the error occurred on your developer PC under the debugger. This is done using a "mini-dump" file. You basically use the Windows "dbghelp.dll" DLL to generate a copy of the thread stacks, parts or all of the heap, the register values, the loaded modules, and the unhandled exception that resulted in the crash. You can launch this ".dmp" file in the MS Visual Studio debugger as if it were an executable and it will show you exactly where the crash occurred.
You can set up a trap for unhandled exceptions and delegate the creation of the mini-dump file to dbghelp.dll in that trap. You need to keep the ".pdb" files that were generated with the deployed binaries to match up memory addresses with source code locations for a better debugging experience. This topic is too deep to cover fully here; see Microsoft's documentation on this DLL.
You do need to be able to copy the .dmp file from the PC where it crashed to your development environment to fully debug it. If you have a hands-off relationship with your users, you'll need the option of having a separate utility app "phone home" over the internet to transfer the .dmp file to a location where you can access it. You can launch the app from the unhandled exception trap after the .dmp file has been generated. For user privacy, you should give the user the option of whether or not to do this.
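A minimal sketch of such a trap, assuming a Windows build linked against dbghelp.lib; the `install_crash_dump_handler` name and the `crash.dmp` file name are hypothetical. `MiniDumpNormal` records thread stacks and registers; pass `MiniDumpWithFullMemory` instead if you also need the heap. Writing the dump from inside the crashing process is the simplest approach but not the most robust one, because the process may already be badly damaged. On other platforms the function just returns false.

```cpp
#if defined(_WIN32)
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")   // MSVC: link against dbghelp

// Called by Windows for any exception nobody handled. Writes crash.dmp into the
// current working directory (hypothetical location; choose your own path).
static LONG WINAPI write_crash_dump(EXCEPTION_POINTERS* info)
{
    HANDLE file = CreateFileW(L"crash.dmp", GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION exception_info;
        exception_info.ThreadId = GetCurrentThreadId();
        exception_info.ExceptionPointers = info;
        exception_info.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &exception_info, nullptr, nullptr);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;   // let the process terminate after the dump
}
#endif

// Call once at the top of main(). Returns false where dbghelp is not available.
bool install_crash_dump_handler()
{
#if defined(_WIN32)
    SetUnhandledExceptionFilter(write_crash_dump);
    return true;
#else
    return false;
#endif
}
```

Keep the matching .pdb files for each shipped build; without them the dump opens but shows only raw addresses.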
The Totalview debugger (commercial software) may catch the crash.
Purify (commercial software) can help you find memory leaks.
Does your code compile free of compiler warnings? Did you run lint?
One thing you could try is using the Hans Boehm GC with your project. It can be used as a leak detector, allowing you to remove suspicious-looking free() or delete statements and easily see whether they cause memory leaks.
AFAIK, BoundsChecker on Windows does a very good job. In one of my projects, it caught some very weird errors.
To avoid this in my own projects (on Windows), I wrote my own memory allocator which simply called VirtualAlloc and VirtualFree. It allocated an extra page for each request, aligned it just to the left of the last page, and used VirtualProtect to generate an exception whenever the last page was accessed. This detected out-of-bounds accesses, even just reads, on the spot.
Disclaimer: I was by no means the first to have this idea.
For example, if pages are 4096 bytes, and new int[1] was called, the allocator would:
Allocate 8192 bytes (4 bytes are needed, which is one page, and the extra guard page brings the total to 2 pages)
Mark the last page inaccessible
Determine the address to return (the last allocated page starts at offset 4096, so 4096 - 4 = 4092)
The following code:
int main() {
int *array = new int[10];
return array[10];
}
would then generate an access violation on the spot.
It also had a (compile-time) option to detect accesses beyond the left side of the allocation (ie, array[-1]), but these kinds of errors seemed rare, so I never used the option.
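The scheme above can be sketched in a few dozen lines. This is an illustration, not the author's allocator: it uses `VirtualAlloc`/`VirtualProtect` on Windows and the POSIX equivalents `mmap`/`mprotect` elsewhere, and `guarded_alloc`/`guarded_free` are hypothetical names. Every allocation costs at least two pages, so it is only practical for hunting bugs, and the returned pointer is only as aligned as `size` allows.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

static std::size_t page_size()
{
#if defined(_WIN32)
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return info.dwPageSize;
#else
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
}

// Returns `size` bytes (size > 0) that end exactly where an inaccessible guard page
// begins, so reading or writing even one byte past the end faults on the spot.
void* guarded_alloc(std::size_t size)
{
    const std::size_t page = page_size();
    const std::size_t data_pages = (size + page - 1) / page;   // round up to whole pages
    const std::size_t total = (data_pages + 1) * page;          // plus one guard page
#if defined(_WIN32)
    char* base = static_cast<char*>(
        VirtualAlloc(nullptr, total, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (base == nullptr)
        throw std::bad_alloc();
    DWORD old_protect;
    if (!VirtualProtect(base + data_pages * page, page, PAGE_NOACCESS, &old_protect)) {
        VirtualFree(base, 0, MEM_RELEASE);
        throw std::bad_alloc();
    }
#else
    void* mem = mmap(nullptr, total, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        throw std::bad_alloc();
    char* base = static_cast<char*>(mem);
    if (mprotect(base + data_pages * page, page, PROT_NONE) != 0) {
        munmap(mem, total);
        throw std::bad_alloc();
    }
#endif
    return base + data_pages * page - size;   // e.g. 4096 - 4 = 4092 for new int[1]
}

// Releases a block from guarded_alloc; `size` must be the value passed to it.
void guarded_free(void* p, std::size_t size)
{
    const std::size_t page = page_size();
    const std::size_t data_pages = (size + page - 1) / page;
    char* base = static_cast<char*>(p) + size - data_pages * page;
#if defined(_WIN32)
    VirtualFree(base, 0, MEM_RELEASE);
#else
    munmap(base, (data_pages + 1) * page);
#endif
}
```

With an allocator like this behind `new`, the `array[10]` read in the example above faults immediately instead of returning garbage.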

Application crash says: "Access violation reading location"

My application crashes after running for around 18 hours. I am not able to find the point in the code where it actually crashes. I checked the call stack, but it does not provide any useful information. The last few calls in the call stack are greyed out, meaning I cannot see the code for that part; they all belong to MFC libraries.
However, when it crashes I get this 'Microsoft Visual Studio' pop-up, which says:
Unhandled exception at 0x7c809e8a in NIMCAsst.exe: 0xC0000005:
Access violation reading location 0x154c6000.
Could the above information be useful for understanding where it is crashing? Is there any software that could tell me which variable in the code holds a particular memory address?
If you can't catch the exception, sometimes you just have to go through your code line by line. It's very unpleasant, but I'd put money on it being your code, not MFC (it always is with my bugs). Check how you're using memory and what you're passing into the MFC functions extra carefully.
Probably the crash is caused by a buffer overflow or another type of memory corruption. This has overwritten the part of the stack holding the return address, which has made the debugger unable to reconstruct the stack trace correctly. Alternatively, you may not have correct symbols for the code that caused the crash (if the stack trace shows only a module name, this would be the case).
My first guess would be to examine the code calling the code that crashed for possible issues that might have caused it. Do you get any other exceptions or error conditions before the crash? Maybe you are ignoring an error return? Did you try using the Debug Heap? What about ADPlus? Application Verifier to turn on heap checks?
Other possibilities include running a tool like PC-lint over the code to check for obvious memory-use issues. Are you using threads? Maybe there is a race condition. The list could really go on forever.
The above information only tells you which memory was accessed illegally.
You can use exception handling to narrow down the place where the problem occurs, but then you need at least an idea in which corner to seek.
You say that you're seeing the call stack, that suggests you're using a debugger. The source code of MFC is available (but perhaps not with all vc++ editions), so in principle one can trace through it. Which VC++ version are you using?
The fact that the bug takes so long to occur suggests that it is memory corruption. Some other function writes to a location that it doesn't own. This works for a long time, but finally the function alters a pointer that MFC needs, and some time later MFC accesses the pointer and you are notified.
Sometimes the 'location' can be recognized as data, in which case you have a hint. For example, if the error said:
Access violation reading location 0x31323334
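you'd recognize the bytes 0x31 to 0x34 as the ASCII characters "1" to "4" (on little-endian x86, that value would have been loaded from memory holding the string "4321"), and this might lead you to the culprit.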
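Both checks are easy to automate. The sketch below, with hypothetical function names, first compares a faulting address against the fill patterns commonly documented for the MSVC debug CRT and the Win32 debug heap, then shows the four bytes in the order they sat in memory on a little-endian CPU. It would also flag the 0xabababab address from the first question as heap guard bytes.

```cpp
#include <cstdint>
#include <string>

// Describes well-known MSVC debug CRT / Win32 debug heap fill patterns;
// returns an empty string for anything else.
std::string known_fill_pattern(std::uint32_t value)
{
    switch (value) {
    case 0xCDCDCDCDu: return "CRT debug heap: allocated but never initialized";
    case 0xDDDDDDDDu: return "CRT debug heap: memory already freed";
    case 0xFDFDFDFDu: return "CRT debug heap: no-man's-land guard bytes around a block";
    case 0xCCCCCCCCu: return "uninitialized stack memory (/RTC1 or /RTCs builds)";
    case 0xABABABABu: return "Win32 debug heap: guard bytes after a HeapAlloc block";
    case 0xFEEEFEEEu: return "Win32 debug heap: memory already released by HeapFree";
    case 0xBAADF00Du: return "Win32 heap: allocated but never initialized";
    default:          return "";
    }
}

// Returns the four bytes of `value` in the order they occupy memory on a
// little-endian CPU (x86/x64) if all are printable ASCII, else an empty string.
std::string as_little_endian_ascii(std::uint32_t value)
{
    std::string text;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
        if (byte < 0x20 || byte > 0x7E)
            return "";
        text.push_back(static_cast<char>(byte));
    }
    return text;
}
```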
As Patrick says, it's almost definitely your code giving MFC invalid values. One guess would be you're passing in an incorrect length so the library is reading too far. But there are really a multitude of possible causes.
Is the crash clearly reproducible?
If yes, use log files! Add a number of statements that just log the source file and line number they pass. Start with a few statements at the entry point (main event handler) and the most common execution paths. After the crash, inspect the last entry in the log file. Then add new entries down the path or paths that must have been taken, and so on. Usually after a few iterations you will find the point of failure. With your long wait time, the log file might become huge and each iteration will take another 18 hours, so you may need some technique such as rotating log files. But with this technique I was able to find some comparable bugs.
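A minimal sketch of such a breadcrumb, with hypothetical `log_point`/`LOG_POINT` names. The important detail is flushing after every entry; otherwise the last lines, the ones you actually need, may still be sitting in the stream's buffer when the process dies. In the application you would pass a `std::ofstream` opened once at startup.

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream is a convenient target when checking the format

// Writes one breadcrumb line ("file(line): note") and flushes it at once, so the
// entry has reached the operating system even if the process dies on the next line.
inline void log_point(std::ostream& out, const char* file, int line, const char* note = "")
{
    out << file << '(' << line << ')';
    if (note[0] != '\0')
        out << ": " << note;
    out << '\n';
    out.flush();
}

// Hypothetical convenience macro: records the caller's own source file and line.
#define LOG_POINT(stream, note) log_point((stream), __FILE__, __LINE__, (note))
```

For example, `LOG_POINT(trace, "before MFC call");` records where execution last got to, where `trace` is that log stream.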
Some more questions:
Is your app multithreaded?
Does it use any arrays not managed by stl or comparable containers (does it use C-Strings, C/C++-Arrays etc)?
Try attaching a debugger to the process and have the debugger break on access violations.
If this isn't possible, then we use a tool called "User Mode Process Dumper" to create a memory dump of the process at the point where the access violation happened. You can download it here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=E089CA41-6A87-40C8-BF69-28AC08570B7E&displaylang=en
How it works: You configure rules on a per-process (or optionally system-wide) basis, and have the tool create either a minidump or a full dump at the point where it detects any one of a list of exceptions - one of them being an access violation. After the dump has been made the application continues as normal (and so if the access violation is unhandled, you will then see this dialog).
Note that ALL access violations in your process are captured, even those that are later handled. Also, a full dump can take a while to create, depending on the amount of memory the application is using (10-20 seconds for a process consuming 100-200 MB of private memory). For this reason it's probably not a good idea to enable it system-wide.
You should then be able to analyse the dump using tools like WinDbg (http://www.microsoft.com/whdc/devtools/debugging/default.mspx) to figure out what happened. In most cases you will find that you only need a minidump, not a full dump (however, if your application doesn't use much memory, then there aren't really many drawbacks to a full dump other than its size and the time it takes to create).
Finally, be warned that debugging access violations using WinDbg can be a fairly involved and complex process; if you can get a stack trace another way, you might want to try that first.
This could be caused by a memory leak. There are various blogs that teach how to check for memory leaks in an application. Simply observe the process's physical memory usage in Windows Task Manager; you may find that at some stage memory keeps increasing until it runs out. You can also try running the WinDbg tool to identify memory leaks in your code. I haven't used this tool myself; I'm just giving a heads-up.
This question is pretty old, and I've had the same problem,
but I've quickly solved it - it's all about threads:
First, note that the GUI can only be updated from the main thread.
My problem was that I tried to update the GUI from a worker thread (not the main thread), and I got the same error: 0xC0000005.
I solved it by posting a message, which is handled on the main thread:
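// Use an ID in the application-private range (WM_APP and above), not 0 (WM_NULL)
const UINT WM_UPDATE_GUI = WM_APP + 1;
// register function callback to a message
BEGIN_MESSAGE_MAP(CMyDlg, CDlgBase)
    ON_MESSAGE(WM_UPDATE_GUI, OnUpdateGui)
END_MESSAGE_MAP()
// For this example: a function that is not invoked on the main thread
void CMyDlg::OnTimer()
{
    // Allocate on the heap: PostMessage returns immediately, so the address of a
    // local CString would dangle by the time the main thread handles the message
    CString* str_to_GUI = new CString(_T("send me to gui"));
    // Update_GUI(*str_to_GUI); // crashed: GUI touched from a worker thread
    if (!::PostMessage(m_hWnd, WM_UPDATE_GUI, reinterpret_cast<WPARAM>(str_to_GUI), 0))
        delete str_to_GUI; // not queued, so nobody else will free it
}
// Declared in the class as: afx_msg LRESULT OnUpdateGui(WPARAM wParam, LPARAM lParam);
LRESULT CMyDlg::OnUpdateGui(WPARAM wParam, LPARAM /*lParam*/)
{
    // Runs on the main thread; takes ownership of the string posted above
    CString* str = reinterpret_cast<CString*>(wParam);
    Update_GUI(*str);
    delete str;
    return 0;
}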

Heisenbug: WinApi program crashes on some computers

Please help! I'm really at my wits' end.
My program is a little personal notes manager (google for "cintanotes").
On some computers (and of course I own none of them) it crashes with an unhandled exception just after start.
Nothing special about these computers could be said, except that they tend to have AMD CPUs.
Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.
Here is what is certain about this "Heisenbug":
1) The crash happens only in the Release version.
2) The crash goes away as soon as I remove all GDI-related stuff.
3) BoundsChecker has no complaints.
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
Any ideas would be greatly appreciated!
UPDATE: I've managed to get the app debugged on a "faulty" PC. The results:
"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."
and code breaks on
0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]
So it seems that the problem was in the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2" and was crashing on the machines that didn't support SSE2. I've set this option to "Not Set" and the bug is gone. Phew!
Thank you all very much for help!!
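To give users on such machines a clear message instead of an illegal-instruction crash, the program can test for SSE2 at startup. A sketch, with a hypothetical `cpu_supports_sse2` name and hypothetical helper macros, that reads CPUID leaf 1 (SSE2 is EDX bit 26) through the MSVC `__cpuid` intrinsic or GCC/Clang's `__get_cpuid`. Caveat: under /arch:SSE2 the compiler may emit SSE2 instructions anywhere, so the check only helps if it lives in a launcher or a source file compiled without that option.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_X86_CPUID_MSVC 1   // hypothetical helper macro
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID_GNU 1    // hypothetical helper macro
#endif

// Returns true if the CPU reports SSE2 support (CPUID leaf 1, EDX bit 26).
bool cpu_supports_sse2()
{
#if defined(HAVE_X86_CPUID_MSVC)
    int regs[4] = {0, 0, 0, 0};              // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#elif defined(HAVE_X86_CPUID_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                        // CPUID leaf 1 not available
    return (edx & (1u << 26)) != 0;
#else
    return false;                            // not x86: SSE2 does not apply
#endif
}
```

Every x64 CPU supports SSE2, so the check only matters for 32-bit builds.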
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
What is the underlying code in the executable / assembly? A declaration of an int generates no code at all, and as such cannot crash. Do you initialize the int somehow?
To see the code where the crash happened you should perform what is called a postmortem analysis.
Windows Error Reporting
If you want to analyse the crash, you should get a crash dump. One option for this is to register for Windows Error Reporting - requires some money (you need a digital code signing ID) and some form filling. For more visit https://winqual.microsoft.com/ .
Get the crash dump intended for WER directly from the customer
Another option is to get in touch with a user who is experiencing the crash and get the crash dump intended for WER from them directly. The user can do this by clicking Technical details before sending the crash to Microsoft; the crash dump file location can be checked there.
Your own minidump
Another option is to register your own exception handler, handle the exception and write a minidump anywhere you wish. Detailed description can be found at Code Project Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET article.
So it doesn't crash in the DEBUG configuration? There are many differences from a RELEASE configuration:
1.) Initialization of globals
2.) Actual machine code generated, etc.
So the first step is to find out the exact settings for each parameter in RELEASE mode compared to DEBUG mode.
-AD
1) The crash happens only in the Release version.
That's usually a sign that you're relying on some behaviour that's not guaranteed, but happens to be true in the debug build. For example, if you forget to initialize your variables, or access an array out of bounds. Make sure you've turned on all the compiler checks (/RTCsuc). Also check things like relying on the order of evaluation of function parameters (which isn't guaranteed).
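The evaluation-order trap looks like this. A sketch with hypothetical `Reader` and `Point` types: the two `next()` calls in one argument list may run in either order, so a build that happens to pick right-to-left silently swaps x and y. Reading into named locals first fixes the order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reader: each call to next() consumes the next value.
struct Reader {
    std::vector<int> values;
    std::size_t pos = 0;
    int next() { return values.at(pos++); }
};

struct Point { int x; int y; };

Point make_point(int x, int y) { return Point{x, y}; }

// FRAGILE: the order in which the two arguments are evaluated is unspecified,
// so one build may read x first and another build y first.
Point read_point_fragile(Reader& r) { return make_point(r.next(), r.next()); }

// ROBUST: separate statements are sequenced, so x is always read before y.
Point read_point(Reader& r)
{
    const int x = r.next();
    const int y = r.next();
    return make_point(x, y);
}
```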
2) The crash goes away as soon as I remove all GDI-related stuff.
Maybe that's a hint that you're doing something wrong with the GDI related stuff? Are you using HANDLEs after they've been freed, for example?
Download the Debugging Tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point, it will break with an access violation. Then run the command "!analyze -v", which is quite smart and should give you a hint about what's going wrong.
Most heisenbugs / release-only bugs are due to either flow of control that depends on reads from uninitialised memory / stale pointers / past end of buffers, or race conditions, or both.
Try overriding your allocators so they zero out memory when allocating. Does the problem go away (or become more reproducible?)
Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
Stack overflow! ;)
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
I've found the cause to numerous "strange crashes" to be dereferencing of a broken this inside a member function of said object.
What does the crash say? Access violation? Exception? That would be a further clue for solving this.
Ensure you have no preceding memory corruption using PageHeap.exe
Ensure you have no stack overflow (CBig array[1000000])
Ensure that you have no un-initialized memory.
Further, you can also run the release version inside the debugger once you generate debug symbols (not the same as creating a debug version) for the process. Step through and see if you are getting any warnings in the debugger trace window.
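The stack-overflow case is worth a concrete look, because it can make a crash appear to happen on a harmless local declaration: with MSVC, a function whose locals exceed the stack typically faults as soon as it is entered, while probing its oversized frame. A sketch with a hypothetical 64-byte `CBig`: the commented-out local array would need about 64 MB against MSVC's default 1 MB stack reserve, whereas a `std::vector` keeps the elements on the heap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the answer's CBig: 16 ints, i.e. 64 bytes per element.
struct CBig { int values[16]; };

// Overflows the stack (about 64 MB of locals against a default 1 MB reserve):
//     void process() { CBig array[1000000]; /* ... */ }
// Keeps the elements on the heap instead; only the vector object itself is on the stack.
std::vector<CBig> make_big_array(std::size_t count)
{
    return std::vector<CBig>(count);   // value-initialized, so every int starts at 0
}
```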
"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"
This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out if they've overclocked their computer.
When I get this type of thing, I try running the code through Gimpel's PC-Lint (static code analysis), as it checks for different classes of errors than BoundsChecker. If you are using BoundsChecker, turn on the memory poisoning options.
You mention AMD CPUs. Have you investigated whether there is a similar graphics card / driver version and / or configuration on the machines that crash? Does it always crash on these machines or just occasionally? Maybe run the System Information tool on these machines and see what they have in common.
Sounds like stack corruption to me. My favorite tool to track those down is IDA Pro. Of course, you don't have that kind of access to the user's machine.
Some memory checkers have a hard time catching stack corruption (if that is indeed what it is). The surest way to catch it, I think, is runtime analysis.
This can also be due to corruption in an exception path, even if the exception was handled. Do you debug with 'catch first-chance exceptions' turned on? You should as long as you can. It does get annoying after a while in many cases.
Can you send those users a checked version of your application? Check out minidumps: handle that exception and write out a dump. Then use WinDbg to debug it on your end.
Another method is writing very detailed logs. Create a "Log every single action" option, and ask the user to turn that on and send the log to you. Dump out memory to the logs. Check out '_CrtDbgReport()' on MSDN.
Good Luck!
EDIT:
Responding to your comment: An error on a local variable declaration is not surprising to me. I've seen this a lot. It's usually due to a corrupted stack.
Some variable on the stack may be running over its boundaries, for example. All hell breaks loose after that. Then stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
Any time I've seen those for a prolonged period of time, I've had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know that really catches those reliably.
Many developers use WinDbg for this kind of analysis. That's why I also suggested Minidump.
Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker doesn't.