Why do certain things never crash whith debugger on? - c++

My application uses GLUTesselator to tesselate complex concave polygons. It randomly crashes when I run the plain release exe, but it never crashes if I do start debugging in VS. I found this right here which is basically my problem:
The multi-thread debug CRT (/MTd) masks the problem, because, like
Windows does with processes spawned by
a debugger, it provides to your
program a debug heap, that is
initialized to the 0xCD pattern.
Probably somewhere you use some
uninitialized area of memory from the
heap as a pointer and you dereference
it; with the two debug heaps you get
away with it for some reason (maybe
because at address 0xbaadf00d and
0xcdcdcdcd there's valid allocated
memory), but with the "normal" heap
(which is often initialized to 0) you
get an access violation, because you
dereference a NULL pointer.
The problem is the crash occurs in GLU32.dll and I have no way to find out why its trying to dereference a null pointer sometimes. it seems to do this when my polygons get fairly large and have lots of points. What can I do?
Thanks

It's a fact of life that sometimes programs behave differently in the debugger. In your case, some memory is initialized differently, and it's probably laid out differently as well. Another common case in concurrent programs is that the timing is different, and race conditions often happen less often in a debugger.
You could try to manually initialize the heap to a different value (or see if there is an option for this in Visual Studio). Usually initializing to nonzero catches more bugs, but that may not be the case in your situation. You could also try to play with your program's memory mapping to arrange that the page 0xcdcdc000 is unmapped.
Visual Studio can set a breakpoint on accesses to a particular memory address, you could try this (it may slow your program significantly more than a variable breakpoint).

but it never crashes if I do start debugging in VS.
Well, I'm not sure exactly why but while debugging in visual studio program sometimes can get away with accessing some memory regions that would crash it without debugger. I do not know exact reasons, though, but sometimes 0xcdcdcdcd and 0xbaadfood doesn't have anything to do with that. It is just accessing certain addresses doesn't cause problems. When this happens, you'll need to find alternative methods of guessing the problem.
What can I do?
Possible solutions:
Install exception handler in your program (_set_se_translator, if I remember correctly). On access violation try MinidumpWriteDump. Debug it later using Visual Studio (afaik, crash dump debugging is n/a in express edition), or using windbg.
Use just-in-time debuggers. Non-express edition of visual studio have this feature. There are probably alternatives.
Write custom memory manager (that'll override new/delete and will provide malloc/free alternatives (if you use them)) that will grab large chunk of memory, lock all unused memory with VirtualProtect. In this case all invalid access will cause crashes even in debug mode. You'll need a lot of memory for such memory manager, because to be locked, each block should be aligned to pages.
Add excessive logging to all suspicious function calls. Dump a lot of text/debug information into file (or stderr) - parameter values, arrays, everything you suspect could be related to crash, flush after every write to file, otherwise some info will be lost during the crash. This way you'll be able to guess what happened before program crashed.
Try debugging release build. You should be able to do it to some extent if you enable "debug information" for release build in project settings.
Try switching on/off "basic runtime checks" and "buffer security check" in project properties (configuration properties->c/c++->code genration).
Try to find some kind of external tool - something like valgrind or bounds checker. Although, to my expereinece, #3 is more reliable than that approach. Although that really depends on the problem.

A link to an earlier question and two thoughts.
First off you may want to look at a previous question about valgrind substitutes for windows. Lots of good hints on programs that will help you.
Now the thoughts:
1) The debugger may stop your program from crashing in the code you're testing, but it's not fixing the problem. At worst you're just kicking the can down the street, there's still corruption but it's not evident from the way you're running. When you ship you can be assured someone will run into the problem again.
2) What often happens in cases like this is that the error isn't near where the problem occurs. While you may be noticing the problem in GLU32.dll, there was probably corruption earlier, maybe even in a different thread or function, which didn't cause a problem and at some later point the program came back to the corrupted region and failed.

Related

C++ Program freezes esoterically

I wrote a C++ CLI program with MS VC++ 2010 and GCC 4.2.1 (for Mac OS X 10.6 64 bit, in Eclipse).
The program works well under GCC+OS X and most times under Windows. But sometimes it silently freezes. The command line cursor keeps blinking, but the program refuses to continue working.
The following configurations work well:
GCC with 'Release' and 'Debug' configuration.
VC++ with 'Debug' configuration
The error only occurs in the configuration 'VC++ with 'Release' configuration' under Win 7 32 bit and 64 bit. Unfortunately this is the configuration my customer wants to work with ;-(
I already checked my program high and low and fixed all memory leaks. But this error still occurs. Do you have any ideas how I can find the error?
Use logging to narrow down which part of code the program is executing when it crashes. Keep adding log until you narrow it down enough to see the issue.
Enable debug information in the release build (both compiler and linker); many variables won't show up correctly, but it should at least give you sensible backtrace (unless the freeze is due to stack smashing or stack overflow), which is usually enough if you keep functions short and doing just one thing.
Memory leaks can't cause freezes. Other forms of memory misuse are however quite likely to. In my experience overrunning a buffer often cause freezes when that buffer is freed as the free function follows the corrupted block chains. Also watch for any other kind of Undefined Behaviour. There is a lot of it in C/C++ and it usually behaves as you expect in debug and completely randomly when optimized.
Try building and running the program under DUMA library to check for buffer overruns. Be warned though that:
It requires a lot of memory. I mean easily like thousand times more. So you can only test on simple cases.
Microsoft headers tend to abuse their internal allocation functions and mismatch e.g. regular malloc and internal __debug_free (or the other way 'round). So might get a few cases that you'll have to carefully workaround by including those system headers into the duma one before it redefines the functions.
Try building the program for Linux and run it under Valgrind. That will check more problems in addition to buffer overruns and won't use that much memory (only twice as normal, but it is slower, approximately 20 times).
Debug versions usually initialize all allocated memory (MSVC fills them with 0xCD with the debug configuration). Maybe you have some uninitialized values in your classes, with the GCC configurations and MSVC Debug configuration it gets a "lucky" value, but in MSVC Release it doesn't.
Here are the rest of the magic numbers used by MSVC.
So look for uninitialized variables, attributes and allocated memory blocks.
Thank you all, especially Cody Gray and MikMik, I found it!
As some of you recommended I told VS to generate debug information and disabled optimizations in the release configuration. Then I started the program and paused it. Alternatively I remotely attached to the running process. This helped me finding the region where the error was.
The reasons were infinite loops, caused by reads behind the boundaries of an array and a missing exclusion of an invalid case. Both led to unreachable stopping conditions at runtime. The esoteric part came from the fact, that my program uses some randomized values.
That's life...

Dwarf Error: Cannot find DIE

I am having a lot of trouble debugging a segmentation fault in a C++ project in XCode 4.
I only get a segfault when I built with the "LLVM 2.0" compiler option and use -O3 optimization. From what I understand, there are limited debugging options when one is using optimization, but here is the debug output I get after I run in Xcode with gdb turned on:
warning: Got an error handling event: "Dwarf Error: Cannot find DIE at 0x3be2 referenced from DIE at 0x11d [in module /Users/imran/Library/Developer/Xcode/DerivedData/cgo-hczcifktgscxjigfphieegbpxxsq/Build/Products/Debug/cgo]".
No memory available to program now: unsafe to call malloc
I can't get gdb to give me any useful info after that (like a trace), but I'm not sure I really know how to use it properly. When I try to use the "LLDB" debugger Xcode just crashes (which has been a common theme since I started using it).
My program is deterministic, but when I try to isolate the problem with print statements the behavior will change. For example if I add cout << "hello"; at one point the segfault goes away. Other print statements cause my program to segfault in a different iteration of its main loop. And naturally when I put in enough print statements to supposedly pinpoint the offending code, the segfault seems to occur after one line but before the next (i.e. nowhere).
I am using pointers and dynamic memory allocation, which is likely the cause of the problem, but since I can't narrow down the block of code causing the error I don't know what code to show here.
I tried profiling with the "Leaks" tool in Instruments, but it didn't find any leaks.
Any advice? I am very inexperienced with debugging so anything would help, really.
EDIT: Solved. Given certain inputs, my program would try to read past the end of an array.
I don't think there's enough information that I can help you with the DWARF issue. I am not familiar enough with that toolchain to know how robust it is.
Your crashing symptoms however smell a lot like heap corruption. I don't know what allocator OSX uses by default, but common optimizations store metadata inline with objects and/or thread the freelist through empty objects, which makes them very sensitive to buffer overflows on the heap. Freeing an object twice or using a dangling pointer (a pointer that has been freed but whose space may now be in use by another allocation) can also cause seemingly nondeterministic and hard to track errors, since the layout of the heap is likely to change between runs. Print statements also use the allocator, which means changing the print statements can change when and where the problem will appear.
A tool that you may find helpful in determining if this is a heap problem or something unrelated is a heap replacement called DieHard by my advisor (http://prisms.cs.umass.edu/emery/index.php?page=download-diehard). I believe it will build on OSX, and you can link it into your program using LD_PRELOAD=/path/to/libdiehard.so to replace the default allocator at runtime. Its sole purpose is to resist memory errors and heap corruption, so if your application actually runs with it, that's probably where you need to look.

C++: Where to start when my application crashes at random places?

I'm developing a game and when I do a specific action in the game, it crashes.
So I went debugging and I saw my application crashed at simple C++ statements like if, return, ... Each time when I re-run, it crashes randomly at one of 3 lines and it never succeeds.
line 1:
if (dynamic) { ... } // dynamic is a bool member of my class
line 2:
return m_Fixture; // a line of the Box2D physical engine. m_Fixture is a pointer.
line 3:
return m_Density; // The body of a simple getter for an integer.
I get no errors from the app nor the OS...
Are there hints, tips or tricks to debug more efficient and get known what is going on?
That's why I love Java...
Thanks
Random crashes like this are usually caused by stack corruption, since these are branching instructions and thus are sensitive to the condition of the stack. These are somewhat hard to track down, but you should run valgrind and examine the call stack on each crash to try and identify common functions that might be the root cause of the error.
Are there hints, tips or tricks to debug more efficient and get known what is going on?
Run game in debugger, on the point of crash, check values of all arguments. Either using visual studio watch window or using gdb. Using "call stack" check parent routines, try to think what could go wrong.
In suspicious(potentially related to crash) routines, consider dumping all arguments to stderr (if you're using libsdl or on *nixlike systems), or write a logfile, or send dupilcates of all error messages using (on Windows) OutputDebugString. This will make them visible in "output" window in visual studio or debugger. You can also write "traces" (log("function %s was called", __FUNCTION__))
If you can't debug immediately, produce core dumps on crash. On windows it can be done using MiniDumpWriteDump, on linux it is set somewhere in configuration variables. core dumps can be handled by debugger. I'm not sure if VS express can deal with them on Windows, but you still can debug them using WinDBG.
if crash happens within class, check *this argument. It could be invalid or zero.
If the bug is truly evil (elusive stack corruption in multithreaded app that leads to delayed crash), write custom memory manager, that will override new/delete, provide alternative to malloc(if your app for some reason uses it, which may be possible), AND that locks all unused memory memory using VirtualProtect (windows) or OS-specific alternative. In this case all potentially dangerous operation will crash app instantly, which will allow you to debug the problem (if you have Just-In-Time debugger) and instantly find dangerous routine. I prefer such "custom memory manager" to boundschecker and such - since in my experience it was more useful. As an alternative you could try to use valgrind, which is available on linux only. Note, that if your app very frequently allocates memory, you'll need a large amount of RAM in order to be able to lock every unused memory block (because in order to be locked, block should be PAGE_SIZE bytes big).
In areas where you need sanity check either use ASSERT, or (IMO better solution) write a routine that will crash the application (by throwing an std::exception with a meaningful message) if some condition isn't met.
If you've identified a problematic routine, walk through it using debugger's step into/step over. Watch the arguments.
If you've identified a problematic routine, but can't directly debug it for whatever reason, after every statement within that routine, dump all variables into stderr or logfile (fprintf or iostreams - your choice). Then analyze outputs and think how it could have happened. Make sure to flush logfile after every write, or you might miss the data right before the crash.
In general you should be happy that app crashes somewhere. Crash means a bug you can quickly find using debugger and exterminate. Bugs that don't crash the program are much more difficult (example of truly complex bug: given 100000 values of input, after few hundreds of manipulations with values, among thousands of outputs, app produces 1 absolutely incorrect result, which shouldn't have happened at all)
That's why I love Java...
Excuse me, if you can't deal with language, it is entirely your fault. If you can't handle the tool, either pick another one or improve your skill. It is possible to make game in java, by the way.
These are mostly due to stack corruption, but heap corruption can also affect programs in this way.
stack corruption occurs most of the time because of "off by one errors".
heap corruption occurs because of new/delete not being handled carefully, like double delete.
Basically what happens is that the overflow/corruption overwrites an important instruction, then much much later on, when you try to execute the instruction, it will crash.
I generally like to take a second to step back and think through the code, trying to catch any logic errors.
You might try commenting out different parts of the code and seeing if it affects how the program is compiled.
Besides those two things you could try using a debugger like Visual Studio or Eclipse etc...
Lastly you could try to post your code and the error you are getting on a website with a community that knows programming and could help you work through the error (read: stackoverflow)
Crashes / Seg faults usually happen when you access a memory location that it is not allowed to access, or you attempt to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location).
There are many memory analyzer tools, for example I use Valgrind which is really great in telling what the issue is (not only the line number, but also what's causing the crash).
There are no simple C++ statements. An if is only as simple as the condition you evaluate. A return is only as simple as the expression you return.
You should use a debugger and/or post some of the crashing code. Can't be of much use with "my app crashed" as information.
I had problems like this before. I was trying to refresh the GUI from different threads.
If the if statements involve dereferencing pointers, you're almost certainly corrupting the stack (this explains why an innocent return 0 would crash...)
This can happen, for instance, by going out of bounds in an array (you should be using std::vector!), trying to strcpy a char[]-based string missing the ending '\0' (you should be using std::string!), passing a bad size to memcpy (you should be using copy-constructors!), etc.
Try to figure out a way to reproduce it reliably, then place a watch on the corrupted pointer. Run through the code line-by-line until you find the very line that corrupts the pointer.
Look at the disassembly. Almost any C/C++ debugger will be happy to show you the machine code and the registers where the program crashed. The registers include the Instruction Pointer (EIP or RIP on x86/x64) which is where the program was when it stopped. The other registers usually have memory addresses or data. If the memory address is 0 or a bad pointer, there is your problem.
Then you just have to work backward to find out how it got that way. Hardware breakpoints on memory changes are very helpful here.
On a Linux/BSD/Mac, using GDB's scripting features can help a lot here. You can script things so that after the breakpoint is hit 20 times it enables a hardware watch on the address of array element 17. Etc.
You can also write debugging into your program. Use the assert() function. Everywhere!
Use assert to check the arguments to every function. Use assert to check the state of every object before you exit the function. In a game, assert that the player is on the map, that the player has health between 0 and 100, assert everything that you can think of. For complicated objects write verify() or validate() functions into the object itself that checks everything about it and then call those from an assert().
Another way to write in debugging is to have the program use signal() in Linux or asm int 3 in Windows to break into the debugger from the program. Then you can write temporary code into the program to check if it is on iteration 1117321 of the main loop. That can be useful if the bug always happens at 1117322. The program will execute much faster this way than to use a debugger breakpoint.
some tips :
- run your application under a debugger, with the symbol files (PDB) together.
- How to set Visual Studio as the default post-mortem debugger?
- set default debugger for WinDbg Just-in-time Debugging
- check memory allocations Overriding new and delete, and Overriding malloc and free
One other trick: turn off code optimization and see if the crash points make more sense. Optimization is allowed to float little bits of your code to surprising places; mapping that back to source code lines can be less than perfect.
Check pointers. At a guess, you're dereferencing a null pointer.
I've found 'random' crashes when there are some reference to a deleted object. As the memory is not necessarily overwritten, in many cases you don't notice it and the program works correctly, and than crashes after the memory was updated and is not valid anymore.
JUST FOR DEBUGGING PURPOSES, try commenting out some suspicious 'deletes'. Then, if it doesn't crash anymore, there you are.
use the GNU Debugger
Refactoring.
Scan all the code, make it clearer if not clear at first read, try to understand what you wrote and immediately fix what seems incorrect.
You'll certainly discover the problem(s) this way and fix a lot of other problems too.

How to debug heap corruption errors?

I am debugging a (native) multi-threaded C++ application under Visual Studio 2008. On seemingly random occasions, I get a "Windows has triggered a break point..." error with a note that this might be due to a corruption in the heap. These errors won't always crash the application right away, although it is likely to crash short after.
The big problem with these errors is that they pop up only after the corruption has actually taken place, which makes them very hard to track and debug, especially on a multi-threaded application.
What sort of things can cause these errors?
How do I debug them?
Tips, tools, methods, enlightments... are welcome.
Application Verifier combined with Debugging Tools for Windows is an amazing setup. You can get both as a part of the Windows Driver Kit or the lighter Windows SDK. (Found out about Application Verifier when researching an earlier question about a heap corruption issue.) I've used BoundsChecker and Insure++ (mentioned in other answers) in the past too, although I was surprised how much functionality was in Application Verifier.
Electric Fence (aka "efence"), dmalloc, valgrind, and so forth are all worth mentioning, but most of these are much easier to get running under *nix than Windows. Valgrind is ridiculously flexible: I've debugged large server software with many heap issues using it.
When all else fails, you can provide your own global operator new/delete and malloc/calloc/realloc overloads -- how to do so will vary a bit depending on compiler and platform -- and this will be a bit of an investment -- but it may pay off over the long run. The desirable feature list should look familiar from dmalloc and electricfence, and the surprisingly excellent book Writing Solid Code:
sentry values: allow a little more space before and after each alloc, respecting maximum alignment requirement; fill with magic numbers (helps catch buffer overflows and underflows, and the occasional "wild" pointer)
alloc fill: fill new allocations with a magic non-0 value -- Visual C++ will already do this for you in Debug builds (helps catch use of uninitialized vars)
free fill: fill in freed memory with a magic non-0 value, designed to trigger a segfault if it's dereferenced in most cases (helps catch dangling pointers)
delayed free: don't return freed memory to the heap for a while, keep it free filled but not available (helps catch more dangling pointers, catches proximate double-frees)
tracking: being able to record where an allocation was made can sometimes be useful
Note that in our local homebrew system (for an embedded target) we keep the tracking separate from most of the other stuff, because the run-time overhead is much higher.
If you're interested in more reasons to overload these allocation functions/operators, take a look at my answer to "Any reason to overload global operator new and delete?"; shameless self-promotion aside, it lists other techniques that are helpful in tracking heap corruption errors, as well as other applicable tools.
Because I keep finding my own answer here when searching for alloc/free/fence values MS uses, here's another answer that covers Microsoft dbgheap fill values.
You can detect a lot of heap corruption problems by enabling Page Heap for your application . To do this you need to use gflags.exe that comes as a part of Debugging Tools For Windows
Run Gflags.exe and in the Image file options for your executable, check "Enable Page Heap" option.
Now restart your exe and attach to a debugger. With Page Heap enabled, the application will break into debugger whenever any heap corruption occurs.
To really slow things down and perform a lot of runtime checking, try adding the following at the top of your main() or equivalent in Microsoft Visual Studio C++
_CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF | _CRTDBG_CHECK_ALWAYS_DF );
A very relevant article is Debugging Heap corruption with Application Verifier and Debugdiag.
What sort of things can cause these errors?
Doing naughty things with memory, e.g. writing after the end of a buffer, or writing to a buffer after it's been freed back to the heap.
How do I debug them?
Use an instrument which adds automated bounds-checking to your executable: i.e. valgrind on Unix, or a tool like BoundsChecker (Wikipedia suggests also Purify and Insure++) on Windows.
Beware that these will slow your application, so they may be unusable if yours is a soft-real-time application.
Another possible debugging aid/tool might be MicroQuill's HeapAgent.
One quick tip, that I got from Detecting access to freed memory is this:
If you want to locate the error
quickly, without checking every
statement that accesses the memory
block, you can set the memory pointer
to an invalid value after freeing the
block:
#ifdef _DEBUG // detect the access to freed memory
#undef free
#define free(p) _free_dbg(p, _NORMAL_BLOCK); *(int*)&p = 0x666;
#endif
The best tool I found useful and worked every time is code review (with good code reviewers).
Other than code review, I'd first try Page Heap. Page Heap takes a few seconds to set up and with luck it might pinpoint your problem.
If no luck with Page Heap, download Debugging Tools for Windows from Microsoft and learn to use the WinDbg. Sorry couldn't give you more specific help, but debuging multi-threaded heap corruption is more an art than science. Google for "WinDbg heap corruption" and you should find many articles on the subject.
What type of allocation functions are you using? I recently hit a similar error using the Heap* style allocation functions.
It turned out that I was mistakenly creating the heap with the HEAP_NO_SERIALIZE option. This essentially makes the Heap functions run without thread safety. It's a performance improvement if used properly but shouldn't ever be used if you are using HeapAlloc in a multi-threaded program [1]. I only mention this because your post mentions you have a multi-threaded app. If you are using HEAP_NO_SERIALIZE anywhere, delete that and it will likely fix your problem.
[1] There are certain situations where this is legal, but it requires you to serialize calls to Heap* and is typically not the case for multi-threaded programs.
If these errors occur randomly, there is high probability that you encountered data-races. Please, check: do you modify shared memory pointers from different threads? Intel Thread Checker may help to detect such issues in multithreaded program.
You may also want to check to see whether you're linking against the dynamic or static C runtime library. If your DLL files are linking against the static C runtime library, then the DLL files have separate heaps.
Hence, if you were to create an object in one DLL and try to free it in another DLL, you would get the same message you're seeing above. This problem is referenced in another Stack Overflow question, Freeing memory allocated in a different DLL.
In addition to looking for tools, consider looking for a likely culprit. Is there any component you're using, perhaps not written by you, which may not have been designed and tested to run in a multithreaded environment? Or simply one which you do not know has run in such an environment.
The last time it happened to me, it was a native package which had been successfully used from batch jobs for years. But it was the first time at this company that it had been used from a .NET web service (which is multithreaded). That was it - they had lied about the code being thread safe.
You can use VC CRT Heap-Check macros for _CrtSetDbgFlag: _CRTDBG_CHECK_ALWAYS_DF or _CRTDBG_CHECK_EVERY_16_DF.._CRTDBG_CHECK_EVERY_1024_DF.
I'd like to add my experience. In the last few days, I solved an instance of this error in my application. In my particular case, the errors in the code were:
Removing elements from an STL collection while iterating over it (I believe there are debug flags in Visual Studio to catch these things; I caught it during code review)
This one is more complex, I'll divide it in steps:
From a native C++ thread, call back into managed code
In managed land, call Control.Invoke and dispose a managed object which wraps the native object to which the callback belongs.
Since the object is still alive inside the native thread (it will remain blocked in the callback call until Control.Invoke ends). I should clarify that I use boost::thread, so I use a member function as the thread function.
Solution: Use Control.BeginInvoke (my GUI is made with Winforms) instead so that the native thread can end before the object is destroyed (the callback's purpose is precisely notifying that the thread ended and the object can be destroyed).
I had a similar problem - and it popped up quite randomly. Perhaps something was corrupt in the build files, but I ended up fixing it by cleaning the project first then rebuilding.
So in addition to the other responses given:
What sort of things can cause these errors?
Something corrupt in the build file.
How do I debug them?
Cleaning the project and rebuilding. If it's fixed, this was likely the problem.
I have also faced this issue. In my case, I allocated for x size memory and appended the data for x+n size. So, when freeing it shown heap overflow. Just make sure your allocated memory sufficient and check for how many bytes added in the memory.

Heisenbug: WinApi program crashes on some computers

Please help! I'm really at my wits' end.
My program is a little personal notes manager (google for "cintanotes").
On some computers (and of course I own none of them) it crashes with an unhandled exception just after start.
Nothing special about these computers could be said, except that they tend to have AMD CPUs.
Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.
Here is what is certain about this "Heisenbug":
1) The crash happens only in the Release version.
2) The crash goes away as soon as I remove all GDI-related stuff.
3) BoundChecker has no complains.
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
Any ideas would be greatly appreciated!
UPDATE: I've managed to get the app debugged on a "faulty" PC. The results:
"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."
and code breaks on
0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]
So it seems that the problem was in the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2" and was crashing on the machines that didn't support SSE2. I've set this option to "Not Set" and the bug is gone. Phew!
Thank you all very much for help!!
4) Writig a log shows that the crash happen on a declaration of a local int variable! how could that be? Memory corruption?
What is the underlying code in the executable / assembly? Declaration of int is no code at all, and as such cannot crash. Do you initialize the int somehow?
To see the code where the crash happened you should perform what is called a postmortem analysis.
Windows Error Reporting
If you want to analyse the crash, you should get a crash dump. One option for this is to register for Windows Error Reporting - requires some money (you need a digital code signing ID) and some form filling. For more visit https://winqual.microsoft.com/ .
Get the crash dump intended for WER directly from the customer
Another option is to get in touch witch some user who is experiencing the crash and get a crash dump intended for WER from him directly. The user can do this when he clicks on the Technical details before sending the crash to Microsoft - the crash dump file location can be checked there.
Your own minidump
Another option is to register your own exception handler, handle the exception and write a minidump anywhere you wish. Detailed description can be found at Code Project Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET article.
So it doesnnt crash when configuration is DEBUG Configuration? There are many things different than a RELEASE configruation:
1.) Initialization of globals
2.) Actual machine Code generated etc..
So first step is find out what are exact settings for each parameter in the RELEASE mode as compared to the DEBUG mode.
-AD
1) The crash happens only in the Release version.
That's usually a sign that you're relying on some behaviour that's not guaranteed, but happens to be true in the debug build. For example, if you forget to initialize your variables, or access an array out of bounds. Make sure you've turned on all the compiler checks (/RTCsuc). Also check things like relying on the order of evaluation of function parameters (which isn't guaranteed).
2) The crash goes away as soon as I remove all GDI-related stuff.
Maybe that's a hint that you're doing something wrong with the GDI related stuff? Are you using HANDLEs after they've been freed, for example?
Download the Debugging tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point, it will break with an Access Violation. Then you should run the command "!analyze -v", which is quite smart and should give you a hint on whats going wrong.
Most heisenbugs / release-only bugs are due to either flow of control that depends on reads from uninitialised memory / stale pointers / past end of buffers, or race conditions, or both.
Try overriding your allocators so they zero out memory when allocating. Does the problem go away (or become more reproducible?)
Writig a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
Stack overflow! ;)
4) Writig a log shows that the crash happen on a declaration of a local int variable!how could that be? Memory corruption
I've found the cause to numerous "strange crashes" to be dereferencing of a broken this inside a member function of said object.
What does the crash say ? Access violation ? Exception ? That would be the further clue to solve this with
Ensure you have no preceeding memory corruptions using PageHeap.exe
Ensure you have no stack overflow (CBig array[1000000])
Ensure that you have no un-initialized memory.
Further you can run the release version also inside the debugger, once you generate debug symbols (not the same as creating debug version) for the process. Step through and see if you are getting any warnings in the debugger trace window.
"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"
This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out if they've overclocked their computer.
When I get this type of thing, i try running the code through gimpels PC-Lint (static code analysis) as it checks different classes of errors to BoundsChecker. If you are using Boundschecker, turn on the memory poisoning options.
You mention AMD CPUs. Have you investigated whether there is a similar graphics card / driver version and / or configuration in place on the machines that crash? Does it always crash on these machines or just occasionally? Maybe run the System Information tool on these machines and see what they have in common,
Sounds like stack corruption to me. My favorite tool to track those down is IDA Pro. Of course you don't have that access to the user's machine.
Some memory checkers have a hard time catching stack corruption ( if it indeed that ). The surest way to get those I think is runtime analysis.
This can also be due to corruption in an exception path, even if the exception was handled. Do you debug with 'catch first-chance exceptions' turned on? You should as long as you can. It does get annoying after a while in many cases.
Can you send those users a checked version of your application? Check out Minidump Handle that exception and write out a dump. Then use WinDbg to debug on your end.
Another method is writing very detailed logs. Create a "Log every single action" option, and ask the user to turn that on and send it too you. Dump out memory to the logs. Check out '_CrtDbgReport()' on MSDN.
Good Luck!
EDIT:
Responding to your comment: An error on a local variable declaration is not surprising to me. I've seen this a lot. It's usually due to a corrupted stack.
Some variable on the stack may be running over it's boundaries for example. All hell breaks loose after that. Then stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
Anytime I've seen those for a prolong period of time, I've had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know that really gets those reliably.
Many developers use WinDbg for this kind of analysis. That's why I also suggested Minidump.
Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker doesn't.