What can cause an abnormal program termination? - c++

MFC application (uses SQLite3.dll for DB access, along with other DLLs for accessing hardware) terminates abnormally. There is no particular sequence of termination :(
My application is a
Single threaded Application
Uses exception handling
Uses more than 6 DLLs to access different hardware
Runs on WinXP SP2
Initially I thought it might be because of a stack overflow, but later I discovered it's not. Can someone tell me what the general causes of an abnormal program termination are? If someone has come across similar problems or has any hints or clues, please pass them on.
Thanks in Advance

Generally speaking, crashes happen when you:
read memory that isn't yours
write memory that isn't yours
divide by zero
do something inside an interrupt that you shouldn't
free() a pointer more than once
Possibly also:
have an unhandled exception
have found a bug in MFC
have one of your >6 hardware-access DLLs doing any of the above
are encountering some kind of hardware fault
Maybe you're passing a bad buffer to one of your hardware DLLs, or are forgetting to lock some memory, or you could even have a version mismatch between the DLLs and their headers.
There are so many choices :P

Since this is a run-time issue, I suggest you send debug statements to a log file. Include the function name and perhaps a timestamp. Always flush the output buffer after writing to the file, as this improves the odds that the last line reaches the file before the exception occurs.
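As a rough illustration of the flush-after-every-line advice, here is a minimal sketch in standard C++. The class name `CrashLog` and the log format are made up for this example; flushing only hands the data to the operating system, which is normally enough to survive a crash of the process itself.

```cpp
#include <ctime>
#include <fstream>
#include <string>

// Appends "<unix-time> <function>: <message>" lines to a log file and
// flushes after every line so the last entry is not lost in a buffer.
// CrashLog is a hypothetical name used only for this sketch.
class CrashLog {
public:
    explicit CrashLog(const std::string& path) : out_(path, std::ios::app) {}

    void write(const char* function, const std::string& message) {
        const std::time_t now = std::time(nullptr);
        out_ << now << ' ' << function << ": " << message << '\n';
        out_.flush();  // hand the line to the OS before anything can crash
    }

private:
    std::ofstream out_;
};
```

A call such as `log.write(__FUNCTION__, "before ReadSensor")` placed before and after each hardware DLL call narrows down which call never returned.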

Related

Override a crash in C++

The question I have is:
Is there a way to call a function or do some other work whenever a crash occurs in the program.
To be specific, the code I am currently working on gives a "segmentation fault" on a very large input. This means that I am accessing some unavailable/unallocated part of memory at some point. Displaying each step would be too nasty, so I want to detect when I cross my bounds.
So how could this be done?
If you compiled your code with debug symbols, you should be able to either load a core file into your debugger and/or attach a debugger when the crash occurs.
Usually, you can tell what happened from the location of the crash. Did you dereference a null pointer? Did you dereference an invalid pointer? The second one is harder to debug than the first (it usually means memory corruption). A useful tool for memory corruption, especially if the fault is repeatable, is to place "watch points", which cause the debugger to halt whenever a particular memory location (e.g. the pointer that causes the crash) changes. This will allow you to see what overwrote your pointer.
You really need a debugger. But to answer an original question for future searchers:
Yes, in UNIX you can just handle the SEGV signal with usual signal handling procedures.
SIGSEGV has number 11 on Linux (use the SIGSEGV constant rather than the number). See this on how to handle signals.
Be aware that there are system-imposed limitations on what you can do in a signal handler
(e.g. only async-signal-safe functions may be called, which rules out most library I/O such as printf).
Also, if you access your program's data, be aware that it may be corrupt. If the handler returns, the program resumes at the faulting instruction, so it is likely to run into the same or another error really soon.
I would not recommend doing that in production code, but I think this feature is "unix way" enough to mention on SO
It is operating system and implementation specific. Practically also depends upon the compiler and optimization flags.
Very probably, you have hit some undefined behavior.
valgrind would probably be a helpful tool.
If on Linux, read core(5) & proc(5) & signal(7). You might set up system-wide your /proc/sys/kernel/core_pattern pseudo-file to start some external program or script (perhaps starting a debugger) on core dump.
You could even handle the SIGSEGV signal in a processor- and operating-system-specific way. But I don't recommend that. See this & that answers for more.
On Linux, syscalls (listed in syscalls(2)) are nearly the only functions you can call inside a signal handler (more precisely, it is the async-signal-safe functions mentioned in signal(7) and nothing more). But a lot of library functions (including malloc & printf, fprintf, dlopen etc...) are forbidden inside signal handlers.
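To make the restrictions above concrete, here is a minimal POSIX sketch of a SIGSEGV handler that limits itself to async-signal-safe calls (`write` and `_exit`). The function names `on_segv` and `install_segv_handler` are invented for this example, and the handler deliberately terminates instead of returning, since the program state can no longer be trusted.

```cpp
#include <csignal>
#include <cstring>
#include <unistd.h>

// Handler limited to async-signal-safe calls: write() and _exit().
// No printf, no malloc, no C++ streams in here.
extern "C" void on_segv(int) {
    static const char msg[] = "caught SIGSEGV, exiting\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(1);  // do not return: the faulting instruction would just rerun
}

// Installs the handler; returns true on success.
bool install_segv_handler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling `install_segv_handler()` early in `main` is enough for a diagnostic message; note that exiting from the handler also suppresses the core dump, so for real debugging a core file or an attached debugger is usually more useful.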

A program I support is crashing with SIGSEGV but I can't find the .dmp file

As per the title, I can't locate any dump files when this program I support is crashing.
The application's logs clearly mention it's a SIGSEGV exception, but I have searched my entire hard drive, and there are no .dmp files anywhere to be found.
The developers of the program have seen similar issues elsewhere but have so far been unable to explain why this is happening - and we're kind of a bit stuck at the moment.
The last section in the application logs reads as :
Received signal SIGSEGV, segmentation violation.
OurApplication::sigHandler 11.
Removing signal handlers.
OurApplication::signalCatched.
OurApplication::sigHandler: exiting application.
Removing signal handlers.
My limited understanding of this is that our application's signal handler might be 'neutralising' the SIGSEGV exception that got thrown, and therefore no core dump is getting generated. I did raise this idea with the developers, but they never really seemed to have investigated whether this might be the reason. The theory they raised in counter was that the dmp isn't getting generated because the program may be crashing twice very close together.
So the questions I have at this point are:
Are there any Windows7 parameters that control the creation of a .dmp file?
Are there any requirements/flags that need to be compiled into a program in order for it (or windows) to create a core dump file if it crashes?
I'm 99% sure it must be windows that is responsible for creating the core file, since the program itself would be dead/terminated when it crashed, correct?
Are there any other things I should be aware of, or check for, or 'evidence' I can collect and then show our developers?
Many thanks in advance
Are there any Windows7 parameters that control the creation of a .dmp file?
There are parameters which control the creation of a crash dump: see MSDN on Collecting user-mode dumps.
Are there any requirements/flags that need to be compiled into a program in order for it (or windows) to create a core dump file if it crashes?
You don't need to compile anything in for the previous answer to work. However, the program needs to terminate due to an unhandled exception, which means you need to let the exception bubble up and not have it handled by your own unhandled exception handler.
I'm 99% sure it must be Windows that is responsible for creating the core file, since the program itself would be dead/terminated when it crashed, correct?
As stated above, Windows can handle that and it's a good idea to have Windows handle the crash. Imagine that your program is broken due to memory corruption: some memory has been overwritten. In that case, your unhandled exception handler can be destroyed as well. Windows, however, still has full control over the process, can suspend it and create a dump from outside (rather than from inside).
Are there any other things I should be aware of, or check for, or 'evidence' I can collect and then show our developers?
Well, suggest letting the dump be created by Windows for the above reasons. Then they also don't need to implement a configuration setting (you don't want the crash dump file to be always created, do you?). You don't need to implement a limit on the number of files. You don't need to implement a check for disk space, etc.
And you can suggest that they read the Windows Internals (6th edition) books.
Consider creating your own minidump file programmatically. There should be plenty of code around showing how to do it. You can try here:
https://stackoverflow.com/search?q=minidump
This way, you're not relying on Dr. Watson or any other settings to create a dump file. Instead you will be calling the functions in DBGHELP.DLL to create the dump file.
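For reference, a minimal sketch of the DBGHELP approach might look like the following. It uses `MiniDumpWriteDump` from an unhandled-exception filter; the file name `crash.dmp` and the function names are assumptions made for this example, and the non-Windows branch exists only so the snippet compiles elsewhere. As the other answer points out, a dump written from inside a corrupted process is less reliable than one written by Windows from outside.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Writes a minidump of the current process to `path`; `info` may be null.
static bool write_minidump(const char* path, EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileA(path, GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    MINIDUMP_EXCEPTION_INFORMATION mei;
    mei.ThreadId = GetCurrentThreadId();
    mei.ExceptionPointers = info;
    mei.ClientPointers = FALSE;

    const BOOL ok = MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                                      file, MiniDumpNormal,
                                      info ? &mei : nullptr, nullptr, nullptr);
    CloseHandle(file);
    return ok != FALSE;
}

// Top-level filter: runs only when no other handler took the exception.
static LONG WINAPI dump_on_crash(EXCEPTION_POINTERS* info) {
    write_minidump("crash.dmp", info);  // hypothetical file name
    return EXCEPTION_EXECUTE_HANDLER;   // then let the process terminate
}

bool install_crash_dump_handler() {
    SetUnhandledExceptionFilter(dump_on_crash);
    return true;
}
#else
// Not Windows: DbgHelp is unavailable, so this sketch does nothing.
bool install_crash_dump_handler() { return false; }
#endif
```

The resulting file can be opened in WinDbg or Visual Studio together with the matching symbol files.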

Application crashes, says: Access violation reading location

My application crashes after running for around 18 hours. I am not able to find the point in the code where it actually crashes. I checked the call stack, but it does not provide any useful information. The last few calls in the call stack are greyed out, meaning I cannot see the code of that part; they all belong to MFC libraries.
However, I get this 'Microsoft Visual Studio' pop-up when it crashes which says:
Unhandled exception at 0x7c809e8a in NIMCAsst.exe: 0xC0000005:
Access violation reading location 0x154c6000.
Could the above information be useful to understand where it is crashing? Is there any software that could tell me which variable in the code holds a particular memory address?
If you can't catch the exception sometimes you just have to go through your code line by line, very unpleasant but I'd put money on it being your code not in MFC (always is with my bugs). Check how you're using memory and what you're passing into the MFC functions extra carefully.
Probably the crash is caused by a buffer overflow or another type of memory corruption. This has overwritten the part of the stack holding the return address, which has left the debugger unable to reconstruct the stack trace correctly. Or you do not have correct symbols for the code that caused the crash (if the stack trace shows a module name, this would be the case).
My first guess would be to examine the code calling the code that crashed for possible issues that might have caused it. Do you get any other exceptions or error conditions before the crash? Maybe you are ignoring an error return? Did you try using the Debug Heap? What about adplus? Application verifier to turn on heap checks?
Other possibilities include running a tool like PC-lint over the code to check for obvious issues of memory use. Are you using threads? Maybe there is a race condition. The list could go on forever really.
The above information only tells you which memory was accessed illegally.
You can use exception handling to narrow down the place where the problem occurs, but then you need at least an idea in which corner to seek.
You say that you're seeing the call stack, that suggests you're using a debugger. The source code of MFC is available (but perhaps not with all vc++ editions), so in principle one can trace through it. Which VC++ version are you using?
The fact that the bug takes so long to occur suggests that it is memory corruption. Some other function writes to a location that it doesn't own. This works for a long time, but finally the function alters a pointer that MFC needs, and after a while MFC accesses the pointer and you are notified.
Sometimes, the 'location' can be recognized as data, in which case you have a hint. F.e. if the error said:
Access violation reading location 0x31323334
you'd recognize this as part of an ASCII string "1234" (the bytes 0x31 to 0x34; on little-endian x86 they sit in memory in reverse order), and this might lead you to the culprit.
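A small helper makes this check quick to repeat for any faulting address. This is only a sketch; the function name `address_as_ascii` is made up here, and it prints the bytes most significant first, so on a little-endian machine the string in memory reads in the opposite order.

```cpp
#include <cstdint>
#include <string>

// Returns the four bytes of `address`, most significant first, with
// non-printable bytes shown as '.'.
std::string address_as_ascii(std::uint32_t address) {
    std::string text;
    for (int shift = 24; shift >= 0; shift -= 8) {
        const unsigned char byte = static_cast<unsigned char>(address >> shift);
        text += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
    }
    return text;
}
```

For the example address, `address_as_ascii(0x31323334)` yields "1234". An address that decodes mostly to dots is probably not overwritten text.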
As Patrick says, it's almost definitely your code giving MFC invalid values. One guess would be you're passing in an incorrect length so the library is reading too far. But there are really a multitude of possible causes.
Is the crash clearly reproducible?
If yes, use logfiles! You should use a logfile and add a number of statements that just log the source file/line number passed. Start with a few statements at the entry point (main event handler) and the most common execution paths. After the crash, inspect the last entry in the logfile. Then add new entries down the path/paths that must have been passed, etc. Usually after a few iterations of this work you will find the point of failure. Given your long wait time, the log file might become huge and each iteration will take another 18 hours. You may need to add some technique for rotating log files, etc. But with this technique I was able to find some comparable bugs.
Some more questions:
Is your app multithreaded?
Does it use any arrays not managed by stl or comparable containers (does it use C-Strings, C/C++-Arrays etc)?
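The logfile technique described above, including a very simple form of rotation, could be sketched like this. The names `TraceLog` and `TRACE_POINT` and the ".old" suffix are assumptions for this example, not an existing library.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Size-capped trace log: once the file grows past max_bytes it is renamed
// to "<path>.old" (replacing any previous one) and a fresh file is started.
// TraceLog and TRACE_POINT are hypothetical names for this sketch.
class TraceLog {
public:
    TraceLog(std::string path, long max_bytes)
        : path_(std::move(path)), max_bytes_(max_bytes),
          file_(std::fopen(path_.c_str(), "a")) {}
    ~TraceLog() { if (file_) std::fclose(file_); }
    TraceLog(const TraceLog&) = delete;
    TraceLog& operator=(const TraceLog&) = delete;

    // Records one "file:line" entry and flushes it immediately.
    void mark(const char* source_file, int line) {
        if (!file_) return;
        std::fprintf(file_, "%s:%d\n", source_file, line);
        std::fflush(file_);
        if (std::ftell(file_) > max_bytes_) rotate();
    }

private:
    void rotate() {
        std::fclose(file_);
        const std::string old = path_ + ".old";
        std::remove(old.c_str());
        std::rename(path_.c_str(), old.c_str());
        file_ = std::fopen(path_.c_str(), "a");
    }

    std::string path_;
    long max_bytes_;
    std::FILE* file_;
};

#define TRACE_POINT(log) (log).mark(__FILE__, __LINE__)
```

Sprinkling `TRACE_POINT(log);` along the suspected paths keeps at most two files on disk, so an 18-hour run cannot fill the drive, while the last entry before the crash is still there.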
Try attaching a debugger to the process and have the debugger break on access violations.
If this isn't possible, then we use a tool called "User mode process dumper" to create a memory dump of the process at the point where the access violation happened. You can find this for download here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=E089CA41-6A87-40C8-BF69-28AC08570B7E&displaylang=en
How it works: You configure rules on a per-process (or optionally system-wide) basis, and have the tool create either a minidump or a full dump at the point where it detects any one of a list of exceptions - one of them being an access violation. After the dump has been made the application continues as normal (and so if the access violation is unhandled, you will then see this dialog).
Note that ALL access violations in your process are captured, even those that are later handled. Also, a full dump can take a while to create depending on the amount of memory the application is using (10-20 seconds for a process consuming 100-200 MB of private memory). For this reason it's probably not a good idea to enable it system-wide.
You should then be able to analyse the dump using tools like WinDbg (http://www.microsoft.com/whdc/devtools/debugging/default.mspx) to figure out what happened. In most cases you will find that you only need a minidump, not a full dump (however, if your application doesn't use much memory, then there aren't really many drawbacks to having a full dump other than the size of the dump and the time it takes to create it).
Finally, be warned that debugging access violations using WinDbg can be a fairly involved and complex process. If you can get a stack trace another way, then you might want to try that first.
This could be caused by a memory leak. Various blogs explain how to check an application for memory leaks. Simply watch the process's memory usage in Windows Task Manager; you may find that it keeps increasing until the process runs out of memory. You can also try running under the WinDbg tool to identify memory leaks in your code. I haven't used this tool myself, I'm just giving a heads-up.
This question is pretty old, and I've had the same problem,
but I've quickly solved it - it's all about threads:
First, note that updating the GUI can only be done on the Main Thread.
My problem was that I tried to handle the GUI from a Worker Thread (and not the Main Thread) and I got the same error: 0xC0000005.
I solved it by posting a message (which is executed on the Main Thread):
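// Custom message ID; application-defined messages start at WM_APP.
enum { WM_UPDATE_GUI = WM_APP + 1 };
// register function callback to a message
BEGIN_MESSAGE_MAP(CMyDlg, CDlgBase)
    ON_MESSAGE(WM_UPDATE_GUI, OnUpdateGui)
END_MESSAGE_MAP()
// For this example - function that is not invoked in the Main Thread:
void CMyDlg::OnTimer()
{
    // Heap-allocate the string: PostMessage returns immediately, so a
    // pointer to a local variable would dangle by the time it is handled.
    CString* str_to_GUI = new CString(_T("send me to gui"));
    // Update_GUI(*str_to_GUI); // crashed
    if (!::PostMessage(m_hWnd, WM_UPDATE_GUI, (WPARAM)str_to_GUI, 0))
        delete str_to_GUI; // posting failed, so free the string here
}
// ON_MESSAGE handlers return LRESULT and run on the Main Thread.
LRESULT CMyDlg::OnUpdateGui(WPARAM wParam, LPARAM lParam)
{
    CString* str = (CString*)wParam; // take ownership of the posted string
    Update_GUI(*str);
    delete str;
    return 0;
}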

How can I guarantee catching a EXCEPTION_STACK_OVERFLOW structured exception in C++ under Visual Studio 2005?

Background
I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
The application is Multi-Threaded.
I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
I have written an SE Translator function and called _set_se_translator() with it.
I have written functions for and setup set_terminate() and set_unexpected().
To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
I have WinDBG setup as the crash dump program, and get good information for all other crash issues but not this one. The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
The Question
None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.
Does anyone know how to guarantee getting a chance at this exception during runtime in release mode?
Definitions
Poof-Crash: The application crashes by going "poof" and disappearing without a trace.
(Considering the name of this site, I'm kind of surprised this question isn't on here already!)
Notes
An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Before Windows XP it was generally impossible (or at least much harder) to trap stack overflows. Starting with XP, you can set a vectored exception handler, which gets a chance at the stack overflow before any stack-based (structured exception) handlers. That is the whole point: structured exception handlers are themselves stack-based.
But there's really not much you can do even if you're able to trap such an exception.
In his blog, cbrumme (sorry, do not have his/her real name) discusses a stack page neighboring the guard page (the one, that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code to use just one stack page - you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
Hope it helps.
I'm not convinced that you're on the right track in diagnosing this as a stack overflow.
But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg
The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
suggests to me that somebody has called the C RTL exit() function, or possibly called the Windows API TerminateProcess() directly. That could have something to do with your interrupt handlers or not. Maybe something in the exception handling logic has a re-entrance check and arbitrarily decides to exit() if it's reentered.
My suggestion is to patch your executables to put maybe an INT 3 debug break at the entry point to exit(), if it's statically linked, or if it's dynamically linked, patch up the import and also patch up any imports of kernel32::TerminateProcess to call DebugBreak() instead.
Of course, exit() and/or TerminateProcess() may be called on a normal shutdown, too, so you'll have to filter out the false alarms, but if you can get the call stack for the case where it's just about to go poof, you should have what you need.
EDIT ADD: Just simply writing your own version of exit() and linking it in instead of the CRTL version might do the trick.
I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.
It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
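The explicit stack-bounds check mentioned above can be approximated in portable C++ by comparing the address of a local variable with one recorded earlier. This is only a sketch under assumptions: `StackGuard` and `runaway` are names invented here, the measurement is an estimate that relies on locals living on a contiguous machine stack, and the 256 KB budget used in the usage note is arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Estimates stack use as the distance between a local variable's address
// now and one recorded at construction; throws once a budget is exceeded.
// StackGuard is a hypothetical helper, not a standard facility.
class StackGuard {
public:
    explicit StackGuard(std::size_t limit_bytes)
        : base_(current()), limit_(limit_bytes) {}

    void check() const {
        const std::uintptr_t now = current();
        const std::size_t used = base_ > now ? base_ - now : now - base_;
        if (used > limit_) throw std::runtime_error("stack budget exceeded");
    }

private:
    static std::uintptr_t current() {
        char marker;  // lives in the current stack frame
        return reinterpret_cast<std::uintptr_t>(&marker);
    }

    std::uintptr_t base_;
    std::size_t limit_;
};

// Hypothetical runaway recursion used to exercise the guard.
int runaway(const StackGuard& guard, int depth) {
    volatile char frame[1024];        // make each frame use visible stack
    frame[0] = static_cast<char>(depth);
    if (depth > 1000000) return 0;    // safety net, not expected to be hit
    guard.check();
    return runaway(guard, depth + 1) + frame[0];
}
```

Constructing `StackGuard guard(256 * 1024);` near the top of a thread and calling `guard.check()` in the suspected recursive functions turns a silent overflow into a C++ exception with a usable call stack, at the cost of one comparison per call.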
Have you considered ADPlus from Debugging Tools for Windows?
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.
Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.
You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyway. Optimization just makes debugging harder.
And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?
set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.

Heisenbug: WinApi program crashes on some computers

Please help! I'm really at my wits' end.
My program is a little personal notes manager (google for "cintanotes").
On some computers (and of course I own none of them) it crashes with an unhandled exception just after start.
Nothing special about these computers could be said, except that they tend to have AMD CPUs.
Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.
Here is what is certain about this "Heisenbug":
1) The crash happens only in the Release version.
2) The crash goes away as soon as I remove all GDI-related stuff.
3) BoundsChecker has no complaints.
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
Any ideas would be greatly appreciated!
UPDATE: I've managed to get the app debugged on a "faulty" PC. The results:
"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."
and code breaks on
0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]
So it seems that the problem was in the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2" and was crashing on the machines that didn't support SSE2. I've set this option to "Not Set" and the bug is gone. Phew!
Thank you all very much for help!!
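An alternative to dropping /arch:SSE2, not something the original fix used, is to test for SSE2 at startup and refuse to run with a clear message. The sketch below assumes either MSVC (using `IsProcessorFeaturePresent`) or GCC/Clang on x86 (using `__builtin_cpu_supports`); the function name `cpu_has_sse2` is made up here. Note that the check itself has to live in code that is compiled without SSE2 instructions, otherwise it may crash before it can report anything.

```cpp
#if defined(_MSC_VER)
#include <windows.h>
#endif

// Returns true when the running CPU reports SSE2 support.
bool cpu_has_sse2() {
#if defined(_MSC_VER)
    return IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;  // unknown compiler or non-x86 target: assume unsupported
#endif
}
```

A program could call this first thing in its entry point and show a message box instead of dying with 0xC000001D.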
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
What is the underlying code in the executable / assembly? Declaration of int is no code at all, and as such cannot crash. Do you initialize the int somehow?
To see the code where the crash happened you should perform what is called a postmortem analysis.
Windows Error Reporting
If you want to analyse the crash, you should get a crash dump. One option for this is to register for Windows Error Reporting - requires some money (you need a digital code signing ID) and some form filling. For more visit https://winqual.microsoft.com/ .
Get the crash dump intended for WER directly from the customer
Another option is to get in touch with some user who is experiencing the crash and get a crash dump intended for WER from him directly. The user can do this by clicking on the Technical details before sending the crash to Microsoft; the crash dump file location can be checked there.
Your own minidump
Another option is to register your own exception handler, handle the exception and write a minidump anywhere you wish. Detailed description can be found at Code Project Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET article.
So it doesn't crash in the DEBUG configuration? Many things are different from a RELEASE configuration:
1.) Initialization of globals
2.) Actual machine Code generated etc..
So the first step is to find out the exact settings for each parameter in RELEASE mode as compared to DEBUG mode.
-AD
1) The crash happens only in the Release version.
That's usually a sign that you're relying on some behaviour that's not guaranteed, but happens to be true in the debug build. For example, if you forget to initialize your variables, or access an array out of bounds. Make sure you've turned on all the compiler checks (/RTCsuc). Also check things like relying on the order of evaluation of function parameters (which isn't guaranteed).
2) The crash goes away as soon as I remove all GDI-related stuff.
Maybe that's a hint that you're doing something wrong with the GDI related stuff? Are you using HANDLEs after they've been freed, for example?
Download the Debugging Tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point, it will break with an Access Violation. Then you should run the command "!analyze -v", which is quite smart and should give you a hint on what's going wrong.
Most heisenbugs / release-only bugs are due to either flow of control that depends on reads from uninitialised memory / stale pointers / past end of buffers, or race conditions, or both.
Try overriding your allocators so they zero out memory when allocating. Does the problem go away (or become more reproducible?)
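One way to try the zeroing experiment without touching the rest of the code is to replace the global `operator new`. This is a minimal sketch: it covers only plain `new`/`delete` (not `malloc`, aligned allocation, or stack variables), and constructors still run afterwards as usual.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Replacement global allocator: every block comes back zero-filled.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // calloc(1, 0) is allowed to return null
    if (void* p = std::calloc(1, size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions (unsized and sized forms).
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

If the crash disappears with this in place, some object is probably being read before it is initialised; if it becomes more reproducible, the code was likely depending on whatever garbage happened to be there.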
Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
Stack overflow! ;)
4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?
I've found the cause to numerous "strange crashes" to be dereferencing of a broken this inside a member function of said object.
What does the crash say? Access violation? Exception? That would be a further clue to solving this.
Ensure you have no preceding memory corruption using PageHeap.exe
Ensure you have no stack overflow (CBig array[1000000])
Ensure that you have no un-initialized memory.
Further you can run the release version also inside the debugger, once you generate debug symbols (not the same as creating debug version) for the process. Step through and see if you are getting any warnings in the debugger trace window.
"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"
This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out if they've overclocked their computer.
When I get this type of thing, I try running the code through Gimpel's PC-lint (static code analysis), as it checks for different classes of errors than BoundsChecker. If you are using BoundsChecker, turn on the memory poisoning options.
You mention AMD CPUs. Have you investigated whether there is a similar graphics card / driver version and / or configuration in place on the machines that crash? Does it always crash on these machines or just occasionally? Maybe run the System Information tool on these machines and see what they have in common.
Sounds like stack corruption to me. My favorite tool to track those down is IDA Pro. Of course you don't have that access to the user's machine.
Some memory checkers have a hard time catching stack corruption (if it is indeed that). The surest way to catch it, I think, is runtime analysis.
This can also be due to corruption in an exception path, even if the exception was handled. Do you debug with 'catch first-chance exceptions' turned on? You should as long as you can. It does get annoying after a while in many cases.
Can you send those users a checked version of your application? Check out minidumps: handle that exception and write out a dump. Then use WinDbg to debug on your end.
Another method is writing very detailed logs. Create a "Log every single action" option, and ask the user to turn that on and send the log to you. Dump out memory to the logs. Check out '_CrtDbgReport()' on MSDN.
Good Luck!
EDIT:
Responding to your comment: An error on a local variable declaration is not surprising to me. I've seen this a lot. It's usually due to a corrupted stack.
Some variable on the stack may be running over its boundaries, for example. All hell breaks loose after that. Then stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
Anytime I've seen those for a prolonged period of time, I've had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know that really gets those reliably.
Many developers use WinDbg for this kind of analysis. That's why I also suggested Minidump.
Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker doesn't.