run-time error and program crash - c++

How can I specify a separate function which will be called automatically at Run-Time error to prevent program crash?

Best thing as mentioned already is to identify the area where the crash ic occuring and then fix the piece of code. This is the idealistic approach.
In case you are unable to find that out another alternative is to do structured exception handling in the areas where you suspect crashes to occur. Once the crash occurs you capture what ever data you want and process it. Meanwhile you can change the settings in windows service manager to restart your application whenever it crashes. Hope this answers your question.
Also in case you are looking for methods to capture the crash and analyse debugdiag and windbg are some of the standard tools that people use to take the crash dumps.

If your in Windows you need to write your own custom runtime check handler. Use:
_RTC_SetErrorFunc
to install your custom function in place of
_CrtDbgReport.
Here is a good article on how to do it:
http://msdn.microsoft.com/en-us/library/40ky6s47(VS.71).aspx

If by run-time error you mean an unhandled exception, don't do it. If an unhanded exception propagates through your code, you want your app to crash. Maybe create a dumpfile on the way down, sure. But the last thing you want to do is catch the exception, do nothing, and continue running as if nothing had happened in the first place.
When an exception is generated, whatever caused the problem could have had other effects. Like blowing the stack or corrupting the heap. If you were to silently ignore these exceptions and continue running anyway, your app might run for a while and seem OK, but something underneath is unstable. Memory could be corrupt, resources might not be available, who knows. You will eventually crash. Bu ignoring unhandled exceptions and running anyway, all you do is delay the inevitable, and make it that much more difficult to diagnose the real problem, because when you do crash, the stack will be in a totally different and probably unrelated place from what caused the initial problem.

Related

When can the problem actually be fixed by catching an exception?

Here's the thing. There's something I don't quite understand about exceptions, and to me they seem like a construct that almost works, but can't be used cleanly.
I have a simple question. When has catching an exception been a useful or necessary component of solving the root cause of the problem? I.e. when have you been able to write code that fixes a problem signaled through an exception? I am looking for factual data, or experience you have had.
Here's what I mean. A normal program does work. If some piece of work can't be completed for reason X, the function responsible for doing the work throws an exception. But who catches the exception? As I see it, there are three reasons you might want to catch an exception:
You catch it because you want to change its type and rethrow it. (This happens when you translate mechanical exception, such as std::out_of_range, to business exceptions, such as could_not_complete_transaction)
You catch it because you want to log it, or let the user know about the problem, before aborting.
You catch it because you actually know how to solve the problem.
It is point 3 that I'm skeptical about. I have never actually caught an exception knowing what to do to solve it. When you get a std::out_of_memory, what are you supposed to do with it? It's not like you can barter the operating system to get more memory. That's just not something you can fix. And it's not just std::out_of_memory, there are also business class exceptions that suffer from this. Think about a potential connection_error exception: what can you do to fix this except wait and retry later and hope it fixes itself?
Now, to be fair, I do know of one case in which code does catch an exception and tries to fix the problem. I know that there are certain Win32 SEH handlers that catch a Stack Overflow exception and try to fix the problem by enlarging the size of the thread stack if it's possible. However, this works because SEH has try-resume semantics, which C++ exceptions don't have (you can't resume at the point the exception occurred).
The main part of the question is over. However, there's also another problem I have with exceptions that, to me, seems exactly the reason why you don't have catch clauses that fix the problem: the code that catches the exception necessarily has to be coupled with the code that throws it. Because, in order to fix the problem, it must have domain specific knowledge about what the problem cause is. But when some library documents that "if this function fails, an internal_error exception will be thrown", how am I supposed to be able to fix the problem when I don't know how the library works internally?
PS: Please note that this is not a "exceptions vs. error codes" kind of question; I am well aware that error codes suck as an error handling mechanism. They actually suffer from the same problem I have explained for exceptions.
I think your problem is that you equate "solve the problem" with "make the program keep going correctly". That is the wrong way to think of exceptions, or error handling in general.
Error handling code of any kind should not be something that is internally fixable by the program. That is, error handling logic (like catching exceptions) should not be entered because of programming mistakes.
If the user gives you a non-existent filename, that's not a programming mistake; that's a user-error. You cannot "fix" that without going back to the user and getting an existing file. But exceptions do allow you to undo what you were trying to do, restore the program to a valid state, and then communicate what happened to the user.
An invalid_connection is similarly not a programming mistake. Unlike the above, it's not necessarily a user error either. It's something that's expected to be able to happen, and different programs will handle it in different ways. Some will want to try again. Others will want to halt and let the user know.
The point is, because there is no one means to handle this condition, it cannot be done by the library. The error must be given to the caller of the library to figure out what to do.
If you have a function that parses integers, and you are given text that doesn't conform to an integer, it's not that function's job to figure out what to do next. The caller needs to be notified that the string they provided is malformed and that something ought to be done.
The caller needs to handle the error.
You don't abort most programs because a file that was supposed to contain integers didn't contain integers. But your parsing function does need to communicate this fact to the caller, and the caller does need to deal with that possibility.
That's what "catching exceptions" is for.
Now, unexpected environmental conditions like OOM are a different story. This is not usually external code's fault, but it's also not usually a programming error. And if it is a programming error (ie: memory leak), it's not one you can deal with in most cases. P0709 has an entire section on the ability (or lack thereof) of programs to be able to generally respond to OOM. The result is that, even when programs are coded defensively against OOM exceptions, they're usually still broken when they run out of memory.
Especially when dealing with OS's that don't commit pages to memory until you actually use them.
Here is my take,
There are more reasons to catch exceptions, for example, if it is a critical application, such as ones found in power substations etc. and an exception is caught to which there is no known system recovery or solution, you may want to have a controlled shutdown, protect certain modules, protect connected embedded systems etc. instead of just letting the system crash on its own. The latter could be disastrous...
I.e. when have you been able to write code that fixes a problem signaled through an exception?
When you get a std::out_of_memory, what are you supposed to do with it? It's not like you can barter the operating system to get more memory.
Actually I feel like that was my primary coding style for a while. An example: a system I worked on did not have a huge amount of memory and the system was dedicated, so, it was only my app and nothing else. Whenever I had an out_of_memory type of exception, I'd just kill the older process and open the one with the higher priority. Of course I'd wait for the kill to happen in a controlled fashion.
Think about a potential connection_error exception: what can you do to fix this except wait and retry later and hope it fixes itself?
I'd try to connect through another medium such as bluetooth, fiber, bus etc. Normally of course there would be a primary medium of contact, and the others wouldn't be called unless there is an exception.
But when some library documents that "if this function fails, an internal_error exception will be thrown", how am I supposed to be able to fix the problem when I don't know how the library works internally?
Most often an exception in a dedicated library has different consequences in your system than its own. You may not need to read the library and its internal workings to fix the problem. You just need to study its effect on your software and handle that situation. That's probably the easiest solution. And that is a lot easier to do if the library raises a known exception instead of just crashing or giving gibberish answers.
One obvious thing that came to mind was socket connections.
You try and connect to Server A and the program finds that it can't do that
Try connecting to Server B
The other examples regarding user input are equally as valid if not more so.
I admit that seeing something along the lines of
try
{
connectToServerA();
}
catch(cantConnectToServer)
{
connectToServerB();
}
would look like a bit of a weird pattern to see in real world code. It might make sense if the function takes an address and we iterate through a list of potential addresses.
Broadly speaking I agree with you often all you want to do is log the error and terminate - but some systems, which have to be robust and "always on" shouldn't just terminate if they encounter a problem.
Webservers are one obvious example. You don't just terminate because one users connection faulters, because that would drop the session for all the other connected users. There might be parts of code where raising an exception is the simplest way to deal with such a failure however.

Why catch an Access Violation?

Unlike C++ exceptions, an Access Violation indicates your applications runtime is compromised and the state of your application is, therefore, undefined. The best thing to do in that circumstance is to exit your application (generally done for you because it crashes).
I note that it is possible to catch one of these exceptions. In Microsoft Visual C++, for example, you can use /EHa or __try/__catch to do so.
So, what are the reasons you would want to catch them? As I understand it, there's no way for your application to recover.
You can recover from an access violation.
For example, you can create a dynamic array by allocating some address space with VirtualAlloc, but marking the memory it points at as not-present. Then, when you attempt to use some memory, you catch the access violation, map a page of memory where the access took place, then re-try the instruction that caused the violation.
One reason might be to write a crash dump file; you would have more control over it and be able to write the exact type you wanted. In Windows, for example, you could call MiniDumpWriteDump to do this.
Your app is not guaranteed to be stable after all access violations. But it can be stable or recover after some, so catching the access violation gives you the opportunity to:
Inform the user something went bad
Let the user attempt to save work
Attempt to recover
Log diagnostic information
Exit in the way you want, rather than disappearing.
A typical example of this is a host application that catches exceptions from a plugin. This way the host app (Photoshop, for example) can tell the user that "Plugin X crashed, and Photoshop is unstable... you should save your work, and restart Photoshop."
Note that this is different than C++ exception handling, which does not indicate unrecoverable errors at all, but is more of a stack unwinding feature.
I think it would be better to gracefully crash and let the user know what's going on, rather than just having the application disappear.
If your application gets an access violation you may or may not be able to recover. If your application is left in an undefined state, you clobber your stack for instance, you're done.
Read from a runaway pointer probably won't corrupt your application and from that you likely can recover.
Either way, the fact that it is occurring indicates a bug in your code so instead of trying to recover you should catch the error and dump what state you can to help you debug the problem.

Why won't my code segfault on Windows 7?

This is an unusual question to ask but here goes:
In my code, I accidentally dereference NULL somewhere. But instead of the application crashing with a segfault, it seems to stop execution of the current function and just return control back to the UI. This makes debugging difficult because I would normally like to be alerted to the crash so I can attach a debugger.
What could be causing this?
Specifically, my code is an ODBC Driver (ie. a DLL). My test application is ODBC Test (odbct32w.exe) which allows me to explicitly call the ODBC API functions in my DLL. When I call one of the functions which has a known segfault, instead of crashing the application, ODBC Test simply returns control to the UI without printing the result of the function call. I can then call any function in my driver again.
I do know that technically the application calls the ODBC driver manager which loads and calls the functions in my driver. But that is beside the point as my segfault (or whatever is happening) causes the driver manager function to not return either (as evidenced by the application not printing a result).
One of my co-workers with a similar machine experiences this same problem while another does not but we have not been able to determine any specific differences.
Windows has non-portable language extensions (known as "SEH") which allow you to catch page faults and segmentation violations as exceptions.
There are parts of the OS libraries (particularly inside the OS code that processes some window messages, if I remember correctly) which have a __try block and will make your code continue to run even in the face of such catastrophic errors. Likely you are being called inside one of these __try blocks. Sad but true.
Check out this blog post, for example: The case of the disappearing OnLoad exception – user-mode callback exceptions in x64
Update:
I find it kind of weird the kind of ideas that are being attributed to me in the comments. For the record:
I did not claim that SEH itself is bad.I said that it is "non-portable", which is true. I also claimed that using SEH to ignore STATUS_ACCESS_VIOLATION in user mode code is "sad". I stand by this. I should hope that I had the nerve to do this in new code and you were reviewing my code that you would yell at me, just as if I wrote catch (...) { /* Ignore this! */ }. It's a bad idea. It's especially bad for access violation because getting an AV typically means your process is in a bad state, and you shouldn't continue execution.
I did not argue that the existence of SEH means that you must swallow all errors.Of course SEH is a general mechanism and not to blame for every idiotic use of it. What I said was that some Windows binaries swallow STATUS_ACCESS_VIOLATION when calling into a function pointer, a true and observable fact, and that this is less than pretty. Note that they may have historical reasons or extenuating circumstances to justify this. Hence "sad but true."
I did not inject any "Windows vs. Unix" rhetoric here. A bad idea is a bad idea on any platform. Trying to recover from SIGSEGV on a Unix-type OS would be equally sketchy.
Dereferencing NULL pointer is an undefined behavior, which can produce almost anything -- a seg.fault, a letter to IRS, or a post to stackoverflow :)
Windows 7 also have its Fault Tollerant Heap (FTH) which sometimes does such things. In my case it was also a NULL-dereference. If you develop on Windows 7 you really want to turn it off!
What is Windows 7's Fault Tolerant Heap?
http://msdn.microsoft.com/en-us/library/dd744764%28v=vs.85%29.aspx
Read about the different kinds of exception handlers here -- they don't catch the same kind of exceptions.
Attach your debugger to all the apps that might call your dll, turn on the feature to break when an excption is thrown not just unhandled in the [debug]|[exceptions] menu.
ODBC is most (if not all) COM as such unhandled exceptions will cause issues, which could appear as exiting the ODBC function strangely or as bad as it hang and never return.

debug stack overflow in windows?

So I'm trying to debug this strange problem where a process ends without calling some destructors...
In the VS (2005) debugger, I hit 'Break all' and look around in the call stacks of the threads of the misteriously disappearing process, when I see this:
smells like SO http://img6.imageshack.us/img6/7628/95434880.jpg
This definitely looks like a SO in the making, which would explain why the process runs to its happy place without packing its suitcase first.
The problem is, the VS debugger's call stack only shows what you can see in the image.
So my question is: how can I find where the infinite recursion call starts?
I read somewhere that in Linux you can attach a callback to the SIGSEGV handler and get more info on what's going on.
Is there anything similar on Windows?
To control what Windows does in case of an access violation (SIGSEGV-equivalent), call SetErrorMode (pass it parameter 0 to force a popup in case of errors, allowing you to attach to it with a debugger.)
However, based on the stack trace you have already obtained, attaching with a debugger on fault may yield no additional information. Either your stack has been corrupted, or the depth of recursion has exceeded the maximum number of frames displayable by VS. In the latter case, you may want to decrease the default stack size of the process (use the /F switch or equivalent option in the Project properties) in order to make the problem manifest itself sooner, and make sure that VS will display all frames. You may, alternatively, want to stick a breakpoint in std::basic_filebuf<>::flush() and walk through it until the destruction phase (or disable it until just prior to the destruction phase.)
Well, you know what thread the problem is on - it might be a simple matter of tracing through it from inception to see where it goes off into the weeds.
Another option is to use one of the debuggers in the Debugging Tools for Windows package - they may be able to show more than the VS debugger (maybe), even if they are generally more complex and difficult to use (actually maybe because of that).
That does look at first glance like an infinite recursion, you could try putting a breakpoint at the line before the one that terminates the process. Does it get there ok? If it does, you've got two fairly easy ways to go.
Either you just step forward and see which destructors get called and when it gets caught up. Or you could put a printf/OutputDebugString in every relevant objects destructor (ONly ones which are globals should need this). If the message is the first thing the destructor does, then the last message you see is from the destructor which hangs things up.
On the other hand, if it doesn't get to that breakpoint I originally mentioned, then can do something similar, but it will be more annoying since the program is still "doing stuff".
I wouldn't rule out there being such a handler in Windows, but I've never heard of it.
I think the traceback that you're showing may be bogus. If you broke into the process after some kind of corruption had already occurred, then the traceback isn't necessarily valid. However, if you're lucky the bottom of the stack trace still has some clues about what's going on.
Try putting Sleep() calls into selected functions in your source that might be involved in the recursion. That should give you a better chance of breaking into the process before the stack has completely overflowed.
I agree with Dan Breslau. Your stack is bogus. may be simply because you don't have the right symbols, though.
If a program simply disappears without the WER handling kicking in, it's usually an out of memory condition. Have you gone investigated that possibility ?

How can I guarantee catching a EXCEPTION_STACK_OVERFLOW structured exception in C++ under Visual Studio 2005?

Background
I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
The application is Multi-Threaded.
I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
I have written an SE Translator function and called _set_se_translator() with it.
I have written functions for and setup set_terminate() and set_unexpected().
To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
I have WinDBG setup as the crash dump program, and get good information for all other crash issues but not this one. The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
The Question
None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.
Does anyone know how to guarantee getting a a chance at this exception during runtime in release mode?
Definitions
Poof-Crash: The application crashes by going "poof" and disappearing without a trace.
(Considering the name of this site, I'm kind of surprised this question isn't on here already!)
Notes
An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Everything prior to windows xp would not (or would be harder) generally be able to trap stack overflows. With the advent of xp, you can set vectored exception handler that gets a chance at stack overflow prior to any stack-based (structured exception) handlers (this is being the very reason - structured exception handlers are stack-based).
But there's really not much you can do even if you're able to trap such an exception.
In his blog, cbrumme (sorry, do not have his/her real name) discusses a stack page neighboring the guard page (the one, that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code to use just one stack page - you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
Hope, it helps.
I'm not convinced that you're on the right track in diagnosing this as a stack overflow.
But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg
The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
suggests to me that somebody has called the C RTL exit() function, or possibly called the Windows API TerminateProcess() directly. That could have something to do with your interrupt handlers or not. Maybe something in the exception handling logic has a re-entrance check and arbitrarily decides to exit() if it's reentered.
My suggestion is to patch your executables to put maybe an INT 3 debug at the entry point to exit (), if it's statically linked, or if it's dynamically linked, patch up the import and also patch up any imports of kernel32::TerminateProcess to throw a DebugBreak() instead.
Of course, exit() and/or TerminateProcess() may be called on a normal shutdown, too, so you'll have to filter out the false alarms, but if you can get the call stack for the case where it's just about to go proof, you should have what you need.
EDIT ADD: Just simply writing your own version of exit() and linking it in instead of the CRTL version might do the trick.
I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.
It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
Have you considered ADPlus from Debugging Tools for Windows?
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.
Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.
You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyways. It just makes debugging harder.
And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?
set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.