I have a very strange situation.
I'm running IOCP Server Program programmed by Visual studio 2010 in C++.
It uses 'minidump', so When there is a logical bug like pointer misuse, Program crashes with dump file so I can find out where is the crash point in codes.
Sometimes (very rarely), the program crashes and there are no dump files.
What situation makes SetUnhandledExceptionFilter() not working?
Anybody knows this problem? I can't figure out.
Well, of course you don't know because you don't have a minidump to look at. You should do the absolute minimum when the SetUnhandledExceptionFilter callback fires. The process is in a perilous state. It crashed. Locks might be held, the heap locks are particularly troublesome. You can't expect MiniDumpWriteDump() to succeed.
What you need is a little guard process that waits on a named event. Start it up as early as possible in your main() function and pass it your process ID. The guard process waits on both that event and your process handle. In your exception callback, just signal the event and immediately sleep for a long time. That wakes up the guard process, it calls MiniDumpWriteDump() plus whatever else is necessary to let you know about the crash. And kills your main program.
Related
I'm looking for a way to catch segfaults and other errors anywhere in a program (which uses multiple threads, some of which are created by external libraries). I'm using Visual Studio 2013 with the Intel C++ Compiler 2015.
Some external DLL's - in some cases I've even seen this with Windows drivers - can contain bugs that are out of my control, and my software runs 24/7 - I need to be able to log a crash somewhere and restart my software.
So far, I found that you can set a signal handler which handles SIGSEGV and other signals. Based on what I read, under Linux this would do exactly what I need (handle this signal for all threads), but under Windows you need to set the signal handler for each thread separately. Since I'm not the one who creates all the threads (if I was, I could just use __try/__catch), this isn't really an option. On top of that, I'm seeing that when I set a signal handler in a thread and then cause a SIGSEGV it doesn't get handled by the handler, while the exact same code works fine in the main thread - not sure what's going on there (but even a fix for that wouldn't help, since I don't create all the threads, and looping through all existing threads in my process to set handlers sounds like a very bad idea).
So, is there a way to do this? Googling and searching here did not help - I found several people with similar questions but no answers that are usable in my situation.
Note: What I have now, which works perfectly in the main thread but not at all if I copy this same block of code to any thread:
SignalHandlerPointer previousHandlerSEGV = signal(SIGSEGV, SignalHandler);
int *a;
a = NULL;
*a = 0;
To get notified about all unhandled exceptions in a process you can call SetUnhandledExceptionFilter. The functionality is documented as:
Issuing SetUnhandledExceptionFilter replaces the existing top-level exception filter for all existing and all future threads in the calling process.
Inside the exception filter it is recommended to trigger a call to MiniDumpWriteDump (in an external process), to produce a mini dump for off-line analysis. You get to control the amount of information that is written into the mini dump (e.g. threads, modules, call stacks, memory). Most importantly, you can dump the call stack of the thread that raised the uncaught exception, at the time the exception is raised.
As an alternative, I believe some/most/all of this can be done automatically by registering for application recovery and restart.
I use Qt 4.8.6, MS Visual Studio 2008, Windows 7. I've created a GUI program. It contains main GUI thread and worker thread (I have not made QThread subclass, by the way), which makes synchronous calls to 3rd party DLL functions. These functions are rather slow. QTcpServer instance is also under worker thread. My worker class contains QTcpServer and DLL wrapper methods.
I know that quit() is preferred over terminate(), but I don't wanna wait for a minute (because of slow DLL functions) during program shutdown. When I try to terminate() worker thread, I notice warnings about stopping QTcpServer from another thread. What is a correct way of process shutdown?
QThread::quit tells the thread's event loop to exit. After calling it the thread will get finished as soon as the control returns to the event loop of the thread
You may also force a thread to terminate right now via QThread::terminate(), but this is a very bad practice, because it may terminate the thread at an undefined position in its code, which means you may end up with resources never getting freed up and other nasty stuff. So use this only if you really can't get around it.
So i think the right approach is to first tell the thread to quit normally and if something goes wrong and takes much time and you have no way to wait for it, then terminate it:
QThread * th = myWorkerObject->thread();
th->quit();
th->wait(5000); // Wait for some seconds to quit
if(th->isRunning()) // Something took time more than usual, I have to terminate it
th->terminate();
You should always try to avoid killing threads from the outside by force and instead ask them nicely to finish what they're doing. This usually means that the thread checks regularly if it should terminate itself and the outside world tells it to terminate when needed (by setting a flag, signaling an event or whatever is appropriate for the situation at hand).
When a thread is asked to terminate itself, it finishes up what it's doing and exists cleanly. The application waits for the thread to terminate and then exits.
You say that in your case the thread takes a long time to finish. You can take this into consideration and still terminate the thread "the nice way" (for example you can hide the application window and give the impression that the app has exited, even if the process takes a little more time until it finally terminates; or you can show some form of progress indication to the user telling him that the application is shutting down).
Unless there is an overriding reason to do so, you should not attempt to terminate threads with user code at process-termination.
If there is no such reason, just call your OS process termination syscall, eg. ExitProcess(0). The OS can, and will will stop all process threads in any state before releasing all process resources. User code cannot do that, and should not try to terminate threads, or signal them to self-terminate, unless absolutely necessary.
Attempting to 'clean up' with user code sounds 'nice', (aparrently), but is an expensive luxury that you will pay for with extra code, extra testing and extra maintenance.
That is, if your customers don't stop buying your app because they get pissed off with it taking so long to shut down.
The OS is very good at stopping threads and cleaning up. It's had endless thousands of hours of testing during development and decades of life in the wild where problems with process termination would have become aparrent and got fixed. You will not even get close to that with your flags, events etc. as you struggle to stop threads running on another core without the benefit of an interprocessor driver.
There are surely times when you will have to resort to user code to stop threads. If you need to stop them before process termination, or you need to close some DB connection, flush some file at shutdown, deal with interprocess comms or the like issues, then you will have to resort to some of the approaches already suggested in other answers.
If not, don't try to duplicate OS functionality in the name of 'niceness'. Just ask it to terminate your process. You can get your warm, fuzzy feeling when your app shuts down immedately while other developers are still struggling to implement 'Shutdown' progress bars or trying to explain to customers why they have 15 zombie apps still running.
I am writing a DLL which is loaded by a proprietary program which is closed source and I have no control over. I also load a Proprietary DLL which is just as obscure. Since I sometimes have to relay commands I get through my DLLs interface to the DLL I Load with very low latency I launch a separate detached thread upon initializing my DLL and send it unformatted debug information through a lock free queue. The time consuming formating of debug output and writing to a log file is thus done asynchronously. The problem is that the process crashes inadvertently (which I am almost certain is no my fault) and I have no way of knowing what the last debug info was because my detached thread is killed by windows before it can write it to disk.
So here is my question:
Can I delay destruction in any way if the proprietary program crashes so that my detached thread runs longer before destruction?
Would interprocess communication solve my problem by moving my detached thread to another process which windows would not kill? If so what method would you suggest (I have not worked with IPC much)
If I use IPC how do I know when to terminate my "debug formating process"?
I think even with IPC you would have the situation where your thread may have un-written debug information during a process fault. Presumably you don't have debugging going on all the time, so I'd think you wouldn't need a separate thread for it, just a compile-time or run-time option to enable it. You might be able to SetUnhandledExceptionFilter for the process and do a few things before you terminate.
I'd like to apologize in advance, because this is not a very good question.
I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.
When it crashes, the event logs have an entry stating that the application failed, but gives no clue as to why. It also gives some information on the faulting module, but it doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.
Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.
All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottom most exception handler (the try/catch (...)) does nothing, it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.
The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this, but this isn't happening.
The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).
We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.
What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.
If your app is evaporating an not generating a dump file, then it is likely that an exception is being generated which your app doesnt (or cant) handle. This could happen in two instances:
1) A top-level exception is generated and there is no matching catch block for that exception type.
2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged, This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.
A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:
"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"
-or-
"My program might crash, but if it does I want to create a dump file on the way down."
The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.
The latter is a noble & robust approach, but the implementation of it is much more difficult that it might seem, and it fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.
Moreover, there are exceptions that can occur that a catch block simply can't catch, such as SEH exceptions. For that reason, I always write an unhandled-exception handler, and register it with Windows, via SetUnhandledExceptionFilter. Within my exception handler, I allocate every single byte I need via static allocation, before the program even starts up. The best (most robust) thing to do within this handler is to trigger a seperate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful no not call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.
I've seen crashes like these happen as a result of memory corruption. Have you run your app under a memory debugger like Purify to see if that sheds some light on potential problem areas?
Analyze memory in a signal handler
http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=VS.100%29.aspx
This isn't a very good answer, but hopefully it might help you.
I ran into those symptoms once, and after spending some painful hours chasing the cause, I found out a funny thing about Windows (from MSDN):
Dereferencing potentially invalid
pointers can disable stack expansion
in other threads. A thread exhausting
its stack, when stack expansion has
been disabled, results in the
immediate termination of the parent
process, with no pop-up error window
or diagnostic information.
As it turns out, due to some mis-designed data sharing between threads, one of my threads would end up dereferencing more or less random pointers - and of course it hit the area just around the stack top sometimes. Tracking down those pointers was heaps of fun.
There's some technincal background in Raymond Chen's IsBadXxxPtr should really be called CrashProgramRandomly
Late response, but maybe it helps someone: every Windows app has a limit on how many handles can have open at any time. We had a service not releasing a handle in some situation, the service would just disappear, after a few days, or at times weeks (depending on the usage of the service).
Finding the leak was great fun :D (use Task Manager to see thread count, handles count, GDI objects, etc)
When I exit my C++ program it crashes with errors like:
EAccessViolation with mesage 'Access violation at address 0...
and
Abnormal Program Termination
It is probably caused by some destructor because it happens only when the application exits. I use a few external libraries and cannot find the code that causes it. Is there a function that forces immediate program exit (something like kill in Linux) so that memory would have to be freed by the operating system? I could use this function in app exit event.
I know that it would be a terrible solution because it'd just hide the problem.
I'm just asking out of sheer curiosity, so please don't give me -1 :)
I tried exit(0) from stdlib but it didn't help.
EDIT:
Thanks for your numerous replies:)
I use Builder C++ 6 (I know it's outdated but for some reasons I had to use it). My app uses library to neural networks (FANN). Using the debugger I found that program crashes in:
~neural_net()
{
destroy();
}
destroy() calls multiple time another function fann_safe_free(ptr), that is:
#define fann_safe_free(x) {if(x) { free(x); x = NULL; }}
The library works great, problem only appears when it does cleaning. That's why I asked about so brutal solution. My app is multi-threaded but other threads operate on different data.
I will analyze my code for the n-th time(the bug must be somewhere), thanks for all your tips :)
You should fix the problem.
First step: find at check all functions you register with atexit() (not many I hope)
Second step: find all global variables and check their destructors.
Third Step: find all static function variables check their destructors.
But otherwise you can abort.
Note: abort is for Abnormal program termination.
abort()
The difference: (note letting an application leave the main function is the equivalent of exit())
exit()
Call the functions registered with the atexit(3) function, in the reverse order of their registration. This includes the destruction of all global (static storage duration) variables.
Flush all open output streams.
Close all open streams.
Unlink all files created with the tmpfile(3) function.
abort()
Flush all open output streams.
Close all open streams.
It's a terrible solution for more than one reason. It will hide the problem (maybe), but it could also corrupt data, depending on the nature of your application.
Why don't you use a debugger and try to find out what is causing the error?
If your application is multi-threaded, you should make sure that all threads are properly shut down before exiting the application. This is a fairly common cause of that type of error on exit, when a background thread is attempting to use memory/objects that have already been destructed.
Edit:
based on your updated question, I have the following suggestions:
Try to find out more specifically what is causing the crash in the destructor.
The first thing I would do is make sure that it's not trying to destruct a NULL object. When you get your crash in ~neural_net in your debugger, check your "this" pointer to make sure it's not NULL. If it is, then check your call-stack and see where it's being destructed, and do a check to make sure it's not NULL before calling delete.
If it's not NULL, then I would unroll that macro in destroy, so you can see if it's crashing on the call to free.
You could try calling abort(); (declared in <stdlib.h> and in <process.h>)
The version in VisualC++, however, will print a warning message as it exits: "This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information."
On Linux/UNIX you can use _exit:
#include <unistd.h>
void _exit(int status);
The function _exit() is like exit(), but does not call any functions registered with atexit() or on_exit(). Whether it flushes standard I/O buffers and removes temporary files created with tmpfile(3) is implementation dependent. On the other hand, _exit() does close open file descriptors, and this may cause an unknown delay, waiting for pending output to finish. If the delay is undesired, it may be useful to call functions like tcflush() before calling _exit(). Whether any pending I/O is cancelled, and which pending I/O may be cancelled upon _exit(), is implementation-dependent.
Have you tried the gruesome step by step? If you're project/solution is simply to large to do so maybe you could try segmenting it assuming you use a modular build and test each component indivdually. Without any code or visible destructors abstract advice is all I can give you I'm afraid. But nonetheless I hope trying to minimize the debugging field will help in some way.
Good luck with getting an answer :)
That immediate program exit (and yes, that's a terrible solution) is abort()
That happens most likely because a NULL pointer is being accessed. Depending on your OS try getting a stack trace and identify the culprit, don't just exit.
If you use linux, valgrind should solve your problem.
but if it is windows, try one of these: MemoryValidator, BoundsChecker or other tools like these.
Simply close your application is not the best way to deal with bugs ...