I have my application(VC MFC) run with gflags with Pageheap enabled to track down the page heap corruption.
Now the application has crashed and it shows this error, I could not interpret these lines (other than having a feel of resource inavailablity)
Can anyone throw a light on what exactly is the reason that has caused the crash of the app?
(info: Application is a multithreaded one about 500 threads running,in a multi - processor machine)
kernel32!RaiseException+53
msvcrt!_CxxThrowException+36
mfc42u!AfxThrowResourceException+19
mfc42u!AfxRegisterWndClass+ab
mfc42u!CAsyncSocket::AttachHandle+5c
mfc42u!CAsyncSocket::Socket+25
mfc42u!CAsyncSocket::Create+14
This same problem has driven me nuts but finally i fixed it and it is working. This is bug with MFC socket library that when inside a thread [other than main application thread], If we try to do something like
CSocket socket;
socket.Create();
It will throws an unhandled exception. I found an article on it See What Microsoft says about this
that said something from Microsoft but that did not help me either. So here is a workaround i have found and i hope it can help some frustrated fellow like me.
Inside thread, do this
CSocket mySock;
SOCKET sockethandle = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
mySock.m_hSocket= sockethandle;
After that DO NOT call mySock.Create as it has been created already through assignment of socket handle. I am not sure if we can use mySock.Attach(sockethandle) as i did not try it yet.
After that you can call Connect etc directly.
When you are done using the socket, DO NOT call mySock.Close() - rather call closesocket(mySock.m_hSocket); And that will free the socket object. If Attach works in above case then i guess we need to do Detach here when to free the socket.
Good Luck
I wonder if this is your actual heap corruption issue, or if your program has just hit a resource limitation as a consequence of running with Pageheap.
I can't remember the exact details, but Pageheap incurs extra memory overhead, so much so that you can run out of memory much sooner than you would without Pageheap enabled.
With 500 threads running, you have a 1MB stack for each, plus any memory they've allocated dynamically along the way.
CAsyncSocket::AttachHandle triggers AfxThrowResourceException if it can't create a window. It seems that your system is saturated due to Pageheap.
Do you have to have 500 threads running to reproduce the problem? Maybe if you could lower that count a little, there would be more resources available.
I had the same problem, and after trying many things I've notice the following CAsyncSocket reference:
Create is not thread-safe. If you are calling it in a multi-threaded environment where it could be invoked simultaneously by different threads, be sure to protect each call with a mutex or other synchronization lock.
After adding a Mutex synchronization it no longer throws an exception.
Related
I debug console multithreaded application written in C++/Qt 5.12.1. It is running on Linux Mint 18.3 x64.
This app has SIGINT handler, QWebSocketServer and QWebSocket table. It uses close() QWebSocketServer and call abort()/deleteLater() for items in QWebSocket table to handle the termination.
If the websocket client connects to this console app, then termination fails because of some running thread (I suppose it's internal QWebSocket thread).
Termination is successful if there were no connections.
How to fix it? So that the app gracefully exits.
To gracefully quit the socket server we can attempt:
The most important part is to allow the main thread event loop to run and wait on QWebSocketServer::closed() so that the slot calls QCoreApplication::quit().
That can be done even with:
connect(webSocketServer, &QWebSocketServer::closed,
QCoreApplication::instance(), &QCoreApplication::quit);
If we don't need more detailed reaction.
After connecting that signal before all, proceed with pauseAccepting() to prevent more connections.
Call QWebSocketServer::close.
The below may not be needed if the above sufficient. You need to try the above first, and only if still have problems then deal with existing and pending connections. From my experience the behavior was varying on platforms and with some unique websocket implementations in the server environment (which is likely just Qt for you).
As long as we have some array with QWebSocket instances, we can try to call QWebSocket::abort() on all of them to immediately release. This step seem to be described by the question author.
Try to iterate pending connections with QWebSocketServer::nextPendingConnection() and call abort() for them. Call deleteLater, if that works as well.
There is no need to do anything. What do you mean by "graceful exit"? As soon as there's a request to terminate your application, you should terminate it immediately using exit(0) or a similar mechanism. That's what "graceful exit" should be.
Note: I got reformed. I used to think that graceful exits were a good thing. They are most usually a waste of CPU resources and usually indicate problems in the architecture of the application.
A good rationale for why it should be so written in the kj framework (a part of capnproto).
Quoting Kenton Varda:
KJ_NORETURN(virtual void exit()) = 0;
Indicates program completion. The program is considered successful unless error() was
called. Typically this exits with _Exit(), meaning that the stack is not unwound, buffers
are not flushed, etc. -- it is the responsibility of the caller to flush any buffers that
matter. However, an alternate context implementation e.g. for unit testing purposes could
choose to throw an exception instead.
At first this approach may sound crazy. Isn't it much better to shut down cleanly? What if
you lose data? However, it turns out that if you look at each common class of program, _Exit()
is almost always preferable. Let's break it down:
Commands: A typical program you might run from the command line is single-threaded and
exits quickly and deterministically. Commands often use buffered I/O and need to flush
those buffers before exit. However, most of the work performed by destructors is not
flushing buffers, but rather freeing up memory, placing objects into freelists, and closing
file descriptors. All of this is irrelevant if the process is about to exit anyway, and
for a command that runs quickly, time wasted freeing heap space may make a real difference
in the overall runtime of a script. Meanwhile, it is usually easy to determine exactly what
resources need to be flushed before exit, and easy to tell if they are not being flushed
(because the command fails to produce the expected output). Therefore, it is reasonably
easy for commands to explicitly ensure all output is flushed before exiting, and it is
probably a good idea for them to do so anyway, because write failures should be detected
and handled. For commands, a good strategy is to allocate any objects that require clean
destruction on the stack, and allow them to go out of scope before the command exits.
Meanwhile, any resources which do not need to be cleaned up should be allocated as members
of the command's main class, whose destructor normally will not be called.
Interactive apps: Programs that interact with the user (whether they be graphical apps
with windows or console-based apps like emacs) generally exit only when the user asks them
to. Such applications may store large data structures in memory which need to be synced
to disk, such as documents or user preferences. However, relying on stack unwind or global
destructors as the mechanism for ensuring such syncing occurs is probably wrong. First of
all, it's 2013, and applications ought to be actively syncing changes to non-volatile
storage the moment those changes are made. Applications can crash at any time and a crash
should never lose data that is more than half a second old. Meanwhile, if a user actually
does try to close an application while unsaved changes exist, the application UI should
prompt the user to decide what to do. Such a UI mechanism is obviously too high level to
be implemented via destructors, so KJ's use of _Exit() shouldn't make a difference here.
Servers: A good server is fault-tolerant, prepared for the possibility that at any time
it could crash, the OS could decide to kill it off, or the machine it is running on could
just die. So, using _Exit() should be no problem. In fact, servers generally never even
call exit anyway; they are killed externally.
Batch jobs: A long-running batch job is something between a command and a server. It
probably knows exactly what needs to be flushed before exiting, and it probably should be
fault-tolerant.
I have a program that uses services from others. If the program crashes, what is the best way to close those services? At server side, I would define some checkers that monitor if a client is invalid periodically. But can we do any thing at client? I am not the sure if the normal RAII still works at this case. My code is written in C and C++.
If your application experiences a hard crash, then no, your carefully crafted cleanup code will not run, whether it is part of an RAII paradigm or a method you call at the end of main. None of an application's cleanup code runs after a crash that causes the application to be terminated.
Of course, this is not true for exceptions. Although those might eventually cause the application to be terminated, they still trigger this termination in a controlled way. Generally, the runtime library will catch an unhandled exception and trigger termination. Along the way, your RAII-based cleanup code will be executed, unless it also throws an exception. Then you're back to being unceremoniously ripped out of memory.
But even if your application's cleanup code can't run, the operating system will still attempt to clean up after you. This solves the problem of unreleased memory, handles, and other system objects. In general, if you crash, you need not worry about releasing these things. Your application's state is inconsistent, so trying to execute a bunch of cleanup code will just lead to unpredictable and potentially erroneous behavior, not to mention wasting a bunch of time. Just crash and let the system deal with your mess. As Raymond Chen puts it:
The building is being demolished. Don't bother sweeping the floor and emptying the trash cans and erasing the whiteboards. And don't line up at the exit to the building so everybody can move their in/out magnet to out. All you're doing is making the demolition team wait for you to finish these pointless housecleaning tasks.
Do what you must; skip everything else.
The only problem with this approach is, as you suggest in this question, when you're managing resources that are not controlled by the operating system, such as a remote resource on another system. In that case, there is very little you can do. The best scenario is to make your application as robust as possible so that it doesn't crash, but even that is not a perfect solution. Consider what happens when the power is lost, e.g. because a user's cat pulled the cord from the wall. No cleanup code could possibly run then, so even if your application never crashes, there may be termination events that are outside of your control. Therefore, your external resources must be robust in the event of failure. Time-outs are a standard method, and a much better solution than polling.
Another possible solution, depending on the particular use case, is to run consistency-checking and cleanup code at application initialization. This might be something that you would do for a service that is intended to run continuously and will be restarted promptly after termination. The next time it restarts, it checks its data and/or external resources for consistency, releases and/or re-initializes them as necessary, and then continues on as normal. Obviously this is a bad solution for a typical application, because there is no guarantee that the user will relaunch it in a timely manner.
As the other answers make clear, hoping to clean up after an uncontrolled crash (i.e., a failure which doesn't trigger the C++ exception unwind mechanism) is probably a path to nowhere. Even if you cover some cases, there will be other cases that fail and you are building in a serious vulnerability to those cases.
You mention that the source of the crashes is that you are "us[ing] services from others". I take this to mean that you are running untrusted code in-process, which is the potential source of crashes. In this case, you might consider running the untrusted code "out of process" and communicating back to your main process through a pipe or shared memory or whatever. Then you isolate the crashes this child process, and can do controlled cleanup in your main process. A separate process is really the lightest weight thing you can do that gives you the strong isolation you need to avoid corruption in the calling code.
If forking a process per-call is performance-prohibitive, you can try to keep the child process alive for multiple calls.
One approach would be for your program to have two modes: normal operation and monitoring.
When started in a usual way, it would :
Act as a background monitor.
Launch a subprocess of itself, passing it an internal argument (something that wouldn't clash with normal arguments passed to it, if any).
When the subprocess exists, it would release any resources held at the server.
When started with the internal argument, it would:
Expose the user interface and "act normally", using the resources of the server.
You might look into atexit, which may give you the functionality you need to release resources upon program termination. I don't believe it is infallible, though.
Having said that, however, you should really be focusing on making sure your program doesn't crash; if you're hitting an error that is "unrecoverable", you should still invest in some error-handling code. If the error is caused by a Seg-Fault or some other similar OS-related error, you can either enable SEH exceptions (not sure if this is Windows-specific or not) to enable you to catch them with a normal try-catch block, or write some Signal Handlers to intercept those errors and deal with them.
I'm looking for a way to catch segfaults and other errors anywhere in a program (which uses multiple threads, some of which are created by external libraries). I'm using Visual Studio 2013 with the Intel C++ Compiler 2015.
Some external DLL's - in some cases I've even seen this with Windows drivers - can contain bugs that are out of my control, and my software runs 24/7 - I need to be able to log a crash somewhere and restart my software.
So far, I found that you can set a signal handler which handles SIGSEGV and other signals. Based on what I read, under Linux this would do exactly what I need (handle this signal for all threads), but under Windows you need to set the signal handler for each thread separately. Since I'm not the one who creates all the threads (if I was, I could just use __try/__catch), this isn't really an option. On top of that, I'm seeing that when I set a signal handler in a thread and then cause a SIGSEGV it doesn't get handled by the handler, while the exact same code works fine in the main thread - not sure what's going on there (but even a fix for that wouldn't help, since I don't create all the threads, and looping through all existing threads in my process to set handlers sounds like a very bad idea).
So, is there a way to do this? Googling and searching here did not help - I found several people with similar questions but no answers that are usable in my situation.
Note: What I have now, which works perfectly in the main thread but not at all if I copy this same block of code to any thread:
SignalHandlerPointer previousHandlerSEGV = signal(SIGSEGV, SignalHandler);
int *a;
a = NULL;
*a = 0;
To get notified about all unhandled exceptions in a process you can call SetUnhandledExceptionFilter. The functionality is documented as:
Issuing SetUnhandledExceptionFilter replaces the existing top-level exception filter for all existing and all future threads in the calling process.
Inside the exception filter it is recommended to trigger a call to MiniDumpWriteDump (in an external process), to produce a mini dump for off-line analysis. You get to control the amount of information that is written into the mini dump (e.g. threads, modules, call stacks, memory). Most importantly, you can dump the call stack of the thread that raised the uncaught exception, at the time the exception is raised.
As an alternative, I believe some/most/all of this can be done automatically by registering for application recovery and restart.
I have an application with main thread and additional (detached) process created in it.
In that process we are running network server which sends logs from queue through the network.
The question is: is it possible to do something in segfault handler to wait/finish for sending that log queue. So I want almost 100% delivery of that queue.
While it is possible to write a segfault handler, I highly recommend against it. First off, it's very easy to get your program into a "won't terminate" state due to a segfault in the segfault handler.
Second, as dan3 mentions, the memory of the process is likely in a corrupt state, making it hard to know what will and won't work.
Finally, you lose the opportunity to use the coredump from the process to help track down the problem.
While it's not recommended, it is possible.
My recommendation is to write a small program that avoids memory allocation and the use of pointers as much as possible. Perhaps create buffers as global arrays and only ever access them with limited code that can be reviewed by several skilled developers and tested thoroughly (stress testing is great here). Keep in mind, though, that the message could still get lost by the sender or receiver if they crash, so it may not be worth the effort.
By the way - when Netscape first wrote a version of their browser for Linux, I ran it and it kept getting into a locked-up state. Using the strace program, I quickly found that it was in an infinite segfault loop. Very frustrating, and leading to almost 100% cpu wasted.
You can wait() for a process and pthread_wait() for a thread to finish (you didn't specify clearly which one you use).
Remember that if you are in segfault handler, your memory is messed up (avoid malloc() and free()) and your FILE * could also be borked.
I'd like to apologize in advance, because this is not a very good question.
I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.
When it crashes, the event logs have an entry stating that the application failed, but gives no clue as to why. It also gives some information on the faulting module, but it doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.
Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.
All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottom most exception handler (the try/catch (...)) does nothing, it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.
The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this, but this isn't happening.
The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).
We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.
What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.
If your app is evaporating an not generating a dump file, then it is likely that an exception is being generated which your app doesnt (or cant) handle. This could happen in two instances:
1) A top-level exception is generated and there is no matching catch block for that exception type.
2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged, This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.
A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:
"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"
-or-
"My program might crash, but if it does I want to create a dump file on the way down."
The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.
The latter is a noble & robust approach, but the implementation of it is much more difficult that it might seem, and it fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.
Moreover, there are exceptions that can occur that a catch block simply can't catch, such as SEH exceptions. For that reason, I always write an unhandled-exception handler, and register it with Windows, via SetUnhandledExceptionFilter. Within my exception handler, I allocate every single byte I need via static allocation, before the program even starts up. The best (most robust) thing to do within this handler is to trigger a seperate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful no not call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.
I've seen crashes like these happen as a result of memory corruption. Have you run your app under a memory debugger like Purify to see if that sheds some light on potential problem areas?
Analyze memory in a signal handler
http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=VS.100%29.aspx
This isn't a very good answer, but hopefully it might help you.
I ran into those symptoms once, and after spending some painful hours chasing the cause, I found out a funny thing about Windows (from MSDN):
Dereferencing potentially invalid
pointers can disable stack expansion
in other threads. A thread exhausting
its stack, when stack expansion has
been disabled, results in the
immediate termination of the parent
process, with no pop-up error window
or diagnostic information.
As it turns out, due to some mis-designed data sharing between threads, one of my threads would end up dereferencing more or less random pointers - and of course it hit the area just around the stack top sometimes. Tracking down those pointers was heaps of fun.
There's some technincal background in Raymond Chen's IsBadXxxPtr should really be called CrashProgramRandomly
Late response, but maybe it helps someone: every Windows app has a limit on how many handles can have open at any time. We had a service not releasing a handle in some situation, the service would just disappear, after a few days, or at times weeks (depending on the usage of the service).
Finding the leak was great fun :D (use Task Manager to see thread count, handles count, GDI objects, etc)