Handling Exceptions in a critical application that should not crash - c++

I am debugging a server application that parses scripts (VBScript, Python, JScript and SQL) for the application that requests it.
This is a very critical application which, if it crashes, causes havoc for a lot of users. The problem I am facing is how to handle exceptions so that the application can continue and the users know if something is wrong in their scripts.
An example: In the SQL scripts the application normally returns a set of values (Date, Number, String and Number). So the scripts have to have a statement at the end as such:
into dtDate, Number, Number, sString. These are values that are built into the application and the server application knows how to interpret these. These fields are treated in the server app as part of an array. The return values should normally be in a specific order as the indexes for these fields into the array are hardcoded inside the server application.
Now when a user writing a script forgets one of these fields, then the last field (normally string) throws an IndexOutofBoundsException.
The question is how does one recover from exceptions of this nature without taking down the application?
Another example is an error in a script for which no error parsing message can be generated. These errors just disappear in the background in the application and eventually cause the server app to crash. The scripts on which it fails don't necessarily fail to execute entirely, but part of it doesn't execute and the other parts do, which makes it look fairly odd to a user.
This server app is a native C++ application and uses COM technologies.
I was wondering if anyone has any ideas on the best way to handle exceptions such as the ones described above without crashing the application.

You can't handle problems like this with exceptions. You could add a top-level catch block that catches the exception and hope that not too much program state got irrecoverably corrupted, so that the program stays alive. That still doesn't make the user happy: the query she is waiting for still doesn't run.
Ensuring that changes don't destabilize a critical business app requires organization: people who sign off on the changes and verify that they work as intended before they are allowed into production. In other words, QA.

Since you talk about parsing different languages, you probably have something like
class IParser // parser interface
{
public:
    virtual ~IParser() {}
    virtual bool Parse( File& fileToParse, String& errMessage ) = 0;
};
class VBParser : public IParser { /* ... */ };
class SQLParser : public IParser { /* ... */ };
If the Parse() method throws an exception that is not handled, your entire app crashes. Here's a simplified example of how this could be fixed at the application level:
// somewhere in the main server code
void ParseFileForClient( File& fileToParse )
{
    try
    {
        String err;
        if( !currentParser->Parse( fileToParse, err ) )
        {
            ReportErrorToUser( err );
        }
        else
        {
            // process parser result
        }
    }
    catch( const std::exception& e )
    {
        // 'err' is out of scope here, so format the caught exception instead
        ReportErrorToUser( FormatExceptionMessage( e ) );
    }
    catch( ... )
    {
        ReportErrorToUser( "parser X threw unknown exception; parsing aborted" );
    }
}

If you know an operation can throw an exception, then you need to add exception handling to this area.
Basically, you need to write the code in an exception-safe manner, which usually means following these guidelines:
Work on temporary values that can throw exceptions
Commit the changes using the temp values after (usually this will not throw an exception)
If an exception is thrown while working on the temp values, nothing gets corrupted and in the exception handling you can manage the situation and recover.
http://www.gotw.ca/gotw/056.htm
http://www.gotw.ca/gotw/082.htm
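The guidelines above can be sketched as follows. This is only an illustration under stated assumptions: ScriptResult, parseFields and updateResult are hypothetical names, and the four-field layout is borrowed from the SQL example in the question. All throwing work happens on a temporary, and the final commit is a swap that cannot throw.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result row; the layout is an assumption for illustration.
struct ScriptResult {
    std::vector<std::string> fields;
};

// Hypothetical parsing step: throws if the script returned the wrong number of fields.
std::vector<std::string> parseFields(const std::vector<std::string>& raw,
                                     std::size_t expected) {
    if (raw.size() != expected) {
        throw std::runtime_error("script returned " + std::to_string(raw.size()) +
                                 " fields, expected " + std::to_string(expected));
    }
    return raw;
}

// Updates `result` only if every step succeeds; on failure it is left unchanged.
void updateResult(ScriptResult& result, const std::vector<std::string>& raw) {
    ScriptResult temp;
    temp.fields = parseFields(raw, 4);   // may throw; `result` is still untouched
    result.fields.swap(temp.fields);     // commit step: vector::swap does not throw
}
```

If updateResult throws, the caller can report the message to the script author and carry on, because the previous contents of the result are intact.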

It really depends on how long it takes to start up your server application. It may be safer to let the application crash and then reload it. Or, taking a cue from the Chrome browser, run different parts of your application in different processes that can crash independently. If you can safely recover from an exception and trust that your application state is OK, then fine, do it. However, catching std::exception and continuing can be risky.
There are simple and complex ways to babysit processes so that they are restarted if they crash. A couple of tools I use:
bluepill http://asemanfar.com/Bluepill:-a-new-process-monitoring-tool
pacemaker http://www.clusterlabs.org/

For simple exceptions that can happen inside your program due to user errors,
simply save the state that can be changed, and restore it like this:
SaveStateThatCanBeAlteredByScript();
try {
    LoadScript();
} catch(std::exception& e) {
    RestoreSavedState();
    ReportErrorToUser(e);
}
FreeSavedState();
If you want to prevent external code from crashing your application (possibly untrusted code such as plugins), you need to run it in a separate process and talk to it through an IPC scheme. On Windows, you can share memory through memory-mapped files with CreateFileMapping() and MapViewOfFile(). On POSIX systems you can use sem_open() together with mmap().

If you have a server, you basically have a main loop that waits for a signal to start a job. The signal could be nothing, with your server just going through a list of files on the file system, or it could be more like a web server that waits for a connection and executes the script provided on that connection (or anything like that).
MainLoop()
{
    while(job = jobList.getJob())
    {
        job.execute();
    }
}
To stop the server from crashing because of the scripts you need to encapsulate the external jobs in a protected region.
MainLoop()
{
    // Don't bother to catch exceptions from here.
    // This probably means you have a programming error in the server.
    while(job = jobList.getJob())
    {
        // Catch exception from job.execute()
        // as these exceptions are generally caused by the script.
        try
        {
            job.execute();
        }
        catch(MyServerException const& e)
        {
            // Something went wrong with the server not the script.
            // You need to stop. So let the exception propagate.
            throw;
        }
        catch(std::exception const& e)
        {
            log(job, e.what());
        }
        catch(...)
        {
            log(job, "Unknown exception!");
        }
    }
}
If the server is critical to your operation, then just detecting the problem and logging it is not always enough. A badly written server will crash, so you want to automate the recovery: write some form of heartbeat process that checks at regular intervals whether the server process has crashed and, if it has, automatically restarts it.
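For readers who want something they can compile, here is a minimal sketch of the protected job loop described above. ServerError, Job and runJobs are hypothetical names chosen for this example, and a job is assumed to be any callable. Note that a C++ catch block only sees C++ exceptions; faults such as access violations are not covered by it.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fatal error type: signals a fault in the server, not in a script.
struct ServerError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

using Job = std::function<void()>;

// Runs every job in order. Script-level failures are recorded and skipped;
// a ServerError is rethrown so the caller can shut down or restart.
std::vector<std::string> runJobs(const std::vector<Job>& jobs) {
    std::vector<std::string> failures;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        const std::string label = "job " + std::to_string(i) + ": ";
        try {
            jobs[i]();
        } catch (const ServerError&) {
            throw;  // server fault: let it propagate
        } catch (const std::exception& e) {
            failures.push_back(label + e.what());
        } catch (...) {
            failures.push_back(label + "unknown exception");
        }
    }
    return failures;
}
```

The returned list stands in for the log() calls in the answer; a real server would more likely report each failure to the client that submitted the script.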

Elmah Does not email in a fire and forget scenario

I have an MVC app where I am trying to capture all the incoming requests in an ActionFilter. Here is the logging code. I am trying to log in a fire-and-forget model.
My issue is that if I execute this code synchronously, by taking out the Task.Run, Elmah does send out an email. But for the code shown below, I can see the error getting logged to the in-memory logger in elmah.axd, but no emails are sent.
public void Log(HttpContextBase context)
{
Task.Run(() =>
{
try
{
throw new NotImplementedException(); //simulating an error condition
using (var s = _documentStore.OpenSession())
{
s.Store(GetDataToLog(context));
s.SaveChanges();
}
}
catch (Exception ex)
{
ErrorSignal.FromCurrentContext().Raise(ex);
}
});
}
Got this answer from Atif Aziz (ELMAH Lead contributor) on the ELMAH google group:
When you use Task.Run, the HttpContext is not transferred to the thread pool thread on which your action will execute. When ErrorSignal.FromCurrentContext is called from within your action, my guess is that it's probably failing with another exception because there is no current context. That exception is lying with the Task. If you're on .NET 4, you're lucky because you'll see the ASP.NET app crash eventually (but possibly much after the fact) when the GC will kick in and collect the Task and its exception will go “unobserved”. If you're on .NET 4.5, the policy has been changed and the exception will simply get lost. Either way, your observation will be that mailing is not working. In fact, logging won't work either unless you use Elmah.ErrorLog.GetDefault(null).Log(new Error(ex)), where a null context is allowed. But that call only logs the error but does not do any mailing. ELMAH's modules are connected to the ASP.NET context. If you detach from that context by forking to another thread, then you cannot rely on ELMAH's modules. You can only use Elmah.ErrorLog.GetDefault(null).Log(new Error(ex)) reliably to log an error.

Boost HTTP server issue

I'm just starting to use Boost, so maybe I'm messing something up.
I'm trying to set up http server with boost (ASIO). I've taken the code from docs: http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/examples/cpp03_examples.html (HTTP Server, the first one)
The only difference from the example is I'm running server by my own method "run" and starting io_service in background thread, like in the docs: http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/reference/io_service.html
boost::asio::io_service::work work(io_service_);
(Also I'm stopping io_service from my run method too.)
When I start this modified server everything seems to be OK, and the run method works fine. But when I try to get a document from the server, the request hangs and control flow never reaches the "request_handle" method.
Am I missing something?
UPD. Here is my code of run method:
void NetstreamServer::run()
{
    LOG4CPLUS_DEBUG(logger, "NetstreamServer is running");
    boost::asio::io_service::work work(io_service_);
    try
    {
        while (true)
        {
            if (condition)
            {
                io_service_.stop();
                break;
            }
        }
    }
    catch (std::exception const& e)
    {
        LOG4CPLUS_ERROR(logger, "NetstreamServer" << " caught exception: " << e.what());
    }
}
You should call io_service_.run(); otherwise nothing will dispatch the completion handlers of the Asio objects serviced by io_service_.
Without the code you changed, everyone here can only guess. Unfortunately you also do not mention the compiler and the OS you are using. Even though Boost claims to be platform independent, you should always include this information, as in reality platforms differ even with Boost.
Let me guess. Are you using Microsoft Windows? How do you prevent the "main" function from exiting? Since you moved the blocking "run" call out of it into another thread, the main function has no wait point anymore. Let me guess again: you used something like "getchar", so that you can exit your server just by hitting the return key. If so, the problem is the getchar, which unfortunately blocks all I/O of the asio socket implementation, but only on Windows-based systems.
I would not need to guess if you included the information mentioned above in your post, in particular all(!) changes you made to the code sample.

storagefile::ReadAsync exception in c++/cx?

I have been trying to use C++/CX StorageFile::ReadAsync() to read a file in a Store app, but it always throws an invalid parameter exception no matter what.
// "file" is returned from FileOpenPicker
IRandomAccessStream^ reader = create_task(file->OpenAsync(FileAccessMode::Read)).get();
if (reader->CanRead)
{
    BitmapImage^ b = ref new BitmapImage();
    const int count = 1000000;
    Streams::Buffer^ bb = ref new Streams::Buffer(count);
    create_task(reader->ReadAsync(bb, 1, Streams::InputStreamOptions::None)).get();
}
I have turned on all the manifest capabilities and added "file open picker" + "file type association" under Declarations. Any ideas? Thanks!
PS: most solutions I found are for C#, but the code structure is similar...
If this code is executing on the UI thread (or in any other Single Threaded Apartment, or STA), then the calls to .get() will throw if the tasks have not yet completed, because the call to .get() would block the thread. You must not block the UI thread or any other STA, and when compiling with C++/CX support enabled, the libraries enforce this.
If you turn on first chance exception handling in the debugger (Debug -> Exceptions..., check the C++ Exceptions check box), you should see that the first exception to be thrown is an invalid_operation exception, from the following line in <ppltasks.h>:
// In order to prevent Windows Runtime STA threads from blocking the UI, calling
// task.wait() task.get() is illegal if task has not been completed.
if (!_IsCompleted() && !_IsCanceled())
{
throw invalid_operation("Illegal to wait on a task in a Windows Runtime STA");
}
The "invalid parameter" you are reporting is the fatal error that is caused when this exception reaches the ABI boundary: the debugger is notified that the application is about to terminate because this exception was unhandled.
You need to restructure your code to use continuations, using task::then, as described in the article Asynchronous Programming in C++ Using PPL
Just to make sure you understand the async pattern, what is happening in your code is that you call create_task and immediately after that task has started you are trying to get the result with .get(). Calls to .get() will throw immediately if the task is still running or the file could not be found. Therefore, the correct way of structuring this is using a .then on your file task, ensuring that you have the result of this task before starting the next one.
create_task(file->OpenAsync(FileAccessMode::Read)).then([](IRandomAccessStream^ reader)
{
//do stuff with the reader
});
At that point the reader is available so you can do whatever you want to, even start a new task.
Also, it is possible that the call to OpenAsync is failing because the file is empty. I would add a try/catch block to the previous task, the one that gets the file, just to make sure that's not the problem.

How to find whether a given application is single instance or not?

I am looking for an efficient way to find out whether a given application (say app.exe) is single-instance or not. I thought of the following solutions:
Call CreateProcess() twice and check whether two or more instances of that application are running. If not, it is a single-instance application. But this is not efficient.
Call CreateProcess() and wait for 1-2 seconds. If this instance is killed (because an instance is already running), it is a single-instance app.
But I am not convinced by either of these solutions. Is there a more efficient way of doing this in Windows?
Please note that I don't want to kill or modify an already running instance (if any) of that application.
Think about it the other way: When you write a program, how do you specify whether it is single-instance or multiple-instance? Is there a way that some other program can get that information out of your program without running it? (Once you answer this question, then you have the answer to your question.)
This problem is not solvable in general because single-instance/multiple-instance-ness is determined at runtime and can be based on runtime conditions. For example, some applications are "sometimes multiple instance, sometimes single": If you run the application to open document X, and then document Y, you will get two instances. But if you open document X, and then document X again, the two instances will fold into one. Other applications may have a configuration switch that lets you select whether they are single-instance or multiple-instance. Or maybe they decide to flip a coin and decide to be single-instance if tails and multiple-instance if heads.
The best way is to use a synchronization object called a mutex (short for "mutual exclusion"). You can look it up.
I think the following code may help too.
//---------------------------------------------------------------------------
int WINAPI _tWinMain(HINSTANCE, HINSTANCE, LPTSTR, int)
{
try
{
HANDLE hMutex=OpenMutex(MUTEX_ALL_ACCESS,0,"SIns");
if (!hMutex) {
//Mutex doesn’t exist. This is the first instance so create the mutex.
//in this case app name is SIns (Single Instance)
hMutex=CreateMutex(0,0,"SIns");
Application->Initialize();
Application->MainFormOnTaskBar = true;
Application->CreateForm(__classid(TfMain), &fMain);
Application->Run();
ReleaseMutex(hMutex);
}
else{
//This is not single. The prev instance is already running
//so informing about it
//remember that if it finds prev instance we're activating it here
//you may do whatsoever here ...... e.g. you may kill process or stuff like this:)
ShowMessage("The program is already running. Switching to ...");
HWND hWnd=FindWindow(0,"SIns");
SetForegroundWindow(hWnd);
}
}
catch (Exception &exception)
{
Application->ShowException(&exception);
}
catch (...)
{
try
{
throw Exception("");
}
catch (Exception &exception)
{
Application->ShowException(&exception);
}
}
return 0;
}
//---------------------------------------------------------------------------
There is no way to do this at all. What happens if the application checks a mutex then makes a messagebox to tell the user an instance is already running and only when the user dismisses it does it kill the application? There are many different ways to ensure mutual exclusion via some shared resource, mutex, shared file, even maybe setting some registry key, the methods are unlimited.
The usual solution is to use some sort of a locking file. Under
traditional Unix, for example, the application will start by creating a
file (which will succeed even if the file exists), then try to create a
link to it (an atomic action); if that fails, the application will
immediately kill itself. Under Windows, the share mode of CreateFile
can be used to the same effect: open a file with share mode 0, and if
that fails, quit. (The Unix solution will leave the lock if the process
crashes, requiring it to be cleaned up manually. The Windows solution
will remove the lock if the system crashes.)
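The traditional Unix variant described above can be sketched roughly as follows, assuming a POSIX system. acquireLock and releaseLock are hypothetical helper names, and the temporary file name derived from the process ID is an assumption of this example. The private file is created first, then hard-linked to the lock name; link() fails if the lock name already exists.

```cpp
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Tries to take the lock `lockPath` by hard-linking a private temp file to it.
// Returns true if this process now holds the lock, false otherwise.
bool acquireLock(const std::string& lockPath) {
    const std::string tempPath = lockPath + "." + std::to_string(getpid());
    // Creating the private file succeeds even if it already exists.
    int fd = open(tempPath.c_str(), O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        return false;
    }
    close(fd);
    // link() is atomic and fails (EEXIST) if lockPath is already present.
    const bool acquired = (link(tempPath.c_str(), lockPath.c_str()) == 0);
    unlink(tempPath.c_str());  // the private name is no longer needed
    return acquired;
}

// Removes the lock; a crashed process never gets here, so the lock stays behind.
void releaseLock(const std::string& lockPath) {
    unlink(lockPath.c_str());
}
```

An application would call acquireLock() at startup and exit immediately if it returns false. As the answer notes, a stale lock left by a crash has to be cleaned up by other means.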
You may use mutexes... I do such a check with the following code:
bool insureApplicationUniqueness(HANDLE& mutexHandle)
{
    mutexHandle = CreateMutexW(NULL, true, UNIQUE_INSTANCE_MUTEX_NAME);
    if (mutexHandle && (ERROR_ALREADY_EXISTS == GetLastError()))
    {
        // Another instance already created the mutex.
        CloseHandle(mutexHandle);
        return false;
    }
    return true;
}
But this only works for an application whose source code is yours and which checks whether another instance of itself is running.
The problem with the notion is that in common environments, there is no explicit static data that determines whether an application is single-instance. You only have behavior to go on, but you cannot fully test behavior.
What if you have an app that is multi-instance, but will fail to open a file that's already open? If you test it twice with the same, valid filename, it would create only a single process, but any other command line argument would cause two processes to exist. Is this a single-instance program?
You could even argue that "single instance" isn't a well-defined category of programs for this reason.

Flash Player Unresponsive on Remoting Error - AS2

We have an application that was developed with Flash, AS2 and ColdFusion backend (remoting). I observed that when there was a database query failure, and that came in to Flash, the _result handler will be called (instead of _status), and the player hangs with the infamous unresponsive / abort the script error.
Doing a trace on the result produces nothing. Trying to enumerate properties in the result also produces nothing.
That's very strange. Does anyone have any idea about what could be causing this / how to solve it?
Use the debug version of Flash Player in your browser if you don't use it already; most likely it will show an exception popup.
The second thing is to install http://amfexplorer.riaforge.org/ and see what the back-end sends, if anything.
If this doesn't help, try putting the result parsing code into a try-catch and see where it blows up the application:
try {
// statements
} catch (myErr) {
// statements
} finally {
// statements
}