How to terminate a program when it crashes? (which should just fail a unit test instead of getting stuck forever)

Our unit tests fire off child processes, and sometimes these child processes crash. When this happens, a Windows Error Reporting dialog pops up, and the process stays alive until this is manually dismissed. This of course prevents the unit tests from ever terminating.
How can this be avoided?
Here's an example dialog in Win7 with the usual settings:
If I disable the AeDebug registry key, the JIT debugging option goes away:
If I disable checking for solutions (the only thing I seem to have control over via the Control Panel), it looks like this, but the dialog still appears and still stops the program from dying until the user presses something. WerAddExcludedApplication is documented to also have this effect.

A summary from the answers by jdehaan and Eric Brown, as well as this question (see also this question):
N.B. These solutions may affect other error reporting as well, e.g. failure to load a DLL or open a file.
Option 1: Disable globally
Works globally on the entire user account or machine, which can be both a benefit and a drawback.
Set [HKLM|HKCU]\Software\Microsoft\Windows\Windows Error Reporting\DontShowUI to 1.
More info: WER settings.
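For example, a test harness could set the value programmatically before it spawns child processes. A minimal sketch (my own, not from the original answers), using the Win32 registry API with error handling trimmed:

#include <windows.h>

// Set HKCU\Software\Microsoft\Windows\Windows Error Reporting\DontShowUI = 1.
// Note: this changes error reporting for the whole user account.
void DisableWerUiForCurrentUser()
{
    HKEY key = nullptr;
    if (RegCreateKeyExW(HKEY_CURRENT_USER,
                        L"Software\\Microsoft\\Windows\\Windows Error Reporting",
                        0, nullptr, 0, KEY_SET_VALUE, nullptr, &key, nullptr) == ERROR_SUCCESS)
    {
        DWORD value = 1;
        RegSetValueExW(key, L"DontShowUI", 0, REG_DWORD,
                       reinterpret_cast<const BYTE*>(&value), sizeof(value));
        RegCloseKey(key);
    }
}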
Option 2: Disable for the application
Requires modification to the crashing program, described in documentation as best practice, unsuitable for a library function.
Call SetErrorMode: SetErrorMode(SetErrorMode(0) | SEM_NOGPFAULTERRORBOX); (or with SEM_FAILCRITICALERRORS). More info: Disabling the program crash dialog (explains the odd arrangement of calls).
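As a rough sketch of how this is typically placed (my reading of the call, not part of the original answer): the inner SetErrorMode(0) only retrieves the current mode, and the outer call writes it back with the dialog-suppressing flag added, so any flags already set are preserved.

#include <windows.h>

int main()
{
    // Inner call returns the current mode (and momentarily clears it);
    // outer call restores it with the crash dialog suppressed.
    // Optionally also OR in SEM_FAILCRITICALERRORS.
    SetErrorMode(SetErrorMode(0) | SEM_NOGPFAULTERRORBOX);
    // ... launch the child processes / run the code that may crash ...
    return 0;
}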
Option 2a: Disable for a function:
Requires modification to the crashing program, requires Windows 7/2008 R2 (desktop apps only) or higher, described in documentation as preferred to SetErrorMode, suitable for a thread-safe library function.
Call and reset SetThreadErrorMode:
DWORD OldThreadErrorMode = 0;
SetThreadErrorMode(SEM_FAILCRITICALERRORS, &OldThreadErrorMode);
…
SetThreadErrorMode(OldThreadErrorMode, NULL);
More info: not much available?
Option 3: Specify a handler
Requires modification to the crashing program.
Use SetUnhandledExceptionFilter to set your own structured exception handler that simply exits, probably with reporting and possibly an attempt at clean-up.
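A minimal sketch of such a handler (the filter name and the choice of TerminateProcess are my own, not mandated by the API):

#include <windows.h>
#include <cstdio>

// Last-chance filter: report the crash and terminate before WER gets involved.
LONG WINAPI ExitingCrashFilter(EXCEPTION_POINTERS* info)
{
    std::fprintf(stderr, "Unhandled exception 0x%08lX, terminating.\n",
                 static_cast<unsigned long>(info->ExceptionRecord->ExceptionCode));
    // TerminateProcess avoids running more code in a possibly corrupted process.
    TerminateProcess(GetCurrentProcess(), 1);
    return EXCEPTION_EXECUTE_HANDLER; // not reached
}

int main()
{
    SetUnhandledExceptionFilter(ExitingCrashFilter);
    // ... code that may crash ...
    return 0;
}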
Option 4: Catch as an exception
Requires modification to the crashing program. For .NET applications only.
Wrap all code into a global try/catch block. Specify the HandleProcessCorruptedStateExceptionsAttribute and possibly also the SecurityCriticalAttribute on the method catching the exceptions. More info: Handling corrupted state exceptions
Note: this might not catch crashes caused by the Managed Debugging Assistants; if so, these also need to be disabled in the application.
Option 5: Stop the reporting process
Works globally on the entire user account, but only for a controlled duration.
Kill the Windows Error Reporting process whenever it shows up:
var werKiller = new Thread(() =>
{
    while (true)
    {
        foreach (var proc in Process.GetProcessesByName("WerFault"))
            proc.Kill();
        Thread.Sleep(3000);
    }
});
werKiller.IsBackground = true;
werKiller.Start();
This is still not completely bullet-proof though, because a console application may crash via a different error message, apparently displayed by an internal function called NtRaiseHardError:

The only solution is to catch all exceptions at a very high level (for each thread) and terminate the application properly (or perform another action).
This is the only way to prevent the exception from escaping your app and activating WER.
Addition:
If the exception is something you do not expect to happen, you can use AssertNoThrow (NUnit) or its equivalent in another unit test framework to enclose the code firing the child processes. This way you would also get it into your unit test report. This is, in my opinion, the cleanest possible solution I can think of.
Addition2:
As the comments below show, I was mistaken: you cannot always catch the asynchronous exceptions; it depends on what the environment allows. In .NET some exceptions are prevented from being caught, which makes my idea worthless in this case...
For .NET: There are complicated workarounds involving the use of AppDomains, leading to an unload of an AppDomain instead of a crash of the whole application. Too bad...
http://www.bluebytesoftware.com/blog/PermaLink,guid,223970c3-e1cc-4b09-9d61-99e8c5fae470.aspx
http://www.develop.com/media/pdfs/developments_archive/AppDomains.pdf
EDIT:
I finally got it. With .NET 4.0 you can add the HandleProcessCorruptedStateExceptions attribute from System.Runtime.ExceptionServices to the method containing the try/catch block. This really worked! Maybe not recommended, but it works.
using System;
using System.Reflection;
using System.Runtime.InteropServices;
using System.Runtime.ExceptionServices;

namespace ExceptionCatching
{
    public class Test
    {
        public void StackOverflow()
        {
            StackOverflow();
        }

        public void CustomException()
        {
            throw new Exception();
        }

        public unsafe void AccessViolation()
        {
            byte b = *(byte*)(8762765876);
        }
    }

    class Program
    {
        [HandleProcessCorruptedStateExceptions]
        static void Main(string[] args)
        {
            Test test = new Test();
            try {
                //test.StackOverflow();
                test.AccessViolation();
                //test.CustomException();
            }
            catch
            {
                Console.WriteLine("Caught.");
            }
            Console.WriteLine("End of program");
        }
    }
}

Try setting
HKCU\Software\Microsoft\Windows\Windows Error Reporting\DontShowUI
to 1. (You can also set the same key in HKLM, but you need admin privs to do that.)
This should prevent WER from showing any UI.

Related

Using assert in main program logic of production code

Suppose I have this in a custom Stack implementation:
void Pop (Stack & s) {
    assert (!isEmpty(s));
    // implementation details of popping omitted
}
Suppose I want to catch bad client uses like popping an empty stack. "Assert" is great until you compile the production version and disable it via the NDEBUG flag. What's the most professional way of dying gracefully if you detect an unrecoverable error, assuming that you are going to turn off assertions eventually?
Yes, I know there are a bunch of options: use "exit"; change the function to return a C-like error code; use C++ exception handling; just ignore the offending operation (make bad calls into a no-op); implement a personal version of assert, called something else, that won't get turned off; etc.
What's the "most pro" thing to do here? I just want to die quickly, with a helpful message.
For fun, I created a vector instance from the standard library and popped it empty. This caused a seg fault, which might be acceptable for the standard library, but I want to catch and report such a problem before dying.
A self-defined assert macro with logging is often used in production code. For example,
#define MYCOMPANY_ASSERT_FATAL(expression, msg, ret_val) if (!(expression)) { logger.fatal(msg); return ret_val; }
#define MYCOMPANY_ASSERT_WARN(...) .....
#define MYCOMPANY_ASSERT_ERROR(...) .....
Basically, which severity an error should be handled with (exit, message to the user, and so on) depends on the layer of your application. In short, some_lib_func() usually cannot decide on its own to exit the process. That's why some_lib_func() should propagate the error information to its caller. For example,
// A library or middle-layer function cannot decide to end the application process on its own,
// so it only logs and returns the error information (true/false here) to the caller.
bool read_csv(double& val) {
    bool ret = parse_csv_from_file(val);
    MYCOMPANY_ASSERT_ERROR(ret, "failed to parse_csv_from_file", false);
    // return to caller with logging and error info
    ....
    ....
}
// Application layer: here we decide whether the error means continuing or showing a message to the user.
bool show_weather_in_foreign_country() {
    double val = 0;
    bool ret = read_csv(val);
    if (!ret) {
        show_error_message();
    }
    // in this case the read_csv error is trivial and you want to continue processing with a proper message
    ...
}
In this case, showing the weather is considered a trivial operation in your application and you don't want to exit the whole application on an error, so you show an appropriate message and continue. On the other hand, the case below is critical, despite using the same read_csv function.
// Application layer: here the same error has to be treated as fatal.
bool send_your_account_balance_in_securities_to_your_wife() {
    double val = 0;
    bool ret = read_csv(val);
    MYCOMPANY_ASSERT_FATAL(ret, "critical in my account balance", false);
    // in this application context, the failure of read_csv is critical and processing must not continue;
    // return to the caller, which should probably exit the application
    send_email_your_wife(val);
    // if you send $0 to your wife by mistake, she might die or kill you ...
    ...
}
Therefore, preparing a set of such macros for propagating errors and logging is very useful and keeps your code simple and safe; you then need to use them appropriately depending on your application context.
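To directly address the "die quickly, with a helpful message" part of the question, here is a minimal self-contained sketch of an assert-like macro that is never compiled out (the macro name is made up for illustration):

#include <cstdio>
#include <cstdlib>

// Always-on check: print a helpful message with location info, then die.
#define CHECK_OR_DIE(cond, msg)                                        \
    do {                                                               \
        if (!(cond)) {                                                 \
            std::fprintf(stderr, "FATAL: %s (%s:%d): %s\n",            \
                         #cond, __FILE__, __LINE__, (msg));            \
            std::abort();                                              \
        }                                                              \
    } while (0)

int main()
{
    int stackSize = 0;
    CHECK_OR_DIE(stackSize > 0, "Pop called on an empty stack"); // aborts here with a message
}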

weird behavior of global bool after porting to RAD11

After porting one huge project from BDS2006 to RAD11 (C++ VCL win32 classic compiler) I found out that one checkbox that should be unchecked at app start is always checked...
After breakpoint/tracing I localized the problem to this function:
void main_caption()
{
if (_simple_gfx) { _GLSL=0; Main->Caption=version+" Simplified graphics."; }
else if (_GLSL) Main->Caption=version+" GLSL";
else Main->Caption=version;
Main->ck_simple_gfx->Checked=_simple_gfx; // WTF here _simple_gfx is handled as true on RAD11 even if its not (even watch list shows false)
Main->ck_GLSL ->Enabled=!_simple_gfx;
Main->ck_GLSL ->Checked=_GLSL;
}
where _simple_gfx is a simple global bool set to false (confirmed also in the watch list), used to fall back to OpenGL 1.0 on questionable gfx hardware/drivers (which is not the case now), Main is my main window/form, and ck_simple_gfx is the checkbox that should not be checked (but it is checked after the assignment marked with the WTF comment).
If I reset the _simple_gfx=false; like this:
void main_caption()
{
if (_simple_gfx) { _GLSL=0; Main->Caption=version+" Simplified graphics."; }
else if (_GLSL) Main->Caption=version+" GLSL";
else Main->Caption=version;
_simple_gfx=false;
Main->ck_simple_gfx->Checked=_simple_gfx; // WTF here _simple_gfx is handled as true on RAD11 even if its not (even watch list shows false)
Main->ck_GLSL ->Enabled=!_simple_gfx;
Main->ck_GLSL ->Checked=_GLSL;
}
then it's unchecked as it should be. What is happening? main_caption is called from the Main constructor, if that matters...
I do not think it's related to the VCL checkbox itself, because if I do this:
if (_simple_gfx)
the code inside gets executed while _simple_gfx "is" still false. There is no conflict with the _simple_gfx identifier that I know of. If the same code (just a different project file) is compiled in the older IDE (BDS2006 C++ Turbo Explorer), it works as it should.
Changing the bool _simple_gfx to static and/or volatile did not change anything.
Does anyone know what kind of bug this is and how to work around it?
Did global variable behavior change in newer versions of C++ Builder? (similar to GC compilers' buggy behavior with non-volatile global variables on MCU platforms)
Sorry, I did not create an MCVE, as the project is really huge, which is most likely the root cause of a problem like this.

Instantiating boost::beast in dynamic library causes a crash

I'm trying to implement a very simple, local HTTP server for my C++ application; I'm using Xcode on macOS. I have to implement it from within a dynamically loaded library rather than the "main" thread of the program. I decided to try using boost::beast since another part of the application uses boost libraries already. I'm trying to implement this example, but within the context of my library, and not as part of its main program.
The host application for this library calls on the following function to start a localhost server, but crashes when instantiating "acceptor":
extern "C" BASICEXTERNALOBJECT_API long startLocalhost(TaggedData* argv, long argc, TaggedData * retval) {
try {
string status;
retval->type = kTypeString;
auto const address = net::ip::make_address("127.0.0.1");
unsigned short port = static_cast<unsigned short>(std::atoi("1337"));
net::io_context ioc{1};
tcp::acceptor acceptor{ioc, {address, port}}; // <-- crashes on this line
tcp::socket socket{ioc};
http_server(acceptor, socket);
ioc.run();
status = "{'status':'ok', 'message':'localhost server started!'}";
retval->data.string = getNewBuffer(status);
}
catch(std::exception const& e)
{
string status;
//err_msg = "Error: " << e.what() << std::endl;
status = "{'status':'fail', 'message':'Error starting web server'}";
retval->data.string = getNewBuffer(status);
}
return kESErrOK;
}
When stepping through the code, I see that Xcode reports an error when the line with tcp::acceptor ... is executed:
Thread 1: EXC_BAD_ACCESS (code=1, address=0x783c0a3e3f22650c)
and it highlights a single line of code in a function in scheduler.h:
// Get the concurrency hint that was used to initialize the scheduler.
int concurrency_hint() const
{
    return concurrency_hint_; // Xcode halts here
}
I'm debating whether I should use a different C++ web server, like Drogon, instead of boost::beast, but I thought I would post here to see if anybody has any insight as to why the crash is happening in this case.
Update
I found a fix that works around my particular circumstances; hopefully it can help others running into this issue.
The address of the service_registry::create static factory method resolves correctly when I add ASIO_DECL in front of the method's declaration in asio/detail/service_registry.hpp.
It should look like this:
// Factory function for creating a service instance.
template <typename Service, typename Owner>
ASIO_DECL static execution_context::service* create(void* owner);
By adding ASIO_DECL in front of it, the address resolves correctly and the scheduler and kqueue_reactor objects initialize properly, avoiding the bad access in concurrency_hint().
In my case I am trying to use non-Boost ASIO inside a VST3 audio plug-in running in Ableton Live 11 on macOS on an M1 processor. Using the VST3 plug-in there, I'm getting this same crash. Using the same plug-in in other DAW applications, such as Reaper, does not cause the crash. It also does not occur in Ableton Live 11 on Windows.
I've got it narrowed down to the following issue:
In asio/detail/impl/service_registry.hpp the following method attempts to return a function pointer address to a create/factory method.
template <typename Service>
Service& service_registry::use_service(io_context& owner)
{
    execution_context::service::key key;
    init_key<Service>(key, 0);
    factory_type factory = &service_registry::create<Service, io_context>;
    return *static_cast<Service*>(do_use_service(key, factory, &owner));
}
Specifically, this line: factory_type factory = &service_registry::create<Service, io_context>;
When debugging in Xcode in the hosts that work, inspecting factory shows the correct address linking to the service_registry::create<Service, io_context> static method.
However, in Ableton Live 11, it doesn't point to anything: somehow the address of the static method does not resolve correctly. This causes a cascade of issues, ultimately leading to invoking the factory function pointer in asio/asio/detail/impl/service_registry.ipp in the method service_registry::do_use_service. Since it doesn't point to a proper create method, nothing is created, which results in uninitialized objects, including the scheduler instance.
Therefore, when calling scheduler_.concurrency_hint() in kqueue_reactor.ipp the scheduler is uninitialized, and the EXC_BAD_ACCESS error results.
It's unclear to me why dynamically loading the plug-in cannot resolve the static method address under some host processes, while others have no problem. In my case I compiled standalone ASIO (asio.hpp) into the plug-in directly; there was no separate linking.
The best guesses I can come up with are:
Maybe your http_server starts additional threads or even forks. This might cause io_context and friends to be accessed after startLocalhost has returned. To explain the crash location appearing to be at the indicated line, I could add the heuristic guess that something is already off during the destructor of ioc.
The only other idea I have is that the opening/binding of the acceptor actually throws, but due to possible incompatibilities of types between the shared module and the main program, the thrown exception is not actually caught and causes abnormal termination. This can happen more easily if the main program also uses Boost libraries, but a different copy (build/version) of them.
In that case there's a simple thing you can do: split up the initialization and use the overloads that take an error_code instead.
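A rough sketch of that suggestion, reusing the question's net/tcp aliases (the helper function name is made up); a failure then comes back as an error_code that the library can report to its host, instead of an exception that has to cross the library boundary:

#include <boost/asio.hpp>

namespace net = boost::asio;   // aliases as in the question
using tcp = net::ip::tcp;

bool open_acceptor(tcp::acceptor& acceptor, unsigned short port)
{
    boost::system::error_code ec;
    auto const address = net::ip::make_address("127.0.0.1");

    acceptor.open(tcp::v4(), ec);
    if (!ec) acceptor.bind(tcp::endpoint{address, port}, ec);
    if (!ec) acceptor.listen(net::socket_base::max_listen_connections, ec);

    if (ec)
    {
        // Report ec.message() back to the host instead of relying on an
        // exception propagating across the shared-library boundary.
        return false;
    }
    return true;
}

The acceptor is constructed with just the io_context (tcp::acceptor acceptor{ioc};), so construction itself cannot fail for network reasons; open, bind, and listen then each report their result through ec.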

Task continuation with context "use_current" does not work

I have been looking for an answer to this for three days. Either I have done something fundamentally wrong (and there's an obvious mistake), or the thing is too new to have any references; I can't seem to figure out why a simple case like this would fail.
The following code uses the PPL task library in C++ in a Windows Store application, simulating a file loading operation that takes 2 seconds before breaking out of the loop (of course this is to illustrate the problem with minimal code; the real loop also does other rendering to show progress).
The continuation part of the code (i.e. "fileLoaded = true") never gets called if I use "use_current" as the continuation context:
bool fileLoaded = false;
while (!fileLoaded)
{
    concurrency::task<void>([this]()
    {
        // Simulate file load delay
        concurrency::wait(2000);
    }).then([this, &fileLoaded]()
    {
        fileLoaded = true; // This never gets executed!
        // If the following is changed to "use_default" or
        // "use_arbitrary", then this continuation gets called.
    }, concurrency::task_continuation_context::use_current());

    concurrency::wait(50);
}
The same code works if I use "use_default" or "use_arbitrary", and properly sets "fileLoaded" to "true". This code can be placed anywhere in a Windows Store C++ app (e.g. a Direct2D app), and it would fail (I have placed it in the "DirectXPage::DirectXPage" constructor body, which I expect to be on the main UI thread). Did I do something horribly wrong?
Thanks in advance for your help! :)
Since you are calling .then() on the UI thread, use_current() will cause the continuation to be scheduled for execution on the UI thread.
However, that continuation cannot run until the UI thread is free (i.e., when it is not doing any work). But, in your example, the UI thread is never free: the DirectXPage constructor is running on the UI thread. The DirectXPage constructor will not return until the continuation executes and the continuation cannot execute until the DirectXPage constructor returns.
You need to allow the constructor to return so that the UI thread is free to do other work (like execute the continuation).
Also note that if you are using fileLoaded for cross-thread communication, you need to use an std::atomic<bool> or some other proper synchronization object. A simple bool is insufficient for synchronization.
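A sketch of what that can look like (the names are illustrative, not from the question): start the load, return from the constructor immediately, and let the use_current() continuation run once the UI thread is free to pump its queue again.

#include <ppltasks.h>
#include <atomic>

std::atomic<bool> fileLoaded{false};   // safe to read from other threads

void StartFileLoad()                   // e.g. called from the page constructor
{
    concurrency::create_task([]
    {
        concurrency::wait(2000);       // simulate the file load delay
    }).then([]
    {
        fileLoaded = true;             // runs on the UI thread once it is free
    }, concurrency::task_continuation_context::use_current());

    // No while loop here: return so the UI thread can process its message queue;
    // otherwise the use_current() continuation can never be dispatched.
}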

How can I catch an invalid fgetpos call as a C++ exception on Windows?

In Visual C++ 2008, I want to "catch" an exception generated as shown here:
try {
    int foo = 20;
    ::fgetpos(0, (fpos_t*)&foo);
}
//...
Here are adjustments I've made to attempt a successful catch:
SEH is activated (/EHa)
I've added a catch(...)
I've added a _set_se_translator handler.
I've added/adjusted to SEH syntax: __try / __except(EXCEPTION_EXECUTE_HANDLER)
In short, I've tried "everything in the book" and I still can't catch the exception. If I replace the call to ::fgetpos with int hey = foo / 0 then suddenly all of the above techniques work as expected. So the exception I'm dealing with from ::fgetpos is somehow "extra special."
Can someone explain why this ::fgetpos error seems uncatchable, and how to work around it?
Update: When executed in the VS IDE, the output window doesn't name an exception. All it says is this:
Microsoft Visual Studio C Runtime Library has detected a fatal error in MyProgram.exe.
Not very helpful. When I run the console app from the command line, I get a crash dialogue. The "problem details" section of the dialogue includes this information:
Problem Event Name: BEX
Exception Offset:0002fd30
Exception Code: c0000417
Exception Data: 00000000
Additional Information 1:69ad
Additional Information 2:69addfb19767b2221c8e3e7a5cd2f4ae
Additional Information 3:b1ff
Additional Information 4:b1ffca30cadddc78c19f19b6d150997f
Since the code in your dump corresponds to STATUS_INVALID_CRUNTIME_PARAMETER, try _set_invalid_parameter_handler.
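A sketch of that approach (assuming /EHa as in the question, so the throw can unwind out of the CRT call): install a handler that turns the invalid-parameter report into an ordinary C++ exception.

#include <cstdio>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>

// Called by the CRT instead of terminating; the wide-string arguments are
// only filled in by debug builds of the CRT.
void OnInvalidParameter(const wchar_t* /*expression*/, const wchar_t* /*function*/,
                        const wchar_t* /*file*/, unsigned int /*line*/,
                        uintptr_t /*reserved*/)
{
    throw std::invalid_argument("invalid parameter passed to a CRT function");
}

int main()
{
    _set_invalid_parameter_handler(OnInvalidParameter);
    try {
        int foo = 20;
        ::fgetpos(0, (fpos_t*)&foo);   // the same invalid call as in the question
    }
    catch (std::exception const& e) {
        std::printf("caught: %s\n", e.what());
    }
    return 0;
}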
Most likely, the runtime catches it for you and issues a debug dialog without returning or propagating the exception: that is a CRT call, and they may add whatever exception-catching code in there they like. It's well within Visual Studio's rights to catch a hardware exception inside a library function, especially if you are running from within the IDE or in debug mode; then it is expected of the runtime.
Of course, when you divide by zero, there is no library call there to add that extra catching code.