Floating point stack handling with floating-point exceptions turned on - c++

I'm running into an issue with floating point exceptions turned on in Visual Studio 2005. If I have code like this:
double d = 0.0;
double d2 = 3.0;
double d3 = d2/d;
and if I register an SEH handler routine, then I can easily turn the div-by-zero into a C++ exception and catch it. So far so good.
However, when I do this, the first operand (0.0 in the example above) is left on the FPU register stack. If I do it eight times, then I'll start getting a floating point stack check exception with EVERY floating point operation from then on.
I can deal with this using an __asm block to execute a FSTP, thus popping the stray value off the stack, and everything is fine.
However, this worries me, because I haven't seen this discussed anywhere. How can I be certain about the number of values that I should pop? Is it safe to just pop everything that's on the stack at the time of the exception? Are there any recommended best practices in this area?
Thanks!

While I can't find anything either, I can give some explanation as to the likely answer:
The ABI defines that upon a function call the stack should be empty, and it should be empty again on exit unless the return is a floating point value, where it would be the only item on the stack.
Since the exception handler must be able to return to any place, some criteria must hold on those locations. The question here is, does the stack unwinder have any knowledge of the FPU stack of the function having the catch()? Most likely, the answer is no because it is easier and faster to create a suitable return point with fixed properties than to include the full FPU stack in the unwinding.
Which leads to your problem - normally raising an exception has the compiler take care of the FPU being empty, but in an SEH handler, the compiler has no clue it caused an entry into another function, and thus can't take care of things just in case. (other than it being again hideously slow)
Which means that most likely, the FPU stack should be in its "consistent" state upon return, which means you probably want an equivalent of an EMMS instruction.
Why EMMS? Well, unless it is not supported it does the following:
clear the stack (which fixes all leftover floating point arguments)
clears the stack tags (which fixes an unusable stack when exiting from an MMX-enabled function)
If you want to support Pentium 1 or worse, you may of course if() around the EMMS and use something else instead.
No guarantees of course, but I hope I explained the why of the likely answer sufficiently.

Related

floating point exceptions in LINUX -- how to convert them into something useful

I've got some piece of code calculating a vector of functions of a vector of independent variables (and parameters):
struct instance
{ double m_adParam1, m_adParam2;
std::array<double, OUTSIZE> calculate(const std::array<double, INSIZE>&x) const;
};
When any part of this code fails due to a division by (nearly) zero or due to a domain error, the entire code should fail and tell the caller, that it cannot calculate this function for the given input.
This should also apply for calculated values used for comparison only. So testing the result vector for NAN or infinity is not sufficient. Inserting tests after every fallible operation is infeasible.
How could one do this?
I read about fetestexcept(). But it is unclear what is resetting this state flag.
SIGFPE is useless for anything other than debugging.
Having the ability to convert this into C++ exceptions as on Windows would be great, but inserting a final check would be sufficient as well.
Inserting code to check for invalid inputs for the following code is also infeasible, as it would slow down the calculation dramatically.
I read about fetestexcept(). But it is unclear what is resetting this state flag.
That is what feclearexcept() does. So you can do something like
std::feclearexcept(FE_INVALID | FE_OVERFLOW | FE_DIVBYZERO);
// do lots of calculations
// now check for exceptions
if (std::fetestexcept(FE_INVALID))
throw std::domain_error("argh");
else if (std::fetestexcept(FE_OVERFLOW))
throw std::overflow_error("eek");
// ...
The exception flags are "sticky" and no other operation, besides an explicit feclearexcept(), should clear them.
The main benefits of using feenablexcept() to force a SIGFPE are that you would detect the error immediately, without waiting until you happen to get around to testing the exception flag, and that you could pinpoint the exact instruction that caused the exception. It doesn't sound like either of those are needed in your use case.

Signaling or catching 'nan' as they occur in computations in numerical code base in c++

We have numerical code written in C++. Rarely but under certain specific inputs, some of the computations result in an 'nan' value.
Is there a standard or recommended method by which we can stop and alert the user when a certain numerical calculation results in an 'nan' being generated? (under debug mode).Checking for each result if it is equal to 'nan' seems impractical given the huge sizes of matrices and vectors.
How do standard numerical libraries handle this situation? Could you throw some light on this?
NaN is propagated, when applied to a numeric operation. So, it is enough to check the final result for being a NaN. As for, how to do it -- if building for >= C++11, there is std::isnan, as Goz noticed. For < C++11 - if want to be bulletproof - I would personally do bit-checking (especially, if there may be an optimization involved). The pattern for NaN is
? 11.......1 xx.......x
sign bit ^ ^exponent^ ^fraction^
where ? may be anything, and at least one x must be 1.
For platform dependent solution, there seams to be yet another possibility. There is the function feenableexcept in glibc (probably with the signal function and the compiler option -fnon-call-exceptions), which turns on a generation of the SIGFPE sinals, when an invalid floating point operation occure. And the function _control87 (probably with the _set_se_translator function and compiler option /EHa), which allows pretty much the same in VC.
Although this is a nonstandard extension originally from glibc, on many systems you can use the feenableexcept routine declared in <fenv.h> to request that the machine trap particular floating-point exceptions and deliver SIGFPE to your process. You can use fedisableexcept to mask trapping, and fegetexcept to query the set of exceptions that are unmasked. By default they are all masked.
On older BSD systems without these routines, you can use fpsetmask and fpgetmask from <ieeefp.h> instead, but the world seems to be converging on the glibc API.
Warning: glibc currently has a bug whereby (the C99 standard routine) fegetenv has the unintended side effect of masking all exception traps on x86, so you have to call fesetenv to restore them afterward. (Shows you how heavily anyone relies on this stuff...)
On many architectures, you can unmask the invalid exception, which will cause an interrupt when a NaN would ordinarily be generated by a computation such as 0*infinity. Running in the debugger, you will break on this interrupt and can examine the computation that led to that point. Outside of a debugger, you can install a trap handler to log information about the state of the computation that produced the invalid operation.
On x86, for example, you would clear the Invalid Operation Mask bit in FPCR (bit 0) and MXCSR (bit 7) to enable trapping for invalid operations from x87 and SSE operations, respectively.
Some individual platforms provide a means to write to these control registers from C, but there's no portable interface that works cross-platform.
Testing f!=f might give problems using g++ with -ffast-math optimization enabled: Checking if a double (or float) is NaN in C++
The only foolproof way is to check the bitpattern.
As to where to implement the checks, this is really dependent on the specifics of your calculation and how frequent Nan errors are i.e. performance penalty of continuing tainted calculations versus checking at certain stages.

Handling Floating-Point exceptions in C++

I'm finding the floating-point model/error issues quite confusing. It's an area I'm not familiar with and I'm not a low level C/asm programmer, so I would appreciate a bit of advice.
I have a largish C++ application built with VS2012 (VC11) that I have configured to throw floating-point exceptions (or more precisely, to allow the C++ runtime and/or hardware to throw fp-exceptions) - and it is throwing quite a lot of them in the release (optimized) build, but not in the debug build. I assume this is due to the optimizations and perhaps the floating-point model (although the compiler /fp:precise switch is set for both the release and debug builds).
My first question relates to managing the debugging of the app. I want to control where fp-exceptions are thrown and where they are "masked". This is needed because I am debugging the (optimized) release build (which is where the fp-exceptions occur) - and I want to disable fp-exceptions in certain functions where I have detected problems, so I can then locate new FP problems. But I am confused by the difference between using _controlfp_s to do this (which works fine) and the compiler (and #pragma float_control) switch "/fp:except" (which seems to have no effect). What is the difference between these two mechanisms? Are they supposed to have the same effect on fp exceptions?
Secondly, I am getting a number of "Floating-point stack check" exceptions - including one that seems to be thrown in a call to the GDI+ dll. Searching around the web, the few mentions of this exception seem to indicate it is due to compiler bugs. Is this generally the case? If so, how should I work round this? Is it best to disable compiler optimizations for the problem functions, or to disable fp-exceptions just for the problematic areas of code if there don't appear to be any bad floating-point values returned? For example, in the GDI+ call (to GraphicsPath::GetPointCount) that throws this exception, the actual returned integer value seems correct. Currently I'm using _controlfp_s to disable fp-exceptions immediately prior to the GDI+ call – and then use it again to re-enable exceptions directly after the call.
Finally, my application does make a lot of floating-point calculations and needs to be robust and reliable, but not necessarily hugely accurate. The nature of the application is that the floating-point values generally indicate probabilities, so are inherently somewhat imprecise. However, I want to trap any pure logic errors, such as divide by zero. What is the best fp model for this? Currently I am:
trapping all fp exceptions (i.e. EM_OVERFLOW | EM_UNDERFLOW | EM_ZERODIVIDE | EM_DENORMAL | EM_INVALID) using _controlfp_s and a SIGFPE Signal handler,
have enabled the denormals-are-zero (DAZ) and flush-to-zero (FTZ) (i.e. _MM_SET_FLUSH_ZERO_MODE(_MM_DENORMALS_ZERO_ON)), and
I am using the default VC11 compiler settings /fp:precise with /fp:except not specified.
Is this the best model?
Thanks and regards!
Most of the the following information comes from Bruce Dawson's blog post on the subject (link).
Since you're working with C++, you can create a RAII class that enables or disables floating point exceptions in a scoped manner. This lets you have greater control so that you're only exposing the exception state to your code, rather than manually managing calling _controlfp_s() yourself. In addition, floating point exception state that is set this way is system wide, so it's really advisable to remember the previous state of the control word and restore it when needed. RAII can take care of this for you and is a good solution for the issues with GDI+ that you're describing.
The exception flags _EM_OVERFLOW, _EM_ZERODIVIDE, and _EM_INVALID are the most important to account for. _EM_OVERFLOW is raised when positive or negative infinity is the result of a calculation, whereas _EM_INVALID is raised when a result is a signaling NaN. _EM_UNDERFLOW is safe to ignore; it signals when your computation result is non-zero and between -FLT_MIN and FLT_MIN (in other words, when you generate a denormal). _EM_INEXACT is raised too frequently to be of any practical use due to the nature of floating point arithmetic, although it can be informative if trying to track down imprecise results in some situations.
SIMD code adds more wrinkles to the mix; since you don't indicate using SIMD explicitly I'll leave out a discussion of that except to note that specifying anything other than /fp:fast can disable automatic vectorization of your code in VS 2012; see this answer for details on this.
I can't help much with the first two questions, but I have experience and a suggestion for the question about masking FPU exceptions.
I've found the functions
_statusfp() (x64 and Win32)
_statusfp2() (Win32 only)
_fpreset()
_controlfp_s()
_clearfp()
_matherr()
useful when debugging FPU exceptions and in delivering a stable and fast product.
When debugging, I selectively unmask exceptions to help isolate the line of code where an fpu exception is generated in a calculation where I cannot avoid calling other code that unpredictably generates fpu exceptions (like the .NET JIT's divide by zeros).
In released product I use them to deliver a stable program that can tolerate serious floating point exceptions, detect when they occur, and recover gracefully.
I mask all FPU exceptions when I have to call code that cannot be changed,does not have reliable exception handing, and occasionally generates FPU exceptions.
Example:
#define BAD_FPU_EX (_EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID)
#define COMMON_FPU_EX (_EM_INEXACT | _EM_UNDERFLOW | _EM_DENORMAL)
#define ALL_FPU_EX (BAD_FPU_EX | COMMON_FPU_EX)
Release code:
_fpreset();
Use _controlfp_s() to mask ALL_FPU_EX
_clearfp();
... calculation
unsigned int bad_fpu_ex = (BAD_FPU_EX & _statusfp());
_clearfp(); // to prevent reacting to existing status flags again
if ( 0 != bad_fpu_ex )
{
... use fallback calculation
... discard result and return error code
... throw exception with useful information
}
Debug code:
_fpreset();
_clearfp();
Use _controlfp_s() to mask COMMON_FPU_EX and unmask BAD_FPU_EX
... calculation
"crash" in debugger on the line of code that is generating the "bad" exception.
Depending on your compiler options, release builds may be using intrinsic calls to FPU ops and debug builds may call math library functions. These two methods can have significantly different error handling behavior for invalid operations like sqrt(-1.0).
Using executables built with VS2010 on 64-bit Windows 7, I have generated slightly different double precision arithmetic values when using identical code on Win32 and x64 platforms. Even using non-optimized debug builds with /fp::precise, the fpu precision control explicitly set to _PC_53, and the fpu rounding control explicitly set to _RC_NEAR. I had to adjust some regression tests that compare double precision values to take the platform into account. I don't know if this is still an issue with VS2012, but heads up.
I've been struggling for achieving some information about handling floating point exceptions on linux and I can tell you what I learned:
There are a few ways of enabling the exception mechanism:
fesetenv (FE_NOMASK_ENV); enables all exceptions
feenableexcept(FE_ALL_EXCEPT );
fpu_control_t fw;
_FPU_GETCW(fw);
fw |=FE_ALL_EXCEPT;
_FPU_SETCW(fw);
4.
> fenv_t envp; include bits/fenv.h
> fegetenv(&envp);
envp.__control_word |= ~_FPU_MASK_OM;
> fesetenv(&envp);
5.
> fpu_control_t cw;
> __asm__ ("fnstcw %0" : "=m" (*&cw));get config word
>cw |= ~FE_UNDERFLOW;
> __asm__ ("fldcw %0" : : "m" (*&cw));write config word
6.C++ mode: std::feclearexcept(FE_ALL_EXCEPT);
There are some useful links :
http://frs.web.cern.ch/frs/Source/MAC_headers/fpu_control.h
http://en.cppreference.com/w/cpp/numeric/fenv/fetestexcept
http://technopark02.blogspot.ro/2005/10/handling-sigfpe.html

Debug stack corruption

Now I am debugging a large project, which has a stack corruption: the application fails.
I would like to know how to find (debug) such stack corruption code with Visual Studio 2010?
Here's an example of some code which causes stack problems, how would I find less obvious cases of this type of corruption?
void foo()
{
int i = 10;
int *p = &i;
p[-2] = 100;
}
Update
Please note that this is just an example. I need to find such bad code in the current project.
There's one technique that can be very effective with these kinds of bugs, but it'll only work on a subset of them that has a few characteristics:
the corrupting value must be stable (ie., as in your example, when the corruption occurs, it's always 100), or at least something that can be readily identified in a simple expression
the corruption has to occur at a particular address on the stack
the corrupting value is unusual enough that you won't be hit with a slew of false positives
Note that the second condition may seem unlikely at first glance because the stack can be used in so many different ways depending on the runtime actions. However, stack usage is generally pretty deterministic. The problem is that a particular stack location can be used for so many different things that the problem is really item #3.
Anyway, if your bug has these characteristics, you should identify the stack address (or one of them) that gets corrupted, then set a memory breakpoint for a write to that address with a condition that causes it to break only if the value written is the corrupting value. In visual Studio, you can do this by creating a "New Data Breakpoint..." in the Breakpoints window then right clicking the breakpoint to set the condition.
If you end up getting too many false positives, it might help to narrow the scope of the breakpoint by leaving it disabled until some point in the execution path that's closer to the bug (if you can identify such a time), or set the hit count high enough to remove most of the false positives.
An additional complication is the address of the stack may change from run to run - in this case, you'll have to take care to set the breakpoint on each run (the lower bits of the address should be the same).
I believe your Questions quotes an example of stack corruption and the question you are asking is not why it crashes.
If it is so, It crashes because it creates an Undefined Behavior because the index -2 points to an unknown memory location.
To answer the question on profiling your application:
You can use Rational Purify Plus for Visual studio to check for memory overrites and access errors.
This is UB: p[-2] = 100;
You can access p with the operator[] in this(p[i]) way, but in this case i is an invalid value. So p[-2] points to an invalid memory location and causes Undefined Behaviour.
To find it you should debug your app and find where it crashes, and hopefully, it'll be at a place where something is actually wrong.

Why and when should one call _fpreset( )?

The only documentation I can find (on MSDN or otherwise) is that a call to _fpreset() "resets the floating-point package." What is the "floating point package?" Does this also clear the FPU status word? I see documentation that says to call _fpreset() when recovering from a SIGFPE, but doesn't _clearfp() do this as well? Do I need to call both?
I am working on an application that unmasks some FP exceptions (using _controlfp()). When I want to reset the FPU to the default state (say, when calling to .NET code), should I just call _clearfp(), _fpreset(), or both. This is performance critical code, so I don't want to call both if I don't have to...
_fpreset() resets the state of the floating-point unit. It resets the FPU precision to its default and clears the FPU status word. The two occasitions I see to use it are when recovering from an FPE (as you said) and when getting control back from library code (e.g. a DLL that you have no control about) that has screwed the FPU in any way, like changing the precision.