Why and when should one call _fpreset( )? - c++

The only documentation I can find (on MSDN or otherwise) is that a call to _fpreset() "resets the floating-point package." What is the "floating point package?" Does this also clear the FPU status word? I see documentation that says to call _fpreset() when recovering from a SIGFPE, but doesn't _clearfp() do this as well? Do I need to call both?
I am working on an application that unmasks some FP exceptions (using _controlfp()). When I want to reset the FPU to the default state (say, when calling to .NET code), should I just call _clearfp(), _fpreset(), or both. This is performance critical code, so I don't want to call both if I don't have to...

_fpreset() resets the state of the floating-point unit. It resets the FPU precision to its default and clears the FPU status word. The two occasitions I see to use it are when recovering from an FPE (as you said) and when getting control back from library code (e.g. a DLL that you have no control about) that has screwed the FPU in any way, like changing the precision.

Related

Is it necessary to save the FPU state here?

I wrote a simple cooperative multi-threading library. Currently I always save and restore the fpu state with fxsave / fxrstor when switching to a new context. But is this necessary in the cdecl calling convention?
As a simple example:
float thread_using_fpu(float x)
{
float y = x / 2; // do some fpu operation
yield(); // context switch, possibly altering fpu state.
y = y / 2; // another fpu operation
return y;
}
May the compiler make any assumptions about the FPU state after the call to yield()?
As per the The SYSTEM V APPLICATION BINARY INTERFACE Intel386TM Architecture Processor Supplement, page 3-12:
%st(0): If the function does not return a floating-point value, then this
register must be empty. This register must be empty before
entry to a function.
%st(1) through %st(7):
Floating-point scratch registers have no specified role in the
standard calling sequence. These registers must be empty before entry
and upon exit from a function.
Thus, you do not need to context switch them.
Another, newer version says this:
The CPU shall be in x87 mode upon entry to a function. Therefore, every function that uses the MMX registers is required to issue an
emms or femms instruction after using MMX registers, before returning
or calling another function. [...]
The control bits of the MXCSR register are callee-saved (preserved across calls), while the status bits are caller-saved (not preserved).
The x87 status word register is caller-saved, whereas the x87 control
word is callee-saved.
[...] All x87 registers are caller-saved, so callees that make use of the MMX registers may use the faster femms instruction.
So, you may need to save the control word.
No. You don't have to do any saving of the state. If one thread is in the middle of a floating point calculation where there is, for example, a denormalized flag set, and that thread is interrupted, then when it resumes the O/S or kernel will set the flags, just like it will restore other registers. Likewise, you don't have to worry about it in a yield().
Edit: If you are doing your own context switching, it is possible you would need to save the precision and rounding control flags if you need to set them to non-default values. Otherwise, again you're fine.

Signaling or catching 'nan' as they occur in computations in numerical code base in c++

We have numerical code written in C++. Rarely but under certain specific inputs, some of the computations result in an 'nan' value.
Is there a standard or recommended method by which we can stop and alert the user when a certain numerical calculation results in an 'nan' being generated? (under debug mode).Checking for each result if it is equal to 'nan' seems impractical given the huge sizes of matrices and vectors.
How do standard numerical libraries handle this situation? Could you throw some light on this?
NaN is propagated, when applied to a numeric operation. So, it is enough to check the final result for being a NaN. As for, how to do it -- if building for >= C++11, there is std::isnan, as Goz noticed. For < C++11 - if want to be bulletproof - I would personally do bit-checking (especially, if there may be an optimization involved). The pattern for NaN is
? 11.......1 xx.......x
sign bit ^ ^exponent^ ^fraction^
where ? may be anything, and at least one x must be 1.
For platform dependent solution, there seams to be yet another possibility. There is the function feenableexcept in glibc (probably with the signal function and the compiler option -fnon-call-exceptions), which turns on a generation of the SIGFPE sinals, when an invalid floating point operation occure. And the function _control87 (probably with the _set_se_translator function and compiler option /EHa), which allows pretty much the same in VC.
Although this is a nonstandard extension originally from glibc, on many systems you can use the feenableexcept routine declared in <fenv.h> to request that the machine trap particular floating-point exceptions and deliver SIGFPE to your process. You can use fedisableexcept to mask trapping, and fegetexcept to query the set of exceptions that are unmasked. By default they are all masked.
On older BSD systems without these routines, you can use fpsetmask and fpgetmask from <ieeefp.h> instead, but the world seems to be converging on the glibc API.
Warning: glibc currently has a bug whereby (the C99 standard routine) fegetenv has the unintended side effect of masking all exception traps on x86, so you have to call fesetenv to restore them afterward. (Shows you how heavily anyone relies on this stuff...)
On many architectures, you can unmask the invalid exception, which will cause an interrupt when a NaN would ordinarily be generated by a computation such as 0*infinity. Running in the debugger, you will break on this interrupt and can examine the computation that led to that point. Outside of a debugger, you can install a trap handler to log information about the state of the computation that produced the invalid operation.
On x86, for example, you would clear the Invalid Operation Mask bit in FPCR (bit 0) and MXCSR (bit 7) to enable trapping for invalid operations from x87 and SSE operations, respectively.
Some individual platforms provide a means to write to these control registers from C, but there's no portable interface that works cross-platform.
Testing f!=f might give problems using g++ with -ffast-math optimization enabled: Checking if a double (or float) is NaN in C++
The only foolproof way is to check the bitpattern.
As to where to implement the checks, this is really dependent on the specifics of your calculation and how frequent Nan errors are i.e. performance penalty of continuing tainted calculations versus checking at certain stages.

Handling Floating-Point exceptions in C++

I'm finding the floating-point model/error issues quite confusing. It's an area I'm not familiar with and I'm not a low level C/asm programmer, so I would appreciate a bit of advice.
I have a largish C++ application built with VS2012 (VC11) that I have configured to throw floating-point exceptions (or more precisely, to allow the C++ runtime and/or hardware to throw fp-exceptions) - and it is throwing quite a lot of them in the release (optimized) build, but not in the debug build. I assume this is due to the optimizations and perhaps the floating-point model (although the compiler /fp:precise switch is set for both the release and debug builds).
My first question relates to managing the debugging of the app. I want to control where fp-exceptions are thrown and where they are "masked". This is needed because I am debugging the (optimized) release build (which is where the fp-exceptions occur) - and I want to disable fp-exceptions in certain functions where I have detected problems, so I can then locate new FP problems. But I am confused by the difference between using _controlfp_s to do this (which works fine) and the compiler (and #pragma float_control) switch "/fp:except" (which seems to have no effect). What is the difference between these two mechanisms? Are they supposed to have the same effect on fp exceptions?
Secondly, I am getting a number of "Floating-point stack check" exceptions - including one that seems to be thrown in a call to the GDI+ dll. Searching around the web, the few mentions of this exception seem to indicate it is due to compiler bugs. Is this generally the case? If so, how should I work round this? Is it best to disable compiler optimizations for the problem functions, or to disable fp-exceptions just for the problematic areas of code if there don't appear to be any bad floating-point values returned? For example, in the GDI+ call (to GraphicsPath::GetPointCount) that throws this exception, the actual returned integer value seems correct. Currently I'm using _controlfp_s to disable fp-exceptions immediately prior to the GDI+ call – and then use it again to re-enable exceptions directly after the call.
Finally, my application does make a lot of floating-point calculations and needs to be robust and reliable, but not necessarily hugely accurate. The nature of the application is that the floating-point values generally indicate probabilities, so are inherently somewhat imprecise. However, I want to trap any pure logic errors, such as divide by zero. What is the best fp model for this? Currently I am:
trapping all fp exceptions (i.e. EM_OVERFLOW | EM_UNDERFLOW | EM_ZERODIVIDE | EM_DENORMAL | EM_INVALID) using _controlfp_s and a SIGFPE Signal handler,
have enabled the denormals-are-zero (DAZ) and flush-to-zero (FTZ) (i.e. _MM_SET_FLUSH_ZERO_MODE(_MM_DENORMALS_ZERO_ON)), and
I am using the default VC11 compiler settings /fp:precise with /fp:except not specified.
Is this the best model?
Thanks and regards!
Most of the the following information comes from Bruce Dawson's blog post on the subject (link).
Since you're working with C++, you can create a RAII class that enables or disables floating point exceptions in a scoped manner. This lets you have greater control so that you're only exposing the exception state to your code, rather than manually managing calling _controlfp_s() yourself. In addition, floating point exception state that is set this way is system wide, so it's really advisable to remember the previous state of the control word and restore it when needed. RAII can take care of this for you and is a good solution for the issues with GDI+ that you're describing.
The exception flags _EM_OVERFLOW, _EM_ZERODIVIDE, and _EM_INVALID are the most important to account for. _EM_OVERFLOW is raised when positive or negative infinity is the result of a calculation, whereas _EM_INVALID is raised when a result is a signaling NaN. _EM_UNDERFLOW is safe to ignore; it signals when your computation result is non-zero and between -FLT_MIN and FLT_MIN (in other words, when you generate a denormal). _EM_INEXACT is raised too frequently to be of any practical use due to the nature of floating point arithmetic, although it can be informative if trying to track down imprecise results in some situations.
SIMD code adds more wrinkles to the mix; since you don't indicate using SIMD explicitly I'll leave out a discussion of that except to note that specifying anything other than /fp:fast can disable automatic vectorization of your code in VS 2012; see this answer for details on this.
I can't help much with the first two questions, but I have experience and a suggestion for the question about masking FPU exceptions.
I've found the functions
_statusfp() (x64 and Win32)
_statusfp2() (Win32 only)
_fpreset()
_controlfp_s()
_clearfp()
_matherr()
useful when debugging FPU exceptions and in delivering a stable and fast product.
When debugging, I selectively unmask exceptions to help isolate the line of code where an fpu exception is generated in a calculation where I cannot avoid calling other code that unpredictably generates fpu exceptions (like the .NET JIT's divide by zeros).
In released product I use them to deliver a stable program that can tolerate serious floating point exceptions, detect when they occur, and recover gracefully.
I mask all FPU exceptions when I have to call code that cannot be changed,does not have reliable exception handing, and occasionally generates FPU exceptions.
Example:
#define BAD_FPU_EX (_EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID)
#define COMMON_FPU_EX (_EM_INEXACT | _EM_UNDERFLOW | _EM_DENORMAL)
#define ALL_FPU_EX (BAD_FPU_EX | COMMON_FPU_EX)
Release code:
_fpreset();
Use _controlfp_s() to mask ALL_FPU_EX
_clearfp();
... calculation
unsigned int bad_fpu_ex = (BAD_FPU_EX & _statusfp());
_clearfp(); // to prevent reacting to existing status flags again
if ( 0 != bad_fpu_ex )
{
... use fallback calculation
... discard result and return error code
... throw exception with useful information
}
Debug code:
_fpreset();
_clearfp();
Use _controlfp_s() to mask COMMON_FPU_EX and unmask BAD_FPU_EX
... calculation
"crash" in debugger on the line of code that is generating the "bad" exception.
Depending on your compiler options, release builds may be using intrinsic calls to FPU ops and debug builds may call math library functions. These two methods can have significantly different error handling behavior for invalid operations like sqrt(-1.0).
Using executables built with VS2010 on 64-bit Windows 7, I have generated slightly different double precision arithmetic values when using identical code on Win32 and x64 platforms. Even using non-optimized debug builds with /fp::precise, the fpu precision control explicitly set to _PC_53, and the fpu rounding control explicitly set to _RC_NEAR. I had to adjust some regression tests that compare double precision values to take the platform into account. I don't know if this is still an issue with VS2012, but heads up.
I've been struggling for achieving some information about handling floating point exceptions on linux and I can tell you what I learned:
There are a few ways of enabling the exception mechanism:
fesetenv (FE_NOMASK_ENV); enables all exceptions
feenableexcept(FE_ALL_EXCEPT );
fpu_control_t fw;
_FPU_GETCW(fw);
fw |=FE_ALL_EXCEPT;
_FPU_SETCW(fw);
4.
> fenv_t envp; include bits/fenv.h
> fegetenv(&envp);
envp.__control_word |= ~_FPU_MASK_OM;
> fesetenv(&envp);
5.
> fpu_control_t cw;
> __asm__ ("fnstcw %0" : "=m" (*&cw));get config word
>cw |= ~FE_UNDERFLOW;
> __asm__ ("fldcw %0" : : "m" (*&cw));write config word
6.C++ mode: std::feclearexcept(FE_ALL_EXCEPT);
There are some useful links :
http://frs.web.cern.ch/frs/Source/MAC_headers/fpu_control.h
http://en.cppreference.com/w/cpp/numeric/fenv/fetestexcept
http://technopark02.blogspot.ro/2005/10/handling-sigfpe.html

How to set double precision in C++ on MacOSX?

I'm trying to port _controlfp( _CW_DEFAULT, 0xffffffff ); from WIN32 to Mac OS X / Intel. I have absolutely no idea how to port this instruction... And you? Thanks!
Try section 8.6 of Gough's Introduction to GCC, which demonstrates the x86 FLDCW instruction. But it helps if you tell us why you need it — if you want your doubles to be IEEE-754 64-bit doubles, the easiest way is to compile with -msse -mfpmath=sse.
What precision elements are you controlling?
According to Microsoft's Website:
The _control87 function gets and sets the floating-point control word. The floating-point control word allows the program to change the precision, rounding, and infinity modes in the floating-point math package. You can also mask or unmask floating-point exceptions using _control87. If the value for mask is equal to 0, _control87 gets the floating-point control word. If mask is nonzero, a new value for the control word is set: For any bit that is on (equal to 1) in mask, the corresponding bit in new is used to update the control word. In other words, fpcntrl = ((fpcntrl & ~mask) | (new & mask)) where fpcntrl is the floating-point control word.
Note the keyword "library". This function is manipulating the Microsoft library, which may not exist on the Mac.
I suggest the following:
Control the floating point chip on
the Mac yourself
Or use a Microsoft compiler for Mac
Or find a method to move the
floating point controls to a more
generic spot in your program.
The best advice I can give is to keep as much code inline with the standard and platform or library specific issues to minimum. For those functions involving platform specific features, move them into their own source files / translation units. Create copies of these functions, one for each platform. Let the linker decide with ones to use.
Good Luck.

Floating point stack handling with floating-point exceptions turned on

I'm running into an issue with floating point exceptions turned on in Visual Studio 2005. If I have code like this:
double d = 0.0;
double d2 = 3.0;
double d3 = d2/d;
and if I register an SEH handler routine, then I can easily turn the div-by-zero into a C++ exception and catch it. So far so good.
However, when I do this, the first operand (0.0 in the example above) is left on the FPU register stack. If I do it eight times, then I'll start getting a floating point stack check exception with EVERY floating point operation from then on.
I can deal with this using an __asm block to execute a FSTP, thus popping the stray value off the stack, and everything is fine.
However, this worries me, because I haven't seen this discussed anywhere. How can I be certain about the number of values that I should pop? Is it safe to just pop everything that's on the stack at the time of the exception? Are there any recommended best practices in this area?
Thanks!
While I can't find anything either, I can give some explanation as to the likely answer:
The ABI defines that upon a function call the stack should be empty, and it should be empty again on exit unless the return is a floating point value, where it would be the only item on the stack.
Since the exception handler must be able to return to any place, some criteria must hold on those locations. The question here is, does the stack unwinder have any knowledge of the FPU stack of the function having the catch()? Most likely, the answer is no because it is easier and faster to create a suitable return point with fixed properties than to include the full FPU stack in the unwinding.
Which leads to your problem - normally raising an exception has the compiler take care of the FPU being empty, but in an SEH handler, the compiler has no clue it caused an entry into another function, and thus can't take care of things just in case. (other than it being again hideously slow)
Which means that most likely, the FPU stack should be in its "consistent" state upon return, which means you probably want an equivalent of an EMMS instruction.
Why EMMS? Well, unless it is not supported it does the following:
clear the stack (which fixes all leftover floating point arguments)
clears the stack tags (which fixes an unusable stack when exiting from an MMX-enabled function)
If you want to support Pentium 1 or worse, you may of course if() around the EMMS and use something else instead.
No guarantees of course, but I hope I explained the why of the likely answer sufficiently.