Dealing with Floating Point exceptions - c++

I am not sure how to deal with floating point exceptions in either C or C++. According to Wikipedia, there are the following types of floating point exceptions:
IEEE 754 specifies five arithmetic errors that are to be recorded in "sticky bits" (by default; note that trapping and other alternatives are optional and, if provided, non-default).
* inexact, set if the rounded (and returned) value is different from the mathematically exact result of the operation.
* underflow, set if the rounded value is tiny (as specified in IEEE 754) and inexact (or maybe limited to if it has denormalisation loss, as per the 1984 version of IEEE 754), returning a subnormal value (including the zeroes).
* overflow, set if the absolute value of the rounded value is too large to be represented (an infinity or maximal finite value is returned, depending on which rounding is used).
* divide-by-zero, set if the result is infinite given finite operands (returning an infinity, either +∞ or −∞).
* invalid, set if a real-valued result cannot be returned (like for sqrt(−1), or 0/0), returning a quiet NaN.
Is it the case that when any of the above exceptions happens, the program will exit abnormally? Or will the program carry on with the error without reporting anything, making the error hard to debug?
Is a compiler like gcc able to give a warning for obvious cases?
What can I do while writing my program so that it reports where an error happens and what type it is, so that I can locate the error easily in my code? Please give solutions for both C and C++.
Thanks and regards!

There are many options, but the general, and also the default, philosophy introduced by 754 is not to trap but instead to produce special results such as infinities, which may or may not show up in the final results.
As a result, the functions that test the state of individual operations are not used as often as the functions that test the representations of results.
See, for example...
LIST OF FUNCTIONS
Each of the functions that use floating-point values are provided in single, double, and extended precision; the double precision prototypes are listed here. The man pages for the individual functions provide more details on their use, special cases, and prototypes for their single and extended precision versions.
int fpclassify(double)
int isfinite(double)
int isinf(double)
int isnan(double)
int isnormal(double)
int signbit(double)
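As a minimal C++ sketch of this result-checking approach, the <cmath> versions of these classifiers can describe what a computation actually produced. The helper names describe and result_is_usable are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Describe a result the way the fpclassify/signbit family sees it.
std::string describe(double x) {
    switch (std::fpclassify(x)) {
        case FP_NAN:       return "nan";
        case FP_INFINITE:  return std::signbit(x) ? "-inf" : "+inf";
        case FP_ZERO:      return std::signbit(x) ? "-zero" : "+zero";
        case FP_SUBNORMAL: return "subnormal";
        case FP_NORMAL:    return "normal";
        default:           return "unknown";  // implementation-defined categories
    }
}

// Under default IEEE 754 handling, errors surface in the result as Inf or NaN,
// so checking the representation is often enough.
bool result_is_usable(double x) {
    return std::isfinite(x);
}
```

Checking results this way needs no platform-specific setup, which is why it is the more common style.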
Update:
For anyone who really thinks FPU ops generate SIGFPE in a default case these days, I would encourage you to try this program. You can easily generate underflow, overflow, and divide-by-zero. What you will not generate (unless you run it on the last surviving VAX or a non-754 RISC) is SIGFPE:
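#include <stdio.h>
#include <stdlib.h>
int main(int ac, char **av) {
    if (ac < 3) { fprintf(stderr, "usage: %s x y\n", av[0]); return 1; }
    return printf("%f\n", atof(av[1]) / atof(av[2])) < 0;  /* e.g. "1 0" prints inf, no SIGFPE */
}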

On Linux you can use the GNU extension feenableexcept (hidden right at the bottom of that page) to turn on trapping on floating point exceptions - if you do this then you'll receive the signal SIGFPE when an exception occurs which you can then catch in your debugger. Watch out though as sometimes the signal gets thrown on the floating point instruction after the one that's actually causing the problem, giving misleading line information in the debugger!
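A sketch of this approach for Linux with glibc, written in C++: it installs an SA_SIGINFO handler (so the handler can see which kind of SIGFPE fired), enables traps with feenableexcept, and jumps out of the handler with siglongjmp. The function name divide_with_traps is hypothetical, and jumping out of a SIGFPE handler is a pragmatic debugging technique, not something the C++ standard guarantees.

```cpp
// Linux/glibc only: feenableexcept is a GNU extension.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // g++ normally defines this already
#endif
#include <cassert>
#include <csetjmp>
#include <csignal>
#include <fenv.h>

static sigjmp_buf g_trap_jump;
static volatile sig_atomic_t g_trap_code = 0;

// Record which kind of SIGFPE fired, then jump out; returning normally
// would re-execute the faulting instruction.
static void on_sigfpe(int, siginfo_t* info, void*) {
    g_trap_code = info->si_code;  // e.g. FPE_FLTDIV, FPE_FLTINV, FPE_FLTOVF
    siglongjmp(g_trap_jump, 1);
}

// Divide with divide-by-zero, invalid and overflow traps enabled.
// Returns the si_code seen by the handler, 0 if nothing trapped,
// or -1 if this platform cannot enable FP traps.
int divide_with_traps(double num, double den) {
    struct sigaction sa {};
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGFPE, &sa, nullptr);

    feclearexcept(FE_ALL_EXCEPT);  // avoid trapping on stale flags
    if (feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW) == -1) return -1;
    g_trap_code = 0;
    if (sigsetjmp(g_trap_jump, 1) == 0) {
        volatile double n = num, d = den;  // volatile keeps the division at run time
        volatile double q = n / d;         // traps here if d == 0
        (void)q;
    }
    fedisableexcept(FE_ALL_EXCEPT);        // back to the default (masked) behaviour
    feclearexcept(FE_ALL_EXCEPT);
    return g_trap_code;
}
```

For plain debugging you can skip the handler entirely: call feenableexcept early in main and let the debugger stop on the SIGFPE.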

On Windows with Visual C++, you can control which floating-point exceptions are unmasked using _control87() etc. Unmasked floating-point exceptions generate structured exceptions, which can be handled using __try/__except (and a couple of other mechanisms). This is all completely platform-dependent.
If you leave floating point exceptions masked, another platform-dependent approach to detecting these conditions is to clear the floating-point status using _clear87() etc., perform computations, and then query the floating-point status using _status87() etc.
Is any of this any better than DigitalRoss's suggestion of checking the result? In most cases, it's not. If you need to detect (or control) rounding (which is unlikely), then maybe?
On Windows with Borland/CodeGear/Embarcadero C++, some floating-point exceptions are unmasked by default, which often causes problems when using third-party libraries that were not tested with floating-point exceptions unmasked.

Different compilers handle these errors in different ways.
Inexactness is by far the most common: it is signalled whenever the exact result cannot be represented, which happens routinely with division (1.0/3.0) and transcendental functions, but also with addition, subtraction and multiplication (1e16 + 1.0 in double precision is inexact). Adding, subtracting and multiplying finite values can also overflow.
Underflow doesn't occur very often, and probably won't be a concern in normal calculations except for iterated functions such as Taylor series.
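Overflow can usually be detected by testing the result for infinity (isinf() in C99 and C++11); older compilers and libraries vary.
Floating-point divide by zero does not crash the program by default: under IEEE 754 default handling it returns a signed infinity and sets the divide-by-zero flag. It is integer divide by zero that usually traps and terminates the program if you have no handler. Checking divisors (and, for 0/0, dividends) before dividing avoids both problems.
Invalid operations return a NaN by default. Library functions such as sqrt() and log() may additionally report a domain error by setting errno to EDOM (when math_errhandling includes MATH_ERRNO), but nothing is printed unless your code checks for it.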
[EDIT]
This might help: (Numerical Computation Guide by Sun)
http://docs.sun.com/source/806-3568/
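To illustrate how a domain error actually surfaces, here is a small C++ sketch, assuming a C library (such as glibc) that reports math errors through errno and/or the FE_INVALID flag, as advertised by math_errhandling. The names DomainCheck and checked_log are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cfenv>
#include <cmath>

struct DomainCheck {
    double value;     // what the library returned (NaN for a domain error)
    bool errno_edom;  // errno was set to EDOM
    bool fe_invalid;  // the FE_INVALID flag was raised
};

// Call std::log and report how a domain error (if any) surfaced.
// Nothing is printed; the caller has to look.
DomainCheck checked_log(double x) {
    errno = 0;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double arg = x;  // keep the call from being constant-folded
    double r = std::log(arg);
    return {r, errno == EDOM, std::fetestexcept(FE_INVALID) != 0};
}
```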

C99 introduced functions for handling floating point exceptions. Prior to a floating point operation, you can use feclearexcept() to clear any outstanding exceptions. After the operation(s), you can then use fetestexcept() to test which exception flags are set.
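A minimal C++ sketch of that clear-then-test pattern around a single division. GCC does not implement #pragma STDC FENV_ACCESS, so the sketch uses volatile operands to stop the compiler from evaluating the division at compile time; raised_flags and flags_for_division are hypothetical helper names.

```cpp
#include <cassert>
#include <cfenv>
#include <string>

// Turn the currently raised exception flags into a readable list.
std::string raised_flags() {
    std::string out;
    if (std::fetestexcept(FE_DIVBYZERO)) out += "divbyzero ";
    if (std::fetestexcept(FE_INVALID))   out += "invalid ";
    if (std::fetestexcept(FE_OVERFLOW))  out += "overflow ";
    if (std::fetestexcept(FE_UNDERFLOW)) out += "underflow ";
    if (std::fetestexcept(FE_INEXACT))   out += "inexact ";
    return out.empty() ? "none" : out.substr(0, out.size() - 1);
}

// Clear the flags, do one division, and report what it raised.
std::string flags_for_division(double a, double b) {
    volatile double x = a, y = b;  // prevent compile-time evaluation
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double q = x / y;
    (void)q;
    return raised_flags();
}
```

For example, 1.0 / 3.0 reports only inexact, 1.0 / 0.0 reports divbyzero, and 0.0 / 0.0 reports invalid.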

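On Linux, a trapping exception is delivered as the SIGFPE signal. Floating-point exceptions only trap if you have unmasked them (for example with feenableexcept); integer division by zero usually traps anyway. If a trap occurs and you do nothing, SIGFPE terminates your program. To set a handler, use signal() (or sigaction(), which also tells the handler which kind of exception occurred), passing the signal you wish to have trapped and the function to be called when the signal fires.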

Related

Understanding IEEE 754: why Underflow depends on Inexact?

Note: understanding IEEE 754. Please be patient.
IEEE 754-2008 (emphasis added):
In addition, under default exception handling for underflow, if the rounded result is inexact — that is, it differs from what would have been computed were both exponent range and precision unbounded — the underflow flag shall be raised and the inexact (see 7.6) exception shall be signaled. If the rounded result is exact, no flag is raised and no inexact exception is signaled. This is the only case in this standard of an exception signal receiving default handling that does not raise the corresponding flag. Such an underflow signal has no observable effect under default handling.
As I understand it: underflow == inexact && tiny.
Simple question: why Underflow depends on Inexact?
That is, why is no underflow exception raised if an exact subnormal result is produced? What is the motivation or rationale for this behavior?
Exceptions generally indicate an ideal mathematical result cannot be provided, and they inform the program about the nature of the issue.
One purpose of having exceptions generate traps is so a program can attend to the situation in a way customized to the program’s purpose. For example, one program might want to deal with overflow by terminating the current calculation sequence. Another program might want to deal with overflow by rescaling the operands and recording the new scale, effectively implementing its own extended exponent range by tracking the rescalings. Another program might want to produce infinity as a result. So traps allow customizing program behavior.
Where it makes sense, default results have been provided, such as producing infinity for an overflow, and programs that are okay with the default results can leave traps for exceptions turned off. They might ignore exceptions or check the exceptions flags at the end of a sequence of calculations.
If the program is accepting the default handling for underflow, and a subnormal result occurs but it is exact, there is no need to inform the program, because the ideal mathematical result has been provided and the program has indicated it does not want to take any special action for underflow, such as rescaling the results. If the underflow flag were raised, and the program checked it at the end of a sequence of calculations, that would incorrectly indicate some incorrect result may have occurred.
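This behaviour can be observed with a short C++ probe, assuming default IEEE arithmetic (no flush-to-zero mode): DBL_MIN / 2 is tiny but exactly representable as a subnormal, so neither underflow nor inexact is raised, while DBL_MIN / 3 is tiny and has to be rounded, so both are. probe_division is a hypothetical name.

```cpp
#include <cassert>
#include <cfenv>
#include <cfloat>

struct UnderflowProbe {
    bool underflow;
    bool inexact;
};

// Divide and report whether the underflow and inexact flags were raised.
UnderflowProbe probe_division(double a, double b) {
    volatile double x = a, y = b;  // prevent compile-time evaluation
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double q = x / y;
    (void)q;
    return {std::fetestexcept(FE_UNDERFLOW) != 0, std::fetestexcept(FE_INEXACT) != 0};
}
```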

The meanings of "floating invalid" from ifort compiler

I use intel's ifort compiler for my Fortran code.
Sometimes I get an error during running:
forrtl: error (65): floating invalid
The compiler does not give the exact "invalid" reason. To my understanding, does this suggest one of the following?
Underflow, which means too close to 0, e.g. 1e-30.
Overflow, which means too large, e.g. infinity/-infinity, divide by zero.
NaN.
This is useful because I can use if statement to check which one of these cases actually happened.
Besides, I'm using the -g -traceback options for debugging. Is there an option which gives more details?
The Intel Fortran compiler generally uses IEEE arithmetic. The "floating invalid" message is a result of an unhandled IEEE exception of an invalid operation.
Underflow and overflow are not treated as invalid operations (note that dividing zero by itself, or an infinity by another infinity are invalid, not overflowing). In basic terms, an invalid operation is one where, mathematically, the operand is not in the domain of the operator. Not just those two examples mentioned before, but things like taking the square root or log of a negative real number. Or using NaNs inappropriately.
The Intel compiler has supported Fortran 2003's IEEE features for some time. You can use these for fine trapping of exceptions. The compile-time option fpe controls how the compiler responds to exceptions.
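The same IEEE 754 rules apply in C and C++, so the distinction between invalid and overflow can be checked with <cfenv>. This C++ sketch (raises_invalid is a hypothetical helper) shows that 0/0, Inf/Inf, Inf - Inf and sqrt(-1) raise the invalid flag, whereas a product that is too large raises overflow instead.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

// Returns true if evaluating op() raised FE_INVALID; optionally also
// reports whether FE_OVERFLOW was raised.
template <typename Op>
bool raises_invalid(Op op, bool* overflow = nullptr) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double r = op();  // op uses volatile operands, so it runs here
    (void)r;
    if (overflow) *overflow = std::fetestexcept(FE_OVERFLOW) != 0;
    return std::fetestexcept(FE_INVALID) != 0;
}
```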

On which platforms does integer divide by zero trigger a floating point exception?

In another question, someone was wondering why they were getting a "floating point error" when in fact they had an integer divide-by-zero in their C++ program. A discussion arose around this, with some asserting that floating point exceptions are in fact never raised for float divide by zero, but only arise on integer divide by zero.
This sounds strange to me, because I know that:
MSVC-compiled code on x86 and x64 on all Windows platforms reports an int divide by zero as "0xc0000094: Integer division by zero", and float divide by zero as 0xC000008E "Floating-point division by zero" (when enabled)
IA-32 and AMD64 ISAs specify #DE (integer divide exception) as interrupt 0. Floating-point exceptions trigger interrupt 16 (x87 floating-point) or interrupt 19 (SIMD floating-point).
Other hardware have similarly different interrupts (eg PPC raises 0x7000 on float-div-by-zero and doesn't trap for int/0 at all).
Our application unmasks floating-point exceptions for divide-by-zero with the _controlfp_s intrinsic (ultimately stmxcsr op) and then catches them for debugging purposes. So I have definitely seen IEEE754 divide-by-zero exceptions in practice.
So I conclude that there are some platforms that report int exceptions as floating point exceptions, such as x64 Linux (raising SIGFPE for all arithmetic errors regardless of ALU pipe).
What other operating systems (or C/C++ runtimes if you are the operating system) report integer div-by-zero as a floating point exception?
I'm not sure how the current situation came to be, but it's currently the case that FP exception detection support is very different from integer. It's common for integer division to trap. POSIX requires it to raise SIGFPE if it raises an exception at all.
However, you can sort out what kind of SIGFPE it was, to see that it was actually a division exception. (Not necessarily divide-by-zero, though: 2's complement INT_MIN / -1 division traps, and x86's div and idiv also trap when the quotient of 64b/32b division doesn't fit in the 32b output register. But that's not the case on AArch64 using sdiv.)
The glibc manual explains that BSD and GNU systems deliver an extra argument to the signal handler for SIGFPE, which will be FPE_INTDIV_TRAP for divide by zero. POSIX documents FPE_INTDIV (integer divide by zero) as a possible value of the si_code member of siginfo_t, which a handler installed with SA_SIGINFO receives.
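For reference, a C++ sketch of decoding si_code in an SA_SIGINFO handler, using the POSIX FPE_* names. describe_sigfpe is a hypothetical helper; it returns string literals so that calling it inside a signal handler does not allocate.

```cpp
#include <cassert>
#include <csignal>
#include <cstring>

// Translate the si_code of a SIGFPE into text.
const char* describe_sigfpe(int si_code) {
    switch (si_code) {
        case FPE_INTDIV: return "integer divide by zero";
        case FPE_INTOVF: return "integer overflow";
        case FPE_FLTDIV: return "floating-point divide by zero";
        case FPE_FLTOVF: return "floating-point overflow";
        case FPE_FLTUND: return "floating-point underflow";
        case FPE_FLTRES: return "floating-point inexact result";
        case FPE_FLTINV: return "floating-point invalid operation";
        case FPE_FLTSUB: return "subscript out of range";
        default:         return "unknown SIGFPE code";
    }
}
```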
I don't know whether Windows delivers a different exception in the first place, or bundles things into different flavours of the same arithmetic exception like Unix does. If it bundles them, the default handler decodes the extra information to tell you what kind of exception it was.
POSIX and Windows both use the phrase "division by zero" to cover all integer division exceptions, so apparently this is common shorthand. For people that do know about INT_MIN / -1 (with 2's complement) being a problem, the phrase "division by zero" can be taken as synonymous with a divide exception. The phrase immediately points out the common case for people that don't know why integer division might be a problem.
FP exception semantics
FP exceptions are masked by default for user-space processes in most operating systems / C ABIs.
This makes sense, because IEEE floating point can represent infinities, and has NaN to propagate the error to all future calculations using the value.
0.0/0.0 => NaN
If x is finite and nonzero: x/0.0 => +/-Inf with the sign of x (dividing by -0.0 gives the opposite sign)
This even allows things like this to produce a sensible result when exceptions are masked:
double x = 0.0;
double y = 1.0/x; // y = +Inf
double z = 1.0/y; // z = 1/Inf = 0.0, no FP exception
FP vs. integer error detection
The FP way of detecting errors is pretty good: when exceptions are masked, they set a flag in the FP status register instead of trapping. (e.g. x86's MXCSR for SSE instructions). The flag stays set until manually cleared, so you can check once (after a loop for example) to see which exceptions happened, but not where they happened.
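A C++ sketch of checking the sticky flags once after a whole loop, assuming exceptions are left masked (the default). The returned flags tell you which exceptions occurred somewhere in the loop, but not at which element. divide_all is a hypothetical name.

```cpp
#include <cassert>
#include <cfenv>
#include <cstddef>
#include <vector>

// Divide element-wise (den must be at least as long as num), then test the
// sticky flags once at the end instead of after every division.
int divide_all(const std::vector<double>& num, const std::vector<double>& den,
               std::vector<double>& out) {
    out.resize(num.size());
    std::feclearexcept(FE_ALL_EXCEPT);
    for (std::size_t i = 0; i < num.size(); ++i) {
        volatile double q = num[i] / den[i];  // keep each division at run time
        out[i] = q;
    }
    return std::fetestexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
}
```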
There have been proposals for having similar "sticky" integer-overflow flags to record if overflow happened at any point during a sequence of computations. Allowing integer division exceptions to be masked would be nice in some cases, but dangerous in other cases (e.g. in an address calculation, you should trap instead of potentially storing to a bogus location).
On x86, though, detecting if integer overflow happened during a sequence of calculations requires putting a conditional branch after every one of them, because flags are just overwritten. MIPS has an add instruction that will trap on signed overflow, and an unsigned instruction that never traps. So integer exception detection and handling is a lot less standardized.
Integer division doesn't have the option of producing NaN or Inf results, so it makes sense for it to work this way.
Any integer bit pattern produced by integer division will be wrong, because it will represent a specific finite value.
However, on x86, converting an out-of-range floating point value to integer with cvtsd2si or any similar conversion instruction produces the "integer indefinite" value if the "floating-point invalid" exception is masked. The value has only the sign bit set, i.e. INT_MIN.
(See the Intel manuals; links are in the x86 tag wiki.)
What other operating systems (or C/C++ runtimes if you are the operating system) report integer div-by-zero as a floating point exception?
The answer depends on whether you are in kernel space or user space. If you are in kernel space, you can put "i / 0" in kernel_main(), have your interrupt handler call an exception handler and halt your kernel. If you're in user space, the answer depends on your operating system and compiler settings.
AMD64 hardware specifies integer divide by zero as interrupt 0, different from interrupt 16 (x87 floating-point exception) and interrupt 19 (SIMD floating-point exception).
The "Divide-by-zero" exception is for dividing by zero with the div instruction. Discussing the x87 FPU is outside the scope of this question.
Other hardware have similarly different interrupts (eg PPC raises 0x7000 on float-div-by-zero and doesn't trap for int/0 at all).
More specifically, 00700 is mapped to exception type "Program", which includes a floating-point enabled exception. Such an exception is raised if trying to divide-by-zero using a floating point instruction.
On the other hand, integer division is undefined behavior per the PPC PEM:
8-53 divw
If an attempt is made to perform either of the divisions—0x8000_0000 ÷
–1 or ÷ 0, then the contents of rD are undefined, as are
the contents of the LT, GT, and EQ bits of the CR0 field (if Rc = 1).
In this case, if OE = 1 then OV is set.
Our application unmasks floating-point exceptions for divide-by-zero with the _controlfp_s intrinsic (ultimately stmxcsr op) and then catches them for debugging purposes. So I have definitely seen IEEE754 divide-by-zero exceptions in practice.
I think your time is better spent catching divide by zero at compile-time rather than at run-time.
For userspace, this happens on AIX running on POWER, HP-UX running on PA-RISC, Linux running on x86-64, macOS running on x86-64, Tru64 running on Alpha and Solaris running on SPARC.
Avoiding divides-by-zero at compile time is much better.
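One way to move the check to compile time in C++ is to evaluate divisions with constant operands in a constant expression: a division by zero there is not a constant expression, so the compiler reports an error instead of the program trapping at run time. The ratio function is a hypothetical example. GCC also warns about integer division by a literal zero by default (-Wdiv-by-zero).

```cpp
#include <cassert>

// In a constant expression, a zero divisor (or INT_MIN / -1) is rejected
// by the compiler rather than trapping at run time.
constexpr int ratio(int a, int b) { return a / b; }

static_assert(ratio(10, 2) == 5, "evaluated at compile time");
// static_assert(ratio(1, 0) == 0, "");  // error: not a constant expression
```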

number divide by zero is hardware exception

I have learnt while studying C++ exception handling that dividing a number by zero is a hardware exception. Can anybody explain why it is called a hardware exception?
Because it is not an exception in the C++ sense. Usually, in the C++ world, we use the word "hardware trap", to avoid any ambiguity, but "hardware exception" can also be used. Basically, the hardware triggers something which will cause you to land in the OS.
And not all systems will generate a hardware trap for divide by 0. I've worked on one where you just got the largest possible value as a result, and kept on.
The C++ Standard itself considers divide by zero to be Undefined Behaviour, but as usual an implementation can provide Implementation Defined Behaviour if it likes.
C++20 stipulations:
7.1.4 If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [Note: Treatment of division by zero, forming a remainder using a zero divisor, and all floating-point exceptions varies among machines, and is sometimes adjustable by a library function. — end note]
Typically in practice, your CPU will check for divide by zero, and historically different CPU manufacturers have used different terminology for the CPU behaviour that results: some call it an "interrupt", others a "trap", or "signal", or "exception", or "fault", or "abort". CPU designers don't tend to care about - or avoid clashes with - anything but their hardware and assembly language terminology....
Regardless, even if called a "hardware exception", it's nothing to do with C++ exceptions in the try/catch sense.
On Intel x86, for example, a divide by zero causes the CPU to spontaneously save a minimal set of registers on the stack and then call a handler whose address must have been placed in a specific memory location (the interrupt descriptor table) beforehand.
It's up to the OS/executable to pick/override with some useful behaviour, and while some C++ compilers do specifically support interception of these events and generation of C++ Exceptions, it's not a feature mentioned by the C++ Standard, nor widely portable. The general expectation is that you'll either write a class that checks consistently, or perform ad-hoc checks before divisions that might fail.
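A minimal sketch of such an ad-hoc check in C++ (checked_div is a hypothetical name). Besides a zero divisor, it rejects INT_MIN / -1, whose quotient is not representable and which also traps on x86.

```cpp
#include <cassert>
#include <climits>

// Returns false (leaving *out untouched) when the division cannot be done safely.
bool checked_div(int a, int b, int* out) {
    if (b == 0) return false;                   // divide by zero
    if (a == INT_MIN && b == -1) return false;  // quotient not representable
    *out = a / b;
    return true;
}
```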
This is a hardware exception because it's detected by CPU.
Your code in C/C++ or any other language is converted to CPU instructions and then executed by the CPU, so it is the CPU that detects at run time that you divided by zero.
It depends on your processor whether you get an exception or not. Fixed-point (integer) and floating-point division can also behave differently. The IEEE 754 floating-point standard specifies both a trapping and a non-trapping behavior for divide by zero. If the FPU has that exception masked (the usual default), you get a correctly signed infinity; if it is unmasked, the division traps instead, and what happens next is up to the trap handler.
The programmer's reference manual for a particular processor should describe fixed-point divide-by-zero behavior, if the processor has a divide instruction at all. If not, division is done in software, and it is up to the compiler's runtime library what happens (it might call a divide-by-zero handler, for example).
It would be called a hardware exception in general because the hardware detects the problem and does something as a result. The same goes for other problems such as MMU access faults, data aborts, prefetch aborts, etc.: they are hardware exceptions because they are exceptions handled by hardware, generally...
Because, where it is checked at all, it is checked and raised by the hardware. Specifically, the Arithmetic-Logic Unit (ALU) of your CPU will check for 0 as the divisor and generate an appropriate interrupt to signal the exception.
Otherwise, you would have to explicitly check for 0 in the assembly source code.
Edit: Note that this applies to integer division only, since floating point division has specific results (signed infinities, NaN) to signal a division by zero.

Strategy for dealing with floating point inaccuracy

Is there a general best practice strategy for dealing with floating point inaccuracy?
The project that I'm working on tried to solve them by wrapping everything in a Unit class which holds the floating point value and overloads the operators. Numbers are considered equal if they "close enough," comparisons like > or < are done by comparing with a slightly lower or higher value.
I understand the desire to encapsulate the logic for handling such floating point errors. But given that this project has had two different implementations (one based on the ratio of the numbers being compared and one based on the absolute difference), and I've been asked to look at the code because it's not doing the right thing, the strategy seems to be a bad one.
So what is the best strategy for making sure you handle all of the floating point inaccuracy in a program?
You want to keep data as dumb as possible, generally. Behavior and the data are two concerns that should be kept separate.
The best way is to not have unit classes at all, in my opinion. If you have to have them, then avoid overloading operators unless it has to work one way all the time. Usually it doesn't, even if you think it does. As mentioned in the comments, it breaks strict weak ordering for instance.
I believe the sane way to handle it is to create some concrete comparators that aren't tied to anything else.
struct RatioCompare {
bool operator()(float lhs, float rhs) const;
};
struct EpsilonCompare {
bool operator()(float lhs, float rhs) const;
};
People writing algorithms can then use these in their containers or algorithms. This allows code reuse without demanding that anyone uses a specific strategy.
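// Tolerance-based comparisons are not strict weak orderings, so they are not
// valid std::sort comparators; use them where an equality predicate is expected.
auto near_dup = std::adjacent_find(prices.begin(), prices.end(), EpsilonCompare());
auto near_dup_ratio = std::adjacent_find(prices.begin(), prices.end(), RatioCompare());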
Usually people trying to overload operators to avoid these things will offer complaints about "good defaults", etc. If the compiler tells you immediately that there isn't a default, it's easy to fix. If a customer tells you that something isn't right somewhere in your million lines of price calculations, that is a little harder to track down. This can be especially dangerous if someone changed the default behavior at some point.
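One possible implementation of the two comparators, sketched as approximate-equality predicates. The tolerances are arbitrary placeholders, not recommendations. Because tolerance-based equality is not transitive, neither struct is a valid strict weak ordering, so they suit equality-predicate algorithms such as std::equal or std::adjacent_find rather than std::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Equal if the absolute difference is within a fixed tolerance.
struct EpsilonCompare {
    float eps = 1e-6f;  // hypothetical absolute tolerance
    bool operator()(float lhs, float rhs) const {
        return std::fabs(lhs - rhs) <= eps;
    }
};

// Equal if the difference is small relative to the larger magnitude.
struct RatioCompare {
    float rel = 1e-5f;  // hypothetical relative tolerance
    bool operator()(float lhs, float rhs) const {
        float scale = std::max(std::fabs(lhs), std::fabs(rhs));
        return std::fabs(lhs - rhs) <= rel * scale;
    }
};
```

The absolute test breaks down for large magnitudes (100000 vs 100000.5 differ by far more than 1e-6) and the relative test breaks down near zero, which is one reason neither makes a good universal default.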
Check comparing floating point numbers and this post on DaniWeb and this on SO.
Both techniques are not good. See this article.
Google Test is a framework for writing C++ tests on a variety of platforms.
gtest.h contains the AlmostEquals function.
// Returns true iff this number is at most kMaxUlps ULP's away from
// rhs. In particular, this function:
//
// - returns false if either number is (or both are) NAN.
// - treats really large numbers as almost equal to infinity.
// - thinks +0.0 and -0.0 are 0 DLP's apart.
bool AlmostEquals(const FloatingPoint& rhs) const {
// The IEEE standard says that any comparison operation involving
// a NAN must return false.
if (is_nan() || rhs.is_nan()) return false;
return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
<= kMaxUlps;
}
Google implementation is good, fast and platform-independent.
Brief documentation is here.
To me floating point errors are essentially those which on an x86 would lead to a floating point exception (assuming the coprocessor has that interrupt enabled). A special case is the "inexact" exception, i.e. when the result was not exactly representable in the floating point format (such as when dividing 1 by 3). Newbies not yet at home in the floating-point world will expect exact results and will consider this case an error.
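The idea behind AlmostEquals can be sketched without depending on Google Test internals: map each double's bit pattern onto an unsigned scale on which adjacent doubles differ by one, then compare the distance against a ULP budget. The names below are hypothetical; max_ulps = 4 mirrors the kMaxUlps value gtest uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a double's sign-and-magnitude bits to a biased unsigned value so that
// neighbouring doubles differ by 1 and +0.0 / -0.0 map to the same value.
std::uint64_t to_biased(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::uint64_t sign = std::uint64_t{1} << 63;
    return (bits & sign) ? ~bits + 1 : (bits | sign);
}

std::uint64_t ulp_distance(double a, double b) {
    const std::uint64_t ba = to_biased(a), bb = to_biased(b);
    return ba >= bb ? ba - bb : bb - ba;
}

bool almost_equal(double a, double b, std::uint64_t max_ulps = 4) {
    if (std::isnan(a) || std::isnan(b)) return false;  // NaN equals nothing
    return ulp_distance(a, b) <= max_ulps;
}
```

As in gtest, the largest finite double is one ULP away from infinity, so it counts as almost equal to it.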
As I see it there are several strategies available.
* Early data checking, so that bad values are identified and handled when they enter the software. This lessens the need for testing during the floating-point operations themselves, which should improve performance.
* Late data checking, so that bad values are identified immediately before they are used in actual floating-point operations. This should lead to lower performance.
* Debugging with floating-point exception interrupts enabled. This is probably the fastest way to gain a deeper understanding of floating-point issues during the development process.
to name just a few.
When I wrote a proprietary database engine over twenty years ago using an 80286 with an 80287 coprocessor I chose a form of late data checking and using x87 primitive operations. Since floating point operations were relatively slow I wanted to avoid doing floating point comparisons every time I loaded a value (some of which would cause exceptions). To achieve this my floating point (double precision) values were unions with unsigned integers such that I would test the floating point values using x86 operations before the x87 operations would be called upon. This was cumbersome but the integer operations were fast and when the floating point operations came into action the floating point value in question would be ready in the cache.
A typical C sequence (floating point division of two matrices) looked something like this:
// calculate source and destination pointers
type1=npx_load(src1pointer);
if (type1!=UNKNOWN) /* x87 stack contains negative, zero or positive value */
{
type2=npx_load(src2pointer);
if (!(type2==POSITIVE_NOT_0 || type2==NEGATIVE))
{
if (type2==ZERO) npx_pop();
npx_pop(); /* remove src1 value from stack since there won't be a division */
type1=UNKNOWN;
}
else npx_divide();
}
if (type1==UNKNOWN) npx_load_0(); /* x87 stack is empty so load zero */
npx_store(dstpointer); /* store either zero (from prev statement) or quotient as result */
npx_load would load a value onto the top of the x87 stack provided it was valid; otherwise the top of the stack would be empty. npx_pop simply removes the value currently at the top of the x87 stack. BTW, "npx" is an abbreviation for "Numeric Processor eXtension", as the coprocessor was sometimes called.
The method chosen was my way of handling floating-point issues stemming from my own frustrating experiences at trying to get the coprocessor solution to behave in a predictable manner in an application.
For sure this solution led to overhead but a pure
*dstpointer = *src1pointer / *src2pointer;
was out of the question since it didn't contain any error handling. The extra cost of this error handling was more than made up for by how the pointers to the values were prepared. Also, the 99% case (both values valid) is quite fast so if the extra handling for the other cases is slower, so what?
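For comparison, a portable C++ sketch of the same late-checking policy without x87 primitives: store the quotient when both operands are ordinary values and the divisor is nonzero, otherwise store zero. It assumes the original UNKNOWN type covers NaNs and infinities; divide_or_zero is a hypothetical name. An overflowing quotient still comes out as infinity here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Element-wise dst[i] = a[i] / b[i], storing 0 whenever either operand is
// NaN/Inf or the divisor is zero, so no division with a bad operand is executed.
void divide_or_zero(const double* a, const double* b, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const bool usable = std::isfinite(a[i]) && std::isfinite(b[i]) && b[i] != 0.0;
        dst[i] = usable ? a[i] / b[i] : 0.0;
    }
}
```

As in the original, the common case (both values valid) costs only a few comparisons per element.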