Is there a general best practice strategy for dealing with floating point inaccuracy?
The project that I'm working on tried to solve them by wrapping everything in a Unit class which holds the floating point value and overloads the operators. Numbers are considered equal if they are "close enough"; comparisons like > or < are done by comparing with a slightly lower or higher value.
I understand the desire to encapsulate the logic of handling such floating point errors. But given that this project has had two different implementations (one based on the ratio of the numbers being compared and one based on the absolute difference), and that I've been asked to look at the code because it's not doing the right thing, the strategy seems to be a bad one.
So what is the best strategy for trying to make sure you handle all of the floating point inaccuracy in a program?
You want to keep data as dumb as possible, generally. Behavior and the data are two concerns that should be kept separate.
The best way is to not have unit classes at all, in my opinion. If you have to have them, then avoid overloading operators unless it has to work one way all the time. Usually it doesn't, even if you think it does. As mentioned in the comments, it breaks strict weak ordering for instance.
I believe the sane way to handle it is to create some concrete comparators that aren't tied to anything else.
struct RatioCompare {
bool operator()(float lhs, float rhs) const;
};
struct EpsilonCompare {
bool operator()(float lhs, float rhs) const;
};
People writing algorithms can then use these in their containers or algorithms. This allows code reuse without demanding that anyone uses a specific strategy.
std::sort(prices.begin(), prices.end(), EpsilonCompare());
std::sort(prices.begin(), prices.end(), RatioCompare());
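The answer leaves the comparator bodies to the reader. As a rough sketch (the tolerance values and the "within tolerance counts as equivalent" convention are my assumptions, not part of the original answer), they might look something like this:

```cpp
#include <algorithm>
#include <cmath>

struct RatioCompare {
    // "Less than", treating values whose difference is within a small
    // fraction of their magnitude as equivalent.
    bool operator()(float lhs, float rhs) const {
        const float tolerance = 1e-5f;  // placeholder value
        float scale = std::max(std::fabs(lhs), std::fabs(rhs));
        if (std::fabs(lhs - rhs) <= scale * tolerance) return false;
        return lhs < rhs;
    }
};

struct EpsilonCompare {
    // "Less than", treating values within a fixed absolute epsilon
    // as equivalent.
    bool operator()(float lhs, float rhs) const {
        const float epsilon = 1e-6f;    // placeholder value
        if (std::fabs(lhs - rhs) <= epsilon) return false;
        return lhs < rhs;
    }
};
```

Note that tolerance-based equivalence is not transitive, so comparators like these still need care anywhere a strict weak ordering is required, as mentioned above.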
Usually people trying to overload operators to avoid these things will offer complaints about "good defaults", etc. If the compiler tells you immediately that there isn't a default, it's easy to fix. If a customer tells you that something isn't right somewhere in your million lines of price calculations, that is a little harder to track down. This can be especially dangerous if someone changed the default behavior at some point.
Check Comparing floating point numbers, this post on DaniWeb, and this one on SO.
Neither technique is good. See this article.
Google Test is a framework for writing C++ tests on a variety of platforms.
gtest.h contains the AlmostEquals function.
// Returns true iff this number is at most kMaxUlps ULP's away from
// rhs. In particular, this function:
//
// - returns false if either number is (or both are) NAN.
// - treats really large numbers as almost equal to infinity.
// - thinks +0.0 and -0.0 are 0 ULP's apart.
bool AlmostEquals(const FloatingPoint& rhs) const {
  // The IEEE standard says that any comparison operation involving
  // a NAN must return false.
  if (is_nan() || rhs.is_nan()) return false;

  return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
      <= kMaxUlps;
}
Google's implementation is good, fast, and platform-independent.
Some brief documentation is available here.
To me, floating point errors are essentially those which on an x86 would lead to a floating point exception (assuming the coprocessor has that interrupt enabled). A special case is the "inexact" exception, i.e. when the result is not exactly representable in the floating point format (such as when dividing 1 by 3). Newbies not yet at home in the floating-point world will expect exact results and will consider this case an error.
As I see it there are several strategies available.
1. Early data checking, such that bad values are identified and handled when they enter the software. This lessens the need for testing during the floating point operations themselves, which should improve performance.
2. Late data checking, such that bad values are identified immediately before they are used in actual floating point operations. This should lead to lower performance.
3. Debugging with floating point exception interrupts enabled (see the sketch after this list). This is probably the fastest way to gain a deeper understanding of floating point issues during the development process.

to name just a few.
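Here is the sketch referred to in the debugging item: a minimal example, using standard C++ <cfenv>, of checking which floating point exception flags a computation raised. Actually enabling the interrupts (traps) is platform specific (e.g. feenableexcept on glibc) and is not shown; this only tests the sticky flags.

```cpp
#include <cfenv>
#include <cstdio>

// Tell the compiler we read and modify the floating point environment.
// (Support for this pragma varies by compiler.)
#pragma STDC FENV_ACCESS ON

int main() {
    std::feclearexcept(FE_ALL_EXCEPT);

    volatile double one = 1.0, three = 3.0, zero = 0.0;  // volatile blocks constant folding
    volatile double q = one / three;   // not exactly representable: raises FE_INEXACT
    volatile double inf = one / zero;  // with IEEE 754 semantics: +inf, raises FE_DIVBYZERO
    (void)q; (void)inf;

    if (std::fetestexcept(FE_INEXACT))   std::puts("FE_INEXACT raised");
    if (std::fetestexcept(FE_DIVBYZERO)) std::puts("FE_DIVBYZERO raised");
    if (std::fetestexcept(FE_INVALID))   std::puts("FE_INVALID raised");
    return 0;
}
```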
When I wrote a proprietary database engine over twenty years ago using an 80286 with an 80287 coprocessor I chose a form of late data checking and using x87 primitive operations. Since floating point operations were relatively slow I wanted to avoid doing floating point comparisons every time I loaded a value (some of which would cause exceptions). To achieve this my floating point (double precision) values were unions with unsigned integers such that I would test the floating point values using x86 operations before the x87 operations would be called upon. This was cumbersome but the integer operations were fast and when the floating point operations came into action the floating point value in question would be ready in the cache.
A typical C sequence (floating point division of two matrices) looked something like this:
// calculate source and destination pointers
type1 = npx_load(src1pointer);
if (type1 != UNKNOWN)   /* x87 stack contains negative, zero or positive value */
{
    type2 = npx_load(src2pointer);
    if (!(type2 == POSITIVE_NOT_0 || type2 == NEGATIVE))
    {
        if (type2 == ZERO) npx_pop();
        npx_pop();          /* remove src1 value from stack since there won't be a division */
        type1 = UNKNOWN;
    }
    else npx_divide();
}
if (type1 == UNKNOWN) npx_load_0();   /* x87 stack is empty so load zero */
npx_store(dstpointer);                /* store either zero (from prev statement) or quotient as result */
npx_load would load a value onto the top of the x87 stack, provided it was valid. Otherwise the top of the stack would be empty. npx_pop simply removes the value currently at the top of the x87 stack. BTW "npx" is an abbreviation for "Numeric Processor eXtension", as it was sometimes called.
The method chosen was my way of handling floating-point issues stemming from my own frustrating experiences at trying to get the coprocessor solution to behave in a predictable manner in an application.
For sure this solution led to overhead but a pure
*dstpointer = *src1pointer / *src2pointer;
was out of the question since it didn't contain any error handling. The extra cost of this error handling was more than made up for by how the pointers to the values were prepared. Also, the 99% case (both values valid) is quite fast so if the extra handling for the other cases is slower, so what?
Related
One of the big gotchas of floating point numbers is that some of them cannot be exactly represented in binary. This can make them difficult to work with. However, what I'm curious about is whether subtle or not-so-subtle errors in floating point are deterministic. Can somebody predict them, for example? Here's one example of a random number generator that could take advantage of floating point errors:
#include <cmath>

float constant = M_PI;

float generate()
{
    static float state = 1;
    state = state * constant;
    return state;
}
One would have to know the implementation, the hardware, the compiler settings and so on, which makes it quite difficult to predict what the results would be. Or is my thinking flawed?
Floating point "errors" are deterministic. There is a 1:1 mapping between input and output values for a given operation. Your example will produce the same output sequence every time.
That said, there could be a floating-point implementation or ten out there that will produce different sequences, but this is not something you can consider "random" (i.e. a source of entropy).
Every floating point representation defines the composition of a floating point variable (which part is the mantissa, which part is the exponent, which part is the sign, etc.) and the behaviour of every operation.
In any implementation you might choose, it is therefore possible to predict the result of every floating point operation if you know its operand (or operands). That characteristic is the definition of determinism.
So, yes, floating point operations are deterministic.
Different implementations (compilers, host systems, etc) do support different floating point representations. So there is some variation of results between implementations. However, it is still possible to predict the result of any floating point operation, if you know how floating point variables are represented, and how operations work.
The fact that not everyone knows enough about floating point types and operations on them does not make them non-deterministic. Nor does the fact that not everyone can describe the complete set of operations in a complex algorithm. The knowledge is readily available and, with enough effort, understandable well enough that the effects of all operations on all possible operands can be reliably predicted before doing the operation.
There are buggy implementations of floating point out there which do not comply with their own documentation. For example, look up the Pentium FDIV bug, where some early Pentium CPUs implemented floating point division incorrectly. Even those turned out to be deterministic, once it was understood what the operations actually do.
I was going through this post on SO, where the user @skrebbel stated that the Google testing framework does a good and fast job of comparing floats and doubles. So I wrote the following code to check the validity of that claim, but it seems I am missing something here, since I was expecting to enter the "almost equal" branch. This is my code:
float left  = 0.1234567f;
float right = 0.1234566f;
// FloatingPoint comes from gtest's internal headers (testing::internal).
const FloatingPoint<float> lhs(left), rhs(right);
if (lhs.AlmostEquals(rhs))
{
    std::cout << "EQUAL"; // Shouldn't it have entered here?
}
Any suggestions would be appreciated.
You can use
ASSERT_NEAR(val1, val2, abs_error);
where you can pass your chosen acceptable difference (say, 0.0000001) as abs_error if the default tolerance is too small; see https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#floating-point-comparison
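For instance, a test using it might look like this (assuming the usual gtest_main setup; the test name and values are only illustrative):

```cpp
#include <gtest/gtest.h>

TEST(FloatCompare, SumIsCloseEnough) {
    double computed = 0.1 + 0.2;   // actually 0.30000000000000004...
    double expected = 0.3;
    // Passes because |computed - expected| is far below the chosen abs_error.
    ASSERT_NEAR(expected, computed, 0.0000001);
}
```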
Your left and right are not “almost equal” because they are too far apart, farther than the default tolerance of AlmostEquals. The code in one of the answers in the question you linked to shows a tolerance of 4 ULP, but your numbers are 14 ULP apart (using IEEE 754 32-bit binary and correctly rounding software). (An ULP is the minimum increment of the floating-point value. It is small for floating-point numbers of small magnitude and large for large numbers, so it is approximately relative to the magnitude of the numbers.)
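If you want to verify the ULP distance yourself, a small sketch along these lines (assuming 32-bit IEEE 754 floats, and using the same sign-and-magnitude-to-biased mapping idea as the gtest code) will print how many representable floats separate the two values:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Map the IEEE 754 sign-magnitude bit pattern to a biased integer so that
// consecutive representable floats map to consecutive integers.
static std::uint32_t Biased(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // safe type pun
    return (bits & 0x80000000u) ? ~bits + 1u : bits | 0x80000000u;
}

static std::uint32_t UlpDistance(float a, float b) {
    std::uint32_t x = Biased(a), y = Biased(b);
    return x >= y ? x - y : y - x;
}

int main() {
    std::printf("%u ULPs apart\n", (unsigned)UlpDistance(0.1234567f, 0.1234566f));
    return 0;
}
```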
You should never perform any floating-point comparison without understanding what errors may be in the values you are comparing and what comparison you are performing.
People often misstate that you cannot test floating-point values for equality. This is false; executing a == b is a perfect operation. It returns true if and only if a is equal to b (that is, a and b are numbers with exactly the same value). The actual problem is that they are trying to calculate a correct function given incorrect input. == is a function: It takes two inputs and returns a value. Obviously, if you give any function incorrect inputs, it may return an incorrect result. So the problem here is not floating-point comparison; it is incorrect inputs. You cannot generally calculate a sum, a product, a square root, a logarithm, or any other function correctly given incorrect input. Therefore, when using floating-point, you must design an algorithm to work with approximate values (or, in special cases, use great care to ensure no errors are introduced).
Often people try to work around errors in their floating-point values by accepting as equal numbers that are slightly different. This decreases false negatives (indications of inequality due to prior computing errors) at the expense of increasing false positives (indications of equality caused by lax acceptance). Whether this exchange of one kind of error for another is acceptable depends on the application. There is no general solution, which is why functions like AlmostEquals are generally bad.
The errors in floating-point values are the results of preceding operations and values. These errors can range from zero to infinity, depending on circumstances. Because of this, one should never simply accept the default tolerance of a function such as AlmostEquals. Instead, one should calculate the tolerance, which is specific to their applications, needs, and computations, and use that calculated tolerance (or not use a comparison at all).
Another problem is that functions such as AlmostEquals are often written using tolerances that are specified relative to the values being compared. However, the errors in the values may have been affected by intermediate values of vastly different magnitude, so the final error might be a function of data that is not present in the values being compared.
“Approximate” floating-point comparisons may be acceptable in code that is testing other code because most bugs are likely to cause large errors, so a lax acceptance of equality will allow good code to continue but will report bugs in most bad code. However, even in this situation, you must set the expected result and the permitted error tolerance appropriately. The AlmostEquals code appears to hard-code the error tolerance.
(Not sure if this 100% applies to the original question but this is what I came for when I stumbled upon it)
There also exist ASSERT_FLOAT_EQ and EXPECT_FLOAT_EQ (or the corresponding versions for double) which you can use if you don't want to worry about tolerable errors yourself.
Docs: https://github.com/google/googletest/blob/master/docs/reference/assertions.md#floating-point-comparison-floating-point
Without getting into unnecessary details, is it possible for operations on floating-point numbers (x86_64) to return -however small- variations on their results, based on identical inputs? Even a single bit different?
I am simulating a basically chaotic system, and I expect small variations on the data to have visible effects. However I expected that, with the same data, the behavior of the program would be fixed. This is not the case. I get visible, but acceptable, differences with each run of the program.
I am thinking I have left some variable uninitialized somewhere...
The languages I am using are C++ and Python.
ANSWER
Russell's answer is correct. Floating point ops are deterministic. The non-determinism was caused by a dangling pointer.
Yes, this is possible. Quoting from the C++ FAQ:
Turns out that on some installations, cos(x) != cos(y) even though x == y. That's not a typo; read it again if you're not shocked: the cosine of something can be unequal to the cosine of the same thing. (Or the sine, or the tangent, or the log, or just about any other floating point computation.)
Why?
[F]loating point calculations and comparisons are often performed by special hardware that often contain special registers, and those registers often have more bits than a double. That means that intermediate floating point computations often have more bits than sizeof(double), and when a floating point value is written to RAM, it often gets truncated, often losing some bits of precision.
Contra Thomas's answer, floating point operations are not non-deterministic. They are fiendishly subtle, but a given program should give the same outputs for the same inputs, if it is not using uninitialized memory or deliberately randomized data.
My first question is, what do you mean by "the same data"? How is that data getting into your program?
I have a large amount of data to process with math intensive operations on each data set. Much of it is analogous to image processing. However, since this data is read directly from a physical device, many of the pixel values can be invalid.
This makes NaN's property of representing values that are not a number, and of propagating through arithmetic operations, very compelling. However, it also seems to require turning off some optimizations such as gcc's -ffast-math, plus we need to be cross-platform. Our current design uses a simple struct that contains a float value and a bool indicating validity.
While it seems NaN was designed with this use in mind, others think it is more trouble than it is worth. Does anyone have advice based on their more intimate experience with IEEE 754, with performance in mind?
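For concreteness, the two designs being weighed look roughly like this (the names are mine, not from the original code):

```cpp
#include <cmath>
#include <limits>

// Design 1: explicit validity flag next to the value.
struct MaybeFloat {
    float value;
    bool  valid;
};

// Design 2: mark invalid samples with a quiet NaN and let it propagate
// through arithmetic. Requires IEEE 754 semantics (so no -ffast-math).
constexpr float kInvalid = std::numeric_limits<float>::quiet_NaN();

inline bool IsValid(float x) { return !std::isnan(x); }
```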
BRIEF: For strictest portability, don't use NaNs. Use a separate valid bit, e.g. a template like Valid<T>. However, if you know that you will only ever run on IEEE 754-2008 machines, and not IEEE 754-1985 (see below), then you may get away with it.
For performance, it is probably faster not to use NaNs on most of the machines that you have access to. However, I have been involved with hardware design of FP on several machines that are improving NaN handling performance, so there is a trend to make NaNs faster, and, in particular, signalling NaNs should soon be faster than Valid<T>.
DETAIL:
Not all floating point formats have NaNs. Not all systems use IEEE floating point. IBM hex floating point can still be found on some machines - actually systems, since IBM now supports IEEE FP on more recent machines.
Furthermore, IEEE Floating Point itself had compatibility issues wrt NaNs in IEEE 754-1985. E.g., see Wikipedia http://en.wikipedia.org/wiki/NaN:
The original IEEE 754 standard from 1985 (IEEE 754-1985) only described binary floating point formats, and did not specify how the signaled/quiet state was to be tagged. In practice, the most significant bit of the significand determined whether a NaN is signalling or quiet. Two different implementations, with reversed meanings, resulted.

* most processors (including those of the Intel/AMD x86-32/x86-64 family, the Motorola 68000 family, the AIM PowerPC family, the ARM family, and the Sun SPARC family) set the signaled/quiet bit to non-zero if the NaN is quiet, and to zero if the NaN is signaling. Thus, on these processors, the bit represents an 'is_quiet' flag.
* in NaNs generated by the PA-RISC and MIPS processors, the signaled/quiet bit is zero if the NaN is quiet, and non-zero if the NaN is signaling. Thus, on these processors, the bit represents an 'is_signaling' flag.
Thus, if your code may run on older HP machines, or current MIPS machines (which are ubiquitous in embedded systems), you should not depend on a fixed encoding of NaN, but should have a machine-dependent #ifdef for your special NaNs.
IEEE 754-2008 standardizes NaN encodings, so this is getting better. It depends on your market.
As for performance: many machines essentially trap, or otherwise take a major hiccup in performance, when performing computations involving both SNaNs (which must trap) and QNaNs (which don't need to trap, i.e. which could be fast - and which are getting faster in some machines as we speak.)
I can say with confidence that on older machines, particularly older Intel machines, you did NOT want to use NaNs if you cared about performance. E.g. http://www.cygnus-software.com/papers/x86andinfinity.html says "The Intel Pentium 4 handles infinities, NANs, and denormals very badly. ... If you write code that adds floating point numbers at the rate of one per clock cycle, and then throw infinities at it as input, the performance drops. A lot. A huge amount. ... NANs are even slower. Addition with NANs takes about 930 cycles. ... Denormals are a bit trickier to measure."
Get the picture? Almost 1000x slower to use a NaN than to do a normal floating point operation? In this case it is almost guaranteed that using a template like Valid<T> will be faster.
However, see the reference to "Pentium 4"? That's a really old web page. For years people like me have been saying "QNaNs should be faster", and it has slowly taken hold.
More recently (2009), Microsoft says http://connect.microsoft.com/VisualStudio/feedback/details/498934/big-performance-penalty-for-checking-for-nans-or-infinity "If you do math on arrays of double that contain large numbers of NaN's or Infinities, there is an order of magnitude performance penalty."
If I feel impelled, I may go and run a microbenchmark on some machines. But you should get the picture.
This should be changing because it is not that hard to make QNaNs fast. But it has always been a chicken and egg problem: hardware guys like those I work with say "Nobody uses NaNs, so we won't make them fast", while software guys don't use NaNs because they are slow. Still, the tide is slowly changing.
Heck, if you are using gcc and want best performance, you turn on optimizations like "-ffinite-math-only ... Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." Similar is true for most compilers.
By the way, you can google like I did, "NaN performance floating point" and check refs out yourself. And/or run your own microbenchmarks.
Finally, I have been assuming that you are using a template like
template <typename T> class Valid {
    ...
    bool valid;
    T value;
    ...
};
I like templates like this, because they can bring "validity tracking" not just to FP, but also to integer (Valid<int>), etc.
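As a self-contained sketch of what that "validity tracking" might look like (my illustration, not the template from the original post), an arithmetic operator can propagate the flag much the way NaN propagates:

```cpp
template <typename T>
class Valid {
public:
    Valid() : valid(false), value() {}
    explicit Valid(T v) : valid(true), value(v) {}

    bool is_valid() const { return valid; }
    T get() const { return value; }

    // The result is valid only if both operands are valid,
    // mirroring how NaN propagates through arithmetic.
    friend Valid operator+(const Valid& a, const Valid& b) {
        return (a.valid && b.valid) ? Valid(a.value + b.value) : Valid();
    }

private:
    bool valid;
    T value;
};
```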
But they can have a big cost. The operations are probably not much more expensive than NaN handling on old machines, but the data density can be really poor. sizeof(Valid<float>) may sometimes be 2*sizeof(float). This bad density may hurt performance much more than the operations involved.
By the way, you should consider template specialization, so that Valid<float> uses NaNs if they are available and fast, and a valid bit otherwise.
template <> class Valid<float> {
    float value;
public:
    bool is_valid() const {
        // A NaN is the only value that compares unequal to itself,
        // so this detects the NaN used to mark invalid entries.
        return value == value;
    }
};
etc.
Anyway, you are better off having as few valid bits as possible, and packing them elsewhere, rather than putting a Valid<float> right next to each value. E.g.
struct Point { float x, y, z; };
Valid<Point> pt;
is better (density wise) than
struct Point_with_Valid_Coords { Valid<float> x, y, z; };
unless you are using NaNs - or some other special encoding.
And
struct Point_with_Valid_Coords { float x, y, z; bool valid_x, valid_y, valid_z; };
is in between - but then you have to do all the code yourself.
BTW, I have been assuming you are using C++. If FORTRAN or Java ...
BOTTOM LINE: separate valid bits is probably faster and more portable.
But NaN handling is speeding up, and one day soon will be good enough
By the way, my preference: create a Valid template. Then you can use it for all data types. Specialize it for NaNs if it helps. Although my life is making things faster, IMHO it is usually more important to make the code clean.
If invalid data is very common, you are of course wasting a lot of time running this data through the processing. If invalid data is common enough, it is probably better to use some kind of sparse data structure containing only the valid data. If it is not very common, you can of course keep a sparse data structure of which data is invalid. That way you would not waste a bool for each value. But maybe memory is not a problem for you...
If you are doing operations such as multiplying two possibly invalid data entries, I understand it is compelling to use NaNs instead of checking both variables to see if they are valid and setting the same flag in the result.
How portable do you need to be? Will you ever need to be able to port it to an architecture with only fixed point support? If that is the case, I think your choice is clear.
Personally I would only use NaNs if it proved to be much faster. Otherwise I'd say the code gets more clear if you have explicit handling of invalid data.
Since the floating-point numbers come from a device, they probably have a limited range. You can use some other special number, rather than NaN, to indicate absence of data, e.g. 1e37. This solution is portable. I do not know whether or not it is more convenient for you than using a bool flag.
I am writing a simulation program that proceeds in discrete steps. The simulation consists of many nodes, each of which has a floating-point value associated with it that is re-calculated on every step. The result can be positive, negative or zero.
In the case where the result is zero or less something happens. So far this seems straightforward - I can just do something like this for each node:
if (value <= 0.0f) something_happens();
A problem has arisen, however, after some recent changes I made to the program in which I re-arranged the order in which certain calculations are done. In a perfect world the values would still come out the same after this re-arrangement, but because of the imprecision of floating point representation they come out very slightly different. Since the calculations for each step depend on the results of the previous step, these slight variations in the results can accumulate into larger variations as the simulation proceeds.
Here's a simple example program that demonstrates the phenomenon I'm describing:
#include <cstdio>

int main()
{
    float f1 = 0.000001f, f2 = 0.000002f;
    f1 += 0.000004f;          // This part happens first here
    f1 += (f2 * 0.000003f);
    printf("%.16f\n", f1);

    f1 = 0.000001f, f2 = 0.000002f;
    f1 += (f2 * 0.000003f);
    f1 += 0.000004f;          // This time this happens second
    printf("%.16f\n", f1);
    return 0;
}
The output of this program is
0.0000050000057854
0.0000050000062402
even though addition is commutative and associative, so mathematically both results should be the same. Note: I understand perfectly well why this is happening - that's not the issue. The problem is that these variations can mean that sometimes a value that used to come out negative on step N, triggering something_happens(), may now come out negative a step or two earlier or later, which can lead to very different overall simulation results because something_happens() has a large effect.
What I want to know is whether there is a good way to decide when something_happens() should be triggered that is not going to be affected by the tiny variations in calculation results that result from re-ordering operations so that the behavior of newer versions of my program will be consistent with the older versions.
The only solution I've so far been able to think of is to use some value epsilon like this:
if (value < epsilon) something_happens();
but because the tiny variations in the results accumulate over time I need to make epsilon quite large (relatively speaking) to ensure that the variations don't result in something_happens() being triggered on a different step. Is there a better way?
I've read this excellent article on floating point comparison, but I don't see how any of the comparison methods described could help me in this situation.
Note: Using integer values instead is not an option.
Edit: the possibility of using doubles instead of floats has been raised. This wouldn't solve my problem, since the variations would still be there; they'd just be of a smaller magnitude.
I've worked with simulation models for 2 years and the epsilon approach is the sanest way to compare your floats.
Generally, using suitable epsilon values is the way to go if you need to use floating point numbers. Here are a few things which may help:
If your values are in a known range and you don't need divisions, you may be able to scale the problem and use exact operations on integers. In general, though, these conditions don't apply.
A variation is to use rational numbers to do exact computations. This still has restrictions on the operations available and it typically has severe performance implications: you trade performance for accuracy.
The rounding mode can be changed. This can be used to compute an interval rather than an individual value (possibly with three values resulting from round up, round down, and round to nearest); see the sketch below. Again, it won't work for everything, but you may get an error estimate out of this.
Keeping track of the value and the number of operations (possibly multiple counters) may also be used to estimate the current size of the error.
To possibly experiment with different numeric representations (float, double, interval, etc.) you might want to implement your simulation as templates parameterized for the numeric type.
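Here is the sketch referred to in the rounding-mode item: evaluating the same computation under upward and downward rounding gives a crude enclosure of the true result. This is only an illustration; real interval arithmetic libraries are considerably more careful, and the compiler generally needs to be told not to assume a fixed rounding mode (e.g. -frounding-math on gcc).

```cpp
#include <cfenv>
#include <cstdio>

#pragma STDC FENV_ACCESS ON  // compiler support for this pragma varies

// Sum 1/3 ten times under a given rounding mode.
double SumThirds(int rounding_mode) {
    std::fesetround(rounding_mode);
    volatile double one = 1.0, three = 3.0;  // volatile blocks constant folding
    double s = 0.0;
    for (int i = 0; i < 10; ++i) s += one / three;
    return s;
}

int main() {
    double low  = SumThirds(FE_DOWNWARD);
    double high = SumThirds(FE_UPWARD);
    std::fesetround(FE_TONEAREST);
    std::printf("true result lies in [%.17g, %.17g]\n", low, high);
    return 0;
}
```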
There are many books written on estimating and minimizing errors when using floating point arithmetic. This is the topic of numerical mathematics.
In most cases I'm aware of, people experiment briefly with some of the methods mentioned above, conclude that the model is imprecise anyway, and don't bother with the effort. Also, doing something other than using float may yield better results but is just too slow; even using double hurts, due to the doubled memory footprint and the reduced opportunity to use SIMD operations.
I recommend that you single step - preferably in assembly mode - through the calculations while doing the same arithmetic on a calculator. You should be able to determine which calculation orderings yield results of lesser quality than you expect and which work. You will learn from this and probably write better-ordered calculations in the future.
In the end - given the examples of numbers you use - you will probably need to accept the fact that you won't be able to do equality comparisons.
As to the epsilon approach, you usually need one epsilon for every possible exponent. For the single-precision floating point format you would need 256 single-precision floating point values, as the exponent is 8 bits wide. Some exponents will be the result of exceptions, but for simplicity it is better to have a 256-member vector than to do a lot of testing as well.
One way to do this could be to determine your base epsilon for the case where the exponent is 0, i.e. the value to be compared against is in the range 1.0 <= x < 2.0. Preferably the epsilon should be chosen to be base-2 adapted, i.e. a value that can be exactly represented in the single-precision floating point format - that way you know exactly what you are testing against and won't have to think about rounding problems in the epsilon as well. For exponent -1 you would use your base epsilon divided by two, for -2 divided by 4, and so on. As you approach the lowest and the highest parts of the exponent range you gradually run out of precision - bit by bit - so you need to be aware that extreme values can cause the epsilon method to fail.
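One way to avoid an explicit 256-entry table is to scale a base epsilon by the binade of the value being compared against, e.g. with std::frexp and std::ldexp. This is only a sketch of the idea; the base epsilon below is a placeholder (a power of two, so it is exactly representable, as suggested above):

```cpp
#include <cmath>

// Base epsilon chosen for reference values in [1.0, 2.0). Placeholder only.
const float kBaseEpsilon = 1.0f / (1 << 18);

bool NearlyEqual(float x, float reference) {
    int exp = 0;
    std::frexp(reference, &exp);             // reference = m * 2^exp, 0.5 <= |m| < 1
    // For reference in [1.0, 2.0), exp == 1, so the scale factor is 1;
    // for [0.5, 1.0) it is 1/2, and so on.
    float scaled = std::ldexp(kBaseEpsilon, exp - 1);
    return std::fabs(x - reference) <= scaled;
}
```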
If it absolutely has to be floats then using an epsilon value may help but may not eliminate all problems. I would recommend using doubles for the spots in the code you know for sure will have variation.
Another way is to use floats to emulate doubles. There are many techniques out there; the most basic one is to use two floats and do a little bit of math to save most of the number in one float and the remainder in the other (I saw a great guide on this; if I find it I'll link it).
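That technique is usually built from error-free transformations such as Knuth's TwoSum, which recovers the rounding error of an addition exactly. A minimal sketch of the building block (not a full double-float library):

```cpp
#include <cstdio>

// Knuth's TwoSum: returns the rounded sum in 'sum' and the exact rounding
// error in 'err', so that sum + err equals a + b exactly.
// Requires that the compiler does not reassociate (no -ffast-math).
static void TwoSum(float a, float b, float& sum, float& err) {
    sum = a + b;
    float bv = sum - a;
    float av = sum - bv;
    err = (b - bv) + (a - av);
}

int main() {
    float sum, err;
    TwoSum(1.0f, 1e-9f, sum, err);
    // The 1e-9 contribution is lost in 'sum' but preserved in 'err'.
    std::printf("sum = %.9g, err = %.9g\n", sum, err);
    return 0;
}
```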
Certainly you should be using doubles instead of floats. This will probably reduce the number of flipped nodes significantly.
Generally, using an epsilon threshold is only useful when you are comparing two floating-point numbers for equality, not when you are comparing them to see which is bigger. So (for most models, at least) using epsilon won't gain you anything at all -- it will just change the set of flipped nodes, it won't make that set smaller. If your model itself is chaotic, then it's chaotic.