Is binary equality comparison of floats correct? - c++

I'm working on different memory block manipulation functions and during benchmarks I noticed, that my implementation of the IsEqualRange(double* begin1, double* end1, double* begin2, double* end2) is much faster then the std::equals(...) on MSVC and GCC as well. Further investigation showed, that doubles and floats are not block compared by memcmp, but in a for loop one by one.
In what situation does binary comparison of floats lead to incorrect result? When is it ok to binary compare (equality) array of floats/doubles? Are there other fundamental types where I shouldn't use memcmp?

The first thing I would do if I were you is to check your optimisation settings.
It's fine to use memcmp for an array of floating points but note that you could get different results to element-by-element ==. In particular, for IEEE754 floating point:
+0.0 is defined to compare equal to -0.0.
NaN is defined to compare not-equal to NaN.

The main issue is nan values, as these are never equal to themselves. There is also two representations of 0 (+0 and -0) that are equal but not binary equal.
So strictly speaking, you cannot use memcmp for them, as the answer would be mathematically incorrect.
If you know that you don't have nan or 0 values, then you can use memcmp.

Binary compare works with too much precision for many actual applications. For example, in javascript, 0.2 / 10.0 != 0.1 * 0.2 If you have two variables that both end up with very, very slight rounding errors, they will be unequal despite representing the "same" number.

Related

How does sorting algorithms sort containers and ranges of floats?

Since comparing floats is evil then if I have a container of floats and I sort it using some standard library sorting algorithm like std::sort then how does the algorithm sort them?
std::vector<float> vf{2.4f, 1.05f, 1.05f, 2.39f};
std::sort( vf.begin(), vf.end() );
So does the algorithm compare 1.05f and 1.05f?
Does it internally uses something like: std::fabs( 1.05f - 1.05f ) < 0.1;?
Does this apply too to containers of doubles? Thank you!
So does the algorithm compare 1.05f and 1.05f?
Yes
Does it internally uses something like: std::fabs( 1.05f - 1.05f ) < 0.1;?
No, it uses operator <, e.g. 1.05f < 1.05f. It doesn't ever need to compare for equality, so the comparison using epsilon value is not needed.
Does this apply too to containers of doubles?
Yes, it applies to containers of any type (unless you provide your own comparison function to std::sort)
Since comparing floats is evil…
That is a myth and is false. Related to it is the myth that floating-point numbers approximate real numbers.
<, <=, >, and >= work fine for sorting numbers, and == and != also correctly compare whether two numbers are equal. <, <=, >, and >= will not serve when sorting data containing NaNs (the “Not a Number” datum).
Per the IEEE 754 Standard for Floating-Point Arithmetic and other common specifications of floating-point formats, any floating-point representation ±F•be represents one number exactly. Even +∞ and −∞ are considered to be exact. It is the operations in floating-point arithmetic, not the numbers, that are specified to approximate real-number arithmetic. When a computational operation is performed, its result is the real-number results rounded to the nearest representable number according to a chosen rounding rule, except that operations with domain errors may produce a NaN. (Round-to-nearest-ties-to-even is the most common rule, and there are several others.)
Thus, when you add or subtract numbers, multiply or divide numbers, take square roots, or convert numbers from one base to another (such as decimal character input to internal floating-point format), there may be a rounding error.
Some operations have no error. Comparison operations have no error. <, <=, >, >=, ==, and != always produce the true mathematical result, true or false, with no error. And therefore they may be used for sorting numbers. (With NaNs, they always produce false, because a NaN is never less than, equal to, or greater than a number, nor is a NaN less than, equal to, or greater than a NaN. So false is the correct result, but it means these comparisons are not useful for sorting with NaNs.)
Understanding this distinction, that numbers are exact and operations may approximate, is essential for analyzing, designing, and proving algorithms involving floating-point arithmetic.
Of course, it may happen that in an array of numbers, the numbers contain errors from earlier operations: They differ from the numbers that would have been obtained by using real-number arithmetic. In this case, the numbers will be sorted into order based on their actual computed values, not on the values you would ideally like them to have. This is not a barrier to std::sort correctly sorting the numbers.
To sort data including NaNs, you need a total order predicate that reports whether one datum is earlier than another in the desired sort order, such as one that reports a NaN is later than any non-NaN. IEEE-754 defines a total order predicate, but I cannot speak to its availability in C++. (std::less does not seem to provide it, based on a quick test I tried.)

floating point number comparison why no equal functions

In the c++ standard for floating pointer number, there is std::isgreater for greater comparison, and std::isless for less comparison, so why is there not a std::isequal for equality comparison? Is there a safe and accurate way to check if a double variable is equal to the DBL_MAX constants defined by the standard? The reason we try to do this is we are accessing data through service protocol, and it defines a double field when no data is available it will send DBL_MAX, so in our client code when it's DBL_MAX we need to skip it, and anything else we need to process it.
The interest of isgreater, isless, isgreaterequal, islessequal compared to >, <, >= and <= is that they do not raise FE_INVALID (a floating point exception, these are different beasts than C++ exceptions and are not mapped to C++ exceptions) when comparing with a NaN while the operators do.
As == do not raise a FP exception, there is no need of an additional functionality which does.
Note that there is also islessgreater and isunordered.
If you are not considering NaN or not testing the floating point exception there is no need to worry about these functions.
Considering equality comparison == is what to use if you want to check that the values are the same (ignoring the issues related to signed 0 and NaN). Depending on how you are reaching these values, it is sometimes useful to consider an approximate equality comparison -- but using one systematically is not recommended, for instance such approximate equality is probably not transitive.
In your context of a network protocol, you have to consider how the data is serialized. If the serialization is defined as binary, you can probably reconstruct the exact value and thus == is what you want so compare against DBL_MAX (for other values, check what is specified for signed 0 and NaN an know that there are signalling and quiet NaN are represented by different bit patterns although IEEE 754-2008 recommend now one of them). If the representation is decimal, you'll have to check if the representation is precise enough for the DBL_MAX value be reconstructable (and pay attention to rounding modes).
Note that I'd have considered a NaN for representing the no data available case instead of using a potentially valid value.
Is there a safe and accurate way to check if a double variable is equal to the DBL_MAX constants defined by the standard?
When you obtain a floating point number as a result of evaluating some expression, then == comparison doesn't make sense in most cases due to finite precision and rounding errors. However, if you first set a floating point variable to some value, then you can compare it to that value using == (with some exceptions like positive and negative zeros).
For example:
double v = std::numeric_limits<double>::max();
{ conditional assignment to v of a non-double-max value }
if (v != std::numeric_limits<double>::max())
process(v);
std::numeric_limits<double>::max() is exactly representable as double (it is double), so this comparison should be safe and should never yield true unless v was reassigned.
Equal function between to floating point have no sense, because the floating point numbers have rounding.
So if you want compare 2 floating point numbers for equality the best way is to declare a significant epsilon(error) for your comparison and then verify that the first number is in around the second.
To do this you can verify if first number is greater than second number minus epsilon and the first number is lower than the second number plus epsilon.
Ex.
if ( first > second - epsilon && first < second + epsilon ){
//The numbers are equal
}else{
//The numbers are not equal
}

Comparing double in C++, peer review

I have always had the problem of comparing double values for equality. There are functions around like some fuzzy_compare(double a, double b), but I often enough did not manage to find them in time. So I thought on building a wrapper class for double just for the comparison operator:
typedef union {
uint64_t i;
double d;
} number64;
bool Double::operator==(const double value) const {
number64 a, b;
a.d = this->value;
b.d = value;
if ((a.i & 0x8000000000000000) != (b.i & 0x8000000000000000)) {
if ((a.i & 0x7FFFFFFFFFFFFFFF) == 0 && (b.i & 0x7FFFFFFFFFFFFFFF) == 0)
return true;
return false;
}
if ((a.i & 0x7FF0000000000000) != (b.i & 0x7FF0000000000000))
return false;
uint64_t diff = (a.i & 0x000FFFFFFFFFFFF) - (b.i & 0x000FFFFFFFFFFFF) & 0x000FFFFFFFFFFFF;
return diff < 2; // 2 here is kind of some epsilon, but integer and independent of value range
}
The idea behind it is:
First, compare the sign. If it's different, the numbers are different. Except if all other bits are zero. That is comparing +0.0 with -0.0, which should be equal. Next, compare the exponent. If these are different, the numbers are different. Last, compare the mantissa. If the difference is low enough, the values are equal.
It seems to work, but just to be sure, I'd like a peer review. It could well be that I overlooked something.
And yes, this wrapper class needs all the operator overloading stuff. I skipped that because they're all trivial. The equality operator is the main purpose of this wrapper class.
This code has several problems:
Small values on different sides of zero always compare unequal, no matter how (not) far apart.
More importantly, -0.0 compares unequal with +epsilon but +0.0 compares equal with +epsilon (for some epsilon). That's really bad.
What about NaNs?
Values with different exponents compare unequal, even if one floating point "step" apart (e.g. the double before 1 compares unequal to 1, but the one after 1 compares equal...).
The last point could ironically be fixed by not distinguishing between exponent and mantissa: The binary representations of all positive floats are exactly in the order of their magnitude!
It appears that you want to just check whether two floats are a certain number of "steps" apart. If so, maybe this boost function might help. But I would also question whether that's actually reasonable:
Should the smallest positive non-denormal compare equal to zero? There are still many (denormal) floats between them. I doubt this is what you want.
If you operate on values that are expected to be of magnitude 1e16, then 1 should compare equal to 0, even though half of all positive doubles are between 0 and 1.
It is usually most practical to use a relative + absolute epsilon. But I think it will be most worthwhile to check out this article, which discusses the topic of comparing floats more extensively than I could fit into this answer:
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
To cite its conclusion:
Know what you’re doing
There is no silver bullet. You have to choose wisely.
If you are comparing against zero, then relative epsilons and ULPs based comparisons are usually meaningless. You’ll need to use an absolute epsilon, whose value might be some small multiple of FLT_EPSILON and the inputs to your calculation. Maybe.
If you are comparing against a non-zero number then relative epsilons or ULPs based comparisons are probably what you want. You’ll probably want some small multiple of FLT_EPSILON for your relative epsilon, or some small number of ULPs. An absolute epsilon could be used if you knew exactly what number you were comparing against.
If you are comparing two arbitrary numbers that could be zero or non-zero then you need the kitchen sink. Good luck and God speed.
Above all you need to understand what you are calculating, how stable the algorithms are, and what you should do if the error is larger than expected. Floating-point math can be stunningly accurate but you also need to understand what it is that you are actually calculating.
You store into one union member and then read from another. That causes aliasing problem (undefined behaviour) because the C++ language requires that objects of different types do not alias.
There are a few ways to remove the undefined behaviour:
Get rid of the union and just memcpy the double into uint64_t. The portable way.
Mark union member i type with [[gnu::may_alias]].
Insert a compiler memory barrier between storing into union member d and reading from member i.
Frame the question this way:
We have two numbers, a and b, that have been computed with floating-point arithmetic.
If they had been computed exactly with real-number mathematics, we would have a and b.
We want to compare a and b and get an answer that tells us whether a equals b.
In other words, you are trying to correct for errors that occurred while computing a and b. In general, that is impossible, of course, because we do not know what a and b are. We only have the approximations a and b.
The code you propose falls back to another strategy:
If a and b are close to each other, we will accept that a equals b. (In other words: If a is close to b, it is possible that a equals b, and the differences we have are only because of calculation errors, so we will accept that a equals b without further evidence.)
There are two problems with this strategy:
This strategy will incorrectly accept that a equals b even when it is not true, just because a and b are close.
We need to decide how close to require a and b to be.
Your code attempts to address the latter: It is establishing some tests about whether a and b are close enough. As others have pointed out, it is severely flawed:
It treats numbers as different if they have different signs, but floating-point arithmetic can cause a to be negative even if a is positive, and vice versa.
It treats numbers as different if they have different exponents, but floating-point arithmetic can cause a to have a different exponent from a.
It treats numbers as different if they differ by more than a fixed number of ULP (units of least precision), but floating-point arithmetic can, in general, cause a to differ from a by any amount.
It assumes an IEEE-754 format and needlessly uses aliasing with behavior not defined by the C++ standard.
The approach is fundamentally flawed because it needlessly fiddles with the floating-point representation. The actual way to determine from a and b whether a and b might be equal is to figure out, given a and b, what sets of values a and b have and whether there is any value in common in those sets.
In other words, given a, the value of a might be in some interval, (a−eal, a+ear) (that is, all the numbers from a minus some error on the left to a plus some error on the right), and, given b, the value of b might be in some interval, (b−ebl, b+ebr). If so, what you want to test is not some floating-point representation properties but whether the two intervals (a−eal, a+ear) and (b−ebl, b+ebr) overlap.
To do that, you need to know, or at least have bounds on, the errors eal, ear, ebl, and ebr. But those errors are not fixed by the floating-point format. They are not 2 ULP or 1 ULP or any number of ULP scaled by the exponent. They depend on how a and b were computed. In general, the errors can range from 0 to infinity, and they can also be NaN.
So, to test whether a and b might be equal, you need to analyze the floating-point arithmetic errors that could have occurred. In general, this is difficult. There is an entire field of mathematics for it, numerical analysis.
If you have computed bounds on the errors, then you can just compare the intervals using ordinary arithmetic. There is no need to take apart the floating-point representation and work with the bits. Just use the normal add, subtract, and comparison operations.
(The problem is actually more complicated than I allowed above. Given a computed value a, the potential values of a do not always lie in a single interval. They could be an arbitrary set of points.)
As I have written previously, there is no general solution for comparing numbers containing arithmetic errors: 0 1 2 3.
Once you figure out error bounds and write a test that returns true if a and b might be equal, you still have the problem that the test also accepts false negatives: It will return true even in cases where a and b are not equal. In other words, you have just replaced a program that is wrong because it rejects equality even though a and b would be equal with a program that is wrong in other cases because it accepts equality in cases where a and b are not equal. This is another reason there is no general solution: In some applications, accepting as equal numbers that are not equal is okay, at least for some situations. In other applications, that is not okay, and using a test like this will break the program.

Is it ok to compare floating points to 0.0 without epsilon?

I am aware, that to compare two floating point values one needs to use some epsilon precision, as they are not exact. However, I wonder if there are edge cases, where I don't need that epsilon.
In particular, I would like to know if it is always safe to do something like this:
double foo(double x){
if (x < 0.0) return 0.0;
else return somethingelse(x); // somethingelse(x) != 0.0
}
int main(){
int x = -3.0;
if (foo(x) == 0.0) {
std::cout << "^- is this comparison ok?" << std::endl;
}
}
I know that there are better ways to write foo (e.g. returning a flag in addition), but I wonder if in general is it ok to assign 0.0 to a floating point variable and later compare it to 0.0.
Or more general, does the following comparison yield true always?
double x = 3.3;
double y = 3.3;
if (x == y) { std::cout << "is an epsilon required here?" << std::endl; }
When I tried it, it seems to work, but it might be that one should not rely on that.
Yes, in this example it is perfectly fine to check for == 0.0. This is not because 0.0 is special in any way, but because you only assign a value and compare it afterwards. You could also set it to 3.3 and compare for == 3.3, this would be fine too. You're storing a bit pattern, and comparing for that exact same bit pattern, as long as the values are not promoted to another type for doing the comparison.
However, calculation results that would mathematically equal zero would not always equal 0.0.
This Q/A has evolved to also include cases where different parts of the program are compiled by different compilers. The question does not mention this, my answer applies only when the same compiler is used for all relevant parts.
C++ 11 Standard,
§5.10 Equality operators
6 If both operands are of arithmetic or enumeration type, the usual
arithmetic conversions are performed on both operands; each of the
operators shall yield true if the specified relationship is true and
false if it is false.
The relationship is not defined further, so we have to use the common meaning of "equal".
§2.13.4 Floating literals
1 [...] If the scaled value is in the range of representable values
for its type, the result is the scaled value if representable, else
the larger or smaller representable value nearest the scaled value,
chosen in an implementation-defined manner. [...]
The compiler has to choose between exactly two values when converting a literal, when the value is not representable. If the same value is chosen for the same literal consistently, you are safe to compare values such as 3.3, because == means "equal".
Yes, if you return 0.0 you can compare it to 0.0; 0 is representable exactly as a floating-point value. If you return 3.3 you have to be a much more careful, since 3.3 is not exactly representable, so a conversion from double to float, for example, will produce a different value.
correction: 0 as a floating point value is not unique, but IEEE 754 defines the comparison 0.0==-0.0 to be true (any zero for that matter).
So with 0.0 this works - for every other number it does not. The literal 3.3 in one compilation unit (e.g. a library) and another (e.g. your application) might differ. The standard only requires the compiler to use the same rounding it would use at runtime - but different compilers / compiler settings might use different rounding.
It will work most of the time (for 0), but is very bad practice.
As long as you are using the same compiler with the same settings (e.g. one compilation unit) it will work because the literal 0.0 or 0.0f will translate to the same bit pattern every time. The representation of zero is not unique though. So if foo is declared in a library and your call to it in some application the same function might fail.
You can rescue this very case by using std::fpclassify to check whether the returned value represents a zero. For every finite (non-zero) value you will have to use an epsilon-comparison though unless you stay within one compilation unit and perform no operations on the values.
As written in both cases you are using identical constants in the same file fed to the same compiler. The string to float conversion the compiler uses should return the same bit pattern so these should not only be equal as in a plus or minus cases for zero thing but equal bit by bit.
Were you to have a constant which uses the operating systems C library to generate the bit pattern then have a string to f or something that can possibly use a different C library if the binary is transported to another computer than the one compiled on. You might have a problem.
Certainly if you compute 3.3 for one of the terms, runtime, and have the other 3.3 computed compile time again you can and will get failures on the equal comparisons. Some constants obviously are more likely to work than others.
Of course as written your 3.3 comparison is dead code and the compiler just removes it if optimizations are enabled.
You didnt specify the floating point format nor standard if any for that format you were interested in. Some formats have the +/- zero problem, some dont for example.
It is a common misconception that floating point values are "not exact". In fact each of them is perfectly exact (except, may be, some special cases as -0.0 or Inf) and equal to s·2e – (p – 1), where s, e, and p are significand, exponent, and precision correspondingly, each of them integer. E.g. in IEEE 754-2008 binary32 format (aka float32) p = 24 and 1 is represented as ‭0x‭800000‬‬·20 – 23. There are two things that are really not exact when you deal with floating point values:
Representation of a real value using a FP one. Obviously, not all real numbers can be represented using a given FP format, so they have to be somehow rounded. There are several rounding modes, but the most commonly used is the "Round to nearest, ties to even". If you always use the same rounding mode, which is almost certainly the case, the same real value is always represented with the same FP one. So you can be sure that if two real values are equal, their FP counterparts are exactly equal too (but not the reverse, obviously).
Operations with FP numbers are (mostly) inexact. So if you have some real-value function φ(ξ) implemented in the computer as a function of a FP argument f(x), and you want to compare its result with some "true" value y, you need to use some ε in comparison, because it is very hard (sometimes even impossible) to white a function giving exactly y. And the value of ε strongly depends on the nature of the FP operations involved, so in each particular case there may be different optimal value.
For more details see D. Goldberg. What Every Computer Scientist Should Know About Floating-Point Arithmetic, and J.-M. Muller et al. Handbook of Floating-Point Arithmetic. Both texts you can find in the Internet.

Floating-point comparison of constant assignment

When comparing doubles for equality, we need to give a tolerance level, because floating-point computation might introduce errors. For example:
double x;
double y;
x = f();
y = g();
if (fabs(x-y)<epsilon) {
// they are equal!
} else {
// they are not!
}
However, if I simply assign a constant value, without any computation, do I still need to check the epsilon?
double x = 1;
double y = 1;
if (x==y) {
// they are equal!
} else {
// no they are not!
}
Is == comparison good enough? Or I need to do fabs(x-y)<epsilon again? Is it possible to introduce error in assigning? Am I too paranoid?
How about casting (double x = static_cast<double>(100))? Is that gonna introduce floating-point error as well?
I am using C++ on Linux, but if it differs by language, I would like to understand that as well.
Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get the value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now I'd be very worried about an implementation that chose a different close match based on the same input token but it is possible.
(a) Actually, that paragraph can be read in two ways, either the implementation is free to choose either the closest higher or closest lower value regardless of which is actually the closest, or it must choose the closest to the desired value.
If the latter, it doesn't change this answer however since all you have to do is hardcode a floating point value exactly at the midpoint of two representable types and the implementation is once again free to choose either.
For example, it might alternate between the next higher and next lower for the same reason banker's rounding is applied - to reduce the cumulative errors.
No if you assign literals they should be the same :)
Also if you start with the same value and do the same operations, they should be the same.
Floating point values are non-exact, but the operations should produce consistent results :)
Both cases are ultimately subject to implementation defined representations.
Storage of floating point values and their representations take on may forms - load by address or constant? optimized out by fast math? what is the register width? is it stored in an SSE register? Many variations exist.
If you need precise behavior and portability, do not rely on this implementation defined behavior.
IEEE-754, which is a standard common implementations of floating point numbers abide to, requires floating-point operations to produce a result that is the nearest representable value to an infinitely-precise result. Thus the only imprecision that you will face is rounding after each operation you perform, as well as propagation of rounding errors from the operations performed earlier in the chain. Floats are not per se inexact. And by the way, epsilon can and should be computed, you can consult any numerics book on that.
Floating point numbers can represent integers precisely up to the length of their mantissa. So for example if you cast from an int to a double, it will always be exact, but for casting into into a float, it will no longer be exact for very large integers.
There is one major example of extensive usage of floating point numbers as a substitute for integers, it's the LUA scripting language, which has no integer built-in type, and floating-point numbers are used extensively for logic and flow control etc. The performance and storage penalty from using floating-point numbers turns out to be smaller than the penalty of resolving multiple types at run time and makes the implementation lighter. LUA has been extensively used not only on PC, but also on game consoles.
Now, many compilers have an optional switch that disables IEEE-754 compatibility. Then compromises are made. Denormalized numbers (very very small numbers where the exponent has reached smallest possible value) are often treated as zero, and approximations in implementation of power, logarithm, sqrt, and 1/(x^2) can be made, but addition/subtraction, comparison and multiplication should retain their properties for numbers which can be exactly represented.
The easy answer: For constants == is ok.
There are two exceptions which you should be aware of:
First exception:
0.0 == -0.0
There is a negative zero which compares equal for the IEEE 754 standard. This means
1/INFINITY == 1/-INFINITY which breaks f(x) == f(y) => x == y
Second exception:
NaN != NaN
This is a special caveat of NotaNumber which allows to find out if a number is a NaN
on systems which do not have a test function available (Yes, that happens).