There are plenty of documents and discussions about floating-point number comparison. But it is not clear to me whether direct comparison of numbers can always be guaranteed to work on all compilers and platforms.
double x = 1.;
if (1. == x)
{
//do something
}
Will we always enter the if block?
Edited:
And what comparison here is correct (will work always)? This one?:
double x = 1.;
if (std::abs(1. - x) < std::numeric_limits<double>::epsilon())
{
//do something
}
Yes, direct comparison like that -- with no intervening operations -- will always work. The bit pattern that is stored for a floating point literal is the closest one representable by the floating point system (almost always IEEE-754). So testing 1.0 == 1.0 will always work because the bit pattern is that of 1.0; and 0.3 == 0.3 will also always work, because the bit pattern -- while not exactly 0.3 -- is the closest representable number to 0.3 on both sides of the comparison.
As for the epsilon thing, stay away from machine epsilon until you actually know what it represents and what it's for. Machine epsilon is relative, not absolute; and using it to compare "close enough" requires an understanding of how much error various operations can introduce. Interestingly, in your particular case, the two tests are nearly identical in effect: the only values that pass the epsilon test are 1.0 itself and the single double immediately below it, because doubles just below 1.0 are spaced half of machine epsilon apart.
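To make the "relative, not absolute" point concrete, here is a minimal sketch, assuming IEEE-754 binary64 doubles. The helper names gap_above and naive_epsilon_equal are invented for this illustration. It shows that the spacing between adjacent doubles at 1.0 equals machine epsilon, while at 1.0e6 the spacing is already far larger, so an absolute test against epsilon degenerates into plain == there.

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable double above it.
double gap_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Hypothetical helper: the naive absolute test that uses machine epsilon directly.
bool naive_epsilon_equal(double a, double b) {
    return std::abs(a - b) < std::numeric_limits<double>::epsilon();
}
```

With these, naive_epsilon_equal(1.0e6, std::nextafter(1.0e6, 2.0e6)) is false even though the two arguments are as close as two distinct doubles can be at that magnitude.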
If you have two exact floating point values, you don't have to worry that comparing them may work contrary to your expectations. The problem is how do you know that your values are exact?
For the provided example you can be pretty confident that x == 1.0 will evaluate to true - I personally wouldn't consider supporting any platform that would fail that test. Yet it is possible to make your example gradually more complicated without being able to tell for sure at which point you should stop relying on the outcome of the comparison.
I am aware that comparing two floating point values usually requires some epsilon precision, because computed results are not exact. However, I wonder if there are edge cases where I don't need that epsilon.
In particular, I would like to know if it is always safe to do something like this:
double foo(double x){
if (x < 0.0) return 0.0;
else return somethingelse(x); // somethingelse(x) != 0.0
}
int main(){
double x = -3.0;
if (foo(x) == 0.0) {
std::cout << "^- is this comparison ok?" << std::endl;
}
}
I know that there are better ways to write foo (e.g. returning a flag in addition), but I wonder whether in general it is ok to assign 0.0 to a floating point variable and later compare it to 0.0.
Or more generally, does the following comparison always yield true?
double x = 3.3;
double y = 3.3;
if (x == y) { std::cout << "is an epsilon required here?" << std::endl; }
When I tried it, it seems to work, but it might be that one should not rely on that.
Yes, in this example it is perfectly fine to check for == 0.0. This is not because 0.0 is special in any way, but because you only assign a value and compare it afterwards. You could also set it to 3.3 and compare for == 3.3, this would be fine too. You're storing a bit pattern, and comparing for that exact same bit pattern, as long as the values are not promoted to another type for doing the comparison.
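A small sketch of the "same bit pattern" argument, assuming a 64-bit IEEE-754 double. The names bits_of and assigned_literal_compares_equal are made up for this example. It copies the stored representation into an integer so the two sides can be compared bit for bit.

```cpp
#include <cstdint>
#include <cstring>

// Copy the object representation of a double into a 64-bit integer.
// Assumes double is 64 bits wide (true for IEEE-754 binary64).
std::uint64_t bits_of(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Assign a literal, then compare against the same literal with no arithmetic in between.
bool assigned_literal_compares_equal() {
    double x = 3.3;
    return x == 3.3 && bits_of(x) == bits_of(3.3);
}
```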
However, calculation results that would mathematically equal zero would not always equal 0.0.
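As an illustration of that last point, here is a hedged example (the function name is hypothetical, and IEEE-754 doubles are assumed). In real arithmetic 0.1 + 0.2 - 0.3 is zero, but each literal is first rounded to the nearest double, so the computed result is a tiny non-zero value.

```cpp
// 0.1 + 0.2 - 0.3 is zero in real arithmetic, but each literal is rounded
// to the nearest double first, so the computed result need not be 0.0.
double sum_minus_expected() {
    double a = 0.1;
    double b = 0.2;
    double c = 0.3;
    return (a + b) - c;
}
```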
This Q/A has evolved to also include cases where different parts of the program are compiled by different compilers. The question does not mention this, my answer applies only when the same compiler is used for all relevant parts.
C++ 11 Standard,
§5.10 Equality operators
6 If both operands are of arithmetic or enumeration type, the usual
arithmetic conversions are performed on both operands; each of the
operators shall yield true if the specified relationship is true and
false if it is false.
The relationship is not defined further, so we have to use the common meaning of "equal".
§2.13.4 Floating literals
1 [...] If the scaled value is in the range of representable values
for its type, the result is the scaled value if representable, else
the larger or smaller representable value nearest the scaled value,
chosen in an implementation-defined manner. [...]
The compiler has to choose between exactly two values when converting a literal, when the value is not representable. If the same value is chosen for the same literal consistently, you are safe to compare values such as 3.3, because == means "equal".
Yes, if you return 0.0 you can compare it to 0.0; 0 is representable exactly as a floating-point value. If you return 3.3 you have to be much more careful, since 3.3 is not exactly representable, so a conversion from double to float, for example, will produce a different value.
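The double-to-float caveat can be checked with a short sketch. The helper name survives_float_round_trip is invented here, and IEEE-754 float and double are assumed.

```cpp
// Converting through float keeps values that float can hold exactly (0.0, 0.5)
// but changes values it cannot hold exactly (3.3).
bool survives_float_round_trip(double value) {
    float narrowed = static_cast<float>(value);
    return static_cast<double>(narrowed) == value;
}
```

So a 3.3 that has passed through a float somewhere no longer compares equal to the double literal 3.3, while 0.0 is unaffected.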
correction: 0 as a floating point value is not unique, but IEEE 754 defines the comparison 0.0==-0.0 to be true (any zero for that matter).
So with 0.0 this works - for every other number it does not. The literal 3.3 in one compilation unit (e.g. a library) and another (e.g. your application) might differ. The standard only requires the compiler to use the same rounding it would use at runtime - but different compilers / compiler settings might use different rounding.
It will work most of the time (for 0), but is very bad practice.
As long as you are using the same compiler with the same settings (e.g. one compilation unit) it will work, because the literal 0.0 or 0.0f will translate to the same bit pattern every time. The representation of zero is not unique though. So if foo is defined in a library and your call to it is in some application, the same comparison might fail.
You can rescue this very case by using std::fpclassify to check whether the returned value represents a zero. For every finite (non-zero) value you will have to use an epsilon-comparison though unless you stay within one compilation unit and perform no operations on the values.
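A minimal sketch of the std::fpclassify approach mentioned above, assuming IEEE-754 semantics for signed zero. The function names are hypothetical.

```cpp
#include <cmath>

// True for both +0.0 and -0.0, regardless of the sign bit.
bool is_zero(double value) {
    return std::fpclassify(value) == FP_ZERO;
}

// The two zeros compare equal even though their sign bits differ.
bool zeros_compare_equal_but_differ_in_sign() {
    double pos = 0.0;
    double neg = -0.0;
    return pos == neg && !std::signbit(pos) && std::signbit(neg);
}
```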
As written, in both cases you are using identical constants in the same file fed to the same compiler. The string-to-float conversion the compiler uses should return the same bit pattern, so these should be equal not only in the sense that plus and minus zero compare equal, but bit for bit.
If instead one constant gets its bit pattern from the operating system's C library, and the other comes from a string-to-float conversion that may use a different C library (for example when the binary is moved to a computer other than the one it was compiled on), you might have a problem.
Certainly, if you compute 3.3 for one of the terms at run time and have the other 3.3 computed at compile time, you can and will get failures on equality comparisons. Some constants are obviously more likely to work than others.
Of course, as written, your 3.3 comparison is dead code, and the compiler simply removes it if optimizations are enabled.
You didn't specify the floating point format, nor the standard (if any) for that format, that you are interested in. Some formats have the +/- zero problem and some don't, for example.
It is a common misconception that floating point values are "not exact". In fact each of them is perfectly exact (except, maybe, some special cases such as -0.0 or Inf) and equal to s·2^(e − (p − 1)), where s, e, and p are the significand, exponent, and precision respectively, each of them an integer. E.g. in the IEEE 754-2008 binary32 format (aka float32) p = 24 and 1 is represented as 0x800000·2^(0 − 23). There are two things that are really not exact when you deal with floating point values:
Representation of a real value using a FP one. Obviously, not all real numbers can be represented using a given FP format, so they have to be somehow rounded. There are several rounding modes, but the most commonly used is the "Round to nearest, ties to even". If you always use the same rounding mode, which is almost certainly the case, the same real value is always represented with the same FP one. So you can be sure that if two real values are equal, their FP counterparts are exactly equal too (but not the reverse, obviously).
Operations with FP numbers are (mostly) inexact. So if you have some real-valued function φ(ξ) implemented in the computer as a function of an FP argument f(x), and you want to compare its result with some "true" value y, you need to use some ε in the comparison, because it is very hard (sometimes even impossible) to write a function giving exactly y. And the value of ε strongly depends on the nature of the FP operations involved, so in each particular case there may be a different optimal value.
For more details see D. Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic, and J.-M. Muller et al., Handbook of Floating-Point Arithmetic. Both texts can be found on the Internet.
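As a rough illustration of the significand/exponent form s·2^(e − (p − 1)) described above, the sketch below splits a float into integers s and e with p = 24 and rebuilds it. It assumes an IEEE-754 binary32 float and a finite, non-zero input. The names Decomposed, decompose and recompose are made up for this example.

```cpp
#include <cmath>
#include <cstdint>

struct Decomposed {
    std::int64_t significand;  // integer s
    int exponent;              // integer e
};

// Split a finite, non-zero float so that value == s * 2^(e - 23), with p = 24.
Decomposed decompose(float value) {
    const int p = 24;
    int exp = 0;
    // value == fraction * 2^exp with 0.5 <= |fraction| < 1
    float fraction = std::frexp(value, &exp);
    // Scaling the fraction by 2^p yields an integer, because float has p significant bits.
    auto s = static_cast<std::int64_t>(std::ldexp(fraction, p));
    return {s, exp - 1};
}

// Rebuild the float from s and e.
float recompose(Decomposed d) {
    return std::ldexp(static_cast<float>(d.significand), d.exponent - 23);
}
```

For 1.0f this gives s = 0x800000 and e = 0, matching the representation quoted in the text.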
When comparing doubles for equality, we need to give a tolerance level, because floating-point computation might introduce errors. For example:
double x;
double y;
x = f();
y = g();
if (fabs(x-y)<epsilon) {
// they are equal!
} else {
// they are not!
}
However, if I simply assign a constant value, without any computation, do I still need to check the epsilon?
double x = 1;
double y = 1;
if (x==y) {
// they are equal!
} else {
// no they are not!
}
Is == comparison good enough? Or do I need to do fabs(x-y)<epsilon again? Is it possible to introduce error in assigning? Am I too paranoid?
How about casting (double x = static_cast<double>(100))? Is that gonna introduce floating-point error as well?
I am using C++ on Linux, but if it differs by language, I would like to understand that as well.
Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get the value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now I'd be very worried about an implementation that chose a different close match based on the same input token but it is possible.
(a) Actually, that paragraph can be read in two ways, either the implementation is free to choose either the closest higher or closest lower value regardless of which is actually the closest, or it must choose the closest to the desired value.
If the latter, it doesn't change this answer, however, since all you have to do is hardcode a floating point value exactly at the midpoint of two representable values and the implementation is once again free to choose either.
For example, it might alternate between the next higher and next lower for the same reason banker's rounding is applied - to reduce the cumulative errors.
No, if you assign literals they should be the same :)
Also if you start with the same value and do the same operations, they should be the same.
Floating point values are non-exact, but the operations should produce consistent results :)
Both cases are ultimately subject to implementation defined representations.
Storage of floating point values and their representations take on many forms - load by address or constant? optimized out by fast math? what is the register width? is it stored in an SSE register? Many variations exist.
If you need precise behavior and portability, do not rely on this implementation defined behavior.
IEEE-754, which is a standard that common implementations of floating point numbers abide by, requires floating-point operations to produce a result that is the nearest representable value to an infinitely-precise result. Thus the only imprecision that you will face is rounding after each operation you perform, as well as propagation of rounding errors from the operations performed earlier in the chain. Floats are not per se inexact. And by the way, epsilon can and should be computed; you can consult any numerics book on that.
Floating point numbers can represent integers precisely up to the length of their mantissa. So for example if you cast from a 32-bit int to a double, it will always be exact, but when casting into a float, it will no longer be exact for very large integers.
There is one major example of extensive usage of floating point numbers as a substitute for integers: the Lua scripting language, which has no integer built-in type, and floating-point numbers are used extensively for logic and flow control etc. The performance and storage penalty from using floating-point numbers turns out to be smaller than the penalty of resolving multiple types at run time and makes the implementation lighter. Lua has been extensively used not only on PC, but also on game consoles.
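A small sketch of the mantissa-length limit, assuming IEEE-754 float (24 significant bits) and double (53 significant bits). The helper names are hypothetical. The round trip goes through a 64-bit integer to avoid overflow when the float rounds up past the int32 range.

```cpp
#include <cstdint>

// True if the 32-bit integer survives a round trip through float unchanged.
bool exact_in_float(std::int32_t v) {
    float f = static_cast<float>(v);
    return static_cast<std::int64_t>(f) == static_cast<std::int64_t>(v);
}

// True if the 32-bit integer survives a round trip through double unchanged.
bool exact_in_double(std::int32_t v) {
    double d = static_cast<double>(v);
    return static_cast<std::int64_t>(d) == static_cast<std::int64_t>(v);
}
```

The first integer that float cannot hold is 2^24 + 1 = 16777217, while double holds every 32-bit int exactly.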
Now, many compilers have an optional switch that disables IEEE-754 compatibility. Then compromises are made. Denormalized numbers (very very small numbers where the exponent has reached smallest possible value) are often treated as zero, and approximations in implementation of power, logarithm, sqrt, and 1/(x^2) can be made, but addition/subtraction, comparison and multiplication should retain their properties for numbers which can be exactly represented.
The easy answer: For constants == is ok.
There are two exceptions which you should be aware of:
First exception:
0.0 == -0.0
There is a negative zero which compares equal for the IEEE 754 standard. This means
1/INFINITY == 1/-INFINITY which breaks f(x) == f(y) => x == y
Second exception:
NaN != NaN
This is a special property of NaN (Not-a-Number) which makes it possible to find out whether a number is a NaN
on systems which do not have a test function available (Yes, that happens).
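The self-comparison trick can be sketched as follows, assuming IEEE-754 comparison semantics. The function name is hypothetical, and where it is available std::isnan from <cmath> is the standard way to do this.

```cpp
#include <cmath>

// NaN is the only value that compares unequal to itself, so x != x detects it.
// Note: aggressive flags such as -ffast-math may optimize this test away.
bool is_nan_by_self_compare(double x) {
    return x != x;
}

// Cross-check against the standard library's test.
bool agrees_with_std_isnan(double x) {
    return is_nan_by_self_compare(x) == std::isnan(x);
}
```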
I have some math (in C++) which seems to be generating some very small, near zero, numbers (I suspect the trig function calls are my real problem), but I'd like to detect these cases so that I can study them in more detail.
I'm currently trying out the following, is it correct?
if ( std::abs(x) < DBL_MIN ) {
log_debug("detected small num, %Le, %Le", x, y);
}
Second, the mathematics is trigonometric in nature (i.e. it uses a lot of radian/degree conversions and sin/cos/tan calls), so what sort of transformations can I do to avoid mathematical errors?
Obviously for multiplications I can use a log transform - what else?
Contrary to widespread belief, DBL_MIN is not the smallest positive double value but the smallest positive normalized double value. Typically - for 64-bit ieee754 doubles - it's 2^-1022, while the smallest positive double value is 2^-1074. Therefore
I'm currently trying out the following, is it correct?
if ( std::abs(x) < DBL_MIN ) {
log_debug("detected small num, %Le, %Le", x, y);
}
may have an affirmative answer. The condition checks whether x is a denormalized (also called subnormal) number or ±0.0. Without knowing more about your specific situation, I cannot tell if that test is appropriate. Denormalized numbers can be legitimate results of calculations or the consequence of rounding where the correct result would be 0. It is also possible that rounding produces numbers of far greater magnitude than DBL_MIN when the mathematically correct result would be 0, so a much larger threshold could be sensible.
If x is a double, then one problem with this approach is that you can't distinguish between x being legitimately zero, and x being a positive value smaller than DBL_MIN. So this will work if you know x can never be legitimately zero, and you want to see when underflow occurs.
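You could also try catching the SIGFPE signal, which a POSIX-compliant system delivers on an arithmetic error. Note that floating-point exceptions such as underflow are usually masked by default, so trapping generally has to be enabled before the signal will fire for them. See: http://en.wikipedia.org/wiki/SIGFPE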
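As an alternative to a signal handler, and not the SIGFPE mechanism itself, the floating-point status flags can be polled through <cfenv>. The sketch below is a rough illustration with a hypothetical function name. It assumes the platform defines FE_UNDERFLOW and that the compiler does not reorder the floating-point operation around the flag calls (strictly, this needs FENV_ACCESS support, which not every compiler provides).

```cpp
#include <cfenv>
#include <cfloat>

// Returns true if multiplying a and b raised the underflow status flag.
// volatile keeps the compiler from folding the multiplication at compile time.
bool multiply_underflows(double a, double b) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double lhs = a;
    volatile double rhs = b;
    volatile double product = lhs * rhs;
    (void)product;
    return std::fetestexcept(FE_UNDERFLOW) != 0;
}
```

For example, multiply_underflows(DBL_MIN, DBL_MIN) is expected to report an underflow, while multiply_underflows(2.0, 3.0) is not.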
EDIT: To be clear, DBL_MIN is NOT the largest negative value that a double can hold, it is the smallest positive normalized value that a double can hold. So your approach is fine as long as the value can't be zero.
Another useful constant is DBL_EPSILON, which is the difference between 1.0 and the next representable double above it. Note that this is a much larger value than DBL_MIN. But it may be useful to you since you're doing trigonometric functions that may tend toward 1 instead of tending toward 0.
Since you are using C++, the most idiomatic is to use std::numeric_limits from header <limits>.
For instance:
template <typename T>
bool is_close_to_zero(T x)
{
return std::abs(x) < std::numeric_limits<T>::epsilon();
}
The actual tolerance to be used heavily depends on your problem. Please complete your question with a concrete use case so that I can enhance my answer.
There is also std::numeric_limits<T>::min() and std::numeric_limits<T>::denorm_min() that may be useful. The first one is the smallest positive non-denormalized value of type T (equal to FLT/DBL/LDBL_MIN from <cfloat>), the second one is the smallest positive value of type T (no <cfloat> equivalent).
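To show how min() and denorm_min() relate to the test from the question, here is a hedged sketch with invented helper names. It assumes subnormals are supported and not flushed to zero.

```cpp
#include <cmath>
#include <limits>

// True if x is subnormal (denormalized): non-zero but smaller in magnitude than min().
template <typename T>
bool is_subnormal(T x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// The test from the question: true for subnormals and for +/-0.0.
template <typename T>
bool below_min_normal(T x) {
    return std::abs(x) < std::numeric_limits<T>::min();
}
```

The difference between the two is exactly the zero case: below_min_normal(0.0) is true, is_subnormal(0.0) is false.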
[You may find this document useful to read if you aren't at ease with floating point numbers representation.]
The first if check will actually only be true when your value is zero.
For your second question, you imply lots of conversions. Instead, pick one unit (deg or rad) and do all your computational operations in that unit. Then at the very end do a single conversion to the other value if you need to.
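One way to read that advice, as a sketch only: accept whatever unit the inputs come in, do all intermediate math in radians, and convert once on the way out. The names kPi, rad_to_deg and bearing_change_deg are hypothetical.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

constexpr double rad_to_deg(double radians) { return radians * (180.0 / kPi); }

// Angle (in degrees) swept when turning from direction (x1, y1) to direction (x2, y2).
// All intermediate values stay in radians; there is a single conversion at the end.
double bearing_change_deg(double x1, double y1, double x2, double y2) {
    double from = std::atan2(y1, x1);
    double to = std::atan2(y2, x2);
    return rad_to_deg(to - from);
}
```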
Is it not recommended to compare for equality a double and a double literal in C++, because I guess it is compiler dependent?
To be more precise, it is not OK to compare a double which is hard-coded (a literal in the source code) with a double which is computed, as the last digit of the result of the calculation can vary from one compiler to another. Is this not standardized?
I heard this is mentioned in Knuth's TeXbook, is that right?
If this is all true, what is the solution?
You've misunderstood the advice a bit. The point is that floating-point computations aren't exact. Rounding errors occur, and precision is gradually lost. Take something as simple as 1.0/10.0. The result should be 0.1, but it isn't, because 0.1 cannot be represented exactly in floating-point format. So the actual result will be slightly different. The same is true for countless other operations, so the point has nothing to do with const doubles. It has to do with not expecting the result to be exact. If you perform some computation where the result should be 1.0, then you should not test it for equality against 1.0, because rounding errors might mean that it actually came out 0.9999999997 instead.
So the usual solution is to test if the result is sufficiently close to 1.0. If it is close, then we assume "it's good enough", and act as if the result had been 1.0.
The bottom line is that strict equality is rarely used for floating-point values. Instead, you should test if the difference between the two values is less than some small value (typically called the epsilon).
The problem you are talking about is due to rounding errors and will happen for every floating point number. What you can do is define an epsilon and see if the difference between the two floating point numbers is smaller than this. E.g.:
double A = somethingA();
double B = somethingB();
double epsilon = 0.00001;
if (std::abs(A - B) < epsilon) // std::abs from <cmath>; plain abs may pick the int overload
doublesAreEqual();
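A fixed absolute epsilon like the one above only suits values of a known magnitude. A common variation, sketched here under the assumption that a mix of relative and absolute tolerance fits the problem, scales the tolerance with the inputs. The function name and the default tolerances are placeholders, not recommendations.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper; the default tolerances are placeholders, not recommendations.
bool nearly_equal(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12) {
    double diff = std::abs(a - b);
    double scale = std::max(std::abs(a), std::abs(b));
    // abs_tol handles values near zero; rel_tol scales with the magnitude of the inputs.
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

The right tolerances depend on how much error the preceding operations can introduce, so they should be chosen per use case.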
[Edit] Also see this question: What is the most effective way for float and double comparison?.
The key problem is how floating point arithmetic works - it includes rounding, which can cause a comparison for equality to give the wrong result. This applies to all floating point numbers regardless of whether the variable is declared const or not.
If you do floating point calculations and you need to do comparisons with certain fixed values, it is always safer to use an epsilon value to take precision errors into account.
Example:
double calcSomeStuf();
if ( calcSomeStuf() == 0.1 ) { ...}
is a bad idea
however:
const double epsilon = 0.005;
double calcSomeStuf();
if ( std::abs(calcSomeStuf() - 0.1) < epsilon ) { ...}
is a lot safer (especially considering the fact that 0.1 cannot be represented exactly as a double)
This is necessary because rounding errors occur when accumulating floating point operations, and due to the nature of floating point not all numbers can be represented exactly.
I am checking to make sure a float is not zero. It is impossible for the float to become negative. So is it faster to do this float != 0.0f or this float > 0.0f?
Thanks.
Edit: Yes, I know this is micro-optimisation. But this is going to be called every time through my game loop, and I would like to know anyway.
There is not likely to be a detectable difference in performance.
Consider, for entertainment purposes only:
Only 2 floating point values compare equal to 0f: zero and negative zero, and they differ only at 1 bit. So circuitry/software emulation that tests whether the 31 non-sign bits are clear will do it.
The comparison >0f is slightly more complicated, since negative numbers and 0 result in false, positive numbers result in true, but NaNs (of both signs) also result in false, so it's slightly more than just checking the sign bit.
Depending on the floating point mode, either operation could cause a super-precise result in a floating point register to be rounded to 32 bit before comparison, so the score's even there.
If there was a difference at all, I'd sort of expect != to be faster, but I wouldn't really expect there to be a difference and I wouldn't be very surprised to be wrong on some particular implementation.
I assume that your proof that the value cannot be negative is not subject to floating point errors. For example, calculations along the lines of 1/2.0 - 1/3.0 - 1/6.0 or 0.4 - 0.2 - 0.2 can result in either positive or negative values if the errors happen to accumulate rather than cancelling, so presumably nothing like that is going on. About the only real use of a floating-point test for equality with 0 is to test whether you have assigned a literal 0 to it. Or the result of some other calculation guaranteed to have result 0 in float, but that can be tricksy.
It is not possible to give a clear cut answer without knowing your platform and compiler. The C standard does not define how floats are implemented.
On some platforms, yes, on other platforms, no.
If in doubt, measure.
As far as I know, f != 0.0f will sometimes return true when you think it should be false.
To check whether a float number is non-zero, you should do Math.abs(f) > EPSILON, where EPSILON is the error you can tolerate.
Performance shouldn't be a big issue in this comparison.
This is almost certainly the sort of micro-optimization you shouldn't do until you have quantitative data showing that it's a problem. If you can prove it's a problem, you should figure out how to make your compiler show the machine instructions it's generating, then take that info and go to the data book for the processor you are using, and look up the number of clock cycles required for alternative implementations of the same logic. Then you should measure again to make sure you are seeing the benefits, if any.
If you don't have any data showing that it's a performance problem, stick with the implementation that most clearly and simply presents the logic of what you are trying to do.