When comparing for equality is it okay to use ==?
For example:
int a = 3;
int b = 4;
If checking for equality should you use:
if (a == b)
{
. . .
}
Would the situation change if floating point numbers were used?
'==' is perfectly good for integer values.
You should not compare floats for equality; use a tolerance approach:
if (fabs(a - b) < tolerance)
{
// a and b are equal to within tolerance
}
Re floating points: yes. Don't use == for floats (or know EXACTLY what you're doing if you do). Rather use something like
if (fabs(a - b) < SOME_DELTA) {
...
}
EDIT: changed abs() to fabs()
Doing < and > comparisons doesn't really help you with rounding errors. Use the solution given by Mark Shearar. Direct equality comparisons for floats are not always bad, though. You can use them if some specific value (e.g. 0.0 or 1.0) is directly assigned to a variable, to check whether the variable still has that value. It is only after calculations that rounding errors screw up equality checks.
Notice that comparing a NaN value to anything (including another NaN) with <, >, <=, >= or == returns false, while != returns true.
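A quick illustration of those rules (a minimal sketch, not from the original answer):

#include <iostream>
#include <limits>

int main()
{
    double q = std::numeric_limits<double>::quiet_NaN();
    std::cout << (q == q) << ' '    // 0: NaN is not equal even to itself
              << (q < q)  << ' '    // 0: all ordered comparisons are false
              << (q != q) << '\n';  // 1: only != returns true
    return 0;
}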
In many classes, operator== is typically implemented as (!(a < b || b < a)), so you should go ahead and use ==. Except for floats, as Mitch Wheat said above.
While comparing ints, use ==. Using "<" and ">" at the same time to check equality on an int results in slower code, because it takes two comparisons instead of one, taking twice as long. (Although the compiler will probably fix it for you, you should not get used to writing bad code.)
Remember: premature optimization is bad, but premature inefficiency is just as bad.
EDIT: Fixed some english...
For integers, == does just what you expect. If they're equal, they're equal.
For floats, it's another story. Operations produce imprecise results and errors accumulate. You need to be a little fuzzy when dealing with numbers. I use
if ( std::abs( a - b )
< std::abs( a ) * ( std::numeric_limits<float_t>::epsilon() * error_margin ) )
where float_t is a typedef; this gives me as much precision as possible (assuming error_margin was calculated correctly) and allows easy adjustment to another type.
Furthermore, some floating-point values are not numbers: there's infinity, minus infinity, and of course not-a-number. == does funny things with those. Infinity equals infinity, but not-a-number does not equal not-a-number.
Finally, there are positive and negative zero, which are distinct but compare equal to each other! To tell them apart, you need to do something like check whether the reciprocal is positive or negative infinity. (Just make sure you won't get a divide-by-zero exception.)
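For illustration, a small sketch of that zero-sign check (my example, not from the original answer; std::signbit avoids the division entirely):

#include <cmath>
#include <iostream>

int main()
{
    double pz = 0.0, nz = -0.0;
    std::cout << (pz == nz) << '\n';        // 1: distinct bit patterns, but equal
    std::cout << (1.0 / nz) << '\n';        // -inf: the reciprocal reveals the sign
    std::cout << std::signbit(nz) << '\n';  // 1: std::signbit tells them apart directly
    return 0;
}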
So, unless you have a more specific question, I hope that handles it…
I have gone through different threads about comparing floats for less-than and greater-than (not just equality), but it is still not clear to me: do we need epsilon-based logic for less-than and greater-than comparisons as well?
For example:
float a, b;
if (a < b) // is this the correct way to compare two float values, or do we need an epsilon for less-than?
{
}
if (a > b) // is this the correct way to compare two float values for greater-than?
{
}
I know that for comparing floats for equality we need some epsilon value:
bool AreSame(double a, double b)
{
return fabs(a - b) < EPSILON;
}
It really depends on what should happen when both values are close enough to be seen as equal, meaning fabs(a - b) < EPSILON. In some use cases (for example computing statistics), it is not very important whether the comparison of two close values yields equality or not.
If it matters, you should first determine the uncertainty of the values. It really depends on the use case (where the input values come from and how they are processed); two values differing by less than that uncertainty should then be considered equal. But that equality is no longer a true mathematical equivalence relation: you can easily imagine how to build a chain of close values between two truly different values. In mathematical terms, the relation is not transitive (though it is "almost transitive", informally speaking).
I am sorry, but as soon as you have to process approximations there cannot be a single precise and consistent way: you have to think of the real-world use case to determine how you should handle the approximation.
When you are working with floats, it's inevitable that you will run into precision errors.
In order to mitigate this, when checking two floats for equality we often check whether their difference is small enough.
For less-than and greater-than, however, there is no way to tell with full certainty which float is larger. The best approach (presumably for your intentions) is to first check whether the two floats are the same, using the AreSame function above. If so, return false (as a = b implies that a < b and a > b are both false).
Otherwise, return the value of either a < b or a > b.
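A minimal sketch of that approach, reusing the AreSame helper (and its EPSILON) from the question; the name IsLess is mine:

bool IsLess(double a, double b)
{
    if (AreSame(a, b))   // treat values within EPSILON as equal
        return false;    // a == b implies neither a < b nor a > b
    return a < b;
}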
The answer is application dependent.
If you are sure that a and b are sufficiently different that numerical errors will not reverse the order, then a < b is good enough.
But if a and b are dangerously close, you might require a < b + EPSILON. In such a case, it should be clear to you that < and ≤ are not distinguishable.
Needless to say, EPSILON should be chosen with the greatest care (which is often pretty difficult).
It ultimately depends on your application, but I would say generally no.
The problem, very simplified, is that if you calculate: (1/3) * 3 and get the answer 0.999999, then you want that to compare equal to 1. This is why we use epsilon values for equal comparisons (and the epsilon should be chosen according to the application and expected precision).
On the other hand, if you want to sort a list of floats then by default the 0.999999 value will sort before 1. But then again what would the correct behavior be? If they both are sorted as 1, then it will be somewhat random which one is actually sorted first (depending on the initial order of the list and the sorting algorithm you use).
The problem with floating point numbers is not that they are "random" and that it is impossible to predict their exact values. The problem is that base-10 fractions don't translate cleanly into base-2 fractions, and that non-repeating decimals in one system can translate into repeating ones in the other, which then results in rounding errors when truncating to a finite number of digits. We use epsilon values for equality comparisons to handle the rounding errors that arise from these back-and-forth translations.
But do be aware that the nice relations that ==, < and <= have for integers don't always carry over to floating point, exactly because of the epsilons involved. Example:
a = x
b = a + epsilon/2
c = b + epsilon/2
d = c + epsilon/2
Now: a == b, b == c, c == d, BUT a != d, a < d. In fact, you can continue the sequence keeping num(n) == num(n+1) and at the same time get an arbitrarily large difference between a and the last number in the sequence.
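Here is a small demonstration of that chain effect (a sketch; almost_equal is a hypothetical epsilon-comparison helper, not from the original answer):

#include <cmath>
#include <iostream>

bool almost_equal(double x, double y, double eps)
{
    return std::fabs(x - y) < eps;
}

int main()
{
    const double eps = 1e-9;
    double a = 1.0;
    double b = a + eps / 2;
    double c = b + eps / 2;
    double d = c + eps / 2;
    std::cout << almost_equal(a, b, eps)   // 1
              << almost_equal(b, c, eps)   // 1
              << almost_equal(c, d, eps)   // 1
              << almost_equal(a, d, eps)   // 0: a and d differ by 1.5 * eps
              << '\n';
    return 0;
}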
As others have stated, there will always be precision errors when dealing with floats.
Thus, you should have an epsilon value even for comparing less than / greater than.
We know that in order for a to be less than b, firstly, a must be different from b. Checking this is a simple NOT equals, which uses the epsilon.
Then, once you already know a != b, the operator < is sufficient.
I'm writing a program that tries to find the minimum value of k > 1 such that the kth root of a and b (which are both given) equals a whole number.
Here's a snippet of my code, which I've commented for clarification.
#include <cmath>
#include <iostream>
using namespace std;

int main()
{
    // Declare the variables a and b.
    double a;
    double b;
    // Read in variables a and b.
    while (cin >> a >> b) {
        int k = 2;
        // We require the kth root of a and b to both be whole numbers.
        // "while a^{1/k} and b^{1/k} are not both whole numbers..."
        while ((fmod(pow(a, 1.0/k), 1) != 1.0) || (fmod(pow(b, 1.0/k), 1) != 0)) {
            k++;
        }
Pretty much, I read in (a, b), and I start from k = 2 and increment k until the kth roots of a and b are both congruent to 0 mod 1 (meaning that they are divisible by 1 and thus whole numbers).
But, the loop runs infinitely. I've tried researching, and I think it might have to do with precision error; however, I'm not too sure.
Another approach I've tried is changing the loop condition to check whether the floor of a^{1/k} equals a^{1/k} itself. But again, this runs infinitely, likely due to precision error.
Does anyone know how I can fix this issue?
EDIT: for example, when (a, b) = (216, 125), I want to have k = 3, because 216^(1/3) and 125^(1/3) are both integers (namely, 6 and 5).
That is not a programming problem but a mathematical one:
If a is a real number and k a positive integer, and a^(1/k) is an integer, then a is an integer (otherwise the aim is to toy with approximation error).
So the fastest approach may be to first check that a and b are integers, then do a prime decomposition such that a = p0^e0 * p1^e1 * ..., where the pi are distinct primes.
Notice that, for a^(1/k) to be an integer, each ei must be divisible by k. In other words, k must be a common divisor of the ei. The same must be true of the prime exponents of b if b^(1/k) is to be an integer.
Thus the largest k is the greatest common divisor of all the ei of both a and b.
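A minimal sketch of that idea (my code, not from the original answer; it assumes C++17's std::gcd and exact integer inputs, and gcd_of_prime_exponents is a made-up name):

#include <numeric>  // std::gcd

// Returns the gcd of all prime exponents of n, folded into a running gcd k.
// Passing k = 0 starts a fresh computation, since std::gcd(0, e) == e.
unsigned gcd_of_prime_exponents(unsigned long long n, unsigned k = 0)
{
    for (unsigned long long p = 2; p * p <= n; ++p) {
        unsigned e = 0;
        while (n % p == 0) { n /= p; ++e; }  // strip out the factor p^e
        if (e > 0) k = std::gcd(k, e);
    }
    if (n > 1) k = std::gcd(k, 1u);  // a leftover prime factor has exponent 1
    return k;
}

For the question's example, gcd_of_prime_exponents(125, gcd_of_prime_exponents(216)) returns 3, matching 216 = 2^3 * 3^3 and 125 = 5^3.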
With your approach you will have problems with large numbers. All IEEE 754 binary64 floating-point values (the case of double on x86) have 53 significant bits. That means that all doubles larger than 2^53 are integers.
The function pow(x, 1./k) will return the same value for two different x, so with your approach you will necessarily get false answers. For example, the numbers 5^5 * 2^90 and 3^5 * 2^120 are exactly representable as doubles, and the result of the algorithm for them is k = 5. You may find that value of k with these numbers, but you will also find k = 5 for 5^5 * 2^90 - 2^49 and 3^5 * 2^120, because pow(5^5 * 2^90 - 2^49, 1./5) == pow(5^5 * 2^90, 1./5).
On the other hand, as there are only 53 significant bits, prime decomposition of a double that holds an integer is trivial.
Floating-point numbers are not mathematical real numbers. The computation is "approximate". See http://floating-point-gui.de/
You could replace the test fmod(pow(a, 1.0/k), 1) != 1.0 with something like fabs(fmod(pow(a, 1.0/k), 1) - 1.0) > 0.0000001 (and play with various such ε instead of 0.0000001; see also std::numeric_limits::epsilon, but use it carefully, since pow might give some error in its computations, and 1.0/k also injects imprecision; the details are very complex, dive into the IEEE 754 specifications).
Of course, you could (and probably should) define your bool almost_equal(double x, double y) function (and use it instead of ==, and use its negation instead of !=).
As a rule of thumb, never test floating numbers for equality (i.e. ==), but consider instead some small enough distance between them; that is, replace a test like x == y (respectively x != y) with something like fabs(x-y) < EPSILON (respectively fabs(x-y) > EPSILON) where EPSILON is a small positive number, hence testing for a small L1 distance (for equality, and a large enough distance for inequality).
And avoid floating point in integer problems.
Actually, predicting or estimating floating point accuracy is very difficult. You might want to consider tools like CADNA. My colleague Franck Védrine is an expert on static program analyzers to estimate numerical errors (see e.g. his TERATEC 2017 presentation on Fluctuat). It is a difficult research topic, see also D.Monniaux's paper the pitfalls of verifying floating-point computations etc.
And floating point errors did in some cases cost human lives (or loss of billions of dollars). Search the web for details. There are some cases where all the digits of a computed number are wrong (because the errors may accumulate, and the final result was obtained by combining thousands of operations)! There is some indirect relationship with chaos theory, because many programs might have some numerical instability.
As others have mentioned, comparing floating point values for equality is problematic. If you find a way to work directly with integers, you can avoid this problem. One way to do so is to raise integers to the k power instead of taking the kth root. The details are left as an exercise for the reader.
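For instance, one might test whether an integer n >= 1 is a perfect kth power purely with integer arithmetic, along these lines (a sketch under my own naming; the binary search and overflow guard are my additions, not from the original answer):

#include <cstdint>

// Returns true if n == r^k for some integer r >= 1 (assumes n >= 1, k >= 1).
bool is_kth_power(std::uint64_t n, unsigned k)
{
    std::uint64_t lo = 1, hi = n;
    while (lo <= hi) {
        std::uint64_t mid = lo + (hi - lo) / 2;
        // Compute mid^k, bailing out early once it exceeds n.
        std::uint64_t p = 1;
        bool too_big = false;
        for (unsigned i = 0; i < k && !too_big; ++i) {
            if (p > n / mid) too_big = true;
            else p *= mid;
        }
        if (!too_big && p == n) return true;
        if (too_big || p > n) hi = mid - 1;
        else lo = mid + 1;
    }
    return false;
}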
With this question as a basis, it is well known that we should not use the equality comparison operator on floating-point variables, due to numeric errors (this is not tied to any particular programming language):
bool CompareDoubles1 (double A, double B)
{
return A == B;
}
The above code is not right.
My questions are:
Is it right to round both numbers and then compare them?
Is it more efficient?
For instance:
bool CompareDoubles1 (double A, double B)
{
double a = round(A,4);
double b = round(B,4);
return a == b;
}
Is it correct?
EDIT
I'm considering that round is a method that takes a double (the number) and an int (the precision):
double round (double number, int precision);
EDIT
A better idea of what I mean in this question may be expressed with this compare method:
bool CompareDoubles1 (double A, double B, int precision)
{
// precision could be the error expected when rounding
double a = round(A, precision);
double b = round(B, precision);
return a == b;
}
Usually, if you really have to compare floating values, you'd specify a tolerance:
bool CompareDoubles1 (double A, double B, double tolerance)
{
return std::abs(A - B) < tolerance;
}
Choosing an appropriate tolerance will depend on the nature of the values and the calculations that produce them.
Rounding is not appropriate: two very close values, which you'd want to compare equal, might round in different directions and appear unequal. For example, when rounding to the nearest integer, 0.3 and 0.4 would compare equal, but 0.499999 and 0.500001 wouldn't.
A common comparison for doubles is implemented as
bool CompareDoubles2 (double A, double B)
{
return std::abs(A - B) < 1e-6; // small magic constant here
}
It is clearly not as efficient as the check A == B, because it involves more steps, namely subtraction, calling std::abs and finally comparison with a constant.
The same argument about efficiency holds for your proposed solution:
bool CompareDoubles1 (double A, double B)
{
double a = round(A,4); // the magic constant hides in the 4
double b = round(B,4); // and here again
return a == b;
}
Again, this won't be as efficient as direct comparison, but -- again -- it doesn't even try to do the same.
Whether CompareDoubles2 or CompareDoubles1 is faster depends on your machine and the choice of magic constants. Just measure it. You need to make sure to supply matching magic constants, otherwise you are checking for equality with a different tolerance region, which yields different results.
I think comparing the difference with a fixed tolerance is a bad idea.
Say what happens if you set the tolerance to 1e-6, but the two numbers you compare are
1.11e-9 and 1.19e-9?
These would be considered equal, even though they differ after the second significant digit. This may not be what you want.
I think a better way to do the comparison is
equal = ( fabs(A - B) <= tol*max(fabs(A), fabs(B)) )
Note, the <= (and not <), because the above must also work for 0==0. If you set tol=1e-14, two numbers will be considered equal when they are equal up to 14 significant digits.
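Written out as a compilable helper, the test above might look like this (nearly_equal is my name for it, not from the original answer):

#include <algorithm>
#include <cmath>

bool nearly_equal(double a, double b, double tol = 1e-14)
{
    // <= rather than <, so that 0 == 0 compares equal
    return std::fabs(a - b) <= tol * std::max(std::fabs(a), std::fabs(b));
}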
Sidenote: When you want to test if a number is zero, then the above test might not be ideal and then one indeed should use an absolute threshold.
If the round function used in your example means rounding to the 4th decimal digit, this is not correct at all. For example, if A and B are 0.000003 and 0.000004, they would both be rounded to 0.0 and would therefore compare equal.
A general-purpose comparison function must not work with a constant tolerance but with a relative one. But it is all explained in the post you cite in your question.
There is no 'correct' way to compare floating point values (even a plain f == 0.0 might be correct). Different comparisons may be suitable. Have a look at http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
Similar to other posts, but introducing scale invariance: if you are doing something like adding two sets of numbers together and then you want to know whether the two set sums are equal, you can take the absolute value of the log-ratio (difference of logarithms) and test whether it is less than your prescribed tolerance. That way, if you multiply all your numbers by 10 or 100 in summation calculations, it won't affect the result of whether the answers are equal or not. You should have a separate test to determine whether two numbers are equal because they are close enough to 0.
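A sketch of that log-ratio test (my naming; it assumes both inputs are strictly positive, so a separate near-zero test is still needed):

#include <cmath>

bool log_ratio_equal(double a, double b, double tol)
{
    // Scale-invariant: multiplying both a and b by the same factor
    // leaves the difference of logarithms unchanged.
    return std::fabs(std::log(a) - std::log(b)) < tol;
}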
I wrote a recursive function that computes the sum of an array of doubles. For some reason, the value returned by my recursive function is not correct: my recursive sum does not match my iterative sum. I know I made a little mistake somewhere, but I can't see where. Your help will be very appreciated. I only pasted the recursive function. I am using C++ on Visual Studio. Thanks!
double recursive_sum(double array_nbr[], int size_ar)
{
    double rec_sum = 0.0;
    if (size_ar == 0)
        return -1;
    else if (size_ar > 0)
        rec_sum = array_nbr[size_ar - 1] + recursive_sum(array_nbr, size_ar - 1);
    return rec_sum;
}
//#### Output######
The random(s) number generated in the array =
0.697653 | 0.733848 | 0.221564 |
Recursive sum: 0.653066
Iterative sum: 1.65307
Press any key to continue . . .
Well, because the sum of no elements is zero, not minus one.
if (size_ar == 0)
    return 0.0;
Think about it this way: sum(1,2,3) is the same as sum(1,2) + sum(3), just as it is the same as sum(1,2,3) + sum(): in all three cases, you add 1, 2, and 3 together, just in slightly different ways. That's also why the product of no elements is one.
Try changing "if( size_ar== 0) return -1;" to return 0.
While this does not account for the large discrepancy in your output, another thing to keep in mind is the ordering of operations once you have fixed the issue of returning -1 vs. 0: IEEE floating-point addition is not associative, so make sure that when you compare your recursive and iterative methods, you add up the numbers in exactly the same order; otherwise your output may still differ by some epsilon value.
For instance, currently your recursive method adds up the values from the last member of the array back to the first. Because of the non-associativity of floating-point math, that may give you a slightly different value (a small epsilon) than summing the values from first to last. This probably won't show in a simple cout, where the floating-point values are truncated to a fixed number of decimal places, but should you attempt to use the == operation on the two different summations without incorporating some epsilon value, the result may still test false.
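A small illustration of the grouping effect (a sketch; whether the two sums actually differ depends on the specific values):

#include <iostream>

int main()
{
    double v[] = {0.697653, 0.733848, 0.221564};
    double forward  = (v[0] + v[1]) + v[2];
    double backward = (v[2] + v[1]) + v[0];
    // May print 0: each addition rounds, so grouping can change the result.
    std::cout << (forward == backward) << '\n';
    return 0;
}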
The following code is supposed to find the key 3.0 in a std::map, where it does exist. But due to floating point precision it won't be found.
map<double, double> mymap;
mymap[3.0] = 1.0;
double t = 0.0;
for(int i = 0; i < 31; i++)
{
t += 0.1;
bool contains = (mymap.count(t) > 0);
}
In the above example, contains will always be false.
My current workaround is to compute t as 0.1 * i instead of repeatedly adding 0.1, like this:
for(int i = 0; i < 31; i++)
{
t = 0.1 * i;
bool contains = (mymap.count(t) > 0);
}
Now the question:
Is there a way to introduce a fuzzyCompare to the std::map if I use double keys?
The common solution for floating point number comparison is usually something like fabs(a - b) < epsilon. But I don't see a straightforward way to do this with std::map.
Do I really have to wrap the double type in a class and overload operator<(...) to implement this functionality?
So there are a few issues with using doubles as keys in a std::map.
First, NaN is a problem: it never compares less than anything, not even itself, so it breaks the strict weak ordering the container requires. If there is any chance of NaN being inserted, use this:
#include <cmath>

struct safe_double_less {
    bool operator()(double left, double right) const {
        bool leftNaN = std::isnan(left);
        bool rightNaN = std::isnan(right);
        if (leftNaN != rightNaN)
            return leftNaN < rightNaN;
        return left < right;
    }
};
but that may be overly paranoid. Do not, I repeat do not, include an epsilon threshold in your comparison operator you pass to a std::set or the like: this will violate the ordering requirements of the container, and result in unpredictable undefined behavior.
(I placed NaN as greater than all doubles, including +inf, in my ordering, for no good reason. Less than all doubles would also work).
So either use the default operator<, or the above safe_double_less, or something similar.
Next, I would advise using a std::multimap or std::multiset, because you should be expecting multiple values for each lookup. You might as well make handling multiple hits an everyday thing, instead of a corner case, to increase the test coverage of your code. (I would rarely recommend these containers otherwise.) Plus this blocks operator[], which is ill-advised with floating-point keys.
The point where you want to use an epsilon is when you query the container. Instead of using the direct interface, create a helper function like this:
// works on both `const` and non-`const` associative containers:
template<class Container>
auto my_equal_range( Container&& container, double target, double epsilon = 0.00001 )
-> decltype( container.equal_range(target) )
{
auto lower = container.lower_bound( target-epsilon );
auto upper = container.upper_bound( target+epsilon );
return std::make_pair(lower, upper);
}
which works on both std::map and std::set (and multi versions).
(In a more modern code base, I'd expect a range<?> object that is a better thing to return from an equal_range function. But for now, I'll make it compatible with equal_range).
This finds a range of things whose keys are "sufficiently close" to the one you are asking for, while the container maintains its ordering guarantees internally and doesn't execute undefined behavior.
To test for existence of a key, do this:
template<typename Container>
bool key_exists( Container const& container, double target, double epsilon = 0.00001 ) {
auto range = my_equal_range(container, target, epsilon);
return range.first != range.second;
}
and if you want to delete/replace entries, you should deal with the possibility that there might be more than one entry hit.
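For example, a usage sketch against the question's map, relying on the default epsilon above:

std::map<double, double> mymap;
mymap[3.0] = 1.0;
bool found = key_exists(mymap, 2.9999999999);  // true: within 0.00001 of the key 3.0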
The shorter answer is "don't use floating point values as keys for std::set and std::map", because it is a bit of a hassle.
If you do use floating point keys for std::set or std::map, almost certainly never do a .find or a [] on them, as that is highly likely to be a source of bugs. You can use it for an automatically sorted collection of stuff, so long as exact order doesn't matter (i.e., whether one particular 1.0 is ahead of, behind, or exactly on the same spot as another 1.0). Even then, I'd go with a multimap/multiset, as collisions, or the lack thereof, are not something I'd rely upon.
Reasoning about the exact value of IEEE floating point values is difficult, and fragility of code relying on it is common.
Here's a simplified example of how using soft-compare (aka epsilon or almost equal) can lead to problems.
Let epsilon = 2 for simplicity. Put 1 and 4 into your map. It now might look like this:
1
 \
  4
So 1 is the tree root.
Now put in the numbers 2, 3, 4 in that order. Each will replace the root, because it compares equal to it. So then you have
4
 \
  4
which is already broken. (Assume no attempt to rebalance the tree is made.) We can keep going with 5, 6, 7:
7
 \
  4
and this is even more broken, because now if we ask whether 4 is in there, it will say "no", and if we ask for an iterator for values less than 7, it won't include 4.
Though I must say that I've used maps based on this flawed fuzzy-compare operator numerous times in the past, and whenever I dug up a bug, it was never due to this. This is because the datasets in my application areas never actually amounted to stress-testing this problem.
As Naszta says, you can implement your own comparison function. What he leaves out is the key to making it work - you must make sure that the function always returns false for any values that are within your tolerance for equivalence.
return (std::fabs(left - right) > epsilon) && (left < right);
Edit: as pointed out in many comments to this answer and others, there is a possibility for this to turn out badly if the values you feed it are arbitrarily distributed, because you can't guarantee that !(a<b) and !(b<c) results in !(a<c). This would not be a problem in the question as asked, because the numbers in question are clustered around 0.1 increments; as long as your epsilon is large enough to account for all possible rounding errors but is less than 0.05, it will be reliable. It is vitally important that the keys to the map are never closer than 2*epsilon apart.
You could implement your own comparison function.

#include <cmath>
#include <map>

class own_double_less
{
public:
    explicit own_double_less(double arg_ = 1e-7) : epsilon(arg_) {}
    bool operator()(const double& left, const double& right) const
    {
        // you can choose another way to make the decision
        // (the original version is: return left < right;)
        return (std::fabs(left - right) > epsilon) && (left < right);
    }
    double epsilon;
};

// your map:
std::map<double, double, own_double_less> mymap;
Updated: see Item 40 in Effective STL!
Updated based on suggestions.
Using doubles as keys is not useful. As soon as you do any arithmetic on the keys, you are not sure what exact values they have and hence cannot use them for indexing the map. The only sensible usage would be keys that are constant.