C++ Comparison of doubles using an epsilon - c++

I defined a comparison function for doubles that uses an epsilon, as suggested in: What is the most effective way for float and double comparison?. Two doubles differ if their absolute difference is less than epsilon.
If I have doubles d1 and d2, with d2 = d1+epsilon, d1 should not be equal to d2 since they differ by epsilon. Right?
It works with some values, but not all the time. It might be related to the architecture or to the compiler? What should I do to improve my comparison function?
#include <iomanip> // For std::setprecision
#include <iostream>
#include "math.h" // For fabs
const double epsilon = 0.001;
bool equal(const double &d1, const double &d2)
{
return fabs(d1-d2) < epsilon;
}
int main()
{
double d1 = 3.5;
double d2 = d1+epsilon;
// d2 = d1 + epsilon should not be equal to d1.
std::cout << std::setprecision(16) << "d1 = " << d1 << " d2 = " << d2 << std::endl;
bool eq = equal(d1,d2);
if (eq)
{
std::cout << "They should not be equal, but they are!" << std::endl;
}
else
{
std::cout << "They are not equal. OK!" << std::endl;
}
}
Output on my machine:
d1 = 3.5 d2 = 3.501
They should not be equal, but they are!

One of the most frequent floating-point caveats: if you put a number to the floating point, it doesn't mean it will be stored as exactly the same value. Instead, they would be slightly amended to create a binary structure that is most close to your intended value.
Ok, back to business! Try to add this to your source code:
std::cout << fabs(d1-d2) << std::endl;
And find the output as the following :)
0.0009999999999998899
You see? It's clearly less than your epsilon.

If I have doubles d1 and d2, with d2 = d1+epsilon, d1 should not be equal to d2 since they differ by epsilon. Right?
Wrong. Your logic is broken. You've told that you should not compare doubles without epsilon but next step you are surprised that fabs(d1-d2) == epsilon is not true.
What should I do to improve my comparison function?
Nothing, d1 + epsilon is a border case and you cannot be sure if that number is equal to d1 or not. That is downside of floating point arithmetic.

There are actually two different problems here. First, your comparison should be >=, rather than >, so it “works” for your test case where the difference is exactly epsilon.
The second (potential) problem is that you can’t really use a constant epsilon for these sorts of comparisons, in the general case. You need to select the epsilon based on the expected magnitude of the numbers involved, and the expected accumulated numerical error in the calculations.
Floating-point numbers are less precise the larger the magnitude of the value they’re storing is. If you repeat your test with a value that’s “large enough”, it’ll fail every time.

Let's call pe precision error. It is infinitesimal(~2^−53) for IEEE 754 standard binary64
The maths are:
[Pseudo code]
d1 = 3.5 + pe1
d2 = 3.501 + pe2
d2 - d1 = 0.001 + pe2 - pe1
epsilon = 0.001 + pe3
The result of the expression:
[Pseudo code]
|d1-d2| < epsilon //Since you know that d2>d1 you can replace |d1-d2| by d2-d1
0.001 + pe2 - pe1 < 0.001 + pe3
pe2 - pe1 < pe3
This border case can be true or false dependig on magnitude and sign of pe3, pe2 and pe1. But it is undetermined.

Related

Numerical accuracy of pow(a/b,x) vs pow(b/a,-x)

Is there a difference in accuracy between pow(a/b,x) and pow(b/a,-x)?
If there is, does raising a number less than 1 to a positive power or a number greater than 1 to a negative power produce more accurate result?
Edit: Let's assume x86_64 processor and gcc compiler.
Edit: I tried comparing using some random numbers. For example:
printf("%.20f",pow(8.72138221/1.761329479,-1.51231)) // 0.08898783049228660424
printf("%.20f",pow(1.761329479/8.72138221, 1.51231)) // 0.08898783049228659037
So, it looks like there is a difference (albeit minuscule in this case), but maybe someone who knows about the algorithm implementation could comment on what the maximum difference is, and under what conditions.
Here's one way to answer such questions, to see how floating-point behaves. This is not a 100% correct way to analyze such question, but it gives a general idea.
Let's generate random numbers. Calculate v0=pow(a/b, n) and v1=pow(b/a, -n) in float precision. And calculate ref=pow(a/b, n) in double precision, and round it to float. We use ref as a reference value (we suppose that double has much more precision than float, so we can trust that ref can be considered the best possible value. This is true for IEEE-754 for most of the time). Then sum the difference between v0-ref and v1-ref. The difference should calculated with "the number of floating point numbers between v and ref".
Note, that the results may be depend on the range of a, b and n (and on the random generator quality. If it's really bad, it may give a biased result). Here, I've used a=[0..1], b=[0..1] and n=[-2..2]. Furthermore, this answer supposes that the algorithm of float/double division/pow is the same kind, have the same characteristics.
For my computer, the summed differences are: 2604828 2603684, it means that there is no significant precision difference between the two.
Here's the code (note, this code supposes IEEE-754 arithmetic):
#include <cmath>
#include <stdio.h>
#include <string.h>
long long int diff(float a, float b) {
unsigned int ai, bi;
memcpy(&ai, &a, 4);
memcpy(&bi, &b, 4);
long long int diff = (long long int)ai - bi;
if (diff<0) diff = -diff;
return diff;
}
int main() {
long long int e0 = 0;
long long int e1 = 0;
for (int i=0; i<10000000; i++) {
float a = 1.0f*rand()/RAND_MAX;
float b = 1.0f*rand()/RAND_MAX;
float n = 4.0f*rand()/RAND_MAX - 2.0f;
if (a==0||b==0) continue;
float v0 = std::pow(a/b, n);
float v1 = std::pow(b/a, -n);
float ref = std::pow((double)a/b, n);
e0 += diff(ref, v0);
e1 += diff(ref, v1);
}
printf("%lld %lld\n", e0, e1);
}
... between pow(a/b,x) and pow(b/a,-x) ... does raising a number less than 1 to a positive power or a number greater than 1 to a negative power produce more accurate result?
Whichever division is more arcuate.
Consider z = xy = 2y * log2(x).
Roughly: The error in y * log2(x) is magnified by the value of z to form the error in z. xy is very sensitive to the error in x. The larger the |log2(x)|, the greater concern.
In OP's case, both pow(a/b,p) and pow(b/a,-p), in general, have the same y * log2(x) and same z and similar errors in z. It is a question of how x, y are formed:
a/b and b/a, in general, both have the same error of +/- 0.5*unit in the last place and so both approaches are of similar error.
Yet with select values of a/b vs. b/a, one quotient will be more exact and it is that approach with the lower pow() error.
pow(7777777/4,-p) can be expected to be more accurate than pow(4/7777777,p).
Lacking assurance about the error in the division, the general rule applies: no major difference.
In general, the form with the positive power is slightly better, although by so little it will likely have no practical effect. Specific cases could be distinguished. For example, if either a or b is a power of two, it ought to be used as the denominator, as the division then has no rounding error.
In this answer, I assume IEEE-754 binary floating-point with round-to-nearest-ties-to-even and that the values involved are in the normal range of the floating-point format.
Given a, b, and x with values a, b, and x, and an implementation of pow that computes the representable value nearest the ideal mathematical value (actual implementations are generally not this good), pow(a/b, x) computes (a/b•(1+e0))x•(1+e1), where e0 is the rounding error that occurs in the division and e1 is the rounding error that occurs in the pow, and pow(b/a, -x) computes (b/a•(1+e2))−x•(1+e3), where e2 and e3 are the rounding errors in this division and this pow, respectively.
Each of the errors, e0…e3 lies in the interval [−u/2, u/2], where u is the unit of least precision (ULP) of 1 in the floating-point format. (The notation [p, q] is the interval containing all values from p to q, including p and q.) In case a result is near the edge of a binade (where the floating-point exponent changes and the significand is near 1), the lower bound may be −u/4. At this time, I will not analyze this case.
Rewriting, these are (a/b)x•(1+e0)x•(1+e1) and (a/b)x•(1+e2)−x•(1+e3). This reveals the primary difference is in (1+e0)x versus (1+e2)−x. The 1+e1 versus 1+e3 is also a difference, but this is just the final rounding. [I may consider further analysis of this later but omit it for now.]
Consider (1+e0)x and (1+e2)−x.The potential values of the first expression span [(1−u/2)x, (1+u/2)x], while the second spans [(1+u/2)−x, (1−u/2)−x]. When x > 0, the second interval is longer than the first:
The length of the first is (1+u/2)x−(1+u/2)x.
The length of the second is (1/(1−u/2))x−(1/(1+u/2))x.
Multiplying the latter by (1−u2/22)x produces ((1−u2/22)/(1−u/2))x−( (1−u2/22)/(1+u/2))x = (1+u/2)x−(1+u/2)x, which is the length of the first interval.
1−u2/22 < 1, so (1−u2/22)x < 1 for positive x.
Since the first length equals the second length times a number less than one, the first interval is shorter.
Thus, the form in which the exponent is positive is better in the sense that it has a shorter interval of potential results.
Nonetheless, this difference is very slight. I would not be surprised if it were unobservable in practice. Also, one might be concerned with the probability distribution of errors rather than the range of potential errors. I suspect this would also favor positive exponents.
For evaluation of rounding errors like in your case, it might be useful to use some multi-precision library, such as Boost.Multiprecision. Then, you can compare results for various precisions, e.g, such as with the following program:
#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
namespace mp = boost::multiprecision;
template <typename FLOAT>
void comp() {
FLOAT a = 8.72138221;
FLOAT b = 1.761329479;
FLOAT c = 1.51231;
FLOAT e = mp::pow(a / b, -c);
FLOAT f = mp::pow(b / a, c);
std::cout << std::fixed << std::setw(40) << std::setprecision(40) << e << std::endl;
std::cout << std::fixed << std::setw(40) << std::setprecision(40) << f << std::endl;
}
int main() {
std::cout << "Double: " << std::endl;
comp<mp::cpp_bin_float_double>();
td::cout << std::endl;
std::cout << "Double extended: " << std::endl;
comp<mp::cpp_bin_float_double_extended>();
std::cout << std::endl;
std::cout << "Quad: " << std::endl;
comp<mp::cpp_bin_float_quad>();
std::cout << std::endl;
std::cout << "Dec-100: " << std::endl;
comp<mp::cpp_dec_float_100>();
std::cout << std::endl;
}
Its output reads, on my platform:
Double:
0.0889878304922865903670015086390776559711
0.0889878304922866181225771242679911665618
Double extended:
0.0889878304922865999079806265115166752366
0.0889878304922865999012043629334822725241
Quad:
0.0889878304922865999004910375213273866639
0.0889878304922865999004910375213273505527
Dec-100:
0.0889878304922865999004910375213273881004
0.0889878304922865999004910375213273881004
Live demo: https://wandbox.org/permlink/tAm4sBIoIuUy2lO6
For double, the first calculation was more accurate, however, I guess one cannot make any generic conclusions here.
Also, note that your input numbers are not accurately representable with the IEEE 754 double precision floating-point type (none of them). The question is whether you care about the accuracy of calculations with either those exact numbers of their closest representations.

fmod is inconsistent with division [duplicate]

when I use fmod(0.6,0.2) in c++ it returns 0.2
I know this is caused by floating point accuracy but it seems I have to get remainder of two double this moment
thanks very much for any solutions for this kind of problem
The mathematical values 0.6 and 0.2 cannot be represented exactly in binary floating-point.
This demo program will show you what's going on:
#include <iostream>
#include <iomanip>
#include <cmath>
int main() {
const double x = 0.6;
const double y = 0.2;
std::cout << std::setprecision(60)
<< "x = " << x << "\n"
<< "y = " << y << "\n"
<< "fmod(x, y) = " << fmod(x, y) << "\n";
}
The output on my system (and very likely on yours) is:
x = 0.59999999999999997779553950749686919152736663818359375
y = 0.200000000000000011102230246251565404236316680908203125
fmod(x, y) = 0.1999999999999999555910790149937383830547332763671875
The result returned by fmod() is correct given the arguments you passed it.
If you need some other result (I presume you were expecting 0.0), you'll have to do something different. There are a number of possibilities. You can check whether the result differs from 0.2 by some very small amount, or perhaps you can do the computation using integer arithmetic (if, for example, all the numbers you're dealing with are multiples of 0.1 or of 0.01).
I think your best bet here would be to use the remainder function instead of fmod. For the given example, it will return a very small number rather than 0.2. You can use this fact to assume a remainder of 0 by rounding to a certain level of precision.
You're right, the problem is a rounding error. Try this code:
#include <stdio.h>
#include <math.h>
int main()
{
double d6 = 0.6;
double d2 = 0.2;
double d = fmod (d6, d2);
printf ("%30.20e %30.20e %30.20e\n", d6, d2, d);
}
When I run it in gcc 4.4.7 the output is:
5.99999999999999977796e-01 2.00000000000000011102e-01 1.99999999999999955591e-01
As for how to "fix" it, I don't know enough about exactly what you are trying to do to know what to suggest. There will always be numbers that cannot be exactly represented in floating point, so this kind of behavior is unavoidable.
More information about the problem domain would be helpful to get better suggestions. For example, if the numbers you are working with always just have one digit after the decimal point, and are small enough, just multiply by 10, round to the nearest integer (or long or long long), and use the % operator rather than the fmod function. If you are trying to see whether the result of fmod is 0.0, simply accept a result that is close to 0.2 (in this case) as if it were close to 0.

c++ float subtraction rounding error

I have a float value between 0 and 1. I need to convert it with -120 to 80.
To do this, first I multiply with 200 after 120 subtract.
When subtract is made I had rounding error.
Let's look my example.
float val = 0.6050f;
val *= 200.f;
Now val is 121.0 as I expected.
val -= 120.0f;
Now val is 0.99999992
I thought maybe I can avoid this problem with multiplication and division.
float val = 0.6050f;
val *= 200.f;
val *= 100.f;
val -= 12000.0f;
val /= 100.f;
But it didn't help. I have still 0.99 on my hand.
Is there a solution for it?
Edit: After with detailed logging, I understand there is no problem with this part of code. Before my log shows me "0.605", after I had detailed log and I saw "0.60499995946884155273437500000000000000000000000000"
the problem is in different place.
Edit2: I think I found the guilty. The initialised value is 0.5750.
std::string floatToStr(double d)
{
std::stringstream ss;
ss << std::fixed << std::setprecision(15) << d;
return ss.str();
}
int main()
{
float val88 = 0.57500000000f;
std::cout << floatToStr(val88) << std::endl;
}
The result is 0.574999988079071
Actually I need to add and sub 0.0025 from this value every time.
Normally I expected 0.575, 0.5775, 0.5800, 0.5825 ....
Edit3: Actually I tried all of them with double. And it is working for my example.
std::string doubleToStr(double d)
{
std::stringstream ss;
ss << std::fixed << std::setprecision(15) << d;
return ss.str();
}
int main()
{
double val88 = 0.575;
std::cout << doubleToStr(val88) << std::endl;
val88 += 0.0025;
std::cout << doubleToStr(val88) << std::endl;
val88 += 0.0025;
std::cout << doubleToStr(val88) << std::endl;
val88 += 0.0025;
std::cout << doubleToStr(val88) << std::endl;
return 0;
}
The results are:
0.575000000000000
0.577500000000000
0.580000000000000
0.582500000000000
But I bound to float unfortunately. I need to change lots of things.
Thank you for all to help.
Edit4: I have found my solution with strings. I use ostringstream's rounding and convert to double after that. I can have 4 precision right numbers.
std::string doubleToStr(double d, int precision)
{
std::stringstream ss;
ss << std::fixed << std::setprecision(precision) << d;
return ss.str();
}
double val945 = (double)0.575f;
std::cout << doubleToStr(val945, 4) << std::endl;
std::cout << doubleToStr(val945, 15) << std::endl;
std::cout << atof(doubleToStr(val945, 4).c_str()) << std::endl;
and results are:
0.5750
0.574999988079071
0.575
Let us assume that your compiler implements IEEE 754 binary32 and binary64 exactly for float and double values and operations.
First, you must understand that 0.6050f does not represent the mathematical quantity 6050 / 10000. It is exactly 0.605000019073486328125, the nearest float to that. Even if you write perfect computations from there, you have to remember that these computations start from 0.605000019073486328125 and not from 0.6050.
Second, you can solve nearly all your accumulated roundoff problems by computing with double and converting to float only in the end:
$ cat t.c
#include <stdio.h>
int main(){
printf("0.6050f is %.53f\n", 0.6050f);
printf("%.53f\n", (float)((double)0.605f * 200. - 120.));
}
$ gcc t.c && ./a.out
0.6050f is 0.60500001907348632812500000000000000000000000000000000
1.00000381469726562500000000000000000000000000000000000
In the above code, all computations and intermediate values are double-precision.
This 1.0000038… is a very good answer if you remember that you started with 0.605000019073486328125 and not 0.6050 (which doesn't exist as a float).
If you really care about the difference between 0.99999992 and 1.0, float is not precise enough for your application. You need to at least change to double.
If you need an answer in a specific range, and you are getting answers slightly outside that range but within rounding error of one of the ends, replace the answer with the appropriate range end.
The point everybody is making can be summarised: in general, floating point is precise but not exact.
How precise is governed by the number of bits in the mantissa -- which is 24 for float, and 53 for double (assuming IEEE 754 binary formats, which is pretty safe these days ! [1]).
If you are looking for an exact result, you have to be ready to deal with values that differ (ever so slightly) from that exact result, but...
(1) The Exact Binary Fraction Problem
...the first issue is whether the exact value you are looking for can be represented exactly in binary floating point form...
...and that is rare -- which is often a disappointing surprise.
The binary floating point representation of a given value can be exact, but only under the following, restricted circumstances:
the value is an integer, < 2^24 (float) or < 2^53 (double).
this is the simplest case, and perhaps obvious. Since you are looking a result >= -120 and <= 80, this is sufficient.
or:
the value is an integer which divides exactly by 2^n and is then (as above) < 2^24 or < 2^53.
this includes the first rule, but is more general.
or:
the value has a fractional part, but when the value is multiplied by the smallest 2^n necessary to produce an integer, that integer is < 2^24 (float) or 2^53 (double).
This is the part which may come as a surprise.
Consider 27.01, which is a simple enough decimal value, and clearly well within the ~7 decimal digit precision of a float. Unfortunately, it does not have an exact binary floating point form -- you can multiply 27.01 by any 2^n you like, for example:
27.01 * (2^ 6) = 1728.64 (multiply by 64)
27.01 * (2^ 7) = 3457.28 (multiply by 128)
...
27.01 * (2^10) = 27658.24
...
27.01 * (2^20) = 28322037.76
...
27.01 * (2^25) = 906305208.32 (> 2^24 !)
and you never get an integer, let alone one < 2^24 or < 2^53.
Actually, all these rules boil down to one rule... if you can find an 'n' (positive or negative, integer) such that y = value * (2^n), and where y is an exact, odd integer, then value has an exact representation if y < 2^24 (float) or if y < 2^53 (double) -- assuming no under- or over-flow, which is another story.
This looks complicated, but the rule of thumb is simply: "very few decimal fractions can be represented exactly as binary fractions".
To illustrate how few, let us consider all the 4 digit decimal fractions, of which there are 10000, that is 0.0000 up to 0.9999 -- including the trivial, integer case 0.0000. We can enumerate how many of those have exact binary equivalents:
1: 0.0000 = 0/16 or 0/1
2: 0.0625 = 1/16
3: 0.1250 = 2/16 or 1/8
4: 0.1875 = 3/16
5: 0.2500 = 4/16 or 1/4
6: 0.3125 = 5/16
7: 0.3750 = 6/16 or 3/8
8: 0.4375 = 7/16
9: 0.5000 = 8/16 or 1/2
10: 0.5625 = 9/16
11: 0.6250 = 10/16 or 5/8
12: 0.6875 = 11/16
13: 0.7500 = 12/16 or 3/4
14: 0.8125 = 13/16
15: 0.8750 = 14/16 or 7/8
16: 0.9375 = 15/16
That's it ! Just 16/10000 possible 4 digit decimal fractions (including the trivial 0 case) have exact binary fraction equivalents, at any precision. All the other 9984/10000 possible decimal fractions give rise to recurring binary fractions. So, for 'n' digit decimal fractions only (2^n) / (10^n) can be represented exactly -- that's 1/(5^n) !!
This is, of course, because your decimal fraction is actually the rational x / (10^n)[2] and your binary fraction is y / (2^m) (for integer x, y, n and m), and for a given binary fraction to be exactly equal to a decimal fraction we must have:
y = (x / (10^n)) * (2^m)
= (x / ( 5^n)) * (2^(m-n))
which is only the case when x is an exact multiple of (5^n) -- for otherwise y is not an integer. (Noting that n <= m, assuming that x has no (spurious) trailing zeros, and hence n is as small as possible.)
(2) The Rounding Problem
The result of a floating point operation may need to be rounded to the precision of the destination variable. IEEE 754 requires that the operation is done as if there were no limit to the precision, and the ("true") result is then rounded to the nearest value at the precision of the destination. So, the final result is as precise as it can be... given the limitations on how precise the arguments are, and how precise the destination is... but not exact !
(With floats and doubles, 'C' may promote float arguments to double (or long double) before performing an operation, and the result of that will be rounded to double. The final result of an expression may then be a double (or long double), which is then rounded (again) if it is to be stored in a float variable. All of this adds to the fun ! See FLT_EVAL_METHOD for what your system does -- noting the default for a floating point constant is double.)
So, the other rules to remember are:
floating point values are not reals (they are, in fact, rationals with a limited denominator).
The precision of a floating point value may be large, but there are lots of real numbers that cannot be represented exactly !
floating point expressions are not algebra.
For example, converting from degrees to radians requires division by π. Any arithmetic with π has a problem ('cos it's irrational), and with floating point the value for π is rounded to whatever floating precision we are using. So, the conversion of (say) 27 (which is exact) degrees to radians involves division by 180 (which is exact) and multiplication by our "π". However exact the arguments, the division and the multiplication may round, so the result is may only approximate. Taking:
float pi = 3.14159265358979 ; /* plenty for float */
float x = 27.0 ;
float y = (x / 180.0) * pi ;
float z = (y / pi) * 180.0 ;
printf("z-x = %+6.3e\n", z-x) ;
my (pretty ordinary) machine gave: "z-x = +1.907e-06"... so, for our floating point:
x != (((x / 180.0) * pi) / pi) * 180 ;
at least, not for all x. In the case shown, the relative difference is small -- ~ 1.2 / (2^24) -- but not zero, which simple algebra might lead us to expect.
hence: floating point equality is a slippery notion.
For all the reasons above, the test x == y for two floating values is problematic. Depending on how x and y have been calculated, if you expect the two to be exactly the same, you may very well be sadly disappointed.
[1] There exists a standard for decimal floating point, but generally binary floating point is what people use.
[2] For any decimal fraction you can write down with a finite number of digits !
Even with double precision, you'll run into issues such as:
200. * .60499999999999992 = 120.99999999999997
It appears that you want some type of rounding so that 0.99999992 is rounded to 1.00000000 .
If the goal is to produce values to the nearest multiple of 1/1000, try:
#include <math.h>
val = (float) floor((200000.0f*val)-119999.5f)/1000.0f;
If the goal is to produce values to the nearest multiple of 1/200, try:
val = (float) floor((40000.0f*val)-23999.5f)/200.0f;
If the goal is to produce values to the nearest integer, try:
val = (float) floor((200.0f*val)-119.5f);

When a float variable goes out of the float limits, what happens?

I remarked two things:
std::numeric_limits<float>::max()+(a small number) gives: std::numeric_limits<float>::max().
std::numeric_limits<float>::max()+(a large number like: std::numeric_limits<float>::max()/3) gives inf.
Why this difference? Does 1 or 2 results in an OVERFLOW and thus to an undefined behavior?
Edit: Code for testing this:
1.
float d = std::numeric_limits<float>::max();
float q = d + 100;
cout << "q: " << q << endl;
2.
float d = std::numeric_limits<float>::max();
float q = d + (d/3);
cout << "q: " << q << endl;
Formally, the behavior is undefined. On a machine with IEEE
floating point, however, overflow after rounding will result
in Inf. The precision is limited, however, and the results
after rounding of FLT_MAX + 1 are FLT_MAX.
You can see the same effect with values well under FLT_MAX.
Try something like:
float f1 = 1e20; // less than FLT_MAX
float f2 = f1 + 1.0;
if ( f1 == f2 ) ...
The if will evaluate to true, at least with IEEE arithmetic.
(There do exist, or at least have existed, machines where
float has enough precision for the if to evaluate to
false, but they aren't very common today.)
It depends on what you are doing. If the float "overflow" comes in an expression which is directly returned, i.e.
return std::numeric_limits::max() + std::numeric_limits::max();
the operation might not result in an overflow. I cite from the C standard [ISO/IEC 9899:2011]:
The return statement is not an assignment. The overlap restriction of
subclause 6.5.16.1 does not apply to the case of function return. The
representation of floating-point values may have wider range or
precision than implied by the type; a cast may be used to remove this
extra range and precision.
See here for more details.

comparing double values in C++

I have following code for double comparision. Why I am getting not equal when I execute?
#include <iostream>
#include <cmath>
#include <limits>
bool AreDoubleSame(double dFirstVal, double dSecondVal)
{
return std::fabs(dFirstVal - dSecondVal) < std::numeric_limits<double>::epsilon();
}
int main()
{
double dFirstDouble = 11.304;
double dSecondDouble = 11.3043;
if(AreDoubleSame(dFirstDouble , dSecondDouble ) )
{
std::cout << "equal" << std::endl;
}
else
{
std::cout << "not equal" << std::endl;
}
}
The epsilon for 2 doubles is 2.22045e-016
By definition, epsilon is the difference between 1 and the smallest value greater than 1 that is
representable for the data type.
These differ by more than that and hence, it returns false
(Reference)
The are not equal (according to your function) because they differ by more than epsilon.
Epsilon is defined as "Machine epsilon (the difference between 1 and the least value greater than 1 that is representable)" - source http://www.cplusplus.com/reference/std/limits/numeric_limits/. This is approximately 2.22045e-016 (source http://msdn.microsoft.com/en-us/library/6x7575x3(v=vs.71).aspx)
If you want to change the fudging factor, compare to another small double, for example:
bool AreDoubleSame(double dFirstVal, double dSecondVal)
{
return std::fabs(dFirstVal - dSecondVal) < 1E-3;
}
The difference between your two doubles is 0.0003. std::numeric_limits::epsilon() is much smaller than that.
epsilon() is only the difference between 1.0 and the next value representable after 1.0, the real min. The library function std::nextafter can be used to scale the equality precision test for numbers of any magnitude.
For example using std::nextafter to test double equality, by testing that b is both <= next number lower than a && >= next number higher than a:
bool nearly_equal(double a, double b)
{
return std::nextafter(a, std::numeric_limits<double>::lowest()) <= b
&& std::nextafter(a, std::numeric_limits<double>::max()) >= b;
}
This of course, would only be true if the bit patterns for a & b are the same. So it is an inefficient way of doing the (incorrect) naive direct a == b comparrison, therefore :
To test two double for equality within some factor scaled to the representable difference, you might use:
bool nearly_equal(double a, double b, int factor /* a factor of epsilon */)
{
double min_a = a - (a - std::nextafter(a, std::numeric_limits<double>::lowest())) * factor;
double max_a = a + (std::nextafter(a, std::numeric_limits<double>::max()) - a) * factor;
return min_a <= b && max_a >= b;
}
Of course working with floating point, analysis of the calculation precision would be required to determine how representation errors build up, to determine the correct minimum factor.
Epsilon is much smaller than 0.0003, so they are clearly not equal.
If you want to see where it works check http://ideone.com/blcmB