Truncate floor into three decimal point C++ - c++

I want to truncate floor number to be 3 digit decimal number. Example:
input : x = 0.363954;
output: 0.364
i used
double myCeil(float v, int p)
{
return int(v * pow(float(10),p))/pow(float(10),p );
}
but the output was 0.3630001 .
I tried to use trunc from <cmath> but it doesn't exist.

Floating-point math typically uses a binary representation; as a result, there are decimal values that cannot be exactly represented as floating-point values. Trying to fiddle with internal precisions runs into exactly this problem. But mostly when someone is trying to do this they're really trying to display a value using a particular precision, and that's simple:
double x = 0.363954;
std::cout.precision(3);
std::cout << x << '\n';

The function your looking for is the std::ceil, not std::trunc
double myCeil(double v, int p)
{
return std::ceil(v * std::pow(10, p)) / std::pow(10, p);
}
substitue in std::floor or std::round for a myFloor or myRound as desired. (Note that std::round appears in C++11, which you will have to enable if it isn't already done).

It is just impossible to get 0.364 exactly. There is no way you can store the number 0.364 (364/1000) exactly as a float, in the same way you would need an infinite number of decimals to write 1/3 as 0.3333333333...
You did it correctly, except for that you probably want to use std::round(), to round to the closest number, instead of int(), which truncates.
Comparing floating point numbers is tricky business. Typically the best you can do is check that the numbers are sufficiently close to each other.
Are you doing your rounding for comparison purposes? In such case, it seems you are happy with 3 decimals (this depends on each problem in question...), in such case why not just
bool are_equal_to_three_decimals(double a, double b)
{
return std::abs(a-b) < 0.001;
}
Note that the results obtained via comparing the rounded numbers and the function I suggested are not equivalent!

This is an old post, but what you are asking for is decimal precision with binary mathematics. The conversion between the two is giving you an apparent distinction.
The main point, I think, which you are making is to do with identity, so that you can use equality/inequality comparisons between two numbers.
Because of the fact that there is a discrepancy between what we humans use (decimal) and what computers use (binary), we have three choices.
We use a decimal library. This is computationally costly, because we are using maths which are different to how computers work. There are several, and one day they may be adopted into std. See eg "ISO/IEC JTC1 SC22 WG21 N2849"
We learn to do our maths in binary. This is mentally costly, because it's not how we do our maths normally.
We change our algorithm to include an identity test.
We change our algorithm to use a difference test.
With option 3, it is where we make a decision as to just how close one number needs to be to another number to be considered 'the same number'.
One simple way of doing this is (as given by #SirGuy above) where we use ceiling or floor as a test - this is good, because it allows us to choose the significant number of digits we are interested in. It is domain specific, and the solution that he gives might be a bit more optimal if using a power of 2 rather than of 10.
You definitely would only want to do the calculation when using equality/inequality tests.
So now, our equality test would be (for 10 binary places (nearly 3dp))
// Normal identity test for floats.
// Quick but fails eg 1.0000023 == 1.0000024
return (a == b);
Becomes (with 2^10 = 1024).
// Modified identity test for floats.
// Works with 1.0000023 == 1.0000024
return (std::floor(a * 1024) == std::floor(b * 1024));
But this isn't great
I would go for option 4.
Say you consider any difference less than 0.001 to be insignificant, such that 1.00012 = 1.00011.
This does an additional subtraction and a sign removal, which is far cheaper (and more reliable) than bit shifts.
// Modified equality test for floats.
// Returns true if the ∂ is less than 1/10000.
// Works with 1.0000023 == 1.0000024
return abs(a - b) < 0.0001;
This boils down to your comment about calculating circularity, I am suggesting that you calculate the delta (difference) between two circles, rather than testing for equivalence. But that isn't exactly what you asked in the question...

Related

Floating point Arithmetics

Today in my C++ programming lessons, my proff told me that one should never compare two floating point values directly.
So I tried this piece of code and found out the reason for his statement.
double l_Value=94.9;
print("%.20lf",l_Value);
And I found the results as 94.89999999 ( some relative error )
I understand that floating numbers are not stored in the way one presents it to the code. Squeezing those ones and zeros in binary form involves some relative rounding errors.
Iam looking for solutions to two problems.
1. Efficient way to compare two floating values.
2. How to add a floating value to another one. Example. Add 0.1111 to 94.4345 to get the exact value as 94.5456
Thanks in advance.
Efficient way to compare two floating values.
A simple double a,b; if (a == b) is an efficient way to compare two floating values. Yet as OP noticed, this may not meet the overall coding goal. Better ways depend on the context of the compare, something not supplied by OP. See far below.
How to add a floating value to another one. Example. Add 0.1111 to 94.4345 to get the exact value as 94.5456
Floating values as source code have effective unlimited range and precision such as 1.23456789012345678901234567890e1234567. Conversion of this text to a double is limited typically to one of 264 different values. The closest is selected, but that may not be an exact match.
Neither 0.1111, 94.4345, 94.5456 can be representably exactly as a typical double.
OP has choices:
1.) Use another type other than double, float. Various libraries offer decimal floating point types.
2) Limit code to rare platforms that support double to a base 10 form such that FLT_RADIX == 10.
3) Write your own code to handle user input like "0.1111" into a structure/string and perform the needed operations.
4) Treat user input as strings and the convert to some integer type, again with supported routines to read/compute/and write.
5) Accept that floating point operations are not mathematically exact and handle round-off error.
double a = 0.1111;
printf("a: %.*e\n", DBL_DECIMAL_DIG -1 , a);
double b = 94.4345;
printf("b: %.*e\n", DBL_DECIMAL_DIG -1 , b);
double sum = a + b;
printf("sum: %.*e\n", DBL_DECIMAL_DIG -1 , sum);
printf("%.4f\n", sum);
Output
a: 1.1110000000000000e-01
b: 9.4434500000000000e+01
sum: 9.4545599999999993e+01
94.5456 // Desired textual output based on a rounded `sum` to the nearest 0.0001
More on #1
If an exact compare is not sought but some sort of "are the two values close enough?", a definition of "close enough" is needed - of which there are many.
The following "close enough" compares the distance by examining the ULP of the two numbers. It is a linear difference when the values are in the same power-of-two and becomes logarithmic other wise. Of course, change of sign is an issue.
float example:
Consider all finite float ordered from most negative to most positive. The following, somewhat-portable code, returns an integer for each float with that same order.
uint32_t sequence_f(float x) {
union {
float f;
uint32_t u32;
} u;
assert(sizeof(float) == sizeof(uint32_t));
u.f = x;
if (u.u32 & 0x80000000) {
u.u32 ^= 0x80000000;
return 0x80000000 - u.u32;
}
return u.u3
}
Now, to determine if two float are "close enough", simple compare two integers.
static bool close_enough(float x, float y, uint32_t ULP_delta) {
uint32_t ullx = sequence_f(x);
uint32_t ully = sequence_f(y);
if (ullx > ully) return (ullx - ully) <= ULP_delta;
return (ully - ullx) <= ULP_delta;
}
The way I've usually done this is is to have a custom equality comparison function. The basic idea, is you have a certain tolerance, say 0.0001 or something. Then you subtract your two numbers and take their absolute value, and if it is less than your tolerance you treat it as equal. There are other strategies that may be more appropriate for certain situations, of course.
Define for yourself a tolerance level e (for example, e=.0001) and check if abs(a-b) <= e
You aren't going to get an "exact" value with floating point. Ever. If you know in advance that you are using four decimals, and you want "exact", then you need to internally treat your numbers as integers and only display them as decimals. 944345 + 1111 = 945456

Compensating for double/float inaccuracy

I've written a math calculator that takes in a string from the user and parses it. It uses doubles to hold all values involved when calculating. Once solved, I then print it, and use std::setprecision() to make sure it is output correctly (for instance 0.9999999 will become 1 on the print out.
Returning the string that will be output:
//return true or false if this is in the returnstring.
if (returnString.compare("True") == 0 || returnString.compare("False") == 0) return returnString;
//create stringstream and put the answer into the returnString.
std::stringstream stream;
returnString = std::to_string(temp_answer.answer);
//write into the stream with precision set correctly.
stream << std::fixed << std::setprecision(5) << temp_answer.answer;
return stream.str();
I am aware of the accuracy issues when using doubles and floats. Today I started working on code so that the user can compare the two mathematical strings. For instance, 1=1 will evaluate to true, 2>3 false...etc. This works by running my math expression parser for each side of the comparison operator, then comparing the answers.
The issue i'm facing right now is when the user enters something like 1/3*3=1. Of course because i'm using doubles the parser will return 0.999999as the answer. Usually when just solving a non-comparison problem this is compensated for at printing time with std::setprecision() as mentioned before. However, when comparing two doubles it's going to return false as 0.99999!=1. How can I get it so when comparing the doubles this inaccuracy is compensated for, and the answer returned correctly? Here's the code that I use to compare the numbers themselves.
bool math_comparisons::equal_to(std::string lhs, std::string rhs)
{
auto lhs_ret = std::async(process_question, lhs);
auto rhs_ret = std::async(process_question, rhs);
bool ReturnVal = false;
if (lhs_ret.get().answer == rhs_ret.get().answer)
{
ReturnVal = true;
}
return ReturnVal;
}
I'm thinking some kind of rounding needs to occur, but i'm not 100% sure how to accomplish it properly. Please forgive me if this has already been addressed - I couldn't find much with a search. Thanks!
Assuming that answer is a double, replace this
lhs_ret.get().answer == rhs_ret.get().answer
with
abs(lhs_ret.get().answer - rhs_ret.get().answer) < TOL
where TOL is an appropriate tolerance value.
Floating point numbers should never be compared with == but by checking if the absolute difference is less than a given tolerance.
There is one difficulty that needs to be mentioned: The accuracy of doubles is about 16 decimals. So you might set TOL=1.0e-16. This will only work if your numbers are less than 1. For a number with 16 digits, it means that the tolerance has to be as large as 1.
So either you assume that your numbers are smaller than say 10e8 and use a relatively large tolerance like 10e-8 or you need to do something much more complicated.
First consider:
As a basic rule of thumb, a double will be a value with roughly 16dp where dp is decimal places, or 1.0e-16. You need to be aware that this only applies to numbers that are less than one(1) IE for 10.n you'll have to operate around that fact you can only have 15dp EG: 10.0e-15 and so on... Due to computers counting in base 2, and people counting in base 10 some values can never be properly expressed in the bit ranges that "most" modern OS use.
This can be highlighted by the fact that expressing 0.1 in binary or base 2 is infinitely long.
Therefore you should never compare a rational number via the == operator. Instead what the "go to" solution conventionally used is:
You implement a "close enough" solution. IE: you define epsilon as a value eg: epsilon = 0.000001 and you state that if (value a - value b) < epsilon == true. What we are saying is that if a - b is within e, for all intents and purposes for our program, its close enough to be regarded as true.
Now for choosing a value for epsilon, that all depends on how accurate you need to be for your purposes. For example, one can assume you need a high level of accuracy for structural engineering compared to a 2D side scrolling platform game.
The solution in your case you be to replace line 7 of you code:
(lhs_ret.get().answer == rhs_ret.get().answer)
with
abs(lhs_ret.get().answer - rhs_ret.get().answer) < epsilon where abs is the absolute value. IE ignoring the sign of the value lhs.
For more context i highly recommend this lecture on MIT open courseware which explains it in an easy to digest manner.
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00sc-introduction-to-computer-science-and-programming-spring-2011/unit-1/lecture-7-debugging/

Rounding in C++ and round-tripping numbers

I have a class that internally represents some quantity in fixed point as 32-bit integer with somewhat arbitrary denominator (it is neither power of 2 nor power of 10).
For communicating with other applications the quantity is converted to plain old double on output and back on input. As code inside the class it looks like:
int32_t quantity;
double GetValue() { return double(quantity) / DENOMINATOR; }
void SetValue(double x) { quantity = x * DENOMINATOR; }
Now I need to ensure that if I output some value as double and read it back, I will always get the same value back. I.e. that
x.SetValue(x.GetValue());
will never change x.quantity (x is arbitrary instance of the class containing the above code).
The double representation has more digits of precision, so it should be possible. But it will almost certainly not be the case with the simplistic code above.
What rounding do I need to use and
How can I find the critical would-be corner cases to test that the rounding is indeed correct?
Any 32 bits will be represented exactly when you convert to a double, but when you divide then multiply by an arbitrary value you will get a similar value but not exactly the same. You should lose at most one bit per operations, which means your double will be almost the same, prior to casting back to an int.
However, since int casts are truncations, you will get the wrong result when very minor errors turn 2.000 into 1.999, thus what you need to do is a simple rounding task prior to casting back.
You can use std::lround() for this if you have C++11, else you can write you own rounding function.
You probably don't care about fairness much here, so the common int(doubleVal+0.5) will work for positives. If as seems likely, you have negatives, try this:
int round(double d) { return d<0?d-0.5:d+0.5; }
The problem you describe is the same problem which exists with converting between binary and decimal representation just with different bases. At least it exists if you want to have the double representation to be a good approximation of the original value (otherwise you could just multiply the 32 bit value you have with your fixed denominator and store the result in a double).
Assuming you want the double representation be a good approximation of your actual value the conversions are nontrivial! The conversion from your internal representation to double can be done using Dragon4 ("How to print floating point numbers accurately", Steele & White) or Grisu ("How to print floating point numbers quickly and accurately", Loitsch; I'm not sure if this algorithm is independent from the base, though). The reverse can be done using Bellerophon ("How to read floating point numbers accurately", Clinger). These algorithms aren't entirely trivial, though...

C++ determining if a number is an integer

I have a program in C++ where I divide two numbers, and I need to know if the answer is an integer or not. What I am using is:
if(fmod(answer,1) == 0)
I also tried this:
if(floor(answer)==answer)
The problem is that answer usually is a 5 digit number, but with many decimals. For example, answer can be: 58696.000000000000000025658 and the program considers that an integer.
Is there any way I can make this work?
I am dividing double a/double b= double answer
(sometimes there are more than 30 decimals)
Thanks!
EDIT:
a and b are numbers in the thousands (about 100,000) which are then raised to powers of 2 and 3, added together and divided (according to a complicated formula). So I am plugging in various a and b values and looking at the answer. I will only keep the a and b values that make the answer an integer. An example of what I got for one of the answers was: 218624 which my program above considered to be an integer, but it really was: 218624.00000000000000000056982 So I need a code that can distinguish integers with more than 20-30 decimals.
You can use std::modf in cmath.h:
double integral;
if(std::modf(answer, &integral) == 0.0)
The integral part of answer is stored in fraction and the return value of std::modf is the fractional part of answer with the same sign as answer.
The usual solution is to check if the number is within a very short distance of an integer, like this:
bool isInteger(double a){
double b=round(a),epsilon=1e-9; //some small range of error
return (a<=b+epsilon && a>=b-epsilon);
}
This is needed because floating point numbers have limited precision, and numbers that indeed are integers may not be represented perfectly. For example, the following would fail if we do a direct comparison:
double d=sqrt(2); //square root of 2
double answer=2.0/(d*d); //2 divided by 2
Here, answer actually holds the value 0.99999..., so we cannot compare that to an integer, and we cannot check if the fractional part is close to 0.
In general, since the floating point representation of a number can be either a bit smaller or a bit bigger than the actual number, it is not good to check if the fractional part is close to 0. It may be a number like 0.99999999 or 0.000001 (or even their negatives), these are all possible results of a precision loss. That's also why I'm checking both sides (+epsilon and -epsilon). You should adjust that epsilon variable to fit your needs.
Also, keep in mind that the precision of a double is close to 15 digits. You may also use a long double, which may give you some extra digits of precision (or not, it is up to the compiler), but even that only gets you around 18 digits. If you need more precision than that, you will need to use an external library, like GMP.
Floating point numbers are stored in memory using a very different bit format than integers. Because of this, comparing them for equality is not likely to work effectively. Instead, you need to test if the difference is smaller than some epsilon:
const double EPSILON = 0.00000000000000000001; // adjust for whatever precision is useful for you
double remainder = std::fmod(numer, denom);
if(std::fabs(0.0 - remainder) < EPSILON)
{
//...
}
Alternatively, if you want to include values that are close to integers (based on your desired precision), you can modify the if condition slightly (since the remainder returned by std::fmod will be in the range [0, 1)):
if (std::fabs(std::round(d) - d) < EPSILON)
{
// ...
}
You can see the test for this here.
Floating point numbers are generally somewhat precise to about 12-15 digits (as a double), but as they are stored as a mantissa (fraction) and a exponent, rational numbers (integers or common fractions) are not likely to be stored as such. For example,
double d = 2.0; // d might actually be 1.99999999999999995
Because of this, you need to compare the difference of what you expect to some very small number that encompasses the precision you desire (we will call this value, epsilon):
double d = 2.0;
bool test = std::fabs(2 - d) < epsilon; // will return true
So when you are trying to compare the remainder from std::fmod, you need to check it against the difference from 0.0 (not for actual equality to 0.0), which is what is done above.
Also, the std::fabs call prevents you from having to do 2 checks by asserting that the value will always be positive.
If you desire a precision that is greater than 15-18 decimal places, you cannot use double or long double; you will need to use a high precision floating point library.

Is floating-point == ever OK?

Just today I came across third-party software we're using and in their sample code there was something along these lines:
// Defined in somewhere.h
static const double BAR = 3.14;
// Code elsewhere.cpp
void foo(double d)
{
if (d == BAR)
...
}
I'm aware of the problem with floating-points and their representation, but it made me wonder if there are cases where float == float would be fine? I'm not asking for when it could work, but when it makes sense and works.
Also, what about a call like foo(BAR)? Will this always compare equal as they both use the same static const BAR?
Yes, you are guaranteed that whole numbers, including 0.0, compare with ==
Of course you have to be a little careful with how you got the whole number in the first place, assignment is safe but the result of any calculation is suspect
ps there are a set of real numbers that do have a perfect reproduction as a float (think of 1/2, 1/4 1/8 etc) but you probably don't know in advance that you have one of these.
Just to clarify. It is guaranteed by IEEE 754 that float representions of integers (whole numbers) within range, are exact.
float a=1.0;
float b=1.0;
a==b // true
But you have to be careful how you get the whole numbers
float a=1.0/3.0;
a*3.0 == 1.0 // not true !!
There are two ways to answer this question:
Are there cases where float == float gives the correct result?
Are there cases where float == float is acceptable coding?
The answer to (1) is: Yes, sometimes. But it's going to be fragile, which leads to the answer to (2): No. Don't do that. You're begging for bizarre bugs in the future.
As for a call of the form foo(BAR): In that particular case the comparison will return true, but when you are writing foo you don't know (and shouldn't depend on) how it is called. For example, calling foo(BAR) will be fine but foo(BAR * 2.0 / 2.0) (or even maybe foo(BAR * 1.0) depending on how much the compiler optimises things away) will break. You shouldn't be relying on the caller not performing any arithmetic!
Long story short, even though a == b will work in some cases you really shouldn't rely on it. Even if you can guarantee the calling semantics today maybe you won't be able to guarantee them next week so save yourself some pain and don't use ==.
To my mind, float == float is never* OK because it's pretty much unmaintainable.
*For small values of never.
The other answers explain quite well why using == for floating point numbers is dangerous. I just found one example that illustrates these dangers quite well, I believe.
On the x86 platform, you can get weird floating point results for some calculations, which are not due to rounding problems inherent to the calculations you perform. This simple C program will sometimes print "error":
#include <stdio.h>
void test(double x, double y)
{
const double y2 = x + 1.0;
if (y != y2)
printf("error\n");
}
void main()
{
const double x = .012;
const double y = x + 1.0;
test(x, y);
}
The program essentially just calculates
x = 0.012 + 1.0;
y = 0.012 + 1.0;
(only spread across two functions and with intermediate variables), but the comparison can still yield false!
The reason is that on the x86 platform, programs usually use the x87 FPU for floating point calculations. The x87 internally calculates with a higher precision than regular double, so double values need to be rounded when they are stored in memory. That means that a roundtrip x87 -> RAM -> x87 loses precision, and thus calculation results differ depending on whether intermediate results passed via RAM or whether they all stayed in FPU registers. This is of course a compiler decision, so the bug only manifests for certain compilers and optimization settings :-(.
For details see the GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
Rather scary...
Additional note:
Bugs of this kind will generally be quite tricky to debug, because the different values become the same once they hit RAM.
So if for example you extend the above program to actually print out the bit patterns of y and y2 right after comparing them, you will get the exact same value. To print the value, it has to be loaded into RAM to be passed to some print function like printf, and that will make the difference disappear...
I'll provide more-or-less real example of legitimate, meaningful and useful testing for float equality.
#include <stdio.h>
#include <math.h>
/* let's try to numerically solve a simple equation F(x)=0 */
double F(double x) {
return 2 * cos(x) - pow(1.2, x);
}
/* a well-known, simple & slow but extremely smart method to do this */
double bisection(double range_start, double range_end) {
double a = range_start;
double d = range_end - range_start;
int counter = 0;
while (a != a + d) // <-- WHOA!!
{
d /= 2.0;
if (F(a) * F(a + d) > 0) /* test for same sign */
a = a + d;
++counter;
}
printf("%d iterations done\n", counter);
return a;
}
int main() {
/* we must be sure that the root can be found in [0.0, 2.0] */
printf("F(0.0)=%.17f, F(2.0)=%.17f\n", F(0.0), F(2.0));
double x = bisection(0.0, 2.0);
printf("the root is near %.17f, F(%.17f)=%.17f\n", x, x, F(x));
}
I'd rather not explain the bisection method used itself, but emphasize on the stopping condition. It has exactly the discussed form: (a == a+d) where both sides are floats: a is our current approximation of the equation's root, and d is our current precision. Given the precondition of the algorithm — that there must be a root between range_start and range_end — we guarantee on every iteration that the root stays between a and a+d while d is halved every step, shrinking the bounds.
And then, after a number of iterations, d becomes so small that during addition with a it gets rounded to zero! That is, a+d turns out to be closer to a then to any other float; and so the FPU rounds it to the closest representable value: to a itself. Calculation on a hypothetical machine can illustrate; let it have 4-digit decimal mantissa and some large exponent range. Then what result should the machine give to 2.131e+02 + 7.000e-3? The exact answer is 213.107, but our machine can't represent such number; it has to round it. And 213.107 is much closer to 213.1 than to 213.2 — so the rounded result becomes 2.131e+02 — the little summand vanished, rounded up to zero. Exactly the same is guaranteed to happen at some iteration of our algorithm — and at that point we can't continue anymore. We have found the root to maximum possible precision.
Addendum
No you can't just use "some small number" in the stopping condition. For any choice of the number, some inputs will deem your choice too large, causing loss of precision, and there will be inputs which will deem your choiсe too small, causing excess iterations or even entering infinite loop. Imagine that our F can change — and suddenly the solutions can be both huge 1.0042e+50 and tiny 1.0098e-70. Detailed discussion follows.
Calculus has no notion of a "small number": for any real number, you can find infinitely many even smaller ones. The problem is, among those "even smaller" ones might be a root of our equation. Even worse, some equations will have distinct roots (e.g. 2.51e-8 and 1.38e-8) — both of which will get approximated by the same answer if our stopping condition looks like d < 1e-6. Whichever "small number" you choose, many roots which would've been found correctly to the maximum precision with a == a+d — will get spoiled by the "epsilon" being too large.
It's true however that floats' exponent has finite limited range, so one actually can find the smallest nonzero positive FP number; in IEEE 754 single precision, it's the 1e-45 denorm. But it's useless! while (d >= 1e-45) {…} will loop forever with single-precision (positive nonzero) d.
At the same time, any choice of the "small number" in d < eps stopping condition will be too small for many equations. Where the root has high enough exponent, the result of subtraction of two neighboring mantissas will easily exceed our "epsilon". For example, 7.00023e+8 - 7.00022e+8 = 0.00001e+8 = 1.00000e+3 = 1000 — meaning that the smallest possible difference between numbers with exponent +8 and 6-digit mantissa is... 1000! It will never fit into, say, 1e-4. For numbers with relatively high exponent we simply have not enough precision to ever see a difference of 1e-4. This means eps = 1e-4 will be too small!
My implementation above took this last problem into account; you can see that d is halved each step — instead of getting recalculated as difference of (possibly huge in exponent) a and b. For reals, it doesn't matter; for floats it does! The algorithm will get into infinite loops with (b-a) < eps on equations with huge enough roots. The previous paragraph shows why. d < eps won't get stuck, but even then — needless iterations will be performed during shrinking d way down below the precision of a — still showing the choice of eps as too small. But a == a+d will stop exactly at precision.
Thus as shown: any choice of eps in while (d < eps) {…} will be both too large and too small, if we allow F to vary.
... This kind of reasoning may seem overly theoretical and needlessly deep, but it's to illustrate again the trickiness of floats. One should be aware of their finite precision when writing arithmetic operators around.
Perfect for integral values even in floating point formats
But the short answer is: "No, don't use ==."
Ironically, the floating point format works "perfectly", i.e., with exact precision, when operating on integral values within the range of the format. This means that you if you stick with double values, you get perfectly good integers with a little more than 50 bits, giving you about +- 4,500,000,000,000,000, or 4.5 quadrillion.
In fact, this is how JavaScript works internally, and it's why JavaScript can do things like + and - on really big numbers, but can only << and >> on 32-bit ones.
Strictly speaking, you can exactly compare sums and products of numbers with precise representations. Those would be all the integers, plus fractions composed of 1 / 2n terms. So, a loop incrementing by n + 0.25, n + 0.50, or n + 0.75 would be fine, but not any of the other 96 decimal fractions with 2 digits.
So the answer is: while exact equality can in theory make sense in narrow cases, it is best avoided.
The only case where I ever use == (or !=) for floats is in the following:
if (x != x)
{
// Here x is guaranteed to be Not a Number
}
and I must admit I am guilty of using Not A Number as a magic floating point constant (using numeric_limits<double>::quiet_NaN() in C++).
There is no point in comparing floating point numbers for strict equality. Floating point numbers have been designed with predictable relative accuracy limits. You are responsible for knowing what precision to expect from them and your algorithms.
It's probably ok if you're never going to calculate the value before you compare it. If you are testing if a floating point number is exactly pi, or -1, or 1 and you know that's the limited values being passed in...
I also used it a few times when rewriting few algorithms to multithreaded versions. I used a test that compared results for single- and multithreaded version to be sure, that both of them give exactly the same result.
Let's say you have a function that scales an array of floats by a constant factor:
void scale(float factor, float *vector, int extent) {
int i;
for (i = 0; i < extent; ++i) {
vector[i] *= factor;
}
}
I'll assume that your floating point implementation can represent 1.0 and 0.0 exactly, and that 0.0 is represented by all 0 bits.
If factor is exactly 1.0 then this function is a no-op, and you can return without doing any work. If factor is exactly 0.0 then this can be implemented with a call to memset, which will likely be faster than performing the floating point multiplications individually.
The reference implementation of BLAS functions at netlib uses such techniques extensively.
In my opinion, comparing for equality (or some equivalence) is a requirement in most situations: standard C++ containers or algorithms with an implied equality comparison functor, like std::unordered_set for example, requires that this comparator be an equivalence relation (see C++ named requirements: UnorderedAssociativeContainer).
Unfortunately, comparing with an epsilon as in abs(a - b) < epsilon does not yield an equivalence relation since it loses transitivity. This is most probably undefined behavior, specifically two 'almost equal' floating point numbers could yield different hashes; this can put the unordered_set in an invalid state.
Personally, I would use == for floating points most of the time, unless any kind of FPU computation would be involved on any operands. With containers and container algorithms, where only read/writes are involved, == (or any equivalence relation) is the safest.
abs(a - b) < epsilon is more or less a convergence criteria similar to a limit. I find this relation useful if I need to verify that a mathematical identity holds between two computations (for example PV = nRT, or distance = time * speed).
In short, use == if and only if no floating point computation occur;
never use abs(a-b) < e as an equality predicate;
Yes. 1/x will be valid unless x==0. You don't need an imprecise test here. 1/0.00000001 is perfectly fine. I can't think of any other case - you can't even check tan(x) for x==PI/2
The other posts show where it is appropriate. I think using bit-exact compares to avoid needless calculation is also okay..
Example:
float someFunction (float argument)
{
// I really want bit-exact comparison here!
if (argument != lastargument)
{
lastargument = argument;
cachedValue = very_expensive_calculation (argument);
}
return cachedValue;
}
I would say that comparing floats for equality would be OK if a false-negative answer is acceptable.
Assume for example, that you have a program that prints out floating points values to the screen and that if the floating point value happens to be exactly equal to M_PI, then you would like it to print out "pi" instead. If the value happens to deviate a tiny bit from the exact double representation of M_PI, it will print out a double value instead, which is equally valid, but a little less readable to the user.
I have a drawing program that fundamentally uses a floating point for its coordinate system since the user is allowed to work at any granularity/zoom. The thing they are drawing contains lines that can be bent at points created by them. When they drag one point on top of another they're merged.
In order to do "proper" floating point comparison I'd have to come up with some range within which to consider the points the same. Since the user can zoom in to infinity and work within that range and since I couldn't get anyone to commit to some sort of range, we just use '==' to see if the points are the same. Occasionally there'll be an issue where points that are supposed to be exactly the same are off by .000000000001 or something (especially around 0,0) but usually it works just fine. It's supposed to be hard to merge points without the snap turned on anyway...or at least that's how the original version worked.
It throws of the testing group occasionally but that's their problem :p
So anyway, there's an example of a possibly reasonable time to use '=='. The thing to note is that the decision is less about technical accuracy than about client wishes (or lack thereof) and convenience. It's not something that needs to be all that accurate anyway. So what if two points won't merge when you expect them to? It's not the end of the world and won't effect 'calculations'.