float/double equality with exact zero - c++

I have an algorithm which uses floats or doubles to perform some calculations.
Example:
double a;
double b;
double c;
...
double result = c / (b - a);
if ((result > 0) && (result < small_number))
{
// result is relevant...
} else {
// result not required...
}
Now, I am worried that (b - a) might be zero. If it is close to zero but not zero, it does not matter, because the result will be out of range to be useful, and I already detect that (as (b - a) approaches zero, result will approach +/- inf, which is not in the range 0-small_number...)
But if the result of (b - a) is exactly zero, I expect that something platform dependent will happen due to divide by zero. I could change the if statement to:
if ((!((b-a) == 0.0)) && ((result = c/(b-a)) > 0) && (result < small_number)) {
but I don't know if (b-a) == 0.0 will always detect equality with zero. I have seen there are multiple representations for exact zero in floating point? How can you test for them all without doing some epsilon check, which I don't need (a small epsilon will be ignored in my algorithm)?
What is the platform-independent way to check?
EDIT:
Not sure if it was clear enough to people. Basically I want to know how to find if an expression like:
double result = numerator / denominator;
will result in a floating point exception, a CPU exception, a signal from the operating system or something else... without actually performing the operation and seeing if it will "throw", because detecting a "throw" of this nature seems to be complicated and platform specific.
Is ( (denominator==0.0) || (denominator==-0.0) ) ? "Will 'throw'" : "Won't 'throw'"; enough?

It depends on how b and a got their values. Zero has an exact representation in floating point format, but the bigger problem would be almost-but-not-quite zero values. It would always be safe to check:
if (abs(b-a) > 0.00000001 && ...
Where 0.00000001 is whatever value makes sense.

Here's how you do it: instead of checking for (result < small_number), you check for
(abs(c) < abs(b - a) * small_number)
Then all your troubles disappear! The computation of c/(b-a) will never overflow if this test is passed.
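A minimal sketch of that rearrangement, assuming small_number is positive. The name is_relevant is hypothetical and does not come from the question. Besides the scaled comparison, it covers the result > 0 half of the original test by requiring c and (b - a) to have the same sign:

```cpp
#include <cmath>

// Hypothetical helper: true when c / (b - a) would lie in (0, small_number),
// decided without performing the division. Assumes small_number > 0.
bool is_relevant(double a, double b, double c, double small_number) {
    const double d = b - a;
    if (d == 0.0 || c == 0.0) {
        return false;  // zero denominator, or a result of exactly 0
    }
    // result > 0 requires c and d to have the same sign
    if ((c > 0.0) != (d > 0.0)) {
        return false;
    }
    // |c| / |d| < small_number is rewritten as |c| < |d| * small_number
    return std::fabs(c) < std::fabs(d) * small_number;
}
```

Because the division never happens inside the check, an exact zero in (b - a) simply fails the test instead of reaching the divide.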

I guess you can use fpclassify(-0.0) == FP_ZERO. But this is only useful if you want to check whether someone put some kind of zero into a floating-point variable. As many have already said, if you want to check the result of a calculation, you may get values very close to zero due to the nature of the representation.
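As a small sketch of the fpclassify idea (the wrapper name is_exact_zero is made up for illustration): std::fpclassify reports FP_ZERO for both signed zeros, while subnormal values are reported as FP_SUBNORMAL and therefore do not count as zero.

```cpp
#include <cmath>

// True for +0.0 and -0.0 only; subnormals and other tiny values are not zero.
bool is_exact_zero(double x) {
    return std::fpclassify(x) == FP_ZERO;
}
```

On IEEE 754 platforms the plain comparison x == 0.0 gives the same answer, since +0.0 and -0.0 compare equal.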

In brief, we can tell whether a floating-point number is exactly ZERO if we know its representation format.
In practice, we compare x with a small number, and if x is less than this number, we treat x as functionally the same as ZERO (though most of the time our small number is still larger than zero). This method is easy, efficient and works across platforms.
Floats and doubles are stored in a special format. The one most widely used in current hardware is IEEE 754, which divides the number into sign, exponent and mantissa (significand) bits.
So, if we want to check whether a float is exactly ZERO, we can check whether both the exponent and the mantissa are ZERO, see here.
In IEEE 754 binary floating point numbers, zero values are represented
by the biased exponent and significand both being zero. Negative zero
has the sign bit set to one.
Take float for example: we can write some simple code to extract the exponent and mantissa bits and then check them.
#include <stdio.h>
typedef union {
float f;
struct {
unsigned int mantissa : 23;
unsigned int exponent : 8;
unsigned int sign : 1;
} parts;
} float_cast;
int isZero(float num) {
int flag = 0;
float_cast data;
data.f = num;
// Check both exponent and mantissa parts
if(data.parts.exponent == 0u && data.parts.mantissa == 0u) {
flag = 1;
} else {
flag = 0;
}
return(flag);
}
int main() {
float num1 = 0.f, num2 = -0.f, num3 = 1.2f;
printf("\n is zero of %f -> %d", num1, isZero(num1));
printf("\n is zero of %f -> %d", num2, isZero(num2));
printf("\n is zero of %f -> %d", num3, isZero(num3));
return(0);
}
Test results:
# is zero of 0.000000 -> 1
# is zero of -0.000000 -> 1
# is zero of 1.200000 -> 0
More examples:
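Let's check with code when a float becomes exactly ZERO. (Note that in this loop e overflows to infinity once it passes FLT_MAX, about 3.4e38, so the ZERO at the end comes from 1.f/inf; nonzero subnormal floats smaller than 1e-38 still exist.)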
void test() {
int i =0;
float e = 1.f, small = 1.f;
for(i = 0; i < 40; i++) {
e *= 10.f;
small = 1.f/e;
printf("\nis %e zero? : %d", small, isZero(small));
}
return;
}
is 1.0000e-01 zero? : NO
is 1.0000e-02 zero? : NO
is 1.0000e-03 zero? : NO
is 1.0000e-04 zero? : NO
is 1.0000e-05 zero? : NO
is 1.0000e-06 zero? : NO
is 1.0000e-07 zero? : NO
is 1.0000e-08 zero? : NO
is 1.0000e-09 zero? : NO
is 1.0000e-10 zero? : NO
is 1.0000e-11 zero? : NO
is 1.0000e-12 zero? : NO
is 1.0000e-13 zero? : NO
is 1.0000e-14 zero? : NO
is 1.0000e-15 zero? : NO
is 1.0000e-16 zero? : NO
is 1.0000e-17 zero? : NO
is 1.0000e-18 zero? : NO
is 1.0000e-19 zero? : NO
is 1.0000e-20 zero? : NO
is 1.0000e-21 zero? : NO
is 1.0000e-22 zero? : NO
is 1.0000e-23 zero? : NO
is 1.0000e-24 zero? : NO
is 1.0000e-25 zero? : NO
is 1.0000e-26 zero? : NO
is 1.0000e-27 zero? : NO
is 1.0000e-28 zero? : NO
is 1.0000e-29 zero? : NO
is 1.0000e-30 zero? : NO
is 1.0000e-31 zero? : NO
is 1.0000e-32 zero? : NO
is 1.0000e-33 zero? : NO
is 1.0000e-34 zero? : NO
is 1.0000e-35 zero? : NO
is 1.0000e-36 zero? : NO
is 1.0000e-37 zero? : NO
is 1.0000e-38 zero? : NO
is 0.0000e+00 zero? : YES <-- 1e-39
is 0.0000e+00 zero? : YES <-- 1e-40
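The union above is written as C. In C++, reading a union member other than the one last written is undefined behaviour, and the layout of bit-fields is implementation-defined, so here is a hedged C++ sketch of the same check that copies the bits with std::memcpy instead. It assumes float is a 32-bit IEEE 754 value, and the name is_zero_bits is hypothetical:

```cpp
#include <cstdint>
#include <cstring>

// Sketch: inspect the bits of a float without a union or bit-fields.
// Assumes float is a 32-bit IEEE 754 binary32 value.
bool is_zero_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined copy of the representation
    // Mask off the sign bit; exponent and mantissa must both be zero.
    return (bits & 0x7FFFFFFFu) == 0u;
}
```

A subnormal such as 1e-39f has a zero exponent field but a nonzero mantissa, so it is correctly reported as not zero.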

UPDATE (2016-01-04)
I've received some downvotes on this answer, and I wondered if I should just delete it. It seems the consensus (https://meta.stackexchange.com/questions/146403/should-i-delete-my-answers) is that deleting answers should only be done in extreme cases.
So, my answer is wrong. But I guess I'm leaving it up because it provides for an interesting "think out of the box" kind of thought experiment.
===============
Bingo,
You say you want to know if b-a == 0.
Another way of looking at this is to determine whether a == b. If a equals b, then b-a will be equal to 0.
Another interesting idea I found:
http://www.cygnus-software.com/papers/comparingfloats/Comparing%20floating%20point%20numbers.htm
Essentially, you take the floating point variables you have and tell the compiler to reinterpret them (bit for bit) as signed integers of the same width (a 64-bit integer for a double), as in the following:
if (*(long long*)&b == *(long long*)&a)
Then you are comparing integers, and not floating points. Maybe that will help? Maybe not. Good luck!

I believe that (b-a)==0 will be true in exactly those cases where c/(b-a) would fail because of (b-a) being zero. Floating-point maths is tricky, but questioning this is an exaggeration in my opinion. I also believe that (b-a)==0 is going to be equivalent to b==a.
Distinguishing positive and negative 0 is also not necessary. See e.g. here Does float have a negative zero? (-0f)
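A short sketch of that claim, under two stated assumptions: the platform uses IEEE 754 doubles (checked with std::numeric_limits<double>::is_iec559) and subnormal results are not flushed to zero. The function name denominator_is_zero is hypothetical:

```cpp
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// For finite a and b, b - a is zero exactly when a == b, because the
// difference of two distinct doubles is at least one subnormal step.
// +0.0 and -0.0 compare equal, so one comparison covers both zeros.
bool denominator_is_zero(double a, double b) {
    return (b - a) == 0.0;
}
```

Two neighbouring doubles such as 1.5 and 1.5000000000000002 still give a nonzero difference, so the check does not fire early.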

For epsilon, in <limits> there is a standard template definition std::numeric_limits<T>::epsilon(). I guess checking the difference to be bigger than std::numeric_limits<T>::epsilon() should be safe enough to protect against division by zero. No platform dependency here I guess.

You could try
if ((b-a)!=(a-b) && ((result = c/(b-a)) > 0) && (result < small_number)) {
...

Related

How to get rid of -0 in C++

I am writing a program in which there are some operations being performed on a floating point number. After I debugged the program, I came to know that for a particular test case, the value of the variable equals -2.38418579e-07. Now I have cout precision set to 2 digits after decimal. So when I print it, it prints it as -0.00.
However, I would like the output to be 0.00 instead of -0.00. I have tried various if conditions on the variable's value. However, they do not help. Can anyone suggest how to get rid of -0.00 in C++
Firstly, you should define a tolerance as a threshold, where any floating point number whose absolute value is below this threshold is considered to be zero. For example, you could define this threshold as:
#define zero 1e-6
Then you could use the following construct to "filter" your floating point numbers:
template<typename T>
std::enable_if_t<std::is_floating_point<T>::value, T> sanitize(T &&num) {
return std::abs(num) < zero? T{} : num;
}
Notice that I use SFINAE so that the sanitize function accepts only floating point numbers as input.
I would like the output to be 0.00 instead of -0.00
I like the other answers better. But in a crunch you can always use brute force ... (are you sure you can ignore the actual results?)
std::string rslt;
{
std::stringstream ss;
ss << variable; // use same formatting as in your example
size_t minusSignIndx = ss.str().find("-0.00");
if (minusSignIndx != std::string::npos)
rslt = " 0.00"; // found the nasty, ignore it
else
rslt = ss.str(); // not nasty, use it
}
//... use rslt
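The brute-force idea can be sketched as a small function that formats first and then inspects the text, so it always agrees with what printf would print. This is only an illustration: the name format_fixed2 is hypothetical, the precision is fixed at two decimals, and the default "C" locale (decimal point '.') is assumed.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format x with two decimals and drop the minus sign
// when every printed digit is zero (so "-0.00" becomes "0.00").
std::string format_fixed2(double x) {
    char buf[512];  // large enough for any finite double printed with %.2f
    std::snprintf(buf, sizeof buf, "%.2f", x);
    std::string s(buf);
    if (!s.empty() && s[0] == '-' &&
        s.find_first_not_of("0.", 1) == std::string::npos) {
        s.erase(0, 1);  // only zeros and the decimal point follow the sign
    }
    return s;
}
```

Values that really round to a nonzero negative number, such as -0.01, keep their sign.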
The problem is that every floating point in a certain interval [ low , -0.0] will be printed "-0.00".
Thus you have to find low:
such that print(predecessor(low)) => "-0.01"
such that print(low) => "-0.00"
Then, you'll be able to write something like (nan apart...)
double filter(double x) {
double low = ... ;
return (x < low)
? x
: ((x > 0.0)
? x
: 0.0) ;
}
If you have a correctly rounded printf, and manage your arithmetic to be strictly IEEE754 conformant with appropriate compiler flags, the exact value of low is the nearest double to -1/200, greater than -1/200 (I write 1/200 rather than -0.005 because I'm speaking of the decimal value, not the double)
Here is what we get with a correctly rounded sscanf("-0.005","%lf",&d): the double result is smaller than -1/200. I checked that with exact arithmetic, as found for example in the Pharo Smalltalk language:
[-0.005 < (-1/200) and: [-0.005 successor > (-1/200)]] assert.
Its successor is greater than -1/200 (necessarily, above check is just foolproofing).
Thus you can write (notice the <= low):
double filter(double x) {
double low = -0.005 ;
return (x <= low)
? x
: ((x > 0.0)
? x
: 0.0) ;
}

Do multiples of Pi to the thousandths have a value that may change how a loop executes?

Recently I decided to get into c++, and after going through the basics I decided to build a calculator using only iostream (just to challenge myself). After most of it was complete, I came across an issue with my loop for exponents. Whenever a multiple of Pi was used as the exponent, it looped way too many times. I fixed it in a somewhat redundant way and now I'm hoping someone might be able to tell me what happened. My unfixed code snippet is below. Ignore everything above and just look at the last bit of fully functioning code. All I was wondering was why values of pi would throw off the loop so much. Thanks.
bool TestForDecimal(double Num) /* Checks if the number given is whole or not */ {
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) {
return 0;
}
else {
return 1;
}
}
And then here's where it all goes wrong (Denominator is set to a value of 1)
if (TestForDecimal(Power) == 1) /* Checks if its decimal or not */ {
while (TestForDecimal(Power) == 1) {
Power = Power * 10;
Denominator = Denominator * 10;
}
}
If anyone could give me an explanation that would be great!
To clarify further, the while loop kept looping even after Power became a whole number (This only happened when Power was equal to a multiple of pi such as 3.1415 or 6.2830 etc.)
Here's a complete program you can try:
#include <iostream>
#include <climits> // INT_MAX
#include <cstdlib> // system
bool TestForDecimal(double Num) /* Checks if the number given is whole or not */ {
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) {
return 0;
}
else {
return 1;
}
}
void foo(double Power) {
double x = Power;
if (TestForDecimal(x) == 1) /* Checks if its decimal or not */ {
while (TestForDecimal(x) == 1) {
x = x * 10;
std::cout << x << std::endl;
}
}
}
int main() {
foo(3.145); // Substitute this with 3.1415 and it doesn't work (this was my problem)
system("Pause");
return 0;
}
What's wrong with doing something like this?
#include <cmath> // abs and round
#include <cfloat> // DBL_EPSILON
bool TestForDecimal(double Num) {
double diff = std::abs(std::round(Num) - Num);
// true if not a whole number
return diff > DBL_EPSILON;
}
The loop is quite inefficient... what if Num is large...
A faster way could be something like
if (Num == static_cast<int>(Num))
or
if (Num == (int)Num)
if you prefer a C-style syntax.
Then a range check may be useful... it does not make sense to ask whether Num is an integer when it is larger than 2^32 (about 4 billion).
Finally, do not think of these numbers as decimals. They are stored as binary numbers; instead of multiplying Power and Denominator by 10, you are better off multiplying them by 2.
Most decimal fractions can't be represented exactly in a binary floating-point format, so what you're trying to do can't work in general. For example, with a standard 64-bit double format, the closest representable value to 3.1415 is more like 3.1415000000000001812.
If you need to represent decimal fractions exactly, then you'll need a non-standard type. Boost.Multiprecision has some decimal types, and there's a proposal to add decimal types to the standard library; some implementations may have experimental support for this.
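A tiny sketch of the difference between fractions that are exact in binary and those that are not. The helper name sums_exactly is made up for illustration, and IEEE 754 double arithmetic is assumed:

```cpp
// Sketch: sums of a few negative powers of two are exact in binary floating
// point, while most other decimal fractions are stored as nearby values.
bool sums_exactly(double x, double y, double expected) {
    return x + y == expected;
}
```

For example, sums_exactly(0.5, 0.25, 0.75) holds, but sums_exactly(0.1, 0.2, 0.3) does not, because none of 0.1, 0.2 and 0.3 has an exact binary representation.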
Beware. A double is generally (and I assume you use a standard architecture) represented in IEEE 754 format, that is mantissa * 2^exponent. For a double, you have 53 bits of precision for the mantissa part (52 stored plus an implicit leading bit), one bit for the sign and 11 for the exponent. When you multiply it by 10 it will grow, and will become an integer value as soon as the exponent reaches 52.
Unfortunately, a 53-bit integer cannot be represented in a 32-bit int, so unless int is 64 bits wide on your system, your test will fail again.
So if you have a 32-bit int, you will never reach an integer value. You will more likely reach an infinity representation and stay there ...
The only use case where it could work would be if you started with a number that can be represented with a small number of negative powers of 2, for example 0.5 (1/2), 0.25 (1/4), 0.75 (1/2 + 1/4), so that almost all digits of the mantissa part are 0.
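One way around the int range problem is to test for a whole number without casting at all. The following is a sketch under that idea, not the asker's code: is_whole and steps_to_whole are hypothetical names, and max_steps is an assumed safety limit so the loop cannot run away.

```cpp
#include <cmath>

// Whole-number test that does not cast to int, so it also works for
// magnitudes beyond INT_MAX. Infinities and NaN are reported as not whole.
bool is_whole(double x) {
    return std::isfinite(x) && std::trunc(x) == x;
}

// Hypothetical helper: multiply by 10 until the value is whole, giving up
// after max_steps. Returns the number of steps, or -1 if the limit was hit.
int steps_to_whole(double x, int max_steps) {
    for (int steps = 0; steps <= max_steps; ++steps) {
        if (is_whole(x)) {
            return steps;
        }
        x *= 10.0;
    }
    return -1;
}
```

For inputs built from negative powers of two the count matches the decimal digits (0.25 needs two steps). For an input like 3.1415 the count depends on rounding in each multiplication, which is why the limit matters.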
After studying your "unfixed" function, from what I can tell, here's your basic algorithm:
double TestForDecimal(double Num) { ...
A function that accepts a double and returns a double. This would make sense if the returned value was the decimal value, but since that's not the case, perhaps you meant to use bool?
while (Num > 1) { make it less }
While there is nothing inherently wrong with this, it doesn't really address negative numbers with large magnitudes, so you'll run into problems there.
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) { return 0; }
This means that if Num is within the signed integer range and its integer typecast is equal to itself, return a 0 typecasted to a double. This means you don't care whether numbers outside the integer range are whole numbers or not. To fix this, change the condition to if (Num == (long)Num) since sizeof(long) == sizeof(double).
Perhaps the algorithm your function follows that I've just explained might shed some light on your problem.

Determining the number of decimal digits in a double - C++

I am trying to get the number of digits after a decimal point in a double. Currently, my code looks like this:
int num_of_decimal_digits = 0;
while (someDouble - someInt != 0)
{
someDouble = someDouble*10;
someInt = someDouble;
num_of_decimal_digits++;
}
Whenever I enter a decimal in for someDouble that is less than one, the loop gets stuck and repeats infinitely. Should I use static_cast? Any advice?
Due to floating-point rounding error, multiplying by 10 is not necessarily an exact decimal shift. You can test the absolute value of the difference against a small tolerance rather than comparing it for exact equality with 0.
while (abs(someDouble - someInt) > epsilon)
Or you can acknowledge that a double with a 53-bit mantissa can only represent log10(2^53) ≈ 15.95 decimal digits, and limit the loop to 16 iterations.
while (someDouble - someInt != 0 && num_of_decimal_digits < 16)
Or both.
while (abs(someDouble - someInt) > epsilon && num_of_decimal_digits < 16)
The naive answer would be:
int num_of_decimal_digits = 0;
double absDouble = someDouble > 0 ? someDouble : someDouble * -1;
while (absDouble - someInt != 0)
{
absDouble = absDouble*10;
someInt = absDouble;
num_of_decimal_digits++;
}
This solves your problem of negative numbers.
However, this solution is likely not going to give you the output you desire in a lot of cases because of the way that floating point numbers are represented. For example 0.35 might really be represented as 0.3499999999998 the way floating point numbers are stored in binary. I would suggest that you share more background information about what you are hoping to accomplish with this code (your input and your desired output). There is likely a much better solution for what you are attempting to accomplish.
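If the goal is the number of decimals a person would write for the value, one alternative is to let printf do the rounding and count digits in the text. This is a sketch under assumptions, not a general solution: it uses 15 significant digits (the most a double reliably preserves), assumes the "C" locale, and the name count_decimal_digits is hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <cstring>

// Hypothetical helper: number of digits after the decimal point in the
// 15-significant-digit decimal rendering of x.
// Returns -1 for non-finite values and when printf chooses exponent
// notation (very large or very small magnitudes).
int count_decimal_digits(double x) {
    if (!std::isfinite(x)) {
        return -1;
    }
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.15g", x);  // %g drops trailing zeros
    if (std::strchr(buf, 'e') != nullptr) {
        return -1;
    }
    const char* point = std::strchr(buf, '.');
    return point ? static_cast<int>(std::strlen(point + 1)) : 0;
}
```

With this approach 0.35 counts as two decimals even though the stored double is slightly below 0.35, because the rounding to 15 significant digits hides the representation error.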

How close to division by zero can I get?

I want to avoid dividing by zero so I have an if statement:
float number;
//........
if (number > 0.000000000000001)
number = 1/number;
How small of a value can I safely use in place of 0.000000000000001?
Just use:
if(number > 0)
number = 1/number;
Note the difference between > and >=. If number > 0, then it definitely is not 0.
If number can be negative you can also use:
if(number != 0)
number = 1/number;
Note that, as others have mentioned in the comments, checking that number is not 0 will not prevent your result from being Inf or -Inf.
The number in the if condition depends on what you want to do with the result. In IEEE 754, which is used by (almost?) all C implementations, dividing by 0 is OK: you get positive or negative infinity.
If your goal is to avoid +/- Infinity, then the number in the if condition will depend upon the numerator. When the numerator is 1, you can use DBL_MIN or FLT_MIN from float.h (<cfloat> in C++).
If your goal is to avoid huge numbers after the division, you can do the division and then check if fabs(number) is bigger than certain value after the division, and then take whatever action as needed.
There is no single correct answer to your question.
You can simply check:
if (number > 0)
I can't understand why you need the lower limit.
For a numeric type T, std::numeric_limits<T> gives you everything you need. For example, you could do this to make sure that anything above min_invertible has a finite reciprocal:
float max_float = std::numeric_limits<float>::max();
float min_float = std::numeric_limits<float>::min(); // or denorm_min()
float min_invertible = (max_float*min_float > 1.0f )? min_float : 1.0f/max_float;
You can't decently check up front. DBL_MAX / 0.5 effectively is a division by zero; the result is the same infinity you'd get from any other division by (almost) zero.
There is a simple solution: just check the result. std::isinf(result) will tell you whether the result overflowed, and IEEE754 tells you that division cannot produce infinity in other cases. (Well, except for INF/x. That's not really producing infinity but merely preserving it.)
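A minimal sketch of "divide first, check afterwards", assuming IEEE 754 arithmetic with floating-point traps left disabled (the usual default). The name checked_divide is hypothetical, and it uses std::isfinite rather than std::isinf so that the NaN from 0/0 is rejected as well:

```cpp
#include <cmath>

// Hypothetical helper: divide, then inspect the result instead of the inputs.
// Returns false when the quotient is infinite or NaN; out is written either way.
bool checked_divide(double num, double den, double& out) {
    out = num / den;
    return std::isfinite(out);
}
```

This catches both a zero denominator and an overflow such as 1e308 / 0.5 with the same test.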
Your risk of producing an unhelpful result through overflow or underflow depends on both numerator and denominator.
A safety check which takes that into consideration is:
if (den == 0.0 || log2(num) - log2(den) >= log2(FLT_MAX))
/* expect overflow */ ;
else
return num / den;
but you might want to shave a small amount off log2(FLT_MAX) to leave wiggle-room for subsequent arithmetic and round-off.
You can do something similar with frexp, which would work for negative values as well:
int max;
int n, d;
frexp(FLT_MAX, &max);
frexp(num, &n);
frexp(den, &d);
if (den == 0.0 || n - d > max)
/* might overflow */ ;
else
return num / den;
This avoids the work of computing the logarithm, which might be more efficient if the compiler can find a suitable way of doing it, but it's not as accurate.
With IEEE 32-bit floats, the smallest possible value greater than 0 is 2^-149.
If you're using IEEE 64-bit, the smallest possible value is 2^-1074.
That said, (x > 0) is probably the better test.
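To make those limits concrete, here is a small sketch assuming IEEE 754 floats with subnormals enabled. The name reciprocal_is_finite is made up for illustration. It shows that (x > 0) rules out a division by zero but not an infinite result:

```cpp
#include <cmath>
#include <limits>

// Sketch: the smallest positive float is the subnormal 2^-149, and its
// reciprocal 2^149 is too large for a float, so the quotient is infinity.
bool reciprocal_is_finite(float x) {
    const float r = 1.0f / x;
    return std::isfinite(r);
}
```

std::numeric_limits<float>::denorm_min() is 2^-149 and has no finite reciprocal, while std::numeric_limits<float>::min() (2^-126, the smallest normal float) does.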

Can float values add to a sum of zero? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Most effective way for float and double comparison
I have two values(floats) I am attempting to add together and average. The issue I have is that occasionally these values would add up to zero, thus not requiring them to be averaged.
The situation I am in specifically contains the values "-1" and "1", yet when added together I am given the value "-1.19209e-007" which is clearly not 0. Any information on this?
I'm sorry but this doesn't make sense to me.
Two floating point values that are exactly the same but with opposite sign will always produce 0 when added. This is how floating point operations work.
float a = 0.2f;
float b = -0.2f;
float f = (a + b) / 2;
printf("%f %d\n", f, f != 0); // will print out 0.000000 0
The result will always be 0, even if the compiler doesn't optimize the code.
There is no rounding error to take into account if a and b have the same value but opposite sign! That is, if the highest bit (the sign bit) of a is 0 and the highest bit of b is 1 and all other bits are the same, the result cannot be other than 0.
But if a and b are slightly different, of course, the result can be non-zero.
One possible solution to avoid this can be using a tolerance...
float f = (a + b) / 2;
if (fabs(f) < 0.000001f)
f = 0;
We are using a simple tolerance to see if our value is near to zero.
A nice example code to show this is...
#include <stdio.h>
int main(int argc, char **argv)
{
for (int i = -10000000; i <= 10000000 * argc; ++i)
{
if (i != 0)
{
float a = 3.14159265f / i;
float b = -a + (argc - 1);
float f = (a + b) / 2;
if (f != 0)
printf("%f %g\n", a, f);
}
}
printf("completed\n");
return 0;
}
I'm using "argc" here as a trick to force the compiler to not optimize out our code.
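The tolerance idea can be wrapped in a small function. This is only a sketch: the name snap_to_zero is hypothetical, and the tolerance is an assumption that has to suit the magnitudes in your data.

```cpp
#include <cmath>

// Hypothetical helper: treat values within tol of zero as zero.
// The tolerance is an assumption and must suit the magnitudes involved.
float snap_to_zero(float x, float tol) {
    return std::fabs(x) < tol ? 0.0f : x;
}
```

Note that -1.0f + 1.0f is exactly 0.0f without any help, so a result like -1.19209e-07 means the inputs were not exactly -1 and 1 to begin with. snap_to_zero(-1.19209e-07f, 1e-6f) then returns 0.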
At least right off, this sounds like typical floating point imprecision.
The usual way to deal with it is to round your numbers to the correct number of significant digits. In this case, your average would be about -5.96e-08 (i.e., -0.0000000596). To (say) six or seven significant digits, that is zero.
Take the sum of all your numbers and divide by your count. Round off your answer to something reasonable before you do prints, reports, comparisons, or whatever you're doing.
Again, do some searching on this, but here is the basic explanation ...
The computer approximates floating point numbers in base 2 instead of base 10. This means that, for example, 0.2 (when converted to binary) is actually 0.001100110011 ... on forever. Since the computer cannot store these digits forever, it must approximate the value.
Because of these approximations, we lose "precision" in calculations, hence "single" and "double" precision floating point numbers. This is why you never test for a float to be exactly 0. Instead, you test whether it is below some threshold which you want to use as zero.