Treating double variable as boolean - c++

Which is the better way(efficiency/best practice wise) to test if a double variable is equal to 0?
1. if(a_double)
.....
else
.....
OR
2. if(a_double == 0)
.....
else
.....

The second is generally better (more explicit about what you're doing). I'd normally prefer if (a_double == 0.0) though. Also note that with floating point, it's often useful to do an approximate comparison to account for the possibility of rounding in the calculations (though doing this well can be non-trivial).
Edit: Since there appears to be some misunderstanding about what numbers can and can't be represented precisely: most computers use binary floating point. This means a fraction in which the denominator is a sum of powers of two can be represented exactly. If (and only if) the denominator contains a prime factor that cannot be represented as a sum of powers of 2 is it impossible to represent that number precisely.
Of course, if you get too small (or too large) it can be impossible to represent the number at all (e.g., normal IEEE floating point doesn't provide a way to represent a number like 1e+10000 or 1e-2000). Also, when you approach the limits of representation, you give up some precision (e.g., the limit for a normal double is 1e-308 -- and at 1e-300, you only get ~7 digits of precision instead of the usual ~15).
Within certainly limits, however, integers can be represented precisely (depends on the size of the significand -- usually around 253), and so can fractions where the denominator is a sum or powers of 2 (e.g., 3.5, 1.375).
There are also computers that use hexadecimal floating point and/or decimal floating point. For what we're considering here, hexadecimal floating point is essentially the same as binary floating point (i.e., since 16 is a power of 2, it can still only represent fractions precisely if the denominator is a sum of powers of 2). Decimal floating point allows precise representation of numbers where the denominator includes powers of 5 as well (e.g., 1.2 can be represented precisely in decimal floating point but not in binary or hexadecimal floating point). This has the obvious advantage that (again, within the limits of its range and precision) anything you enter as a decimal number will be represented precisely.

Because in general doubles are not going to be exactly equal to any integer you need to be careful when testing them for equality.
Read this:
http://docs.sun.com/source/806-3568/ncg_goldberg.html
In essence with a float you want to test that the float is close to some number within some tolerance.

Edit
Apologies for the previous incomplete and incorrect explanation. This is what I meant.
#include<iostream>
using namespace std;
int main(){
double a=3/5;
double b=2/5;
double c=(a+b)-1;
if(c==0)
cout<<"C is 0"<<endl;
else if(c==0.0)
cout<<"C is 0.0"<<endl;
else
cout<<"C!=0 && C!=0.0"<<endl; //This will print most of the time
}
--Edit Ends--
You may have to read the test cases as to how a_double gets its values. The greatest problem as #Jerry told is that it's difficult to test == with double. Note the following will not work as expected..
#include<iostream>
using namespace std;
int main(){
double a=5.5;
double b=6.5;
double c=(a+b)/12;
if(c==0)
cout<<c<<endl;
else
cout<<"c is not equal to 0"<<endl;
}

Related

Why does it seem like my floating point value is not equal to itself? [duplicate]

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 4 years ago.
#include<iostream.h>
using namespace std;
int main()
{
float x=1.1;
if(x==1.1)
cout<<"yes";
else
cout<<"no";
return 0;
}
I assign value 1.1 to x and checked value of x is 1.1 or not?
You've wandered into an interesting area of almost all programming languages. Floating point values are tricky things, and testing them for equality is very rarely recommended. The basic problem is that floating point values on modern computers are represented as binary decimals with a finite number of digits of precision.
To make this simpler to understand, lets work with base 10 decimals and use a number that can't be accurately represented using them. Take 1/3. If you are representing it as a base 10 decimal you get this:
0.̅3 (there is a bar over the three if it isn't showing up properly). Basically, it goes on forever, there is no finite number of digits that can represent 1/3 as a base ten decimal with perfect accuracy. So, if you only have so many digits, you chop it off and approximate:
0.333333
That's actually 333333/1000000, which is really close to 1/3, but not quite.
C++ has a few different floating point types. And these types usually (it depends on the platform the program is being compiled for) have different numbers of significant digits. By default, a floating point constant is of type double which usually has more digits than a float (and it never has less). Again, using base 10 as an example, since you were storing your value in a float you were doing something like this:
0.333333 == 0.3333333333333333333
which of course is false.
If you wrote your code this way:
#include <iostream>
using namespace std;
int main()
{
float x = 1.1f;
if(x == 1.1f)
cout<<"yes";
else
cout<<"no";
return 0;
}
you would likely get the expected result. Putting an f at the end of a bare floating point value (aka, a floating point literal) tells C++ that it's of type float.
This is all very fascinating of course, and there's a lot to get into. If you would like to learn a lot more about how floating point numbers are really represented, there is a nice Wikipedia page on IEEE 754 floating point representation, which is how most modern processors represent floating point numbers nowadays.
From a practical standpoint, you should rarely (if ever) compare floating point numbers for equality. Usually, a desire to do so indicates some sort of design flaw in your program. And if you really must than use an 'epsilon' comparison. Basically, test to see if your number is 'close enough', though determining what that means in any given situation isn't necessarily a trivial task, which is why it usually represents a design flaw if you need to compare them for equality at all. But, in your case, it could look like this:
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
float x=1.1;
if (fabs(x - 1.1) < 0.000001)
cout<<"yes";
else
cout<<"no";
return 0;
}
The reason the compare fails is that you're comparing a double value to a float variable.
Some compilers will issue a warning when you assign a double value to a float variable.
To get the desired output, you could try this:
double x = 1.1;
if (x == 1.1)
or this:
float x = 1.1f;
if (x == 1.1f)

Very large differences using float and double

#include <iostream>
using namespace std;
int main() {
int steps=1000000000;
float s = 0;
for (int i=1;i<(steps+1);i++){
s += (i/2.0) ;
}
cout << s << endl;
}
Declaring s as float: 9.0072e+15
Declaring s as double: 2.5e+17 (same result as implementing it in Julia)
I understand double has double precision than float, but float should still handle numbers up to 10^38.
I did read similar topics where results where not the same, but in that cases the differences were very small, here the difference is 25x.
I also add that using long double instead gives me the same result as double. If the matter is the precision, I would have expected to have something a bit different.
The problem is the lack of precision: https://en.wikipedia.org/wiki/Floating_point
After 100 million numbers you are adding 1e8 to 1e16 (or at least numbers of that magnitude), but single precision numbers are only accurate to 7 digits - so it is the same as adding 0 to 1e16; that's why your result is considerably lower for float.
Prefer double over float in most cases.
Problem with floating point precision! Infinite real numbers cannot possibly be represented by the finite memory of a computer. Float, in general, are just approximations of the number they are meant to represent.
For more details, please check the following documentation:
https://softwareengineering.stackexchange.com/questions/101163/what-causes-floating-point-rounding-errors
You didn't mention what type of floating point numbers you are using, but I'm going to assume that you use IEEE 754, or similar.
I understand double has double precision
To be more precise with the terminology, double uses twice as many bits. That's not double the number of reprensentable values, it's 4294967296 times as many representable values, despite being named "double precision".
but float should still handle numbers up to 10^38.
Float can handle a few numbers up to that magnitude. But that does't mean that float values in that range are precise. For example, 3,4028235E+38 can be represented as a single precision float. How much would you imagine is the difference between the previous value representable by float? Is it the machine epsilon? Perhaps 0.1? Maybe 1? No. The difference is about 2E+31.
Now, your numbers aren't quite in that range. But, they're outside the continuous range of whole integers that can be precisely represented by float. The highest value in that range happens to be 16777217, or about 1.7E+7, which is way less than 2.5E+17. So, every addition beyond that range adds some error to the result. You perform a billion calculations so those errors add up.
Conclusions:
Understand that single precision is way less precise than double precision.
Avoid long sequences of calculations where precision errors can accumulate.

How to shift a floating-point value to the nearest one that can be represented exactly in a specific number of decimal places?

Is there an algorithm in C++ that will allow me to, given a floating-point value V of type T (e.g. double or float), returns the closest value to V in a given direction (up or down) that can be represented exactly in less than or equal to a specified number of decimal places D ?
For example, given
T = double
V = 670000.08267799998
D = 6
For direction = towards +inf I would like the result to be 670000.082678, and for direction = towards -inf I would like the result to be 670000.082677
This is somewhat similar to std::nexttoward(), but with the restriction that the 'next' value needs to be exactly representable using at most D decimal places.
I've considered a naive solution involving separating out the fractional portion and scaling it by 10^D, truncating it, and scaling it again by 10^-D and tacking it back onto the whole number portion, but I don't believe that guarantees that the resulting value will be exactly representable in the underlying type.
I'm hopeful that there's a way to do this properly, but so far I've been unable to find one.
Edit: I think my original explanation didn't properly convey my requirements. At the suggestion of #patricia-shanahan I'll try to describing my higher-level goal and then reformulate the problem a little differently in that context.
At the highest level, the reason I need this routine is due to some business logic wherein I must take in a double value K and a percentage P, split it into two double components V1 and V2 where V1 ~= P percent of K and V1 + V2 ~= K. The catch is that V1 is used in further calculations before being sent to a 3rd party over a wire protocol that accepts floating-point values in string format with a max of D decimal places. Because the value sent to the 3rd party (in string format) needs to be reconcilable with the results of the calculations made using V1 (in double format) , I need to "adjust" V1 using some function F() so that it is as close as possible to being P percent of K while still being exactly representable in string format using at most D decimal places. V2 has none of the restrictions of V1, and can be calculated as V2 = K - F(V1) (it is understood and acceptable that this may result in V2 such that V1 + V2 is very close to but not exactly equal to K).
At the lower level, I'm looking to write that routine to 'adjust' V1 as something with the following signature:
double F(double V, unsigned int D, bool roundUpIfTrueElseDown);
where the output is computed by taking V and (if necessary, and in the direction specified by the bool param) rounding it to the Dth decimal place.
My expectation would be that when V is serialized out as follows
const auto maxD = std::numeric_limits<double>::digits10;
assert(D <= maxD); // D will be less than maxD... e.g. typically 1-6, definitely <= 13
std::cout << std::fixed
<< std::setprecision(maxD)
<< F(V, D, true);
then the output contains only zeros beyond the Dth decimal place.
It's important to note that, for performance reasons, I am looking for an implementation of F() that does not involve conversion back and forth between double and string format. Though the output may eventually be converted to a string format, in many cases the logic will early-out before this is necessary and I would like to avoid the overhead in that case.
This is a sketch of a program that does what is requested. It is presented mainly to find out whether that is really what is wanted. I wrote it in Java, because that language has some guarantees about floating point arithmetic on which I wanted to depend. I only use BigDecimal to get exact display of doubles, to show that the answers are exactly representable with no more than D digits after the decimal point.
Specifically, I depended on double behaving according to IEEE 754 64-bit binary arithmetic. That is likely, but not guaranteed by the standard, for C++. I also depended on Math.pow being exact for simple exact cases, on exactness of division by a power of two, and on being able to get exact output using BigDecimal.
I have not handled edge cases. The big missing piece is dealing with large magnitude numbers with large D. I am assuming that the bracketing binary fractions are exactly representable as doubles. If they have more than 53 significant bits that will not be the case. It also needs code to deal with infinities and NaNs. The assumption of exactness of division by a power of two is incorrect for subnormal numbers. If you need your code to handle them, you will have to put in corrections.
It is based on the concept that a number that is both exactly representable as a decimal with no more than D digits after the decimal point and is exactly representable as a binary fraction must be representable as a fraction with denominator 2 raised to the D power. If it needs a higher power of 2 in the denominator, it will need more than D digits after the decimal point in its decimal form. If it cannot be represented at all as a fraction with a power-of-two denominator, it cannot be represented exactly as a double.
Although I ran some other cases for illustration, the key output is:
670000.082678 to 6 digits Up: 670000.09375 Down: 670000.078125
Here is the program:
import java.math.BigDecimal;
public class Test {
public static void main(String args[]) {
testIt(2, 0.000001);
testIt(10, 0.000001);
testIt(6, 670000.08267799998);
}
private static void testIt(int d, double in) {
System.out.print(in + " to " + d + " digits");
System.out.print(" Up: " + new BigDecimal(roundUpExact(d, in)).toString());
System.out.println(" Down: "
+ new BigDecimal(roundDownExact(d, in)).toString());
}
public static double roundUpExact(int d, double in) {
double factor = Math.pow(2, d);
double roundee = factor * in;
roundee = Math.ceil(roundee);
return roundee / factor;
}
public static double roundDownExact(int d, double in) {
double factor = Math.pow(2, d);
double roundee = factor * in;
roundee = Math.floor(roundee);
return roundee / factor;
}
}
In general, decimal fractions are not precisely representable as binary fractions. There are some exceptions, like 0.5 (½) and 16.375 (16⅜), because all binary fractions are precisely representable as decimal fractions. (That's because 2 is a factor of 10, but 10 is not a factor of 2, or any power of two.) But if a number is not a multiple of some power of 2, its binary representation will be an infinitely-long cyclic sequence, like the representation of ⅓ in decimal (.333....).
The standard C library provides the macro DBL_DIG (normally 15); any decimal number with that many decimal digits of precision can be converted to a double (for example, with scanf) and then converted back to a decimal representation (for example, with printf). To go in the opposite direction without losing information -- start with a double, convert it to decimal and then convert it back -- you need 17 decimal digits (DBL_DECIMAL_DIG). (The values I quote are based on IEEE-754 64-bit doubles).
One way to provide something close to the question would be to consider a decimal number with no more than DBL_DIG digits of precision to be an "exact-but-not-really-exact" representation of a floating point number if that floating point number is the floating point number which comes closest to the value of the decimal number. One way to find that floating point number would be to use scanf or strtod to convert the decimal number to a floating point number, and then try the floating point numbers in the vicinity (using nextafter to explore) to find which ones convert to the same representation with DBL_DIG digits of precision.
If you trust the standard library implementation to not be too far off, you could convert your double to a decimal number using sprintf, increment the decimal string at the desired digit position (which is just a string operation), and then convert it back to a double with strtod.
Total re-write.
Based on OP's new requirement and using power-of-2 as suggested by #Patricia Shanahan, simple C solution:
double roundedV = ldexp(round(ldexp(V, D)),-D); // for nearest
double roundedV = ldexp(ceil (ldexp(V, D)),-D); // at or just greater
double roundedV = ldexp(floor(ldexp(V, D)),-D); // at or just less
The only thing added here beyond #Patricia Shanahan fine solution is C code to match OP's tag.
In C++ integers must be represented in binary, but floating point types can have a decimal representation.
If FLT_RADIX from <limits.h> is 10, or some multiple of 10, then your goal of exact representation of a decimal values is attainable.
Otherwise, in general, it's not attainable.
So, as a first step, try to find a C++ implementation where FLT_RADIX is 10.
I wouldn't worry about algorithm or efficiency thereof until the C++ implementation is installed and proved to be working on your system. But as a hint, your goal seems to be suspiciously similar to the operation known as “rounding”. I think, after obtaining my decimal floating point C++ implementation, I’d start by investigating techniques for rounding, e.g., googling that, maybe Wikipedia, …

C++ determining if a number is an integer

I have a program in C++ where I divide two numbers, and I need to know if the answer is an integer or not. What I am using is:
if(fmod(answer,1) == 0)
I also tried this:
if(floor(answer)==answer)
The problem is that answer usually is a 5 digit number, but with many decimals. For example, answer can be: 58696.000000000000000025658 and the program considers that an integer.
Is there any way I can make this work?
I am dividing double a/double b= double answer
(sometimes there are more than 30 decimals)
Thanks!
EDIT:
a and b are numbers in the thousands (about 100,000) which are then raised to powers of 2 and 3, added together and divided (according to a complicated formula). So I am plugging in various a and b values and looking at the answer. I will only keep the a and b values that make the answer an integer. An example of what I got for one of the answers was: 218624 which my program above considered to be an integer, but it really was: 218624.00000000000000000056982 So I need a code that can distinguish integers with more than 20-30 decimals.
You can use std::modf in cmath.h:
double integral;
if(std::modf(answer, &integral) == 0.0)
The integral part of answer is stored in fraction and the return value of std::modf is the fractional part of answer with the same sign as answer.
The usual solution is to check if the number is within a very short distance of an integer, like this:
bool isInteger(double a){
double b=round(a),epsilon=1e-9; //some small range of error
return (a<=b+epsilon && a>=b-epsilon);
}
This is needed because floating point numbers have limited precision, and numbers that indeed are integers may not be represented perfectly. For example, the following would fail if we do a direct comparison:
double d=sqrt(2); //square root of 2
double answer=2.0/(d*d); //2 divided by 2
Here, answer actually holds the value 0.99999..., so we cannot compare that to an integer, and we cannot check if the fractional part is close to 0.
In general, since the floating point representation of a number can be either a bit smaller or a bit bigger than the actual number, it is not good to check if the fractional part is close to 0. It may be a number like 0.99999999 or 0.000001 (or even their negatives), these are all possible results of a precision loss. That's also why I'm checking both sides (+epsilon and -epsilon). You should adjust that epsilon variable to fit your needs.
Also, keep in mind that the precision of a double is close to 15 digits. You may also use a long double, which may give you some extra digits of precision (or not, it is up to the compiler), but even that only gets you around 18 digits. If you need more precision than that, you will need to use an external library, like GMP.
Floating point numbers are stored in memory using a very different bit format than integers. Because of this, comparing them for equality is not likely to work effectively. Instead, you need to test if the difference is smaller than some epsilon:
const double EPSILON = 0.00000000000000000001; // adjust for whatever precision is useful for you
double remainder = std::fmod(numer, denom);
if(std::fabs(0.0 - remainder) < EPSILON)
{
//...
}
Alternatively, if you want to include values that are close to integers (based on your desired precision), you can modify the if condition slightly (since the remainder returned by std::fmod will be in the range [0, 1)):
if (std::fabs(std::round(d) - d) < EPSILON)
{
// ...
}
You can see the test for this here.
Floating point numbers are generally somewhat precise to about 12-15 digits (as a double), but as they are stored as a mantissa (fraction) and a exponent, rational numbers (integers or common fractions) are not likely to be stored as such. For example,
double d = 2.0; // d might actually be 1.99999999999999995
Because of this, you need to compare the difference of what you expect to some very small number that encompasses the precision you desire (we will call this value, epsilon):
double d = 2.0;
bool test = std::fabs(2 - d) < epsilon; // will return true
So when you are trying to compare the remainder from std::fmod, you need to check it against the difference from 0.0 (not for actual equality to 0.0), which is what is done above.
Also, the std::fabs call prevents you from having to do 2 checks by asserting that the value will always be positive.
If you desire a precision that is greater than 15-18 decimal places, you cannot use double or long double; you will need to use a high precision floating point library.

Exact decimal datatype for C++?

PHP has a decimal type, which doesn't have the "inaccuracy" of floats and doubles, so that 2.5 + 2.5 = 5 and not 4.999999999978325 or something like that.
So I wonder if there is such a data type implementation for C or C++?
The Boost.Multiprecision library has a decimal based floating point template class called cpp_dec_float, for which you can specify any precision you want.
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_dec_float.hpp>
int main()
{
namespace mp = boost::multiprecision;
// here I'm using a predefined type that stores 100 digits,
// but you can create custom types very easily with any level
// of precision you want.
typedef mp::cpp_dec_float_100 decimal;
decimal tiny("0.0000000000000000000000000000000000000000000001");
decimal huge("100000000000000000000000000000000000000000000000");
decimal a = tiny;
while (a != huge)
{
std::cout.precision(100);
std::cout << std::fixed << a << '\n';
a *= 10;
}
}
Yes:
There are arbitrary precision libraries for C++.
A good example is The GNU Multiple Precision arithmetic library.
If you are looking for data type supporting money / currency then try this:
https://github.com/vpiotr/decimal_for_cpp
(it's header-only solution)
There will be always some precision. On any computer in any number representation there will be always numbers which can be represented accurately, and other numbers which can't.
Computers use a base 2 system. Numbers such as 0.5 (2^-1), 0.125 (2^-3), 0.325 (2^-2 + 2^-3) will be represented accurately (0.1, 0.001, 0.011 for the above cases).
In a base 3 system those numbers cannot be represented accurately (half would be 0.111111...), but other numbers can be accurate (e.g. 2/3 would be 0.2)
Even in human base 10 system there are numbers which can't be represented accurately, for example 1/3.
You can use rational number representation and all the above will be accurate (1/2, 1/3, 3/8 etc.) but there will be always some irrational numbers too. You are also practically limited by the sizes of the integers of this representation.
For every non-representable number you can extend the representation to include it explicitly. (e.g. compare rational numbers and a representation a/b + c/d*sqrt(2)), but there will be always more numbers which still cannot be represented accurately. There is a mathematical proof that says so.
So - let me ask you this: what exactly do you need? Maybe precise computation on decimal-based numbers, e.g. in some monetary calculation?
What you're asking is anti-physics.
What phyton (and C++ as well) do is cut off the inaccuracy by rounding the result at the time to print it out, by reducing the number of significant digits:
double x = 2.5;
x += 2.5;
std::cout << x << std::endl;
just makes x to be printed with 6 decimal digit precision (while x itself has more than 12), and will be rounded as 5, cutting away the imprecision.
Alternatives are not using floating point at all, and implement data types that do just integer "scaled" arithmetic: 25/10 + 25/10 = 50/10;
Note, however, that this will reduce the upper limit represented by each integer type. The gain in precision (and exactness) will result in a faster reach to overflow.
Rational arithmetic is also possible (each number is represented by a "numarator" and a "denominator"), with no precision loss against divisions, (that -in fact- are not done unless exact) but again, with increasing values as the number of operation grows (the less "rational" is the number, the bigger are the numerator and denominator) with greater risk of overflow.
In other word the fact a finite number of bits is used (no matter how organized) will always result in a loss you have to pay on the side of small on on the side of big numbers.
I presume you are talking about the Binary Calculator in PHP. No, there isn't one in the C runtime or STL. But you can write your own if you are so inclined.
Here is a C++ version of BCMath compiled using Facebook's HipHop for PHP:
http://fossies.org/dox/facebook-hiphop-php-cf9b612/dir_2abbe3fda61b755422f6c6bae0a5444a.html
Being a higher level language PHP just cuts off what you call "inaccuracy" but it's certainly there. In C/C++ you can achieve similar effect by casting the result to integer type.