I just checked the following in Python 2.7:
print 0.1 + 0.2
output: 0.3
print 0.1 + 0.2 - 0.3
output: 5.55111512313e-17
But I expected 0.0. How can I achieve this?
The problem here is that the float type doesn't have enough precision to represent the result you want. If you inspect the partial sum 0.1 + 0.2 (for example with repr(0.1 + 0.2)), you'll see that the float result you actually get is 0.30000000000000004.
So 5.55111512313e-17 is the closest approximation the float type can produce for that residual. If you cast the result to int:
int(0.2 + 0.1 - 0.3)
you'll see 0, and that's the right integer approximation.
You can get an exact 0.0 by using the Decimal class from the decimal module, which performs decimal rather than binary arithmetic.
Try this:
from decimal import Decimal
Decimal("0.2") + Decimal("0.1") - Decimal("0.3")
And you'll see that the result is Decimal("0.0").
I have a problem converting a double (say N) to p/q (rational) form. My strategy is:
Multiply the double N by a large number, say $k = 10^{10}$
Then p = N*k and q = k
Take gcd(p, q) and reduce: p = p/gcd(p, q) and q = q/gcd(p, q)
When N = 8.2 the answer is correct if we solve it with pen and paper, but since 8.2 is stored in N (a double) as roughly 8.19999999..., the conversion to rational form goes wrong.
I also tried it another way (using a large number 10^k instead of 100):
if (abs(N*100 - round(N*100)) < 0.000001) N = round(N*100)/100
But this approach doesn't give the right representation every time either.
Is there any way I could carry out an exact conversion from double to p/q?
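For reference, a minimal C++ sketch of that strategy as stated (assuming a 64-bit long long and C++17's std::gcd; whether it happens to recover 41/5 for 8.2 depends on how N*k rounds before the truncating cast):

#include <cstdio>
#include <numeric>   // std::gcd (C++17)

int main() {
    double N = 8.2;                               // stored as ~8.19999999999999929
    const long long k = 10000000000LL;            // k = 10^10
    long long p = static_cast<long long>(N * k);  // truncates whatever N*k rounded to
    long long q = k;
    long long g = std::gcd(p, q);
    std::printf("%lld/%lld\n", p / g, q / g);     // 41/5 only if N*k rounded to exactly 82000000000
}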
Floating point arithmetic is very difficult. As has been mentioned in the comments, part of the difficulty is that you need to represent your numbers in binary.
For example, the number 0.125 can be represented exactly in binary:
0.125 = 2^-3 = 0b0.001
But the number 0.12 cannot.
To 11 significant figures:
0.12 = 0b0.00011110101
If this is converted back to a decimal then the error becomes obvious:
0b0.00011110101 = 0.11962890625
So if you write:
double a = 0.2;
What the machine actually does is find the closest binary representation of 0.2 that it can hold within a double data type. This is an approximation since as we saw above, 0.2 cannot be exactly represented in binary.
One possible approach is to define an 'epsilon' that determines how close two values must be before you consider them equal, instead of testing for exact equality.
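As a minimal sketch of that idea (the name nearly_equal and both tolerance values are illustrative choices, not a standard API), one common variant combines an absolute tolerance for values near zero with a relative one for large values:

#include <cmath>
#include <cstdio>

bool nearly_equal(double a, double b,
                  double abs_eps = 1e-12, double rel_eps = 1e-9)
{
    double diff = std::fabs(a - b);
    if (diff <= abs_eps) return true;  // handles a and b both near zero
    double largest = std::fmax(std::fabs(a), std::fabs(b));
    return diff <= largest * rel_eps;  // scale the tolerance with magnitude
}

int main() {
    std::printf("%d\n", nearly_equal(0.1 + 0.2, 0.3));  // prints 1
}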
Here is a good article on floating points:
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
I have a problem converting a double (say N) to p/q form
... when N = 8.2
A typical double cannot encode 8.2 exactly. Instead the closest representable double is about
8.19999999999999928945726423989981412887573...
8.20000000000000106581410364015027880668640... // next closest
When code does
double N = 8.2;
It will be the 8.19999999999999928945726423989981412887573... that is converted into rational form.
Converting a double to p/q form:
Multiply the double N by a large number, say $k = 10^{10}$
This may overflow the double. The first step should be to determine whether the double is large; in that case, it is already a whole number.
Do not multiply by a power of 10, as double almost certainly uses a binary encoding; multiplication by 10, 100, etc. may introduce round-off error.
C implementations of double overwhelmingly use a binary encoding, so that FLT_RADIX == 2.
Then every finite double x has a significand that is a fraction of some integer over some power of 2: a binary fraction of DBL_MANT_DIG digits (per @Richard Critten). This is often 53 binary digits.
Determine the exponent of the double. If large enough or x == 0.0, the double is a whole number.
Otherwise, scale a numerator and denominator by 2 raised to a power derived from DBL_MANT_DIG and the exponent. While the numerator is even, halve both the numerator and the denominator. As the denominator is a power of 2, no other prime factors need to be considered for simplification.
#include <float.h>
#include <math.h>
#include <stdio.h>

void form_ratio(double x) {
    double numerator = x;
    double denominator = 1.0;
    if (isfinite(numerator) && x != 0.0) {
        int expo;
        frexp(numerator, &expo);                 /* extract the binary exponent of x */
        if (expo < DBL_MANT_DIG) {               /* otherwise x is already a whole number */
            expo = DBL_MANT_DIG - expo;
            numerator = ldexp(numerator, expo);  /* scale x up to an integer */
            denominator = ldexp(1.0, expo);      /* matching power-of-2 denominator */
            /* reduce: the denominator is a power of 2, so halving suffices */
            while (fmod(numerator, 2.0) == 0.0 && denominator > 1.0) {
                numerator /= 2.0;
                denominator /= 2.0;
            }
        }
    }
    int pre = DBL_DECIMAL_DIG;
    printf("%.*g --> %.*g/%.*g\n", pre, x, pre, numerator, pre, denominator);
}

int main(void) {
    form_ratio(123456789012.0);
    form_ratio(42.0);
    form_ratio(1.0 / 7);
    form_ratio(867.5309);
}
Output
123456789012 --> 123456789012/1
42 --> 42/1
0.14285714285714285 --> 2573485501354569/18014398509481984
867.53089999999997 --> 3815441248019913/4398046511104
See the program below:
#include <stdio.h>
int main()
{
    float x = 0.1;
    if (x == 0.1)
        printf("IF");
    else if (x == 0.1f)
        printf("ELSE IF");
    else
        printf("ELSE");
}
And another program here:
#include <stdio.h>
int main()
{
    float x = 0.5;
    if (x == 0.5)
        printf("IF");
    else if (x == 0.5f)
        printf("ELSE IF");
    else
        printf("ELSE");
}
From both programs we expect similar results, because nothing has materially changed between them: everything is the same, and the comparison constants were changed correspondingly.
But the two programs above produce different results:
1st program:
ELSE
2nd program:
IF
Why do these two programs behave differently?
You are testing floats for exact equality, and that is the root of the problem; the observed behavior can even vary between compilers and platforms.
In memory, floats are stored as a string of bits in binary, e.g. 0.1 in binary (0.1b) represents 0.5 in decimal (0.5d).
Similarly,
Binary | Decimal
0.1 | 2^-1 = 1/2
0.01 | 2^-2 = 1/4
0.001 | 2^-3 = 1/8
0.11 | 2^-1 + 2^-2 = 3/4
The problem is that some decimals don't have nice floating point representations.
0.1d = 0.0001100110011001100110011...
which is infinitely long.
So, 0.5 is really nice in binary
0.5d = 0.1000000000000000...b
but 0.1 is really nasty
0.1d = 0.00011001100110011...
Now, the literal 0.1 in the comparison x == 0.1 has no suffix, so it is a double, which stores more of the infinite sequence 0.0001100110011001100110011001100110011...
so it is not equal to the float version, which truncates the sequence much earlier.
On the other hand, 0.5f is the same value regardless of how many binary places are stored, since it has all zeroes after the first place.
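To make the difference concrete, here is a minimal sketch (assuming IEEE 754 single and double precision) that prints the values actually stored for 0.1f and 0.1:

#include <cstdio>

int main() {
    float  f = 0.1f;  // nearest float to one tenth
    double d = 0.1;   // nearest double to one tenth
    // %.17g prints enough digits to expose the stored values
    std::printf("%.17g\n", (double)f);  // typically 0.10000000149011612
    std::printf("%.17g\n", d);          // typically 0.10000000000000001
}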
The accepted way to compare floats or doubles in C++ or C is to #define a very small number (I like to call it EPS, short for EPSILON) and replace
float a = 0.1f;
if (a == 0.1f) {
    printf("IF\n");
} else {
    printf("ELSE\n");
}
with
#include <math.h>
#define EPS 0.0000001f

float a = 0.1f;
if (fabsf(a - 0.1f) < EPS) {  /* "close enough" instead of exact equality */
    printf("IF\n");
} else {
    printf("ELSE\n");
}
Effectively, this tests if a is 'close enough' to 0.1f instead of exact equality. For 99% of applications, this approach works just fine, but for super-sensitive calculations some stranger tricks are needed that involve using long double, or defining a custom data type.
You are mixing two data types: in if (x == 0.1), the literal 0.1 is a double, while x is a float, and the two types store the value with different precision. 0.1 is not the same number as 0.1f: as a double it is stored as approximately 0.10000000000000000555, and as a float as approximately 0.10000000149011612.
My code:
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

double to_radians(double theta)
{
    return (M_PI * theta) / 180.0;
}

int main()
{
    std::vector<std::pair<double, double>> points;
    for (double theta = 0.0; theta <= 360.0; theta += 30.0)
    {
        points.push_back(std::make_pair(std::cos(to_radians(theta)), std::sin(to_radians(theta))));
    }
    for (auto point : points)
        std::cout << point.first << " " << point.second << "\n";
}
Output I expect
1 0
0.866025 0.5
0.5 0.866025
0 1
-0.5 0.866025
-0.866025 0.5
-1 0
-0.866025 -0.5
-0.5 -0.866025
0 -1
0.5 -0.866025
0.866025 -0.5
1 0
Output I get:
1 0
0.866025 0.5
0.5 0.866025
6.12303e-17 1
-0.5 0.866025
-0.866025 0.5
-1 1.22461e-16
-0.866025 -0.5
-0.5 -0.866025
-1.83691e-16 -1
0.5 -0.866025
0.866025 -0.5
1 -2.44921e-16
As you can see I am getting these strange values instead of zero. Can somebody explain why this is happening?
6.12303e-17, to take an example, represents the value 6.12303 × 10^-17, or 0.0000000000000000612303.
The reason you obtain this value as a result is that you did not apply cos to π/2, which is not representable as a double anyway (it's irrational). The cos function was applied to a double close to π/2, obtained by multiplying 90 by M_PI and dividing by 180. Since the argument is not π/2, the result does not have to be 0. In fact, since floating-point numbers are denser near zero, it is extremely unlikely for any floating-point format that applying a correctly rounded cos to any floating-point number produces exactly zero as a result.
In fact, since the derivative of cos at π/2 is -1, the value obtained for the expression cos(M_PI/2.0) is a close approximation of the difference between M_PI/2 and π/2. That difference is indeed of the order of 10^-17, since the double-precision IEEE 754 format can only represent the first 16 or so significant decimal digits of an arbitrary number.
Note that the same argument applies to obtaining 0.5 as the result of cos(M_PI/3.0), or even -1.0 as the result of cos(M_PI). The difference is that there are many floating-point numbers, some very small, around 0, and these can represent very precisely the intended non-zero result. In comparison, 0.5 and -1.0 have only a few neighbors, and for inputs close enough to π/3 and π, the numbers 0.5 and -1.0 end up being returned as the nearest representable double-precision value to the respective mathematical result (which isn't 1/2 or -1, since the input is not π/3 or π).
The simplest solution to your problem would be to use hypothetical functions cosdeg and sindeg that would compute directly the cosine and sine of angles in degrees. Since 60 and 90 are representable exactly as double-precision floating-point numbers, these functions would have no excuse not to return 0.5 or 0.0 (also exactly representable as double-precision floating-point numbers). I asked a question in relation to these functions earlier but no-one pointed to any already available implementation.
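As a rough illustration, here is a minimal sketch of such a cosdeg (the name and reduction strategy are illustrative, not a library API): it reduces the angle in degrees, where fmod is exact, special-cases the quadrant boundaries, and only then converts to radians:

#include <cmath>   // M_PI is POSIX; some compilers need _USE_MATH_DEFINES
#include <cstdio>

double cosdeg(double deg)
{
    double r = std::fmod(deg, 360.0);   // exact reduction per the fmod specification
    if (r < 0.0)
        r += 360.0;
    if (r == 0.0)   return 1.0;         // quadrant boundaries returned exactly
    if (r == 90.0)  return 0.0;
    if (r == 180.0) return -1.0;
    if (r == 270.0) return 0.0;
    return std::cos(r * M_PI / 180.0);
}

int main()
{
    std::printf("%g %g %g\n", cosdeg(90.0), cosdeg(270.0), cosdeg(450.0));  // 0 0 0
}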
The functions sinpi and cospi pointed out by njuffa are often available, and they make it possible to compute the sine and cosine of π/2, π/4, or even 7.5*π, but not of π/3, since the number 1/3 that they would have to be applied to is not representable exactly in binary floating-point.
It's a floating-point rounding error. Trig functions are implemented as mathematical series approximated at the computational level, which for inputs like these yields numbers very close to zero, for example 6.12303e-17, rather than the expected 0.
In the below example app I calculate the floating point remainder from dividing 953 by 0.1, using std::fmod
What I was expecting is that, since 953.0 / 0.1 == 9530, std::fmod(953, 0.1) would be 0.
I'm getting 0.1 - why is this the case?
Note that with std::remainder I get the correct result.
That is:
std::fmod (953, 0.1) == 0.1 // unexpected
std::remainder(953, 0.1) == 0 // expected
Difference between the two functions:
According to cppreference.com
std::fmod calculates the following:
exactly the value x - n*y, where n is x/y with its fractional part truncated
std::remainder calculates the following:
exactly the value x - n*y, where n is the integral value nearest the exact value x/y
Given my inputs I would expect both functions to have the same output. Why is this not the case?
Exemplar app:
#include <iostream>
#include <cmath>

bool is_zero(double in)
{
    return std::fabs(in) < 0.0000001;
}

int main()
{
    double numerator = 953;
    double denominator = 0.1;
    double quotient = numerator / denominator;
    double fmod = std::fmod(numerator, denominator);
    double rem = std::remainder(numerator, denominator);
    if (is_zero(fmod))
        fmod = 0;
    if (is_zero(rem))
        rem = 0;
    std::cout << "quotient: " << quotient << ", fmod: " << fmod << ", rem: " << rem << std::endl;
    return 0;
}
Output:
quotient: 9530, fmod: 0.1, rem: 0
Because they are different functions.
std::remainder(x, y) calculates the IEEE remainder, which is x - (round(x/y) * y), where round means rounding half to even (so in particular round(1.0/2.0) == 0).
std::fmod(x, y) calculates x - trunc(x/y) * y, using the exact value of x/y rather than the rounded double division. The exact quotient of 953 and the double nearest 0.1 (which is slightly greater than one tenth) is slightly smaller than 9530, so truncation gives 9529. As the result you get 953.0 - 952.9... ≈ 0.1.
Welcome to floating-point math. Here's what happens: one tenth cannot be represented exactly in binary, just as one third cannot be represented exactly in decimal. As a result, the exact quotient is slightly below 9530, so the truncation produces the integer 9529 instead of 9530, and this leaves roughly 0.1 left over.
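A minimal sketch that makes all three effects visible (assuming IEEE 754 doubles; the values in the comments are the typical ones):

#include <cmath>
#include <cstdio>

int main()
{
    double y = 0.1;  // stored as ~0.10000000000000000555
    // The double division rounds back to exactly 9530, hiding the issue:
    std::printf("%.17g\n", 953.0 / y);                 // 9530
    // fmod truncates the exact quotient (slightly below 9530) to 9529:
    std::printf("%.17g\n", std::fmod(953.0, y));       // ~0.099999999999947
    // remainder rounds the exact quotient to 9530, leaving a tiny negative residue:
    std::printf("%.17g\n", std::remainder(953.0, y));  // ~-5.29e-14
}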
Here is my problem: I have several parameters that I need to increment by 0.1.
But my UI only renders x.x, x.xx, x.xxx for floats, and since 0.1f is not really 0.1 but something like 0.10000000149011612, in the long run my UI will render -0.00, which doesn't make much sense. How can I prevent that for all the possible UI cases?
Thank you.
Use integers and divide by 10 (or 1000 etc...) just before displaying. Your parameters will store an integer number of tenths, and you'll increment them by 1 tenth.
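A minimal sketch of that idea (the variable names are illustrative):

#include <cstdio>

int main()
{
    int tenths = 0;                        // the parameter, stored as a count of tenths
    for (int i = 0; i < 10; ++i)
        tenths += 1;                       // "increment by 0.1", exactly
    std::printf("%.1f\n", tenths / 10.0);  // divide only for display: prints 1.0
}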
If you know that your floating point value will always be a multiple of 0.1, you can round it after every increment to make sure it maintains a sensible value. It still won't be exact (because it physically can't be), but at least the errors won't accumulate and it will display properly.
Instead of:
x += delta;
Do:
x = floor((x + delta) / precision + 0.5) * precision;
Edit: It's useful to turn the rounding into a stand-alone function and decouple it from the increment:
inline double round(double value, double precision = 1.0)
{
return floor(value / precision + 0.5) * precision;
}
x = round(x + 0.1, 0.1);
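A minimal usage sketch contrasting the two update rules (the helper is the answer's round renamed to round_to to avoid clashing with std::round; exact printed digits may vary by platform):

#include <cmath>
#include <cstdio>

inline double round_to(double value, double precision)
{
    return std::floor(value / precision + 0.5) * precision;
}

int main()
{
    double naive = 0.0, snapped = 0.0;
    for (int i = 0; i < 1000; ++i) {
        naive += 0.1;                            // error accumulates step by step
        snapped = round_to(snapped + 0.1, 0.1);  // error is snapped away each step
    }
    std::printf("naive:   %.17g\n", naive);    // drifts slightly away from 100
    std::printf("snapped: %.17g\n", snapped);  // stays at (the double nearest) 100
}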