double value subtraction without precision [duplicate] - c++

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 7 years ago.
int base = 12;
double number = 12.2112;
double c = number - base;
//// c=0.211199999999999983
This is C++ code.
How can I get the result c = 0.2112?

0.2112 cannot be represented exactly in binary floating point: https://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
If it's extremely important to you that 0.2112 be represented exactly, then the usual solution looks a bit like this:
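#include <iostream>

int main() {
    // Values are scaled by 10^4, so 12.0000 and 12.2112 are exact integers.
    int base = 120000;      // 12.0000
    int number = 122112;    // 12.2112
    int c = number - base;  // 2112, exact
    std::cout << "Value of c: " << (c / 10000.0) << std::endl;
}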
Note, however, that what we're doing here is implementing fixed-point numbers. If you need fixed-point numbers in your implementation, it may be worthwhile to research and implement a full fixed-point class that does everything you're trying to accomplish here.
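A fixed-point class along those lines could start out like the sketch below. It is a minimal illustration, not a full implementation: the class name Fixed4, its factory functions, and the choice of exactly 4 decimal places are assumptions made for this example, and overflow and multiplication are not handled.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical minimal fixed-point type with exactly 4 decimal places.
// The value is stored as an integer number of 1/10000 units, so addition
// and subtraction are exact (overflow and multiplication are not handled).
class Fixed4 {
public:
    static constexpr std::int64_t kScale = 10000;

    // fromRaw(122112) represents 12.2112; fromWhole(12) represents 12.0000.
    static Fixed4 fromRaw(std::int64_t raw) { return Fixed4(raw); }
    static Fixed4 fromWhole(std::int64_t whole) { return Fixed4(whole * kScale); }

    Fixed4 operator+(Fixed4 other) const { return Fixed4(raw_ + other.raw_); }
    Fixed4 operator-(Fixed4 other) const { return Fixed4(raw_ - other.raw_); }
    bool operator==(Fixed4 other) const { return raw_ == other.raw_; }

    std::int64_t raw() const { return raw_; }

    // Formats the exact decimal value, always with 4 fractional digits.
    std::string toString() const {
        std::int64_t magnitude = raw_ < 0 ? -raw_ : raw_;
        std::ostringstream oss;
        if (raw_ < 0) oss << '-';
        oss << magnitude / kScale << '.'
            << std::setfill('0') << std::setw(4) << magnitude % kScale;
        return oss.str();
    }

private:
    explicit Fixed4(std::int64_t raw) : raw_(raw) {}
    std::int64_t raw_;
};
```

For example, (Fixed4::fromRaw(122112) - Fixed4::fromWhole(12)).toString() yields "0.2112" without any binary rounding, because both the subtraction and the formatting work on integers only.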

Related

Why does the system assign approximate value to the floating point numbers [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 2 years ago.
While studying float data types, I wrote this program in C++:
float a = 0.3, b = 0.4, c = 0.7;
cout << "a+b= " << (a + b) << endl;
if ((a + b) == c)
{
cout << "Success..." << endl;
}
else
{
cout << "Failure..." << endl;
}
The output I received was:
a+b= 0.7
Failure...
I am using Visual Studio Code as my IDE. When I hover the cursor over a declared variable, it shows the approximate value assigned to that variable, and I noticed that a was 0.2999999999999999889, b was 0.4000000000000000222, and c was 0.6999999999999999556.
So my question is: is there a method or data type that can store the true value of a floating-point number?
And why does the system store these values in approximate form rather than as the true value?
Why does the system assign approximate value to the floating point numbers
In short: Because there are infinitely many fractional numbers while the computer doesn't have infinite memory.
The compromise is to represent only some of the fractional numbers. And since computer hardware usually uses a binary base, it just so happens that many decimal fractions, such as 1/10, are not among those that can be represented.
is there a method/a data type which can store the true value of a floating point.
The values that you see are the true values stored in the floating-point variables.
If you mean whether there is a data type that can represent 1/10 and the other mentioned fractions exactly, then no, there is no such built-in type in C++.
But such numbers can be represented with more complex structures. A very trivial example is to use int numerator = 1, denominator = 10. A class type can closely emulate the operations of a built-in arithmetic type through operator overloads. This particular naïve representation is inefficient, because many values have duplicate representations (1/10 and 2/20, for example).
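The duplicate-representation problem can be avoided by always reducing the fraction to lowest terms. The sketch below is a hypothetical minimal Rational type, not a library class; it assumes C++17 for std::gcd and ignores integer overflow. With normalization in place, 3/10 + 4/10 compares equal to 7/10 exactly, which is the comparison that fails with float in the question.

```cpp
#include <cstdint>
#include <numeric>    // std::gcd (C++17)
#include <stdexcept>

// Hypothetical minimal rational type: the fraction is kept in lowest terms
// with a positive denominator, so every value has exactly one representation
// and == can compare the fields directly. Overflow is not handled.
struct Rational {
    std::int64_t num;
    std::int64_t den;

    Rational(std::int64_t n, std::int64_t d = 1) : num(n), den(d) {
        if (den == 0) throw std::invalid_argument("zero denominator");
        if (den < 0) { num = -num; den = -den; }   // keep the sign in num
        std::int64_t g = std::gcd(num, den);        // gcd of absolute values
        num /= g;
        den /= g;
    }
};

inline Rational operator+(Rational a, Rational b) {
    return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
}

inline Rational operator-(Rational a, Rational b) {
    return Rational(a.num * b.den - b.num * a.den, a.den * b.den);
}

inline Rational operator*(Rational a, Rational b) {
    return Rational(a.num * b.num, a.den * b.den);
}

inline bool operator==(Rational a, Rational b) {
    return a.num == b.num && a.den == b.den;
}
```

Since den is never zero, std::gcd(num, den) is never zero either, so the division in the constructor is safe; a zero numerator normalizes to 0/1.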

double math and boolean check gives erroneous answer [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 3 years ago.
Take this c++ code:
double d = 0.3028 + 0.0028;
cout << d << endl;
if (d == 0.3056)
cout << "match" << endl;
else
cout << "not a match" << endl;
Why is the output "not a match"?
That is because of how floats are stored in memory. Here is a good article on this: https://dev.to/visheshpatel/how-floating-point-no-is-stored-memory-47od
Instead, floats (and doubles) should be checked for being "almost equal". In your case, if you are interested only in 4 decimal places, you can check whether the difference is lower than 0.00001. So:
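#include <cmath>  // std::fabs

const double epsilon = 0.00001;
double a = d;       // your value
double b = 0.3056;  // the value to which you are comparing
// std::fabs avoids accidentally calling the int overload of abs, which would truncate the difference.
bool equal_ab = std::fabs(a - b) < epsilon;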
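The same check can be wrapped in a small reusable helper. This is a sketch; the name almostEqualAbs is hypothetical, and the default tolerance of 1e-5 is an assumption that only suits values you care about to about 4 decimal places. For values of very different magnitudes, an absolute tolerance like this one may be too loose or too strict.

```cpp
#include <cmath>

// Hypothetical helper: absolute-tolerance comparison of two doubles.
// eps = 1e-5 is an assumed default for values that matter to 4 decimal places.
inline bool almostEqualAbs(double a, double b, double eps = 1e-5) {
    return std::fabs(a - b) < eps;
}
```

With it, the check from the question becomes almostEqualAbs(0.3028 + 0.0028, 0.3056), which is true even though the == comparison reports "not a match".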
This is the nature of finite precision math.
If you use, say, six digits of decimal precision, 1/3 will be represented as 0.333333 and if you do "3 * 1/3" you will get 0.999999, not 1.
Similarly, 2/3 will be 0.666667, so 2 * 1/3 will not give 2/3. 1/3+1/3 will give 0.666666, not 2/3.
Finite precision representations are funny this way and testing them for precise equality is generally a bad idea.

sum of double numbers in c++ [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I want to calculate the sum of three double numbers and I expect to get 1.
double a=0.0132;
double b=0.9581;
double c=0.0287;
cout << "sum= "<< a+b+c <<endl;
if (a+b+c != 1)
cout << "error" << endl;
The sum is equal to 1 but I still get the error! I also tried:
cout<< a+b+c-1
and it gives me -1.11022e-16
I could fix the problem by changing the code to
if (a+b+c-1 > 0.00001)
cout << "error" << endl;
and it works (no error). How can a negative number be greater than a positive number, and why don't the numbers add up to 1?
Maybe it is something basic about summation and underflow/overflow, but I would really appreciate your help.
Thanks
Rational numbers are infinitely precise. Computers are finite.
Precision loss is a well-known problem in computer programming.
The real question is, how can you remedy it?
Consider using an approximation function when comparing floats for equality.
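#include <algorithm>  // std::max
#include <cmath>      // std::abs overloads for floating-point types
#include <iostream>
#include <limits>     // std::numeric_limits
using namespace std;

// Relative comparison: the allowed difference scales with the larger magnitude.
// numeric_limits<T>::epsilon() is a very tight tolerance (about one rounding step),
// so results of longer computations may need a larger multiple of it.
template <typename T>
bool ApproximatelyEqual(const T dX, const T dY)
{
    return std::abs(dX - dY) <= std::max(std::abs(dX), std::abs(dY))
        * std::numeric_limits<T>::epsilon();
}

int main() {
    double a = 0.0132;
    double b = 0.9581;
    double c = 0.0287;
    // Evaluates to true and does not print "error".
    if (!ApproximatelyEqual(a + b + c, 1.0)) cout << "error" << endl;
}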
Floating-point numbers in C++ have a binary representation. This means that most numbers that can be exactly represented by a decimal fraction with only a few digits cannot be exactly represented by floating-point numbers. That's where your error comes from.
One example: 0.1 (decimal) is a periodic fraction in binary:
0.000110011001100110011001100...
Therefore it cannot be represented exactly with any finite number of bits in a binary encoding.
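The repeating pattern can be reproduced with exact integer long division in base 2. The sketch below uses a hypothetical helper, binaryExpansion, that doubles the remainder at each step and emits a 1 whenever it reaches the denominator; because only integers are involved, no floating-point rounding affects the output.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: expands num/den (assumes 0 <= num < den) into `bits`
// binary digits after the point, using exact integer long division.
inline std::string binaryExpansion(std::uint64_t num, std::uint64_t den, int bits) {
    std::string out = "0.";
    std::uint64_t remainder = num;
    for (int i = 0; i < bits; ++i) {
        remainder *= 2;              // shift one binary place to the left
        if (remainder >= den) {      // the next binary digit is 1
            out += '1';
            remainder -= den;
        } else {                     // the next binary digit is 0
            out += '0';
        }
    }
    return out;
}
```

binaryExpansion(1, 10, 20) gives "0.00011001100110011001": after the first few steps the remainder cycles through 6, 2, 4, 8, 6, ... and never reaches 0, which is why the pattern 0011 repeats forever. A fraction whose denominator is a power of two, such as binaryExpansion(1, 4, 4) giving "0.0100", terminates instead.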
In order to avoid this type of error, you can use BCD (binary coded decimal) numbers which are supported by some special libraries. The drawbacks are slower calculation speed (not directly supported by the CPU) and slightly higher memory usage.
Another option is to represent the number as a general fraction and store the numerator and denominator as separate integers.

Arithmetic to get correct decimal number instead of exponential number in c++ [duplicate]

This question already has answers here:
How To Represent 0.1 In Floating Point Arithmetic And Decimal
(2 answers)
Closed 8 years ago.
int digit = 1;
float result=0.0;
double temp = 200000;
float tick = 0.00100000005;
result = digit/1000000.0;
long long phase = temp*result*1000/tick*1000;
result will be equal to 9.99999997e-07. If I calculate it by hand, it should be 0.000001.
How can I make this number in exponential notation come out as exactly 0.000001?
Thanks.
If result = 9.99999997e-07, the calculated phase is 199999; however, if result = 0.000001, the calculated phase is 200000.
So my problem is result.
For finance and certain other uses, the easiest way is to work in multiples of your smallest unit... in your example, it might be "microns":
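// Store every amount as an integer number of millionths ("microns").
inline long long units_to_microns(long long units) { return units * 1000000; }
long long digit = units_to_microns(1);  // 1 unit = 1000000 microns
long long result = digit / 1000000;     // 1 micron (0.000001 units), stored exactly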
Then write some custom code to print numbers with a decimal point where you want it:
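#include <iomanip>
#include <sstream>
#include <string>

// Formats a value stored in microns (millionths) as a decimal string.
std::string microns_to_string(long long microns)
{
    std::ostringstream oss;
    if (microns < 0) { oss << '-'; microns = -microns; }  // print the sign once
    oss << microns / 1000000;                             // whole part
    if (microns % 1000000)                                // fractional part, zero-padded to 6 digits
        oss << '.' << std::setfill('0') << std::setw(6) << microns % 1000000;
    return oss.str();
}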
A more structured (and reliable) way to do this is offered by the boost Units library. That way, you can specify the units of specific variables, and if e.g. one was in metres and another kilometres, you could add them without any special care.
If you're dealing with irrational numbers, and rounding them to a specific precision early on is not useful, then you're best off either using double (for more significant digits of precision than float) or an arbitrary-precision library like GMP, the GNU Multiple Precision Arithmetic Library.
By the way, "What Every Computer Scientist Should Know About Floating-Point Arithmetic" is commonly recommended reading in this space.
You can't, because the number 1/1000000.0 cannot be represented exactly in binary. You can improve the accuracy by using a double. This type of question is pretty common here. I've found this link to be helpful:
https://docs.python.org/2/tutorial/floatingpoint.html
(it's for Python, but the issues are the same).

Losing Double Precision when multiplying by multiple of 10 [duplicate]

This question already has answers here:
Precision loss with double C++
(4 answers)
Closed 9 years ago.
So I have the following code
int main(){
double d;
cin>>d;
while(d!=0.00)
{
cout<<d<<endl;
double m = 100*d;
int n = m;
cout<<n<<endl;
cin>>d;
}
return 0;}
When I enter the input 20.40 for d the value of n comes out to be 2039 instead of 2040.
I tried replacing int n = m with int n = (int) m but the result was the same.
Is there any way to fix this? Thanks in advance.
Your code truncates m but you need rounding. Include cmath and use int n = round(m).
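A sketch of that fix as a small function, using std::lround, which rounds to the nearest integer (halfway cases away from zero) and returns a long directly. The name toHundredths is hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: converts an amount such as 20.40 to an integer count
// of hundredths by rounding to nearest instead of truncating toward zero.
inline long toHundredths(double value) {
    return std::lround(value * 100.0);
}
```

toHundredths(20.40) returns 2040, whereas int n = m in the question yields 2039 because 100 * 20.40 is stored slightly below 2040.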
In general, decimal values cannot be represented exactly by binary floating-point types like double. Thus, the value 20.40 is stored as an approximation that is close enough to restore the original value when it is formatted (as 20.4; the trailing zero of the input is not retained). Doing computations with these approximated values will typically amplify the error.
As already mentioned in one of the comments, the relevant reference is the paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic". One potential way out of your trouble is to use decimal floating-point types, which are, however, not yet part of the C++ standard.
Single- and double-precision floating-point numbers are not stored the same way as integers, so a result that should be a whole number (e.g. 100 * 20.40) may actually be stored as a value slightly below or above it (e.g. 2039.99999...). When you cast to an int, the conversion simply truncates toward zero, discarding the fractional part. So if the number is currently represented as 4.999999999, casting it to an int will give you 4. std::round will give you a better result most of the time (though if the number is 4.6 and you just want the whole-number portion, round will not work well). The bigger question, then, is: what are you hoping to accomplish by casting a double to an int?
In general, when dealing with floating-point numbers, you will want to use an epsilon value that represents the smallest difference you consider significant. So if you wanted to compare 4.9999999 to 5, you would do (pseudo-code): if abs(5 - 4.9999999) < epsilon, return 5.
Example
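#include <cfloat>   // DBL_EPSILON
#include <cmath>    // std::fabs
#include <iostream>

int main()
{
    double d;
    std::cin >> d;
    while (std::fabs(d - 0.0) > DBL_EPSILON)
    {
        std::cout << d << std::endl;
        double m = 100 * d;
        int n = static_cast<int>(m);  // truncates toward zero, e.g. 2039.99... -> 2039
        // If m lies just below the next integer, treat it as that integer.
        // DBL_EPSILON (about 2.2e-16) is far smaller than the error in 100 * 20.40
        // (on the order of 1e-13), so a larger tolerance is assumed here.
        // This sketch only handles non-negative inputs.
        const double tolerance = 1e-9;
        if (std::fabs((n + 1) - m) < tolerance)
        {
            n++;
        }
        std::cout << n << std::endl;
        std::cin >> d;
    }
    return 0;
}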
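Casting a double to an int truncates the value. Because double is not base 10, 20.40 is probably stored as 20.399999..., so multiplying by 100 gives 2039.99..., which truncates to 2039. You can use the round() function, which does not truncate but gives you the nearest integer:
int n = static_cast<int>(std::round(m));  // std::round is declared in <cmath>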
Floating-point numbers can't exactly represent all decimal numbers, so sometimes an approximation is used. In your example, the closest representable value is 20.39999999999999857891452847979962825775146484375. See IEEE-754 Analysis for a quick way to see exact values.
You can use rounding, but presumably you're really looking for the first two decimal digits, truncated. In that case, just add a really small value, e.g. 0.0000000001, before or after you multiply.
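A sketch of that nudge-then-truncate idea, adding the small value after the multiplication. The function name truncatedHundredths is hypothetical, and the nudge of 1e-10 is an assumption: it must be larger than the representation error of the scaled value but smaller than the smallest digit you care about. The sketch only handles non-negative inputs, since truncation of negative values goes toward zero.

```cpp
// Hypothetical helper: keeps the first two decimal digits, truncated,
// after nudging value * 100 upward past any tiny representation error.
inline long truncatedHundredths(double value) {
    const double nudge = 1e-10;  // assumed; tune to the precision of your inputs
    return static_cast<long>(value * 100.0 + nudge);
}
```

Unlike rounding, this keeps truncation semantics: truncatedHundredths(20.40) gives 2040, and truncatedHundredths(20.409) also gives 2040 rather than 2041.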