How to avoid rounding off of large float or double values? - c++

I am working on a C++ problem that should output the number 22258199.5000000. I have stored the result in a double, but when I print the variable using std::cout it is rounded to 22258200.0000000. I don't know why this is happening; can anybody explain, and what can I do to avoid this problem?

A 32-bit float type holds approximately 7 decimal digits, and anything beyond that will be rounded to the nearest representable value.
A 64-bit double type holds approximately 15 decimal digits and also rounds values beyond that.
Since your value is rounding to the nearest 7-digit number, it appears you're using a float variable somewhere. Copying a float value to double doesn't restore the lost digits.
This doesn't apply to your example, but it might in others - the rounding occurs in binary, not decimal, so you might end up with a value that appears to contain many more digits than I indicated above. Just know that only the first 7 or 15 are going to be accurate in decimal.

Related

strtof() function misplacing decimal place

I have the string "1613894376.500012077" and I want to use strtof to convert it to the floating-point value 1613894376.500012077. The problem is that when I use strtof I get the following result, with the decimal point apparently misplaced: 1.61389e+09. Please help me determine how to use strtof properly.
A typical float is 32-bit and can only represent exactly about 2^32 different values. "1613894376.500012077" is not one of those.
"1.61389e+09" is the same value as "1613890000.0"; it is the stored float displayed rounded to 6 significant digits.
The 2 closest floats are:
1613894272.0
1613894400.0 // slightly closer to 1613894376.500012077
Print with more precision to see more digits.
The decimal point is not misplaced. The notation “1.61389e+09” means 1.61389·10^9, which is 1,613,890,000, which has the decimal point in the correct place.
The actual result of strtof in your computer is probably 1,613,894,400. This is the closest value to 1613894376.500012077 that the IEEE-754 binary32 (“single”) format can represent, and that is the format commonly used for float. When you print it with %g, the default is to use just six significant digits. To see it with more precision, print it with %.999g.
The number 1613894376.500012077 is equivalent (the same number, up to the precision of the machine) to 1.61389e+09. The e+09 suffix means that the decimal point is located nine digits to the right of where it is written (or that the number is multiplied by 10 to the ninth power). This is scientific notation, commonly used in computing.

Double value change during assignment

I know double has some precision issues and it can truncate values during conversion to integer.
In my case I am assigning 690000000000123455 to a double, and it gets changed to 690000000000123392 during the assignment.
Why is the number changed so drastically? After all, there is no fractional part assigned to it. It doesn't seem like a precision issue, since the value changes not by 1 but by 63.
Presumably you store 690000000000123455 as a 64 bit integer and assign this to a double.
double d = 690000000000123455;
The closest representable double precision value to 690000000000123455 can be checked here: http://pages.cs.wisc.edu/~rkennedy/exact-float?number=690000000000123455 and is seen to be 690000000000123392.
In other words, everything is as to be expected. Your number cannot be represented exactly as a double precision value and so the closest representable value is chosen.
For more discussion of floating point data types see: Is floating point math broken?
IEEE-754 double precision floats have about 53 bits of precision which equates to about 16 decimal digits (give or take). You'll notice that's about where your two numbers start to diverge.
double has a storage size of 8 bytes. Its values range from about 2.3E-308 to 1.7E+308, with a precision of up to about 15 decimal digits. But your number contains 18 digits. That's the reason.
You could use long double; on many platforms it has precision of up to about 18-19 decimal digits (though on some platforms it is no wider than double).
The other answers are already pretty complete, but I want to suggest a website I find very helpful in understanding how float point number works: IEEE 754 Converter (only 32-bit float here, but the interaction is still very good).
As we can see, 690000000000123455 is between 2^59 and 2^60, and the highest precision of the mantissa is 2^-52 for double precision, which means the precision step at this magnitude is 2^7 = 128. The error of 63 you observed is well within that range.
As a side suggestion, it is better to use a 64-bit integer type (long long or int64_t) for storing big integers like this, as it holds the value exactly and does not overflow in your case.

C++ Type of variables - value

I am a beginner, but I think there are some important things I should learn as soon as possible.
So I have this code:
float fl=8.28888888888888888888883E-5;
cout<<"The value = "<<fl<<endl;
But after running, my .exe shows:
8.2888887845911086e-005
I expected the digits beyond the limit of the type to be zero, but I see digits that look random. Maybe it prints digits from the memory after the variable?
Could you explain to me how this works?
I expected the digits beyond the limit of the type to be zero
Yes, this is exactly what happens, but it happens in binary. This program will show it by using the hexadecimal printing format %a:
#include <stdio.h>
int main(void) {
    float fl = 8.28888888888888888888883E-5;
    printf("%a\n%a\n", 8.28888888888888888888883E-5, fl);
    return 0;
}
It shows:
0x1.5ba94449649e2p-14
0x1.5ba944p-14
In these results, 0x1.5ba94449649e2p-14 is the hexadecimal representation of the double closest to 8.28888888888888888888883·10^-5, and 0x1.5ba944p-14
is the representation of that number converted to float. As you can see, the conversion simply truncated the last digits (in this case; the conversion follows the rounding mode, and when rounding goes up instead of down it changes one or more of the last digits).
When you look at what happens in decimal, the fact that float and double are binary floating-point formats on your computer means extra decimal digits show up in the representation of the value.
I expected the digits beyond the limit of the type to be zero
That is what happens internally. Excess bits beyond what the type can store are lost.
But that's in the binary representation. When you convert it to decimal, you can get trailing non-zero digits.
Example:
0b0.00100 is 0.125 in decimal
What you're seeing is a result of the fact that your value cannot be represented exactly in memory; floats are stored as the nearest value that can be represented. A float usually has 24 bits for the significand, which translates to about 6-7 decimal digits (this is implementation-defined, though, so you shouldn't rely on it). When printing more than that many decimal digits, you'll notice the value stored in memory isn't exactly the value you intended, and the extra digits look random (they are actually the exact decimal expansion of the stored binary value).
So to recap: most base-10 decimal fractions cannot be represented exactly in binary floating point; the closest representable number is stored instead, and that number is what gets used.
Each data type has a limited range and precision; digits beyond that precision are not meaningful, so you have to know these limits and deal with them when you write code.
You can look these ranges up, for example via std::numeric_limits or a language reference.

How to print a float value of 6 digit precision in C++?

By default I'm getting 4-digit precision, and when I use setprecision(6) the last digits of the variable come out looking random, like 1/3 = 0.333369.
float has about 7 decimal digits of precision, because it uses 24 binary digits to store the significand. As far as output is concerned, setprecision(6) does everything you could ask for.
It's likely you are losing precision, for example by subtracting two numbers with similar values and printing the result. The quick solution is to change the computations to use double or long double. But to make any guarantees about the precision of a floating-point result, you need to understand how FP works and analyze how your formula is getting computed.
See What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Determining output (printing) of float with %f in C/C++

I have gone through earlier discussions on floating-point numbers in SO, but they didn't clarify my problem. I know floating-point issues come up in every forum, but my question is not about floating-point arithmetic or comparison; I am rather inquisitive about the representation and the output with %f.
The question is straightforward: how do I determine the exact output of
float <Float_Variable> = <Some_Value>f;
printf("%f\n", <Float_Variable>);
Let us consider this code snippet:
float f = 43.2f,
f1 = 23.7f,
f2 = 58.89f,
f3 = 0.7f;
printf("f1 = %f\n",f);
printf("f2 = %f\n",f1);
printf("f3 = %f\n",f2);
printf("f4 = %f\n",f3);
Output:
f1 = 43.200001
f2 = 23.700001
f3 = 58.889999
f4 = 0.700000
I am aware that %f (which is meant for double) has a default precision of 6, and I am also aware that the problem (in this case) can be fixed by using double, but I am inquisitive about the outputs f2 = 23.700001 and f3 = 58.889999 for float.
EDIT: I am aware that floating-point numbers cannot be represented precisely, but what is the rule for obtaining the closest representable value?
Thanks,
Assuming that you're talking about IEEE 754 float, which has a precision of 24 binary digits: represent the number in binary (exactly) and round the number to the 24th most significant digit. The result will be the closest floating point.
For example, 23.7 represented in binary is
10111.1011001100110011001100110011...
After rounding you'll get
10111.1011001100110011010
Which in decimal is
23.700000762939453125
After rounding to the sixth decimal place, you'll have
23.700001
which is exactly the output of your printf.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
You may also be interested in other people's questions about this on SO. Please take a look:
https://stackoverflow.com/search?q=floating+point
A 32-bit float (as in this case) is represented as 1 bit of sign, 8 bits of exponent and 23 bits of the fractional part of the mantissa.
First, forget the sign of what you put in. The rest will be stored as a fraction of the form
(1 + x/8,388,608) * 2^(y-127) (note that 8,388,608 is 2^23), where x is the fractional mantissa and y is the exponent. Believe it or not, there is only one representation in this form for every value you put in. The value stored will be the closest representable value to the number you want; if your value cannot be represented exactly, you'll pick up an extra .0001 or whatever.
So, if you want to figure out the value that will actually be stored, just figure out what it will turn into.
So the second thing to do (after throwing out the sign) is to find the largest power of 2 that is smaller in magnitude than the number you are representing. Let's take 43.2.
The largest power of two smaller than that is 32. So that's the "1" on the left; since it's a 32, not a 1, the 2^ factor on the right must be 2^5 (32), which means y is 132. Now subtract off the 32; it's accounted for. What's left is 11.2, which we need to represent as a fraction over 8,388,608, times 2^5.
So
11.2 approximately equals x*32/8,388,608, i.e. x/262,144. The value you get for x is 2,936,013 (rounded up from the exact 2,936,012.8), so there is an error of 0.2 in 262,144, or 1 in 1,310,720. In decimal, this error is about 0.00000076293945, which is why 43.2 prints back as 43.200001. So if you print enough digits, you'll see this error value show up in your output.
When the output comes out too low instead, it's because the rounding went the other way: the nearest representable value was below the value you wanted, so you get an output that is slightly too low. When the value can be represented exactly (for example, any power of 2), you never get an error.
It's not simple, but there you go. I'm sure you can code this up.
*note: for very small in magnitude values (roughly less than 2^-127) you get into weirdness called denormals. I'm not going to explain them, but they won't fit the pattern. Luckily they don't show up much. And once you get into that range, your accuracy goes to pot anyway.
You can control the number of decimal places that are output by including a precision in the format specifier.
So instead of having
float f = 43.2f;
printf("f1 = %f\n",f);
Have this
float f = 43.2f;
printf("f1 = %.2f\n",f);
to print two digits after the decimal point.
Do note that floating point numbers are not precisely represented in memory.
The compiler and CPU use IEEE 754 to represent floating point values in memory. Most rational numbers cannot be expressed exactly in this format, so the compiler chooses the closest approximate representation.
To avoid unpredictable output, you should round to the appropriate precision.
// outputs "0.70"
printf("%.2f\n", 0.7f);
A floating point number or a double precision floating point number is stored as an integer numerator, and a power of 2 as denominator. The math behind it is pretty simple. It involves shifting and bit testing.
So when you declare a constant in base 10, the compiler converts it to a binary integer in 23 bits and an exponent in 8 (or 52 bit integer and 11 bit exponent).
To print it back out, it converts this fraction back into base 10.
Gross simplification: the rule is that "floats are good for 2 or 3 decimal places, doubles for 4 or 5". That is to say, the first 2 or 3 decimal places printed will be exactly what you put in. After that, you have to work out the encoding to see what you're going to get.
This is only a rule of thumb, and as it happens your test case shows one instance where the float representation is only good to 1 d.p.
The way to figure out what will be printed is to simulate exactly what the compiler / libraries / hardware will do:
Convert the number to binary, and round to 24 significant (binary) digits.
Convert that number to decimal, and round to 6 (decimal) digits after the decimal point.
Of course, this is exactly what your program does already, so what are you asking for?
Edit to illustrate, I'll work through one of your examples:
Begin by converting 23.7 to binary:
10111.1011001100110011001100110011001100110011001100110011...
Round that number to 24 significant binary digits:
10111.1011001100110011010
Note that it rounded up. Converting back to decimal gives:
23.700000762939453125
Now, round this value to 6 digits after the decimal point:
23.700001
Which is exactly what you observed.