Long double does not print as the constant I initialized it with [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Floating point inaccuracy examples
Im having a problem... When i compile the src, the variable showed isn't the same that i initialized, see it:
#include <iostream>
using namespace std;
int main()
{
long double mynum = 4.7;
cout.setf(ios::fixed,ios::floatfield);
cout.precision( 20 );
cout << mynum << endl;
}
And then:
[fpointbin#fedora ~]$ ./a.out
4.70000000000000017764
How to fix it? I want to "cout" shows 4.700000...

Your variable is long double, but the default precision of the literal 4.7 is only double. Since you're printing it as long double, the interpretation chooses to print it with enough significant digits to distinguish it from neighbouring long double values, even though those neighbouring values are not possible doubles.

The internal representation of doubles does not allow for an 'exact' representation of 4.7. The 'closest' is 4.70000000000000017764. In reality there is no need to look at a precision of 20 when you have 64 bit doubles. The maximum effective precision is about 15. Try using 12 or so,
cout.precision( 12 );
and you should get what you want to see.

Most platforms, including yours, can only represent those floating point numbers exactly which have a short, finite binary expansion, i.e. which are finite sums of powers of two. 4.7 is not such a number, so it cannot be represented precisely on your platform, and if you demand excessive precision (20 is too much as your mantissa has 64 bits, and log_10(64) is 19.27), then you will inevitably face small errors.
(However, as #Henning says, you are already losing precision when assigning from a (non-long) double; you should write your literal constant as a long double: 4.7L. Then you should only see an error in the 20th digit.)

floats and doubles are binary floating-point types, i.e. they store a mantissa and an exponent in base 2.
This means that any decimal number that cannot be represented exactly into the finite digits of the mantissa will be approximated; the problem you showed comes from this: 4.7 cannot be represented exactly into the mantissa of a double (the literal 4.7 is of type double, kudos #Henning Makholm for spotting it), so the nearest approximation is used.
To better visualize the problem: in base 3, 2/3 is a number with a finite representation (i.e. 0.23), while in base 10 it is a periodic number (0,6666666...); if you have only a finite space for digits, you'll have to perform an approximation, that will be 0,66666667. That's exactly the same thing here, with the source base being 10 and the "target" base being 2.
If there's a special need to avoid this kind of approximations (e.g. when dealing with decimal amounts of money) special decimal types can be used, that store mantissa and exponent in base 10 (C++ do not provide such type of its own, but there are many decimal classes available on the Net); still, for "normal"/scientific calculations binary FP types are used, because they are much faster and more space-efficient.

Certain numbers cannot be represented in base two. Apparently, 4.7 is one of them. What you're seeing is the closest representable number to 4.7.
There's nothing you can do about this, other than setting the precision to a lower number.

Related

C++ Type of variables - value

I am the beginner, but I think same important thinks I should learn as soon as it possible.
So I have a code:
float fl=8.28888888888888888888883E-5;
cout<<"The value = "<<fl<<endl;
But my .exe file after run show:
8.2888887845911086e-005
I suspected the numbers to limit of the type and rest will be the zero, but I saw digits, which are random. Maybe it gives digits from memory after varible?
Could you explain me how it does works?
I suspected the numbers to limit of the type and rest will be the zero
Yes, this is exactly what happens, but it happens in binary. This program will show it by using the hexadecimal printing format %a:
#include <stdio.h>
int main(int c, char *v[]) {
float fl = 8.28888888888888888888883E-5;
printf("%a\n%a\n", 8.28888888888888888888883E-5, fl);
}
It shows:
0x1.5ba94449649e2p-14
0x1.5ba944p-14
In these results, 0x1.5ba94449649e2p-14 is the hexadecimal representation of the double closest to 8.28888888888888888888883*10-5, and 0x1.5ba944p-14
is the representation of the conversion to float of that number. As you can see, the conversion simply truncated the last digits (in this case. The conversion is done according to the rounding mode, and when the rounding goes up instead of down, it changes one or more of the last digits).
When you observe what happens in decimal, the fact that float and double are binary floating-point formats on your computer means that there are extra digits in the representation of the value.
I suspected the numbers to limit of the type and rest will be the zero
That is what happens internally. Excess bits beyond what the type can store are lost.
But that's in the binary representation. When you convert it to decimal, you can get trailing non-zero digits.
Example:
0b0.00100 is 0.125 in decimal
What you're seeing is a result of the fact that you cannot exactly represent a floating-point number in memory. Because of this, floats will be stored as the nearest value that can be stored in memory. A float usually has 24 bits used to represent the mantissa, which translates to about 6 decimal places (this is implementation defined though, so you shouldn't rely on this). When printing more than 6 decimal digits, you'll notice your value isn't stored in memory as the value you intended, and you'll see random digits.
So to recap, the problem you encountered is caused by the fact that base-10 decimal numbers cannot be represented in memory, instead the closest number to it is stored and this number will then be used.
each data type has range after this range all number is from memory or rubbish so you have to know this ranges and deal with it when you write code.
you can know this ranges from here or here

How many decimal places does the primitive float and double support? [duplicate]

This question already has answers here:
'float' vs. 'double' precision
(6 answers)
Closed 8 years ago.
I have read that double stores 15 digits and float stores 7 digits.
My question is, are these numbers the number of decimal places supported or total number of digits in a number?
If you are on an architecture using IEEE-754 floating point arithmetic (as in most architectures), then the type float corresponds to single precision, and the type double corresponds to double precision, as described in the standard.
Let's make some numbers:
Single precision:
32 bits to represent the number, out of which 24 bits are for mantissa. This means that the least significant bit (LSB) has a relative value of 2^(-24) respect to the MSB, which is the "hidden 1", and it is not represented. Therefore, for a fixed exponent, the minimum representable value is 10^(-7.22) times the exponent. What this means is that for a representation in base exponent notation (3.141592653589 E 25), only "7.22" decimal numbers are significant, which in practice means that at least 7 decimals will be always correct.
Double precision:
64 bits to represent the number, out of which 53 bits are for mantissa. Following the same reasoning, expressing 2^(-53) as a power of 10 results in 10^(-15.95), which in term means that at least 15 decimals will be always correct.
Those are the total number of "significant figures" if you will, counting from left to right, regardless of where the decimal point is. Beyond those numbers of digits, accuracy is not preserved.
The counts you listed are for the base 10 representation.
There are macros for the number of decimal places each type supports. The gcc docs explain what they are and also what they mean:
FLT_DIG
This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) for the representation, then the decimal precision q is the maximum number of decimal digits such that any floating point number with q base 10 digits can be rounded to a floating point number with p base b digits and back again, without change to the q decimal digits.
The value of this macro is supposed to be at least 6, to satisfy ISO C.
DBL_DIG
LDBL_DIG
These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.
On both gcc 4.9.2 and clang 3.5.0, these macros yield 6 and 15, respectively.
are these numbers the number of decimal places supported or total number of digits in a number?
They are the significant digits contained in every number (although you may not need all of them, but they're still there). The mantissa of the same type always contains the same number of bits, so every number consequentially contains the same number of valid "digits" if you think in terms of decimal digits. You cannot store more digits than will fit into the mantissa.
The number of "supported" digits is, however, much larger, for example float will usually support up to 38 decimal digits and double will support up to 308 decimal digits, but most of these digits are not significant (that is, "unknown").
Although technically, this is wrong, since float and double do not have universally well-defined sizes like I presumed above (they're implementation-defined). Also, storage sizes are not necessarily the same as the sizes of intermediate results.
The C++ standard is very reluctant at precisely defining any fundamental type, leaving almost everything to the implementation. Floating point types are no exception:
3.9.1 / 8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.
Now of course all of this is not particularly helpful in practice.
In practice, floating point is (usually) IEEE 754 compliant, with float having a width of 32 bits and double having a width of 64 bits (as stored in memory, registers have higher precision on some notable mainstream architectures).
This is equivalent to 24 bits and 53 bits of matissa, respectively, or 7 and 15 full decimals.

double and accuracy

Using long double I get 18/19 = 0.947368421052631578..., and 947368421052631578 is the repeating decimal. Using double I get 0.947368421052631526... However, the former is correct. Why such an incorrect result?
Thanks for help.
A double typically provides 16(±1) decimal digits. Your example shows this:
4 8 12 16
v v v v
0.947368421052631578 long double
0.947368421052631526 double
The answers agree to 16 digits. This is what should be expected. Also note that there's no guarantee in the C Standard that a long double has more precision than a double.
You're trying to represent every decimal number with a finite amount of bits. Some things just aren't expressible exactly in floating point. Expecting exact answers with floats is your first problem. Take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic
Here's a summary from some lecture notes:
As mentioned earlier, computers cannot represent real numbers precisely since there are only a finite number of bits for storing a real number. Therefore, any number that has infinite number of digits such as 1/3, the square root of 2 and PI cannot be represented completely. Moreover, even a number of finite number of digits cannot be represented precisely because of the way of encoding real numbers.
A double which is usually implemented with IEEE 754 will be accurate to between 15 and 17 decimal digits. Anything past that can't be trusted, even if you can make the compiler display it.

Rounding problem with double type [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why don't operations on double-precision values give expected results?
I am experiencing a peculiar problem in C++. I created a variable of type Double. Then I did some calculations with some values assigned to other variables and assigned the result to the double variable I declared. It gave me a result with a long decimal part. I want it to round to only 2 decimal places. and store it into the variable. But even after several attempt rounding, I couldnt round it to 2 decimal places.
Then I tried another way to check what the real problem is. I created a Double variable and assigned it the value 1.11. But when I debugged it by putting a break point and added a watch for that variable, I could find that the value now stored in the variable is 1.109999999999.
My question is, why is it showing like that? Isnt there any way in which we can round the variable into two decimal places? Why is it showing a long decimal part even if we assign a number with just two decimal places?
Please suggest a way to store numbers - whether it is calculated or directly assigned - as it is, in a double variable rather than a number with a long decimal part.
In the set of double values, there is no such thing as the number 1.11 because internally, double uses a binary representation (as opposed to humans who are used to a decimal representation). Most finite decimal numbers (such as 1.11) have an infinite representation in binary, but since memory is limited, you lose some precision because of rounding.
The closest you can get to 1.11 with the double data type is 1.1100000000000000976996261670137755572795867919921875, which is internally represented as 0x3ff1c28f5c28f5c3.
Your requirement of two decimal places sounds like you are working with money. A simple solution is to store the cents in an integer (as opposed to the dollars in a double):
int cents = 111;
This way, you don't lose any precision. Another solution is to use a dedicated decimal data type.
the floating-point types like float and double are not 100% precise. They may store 14.3 as 14.299999... and there is nothing wrong about that. That is why you should NEVER compare two floats or doubles with == operator, instread you should check if the absolute value of their difference is smaller than a certain epsilon, like 0.000000001
Now, if you want to output the number in a pleasant way, you can use setprecision from <iomanip>
E.g.
#include <iostream>
#include <iomanip>
int main()
{
double d = 1.389040598345;
std::cout << setprecision(2) << d; //outputs 1.39
}
If you want to obtain the value of d rounded 2 decimal places after the point, then you can use this formula
d = floor((d*100)+0.5)/100.0; //d is now 1.39
Not every decimal number has an exact, finite, binary floating-point representation. You've already found one example, but another one is 0.1 (decimal) = 0.0001100110011... (binary).
You either need to live with that, or use a decimal floating-point library (which will be less efficient).
My recommendation would be to store numbers to full precision, and only round when you need to display them to humans.

Setprecision() for a float number in C++?

In C++,
What are the random digits that are displayed after giving setprecision() for a floating point number?
Note: After setting the fixed flag.
example:
float f1=3.14;
cout < < fixed<<setprecision(10)<<f1<<endl;
we get random numbers for the remaining 7 digits? But it is not the same case in double.
Two things to be aware of:
floats are stored in binary.
float has a maximum of 24 significant bits. This is equivalent to 7.22 significant digits.
So, to your computer, there's no such number as 3.14. The closest you can get using float is 3.1400001049041748046875.
double has 53 significant bits (~15.95 significant digits), so you get a more accurate approximation, 3.140000000000000124344978758017532527446746826171875. The "noise" digits don't show up with setprecision(10), but would with setprecision(17) or higher.
They're not really "random" -- they're the (best available) decimal representation of that binary fraction (will be exact only for fractions whose denominator is a power of two, e.g., 3.125 would display exactly).
Of course that changes depending on the number of bits available to represent the binary fraction that best approaches the decimal one you originally entered as a literal, i.e., single vs double precision floats.
Not really a C++ specific issue (applies to all languages using binary floats, typically to exploit the machine's underlying HW, i.e., most languages). For a very bare-bone tutorial, I recommend reading this.