In C++,
What are the random digits that are displayed after giving setprecision() for a floating point number?
Note: After setting the fixed flag.
example:
float f1=3.14;
cout << fixed << setprecision(10) << f1 << endl;
we get seemingly random numbers in the remaining digits. But it is not the same case with double.
Two things to be aware of:
floats are stored in binary.
float has a maximum of 24 significant bits. This is equivalent to 7.22 significant digits.
So, to your computer, there's no such number as 3.14. The closest you can get using float is 3.1400001049041748046875.
double has 53 significant bits (~15.95 significant digits), so you get a more accurate approximation, 3.140000000000000124344978758017532527446746826171875. The "noise" digits don't show up with setprecision(10), but would with setprecision(17) or higher.
They're not really "random" -- they're the decimal representation of the binary fraction that was actually stored (which matches the value you typed exactly only when its fractional part has a power-of-two denominator, e.g., 3.125 would display exactly).
Of course that changes depending on the number of bits available to represent the binary fraction that best approaches the decimal one you originally entered as a literal, i.e., single vs double precision floats.
Not really a C++ specific issue (it applies to all languages using binary floats, typically to exploit the machine's underlying HW, i.e., most languages). For a very bare-bones tutorial, I recommend reading this.
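To see the effect concretely, here is a minimal sketch (assuming IEEE-754 float and double, which is what virtually all current platforms use):

#include <iostream>
#include <iomanip>

int main() {
    float  f = 3.14f;   // actually stored as 3.1400001049041748046875
    double d = 3.14;    // actually stored as 3.14000000000000012434...
    std::cout << std::fixed;
    std::cout << std::setprecision(10) << f << '\n';  // "noise" appears after ~7 significant digits
    std::cout << std::setprecision(10) << d << '\n';  // still looks clean at 10 digits
    std::cout << std::setprecision(20) << d << '\n';  // noise appears after ~16 significant digits
}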
Related
I've made a BOMDAS calculator in C++ that uses doubles. Whenever I input an expression like
1000000000000000000000*1000000000000000000000
I get a result like 1000000000000000000004341624882808674582528.000000. I suspect it has something to do with floating-point numbers.
Floating point numbers represent values with a fixed-size representation. A double can hold about 16 decimal digits in a form from which those digits can be restored (internally, it normally stores the value in base 2, which means it cannot represent most fractional decimal values exactly). If the number of digits is exceeded, the value is rounded appropriately. The upshot is that you won't necessarily get back the digits you're hoping for: if you ask for more than 16 decimal digits, either explicitly or implicitly (e.g. by setting the format to std::ios_base::fixed with numbers bigger than 1e16), the formatting will conjure up more digits: it will accurately represent the internally held binary value, which may produce up to, I think, 54 non-zero digits.
If you want to compute with large values accurately, you'll need some variable-sized representation. Since your values are integers, a big-integer representation might work. These will typically be a lot slower to compute with than double.
A double stores 53 bits of precision. This is about 15 decimal digits. Your problem is that a double cannot store the number of digits you are trying to store. Digits beyond the 15th significant digit will not be accurate.
That's not an error. It's exactly because of how floating-point types are represented, as the result is precise to double precision.
Floating-point types in computers are written in the form (-1)^sign * mantissa * 2^exp, so they only have a broader range, not infinite precision. They are only accurate to the precision of the mantissa, and the result of every operation is rounded accordingly. The double type is most commonly implemented as IEEE-754 64-bit double precision with 53 bits of mantissa, so it is correct to about log10(2^53) ≈ 15.955 decimal digits. Doing 1e21*1e21 produces 1e42, which, when rounded to the closest value representable in double precision, gives the value that you saw. If you round that result to 16 significant digits, it is exactly the same as 1e42.
If you need more range, use double or long double. If you only work with integers, then int64_t (or __int128 with gcc and many other compilers on 64-bit platforms) has much greater precision (64/128 bits compared to 53 bits). If you need even more precision, use an arbitrary-precision arithmetic library instead, such as GMP.
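A small sketch of what the answers above describe (assuming an IEEE-754 64-bit double; the exact conjured digits are platform-dependent):

#include <cstdio>

int main() {
    double a = 1e21;
    double r = a * a;            // mathematically 1e42, rounded to the nearest representable double
    std::printf("%.0f\n", r);    // prints the stored binary value in full, something like ...4341624882808674582528
    std::printf("%.15e\n", r);   // rounded to 16 significant digits: 1.000000000000000e+42
}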
Is there a way within C++ of setting a definitive number of decimal places for a float value? For example, if I were to record multiple times as float values, I would most likely generate results with differing numbers of decimal places, and I would like the numbers to have the same length, i.e. if one number comes back as 1.33 and others come back as, say, 1.333, I would like the first result to read as 1.330.
I understand there are methods of limiting the number of decimal places, such as setprecision(), but I do not want to lose accuracy of my times.
You seem to confuse two things: the actual precision of floating point calculations in C++, and formatting of float (or double, or long double) values when printing with C++ streams (like cout, for example).
The first depends on the hardware/platform, and you cannot control it, apart from choosing between float and double. If you need better precision than what long double can give you, you need a library for arbitrary precision maths, for example GMPLIB.
Controlling number of digits after dot when printing/formatting is easier, see for example this question: Set the digits after decimal point
If your need is to limit the digits after the decimal point, whether for float, double or long double, then the way to do it is setprecision(). Used on its own, it counts the digits before the decimal point as well, and if the digits after the decimal point are fewer than the precision being set, it will not pad them with zeros. The solution is to combine it with fixed and showpoint. So if you want to set the precision to 3 digits after the decimal point, write this line before displaying the values:
cout << fixed << showpoint << setprecision(3);
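For instance, a quick sketch of the timing example from the question, using the line above:

#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    double t1 = 1.33, t2 = 1.333;
    cout << fixed << showpoint << setprecision(3);
    cout << t1 << endl;   // 1.330 (padded to three decimal places)
    cout << t2 << endl;   // 1.333
}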
Using long double I get 18/19 = 0.947368421052631578..., and 947368421052631578 is the repeating decimal. Using double I get 0.947368421052631526... However, the former is correct. Why such an incorrect result?
Thanks for help.
A double typically provides 16(±1) decimal digits. Your example shows this:
     4   8  12  16
     v   v   v   v
0.947368421052631578 long double
0.947368421052631526 double
The answers agree to 16 digits. This is what should be expected. Also note that there's no guarantee in the C Standard that a long double has more precision than a double.
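A short sketch reproducing the comparison (how many extra digits you get depends on how wide long double is on your platform):

#include <iostream>
#include <iomanip>

int main() {
    std::cout << std::setprecision(20);
    std::cout << 18.0  / 19.0  << '\n';   // correct to ~16 significant digits (double)
    std::cout << 18.0L / 19.0L << '\n';   // more correct digits if long double is wider than double
}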
You're trying to represent every decimal number with a finite amount of bits. Some things just aren't expressible exactly in floating point. Expecting exact answers with floats is your first problem. Take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic
Here's a summary from some lecture notes:
As mentioned earlier, computers cannot represent real numbers precisely, since there is only a finite number of bits for storing a real number. Therefore, any number with an infinite number of digits, such as 1/3, the square root of 2 and pi, cannot be represented completely. Moreover, even a number with a finite number of decimal digits may not be represented precisely, because of the way real numbers are encoded.
A double which is usually implemented with IEEE 754 will be accurate to between 15 and 17 decimal digits. Anything past that can't be trusted, even if you can make the compiler display it.
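Those limits can be queried directly; a minimal sketch (the printed values are the typical ones for IEEE-754 doubles):

#include <iostream>
#include <limits>

int main() {
    // digits10: decimal digits guaranteed to survive text -> double -> text
    // max_digits10: digits needed so that double -> text -> double is exact
    std::cout << std::numeric_limits<double>::digits10 << '\n';      // typically 15
    std::cout << std::numeric_limits<double>::max_digits10 << '\n';  // typically 17
}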
I'm having a problem... When I compile the source, the variable shown isn't the same as the one I initialized. See:
#include <iostream>
using namespace std;

int main()
{
    long double mynum = 4.7;
    cout.setf(ios::fixed, ios::floatfield);
    cout.precision(20);
    cout << mynum << endl;
}
And then:
[fpointbin#fedora ~]$ ./a.out
4.70000000000000017764
How do I fix it? I want cout to show 4.700000...
Your variable is long double, but the literal 4.7 has only double precision. Since you're printing it as a long double, the implementation chooses to print it with enough significant digits to distinguish it from neighbouring long double values, even though those neighbouring values are not possible doubles.
The internal representation of doubles does not allow for an 'exact' representation of 4.7. The 'closest' is 4.70000000000000017764. In reality there is no need to look at a precision of 20 when you have 64 bit doubles. The maximum effective precision is about 15. Try using 12 or so,
cout.precision( 12 );
and you should get what you want to see.
Most platforms, including yours, can only represent those floating point numbers exactly which have a short, finite binary expansion, i.e. which are finite sums of powers of two. 4.7 is not such a number, so it cannot be represented precisely on your platform, and if you demand excessive precision (20 is too much, as your mantissa has 64 bits and log10(2^64) ≈ 19.27), then you will inevitably face small errors.
(However, as @Henning says, you are already losing precision when assigning from a (non-long) double; you should write your literal constant as a long double: 4.7L. Then you should only see an error in the 20th digit.)
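A minimal sketch of that fix (the exact trailing digits depend on how wide your platform's long double is):

#include <iostream>

int main() {
    long double a = 4.7;    // double literal, then converted: error around the 17th significant digit
    long double b = 4.7L;   // long double literal: error pushed further out on most platforms
    std::cout.setf(std::ios::fixed, std::ios::floatfield);
    std::cout.precision(20);
    std::cout << a << '\n' << b << '\n';
}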
floats and doubles are binary floating-point types, i.e. they store a mantissa and an exponent in base 2.
This means that any decimal number that cannot be represented exactly in the finite digits of the mantissa will be approximated; the problem you showed comes from this: 4.7 cannot be represented exactly in the mantissa of a double (the literal 4.7 is of type double, kudos @Henning Makholm for spotting it), so the nearest approximation is used.
To better visualize the problem: in base 3, 2/3 has a finite representation (0.2), while in base 10 it is a periodic number (0.6666666...); if you only have a finite space for digits, you'll have to perform an approximation, which will be 0.66666667. That's exactly the same thing here, with the source base being 10 and the "target" base being 2.
If there's a special need to avoid this kind of approximation (e.g. when dealing with decimal amounts of money), special decimal types can be used that store mantissa and exponent in base 10 (C++ does not provide such a type of its own, but there are many decimal classes available on the Net); still, for "normal"/scientific calculations binary FP types are used, because they are much faster and more space-efficient.
Certain numbers cannot be represented in base two. Apparently, 4.7 is one of them. What you're seeing is the closest representable number to 4.7.
There's nothing you can do about this, other than setting the precision to a lower number.
I have a floating-point value of 0.1 entered from the UI.
But while converting that string to float I am getting something like 0.10...01. The problem is the appended non-zero digit. How do I tackle this problem?
Thanks,
iSight
You need to do some background reading on floating point representations: http://docs.sun.com/source/806-3568/ncg_goldberg.html.
Since computers are built from on-off switches, they work in base two, not the base ten we humans seem to like, and they store a rounded answer.
Your options are to:
display it back with fewer digits so you round back to base 10 (check out the Standard library's <iomanip> header and setprecision)
store the number in some actual decimal-capable object - you'll find plenty of C++ classes to do this via google, but none are provided in the Standard, nor in boost last I looked
convert the input from a string directly to an integral number of some smaller unit (like thousandths), avoiding the rounding entirely (see the sketch below).
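Here is a rough sketch of that third option, assuming three decimal places are enough; parse_thousandths is a hypothetical helper (not a standard function) and ignores signs and rounding of extra digits:

#include <iostream>
#include <string>

// Hypothetical helper: convert a decimal string such as "0.1" into an exact
// integer count of thousandths, so no binary rounding is ever involved.
long parse_thousandths(const std::string& s) {
    std::string::size_type dot = s.find('.');
    long whole = std::stol(s.substr(0, dot));
    std::string frac = (dot == std::string::npos) ? "" : s.substr(dot + 1);
    frac.resize(3, '0');                 // pad (or truncate) to exactly three fractional digits
    return whole * 1000 + std::stol(frac);
}

int main() {
    std::cout << parse_thousandths("0.1") << '\n';   // 100, i.e. exactly one tenth
}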
0.1 (decimal) = 0.00011001100110011... (binary)
So, in general, a number you can represent with a finite number of decimal digits may not be representable with a finite number of bits. But floating point numbers only store the N most significant bits. So, conversions between a decimal string and a "binary" float usually involve rounding.
However, a lossless roundtrip conversion decimal string -> double -> decimal string is possible if you restrict yourself to decimal strings with at most 15 significant digits (assuming IEEE 754 64-bit floats). This includes the final conversion: you need to produce the string from the double with at most 15 significant digits.
It is also possible to make the roundtrip double -> string -> double lossless. But here you may need decimal strings with 17 decimal digits to make it work (again assuming IEEE-754 64bit floats).
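A small sketch of that second round trip (assuming an IEEE-754 64-bit double, where max_digits10 is 17):

#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>

int main() {
    double d = 0.1;
    std::ostringstream out;
    // 17 significant digits are enough to identify the double uniquely
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    std::istringstream in(out.str());
    double back;
    in >> back;
    std::cout << out.str() << '\n'
              << (back == d ? "round trip is exact" : "bits were lost") << '\n';
}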
The best site I've ever seen that explains why some numbers can't be represented exactly is Harald Schmidt's IEEE754 Converter site.
It's an online tool for showing representations of IEEE754 single precision values and I liked it so much, I wrote my own Java app to do it (and double precision as well).
Bottom line, there are only about four billion different 32-bit values you can have but there are an infinite number of real values between any two different values. So you have a problem with precision. That's something you'll have to get used to.
If you want more precision and/or a better type for decimal values, you can:
switch to a higher number of bits.
use a decimal type
use a big-number library like GMP (although I refuse to use this in production code since I discovered it doesn't handle memory shortages elegantly).
Alternatively, you can use the inaccurate values (their error rates are very low, something like one part per hundred million for floats, from memory) and just print them out with less precision. Printing out 0.10000000145 to two decimal places will get you 0.10.
You would have to do millions and millions of additions for the error to accumulate noticeably. Less of other operations of course but still a lot.
As to why you're getting that value, 0.1 is stored in IEEE754 single precision format as follows (sign, exponent and mantissa):
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm  1/n
0 01111011 10011001100110011001101
           ||||||||||||||||||||||+- 8388608
           |||||||||||||||||||||+-- 4194304
           ||||||||||||||||||||+--- 2097152
           |||||||||||||||||||+---- 1048576
           ||||||||||||||||||+----- 524288
           |||||||||||||||||+------ 262144
           ||||||||||||||||+------- 131072
           |||||||||||||||+-------- 65536
           ||||||||||||||+--------- 32768
           |||||||||||||+---------- 16384
           ||||||||||||+----------- 8192
           |||||||||||+------------ 4096
           ||||||||||+------------- 2048
           |||||||||+-------------- 1024
           ||||||||+--------------- 512
           |||||||+---------------- 256
           ||||||+----------------- 128
           |||||+------------------ 64
           ||||+------------------- 32
           |||+-------------------- 16
           ||+--------------------- 8
           |+---------------------- 4
           +----------------------- 2
The sign is positive, that's pretty easy.
The exponent bits give 64+32+16+8+2+1 = 123; subtracting the 127 bias gives -4, so the multiplier is 2^-4, or 1/16.
The mantissa is chunky. It consists of 1 (the implicit leading bit) plus the contributions of the set bits, each worth 1/2^n, where n starts at 1 at the left and increases to the right: {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.
When you add all these up, you get 1.60000002384185791015625.
When you multiply that by the multiplier, you get 0.100000001490116119384765625, matching the double precision value on Harald's site as far as it's printed:
0.10000000149011612 (out by 0.00000000149011612)
And when you turn off the least significant (rightmost) bit, which is the smallest downward movement you can make, you get:
0.09999999403953552 (out by 0.00000000596046448)
Putting those two together:
0.10000000149011612 (out by 0.00000000149011612)
|
0.09999999403953552 (out by 0.00000000596046448)
you can see that the first one is a closer match, by about a factor of four (14.9:59.6). So that's the closest value you can get to 0.1.
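For the curious, here's a small sketch that pulls those fields out of a float at run time and rebuilds the stored value (it assumes float is a 32-bit IEEE-754 single, as above):

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = 0.1f;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // reinterpret the float's bit pattern
    unsigned sign     =  bits >> 31;
    unsigned exponent = (bits >> 23) & 0xFF;        // 123 for 0.1f
    unsigned mantissa =  bits & 0x7FFFFF;           // 0x4CCCCD for 0.1f
    std::printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
                sign, exponent, (int)exponent - 127, mantissa);
    // Rebuild the value: (1 + mantissa/2^23) * 2^(exponent - 127)
    double rebuilt = std::ldexp(1.0 + mantissa / 8388608.0, (int)exponent - 127);
    std::printf("%.28f\n", rebuilt);                // 0.1000000014901161193847656250
}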
Since floats get stored in binary, the fractional portion is effectively in base two... and one-tenth is a repeating fraction in base two, just as one-ninth is in base ten.
The most common ways to deal with this are to store your values as appropriately-scaled integers, as in the C# or SQL currency types, or to round off floating-point numbers when you display them.