Does the dot in the end of a float suggest lack of precision? - c++

When I debug my software in VS C++ by stepping the code I notice that some float calculations show up as a number with a trailing dot, i.e.:
1232432.
One operation that lead up to this result is this:
float result = pow(10, a * 0.1f) / b
where a is a large negative number around -50 to -100 and b is most often around 1. I read some articles about problem with precision when it comes to floating-points. My question is just if the trailing dot is a Visual-Studio-way of telling me that the precision is very low on this number, i.e. in the variable result. If not, what does it mean?
This came up at work today and I remember that there was a problem for larger numbers so this did to occur every time (and by "this" I mean that trailing dot). But I do remember that it happened when there was seven digits in the number. Here they wright that the precision of floats are seven digits:
C++ Float Division and Precision
Can this be the thing and Visual Studio tells me this by putting a dot in the end?
I THINK I FOUND IT! It says "The mantissa is specified as a sequence of digits followed by a period". What does the mantissa mean? Can this be different on a PC and when running the code on a DSP? Because the thing is that I get different results and the only thing that looks strange to me is this period-thing, since I don't know what it means.
http://msdn.microsoft.com/en-us/library/tfh6f0w2(v=vs.71).aspx

If you're referring to the "sig figs" convention where "4.0" means 4±0.1 and "4.00" means 4±0.01, then no, there's no such concept in float or double. Numbers are always* stored with 24 or 53 significant bits (7.22 or 15.95 decimal digits) regardless of how many are actually "significant".
The trailing dot is just a decimal point without any digits after it (which is a legal C literal). It either means that
The value is 1232432.0 and they trimed the unnecessary trailing zero, OR
Everything is being rounded to 7 significant digits (in which case the true value might also be 1232431.5, 1232431.625, 1232431.75, 1232431.875, 1232432.125, 1232432.25, 1232432.375, or 1232432.5.)
The real question is, why are you using float? double is the "normal" floating-point type in C(++), and float a memory-saving optimization.
* Pedants will be quick to point out denormals, x87 80-bit intermediate values, etc.

The precision is not variable, that is simply how VS is formatting it for display. The precision (or lackof) is always constant for a given floating point number.

The MSDN page you linked to talks about the syntax of a floating-point literal in source code. It doesn't define how the number will be displayed by whatever tool you're using. If you print a floating-point number using either printf or std:cout << ..., the language standard specifies how it will be printed.
If you print it in the debugger (which seems to be what you're doing), it will be formatted in whatever way the developers of the debugger decided on.
There are a number of different ways that a given floating-point number can be displayed: 1.0, 1., 10.0E-001, and .1e+1 all mean exactly the same thing. A trailing . does not typically tell you anything about precision. My guess is that the developers of the debugger just used 1232432. rather than 1232432.0 to save space.
If you're seeing the trailing . for some values, and a decimal number with no . at all for others, that sounds like an odd glitch (possibly a bug) in the debugger.
If you're wondering what the actual precision is, for IEEE 32-bit float (the format most computers use these days), the next representable numbers before and after 1232432.0 are 1232431.875 and 1232432.125. (You'll get much better precision using double rather than float.)

Related

C++ Type of variables - value

I am the beginner, but I think same important thinks I should learn as soon as it possible.
So I have a code:
float fl=8.28888888888888888888883E-5;
cout<<"The value = "<<fl<<endl;
But my .exe file after run show:
8.2888887845911086e-005
I suspected the numbers to limit of the type and rest will be the zero, but I saw digits, which are random. Maybe it gives digits from memory after varible?
Could you explain me how it does works?
I suspected the numbers to limit of the type and rest will be the zero
Yes, this is exactly what happens, but it happens in binary. This program will show it by using the hexadecimal printing format %a:
#include <stdio.h>
int main(int c, char *v[]) {
float fl = 8.28888888888888888888883E-5;
printf("%a\n%a\n", 8.28888888888888888888883E-5, fl);
}
It shows:
0x1.5ba94449649e2p-14
0x1.5ba944p-14
In these results, 0x1.5ba94449649e2p-14 is the hexadecimal representation of the double closest to 8.28888888888888888888883*10-5, and 0x1.5ba944p-14
is the representation of the conversion to float of that number. As you can see, the conversion simply truncated the last digits (in this case. The conversion is done according to the rounding mode, and when the rounding goes up instead of down, it changes one or more of the last digits).
When you observe what happens in decimal, the fact that float and double are binary floating-point formats on your computer means that there are extra digits in the representation of the value.
I suspected the numbers to limit of the type and rest will be the zero
That is what happens internally. Excess bits beyond what the type can store are lost.
But that's in the binary representation. When you convert it to decimal, you can get trailing non-zero digits.
Example:
0b0.00100 is 0.125 in decimal
What you're seeing is a result of the fact that you cannot exactly represent a floating-point number in memory. Because of this, floats will be stored as the nearest value that can be stored in memory. A float usually has 24 bits used to represent the mantissa, which translates to about 6 decimal places (this is implementation defined though, so you shouldn't rely on this). When printing more than 6 decimal digits, you'll notice your value isn't stored in memory as the value you intended, and you'll see random digits.
So to recap, the problem you encountered is caused by the fact that base-10 decimal numbers cannot be represented in memory, instead the closest number to it is stored and this number will then be used.
each data type has range after this range all number is from memory or rubbish so you have to know this ranges and deal with it when you write code.
you can know this ranges from here or here

VBA debugger precision

I had a single which I believe the C++ equivalent is float in VBA in an Excel workbook module. Anyways, the value I originally assigned (876.34497) is rounded off to 876.345 in the Immediate Window, and Watch, and hover tooltip when I set a breakpoint on the VBA. However, if I pass this Single to a C++ DLL C++ reports it as the original value 876.34497.
So, is it actually stored in memory as the original value? Is this some limitation of the debugger? Unsure what is going on here. Makes it difficult to test if what I'm passing is what I'm getting on the C++ side.
I tried:
?CStr(test)
876.345
?CDbl(test)
876.344970703125
?CSng(test)
876.345
VBA isn't very straightforward, so at some level it must be stored as 876.34497 in memory. Otherwise, I don't think CDbl would be correct like it is.
VBA variables of type "single" are stored as "32-bit hardware implementation of IEEE 754[-]1985 [sic]." [see: https://msdn.microsoft.com/en-us/library/ee177324.aspx].
What this means in English is, "single" precision numbers are converted to binary then truncated to fit in a 4 byte (32-bit) sequence. The exact process is very well described in Wikipedia under http://en.wikipedia.org/wiki/Single-precision_floating-point_format . The upshot is that all single precision numbers are expressed as
(1) a 23 bit "fraction" between 0 and 1, *times*
(2) an 8-bit exponent which represents a multiplier between 2^(-127) and 2^128, *times*
(3) one more bit for positive or negative.
The process of converting numbers to binary and back causes two types of rounding errors:
(1) Significant Digits -- as you have noticed, there is a limit on significant digits. A 22 bit integer can only have 8,388,607 unique values. Stated another way, no number can be expressed with greater than +/- 0.000012% precision. Reaching back to high school science, you may recall that that is another way of saying you cannot count on more than six significant digits (well, decimal digits, at least ... of course you have 22 significant binary digits). So any representation of a number with more than six significant digits will get rounded off. However, it won't get rounded off to the nearest decimal digit ... it will get rounded off to the nearest binary digit. This often causes some unexpected results (like yours).
(2) Binary conversion -- The other type of error is even more pernicious. There are some numbers with significantly less than six (decimal) digits that will get rounded off. For example, 1/5 in decimal is 0.2000000. It never gets "rounded off." But the same number in binary is 0.00110011001100110011.... repeating forever. (That sequence is equivalent to 1/8 + 1/16 + 1/16*(1/8+1/16) + 1/256*(1/8+1/16) ... ) If you used an arbitrary number of binary digits to represent 0.20, then converted it back to decimal, you will NEVER get exactly 0.20. For example, if you used eight bits, you would have 0.00110011 in binary which is:
0.12500000
0.06250000
0.00781250
+ 0.00390625
------------
0.19921875
No matter how many binary digits you use, you will never get exactly 0.20, because 0.20 cannot be expressed as the sum of powers of two.
That in a nutshell explains what's going on. When you assign 876.34497 to "test," it gets converted internally to:
1 10001000 0110110001011000010011
136 5,969,427
Which is (+1) * 2^(136-127) * (5,969,427)/(2^23)
Excel is automatically truncating the display of your single-precision number to show only six significant digits, because it knows that the seventh digit might be wrong. I can't tell you what the number is exactly because my excel doesn't display enough significant digits! But you get the point.
When you coerce the value into double precision, it uses the entire binary string and then adds another 4 bytes worth of zeroes to the end. It now allows you to display twice as many significant figures because it is double precision, but as you can see, the conversion from 8 decimal digits to 23 binary digits and then appending another long string of zeros has introduced some errors. Not really errors, if you understand what it's doing; just artifacts. After all, it's doing exactly what you told it to do ... you just didn't know what you were telling it to do!

converting floating point values to ascii and back again without introducing errors

At first sight, this seems trivial, but the usual (radix 2 <-> radix 10) FP<->ASCII conversions cannot always be done without introducing errors. Granted, these are small, but what options exist to make the conversions to and from ASCII perfect, that is, what are the possibilities of making the conversions, without introducing any error at all? I was thinking about base64 encoding, or bit-encoding (e.g. something like 11110101010...), both of these would preserve the radix.
EDIT: Since I can't answer myself, here's what I had in mind:
double d{.1};
auto const s(::std::to_string(*reinterpret_cast<::std::uint64_t*>(&d)));
::std::uint64_t n(::std::stoull(s));
auto const e(*reinterpret_cast<double*>(&n));
assert(d == e);
What do you mean exactly by "without introducing errors"? If it
is for the machine to reread later, 17 digits precision
guarantees round trip: the actual value in the text will not be
the exact value of the double, but it will be closer to the
original double value than to any other double value, so
reconversion to double will result in the initial value. If you
have access to C++11, you can also set the format to output the
value in hex:
std::cout.setf( std::ios_base::fixed | std::ios_base::scientific,
std::ios_base::floatfield );
In this case, the output should be exact, regardless of the
precision.
If it is for humans to read, and know the exact value, there is
nothing in the standard library which will guarantee this. In
theory, outputting 53 digits should suffice, but the neither the
C++ standard nor the IEEE standard require the implementation to
guard against rounding errors in the conversion routine at this
precision, and some implementations just append a sufficiently
large number of '0' after the 19th or 20th digit, rather than
waste runtime calculating incorrect values.
I think the question you are asking is how to round-trip a floating point double value via an ASCII (string) representation. I agree, for this purpose printing the number in fixed or floating point decimal notation is completely unsuitable.
If you don't care what the string looks like then the simple solution is to just treat the 8 byte double as two integers. Two hex integers will occupy 16 character positions. With practice you can even read one of these and estimate the value.
The same thing in Base-64 just reduces the number of character positions (to 11/12). The number formatted this way is quite unreadable.
There are other ways, but why bother? These should suffice.

Why the digits after decimal are all zero?

I want to perform some calculations and I want the result correct up to some decimal places, say 12.
So I wrote a sample:
#define PI 3.1415926535897932384626433832795028841971693993751
double d, k, h;
k = 999999/(2*PI);
h = 999999;
d = PI*k*k*h;
printf("%.12f\n", d);
But it gives the output:
79577232813771760.000000000000
I even used setprecision(), but same answer rather in exponential form.
cout<<setprecision(12)<<d<<endl;
prints
7.95772328138e+16
Used long double also, but in vain.
Now is there any way other than storing the integer part and the fractional part separately in long long int types?
If so, what can be done to get the answer precisely?
A double has only about 16 decimal digits of precision. Everything after the decimal point would be nonsense. (In fact, the last digit or two left of the point may not agree with an infinite-precision calculation.)
Long double is not standardized, AFAIK. It may be that on your system it is the same as double, or no more precise. That would slightly surprise me, but it doesn't violate anything.
You need to read Double-Precision concepts again; more carefully.
The double has increased precision by using 64 bits.
Stuff before the decimal is more important than that after it.
So, when you have a large integer part, it will truncate the lower precision -- this is being described to you in various answers here as rounding off.
Update:
To increase precision, you'll need to use some library or change your language.
Check this other question: Best coding language for dealing with large numbers (50000+ digits)
Yet, I'll ask you to re-check your intent once more.
Do you really need 12 decimal places for numbers that have really high values
(over 10 digits in the integer part like in your example)?
Maybe you won't really have large integer parts
(in which case such code should work fine).
But if you are tracking a value like 10000000000.123456789,
I am really interested in exactly which application you are working on (astronomy?).
If the integer part of your values is some way under 10000, you should be fine here.
Update2:
IF you must demonstrate the ability of a specific formula to work accurately within constrained error limits, the way to go is fixing the processing of your formula such that the least error is introduced.
Example,
If you want to do say, (x * y) / z
it would be prudent to try something like max(x,y)/z * min(x,y)
rather than, the original form which may overflow after (x * y), loosing precision if that did not fit in the 16 decimals of double
If you had just 2 digit precision,
. 2-digit regular-precision
`42 * 7 290 297
(42 * 7)/2 290/2 294/2
Result ==> 145 147
But ==> 42/2 = 21
21 * 7 = 147
This is probably the intent of your contest.
The double-precision binary format used by most computers can only hold about 16 digits, after that you'll get rounding. See http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Floating point values have a limit range of digits. Just because your "PI" value has six times as many digits as a double will support doesn't alter the way the hardware works.
A typical (IEEE754) double will produce approximately 15-16 decimal places. Whether that's 0.12345678901235, 1234567.8901235, 12345678901235 or 12345678901235000000000, or some other variation.
In other words, yes, if you calculate your calculation EXACTLY, you'll get lots of decimal places, because pi never ends. On a computer, you get about 15-16 digits, no matter what input values you use - all that changes is where in that sequence the decimal place sits. To get more, you need "big number support", such as the Gnu Multiprcession (GMP) library.
You're looking for std::fixed. That tells the ostream not to use exponential form.
cout << setprecision(12) << std::fixed << d << endl;

C++: How to Convert From Float to String Without Rounding, Truncation or Padding? [duplicate]

This question already has answers here:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
(14 answers)
Closed 6 years ago.
I am facing a problem and unable to resolve it. Need help from gurus. Here is sample code:-
float f=0.01f;
printf("%f",f);
if we check value in variable during debugging f contains '0.0099999998' value and output of printf is 0.010000.
a. Is there any way that we may force the compiler to assign same values to variable of float type?
b. I want to convert float to string/character array. How is it possible that only and only exactly same value be converted to string/character array. I want to make sure that no zeros are padded, no unwanted values are padded, no changes in digits as in above example.
It is impossible to accurately represent a base 10 decimal number using base 2 values, except for a very small number of values (such as 0.25). To get what you need, you have to switch from the float/double built-in types to some kind of decimal number package.
You could use boost::lexical_cast in this way:
float blah = 0.01;
string w = boost::lexical_cast<string>( blah );
The variable w will contain the text value 0.00999999978. But I can't see when you really need it.
It is preferred to use boost::format to accurately format a float as an string. The following code shows how to do it:
float blah = 0.01;
string w = str( boost::format("%d") % blah ); // w contains exactly "0.01" now
Have a look at this C++ reference. Specifically the section on precision:
float blah = 0.01;
printf ("%.2f\n", blah);
There are uncountably many real numbers.
There are only a finite number of values which the data types float, double, and long double can take.
That is, there will be uncountably many real numbers that cannot be represented exactly using those data types.
The reason that your debugger is giving you a different value is well explained in Mark Ransom's post.
Regarding printing a float without roundup, truncation and with fuller precision, you are missing the precision specifier - default precision for printf is typically 6 fractional digits.
try the following to get a precision of 10 digits:
float amount = 0.0099999998;
printf("%.10f", amount);
As a side note, a more C++ way (vs. C-style) to do things is with cout:
float amount = 0.0099999998;
cout.precision(10);
cout << amount << endl;
For (b), you could do
std::ostringstream os;
os << f;
std::string s = os.str();
In truth using the floating point processor or co-processor or section of the chip itself (most are now intergrated into the CPU), will never result in accurate mathematical results, but they do give a fairly rough accuracy, for more accurate results, you could consider defining a class "DecimalString", which uses nybbles as decimal characters and symbols... and attempt to mimic base 10 mathematics using strings... in that case, depending on how long you want to make the strings, you could even do away with the exponent part altogether a string 256 can represent 1x10^-254 upto 1^+255 in straight decimal using actual ASCII, shorter if you want a sign, but this may prove significantly slower. You could speed this by reversing the digit order, so from left to right they read
units,tens,hundreds,thousands....
Simple example
eg. "0021" becomes 1200
This would need "shifting" left and right to make the decimal points line up before routines as well, the best bet is to start with the ADD and SUB functions, as you will then build on them in the MUL and DIV functions. If you are on a large machine, you could make them theoretically as long as your heart desired!
Equally, you could use the stdlib.h, in there are the sprintf, ecvt and fcvt functions (or at least, there should be!).
int sprintf(char* dst,const char* fmt,...);
char *ecvt(double value, int ndig, int *dec, int *sign);
char *fcvt(double value, int ndig, int *dec, int *sign);
sprintf returns the number of characters it wrote to the string, for example
float f=12.00;
char buffer[32];
sprintf(buffer,"%4.2f",f) // will return 5, if it is an error it will return -1
ecvt and fcvt return characters to static char* locations containing the null terminated decimal representations of the numbers, with no decimal point, most significant number first, the offset of the decimal point is stored in dec, the sign in "sign" (1=-,0=+) ndig is the number of significant digits to store. If dec<0 then you have to pad with -dec zeros pror to the decimal point. I fyou are unsure, and you are not working on a Windows7 system (which will not run old DOS3 programs sometimes) look for TurboC version 2 for Dos 3, there are still one or two downloads available, it's a relatively small program from Borland which is a small Dos C/C++ edito/compiler and even comes with TASM, the 16 bit machine code 386/486 compile, it is covered in the help files as are many other useful nuggets of information.
All three routines are in "stdlib.h", or should be, though I have found that on VisualStudio2010 they are anything but standard, often overloaded with function dealing with WORD sized characters and asking you to use its own specific functions instead... "so much for standard library," I mutter to myself almost each and every time, "Maybe they out to get a better dictionary!"
You would need to consult your platform standards to determine how to best determine the correct format, you would need to display it as a*b^C, where 'a' is the integral component that holds the sign, 'b' is implementation defined (Likely fixed by a standard), and 'C' is the exponent used for that number.
Alternatively, you could just display it in hex, it'd mean nothing to a human, though, and it would still be binary for all practical purposes. (And just as portable!)
To answer your second question:
it IS possible to exactly and unambiguously represent floats as strings. However, this requires a hexadecimal representation. For instance, 1/16 = 0.1 and 10/16 is 0.A.
With hex floats, you can define a canonical representation. I'd personally use a fixed number of digits representing the underlying number of bits, but you could also decide to strip trailing zeroes. There's no confusion possible on which trailing digits are zero.
Since the representation is exact, the conversions are reversible: f==hexstring2float(float2hexstring(f))