Formatting a parsable float literal - c++

I need to output float numbers using printf formatting, in such a way that the generated string are valid float literals. As the numbers are arbitrary, I use the %g format descriptor, followed by an f.
This raises a problem in one specific case: if the mantissa in an integer and no exponent is appended. In this case, the number is printed without the decimal point nor exponent, resulting in an illegal constant.
E.g.
3.3 3.3f ok
3000000000000000. 3e+015f ok
0.000000000000003 3e-015f ok
3 3f nok
Do you see an easy way to handle this corner case ?

if the mantissa in an integer and no exponent is appended. In this case, the number is printed without the decimal point nor exponent
So use the alternative form with %#g. It will always print a decimal separator, but it will also disable removing trailing zeros from the output.

Related

strtof() function misplacing decimal place

I have a string "1613894376.500012077" and I want to use strtof in order to convert to floating point 1613894376.500012077. The problem is when I use strtof I get the following result with the decimal misplaced 1.61389e+09. Please help me determine how to use strof properly.
A typical float is 32-bit and can only represent exactly about 232 different values. "1613894376.500012077" is not one of those.
"1.61389e+09" is the same value as "1613890000.0" and represents a close value that float can represent.
The 2 closest floats are:
1613894272.0
1613894400.0 // slightly closer to 1613894376.500012077
Print with more precision to see more digits.
The decimal point is not misplaced. The notation “1.61389e+09” means 1.61389•109, which is 1,613,890,000., which has the decimal point in the correct place.
The actual result of strtof in your computer is probably 1,613,894,400. This is the closest value to 1613894376.500012077 that the IEEE-754 binary32 (“single”) format can represent, and that is the format commonly used for float. When you print it with %g, the default is to use just six significant digits. To see it with more precision, print it with %.999g.
The number 1613894376.500012077 is equivalent (the same number up to the precision of the machine as 1.61389e+09.) The e+09 suffix means that the decimal point is located nine decimal digits right the place it has been placed (or that the number is multiplied by 10 to the ninth power). This is a common notation in computer science called scientific notation.

C++ Type of variables - value

I am the beginner, but I think same important thinks I should learn as soon as it possible.
So I have a code:
float fl=8.28888888888888888888883E-5;
cout<<"The value = "<<fl<<endl;
But my .exe file after run show:
8.2888887845911086e-005
I suspected the numbers to limit of the type and rest will be the zero, but I saw digits, which are random. Maybe it gives digits from memory after varible?
Could you explain me how it does works?
I suspected the numbers to limit of the type and rest will be the zero
Yes, this is exactly what happens, but it happens in binary. This program will show it by using the hexadecimal printing format %a:
#include <stdio.h>
int main(int c, char *v[]) {
float fl = 8.28888888888888888888883E-5;
printf("%a\n%a\n", 8.28888888888888888888883E-5, fl);
}
It shows:
0x1.5ba94449649e2p-14
0x1.5ba944p-14
In these results, 0x1.5ba94449649e2p-14 is the hexadecimal representation of the double closest to 8.28888888888888888888883*10-5, and 0x1.5ba944p-14
is the representation of the conversion to float of that number. As you can see, the conversion simply truncated the last digits (in this case. The conversion is done according to the rounding mode, and when the rounding goes up instead of down, it changes one or more of the last digits).
When you observe what happens in decimal, the fact that float and double are binary floating-point formats on your computer means that there are extra digits in the representation of the value.
I suspected the numbers to limit of the type and rest will be the zero
That is what happens internally. Excess bits beyond what the type can store are lost.
But that's in the binary representation. When you convert it to decimal, you can get trailing non-zero digits.
Example:
0b0.00100 is 0.125 in decimal
What you're seeing is a result of the fact that you cannot exactly represent a floating-point number in memory. Because of this, floats will be stored as the nearest value that can be stored in memory. A float usually has 24 bits used to represent the mantissa, which translates to about 6 decimal places (this is implementation defined though, so you shouldn't rely on this). When printing more than 6 decimal digits, you'll notice your value isn't stored in memory as the value you intended, and you'll see random digits.
So to recap, the problem you encountered is caused by the fact that base-10 decimal numbers cannot be represented in memory, instead the closest number to it is stored and this number will then be used.
each data type has range after this range all number is from memory or rubbish so you have to know this ranges and deal with it when you write code.
you can know this ranges from here or here

VBA debugger precision

I had a single which I believe the C++ equivalent is float in VBA in an Excel workbook module. Anyways, the value I originally assigned (876.34497) is rounded off to 876.345 in the Immediate Window, and Watch, and hover tooltip when I set a breakpoint on the VBA. However, if I pass this Single to a C++ DLL C++ reports it as the original value 876.34497.
So, is it actually stored in memory as the original value? Is this some limitation of the debugger? Unsure what is going on here. Makes it difficult to test if what I'm passing is what I'm getting on the C++ side.
I tried:
?CStr(test)
876.345
?CDbl(test)
876.344970703125
?CSng(test)
876.345
VBA isn't very straightforward, so at some level it must be stored as 876.34497 in memory. Otherwise, I don't think CDbl would be correct like it is.
VBA variables of type "single" are stored as "32-bit hardware implementation of IEEE 754[-]1985 [sic]." [see: https://msdn.microsoft.com/en-us/library/ee177324.aspx].
What this means in English is, "single" precision numbers are converted to binary then truncated to fit in a 4 byte (32-bit) sequence. The exact process is very well described in Wikipedia under http://en.wikipedia.org/wiki/Single-precision_floating-point_format . The upshot is that all single precision numbers are expressed as
(1) a 23 bit "fraction" between 0 and 1, *times*
(2) an 8-bit exponent which represents a multiplier between 2^(-127) and 2^128, *times*
(3) one more bit for positive or negative.
The process of converting numbers to binary and back causes two types of rounding errors:
(1) Significant Digits -- as you have noticed, there is a limit on significant digits. A 22 bit integer can only have 8,388,607 unique values. Stated another way, no number can be expressed with greater than +/- 0.000012% precision. Reaching back to high school science, you may recall that that is another way of saying you cannot count on more than six significant digits (well, decimal digits, at least ... of course you have 22 significant binary digits). So any representation of a number with more than six significant digits will get rounded off. However, it won't get rounded off to the nearest decimal digit ... it will get rounded off to the nearest binary digit. This often causes some unexpected results (like yours).
(2) Binary conversion -- The other type of error is even more pernicious. There are some numbers with significantly less than six (decimal) digits that will get rounded off. For example, 1/5 in decimal is 0.2000000. It never gets "rounded off." But the same number in binary is 0.00110011001100110011.... repeating forever. (That sequence is equivalent to 1/8 + 1/16 + 1/16*(1/8+1/16) + 1/256*(1/8+1/16) ... ) If you used an arbitrary number of binary digits to represent 0.20, then converted it back to decimal, you will NEVER get exactly 0.20. For example, if you used eight bits, you would have 0.00110011 in binary which is:
0.12500000
0.06250000
0.00781250
+ 0.00390625
------------
0.19921875
No matter how many binary digits you use, you will never get exactly 0.20, because 0.20 cannot be expressed as the sum of powers of two.
That in a nutshell explains what's going on. When you assign 876.34497 to "test," it gets converted internally to:
1 10001000 0110110001011000010011
136 5,969,427
Which is (+1) * 2^(136-127) * (5,969,427)/(2^23)
Excel is automatically truncating the display of your single-precision number to show only six significant digits, because it knows that the seventh digit might be wrong. I can't tell you what the number is exactly because my excel doesn't display enough significant digits! But you get the point.
When you coerce the value into double precision, it uses the entire binary string and then adds another 4 bytes worth of zeroes to the end. It now allows you to display twice as many significant figures because it is double precision, but as you can see, the conversion from 8 decimal digits to 23 binary digits and then appending another long string of zeros has introduced some errors. Not really errors, if you understand what it's doing; just artifacts. After all, it's doing exactly what you told it to do ... you just didn't know what you were telling it to do!

How to convert float to double(both stored in IEEE-754 representation) without losing precision?

I mean, for example, I have the following number encoded in IEEE-754 single precision:
"0100 0001 1011 1110 1100 1100 1100 1100" (approximately 23.85 in decimal)
The binary number above is stored in literal string.
The question is, how can I convert this string into IEEE-754 double precision representation(somewhat like the following one, but the value is not the same), WITHOUT losing precision?
"0100 0000 0011 0111 1101 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010"
which is the same number encoded in IEEE-754 double precision.
I have tried using the following algorithm to convert the first string back to decimal number first, but it loses precision.
num in decimal = (sign) * (1 + frac * 2^(-23)) * 2^(exp - 127)
I'm using Qt C++ Framework on Windows platform.
EDIT: I must apologize maybe I didn't get the question clearly expressed.
What I mean is that I don't know the true value 23.85, I only got the first string and I want to convert it to double precision representation without precision loss.
Well: keep the sign bit, rewrite the exponent (minus old bias, plus new bias), and pad the mantissa with zeros on the right...
(As #Mark says, you have to treat some special cases separately, namely when the biased exponent is either zero or max.)
IEEE-754 (and floating point in general) cannot represent periodic binary decimals with full precision. Not even when they, in fact, are rational numbers with relatively small integer numerator and denominator. Some languages provide a rational type that may do it (they are the languages that also support unbounded precision integers).
As a consequence those two numbers you posted are NOT the same number.
They in fact are:
10111.11011001100110011000000000000000000000000000000000000000 ...
10111.11011001100110011001100110011001100110011001101000000000 ...
where ... represent an infinite sequence of 0s.
Stephen Canon in a comment above gives you the corresponding decimal values (did not check them, but I have no reason to doubt he got them right).
Therefore the conversion you want to do cannot be done as the single precision number does not have the information you would need (you have NO WAY to know if the number is in fact periodic or simply looks like being because there happens to be a repetition).
First of all, +1 for identifying the input in binary.
Second, that number does not represent 23.85, but slightly less. If you flip its last binary digit from 0 to 1, the number will still not accurately represent 23.85, but slightly more. Those differences cannot be adequately captured in a float, but they can be approximately captured in a double.
Third, what you think you are losing is called accuracy, not precision. The precision of the number always grows by conversion from single precision to double precision, while the accuracy can never improve by a conversion (your inaccurate number remains inaccurate, but the additional precision makes it more obvious).
I recommend converting to a float or rounding or adding a very small value just before displaying (or logging) the number, because visual appearance is what you really lost by increasing the precision.
Resist the temptation to round right after the cast and to use the rounded value in subsequent computation - this is especially risky in loops. While this might appear to correct the issue in the debugger, the accummulated additional inaccuracies could distort the end result even more.
It might be easiest to convert the string into an actual float, convert that to a double, and convert it back to a string.
Binary floating points cannot, in general, represent decimal fraction values exactly. The conversion from a decimal fractional value to a binary floating point (see "Bellerophon" in "How to Read Floating-Point Numbers Accurately" by William D.Clinger) and from a binary floating point back to a decimal value (see "Dragon4" in "How to Print Floating-Point Numbers Accurately" by Guy L.Steele Jr. and Jon L.White) yield the expected results because one converts a decimal number to the closest representable binary floating point and the other controls the error to know which decimal value it came from (both algorithms are improved on and made more practical in David Gay's dtoa.c. The algorithms are the basis for restoring std::numeric_limits<T>::digits10 decimal digits (except, potentially, trailing zeros) from a floating point value stored in type T.
Unfortunately, expanding a float to a double wrecks havoc on the value: Trying to format the new number will in many cases not yield the decimal original because the float padded with zeros is different from the closest double Bellerophon would create and, thus, Dragon4 expects. There are basically two approaches which work reasonably well, however:
As someone suggested convert the float to a string and this string into a double. This isn't particularly efficient but can be proven to produce the correct results (assuming a correct implementation of the not entirely trivial algorithms, of course).
Assuming your value is in a reasonable range, you can multiply it by a power of 10 such that the least significant decimal digit is non-zero, convert this number to an integer, this integer to a double, and finally divide the resulting double by the original power of 10. I don't have a proof that this yields the correct number but for the range of value I'm interested in and which I want to store accurately in a float, this works.
One reasonable approach to avoid this entirely issue is to use decimal floating point values as described for C++ in the Decimal TR in the first place. Unfortunately, these are not, yet, part of the standard but I have submitted a proposal to the C++ standardization committee to get this changed.

How to determine the last nonzero decimal digit in a float?

I am writing a float printing and formatting library and want to avoid printing trailing zero digits.
For this I would like to accurately determine the last nonzero digit within the first N decimal places after the decimal point. I wonder whether there is a particular efficient way to do this.
This (non-trivial) problem has been fully solved. The idea is to print exactly enough digits so that if you converted the printed digits back to a floating-point number, you would get exactly the number you started with.
The relevant paper is "Printing Floating-Point Numbers Quickly and Accurately", by Robert Burger and R. Kent Dybvig. You can download it here.
You will have to convert the float to a string and then trim the trailing zeros.
I don't think this is very efficient, but i feel there is probably no simplier algorithm
std::cout.precision(n);
//where n is the digits you want to display after the decimal.
If there is a zero present before the precision limit but after decimal,
it will be avoided automatically.
eg. std::cout.precision(5);
then my conidition is evaluated to be 5.55000
only 5.55 will be printed
The obvious solution is to put all digits in a char[N] and check the last digit before printing. But I bet you have thought about that yourself.
The only other solution I can think of, is that the decimal part of 2^(-n) has n non-zero digits.
So if the last non-zero in the binary representation is 2^(-n), there will be exactly n non-zero digits in the decimal expansion. Therefor looking at the binary representation will tell you something about the decimal representation.
However, this is only a partial solution, as rounding could introduce additional trailing zeros.