converting floating point values to ascii and back again without introducing errors - c++

At first sight, this seems trivial, but the usual (radix 2 <-> radix 10) FP<->ASCII conversions cannot always be done without introducing errors. Granted, these errors are small, but what options exist to make the conversions to and from ASCII perfect, that is, without introducing any error at all? I was thinking about base64 encoding, or bit encoding (e.g. something like 11110101010...); both of these would preserve the radix.
EDIT: Since I can't answer myself, here's what I had in mind:
double d{.1};
::std::uint64_t n;
::std::memcpy(&n, &d, sizeof d); // copy the bits; reinterpret_cast would violate strict aliasing
auto const s(::std::to_string(n));
n = ::std::stoull(s);
double e;
::std::memcpy(&e, &n, sizeof e); // restore the bits
assert(d == e);

What do you mean exactly by "without introducing errors"? If it is for the machine to reread later, 17 digits of precision guarantees a round trip: the actual value in the text will not be the exact value of the double, but it will be closer to the original double value than to any other double value, so reconversion to double will result in the initial value. If you have access to C++11, you can also set the format to output the value in hex:
std::cout.setf( std::ios_base::fixed | std::ios_base::scientific,
std::ios_base::floatfield );
In this case, the output should be exact, regardless of the precision.
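For example, both round trips can be checked like this (my own sketch; I parse the hexadecimal string back with strtod, which accepts hex-floats, because istream extraction of hex-floats is not reliably implemented everywhere):

#include <cassert>
#include <cstdlib>
#include <iomanip>
#include <sstream>

int main() {
    double d = 0.1;

    // Round trip via 17 significant decimal digits.
    std::ostringstream dec;
    dec << std::setprecision(17) << d;
    assert(std::strtod(dec.str().c_str(), nullptr) == d);

    // Round trip via the C++11 hexadecimal format (fixed|scientific is hexfloat).
    std::ostringstream hex;
    hex.setf(std::ios_base::fixed | std::ios_base::scientific,
             std::ios_base::floatfield);
    hex << d;  // e.g. 0x1.999999999999ap-4: exact, regardless of precision
    assert(std::strtod(hex.str().c_str(), nullptr) == d);
}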
If it is for humans to read and know the exact value, there is nothing in the standard library which will guarantee this. In theory, outputting 53 digits should suffice, but neither the C++ standard nor the IEEE standard requires the implementation to guard against rounding errors in the conversion routine at this precision, and some implementations just append a sufficiently large number of '0's after the 19th or 20th digit, rather than waste runtime calculating incorrect values.

I think the question you are asking is how to round-trip a floating-point double value via an ASCII (string) representation. I agree that, for this purpose, printing the number in fixed or floating-point decimal notation is completely unsuitable.
If you don't care what the string looks like then the simple solution is to just treat the 8 byte double as two integers. Two hex integers will occupy 16 character positions. With practice you can even read one of these and estimate the value.
The same thing in Base-64 just reduces the number of character positions (to 11/12). The number formatted this way is quite unreadable.
There are other ways, but why bother? These should suffice.
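For instance, here is a sketch of the raw-bits approach using a single 64-bit integer rather than two 32-bit halves (it assumes the usual 64-bit IEEE-754 double):

#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    double d = 0.1;
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof d);                    // grab the raw bit pattern
    char s[17];
    std::snprintf(s, sizeof s, "%016llx",
                  (unsigned long long)bits);             // 16 hex characters
    std::uint64_t n = std::strtoull(s, nullptr, 16);
    double e;
    std::memcpy(&e, &n, sizeof e);                       // rebuild the double
    assert(d == e);                                      // lossless round trip
}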

Related

Z3 - Floating point arithmetic API function Z3_mk_fpa_to_ubv

I am playing around with the Z3-4.6.0 C++ API for the first time. Sorry for the noob questions.
My question has 2 parts.
If I have a floating-point number and I use the Z3_mk_fpa_to_ubv(...) function to create an unsigned bit-vector, how much precision is lost?
If the precision is not lost, can I use this new unsigned bit-vector as a regular bit-vector and apply all the operations defined for it, e.g. Z3_mk_bvadd(...)?
I know I can use Z3_mk_fpa_to_ieee_bv(....) for a graceful, IEEE-754-compliant conversion. Afterwards I can add, subtract, etc. the bit-vectors.
Just being curious.
Thank you very much.
I'm afraid you're misinterpreting the role of these functions. A good reference to keep open while working with SMTLib floats is: http://smtlib.cs.uiowa.edu/papers/BTRW15.pdf
mk_fpa_to_ubv
This function corresponds to the FPToUInt function in the cited paper. (The paper's formal definition is not reproduced here; note that its use of NaN in that definition is misleading and should be read as "undefined.")
Note that the precision loss can be huge here, depending on what the FP value is and the bit-width of the vector. Imagine converting a double-precision floating-point value to an 8-bit word: you're smashing values in the range ±2.23×10^−308 to ±1.80×10^308 into a mere 256 different values, so a large number of conversions will simply be destroyed by massive rounding.
You should think of this as "casting" in C-like languages:

unsigned char c;
double f;
c = (unsigned char) f;  /* fractional part and out-of-range magnitude are lost */
This is the essence of conversion from double-precision to unsigned byte, which will suffer major precision loss. In the other direction, if you convert to a really large bit-vector (say one that has a thousand bits), then your conversion will still be losing precision per the rounding mode, though you'll be able to cover all the integer values precisely in the range. So, it really depends on what BV-type you convert to and the rounding mode you choose.
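For illustration, here is a rough sketch against the Z3 C API (function names as declared in z3.h; double-check the exact signatures for your Z3 version):

#include <cstdio>
#include <z3.h>

int main() {
    Z3_config  cfg = Z3_mk_config();
    Z3_context ctx = Z3_mk_context(cfg);
    Z3_ast  rm   = Z3_mk_fpa_round_toward_zero(ctx);  // rounding-mode term
    Z3_sort dbl  = Z3_mk_fpa_sort_double(ctx);
    Z3_ast  x    = Z3_mk_fpa_numeral_double(ctx, 1000.75, dbl);
    Z3_ast  ubv  = Z3_mk_fpa_to_ubv(ctx, rm, x, 8);   // 1000 cannot fit in 8 bits
    Z3_ast  ieee = Z3_mk_fpa_to_ieee_bv(ctx, x);      // raw 64-bit IEEE-754 pattern
    std::printf("%s\n%s\n", Z3_ast_to_string(ctx, ubv),
                            Z3_ast_to_string(ctx, ieee));
    Z3_del_context(ctx);
}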
mk_fpa_to_ieee_bv
This function has nothing to do with "preserving" the value, so asking about "precision loss" here is irrelevant. What it does is give you the underlying bit-vector representation of the floating-point value, per the IEEE-754 spec. The Wikipedia article has a good discussion of this representation: https://en.wikipedia.org/wiki/Double-precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64
In particular, if you interpret the output of this function as a two's complement integer value, you'll get a completely irrelevant value that has nothing to do with the value of the floating-point number itself. (Also, this conversion is not unique since NaN has multiple corresponding bit-vector patterns.)
Summary
Long story short, conversions from floats to bit-vectors will suffer from precision loss, not only from losing the "fractional" part to rounding, but also from the limited range, unless you pick a very large bit-vector size. The IEEE-754 representation conversion does not preserve value, and thus doing arithmetic on values converted via this function is more or less meaningless.

How to reestablish a double in C++

When a decimal number is stored in a double, its precision is corrupted to some degree. For example, the number 37.3 can be represented as 37.29999999999991.
I need to reestablish the corrupted double number (my project requires that). One approach is converting the double into a CString.
double d = 37.3;
CString str;
str.Format("%.10f", d);
Output: str = 37.3;
This way I could reestablish the corrupted d. However, I found a counterexample. If I set
d = 37.3500;
then its double representation sometimes equals 37.349998474121094. When converting d to CString, the output is still 37.3499984741, which is not actually equal to 37.3500.
Why did converting 37.3500 not give the desired answer, while 37.3 did? Is there any way to reestablish the double?
Thanks.
Why did converting 37.3500 not give the desired answer, while 37.3 did?
By accident. The representation of 37.3 happened to be close enough that rounding to 10 decimal places gave the expected result, while that of 37.3500 didn't.
Is there any way to reestablish the double?
No, once information has been lost, you can't recover it. If you need an exact representation of decimal numbers, then you'll need a different format than binary floating point. There's no suitable decimal type in the C++ language or standard library; depending on your needs, you might consider libraries such as Boost.Multiprecision or GMP. Alternatively, if you can limit the number of decimal places you need, you might be able to multiply all your numbers by that scale and work with exact integers.
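For instance, a toy sketch of the scaled-integer idea (the thousandths scale and the values are just examples):

#include <cstdint>
#include <cstdio>

int main() {
    // Work in thousandths, so every value is an exact integer:
    std::int64_t a = 37350;    // represents 37.350
    std::int64_t b = 12425;    // represents 12.425
    std::int64_t sum = a + b;  // exact integer arithmetic: 49775
    std::printf("%lld.%03lld\n", (long long)(sum / 1000),
                                 (long long)(sum % 1000));  // prints 49.775
}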
It can be done to some extent, but not easily. Since the string representation is base 10 but the internal representation is base 2, there is rounding involved when converting one into the other. So when you convert the decimal "37.35" to double, the result is not identical to the original number. When converting that number back to a string, the computer cannot know for sure what number was there in the first place, because there are several decimal numbers that result in the same double. However, you can add the constraint that you want the shortest possible decimal string that results in the given double; then there is a very good chance that it recovers your original string precisely. An algorithm using that constraint has been developed by David Gay. Here's the source code (you need both g_fmt.c and dtoa.c), and here is a paper about it. This is the default algorithm used in Python since version 3.1.
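For what it's worth, C++17's std::to_chars gives you this shortest-round-trip behavior out of the box. A sketch (it assumes a standard library that implements the floating-point overloads):

#include <cassert>
#include <charconv>
#include <cstdio>
#include <string>

int main() {
    double d = 37.35;                    // the stored double is not exactly 37.35
    char buf[64];
    auto r = std::to_chars(buf, buf + sizeof buf, d);  // shortest round-trip form
    std::string s(buf, r.ptr);
    std::printf("%s\n", s.c_str());      // prints "37.35"
    double e{};
    std::from_chars(s.data(), s.data() + s.size(), e);
    assert(d == e);                      // recovers the original double exactly
}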

How to convert float to double (both stored in IEEE-754 representation) without losing precision?

I mean, for example, I have the following number encoded in IEEE-754 single precision:
"0100 0001 1011 1110 1100 1100 1100 1100" (approximately 23.85 in decimal)
The binary number above is stored as a literal string.
The question is, how can I convert this string into the IEEE-754 double-precision representation (somewhat like the following one, but the value is not the same) WITHOUT losing precision?
"0100 0000 0011 0111 1101 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010"
which is the same number encoded in IEEE-754 double precision.
I have tried using the following algorithm to convert the first string back to a decimal number first, but it loses precision.
value = (-1)^sign * (1 + frac * 2^(-23)) * 2^(exp - 127)
I'm using Qt C++ Framework on Windows platform.
EDIT: I must apologize; maybe I didn't express the question clearly.
What I mean is that I don't know the true value 23.85, I only got the first string and I want to convert it to double precision representation without precision loss.
Well: keep the sign bit, rewrite the exponent (minus old bias, plus new bias), and pad the mantissa with zeros on the right...
(As @Mark says, you have to treat some special cases separately, namely when the biased exponent is either zero or max.)
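A sketch of exactly that recipe, for normal numbers only (zero, denormals, Inf and NaN are the special cases mentioned above and are not handled):

#include <cassert>
#include <cstdint>
#include <cstring>

// Widen an IEEE-754 binary32 bit pattern to binary64 by hand:
// keep the sign, re-bias the exponent, zero-pad the mantissa.
std::uint64_t widen(std::uint32_t f) {
    std::uint64_t sign = (f >> 31) & 0x1;
    std::uint64_t exp  = (f >> 23) & 0xFF;  // biased by 127
    std::uint64_t frac = f & 0x7FFFFFu;     // low 23 bits
    return (sign << 63)
         | ((exp - 127 + 1023) << 52)       // re-bias: 127 -> 1023
         | (frac << 29);                    // 23 -> 52 bits, zeros on the right
}

int main() {
    float x = 23.85f;
    std::uint32_t fbits;
    std::memcpy(&fbits, &x, sizeof x);
    std::uint64_t dbits = widen(fbits);
    double d;
    std::memcpy(&d, &dbits, sizeof d);
    assert(d == static_cast<double>(x));    // matches the implicit conversion
}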
IEEE-754 (and floating point in general) cannot represent numbers with a periodic binary expansion with full precision, not even when they are, in fact, rational numbers with a relatively small integer numerator and denominator. Some languages provide a rational type that may do it (these are the languages that also support unbounded-precision integers).
As a consequence those two numbers you posted are NOT the same number.
They in fact are:
10111.11011001100110011000000000000000000000000000000000000000 ...
10111.11011001100110011001100110011001100110011001101000000000 ...
where ... represent an infinite sequence of 0s.
Stephen Canon in a comment above gives you the corresponding decimal values (did not check them, but I have no reason to doubt he got them right).
Therefore the conversion you want to do cannot be done, as the single-precision number does not have the information you would need (you have NO WAY to know whether the number is in fact periodic or simply looks periodic because there happens to be a repetition).
First of all, +1 for identifying the input in binary.
Second, that number does not represent 23.85, but slightly less. If you flip its last binary digit from 0 to 1, the number will still not accurately represent 23.85, but slightly more. Those differences cannot be adequately captured in a float, but they can be approximately captured in a double.
Third, what you think you are losing is called accuracy, not precision. The precision of the number always grows by conversion from single precision to double precision, while the accuracy can never improve by a conversion (your inaccurate number remains inaccurate, but the additional precision makes it more obvious).
I recommend converting back to a float, or rounding, or adding a very small value just before displaying (or logging) the number, because visual appearance is what you really lost by increasing the precision.
Resist the temptation to round right after the cast and to use the rounded value in subsequent computation - this is especially risky in loops. While this might appear to correct the issue in the debugger, the accumulated additional inaccuracies could distort the end result even more.
It might be easiest to convert the string into an actual float, convert that to a double, and convert it back to a string.
Binary floating points cannot, in general, represent decimal fraction values exactly. The conversion from a decimal fractional value to a binary floating point (see "Bellerophon" in "How to Read Floating-Point Numbers Accurately" by William D. Clinger) and from a binary floating point back to a decimal value (see "Dragon4" in "How to Print Floating-Point Numbers Accurately" by Guy L. Steele Jr. and Jon L. White) yield the expected results because one converts a decimal number to the closest representable binary floating point and the other controls the error to know which decimal value it came from (both algorithms are improved upon and made more practical in David Gay's dtoa.c). These algorithms are the basis for restoring std::numeric_limits<T>::digits10 decimal digits (except, potentially, trailing zeros) from a floating-point value stored in type T.
Unfortunately, expanding a float to a double wreaks havoc on the value: trying to format the new number will in many cases not yield the decimal original, because the float padded with zeros is different from the closest double Bellerophon would create and, thus, from what Dragon4 expects. There are basically two approaches which work reasonably well, however:
As someone suggested, convert the float to a string and this string into a double (see the sketch after the next item). This isn't particularly efficient, but it can be proven to produce the correct results (assuming a correct implementation of the not entirely trivial algorithms, of course).
Assuming your value is in a reasonable range, you can multiply it by a power of 10 such that the least significant decimal digit is non-zero, convert this number to an integer, this integer to a double, and finally divide the resulting double by the original power of 10. I don't have a proof that this yields the correct number, but for the range of values I'm interested in, and which I want to store accurately in a float, this works.
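Here is my sketch of the first approach, using numeric_limits<float>::digits10 == 6 significant digits:

#include <cstdio>
#include <cstdlib>

int main() {
    float f = 23.85f;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.6g", (double)f);  // "23.85"
    double d = std::strtod(buf, nullptr);  // the double closest to "23.85"
    std::printf("%s -> %.17g\n", buf, d);
}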
One reasonable approach to avoid this issue entirely is to use decimal floating-point values, as described for C++ in the Decimal TR, in the first place. Unfortunately, these are not yet part of the standard, but I have submitted a proposal to the C++ standardization committee to get this changed.

Does the dot at the end of a float suggest lack of precision?

When I debug my software in VS C++ by stepping the code I notice that some float calculations show up as a number with a trailing dot, i.e.:
1232432.
One operation that led up to this result is this:
float result = pow(10, a * 0.1f) / b;
where a is a large negative number around -50 to -100 and b is most often around 1. I have read some articles about precision problems when it comes to floating point. My question is simply whether the trailing dot is a Visual-Studio way of telling me that the precision of this number, i.e. of the variable result, is very low. If not, what does it mean?
This came up at work today, and I remember that there was a problem for larger numbers, so this didn't occur every time (and by "this" I mean that trailing dot). But I do remember that it happened when there were seven digits in the number. Here they write that the precision of floats is seven digits:
C++ Float Division and Precision
Could this be the thing, with Visual Studio telling me so by putting a dot at the end?
I THINK I FOUND IT! It says "The mantissa is specified as a sequence of digits followed by a period". What does the mantissa mean? Can this be different on a PC and when running the code on a DSP? The thing is that I get different results, and the only thing that looks strange to me is this period thing, since I don't know what it means.
http://msdn.microsoft.com/en-us/library/tfh6f0w2(v=vs.71).aspx
If you're referring to the "sig figs" convention where "4.0" means 4±0.1 and "4.00" means 4±0.01, then no, there's no such concept in float or double. Numbers are always* stored with 24 or 53 significant bits (7.22 or 15.95 decimal digits) regardless of how many are actually "significant".
The trailing dot is just a decimal point without any digits after it (which is a legal C literal). It either means that
The value is 1232432.0 and they trimmed the unnecessary trailing zero, OR
Everything is being rounded to 7 significant digits (in which case the true value might also be 1232431.5, 1232431.625, 1232431.75, 1232431.875, 1232432.125, 1232432.25, 1232432.375, or 1232432.5.)
The real question is, why are you using float? double is the "normal" floating-point type in C(++), and float a memory-saving optimization.
* Pedants will be quick to point out denormals, x87 80-bit intermediate values, etc.
The precision is not variable; that is simply how VS is formatting it for display. The precision (or lack thereof) is always constant for a given floating-point number.
The MSDN page you linked to talks about the syntax of a floating-point literal in source code. It doesn't define how the number will be displayed by whatever tool you're using. If you print a floating-point number using either printf or std::cout << ..., the language standard specifies how it will be printed.
If you print it in the debugger (which seems to be what you're doing), it will be formatted in whatever way the developers of the debugger decided on.
There are a number of different ways that a given floating-point number can be displayed: 1.0, 1., 10.0E-001, and .1e+1 all mean exactly the same thing. A trailing . does not typically tell you anything about precision. My guess is that the developers of the debugger just used 1232432. rather than 1232432.0 to save space.
If you're seeing the trailing . for some values, and a decimal number with no . at all for others, that sounds like an odd glitch (possibly a bug) in the debugger.
If you're wondering what the actual precision is, for IEEE 32-bit float (the format most computers use these days), the next representable numbers before and after 1232432.0 are 1232431.875 and 1232432.125. (You'll get much better precision using double rather than float.)
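You can verify those neighbors yourself with std::nextafter:

#include <cmath>
#include <cstdio>

int main() {
    float f = 1232432.0f;
    std::printf("%.3f\n", std::nextafter(f, 0.0f));      // 1232431.875
    std::printf("%.3f\n", std::nextafter(f, 2.0f * f));  // 1232432.125
}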

C++: How to Convert From Float to String Without Rounding, Truncation or Padding? [duplicate]

This question already has answers here:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
I am facing a problem and unable to resolve it. Need help from gurus. Here is sample code:
float f=0.01f;
printf("%f",f);
If we check the value of the variable during debugging, f contains the value 0.0099999998, while the output of printf is 0.010000.
a. Is there any way that we may force the compiler to assign the exact same value to a variable of float type?
b. I want to convert a float to a string/character array. How can I make sure that only and exactly the same value is converted to the string/character array? I want to make sure that no zeros are padded, no unwanted values are appended, and no digits change, as in the above example.
It is impossible to accurately represent a base 10 decimal number using base 2 values, except for a very small number of values (such as 0.25). To get what you need, you have to switch from the float/double built-in types to some kind of decimal number package.
You could use boost::lexical_cast in this way:
float blah = 0.01;
string w = boost::lexical_cast<string>( blah );
The variable w will contain the text value 0.00999999978. But I can't see when you would really need it.
It is preferred to use boost::format to accurately format a float as a string. The following code shows how to do it:
float blah = 0.01;
string w = str( boost::format("%d") % blah ); // w contains exactly "0.01" now
Have a look at this C++ reference. Specifically the section on precision:
float blah = 0.01;
printf ("%.2f\n", blah);
There are uncountably many real numbers.
There are only a finite number of values which the data types float, double, and long double can take.
That is, there will be uncountably many real numbers that cannot be represented exactly using those data types.
The reason that your debugger is giving you a different value is well explained in Mark Ransom's post.
Regarding printing a float without rounding or truncation and with fuller precision: you are missing the precision specifier; the default precision for printf is typically 6 fractional digits.
Try the following to get a precision of 10 digits:
float amount = 0.0099999998;
printf("%.10f", amount);
As a side note, a more C++ way (vs. C-style) to do things is with cout:
float amount = 0.0099999998;
cout.precision(10);
cout << amount << endl;
For (b), you could do
std::ostringstream os;
os << f;
std::string s = os.str();
In truth, using the floating-point processor or co-processor or section of the chip itself (most are now integrated into the CPU) will never produce exact mathematical results, only fairly rough accuracy. For more accurate results, you could consider defining a class "DecimalString", which uses nybbles as decimal characters and symbols, and attempt to mimic base-10 mathematics using strings. Depending on how long you want to make the strings, you could even do away with the exponent part altogether: a string of 256 characters could represent 1x10^-254 up to 1x10^+255 in straight decimal using actual ASCII (shorter if you want a sign), but this may prove significantly slower. You could speed this up by reversing the digit order, so that from left to right they read
units, tens, hundreds, thousands, ...
Simple example
e.g. "0021" becomes 1200
This would need "shifting" left and right to make the decimal points line up before these routines run. The best bet is to start with the ADD and SUB functions, as you will then build on them in the MUL and DIV functions. If you are on a large machine, you could make the strings theoretically as long as your heart desires!
Equally, you could use sprintf (from stdio.h) and the ecvt and fcvt functions (from stdlib.h, or at least, there they should be!).
int sprintf(char* dst,const char* fmt,...);
char *ecvt(double value, int ndig, int *dec, int *sign);
char *fcvt(double value, int ndig, int *dec, int *sign);
sprintf returns the number of characters it wrote to the string, for example
float f = 12.00f;
char buffer[32];
sprintf(buffer, "%4.2f", f); // returns 5 (the length of "12.00"); on error it returns a negative value
ecvt and fcvt return pointers to static char locations containing the null-terminated decimal representations of the numbers, with no decimal point, most significant digit first; the offset of the decimal point is stored in dec, and the sign in "sign" (1 = -, 0 = +). ndig is the number of significant digits to store. If dec < 0 then you have to pad with -dec zeros prior to the decimal point. If you are unsure, and you are not working on a Windows 7 system (which sometimes will not run old DOS 3 programs), look for Turbo C version 2 for DOS 3; there are still one or two downloads available. It is a relatively small Borland program, a compact DOS C/C++ editor/compiler that even comes with TASM, the 16-bit 386/486 assembler. These functions are covered in its help files, as are many other useful nuggets of information.
All three routines are in "stdlib.h", or should be, though I have found that on Visual Studio 2010 they are anything but standard, often overloaded with functions dealing with wide characters and asking you to use its own specific functions instead... "So much for the standard library," I mutter to myself almost each and every time. "Maybe they ought to get a better dictionary!"
You would need to consult your platform standards to determine the correct format. You would need to display it as a*b^c, where 'a' is the integral component that holds the sign, 'b' is implementation-defined (likely fixed by a standard), and 'c' is the exponent used for that number.
Alternatively, you could just display it in hex, it'd mean nothing to a human, though, and it would still be binary for all practical purposes. (And just as portable!)
To answer your second question:
it IS possible to exactly and unambiguously represent floats as strings. However, this requires a hexadecimal representation. For instance, 1/16 = 0.1 and 10/16 is 0.A.
With hex floats, you can define a canonical representation. I'd personally use a fixed number of digits representing the underlying number of bits, but you could also decide to strip trailing zeroes. There's no confusion possible on which trailing digits are zero.
Since the representation is exact, the conversions are reversible: f==hexstring2float(float2hexstring(f))
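A sketch of such a round trip with the standard C++11 hexfloat facilities (strtof does the reverse parse, since it accepts hexadecimal floats):

#include <cassert>
#include <cstdlib>
#include <iostream>
#include <sstream>

int main() {
    float f = 0.1f;
    std::ostringstream os;
    os << std::hexfloat << f;                           // e.g. 0x1.99999ap-4
    float g = std::strtof(os.str().c_str(), nullptr);   // hex parse is exact
    assert(f == g);  // f == hexstring2float(float2hexstring(f))
    std::cout << os.str() << '\n';
}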