Maximize floating point precision in ASCII with a fixed field width - c++

I'm trying to write some floating point data to a scientific file format that specifies ASCII fields of a fixed width (in this case, 16 characters). I'd like to maximize the precision written to the file, but every number written must fit in the fixed field limit.
Simply calling std::setprecision(12), for example, results in a loss of precision for M_PI and field overflow for M_PI / 10000, even when combined with std::setw(16). Similar arguments can be made against std::fixed.
The best I've been able to come up with is
#include <iomanip>
#include <sstream>
#include <string>

constexpr int fieldWidth = 16;
// 7 characters reserved: sign, leading digit, '.', 'e', exponent sign, 2 exponent digits
std::string formatField(double value) {
    std::stringstream ss;
    ss << std::setprecision(fieldWidth - 7) << std::setw(fieldWidth) << std::scientific << value;
    return ss.str();
}
The assumption here is that no number is greater than 1e100 or too close to zero (e.g., 1e-100). It is also less than optimal for positive numbers less than 100, since it cedes a character to the sign and two more to the exponent.
Can I improve on my current solution? Improvements would include 1) Better precision across a range of numbers and/or 2) A stronger guarantee that the field width won't be violated. Solutions using boost are welcome as well.
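One possible refinement, not from the original post: assuming a toolchain whose <charconv> implements floating-point std::to_chars (C++17; e.g. MSVC or GCC 11+), you can try increasing precisions and keep the last scientific rendering that still fits the field. Candidates that overflow are rejected outright, so the width can never be violated, and values with short exponents automatically receive extra significant digits. A minimal sketch:

#include <charconv>
#include <string>
#include <system_error>

std::string formatFieldGreedy(double value, int width = 16) {
    char buf[64];
    std::string best;
    for (int digitsAfterPoint = 0; digitsAfterPoint <= 16; ++digitsAfterPoint) {
        auto [ptr, ec] = std::to_chars(buf, buf + sizeof buf, value,
                                       std::chars_format::scientific,
                                       digitsAfterPoint);
        if (ec != std::errc{}) break;
        std::string candidate(buf, ptr);
        if (candidate.size() > static_cast<std::size_t>(width)) break;
        best = std::move(candidate);   // longest rendering that still fits
    }
    if (best.empty())                  // nothing fits at all; flag the field
        return std::string(width, '*');
    return std::string(width - best.size(), ' ') + best;  // right-align
}

For M_PI this yields 3.1415926536e+00, one more significant digit than the fixed seven-character reservation allows, and the explicit length check gives a hard guarantee against overflow.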

Related

float number to string converting implementation in STD

I've run into a curious issue. Look at this simple code:
#include <cstdio>
#include <iostream>

int main(int argc, char **argv) {
    char buf[1000];
    // snprintf_l is a BSD extension; _LIBCPP_GET_C_LOCALE is libc++-internal
    snprintf_l(buf, sizeof(buf), _LIBCPP_GET_C_LOCALE, "%.17f", 0.123e30f);
    std::cout << "WTF?: " << buf << std::endl;
}
The output looks quite weird:
123000004117574256822262431744.00000000000000000
My question is: how is this implemented? Can someone show me the original code? I did not find it; or maybe it's too complicated for me.
I've tried to reimplement the same float-to-string transformation in Java but failed. Even when I extracted the exponent and fraction parts separately and summed the fraction contributions in a loop, I always got zeros instead of the digits "...822262431744". When I tried to continue summing fraction contributions beyond the 23 bits of a float, I hit another issue: how many fraction bits do I need to collect? And why does the original code stop where it does, instead of continuing until the requested scale is exhausted?
So I really do not understand the basic logic of how this is implemented. I've tried defining really big numbers (e.g. 0.123e127f), and it generates a huge number in decimal format. The number has much higher precision than a float can have. That looks like an issue, because the string representation contains something that a float cannot.
Please read the documentation:
printf, fprintf, sprintf, snprintf, printf_s, fprintf_s, sprintf_s, snprintf_s - cppreference.com
The format string consists of ordinary multibyte characters (except %), which are copied unchanged into the output stream, and conversion specifications. Each conversion specification has the following format:
introductory % character
...
(optional) . followed by integer number or *, or neither that specifies precision of the conversion. In the case when * is used, the precision is specified by an additional argument of type int, which appears before the argument to be converted, but after the argument supplying minimum field width if one is supplied. If the value of this argument is negative, it is ignored. If neither a number nor * is used, the precision is taken as zero. See the table below for exact effects of precision.
....
Conversion specifier: f, F
Explanation: converts floating-point number to the decimal notation in the style [-]ddd.ddd. Precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6. In the alternative implementation the decimal point character is written even if no digits follow it. For infinity and not-a-number conversion style see notes.
Expected argument type: double
So with f you force the form ddd.ddd (no exponent), and with .17 you force 17 digits after the decimal separator. With such a big value, the printed output is bound to look that odd.
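To make the contrast concrete, a minimal sketch (the exact digits assume an implementation that converts exactly, such as glibc): the same float printed with %f at precision 17 versus %e in scientific notation:

#include <cstdio>

int main() {
    float f = 0.123e30f;
    std::printf("%.17f\n", f); // 123000004117574256822262431744.00000000000000000
    std::printf("%.8e\n", f);  // 1.23000004e+29
}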
Finally I've figured out the difference between the Java float -> decimal -> string conversion and the C++ float -> string (decimal) conversion. I did not find the original source code, but I replicated the same logic in Java to make it clear. I think the code explains everything:
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

static String floatToPlainString(float value) {
    // The context size can be derived from the largest possible float
    // (about 40 integral digits) plus the scale, 17 for me, i.e. 57
    MathContext context = new MathContext(57, RoundingMode.HALF_UP);
    BigDecimal divisor = BigDecimal.valueOf(2);
    int tmp = Float.floatToRawIntBits(value);
    boolean sign = tmp < 0;
    tmp <<= 1; // drop the sign bit
    // there might be a NaN value; this code does not support it
    int exponent = (tmp >>> 24) - 127;
    tmp <<= 8; // drop the exponent bits
    int mask = 1 << 23;
    int fraction = mask | (tmp >>> 9); // restore the implicit leading 1
    // At this point we have all parts of the float: sign, exponent, and
    // fraction. Build the mantissa as an exact decimal in [1, 2)
    BigDecimal mantissa = BigDecimal.ZERO;
    for (int i = 0; i < 24; i++) {
        if ((fraction & mask) == mask) {
            // I'm not sure about speed; division at each iteration might be faster than pow
            mantissa = mantissa.add(divisor.pow(-i, context));
        }
        mask >>>= 1;
    }
    // This was the core line where I was losing accuracy, because of the context
    BigDecimal decimal = mantissa.multiply(divisor.pow(exponent, context), context);
    String str = decimal.setScale(17, RoundingMode.HALF_UP).toPlainString();
    // Add the minus sign manually: Java drops it if the scaled value becomes 0,
    // while the C++ version keeps it
    if (sign) {
        str = "-" + str;
    }
    return str;
}
Maybe this topic is useless. Who really needs the same implementation C++ has? But at least this code keeps all the precision of the float, compared to the most popular way of converting a float to a decimal string:
return BigDecimal.valueOf(1.23e30f).setScale(17, RoundingMode.HALF_UP).toPlainString();
The C++ implementation you are using uses the IEEE-754 binary32 format for float. In this format, the closest representable value to 0.123·10^30 is 123,000,004,117,574,256,822,262,431,744, which is represented in binary32 as +13,023,132·2^73. So 0.123e30f in the source code yields the number 123,000,004,117,574,256,822,262,431,744. (Because the number is represented as +13,023,132·2^73, we know its value is exactly that, namely 123,000,004,117,574,256,822,262,431,744, even though the digits "123000004117574256822262431744" are not stored directly.)
Then, when you format it with %.17f, your C++ implementation prints the exact value faithfully, yielding “123000004117574256822262431744.00000000000000000”. This accuracy is not required by the C++ standard, and some C++ implementations will not do the conversion exactly.
The Java specification also does not require formatting of floating-point values to be exact, at least in some formatting operations. (I am going from memory and some supposition here; I do not have a citation at hand.) It allows, perhaps even requires, that only a certain number of correct digits be produced, after which zeros are used if needed for positioning relative to the decimal point or for the requested format.
The number has much higher precision than a float can have.
For any value represented in the float format, that value has infinite precision. The number +13,023,132·2^73 is exactly +13,023,132·2^73, which is exactly 123,000,004,117,574,256,822,262,431,744, to infinite precision. The precision the format has for representing numbers affects only which numbers it can represent, not how precisely it represents the numbers that it does represent.
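A small check of this point (the exact digits assume an implementation that converts exactly, as glibc and libc++ do): printing the float with %.0f reveals every digit of the one value it stores:

#include <cstdio>

int main() {
    float f = 0.123e30f; // nearest binary32 value to 0.123 * 10^30
    std::printf("%.0f\n", f); // 123000004117574256822262431744
}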

Elegant solution to remove trailing 0's with precision set

Is there any elegant solution using the std C++ or Boost libraries to output a double to std::cout in a way that the following conditions are met:
scientific notation is disabled
the precision for the decimal part is 6
however, trailing 0's (for the decimal part) are not printed out
For example:
double d = 200000779998;
std::cout << [something] << d;
should print out exactly 200000779998. [something] should possibly be a noexcept combination of some existing manipulators.
This is not a solution to the problem:
std::cout << std::setprecision(6) << std::fixed << d;
because it prints out 200000779998.000000 with trailing 0's
Instead of using the fixed manipulator, you can try to use (abuse?) defaultfloat. As far as I understand, it chooses either fixed or scientific based on the ability to put the number within the specified precision. As a result you can set the precision to the number of digits of the integral part + the requested fractional precision (6 in your case).
double d = 200000779998;
std::cout << std::setprecision(integralDigits(d) + 6) << d << std::endl;
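integralDigits is not a standard function; a minimal runnable sketch of the whole idea, with one hypothetical implementation of that helper, might look like this:

#include <cmath>
#include <iomanip>
#include <iostream>

// Hypothetical helper: number of digits before the decimal point
int integralDigits(double d) {
    return std::abs(d) < 1 ? 1 : 1 + static_cast<int>(std::log10(std::abs(d)));
}

int main() {
    double d = 200000779998;
    std::cout << std::setprecision(integralDigits(d) + 6) << d << '\n'; // 200000779998
}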
Hard to prove a negative, but I would assume no.
The requirements are inconsistent with any normal use. Space efficiency dictates a binary format. 6 digits (decimal) of precision suggests a format intended for human readers, who can't churn through lots of data. And humans have no issue dealing with a consistent 6 digit format.
So, you're basically targeting a format that has no obvious audience, and that is why I would be surprised if there is support for that.

C++ String to float with precision

I have a file with number readings (for example, 5.513208E-05 and 1.146383E-05).
I read the file and store the entries in a temporary string.
After that I convert the temporary string into a float variable (which I store in a multidimensional array).
I use the code below to convert.
getline(infile, temporary_string, ',');
Array[i][0] = ::atof(temporary_string.c_str());
getline(infile, temporary_string);
Array[i][1] = ::atof(temporary_string.c_str());
The problem is that when I print the floats to the screen I get
5.51321e-05 1.14638e-05
instead of
5.513208E-05 1.146383E-05
How can I get the precise numbers stored?
You don't specify precision when you read or convert the string. Instead you set the precision when you output the value:
std::cout << std::setprecision(3) << 1.2345 << '\n';
The above will produce the following output:
1.23
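Applied to the values in the question, a small sketch (assuming the file uses six fractional digits of scientific notation): parsing with atof/strtod into a double and writing back with std::scientific and std::setprecision(6) reproduces the original text:

#include <iomanip>
#include <iostream>

int main() {
    double v = 5.513208e-05; // as parsed by atof/strtod from the file
    std::cout << std::scientific << std::setprecision(6) << v << '\n';
    // prints 5.513208e-05
}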
Ensure you have double Array[][], not float. A text representation (base 10) is in general only approximated by a binary floating point number (base 2), but with luck the approximated number produced by atof prints back as the same text when the same format is used. In general one does not do much calculation on such data and, on output, uses a reduced precision via setprecision or printf formats.
Every floating-point representation of numbers has limited precision. In particular, float has 24 bits (1 implicit + 23 stored) for its binary mantissa, thus implying a precision of roughly seven decimal digits.
If you need more precision for the stored number, you may wish to consider using double instead of float. On normal PCs, double has 53 bits (1+52) for the binary mantissa, thus allowing a 15-decimal digit precision.
But remember that there's also a problem when those numbers are output. I think the default precision for both printf() and std::ostream is only 6 digits, both for float and for double, unless you specify otherwise. There is no point, however, in demanding a higher precision during output than what the data type provides. So, even though you can say printf("%0.30g", some_float), the extra 23 digits beyond the seven actually supported by the data type might not really produce useful information.
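A short sketch of that limit (the exact trailing digits are implementation-dependent): printing a float and a double at high precision shows where each type's meaningful digits end:

#include <cstdio>

int main() {
    float  f = 5.513208e-05f;
    double d = 5.513208e-05;
    std::printf("%.15e\n", f); // digits beyond ~7 significant are noise for a float
    std::printf("%.15e\n", d); // a double carries ~15-16 meaningful digits
}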

stringstream setprecision and floating point formatting

#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

int main() {
    double value = 2369.000133699; // actually stored as 2369.000133698999900
    const int left = std::abs(value) < 1 ? 1 : int(1 + std::log10(std::abs(value)));
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::digits10 - left) << std::fixed << value;
    std::string str = out.str(); // str = "2369.00013369900"
    std::ostringstream out2;
    out2 << std::setprecision(std::numeric_limits<double>::digits10) << std::fixed << value;
    std::string str2 = out2.str(); // str2 = "2369.000133698999900"
}
I'm wondering how std::stringstream's setprecision works when formatting floating-point numbers.
It seems that if the precision argument is greater than 16 minus the number of non-fractional digits, this leads to formatting of the form "2369.000133698999900" instead of the "nice" "2369.00013369900".
How does std::stringstream know that "8999900" must collapse to a single "9", even though I didn't tell it to do the rounding on the 8 (as I would by passing 12 to setprecision)? And why doesn't it do this for precision arguments greater than 12?
Formatting binary floating points as decimal values is fairly tricky. The underlying problem is that binary floating points cannot represent decimal values accurately: even a simple number like 0.1 cannot be represented exactly using binary floating points. That is, the actual value represented is slightly different. With clever algorithms for reading ("Bellerophon") and formatting ("Dragon4"; these are the names from the original papers, and improvements of both algorithms are used in practice), floating point numbers can nevertheless be used to transport decimal values. However, when asking the algorithm to format more decimal digits than the type can actually hold, i.e., more than std::numeric_limits<T>::digits10, it will happily do so, [partially] revealing the value it is actually storing.
The formatting algorithm ("Dragon4") assumes that the value it is given is the value closest to the original representable with the floating point type. It uses this information together with an error estimate for the current position to determine the correct digits. The algorithm itself is non-trivial and I haven't fully understood how it works. It is described in the paper "How to Print Floating-Point Numbers Accurately" by Guy L. Steele Jr. and Jon L. White.
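The practical consequence, as a small sketch: up to std::numeric_limits<double>::digits10 (15) significant digits, the decimal -> binary -> decimal round trip is clean; beyond that, the stored binary neighbor shows through (exact trailing digits assume IEEE 754 doubles):

#include <iomanip>
#include <iostream>
#include <limits>

int main() {
    double value = 2369.000133699;
    std::cout << std::setprecision(15) << value << '\n'; // 2369.000133699
    std::cout << std::setprecision(17) << value << '\n'; // 2369.0001336989999
}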

C++ internal representation of double/float

I am unable to understand why C++ division behaves the way it does. I have a simple program which divides 1 by 10 (using VS 2003)
double dResult = 0.0;
dResult = 1.0/10.0;
I expect dResult to be 0.1; however, I get 0.10000000000000001.
Why do I get this value? What is the problem with the internal representation of double/float?
How can I get the correct value?
Thanks.
Because almost all modern processors use binary floating-point, which cannot exactly represent 0.1 (there is no way to represent 0.1 as m · 2^e with integer m and e).
If you want to see the "correct value", you can print it out with e.g.:
printf("%.1f\n", dResult);
double and float are not identical to the real numbers: there are infinitely many real values, but only a finite number of bits to represent them in a double/float.
You can read further: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The ubiquitous IEEE 754 floating point format expresses floating point numbers in scientific notation base 2, with a finite mantissa. Since a fraction like 1/5 (and hence 1/10) does not have a representation with finitely many digits in binary scientific notation, you cannot represent the value 0.1 exactly. More generally, the only values that can be represented exactly are those that fit precisely into binary scientific notation with a mantissa of a few (e.g. 24, 53, or 64) binary digits and a suitably small exponent.
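A sketch that makes the question's output concrete (assuming IEEE 754 doubles): the double nearest to 0.1 is slightly above it, and asking for enough digits reveals that neighbor, while modest precision rounds back to the text you typed:

#include <cstdio>

int main() {
    // The double nearest to 0.1 is
    // 0.1000000000000000055511151231257827021181583404541015625
    std::printf("%.17g\n", 0.1); // 0.10000000000000001
    std::printf("%.1f\n", 0.1);  // 0.1
}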
Working with integers, floats, and doubles can be tricky; it depends on your purpose. If you only want to display values in a nice format, you can play with the C++ I/O manipulators: precision, showpoint, noshowpoint. If you are trying to do precise computation with numeric methods, you may have to use a library for accurate representation. If you are multiplying lots of small and large numbers, you may have to resort to log transformations. Here is a small test:
#include <iostream>
using namespace std;

int main() {
    float x = 1.0000001;
    cout << x << endl;
    float y = 9.9999999999999;
    cout << "using default io format " << y / x << endl;
    cout << showpoint << "using showpoint " << y / x << endl;
    y = 9.9999;
    cout << "fewer 9 default C++ " << y / x << endl;
    cout << showpoint << "fewer 9 showpoint" << y / x << endl;
}

The output is:
1
using default io format 10
using showpoint 10.0000
fewer 9 default C++ 9.99990
fewer 9 showpoint9.99990
In the special case where you want to use a double (which may be the result of some complicated algorithm) to represent integer numbers, you have to figure out a proper conversion method. Once I had a situation where I wanted a single double value to store values of two kinds, -1, +1, or a fraction in (0, 1), to make my code more memory efficient (and faster; heavy memory use tends to reduce performance). It is a little tricky to distinguish between +1 and values < 1. In my case I knew the values < 1 had a resolution of only 1/500, so I could safely use floor(val + 0.000001) to get back the 1 that I had originally stored.
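A minimal sketch of that trick under the stated assumption (fractions have resolution 1/500, so an epsilon far below 1/500 cannot promote a genuine fraction to 1):

#include <cmath>
#include <iostream>

int main() {
    double stored = 0.9999999999999;  // meant to be +1, off by accumulated error
    double fraction = 0.498;          // a genuine fraction, resolution 1/500
    std::cout << std::floor(stored + 0.000001) << '\n';   // 1
    std::cout << std::floor(fraction + 0.000001) << '\n'; // 0
}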