Precise floating-point <-> string conversion in C++

I am looking for a library function to convert floating point numbers to strings, and back again, in C++. The properties I want are that str2num(num2str(x)) == x and that num2str(str2num(s)) == s (as far as possible). The general property is that num2str should produce the simplest decimal number that, when rounded to the nearest representable floating point number, gives you back the original number.
So far I've tried boost::lexical_cast:
double d = 1.34;
string_t s = boost::lexical_cast<string_t>(d);
printf("%s\n", s.c_str());
// outputs 1.3400000000000001
And I've tried std::ostringstream, which seems to work for most values if I do stream.precision(16). However, at precision 15 it truncates, and at 17 it gives ugly output for things like 1.34. I don't think precision 16 is guaranteed to have any of the properties I require, and I suspect it breaks down for many numbers.
Is there a C++ library that has such a conversion? Or is such a conversion function already buried somewhere in the standard library or Boost?
The reason for wanting these functions is to save floating point values to CSV files, and then read them correctly. In addition, I'd like the CSV files to contain simple numbers as far as possible so they can be consumed by humans.
I know that the Haskell read/show functions already have the properties I am after, as do the BSD C libraries. The standard references for string <-> double conversion are a pair of papers from PLDI 1990:
How to Read Floating Point Numbers Accurately, William Clinger
How to Print Floating-Point Numbers Accurately, Guy Steele and Jon White
Any C++ library/function based on these would be suitable.
EDIT: I am fully aware that floating point numbers are inexact representations of decimal numbers, and that the double 1.34 compares equal to 1.3400000000000001. However, as the papers referenced above point out, that's no excuse for choosing to display it as "1.3400000000000001".
EDIT2: This paper explains exactly what I'm looking for: http://drj11.wordpress.com/2007/07/03/python-poor-printing-of-floating-point/

I am still unable to find a library that supplies the necessary code, but I did find some code that does work:
http://svn.python.org/view/python/branches/py3k/Python/dtoa.c?view=markup
By supplying a fairly small number of defines, it's easy to abstract away the Python integration. This code does indeed meet all the properties I outlined.

I think this does what you want, in combination with the standard library's strtod():
#include <stdio.h>
#include <stdlib.h>

int dtostr(char* buf, size_t size, double n)
{
    int prec = 15;
    while(1)
    {
        /* try successively more digits until the string parses back
           to exactly the same double */
        int ret = snprintf(buf, size, "%.*g", prec, n);
        if(prec++ == 18 || n == strtod(buf, 0)) return ret;
    }
}
A simple demo, which doesn't bother to check input words for trailing garbage:
int main(int argc, char** argv)
{
    int i;
    for(i = 1; i < argc; i++)
    {
        char buf[32];
        dtostr(buf, sizeof(buf), strtod(argv[i], 0));
        printf("%s\n", buf);
    }
    return 0;
}
Some example inputs:
% ./a.out 0.1 1234567890.1234567890 17 1e99 1.34 0.000001 0 -0 +INF NaN
0.1
1234567890.1234567
17
1e+99
1.34
1e-06
0
-0
inf
nan
I imagine your C library needs to conform to some sufficiently recent version of the standard in order to guarantee correct rounding.
I'm not sure I chose the ideal bounds on prec, but I imagine they must be close. Maybe they could be tighter? (17 significant digits are known to suffice to round-trip an IEEE double, so stopping at 18 is safe.) Similarly, I think 32 characters for buf are always sufficient, though not always necessary. Obviously this all assumes 64-bit IEEE doubles. It might be worth checking that assumption with some kind of clever preprocessor directive -- sizeof(double) == 8 would be a good start.
The exponent is a bit messy, but it wouldn't be difficult to fix after breaking out of the loop but before returning, perhaps using memmove() or suchlike to shift things leftwards. I'm pretty sure there's guaranteed to be at most one + and at most one leading 0, and I don't think they can even both occur at the same time for prec >= 10 or so.
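For instance, here is a minimal sketch of that cleanup (tidy_exponent is my name, not a library function; it assumes buf already holds output from dtostr() above):
#include <string.h>
#include <ctype.h>

/* Strip the '+' and any leading zeros from the exponent,
 * e.g. "1e+99" -> "1e99" and "1e-06" -> "1e-6". */
void tidy_exponent(char* buf)
{
    char* e = strchr(buf, 'e');
    if(!e) return;
    char* dst = e + 1;                      /* where the cleaned exponent starts */
    char* src = dst;
    if(*src == '+') src++;                  /* drop an explicit plus sign */
    else if(*src == '-') { dst++; src++; }  /* keep the minus sign */
    while(*src == '0' && isdigit((unsigned char)src[1]))
        src++;                              /* drop leading zeros, keep the last digit */
    memmove(dst, src, strlen(src) + 1);     /* shift left, including the NUL */
}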
Likewise if you'd rather ignore signed zero, as JavaScript does, you can easily handle it up front, e.g.:
if(n == 0) return snprintf(buf, size, "0");
I'd be curious to see a detailed comparison with that 3000-line monstrosity you dug up in the Python codebase. Presumably the short version is slower, or less correct, or something? It would be disappointing if it were neither....

The reason for wanting these functions is to save floating point values to CSV files, and then read them correctly. In addition, I'd like the CSV files to contain simple numbers as far as possible so they can be consumed by humans.
You cannot have a conversion double → string → double and at the same time have the string be human readable.
You need to choose between an exact conversion and a human readable string. This is the distinction between max_digits10 and digits10:
difference explained by stackoverflow
digits10
max_digits10
Here is an implementation of num2str and str2num with two different contexts: from_double (conversion double → string → double) and from_string (conversion string → double → string):
#include <iostream>
#include <limits>
#include <iomanip>
#include <sstream>

namespace from_double
{
    std::string num2str(double d)
    {
        std::stringstream ss;
        ss << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
        return ss.str();
    }

    double str2num(const std::string& s)
    {
        double d;
        std::stringstream ss(s);
        ss >> std::setprecision(std::numeric_limits<double>::max_digits10) >> d;
        return d;
    }
}

namespace from_string
{
    std::string num2str(double d)
    {
        std::stringstream ss;
        ss << std::setprecision(std::numeric_limits<double>::digits10) << d;
        return ss.str();
    }

    double str2num(const std::string& s)
    {
        double d;
        std::stringstream ss(s);
        ss >> std::setprecision(std::numeric_limits<double>::digits10) >> d;
        return d;
    }
}
int main()
{
    double d = 1.34;
    if (from_double::str2num(from_double::num2str(d)) == d)
        std::cout << "Good for double -> string -> double" << std::endl;
    else
        std::cout << "Bad for double -> string -> double" << std::endl;

    std::string s = "1.34";
    if (from_string::num2str(from_string::str2num(s)) == s)
        std::cout << "Good for string -> double -> string" << std::endl;
    else
        std::cout << "Bad for string -> double -> string" << std::endl;

    return 0;
}

Actually I think you'll find that 1.34 IS 1.3400000000000001. Floating point numbers are not precise. You can't get around this. 1.34f is 1.34000000333786011 for example.

As stated by others, floating-point numbers are not that accurate; it's an artifact of how they store the value.
What you are really looking for is a decimal number representation.
Basically this uses an integer to store the number, with a fixed accuracy after the decimal point.
A quick Google got this:
http://www.codeproject.com/KB/mcpp/decimalclass.aspx
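To make the idea concrete, here is a minimal fixed-point sketch of my own (not the code from the linked article), storing a non-negative value as an integer count of thousandths:
#include <cstdint>
#include <cstdio>

// Sketch only: a value stored as thousandths, so 1.34 is stored exactly as 1340.
struct Decimal
{
    std::int64_t thousandths;
    static constexpr std::int64_t scale = 1000;

    Decimal operator+(Decimal o) const { return Decimal{thousandths + o.thousandths}; }

    void print() const
    {
        std::printf("%lld.%03lld\n",
                    static_cast<long long>(thousandths / scale),
                    static_cast<long long>(thousandths % scale));
    }
};

int main()
{
    Decimal a{1340};   // exactly 1.340
    Decimal b{5};      // exactly 0.005
    (a + b).print();   // prints 1.345 with no rounding error
}
A real decimal class adds sign handling, scaling on multiply/divide, and overflow checks; the point is only that integer arithmetic keeps decimal fractions exact.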

Related

bulletproof use of from_chars()

I have some literal strings which I want to convert to integers and even doubles, in bases 16, 10, 8, and 2.
I wonder about the behavior of std::from_chars(): I try to convert, and the error code inside the returned from_chars_result holds success, even when the conversion did not consume the whole string, as shown here:
#include <iostream>
#include <string_view>
#include <charconv>

using namespace std::literals::string_view_literals;

int main()
{
    auto const buf = "01234567890ABCDEFG.FFp1024"sv;
    double d;
    auto const out = std::from_chars(buf.begin(), buf.end(), d, std::chars_format::hex);
    if(out.ec != std::errc{} || out.ptr != buf.end())
    {
        std::cerr << buf << '\n'
                  << std::string(std::distance(buf.begin(), out.ptr), ' ') << "^- here\n";
        auto const ec = std::make_error_code(out.ec);
        std::cerr << "err: " << ec.message() << '\n';
        return 1;
    }
    std::cout << d << '\n';
}
gives:
01234567890ABCDEFG.FFp1024
^- here
err: Success
In my use case I'll check the character set beforehand, but I'm not sure which checks are needed to make it bulletproof. Is this behavior expected (maybe my English isn't sufficient, or I didn't read carefully enough)? I've never seen such checks on the result iterators in blogs etc.
The other question is related to different base like 2 and 8. Base of 10 and 16 seems to be supported - what would be the way for the other two bases?
Addendum/Edit:
Bulletproof here means that the string can contain nasty things. The obvious thing for me is that 'G' is not a hex character, but I would have expected an appropriate error code of some kind! The comparison out.ptr != buf.end() I've never seen in blogs (or I didn't read the right ones :)
If I enter a crazily long hex float, at least a "numerical result out of range" error comes up.
By bulletproof I also mean that I can reject such impossible strings by length up front, for example, so that I can save myself the call to from_chars(); for floats/doubles and integers I would compare the length (via strlen) against digits10 from std::numeric_limits.
The from_chars utility is designed to convert the first number it finds in the string and to return a pointer to the point where it stopped. This allows you to parse strings like "42 centimeters" by first converting the number and then parsing the rest of the string yourself for what comes after it.
The comparison out.ptr != buf.end() I've never seen in blogs (or I didn't read the right ones :)
If you know that the entire string should be a number, then checking that the pointer in the result points to the end of the string is the normal way to ensure that from_chars read the entire string.
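As a concrete illustration, here is a minimal sketch (parse_all is a hypothetical helper name, not a standard function) that treats anything other than a full-string parse as failure:
#include <charconv>
#include <optional>
#include <string_view>

std::optional<double> parse_all(std::string_view s,
                                std::chars_format fmt = std::chars_format::general)
{
    double d{};
    auto const first = s.data();
    auto const last  = s.data() + s.size();
    auto const [ptr, ec] = std::from_chars(first, last, d, fmt);
    // Success means: no error code AND every character was consumed.
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;
    return d;
}
On the second question: the floating-point overloads of from_chars only accept a std::chars_format (scientific, fixed, hex, general), so bases 2 and 8 are not available for doubles; the integer overloads, by contrast, take an explicit base argument anywhere from 2 to 36.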

Issues saving double as binary in c++

In my simulation code for a particle system, I have a class defined for particles, and each particle has a pos property containing its position as double pos[3];, since there are 3 coordinate components per particle. With the particle objects defined by particles = new Particle[npart]; (as we have npart particles), the y-component of the 2nd particle, for example, would be accessed with double dummycomp = particles[1].pos[1];
To save the particles to file before using binary, I would use (saved as txt, with float precision of 10 and one particle per line):
#include <iostream>
#include <fstream>

ofstream outfile("testConfig.txt", ios::out);
outfile.precision(10);
for (int i=0; i<npart; i++){
    outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << endl;
}
outfile.close();
But now, to save space, I am trying to save the configuration as a binary file, and my attempt, inspired from here, has been as follows:
ofstream outfile("test.bin", ios::binary | ios::out);
for (int i=0; i<npart; i++){
    outfile.write(reinterpret_cast<const char*>(particles[i].pos), streamsize(3*sizeof(double)));
}
outfile.close();
but I am facing a segmentation fault when trying to run it. My questions are:
Am I doing something wrong with reinterpret_cast or rather in the argument of streamsize()?
Ideally, it would be great if the saved binary format could also be read within Python, is my approach (once fixed) allowing for that?
working example for the old saving approach (non-binary):
#include <iostream>
#include <fstream>
using namespace std;

class Particle {
public:
    double pos[3];
};

int main() {
    int npart = 2;
    Particle particles[npart];

    // initializing the positions:
    particles[0].pos[0] = -74.04119568;
    particles[0].pos[1] = -44.33692582;
    particles[0].pos[2] = 17.36278231;
    particles[1].pos[0] = 48.16310086;
    particles[1].pos[1] = -65.02325252;
    particles[1].pos[2] = -37.2053818;

    ofstream outfile("testConfig.txt", ios::out);
    outfile.precision(10);
    for (int i=0; i<npart; i++){
        outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << endl;
    }
    outfile.close();
    return 0;
}
And in order to save the particle positions as binary, substitute the saving portion of the above sample with:
ofstream outfile("test.bin", ios::binary | ios::out);
for (int i=0; i<npart; i++){
    outfile.write(reinterpret_cast<const char*>(particles[i].pos), streamsize(3*sizeof(double)));
}
outfile.close();
2nd addendum: reading the binary in Python
I managed to read the saved binary in python as follows using numpy:
data = np.fromfile('test.bin', dtype=np.float64)
data
array([-74.04119568, -44.33692582, 17.36278231, 48.16310086,
-65.02325252, -37.2053818 ])
But given the doubts cast in the comments regarding the non-portability of the binary format, I am not confident this way of reading it in Python will always work! It would be really neat if someone could elucidate the reliability of such an approach.
The trouble is that the base 10 representation of a double in ASCII is lossy unless you print enough digits, and is not guaranteed to give you back the same value (especially if you only use 10 digits). You need all std::numeric_limits<double>::max_digits10 significant digits to guarantee a round trip, and even then the printed decimal is only an approximation of the stored binary value, just one close enough to round-trip.
The other issue you have is that the binary representation of a double is not standardized, so using it is very fragile and can lead to code breaking very easily. Simply changing the compiler or compiler settings can result in a different double format, and when changing architectures you have absolutely no guarantees.
You can serialize it to text in a non-lossy representation by using the hex format for doubles.
// Setting both fixed and scientific in floatfield selects hex output (C++11):
stream.setf(std::ios_base::fixed | std::ios_base::scientific, std::ios_base::floatfield);
stream << particles[i].pos[0];
// The std::hexfloat manipulator (also C++11) does the same thing:
stream << std::hexfloat << particles[i].pos[0];
This has the effect of printing the value the same as "%a" in printf() in C, which prints the string as "hexadecimal floating point, lowercase". The mantissa is written in hex and the exponent as a decimal power of two, in a very specific format. Since the underlying representation is binary, these values can be represented exactly in hex, providing a non-lossy way of transferring data between systems. It also omits leading and trailing zeros, so for a lot of numbers it is relatively compact.
On the Python side, this format is also supported. You should be able to read the value as a string and then convert it to a float using float.fromhex().
see: https://docs.python.org/3/library/stdtypes.html#float.fromhex
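As a quick sanity check, here is a minimal round-trip sketch of my own (C++11). Note that reading hexfloat back via stream extraction is not reliably supported, so strtod(), which accepts the "%a"-style format, handles the reverse direction; Python's float.fromhex() accepts the same string.
#include <cstdlib>
#include <iostream>
#include <sstream>

int main()
{
    double before = -74.04119568;

    std::ostringstream out;
    out << std::hexfloat << before;   // something like "-0x1.28...p+6"

    // strtod() understands the hex float format exactly.
    double after = std::strtod(out.str().c_str(), nullptr);

    std::cout << out.str() << '\n'
              << std::boolalpha
              << "exact round trip: " << (before == after) << '\n';  // true
}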
But your goal is to save space:
But now, to save space, I am trying to save the configuration as a binary file.
I would ask: do you really need to save space? Are you running in a low-powered, low-resource environment? Sure, then space saving can definitely be a thing (but such environments are rare nowadays, though they do exist).
But it seems like you are running some form of particle simulation. This does not scream low-resource use case. Even if you have terabytes of data, I would still go with a portable, easy-to-read format over binary, preferably one that is not lossy. Storage space is cheap.
I suggest using a library instead of writing a serialization/deserialization routine from scratch. I find cereal really easy to use, maybe even easier than boost::serialization. It reduces the opportunity for bugs in your own code.
In your case I'd go about serializing doubles like this using cereal:
#include <cereal/archives/binary.hpp>
#include <fstream>

int main() {
    std::ofstream outfile("test.bin", std::ios::binary);
    cereal::BinaryOutputArchive out(outfile);

    double x, y, z;
    x = y = z = 42.0;
    out(x, y, z);
}
To deserialize them you'd use:
#include <cereal/archives/binary.hpp>
#include <fstream>

int main() {
    std::ifstream infile("test.bin", std::ios::binary);
    cereal::BinaryInputArchive in(infile);

    double x, y, z;
    in(x, y, z);
}
You can also serialize/deserialize whole std::vector<double>s in the same fashion. Just add #include <cereal/types/vector.hpp> and use in / out like in the given example on a single std::vector<double> instead of multiple doubles.
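For instance, a minimal sketch of that vector variant (same test.bin file name as above):
#include <cereal/archives/binary.hpp>
#include <cereal/types/vector.hpp>
#include <fstream>
#include <vector>

int main() {
    std::vector<double> pos{-74.04119568, -44.33692582, 17.36278231};
    {
        std::ofstream outfile("test.bin", std::ios::binary);
        cereal::BinaryOutputArchive out(outfile);
        out(pos);        // writes the size followed by the elements
    }

    std::vector<double> restored;
    std::ifstream infile("test.bin", std::ios::binary);
    cereal::BinaryInputArchive in(infile);
    in(restored);        // reads them back into the vector
}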
Ain't that swell.
Edit
In a comment you asked, whether it'd be possible to read a created binary file like that with Python.
Answer:
Serialized binary files aren't really meant to be very portable (things like endianness could play a role here). You could easily adapt the example code I gave you to write a JSON file (another advantage of using a library) and read that format in Python.
Oh and cereal::JSONOutputArchive has an option for setting precision.
Just curious if you ever investigated the idea of converting your data to vectored coordinates instead of Cartesian X,Y,Z? It would seem that this would potentially reduce the size of your data by about 30%: Two coordinates instead of three, but perhaps needing slightly higher precision in order to convert back to your X,Y,Z.
The vectored coordinates could still be further optimized by using the various compression techniques above (text compression or binary conversion).

Parse and convert denorm numbers?

In C++, we can store denormal numbers in variables without problems:
double x = std::numeric_limits<double>::denorm_min();
Then, we can print this variable without problems:
std::cout << std::setprecision(std::numeric_limits<double>::max_digits10);
std::cout << std::scientific;
std::cout << x;
std::cout << std::endl;
And it will print:
4.94065645841246544e-324
But a problem occurs when one tries to parse this number. Imagine that this number is stored inside a file, and read as a string. The problem is that:
std::string str = "4.94065645841246544e-324";
double x = std::stod(str);
will throw an std::out_of_range exception.
So my question is: how to convert a denorm value stored in a string?
I'm not sure I have understood the problem, but using std::istringstream like this:
std::string str = "4.94065645841246544e-324";
double x;
std::istringstream iss(str);
iss >> x;
std::cout << std::setprecision(std::numeric_limits<double>::max_digits10);
std::cout << std::scientific;
std::cout << x << std::endl;
...gives me:
4.94065645841246544e-324
Apparently, you can use the strtod (or the older atof) interface from cstdlib. I doubt whether this is guaranteed or portable.
I'm not sure if it will make a difference, but you are actually printing (std::numeric_limits<double>::max_digits10 + 1) = 18 decimal digits.
For an IEEE-754 64-bit double, round-trip precision is "1.16" in scientific notation, i.e. 17 significant digits. Perhaps this is introducing some ULP/rounding that interferes with the conversion?
The problem with denormals and std::stod is that the latter is defined in terms of std::strtod, which may set errno = ERANGE on underflow (it's implementation-defined whether it does, and in glibc it does). As the gcc developers have pointed out, in such a case std::stod is required by the standard to throw std::out_of_range.
So your proper workaround is to use std::strtod directly, ignoring ERANGE when the value it returns is finite and nonzero, like here:
#include <cstdlib>
#include <cerrno>
#include <cmath>
#include <stdexcept>

double stringToDouble(const char* str, std::size_t* pos = nullptr)
{
    errno = 0;
    char* end;
    const auto x = std::strtod(str, &end);
    if(pos)
        *pos = end - str;   // set before the early returns below
    if(errno == ERANGE)
    {
        // Ignore the "error" for denormals: they are finite and nonzero
        if(x != 0 && x > -HUGE_VAL && x < HUGE_VAL)
            return x;
        throw std::out_of_range("strtod: ERANGE");
    }
    else if(errno)
        throw std::invalid_argument("strtod failed");
    return x;
}
Note that, unlike std::istringstream approach suggested in another answer, this will work for hexfloats too.
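For completeness, a quick usage sketch with the denorm_min string from the question:
#include <cstddef>
#include <iostream>

int main()
{
    std::size_t pos = 0;
    double d = stringToDouble("4.94065645841246544e-324", &pos);
    std::cout << d << " (consumed " << pos << " characters)\n";  // no throw
}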

How to work with large numbers when writing and reading a file?

I have written code to copy my data from one input file to another output file. To read all lines of my input file I used
while (!inputfile.eof())
but in my output file, the last line is missing. So I would like to know how to prevent this error?
My second question: for writing data into the file, I used
Outputfile.write((char*)&a, sizeof(double));
Outputfile.write((char*)&b, sizeof(double));
here a = 289814.150 and b = 4320978.613, but in the output file it shows like
289814 4.32098e+006
(the value of a is rounded and the value of b is shown in scientific notation). What is the reason for this, and how do I fix this problem?
Here I tried to use cout.setf(ios::fixed);, but while this works for data written to the screen, I don't know how to apply it to double data written to my file.
I want to write real values with 3 decimals only in my output file. Please can anyone help? Thanks.
Okay, based on comments, the intent here has (at least I hope) become reasonably clear: to convert pairs of numbers in text format to binary format, and be able to verify that the converted numbers accurately represent the originals.
There are a number of ways to do that, but the first thing to keep in mind is that no matter what else you do, converting floating point numbers to/from text (decimal) format can and normally will lead to some degree of inaccuracy. The problem is fairly simple: floating point is (normally) done in binary. This means it can only represent fractions in which the denominator is a power of 2 (or a sum of powers of 2). Decimal, obviously enough, uses base 10, so fractions can have denominators composed of products of powers of 2 and powers of 5. Any of those that involves a power of 5 (e.g., 0.2 = 1/5) can only be approximated in binary -- pretty much like trying to represent 1/3rd in decimal.
This means your only reasonable choice is to allow some discrepancy between the decimal and binary versions. The best you can hope for is to keep the errors to a minimum. To test for that, what you probably need/want to do is convert the binary floating point back to decimal in the original format, and check whether it's close to the original (e.g., ignore errors in the final digit, at least errors of +/- 1).
The conversion itself should be pretty trivial:
#include <fstream>

int main(int argc, char **argv) {
    // checking argc omitted for clarity.
    std::ifstream infile(argv[1]);
    std::ofstream outfile(argv[2], std::ios::binary);

    double a, b;
    while (infile >> a && infile >> b) {
        outfile.write((char const *)&a, sizeof(a));
        outfile.write((char const *)&b, sizeof(b));
    }
    return 0;
}
Verifying the data isn't nearly so easy. One possibility would be something like this (starting from the two files, one binary and one text):
#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>

int main(int argc, char **argv) {
    std::string text;
    std::ifstream text_file(argv[1]);
    std::ifstream bin_file(argv[2], std::ios::binary);

    double bin_value;
    while (text_file >> text) {
        bin_file.read((char *)&bin_value, sizeof(bin_value));

        // the manipulators will probably need tweaking to match the original format.
        std::ostringstream converter;   // fresh stream each pass, so output doesn't accumulate
        converter << std::fixed << std::setw(3) << std::setprecision(3) << bin_value;

        if (converter.str() == text)
            ; // they're identical
        else if (converter.str().substr(0,3) == text.substr(0,3))
            ; // the first three digits are equal
        else
            ; // bigger error
    }
    return 0;
}
That's much more likely to need some tweaking to work the way you want, but the general idea should be in the ballpark as long as you're sure the original numbers are all formatted consistently.

String manipulation using Arduino and C++

I am trying to manipulate a string in C++. I am working with an Arduino board, so I am limited in what I can use. I am also still learning C++ (sorry for any stupid questions).
Here is what I need to do:
I need to send miles per hour to a 7-segment display. So if I have a number such as 17.812345, I need to display 17.8. What seems to be the most efficient way is to first multiply by 10 (to shift the decimal point right one place), then cast 178.12345 to an int (to chop the decimal digits off). The part I am stuck on is how to break apart 178. In Python I could slice the string, but I can't find anything on how to do this in C++ (or at least, I can't find the right terms to search for).
There are four 7-segment displays and a 7-segment display controller. It will measure up to tenths of a mile per hour. Thank you very much for any assistance and information you can provide.
It would probably be easiest to not convert it to a string, but just use arithmetic to separate the digits, i.e.
float speed = 17.812345;
int display_speed = speed * 10 + 0.5;    // round to nearest 0.1 == 178

int digits[4];
digits[3] = display_speed % 10;          // == 8
digits[2] = (display_speed / 10) % 10;   // == 7
digits[1] = (display_speed / 100) % 10;  // == 1
digits[0] = (display_speed / 1000) % 10; // == 0
and, as pointed out in the comments, if you need the ASCII value for each digit:
char ascii_digits[4];
ascii_digits[0] = digits[0] + '0';
ascii_digits[1] = digits[1] + '0';
ascii_digits[2] = digits[2] + '0';
ascii_digits[3] = digits[3] + '0';
This is a way you can do it in C++ without modulus math (either way seems fine to me):
#include <cmath>
#include <cstdio>
#include <iostream>

int main() {
    float value = 3.1415;
    char buf[16];

    value = std::floor( value * 10.0f ) / 10.0f;  // truncate to one decimal place
    std::sprintf( buf, "%0.1f", value );          // format as a string

    std::cout << "Value: " << buf << std::endl;
    return 0;
}
If you actually want to be processing this stuff as strings, I would recommend looking into stringstream. It can be used much the same as any other stream, such as cin and cout, except instead of sending all output to the console you get an actual string out of the deal.
This will work with standard C++. Don't know much about Arduino, but some quick googling suggests it won't support stringstreams.
A quick example:
#include <sstream>  // include this for stringstreams
#include <iostream>
#include <string>

using namespace std; // stringstream, like almost everything, is in std

string stringifyFloat(float f) {
    stringstream ss;
    ss.precision(1);  // set decimal precision to one digit.
    ss << fixed;      // use fixed rather than scientific notation.
    ss << f;          // read in the value of f.
    return ss.str();  // return the string associated with the stream.
}

int main() {
    cout << stringifyFloat(17.812345) << endl; // 17.8
    return 0;
}
You can use a function such as this toString and work your way up from there, like you would in Python, or just use modulo 10, 100, 1000, etc. to get the digits as numbers. I think manipulating it as a string might be easier for you, but it's up to you.
You could also use boost::lexical_cast, but it will probably be hard to get boost working in an embedded system like yours.
A good idea would be to implement a stream for the display. That way the C++ stream syntax could be used and the rest of the application would remain generic. Although this may be overkill for an embedded system.
If you still want to use std::string you may want to use a reverse iterator. This way you can start at the right most digit (in the string) and work towards the left, one character at a time.
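A minimal sketch of that idea (sendToDisplay is a hypothetical hook into the 7-segment controller, not a real API):
#include <string>

void sendToDisplay(int position, char digit); // hypothetical controller call

void showSpeed(const std::string& s) {
    int position = 3; // rightmost of the four digits
    for (std::string::const_reverse_iterator it = s.rbegin();
         it != s.rend() && position >= 0; ++it) {
        if (*it == '.')
            continue;                    // the decimal point is handled separately
        sendToDisplay(position--, *it);  // e.g. "17.8" -> '8', '7', '1'
    }
}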
If you have access to the run-time library code, you could set up a C language I/O stream for the display. This is easier to implement than a C++ stream. You could then use fprintf or fputs to write to the display. I implemented a debug port with this method, and it was easier for the rest of the developers to use.