sprintf formatting problem for doubles with high precision - c++

So I was recently upgrading an old C++ project that was built using the Visual Studio 2012 - Windows XP (v110_xp) platform toolset. The code of this project does some very precise double calculations that require up to 20 digits of precision. These doubles are then saved to a string and printed using the printf APIs. Here is an example of something that would happen in this project:
double testVal = 123.456789;
// do some calculations on testVal
char str[100] = { 0 };
sprintf(str, "%.20le", testVal);
After this operation str = "1.23456789000...000e+02", which is what is expected.
However, once I update the project to be compatible with Visual Studio 2019, using the Visual Studio 2019 (v142) platform toolset with C++17, the above-mentioned code produces a different output for str.
After the call to sprintf to format the value to a string, str = "1.23456789000...556e+02". This problem isn't localized to this one value; there are even more egregious cases. For example, one of the starting values, "2234332.434322", after the sprintf formatting gets changed to "2.23433324343219995499e+07".
From all the documentation I've read, the "l" format code should be the correct character for converting long doubles to a string. This behavior feels like a textbook float->double conversion problem, though.
I tried setting the project's floating-point model build argument to precise, strict, and then fast to see if any of these options would help, but none of them had any effect on the problem.
Does anyone know why this is happening?

Use the brand new Ryu (https://github.com/ulfjack/ryu) or Grisu-Exact (https://github.com/jk-jeon/Grisu-Exact) instead, which are much faster than sprintf and guaranteed to be round-trip correct (and more), or the good old Double-Conversion (https://github.com/google/double-conversion), which is slower than the other two but has the same guarantees, is still much faster than sprintf, and is battle-tested.
(Disclaimer: I'm the author of Grisu-Exact.)
I'm not sure if you really need to print out exactly 20 decimal digits; I personally have rarely had occasions where the number of digits mattered. If the sole purpose of having 20 digits is just to not lose any precision, then the above-mentioned libraries will definitely give you better (and shorter) results. If the number of digits must be precisely 20 for some reason, then, well, Ryu still provides such a feature (it's called Ryu-printf), which again has the round-trip guarantee and is much faster than sprintf.
EDIT
To elaborate more on the last sentence, note that in general it is impossible to have the roundtrip guarantee if the number of digits is fixed, because, well, if that fixed number is too small, say, 3, then there is no way to distinguish 0.123 and 0.1234. However, 20 is big enough so that the best approximation of the true value (which is what Ryu-printf produces) is always guaranteed to be roundtrip-correct.
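Not one of the libraries above, but since the project already targets C++17: here is a minimal sketch of shortest round-trip formatting with std::to_chars, just to illustrate what a round-trip guarantee looks like (the value is the one from the question).
#include <charconv>
#include <cstdio>

int main()
{
    double testVal = 2234332.434322;
    char buf[64];
    // With no precision argument, to_chars produces the shortest decimal
    // string that parses back to exactly the same double.
    auto res = std::to_chars(buf, buf + sizeof(buf), testVal);
    *res.ptr = '\0';
    std::puts(buf); // should print 2234332.434322
}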

Related

Problems discovering glibc version in C++

My application is distributed with two binaries. One which is linked to an old glibc (for example, 2.17) and another binary that targets a newer glibc (for example, 2.34). I also distribute a launcher binary (linked against glibc 2.17) which interrogates the user's libc version and then runs the correct binary.
My code to establish the user's libc looks something like the following:
std::string gnu(gnu_get_libc_version());
double g_ver = std::stod(gnu);
if(g_ver >= 2.34)
{
return GLIBC_MAIN;
}
else
{
return GLIBC_COMPAT;
}
This works perfectly for the most part; however, some of my users report that despite having a new glibc, the old glibc binary is actually run. I have investigated this and have discovered that double g_ver is equal to 2, not 2.34 as it should be. That is, the decimal part is missing. gnu_get_libc_version() always has the correct value, so it must be a problem when converting the string to a double.
I have also tried boost::lexical_cast but this has the same effect.
std::string gnu(gnu_get_libc_version());
//double g_ver = std::stod(gnu);
double glibc = boost::lexical_cast<double>(gnu);
if(glibc >= 2.34)
{
return GLIBC_MAIN;
}
else
{
return GLIBC_COMPAT;
}
Needless to say, I am unable to reproduce this behaviour on any of my computers even when running the exact same distribution / version of Linux as affected users.
Does anybody know why boost::lexical_cast or std::stod sometimes misses the decimal part of the version number? Is there an alternative approach to this?
UPDATE
Upon further testing, I found that this problem is introduced by using a different locale. I set my locale to fr_FR.UTF-8 on a test machine and was able to reproduce the problem. The output of gnu_get_libc_version() still seems to be correct, but std::stod is unable to parse the decimal part of the version.
Does anybody know how to correct this problem?
The fundamental problem is that the glibc version is a string and not a decimal number. So for a "proper" solution you need to parse it manually and implement your own logic to decide which version is bigger or smaller.
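For example, a rough sketch of such manual parsing (the helper name is made up; sscanf's "%d" conversions are not affected by the decimal-separator issue):
#include <cstdio>
#include <gnu/libc-version.h>

// Sketch: extract "major.minor" from the version string and compare numerically.
static bool glibc_at_least(int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (std::sscanf(gnu_get_libc_version(), "%d.%d", &major, &minor) != 2)
        return false; // unexpected format; fall back to the compat binary
    return major > want_major || (major == want_major && minor >= want_minor);
}
Then glibc_at_least(2, 34) replaces the floating-point comparison entirely.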
However, as a quick and dirty hack, try inserting the line
setlocale(LC_NUMERIC, "C");
before the std::stod (or strtod) call. That will set the numeric locale back to the default "C" locale, where the decimal separator is '.'. If you're doing something that needs correct locales later in the program, you need to set it back again. Depending on how your program initialized locales earlier, something like
setlocale(LC_NUMERIC, "");
should reset it back to what the environment says the locale should be.
Could this be the locale of the user?
The decimal separator is not a '.' in all locales and stod uses the current locale?
As you are depending on glibc anyway, you can use strverscmp.
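For example, a sketch reusing the constants from the question (strverscmp is a GNU extension that needs _GNU_SOURCE and <string.h>):
#define _GNU_SOURCE
#include <string.h>
#include <gnu/libc-version.h>

// strverscmp orders version strings the way "ls -v" does:
// the result is <0, 0 or >0 depending on how the two strings compare.
int pick_binary(void)
{
    if (strverscmp(gnu_get_libc_version(), "2.34") >= 0)
        return GLIBC_MAIN;
    return GLIBC_COMPAT;
}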
gnu_get_libc_version() may return more than just X.X; it may be 2.0.1, for example.
However, strtod won't handle the part of the string after the second dot, so it would equate to 2.0 in this case.
You need to verify the actual string value returned from gnu_get_libc_version() on the versions you currently do not have access to.

Compiling C++ code to .EXE which returns double [duplicate]

This question already has answers here:
What should main() return in C and C++?
(19 answers)
Closed 5 years ago.
I am working with a MATLAB optimization platform for black-box cost functions (BBCF).
To give the user a free hand, the BBCF can be any executable file which takes the BBCF's input parameters and must output (return) its cost value, so that the MATLAB optimizer finds the best (least-cost) input parameters.
Considering that, on the one hand, my BBCF is implemented in C++, and on the other hand the cost value is a double (real number), I need to compile my code to an EXE file that outputs (returns) a double.
But, to the best of my knowledge, when I compile C++ code to an EXE, it "mainly" compiles the main() function, and its output is the return value of the main() function (i.e. 0 if it ran successfully!).
An idea could be to use a main function that returns double, and then compile such a main() to an EXE, but unfortunately this is not possible in C++ (as explained in this link, or as claimed in the 3rd answer of this question to be a bug of C++, neither of which is the business of this question).
Can anyone provide an idea by which the EXE compiled form of a C++ code, outputs (returns) a double value?
This is not 'a bug in C++' (by the way, the bug would be in some C++ compiler, not in the language itself): the standard says that main() shall return an int:
http://en.cppreference.com/w/cpp/language/main_function
Regarding how to return a non-int from an executable, there are a couple of ways to do that. Two simplest (in terms of how to implement them) solutions come to my mind:
Save it to a file. Then either monitor that file in Matlab for changes (e.g. compare timestamps) or read after each execution of your EXE file depending on how you're going to use it. Not very efficient solution, but does the job and probably the performance penalty is negligible to that of your other calculations.
If you are fine with your cost value losing some numerical accuracy, you can just multiply the double value by some number (the larger this number, the more decimal places you will retain), round it, cast it to an int, have it returned from main(), cast it back to double in MATLAB, and divide by the same number. The number used as the multiplier should be a power of 2 so that it doesn't introduce additional rounding errors. This method might be particularly useful if your cost value is limited to the range [0, 1], or if you can normalize it to that range and you know that variations below some threshold are not important.
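A rough sketch of that second idea; the cost function and the scale factor here are placeholders:
#include <cmath>

// Placeholder for the real black-box cost function, assumed to return a value in [0, 1].
static double compute_cost() { return 0.123456; }

int main()
{
    const double scale = 1 << 20; // a power of two; keeps roughly six decimal digits
    // MATLAB reads this exit status and divides it by the same scale factor.
    return static_cast<int>(std::lround(compute_cost() * scale));
}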
In English, 'shall' is an even stronger imperative than 'must'.
Making a change like this would require changes to the operating system and shell. Such changes are unlikely to happen.
The easiest way to pass a double return would be to write it to standard output. Alternatively, there are several methods available for interprocess communication.
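For instance, a minimal sketch of the standard-output approach; on the MATLAB side, something like [status, out] = system('bbcf.exe ...') followed by str2double(out) could pick the value up (the executable name is made up):
#include <cstdio>
#include <limits>

int main(int argc, char* argv[])
{
    // ... compute the cost from the parameters passed in argv ...
    double cost = 0.123456; // placeholder result
    // Print with max_digits10 significant digits so the value round-trips exactly.
    std::printf("%.*g\n", std::numeric_limits<double>::max_digits10, cost);
    return 0; // keep the conventional int exit code
}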

Bizarre behavior from sprintf in C++/VS2010

I have a super-simple class representing a decimal number with fixed precision, and when I want to format it I do something like this:
assert(d.DENOMINATOR == 1000000);
char buf[100];
sprintf(buf, "%d.%06d", d._value / d.DENOMINATOR, d._value % d.DENOMINATOR);
Astonishingly (to me at least) this does not work. The %06d term comes out all 0s even when d.DENOMINATOR does not evenly divide d._value. And if I throw an extra %d in the format string, I see the right value show up in the third spot -- so it's like something is secretly creating an extra argument between my two.
If I compute the two terms outside of the call to sprintf, everything behaves how I expect. I thought to reproduce this with a more simple test case:
char testa[200];
char testb[200];
int x = 12345, y = 1000;
sprintf(testa, "%d.%03d", x/y, x%y);
int term1 = x/y, term2 = x%y;
sprintf(testb, "%d.%03d", term1, term2);
...but this works properly. So I'm completely baffled as to exactly what's going on, how to avoid it in the future, etc. Can anyone shed light on this for me?
(EDIT: Problem ended up being that d._value and d.DENOMINATOR are both long longs so %d doesn't suffice. Thanks very much to Serge's comment below which pointed to the problem, and Mark's answer submitted shortly thereafter.)
Almost certainly your term components are a 64-bit type (perhaps long on a 64-bit system) which is getting passed into the non-type-safe sprintf. Thus when you create an intermediate int the size is right and it works fine.
g++ will warn about this and many other useful things with -Wall. The preferred solution is of course to use C++ iostreams for your formatting as they're totally type safe.
The alternate solution is to cast the result of your expression to the type that you told sprintf to expect so it pulls the proper number of bytes out of memory.
Finally, never use sprintf when almost every compiler supports snprintf which prevents all sorts of silly mistakes. Your code is fine now but when someone modifies it later and it runs off the end of the buffer you may spend days tracking down the corruption.
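To make that concrete, here is a small sketch of the fixed call with the right length modifier (stand-in variables instead of the question's class members; older MSVC runtimes may want %I64d and their own snprintf variant instead):
#include <cstdio>

int main()
{
    long long value = 123456789LL;     // stand-in for d._value
    long long denominator = 1000000LL; // stand-in for d.DENOMINATOR

    char buf[100];
    // %lld makes printf read a long long from the argument list,
    // so both conversions line up with their arguments again.
    std::snprintf(buf, sizeof(buf), "%lld.%06lld", value / denominator, value % denominator);
    std::puts(buf); // prints 123.456789
}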

64bit integer conversion from string

I'm dealing with the problem of reading a 64-bit unsigned integer (unsigned long long) from a string. My code should work with both GCC 4.3 and Visual Studio 2010.
I read this question and its answers on the topic: Read 64 bit integer string from file, and thought that strtoull would do the job just fine and more efficiently than using a std::stringstream. Unfortunately, strtoull is not available in Visual Studio's stdlib.h.
So I wrote a short templated function:
#include <string>
#include <sstream>

template <typename T>
T ToNumber(const std::string& Str)
{
    T Number;
    std::stringstream S(Str);
    S >> Number;
    return Number;
}
unsigned long long N = ToNumber<unsigned long long>("1234567890123456789");
I'm worried about the efficiency of this solution, so is there a better option in this scenario?
See http://social.msdn.microsoft.com/Forums/en-US/vclanguage/thread/d69a6afe-6558-4913-afb0-616f00805229/
"It's called _strtoui64(), _wcstoui64() and _tcstoui64(). Plus the _l versions for custom locales.
Hans Passant."
By the way, the way to Google things like this is to notice that Google automatically thinks you're wrong (just like newer versions of Visual Studio) and searches for something else instead, so be sure to click on the link to search for what you told it to search for.
Of course you can easily enough write your own function to handle simple decimal strings. The standard functions handle various alternatives according to numeric base and locale, which makes them slow in any case.
Yes, stringstream will add a heap allocation atop all that. No, performance really doesn't matter until you can tell the difference.
There is a faster option, to use the deprecated std::strstream class which does not own its buffer (hence does not make a copy or perform an allocation). I wouldn't call that "better" though.
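If you do go the hand-rolled route mentioned above, a minimal sketch (decimal digits only, no validation or overflow checking) could look like this:
#include <string>

// Sketch: parse a plain decimal string into an unsigned 64-bit value.
// Assumes the input contains nothing but the digits '0'-'9'.
unsigned long long ToUInt64(const std::string& str)
{
    unsigned long long value = 0;
    for (std::string::size_type i = 0; i < str.size(); ++i)
        value = value * 10 + static_cast<unsigned long long>(str[i] - '0');
    return value;
}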
You could parse the string 9 digits at a time starting from the rear, multiplying each chunk by 1 000 000 000 ^ i, i.e. (last 9 digits * 1) + (next 9 digits * 1 billion) ... or
(1000000000000000000) + (234567890000000000) + (123456789)

dtoa vs sprintf vs Grisu3 algorithm

What is the best way to render double precision numbers as strings in C++?
I ran across the article Here be dragons: advances in problems you didn’t even know you had which discusses printing floating point numbers.
I have been using sprintf; why would I need to modify my code?
If you are happy with sprintf_s you shouldn't change. However if you need to format your output in a way that is not supported by your library, you might need to reimplement a specialized version of sprintf (with any of the known algorithms).
For example JavaScript has very precise requirements on how its numbers must be printed (see section 9.8.1 of the specification). The correct output can't be accomplished by simply calling sprintf. Indeed, Grisu has been developed to implement correct number-printing for a JavaScript compiler.
Grisu is also faster than sprintf, but unless floating-point printing is a bottleneck in your application this should not be a reason to switch to a different library.
Ahah!
The problem outlined in the article you linked is that for some numbers, the computer displays something that is theoretically correct but not what we humans would have used.
For example, as the article says, 1.2999999... = 1.3, so if your result is 1.3, it's (quite) correct for the computer to display it as 1.2999999... But that's not what you would have expected to see...
Now the question is why the computer does that. The reason is that the computer computes in base 2 (binary) while we usually compute in base 10 (decimal). The results are the same (thank goodness!) but the internal storage and the representation are not.
Some numbers look nice when displayed in base 10, like 1.3 for example, but others don't, for example 1/3 = 0.333333333... It's the same in base 2: some numbers "look" nice in base 2 (usually those built from powers of 2) and others don't. When the computer stores a number internally, it may not be able to store it "exactly" and instead stores the closest possible representation, even if the number looked "finite" in decimal. So yes, in this case, it "drifts" a little bit. If you do that again and again, you may lose precision. But there is no other way (unless you use special math libraries able to store fractions).
The problem arises when the computer tries to give you back, in base 10, the number you gave it. Then the computer may give you 1.2999999 instead of the 1.3 you were expecting.
That's also the reason why you should never compare floats with ==, <, >, but instead use the special functions islessgreater(a, b), isgreater(a, b), etc.
So the function you use (sprintf) is fine and as exact as it can be; it gives you correct values. You just have to know that when dealing with floats, 1.2999999 at maximum precision is OK if you were expecting 1.3.
Now if you want to "pretty print" those numbers to get the best "human" representation (base 10), you may want to use a special library, like your Grisu3, which will try to undo the drift that may have happened and align the number to the closest base 10 representation.
Now the library cannot use a crystal ball to find out which numbers were drifted and which weren't, so it may happen that you really meant 1.2999999 at maximum precision as stored in the computer and the library will "convert" it to 1.3... But that's no worse or less precise than displaying 1.2999999 instead of 1.3.
If you need good readability, such a library will be useful. If not, it's just a waste of time.
Hope this helps!
The best way to do this in any reasonable language is:
Use your language's runtime library. Don't ever roll your own. Even if you have the knowledge and curiosity to write it, you don't want to test it and you don't want to maintain it.
If you notice any misbehavior from the runtime library conversion, file a bug.
If these conversions are a measurable bottleneck for your program, don't try to make them faster. Instead, find a way to avoid doing them at all. Instead of storing numbers as strings, just store the floating-point data (after possibly controlling for endianness). If you need a string representation, use a hexadecimal floating-point format instead (see the sketch after this answer).
I don't mean to discourage you, or anyone. These are actually fascinating functions to work on, but they are also shocking complex, and trying to design good test coverage for any non-naive implementation is even more involved. Don't get started unless you're prepared to spend months thinking about the problem.
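As a sketch of the hexadecimal idea: the %a conversion writes the exact binary value in hexadecimal, so it round-trips without any shortest-representation machinery.
#include <cstdio>
#include <cstdlib>

int main()
{
    double original = 0.3;
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%a", original); // e.g. 0x1.3333333333333p-2
    double restored = std::strtod(buf, nullptr);     // strtod accepts hex-float input
    std::printf("%d\n", original == restored);       // prints 1
}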
You might want to use something like Grisu (or a faster method) because it gives you the shortest decimal representation with a round-trip guarantee, unlike sprintf, which only takes a fixed precision. The good news is that C++20 includes std::format, which gives you this by default. For example:
printf("%.*g", std::numeric_limits<double>::max_digits10, 0.3);
prints 0.29999999999999999 while
puts(fmt::format("{}", 0.3).c_str());
prints 0.3 (godbolt).
In the meantime you can use the {fmt} library, which std::format is based on. {fmt} also provides the print function that makes this even easier and more efficient (godbolt):
fmt::print("{}", 0.3);
Disclaimer: I'm the author of {fmt} and C++20 std::format.
In C++ why aren't you using iostreams? You should probably be using cout for the console and ostringstream for string-oriented output (unless you have a very specific need to use a printf family method).
You shouldn't worry about formatting performance unless actual profiling shows that CPU is the bottleneck (compared to say I/O).
#include <sstream>

void outputdouble( std::ostringstream & oss, double d )
{
    oss.precision( 5 ); // keep 5 significant digits
    oss << d;
}
http://www.cplusplus.com/reference/iostream/ostringstream/