I noticed a difference in the floating point values represented for Infinity and NaN. Is this specified somewhere in the standard?
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>

union Double
{
    double value;
    uint64_t repr;
};

int main()
{
    Double d;
    d.value = std::numeric_limits<double>::infinity();
    std::cout << std::hex << "inf: " << d.repr << std::endl;
    d.value = std::numeric_limits<double>::quiet_NaN();
    std::cout << std::hex << "NAN: " << d.repr << std::endl;
    return 0;
}
Output:
inf: 7ff0000000000000
NAN: 7ff8000000000000
I noticed a difference in the floating point values represented for Infinity and NaN.
Yes, this is not surprising. These values differ, so their representation should differ too.
Is this specified somewhere in the standard?
In the C++ standard? No.
In some floating-point standard, like IEEE-754? Yes.
Note: in C++, your union trick has undefined behavior. Use memcpy instead.
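A minimal sketch of the memcpy approach (assuming double and uint64_t have the same size, as on IEEE-754 platforms):

#include <cstdint>
#include <cstring>
#include <iostream>
#include <limits>

int main()
{
    static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");
    double value = std::numeric_limits<double>::infinity();
    uint64_t repr;
    std::memcpy(&repr, &value, sizeof repr); // well-defined, unlike reading the inactive union member
    std::cout << std::hex << "inf: " << repr << std::endl;
    return 0;
}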
Consider the following:
#include <iostream>
#include <cstdint>
#include <cstdlib> // std::strtoull

int main() {
    std::cout << std::hex
              << "0x" << std::strtoull("0xFFFFFFFFFFFFFFFF", 0, 16) << std::endl
              << "0x" << uint64_t(double(std::strtoull("0xFFFFFFFFFFFFFFFF", 0, 16))) << std::endl
              << "0x" << uint64_t(double(uint64_t(0xFFFFFFFFFFFFFFFF))) << std::endl;
    return 0;
}
Which prints:
0xffffffffffffffff
0x0
0xffffffffffffffff
The first number is just the result of converting ULLONG_MAX from a string to a uint64_t, which works as expected.
However, if I cast the result to double and then back to uint64_t, it prints 0 (the second number).
Normally, I would attribute this to the precision inaccuracy of floats, but what further puzzles me is that if I cast ULLONG_MAX from uint64_t to double and then back to uint64_t, the result is correct (third number).
Why the discrepancy between the second and the third result?
EDIT (by @Radoslaw Cybulski)
For another what-is-going-on-here try this code:
#include <iostream>
#include <cstdint>
#include <cstdlib> // std::strtoull
using namespace std;

int main() {
    uint64_t z1 = std::strtoull("0xFFFFFFFFFFFFFFFF", 0, 16);
    uint64_t z2 = 0xFFFFFFFFFFFFFFFFull;
    std::cout << z1 << " " << uint64_t(double(z1)) << "\n";
    std::cout << z2 << " " << uint64_t(double(z2)) << "\n";
    return 0;
}
which happily prints:
18446744073709551615 0
18446744073709551615 18446744073709551615
The number closest to 0xFFFFFFFFFFFFFFFF that is representable by a double (assuming 64-bit IEEE) is 18446744073709551616. You'll find that this is a bigger number than 0xFFFFFFFFFFFFFFFF; as such, the rounded value is outside the representable range of uint64_t.
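You can see this rounding directly; a minimal demonstration (assuming 64-bit IEEE double):

#include <cstdint>
#include <iomanip>
#include <iostream>

int main()
{
    double d = double(UINT64_MAX); // rounds up to 2^64
    std::cout << std::fixed << std::setprecision(0) << d << "\n";
    // prints 18446744073709551616, i.e. UINT64_MAX + 1
    return 0;
}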
Regarding the conversion back to integer, the standard says (quoting the latest draft):
[conv.fpint]
A prvalue of a floating-point type can be converted to a prvalue of an integer type.
The conversion truncates; that is, the fractional part is discarded.
The behavior is undefined if the truncated value cannot be represented in the destination type.
Why the discrepancy between the second and the third result?
Because the behaviour of the program is undefined.
Although it is mostly pointless to analyse the reasons for differences in UB, because the scope of variation is limitless, my guess at the reason for the discrepancy in this case is that in one case the value is a compile-time constant, while in the other there is a call to a library function that is invoked at runtime.
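If the conversion needs to be well-defined for all inputs, one option is to range-check before converting. A minimal sketch (the helper name to_uint64 is illustrative; the bound 18446744073709551616.0 is exactly 2^64, assuming 64-bit IEEE double):

#include <cstdint>
#include <iostream>

// Returns the truncated value, or UINT64_MAX if the double is out of range.
uint64_t to_uint64(double d)
{
    // Values in [0, 2^64) truncate to something representable in uint64_t.
    if (d >= 0.0 && d < 18446744073709551616.0)
        return static_cast<uint64_t>(d);
    return UINT64_MAX; // saturate on overflow, NaN, or negative input
}

int main()
{
    std::cout << to_uint64(1e18) << "\n"; // in range: prints 1000000000000000000
    std::cout << to_uint64(1e20) << "\n"; // out of range: prints 18446744073709551615
    return 0;
}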
I am trying this:
std::cout << boost::lexical_cast<std::string>(0.0009) << std::endl;
and expecting the output to be:
0.0009
But the output is:
0.00089999999999999998
g++ version: 5.4.0, Boost version: 1.66
What can I do to make it print what it was given?
You can in fact override the default precision:
Live On Coliru
#include <boost/lexical_cast.hpp>
#ifdef BOOST_LCAST_NO_COMPILE_TIME_PRECISION
# error unsupported
#endif
template <> struct boost::detail::lcast_precision<double> : std::integral_constant<unsigned, 5> { };
#include <string>
#include <iostream>
int main() {
    std::cout << boost::lexical_cast<std::string>(0.0009) << std::endl;
}
Prints
0.0009
However, this is both not supported (detail::) and not flexible (all doubles will come out this way now).
The Real Problem
The problem is loss of accuracy converting from the decimal representation to the binary representation. Instead, use a decimal float representation:
Live On Coliru
#include <boost/lexical_cast.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <string>
#include <iostream>

using Double = boost::multiprecision::cpp_dec_float_50;

int main() {
    Double x("0.009"),
           y = x * 2,
           z = x / 77;

    for (Double v : { x, y, z }) {
        std::cout << boost::lexical_cast<std::string>(v) << "\n";
        std::cout << v << "\n";
    }
}
Prints
0.009
0.009
0.018
0.018
0.000116883
0.000116883
boost::lexical_cast doesn't allow you to specify the precision when converting a floating point number into its string representation. From the documentation
For more involved conversions, such as where precision or formatting need tighter control than is offered by the default behavior of lexical_cast, the conventional std::stringstream approach is recommended.
So you could use a stringstream (this requires the <sstream> and <iomanip> headers):
double d = 0.0009;
std::ostringstream ss;
ss << std::setprecision(4) << d;
std::cout << ss.str() << '\n';
Or another option is to use the boost::format library (from <boost/format.hpp>):
std::string s = (boost::format("%1$.4f") % d).str();
std::cout << s << '\n';
Both will print 0.0009.
0.0009 is a double-precision floating-point literal with, assuming IEEE 754, the value
0.00089999999999999997536692664112933925935067236423492431640625
That's what boost::lexical_cast<std::string> sees as the function parameter. And the default precision setting of the formatter rounds to the 17th significant figure:
0.00089999999999999998
Really, if you want exact decimal precision, then use a decimal type (Boost has one), or work in integers and splice in the decimal separator yourself. But in your case, given that you're simply outputting the number with no complex calculations, rounding to the 15th significant figure will have the desired effect: inject
std::setprecision(15)
into the output stream.
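For example (a minimal sketch):

#include <iomanip>
#include <iostream>

int main()
{
    std::cout << std::setprecision(15) << 0.0009 << "\n"; // prints 0.0009
    return 0;
}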
This code snippet in Visual Studio 2013:
double a = 0.0;
double b = -0.0;
cout << (a == b) << " " << a << " " << b;
prints 1 0 -0. What is the difference between a and b?
C++ does not guarantee to differentiate between +0 and -0; this is a feature of each particular number representation. The IEEE 754 standard for floating point arithmetic does make this distinction, which can be used to keep sign information even when numbers go to zero. std::numeric_limits does not directly tell you whether you have possible signed zeroes, but if std::numeric_limits<double>::is_iec559 is true, then you can in practice assume that you have IEEE 754 representation, and thus possibly negative zero.
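For instance, a minimal compile-time check of that assumption:

#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "IEEE 754 double assumed (needed for signed zero)");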
As noted by “gmch” in a comment, the C++11 standard library way to check the sign of a zero is to use std::copysign, or more directly std::signbit, e.g. as follows:
#include <iostream>
#include <math.h> // copysign, signbit
using namespace std;

auto main() -> int
{
    double const z1 = +0.0;
    double const z2 = -0.0;

    cout << boolalpha;
    cout << "z1 is " << (signbit( z1 )? "negative" : "positive") << "." << endl;
    cout << "z2 is " << (signbit( z2 )? "negative" : "positive") << "." << endl;
}
Without copysign or signbit, e.g. with a C++03 compiler, one way to detect a negative zero z is to check whether 1.0/z is negative infinity, i.e. simply whether it is negative:
#include <iostream>
using namespace std;

auto main() -> int
{
    double const z1 = +0.0;
    double const z2 = -0.0;

    cout << boolalpha;
    cout << "z1 is " << (1/z1 < 0? "negative" : "positive") << "." << endl;
    cout << "z2 is " << (1/z2 < 0? "negative" : "positive") << "." << endl;
}
But while this will probably work in practice on almost any implementation, it's formally *Undefined Behavior.
One needs to be sure that the expression evaluation will not trap.
*) C++11 §5.6/4 “If the second operand of / or % is zero the behavior is undefined”
See http://en.m.wikipedia.org/wiki/Signed_zero
In a nutshell, it is due to the sign being stored as a stand-alone bit in the IEEE 754 floating point representation. This makes it possible to have a zero exponent and zero fraction but still have the sign bit set, thus a negative zero. This condition cannot arise for signed integers, which are stored in two's complement.
I really can't wrap my head around the fact that this code gives two different results for the same formula:
#include <iostream>
#include <cmath>

int main() {
    // std::cout.setf(std::ios::fixed, std::ios::floatfield);
    std::cout.precision(20);
    float a = (exp(M_PI) - M_PI);
    std::cout << (exp(M_PI) - M_PI) << "\n";
    std::cout << a << "\n";
    return 0;
}
I don't really think that the IEEE 754 floating point representation is playing a significant role here ...
The first expression (namely (exp(M_PI) - M_PI)) is a double; the second expression (namely a) is a float. Neither has even 20 decimal digits of precision, but the float has far less precision than the double.
Because M_PI and exp(M_PI) are of type double, the expression is computed in double. Change a to double and you will get the same result:
#include <iostream>
#include <cmath>

int main() {
    // std::cout.setf(std::ios::fixed, std::ios::floatfield);
    std::cout.precision(20);
    double a = (exp(M_PI) - M_PI);
    std::cout << (exp(M_PI) - M_PI) << "\n";
    std::cout << a << "\n";
    return 0;
}
#include <iostream>

int main()
{
    float test = 12535104400;
    std::cout << test;
    std::cin.get();
    return 0;
}
// on MSVC 2010 this outputs: 1.25351e+010
I would like it to output just "12535104400", or in other words, a human-readable format which shows the full value of the number rather than scientific notation.
This particular number cannot be represented exactly in a float; for example, try the following:
float v = 12535104400;
cout.precision(0);
cout << fixed << v << endl;
You'll see it outputs: 12535104512
You will need to include <iomanip>:
#include <iostream>
#include <iomanip>

int main()
{
    const double test = 12535104400;
    std::cout << std::fixed << std::setprecision(0) << test;
    std::cin.get();
    return 0;
}
std::fixed is the manipulator which uses fixed-point precision (not scientific notation)
std::setprecision(0) sets how many digits to display after the decimal point
float test = 12535104400;
This should be a compiler error if your compiler doesn't support long long and int is 32-bit. Use floating-point literals instead of integer literals, e.g. 1234.0f vs 1234:
#include <iostream>
#include <iomanip>

int main()
{
    float test = 12535104400.0f;
    std::cout << std::setiosflags(std::ios::fixed) << std::setprecision(0) << test;
    std::cin.get();
    return 0;
}
should print what you want. But beware that float isn't that precise.
You are out of luck: a 4-byte float can store only about 7 significant digits. Use double or long long for such numbers.
In order to format the output in iostream, you'll need manipulators.
If you're willing to lose precision, you can cast it to an integer type. Note that plain int overflows here (the value exceeds INT_MAX, which makes the conversion undefined), so use a 64-bit type:
cout << (long long)test;
or
cout << static_cast<long long>(test);
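If rounding to nearest is preferable to the truncation a cast performs, std::llround from <cmath> is one option; a minimal sketch:

#include <cmath>
#include <iostream>

int main()
{
    float test = 12535104400.0f;
    std::cout << std::llround(test) << "\n"; // prints 12535104512, the float's actual value
    return 0;
}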