The following code assumes an x86-compatible system on which long double maps to the x87 FPU's 80-bit extended-precision format.
#include <array>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>

int main()
{
    // 80-bit patterns, little-endian: bytes 0-7 hold the 64-bit significand
    // (bit 63 is the explicit integer bit), bytes 8-9 hold the sign and the
    // 15-bit biased exponent (here 0x0230).
    std::array<std::uint8_t, 10> data1{0x52,0x23,0x6f,0x24,0x8f,0xac,0xd1,0x43,0x30,0x02}; // integer bit clear: "unnormal"
    std::array<std::uint8_t, 10> data2{0x52,0x23,0x6f,0x24,0x8f,0xac,0xd1,0xc3,0x30,0x02}; // integer bit set: normal
    std::array<std::uint8_t, 10> data3{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80,0x30,0x02}; // the integer bit alone
    long double value1{}, value2{}, value3{};
    static_assert(sizeof value1 >= 10, "Expected float80");
    // Copy only the 10 significant bytes; sizeof(long double) may be 12 or 16
    // because of padding, and the arrays are only 10 bytes long.
    std::memcpy(&value1, data1.data(), data1.size());
    std::memcpy(&value2, data2.data(), data2.size());
    std::memcpy(&value3, data3.data(), data3.size());
    std::cout << "isnan(value1): " << std::boolalpha << std::isnan(value1) << "\n";
    std::cout << "isnan(value2): " << std::boolalpha << std::isnan(value2) << "\n";
    std::cout << "isnan(value3): " << std::boolalpha << std::isnan(value3) << "\n";
    std::cout << "value1: " << std::setprecision(20) << value1 << "\n";
    std::cout << "value2: " << std::setprecision(20) << value2 << "\n";
    std::cout << "value3: " << std::setprecision(20) << value3 << "\n";
}
Output:
isnan(value1): true
isnan(value2): false
isnan(value3): false
value1: 3.3614005946481929011e-4764
value2: 9.7056260598879139386e-4764
value3: 6.3442254652397210376e-4764
Here value1 is classified as "unsupported" by the 80387 and later FPUs, because it has a nonzero, not-all-ones exponent together with a clear integer bit: it is in fact an "unnormal". And std::isnan works as expected with it: the value is indeed nothing like a number (although not exactly a NaN). The second value, value2, has the integer bit set, and also works as expected: it is not a NaN. The third one, value3, is the value of the missing integer bit by itself.
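To make the classification concrete, here is a minimal sketch (assuming the same little-endian x86 layout as above) that extracts the exponent and the explicit integer bit from data1; the combination of a nonzero, not-all-ones exponent with a clear integer bit is exactly what makes it an unnormal:

#include <array>
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    std::array<std::uint8_t, 10> data1{0x52,0x23,0x6f,0x24,0x8f,0xac,0xd1,0x43,0x30,0x02};
    std::uint64_t sig;
    std::uint16_t se;
    std::memcpy(&sig, data1.data(), 8);      // 64-bit significand
    std::memcpy(&se, data1.data() + 8, 2);   // sign + 15-bit biased exponent
    unsigned exponent = se & 0x7fff;
    bool integerBit = (sig >> 63) & 1;
    // Unnormal: exponent is neither 0 nor all-ones, yet the integer bit is clear.
    std::cout << std::boolalpha
              << "exponent nonzero and not all-ones: "
              << (exponent != 0 && exponent != 0x7fff) << "\n"
              << "integer bit set: " << integerBit << "\n";
}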
But somehow both value1 and value2 get printed, and the printed values differ by exactly the missing integer bit! Why is that? All other methods I tried, like printf and to_string, give just 0.00000.
Even stranger, if I do any arithmetic with value1, subsequent prints do give nan. Taking this into account, how does operator<<(long double) even manage to print anything but nan? Does it explicitly set the integer bit, or does it perhaps parse the bits itself instead of doing any FPU arithmetic on them? (This is g++ 4.8 on 32-bit Linux.)
All other methods I tried, like printf and to_string, give just 0.00000.
What operator<<(long double) actually does is use the num_put<> facet from the locale library to perform the numeric formatting, which in turn uses one of the printf-family functions (see sections 27.7.3.6 and 22.4.2.2 of the C++ standard).
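For illustration, you can invoke that facet directly; this sketch uses only the standard num_put interface (nothing implementation-specific) and produces the same characters that operator<<(long double) would:

#include <iostream>
#include <iterator>
#include <locale>

int main()
{
    long double v = 0.1L;
    const auto& np = std::use_facet<std::num_put<char>>(std::cout.getloc());
    // Same path operator<<(long double) takes: num_put::put() formats the
    // value according to the stream's flags and precision.
    np.put(std::ostreambuf_iterator<char>(std::cout), std::cout, ' ', v);
    std::cout << '\n';
}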
Depending on the stream's settings, the printf conversion specifier the locale uses for long double might be any of %Lf, %Le, %LE, %La, %LA, %Lg or %LG.
In your (and my) case it seems to be %Lg:
printf("value1: %.20Lf\n", value1);
printf("value1: %.20Le\n", value1);
printf("value1: %.20La\n", value1);
printf("value1: %.20Lg\n", value1);
std::cout << "value1: " << std::setprecision(20) << value1 << "\n";
Output:
value1: 0.00000000000000000000
value1: 3.36140059464819290106e-4764
value1: 0x4.3d1ac8f246f235200000p-15826
value1: 3.3614005946481929011e-4764
value1: 3.3614005946481929011e-4764
Taking this into account, how does operator<<(long double) even manage to actually print anything but nan? Does it explicitly set the integer bit, or maybe it parses the number instead of doing any FPU arithmetic on it?
It prints the unnormalized value.
The conversion from the binary to the decimal floating-point representation used by printf() can be performed without any FPU arithmetic. You can find the glibc implementation in the stdio-common/printf_fp.c source file.
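As a sketch of why no x87 arithmetic on the unnormal is needed: the printed value is fully determined by the integer significand and exponent fields, so it can be rebuilt without ever loading the unnormal bit pattern into the FPU. (This is only an illustration; glibc's actual algorithm works in multi-precision integer arithmetic.)

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    const unsigned char data1[10] = {0x52,0x23,0x6f,0x24,0x8f,0xac,0xd1,0x43,0x30,0x02};
    std::uint64_t sig;
    std::uint16_t se;
    std::memcpy(&sig, data1, 8);
    std::memcpy(&se, data1 + 8, 2);
    // value = significand * 2^(biased_exponent - 16383 - 63), integer bit or not
    long double rebuilt = std::ldexp(static_cast<long double>(sig),
                                     (se & 0x7fff) - 16383 - 63);
    std::printf("%.20Lg\n", rebuilt);  // 3.3614005946481929011e-4764
}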
I was trying this:
long double value = std::numeric_limits<long double>::quiet_NaN();
std::cout << "isnan(value): " << std::boolalpha << std::isnan(value) << "\n";
std::cout << "value: " << std::setprecision(20) << value << "\n";
So my assumption is that, as stated here: http://en.cppreference.com/w/cpp/numeric/math/isnan, the value is being converted to double rather than long double when evaluated by std::isnan, and strictly:
std::numeric_limits<long double>::quiet_NaN() != std::numeric_limits<double>::quiet_NaN()
Related
What is the C++ rule that makes equal come out false here? Given:
float f {-1.0};
bool equal = (static_cast<unsigned>(f) == static_cast<unsigned>(-1.0));
E.g. https://godbolt.org/z/fcmx2P
#include <iostream>

int main()
{
    float f {-1.0};
    const float cf {-1.0};
    std::cout << std::hex;
    std::cout << " f" << "=" << static_cast<unsigned>(f) << '\n';
    std::cout << "cf" << "=" << static_cast<unsigned>(cf) << '\n';
    return 0;
}
Produces the following output:
f=ffffffff
cf=0
The behaviour of your program is undefined: the C++ standard does not define the conversion of a negative floating-point value to an unsigned type, because the truncated value (here -1) cannot be represented in the destination type.
(Note that the familiar wrap-around behaviour applies only to negative values of integral type.)
So there is little point in attempting to explain your program's output.
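If a wrap-around result is actually wanted, one well-defined route (a sketch, assuming the value fits in int) is to convert to a signed integer type first and then to unsigned:

#include <iostream>

int main()
{
    float f {-1.0};
    // float -> int is well defined here because -1 is representable in int,
    // and int -> unsigned is defined to wrap modulo 2^N.
    unsigned u = static_cast<unsigned>(static_cast<int>(f));
    std::cout << std::hex << u << '\n';  // ffffffff for 32-bit unsigned
}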
Consider the following:
#include <iostream>
#include <cstdint>
#include <cstdlib>  // std::strtoull

int main() {
    std::cout << std::hex
              << "0x" << std::strtoull("0xFFFFFFFFFFFFFFFF", 0, 16) << std::endl
              << "0x" << uint64_t(double(std::strtoull("0xFFFFFFFFFFFFFFFF", 0, 16))) << std::endl
              << "0x" << uint64_t(double(uint64_t(0xFFFFFFFFFFFFFFFF))) << std::endl;
    return 0;
}
Which prints:
0xffffffffffffffff
0x0
0xffffffffffffffff
The first number is just the result of converting ULLONG_MAX from a string to a uint64_t, which works as expected.
However, if I cast the result to double and then back to uint64_t, it prints 0 (the second number).
Normally I would attribute this to floating-point precision loss, but what further puzzles me is that if I cast ULLONG_MAX from uint64_t to double and then back to uint64_t, the result is correct (the third number).
Why the discrepancy between the second and the third result?
EDIT (by #Radoslaw Cybulski)
For another what-is-going-on-here, try this code:
#include <iostream>
#include <cstdint>
#include <cstdlib>  // std::strtoull

using namespace std;

int main() {
    uint64_t z1 = std::strtoull("0xFFFFFFFFFFFFFFFF", 0, 16);
    uint64_t z2 = 0xFFFFFFFFFFFFFFFFull;
    std::cout << z1 << " " << uint64_t(double(z1)) << "\n";
    std::cout << z2 << " " << uint64_t(double(z2)) << "\n";
    return 0;
}
which happily prints:
18446744073709551615 0
18446744073709551615 18446744073709551615
The number closest to 0xFFFFFFFFFFFFFFFF that is representable by a double (assuming 64-bit IEEE 754) is 18446744073709551616, that is 2^64. This is one greater than 0xFFFFFFFFFFFFFFFF, so the value is outside the representable range of uint64_t.
About the conversion back to integer, the standard says (quoting the latest draft):
[conv.fpint]
A prvalue of a floating-point type can be converted to a prvalue of an integer type.
The conversion truncates; that is, the fractional part is discarded.
The behavior is undefined if the truncated value cannot be represented in the destination type.
Why the discrepancy between the second and the third result?
Because the behaviour of the program is undefined.
Although it is mostly pointless to analyse the reasons for differences in undefined behaviour, because the scope of variation is limitless, my guess at the reason for the discrepancy in this case is that in one case the value is a compile-time constant (so the compiler performs the conversion), while in the other there is a call to a library function invoked at runtime (so the CPU performs it).
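One way to sidestep the UB entirely is to range-check before converting, sketched here with a hypothetical helper to_u64_checked and a saturate-on-overflow policy of my own choosing; 2^64 is exactly representable as a double, so the comparison below is exact:

#include <cstdint>
#include <iostream>

std::uint64_t to_u64_checked(double d)
{
    if (d >= 0.0 && d < 18446744073709551616.0)  // [0, 2^64)
        return static_cast<std::uint64_t>(d);    // in range: defined behaviour
    return UINT64_MAX;  // saturate out-of-range (and NaN); the policy is a choice
}

int main()
{
    double d = static_cast<double>(UINT64_MAX);  // rounds up to exactly 2^64
    std::cout << to_u64_checked(d) << "\n";      // 18446744073709551615
}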
I'm using Visual Studio 2015 to print two floating-point numbers:
double d1 = 1.5;
double d2 = 123456.789;
std::cout << "value1: " << d1 << std::endl;
std::cout << "value2: " << d2 << std::endl;
std::cout << "maximum number of significant decimal digits (value1): " << -std::log10(std::nextafter(d1, std::numeric_limits<double>::max()) - d1) << std::endl;
std::cout << "maximum number of significant decimal digits (value2): " << -std::log10(std::nextafter(d2, std::numeric_limits<double>::max()) - d2) << std::endl;
This prints the following:
value1: 1.5
value2: 123457
maximum number of significant decimal digits (value1): 15.6536
maximum number of significant decimal digits (value2): 10.8371
Why is 123457 printed for the value 123456.789? Does the C++ standard allow anything at all to be displayed for floating-point numbers when std::cout is used without std::setprecision()?
The rounding happens because of the default stream precision required by the C++ standard, which you can see by writing
std::cout << std::cout.precision();
The output will show 6, which tells you that the default number of significant digits printed by std::cout is 6. That is why the floating-point value is automatically rounded to 6 significant digits.
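A minimal sketch of the default and of what raising it does:

#include <iostream>

int main()
{
    std::cout << std::cout.precision() << '\n';  // 6 by default
    std::cout << 123456.789 << '\n';             // 123457 (6 significant digits)
    std::cout.precision(9);
    std::cout << 123456.789 << '\n';             // 123456.789
}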
What you have pointed out is actually one of the many things the standardization committee should reconsider about the standard iostreams in C++. Such things work well when you write:
printf ("%f\n", d2);
But not with std::cout, where you need to use std::setprecision, because its default formatting corresponds to %g rather than %f in printf. So you need to write:
std::cout << std::setprecision(10) << "value2: " << d2 << std::endl;
But if you don't like this method and are using C++11 (or later), then you can also write:
std::cout << "value2: " << std::to_string(d2) << std::endl;
This will give you the same result as printf ("%f\n", d2);.
A much better method is to stop std::cout rounding to significant digits by using std::fixed, which makes the precision count digits after the decimal point instead:
#include <iostream>
#include <iomanip>

int main()
{
    std::cout << std::fixed;
    double d = 123456.789;
    std::cout << d;
    return 0;
}
Output:
123456.789000
So I guess your problem is solved!
I think the problem here is that the C++ standard is not written to be easy to read; it is written to be precise and to avoid repeating itself. So if you look up operator<<(double), it doesn't say much more than "it uses num_put", because that is how cout << some_float_value is implemented.
The default behaviour is what printf("%g", value); does [Table 88 in the N3337 draft of the C++ standard lists the equivalence between printf and C++ stream formatting]. So if you want the effect of %.16g, you need to change the precision by calling setprecision(16).
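For example, this sketch shows the two producing identical text (assuming the stream is still in its default, %g-like state):

#include <cstdio>
#include <iomanip>
#include <iostream>

int main()
{
    double d = 123456.789;
    std::printf("%.16g\n", d);                        // 123456.789
    std::cout << std::setprecision(16) << d << '\n';  // 123456.789
}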
If I want to flip some bits, I was wondering which way is better. Should I flip them by XORing with 0xffffffff, or by using ~?
I'm afraid there will be cases where the width of the type matters, so that one of these flips bits the other would not, which would make one of them safer to use. I'm wondering if there are times when it's better to use one over the other.
Here is some code that uses both on the same input value, and the output values are always the same.
#include <iostream>
#include <iomanip>

void flipBits(unsigned long value)
{
    const unsigned long ORIGINAL_VALUE = value;
    std::cout << "Original value:" << std::setw(19) << std::hex << value << std::endl;
    value ^= 0xffffffff;
    std::cout << "Value after XOR:" << std::setw(18) << std::hex << value << std::endl;
    value = ORIGINAL_VALUE;
    value = ~value;
    std::cout << "Value after bit negation: " << std::setw(8) << std::hex << value << std::endl << std::endl;
}

int main()
{
    flipBits(0x12345678);
    flipBits(0x11223344);
    flipBits(0xabcdef12);
    flipBits(15);
    flipBits(0xffffffff);
    flipBits(0x0);
    return 0;
}
Output:
Original value: 12345678
Value after XOR: edcba987
Value after bit negation: edcba987
Original value: 11223344
Value after XOR: eeddccbb
Value after bit negation: eeddccbb
Original value: abcdef12
Value after XOR: 543210ed
Value after bit negation: 543210ed
Original value: f
Value after XOR: fffffff0
Value after bit negation: fffffff0
Original value: ffffffff
Value after XOR: 0
Value after bit negation: 0
Original value: 0
Value after XOR: ffffffff
Value after bit negation: ffffffff
Use ~:
You won't be relying on any specific width of the type; for example, int is not 32 bits on all platforms, and with a 64-bit type the two expressions give different results, as the sketch below shows.
It removes the risk of accidentally typing one f too few or too many.
It makes the intent clearer.
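A minimal sketch of that first point, assuming a platform where unsigned long is 64 bits (for example x86-64 Linux):

#include <iostream>

int main()
{
    unsigned long v = 0x12345678UL;
    std::cout << std::hex
              << (v ^ 0xffffffffUL) << "\n"  // edcba987: high 32 bits untouched
              << ~v << "\n";                 // ffffffffedcba987: every bit flipped
}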
As you're asking about C++ specifically, simply use std::bitset:
#include <iostream>
#include <iomanip>
#include <bitset>
#include <limits>

void flipBits(unsigned long value) {
    std::bitset<std::numeric_limits<unsigned long>::digits> bits(value);
    std::cout << "Original value : 0x" << std::hex << value;
    value = bits.flip().to_ulong();
    std::cout << ", Value after flip: 0x" << std::hex << value << std::endl;
}
As for your concern about just using the ~ operator on the unsigned long value and thereby flipping more bits than actually wanted: since std::bitset<NumberOfBits> specifies exactly the number of bits to operate on, it handles such problems correctly.
Consider the following:
#include <iostream>
#include <cmath>

int main()
{
    using std::cout;
    using std::endl;
    const long double be2 = std::log(2);
    cout << std::log(8.0) / be2 << ", " << std::floor(std::log(8.0) / be2) << endl;
    cout << std::log(8.0L) / be2 << ", " << std::floor(std::log(8.0L) / be2) << endl;
}
Outputs
3, 2
3, 3
Why does the output differ? What am I missing here?
Also here is the link to codepad: http://codepad.org/baLtYrmy
And I'm using gcc 4.5 on linux, if that's important.
When I add this:
cout.precision(40);
I get this output:
2.999999999999999839754918906642444653698, 2
3.00000000000000010039712117215771058909, 3
You're printing two values that are very close to, but not exactly equal to, 3.0. It's in the nature of std::floor that its results differ for values on either side of an integer, no matter how close together they are (mathematically, floor is a discontinuous function).
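A minimal sketch of that discontinuity, using the largest double below 3.0:

#include <cmath>
#include <iostream>

int main()
{
    // The largest double below 3.0 and 3.0 itself floor to different integers.
    double below = std::nextafter(3.0, 0.0);
    std::cout << std::floor(below) << ", " << std::floor(3.0) << "\n";  // 2, 3
}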
#include <iostream>
#include <cmath>
#include <iomanip>

int main()
{
    using std::cout;
    using std::endl;
    const long double be2 = std::log(2);
    cout << std::setprecision(50) << std::log(8.0) << "\n";
    cout << std::setprecision(50) << std::log(8.0L) << "\n";
    cout << std::setprecision(50) << std::log(8.0) / be2 << ", " << std::floor(std::log(8.0) / be2) << endl;
    cout << std::setprecision(50) << std::log(8.0L) / be2 << ", " << std::floor(std::log(8.0L) / be2) << endl;
    return 0;
}
The output is:
2.0794415416798357476579894864698871970176696777344
2.0794415416798359282860714225549259026593063026667
2.9999999999999998397549189066424446536984760314226, 2
3.0000000000000001003971211721577105890901293605566, 3
If you check this output, you will notice a slight difference between the two computed logarithms. Such roundoff errors creep into operations on float and double, and when floor() is then applied the results are not what one feels they should be.
It is important to keep two attributes in mind when working with float or double numbers: precision and rounding.
To expand on what Als is saying: in the first case you are dividing an 8-byte double by a 16-byte long double, so std::log(8.0) has already been rounded to 53-bit double precision before the division; in the second case you are dividing a 16-byte long double by a 16-byte long double, keeping the full 64-bit significand throughout. This results in a very small roundoff difference, which can be seen here:
cout << std::setprecision(20) << (std::log(8.0) / be2) << std::endl;
cout << std::setprecision(20) << (std::log(8.0L) / be2) << std::endl;
which yields:
2.9999999999999998398
3.0000000000000001004
Edit to say: in this case, sizeof is your friend for seeing the difference in precision:
sizeof(std::log(8.0)); // 8
sizeof(std::log(8.0L)); // 16
sizeof(be2); // 16