Platform-independent way to obtain maximum C++ float value - c++

What’s the best, platform-independent way to obtain the maximum value that can be stored in a float in C++?

std::numeric_limits

std::numeric_limits<float>::max()

std::numeric_limits
// numeric_limits example
#include <iostream>
#include <limits>
using namespace std;
int main () {
    // For floating-point types, min() is the smallest positive normalized value,
    // not the most negative one (that is lowest()).
    cout << "Smallest positive normalized float: " << numeric_limits<float>::min() << endl;
    cout << "Maximum value for float: " << numeric_limits<float>::max() << endl;
    cout << "Smallest positive normalized double: " << numeric_limits<double>::min() << endl;
    cout << "Maximum value for double: " << numeric_limits<double>::max() << endl;
    return 0;
}

In C++ you can use the std::numeric_limits class to get this sort of information.
If has_infinity is true (which is the case on essentially all current platforms), you can use infinity() to get a value that is greater than or equal to all other values (except NaNs). Similarly, its negation gives negative infinity, which is less than or equal to all other values (again except NaNs).
If you want finite values, use max() for the largest finite value and lowest() (C++11) for the most negative finite value. Note that for floating-point types min() returns the smallest positive normalized value, not the most negative one.

#include <float.h>
then use FLT_MAX

Related

What is the use of fixed keyword in setprecision while dealing with formatting of numbers in C++?

I have seen a common programming practice to use fixed while using setprecision. Just wanted to know why it is used as I am new to Programming World.
Code in question:
#include <iomanip>
#include <iostream>
int main()
{
double num1 = 3.12345678;
std::cout << std::fixed << std::showpoint;
std::cout << std::setprecision(2);
std::cout << num1 << std::endl;
return 0;
}
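It is used to fix the number of digits written after the decimal point.
With std::fixed in effect, setprecision(x) produces exactly x digits after the decimal point; without it (the default floating-point format), setprecision(x) sets the maximum number of significant digits instead.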
More info here:
http://en.cppreference.com/w/cpp/io/manip/setprecision

Should C++ std::uniform_real_distribution<double> only generate positive numbers?

I was trying to generate some random doubles in C++ (MSVC, though that isn't too important to me—I just didn't have another compiler to test) and I noticed that my quick program never generated negative numbers:
#include <iostream>
#include <random>
#include <ctime>
int main() {
std::mt19937 generator(clock());
std::uniform_real_distribution<double>
rand_dbl(std::numeric_limits<double>::min(),
std::numeric_limits<double>::max());
std::cout << "Double Limits: (" << std::numeric_limits<double>::min()
<< "," << std::numeric_limits<double>::max() << ")"
<< std::endl << std::endl;
int total_neg = 0;
for (int i=0; i<100; i++) {
double d = rand_dbl(generator);
if (d<0) total_neg++;
std::cout << d << " ";
}
std::cout << std::endl << std::endl
<< "Total negative random double is: " << total_neg << std::endl;
return 0;
}
No matter how many numbers I have it generate, it never generates a negative one. I understand why most of the numbers generated are in the 10^307 to 10^308 range (which isn't exactly what I wanted), but not why the numbers are always positive. I tried a few different generators (default, mt19937, minstd_rand0) without any difference in this aspect.
Can anyone describe why this is the case?
You set it up that way with the limits that you provided. std::numeric_limits<double>::min() gives the smallest positive double, and you used that as the lower bound on the distribution.
std::numeric_limits<double>::min()
This returns DBL_MIN, which is the smallest positive normalized value a double can hold (the value closest to 0, denormals aside). If you want the most negative value, then you need to use
std::numeric_limits<double>::lowest()
which returns -DBL_MAX, the most negative finite value a double can hold.
From cppreference:
For floating-point types with denormalization, min returns the minimum positive normalized value.
(emphasis mine)
So it's pretty normal you only get positive values.
Could you tell what is displayed by those lines?
std::cout << "Double Limits: (" << std::numeric_limits<double>::min()
<< "," << std::numeric_limits<double>::max() << ")"
<< std::endl << std::endl;

Why is this value printed although being NaN?

The following code assumes that we are on an x86-compatible system and that long double maps to x87 FPU's 80-bit format.
#include <cmath>
#include <array>
#include <cstring>
#include <iomanip>
#include <iostream>
int main()
{
std::array<uint8_t,10> data1{0x52,0x23,0x6f,0x24,0x8f,0xac,0xd1,0x43,0x30,0x02};
std::array<uint8_t,10> data2{0x52,0x23,0x6f,0x24,0x8f,0xac,0xd1,0xc3,0x30,0x02};
std::array<uint8_t,10> data3{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80,0x30,0x02};
long double value1, value2, value3;
static_assert(sizeof value1 >= 10,"Expected float80");
std::memcpy(&value1, data1.data(),sizeof value1);
std::memcpy(&value2, data2.data(),sizeof value2);
std::memcpy(&value3, data3.data(),sizeof value3);
std::cout << "isnan(value1): " << std::boolalpha << std::isnan(value1) << "\n";
std::cout << "isnan(value2): " << std::boolalpha << std::isnan(value2) << "\n";
std::cout << "isnan(value3): " << std::boolalpha << std::isnan(value3) << "\n";
std::cout << "value1: " << std::setprecision(20) << value1 << "\n";
std::cout << "value2: " << std::setprecision(20) << value2 << "\n";
std::cout << "value3: " << std::setprecision(20) << value3 << "\n";
}
Output:
isnan(value1): true
isnan(value2): false
isnan(value3): false
value1: 3.3614005946481929011e-4764
value2: 9.7056260598879139386e-4764
value3: 6.3442254652397210376e-4764
Here value1 is classified as "unsupported" by the 387 and later, because its exponent is neither zero nor all ones while the explicit integer bit is clear: it is in fact an "unnormal". isnan works as expected with it: the value is indeed not a number (although not exactly a NaN). The second value, value2, has the integer bit set and also works as expected: it is not a NaN. The third one, value3, is the value of the missing integer bit.
But somehow both numbers value1 and value2 appear printed, and the values differ exactly by the missing integer bit! Why is that? All other methods I tried, like printf and to_string give just 0.00000.
Even stranger, if I do any arithmetic with value1, in subsequent prints I do get nan. Taking this into account, how does operator<<(long double) even manage to actually print anything but nan? Does it explicitly set the integer bit, or maybe it parses the number instead of doing any FPU arithmetic on it? (assuming g++4.8 on Linux 32 bit).
All other methods I tried, like printf and to_string give just
0.00000.
What operator<<(long double) actually does is use the num_put<> facet from the locale library to perform the numeric formatting, which in turn uses one of the printf-family functions (see sections 27.7.3.6 and 22.4.2.2 of the C++ standard).
Depending on the stream settings, the printf conversion specifier used for long double by the locale may be any of: %Lf, %Le, %LE, %La, %LA, %Lg or %LG.
In your (and my) case it seems to be %Lg:
printf("value1: %.20Lf\n", value1);
printf("value1: %.20Le\n", value1);
printf("value1: %.20La\n", value1);
printf("value1: %.20Lg\n", value1);
std::cout << "value1: " << std::setprecision(20) << value1 << "\n";
value1: 0.00000000000000000000
value1: 3.36140059464819290106e-4764
value1: 0x4.3d1ac8f246f235200000p-15826
value1: 3.3614005946481929011e-4764
value1: 3.3614005946481929011e-4764
Taking this into account, how does operator<<(long double) even manage
to actually print anything but nan? Does it explicitly set the integer
bit, or maybe it parses the number instead of doing any FPU arithmetic
on it?
It prints the unnormalized value.
The conversion from binary to decimal floating-point representation used by printf() can be performed without any FPU arithmetic. You can find the glibc implementation in the stdio-common/printf_fp.c source file.
I was trying this:
long double value = std::numeric_limits<long double>::quiet_NaN();
std::cout << "isnan(value): " << std::boolalpha << std::isnan(value) << "\n";
std::cout << "value: " << std::setprecision(20) << value << "\n";
So my assumption is that, as stated here: http://en.cppreference.com/w/cpp/numeric/math/isnan, the value is being cast to double and not long double when evaluated by std::isnan, and strictly:
std::numeric_limits<long double>::quiet_NaN() != std::numeric_limits<double>::quiet_NaN()

How to specify setprecision rounding

Can I specify setprecision to round double values when stream to std output?
ofile << std::setprecision(12) << total_run_time/TIME << "\n";
Output:
0.756247615801
ofile << std::setprecision(6)<< total_run_time/TIME << "\n";
Output:
0.756248
But I need the output as 0.756247
Thanks
There is also std::fesetround from <cfenv>, which sets the rounding direction:
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cfenv>
int main () {
double runtime = 0.756247615801;
// Set rounding direction and output with some precision:
const auto prev_round = std::fegetround();
std::fesetround(FE_DOWNWARD);
std::cout << "desired: " << std::setprecision(6) << runtime << "\n";
// Restore previous rounding direction and output for testing:
std::fesetround(prev_round);
std::cout << "default: " << std::setprecision(6) << runtime << "\n";
}
(note that these are not the kind of comments I recommend, they are just for tutoring purposes)
Output:
desired: 0.756247
default: 0.756248
Important note, though: I did not find any mention in the standard that the operator<< overloads for floating-point types have to honour the rounding direction.
Another approach is to defeat the rounding by subtracting, in your second case, 0.0000005 (half a unit of the last printed digit) from the double before outputting it:
total_run_time / TIME - 0.0000005
In many ways I prefer this as it avoids the potential for integer overflow.
Multiply the result of your division by a million, convert it to an integer, and divide by a million (as a double). This has the side effect that std::setprecision is not needed for the output.
std::cout.write(std::to_string(0.756247615801).c_str(), 8);
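It looks really dirty, and it only works if the string still holds the unrounded digits: as specified through C++23, std::to_string formats a double like printf's %f, with six decimals, so here the text is already rounded to 0.756248 before it is cut.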

Different values of std::floor for arguments with the same value but different types

Consider the following:
#include <iostream>
#include <cmath>
int main()
{
using std::cout;
using std::endl;
const long double be2 = std::log(2);
cout << std::log(8.0) / be2 << ", " << std::floor(std::log(8.0) / be2)
<< endl;
cout << std::log(8.0L) / be2 << ", " << std::floor(std::log(8.0L) / be2)
<< endl;
}
Outputs
3, 2
3, 3
Why does the output differ? What am I missing here?
Also here is the link to codepad: http://codepad.org/baLtYrmy
And I'm using gcc 4.5 on linux, if that's important.
When I add this:
cout.precision(40);
I get this output:
2.999999999999999839754918906642444653698, 2
3.00000000000000010039712117215771058909, 3
You're printing two values that are very close to, but not exactly equal to, 3.0. It's the nature of std::floor that its results can differ for values that are very close together (mathematically, it's a discontinuous function).
#include <iostream>
#include <cmath>
#include <iomanip>
int main()
{
using std::cout;
using std::endl;
const long double be2 = std::log(2);
cout << std::setprecision(50) << std::log(8.0) << "\n";
cout << std::setprecision(50) << std::log(8.0L) << "\n";
cout << std::setprecision(50) << std::log(8.0) / be2 << ", " << std::floor(std::log(8.0) / be2)
<< endl;
cout << std::setprecision(50) << std::log(8.0L) / be2 << ", " << std::floor(std::log(8.0L) / be2)
<< endl;
return 0;
}
The output is:
2.0794415416798357476579894864698871970176696777344
2.0794415416798359282860714225549259026593063026667
2.9999999999999998397549189066424446536984760314226, 2
3.0000000000000001003971211721577105890901293605566, 3
If you check the output here, you will notice a slight difference in precision between the two results. Such round-off errors are inherent in operations on float and double; floor() then turns a value that is only marginally below 3 into 2, so the results are not what one feels they should be.
It is important to keep two attributes in mind, precision and rounding, when you are working with float or double numbers.
You might want to read more about it in my answer here, the same reasoning applies here as well.
To expand on what Als is saying-
In the first case you are dividing a double-precision value (8 bytes on this platform) by a long double (16 bytes on this platform). In the second case you are dividing a long double by a long double. This results in a very small round-off error, which can be seen here:
cout << std::setprecision(20) << (std::log(8.0) / be2) << std::endl;
cout << std::setprecision(20) << (std::log(8.0L) / be2) << std::endl;
which yields:
2.9999999999999998398
3.0000000000000001004
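Edit to say: in this case, sizeof is your friend (to see the difference in precision; the sizes shown are for this platform and vary between compilers):
sizeof(std::log(8.0)); // 8 (double)
sizeof(std::log(8.0L)); // 16 (long double)
sizeof(be2); // 16 (long double)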