How to get the lowest representable floating point value in C++

I have a program where I need to set a variable to the lowest representable (non infinite) double-precision floating point number in C++. How am I able to set a variable to the lowest double-precision floating point value?
I tried using std::numeric_limits. I am not using C++11 so I am unable to try to use the lowest() function. I tried to use max(), but when I tried it, it returned infinity. I also tried subtracting a value from max() in the hope that I would then get a representable number.
double max_value = std::numeric_limits<double>::max();
cout << "Test 1: " << max_value << endl;
max_value = max_value - 1;
cout << "Test 2: " << max_value << endl;
double low_value = - std::numeric_limits<double>::max();
cout << "Test 3: " << low_value << endl;
cout << "Test 4: " << low_value + 1 << endl;
Output:
Test 1: inf
Test 2: inf
Test 3: -inf
Test 4: -inf
How can I set low_value in the example above to the lowest representable double?

Once you have -inf (you got it), you can get the lowest finite value with the nextafter function on (-inf,0).
EDIT: Depending on the context, this may be better than -DBL_MAX in case DBL_MAX is represented in decimal (thus in an inexact way). However the C standard requires that floating constants be evaluated in the default rounding mode (i.e. to nearest). In the particular case of GCC, DBL_MAX is a long double value cast to double; however the long double value seems to have enough digits so that, once converted from decimal to long double, the value is exactly representable as a double, so that the cast is exact and the active rounding mode doesn't affect it. As you can see, this is rather tricky, and one may want to check on various platforms that it is correct under any context. In a similar way, I have serious doubts about the correctness of the definition of DBL_EPSILON by GCC on PowerPC (where the long double type is implemented as double-double arithmetic) since there are many long double values extremely close to a power of two.
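The nextafter idea above can be sketched as follows. This is only an illustration: the helper name lowest_finite_double is made up, and it assumes a toolchain that provides std::nextafter and std::isfinite in <cmath> (standard since C++11; older compilers often expose the C99 nextafter through <math.h> instead) and a double type that has an infinity.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: start at -infinity and take one step towards zero.
// The first value reached is the most negative finite double.
double lowest_finite_double() {
    const double neg_inf = -std::numeric_limits<double>::infinity();
    const double lowest = std::nextafter(neg_inf, 0.0);
    assert(std::isfinite(lowest));  // sanity check: we stepped off infinity
    return lowest;
}
```

On an IEEE 754 platform the result should compare equal to -std::numeric_limits<double>::max().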

The standard library <cfloat>/<float.h> provides macros defining floating point implementation parameters.
The question is somewhat ambiguous - it is not clear whether you mean the smallest magnitude representable non-zero value (DBL_MIN gives the smallest positive normalized value) or the lowest representable value (given by -DBL_MAX). Either way - take your pick as necessary.
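A minimal sketch of the two readings, using only the <cfloat> macros so that it does not need C++11. The function names are invented for this example.

```cpp
#include <cassert>
#include <cfloat>

// Most negative finite double: the "lowest" reading of the question.
double lowest_double() {
    return -DBL_MAX;
}

// Smallest positive normalized double: the "closest to zero" reading.
// Subnormal values, where supported, can be smaller still.
double smallest_normal_double() {
    assert(DBL_MIN > 0.0);  // DBL_MIN is positive, unlike INT_MIN
    return DBL_MIN;
}
```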

It turned out that there was a bug in the iostream that I was using to print the values. I switched to using cstdio instead of iostream. The values were then printed as expected.
double low_value = - std::numeric_limits<double>::max();
cout <<"cout: " << low_value << endl;
printf("printf: %f\n",low_value);
Output:
cout: inf
printf: 179769...

Related

How to add two large double precision numbers in c++

I have the following piece of code
#include <iostream>
#include <iomanip>
int main()
{
double x = 7033753.49999141693115234375;
double y = 7033753.499991415999829769134521484375;
double z = (x+ y)/2.0;
std::cout << "y is " << std::setprecision(40) << y << "\n";
std::cout << "x is " << std::setprecision(40) << x << "\n";
std::cout << "z is " << std::setprecision(40) << z << "\n";
return 0;
}
When the above code is run I get,
y is 7033753.499991415999829769134521484375
x is 7033753.49999141693115234375
z is 7033753.49999141693115234375
When I do the same in Wolfram Alpha the value of z is completely different
z = 7033753.4999914164654910564422607421875 #Wolfram answer
I am familiar with floating point precision and that large numbers away from zero can not be exactly represented. Is that what is happening here? Is there anyway in c++ where I can get the same answer as Wolfram without any performance penalty?
large numbers away from zero can not be exactly represented. Is that what is happening here?
Yes.
Note that there are also infinitely many rational numbers that cannot be represented near zero as well. But the distance between representable values does grow exponentially in larger value ranges.
Is there anyway in c++ where I can get the same answer as Wolfram ...
You can potentially get the same answer by using long double. My system produces exactly the same result as Wolfram. Note that precision of long double varies between systems even among systems that conform to IEEE 754 standard.
More generally though, if you need results that are accurate to many significant digits, then don't use finite precision math.
... without any performance penalty?
No. Precision comes with a cost.
Just telling IOStreams to print to 40 significant decimal figures of precision, doesn't mean that the value you're outputting actually has that much precision.
A typical double takes you up to 17 significant decimal figures (ish); beyond that, the digits you see come from the binary value that was actually stored, not from any extra precision in your data.
Per eerorika's answer, it looks like the Wolfram Alpha answer is also falling foul of this, albeit possibly with some different precision limit than yours.
You can try a different approach like a "bignum" library, or limit yourself to the precision afforded by the types that you've chosen.
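To see why z comes out equal to x in the question, it helps to check that x and y are neighbouring doubles, so their true midpoint has no double representation at all. The sketch below assumes IEEE 754 doubles and C++11's std::nextafter; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// True when `hi` is the very next representable double after `lo`
// in the direction of `hi`.
bool are_adjacent(double lo, double hi) {
    return std::nextafter(lo, hi) == hi;
}

// Midpoint computed in double precision. If a and b are adjacent,
// the exact midpoint is not representable and must round to a or b.
double midpoint_in_double(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    return (a + b) / 2.0;
}
```

With the x and y from the question, are_adjacent(y, x) should hold, which is why only a wider type (such as an 80-bit long double, where available) or an arbitrary-precision library can return the value Wolfram Alpha shows.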

Minimum floating point number (closest to zero)

I'm trying to find the minimum value (closest to zero) that I can store in a single-precision floating point number. Using the <limits> header I can get the value, but if I make it much smaller, the float can still hold it and it gives the right result. Here is a test program, compiled with g++ 5.3.0.
#include <limits>
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
float a = numeric_limits<float>::max();
float b = numeric_limits<float>::min();
a = a*2;
b = b/pow(2,23);
cout << a << endl;
cout << b << endl;
}
As I expected, "a" gives infinity, but "b" keeps holding the good result even after dividing the minimum value by 2^23, after that it gives 0.
The value that numeric_limits<float>::min() gives is 2^(-126), which I believe is the correct answer, but why is the float in my program holding such small numbers?
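std::numeric_limits<T>::min for floating-point types gives the smallest positive normalized value, i.e. the smallest non-zero value that can be represented without loss of precision. std::numeric_limits<T>::denorm_min gives the smallest positive representable value. With IEEE representations that's a subnormal value (previously called denormalized). Note that std::numeric_limits<T>::lowest (C++11) is something else: the most negative finite value.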
From wikipedia https://en.wikipedia.org/wiki/Single-precision_floating-point_format:
The minimum positive normal value is 2^−126 ≈ 1.18 × 10^−38 and the
minimum positive (denormal) value is 2^−149 ≈ 1.4 × 10^−45.
So, for
cout << (float)pow(2,-149)
<< "-->" << (float)pow(2,-150)
<< "-->" << (float)pow(2,-151) << endl;
I'm getting:
1.4013e-45-->0-->0
I'm trying to find the minimum value (closest to zero) that I can store in a single-precision floating point number
0 is the closest value to 0 that you can store in any precision float. In fact, you can store it two ways, as there is a positive and negative 0.
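The experiment in the question can be repeated step by step with a small helper. This is a sketch under the assumption of IEEE 754 single precision with subnormals enabled (no flush-to-zero mode) and round-to-nearest; halvings_until_zero is a made-up name.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: keep halving `start` until the result underflows to
// zero. Returns the number of halvings and reports the last non-zero value.
int halvings_until_zero(float start, float& last_nonzero) {
    assert(start > 0.0f);
    int count = 0;
    float v = start;
    while (v != 0.0f) {
        last_nonzero = v;  // remember the value before this halving
        v /= 2.0f;
        ++count;
    }
    return count;
}
```

Starting from std::numeric_limits<float>::min(), the last non-zero value reached under those assumptions is std::numeric_limits<float>::denorm_min(), i.e. the subnormal 2^-149 quoted from Wikipedia above.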

gnu c++ floating number precision

I have simple question about floating number,
double temp;
std::cout.precision(std::numeric_limits<double>::digits10);
temp = 12345678901234567890.1234567890;
std::cout << (temp < std::numeric_limits<double>::max()) << std::endl;
std::cout << std::fixed << std::endl;
std::cout << temp << std::endl;
However, the output I get is this,
1
12345678901234567168.000000000000000
The value of temp is still within the range of double, however, the value is completely different. I am wondering what have I done wrong here?
Thanks.
A double has only 15.95 decimal digits of precision. You've already exceeded this number of digits in the integer part of the value, hence the loss of precision in the last few digits, and the lack of any useful digits after the decimal point.
You should probably take a look at this: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html before doing any more work with floating point values.
It's not completely different. It's correct to 16 digits or so. That's about what you can expect from a double.
A double can only store a limited amount of precision. It works out to about 15 decimal digits.
Here's a helpful article on how floating point numbers are represented, and the implications of that representation: Float
IEEE 754 cannot represent every value exactly - for example, see http://www.cprogramming.com/tutorial/floating_point/understanding_floating_point.html and http://support.microsoft.com/kb/42980
-358974.27 can't be represented on float according to http://ridiculousfish.com/blog/posts/float.html and I remember (though I'm too lazy to test it) that even something "simple" like 2.2 or 2.3 can't be accurately represented even as a double.
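One way to make the loss of digits in this question concrete is to measure the gap between neighbouring doubles at the magnitude of temp. The sketch assumes IEEE 754 doubles and C++11's std::nextafter; spacing_above is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from v to the next representable double above it.
double spacing_above(double v) {
    assert(std::isfinite(v));
    const double inf = std::numeric_limits<double>::infinity();
    return std::nextafter(v, inf) - v;
}
```

Near 1.2e19 the spacing is 2048, so the literal 12345678901234567890.1234567890 can only be stored as the nearest multiple of 2048, which is the 12345678901234567168 printed in the question.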

C++ internal representation of double/float

I am unable to understand why C++ division behaves the way it does. I have a simple program which divides 1 by 10 (using VS 2003)
double dResult = 0.0;
dResult = 1.0/10.0;
I expect dResult to be 0.1. However, I get 0.10000000000000001
Why do I get this value? What's the problem with the internal representation of double/float?
How can I get the correct value?
Thanks.
Because almost all modern processors use binary floating-point, which cannot exactly represent 0.1 (there is no way to represent 0.1 as m * 2^e with integer m and e).
If you want to see the "correct value", you can print it out with e.g.:
printf("%.1f\n", dResult);
Double and float are not identical to real numbers, it is because there are infinite values for real numbers, but only finite number of bits to represent them in double/float.
You can further read: what every computer scientist should know about floating point arithmetics
The ubiquitous IEEE754 floating point format expresses floating point numbers in scientific notation base 2, with a finite mantissa. Since a fraction like 1/5 (and hence 1/10) does not have a representation with finitely many digits in binary scientific notation, you cannot represent the value 0.1 exactly. More generally, the only values that can be represented exactly are those that fit precisely into binary scientific notation with a mantissa of a few (e.g. 24 or 53 or 64) binary digits, and a suitably small exponent.
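The difference between the stored value and the displayed value can be illustrated by formatting the same double with different numbers of significant digits. A minimal sketch, assuming IEEE 754 doubles and std::snprintf from <cstdio> (C++11); with_significant_digits is an invented helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format `value` with the given number of significant
// decimal digits, using printf's %g conversion.
std::string with_significant_digits(double value, int digits) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "%.*g", digits, value);
    assert(n >= 0 && n < static_cast<int>(sizeof buf));  // output fit in buf
    return std::string(buf);
}
```

With 15 significant digits, 1.0/10.0 formats as "0.1"; with 17 digits the nearest-double error becomes visible as "0.10000000000000001", the value the debugger showed in the question.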
Working with integers, floats, and doubles can be tricky; it depends on your purpose. If you only want to display values in a nice format, you can use the C++ I/O manipulators: precision, showpoint, noshowpoint. If you are trying to do precise computing with numeric methods, you may have to use a library for accurate representation. If you are multiplying a lot of small and large numbers, you may have to resort to log transformations. Here is a small test:
float x=1.0000001;
cout << x << endl;
float y=9.9999999999999;
cout << "using default io format " << y/x << endl;
cout << showpoint << "using showpoint " << y/x << endl;
y=9.9999;
cout << "fewer 9 default C++ " << y/x << endl;
cout << showpoint << "fewer 9 showpoint" << y/x << endl;
1
using default io format 10
using showpoint 10.0000
fewer 9 default C++ 9.99990
fewer 9 showpoint9.99990
In special cases where you want to use a double (which may be the result of some complicated algorithm) to represent integer numbers, you have to figure out the proper conversion method. Once I had a situation where I wanted to use a single double value to store two types of values: -1, +1, or a value in the range (0-1), to make my code more memory efficient (and faster, since large memory use tends to reduce performance). It is a little tricky to distinguish between +1 and val < 1. In this case I knew that the values < 1 had a resolution of, say, only 1/500, so I could safely use floor(val+0.000001) to get back the 1 value that I initially stored.

How to workaround the inconsistent definition of numeric_limits<T>::min()?

The numeric_limits traits class is supposed to be a general way of obtaining various type information, to be able to do things like
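template<typename T>
T min(const std::vector<T>& vect)
{
// Seeded with numeric_limits<T>::min() and folded with std::max,
// so despite its name this returns the largest element.
T val = std::numeric_limits<T>::min();
for(std::size_t i=0 ; i<vect.size() ; i++)
val = std::max(val, vect[i]);
return val;
}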
The problem is that (at least using MS Visual Studio 2008) numeric_limits<int>::min() returns the smallest negative number, while numeric_limits<double>::min() returns the smallest positive number!
Does anyone know the rationale behind this design? Is there a better (recommended?) way of using numeric_limits? In my specific function above, I could of course initialize T to vect[0], but that is not the answer I am looking for.
See also (floating-point-specific) discussion
here
You can use Boost libraries. The library Numeric Conversions provides a class called bounds that can be used consistently.
See the documentation here.
This is an old thread, but there is an updated answer:
C++11 added a lowest() function to std::numeric_limits (See here)
So you can now call std::numeric_limits<double>::lowest() to get the lowest representable negative value.
The behaviour of min() isn't all that strange, it returns FLT_MIN, DBL_MIN or INT_MIN (or their respective values), depending on the type you specialize with. So your question should be why FLT_MIN and DBL_MIN are defined differently from INT_MIN.
Unfortunately, I don't know the answer to that latter question.
My suspicion is that it was defined that way for practical purposes. For integer numbers, you're usually concerned with overflow/underflow, where the minimum and maximum value become of interest.
For floating point numbers, there exists a different kind of underflow in that a calculation could result in a value that's larger than zero, but smaller than the smallest positive normalized value for that floating point type. Knowing that smallest normalized value allows you to work around the issue. See also the Wikipedia article on subnormal/denormal numbers.
A workaround would be
double val = -std::numeric_limits<double>::max();
Of course, this doesn't explain the strange behaviour of numeric_limits::min(), which could be a result of the fact that there are different min/max borders for integers (min = -2^n, max = 2^n-1) but not for doubles.
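The -max() workaround can be wrapped in a small template so that generic code gets a consistent lower bound without C++11's lowest(). This is a sketch, not a standard facility: lowest_value and max_of are hypothetical names, and the floating-point branch assumes the type's range is symmetric around zero, as it is for IEEE 754.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical pre-C++11 stand-in for numeric_limits<T>::lowest():
// integers use min(), floating-point types use -max().
template <typename T>
T lowest_value() {
    typedef std::numeric_limits<T> lim;
    return lim::is_integer ? lim::min() : static_cast<T>(-lim::max());
}

// Largest element of a non-empty vector, seeded with the lowest value.
template <typename T>
T max_of(const std::vector<T>& values) {
    assert(!values.empty());
    T best = lowest_value<T>();
    for (typename std::vector<T>::size_type i = 0; i < values.size(); ++i) {
        if (values[i] > best) best = values[i];
    }
    return best;
}
```

Seeding with lowest_value<T>() means a vector of all-negative doubles still yields the correct maximum, which is the case that breaks when numeric_limits<double>::min() is used as the seed.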
I'm not sure of the rationale but it is expected behaviour. Well, in the sense that is how Josuttis (and, presumably the standard) describes it!
min(): "Minimum finite value (minimum normalized value for floating-point types with denormalization)."
As best I can tell, if the type is not an integer (numeric_limits<>::is_integer) and has denormalization (numeric_limits<>::has_denorm), min() returns the smallest positive normalized value of that type. Otherwise it returns the smallest value, which may be negative.
For a more consistent interface check out the Boost numeric/conversion library. Specifically the bounds traits class. Here's a snippet:
cout << "lowest float:" << boost::numeric::bounds<float>::lowest();
cout << "lowest int: " << boost::numeric::bounds<int>::lowest();
You may also find the boost::integer library useful. It brings some of C99's integer support (like int_least16_t) to C++ and can help select the best sized type for your particular need. An example:
boost::uint_t<20>::fast fastest20bits; // fastest unsigned integer that
// can hold at least 20 bits.
boost::int_max_value_t<100000>::least // smallest integer that can store
// the value 100000.
I often find that when I need one of boost::numeric/conversion or boost::integer I need them both.
numeric_limits<int>::min returned the lowest negative number, while all the floating point types returned the smallest positive number, when I tried it with Sun CC and g++.
I guess this is because 'smallest' and 'minimum' mean different things with floating point numbers. It is a bit odd though.
Both Sun CC and g++ produce the same result :
short:min: -32768 max: 32767
int:min: -2147483648 max: 2147483647
unsigned int:min: 0 max: 4294967295
long:min: -2147483648 max: 2147483647
float:min: 1.17549e-38 max: 3.40282e+38
double:min: 2.22507e-308 max: 1.79769e+308
long double:min: 3.3621e-4932 max: 1.18973e+4932
unsigned short:min: 0 max: 65535
unsigned int:min: 0 max: 4294967295
unsigned long:min: 0 max: 429496729
template<typename T>
void showMinMax()
{
cout << "min: " << numeric_limits<T>::min() << endl;
cout << "max: " << numeric_limits<T>::max() << endl;
cout << endl;
}
int main()
{
cout << "short:";
showMinMax<short>();
...etc...etc..
The definition of the smallest value for an empty vector can be argued. If the vector is empty then there is no smallest element.
Prefer to use std::min_element instead:
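#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <vector>
int main()
{
std::vector<int> v;
std::generate_n(std::back_inserter(v), 1000, std::rand);
// min_element returns end() for an empty range, so that case can be detected.
std::vector<int>::iterator it = std::min_element(v.begin(), v.end());
if (it == v.end())
{
std::cout << "There is no smallest element" << std::endl;
}
else
{
std::cout << "The smallest element is " << *it << std::endl;
}
}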