machine epsilon - long double in c++


I wanted to calculate the machine Epsilon, the smallest possible number e that gives 1 + e > 1 using different data types of C++: float, double and long double.
Here's my code:
#include <cstdio>
template<typename T>
T machineeps() {
T epsilon = 1;
T expression;
do {
epsilon = epsilon / 2;
expression = 1 + epsilon;
} while(expression > 1);
return epsilon;
}
int main() {
auto epsf = machineeps<float>();
auto epsd = machineeps<double>();
auto epsld = machineeps<long double>();
std::printf("epsilon float: %22.17e\nepsilon double: %22.17e\nepsilon long double: %Le\n", epsf, epsd, epsld);
return 0;
}
But I get this strange output:
epsilon float: 5.96046447753906250e-008
epsilon double: 1.11022302462515650e-016
epsilon long double: -0.000000e+000
The values for float and double are what I was expecting, but, I cannot explain the long double behavior.
Can somebody tell me what went wrong?

I cannot reproduce your results. I get:
epsilon long double: 5.421011e-20
Anyway, logically, the code should be something like:
template<typename T>
T machineeps() {
T epsilon = 1, prev;
T expression;
do {
prev = epsilon;
epsilon = epsilon / 2;
expression = 1 + epsilon;
} while (expression > 1);
return prev; // <-- `1+prev` yields a result different from one
}
On my platform it produces values similar to std::numeric_limits<T>::epsilon():
epsilon float: 1.19209289550781250e-07
epsilon double: 2.22044604925031308e-16
epsilon long double: 1.084202e-19
(note the different order of magnitude)
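As a cross-check, the halving loop can be compared directly against std::numeric_limits<T>::epsilon(). The sketch below is only an illustration, not the answerer's code: the name machine_eps is made up, and the volatile temporary is an assumption about how to force each sum to be stored at the precision of T on platforms that compute in wider registers.

```cpp
#include <limits>

// Hypothetical helper: smallest power of two eps for which 1 + eps still
// compares greater than 1 once the sum has been stored as a T.
template <typename T>
T machine_eps() {
    T eps = 1;
    for (;;) {
        // volatile forces the sum to be rounded to T before the comparison
        volatile T sum = T(1) + eps / T(2);
        if (!(sum > T(1))) break;   // halving once more would be absorbed
        eps /= T(2);
    }
    return eps;
}
```

On a typical IEEE 754 platform with the default rounding mode, machine_eps<float>() and machine_eps<double>() are expected to match the corresponding std::numeric_limits values.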

There are several things going on here.
First, floating-point math is often done at the maximum available precision, regardless of the actual declared type of the floating-point variable. So, for example, arithmetic on floats is usually done with 80 bits of precision on Intel x87 hardware (Java originally banned this, requiring all floating-point math to be done at the exact precision of the type; this killed floating-point performance, and they quickly abandoned that rule). Storing the result of a floating-point calculation is supposed to round the value to the precision of the declared type, but by default many compilers ignore this. You can tell your compiler not to allow that; the switch for that depends on the compiler. As it stands, you can’t rely on the result that’s being calculated here.
Second, the loop in the code terminates when the value of 1 + epsilon is not greater than 1, so the returned value will be less than the true value of epsilon.
Third, coupled with the second one, some floating-point implementations don’t do subnormal values; once the exponent becomes smaller than the smallest that can be represented, the value is 0. That may be what you’re seeing here with the long double value. IEEE floating-point handles zeros less abruptly — once you hit that minimum exponent, smaller values gradually lose precision. There are quite a few values between the smallest normalized value and 0.
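Whether a given platform provides subnormal values can be probed directly. The following is a rough sketch under the assumption of a standard <limits> header; the helper names are invented for illustration, and volatile is used only to keep the compiler from folding the arithmetic at a wider precision.

```cpp
#include <limits>

// Hypothetical helper: true if halving the smallest normal value of T still
// gives a nonzero result, i.e. gradual underflow is in effect.
template <typename T>
bool has_gradual_underflow() {
    volatile T smallest_normal = std::numeric_limits<T>::min();
    volatile T halved = smallest_normal / T(2);
    return halved > T(0);
}

// Hypothetical helper: number of halvings needed to take 1 down to exactly 0.
template <typename T>
int halvings_to_zero() {
    volatile T x = 1;
    int n = 0;
    while (x > T(0)) {
        x = x / T(2);
        ++n;
    }
    return n;
}
```

If the count for a type goes well past the exponent of its smallest normal value, the implementation is stepping through subnormals before it reaches zero.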

Related

Rounding off floating numbers in cpp

For a particular question, I need to perform calculations on a floating number, round it off to 2 digits after the decimal place, and assign it to a variable for comparison purposes. I tried to find a solution to this but all I keep finding is how to print those rounded numbers (using printf or setprecision) instead of assigning them to a variable.
Please help.

I usually do something like that:
#include <cmath> // 'std::floor'
#include <limits> // 'std::numeric_limits'
// Round value to granularity
template<typename T> inline T round(const T x, const T gran)
{
//static_assert(gran!=0);
return gran * std::floor( x/gran + std::numeric_limits<T>::round_error() );
}
double rounded_to_cent = round(1.23456, 0.01); // Gives something near 1.23
Be sure you know how floating point types work though.
Addendum: I know that this topic has already been extensively covered in other questions, but let me put this small paragraph here.
Given a real number, you can represent it with -almost- arbitrary accuracy with a (base10) literal like 1.2345, that's a string that you can type with your keyboard.
When you store that value in a floating point type, let's say a double, you -almost- always lose accuracy because your number probably won't have an exact representation in the finite set of numbers representable by that type.
Nowadays double uses 64 bits, so it has at most 2^64 distinct bit patterns to represent the uncountable infinity of real numbers: that's an H2O molecule in an infinity of infinite oceans.
The representation error is relative to the value; for example, in an IEEE 754 double, not every integer above 2^53 can be represented.
So when someone tells that the "result is wrong" they're technically right; the "acceptable" result is application dependent.
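The 2^53 limit mentioned above is easy to observe. This is a minimal sketch assuming a 64-bit IEEE 754 double; the helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: true if n survives a round trip through double.
// Assumes |n| is well below 2^63 so the conversion back cannot overflow.
bool survives_double_round_trip(std::int64_t n) {
    const double d = static_cast<double>(n);   // may round to a neighbour
    return static_cast<std::int64_t>(d) == n;
}
```

With n = 2^53 the round trip is exact, while 2^53 + 1 has no double of its own and comes back as a neighbouring integer.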

round it off to 2 digits after the decimal place, and assign it to a variable for comparison purposes
To avoid errors that creep in when using binary floating point in a decimal problem, consider alternatives.
The direct approach has corner-case errors due to double rounding and overflow. These errors may be tolerable for OP's larger goals.
// Errors:
// 1) x*100.0, round(x*100.0)/100.0 inexact.
// Select `x` values near "dddd.dd5" form an inexact product `x*100.0`
// and may lead to a double rounding error and then incorrect result when comparing.
// 2) x*100.0 may overflow.
int compare_hundredth1(double x, double ref) {
x = round(x*100.0)/100.0;
return (x > ref) - (x < ref);
}
We can do better.
When a wider floating point type exists:
int compare_hundredth2(double x, double ref) {
auto x_rounded = std::round(x*100.0L); // long double overload from <cmath>
auto ref_rounded = ref*100.0L;
return (x_rounded > ref_rounded) - (x_rounded < ref_rounded);
}
To use the same width floating point type takes more work:
All finite large values of x, ref are whole numbers and need no rounding to the nearest 0.01.
int compare_hundredth3(double x, double ref) {
double x_whole;
auto x_fraction = modf(x, &x_whole);
// If rounding needed ...
if (x_fraction != 0.0) {
if (x - 0.01 > ref) return 1; // x much more than ref
if (x + 0.01 < ref) return -1; // x much less than ref
// x, ref nearly the same
double ref_whole;
auto ref_fraction = modf(ref, &ref_whole); // split ref (not x) into whole and fraction
// subtraction expected to be exact here; round the scaled value to the nearest hundredth
auto x100 = round((x - ref_whole)*100);
auto ref100 = ref_fraction*100;
return (x100 > ref100) - (x100 < ref100);
}
return (x > ref) - (x < ref);
}
The above assumes ref is without error. If this is not so, consider using a scaled ref.
Note: The above sets aside not-a-number concerns.
More clean-up later.

Here's an example with a custom function that rounds the floating-point number f to n decimal places. Basically, it multiplies the number by 10 to the power of n to move the wanted decimal places in front of the decimal point, then uses roundf to round to the nearest integer, and finally divides the result by 10 to the power of n (n is the number of decimal places). Works for C and C++:
#include <stdio.h>
#include <math.h>
float my_round(float f, unsigned int n)
{
float p = powf(10.0f, (float)n);
f *= p;
f = roundf(f);
f /= p;
return f;
}
int main()
{
float f = 0.78901f;
printf("%f\n", f);
f = my_round(f, 2); /* Round with 2 decimal places */
printf("%f\n", f);
return 0;
}
Output:
0.789010
0.790000

How to round a floating point type to two decimals or more in C++? [duplicate]

How can I round a float value (such as 37.777779) to two decimal places (37.78) in C?

If you just want to round the number for output purposes, then the "%.2f" format string is indeed the correct answer. However, if you actually want to round the floating point value for further computation, something like the following works:
#include <math.h>
float val = 37.777779;
float rounded_down = floorf(val * 100) / 100; /* Result: 37.77 */
float nearest = roundf(val * 100) / 100; /* Result: 37.78 */
float rounded_up = ceilf(val * 100) / 100; /* Result: 37.78 */
Notice that there are three different rounding rules you might want to choose: round down (ie, truncate after two decimal places), rounded to nearest, and round up. Usually, you want round to nearest.
As several others have pointed out, due to the quirks of floating point representation, these rounded values may not be exactly the "obvious" decimal values, but they will be very very close.
For much (much!) more information on rounding, and especially on tie-breaking rules for rounding to nearest, see the Wikipedia article on Rounding.

Use %.2f in printf. It prints only 2 digits after the decimal point, rounding the value.
Example:
printf("%.2f", 37.777779);
Output:
37.78

Assuming you're talking about rounding the value for printing, then Andrew Coleson's and AraK's answers are correct:
printf("%.2f", 37.777779);
But note that if you're aiming to round the number to exactly 37.78 for internal use (eg to compare against another value), then this isn't a good idea, due to the way floating point numbers work: you usually don't want to do equality comparisons for floating point, instead use a target value +/- a sigma value. Or encode the number as a string with a known precision, and compare that.
See the link in Greg Hewgill's answer to a related question, which also covers why you shouldn't use floating point for financial calculations.
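The "encode the number as a string with a known precision, and compare that" idea could look roughly like the following. It is a sketch rather than a recommendation, and the function name is hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: compares a and b after formatting both with two
// digits after the decimal point. 512 bytes is enough for any finite double.
// Note that tiny negative values format as "-0.00", which differs from "0.00".
bool equal_to_two_places(double a, double b) {
    char sa[512];
    char sb[512];
    std::snprintf(sa, sizeof sa, "%.2f", a);
    std::snprintf(sb, sizeof sb, "%.2f", b);
    return std::strcmp(sa, sb) == 0;
}
```

For example, equal_to_two_places(37.777779, 37.78) compares the texts "37.78" and "37.78", so no floating-point equality test is involved.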

How about this:
float value = 37.777779;
float rounded = ((int)(value * 100 + .5) / 100.0);

printf("%.2f", 37.777779);
If you want to write to C-string:
char number[24]; // dummy size, you should take care of the size!
sprintf(number, "%.2f", 37.777779);

Always use the printf family of functions for this. Even if you want to get the value as a float, you're best off using snprintf to get the rounded value as a string and then parsing it back with atof:
#include <math.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
double dround(double val, int dp) {
int charsNeeded = 1 + snprintf(NULL, 0, "%.*f", dp, val);
char *buffer = malloc(charsNeeded);
snprintf(buffer, charsNeeded, "%.*f", dp, val);
double result = atof(buffer);
free(buffer);
return result;
}
I say this because the approach shown by the currently top-voted answer and several others here - multiplying by 100, rounding to the nearest integer, and then dividing by 100 again - is flawed in two ways:
For some values, it will round in the wrong direction because the multiplication by 100 changes the decimal digit determining the rounding direction from a 4 to a 5 or vice versa, due to the imprecision of floating point numbers.
For some values, multiplying and then dividing by 100 doesn't round-trip, meaning that even if no rounding takes place the end result will be wrong.
To illustrate the first kind of error - the rounding direction sometimes being wrong - try running this program:
int main(void) {
// This number is EXACTLY representable as a double
double x = 0.01499999999999999944488848768742172978818416595458984375;
printf("x: %.50f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.50f\n", res1);
printf("Rounded with round, then divided: %.50f\n", res2);
}
You'll see this output:
x: 0.01499999999999999944488848768742172978818416595459
Rounded with snprintf: 0.01000000000000000020816681711721685132943093776703
Rounded with round, then divided: 0.02000000000000000041633363423443370265886187553406
Note that the value we started with was less than 0.015, and so the mathematically correct answer when rounding it to 2 decimal places is 0.01. Of course, 0.01 is not exactly representable as a double, but we expect our result to be the double nearest to 0.01. Using snprintf gives us that result, but using round(100 * x) / 100 gives us 0.02, which is wrong. Why? Because 100 * x gives us exactly 1.5 as the result. Multiplying by 100 thus changes the correct direction to round in.
To illustrate the second kind of error - the result sometimes being wrong due to * 100 and / 100 not truly being inverses of each other - we can do a similar exercise with a very big number:
int main(void) {
double x = 8631192423766613.0;
printf("x: %.1f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.1f\n", res1);
printf("Rounded with round, then divided: %.1f\n", res2);
}
Our number now doesn't even have a fractional part; it's an integer value, just stored with type double. So the result after rounding it should be the same number we started with, right?
If you run the program above, you'll see:
x: 8631192423766613.0
Rounded with snprintf: 8631192423766613.0
Rounded with round, then divided: 8631192423766612.0
Oops. Our snprintf method returns the right result again, but the multiply-then-round-then-divide approach fails. That's because the mathematically correct value of 8631192423766613.0 * 100, 863119242376661300.0, is not exactly representable as a double; the closest value is 863119242376661248.0. When you divide that back by 100, you get 8631192423766612.0 - a different number to the one you started with.
Hopefully that's a sufficient demonstration that using roundf for rounding to a number of decimal places is broken, and that you should use snprintf instead. If that feels like a horrible hack to you, perhaps you'll be reassured by the knowledge that it's basically what CPython does.
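For C++ callers, the same format-then-parse idea can be written without manual malloc/free. This is a sketch only: round_via_text is a made-up name, and it relies on the C library's printf family rounding the exact stored value correctly, as the answer above assumes.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical C++ variant of dround: format with dp decimals, then parse
// the text back into a double.
double round_via_text(double val, int dp) {
    const int len = std::snprintf(nullptr, 0, "%.*f", dp, val);
    if (len < 0) return val;   // formatting failed; return the input unchanged
    std::string buf(static_cast<std::size_t>(len) + 1, '\0');
    std::snprintf(&buf[0], buf.size(), "%.*f", dp, val);
    return std::strtod(buf.c_str(), nullptr);
}
```

The std::string owns the buffer, so there is nothing to free on any return path.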

Also, if you're using C++, you can just create a function like this:
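#include <sstream> // std::stringstream, std::fixed
#include <string> // std::string
std::string prd(const double x, const int decDigits) {
std::stringstream ss;
ss << std::fixed;
ss.precision(decDigits); // set # places after decimal
ss << x;
return ss.str();
}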
You can then output any double myDouble with n places after the decimal point with code such as this:
std::cout << prd(myDouble,n);

There isn't a way to round a float to another float because the rounded float may not be representable (a limitation of floating-point numbers). For instance, say you round 37.777779 to 37.78, but the nearest representable number is 37.781.
However, you can "round" a float by using a format string function.

You can still use:
float ceilf(float x); // don't forget #include <math.h> and link with -lm.
example:
float valueToRound = 37.777779;
float roundedValue = ceilf(valueToRound * 100) / 100;

In C++ (or in C with C-style casts), you could create the function:
/* Function to control # of decimal places to be output for x */
double showDecimals(const double& x, const int& numDecimals) {
int y=x;
double z=x-y;
double m=pow(10,numDecimals);
double q=z*m;
double r=round(q);
return static_cast<double>(y)+(1.0/m)*r;
}
Then std::cout << showDecimals(37.777779,2); would produce: 37.78.
Obviously you don't really need to create all 5 variables in that function, but I leave them there so you can see the logic. There are probably simpler solutions, but this works well for me--especially since it allows me to adjust the number of digits after the decimal place as I need.

Use float roundf(float x).
"The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.6
#include <math.h>
float y = roundf(x * 100.0f) / 100.0f;
Depending on your float implementation, numbers that may appear to be half-way are not, as floating-point is typically base-2 oriented. Further, precisely rounding to the nearest 0.01 on all "half-way" cases is most challenging.
void r100(const char *s) {
float x, y;
sscanf(s, "%f", &x);
y = round(x*100.0)/100.0;
printf("%6s %.12e %.12e\n", s, x, y);
}
int main(void) {
r100("1.115");
r100("1.125");
r100("1.135");
return 0;
}
1.115 1.115000009537e+00 1.120000004768e+00
1.125 1.125000000000e+00 1.129999995232e+00
1.135 1.134999990463e+00 1.129999995232e+00
Although "1.115" is "half-way" between 1.11 and 1.12, when converted to float, the value is 1.115000009537... and is no longer "half-way", but closer to 1.12 and rounds to the closest float of 1.120000004768...
"1.125" is "half-way" between 1.12 and 1.13, when converted to float, the value is exactly 1.125 and is "half-way". It rounds toward 1.13 because round() resolves ties away from zero, and rounds to the closest float of 1.129999995232...
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13 and rounds to the closest float of 1.129999995232...
If code used
y = roundf(x*100.0f)/100.0f;
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13 but incorrectly rounds to float of 1.139999985695... due to the more limited precision of float vs. double. This incorrect value may be viewed as correct, depending on coding goals.

Code definition :
#define roundz(x,d) ((floor(((x)*pow(10,d))+.5))/pow(10,d))
Results :
a = 8.000000
sqrt(a) = r = 2.828427
roundz(r,2) = 2.830000
roundz(r,3) = 2.828000
roundz(r,5) = 2.828430

double f_round(double dval, int n)
{
char l_fmtp[32], l_buf[64];
char *p_str;
sprintf (l_fmtp, "%%.%df", n);
if (dval>=0)
sprintf (l_buf, l_fmtp, dval);
else
sprintf (l_buf, l_fmtp, dval);
return ((double)strtod(l_buf, &p_str));
}
Here n is the number of decimals
example:
double d = 100.23456;
printf("%f", f_round(d, 4));// result: 100.2346
printf("%f", f_round(d, 2));// result: 100.23

I made this macro for rounding float numbers.
Add it to your header / beginning of file
#define ROUNDF(f, c) (((float)((int)((f) * (c))) / (c)))
Here is an example:
float x = ROUNDF(3.141592, 100);
x equals 3.14 :) (note that the cast to int truncates toward zero rather than rounding to nearest)

Let me first attempt to justify my reason for adding yet another answer to this question. In an ideal world, rounding is not really a big deal. However, in real systems, you may need to contend with several issues that can result in rounding that may not be what you expect. For example, you may be performing financial calculations where final results are rounded and displayed to users as 2 decimal places; these same values are stored with fixed precision in a database that may include more than 2 decimal places (for various reasons; there is no optimal number of places to keep...depends on specific situations each system must support, e.g. tiny items whose prices are fractions of a penny per unit); and, floating point computations performed on values where the results are plus/minus epsilon. I have been confronting these issues and evolving my own strategy over the years. I won't claim that I have faced every scenario or have the best answer, but below is an example of my approach so far that overcomes these issues:
Suppose 6 decimal places is regarded as sufficient precision for calculations on floats/doubles (an arbitrary decision for the specific application), using the following rounding function/method:
double Round(double x, int p)
{
if (x != 0.0) {
return ((floor((fabs(x)*pow(double(10.0),p))+0.5))/pow(double(10.0),p))*(x/fabs(x));
} else {
return 0.0;
}
}
Rounding to 2 decimal places for presentation of a result can be performed as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,8),6),2));
For val = 6.825, result is 6.83 as expected.
For val = 6.824999, result is 6.82. Here the assumption is that the calculation resulted in exactly 6.824999 and the 7th decimal place is zero.
For val = 6.8249999, result is 6.83. The 7th decimal place being 9 in this case causes the Round(val,6) function to give the expected result. For this case, there could be any number of trailing 9s.
For val = 6.824999499999, result is 6.83. Rounding to the 8th decimal place as a first step, i.e. Round(val,8), takes care of the one nasty case whereby a calculated floating point result calculates to 6.8249995, but is internally represented as 6.824999499999....
Finally, the example from the question...val = 37.777779 results in 37.78.
This approach could be further generalized as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,N+2),N),2));
where N is precision to be maintained for all intermediate calculations on floats/doubles. This works on negative values as well. I do not know if this approach is mathematically correct for all possibilities.

...or you can do it the old-fashioned way without any libraries:
float a = 37.777779;
int b = a; // b = 37
float c = a - b; // c = 0.777779
c *= 100; // c = 77.777863
int d = c; // d = 77;
a = b + d / (float)100; // a = 37.770000;
That of course if you want to remove the extra information from the number.

This function takes the number and precision and returns the rounded-off number:
float roundoff(float num,int precision)
{
int temp=(int )(num*pow(10,precision));
int num1=num*pow(10,precision+1);
temp*=10;
temp+=5;
if(num1>=temp)
num1+=10;
num1/=10;
num1*=10;
num=num1/pow(10,precision+1);
return num;
}
It converts the floating point number into an int by shifting the decimal point and checking for the greater-than-five condition.

Floating point equality and tolerances

Comparing two floating point number by something like a_float == b_float is looking for trouble since a_float / 3.0 * 3.0 might not be equal to a_float due to round off error.
What one normally does is something like fabs(a_float - b_float) < tol.
How does one calculate tol?
Ideally the tolerance should be just larger than the value of one or two of the least significant figures. So if single-precision floating-point numbers are used, tol = 10E-6 should be about right. However, this does not work well for the general case where a_float might be very small or might be very large.
How does one calculate tol correctly for all general cases? I am interested in C or C++ cases specifically.

This blogpost contains an example, fairly foolproof implementation, and detailed theory behind it
http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
it is also one of a series, so you can always read more.
In short: use ULP for most numbers, use epsilon for numbers near zero, but there are still caveats. If you want to be sure about your floating point math, I recommend reading the whole series.
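A simplified sketch of that two-part test is shown below. It is not the blog's implementation: a relative tolerance stands in for the ULP comparison, and the function and parameter names are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: absolute tolerance near zero, relative tolerance
// elsewhere. Returns false if either input is NaN, and also for two equal
// infinities, because their difference is NaN.
bool almost_equal(double a, double b, double abs_tol, double rel_tol) {
    const double diff = std::fabs(a - b);
    if (diff <= abs_tol) return true;   // covers values close to zero
    const double largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * rel_tol;   // scales with the magnitude
}
```

Both tolerances remain application-specific; something like almost_equal(0.1 + 0.2, 0.3, 1e-12, 1e-9) is one plausible choice for values of ordinary magnitude.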

As far as I know, one doesn't.
There is no general "right answer", since it can depend on the application's requirement for precision.
For instance, a 2D physics simulation working in screen-pixels might decide that 1/4 of a pixel is good enough, while a 3D CAD system used to design nuclear plant internals might not.
I can't see a way to programmatically decide this from the outside.

The C header file <float.h> gives you the constants FLT_EPSILON and DBL_EPSILON, which is the difference between 1.0 and the smallest number larger than 1.0 that a float/double can represent. You can scale that by the size of your numbers and the rounding error you wish to tolerate:
#include <float.h>
#include <math.h> /* frexp, ldexp */
#ifndef DBL_TRUE_MIN
/* DBL_TRUE_MIN (standard since C11, a common extension before that) is the minimum denorm value
* DBL_MIN is the minimum non-denorm value -- use that if TRUE_MIN is not defined */
#define DBL_TRUE_MIN DBL_MIN
#endif
/* return the difference between |x| and the next larger representable double */
double dbl_epsilon(double x) {
int exp;
if (frexp(x, &exp) == 0.0)
return DBL_TRUE_MIN;
return ldexp(DBL_EPSILON, exp-1);
}

Welcome to the world of traps, snares and loopholes. As mentioned elsewhere, a general purpose solution for floating point equality and tolerances does not exist. Given that, there are tools and axioms that a programmer may use in select cases.
fabs(a_float - b_float) < tol has the shortcoming OP mentioned: "does not work well for the general case where a_float might be very small or might be very large." fabs(a_float - ref_float) <= fabs(ref_float * tol) copes with the variant ranges much better.
OP's "single precision floating point number is use tol = 10E-6" is a bit worrisome, for C and C++ so easily promote float arithmetic to double, and then it's the "tolerance" of double, not float, that comes into play. Consider float f = 1.0; printf("%.20f\n", f/7.0); So many new programmers do not realize that the 7.0 caused a double precision calculation. Recommend using double throughout your code except where large amounts of data need the smaller size of float.
C99 provides nextafter() which can be useful in helping to gauge "tolerance". Using it, one can determine the next representable number. This will help with the OP "... the full number of significant digits for the storage type minus one ... to allow for roundoff error." if ((nextafter(x, -INFINITY) <= y) && (y <= nextafter(x, +INFINITY))) ...
The kind of tol or "tolerance" used is often the crux of the matter. Most often (IMHO) a relative tolerance is important. e. g. "Are x and y within 0.0001%"? Sometimes an absolute tolerance is needed. e.g. "Are x and y within 0.0001"?
The value of the tolerance is often debatable for the best value is often situation dependent. Comparing within 0.01 may work for a financial application for Dollars but not Yen. (Hint: be sure to use a coding style that allows easy updates.)
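The two checks mentioned in this answer, the reference-relative test and the nextafter() neighbour test, might be wrapped up as follows. The function names are hypothetical and the sketch sets NaN handling aside.

```cpp
#include <cmath>
#include <limits>

// Relative test against a trusted reference value, mirroring
// fabs(a_float - ref_float) <= fabs(ref_float * tol).
bool close_to_reference(double a, double ref, double tol) {
    return std::fabs(a - ref) <= std::fabs(ref * tol);
}

// True if y equals x or one of the two representable neighbours of x.
bool within_one_step(double x, double y) {
    const double inf = std::numeric_limits<double>::infinity();
    return std::nextafter(x, -inf) <= y && y <= std::nextafter(x, inf);
}
```

close_to_reference answers "are x and y within 0.0001%?" style questions, while within_one_step is the strictest useful check short of exact equality.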

Rounding error varies according to values used for operations.
Instead of a fixed tolerance, you can probably use a factor of epsilon like:
bool nearly_equal(double a, double b, int factor /* a factor of epsilon */)
{
double min_a = a - (a - std::nextafter(a, std::numeric_limits<double>::lowest())) * factor;
double max_a = a + (std::nextafter(a, std::numeric_limits<double>::max()) - a) * factor;
return min_a <= b && max_a >= b;
}

Although the value of the tolerance depends on the situation, if you are looking for a precise comparison you could use the machine epsilon value, std::numeric_limits<T>::epsilon() (header <limits>), as the tolerance. The function returns the difference between 1 and the smallest value greater than 1 that is representable for the data type.
http://msdn.microsoft.com/en-us/library/6x7575x3.aspx
The value of epsilon differs depending on whether you are comparing floats or doubles. For instance, on my computer, the value of epsilon is 1.1920929e-007 for floats and 2.2204460492503131e-016 for doubles.
For a relative comparison between x and y, multiply the epsilon by the maximum absolute value of x and y.
The result above could be multiplied by the ulps (units in the last place) which allows you to play with the precision.
#include <algorithm> // std::max
#include <cmath> // std::abs
#include <limits> // std::numeric_limits
template<class T> bool are_almost_equal(T x, T y, int ulp)
{
return std::abs(x-y) <= std::numeric_limits<T>::epsilon() * std::max(std::abs(x), std::abs(y)) * ulp;
}

When I need to compare floats, I use code like this
bool same( double a, double b, double error ) {
double x;
if( a == 0 ) {
x = b;
} else if( b == 0 ) {
x = a;
} else {
x = (a-b) / a;
}
return fabs(x) < error;
}

Comparing Same Float Values In C [duplicate]

When I am trying to compare 2 same float values it doesn't print "equal values" in the following code :
void main()
{
float a = 0.7;
clrscr();
if (a < 0.7)
printf("value : %f",a);
else if (a == 0.7)
printf("equal values");
else
printf("hello");
getch();
}
Thanks in advance.

While many people will tell you to always compare floating point numbers with an epsilon (and it's usually a good idea, though it should be a percentage of the values being compared rather than a fixed value), that's not actually necessary here since you're using constants.
Your specific problem here is that:
float a = 0.7;
uses the double constant 0.7 to create a single precision number (losing some precision) while:
if (a == 0.7)
will compare two double precision numbers (a is promoted first).
The precision that was lost when turning the double 0.7 into the float a is not regained when promoting a back to a double.
If you change all those 0.7 values to 0.7f (to force float rather than double), or if you just make a a double, it will work fine - I rarely use float nowadays unless I have a massive array of them and need to save space.
You can see this in action with:
#include <stdio.h>
int main (void){
float f = 0.7; // double converted to float
double d1 = 0.7; // double kept as double
double d2 = f; // float converted back to double
printf ("double: %.30f\n", d1);
printf ("double from float: %.30f\n", d2);
return 0;
}
which will output something like (slightly modified to show difference):
double: 0.6999999|99999999955591079014994
double from float: 0.6999999|88079071044921875000000
\_ different beyond here.

Floating point numbers are not what you think they are. Here are two sources with more information: What Every Computer Scientist Should Know About Floating-Point Arithmetic and The Floating-Point Guide.
The short answer is that due to the way floating point numbers are represented, you cannot do basic comparison or arithmetic and expect it to work.

You are comparing a single-precision approximation of 0.7 with a double-precision approximation. To get the expected output you should use:
if(a == 0.7f) // check a is exactly 0.7f
Note that due to representation and rounding errors it may be very unlikely to ever get exactly 0.7f from any operation. In general you should check if fabs(a-0.7) is sufficiently close to 0.
Don't forget that the exact value of 0.7f is not really 0.7, but slightly lower:
0.7f = 0.699999988079071044921875
The exact value of the double precision representation of 0.7 is a better approximation, but still not exactly 0.7:
0.7d = 0.6999999999999999555910790149937383830547332763671875
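The difference between the two comparisons can be reduced to a pair of one-line functions. This is an illustrative sketch with made-up names, assuming float and double are the usual IEEE 754 single and double formats.

```cpp
// Hypothetical helpers: the same float compared against each kind of literal.
bool equals_double_literal(float a) {
    return a == 0.7;    // a is promoted to double; 0.7 keeps double precision
}

bool equals_float_literal(float a) {
    return a == 0.7f;   // both operands are the float approximation of 0.7
}
```

Passing 0.7f to both functions exercises exactly the situation in the question: only the float-literal comparison can see two identical values.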

a is a float; 0.7 is a value of type double.
The comparison between the two requires a conversion. The compiler will convert the float value to a double value ... and the value resulting from converting a float to a double is not the same as the value resulting from the compiler converting a string of text (the source code) to a double.
But don't ever compare floating point values (float, double, or long double) with ==.
You might like to read "What Every Programmer Should Know About Floating-Point Arithmetic".

Floating point numbers must not be compared with the "==" operator.
Instead of comparing float numbers with the "==" operator, you can use a function like this one :
//compares if the float f1 is equal with f2 and returns 1 if true and 0 if false
int compare_float(float f1, float f2)
{
float precision = 0.00001;
if (((f1 - precision) < f2) &&
((f1 + precision) > f2))
{
return 1;
}
else
{
return 0;
}
}

The lack of absolute precision in floats makes it more difficult to do trivial comparisons than for integers. See this page on comparing floats in C. In particular, one code snippet lifted from there exhibits a 'workaround' to this issue:
bool AlmostEqual2sComplement(float A, float B, int maxUlps)
{
// Make sure maxUlps is non-negative and small enough that the
// default NAN won't compare as equal to anything.
assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);
int aInt = *(int*)&A;
// Make aInt lexicographically ordered as a twos-complement int
if (aInt < 0)
aInt = 0x80000000 - aInt;
// Make bInt lexicographically ordered as a twos-complement int
int bInt = *(int*)&B;
if (bInt < 0)
bInt = 0x80000000 - bInt;
int intDiff = abs(aInt - bInt);
if (intDiff <= maxUlps)
return true;
return false;
}
A simple and common workaround is to provide an epsilon with code like so:
if (fabs(result - expectedResult) < 0.00001)
This essentially checks the difference between the values is within a threshold. See the linked article as to why this is not always optimal though :)
Another article is pretty much the de facto standard of what is linked to when people ask about floats on SO.
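The ULP idea in the snippet above can also be written without casting a float* to an int*, which is not a well-defined way to read the bits in C++. The sketch below is a variant under stated assumptions (32-bit IEEE 754 float); the function names are invented, and it is not the code from the linked article.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes a 32-bit float");

// Maps a float to an integer that increases monotonically with its value.
std::int64_t ordered_bits(float f) {
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // copy the bytes instead of casting pointers
    // Floats are sign-magnitude: send negative values to minus their magnitude.
    return bits < 0 ? static_cast<std::int64_t>(INT32_MIN) - bits : bits;
}

// True if a and b are at most max_ulps representable floats apart.
bool almost_equal_ulps(float a, float b, std::int64_t max_ulps) {
    if (std::isnan(a) || std::isnan(b)) return false;   // NaN equals nothing
    const std::int64_t diff = ordered_bits(a) - ordered_bits(b);
    return (diff < 0 ? -diff : diff) <= max_ulps;
}
```

Doing the subtraction in 64-bit arithmetic avoids the overflow that a 32-bit difference could hit when the two inputs have opposite signs.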

If you need to compare a with 0.7, then use:
if( fabs(a-0.7) < 0.00001 )
//your code
Here 0.00001 can be changed to less (like 0.00000001) or more (like 0.0001). It depends on the precision you need.

Can I trust a real-to-int conversion of the result of ceil()?

Suppose I have some code such as:
float a, b = ...; // both positive
int s1 = ceil(sqrt(a/b));
int s2 = ceil(sqrt(a/b)) + 0.1;
Is it ever possible that s1 != s2? My concern is when a/b is a perfect square. For example, perhaps a=100.0 and b=4.0, then the output of ceil should be 5.00000 but what if instead it is 4.99999?
Similar question: is there a chance that sqrt(100.0/4.0) evaluates to, say, 5.00001 and then ceil will round it up to 6.00000?
I'd prefer to do this in integer math but the sqrt kinda screws that plan.
EDIT: suggestions on how to better implement this would be appreciated too! The a and b values are integer values, so actual code is more like: ceil(sqrt(float(a)/b))
EDIT: Based on levis501's answer, I think I will do this:
float a, b = ...; // both positive
int s = sqrt(a/b);
while (s*s*b < a) ++s;
Thank you all!

I don't think it's possible. Regardless of the value of sqrt(a/b), what it produces is some value N that we use as:
int s1 = ceil(N);
int s2 = ceil(N) + 0.1;
Since ceil always produces an integer value (albeit represented as a double), we will always have some value X, for which the first produces X.0 and the second X.1. Conversion to int will always truncate that .1, so both will result in X.
It might seem like there would be an exception if X was so large that X.1 overflowed the range of double. I don't see where this could be possible though. Except close to 0 (where overflow isn't a concern) the square root of a number will always be smaller than the input number. Therefore, before ceil(N)+0.1 could overflow, the a/b being used as an input in sqrt(a/b) would have to have overflowed already.
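The truncation argument can be checked directly by performing both conversions from the question and comparing them. This is only a sketch: `conversions_agree` is a made-up helper, and it assumes the input is non-negative and well inside the range of `int`.

```cpp
#include <cassert>
#include <cmath>

// True when the two conversions from the question yield the same int.
bool conversions_agree(double n) {
    assert(n >= 0.0 && n < 1e9);  // keep the result inside the range of int
    const int s1 = static_cast<int>(std::ceil(n));
    const int s2 = static_cast<int>(std::ceil(n) + 0.1);  // X.1 truncates to X
    return s1 == s2;
}
```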

You may want to write an explicit function for your case. e.g.:
/* return the smallest positive integer whose square is at least x */
int isqrt(double x) {
int y1 = ceil(sqrt(x));
int y2 = y1 - 1;
if ((y2 * y2) >= x) return y2;
return y1;
}
This will handle the odd case where sqrt returns a value slightly above the exact square root, so that ceil rounds up one integer too far.

Equality of floating point numbers is indeed an issue, but IMHO not if we deal with integer numbers.
If you have the case of 100.0/4.0, it should evaluate to exactly 25.0, as 25.0 is exactly representable as a float, as opposed to e.g. 25.1.

Yes, it's entirely possible that s1 != s2. Why is that a problem, though?
It seems natural enough that s1 != (s1 + 0.1).
BTW, if you would prefer to have 5.00001 rounded to 5.00000 instead of 6.00000, use rint instead of ceil.
And to answer the actual question (in your comment) - you can use sqrt to get a starting point and then just find the correct square using integer arithmetic.
int min_dimension_greater_than(int items, int buckets)
{
double target = double(items) / buckets;
int min_square = ceil(target);
int dim = floor(sqrt(target));
int square = dim * dim;
while (square < min_square) {
dim += 1;
square = dim * dim;
}
return dim;
}
And yes, this can be improved a lot, it's just a quick sketch.
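The idea above (take `sqrt` as a starting estimate, then correct it with integer arithmetic) can be sketched a bit more defensively as follows. The function name `ceil_sqrt_ratio` is made up for illustration, and the sketch assumes `a` and `b` are positive and small enough that `s * s * b` does not overflow `long long`.

```cpp
#include <cassert>
#include <cmath>

// Smallest integer s >= 0 such that s*s*b >= a, i.e. ceil(sqrt(a/b)),
// decided with exact integer comparisons.
long long ceil_sqrt_ratio(long long a, long long b) {
    assert(a > 0 && b > 0);
    const double ratio = static_cast<double>(a) / static_cast<double>(b);
    long long s = static_cast<long long>(std::sqrt(ratio));  // rough estimate
    while (s > 0 && (s - 1) * (s - 1) * b >= a) --s;  // estimate was too high
    while (s * s * b < a) ++s;                          // estimate was too low
    return s;
}
```

Because the final answer depends only on the integer tests, a floating-point estimate that is off by one in either direction is corrected by the loops.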

s1 will always equal s2.
The C and C++ standards do not say much about the accuracy of math routines. Taken literally, it is impossible for the standard to be implemented, since the C standard says sqrt(x) returns the square root of x, but the square root of two cannot be exactly represented in floating point.
Implementing routines with good performance that always return a correctly rounded result (in round-to-nearest mode, this means the result is the representable floating-point number that is nearest to the exact result, with ties resolved in favor of a low zero bit) is a difficult research problem. Good math libraries target accuracy less than 1 ULP (so one of the two nearest representable numbers is returned), perhaps something slightly more than .5 ULP. (An ULP is the Unit of Least Precision, the value of the low bit given a particular value in the exponent field.) Some math libraries may be significantly worse than this. You would have to ask your vendor or check the documentation for more information.
So sqrt may be slightly off. If the exact square root is an integer (within the range in which integers are exactly representable in floating-point) and the library guarantees errors are less than 1 ULP, then the result of sqrt must be exactly correct, because any result other than the exact result is at least 1 ULP away.
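As a sanity check of the perfect-square argument, the following sketch counts how often `std::sqrt(k*k)` differs from `k`. The helper name and the bound on `limit` are assumptions made for this example. On a platform whose `sqrt` is correctly rounded (IEEE 754 requires this of its square-root operation) the count should be zero, but the C++ standard itself makes no such promise.

```cpp
#include <cassert>
#include <cmath>

// Counts the integers k in [1, limit] for which std::sqrt(k*k) != k.
int count_inexact_square_roots(int limit) {
    assert(limit > 0 && limit <= 1000000);  // keeps k*k far below 2^53
    int mismatches = 0;
    for (int k = 1; k <= limit; ++k) {
        const double square = static_cast<double>(k) * k;  // exact in double
        if (std::sqrt(square) != static_cast<double>(k)) ++mismatches;
    }
    return mismatches;
}
```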
Similarly, if the library guarantees errors are less than 1 ULP, then ceil must return the exact result, again because the exact result is representable and any other result would be at least 1 ULP away. Additionally, the nature of ceil is such that I would expect any reasonable math library to always return an integer, even if the rest of the library were not high quality.
As for overflow cases, if ceil(x) were beyond the range where all integers are exactly representable, then ceil(x)+.1 is closer to ceil(x) than it is to any other representable number, so the rounded result of adding .1 to ceil(x) should be ceil(x) in any system implementing the floating-point standard (IEEE 754). That is provided you are in the default rounding mode, which is round-to-nearest. It is possible to change the rounding mode to something like round-toward-infinity, which could cause ceil(x)+.1 to be an integer higher than ceil(x).
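The claim about large values can be illustrated with a small sketch. `absorbs_tenth` is a hypothetical name, and the code assumes IEEE 754 `double` in the default round-to-nearest mode. The `volatile` store is there to force the sum to be rounded to `double` even on hardware that computes in wider registers.

```cpp
#include <cassert>
#include <cmath>

// True when x + 0.1 rounds back to x, i.e. the 0.1 is lost entirely.
bool absorbs_tenth(double x) {
    assert(std::isfinite(x));
    volatile double sum = x + 0.1;  // force rounding to double precision
    return sum == x;
}
```

For example, `absorbs_tenth(std::ldexp(1.0, 53))` holds because neighbouring doubles near 2^53 are 2 apart, while `absorbs_tenth(5.0)` does not.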
