C program outputting different values with different numbers? [duplicate] - c++

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 6 years ago.
See the program below
#include<stdio.h>
int main()
{
float x = 0.1;
if (x == 0.1)
printf("IF");
else if (x == 0.1f)
printf("ELSE IF");
else
printf("ELSE");
}
And another program here
#include<stdio.h>
int main()
{
float x = 0.5;
if (x == 0.5)
printf("IF");
else if (x == 0.5f)
printf("ELSE IF");
else
printf("ELSE");
}
From the both programs we expect similar results because nothing has literally changed in both changed, everything is same and also comparison terms are changed correspondingly.
BUT 2 above programs produce different results
1st Program
ELSE
2nd Program
IF
Why is this 2 programs behaving differently

The behavior of these two programs will vary between computers and operating systems - you are testing for exact equality of floats.
In memory, floats are stored as a string of bits in binary - i.e. 0.1 in binary (0.1b) represents 0.5 in decimal (0.5d).
Similarly,
Binary | Decimal
0.1 | 2^-1 = 1/2
0.01 | 2^-2 = 1/4
0.001 | 2^-3 = 1/8
0.11 | 2^-1 + 2^-2 = 3/4
The problem is that some decimals don't have nice floating point representations.
0.1d = 0.0001100110011001100110011...
which is infinitely long.
So, 0.5 is really nice in binary
0.5d = 0.1000000000000000...b
but 0.1 is really nasty
0.1d = 0.00011001100110011...
Now depending on your compiler, it may assume that 0.1f is a double type, which stores more of the infinite sequence of 0.0001100110011001100110011001100110011...
so it is not equal to the float version, which truncates the sequence much earlier.
On the other hand, 0.5f is the same regardless of how many decimal places are stored, since it has all zeroes after the first place.
The accepted way to compare floats or doubles in C++ or C is to #define a very small number (I like to call it EPS, short for EPSILON) and replace
float a = 0.1f
if (a == 0.1f) {
printf("IF\n")
} else {
printf("ELSE\n")
}
with
#include <math.h>
#define EPS 0.0000001f
float a = 0.1f
if (abs(a - 0.1f) < EPS) {
printf("IF\n")
} else {
printf("ELSE\n")
}
Effectively, this tests if a is 'close enough' to 0.1f instead of exact equality. For 99% of applications, this approach works just fine, but for super-sensitive calculations some stranger tricks are needed that involve using long double, or defining a custom data type.

You are using two data types: double,automaticly in if(x=0.1))(0.1 is double) and x is float. these types differ how they store the value. 0.1 is not 0.1f, it is 0.100000000001 (double) or 0.09388383(something)

Related

How does Cpp work with large numbers in calculations?

I have a code that tries to solve an integral of a function in a given interval numerically, using the method of Trapezoidal Rule (see the formula in Trapezoid method ), now, for the function sin(x) in the interval [-pi/2.0,pi/2.0], the integral is waited to be zero.
In this case, I take the number of partitions 'n' equal to 4. The problem is that when I have pi with 20 decimal places it is zero, with 14 decimal places it is 8.72e^(-17), then with 11 decimal places, it is zero, with 8 decimal places it is 8.72e^(-17), with 3 decimal places it is zero. I mean, the integral is zero or a number near zero for different approximations of pi, but it doesn't have a clear trend.
I would appreciate your help in understanding why this happens. (I did run it in Dev-C++).
#include <iostream>
#include <math.h>
using namespace std;
#define pi 3.14159265358979323846
//Pi: 3.14159265358979323846
double func(double x){
return sin(x);
}
int main() {
double x0 = -pi/2.0, xf = pi/2.0;
int n = 4;
double delta_x = (xf-x0)/(n*1.0);
double sum = (func(x0)+func(xf))/2.0;
double integral;
for (int k = 1; k<n; k++){
// cout<<"func: "<<func(x0+(k*delta_x))<<" "<<"last sum: "<<sum<<endl;
sum = sum + func(x0+(k*delta_x));
// cout<<"func + last sum= "<<sum<<endl;
}
integral = delta_x*sum;
cout<<"The value for the integral is: "<<integral<<endl;
return 0;
}
OP is integrating y=sin(x) from -a to +a. The various tests use different values of a, all near pi/2.
The approach uses a linear summation of values near -1.0, down to 0 and then up to near 1.0.
This summation is sensitive to calculation error with the last terms as the final math sum is expected to be 0.0. Since the start/end a varies, the error varies.
A more stable result would be had adding the extreme f = sin(f(k)) values first. e.g. sum += sin(f(k=1)), then sum += sin(f(k=3)), then sum += sin(f(k=2)) rather than k=1,2,3. In particular the formation of term x=f(k=3) is likely a bit off from the negative of its x=f(k=1) earlier term, further compounding the issue.
Welcome to the world or numerical analysis.
Problem exists if code used all float or all long double, just different degrees.
Problem is not due to using an inexact value of pi (Exact value is impossible with FP as pi is irrational and all finite FP are rational).
Much is due to the formation of x. Could try the below to form the x symmetrically about 0.0. Compare exactly x generated this way to x the original way.
x = (x0-x1)/2 + ((k - n/2)*delta_x)
Print out the exact values computed for deeper understanding.
printf("x:%a y:%a\n", x0+(k*delta_x), func(x0+(k*delta_x)));

Precision issues when converting a decimal number to its rational equivalent

I have problem of converting a double (say N) to p/q form (rational form), for this I have the following strategy :
Multiply double N by a large number say $k = 10^{10}$
then p = y*k and q = k
Take gcd(p,q) and find p = p/gcd(p,q) and q = p/gcd(p,q)
when N = 8.2 , Answer is correct if we solve using pen and paper, but as 8.2 is represented as 8.19999999 in N (double), it causes problem in its rational form conversion.
I tried it doing other way as : (I used a large no. 10^k instead of 100)
if(abs(y*100 - round(y*100)) < 0.000001) y = round(y*100)/100
But this approach also doesn't give right representation all the time.
Is there any way I could carry out the equivalent conversion from double to p/q ?
Floating point arithmetic is very difficult. As has been mentioned in the comments, part of the difficulty is that you need to represent your numbers in binary.
For example, the number 0.125 can be represented exactly in binary:
0.125 = 2^-3 = 0b0.001
But the number 0.12 cannot.
To 11 significant figures:
0.12 = 0b0.00011110101
If this is converted back to a decimal then the error becomes obvious:
0b0.00011110101 = 0.11962890625
So if you write:
double a = 0.2;
What the machine actually does is find the closest binary representation of 0.2 that it can hold within a double data type. This is an approximation since as we saw above, 0.2 cannot be exactly represented in binary.
One possible approach is to define an 'epsilon' which determines how close your number can be to the nearest representable binary floating point.
Here is a good article on floating points:
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
have problem of converting a double (say N) to p/q form
... when N = 8.2
A typical double cannot encode 8.2 exactly. Instead the closest representable double is about
8.19999999999999928945726423989981412887573...
8.20000000000000106581410364015027880668640... // next closest
When code does
double N = 8.2;
It will be the 8.19999999999999928945726423989981412887573... that is converted into rational form.
Converting a double to p/q form:
Multiply double N by a large number say $k = 10^{10}$
This may overflow the double. First step should be to determine if the double is large, it which case, it is a whole number.
Do not multiple by some power of 10 as double certainly uses a binary encoding. Multiplication by 10, 100, etc. may introduce round-off error.
C implementations of double overwhelmingly use a binary encoding, so that FLT_RADIX == 2.
Then every finite double x has a significand that is a fraction of some integer over some power of 2: a binary fraction of DBL_MANT_DIG digits #Richard Critten. This is often 53 binary digits.
Determine the exponent of the double. If large enough or x == 0.0, the double is a whole number.
Otherwise, scale a numerator and denominator by DBL_MANT_DIG. While the numerator is even, halve both the numerator and denominator. As the denominator is a power-of-2, no other prime values are needed for simplification consideration.
#include <float.h>
#include <math.h>
#include <stdio.h>
void form_ratio(double x) {
double numerator = x;
double denominator = 1.0;
if (isfinite(numerator) && x != 0.0) {
int expo;
frexp(numerator, &expo);
if (expo < DBL_MANT_DIG) {
expo = DBL_MANT_DIG - expo;
numerator = ldexp(numerator, expo);
denominator = ldexp(1.0, expo);
while (fmod(numerator, 2.0) == 0.0 && denominator > 1.0) {
numerator /= 2.0;
denominator /= 2.0;
}
}
}
int pre = DBL_DECIMAL_DIG;
printf("%.*g --> %.*g/%.*g\n", pre, x, pre, numerator, pre, denominator);
}
int main(void) {
form_ratio(123456789012.0);
form_ratio(42.0);
form_ratio(1.0 / 7);
form_ratio(867.5309);
}
Output
123456789012 --> 123456789012/1
42 --> 42/1
0.14285714285714285 --> 2573485501354569/18014398509481984
867.53089999999997 --> 3815441248019913/4398046511104

How i can make matlab precision to be the same as in c++?

I have problem with precision. I have to make my c++ code to have same precision as matlab. In matlab i have script which do some stuff with numbers etc. I got code in c++ which do the same as that script. Output on the same input is diffrent :( I found that in my script when i try 104 >= 104 it returns false. I tried to use format long but it did not help me to find out why its false. Both numbers are type of double. i thought that maybe matlab stores somewhere the real value of 104 and its for real like 103.9999... So i leveled up my precision in c++. It also didnt help because when matlab returns me value of 50.000 in c++ i got value of 50.050 with high precision. Those 2 values are from few calculations like + or *. Is there any way to make my c++ and matlab scrips have same precision?
for i = 1:neighbors
y = spoints(i,1)+origy;
x = spoints(i,2)+origx;
% Calculate floors, ceils and rounds for the x and y.
fy = floor(y); cy = ceil(y); ry = round(y);
fx = floor(x); cx = ceil(x); rx = round(x);
% Check if interpolation is needed.
if (abs(x - rx) < 1e-6) && (abs(y - ry) < 1e-6)
% Interpolation is not needed, use original datatypes
N = image(ry:ry+dy,rx:rx+dx);
D = N >= C;
else
% Interpolation needed, use double type images
ty = y - fy;
tx = x - fx;
% Calculate the interpolation weights.
w1 = (1 - tx) * (1 - ty);
w2 = tx * (1 - ty);
w3 = (1 - tx) * ty ;
w4 = tx * ty ;
%Compute interpolated pixel values
N = w1*d_image(fy:fy+dy,fx:fx+dx) + w2*d_image(fy:fy+dy,cx:cx+dx) + ...
w3*d_image(cy:cy+dy,fx:fx+dx) + w4*d_image(cy:cy+dy,cx:cx+dx);
D = N >= d_C;
end
I got problems in else which is in line 12. tx and ty eqauls 0.707106781186547 or 1 - 0.707106781186547. Values from d_image are in range 0 and 255. N is value 0..255 of interpolating 4 pixels from image. d_C is value 0.255. Still dunno why matlab shows that when i have in N vlaues like: x x x 140.0000 140.0000 and in d_C: x x x 140 x. D gives me 0 on 4th position so 140.0000 != 140. I Debugged it trying more precision but it still says that its 140.00000000000000 and it is still not 140.
int Codes::Interpolation( Point_<int> point, Point_<int> center , Mat *mat)
{
int x = center.x-point.x;
int y = center.y-point.y;
Point_<double> my;
if(x<0)
{
if(y<0)
{
my.x=center.x+LEN;
my.y=center.y+LEN;
}
else
{
my.x=center.x+LEN;
my.y=center.y-LEN;
}
}
else
{
if(y<0)
{
my.x=center.x-LEN;
my.y=center.y+LEN;
}
else
{
my.x=center.x-LEN;
my.y=center.y-LEN;
}
}
int a=my.x;
int b=my.y;
double tx = my.x - a;
double ty = my.y - b;
double wage[4];
wage[0] = (1 - tx) * (1 - ty);
wage[1] = tx * (1 - ty);
wage[2] = (1 - tx) * ty ;
wage[3] = tx * ty ;
int values[4];
//wpisanie do tablicy 4 pixeli ktore wchodza do interpolacji
for(int i=0;i<4;i++)
{
int val = mat->at<uchar>(Point_<int>(a+help[i].x,a+help[i].y));
values[i]=val;
}
double moze = (wage[0]) * (values[0]) + (wage[1]) * (values[1]) + (wage[2]) * (values[2]) + (wage[3]) * (values[3]);
return moze;
}
LEN = 0.707106781186547 Values in array values are 100% same as matlab values.
Matlab uses double precision. You can use C++'s double type. That should make most things similar, but not 100%.
As someone else noted, this is probably not the source of your problem. Either there is a difference in the algorithms, or it might be something like a library function defined differently in Matlab and in C++. For example, Matlab's std() divides by (n-1) and your code may divide by n.
First, as a rule of thumb, it is never a good idea to compare floating point variables directly. Instead of, for example instead of if (nr >= 104) you should use if (nr >= 104-e), where e is a small number, like 0.00001.
However, there must be some serious undersampling or rounding error somewhere in your script, because getting 50050 instead of 50000 is not in the limit of common floating point imprecision. For example, Matlab can have a step of as small as 15 digits!
I guess there are some casting problems in your code, for example
int i;
double d;
// ...
d = i/3 * d;
will will give a very inaccurate result, because you have an integer division. d = (double)i/3 * d or d = i/3. * d would give a much more accurate result.
The above example would NOT cause any problems in Matlab, because there everything is already a floating-point number by default, so a similar problem might be behind the differences in the results of the c++ and Matlab code.
Seeing your calculations would help a lot in finding what went wrong.
EDIT:
In c and c++, if you compare a double with an integer of the same value, you have a very high chance that they will not be equal. It's the same with two doubles, but you might get lucky if you perform the exact same computations on them. Even in Matlab it's dangerous, and maybe you were just lucky that as both are doubles, both got truncated the same way.
By you recent edit it seems, that the problem is where you evaluate your array. You should never use == or != when comparing floats or doubles in c++ (or in any languages when you use floating-point variables). The proper way to do a comparison is to check whether they are within a small distance of each other.
An example: using == or != to compare two doubles is like comparing the weight of two objects by counting the number of atoms in them, and deciding that they are not equal even if there is one single atom difference between them.
MATLAB uses double precision unless you say otherwise. Any differences you see with an identical implementation in C++ will be due to floating-point errors.

sqrt(1.0 - pow(1.0,2)) returns -nan [duplicate]

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Why pow(10,5) = 9,999 in C++
(8 answers)
Closed 4 years ago.
I've found an interesting floating point problem. I have to calculate several square roots in my code, and the expression is like this:
sqrt(1.0 - pow(pos,2))
where pos goes from -1.0 to 1.0 in a loop. The -1.0 is fine for pow, but when pos=1.0, I get an -nan. Doing some tests, using gcc 4.4.5 and icc 12.0, the output of
1.0 - pow(pos,2) = -1.33226763e-15
and
1.0 - pow(1.0,2) = 0
or
poss = 1.0
1.0 - pow(poss,2) = 0
Where clearly the first one is going to give problems, being negative. Anyone knows why pow is returning a number smaller than 0? The full offending code is below:
int main() {
double n_max = 10;
double a = -1.0;
double b = 1.0;
int divisions = int(5 * n_max);
assert (!(b == a));
double interval = b - a;
double delta_theta = interval / divisions;
double delta_thetaover2 = delta_theta / 2.0;
double pos = a;
//for (int i = 0; i < divisions - 1; i++) {
for (int i = 0; i < divisions+1; i++) {
cout<<sqrt(1.0 - pow(pos, 2)) <<setw(20)<<pos<<endl;
if(isnan(sqrt(1.0 - pow(pos, 2)))){
cout<<"Danger Will Robinson!"<<endl;
cout<< sqrt(1.0 - pow(pos,2))<<endl;
cout<<"pos "<<setprecision(9)<<pos<<endl;
cout<<"pow(pos,2) "<<setprecision(9)<<pow(pos, 2)<<endl;
cout<<"delta_theta "<<delta_theta<<endl;
cout<<"1 - pow "<< 1.0 - pow(pos,2)<<endl;
double poss = 1.0;
cout<<"1- poss "<<1.0 - pow(poss,2)<<endl;
}
pos += delta_theta;
}
return 0;
}
When you keep incrementing pos in a loop, rounding errors accumulate and in your case the final value > 1.0. Instead of that, calculate pos by multiplication on each round to only get minimal amount of rounding error.
The problem is that floating point calculations are not exact, and that 1 - 1^2 may be giving small negative results, yielding an invalid sqrt computation.
Consider capping your result:
double x = 1. - pow(pos, 2.);
result = sqrt(x < 0 ? 0 : x);
or
result = sqrt(abs(x) < 1e-12 ? 0 : x);
setprecision(9) is going to cause rounding. Use a debugger to see what the value really is. Short of that, at least set the precision beyond the possible size of the type you're using.
You will almost always have rounding errors when calculating with doubles, because the double type has only 15 significant decimal digits (52 bits) and a lot of decimal numbers are not convertible to binary floating point numbers without rounding. The IEEE standard contains a lot of effort to keep those errors low, but by principle it cannot always succeed. For a thorough introduction see this document
In your case, you should calculate pos on each loop and round to 14 or less digits. That should give you a clean 0 for the sqrt.
You can calc pos inside the loop as
pos = round(a + interval * i / divisions, 14);
with round defined as
double round(double r, int digits)
{
double multiplier = pow(digits,10);
return floor(r*multiplier + 0.5)/multiplier;
}

Float increments precision problems with UI

Here is my problem, I have several parameters that I need to increment by 0.1.
But my UI only renders x.x , x.xx, x.xxx for floats so since 0.1f is not really 0.1 but something like 0.10000000149011612 on the long run my ui will render -0.00 and that doesn't make much sense. How to prevent that for all the possible cases of UI.
Thank you.
Use integers and divide by 10 (or 1000 etc...) just before displaying. Your parameters will store an integer number of tenths, and you'll increment them by 1 tenth.
If you know that your floating point value will always be a multiple of 0.1, you can round it after every increment to make sure it maintains a sensible value. It still won't be exact (because it physically can't be), but at least the errors won't accumulate and it will display properly.
Instead of:
x += delta;
Do:
x = floor((x + delta) / precision + 0.5) * precision;
Edit: It's useful to turn the rounding into a stand-alone function and decouple it from the increment:
inline double round(double value, double precision = 1.0)
{
return floor(value / precision + 0.5) * precision;
}
x = round(x + 0.1, 0.1);