Precision of the acos function in C++

I am trying to calculate the distance between two points using the acos() function in the process, but I am not getting a precise result when the distance is small.
float distance_between(dest& point1, dest point2) {
    float EARTH_RADIUS = 6371.0; // in km
    float point1_lat_in_radians = point1.lat * (PI / 180);
    float point2_lat_in_radians = point2.lat * (PI / 180);
    float point1_long_in_radians = point1.lon * (PI / 180);
    float point2_long_in_radians = point2.lon * (PI / 180);
    float res = acos( sin( point1_lat_in_radians ) * sin( point2_lat_in_radians )
                    + cos( point1_lat_in_radians ) * cos( point2_lat_in_radians )
                    * cos( point2_long_in_radians - point1_long_in_radians ) ) * EARTH_RADIUS;
    cout << res << endl;
    res = round(res * 100) / 100;
    return res;
}
I am checking the distance between the following coordinates:
52.378281 4.900070 and 52.379141 4.880590
52.373634 4.890289 and 52.379141 4.880590
The result is 0 in both cases. I know the distance is small, but is there a way to get a precise distance like 0.xxx?

Use double instead of float to get more precision.
That way you are going to use this prototype:
double acos (double x);
A must read is the Difference between float and double question. From there we have:
As the name implies, a double has 2x the precision of float.
The C and C++ standards do not specify the representation of float,
double and long double. It is possible that all three are implemented as
IEEE double-precision. Nevertheless, for most architectures (gcc,
MSVC; x86, x64, ARM) float is indeed an IEEE single-precision
floating point number (binary32), and double is an IEEE
double-precision floating point number (binary64).
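For reference, here is a minimal sketch of the asker's function converted to double throughout (it assumes the same dest struct with lat/lon members in degrees, and uses M_PI from <cmath> in place of the PI macro). With double, the acos argument near 1.0 keeps enough precision that the coordinates above yield a small non-zero distance instead of 0.
#include <cmath>
#include <iostream>

struct dest { double lat; double lon; };

double distance_between(const dest& point1, const dest& point2) {
    const double EARTH_RADIUS = 6371.0;               // in km
    const double lat1 = point1.lat * (M_PI / 180.0);  // degrees -> radians
    const double lat2 = point2.lat * (M_PI / 180.0);
    const double lon1 = point1.lon * (M_PI / 180.0);
    const double lon2 = point2.lon * (M_PI / 180.0);

    // Spherical law of cosines, evaluated entirely in double precision.
    double res = std::acos(std::sin(lat1) * std::sin(lat2) +
                           std::cos(lat1) * std::cos(lat2) * std::cos(lon2 - lon1)) * EARTH_RADIUS;
    return res; // no rounding, so sub-kilometre distances are preserved
}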

Related

Loss of precision when casting float to double

I guess I'm hitting a precision issue with my C++ program, and I don't understand why I'm getting different results in my values.
res equals 1321.0000001192093 if I write:
float sy = -0.207010582f;
double res = -1512.*((double)sy - (2. / 3.));
but res2 equals 1320.9999999839999 if I write:
double res2 = -1512.*(-0.207010582 - (2. / 3.));
And why is syd even different from syd2 when I write this:
double syd = -0.207010582f;
double syd2 = -0.207010582000000000;
Can somebody give me a hand casting my float into a double properly and understanding what's going on?
-0.207010582f is a decimal floating-point literal. But your computer doesn't use decimal floating point, it uses binary floating point. So the value of that literal will be rounded to float precision.
Similarly, -0.207010582 is rounded to double precision. While that's closer, it still is not equal to -0.207010582 decimal.
Since double has more precision than float, you will not lose precision by casting from float to double. Any rounding will have happened earlier.
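A small sketch (not from the original post) that makes this visible by printing the stored values with more digits than either type holds:
#include <iomanip>
#include <iostream>

int main() {
    float  sy   = -0.207010582f; // rounded to the nearest float
    double syd  = sy;            // exact conversion of that float to double
    double syd2 = -0.207010582;  // rounded to the nearest double

    std::cout << std::setprecision(25)
              << syd  << '\n'    // shows the float rounding
              << syd2 << '\n';   // much closer to the decimal literal
}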
Single-Precision
As others have said, float sy = -0.207010582f; initializes a single-precision (32-bit) floating point variable from a single-precision floating point literal.
This will be treated (in storage and calculations) as the nearest representable number in that format. This number is -0.20701058208942413330078125
Your code is then effectively float sy = -0.20701058208942413330078125;
You can confirm that this is the nearest representable value by looking at the adjacent single-precision floating point numbers.
-0.20701059699058532714843750 // std::nextafter( sy, std::numeric_limits<float>::lowest() )
-0.20701058208942413330078125 // sy
-0.20701056718826293945312500 // std::nextafter( sy, std::numeric_limits<float>::max() )
Double-Precision
Exactly the same occurs with double-precision floating point numbers; it's just that their increased resolution means the differences are smaller.
e.g. double dy = -0.207010582; actually represents the value -0.20701058199999999853702092877938412129878997802734375
Similarly, the adjacent values that can be represented are:
-0.2070105820000000262925965444082976318895816802978515625 // std::nextafter( dy, std::numeric_limits<double>::lowest() )
-0.2070105819999999985370209287793841212987899780273437500 // dy
-0.2070105819999999707814453131504706107079982757568359375 // std::nextafter( dy, std::numeric_limits<double>::max() )
Single to Double Conversion
All single precision floating point values are exactly representable in double-precision. Hence, nothing is lost in conversions from single to double precision.
All the above assumes IEEE754 floating-point representation.
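If you want to reproduce those neighbouring values yourself, here is a short sketch using std::nextafter (illustrative only; it assumes the IEEE 754 formats mentioned above):
#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>

int main() {
    float sy = -0.207010582f;
    std::cout << std::setprecision(30)
              << std::nextafter(sy, std::numeric_limits<float>::lowest()) << '\n'
              << sy << '\n'
              << std::nextafter(sy, std::numeric_limits<float>::max()) << '\n';
}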

How to increase accuracy of floating point second derivative calculation?

I've written a simple program to calculate the first and second derivative of a function, using function pointers. My program computes the correct answers (more or less), but for some functions, the accuracy is less than I would like.
This is the function I am differentiating:
float f1(float x) {
    return (x * x);
}
These are the derivative functions, using the central finite difference method:
// Function for calculating the first derivative.
float first_dx(float (*fx)(float), float x) {
    float h = 0.001;
    float dfdx;
    dfdx = (fx(x + h) - fx(x - h)) / (2 * h);
    return dfdx;
}

// Function for calculating the second derivative.
float second_dx(float (*fx)(float), float x) {
    float h = 0.001;
    float d2fdx2;
    d2fdx2 = (fx(x - h) - 2 * fx(x) + fx(x + h)) / (h * h);
    return d2fdx2;
}
Main function:
int main() {
    pc.baud(9600);
    float x = 2.0;

    pc.printf("**** Function Pointers ****\r\n");
    pc.printf("Value of f(%f): %f\r\n", x, f1(x));
    pc.printf("First derivative: %f\r\n", first_dx(f1, x));
    pc.printf("Second derivative: %f\r\n\r\n", second_dx(f1, x));
}
This is the output from the program:
**** Function Pointers ****
Value of f(2.000000): 4.000000
First derivative: 3.999948
Second derivative: 1.430511
I'm happy with the accuracy of the first derivative, but I believe the second derivative is too far off (it should be equal to ~2.0).
I have a basic understanding of how floating point numbers are represented and why they are sometimes inaccurate, but how can I make this second derivative result more accurate? Could I be using something better than the central finite difference method, or is there a way I can get better results with the current method?
The accuracy can be increased by choosing a type which has more precision. float is typically an IEEE-754 32-bit number, giving you about 7.22 significant decimal digits of precision.
What you want is the 64-bit counterpart: double, with about 15.95 significant decimal digits of precision.
That should be sufficient for your calculation; however, worth mentioning is Boost's implementation, which offers a quadruple-precision floating point number (128-bit).
Finally, the GNU Multiple Precision Arithmetic Library offers types with an arbitrary number of digits of precision.
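As a minimal sketch of that suggestion (the asker's central-difference code with float simply replaced by double), the second derivative of x*x at x = 2 then comes out essentially as 2.0 rather than 1.43:
#include <cstdio>

double f1(double x) { return x * x; }

double second_dx(double (*fx)(double), double x) {
    double h = 0.001;
    return (fx(x - h) - 2.0 * fx(x) + fx(x + h)) / (h * h);
}

int main() {
    printf("Second derivative: %f\n", second_dx(f1, 2.0)); // ~2.000000
}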
Go analytical. ;-) Probably not an option given "with the current method".
Use double instead of float.
Vary the epsilon (h) and combine the results in some way. For example, you could try 0.00001, 0.000001, 0.0000001 and average them. In fact, you'd want the result with the smallest h that doesn't overflow/underflow, but it's not clear how to detect that (see the sketch below).
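A hedged sketch of that third suggestion: evaluate the central difference in float for several step sizes and print them side by side, so the sensitivity to h is visible before deciding how (or whether) to combine the results.
#include <cstdio>

float f1(float x) { return x * x; }

float second_dx(float (*fx)(float), float x, float h) {
    return (fx(x - h) - 2.0f * fx(x) + fx(x + h)) / (h * h);
}

int main() {
    const float hs[] = { 0.1f, 0.01f, 0.001f, 0.0001f };
    for (float h : hs)
        printf("h = %-8g  d2f/dx2 ~ %f\n", h, second_dx(f1, 2.0f, h));
}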

Multi-line math gives different results than a single line

I was having an issue with some floating point math and I've found that if I do my math on one line, I get -0 passed to tan(), and if I do it across two lines, I get 0 passed to tan(). Have a look:
float theta = PI / 2.f;
float p = (PI / 2.f) - theta;
float result = tan(p);
With the above, p = -0 and result = -4.37...
float theta = PI / 2.f;
float p = PI / 2.f;
p -= theta;
float result = tan(p);
With the above, p = 0 and result = 0.
Can anyone explain the difference? I assume the -0 is causing that result from tan(), although I can't find anything on Google that explains why. Why does the exact same calculation spread across different lines result in a different answer?
Thanks
It is probably because of the type of PI.
If PI is a double, it will be converted to float, and then the outcome
will be as you describe.
But if PI is a float, both of these test scenarios are equal.
What #Naor says is probably correct, but I'd like to add something.
You are probably not getting -4.37xx but -4.37xxxe-xx, which is a pretty small negative number.
Since you can always get errors in floating point math, I'd say there is no need to change your code. Both snippets are correct.
So this is what, in my opinion, is happening:
In both examples PI is a define, probably defined like this:
#define PI 3.14 // and some more digits
In C++, a literal like this is treated as a double.
After preprocessing, this expression:
PI / 2.0f
will be treated as a double-typed prvalue. This means that this line hides one more operation:
float theta = PI / 2.f;
which is a double-to-float conversion, which definitely loses some precision in this case.
In the first example this also happens here:
float p = (PI / 2.f) - theta;
but only after evaluating the whole expression. Note that during this evaluation (PI / 2.f) will still be a double, but theta will be a float-to-double converted value, which explains the slight difference in result from 0.0.
In your last example you first convert (PI / 2.f) to float:
float p = PI / 2.f;
to subtract the float-typed theta from it in the next line, which must result in 0.0 (and which the compiler probably optimized out anyway ;) ).
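A small sketch that reproduces both evaluation orders (assuming PI is defined as a double literal, as above); %g is used so the tiny negative value shows up instead of the -0.000000 that %f would print:
#include <cmath>
#include <cstdio>

#define PI 3.14159265358979323846

int main() {
    float theta = PI / 2.f; // double result rounded to float

    // Single expression: (PI / 2.f) is still a double here and theta is
    // promoted back to double, so the subtraction sees the rounding error.
    float p1 = (PI / 2.f) - theta;

    // Two steps: p2 and theta hold the same float value, so the difference is 0.
    float p2 = PI / 2.f;
    p2 -= theta;

    printf("p1 = %g, tan(p1) = %g\n", p1, tan(p1));
    printf("p2 = %g, tan(p2) = %g\n", p2, tan(p2));
}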

Sum of weights should be exactly 1.0 no matter which platform it runs on

I have a function that calculates weights according to a Gaussian distribution:
const float dx = 1.0f / static_cast<float>(points - 1);
const float sigma = 1.0f / 3.0f;
const float norm = 1.0f / (sqrtf(2.0f * static_cast<float>(M_PI)) * sigma);
const float divsigma2 = 0.5f / (sigma * sigma);

m_weights[0] = 1.0f;
for (int i = 1; i < points; i++)
{
    float x = static_cast<float>(i) * dx;
    m_weights[i] = norm * expf(-x * x * divsigma2) * dx;
    m_weights[0] -= 2.0f * m_weights[i];
}
In all the calculations above, the actual numbers do not matter. The only thing that matters is that m_weights[0] = 1.0f; and that each time I calculate m_weights[i], I subtract it twice from m_weights[0], like this:
m_weights[0] -= 2.0f * m_weights[i];
to ensure that w[0] + 2 * w[i] (for i = 1..N) sums to exactly 1.0f. But it does not; this assert fails:
float wSum = 0.0f;
for (size_t i = 0; i < m_weights.size(); ++i)
{
    float w = m_weights[i];
    if (i == 0) {
        wSum += w;
    } else {
        wSum += (w + w);
    }
}
assert(wSum == 1.0 && "Weights sum is not 1.");
How can I ensure the sum to be 1.0f on all platforms?
You can't. Floating point isn't like that. Even adding the same values can produce different results depending on the CPU used.
All you can do is define some accuracy value and ensure that you end up with 1.0 +/- that value.
See: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Because the precision of float is only 23 bits of stored mantissa (see e.g. https://en.wikipedia.org/wiki/Single-precision_floating-point_format), rounding error quickly accumulates, so even if the rest of the code is correct, your sum becomes something like 1.0000001 or 0.9999999 (have you watched it in the debugger or tried to print it to the console, by the way?). To improve precision you can replace float with double, but the sum will still not be exactly 1.0: the error will just be smaller, something like 1e-16 instead of 1e-7.
The second thing to do is to replace strict comparison to 1.0 with a range comparison, like:
assert(fabs(wSum - 1.0) <= 1e-13 && "Weights sum is not 1.");
Here 1e-13 is the epsilon within which you consider two floating-point numbers equal. If you choose to go with float (not double), you may need an epsilon like 1e-6.
Depending on how large your weights are and how many points there are, accumulated error can become larger than that epsilon. In that case you would need special algorithms for keeping the precision higher, such as sorting the numbers by their absolute values prior to summing them up starting with the smallest numbers.
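A hedged sketch combining those two ideas, summing the addends from smallest magnitude upward and then comparing against 1.0 within a tolerance (the function name and the default epsilon are illustrative placeholders):
#include <algorithm>
#include <cmath>
#include <vector>

// terms should hold the same addends as the asker's wSum loop:
// w[0] once and every other weight twice.
bool weights_sum_to_one(std::vector<float> terms, float eps = 1e-6f)
{
    // Summing from the smallest magnitude upward limits accumulated rounding error.
    std::sort(terms.begin(), terms.end(),
              [](float a, float b) { return std::fabs(a) < std::fabs(b); });
    float sum = 0.0f;
    for (float t : terms)
        sum += t;
    // Compare within a tolerance instead of with ==.
    return std::fabs(sum - 1.0f) <= eps;
}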
How can I ensure the sum to be 1.0f on all platforms?
As the other answers (and comments) have stated, you can't achieve this, due to the inexactness of floating point calculations.
One solution is, instead of using double, to use a fixed-point or multi-precision library such as GMP, the Boost Multiprecision Library, or one of the many others out there.

Computing the angle between two 3D vectors - how to implement it

I would like to compute the angle between two 3d vectors. I'm using the following equation (source of equation) to achieve this:
diffangle = atan2(norm(cross(v1,v2)),dot(v1,v2))
The components of v1 and v2 are given in data type float, but since I have very small angles I would like to have the difference angle in type double. My actual implementation looks as follows:
double angle(float x1, float y1, float z1, float x2, float y2, float z2)
{
    double dot = x1*x2 + y1*y2 + z1*z2;
    double crossX = y1*z2 - z1*y2;
    double crossY = z1*x2 - x1*z2;
    double crossZ = x1*y2 - y1*x2;
    double norm = sqrt(crossX*crossX + crossY*crossY + crossZ*crossZ);
    return (atan2(norm, dot) / M_PI * 180);
}
Does my implementation do what it should, or do I have to cast something or take other things into account?
Thanks for your help.
Regarding the issue of accuracy: in C (and C++), floats will be promoted to double when a double is involved in the calculation (the same thing happens when promoting from int to long int).
So the expression
double z = x*y;
first computes x*y in single precision (float), then converts the result to double. To actually perform the calculation using double precision you need to cast one of the elements involved in the expression:
double z = (double)x*y;
The simplest solution, however, would be to change the function declaration to accept double. This way, the float values would be promoted when calling the function and all calculations will use double precision.
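For illustration, here is a sketch of that last suggestion (not the asker's original code): with double parameters, the float components are promoted at the call site and every intermediate product below is computed in double precision.
#include <cmath>

double angle(double x1, double y1, double z1, double x2, double y2, double z2)
{
    double dot    = x1 * x2 + y1 * y2 + z1 * z2;   // dot(v1, v2)
    double crossX = y1 * z2 - z1 * y2;             // cross(v1, v2)
    double crossY = z1 * x2 - x1 * z2;
    double crossZ = x1 * y2 - y1 * x2;
    double norm   = std::sqrt(crossX * crossX + crossY * crossY + crossZ * crossZ);
    return std::atan2(norm, dot) / M_PI * 180.0;   // angle in degrees
}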