I was having an issue with some floating point math and I've found that if I do my math on one line, I get -0 passed to tan(), and if I do it across two lines, I get 0 passed to tan(). Have a look:
float theta = PI / 2.f;
float p = (PI / 2.f) - theta;
float result = tan(p);
With the above, p = -0 and result = -4.37...
float theta = PI / 2.f;
float p = PI / 2.f;
p -= theta;
float result = tan(p);
With the above, p = 0 and result = 0.
Can anyone explain the difference? I assume the -0 is causing that result from tan(), although I can't find anything on google that explains why. Why does the exact same calculation spread across different lines result in a different answer?
Thanks
It is probably because of the type of PI.
If PI is a double, the expression is evaluated in double and then converted to float, and the outcome will be as you describe.
But if PI is a float, both of these test scenarios give the same result.
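A quick sketch of that point (defining PI as a float literal here is an assumption for illustration; the question's PI is most likely a double):
#include <cstdio>

#define PI_F 3.14159265f   // float literal, for illustration only

int main()
{
    float theta = PI_F / 2.f;          // pure float arithmetic
    float p1 = (PI_F / 2.f) - theta;   // the same float value minus itself

    float p2 = PI_F / 2.f;
    p2 -= theta;

    std::printf("%g %g\n", p1, p2);    // both print 0 (ignoring extended evaluation modes)
    return 0;
}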
What #Naor says is probably correct, but I'd like to add something.
You are probably not getting -4.37xx but -4.37xxxe-xx, which is a pretty small negative number.
Since you can always get small errors in floating-point math, I'd say there is no need to change your code. Both snippets are correct.
So this is what, in my opinion, is happening:
In both examples PI is a define, probably defined like this:
#define PI 3.14 // and some more digits
In C++, a numeric literal like this is treated as a double.
After preprocessing, this expression:
PI / 2.0f
will be treated as a double-typed prvalue. This means that this line hides one more operation:
float theta = PI / 2.f;
which is a double-to-float conversion, which definitely loses some precision in this case.
In the first example this also happens here:
float p = (PI / 2.f) - theta;
but only after evaluating the whole expression. Note that during this evaluation (PI / 2.f) will still be a double, while theta will be a float-to-double converted value, which explains the slight difference of the result from 0.0.
In your last example you first convert (PI / 2.f) to float:
float p = PI / 2.f;
to subtract the float-typed theta from it in the next line. That must result in 0.0, which the compiler probably optimized out anyway ;).
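Here is a minimal sketch of the whole explanation (PI is assumed to be the usual double define; the exact residue depends on the platform, but it is around -4.37e-8, matching the question):
#include <cmath>
#include <cstdio>

#define PI 3.14159265358979323846   // double literal, like the usual M_PI

int main()
{
    // One-line version: the whole expression is evaluated in double,
    // so p1 receives the double-to-float rounding residue (about -4.37e-8).
    float theta = PI / 2.f;
    float p1 = (PI / 2.f) - theta;

    // Two-line version: (PI / 2.f) is rounded to float first,
    // so subtracting the identical float theta gives exactly 0.
    float p2 = PI / 2.f;
    p2 -= theta;

    std::printf("p1 = %g, tan(p1) = %g\n", p1, std::tan(p1));
    std::printf("p2 = %g, tan(p2) = %g\n", p2, std::tan(p2));
    return 0;
}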
I am writing a program for class that simply calculates distance between two coordinate points (x,y).
differenceofx1 = x1 - x2;
differenceofy1 = y1 - y2;
squareofx1 = differenceofx1 * differenceofx1;
squareofy1 = differenceofy1 * differenceofy1;
distance1 = sqrt(squareofx1 - squareofy1);
When I calculate the distance, it works. However, in some situations, such as the result being the square root of a non-square number, or the difference of x1 and x2 / y1 and y2 being negative due to the input order, it just gives a distance of 0.00000 when the distance is clearly more than 0. I am using double for all the variables; should I use float instead because of the negative possibility, or does double do the same job? I set the precision to 8 as well, but I don't understand why it wouldn't calculate properly.
I am sorry for the simplicity of the question, I am a bit more than a beginner.
You are using the distance formula incorrectly. It should be
distance1 = sqrt(squareofx1 + squareofy1);
instead of
distance1 = sqrt(squareofx1 - squareofy1);
Due to the wrong formula, if squareofx1 is less than squareofy1 you get an error, because the sqrt of a negative number is not possible in the case of real coordinates.
Firstly, your formula is incorrect; change it to distance1 = sqrt(squareofx1 + squareofy1) as #fefe mentioned. By the way, all of your calculation can be represented in one line of code:
distance1 = sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2));
No need for variables like differenceofx1, differenceofy1, squareofx1, squareofy1 unless you are using the results stored in these variables again in your program.
Secondly, double gives you more precision than float. If you need precision of more than 6-7 decimal places, use double; otherwise float works too. Read more about float vs. double.
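For reference, a minimal self-contained version of the corrected calculation (the function name and the test values are just for illustration):
#include <cmath>
#include <cstdio>

// Distance between (x1, y1) and (x2, y2); note the + inside sqrt.
double distance(double x1, double y1, double x2, double y2)
{
    double dx = x1 - x2;
    double dy = y1 - y2;
    return std::sqrt(dx * dx + dy * dy);
}

int main()
{
    std::printf("%.8f\n", distance(0.0, 0.0, 3.0, 4.0));   // prints 5.00000000
    return 0;
}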
I have an OpenCL kernel for some computation. I found that only one thread gives a different result from the CPU code. I am using VS2010, x64 release mode.
By checking the OpenCL code with some examples, I found some interesting results. Here are the test cases in the kernel code.
I tested 3 cases in the OpenCL kernel; the values are printed with printf("%.10f", fval);
case 1:
float fval = (10296184.0) / (float)(x*y*z); // which gives result fval = 3351.6225585938
float fval = (10296184.0f) / (float)(x*y*z); // which gives result fval = 3351.6225585938
The variables x, y, z are of type int. Their values are computed by some operations, and in this case they are x=12, y=16, z=16.
case 2:
float fval = (10296184.0) / (float)(12*16*16); // which gives result fval = 3351.6223144531
float fval = (10296184.0f) / (float)(12*16*16); // which gives result fval = 3351.6223144531
case 3:
However, when I compute the difference between the above two expressions, the result is 0 when using 10296184.0 but not when using 10296184.0f.
float fval = (10296184.0) / (float)(x*y*z) - (10296184.0) / (float)(12*16*16); // which gives result fval = 0.0000000000
float fval = (10296184.0f) / (float)(x*y*z) - (10296184.0f) / (float)(12*16*16); // which gives result fval = 0.0001812663
Could anyone explain the reason or give me some hints?
Some observations:
The two float values differ by 1 ULP, so the results differ by the minimum possible amount (a quick check of this appears after these observations).
// Float ULP in the 2's place here
// v
0x1.a2f3ea0000000p+11 3351.622314... // OP's lower float value
0x1.a2f3eaaaaaaabp+11 3351.622395... // higher precision quotient
0x1.a2f3ec0000000p+11 3351.622558... // OP's higher float value
(10296184.0) / (float)(12*16*16) is calculated at compile time and is the closer result to the expected mathematical answer.
float fval = (10296184.0) / (float)(x*y*z) is calculated at run time.
Considering that float variables are being used, it is surprising that the code does this division with double math. This is a double constant divided by a double (the promotion of the float product), resulting in a double quotient that is converted to a float and then saved. I'd expect 10296184.0f - note the f - to have been used; then the math could all have been done in float.
C allows different rounding modes, denoted by FLT_ROUNDS. These may differ between compile time and run time and may explain the difference. Knowing the result of fegetround() (the function gets the current rounding direction) would help.
OP may have employed various compiler optimizations that sacrifice precision for speed.
C does not specify the precision of math operations, yet good-to-the-last-ULP results should be expected from * / + - sqrt() modf() on quality platforms. I suspect the code suffers from a weak math implementation.
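As a sketch, one way to confirm that the two printed values are exactly 1 ULP apart (the literals are copied from the question; ordinary IEEE-754 single precision is assumed):
#include <cmath>
#include <cstdio>

int main()
{
    float lo = 3351.6223144531f;             // the compile-time result from the question
    float hi = 3351.6225585938f;             // the run-time result from the question
    float next = std::nextafterf(lo, hi);    // the very next representable float above lo
    std::printf("next after lo = %.10f\n", next);
    std::printf("1 ULP apart: %s\n", (next == hi) ? "yes" : "no");
    return 0;
}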
I have a function that calculates weights according to a Gaussian distribution:
const float dx = 1.0f / static_cast<float>(points - 1);
const float sigma = 1.0f / 3.0f;
const float norm = 1.0f / (sqrtf(2.0f * static_cast<float>(M_PI)) * sigma);
const float divsigma2 = 0.5f / (sigma * sigma);
m_weights[0] = 1.0f;
for (int i = 1; i < points; i++)
{
float x = static_cast<float>(i)* dx;
m_weights[i] = norm * expf(-x * x * divsigma2) * dx;
m_weights[0] -= 2.0f * m_weights[i];
}
The exact numbers in the calculation above do not matter. The only thing that matters is that m_weights[0] = 1.0f; and that each time I calculate m_weights[i] I subtract it twice from m_weights[0], like this:
m_weights[0] -= 2.0f * m_weights[i];
to ensure that w[0] + 2 * w[i] (1..N) will sum to exactly 1.0f. But it does not. This assert fails:
float wSum = 0.0f;
for (size_t i = 0; i < m_weights.size(); ++i)
{
float w = m_weights[i];
if (i == 0) {
wSum += w;
} else {
wSum += (w + w);
}
}
assert(wSum == 1.0 && "Weights sum is not 1.");
How can I ensure the sum to be 1.0f on all platforms?
You can't. Floating point isn't like that. Even adding the same values can produce different results depending on the CPU used.
All you can do is define some accuracy value and ensure that you end up with 1.0 +/- that value.
See: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Because the precision of float is only 23 bits (see e.g. https://en.wikipedia.org/wiki/Single-precision_floating-point_format ), rounding error quickly accumulates. Therefore, even if the rest of the code is correct, your sum becomes something like 1.0000001 or 0.9999999 (have you watched it in the debugger or tried printing it to the console, by the way?). To improve precision you can replace float with double, but the sum will still not be exactly 1.0: the error will just be smaller, something like 1e-16 instead of 1e-7.
The second thing to do is to replace the strict comparison to 1.0 with a range comparison, like:
assert(fabs(wSum - 1.0) <= 1e-13 && "Weights sum is not 1.");
Here 1e-13 is the epsilon within which you consider two floating-point numbers equal. If you choose to go with float (not double), you may need an epsilon like 1e-6.
Depending on how large your weights are and how many points there are, accumulated error can become larger than that epsilon. In that case you would need special algorithms for keeping the precision higher, such as sorting the numbers by their absolute values prior to summing them up starting with the smallest numbers.
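A minimal sketch of that sorted-summation idea (the function name and the use of std::vector are illustrative, not taken from the original code):
#include <algorithm>
#include <cmath>
#include <vector>

float sum_small_to_large(std::vector<float> terms)
{
    // Sort by magnitude so the smallest contributions are accumulated first
    // and are not swallowed by a large running total.
    std::sort(terms.begin(), terms.end(),
              [](float a, float b) { return std::fabs(a) < std::fabs(b); });
    float sum = 0.0f;
    for (float t : terms)
        sum += t;
    return sum;
}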
How can I ensure the sum to be 1.0f on all platforms?
As the other answers (and comments) have stated, you can't achieve this, due to the inexactness of floating-point calculations.
One solution is, instead of using double, to use a fixed-point or multi-precision library such as GMP, the Boost Multiprecision Library, or one of the many others out there.
I must be missing something; what's wrong with this?
float controlFrameRate = 1/60;
It should be assigning something like 0.0166666667, but it's coming out as 0.00000 etc. Is Visual Studio just lying to me?
That is because 1/60 is an integer expression, which evaluates to 0 because integer division truncates. That result is then used to initialize the float, giving 0. You can fix it by making the RHS expression a float in the first place:
float controlFrameRate = 1.0f/60;
or
float controlFrameRate = 1/60.0f;
In C++, literals such as 1, 42 etc. are int, 1.0, 3.1416 are double, and the f in 1.0f makes the literal a float. Note that the f could have been omitted in the examples above. However, assigning a double to a float could be problematic if the double's value goes beyond the range of a float.
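A quick sketch showing the difference (the printed values are what one would expect on a typical implementation):
#include <cstdio>

int main()
{
    float a = 1 / 60;      // int / int: truncates to 0 before the conversion to float
    float b = 1.0f / 60;   // float / int: the int is promoted, giving ~0.016667
    std::printf("%f %f\n", a, b);
    return 0;
}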
A division of an integer by another integer yields an integer, and is a truncating operation: the result is truncated toward zero, so for positive operands you get a value less than or equal to the exact quotient.
Make at least one of the constants floating-point to fix it:
float controlFrameRate = 1.0 / 60;
float controlFrameRate = 1 / 60.0;
float controlFrameRate = 1.0 / 60.0;
Do
float controlFrameRate = 1.f/60;
or
float controlFrameRate = 1/60.f;
or
float controlFrameRate = 0.1f/6;
;-)
You should use 1/60.0; otherwise you will not get a floating-point result.
Has anyone seen this weird value while handling sin / cos / tan / acos and similar math functions?
===THE WEIRD VALUE===
-1.#IND00
=====================
void inverse_pos(double x, double y, double& theta_one, double& theta_two)
{
// Assume that L1 = 350 and L2 = 250
double B = sqrt(x*x + y*y);
double angle_beta = atan2(y, x);
double angle_alpha = acos((L2*L2 - B*B - L1*L1) / (-2*B*L1));
theta_one = angle_beta + angle_alpha;
theta_two = atan2((y-L1*sin(theta_one)), (x-L1*cos(theta_one)));
}
This is the code I was working on.
In a particular condition - like when x and y are 10 and 10 - this code stores -1.#IND00 into theta_one and theta_two. It doesn't look like either characters or numbers :(
Without a doubt, atan2 / acos and the like are the problem. But try and catch doesn't work either, because those double variables have successfully stored some value in them. Moreover, the calculations that follow never complain about it and never break the program!
I'm thinking of forcing the program to use this value somehow and make it crash, so that I can catch the error. Apart from that idea, I have no idea how to check whether theta_one and theta_two have stored this crazy value.
Any good ideas?
Thank you in advance..
The "weird value" is NaN (not a number).
The problem is that (L2*L2 - B*B - L1*L1) / (-2*B*L1) = 6.08112… is outside the range [-1, 1] where acos is well-defined for real numbers, so NaN is returned.
Are you sure the formula is correct?
If you want to catch a NaN, the NaN needs to be a signaling NaN. For gcc, compile with the -fsignaling-nans option.
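Alternatively, a sketch of a portable check that does not require trapping: test the results with std::isnan (C++11, <cmath>). The wrapper function here is hypothetical, added only for illustration:
#include <cmath>
#include <cstdio>

void inverse_pos(double x, double y, double& theta_one, double& theta_two);   // from the question

// Hypothetical caller: validate the angles before using them any further.
bool try_inverse_pos(double x, double y, double& theta_one, double& theta_two)
{
    inverse_pos(x, y, theta_one, theta_two);
    if (std::isnan(theta_one) || std::isnan(theta_two)) {
        std::printf("inverse_pos(%g, %g): unreachable target (NaN result)\n", x, y);
        return false;
    }
    return true;
}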