The result of own double precision cos() implemention in a shader is NaN, but works well on the CPU. What is going wrong? - c++

as i said, i want implement my own double precision cos() function in a compute shader with GLSL, because there is just a built-in version for float.
This is my code:
double faculty[41];//values are calculated at the beginning of main()
double myCOS(double x)
{
double sum,tempExp,sign;
sum = 1.0;
tempExp = 1.0;
sign = -1.0;
for(int i = 1; i <= 30; i++)
{
tempExp *= x;
if(i % 2 == 0){
sum = sum + (sign * (tempExp / faculty[i]));
sign *= -1.0;
}
}
return sum;
}
The result of this code is, that the sum turns out to be NaN on the shader, but on the CPU the algorithm is working well.
I tried to debug this code too and I got the following information:
faculty[i] is positive and not zero for all entries
tempExp is positive in each step
none of the other variables are NaN during each step
the first time sum is NaN is at the step with i=4
and now my question: What exactly can go wrong if each variable is a number and nothing is divided by zero especially when the algorithm works on the CPU?

Let me guess:
First you determined the problem is in the loop, and you use only the following operations: +, *, /.
The rules for generating NaN from these operations are:
The divisions 0/0 and ±∞/±∞
The multiplications 0×±∞ and ±∞×0
The additions ∞ + (−∞), (−∞) + ∞ and equivalent subtractions
You ruled out the possibility for 0/0 and ±∞/±∞ by stating that faculty[] is correctly initialized.
The variable sign is always 1.0 or -1.0 so it cannot generate the NaN through the * operation.
What remains is the + operation if tempExp ever become ±∞.
So probably tempExp is too high on entry of your function and becomes ±∞, this will make sum to be ±∞ too. At the next iteration you will trigger the NaN generating operation through: ∞ + (−∞). This is because you multiply one side of the addition by sign and sign switches between positive and negative at each iteration.
You're trying to approximate cos(x) around 0.0. So you should use the properties of the cos() function to reduce your input value to a value near 0.0. Ideally in the range [0, pi/4]. For instance, remove multiples of 2*pi, and get the values of cos() in [pi/4, pi/2] by computing sin(x) around 0.0 and so on.

What can go dramatically wrong is a loss of precision. cos(x) usually is implemented by range reduction followed by a dedicated implementation for the range [0, pi/2]. Range reduction uses cos(x+2*pi) = cos(x). But this range reduction isn't perfect. For starters, pi cannot be exactly represented in finite math.
Now what happens if you try something as absurd as cos(1<<30) ? It's quite possible that the range reduction algorithm introduces an error in x that's larger than 2*pi, in which case the outcome is meaningless. Returning NaN in such cases is reasonable.

Related

How does Cpp work with large numbers in calculations?

I have a code that tries to solve an integral of a function in a given interval numerically, using the method of Trapezoidal Rule (see the formula in Trapezoid method ), now, for the function sin(x) in the interval [-pi/2.0,pi/2.0], the integral is waited to be zero.
In this case, I take the number of partitions 'n' equal to 4. The problem is that when I have pi with 20 decimal places it is zero, with 14 decimal places it is 8.72e^(-17), then with 11 decimal places, it is zero, with 8 decimal places it is 8.72e^(-17), with 3 decimal places it is zero. I mean, the integral is zero or a number near zero for different approximations of pi, but it doesn't have a clear trend.
I would appreciate your help in understanding why this happens. (I did run it in Dev-C++).
#include <iostream>
#include <math.h>
using namespace std;
#define pi 3.14159265358979323846
//Pi: 3.14159265358979323846
double func(double x){
return sin(x);
}
int main() {
double x0 = -pi/2.0, xf = pi/2.0;
int n = 4;
double delta_x = (xf-x0)/(n*1.0);
double sum = (func(x0)+func(xf))/2.0;
double integral;
for (int k = 1; k<n; k++){
// cout<<"func: "<<func(x0+(k*delta_x))<<" "<<"last sum: "<<sum<<endl;
sum = sum + func(x0+(k*delta_x));
// cout<<"func + last sum= "<<sum<<endl;
}
integral = delta_x*sum;
cout<<"The value for the integral is: "<<integral<<endl;
return 0;
}
OP is integrating y=sin(x) from -a to +a. The various tests use different values of a, all near pi/2.
The approach uses a linear summation of values near -1.0, down to 0 and then up to near 1.0.
This summation is sensitive to calculation error with the last terms as the final math sum is expected to be 0.0. Since the start/end a varies, the error varies.
A more stable result would be had adding the extreme f = sin(f(k)) values first. e.g. sum += sin(f(k=1)), then sum += sin(f(k=3)), then sum += sin(f(k=2)) rather than k=1,2,3. In particular the formation of term x=f(k=3) is likely a bit off from the negative of its x=f(k=1) earlier term, further compounding the issue.
Welcome to the world or numerical analysis.
Problem exists if code used all float or all long double, just different degrees.
Problem is not due to using an inexact value of pi (Exact value is impossible with FP as pi is irrational and all finite FP are rational).
Much is due to the formation of x. Could try the below to form the x symmetrically about 0.0. Compare exactly x generated this way to x the original way.
x = (x0-x1)/2 + ((k - n/2)*delta_x)
Print out the exact values computed for deeper understanding.
printf("x:%a y:%a\n", x0+(k*delta_x), func(x0+(k*delta_x)));

constrain a value -pi to pi for precision buff

What is the best way to constrain any value from -pi to pi ?
I currently have:
if (fAngle > XM_PI) {
fAngle = fAngle - XM_2PI;
}
else if (fAngle < -XM_PI) {
fAngle = fAngle - -XM_2PI;
}
However, I fear those if's should instead be while's
For reference, under the Exploit Symmetrical Functions section:
https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/understanding-numerical-precision/mitigating-loss-of-precision
Extra bit of precision!
Adding or subtracting XM_2PI cannot restore any accuracy that has been lost. In fact, it adds noise, generally losing more accuracy, because XM_2PI is necessarily only an approximation of 2π. It has some error itself, so adding or subtracting it adds or subtracts the error in the approximation.
What it can do is keep you from losing more accuracy by ensuring that future results remain low in magnitude, thus remaining in a region where the floating-point format has more precision than if the number grew beyond 4, 8, 16, or other points where the exponent changes and the absolute precision becomes worse.
If you already have some value x outside [−π, π] and want its sine or cosine, you should get the best result by using sin(x) or cos(x) directly. Good implementations of sin and cos will reduce the argument using a high-precision value for 2π, so you will get a better result than using sin(x-XM_PI) or cos(x-XM_PI) (unless, by chance, the various errors in these happen to cancel).
So your task with trigonometric functions is not to reduce values you already have but to design your algorithms to keep values from growing. Adding or subtracting 2π is a reasonable way to do this. However, when you do it, add or subtract an extended-precision version of 2π, not just XM_2PI. You can do this by representing 2π as XM_2PI (which should be the value representable in floating-point that is closest to 2π) plus some residue r. r should be the value representable in floating-point that is closest to 2π−XM_2PI. You can calculate that with extended-precision software such as GMP or Maple and can likely find it online. (I do not have it handy or I would paste it here; anybody else is welcome to edit it in.) Then you would update your angle with fAngle = fAngle - XM_2PI - r; or fAngle = fAngle + XM_2PI + r;.
An exception is if you have the angle measured in some unit that you can represent or reduce exactly, such as in degrees (which you can reduce by 360º with no error as long as the number of degrees itself is represented with no error) or in time (such as number of seconds for some function with a period of a day or other rational number of seconds, so you can again reduce with no error). In that case, you can let the angle grow as long as you can represent it exactly, and you would reduce it modulo the period prior to converting it to radians.
The simplest coding way is to use the math library function remainder, as in
fAngle = remainder( fangle, XM_2PI);
STATIC_INLINE_PURE float const __vectorcall constrain(float const fAngle)
{
static constexpr double const
dPI(std::numbers::pi),
d2PI(2.0 * std::numbers::pi),
dResidue(-1.74845553146951715461909770965576171875e-07); // abs difference between d2PI(double precision) and XM_2PI(float precision)
double dAngle(fAngle);
dAngle = std::remainder(dAngle, d2PI);
if (dAngle > dPI) {
dAngle = dAngle - d2PI - dResidue;
}
else if (dAngle < -dPI) {
dAngle = dAngle + d2PI + dResidue;
}
return((float)dAngle);
}

Calculation sine and cosine in one shot

I have a scientific code that uses both sine and cosine of the same argument (I basically need the complex exponential of that argument). I was wondering if it were possible to do this faster than calling sine and cosine functions separately.
Also I only need about 0.1% precision. So is there any way I can find the default trig functions and truncate the power series for speed?
One other thing I have in mind is, is there any way to perform the remainder operation such that the result is always positive? In my own algorithm I used x=fmod(x,2*pi); but then I would need to add 2pi if x is negative (smaller domain means I can use a shorter power series)
EDIT: LUT turned out to be the best approach for this, however I am glad I learned about other approximation techniques. I will also advise using an explicit midpoint approximation. This is what I ended up doing:
const int N = 10000;//about 3e-4 error for 1000//3e-5 for 10 000//3e-6 for 100 000
double *cs = new double[N];
double *sn = new double[N];
for(int i =0;i<N;i++){
double A= (i+0.5)*2*pi/N;
cs[i]=cos(A);
sn[i]=sin(A);
}
The following part approximates (midpoint) sincos(2*pi*(wc2+t[j]*(cotp*t[j]-wc)))
double A=(wc2+t[j]*(cotp*t[j]-wc));
int B =(int)N*(A-floor(A));
re += cs[B]*f[j];
im += sn[B]*f[j];
Another approach could have been using the chebyshev decomposition. You can use the orthogonality property to find the coefficients. Optimized for exponential, it looks like this:
double fastsin(double x){
x=x-floor(x/2/pi)*2*pi-pi;//this line can be improved, both inside this
//function and before you input it into the function
double x2 = x*x;
return (((0.00015025063885163012*x2-
0.008034350857376128)*x2+ 0.1659789684145034)*x2-0.9995812174943602)*x;} //7th order chebyshev approx
If you seek fast evaluation with good (but not high) accuracy with powerseries you should use an expansion in Chebyshev polynomials: tabulate the coefficients (you'll need VERY few for 0.1% accuracy) and evaluate the expansion with the recursion relations for these polynomials (it's really very easy).
References:
Tabulated coefficients: http://www.ams.org/mcom/1980-34-149/S0025-5718-1980-0551302-5/S0025-5718-1980-0551302-5.pdf
Evaluation of chebyshev expansion: https://en.wikipedia.org/wiki/Chebyshev_polynomials
You'll need to (a) get the "reduced" argument in the range -pi/2..+pi/2 and consequently then (b) handle the sign in your results when the argument actually should have been in the "other" half of the full elementary interval -pi..+pi. These aspects should not pose a major problem:
determine (and "remember" as an integer 1 or -1) the sign in the original angle and proceed with the absolute value.
use a modulo function to reduce to the interval 0..2PI
Determine (and "remember" as an integer 1 or -1) whether it is in the "second" half and, if so, subtract pi*3/2, otherwise subtract pi/2. Note: this effectively interchanges sine and cosine (apart from signs); take this into account in the final evaluation.
This completes the step to get an angle in -pi/2..+pi/2
After evaluating sine and cosine with the Cheb-expansions, apply the "flags" of steps 1 and 3 above to get the right signs in the values.
Just create a lookup table. The following will let you lookup the sin and cos of any radian value between -2PI and 2PI.
// LOOK UP TABLE
var LUT_SIN_COS = [];
var N = 14400;
var HALF_N = N >> 1;
var STEP = 4 * Math.PI / N;
var INV_STEP = 1 / STEP;
// BUILD LUT
for(var i=0, r = -2*Math.PI; i < N; i++, r += STEP) {
LUT_SIN_COS[2*i] = Math.sin(r);
LUT_SIN_COS[2*i + 1] = Math.cos(r);
}
You index into the lookup table by:
var index = ((r * INV_STEP) + HALF_N) << 1;
var sin = LUT_SIN_COS[index];
var cos = LUT_SIN_COS[index + 1];
Here's a fiddle that displays the % error you can expect from different sized LUTS http://jsfiddle.net/77h6tvhj/
EDIT Here's an ideone (c++) with a ~benchmark~ vs the float sin and cos. http://ideone.com/SGrFVG For whatever a benchmark on ideone.com is worth the LUT is 5 times faster.
One way to go would be to learn how to implement the CORDIC algorithm. It is not difficult and pretty interesting intelectually. This gives you both the cosine and the sine. Wikipedia gives a MATLAB example that should be easy to adapt in C++.
Note that you can augment speed and reduce precision simply by lowering the parameter n.
About your second question, it has already been asked here (in C). It seems that there is no simple way.
You can also calculate sine using a square root, given the angle and the cosine.
The example below assumes the angle ranges from 0 to 2π:
double c = cos(angle);
double s = sqrt(1.0-c*c);
if(angle>pi)s=-s;
For single-precision floats, Microsoft uses 11-degree polynomial approximation for sine, 10-degree for cosine: XMScalarSinCos.
They also have faster version, XMScalarSinCosEst, that uses lower-degree polynomials.
If you aren’t on Windows, you’ll find same code + coefficients on geometrictools.com under Boost license.

Oh where has my precision gone with OpenMesh vector arithmetic?

Using doubles I would expect to have about 15 decimal points of precision. I know that many decimal numbers are not exactly representable in floating point notation, so I would get an approximation for 1/3 for example. However, using a double I would expect an approximation that was correct to about 15 decimal points. I would also expect to retain that level of accuracy when doing arithmetic.
However, in the following example, I try to calculate the area of a triangle using Heron's formula and OpenMesh::Vec3d which are backed by OpenMesh::VectorDataT<double,3> and end up with a result that is only accurate to 5 decimal points.
The correct result is area = 8.19922e-8, but I'm getting area=8.1992238711962083e-8. Any ideas where this is coming from?
The suggestion that this might result from the instability in Heron's Formula is a good one, but unfortunately is not the case in this example. I have added code which calculates the stable variation on Heron for those who might be interested. In this example, u.norm()>v.norm()>w.norm().
#include <OpenMesh/Core/Mesh/PolyMesh_ArrayKernelT.hh>
int main()
{
//triangle vertices
OpenMesh::Vec3d x(0.051051, 0.057411, 0.001355);
OpenMesh::Vec3d y(0.050981, 0.057337, -0.000678);
OpenMesh::Vec3d z(0.050949, 0.057303, 0.0);
//edge vectors
OpenMesh::Vec3d u = x-y;
OpenMesh::Vec3d v = x-z;
OpenMesh::Vec3d w = y-z;
//Heron's Formula
double semiP = (u.norm() + v.norm() + w.norm())/2.0;
double area = sqrt(semiP * (semiP - u.norm()) * (semiP - v.norm()) * (semiP - w.norm()) );
//Heron's Formula for small angles
double areaSmall = sqrt((u.norm() + (v.norm()+w.norm()))*(w.norm()-(u.norm()-v.norm()))*(w.norm()+(u.norm()-v.norm()))*(u.norm()+(v.norm()-w.norm())))/4.0;
}
Heron's formula is numerically unstable. If you have a very "flat" triangle with small angles, the sum of the two small sides is almost the long side, so one of the terms gets very small. If, for example, a and b are the small sides,
(s - c)
will be very small, because
s = (a + b + c)/2
is nearly equal to c.
The wikipedia article about herons formula mentions a stable alternative:
Arrange the sides such that a > b > c and use
A = 1/4*sqrt((a + (b + c))*(c - (a - b))*(c + (a - b))*(a + (b - c)))
To 75 decimal places, the correct area of your triangle is
0.000000081992238711963087279421583920293974467992093148008322378721298327364.
If I replace the nine double constants you have with their decimal equivalents, I get
0.000000081992238711965902754749500279615357792172906541206211853522524016959
It would appear that you are not getting what you're expecting because you're expecting something unreasonable.
Any calculation involving subtraction will result in a loss of precision, if the values are at all close to each other. How many significant digits do you expect from this subtraction?
1.23456789012345
- 1.23456789000000
----------------
0.00000000012345
Both operands have 15 digits of precision, but the result only has 5.

C++ double division by 0.0 versus DBL_MIN

When finding the inverse square root of a double, is it better to clamp invalid non-positive inputs at 0.0 or MIN_DBL? (In my example below double b may end up being negative due to floating point rounding errors and because the laws of physics are slightly slightly fudged in the game.)
Both division by 0.0 and MIN_DBL produce the same outcome in the game because 1/0.0 and 1/DBL_MIN are effectively infinity. My intuition says MIN_DBL is the better choice, but would there be any case for using 0.0? Like perhaps sqrt(0.0), 1/0.0 and multiplication by 1.#INF000000000000 execute faster because they are special cases.
double b = 1 - v.length_squared()/(c*c);
#ifdef CLAMP_BY_0
if (b < 0.0) b = 0.0;
#endif
#ifdef CLAMP_BY_DBL_MIN
if (b <= 0.0) b = DBL_MIN;
#endif
double lorentz_factor = 1/sqrt(b);
double division in MSVC:
1/0.0 = 1.#INF000000000000
1/DBL_MIN = 4.4942328371557898e+307
When dealing with floating point math, "infinity" and "effectively infinity" are quite different. Once a number stops being finite, it tends to stay that way. So while the value of lorentz_factor is "effectively" the same for both methods, depending on how you use that value, later computations can be radically different. sqrt(lorentz_factor) for instance remains infinite if you clamp to 0, but will actually be calculated if you clamp to some very very small number.
So the answer will largely depend on what you plan on doing with that value once you've clamped it.
Why not just assign INF to lorentz_factor directly, avoiding both the sqrt call and the division?
double lorentz_factor;
if (b <= 0.0)
lorentz_factor = std::numeric_limits<double>::infinity();
else
lorentz_factor = 1/sqrt(b);
You'll need to #include <limits> for this.
You can also use ::max() instead of ::infinity(), if that's what you need.