sin function not accurate compared to math lib sin - c++

I have been trying to implement a custom sin function that is fast, but more importantly, accurate (I cannot use math.h sin in my project). I'm not an expert when it comes to this kind of math, so work with me XD. After a little searching on the web I found the following code, but the function returns inaccurate results in certain cases.
float SinF(float X)
{
    float Sine;
    if (X < -3.14159265F) X += 6.28318531F;
    else if (X > 3.14159265F) X -= 6.28318531F;

    if (X < 0)
    {
        Sine = 1.27323954F * X + 0.405284735F * X * X;
        if (Sine < 0) Sine = 0.225F * (Sine * -Sine - Sine) + Sine;
        else          Sine = 0.225F * (Sine *  Sine - Sine) + Sine;
    }
    else
    {
        Sine = 1.27323954F * X - 0.405284735F * X * X;
        if (Sine < 0) Sine = 0.225F * (Sine * -Sine - Sine) + Sine;
        else          Sine = 0.225F * (Sine *  Sine - Sine) + Sine;
    }
    return Sine;
}
Examples:
Bad result example 1:
Value Passed: 1.57079637
Returned Value: 0.999999881
Correct Value: 1.00000000
Bad result example 2:
Value Passed: 1.76704633
Returned Value: 0.980933487
Correct Value: 0.980804682
Bad result example 3:
Value Passed: 1.96329641
Returned Value: 0.924392164
Correct Value: 0.923955679
Any help would be appreciated.

There's a bunch of potential implementations of sin and friends in this SO question, but typically it boils down to a few usual methods:
Built-in processor code (fsin)
Taylor series
CORDIC
Lookup tables with optional linear (or better) interpolation (mainly for speed, less accurate)
There are lots of other methods, but these are the more common ones I've seen (a minimal sketch of the Taylor-series route follows below).
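For instance, here is a minimal sketch of the Taylor-series route (my own illustration, not the snippet above): reduce the argument to [-pi, pi), then evaluate a 7th-order polynomial with Horner's scheme. Accuracy is best near 0 and degrades toward ±pi, so add terms if your accuracy target demands it.

#include <cmath>   // std::fmod for the range reduction

float taylor_sin(float x)
{
    const float PI     = 3.14159265358979f;
    const float TWO_PI = 6.28318530717959f;

    // reduce x to [-pi, pi)
    x = std::fmod(x + PI, TWO_PI);
    if (x < 0.0f) x += TWO_PI;
    x -= PI;

    // x - x^3/3! + x^5/5! - x^7/7!, evaluated Horner-style
    const float x2 = x * x;
    return x * (1.0f - x2 * (1.0f / 6.0f - x2 * (1.0f / 120.0f - x2 / 5040.0f)));
}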
Also be aware of the inherent precision limits of floating point (as user657267 linked to). For example, 1.57079637 is not exactly pi/2 so its sin() may not be exactly 1. In fact, all your "correct" values listed are not perfectly accurate. You have to decide just how accurate is good enough for your application.

Numerical precision for difference of squares

in my code I often compute things like the following piece (here C code for simplicity):
float cos_theta = /* some simple operations; no cosf call! */;
float sin_theta = sqrtf(1.0f - cos_theta * cos_theta); // Option 1
For this example ignore that the argument of the square root might be negative due to imprecision; I fixed that with an additional fdimf call. However, I wondered if the following is more precise:
float sin_theta = sqrtf((1.0f + cos_theta) * (1.0f - cos_theta)); // Option 2
cos_theta is between -1 and +1, so for each choice there will be situations where I subtract similar numbers and thus will lose precision, right? Which is the most precise, and why?
The most precise way with floats is likely to compute both sin and cos using a single x87 instruction, fsincos.
However, if you need to do the computation manually, it's best to group arguments with similar magnitudes. This means the second option is more precise, especially when cos_theta is close to ±1 (i.e. when sin_theta is close to 0), which is where precision matters the most.
As the article
What Every Computer Scientist Should Know About Floating-Point Arithmetic notes:
The expression x^2 - y^2 is another formula that exhibits catastrophic
cancellation. It is more accurate to evaluate it as (x - y)(x + y).
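A quick float demonstration of the quoted effect (the values are my own, chosen only for illustration; as the edit below explains, the comparison can flip when x and y have very different magnitudes):

#include <cstdio>

int main()
{
    // Nearly equal values with enough significant digits that squaring them
    // must round; the subtraction then cancels the leading digits.
    float x = 1234.5678f, y = 1234.5676f;

    float direct   = x * x - y * y;                   // x^2 - y^2
    float factored = (x - y) * (x + y);               // (x - y)(x + y)
    double exact   = (double)x * x - (double)y * y;   // exact for these inputs

    std::printf("direct   = %.9g (error %g)\n", direct,   direct - exact);
    std::printf("factored = %.9g (error %g)\n", factored, factored - exact);
    // Here the factored form matches the reference to float precision, while
    // the direct form is already off in the second significant digit.
    return 0;
}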
Edit: it's more complicated than this. Although the above is generally true, (x - y)(x + y) is slightly less accurate when x and y are of very different magnitudes, as the footnote to the statement explains:
In this case, (x - y)(x + y) has three rounding errors, but x^2 - y^2 has only two since the rounding error committed when computing the smaller of x^2 and y^2 does not affect the final subtraction.
In other words, taking x - y, x + y, and the product (x - y)(x + y) each introduce rounding errors (three roundings in total). x^2, y^2, and the subtraction x^2 - y^2 also each introduce rounding errors, but the rounding error from squaring the relatively small number (the smaller of x and y) is so negligible that there are effectively only two roundings, making the difference of squares more precise.
So option 1 is actually going to be more precise. This is confirmed by dev.brutus's Java test.
I wrote a small test. It calculates the expected value with double precision, then it calculates the error for both of your options. The first option is better:
Algorithm: FloatTest$1
option 1 error = 3.802792362162126
option 2 error = 4.333273185303996
Algorithm: FloatTest$2
option 1 error = 3.802792362167937
option 2 error = 4.333273185305868
The Java code:
import org.junit.Test;

public class FloatTest {

    @Test
    public void test() {
        testImpl(new ExpectedAlgorithm() {
            public double te(double cos_theta) {
                return Math.sqrt(1.0f - cos_theta * cos_theta);
            }
        });
        testImpl(new ExpectedAlgorithm() {
            public double te(double cos_theta) {
                return Math.sqrt((1.0f + cos_theta) * (1.0f - cos_theta));
            }
        });
    }

    public void testImpl(ExpectedAlgorithm ea) {
        double delta1 = 0;
        double delta2 = 0;
        for (double cos_theta = -1; cos_theta <= 1; cos_theta += 1e-8) {
            double[] delta = delta(cos_theta, ea);
            delta1 += delta[0];
            delta2 += delta[1];
        }
        System.out.println("Algorithm: " + ea.getClass().getName());
        System.out.println("option 1 error = " + delta1);
        System.out.println("option 2 error = " + delta2);
    }

    private double[] delta(double cos_theta, ExpectedAlgorithm ea) {
        double expected = ea.te(cos_theta);
        double delta1 = Math.abs(expected - t1((float) cos_theta));
        double delta2 = Math.abs(expected - t2((float) cos_theta));
        return new double[]{delta1, delta2};
    }

    private double t1(float cos_theta) {
        return Math.sqrt(1.0f - cos_theta * cos_theta);
    }

    private double t2(float cos_theta) {
        return Math.sqrt((1.0f + cos_theta) * (1.0f - cos_theta));
    }

    interface ExpectedAlgorithm {
        double te(double cos_theta);
    }
}
The correct way to reason about numerical precision of some expression is to:
Measure the result discrepancy relative to the correct value in ULPs (units in the last place), a measure introduced in 1960 by W. H. Kahan. You can find C, Python & Mathematica implementations here, and learn more on the topic here (a minimal C++ sketch also follows this list).
Discriminate between two or more expressions based on the worst case they produce, not average absolute error as done in other answers or by some other arbitrary metric. This is how numerical approximation polynomials are constructed (Remez algorithm), how standard library methods' implementations are analysed (e.g. Intel atan2), etc...
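For reference, a minimal C++ sketch of that ULP-based error measure (assumes IEEE-754 doubles; zero, infinities and NaN are not handled, unlike the Python ulp() further down):

#include <cmath>

// Size of one ULP at x: the gap between |x| and the next representable double
// toward zero (matching the convention of the Python ulp() below).
double ulp(double x)
{
    const double t = std::fabs(x);
    return t - std::nextafter(t, 0.0);
}

// Discrepancy between an approximation and the correct value, expressed in
// ULPs of the correct value.
double error_in_ulps(double approx, double correct)
{
    return std::fabs(approx - correct) / ulp(correct);
}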
With that in mind, version_1: sqrt(1 - x * x) and version_2: sqrt((1 - x) * (1 + x)) produce significantly different outcomes. As presented in the plot below, version_1 demonstrates catastrophic performance for x close to 1 with error > 1_000_000 ulps, while on the other hand error of version_2 is well behaved.
That is why I always recommend using version_2, i.e. exploiting the square difference formula.
Python 3.6 code that produces square_diff_error.csv file:
from fractions import Fraction
from math import exp, fabs, sqrt
from random import random
from struct import pack, unpack


def ulp(x):
    """
    Computing ULP of input double precision number x exploiting
    lexicographic ordering property of positive IEEE-754 numbers.

    The implementation correctly handles the special cases:
      - ulp(NaN) = NaN
      - ulp(-Inf) = Inf
      - ulp(Inf)  = Inf

    Author: Hrvoje Abraham
    Date: 11.12.2015
    Revisions: 15.08.2017, 26.11.2017
    MIT License https://opensource.org/licenses/MIT

    :param x: (float) float ULP will be calculated for
    :returns: (float) the input float number ULP value
    """
    # setting sign bit to 0, e.g. -0.0 becomes 0.0
    t = abs(x)

    # converting IEEE-754 64-bit format bit content to unsigned integer
    ll = unpack('Q', pack('d', t))[0]

    # computing first smaller integer, bigger in a case of ll=0 (t=0.0)
    near_ll = abs(ll - 1)

    # converting back to float, its value will be float nearest to t
    near_t = unpack('d', pack('Q', near_ll))[0]

    # abs takes care of case t=0.0
    return abs(t - near_t)


with open('e:/square_diff_error.csv', 'w') as f:
    for _ in range(100_000):
        # nonlinear distribution of x in [0, 1] to produce more cases close to 1
        k = 10
        x = (exp(k) - exp(k * random())) / (exp(k) - 1)

        fx = Fraction(x)
        correct = sqrt(float(Fraction(1) - fx * fx))

        version1 = sqrt(1.0 - x * x)
        version2 = sqrt((1.0 - x) * (1.0 + x))

        err1 = fabs(version1 - correct) / ulp(correct)
        err2 = fabs(version2 - correct) / ulp(correct)

        f.write(f'{x},{err1},{err2}\n')
Mathematica code that produces the final plot:
data = Import["e:/square_diff_error.csv"];

err1 = {1 - #[[1]], #[[2]]} & /@ data;
err2 = {1 - #[[1]], #[[3]]} & /@ data;

ListLogLogPlot[{err1, err2}, PlotRange -> All, Axes -> False, Frame -> True,
    FrameLabel -> {"1-x", "error [ULPs]"}, LabelStyle -> {FontSize -> 20}]
As an aside, you will always have a problem when theta is small, because the cosine is flat around theta = 0. If theta is between -0.0001 and 0.0001 then cos(theta) in float is exactly one, so your sin_theta will be exactly zero.
To answer your question, when cos_theta is close to one (corresponding to a small theta), your second computation is clearly more accurate. This is shown by the following program, that lists the absolute and relative errors for both computations for various values of cos_theta. The errors are computed by comparing against a value which is computed with 200 bits of precision, using GNU MP library, and then converted to a float.
#include <math.h>
#include <stdio.h>
#include <gmp.h>

int main()
{
    int i;

    printf("cos_theta abs (1) rel (1) abs (2) rel (2)\n\n");

    for (i = -14; i < 0; ++i) {
        float x = 1 - pow(10, i / 2.0);
        float approx1 = sqrt(1 - x * x);
        float approx2 = sqrt((1 - x) * (1 + x));

        /* Use the GNU MultiPrecision Library to get an 'exact' answer */
        mpf_t tmp1, tmp2;
        mpf_init2(tmp1, 200);          /* use 200 bits precision   */
        mpf_init2(tmp2, 200);
        mpf_set_d(tmp1, x);
        mpf_mul(tmp2, tmp1, tmp1);     /* tmp2 = x * x             */
        mpf_neg(tmp1, tmp2);           /* tmp1 = -x * x            */
        mpf_add_ui(tmp2, tmp1, 1);     /* tmp2 = 1 - x * x         */
        mpf_sqrt(tmp1, tmp2);          /* tmp1 = sqrt(1 - x * x)   */
        float exact = mpf_get_d(tmp1);
        mpf_clear(tmp1);               /* release GMP storage each iteration */
        mpf_clear(tmp2);

        printf("%.8f %.3e %.3e %.3e %.3e\n", x,
               fabs(approx1 - exact), fabs((approx1 - exact) / exact),
               fabs(approx2 - exact), fabs((approx2 - exact) / exact));
        /* printf("%.10f %.8f %.8f %.8f\n", x, exact, approx1, approx2); */
    }
    return 0;
}
Output:
cos_theta abs (1) rel (1) abs (2) rel (2)
0.99999988 2.910e-11 5.960e-08 0.000e+00 0.000e+00
0.99999970 5.821e-11 7.539e-08 0.000e+00 0.000e+00
0.99999899 3.492e-10 2.453e-07 1.164e-10 8.178e-08
0.99999684 2.095e-09 8.337e-07 0.000e+00 0.000e+00
0.99998999 1.118e-08 2.497e-06 0.000e+00 0.000e+00
0.99996835 6.240e-08 7.843e-06 9.313e-10 1.171e-07
0.99989998 3.530e-07 2.496e-05 0.000e+00 0.000e+00
0.99968380 3.818e-07 1.519e-05 0.000e+00 0.000e+00
0.99900001 1.490e-07 3.333e-06 0.000e+00 0.000e+00
0.99683774 8.941e-08 1.125e-06 7.451e-09 9.376e-08
0.99000001 5.960e-08 4.225e-07 0.000e+00 0.000e+00
0.96837723 1.490e-08 5.973e-08 0.000e+00 0.000e+00
0.89999998 2.980e-08 6.837e-08 0.000e+00 0.000e+00
0.68377221 5.960e-08 8.168e-08 5.960e-08 8.168e-08
When cos_theta is not close to one, then the accuracy of both methods is very close to each other and to round-off error.
[Edited for major think-o] It looks to me like option 2 will be better, because for a number like 0.000001, option 1 will return the sine as 1 while option 2 will return a number just smaller than 1.
In my opinion there is no difference, since (1 - x) preserves the precision without affecting the carried bit, and the same is true for (1 + x). The only thing affecting the carry-bit precision is then the multiplication. Both cases involve a single multiplication, so both are equally likely to produce the same carry-bit error.

How can I make MATLAB precision the same as in C++?

I have a problem with precision. I need my C++ code to have the same precision as MATLAB. In MATLAB I have a script which does some computations with numbers, and I have C++ code which is supposed to do the same as that script, but the output for the same input is different. I found that in my script a test like 104 >= 104 returns false. I tried format long, but it did not help me find out why it is false. Both numbers are of type double. I thought that maybe MATLAB stores the real value of 104 somewhere and it is really something like 103.9999..., so I raised the precision in C++. That did not help either, because where MATLAB returns a value of 50.000, in C++ I get 50.050 even at higher precision. Those two values come from a few calculations like + or *. Is there any way to make my C++ and MATLAB scripts have the same precision?
for i = 1:neighbors
    y = spoints(i,1) + origy;
    x = spoints(i,2) + origx;
    % Calculate floors, ceils and rounds for the x and y.
    fy = floor(y); cy = ceil(y); ry = round(y);
    fx = floor(x); cx = ceil(x); rx = round(x);
    % Check if interpolation is needed.
    if (abs(x - rx) < 1e-6) && (abs(y - ry) < 1e-6)
        % Interpolation is not needed, use original datatypes
        N = image(ry:ry+dy, rx:rx+dx);
        D = N >= C;
    else
        % Interpolation needed, use double type images
        ty = y - fy;
        tx = x - fx;
        % Calculate the interpolation weights.
        w1 = (1 - tx) * (1 - ty);
        w2 = tx * (1 - ty);
        w3 = (1 - tx) * ty;
        w4 = tx * ty;
        % Compute interpolated pixel values
        N = w1*d_image(fy:fy+dy,fx:fx+dx) + w2*d_image(fy:fy+dy,cx:cx+dx) + ...
            w3*d_image(cy:cy+dy,fx:fx+dx) + w4*d_image(cy:cy+dy,cx:cx+dx);
        D = N >= d_C;
    end
My problem is in the else branch, which starts at line 12. tx and ty equal 0.707106781186547 or 1 - 0.707106781186547. Values from d_image are in the range 0..255. N is a value in 0..255 obtained by interpolating 4 pixels from the image. d_C is a value in 0..255. I still don't know why, when N has values like x x x 140.0000 140.0000 and d_C has x x x 140 x, MATLAB gives me 0 in the 4th position of D, i.e. 140.0000 != 140. I debugged it with more displayed precision and it still prints 140.00000000000000, yet it is still not equal to 140.
int Codes::Interpolation(Point_<int> point, Point_<int> center, Mat *mat)
{
    int x = center.x - point.x;
    int y = center.y - point.y;

    Point_<double> my;
    if (x < 0)
    {
        if (y < 0)
        {
            my.x = center.x + LEN;
            my.y = center.y + LEN;
        }
        else
        {
            my.x = center.x + LEN;
            my.y = center.y - LEN;
        }
    }
    else
    {
        if (y < 0)
        {
            my.x = center.x - LEN;
            my.y = center.y + LEN;
        }
        else
        {
            my.x = center.x - LEN;
            my.y = center.y - LEN;
        }
    }

    int a = my.x;
    int b = my.y;

    double tx = my.x - a;
    double ty = my.y - b;

    double wage[4];
    wage[0] = (1 - tx) * (1 - ty);
    wage[1] = tx * (1 - ty);
    wage[2] = (1 - tx) * ty;
    wage[3] = tx * ty;

    int values[4];
    // write the 4 pixels that take part in the interpolation into the array
    for (int i = 0; i < 4; i++)
    {
        int val = mat->at<uchar>(Point_<int>(a + help[i].x, a + help[i].y));
        values[i] = val;
    }

    double moze = (wage[0]) * (values[0]) + (wage[1]) * (values[1]) + (wage[2]) * (values[2]) + (wage[3]) * (values[3]);
    return moze;
}
LEN = 0.707106781186547. The values in the values array are 100% the same as the MATLAB values.
Matlab uses double precision. You can use C++'s double type. That should make most things similar, but not 100%.
As someone else noted, this is probably not the source of your problem. Either there is a difference in the algorithms, or it might be something like a library function defined differently in Matlab and in C++. For example, Matlab's std() divides by (n-1) and your code may divide by n.
First, as a rule of thumb, it is never a good idea to compare floating point variables directly. For example, instead of if (nr >= 104) you should use if (nr >= 104 - e), where e is a small number, like 0.00001.
However, there must be some serious undersampling or rounding error somewhere in your script, because getting 50050 instead of 50000 is well beyond common floating point imprecision: MATLAB's doubles carry roughly 15 significant digits.
I guess there are some casting problems in your code, for example
int i;
double d;
// ...
d = i/3 * d;
will give a very inaccurate result, because you have an integer division. d = (double)i/3 * d or d = i/3. * d would give a much more accurate result.
The above example would NOT cause any problems in MATLAB, because there everything is already a floating-point number by default, so a similar problem might be behind the differences in the results of the C++ and MATLAB code.
Seeing your calculations would help a lot in finding what went wrong.
EDIT:
In C and C++, if you compare a double with an integer of the same value, there is a very high chance that they will not be equal. It's the same with two doubles, but you might get lucky if you perform the exact same computations on them. Even in MATLAB it's dangerous, and maybe you were just lucky that, as both are doubles, both got truncated the same way.
By your recent edit it seems that the problem is where you evaluate your array. You should never use == or != when comparing floats or doubles in C++ (or in any language that uses floating-point variables). The proper way to do a comparison is to check whether they are within a small distance of each other.
An example: using == or != to compare two doubles is like comparing the weight of two objects by counting the number of atoms in them, and deciding that they are not equal even if there is one single atom difference between them.
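A minimal C++ sketch of such a tolerance-based comparison (the tolerance values are assumptions; pick ones that match the precision your computation can realistically deliver, much like the abs(x - rx) < 1e-6 test already used in the MATLAB script):

#include <algorithm>
#include <cmath>

// Treat two doubles as equal if they differ by less than a tolerance scaled
// to their magnitude (plus a small absolute floor for values near zero).
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12)
{
    const double diff  = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}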
MATLAB uses double precision unless you say otherwise. Any differences you see with an identical implementation in C++ will be due to floating-point errors.

sin and cos are slow, is there an alternative?

My game needs to move by a certain angle. To do this I get the vector of the angle via sin and cos. Unfortunately sin and cos are my bottleneck. I'm sure I do not need this much precision. Is there an alternative to a C sin & cos and look-up table that is decently precise but very fast?
I had found this:
float Skeleton::fastSin(float x)
{
    const float B = 4.0f / pi;
    const float C = -4.0f / (pi * pi);

    float y = B * x + C * x * abs(x);

    const float P = 0.225f;
    return P * (y * abs(y) - y) + y;
}
Unfortunately, this does not seem to work. I get significantly different behavior when I use this sin rather than C sin.
Thanks
A lookup table is the standard solution. You could also use two lookup tables, one for degrees and one for tenths of a degree, and utilize sin(A + B) = sin(A)cos(B) + cos(A)sin(B).
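A rough sketch of that two-table idea (the table layout and the degree-based units are my assumptions, purely for illustration):

#include <cmath>

static float sin_deg[360], cos_deg[360];     // A: whole degrees
static float sin_tenth[10], cos_tenth[10];   // B: tenths of a degree

void init_tables()
{
    const double rad_per_deg = 3.14159265358979 / 180.0;
    for (int i = 0; i < 360; ++i) {
        sin_deg[i] = (float)std::sin(i * rad_per_deg);
        cos_deg[i] = (float)std::cos(i * rad_per_deg);
    }
    for (int i = 0; i < 10; ++i) {
        sin_tenth[i] = (float)std::sin(i * rad_per_deg / 10.0);
        cos_tenth[i] = (float)std::cos(i * rad_per_deg / 10.0);
    }
}

// Angle given in tenths of a degree, assumed already reduced to [0, 3600).
float table_sin(int tenths)
{
    const int a = tenths / 10, b = tenths % 10;
    return sin_deg[a] * cos_tenth[b] + cos_deg[a] * sin_tenth[b];
}

This gives 0.1-degree resolution from 370 entries per function instead of 3600.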
For your fastSin(), you should check its documentation to see what range it's valid on. The units you're using for your game could be too big or too small and scaling them to fit within that function's expected range could make it work better.
EDIT:
Someone else mentioned getting it into the desired range by subtracting PI, but apparently there's a function called fmod for doing modulus division on floats/doubles, so this should do it:
#include <iostream>
#include <cmath>

float fastSin(float x)
{
    // restrict x so that -M_PI <= x < M_PI
    x = std::fmod(x + M_PI, M_PI * 2);
    if (x < 0)             // fmod keeps the sign of its argument, so fix up negative inputs
        x += M_PI * 2;
    x -= M_PI;

    const float B = 4.0f / M_PI;
    const float C = -4.0f / (M_PI * M_PI);
    float y = B * x + C * x * std::abs(x);

    const float P = 0.225f;
    return P * (y * std::abs(y) - y) + y;
}

int main()
{
    std::cout << fastSin(100.0) << '\n' << std::sin(100.0) << std::endl;
}
I have no idea how expensive fmod is though, so I'm going to try a quick benchmark next.
Benchmark Results
I compiled this with -O2 and ran the result with the Unix time program:
int main()
{
    float a = 0;
    for (int i = 0; i < REPETITIONS; i++) {
        a += sin(i); // or fastSin(i);
    }
    std::cout << a << std::endl;
}
The result is that sin is about 1.8x slower (if fastSin takes 5 seconds, sin takes 9). The accuracy also seemed to be pretty good.
If you chose to go this route, make sure to compile with optimization on (-O2 in gcc).
I know this is already an old topic, but for people who have the same question, here is a tip.
A lot of the time in 2D and 3D rotation, all vectors are rotated by a fixed angle. Instead of calling cos() or sin() in every cycle of the loop, create variables before the loop that already contain the values of cos(angle) and sin(angle), and use those variables inside the loop. This way each function only has to be called once, as in the sketch below.
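A minimal sketch of that hoisting, assuming a simple 2D point type:

#include <cmath>

struct Vec2 { float x, y; };

// Rotate many points by the same fixed angle: sin/cos are computed once,
// outside the loop, instead of once per iteration.
void rotate_all(Vec2* pts, int count, float angle)
{
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    for (int i = 0; i < count; ++i) {
        const float x = pts[i].x, y = pts[i].y;
        pts[i].x = x * c - y * s;
        pts[i].y = x * s + y * c;
    }
}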
If you rephrase the return in fastSin as
return (1-P) * y + P * (y * abs(y))
And rewrite y as (for x>0 )
y = 4 * x * (pi-x) / (pi * pi)
you can see that y is a parabolic first-order approximation to sin(x) chosen so that it passes through (0,0), (pi/2,1) and (pi,0), and is symmetrical about x=pi/2.
Thus we can only expect our function to be a good approximation from 0 to pi. If we want values outside that range we can use the 2-pi periodicity of sin(x) and that sin(x+pi) = -sin(x).
The y*abs(y) is a "correction term" which also passes through those three points. (I'm not sure why y*abs(y) is used rather than just y*y since y is positive in the 0-pi range).
This form of overall approximation function guarantees that a linear blend of the two functions y and y*y, (1-P)*y + P * y*y will also pass through (0,0), (pi/2,1) and (pi,0).
We might expect y to be a decent approximation to sin(x), but the hope is that by picking a good value for P we get a better approximation.
One question is "How was P chosen?". Personally, I'd choose the P that produces the least RMS error over the [0, pi/2] interval. (I'm not sure that's how this P was chosen, though.)
Minimizing this error with respect to P gives an equation that can be rearranged and solved for P. Wolfram Alpha evaluates the error integral as the quadratic
E = (16 π^5 p^2 - (96 π^5 + 100800 π^2 - 967680)p + 651 π^5 - 20160 π^2)/(1260 π^4)
which has a minimum of
min(E) = -11612160/π^9 + 2419200/π^7 - 126000/π^5 - 2304/π^4 + 224/π^2 + (169 π)/420
≈ 5.582129689596371e-07
at
p = 3 + 30240/π^5 - 3150/π^3
≈ 0.2248391013559825
Which is pretty close to the specified P=0.225.
You can raise the accuracy of the approximation by adding an additional correction term, giving a form something like return (1-a-b)*y + a*y*abs(y) + b*y*y*abs(y). I would find a and b in the same way as above, this time giving a system of two linear equations in a and b to solve, rather than a single equation in p. I'm not going to do the derivation as it is tedious and the conversion to LaTeX images is painful... ;)
NOTE: When answering another question I thought of another valid choice for P.
The problem is that using reflection to extend the curve into (-pi,0) leaves a kink in the curve at x=0. However, I suspect we can choose P such that the kink becomes smooth.
To do this take the left and right derivatives at x=0 and ensure they are equal. This gives an equation for P.
You can compute a table S of 256 values, from sin(0) to sin(2 * pi). Then, to get sin(x), bring x back into [0, 2 * pi] and pick the two values S[a], S[b] from the table such that a <= x <= b. From these, linear interpolation should give you a fair approximation (see the sketch below).
Memory-saving trick: you actually only need to store [0, pi / 2] and use the symmetries of sin(x).
Enhancement trick: linear interpolation can be a problem because of its non-smooth derivative; the human eye is good at spotting such glitches in animation and graphics. Use cubic interpolation in that case.
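A rough sketch of the basic table-plus-linear-interpolation approach described above (the memory-saving symmetry trick and cubic interpolation are left out; the table size of 256 follows the suggestion above):

#include <cmath>

static const int   TABLE_SIZE = 256;
static const float TWO_PI     = 6.2831853071795864f;
static float S[TABLE_SIZE + 1];   // one extra entry so i + 1 never wraps

void init_sine_table()
{
    for (int i = 0; i <= TABLE_SIZE; ++i)
        S[i] = std::sin(TWO_PI * i / TABLE_SIZE);
}

float lut_sin(float x)
{
    float t = std::fmod(x, TWO_PI);
    if (t < 0.0f) t += TWO_PI;                     // bring x back into [0, 2*pi)
    const float pos = t * (TABLE_SIZE / TWO_PI);   // fractional table position
    int i = (int)pos;
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1;       // guard against rounding up to 2*pi
    const float frac = pos - (float)i;
    return S[i] + frac * (S[i + 1] - S[i]);        // linear interpolation
}

Call init_sine_table() once at startup before using lut_sin().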
What about
x*(0.0174532925199433-8.650935142277599*10^-7*x^2)
for degrees and
x*(1-0.162716259904269*x^2)
for radians, on [-45, 45] and [-pi/4, pi/4] respectively?
This (i.e. the fastSin function) approximates the sine function using a parabola. I suspect it's only good for values between -π and +π. Fortunately, you can keep adding or subtracting 2π until you get into this range.
You can use this approximation; this solution uses a quadratic curve:
http://www.starming.com/index.php?action=plugin&v=wave&ajax=iframe&iframe=fullviewonepost&mid=56&tid=4825

Magic numbers in C++ implementation of Excel NORMDIST function

Whilst looking for a C++ implementation of Excel's NORMDIST (cumulative)
function I found this on a website:
static double normdist(double x, double mean, double standard_dev)
{
    double res;

    x = (x - mean) / standard_dev;   // standardize x

    if (x == 0)
    {
        res = 0.5;
    }
    else
    {
        double oor2pi = 1 / (sqrt(double(2) * 3.14159265358979323846));
        double t = 1 / (double(1) + 0.2316419 * fabs(x));
        t *= oor2pi * exp(-0.5 * x * x)
             * (0.31938153 + t
             * (-0.356563782 + t
             * (1.781477937 + t
             * (-1.821255978 + t * 1.330274429))));
        if (x >= 0)
        {
            res = double(1) - t;
        }
        else
        {
            res = t;
        }
    }
    return res;
}
My limited maths knowledge made me think about Taylor series, but I am unable to determine where these numbers come from:
0.2316419,
0.31938153,
-0.356563782,
1.781477937,
-1.821255978,
1.330274429
Can anyone suggest where they come from, and how they can be derived?
Check out Numerical Recipes, chapter 6.2.2. The approximation is standard. Recall that
NormCdf(x) = 0.5 * (1 + erf(x / sqrt(2)))
erf(x) = 2 / (sqrt(pi)) integral(e^(-t^2) dt, t = 0..x)
and write erf as
1 - erf x ~= t * exp(-x^2 + P(t))
for positive x, where
t = 2 / (2 + x)
and since t is between 0 and 1, you can find P by Chebyshev approximation once and for all (Numerical Recipes, section 5.8). You don't use a Taylor expansion: you want the approximation to be good on the whole real line, which a Taylor expansion cannot guarantee. Chebyshev approximation is the best polynomial approximation in the L^2 norm, which is a good substitute for the very-difficult-to-find minimax polynomial (= best polynomial approximation in the sup norm).
The version here is slightly different. Instead, one writes
1 - erf x = t * exp(-x^2) * P(t)
but the procedure is similar, and normCdf is computed directly, instead of erf.
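For reference, the relation quoted above can be written directly with the standard library's erf (available since C++11 in <cmath>); comparing it against normdist() over a range of inputs gives a quick picture of how accurate the magic-number fit is:

#include <cmath>

// Exact relation between the normal CDF and erf, as quoted above.
double norm_cdf(double x)
{
    return 0.5 * (1.0 + std::erf(x / std::sqrt(2.0)));
}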
In particular, and very similarly, the implementation you are using differs somewhat from the one given in the text, because it is of the form b*exp(-a*z^2)*y(t), but it is also a Chebyshev approximation to the erfc(x) function, as you can see in this paper by Schonfelder (1978) [http://www.ams.org/journals/mcom/1978-32-144/S0025-5718-1978-0494846-8/S0025-5718-1978-0494846-8.pdf].
Also, in Numerical Recipes 3rd edition, at the end of section 6.2.2, they provide a very accurate C implementation of the form t*exp(-z^2 + c0 + c1*t + c2*t^2 + c3*t^3 + ... + c9*t^9).

Create sine lookup table in C++

How can I rewrite the following pseudocode in C++?
real array sine_table[-1000..1000]
for x from -1000 to 1000
sine_table[x] := sine(pi * x / 1000)
I need to create a sine_table lookup table.
You can reduce the size of your table to 25% of the original by only storing values for the first quadrant, i.e. for x in [0,pi/2].
To do that your lookup routine just needs to map all values of x to the first quadrant using simple trig identities:
sin(x) = - sin(-x), to map from quadrant IV to I
sin(x) = sin(pi - x), to map from quadrant II to I
To map from quadrant III to I, apply both identities, i.e. sin(x) = - sin (pi + x)
Whether this strategy helps depends on how much memory usage matters in your case. But it seems wasteful to store four times as many values as you need just to avoid a comparison and subtraction or two during lookup.
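A minimal sketch of that folding logic (quarter_table_lookup is a hypothetical stand-in for the actual quarter-size table lookup; here it simply calls std::sin so the sketch is self-contained):

#include <cmath>

// Stand-in for the real quarter-table lookup: returns sin(t) for t in [0, pi/2].
static float quarter_table_lookup(float t)
{
    return std::sin(t);
}

// Fold any angle into the first quadrant and track the sign separately,
// using the identities listed above.
float sine_from_quarter_table(float x)
{
    const float PI = 3.14159265358979f;
    float sign = 1.0f;

    x = std::fmod(x, 2.0f * PI);
    if (x < 0.0f)      { x = -x;     sign = -sign; }   // sin(x) = -sin(-x)
    if (x > PI)        { x -= PI;    sign = -sign; }   // sin(pi + x) = -sin(x)
    if (x > 0.5f * PI) { x = PI - x; }                 // sin(x) = sin(pi - x)

    return sign * quarter_table_lookup(x);
}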
I second Jeremy's recommendation to measure whether building a table is better than just using std::sin(). Even with the original large table, you'll have to spend cycles during each table lookup to convert the argument to the closest increment of pi/1000, and you'll lose some accuracy in the process.
If you're really trying to trade accuracy for speed, you might try approximating the sin() function using just the first few terms of the Taylor series expansion.
sin(x) = x - x^3/3! + x^5/5! ..., where ^ represents raising to a power and ! represents the factorial.
Of course, for efficiency, you should precompute the factorials and make use of the lower powers of x to compute higher ones, e.g. use x^3 when computing x^5.
One final point: the truncated Taylor series above is more accurate for values closer to zero, so it's still worthwhile to map to the first or fourth quadrant before computing the approximate sine.
Addendum:
Yet one more potential improvement based on two observations:
1. You can compute any trig function if you can compute both the sine and cosine in the first octant [0,pi/4]
2. The Taylor series expansion centered at zero is more accurate near zero
So if you decide to use a truncated Taylor series, then you can improve accuracy (or use fewer terms for similar accuracy) by mapping to either the sine or cosine to get the angle in the range [0,pi/4] using identities like sin(x) = cos(pi/2-x) and cos(x) = sin(pi/2-x) in addition to the ones above (for example, if x > pi/4 once you've mapped to the first quadrant.)
Or if you decide to use a table lookup for both the sine and cosine, you could get by with two smaller tables that only covered the range [0,pi/4] at the expense of another possible comparison and subtraction on lookup to map to the smaller range. Then you could either use less memory for the tables, or use the same memory but provide finer granularity and accuracy.
long double sine_table[2001];
for (int index = 0; index < 2001; index++)
{
    sine_table[index] = std::sin(PI * (index - 1000) / 1000.0);
}
One more point: calling trigonometric functions is pricey. If you want to prepare a lookup table for sine with a constant step, you may save calculation time at the expense of some potential precision loss.
Consider your minimal step is "a". That is, you need sin(a), sin(2a), sin(3a), ...
Then you may do the following trick: First calculate sin(a) and cos(a). Then for every consecutive step use the following trigonometric equalities:
sin([n+1] * a) = sin(n*a) * cos(a) + cos(n*a) * sin(a)
cos([n+1] * a) = cos(n*a) * cos(a) - sin(n*a) * sin(a)
The drawback of this method is that the round-off error accumulates as the procedure goes on; a sketch of the recurrence follows below.
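A rough sketch of that recurrence in double precision (re-seeding every so often with a real sin/cos call is one way to limit the accumulated round-off, if it matters for your table length):

#include <cmath>
#include <vector>

// Fill a sine table with step 'a' using only one sin() and one cos() call,
// via sin((n+1)a) = sin(na)cos(a) + cos(na)sin(a) and the matching cosine rule.
std::vector<double> make_sine_table(double a, int n)
{
    std::vector<double> table(n);
    const double sin_a = std::sin(a), cos_a = std::cos(a);

    double s = 0.0, c = 1.0;   // sin(0), cos(0)
    for (int i = 0; i < n; ++i) {
        table[i] = s;
        const double s_next = s * cos_a + c * sin_a;   // sin((i+1)*a)
        const double c_next = c * cos_a - s * sin_a;   // cos((i+1)*a)
        s = s_next;
        c = c_next;
    }
    return table;
}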
double sine_table[1000] = {0};
for (int i = 1; i <= 1000; i++)
{
    sine_table[i - 1] = std::sin(PI * i / 1000.0);
}

double getSineValue(int multipleOfPi)
{
    if (multipleOfPi == 0) return 0.0;
    int sign = 1;
    if (multipleOfPi < 0) {
        sign = -1;
    }
    return sign * sine_table[sign * multipleOfPi - 1];
}
You can reduce the array length to 500 by using the identity sin(pi/2 ± angle) = cos(angle), so store sin and cos from 0 to pi/4.
I don't remember the numbers off the top of my head, but it increased the speed of my program.
You'll want the std::sin() function from <cmath>.
Another approximation, from a book or something:
streamin ramp;
streamout sine;
float x,rect,k,i,j;
x = ramp -0.5;
rect = x * (1 - x < 0 & 2);
k = (rect + 0.42493299) *(rect -0.5) * (rect - 0.92493302) ;
i = 0.436501 + (rect * (rect + 1.05802));
j = 1.21551 + (rect * (rect - 2.0580201));
sine = i*j*k*60.252201*x;
full discussion here:
http://synthmaker.co.uk/forum/viewtopic.php?f=4&t=6457&st=0&sk=t&sd=a
I presume you know that a division is a lot slower than multiplying by a decimal number; /5 is always slower than *0.2.
It's just an approximation.
also:
streamin ramp;
streamin x; // 1.5 = Saw 3.142 = Sin 4.5 = SawSin
streamout sine;
float saw,saw2;
saw = (ramp * 2 - 1) * x;
saw2 = saw * saw;
sine = -0.166667 + saw2 * (0.00833333 + saw2 * (-0.000198409 + saw2 * (2.7526e-006+saw2 * -2.39e-008)));
sine = saw * (1+ saw2 * sine);