Using series to approximate log(2) - c++

double k = 0;
int l = 1;
double digits = pow(0.1, 5);
do
{
k += (pow(-1, l - 1)/l);
l++;
} while((log(2)-k)>=digits);
I'm trying to write a little program based on an example I seen using a series of Σ_(l=1) (pow(-1, l - 1)/l) to estimate log(2);
It's supposed to be a guess refinement thing where time it gets closer and closer to the right value until so many digits match.
The above is what I tried but but it's not coming out right. After messing with it for quite a while I can't figure out where I'm messing up.

I assume that you are trying to extimate the natural logarithm of 2 by its Taylor series expansion:
∞ (-1)n + 1
ln(x) = ∑ ――――――――(x - 1)n
n=1 n
One of the problems of your code is the condition choosen to stop the iterations at a specified precision:
do { ... } while((log(2)-k)>=digits);
Besides using log(2) directly (aren't you supposed to find it out instead of using a library function?), at the second iteration (and for every other even iteration) log(2) - k gets negative (-0.3068...) ending the loop.
A possible (but not optimal) fix could be to use std::abs(log(2) - k) instead, or to end the loop when the absolute value of 1.0 / l (which is the difference between two consecutive iterations) is small enough.
Also, using pow(-1, l - 1) to calculate the sequence 1, -1, 1, -1, ... Is really a waste, especially in a series with such a slow convergence rate.
A more efficient series (see here) is:
∞ 1
ln(x) = 2 ∑ ――――――― ((x - 1) / (x + 1))2n + 1
n=0 2n + 1
You can extimate it without using pow:
double x = 2.0; // I want to calculate ln(2)
int n = 1;
double eps = 0.00001,
kpow = (x - 1.0) / (x + 1.0),
kpow2 = kpow * kpow,
dk,
k = 2 * kpow;
do {
n += 2;
kpow *= kpow2;
dk = 2 * kpow / n;
k += dk;
} while ( std::abs(dk) >= eps );

Related

Hyperbolic sine without math.h

im new to code and c++ for a homework assignment im to create a code for sinh without the math file. I understand the math behind sinh, but i have no idea how to code it, any help would be highly appreciated.
According to Wikipedia, there is a Taylor series for sinh:
sinh(x) = x + (pow(x, 3) / 3!) + (pow(x, 5) / 5!) + pow(x, 7) / 7! + ...
One challenge is that you are not allowed to use the pow function. The other is calculating the factorial.
The series is a sum of terms, so you'll need a loop:
double sum = 0.0;
for (unsigned int i = 0; i < NUMBER_OF_TERMS; ++i)
{
sum += Term(i);
}
You could implement Term as a separate function, but you may want to take advantage of declaring and using variables in the loop (that the function may not have access to).
Consider that pow(x, N) expands to x * x * x...
This means that in each iteration the previous value is multiplied by the present value. (This will come in handy later.)
Consider that N! expands to 1 * 2 * 3 * 4 * 5 * ...
This means that in each iteration, the previous value is multiplied by the iteration number.
Let's revisit the loop:
double sum = 0.0;
double power = 1.0;
double factorial = 1.0;
for (unsigned int i = 1; i <= NUMBER_OF_TERMS; ++i)
{
// Calculate pow(x, i)
power = power * x;
// Calculate x!
factorial = factorial * i;
}
One issue with the above loop is that the pow and factorial need to be calculated for each iteration, but the Taylor Series terms use the odd iterations. This is solved by calculated the terms for odd iterations:
for (unsigned int i = 1; i <= NUMBER_OF_TERMS; ++i)
{
// Calculate pow(x, i)
power = power * x;
// Calculate x!
factorial = factorial * i;
// Calculate sum for odd iterations
if ((i % 2) == 1)
{
// Calculate the term.
sum += //...
}
}
In summary, the pow and factorial functions are broken down into iterative pieces. The iterative pieces are placed into a loop. Since the Taylor Series terms are calculated with odd iteration values, a check is placed into the loop.
The actual calculation of the Taylor Series term is left as an exercise for the OP or reader.

Recursive algorithm for cos taylor series expansion c++

I recently wrote a Computer Science exam where they asked us to give a recursive definition for the cos taylor series expansion. This is the series
cos(x) = 1 - x^2/2! + x^4/4! + x^6/6! ...
and the function signature looks as follows
float cos(int n , float x)
where n represents the number in the series the user would like to calculate till and x represents the value of x in the cos function
I obviously did not get that question correct and I have been trying to figure it out for the past few days but I have hit a brick wall
Would anyone be able to help out getting me started somewhere ?
All answers so far recompute the factorial every time. I surely wouldn't do that. Instead you can write :
float cos(int n, float x)
{
if (n > MAX)
return 1;
return 1 - x*x / ((2 * n - 1) * (2 * n)) * cos(n + 1, x);
}
Consider that cos returns the following (sorry for the dots position) :
You can see that this is true for n>MAX, n=MAX, and so on. The sign alternating and powers of x are easy to see.
Finally, at n=1 you get 0! = 1, so calling cos(1, x) gets you the first MAX terms of the Taylor expansion of cos.
By developing (easier to see when it has few terms), you can see the first formula is equivalent to the following :
For n > 0, you do in cos(n-1, x) a division by (2n-3)(2n-2) of the previous result, and a multiplication by x². You can see that when n=MAX+1 this formula is 1, with n=MAX then it is 1-x²/((2MAX-1)2MAX) and so on.
If you allow yourself helper functions, then you should change the signature of the above to float cos_helper(int n, float x, int MAX) and call it like so :
float cos(int n, float x) { return cos_helper(1, x, n); }
Edit : To reverse the meaning of n from degree of the evaluated term (as in this answer so far) to number of terms (as in the question, and below), but still not recompute the total factorial every time, I would suggest using a two-term relation.
Let us define trivially cos(0,x) = 0 and cos(1,x) = 1, and try to achieve generally cos(n,x) the sum of the n first terms of the Taylor series.
Then for each n > 0, we can write, cos(n,x) from cos(n-1,x) :
cos(n,x) = cos(n-1,x) + x2n / (2n)!
now for n > 1, we try to make the last term of cos(n-1,x) appear (because it is the closest term to the one we want to add) :
cos(n,x) = cos(n-1,x) + x² / ((2n-1)2n) * ( x2n-2 / (2n-2)! )
By combining this formula with the previous one (adapting it to n-1 instead of n) :
cos(n,x) = cos(n-1,x) + x² / ((2n-1)2n) * ( cos(n-1,x) - cos(n-2,x) )
We now have a purely recursive definition of cos(n,x), without helper function, without recomputing the factorial, and with n the number of terms in the sum of the Taylor decomposition.
However, I must stress that the following code will perform terribly :
performance wise, unless some optimization allows to not re-evaluate a cos(n-1,x) that was evaluated at the previous step as cos( (n-1) - 1, x)
precision wise, because of cancellation effects : the precision with which we get x2n-2 / (2n-2)! is very bad
Now this disclaimer is in place, here comes the code :
float cos(int n, float x)
{
if (n < 2)
return n;
float c = x * x / (2 * (n - 1) * 2 * n);
return (1-c) * cos(n-1, x) + c * cos(n-2, x);
}
cos(x)=1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!.....
=1-x^2/2 (1 - x^2/3*4 + x^4/3*4*5*6 -x^6/3*4*5*6*7*8)
=1 - x^2/2 {1- x^2/3*4 (1- x^2/5*6 + x^4/5*6*7*8)}
=1 - x^2/2 [1- x^2/3*4 {1- x^2/5*6 ( 1- x^2/7*8)}]
double cos_series_recursion(double x, int n, double r=1){
if(n>0){
r=1-((x*x*r)/(n*(n-1)));
return cos_series_recursion(x,n-2,r);
}else return r;
}
A simple approach that makes use of static variables:
double cos(double x, int n) {
static double p = 1, f = 1;
double r;
if(n == 0)
return 1;
r = cos(x, n-1);
p = (p*x)*x;
f = f*(2*n-1)*2*n;
if(n%2==0) {
return r+p/f;
} else {
return r-p/f;
}
}
Notice that I'm multiplying 2*n in the operation to get the next factorial.
Having n align to the factorial we need makes this easy to do in 2 operations: f = f * (n - 1) then f = f * n.
when n = 1, we need 2!
when n = 2, we need 4!
when n = 3, we need 6!
So we can safely double n and work from there. We could write:
n = 2*n;
f = f*(n-1);
f = f*n;
If we did this, we would need to update our even/odd check to if((n/2)%2==0) since we're doubling the value of n.
This can instead be written as f = f*(2*n-1)*2*n; and now we don't have to divide n when checking if it's even/odd, since n is not being altered.
You can use a loop or recursion, but I would recommend a loop. Anyway, if you must use recursion you could use something like the code below
#include <iostream>
using namespace std;
int fact(int n) {
if (n <= 1) return 1;
else return n*fact(n-1);
}
float Cos(int n, float x) {
if (n == 0) return 1;
return Cos(n-1, x) + (n%2 ? -1 : 1) * pow (x, 2*n) / (fact(2*n));
}
int main()
{
cout << Cos(6, 3.14/6);
}
Just do it like the sum.
The parameter n in float cos(int n , float x) is the l and now just do it...
Some pseudocode:
float cos(int n , float x)
{
//the sum-part
float sum = pow(-1, n) * (pow(x, 2*n))/faculty(2*n);
if(n <= /*Some predefined maximum*/)
return sum + cos(n + 1, x);
return sum;
}
The usual technique when you want to recurse but the function arguments don't carry the information that you need, is to introduce a helper function to do the recursion.
I have the impression that in the Lisp world the convention is to name such a function something-aux (short for auxiliary), but that may have been just a limited group in the old days.
Anyway, the main problem here is that n represents the natural ending point for the recursion, the base case, and that you then also need some index that works itself up to n. So, that's one good candidate for extra argument for the auxiliary function. Another candidate stems from considering how one term of the series relates to the previous one.

Step Independent Smoothing

I often smooth values by blending percentages and inverse percentages with the below:
current_value = (current_value * 0.95f) + (goal_value * 0.05f)
I'm running into a situation where I would like to perform this action n times, and n is a floating point value.
What would be the proper way of performing the above, say 12.5 times for example?
One way of doing this would be to handle the integer amount, and then approximate the remaining amount. For example (I assume valid inputs, you would want to check for those):
void Smooth(float& current, float goal, float times, float factor){
// Handle the integer steps;
int numberIterations = (int)times;
for (int i = 0; i < numberIterations; ++i){
current = (factor * current) + (goal * (1 - factor));
}
// Aproximate the rest of the step
float remainingIteration = times - numberIterations;
float adjusted_factor = factor + ((1 - factor) * (1 - remainingIteration));
current = (adjusted_factor * current) + (goal * (1 - adjusted_factor));
}
Running for the following values I get:
current=1 goal=2 factor=0.95
12.0 times - 1.45964
12.5 times - 1.47315
13.0 times - 1.48666
I appreciate the help! I have been trying several things related to compound interest, and I believe I may have solved this with the following. My ultimate goal in mind (which was actually unstated here) was to actually do this with very little iterative processing. powf() may be the most time consuming piece here.
float blend_n(float c, float g, float f, float n)
{
if (g != 0.0f)
return c + ((g - c) / g) * (g - g * powf(1.0f - f, n));
else
return c * powf(1.0 - f, n);
}
It's late here, and my redbull is wearing off so there may be some parts that could be factored out.
Usage would be setting c to the return value of blend_n ...
Thanks again!
[EDIT]
I should explain here that c is the (current) value, g is the (goal) value, f is the (factor), and n is the (number of steps)
[/EDIT]
[EDIT2]
An exception has to be made for goal values of 0, as it will result in a NaN (Not a Number) ... Change done to the code above
[/EDIT2]

C++ variables always coming out as zero

I'm running a simple for loop with some if statements. In this for loop, 3 variables are to be given a value depending on the index value in the for loop. It seems fairly simple, however, when I run the code, the values always come out as zero and I have no idea why this is happening. My for loop is provided below. I appreciate any suggestions.
double A [N+1];
double r;
double s;
double v;
for(int i = 2; i < N+1; i++)
{
if(i == 2)
{
r = 1/2/i/(i-1);
s = -1/2/(i*i - 1);
v = 1/4/i/(i+1);
}
else if(i <= N-2 && i > 2)
{
r = 1/4/i/(i-1);
s = -1/2/(i*i - 1);
v = 1/4/i/(i+1);
}
else if(i <= N-4 && i > N-2)
{
r = 1/4/i/(i-1);
s = 0;
v = 1/4/i/(i+1);
}
else
{
r = 1/4/i/(i-1);
s = 0;
v = 0;
}
A[i] = r*F[i-2] + s*F[i] + v*F[i+2];
cout << r << s << v << endl;
}
It’s happening because you’re using integer division. An example:
r = 1/2/i/(i-1);
This is the same as:
r = ((1 / 2) / i) / (i - 1);
Which is the same as:
r = (0 / i) / (i - 1);
… which is the same as:
r = 0 / (i - 1);
… which is 0.
Because 1 / 2 is 0 in integer arithmetic. To fix this, use floating point values.
Three things:
else if(i <= N-4 && i > N-2) makes no sense, that condition cannot hold
all your divisions are integer divisions - to fix, convert one of the numbers to a double.
as a result of 1, when i = N-1, and i = N, then the last branch is taken where you force two variables to 0 anyway!
1, 2 and 4 are integers. In integerland 1/2 = 0 and 1/4 = 0
With integers, 1/2 is zero. I would suggest (for a start) changing constants like 2 into 2.0 to ensure they're treated as doubles.
You may also want to (though it may not be necessary) cast all your i variables to floating point values as well, just for completeness, such as:
r = 1.0 / 2.0 / (double)i / ((double)i - 1.0);
The fact that r is a double in no way affects the calculations done on the right of the =. It only affects the final bit (the actual assignment).
1/2, 1/4 and -1/2 will always be zero because of the integer division.So try with 1.0/2.0, 1.0/4.0 and -1.0/2.0 to get it sorted out quickly. But follow the basics and do not use many magic numbers inside a code. Consider creating constants for them and use .

Range Reduction Poor Precision For Single Precision Floating Point

I am trying to implement range reduction as the first step of implementing the sine function.
I am following the method described in the paper "ARGUMENT REDUCTION FOR HUGE ARGUMENTS" by K.C. NG
I am getting error as large as 0.002339146 when using the input range of x from 0 to 20000. My error obviously shouldn't be that large, and I'm not sure how I can reduce it. I noticed that the error magnitude is associated with the input theta magnitude to cosine/sine.
I was able to obtain the nearpi.c code that the paper mentions, but I'm not sure how to utilize the code for single precision floating point. If anyone is interested, the nearpi.c file can be found at this link: nearpi.c
Here is my MATLAB code:
x = 0:0.1:20000;
% Perform range reduction
% Store constant 2/pi
twooverpi = single(2/pi);
% Compute y
y = (x.*twooverpi);
% Compute k (round to nearest integer
k = round(y);
% Solve for f
f = single(y-k);
% Solve for r
r = single(f*single(pi/2));
% Find last two bits of k
n = bitand(fi(k,1,32,0),fi(3,1,32,0));
n = single(n);
% Preallocate for speed
z(length(x)) = 0;
for i = 1:length(x)
switch(n(i))
case 0
z(i)=sin(r(i));
case 1
z(i) = single(cos(r(i)));
case 2
z(i) = -sin(r(i));
case 3
z(i) = single(-cos(r(i)));
otherwise
end
end
maxerror = max(abs(single(z - single(sin(single(x))))))
minerror = min(abs(single(z - single(sin(single(x))))))
I have edited the program nearpi.c so that it compiles. However I am not sure how to interpret the output. Also the file expects an input, which I had to input by hand, also I am not sure of the significance of the input.
Here is the working nearpi.c:
/*
============================================================================
Name : nearpi.c
Author :
Version :
Copyright : Your copyright notice
Description : Hello World in C, Ansi-style
============================================================================
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/*
* Global macro definitions.
*/
# define hex( double ) *(1 + ((long *) &double)), *((long *) &double)
# define sgn(a) (a >= 0 ? 1 : -1)
# define MAX_k 2500
# define D 56
# define MAX_EXP 127
# define THRESHOLD 2.22e-16
/*
* Global Variables
*/
int CFlength, /* length of CF including terminator */
binade;
double e,
f; /* [e,f] range of D-bit unsigned int of f;
form 1X...X */
// Function Prototypes
int dbleCF (double i[], double j[]);
void input (double i[]);
void nearPiOver2 (double i[]);
/*
* This is the start of the main program.
*/
int main (void)
{
int k; /* subscript variable */
double i[MAX_k],
j[MAX_k]; /* i and j are continued fractions
(coeffs) */
// fp = fopen("/src/cfpi.txt", "r");
/*
* Compute global variables e and f, where
*
* e = 2 ^ (D-1), i.e. the D bit number 10...0
* and
* f = 2 ^ D - 1, i.e. the D bit number 11...1 .
*/
e = 1;
for (k = 2; k <= D; k = k + 1)
e = 2 * e;
f = 2 * e - 1;
/*
* Compute the continued fraction for (2/e)/(pi/2) , i.e.
* q's starting value for the first binade, given the continued
* fraction for pi as input; set the global variable CFlength
* to the length of the resulting continued fraction (including
* its negative valued terminator). One should use as many
* partial coefficients of pi as necessary to resolve numbers
* of the width of the underflow plus the overflow threshold.
* A rule of thumb is 0.97 partial coefficients are generated
* for every decimal digit of pi .
*
* Note: for radix B machines, subroutine input should compute
* the continued fraction for (B/e)/(pi/2) where e = B ^ (D - 1).
*/
input (i);
/*
* Begin main loop over all binades:
* For each binade, find the nearest multiples of pi/2 in that binade.
*
* [ Note: for hexadecimal machines ( B = 16 ), the rest of the main
* program simplifies(!) to
*
* B_ade = 1;
* while (B_ade < MAX_EXP)
* {
* dbleCF (i, j);
* dbleCF (j, i);
* dbleCF (i, j);
* CFlength = dbleCF (j, i);
* B_ade = B_ade + 1;
* }
* }
*
* because the alternation of source & destination are no longer necessary. ]
*/
binade = 1;
while (binade < MAX_EXP)
{
/*
* For the current (odd) binade, find the nearest multiples of pi/2.
*/
nearPiOver2 (i);
/*
* Double the continued fraction to get to the next (even) binade.
* To save copying arrays, i and j will alternate as the source
* and destination for the continued fractions.
*/
CFlength = dbleCF (i, j);
binade = binade + 1;
/*
* Check for main loop termination again because of the
* alternation.
*/
if (binade >= MAX_EXP)
break;
/*
* For the current (even) binade, find the nearest multiples of pi/2.
*/
nearPiOver2 (j);
/*
* Double the continued fraction to get to the next (odd) binade.
*/
CFlength = dbleCF (j, i);
binade = binade + 1;
}
return 0;
} /* end of Main Program */
/*
* Subroutine DbleCF doubles a continued fraction whose partial
* coefficients are i[] into a continued fraction j[], where both
* arrays are of a type sufficient to do D-bit integer arithmetic.
*
* In my case ( D = 56 ) , I am forced to treat integers as double
* precision reals because my machine does not have integers of
* sufficient width to handle D-bit integer arithmetic.
*
* Adapted from a Basic program written by W. Kahan.
*
* Algorithm based on Hurwitz's method of doubling continued
* fractions (see Knuth Vol. 3, p.360).
*
* A negative value terminates the last partial quotient.
*
* Note: for the non-C programmers, the statement break
* exits a loop and the statement continue skips to the next
* case in the same loop.
*
* The call modf ( l / 2, &l0 ) assigns the integer portion of
* half of L to L0.
*/
int dbleCF (double i[], double j[])
{
double k,
l,
l0,
j0;
int n,
m;
n = 1;
m = 0;
j0 = i[0] + i[0];
l = i[n];
while (1)
{
if (l < 0)
{
j[m] = j0;
break;
};
modf (l / 2, &l0);
l = l - l0 - l0;
k = i[n + 1];
if (l0 > 0)
{
j[m] = j0;
j[m + 1] = l0;
j0 = 0;
m = m + 2;
};
if (l == 0) {
/*
* Even case.
*/
if (k < 0)
{
m = m - 1;
break;
}
else
{
j0 = j0 + k + k;
n = n + 2;
l = i[n];
continue;
};
}
/*
* Odd case.
*/
if (k < 0)
{
j[m] = j0 + 2;
break;
};
if (k == 0)
{
n = n + 2;
l = l + i[n];
continue;
};
j[m] = j0 + 1;
m = m + 1;
j0 = 1;
l = k - 1;
n = n + 1;
continue;
};
m = m + 1;
j[m] = -99999;
return (m);
}
/*
* Subroutine input computes the continued fraction for
* (2/e) / (pi/2) , where e = 2 ^ (D-1) , given pi 's
* continued fraction as input. That is, double the continued
* fraction of pi D-3 times and place a zero at the front.
*
* One should use as many partial coefficients of pi as
* necessary to resolve numbers of the width of the underflow
* plus the overflow threshold. A rule of thumb is 0.97
* partial coefficients are generated for every decimal digit
* of pi . The last coefficient of pi is terminated by a
* negative number.
*
* I'll be happy to supply anyone with the partial coefficients
* of pi . My ARPA address is mcdonald#ucbdali.BERKELEY.ARPA .
*
* I computed the partial coefficients of pi using a method of
* Bill Gosper's. I need only compute with integers, albeit
* large ones. After writing the program in bc and Vaxima ,
* Prof. Fateman suggested FranzLisp . To my surprise, FranzLisp
* ran the fastest! the reason? FranzLisp's Bignum package is
* hand coded in assembler. Also, FranzLisp can be compiled.
*
*
* Note: for radix B machines, subroutine input should compute
* the continued fraction for (B/e)/(pi/2) where e = B ^ (D - 1).
* In the case of hexadecimal ( B = 16 ), this is done by repeated
* doubling the appropriate number of times.
*/
void input (double i[])
{
int k;
double j[MAX_k];
/*
* Read in the partial coefficients of pi from a precalculated file
* until a negative value is encountered.
*/
k = -1;
do
{
k = k + 1;
scanf ("%lE", &i[k]);
printf("hello\n");
printf("%d", k);
} while (i[k] >= 0);
/*
* Double the continued fraction for pi D-3 times using
* i and j alternately as source and destination. On my
* machine D = 56 so D-3 is odd; hence the following code:
*
* Double twice (D-3)/2 times,
*/
for (k = 1; k <= (D - 3) / 2; k = k + 1)
{
dbleCF (i, j);
dbleCF (j, i);
};
/*
* then double once more.
*/
dbleCF (i, j);
/*
* Now append a zero on the front (reciprocate the continued
* fraction) and the return the coefficients in i .
*/
i[0] = 0;
k = -1;
do
{
k = k + 1;
i[k + 1] = j[k];
} while (j[k] >= 0);
/*
* Return the length of the continued fraction, including its
* terminator and initial zero, in the global variable CFlength.
*/
CFlength = k;
}
/*
* Given a continued fraction's coefficients in an array i ,
* subroutine nearPiOver2 finds all machine representable
* values near a integer multiple of pi/2 in the current binade.
*/
void nearPiOver2 (double i[])
{
int k, /* subscript for recurrences (see
handout) */
K; /* like k , but used during cancel. elim.
*/
double p[MAX_k], /* product of the q's (see
handout) */
q[MAX_k], /* successive tail evals of CF (see
handout) */
j[MAX_k], /* like convergent numerators (see
handout) */
tmp, /* temporary used during cancellation
elim. */
mk0, /* m[k - 1] (see
handout) */
mk, /* m[k] is one of the few ints (see
handout) */
mkAbs, /* absolute value of m sub k
*/
mK0, /* like mk0 , but used during cancel.
elim. */
mK, /* like mk , but used during cancel.
elim. */
z, /* the object of our quest (the argument)
*/
m0, /* the mantissa of z as a D-bit integer
*/
x, /* the reduced argument (see
handout) */
ldexp (), /* sys routine to multiply by a power of
two */
fabs (), /* sys routine to compute FP absolute
value */
floor (), /* sys routine to compute greatest int <=
value */
ceil (); /* sys routine to compute least int >=
value */
/*
* Compute the q's by evaluating the continued fraction from
* bottom up.
*
* Start evaluation with a big number in the terminator position.
*/
q[CFlength] = 1.0 + 30;
for (k = CFlength - 1; k >= 0; k = k - 1)
q[k] = i[k] + 1 / q[k + 1];
/*
* Let THRESHOLD be the biggest | x | that we are interesed in
* seeing.
*
* Compute the p's and j's by the recurrences from the top down.
*
* Stop when
*
* 1 1
* ----- >= THRESHOLD > ------ .
* 2 |j | 2 |j |
* k k+1
*/
p[0] = 1;
j[0] = 0;
j[1] = 1;
k = 0;
do
{
p[k + 1] = -q[k + 1] * p[k];
if (k > 0)
j[1 + k] = j[k - 1] - i[k] * j[k];
k = k + 1;
} while (1 / (2 * fabs (j[k])) >= THRESHOLD);
/*
* Then mk runs through the integers between
*
* k + k +
* (-1) e / p - 1/2 & (-1) f / p - 1/2 .
* k k
*/
for (mkAbs = floor (e / fabs (p[k]));
mkAbs <= ceil (f / fabs (p[k])); mkAbs = mkAbs + 1)
{
mk = mkAbs * sgn (p[k]);
/*
* For each mk , mk0 runs through integers between
*
* +
* m q - p THRESHOLD .
* k k k
*/
for (mk0 = floor (mk * q[k] - fabs (p[k]) * THRESHOLD);
mk0 <= ceil (mk * q[k] + fabs (p[k]) * THRESHOLD);
mk0 = mk0 + 1)
{
/*
* For each pair { mk , mk0 } , check that
*
* k
* m = (-1) ( j m - j m )
* 0 k-1 k k k-1
*/
m0 = (k & 1 ? -1 : 1) * (j[k - 1] * mk - j[k] * mk0);
/*
* lies between e and f .
*/
if (e <= fabs (m0) && fabs (m0) <= f)
{
/*
* If so, then we have found an
*
* k
* x = ((-1) m / p - m ) / j
* 0 k k k
*
* = ( m q - m ) / p .
* k k k-1 k
*
* But this later formula can suffer cancellation. Therefore,
* run the recurrence for the mk 's to get mK with minimal
* | mK | + | mK0 | in the hope mK is 0 .
*/
K = k;
mK = mk;
mK0 = mk0;
while (fabs (mK) > 0)
{
p[K + 1] = -q[K + 1] * p[K];
tmp = mK0 - i[K] * mK;
if (fabs (tmp) > fabs (mK0))
break;
mK0 = mK;
mK = tmp;
K = K + 1;
};
/*
* Then
* x = ( m q - m ) / p
* K K K-1 K
*
* as accurately as one could hope.
*/
x = (mK * q[K] - mK0) / p[K];
/*
* To return z and m0 as positive numbers,
* x must take the sign of m0 .
*/
x = x * sgn (m0);
m0 = fabs (m0);
/*d
* Set z = m0 * 2 ^ (binade+1-D) .
*/
z = ldexp (m0, binade + 1 - D);
/*
* Print z (hex), z (dec), m0 (dec), binade+1-D, x (hex), x (dec).
*/
printf ("%08lx %08lx Z=%22.16E M=%17.17G L+1-%d=%3d %08lx %08lx x=%23.16E\n", hex (z), z, m0, D, binade + 1 - D, hex (x), x);
}
}
}
}
Theory
First let's note the difference using single-precision arithmetic makes.
[Equation 8] The minimal value of f can be larger. As double-precision numbers are a super-set of the single-precision numbers, the closest single to a multiple of 2/pi can only be farther away then ~2.98e-19, therefore the number of leading zeros in fixed-arithmetic representation of f must be at most 61 leading zeros (but will probably be less). Denote this quantity fdigits.
[Equation Before 9] Consequently, instead of 121 bits, y must be accurate to fdigits + 24 (non-zero significant bits in single-precision) + 7 (extra guard bits) = fdigits + 31, and at most 92.
[Equation 9] "Therefore, together with the width of x's exponent, 2/pi must contain 127 (maximal exponent of single) + 31 + fdigits, or 158 + fdigits and at most 219 bits.
[Subsection 2.5] The size of A is determined by the number of zeros in x before the binary point (and is unaffected by the move to single), while the size of C is determined by Equation Before 9.
For large x (x>=2^24), x looks like this: [24 bits, M zeros]. Multiplying it by A, whose size is the first M bits of 2/pi, will result in an integer (the zeros of x will just shift everything into the integers).
Choosing C to be starting from the M+d bit of 2/pi will result in the product x*C being of size at most d-24. In double precision, d is chosen to be 174 (and instead of 24, we have 53) so that the product will be of size at most 121. In single, it is enough to choose d such that d-24 <= 92, or more precisely, d-24 <= fdigits+31. That is, d can be chosen as fdigits+55, or at most 116.
As a result, B should be of size at most 116 bits.
We are therefore left with two problems :
Computing fdigits. This involves reading ref 6 from the linked paper and understanding it. Might not be that easy. :) As far as I can see, that's the only place where nearpi.c is used.
Computing B, the relevant bits of 2/pi. Since M is bounded below by 127, we can just compute the first 127+116 bits of 2/pi offline and store them in an array. See Wikipedia.
Computing y=x*B. This involves multipliying x by a 116-bits number. This is where Section 3 is used. The size of the blocks is chosen to be 24 because 2*24 + 2 (multiplying two 24-bits numbers, and adding 3 such numbers) is smaller than the precision of double, 53 (and because 24 divides 96). We can use blocks of size 11 bits for single arithmetic for similar reasons.
Note - the trick with B only applies to numbers whose exponents are positive (x>=2^24).
To summarize - first, you have to solve the problem with double precision. Your Matlab code doesn't work in double precision too (try removing single and computing sin(2^53), because your twooverpi only has 53 significant bits, not 175 (and anyway, you can't directly multiply such precise numbers in Matlab). Second, the scheme should be adapted to work with single, and again, the key problem is representing 2/pi precisely enough, and supporting multiplication of highly-precise numbers. Last, when everything works, you can try and figure out a better fdigits to reduce the number of bits you have to store and multiply.
Hopefully I'm not completely off - comments and contradictions are welcome.
Example
As an example, let us compute sin(x) where x = single(2^24-1), which has no zeros after the significant bits (M = 0). This simplifies finding B, as B consists of the first 116 bits of 2/pi. Since x has precision of 24 bits and B of 116 bits, the product
y = x * B
will have 92 bits of precision, as required.
Section 3 in the linked paper describes how to perform this product with enough precision; the same algorithm can be used with blocks of size 11 to compute y in our case. Being drudgery, I hope I'm excused for not doing this explicitly, instead relying on Matlab's symbolic math toolbox. This toolbox provides us with the vpa function, which allows us to specify the precision of a number in decimal digits. So,
vpa('2/pi', ceil(116*log10(2)))
will produce an approximation of 2/pi of at least 116 bits of precision. Because vpa accepts only integers for its precision argument, we usually can't specify the binary precision of a number exactly, so we use the next-best.
The following code computes sin(x) according to the paper, in single precision :
x = single(2^24-1);
y = x * vpa('2/pi', ceil(116*log10(2))); % Precision = 103.075
k = round(y);
f = single(y - k);
r = f * single(pi) / 2;
switch mod(k, 4)
case 0
s = sin(r);
case 1
s = cos(r);
case 2
s = -sin(r);
case 3
s = -cos(r);
end
sin(x) - s % Expected value: exactly zero.
(The precision of y is obtained using Mathematica, which turned out to be a much better numerical tool than Matlab :) )
In libm
The other answer to this question (which has been deleted since) lead me to an implementation in libm, which although works on double-precision numbers, follows the linked paper very thoroughly.
See file s_sin.c for the wrapper (Table 2 from the linked paper appears as a switch statement at the end of the file), and e_rem_pio2.c for the argument reduction code (of particular interest is an array containing the first 396 hex-digits of 2/pi, starting at line 69).