Mathematics with large factorials (e.g. division?) - c++

I'm trying to find the percentage of permutations of 100 numbers that contain cycles of length more than 50. This involves division with large factorials, which can't be done by hand very quickly, so I need to resort to programming. For instance, one of these terms contains
(3!/100!)*((99!/3!) + (98!/2!) + (97!/1!) + (96!/0!))
I could re-arrange all terms to provide one large number (>2^64) that just needs to be divided by 100! to get my answer.
I've thought about this for quite a bit, still being new to C++, and I'm not sure how I can do division with large numbers. Normally when I've dealt with large factorials I've output the digits of the number into an array and done multiplication through that, but I'm not entirely sure how to do division that way. What is the best way to deal with mathematics of large numbers in C++?

It is obvious from the structure of the equation (a reciprocal of a very large factorial multiplied with some large factorials with about the same magnitude) that a lot of cancellation can happen. That means that this problem can be solved with some very simple algebra and with a bit of extra luck even without any kind of computation.
Let's replace the factorials with some innocent little letters to avoid intimidation by the large numbers.
With 0! = 1 by definition and 1!=1 we can skip these values and use the following substitutions:
a = 2!, b = 3!, v = 96!, w = 97!, x = 98!, y = 99!, z = 100!
That gives
(b/z)*(y/b + x/a + w + v)
expand
b*(y/b + x/a + w + v) * 1/z
expand numerator (let's use some ASCII art for legibility)
                    b*x
y + (b*w) + (b*v) + ---
                    a
-----------------------
z
squeeze it all in one fraction
(a*y) + (a*b*w) + (a*b*v) + (x*b)
---------------------------------
(a*z)
take it apart
a*y   a*b*w   a*b*v   x*b
--- + ----- + ----- + ---
a*z   a*z     a*z     a*z
Yepp, looks good, we can put the numbers back in
2!*99!    2!*3!*96!   2!*3!*97!   3!*98!
------- + --------- + --------- + -------
2!*100!   2!*100!     2!*100!     2!*100!
First round of cancellations (could have already been done at the letter stage)
99!    3!*96!   3!*97!   3!*98!
---- + ------ + ------ + -------
100!   100!     100!     2!*100!
The factorials cancel each other out, but only partially
First step
1     1*2*3          1*2*3       1*2*3
--- + ------------ + --------- + ----------
100   97*98*99*100   98*99*100   1*2*99*100
Second step
1     1             1          1
--- + ----------- + -------- + -------
100   97*98*33*50   98*33*50   2*33*50
Common denominator
97*98*33*50*2 + 100*2 + 100*97*2 + 100*97*98
--------------------------------------------
100*97*98*33*50*2
Massage numerator a bit by factoring out 100
97*98*33*50*2 + 100*(2*98 + 97*98)
----------------------------------
100*97*98*33*50*2
Part
97*98*33*50*2       100*(2*98 + 97*98)
----------------- + ------------------
100*97*98*33*50*2   100*97*98*33*50*2
Cancel
1     2*98 + 97*98
--- + -------------
100   97*98*33*50*2
Part
1     2*98            97*98
--- + ------------- + -------------
100   97*98*33*50*2   97*98*33*50*2
Cancel
1     1          1
--- + -------- + -------
100   97*33*50   33*50*2
And we have one summand less and got rid of the 98. Rinse and repeat until the final result is found:
1
----
97
Yes, sometimes a sharp pencil and a blank sheet of paper is all you need ;-)
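If you would still like the computer to confirm the pencil-and-paper result, here is a minimal sketch in C++17. It assumes the factorials have already been cancelled by hand as above, so every intermediate value fits comfortably in 64 bits and no big-number library is needed. The names Fraction, reduce, add and example_term are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Hypothetical helper type: an exact fraction with a positive denominator.
struct Fraction {
    std::int64_t num;
    std::int64_t den;
};

// Divide numerator and denominator by their greatest common divisor.
Fraction reduce(Fraction f) {
    assert(f.den > 0);
    const std::int64_t g = std::gcd(f.num, f.den);
    return {f.num / g, f.den / g};
}

// Exact sum of two fractions, reduced to lowest terms.
Fraction add(Fraction a, Fraction b) {
    return reduce({a.num * b.den + b.num * a.den, a.den * b.den});
}

// The example term after cancelling the factorials by hand:
// 1/100 + 3/(99*100) + 6/(98*99*100) + 6/(97*98*99*100)
Fraction example_term() {
    Fraction sum{0, 1};
    sum = add(sum, {1, 100});
    sum = add(sum, {3, 99 * 100});
    sum = add(sum, {6, 98 * 99 * 100});
    sum = add(sum, {6, 97LL * 98 * 99 * 100});
    return sum;
}
```

Calling example_term() should give numerator 1 and denominator 97, matching the result above. Reducing after every addition keeps the intermediate numbers small; without the hand cancellation you would need an arbitrary-precision integer type instead.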

Related

Difficulty with creating a new variable in Stata using the subtraction operator

I am a Stata novice.
I am having great difficulty creating this variable:
generate gap= 0.364 * (male − 0.707) − 0.0146 * (FVCpercent − 66.763) + 0.131 * (age_integer − 67.676) − 0.0814 * (age_gap) + 0.0287 * (avg_fibrosis − 22.147)
male is numeric (male=1, female=0)
FVCpercent, age_integer, age_gap and avg_fibrosis are all numeric.
I repeatedly get this error
male−0.707 invalid name
For some reason, if I switch all the "-" operators to "+" it works.
I would be grateful for any input. Many thanks.
The error comes from the character − that you are using: it is a Unicode minus sign, which looks like the ASCII hyphen-minus - but is a different character, and only the latter is accepted as the subtraction operator. I replaced them, and now it works.
clear all
input male FVCpercent age_integer age_gap avg_fibrosis
10 10 10 10 10
end
generate gap = 0.364 * (male - 0.707) - 0.0146 * (FVCpercent - 66.763) + 0.131 * (age_integer - 67.676) - 0.0814 * (age_gap) + 0.0287 * (avg_fibrosis - 22.147)

Efficient power of 2 series: (2^n) + (2^(n-1)) + (2^(n-2))

I'm wondering if there is a constant time algorithm or some kind of x86 intrinsic for calculating this:
Given 'n', calculate the sum of the series of powers of 2 from 'n to 0':
2^n + 2^(n-1) + 2^(n-2) + 2^(n-3) ... 2^(0)
The sum of a geometric series like k^n + k^(n-1) + k^(n-2) + k^(n-3) ... k^(0) is (k^(n+1) - 1)/(k-1).
If k=2, this is even simpler: result is 2^(n+1) - 1; and it is used very often.
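You can compute it in constant time with left shift operations like
(1U << (n+1)) - 1
or
~(~0U << (n+1))
Both are only valid while n+1 is smaller than the bit width of unsigned, because shifting by the full width or more is undefined.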
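A minimal sketch of the closed form as a C++ function, assuming a 64-bit result is wanted (the name pow2_series_sum is made up for illustration). Writing 2^(n+1) as 2 << n keeps the shift count below 64 even for n = 63; in that case the unsigned result wraps to 0 and subtracting 1 gives all ones, which is 2^64 - 1.

```cpp
#include <cassert>
#include <cstdint>

// Sum 2^n + 2^(n-1) + ... + 2^0 = 2^(n+1) - 1, for 0 <= n <= 63.
std::uint64_t pow2_series_sum(unsigned n) {
    assert(n <= 63);  // larger n would not fit in 64 bits
    // 2 << n equals 2^(n+1); for n == 63 it wraps to 0 (well defined for
    // unsigned types), and 0 - 1 wraps to 2^64 - 1, the correct sum.
    return (std::uint64_t{2} << n) - 1;
}
```

For example, pow2_series_sum(3) is 8 + 4 + 2 + 1 = 15.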

Program Help - Solving for e(n)

I've been wrestling with this issue for a week and I just need some guidance on the math part of it. If I could just understand the math behind it I could piece together the functions to make it work. The assignment is:
Design and develop a C++ program for Calculating e(n) when delta <= 0.000001
e(n-1) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n-1)!
e(n) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n)!
delta = e(n) – e(n-1)
You do not have any input to the program. Your output should be something like this:
N = 2 e(1) = 2 e(2) = 2.5 delta = 0.5
N = 3 e(2) = 2.5 e(3) = 2.565 delta = 0.065
...
You must use recursive function calls.
My first issue is the math and the variables that would contain them.
The delta, e(n), and e(n-1) variables must be doubles.
If e(n) = 1 + 1 / 1! = 2, then e(n-1) must equal 1, which means delta = 1 (that's my thinking anyway). I'm just not sure of the math behind the 0.5 delta the first time and the 0.065 in the second iteration.
Can someone point me in the right direction on this problem?
Thank you,
T
From the Wikipedia link, you can see that e = 1/0! + 1/1! + 1/2! + 1/3! + ..., an infinite sum.
I will not explain the notion of limits here, but what this basically means is that, if we define a function e where e(n) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n)! (which is the function given in your problem), we are able to approximate the real value of the constant e.
The higher n is, the closer we get to e.
If you look closely at the function, you can see that each time, we add a term which is smaller than the previous one: 1 >= 1/1! >= 1/2! >= .... >= 1/(n)!
That basically means that every time we increase n we get closer to e, but we slow down along the way.
The real value of e is 2.71828...
In our first step, e(0) = 1, we are 1.71828... away from the real value
In the second step, e(1) = 2, we are at 0.71828..., 1 closer
In the third step, e(2) = 2.5, we are now at 0.21828..., 0.5 closer
As you can see, we are getting there, but the closer we get, the slower we move. Now let's say that at each step, we want to know how close we have moved compared to the previous value.
We then do simply e(n) - e(n-1). This is basically what the delta means.
At some point, we are moving so slowly that it no longer makes any sense to keep going. We are almost staying put. At this point, we decide that our approximation is close enough to e.
In your case, the problem defines the minimum progression speed as 0.000001
Here is a solution:
delta = e(n) - e(n-1)
delta = 1/n!
delta <= 0.000001
n! >= 1000000
n >= 10, as 9! = 362880 is still too small and 10! = 3628800
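Since the assignment asks for recursive function calls, the reasoning above can be sketched roughly as follows. This is only one possible shape, not the expected solution; the function names factorial, e_partial and first_n_below are made up, and the code recomputes e(n) from scratch at every step for simplicity rather than speed.

```cpp
#include <cassert>

// n! as a double, computed recursively (exact for the small n needed here).
double factorial(int n) {
    assert(n >= 0);
    return n == 0 ? 1.0 : n * factorial(n - 1);
}

// e(n) = 1 + 1/1! + 1/2! + ... + 1/n!, computed recursively.
double e_partial(int n) {
    assert(n >= 0);
    return n == 0 ? 1.0 : e_partial(n - 1) + 1.0 / factorial(n);
}

// Smallest n >= 1 for which delta = e(n) - e(n-1) is <= tolerance.
int first_n_below(double tolerance, int n = 1) {
    assert(tolerance > 0.0 && n >= 1);
    const double delta = e_partial(n) - e_partial(n - 1);
    return delta <= tolerance ? n : first_n_below(tolerance, n + 1);
}
```

With a tolerance of 0.000001 this stops at n = 10, in line with 10! = 3628800 being the first factorial above one million. Printing N, e(n-1), e(n) and delta inside first_n_below would produce the kind of output the assignment shows.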

Best algorithm for series expansion of Rational function

I need to write a C++ function which efficiently finds the coefficients of the Taylor series of a given rational function (P(x) / Q(x)).
The function parameters will be the degree of the polynomials (the same for numerator and denominator), two arrays with the coefficients of the polynomials, and the number of terms in the expansion.
My idea was the following.
Consider the identity
P(x) / Q(x) = R(x) + ...
Where R(x) is a polynomial with number of terms equal to number of coefficients I need to find. Then I can multiply both sides with Q(x) and get
P(x) = R(x) * Q(x)
R(x) * Q(x) - P(x) = 0
Therefore, all coefficients should be zero. This is a system of equations which takes O(n^3) to solve. O(n^3) is not as fast as I wanted.
Is there any faster algorithm?
I know that the coefficients of the series satisfy a linear recurrence relation.
This makes me think that an O(n) algorithm is possible.
The algorithm that I'm about to describe is justified mathematically by formal power series. Every function with a Taylor series has a formal power series. The converse is not true, but if we do arithmetic on functions with Taylor series and get a function with a Taylor series, then we can do the same arithmetic with formal power series and get the same answer.
The long division algorithm for formal power series is like the long division algorithm that you may have learned in school. I'll demonstrate it on the example (1 + 2 x)/(1 - x - x^2), which has coefficients equal to the Lucas numbers.
The denominator must have a nonzero constant term. We start by writing the numerator, which is the first residual.
              --------
1 - x - x^2 ) 1 + 2 x
We divide the residual's lowest-order term (1) by the denominator's constant term (1) and put the quotient up top.
              1
              --------
1 - x - x^2 ) 1 + 2 x
Now we multiply 1 - x - x^2 by 1 and subtract it from the current residual.
              1
              --------
1 - x - x^2 ) 1 + 2 x
              1 -   x - x^2
              -------------
                  3 x + x^2
Do it again.
              1 + 3 x
              --------
1 - x - x^2 ) 1 + 2 x
              1 -   x -   x^2
              ---------------
                  3 x +   x^2
                  3 x - 3 x^2 - 3 x^3
                  -------------------
                        4 x^2 + 3 x^3
And again.
              1 + 3 x + 4 x^2
              ----------------
1 - x - x^2 ) 1 + 2 x
              1 -   x -   x^2
              ---------------
                  3 x +   x^2
                  3 x - 3 x^2 - 3 x^3
                  -------------------
                        4 x^2 + 3 x^3
                        4 x^2 - 4 x^3 - 4 x^4
                        ---------------------
                                7 x^3 + 4 x^4
And again.
              1 + 3 x + 4 x^2 + 7 x^3
              ------------------------
1 - x - x^2 ) 1 + 2 x
              1 -   x -   x^2
              ---------------
                  3 x +   x^2
                  3 x - 3 x^2 - 3 x^3
                  -------------------
                        4 x^2 + 3 x^3
                        4 x^2 - 4 x^3 - 4 x^4
                        ---------------------
                                7 x^3 + 4 x^4
                                7 x^3 - 7 x^4 - 7 x^5
                                ---------------------
                                       11 x^4 + 7 x^5
The individual divisions were kind of boring because I used a divisor with a leading 1, but if I had used, say, 2 - 2 x - 2 x^2, then all of the coefficients in the quotient would be divided by 2.
This can be done in O(n log n) time for arbitrary P and Q of degree n. More precisely this can be done in M(n), where M(n) is the complexity of polynomial multiplication which itself can be done in O(n log n).
First off, the first n terms of a series expansion can be viewed simply as a polynomial of degree n-1.
Assume you are interested in the first n terms of the series expansion of P(x)/Q(x). There exists an algorithm that will compute the inverse of Q in M(n) time as defined above.
The inverse T(x) of Q(x) satisfies T(x) * Q(x) = 1 + O(x^n). I.e. T(x) * Q(x) is precisely 1 plus some error term whose coefficients all come after the first n terms we are interested in, so we can just drop them.
Now P(x) / Q(x) is simply P(x) * T(x) which is just another polynomial multiplication.
You can find an implementation that computes the aforementioned inverse in my open source library Altruct. See the series.h file. Assuming you already have a method that computes the product of two polynomials, the code that calculates the inverse is about 10 lines long (a variant of divide-and-conquer).
The actual algorithm is as follows:
Assume Q(x) = 1 + a1*x + a2*x^2 + .... If a0 is not 1, you can simply divide Q(x), and later its inverse T(x), by a0.
Assume that at each step you have L terms of the inverse, so that Q(x) * T_L(x) = 1 + x^L * E_L(x) for some error E_L(x). Initially T_1(x) = 1. If you plug this into the above you get Q(x) * T_1(x) = Q(x) = 1 + x^1 * E_1(x) for some E_1(x), which means this holds for L=1.
Let's now double L at each step. You can get E_L(x) from the previous step as E_L(x) = (Q(x) * T_L(x) - 1) / x^L, or, implementation-wise, just drop the first L coefficients of the product. You can then compute T_2L(x) from the previous step as T_2L(x) = T_L(x) - x^L * E_L(x) * T_L(x). The error will be E_2L(x) = - E_L(x)^2. Let's now check that the induction step holds.
Q(x) * T_2L(x)
= Q(x) * (T_L(x) - x^L * E_L(x) * T_L(x))
= Q(x) * T_L(x) * (1 - x^L * E_L(x))
= (1 + x^L * E_L(x)) * (1 - x^L * E_L(x))
= 1^2 - (x^L * E_L(x))^2
= 1 + x^(2L) * E_2L(x)
Q.E.D.
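To make the doubling step concrete, here is a minimal sketch of the iteration above. It is not the Altruct code: the names Poly, mul_trunc and inverse_series are made up, coefficients are plain doubles, and the multiplication is the schoolbook O(n^2) one, so this only illustrates the structure. Swapping in a fast multiplication is what would give the M(n) bound. Starting from T_1 = 1/Q[0] instead of normalizing Q first is an assumption of this sketch; the same induction goes through.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Poly = std::vector<double>;  // coefficients, constant term first

// Product of a and b, truncated to the first n coefficients (schoolbook).
Poly mul_trunc(const Poly& a, const Poly& b, std::size_t n) {
    Poly c(n, 0.0);
    for (std::size_t i = 0; i < a.size() && i < n; ++i)
        for (std::size_t j = 0; j < b.size() && i + j < n; ++j)
            c[i + j] += a[i] * b[j];
    return c;
}

// First n coefficients of 1/Q(x); requires Q[0] != 0.
Poly inverse_series(const Poly& q, std::size_t n) {
    assert(!q.empty() && q[0] != 0.0 && n > 0);
    Poly t{1.0 / q[0]};                        // T_1: correct to 1 term
    for (std::size_t len = 1; len < n; len *= 2) {
        const std::size_t next = 2 * len;
        Poly qt = mul_trunc(q, t, next);       // Q*T_L = 1 + x^L*E_L (mod x^2L)
        Poly e(qt.begin() + len, qt.end());    // E_L: drop the first L coefficients
        Poly et = mul_trunc(e, t, len);        // E_L*T_L, only L terms matter
        t.resize(next, 0.0);
        for (std::size_t i = 0; i < len; ++i)
            t[len + i] = -et[i];               // T_2L = T_L - x^L*E_L*T_L
    }
    t.resize(n);                               // discard terms beyond n
    return t;
}
```

For Q(x) = 1 - x - x^2 the first eight coefficients of the inverse come out as 1, 1, 2, 3, 5, 8, 13, 21, and mul_trunc of P(x) = 1 + 2x with that inverse gives 1, 3, 4, 7, 11.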
I am pretty sure it is not possible to compute polynomial division more efficiently than multiplication, and as you can see in the following table, this algorithm is only about 3 times slower than a single multiplication:
n      mul         inv         factor
10^4   24 ms       80 ms       3.33x
10^5   318 ms      950 ms      2.99x
10^6   4162 ms     12258 ms    2.95x
10^7   101119 ms   294894 ms   2.92x
If you look closely at the system you'd get with your plan, you can see that it is already triangular, and doesn't require O(n^3) to be solved. It simply degenerates into a linear recursion (P[], Q[] and R[] being the coefficients of the corresponding polynomials):
R[0] = P[0]/Q[0]
R[n] = (P[n] - sum{0..n-1}(R[i] * Q[n-i]))/Q[0]
Since Q is a polynomial, the sum has no more than deg(Q) nonzero terms (thus taking constant time to calculate for a fixed Q), making the overall complexity asymptotically linear in the number of terms. You may also look at the matrix representation of the recursion for (possibly) better asymptotics.
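A minimal sketch of this recursion, assuming the coefficients are stored as doubles with the constant term first (the function name series_coefficients is made up for illustration). P[n] and Q[k] are treated as zero beyond the end of their arrays, which is what bounds the inner loop by deg(Q).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// First `terms` Taylor coefficients of P(x)/Q(x); q[0] must be nonzero.
std::vector<double> series_coefficients(const std::vector<double>& p,
                                        const std::vector<double>& q,
                                        std::size_t terms) {
    assert(!q.empty() && q[0] != 0.0);
    std::vector<double> r(terms, 0.0);
    for (std::size_t n = 0; n < terms; ++n) {
        double acc = n < p.size() ? p[n] : 0.0;  // P[n] is 0 beyond deg(P)
        // Q[k] is 0 beyond deg(Q), so only k = 1..min(n, deg(Q)) contribute.
        const std::size_t kmax = std::min(n, q.size() - 1);
        for (std::size_t k = 1; k <= kmax; ++k)
            acc -= r[n - k] * q[k];
        r[n] = acc / q[0];
    }
    return r;
}
```

For P(x) = 1 + 2x and Q(x) = 1 - x - x^2 the first five coefficients are 1, 3, 4, 7, 11. The cost is terms * deg(Q) multiplications, so it is linear in the number of terms when Q is fixed.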

algorithms for modular inverses

I have read a section about the Extended Euclidean Algorithm & Modular Inverses, which states that it not only computes GCD(n,m) but also a and b such that a*n+b*m=GCD(n,m).
The algorithm is described this way:
1. Write down n, m, and the two-vectors (1,0) and (0,1)
2. Divide the larger of the two numbers by the smaller - call this quotient q
3. Subtract q times the smaller from the larger (i.e. reduce the larger modulo the smaller)
(I have a question here: if q denotes n/m, then isn't n-q*m equal to 0, because q=n/m (assuming n>m)? So why is such an operation necessary?)
4. Subtract q times the vector corresponding to the smaller from the vector corresponding to the larger
5. Repeat steps 2 through 4 until the result is zero
6. Publish the preceding result as gcd(n,m)
So my question is also: how can I implement these steps in code? Please help me; I don't know where to start. To clarify, the result should look like this:
An example of this algorithm is the following computation of 30^(-1)(mod 53);
53            30            (1,0)                         (0,1)
53-1*30=23    30            (1,0)-1*(0,1)=(1,-1)          (0,1)
23            30-1*23=7     (1,-1)                        (0,1)-1*(1,-1)=(-1,2)
23-3*7=2      7             (1,-1)-3*(-1,2)=(4,-7)        (-1,2)
2             7-3*2=1       (4,-7)                        (-1,2)-3*(4,-7)=(-13,23)
2-2*1=0       1             (4,-7)-2*(-13,23)=(30,-53)    (-13,23)
From this we see that gcd(30,53)=1 and, rearranging terms, we see that 1=-13*53+23*30,
so we conclude that 30^(-1)=23(mod 53).
The division is supposed to be integer division with truncation, so q is the whole-number quotient and n-q*m is the remainder, not 0. The standard Euclidean algorithm for gcd(a, b) with a <= b goes like this:
b = a * q0 + r0
a = r0 * q1 + r1
r0 = r1 * q2 + r2
...
r[N+1] = 0
Now rN is the desired GCD. Then you back-substitute:
r[N-1] = r[N] * q[N+1]
r[N-2] = r[N-1] * q[N] + r[N]
= (r[N] * q[N+1]) * q[N] + r[N]
= r[N] * (q[N+1] * q[N] + 1)
r[N-3] = r[N-2] * q[N-1] + r[N-1]
= ... <substitute> ...
Until you finally reach rN = m * a + n * b. The algorithm you describe keeps track of the backtracking data right away, so it's a bit more efficient.
If rN == gcd(a, b) == 1, then you have indeed found the multiplicative inverse of a modulo b, namely m: (a * m) % b == 1.
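As for implementing the steps in code, here is a minimal sketch of the variant that tracks the two vectors during the forward pass, as in the question's table. The names EgcdResult, extended_gcd and mod_inverse are made up for illustration, and returning -1 when no inverse exists is just a convention chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct EgcdResult {
    std::int64_t gcd;
    std::int64_t x;  // coefficient of a
    std::int64_t y;  // coefficient of b
};

// Iterative extended Euclid: a*x + b*y == gcd(a, b) on return.
EgcdResult extended_gcd(std::int64_t a, std::int64_t b) {
    assert(a >= 0 && b >= 0);
    std::int64_t r0 = a, r1 = b;
    std::int64_t x0 = 1, y0 = 0;  // invariant: r0 == a*x0 + b*y0
    std::int64_t x1 = 0, y1 = 1;  // invariant: r1 == a*x1 + b*y1
    while (r1 != 0) {
        const std::int64_t q = r0 / r1;  // truncating integer division
        const std::int64_t r2 = r0 - q * r1;
        const std::int64_t x2 = x0 - q * x1;
        const std::int64_t y2 = y0 - q * y1;
        r0 = r1; x0 = x1; y0 = y1;
        r1 = r2; x1 = x2; y1 = y2;
    }
    return {r0, x0, y0};
}

// Inverse of a modulo m in the range [0, m), or -1 if gcd(a, m) != 1.
std::int64_t mod_inverse(std::int64_t a, std::int64_t m) {
    assert(m > 1 && a >= 0);
    const EgcdResult res = extended_gcd(a % m, m);
    if (res.gcd != 1) return -1;
    return ((res.x % m) + m) % m;  // normalize a possibly negative x
}
```

For the example from the question, mod_inverse(30, 53) gives 23, and extended_gcd(30, 53) returns the coefficients 23 and -13, i.e. 23*30 - 13*53 = 1.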