large n for factorials [duplicate] - c++

This question already has answers here:
Calculate the factorial of an arbitrarily large number, showing all the digits
I am trying to solve a very simple task: finding nCk for 1 <= n, k <= 50. I can't seem to find a way of outputting the result for larger inputs like n = 50 in C++; my algorithm only works for small integer values.
I implemented a factorial function for the nCk formula, but I can't find a way to solve the task for bigger numbers within the 1 s time limit.
#include <iostream>
using namespace std;

int main()
{
    freopen("input.txt", "r", stdin);
    freopen("output.txt", "w", stdout);
    int i, n, k;
    long long res, num, den;
    res = num = den = 1;
    cin >> n >> k;
    if (n < k) {
        cout << 0;
        return 0;
    }
    if (n == k || k == 0) {
        cout << 1;
        return 0;
    }
    for (i = 1; i <= k; i++) {
        if ((n - i + 1) % i == 0) {
            res = res * ((n - i + 1) / i);
        }
        else {
            num *= (n - i + 1);
            den *= i;
        }
    }
    cout << (res * num) / den;
    return 0;
}

This solution requires some mathematics rather than programming (to solve the problem of overflow).
You have:
n! / (k! * (n - k)!)
You can eliminate common factors easily enough by expanding it. For example:
n = 8, k = 3
8*7*6*5*4*3*2*1 / ((3*2*1) * (5*4*3*2*1))
which expands to
8*7*6*5*4*3*2*1 / (3*2*1 * 5*4*3*2*1)
Notice how we can cancel 5*4*3*2*1 from both by the rules of division? We then get
8*7*6 / (3*2*1)
This will be a lot easier to calculate.
Eventually, if the numbers keep getting bigger, you will run into overflow anyway, so you may need to look into Boost.Multiprecision.
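If you do go the big-integer route, a minimal sketch with Boost.Multiprecision could look like the following. The helper name binom is mine, not part of Boost; this assumes Boost is installed and uses the same cancellation idea as above, dividing as we go so every intermediate stays an exact integer.

#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>

// Exact n-choose-k using the cancelled form described above.
// Multiplying then dividing at each step keeps every intermediate an integer,
// and cpp_int removes the overflow problem entirely.
boost::multiprecision::cpp_int binom(unsigned n, unsigned k)
{
    if (k > n) return 0;
    boost::multiprecision::cpp_int res = 1;
    for (unsigned i = 1; i <= k; ++i)
    {
        res *= n - k + i;   // numerator term
        res /= i;           // divides exactly: res is C(n - k + i, i)
    }
    return res;
}

int main()
{
    std::cout << binom(50, 25) << '\n';   // 126410606437752
}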

Your current formula is
binom(n, k) = n! / ((n - k)! k!)
This formula is OK for mathematics, but not OK for computing. Simplify it:
binom(n, k) = n(n - 1)(n - 2) ... (n - k + 1) / k!
which involves fewer terms. Also note that
binom(n, k) = binom(n, n - k)
which can be used as an optimization if k > n / 2.
Also, if the numbers are too large, you need to use a multi-precision library like GMP.
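For the question's stated range (1 <= n, k <= 50), 64-bit arithmetic already suffices once the shortened product is used. A small sketch of that, using the binom(n, k) == binom(n, n - k) symmetry (binom64 is just an illustrative name):

#include <iostream>

// For 1 <= n, k <= 50 no big-number library is needed: with the shortened
// product and the symmetry swap, every intermediate stays inside 64 bits.
unsigned long long binom64(unsigned long long n, unsigned long long k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;                 // use the symmetry when k > n / 2
    unsigned long long res = 1;
    for (unsigned long long i = 1; i <= k; ++i)
    {
        res = res * (n - k + i) / i;          // exact: res is C(n - k + i, i)
    }
    return res;
}

int main()
{
    std::cout << binom64(50, 25) << '\n';     // 126410606437752, fits in 64 bits
}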

Related

Need optimization tips for a subset sum like problem with a big constraint

Given a number 1 <= N <= 3*10^5, count all subsets of the set {1, 2, ..., N-1} that sum up to N. This is essentially a modified version of the subset sum problem, with the twist that the target sum is N itself and the set is simply the integers from 1 to N-1.
I think I have solved this using a DP ordered map and an inclusion/exclusion recursive algorithm, but due to the time and space complexity I can't compute more than 10000 elements.
#include <iostream>
#include <chrono>
#include <map>
#include "bigint.h"
using namespace std;

// 2D hashmap to store values from the recursion; keys: i & sum; value: count
map<pair<int, int>, bigint> hmap;

bigint counter(int n, int i, int sum){
    // end case
    if(i == 0){
        if(sum == 0){
            return 1;
        }
        return 0;
    }
    // alternative end case if the sum reaches zero before iterating through all possible combinations
    if(sum == 0){
        return 1;
    }
    // case where the result of the recursion is already in the hashmap
    if(hmap.find(make_pair(i, sum)) != hmap.end()){
        return hmap[make_pair(i, sum)];
    }
    // only proceed with further recursion if the resulting sum wouldn't be negative
    if(sum - i < 0){
        // optimization that skips unnecessary recursive branches
        return hmap[make_pair(i, sum)] = counter(n, sum, sum);
    }
    else{
        // include the number / don't include the number
        return hmap[make_pair(i, sum)] = counter(n, i - 1, sum - i) + counter(n, i - 1, sum);
    }
}
The function is called with starting values N, N-1, and N, indicating the number of elements, the iterator (which decrements), and the sum of the recursive branch (which decreases with every included value).
This is the code that calculates the number of subsets. For an input of 3000 it takes around ~22 seconds to output the result, which is 40 digits long. Because of the long results I had to use an arbitrary-precision library, bigint from rgroshanrg, which works fine for values less than ~10000. Testing beyond that gives me a segfault on lines 28-29, maybe due to the stored arbitrary-precision values becoming too big and conflicting in the map. I need to somehow speed up this code so it can work with values beyond 10000, but I am stumped. Any ideas, or should I switch to another algorithm and data storage?
Here is a different algorithm, described in a paper by Evangelos Georgiadis, "Computing Partition Numbers q(n)":
std::vector<BigInt> RestrictedPartitionNumbers(int n)
{
    std::vector<BigInt> q(n, 0);
    // initialize q with A010815
    for (int i = 0; ; i++)
    {
        int n0 = i * (3 * i - 1) >> 1;
        if (n0 >= q.size())
            break;
        q[n0] = 1 - 2 * (i & 1);
        int n1 = i * (3 * i + 1) >> 1;
        if (n1 < q.size())
            q[n1] = 1 - 2 * (i & 1);
    }
    // construct A000009 as per "Evangelos Georgiadis, Computing Partition Numbers q(n)"
    for (size_t k = 0; k < q.size(); k++)
    {
        size_t j = 1;
        size_t m = k + 1;
        while (m < q.size())
        {
            if ((j & 1) != 0)
                q[m] += q[k] << 1;
            else
                q[m] -= q[k] << 1;
            j++;
            m = k + j * j;
        }
    }
    return q;
}
It's not the fastest algorithm out there; it took about half a minute on my computer for n = 300000. But you only need to run it once (since it computes all partition numbers up to some bound) and it doesn't take a lot of memory (a bit over 150 MB).
The results go up to but excluding n, and they assume that each number is allowed to be a partition of itself, e.g. the set {4} is a partition of the number 4. In your definition of the problem you excluded that case, so you need to subtract 1 from the result.
Maybe there's a nicer way to express A010815; that part of the code isn't slow though, I just think it looks bad.
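For reference, a hedged usage sketch (assuming the usual includes and a BigInt type that can be assigned from an int, subtracted by an int, and streamed to std::cout): to answer the original question for a given N, request N + 1 entries so that index N exists, then subtract 1 for the single-element partition {N}.

// Hypothetical usage; BigInt is whatever arbitrary-precision type you plug in.
int N = 300000;
std::vector<BigInt> q = RestrictedPartitionNumbers(N + 1);   // holds q(0) .. q(N)
BigInt answer = q[N] - 1;   // drop the partition {N} itself, per the problem statement
std::cout << answer << '\n';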

An optimized algorithm for the given problem?

I am solving a problem which states that we have a list L containing integers from 1 to N. We have to perform the following operation N−1 times:
Choose two elements of the list, let's denote them by X and Y.
Erase the chosen elements from L.
Append the number X + Y + X*Y to L.
At the end, L contains exactly one integer. Find this integer.
As the answer may be large, we have to compute it modulo 10^9 + 7
Constraints:
1 ≤ N ≤ 1,000,000
Time limit:
1 sec
I have written this code, which gives the correct answer in linear time, but it exceeds the time limit with this approach. Can someone provide a better-optimized solution?
#include <iostream>
using namespace std;
typedef unsigned long long ull;

inline ull cal(ull x, ull y){
    ull ans, i, modno;
    modno = 1000000007;
    i = 1;
    ans = (x + y);
    i = (i * x) % modno;
    i = (i * y) % modno;
    ans = ans + i;
    ans = ans % modno;
    return ans;
}

int main(){
    ull n;
    cin >> n;
    ull sum, modno;
    sum = 0;
    modno = 1000000007;
    if(n == 1)
        cout << 1 << endl;
    else
    {
        sum = n + (n - 1) + (n * (n - 1));
        n -= 2;
        do
        {
            if(n <= 0)
                break;
            sum = cal(sum, n);
            n -= 1;
        } while(1);
        cout << sum << endl;
    }
    return 0;
}
Final code:
ull n;
cin >> n;
if(n == 1)
    cout << 1 << endl;
else
{
    ull modno = 1000000007;
    ull ans = 1;
    ull no = n + 1;
    while(no >= 1)
    {
        ans = (ans * no);
        if(ans > modno)
            ans = ans % modno;
        no--;
    }
    ans = ans - 1;
    ans = ans % modno;
    cout << ans << endl;
}
There's a closed-form solution for the sum: L = (N+1)!-1
The result follows the recurrence L_N = N + L_(N-1) + N*L_(N-1), with L_0 = 0, which is obtained by simply always choosing X = L_(N-1) and Y = N (the next number to add).
Derivation: adding 1 to both sides gives L_N + 1 = (N + 1)*(L_(N-1) + 1), and since L_0 + 1 = 1 this telescopes to L_N + 1 = (N + 1)!, i.e. L_N = (N + 1)! - 1.
EDIT:
As you posted your final code, I'm posting my benchmark:
#include <iostream>
#include <cstdint>
#include <chrono>

std::uint64_t
factorial(std::uint64_t n) {
    std::uint64_t x = 1;
    while (n > 1)
        x = (x * n--) % 1'000'000'007;
    return x;
}

int
main() {
    std::uint64_t n;
    std::cin >> n;
    std::uint64_t numMicro = 0;
    for (std::size_t i = 0; i < 1'000; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        volatile std::uint64_t res = factorial(n);
        auto end = std::chrono::high_resolution_clock::now();
        numMicro +=
            std::chrono::duration_cast<std::chrono::microseconds>(end - start)
                .count();
    }
    std::cout << "On average: " << numMicro / 1000.0 << " microseconds";
    return 0;
}
Compiled with -O3; the volatile is there only to make sure that the compiler does not optimize the computation away.
Your solution is almost the same and runs way below the 1 second limit. Not sure what there is to optimize further.
As others have mentioned, the problem boils down to calculating ((n + 1)! - 1) % p. You can search around for fast methods of doing this (fast factorial modulo a prime). One of those that would work under 1 s is the one mentioned here.
Update: Just checked the problem link from CodeChef. As usual, the trick lies in the constraints, which you haven't accurately described: you have to do the same task for up to 100000 test cases. A single fact(n) mod p can be obtained in under 1 second using a standard for loop, as n is small.
What won't work is computing fact(n) mod p from scratch for every test case. Like many other problems, you can benefit from precomputation: build an array where arr[i] is i! mod p, up to i = (the maximum value n can take) + 1. With this information, you can answer each query (test case) in O(1) by just returning (arr[n + 1] - 1) % p.
Just tried this and got accepted. Next time, please add the problem link to your description; it is usually the case that you don't think something is relevant and that part is the whole answer to the problem.
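A minimal sketch of that precomputation (variable names are illustrative, not from the judge's template):

#include <iostream>
#include <vector>

int main()
{
    const unsigned long long MOD = 1000000007ULL;
    const int MAXN = 1000000;                      // largest n in the constraints

    // arr[i] = i! mod p, built once in O(MAXN)
    std::vector<unsigned long long> arr(MAXN + 2);
    arr[0] = 1;
    for (int i = 1; i <= MAXN + 1; ++i)
        arr[i] = arr[i - 1] * i % MOD;

    int t;                                         // number of test cases
    std::cin >> t;
    while (t--)
    {
        long long n;
        std::cin >> n;
        // ((n + 1)! - 1) mod p; add MOD before subtracting to stay non-negative
        std::cout << (arr[n + 1] + MOD - 1) % MOD << '\n';
    }
    return 0;
}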
The algorithm should look like this:
sum <- 1
for index <- 2,n
sum = (sum + index + sum * index) mod 1000000007
end for
Explanation: since + and * are commutative and associative, the order in which the items are handled is irrelevant, so you are doing a good job implementing this cycle, but you unnecessarily overcomplicate your cal function.
The other answers tell you to calculate ((n + 1)! - 1) mod modno, which is correct if we forget about the modulo part, but I doubt that calculating ((n + 1)! - 1) mod modno will yield the very same result as computing this in a step-by-step manner regardless of the value of n, because we have + and * in each step. If the other answerers are correct, then you can greatly optimize your algorithm. If not, then optimizing this is not as easy.
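For what it's worth, here is the cycle above transcribed directly into C++, so the two approaches can be compared on the judge; it folds the list step by step exactly as the pseudocode says:

#include <iostream>

int main()
{
    const unsigned long long MOD = 1000000007ULL;
    unsigned long long n;
    std::cin >> n;

    // Direct transcription of the pseudocode: fold the list left to right.
    unsigned long long sum = 1;
    for (unsigned long long index = 2; index <= n; ++index)
        sum = (sum + index + sum * index % MOD) % MOD;

    std::cout << sum << '\n';
    return 0;
}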
The problem just says "Choose two elements of the list, let's denote them by X and Y." and doesn't say anything about the order that the elements need to be chosen.
Therefore it could be rewritten as:
Split the list into one sub-list per CPU.
Using SIMD, calculate (X+1)*(Y+1) for each pair in each CPU's sub-list and store the results in a new list as 64-bit integers, so that you can avoid doing the expensive modulo operation.
Using SIMD, calculate (X*Y - 1) % 1000000007 for each pair in each CPU's new sub-list and store the results as 32-bit integers.
Repeat the previous 2 steps until you're left with one value from each CPU (and do the final R = (R - 1) % 1000000007 if necessary to bring it back to 32-bit). Store these values in a list and terminate all threads except for one.
Using SIMD, calculate (X+1)*(Y+1) for each pair.
Using SIMD, calculate (X*Y - 1) % 1000000007 for each pair.
Repeat the previous 2 steps until you're left with one value.

Find the number of pairs of positive integers satisfying the inequality

I'm trying to solve a programming problem where I have to display the number of positive integer solutions of the inequality x² + y² < n, where n is given by the user. I've already written code that seems to work, but not as fast as I'd like. Is there any way to speed it up?
My current code:
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    long long n, i, r, k, p, a;
    cin >> k;
    while (k--)
    {
        r = 0;
        cin >> n;
        p = sqrt(n);
        for (i = 1; i <= p; i++)
        {
            a = sqrt(n - (i * i));
            r += a;
            if ((((i * i) + (a * a)) == n) && (a > 0))
            {
                r--;
            }
        }
        cout << r << "\n";
    }
    return 0;
}
Edit:
This is a solution for this task.
The task in English:
Find the number of natural solutions (x≥1, y≥1) of the inequality x²+y² < n, where 0 < n < 2147483647. For example, for n=10 there are 4 solutions: (1,1), (1,2), (2,1), (2,2).
Input
In the first line of input the number of test cases k is given. In the next k lines, there are the n values given.
Output
In the output, you have to display in separate lines the number of natural solutions of the inequality.
Example
Input:
2
10
11
Output:
4
6
Your solution seems fast already. The main opportunity to reduce the time spent is to avoid the call to sqrt inside the loop. This can be done by noticing that the value a = sqrt(n - (i * i)) does not vary much from one iteration to the next.
Here is the code:
r = 0;
p = sqrt(n);
if ((p * p) == n) p--;
a = p;
for (long long i = 1; i <= p; i++)
{
    while ((n - i * i) <= a * a) {
        --a;
    }
    r += a;
}
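Embedded back into the original program, the whole thing could look like this (a sketch of the same idea rather than a tested drop-in replacement):

#include <iostream>
#include <cmath>

int main()
{
    long long k;
    std::cin >> k;
    while (k--)
    {
        long long n;
        std::cin >> n;

        long long r = 0;
        long long p = (long long)std::sqrt((double)n);
        if (p * p == n) p--;             // largest p with p*p < n
        long long a = p;
        for (long long i = 1; i <= p; i++)
        {
            while (n - i * i <= a * a)   // shrink a instead of calling sqrt again
                --a;
            r += a;
        }
        std::cout << r << "\n";
    }
    return 0;
}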

Fibonacci mod number c++

I have the following problem:
I should compute a Fibonacci number mod another given number. I know about the Pisano period and I am trying to use it here. This is the code:
#include <iostream>
#include <cstdlib>

long long get_fibonaccihuge(long long n, long long m) {
    long long period = 0;
    if (m % 2 == 0) {
        if (m / 2 > 1)
            period = 8 * (m / 2) + 4;
        else
            period = 3;
    }
    else {
        if (((m + 1) / 2) > 1)
            period = 4 * ((m + 1) / 2);
        else
            period = 1;
    }
    long long final_period = n % period;
    long long array_fib[final_period];
    array_fib[0] = 1;
    array_fib[1] = 1;
    for (long long i = 2; i < final_period; ++i) {
        array_fib[i] = (array_fib[i - 1] + array_fib[i - 2]) % m;
    }
    return array_fib[final_period - 1];
}

int main() {
    long long n, m;
    std::cin >> n >> m;
    std::cout << get_fibonaccihuge(n, m) << '\n';
}
It works well for small tests but the problem is that it fails the following test:
281621358815590 30524
I do not know why. Any suggestions about the algorithm? I referred to this page when I was constructing it.
The error I receive is: wrong result.
Expected: 11963, my result: 28651.
Unless your task is specifically to use Pisano periods, I would suggest using a more commonly known way to calculate the n-th Fibonacci number in log2(n) steps: computing powers of a 2x2 matrix (https://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form).
There are two reasons:
It's a simpler algorithm and that means that it will be easier to debug the program
For the numbers you mentioned as an example it should also be faster (log2(n) is somewhere around 50, and m/2 is significantly more than that).
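A minimal sketch of that approach, assuming the usual indexing F(0) = 0, F(1) = 1 (which appears to match your table) and that m is small enough for the 64-bit products below not to overflow; under those assumptions it should reproduce the expected 11963 for the failing test:

#include <iostream>

struct Mat { unsigned long long a, b, c, d; };   // [[a, b], [c, d]]

// 2x2 matrix product, everything reduced mod m (requires 2*(m-1)^2 < 2^64).
static Mat mul(Mat x, Mat y, unsigned long long m)
{
    return { (x.a * y.a + x.b * y.c) % m,
             (x.a * y.b + x.b * y.d) % m,
             (x.c * y.a + x.d * y.c) % m,
             (x.c * y.b + x.d * y.d) % m };
}

// [[1,1],[1,0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]], so F(n) mod m is entry b.
unsigned long long fib_mod(unsigned long long n, unsigned long long m)
{
    if (m == 1) return 0;
    Mat result = {1, 0, 0, 1};               // identity
    Mat base   = {1, 1, 1, 0};
    for (; n > 0; n >>= 1)                   // log2(n) squarings
    {
        if (n & 1) result = mul(result, base, m);
        base = mul(base, base, m);
    }
    return result.b;                         // F(n) mod m
}

int main()
{
    unsigned long long n, m;
    std::cin >> n >> m;
    std::cout << fib_mod(n, m) << '\n';
}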

Sum of Greatest Common Divisor of all numbers till n with n

There are n numbers from 1 to n. I need to find
∑ gcd(i, n) for i = 1 to n
where n can be as large as 10^7. I used Euclid's algorithm for gcd, but it gave TLE. Is there any efficient method for finding the above sum?
#include <bits/stdc++.h>
using namespace std;
typedef long long int ll;

int gcd(int a, int b)
{
    return b == 0 ? a : gcd(b, a % b);
}

int main()
{
    ll n, sum = 0;
    scanf("%lld", &n);
    for (int i = 1; i <= n; i++)
    {
        sum += gcd(i, n);
    }
    printf("%lld\n", sum);
    return 0;
}
You can do it via bulk GCD calculation.
You should find all prime divisors of n and their powers. This can be done in O(sqrt(N)) time.
Then compose the GCD table.
Here is a code snippet in C#; it is not difficult to convert it into C++:
int[] gcd = new int[x + 1];
for (int i = 1; i <= x; i++) gcd[i] = 1;

for (int i = 0; i < p.Length; i++)
    for (int j = 0, h = p[i]; j < c[i]; j++, h *= p[i])
        for (long k = h; k <= x; k += h)
            gcd[k] *= p[i];

long sum = 0;
for (int i = 1; i <= x; i++) sum += gcd[i];
Here p is the array of prime divisors and c holds the power of each divisor.
For example if n = 125
p = [5]
c = [3]
125 = 5^3
if n = 12
p = [2,3]
c = [2,1]
12 = 2^2 * 3^1
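Since the answer says the conversion to C++ is not difficult, here is a hedged sketch of the same idea, including the trial-division factorisation that produces p and c (for n up to 10^7 the table takes roughly 40 MB as ints):

#include <cstdio>
#include <vector>

int main()
{
    long long n;
    std::scanf("%lld", &n);

    // Factor n by trial division in O(sqrt(n)); p holds the prime divisors,
    // c the corresponding exponents, exactly as in the C# snippet above.
    std::vector<long long> p, c;
    long long m = n;
    for (long long d = 2; d * d <= m; ++d)
    {
        if (m % d != 0) continue;
        long long e = 0;
        while (m % d == 0) { m /= d; ++e; }
        p.push_back(d);
        c.push_back(e);
    }
    if (m > 1) { p.push_back(m); c.push_back(1); }

    // Bulk GCD table: after the loops, table[k] == gcd(k, n).
    std::vector<int> table(n + 1, 1);
    for (std::size_t i = 0; i < p.size(); ++i)
        for (long long j = 0, h = p[i]; j < c[i]; ++j, h *= p[i])
            for (long long k = h; k <= n; k += h)
                table[k] *= (int)p[i];

    long long sum = 0;
    for (long long k = 1; k <= n; ++k) sum += table[k];
    std::printf("%lld\n", sum);
    return 0;
}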
I've just implemented the GCD algorithm between two numbers, which is quite easy, but I can't get what you are trying to do there.
What I read there is that you are trying to sum up a series of GCDs; but a GCD is the result of a series of mathematical operations between two or more numbers, which results in a single value.
I'm no mathematician, but I think that the "sigma" as you wrote it means that you are trying to sum up the GCDs of the numbers between 1 and 10,000,000; which doesn't make sense at all to me.
What are the values you are trying to find the GCD of? All the numbers between 1 and 10,000,000? I doubt that's it.
Anyway, here's a very basic (and hurried) implementation of Euclid's GCD algorithm:
int num1=0, num2=0;
cout << "Insert the first number: ";
cin >> num1;
cout << "\n\nInsert the second number: ";
cin >> num2;
cout << "\n\n";
fflush(stdin);
while ((num1 > 0) && (num2 > 0))
{
if ((num1 - num2) > 0)
{
//cout << "..case1\n";
num1 -= num2;
}
else if ((num2 - num1) > 0)
{
//cout << "..case2\n";
num2 -= num1;
}
else if (num1 = num2)
{
cout << ">>GCD = " << num1 << "\n\n";
break;
}
}
A good place to start looking at this problem is here at the Online Encyclopedia of Integer Sequences, as what you are trying to compute is A018804(N), the sum of gcd(k, N) for 1 <= k <= N.
According to one paper linked from the OEIS it's possible to rewrite the sum in terms of Euler's function. This changes the problem into one of prime factorisation - still not easy but likely to be much faster than brute force.
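To make that concrete: grouping the i by the value of gcd(i, n) gives sum_{i=1..n} gcd(i, n) = sum over the divisors d of n of d * phi(n / d), since exactly phi(n / d) values of i satisfy gcd(i, n) = d. A hedged single-query sketch along those lines (a simple trial-division phi is enough for n up to 10^7):

#include <cstdio>
#include <vector>

// Euler's totient by trial division, O(sqrt(x)).
long long phi(long long x)
{
    long long result = x;
    for (long long d = 2; d * d <= x; ++d)
    {
        if (x % d != 0) continue;
        while (x % d == 0) x /= d;
        result -= result / d;
    }
    if (x > 1) result -= result / x;
    return result;
}

int main()
{
    long long n;
    std::scanf("%lld", &n);

    // sum_{i=1..n} gcd(i, n) = sum over divisors d of n of d * phi(n / d)
    long long sum = 0;
    for (long long d = 1; d * d <= n; ++d)
    {
        if (n % d != 0) continue;
        sum += d * phi(n / d);          // divisor d
        if (d != n / d)
            sum += (n / d) * phi(d);    // paired divisor n / d
    }
    std::printf("%lld\n", sum);
    return 0;
}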
I had occasion to study the computation of GCD sums because the problem cropped up in a HackerEarth tutorial named GCD Sum. Googling turned up some academic papers with useful formulas, which I'm reporting here since they aren't mentioned in the MathOverflow article linked by deviantfan.
For coprime m and n (i.e. gcd(m, n) == 1) the function is multiplicative:
gcd_sum[m * n] = gcd_sum[m] * gcd_sum[n]
Powers e of primes p:
gcd_sum[p^e] = (e + 1) * p^e - e * p^(e - 1)
If only a single sum is to be computed then these formulas could be applied to the result of factoring the number in question, which would still be way faster than repeated gcd() calls or going through the rigmarole proposed by Толя.
However, the formulas could just as easily be used to compute whole tables of the function efficiently. Basically, all you have to do is plug them into the algorithm for linear time Euler totient calculation and you're done - this computes all GCD sums up to a million much faster than you can compute the single GCD sum for the number 10^6 by way of calls to a gcd() function. Basically, the algorithm efficiently enumerates the least factor decompositions of the numbers up to n in a way that makes it easy to compute any multiplicative function - Euler totient (a.k.a. phi), the sigmas or, in fact, GCD sums.
Here's a bit of hashish code that computes a table of GCD sums for smallish limits - ‘small’ in the sense that sqrt(N) * N does not overflow a 32-bit signed integer. IOW, it works for a limit of 10^6 (plenty enough for the HackerEarth task with its limit of 5 * 10^5) but a limit of 10^7 would require sticking (long) casts in a couple of strategic places. However, such hardening of the function for operation at higher ranges is left as the proverbial exercise for the reader... ;-)
static int[] precompute_Pillai (int limit)
{
    var small_primes = new List<ushort>();
    var result = new int[1 + limit];
    result[1] = 1;
    int n = 2, small_prime_limit = (int)Math.Sqrt(limit);

    for (int half = limit / 2; n <= half; ++n)
    {
        int f_n = result[n];

        if (f_n == 0)
        {
            f_n = result[n] = 2 * n - 1;

            if (n <= small_prime_limit)
            {
                small_primes.Add((ushort)n);
            }
        }

        foreach (int prime in small_primes)
        {
            int nth_multiple = n * prime, e = 1, p = 1;  // 1e6 * 1e3 < INT_MAX

            if (nth_multiple > limit)
                break;

            if (n % prime == 0)
            {
                if (n == prime)
                {
                    f_n = 1;
                    e = 2;
                    p = prime;
                }
                else break;
            }

            for (int q; ; ++e, p = q)
            {
                result[nth_multiple] = f_n * ((e + 1) * (q = p * prime) - e * p);

                if ((nth_multiple *= prime) > limit)
                    break;
            }
        }
    }

    for ( ; n <= limit; ++n)
        if (result[n] == 0)
            result[n] = 2 * n - 1;

    return result;
}
As promised, this computes all GCD sums up to 500,000 in 12.4 ms, whereas computing the single sum for 500,000 via gcd() calls takes 48.1 ms on the same machine. The code has been verified against an OEIS list of the Pillai function (A018804) up to 2000, and up to 500,000 against a gcd-based function - an undertaking that took a full 4 hours.
There's a whole range of optimisations that could be applied to make the code significantly faster, like replacing the modulo division with a multiplication (with the inverse) and a comparison, or to shave some more milliseconds by way of stepping the ‘prime cleaner-upper’ loop modulo 6. However, I wanted to show the algorithm in its basic, unoptimised form because (a) it is plenty fast as it is, and (b) it could be useful for other multiplicative functions, not just GCD sums.
P.S.: modulo testing via multiplication with the inverse is described in section 9 of the Granlund/Montgomery paper Division by Invariant Integers using Multiplication but it is hard to find info on efficient computation of inverses modulo powers of 2. Most sources use the Extended Euclid's algorithm or similar overkill. So here comes a function that computes multiplicative inverses modulo 2^32:
static uint ModularInverse (uint n)
{
    uint x = 2 - n;

    x *= 2 - x * n;
    x *= 2 - x * n;
    x *= 2 - x * n;
    x *= 2 - x * n;

    return x;
}
That's effectively five iterations of Newton-Raphson, in case anyone cares. ;-)
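In case it helps, here is a small C++ rendering of that inverse plus the divisibility test it enables (the helper name DivisibleByOdd is mine, not from the paper): for odd d, multiplying by the inverse maps the multiples of d bijectively onto the range [0, 0xFFFFFFFF / d], so a multiply and a compare replace the modulo.

#include <cstdint>
#include <cstdio>

// Multiplicative inverse modulo 2^32 for odd n, as in the C# function above
// (five Newton-Raphson steps, each doubling the number of correct bits).
static std::uint32_t ModularInverse(std::uint32_t n)
{
    std::uint32_t x = 2 - n;
    x *= 2 - x * n;
    x *= 2 - x * n;
    x *= 2 - x * n;
    x *= 2 - x * n;
    return x;
}

// x % d == 0 (d odd) exactly when x * inverse(d) mod 2^32 lands in [0, UINT32_MAX / d].
static bool DivisibleByOdd(std::uint32_t x, std::uint32_t d, std::uint32_t d_inv)
{
    return x * d_inv <= UINT32_MAX / d;
}

int main()
{
    std::uint32_t d = 7, d_inv = ModularInverse(d);
    std::printf("%d %d\n", (int)DivisibleByOdd(42, d, d_inv), (int)DivisibleByOdd(44, d, d_inv));  // prints: 1 0
}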
You can use a sieve to store the lowest prime factor of every number less than or equal to 10^7, and then, by prime factorization of the given number, calculate your answer directly.
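A hedged sketch of that suggestion: a smallest-prime-factor sieve up to 10^7 makes factorising n immediate, and the prime-power formula given above, (e + 1)*p^e - e*p^(e - 1) per factor, then yields the sum without touching every i. The sieve really pays off when there are many queries; for a single n, plain trial division would also do.

#include <cstdio>
#include <vector>

int main()
{
    const int LIMIT = 10000000;                // 10^7, per the question
    long long n;
    std::scanf("%lld", &n);

    // Sieve of smallest prime factors up to LIMIT (about 40 MB as ints).
    std::vector<int> spf(LIMIT + 1, 0);
    for (int i = 2; i <= LIMIT; ++i)
    {
        if (spf[i] != 0) continue;             // composite, already marked
        for (long long j = i; j <= LIMIT; j += i)
            if (spf[j] == 0) spf[j] = i;
    }

    // Factor n via the sieve; the gcd-sum is multiplicative, with
    // value (e + 1)*p^e - e*p^(e-1) on each prime power p^e.
    long long sum = 1, m = n;
    while (m > 1)
    {
        long long p = spf[m];
        long long e = 0, pe = 1;
        while (m % p == 0) { m /= p; ++e; pe *= p; }
        sum *= (e + 1) * pe - e * (pe / p);    // contribution of p^e
    }
    std::printf("%lld\n", sum);
    return 0;
}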