How can I write a C++ program to calculate large factorials?
For example, if I want to calculate (100!) / (99!), we know the answer is 100, but if I calculate the factorials of the numerator and denominator individually, both numbers are gigantic.

Expanding on Dirk's answer (which IMO is the correct one):
#include <math.h>
#include <stdio.h>
int main() {
    printf("%lf\n", (100.0/99.0) * exp(lgamma(100)-lgamma(99)));
    return 0;
}
Try it; it really does what you want, even though it looks a little crazy if you are not familiar with it. Using a bigint library is going to be wildly inefficient for this. Taking exps of logs of gammas is super fast. This runs instantly.
The reason you need to multiply by 100/99 is that gamma(n) equals (n-1)!, not n!. So yeah, you could just do exp(lgamma(101)-lgamma(100)) instead. The result is a floating-point approximation, so round it if you need an integer. Also, gamma is defined for more than just integers.
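The same trick works for any pair of arguments. A minimal sketch, assuming a helper named factorial_ratio (a hypothetical name, not from the answer) and the standard std::lgamma from <cmath>:

```cpp
#include <cmath>

// Approximates n! / k! without forming either factorial.
// std::lgamma(x) returns ln(Gamma(x)), and Gamma(x + 1) == x! for
// non-negative integers x, hence the "+ 1.0" on both arguments.
double factorial_ratio(double n, double k) {
    return std::exp(std::lgamma(n + 1.0) - std::lgamma(k + 1.0));
}
```

Because the value passes through a logarithm and back, it is only accurate to about the precision of a double; std::llround(factorial_ratio(100, 99)) gives 100.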

You can use the Gamma function instead; see the Wikipedia page, which also has pointers to code.

Of course this particular expression should be optimized, but as for the title question, I like GMP because it offers a decent C++ interface, and is readily available.
#include <iostream>
#include <gmpxx.h>
mpz_class fact(unsigned int n)
{
    if (n == 0) return mpz_class(1); // 0! = 1
    mpz_class result(n);
    while(n --> 1) result *= n;
    return result;
}
int main()
{
    mpz_class result = fact(100) / fact(99);
    std::cout << result.get_str(10) << std::endl;
}
Compiles on Linux with g++ -Wall -Wextra -o test test.cc -lgmpxx -lgmp

By the sounds of your comments, you also want to calculate expressions like 100!/(96!*4!).
Having "cancelled out the 96", leaving yourself with (97 * ... * 100)/4!, you can then keep the arithmetic within smaller bounds by taking as few numbers "from the top" as possible as you go. So, in this case:
i = 97
j = 4
result = 1
while (i <= 100) or (j > 1)
    if (j > 1) and (result % j == 0)
        result /= j
        j -= 1
    else
        result *= i
        i += 1
You can of course be cleverer than that in the same vein.
This just delays the inevitable, though: eventually you reach the limits of your fixed-size type. Factorials explode so quickly that for heavy-duty use you're going to need multiple-precision.
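The interleaved multiply-and-divide idea above could look like this in C++. This is only a sketch: the function name choose and the use of a 64-bit unsigned type are assumptions, and it still overflows once the true result no longer fits in 64 bits.

```cpp
#include <cstdint>

// Computes C(n, k) = n! / ((n - k)! * k!) by multiplying "from the top"
// and dividing as soon as the running value is divisible by the next divisor.
// Assumes 0 <= k <= n.
std::uint64_t choose(unsigned n, unsigned k) {
    std::uint64_t result = 1;
    unsigned i = n - k + 1;  // next numerator factor, counting up to n
    unsigned j = k;          // next divisor, counting down to 2
    while (i <= n || j > 1) {
        if (j > 1 && result % j == 0) {
            result /= j;
            --j;
        } else {
            result *= i;
            ++i;
        }
    }
    return result;
}
```

Once every numerator factor has been multiplied in, the running value is a multiple of j!, so the remaining divisions are always exact. For the example above, choose(100, 4) returns 3921225.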

Here's an example of how to do so:
The approach they take is to store the big numbers as a character array of digits.
Also see this SO question: Calculate the factorial of an arbitrarily large number, showing all the digits

You can use a big integer library like gmp which can handle arbitrarily large integers.

The only optimization that can be made here (assuming that in m!/n!, m is larger than n) is to cancel everything you can before multiplying.
If m is less than n, we have to swap the values first, then compute the product, and finally take 1 / result. Note that the result in this case is a double and should be handled as one.
Here is the code.
if (m == n) return 1;
// If 'm' is less than 'n' we would have
// to calculate the denominator first and then
// make one division operation
bool need_swap = (m < n);
if (need_swap) std::swap(m, n);
// Note: you could also use some BIG integer implementation,
// if your factorial would still be big after crossing some values
// Store the result here
int result = 1;
for (int i = m; i > n; --i) {
    result *= i;
}
// Here comes the division if needed
// After that, we swap the elements back
if (need_swap) {
    // Note the double here
    // If m is always > n then these lines are not needed
    double fractional_result = (double)1 / result;
    std::swap(m, n);
}
Also worth mentioning (if you need a big-integer implementation and want to write it yourself): an approach that is not too hard to implement is to treat the number as a sequence of blocks, for example blocks of 4 decimal digits each.
Example: 1234 | 4567 | 2323 | 2345 | .... You then have to implement every basic operation you need (addition, multiplication, maybe exponentiation; division is the tough one).
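To make the block idea concrete, here is a minimal sketch that stores a number as base-10000 blocks (least significant block first) and supports only multiplication by a small integer, which is all a factorial needs. The names BigInt, multiply, to_decimal and factorial are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Little-endian blocks of 4 decimal digits each (base 10000).
using BigInt = std::vector<std::uint32_t>;

// Multiplies the big number in place by a small factor, propagating carries.
void multiply(BigInt& a, std::uint32_t m) {
    std::uint64_t carry = 0;
    for (std::uint32_t& block : a) {
        std::uint64_t cur = static_cast<std::uint64_t>(block) * m + carry;
        block = static_cast<std::uint32_t>(cur % 10000);
        carry = cur / 10000;
    }
    while (carry > 0) {
        a.push_back(static_cast<std::uint32_t>(carry % 10000));
        carry /= 10000;
    }
}

// Decimal text: the top block is printed as-is, the others zero-padded to 4 digits.
std::string to_decimal(const BigInt& a) {
    std::string s = std::to_string(a.back());
    char buf[8];
    for (std::size_t i = a.size() - 1; i-- > 0;) {
        std::snprintf(buf, sizeof buf, "%04u", static_cast<unsigned>(a[i]));
        s += buf;
    }
    return s;
}

BigInt factorial(std::uint32_t n) {
    BigInt result{1};
    for (std::uint32_t k = 2; k <= n; ++k) multiply(result, k);
    return result;
}
```

With this, to_decimal(factorial(25)) yields "15511210043330985984000000". General addition, multiplication and division of two such numbers are left out of the sketch.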

To solve x!/y! for x > y:
int product = 1;
for(int i=0; i < x - y; i ++)
product *= x-i;
If y > x switch the variables and take the reciprocal of your solution.

I asked a similar question, and got some pointers to some libraries:
How can I calculate a factorial in C# using a library call?
It depends on whether or not you need all the digits, or just something close. If you just want something close, Stirling's Approximation is a good place to start.
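As a rough illustration of the "something close" route, Stirling's approximation in logarithmic form is ln(n!) ≈ n ln(n) - n + 0.5 ln(2πn). The sketch below (function name is an assumption) works on the logarithm so that nothing overflows:

```cpp
#include <cmath>

// Stirling's approximation to ln(n!), valid for n >= 1.
// The absolute error in the logarithm is roughly 1 / (12 n).
double stirling_log_factorial(double n) {
    const double pi = 3.14159265358979323846;
    return n * std::log(n) - n + 0.5 * std::log(2.0 * pi * n);
}
```

A ratio such as 100!/99! can then be estimated as std::exp(stirling_log_factorial(100) - stirling_log_factorial(99)), which comes out close to, but not exactly, 100.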


How to calculate the sum of the bitwise xor values of all the distinct combination of the given numbers efficiently?

Given n (n <= 1000000) positive integers (each smaller than 1000000), the task is to calculate the sum of the bitwise xor (^ in C/C++) of every distinct pair of the given numbers.
Time limit is 1 second.
For example, if 3 integers are given as 7, 3 and 5, answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
#include <bits/stdc++.h>
using namespace std;
int num[1000001];
int main()
{
    int n, i, sum, j;
    scanf("%d", &n);
    for (i = 0; i < n; i++)
        scanf("%d", &num[i]);
    sum = 0;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            sum += num[i] ^ num[j];
    printf("%d\n", sum);
    return 0;
}
But my code failed to run in 1 second. How can I write it so that it runs within 1 second?
Edit: Actually this is an Online Judge problem and I am getting Cpu Limit Exceeded with my above code.
You need to compute around 1e12 xors in order to brute force this. Modern processors can do around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair combination and then summing them up, but I know it not. In any case, on the contemporary processors you'll be shaving off microseconds at best; that is because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important in repetitive branching, you have nothing like that here.
The biggest problem in your algorithm is reading from the standard input. Despite the fact that "scanf" takes only 5 characters in your source code, in machine language this is the bulk of your program. Unfortunately, if the data actually changes each time you run your code, there is no way around the requirement of reading from stdin, and it will make no difference whether you use scanf, std::cin >>, or even implement your own method to read characters from input and convert them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you can be running your code via: myprogram < data.
This function grows quadratically (thanks @rici). At around 25,000 positive integers, each being 999,999 (worst case), the for loop calculation alone can finish in approximately a second. Making this work with the input you have specified, for 1 million positive integers, just doesn't seem possible.
With the hint in Alan Stokes's answer, you may have a linear complexity instead of quadratic with the following:
#include <algorithm>
#include <cstdint>
#include <vector>
std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
    std::size_t res = 0;
    for (std::size_t b = 0; b != 32; ++b) {
        // count_1: how many numbers have bit b set
        const std::size_t count_1 =
            std::count_if(v.begin(), v.end(),
                          [b](std::uint32_t n) { return (n >> b) & 0x01; });
        const std::size_t count_0 = v.size() - count_1;
        res += count_0 * count_1 << b;
    }
    return res;
}
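x^y = Sum_b((x&b)^(y&b)), where b is a single-bit mask (from 1<<0 to 1<<31).
For a given bit, with count_0 and count_1 the number of inputs that have this bit set to 0 or 1 respectively, there are count_0 * (count_0 - 1) / 2 pairs of the form 0^0, count_0 * count_1 pairs of the form 0^1, and count_1 * (count_1 - 1) / 2 pairs of the form 1^1. Since 0^0 and 1^1 are 0, only the count_0 * count_1 mixed pairs contribute, each adding the value of that bit to the sum.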
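A quick way to gain confidence in the per-bit counting is to compare it with the quadratic brute force on small inputs. The sketch below does that; the function names are made up for the illustration, and a 64-bit accumulator is assumed to be wide enough (it is for inputs below 1000000, where only the low 20 bits are ever set).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum of v[i] ^ v[j] over all pairs i < j, one bit position at a time.
std::uint64_t xor_sum_by_bits(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (unsigned b = 0; b < 32; ++b) {
        std::uint64_t ones = 0;
        for (std::uint32_t x : v) ones += (x >> b) & 1u;
        std::uint64_t zeros = v.size() - ones;
        total += (ones * zeros) << b;  // each mixed pair contributes 2^b
    }
    return total;
}

// Quadratic reference implementation, only usable for small inputs.
std::uint64_t xor_sum_brute_force(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            total += v[i] ^ v[j];
    return total;
}
```

For the example from the question, both functions return 12 for the input {7, 3, 5}.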

C++ program which calculates ln for a given variable x without using any ready functions

I've searched for the equation which calculates the ln of a number x and found out that this equation is:
ln(x) = (x-1) - (x-1)^2/2 + (x-1)^3/3 - (x-1)^4/4 + ...
and I've written this code to implement it:
double ln = x-1 ;
for(int i=2;i<=5;i++)
{
    double tmp = 1 ;
    for(int j=1;j<=i;j++)
        tmp *= (x-1) ;
    if(i%2==0)
        ln -= (tmp/i) ;
    else
        ln += (tmp/i) ;
}
cout << "ln: " << setprecision(10) << ln << endl ;
but unfortunately I'm getting outputs completely different from the output on my calculator, especially for large numbers. Can anyone tell me where the problem is?
The equation you link to is an infinite series as implied by the ellipsis following the main part of the equation and as indicated more explicitly by the previous formulation on the same page:
In your case, you are only computing the first five terms. Later terms add small refinements that bring the result closer to the actual value, but computing all infinitely many terms would take infinite time.
However, what you can do is approximate your response to something like:
double ln(double x) {
    // validate 0 < x < 2
    double threshold = 1e-5; // set this to whatever threshold you want
    double base = x-1; // Base of the numerator; exponent will be explicit
    int den = 1; // Denominator of the nth term
    int sign = 1; // Used to swap the sign of each term
    double term = base; // Current power of base, starting with the first term
    double prev = 0; // Previous sum
    double result = term; // Kick it off
    while (fabs(prev - result) > threshold) {
        den++;
        sign *= -1;
        term *= base;
        prev = result;
        result += sign * term / den;
    }
    return result;
}
Caution: I haven't actually tested this so it may need some tweaking.
What this does is add terms until the absolute difference between two consecutive partial sums is less than some threshold you establish.
Now this is not a particularly efficient way to do this. It's better to work with the functions the language you're using (in this case C++) provides to compute the natural log (which another poster has, I believe already shown to you). But there may be some value in trying this for yourself to see how it works.
Also, as barak manos notes below, this Taylor series only converges on the range (0, 2), so you will need to validate the value of x lies in that range before trying to run actual computation.
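The effect of the convergence range is easy to see by evaluating partial sums directly. A small sketch (the function name ln_partial_sum is an assumption made for this illustration):

```cpp
#include <cmath>

// Sum of the first `terms` terms of
// ln(x) = (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...
double ln_partial_sum(double x, int terms) {
    double base = x - 1.0;
    double power = 1.0;  // holds (x-1)^n
    double sum = 0.0;
    for (int n = 1; n <= terms; ++n) {
        power *= base;
        sum += (n % 2 == 1 ? power : -power) / n;
    }
    return sum;
}
```

For x = 1.5 the terms shrink like 0.5^n / n, so a few dozen terms already agree with std::log(1.5) to many digits. For x = 3 the terms grow like 2^n / n, so adding more terms moves the sum further away from ln(3) instead of closer.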
In C++, the natural log is simply log (std::log, declared in <cmath>).
It wouldn't hurt to use long and long double instead of int and double. This may get a little more accuracy on some larger values. Also, your series only extending 5 levels deep is also limiting your accuracy.
Using a series like this is basically an approximation of the logarithmic answer.
This version should be somewhat faster:
#include <cstdint>
#include <cstring>
double const scale = 1.5390959186233239e-16;
double const offset = -709.05401552996614;
double fast_ln(double x)
{
    uint64_t xbits;
    memcpy(&xbits, &x, 8);
    // if memcpy not allowed, use
    // for( i = 0; i < 8; ++i ) i[(char*)&xbits] = i[(char*)&x];
    return xbits * scale + offset;
}
The trick is that this uses a 64-bit integer * 64-bit floating-point multiply, which involves a conversion of the integer to floating-point. Said floating-point representation is similar to scientific notation and requires a logarithm to find the appropriate exponent... but it is done purely in hardware and is very fast.
However it is doing a linear approximation within each octave, which is not very accurate. Using a lookup table for those bits would be far better.
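To get a feel for how rough the linear approximation is, one can measure it against std::log. The sketch below copies the two constants from the snippet above; as far as I can tell, scale is about ln(2) / 2^52 and offset is close to -1023 * ln(2) with a small shift, but that reading is my own and not stated in the answer. It also assumes double is an IEEE-754 64-bit type. The helper max_abs_error is hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

const double scale = 1.5390959186233239e-16;  // about ln(2) / 2^52 (assumed reading)
const double offset = -709.05401552996614;    // about -1023 * ln(2), slightly shifted

double fast_ln(double x) {
    std::uint64_t xbits;
    std::memcpy(&xbits, &x, sizeof xbits);  // reinterpret the bit pattern of x
    return xbits * scale + offset;
}

// Largest absolute deviation from std::log over an evenly spaced grid.
double max_abs_error(double lo, double hi, int samples) {
    double worst = 0.0;
    for (int i = 0; i <= samples; ++i) {
        double x = lo + (hi - lo) * i / samples;
        worst = std::fmax(worst, std::fabs(fast_ln(x) - std::log(x)));
    }
    return worst;
}
```

On a typical platform the deviation stays within a few hundredths across the sampled range, which matches the "not very accurate" caveat above.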
That formula won't work for large inputs, because you would have to take the highest-degree terms into account, which you can't, because there are infinitely many of them.
It will only work for small inputs, where only the first terms of your series are relevant.
You can find ways to do that here: http://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm_for_logarithms
and here: http://www.netlib.org/cephes/qlibdoc.html#qlog
This should work. You just needed the part where, if x >= 2, you halve x and add ln(2), which is about 0.6931. If you wanted to, you could add if (x >= 1024) return myLN(x/1024) + 6.9315, where 6.9315 is ln(1024). This will add speed for big values of x. The loop bound of 100 could be much smaller, such as 20. I believe 17 is enough to get an exact result for an integer.
double myLN(double x) {
    if (x >= 2) {
        return myLN(x/2.0) + 0.6931471805599453; // ln(2)
    }
    x = x-1;
    double total = 0.0;
    double xToTheIPower = x;
    for (unsigned i = 1; i < 100; i++) {
        if (i%2 == 1) {
            total += xToTheIPower / (i);
        } else {
            total -= xToTheIPower / (i);
        }
        xToTheIPower *= x;
    }
    return total;
}

Need a way to make this code run faster

I'm trying to solve Project Euler problem 401. The only way I could find to solve it was brute force. I've been running this code for about 10 minutes without any answer. Can anyone help me with ideas to improve it?
#include <iostream>
#include <cmath>
#define ull unsigned long long
using namespace std;
ull sigma2(ull n);
ull SIGMA2(ull n);
int main()
{
    ull ans = SIGMA2(1000000000000000) % 1000000000;
    cout << "Answer: " << ans << endl;
    return 0;
}
ull sigma2(ull n)
{
    ull sum = 0;
    for(ull i = 1; i<=floor(sqrt(n)); i++)
    {
        if(n%i == 0)
            sum += (i*i)+((n/i)*(n/i));
        if(i*i == n)
            sum -= n;
    }
    return sum;
}
ull SIGMA2(ull n)
{
    ull sum = 0;
    for(ull i = 1; i<=n; i++)
        sum += sigma2(i);
    return sum;
}
You're missing some divisors: if a/b = c and b is a divisor of a, then c is also a divisor of a, but c might be greater than floor(sqrt(a)). For example, 3 > floor(sqrt(6)) but 3 divides 6.
Also, you should store floor(sqrt(n)) in a variable and use that variable in the for loop; otherwise you recalculate it at every iteration, which is very expensive.
You can do some straightforward optimizations:
inline sigma2,
calculate floor(sqrt(n)) before the loop (but compiler may be doing it anyway, though),
precalculate squares of all ints from 1 to n and then use array lookup instead of multiplication
You will gain more by changing your approach. Think what you are trying to do - summing squares of all divisors of all integers from 1 to n. You grouped divisors by what they divide, but you can regroup terms in this sum. Let's group divisors by their value:
1 divides everything so it will appear n times in the sum, bringing 1*1*n total,
2 divides evens and will appear n/2 (integer division!) times, bringing 2*2*(n/2) total,
k ... will bring k*k*(n/k) total.
So we should just add up k*k*(n/k) for k from 1 to n.
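The regrouping can be checked on small inputs by comparing it with the direct definition. A sketch, with made-up function names and no modular reduction yet (so it is only meant for small n):

```cpp
#include <cstdint>

// Direct definition: add d*d for every divisor d of every i in 1..n.
// Quadratic, for reference only.
std::uint64_t sigma2_sum_naive(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= n; ++i)
        for (std::uint64_t d = 1; d <= i; ++d)
            if (i % d == 0) sum += d * d;
    return sum;
}

// Regrouped: k divides exactly n / k of the integers 1..n,
// so it contributes k*k*(n/k) to the total.
std::uint64_t sigma2_sum_regrouped(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t k = 1; k <= n; ++k) sum += k * k * (n / k);
    return sum;
}
```

Both return 113 for n = 6 (6 + 12 + 18 + 16 + 25 + 36 in the regrouped form).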
Think about the problem.
Bruteforce the way you tried is obviously not a good idea.
You should come up with something better...
Isn't there any method how to use some nice prime factorization method to speed up the computation? Isn't there any recursion pattern? Try to find something...
One simple optimization that you can carry out is that there will be many repeated factors in the numbers.
So first estimate in how many numbers would 1 be a factor ( all N numbers ).
In how many numbers would 2 be a factor ( N/2 ).
Similarly for others.
Just multiply their squares with their frequency.
Time complexity shall then straight-away reduce to O(N)
There are obvious micro-optimizations such as ++i rather than i++, moving floor(sqrt(n)) out of the loop (these are two floating-point operations, which are really expensive compared to the other integer operations in the loop), and calculating n/i only once (store it in a temporary variable and then square that variable).
There are also rather obvious simplifications in the algorithm. For example SIGMA2(i) = SIGMA2(i-1) + sigma2(i). But do not use recursion since you need a really huge number, this would not work and your stack memory would be exhausted. Use loop instead of recursion. There is a huge potential for improvement.
And there is a bigger problem: 10^15 has 16 digits, and its square has 31. There is no way you can store that in an unsigned long long, which on typical platforms is 64 bits wide and tops out at about 1.8 * 10^19 (20 digits). So you need to apply the modulo 10^9 from the end of the assignment during the calculation to keep the intermediate values small.
And when using brute force, print out the intermediate result every million numbers, for example, to give you an idea of how fast you are approaching the final result. Waiting 10 minutes blindly is not a good idea.
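One way to keep every intermediate value inside 64 bits is to reduce each operand modulo 10^9 before multiplying. The sketch below applies that to the k*k*(n/k) terms mentioned in the other answers; the names MOD, term_mod and sigma2_sum_mod are assumptions, and the loop is still linear in n, so it is not fast enough for n = 10^15 on its own.

```cpp
#include <cstdint>

const std::uint64_t MOD = 1000000000ULL;  // 10^9, as required by the problem

// (k*k*count) mod 10^9. Every operand is reduced below 10^9 first,
// so each intermediate product stays below 10^18 and fits in 64 bits.
std::uint64_t term_mod(std::uint64_t k, std::uint64_t count) {
    std::uint64_t km = k % MOD;
    std::uint64_t square = (km * km) % MOD;
    return (square * (count % MOD)) % MOD;
}

// Sum of k*k*(n/k) for k = 1..n, modulo 10^9.
std::uint64_t sigma2_sum_mod(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t k = 1; k <= n; ++k)
        sum = (sum + term_mod(k, n / k)) % MOD;
    return sum;
}
```

For small inputs the modulus has no effect: sigma2_sum_mod(6) is 113.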

Optimizing my code for finding the factors of a given integer

Here is my code, but I'd like to optimize it. I don't like that it tests all the numbers up to the square root of n, considering that one could be faced with finding the factors of a large number. Your answers would be of great help. Thanks in advance.
unsigned int* factor(unsigned int n)
{
    unsigned int tab[40];
    int dim=0;
    for(int i=2;i<=(int)sqrt(n);++i)
        while(n%i==0) { tab[dim++]=i; n/=i; }
    return tab;
}
Here's a suggestion on how to do this in 'proper' c++ (since you tagged as c++).
PS. Almost forgot to mention: I optimized the call to sqrt away :)
See it live on http://liveworkspace.org/code/6e2fcc2f7956fafbf637b54be2db014a
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
typedef unsigned int uint;
std::vector<uint> factor(uint n)
{
    std::vector<uint> tab;
    for(unsigned long i=2;i*i <= n; ++i)
        while(n%i==0) { tab.push_back(i); n/=i; }
    if(n>1)
        tab.push_back(n); // the remaining cofactor is prime
    return tab;
}
void test(uint x)
{
    auto v = factor(x);
    std::cout << x << ":\t";
    std::copy(v.begin(), v.end(), std::ostream_iterator<uint>(std::cout, ";"));
    std::cout << std::endl;
}
int main(int argc, const char *argv[])
{
    test(2);
    test(4);
    test(43);
    test(47);
    test(9997);
}
Output:
2: 2;
4: 2;2;
43: 43;
47: 47;
9997: 13;769;
There's a simple change that will cut the run time somewhat: factor out all the 2's, then only check odd numbers.
If you use
... i*i <= n; ...
it may run much faster than i <= sqrt(n).
By the way, you should handle negative n, or at least make sure you never pass a negative number.
I'm afraid you cannot. There is no known method that can factor large integers in polynomial time. However, some methods can speed up your program slightly (not significantly). See Wikipedia for more references: http://en.wikipedia.org/wiki/Integer_factorization
As your solution shows, the divisors you end up finding are all prime (the condition while (n%i == 0) guarantees that). For large numbers in particular, you could compute the primes beforehand and check only those. The primes can be computed with the Sieve of Eratosthenes or some other efficient method.
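A sketch of that idea: sieve the primes once, then divide only by them. The function names and the two-argument signature are assumptions for this illustration; for 32-bit inputs the sieve only needs to go up to 65535.

```cpp
#include <cstdint>
#include <vector>

// Sieve of Eratosthenes: all primes <= limit.
std::vector<std::uint32_t> primes_up_to(std::uint32_t limit) {
    std::vector<bool> composite(limit + 1, false);
    std::vector<std::uint32_t> primes;
    for (std::uint32_t i = 2; i <= limit; ++i) {
        if (composite[i]) continue;
        primes.push_back(i);
        for (std::uint64_t j = static_cast<std::uint64_t>(i) * i; j <= limit; j += i)
            composite[j] = true;
    }
    return primes;
}

// Trial division by precomputed primes only. `primes` must contain
// every prime up to sqrt(n).
std::vector<std::uint32_t> factor(std::uint32_t n,
                                  const std::vector<std::uint32_t>& primes) {
    std::vector<std::uint32_t> factors;
    for (std::uint32_t p : primes) {
        if (static_cast<std::uint64_t>(p) * p > n) break;
        while (n % p == 0) {
            factors.push_back(p);
            n /= p;
        }
    }
    if (n > 1) factors.push_back(n);  // whatever remains is prime
    return factors;
}
```

Typical use would be to call primes_up_to(65535) once and pass the result to every factor call, for example factor(9997, primes) gives 13 and 769.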
unsigned int* factor(unsigned int n)
If unsigned int is the typical 32-bit type, the numbers are too small for any of the more advanced algorithms to pay off. The usual enhancements for the trial division are of course worthwhile.
If you're moving the division by 2 out of the loop, and divide only by odd numbers in the loop, as mentioned by Pete Becker, you're essentially halving the number of divisions needed to factor the input number, and thus speed up the function by a factor of very nearly 2.
If you carry that one step further and also eliminate the multiples of 3 from the divisors in the loop, you reduce the number of divisions and hence increase the speed by a factor close to 3 (on average; most numbers don't have any large prime factors, but are divisible by 2 or by 3, and for those the speedup is much smaller; but those numbers are quick to factor anyway. If you factor a longer range of numbers, the bulk of the time is spent factoring the few numbers with large prime divisors).
// if your compiler doesn't transform that to bit-operations, do it yourself
while(n % 2 == 0) {
    tab[dim++] = 2;
    n /= 2;
}
while(n % 3 == 0) {
    tab[dim++] = 3;
    n /= 3;
}
// d <= n / d instead of d*d <= n, so the test cannot overflow for large n
for(unsigned int d = 5, s = 2; d <= n / d; d += s, s = 6-s) {
    while(n % d == 0) {
        tab[dim++] = d;
        n /= d;
    }
}
If you're calling that function really often, it would be worthwhile to precompute the 6542 primes not exceeding 65535, store them in a static array, and divide only by the primes to eliminate all divisions that are a priori guaranteed to not find a divisor.
If unsigned int happens to be larger than 32 bits, then using one of the more advanced algorithms would be profitable. You should still begin with trial divisions to find the small prime factors (whether small should mean <= 1000, <= 10000, <= 100000 or perhaps <= 1000000 would need to be tested, my gut feeling says one of the smaller values would be better on average). If after the trial division phase the factorisation is not yet complete, check whether the remaining factor is prime using e.g. a deterministic (for the range in question) variant of the Miller-Rabin test. If it's not, search a factor using your favourite advanced algorithm. For 64 bit numbers, I'd recommend Pollard's rho algorithm or an elliptic curve factorisation. Pollard's rho algorithm is easier to implement and for numbers of that magnitude finds factors in comparable time, so that's my first recommendation.
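For illustration, here is a sketch of a deterministic Miller-Rabin test restricted to 32-bit inputs, where the witness set {2, 7, 61} is known to be sufficient and all products fit in 64 bits. The function names are hypothetical. A 64-bit version needs a 128-bit (or otherwise overflow-safe) modular multiplication and a larger witness set, which this sketch does not cover.

```cpp
#include <cstdint>

// (base^exp) mod m for m < 2^32; products of two values below 2^32 fit in 64 bits.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// Miller-Rabin with witnesses 2, 7 and 61: deterministic for all n < 2^32.
bool is_prime_u32(std::uint32_t n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    // Write n - 1 as d * 2^r with d odd.
    std::uint32_t d = n - 1;
    int r = 0;
    while (d % 2 == 0) { d /= 2; ++r; }
    const std::uint32_t witnesses[] = {2, 7, 61};
    for (std::uint32_t a : witnesses) {
        if (a % n == 0) continue;  // the witness is n itself (n is 7 or 61)
        std::uint64_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int i = 1; i < r; ++i) {
            x = x * x % n;
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}
```

After trial division has removed the small factors, a remaining cofactor that passes is_prime_u32 needs no further work; otherwise it would be handed to something like Pollard's rho.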
Int is way too small to encounter any performance problems. I tried to measure the time of your algorithm with Boost but couldn't get any useful output (too fast). So you shouldn't worry about integers at all.
Using i*i, I was able to factor 1,000,000 9-digit integers in 15.097 seconds. It's good to optimize an algorithm, but instead of "wasting" time (depending on your situation) it's important to consider whether a small improvement is really worth the effort. Sometimes you have to ask yourself if you really need to be able to factor 1,000,000 ints in 10 seconds or if 15 is fine as well.

C++: how to express a mathematical term

I have the following:
I only want (for now) to express the (s 1), (s 2) terms.
For example, (s 1) = s, (s 2) = s(s-1)/2!, (s 3) = s(s-1)(s-2)/3!.
I created a factorial function:
//compute factorial
int fact(int x){
    if (x==0)
        return 1;
    return fact(x-1)*x;
}
and I have a problem with how to express the terms above correctly.
double s=(z-x[1])/h;
double s_term=0;
for (int p=1;p<=n;p++){
if p==1
Also, s is defined as s = (x - x0)/h.
I don't know if I have declared s correctly above (I use x1 in the declaration because this is my starting point).
Thank you!
You can calculate the Binomial Coefficient simply using this function (probably the best for performance and memory usage):
#include <cassert>
unsigned long long ComputeBinomialCoefficient( int n, int k )
{
    // Run-time assert to ensure correct behavior
    assert( n >= k && k >= 0 );
    // Exploit the symmetry in the line x = k/2:
    if( k > n - k )
        k = n - k;
    unsigned long long c(1);
    // Perform the product over the space i = [1...k]
    for( int i = 1; i < k+1; i++ )
    {
        c *= n - (k - i);
        c /= i;
    }
    return c;
}
You can then just call this when you see the brackets. (I'm assuming that is the Binomial Coefficient, rather than a 2D column vector?). This technique only uses 2 variables internally (taking up a grand total of 12 bytes), and uses no recursion.
Hope this helps! :)
EDIT: I'm curious how you're going to do the (I assume laplacian) operator? Are you intending to do the forward difference method for discrete values of x, and then calculate the 2nd derivative using the results from the first, then take the quotient?
The factorial part will be much more efficient using a loop rather than recursion.
As for the binomial coefficients, the line:
isn't going to have the desired effect, as you're only setting the first and last terms correctly and missing out the (s-1), (s-2), ..., (s-p+1) terms. It's easier to just use:
s_term = fact(s) / (fact(p) * fact(s-p))
for s choose p.
As others have pointed out implementing factorial and binomial coefficient functions is not easy (e.g. overflows lurk everywhere).
If you are interested in reasonable implementations as opposed to implementing all this yourself have a look at what is available in gsl which everybody dealing with numerical problems should know of.
#include <gsl/gsl_sf_gamma.h>
double factorial_10 = gsl_sf_fact(10);
double ten_over_four = gsl_sf_choose(10, 4);
Also have a look at the documentation. There are numerous functions returning the log instead of the value to avoid overflow problems.