from number of possible pairs get the size of the original array - combinations

I have this function
public int numberOfPossiblePairs(int n)
{
int k=2;
if (k>n-k) { k=n-k;}
int b=1;
for (int i=1, m=n; i<=k; i++, m--)
b=b*m/i;
return b;
}
Which gets the number of pairs you can make from given number of items, so for example, if you have an array of 1000 items, you can make 499,500 pairs. But what I actually need is the opposite. In other words a function that would take 499500 as the parameter, and return 1000 as the original number of unique items that could produce that many pairs. (Would be a bonus if it could also handle imperfect numbers, like 499501, of which there is no number of unique items that makes exactly that many unique pairs, but it would still return 1000 as the closest since it produces 499500 pairs.)
I realize I could just incrementally loop numberOfPossiblePairs() until I see the number I am looking for, but seems like there should be an algorithmic way of doing this rather than brute forcing it like that.

Your problem boils down to a little algebra and can be solved in O(1) time. We first note that your function does not give the number of permutations of pairs, but rather the number of combinations of pairs. At any rate the logic that follows can be easily altered to accommodate permutations.
We start off by writing the formula for the number of combinations of n items taken r at a time:
C(n, r) = n! / (r! * (n - r)!)
Setting n = 1000 and r = 2 gives:
1000! / (2!(998)!) = 1000 * 999 / 2 = 499500
Just as numberOfPossiblePairs(1000) does.
Moving on in our exercise, for our example we have r = 2, thus:
total = n! / ((n - 2)! * 2!)
We now simplify:
total = (n * (n - 1)) / 2
total * 2 = n^2 - n
n^2 - n - 2 * total = 0
Now we can apply the quadratic formula to solve for n:
x = (-b +/- sqrt(b^2 - 4 * a * c)) / (2 * a)
Here we have x = n, a = 1, b = -1, and c = -2 * total which give:
n = (-(-1) +/- sqrt((-1)^2 - 4 * 1 * (-2 * total))) / 2
Since we are only interested in positive numbers we exclude the negative solution. In code we have something like (Note, it looks like the OP is using Java and I am not an expert here... the following is C++):
int originalNumber(int total) {
int result;
// 1 - 4 * 1 * (-2 * total) equals 1 + 8 * total; computed as a double so large totals do not overflow int
result = (1 + std::sqrt(1 + 8.0 * total)) / 2;
return result;
}
As for the bonus question of returning the closest value if the result isn't a whole number, we could simply round the result before coercing to an integer:
int originalNumber(int total) {
int result;
double temp;
// 1 - 4 * 1 * (-2 * total) equals 1 + 8 * total; computed as a double so large totals do not overflow int
temp = (1 + std::sqrt(1 + 8.0 * total)) / 2;
result = (int) std::round(temp);
return result;
}
Now when values like 500050 are passed, the actual result is 1000.55, and the above would return 1001, whereas the first solution would return 1000.
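A slightly more defensive variant of the same idea, as a minimal sketch: the square root is used only as an estimate, and the result is then corrected with exact integer arithmetic, so floating-point rounding cannot push the answer off by one. The names pairCount and itemsForPairs are made up for this illustration, and the function returns the largest n whose pair count does not exceed total, which is a different rule than rounding to the nearest n.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of unordered pairs that can be formed from n items: n * (n - 1) / 2.
std::int64_t pairCount(std::int64_t n) { return n * (n - 1) / 2; }

// Hypothetical helper: largest n such that pairCount(n) <= total.
std::int64_t itemsForPairs(std::int64_t total) {
    assert(total >= 0);
    // Floating-point estimate from the positive root of n^2 - n - 2 * total = 0.
    const double root = (1.0 + std::sqrt(1.0 + 8.0 * static_cast<double>(total))) / 2.0;
    std::int64_t n = static_cast<std::int64_t>(root);
    // Correct any rounding error with exact integer comparisons.
    while (pairCount(n) > total) --n;
    while (pairCount(n + 1) <= total) ++n;
    return n;
}
```

With this rule, itemsForPairs(499500) and itemsForPairs(499501) both give 1000.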

Related

Numbers of common distinct difference

Given two arrays A and B, the task is to find the number of distinct differences (between pairs of elements of the same array) that are common to both arrays.
Example :
A=[3,6,8]
B=[1,6,10]
so we get differenceSet for A
differenceSetA=[abs(3-6),abs(6-8),abs(8-3)]=[3,2,5]
similarly
differenceSetB=[abs(1-6),abs(1-10),abs(6-10)]=[5,9,4]
Number of common elements=Intersection :{differenceSetA,differenceSetB}={5}
Answer= 1
My approach O(N^2)
int commonDifference(vector<int> A,vector<int> B){
int n=A.size();
int m=B.size();
unordered_set<int> differenceSetA;
unordered_set<int> differenceSetB;
for(int i=0;i<n;i++){
for(int j=i+1;j<n;j++){
differenceSetA.insert(abs(A[i]-A[j]));
}
}
for(int i=0;i<m;i++){
for(int j=i+1;j<m;j++){
differenceSetB.insert(abs(B[i]-B[j]));
}
}
int count=0;
for(auto &it:differenceSetA){
if(differenceSetB.find(it)!=differenceSetB.end()){
count++;
}
}
return count;
}
Please provide suggestions for optimizing the approach in O(N log N)
If n is the range of the input array (n = Vmax - Vmin + 1), then the set of all differences of a given array can be obtained in O(n log n), as explained in this SO post: find all differences in a array
Here is a brief recall of the method, with a few additional practical implementation details:
1. Create an array Posi of length 2*n = 2*range = 2*(Vmax - Vmin + 1), where elements whose index matches an element of the input are set to 1 and the other elements are set to 0. This can be created in O(n + m), where m is the size of the array.
For example, given an input array [1,4,5] of size m = 3, we create an array that starts with [1,0,0,1,1] and is padded with zeros up to length 2*n.
Initialisation: Posi[i] = 0 for all i (i = 0 to 2*n - 1)
Posi[A[i] - Vmin] = 1 (i = 0 to m - 1)
2. Calculate the autocorrelation function of the array Posi[]. This is classically performed in three sub-steps:
2.1 Calculate the FFT (size 2*n) of the Posi[] array: Y[] = FFT(Posi)
2.2 Calculate the squared amplitude of the result: Y2[k] = Y[k] * conj(Y[k])
2.3 Calculate the inverse FFT of the result: Diff[] = IFFT(Y2[])
A few details are worth being mentioned here:
The reason why a size 2*n was selected, and not a size n, is that if d is a valid difference, then -d is also a valid difference. The results corresponding to negative differences are available at positions i >= n
If you find it easier to perform the FFT with a power-of-two size, then you can replace the size 2*n with a value n2k = 2^k, with n2k >= 2*n
The non-null differences correspond to non-null values in the array Diff[]:
`d` is a difference if `Diff[d] > 0`
Another important detail is that if a classical FFT is used (floating-point calculations), then you encounter small errors. To take this into account, it is important to replace the IFFT output Diff[] with integer-rounded values of the real part.
All of this concerns one array only. As you want to calculate the number of common differences, you have to calculate the arrays Diff_A[] and Diff_B[] for both sets A and B and then:
count = 0;
for each d > 0: if (Diff_A[d] != 0) and (Diff_B[d] != 0) then count++;
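The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tuned implementation: it uses a simple recursive radix-2 FFT written for the example, a power-of-two size >= 2*range, and it only reports differences d > 0 (a zero difference caused by duplicate values is ignored). The names fft, differenceFlags and commonDifferences are made up here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT; a.size() must be a power of two.
// The inverse transform is left unscaled (the caller divides by the size).
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, inverse);
    fft(odd, inverse);
    const double pi = std::acos(-1.0);
    const double step = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd w = std::polar(1.0, step * static_cast<double>(k));
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

// flags[d] is true when d > 0 occurs as a difference between two elements of v.
std::vector<bool> differenceFlags(const std::vector<int>& v) {
    if (v.empty()) return std::vector<bool>();
    const int vmin = *std::min_element(v.begin(), v.end());
    const int vmax = *std::max_element(v.begin(), v.end());
    const std::size_t range = static_cast<std::size_t>(vmax - vmin) + 1;
    std::size_t size = 1;
    while (size < 2 * range) size *= 2;  // power of two >= 2 * range
    std::vector<cd> posi(size, cd(0.0, 0.0));
    for (int x : v) posi[static_cast<std::size_t>(x - vmin)] = cd(1.0, 0.0);
    fft(posi, false);
    for (cd& y : posi) y *= std::conj(y);  // squared amplitude
    fft(posi, true);
    std::vector<bool> flags(range, false);
    for (std::size_t d = 1; d < range; ++d) {
        // Round the scaled real part to absorb floating-point error.
        flags[d] = std::llround(posi[d].real() / static_cast<double>(size)) > 0;
    }
    return flags;
}

// Counts the positive differences that occur in both arrays.
int commonDifferences(const std::vector<int>& a, const std::vector<int>& b) {
    const std::vector<bool> da = differenceFlags(a);
    const std::vector<bool> db = differenceFlags(b);
    int count = 0;
    for (std::size_t d = 1; d < std::min(da.size(), db.size()); ++d) {
        if (da[d] && db[d]) ++count;
    }
    return count;
}
```

Note that the cost depends on the value range, not on the number of elements, so this only pays off when the range is moderate.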
A little Bonus
In order to avoid a plagiarism of the mentioned post, here is an additional explanation about the way to get the differences of one set, with the help of the FFT.
The input array A = {3, 6, 8} can mathematically be represented by the following z transform:
A(z) = z^3 + z^6 + z^8
Then the corresponding z-transform of the difference array is equal to the polynomial product:
D(z) = A(z) * A(1/z) = (z^3 + z^6 + z^8) (z^(-3) + z^(-6) + z^(-8))
= z^(-5) + z^(-3) + z^(-2) + 3 + z^2 + z^3 + z^5
Then, we can note that the FFT of size N of the sequence [0 0 0 1 0 0 1 0 1] is equal to A(z) evaluated at the N points:
z_k = exp(-i * 2 PI * k / N), k = 0 ... N-1, with i = sqrt(-1)
Note that here we consider the classical FFT in C, the complex field.
It is certainly possible to perform the calculation in a Galois field, with no rounding errors, as is done for example to implement "classical" multiplications (with z = 10) for a large number of digits. This seems like overkill here.

Sieve of Eratosthenes on a segment

Sieve of Eratosthenes on the segment:
Sometimes you need to find all the primes that are in the range
[L...R] and not in [1...N], where R is a large number.
Conditions:
You are allowed to create an array of integers with size
(R−L+1).
Implementation:
bool isPrime[r - l + 1]; //filled by true
for (long long i = 2; i * i <= r; ++i) {
for (long long j = max(i * i, (l + (i - 1)) / i * i); j <= r; j += i) {
isPrime[j - l] = false;
}
}
for (long long i = max(l, 2); i <= r; ++i) {
if (isPrime[i - l]) {
//then i is prime
}
}
What is the logic behind setting the lower limit of 'j' in second for loop??
Thanks in advance!!
Think about what we want to find. Ignore the i*i part. We have only
(L + (i - 1)) / i * i to consider. (I wrote the L capital since l and 1 look quite similar)
What should it be? Obviously it should be the smallest number within L..R that is divisible by i. That's when we want to start to sieve out.
The last part of the formula, / i * i finds the next lower number that is divisible by i by using the properties of integer division.
Example: 35 div 4 * 4 = 8 * 4 = 32, 32 is the highest number that is (equal or) lower than 35 which is divisible by 4.
The L is where we want to start, obviously, and the + (i-1) makes sure that we don't find the highest number equal or lower than but the smallest number equal or bigger than L that is divisible by i.
Example: (459 + (4-1)) div 4 * 4 = 462 div 4 * 4 = 115 * 4 = 460.
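460 >= 459, 4 | 460, smallest number with that property
(The max(i*i, ...) makes sure that i itself is not sieved out if it lies within L..R. Starting at i*i rather than 2 * i is enough, because every smaller multiple of i has a prime factor below i and is already crossed out in an earlier pass.)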
For reasons of readability, I'd make this an inline function next_divisible(number, divisor) or the like. And I'd make it clear that integer division is used. If not, somebody clever might change it to regular division, with which it wouldn't work.
Also, I strongly recommend wrapping the array. It is not obvious to the outside that the property for a number X is stored at position X - L. Something like a class RangedArray that does that shift for you, allowing you a direct input of X instead of X - L, could easily take the responsibility. If you don't do that, at least make it a vector; outside of an innermost class, you shouldn't use raw arrays in C++.
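As a rough sketch of those two suggestions, here is the sieve with the rounding pulled out into next_divisible and the raw array replaced by a std::vector<bool>. The function name primesInRange and the precondition 1 <= l <= r are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest multiple of divisor that is >= number.
// Relies on integer division truncating; do not replace it with floating-point division.
inline long long next_divisible(long long number, long long divisor) {
    return (number + divisor - 1) / divisor * divisor;
}

// Returns all primes in [l, r]; isPrime[x - l] holds the flag for the number x.
std::vector<long long> primesInRange(long long l, long long r) {
    assert(1 <= l && l <= r);
    std::vector<bool> isPrime(static_cast<std::size_t>(r - l + 1), true);
    for (long long i = 2; i * i <= r; ++i) {
        // Start at the first multiple of i inside the range, but never below i * i.
        for (long long j = std::max(i * i, next_divisible(l, i)); j <= r; j += i) {
            isPrime[static_cast<std::size_t>(j - l)] = false;
        }
    }
    std::vector<long long> primes;
    for (long long x = std::max(l, 2LL); x <= r; ++x) {
        if (isPrime[static_cast<std::size_t>(x - l)]) primes.push_back(x);
    }
    return primes;
}
```

For instance, next_divisible(459, 4) is 460, matching the worked example.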

How to reduce execution time in C++ for the following code?

I have written this code which has an execution time of 3.664 sec but the time limit is 3 seconds.
The question is this-
N teams participate in a league cricket tournament on Mars, where each
pair of distinct teams plays each other exactly once. Thus, there are a total
of (N × (N-1))/2 matches. An expert has assigned a strength to each team,
a positive integer. Strangely, the Martian crowds love one-sided matches
and the advertising revenue earned from a match is the absolute value of
the difference between the strengths of the two teams. Given the
strengths of the N teams, find the total advertising revenue earned from all
the matches.
Input format
Line 1 : A single integer, N.
Line 2 : N space-separated integers, the strengths of the N teams.
#include<iostream>
using namespace std;
int main()
{
int n;
cin>>n;
int stren[200000];
for(int a=0;a<n;a++)
cin>>stren[a];
long long rev=0;
for(int b=0;b<n;b++)
{
int pos=b;
for(int c=pos;c<n;c++)
{
if(stren[pos]>stren[c])
rev+=(long long)(stren[pos]-stren[c]);
else
rev+=(long long)(stren[c]-stren[pos]);
}
}
cout<<rev;
}
Can you please give me a solution??
Rewrite your loop as:
std::sort(stren, stren + n); // requires #include <algorithm>
for(int b=0;b<n;b++)
{
rev += (2 * b - n + 1) * static_cast<long long>(stren[b]);
}
Why does it work?
Your loops form every pair of 2 numbers and add the difference to rev. So in a sorted array, the bth item is subtracted (n-1-b) times and added b times. Hence the factor 2 * b - n + 1.
There can be 1 micro optimization that possibly is not needed:
std::sort(stren, stren + n); // requires #include <algorithm>
for(int b = 0, m = 1 - n; b < n; b++, m += 2)
{
rev += m * static_cast<long long>(stren[b]);
}
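To make the sorted formula easy to check, here is a small self-contained sketch that wraps it in a function and puts the original double loop next to it for comparison. The function names totalRevenue and bruteForceRevenue are invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(N log N): after sorting, element b is added b times and subtracted (n - 1 - b) times.
long long totalRevenue(std::vector<int> strengths) {
    std::sort(strengths.begin(), strengths.end());
    const long long n = static_cast<long long>(strengths.size());
    long long rev = 0;
    for (long long b = 0; b < n; ++b) {
        rev += (2 * b - n + 1) * strengths[static_cast<std::size_t>(b)];
    }
    return rev;
}

// O(N^2) reference: sum of |difference| over every pair, as in the question.
long long bruteForceRevenue(const std::vector<int>& strengths) {
    long long rev = 0;
    for (std::size_t b = 0; b < strengths.size(); ++b) {
        for (std::size_t c = b + 1; c < strengths.size(); ++c) {
            const long long diff = static_cast<long long>(strengths[b]) - strengths[c];
            rev += diff < 0 ? -diff : diff;
        }
    }
    return rev;
}
```

For the strengths 3 10 3 5 both functions give 23.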
In place of the if statement, use
rev += std::abs(stren[pos]-stren[c]);
abs returns the absolute value of its argument, here the positive difference between the two integers. This is likely to be quicker than an if test and the ensuing branching. The (long long) cast is also unnecessary although the compiler will probably optimise that out.
There are other optimisations you could make, but this one should do it. If your abs function is poorly implemented on your system, you could always make use of this fast version for computing the absolute value of i:
(i + (i >> 31)) ^ (i >> 31) for a 32 bit int.
This has no branching at all and would beat even an inline ternary! (But you should use int32_t as your data type; if you have 64 bit int then you'll need to adjust my formula.) But we are in the realms of micro-optimisation here.
for(int b = 0; b < n; b++)
{
for(int c = b; c < n; c++)
{
rev += abs(stren[b]-stren[c]);
}
}
This should give you a speed increase, might be enough.
An interesting approach might be to collapse the strengths from an array into counts per distinct value, if the number of distinct strengths is fairly small.
So:
std::unordered_map<int, int> strengths;
for (int i = 0; i < n; ++i) {
int next;
cin >> next;
++strengths[next];
}
This way, we can reduce the number of things we have to sum:
long long rev = 0;
for (auto a = strengths.begin(); a != strengths.end(); ++a) {
for (auto b = std::next(a); b != strengths.end(); ++b) {
rev += static_cast<long long>(abs(a->first - b->first)) * a->second * b->second;
// strength difference times the number of pairs with these two strengths
}
}
cout << rev;
If the strengths tend to be repeated a lot, this could save a lot of cycles.
What exactly we are doing in this problem is: For all combinations of pairs of elements, we are adding up the absolute values of the differences between the elements of the pair. i.e. Consider the sample input
3 10 3 5
Ans (Take only absolute values) = (3-10) + (3-3) + (3-5) + (10-3) + (10-5) + (3-5) = 7 + 0 + 2 + 7 + 5 + 2 = 23
Notice that I have fixed 3, iterated through the remaining elements, found the differences and added them to Ans, then fixed 10, iterated through the remaining elements and so on till the last element
Unfortunately, N(N-1)/2 iterations are required for the above procedure, which wouldn't be ok for the time limit.
Could we better it?
Let's sort the array and repeat this procedure. After sorting, the sample input is now 3 3 5 10
Let's start by fixing the greatest element, 10 and iterating through the array like how we did before (of course, the time complexity is the same)
Ans = (10-3) + (10-3) + (10-5) + (5-3) + (5-3) + (3-3) = 7 + 7 + 5 + 2 + 2 = 23
We could rearrange the above as
Ans = (10)(3)-(3+3+5) + 5(2) - (3+3) + 3(1) - (3)
Notice a pattern? Let's generalize it.
Suppose we have an array of strengths arr[N] of size N indexed from 0
Ans = (arr[N-1])(N-1) - (arr[0] + arr[1] + ... + arr[N-2]) + (arr[N-2])(N-2) - (arr[0] + arr[1] + ... + arr[N-3]) + (arr[N-3])(N-3) - (arr[0] + arr[1] + ... + arr[N-4]) + ... and so on
Right. So let's put this new idea to work. We'll introduce a 'sum' variable. Some basic DP to the rescue.
For i=0 to N-2
sum = sum + arr[i]
Ans = Ans + (arr[i+1]*(i+1)-sum)
That's it, you just have to sort the array and iterate only once through it. Excluding the sorting part, it's down to about N iterations from N(N-1)/2, which is O(N) time. EDIT: Including the sort, that is O(N log N) time overall.
Hope it helped!
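The running-sum loop above could look like this in C++. This is only a sketch; the function name totalRevenuePrefix is an assumption, and the loop stops at i = N-2 so that arr[i+1] stays in bounds.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts a copy of the strengths, then adds arr[i+1] * (i+1) - (arr[0] + ... + arr[i]) for each i.
long long totalRevenuePrefix(std::vector<int> arr) {
    std::sort(arr.begin(), arr.end());
    long long sum = 0;  // arr[0] + ... + arr[i]
    long long ans = 0;
    for (std::size_t i = 0; i + 1 < arr.size(); ++i) {
        sum += arr[i];
        ans += static_cast<long long>(arr[i + 1]) * static_cast<long long>(i + 1) - sum;
    }
    return ans;
}
```

On the sample 3 10 3 5 the three loop steps contribute 0, 4 and 19, which adds up to 23.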

Dynamic Programming solution for a Recursion solution

Given an input n , find the sum of all the possible combinations of numbers 1 ... n.
For example, if n=3 , then all the possible combinations are
(1),(2),(3),(1,2),(1,3),(2,3),(1,2,3)
and their sum is
1 + 2 + 3 + (1+2) + (1+3) + (2+3) + (1+2+3) =24
I am able to solve this problem using recursion. How can I solve this problem using Dynamic Programming ?
#include<iostream>
using namespace std;
int sum=0,n;
void f(int pos,int s)
{
if(pos>n)
{
return;
}
else
{
for(int i=pos+1;i<=n;++i)
{
sum+=s+i;
f(i,s+i);
}
}
}
int main()
{
cin>>n;
sum=0;
f(0,0);
cout<<sum<<'\n';
}
EDIT
Though this problem can be solved in constant time using this series.
But I want to know how this can be done using Dynamic Programming as I am very weak at it.
You do not need to use dynamic programming; you can use simple arithmetic if you want.
The number of cases is 2 ^ n, since each number is either on or off for a given sum.
Each number from 1 to n is used in exactly half of the sums, so each number comes 2 ^ (n-1) times.
1 + 2 + ... + n = (n + 1) * n / 2.
So the sum is (n + 1) * n / 2 * 2 ^ (n-1).
For n = 3, it is (4*3/2) * 4 = 24.
EDIT: if you really want to use dynamic programming, here's one way.
Dynamic programming makes use of saving the results of sub-problems to make the super problem faster to solve. In this question, the sub-problem would be the sum of all combinations from 1 ... n-1.
So create a mapping from n -> (number of combinations, sum of combinations).
Initialize with 1 -> (2,1), because there are two combinations, {} and {1}, with sums 0 and 1, so the total is 1. Including the empty combination just makes the math a bit easier.
Then your iteration step is to use the mapping.
Let's say (n-1) -> (k,s), meaning there are k sets whose sums add up to s in total for 1 ... n-1.
Then the number of sets for n is k * 2 (each combination either has n or does not).
And the sum of all combinations is s + (s + k * n), since you have the previous sum (where n is missing) plus the sum of all the combinations with n (which should be k * n more than s because there are k new combinations with n in each).
So add n -> (2*k,2*s + k*n).
And your final answer is the s in n -> (k,s).
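A minimal sketch of that iteration, with the closed form from the arithmetic argument next to it as a cross-check. The names subsetCountAndSum and closedFormSum are invented here, the iteration starts one step earlier at 0 -> (1,0) (only the empty combination), and n is limited to 54 as an assumption so that the 64-bit sum cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns (k, s): the number of combinations of 1..n (including the empty one)
// and the sum of all their sums, built up with n -> (2*k, 2*s + k*n).
std::pair<std::uint64_t, std::uint64_t> subsetCountAndSum(unsigned n) {
    assert(n <= 54);  // keeps s within 64 bits
    std::uint64_t k = 1;  // for n = 0: only the empty combination
    std::uint64_t s = 0;
    for (unsigned m = 1; m <= n; ++m) {
        s = 2 * s + k * m;  // old combinations, plus the same ones with m added
        k = 2 * k;
    }
    return {k, s};
}

// Closed form: (1 + 2 + ... + n) * 2^(n-1).
std::uint64_t closedFormSum(unsigned n) {
    assert(n >= 1 && n <= 54);
    const std::uint64_t triangular = static_cast<std::uint64_t>(n) * (n + 1) / 2;
    return triangular << (n - 1);
}
```

For n = 3 both give 24, and the count is 8.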
Let dp[n] be the result. Then:
dp[1] = 1
dp[n] = 2 * dp[n-1] + 2^(n-1) * n
First, it is obvious that dp[1] = 1.
Second, dp[n] is the sum over the combinations that do not contain n plus the sum over those that do contain n.
E.g.: dp[3] = {(1), (2), (1,2)} + {(3), (1,3), (2,3), (1,2,3)}
We can see that dp[n-1] appears twice and the number n appears 2^(n-1) times.
I think this may be what you want.

Calculating Binomial Coefficient (nCk) for large n & k

I just saw this question and have no idea how to solve it. can you please provide me with algorithms , C++ codes or ideas?
This is a very simple problem. Given the value of N and K, you need to tell us the value of the binomial coefficient C(N,K). You may rest assured that K <= N and the maximum value of N is 1,000,000,000,000,000. Since the value may be very large, you need to compute the result modulo 1009.
Input
The first line of the input contains the number of test cases T, at most 1000. Each of the next T lines consists of two space separated integers N and K, where 0 <= K <= N and 1 <= N <= 1,000,000,000,000,000.
Output
For each test case, print on a new line, the value of the binomial coefficient C(N,K) modulo 1009.
Example
Input:
3
3 1
5 2
10 3
Output:
3
10
120
Notice that 1009 is a prime.
Now you can use Lucas' Theorem.
Which states:
Let p be a prime.
If n = a1a2...ar when written in base p and
if k = b1b2...br when written in base p
(pad with zeroes if required)
Then
(n choose k) modulo p = (a1 choose b1) * (a2 choose b2) * ... * (ar choose br) modulo p.
i.e. remainder of n choose k when divided by p is same as the remainder of
the product (a1 choose b1) * .... * (ar choose br) when divided by p.
Note: if bi > ai then ai choose bi is 0.
Thus your problem is reduced to finding the product modulo 1009 of at most log N/log 1009 numbers (number of digits of N in base 1009) of the form a choose b where a < 1009 and b < 1009.
This should make it easier even when N is close to 10^15.
Note:
For N=10^15, N choose N/2 is more than
2^(100000000000000) which is way
beyond an unsigned long long.
Also, the algorithm suggested by
Lucas' theorem is O(log N) which is
exponentially faster than trying to
compute the binomial coefficient
directly (even if you did a mod 1009
to take care of the overflow issue).
Here is some code for Binomial I had written long back, all you need to do is to modify it to do the operations modulo 1009 (there might be bugs and not necessarily recommended coding style):
class Binomial
{
public:
Binomial(int Max)
{
max = Max+1;
table = new unsigned int * [max]();
for (int i=0; i < max; i++)
{
table[i] = new unsigned int[max]();
for (int j = 0; j < max; j++)
{
table[i][j] = 0;
}
}
}
~Binomial()
{
for (int i =0; i < max; i++)
{
delete[] table[i];
}
delete[] table;
}
unsigned int Choose(unsigned int n, unsigned int k);
private:
bool Contains(unsigned int n, unsigned int k);
int max;
unsigned int **table;
};
unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
if (n < k) return 0;
if (k == 0 || n==1 ) return 1;
if (n==2 && k==1) return 2;
if (n==2 && k==2) return 1;
if (n==k) return 1;
if (Contains(n,k))
{
return table[n][k];
}
table[n][k] = Choose(n-1,k) + Choose(n-1,k-1);
return table[n][k];
}
bool Binomial::Contains(unsigned int n, unsigned int k)
{
if (table[n][k] == 0)
{
return false;
}
return true;
}
Binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.
Observe that if 1009, (including multiples of it), appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of (1009)^2 as two, and so on.
After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.
Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer), since 1 ... 1008 are the p-1 non-zero elements of the prime field over p. Therefore they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses.
So for example consider the case of C((1009^3), 200).
Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.
On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.
That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.
We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).
Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!
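One way to turn this counting argument into code, as a sketch only: count the factors of p with Legendre's formula, compute each factorial with the multiples of p divided out (each full run 1 ... p-1 contributes -1), and do the final division with a modular inverse from Fermat's little theorem. All function names are made up for the illustration, and the factorial table is rebuilt on every call purely to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Exponent of the prime p in n! (Legendre's formula).
u64 exponentInFactorial(u64 n, u64 p) {
    u64 e = 0;
    while (n > 0) { n /= p; e += n; }
    return e;
}

// n! with every factor p divided out, reduced mod p.
// fact[i] must hold i! mod p for 0 <= i < p.
u64 factorialWithoutP(u64 n, u64 p, const std::vector<u64>& fact) {
    u64 result = 1;
    while (n > 0) {
        if ((n / p) % 2 == 1) result = p - result;  // odd number of full runs, each worth -1
        result = result * fact[n % p] % p;          // the incomplete run at the end
        n /= p;                                     // continue with the cancelled multiples of p
    }
    return result;
}

// base^exp mod p by repeated squaring.
u64 powMod(u64 base, u64 exp, u64 p) {
    u64 result = 1;
    base %= p;
    while (exp > 0) {
        if (exp & 1) result = result * base % p;
        base = base * base % p;
        exp >>= 1;
    }
    return result;
}

// C(n, k) mod p for a small prime p such as 1009.
u64 binomialModPrime(u64 n, u64 k, u64 p) {
    assert(p >= 2);
    if (k > n) return 0;
    // More factors p on top than on the bottom: the result is 0 mod p.
    if (exponentInFactorial(n, p) >
        exponentInFactorial(k, p) + exponentInFactorial(n - k, p)) return 0;
    std::vector<u64> fact(p, 1);
    for (u64 i = 1; i < p; ++i) fact[i] = fact[i - 1] * i % p;
    const u64 top = factorialWithoutP(n, p, fact);
    const u64 bottom = factorialWithoutP(k, p, fact) * factorialWithoutP(n - k, p, fact) % p;
    return top * powMod(bottom, p - 2, p) % p;  // division via the inverse bottom^(p-2)
}
```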
I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14), will require something like 1e15 bits, i.e. more than 100 terabytes.
Moreover, executing the loop in snakile's answer up to 5e14 times seems like it might take a while.
What you might use is, if
n = n0 + n1*p + n2*p^2 ... + nd*p^d
m = m0 + m1*p + m2*p^2 ... + md*p^d
(where 0<=mi,ni < p)
then
C(n,m) = C(n0,m0) * C(n1,m1) * ... * C(nd,md) mod p
see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html
One way would be to use Pascal's triangle to build a table of all C(n,m) mod p for 0<=m<=n<p (here p = 1009).
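Putting the digit-by-digit product and the Pascal's triangle table together might look like the following sketch. The names P, buildPascalTable and binomialMod1009 are assumptions for the example; the table is meant to be built once and reused for all test cases.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int P = 1009;  // the prime modulus from the problem statement

// Pascal's triangle mod P: table[n][k] = C(n, k) mod P for 0 <= k <= n < P, 0 elsewhere.
std::vector<std::vector<int>> buildPascalTable() {
    std::vector<std::vector<int>> table(P, std::vector<int>(P, 0));
    for (int n = 0; n < P; ++n) {
        table[n][0] = 1;
        for (int k = 1; k <= n; ++k) {
            table[n][k] = (table[n - 1][k - 1] + table[n - 1][k]) % P;
        }
    }
    return table;
}

// C(n, k) mod P: multiply C(ni, ki) over the base-P digits of n and k.
int binomialMod1009(std::uint64_t n, std::uint64_t k,
                    const std::vector<std::vector<int>>& table) {
    int result = 1;
    while (n > 0 || k > 0) {
        const std::size_t nDigit = static_cast<std::size_t>(n % P);
        const std::size_t kDigit = static_cast<std::size_t>(k % P);
        // The table entry is 0 whenever the digit of k exceeds the digit of n.
        result = result * table[nDigit][kDigit] % P;
        n /= P;
        k /= P;
    }
    return result;
}
```

Since 1e15 has only five digits in base 1009, each query needs at most five table lookups.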
pseudo code for calculating nCk:
result = 1
for i=1 to min{K,N-K}:
result *= N-i+1
result /= i
return result
Time Complexity: O(min{K,N-K})
The loop goes from i=1 to min{K,N-K} instead of from i=1 to K, and that's ok because
C(n,k) = C(n, n-k)
And you can calculate the thing even more efficiently if you use the GammaLn function, although the result is then only a floating-point approximation.
nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))
The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function but that algorithm isn't trivial at all.
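In C++ the log-gamma function is available as std::lgamma in <cmath>, so the formula can be sketched as below. The function name binomialViaLgamma is made up for the example. Because the result is a rounded double, this is suitable for estimates only, not for computing an exact value modulo 1009.

```cpp
#include <cassert>
#include <cmath>

// Approximate C(n, k) as exp(lnGamma(n+1) - lnGamma(k+1) - lnGamma(n-k+1)).
double binomialViaLgamma(double n, double k) {
    assert(0.0 <= k && k <= n);
    return std::exp(std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0));
}
```

For small arguments the value is very close to the exact integer, so rounding it recovers the coefficient.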
The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.
binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5
do while col < n
binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
output(binomial_coefficient)
col = col + 1
loop
The output of binomial coefficients is therefore:
1
1 * (5 + 1 - (0 + 1)) / (0 + 1) = 5
5 * (5 + 1 - (1 + 1)) / (1 + 1) = 10
10 * (5 + 1 - (2 + 1)) / (2 + 1) = 10
10 * (5 + 1 - (3 + 1)) / (3 + 1) = 5
5 * (5 + 1 - (4 + 1)) / (4 + 1) = 1
I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(
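The same loop written in C++, as a small sketch. The function name binomialRow is an assumption, and n is capped at 60 here so that the intermediate product stays within 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the row C(n, 0), C(n, 1), ..., C(n, n) using
// next = current * (n - col) / (col + 1).
std::vector<std::uint64_t> binomialRow(unsigned n) {
    assert(n <= 60);  // conservative bound to avoid 64-bit overflow
    std::vector<std::uint64_t> row;
    std::uint64_t coefficient = 1;
    row.push_back(coefficient);
    for (unsigned col = 0; col < n; ++col) {
        // Multiply first, then divide: the product is always divisible by (col + 1).
        coefficient = coefficient * (n - col) / (col + 1);
        row.push_back(coefficient);
    }
    return row;
}
```

For n = 5 this yields 1, 5, 10, 10, 5, 1.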