Numbers of common distinct difference - c++

Given two array A and B. Task to find the number of common distinct (difference of elements in two arrays).
Example :
A=[3,6,8]
B=[1,6,10]
so we get differenceSet for A
differenceSetA=[abs(3-6),abs(6-8),abs(8-3)]=[3,5,2]
similiarly
differenceSetB=[abs(1-6),abs(1-10),abs(6-10)]=[5,9,4]
Number of common elements=Intersection :{differenceSetA,differenceSetB}={5}
Answer= 1
My approach O(N^2)
int commonDifference(vector<int> A,vector<int> B){
int n=A.size();
int m=B.size();
unordered_set<int> differenceSetA;
unordered_set<int> differenceSetB;
for(int i=0;i<n;i++){
for(int j=i+1;j<n;j++){
differenceSetA.insert(abs(A[i]-A[j]));
}
}
for(int i=0;i<m;i++){
for(int j=i+1;j<m;j++){
differenceSetB.insert(abs(B[i]-B[j]));
}
}
int count=0;
for(auto &it:differenceSetA){
if(differenceSetB.find(it)!=differenceSetB.end()){
count++;
}
}
return count;
}
Please provide suggestions for optimizing the approach in O(N log N)

If n is the maximum range of a input array, then the set of all differences of a given array can be obtained in O(n logn), as explained in this SO post: find all differences in a array
Here is a brief recall of the method, with a few additional practical implementation details:
Create an array Posi of length 2*n = 2*range = 2*(Vmax - Vmin + 1), where elements whose index matches an element of the input are set to 1, other elements are set to 0. This can be created in O(m), where m is the size of the array.
For example, given in input array [1,4,5] of size m, we create an array [1,0,0,1,1].
Initialisation: Posi[i] = 0 for all i (i = 0 to 2*n)
Posi[A[i] - Vmin] = 1 (i = 0 to m)
Calculate the autocorrelation function of array Posi[]. This can be classically performed in three sub-steps
2.1 Calculate the FFT (size 2*n) of Posi[]array: Y[] = FFT(Posi)
2.2 Calculate the square amplitude of the result: Y2[k] = Y[k] * conj([Y[k])
2.3 Calculate the Inverse FFT of the result Diff[] = IFFT (Y2[])`
A few details are worth being mentioned here:
The reason why a size 2*n was selected, and not a size n, if that, is d is a valid difference, then -d is also a valid difference. The results corresponding to negative differences are available at positions i >= n
If you find more easy to perform FFT with a size a-power-of-two, than you can replace the size 2*n with a value n2k = 2^k, with n2k >= 2*n
The non-null differences correspond to non-null values in the array Diff[]:
`d` is a difference if `Diff[d] > 0`
Another important details is that a classical FFT is used (float calculations), then you encounter little errors. To take it into account, it is important to replace the IFFT output Diff[] with integer rounded values of the real part.
All that concerns one array only. As you want to calculate the number of common differences, then you have to:
calculate the arrays Diff_A[] and Diff_B[] for both sets A and B and then:
count = 0;
if (Diff_A[d] != 0) and (Diff_B[d] != 0) then count++;
A little Bonus
In order to avoid a plagiarism of the mentioned post, here is an additional explanation about the way to get the differences of one set, with the help of the FFT.
The input array A = {3, 6, 8} can mathematically be represented by the following z transform:
A(z) = z^3 + z^6 + z^8
Then the corresponding z-transform of the difference array is equal to the polynomial product:
D(z) = A(z) * A(z*) = (z^3 + z^6 + z^8) (z^(-3) + z^(-6) + z^(-8))
= z^(-5) + z^(-3) + z^(-2) + 3 + z^2 + z^3 + z^5
Then, we can note that A(z) is equal to a FFT of size N of the sequence [0 0 0 1 0 0 1 0 1] by taking:
z = exp (-i * 2 PI/ N), with i = sqrt(-1)
Note that here we consider the classical FFT in C, the complex field.
It is certainly possible to perform calculation in a Galois field, and then no rounding errors, as it is done for example to implement "classical" multiplications (with z = 10) for a large number of digits. This seems over-skilled here.

Related

A problem of taking combination for set theory

Given an array A with size N. Value of a subset of Array A is defined as product of all numbers in that subset. We have to return the product of values of all possible non-empty subsets of array A %(10^9+7).
E.G. array A {3,5}
` Value{3} = 3,
Value{5} = 5,
Value{3,5} = 5*3 = 15
answer = 3*5*15 %(10^9+7).
Can someone explain the mathematics behind the problem. I am thinking of solving it by combination to solve it efficiently.
I have tried using brute force it gives correct answer but it is way too slow.
Next approach is using combination. Now i think that if we take all the sets and multiply all the numbers in those set then we will get the correct answer. Thus i have to find out how many times a number is coming in calculation of answer. In the example 5 and 3 both come 2 times. If we look closely, each number in a will come same number of times.
You're heading in the right direction.
Let x be an element of the given array A. In our final answer, x appears p number of times, where p is equivalent to the number of subsets of A possible that include x.
How to calculate p? Once we have decided that we will definitely include x in our subset, we have two choices for the rest N-1 elements: either include them in set or do not. So, we conclude p = 2^(N-1).
So, each element of A appears exactly 2^(N-1) times in the final product. All remains is to calculate the answer: (a1 * a2 * ... * an)^p. Since the exponent is very large, you can use binary exponentiation for fast calculation.
As Matt Timmermans suggested in comments below, we can obtain our answer without actually calculating p = 2^(N-1). We first calculate the product a1 * a2 * ... * an. Then, we simply square this product n-1 times.
The corresponding code in C++:
int func(vector<int> &a) {
int n = a.size();
int m = 1e9+7;
if(n==0) return 0;
if(n==1) return (m + a[0]%m)%m;
long long ans = 1;
//first calculate ans = (a1*a2*...*an)%m
for(int x:a){
//negative sign does not matter since we're squaring
if(x<0) x *= -1;
x %= m;
ans *= x;
ans %= m;
}
//now calculate ans = [ ans^(2^(n-1)) ]%m
//we do this by squaring ans n-1 times
for(int i=1; i<n; i++){
ans = ans*ans;
ans %= m;
}
return (int)ans;
}
Let,
A={a,b,c}
All possible subset of A is ={{},{a},{b},{c},{a,b},{b,c},{c,a},{a,b,c,d}}
Here number of occurrence of each of the element are 4 times.
So if A={a,b,c,d}, then numbers of occurrence of each of the element will be 2^3.
So if the size of A is n, number of occurrence of eachof the element will be 2^(n-1)
So final result will be = a1^p*a2^pa3^p....*an^p
where p is 2^(n-1)
We need to solve x^2^(n-1) % mod.
We can write x^2^(n-1) % mod as x^(2^(n-1) % phi(mod)) %mod . link
As mod is a prime then phi(mod)=mod-1.
So at first find p= 2^(n-1) %(mod-1).
Then find Ai^p % mod for each of the number and multiply with the final result.
I read the previous answers and I was understanding the process of making sets. So here I am trying to put it in as simple as possible for people so that they can apply it to similar problems.
Let i be an element of array A. Following the approach given in the question, i appears p number of times in final answer.
Now, how do we make different sets. We take sets containing only one element, then sets containing group of two, then group of 3 ..... group of n elements.
Now we want to know for every time when we are making set of certain numbers say group of 3 elements, how many of these sets contain i?
There are n elements so for sets of 3 elements which always contains i, combinations are (n-1)C(3-1) because from n-1 elements we can chose 3-1 elements.
if we do this for every group, p = [ (n-1)C(x-1) ] , m going from 1 to n. Thus, p= 2^(n-1).
Similarly for every element i, p will be same. Thus we get
final answer= A[0]^p *A[1]^p...... A[n]^p

Given an integer n, return the number of ways it can be represented as a sum of 1s and 2s

For example:
5 = 1+1+1+1+1
5 = 1+1+1+2
5 = 1+1+2+1
5 = 1+2+1+1
5 = 2+1+1+1
5 = 1+2+2
5 = 2+2+1
5 = 2+1+2
Can anyone give a hint for a pseudo code on how this can be done please.
Honestly have no clue how to even start.
Also this looks like an exponential problem can it be done in linear time?
Thank you.
In the example you have provided order of addends is important. (See the last two lines in your example). With this in mind, the answer seems to be related to Fibonacci numbers. Let's F(n) be the ways n can be written as 1s and 2s. Then the last addened is either 1 or 2. So F(n) = F(n-1) + F(n-2). These are the initial values:
F(1) = 1 (1 = 1)
F(2) = 2 (2 = 1 + 1, 2 = 2)
This is actually the (n+1)th Fibonacci number. Here's why:
Let's call f(n) the number of ways to represent n. If you have n, then you can represent it as (n-1)+1 or (n-2)+2. Thus the ways to represent it are the number of ways to represent it is f(n-1) + f(n-2). This is the same recurrence as the Fibonacci numbers. Furthermore, we see if n=1 then we have 1 way, and if n=2 then we have 2 ways. Thus the (n+1)th Fibonacci number is your answer. There are algorithms out there to compute enormous Fibonacci numbers very quickly.
Permutations
If we want to know how many possible orderings there are in some set of size n without repetition (i.e., elements selected are removed from the available pool), the factorial of n (or n!) gives the answer:
double factorial(int n)
{
if (n <= 0)
return 1;
else
return n * factorial(n - 1);
}
Note: This also has an iterative solution and can even be approximated using the gamma function:
std::round(std::tgamma(n + 1)); // where n >= 0
The problem set starts with all 1s. Each time the set changes, two 1s are replaced by one 2. We want to find the number of ways k items (the 2s) can be arranged in a set of size n. We can query the number of possible permutations by computing:
double permutation(int n, int k)
{
return factorial(n) / factorial(n - k);
}
However, this is not quite the result we want. The problem is, permutations consider ordering, e.g., the sequence 2,2,2 would count as six distinct variations.
Combinations
These are essentially permutations which ignore ordering. Since the order no longer matters, many permutations are redundant. Redundancy per permutation can be found by computing k!. Dividing the number of permutations by this value gives the number of combinations:
Note: This is known as the binomial coefficient and should be read as "n choose k."
double combination(int n, int k)
{
return permutation(n, k) / factorial(k);
}
int solve(int n)
{
double result = 0;
if (n > 0) {
for ( int k = 0; k <= n; k += 1, n -= 1 )
result += combination(n, k);
}
return std::round(result);
}
This is a general solution. For example, if the problem were instead to find the number of ways an integer can be represented as a sum of 1s and 3s, we would only need to adjust the decrement of the set size (n-2) at each iteration.
Fibonacci numbers
The reason the solution using Fibonacci numbers works, has to do with their relation to the binomial coefficients. The binomial coefficients can be arranged to form Pascal's triangle, which when stored as a lower-triangular matrix, can be accessed using n and k as row/column indices to locate the element equal to combination(n,k).
The pattern of n and k as they change over the lifetime of solve, plot a diagonal when viewed as coordinates on a 2-D grid. The result of summing values along a diagonal of Pascal's triangle is a Fibonacci number. If the pattern changes (e.g., when finding sums of 1s and 3s), this will no longer be the case and this solution will fail.
Interestingly, Fibonacci numbers can be computed in constant time. Which means we can solve this problem in constant time simply by finding the (n+1)th Fibonacci number.
int fibonacci(int n)
{
constexpr double SQRT_5 = std::sqrt(5.0);
constexpr double GOLDEN_RATIO = (SQRT_5 + 1.0) / 2.0;
return std::round(std::pow(GOLDEN_RATIO, n) / SQRT_5);
}
int solve(int n)
{
if (n > 0)
return fibonacci(n + 1);
return 0;
}
As a final note, the numbers generated by both the factorial and fibonacci functions can be extremely large. Therefore, a large-maths library may be needed if n will be large.
Here is the code using backtracking which solves your problem. At each step, while remembering the numbers used to get the sum so far(using vectors here), first make a copy of them, first subtract 1 from n and add it to the copy then recur with n-1 and the copy of the vector with 1 added to it and print when n==0. then return and repeat the same for 2, which essentially is backtracking.
#include <stdio.h>
#include <vector>
#include <iostream>
using namespace std;
int n;
void print(vector<int> vect){
cout << n <<" = ";
for(int i=0;i<vect.size(); ++i){
if(i>0)
cout <<"+" <<vect[i];
else cout << vect[i];
}
cout << endl;
}
void gen(int n, vector<int> vect){
if(!n)
print(vect);
else{
for(int i=1;i<=2;++i){
if(n-i>=0){
std::vector<int> vect2(vect);
vect2.push_back(i);
gen(n-i,vect2);
}
}
}
}
int main(){
scanf("%d",&n);
vector<int> vect;
gen(n,vect);
}
This problem can be easily visualized as follows:
Consider a frog, that is present in front of a stairway. It needs to reach the n-th stair, but he can only jump 1 or 2 steps on the stairway at a time. Find the number of ways in which he can reach the n-th stair?
Let T(n) denote the number of ways to reach the n-th stair.
So, T(1) = 1 and T(2) = 2(2 one-step jumps or 1 two-step jump, so 2 ways)
In order to reach the n-th stair, we already know the number of ways to reach the (n-1)th stair and the (n-2)th stair.
So, once can simple reach the n-th stair by a 1-step jump from (n-1)th stair or a 2-step jump from (n-2)th step...
Hence, T(n) = T(n-1) + T(n-2)
Hope it helps!!!

Calculate this factorial term in C++ with basic datatypes

I am solving a programming problem, and in the end the problem boils down to calculating following term:
n!/(n1!n2!n3!....nm!)
n<50000
(n1+n2+n3...nm)<n
I am given that the final answer will fit in 8 byte. I am using C++. How should I calculate this. I am able to come up with some tricks but nothing concrete and generalized.
EDIT:
I would not like to use external libraries.
EDIT1 :
Added conditions and result will be definitely 64 bit int.
If the result is guaranteed to be an integer, work with the factored representation.
By the theorem of Legendre, you can express all these factorials by the sequence of exponents of the primes in the range (2,n).
By deducting the exponents of the factorials in the denominator from those in the numerator, you will obtain exponents for the whole quotient. The computation will then reduce to a product of primes that will never overflow the 8 bytes.
For example,
25! = 2^22.3^10.5^6.7^3.11^2.13.17.19.23
15! = 2^11.3^6.5^3.7^2.11.13
10! = 2^8.3^4.5^2.7
yields
25!/(15!.10!) = 2^3.5.11.17.19.23 = 3268760
The exponents of, say, 3 are found by
25/3 + 25/9 = 10
15/3 + 15/9 = 6
10/3 + 10/9 = 4
If all the input (not necessarily the output) is made of integers, you could try to count prime factors. You create an array of size sqrt(n) and fill it with the counts of each prime factor in n :
vector <int> v = vector <int> (sqrt(n)+1,0);
int m = 2;
while (m <=n) {
int i = 2;
int a = m;
while (a >1) {
while (a%i ==0) {
v[i] ++;
a/=i;
}
i++;
}
m++;
}
Then you iterate over the n_k (1 <= k <= m) and you decrease the count for each prime factor. This is pretty much the same code as above except that you replace the v[i]++ by v[i] --. Of course you need to call it with vector v previously obtained.
After that the vector v contains the list of count of prime factors in your expression and you just need to reconstruct the result as
int result = 1;
for (int i = 2; i < v.size(); v++) {
result *= pow(i,v[i]);
}
return result;
Note : you should use long long int instead of int above but I stick to int for simplicity
Edit : As mentioned in another answer, it would be better to use Legendre theorem to fill / unfill the vector v faster.
What you can do is to use the properties of the logarithm:
log(AB) = log(A) + log(B)
log(A/B) = log(A) - log(B)
and
X = e^(log(X))
So you can first compute the logarithm of your quantity, then exponentiate back:
log(N!/(n1!n2!...nk!)) = log(1) + ... + log(N) - [log(n1!) - ... log(nk!)]
then expand log(n1!) etc. so you end up writing everything in terms of logarithm of single numbers. Then take the exponential of your result to obtain the initial value of the factorial.
As #T.C. mentioned, this method may not be to accurate, although in typical scenarios you'll have many terms reduced. Alternatively, you expand each factorial into a list that stores the terms in its product, e.g. 6! will be stored in a list {1,2,3,4,5,6}. You do the same for the denominator terms. Then you start removing common elements. Finally, you can take gcd's and reduce everything to coprime factors, then compute the result.

Sum of difference of a number to an array of numbers

This is my problem.
Given an array of integers and another integer k, find the sum of differences of each element of the array and k.
For example if the array is 2, 4, 6, 8, 10 and k is 3
Sum of difference
= abs(2 - 3) + abs(4-3) + abs(6 - 3) + abs(8 - 3) + abs(10 - 3)
= 1 + 1 + 3 + 5 + 7
= 17
The array remains the same throughout and can contain up to 100000 elements and there will be 100000 different values of k to be tested. k may or may not be an element of the array. This has to be done within 1s or about 100M operations. How do I achieve this?
You can run multiple queries for sums of absolute differences in O(log N) if you add a preprocessing step which costs O(N * log N).
Sort the array, then for each item in the array store the sum of all numbers that are smaller than or equal to the corresponding item. This can be done in O(N * log N) Now you have a pair of arrays that look like this:
2 4 6 8 10 // <<== Original data
2 6 12 20 30 // <<== Partial sums
In addition, store the total T of all numbers in the array.
Now you can get sums of absolute differences by running a binary search on the original array, and using the sums from the partial sums array to compute the answer: subtract the sum of all numbers to the left of the target k from the count of numbers to the left of the target times k, then subtract the count times k from the sum to the right of the number, and add the two numbers together. The partial sum of the numbers to the right of the number can be computed by subtracting the partial sum on the left from the total T.
For k=3 binary search gets you to position 1.
Partial sum on the left is 2
Count of items on the left is 1
Partial sum on the right is (30-2)=28
Count of items on the right is 4
You compute (1*3-2) + (28-4*3) = 1 + 16 = 17
First sort the array and then compute an array that stores the sum of the prefixes of the resulting sorted array. Let's denote this array p, you can compute p in linear time so that p[i] = a[0] + a[1] + ... a[i]. Now having this array you can answer with constant complexity the question what is the sum of elements a[x] + a[x+1] + .... +a[y](i.e. with indices x to y). To do that you simply compute p[y] - p[x-1](Take special care when x is 1).
Now to answer a query of the type what is the sum of absolute differences with k, we will split the problem in two parts - what is the sum of the numbers greater than k and the numbers smaller than k. In order to compute these, perform a binary search to find the position of k in the sorted a(denote that idx), and compute the sum of the values in a before idx(denote that s) and after idx(denote that S). Now the sum of absolute differences with k is idx * k - s + S - (a.length - idx)* k. This of course is pseudo code and what I mean by a.length is the number of elements in a.
After performing a linearithmic precomputation, you will be able to answer a query with O(log(n)). Please note this approach only makes sense if you plan to perform multiple queries. If you are only going to perform a single query, you can not possibly go faster than O(n).
Just implementing dasblinkenlight's solution in "contest C++":
It does exactly as he says. Reads the values, sorts them, stores the accumulated sum in V[i].second, but here V[i] is the acumulated sum until i-1 (to simplify the algorithm). It also stores a sentinel in V[n] for cases when the query is greater than max(V).
Then, for each query, binary search for the value. In this case V[a].second is the sum of values lesser than query, V[n].second-V[a].second is the sum of values greater than it.
#include<iostream>
#include<algorithm>
#define pii pair<int, int>
using namespace std;
pii V[100001];
int main() {
int n;
while(cin >> n) {
for(int i=0; i<n; i++)
cin >> V[i].first;
sort(V, V+n);
V[0].second = 0;
for(int i=1; i<=n; i++)
V[i].second = V[i-1].first + V[i-1].second;
int k; cin >> k;
for(int i=0; i<k; i++) {
int query; cin >> query;
pii* res = upper_bound(V, V+n, pii(query, 0));
int a = res-V, b=n-(res-V);
int left = query*a-V[a].second;
int right = V[n].second-V[a].second-query*b;
cout << left+right << endl;
}
}
}
It assumes a file with a format like this:
5
10 2 8 4 6
2
3 5
Then, for each query, it answers like this:
17
13

C++: function creation using array

Write a function which has:
input: array of pairs (unique id and weight) length of N, K =< N
output: K random unique ids (from input array)
Note: being called many times frequency of appearing of some Id in the output should be greater the more weight it has.
Example: id with weight of 5 should appear in the output 5 times more often than id with weight of 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
thanks for responses everybody!
currently I can't understand how weight of pair affects frequency of appearance of pair in the output, can you give me more clear, "for dummy" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially, I created a temporary array but this can be done in memory as well, you can calculate the size of the list by summing all the weights up = X, in this example = 17
Pick a random number between [0, X-1], and calculate which which id should be returned by looping through the list, doing a cumulative addition on the weights. Say I have a random number 8
(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!
Now since you need K random unique ids you can create a hashtable from initial array passed to you to work with. Once you find an id, remove it from the hash and proceed with algorithm. Edit Note that you create the hashmap initially only once! You algorithm will work on this instead of looking through the array. I did not put in in the top to keep the answer clear
As long as your random calculation is not using any extra memory secretly, you will need to store K random pickings, which are <= N and a copy of the original array so max space requirements at runtime are O(2*N)
Asymptotic runtime is :
O(n) : create copy of original array into hastable +
(
O(n) : calculate sum of weights +
O(1) : calculate random between range +
O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant space (ie: space complexity = O(1)). It does, however modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r, i.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from to following (> i) input weights and also sum_weight.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r, i.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
My short answer: in no way.
Just because the problem definition is incorrect. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is pretty small relative to N, calculated frequencies will be pretty close to theoretical values.
The task may be splitted on two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
// 0 - id, 1 - weight
typedef unsigned Pair[2];
unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
unsigned sumOfWeights = 0;
for (unsigned i = 0; i < i_size; ++i)
{
const unsigned index = i_indexes[i];
sumOfWeights += i_set[index][2];
}
const unsigned random = rand() % sumOfWeights + 1;
sumOfWeights = 0;
unsigned i = 0;
for (; i < i_size; ++i)
{
const unsigned index = i_indexes[i];
sumOfWeights += i_set[index][3];
if (sumOfWeights >= random)
{
break;
}
}
return i;
}
Generate unique random numbers
Well known Durstenfeld-Fisher-Yates algorithm may be used for generation unique random numbers. See this great explanation.
It requires N bytes of space, so if N value is defined at compiled time, we are able to allocate necessary space at compile time.
Now, we have to combine these two algorithms. We just need to use our own Random() function instead of standard rand() in unique numbers generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
unsigned deck[N];
for (unsigned i = 0; i < N; ++i)
{
deck[i] = i;
}
unsigned max = N - 1;
for (unsigned i = 0; i < K; ++i)
{
const unsigned index = Random(i_set, deck, max + 1);
std::swap(deck[max], deck[index]);
o_res[i] = i_set[deck[max]][0];
--max;
}
}
Usage
int main()
{
srand((unsigned)time(0));
const unsigned c_N = 5; // N
const unsigned c_K = 2; // K
Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
unsigned result[c_K] = {};
const unsigned c_total = 1000000; // number of iterations
unsigned counts[c_N] = {0}; // frequency counters
for (unsigned i = 0; i < c_total; ++i)
{
Generate<c_N, c_K>(input, result);
for (unsigned j = 0; j < c_K; ++j)
{
++counts[result[j]];
}
}
unsigned sumOfWeights = 0;
for (unsigned i = 0; i < c_N; ++i)
{
sumOfWeights += input[i][1];
}
for (unsigned i = 0; i < c_N; ++i)
{
std::cout << (double)counts[i]/c_K/c_total // empirical frequency
<< " | "
<< (double)input[i][1]/sumOfWeights // expected frequency
<< std::endl;
}
return 0;
}
Output
N = 5, K = 2
Frequencies
Empiricical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empiricical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes this problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(N^2) time, using O(N) memory (The input array itself plus constant memory).
I Assume that the weights are possitive.
Let A be the array of pairs.
1) Set N to be A.length
2) calculate the sum of all weights W.
3) Loop K times
3.1) r = rand(0,W)
3.2) loop on A and find the first index i such that A[1].w + ...+ A[i].w <= r < A[1].w + ... + A[i+1].w
3.3) add A[i].id to output
3.4) A[i] = A[N-1] (or swap if the array contents should be preserved)
3.5) N = N - 1
3.6) W = W - A[i].w