I'm working on the Coin Row Problem, and I've run into a small issue.
There is a row of n coins whose values are some positive integers c1, c2, . . . , cn, not necessarily distinct.
The goal is to pick up the maximum amount of money subject to the constraint that you cannot pick up any two adjacent coins. For instance, in the example below, once you pick up 10, you cannot take either 6 or the left-hand 2.
Example:
enter the number of coins: 6
enter the value of all coins : 5 1 2 10 6 2
The maximum amount of coin : 17
The selected coins to get maximum value : C1 , C4 , C6
I want to get the selected coins (C1, C4, C6 in the example).
Here is my function code.
With this code I can only get the maximum amount.
int getanswer(int array[], int len)
{
    int C[20]; // 1-based copy of the coin values (assumes len < 20)
    for (int j = 0; j < len; j++)
    {
        C[j + 1] = array[j];
    }
    int F[20]; // F[j] = maximum amount using only the first j coins
    F[0] = 0;
    F[1] = C[1];
    for (int j = 2; j < len + 1; j++)
    {
        F[j] = max(C[j] + F[j - 2], F[j - 1]); // take coin j, or skip it
        printf("temp :%d\n", C[j]);
    }
    return F[len];
}
How can I get the selected coins with my code?
A good solution would involve recursion, backtracking, and memoization (dynamic programming). Write a recursive routine that tries each of the available choices from the left end, and then recurs on the remaining list. Your current algorithm has a blind spot for unbalanced values just over its visible horizon (2 elements out).
Here's some pseudo-code to help you start.
int pickup(coin[])
{
    // base cases: <= 2 coins left
    if size(coin) == 0      // return 0 for an empty list
        return 0
    if size(coin) <= 2      // if only 1 or 2 coins left, return the larger
        return max(coin)
    // Otherwise, experiment:
    // pick *each* of the first two coins, solve the remaining problem,
    // and compare results.
    pick1 = coin[0] + pickup(coin[2:]) // pick 1st coin; recur on rest of list
    pick2 = coin[1] + pickup(coin[3:]) // pick 2nd coin; recur on rest of list
    return max(pick1, pick2)
}
That's the general attack. You can speed up the solution a lot with memoization. Also, you'll need to convert this to your preferred implementation language and add tracking so you get the indices you want. If all you need is to return the coin values in order, it's simple to accumulate an array of those values, pre-pending one on each return.
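Alternatively, if you want to keep your original bottom-up code, here is a sketch (names are mine) that recovers the indices from the F array exactly as your getanswer fills it, by walking backward through the table: coin j can be treated as skipped whenever F[j] == F[j-1], and as taken otherwise.

#include <cstdio>

// Hypothetical helper: expects F as filled in by getanswer above.
void printSelected(int F[], int len)
{
    int selected[20]; // same size assumption as the original arrays (len < 20)
    int count = 0;
    int j = len;
    while (j >= 1)
    {
        if (F[j] == F[j - 1])
        {
            j -= 1; // coin j adds nothing: treat it as skipped
        }
        else
        {
            selected[count++] = j; // coin j was taken...
            j -= 2;                // ...so its left neighbor cannot be
        }
    }
    printf("The selected coins to get maximum value :");
    for (int i = count - 1; i >= 0; i--) // reverse, to print left to right
        printf(" C%d", selected[i]);
    printf("\n");
}

For your example (5 1 2 10 6 2), F ends up as 0, 5, 5, 7, 15, 15, 17, and walking back yields C6, C4, C1, printed in reverse as C1, C4, C6.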
I am trying to solve a problem asked in TCS MockVita 2019 Round 2:
Problem Description
Dr Felix Kline, the Math teacher at Gauss School introduced the following game to teach his students problem solving. He places a series of “hopping stones” (pieces of paper) in a line with points (a positive number) marked on each of the stones.
Students start from one end and hop to the other end. One can step on a stone and add the number on the stone to their cumulative score, or jump over a stone and land on the next stone. In this case, they get twice the points marked on the stone they land on, but do not get the points marked on the stone they jumped over.
At most once in the journey, the student is allowed (if they choose) to do a “double jump” – that is, they jump over two consecutive stones – where they would get three times the points of the stone they land on, but not the points of the stones they jump over.
The teacher expected his students to do some thinking and come up with a plan to get the maximum score possible. Given the numbers on the sequence of stones, write a program to determine the maximum score possible.
Constraints
The number of stones in the sequence < 30
Input Format
The first line contains N, the number of integers (this is a positive integer)
The next line contains the N points (each a positive integer) separated by commas. These are the points on the stones in the order the stones are placed.
Output
One integer representing the maximum score
Test Case
Explanation
Example 1
Input
3
4,2,3
Output
10
Explanation
There are 3 stones (N=3), and the points (in the order laid out) are 4,2 and 3 respectively.
If we step on the first stone and jump over the second, we get 4 + 2 x 3 = 10. A double jump to the third stone would get only 9. Hence the result is 10, and the double jump is not used.
Example 2
Input
6
4,5,6,7,4,5
Output
35
Explanation
N=6, and the sequence of points is given. One way of getting 35 is to start with a double jump to stone 3 (3 x 6 = 18), go to stone 4 (7) and jump to stone 6 (2 x 5 = 10) for a total of 35. The double jump was used only once, and the result is 35.
I found that it's a dynamic programming problem, but I don't know what I did wrong, because my solution is not able to pass all the test cases. My code passed all the tests I created.
unordered_map<int, int> lookup;
bool flag = false; // whether the double jump has already been used (declared globally)

int res(int *arr, int n, int i){
    if(i == n-1){
        return 0;
    }
    if(i == n-2){
        return arr[i+1];
    }
    if(lookup.find(i) != lookup.end())
        return lookup[i];
    int maxScore = 0;
    if(i < n-3 && flag == false){
        flag = true;
        maxScore = max(maxScore, 3 * (arr[i+3]) + res(arr, n, i+3));
        flag = false;
    }
    maxScore = max(maxScore, (arr[i+1] + res(arr,n,i+1)));
    lookup[i] = max(maxScore, 2 * (arr[i+2]) + res(arr, n, i+2));
    return lookup[i];
}
cout << res(arr, n, 0) + arr[0]; // It is inside the main()
I'd like help finding the mistake in my code, along with the correct solution and any test case that my solution fails. Thanks :)
You don't need any map. All you need to remember are the last few maximal values. Every move (except the first two) you have two kinds of state: best score with the double jump already made, and best score without it. If you don't want to make a double jump, then your best choice is the maximum of (last stone + current) and (stone before last + 2 * current): max(no_dj[2] + arr[i], no_dj[1] + 2 * arr[i]).
On the other hand, if you want to have the double jump made, then you have three options: step one stone after some previous double jump (dj[2] + arr[i]), jump over the last stone after some double jump (dj[1] + 2 * arr[i]), or do the double jump in the current move (no_dj[0] + 3 * arr[i]).
int res(int *arr, int n){
    // Best scores ending on the last three stones, without / with the double jump used.
    int no_dj[3]{ 0, 0, arr[0] };
    int dj[3]{ 0, 0, 0 };
    for(int i = 1; i < n; i++){
        // step from the previous stone, or jump over it
        int best_nodj = max(no_dj[1] + 2 * arr[i], no_dj[2] + arr[i]);
        int best_dj = 0;
        // step or jump after an earlier double jump, or double-jump right now
        if(i > 1) best_dj = max(max(dj[1] + 2 * arr[i], dj[2] + arr[i]), no_dj[0] + 3 * arr[i]);
        no_dj[0] = no_dj[1];
        no_dj[1] = no_dj[2];
        no_dj[2] = best_nodj;
        dj[0] = dj[1];
        dj[1] = dj[2];
        dj[2] = best_dj;
    }
    return max(no_dj[2], dj[2]);
}
All you have to remember are two arrays of three elements: the last three maximum values with the double jump made, and the last three maximum values without it.
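For what it's worth, a quick hypothetical harness to check res() against the two sample cases from the problem statement (assuming res() from above, <algorithm> for max, and using namespace std are in scope):

#include <iostream>

int main() {
    int a[] = {4, 2, 3};
    int b[] = {4, 5, 6, 7, 4, 5};
    std::cout << res(a, 3) << "\n"; // expected: 10
    std::cout << res(b, 6) << "\n"; // expected: 35
    return 0;
}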
I have been given the following assignment:
Given N integers in the form of A(i) where 1≤i≤N, make each number
A(i) in the N numbers equal to M. To convert a number A(i) to M, it
will cost |M−A(i)| units. Find out the minimum cost to convert all the N
numbers to M, so you should choose the best M to get the minimum cost.
Given:
1 <= N <= 10^5
1 <= A(i) <= 10^9
My approach was to calculate the sum of all the numbers, take avg = sum / n, and then add up each number's difference from avg to get the cost.
But this fails in many test cases. How can I find the optimal solution for this?
You should take the median of the numbers (or either of the two numbers nearest the middle if the list has even length), not the mean.
An example where the mean fails to minimize is: [1, 2, 3, 4, 100]. The mean is 110 / 5 = 22, and the total cost is 21 + 20 + 19 + 18 + 78 = 156. Choosing the median (3) gives total cost: 2 + 1 + 0 + 1 + 97 = 101.
An example where the median lies between two items in the list is [1, 2, 3, 4, 5, 100]. Here the median is 3.5, and it's ok to either use M=3 or M=4. For M=3, the total cost is 2 + 1 + 0 + 1 + 2 + 97 = 103. For M=4, the total cost is 3 + 2 + 1 + 0 + 1 + 96 = 103.
A formal proof of correctness can be found on Mathematics SE, although you may convince yourself of the result by noting what happens if you nudge M a small amount delta in one direction (but not past one of the data points): say delta is in the positive direction; then the total cost increases by delta times the number of points to the left of M, minus delta times the number of points to the right of M. So the total cost is minimized when the numbers of points to M's left and right are equal; otherwise you could move M a small amount one way or the other to decrease the total cost.
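As an illustration, here is a minimal sketch (names are mine) that picks M as the median via std::nth_element, which runs in O(N) on average without fully sorting:

#include <algorithm>
#include <iostream>
#include <vector>

// Choose M as the median, then total up the conversion costs.
long long minTotalCost(std::vector<long long> a)
{
    std::nth_element(a.begin(), a.begin() + a.size() / 2, a.end());
    long long m = a[a.size() / 2]; // an optimal M (upper median for even length)
    long long cost = 0;
    for (long long x : a)
        cost += (x > m) ? x - m : m - x; // |M - A(i)|
    return cost;
}

int main()
{
    std::cout << minTotalCost({1, 2, 3, 4, 100}) << "\n";    // 101
    std::cout << minTotalCost({1, 2, 3, 4, 5, 100}) << "\n"; // 103
}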
@PaulHankin already provided a perfect answer. Anyway, when thinking about the problem, I didn't think of the median being the solution. But even if you don't know about the median, you can come up with a programming solution.
I made similar observations as @PaulHankin did in the last paragraph of his answer. This made me realize that I have to eliminate outliers iteratively in order to find m. So I wrote a program that first sorts the input array (vector) A and then analyzes the minimum and maximum values.
The idea is to move the minimum values towards the second-smallest value and the maximum values towards the second-largest value. You always move either the minimum or the maximum values, depending on whether there are fewer minimum values than maximum values or not. If all array items end up being the same value, then you have found your m:
#include <vector>
#include <algorithm>
#include <iostream>

using namespace std;

int getMinCount(vector<int>& A);
int getMaxCount(vector<int>& A);

int main()
{
    // Example as given by @PaulHankin
    vector<int> A;
    A.push_back(1);
    A.push_back(2);
    A.push_back(3);
    A.push_back(4);
    A.push_back(100);

    sort(A.begin(), A.end());

    int minCount = getMinCount(A);
    int maxCount = getMaxCount(A);
    while (minCount != A.size() && maxCount != A.size())
    {
        if (minCount <= maxCount)
        {
            for (int i = 0; i < minCount; i++)
                A[i] = A[minCount];
            // Recalculate the count of the minimum value, because we changed the minimum.
            minCount = getMinCount(A);
        }
        else
        {
            for (int i = 0; i < maxCount; i++)
                A[A.size() - 1 - i] = A[A.size() - 1 - maxCount];
            // Recalculate the count of the maximum value, because we changed the maximum.
            maxCount = getMaxCount(A);
        }
    }

    // Print out the one and only remaining value, which is m.
    cout << A[0] << endl;
    return 0;
}

int getMinCount(vector<int>& A)
{
    // Count how often the minimum value exists.
    int minCount = 1;
    int pos = 1;
    while (pos < A.size() && A[pos++] == A[0])
        minCount++;
    return minCount;
}

int getMaxCount(vector<int>& A)
{
    // Count how often the maximum value exists.
    int maxCount = 1;
    int pos = A.size() - 2;
    while (pos >= 0 && A[pos--] == A[A.size() - 1])
        maxCount++;
    return maxCount;
}
If you think about the algorithm, you will come to the conclusion that it actually calculates the median of the values in the array A. As example input I took the first example given by @PaulHankin. As expected, the code provides the correct result (3) for it.
I hope my approach helps you understand how to tackle this kind of problem even if you don't know the correct solution. This is especially helpful when you are in an interview, for example.
So, I was solving the following question: http://www.spoj.com/problems/ROADS/en/
N cities named with numbers 1 ... N are connected with one-way roads. Each road has two parameters associated with it: the road length and the toll that needs to be paid for the road (expressed in the number of coins). Bob and Alice used to live in the city 1. After noticing that Alice was cheating in the card game they liked to play, Bob broke up with her and decided to move away - to the city N. He wants to get there as quickly as possible, but he is short on cash. We want to help Bob to find the shortest path from the city 1 to the city N that he can afford with the amount of money he has.
Input
The input begins with the number t of test cases. Then t test cases follow. The first line of the each test case contains the integer K, 0 <= K <= 10000, maximum number of coins that Bob can spend on his way. The second line contains the integer N, 2 <= N <= 100, the total number of cities. The third line contains the integer R, 1 <= R <= 10000, the total number of roads. Each of the following R lines describes one road by specifying integers S, D, L and T separated by single blank characters : S is the source city, 1 <= S <= N D is the destination city, 1 <= D <= N L is the road length, 1 <= L <= 100. T is the toll (expressed in the number of coins), 0 <= T <= 100 Notice that different roads may have the same source and destination cities.
Output
For each test case, output a single line containing the total length of the shortest path from the city 1 to the city N whose total toll is less than or equal to K coins. If such a path does not exist, output -1.
Now, what I did was try to use Dijkstra's algorithm, as follows:
Instead of having only a single node as the state, I take
(node, coins) as one state and then apply Dijkstra.
Length is the weight between the states,
and I minimize the length without exceeding the total coins.
My code is as follows:
#include <iostream>
#include <set>
#include <climits>
using namespace std;

#define ll long long
#define pb push_back
#define mp make_pair

class node
{
public:
    int vertex;
    int roadlength;
    int toll;
};

int dist[101][101];       // for storing road length
bool visited[101][10001];
int cost[101][101];       // for storing cost
int ans[101][10001];      // actual distance being stored here

void djikstra(int totalcoins, int n);

bool operator < (node a, node b)
{
    if (a.roadlength != b.roadlength)
        return a.roadlength < b.roadlength;
    else if (a.toll != b.toll)
        return a.toll < b.toll;
    return a.vertex < b.vertex;
}

int main (void)
{
    int a, b, c, d;
    int r, t, k, n, i, j;
    cin >> t;
    while (t != 0)
    {
        cin >> k >> n >> r;
        for (i = 1; i <= 101; i++)
            for (j = 1; j <= 101; j++)
                dist[i][j] = INT_MAX;
        for (i = 0; i <= n; i++)
            for (j = 0; j <= k; j++)
                ans[i][j] = INT_MAX;
        for (i = 0; i <= n; i++)
            for (j = 0; j <= k; j++)
                visited[i][j] = false;
        for (i = 0; i < r; i++)
        {
            cin >> a >> b >> c >> d;
            if (a != b)
            {
                dist[a][b] = c;
                cost[a][b] = d;
            }
        }
        djikstra(k, n);
        int minlength = INT_MAX;
        for (i = 1; i <= k; i++)
        {
            if (ans[n][i] < minlength)
                minlength = ans[n][i];
        }
        if (minlength == INT_MAX)
            cout << "-1\n";
        else
            cout << minlength << "\n";
        t--;
    }
    cout << "\n";
    return 0;
}

void djikstra(int totalcoins, int n)
{
    set<node> myset;
    myset.insert((node){1, 0, 0});
    ans[1][0] = 0;
    while (!myset.empty())
    {
        auto it = myset.begin();
        myset.erase(it);
        int curvertex = it->vertex;
        int a = it->roadlength;
        int b = it->toll;
        if (visited[curvertex][b] == true)
            continue;
        else
        {
            visited[curvertex][b] = true;
            for (int i = 1; i <= n; i++)
            {
                if (dist[curvertex][i] != INT_MAX)
                {
                    int foo = b + cost[curvertex][i];
                    if (foo <= totalcoins)
                    {
                        if (ans[i][foo] >= ans[curvertex][b] + cost[curvertex][i])
                        {
                            ans[i][foo] = ans[curvertex][b] + cost[curvertex][i];
                            myset.insert((node){i, ans[i][foo], foo});
                        }
                    }
                }
            }
        }
    }
}
Now, I have two doubts:
Firstly, my output is not correct for the first given test case of the question, i.e.
Sample Input:
2
5
6
7
1 2 2 3
2 4 3 3
3 4 2 4
1 3 4 1
4 6 2 1
3 5 2 0
5 4 3 2
0
4
4
1 4 5 2
1 2 1 0
2 3 1 1
3 4 1 0
Sample Output:
11
-1
My output comes out to be 4 -1, which is wrong for the first test case. Where am I going wrong in this?
Secondly, how do I handle the condition of having multiple edges? That is, the question mentions: Notice that different roads may have the same source and destination cities. How do I handle this condition?
The simple way to store the roads is as a vector of vectors. For each origin city, you want to have a vector of all roads leading from that city.
So when you are processing a discovered "best" path to a city, you would iterate through all roads from that city to see if they might be "best" paths to some other city.
As before, you have two interacting definitions of "best" that cannot simply be combined into one definition. Shortest is more important, so the main definition of "best" is shortest, considering cheapest only in case of ties. But you also need the alternate definition of "best" considering only cheapest.
As I suggested for the other problem, you can sort on the main definition of "best" so you always process paths that are better in that definition before paths that are worse. Then you need to track the best seen so far under the second definition, so that you only prune a path from processing when it is not better, under the second definition, than what you have already processed in order of the first definition.
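One way to realize this, sketched with my own naming (this is not the poster's code): run Dijkstra over (city, coins spent) states, with length as the primary "best" and the coin budget folded into the state. Note that multiple roads between the same pair of cities are handled for free here, because every road is its own entry in the adjacency list.

#include <climits>
#include <queue>
#include <tuple>
#include <vector>
using namespace std;

struct Road { int dest, len, toll; };

// roads[city] holds all roads leaving that city; returns -1 if no path fits the budget.
int shortestAffordable(const vector<vector<Road>>& roads, int n, int k)
{
    vector<vector<int>> best(n + 1, vector<int>(k + 1, INT_MAX));
    // min-heap ordered by (length so far, city, coins spent)
    priority_queue<tuple<int, int, int>,
                   vector<tuple<int, int, int>>,
                   greater<tuple<int, int, int>>> pq;
    best[1][0] = 0;
    pq.emplace(0, 1, 0);
    while (!pq.empty())
    {
        auto [len, city, coins] = pq.top();
        pq.pop();
        if (len > best[city][coins]) continue; // stale queue entry
        if (city == n) return len;             // first pop of city n is optimal
        for (const Road& r : roads[city])
        {
            int c = coins + r.toll;
            if (c <= k && len + r.len < best[r.dest][c])
            {
                best[r.dest][c] = len + r.len;
                pq.emplace(best[r.dest][c], r.dest, c);
            }
        }
    }
    return -1;
}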
I haven't read your code; however, I can tell you the problem cannot be solved with an unmodified version of Dijkstra's algorithm.
The problem is at least as hard as the binary knapsack problem. How? The idea is to construct the knapsack problem within the stated problem. Since the knapsack problem is not known to be solvable in polynomial time, neither is the stated problem; and since Dijkstra's algorithm is a polynomial-time algorithm, it cannot apply unmodified.
Consider a binary knapsack problem with a set X of D values and a maximum value m = max(X). Now construct the proposed problem as such:
Let there be D + 1 cities, where city n is connected to city n + 1 by two roads. Let cities 1 through D uniquely correspond to a value v in X. Let exactly two roads go from such a city n to city n + 1: one costing v with distance m - v + 1, and the other costing 0 with a distance of m + 1.
In essence, "you get exactly what you pay for" -- for every coin you spend, your trip will be one unit of distance shorter.
This reframes the problem as "what's the maximum Bob can spend, spending money at most once on each toll?" And that's the same as the binary knapsack problem we started with.
Hence, if we solve the stated problem, we also solve the binary knapsack problem, so the stated problem cannot be any more "efficient" to solve than the binary knapsack problem -- which rules out an unmodified Dijkstra's algorithm.
Problem: "An algorithm to find the number of six digit numbers where the sum of the first three digits is equal to the sum of the last three digits."
I came across this problem in an interview and want to know the best solution. This is what I have till now.
Approach 1: The brute-force solution is, of course, to check for each number (between 100,000 and 999,999) whether the sums of its first three and last three digits are equal. If yes, increment a counter which keeps count of all such numbers.
But this checks for all 900,000 numbers and so is inefficient.
Approach 2: Since we are asked "how many" such numbers and not "which numbers", we could do better. Divide the number into two parts: First three digits (these go from 100 to 999) and Last three digits (these go from 000 to 999). Thus, the sum of three digits in either part of a candidate number can range from 1 to 27.
* Maintain a std::map<int, int> for each part where key is the sum and value is number of numbers (3 digit) having that sum in the corresponding part.
* Now, for each number in the first part find out its sum and update the corresponding map.
* Similarly, we can get updated map for the second part.
* Now by multiplying the corresponding pairs (e.g. value in map 1 of key 4 and value in map 2 of key 4) and adding them up we get the answer.
In this approach, we end up checking 1K numbers.
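In code, this approach looks roughly like the following sketch (digitSum is a small helper I'm assuming):

#include <iostream>
#include <map>

int digitSum(int x) { int s = 0; for (; x > 0; x /= 10) s += x % 10; return s; }

int main()
{
    std::map<int, long long> first, second; // sum -> how many numbers have it
    for (int i = 100; i <= 999; ++i) ++first[digitSum(i)];  // first part: 100..999
    for (int i = 0; i <= 999; ++i) ++second[digitSum(i)];   // last part: 000..999
    long long total = 0;
    for (const auto& p : first) total += p.second * second[p.first];
    std::cout << total << "\n"; // 50412
}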
My question is how could we further optimize? Is there a better solution?
For 0 <= s <= 18, there are exactly 10 - |s - 9| ways to obtain s as the sum of two digits.
So, for the first part
int first[28] = {0};
for(int s = 0; s <= 18; ++s) {
int c = 10 - (s < 9 ? (9 - s) : (s - 9));
for(int d = 1; d <= 9; ++d) {
first[s+d] += c;
}
}
That's 19*9 = 171 iterations, for the second half, do it similarly, with the inner loop starting at 0 instead of 1, that's 19*10 = 190 iterations. Then sum first[i]*second[i] for 1 <= i <= 27.
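Filling in the rest as described (and assuming the first[] array from above), the second half and the final sum might look like:

int second[28] = {0};
for(int s = 0; s <= 18; ++s) {
    int c = 10 - (s < 9 ? (9 - s) : (s - 9));
    for(int d = 0; d <= 9; ++d) { // leading zero is allowed in the second half
        second[s+d] += c;
    }
}
int total = 0;
for(int i = 1; i <= 27; ++i) {
    total += first[i] * second[i]; // pair up equal sums from both halves
}
// total == 50412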
Generate all three-digit numbers; partition them into sets based on their sum of digits. (Actually, all you need to do is keep a vector that counts the size of the sets). For each set, the number of six-digit numbers that can be generated is the size of the set squared. Sum up the squares of the set sizes to get your answer.
int sumCounts[28] = {0}; // sums can go from 0 through 27
for (int i = 0; i < 1000; ++i) {
    sumCounts[sumOfDigits(i)]++;
}
int total = 0;
for (int i = 0; i < 28; ++i) {
    int count = sumCounts[i];
    total += count * count;
}
EDIT Variation to eliminate counting leading zeroes:
int sumCounts[28] = {0};
int sumCounts2[28] = {0}; // counts for 0..99 only (i.e. a leading zero in the first half)
for (int i = 0; i < 100; ++i) {
    int s = sumOfDigits(i);
    sumCounts[s]++;
    sumCounts2[s]++;
}
for (int i = 100; i < 1000; ++i) {
    sumCounts[sumOfDigits(i)]++;
}
int total = 0;
for (int i = 0; i < 28; ++i) {
    int count = sumCounts[i];
    total += (count - sumCounts2[i]) * count;
}
Python Implementation
def equal_digit_sums():
    dists = {}
    for i in range(1000):
        digits = [int(d) for d in str(i)]
        dsum = sum(digits)
        if dsum not in dists:
            dists[dsum] = [0, 0]
        dists[dsum][0 if len(digits) == 3 else 1] += 1

    def prod(dsum):
        t = dists[dsum]
        return (t[0] + t[1]) * t[0]

    return sum(prod(dsum) for dsum in dists)

print(equal_digit_sums())
Result: 50412
One idea: For each number from 0 to 27, count the number of three-digit numbers that have that digit sum. This should be doable efficiently with a DP-style approach.
Now you just sum the squares of the results, since for each answer, you can make a six-digit number with one of those on each side.
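One possible reading of that DP, as a sketch of my own (this counts digit strings with leading zeros allowed on both sides; excluding a leading zero in the first half needs the adjustment shown in the earlier answers):

// ways[d][s] = number of d-digit strings (leading zeros allowed) with digit sum s
int ways[4][28] = {};
ways[0][0] = 1;
for (int d = 1; d <= 3; ++d)
    for (int s = 0; s <= 27; ++s)
        for (int digit = 0; digit <= 9 && digit <= s; ++digit)
            ways[d][s] += ways[d - 1][s - digit];

long long total = 0;
for (int s = 0; s <= 27; ++s)
    total += (long long)ways[3][s] * ways[3][s]; // square each set size and sum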
Assuming leading 0's aren't allowed, you want to calculate how many different ways there are to sum to n with 3 digits. To calculate that, you can use a for loop inside a for loop; the third digit is forced, so you only enumerate the first two:

firstHalf = 0
for i in xrange(max(1, n - 18), min(9, n) + 1):            # first digit
    for j in xrange(max(0, n - i - 9), min(9, n - i) + 1): # second digit
        firstHalf += 1  # only one possible third digit
secondHalf = firstHalf + max(0, 10 - abs(n - 9))

If you are trying to sum to a number, then the last digit is always uniquely determined. Thus, in the case where the first digit is 0, we are just calculating how many different values are possible for the second digit: this is n+1 if n is less than 10; if n is greater, up until 18, it is 19-n; over 18 there are no ways to form the sum. That is the max(0, 10 - abs(n - 9)) term added to get secondHalf.
If you loop over all n, 1 through 27, summing firstHalf * secondHalf, you will have your total.
Write a function which has:
input: an array of pairs (unique id and weight) of length N, and K <= N
output: K random unique ids (from the input array)
Note: over many calls, an id should appear in the output more frequently the greater its weight is.
Example: an id with weight 5 should appear in the output 5 times more often than an id with weight 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
Thanks for the responses, everybody!
I still can't understand how the weight of a pair affects how frequently that pair appears in the output. Can you give me a clearer, "for dummies" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
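A minimal sketch of a single pick along those lines (my naming; rand() used for brevity, a better RNG is advisable):

#include <cstdlib>
#include <vector>

struct Entry { int id; long long weight; };

int pickWeighted(const std::vector<Entry>& pairs)
{
    long long total_weight = 0;
    for (const Entry& e : pairs) total_weight += e.weight;
    long long selection = std::rand() % total_weight; // in [0, total_weight)
    long long running = 0;
    for (const Entry& e : pairs)
    {
        running += e.weight; // sum of weights from the beginning through e
        if (running > selection) return e.id;
    }
    return pairs.back().id; // not reached when all weights are positive
}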
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially, I created a temporary array, but this can be done without one as well: you can calculate the size of the list by summing all the weights up = X; in this example, X = 17.
Pick a random number between [0, X-1] and calculate which id should be returned by looping through the list, doing a cumulative addition on the weights. Say the random number is 8:
(3, 7) total = 7, which is < 8
(1, 2) total = 9, which is >= 8 **boom** 1 is your id!
Now, since you need K random unique ids, you can create a hashtable from the initial array passed to you and work with that. Once you find an id, remove it from the hash and proceed with the algorithm. Edit: Note that you create the hashmap only once! Your algorithm will work on this instead of looking through the array. I did not put it at the top to keep the answer clear.
As long as your random calculation is not secretly using any extra memory, you will need to store K random pickings, which are <= N, plus a copy of the original array, so the max space requirement at runtime is O(2*N).
Asymptotic runtime is :
O(n) : create copy of original array into hastable +
(
O(n) : calculate sum of weights +
O(1) : calculate random between range +
O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant space (i.e., space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call it i.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights, and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call it i.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
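A rough sketch of this simpler variant (my naming), with the weights turned into running sums in place and std::upper_bound doing the binary search:

#include <algorithm>
#include <cstdlib>
#include <vector>

struct P { int id; long long weight; };

// input is modified: weight becomes the running sum up to and including that slot.
void pickK(std::vector<P>& input, int K, std::vector<int>& output)
{
    for (size_t i = 1; i < input.size(); ++i)
        input[i].weight += input[i - 1].weight;
    long long sum_weights = input.back().weight;
    for (int k = 0; k < K; ++k)
    {
        long long r = std::rand() % sum_weights; // in [0, sum_weights)
        auto it = std::upper_bound(input.begin(), input.end(), r,
            [](long long v, const P& p) { return v < p.weight; }); // first summed weight > r
        output.push_back(it->id);
    }
}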
My short answer: you can't, strictly speaking.
That's because the problem definition is inconsistent. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is pretty small relative to N, the calculated frequencies will be pretty close to the theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>

// 0 - id, 1 - weight
typedef unsigned Pair[2];

unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1]; // the weight is the second element of the pair
    }
    const unsigned random = rand() % sumOfWeights + 1;
    sumOfWeights = 0;
    unsigned i = 0;
    for (; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
        if (sumOfWeights >= random)
        {
            break;
        }
    }
    return i;
}
Generate unique random numbers
The well-known Durstenfeld-Fisher-Yates algorithm may be used for generating unique random numbers. See this great explanation.
It requires an array of N indices, so if N is defined at compile time, we are able to allocate the necessary space at compile time.
Now we have to combine these two algorithms: we just need to use our own Random() function instead of the standard rand() in the unique-number generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
    unsigned deck[N]; // indices of the ids still available
    for (unsigned i = 0; i < N; ++i)
    {
        deck[i] = i;
    }
    unsigned max = N - 1;
    for (unsigned i = 0; i < K; ++i)
    {
        // weighted pick among the first max+1 entries of the deck
        const unsigned index = Random(i_set, deck, max + 1);
        std::swap(deck[max], deck[index]); // move the picked index out of the deck
        o_res[i] = i_set[deck[max]][0];
        --max;
    }
}
Usage
int main()
{
    srand((unsigned)time(0));

    const unsigned c_N = 5; // N
    const unsigned c_K = 2; // K
    Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
    unsigned result[c_K] = {};

    const unsigned c_total = 1000000; // number of iterations
    unsigned counts[c_N] = {0};       // frequency counters

    for (unsigned i = 0; i < c_total; ++i)
    {
        Generate<c_N, c_K>(input, result);
        for (unsigned j = 0; j < c_K; ++j)
        {
            ++counts[result[j]];
        }
    }

    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < c_N; ++i)
    {
        sumOfWeights += input[i][1];
    }

    for (unsigned i = 0; i < c_N; ++i)
    {
        std::cout << (double)counts[i]/c_K/c_total    // empirical frequency
                  << " | "
                  << (double)input[i][1]/sumOfWeights // expected frequency
                  << std::endl;
    }
    return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes this problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(N^2) worst-case time (more precisely, O(K*N)), using O(N) memory (the input array itself plus constant extra memory).
I assume that the weights are positive.
Let A be the array of pairs; a C++ sketch of these steps follows the list.
1) Set N to be A.length
2) Calculate the sum of all weights, W.
3) Loop K times:
3.1) r = rand(0, W)
3.2) loop on A and find the first index i such that A[1].w + ... + A[i].w > r
3.3) add A[i].id to output
3.4) W = W - A[i].w
3.5) A[i] = A[N-1] (or swap if the array contents should be preserved)
3.6) N = N - 1
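A direct transcription of those steps into C++ (my naming; assumes positive weights and K <= N) could look like this:

#include <cstdlib>
#include <vector>

struct IdWeight { int id; long long w; };

// Works on a by-value copy of A, overwriting each chosen slot with the current tail.
std::vector<int> sampleK(std::vector<IdWeight> A, int K)
{
    std::vector<int> output;
    int n = (int)A.size();                      // step 1
    long long W = 0;
    for (const IdWeight& p : A) W += p.w;       // step 2
    for (int k = 0; k < K; ++k)                 // step 3
    {
        long long r = std::rand() % W;          // 3.1: r in [0, W)
        int i = 0;
        long long prefix = A[0].w;
        while (prefix <= r) prefix += A[++i].w; // 3.2: first i with prefix > r
        output.push_back(A[i].id);              // 3.3
        W -= A[i].w;                            // 3.4 (before overwriting A[i])
        A[i] = A[n - 1];                        // 3.5
        --n;                                    // 3.6
    }
    return output;
}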