Is the runtime complexity of the Unbounded Knapsack problem really O(n×W)? - c++

Given a knapsack weight W and a set of n items, each with a value value_i and a weight w_i, we need to compute the maximum value we can obtain from the items with total weight ≤ W. This differs from the classical Knapsack problem in that we are allowed to use an unlimited number of instances of each item (adapted from [1]).
There is a standard dynamic programming solution for this problem: instead of creating a function f: CurrentWeight -> MaxRemainingValue, we create a function g: (CurrentWeight, CurrentItem) -> MaxRemainingValue that iterates through the items. This way we avoid cycles in the implicit graph formed by g, so we can apply dynamic programming. One standard implementation of such a function, in C++:
#include <iostream>
#include <vector>
#include <chrono>

#define INF (int)1e8

using namespace std;
using namespace std::chrono;

const int MAX_WEIGHT = 10000;
const int N_ITEMS = 10;

int memo[N_ITEMS][MAX_WEIGHT + 1];
vector<int> weights;
vector<int> values;

void initializeMemo(){
    for(int i = 0; i < N_ITEMS; i++)
        for(int j = 0; j < MAX_WEIGHT + 1; j++)
            memo[i][j] = -1;
}

// return max possible remaining value
int dpUnboundedKnapsack(int currentItem, int currentWeight){
    if(currentItem == N_ITEMS)
        return 0;
    int& ans = memo[currentItem][currentWeight];
    if(ans != -1)
        return ans;
    int n_taken = 0;
    while(true){
        int newWeight = currentWeight + n_taken * weights[currentItem];
        if(newWeight > MAX_WEIGHT)
            break;
        int value = n_taken * values[currentItem];
        ans = max(ans, dpUnboundedKnapsack(currentItem+1, newWeight) + value);
        n_taken++;
    }
    return ans;
}

int main(){
    initializeMemo();
    // weights equal 1
    weights = vector<int>(N_ITEMS, 1);
    // values equal 1, 2, 3, ... N_ITEMS
    values = vector<int>(N_ITEMS);
    for(int i = 0; i < N_ITEMS; i++)
        values[i] = i+1;

    auto start = high_resolution_clock::now();
    cout << dpUnboundedKnapsack(0, 0) << endl;
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
    cout << duration.count() << endl;

    return 0;
}
This solution traverses a DAG (we never go back an item, so there are no cycles here; all edges are of the form (item, weight) -> (item+1, new_weight)). Each edge of the DAG is visited at most once, hence the time complexity of this algorithm is O(E), with E being the number of edges of the graph. In the worst-case scenario, every weight equals 1, so each vertex (item, weight) connects to, on average, W/2 other vertices. So we have O(E) = O(W·#vertices) = O(W·W·n) = O(W^2·n). The problem is, everywhere I look on the internet it is said the runtime complexity of this algorithm is O(W·n), because every vertex is computed only once [1][2][3]. This explanation does not seem to make sense. Also, if that were the case, the algorithm above should not run so slowly. Here is a table of the algorithm runtime versus the MAX_WEIGHT value (try this for yourself, just run the code above):
MAX_WEIGHT time (microseconds)
100 1349
1000 45773
10000 5490555
20000 21249396
40000 80694646
We can clearly see an O(W^2) trend for large values of W, as suspected.
Finally, one may object that this worst-case scenario is rather artificial, since you can simply keep the greatest value for each repeated weight. Indeed, with that simple pre-processing the worst case now becomes weights = [1, 2, 3, 4, ..., n]. In that case there are around O(W^2·log(n) + W·n) edges: the item of weight i contributes roughly W^2/(2i) edges in total, which summed over i = 1..n gives O(W^2·log(n)), while the n_taken = 0 edge out of each of the n·W states contributes the W·n term (see the image below; I tried my best, hope you understand). So shouldn't the runtime complexity of the algorithm be O(W^2·log(n) + W·n), instead of the O(W·n) suggested pretty much everywhere?
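For reference, the O(n·W) bound in [1][2] refers to a recurrence in which each state does only constant work, because "take one more copy of the current item" stays on the same item instead of looping over n_taken. A minimal sketch, reusing the globals and memo table from the code above (the function name is mine):

// Each of the n·W states branches into at most two subcalls, so the whole
// memoized recursion runs in O(n·W).
// (Re-run initializeMemo() before calling this if the original function has
// already filled memo, since they share the same table.)
int dpUnboundedKnapsackLinear(int currentItem, int currentWeight){
    if(currentItem == N_ITEMS)
        return 0;
    int& ans = memo[currentItem][currentWeight];
    if(ans != -1)
        return ans;
    // Option 1: skip this item and move on.
    ans = dpUnboundedKnapsackLinear(currentItem + 1, currentWeight);
    // Option 2: take one more copy of this item and stay on it.
    int newWeight = currentWeight + weights[currentItem];
    if(newWeight <= MAX_WEIGHT)
        ans = max(ans, dpUnboundedKnapsackLinear(currentItem, newWeight) + values[currentItem]);
    return ans;
}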
Btw, here is the runtime I obtained for the case weights = [1, 2, 3, 4, ..., n]:
MAX_WEIGHT time (microseconds)
100 964
200 1340
400 2541
800 6878
1000 10407
10000 1202582
20000 5181070
40000 18761689
[1] https://www.geeksforgeeks.org/unbounded-knapsack-repetition-items-allowed/
[2] https://en.wikipedia.org/wiki/Knapsack_problem#Dynamic_programming_in-advance_algorithm
[3] Why is the knapsack problem pseudo-polynomial?


Implementing a crossover function for multiple "Salesmen" TSP in a genetic algorithm

I'm trying to solve a variant of the TSP problem with "multiple salesmen". I have a series of n waypoints and m drones, and I want to generate a result which roughly balances the number of waypoints between drones and returns an acceptably short travelling time. At the moment, I'm not too worried about finding an optimal solution; I just want something that works at this point. I've essentially distilled my problem to a traditional TSP run multiple times. My example is for a series of waypoints:
[0,1,2,3,4,5,6,7,8,9,10,11]
where 0 == 11 is the start and end point. Say I have 4 drones, I want to generate something like:
Drone A = [0,1,2,3,11]
Drone B = [0,5,6,7,11]
Drone C = [0,4,8,11]
Drone D = [0,9,10,11]
However, I’m struggling to generate a consistent output in my crossover function. My current function looks like this:
DNA DNA::crossover( DNA &parentB)
{
    // sol holds the individual solution for
    // each drone
    std::vector<std::vector<std::size_t>> sol;
    // contains the values in flattened sol
    // used to check for duplicates
    std::vector<std::size_t> flat_sol;
    // returns the number of solutions
    // required
    int number_of_paths = this->getSolution().size();
    // limits the number of waypoints required for each drone
    // subtracting 2 to remove "0" and "11"
    std::size_t max_wp_per_drone = ((number_of_cities-2)/number_of_drones) + 1;
    for(std::size_t i = 0; i < number_of_paths; i++)
    {
        int start = rand() % (this->getSolution().at(i).size() -2) + 1;
        int end = start + 1 + rand() % ((this->getSolution().at(i).size()-2) - start +1);
        std::vector<std::size_t>::const_iterator first = this->getSolution().at(i).begin()+start;
        std::vector<std::size_t>::const_iterator second = this->getSolution().at(i).begin()+end;
        // First problem occurs here... Sometimes, newOrder can return nothing based on
        // the positions of start and end. Tried to mitigate by putting a while loop
        // to regenerate the vector.
        std::vector<std::size_t> newOrder(first, second);
        // RETURNS a vector from the vector of vectors sol
        flat_sol = flatten(sol);
        // compare newOrder with solution and remove any duplicates
        for(std::size_t k = 0; k < newOrder.size(); k++ )
        {
            int duplicate = newOrder.at(k);
            if(std::find(flat_sol.begin(), flat_sol.end(), duplicate) != flat_sol.end())
            {
                // Second problem is found here: sometimes newOrder might only
                // return a vector with a single value, or values that have
                // already been assigned to another drone. In this case, those
                // values are removed and newOrder is now empty.
                newOrder.erase(newOrder.begin()+k);
            }
        }
        // attempt to create the vectors here
        for(std::size_t j = 1; j <= parentB.getSolution().at(i).size()-2; j++)
        {
            int city = parentB.getSolution().at(i).at(j);
            if(newOrder.empty())
            {
                if(std::find(flat_sol.begin(), flat_sol.end(), city) == flat_sol.end())
                {
                    newOrder.push_back(city);
                }
            }
            else if((std::find(newOrder.begin(), newOrder.end(), city) == newOrder.end())
                    && (std::find(flat_sol.begin(), flat_sol.end(), city) == flat_sol.end())
                    && newOrder.size() < max_wp_per_drone )
            {
                newOrder.push_back(city);
            }
        }
        sol.push_back(newOrder);
    }
    // waypoints and number_of_drones are known,
    // 0 and 11 are appended to each vector in sol in the constructor.
    return DNA(sol, waypoints, number_of_drones);
}
A sample output from my previous runs return the following:
[0,7,9,8, 11]
[0, 1,2,4,11]
[0, 10, 6, 11]
[0,3,11]
// This output is missing one waypoint.
[0,10,7,5, 11]
[0, 8,3,1,11]
[0, 6, 9, 11]
[0,2,4,11]
// This output is correct.
Unfortunately, this carries over into my subsequent generations of new children, and whether I get the correct output seems to be random. For example, in one generation I had a population with 40 correct children and 60 children with missing waypoints, while in other cases I've had more correct children. Any tips or help is appreciated.
Solved this by taking a slightly different approach. Instead of splitting the series of waypoints before performing crossover, I simply pass the full series of waypoints
[0,1,2,3,4,5,6,7,8,9,10,11]
perform crossover, and then, when computing the fitness of each set, split the waypoints among the m drones and keep the best solution of each generation (a sketch of one possible split follows after the code). The new crossover function looks like this:
DNA DNA::crossover( DNA &parentB)
{
    int start = rand() % (this->getOrder().size()-1);
    int end = getRandomInt<std::size_t>(start + 1, this->getOrder().size()-1);
    std::vector<std::size_t>::const_iterator first = this->getOrder().begin() + start;
    std::vector<std::size_t>::const_iterator second = this->getOrder().begin() + end;
    std::vector<std::size_t> newOrder(first, second);
    for(std::size_t i = 0; i < parentB.getOrder().size(); i++)
    {
        int city = parentB.getOrder().at(i);
        if(std::find(newOrder.begin(), newOrder.end(), city) == newOrder.end())
        {
            newOrder.push_back(city);
        }
    }
    return DNA(newOrder, waypoints, number_of_drones);
}
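The post doesn't show how the split is done at fitness time, so the helper below is purely illustrative (its name, the even chunking rule, and the single depot parameter are all my assumptions, not the author's code):

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): splits one flat order of
// waypoints into m routes of roughly equal length for fitness evaluation.
// Assumes `depot` is the shared start/end waypoint.
std::vector<std::vector<std::size_t>> splitOrderForDrones(
        const std::vector<std::size_t>& order, std::size_t m, std::size_t depot)
{
    std::vector<std::vector<std::size_t>> routes(m);
    std::size_t per_drone = (order.size() + m - 1) / m;   // ceiling division
    for (std::size_t i = 0; i < m; ++i)
        routes[i].push_back(depot);                       // every route starts at the depot
    for (std::size_t i = 0; i < order.size(); ++i)
        routes[std::min(i / per_drone, m - 1)].push_back(order[i]);
    for (std::size_t i = 0; i < m; ++i)
        routes[i].push_back(depot);                       // and returns to it
    return routes;
}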

How to calculate the minimum cost to convert all n numbers in an array to m?

I have been given the following assignment:
Given N integers in the form of A(i) where 1≤i≤N, make each number
A(i) in the N numbers equal to M. To convert a number A(i) to M, it
will cost |M−Ai| units. Find out the minimum cost to convert all the N
numbers to M, so you should choose the best M to get the minimum cost.
Given:
1 <= N <= 10^5
1 <= A(i) <= 10^9
My approach was to calculate the sum of all the numbers, take avg = sum / n, and then sum the absolute differences |A(i) - avg| to get the minimum cost.
But this fails in many test cases. How can I find the optimal solution for this?
You should take the median of the numbers (or either of the two numbers nearest the middle if the list has even length), not the mean.
An example where the mean fails to minimize is: [1, 2, 3, 4, 100]. The mean is 110 / 5 = 22, and the total cost is 21 + 20 + 19 + 18 + 78 = 156. Choosing the median (3) gives total cost: 2 + 1 + 0 + 1 + 97 = 101.
An example where the median lies between two items in the list is [1, 2, 3, 4, 5, 100]. Here the median is 3.5, and it's ok to either use M=3 or M=4. For M=3, the total cost is 2 + 1 + 0 + 1 + 2 + 97 = 103. For M=4, the total cost is 3 + 2 + 1 + 0 + 1 + 96 = 103.
A formal proof of correctness can be found on Mathematics SE, although you may convince yourself of the result by noting what happens if you nudge M by a small amount delta in one direction (but not past one of the data points); say, for example's sake, the positive direction: the total cost increases by delta times the number of points to the left of M, minus delta times the number of points to the right of M. So the cost is minimized when the numbers of points to the left and to the right of M are equal; otherwise you could move M a small amount one way or the other to decrease the total cost.
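For completeness, a minimal C++ sketch of this (not part of the original answer), using std::nth_element to find a median in O(N) average time; for an even-length input it picks the upper of the two middle elements, which, as noted above, is also optimal:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<long long> A = {1, 2, 3, 4, 100};          // example from above
    std::nth_element(A.begin(), A.begin() + A.size() / 2, A.end());
    long long M = A[A.size() / 2];                         // a median: an optimal choice of M
    long long cost = 0;
    for (long long x : A)
        cost += (x > M) ? (x - M) : (M - x);               // sum of |x - M|
    std::cout << "M = " << M << ", cost = " << cost << std::endl;  // prints M = 3, cost = 101
    return 0;
}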
@PaulHankin already provided a perfect answer. Anyway, when thinking about the problem, I didn't think of the median being the solution. But even if you don't know about the median, you can come up with a programming solution.
I made similar observations to @PaulHankin's in the last paragraph of his answer. This made me realize that I have to eliminate outliers iteratively in order to find m. So I wrote a program that first sorts the input array (vector) A and then analyzes the minimum and maximum values.
The idea is to move the minimum values towards the second smallest value and the maximum values towards the second largest value. You always move either the minimum or the maximum values, depending on whether there are fewer minimum values than maximum values or not. If all array items end up being the same value, then you have found your m:
#include <vector>
#include <algorithm>
#include <iostream>

using namespace std;

int getMinCount(vector<int>& A);
int getMaxCount(vector<int>& A);

int main()
{
    // Example as given by @PaulHankin
    vector<int> A;
    A.push_back(1);
    A.push_back(2);
    A.push_back(3);
    A.push_back(4);
    A.push_back(100);

    sort(A.begin(), A.end());

    int minCount = getMinCount(A);
    int maxCount = getMaxCount(A);

    while (minCount != A.size() && maxCount != A.size())
    {
        if (minCount <= maxCount)
        {
            for (int i = 0; i < minCount; i++)
                A[i] = A[minCount];
            // Recalculate the count of the minimum value, because we changed the minimum.
            minCount = getMinCount(A);
        }
        else
        {
            for (int i = 0; i < maxCount; i++)
                A[A.size() - 1 - i] = A[A.size() - 1 - maxCount];
            // Recalculate the count of the maximum value, because we changed the maximum.
            maxCount = getMaxCount(A);
        }
    }

    // Print out the one and only remaining value, which is m.
    cout << A[0] << endl;
    return 0;
}

int getMinCount(vector<int>& A)
{
    // Count how often the minimum value exists.
    int minCount = 1;
    int pos = 1;
    while (pos < A.size() && A[pos++] == A[0])
        minCount++;
    return minCount;
}

int getMaxCount(vector<int>& A)
{
    // Count how often the maximum value exists.
    int maxCount = 1;
    int pos = A.size() - 2;
    while (pos >= 0 && A[pos--] == A[A.size() - 1])
        maxCount++;
    return maxCount;
}
If you think about the algorithm, you will come to the conclusion that it actually calculates the median of the values in the array A. As example input I took the first example given by @PaulHankin. As expected, the code produces the correct result (3) for it.
I hope my approach helps you understand how to tackle such problems even if you don't know the correct solution. This is especially helpful in an interview, for example.

Is Coin Change Algorithm That Output All Combinations Still Solvable By DP?

For example, say the total amount should be 5 and I have coins with values of 1 and 2. Then there are 3 combinations:
1 1 1 1 1
1 1 1 2
1 2 2
I've seen some posts about how to calculate the total number of combinations with dynamic programming or with recursion, but I want to output all the combinations like in my example above. I've come up with a recursive solution below.
It's basically a backtracking algorithm: I start with the smallest coin first and try to reach the total amount, then I remove some coins and try using the second smallest coin, and so on. You can run my code below at http://cpp.sh/
The total amount is 10 and the available coin values are 1, 2, 5 in my code.
#include <iostream>
#include <stdlib.h>
#include <iomanip>
#include <cmath>
#include <vector>

using namespace std;

vector<vector<int>> res;
vector<int> values;
int total = 0;

void helper(vector<int>& curCoins, int current, int i){
    int old = current;
    if(i==values.size())
        return;
    int val = values[i];
    while(current<total){
        current += val;
        curCoins.push_back(val);
    }
    if(current==total){
        res.push_back(curCoins);
    }
    while (current>old) {
        current -= val;
        curCoins.pop_back();
        if (current>=0) {
            helper(curCoins, current, i+1);
        }
    }
}

int main(int argc, const char * argv[]) {
    total = 10;
    values = {1,2,5};
    vector<int> chosenCoins;
    helper(chosenCoins, 0, 0);

    cout<<"number of combinations: "<<res.size()<<endl;
    for (int i=0; i<res.size(); i++) {
        for (int j=0; j<res[i].size(); j++) {
            if(j!=0)
                cout<<" ";
            cout<<res[i][j];
        }
        cout<<endl;
    }
    return 0;
}
Is there a better solution to output all the combinations for this problem? Dynamic programming?
EDIT:
My question is: is this problem solvable using dynamic programming?
Thanks for the help. I've implemented the DP version here: Coin Change DP Algorithm Print All Combinations
A DP solution:
We have
{solutions(n)} = Union ({solutions(n - 1) + coin1},
{solutions(n - 2) + coin2},
{solutions(n - 5) + coin5})
So in code:
#include <array>
#include <set>
#include <vector>

using combi_set = std::set<std::array<int, 3u>>;

void append(combi_set& res, const combi_set& prev, const std::array<int, 3u>& values)
{
    for (const auto& p : prev) {
        res.insert({{{p[0] + values[0], p[1] + values[1], p[2] + values[2]}}});
    }
}

combi_set computeCombi(int total)
{
    std::vector<combi_set> combis(total + 1);
    combis[0].insert({{{0, 0, 0}}});
    for (int i = 1; i <= total; ++i) {
        append(combis[i], combis[i - 1], {{1, 0, 0}});
        if (i - 2 >= 0) { append(combis[i], combis[i - 2], {{0, 1, 0}}); }
        if (i - 5 >= 0) { append(combis[i], combis[i - 5], {{0, 0, 1}}); }
    }
    return combis[total];
}
Exhaustive search is unlikely to be 'better' with dynamic programming, but here's a possible solution:
Start with a 2d array of combination strings, arr[value][index], where value is the total worth of the coins. Let X be the target value;
starting from arr[0][0] = "";
for each coin denomination n, from i = 0 to X-n you copy all the strings from arr[i] to arr[i+n] and append n to each of the strings.
for example with n=5 you would end up with
arr[0][0] = "", arr[5][0] = "5" and arr[10][0] = "5 5"
Hope that made sense. Typical DP would just count instead of having strings (you can also replace the strings with int vector to keep count instead)
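A rough sketch of that idea (the code below is mine, not from the answer), using the question's denominations {1, 2, 5} and total 10; processing one denomination at a time keeps the coins inside each string in non-decreasing order, so no combination is produced twice:

#include <iostream>
#include <string>
#include <vector>

int main()
{
    const int X = 10;                        // target value
    const std::vector<int> coins = {1, 2, 5};
    // arr[value] holds one string per combination summing to value.
    std::vector<std::vector<std::string>> arr(X + 1);
    arr[0].push_back("");
    for (int n : coins)                      // one denomination at a time
        for (int i = 0; i + n <= X; ++i)     // copy arr[i] into arr[i + n], appending n
            for (const std::string& s : arr[i])
                arr[i + n].push_back(s.empty() ? std::to_string(n)
                                               : s + " " + std::to_string(n));
    for (const std::string& s : arr[X])
        std::cout << s << "\n";
    std::cout << "number of combinations: " << arr[X].size() << "\n";
    return 0;
}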
Assume that K is the total size of the output you are expecting (the total number of coins across all the combinations). Obviously you cannot have a solution that runs faster than O(K) if you actually need to output them all. As K can be very large, this will be a very long running time, and in the worst case you will get little profit from dynamic programming.
However, you still can do better than your straightforward recursive solution. Namely, you can have a solution running in O(N*S+K), where N is the number of coin types you have and S is the total sum. This will not be better than the straightforward solution for the worst possible K, but if K is not so big, you will get it running faster than your recursive solution.
This O(N*S+K) solution can be coded relatively simply. First you run the standard DP solution to find out, for each sum current and each i, whether the sum current can be composed of the first i coin types. You do not yet calculate all the solutions; you just find out whether at least one solution exists for each current and i. Then, you write a recursive function similar to what you have already written, but before you try each combination, you check using your DP table whether it is worth trying, that is, whether at least one solution exists. Something like:
void helper(vector<int>& curCoins, int current, int i){
    if (!solutionExists[current][i]) return;
    // ... then the rest of your code goes here
This way each branch of the recursion tree ends up finding a solution, therefore the total recursion tree size is O(K), and the total running time is O(N*S+K).
Note also that all this is worth only if you really need to output all the combinations. If you need to do something else with the combinations you get, it is very probable that you do not actually need all the combinations and you may adapt the DP solution for that. For example, if you want to print only m-th of all solutions, this can be done in O(N*S).
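A rough sketch of how that reachability table could be built (the layout and everything except the name solutionExists are my own choices): solutionExists[cur][i] is true when the target total can still be reached from partial sum cur using coin types i, i+1, ..., values.size()-1.

#include <vector>

std::vector<std::vector<bool>> buildSolutionExists(const std::vector<int>& values, int total)
{
    const int n = (int)values.size();
    std::vector<std::vector<bool>> solutionExists(total + 1, std::vector<bool>(n + 1, false));
    solutionExists[total][n] = true;                       // exact total reached, no coin types left
    for (int i = n - 1; i >= 0; --i)
        for (int cur = total; cur >= 0; --cur)
            // Either skip coin type i, or take one more of it (unbounded supply).
            solutionExists[cur][i] = solutionExists[cur][i + 1]
                || (cur + values[i] <= total && solutionExists[cur + values[i]][i]);
    return solutionExists;
}

Building the table takes O(N*S), and the guard at the top of the recursion is then if (!solutionExists[current][i]) return;, as in the snippet above.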
You just need to make two passes over the data structure (a hash table will work well as long as you've got a relatively small number of coins).
The first one finds all unique sums less than the desired total (actually you could stop perhaps at 1/2 the desired total) and records the simplest way (least additions required) to obtain that sum. This is essentially the same as the DP.
The second pass then starts at the desired total and works its way backwards through the data to output all the ways the total can be generated.
This ends up being a two-stage version of what Petr is suggesting.
The actual number of non-distinct valid combinations for amounts {1, 2, 5} and N = 10 is 128, using a purely recursive exhaustive technique (code below). My question is: can an exhaustive search be improved with memoization/dynamic programming? If so, how can I modify the algorithm below to incorporate such techniques?
public class Recursive {

    static int[] combo = new int[100];

    public static void main(String argv[]) {
        int n = 10;
        int[] amounts = {1, 2, 5};
        ways(n, amounts, combo, 0, 0, 0);
    }

    public static void ways(int n, int[] amounts, int[] combo, int startIndex, int sum, int index) {
        if(sum == n) {
            printArray(combo, index);
        }
        if(sum > n) {
            return;
        }
        for(int i = 0; i < amounts.length; i++) {
            sum = sum + amounts[i];
            combo[index] = amounts[i];
            ways(n, amounts, combo, startIndex, sum, index + 1);
            sum = sum - amounts[i];
        }
    }

    public static void printArray(int[] combo, int index) {
        for(int i = 0; i < index; i++) {
            System.out.print(combo[i] + " ");
        }
        System.out.println();
    }
}

What is the complexity of this program

I have solved a question on HackerEarth.
The question is
Phineas is building a castle in his backyard to impress Isabella (strange, isn't it?). He has got everything delivered and ready. Even the ground floor has been finished. Now it is time to make the upper part. This is where things become interesting. As Ferb is sleeping in the house after a long day painting the fence (and you folks helped him, didn't ya!), Phineas has to do all the work himself. He is good at this, and all he wants you to do is operate the mini crane to lift the stones. The stones for the wall have been cut and are ready, waiting for you to lift them up.
Now that we don't have Ferb to operate the mini crane, in which he is an expert, we have to do the job as quickly as possible. We are given the maximum lifting capacity of the crane and the weight of each stone. Since it's a mini crane, we cannot place more than 2 stones (of any possible size) at a time, or it will disturb the balance of the crane. We need to find out in how many turns we can deliver the stones to Phineas, who is building the castle.
INPUT: The first line of input gives T, the number of test cases. For each test case, the first line gives M, the maximum lifting capacity of the crane. The first integer N of the next line gives the number of stones, followed by N numbers specifying the weight of each individual stone X.
OUTPUT: For each test case, print the minimum number of turns the crane is operated for all stones to be lifted.
CONSTRAINTS:
1 <= T <= 50
1 <= M <= 1000
1 <= N <= 1000
Sample Input
1
50
3 28 22 48
Sample Output
2
Explanation
In first turn, 28 and 22 will be lifted together. In second turn 48 will be lifted.
Discard the stones with weight > max capacity of crane.
Now, I have solved this question and my source code is:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <vector>

using namespace std;

int main(void) {
    int T = 0;
    scanf("%d",&T);
    while(T--) {
        int i = 0, M = 0, N = 0, max = 0, res = 0, index = 0, j = 0, temp = 0;
        vector<int> v1;
        scanf("%d",&M);
        scanf("%d",&N);
        for(i = 0; i < N; ++i) {
            scanf("%d",&temp);
            if(temp <= M)
                v1.push_back(temp);
        }
        for(i = 0; i < v1.size(); ++i) {
            max = 0;
            index = 0;
            if(v1[i] != -1) {
                for(j = i + 1; j < v1.size(); ++j) {
                    if(v1[j] != -1) {
                        temp = v1[i] + v1[j];
                        if(temp > max && temp <= M) {
                            max = temp;
                            index = j;
                        }
                    }
                }
                ++res;
                v1[i] = -1;
                v1[index] = -1;
            }
        }
        printf("%d\n",res);
    }
    return 0;
}
Now here are my question
I want to know the average case time complexity of this code. Also I think worst case complexity of this code would be O(N^2).
Is this a brute force approach or a dynamic programming approach?
Is there any better approach than this?
This is a simplified version of the Knapsack problem.
While the Knapsack problem is a typical dynamic programming question, this simplified question does not require dynamic programming. The complexity of your solution is indeed O(n^2). The approach is better described as greedy, as you try to find an optimal pair for each stone, if one exists. The complexity can be reduced further to O(n log n) if you sort the stones first and work on the sorted vector, as sketched below.
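A minimal sketch of that O(n log n) idea (not from the answer above): sort the stones, then pair the heaviest remaining stone with the lightest one whenever they fit under the capacity M together, otherwise lift the heaviest stone alone.

#include <algorithm>
#include <iostream>
#include <vector>

int minTurns(std::vector<int> stones, int M)
{
    // Drop stones heavier than the crane capacity, as the problem statement says.
    stones.erase(std::remove_if(stones.begin(), stones.end(),
                                [M](int w) { return w > M; }),
                 stones.end());
    std::sort(stones.begin(), stones.end());
    int lo = 0, hi = (int)stones.size() - 1, turns = 0;
    while (lo <= hi) {
        if (lo < hi && stones[lo] + stones[hi] <= M)
            ++lo;                 // the lightest stone rides along with the heaviest
        --hi;                     // the heaviest stone is lifted this turn
        ++turns;
    }
    return turns;
}

int main()
{
    std::cout << minTurns({28, 22, 48}, 50) << std::endl;  // prints 2, as in the sample
    return 0;
}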

C++: function creation using array

Write a function which has:
input: an array of N pairs (unique id and weight), and a number K <= N
output: K random unique ids (from the input array)
Note: when called many times, the frequency with which an id appears in the output should be higher the more weight it has.
Example: id with weight of 5 should appear in the output 5 times more often than id with weight of 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
Thanks for the responses, everybody!
Currently I can't understand how the weight of a pair affects the frequency of its appearance in the output. Can you give me a clearer, "for dummies" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
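A minimal sketch of those steps (all names below are mine; fixed-size arrays are used so nothing is allocated beyond the running totals). Uniqueness of the returned ids is not enforced here:

#include <cstddef>
#include <cstdlib>
#include <iostream>

struct Pair { int id; int weight; };

// Repeat K times: draw a value in [0, total_weight) and walk the array until
// the running sum of weights passes it; that pair's id goes to the output.
template <std::size_t N, std::size_t K>
void pickWeighted(const Pair (&items)[N], int (&out)[K])
{
    int total_weight = 0;
    for (const Pair& p : items)
        total_weight += p.weight;
    for (std::size_t k = 0; k < K; ++k) {
        int selection = std::rand() % total_weight;   // value in [0, total_weight)
        int running = 0;
        for (const Pair& p : items) {
            running += p.weight;
            if (running > selection) {                // first prefix sum exceeding selection
                out[k] = p.id;
                break;
            }
        }
    }
}

int main()
{
    const Pair items[5] = {{3, 7}, {1, 2}, {2, 5}, {4, 1}, {5, 2}};  // example (id, weight) pairs
    int out[3];
    pickWeighted(items, out);
    for (int id : out)
        std::cout << id << " ";
    std::cout << std::endl;
    return 0;
}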
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially, I created a temporary array but this can be done in memory as well, you can calculate the size of the list by summing all the weights up = X, in this example = 17
Pick a random number in [0, X-1], and calculate which id should be returned by looping through the list, doing a cumulative addition of the weights. Say I have the random number 8:
(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!
Now, since you need K random unique ids, you can create a hashtable from the initial array passed to you and work with that. Once you find an id, remove it from the hash and proceed with the algorithm. Edit: note that you create the hashmap only once, at the start! Your algorithm will work on this instead of looking through the array. I did not put it at the top to keep the answer clear.
As long as your random calculation is not secretly using any extra memory, you will need to store K random pickings (K <= N) and a copy of the original array, so the maximum space requirement at runtime is O(2*N).
The asymptotic runtime is:
O(n) : create copy of original array into hashtable +
(
O(n) : calculate sum of weights +
O(1) : calculate random between range +
O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant space (i.e. space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot i where the (now summed) weight is greater than or equal to r.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot i where the (now summed) weight is greater than or equal to r.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
My short answer: in no way.
Just because the problem definition is incorrect. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the uniqueness requirement. Worst case, if K = N, all elements will be returned (i.e. appear with the same frequency), irrespective of their weight.
Anyway, when K is pretty small relative to N, the calculated frequencies will be pretty close to the theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <utility>

// 0 - id, 1 - weight
typedef unsigned Pair[2];

unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
    }
    const unsigned random = rand() % sumOfWeights + 1;
    sumOfWeights = 0;
    unsigned i = 0;
    for (; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
        if (sumOfWeights >= random)
        {
            break;
        }
    }
    return i;
}
Generate unique random numbers
The well-known Durstenfeld-Fisher-Yates shuffle may be used for generating unique random numbers. See this great explanation.
It requires O(N) extra space, so if N is known at compile time, we are able to allocate the necessary space at compile time.
Now, we have to combine these two algorithms. We just need to use our own Random() function instead of the standard rand() in the unique-numbers generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
    unsigned deck[N];
    for (unsigned i = 0; i < N; ++i)
    {
        deck[i] = i;
    }
    unsigned max = N - 1;
    for (unsigned i = 0; i < K; ++i)
    {
        const unsigned index = Random(i_set, deck, max + 1);
        std::swap(deck[max], deck[index]);
        o_res[i] = i_set[deck[max]][0];
        --max;
    }
}
Usage
int main()
{
    srand((unsigned)time(0));

    const unsigned c_N = 5; // N
    const unsigned c_K = 2; // K
    Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
    unsigned result[c_K] = {};

    const unsigned c_total = 1000000; // number of iterations
    unsigned counts[c_N] = {0};       // frequency counters

    for (unsigned i = 0; i < c_total; ++i)
    {
        Generate<c_N, c_K>(input, result);
        for (unsigned j = 0; j < c_K; ++j)
        {
            ++counts[result[j]];
        }
    }

    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < c_N; ++i)
    {
        sumOfWeights += input[i][1];
    }

    for (unsigned i = 0; i < c_N; ++i)
    {
        std::cout << (double)counts[i]/c_K/c_total   // empirical frequency
                  << " | "
                  << (double)input[i][1]/sumOfWeights // expected frequency
                  << std::endl;
    }

    return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes this problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(N^2) time, using O(N) memory (the input array itself plus constant memory); a code sketch follows after the steps.
I assume that the weights are positive.
Let A be the array of pairs.
1) Set N to be A.length
2) Calculate the sum of all weights, W.
3) Loop K times:
3.1) r = rand(0, W)
3.2) Loop over A and find the first index i such that A[1].w + ... + A[i].w > r
3.3) Add A[i].id to the output
3.4) W = W - A[i].w (do this before overwriting A[i] in the next step)
3.5) A[i] = A[N-1] (or swap if the array contents should be preserved)
3.6) N = N - 1
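A minimal sketch of those steps (the struct and function names are mine; for simplicity it works on a local copy of the array rather than in place):

#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

struct IdWeight { int id; int w; };

// Returns K distinct ids; each draw is proportional to the weights of the ids
// not yet chosen. Works on a local copy of A, so the caller's array is untouched.
std::vector<int> sampleUnique(std::vector<IdWeight> A, std::size_t K)
{
    std::vector<int> out;
    int W = 0;
    for (const IdWeight& p : A)
        W += p.w;
    std::size_t N = A.size();
    for (std::size_t k = 0; k < K && N > 0; ++k) {
        int r = std::rand() % W;                // r in [0, W)
        std::size_t i = 0;
        int running = A[0].w;
        while (running <= r)                    // find the first prefix sum > r
            running += A[++i].w;
        out.push_back(A[i].id);
        W -= A[i].w;                            // subtract before overwriting A[i]
        A[i] = A[N - 1];
        --N;
    }
    return out;
}

int main()
{
    std::vector<IdWeight> input = {{3, 7}, {1, 2}, {2, 5}, {4, 1}, {5, 2}};
    for (int id : sampleUnique(input, 3))
        std::cout << id << " ";
    std::cout << std::endl;
    return 0;
}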