Weighted random numbers - C++

I'm trying to implement weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.
In my project (Hold'em hand ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's Mersenne twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:
1 (weight: 90)
2 (weight: 56)
3 (weight: 4)
Does Boost have some sort of functionality for this?

There is a straightforward algorithm for picking an item at random, where items have individual weights:
1) calculate the sum of all the weights
2) pick a random number that is 0 or greater and is less than the sum of the weights
3) go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight
Pseudo-code illustrating this:
int sum_of_weight = 0;
for (int i = 0; i < num_choices; i++) {
    sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for (int i = 0; i < num_choices; i++) {
    if (rnd < choice_weight[i])
        return i;
    rnd -= choice_weight[i];
}
assert(!"should never get here");
This should be straightforward to adapt to your boost containers and such.
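For instance, a minimal sketch of the same algorithm using Boost's Mersenne twister, as in the question (pickWeighted is an illustrative name, not part of any library):

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int_distribution.hpp>
#include <cstddef>
#include <numeric>
#include <vector>

// Weighted pick with Boost's Mersenne twister; returns the chosen index.
int pickWeighted(const std::vector<int>& weights, boost::random::mt19937& gen)
{
    int sum = std::accumulate(weights.begin(), weights.end(), 0);
    boost::random::uniform_int_distribution<int> dist(0, sum - 1);
    int rnd = dist(gen);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (rnd < weights[i])
            return static_cast<int>(i);
        rnd -= weights[i];
    }
    return -1; // unreachable while sum > 0
}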
If your weights are rarely changed but you often pick one at random, and as long as your container is storing pointers to the objects or is more than a few dozen items long (basically, you have to profile to know if this helps or hinders), then there is an optimisation:
By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.
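A sketch of that optimisation (illustrative names; the cumulative table is rebuilt only when the weights change):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Rebuilt only when the weights change: cumulative[i] = w[0] + ... + w[i]
std::vector<int> cumulative;
void rebuildCumulative(const std::vector<int>& weights)
{
    cumulative.resize(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());
}

// Each draw is then O(log n): the first cumulative sum strictly greater
// than the picked value identifies the item.
int pick(std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(0, cumulative.back() - 1);
    return std::upper_bound(cumulative.begin(), cumulative.end(), dist(gen))
           - cumulative.begin();
}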
If you do not know the number of items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.
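One well-known weighted adaptation is the Efraimidis-Spirakis "A-Res" scheme; a minimal sketch (the function name is illustrative):

#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Weighted reservoir pick: give each (item, weight) the key u^(1/w) with
// u uniform in (0, 1) and keep the largest key. One pass, and there is no
// need to know the number of items up front.
int weightedReservoirPick(const std::vector<std::pair<int, double>>& stream,
                          std::mt19937& gen)
{
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int best = -1;
    double bestKey = -1.0;
    for (const auto& [item, weight] : stream) {
        double key = std::pow(u(gen), 1.0 / weight);
        if (key > bestKey) {
            bestKey = key;
            best = item;
        }
    }
    return best;
}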

Updated answer to an old question. You can easily do this in C++11 with just the std::lib:
#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>

int main()
{
    // Set up distribution
    double interval[] = {1,   2,   3,   4};
    double weights[] =  {.90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Choose generator
    std::mt19937 gen(std::time(0)); // seed as wanted

    // Demonstrate with N randomly generated numbers
    const unsigned N = 1000000;
    // Collect number of times each random number is generated
    double avg[std::extent<decltype(weights)>::value] = {0};
    for (unsigned i = 0; i < N; ++i)
    {
        // Generate random number using gen, distributed according to dist
        unsigned r = static_cast<unsigned>(dist(gen));
        // Sanity check
        assert(interval[0] <= r && r <= *(std::end(interval) - 2));
        // Save r for statistical test of distribution
        avg[r - 1]++;
    }

    // Compute averages for distribution
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;

    // Display distribution
    for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
        std::cout << "avg[" << i << "] = " << avg[i - 1] << '\n';
}
Output on my system:
avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544
Note that most of the code above is devoted to just displaying and analyzing the output. The actual generation is just a few lines of code. The output demonstrates that the requested "probabilities" have been obtained: you have to divide the requested weights by 1.5, since that is what they add up to.

If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:
#include <ctime>
#include <iterator>
#include <random>
#include <vector>

int main()
{
    std::vector<double> weights{90, 56, 4};
    std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
    std::mt19937 gen;
    gen.seed(std::time(0)); // if you want different results from different runs
    int N = 100000;
    std::vector<int> samples(N);
    for (auto& i : samples)
        i = dist(gen);
    // do something with your samples...
}
Note, however, that the C++11 discrete_distribution computes all of the cumulative sums on initialization. Usually you want that, because it speeds up the sampling time for a one-time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance, if the weights represent how many items there are and, every time you draw one, you remove it, you will probably want a custom algorithm (a sketch follows below).
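A minimal sketch of such a draw-and-remove scheme (illustrative names; the weights are item counts that shrink as you draw, so precomputed sums would be wasted):

#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Weights are item counts; each draw removes one instance of the drawn item,
// so the distribution changes on every call.
int drawAndRemove(std::vector<int>& counts, std::mt19937& gen)
{
    int total = std::accumulate(counts.begin(), counts.end(), 0);
    std::uniform_int_distribution<int> dist(0, total - 1);
    int r = dist(gen);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (r < counts[i]) {
            --counts[i]; // remove the drawn item
            return static_cast<int>(i);
        }
        r -= counts[i];
    }
    return -1; // unreachable while total > 0
}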
Will's answer https://stackoverflow.com/a/1761646/837451 avoids this overhead, but will be slower to draw from than the C++11 version because it can't use binary search.
To see that discrete_distribution does this, look at the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):
template<typename _IntType>
  void
  discrete_distribution<_IntType>::param_type::
  _M_initialize()
  {
    if (_M_prob.size() < 2)
      {
        _M_prob.clear();
        return;
      }

    const double __sum = std::accumulate(_M_prob.begin(),
                                         _M_prob.end(), 0.0);
    // Now normalize the probabilites.
    __detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
                          __sum);
    // Accumulate partial sums.
    _M_cp.reserve(_M_prob.size());
    std::partial_sum(_M_prob.begin(), _M_prob.end(),
                     std::back_inserter(_M_cp));
    // Make sure the last cumulative probability is one.
    _M_cp[_M_cp.size() - 1] = 1.0;
  }

What I do when I need weighted numbers is use a random number to pick the weight.
For example: I need to generate random numbers from 1 to 3 with the following weights:
10% chance that the random number is 1
30% chance that the random number is 2
60% chance that the random number is 3
Then I use:
int randomNumber;
int weight = rand() % 10;
switch (weight) {
    case 0:
        randomNumber = 1;
        break;
    case 1:
    case 2:
    case 3:
        randomNumber = 2;
        break;
    case 4:
    case 5:
    case 6:
    case 7:
    case 8:
    case 9:
        randomNumber = 3;
        break;
}
With this, the result randomly has a 10% probability of being 1, 30% of being 2 and 60% of being 3.
You can adapt it to your needs.
Hope I could help you, good luck!

Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each items is proportional to your weighting.
Example:
1 60%
2 35%
3 5%
So have a bag with 100 items with 60 1's, 35 2's and 5 3's.
Now randomly sort the bag (std::random_shuffle)
Pick elements from the bag sequentially until it is empty.
Once it is empty, re-randomize the bag and start again (see the sketch below).
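A minimal sketch of this bag approach for the 60/35/5 example (std::shuffle is used here, since std::random_shuffle was deprecated in C++14 and removed in C++17):

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    // Bag of 100 items matching the 60/35/5 weighting above.
    std::vector<int> bag;
    bag.insert(bag.end(), 60, 1); // 60 ones
    bag.insert(bag.end(), 35, 2); // 35 twos
    bag.insert(bag.end(), 5, 3);  // 5 threes

    std::mt19937 gen(std::random_device{}());
    std::size_t next = bag.size(); // forces an initial shuffle
    for (int draws = 0; draws < 10; ++draws) {
        if (next == bag.size()) { // bag empty: re-randomize and start again
            std::shuffle(bag.begin(), bag.end(), gen);
            next = 0;
        }
        std::cout << bag[next++] << ' ';
    }
}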

Choose a random number on [0,1), which should be the default operator() for a boost RNG. Choose the item with cumulative probability density function >= that number:
template <class It, class P>
It choose_p(It begin, It end, P const& p)
{
    if (begin == end) return end;
    double sum = 0.;
    for (It i = begin; i != end; ++i)
        sum += p(*i);
    double choice = sum * random01();
    for (It i = begin;;) {
        choice -= p(*i);
        It r = i;
        ++i;
        if (choice < 0 || i == end) return r;
    }
    return begin; // unreachable
}
Where random01() returns a double >=0 and <1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.
p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.
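For example, a possible usage with a plain sequence of weights and an identity for p, together with one assumed definition of random01() (both illustrative; random01() must be declared before the choose_p template is defined):

#include <random>
#include <vector>

// One possible random01(): a double in [0, 1), as assumed by choose_p above.
double random01()
{
    static std::mt19937 gen(std::random_device{}());
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}

int main()
{
    std::vector<double> weights = {90, 56, 4};
    // identity: each element is its own probability weight
    auto it = choose_p(weights.begin(), weights.end(),
                       [](double w) { return w; });
    int index = it - weights.begin(); // 0, 1 or 2, with odds 90:56:4
    (void)index;
}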

This is my understanding of a "weighted random"; I've been using it recently. (The code is in Python but can be implemented in other languages.)
Let's say you want to pick a random person and they don't have equal chances of being selected.
You can give each person a "weight" or "chance" value:
choices = [("Ade", 60), ("Tope", 50), ("Maryamu", 30)]
You use their weights to calculate a score for each then find the choice with the highest score
import math
import random

highest = [None, 0]
for p in choices:
    score = math.floor(random.random() * p[1])
    if score > highest[1]:
        highest[0] = p
        highest[1] = score
print(highest)
For Ade the highest score they can get is 60, Tope 50 and so on, meaning that Ade has a higher chance of generating the largest score than the rest.
You can use any range of weights, the greater the difference the more skewed the distribution.
E.g if Ade had a weight of 1000 they will almost always be chosen.
Test
votes = [{"name": "Ade", "votes": 0}, {"name": "Tope", "votes": 0}, {"name": "Maryamu", "votes": 0}]
for v in range(100):
    highest = [None, 0]
    for p in choices:
        score = math.floor(random.random() * p[1])
        if score > highest[1]:
            highest[0] = p
            highest[1] = score
    candidate = choices.index(highest[0])  # get index of person
    votes[candidate]["votes"] += 1         # increase vote count
print(votes)
# votes printed at the end; your results might be different
[{"name": "Ade", "votes": 45}, {"name": "Tope", "votes": 30}, {"name": "Maryamu", "votes": 25}]
Issues
It looks like the more the voters, the more predictable the results. Welp
Hope this gives someone an idea...

I have just implemented the solution given by Will:
#include <iostream>
#include <cstdlib> // rand()
#include <map>
using namespace std;

template <class T>
class WeightedRandomSample
{
public:
    void SetWeigthMap(map<T, unsigned int>& WeightMap)
    {
        m_pMap = &WeightMap;
    }

    T GetRandomSample()
    {
        unsigned int sum_of_weight = GetSumOfWeights();
        unsigned int rnd = (rand() % sum_of_weight);
        map<T, unsigned int>& w_map = *m_pMap;
        typename map<T, unsigned int>::iterator it;
        for (it = w_map.begin(); it != w_map.end(); ++it)
        {
            unsigned int w = it->second;
            if (rnd < w)
                return (it->first);
            rnd -= w;
        }
        //assert(!"should never get here");
        T* t = NULL;
        return *(t);
    }

    unsigned int GetSumOfWeights()
    {
        if (m_pMap == NULL)
            return 0;
        unsigned int sum = 0;
        map<T, unsigned int>& w_map = *m_pMap;
        typename map<T, unsigned int>::iterator it;
        for (it = w_map.begin(); it != w_map.end(); ++it)
        {
            sum += it->second;
        }
        return sum;
    }

protected:
    map<T, unsigned int>* m_pMap = NULL;
};
typedef pair<int, int> PAIR_INT_INT;
typedef map<PAIR_INT_INT, unsigned int> mul_table_weighted_map;

int main()
{
    mul_table_weighted_map m;
    m[PAIR_INT_INT(2, 3)] = 10;
    m[PAIR_INT_INT(4, 5)] = 20;
    m[PAIR_INT_INT(2, 5)] = 10;

    WeightedRandomSample<PAIR_INT_INT> WRS;
    WRS.SetWeigthMap(m);
    unsigned int sum_of_weight = WRS.GetSumOfWeights();
    cout << "Sum of weights : " << sum_of_weight << endl;

    unsigned int number_of_test = 10000;
    cout << "testing " << number_of_test << " ..." << endl;
    map<PAIR_INT_INT, unsigned int> check_map;
    for (int i = 0; i < number_of_test; i++)
    {
        PAIR_INT_INT res = WRS.GetRandomSample();
        check_map[res]++;
        //cout << i+1 << ": random = " << res.first << " * " << res.second << endl;
    }

    cout << "results: " << endl;
    for (auto t : check_map)
    {
        PAIR_INT_INT p = t.first;
        unsigned int expected = (number_of_test * m[p]) / sum_of_weight;
        cout << " pair " << p.first << " * " << p.second
             << ", counted = " << t.second
             << ", expected = " << expected
             << endl;
    }
    return 0;
}

For example, generating a random index into a vector of weights, where the weight at each index determines how likely that index is to be picked, can be done this way:
#include <bits/stdc++.h>
using namespace std;

int getWeightedRandomNumber(vector<int> weights) {
    vector<int> vec;
    for (int i = 0; i < weights.size(); i++) {
        for (int j = 0; j < weights[i]; j++) {
            vec.push_back(i);
        }
    }
    random_shuffle(vec.begin(), vec.end());
    return vec.front();
}

int main()
{
    vector<int> v{2, 4, 5, 100, 1, 2, 4, 4};
    for (int i = 0; i < 100; i++) {
        cout << getWeightedRandomNumber(v) << endl;
    }
}
Since we are constructing another vector with (number of elements) roughly equal to (current number of elements) * (mean weight), this approach might not work when dealing with large data.

Related

How can I add limited coins to the coin change problem? (Bottom-up - Dynamic programming)

I am new to dynamic programming (and to C++, though I have more experience with it; some things are still unknown to me). How can I add LIMITED COINS to the coin change problem (see my code below - it is a bit messy, but I'm still working on it)? I have a variable nr[100] that registers the number of coins (I also created some conditions in my read_values()). I don't know where I can use it in my code.
The code considers that we have an INFINITE supply of coins (which I don't want).
It is written using the bottom-up method (dynamic programming).
My code is inspired by this video: Youtube
#include <iostream>
using namespace std;

int C[100], b[100], n, S, s[100], nr[100], i, condition = 0, ok = 1;

void read_values() // reads input
{
    cin >> n; // coin types
    cin >> S; // amount to change
    for (i = 1; i <= n; i++)
    {
        cin >> b[i];  // coin value
        cin >> nr[i]; // coin amount
        if (nr[i] == 0) b[i] = 0; // if there is no coin amount then the coin is ignored
        condition += b[i] * nr[i]; // tests to see if we have enough coins / amount of coins to create a solution
        if (b[i] > S)
        {
            b[i] = 0;
        }
    }
    if (S > condition)
    {
        cout << endl;
        cout << "Impossible!";
        ok = 0;
    }
}

void payS()
{
    int i, j;
    C[0] = 0; // if the amount to change is 0 then the solution is 0
    for (j = 1; j <= S; j++)
    {
        C[j] = S + 1;
        for (i = 1; i <= n; i++)
        {
            if (b[i] <= j && 1 + C[j - b[i]] < C[j])
            {
                C[j] = 1 + C[j - b[i]];
                s[j] = b[i];
            }
        }
    }
    cout << "Minimum ways to pay the amount: " << C[S] << endl;
}

void solution(int j)
{
    if (j > 0)
    {
        solution(j - s[j]);
        cout << s[j] << " ";
    }
}

int main()
{
    read_values();
    if (ok != 0)
    {
        payS();
        cout << "The coins that have been used are: ";
        solution(S);
    }
}
I'm working under the assumption that you need to generate change for a positive integer value, amount using your nbr table where nbr[n] is the number of coins available of value n. I'm also working under the assumption that nbr[0] is effectively meaningless since it would only represent coins of no value.
Most dynamic programming problems typically recurse on a binary decision of choosing option A vs option B. Oftentimes one option is "pick this one" and the other is "don't pick this one and use the rest of the available set". This problem is really no different.
First, let's solve the recursive dynamic problem without a cache.
I'm going to replace your nbr variable with a data structure called a "cointable". This is used to keep track of both the available set of coins and the set of coins selected for any given solution path:
struct cointable
{
    static const int MAX_COIN_VALUE = 100;
    int table[MAX_COIN_VALUE + 1]; // table[n] maps "coin of value n" to "number of coins available at amount n"
    int number;                    // number of coins in table
};
cointable::table is effectively the same thing as your nbr array. cointable::number is the summation of the values in table. It's not used to keep track of available coins, but it is used to keep track of the better solution.
Now we can introduce the recursive solution without a lookup cache.
Each step of the recursion does this:
Look for the highest valuable coin that is in the set of available coins not greater than the target amount being solved for
Recurse on option A: Pick this coin selected from step 1. Now solve (recursively) for the reduced amount using the reduced set of available coins.
Recurse on option B: Don't pick this coin, but instead recurse with the first coin of lesser value than what was found in step 1.
Compare the recursion results of 2 and 3. Pick the one with lesser number of coins used
Here's the code - without using an optimal lookup cache
bool generateChange(int amount, cointable& available, cointable& solution, int maxindex)
{
    if ((maxindex == 0) || (amount < 0))
    {
        return false;
    }
    if (amount == 0)
    {
        return true;
    }

    int bestcoin = 0;

    // find the highest available coin that is not greater than amount
    if (maxindex > amount)
    {
        maxindex = amount;
    }

    // assert(maxindex <= cointable::MAX_COIN_VALUE)
    for (int i = maxindex; i >= 1; i--)
    {
        if (available.table[i] > 0)
        {
            bestcoin = i;
            break;
        }
    }

    if (bestcoin == 0)
    {
        return false; // out of coins
    }

    // go down two paths - one with picking this coin. Another not picking it

    // option 1
    // pick this coin (clone available and result)
    cointable a1 = available;
    cointable r1 = solution;
    a1.table[bestcoin]--;
    r1.table[bestcoin]++;
    r1.number++;
    bool result1 = generateChange(amount - bestcoin, a1, r1, bestcoin);

    // option 2 - don't pick this coin and start looking for solutions with lesser
    // coins (note the use of references for a2 and r2 since we haven't changed anything)
    cointable& a2 = available;
    cointable& r2 = solution;
    bool result2 = generateChange(amount, a2, r2, bestcoin - 1);

    bool isSolvable = result1 || result2;
    if (!isSolvable)
    {
        return false;
    }

    // note: solution and r2 are the same object, no need to reassign solution=r2
    if (
        ((result1 && result2) && (r1.number < r2.number))
        || (result2 == false)
       )
    {
        solution = r1;
    }
    return true;
}
And then a quick demonstration for how to calculate change for 128 cents given a limited amount of coins in the larger denominations: {1:100, 5:20, 10:10, 25:1, 50:1}
#include <iostream> // for std::cout

int main()
{
    cointable available = {}; // zero-init
    cointable solution = {};  // zero-init
    available.table[1] = 100;
    available.table[5] = 20;
    available.table[10] = 10;
    available.table[25] = 1;
    available.table[50] = 1;

    int amount = 128;
    bool result = generateChange(amount, available, solution, cointable::MAX_COIN_VALUE);
    if (result == true)
    {
        for (int i = 1; i < 100; i++)
        {
            if (solution.table[i] > 0)
            {
                std::cout << i << " : " << solution.table[i] << "\n";
            }
        }
    }
    else
    {
        std::cout << "no solution\n";
    }
}
And that should work. It might be fast enough for making change for anything under a dollar, such that a cache is not warranted. So it's possible we can stop right here and be done.
And I am going to stop right here
I started to work on a solution that introduces a "cache" to avoid redundant recursions. But after benchmarking it and studying how the algorithm finds the best solution quickly, I'm not so sure a cache is warranted. My initial attempt to insert a cache table for both solvable and unsolvable solutions just made the code slower. I'll need to study how to make it work - if it's even warranted at all.
Maybe you wanted us to fix your code, but instead I implemented my own version of the solution. Hopefully my version will be useful for you somehow, at least educationally.
Of course I used a Dynamic Programming approach for that.
I keep a vector of changes that are possible to compose. Each next sum is composed of previous sums by adding several coins of the same value.
A history of used coins is also kept; this allows us to restore each change as a combination of exactly the given coins.
After the code you can see console output that shows an example of composing the change 13 out of coins 2x4, 3x3, 5x2, 10x1 (here the second number is the amount of coins).
The input coins and their amounts are given inside the coins vector at the start of the main() function; you can fill this vector with anything you want, for example by taking console user input. The change to be represented is given in the variable change.
Don't forget to see the Post Scriptum (PS.) after the code and console output; it has some more details about the algorithm.
Full code below:
Try it online!
#include <cstdint>
#include <vector>
#include <unordered_map>
#include <set>
#include <algorithm>
#include <functional>
#include <iostream>

using u32 = uint32_t;
using u64 = uint64_t;

int main() {
    std::vector<std::pair<u32, u32>> const coins =
        {{2, 4}, {3, 3}, {5, 2}, {10, 1}};
    u32 const change = 13;

    std::vector<std::unordered_map<u32, std::pair<u64, std::set<u32>>>>
        sums = {{{0, {1, {}}}}};
    for (auto [coin_val, coin_cnt]: coins) {
        sums.push_back({});
        for (auto const & [k, v]: sums.at(sums.size() - 2))
            for (size_t icnt = 0; icnt <= coin_cnt; ++icnt) {
                auto & [vars, prevs] = sums.back()[k + coin_val * icnt];
                vars += v.first;
                prevs.insert(icnt);
            }
    }

    std::vector<std::pair<u32, u32>> path;
    std::vector<std::vector<std::pair<u32, u32>>> paths;
    std::function<bool(u32, u32, u32)> Paths =
        [&](u32 sum, u32 depth, u32 limit){
            if (sum == 0) {
                paths.push_back(path);
                std::reverse(paths.back().begin(), paths.back().end());
                return paths.size() < limit;
            }
            auto const coin = coins.at(depth - 1).first;
            auto const & [_, prevs] = sums.at(depth).at(sum);
            for (auto const cnt: prevs) {
                if (cnt > 0)
                    path.push_back({coin, cnt});
                if (!Paths(sum - coin * cnt, depth - 1, limit))
                    return false;
                if (cnt > 0)
                    path.pop_back();
            }
            return true;
        };

    if (!sums.back().count(change)) {
        std::cout << "Change " << change
                  << " can NOT be represented." << std::endl;
        return 0;
    }
    std::cout << "Change " << change << " can be composed "
              << std::get<0>(sums.back().at(change)) << " different ways." << std::endl;
    Paths(change, coins.size(), 20);
    std::cout << "First " << paths.size() << " variants:" << std::endl;
    for (auto const & path: paths) {
        std::cout << change << " = ";
        for (auto [coin, cnt]: path)
            std::cout << coin << "x" << cnt << " + ";
        std::cout << std::endl;
    }
}
Output:
Change 13 can be composed 5 different ways.
First 5 variants:
13 = 2x2 + 3x3 +
13 = 2x4 + 5x1 +
13 = 2x1 + 3x2 + 5x1 +
13 = 3x1 + 5x2 +
13 = 3x1 + 10x1 +
PS. As you may have noticed, the main Dynamic Programming part of the algorithm is very tiny, just the following lines:
std::vector<std::unordered_map<u32, std::pair<u64, std::set<u32>>>>
    sums = {{{0, {1, {}}}}};
for (auto [coin_val, coin_cnt]: coins) {
    sums.push_back({});
    for (auto const & [k, v]: sums.at(sums.size() - 2))
        for (size_t icnt = 0; icnt <= coin_cnt; ++icnt) {
            auto & [vars, prevs] = sums.back()[k + coin_val * icnt];
            vars += v.first;
            prevs.insert(icnt);
        }
}
This part keeps all currently composable sums (changes). The algorithm starts from a money change of 0, then incrementally adds coins 1-by-1 to all possible current changes (sums), thus forming new sums (including the new coin).
Each sum keeps a counter of all possible ways to compose it, plus it keeps track of all last coins that lead to this sum. This set of last coins allows us to do back-tracking in order to restore concrete combinations of coins, not just the number of ways to compose the sum.

How to find all possible combinations of adding two variables, each attached to a multiplier, summing up to a given number (cin)?

In my situation, a lorry has a capacity of 30, while a van has a capacity of 10. I need to find the number of vans/lorries needed to transport a given amount of cargo, say 100. I need to find all possible combinations of lorries + vans that will add up to 100.
The basic math calculation would be: (30*lorrycount) + (10*vancount) = n, where n is the amount of cargo.
Output Example
Cargo to be transported: 100
Number of Lorry: 0 3 2 1
Number of Van: 10 1 4 7
For example, the 2nd combination is 3 lorries, 1 van. Considering that lorries have capacity = 30 and van capacity = 10, (30*3)+(10*1) = 100 = n.
For now, we only have this code, which finds literally all combinations of numbers that add up to the given number n, without considering the formula given above.
#include <iostream>
#include <vector>
using namespace std;

void findCombinationsUtil(int arr[], int index,
                          int num, int reducedNum)
{
    int lorry_capacity = 30;
    int van_capacity = 10;

    // Base condition
    if (reducedNum < 0)
        return;

    // If combination is found, print it
    if (reducedNum == 0)
    {
        for (int i = 0; i < index; i++)
            cout << arr[i] << " ";
        cout << endl;
        return;
    }

    // Find the previous number stored in arr[]
    // It helps in maintaining increasing order
    int prev = (index == 0) ? 1 : arr[index - 1];

    // note loop starts from previous number
    // i.e. at array location index - 1
    for (int k = prev; k <= num; k++)
    {
        // next element of array is k
        arr[index] = k;

        // call recursively with reduced number
        findCombinationsUtil(arr, index + 1, num,
                             reducedNum - k);
    }
}

void findCombinations(int n)
{
    // array to store the combinations
    // It can contain max n elements
    std::vector<int> arr(n); // allocate n elements

    // find all combinations
    findCombinationsUtil(&*arr.begin(), 0, n, n);
}

int main()
{
    int n;
    cout << "Enter the amount of cargo you want to transport: ";
    cin >> n;
    cout << endl;
    //const int n = 10;
    findCombinations(n);
    return 0;
}
Do let me know if you have any solution to this, thank you.
An iterative way of finding all possible combinations:
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    int cw = 100;
    int lw = 30, vw = 10;
    int maxl = cw / lw; // maximum no. of lorries that can be there
    std::vector<std::pair<int, int>> solutions;
    // for the inclusive range of 0 to maxl, find the corresponding no. of vans for each variant of no. of lorries
    for (int l = 0; l <= maxl; ++l) {
        bool is_integer = (cw - l * lw) % vw == 0; // only if this is true is there an integer no. of vans for the given l
        if (is_integer) {
            int v = (cw - l * lw) / vw; // no. of vans
            solutions.push_back(std::make_pair(l, v));
        }
    }
    for (auto& solution : solutions) {
        std::cout << solution.first << " lorries and " << solution.second << " vans" << std::endl;
    }
    return 0;
}
We will create a recursive function that walks a global capacities array left to right and tries to load cargo into the various vehicle types. We keep track of how much we still have to load and pass that on to any recursive call. If we reach the end of the array, we produce a solution only if the remaining cargo is zero.
#include <iterator>
#include <vector>

std::vector<int> capacities = { 30, 10 };

using Solution = std::vector<int>;
using Solutions = std::vector<Solution>;

void tryLoad(int remaining_cargo, int vehicle_index, Solution so_far,
             std::back_insert_iterator<Solutions>& solutions) {
    if (vehicle_index == capacities.size()) {
        if (remaining_cargo == 0) // we have a solution
            *solutions++ = so_far;
        return;
    }
    int capacity = capacities[vehicle_index];
    for (int vehicles = 0; vehicles <= remaining_cargo / capacity; vehicles++) {
        Solution new_solution = so_far;
        new_solution.push_back(vehicles);
        tryLoad(remaining_cargo - vehicles * capacity, vehicle_index + 1, new_solution, solutions);
    }
}
Calling this as follows should produce the desired output in all_solutions:
Solutions all_solutions;
auto inserter = std::back_inserter(all_solutions);
tryLoad(100, 0, Solution{}, inserter);

How to pick a sequence of numbers (from a fixed list) that will sum to a target number?

Let's say I have a target number and a list of possible values that I can pick from to create a sequence that, once every picked number is summed, adds up to the target:
target = 31
list = 2, 3, 4
possible sequence: 3 2 4 2 2 2 4 2 3 2 3 2
I'd like to:
first decide if there is any sequence that will reach the target
return one of the many (possible) sequences
This is my attempt:
#include <iostream>
#include <random>
#include <chrono>
#include <vector>

inline int GetRandomInt(int min = 0, int max = 1) {
    uint64_t timeSeed = std::chrono::high_resolution_clock::now().time_since_epoch().count();
    std::seed_seq ss{ uint32_t(timeSeed & 0xffffffff), uint32_t(timeSeed >> 32) };
    std::mt19937_64 rng;
    rng.seed(ss);
    std::uniform_int_distribution<int> unif(min, max);
    return unif(rng);
}

void CreateSequence(int target, std::vector<int>& availableNumbers) {
    int numAttempts = 1;
    int count = 0;
    std::vector<int> elements;

    while (count != target) {
        while (count < target) {
            int elem = availableNumbers[GetRandomInt(0, availableNumbers.size() - 1)];
            count += elem;
            elements.push_back(elem);
        }
        if (count != target) {
            numAttempts++;
            count = 0;
            elements.clear();
        }
    }

    int size = elements.size();
    std::cout << "count: " << count << " | " << "num elements: " << size << " | " << "num attempts: " << numAttempts << std::endl;
    for (auto it = elements.begin(); it != elements.end(); it++) {
        std::cout << *it << " ";
    }
}

int main() {
    std::vector<int> availableNumbers = { 2, 3, 4 };
    CreateSequence(31, availableNumbers);
}
But it can loop infinitely if no combination from the list can reach such a sum; for example:
std::vector<int> availableNumbers = { 3 };
CreateSequence(8, availableNumbers);
No sequence of 3s will sum to 8. Also, if the list is huge and the target number high, it can lead to a huge amount of processing (because lots of the while checks fail).
How would you implement this kind of algorithm?
Your suggested code is possibly very fast, since it is heuristic. But as you said, it gets potentially trapped in a nearly endless loop.
If you want to avoid this situation, you have to search the complete set of possible combinations.
Abstraction
Let's define our algorithm as a function f with a scalar target t and a vector <b> as parameters returning a vector of coefficients <c>, where <b> and <c> have the same dimension:
<c> = f(t, <b>)
First the given set of numbers Sg should be reduced to a reduced set Sr, so we reduce the dimension of our solution vector <c>. E.g. {2,3,4,11} can be reduced to {2,3}. We get this by calling our algorithm recursively, splitting Sg into a new target ti with the remaining numbers as the new given set Sgi, and asking the algorithm whether it finds any solution (a non-zero vector). If so, remove that target ti from the original given set Sg. Repeat this recursively until no more solutions are found.
Now we can understand this set of numbers as a polynomial, where we are looking for possible coefficients ci to get our target t. Let's call each element in Sb bi with i={1..n}.
Our test sum ts is the sum over all i for ci * bi, where each ci can run from 0 to ni = floor(t/bi).
The number of possible tests N is now the product over all ni+1: N = (n1+1) * (n2+1) * ... * (ni+1).
Iterate now over all possibilities by representing the coefficient vector <c> as a vector of integers, incrementing c1 and carrying an overrun over to the next element in the vector, resetting c1, and so forth.
Example
#include <random>
#include <chrono>
#include <vector>
#include <iostream>

using namespace std;

static int evaluatePolynomial(const vector<int>& base, const vector<int>& coefficients)
{
    int v = 0;
    for (unsigned long i = 0; i < base.size(); i++) {
        v += base[i] * coefficients[i];
    }
    return v;
}

static bool isZeroVector(vector<int>& v)
{
    for (auto it = v.begin(); it != v.end(); it++) {
        if (*it != 0) {
            return false;
        }
    }
    return true;
}

static vector<int> searchCoeffs(int target, vector<int>& set) {
    // TODO: reduce given set
    vector<int> n = set;
    vector<int> c = vector<int>(set.size(), 0);
    for (unsigned long int i = 0; i < set.size(); i++) {
        n[i] = target / set[i];
    }
    c[0] = 1;
    bool overflow = false;
    while (!overflow) {
        if (evaluatePolynomial(set, c) == target) {
            return c;
        }
        // increment coefficient vector
        overflow = true;
        for (unsigned long int i = 0; i < c.size(); i++) {
            c[i]++;
            if (c[i] > n[i]) {
                c[i] = 0;
            } else {
                overflow = false;
                break;
            }
        }
    }
    return vector<int>(set.size(), 0);
}

static void print(int target, vector<int>& set, vector<int>& c)
{
    for (unsigned long i = 0; i < set.size(); i++) {
        for (int j = 0; j < c[i]; j++) {
            cout << set[i] << " ";
        }
    }
    cout << endl;
    cout << target << " = ";
    for (unsigned long i = 0; i < set.size(); i++) {
        cout << " +" << set[i] << "*" << c[i];
    }
    cout << endl;
}

int main() {
    vector<int> set = {4, 3, 2};
    int target = 31;
    auto c = searchCoeffs(target, set);
    print(target, set, c);
}
That code prints
4 4 4 4 4 4 4 3
31 = +4*7 +3*1 +2*0
Further Thoughts
production code should test for zeros in any given values
the search could be improved by incrementing the next coefficient if the evaluated polynomial already exceeded the target value.
further speedup is possible when calculating the difference between the target value and the evaluated polynomial with c1 set to zero, and checking whether that difference is a multiple of b1. If not, c2 could be incremented straight away.
Perhaps there exist some shortcuts exploiting the least common multiple
As ihavenoidea proposed, I would also try backtracking. In addition, I would sort the numbers in decreasing order, in order to speed up the process.
Note: a comment would be more appropriate than an answer, but I am not allowed to comment yet. Hope it helps. I will delete this answer if requested.

Finding divisor pairs

I'm trying to solve this exercise http://main.edu.pl/en/archive/amppz/2014/dzi and I have no idea how to improve the performance of my code. Problems occur when the program has to handle over 500,000 unique numbers (up to 2,000,000 as in the description). Then it takes 1-8 s to loop over all those numbers. The tests I have used are from http://main.edu.pl/en/user.phtml?op=tests&c=52014&task=1263, and I am testing with the command
program.exe < data.in > result.out
Description:
You are given a sequence of n integers a1, a2, ..., an. You should determine the number of ordered pairs (i, j) such that i, j are in {1, ..., n}, i != j and ai is a divisor of aj.
The first line of input contains one integer n (1 <= n <= 2000000).
The second line contains a sequence of n integers a1, a2, ..., an (1 <= ai <= 2000000).
The first and only line of output should contain one integer, denoting the number of pairs sought.
For the input data:
5
2 4 5 2 6
the correct answer is: 6
Explanation: There are 6 pairs: (1, 2) = 4/2, (1, 4) = 2/2, (1, 5) = 6/2, (4, 1) = 2/2, (4, 2) = 4/2, (4, 5) = 6/2.
For example:
- with 2M total numbers and 635k unique numbers, there are 345 million iterations in total
- with 2M total numbers and 2M unique numbers, there are 1885 million iterations in total
#include <iostream>
#include <math.h>
#include <algorithm>
#include <time.h>

#define COUNT_SAME(count) (count - 1) * count

int main(int argc, char **argv) {
    std::ios_base::sync_with_stdio(0);
    int n; // Total numbers
    scanf("%d", &n);

    clock_t start, finish;
    double duration;

    int minVal = 2000000;
    long long *countVect = new long long[2000001](); // 1-2,000,000; value-initialized to zero; here I'm counting duplicates
    unsigned long long counter = 0;
    unsigned long long operations = 0;
    int tmp;
    int duplicates = 0;
    for (int i = 0; i < n; i++) {
        scanf("%d", &tmp);
        if (countVect[tmp] > 0) { // Not best way, but works
            ++countVect[tmp];
            ++duplicates;
        } else {
            if (minVal > tmp)
                minVal = tmp;
            countVect[tmp] = 1;
        }
    }

    start = clock();
    int valueJ;
    int sqrtValue, valueIJ;
    int j;
    for (int i = 2000000; i > 0; --i) {
        if (countVect[i] > 0) { // Not all fields are set up
            if (countVect[i] > 1)
                counter += COUNT_SAME(countVect[i]); // Sum same values
            sqrtValue = sqrt(i);
            for (j = minVal; j <= sqrtValue; ++j) {
                if (i % j == 0) {
                    valueIJ = i / j;
                    if (valueIJ != i && countVect[valueIJ] > 0 && valueIJ > sqrtValue)
                        counter += countVect[i] * countVect[valueIJ];
                    if (i != j && countVect[j] > 0)
                        counter += countVect[i] * countVect[j];
                }
                ++operations;
            }
        }
    }
    finish = clock();
    duration = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Loops time: %2.3f", duration);
    std::cout << "s\n";
    std::cout << "\n\nCounter: " << counter << "\n";
    std::cout << "Total operations: " << operations;
    std::cout << "\nDuplicates: " << duplicates << "/" << n;
    return 0;
}
I know I shouldn't sort the array at the beginning, but I have no idea how to do it in a better way.
Any tips would be great, thanks!
Here is the improved algorithm - 2M unique numbers within 0.5 s. Thanks to @PJTraill!
#include <iostream>
#include <math.h>
#include <algorithm>
#include <time.h>

#define COUNT_SAME(count) (count - 1) * count

int main(int argc, char **argv) {
    std::ios_base::sync_with_stdio(0);
    int n; // Total numbers
    scanf("%d", &n);

    clock_t start, finish;
    double duration;

    int maxVal = 0;
    long long *countVect = new long long[2000001](); // 1-2,000,000; value-initialized to zero; here I'm counting duplicates
    unsigned long long counter = 0;
    unsigned long long operations = 0;
    int tmp;
    int duplicates = 0;
    for (int i = 0; i < n; i++) {
        scanf("%d", &tmp);
        if (countVect[tmp] > 0) { // Not best way, but works
            ++countVect[tmp];
            ++duplicates;
        } else {
            if (maxVal < tmp)
                maxVal = tmp;
            countVect[tmp] = 1;
        }
    }

    start = clock();
    int j;
    int jCounter = 1;
    for (int i = 0; i <= maxVal; ++i) {
        if (countVect[i] > 0) { // Not all fields are set up
            if (countVect[i] > 1)
                counter += COUNT_SAME(countVect[i]); // Sum same values
            j = i * ++jCounter;
            while (j <= maxVal) {
                if (countVect[j] > 0)
                    counter += countVect[i] * countVect[j];
                j = i * ++jCounter;
                ++operations;
            }
            jCounter = 1;
        }
    }
    finish = clock();
    duration = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Loops time: %2.3f", duration);
    std::cout << "s\n";
    std::cout << "\n\nCounter: " << counter << "\n";
    std::cout << "Total operations: " << operations;
    std::cout << "\nDuplicates: " << duplicates << "/" << n;
    return 0;
}
I expect the following to work a lot faster than the OP's algorithm (optimisations below):
(The type of values and frequencies should be 32-bit unsigned, counts 64-bit – promote before calculating a count, if your language would not.)
Read the number of values, N.
Read each value v, adding one to its frequency freq[v] (no need to store it).
(freq[MAX] (or MAX+1) can be statically allocated for probably optimal initialisation to all 0)
Calculate the number of pairs involving 1 from freq[1] and the number of values.
For every i in 2..MAX (with freq[i] > 0):
Calculate the number of pairs (i,i) from freq[i].
For every multiple m of i in 2i..MAX:
(Use m as the loop counter and increment it, rather than multiplying)
Calculate the number of pairs (i,m) from freq[i] and freq[m].
(if freq[i] = 1, one can omit the (i,i) calculation and perform a variant of the loop optimised for freq[i] = 1)
(One can perform the previous (outer) loop from 2..MAX/2, and then from MAX/2+1..MAX omitting the processing of multiples)
The number of pairs (i,i) = freq[i] * (freq[i] - 1), since the problem counts ordered pairs (the example counts both (1,4) and (4,1)).
The number of pairs (i,j) = freq[i] * freq[j] for i ≠ j.
This avoids sorting, sqrt and division.
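A minimal sketch of the core counting loops described above (illustrative names; freq is indexed by value and must have more than maxVal entries, and the equal-value term uses the ordered-pair count to match the problem statement):

#include <cstdint>
#include <vector>

// freq[v] = how many times value v occurs; values lie in 1..maxVal.
uint64_t countDivisorPairs(const std::vector<uint32_t>& freq, uint32_t maxVal) {
    uint64_t pairs = 0;
    for (uint32_t i = 1; i <= maxVal; ++i) {
        if (freq[i] == 0)
            continue;
        // ordered pairs of equal values: freq[i] * (freq[i] - 1)
        pairs += (uint64_t)freq[i] * (freq[i] - 1);
        // pairs (i, m) where m is a proper multiple of i
        for (uint32_t m = 2 * i; m <= maxVal; m += i)
            pairs += (uint64_t)freq[i] * freq[m];
    }
    return pairs;
}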
Other optimisations
One can store the distinct values, and scan that array instead (the order does not matter); the gain or loss due to this depends on the density of the values in 1..MAX.
If the maximum frequency is < 2^16, which sounds very probable, all products will fit in 32 bits. One could take advantage of this by writing functions with the numeric type as a template, tracking the maximum frequency and then choosing the appropriate instance of the template for the rest. This costs N*(compare+branch) and may gain by performing D^2 multiplications with 32 bits instead of 64, where D is the number of distinct values. I see no easy way to deduce that 32 bits suffice for the total, apart from N < 2^16.
If parallelising this for n processors, one could let different processors process different residues modulo n.
I considered keeping track of the number of even values, to avoid a scan of half the frequencies, but I think that for most datasets within the given parameters that would yield little advantage.
Ok, I am not going to write your whole algorithm for you, but it can definitely be done faster. So I guess this is what you need to get going:
So you have your list sorted, so there are a lot of assumptions you can make from this. Take for instance the highest value: it won't have any multiples. The highest value that does will be at most the highest value divided by two.
There is also one other very useful fact here: a multiple of a multiple is also a multiple. (Still following? ;)) Take for instance the list [2 4 12]. Now you've found (4,12) as a multiple pair. If you now also find (2,4), then you can deduce that 12 is also a multiple of 2.
And since you only have to count the pairs, you can just keep a count for each number of how many multiples it has, and add that when you see that number as a multiple itself.
This means that it is probably best to iterate your sorted list backwards, and look for divisors instead.
And maybe store it in some way that goes like
[ (three 2's ), (two 5's), ...]
ie. store how often a number occurs. Once again, you don't have to keep track of its id, since you only need the total number of pairs.
Storing your list this way helps you, because all the 2's are going to have the same amount of multiples. So calculate once and then multiply.

How to produce random numbers so that their sum is equal to given number?

I want to produce X random numbers, each from the interval <0; Y> (given Y as a maximum of each number), but there is the restriction that the sum of these numbers must be equal to Z.
Example:
5 Randoms numbers, each max 6 and the sum must be equal to 14, e.g: 0, 2, 6, 4, 2
Is there already a C/C++ function that could do something like that?
Personally I couldn't come up with more than some ugly if-else constructs.
Since you don't need the generated sequence to be uniform, this could be one of the possible solutions:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>

int irand(int min, int max) {
    return ((double)rand() / ((double)RAND_MAX + 1.0)) * (max - min + 1) + min;
}

int main()
{
    int COUNT = 5,    // X
        MAX_VAL = 6,  // Y
        MAX_SUM = 14; // Z

    std::vector<int> buckets(COUNT, 0);
    srand(time(0));

    int remaining = MAX_SUM;
    while (remaining > 0)
    {
        int rndBucketIdx = irand(0, COUNT - 1);
        if (buckets[rndBucketIdx] == MAX_VAL)
            continue; // this bucket is already full
        buckets[rndBucketIdx]++;
        remaining--;
    }

    std::cout << "Printing sequence: ";
    for (size_t i = 0; i < COUNT; ++i)
        std::cout << buckets[i] << ' ';
}
which just simply divides the total sum to bunch of buckets until it's gone :)
Example of output: Printing sequence: 4 4 1 0 5
NOTE: this solution was written when the question specified a "MAX SUM" parameter, implying a sum of less than that amount was equally acceptable. The question's now been edited based on the OP's comment that they meant the cumulative sum must actually hit that target. I'm not going to update this answer, but clearly it could trivially discard lesser totals at the last level of recursion.
This solution does a one-time population of a vector<vector<int>> with all the possible combinations of numbers solving the input criterion, then each time a new solution is wanted it picks one of those at random and shuffles the numbers into a random order (thereby picking a permutation of the combination).
It's a bit heavyweight - perhaps not suitable for the actual use that you mentioned after I'd started writing it ;-P - but it produces an evenly weighted distribution, and you can easily do things like guarantee a combination won't be returned again until all other combinations have been returned (with a supporting shuffled vector of indices into the combinations).
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>

using std::min;
using std::max;
using std::vector;

// print solutions...
void p(const vector<vector<int>>& vvi)
{
    for (int i = 0; i < vvi.size(); ++i)
    {
        for (int j = 0; j < vvi[i].size(); ++j)
            std::cout << vvi[i][j] << ' ';
        std::cout << '\n';
    }
}

// populate results with solutions...
void f(vector<vector<int>>& results, int n, int max_each, int max_total)
{
    if (n == 0) return;
    if (results.size() == 0)
    {
        for (int i = 0; i <= min(max_each, max_total); ++i)
            results.push_back(vector<int>(2, i));
        f(results, n - 1, max_each, max_total);
        return;
    }
    vector<vector<int>> new_results;
    for (int r = 0; r < results.size(); ++r)
    {
        int previous = *(results[r].rbegin() + 1);
        int current_total = results[r].back();
        int remaining = max_total - current_total;
        for (int i = 0; i <= min(previous, min(max_each, remaining)); ++i)
        {
            vector<int> v = results[r];
            v.back() = i;
            v.push_back(current_total + i);
            new_results.push_back(v);
        }
    }
    results = new_results;
    f(results, n - 1, max_each, max_total);
}

const vector<int>& once(vector<vector<int>>& solutions)
{
    int which = std::rand() % solutions.size();
    vector<int>& v = solutions[which];
    std::random_shuffle(v.begin(), v.end() - 1);
    return v;
}

int main()
{
    vector<vector<int>> solutions;
    f(solutions, 5, 6, 14);
    std::cout << "All solution combinations...\n";
    p(solutions);
    std::cout << "------------------\n";
    std::cout << "A few sample permutations...\n";
    for (int n = 1; n <= 100; ++n)
    {
        const vector<int>& o = once(solutions);
        for (int i = 0; i < o.size() - 1; ++i)
            std::cout << o[i] << ' ';
        std::cout << '\n';
    }
}
#include <iostream>
#include <cstdlib> // rand()
using namespace std;

int main()
{
    int random, x = 5;
    int max, totalMax = 0, sum = 0;
    cout << "Enter the total maximum number : ";
    cin >> totalMax;
    cout << "Enter the maximum number: ";
    cin >> max;
    srand(0);
    for (int i = 0; i < x; i++)
    {
        random = rand() % max + 1; // range from 1 to max
        sum += random;
        if (sum >= totalMax)
        {
            sum -= random;
            i--;
        }
        else
            cout << random << ' ';
    }
    cout << endl << "Reached total maximum number " << totalMax << endl;
}
I wrote this simple code.
I tested it using totalMax=14 and max=3 and it worked for me.
Hope it's what you asked for.
LiHo's answer looks pretty similar to my second suggestion, so I'll leave that, but here's an example of the first. It could probably be improved, but it shouldn't have any tragic bugs. Here's a live sample.
#include <algorithm>
#include <array>
#include <random>

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());

    constexpr int MAX = 14;
    constexpr int LINES = 5;
    int sum{};
    int maxNum = 6;
    int minNum{};
    std::array<int, LINES> nums;

    for (int i = 0; i < LINES; ++i) {
        maxNum = std::min(maxNum, MAX - sum);
        // e.g., after 0 0, min is 2 because only 12/14 can be filled after
        int maxAfterThis = maxNum * (LINES - i - 1);
        minNum = std::min(maxNum, std::max(minNum, MAX - sum - maxAfterThis));
        std::uniform_int_distribution<> dist(minNum, maxNum);
        int num = dist(gen);
        nums[i] = num;
        sum += num;
    }
    std::shuffle(std::begin(nums), std::end(nums), gen);
}
Creating that distribution every time could potentially slow it down (I don't know), but the range has to go in the constructor, and I'm not one to say how well distributed these numbers are. However, the logic is pretty simple. Aside from that, it uses the nice, shiny C++11 <random> header.
We just make sure no remaining number goes over MAX (14) and that MAX is reached by the end. minNum is the odd part, and that's due to how the loop progresses. It starts at zero and works its way up as needed (the second argument to std::max figures out what would be needed if we got 6s for the rest), but we can't let it surpass maxNum. I'm open to a simpler method of calculating minNum if it exists.
Since you know how many numbers you need, generate them from the given distribution but without further conditions, store them, compute the actual sum, and scale them all up/down to get the desired sum.
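A minimal sketch of this idea (illustrative names, not a definitive implementation): draw freely, scale towards Z while clamping to [0, Y], then repair the remaining rounding drift one unit at a time. It assumes 0 <= Z <= X*Y and makes no promise about the resulting distribution:

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// X numbers in [0, Y] whose sum is exactly Z (assumes 0 <= Z <= X*Y).
std::vector<int> scaledSequence(int X, int Y, int Z) {
    if (Z == 0) return std::vector<int>(X, 0); // trivial case

    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> value(0, Y);
    std::vector<int> v(X);
    int s = 0;
    do { // redraw while the sum is 0 and cannot be scaled
        for (int& e : v) e = value(gen);
        s = std::accumulate(v.begin(), v.end(), 0);
    } while (s == 0);

    for (int& e : v) // scale towards Z, clamping to [0, Y]
        e = std::min(Y, (int)((long long)e * Z / s));

    int total = std::accumulate(v.begin(), v.end(), 0);
    std::uniform_int_distribution<int> pick(0, X - 1);
    while (total != Z) { // repair rounding/clamping drift
        int i = pick(gen);
        if (total < Z && v[i] < Y) { ++v[i]; ++total; }
        else if (total > Z && v[i] > 0) { --v[i]; --total; }
    }
    return v;
}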