I want to draw n samples from a relatively large population without replacement. So I draw random numbers and keep track of my previous choices, so I can resample whenever I draw a number twice:
boost::mt19937 generator;
boost::uniform_int<> distribution(0, 1669 - 1);
boost::variate_generator<boost::mt19937, boost::uniform_int<> >
gen(generator, distribution);
int n = 100;
std::vector<int> idxs;
while(static_cast<int>(idxs.size()) < n)
{
// get random samples
std::generate_n(std::back_inserter(idxs), n - idxs.size(),
gen);
// remove duplicates
// keep everything that's not duplicates to save time
std::sort(idxs.begin(), idxs.end());
std::vector<int>::iterator it = std::unique(idxs.begin(), idxs.end());
idxs.resize(std::distance(idxs.begin(), it));
}
Unfortunately, I run into an infinite loop for the constants used above.
I added some output (which shows that it keeps picking the same numbers) and a stop after 10 tries to demonstrate the problem:
boost::mt19937 generator;
boost::uniform_int<> distribution(0, 1669 - 1);
boost::variate_generator<boost::mt19937, boost::uniform_int<> >
gen(generator, distribution);
int n = 100;
int repeat = 0;
std::vector<int> idxs;
while(static_cast<int>(idxs.size()) < n)
{
if(repeat++ > 10) break;
cout << "repeat " << repeat <<
", " << idxs.size() << " elements" << endl;
std::generate_n(std::back_inserter(idxs), n - idxs.size(),
gen);
cout << "last " << idxs.back() << endl;
std::sort(idxs.begin(), idxs.end());
std::vector<int>::iterator it = std::unique(idxs.begin(), idxs.end());
idxs.resize(std::distance(idxs.begin(), it));
}
The code prints
repeat 1, 0 elements
last 1347
repeat 2, 99 elements
last 1359
repeat 3, 99 elements
last 1359
and so on, and this seems to loop forever if I don't kill the program. This shouldn't happen, right? Am I just unlucky? Or am I doing something wrong?
Short solution
Thanks to @jxh! Using a reference helps:
boost::variate_generator<boost::mt19937&, boost::uniform_int<> >
gen(generator, distribution);
The problem is that generate_n creates a copy of the generator gen you created. So, at the end of the call to generate_n, the state of gen is unchanged. Thus, each time you re-loop, you will generate the same sequence again.
One way to fix this is to use a reference to your random number generator object in your variate_generator:*
boost::variate_generator<boost::mt19937&, boost::uniform_int<> >
gen(generator, distribution);
* Due to my limited experience with Boost, my original suggestion was rather clumsy. I have adopted the solution implemented by the asker in this answer.
Related
I have generated 100 random numbers, from which I have to sort out equal counts of even and odd numbers. No even or odd number may be repeated.
For example I will create 100 random numbers
#include <iostream>
#include <cstdlib>
int main()
{
for (int i=1; i <=100; ++i){
//double p = rand();
std::cout <<"random number is :" << rand()<<std::endl;
}
}
Since I don't know whether the counts of even and odd numbers in the list of 100 random numbers are the same, I want to pick the minimum number of even/odd pairs. I would also like to know how many odd and how many even numbers are generated in total. One note: if any even or odd number gets printed out multiple times, I want to count it as one.
For example, let's assume we have 60 even and 40 odd numbers among the printed random numbers, and that of the 60 even numbers, 10 are repeats. Then I would consider the count of distinct even numbers to be 50. The printed output would be the first 20 even numbers and the first 20 odd numbers.
The reason I want to do this is that I want to learn how to filter the even and odd numbers out of the random-generator for loop.
UPDATE:
My goal is to find the even and odd numbers among the generated random numbers. When I sort out the even and odd numbers, all of them must be distinct; that means if the even number 2 is printed out 5 times, I would still count it once. In this way I want to find the minimum number of even and odd numbers.
Lets have an example:
The generated printout is: {1,2,2,3,4,5,6,8,9,3,2,4,6,10}
From the list the even and odd numbers would be:
even = {2,4,6,8}
odd = {1,3,5,9}
If you look carefully, I excluded 10 from the even sorting. The reason is that if I added 10 to the even list, I would have more even numbers than odd.
Use std::unordered_set to create an odd set and an even set.
To check whether a number is odd, use (num & 1): it is 1 for odd and 0 for even.
#include <unordered_set>

std::unordered_set<int> set_odd;
std::unordered_set<int> set_even;
for (int num : nums) {
    if (num & 1) set_odd.insert(num);
    else set_even.insert(num);
}
This will do what you explained in your example.
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>
int main()
{
std::vector<int> numbers = {1, 2, 2, 3, 4, 5, 6, 8, 9, 3, 2, 4, 6, 10};
std::set<int> even, odd;
// sort the numbers into even and odd sets
for (auto const& n : numbers)
{
if (n % 2 == 0)
even.insert(n);
else
odd.insert(n);
}
// figure out which set is smaller
int min = static_cast<int>(std::min(even.size(), odd.size()));
// print out the evens
std::cout << "even = ";
auto it = even.begin();
for(int i = 0; i < min; ++i)
{
std::cout << *(it++) << " ";
}
std::cout << std::endl;
// print out the odds
std::cout << "odd = ";
it = odd.begin();
for(int i = 0; i < min; ++i)
{
std::cout << *(it++) << " ";
}
std::cout << std::endl;
}
If you only want pairs of random numbers (pair as in a collection of 2 integers, one even, one odd, not std::pair), you need a way to generate the numbers, storage for the pairs, a count of the odd and even numbers generated, and a way to associate the generated numbers into pairs.
To generate the random numbers, the <random> header provides std::random_device for seeding and your choice of engines such as the std::mersenne_twister_engine. You can generate a number within a std::uniform_int_distribution simply by declaring the device, seeding the generator with the device, and then creating a distribution between a min and max of your choice.
To setup and generate random numbers within the positive range of int you could do:
...
#include <limits>
#include <random>
#define NMAX std::numeric_limits<int>::max()
...
std::random_device rd; /* declare the random number generator device */
std::mt19937 gen(rd()); /* standard mersenne_twister engine seeded w/rd */
std::uniform_int_distribution<int> dist(0, NMAX); /* create distribution */
And to request a number from the distribution, simply:
int r = dist(gen); /* generate rand */
How to store the generated pairs depends on how you coordinate adding the even and odd numbers so that pairs end up together. You can use a std::map or std::unordered_map, or simply an even and an odd std::vector<int>. The latter lets you fill a vector of evens and a vector of odds and then simply output the lesser .size() of the two to capture the even-odd pairs generated out of your 100 random integers.
For example:
#define MAXRAND 100
...
std::vector<int> even, odd;
...
for (int i = 0; i < MAXRAND; i++) { /* loop MAXRAND times */
int r = dist(gen); /* generate rand */
if (r % 2 == 0) /* if even */
even.push_back(r); /* add to even vector */
else /* odd number */
odd.push_back(r); /* add to odd vector */
}
That way you can specify to generate 100 random numbers and you end up with however many even-odd pairs were generated out of those 100 random numbers -- which sounds like what you were asking.
Putting it altogether you could do:
#include <iostream>
#include <iomanip>
#include <limits>
#include <random>
#include <vector>
#define NMAX std::numeric_limits<int>::max()
#define MAXRAND 100
int main (void) {
std::vector<int> even, odd;
std::random_device rd; /* declare the random number generator device */
std::mt19937 gen(rd()); /* standard mersenne_twister engine seeded w/rd */
std::uniform_int_distribution<int> dist(0, NMAX); /* create distribution */
size_t limit = 0;
for (int i = 0; i < MAXRAND; i++) { /* loop MAXRAND times */
int r = dist(gen); /* generate rand */
if (r % 2 == 0) /* if even */
even.push_back(r); /* add to even vector */
else /* odd number */
odd.push_back(r); /* add to odd vector */
}
/* output results */
limit = (even.size() > odd.size()) ? odd.size() : even.size();
std::cout << limit << " even-odd pairs (0 - " << NMAX << ")\n\n";
for (size_t p = 0; p < limit; p++)
std::cout << std::setw(12) << even.at(p) <<
" " << odd.at(p) << '\n';
}
Example Use/Output
Running it, you will generally get between 40-50 pairs of numbers, the random distribution being fairly even, e.g.
$ ./bin/evenoddpairs
48 even-odd pairs (0 - 2147483647)
1513290664 712950177
2014370968 990873219
161619218 6719997
2062410942 1965300831
2072442520 938385103
669324326 1957687455
1201350414 2134189381
217290372 1304388089
726760634 232069103
2086887656 1784024967
1345123406 185879429
1706842790 686104759
1034648158 268926351
1445724500 1996600823
1303450734 1890287253
763120758 1581912187
1788401668 1537971261
1542967608 1842999149
377964104 1995119937
87264498 644262155
224210492 519040373
692262016 372293591
502458404 1867793795
575578512 751306553
373162704 170423471
1502626218 152785903
284546326 287967359
388031960 1233214425
1839930048 243652639
465469190 1747259241
1488469408 252779515
2144753340 1992139979
2010564888 298805387
917820234 187798339
1204892922 1454685783
563347322 50283749
1303887182 345841727
1429521892 1668482773
702286146 1191166889
1490493310 963750729
986716484 1522036479
1107465502 1445768043
1613852374 1307939505
1584334086 1565159437
1325156802 354726127
1165679412 1618569419
1192475084 1341169211
71913214 1569003463
There are many ways to put this together, but given your constraint of generating 100 random numbers and then collecting only the even-odd pairs created out of those 100 numbers, the even/odd vectors are a simple way to ensure you end up with only the pairs generated out of the 100. Let me know if that is what you were asking or if you were asking something slightly different.
If you need to enforce only unique random numbers, you can filter your two vectors through an std::unordered_set.
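As a sketch of that uniqueness step (the function name is mine):
#include <unordered_set>
#include <vector>

// Deduplicate a vector by passing it through an std::unordered_set.
// Note the result's order is unspecified.
std::vector<int> unique_values(const std::vector<int>& v)
{
    std::unordered_set<int> seen(v.begin(), v.end());
    return std::vector<int>(seen.begin(), seen.end());
}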
I can't post my whole program here, just snippets. I will answer any questions.
What I have:
1) I have a vector with 20 IDs, like this [0,1,2,3,4,5,6...19].
2) I pick two IDs, for example number 3 and number 6.
What I need:
1) Generate a vector of size N-1, where N = 5. This vector should not contain number 3 or number 6, only the remaining IDs, and should not repeat them.
For example: new vector = [7,2,19,4]. Yes, only 4 items, because the 5th is number 3 or number 6; they will play with the newly created groups, so 1 + 4 = 5 (N).
My problem:
1) I need to do this about 1 million times. It is very slow. I believe this part of the code is the heaviest, because when I deleted it, the program ran really fast.
My question:
1) Below is my code (the do-while loop). Can I somehow optimize it? Maybe I need to use another structure or a smarter method to generate this?
Code:
for (int i = 0; i < _iterations; i++)
{
players.clear();
int y = 0;
do{
// _pop_size = 20
int rand_i = static_cast<int>(rand_double(0, _pop_size));
if (rand_i != 3 && rand_i != 6){
// verify if the ID already exists in vector
if (std::find(players.begin(), players.end(), rand_i) == players.end()){
players.push_back(rand_i);
++y;
}
}
} while (y < _group_size - 1);
// ...
// ...
// ...
// ...
rand_double() function:
double rand_double(int min, int max) const
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(min, max);
return dist(mt);
}
This answer is part gathering up the comments and part to prove a point.
The objective is to get as much as possible out of the processing loop. The first to die is the repeated re-initialization of the random number generator. A random number generator should be set up once and then used repeatedly, so re-init is a bad idea. Good riddance.
The next is to find a faster way to reject already-known elements. The current approach uses a linear search through an unsorted vector. Insertion is quick because push_back only really slows down when resizing, but the more items in the vector, the longer the worst-case search time. A std::set is an ordered container with very fast look-up and somewhat slower insertion. If the lists are short, stick with vector. If the lists are long (_group_size > 100), go with the set.
Here is an example with long lists:
#include <iostream>
#include <set>
#include <vector>
#include <random>
#include <functional>
#include <chrono>
#include <algorithm>
using namespace std::chrono; // I know, but the lines were ridiculously long
// remove random number generator init from processing loop.
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist(0, 1000000);
// replace function with a bind.
auto rand_int = std::bind(dist, mt);
// test function
int main()
{
int y = 0;
int _group_size = 10000; // collect 10000 unique numbers
std::set<int> tempplayers;
std::vector<int> players;
auto start = high_resolution_clock::now(); // get start time
// with vector
do
{
// _pop_size = 20
int rand_i = rand_int();
if (rand_i != 3 && rand_i != 6)
{ //using vector: Linear search.
if (std::find(players.begin(), players.end(), rand_i) == players.end())
{
players.push_back(rand_i);
++y;
} // verify if the ID already exists in vector
}
} while (y < _group_size - 1);
auto t1 = duration_cast<nanoseconds>(high_resolution_clock::now() - start).count();
// Calculate elapsed time
std::cout << "Time (ns) with vector: " << t1 << std::endl;
// reset
players.clear();
y = 0;
// run again with a set instead of a vector
start = high_resolution_clock::now();
do
{
// _pop_size = 20
int rand_i = rand_int();
if (rand_i != 3 && rand_i != 6)
{ //using set. Not sure exactly what search it is. Probably a tree.
if (tempplayers.find(rand_i) == tempplayers.end())
{
tempplayers.insert(rand_i);
//players.push_back(rand_i);
++y;
}
}
} while (y < _group_size - 1);
// copy set into vector for comfortable use.
std::copy(tempplayers.begin(),
tempplayers.end(),
std::back_inserter(players));
//
auto t2 = duration_cast<nanoseconds>(high_resolution_clock::now() - start).count();
std::cout << "Time (ns) with set: " << t2 << std::endl;
if (t2 > 0)
{
std::cout << "Set is " << t1/ t2 << " times faster"<< std::endl;
}
}
A typical output is:
Time (ns) with vector: 373014100
Time (ns) with set: 9000800
Set is 41 times faster
NB: I'm running on Windows and my default tick resolution is horrible.
A better way is to use a simple array instead of a vector.
Since I know the size of the group, I just create an array of size x and add the values to it. To check whether a value is already in the array, I use a simple for loop.
What happens is that a vector takes time to allocate memory for the next number while an array does not; it has already allocated memory for those numbers when I do this:
int array[4];
One test took me 96 seconds, after I changed to an array the same test took only 26 seconds.
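A minimal sketch of this array approach, using the question's sizes (a group of 5, so 4 slots, excluding IDs 3 and 6); the helper names are mine:
#include <cstdlib>

// linear scan over the part of the array filled so far
static bool contains(const int* arr, int count, int value)
{
    for (int i = 0; i < count; ++i)
        if (arr[i] == value)
            return true;
    return false;
}

// fill `group` with 4 distinct IDs from [0, pop_size), skipping 3 and 6
void pick_group(int group[4], int pop_size)
{
    int y = 0;
    while (y < 4) {
        int r = std::rand() % pop_size; // keep one RNG set up once, as above
        if (r != 3 && r != 6 && !contains(group, y, r))
            group[y++] = r;
    }
}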
I have the following:
std::random_device rd;
std::mt19937_64 randEng(rd());
std::uniform_real_distribution<double> rg(std::numeric_limits<double>::lowest(), std::numeric_limits<double>::max());
for(size_t i = 0; i < numToGenerate; i++){
nums[i] = rg(randEng);
std::cout << nums[i] << std::endl;
}
Where nums is a vector presized to numToGenerate.
Every number that is printed out, though, says inf. My understanding was that I had set this up to get random numbers between, in this case, -1.79769e+308 and 1.79769e+308 as it happens to be on my machine. What am I doing wrong in the setup of this random number generator?
Probably the computation of the pseudorandom number includes the difference (max-min). For example to compute a random number between A and B a simple approach would be:
x = A + rnd*(B - A)
where rnd is a random number between 0 and 1. If you do this with the maximum and minimum double precision value you get a problem, because that difference is bigger than the maximum and thus will become "infinite".
After that A + rnd*infinite is always infinite if rnd is not zero, and NaN when it's zero.
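To make this concrete, here is a minimal sketch (my own illustration, not from the original answer) showing the overflow, plus one possible workaround that draws the magnitude and sign separately:
#include <iostream>
#include <limits>
#include <random>

int main()
{
    double lo = std::numeric_limits<double>::lowest();
    double hi = std::numeric_limits<double>::max();
    std::cout << (hi - lo) << '\n';   // prints inf: the range is ~2*DBL_MAX

    // possible workaround: draw a magnitude in [0, max) and flip the sign randomly
    std::mt19937_64 gen(std::random_device{}());
    std::uniform_real_distribution<double> mag(0.0, hi);
    std::bernoulli_distribution sign;
    double x = sign(gen) ? mag(gen) : -mag(gen);
    std::cout << x << '\n';
}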
I was toying around with arrays, populating them with pseudo-random numbers and finding the minimum and maximum values, their indices, and their number of occurrences, and I noticed something strange: when using srand seeded with time, the number of minimum and maximum value occurrences is ALWAYS equal.
Is there an alternative way of getting a DIFFERENT number of minimum and maximum value occurrences, as one would expect with random numbers?
Here is my code (I am learning so it may be messy and inefficient, recommendations are welcome)
#include <cstdlib>
#include <iostream>
#include <time.h>
using namespace std;
void findExtremes( const int[], int);
int main()
{
const int lenght = 2000; //define lenght
int array1[lenght];
srand(time(0));
for ( int i = 0; i < lenght; i++) //populate array with random numbers and print them out
{
array1[i] = rand() % 3000;
cout << "Index " << i << " = " << array1[i] << endl;
}
findExtremes(array1, lenght); // call fn
return 0;
}
void findExtremes( const int array[], int size)
{
int maxV, minV, maxI, minI;
maxV = array[0];
minV = array[0];
minI = 0;
maxI = 0;
for ( int i = 1; i < size; i++)
{
if ( array[i] > maxV)
{
maxV = array[i];
maxI = i;
}
if ( array[i] < minV)
{
minV = array[i];
minI = i;
}
}
//find the number of occurances for min and max values
int minOcc = 0;
int maxOcc = 0;
for ( int i = 1; i < size; i++)
{
if (array[i] == minV)
minOcc++;
if (array[i] == minV)
maxOcc++;
}
//output
cout << "\nMinmim value is index " << minI << " with value " << minV << " and " << minOcc << " occurances" << endl;
cout << "\nMaxium value is index " << maxI << " with value " << maxV << " and " << maxOcc << " occurances" << endl << "\n";
}
For a start, they're actually pseudo-random numbers, not random numbers. In any case, it may be that a truly random sequence has the exact property that you're seeing :-) The sequence 1,1,1,1,1 is equally likely to occur in a truly random set as 5,2,4,2,99.
If you want a "more random" random sequence, I wouldn't be using the normal ones shipped with C libraries (unless those libraries were written by people who understand randomness) - you should look into things like the Mersenne Twister, using /dev/random (if under Linux) and so on.
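For reference, a minimal C++11 <random> sketch of the Mersenne Twister route (the question predates <random>, so treat this as one option rather than the answer's code):
#include <iostream>
#include <random>

int main()
{
    std::random_device rd;                             // nondeterministic seed
    std::mt19937 gen(rd());                            // Mersenne Twister engine
    std::uniform_int_distribution<int> dist(0, 2999);  // like rand() % 3000, without modulo bias
    for (int i = 0; i < 5; ++i)
        std::cout << dist(gen) << '\n';
}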
You may also want to look at this snippet of your code.
if (array[i] == minV)
minOcc++;
if (array[i] == minV)
maxOcc++;
I believe that last if should be comparing with maxV rather than minV. Otherwise there is zero chance that your minimum and maximum counts will be different.
When I make that change (and change % 3000 to % 30, to get a range of duplicates), I see:
Minmim value is index 112 with value 0 and 65 occurances
Maxium value is index 24 with value 29 and 58 occurances
And, not that it really matters in terms of this question, you may want to clean up your spelling somewhat:
lenght -> length.
minmum -> minimum
maxium -> maximum
occurances -> occurrences
I perform numerical simulations in Physics and my group uses the GSL library for that:
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
class Random
{
private:
gsl_rng* r; //!< Pointer to the gsl rng
public:
//! Constructor: uses argument as the seed
Random(long unsigned int seed);
long int R(int N);
long double R();
long double gaussianR(long double sigma);
};
inline Random::Random(long unsigned int s)
{
r = gsl_rng_alloc( gsl_rng_taus );
gsl_rng_set(r, s); // seed for the pseudo-random number generator
}
// a uniform number between 0 and N-1
inline long int Random::R(int N)
{
return gsl_rng_uniform_int (r, N);
}
// a uniform number between 0 and 1
inline long double Random::R()
{
return gsl_rng_uniform_pos( r );
}
// a gaussian distribution with sigma
inline long double Random::gaussianR(long double sigma)
{
return gsl_ran_gaussian(r, sigma);
}
you have to compile it with flags:
OTHER_LDFLAGS = -lgsl -lm -lgslcblas
and add includes and libs (this is for fink installation case):
HEADER_SEARCH_PATHS = /sw/include
LIBRARY_SEARCH_PATHS = /sw/lib
Hope this helps.
You can use the new random library included in C++11, or you can use the Boost::Random library that it was based on.
The behaviour of your Pseudo-Random Number Generator (PRNG) is perfectly normal.
In fact, if you draw enough numbers from rand(), you will always get the same extrema, since it is uniformly distributed.
In your case, the question is: do you need another behaviour? You should not pounce on True Random Numbers as @sehe suggests. This might be useless, and even problematic when dealing with stochastic simulations, which Monte Carlo algorithms are. Imagine that you want to debug a code snippet based upon random numbers, or that a colleague of yours intends to check your results: how would you do that if you were not able to reproduce the same random sequence?
That is one of the reasons why PRNGs are sufficient and often preferred when you do not need crypto-secure random numbers.
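To illustrate the reproducibility point, a small sketch: two engines seeded identically produce identical sequences, run after run.
#include <cassert>
#include <random>

int main()
{
    std::mt19937 a(42), b(42);   // same fixed seed
    for (int i = 0; i < 1000; ++i)
        assert(a() == b());      // identical sequences, every run
}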
I think the problem is that your initial statement is wrong. The code provides different numbers each time. I tried your unmodified code and here are the results:
Minmim value is index 1194 with value 0 and 1 occurances
Maxium value is index 1264 with value 2995 and 1 occurances
Minmim value is index 1958 with value 1 and 1 occurances
Maxium value is index 1510 with value 2991 and 1 occurances
...
However, there are two bugs in the code:
In the second for loop, you should start with i = 0.
You should compare with maxV instead of minV in the same loop.
With regards to the random number generation:
When seeded with the same number, a series of rand() calls should return the same numbers. rand() is not for random numbers, but for pseudo-random numbers. rand() should behave this way because then, e.g., a simulation will output the same results when started with the same seed. It is a very nice property.
You seed it with the current time, which is ok and therefore rand() should return a different series of numbers each time (at least when not called multiple times a second). The seeding looks good to me. It is in fact very similar to the example provided here.
The sample size is 2000 and the range of generated numbers is 3000. This means it is not probable that the minimum and maximum occurrence counts are always the same. If the sample size were a million, then with high probability 2999 would be the largest number in most runs.
Gentlepeople: NOTE
Yes! This answer is "old". And in the era of C++11, by all means use C++11 <random>. But please don't downvote this answer years after the fact because you think "Ew, everybody knows rand() is evil!". In fact, it is not. It's just limited, and very easy to use inappropriately. But, as a historic fact, it exists as an API and it is still useful to document how you can use it better. I'm not deleting this answer for that very reason.
Original answer:
Please read
http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx
Notably, don't write rand() % 3000. Write
int r = rand() / ( RAND_MAX / 3000 + 1 );
In fact, rand() should be uniformly distributed, meaning that the lower and upper bound would indeed have a near-100% chance of occurring when the number of samples is large enough (larger than the size of the domain, for starters).
That's what true random is about (try to do a Monte-Carlo algorithm without it - you'd be very unhappy)
I'm trying to implement a weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.
In my project (Hold'em hand-ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's mersenne twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:
1 (weight: 90)
2 (weight: 56)
3 (weight: 4)
Does Boost have some sort of functionality for this?
There is a straightforward algorithm for picking an item at random, where items have individual weights:
1) calculate the sum of all the weights
2) pick a random number that is 0 or greater and is less than the sum of the weights
3) go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight
Pseudo-code illustrating this:
int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
if(rnd < choice_weight[i])
return i;
rnd -= choice_weight[i];
}
assert(!"should never get here");
This should be straightforward to adapt to your boost containers and such.
If your weights are rarely changed but you often pick one at random, and as long as your container is storing pointers to the objects or is more than a few dozen items long (basically, you have to profile to know if this helps or hinders), then there is an optimisation:
By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.
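A sketch of that optimisation, assuming non-negative integer weights with a positive total (the function names are mine):
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Build once per weight change: O(n).
std::vector<int> build_cumulative(const std::vector<int>& weights)
{
    std::vector<int> cumulative(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());
    return cumulative;
}

// Each pick: O(log n). Returns the index of the chosen item.
int pick_weighted(const std::vector<int>& cumulative, std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(0, cumulative.back() - 1);
    int r = dist(gen);
    // first element whose cumulative weight exceeds r
    return static_cast<int>(std::upper_bound(cumulative.begin(), cumulative.end(), r)
                            - cumulative.begin());
}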
If you do not know the number of items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.
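One way to adapt reservoir sampling to weights (the Efraimidis-Spirakis keying scheme; my sketch, not part of the original answer): give each item the key u^(1/w), with u uniform in (0,1), and keep the item with the largest key:
#include <cmath>
#include <random>

// Processes items one at a time; neither the item count nor the total
// weight needs to be known in advance.
class WeightedReservoir
{
    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> u{0.0, 1.0};
    double best_key = -1.0;
    int best_item = -1;
public:
    void offer(int item, double weight)
    {
        double key = std::pow(u(gen), 1.0 / weight);  // weight must be > 0
        if (key > best_key) { best_key = key; best_item = item; }
    }
    int chosen() const { return best_item; }  // -1 if nothing was offered
};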
Updated answer to an old question. You can easily do this in C++11 with just the std::lib:
#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>
int main()
{
// Set up distribution
double interval[] = {1, 2, 3, 4};
double weights[] = { .90, .56, .04};
std::piecewise_constant_distribution<> dist(std::begin(interval),
std::end(interval),
std::begin(weights));
// Choose generator
std::mt19937 gen(std::time(0)); // seed as wanted
// Demonstrate with N randomly generated numbers
const unsigned N = 1000000;
// Collect number of times each random number is generated
double avg[std::extent<decltype(weights)>::value] = {0};
for (unsigned i = 0; i < N; ++i)
{
// Generate random number using gen, distributed according to dist
unsigned r = static_cast<unsigned>(dist(gen));
// Sanity check
assert(interval[0] <= r && r <= *(std::end(interval)-2));
// Save r for statistical test of distribution
avg[r - 1]++;
}
// Compute averages for distribution
for (double* i = std::begin(avg); i < std::end(avg); ++i)
*i /= N;
// Display distribution
for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
std::cout << "avg[" << i << "] = " << avg[i-1] << '\n';
}
Output on my system:
avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544
Note that most of the code above is devoted to just displaying and analyzing the output. The actual generation is just a few lines of code. The output demonstrates that the requested "probabilities" have been obtained. You have to divide the requested weights by 1.5, since that is what they add up to.
If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:
#include <ctime>
#include <random>
#include <vector>
std::vector<double> weights{90,56,4};
std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
std::mt19937 gen;
gen.seed(time(0));//if you want different results from different runs
int N = 100000;
std::vector<int> samples(N);
for(auto & i: samples)
i = dist(gen);
//do something with your samples...
Note, however, that the C++11 discrete_distribution computes all of the cumulative sums on initialization. Usually you want that, because it speeds up the sampling time for a one-time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance, if the weights represent how many items there are and every time you draw one you remove it, you will probably want a custom algorithm (a sketch of that case follows the library excerpt below).
Will's answer https://stackoverflow.com/a/1761646/837451 avoids this overhead but will be slower to draw from than the C++11 because it can't use binary search.
To see that it does this, you can see the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):
template<typename _IntType>
void
discrete_distribution<_IntType>::param_type::
_M_initialize()
{
if (_M_prob.size() < 2)
{
_M_prob.clear();
return;
}
const double __sum = std::accumulate(_M_prob.begin(),
_M_prob.end(), 0.0);
// Now normalize the probabilites.
__detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
__sum);
// Accumulate partial sums.
_M_cp.reserve(_M_prob.size());
std::partial_sum(_M_prob.begin(), _M_prob.end(),
std::back_inserter(_M_cp));
// Make sure the last cumulative probability is one.
_M_cp[_M_cp.size() - 1] = 1.0;
}
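For that draw-and-remove case, a minimal sketch of such a custom algorithm (my own illustration, assuming the weights are integer item counts):
#include <random>
#include <vector>

// counts[i] is how many instances of item i remain; total is their sum.
// Returns the drawn item's index and removes one instance of it.
int draw_and_remove(std::vector<int>& counts, int& total, std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(0, total - 1);
    int r = dist(gen);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (r < counts[i]) {
            --counts[i];
            --total;
            return static_cast<int>(i);
        }
        r -= counts[i];
    }
    return -1; // unreachable while total equals the sum of counts
}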
What I do when I need to weight numbers is to use a random number for the weight.
For example: I need to generate random numbers from 1 to 3 with the following weights:
10% of a random number could be 1
30% of a random number could be 2
60% of a random number could be 3
Then I use:
weight = rand() % 10;
switch( weight ) {
case 0:
randomNumber = 1;
break;
case 1:
case 2:
case 3:
randomNumber = 2;
break;
case 4:
case 5:
case 6:
case 7:
case 8:
case 9:
randomNumber = 3;
break;
}
With this, it randomly has a 10% probability of being 1, 30% of being 2, and 60% of being 3.
You can adapt it to your needs.
Hope I could help you. Good luck!
Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each items is proportional to your weighting.
Example:
1 60%
2 35%
3 5%
So have a bag with 100 items with 60 1's, 35 2's and 5 3's.
Now randomly sort the bag (std::random_shuffle)
Pick elements from the bag sequentially until it is empty.
Once empty re-randomize the bag and start again.
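A minimal sketch of this bag approach, using the 60/35/5 example above (std::shuffle stands in for std::random_shuffle, which was removed in C++17):
#include <algorithm>
#include <random>
#include <vector>

int main()
{
    // 100 items, proportional to the 60%/35%/5% weighting
    std::vector<int> bag;
    bag.insert(bag.end(), 60, 1);
    bag.insert(bag.end(), 35, 2);
    bag.insert(bag.end(),  5, 3);

    std::mt19937 gen(std::random_device{}());
    std::shuffle(bag.begin(), bag.end(), gen);

    std::size_t next = 0;                 // draw bag[0], bag[1], ... in turn
    for (int i = 0; i < 250; ++i) {
        int pick = bag[next++];
        (void)pick;                       // ...use the pick...
        if (next == bag.size()) {         // bag empty: re-randomize, start over
            std::shuffle(bag.begin(), bag.end(), gen);
            next = 0;
        }
    }
}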
Choose a random number on [0,1), which should be the default operator() for a boost RNG. Choose the item with cumulative probability density function >= that number:
template <class It,class P>
It choose_p(It begin,It end,P const& p)
{
if (begin==end) return end;
double sum=0.;
for (It i=begin;i!=end;++i)
sum+=p(*i);
double choice=sum*random01();
for (It i=begin;;) {
choice -= p(*i);
It r=i;
++i;
if (choice<0 || i==end) return r;
}
return begin; //unreachable
}
Where random01() returns a double >=0 and <1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.
p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.
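random01() is left unspecified above; one possible implementation, as a sketch:
#include <random>

// a double in [0, 1) from a shared Mersenne Twister engine
inline double random01()
{
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}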
This is my understanding of a "weighted random", I've been using this recently. (Code is in Python but can be implemented in other langs)
Let's say you want to pick a random person and they don't have equal chances of being selected
You can give each person a "weight" or "chance" value:
import math, random

choices = [("Ade", 60), ("Tope", 50), ("Maryamu", 30)]
You use their weights to calculate a score for each then find the choice with the highest score
highest = [None, 0]
for p in choices:
score = math.floor(random.random() * p[1])
if score > highest[1]:
highest[0] = p
highest[1] = score
print(highest)
For Ade the highest score they can get is 60, Tope 50 and so on, meaning that Ade has a higher chance of generating the largest score than the rest.
You can use any range of weights, the greater the difference the more skewed the distribution.
E.g if Ade had a weight of 1000 they will almost always be chosen.
Test
votes = [{"name": "Ade", "votes": 0}, {"name": "Tope", "votes": 0}, {"name": "Maryamu", "votes": 0}]
for v in range(100):
highest = [None, 0]
for p in choices:
score = math.floor(random.random() * p[1])
if score > highest[1]:
highest[0] = p
highest[1] = score
candidate = choices.index(highest[0]) # get index of person
votes[candidate]["votes"] += 1 # increase vote count
print(votes)
# votes printed at the end; your results might be different
[{"name": "Ade", "votes": 45}, {"name": "Tope", "votes": 30}, {"name": "Maryamu", "votes": 25}]
Issues
It looks like the more the voters, the more predictable the results. Welp
Hope this gives someone an idea...
I have just implemented the solution given by Will:
#include <cstdlib>
#include <iostream>
#include <map>
using namespace std;
template < class T >
class WeightedRandomSample
{
public:
void SetWeightMap( map< T , unsigned int >& WeightMap )
{
m_pMap = &WeightMap;
}
T GetRandomSample()
{
unsigned int sum_of_weight = GetSumOfWeights();
unsigned int rnd = (rand() % sum_of_weight);
map<T , unsigned int>& w_map = *m_pMap;
typename map<T , unsigned int>::iterator it;
for(it = w_map.begin() ; it != w_map.end() ; ++it )
{
unsigned int w = it->second;
if(rnd < w)
return (it->first);
rnd -= w;
}
//assert(!"should never get here");
T* t = NULL;
return *(t);
}
unsigned int GetSumOfWeights()
{
if(m_pMap == NULL)
return 0;
unsigned int sum = 0;
map<T , unsigned int>& w_map = *m_pMap;
typename map<T , unsigned int>::iterator it;
for(it = w_map.begin() ; it != w_map.end() ; ++it )
{
sum += it->second;
}
return sum;
}
protected:
map< T , unsigned int>* m_pMap = NULL;
};
typedef pair<int , int> PAIR_INT_INT;
typedef map<PAIR_INT_INT ,unsigned int> mul_table_weighted_map;
int main()
{
mul_table_weighted_map m;
m[PAIR_INT_INT(2,3)] = 10;
m[PAIR_INT_INT(4,5)] = 20;
m[PAIR_INT_INT(2,5)] = 10;
WeightedRandomSample<PAIR_INT_INT> WRS;
WRS.SetWeightMap(m);
unsigned int sum_of_weight = WRS.GetSumOfWeights();
cout <<"Sum of weights : " << sum_of_weight << endl;
unsigned int number_of_test = 10000;
cout << "testing " << number_of_test << " ..." << endl;
map<PAIR_INT_INT , unsigned int> check_map;
for(int i = 0 ; i < number_of_test ; i++)
{
PAIR_INT_INT res = WRS.GetRandomSample();
check_map[res]++;
//cout << i+1 << ": random = " << res.first << " * " << res.second << endl;
}
cout << "results: " << endl;
for(auto t : check_map)
{
PAIR_INT_INT p = t.first;
unsigned int expected = (number_of_test * m[p]) / sum_of_weight;
cout << " pair " << p.first << " * " << p.second
<< ", counted = " << t.second
<< ", expected = " << expected
<< endl;
}
return 0;
}
For example, generating a random index in a vector of weights for that index can be done this way:
#include <bits/stdc++.h>
using namespace std;
int getWeightedRandomNumber(vector<int> weights){
vector<int> vec;
for(int i=0; i<weights.size(); i++){
for(int j=0; j<weights[i]; j++){
vec.push_back(i);
}
}
random_shuffle(vec.begin(), vec.end());
return vec.front();
}
int main()
{
vector<int> v{2,4,5,100,1,2,4,4};
for(int i=0; i<100; i++){
cout<<getWeightedRandomNumber(v)<<endl;
}
}
Since we are constructing another vector with (number of elements) roughly equal to (current number of elements) * (mean weight), this approach might not work well when dealing with large data.