std::mersenne_twister_engine and random number generation - c++

What is the distribution (uniform, poisson, normal, etc.) that is generated if I did the below? The output appears to indicate a uniform distribution. But then, why do we need std::uniform_int_distribution?
#include <iostream>
#include <map>
#include <random>
#include <string>
int main()
{
std::mt19937_64 generator(134);
std::map<int, int> freq;
const int size = 100000;
for (int i = 0; i < size; ++i) {
int r = generator() % size;
freq[r]++;
}
for (auto f : freq) {
std::cout << std::string(f.second, '*') << std::endl;
}
return 0;
}
Thanks!

Because while generator() is a uniform distribution over [generator.min(), generator.max()], generator() % n is not a uniform distribution over [0, n) (unless generator.max() + 1 is an exact multiple of n, assuming generator.min() == 0).
Let's take an example: min() == 0, max() == 65'535 and n == 7.
gen() will give numbers in the range [0, 65'535] and in this range there are:
9'363 numbers such that gen() % 7 == 0
9'363 numbers such that gen() % 7 == 1
9'362 numbers such that gen() % 7 == 2
9'362 numbers such that gen() % 7 == 3
9'362 numbers such that gen() % 7 == 4
9'362 numbers such that gen() % 7 == 5
9'362 numbers such that gen() % 7 == 6
If you are wondering where I got these numbers, think of it like this: 65'534 is an exact multiple of 7 (65'534 = 7 * 9'362). This means that in [0, 65'533] there are exactly 9'362 numbers that map to each of {0, 1, 2, 3, 4, 5, 6} by doing gen() % 7. This leaves 65'534, which maps to 0, and 65'535, which maps to 1.
So you see there is a bias towards 0 and 1 over 2 through 6, i.e.
0 and 1 have a slightly higher chance (9'363 / 65'536 ≈ 14.28680419921875 %) of appearing than
2, 3, 4, 5 and 6 (9'362 / 65'536 ≈ 14.2852783203125 %).
std::uniform_int_distribution doesn't have this problem: it uses some mathematical voodoo, possibly drawing more random numbers from the generator, to achieve a truly uniform distribution.
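The counts in the 16-bit example above can be checked by brute force. The following is a minimal sketch; count_remainders is a hypothetical helper name, and it assumes max_value is smaller than the largest 32-bit value so the loop terminates.

```cpp
#include <array>
#include <cstdint>

// Count how many values in [0, max_value] land on each remainder modulo 7.
// Assumes max_value < UINT32_MAX, otherwise the loop counter would wrap.
std::array<std::uint32_t, 7> count_remainders(std::uint32_t max_value) {
    std::array<std::uint32_t, 7> counts{};  // zero-initialised
    for (std::uint32_t v = 0; v <= max_value; ++v) {
        ++counts[v % 7];
    }
    return counts;
}
```

Calling count_remainders(65535) gives 9363 for remainders 0 and 1 and 9362 for the rest, matching the table.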

The random engine std::mt19937_64 outputs a 64-bit number that behaves like a uniformly distributed random number. Each of the C++ random engines (including those of the std::mersenne_twister_engine family) outputs a uniformly-distributed pseudorandom number of a specific size using a specific algorithm.
Specifically, std::mersenne_twister_engine meets the RandomNumberEngine requirement, which in turn meets the UniformRandomBitGenerator requirement; therefore, std::mersenne_twister_engine outputs bits that behave like uniformly-distributed random bits.
On the other hand, std::uniform_int_distribution is useful for transforming numbers from random engines into random integers of a user-defined range (say, from 0 through 10). But note that uniform_int_distribution and other distributions (unlike random number engines) can be implemented differently from one C++ standard library implementation to another.

std::mt19937_64 generates a pseudo-random mutually independent sequence of long long / unsigned long long numbers. It is supposed to be uniform but I don't know the exact details of the engine, though, it is one of the best discovered engines thus far.
By taking % n you get an approximation to a pseudo-random uniform distribution over the integers [0, n-1], but it is inherently inaccurate. Certain numbers have a slightly higher chance of occurring than others, depending on n. E.g., since 2^64 = 18446744073709551616, with n=10000 the first 1616 values have a slightly higher chance of occurring than the last 10000-1616 values. std::uniform_int_distribution takes care of the inaccuracy by drawing a new random number in very rare cases: say, for n=10000, if the number is 18446744073709550000 or above, take a new number. The concrete details are up to the implementation, though.
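The "take a new number" idea can be sketched as follows. This is only an illustration of the technique, not how any particular standard library implements std::uniform_int_distribution; surplus_values and bounded are hypothetical names, and n is assumed to be greater than 0.

```cpp
#include <cstdint>
#include <random>

// Number of generator outputs that `% n` would over-represent,
// i.e. 2^64 mod n, computed without overflowing 64 bits. Assumes n > 0.
std::uint64_t surplus_values(std::uint64_t n) {
    return (UINT64_MAX % n + 1) % n;
}

// Unbiased value in [0, n) from a 64-bit engine, by rejection.
std::uint64_t bounded(std::mt19937_64& gen, std::uint64_t n) {
    const std::uint64_t last_ok = UINT64_MAX - surplus_values(n);
    std::uint64_t x;
    do {
        x = gen();  // redraw on the rare value above last_ok
    } while (x > last_ok);
    return x % n;
}
```

For n = 10000, surplus_values returns 1616, so only the top 1616 outputs of the engine are ever rejected.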

One of the major accomplishments of <random> was the separation of distributions from engines.
I see it as similar to Alexander Stepanov's STL, which separated algorithms from containers through the use of iterators. For random numbers I can do an implementation of the Blum-Blum-Shub single bit generator (engine) and it will still work with all the distributions in <random>. Or, I can do a simple Linear Congruential Generator, x_{n + 1} = a * x_{n} % m, which when correctly seeded can never generate 0. Again, it will work with all the distributions. Likewise, I can write a new distribution and I don't have to worry about the peculiarities of any engine as long as I only use the interface specified by a UniformRandomBitGenerator.
In general, you should always use a distribution. Also, it is time to retire using '%' for generating random numbers.
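The separation of engines from distributions can be illustrated with a hand-written engine. The sketch below is an assumption-laden example: LehmerEngine and roll are hypothetical names, and the constants are those of std::minstd_rand (a = 48271, m = 2^31 - 1). It exposes only what UniformRandomBitGenerator asks for: result_type, min(), max() and operator().

```cpp
#include <cstdint>
#include <random>

// Minimal engine: a Lehmer generator x_{n+1} = a * x_n % m.
class LehmerEngine {
public:
    using result_type = std::uint32_t;

    // The seed must be in [1, m - 1]; a seed of 0 would stay at 0 forever.
    explicit LehmerEngine(result_type seed) : state_(seed) {}

    static constexpr result_type min() { return 1; }
    static constexpr result_type max() { return 2147483646u; }

    result_type operator()() {
        // Widen to 64 bits so the product cannot overflow.
        state_ = static_cast<result_type>(
            (static_cast<std::uint64_t>(state_) * 48271u) % 2147483647u);
        return state_;
    }

private:
    result_type state_;
};

// A standard distribution consumes the engine through that interface alone.
int roll(LehmerEngine& engine) {
    std::uniform_int_distribution<int> die(1, 6);
    return die(engine);
}
```

Note that the engine never returns 0, and the distribution still works because it reads min() and max() rather than assuming a range.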

Related

How to set a minimum range for generating random number in c++? [duplicate]

I need a function which would generate a random integer in a given range (including boundary values). I don't have unreasonable quality/randomness requirements; I have four requirements:
I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
I need it to be reasonably uniform (use of rand() is perfectly fine).
the minimum-maximum ranges can be anything from <0, 1> to <-32727, 32727>.
it has to be seedable.
I currently have the following C++ code:
output = min + (rand() * (int)(max - min) / RAND_MAX)
The problem is that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++, where RAND_MAX is 32767, that is 1 time in 32768). This is a major issue for small ranges like <-1, 1>, where the last value is almost never returned.
So I grabbed pen and paper and came up with following formula (which builds on the (int)(n + 0.5) integer rounding trick):
But it still doesn't give me a uniform distribution. Repeated runs with 10000 samples give me a ratio of 37:50:13 for values -1, 0, 1.
Is there a better formula? (Or even whole pseudo-random number generator function?)
The simplest (and hence best) C++ (using the 2011 standard) answer is:
#include <random>
std::random_device rd; // Only used once to initialise (seed) engine
std::mt19937 rng(rd()); // Random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // Guaranteed unbiased
auto random_integer = uni(rng);
There isn't any need to reinvent the wheel, worry about bias, or worry about using time as the random seed.
A fast solution, somewhat better than yours but still not properly uniformly distributed, is
output = min + (rand() % static_cast<int>(max - min + 1))
Except when the size of the range is a power of 2, this method produces biased, non-uniformly distributed numbers regardless of the quality of rand(). For a comprehensive test of the quality of this method, please read this.
If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high quality uniform_int_distribution which will accept minimum and maximum bounds (inclusive as you need), and you can choose among various random number generators to plug into that distribution.
Here is code that generates ten million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it, as you mentioned performance is a major concern for you.
#include <iostream>
#include <random>
#include <chrono>
int main()
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::duration<double> sec;
Clock::time_point t0 = Clock::now();
const int N = 10000000;
typedef std::minstd_rand G; // Select the engine
G g; // Construct the engine
typedef std::uniform_int_distribution<> D; // Select the distribution
D d(-57, 365); // Construct the distribution
int c = 0;
for (int i = 0; i < N; ++i)
c += d(g); // Generate a random number
Clock::time_point t1 = Clock::now();
std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
return c;
}
For me (2.8 GHz Intel Core i5) this prints out:
2.10268e+07 random numbers per second.
You can seed the generator by passing in an int to its constructor:
G g(seed);
If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g., to long long):
typedef std::uniform_int_distribution<long long> D;
If you later find that the minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:
typedef std::mt19937 G; // Now using mersenne_twister_engine
Having separate control over the random number generator, and the random distribution can be quite liberating.
I've also computed (not shown) the first four "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:
min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001
(The x_ prefix refers to "expected".)
Let's split the problem into two parts:
Generate a random number n in the range 0 through (max-min).
Add min to that number
The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.
It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.
The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since the number of rejected values, (RAND_MAX + 1) % (max-min+1), is always less than (RAND_MAX + 1) / 2, we know that p > 1/2, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.
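The retry loop described above might look like the sketch below. rand_in_range is a hypothetical name; the code assumes min <= max and that the range holds at most RAND_MAX + 1 values, and it is only as uniform as rand() itself.

```cpp
#include <cstdlib>

// Integer in [min, max] by rejection sampling on top of rand().
int rand_in_range(int min, int max) {
    const long long values = static_cast<long long>(RAND_MAX) + 1;   // outputs of rand()
    const long long range  = static_cast<long long>(max) - min + 1;  // wanted outcomes
    const long long limit  = values - values % range;  // largest usable multiple of range
    long long r;
    do {
        r = std::rand();  // retry while r falls in the incomplete tail
    } while (r >= limit);
    return static_cast<int>(min + r % range);
}
```

The arithmetic is done in long long so that RAND_MAX + 1 and max - min + 1 cannot overflow int.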
Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.
Use the Mersenne Twister. The Boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects, such as artificial intelligence and evolutionary algorithms.
Here's their example where they make a simple function to roll a six-sided die:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>
boost::mt19937 gen;
int roll_die() {
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
return die();
}
Oh, and here's some more pimping of this generator just in case you aren't convinced you should use it over the vastly inferior rand():
The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.
Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in an 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".
An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.
int RandU(int nMin, int nMax)
{
return nMin + (int)((double)rand() / (RAND_MAX + 1.0) * (nMax - nMin + 1)); // 1.0 avoids int overflow when RAND_MAX == INT_MAX
}
This is a mapping of 32768 integers to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note however that if (nMax-nMin+1) is large, the mapping won't work (For example - you can't map 32768 values to 30000 values with equal probability). If such ranges are needed - you should use a 32-bit or 64-bit random source, instead of the 15-bit rand(), or ignore rand() results which are out-of-range.
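The claim about mapping 32768 values onto 30000 can be checked by counting. The sketch below uses a hypothetical helper, bucket_extremes, which applies the same scaling formula to every possible source value and reports the smallest and largest number of source values that share one result.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// For a source with equally likely outputs 0..source_values-1, count how many
// of them the scaling formula sends to each of `targets` results.
// Returns {smallest bucket, largest bucket}.
std::pair<int, int> bucket_extremes(int source_values, int targets) {
    std::vector<int> hits(targets, 0);
    for (int r = 0; r < source_values; ++r) {
        int k = static_cast<int>(static_cast<double>(r) / source_values * targets);
        ++hits[k];
    }
    auto mm = std::minmax_element(hits.begin(), hits.end());
    return {*mm.first, *mm.second};
}
```

With 32768 source values and 30000 targets, some results are produced by two source values and others by only one, so some outcomes are twice as likely as others. With 3 targets the buckets differ by a single value out of roughly 10923.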
Assume min and max are integer values,
[ and ] mean the endpoint is included,
( and ) mean the endpoint is excluded.
The expressions below give a value in each kind of interval using C++'s rand().
Reference:
For the ( ) and [ ] notation, see Interval (mathematics).
For the rand and srand functions and the RAND_MAX macro,
see std::rand.
[min, max]
int randNum = rand() % (max - min + 1) + min
(min, max]
int randNum = rand() % (max - min) + min + 1
[min, max)
int randNum = rand() % (max - min) + min
(min, max)
int randNum = rand() % (max - min - 1) + min + 1
Here is an unbiased version that generates numbers in [low, high]:
int r;
do {
r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;
If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.
I recommend the Boost.Random library. It's super detailed and well-documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.
Notice that in most suggestions the initial random value that you have got from rand() function, which is typically from 0 to RAND_MAX, is simply wasted. You are creating only one random number out of it, while there is a sound procedure that can give you more.
Assume that you want [min,max] region of integer random numbers. We start from [0, max-min]
Take base b=max-min+1
Start from representing a number you got from rand() in base b.
That way you get floor(log(b,RAND_MAX)) random numbers, because each digit in base b, except possibly the last one, represents a random number in the range [0, max-min].
Of course the final shift to [min,max] is simple for each random number r+min.
int n = NUM_DIGIT-1;
while(n >= 0)
{
r[n] = res % b;
res -= r[n];
res /= b;
n--;
}
If NUM_DIGIT is the number of digit in base b that you can extract and that is
NUM_DIGIT = floor(log(b,RAND_MAX))
then the above is a simple implementation of extracting NUM_DIGIT random numbers from 0 to b-1 out of one rand() result in [0, RAND_MAX], provided b < RAND_MAX.
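The digit-extraction loop can be wrapped into a self-contained function, as in this sketch. base_b_digits is a hypothetical name and b is assumed to be at least 2. The digits are exactly uniform only when the number of possible source values is a power of b; otherwise the higher digits carry the same kind of modulo bias discussed elsewhere in this thread.

```cpp
#include <vector>

// Split `value` into `num_digits` base-b digits, most significant first.
// Each digit lies in [0, b - 1].
std::vector<unsigned> base_b_digits(unsigned value, unsigned b, int num_digits) {
    std::vector<unsigned> digits(num_digits);
    for (int n = num_digits - 1; n >= 0; --n) {
        digits[n] = value % b;  // lowest remaining digit
        value /= b;             // drop it
    }
    return digits;
}
```

Each extracted digit d then becomes a number in [min, max] as d + min.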
In answers to this question, rejection sampling was already addressed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.
The algorithm is really simple:
calculate the smallest power of 2 greater than or equal to the interval length
randomize one number in that "new" interval
return that number if it is less than the length of the original interval
reject otherwise
Here's my sample code:
int randInInterval(int min, int max) {
int intervalLen = max - min + 1;
//now calculate the smallest power of 2 that is >= than `intervalLen`
int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));
int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"
if (randomNumber < intervalLen)
return min + randomNumber; //ok!
return randInInterval(min, max); //reject sample and try again
}
This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.
PS: Obviously avoiding the recursion would be more efficient (there isn't any need to calculate over and over the log ceiling...), but I thought it was more readable for this example.
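A non-recursive variant, as hinted at in the PS, could look like the sketch below. rand_in_interval_iterative is a hypothetical name; the code assumes min <= max, an interval length of at most RAND_MAX + 1, and that RAND_MAX + 1 is itself a power of 2 (otherwise the masked value is not exactly uniform).

```cpp
#include <cstdlib>

// The power of two is computed once, with integer arithmetic only.
int rand_in_interval_iterative(int min, int max) {
    const unsigned len = static_cast<unsigned>(max) - static_cast<unsigned>(min) + 1u;
    unsigned pow2 = 1;
    while (pow2 < len) {
        pow2 <<= 1;  // ends at the smallest power of 2 that is >= len
    }
    unsigned candidate;
    do {
        // Masking with pow2 - 1 is the same as taking % pow2.
        candidate = static_cast<unsigned>(std::rand()) & (pow2 - 1);
    } while (candidate >= len);  // reject and redraw
    return min + static_cast<int>(candidate);
}
```

Because pow2 is less than twice the interval length, each draw is accepted with probability greater than 1/2.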
The following is the idea presented by Walter. I wrote a self-contained C++ class that will generate a random integer in the closed interval [low, high]. It requires C++11.
#include <random>
// Returns random integer in closed range [low, high].
class UniformRandomInt {
std::random_device _rd{};
std::mt19937 _gen{_rd()};
std::uniform_int_distribution<int> _dist;
public:
UniformRandomInt() {
set(1, 10);
}
UniformRandomInt(int low, int high) {
set(low, high);
}
// Set the distribution parameters low and high.
void set(int low, int high) {
std::uniform_int_distribution<int>::param_type param(low, high);
_dist.param(param);
}
// Get random integer.
int get() {
return _dist(_gen);
}
};
Example usage:
UniformRandomInt ur;
ur.set(0, 9); // Get random int in closed range [0, 9].
int value = ur.get();
The formula for this is very simple, so try this expression,
int num = (int) rand() % (max - min) + min;
// Note: rand() returns an integer between 0 and RAND_MAX, so num lies in [min, max - 1]
The following expression should be unbiased if I am not mistaken:
std::floor( ( max - min + 1.0 ) * rand() ) + min;
I am assuming here that rand() gives you a random value in the range between 0.0 and 1.0 not including 1.0 and that max and min are integers with the condition that min < max.

Why is rand()%6 biased?

When reading how to use std::rand, I found this code on cppreference.com
int x = 7;
while(x > 6)
x = 1 + std::rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
What is wrong with the expression on the right? Tried it and it works perfectly.
There are two issues with rand() % 6 (the 1+ doesn't affect either problem).
First, as several answers have pointed out, if the low bits of rand() aren't appropriately uniform, the result of the remainder operator is also not uniform.
Second, if the number of distinct values produced by rand() is not a multiple of 6, then the remainder will produce more low values than high values. That's true even if rand() returns perfectly distributed values.
As an extreme example, pretend that rand() produces uniformly distributed values in the range [0..6]. If you look at the remainders for those values, when rand() returns a value in the range [0..5], the remainder produces uniformly distributed results in the range [0..5]. When rand() returns 6, rand() % 6 returns 0, just as if rand() had returned 0. So you get a distribution with twice as many 0's as any other value.
The second is the real problem with rand() % 6.
The way to avoid that problem is to discard values that would produce non-uniform duplicates. You calculate the largest multiple of 6 that's less than or equal to RAND_MAX + 1 (the number of distinct values rand() can return), and whenever rand() returns a value that's greater than or equal to that multiple you reject it and call rand() again, as many times as needed.
So:
int max = 6 * ((RAND_MAX + 1u) / 6);
int value = rand();
while (value >= max)
value = rand();
That's a different implementation of the code in question, intended to more clearly show what's going on.
There are hidden depths here:
The use of the small u in RAND_MAX + 1u. RAND_MAX is defined to be an int type, and is often the largest possible int. The behaviour of RAND_MAX + 1 would be undefined in such instances as you'd be overflowing a signed type. Writing 1u forces type conversion of RAND_MAX to unsigned, so obviating the overflow.
The use of % 6 can (but on every implementation of std::rand I've seen doesn't) introduce additional statistical bias above and beyond the alternative presented. Such instances where % 6 is hazardous are cases where the number generator has correlation planes in the low order bits, such as a rather famous IBM implementation (in C) of rand in, I think, the 1970s which flipped the high and low bits as "a final flourish". A further consideration is that 6 is very small compared with RAND_MAX, so there will be a minimal effect if RAND_MAX + 1 is not a multiple of 6, which it probably isn't.
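The point about the small u can be made concrete with a compile-time check. This sketch assumes RAND_MAX has type int, which holds on common implementations; rand_value_count is a hypothetical helper name.

```cpp
#include <cstdlib>
#include <type_traits>

// Adding 1u converts RAND_MAX to unsigned before the addition,
// so the sum is evaluated in unsigned arithmetic and cannot overflow a signed int.
static_assert(std::is_same<decltype(RAND_MAX + 1u), unsigned int>::value,
              "RAND_MAX + 1u is evaluated in unsigned arithmetic");

// Number of distinct values rand() can return, computed without signed overflow.
unsigned rand_value_count() {
    return RAND_MAX + 1u;
}
```

Writing RAND_MAX + 1 instead would be undefined behaviour on platforms where RAND_MAX equals INT_MAX.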
In conclusion, these days, due to its tractability, I'd use % 6. It's not likely to introduce any statistical anomalies beyond those introduced by the generator itself. If you are still in doubt, test your generator to see if it has the appropriate statistical properties for your use case.
This example code illustrates that std::rand is a case of legacy cargo cult balderdash that should make your eyebrows raise every time you see it.
There are several issues here:
The contract people usually assume—even the poor hapless souls who don't know any better and won't think of it in precisely these terms—is that rand samples from the uniform distribution on the integers in 0, 1, 2, …, RAND_MAX, and each call yields an independent sample.
The first problem is that the assumed contract, independent uniform random samples in each call, is not actually what the documentation says—and in practice, implementations historically failed to provide even the barest simulacrum of independence. For example, C99 §7.20.2.1 ‘The rand function’ says, without elaboration:
The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX.
This is a meaningless sentence, because pseudorandomness is a property of a function (or family of functions), not of an integer, but that doesn't stop even ISO bureaucrats from abusing the language. After all, the only readers who would be upset by it know better than to read the documentation for rand for fear of their brain cells decaying.
A typical historical implementation in C works like this:
static unsigned int seed = 1;
static void
srand(unsigned int s)
{
seed = s;
}
static unsigned int
rand(void)
{
seed = (seed*1103515245 + 12345) % ((unsigned long)RAND_MAX + 1);
return (int)seed;
}
This has the unfortunate property that even though a single sample may be uniformly distributed under a uniform random seed (which depends on the specific value of RAND_MAX), it alternates between even and odd integers in consecutive calls—after
int a = rand();
int b = rand();
the expression (a & 1) ^ (b & 1) yields 1 with 100% probability, which is not the case for independent random samples on any distribution supported on even and odd integers. Thus, a cargo cult emerged that one should discard the low-order bits to chase the elusive beast of ‘better randomness’. (Spoiler alert: This is not a technical term. This is a sign that whosever prose you are reading either doesn't know what they're talking about, or thinks you are clueless and must be condescended to.)
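The even/odd alternation is easy to reproduce. The sketch below re-implements the generator shown above under a hypothetical RAND_MAX of 32767 (so the modulus is 2^15); lcg_next and low_bit_always_alternates are illustrative names. Because the multiplier and the increment are both odd and the modulus is even, every step flips the lowest bit.

```cpp
// One step of the linear congruential generator, modulus 2^15.
unsigned lcg_next(unsigned& state) {
    state = (state * 1103515245u + 12345u) % 32768u;
    return state;
}

// True if the lowest bit flipped between every pair of consecutive outputs.
bool low_bit_always_alternates(unsigned seed, int steps) {
    unsigned state = seed;
    unsigned previous = lcg_next(state);
    for (int i = 0; i < steps; ++i) {
        unsigned current = lcg_next(state);
        if (((previous & 1u) ^ (current & 1u)) != 1u) {
            return false;
        }
        previous = current;
    }
    return true;
}
```

For any seed, low_bit_always_alternates returns true, which independent samples would essentially never do over a long run.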
The second problem is that even if each call did sample independently from a uniform random distribution on 0, 1, 2, …, RAND_MAX, the outcome of rand() % 6 would not be uniformly distributed in 0, 1, 2, 3, 4, 5 like a die roll, unless RAND_MAX is congruent to -1 modulo 6. Simple counterexample: If RAND_MAX = 6, then from rand(), all outcomes have equal probability 1/7, but from rand() % 6, the outcome 0 has probability 2/7 while all other outcomes have probability 1/7.
The right way to do this is with rejection sampling: repeatedly draw an independent uniform random sample s from 0, 1, 2, …, RAND_MAX, and reject (for example) the outcomes 0, 1, 2, …, ((RAND_MAX + 1) % 6) - 1—if you get one of those, start over; otherwise, yield s % 6.
unsigned int s;
while ((s = rand()) < ((unsigned long)RAND_MAX + 1) % 6)
continue;
return s % 6;
This way, the set of outcomes from rand() that we accept is evenly divisible by 6, and each possible outcome from s % 6 is obtained by the same number of accepted outcomes from rand(), so if rand() is uniformly distributed then so is s % 6. There is no bound on the number of trials, but the expected number is less than 2, and the probability of still needing another trial shrinks exponentially with the number of trials.
The choice of which outcomes of rand() you reject is immaterial, provided that you map an equal number of them to each integer below 6. The code at cppreference.com makes a different choice, because of the first problem above—that nothing is guaranteed about the distribution or independence of outputs of rand(), and in practice the low-order bits exhibited patterns that don't ‘look random enough’ (never mind that the next output is a deterministic function of the previous one).
Exercise for the reader: Prove that the code at cppreference.com yields a uniform distribution on die rolls if rand() yields a uniform distribution on 0, 1, 2, …, RAND_MAX.
Exercise for the reader: Why might you prefer one or the other subsets to reject? What computation is needed for each trial in the two cases?
A third problem is that the seed space is so small that even if the seed is uniformly distributed, an adversary armed with knowledge of your program and one outcome but not the seed can readily predict the seed and subsequent outcomes, which makes them seem not so random after all. So don't even think about using this for cryptography.
You can go the fancy overengineered route and C++11's std::uniform_int_distribution class with an appropriate random device and your favorite random engine like the ever-popular Mersenne twister std::mt19937 to play at dice with your four-year-old cousin, but even that is not going to be fit for generating cryptographic key material—and the Mersenne twister is a terrible space hog too with a multi-kilobyte state wreaking havoc on your CPU's cache with an obscene setup time, so it is bad even for, e.g., parallel Monte Carlo simulations with reproducible trees of subcomputations; its popularity likely arises mainly from its catchy name. But you can use it for toy dice rolling like this example!
Another approach is to use a simple cryptographic pseudorandom number generator with a small state, such as a simple fast key erasure PRNG, or just a stream cipher such as AES-CTR or ChaCha20 if you are confident (e.g., in a Monte Carlo simulation for research in the natural sciences) that there are no adverse consequences to predicting past outcomes if the state is ever compromised.
I'm not an experienced C++ user by any means, but was interested to see if the claim in the other answers that std::rand()/((RAND_MAX + 1u)/6) is less biased than 1+std::rand()%6 actually holds true. So I wrote a test program to tabulate the results for both methods (I haven't written C++ in ages, please check it). A link for running the code is found here. It's also reproduced as follows:
// Example program
#include <cstdlib>
#include <iostream>
#include <ctime>
#include <string>
int main()
{
std::srand(std::time(nullptr)); // use current time as seed for random generator
// Roll the die 6000000 times using the supposedly unbiased method and keep track of the results
int results[6] = {0,0,0,0,0,0};
// roll a 6-sided die 6000000 times
for (int n=0; n != 6000000; ++n) {
int x = 7;
while(x > 6)
x = 1 + std::rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
results[x-1]++;
}
for (int n=0; n !=6; n++) {
std::cout << results[n] << ' ';
}
std::cout << "\n";
// Roll the die 6000000 times using the supposedly biased method and keep track of the results
int results_bias[6] = {0,0,0,0,0,0};
// roll a 6-sided die 6000000 times
for (int n=0; n != 6000000; ++n) {
int x = 7;
while(x > 6)
x = 1 + std::rand()%6;
results_bias[x-1]++;
}
for (int n=0; n !=6; n++) {
std::cout << results_bias[n] << ' ';
}
}
I then took the output of this and used the chisq.test function in R to run a Chi-square test to see if the results are significantly different than expected. This stackexchange question goes into more detail of using the chi-square test to test die fairness: How can I test whether a die is fair?. Here are the results for a few runs:
> ?chisq.test
> unbias <- c(100150, 99658, 100319, 99342, 100418, 100113)
> bias <- c(100049, 100040, 100091, 99966, 100188, 99666 )
> chisq.test(unbias)
Chi-squared test for given probabilities
data: unbias
X-squared = 8.6168, df = 5, p-value = 0.1254
> chisq.test(bias)
Chi-squared test for given probabilities
data: bias
X-squared = 1.6034, df = 5, p-value = 0.9008
> unbias <- c(998630, 1001188, 998932, 1001048, 1000968, 999234 )
> bias <- c(1000071, 1000910, 999078, 1000080, 998786, 1001075 )
> chisq.test(unbias)
Chi-squared test for given probabilities
data: unbias
X-squared = 7.051, df = 5, p-value = 0.2169
> chisq.test(bias)
Chi-squared test for given probabilities
data: bias
X-squared = 4.319, df = 5, p-value = 0.5045
> unbias <- c(998630, 999010, 1000736, 999142, 1000631, 1001851)
> bias <- c(999803, 998651, 1000639, 1000735, 1000064,1000108)
> chisq.test(unbias)
Chi-squared test for given probabilities
data: unbias
X-squared = 7.9592, df = 5, p-value = 0.1585
> chisq.test(bias)
Chi-squared test for given probabilities
data: bias
X-squared = 2.8229, df = 5, p-value = 0.7273
In the three runs that I did, the p-value for both methods was always greater than typical alpha values used to test significance (0.05). This means that we wouldn't consider either of them to be biased. Interestingly, the supposedly unbiased method has consistently lower p-values, which indicates that it might actually be more biased. The caveat being that I only did 3 runs.
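For readers who want to reproduce the X-squared values without R, the statistic itself is straightforward to compute. The sketch below covers only the Pearson statistic, not the p-value; chi_squared_fair_die is a hypothetical name.

```cpp
#include <array>

// Pearson chi-squared statistic against a fair six-sided die:
// the sum over faces of (observed - expected)^2 / expected.
double chi_squared_fair_die(const std::array<long, 6>& observed) {
    long total = 0;
    for (long count : observed) {
        total += count;
    }
    const double expected = static_cast<double>(total) / 6.0;
    double statistic = 0.0;
    for (long count : observed) {
        const double diff = static_cast<double>(count) - expected;
        statistic += diff * diff / expected;
    }
    return statistic;
}
```

Feeding it the first "unbias" row above (100150, 99658, 100319, 99342, 100418, 100113) gives about 8.6168, the same value R reports. With 5 degrees of freedom, the statistic would need to exceed roughly 11.07 to be significant at the 0.05 level.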
UPDATE: While I was writing my answer, Konrad Rudolph posted an answer that takes the same approach, but gets a very different result. I don't have the reputation to comment on his answer, so I'm going to address it here. First, the main thing is that the code he uses uses the same seed for the random number generator every time it's run. If you change the seed, you actually get a variety of results. Second, if you don't change the seed, but change the number of trials, you also get a variety of results. Try increasing or decreasing by an order of magnitude to see what I mean. Third, there is some integer truncation or rounding going on where the expected values aren't quite accurate. It probably isn't enough to make a difference, but it's there.
In summary, he may simply have happened on a seed and number of trials that produce a misleading result.
One can think of a random number generator as working on a stream of binary digits. The generator turns the stream into numbers by slicing it up into chunks. If the std::rand function is working with a RAND_MAX of 32767, then it is using 15 bits in each slice.
When one takes the modulus by 6 of a number between 0 and 32767 inclusive, one finds 5462 '0's and '1's but only 5461 '2's, '3's, '4's, and '5's. Hence the result is biased. The larger the RAND_MAX value is, the less bias there will be, but it is inescapable.
What is not biased is a number in the range [0..(2^n)-1]. You can generate a (theoretically) better number in the range 0..5 by extracting 3 bits, converting them to an integer in the range 0..7 and rejecting 6 and 7.
One hopes that every bit in the bit stream has an equal chance of being a '0' or a '1' irrespective of where it is in the stream or the values of other bits. This is exceptionally difficult in practice. The many different implementations of software PRNGs offer different compromises between speed and quality. A linear congruential generator such as std::rand offers fastest speed for lowest quality. A cryptographic generator offers highest quality for lowest speed.

Will this give me proper random numbers based on these probabilities? C++

Code:
int random = (rand() % 7 + 1);
if (random == 1) { } // num 1
else if (random == 2) { } // num 2
else if (random == 3 || random == 4) { } // num 3
else if (random == 5 || random == 6) { } // num 4
else if (random == 7) { } // num 5
Basically I want each of these numbers with each of these probabilities:
1: 1/7
2: 1/7
3: 2/7
4: 2/7
5: 1/7
Will this code give me proper results? I.e. if this is run infinite times, will I get the proper frequencies? Is there a less-lengthy way of doing this?
No, it's actually slightly off, due to the way rand() works. In particular, rand returns values in the range [0,RAND_MAX]. Hypothetically, assume RAND_MAX were ten. Then rand() would give 0…10, and they'd be mapped (by modulus) to:
0 → 0
1 → 1
2 → 2
3 → 3
4 → 4
5 → 5
6 → 6
7 → 0
8 → 1
9 → 2
10 → 3
Note how 0–3 are more common than 4–6; this is bias in your random number generation. (You're adding 1 as well, but that just shifts it over).
RAND_MAX of course isn't 10, but it's probably not a multiple of 7 (minus 1), either. Most likely it's one less than a power of two. So you'll have some bias.
I suggest using the Boost Random Number Library which can give you a random number generator that yields 1–7 without bias. Look also at bames53's answer using C++11, which is the right way to do this if your code only needs to target C++11 platforms.
Just another way:
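#include <cstdlib>
const float probs[5] = {1/7.0f, 1/7.0f, 2/7.0f, 2/7.0f, 1/7.0f};
// Returns an index in [0, 4] chosen according to the weights in probs.
int rand_M() {
    float sum = 0;
    for (int i = 0; i < 5; i++)
        sum += probs[i]; /* edit */
    float f = (rand()*sum)/RAND_MAX; /* edit */ // scale rand() to [0, sum]
    for (int i = 0; i < 5; i++) {
        if (f <= probs[i]) return i;
        f -= probs[i];
    }
    return 4; // only reached through floating-point round-off
}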
Assuming rand() is good then your code will work with only a very small bias to the lower X numbers, where X is (RAND_MAX + 1) % 7. It's much more likely that you won't get the desired odds due to the quality of the implementation of rand(). If you find that to be the case then you'll want to use an alternative random number generator.
C++11 introduces the header <random> which includes several quality RNGs. Here's an example:
#include <random>
#include <functional>
auto rand = std::bind(std::uniform_int_distribution<int>(1,7),std::mt19937());
Given this, when you call rand() you will get a number from 1 to 7 each with equal probability. (And you can choose different engines for different quality and speed characteristics.) You can then use this to implement the if-else conditions your example currently uses with std::rand(). However, <random> allows you to do even better using one of its non-uniform distributions. In this case what you want is discrete_distribution. This distribution allows you to explicitly state the weights for each value from 0 to n.
// the random number generator
auto _rand = std::bind(std::discrete_distribution<int>{1./7.,1./7.,2./7.,2./7.,1./7.},std::mt19937());
// convert results of RNG from the range [0-4] to [1-5]
auto rand = [&_rand]() { return _rand() +1; };
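As a variation on the same idea, here is a minimal sketch that passes the engine explicitly instead of using std::bind. The function name `weighted_1_to_5` is hypothetical. The weights given to discrete_distribution are relative, so 1, 1, 2, 2, 1 describes the same probabilities as 1/7, 1/7, 2/7, 2/7, 1/7.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: returns 1..5 with probabilities
// 1/7, 1/7, 2/7, 2/7, 1/7 respectively.
int weighted_1_to_5(std::mt19937& engine) {
    // Relative weights; the distribution normalizes them itself.
    std::discrete_distribution<int> dist{1.0, 1.0, 2.0, 2.0, 1.0};
    int value = dist(engine) + 1;  // shift [0, 4] to [1, 5]
    assert(value >= 1 && value <= 5);
    return value;
}
```

Keeping the engine outside the function lets the caller seed it once and reuse it across calls.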
int toohigh = RAND_MAX - RAND_MAX%7;
int random;
do {
random = rand();
} while (random >= toohigh); //should happen ~0.03% of the time
static const int results[7] = {1, 2, 3, 3, 4, 4, 5};
random = results[random%7];
This should give numbers with a distribution as even as rand can handle, and without the big if switch.
Note this does have a theoretically possible infinite loop, but the statistical odds of it staying in the loop for even a few iterations are minuscule. The odds of it staying in the loop twice are quite close to the odds of winning the California Super Lotto Jackpot. Even if every person on the planet got five random numbers, it probably wouldn't stay in the loop three times. (Assuming a perfect RNG.)
rand returns a pseudo-random integral number:
Notice though that this modulo operation does not generate a truly uniformly distributed random number in the span (since in most cases lower numbers are slightly more likely), but it is generally a good approximation for short spans.
Now, regarding the less-lengthy way, you can use switch-case construction, or a series of conditional operators ?: (which will make your code short and unreadable:).

Generating a random integer from a range

I need a function which would generate a random integer in a given range (including boundary values). I don't have unreasonable quality/randomness requirements; I have four requirements:
I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
I need it to be reasonably uniform (use of rand() is perfectly fine).
the minimum-maximum ranges can be anything from <0, 1> to <-32727, 32727>.
it has to be seedable.
I currently have the following C++ code:
output = min + (rand() * (int)(max - min) / RAND_MAX)
The problem is that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++, where RAND_MAX is 32767, that is a 1/32768 chance). This is a major issue for small ranges like <-1, 1>, where the last value is almost never returned.
So I grabbed pen and paper and came up with the following formula (which builds on the (int)(n + 0.5) integer rounding trick):
But it still doesn't give me a uniform distribution. Repeated runs with 10000 samples give me a ratio of 37:50:13 for the values -1, 0, 1.
Is there a better formula? (Or even whole pseudo-random number generator function?)
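To see why the first formula in the question, min + rand() * (max - min) / RAND_MAX, almost never returns max, one can enumerate every possible rand() result. The following is a sketch under the assumption that RAND_MAX is 32767 (as in Visual C++); `scaled_output_counts` is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>

// Hypothetical helper: applies min + r * (max - min) / rand_max
// (integer division) to every r in [0, rand_max] and counts how
// often each output value occurs.
std::map<int, long long> scaled_output_counts(int rand_max, int min, int max) {
    assert(rand_max > 0 && min <= max);
    std::map<int, long long> counts;
    const long long width = static_cast<long long>(max) - min;
    for (long long r = 0; r <= rand_max; ++r) {
        long long out = min + r * width / rand_max;
        ++counts[static_cast<int>(out)];
    }
    return counts;
}
```

For rand_max = 32767 and the range <-1, 1>, this yields 16384 inputs mapping to -1, 16383 mapping to 0, and exactly one (r = 32767) mapping to 1.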
The simplest (and hence best) C++ (using the 2011 standard) answer is:
#include <random>
std::random_device rd; // Only used once to initialise (seed) engine
std::mt19937 rng(rd()); // Random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // Guaranteed unbiased
auto random_integer = uni(rng);
There isn't any need to reinvent the wheel, worry about bias, or worry about using time as the random seed.
A fast solution, somewhat better than yours but still not properly uniformly distributed, is
output = min + (rand() % static_cast<int>(max - min + 1));
Except when the size of the range is a power of 2, this method produces biased, non-uniformly distributed numbers regardless of the quality of rand(). For a comprehensive test of the quality of this method, please read this.
If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high quality uniform_int_distribution which will accept minimum and maximum bounds (inclusive as you need), and you can choose among various random number generators to plug into that distribution.
Here is code that generates ten million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it as you mentioned performance is a major concern for you.
#include <iostream>
#include <random>
#include <chrono>
int main()
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::duration<double> sec;
Clock::time_point t0 = Clock::now();
const int N = 10000000;
typedef std::minstd_rand G; // Select the engine
G g; // Construct the engine
typedef std::uniform_int_distribution<> D; // Select the distribution
D d(-57, 365); // Construct the distribution
int c = 0;
for (int i = 0; i < N; ++i)
c += d(g); // Generate a random number
Clock::time_point t1 = Clock::now();
std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
return c;
}
For me (2.8 GHz Intel Core i5) this prints out:
2.10268e+07 random numbers per second.
You can seed the generator by passing in an int to its constructor:
G g(seed);
If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g., to long long):
typedef std::uniform_int_distribution<long long> D;
If you later find that the minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:
typedef std::mt19937 G; // Now using mersenne_twister_engine
Having separate control over the random number generator and the random distribution can be quite liberating.
I've also computed (not shown) the first four "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:
min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001
(The x_ prefix refers to "expected".)
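The expected values listed above follow from the standard closed-form moments of a discrete uniform distribution on [a, b] with n = b - a + 1 values: mean (a + b) / 2, variance (n^2 - 1) / 12, skewness 0, and excess kurtosis -6(n^2 + 1) / (5(n^2 - 1)). A small sketch, with hypothetical names `UniformMoments` and `discrete_uniform_moments`:

```cpp
#include <cassert>

// Hypothetical struct holding the theoretical moments.
struct UniformMoments {
    double mean;
    double variance;
    double skewness;
    double excess_kurtosis;
};

// Theoretical moments of the discrete uniform distribution on [a, b].
// Requires a < b so that the kurtosis formula does not divide by zero.
UniformMoments discrete_uniform_moments(int a, int b) {
    assert(a < b);
    const double n = static_cast<double>(b) - a + 1.0;
    UniformMoments m;
    m.mean = (static_cast<double>(a) + b) / 2.0;
    m.variance = (n * n - 1.0) / 12.0;
    m.skewness = 0.0;  // the distribution is symmetric
    m.excess_kurtosis = -6.0 * (n * n + 1.0) / (5.0 * (n * n - 1.0));
    return m;
}
```

For a = -57 and b = 365 (n = 423) this gives a mean of 154, a variance of about 14910.67 and an excess kurtosis of about -1.20001, in line with the x_ values above.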
Let's split the problem into two parts:
Generate a random number n in the range 0 through (max-min).
Add min to that number
The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.
It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.
The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since the number of rejected values, (RAND_MAX + 1) % (max-min+1), is always less than (RAND_MAX + 1) / 2 whenever the range fits within rand()'s output, we know that p > 1/2, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.
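The rejection idea described above might look like the sketch below. It assumes std::rand() is uniform over [0, RAND_MAX] and that the requested range has at most RAND_MAX + 1 values; `rand_in_range` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: uniform integer in [min, max] built on std::rand().
int rand_in_range(int min, int max) {
    assert(min <= max);
    const long long span = static_cast<long long>(max) - min + 1;
    const long long total = static_cast<long long>(RAND_MAX) + 1;
    assert(span <= total);  // the range must fit in one rand() result
    // Largest multiple of span that is <= total. Results at or above
    // this limit are the "too large" values and are thrown away.
    const long long limit = total - total % span;
    long long r;
    do {
        r = std::rand();
    } while (r >= limit);
    return static_cast<int>(min + r % span);
}
```

Because the accepted results form a block whose size is an exact multiple of span, every output value corresponds to the same number of rand() results.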
Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.
Use the Mersenne Twister. The Boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects, such as artificial intelligence and evolutionary algorithms.
Here's their example where they make a simple function to roll a six-sided die:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>
boost::mt19937 gen;
int roll_die() {
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
return die();
}
Oh, and here's some more pimping of this generator just in case you aren't convinced you should use it over the vastly inferior rand():
The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.
Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in a 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".
An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.
int RandU(int nMin, int nMax)
{
return nMin + (int)((double)rand() / (RAND_MAX + 1.0) * (nMax-nMin+1)); // 1.0 avoids int overflow when RAND_MAX == INT_MAX
}
This is a mapping of 32768 integers to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note however that if (nMax-nMin+1) is large, the mapping won't work (For example - you can't map 32768 values to 30000 values with equal probability). If such ranges are needed - you should use a 32-bit or 64-bit random source, instead of the 15-bit rand(), or ignore rand() results which are out-of-range.
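The pigeonhole argument above can be made concrete by counting how many source values land in each output bucket. This sketch uses exact integer arithmetic (r * bucket_count / source_count) as a stand-in for the floating-point scaling in RandU; `bucket_size_range` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: maps every source value r in [0, source_count)
// to bucket r * bucket_count / source_count and returns the sizes of
// the smallest and largest bucket.
std::pair<int, int> bucket_size_range(int source_count, int bucket_count) {
    assert(bucket_count > 0 && source_count >= bucket_count);
    std::vector<int> sizes(bucket_count, 0);
    for (long long r = 0; r < source_count; ++r) {
        ++sizes[r * bucket_count / source_count];
    }
    auto mm = std::minmax_element(sizes.begin(), sizes.end());
    return std::make_pair(*mm.first, *mm.second);
}
```

With 32768 source values and 30000 buckets, some buckets receive one value and others two, so some outputs are twice as likely as others. With only 3 buckets the sizes are 10922 and 10923, which is a negligible difference.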
Assume min and max are integer values, [ and ] mean the endpoint is included, and ( and ) mean the endpoint is excluded. The expressions below produce a value in each kind of interval using C++'s rand().
Reference:
For the interval notation, see Interval (mathematics).
For the rand and srand functions and the RAND_MAX macro, see std::rand.
[min, max]
int randNum = rand() % (max - min + 1) + min;
(min, max]
int randNum = rand() % (max - min) + min + 1;
[min, max)
int randNum = rand() % (max - min) + min;
(min, max)
int randNum = rand() % (max - min - 1) + min + 1;
Here is an unbiased version that generates numbers in [low, high]:
int r;
do {
r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;
If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.
I recommend the Boost.Random library. It's super detailed and well-documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.
Notice that in most suggestions the initial random value that you have got from rand() function, which is typically from 0 to RAND_MAX, is simply wasted. You are creating only one random number out of it, while there is a sound procedure that can give you more.
Assume that you want [min,max] region of integer random numbers. We start from [0, max-min]
Take base b=max-min+1
Start from representing a number you got from rand() in base b.
That way you get floor(log(b,RAND_MAX)) random numbers, because each digit in base b, except possibly the last one, represents a random number in the range [0, max-min].
Of course the final shift to [min,max] is simple for each random number r+min.
int n = NUM_DIGIT-1;
while(n >= 0)
{
r[n] = res % b;
res -= r[n];
res /= b;
n--;
}
Here NUM_DIGIT is the number of base-b digits that you can extract, namely
NUM_DIGIT = floor(log(b,RAND_MAX))
The above is then a simple implementation of extracting NUM_DIGIT random numbers from 0 to b-1 out of one random number in [0, RAND_MAX], provided b < RAND_MAX.
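A self-contained version of the digit-extraction loop could look like this. The function name `base_b_digits` is hypothetical, and the input is passed in explicitly rather than taken from rand(). Note as a caveat that the extracted digits are exactly uniform only when b^NUM_DIGIT divides the number of possible input values.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: splits value into num_digits base-b digits,
// stored most significant first (the same order as the loop above).
std::vector<int> base_b_digits(long long value, int b, int num_digits) {
    assert(value >= 0 && b >= 2 && num_digits >= 0);
    std::vector<int> r(num_digits);
    for (int n = num_digits - 1; n >= 0; --n) {
        r[n] = static_cast<int>(value % b);  // lowest remaining digit
        value /= b;                          // drop that digit
    }
    return r;
}
```

For example, extracting three base-10 digits from 1234 gives 2, 3, 4; each digit would then be shifted by min to land in [min, max].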
In answers to this question, rejection sampling was already addressed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.
The algorithm is really simple:
calculate the smallest power of 2 greater than or equal to the interval length
randomize one number in that "new" interval
return that number if it is less than the length of the original interval
reject otherwise
Here's my sample code:
int randInInterval(int min, int max) {
int intervalLen = max - min + 1;
//now calculate the smallest power of 2 that is >= `intervalLen`
int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));
int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"
if (randomNumber < intervalLen)
return min + randomNumber; //ok!
return randInInterval(min, max); //reject sample and try again
}
This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.
PS: Obviously avoiding the recursion would be more efficient (there isn't any need to calculate over and over the log ceiling...), but I thought it was more readable for this example.
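A non-recursive variant of the same idea is sketched below. It swaps in std::mt19937 for rand() and a bit mask for the pow/log2 computation; both are substitutions made here for illustration, not part of the answer above. `rand_in_interval` is a hypothetical name, and 32-bit int is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical iterative variant: masks the engine output down to the
// smallest power-of-2 range covering the interval, then rejects misses.
int rand_in_interval(std::mt19937& engine, int min, int max) {
    assert(min <= max);
    // Interval length minus one, computed without signed overflow.
    const std::uint32_t last = static_cast<std::uint32_t>(max) -
                               static_cast<std::uint32_t>(min);
    // Smear the highest set bit downwards to get a mask of the form 2^k - 1.
    std::uint32_t mask = last;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    for (;;) {
        const std::uint32_t candidate =
            static_cast<std::uint32_t>(engine()) & mask;
        if (candidate <= last) {
            return static_cast<int>(min + static_cast<long long>(candidate));
        }
    }
}
```

The mask is computed once per call, and each pass through the loop accepts with probability greater than 1/2.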
The following is the idea presented by Walter. I wrote a self-contained C++ class that will generate a random integer in the closed interval [low, high]. It requires C++11.
#include <random>
// Returns random integer in closed range [low, high].
class UniformRandomInt {
std::random_device _rd{};
std::mt19937 _gen{_rd()};
std::uniform_int_distribution<int> _dist;
public:
UniformRandomInt() {
set(1, 10);
}
UniformRandomInt(int low, int high) {
set(low, high);
}
// Set the distribution parameters low and high.
void set(int low, int high) {
std::uniform_int_distribution<int>::param_type param(low, high);
_dist.param(param);
}
// Get random integer.
int get() {
return _dist(_gen);
}
};
Example usage:
UniformRandomInt ur;
ur.set(0, 9); // Get random int in closed range [0, 9].
int value = ur.get();
The formula for this is very simple, so try this expression (it is subject to the modulo bias discussed in the other answers):
int num = rand() % (max - min + 1) + min;
// rand() returns an integer between 0 and RAND_MAX, so num lies in [min, max]
The following expression should be unbiased if I am not mistaken:
std::floor( ( max - min + 1.0 ) * rand() ) + min;
I am assuming here that rand() gives you a random value in the range between 0.0 and 1.0 not including 1.0 and that max and min are integers with the condition that min < max.

Random integers c++

I'm trying to produce random integers (uniformly distributed).
I found this snippet on another forum but it works in a very weird way.
srand(time(NULL));
AB=rand() % 10+1;
Using this method I get values in a cycle, so the value increases with every call until it goes down again. I guess this has something to do with using the time as an initializer?
Something like this comes out.
1 3 5 6 9 1 4 5 7 8 1 2 4 6 7.
I would however like to get totally random numbers like
1 9 1 3 8 2 1 7 6 7 5...
Thanks for any help
You should call srand() only once per program.
Also check out the Boost Random Number Library.
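The "seed only once" rule carries over to the C++11 <random> facilities as well. Below is a minimal sketch in which the engine is a function-local static, so it is constructed and seeded exactly once no matter how often the function is called. `random_between` is a hypothetical name.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: uniform integer in the closed range [low, high].
int random_between(int low, int high) {
    assert(low <= high);
    // Constructed and seeded on the first call only.
    static std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist(low, high);
    return dist(engine);
}
```

Calling `random_between(1, 10)` repeatedly then plays the role of rand() % 10 + 1 from the question, without reseeding between calls.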
srand() has to be called once per execution, not once for each rand() call. In addition, some random number generators have poor-quality low-order bits, and there is a bias if you don't discard some of the drawn numbers. A possible workaround for both of those issues, returning a value in [0, n]:
int alea(int n){
assert (0 < n && n <= RAND_MAX);
int partSize =
n == RAND_MAX ? 1 : 1 + (RAND_MAX-n)/(n+1);
int maxUsefull = partSize * n + (partSize-1);
int draw;
do {
draw = rand();
} while (draw > maxUsefull);
return draw/partSize;
}
You can use the Park-Miller "minimal standard" Linear Congruential Generator (LCG): seed = (seed * 16807) mod (2^31 - 1). My implementation is here:
Random integers with g++ 4.4.5
The C language 'srand()' function is used to set the global variable that 'rand()' uses. When you need a single sequence of random numbers, 'rand()' is more than enough, but oftentimes you need several random number generators. For those cases, my advice would be to use C++ and a class like 'rand31pmc'.
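For illustration, a Park-Miller generator with per-object state might be sketched as follows. This is not the 'rand31pmc' class mentioned above; the name `ParkMillerRng` is hypothetical, and 64-bit arithmetic is used for the multiplication so that no overflow tricks are needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class: Park-Miller "minimal standard" LCG,
// state = state * 16807 mod (2^31 - 1). Each object carries its own
// state, so several independent sequences can coexist.
class ParkMillerRng {
    std::uint64_t state_;
public:
    explicit ParkMillerRng(std::uint32_t seed = 1) : state_(seed) {
        // The seed must lie in [1, 2^31 - 2]; 0 would stay 0 forever.
        assert(seed >= 1u && seed < 2147483647u);
    }
    // Advances the state and returns a value in [1, 2^31 - 2].
    std::uint32_t next() {
        state_ = state_ * 16807u % 2147483647u;
        return static_cast<std::uint32_t>(state_);
    }
};
```

Starting from seed 1, the first value is 16807 and the 10000th value is 1043618065, which is the same check value the C++ standard specifies for std::minstd_rand0.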
If you want to generate random numbers in a small range, then you can use the Java library implementation available here:
http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#nextInt%28int%29