I am trying to use mersenne twister to generate samples from various distribution. I have one generator and it is used to generate all of them. Something strange (to me at least) happens here. On one hand calculating the correlation coefficient of the various samples gives me almost zero, which seems nice. But when I change a parameter of one distribution (which is used nowhere else), it somehow also changes the results I get in others. Specifically:
#include <boost/random.hpp>
using namespace boost; // boost random library for random generators
mt19937 generator(7687); // mersenne twister random number generator, seed = 7687
double normal_sample(double mu, double sigma)
// returns a sample from normal distribution with mean mu and variance sigma
{
normal_distribution<> norm_dist;
variate_generator<mt19937&, normal_distribution<> > norm_rnd(generator, norm_dist);
return(mu + sigma * norm_rnd());
}
double poisson_sample(double intensity)
// returns a number of points in a realization of a Poisson point process
{
poisson_distribution<> poiss_dist(intensity);
variate_generator<mt19937&, poisson_distribution<> > poiss_rnd(generator, poiss_dist);
return(poiss_rnd());
}
This is the code...the generator part, then I draw from those two distributions, changing the parameter called intensity. This changes not only the Poisson sample, but the normal one as well...actually, now that I think of it, it kind of makes sense, because my Poisson sample determines a number of points that are also randomly generated using the same generator...so then then depending on how many of them there are, I get something else, because the normal sample is generated using different numbers in the sequence. Is that correct?
If so, how would one go about changing that? Should I use multiple generators?
It probably means that depending on the parameters fewer or more random samples are extracted from the mersenne twister.
This logically implies that all other results are shifted, making all other outcomes different.
[...] it kind of makes sense, because my Poisson sample determines a number of points that are also randomly generated using the same generator...so then then depending on how many of them there are, I get something else, because the normal sample is generated using different numbers in the sequence. Is that correct?
Seems to me you got it figured out already, yes.
If you wanted repeatable PRNG, use separate PRNG states, i.e. different mersenne egnines.
Related
Inspired from this and the similar questions, I want to learn how does mt19937 pseudo-number generator in C++11 behaves, when in two separate machines, it is seeded with the same input.
In other words, say we have the following code;
std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dest{0, 10000};
int randNumber = dist(gen);
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time ?
And in either case, why this is the case ?
A further question:
Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?
The generator will generate the same values.
The distributions may not, at least with different compilers or library versions. The standard did not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.
Barring library/compiler changes, that will return the same values in the same sequence. But if you care write your own distribution.
...
All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not crytographically secure PRNGs.
So it being a problem if you run for months will depend on specific details of what you'd find to be a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. But assume attackers can predict its future behaviour from past evidence.
Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?
RNG we deal with with standard C++ are called pseudo-random RNGs. By definition, this is pure computational device, with multi-bit state (you could think about state as large bit vector) and three functions:
state seed2state(seed);
state next_state(state);
uint(32|64)_t state2output(state);
and that is it. Obviously, state has finite size, 19937 bits in case of MT19937, so total number of states are 219937 and thus MT19937 next_state() function is a periodic one, with max period no more than 219937. This number is really HUGE, and most likely more than enough for typical simulation
But output is at max 64 bits, so output space is 264. It means that during large run any particular output appears quite a few times. What matters is when not only some 64bit number appears again, but number after that, and after that and after that - this is when you know RNG period is reached.
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
Generators are defined rather strictly, and you'll get the same bit stream. For example for MT19937 from C++ standard (https://timsong-cpp.github.io/cppwp/rand)
class mersenne_twister_engine {
...
static constexpr result_type default_seed = 5489u;
...
and function seed2state described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6)
Effects: Constructs a mersenne_twister_engine object. Sets X−n to value mod 2w. Then, iteratively for i=−n,…,−1, sets Xi to ...
Function next_state is described as well together with test value at 10000th invocation. Standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3)
using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;
3
#Required behavior: The 10000th consecutive invocation of a default-constructed object
of type mt19937 shall produce the value 4123659995.
Big four compilers (GCC, Clang, VC++, Intel C++) I used produced same MT19937 output.
Distributions, from the other hand, are not specified that well, and therefore vary between compilers and libraries. If you need portable distributions you either roll your own or use something from Boost or similar libraries
Any pseudo RNG which takes a seed will give you the same sequence for the same seed every time, on every machine. This happens since the generator is just a (complex) mathematical function, and has nothing actually random about it. Most times when you want to randomize, you take the seed from the system clock, which constantly changes so each run will be different.
It is useful to have the same sequence in computer games for example when you have a randomly generated world and want to generate the exact same one, or to avoid people cheating using save games in a game with random chances.
Inspired from this and the similar questions, I want to learn how does mt19937 pseudo-number generator in C++11 behaves, when in two separate machines, it is seeded with the same input.
In other words, say we have the following code;
std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dest{0, 10000};
int randNumber = dist(gen);
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time ?
And in either case, why this is the case ?
A further question:
Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?
The generator will generate the same values.
The distributions may not, at least with different compilers or library versions. The standard did not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.
Barring library/compiler changes, that will return the same values in the same sequence. But if you care write your own distribution.
...
All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not crytographically secure PRNGs.
So it being a problem if you run for months will depend on specific details of what you'd find to be a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. But assume attackers can predict its future behaviour from past evidence.
Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?
RNG we deal with with standard C++ are called pseudo-random RNGs. By definition, this is pure computational device, with multi-bit state (you could think about state as large bit vector) and three functions:
state seed2state(seed);
state next_state(state);
uint(32|64)_t state2output(state);
and that is it. Obviously, state has finite size, 19937 bits in case of MT19937, so total number of states are 219937 and thus MT19937 next_state() function is a periodic one, with max period no more than 219937. This number is really HUGE, and most likely more than enough for typical simulation
But output is at max 64 bits, so output space is 264. It means that during large run any particular output appears quite a few times. What matters is when not only some 64bit number appears again, but number after that, and after that and after that - this is when you know RNG period is reached.
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
Generators are defined rather strictly, and you'll get the same bit stream. For example for MT19937 from C++ standard (https://timsong-cpp.github.io/cppwp/rand)
class mersenne_twister_engine {
...
static constexpr result_type default_seed = 5489u;
...
and function seed2state described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6)
Effects: Constructs a mersenne_twister_engine object. Sets X−n to value mod 2w. Then, iteratively for i=−n,…,−1, sets Xi to ...
Function next_state is described as well together with test value at 10000th invocation. Standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3)
using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;
3
#Required behavior: The 10000th consecutive invocation of a default-constructed object
of type mt19937 shall produce the value 4123659995.
Big four compilers (GCC, Clang, VC++, Intel C++) I used produced same MT19937 output.
Distributions, from the other hand, are not specified that well, and therefore vary between compilers and libraries. If you need portable distributions you either roll your own or use something from Boost or similar libraries
Any pseudo RNG which takes a seed will give you the same sequence for the same seed every time, on every machine. This happens since the generator is just a (complex) mathematical function, and has nothing actually random about it. Most times when you want to randomize, you take the seed from the system clock, which constantly changes so each run will be different.
It is useful to have the same sequence in computer games for example when you have a randomly generated world and want to generate the exact same one, or to avoid people cheating using save games in a game with random chances.
Inspired from this and the similar questions, I want to learn how does mt19937 pseudo-number generator in C++11 behaves, when in two separate machines, it is seeded with the same input.
In other words, say we have the following code;
std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dest{0, 10000};
int randNumber = dist(gen);
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time ?
And in either case, why this is the case ?
A further question:
Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?
The generator will generate the same values.
The distributions may not, at least with different compilers or library versions. The standard did not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.
Barring library/compiler changes, that will return the same values in the same sequence. But if you care write your own distribution.
...
All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not crytographically secure PRNGs.
So it being a problem if you run for months will depend on specific details of what you'd find to be a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. But assume attackers can predict its future behaviour from past evidence.
Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?
RNG we deal with with standard C++ are called pseudo-random RNGs. By definition, this is pure computational device, with multi-bit state (you could think about state as large bit vector) and three functions:
state seed2state(seed);
state next_state(state);
uint(32|64)_t state2output(state);
and that is it. Obviously, state has finite size, 19937 bits in case of MT19937, so total number of states are 219937 and thus MT19937 next_state() function is a periodic one, with max period no more than 219937. This number is really HUGE, and most likely more than enough for typical simulation
But output is at max 64 bits, so output space is 264. It means that during large run any particular output appears quite a few times. What matters is when not only some 64bit number appears again, but number after that, and after that and after that - this is when you know RNG period is reached.
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
Generators are defined rather strictly, and you'll get the same bit stream. For example for MT19937 from C++ standard (https://timsong-cpp.github.io/cppwp/rand)
class mersenne_twister_engine {
...
static constexpr result_type default_seed = 5489u;
...
and function seed2state described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6)
Effects: Constructs a mersenne_twister_engine object. Sets X−n to value mod 2w. Then, iteratively for i=−n,…,−1, sets Xi to ...
Function next_state is described as well together with test value at 10000th invocation. Standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3)
using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;
3
#Required behavior: The 10000th consecutive invocation of a default-constructed object
of type mt19937 shall produce the value 4123659995.
Big four compilers (GCC, Clang, VC++, Intel C++) I used produced same MT19937 output.
Distributions, from the other hand, are not specified that well, and therefore vary between compilers and libraries. If you need portable distributions you either roll your own or use something from Boost or similar libraries
Any pseudo RNG which takes a seed will give you the same sequence for the same seed every time, on every machine. This happens since the generator is just a (complex) mathematical function, and has nothing actually random about it. Most times when you want to randomize, you take the seed from the system clock, which constantly changes so each run will be different.
It is useful to have the same sequence in computer games for example when you have a randomly generated world and want to generate the exact same one, or to avoid people cheating using save games in a game with random chances.
I was hoping to learning how to generate numbers from normal distribution in C++ when I saw This Post. It gives a very good example, but still I am not sure what the & in boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > var_nor(rng, nd); means. What effect will it produce if I did not include this & here?
Also, when reading the tutorial on Boost's official website, I found that after generating a distribution object with boost::random::uniform_int_distribution<> dist(1, 6), they were able to directly generate random numbers with it by calling dist(gen)(gen here is the random engine), without invoking the "variate_generator" object. Of course, this is for generating uniform random numbers, but I am curious if I can do the same with normal distribution, as an alternative way to calling "variate_generator"?
Short background information
One approach to generate random numbers with a specific distribution, is to generate uniformly distributed random numbers from the interval [0, 1), for example, and then apply some maths on these numbers to shape them into the desired distribution. So you have two objects: one generator for random numbers from [0, 1) and one distribution object, which
takes uniformly distributed random numbers and spits out random numbers in the desired (e.g. the normal) distribution.
Why passing the generator by reference
The var_nor object in your code couples the generator rnd with the normal distribution nd. You have to pass your generator via reference, which is the & in the template argument. This is really essential, because the random number generator has an internal state from which it computes the next (pseudo-)random number. If you would not pass the generator via reference, you would create a copy of it and this might lead to code, which always creates the same random number. See this blog post as an example.
Why the variate_generator is necessary
Now to the part, why not to use the distribution directly with the generator. If you try the following code
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>
int main()
{
boost::mt19937 generator;
boost::normal_distribution<> distribution(0.0, 1.0);
// WARNING: THIS DOES NOT WORK AS MIGHT BE EXPECTED!!
for (int i = 0; i < 100; ++i)
std::cout << distribution(generator) << std::endl;
return 0;
}
you will see, that it outputs NaNs only (I've tested it with Boost 1.46). The reason is that the Mersenne twister returns a uniformly distributed integer random number. However, most (probably even all) continuous distributions require floating point random numbers from the range [0, 1). The example given in the Boost documentation works because uniform_int_distribution is a discrete distribution and thus can deal with integer RNGs.
Note: I have not tried the code with a newer version of Boost. Of course, it would be nice if the compiler threw an error if a discrete RNG is used together with a continuous distributuon.
I need to generate quickly lots of random numbers from binomial distributions for dramatically different trial sizes (most, however, will be small). I was hoping not to have to code an algorithm by hand (see, e.g., this related discussion from November), because I'm a novice programmer and don't like reinventing wheels. It appears Boost does not supply a generator for binomially distributed variates, but TR1 and GSL do. Is there a good reason to choose one over the other, or is it better that I write something customized to my situation? I don't know if this makes sense, but I'll alternate between generating numbers from uniform distributions and binomial distributions throughout the program, and I'd like for them to share the same seed and to minimize overhead. I'd love some advice or examples for what I should be considering.
Boost 1.43 appears to support binomial distributions. You can use boost::variate_generator to connect your source of randomness to the type
of distribution you want to sample from.
So your code might look something like this (Disclaimer: not tested!):
boost::mt19937 rng; // produces randomness out of thin air
// see pseudo-random number generators
const int n = 20;
const double p = 0.5;
boost::binomial<> my_binomial(n,p); // binomial distribution with n=20, p=0.5
// see random number distributions
boost::variate_generator<boost::mt19937&, boost::binomial<> >
next_value(rng, my_binomial); // glues randomness with mapping
int x = next_value(); // simulate flipping a fair coin 20 times
You misunderstand the Boost model - you choose a random number generator type and then a distribution on which to spread the values the RNG produces over. There's a very simple example in this answer, which uses a uniform distribution, but other distributions use the same basic pattern - the generator and the distribution are completely decoupled.