Random Number Generation : same C++ code, two different behaviors - c++

My colleague and I are working on a Monte Carlo project together, in C++. She uses Visual Studio, I use Xcode, we shared the code through git.
We are computing American option prices using a method that requires random number generation. We realized we were getting wrong results for a certain parameter K (the higher the parameter, the more wrong the answer), and my colleague found that changing the random source from Mersenne Twister to rand() (though a poor generator) made the results good for the whole range of K.
But when I changed the source on my version of the code, it did nothing.
More puzzling for me, I created a new Xcode project, copied inside it her whole source and it still gives me wrong results (while she gets good ones). So it can't stem from the code itself.
I cleaned the project, relaunched Xcode, even restarted my computer (...), but nothing changes: our projects behave consistently but differently, with the same code behind them.
(EDIT: by differently but consistently, I don't mean that we don't have the same sequence of numbers. I mean that her Monte Carlo estimator converges toward 4. and mine towards 3.)
Do you have any idea of what the cause of this dual behavior could be ?
Here is the random generation code :
double loiuniforme() //uniform law
{
return (double)((float)rand() / (float)RAND_MAX);
}
vector<double> loinormale() //normal law
{
vector<double> loinormales(2, 0.);
double u1 = loiuniforme();
double v1 = loiuniforme();
loinormales[0] = sqrt(-2 * log(u1))*cos(2 * M_PI*v1);
loinormales[1] = sqrt(-2 * log(u1))*sin(2 * M_PI*v1);
return(loinormales);
}
EDIT : the MT RNG used before was :
double loiuniforme()
{
mt19937::result_type seed = clock();
auto real_rand = std::bind(std::uniform_real_distribution<double>(0,1), mt19937(seed));
return real_rand();
}

The C++ standard does not specify what algorithm is used by rand(). Whoever wrote the compiler is free to use whatever implementation they want, and there is no guarantee that it will behave the same on two different compilers, on two different architectures or even two different versions of the same compiler.

You should only create one generator and use that for every number.
mt19937::result_type seed = clock();
and
mt19937(seed)
create a new generator, with a new seed, every time you call the function.
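Calls made within the same clock tick get the same seed and therefore return exactly the same number, and even when the seed does change you only ever see the first output of a freshly seeded engine, which is not how the generator is meant to be used.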
You can use static variables in the function, since these are initialised on the first call:
double loiuniforme()
{
static std::mt19937 generator(clock());
static std::uniform_real_distribution<double> distribution(0, 1);
return distribution(generator);
}
(When you're comparing results with your colleague, use the same hardcoded seed to verify that you are getting the same results.)
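To make the difference concrete, here is a minimal sketch of the two patterns. The names `uniform_reseeded` and `uniform_shared` and the seed `12345u` are made up for illustration; the seed is passed in explicitly to stand in for a `clock()` value that has not changed between two calls.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: builds a fresh engine from `seed` on every call,
// mirroring the per-call pattern in the question (which seeded from clock()).
double uniform_reseeded(std::mt19937::result_type seed)
{
    std::mt19937 generator(seed);
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(generator);
}

// One engine for the whole program, with a hardcoded seed so that two
// people running the same build draw from the same stream.
double uniform_shared()
{
    static std::mt19937 generator(12345u); // arbitrary fixed seed (assumption)
    static std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(generator);
}
```

Two calls to `uniform_reseeded` with the same seed return the same value, whereas successive calls to `uniform_shared` walk through the engine's sequence.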

You need to seed the rand function with the same number on both computers. And even then I'm not sure that the underlying code across computers and operating systems will return the same value.
More importantly, if you want identical results, don't use a random function.

Related

Is the seed of the mersenne_twister_engine instance invariant? [duplicate]

Inspired by this and similar questions, I want to learn how the mt19937 pseudo-random number generator in C++11 behaves when it is seeded with the same input on two separate machines.
In other words, say we have the following code:
std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dist{0, 10000};
int randNumber = dist(gen);
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time ?
And in either case, why this is the case ?
A further question:
Regardless of the seed, will this code generate random numbers indefinitely? For example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the numbers or in their uniformity?
The generator will generate the same values.
The distributions may not, at least with different compilers or library versions. The standard did not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.
Barring library/compiler changes, that will return the same values in the same sequence. But if you care, write your own distribution.
...
All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not cryptographically secure PRNGs.
So whether running for months is a problem depends on the specific details of what you would consider a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. But assume attackers can predict its future behaviour from past evidence.
Regardless of the seed, will this code generate random numbers indefinitely? For example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the numbers or in their uniformity?
The RNGs we deal with in standard C++ are pseudo-random number generators. By definition, such a generator is a purely computational device with a multi-bit state (you can think of the state as a large bit vector) and three functions:
state seed2state(seed);
state next_state(state);
uint(32|64)_t state2output(state);
and that is it. Obviously, the state has a finite size, 19937 bits in the case of MT19937, so the total number of states is 2^19937 and thus the MT19937 next_state() function is periodic, with a period of no more than 2^19937. This number is really HUGE, and most likely more than enough for a typical simulation.
But the output is at most 64 bits, so the output space is 2^64. This means that during a long run any particular output appears quite a few times. What matters is not that some 64-bit number appears again, but that the number after it, and the one after that, and so on also repeat: that is when you know the RNG period has been reached.
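The state/output distinction can be sketched with a deliberately tiny generator. Everything here is a toy assumed for illustration (a 16-bit linear congruential state with multiplier 5 and increment 1, exposing only its top 8 bits); it is not how MT19937 works internally, it only mirrors the three-function structure described above.

```cpp
#include <cassert>
#include <cstdint>

// Toy generator, for illustration only: 16-bit state, 8-bit output.
using State = std::uint16_t;

State seed_to_state(std::uint16_t seed) { return seed; }

// Linear congruential step modulo 2^16 (the cast wraps around).
State next_state(State s) { return static_cast<State>(5u * s + 1u); }

// Only the top 8 bits of the state are exposed as output.
std::uint8_t state_to_output(State s) { return static_cast<std::uint8_t>(s >> 8); }

// Number of steps until the state returns to its starting value.
std::uint32_t period(std::uint16_t seed)
{
    const State start = seed_to_state(seed);
    State s = start;
    std::uint32_t steps = 0;
    do {
        s = next_state(s);
        ++steps;
    } while (s != start);
    return steps;
}

// How many times `value` is output during one full period.
std::uint32_t occurrences(std::uint16_t seed, std::uint8_t value)
{
    State s = seed_to_state(seed);
    const std::uint32_t n = period(seed);
    std::uint32_t count = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        s = next_state(s);
        if (state_to_output(s) == value) ++count;
    }
    return count;
}
```

With these constants the state cycles through all 2^16 values before repeating, while each 8-bit output shows up 256 times along the way: a repeated output says nothing about the period, only a repeated state does.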
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
Generators are defined rather strictly, and you'll get the same bit stream. For example for MT19937 from C++ standard (https://timsong-cpp.github.io/cppwp/rand)
class mersenne_twister_engine {
...
static constexpr result_type default_seed = 5489u;
...
and the function seed2state is described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6)
Effects: Constructs a mersenne_twister_engine object. Sets X_{-n} to value mod 2^w. Then, iteratively for i = 1-n, …, -1, sets X_i to ...
The function next_state is described as well, together with a test value at the 10000th invocation. The standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3)
using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;
Required behavior: The 10000th consecutive invocation of a default-constructed object of type mt19937 shall produce the value 4123659995.
The big four compilers (GCC, Clang, VC++, Intel C++) I used produced the same MT19937 output.
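The required behavior quoted above is easy to check on your own toolchain. A minimal sketch (the function name `mt19937_conforms` is made up here):

```cpp
#include <cassert>
#include <random>

// Returns true if the library's mt19937 meets the required behavior
// quoted from the standard.
bool mt19937_conforms()
{
    std::mt19937 gen;             // default-constructed, i.e. seeded with 5489u
    gen.discard(9999);            // skip the first 9999 invocations
    return gen() == 4123659995u;  // value of the 10000th invocation
}
```

If this returns false on some platform, the engine itself is not conforming and none of the portability reasoning about the raw bit stream applies there.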
Distributions, on the other hand, are not specified that well, and therefore vary between compilers and libraries. If you need portable distributions, you either roll your own or use something from Boost or similar libraries.
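As a rough idea of what "rolling your own" can look like, here is a sketch that builds a uniform integer and a uniform double directly from the raw mt19937 output. The function names are hypothetical, and the double version assumes IEEE 754 doubles and accepts only 32 bits of resolution; treat it as a starting point rather than a vetted replacement for the standard distributions.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Uniform integer in [lo, hi] (requires lo <= hi), built only from raw
// engine output. Rejection sampling removes the modulo bias.
std::uint32_t portable_uniform_int(std::mt19937& gen, std::uint32_t lo, std::uint32_t hi)
{
    const std::uint64_t range = static_cast<std::uint64_t>(hi) - lo + 1; // 1 .. 2^32
    const std::uint64_t span = std::uint64_t{1} << 32;                   // number of raw outputs
    const std::uint64_t limit = span - span % range;                     // largest multiple of range
    std::uint64_t x;
    do {
        x = gen() & 0xFFFFFFFFu; // result_type may be wider than 32 bits
    } while (x >= limit);
    return lo + static_cast<std::uint32_t>(x % range);
}

// Uniform double in [0, 1) with 32 bits of resolution.
// Dividing by a power of two is exact, so no rounding differences creep in.
double portable_uniform_01(std::mt19937& gen)
{
    return static_cast<double>(gen() & 0xFFFFFFFFu) / 4294967296.0;
}
```

Because only integer arithmetic and one exact division are applied to the engine's output, two machines that agree on the mt19937 stream should also agree on the values produced here.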
Any pseudo RNG which takes a seed will give you the same sequence for the same seed every time, on every machine. This happens since the generator is just a (complex) mathematical function, and has nothing actually random about it. Most times when you want to randomize, you take the seed from the system clock, which constantly changes so each run will be different.
It is useful to have the same sequence in computer games for example when you have a randomly generated world and want to generate the exact same one, or to avoid people cheating using save games in a game with random chances.

Boost vs. .Net random number generators

I developed the same algorithm (Baum-Welch for estimating parameters of a hidden Markov model) both in F# (.Net) and C++. In both cases I developed the same test that generates random test data with known distribution and then uses the algorithm to estimate the parameters, and makes sure it converges to the known right answer.
The problem is that the test works fine in the F# case, but fails to converge in the C++ implementation. I compared both algorithms on some real-world data and they give the same results, so my guess is that the generation of the test data is broken in the C++ case. Hence my question: What is the random number generator that comes with .Net 4 (I think this is the default version with VS2010)?
In F# I am using:
let random = new Random()
let randomNormal () = //for a standard normal random variable
    let u1 = random.NextDouble()
    let u2 = random.NextDouble()
    let r = sqrt (-2. * (log u1))
    let theta = 2. * System.Math.PI * u2
    r * (sin theta)
//random.NextDouble() for uniform random variable on [0-1]
In C++ I use the standard Boost classes:
class HmmGenerator
{
public:
    HmmGenerator() :
        rng(37), //the seed does change the result, but it doesn't make it work
        normalGenerator(rng, boost::normal_distribution<>(0.0, 1.0)),
        uniformGenerator(rng, boost::uniform_01<>()) {}//other stuff here as well
private:
    boost::mt19937 rng;
    boost::variate_generator<boost::mt19937&,
        boost::normal_distribution<> > normalGenerator;
    boost::variate_generator<boost::mt19937&,
        boost::uniform_01<> > uniformGenerator;
};
Should I expect different results using these two ways of generating random numbers?
EDIT: Also, is the generator used in .Net available in Boost (ideally with the same parameters), so I could run it in C++ and compare the outcomes?
Hence my question: What is the random number generator that comes with .Net 4 (I think this is the default version with VS2010)?
From the documentation on Random
The current implementation of the Random class is based on Donald E. Knuth's subtractive random number generator algorithm. For more information, see D. E. Knuth. "The Art of Computer Programming, volume 2: Seminumerical Algorithms". Addison-Wesley, Reading, MA, second edition, 1981.
Should I expect different results using these two ways of generating random numbers?
The Mersenne-Twister algorithm you're using in C++ is considered very respectable, compared to other off-the-shelf random generators.
I suspect any discrepancy in your code lies elsewhere.
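One way to rule the generator in or out is to check the moments of the generated test data directly. The sketch below is an assumption-laden illustration, not the asker's code: it uses std::mt19937 instead of Boost, a hand-rolled uniform on the open interval (0, 1), and the same Box-Muller formula as the F# snippet; all function names are invented here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Uniform double strictly inside (0, 1), so that log(u) stays finite.
double uniform_open01(std::mt19937& gen)
{
    return (static_cast<double>(gen() & 0xFFFFFFFFu) + 0.5) / 4294967296.0;
}

// One standard normal sample via Box-Muller (same formula as the F# code).
double box_muller(std::mt19937& gen)
{
    const double pi = 3.14159265358979323846;
    const double u1 = uniform_open01(gen);
    const double u2 = uniform_open01(gen);
    return std::sqrt(-2.0 * std::log(u1)) * std::sin(2.0 * pi * u2);
}

struct Moments { double mean; double variance; };

// Sample mean and (biased) sample variance of n Box-Muller draws.
Moments sample_moments(std::mt19937& gen, std::size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = box_muller(gen);
        sum += x;
        sum_sq += x * x;
    }
    const double mean = sum / static_cast<double>(n);
    return {mean, sum_sq / static_cast<double>(n) - mean * mean};
}
```

If the mean is close to 0 and the variance close to 1 for a few hundred thousand samples, the normal generator is probably fine and the difference between the two implementations is more likely in how the test data is assembled or consumed.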

How to eliminate all sources of randomness so that program always gives identical answers?

I have C++ code that relies heavily on sampling (using rand()), but I want it to be reproducible. So in the beginning, I initialize srand() with a random seed and print that seed out. I want others to be able to run the same code again but initializing srand() with that same seed and get exactly the same answer as I did.
But under what circumstances is that guaranteed? I suppose that works only if the binaries are compiled with the same compiler on the same system? What are other factors that might make the answer differ from the one I got initially?
The solution is to use the same code in all cases - the Boost random number library is infinitely better than any C++ standard library implementation, and you can use the same code on all platforms. Take a look at this question for an example of its use and links to the library docs.
You're correct that the sequences might be different if compiled on different machines with different rand implementations. The best way to get around this is to write your own PRNG. The Linux man page for srand gives the following simple example (quoted from the POSIX standard):
POSIX.1-2001 gives the following example of an implementation of rand() and srand(), possibly useful when one needs the same sequence on two different machines.
static unsigned long next = 1;
/* RAND_MAX assumed to be 32767 */
int myrand(void) {
next = next * 1103515245 + 12345;
return((unsigned)(next/65536) % 32768);
}
void mysrand(unsigned seed) {
next = seed;
}
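A variant of the same idea with the state width pinned down may be easier to reason about across platforms, since `unsigned long` is 32 bits on some systems and 64 bits on others. This is a sketch only; the names `my_srand`, `my_rand` and `next_value` are chosen here, and the constants are the ones from the POSIX example.

```cpp
#include <cassert>
#include <cstdint>

// State kept in exactly 32 bits, so the wrap-around is the same everywhere.
static std::uint32_t next_value = 1;

void my_srand(std::uint32_t seed) { next_value = seed; }

// Returns a value in [0, 32767], following the POSIX example.
int my_rand()
{
    next_value = next_value * 1103515245u + 12345u;
    return static_cast<int>((next_value / 65536u) % 32768u);
}
```

Seeding with the same value replays the same sequence; with seed 1 the first value returned is 16838.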
To avoid this kind of problem, write your own implementation of rand()! I'm no expert on random-number generation algorithms, so I'll say no more than that...
Check out implementation of rand(), and use one of the random number generators from there - which ensures repeatability no matter what platform you run on.