C++ RNG (Mersenne Twister) needs seed

I have written an RNG class which holds different algorithms; however, it does not work as expected. Besides the fact that I want to use a normal (rather than uniform) distribution, my code always returns either the same number (max) or just two numbers out of the interval [min, max]:
std::function<int(int, int)> mt19937 =
    [](int min, int max) -> int {
        std::uniform_int_distribution<int> distribution(min, max);
        std::mt19937 engine;
        engine.seed(time(nullptr));
        auto generator = std::bind(distribution, engine);
        return generator();
    };
Can anyone explain to me what is missing to solve this puzzle? Furthermore, how can I implement a normal distribution? Last time I tried std::normal_distribution I was not able to enter bounds!
EDIT: When I speak of a normal distribution I mean that values near the two bounds should be generated less often than values near the mean of both, e.g. as in the graphical representation of the standard Gauss distribution. I am referring to it because it visualizes the probabilities of the resulting values, which is what I want to implement/use.

The normal distribution is obtained by applying a transformation to a uniform random number x. But I see something that could be problematic:
std::uniform_int_distribution<int> distribution(min, max);
Isn't this giving your number generator an int type?
To fix the seeding problem, create your engine outside of the lambda and seed it when you create it.
An RNG uses an algorithm that produces numbers that appear random but have a very large period of repetition (a highlight of the Mersenne Twister). When you seed, you give the RNG an initial value to start the process with. Each time you ask for another number, it spits out another iteration of the algorithm.
When you seed on every call:
time(NULL)
this value changes only once per second, so when you request a new random number, the seed (and therefore the first output of a freshly seeded engine) changes only once per second.
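A minimal sketch of the fix (helper names are mine; assumes C++17 for std::clamp): the engine is created and seeded once, and a bell-shaped draw over [min, max] can be approximated with std::normal_distribution by centring the mean between the bounds and clamping rare outliers. The 3-sigma choice for sigma is an assumption, not the only option:

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Created and seeded ONCE; every call advances the same engine state.
std::mt19937& engine() {
    static std::mt19937 e{std::random_device{}()};
    return e;
}

// Uniform draw in [min, max].
int uniformInt(int min, int max) {
    std::uniform_int_distribution<int> dist(min, max);
    return dist(engine());
}

// Bell-shaped draw in [min, max]: mean midway between the bounds,
// sigma chosen so that ~99.7% of raw draws fall inside the interval
// (the 3-sigma rule, an assumption rather than a requirement),
// with rare outliers clamped to the bounds.
int normalInt(int min, int max) {
    std::normal_distribution<double> dist(0.5 * (min + max), (max - min) / 6.0);
    return std::clamp(static_cast<int>(std::lround(dist(engine()))), min, max);
}
```

With the engine seeded once, successive calls return different values, and normalInt clusters its results around the middle of the interval, as the EDIT in the question describes.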

Related

Forking a random number generator deterministically?

I'm using std::mt19937 to produce deterministic random numbers. I'd like to pass it to functions so I can control their source of randomness. I could do int foo(std::mt19937& rng);, but I want to call foo and bar in parallel, so that won't work. Even if I put the generation function behind a mutex (so each call to operator() did std::lock_guard lock(mutex); return rng();), calling foo and bar in parallel wouldn't be deterministic due to the race on the mutex.
I feel like conceptually I should be able to do this:
auto fooRNG = std::mt19937(rng()); // Seed a RNG with the output of `rng`.
auto barRNG = std::mt19937(rng());
parallel_invoke([&] { fooResult = foo(fooRNG); },
                [&] { barResult = bar(barRNG); });
where I "fork" rng into two new ones with different seeds. Since fooRNG and barRNG are seeded deterministically, they should be random and independent.
Is this general gist viable?
Is this particular implementation sufficient (I doubt it)?
Extended question: Suppose I want to call baz(int n, std::mt19937&) massively in parallel over a range of indexed values, something like
auto seed = rng();
parallel_for(range(0, 1 << 20),
             [&](int i) {
                 auto thisRNG = std::mt19937(seed ^ i); // Deterministically set up RNGs in parallel?
                 baz(i, thisRNG);
             });
something like that should work, right? That is, provided we give it enough bits of state?
Update:
Looking into std::seed_seq, it looks(?) like it's designed to turn not-so-random seeds into high-quality seeds: How to properly initialize a C++11 std::seed_seq
So maybe what I want is something like
std::mt19937 fork(std::mt19937& rng) {
    return std::mt19937(std::seed_seq({rng()}));
}
or more generally:
//! Represents a state that can be used to generate multiple
//! distinct deterministic child std::mt19937 instances.
class rng_fork {
    std::mt19937::result_type m_seed;
public:
    rng_fork(std::mt19937& rng) : m_seed(rng()) {}

    // Copy is explicit b/c I think it's a correctness footgun:
    explicit rng_fork(const rng_fork&) = default;

    //! Make the ith fork: a deterministic but well-seeded
    //! RNG based off the internal seed and the given index:
    std::mt19937 ith_fork(std::mt19937::result_type index) const {
        return std::mt19937(std::seed_seq({m_seed, index}));
    }
};
then the initial examples would become
auto fooRNG = fork(rng);
auto barRNG = fork(rng);
parallel_invoke([&] { fooResult = foo(fooRNG); },
                [&] { barResult = bar(barRNG); });
and
auto fork_point = rng_fork{rng};
parallel_for(range(0, 1 << 20),
             [&](int i) {
                 auto thisRNG = fork_point.ith_fork(i); // Deterministically set up a RNG in parallel.
                 baz(i, thisRNG);
             });
Is that correct usage of std::seed_seq?
I am aware of 3 ways to seed multiple parallel pseudo random number generators (PRNGs):
First option
Given a seed, initialize the first instance of the PRNG with seed, the second with seed+1, etc. The thing to be aware of here is that the state of the PRNGs will be initially very close in case the seed is not hashed. Some PRNGs will take a long time to diverge. See e.g. this blog post for more information.
For std::mt19937 specifically, however, this was never an issue in my tests because the initial seed is not taken as is but instead gets "mangled/hashed" (compare the documentation of the result_type constructor). So it seems to be a viable option in practice.
However, notice that there are some potential pitfalls when seeding a Mersenne Twister (which has an internal state of 624 32-bit integers) with a single 32 bit integer. For example, the first number can never be 7 or 13. See this blog post for more information. But if you do not rely on the randomness of only the first few drawn numbers but draw a more reasonable number of numbers from each PRNG, it is probably fine.
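As a sketch of this first option (the helper name is mine, not from the question), the parallel engines are simply seeded seed, seed+1, seed+2, and so on:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// First option: engines seeded with seed, seed+1, seed+2, ...
// std::mt19937's constructor mangles/hashes the raw seed internally,
// so the initial states are not as close as the raw seeds suggest.
std::vector<std::mt19937> make_engines(std::uint32_t seed, std::size_t n) {
    std::vector<std::mt19937> engines;
    engines.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        engines.emplace_back(seed + static_cast<std::uint32_t>(i));
    return engines;
}
```

Because each engine is seeded deterministically, rebuilding the vector from the same base seed reproduces the same streams.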
Second option
Without std::seed_seq:
Seed one "parent" PRNG. Then, to initialize N parallel PRNGs, draw N random numbers from it and use them as seeds. This is your initial idea, where you draw two random numbers with rng() and initialize the two std::mt19937 instances:
std::mt19937& rng = ...;
auto fooRNG = std::mt19937(rng()); // Seed a RNG with the output of `rng`.
auto barRNG = std::mt19937(rng());
The major issue to look out for here is the birthday problem. It essentially states that the probability of drawing the same number twice is higher than you would intuitively think. Given a type of PRNG that has a value range of b (i.e. b different values can appear in its output), the probability p(t) of drawing the same number twice among t draws can be estimated as:
p(t) ~ t^2 / (2b) for t^2 << b
(compare this post). If we stretch the estimate "a bit", just to show the basic issue:
For a PRNG producing a 16 bit integer, we have b = 2^16. Drawing 256 numbers results in a 50% chance of drawing the same number twice according to that formula. For a 32 bit PRNG (such as std::mt19937) we need to draw 65536 numbers, and for a 64 bit integer PRNG we need to draw ~4e9 numbers, to reach the 50%. Of course, this is all an estimate, so you want to draw several orders of magnitude fewer numbers than that. Also see this blog post for more information.
In case of seeding the parallel std::mt19937 instances with this method (32 bit output and input!), that means you probably do not want to draw more than a hundred or so random numbers. Otherwise, you have a realistic chance of drawing the same seed twice. Of course, you could ensure that you do not draw the same seed twice by keeping a list of already used seeds. Or use std::mt19937_64.
Additionally, there are still the potential pitfalls mentioned above regarding the seeding of a Mersenne Twister with 32 bit numbers.
With seed sequence:
The idea of std::seed_seq is to take some numbers, "mix them" and then provide them as input to the PRNG so that it can initialize its state. Since the 32 bit Mersenne Twister has a state of 624 32-bit integers, you should provide that many numbers to the seed sequence for theoretically optimal results. That way you get b=2^(624*32), meaning that you avoid the birthday problem for all practical purposes.
But in your example
std::mt19937 fork(std::mt19937& rng) {
    return std::mt19937(std::seed_seq({rng()}));
}
you provide only a single 32 bit integer. This effectively means that you hash that 32 bit number before putting it into std::mt19937. So you do not gain anything regarding the birthday problem. And the additional hashing is unnecessary because std::mt19937 already does something like this.
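If you do want the full-state seeding described above, a fork helper could instead draw std::mt19937::state_size (624) words from the parent and feed them all to the seed sequence (a sketch; the helper name is mine):

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Fork a child by drawing a full state's worth of 32-bit words
// (std::mt19937::state_size == 624) from the parent, so the
// effective seed space is 2^(624*32) rather than 2^32.
std::mt19937 fork_full_state(std::mt19937& parent) {
    std::vector<std::uint32_t> words(std::mt19937::state_size);
    std::generate(words.begin(), words.end(), std::ref(parent));
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}
```

The fork is still deterministic: two parents in the same state produce identical children.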
std::seed_seq itself is somewhat flawed, see this blog post. But I guess for practical purposes it does not really matter. A supposedly better alternative exists, but I have no experience with it.
Third option
Some PRNG algorithms such as PCG or xoshiro256++ allow jumping ahead over a large number of outputs quickly. For example, xoshiro256++ has a period of 2^256 - 1 before it repeats itself, and it allows jumping ahead by 2^128 (or alternatively 2^192) numbers. So the idea is: seed the first PRNG, then create a copy of it and jump ahead by 2^128 numbers, then create a copy of that second one and jump ahead again by 2^128, etc. Each instance then works in a slice of length 2^128 of the total range of 2^256, and the slices are stochastically independent. This elegantly bypasses the problems of the methods above.
The standard PRNGs do have a discard(z) method to jump z values ahead. However, it is not guaranteed that the jumping will be fast. I don't know whether std::mt19937 implements fast jumping in all standard library implementations. (As far as I know, the Mersenne Twister algorithm itself does allow this in principle.)
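For illustration, discard(z) is defined to behave exactly like drawing and throwing away z values, whether or not a given implementation does it faster than that:

```cpp
#include <random>

// discard(z) is specified to be equivalent to calling the engine
// z times and ignoring the results; implementations may or may not
// perform the jump faster than that.
bool discard_matches_drawing(unsigned z) {
    std::mt19937 a(42), b(42);   // two engines with identical state
    b.discard(z);                // jump b ahead by z values
    for (unsigned i = 0; i < z; ++i)
        (void)a();               // draw-and-ignore z values from a
    return a() == b();           // the streams are back in sync
}
```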
Additional note
I found PRNGs to be surprisingly difficult to use "right". It really depends on the use case how careful you need to be and what method to choose. Think about the worst thing that could happen in your case if something goes wrong, and invest an according amount of time in researching the topic.
For ordinary scientific simulations where you require only a few dozen parallel instances of std::mt19937, I'd guess that the first and second option (without seed sequence) are both viable. But if you need several hundred or more, you should think about it more carefully.

Where should I put my random number generation to get random results?

I have a nested system as described in the pseudocode below (part of a random weighted majority algorithm):
function1() {
    // for 100 iterations:
    function2()
    // grab logistics
}

function2() {
    // create a random seed/generator
    random_device rd;
    mt19937 gen(rd());
    // for 1000 iterations:
    function3(gen);
}

function3(gen) {
    // grab a number from uniform_real_distribution using gen,
    // then use that number against differing weights, such that a
    // higher weight gets more territory in the uniform distribution
    // for its desired outcome. That is, in a system with weights
    // (1, 1/2) distributed over a uniform distribution (0, 1), the
    // outcome of weight 1 happens if the dist lands in (0, .6666) and
    // the outcome of weight 1/2 if it lands in (.6666, 1).
}
In the above example, uniform_real_distribution generates what appear to be random numbers, but function1 always ends up with the exact same result in every iteration, even though the other two functions are supposed to be random. Even worse, if I change the generator from mt19937 to ranlux48, the system still gets the exact same result each iteration, but that result differs from the one mt19937 produced, which means everything I'm doing is not random, only dependent on the generator.
I need guidance on how to fix this such that I have truly random results.
Where should I place gen and rd? Should I even use a uniform real distribution?
If I create gen in function3 every time it is called, I also still get non-random results; in fact, uniform_real_distribution generates the exact same value every time.
Although you have only shown pseudocode, it appears that you are creating a new random device and generator every time you call the function. This is needlessly expensive, and more importantly, every time you call the function, you'll get the same results from the random generator. The simplest modification to your pseudocode would be to make the generator static, like this:
function2() {
    // create a random seed/generator ONCE only
    static random_device rd;
    static mt19937 gen(rd());
    // work
}
As user4581301 pointed out in the comments, I was using an old version of MinGW in which random_device was broken, producing the same seed every time. Consequently, because of the scope of my random generator, I would start on the same seed and get the same result every time. It's not actually a program issue, but a compiler one, which is why I was confused!

C++ need a good technique for seeding rand() that does not use time()

I have a bash script that starts many client processes. These are AI game players that I'm using to test a game with many players, on the order of 400 connections.
The problem I'm having is that the AI player uses
srand( time(nullptr) );
But if all the players start at approximately the same time, they will frequently receive the same time() value, which will mean that they are all on the same rand() sequence.
Part of the testing process is to ensure that if lots of clients try to connect at approximately the same time, the server can handle it.
I had considered using something like
srand( (int) this );
Or similar, banking on the idea that each instance has a unique memory address.
Is there another better way?
Use a random seed for a pseudorandom generator.
std::random_device is expensive random data (expensive as in slow).
You use it to seed a PRNG algorithm; mt19937 is the last PRNG algorithm you will ever need.
You can optionally feed the result through a distribution if your needs require it, i.e. if you need values in a certain range other than what the generator provides.
std::random_device rd;
std::mt19937 generator(rd());
These days rand() and srand() are obsolete.
The generally accepted method is to seed a pseudo random number generator from the std::random_device. On platforms that provide non-deterministic random sources the std::random_device is required to use them to provide high quality random numbers.
However it can be slow or even block while gathering enough entropy. For this reason it is generally only used to provide the seed.
A high quality but efficient random engine is the Mersenne Twister provided by the standard library:
inline
std::mt19937& random_generator()
{
    thread_local static std::mt19937 mt{std::random_device{}()};
    return mt;
}

template<typename Number>
Number random_number(Number from, Number to)
{
    static_assert(std::is_integral<Number>::value || std::is_floating_point<Number>::value,
        "Parameters must be integer or floating point numbers");

    using Distribution = typename std::conditional
    <
        std::is_integral<Number>::value,
        std::uniform_int_distribution<Number>,
        std::uniform_real_distribution<Number>
    >::type;

    thread_local static Distribution dist;
    return dist(random_generator(), typename Distribution::param_type{from, to});
}
You use a random number seed if and only if you want reproducible results. This can be handy for things like map generation where you want the map to be randomized, but you want it to be predictably random based on the seed.
For most cases you don't want that, you want actually random numbers, and the best way to do that is through the Standard Library generator functions:
#include <random>
std::random_device rd;
std::map<int, int> hist;
std::uniform_int_distribution<int> dist(0, 5);
int random_die_roll = dist(rd);
No seed is required nor recommended in this case. The "random device" draws directly from the system's entropy source, ensuring unpredictable results.
Again, DO NOT use srand(time(NULL)) because it's a very old, very bad method for initializing random numbers and it's highly predictable. Spinning through a million possible seeds to find matching output is trivial on modern computers.
I'm trying to seed the random function with errno:
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* srand() takes an unsigned int, so the pointers need casts. */
    srand((unsigned int)(uintptr_t)&errno);
    srand((unsigned int)(uintptr_t)strerror(0));
    return rand();
}

C++ random numbers are always the same

I am currently stuck at generating random numbers during runtime. In Java, I just call Math.random() and I'm pretty much done (I just need a simple RNG). In C++, I have tried several ways to generate random numbers and always end up getting the same.
Currently, I am using the following method to get a random number between MIN and MAX:
unsigned int getRandomNumber(int min, int max) {
    std::mt19937 mt(1729);
    std::uniform_int_distribution<int> dist(min, max);
    return dist(mt);
}
I have an object that calls this function in its constructor and assigns the value returned to an attribute. I currently create five instances of this object and the random number is always the same. Setting a big range (1 - 1000) does not change this. The number is always the same. Security is not a concern, it is an extremely simple application.
A random number generator works with a seed. Basically, it's a number that's set only once for the random number generator to work with. If you re-seed your random number generator each time you try to generate a number, you will get the same number every time. You should create the std::mt19937 object only once.
unsigned int getRandomNumber(int min, int max) {
    static std::mt19937 mt(1729);
    std::uniform_int_distribution<int> dist(min, max);
    return dist(mt);
}
Making mt static will cause it to only be instantiated once, which means it will only be constructed once, which means it will only be seeded once. Even with this fix, you'll still get the same series of numbers each time you run the program, but they'll be different each time you call getRandomNumber in one single execution.
A much better solution would be to instantiate the mt variable elsewhere, and pass it in to this function as a parameter, that way you could manage how it is seeded with more code than just a constructor call. Typically you would seed with a value based on time. Lots of insight here.
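A sketch of that suggestion (the signature is mine): the caller owns the engine, seeds it once, and passes it in, so the seeding policy lives in one place instead of being hard-coded into the function:

```cpp
#include <random>

// The engine is owned by the caller; this function only draws from it.
// How the engine is seeded (fixed seed, std::random_device, etc.) is
// the caller's decision.
unsigned int getRandomNumber(std::mt19937& mt, int min, int max) {
    std::uniform_int_distribution<int> dist(min, max);
    return dist(mt);
}
```

The caller then does something like `std::mt19937 mt{std::random_device{}()};` once and passes `mt` into every call, so repeated calls continue one random sequence instead of restarting it.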

Questions with using boost for generating normal random numbers

I was hoping to learn how to generate numbers from a normal distribution in C++ when I saw This Post. It gives a very good example, but I am still not sure what the & in boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > var_nor(rng, nd); means. What effect would it produce if I did not include this &?
Also, when reading the tutorial on Boost's official website, I found that after generating a distribution object with boost::random::uniform_int_distribution<> dist(1, 6), they were able to directly generate random numbers with it by calling dist(gen) (gen here is the random engine), without invoking the variate_generator object. Of course, this is for generating uniform random numbers, but I am curious whether I can do the same with the normal distribution, as an alternative to calling variate_generator?
Short background information
One approach to generating random numbers with a specific distribution is to generate uniformly distributed random numbers from the interval [0, 1), for example, and then apply some maths to shape them into the desired distribution. So you have two objects: one generator for random numbers from [0, 1), and one distribution object, which takes uniformly distributed random numbers and spits out random numbers in the desired (e.g. the normal) distribution.
Why passing the generator by reference
The var_nor object in your code couples the generator rng with the normal distribution nd. You have to pass your generator by reference, which is the & in the template argument. This is essential, because the random number generator has an internal state from which it computes the next (pseudo-)random number. If you did not pass the generator by reference, you would create a copy of it, which might lead to code that always creates the same random number. See this blog post for an example.
Why the variate_generator is necessary
Now to the part, why not to use the distribution directly with the generator. If you try the following code
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>

int main()
{
    boost::mt19937 generator;
    boost::normal_distribution<> distribution(0.0, 1.0);

    // WARNING: THIS DOES NOT WORK AS MIGHT BE EXPECTED!!
    for (int i = 0; i < 100; ++i)
        std::cout << distribution(generator) << std::endl;

    return 0;
}
you will see that it outputs only NaNs (I've tested it with Boost 1.46). The reason is that the Mersenne Twister returns uniformly distributed integer random numbers, whereas most (probably even all) continuous distributions require floating point random numbers from the range [0, 1). The example given in the Boost documentation works because uniform_int_distribution is a discrete distribution and thus can deal with integer RNGs.
Note: I have not tried the code with a newer version of Boost. Of course, it would be nice if the compiler threw an error if a discrete RNG were used together with a continuous distribution.
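For comparison, the C++11 standard library version of this pairing does work directly, because the std:: distributions adapt the engine's integer output to the floating-point range they need internally (a sketch of the standard-library behaviour, not a statement about newer Boost versions):

```cpp
#include <cmath>
#include <random>

// Draw n standard-normal values directly from the engine and report
// whether any came out NaN, which is what the old Boost pairing above
// produced. With std::normal_distribution this should never happen.
bool produces_nan(int n) {
    std::mt19937 generator(42);
    std::normal_distribution<double> distribution(0.0, 1.0);
    for (int i = 0; i < n; ++i)
        if (std::isnan(distribution(generator)))
            return true;
    return false;
}
```

No variate_generator is involved, and since the distribution takes the engine by reference at each call, the reference-vs-copy trap does not arise either.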