For testing purposes I need to seed PCG's C++ implementation (the 64-bit output one) with a set value. The examples I've looked at only show it being seeded from entropy.
I've used
pcg64 rng(42);
and it has worked: rng() generates the same numbers every time. But PCG64 uses a 256-bit seed, and seeding it this way seems to generate the same numbers for every seed value above the 64-bit integer limit.
What's the best way to seed it with known values?
Reading the source code, pcg64 is an instantiation of an engine template class whose constructor accepts a pcg128_t seed value. So the seed is only a 128-bit value, not 256-bit.
There are two ways to pass a 128-bit seed to the constructor. First, if you already have two pre-defined 64-bit values, you can use the PCG_128BIT_CONSTANT(hi, lo) macro to build the 128-bit value and write:
pcg64 rng{PCG_128BIT_CONSTANT(0xC7C8709C9626D159ULL, 0x675BB824D76E9146ULL)};
The second way is to use a std::mt19937_64 generator, optionally seeded from std::random_device:
#include <random>
....
std::mt19937_64 seed{0xC7C8709C9626D159ULL};
// or, for a non-deterministic seed:
// std::mt19937_64 seed{std::random_device{}()};
pcg64 rng{PCG_128BIT_CONSTANT(seed(), seed())};
You can also use std::seed_seq for the same purpose that std::mt19937_64 served above:
#include <random>
#include <array>
#include <cstdint>
....
std::seed_seq seed{1, 2, 3, 4, 5};
std::array<uint32_t, 4> seeds{};
seed.generate(seeds.begin(), seeds.end());
pcg64 rng{PCG_128BIT_CONSTANT(((uint64_t(seeds[0]) << 32) | seeds[1]),
((uint64_t(seeds[2]) << 32) | seeds[3]))};
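Whichever way you build the 128-bit seed, a quick way to confirm reproducibility is to construct two engines from the same constant and compare their outputs. A minimal sketch, assuming the PCG C++ library's pcg_random.hpp header (which provides pcg64 and PCG_128BIT_CONSTANT):
#include "pcg_random.hpp"
#include <cassert>
....
// Two engines seeded with the same 128-bit constant produce identical streams.
pcg64 a{PCG_128BIT_CONSTANT(0xC7C8709C9626D159ULL, 0x675BB824D76E9146ULL)};
pcg64 b{PCG_128BIT_CONSTANT(0xC7C8709C9626D159ULL, 0x675BB824D76E9146ULL)};
for (int i = 0; i < 1000; ++i)
    assert(a() == b());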
Related
I'm trying to generate a random number using the rand() command, but each time I get very similar numbers.
This is my code:
#include <iostream>
#include <time.h>
using namespace std;
int main()
{
srand(time(0));
cout << rand();
return 0;
}
I ran it 5 times and the numbers I got are:
21767
21806
21836
21862
21888
How can I make the numbers more different?
From the documentation of rand:
There are no guarantees as to the quality of the random sequence produced. In the past, some implementations of rand() have had serious shortcomings in the randomness, distribution and period of the sequence produced (in one well-known example, the low-order bit simply alternated between 1 and 0 between calls).
rand() is not recommended for serious random-number generation needs. It is recommended to use C++11's random number generation facilities to replace rand().
It (and I) recommend using the newer C++11 random number generators in <random>.
In your specific case it seems you want a std::uniform_int_distribution. An example, as given on the linked page, is:
std::random_device rd; //Will be used to obtain a seed for the random number engine
std::mt19937 gen(rd()); //Standard mersenne_twister_engine seeded with rd()
std::uniform_int_distribution<> distrib(1, RAND_MAX);
std::cout << distrib(gen) << '\n';
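Applied to the original program, a complete replacement might look like the sketch below; the [1, RAND_MAX] range is kept only to mirror rand(), so substitute whatever range you actually need:
#include <cstdlib>   // only for RAND_MAX, to mirror the original range
#include <iostream>
#include <random>

int main()
{
    std::random_device rd;                                // non-deterministic seed source
    std::mt19937 gen(rd());                               // Mersenne Twister engine, freshly seeded each run
    std::uniform_int_distribution<> distrib(1, RAND_MAX); // maps engine output into [1, RAND_MAX]
    std::cout << distrib(gen) << '\n';
    return 0;
}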
Those three lines for generating a random number look a bit tricky, and it is hard to always remember them. Could someone please shed some light on them to make them easier to understand?
#include <random>
#include <iostream>
int main()
{
    std::random_device rd;  //1st line: Will be used to obtain a seed for the random number engine
    std::mt19937 gen(rd()); //2nd line: Standard mersenne_twister_engine seeded with rd()
    std::uniform_int_distribution<> dis(1, 6);
    for (int n=0; n<10; ++n)
        std::cout << dis(gen) << ' '; //3rd line: Use dis to transform the random unsigned int generated by gen into an int in [1, 6]
    std::cout << '\n';
}
Here are some questions I can think of:
1st line of code:
random_device is a class, as described by its documentation, so this line declares an object rd? If so, why does the 2nd line pass rd() to construct mt19937 instead of using the object rd itself (without parentheses)?
3rd line of code:
Why do we call the uniform_int_distribution<> object as dis(...)? Is dis a function? Why do we pass the gen object into dis()?
random_device is slow but genuinely random; it's used to generate the 'seed' for the random number sequence.
mt19937 is fast but only 'pseudo random'. It needs a 'seed' to start generating a sequence of numbers. That seed can be random (as in your example) so you get a different sequence of random numbers each time. But it could be a constant, so you get the same sequence of numbers each time.
uniform_int_distribution is a way of mapping random numbers (which could have any values) to the numbers you're actually interested in, in this case a uniform distribution of integers from 1 to 6.
As is often the case with OO programming, this code is about division of responsibilities. Each class contributes a small piece to the overall requirement (the generation of dice rolls). If you wanted to do something different it's easy because you've got all the pieces in front of you.
If this is too much then all you need to do is write a function to capture the overall effect, for instance
int dice_roll()
{
    static std::random_device rd;
    static std::mt19937 gen(rd());
    static std::uniform_int_distribution<> dis(1, 6);
    return dis(gen);
}
dis is an example of a function object, or functor. It's an object which overloads operator() so it can be called as if it were a function.
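For illustration only, here is a hypothetical, stripped-down functor in the same spirit; the real std::uniform_int_distribution does considerably more work to produce an unbiased uniform result:
// Hypothetical toy distribution-like functor (biased; for illustration only).
struct toy_int_distribution {
    int lo, hi;
    toy_int_distribution(int l, int h) : lo(l), hi(h) {}

    template <typename Engine>
    int operator()(Engine& eng) {                             // overloading operator() makes the object callable
        return lo + static_cast<int>(eng() % (hi - lo + 1));  // naive mapping, has modulo bias
    }
};
With that in mind, dis(gen) is simply shorthand for dis.operator()(gen): the distribution object is handed an engine, draws raw numbers from it, and maps them into the requested range.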
std::random_device rd; // create access to truly random numbers
std::mt19937 gen{rd()}; // create pseudo random generator.
// initialize its seed to truly random number.
std::uniform_int_distribution<> dis{1, 6}; // define distribution
...
auto x = dis(gen); // generate pseudo random number from `gen`
// and transform its result to the desired distribution `dis`.
I need to generate pseudo-random numbers in the 0 : 23 range. I'm trying this:
#include <iostream>
#include <cstdlib>
#include <random>
#include <ctime>
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(0,23);
unsigned int random;
random = distribution(generator);
My problem is: every time I run my program, the first three random numbers are 0, 3, 18.
How can I solve this, and why does this happen?
Remember the P stands for "pseudo"!
A PRNG takes a seed to start generation of a pseudo random number sequence from. Since you don't provide it yourself, std::default_random_engine uses the same seed when default constructed. So you get the same sequence every time.
One possible and easy way to seed it, is to employ a std::random_device as a source for a little entropy:
std::random_device r;
std::default_random_engine generator(r());
If possible, r will produce a non-deterministic number. Otherwise, it too will be a PRNG, so you aren't any worse off. It's not the best scheme, but it should get you started.
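Put together with the original 0 : 23 range, a complete version might look like this sketch:
#include <iostream>
#include <random>

int main()
{
    std::random_device r;                                    // entropy source for the seed
    std::default_random_engine generator(r());               // differently seeded on every run
    std::uniform_int_distribution<int> distribution(0, 23);  // the desired 0..23 range

    for (int i = 0; i < 3; ++i)
        std::cout << distribution(generator) << '\n';        // the first three numbers now vary between runs
    return 0;
}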
In my current C++11 project I need to perform M simulations. For each simulation m = 1, ..., M, I randomly generate a data set by using a std::mt19937 object, constructed as follows:
std::mt19937 generator(m);
DatasetFactory dsf(generator);
According to https://stackoverflow.com/a/15509942/1849221 and https://stackoverflow.com/a/14924350/1849221, the Mersenne Twister PRNG benefits from a warm-up phase, which is currently absent in my code. For convenience, here is the proposed snippet of code:
#include <algorithm>
#include <cstdint>
#include <functional>
#include <iterator>
#include <random>

std::mt19937 get_prng() {
    std::uint_least32_t seed_data[std::mt19937::state_size];
    std::random_device r;
    std::generate_n(seed_data, std::mt19937::state_size, std::ref(r));
    std::seed_seq q(std::begin(seed_data), std::end(seed_data));
    return std::mt19937{q};
}
The problem in my case is that I need reproducibility of results, i.e., across different executions the data set has to be the same for each simulation. That's why, in my current solution, I use the current simulation index m to seed the Mersenne Twister PRNG. It seems to me that using std::random_device prevents the data from being the same (AFAIK, that is exactly its purpose).
EDIT: by different executions I mean re-launching the executable.
How can I introduce the afore-mentioned warm up phase in my code without affecting reproducibility? Thanks.
Possible solution #1
Here's a tentative implementation based on the second proposal by #SteveJessop
#include <algorithm>
#include <cstdint>
#include <functional>
#include <iterator>
#include <random>

std::mt19937 get_generator(unsigned int seed) {
    std::minstd_rand0 lc_generator(seed);
    std::uint_least32_t seed_data[std::mt19937::state_size];
    std::generate_n(seed_data, std::mt19937::state_size, std::ref(lc_generator));
    std::seed_seq q(std::begin(seed_data), std::end(seed_data));
    return std::mt19937{q};
}
Possible solution #2
Here's a tentative implementation based on the joint contribution by #SteveJessop and #AndréNeve. The sha256 function is adapted from https://stackoverflow.com/a/10632725/1849221
#include <openssl/sha.h>
#include <sstream>
#include <iomanip>
#include <string>
#include <random>

std::string sha256(const std::string str) {
    unsigned char hash[SHA256_DIGEST_LENGTH];
    SHA256_CTX sha256;
    SHA256_Init(&sha256);
    SHA256_Update(&sha256, str.c_str(), str.size());
    SHA256_Final(hash, &sha256);
    std::stringstream ss;
    for(int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        ss << std::hex << std::setw(2) << std::setfill('0') << (int)hash[i];
    return ss.str();
}

std::mt19937 get_generator(unsigned int seed) {
    std::string seed_str = sha256(std::to_string(seed));
    std::seed_seq q(seed_str.begin(), seed_str.end());
    return std::mt19937{q};
}
Compile with: -I/opt/ssl/include/ -L/opt/ssl/lib/ -lcrypto
Two options:
Option 1: Follow the proposal you have, but instead of using std::random_device r; to generate your seed sequence for MT, use a different PRNG seeded with m. Choose one that doesn't suffer like MT does from needing a warmup when used with small seed data: I suspect an LCG will probably do. For massive overkill, you could even use a PRNG based on a secure hash. This is a lot like "key stretching" in cryptography, if you've heard of that. You could in fact use a standard key stretching algorithm, but you're using it to generate a long seed sequence rather than large key material.
Option 2: Continue using m to seed your MT, but discard a large constant amount of data before starting the simulation. That is to say, ignore the advice to use a strong seed and instead run the MT long enough for it to reach a decent internal state. I don't know off-hand how much data you need to discard, but I expect the internet does. A short sketch of this option follows below.
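A minimal sketch of the second option; the discard count used here is an arbitrary placeholder, not a recommendation from this answer:
#include <random>

std::mt19937 get_warmed_up_prng(unsigned int m) {
    std::mt19937 gen(m);  // reproducible: the same m always yields the same initial state
    gen.discard(10000);   // burn a fixed, constant number of outputs as warmup (placeholder amount)
    return gen;
}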
I think that you only need to store the initial seed (in your case the std::uint_least32_t seed_data[std::mt19937::state_size] array) and the number n of warmup steps you made (e.g. using discard(n) as mentioned) for each run/simulation you wish to reproduce.
With this information, you can always create a new MT instance, seed it with the previous seed_data and run it for the same n warmup steps. This will generate the same sequence of values onwards since the MT instance will have the same inner state when the warmup ends.
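For illustration, a sketch of rebuilding an identical engine from a stored seed array and warmup count (the names stored_seed and n_warmup are placeholders):
#include <cstdint>
#include <iterator>
#include <random>

std::mt19937 rebuild(const std::uint_least32_t (&stored_seed)[std::mt19937::state_size],
                     unsigned long long n_warmup) {
    std::seed_seq q(std::begin(stored_seed), std::end(stored_seed));
    std::mt19937 gen{q};    // same stored seed data -> same initial state
    gen.discard(n_warmup);  // replay the same warmup -> same state afterwards
    return gen;
}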
When you mention the std::random_device affecting reproducibility, I believe that in your code it is simply being used to generate the seed data. If you were using it as the source of random numbers itself, then you would not be able to have reproducible results. Since you are using it only to generate the seed there shouldn't be any problem. You just can't generate a new seed every time if you want to reproduce values!
From the definition of std::random_device:
"std::random_device is a uniformly-distributed integer random number generator that produces non-deterministic random numbers."
So if it's not deterministic, you cannot reproduce the sequence of values produced by it. That being said, use it simply to generate good random seeds, and store them afterwards for the re-runs.
Hope this helps
EDIT:
After discussing with #SteveJessop, we arrived at the conclusion that a simple hash of the dataset (or part of it) would be sufficient to be used as a decent seed for the purpose you need. This allows for a deterministic way of generating the same seeds every time you run your simulations. As mentioned by #Steve, you will have to guarantee that the size of the hash isn't too small compared with std::mt19937::state_size. If it is too small, then you can concatenate the hashes of m, m+M, m+2M, ... until you have enough data, as he suggested.
I am posting the updated answer here as the idea of using a hash was mine, but I will upvote #SteveJessop's answer because he contributed to it.
A comment on one of the answers you link to indicates:
Coincidentally, the default C++11 seed_seq is the Mersenne Twister warmup sequence (although the existing implementations, libc++'s mt19937 for example, use a simpler warmup when a single-value seed is provided)
So you may be able to use your current fixed seeds with std::seed_seq to do the warm-up for you.
std::mt19937 get_prng(int seed) {
std::seed_seq q{seed, maybe, some, extra, fixed, values};
return std::mt19937{q};
}
I've read that many pseudo-random number generators require many samples in order to be "warmed up". Is that the case when using std::random_device to seed std::mt19937, or can we expect that it's ready after construction? The code in question:
#include <random>
std::random_device rd;
std::mt19937 gen(rd());
Mersenne Twister is a shift-register based pRNG (pseudo-random number generator) and is therefore subject to bad seeds with long runs of 0s or 1s that lead to relatively predictable results until the internal state is mixed up enough.
However the constructor which takes a single value uses a complicated function on that seed value which is designed to minimize the likelihood of producing such 'bad' states. There's a second way to initialize mt19937 where you directly set the internal state, via an object conforming to the SeedSequence concept. It's this second method of initialization where you may need to be concerned about choosing a 'good' state or doing warmup.
The standard includes an object conforming to the SeedSequence concept, called seed_seq. seed_seq takes an arbitrary number of input seed values, and then performs certain operations on these values in order to produce a sequence of different values suitable for directly setting the internal state of a pRNG.
Here's an example of loading up a seed sequence with enough random data to fill the entire std::mt19937 state:
std::array<int, 624> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 eng(seq);
This ensures that the entire state is randomized. Also, each engine specifies how much data it reads from the seed sequence, so you may want to read the docs to find that info for whatever engine you use.
Although here I load up the seed_seq entirely from std::random_device, seed_seq is specified such that just a few numbers that aren't particularly random should work well. For example:
std::seed_seq seq{1, 2, 3, 4, 5};
std::mt19937 eng(seq);
In the comments below Cubbi indicates that seed_seq works by performing a warmup sequence for you.
Here's what should be your 'default' for seeding:
std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 rng(seed);
If you seed with just one 32-bit value, all you will ever get is one of the same 2^32 trajectories through state-space. If you use a PRNG with KiBs of state, then you should probably seed all of it. As described in the comments to #bames63' answer, using std::seed_seq is probably not a good idea if you want to init the whole state with random numbers. Sadly, std::random_device does not conform to the SeedSequence concept, but you can write a wrapper that does:
#include <random>
#include <iostream>
#include <algorithm>
#include <functional>
class random_device_wrapper {
    std::random_device *m_dev;

public:
    using result_type = std::random_device::result_type;

    explicit random_device_wrapper(std::random_device &dev) : m_dev(&dev) {}

    template <typename RandomAccessIterator>
    void generate(RandomAccessIterator first, RandomAccessIterator last) {
        std::generate(first, last, std::ref(*m_dev));
    }
};

int main() {
    auto rd = std::random_device{};
    auto seedseq = random_device_wrapper{rd};
    auto mt = std::mt19937{seedseq};
    for (auto i = 100; i; --i)
        std::cout << mt() << std::endl;
}
This works at least until you enable concepts. Depending on whether your compiler knows about SeedSequence as a C++20 concept, it may fail to work because we're supplying only the missing generate() method, nothing else. In duck-typed template programming, that code is sufficient, though, because the PRNG does not store the seed sequence object.
I believe there are situations where MT can be seeded "poorly" which results in non-optimal sequences. If I remember correctly, seeding with all zeroes is one such case. I would recommend you try to use the WELL generators if this is a serious issue for you. I believe they are more flexible - the quality of the seed does not matter as much. (Perhaps to answer your question more directly: it's probably more efficient to focus on seeding well as opposed to seeding poorly then trying to generate a bunch of samples to get the generator to an optimal state.)