Repeated initial sequence of random generator - c++

I need to generate pseudo-random numbers in the 0 to 23 range. I'm trying this:
#include <iostream>
#include <cstdlib>
#include <random>
#include <ctime>
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(0,23);
unsigned int random;
random = distribution(generator);
My problem is: every time I run my program, the first three random numbers are 0, 3, 18.
How can I solve this, and why does this happen?

Remember the P stands for "pseudo"!
A PRNG takes a seed to start generation of a pseudo random number sequence from. Since you don't provide it yourself, std::default_random_engine uses the same seed when default constructed. So you get the same sequence every time.
One easy way to seed it is to employ a std::random_device as a source of a little entropy:
std::random_device r;
std::default_random_engine generator(r());
If possible, r will produce a non-deterministic number. Otherwise, it too will be a PRNG, so you are no worse off. It's not the best scheme, but it should get you started.
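The difference between a default-constructed engine and a seeded one can be checked directly. The following is only a minimal sketch: the helper names first_values and make_seeded_engine are made up for this illustration and do not come from the question.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the question): draws `count` values in
// the 0..23 range from whatever engine is passed in.
template <class Engine>
std::vector<int> first_values(Engine& engine, std::size_t count) {
    std::uniform_int_distribution<int> distribution(0, 23);
    std::vector<int> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(distribution(engine));
    }
    return values;
}

// Hypothetical helper: builds an engine seeded from std::random_device,
// following the two-line snippet above.
inline std::default_random_engine make_seeded_engine() {
    std::random_device r;
    return std::default_random_engine(r());
}
```

Two default-constructed engines passed to first_values return identical vectors, which is the behaviour described in the question. Engines obtained from make_seeded_engine will usually differ from run to run, provided std::random_device is non-deterministic on the platform.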

Related

How would I seed the PCG RNG with a chosen, set seed?

For testing purposes I need to seed PCG's C++ implementation (the 64 bit output one) with a set value. When I look at the examples I only see it seeding using entropy.
I've used
pcg64 rng(42);
and it's worked, rng() generating the same numbers every time, but PCG64 uses a 256-bit seed and this way seems to generate the same numbers for each value above the 64 bit integer limit.
What's the best way to seed it with known values?
As I read the source code, pcg64 is an instantiation of the engine template class, which accepts a pcg128_t seed value in its constructor. So the seed is only a 128-bit value, not a 256-bit one.
There are two ways to pass a 128-bit seed to the constructor. The first applies if you already have two pre-defined 64-bit values: use the PCG_128BIT_CONSTANT(hi, lo) macro to build the 128-bit value, and write:
pcg64 rng{PCG_128BIT_CONSTANT(0xC7C8709C9626D159ULL, 0x675BB824D76E9146ULL)};
The second way is to use a std::mt19937_64 random generator, and maybe std::random_device to initialize it:
#include <cstdint>
#include <random>
....
std::mt19937_64 seed{0xC7C8709C9626D159ULL};
// or
// std::mt19937_64 seed{std::random_device{}()};
// Draw the two halves in separate statements so their order is well defined.
const std::uint64_t hi = seed();
const std::uint64_t lo = seed();
pcg64 rng{PCG_128BIT_CONSTANT(hi, lo)};
You can also use std::seed_seq for the same purpose as std::mt19937_64 above:
#include <random>
#include <array>
#include <cstdint>
....
std::seed_seq seed{1, 2, 3, 4, 5};
std::array<uint32_t, 4> seeds{};
seed.generate(seeds.begin(), seeds.end());
pcg64 rng{PCG_128BIT_CONSTANT(((uint64_t(seeds[0]) << 32) | seeds[1]),
((uint64_t(seeds[2]) << 32) | seeds[3]))};

Cpp random number with rand returns very similar numbers

I'm trying to generate a random number using rand(), but each time I get very similar numbers.
This is my code:
#include <iostream>
#include <cstdlib>
#include <time.h>
using namespace std;
int main()
{
srand(time(0));
cout << rand();
return 0;
}
I ran it 5 times and the numbers I got are:
21767
21806
21836
21862
21888
How can I make the numbers differ more?
From the documentation of rand:
There are no guarantees as to the quality of the random sequence produced. In the past, some implementations of rand() have had serious shortcomings in the randomness, distribution and period of the sequence produced (in one well-known example, the low-order bit simply alternated between 1 and 0 between calls).
rand() is not recommended for serious random-number generation needs. It is recommended to use C++11's random number generation facilities to replace rand().
It (and I) recommend using the newer C++11 random number generators in <random>.
In your specific case it seems you want a std::uniform_int_distribution. An example, as given on the linked page, is:
std::random_device rd; //Will be used to obtain a seed for the random number engine
std::mt19937 gen(rd()); //Standard mersenne_twister_engine seeded with rd()
std::uniform_int_distribution<> distrib(1, RAND_MAX);
std::cout << distrib(gen) << '\n';
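Since the engine should be seeded once and then reused, rather than re-seeded for every number, one way to package this is a small helper with a function-local static engine. This is a sketch under assumptions: the name random_between is hypothetical, and the static engine is not safe to share between threads without extra synchronisation.

```cpp
#include <random>

// Hypothetical helper: returns a value in [low, high]. The engine is
// created and seeded once, on the first call, and reused afterwards.
inline int random_between(int low, int high) {
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> distrib(low, high);
    return distrib(gen);
}
```

A call such as random_between(1, 6) then plays the role of the rand() call in the question, without any srand(time(0)) step.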

Generating number (0,1) using mersenne twister c++

I'm working on implementing R code into C++ so that it runs faster, but I am having difficulties implementing mersenne twister. I only wish to generate values between (0,1). Here is what I have that pertains to this question.
#include <random>
std::mt19937 generator (123);
std::cout << "Random value: " << generator() << std::endl;
I tried dividing by RAND_MAX, but that did not produce the values that I was looking for.
Thanks in advance.
In C++11 the concepts of "(pseudo) random generator" and "probability distribution" are separated, and for good reasons.
What you want can be achieved with the following lines:
std::mt19937 generator (123);
std::uniform_real_distribution<double> dis(0.0, 1.0);
double randomRealBetweenZeroAndOne = dis(generator);
If you want to understand why this separation is necessary, and why using a standard division /range manipulation on the output of the generator is a bad idea, watch this video.
You may want to consider code like this:
// For pseudo-random number generators and distributions
#include <random>
...
// Use random_device to generate a seed for Mersenne twister engine.
std::random_device rd{};
// Use Mersenne twister engine to generate pseudo-random numbers.
std::mt19937 engine{rd()};
// "Filter" MT engine's output to generate pseudo-random double values,
// **uniformly distributed** on the half-open interval [0, 1).
// (Note that the range is [inclusive, exclusive).)
std::uniform_real_distribution<double> dist{0.0, 1.0};
// Generate pseudo-random number.
double x = dist(engine);
For more details on generating pseudo-random numbers in C++ (including reasons why rand() is not good), see this video by Stephan T. Lavavej (from Going Native 2013):
rand() Considered Harmful
std::mt19937 does not generate values between 0 and RAND_MAX like rand(), but between 0 and 2^32-1.
And by the way, the class provides min() and max() values!
You need to convert the value to a double, subtract min() and divide by max()-min():
std::uint32_t val = generator();
double doubleval = (static_cast<double>(val) - generator.min()) / (generator.max() - generator.min());
or (less generic)
std::uint32_t val = generator();
double doubleval = static_cast<double>(val) * (1.0 / std::numeric_limits<std::uint32_t>::max());
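Dividing by max()-min() can return exactly 1.0 when the engine outputs its maximum value. If 1.0 has to be excluded, one option is to divide by 2^32 instead. The helper below is a minimal sketch with a hypothetical name (to_unit_interval). It has only 32 bits of resolution, so std::uniform_real_distribution remains the more general choice.

```cpp
#include <random>

// Maps one 32-bit output of std::mt19937 to a double in [0, 1).
// Dividing by 2^32 (one more than max()) keeps 1.0 out of the range.
inline double to_unit_interval(std::mt19937& generator) {
    return static_cast<double>(generator()) / 4294967296.0;  // 2^32
}
```

Note that 0.0 is still a possible result, so code that needs the open interval (0,1) would have to reject that value and draw again.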

Mersenne twister warm up vs. reproducibility

In my current C++11 project I need to perform M simulations. For each simulation m = 1, ..., M, I randomly generate a data set by using a std::mt19937 object, constructed as follows:
std::mt19937 generator(m);
DatasetFactory dsf(generator);
According to https://stackoverflow.com/a/15509942/1849221 and https://stackoverflow.com/a/14924350/1849221, the Mersenne Twister PRNG benefits from a warm up phase, which is currently absent in my code. I report for convenience the proposed snippet of code:
#include <algorithm>
#include <functional>
#include <iterator>
#include <random>
std::mt19937 get_prng() {
std::uint_least32_t seed_data[std::mt19937::state_size];
std::random_device r;
std::generate_n(seed_data, std::mt19937::state_size, std::ref(r));
std::seed_seq q(std::begin(seed_data), std::end(seed_data));
return std::mt19937{q};
}
The problem in my case is that I need reproducibility of results, i.e., among different executions, for each simulation, the data set has to be the same. That's the reason why in my current solution I use the current simulation to seed the Mersenne Twister PRNG. It seems to me that the usage of std::random_device prevents data from being the same (AFAIK, this is the exact purpose of std::random_device).
EDIT: by different executions I mean re-launching the executable.
How can I introduce the afore-mentioned warm up phase in my code without affecting reproducibility? Thanks.
Possible solution #1
Here's a tentative implementation based on the second proposal by @SteveJessop
#include <algorithm>
#include <functional>
#include <iterator>
#include <random>
std::mt19937 get_generator(unsigned int seed) {
std::minstd_rand0 lc_generator(seed);
std::uint_least32_t seed_data[std::mt19937::state_size];
std::generate_n(seed_data, std::mt19937::state_size, std::ref(lc_generator));
std::seed_seq q(std::begin(seed_data), std::end(seed_data));
return std::mt19937{q};
}
Possible solution #2
Here's a tentative implementation based on the joint contribution by @SteveJessop and @AndréNeve. The sha256 function is adapted from https://stackoverflow.com/a/10632725/1849221
#include <openssl/sha.h>
#include <sstream>
#include <iomanip>
#include <random>
#include <string>
std::string sha256(const std::string& str) {
unsigned char hash[SHA256_DIGEST_LENGTH];
SHA256_CTX sha256;
SHA256_Init(&sha256);
SHA256_Update(&sha256, str.c_str(), str.size());
SHA256_Final(hash, &sha256);
std::stringstream ss;
for(int i = 0; i < SHA256_DIGEST_LENGTH; i++)
ss << std::hex << std::setw(2) << std::setfill('0') << (int)hash[i];
return ss.str();
}
std::mt19937 get_generator(unsigned int seed) {
std::string seed_str = sha256(std::to_string(seed));
std::seed_seq q(seed_str.begin(), seed_str.end());
return std::mt19937{q};
}
Compile with: -I/opt/ssl/include/ -L/opt/ssl/lib/ -lcrypto
Two options:
1. Follow the proposal you have, but instead of using std::random_device r; to generate your seed sequence for MT, use a different PRNG seeded with m. Choose one that doesn't suffer like MT does from needing a warmup when used with small seed data: I suspect an LCG will probably do. For massive overkill, you could even use a PRNG based on a secure hash. This is a lot like "key stretching" in cryptography, if you've heard of that. You could in fact use a standard key stretching algorithm, but you're using it to generate a long seed sequence rather than large key material.
2. Continue using m to seed your MT, but discard a large constant amount of data before starting the simulation. That is to say, ignore the advice to use a strong seed and instead run the MT long enough for it to reach a decent internal state. I don't know off-hand how much data you need to discard, but I expect the internet does.
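The second option can be sketched with the engine's discard() member. The helper name make_warmed_generator and the default of 10000 discarded outputs are assumptions made for this illustration. As the answer says, the right amount to discard is not established here.

```cpp
#include <random>

// Hypothetical helper: seeds std::mt19937 with the simulation index and
// throws away `warmup` outputs. The default of 10000 is an arbitrary
// placeholder, not a recommended value.
inline std::mt19937 make_warmed_generator(unsigned int seed,
                                          unsigned long long warmup = 10000) {
    std::mt19937 generator(seed);
    generator.discard(warmup);
    return generator;
}
```

Because both the seed and the discard count are fixed, re-launching the executable reproduces the same generator state for each simulation m.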
I think that you only need to store the initial seed (in your case the std::uint_least32_t seed_data[std::mt19937::state_size] array) and the number n of warmup steps you made (e.g. using discard(n) as mentioned) for each run/simulation you wish to reproduce.
With this information, you can always create a new MT instance, seed it with the previous seed_data and run it for the same n warmup steps. This will generate the same sequence of values onwards since the MT instance will have the same inner state when the warmup ends.
When you mention the std::random_device affecting reproducibility, I believe that in your code it is simply being used to generate the seed data. If you were using it as the source of random numbers itself, then you would not be able to have reproducible results. Since you are using it only to generate the seed there shouldn't be any problem. You just can't generate a new seed every time if you want to reproduce values!
From the definition of std::random_device:
"std::random_device is a uniformly-distributed integer random number generator that produces non-deterministic random numbers."
So if it's not deterministic you cannot reproduce the sequence of values produced by it. That being said, use it simply to generate good random seeds only to store them afterwards for the re-runs.
Hope this helps
EDIT :
After discussing with @SteveJessop, we arrived at the conclusion that a simple hash of the dataset (or part of it) would be sufficient to be used as a decent seed for the purpose you need. This allows for a deterministic way of generating the same seeds every time you run your simulations. As mentioned by @SteveJessop, you will have to guarantee that the size of the hash isn't too small compared with std::mt19937::state_size. If it is too small, then you can concatenate the hashes of m, m+M, m+2M, ... until you have enough data, as he suggested.
I am posting the updated answer here as the idea of using a hash was mine, but I will upvote @SteveJessop's answer because he contributed to it.
A comment on one of the answers you link to indicates:
Coincidentally, the default C++11 seed_seq is the Mersenne Twister warmup sequence (although the existing implementations, libc++'s mt19937 for example, use a simpler warmup when a single-value seed is provided)
So you may be able to use your current fixed seeds with std::seed_seq to do the warm-up for you.
std::mt19937 get_prng(int seed) {
std::seed_seq q{seed, maybe, some, extra, fixed, values};
return std::mt19937{q};
}
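A compilable variant of this idea might look as follows. The function name get_prng_fixed and the three extra words are arbitrary choices made for this sketch, standing in for the placeholder values above. Any fixed constants would serve the same purpose.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical variant of get_prng: the extra words are arbitrary fixed
// constants chosen for this sketch, not values from the answer.
inline std::mt19937 get_prng_fixed(std::uint32_t seed) {
    const std::array<std::uint32_t, 4> words{
        {seed, 0x9E3779B9u, 0x243F6A88u, 0xB7E15162u}};
    // std::seed_seq spreads the few input words over the whole MT state.
    std::seed_seq q(words.begin(), words.end());
    return std::mt19937{q};
}
```

Calling get_prng_fixed(m) for each simulation index m gives the same engine state on every run, since nothing in it depends on std::random_device.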

Can boost/random/uniform_int.hpp and boost/random/uniform_int_distribution.hpp be used interchangeably?

There are two random integer generators in Boost, boost::uniform_int<> and boost::random::uniform_int_distribution<>, the latter having been added only after Boost 1.47.
I would like to know if there is any difference in their performance (i.e. the quality of the random numbers they generate)?
Also, with boost::uniform_int<> you need to couple it with a random engine through variate_generator, but it seems from Boost's official website that you can use
boost::random::mt19937 rng;
boost::random::uniform_int_distribution<> six(1,6);
int x = six(rng);
without the variate_generator.
Can these two usages be used interchangeably?
boost::uniform_int<> inherits from boost::random::uniform_int_distribution<> and if you look at the header for uniform_int<>, you can see that it basically just calls the base class functions.
Since uniform_int<> just calls uniform_int_distribution<>'s functions, there is no difference in the numbers generated. Boost does explicitly state, however, that uniform_int<> is deprecated, and that uniform_int_distribution<> should be used for all new code.
To answer your second question, neither uniform_int<> nor uniform_int_distribution<> requires a boost::random::variate_generator<> to function. The variate_generator<> simply associates a random number generator (like boost::random::mt19937) with a random number distribution (like uniform_int_distribution<>) as a convenience. If you don't use variate_generator<>, then you need to pass a random number generator each time you wish to generate a random number. Here's an example:
#include <boost/random/uniform_int.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/variate_generator.hpp>
#include <iostream>
#include <ctime>
int main()
{
boost::mt19937 rand_generator(std::time(NULL));
boost::random::uniform_int_distribution<> int_distribution(0, 100);
//Need to pass generator
std::cout << int_distribution(rand_generator) << std::endl;
//Associate generator with distribution
boost::random::variate_generator<boost::mt19937&,
boost::random::uniform_int_distribution<>
> int_variate_generator(rand_generator, int_distribution);
//No longer need to pass generator
std::cout << int_variate_generator() << std::endl;
}
Note that the first call is to uniform_int_distribution<> operator() whereas the second call is to variate_generator<> operator(). Associating a generator with a distribution does not change the original generator or distribution objects.
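To make the idea of "associating" a generator with a distribution concrete, here is a rough standard-library analogue written with a lambda (C++14 or later). It is not Boost code and not how variate_generator<> is implemented. The name bind_generator is hypothetical.

```cpp
#include <random>

// Standard-library analogue of variate_generator (an assumption of this
// sketch, not Boost code): the engine is held by reference, the
// distribution by copy.
template <class Engine, class Distribution>
auto bind_generator(Engine& engine, Distribution distribution) {
    return [&engine, distribution]() mutable { return distribution(engine); };
}
```

With std::mt19937 rng(42); and auto roll = bind_generator(rng, std::uniform_int_distribution<>(1, 6));, each roll() call draws from rng through the stored copy of the distribution, so the caller no longer passes the engine explicitly.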
Please let me know if anything is unclear.