How to properly initialize a C++11 std::seed_seq - c++

I have a C++11 program that needs to create several independent random generators, for use by different threads in a parallel computation. These generators should be initialized with different seed values so that they all produce different pseudo-random sequences.
I see that there's a std::seed_seq class that seems to be meant for this purpose, but it's not clear what's the right way to construct one. The examples I've seen, such as the one on cppreference.com, initialize it with a handful of integer constants hard-coded in the program:
std::seed_seq seq{1,2,3,4,5};
I doubt that's actually a recommended best practice, so I'm wondering what is the recommended practice. In particular:
Since a seed_seq can be initialized with an arbitrary number of integers, what's the significance of the length of its initializer list? If I want to produce seeds for 100 random generators, do I need to initialize my seed_seq with 100 integers?
If the length of the initializer list doesn't have to match the number of seeds I intend to generate, is it OK to initialize a seed_seq with just one integer and then use it to produce a large number of seeds?
How about initializing with no integers, i.e. using the default constructor? (This means I'd get the same seeds every time, of course.)
If it's OK to construct a seed_seq from a single integer and then generate lots of seeds from it, what's the benefit of using seed_seq instead of an ordinary random generator? Why not just construct a std::mt19937 from that single integer and use that to produce seed values for other generators?

The trouble with using a fixed sequence like that is that you get the same sequence of seeds out of it, much the same as if you had called srand(42) at the start of your program: it generates identical sequences.
The C++11 standard states (in section 26.5.7.1 Class seed_seq):
A seed sequence is an object that consumes a sequence of integer-valued data and produces a requested number of unsigned integer values i, 0 ≤ i < 2^32, based on the consumed data.
[Note: Such an object provides a mechanism to avoid replication of streams of random variates. This can be useful, for example, in applications requiring large numbers of random number engines. —end note]
It also states how those integers are turned into seeds in paragraph 8 of that section, in such a way that the distribution of those seeds is acceptable even if the integer input items are very similar. So you can probably think of it as a pseudo-random number generator for seed values.
A larger number of items will provide more "randomness" in the seed values, provided they have some randomness themselves. Using constants as input is a bad idea for this reason.
What I tend to do is very similar to the way you normally randomise one generator, with srand (time (0)). In other words:
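#include <ctime>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    // Seed the sequence from the current time, as with srand(time(0)).
    std::seed_seq seq{std::time(nullptr)};
    // Ask the seed sequence for ten 32-bit seed values.
    std::vector<std::uint32_t> seeds(10);
    seq.generate(seeds.begin(), seeds.end());
    for (std::uint32_t n : seeds) {
        std::cout << n << '\n';
    }
}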
If you have multiple sources of randomness, such as a value read from /dev/random under Linux, or a white noise generator of some description, or the average number of milliseconds between keypresses the last time a user ran this program, you could use those as extra inputs:
std::seed_seq seq{time(0), valFromDevRandom(), getWhiteNoise(), avgMillis()};
but I doubt constants are the way to go, since they add no randomness to the equation.
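For the original use case (one engine per thread), the approach above could be sketched as follows. This is a minimal sketch, not a definitive implementation: gather_entropy and make_engines are hypothetical helper names, and using std::random_device as the entropy source is an assumption (its quality varies between platforms).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: collects `words` 32-bit values from std::random_device.
std::vector<std::uint32_t> gather_entropy(std::size_t words)
{
    std::random_device rd;
    std::vector<std::uint32_t> out(words);
    for (auto& w : out) {
        w = static_cast<std::uint32_t>(rd());
    }
    return out;
}

// Hypothetical helper: derives `count` engines from one seed_seq built over
// `entropy`. The same entropy always yields the same engines.
std::vector<std::mt19937> make_engines(const std::vector<std::uint32_t>& entropy,
                                       std::size_t count)
{
    std::seed_seq seq(entropy.begin(), entropy.end());

    // seed_seq::generate fills the range with 32-bit seed values.
    std::vector<std::uint32_t> seeds(count);
    seq.generate(seeds.begin(), seeds.end());

    std::vector<std::mt19937> engines;
    engines.reserve(count);
    for (std::uint32_t s : seeds) {
        engines.emplace_back(s);  // one engine per generated seed
    }
    return engines;
}
```

Calling make_engines(gather_entropy(8), 100) would then give 100 engines, one to hand to each thread, without needing 100 input integers.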

According to the C++11 standard (section 26.5.7.1, paragraph 8), seed_seq generates its output with what is essentially a hash function, so the values are spread roughly uniformly and randomly over the range.
I'll try to answer the questions below:
Q1. "Since a seed_seq can be initialized with an arbitrary number of integers, what's the significance of the length of its initializer list? If I want to produce seeds for 100 random generators, do I need to initialize my seed_seq with 100 integers?"
A1. You don't need to initialize seed_seq with a lot of integers. Even when seed_seq is initialized with one random integer, the generated sequence keeps its randomness. But if you initialize seed_seq with more integers drawn from a wider range, it becomes harder for an attacker to force a collision in the generated sequence.
Q2. "If the length of the initializer list doesn't have to match the number of seeds I intend to generate, is it OK to initialize a seed_seq with just one integer and then use it to produce a large number of seeds?"
A2. Yes, it is OK to initialize a seed_seq with just one integer if you don't need cryptographic-level security.
Q3. "How about initializing with no integers, i.e. using the default constructor? (This means I'd get the same seeds every time, of course.)"
A3. A default-constructed seed_seq produces identical sequences on every run, so it can become a security hole.
Q4. "If it's OK to construct a seed_seq from a single integer and then generate lots of seeds from it, what's the benefit of using seed_seq instead of an ordinary random generator? Why not just construct a std::mt19937 from that single integer and use that to produce seed values for other generators?"
A4. seed_seq is a lightweight algorithm that only makes three passes over the sequence being filled. I guess you could use another random generator instead of seed_seq.
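The points in A1 to A3 can be checked with a small experiment. The following is a minimal sketch; seeds_from is a hypothetical helper name, and passing an empty input vector is assumed to behave like the default constructor because both leave the seed sequence with no stored values.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds a seed_seq from `inputs` and asks it for
// `count` 32-bit seed values.
std::vector<std::uint32_t> seeds_from(const std::vector<std::uint32_t>& inputs,
                                      std::size_t count)
{
    std::seed_seq seq(inputs.begin(), inputs.end());
    std::vector<std::uint32_t> out(count);
    seq.generate(out.begin(), out.end());
    return out;
}
```

seeds_from({42}, 100) yields 100 seeds from a single integer, seeds_from({}, 100) returns the same 100 values on every run, and changing the single input from 1 to 2 changes the generated seeds.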

Related

Match pseudo random numbers between MT19937 CPU and GPU

I am studying the behaviour of CURAND_RNG_PSEUDO_MT19937, specifically in order to match numbers generated by the standard CPU implementation of Mersenne Twister (std::mt19937 or boost::random::mt19937).
I read in the documentation that cuRand MT19937 “has the same parameters as CPU version, but ordering is different […] Output is generated by 8192 independent generators. Each generator generates consecutive subsequence of the original sequence. […] Results are permuted differently than originally to achieve higher performance.”
Checking the unsigned int output sequences of std::mt19937 and cuRand MT19937 with the same seed, only the first number is equal and immediately the two generators diverge.
Consider that in my CPU environment, I have a distributed computation that instantiates n std::mt19937 engines with incremental seeds (s+1, s+2, etc.). Do you know if there is a way to modify the cuRand MT19937 generators in order to match my workflow?
Thanks

can rand() be used to generate predictable data?

My goal is to generate 2D or 3D geometry without having to store it on disk, so I want some function that generates the same values from a small seed. I'm not looking for truly random values; if the same "random" garbage data is returned whenever I give the same seed, that's what I'm looking for.
If I give srand() the same integer, I get the same sequence out of rand(). Is that an intended feature? If not, are there known standard functions designed to do the same thing?
I tried this on ideone and on my computer and got different results, but I understand that the implementations of those functions are not specified, which explains it.
If I give srand() the same integer, I get the same sequence out of rand(). Is that an intended feature?
Yes, see 7.20.2.2:
7.20.2.2 The srand function
[...] Description
The srand function uses the argument as a seed for a new sequence of pseudo-random
numbers to be returned by subsequent calls to rand. If srand is then called with the
same seed value, the sequence of pseudo-random numbers shall be repeated.
However, that's only true for the same implementation of srand/rand. Another implementation might not use the same algorithm, and therefore won't produce the same sequence.
If not, are there known standard functions designed to do the same thing ?
Well, the functions are standard, but only in their behavior, not the actual values (see the implementation remark above). You're better off using a specific generator from the C++11 predefined random number generators, since they're standardized.
"If I give srand() the same integer, I get the same sequence out of
rand(). Is that an intended feature ?"
Yes.
If you seed the same random number generator with the same seed, it will produce the same result.
Standard library rand and all its variants are usually implemented as linear congruential generators. They are not truly random, and are better referred to as pseudo-random.
You probably saw different results on different machines because either they were using different pseudo-random number generation algorithms or you weren't supplying a fixed seed, in which case the current system time is often the default seed.
If you need a fixed set of pseudo-random data, then generate it once and store it.
The answer is yes, you get a repeatable sequence, if you always use the same implementation and the same seed, though it might be ill-advised due to possibly poor quality of rand().
Better use the C++ random number framework in <random> though. It not only allows reproducible sequences across implementations, it also supplies all you need to reliably get the distribution you really want.
Now to the details:
The requirements on rand are:
Generates pseudo-random numbers.
Range is 0 to RAND_MAX (minimum of 32767).
The seed set by srand() determines the sequence of pseudo-random numbers returned.
There is no requirement on what PRNG is implemented, so every implementation can have its own, though Linear Congruential Generators are a favorite.
A conforming (though arguably useless) implementation is presented in this dilbert strip:
http://dilbert.com/strips/comic/2001-10-25/
Or for those who like XKCD (It's a perfect drop-in for any C or C++ library ;-)):
For completeness, the standard quotes:
7.22.2.1 The rand function
The rand function computes a sequence of pseudo-random integers in the range 0 to
RAND_MAX.
[...]
The value of the RAND_MAX macro shall be at least 32767.
7.22.2.2 The srand function
The srand function uses the argument as a seed for a new sequence of pseudo-random
numbers to be returned by subsequent calls to rand. If srand is then called with the
same seed value, the sequence of pseudo-random numbers shall be repeated. If rand is
called before any calls to srand have been made, the same sequence shall be generated
as when srand is first called with a seed value of 1.
If you seed the random number generator with the same value, it will produce the same result. You saw different results on different machines because they were (probably) using different random number generation algorithms.
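The repeatability that the standard requires of srand/rand can be illustrated with a short sketch. rand_sequence is a hypothetical helper name; the actual values it returns depend on the C library in use, so only the repetition within one implementation is guaranteed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: reseeds the global rand() state and records the
// next `count` outputs. It is not thread-safe, because rand() keeps its
// state in a global.
std::vector<int> rand_sequence(unsigned seed, std::size_t count)
{
    std::srand(seed);
    std::vector<int> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(std::rand());  // each value lies in [0, RAND_MAX]
    }
    return out;
}
```

On one implementation, rand_sequence(42, 10) returns the same ten values every time it is called, while another compiler or platform may return ten different values for the same seed.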

how to generate uncorrelated random sequences using c++

I'd like to generate two sequences of uncorrelated normal distributed random numbers X1, X2.
As normal distributed random numbers come from uniform numbers, all I need is two uncorrelated uniform sequences. But how to do it using:
srand (time(NULL));
I guess I need to seed twice or do something similar?
Since the random numbers generated by a high-quality random-number generator are uniform and independent, you can generate as many independent sequences from it as you like.
You do not need to, and should not, seed two different generators.
In C++(11), you should use a pseudo-random number generator from the header <random>. Here’s a minimal example that can serve as a template for an actual implementation:
std::random_device seed;
std::mt19937 gen{seed()};
std::normal_distribution<> dist1{mean1, sd1};
std::normal_distribution<> dist2{mean2, sd2};
Now you can generate independent sequences of numbers by calling dist1(gen) and dist2(gen). The random_device is used to seed the actual generator, which in my code is a Mersenne Twister generator. This type of generator is efficient and has good statistical properties. It should be considered the default choice for a (non cryptographically secure) generator.
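A more complete version of the template above might look like the following sketch. The function name two_normal_sequences and the parameter values (mean 0 with deviation 1, and mean 5 with deviation 2) are placeholders chosen for illustration, not values from the question.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: draws `n` samples for each of two normal
// distributions, taking all the randomness from one shared engine.
std::pair<std::vector<double>, std::vector<double>>
two_normal_sequences(std::mt19937& gen, std::size_t n)
{
    // Placeholder parameters; substitute the real means and deviations.
    std::normal_distribution<> dist1{0.0, 1.0};
    std::normal_distribution<> dist2{5.0, 2.0};

    std::vector<double> x1;
    std::vector<double> x2;
    x1.reserve(n);
    x2.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        x1.push_back(dist1(gen));  // next sample of the first sequence
        x2.push_back(dist2(gen));  // next sample of the second sequence
    }
    return std::make_pair(std::move(x1), std::move(x2));
}
```

A caller would construct the engine once, for example with std::random_device seed; std::mt19937 gen{seed()};, and pass it in by reference so that both sequences consume the same underlying stream.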
rand doesn't support generating more than a single sequence. It stores its state in a global variable. On some systems (namely POSIX-compliant ones) you can use rand_r to stay close to that approach. You'd simply use some initial seed as internal state for each. But since your question is tagged C++, I suggest you use the random number facilities introduced in C++11. Or, if C++11 is not an option, use the random module from boost.
A while ago I asked a similar question, Random numbers for multiple threads, the answers to which might be useful for you as well. They discuss various aspects of how to ensure that sequences are not interrelated, or at least not in an obvious way.
Use two random_devices (possibly with some use of an engine) with a normal_distribution from <random>:
std::random_device rd1, rd2;
std::normal_distribution<> d;
double v1 = d(rd1);
double v2 = d(rd2);
...
See also example code at http://en.cppreference.com/w/cpp/numeric/random/normal_distribution

Does the C++11 standard guarantee identical random numbers for the same seed across implementations?

For example if I instantiate a std::mt19937 with the exact same seed and parameters under GCC and under MSVC, should I get the same sequence of random numbers? If so I assume this property would hold for mersenne_twister_engine in general since mt19937 is just one with specific parameters. This is not true for rand() in C. It looks like the standard documents the transformations applied in terms of specific code, so I suspect it should always be the same, but the devil is in the details...
For the new random number engines, yes, for the same seed and parameters you'll get the same sequence of values on all platforms. For rand(), no. You also don't have that guarantee with random number distributions, even when they are fed the same sequence of input values.
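The engine guarantee can be checked directly, because the standard fixes one output of each predefined engine: the 10000th consecutive invocation of a default-constructed std::mt19937 must produce 4123659995. The sketch below tests that, and also shows one assumed way to get a floating-point value without going through a distribution. Both function names are hypothetical, and the manual scaling is an illustration rather than something the answer prescribes.

```cpp
#include <cstdint>
#include <random>

// Returns the 10000th output of a default-constructed mt19937, which the
// standard requires to be 4123659995 on every conforming implementation.
std::uint_fast32_t ten_thousandth_default_mt19937()
{
    std::mt19937 gen;   // default-constructed, so it uses the default seed
    gen.discard(9999);  // skip the first 9999 outputs
    return gen();       // the 10000th output
}

// Hypothetical helper: maps one raw 32-bit engine output to [0, 1) by
// dividing by 2^32. It uses only the engine, so it does not depend on how
// a library implements its distributions.
double portable_unit(std::mt19937& gen)
{
    return static_cast<double>(gen()) / 4294967296.0;
}
```

Results built this way from raw engine output should match between GCC and MSVC for the same seed, whereas results drawn through std::uniform_real_distribution or std::normal_distribution may not.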

Just how random is std::random_shuffle?

I'd like to generate a random number of reasonably arbitrary length in C++. By "reasonably arbitrary" I mean limited by speed and memory of the host computer.
Let's assume:
I want to sample a decimal number (base 10) of length ceil(log10(MY_CUSTOM_RAND_MAX)) from 0 to 10^(ceil(log10(MY_CUSTOM_RAND_MAX))+1)-1
I have a vector<char>
The length of vector<char> is ceil(log10(MY_CUSTOM_RAND_MAX))
Each char is really an integer, a random number between 0 and 9, picked with rand() or similar methods
If I use std::random_shuffle to shuffle the vector, I could iterate through each element from the end, multiplying by incremented powers of ten to convert it to unsigned long long or whatever that gets mapped to my final range.
I don't know if there are problems with std::random_shuffle in terms of how random it is or isn't, particularly when also picking a sequence of rand() results to populate the vector<char>.
How sketchy is std::random_shuffle for generating a random number of arbitrary length in this manner, in a quantifiable sense?
(I realize that there is a library in Boost for making random int numbers. It's not clear what the range limitations are, but it looks like MAX_INT. That said, I realize that said library exists. This is more of a general question about this part of the STL in the generation of an arbitrarily large random number. Thanks in advance for focusing your answers on this part.)
I'm slightly unclear as to the focus of this question, but I'll try to answer it from a few different angles:
The quality of the standard library rand() function is typically poor. However, it is very easy to find replacement random number generators which are of a higher quality (you mentioned Boost.Random yourself, so clearly you're aware of other RNGs). It is also possible to boost (no pun intended) the quality of rand() output by combining the results of multiple calls, as long as you're careful about it: http://www.azillionmonkeys.com/qed/random.html
If you don't want the decimal representation in the end, there's little to no point in generating it and then converting to binary. You can just as easily stick multiple 32-bit random numbers (from rand() or elsewhere) together to make an arbitrary bit-width random number.
If you're generating the individual digits (binary or decimal) randomly, there is little to no point in shuffling them afterwards.
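The second point, sticking 32-bit random numbers together, could be sketched as follows. The name random_bits and the little-endian limb layout are assumptions made for illustration, and std::mt19937 stands in for whichever generator is preferred over rand().

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns a random number of 32 * `words` bits, stored
// as 32-bit limbs with the least significant limb first.
std::vector<std::uint32_t> random_bits(std::mt19937& gen, std::size_t words)
{
    std::vector<std::uint32_t> limbs(words);
    for (auto& limb : limbs) {
        // Each mt19937 output carries exactly 32 random bits.
        limb = static_cast<std::uint32_t>(gen());
    }
    return limbs;
}
```

For example, random_bits(gen, 8) gives a 256-bit value. No shuffling is needed afterwards, since every limb is already drawn independently from the engine.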