Re-initializing random distribution - c++

Is it reasonable to expect that a distribution from <random> re-initialized before each next number request behaves the same way as if it was initialized once? In other words, does this:
std::default_random_engine generator;
int p[10]={};
for (int i=0; i<nrolls; ++i) {
std::uniform_int_distribution<int> distribution(0,9);
int number = distribution(generator);
++p[number];
}
have the same distribution as that
std::uniform_int_distribution<int> distribution(0,9);
std::default_random_engine generator;
int p[10]={};
for (int i=0; i<nrolls; ++i) {
int number = distribution(generator);
++p[number];
}
I've checked that for uniform and normal distribution it empirically holds true. Can I expect it from every distribution in <random>?

I essentially do what your first implementation does. Construct one every time I need a distribution.
That said, yes, the behavior of the different distributions are guaranteed by the standard to behave in specific ways.
STL recommends, that since distributions are relatively cheap, don't worry about constructing one every time you need one or need a new range. He also says if you don't want to construct one every time, you can use the param member function to change the distribution range.
Microsoft Channel9 link if the above direct Youtube link dies (seek to 30 minutes in): https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful
EDIT
I was re-watching a CppCon talk from a few years ago that discusses this exact question. The result? Constructing distributions as local variables even inside loops is 4.5 times faster.

Related

Why is this random number generator generating same numbers?

The first one works, but the second one always returns the same value. Why would this happen and how am I supposed to fix this?
int main() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(0, 1);
for(int i = 0; i < 10; i++) {
std::cout << dis(gen) << std::endl;
}return 0;
}
The one dosen't work:
double generateRandomNumber() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(0, 1);
return dis(gen);
}
int main() {
for(int i = 0; i < 10; i++) {
std::cout << generateRandomNumber() << std::endl;
}return 0;
}
What platform are you working on? std::random_device is allowed to be a pseudo-RNG if hardware or OS functionality to generate random numbers doesn't exist. It might initialize using the current time, in which case the intervals at which you're calling it might be too close apart for the 'current time' to take on another value.
Nevertheless, as mentioned in the comments, it is not meant to be used this way. A simple fix will be to declare rd and gen as static. A proper fix would be to move the initialization of the RNG out of the function that requires the random numbers, so it can also be used by other functions that require random numbers.
The first one uses the same generator for all the numbers, the second creates a new generator for each number.
Let's compare the differences between your two cases and see why this happening.
Case 1:
int main() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(0, 1);
for(int i = 0; i < 10; i++) {
std::cout << dis(gen) << std::endl;
}return 0;
}
In your first case the program executes the main function and the first thing that happens here is that you are creating an instance of a std::random_device, std::mt19337 and a std::uniform_real_distribution<> on the stack that belong to main()'s scope. Your mersenne twister gen is initialized once with the result from your random device rd. You have also initialized your distribution dis to have the range of values from 0 to 1. These only exist once per each run of your application.
Now you create a for loop that starts at index 0 and increments to 9 and on each iteration you are displaying the resulting value to cout by using the distribution dis's operator()() passing to it your already seeded generation gen. Each time on this loop dis(gen) is going to produce a different value because gen was already seeded only once.
Case 2:
double generateRandomNumber() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(0, 1);
return dis(gen);
}
int main() {
for(int i = 0; i < 10; i++) {
std::cout << generateRandomNumber() << std::endl;
}return 0;
}
In this version of the code let's see what's similar and what's different. Here the program executes and enters the main() function. This time the first thing it encounters is a for loop from 0 to 9 similar as in the main above however this loop is the first thing on main's stack. Then there is a call to cout to display results from a user defined function named generateRandomNumber(). This function is called a total of 10 times and each time you iterate through the for loop this function has its own stack memory that will be wound and unwound or created and destroyed.
Now let's jump execution into this user defined function named generateRandomNumber().
The code looks almost exactly the same as it did before when it was in main() directly but these variables live in generateRandomNumber()'s stack and have the life time of its scope instead. These variables will be created and destroyed each time this function goes in and out of scope. The other difference here is that this function also returns dis(gen).
Note: I'm not 100% sure if this will return a copy or not or if the compiler will end up doing some kind of optimizations, but returning by value usually results in a copy.
Finally when then function generateRandomNumber() returns and just before it goes completely out of scope where std::uniform_real_distribrution<>'s operator()() is being called and it goes into it's own stack and scope before returning back to main generateRandomNumber() ever so briefly and then back to main.
-Visualizing The Differences-
As you can see these two programs are quite different, very different to be exact. If you want more visual proof of them being different you can use any available online compiler to enter each program to where it shows you that program in assembly and compare the two assembly versions to see their ultimate differences.
Another way to visualize the difference between these two programs is not only to see their assembly equivalents but to step through each program line by line with a debugger and keep an eye on the stack calls and the winding and unwinding of them and keep an eye of all values as they become initialized, returned and destroyed.
-Assessment and Reasoning-
The reason the first one works as expected is because your random device, your generator and your distribution all have the life time of main and your generator is seeded only once with your random device and you only have one distribution that you are using each time in the for loop.
In your second version main doesn't know anything about any of that and all it knows is that it is going through a for loop and sending returned data from a user function to cout. Now each time it goes through the for loop this function is being called and it's stack as I said is being created and destroyed each time so all if its variables are being created and destroyed. So in this instance you are creating and destroying 10: rd, gen(rd()), and dis(0,1)s instances.
-Conclusion-
There is more to this than what I have described above and the other part that pertains to the behavior of your random number generators is what was mentioned by user Kane in his statement to you from his comment to your question:
From en.cppreference.com/w/cpp/numeric/random/random_device:
"std::random_device may be implemented in terms of an
implementation-defined pseudo-random number engine [...].
In this case each std::random_device object may generate
the same number sequence."
Each time you create and destroy you are seeding the generator over and over again with a new random_device however if your particular machine or OS doesn't have support for using random_device it can either end up using some arbitrary value as its seed value or it could end up using the system clock to generate a seed value.
So let's say it does end up using the system clock, the execution of main()'s for loop happens so fast that all of the work that is being done by the 10 calls to generateRandomNumber() is already executed before a few milliseconds have passed. So here the delta time is minimally small and negligible that it is generating the same seed value on each pass as well as it is generating the same values from the distributions.
Note that std::mt19937 gen(rd()) is very problematic. See this question, which says:
rd() returns a single unsigned int. This has at least 16 bits and probably 32. That's not enough to seed [this generator's huge state].
Using std::mt19937 gen(rd());gen() (seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)
std::random_device can be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link)
Furthermore, random_device's approach to generating "nondeterministic" random numbers is "implementation-defined", and random_device allows the implementation to "employ a random number engine" if it can't generate "nondeterministic" random numbers due to "implementation limitations" ([rand.device]). (For example, under the C++ standard, an implementation might implement random_device using timestamps from the system clock, or using fast-moving cycle counters, since both are nondeterministic.)
An application should not blindly call random_device's generator (rd()) without also, at a minimum, calling the entropy() method, which gives an estimate of the implementation's entropy in bits.

Generating Gaussian Noise

I created a function that is suppose to generate a set of normal random numbers from 0 to 1. Although, it seems that each time I run the function the output is the same. I am not sure what is wrong.
Here is the code:
MatrixXd generateGaussianNoise(int n, int m){
MatrixXd M(n,m);
normal_distribution<double> nd(0.0, 1.0);
random_device rd;
mt19937 gen(rd());
for(int i = 0; i < n; i++){
for(int j = 0; j < m; j++){
M(i,j) = nd(gen);
}
}
return M;
}
The output when n = 4 and m = 1 is
0.414089
0.225568
0.413464
2.53933
I used the Eigen library for this, I am just wondering why each time I run it produces the same numbers.
From:
http://en.cppreference.com/w/cpp/numeric/random/random_device
std::random_device may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation. In this case each std::random_device object may generate the same number sequence.
Thus, I think you should look into what library stack you are actually using here, and what's known about random_device in your specific implementation.
I realize that this then might in fact be a duplicate of "Why do I get the same sequence for every run with std::random_device with mingw gcc4.8.1?".
Furthermore, it at least used to be that initializating a new mt19937 instance would be kind of expensive. Thus, you have performance reasons in addition to quality of randomness to not re-initalize both your random_device and mt19937 instance for every function call. I would go for some kind of singleton here, unless you have very clear constraints (building in a library, unclear concurrency) that would make that an unuistable choice.

Best place to initialise random generator

In my program I use a random number generator quite a lot. I believe the general rule is that you should define things as close to the place where they're "called", but does this also hold true for random number generators?
For example, in my code I have the choice between:
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<int> uni(-2147483647, 2147483646);
lots of code
for (i = 0; i < 10000; i++)
{
variable x = uni(rng);
}
Or
lots of code
for (i = 0; i < 10000; i++)
{
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<int> uni(-2147483647, 2147483646);
variable x = uni(rng);
}
I would say the first method is faster, but I've gotten a bit confused due to reading many threads in which it is stated to always place everything as close to the place where it's called.
In this case, it's much better to create the RNG outside your loop:
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<int> uni(-2147483647, 2147483646);
for (i = 0; i < 10000; i++)
{
variable x = uni(rng);
}
The reason for this has little to do with performance (although it will likely perform better, too). The reason is to do with correctness:
You're initialising a new random sequence each time through the loop, and reading just one value. Instead, you should be initialising the sequence just once, and consuming many values from it. Initialise outside the loop, and consume within the loop.
On the performance side, reading from a std::random_device is much slower than taking the next value from a PRNG such as std::mt19937. Doing this just once, outside the loop, will save a lot of time. Further, the std::mt19937 PRNG has a large state (624 integers). It generates this initial state from the value passed to its constructor. Again, doing this just once will give you a performance boost.
Of course, initialising outside the loop has the advantage of also being the correct usage model for the standard RNGs.
The reason is, when you located your random generator definings on top of your code, they will become global and they will be defined automatically when you first hit the "Run" button. If you are using those variables in more than one place, probably it would be the best idea. But if you are not, you don't need it. Because in some scenarios, they might not even called. Anyway, this suggestion is for class or method usages.
However, from what I see, you are going to use that number in a for loop, which will cause your computer to run below code 1000 times.
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<int> uni(-2147483647, 2147483646);
That is unnecessary, and useless. I beleive your first code will work better on performance side.

Efficient random number generation with C++11 <random>

I am trying to understand how the C++11 random number generation features are meant to be used. My concern is performance.
Suppose that we need to generate a series of random integers between 0..k, but k changes at every step. What is the best way to proceed?
Example:
for (int i=0; i < n; ++i) {
int k = i; // of course this is more complicated in practice
std::uniform_int_distribution<> dist(0, k);
int random_number = dist(engine);
// do something with random number
}
The distributions that the <random> header provides are very convenient. But they are opaque to the user, so I cannot easily predict how they will perform. It is not clear for example how much (if any) runtime overhead will be caused by the construction of dist above.
Instead I could have used something like
std::uniform_real_distribution<> dist(0.0, 1.0);
for (int i=0; i < n; ++i) {
int k = i; // of course this is more complicated in practice
int random_number = std::floor( (k+1)*dist(engine) );
// do something with random number
}
which avoids constructing a new object in each iteration.
Random numbers are often used in numerical simulations where performance is important. What is the best way to use <random> in these situations?
Please do no answer "profile it". Profiling is part of effective optimization, but so is a good understanding of how a library is meant to be used and the performance characteristics of that library. If the answer is that it depends on the standard library implementation, or that the only way to know is to profile it, then I would rather not use the distributions from <random> at all. Instead I can use my own implementation which will be transparent to me and much easier to optimize if/when necessary.
One thing you can do is to have a permanent distribution object so that you only create the param_type object each time like this:
template<typename Integral>
Integral randint(Integral min, Integral max)
{
using param_type =
typename std::uniform_int_distribution<Integral>::param_type;
// only create these once (per thread)
thread_local static std::mt19937 eng {std::random_device{}()};
thread_local static std::uniform_int_distribution<Integral> dist;
// presumably a param_type is cheaper than a uniform_int_distribution
return dist(eng, param_type{min, max});
}
For maximizing performance, first of all consider different PRNG, such as xorshift128+. It has been reported being more than twice as fast as mt19937 for 64-bit random numbers; see http://xorshift.di.unimi.it/. And it can be implemented with a few lines of code.
Moreover, if you don't need "perfectly balanced" uniform distribution and your k is much less than 2^64 (which likely is), I would suggest to write simply something as:
uint64_t temp = engine_64(); // generates 0 <= temp < 2^64
int random_number = temp % (k + 1); // crop temp to 0,...,k
Note, however, that integer division/modulo operations are not cheap. For example, on an Intel Haswell processor, they take 39-103 processor cycles for 64-bit numbers, which is likely much longer than calling an MT19937 or xorshift+ engine.

How to use <random> to replace rand()?

C++11 introduced the header <random> with declarations for random number engines and random distributions. That's great - time to replace those uses of rand() which is often problematic in various ways. However, it seems far from obvious how to replace
srand(n);
// ...
int r = rand();
Based on the declarations it seems a uniform distribution can be built something like this:
std::default_random_engine engine;
engine.seed(n);
std::uniform_int_distribution<> distribution;
auto rand = [&](){ return distribution(engine); }
This approach seems rather involved and is surely something I won't remember unlike the use of srand() and rand(). I'm aware of N4531 but even that still seems to be quite involved.
Is there a reasonably simple way to replace srand() and rand()?
Is there a reasonably simple way to replace srand() and rand()?
Full disclosure: I don't like rand(). It's bad, and it's very easily abused.
The C++11 random library fills in a void that has been lacking for a long, long time. The problem with high quality random libraries is that they're oftentimes hard to use. The C++11 <random> library represents a huge step forward in this regard. A few lines of code and I have a very nice generator that behaves very nicely and that easily generates random variates from many different distributions.
Given the above, my answer to you is a bit heretical. If rand() is good enough for your needs, use it. As bad as rand() is (and it is bad), removing it would represent a huge break with the C language. Just make sure that the badness of rand() truly is good enough for your needs.
C++14 didn't deprecate rand(); it only deprecated functions in the C++ library that use rand(). While C++17 might deprecate rand(), it won't delete it. That means you have several more years before rand() disappears. The odds are high that you will have retired or switched to a different language by the time the C++ committee finally does delete rand() from the C++ standard library.
I'm creating random inputs to benchmark different implementations of std::sort() using something along the lines of std::vector<int> v(size); std::generate(v.begin(), v.end(), std::rand);
You don't need a cryptographically secure PRNG for that. You don't even need Mersenne Twister. In this particular case, rand() probably is good enough for your needs.
Update
There is a nice simple replacement for rand() and srand() in the C++11 random library: std::minstd_rand.
#include <random>
#include <iostream>
int main ()
{
std:: minstd_rand simple_rand;
// Use simple_rand.seed() instead of srand():
simple_rand.seed(42);
// Use simple_rand() instead of rand():
for (int ii = 0; ii < 10; ++ii)
{
std::cout << simple_rand() << '\n';
}
}
The function std::minstd_rand::operator()() returns a std::uint_fast32_t. However, the algorithm restricts the result to between 1 and 231-2, inclusive. This means the result will always convert safely to a std::int_fast32_t (or to an int if int is at least 32 bits long).
How about randutils by Melissa O'Neill of pcg-random.org?
From the introductory blog post:
randutils::mt19937_rng rng;
std::cout << "Greetings from Office #" << rng.uniform(1,17)
<< " (where we think PI = " << rng.uniform(3.1,3.2) << ")\n\n"
<< "Our office morale is " << rng.uniform('A','D') << " grade\n";
Assuming you want the behavior of the C-style rand and srand functions, including their quirkiness, but with good random, this is the closest I could get.
#include <random>
#include <cstdlib> // RAND_MAX (might be removed soon?)
#include <climits> // INT_MAX (use as replacement?)
namespace replacement
{
constexpr int rand_max {
#ifdef RAND_MAX
RAND_MAX
#else
INT_MAX
#endif
};
namespace detail
{
inline std::default_random_engine&
get_engine() noexcept
{
// Seeding with 1 is silly, but required behavior
static thread_local auto rndeng = std::default_random_engine(1);
return rndeng;
}
inline std::uniform_int_distribution<int>&
get_distribution() noexcept
{
static thread_local auto rnddst = std::uniform_int_distribution<int> {0, rand_max};
return rnddst;
}
} // namespace detail
inline int
rand() noexcept
{
return detail::get_distribution()(detail::get_engine());
}
inline void
srand(const unsigned seed) noexcept
{
detail::get_engine().seed(seed);
detail::get_distribution().reset();
}
inline void
srand()
{
std::random_device rnddev {};
srand(rnddev());
}
} // namespace replacement
The replacement::* functions can be used exactly like their std::* counterparts from <cstdlib>. I have added a srand overload that takes no arguments and seeds the engine with a “real” random number obtained from a std::random_device. How “real” that randomness will be is of course implementation defined.
The engine and the distribution are held as thread_local static instances so they carry state across multiple calls but still allow different threads to observe predictable sequences. (It's also a performance gain because you don't need to re-construct the engine or use locks and potentially trash other people's cashes.)
I've used std::default_random_engine because you did but I don't like it very much. The Mersenne Twister engines (std::mt19937 and std::mt19937_64) produce much better “randomness” and, surprisingly, have also been observed to be faster. I don't think that any compliant program must rely on std::rand being implemented using any specific kind of pseudo random engine. (And even if it did, implementations are free to define std::default_random_engine to whatever they like so you'd have to use something like std::minstd_rand to be sure.)
Abusing the fact that engines return values directly
All engines defined in <random> has an operator()() that can be used to retrieve the next generated value, as well as advancing the internal state of the engine.
std::mt19937 rand (seed); // or an engine of your choosing
for (int i = 0; i < 10; ++i) {
unsigned int x = rand ();
std::cout << x << std::endl;
}
It shall however be noted that all engines return a value of some unsigned integral type, meaning that they can potentially overflow a signed integral (which will then lead to undefined-behavior).
If you are fine with using unsigned values everywhere you retrieve a new value, the above is an easy way to replace usage of std::srand + std::rand.
Note: Using what has been described above might lead to some values having a higher chance of being returned than others, due to the fact that the result_type of the engine not having a max value that is an even multiple of the highest value that can be stored in the destination type.
If you have not worried about this in the past — when using something like rand()%low+high — you should not worry about it now.
Note: You will need to make sure that the std::engine-type::result_type is at least as large as your desired range of values (std::mt19937::result_type is uint_fast32_t).
If you only need to seed the engine once
There is no need to first default-construct a std::default_random_engine (which is just a typedef for some engine chosen by the implementation), and later assigning a seed to it; this could be done all at once by using the appropriate constructor of the random-engine.
std::random-engine-type engine (seed);
If you however need to re-seed the engine, using std::random-engine::seed is the way to do it.
If all else fails; create a helper-function
Even if the code you have posted looks slightly complicated, you are only meant to write it once.
If you find yourself in a situation where you are tempted to just copy+paste what you have written to several places in your code it is recommended, as always when doing copy+pasting; introduce a helper-function.
Intentionally left blank, see other posts for example implementations.
You can create a simple function like this:
#include <random>
#include <iostream>
int modernRand(int n) {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, n);
return dis(gen);
}
And later use it like this:
int myRandValue = modernRand(n);
As mentioned here