Why GCC and MSVC std::normal_distribution are different? [duplicate]

This question already has an answer here:
std::normal_distribution<double> results in wrong order windows versus linux?
(1 answer)
Closed 6 years ago.
I have a simple code sample:
#include <iostream>
#include <random>
using namespace std;
int main() {
    minstd_rand0 gen(1);
    uniform_real_distribution<double> dist(0.0, 1.0);
    for(int i = 0; i < 10; ++i) {
        cout << "1 " << dist(gen) << endl;
    }
    normal_distribution<double> dist2(0.0, 1.0);
    minstd_rand0 gen2(1);
    for(int i = 0; i < 10; ++i) {
        cout << "2 " << dist2(gen2) << endl;
    }
    return 0;
}
which I compile with GCC and MSVC. I get different results from standard-library code!
So why are the GCC and MSVC std::normal_distribution results different for the same seed and generator and, most importantly, how can I force them to be the same?

Unlike the PRN generators defined by the standard, which must produce the same output for the same seed, the standard does not keep that mandate for distributions. From [rand.dist.general]/3:
The algorithms for producing each of the specified distributions are implementation-defined.
So in this case, even though the distribution has to have a density function of the form
$$p(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
how the implementation achieves that is up to it.
The only way to get a portable distribution would be to write one yourself or use a third-party library.

It's problematic, but the standard unfortunately does not specify in detail which algorithm to use for generating many of the random distributions, and there are several valid alternatives with different benefits.
26.6.8.5 Normal distributions [rand.dist.norm]
26.6.8.5.1 Class template normal_distribution [rand.dist.norm.normal]
A normal_distribution random number distribution produces random
numbers x distributed according to the probability density function
$$p(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The distribution parameters μ and σ are also known as this distribution's mean and
standard deviation.
The most common algorithm for generating normally distributed numbers is Box-Muller, but even with that algorithm there are options and variations.
The freedom is even explicitly mentioned in the standard:
26.6.8 Random number distribution class templates [rand.dist]
. . .
3 The
algorithms for producing each of the specified distributions are
implementation-defined.
A go-to option for this is Boost.Random.
By the way, as @Hurkyl points out: it seems that the two implementations are actually the same. For example, Box-Muller generates pairs of values, of which one is returned and one is cached. The two implementations differ only in which of the two values is returned first.
Further, the random number engines are completely specified and will give the same sequence between implementations, but care does need to be taken since the different distributions can also consume different amounts of random data in order to produce their results, which will put the engines out of sync.
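To illustrate the "write one yourself" option, here is a minimal sketch of a Box-Muller transform that draws directly from a fully specified engine. The names to_unit_interval and portable_normal are made up for this example. Because it always consumes exactly two engine outputs per sample and caches nothing, engines stay in sync on every platform; the results should then agree across compilers up to any differences in the floating-point math library (std::log, std::cos, std::sqrt), which the standard does not pin down bit for bit.

```cpp
#include <cmath>
#include <random>

// Hypothetical helper: map one engine output to a double in (0, 1).
// minstd_rand0 yields integers in [1, 2147483646], so dividing by the
// modulus 2147483647 never produces exactly 0 or 1.
inline double to_unit_interval(std::minstd_rand0& gen) {
    return static_cast<double>(gen()) / 2147483647.0;
}

// Hypothetical helper: one normally distributed value via the basic
// Box-Muller transform. It always consumes exactly two engine outputs
// and discards the second value of the pair instead of caching it.
inline double portable_normal(std::minstd_rand0& gen,
                              double mean = 0.0, double stddev = 1.0) {
    const double two_pi = 6.283185307179586;
    const double u1 = to_unit_interval(gen);  // in (0, 1), so log(u1) is finite
    const double u2 = to_unit_interval(gen);
    const double z = std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
    return mean + stddev * z;
}
```

Calling portable_normal(gen) in a loop replaces dist2(gen2) in the question's code. Throwing away the second value of each pair costs some speed, which is the price paid here for a fixed, easy-to-reason-about consumption of engine output.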

Related

What is an efficient method to force uniqueness using rand();

If I used (with appropriate #includes)
int main()
{
    srand(time(0));
    int arr[1000];
    for(int i = 0; i < 1000; i++)
    {
        arr[i] = rand() % 100000;
    }
    return 0;
}
to generate random 5-digit ID numbers (disregard iomanip stuff here), would those ID numbers be guaranteed by rand() to be unique? I've been running another loop to check all the values of the array against the most recently generated ID number, but it takes forever to run, considering the nested 1000-iteration loops. By the way, is there a simple way to do that check?

Since the question was tagged c++11,
you should consider using <random> in place of rand().
Using a standard distribution engine, you can't guarantee that you will get back unique values. If you use a std::set, you can keep retrying until you have the right amount. Depending on your distribution range, and the amount of unique values you are requesting, that may be adequate.
For example, here is a customized function to get n unique values from range [x,y].
#include <cstddef>
#include <iostream>
#include <random>
#include <unordered_set>
// Returns `amount` distinct values from [low, high].
// Loops forever if the range holds fewer than `amount` values.
template <typename T>
std::unordered_set<T> GetUniqueNumbers(std::size_t amount, T low, T high){
    static std::random_device random_device;
    static std::mt19937 engine{random_device()};
    std::uniform_int_distribution<T> dist(low, high);
    std::unordered_set<T> uniques;
    // Duplicates are silently rejected by the set, so keep drawing.
    while (uniques.size() < amount){
        uniques.insert(dist(engine));
    }
    return uniques;
}
int main(){
    // get 10 unique numbers from [0,100]
    auto numbers = GetUniqueNumbers(10,0,100);
    for (auto number: numbers){
        std::cout << number << " ";
    }
}

No, because any guarantee about the output of a random source makes it less random.
There are specific mathematical formulas that have the behavior known as a random permutation. This site seems to have quite a good write-up about it: http://preshing.com/20121224/how-to-generate-a-sequence-of-unique-random-integers/
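When the range is small enough to hold in memory (100000 five-digit IDs in the question), one simple way to get a random permutation is to fill a vector with every candidate and shuffle it. This is a different technique from the one in the linked article, and unique_ids is a name chosen only for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: return `count` distinct values from [0, range).
// Assumes range > 0 and count <= range.
inline std::vector<int> unique_ids(std::size_t count, int range, unsigned seed) {
    std::vector<int> pool(static_cast<std::size_t>(range));
    std::iota(pool.begin(), pool.end(), 0);          // 0, 1, ..., range-1
    std::mt19937 engine(seed);
    std::shuffle(pool.begin(), pool.end(), engine);  // random permutation
    pool.resize(count);                              // keep the first `count`
    return pool;
}
```

For example, unique_ids(1000, 100000, seed) yields 1000 distinct IDs with no duplicate check at all, at the cost of allocating the whole range once.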

No, there is definitely no guarantee that rand will not produce duplicate numbers. Designing it that way would not only be expensive, because it would have to remember every number it has returned so far, but would also greatly reduce its randomness (after it had returned many numbers, you could guess what it is likely to return next from what it has already returned).
If uniqueness is your only goal, just use an incrementing ID number for each thing. If the numbers must also be arbitrary and hard to guess you will have to use some kind of random generator or hash, but should make the numbers much longer to make the chance of a collision much closer to 0.
However if you absolutely must do it the current way I would suggest storing all the numbers you have generated so far into a std::unordered_map and generating another random number if it is already in it.

There is a common uniqueness guarantee in most PRNGs, but it won't help you here. A generator will typically iterate over a finite number of states and not visit the same state twice until every other state has been visited once.
However, a state is not the same thing as the number you get to see. Many states can map to the same number and in the worst possible case two consecutive states could map to the same number.
That said, there are specific configurations of PRNG that can visit every value in a range you specify exactly once before revisiting an old state. Notably, an LCG designed with a modulo that is a multiple of your range can be reduced to exactly your range with another modulo operation. Since most LCG implementations have a power-of-two period, this means that the low-order bits repeat with shorter periods. However, 10000 is not a power of two, so that won't help you.
A simple method is to use an LCG, bitmask it down to a power of two larger than your desired range, and just throw away results that it produces that are out of range.

How to use <random> to replace rand()?

C++11 introduced the header <random> with declarations for random number engines and random distributions. That's great - time to replace those uses of rand() which is often problematic in various ways. However, it seems far from obvious how to replace
srand(n);
// ...
int r = rand();
Based on the declarations it seems a uniform distribution can be built something like this:
std::default_random_engine engine;
engine.seed(n);
std::uniform_int_distribution<> distribution;
auto rand = [&](){ return distribution(engine); };
This approach seems rather involved and is surely something I won't remember unlike the use of srand() and rand(). I'm aware of N4531 but even that still seems to be quite involved.
Is there a reasonably simple way to replace srand() and rand()?

Is there a reasonably simple way to replace srand() and rand()?
Full disclosure: I don't like rand(). It's bad, and it's very easily abused.
The C++11 random library fills in a void that has been lacking for a long, long time. The problem with high quality random libraries is that they're oftentimes hard to use. The C++11 <random> library represents a huge step forward in this regard. A few lines of code and I have a very nice generator that behaves very nicely and that easily generates random variates from many different distributions.
Given the above, my answer to you is a bit heretical. If rand() is good enough for your needs, use it. As bad as rand() is (and it is bad), removing it would represent a huge break with the C language. Just make sure that the badness of rand() truly is good enough for your needs.
C++14 didn't deprecate rand(); it only deprecated functions in the C++ library that use rand(). While C++17 might deprecate rand(), it won't delete it. That means you have several more years before rand() disappears. The odds are high that you will have retired or switched to a different language by the time the C++ committee finally does delete rand() from the C++ standard library.
I'm creating random inputs to benchmark different implementations of std::sort() using something along the lines of std::vector<int> v(size); std::generate(v.begin(), v.end(), std::rand);
You don't need a cryptographically secure PRNG for that. You don't even need Mersenne Twister. In this particular case, rand() probably is good enough for your needs.
Update
There is a nice simple replacement for rand() and srand() in the C++11 random library: std::minstd_rand.
#include <random>
#include <iostream>
int main ()
{
    std::minstd_rand simple_rand;
    // Use simple_rand.seed() instead of srand():
    simple_rand.seed(42);
    // Use simple_rand() instead of rand():
    for (int ii = 0; ii < 10; ++ii)
    {
        std::cout << simple_rand() << '\n';
    }
}
The function std::minstd_rand::operator()() returns a std::uint_fast32_t. However, the algorithm restricts the result to between 1 and 2^31 - 2, inclusive. This means the result will always convert safely to a std::int_fast32_t (or to an int if int is at least 32 bits long).

How about randutils by Melissa O'Neill of pcg-random.org?
From the introductory blog post:
randutils::mt19937_rng rng;
std::cout << "Greetings from Office #" << rng.uniform(1,17)
<< " (where we think PI = " << rng.uniform(3.1,3.2) << ")\n\n"
<< "Our office morale is " << rng.uniform('A','D') << " grade\n";

Assuming you want the behavior of the C-style rand and srand functions, including their quirkiness, but with good random, this is the closest I could get.
#include <random>
#include <cstdlib> // RAND_MAX (might be removed soon?)
#include <climits> // INT_MAX (use as replacement?)
namespace replacement
{
    constexpr int rand_max {
#ifdef RAND_MAX
        RAND_MAX
#else
        INT_MAX
#endif
    };
    namespace detail
    {
        inline std::default_random_engine&
        get_engine() noexcept
        {
            // Seeding with 1 is silly, but required behavior
            static thread_local auto rndeng = std::default_random_engine(1);
            return rndeng;
        }
        inline std::uniform_int_distribution<int>&
        get_distribution() noexcept
        {
            static thread_local auto rnddst = std::uniform_int_distribution<int> {0, rand_max};
            return rnddst;
        }
    } // namespace detail
    inline int
    rand() noexcept
    {
        return detail::get_distribution()(detail::get_engine());
    }
    inline void
    srand(const unsigned seed) noexcept
    {
        detail::get_engine().seed(seed);
        detail::get_distribution().reset();
    }
    inline void
    srand()
    {
        std::random_device rnddev {};
        srand(rnddev());
    }
} // namespace replacement
The replacement::* functions can be used exactly like their std::* counterparts from <cstdlib>. I have added a srand overload that takes no arguments and seeds the engine with a “real” random number obtained from a std::random_device. How “real” that randomness will be is of course implementation defined.
The engine and the distribution are held as thread_local static instances so they carry state across multiple calls but still allow different threads to observe predictable sequences. (It's also a performance gain because you don't need to re-construct the engine or use locks and potentially thrash other people's caches.)
I've used std::default_random_engine because you did but I don't like it very much. The Mersenne Twister engines (std::mt19937 and std::mt19937_64) produce much better “randomness” and, surprisingly, have also been observed to be faster. I don't think that any compliant program must rely on std::rand being implemented using any specific kind of pseudo random engine. (And even if it did, implementations are free to define std::default_random_engine to whatever they like so you'd have to use something like std::minstd_rand to be sure.)

Abusing the fact that engines return values directly
All engines defined in <random> have an operator()() that retrieves the next generated value and advances the internal state of the engine.
std::mt19937 rand (seed); // or an engine of your choosing
for (int i = 0; i < 10; ++i) {
    unsigned int x = rand ();
    std::cout << x << std::endl;
}
Note, however, that all engines return a value of some unsigned integral type, so a returned value may be too large for a signed integral type of the same width (converting such a value is implementation-defined before C++20 and will not give the number you expect).
If you are fine with using unsigned values everywhere you retrieve a new value, the above is an easy way to replace usage of std::srand + std::rand.
Note: Reducing the engine's output to a smaller range (for example with %) can make some values more likely than others, because the number of values the engine's result_type can produce is generally not an exact multiple of the size of the destination range.
If you have not worried about this in the past — when using something like rand()%low+high — you should not worry about it now.
Note: You will need to make sure that the std::engine-type::result_type is at least as large as your desired range of values (std::mt19937::result_type is uint_fast32_t).
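If the bias described in the notes above does matter, the usual remedy is to let a distribution do the range reduction instead of %. A minimal sketch, with rand_between as a made-up helper name:

```cpp
#include <random>

// Hypothetical helper: draw an int from the inclusive range [low, high]
// without modulo bias. The distribution object is cheap to construct,
// so building it per call is acceptable for a sketch.
inline int rand_between(std::mt19937& engine, int low, int high) {
    std::uniform_int_distribution<int> dist(low, high);
    return dist(engine);
}
```

A die roll is then rand_between(engine, 1, 6), with engine seeded once elsewhere.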
If you only need to seed the engine once
There is no need to first default-construct a std::default_random_engine (which is just a typedef for some engine chosen by the implementation) and later assign a seed to it; both can be done at once by using the appropriate constructor of the random engine.
std::random-engine-type engine (seed);
If you however need to re-seed the engine, using std::random-engine::seed is the way to do it.
If all else fails; create a helper-function
Even if the code you have posted looks slightly complicated, you are only meant to write it once.
If you find yourself in a situation where you are tempted to just copy+paste what you have written to several places in your code it is recommended, as always when doing copy+pasting; introduce a helper-function.
Intentionally left blank, see other posts for example implementations.

You can create a simple function like this:
#include <random>
#include <iostream>
int modernRand(int n) {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(0, n);
    return dis(gen);
}
And later use it like this:
int myRandValue = modernRand(n);
As mentioned here
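Note that modernRand constructs a std::random_device and seeds a fresh std::mt19937 on every call, which is comparatively expensive. A possible variant (the name modernRandCached is hypothetical) keeps one engine per thread and reuses it:

```cpp
#include <random>

// Hypothetical variant of modernRand: the engine is constructed and
// seeded once per thread, then reused on every later call.
inline int modernRandCached(int n) {
    static thread_local std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> dis(0, n);  // inclusive range [0, n]
    return dis(gen);
}
```

It is called the same way: int myRandValue = modernRandCached(n);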

several random numbers c++

I am a physicist, writing a program that involves generating several (order of a few billions) random numbers, drawn from a Gaussian distribution. I am trying to use C++11. The generation of these random numbers is separated by an operation that should take very little time. My biggest worry is if the fact that I am generating so many random numbers, with such a little time gap, could potentially lead to sub-optimal performance. I am testing certain statistical properties, which rely heavily on the independence of the randomness of the numbers, so, my result is particularly sensitive to these issues. My question is, with the kinds of numbers I mention below in the code (a simplified version of my actual code), am I doing something obviously (or even, subtly) wrong?
#include <random>
// Several other includes, etc.
int main () {
    int dim_vec(400), nStats(1e8);
    vector<double> vec1(dim_vec), vec2(dim_vec);
    // Initialize the above vectors, which are order 1 numbers.
    random_device rd;
    mt19937 generator(rd());
    double y(0.0);
    double l(0.0);
    for (int i(0);i<nStats;i++)
    {
        for (int j(0);j<dim_vec;j++)
        {
            normal_distribution<double> distribution(0.0,1/sqrt(vec1[j]));
            l=distribution(generator);
            y+=l*vec2[j];
        }
        cout << y << endl;
        y=0.0;
    }
}

The normal_distribution is allowed to have state. And with this particular distribution, it is common to generate numbers in pairs with every other call, and on the odd calls, return the second cached number. By constructing a new distribution on each call you are throwing away that cache.
Fortunately you can "shape" a single distribution by calling with different normal_distribution::param_type's:
normal_distribution<double> distribution;
using P = normal_distribution<double>::param_type;
for (int i(0);i<nStats;i++)
{
    for (int j(0);j<dim_vec;j++)
    {
        l=distribution(generator, P(0.0,1/sqrt(vec1[j])));
        y+=l*vec2[j];
    }
    cout << y << endl;
    y=0.0;
}
I'm not familiar with all implementations of std::normal_distribution. However I wrote the one for libc++. So I can tell you with some amount of certainty that my slight rewrite of your code will have a positive performance impact. I am unsure what impact it will have on the quality, except to say that I know it won't degrade it.
Update
Regarding Severin Pappadeux's comment below about the legality of generating pairs of numbers at a time within a distribution: See N1452 where this very technique is discussed and allowed for:
Distributions sometimes store values from their associated source of
random numbers across calls to their operator(). For example, a common
method for generating normally distributed random numbers is to
retrieve two uniformly distributed random numbers and compute two
normally distributed random numbers out of them. In order to reset the
distribution's random number cache to a defined state, each
distribution has a reset member function. It should be called on a
distribution whenever its associated engine is exchanged or restored.
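The practical consequence of the quoted passage is that re-seeding the engine is not enough to restart a reproducible stream: the distribution may still hold a value computed from the old engine state. A small sketch, where restart_stream is a made-up helper name:

```cpp
#include <random>

// Hypothetical helper: restart a reproducible stream of normal variates.
// The engine is re-seeded and the distribution's cache is cleared, so the
// next value depends only on the new engine state.
inline void restart_stream(std::mt19937& engine,
                           std::normal_distribution<double>& dist,
                           unsigned seed) {
    engine.seed(seed);
    dist.reset();  // drop any cached second value of a pair
}
```

After two calls to restart_stream with the same seed, the first value drawn each time is the same on a given implementation, whether or not a cached value was pending.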

Some thoughts on top of excellent HH answer
Normal distribution (mu,sigma) is generated from normal (0,1) by shift and scale:
N(mu, sigma) = mu + N(0,1)*sigma
If your mean (mu) is always zero, you could simplify and speed up your code (by not adding 0.0) by doing something like
normal_distribution<double> distribution;
for (int i(0);i<nStats;i++)
{
    for (int j(0);j<dim_vec;j++)
    {
        l = distribution(generator);
        y += l*vec2[j]/sqrt(vec1[j]);
    }
    cout << y << endl;
    y=0.0;
}
If speed is of utmost importance, I would try to precompute everything I can outside the main 10^8 loop. Is it possible to precompute sqrt(vec1[j]) so you save on the sqrt() call? Is it possible to have vec2[j]/sqrt(vec1[j]) as a single vector?
If it is not possible to precompute those vectors, I would try to save on memory access. Keeping pieces of vec2[j] and vec1[j] together might help with fetching one cache line instead of two. So declare vector<pair<double,double>> vec12(dim_vec); and use in sampling y+=l*vec12[j].first/sqrt(vec12[j].second)
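If the vectors do stay fixed across the outer loop, the precomputation could look roughly like this. It is a sketch under the assumptions that vec1 and vec2 have the same length and that every vec1[j] is positive; precompute_scale and sample_y are names invented here.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: scale[j] = vec2[j] / sqrt(vec1[j]), computed once.
inline std::vector<double> precompute_scale(const std::vector<double>& vec1,
                                            const std::vector<double>& vec2) {
    std::vector<double> scale(vec1.size());
    for (std::size_t j = 0; j < vec1.size(); ++j) {
        scale[j] = vec2[j] / std::sqrt(vec1[j]);
    }
    return scale;
}

// One sample of y = sum_j N(0,1) * scale[j]. The inner loop has no sqrt
// and no division, and a single distribution object is reused so its
// internal cache is not thrown away.
inline double sample_y(std::mt19937& generator,
                       std::normal_distribution<double>& distribution,
                       const std::vector<double>& scale) {
    double y = 0.0;
    for (double s : scale) {
        y += distribution(generator) * s;
    }
    return y;
}
```

The main loop then reduces to calling sample_y(generator, distribution, scale) nStats times and printing each result.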

Get random number in sequence C++

Is there a way using the C++ standard library built in random generator to get a specific random number in a sequence, without saving them all?
Like
srand(cTime);
getRand(1); // 10
getRand(2); // 8995
getRand(3); // 65464456
getRand(1); // 10
getRand(2); // 8995
getRand(1); // 10
getRand(3); // 65464456

C++11 random number engines are required to implement a member function discard(unsigned long long z) (§26.5.1.4) that advances the random number sequence by z steps. The complexity guarantee is quite weak: "no worse than the complexity of z consecutive calls e()". This member obviously exists solely to make it possible to expose more performant implementations when possible, as note 274 states:
This operation is common in user code, and can often be implemented
in an engine-specific manner so as to provide significant performance
improvements over an equivalent naive loop that makes z consecutive
calls e().
Given discard you can easily implement your requirement to retrieve the nth number in sequence by reseeding a generator, discarding n-1 values and using the next generated value.
I'm unaware of which - if any - of the standard RNG engines are amenable to efficient implementations of discard. It may be worth your time to do a bit of investigation and profiling.
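A minimal sketch of that approach, using std::mt19937 as the engine and a made-up function name nth_random; it assumes a 1-based index with n >= 1:

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: the n-th value (1-based) of the mt19937 sequence
// started from `seed`, obtained without storing the earlier values.
inline std::uint_fast32_t nth_random(std::uint_fast32_t seed,
                                     unsigned long long n) {
    std::mt19937 engine(seed);  // reseed from scratch on every call
    engine.discard(n - 1);      // skip the first n-1 values
    return engine();
}
```

Each call costs at least the engine construction plus whatever discard costs for that engine, so this trades time for memory compared with caching the values.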

You have to save the numbers. There may be other variants, but it still requires saving a list of numbers (e.g. using different seeds based on the argument to getRand() - but that wouldn't really be beneficial over saving them).
Something like this would work reasonably well, I'd say:
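int getRand(int n)
{
    static std::map<int, int> mrand;
    // Return the stored value if this index was requested before.
    std::map<int, int>::iterator it = mrand.find(n);
    if (it != mrand.end())
    {
        return it->second;
    }
    // First request for this index: draw a value and remember it.
    int r = rand();
    mrand[n] = r;
    return r;
}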
(I haven't compiled this code, just written it up as a "this sort of thing might work")

Implement getRand() to always seed and then return the given number. This will interfere with all other random numbers in a system, though, and will be slow, especially for large indexes. Assuming a 1-based index:
int getRand(int index)
{
    srand(999); // fix the seed
    for (int loop=1; loop<index; ++loop)
        rand();
    return rand();
}

Similar to cdmh's post,
the following C++11 code could also be used:
#include <random>
long getrand(int index)
{
    // Default-constructed, so the engine starts from the same seed on every call.
    std::default_random_engine e;
    for(auto i=1;i<index;i++)
        e();
    return e();
}

Check out:
Random123
From the documentation:
Random123 is a library of "counter-based" random number generators (CBRNGs), in which the Nth random number can be obtained by applying a stateless mixing function to N..
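To show the counter-based idea in isolation, here is a small sketch that is not Random123's API: it applies the SplitMix64 finalizer as the stateless mixing function. The names mix64 and counter_rand are invented for this example.

```cpp
#include <cstdint>

// Stateless 64-bit mixing function (the SplitMix64 finalizer).
// It is a bijection on 64-bit integers.
inline std::uint64_t mix64(std::uint64_t z) {
    z ^= z >> 30;
    z *= 0xBF58476D1CE4E5B9ULL;
    z ^= z >> 27;
    z *= 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return z;
}

// Hypothetical getRand(n): the n-th number depends only on (seed, n),
// so any index can be computed directly, in any order, with no stored state.
inline std::uint64_t counter_rand(std::uint64_t seed, std::uint64_t n) {
    return mix64(seed + n * 0x9E3779B97F4A7C15ULL);
}
```

With this, counter_rand(seed, 1) always returns the same value no matter which other indices were requested in between, which is the behavior asked for in the question.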

Generating two independent random number sequences (C++)

I'd like to be able to do something like this (obviously not valid C++):
rng1 = srand(x)
rng2 = srand(y)
//rng1 and rng2 give me two separate sequences of random numbers
//based on the srand seed
rng1.rand()
rng2.rand()
Is there any way to do something like this in C++? For example in Java I can create two java.util.Random objects with the seeds I want. It seems there is only a single global random number generator in C++. I'm sure there are libraries that provide this functionality, but anyway to do it with just C++?

Use rand_r.

In TR1 (and C++0x), you could use the tr1/random header. It should be built-in for modern C++ compilers (at least for g++ and MSVC).
#include <tr1/random>
// use #include <random> on MSVC
#include <iostream>
int main() {
    std::tr1::mt19937 m1 (1234); // <-- seed x
    std::tr1::mt19937 m2 (5678); // <-- seed y
    std::tr1::uniform_int<int> distr(0, 100);
    for (int i = 0; i < 20; ++ i) {
        std::cout << distr(m1) << "," << distr(m2) << std::endl;
    }
    return 0;
}
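With a C++11 compiler the same idea works with the <random> header directly. The sketch below wraps an engine and its own distribution in a small class so that it resembles the rng1.rand() usage from the question; the class name Rng and the fixed range [0, 100] are assumptions made for the example.

```cpp
#include <random>

// Hypothetical wrapper, loosely modeled on java.util.Random: each object
// owns its own engine and its own distribution, so two objects share no state.
class Rng {
public:
    explicit Rng(unsigned seed) : engine_(seed), dist_(0, 100) {}

    // Next value in the inclusive range [0, 100].
    int rand() { return dist_(engine_); }

private:
    std::mt19937 engine_;
    std::uniform_int_distribution<int> dist_;
};
```

Usage would be Rng rng1(x), rng2(y); followed by rng1.rand() and rng2.rand(). Because nothing is shared, drawing from rng2 does not change what rng1 returns next.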

You could also use Boost.Random.
More technical documentation here.

I just want to point out, that using different seeds may not give you statistically independent random sequences. mt19937 is an exception. Two mt19937 objects initialized with different seeds will give you more or less (depending whom you ask) statistically independent sequences with very high probability (there is a small chance that the sequences will overlap). Java's standard RNG is notoriously bad. There are plenty of implementations of mt19937 for Java, which should be preferred over the stock RNG.

As @James McNellis said, I can't imagine why you would do that, or what you would gain from it. Describe what effect you would like to achieve.

For whatever reason, the following generators interfere with each other. I need two independent generators for a task and need to reconstruct the streams. I haven't dug into the code but the std::tr1 and C++11 generators seem to share states in common. Adding m2 below changes what m1 will deliver.
std::tr1::mt19937 m1 (1234); // <-- seed x
std::tr1::mt19937 m2 (5678); // <-- seed y
I had to build my own to ensure independence.
