How many random numbers does std::uniform_real_distribution use? - c++

I was surprised to see that the output of this program:
#include <iostream>
#include <random>
int main()
{
std::mt19937 rng1;
std::mt19937 rng2;
std::uniform_real_distribution<double> dist;
double random = dist(rng1);
rng2.discard(2);
std::cout << (rng1() - rng2()) << "\n";
return 0;
}
is 0 - i.e. std::uniform_real_distribution uses two random numbers to produce a random double value in the range [0,1). I thought it would just generate one and rescale that. After thinking about it I guess that this is because std::mt19937 produces 32-bit ints and double is twice this size and thus not "random enough".
Question: How do I find out this number generically, i.e. if the random number generator and the floating point type are arbitrary types?
Edit: I just noticed that I could use std::generate_canonical instead, as I am only interested in random numbers of [0,1). Not sure if this makes a difference.

For template<class RealType, size_t bits, class URNG> std::generate_canonical the standard (section 27.5.7.2) explicitly defines the number of calls to the uniform random number generator (URNG) to be
$k = \max(1, \lceil b / \log_2 R \rceil)$,
where b is the minimum of the number of bits in the mantissa of the RealType and the number of bits given to generate_canonical as a template parameter.
R is the range of numbers the URNG can return (URNG::max()-URNG::min()+1).
However, in your example this will not make any difference, since you need 2 calls to the mt19937 to fill the 53 bits of the mantissa of the double.
For other distributions the standard does not provide a generic way to get any information on how many numbers the URNG has to generate to obtain one number of the distribution.
A reason might be that for some distributions the number of uniform random numbers required to generate a single number of the distribution is not fixed and may vary from call to call. An example is std::poisson_distribution, which is usually implemented as a loop that draws a uniform random number in each iteration until the product of these numbers has reached a certain threshold (see for example the implementation of the GNU C++ library (lines 1523-1528)).

Related

Generating pseudo-random 16-bit integers

I need to generate 16-bit pseudo-random integers and I am wondering what the best choice is.
The obvious way that comes in my mind is something as follows:
std::random_device rd;
auto seed_data = std::array<int, std::mt19937::state_size> {};
std::generate(std::begin(seed_data), std::end(seed_data), std::ref(rd));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 generator(seq);
std::uniform_int_distribution<short> dis(std::numeric_limits<short>::min(),
std::numeric_limits<short>::max());
short n = dis(generator);
The problem I see here is that std::mt19937 produces 32-bit unsigned integers since it's defined as this:
using mt19937 = mersenne_twister_engine<unsigned int,
32, 624, 397,
31, 0x9908b0df,
11, 0xffffffff,
7, 0x9d2c5680,
15, 0xefc60000,
18, 1812433253>;
That means static casting is done and only the least significant part of these 32-bit integers is used by the distribution. So I am wondering how good this series of pseudo-random shorts is, and I don't have the mathematical expertise to answer that.
I expect that a better solution would be to define your own mersenne_twister_engine for 16-bit integers. However, I haven't found any documented set of template arguments for that (requirements can be found here, for instance). Are there any?
UPDATE: I updated the code sample with proper initialization for the distribution.
Your way is indeed the correct way.
The mathematical arguments are complex (I'll try to dig out a paper), but taking the least significant bits of the Mersenne Twister, as implemented by the C++ standard library, is the correct thing to do.
If you're in any doubt as to the quality of the sequence, then run it through the diehard tests.
There may be a misconception, considering this quote from OP's question (emphasis mine):
The problem I see here is that std::mt19937 produces 32-bit unsigned integers […].
That means static casting is done and only the least significant part of these 32-bit integers is used by the distribution.
That's not how it works.
The following are quotes from https://en.cppreference.com/w/cpp/numeric/random
The random number library provides classes that generate random and
pseudo-random numbers. These classes include:
Uniform random bit generators (URBGs), […];
Random number distributions (e.g. uniform, normal, or poisson distributions) which convert the output of URBGs into various statistical distributions
URBGs and distributions are designed to be used together to produce random values.
So a uniform random bit generator, like mt19937 or random_device
is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned.
While a random number distribution, like uniform_int_distribution
post-processes the output of a URBG in such a way that resulting output is distributed according to a defined statistical probability density function.
The way it's done uses all the bits from the source to produce an output. As an example, we can look at the implementation of std::uniform_int_distribution in libstdc++ (starting at line 824), which can be roughly simplified as
#include <limits>
#include <type_traits>

template <typename Type>
class uniform_distribution
{
    Type a_ = 0, b_ = std::numeric_limits<Type>::max();
public:
    uniform_distribution(Type a, Type b) : a_{a}, b_{b} {}

    template <typename URBG>
    Type operator()(URBG &gen)
    {
        using urbg_type = std::make_unsigned_t<typename URBG::result_type>;
        using u_type = std::make_unsigned_t<Type>;
        using max_type = std::conditional_t<(sizeof(urbg_type) > sizeof(u_type)),
                                            urbg_type, u_type>;

        urbg_type urbg_min = gen.min();
        urbg_type urbg_max = gen.max();
        urbg_type urbg_range = urbg_max - urbg_min;

        // Width of the requested range, computed in unsigned arithmetic to avoid overflow.
        max_type urange = max_type(b_) - max_type(a_);
        // How many generator values map onto one output value.
        // (The real implementation combines several calls when the generator's
        // range is smaller than the requested one; that case is omitted here.)
        max_type udenom = urbg_range <= urange ? 1 : urbg_range / (urange + 1);

        max_type ret;
        // Rejection loop: the calculation may require more than one call to the generator.
        do {
            // With OP's parameters this is 'ret = gen() / 65535':
            // not a simple cast or bit shift.
            ret = (urbg_type(gen()) - urbg_min) / udenom;
        } while (ret > urange);

        return static_cast<Type>(a_ + ret);
    }
};
This could be tested HERE.

What is an efficient method to force uniqueness using rand();

If I used (with appropriate #includes)
int main()
{
srand(time(0));
int arr[1000];
for(int i = 0; i < 1000; i++)
{
arr[i] = rand() % 100000;
}
return 0;
}
To generate random 5-digit ID numbers (disregard iomanip stuff here), would those ID numbers be guaranteed by rand() to be unique? I've been running another loop to check each newly generated ID number against all the values in the array, but it takes forever to run, considering the nested 1000-iteration loops. By the way, is there a simple way to do that check?
Since the question was tagged c++11, you should consider using <random> in place of rand().
Using a standard distribution, you can't guarantee that you will get back unique values. If you use a std::set (or std::unordered_set), you can keep retrying until you have the right amount. Depending on your distribution range and the number of unique values you are requesting, that may be adequate.
For example, here is a customized function to get a given amount of unique values from the range [low, high].
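#include <cstddef>
#include <iostream>
#include <random>
#include <stdexcept>
#include <unordered_set>

// Returns `amount` distinct values drawn uniformly from [low, high].
template <typename T>
std::unordered_set<T> GetUniqueNumbers(std::size_t amount, T low, T high){
    // The range must hold at least `amount` values, otherwise the loop below never ends.
    if (static_cast<long long>(high) - static_cast<long long>(low) + 1
            < static_cast<long long>(amount))
        throw std::invalid_argument("range too small for the requested amount");
    static std::random_device random_device;
    static std::mt19937 engine{random_device()};
    std::uniform_int_distribution<T> dist(low, high);
    std::unordered_set<T> uniques;
    // Duplicates are silently rejected by the set, so keep drawing until it is full.
    while (uniques.size() < amount){
        uniques.insert(dist(engine));
    }
    return uniques;
}
int main(){
    //get 10 unique numbers between [0,100]
    auto numbers = GetUniqueNumbers(10, 0, 100);
    for (auto number: numbers){
        std::cout << number << " ";
    }
}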
No, because any guarantee about the output of a random source makes it less random.
There are specific mathematical formulas that have the behavior known as a random permutation. This site seems to have quite a good write-up about it: http://preshing.com/20121224/how-to-generate-a-sequence-of-unique-random-integers/
No, there is definitely no guarantee that rand will not produce duplicate numbers. Designing it that way would not only be expensive, since it would have to remember all the numbers it has returned so far, but would also greatly reduce its randomness (after it had returned many numbers, you could guess what it is likely to return from what it had already returned).
If uniqueness is your only goal, just use an incrementing ID number for each thing. If the numbers must also be arbitrary and hard to guess you will have to use some kind of random generator or hash, but should make the numbers much longer to make the chance of a collision much closer to 0.
However, if you absolutely must do it the current way, I would suggest storing all the numbers you have generated so far in a std::unordered_set (or a std::unordered_map) and generating another random number whenever the new one is already in it.
There is a common uniqueness guarantee in most PRNGs, but it won't help you here. A generator will typically iterate over a finite number of states and not visit the same state twice until every other state has been visited once.
However, a state is not the same thing as the number you get to see. Many states can map to the same number and in the worst possible case two consecutive states could map to the same number.
That said, there are specific configurations of PRNG that visit every value in a range you specify exactly once before revisiting an old state. Notably, an LCG whose modulus is a multiple of your range can be reduced to exactly your range with another modulo operation. Since most LCG implementations use a power-of-two modulus, this means that the low-order bits repeat with shorter periods. However, 100000 is not a power of two, so that won't help you.
A simple method is to use an LCG, bitmask it down to a power of two larger than your desired range, and just throw away results that it produces that are out of range.

std::uniform_real_distribution - get all possible numbers

I would like to create a std::uniform_real_distribution able to generate a random number in the range [MIN_FLOAT, MAX_FLOAT]. Following is my code:
#include <random>
#include <limits>
using namespace std;
int main()
{
const auto a = numeric_limits<float>::lowest();
const auto b = numeric_limits<float>::max();
uniform_real_distribution<float> dist(a, b);
return 0;
}
The problem is that when I execute the program, it is aborted because a and b seem to be invalid arguments. How should I fix it?
uniform_real_distribution's constructor requires:
a ≤ b and b − a ≤ numeric_limits<RealType>::max().
That last one is not possible for you, since the difference between lowest and max, by definition, must be larger than max (and will almost certainly be INF).
There are several ways to resolve this. The simplest, as Nathan pointed out, is to just use a uniform_real_distribution<double>. Unless double on your implementation cannot store the range of a float (and an IEEE-754 binary64 can represent the whole range of binary32), this ought to work. You would still be passing the numeric_limits for a float, but since the distribution uses double, it can handle the math for the increased range.
Alternatively, you could combine a uniform_real_distribution<float> with a boolean uniform_int_distribution (that is, one that selects between 0 and 1). Your real distribution should be over the positive numbers, up to max. Every time you get a number from the real distribution, get one from the int distribution too. If the integer is 1, then negate the real value.
This has the downside of making the probability of zero slightly higher than the probability of other numbers, since positive and negative zero are the same thing.

Reseeding std::rand / c++11 <random>

I am trying to implement a simple noise function that takes two integers and returns a random float based on the seed combined with the two parameters.
Using std::mt19937 works great, but for some reason when I try to use srand with rand(), I get repeated numbers.
Note: Using the C++11 seed member function in a loop is really, really slow.
Here are two terrains using both methods (with the same reseeding numbers):
c++11 random:
std::random_device rd;
std::mt19937 g{ rd() };
std::uniform_real_distribution<float> gen{ -1.0, 1.0 };
float getNoise(int x, int z) {
g.seed(x * 523234 + z * 128354 + seed);
return gen(g);
}
c random:
float getNoise(int x, int z) {
std::srand(x * 523234 + z * 128354 + seed);
return static_cast<float>(std::rand()) / RAND_MAX * 2.0f - 1.0f;
}
To the questions:
Is there a faster way to reseed the c++11 pseudo-random number ?
Why doesn't the srand work as expected?
Thanks in advance.
EDIT:
OK, sorry for not being clear. I know I may be wrong, but let me try to explain again: I use reseeding because I use the same x and z coordinates when iterating (not in the same iteration).
If I remove the reseeding I will get this result:
Please don't say I shouldn't reseed; I want to reseed on purpose.
You are purposely breaking it and asking us why it is broken, with the caveat that we aren't allowed to mention the gorilla in the room.
Don't reseed.
[edit]
Alright, as per comment request, here's an answer:
1) No, there is no faster way to reseed a PRNG, which you shouldn't be doing anyway. Properly, you should be seeding and then “warming up” the PRNG by discarding a few thousand values.
2) The reason rand() (and, even though you don't believe it, any other PRNG you use) doesn't work is because your getNoise() function is incorrect.
Your third image is correct. It is the result you should expect from simply returning a clamped random value.
You have attempted to modulate it by messing with the seed and, because of an apparent visual goodness in your first attempt, concluded that it is the correct method. However, what is really happening is you are simply crippling the PRNG and seeing the result of that. (It is more clear in the rand() attempt because its seed more crudely defines the resulting sequence, which itself has a smaller period than the Mersenne Twister.)
(Attempting to modify it by skewing the (x,z) coordinate is also a red herring. It doesn't affect the actual randomness of the output.)
TL;DR
You're doing it wrong.
If you want to generate terrain maps, you should google around fractal terrain generation. In fact, here's a decent link for you: http://www.gameprogrammer.com/fractal.html
You will find that it takes a little more work to do it, but that the methods are very pleasingly simple and that you can very easily tweak them to modify your results.
Hope this helps.
The random number generator generates a sequence of random values from an initial seed; it is not meant to produce single random values as a function of a seed. So it should be seeded once with g.seed(seed) and then called in a fixed order for all (x, z) values, without reseeding each time. This gives random values efficiently, with the expected distribution.
For example:
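#include <cstddef>
#include <random>

std::mt19937 g;  // seeded explicitly in generateNoise()
std::uniform_real_distribution<float> gen{ -1.0f, 1.0f };
constexpr std::size_t nx = 100;
constexpr std::size_t nz = 100;
float noise[nx][nz];
// Seed once, then draw one value per (x, z) in a fixed order.
void generateNoise(std::mt19937::result_type seed) {
    g.seed(seed);
    gen.reset();  // drop any cached distribution state so the same seed gives the same map
    for (std::size_t x = 0; x < nx; ++x)
        for (std::size_t z = 0; z < nz; ++z)
            noise[x][z] = gen(g);
}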
I don't see why you'd want to continuously re-seed - that seems pointless and slow. But that's not what you are asking, so...
rand produces very poor quality random numbers. Its period is short, it is usually based on a linear congruential generator (not good), and the seed size is very small. Don't use it; <random> exists for a reason.
The way you seed srand depends heavily on the x and z values you pass in, which you then multiply by large numbers. That likely leads to overflow (signed integer overflow is undefined behavior) and truncation when the result is passed to srand, meaning that (due to the limited number of possible seed values) you'll be reusing the same seed(s) often.
Some relevant links you may want to visit:
http://en.cppreference.com/w/cpp/numeric/random
https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful

Generating number (0,1) using mersenne twister c++

I'm working on implementing R code into C++ so that it runs faster, but I am having difficulties implementing mersenne twister. I only wish to generate values between (0,1). Here is what I have that pertains to this question.
#include <random>
std::mt19937 generator (123);
std::cout << "Random value: " << generator() << std:: endl;
I tried dividing by RAND_MAX, but that did not produce the values that I was looking for.
Thanks in advance.
In C++11 the concepts of "(pseudo) random generator" and "probability distribution" are separated, and for good reasons.
What you want can be achieved with the following lines:
std::mt19937 generator (123);
std::uniform_real_distribution<double> dis(0.0, 1.0);
double randomRealBetweenZeroAndOne = dis(generator);
If you want to understand why this separation is necessary, and why applying a plain division or range manipulation to the output of the generator is a bad idea, watch this video.
You may want to consider code like this:
// For pseudo-random number generators and distributions
#include <random>
...
// Use random_device to generate a seed for Mersenne twister engine.
std::random_device rd{};
// Use Mersenne twister engine to generate pseudo-random numbers.
std::mt19937 engine{rd()};
// "Filter" MT engine's output to generate pseudo-random double values,
// **uniformly distributed** on the half-open interval [0, 1).
// (Note that the range is [inclusive, exclusive).)
std::uniform_real_distribution<double> dist{0.0, 1.0};
// Generate pseudo-random number.
double x = dist(engine);
For more details on generating pseudo-random numbers in C++ (including reasons why rand() is not good), see this video by Stephan T. Lavavej (from Going Native 2013):
rand() Considered Harmful
std::mt19937 does not generate between 0 and RAND_MAX like rand(), but between 0 and 2^32-1
And by the way, the class provides min() and max() values!
You need to convert the value to a double, subtract min() and divide by max()-min():
#include <cstdint>
#include <limits>
std::uint32_t val = generator();
// Closed interval [0, 1]; only 32 random bits per double.
double doubleval = (static_cast<double>(val) - std::mt19937::min())
                 / (std::mt19937::max() - std::mt19937::min());
or (less generic)
std::uint32_t val = generator();
double doubleval = val * (1.0 / std::numeric_limits<std::uint32_t>::max());