I need to generate 16-bit pseudo-random integers and I am wondering what the best choice is.
The obvious way that comes to mind is something like the following:
std::random_device rd;
auto seed_data = std::array<int, std::mt19937::state_size> {};
std::generate(std::begin(seed_data), std::end(seed_data), std::ref(rd));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 generator(seq);
std::uniform_int_distribution<short> dis(std::numeric_limits<short>::min(),
                                         std::numeric_limits<short>::max());
short n = dis(generator);
The problem I see here is that std::mt19937 produces 32-bit unsigned integers since it's defined as this:
using mt19937 = mersenne_twister_engine<unsigned int,
32, 624, 397,
31, 0x9908b0df,
11, 0xffffffff,
7, 0x9d2c5680,
15, 0xefc60000,
18, 1812433253>;
That means static casting is done and only the least significant part of these 32-bit integers is used by the distribution. So I am wondering how good these series of pseudo-random shorts are; I don't have the mathematical expertise to answer that myself.
I expect that a better solution would be to define your own mersenne_twister_engine for 16-bit integers. However, I haven't found any recommended set of template arguments for that (the requirements can be found here, for instance). Are there any?
UPDATE: I updated the code sample with proper initialization for the distribution.
Your way is indeed the correct way.
The mathematical arguments are complex (I'll try to dig out a paper), but taking the least significant bits of the Mersenne Twister, as implemented by the C++ standard library, is the correct thing to do.
If you're in any doubt as to the quality of the sequence, then run it through the diehard tests.
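For example, a minimal sketch (my own, reusing the seeding from the question) that dumps the 16-bit stream as raw bytes so it can be piped into dieharder, PractRand or a similar suite that accepts a raw binary stream:
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <limits>
#include <random>

int main()
{
    std::random_device rd;
    std::array<int, std::mt19937::state_size> seed_data{};
    std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    std::mt19937 generator(seq);
    std::uniform_int_distribution<short> dis(std::numeric_limits<short>::min(),
                                             std::numeric_limits<short>::max());
    for (;;) {   // runs until killed; pipe stdout into the test program
        std::uint16_t value = static_cast<std::uint16_t>(dis(generator));
        std::fwrite(&value, sizeof value, 1, stdout);   // raw binary stream
    }
}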
There may be a misconception, considering this quote from OP's question (emphasis mine):
The problem I see here is that std::mt19937 produces 32-bit unsigned integers […].
That means static casting is done and only the least significant part of these 32-bit integers is used by the distribution.
That's not how it works.
The following are quotes from https://en.cppreference.com/w/cpp/numeric/random
The random number library provides classes that generate random and
pseudo-random numbers. These classes include:
Uniform random bit generators (URBGs), […];
Random number distributions (e.g. uniform, normal, or poisson distributions) which convert the output of URBGs into various statistical distributions
URBGs and distributions are designed to be used together to produce random values.
So a uniform random bit generator, like mt19937 or random_device
is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned.
While a random number distribution, like uniform_int_distribution
post-processes the output of a URBG in such a way that resulting output is distributed according to a defined statistical probability density function.
The way it's done uses all the bits from the source to produce an output. As an example, we can look at the implementation of std::uniform_int_distribution in libstdc++ (starting at line 824), which can be roughly simplified as
template <typename Type>
class uniform_distribution
{
    Type a_ = 0, b_ = std::numeric_limits<Type>::max();
public:
    uniform_distribution(Type a, Type b) : a_{a}, b_{b} {}

    template<typename URBG>
    Type operator() (URBG &gen)
    {
        using urbg_type = std::make_unsigned_t<typename URBG::result_type>;
        using u_type    = std::make_unsigned_t<Type>;
        using max_type  = std::conditional_t<(sizeof(urbg_type) > sizeof(u_type)),
                                             urbg_type, u_type>;

        urbg_type urbg_min   = gen.min();
        urbg_type urbg_max   = gen.max();
        urbg_type urbg_range = urbg_max - urbg_min;

        max_type urange = b_ - a_;
        max_type udenom = urbg_range <= urange ? 1 : urbg_range / (urange + 1);

        Type ret;
        // Note that the calculation may require more than one call to the generator
        do
        {
            ret = (urbg_type(gen()) - urbg_min) / udenom;
            // which is 'ret = gen / 65535' with OP's parameters
            // not a simple cast or bit shift
        } while (ret > b_ - a_);

        return ret + a_;
    }
};
This could be tested HERE.
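As a quick sanity check (my own sketch, not the linked demo), the simplified class above can be driven directly by std::mt19937; the point is that each output consumes one full 32-bit value from the generator rather than a truncated cast:
#include <iostream>
#include <limits>
#include <random>
#include <type_traits>

// uniform_distribution is the simplified sketch shown above.

int main()
{
    std::mt19937 gen{std::random_device{}()};
    uniform_distribution<int> roll(1, 6);   // a toy six-sided die
    for (int i = 0; i < 10; ++i)
        std::cout << roll(gen) << ' ';      // each call consumes at least one full gen() value
    std::cout << '\n';
}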
Related
For unit tests I implemented a mock random number generator. I believe that this is a valid implementation of UniformRandomBitGenerator (the mock actually uses google mock to set the return of operator(), but it behaves the same).
#include <cstddef>
#include <limits>

struct RNG
{
    using result_type = std::size_t;

    static result_type min() { return 0; }
    static result_type max() { return std::numeric_limits<result_type>::max(); }

    result_type operator()() { return max(); }
};
Now I use this mock to sample from std::uniform_int_distribution in the range [a, b], a == b. I believe this is allowed; the only restriction I have found here on the parameters of the distribution is b >= a. So I would expect the following program to print 5.
#include <cstdio>
#include <random>

int main()
{
    auto rng = RNG();
    auto dist = std::uniform_int_distribution<>(5, 5);
    printf("%d\n", dist(rng));
    return 0;
}
Instead it goes into an infinite loop inside the STL, repeatedly drawing numbers from the generator but failing to find a number within the specified range. I tested different (current) compilers (including clang, gcc and icc) in different versions. Having RNG::max return other values (e.g. 42) doesn't change anything.
The real code I'm testing draws a random index into a container which may contain only one element. It would be easy to check this condition but it's a rare case and I would like to avoid it.
Am I missing something in the specification of RNGs in the STL? I'd be surprised to find a bug in ALL compilers ...
A uniform distribution is usually achieved with rejection sampling. You keep requesting random numbers until you get one that meets the criteria. You've set up a situation where the criteria can't be met, because your random number generator is very non-random, so it results in an infinite loop.
The standard says ([rand.dist.uni.int]):
A uniform_int_distribution random number distribution produces random integers i,
a ≤ i ≤ b, distributed according to the constant discrete probability function
P(i|a,b)=1/(b−a+1)
. . .
explicit uniform_int_distribution(IntType a = 0, IntType b = numeric_limits<IntType>::max());
Requires: a ≤ b.
So uniform_int_distribution<>(5,5) should return 5 with probability 1/1.
Implementations that go into an infinite loop instead have a bug.
However, your mock RNG, which always generates the same value, doesn't satisfy the uniform random bit generator requirements:
A uniform random bit generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned. [ Note: The degree to which g's results approximate the ideal is often determined statistically. — end note ]
See [req.genl]/p1.b:
Throughout this subclause [rand], the effect of instantiating a template:
b) that has a template type parameter named URBG is undefined unless the corresponding template argument is cv-unqualified and satisfies the requirements of uniform random bit generator.
Sure enough, with a standard RNG it just works:
#include <iostream>
#include <random>
int main() {
    std::mt19937_64 rng;
    std::uniform_int_distribution<> dist(5, 5);
    std::cout << dist(rng) << "\n";
}
Prints:
5
I'm just starting to use C++11's <random> header for the first time, but there are still some things that seem a bit mysterious. This question is about the intended, idiomatic, best-practice way to accomplish a very simple task.
Currently, in one part of my code I have something like this:
std::default_random_engine eng {std::random_device{}()};
std::uniform_int_distribution<> random_up_to_A {0, A};
std::uniform_int_distribution<> random_up_to_B {0, B};
std::uniform_int_distribution<> random_up_to_some_other_constant {0, some_other_constant};
and then when I want an integer between 0 and B I call random_up_to_B(eng).
Since this is starting to look a bit silly, I want to implement a function rnd such that rnd(n, eng) returns a random integer between 0 and n.
Something like the following ought to work
template <class URNG>
int rnd(int n, URNG &eng) {
    std::uniform_int_distribution<> dist {0, n};
    return dist(eng);
}
but that involves creating a new distribution object every time, and I get the impression that's not the way you're supposed to do it.
So my question is, what is the intended, best-practice way to accomplish this simple task, using the abstractions provided by the <random> header? I ask because I'm bound to want to do much more complicated things than this later on, and I want to make sure I'm using this system in the right way.
uniform_int_distribution should not be expensive to construct, so creating one every time with new limits should be OK. However, there is a way to use the same object with new limits, but it is cumbersome.
uniform_int_distribution::operator() has an overload that takes a uniform_int_distribution::param_type object which can specify the new limits to be used, but param_type itself is an opaque type, and there's no portable way to construct one except extracting it from an existing uniform_int_distribution instance. For instance, the following function can be used to construct a uniform_int_distribution::param_type.
std::uniform_int_distribution<>::param_type
make_param_type(int min, int max)
{
    return std::uniform_int_distribution<>(min, max).param();
}
Pass these to operator() and the generated result will be in the specified range.
Live demo
So if you really want to reuse the same uniform_int_distribution, create and save multiple instance of param_type using the function above, and use these when calling operator().
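For instance, a hedged usage sketch (the generator choice is arbitrary) that reuses one distribution object with two saved param_type values built by the helper above:
#include <iostream>
#include <random>

// make_param_type is the helper defined above.

int main()
{
    std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<> dist;        // its own limits are irrelevant here

    auto d6   = make_param_type(1, 6);
    auto d100 = make_param_type(1, 100);

    std::cout << dist(gen, d6) << ' ' << dist(gen, d100) << '\n';
}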
The answer above is inaccurate, because the standard does specify that the param_type can be constructed from the same distribution arguments as those used by the corresponding distribution type's constructor. Thanks to @T.C. for pointing this out.
From §26.5.1.6/9 [rand.req.dist]
For each of the constructors of D taking arguments corresponding to parameters of the distribution, P shall have a corresponding constructor subject to the same requirements and taking arguments identical in number, type, and default values. ...
So we don't need to construct the distribution object needlessly only to extract the param_type. Instead the make_param_type function can be modified to
template <typename Distribution, typename... Args>
typename Distribution::param_type make_param_type(Args&&... args)
{
    return typename Distribution::param_type(std::forward<Args>(args)...);
}
which can be used as
make_param_type<std::uniform_int_distribution<>>(0, 10)
Live demo
Answering my own question: by adapting an example found in this document, the following appears to be the correct way to implement a function returning a random integer between 0 and n-1 inclusive:
template<class URNG>
int rnd(int n, URNG &engine) {
    using dist_t  = std::uniform_int_distribution<>;
    using param_t = dist_t::param_type;

    static dist_t dist;
    param_t params{0, n-1};
    return dist(engine, params);
}
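A hedged usage sketch (assuming <random> is included and the rnd template above is in scope; the engine is my own choice):
std::mt19937 engine{std::random_device{}()};
int roll = rnd(6, engine);   // uniformly distributed in [0, 5]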
To make it thread-safe one must avoid the static declaration. One possibility is to make a convenience class along these lines, which is what I'm using in my own code:
template<class URNG>
class Random {
public:
    Random(): engine(std::random_device{}()) {}
    Random(typename std::result_of<URNG()>::type seed): engine(seed) {}

    int integer(int n) {
        std::uniform_int_distribution<>::param_type params {0, n-1};
        return int_dist(engine, params);
    }

private:
    URNG engine;
    std::uniform_int_distribution<> int_dist;
};
This is instantiated with (for example) Random<std::default_random_engine> rnd, and the random integers can then be obtained with rnd.integer(n). Methods for sampling from other distributions can easily be added to this class.
To repeat what I said in the comments, reusing the distribution object is probably unnecessary for the specific task of uniformly sampling integers, but for other distributions I think this will be more efficient than creating it every time, because there are some algorithms for sampling from some distributions that can save CPU cycles by generating multiple values simultaneously. (In principle even uniform_int_distribution could do this, via SIMD vectorisation.) If you can't increase efficiency by retaining the distribution object then it's hard to imagine why they would have designed the API this way.
Hooray for C++ and its needless complexity! This concludes an afternoon's work accomplishing a simple five-minute task, but at least I have a much better idea what I'm doing now.
The idiomatic way to generate numbers according to varying parameters is to create distribution objects as needed, per Vary range of uniform_int_distribution:
std::random_device rd;
std::default_random_engine eng{rd()};
int n = std::uniform_int_distribution<>{0, A}(eng);
If you are concerned that performance may be hindered by failing to fully exploit the distribution's internal state, you can use a single distribution and pass it different parameters each time:
std::random_device rd;
std::default_random_engine eng{rd()};
std::uniform_int_distribution<> dist;
int n = dist(eng, decltype(dist)::param_type{0, A});
If this seems complicated, consider that for most purposes you will generate random numbers according to the same distribution with the same parameters (hence the distribution constructor taking parameters); by varying parameters you are already entering into advanced territory.
I was surprised to see that the output of this program:
#include <iostream>
#include <random>
int main()
{
    std::mt19937 rng1;
    std::mt19937 rng2;
    std::uniform_real_distribution<double> dist;

    double random = dist(rng1);
    rng2.discard(2);

    std::cout << (rng1() - rng2()) << "\n";
    return 0;
}
is 0 - i.e. std::uniform_real_distribution uses two random numbers to produce a single random double in the range [0,1). I thought it would just generate one and rescale it. After thinking about it I guess this is because std::mt19937 produces 32-bit integers while a double has twice as many bits, and a single draw would thus not be "random enough".
Question: How do I find out this number generically, i.e. if the random number generator and the floating point type are arbitrary types?
Edit: I just noticed that I could use std::generate_canonical instead, as I am only interested in random numbers of [0,1). Not sure if this makes a difference.
For template<class RealType, size_t bits, class URNG> std::generate_canonical, the standard ([rand.util.canonical], §26.5.7.2 in C++11) explicitly defines the number of calls to the uniform random number generator (URNG) to be
max(1, ⌈b / log₂ R⌉),
where b is the minimum of the number of bits in the mantissa of the RealType and the number of bits given to generate_canonical as template parameter.
R is the range of numbers the URNG can return (URNG::max()-URNG::min()+1).
However, in your example this will not make any difference, since you need 2 calls to the mt19937 to fill the 53 bits of the mantissa of the double.
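If you want this number in code for an arbitrary RealType and URNG, a small helper that simply evaluates the quoted formula (my own sketch, not a standard facility) could look like this:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <limits>
#include <random>

// Evaluates the formula quoted above: max(1, ceil(b / log2 R)).
template <class RealType, std::size_t bits, class URNG>
std::size_t generate_canonical_calls()
{
    const std::size_t b = std::min<std::size_t>(std::numeric_limits<RealType>::digits, bits);
    const long double R = static_cast<long double>(URNG::max()) - URNG::min() + 1.0L;
    return std::max<std::size_t>(1, static_cast<std::size_t>(std::ceil(b / std::log2(R))));
}

int main()
{
    // 53 mantissa bits / 32 bits per mt19937 call -> prints 2
    std::cout << generate_canonical_calls<double, std::numeric_limits<double>::digits, std::mt19937>() << '\n';
}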
For other distributions the standard does not provide a generic way to get any information on how many numbers the URNG has to generate to obtain one number of the distribution.
A reason might be that for some distributions the number of uniform random numbers required to generate a single number of the distribution is not fixed and may vary from call to call. An example is std::poisson_distribution, which is usually implemented as a loop that draws a uniform random number in each iteration until the product of these numbers drops below a certain threshold (see for example the implementation in the GNU C++ library (lines 1523-1528)).
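To illustrate that variable cost, a textbook multiplication-method Poisson sampler (a sketch of the general idea, not the exact library code) looks like this:
#include <cmath>
#include <random>

// Knuth-style Poisson sampling: keep multiplying uniforms until the product
// drops below exp(-mean). The number of uniform draws differs from call to call.
template <class URNG>
int poisson_sample(double mean, URNG &gen)
{
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double limit = std::exp(-mean);
    double product = 1.0;
    int k = -1;
    do {
        product *= u(gen);
        ++k;
    } while (product > limit);
    return k;
}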
It seems that one can use the following code to produce random numbers from a particular Normal distribution:
float mean = 0, variance = 1;
boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
boost::normal_distribution<float> noise(mean, variance);
variate_generator<mt19937, normal_distribution<float> > nD(randgen, noise);
float random = nD();
This works fine, however, I would like to be able to draw numbers from several distributions, i.e. one would think something like:
float mean1 = 0, variance1 = 1, mean2 = 10, variance2 = 0.25;
boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
boost::normal_distribution<float> noise1(mean1, variance1);
boost::normal_distribution<float> noise2(mean2, variance2);
variate_generator<mt19937, normal_distribution<float> > nD(randgen, noise1);
variate_generator<mt19937, normal_distribution<float> > nC(randgen, noise2);
float random1 = nD();
float random2 = nC();
However, the problem appears to be that nD() and nC() are generating similar sequences of numbers. I hypothesize this is because the constructor for variate_generator appears to make a copy of randgen rather than using it by reference. Thus, the same pseudo-random sequence is being generated and simply pushed through different transformations (due to the different parameters of the distributions).
Does anyone know if there is a way, in Boost, to create a single random number generator and use it for multiple distributions? Alternatively, does the design of the Boost random library intend users to create one random number generator per distribution? Obviously, I could write code to transform a sequence of uniform random numbers to a sequence from an arbitrary distribution, but I'm looking for something simple and already built-in to the library.
Thanks in advance for your help.
Your hypothesis is correct. You want both variate_generator instances to use the same random number generator instance. So use a reference to mt19937 as your template parameter.
variate_generator<mt19937 &, normal_distribution<float> > nD(randgen, noise1);
variate_generator<mt19937 &, normal_distribution<float> > nC(randgen, noise2);
Obviously you'll have to ensure randgen does not go out of scope before nD and nC do.
I am looking for a pseudo-random number generator which would be specialized to work fast when it is given a seed before generating each number. Most generators I have seen so far assume you set the seed once and then generate a long sequence of numbers. The only thing I have seen so far that looks somewhat similar is Perlin noise, but it generates data that is too "smooth": for similar inputs it tends to produce similar results.
The declaration of the generator should look something like:
int RandomNumber1(int seed);
Or:
int RandomNumber3(int seedX, int seedY, int seedZ);
I think having a good RandomNumber1 should be enough, as it is possible to implement RandomNumber3 by hashing its inputs and passing the result to RandomNumber1, but I wrote the second prototype in case some implementation could make use of the independent inputs.
The intended use for this generator is procedural content generation, e.g. generating a forest by placing trees in a grid and determining a random tree species and random spatial offsets for each location.
The generator needs to be very efficient (below 500 CPU cycles), because the procedural content is created in huge quantities in real time during rendering.
Seems like you're asking for a hash-function rather than a PRNG. Googling 'fast hash function' yields several promising-looking results.
For example:
uint32_t hash(uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
Edit: Yep, some hash functions definitely look more suitable than others.
For your purposes, it should be sufficient to eyeball the function and check that a single-bit change in the input will propagate to lots of output bits.
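For example, the two prototypes from the question could be wrapped around such a hash (the way the three seeds are combined below is my own ad-hoc choice):
#include <cstdint>

uint32_t hash(uint32_t a);   // the integer hash shown above

int RandomNumber1(int seed)
{
    return static_cast<int>(hash(static_cast<uint32_t>(seed)));
}

int RandomNumber3(int seedX, int seedY, int seedZ)
{
    // Fold the three inputs together by repeated hashing; any good bit mixing will do.
    uint32_t h = hash(static_cast<uint32_t>(seedX));
    h = hash(h ^ static_cast<uint32_t>(seedY));
    h = hash(h ^ static_cast<uint32_t>(seedZ));
    return static_cast<int>(h);
}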
Yep, you are looking for a fast integer hash algorithm rather than a PRNG.
This page has a few algorithms, I'm sure you'll find plenty more now you know the correct search terms.
Edit: The original page has been removed; a live version can be found on GitHub.
Here's a small random number generator developed by George Marsaglia. He's an expert in the field, so you can be confident the generator has good statistical properties.
v = 36969*(v & 65535) + (v >> 16);
u = 18000*(u & 65535) + (u >> 16);
return (v << 16) + (u & 65535);
Here u and v are unsigned ints. Initialize them to any non-zero values. Each time you generate a random number, store u and v somewhere. You could wrap this in a function to match your signature above (except the ints are unsigned.)
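A possible wrapper along those lines (the default constants and the crude re-seeding policy are my own choices, not part of Marsaglia's code):
unsigned int RandomNumber1(unsigned int seed)
{
    static unsigned int u = 521288629;            // arbitrary non-zero defaults
    static unsigned int v = 362436069;
    if (seed != 0) { u = seed; v = seed | 1; }    // crude re-seed on demand

    v = 36969 * (v & 65535) + (v >> 16);
    u = 18000 * (u & 65535) + (u >> 16);
    return (v << 16) + (u & 65535);
}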
See std::tr1::ranlux3, or the other random number generators that are part of the TR1 additions to the standard C++ library. I suggested mt19937 initially, but then saw your note that it needs to be very fast. TR1 should be available in Microsoft VC++ and GCC, and can also be found in the Boost libraries, which support even more compilers.
Example adapted from the Boost documentation:
#include <random>
#include <iostream>
#include <iterator>
#include <functional>
#include <algorithm>
#include <ctime>
using namespace std;
using namespace std::tr1;
int main(){
    random_device trueRand;
    ranlux3 rng(trueRand);        // produces randomness out of thin air
                                  // see pseudo-random number generators
    uniform_int<> six(1,6);       // distribution that maps to 1..6
                                  // see random number distributions
    variate_generator<ranlux3&, uniform_int<> >
        die(rng, six);            // glues randomness with mapping

    // simulate rolling a die
    generate_n( ostream_iterator<int>(cout, " "), 10, ref(die));
}
example output:
2 4 4 2 4 5 4 3 6 2
Any TR1 random number generator can seed any other random number generator. If you need higher quality results, consider feeding the output of mt19937 (which is slower, but higher quality) into a minstd_rand or ranlux3, which are faster generators.
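In the newer std:: spelling (a hedged sketch; the tr1 names are analogous), seeding a fast engine from mt19937 looks like this:
#include <iostream>
#include <random>

int main()
{
    std::mt19937 quality;                       // slower, higher quality
    std::minstd_rand fast(quality());           // fast engine seeded from its output
    std::uniform_int_distribution<> six(1, 6);

    for (int i = 0; i < 10; ++i)
        std::cout << six(fast) << ' ';
    std::cout << '\n';
}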
If memory is not really an issue and speed is of utmost importance then you can prebuild a large array of random numbers and just iterate through it at runtime. For example, have a separate program generate 100,000 random numbers and save them in their own file, like
unsigned int randarray[] = {1, 2, 3, ....};
then include that file in your compile; at runtime your random number function only needs to pull numbers from that array and loop back to the start when it hits the end.
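A sketch of the runtime side (the array name and count are just the ones used above; the wrap-around index is my own):
extern unsigned int randarray[];            // defined in the generated file
const unsigned int randarray_size = 100000; // however many numbers were generated

unsigned int next_random()
{
    static unsigned int index = 0;
    unsigned int value = randarray[index];
    index = (index + 1) % randarray_size;   // loop back to the start at the end
    return value;
}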
I use the following code in my Java random number library - this has worked pretty well for me. I also use this for generating procedural content.
/**
 * State for random number generation
 */
private static volatile long state = xorShift64(System.nanoTime() | 0xCAFEBABE);

/**
 * Gets a long random value
 * @return Random long value based on static state
 */
public static long nextLong() {
    long a = state;
    state = xorShift64(a);
    return a;
}

/**
 * XORShift algorithm - credit to George Marsaglia!
 * @param a initial state
 * @return new state
 */
public static final long xorShift64(long a) {
    a ^= (a << 21);
    a ^= (a >>> 35);
    a ^= (a << 4);
    return a;
}