uniform_int_distribution with zero range goes to infinite loop

For unit tests I implemented a mock random number generator. I believe that this is a valid implementation of UniformBitGenerator (the mock actually uses google mock to set the return of operator(), but it behaves the same).
struct RNG
{
    using result_type = size_t;
    static result_type min() { return 0; }
    static result_type max() { return std::numeric_limits<result_type>::max(); }
    result_type operator()() { return max(); }
};
Now I use this mock to sample from std::uniform_int_distribution in the range [a, b], a == b. I believe this is allowed, the only restriction I have found here on the parameters of the distribution is b >= a. So I would expect the following program to print 5.
int main()
{
    auto rng = RNG();
    auto dist = std::uniform_int_distribution<>(5, 5);
    printf("%d\n", dist(rng));
    return 0;
}
Instead it goes into an infinite loop inside the STL, repeatedly drawing numbers from the generator but failing to find a number within the specified range. I tested different (current) compilers (including clang, gcc, icc) in different versions. RNG::max can return other values (e.g. 42) as well, doesn't change anything.
The real code I'm testing draws a random index into a container which may contain only one element. It would be easy to check this condition but it's a rare case and I would like to avoid it.
Am I missing something in the specification of RNGs in the STL? I'd be surprised to find a bug in ALL compilers ...

A uniform distribution is usually achieved with rejection sampling. You keep requesting random numbers until you get one that meets the criteria. You've set up a situation where the criteria can't be met, because your random number generator is very non-random, so it results in an infinite loop.
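The loop described above can be sketched roughly as follows. This is a simplified illustration, not the code of any particular standard library. The names sample_by_rejection, AlwaysMax and AlwaysZero are made up for this example, and an attempt limit is added so that the failure shows up as a return value instead of a hang.

```cpp
#include <cstdint>
#include <limits>

// Mock generator as in the question: always returns its maximum value.
struct AlwaysMax {
    using result_type = std::uint64_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }
    result_type operator()() { return max(); }
};

// Hypothetical counterpart that always returns its minimum value.
struct AlwaysZero {
    using result_type = std::uint64_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }
    result_type operator()() { return min(); }
};

// Draws a value in [0, range] by rejection. Returns false if no raw value
// was accepted within max_attempts draws. Assumes range is smaller than the
// generator's own range (this sketch does no upscaling).
template <class Gen>
bool sample_by_rejection(Gen& gen, std::uint64_t range, int max_attempts, std::uint64_t& out)
{
    const std::uint64_t gen_range = Gen::max() - Gen::min();
    if (range >= gen_range) return false;
    const std::uint64_t buckets = range + 1;            // number of possible results
    const std::uint64_t scaling = gen_range / buckets;  // raw values per result
    const std::uint64_t past = buckets * scaling;       // first raw value that is rejected
    for (int i = 0; i < max_attempts; ++i) {
        const std::uint64_t raw = gen() - Gen::min();
        if (raw < past) {
            out = raw / scaling;
            return true;
        }
    }
    return false;
}
```

In this sketch, with range 0 the only rejected raw value is the generator's maximum, which is exactly what AlwaysMax returns on every call, so no draw is ever accepted. AlwaysZero is accepted on the first draw.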

The standard says ([rand.dist.uni.int]):
A uniform_int_distribution random number distribution produces random integers i,
a ≤ i ≤ b, distributed according to the constant discrete probability function
  P(i|a,b)=1/(b−a+1)
. . .
explicit uniform_int_distribution(IntType a = 0, IntType b = numeric_limits<IntType>::max());
  Requires: a ≤ b.
So uniform_int_distribution<>(5,5) should return 5 with probability 1/1.
Implementations that go into an infinite loop instead have a bug.
However, your mock RNG, which always generates the same value, doesn't satisfy the uniform random bit generator requirements:
A uniform random bit generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned. [ Note: The degree to which g's results approximate the ideal is often determined statistically.  — end note ]
See [rand.req.genl]/p1.b:
Throughout this subclause [rand], the effect of instantiating a template:
b) that has a template type parameter named URBG is undefined unless the corresponding template argument is cv-unqualified and satisfies the requirements of uniform random bit generator.
Sure enough, with a standard RNG it just works:
#include <iostream>
#include <random>
int main() {
    std::mt19937_64 rng;
    std::uniform_int_distribution<> dist(5, 5);
    std::cout << dist(rng) << "\n";
}
Prints:
5
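For the container case mentioned in the question, one option is to handle the single-element case explicitly, so the result does not depend on how the generator or a mock of it behaves. A minimal sketch, assuming a hypothetical helper named random_index and a non-empty container:

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper: picks an index in [0, size - 1]. size must be > 0.
template <class URBG>
std::size_t random_index(URBG& rng, std::size_t size)
{
    if (size == 1)
        return 0;  // only one possible result, so the generator is not consulted
    std::uniform_int_distribution<std::size_t> dist(0, size - 1);
    return dist(rng);
}
```

With this guard, a test that uses a constant mock generator never reaches the distribution when the container holds one element.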

Related

How to correctly implement a function that will generate pseudo-random integers with C++20

I want to note that in C++ the generation of pseudo-random numbers is overcomplicated. Older languages like Pascal had the function Random(n), where n is an integer and the generation range is from 0 to n-1. Going back to modern C++, I want a similar interface, but with a function random_int(a,b) that generates numbers in the range [a,b].
Consider the following example:
#include <random>
namespace utils
{
    namespace implementation_details
    {
        struct eng_wrap {
            std::mt19937 engine;
            eng_wrap()
            {
                std::random_device device;
                engine.seed(device());
            }
            std::mt19937& operator()()
            {
                return engine;
            }
        };
        eng_wrap rnd_eng;
    }
    template <typename int_t, int_t a, int_t b> int_t random_int()
    {
        static_assert(a <= b);
        static std::uniform_int_distribution<int_t> distr(a, b);
        return distr(implementation_details::rnd_eng());
    }
}
You can see that the distr is marked with the static keyword. Due to this, repeated calls with the same arguments will not cause the construction of the type std::uniform_int_distribution.
In some cases, at the compilation time we do not know the generation boundaries.
Therefore, we have to rewrite this function:
template <typename int_t> int_t random_int2(int_t a, int_t b)
{
std::uniform_int_distribution<int_t> distr(a, b);
return distr(implementation_details::rnd_eng());
}
Next, suppose the second version of this function is called many times:
int a, b;
std::cin>>a>>b;
for (int i=1;i!=1000000;++i)
std::cout<<utils::random_int2(a,b)<<' ';
Question
What is the cost of creating std::uniform_int_distribution in each iteration of the loop?
Can you suggest a more optimized function that returns a pseudo-random number in the passed range for a normal desktop application?

If you want to use the same a and b repeatedly, use a class with a member function—that’s what they’re for. If you don’t want to expose your rnd_eng (choosing instead to preclude useful multithreaded clients), write the class to use it:
template<class T>
struct random_int {
    random_int(T a, T b) : d(a, b) {}
    // Not const: the distribution's operator() is a non-const member function.
    T operator()() { return d(implementation_details::rnd_eng()); }
private:
    std::uniform_int_distribution<T> d;
};

IMO, for most simple programs such as games, graphics, and Monte Carlo simulations, the API you actually want is
static xoshiro256ss g;
// Generate a random number between 0 and n-1.
// For example, randint0(2) flips a coin; randint0(6) rolls a die.
int randint0(int n) {
return g() % n;
}
// This version is useful for games like NetHack, where you often
// want to express an ad-hoc percentage chance of something happening.
bool pct(int n) {
return randint0(100) < n;
}
(or substitute std::mt19937 for xoshiro256ss but be aware you're trading away performance in exchange for... something. :))
The % n above is mathematically dubious, when n is astronomically large (e.g. if you're rolling a 12297829382473034410-sided die, you'll find that values between 0 and 6148914691236517205 come up twice as often as they should). So you may prefer to use C++11's uniform_int_distribution:
int randint0(int n) {
return std::uniform_int_distribution<int>(0, n-1)(g);
}
However, again be aware you're gaining mathematical perfection at the cost of raw speed. uniform_int_distribution is more for when you don't already trust your random number engine to be sane (e.g. if the engine's output range might be 0 to 255 but you want to generate numbers from 1 to 1000), or when you're writing template code to work with any arbitrary integer distribution (e.g. binomial_distribution, geometric_distribution) and need a uniform distribution object of that same general "shape" to plug into your template.
The answer to your question #1 is "The cost is free." You will not gain anything by stashing the result of std::uniform_int_distribution<int>(0, n-1) into a static variable. A distribution object is very small, trivially copyable, and basically free to construct. In fact, the cost of constructing the uniform_int_distribution in this case is orders of magnitude cheaper than the cost of thread-safe static initialization.
(There are special cases such as std::normal_distribution where not-stashing the distribution object between calls can result in your doing twice as much work as needed; but uniform_int_distribution is not one of those cases.)
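The bias of % n mentioned above is easier to see on a small scale. The sketch below assumes a hypothetical generator with only 256 possible outputs (0 to 255) and reduces each of them with % 6; the function name residue_counts is made up for this example.

```cpp
#include <array>
#include <cstddef>

// Counts how often each residue appears when every output of a
// hypothetical 8-bit generator (0..255) is reduced with % 6.
inline std::array<int, 6> residue_counts()
{
    std::array<int, 6> counts{};  // value-initialized to zero
    for (int raw = 0; raw < 256; ++raw)
        ++counts[static_cast<std::size_t>(raw % 6)];
    return counts;
}
```

Since 256 = 6 * 42 + 4, the residues 0 to 3 occur 43 times each and the residues 4 and 5 occur 42 times each. The imbalance shrinks as the generator's range grows relative to n, which is why it only matters for a 64-bit generator when n is astronomically large.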

Generating pseudo-random 16-bit integers

I need to generate 16-bit pseudo-random integers and I am wondering what the best choice is.
The obvious way that comes in my mind is something as follows:
std::random_device rd;
auto seed_data = std::array<int, std::mt19937::state_size> {};
std::generate(std::begin(seed_data), std::end(seed_data), std::ref(rd));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 generator(seq);
std::uniform_int_distribution<short> dis(std::numeric_limits<short>::min(),
std::numeric_limits<short>::max());
short n = dis(generator);
The problem I see here is that std::mt19937 produces 32-bit unsigned integers since it's defined as this:
using mt19937 = mersenne_twister_engine<unsigned int,
32, 624, 397,
31, 0x9908b0df,
11, 0xffffffff,
7, 0x9d2c5680,
15, 0xefc60000,
18, 1812433253>;
That means static casting is done and only the least significant part of these 32-bit integers is used by the distribution. So I am wondering how good are these series of pseudo-random shorts and I don't have the mathematical expertise to answer that.
I expect that a better solution would be to use your own defined mersenne_twister_engine engine for 16-bit integers. However, I haven't found any mentioned set for the template arguments (requirements can be found here for instance). Are there any?
UPDATE: I updated the code sample with proper initialization for the distribution.

Your way is indeed the correct way.
The mathematical arguments are complex (I'll try to dig out a paper), but taking the least significant bits of the Mersenne Twister, as implemented by the C++ standard library, is the correct thing to do.
If you're in any doubt as to the quality of the sequence, then run it through the diehard tests.

There may be a misconception, considering this quote from OP's question (emphasis mine):
The problem I see here is that std::mt19937 produces 32-bit unsigned integers […].
That means static casting is done and only the least significant part of these 32-bit integers is used by the distribution.
That's not how it works.
The following are quotes from https://en.cppreference.com/w/cpp/numeric/random
The random number library provides classes that generate random and
pseudo-random numbers. These classes include:
Uniform random bit generators (URBGs), […];
Random number distributions (e.g. uniform, normal, or poisson distributions) which convert the output of URBGs into various statistical distributions
URBGs and distributions are designed to be used together to produce random values.
So a uniform random bit generator, like mt19937 or random_device
is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned.
While a random number distribution, like uniform_int_distribution
post-processes the output of a URBG in such a way that resulting output is distributed according to a defined statistical probability density function.
The way it's done uses all the bits from the source to produce an output. As an example, we can look at the implementation of std::uniform_int_distribution in libstdc++ (starting at line 824), which can be roughly simplified as
template <typename Type>
class uniform_distribution
{
    Type a_ = 0, b_ = std::numeric_limits<Type>::max();
public:
    uniform_distribution(Type a, Type b) : a_{a}, b_{b} {}
    template<typename URBG>
    Type operator() (URBG &gen)
    {
        using urbg_type = std::make_unsigned_t<typename URBG::result_type>;
        using u_type = std::make_unsigned_t<Type>;
        using max_type = std::conditional_t<(sizeof(urbg_type) > sizeof(u_type))
                                            , urbg_type, u_type>;
        urbg_type urbg_min = gen.min();
        urbg_type urbg_max = gen.max();
        urbg_type urbg_range = urbg_max - urbg_min;
        max_type urange = b_ - a_;
        max_type udenom = urbg_range <= urange ? 1 : urbg_range / (urange + 1);
        Type ret;
        // Note that the calculation may require more than one call to the generator
        do
            ret = (urbg_type(gen()) - urbg_min ) / udenom;
            // which is 'ret = gen / 65535' with OP's parameters
            // not a simple cast or bit shift
        while (ret > b_ - a_);
        return ret + a_;
    }
};
This could be tested HERE.
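The arithmetic in the comment above can be checked with the question's parameters. The constants below assume the 32-bit output range of std::mt19937 and the full 16-bit range of short; the helper names scale and low_bits are made up for this sketch.

```cpp
#include <cstdint>

// Constants for a 32-bit generator feeding a 16-bit range, as in the question.
constexpr std::uint32_t urbg_range = 0xFFFFFFFFu;  // mt19937::max() - mt19937::min()
constexpr std::uint32_t urange     = 0xFFFFu;      // width of the full short range
constexpr std::uint32_t udenom     = urbg_range / (urange + 1u);

// Mirrors the scaling step of the simplified class above.
constexpr std::uint32_t scale(std::uint32_t raw) { return raw / udenom; }

// What a plain truncating cast to 16 bits would give instead.
constexpr std::uint16_t low_bits(std::uint32_t raw) { return static_cast<std::uint16_t>(raw); }
```

Here udenom is 65535. The raw value 0x0000FFFF scales to 1, whereas a cast would keep the low 16 bits and give 65535, so the result is driven by the high-order bits rather than the low ones. The raw value 0xFFFFFFFF scales to 65537, which is above urange and would be rejected by the do/while loop.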

std::uniform_real_distribution - get all possible numbers

I would like to create a std::uniform_real_distribution able to generate a random number in the range [MIN_FLOAT, MAX_FLOAT]. Following is my code:
#include <random>
#include <limits>
using namespace std;
int main()
{
const auto a = numeric_limits<float>::lowest();
const auto b = numeric_limits<float>::max();
uniform_real_distribution<float> dist(a, b);
return 0;
}
The problem is that when I execute the program, it is aborted because a and b seem to be invalid arguments. How should I fix it?

uniform_real_distribution's constructor requires:
a ≤ b and b − a ≤ numeric_limits<RealType>::max().
That last one is not possible for you, since the difference between lowest and max, by definition, must be larger than max (and will almost certainly be INF).
There are several ways to resolve this. The simplest, as Nathan pointed out, is to just use a uniform_real_distribution<double>. Unless double for your implementation couldn't store the range of a float (and IEEE-754 Float64's can store the range of Float32's), this ought to work. You would still be passing the numeric_limits for a float, but since the distribution uses double, it can handle the math for the increased range.
Alternatively, you could combine a uniform_real_distribution<float> with a boolean uniform_int_distribution (that is, one that selects between 0 and 1). Your real distribution should be over the positive numbers, up to max. Every time you get a number from the real distribution, get one from the int distribution too. If the integer is 1, then negate the real value.
This has the downside of making the probability of zero slightly higher than the probability of other numbers, since positive and negative zero are the same thing.
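The first suggestion, doing the arithmetic in double, might look like the sketch below. The function name random_float_full_range is made up for this example, and it assumes double is an IEEE-754 64-bit type so that max − lowest is representable.

```cpp
#include <limits>
#include <random>

// Draws from [lowest, max] of float by running the distribution in double,
// where b - a does not exceed numeric_limits<double>::max().
template <class URBG>
float random_float_full_range(URBG& rng)
{
    const double lo = std::numeric_limits<float>::lowest();
    const double hi = std::numeric_limits<float>::max();
    std::uniform_real_distribution<double> dist(lo, hi);
    return static_cast<float>(dist(rng));  // narrowing back to float on purpose
}
```

Note that this is uniform over the real interval, not over the set of representable floats, so almost all results will have very large magnitudes.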

Encapsulate c++ Random Number Generator

I'm building something that requires me to
template<class D>
class DistributionAdapter {
public:
    /**
     * @return number generated by the distribution function.
     */
    virtual D operator()(RANDOM_NUMBER_GENERATOR& rng) = 0;
};
RANDOM_NUMBER_GENERATOR is supposed to represent the class of random number generator in C++, either std::random_device or a pseudo-random number generator. Can someone tell me how I should approach this? I don't know if the random number generators in C++ have a common base type.

Section § 26.5.1.3 of the standard describes the requirements for random number generators.
In particular, a generator must support the function call operator :
Expression: g(). Return type: T. Returns a value in the closed interval [G::min(), G::max()]. Complexity: amortized constant.
So, although there is no base class shared by every single generator, the standard guarantees that the operator() will be present in each of them : you can call rng() in your function.
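Because there is no common base class and a virtual function cannot itself be a template, one possible approach is to make the generator type a second template parameter of the adapter. This is only a sketch of that idea: the Rng parameter and the UniformIntAdapter class are assumptions added for illustration, not part of the question's code.

```cpp
#include <random>

// Adapter interface parameterized on both the result type D and the
// generator type Rng, so no common generator base class is needed.
template <class D, class Rng>
class DistributionAdapter {
public:
    virtual ~DistributionAdapter() = default;
    virtual D operator()(Rng& rng) = 0;
};

// Hypothetical concrete adapter wrapping std::uniform_int_distribution<int>.
template <class Rng>
class UniformIntAdapter : public DistributionAdapter<int, Rng> {
public:
    UniformIntAdapter(int low, int high) : dist_(low, high) {}
    int operator()(Rng& rng) override { return dist_(rng); }
private:
    std::uniform_int_distribution<int> dist_;
};
```

For example, UniformIntAdapter<std::mt19937> die(1, 6) can be called through a DistributionAdapter<int, std::mt19937>& with any std::mt19937 instance. The trade-off is that each generator type gets its own adapter hierarchy.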

It's not entirely clear what you're asking, but here is a fairly easy to use function returning uniformly distributed random integers within a particular range.
#include <random>
// random number generator from Stroustrup:
// http://www.stroustrup.com/C++11FAQ.html#std-random
int rand_int(int low, int high)
{
static std::default_random_engine re {};
using Dist = std::uniform_int_distribution<int>;
static Dist uid {};
return uid(re, Dist::param_type{low,high});
}

Get random number in sequence C++

Is there a way using the C++ standard library built in random generator to get a specific random number in a sequence, without saving them all?
Like
srand(cTime);
getRand(1); // 10
getRand(2); // 8995
getRand(3); // 65464456
getRand(1); // 10
getRand(2); // 8995
getRand(1); // 10
getRand(3); // 65464456

C++11 random number engines are required to implement a member function discard(unsigned long long z) (§26.5.1.4) that advances the random number sequence by z steps. The complexity guarantee is quite weak: "no worse than the complexity of z consecutive calls e()". This member obviously exists solely to make it possible to expose more performant implementations when possible, as note 274 states:
This operation is common in user code, and can often be implemented
in an engine-specific manner so as to provide significant performance
improvements over an equivalent naive loop that makes z consecutive
calls e().
Given discard you can easily implement your requirement to retrieve the nth number in sequence by reseeding a generator, discarding n-1 values and using the next generated value.
I'm unaware of which - if any - of the standard RNG engines are amenable to efficient implementations of discard. It may be worth your time to do a bit of investigation and profiling.
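The reseed-and-discard approach described above could be sketched as follows, here with std::mt19937. The function name nth_random and the choice of engine are assumptions for this example; n is 1-based and must be at least 1.

```cpp
#include <random>

// Returns the n-th value (1-based) of the sequence produced by
// std::mt19937 seeded with `seed`, without storing earlier values.
inline std::mt19937::result_type nth_random(std::mt19937::result_type seed,
                                            unsigned long long n)
{
    std::mt19937 engine(seed);  // fresh engine, so every call starts from the same state
    engine.discard(n - 1);      // skip the first n - 1 values
    return engine();
}
```

Each call is independent of the others, so nth_random(seed, 1) returns the same value no matter what was requested before. Whether discard is faster than n - 1 plain calls depends on the engine and the library.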

You have to save the numbers. There may be other variants, but it still requires saving a list of numbers (e.g. using different seeds based on the argument to getRand() - but that wouldn't really be beneficial over saving them).
Something like this would work reasonably well, I'd say:
int getRand(int n)
{
    static std::map<int, int> mrand;
    // Check if it's there.
    auto it = mrand.find(n);
    if (it != mrand.end())
    {
        return it->second;
    }
    int r = rand();
    mrand[n] = r;
    return r;
}
(I haven't compiled this code, just written it up as a "this sort of thing might work")

Implement getRand() to always seed and then return the given number. This will interfere with all other random numbers in a system, though, and will be slow, especially for large indexes. Assuming a 1-based index:
int getRand(int index)
{
    srand(999); // fix the seed
    for (int loop=1; loop<index; ++loop)
        rand();
    return rand();
}

Similar to cdmh's post, the following C++11 version could also be used:
#include<random>
long getrand(int index)
{
    std::default_random_engine e;
    for(auto i=1;i<index;i++)
        e();
    return e();
}

Check out:
Random123
From the documentation:
Random123 is a library of "counter-based" random number generators (CBRNGs), in which the Nth random number can be obtained by applying a stateless mixing function to N.
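To illustrate the counter-based idea without relying on Random123 itself, here is a minimal sketch that uses a SplitMix64-style mixing function as a stand-in. It is not Random123's API or one of its generators, the names mix64 and get_rand are made up, and no claim is made about its statistical quality.

```cpp
#include <cstdint>

// SplitMix64-style finalizer, used here as a stateless mixing function.
// Illustrative stand-in only, not one of Random123's generators.
inline std::uint64_t mix64(std::uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical getRand: the n-th value depends only on (seed, n),
// so nothing has to be stored and any n costs the same.
inline std::uint64_t get_rand(std::uint64_t seed, std::uint64_t n)
{
    return mix64(seed + n * 0x9e3779b97f4a7c15ULL);
}
```

Calling get_rand(seed, 1) twice gives the same value, and requesting index 3 does not require computing indexes 1 and 2 first.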
