std::uniform_real_distribution - get all possible numbers - c++

I would like to create a std::uniform_real_distribution able to generate a random number in the range [MIN_FLOAT, MAX_FLOAT]. Following is my code:
#include <random>
#include <limits>
using namespace std;
int main()
{
const auto a = numeric_limits<float>::lowest();
const auto b = numeric_limits<float>::max();
uniform_real_distribution<float> dist(a, b);
return 0;
}
The problem is that when I execute the program, it is aborted because a and b seem to be invalid arguments. How should I fix it?

uniform_real_distribution's constructor requires:
a ≤ b and b − a ≤ numeric_limits<RealType>::max().
That last one is not possible for you, since the difference between lowest and max, by definition, must be larger than max (and will almost certainly be INF).
There are several ways to resolve this. The simplest, as Nathan pointed out, is to just use a uniform_real_distribution<double>. Unless double for your implementation couldn't store the range of a float (and IEEE-754 Float64's can store the range of Float32's), this ought to work. You would still be passing the numeric_limits for a float, but since the distribution uses double, it can handle the math for the increased range.
Alternatively, you could combine a uniform_real_distribution<float> with a boolean uniform_int_distribution (that is, one that selects between 0 and 1). Your real distribution should be over the positive numbers, up to max. Every time you get a number from the real distribution, get one from the int distribution too. If the integer is 1, then negate the real value.
This has the downside of making the probability of zero slightly higher than the probability of other numbers, since positive and negative zero are the same thing.
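Both suggestions can be sketched roughly as follows. This is only an illustration: the helper names full_range_via_double and full_range_via_sign are made up for this example, and the engine is assumed to be supplied by the caller. Note that either version is uniform over the real interval, which is not the same as picking every representable float with equal probability.

```cpp
#include <cassert>
#include <limits>
#include <random>

// Option 1 (hypothetical helper): do the arithmetic in double, then narrow to float.
template <class Engine>
float full_range_via_double(Engine& engine) {
    const double lo = std::numeric_limits<float>::lowest();
    const double hi = std::numeric_limits<float>::max();
    // Constructor precondition: b - a must not exceed the largest double.
    assert(hi - lo <= std::numeric_limits<double>::max());
    std::uniform_real_distribution<double> dist(lo, hi);
    return static_cast<float>(dist(engine));
}

// Option 2 (hypothetical helper): draw a non-negative magnitude, then flip the
// sign with a fair coin.
template <class Engine>
float full_range_via_sign(Engine& engine) {
    std::uniform_real_distribution<float> magnitude(
        0.0f, std::numeric_limits<float>::max());
    std::uniform_int_distribution<int> coin(0, 1);
    const float value = magnitude(engine);
    return coin(engine) == 1 ? -value : value;
}
```

Either helper can be called with any standard engine, for example a std::mt19937 seeded from std::random_device.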

Related

uniform_int_distribution with zero range goes to infinite loop

For unit tests I implemented a mock random number generator. I believe that this is a valid implementation of UniformBitGenerator (the mock actually uses google mock to set the return of operator(), but it behaves the same).
struct RNG
{
using result_type = size_t;
static result_type min() { return 0; }
static result_type max() { return std::numeric_limits<result_type>::max(); }
result_type operator()() { return max(); }
};
Now I use this mock to sample from std::uniform_int_distribution in the range [a, b], a == b. I believe this is allowed, the only restriction I have found here on the parameters of the distribution is b >= a. So I would expect the following program to print 5.
int main()
{
auto rng = RNG();
auto dist = std::uniform_int_distribution<>(5, 5);
printf("%d\n", dist(rng));
return 0;
}
Instead it goes into an infinite loop inside the STL, repeatedly drawing numbers from the generator but failing to find a number within the specified range. I tested different (current) compilers (including clang, gcc, icc) in different versions. RNG::max can return other values (e.g. 42) as well, doesn't change anything.
The real code I'm testing draws a random index into a container which may contain only one element. It would be easy to check this condition but it's a rare case and I would like to avoid it.
Am I missing something in the specification of RNGs in the STL? I'd be surprised to find a bug in ALL compilers ...
A uniform distribution is usually achieved with rejection sampling. You keep requesting random numbers until you get one that meets the criteria. You've set up a situation where the criteria can't be met, because your random number generator is very non-random, so it results in an infinite loop.
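To make the rejection idea concrete, here is a minimal sketch. It is not how any particular standard library implements uniform_int_distribution; the name sample_below is hypothetical, and the code assumes the number of distinct generator outputs fits in 64 bits (true for std::mt19937, not for std::mt19937_64).

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns a value in [0, n) by rejection sampling (illustrative sketch only).
template <class G>
std::uint64_t sample_below(std::uint64_t n, G& g) {
    assert(n > 0);
    // Number of distinct outputs of the generator; assumed not to overflow.
    const std::uint64_t span =
        static_cast<std::uint64_t>(G::max() - G::min()) + 1;
    assert(n <= span);
    // Largest multiple of n that does not exceed span.
    const std::uint64_t limit = span - span % n;
    std::uint64_t x;
    do {
        x = static_cast<std::uint64_t>(g() - G::min());
    } while (x >= limit);  // reject the biased tail and draw again
    return x % n;
}
```

In this sketch, a generator that always returns max() lands in the rejected tail whenever span is not a multiple of n, so the loop never terminates. Real implementations differ in detail, but the failure mode is the same.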
The standard says ([rand.dist.uni.int]):
A uniform_int_distribution random number distribution produces random integers i,
a ≤ i ≤ b, distributed according to the constant discrete probability function
  P(i|a,b)=1/(b−a+1)
. . .
explicit uniform_int_distribution(IntType a = 0, IntType b = numeric_limits<IntType>::max());
  Requires: a ≤ b.
So uniform_int_distribution<>(5,5) should return 5 with probability 1/1.
Implementations that go into an infinite loop instead, have a bug.
However, your mock RNG that always generates the same value, doesn't satisfy Uniform random bit generator requirements:
A uniform random bit generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned. [ Note: The degree to which g's results approximate the ideal is often determined statistically.  — end note ]
See [rand.req.genl]/p1.b:
Throughout this subclause [rand], the effect of instantiating a template:
b) that has a template type parameter named URBG is undefined unless the corresponding template argument is cv-unqualified and satisfies the requirements of uniform random bit generator.
Sure enough, with a standard RNG it just works:
#include <iostream>
#include <random>
int main() {
std::mt19937_64 rng;
std::uniform_int_distribution<> dist(5, 5);
std::cout << dist(rng) << "\n";
}
Prints:
5
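For unit tests, one option that stays within the requirements is a real engine with a fixed seed, which gives a reproducible sequence. The following is a sketch under that assumption; pick_index is a hypothetical helper standing in for the index-drawing code mentioned in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper: picks an index into a container holding `size` elements.
template <class Engine>
std::size_t pick_index(std::size_t size, Engine& engine) {
    assert(size > 0);  // an empty container has no valid index
    std::uniform_int_distribution<std::size_t> dist(0, size - 1);
    return dist(engine);
}
```

With a single-element container the distribution is constructed as (0, 0) and the only possible result is 0. Two engines constructed with the same seed produce the same sequence of indices, which is usually enough determinism for a test.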

Given (a, b) compute the maximum value of k such that a^{1/k} and b^{1/k} are whole numbers

I'm writing a program that tries to find the minimum value of k > 1 such that the kth root of a and b (which are both given) equals a whole number.
Here's a snippet of my code, which I've commented for clarification.
int main()
{
// Declare the variables a and b.
double a;
double b;
// Read in variables a and b.
while (cin >> a >> b) {
int k = 2;
// We require the kth root of a and b to both be whole numbers.
// "while a^{1/k} and b^{1/k} are not both whole numbers..."
while ((fmod(pow(a, 1.0/k), 1) != 1.0) || (fmod(pow(b, 1.0/k), 1) != 0)) {
k++;
}
Pretty much, I read in (a, b), and I start from k = 2 and increment k until the kth roots of a and b are both congruent to 0 mod 1 (meaning that they are divisible by 1 and thus whole numbers).
But, the loop runs infinitely. I've tried researching, and I think it might have to do with precision error; however, I'm not too sure.
Another approach I've tried is changing the loop condition to check whether the floor of a^{1/k} equals a^{1/k} itself. But again, this runs infinitely, likely due to precision error.
Does anyone know how I can fix this issue?
EDIT: for example, when (a, b) = (216, 125), I want to have k = 3 because 216^(1/3) and 125^(1/3) are both integers (namely, 6 and 5).
That is not a programming problem but a mathematical one:
If a is a real number and k a positive integer, and if a^(1./k) is an integer, then a is an integer. (Otherwise the aim is to toy with approximation error.)
So the fastest approach may be to first check if a and b are integers, then do a prime decomposition such that a = p_0^{e_0} * p_1^{e_1} * ..., where the p_i are distinct primes.
Notice that, for a^{1/k} to be an integer, each e_i must be divisible by k. In other words, k must be a common divisor of the e_i. The same must be true for the prime powers of b if b^{1/k} is to be an integer.
Thus the largest k is the greatest common divisor of all e_i of both a and b.
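The gcd-of-exponents idea can be sketched as below, assuming a and b have already been checked to be positive integers that fit in 64 bits. The function names are made up for this example, and plain trial division is used for brevity; it is slow for inputs with large prime factors.

```cpp
#include <cassert>
#include <cstdint>

// Euclid's algorithm; gcd(0, x) == x, so 0 can stand for "no exponent seen yet".
std::uint64_t gcd_u64(std::uint64_t x, std::uint64_t y) {
    while (y != 0) {
        const std::uint64_t t = x % y;
        x = y;
        y = t;
    }
    return x;
}

// Folds the exponents of n's prime factorisation into g (trial division).
std::uint64_t fold_exponents(std::uint64_t n, std::uint64_t g) {
    for (std::uint64_t p = 2; p <= n / p; ++p) {
        std::uint64_t e = 0;
        while (n % p == 0) {
            n /= p;
            ++e;
        }
        g = gcd_u64(g, e);  // e == 0 leaves g unchanged
    }
    if (n > 1) g = gcd_u64(g, 1);  // leftover prime factor with exponent 1
    return g;
}

// Largest k such that a and b are both perfect k-th powers.
// Returns 0 when a == b == 1, where every k works.
std::uint64_t largest_common_root(std::uint64_t a, std::uint64_t b) {
    assert(a >= 1 && b >= 1);
    return fold_exponents(b, fold_exponents(a, 0));
}
```

For (216, 125) the exponents are 3, 3 and 3, so the result is 3. A result of 1 means no k > 1 exists.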
With your approach you will have problems with large numbers. All IEEE 754 binary64 floating-point values (the case of double on x86) have 53 significant bits. That means that every double larger than 2^53 is an integer.
The function pow(x, 1./k) can return the same value for two different x, so your approach will necessarily give false answers. For example, the numbers 5^5 * 2^90 and 3^5 * 2^120 are exactly representable as double, and the result of the algorithm is k = 5. You may find this value of k with these numbers, but you will also find k = 5 for 5^5 * 2^90 - 2^49 and 3^5 * 2^120, because pow(5^5 * 2^90 - 2^49, 1./5) == pow(5^5 * 2^90, 1./5).
On the other hand, as there are only 53 significant bits, prime number decomposition of double is trivial.
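The 53-bit limit is easy to observe directly. The sketch below assumes double is IEEE 754 binary64 with the default round-to-nearest mode and no excess intermediate precision; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true when adding 1 to 2^53 is lost to rounding.
bool one_is_lost_above_2_53() {
    assert(std::numeric_limits<double>::digits == 53);  // binary64 assumption
    const double big = std::ldexp(1.0, 53);  // exactly 2^53
    const double bumped = big + 1.0;         // 2^53 + 1 is not representable
    return bumped == big;
}
```

Above 2^53 the spacing between neighbouring doubles is at least 2, which is why distinct integers start to collapse onto the same value.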
Floating numbers are not mathematical real numbers. The computation is "approximate". See http://floating-point-gui.de/
You could replace the test fmod(pow(a, 1.0/k), 1) != 1.0 with something like fabs(fmod(pow(a, 1.0/k), 1) - 1.0) > 0.0000001 (and play with various such 𝛆 instead of 0.0000001; see also std::numeric_limits::epsilon but use it carefully, since pow might give some error in its computations, and 1.0/k also inject imprecisions - details are very complex, dive into IEEE754 specifications).
Of course, you could (and probably should) define your bool almost_equal(double x, double y) function (and use it instead of ==, and use its negation instead of !=).
As a rule of thumb, never test floating numbers for equality (i.e. ==), but consider instead some small enough distance between them; that is, replace a test like x == y (respectively x != y) with something like fabs(x-y) < EPSILON (respectively fabs(x-y) > EPSILON) where EPSILON is a small positive number, hence testing for a small L1 distance (for equality, and a large enough distance for inequality).
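A minimal sketch of such a helper is shown below. The default tolerance of 1e-9 is an arbitrary choice for illustration, and an absolute tolerance like this does not scale with the magnitude of the operands, so it should be tuned to the problem at hand.

```cpp
#include <cassert>
#include <cmath>

// Absolute-tolerance comparison; epsilon is a problem-specific choice.
bool almost_equal(double x, double y, double epsilon = 1e-9) {
    assert(epsilon >= 0.0);
    return std::fabs(x - y) < epsilon;
}
```

For example, almost_equal(0.1 + 0.2, 0.3) holds even though the two values are not bit-for-bit identical on typical binary64 hardware.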
And avoid floating point in integer problems.
Actually, predicting or estimating floating point accuracy is very difficult. You might want to consider tools like CADNA. My colleague Franck Védrine is an expert on static program analyzers to estimate numerical errors (see e.g. his TERATEC 2017 presentation on Fluctuat). It is a difficult research topic, see also D.Monniaux's paper the pitfalls of verifying floating-point computations etc.
And floating point errors did in some cases cost human lives (or loss of billions of dollars). Search the web for details. There are some cases where all the digits of a computed number are wrong (because the errors may accumulate, and the final result was obtained by combining thousands of operations)! There is some indirect relationship with chaos theory, because many programs might have some numerical instability.
As others have mentioned, comparing floating point values for equality is problematic. If you find a way to work directly with integers, you can avoid this problem. One way to do so is to raise integers to the k power instead of taking the kth root. The details are left as an exercise for the reader.
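One possible way to do that exercise is sketched here, assuming a and b are positive integers that fit in 64 bits. All names are made up for this example. The candidate root is found by binary search, and the power is compared against the target without ever overflowing.

```cpp
#include <cassert>
#include <cstdint>

// Compares base^k with n without overflowing: returns -1, 0 or 1.
int compare_power(std::uint64_t base, unsigned k, std::uint64_t n) {
    std::uint64_t result = 1;
    for (unsigned i = 0; i < k; ++i) {
        if (result > n / base) return 1;  // result * base would exceed n
        result *= base;
    }
    return result == n ? 0 : -1;  // result <= n is guaranteed here
}

// True if some integer r satisfies r^k == n (requires n >= 1).
bool is_kth_power(std::uint64_t n, unsigned k) {
    std::uint64_t lo = 1, hi = n;
    while (lo <= hi) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        const int c = compare_power(mid, k, n);
        if (c == 0) return true;
        if (c < 0) lo = mid + 1; else hi = mid - 1;
    }
    return false;
}

// Smallest k > 1 such that a and b are both perfect k-th powers, or 0 if none.
unsigned smallest_common_k(std::uint64_t a, std::uint64_t b) {
    assert(a >= 1 && b >= 1);
    for (unsigned k = 2; k < 64; ++k) {  // 2^64 exceeds any 64-bit input
        if (is_kth_power(a, k) && is_kth_power(b, k)) return k;
    }
    return 0;
}
```

Because only integer multiplication and comparison are involved, there is no rounding error to reason about.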

What is an efficient method to force uniqueness using rand();

If I used (with appropriate #includes)
int main()
{
srand(time(0));
int arr[1000];
for(int i = 0; i < 1000; i++)
{
arr[i] = rand() % 100000;
}
return 0;
}
To generate random 5-digit ID numbers (disregard iomanip stuff here), would those ID numbers be guaranteed by rand() to be unique? I've been running another loop to check all the values of the array vs the recently generated ID number but it takes forever to run, considering the nested 1000 iteration loops. By the way, is there a simple way to do that check?
Since the question was tagged c++11,
you should consider using <random> in place of rand().
Using a standard distribution engine, you can't guarantee that you will get back unique values. If you use a std::set, you can keep retrying until you have the right amount. Depending on your distribution range, and the amount of unique values you are requesting, that may be adequate.
For example, here is a customized function to get n unique values from range [x,y].
#include <unordered_set>
#include <iostream>
#include <random>
template <typename T>
std::unordered_set<T> GetUniqueNumbers(std::size_t amount, T low, T high){
static std::random_device random_device;
static std::mt19937 engine{random_device()};
std::uniform_int_distribution<T> dist(low, high);
std::unordered_set<T> uniques;
while (uniques.size() < amount){
uniques.insert(dist(engine));
}
return uniques;
}
int main(){
//get 10 unique numbers between [0,100]
auto numbers = GetUniqueNumbers(10,0,100);
for (auto number: numbers){
std::cout << number << " ";
}
}
No, because any guarantee about the output of a random source makes it less random.
There are specific mathematical formulas that have the behavior known as a random permutation. This site seems to have quite a good write-up about it: http://preshing.com/20121224/how-to-generate-a-sequence-of-unique-random-integers/
No, there is definitely no guarantee that rand will not produce duplicate numbers. Designing it that way would not only be expensive, since it would have to remember every number returned so far, but would also greatly reduce its randomness: after it had returned many numbers, you could guess what it is likely to return next from what it had already returned.
If uniqueness is your only goal, just use an incrementing ID number for each thing. If the numbers must also be arbitrary and hard to guess you will have to use some kind of random generator or hash, but should make the numbers much longer to make the chance of a collision much closer to 0.
However if you absolutely must do it the current way I would suggest storing all the numbers you have generated so far into a std::unordered_map and generating another random number if it is already in it.
There is a common uniqueness guarantee in most PRNGs, but it won't help you here. A generator will typically iterate over a finite number of states and not visit the same state twice until every other state has been visited once.
However, a state is not the same thing as the number you get to see. Many states can map to the same number and in the worst possible case two consecutive states could map to the same number.
That said, there are specific configurations of PRNG that can visit every value in a range you specify exactly once before revisiting an old state. Notably, an LCG designed with a modulus that is a multiple of your range can be reduced to exactly your range with another modulo operation. Since most LCG implementations have a power-of-two period, this means that the low-order bits repeat with shorter periods. However, 100000 is not a power of two, so that won't help you.
A simple method is to use an LCG, bitmask it down to a power of two larger than your desired range, and just throw away results that it produces that are out of range.
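A rough sketch of that method follows. The class and function names are hypothetical, and the multiplier and increment (1664525 and 1013904223) are one commonly quoted LCG parameter pair, chosen here as an assumption because an odd increment and a multiplier congruent to 1 modulo 4 give a full period for any power-of-two modulus. The resulting sequence is predictable, so it is unsuitable where IDs must be hard to guess.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Yields each value in [0, limit) exactly once per period (illustrative sketch).
class UniqueIdGenerator {
public:
    UniqueIdGenerator(std::uint32_t limit, std::uint32_t seed)
        : limit_(limit), mask_(1) {
        assert(limit > 0 && limit <= (1u << 31));
        while (mask_ < limit) mask_ <<= 1;  // smallest power of two >= limit
        mask_ -= 1;                         // turn it into a bitmask
        state_ = seed & mask_;
    }

    std::uint32_t next() {
        do {
            // Full-period LCG modulo (mask_ + 1).
            state_ = (state_ * 1664525u + 1013904223u) & mask_;
        } while (state_ >= limit_);  // throw away out-of-range results
        return state_;
    }

private:
    std::uint32_t limit_;
    std::uint32_t mask_;
    std::uint32_t state_;
};

// Collects `count` distinct IDs below `limit`.
std::vector<std::uint32_t> make_ids(std::size_t count, std::uint32_t limit,
                                    std::uint32_t seed) {
    assert(count <= limit);  // more than `limit` draws would start repeating
    UniqueIdGenerator gen(limit, seed);
    std::vector<std::uint32_t> ids;
    for (std::size_t i = 0; i < count; ++i) ids.push_back(gen.next());
    return ids;
}
```

For the question's case, make_ids(1000, 100000, seed) masks the state to 17 bits (131072 values) and discards roughly a quarter of the draws.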

Get true or false with a given probability

I'm trying to write a function in c++ that will return true or false based on a probability given. So, for example if the probability given was 0.634 then, 63.4% of the time the function would return true. I've tried a few different things, and failed. Any help?
If you'd like to do this in C++11, you can use its various random number engines, combined with the uniform_real_distribution to provide a good result. The following code demonstrates:
#include <random>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
std::uniform_real_distribution<> uniform_zero_to_one(0.0, 1.0);
bool random_bool_with_prob( double prob ) // probability between 0.0 and 1.0
{
return uniform_zero_to_one(rand_engine) < prob; // true with probability prob
}
Alternately, you can use the bernoulli_distribution, which directly gives you a bool with the specified probability. The probability it takes is the probability of returning true, so it is exactly what you need:
#include <random>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
bool random_bool_with_prob( double prob ) // probability between 0.0 and 1.0
{
std::bernoulli_distribution d(prob);
return d(rand_engine);
}
If your probability is fixed, then you can move it out of the function like so:
#include <random>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
std::bernoulli_distribution random_bool_generator( prob ); // replace "prob" with your probability
bool random_bool()
{
return random_bool_generator( rand_engine );
}
Or if you want to get fancier still, you can bind them together:
#include <random>
#include <functional>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
std::bernoulli_distribution random_bool_generator( prob ); // replace "prob" with your probability
auto random_bool = std::bind( random_bool_generator, rand_engine );
// Now call random_bool() to get your random boolean with the specified probability.
You can replace knuth_b with any of the standard engines:
std::linear_congruential_engine
std::mersenne_twister_engine
std::subtract_with_carry_engine
or many more, which are versions of the above, parameterized various ways. My reference lists the following:
std::default_random_engine (Implementation defined.)
std::minstd_rand0
std::minstd_rand
std::mt19937
std::mt19937_64
std::ranlux24_base
std::ranlux48_base
std::ranlux24
std::ranlux48
std::knuth_b
And if that isn't enough, there are some standard adaptors that can further perturb the random number sequence:
std::discard_block_engine which adapts an engine by discarding a given number of generated values each time.
std::independent_bits_engine which adapts an engine to produce random values with a specified number of bits. (Not important to your particular need.)
std::shuffle_order_engine which adapts an engine by permutation of the order of their generated values.
The generators in the second list are derived from the base generators in the first list, either with specific parameters, adaptors or both. For example, knuth_b is equivalent to shuffle_order_engine< linear_congruential_engine< uint32_t, 16807, 0, 2147483647>, 256>, according to my reference book. (The C++ Standard Library, Second Edition, by Nicolai Josuttis, a great reference work.)
You can find more information online, including this brief introduction here: http://en.wikipedia.org/wiki/C++11#Extensible_random_number_facility
There's more documentation here: http://en.cppreference.com/w/cpp/numeric/random
You will probably want to modify the declaration of rand_engine above to provide a seed. The example above uses the default seed. See cppreference.com for how to seed it if you want a different seed.
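As a sketch of the seeding step, the engine can be initialised from std::random_device. The function names here are made up for illustration, std::mt19937 is used in place of knuth_b purely as an example, and a single 32-bit seed is a simplification rather than a recommendation.

```cpp
#include <cassert>
#include <random>

// Builds an engine seeded from the system's entropy source.
std::mt19937 make_seeded_engine() {
    std::random_device device;
    return std::mt19937{device()};
}

// Returns true with probability prob, drawing from the engine passed in.
template <class Engine>
bool random_bool_with_prob(double prob, Engine& engine) {
    assert(prob >= 0.0 && prob <= 1.0);
    std::bernoulli_distribution dist(prob);
    return dist(engine);
}
```

A caller would create the engine once, for example with auto engine = make_seeded_engine();, and then pass it to random_bool_with_prob(0.634, engine) on each call.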
#include <stdlib.h>
bool prob_true(double p){
return rand()/(RAND_MAX+1.0) < p;
}
Logic:
rand() returns a random number between 0 and RAND_MAX (including both), with equal probability for each number. So by dividing the result by RAND_MAX we get a random number between 0 and 1. This allows us to choose an area of this segment (in your example 63.4%, e.g. from 0 to 0.634) and check if the result fell in that area.
Now comes the tricky part: we don't want to get both 0 and 1! Why? Because we want probability 0 to never be true, that's why we need the <p (rather than the <=p) - so that when p=0 you'll never get true.
However, if you can also have 1 as the result, then in the case where p=1 there is a very small chance you get false!
That's why instead of dividing by RAND_MAX you divide by RAND_MAX+1.0. Also note that I added 1.0 instead of 1 to turn the number into a double (otherwise I might get an overflow if RAND_MAX==INT_MAX).
Finally, here's an alternate implementation without the division:
#include <stdlib.h>
bool prob_true(double p){
return rand() < p * (RAND_MAX+1.0);
}

How many random numbers does std::uniform_real_distribution use?

I was surprised to see that the output of this program:
#include <iostream>
#include <random>
int main()
{
std::mt19937 rng1;
std::mt19937 rng2;
std::uniform_real_distribution<double> dist;
double random = dist(rng1);
rng2.discard(2);
std::cout << (rng1() - rng2()) << "\n";
return 0;
}
is 0 - i.e. std::uniform_real_distribution uses two random numbers to produce a random double value in the range [0,1). I thought it would just generate one and rescale that. After thinking about it I guess that this is because std::mt19937 produces 32-bit ints and double is twice this size and thus not "random enough".
Question: How do I find out this number generically, i.e. if the random number generator and the floating point type are arbitrary types?
Edit: I just noticed that I could use std::generate_canonical instead, as I am only interested in random numbers of [0,1). Not sure if this makes a difference.
For template<class RealType, size_t bits, class URNG> std::generate_canonical the standard ([rand.util.canonical], section 26.5.7.2 in C++11) explicitly defines the number of calls to the uniform random number generator (URNG) to be
max(1, ⌈b / log_2 R⌉),
where b is the minimum of the number of bits in the mantissa of the RealType and the number of bits given to generate_canonical as template parameter.
R is the range of numbers the URNG can return (URNG::max()-URNG::min()+1).
However, in your example this will not make any difference, since you need 2 calls to the mt19937 to fill the 53 bits of the mantissa of the double.
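The formula can be evaluated generically with a small helper. This is a sketch, not a standard facility: the name canonical_calls is made up, and the logarithm is computed in floating point, so a ratio that is mathematically a whole number could in principle round the wrong way for unusual generators.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>

// Number of URNG calls implied by max(1, ceil(b / log2(R))) for
// std::generate_canonical<RealType, bits, URNG> (hypothetical helper).
template <class RealType, class URNG,
          std::size_t bits = std::numeric_limits<RealType>::digits>
std::size_t canonical_calls() {
    const std::size_t digits = std::numeric_limits<RealType>::digits;
    const std::size_t b = bits < digits ? bits : digits;
    assert(b > 0);
    // R = max - min + 1, in long double so a 64-bit range does not wrap to 0.
    const long double R =
        static_cast<long double>(URNG::max() - URNG::min()) + 1.0L;
    const long double k =
        std::ceil(static_cast<long double>(b) / std::log2(R));
    return k < 1.0L ? 1 : static_cast<std::size_t>(k);
}
```

For double with std::mt19937 this gives 2, matching the experiment in the question, while std::mt19937_64 needs only one call. It describes generate_canonical only; uniform_real_distribution is not required to consume the same number of values.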
For other distributions the standard does not provide a generic way to get any information on how many numbers the URNG has to generate to obtain one number of the distribution.
A reason might be that for some distributions the number of uniform random numbers required to generate a single number of the distribution is not fixed and may vary from call to call. An example is the std::poisson_distribution, which is usually implemented as a loop which draws a uniform random number in each iteration until the product of these numbers has reached a certain threshold (see for example the implementation of the GNU C++ library (line 1523-1528)).