Get true or false with a given probability - c++

I'm trying to write a function in c++ that will return true or false based on a probability given. So, for example if the probability given was 0.634 then, 63.4% of the time the function would return true. I've tried a few different things, and failed. Any help?

If you'd like to do this in C++11, you can use its various random number engines, combined with the uniform_real_distribution to provide a good result. The following code demonstrates:
#include <random>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
std::uniform_real_distribution<> uniform_zero_to_one(0.0, 1.0);
bool random_bool_with_prob( double prob ) // probability between 0.0 and 1.0
{
return uniform_zero_to_one(rand_engine) < prob; // true with probability prob
}
Alternately, you can use the bernoulli_distribution, which directly gives you a bool with the specified probability. The probability it takes is the probability of returning true, so it is exactly what you need:
#include <random>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
bool random_bool_with_prob( double prob ) // probability between 0.0 and 1.0
{
std::bernoulli_distribution d(prob);
return d(rand_engine);
}
If your probability is fixed, then you can move it out of the function like so:
#include <random>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
std::bernoulli_distribution random_bool_generator( prob ); // replace "prob" with your probability
bool random_bool()
{
return random_bool_generator( rand_engine );
}
Or if you want to get fancier still, you can bind them together:
#include <random>
#include <functional>
std::knuth_b rand_engine; // replace knuth_b with one of the engines listed below
std::bernoulli_distribution random_bool_generator( prob ); // replace "prob" with your probability
auto random_bool = std::bind( random_bool_generator, rand_engine );
// Now call random_bool() to get your random boolean with the specified probability.
You can replace knuth_b with any of the standard engines:
std::linear_congruential_engine
std::mersenne_twister_engine
std::subtract_with_carry_engine
or many more, which are versions of the above, parameterized various ways. My reference lists the following:
std::default_random_engine (Implementation defined.)
std::minstd_rand0
std::minstd_rand
std::mt19937
std::mt19937_64
std::ranlux24_base
std::ranlux48_base
std::ranlux24
std::ranlux48
std::knuth_b
And if that isn't enough, there are some standard adaptors that can further perturb the random number sequence:
std::discard_block_engine which adapts an engine by discarding a given number of generated values each time.
std::independent_bits_engine which adapts an engine to produce random values with a specified number of bits. (Not important to your particular need.)
std::shuffle_order_engine which adapts an engine by permutation of the order of their generated values.
The generators in the second list are derived from the base generators in the first list, either with specific parameters, adaptors or both. For example, knuth_b is equivalent to shuffle_order_engine< linear_congruential_engine< uint32_t, 16807, 0, 2147483647>, 256>, according to my reference book. (The C++ Standard Library, Second Edition, by Nicolai Josuttis, a great reference work.)
You can find more information online, including this brief introduction here: http://en.wikipedia.org/wiki/C++11#Extensible_random_number_facility
There's more documentation here: http://en.cppreference.com/w/cpp/numeric/random
You will probably want to modify the declaration of rand_engine above to provide a seed. The example above uses the default seed. See cppreference.com for how to seed it if you want a different seed.
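The seeding advice above could look roughly like the sketch below. It assumes std::mt19937 as the engine and std::random_device as the seed source; the helper name seeded_engine is hypothetical and not part of the answer. Note that std::random_device is allowed to be deterministic on some platforms, so treat this as one reasonable option rather than a guarantee.

```cpp
#include <random>

// Hypothetical helper (not from the answer above): one engine per thread,
// seeded once from std::random_device instead of the default seed.
inline std::mt19937& seeded_engine()
{
    static thread_local std::mt19937 engine{std::random_device{}()};
    return engine;
}

// Returns true with probability prob, where prob is between 0.0 and 1.0.
inline bool random_bool_with_prob(double prob)
{
    std::bernoulli_distribution d(prob);
    return d(seeded_engine());
}
```

Calling random_bool_with_prob(0.634) repeatedly should then return true in roughly 63.4% of calls, with a different sequence on each program run.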

#include <stdlib.h>
bool prob_true(double p){
return rand()/(RAND_MAX+1.0) < p;
}
Logic:
rand() returns a random number between 0 and RAND_MAX (including both), with equal probability for each number. So by dividing the result by RAND_MAX we get a random number between 0 and 1. This allows us to choose an area of this segment (in your example 63.4% of it, e.g. from 0 to 0.634) and check if the result fell in that area.
Now comes the tricky part: we don't want to get both 0 and 1! Why? Because we want probability 0 to never be true, that's why we need the <p (rather than the <=p) - so that when p=0 you'll never get true.
However, if you can also have 1 as the result, then in the case where p=1 there is a very small chance you get false!
That's why instead of dividing by RAND_MAX you divide by RAND_MAX+1.0. Also note that I added 1.0 instead of 1 to turn the number into a double (otherwise I might get an overflow if RAND_MAX==INT_MAX).
Finally, here's an alternate implementation without the division:
#include <stdlib.h>
bool prob_true(double p){
return rand() < p * (RAND_MAX+1.0);
}

Related

uniform_int_distribution with zero range goes to infinite loop

For unit tests I implemented a mock random number generator. I believe that this is a valid implementation of UniformBitGenerator (the mock actually uses google mock to set the return of operator(), but it behaves the same).
struct RNG
{
using result_type = size_t;
static result_type min() { return 0; }
static result_type max() { return std::numeric_limits<result_type>::max(); }
result_type operator()() { return max(); }
};
Now I use this mock to sample from std::uniform_int_distribution in the range [a, b], a == b. I believe this is allowed, the only restriction I have found here on the parameters of the distribution is b >= a. So I would expect the following program to print 5.
int main()
{
auto rng = RNG();
auto dist = std::uniform_int_distribution<>(5, 5);
printf("%d\n", dist(rng));
return 0;
}
Instead it goes into an infinite loop inside the STL, repeatedly drawing numbers from the generator but failing to find a number within the specified range. I tested different (current) compilers (including clang, gcc, icc) in different versions. RNG::max can return other values (e.g. 42) as well, doesn't change anything.
The real code I'm testing draws a random index into a container which may contain only one element. It would be easy to check this condition but it's a rare case and I would like to avoid it.
Am I missing something in the specification of RNGs in the STL? I'd be surprised to find a bug in ALL compilers ...
A uniform distribution is usually achieved with rejection sampling. You keep requesting random numbers until you get one that meets the criteria. You've set up a situation where the criteria can't be met, because your random number generator is very non-random, so it results in an infinite loop.
The standard says ([rand.dist.uni.int]):
A uniform_int_distribution random number distribution produces random integers i,
a ≤ i ≤ b, distributed according to the constant discrete probability function
  P(i|a,b)=1/(b−a+1)
. . .
explicit uniform_int_distribution(IntType a = 0, IntType b = numeric_limits<IntType>::max());
  Requires: a ≤ b.
So uniform_int_distribution<>(5,5) should return 5 with probability 1/1.
Implementations that go into an infinite loop instead, have a bug.
However, your mock RNG that always generates the same value, doesn't satisfy Uniform random bit generator requirements:
A uniform random bit generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned. [ Note: The degree to which g's results approximate the ideal is often determined statistically.  — end note ]
See [rand.req.genl]/p1.b:
Throughout this subclause [rand], the effect of instantiating a template:
b) that has a template type parameter named URBG is undefined unless the corresponding template argument is cv-unqualified and satisfies the requirements of uniform random bit generator.
Sure enough, with a standard RNG it just works:
#include <iostream>
#include <random>
int main() {
std::mt19937_64 rng;
std::uniform_int_distribution<> dist(5, 5);
std::cout << dist(rng) << "\n";
}
Prints:
5
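If the mock generator has to stay, one way to sidestep the problem in the code under test is to answer the one-element case without consulting the generator at all. The sketch below is only an illustration: random_index and ConstantRng are hypothetical names, and ConstantRng stands in for the mock from the question.

```cpp
#include <cstddef>
#include <limits>
#include <random>

// Stand-in for the question's mock: always returns max().
struct ConstantRng
{
    using result_type = std::size_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }
    result_type operator()() { return max(); }
};

// Hypothetical helper: picks an index in [0, size - 1]; size must be >= 1.
// A one-element range is answered directly, so the generator is never
// consulted and a constant mock cannot stall the distribution.
template <class URBG>
std::size_t random_index(URBG& rng, std::size_t size)
{
    if (size == 1) return 0;
    std::uniform_int_distribution<std::size_t> dist(0, size - 1);
    return dist(rng);
}
```

For containers with more than one element the mock would still violate the generator requirements, so this only covers the single-element case described in the question.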

C++: How to generate random numbers while excluding numbers from a given cache

So in c++ I'm using mt19937 engine and the uniform_int_distribution in my random number generator like so:
#include <random>
#include <time.h>
int get_random(int lwr_lm, int upper_lm){
std::mt19937 mt(time(nullptr));
std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
return dist(mt);
}
What I need is to alter the above generator such that there is a cache that contains a number of integers I need to be excluded when I use the above generator over and over again.
How do I alter the above such that I can achieve this?
There are many ways to do it. A simple way would be to maintain your "excluded numbers" in a std::set and after each generation of a random number, check whether it is in the set and if it is then generate a new random number - repeat until you get a number that was not in the set, then return that.
Btw; while distributions are cheap to construct, engines are not. You don't want to re-construct your mt19937 every time the function is called, but instead create it once and then re-use it. You probably also want to use a better seed than the current time in seconds.
Are you 1) attempting to sample without replacement in the discrete interval? Or is it 2) a patchy distribution over the interval that stays fairly constant?
If 1) you could use std::shuffle as per the answer here How to sample without replacement using c++ uniform_int_distribution
If 2) you could use std::discrete_distribution (element 0 corresponding to lwr_lm) and weight zero the numbers you don't want. Obviously the memory requirements are linear in upper_lm-lwr_lm so might not be practical if this is large
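Option 2 could be sketched as below, assuming the excluded values are kept in a std::set<int>. The helper names are hypothetical. Excluded values get weight 0 and all others weight 1, so the allowed values stay equally likely; at least one value in the range must remain allowed.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: builds a distribution over [lwr_lm, upper_lm] in which
// every excluded value has weight 0 and every other value has weight 1.
// Index 0 of the distribution corresponds to lwr_lm.
inline std::discrete_distribution<int> make_excluding_distribution(
    int lwr_lm, int upper_lm, const std::set<int>& excluded)
{
    std::vector<double> weights;
    weights.reserve(static_cast<std::size_t>(upper_lm - lwr_lm + 1));
    for (int value = lwr_lm; value <= upper_lm; ++value)
        weights.push_back(excluded.count(value) ? 0.0 : 1.0);
    return std::discrete_distribution<int>(weights.begin(), weights.end());
}

// Draws one allowed value; the engine is created once by the caller and reused.
inline int get_random(std::discrete_distribution<int>& dist, std::mt19937& mt, int lwr_lm)
{
    return lwr_lm + dist(mt);
}
```

The distribution has to be rebuilt whenever the set of excluded values changes, which is why this fits a cache that stays fairly constant.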
I would propose two similar solutions for the problem. They are based upon probabilistic structures, and provide you with the answer "potentially in cache" or "definitely not in cache". There are false positives but no false negatives.
Perfect hash function. There are many implementations, including one from GNU. Basically, run it on set of cache values, and use generated perfect hash functions to reject sampled values. You don't even need to maintain hash table, just function mapping random value to integer index. As soon as index is in the hash range, reject the number. Being perfect means you need only one call to check and result will tell you that number is in the set. There are potential collisions, so false positives are possible.
Bloom filter. Same idea, build filter with whatever bits per cache item you're willing to spare, and with quick check you either will get "possible in the cache" answer or clear negative. You could trade answer precision for memory and vice versa. False positives are possible
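As a rough illustration of the Bloom filter idea, here is a minimal sketch. The class, the two-probe scheme and the mixing constants are all assumptions chosen for brevity, not a tuned implementation; bit_count must be greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Minimal Bloom filter over ints with two probe positions per value.
class BloomFilter
{
public:
    explicit BloomFilter(std::size_t bit_count) : bits_(bit_count, false) {}

    void insert(int value)
    {
        bits_[probe(value, 0x9E3779B9u)] = true;
        bits_[probe(value, 0x85EBCA6Bu)] = true;
    }

    // false: definitely not inserted. true: possibly inserted (false positives happen).
    bool possibly_contains(int value) const
    {
        return bits_[probe(value, 0x9E3779B9u)] && bits_[probe(value, 0x85EBCA6Bu)];
    }

private:
    // Illustrative integer mixer; the salt selects one of the two probes.
    std::size_t probe(int value, std::uint32_t salt) const
    {
        std::uint32_t h = static_cast<std::uint32_t>(value) * salt;
        h ^= h >> 15;
        h *= 0x27d4eb2du;
        h ^= h >> 13;
        return h % bits_.size();
    }

    std::vector<bool> bits_;
};

// Rejection sampling against the filter. Every cached value is rejected; a few
// uncached values may be rejected as well. Never returns if the filter
// rejects the whole range.
inline int get_random_excluding(int lwr_lm, int upper_lm,
                                const BloomFilter& cache, std::mt19937& mt)
{
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    int r = dist(mt);
    while (cache.possibly_contains(r))
        r = dist(mt);
    return r;
}
```

Because of false positives, the resulting distribution excludes slightly more than the cached values; more bits per cached item reduce that effect.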
As mentioned by @virgesmith in his answer, a different approach might be a better solution depending on your problem.
Keeping a cache and using it to filter future draws is inefficient for a large range.
Here is a naive example of a different method, though it is limited by your memory: pick a random number from a buffer and remove it so it cannot be drawn in a later iteration.
#include <random>
#include <time.h>
#include <iostream>
#include <numeric> // std::iota
#include <vector>
int get_random(int lwr_lm, int upper_lm, std::vector<int> &buff, std::mt19937 &mt){
if (buff.size() > 0) {
std::uniform_int_distribution<int> dist(0, buff.size()-1);
int tmp_index = dist(mt);
int tmp_value = buff[tmp_index];
buff.erase(buff.begin() + tmp_index);
return tmp_value;
} else {
return 0;
}
}
int main() {
// lower and upper limit for random distribution
int lower = 0;
int upper = 10;
// Random generator
std::mt19937 mt(time(nullptr));
// Buffer to filter and avoid duplication; it holds the integers from lower up to, but not including, upper
std::vector<int> my_buffer(upper-lower);
std::iota(my_buffer.begin(), my_buffer.end(), lower);
for (int i = 0; i < 20; ++i) {
std::cout << get_random(lower, upper, my_buffer, mt) << std::endl;
}
return 0;
}
Edit: a cleaner solution here
It might not be the prettiest solution, but what's stopping you from maintaining that cache and checking existence before returning? It will slow down for large caches though.
#include <random>
#include <time.h>
#include <set>
std::set<int> cache;
int get_random(int lwr_lm, int upper_lm){
static std::mt19937 mt(time(nullptr)); // construct and seed the engine once, not on every call
std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
auto r = dist(mt);
while(cache.find(r) != cache.end())
r = dist(mt);
return r;
}

std::uniform_real_distribution - get all possible numbers

I would like to create a std::uniform_real_distribution able to generate a random number in the range [MIN_FLOAT, MAX_FLOAT]. Following is my code:
#include <random>
#include <limits>
using namespace std;
int main()
{
const auto a = numeric_limits<float>::lowest();
const auto b = numeric_limits<float>::max();
uniform_real_distribution<float> dist(a, b);
return 0;
}
The problem is that when I execute the program, it is aborted because a and b seem to be invalid arguments. How should I fix it?
uniform_real_distribution's constructor requires:
a ≤ b and b − a ≤ numeric_limits<RealType>::max().
That last one is not possible for you, since the difference between lowest and max, by definition, must be larger than max (and will almost certainly be INF).
There are several ways to resolve this. The simplest, as Nathan pointed out, is to just use a uniform_real_distribution<double>. Unless double for your implementation couldn't store the range of a float (and IEEE-754 Float64's can store the range of Float32's), this ought to work. You would still be passing the numeric_limits for a float, but since the distribution uses double, it can handle the math for the increased range.
Alternatively, you could combine a uniform_real_distribution<float> with a boolean uniform_int_distribution (that is, one that selects between 0 and 1). Your real distribution should be over the positive numbers, up to max. Every time you get a number from the real distribution, get one from the int distribution too. If the integer is 1, then negate the real value.
This has the downside of making the probability of zero slightly higher than the probability of other numbers, since positive and negative zero are the same thing.
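Both approaches can be sketched as follows. The function names are hypothetical and std::mt19937 is only an assumed engine choice. In the first function the double result is narrowed back to float, so it can round onto the float endpoints.

```cpp
#include <limits>
#include <random>

// Option 1: sample in double, whose range covers that of float, then narrow.
inline float random_float_via_double(std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist(
        std::numeric_limits<float>::lowest(), std::numeric_limits<float>::max());
    return static_cast<float>(dist(rng));
}

// Option 2: sample a non-negative float, then negate it when a 0/1 draw is 1.
inline float random_float_via_sign(std::mt19937& rng)
{
    std::uniform_real_distribution<float> magnitude(0.0f, std::numeric_limits<float>::max());
    std::uniform_int_distribution<int> coin(0, 1);
    const float value = magnitude(rng);
    return coin(rng) == 1 ? -value : value;
}
```

In both cases the constructor precondition b − a ≤ max holds: in option 1 because the subtraction happens in double, in option 2 because the interval only spans the non-negative half.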

How many random numbers does std::uniform_real_distribution use?

I was surprised to see that the output of this program:
#include <iostream>
#include <random>
int main()
{
std::mt19937 rng1;
std::mt19937 rng2;
std::uniform_real_distribution<double> dist;
double random = dist(rng1);
rng2.discard(2);
std::cout << (rng1() - rng2()) << "\n";
return 0;
}
is 0 - i.e. std::uniform_real_distribution uses two random numbers to produce a random double value in the range [0,1). I thought it would just generate one and rescale that. After thinking about it I guess that this is because std::mt19937 produces 32-bit ints and double is twice this size and thus not "random enough".
Question: How do I find out this number generically, i.e. if the random number generator and the floating point type are arbitrary types?
Edit: I just noticed that I could use std::generate_canonical instead, as I am only interested in random numbers of [0,1). Not sure if this makes a difference.
For template<class RealType, size_t bits, class URNG> std::generate_canonical the standard (section 27.5.7.2) explicitly defines the number of calls to the uniform random number generator (URNG) to be
max(1, ⌈b / log_2 R⌉),
where b is the minimum of the number of bits in the mantissa of the RealType and the number of bits given to generate_canonical as template parameter.
R is the range of numbers the URNG can return (URNG::max()-URNG::min()+1).
However, in your example this will not make any difference, since you need 2 calls to the mt19937 to fill the 53 bits of the mantissa of the double.
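For std::generate_canonical, the formula can be evaluated generically at compile-time-known types. The sketch below is an illustration under the wording quoted above; canonical_calls is a hypothetical name, and it says nothing about how many calls other distributions make.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>

// Number of URNG calls made by std::generate_canonical<RealType, bits>(urng):
// k = max(1, ceil(b / log2(R))), with b = min(digits of RealType, bits)
// and R = URNG::max() - URNG::min() + 1.
template <class RealType, class URNG,
          std::size_t bits = std::numeric_limits<RealType>::digits>
std::size_t canonical_calls()
{
    const std::size_t b =
        std::min<std::size_t>(std::numeric_limits<RealType>::digits, bits);
    // Computed in long double so that a full 64-bit range does not overflow.
    const long double R = static_cast<long double>(URNG::max())
                        - static_cast<long double>(URNG::min()) + 1.0L;
    const long double k = std::ceil(static_cast<long double>(b) / std::log2(R));
    return std::max<std::size_t>(1, static_cast<std::size_t>(k));
}
```

For example, canonical_calls<double, std::mt19937>() evaluates to 2 (53 mantissa bits from a 32-bit generator), while canonical_calls<double, std::mt19937_64>() evaluates to 1.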
For other distributions the standard does not provide a generic way to get any information on how many numbers the URNG has to generate to obtain one number of the distribution.
A reason might be that for some distributions the number of uniform random numbers required to generate a single number of the distribution is not fixed and may vary from call to call. An example is the std::poisson_distribution, which is usually implemented as a loop which draws a uniform random number in each iteration until the product of these numbers has reached a certain threshold (see for example the implementation of the GNU C++ library (line 1523-1528)).

Fast pseudo random number generator for procedural content

I am looking for a pseudo random number generator which would be specialized to work fast when it is given a seed before generating each number. Most generators I have seen so far assume you set the seed once and then generate a long sequence of numbers. The only thing I have seen so far that looks somewhat similar is Perlin Noise, but it generates data that is too "smooth": for similar inputs it tends to produce similar results.
The declaration of the generator should look something like:
int RandomNumber1(int seed);
Or:
int RandomNumber3(int seedX, int seedY, int seedZ);
I think having good RandomNumber1 should be enough, as it is possible to implement RandomNumber3 by hashing its inputs and passing the result into the RandomNumber1, but I wrote the 2nd prototype in case some implementation could use the independent inputs.
The intended use for this generator is to use it for procedural content generator, like generating a forest by placing trees in a grid and determining a random tree species and random spatial offsets for each location.
The generator needs to be very efficient (below 500 CPU cycles), because the procedural content is created in huge quantities in real time during rendering.
Seems like you're asking for a hash-function rather than a PRNG. Googling 'fast hash function' yields several promising-looking results.
For example:
uint32_t hash( uint32_t a)
{
a = (a ^ 61) ^ (a >> 16);
a = a + (a << 3);
a = a ^ (a >> 4);
a = a * 0x27d4eb2d;
a = a ^ (a >> 15);
return a;
}
Edit: Yep, some hash functions definitely look more suitable than others.
For your purposes, it should be sufficient to eyeball the function and check that a single-bit change in the input will propagate to lots of output bits.
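To connect this to the signatures in the question, a hash like the one above can be wrapped as follows. This is only a sketch: the names hash32, random_number1 and random_number3 are hypothetical, unsigned 32-bit integers replace the int parameters, and chaining the hash over the three coordinates is just one possible way to combine them.

```cpp
#include <cstdint>

// The same integer hash as in the example above.
inline std::uint32_t hash32(std::uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}

// One value per seed, with no state kept between calls.
inline std::uint32_t random_number1(std::uint32_t seed)
{
    return hash32(seed);
}

// Hypothetical combiner: chains the hash so that each coordinate
// perturbs the input of the next round.
inline std::uint32_t random_number3(std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
    return hash32(hash32(hash32(x) ^ y) ^ z);
}
```

Every step of this hash is invertible, so changing a single coordinate while holding the other two fixed always changes the result.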
Yep, you are looking for a fast integer hash algorithm rather than a PRNG.
This page has a few algorithms, I'm sure you'll find plenty more now you know the correct search terms.
Edit: The original page has been removed, a live version can be found on GitHub.
Here's a small random number generator developed by George Marsaglia. He's an expert in the field, so you can be confident the generator has good statistical properties.
v = 36969*(v & 65535) + (v >> 16);
u = 18000*(u & 65535) + (u >> 16);
return (v << 16) + (u & 65535);
Here u and v are unsigned ints. Initialize them to any non-zero values. Each time you generate a random number, store u and v somewhere. You could wrap this in a function to match your signature above (except the ints are unsigned.)
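Wrapped up with its state, the generator could look like the sketch below. The struct name is hypothetical, and std::uint32_t is assumed for the "unsigned ints" mentioned above.

```cpp
#include <cstdint>

// Holds the two state words; initialize both to non-zero values.
struct MarsagliaMwc
{
    std::uint32_t u;
    std::uint32_t v;

    // Advances both state words and combines them into one 32-bit result.
    std::uint32_t next()
    {
        v = 36969 * (v & 65535) + (v >> 16);
        u = 18000 * (u & 65535) + (u >> 16);
        return (v << 16) + (u & 65535);
    }
};
```

A per-location generator for procedural content could be created as MarsagliaMwc g{seedA, seedB} with two non-zero seeds derived from the location, then queried with g.next().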
See std::tr1::ranlux3, or other random number generators that are part of the TR1 additions to the standard C++ library. I suggested mt19937 initially, but then saw your note that it needs to be very fast. TR1 should be available on Microsoft VC++ and GCC, and can also be found in the Boost libraries, which support even more compilers.
example adapted from boost documentation:
#include <random>
#include <iostream>
#include <iterator>
#include <functional>
#include <algorithm>
#include <ctime>
using namespace std;
using namespace std::tr1;
int main(){
random_device trueRand;
ranlux3 rng(trueRand); // produces randomness out of thin air
// see pseudo-random number generators
uniform_int<> six(1,6); // distribution that maps to 1..6
// see random number distributions
variate_generator<ranlux3&, uniform_int<> >
die(rng, six); // glues randomness with mapping
// simulate rolling a die
generate_n( ostream_iterator<int>(cout, " "), 10, ref(die));
}
example output:
2 4 4 2 4 5 4 3 6 2
Any TR1 random number generator can seed any other random number generator. If you need higher quality results, consider feeding the output of mt19937 (which is slower, but higher quality) into a minstd_rand or ranlux3, which are faster generators.
If memory is not really an issue and speed is of utmost importance then you can prebuild a large array of random numbers and just iterate through it at runtime. For example, have a separate program generate 100,000 random numbers and save them in their own file like
unsigned int randarray []={1,2,3,....}
then include that file into your compile and at runtime your random number function only needs to pull numbers from that array and loop back to the start when it hits the end.
I use the following code in my Java random number library - this has worked pretty well for me. I also use this for generating procedural content.
/**
* State for random number generation
*/
private static volatile long state=xorShift64(System.nanoTime()|0xCAFEBABE);
/**
* Gets a long random value
* @return Random long value based on static state
*/
public static long nextLong() {
long a=state;
state = xorShift64(a);
return a;
}
/**
* XORShift algorithm - credit to George Marsaglia!
* @param a initial state
* @return new state
*/
public static final long xorShift64(long a) {
a ^= (a << 21);
a ^= (a >>> 35);
a ^= (a << 4);
return a;
}
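Since the question asks about C++, here is a rough C++ port of the same XORShift step. It is a sketch rather than an exact translation: unsigned 64-bit integers replace Java's long, the seed is supplied by the caller instead of System.nanoTime(), and the class name XorShift64 is hypothetical. The seed must be non-zero, because a zero state stays zero.

```cpp
#include <cstdint>

// Same shift triple (21, 35, 4) as the Java version above.
inline std::uint64_t xorshift64(std::uint64_t a)
{
    a ^= (a << 21);
    a ^= (a >> 35); // logical shift on an unsigned type, matching Java's >>>
    a ^= (a << 4);
    return a;
}

// Small stateful wrapper; seed must be non-zero.
class XorShift64
{
public:
    explicit XorShift64(std::uint64_t seed) : state_(xorshift64(seed)) {}

    // Returns the current state and advances to the next one.
    std::uint64_t next()
    {
        const std::uint64_t a = state_;
        state_ = xorshift64(a);
        return a;
    }

private:
    std::uint64_t state_;
};
```

For the per-seed use case in the question, xorshift64 can also be called directly on a non-zero seed, optionally more than once to mix it further.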