Random ints with different likelihoods - c++

I was wondering if there is a way to generate a random number between A and B where numbers that meet a certain requirement are more likely to appear than the other numbers between A and B. For example, lower numbers are more likely to appear, so if A = 1 and B = 10 then 1 would be the likeliest and 10 would be the unlikeliest.
All help is appreciated :) (sorry for bad English/grammar/question)

C++11 (which you should absolutely be using by now) added the <random> header to the C++ standard library. This header provides much higher quality random number generators to C++. Using srand() and rand() has never been a very good idea because there's no guarantee of quality, but now it's truly inexcusable.
In your example, it sounds like you want what would probably be called a 'discrete triangular distribution': the probability mass function looks like a triangle. The easiest (but perhaps not the most efficient) way to implement this in C++ would be the discrete distribution included in <random>:
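#include <iostream>
#include <map>
#include <numeric>
#include <random>
#include <vector>
// The deduced return type requires C++14.
auto discrete_triangular_distribution(int max) {
    std::vector<int> weights(max);
    // Weights 0, 1, ..., max-1: value i is drawn with probability proportional to i.
    std::iota(weights.begin(), weights.end(), 0);
    std::discrete_distribution<> dist(weights.begin(), weights.end());
    return dist;
}
int main() {
    std::random_device rd;
    std::mt19937 gen(rd());
    auto&& dist = discrete_triangular_distribution(10);
    std::map<int, int> counts;
    for (int i = 0; i < 10000; i++)
        ++counts[dist(gen)];
    // Braces are needed so that both output statements run for every entry.
    for (auto count : counts) {
        std::cout << count.first << " generated ";
        std::cout << count.second << " times.\n";
    }
}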
which for me gives the following output:
1 generated 233 times.
2 generated 425 times.
3 generated 677 times.
4 generated 854 times.
5 generated 1130 times.
6 generated 1334 times.
7 generated 1565 times.
8 generated 1804 times.
9 generated 1978 times.
Things more complex than this would be better served either by using one of the existing distributions (I have been told that all commonly used statistical distributions are included) or by writing your own distribution, which isn't too hard: it just has to be an object with a function call operator that takes a random bit generator and uses those bits to produce (in this case) random numbers. You could also create one that makes random strings, or any arbitrary random objects, perhaps for testing purposes.
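The weights in the example above grow with the value, so larger numbers come out more often. A minimal sketch of the reverse case from the question (lower numbers more likely, on an arbitrary range [a, b]) could look like the following. The names `descending_weights` and `low_biased_int` are made up for this illustration and are not part of `<random>`.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: weights for the values a, a+1, ..., b, largest first,
// so that a gets weight (b - a + 1) and b gets weight 1.
std::vector<double> descending_weights(int a, int b) {
    assert(a <= b);
    std::vector<double> weights;
    for (int w = b - a + 1; w >= 1; --w)
        weights.push_back(w);
    return weights;
}

// Hypothetical wrapper: draws from [a, b] with lower values more likely.
class low_biased_int {
public:
    low_biased_int(int a, int b) : m_a(a) {
        const std::vector<double> w = descending_weights(a, b);
        m_dist = std::discrete_distribution<int>(w.begin(), w.end());
    }
    template <class Gen>
    int operator()(Gen& gen) {
        // discrete_distribution yields an index in [0, b - a]; shift it by a.
        return m_a + m_dist(gen);
    }
private:
    int m_a;
    std::discrete_distribution<int> m_dist;
};
```

It would be used like any other distribution: construct `low_biased_int dist(1, 10);` once, then call `dist(gen)` with an engine such as `std::mt19937`.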

Your question doesn't specify which distribution to use. One option (of many) is to use the (negative) exponential distribution. This distribution is parameterized by a parameter λ. For each value of λ, the maximum result is unbounded (which needs to be handled in order to return results only in the range specified), so any λ could theoretically work; however, the properties of the CDF imply that it pays to choose something in the order of 1 / (to - from + 1).
[Figures: plots of the exponential distribution, including its CDF (from Wikipedia, by Skbkekas, CC BY 3.0)]
The following class works like a standard library distribution. Internally, it generates numbers in a loop, until a result in [from, to] is obtained.
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <random>
#include <vector>
class bounded_discrete_exponential_dist {
public:
explicit bounded_discrete_exponential_dist(std::size_t from, std::size_t to) :
m_from{from}, m_to{to}, m_d{0.5 / (to - from + 1)} {}
explicit bounded_discrete_exponential_dist(std::size_t from, std::size_t to, double factor) :
m_from{from}, m_to{to}, m_d{factor} {}
template<class Gen>
std::size_t operator()(Gen &gen) {
while(true) {
const auto r = m_from + static_cast<std::size_t>(m_d(gen));
if(r <= m_to)
return r;
}
}
private:
std::size_t m_from, m_to;
std::exponential_distribution<> m_d;
};
Here is an example of using it:
int main()
{
std::random_device rd;
std::mt19937 gen(rd());
bounded_discrete_exponential_dist d{1, 10};
std::vector<std::size_t> hist(10, 0);
for(std::size_t i = 0; i < 99999; ++i)
++hist[d(gen) - 1];
for(auto h: hist)
std::cout << std::string(static_cast<std::size_t>(80 * h / 99999.), '+') << std::endl;
}
When run, it outputs a histogram like this:
$ ./a.out
++++++++++
+++++++++
+++++++++
++++++++
+++++++
+++++++
+++++++
+++++++
++++++
++++++

Your basic random number generator should produce high-quality, uniform random numbers on the interval 0 to 1 - epsilon. You then transform them to get the distribution you want. The simplest transform is of course (int)(p * N), in the common case of needing an integer from 0 to N - 1.
But there are many other transforms you can try. Take the square root, for example, to bias the result towards 1.0, then use 1 - p to move the bias towards zero. Or you can look up the Poisson distribution, which might be what you are after. You can also use a half-Gaussian distribution (a statistical bell curve with the values below zero cut off, and presumably also the extreme tail of the distribution where it goes out of range).
There is no single right answer. Try various things, plot out ten thousand or so values, and pick the one that gives results you like.
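As a rough sketch of the square-root transform mentioned above: the function name `low_biased_index` is invented for this example, and it uses `std::uniform_real_distribution` as the uniform source, which is an assumption rather than something this answer prescribes. Taking 1 - sqrt(1 - u) keeps the result in [0, 1) and gives a density that falls linearly from 0 to 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: returns an integer in [0, n - 1], with small values
// more likely than large ones.
template <class Gen>
int low_biased_index(int n, Gen& gen) {
    assert(n > 0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double u = uniform(gen);               // uniform on [0, 1)
    const double q = 1.0 - std::sqrt(1.0 - u);   // density 2(1 - q): biased toward 0
    const int k = static_cast<int>(q * n);       // the (int)(p * N) transform
    return std::min(k, n - 1);                   // guard against rounding at the top edge
}
```

With n = 10, the value 0 should come up roughly 19% of the time and 9 roughly 1% of the time, since the probability of bucket k is (1 - k/n)^2 - (1 - (k+1)/n)^2.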

You can make an array of values in which the more likely values occupy more indexes, and then choose a random index.
Example:
int random[55];
int result;
int index = 0;
for (int i = 1 ; i <= 10 ; ++i)
for (int j = i ; j <= 10 ; ++j)
random[index++] = i;
result = random[rand() % 55];
Also, you can try drawing a random number twice: first you choose the max number, then you choose your random number:
int max = rand() % 10 + 1; // This is your max value
int random = rand() % max + 1; // This is your result
Both ways will make 1 more likely than 2, 2 more likely than 3, ..., and 9 more likely than 10.
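The two methods both favour low numbers, but not with the same probabilities. A small sketch that computes the exact probability of each value k in [1, n] for both methods, assuming an ideal uniform generator (the function names are made up for this illustration):

```cpp
#include <cassert>

// Lookup-table method: value k occupies (n - k + 1) of the n(n+1)/2 slots.
double table_probability(int k, int n) {
    assert(1 <= k && k <= n);
    return static_cast<double>(n - k + 1) / (n * (n + 1) / 2);
}

// Two-draw method: max is uniform on [1, n], then the result is uniform on
// [1, max]. Value k is possible whenever max >= k, with chance 1/max each time.
double two_draw_probability(int k, int n) {
    assert(1 <= k && k <= n);
    double p = 0.0;
    for (int m = k; m <= n; ++m)
        p += 1.0 / (static_cast<double>(n) * m);
    return p;
}
```

For n = 10, the table method gives 1 a probability of 10/55 (about 0.18) and 10 a probability of 1/55, while the two-draw method gives 1 about 0.29 and 10 exactly 0.01, so the second method is skewed more strongly toward small values.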

Related

How to set a minimum range for generating random number in c++? [duplicate]

I need a function which would generate a random integer in a given range (including boundary values). I don't have unreasonable quality/randomness requirements; I have four requirements:
I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
I need it to be reasonably uniform (use of rand() is perfectly fine).
The minimum-maximum ranges can be anything from <0, 1> to <-32727, 32727>.
It has to be seedable.
I currently have the following C++ code:
output = min + (rand() * (int)(max - min) / RAND_MAX)
The problem is that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++, where RAND_MAX is 32767, the chance of that is 1/32768). This is a major issue for small ranges like <-1, 1>, where the last value is almost never returned.
So I grabbed pen and paper and came up with the following formula (which builds on the (int)(n + 0.5) integer rounding trick):
But it still doesn't give me a uniform distribution. Repeated runs with 10000 samples give me a ratio of 37:50:13 for the values -1, 0, 1.
Is there a better formula? (Or even whole pseudo-random number generator function?)
The simplest (and hence best) C++ (using the 2011 standard) answer is:
#include <random>
std::random_device rd; // Only used once to initialise (seed) engine
std::mt19937 rng(rd()); // Random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // Guaranteed unbiased
auto random_integer = uni(rng);
There isn't any need to reinvent the wheel, worry about bias, or worry about using time as the random seed.
A fast solution, somewhat better than yours but still not properly uniformly distributed, is
output = min + (rand() % static_cast<int>(max - min + 1))
Except when the size of the range is a power of 2, this method produces biased, non-uniformly distributed numbers regardless of the quality of rand(). For a comprehensive test of the quality of this method, please read this.
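To make the bias concrete, here is a small sketch that counts how many of the possible rand() outputs land on each remainder. The function name `modulo_counts` is made up for this illustration, and the sketch assumes rand() itself returns every value in [0, rand_max] equally often.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for a generator that returns each value in
// [0, rand_max] equally often, count how many of those values map to
// each result of `value % n`.
std::vector<long long> modulo_counts(long long rand_max, int n) {
    assert(n > 0 && rand_max >= 0);
    const long long total = rand_max + 1;   // number of possible outputs
    std::vector<long long> counts(n, total / n);
    // The first (total % n) remainders receive one extra input each.
    for (long long i = 0; i < total % n; ++i)
        ++counts[i];
    return counts;
}
```

With rand_max = 32767 and n = 3 the counts are 10923, 10923 and 10922, so the last value is slightly less likely. With n = 4 every count is 8192, which is the power-of-2 case where no bias appears.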
If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high quality uniform_int_distribution which will accept minimum and maximum bounds (inclusive as you need), and you can choose among various random number generators to plug into that distribution.
Here is code that generates ten million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it, as you mentioned performance is a major concern for you.
#include <iostream>
#include <random>
#include <chrono>
int main()
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::duration<double> sec;
Clock::time_point t0 = Clock::now();
const int N = 10000000;
typedef std::minstd_rand G; // Select the engine
G g; // Construct the engine
typedef std::uniform_int_distribution<> D; // Select the distribution
D d(-57, 365); // Construct the distribution
int c = 0;
for (int i = 0; i < N; ++i)
c += d(g); // Generate a random number
Clock::time_point t1 = Clock::now();
std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
return c;
}
For me (2.8 GHz Intel Core i5) this prints out:
2.10268e+07 random numbers per second.
You can seed the generator by passing in an int to its constructor:
G g(seed);
If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g., to long long):
typedef std::uniform_int_distribution<long long> D;
If you later find that the minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:
typedef std::mt19937 G; // Now using mersenne_twister_engine
Having separate control over the random number generator, and the random distribution can be quite liberating.
I've also computed (not shown) the first four "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:
min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001
(The x_ prefix refers to "expected".)
Let's split the problem into two parts:
Generate a random number n in the range 0 through (max-min).
Add min to that number.
The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.
It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.
The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since the rejected part, (RAND_MAX + 1) % (max-min+1), is always less than (RAND_MAX + 1) / 2, we know that p > 1/2, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.
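The retry idea described above can be sketched as follows. This is a minimal illustration rather than a vetted implementation: the name `uniform_rand_in_range` is made up here, and the sketch assumes the range holds at most RAND_MAX + 1 values.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns an integer in [min, max] using rand() and
// rejection, so that every value is equally likely if rand() is uniform.
int uniform_rand_in_range(int min, int max) {
    assert(min <= max);
    const long long span = static_cast<long long>(max) - min + 1;
    const long long rand_count = static_cast<long long>(RAND_MAX) + 1;
    assert(span <= rand_count);
    // Largest multiple of span that fits in the rand() output range.
    const long long limit = rand_count - rand_count % span;
    long long r;
    do {
        r = std::rand();      // ask again while the draw is too large
    } while (r >= limit);
    return static_cast<int>(min + r % span);
}
```

Seeding still goes through `std::srand`, so the seedability requirement from the question is kept.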
Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.
Use the Mersenne Twister. The Boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects, such as artificial intelligence and evolutionary algorithms.
Here's their example where they make a simple function to roll a six-sided die:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>
boost::mt19937 gen;
int roll_die() {
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
return die();
}
Oh, and here's some more pimping of this generator just in case you aren't convinced you should use it over the vastly inferior rand():
The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.
Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in an 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".
An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.
int RandU(int nMin, int nMax)
{
return nMin + (int)((double)rand() / (RAND_MAX + 1.0) * (nMax-nMin+1)); // 1.0 avoids int overflow when RAND_MAX == INT_MAX
}
This is a mapping of RAND_MAX + 1 integers (32768 with a 15-bit rand()) to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note however that if (nMax-nMin+1) is large, the mapping won't work (for example, you can't map 32768 values to 30000 values with equal probability). If such ranges are needed, you should use a 32-bit or 64-bit random source instead of the 15-bit rand(), or ignore rand() results which are out of range.
Assume min and max are integer values, where
[ and ] mean the endpoint is included, and
( and ) mean the endpoint is excluded.
The expressions below use this notation to pick the right formula with C++'s rand().
Reference:
For the definition of the ( ) and [ ] notation, see Interval (mathematics).
For the rand and srand functions and the definition of RAND_MAX, see std::rand.
[min, max]
int randNum = rand() % (max - min + 1) + min;
(min, max]
int randNum = rand() % (max - min) + min + 1;
[min, max)
int randNum = rand() % (max - min) + min;
(min, max)
int randNum = rand() % (max - min - 1) + min + 1;
Here is an unbiased version that generates numbers in [low, high]:
int r;
do {
r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;
If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.
I recommend the Boost.Random library. It's super detailed and well-documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.
Notice that in most suggestions the initial random value that you get from the rand() function, which is typically from 0 to RAND_MAX, is simply wasted. You are creating only one random number out of it, while there is a sound procedure that can give you more.
Assume that you want integer random numbers in the region [min, max]. We start from [0, max-min].
Take the base b = max-min+1.
Start by representing the number you got from rand() in base b.
That way you get floor(log(b,RAND_MAX)) random numbers, because each digit in base b, except possibly the last one, represents a random number in the range [0, max-min].
Of course, the final shift to [min, max] is simple: for each random number r, use r+min.
int n = NUM_DIGIT-1;
while(n >= 0)
{
r[n] = res % b;
res -= r[n];
res /= b;
n--;
}
If NUM_DIGIT is the number of digits in base b that you can extract, that is,
NUM_DIGIT = floor(log(b,RAND_MAX))
then the above is a simple implementation of extracting NUM_DIGIT random numbers from 0 to b-1 out of one random number in [0, RAND_MAX], provided b < RAND_MAX.
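A self-contained sketch of the digit extraction described above. The function names `digits_available` and `base_digits` are invented for this example, and the sketch only shows the arithmetic: it does not deal with the small bias that remains when a power of b does not divide RAND_MAX + 1 evenly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: floor(log_b(max_value)), computed with integers only.
int digits_available(unsigned long long max_value, unsigned b) {
    assert(b >= 2);
    int count = 0;
    for (unsigned long long power = b; power <= max_value; power *= b)
        ++count;
    return count;
}

// Hypothetical helper: the num_digits lowest base-b digits of value,
// most significant first, each in [0, b - 1].
std::vector<int> base_digits(unsigned long long value, unsigned b, int num_digits) {
    assert(b >= 2 && num_digits >= 0);
    std::vector<int> r(num_digits);
    for (int n = num_digits - 1; n >= 0; --n) {
        r[n] = static_cast<int>(value % b);   // peel off the lowest digit
        value /= b;
    }
    return r;
}
```

For instance, with RAND_MAX = 32767 and b = 10, `digits_available` returns 4, so one call to rand() could supply up to four decimal digits; adding min to each digit shifts it into [min, max].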
In answers to this question, rejection sampling was already addressed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.
The algorithm is really simple:
calculate the smallest power of 2 greater than or equal to the interval length
generate a random number in that "new" interval
return that number if it is less than the length of the original interval
reject it otherwise and try again
Here's my sample code:
int randInInterval(int min, int max) {
int intervalLen = max - min + 1;
//now calculate the smallest power of 2 that is >= `intervalLen`
int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));
int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"
if (randomNumber < intervalLen)
return min + randomNumber; //ok!
return randInInterval(min, max); //reject sample and try again
}
This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.
PS: Obviously avoiding the recursion would be more efficient (there isn't any need to calculate over and over the log ceiling...), but I thought it was more readable for this example.
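A non-recursive sketch of the same idea, which computes the power of 2 once with integer shifts instead of pow and log2. The name `rand_in_interval` is made up for this example. It assumes that RAND_MAX + 1 is a power of 2 (true for common implementations such as 32767 or 2147483647, but not guaranteed by the standard) and that the interval holds at most RAND_MAX + 1 values.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns an integer in [min, max] by masking rand()
// down to the smallest power-of-2 range and rejecting values that fall
// outside the interval.
int rand_in_interval(int min, int max) {
    assert(min <= max);
    const unsigned len = static_cast<unsigned>(max) - static_cast<unsigned>(min) + 1u;
    assert(len != 0 && len - 1 <= static_cast<unsigned>(RAND_MAX));
    unsigned mask = 1;
    while (mask < len)
        mask <<= 1;                 // smallest power of 2 that is >= len
    --mask;                         // e.g. 8 becomes binary 0111
    unsigned r;
    do {
        r = static_cast<unsigned>(std::rand()) & mask;
    } while (r >= len);             // reject the sample and try again
    return static_cast<int>(min + static_cast<long long>(r));
}
```

Because the power of 2 is less than twice the interval length, each draw is accepted with probability above 1/2.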
The following is the idea presented by Walter. I wrote a self-contained C++ class that will generate a random integer in the closed interval [low, high]. It requires C++11.
#include <random>
// Returns random integer in closed range [low, high].
class UniformRandomInt {
std::random_device _rd{};
std::mt19937 _gen{_rd()};
std::uniform_int_distribution<int> _dist;
public:
UniformRandomInt() {
set(1, 10);
}
UniformRandomInt(int low, int high) {
set(low, high);
}
// Set the distribution parameters low and high.
void set(int low, int high) {
std::uniform_int_distribution<int>::param_type param(low, high);
_dist.param(param);
}
// Get random integer.
int get() {
return _dist(_gen);
}
};
Example usage:
UniformRandomInt ur;
ur.set(0, 9); // Set the closed range [0, 9].
int value = ur.get(); // Get a random int in that range.
The formula for this is very simple, so try this expression:
int num = rand() % (max - min) + min;
// rand() returns an integer in [0, RAND_MAX], so num lies in [min, max - 1]; use (max - min + 1) as the divisor to include max
The following expression should be unbiased if I am not mistaken:
std::floor( ( max - min + 1.0 ) * rand() ) + min;
I am assuming here that rand() gives you a random value in the range between 0.0 and 1.0, not including 1.0 (this is not what std::rand() does; it returns an integer in [0, RAND_MAX]), and that max and min are integers with the condition that min < max.

Randomly selecting 2 integers not in a range?

I'm new to C++, and I've been searching all day to find a way to randomly select one of two distinct integers.
Everything I've found so far works only for integers within a range (1-10, etc) rather than for (1 or 3).
For example, the code I've been using elsewhere in the program (for a range of numbers) is
int c;
int Min = 1;
int Max = 3;
c = rand() % (Max + 1 - Min) + Min;
which returns a random integer within the range, rather than one or the other integers given.
First of all, you shouldn't use the C random functions in C++. Use the C++ <random> header.
The way to choose from a set of elements is to randomly generate an index. You can wrap the logic in a class:
#include <random>
#include <iostream>
#include <vector>
#include <initializer_list>
class Random_choice
{
std::random_device rd_{};
public:
template <class T> auto get_choice(std::initializer_list<T> elements) -> T
{
std::uniform_int_distribution<std::size_t> dist{0, elements.size() - 1};
std::size_t i = dist(rd_);
return *(elements.begin() + i);
}
};
int main()
{
Random_choice rc;
std::cout << rc.get_choice({3, 5}) << std::endl;
}
Or without the abstraction:
#include <random>
#include <iostream>
#include <vector>
int main()
{
std::vector<int> choices = {3, 5};
std::random_device rd;
std::mt19937 e{rd()};
std::uniform_int_distribution<std::size_t> dist{0, choices.size() - 1};
std::size_t i = dist(e);
std::cout << choices[i] << std::endl;
}
Randomly choosing one of two integers, a or b:
c = (rand() % 2) ? a : b;
Randomly choosing an integer from a list of integers:
std::vector<int> numbers;
c = numbers.at(rand() % numbers.size());
Randomly choosing an integer from two intervals [a, b) and [c, d):
H = (b-a);
L = (b-a) + (d-c);
k = rand() % L;
r = (k < H) ? (a + k) : (c + (k - H)); // r is the result; c stays the start of the second interval
If you use C++11, you should definitely look into its pseudo-random number generation facilities, such as discrete_distribution and uniform_int_distribution.
Update. Removed the claim that we would choose uniformly from the given set. Since rand() chooses from [0, RAND_MAX], this is only true if the divisor of the above modulo operations divides (RAND_MAX+1). (Which is true for the first example in most implementations, where RAND_MAX is 32767 or another power-of-two minus 1.) However, the deviation from uniformity is roughly of the order of divisor/RAND_MAX. Nevertheless, C++11 uniform_int_distribution is recommended instead. Thanks, Baum mit Augen.
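For comparison, a sketch of the two-interval choice written with `<random>` instead of rand(). The function name `pick_from_two_intervals` is made up for this example; the index arithmetic is the same as in the rand() version above.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: returns an integer chosen uniformly from the union
// of the half-open intervals [a, b) and [c, d).
template <class Gen>
int pick_from_two_intervals(int a, int b, int c, int d, Gen& gen) {
    assert(a < b && c < d);
    const int h = b - a;                 // size of the first interval
    const int total = h + (d - c);       // combined size of both intervals
    std::uniform_int_distribution<int> dist(0, total - 1);
    const int k = dist(gen);             // index into the combined range
    return (k < h) ? (a + k) : (c + (k - h));
}
```

Calling it with the intervals [1, 2) and [3, 4) returns either 1 or 3, which is the "1 or 3" case from the question.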
If you have a range of numbers that you don't want numbers in, then you have two ranges that you do want numbers in.
For example, if your range is 1 to 3 (inclusive) then the two ranges you do want numbers in are -∞ to 0, and 4 to ∞.
Infinity is a little tricky on computers, but it can easily be emulated, for example by using std::numeric_limits to get the min and max for the wanted type.
So in your case you want a random number in the range std::numeric_limits<int>::min() to 0, or 4 to std::numeric_limits<int>::max().
To get two random numbers from a random choice of either range, first pick (randomly) one range, and get a number from that. Then again (randomly) pick a range and get the second number from that.

Using continuously seed to generate uniform random numbers?

I have an interesting question. By using the famous Mersenne Twister std::mt19937 r from the standard library (or any other random generator) and setting it up with a seed, for example r.seed(4), it is possible to obtain uniformly distributed random numbers (in the range that uint_fast32_t provides).
What exactly happens if we loop the seed from, say, 1 to 100 and generate only the first random number for each seed? Is this sequence still uniformly distributed or not?
for(int i = 0;i<100;i++){
r.seed(i);
int v = r();
}
I have an algorithm which would be much easier to implement using this trick instead of generating the numbers in the usual way (without resetting the seed every time).
I don't actually believe that the uniformity of the sequence can be maintained when misusing the generator like that.
Does anybody have the expertise to give some reasoning about this?
Thanks a lot!
This code does what you say, resetting the seed between each number generation:
#include <iostream>
#include <random>
int main ()
{
std::mt19937 r;
for(int i = 0;i<10;i++){
r.seed(i);
int v = r();
std::cout << v << std::endl;
}
return 0;
}
The output of this program is deterministic. You keep resetting the state between generations (and this state is used to generate the next random number). You have absolutely no guarantee about the distribution, or the uniformity, of numbers generated from different Mersenne Twister sequences (again, a new sequence is started each time you reset the seed).
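A small sketch contrasting the two patterns. The helper names are invented for this illustration. The first function reproduces the reseed-per-draw approach from the question, whose result is a fixed function of the seed; the second seeds once and keeps drawing from the same engine, which is the usage the generator is designed for.

```cpp
#include <cassert>
#include <random>
#include <vector>

using result_t = std::mt19937::result_type;

// The pattern from the question: a fresh engine state for every draw,
// keeping only the first output.
result_t first_output_for_seed(result_t seed) {
    std::mt19937 r(seed);
    return r();
}

// The usual pattern: seed once, then draw repeatedly from the same engine.
std::vector<result_t> draw_sequence(result_t seed, int count) {
    assert(count >= 0);
    std::mt19937 r(seed);
    std::vector<result_t> values;
    for (int i = 0; i < count; ++i)
        values.push_back(r());
    return values;
}
```

Calling `first_output_for_seed` twice with the same seed always gives the same value, and it equals the first element of `draw_sequence` for that seed.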
If your goal is to generate uniformly distributed numbers constrained to an interval, use std::uniform_real_distribution.
Example from en.cppreference.com:
#include <random>
#include <iostream>
int main()
{
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(1, 2);
for (int n = 0; n < 10; ++n) {
std::cout << dis(gen) << ' ';
}
std::cout << '\n';
}
It is defined in the C++ standard, section § 26.5.8.2.2 :
A uniform_real_distribution random number distribution produces random
numbers x , a ≤ x < b , distributed according to the constant
probability density function p ( x | a,b ) = 1 / ( b − a )

How to generate negative random integer in c++

I wrote a function that takes integers. It won't crash if the user types for example, -5, but it will convert it into positive =-(
int getRandoms(int size, int upper, int lower)
{
int randInt = 0;
randInt = 1 + rand() % (upper -lower + 1);
return randInt;
}
What should I change in the function in order to build random negative integers?
The user inputs the range.
There are two answers to this. If you are using C++11, then you should be using uniform_int_distribution. It is preferable for many reasons: Why do people say there is modulo bias when using a random number generator? gives one example, and the rand() Considered Harmful presentation gives a more complete set of reasons. Here is an example which generates random integers from -5 to 5:
#include <iostream>
#include <random>
int main()
{
std::random_device rd;
std::mt19937 e2(rd());
std::uniform_int_distribution<int> dist(-5, 5);
for (int n = 0; n < 10; ++n) {
std::cout << dist(e2) << ", " ;
}
std::cout << std::endl ;
}
If C++11 is not an option, then the method described in the C FAQ entry How can I get random integers in a certain range? is the way to go. Using that method, you would do the following to generate random integers in [M, N]:
M + rand() / (RAND_MAX / (N - M + 1) + 1)
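Wrapped in a function, the expression above might look like the following sketch. The name `rand_between` is made up here, and the divisor is computed in `long long` as a precaution for platforms where RAND_MAX equals INT_MAX, which is an addition of this sketch and not part of the quoted formula.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns an integer in [m, n], negative bounds included,
// using the division-based formula quoted above.
int rand_between(int m, int n) {
    assert(m <= n);
    const int count = n - m + 1;                 // number of values in [m, n]
    assert(count > 0 && count <= RAND_MAX);
    // RAND_MAX / count + 1, widened so that count == 1 cannot overflow.
    const long long divisor = RAND_MAX / count + 1LL;
    return static_cast<int>(m + std::rand() / divisor);
}
```

For the question's case, `rand_between(-5, 5)` yields values from -5 to 5 after seeding with `std::srand`.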
For a number in the closed range [lower,upper], you want:
return lower + rand() % (upper - lower + 1); // NOT 1 + ...
This will work for positive or negative values, as long as upper is greater than or equal to lower.
Your version returns numbers from a range of the same size, but starting from 1 rather than lower.
You could also use Boost.Random, if you don't mind the dependency. http://www.boost.org/doc/libs/1_54_0/doc/html/boost_random.html
You want to start by computing the size of the range of the numbers, so (for example) -10 up to but not including +5 is a range of 15 (if +5 itself should be included, the range is 16).
You can compute numbers in that range with code like this:
int rand_lim(int limit) {
/* return a random number in the range [0..limit)
*/
int divisor = RAND_MAX/limit;
int retval;
do {
retval = rand() / divisor;
} while (retval >= limit); // retry: the quotient can reach or exceed limit
return retval;
}
Having done that, getting the numbers to the correct range is pretty trivial: add the lower bound to each number you get.
Note that C++11 has added both random number generator and distribution classes that can take care of most of this for you.
If you do attempt to do this on your own, when you reduce numbers to a range, you pretty much need to use a loop as I've shown above to avoid skew. Essentially any attempt at just using division or remainder on its own almost inevitably introduces skew into the result (i.e., some results will happen more often than others).
You only need to add the lower bound of the range [lbound, ubound]:
int rangesize = ubound - lbound + 1;
int myrandom = (rand() % rangesize) + lbound;

Generating a random integer from a range

I need a function which would generate a random integer in a given range (including boundary values). I don't have unreasonable quality/randomness requirements; I have four requirements:
I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
I need it to be reasonably uniform (use of rand() is perfectly fine).
the minimum-maximum ranges can be anything from <0, 1> to <-32727, 32727>.
it has to be seedable.
I currently have the following C++ code:
output = min + (rand() * (int)(max - min) / RAND_MAX)
The problem is that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++ it is 1/32727). This is a major issue for small ranges like <-1, 1>, where the last value is almost never returned.
So I grabbed pen and paper and came up with following formula (which builds on the (int)(n + 0.5) integer rounding trick):
But it still doesn't give me a uniform distribution. Repeated runs with 10000 samples give me ratio of 37:50:13 for values values -1, 0. 1.
Is there a better formula? (Or even whole pseudo-random number generator function?)
The simplest (and hence best) C++ (using the 2011 standard) answer is:
#include <random>
std::random_device rd; // Only used once to initialise (seed) engine
std::mt19937 rng(rd()); // Random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // Guaranteed unbiased
auto random_integer = uni(rng);
There isn't any need to reinvent the wheel, worry about bias, or worry about using time as the random seed.
A fast, somewhat better than yours, but still not properly uniform distributed solution is
output = min + (rand() % static_cast<int>(max - min + 1))
Except when the size of the range is a power of 2, this method produces biased non-uniform distributed numbers regardless the quality of rand(). For a comprehensive test of the quality of this method, please read this.
If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high quality uniform_int_distribution which will accept minimum and maximum bounds (inclusive as you need), and you can choose among various random number generators to plug into that distribution.
Here is code that generates a million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it as you mentioned performance is a major concern for you.
#include <iostream>
#include <random>
#include <chrono>
int main()
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::duration<double> sec;
Clock::time_point t0 = Clock::now();
const int N = 10000000;
typedef std::minstd_rand G; // Select the engine
G g; // Construct the engine
typedef std::uniform_int_distribution<> D; // Select the distribution
D d(-57, 365); // Construct the distribution
int c = 0;
for (int i = 0; i < N; ++i)
c += d(g); // Generate a random number
Clock::time_point t1 = Clock::now();
std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
return c;
}
For me (2.8 GHz Intel Core i5) this prints out:
2.10268e+07 random numbers per second.
You can seed the generator by passing in an int to its constructor:
G g(seed);
If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g., to long long):
typedef std::uniform_int_distribution<long long> D;
If you later find that the minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:
typedef std::mt19937 G; // Now using mersenne_twister_engine
Having separate control over the random number generator, and the random distribution can be quite liberating.
I've also computed (not shown) the first four "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:
min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001
(The x_ prefix refers to "expected".)
Let's split the problem into two parts:
Generate a random number n in the range 0 through (max-min).
Add min to that number
The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias
to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.
It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.
The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since RAND_MAX - (RAND_MAX + 1) % (max-min+1) is always less than (RAND_MAX + 1) / 2,
we know that p > 1/2, so the expected number of iterations will always be less than two
for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.
Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.
Use the Mersenne Twister. The Boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects, such as artificial intelligence and evolutionary algorithms.
Here's their example where they make a simple function to roll a six-sided die:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>
boost::mt19937 gen;
int roll_die() {
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
return die();
}
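For readers without Boost, the same idea can be sketched with the C++11 standard library, which provides a Mersenne Twister of its own. This is an equivalent written for illustration, not part of the Boost documentation.

```cpp
#include <random>

// Standard-library counterpart of the Boost example (requires C++11).
// The generator is default-seeded, as in the Boost snippet, so every run
// produces the same sequence; seed it (for example from std::random_device)
// if that is not wanted.
std::mt19937 gen;

int roll_die() {
    std::uniform_int_distribution<int> dist(1, 6);  // closed range [1, 6]
    return dist(gen);
}
```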
Oh, and here's some more pimping of this generator just in case you aren't convinced you should use it over the vastly inferior rand():
The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.
Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in a 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".
An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.
int RandU(int nMin, int nMax)
{
// RAND_MAX + 1.0 keeps the addition in floating point; RAND_MAX + 1 would overflow where RAND_MAX == INT_MAX
return nMin + (int)((double)rand() / (RAND_MAX + 1.0) * (nMax-nMin+1));
}
This is a mapping of RAND_MAX + 1 integers (32768 when rand() is a 15-bit generator, i.e. RAND_MAX is 32767) to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note, however, that if (nMax-nMin+1) is large, the mapping won't work (for example, you can't map 32768 values to 30000 values with equal probability). If such ranges are needed, you should use a 32-bit or 64-bit random source instead of the 15-bit rand(), or ignore rand() results which are out of range.
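To make the unevenness concrete, the sketch below counts how many raw values land on each output value under the same scaling. It assumes a 15-bit source (32768 equally likely raw values); bucket_extremes is a name made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Counts how many of `sourceSize` equally likely raw values map to each of
// `buckets` output values under the scaling raw / sourceSize * buckets.
// Returns {smallest bucket count, largest bucket count}.
std::pair<int, int> bucket_extremes(int sourceSize, int buckets) {
    assert(sourceSize > 0 && buckets > 0);
    std::vector<int> counts(buckets, 0);
    for (int raw = 0; raw < sourceSize; ++raw) {
        int idx = static_cast<int>(static_cast<double>(raw) / sourceSize * buckets);
        ++counts[idx];
    }
    auto mm = std::minmax_element(counts.begin(), counts.end());
    return {*mm.first, *mm.second};
}
```

With 32768 raw values, 6 outputs receive 5461 or 5462 raw values each, which is nearly even. With 30000 outputs, each receives either 1 or 2, so some results are twice as likely as others.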
Assume min and max are integer values.
[ and ] mean the endpoint is included,
( and ) mean the endpoint is excluded.
With that notation, the expressions below give a value in each kind of interval using C++'s rand().
Reference:
For the ( ) [ ] notation, see Interval (mathematics).
For the rand and srand functions and the RAND_MAX definition,
see std::rand.
[min, max]
int randNum = rand() % (max - min + 1) + min;
(min, max]
int randNum = rand() % (max - min) + min + 1;
[min, max)
int randNum = rand() % (max - min) + min;
(min, max)
int randNum = rand() % (max - min - 1) + min + 1;
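The endpoints of the four expressions are easy to get wrong, so here is a small sketch that separates the mapping from rand() so the boundaries can be checked. The function names are hypothetical, and raw stands for any non-negative value such as the result of rand().

```cpp
#include <cassert>

// Each function maps a non-negative raw value into one interval type.
// Assumes min < max; in_open additionally needs max - min >= 2.
int in_closed(int raw, int min, int max) {      // [min, max]
    assert(raw >= 0 && min <= max);
    return raw % (max - min + 1) + min;
}
int in_left_open(int raw, int min, int max) {   // (min, max]
    assert(raw >= 0 && min < max);
    return raw % (max - min) + min + 1;
}
int in_right_open(int raw, int min, int max) {  // [min, max)
    assert(raw >= 0 && min < max);
    return raw % (max - min) + min;
}
int in_open(int raw, int min, int max) {        // (min, max)
    assert(raw >= 0 && max - min >= 2);
    return raw % (max - min - 1) + min + 1;
}
```

Usage would be, for example, in_closed(rand(), 1, 6) for a die roll.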
Here is an unbiased version that generates numbers in [low, high]:
int r;
do {
r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;
If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.
I recommend the Boost.Random library. It's super detailed and well-documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.
Notice that in most suggestions the initial random value you get from rand(), which is typically in the range 0 to RAND_MAX, is largely wasted. You create only one random number out of it, while there is a sound procedure that can give you more.
Assume that you want integer random numbers in the region [min, max]. We start from [0, max-min].
Take the base b = max-min+1.
Start by representing the number you got from rand() in base b.
That way you get floor(log(b,RAND_MAX)) random numbers, because each digit in base b, except possibly the last one, represents a random number in the range [0, max-min].
Of course the final shift to [min, max] is simple: for each random number r, use r+min.
int n = NUM_DIGIT-1;
while(n >= 0)
{
r[n] = res % b;
res -= r[n];
res /= b;
n--;
}
If NUM_DIGIT is the number of digits in base b that you can extract, that is,
NUM_DIGIT = floor(log(b,RAND_MAX))
then the above is a simple implementation of extracting NUM_DIGIT random numbers from 0 to b-1 out of one random number in the range 0 to RAND_MAX, provided b < RAND_MAX.
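A self-contained sketch of the digit extraction, including the final shift by min, might look like this. The names num_digits and split_digits are invented for the illustration. One caveat worth hedging: the digits are exactly uniform only when b^NUM_DIGIT divides RAND_MAX + 1; otherwise a small bias remains.

```cpp
#include <cassert>
#include <vector>

// Largest k with b^k <= rawMax, i.e. floor(log_b(rawMax)), computed with
// integers to avoid floating-point rounding.
int num_digits(int rawMax, int b) {
    assert(b >= 2 && rawMax >= 1);
    int k = 0;
    for (long long p = b; p <= rawMax; p *= b) ++k;
    return k;
}

// Splits one raw value into `count` base-b digits, each in [0, b-1],
// and shifts every digit into [min, min + b - 1].
std::vector<int> split_digits(int raw, int b, int count, int min) {
    assert(raw >= 0 && b >= 2 && count >= 0);
    std::vector<int> out(count);
    for (int n = count - 1; n >= 0; --n) {
        out[n] = raw % b + min;  // lowest remaining digit, shifted
        raw /= b;
    }
    return out;
}
```

For a die (min = 1, b = 6) and RAND_MAX = 32767, num_digits gives 5, and the raw value 1234 splits into the rolls 1, 6, 5, 2, 5.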
In answers to this question, rejection sampling was already addressed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.
The algorithm is really simple:
calculate the smallest power of 2 greater than or equal to the interval length
randomize one number in that "new" interval
return that number if it is less than the length of the original interval
reject otherwise
Here's my sample code:
int randInInterval(int min, int max) {
int intervalLen = max - min + 1;
//now calculate the smallest power of 2 that is >= than `intervalLen`
int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));
int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"
if (randomNumber < intervalLen)
return min + randomNumber; //ok!
return randInInterval(min, max); //reject sample and try again
}
This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.
PS: Obviously avoiding the recursion would be more efficient (there isn't any need to calculate over and over the log ceiling...), but I thought it was more readable for this example.
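A non-recursive variant along the lines of the PS could be sketched like this. The name randInIntervalIter is made up, and the sketch assumes that the interval fits within RAND_MAX + 1 values and that RAND_MAX + 1 is a power of 2 (common, but not guaranteed by the standard).

```cpp
#include <cassert>
#include <cstdlib>

// Iterative variant: the power-of-2 bound is computed once with integer
// shifts, and rejected samples are retried in a loop instead of by recursion.
// Assumes min <= max and (max - min) <= RAND_MAX.
int randInIntervalIter(int min, int max) {
    assert(min <= max);
    const long long intervalLen = static_cast<long long>(max) - min + 1;
    assert(intervalLen <= static_cast<long long>(RAND_MAX) + 1);
    long long ceilingPowerOf2 = 1;
    while (ceilingPowerOf2 < intervalLen) ceilingPowerOf2 <<= 1;
    long long randomNumber;
    do {
        randomNumber = std::rand() % ceilingPowerOf2;
    } while (randomNumber >= intervalLen);  // reject and try again
    return static_cast<int>(min + randomNumber);
}
```

Using shifts instead of pow and log2 also sidesteps floating-point rounding when computing the bound.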
The following is the idea presented by Walter. I wrote a self-contained C++ class that will generate a random integer in the closed interval [low, high]. It requires C++11.
#include <random>
// Returns random integer in closed range [low, high].
class UniformRandomInt {
std::random_device _rd{};
std::mt19937 _gen{_rd()};
std::uniform_int_distribution<int> _dist;
public:
UniformRandomInt() {
set(1, 10);
}
UniformRandomInt(int low, int high) {
set(low, high);
}
// Set the distribution parameters low and high.
void set(int low, int high) {
std::uniform_int_distribution<int>::param_type param(low, high);
_dist.param(param);
}
// Get random integer.
int get() {
return _dist(_gen);
}
};
Example usage:
UniformRandomInt ur;
ur.set(0, 9); // Get random int in closed range [0, 9].
int value = ur.get();
The formula for this is very simple, so try this expression:
int num = rand() % (max - min) + min;
// rand() returns an integer between 0 and RAND_MAX, so num lies in [min, max); use (max - min + 1) to include max
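The following expression should be unbiased if I am not mistaken:
std::floor( ( max - min + 1.0 ) * rand() ) + min;
I am assuming here that rand() gives you a random value in the range [0.0, 1.0), i.e. not including 1.0, and that max and min are integers with min < max. Note that C++'s std::rand() does not behave this way: it returns an integer in [0, RAND_MAX], so its result would first have to be scaled, e.g. rand() / (RAND_MAX + 1.0).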
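Under that assumption, the expression can be sketched in C++ by first scaling std::rand() into [0, 1). The names scale_unit and rand_by_scaling are hypothetical, and the result is only as even as the number of distinct values std::rand() can produce allows.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Maps u in [0.0, 1.0) to an integer in the closed range [min, max].
int scale_unit(double u, int min, int max) {
    assert(u >= 0.0 && u < 1.0 && min <= max);
    return static_cast<int>(std::floor((max - min + 1.0) * u)) + min;
}

// std::rand() returns an int in [0, RAND_MAX]; dividing by RAND_MAX + 1.0
// yields a value in [0, 1) without integer overflow.
int rand_by_scaling(int min, int max) {
    const double u = std::rand() / (RAND_MAX + 1.0);
    return scale_unit(u, min, max);
}
```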