Why is it appropriate to use `std::uniform_real_distribution`? - c++

I'm trying to write Metropolis Monte Carlo simulation code.
Since the simulation will be very long, I'd like to think seriously about the performance of generating random numbers in [0, 1].
So I decided to check the performance of two methods with the following code:
#include <cfloat>
#include <chrono>
#include <iostream>
#include <random>
int main()
{
    constexpr auto Ntry = 5000000;
    std::mt19937 mt(123);
    std::uniform_real_distribution<double> dist(0.0, std::nextafter(1.0, DBL_MAX));
    double test1, test2;

    // method 1
    auto start1 = std::chrono::system_clock::now();
    for (int i = 0; i < Ntry; i++) {
        test1 = dist(mt);
    }
    auto end1 = std::chrono::system_clock::now();
    auto elapsed1 = std::chrono::duration_cast<std::chrono::microseconds>(end1 - start1).count();
    std::cout << elapsed1 << std::endl;

    // method 2
    auto start2 = std::chrono::system_clock::now();
    for (int i = 0; i < Ntry; i++) {
        test2 = 1.0*mt() / mt.max();
    }
    auto end2 = std::chrono::system_clock::now();
    auto elapsed2 = std::chrono::duration_cast<std::chrono::microseconds>(end2 - start2).count();
    std::cout << elapsed2 << std::endl;
}
Then the result is
295489 micro sec for method 1
79884 micro sec for method 2
I understand that there are many posts that recommend using std::uniform_real_distribution.
But performance-wise, it is tempting to use the latter as this result shows.
Would you tell me what is the point of using std::uniform_real_distribution?
What is the disadvantage of using 1.0*mt() / mt.max()?
And in the current purpose, is it acceptable to use 1.0*mt() / mt.max() instead?
Edit:
I compiled this code with g++-11 test.cpp.
When I compile with the -O3 flag, the result is qualitatively the same (method 1 is approx. 1.8 times slower).
I would like to discuss the advantage of the widely used method.
I do care about the performance trend, but a specific performance comparison is out of scope.

You use the standard random library because it is extremely difficult to do numerical calculations correctly and you don't want the burden of proving and maintaining your own random library.
Case in point, your random distribution is wrong. std::mt19937 produces 32-bit integers, yet you're expecting a double, which (usually) has a 53-bit significand. There are values in the range [0, 1] that you will never obtain from 1.0*mt() / mt.max().
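As an aside, if a full 53-bit significand is what you are after, a minimal sketch (my own, not part of either snippet) would draw from a 64-bit engine and scale the top 53 bits:
#include <cstdint>
#include <random>
// Sketch: take the top 53 of 64 bits and scale by 2^-53 to get a double in [0, 1).
// The hex-float literal 0x1.0p-53 requires C++17.
double full_resolution_unit(std::mt19937_64& gen)
{
    return (gen() >> 11) * 0x1.0p-53;
}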

Your testing methodology is flawed. You don't use the result that you produce, so a smart optimiser may simply skip producing a result.
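A minimal fix, as a sketch using the question's own names, is to make each loop's results observable:
// Accumulate and print a checksum so the optimiser cannot drop the loop.
double sink = 0.0;
for (int i = 0; i < Ntry; i++) {
    sink += dist(mt);
}
std::cout << "checksum: " << sink << std::endl;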
Would you tell me what is the point of using std::uniform_real_distribution?
The clue is in the name. It produces a uniform distribution.
Furthermore, it allows you to specify the minimum and maximum between which you want the distribution to lie.
What is the disadvantage of using 1.0*mt() / mt.max()?
You cannot specify a minimum and a maximum.
It produces a less uniform distribution.
It produces less randomness.
is it acceptable to use 1.0*mt() / mt.max() instead?
In some use cases, it could be acceptable. In some other cases, it isn't acceptable. In the rest, it won't matter.

Related

How to make this code thread safe with openMP? Monte Carlo two-dimensional integration

I am trying to learn parallelization with openmp.
This code performs two-dimensional Monte Carlo integration.
Program execution time (with omp) is about 25 seconds for N = 900 million.
#include <cmath>
#include <cstdint>
#include <iostream>
#include <omp.h>
#include <random>
double func(double x, double y) {
    return sqrt(x + y);
}

using std::cout;
using std::endl;

const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
const uint64_t N = 900000000;

int main() {
    double sum = 0.0;
    std::random_device rd;
    #pragma omp parallel
    {
        uint32_t seed;
        #pragma omp critical
        {
            seed = rd();
        }
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        #pragma omp for reduction(+ : sum)
        for (int64_t i = 0; i < N; i++) {
            double x = a + (b - a) * dist(gen);
            double y = c + (d - c) * dist(gen);
            sum += func(x, y);
            // sum - local for each thread
            // initialized to 0 in each thread
            // and will be added to the global sum when exiting the loop
        }
    }
    // The result is calculated after the parallel section
    double ans = sum * (b - a) * (d - c) / N;
    cout << ans << endl;
    return 0;
}
How can I convert this to thread-safe code (professor said that it is not thread-safe)?
As far as I can see, your professor is mistaken. The code is thread-safe. The reduction clause is written correctly.
The only issue I have is with the way you initialize the random number generators.
It is inefficient to call into random_device once per thread.
It opens up the chance of a birthday problem. There is a (very minor) chance that two threads start with the same random state because they receive the same random value.
The more efficient way to create multiple random sequences is to increment the seed by 1 per thread. A good random number generator will convert this to vastly different random values. See for example here:
https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
Another minor issue is that you can move the reduction to the end of the parallel section. Then you can declare the for-loop as nowait to avoid one implicit openmp barrier at the end of the loop. This optimization is possible because you do not need the final sum inside the parallel section. However, this change is only an optimization. It does not change the correctness of your code.
The code would look something like this:
#include <cmath>
// using std::sqrt
#include <random>
// using std::random_device, std::mt19937, std::uniform_real_distribution
#include <iostream>
// using std::cout
#include <omp.h>
// using omp_get_thread_num
int main() {
    const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
    const int64_t N = 900000000;
    double sum = 0.0;
    std::random_device rd;
    // initial seed
    const std::random_device::result_type seed = rd();
#   pragma omp parallel reduction(+ : sum)
    {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        /*
         * We add the thread number to the seed. This is better than using multiple
         * random numbers because it avoids the birthday problem. See for example
         * https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
         */
        std::mt19937 gen(seed + static_cast<unsigned>(omp_get_thread_num()));
#       pragma omp for nowait
        for (int64_t i = 0; i < N; ++i) {
            const double x = a + (b - a) * dist(gen);
            const double y = c + (d - c) * dist(gen);
            sum += std::sqrt(x + y);
        }
    }
    const double ans = sum * (b - a) * (d - c) / N;
    std::cout << ans << '\n';
    return 0;
}
Also note how you can indent the pragmas to improve readability. Just keep the # at the start of the line.
Relation to similar question
As Victor brought up, there is a similar question here:
Is mersenne twister thread safe for cpp
The question there was whether this code can be used thread-safe:
int f() {
    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}
The answer to that is yes, as pointed out in the answer there:
No C++ std type uses global data in a non-thread-safe way.
However, creating a new mt19937 just to compute a single integer is pointless. So the answer there suggests ways to reuse an mt19937 instance multiple times. But reuse requires thread-safety because
By default, one instance of a type cannot be accessed from two threads without synchronization.
The easiest answer there is to use a thread-local instance. We use a similar approach here by using one instance per thread in the parallel section.
There are a few key differences between my solution and the one proposed there:
The other solution still suffers from the birthday problem. However, that issue is minor unless you create a ton of threads, so it is forgivable (IMHO). Also, you can avoid it by using a different initialization strategy. For example, this would work:
static std::atomic<std::default_random_engine::result_type> seed(
std::default_random_engine{}());
thread_local std::mt19937 rnd(
seed.fetch_add(1, std::memory_order_relaxed));
The mt19937 there may live significantly longer and be reused more if you reuse the threads. This can be a benefit. But in the question here, the threads are not reused. Also, we use the RNG for a significant time before discarding it, so the benefit of using it even longer is less significant than in the other question.
There may be hidden costs associated with thread-local data. An mt19937 has significant memory associated with it. I suggest not over-using thread-local storage if not required. Plus, like with any hidden global state, it is a bit of a code smell. Direct control over initialization may be preferable, e.g. for testing, as sketched below.
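As a sketch of what such direct control might look like (the function name and signature are hypothetical), the engine can be passed in by reference, so a test can supply a fixed seed:
#include <random>
#include <vector>
// Hypothetical helper: the caller owns and seeds the engine.
std::vector<double> draw(std::mt19937& gen, std::size_t n)
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> out(n);
    for (auto& v : out) v = dist(gen);
    return out;
}
// In a test: std::mt19937 gen(42); auto samples = draw(gen, 100);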
Initializing Mersenne Twisters
I should also mention that initializing a Mersenne Twister with its huge internal state from just a single integer seed value may be less than ideal. C++'s solution of seed_seq may also be flawed, but it is the best we've got. So here is an outline of the code as I would write it:
std::array<std::mt19937::result_type, std::mt19937::state_size> seed;
std::random_device rdev;
std::generate(seed.begin(), seed.end(), std::ref(rdev));
#   pragma omp parallel firstprivate(seed)
{
    seed.front() += static_cast<unsigned>(omp_get_thread_num());
    std::seed_seq sseq(seed.begin(), seed.end());
    std::mt19937 rnd(sseq);
}
Use of MT19937 for Monte Carlo
I also need to mention this little note from Wikipedia:
Multiple instances that differ only in seed value (but not other parameters) are not generally appropriate for Monte-Carlo simulations that require independent random number generators, though there exists a method for choosing multiple sets of parameter values.[44][45]
[44] Makoto Matsumoto; Takuji Nishimura. "Dynamic Creation of Pseudorandom Number Generators" (PDF). Retrieved 19 July 2015.
[45] Hiroshi Haramoto; Makoto Matsumoto; Takuji Nishimura; François Panneton; Pierre L'Ecuyer. "Efficient Jump Ahead for F2-Linear Random Number Generators" (PDF). Retrieved 12 Nov 2015

C++ random generator with provided (at least estimated) entropy

Using the C++ standard random generator I can more or less efficiently create sequences with pre-defined distributions using language-provided tools. What about Shannon entropy? Is it possible to define the output Shannon entropy for a provided sequence?
I tried a small experiment: I generated a long enough sequence with a linear distribution, and implemented a Shannon entropy calculator. The resulting value ranges from 0.0 (absolute order) to 8.0 (absolute chaos):
template <typename T>
double shannon_entropy(T first, T last)
{
    size_t frequencies_count{};
    double entropy = 0.0;

    std::for_each(first, last, [&entropy, &frequencies_count] (auto item) {
        if (0. == item) return;
        double fp_item = static_cast<double>(item);
        entropy += fp_item * log2(fp_item);
        ++frequencies_count;
    });

    if (frequencies_count > 256) {
        return -1.0;
    }
    return -entropy;
}

std::vector<uint8_t> generate_random_sequence(size_t sequence_size)
{
    std::vector<uint8_t> random_sequence;
    std::random_device rnd_device;
    std::cout << "Random device entropy: " << rnd_device.entropy() << '\n';
    std::mt19937 mersenne_engine(rnd_device());
    std::uniform_int_distribution<unsigned> dist(0, 255);
    auto gen = std::bind(dist, mersenne_engine);
    random_sequence.resize(sequence_size);
    std::generate(random_sequence.begin(), random_sequence.end(), gen);
    return random_sequence;
}

std::vector<double> read_random_probabilities(size_t sequence_size)
{
    std::vector<size_t> bytes_distribution(256);
    std::vector<double> bytes_frequencies(256);
    std::vector<uint8_t> random_sequence = generate_random_sequence(sequence_size);
    size_t rnd_seq_size = random_sequence.size();

    std::for_each(random_sequence.begin(), random_sequence.end(), [&](uint8_t b) {
        ++bytes_distribution[b];
    });

    std::transform(bytes_distribution.begin(), bytes_distribution.end(), bytes_frequencies.begin(),
        [&rnd_seq_size](size_t item) {
            return static_cast<double>(item) / rnd_seq_size;
        });

    return bytes_frequencies;
}

int main(int argc, char* argv[]) {
    size_t sequence_size = 1024 * 1024;
    std::vector<double> bytes_frequencies = read_random_probabilities(sequence_size);
    double entropy = shannon_entropy(bytes_frequencies.begin(), bytes_frequencies.end());

    std::cout << "Sequence entropy: " << std::setprecision(16) << entropy << std::endl;
    std::cout << "Min possible file size assuming max theoretical compression efficiency:\n";
    std::cout << (entropy * sequence_size) << " in bits\n";
    std::cout << ((entropy * sequence_size) / 8) << " in bytes\n";
    return EXIT_SUCCESS;
}
First, it appears that std::random_device::entropy() is hardcoded to return 32; in MSVC 2015 (which is probably 8.0 according to the Shannon definition). As you can verify, that is not far from the truth: in this example the result is always close to 7.9998..., i.e. absolute chaos.
The working example is on IDEONE (by the way, their compiler hardcodes entropy to 0).
One more thing, the main question: is it possible to create a generator that generates a linearly-distributed sequence with a defined entropy, let's say 6.0 to 7.0? Could it be implemented at all, and if yes, are there existing implementations?
First, you're viewing Shannon's theory entirely wrong. His argument (as you're using it) is simply, "given the probability of x (Pr(x)), the bits required to store x is -log2 Pr(x)". It has nothing to do with the probability of x. In this regard, you're viewing Pr(x) wrong. -log2 Pr(x), given a Pr(x) that should be uniformly 1/256, results in a required bit width of 8 bits to store. However, that's not how statistics work. Go back to thinking about Pr(x), because the bits required mean nothing.
Your question is about statistics. Given an infinite sample, if-and-only-if the distribution matches the ideal histogram, then as the sample size approaches infinity the probability of each sample will approach the expected frequency. I want to make it clear that you're not looking for "-log2 Pr(x) is absolute chaos when it's 8 given Pr(x) = 1/256." A uniform distribution is not chaos. In fact, it's... well, uniform. Its properties are well known, simple, and easy to predict. You're looking for, "Does the finite sample set S meet the criteria of an independently-distributed uniform distribution (commonly known as "independently and identically distributed data" or "i.i.d.") with Pr(x) = 1/256?" This has nothing to do with Shannon's theory and goes much further back in time, to the basic probability theories involving flips of a coin (in this case binomial, given assumed uniformity).
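To make the arithmetic concrete, here is a minimal check (a sketch of my own) that the ideal uniform distribution over 256 symbols gives exactly 8 bits:
#include <cmath>
#include <iostream>
int main()
{
    // H = -sum over x of Pr(x) * log2(Pr(x)); with Pr(x) = 1/256 for every
    // one of the 256 symbols, each term contributes (1/256) * 8, so H = 8.
    double h = 0.0;
    for (int x = 0; x < 256; ++x)
        h -= (1.0 / 256.0) * std::log2(1.0 / 256.0);
    std::cout << h << '\n';  // prints 8
}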
Assuming for a moment that any C++11 <random> generator meets the criteria for "statistically indistinguishable from i.i.d." (which, by the way, those generators don't), you can use them to emulate i.i.d. results. If you would like a range of data that is storable within 6..7 bits (it wasn't clear, did you mean 6 or 7 bits, because hypothetically, everything in between is doable as well), simply scale the range. For example...
#include <iostream>
#include <random>

int main() {
    unsigned long low = 1 << 6;    // 2^6 == 64
    unsigned long limit = 1 << 7;  // 2^7 == 128
    // Therefore, the range is 6-bits to 7-bits (or 64 + [128 - 64])
    unsigned long range = limit - low;
    std::random_device rd;
    std::mt19937 rng(rd()); //<< Doesn't actually meet criteria for i.i.d.
    std::uniform_int_distribution<unsigned long> dist(low, limit - 1); //<< Given an engine that actually produces i.i.d. data, this would produce exactly what you're looking for
    for (int i = 0; i != 10; ++i) {
        unsigned long y = dist(rng);
        // y is known to be in the set {2^6..2^7-1} and assumed to be uniform (coin flip) over {low..low + (range-1)}.
        std::cout << y << std::endl;
    }
    return 0;
}
The problem with this is that, while the <random> distribution classes are accurate, the random number generators (presumably aside from std::random_device, but that's system-specific) are not designed to stand up to statistical tests of fitness as i.i.d. generators.
If you would like one that does, implement a CSPRNG (my go-to is Bob Jenkins' ISAAC) that has an interface meeting the requirements of the <random> class of generators (probably just covering the basic interface of std::random_device is good enough).
To test for a statistically sound "no" or "we can't say no" on whether a set follows a specific model (and therefore Pr(x) is accurate and therefore Shannon's entropy function is an accurate prediction), that's another matter entirely. Like I said, no generator in <random> meets these criteria (except maybe std::random_device). My advice is to do research into things like the central limit theorem, goodness-of-fit, birthday-spacing, et cetera.
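As a rough illustration of such a test (a sketch only, with the critical value taken from a chi-squared table for 255 degrees of freedom), a goodness-of-fit check for byte uniformity could look like:
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>
int main()
{
    // Chi-squared statistic for the hypothesis "bytes are uniform on 0..255".
    const std::size_t n = 1 << 20;
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<std::size_t> counts(256, 0);
    for (std::size_t i = 0; i < n; ++i)
        ++counts[dist(gen)];
    const double expected = double(n) / 256.0;
    double chi2 = 0.0;
    for (std::size_t c : counts) {
        const double d = double(c) - expected;
        chi2 += d * d / expected;
    }
    // For 255 degrees of freedom, values far above roughly 310 (p = 0.01)
    // would cast doubt on uniformity.
    std::cout << "chi^2 = " << chi2 << " (df = 255)\n";
}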
To drive my point a bit more, under the assumptions of your question...
struct uniform_rng {
    unsigned long x;
    constexpr uniform_rng(unsigned long seed = 0) noexcept:
        x{ seed }
    { };
    unsigned long operator ()() noexcept {
        unsigned long y = this->x++;
        return y;
    }
};
... would absolutely meet your criteria of being uniform (or as you say "absolute chaos"). Pr(x) is most certainly 1/N, and the bits required to store any number of the set is -log2(1/N), which is simply the bit width of unsigned long (since N = 2^bitwidth). However, it's not independently distributed. Because we know its properties, you can "store" its entire sequence by simply storing seed. Surprise: all PRNGs work this way. Therefore the bits required to store the entire sequence of a PRNG is -log2(1/2^bitsForSeed). As your sample grows, the bits required to store it versus the bits you're able to generate for that sample (aka, the compression ratio) approaches a limit of 0.
I cannot comment yet, but I would like to start the discussion:
From communication/information theory, it would seem that you would require probabilistic shaping methods to achieve what you want. You should be able to feed the output of any distribution function through a shaping coder, which should then re-distribute the input to a specific target Shannon entropy.
Probabilistic constellation shaping has been successfully applied in fiber-optic communication: Wikipedia with some other links.
It is not clear what you want to achieve, and there are several ways of lowering the Shannon entropy for your sequence:
Correlation between the bits, e.g. putting random_sequence through a simple filter.
Making individual bits not fully random.
As an example, below you could make the bytes less random:
std::vector<uint8_t> generate_random_sequence(size_t sequence_size,
                                              uint8_t cutoff = 10)
{
    std::vector<uint8_t> random_sequence;
    std::vector<uint8_t> other_sequence;
    std::random_device rnd_device;
    std::cout << "Random device entropy: " << rnd_device.entropy() << '\n';
    std::mt19937 mersenne_engine(rnd_device());
    std::uniform_int_distribution<unsigned> dist(0, 255);
    auto gen = std::bind(dist, mersenne_engine);
    random_sequence.resize(sequence_size);
    std::generate(random_sequence.begin(), random_sequence.end(), gen);
    other_sequence.resize(sequence_size);
    std::generate(other_sequence.begin(), other_sequence.end(), gen);
    for (size_t j = 0; j < sequence_size; ++j) {
        if (other_sequence[j] <= cutoff) random_sequence[j] = 0; // Or j or ...
    }
    return random_sequence;
}
I don't think this was the answer you were looking for - so you likely need to clarify the question more.

First random number is always smaller than rest

I happened to notice that in C++ the first random number returned by the std rand() method is most of the time significantly smaller than the second one. With the Qt implementation, the first one is nearly always several orders of magnitude smaller.
qsrand(QTime::currentTime().msec());
qDebug() << "qt1: " << qrand();
qDebug() << "qt2: " << qrand();
srand((unsigned int) time(0));
std::cout << "std1: " << rand() << std::endl;
std::cout << "std2: " << rand() << std::endl;
output:
qt1: 7109361
qt2: 1375429742
std1: 871649082
std2: 1820164987
Is this intended, due to an error in seeding, or a bug?
Also, while the qrand() output varies strongly, the first rand() output seems to change linearly with time. I just wonder why.
I'm not sure that could be classified as a bug, but it has an explanation. Let's examine the situation:
Look at rand's implementation. You'll see it's just a calculation using the last generated value.
You're seeding using QTime::currentTime().msec(), which is by nature bounded by the small range of values 0..999, but qsrand accepts a uint variable, with the range 0..4294967295.
By combining those two factors, you have a pattern.
Just out of curiosity: try seeding with QTime::currentTime().msec() + 100000000
Now the first value will probably be bigger than the second most of the time.
I wouldn't worry too much. This "pattern" seems to happen only on the first two generated values. After that, everything seems to go back to normal.
EDIT:
To make things more clear, try running the code below. It'll compare the first two generated values to see which one is smaller, using all possible millisecond values (range: 0..999) as the seed:
int totalCalls, leftIsSmaller = 0;
for (totalCalls = 0; totalCalls < 1000; totalCalls++)
{
    qsrand(totalCalls);
    int first = qrand();   // the evaluation order of qrand() < qrand() is
    int second = qrand();  // unspecified, so draw the two values explicitly
    if (first < second)
        leftIsSmaller++;
}
qDebug() << (100.0 * leftIsSmaller) / totalCalls;
It will print 94.8, which means 94.8% of the time the first value will be smaller than the second.
Conclusion: when using the current millisecond to seed, you'll see that pattern for the first two values. I did some tests here and the pattern seems to disappear after the second value is generated. My advice: find a "good" value to call qsrand with (which should obviously be called only once, at the beginning of your program). A good value should span the whole range of the uint class. Take a look at this other question for some ideas:
Recommended way to initialize srand?
Also, take a look at this:
PCG: A Family of Better Random Number Generators
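Coming back to the seeding advice above: for instance (a sketch, not taken from the linked material), one could seed qsrand once from std::random_device instead of the millisecond field:
#include <QtGlobal>  // qsrand
#include <random>
void seedOnce()  // hypothetical helper, called once at program start
{
    std::random_device rd;  // non-deterministic where the platform supports it
    qsrand(rd());           // rd() returns an unsigned int spanning the full 32-bit range
}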
Neither current Qt nor the C standard run-time has a quality randomizer, and your test shows it. Qt seems to use the C run-time for that (this is easy to check, but why bother). If C++11 is available in your project, use this much better and way more reliable method:
#include <random>
#include <chrono>
auto seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::uniform_int_distribution<uint> distribution;
uint randomUint = distribution(generator);
There is a good video that covers the topic. As noted by commenter user2357112, we can apply different random engines and then different distributions, but for my specific use the above worked really well.
Keeping in mind that making judgments about a statistical phenomenon based on a small number of samples might be misleading, I decided to run a small experiment. I ran the following code:
#include <cstdio>
#include <cstdlib>
#include <ctime>

int main()
{
    int i = 0;
    int j = 0;
    while (i < RAND_MAX)
    {
        srand(time(NULL));
        int r1 = rand();
        int r2 = rand();
        if (r1 < r2)
            ++j;
        ++i;
        if (i % 10000 == 0) {
            printf("%g\n", (float)j / (float)i);
        }
    }
}
which basically printed the percentage of times the first generated number was smaller than the second. Below you see the plot of that ratio:
[plot omitted]
As you can see, it actually approaches 0.5 after fewer than 50 actual new seeds.
As suggested in the comment, we could modify the code to use consecutive seeds every iteration and speed up the convergence:
#include <cstdio>
#include <cstdlib>
#include <ctime>

int main()
{
    int i = 0;
    int j = 0;
    int t = time(NULL);
    while (i < RAND_MAX)
    {
        srand(t);
        int r1 = rand();
        int r2 = rand();
        if (r1 < r2)
            ++j;
        ++i;
        if (i % 10000 == 0) {
            printf("%g\n", (float)j / (float)i);
        }
        ++t;
    }
}
This gives us:
[plot omitted]
which stays pretty close to 0.5 as well.
While rand is certainly not the best pseudo random number generator, the claim that it often generates a smaller number during the first run does not seem to be warranted.

Random Engine Differences

The C++11 standard specifies a number of different engines for random number generation: linear_congruential_engine, mersenne_twister_engine, subtract_with_carry_engine and so on. Obviously, this is a large change from the old usage of std::rand.
Obviously, one of the major benefits of (at least some of) these engines is the massively increased period length (it's built into the name for std::mt19937).
However, the differences between the engines is less clear. What are the strengths and weaknesses of the different engines? When should one be used over the other? Is there a sensible default that should generally be preferred?
From the explanations below, a linear engine seems to be faster but less random, while the Mersenne Twister has higher complexity and better randomness. The subtract-with-carry engine is an improvement on the linear engine, and it is definitely more random. In the last reference, it is stated that the Mersenne Twister has higher complexity than the subtract-with-carry engine.
Linear congruential random number engine
A pseudo-random number generator engine that produces unsigned integer numbers.
This is the simplest generator engine in the standard library. Its state is a single integer value, with the following transition algorithm:
x = (ax+c) mod m
Where x is the current state value, a and c are their respective template parameters, and m is its respective template parameter if this is greater than 0, or numeric_limits<UIntType>::max() + 1 otherwise.
Its generation algorithm is a direct copy of the state value.
This makes it an extremely efficient generator in terms of processing and memory consumption, but producing numbers with varying degrees of serial correlation, depending on the specific parameters used.
The random numbers generated by linear_congruential_engine have a period of m.
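The transition is simple enough to sketch in a few lines (illustrative only; these are the minstd_rand parameters a = 48271, c = 0, m = 2^31 - 1):
#include <cstdint>
// Minimal LCG sketch: state update x = (a*x + c) mod m, generation
// returns the state directly. The seed must be in 1..m-1 for these parameters.
struct minimal_lcg {
    std::uint64_t x = 1;  // 64-bit so a*x cannot overflow
    std::uint32_t next() {
        x = (48271 * x) % 2147483647;  // a = 48271, c = 0, m = 2^31 - 1
        return static_cast<std::uint32_t>(x);
    }
};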
Mersenne twister random number engine
A pseudo-random number generator engine that produces unsigned integer numbers in the closed interval [0,2^w-1].
The algorithm used by this engine is optimized to compute large series of numbers (such as in Monte Carlo experiments) with an almost uniform distribution in the range.
The engine has an internal state sequence of n integer elements, which is filled with a pseudo-random series generated on construction or by calling member function seed.
The internal state sequence becomes the source for n elements: When the state is advanced (for example, in order to produce a new random number), the engine alters the state sequence by twisting the current value using xor mask a on a mix of bits determined by parameter r that come from that value and from a value m elements away (see operator() for details).
The random numbers produced are tempered versions of these twisted values. The tempering is a sequence of shift and xor operations defined by parameters u, d, s, b, t, c and l applied on the selected state value (see operator()).
The random numbers generated by mersenne_twister_engine have a period equal to the Mersenne number 2^(n*w-r)-1 (for mt19937, 2^19937-1).
Subtract-with-carry random number engine
A pseudo-random number generator engine that produces unsigned integer numbers.
The algorithm used by this engine is a lagged Fibonacci generator, with a state sequence of r integer elements, plus one carry value.
Lagged Fibonacci generators have a maximum period of (2^k - 1)*2^(M-1) if addition or subtraction is used. The initialization of LFGs is a very complex problem. The output of LFGs is very sensitive to initial conditions, and statistical defects may appear initially but also periodically in the output sequence unless extreme care is taken. Another potential problem with LFGs is that the mathematical theory behind them is incomplete, making it necessary to rely on statistical tests rather than theoretical performance.
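For concreteness, here is a rough sketch of one subtract-with-carry step, using the std::ranlux24_base parameters (w = 24, s = 10, r = 24); proper seeding is omitted, and an all-zero state would be degenerate:
#include <array>
#include <cstdint>
// Sketch of the transition x_i = (x_{i-s} - x_{i-r} - carry) mod 2^w.
struct swc24_sketch {
    std::array<std::uint32_t, 24> x{};  // the r most recent values, as a ring buffer
    std::uint32_t carry = 0;
    int i = 0;                          // x[i] currently holds x_{i-r}
    std::uint32_t next() {
        const std::int64_t m = std::int64_t(1) << 24;
        const int short_lag = (i + 24 - 10) % 24;  // position of x_{i-s}
        std::int64_t t = std::int64_t(x[short_lag]) - x[i] - carry;
        carry = (t < 0) ? 1 : 0;                   // the borrow becomes the next carry
        x[i] = static_cast<std::uint32_t>((t % m + m) % m);
        const std::uint32_t out = x[i];
        i = (i + 1) % 24;
        return out;
    }
};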
And finally from the documentation of random:
The choice of which engine to use involves a number of tradeoffs: the linear congruential engine is moderately fast and has a very small storage requirement for state. The lagged Fibonacci generators are very fast even on processors without advanced arithmetic instruction sets, at the expense of greater state storage and sometimes less desirable spectral characteristics. The Mersenne Twister is slower and has greater state storage requirements but with the right parameters has the longest non-repeating sequence with the most desirable spectral characteristics (for a given definition of desirable).
I think that the point is that random generators have different properties, which can make them more suitable or not for a given problem.
The period length is one of the properties.
The quality of the random numbers can also be important.
The performance of the generator can also be an issue.
Depending on your need, you might take one generator or another one. E.g., if you need fast random numbers but do not really care for the quality, an LCG might be a good option. If you want better quality random numbers, the Mersenne Twister is probably a better option.
To help you make your choice, there are some standard tests and results (I definitely like the table on p. 29 of this paper).
EDIT: From the paper,
The LCG (LCG(***) in the paper) family are the fastest generators, but with the poorest quality.
The Mersenne Twister (MT19937) is a little bit slower, but yields better random numbers.
The subtract-with-carry generators (SWB(***), I think) are way slower, but can yield better random properties when well tuned.
As the other answers forget about ranlux, here is a small note by an AMD developer who recently ported it to OpenCL:
https://community.amd.com/thread/139236
RANLUX is also one of very few (the only one I know of, actually) PRNGs that has an underlying theory explaining why it generates "random" numbers, and why they are good. Indeed, if the theory is correct (and I don't know of anyone who has disputed it), RANLUX at the highest luxury level produces completely decorrelated numbers down to the last bit, with no long-range correlations as long as we stay well below the period (10^171). Most other generators can say very little about their quality (like Mersenne Twister, KISS, etc.); they must rely on passing statistical tests.
Physicists at CERN are fans of this PRNG. 'nuff said.
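For reference, ranlux is available in the standard library with no extra dependencies; a minimal usage sketch:
#include <iostream>
#include <random>
int main()
{
    // std::ranlux48 is the higher-luxury standard variant (std::ranlux24
    // is the lighter one); both are ordinary engines usable with any distribution.
    std::ranlux48 gen(std::random_device{}());
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::cout << dist(gen) << '\n';
}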
Some of the information in these other answers conflicts with my findings. I've run tests on Windows 8.1 using Visual Studio 2013, and consistently I've found mersenne_twister_engine to be both higher quality and significantly faster than either linear_congruential_engine or subtract_with_carry_engine. This leads me to believe, when the information in the other answers is taken into account, that the specific implementation of an engine has a significant impact on performance.
This is of great surprise to nobody, I'm sure, but it's not mentioned in the other answers where mersenne_twister_engine is said to be slower. I have no test results for other platforms and compilers, but with my configuration, mersenne_twister_engine is clearly the superior choice when considering period, quality, and speed performance. I have not profiled memory usage, so I cannot speak to the space requirement property.
Here's the code I'm using to test with (to make portable, you should only have to replace the windows.h QueryPerformanceXxx() API calls with an appropriate timing mechanism):
// compile with: cl.exe /EHsc
#include <random>
#include <iostream>
#include <windows.h>

using namespace std;

void test_lc(const int a, const int b, const int s) {
    /*
    typedef linear_congruential_engine<unsigned int, 48271, 0, 2147483647> minstd_rand;
    */
    minstd_rand gen(1729);
    uniform_int_distribution<> distr(a, b);
    for (int i = 0; i < s; ++i) {
        distr(gen);
    }
}

void test_mt(const int a, const int b, const int s) {
    /*
    typedef mersenne_twister_engine<unsigned int, 32, 624, 397,
        31, 0x9908b0df,
        11, 0xffffffff,
        7, 0x9d2c5680,
        15, 0xefc60000,
        18, 1812433253> mt19937;
    */
    mt19937 gen(1729);
    uniform_int_distribution<> distr(a, b);
    for (int i = 0; i < s; ++i) {
        distr(gen);
    }
}

void test_swc(const int a, const int b, const int s) {
    /*
    typedef subtract_with_carry_engine<unsigned int, 24, 10, 24> ranlux24_base;
    */
    ranlux24_base gen(1729);
    uniform_int_distribution<> distr(a, b);
    for (int i = 0; i < s; ++i) {
        distr(gen);
    }
}

int main()
{
    int a_dist = 0;
    int b_dist = 1000;
    int samples = 100000000;

    cout << "Testing with " << samples << " samples." << endl;

    LARGE_INTEGER ElapsedTime;
    double ElapsedSeconds = 0;
    LARGE_INTEGER Frequency;
    QueryPerformanceFrequency(&Frequency);
    double TickInterval = 1.0 / ((double) Frequency.QuadPart);

    LARGE_INTEGER StartingTime;
    LARGE_INTEGER EndingTime;

    QueryPerformanceCounter(&StartingTime);
    test_lc(a_dist, b_dist, samples);
    QueryPerformanceCounter(&EndingTime);
    ElapsedTime.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedSeconds = ElapsedTime.QuadPart * TickInterval;
    cout << "linear_congruential_engine time: " << ElapsedSeconds << endl;

    QueryPerformanceCounter(&StartingTime);
    test_mt(a_dist, b_dist, samples);
    QueryPerformanceCounter(&EndingTime);
    ElapsedTime.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedSeconds = ElapsedTime.QuadPart * TickInterval;
    cout << "   mersenne_twister_engine time: " << ElapsedSeconds << endl;

    QueryPerformanceCounter(&StartingTime);
    test_swc(a_dist, b_dist, samples);
    QueryPerformanceCounter(&EndingTime);
    ElapsedTime.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedSeconds = ElapsedTime.QuadPart * TickInterval;
    cout << "subtract_with_carry_engine time: " << ElapsedSeconds << endl;
}
Output:
Testing with 100000000 samples.
linear_congruential_engine time: 10.0821
mersenne_twister_engine time: 6.11615
subtract_with_carry_engine time: 9.26676
I just saw this answer from Marnos and decided to test it myself. I used std::chrono::high_resolution_clock to time 100000 samples 100 times to produce an average. I measured everything in std::chrono::nanoseconds and ended up with different results:
std::minstd_rand had an average of 28991658 nanoseconds
std::mt19937 had an average of 29871710 nanoseconds
ranlux48_base had an average of 29281677 nanoseconds
This is on a Windows 7 machine. Compiler is Mingw-Builds 4.8.1 64bit. This is obviously using the C++11 flag and no optimisation flags.
When I turn on -O3 optimisations, std::minstd_rand and ranlux48_base actually run faster than what the implementation of high_resolution_clock can measure; however, std::mt19937 still takes 730045 nanoseconds, or about 3/4 of a millisecond.
So, as he said, it's implementation-specific, but at least with GCC the average times seem to stick to what the descriptions in the accepted answer say. The Mersenne Twister seems to benefit the least from optimizations, whereas the other two really just throw out random numbers unbelievably fast once you factor in compiler optimizations.
As an aside, I'd been using Mersenne Twister engine in my noise generation library (it doesn't precompute gradients), so I think I'll switch to one of the others to really see some speed improvements. In my case, the "true" randomness doesn't matter.
Code:
#include <iostream>
#include <iomanip>   // std::setprecision
#include <chrono>
#include <random>

using namespace std;
using namespace std::chrono;

int main()
{
    minstd_rand linearCongruentialEngine;
    mt19937 mersenneTwister;
    ranlux48_base subtractWithCarry;
    uniform_real_distribution<float> distro;

    int numSamples = 100000;
    int repeats = 100;

    long long int avgL = 0;
    long long int avgM = 0;
    long long int avgS = 0;

    cout << "results:" << endl;

    for (int j = 0; j < repeats; ++j)
    {
        cout << "start of sequence: " << j << endl;

        auto start = high_resolution_clock::now();
        for (int i = 0; i < numSamples; ++i)
            distro(linearCongruentialEngine);
        auto stop = high_resolution_clock::now();
        auto L = duration_cast<nanoseconds>(stop - start).count();
        avgL += L;
        cout << "Linear Congruential:\t" << L << endl;

        start = high_resolution_clock::now();
        for (int i = 0; i < numSamples; ++i)
            distro(mersenneTwister);
        stop = high_resolution_clock::now();
        auto M = duration_cast<nanoseconds>(stop - start).count();
        avgM += M;
        cout << "Mersenne Twister:\t" << M << endl;

        start = high_resolution_clock::now();
        for (int i = 0; i < numSamples; ++i)
            distro(subtractWithCarry);
        stop = high_resolution_clock::now();
        auto S = duration_cast<nanoseconds>(stop - start).count();
        avgS += S;
        cout << "Subtract With Carry:\t" << S << endl;
    }

    cout << setprecision(10) << "\naverage:\nLinear Congruential: " << (long double)avgL / repeats
         << "\nMersenne Twister: " << (long double)avgM / repeats
         << "\nSubtract with Carry: " << (long double)avgS / repeats << endl;
}
It's a trade-off, really. A PRNG like the Mersenne Twister is better because it has an extremely large period and other good statistical properties.
But a large-period PRNG takes up more memory (for maintaining the internal state) and also takes more time to generate a random number (due to complex transitions and post-processing).
Choose a PRNG depending on the needs of your application. When in doubt, use the Mersenne Twister; it's the default in many tools.
In general, the Mersenne Twister is the best (and fastest) RNG, but it requires some space (about 2.5 kilobytes). Which one suits your need depends on how many times you need to instantiate the generator object. (If you need to instantiate it only once, or a few times, then MT is the one to use. If you need to instantiate it millions of times, then perhaps something smaller.)
Some people report that MT is slower than some of the others. According to my experiments, this depends a lot on your compiler optimization settings. Most importantly the -march=native setting may make a huge difference, depending on your host architecture.
I ran a small program to test the speed of different generators, and their sizes, and got this:
std::mt19937 (2504 bytes): 1.4714 s
std::mt19937_64 (2504 bytes): 1.50923 s
std::ranlux24 (120 bytes): 16.4865 s
std::ranlux48 (120 bytes): 57.7741 s
std::minstd_rand (4 bytes): 1.04819 s
std::minstd_rand0 (4 bytes): 1.33398 s
std::knuth_b (1032 bytes): 1.42746 s
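The size column can presumably be reproduced with sizeof (a sketch; exact byte counts vary between standard library implementations):
#include <iostream>
#include <random>
int main()
{
    // Engine state sizes as this implementation lays them out.
    std::cout << "mt19937:     " << sizeof(std::mt19937)     << " bytes\n"
              << "mt19937_64:  " << sizeof(std::mt19937_64)  << " bytes\n"
              << "ranlux24:    " << sizeof(std::ranlux24)    << " bytes\n"
              << "ranlux48:    " << sizeof(std::ranlux48)    << " bytes\n"
              << "minstd_rand: " << sizeof(std::minstd_rand) << " bytes\n"
              << "knuth_b:     " << sizeof(std::knuth_b)     << " bytes\n";
}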

Uniform random number generator in c++

I am trying to produce true random number in c++ with C++ TR1.
However, when I run my program again, it produces the same random numbers. The code is below.
I need a truly random number for each run, as random as possible.
std::tr1::mt19937 eng;
std::tr1::uniform_real<double> unif(0, 1);
unif(eng);
You have to initialize the engine with a seed, otherwise the default seed is going to be used:
eng.seed(static_cast<unsigned int >(time(NULL)));
However, true randomness is something you cannot achieve on a deterministic machine without additional input. Every pseudo-random number generator is periodic in some way, which is something you wouldn't expect from a non-deterministic number. For example, std::mt19937 has a period of 2^19937-1 iterations. True randomness is hard to achieve, as you would have to monitor something that doesn't seem deterministic (user input, atmospheric noise). See Jerry's and Handprint's answers.
If you don't want a time-based seed you can use std::random_device, as seen in emsr's answer. You could even use std::random_device as the generator, which is the closest you'll get to true randomness with standard library methods only.
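For completeness, a minimal sketch of using std::random_device directly as the generator (it satisfies the generator requirements, though it may be slow, and on some implementations it is deterministic):
#include <iostream>
#include <random>
int main()
{
    std::random_device rd;                      // may block or be slow on some platforms
    std::uniform_real_distribution<double> unif(0, 1);
    std::cout << unif(rd) << '\n';              // distributions accept any engine, including random_device
}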
These are pseudo-random number generators. They can never produce truly random numbers. For that, you typically need special hardware (e.g., things like measuring noise in a thermal diode or radiation from a radioactive source).
To get different sequences from pseudo-random generators in different runs, you typically seed the generator based on the current time.
That produces fairly predictable results, though (i.e., somebody else can figure out the seed you used fairly easily). If you need to prevent that, most systems do provide some source of at least fairly random numbers. On Linux, /dev/random, and on Windows, CryptGenRandom.
Those latter tend to be fairly slow, though, so you usually want to use them as a seed, not just retrieve all your random numbers from them.
If you want true hardware random numbers then the standard library offers access to this through the random_device class:
I use it to seed another generator:
#include <random>
...
std::mt19937_64 re;
std::random_device rd;
re.seed(rd());
...
std::cout << re();
If your hardware has /dev/urandom or /dev/random then this will be used. Otherwise the implementation is free to use one of its pseudo-random generators. With G++, mt19937 is used as a fallback.
I'm pretty sure TR1 has this as well, but as others noted, I think it's best to use the std C++11 utilities at this point.
This answer is a wiki. I'm working on a library and examples in .NET, feel free to add your own in any language...
Without external 'random' input (such as monitoring street noise), as a deterministic machine, a computer cannot generate truly random numbers: Random Number Generation.
Since most of us don't have the money and expertise to utilize special equipment to provide chaotic input, there are ways to utilize the somewhat unpredictable nature of your OS, task scheduler, process manager, and user inputs (e.g. mouse movement) to generate improved pseudo-randomness.
Unfortunately, I do not know enough about C++ TR1 to know if it has the capability to do this.
Edit
As others have pointed out, you get different number sequences (which eventually repeat, so they aren't truly random), by seeding your RNG with different inputs. So you have two options in improving your generation:
Periodically reseed your RNG with some sort of chaotic input OR make the output of your RNG unreliable based on how your system operates.
The former can be accomplished by creating algorithms that explicitly produce seeds by examining the system environment. This may require setting up some event handlers, delegate functions, etc.
The latter can be accomplished by poor parallel computing practice: i.e. setting many RNG threads/processes to compete in an 'unsafe manner' to create each subsequent random number (or number sequence). This implicitly adds chaos from the sum total of activity on your system, because every minute event will have an impact on which thread's output ends up being written and eventually read when a 'GetNext()' type method is called. Below is a crude proof of concept in .NET 3.5. Note two things: 1) even though the RNG is seeded with the same number every time, 24 identical rows are not created; 2) there is a noticeable hit on performance and an obvious increase in resource consumption, which is a given when improving random number generation:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;

namespace RandomParallel
{
    class RandomParallel
    {
        static int[] _randomRepository;
        static Queue<int> _randomSource = new Queue<int>();

        static void Main(string[] args)
        {
            InitializeRepository(0, 1, 40);
            FillSource();
            for (int i = 0; i < 24; i++)
            {
                for (int j = 0; j < 40; j++)
                    Console.Write(GetNext() + " ");
                Console.WriteLine();
            }
            Console.ReadLine();
        }

        static void InitializeRepository(int min, int max, int size)
        {
            _randomRepository = new int[size];
            var rand = new Random(1024);
            for (int i = 0; i < size; i++)
                _randomRepository[i] = rand.Next(min, max + 1);
        }

        static void FillSource()
        {
            Thread[] threads = new Thread[Environment.ProcessorCount * 8];
            for (int j = 0; j < threads.Length; j++)
            {
                threads[j] = new Thread((myNum) =>
                {
                    int i = (int)myNum * _randomRepository.Length / threads.Length;
                    int max = (((int)myNum + 1) * _randomRepository.Length / threads.Length) - 1;
                    for (int k = i; k <= max; k++)
                    {
                        _randomSource.Enqueue(_randomRepository[k]);
                    }
                });
                threads[j].Priority = ThreadPriority.Highest;
            }
            for (int k = 0; k < threads.Length; k++)
                threads[k].Start(k);
        }

        static int GetNext()
        {
            if (_randomSource.Count > 0)
                return _randomSource.Dequeue();
            else
            {
                FillSource();
                return _randomSource.Dequeue();
            }
        }
    }
}
As long as there is user input/interaction during the generation, this technique will produce an uncrackable, non-repeating sequence of 'random' numbers. In such a scenario, knowing the initial state of the machine would be insufficient to predict the outcome.
Here's an example of seeding the engine (using C++11 instead of TR1)
#include <chrono>
#include <random>
#include <iostream>
int main() {
    std::mt19937 eng(std::chrono::high_resolution_clock::now()
                         .time_since_epoch().count());
    std::uniform_real_distribution<> unif;
    std::cout << unif(eng) << '\n';
}
Seeding with the current time can be relatively predictable and is probably not something that should be done. The above at least does not limit you just to one possible seed per second, which is very predictable.
If you want to seed from something like /dev/random instead of the current time you can do:
std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 eng(seed);
(This depends on your standard library implementation. For example, libc++ uses /dev/urandom by default, but in VS11 random_device is deterministic)
Of course nothing you get out of mt19937 is going to meet your requirement of a "true random number", and I suspect that you don't really need true randomness.