C++ random generator with provided (at least estimated) entropy - c++

Using the C++ standard random facilities I can more or less efficiently create sequences with pre-defined distributions using language-provided tools. What about Shannon entropy? Is there some way to define the output Shannon entropy of the produced sequence?
I tried a small experiment: I generated a sufficiently long, uniformly distributed sequence and implemented a Shannon entropy calculator. The resulting value ranges from 0.0 (absolute order) to 8.0 (absolute chaos):
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <iomanip>
#include <iostream>
#include <random>
#include <vector>

template <typename T>
double shannon_entropy(T first, T last)
{
    size_t frequencies_count{};
    double entropy = 0.0;

    std::for_each(first, last, [&entropy, &frequencies_count](auto item) {
        if (0. == item) return;
        double fp_item = static_cast<double>(item);
        entropy += fp_item * std::log2(fp_item);
        ++frequencies_count;
    });

    if (frequencies_count > 256) {
        return -1.0;
    }

    return -entropy;
}

std::vector<uint8_t> generate_random_sequence(size_t sequence_size)
{
    std::vector<uint8_t> random_sequence;
    std::random_device rnd_device;

    std::cout << "Random device entropy: " << rnd_device.entropy() << '\n';

    std::mt19937 mersenne_engine(rnd_device());
    std::uniform_int_distribution<unsigned> dist(0, 255);

    auto gen = std::bind(dist, mersenne_engine);
    random_sequence.resize(sequence_size);
    std::generate(random_sequence.begin(), random_sequence.end(), gen);

    return random_sequence;
}

std::vector<double> read_random_probabilities(size_t sequence_size)
{
    std::vector<size_t> bytes_distribution(256);
    std::vector<double> bytes_frequencies(256);

    std::vector<uint8_t> random_sequence = generate_random_sequence(sequence_size);
    size_t rnd_seq_size = random_sequence.size();

    std::for_each(random_sequence.begin(), random_sequence.end(), [&](uint8_t b) {
        ++bytes_distribution[b];
    });

    std::transform(bytes_distribution.begin(), bytes_distribution.end(), bytes_frequencies.begin(),
        [&rnd_seq_size](size_t item) {
            return static_cast<double>(item) / rnd_seq_size;
        });

    return bytes_frequencies;
}

int main(int argc, char* argv[]) {
    size_t sequence_size = 1024 * 1024;
    std::vector<double> bytes_frequencies = read_random_probabilities(sequence_size);

    double entropy = shannon_entropy(bytes_frequencies.begin(), bytes_frequencies.end());
    std::cout << "Sequence entropy: " << std::setprecision(16) << entropy << std::endl;

    std::cout << "Min possible file size assuming max theoretical compression efficiency:\n";
    std::cout << (entropy * sequence_size) << " in bits\n";
    std::cout << ((entropy * sequence_size) / 8) << " in bytes\n";

    return EXIT_SUCCESS;
}
First, it appears that std::random_device::entropy() is hardcoded to return 32 in MSVC 2015 (i.e. the maximum, which in the Shannon sense corresponds to 8.0 bits per byte). As you can verify, that is not far from the truth: in this example the result is always close to 7.9998..., i.e. absolute chaos.
The working example is on IDEONE (by the way, their compiler hardcodes entropy to 0).
One more thing, the main question: is it possible to create a generator that produces a uniformly distributed sequence with a defined entropy, say between 6.0 and 7.0? Could it be implemented at all, and if so, are there existing implementations?

First, you're viewing Shannon's theory entirely wrong. His argument (as you're using it) is simply: "given the probability of x (Pr(x)), the bits required to store x are -log2 Pr(x)." It says nothing about whether x is actually random. In this regard, you're viewing Pr(x) wrong. -log2 Pr(x), given a Pr(x) that should be uniformly 1/256, results in a required bit width of 8 bits for storage. However, that's not how statistics work. Go back to thinking about Pr(x), because the bits required mean nothing on their own.
Your question is about statistics. Given an infinite sample, if and only if the distribution matches the ideal histogram, as the sample size approaches infinity the probability of each sample approaches the expected frequency. I want to make it clear that you're not looking for "-log2 Pr(x) is absolute chaos when it's 8, given Pr(x) = 1/256." A uniform distribution is not chaos. In fact, it's... well, uniform. Its properties are well known, simple, and easy to predict. What you're looking for is: "Does the finite sample set S meet the criteria of an independently distributed uniform distribution (commonly known as "independent and identically distributed data", or i.i.d.) with Pr(x) = 1/256?" This has nothing to do with Shannon's theory and goes much further back in time to the basic probability theory involving flips of a coin (in this case binomial, given assumed uniformity).
Assuming for a moment that any C++11 <random> generator meets the criteria for "statistically indistinguishable from i.i.d." (which, by the way, those generators don't), you can use them to emulate i.i.d. results. If you would like a range of data that is storable within 6..7 bits (it wasn't clear whether you meant 6 or 7 bits; hypothetically, everything in between is doable as well), simply scale the range. For example...
#include <iostream>
#include <random>

int main() {
    unsigned long low = 1 << 6;    // 2^6 == 64
    unsigned long limit = 1 << 7;  // 2^7 == 128
    // Therefore, the range is 6 bits to 7 bits (or 64 + [128 - 64])
    unsigned long range = limit - low;

    std::random_device rd;
    std::mt19937 rng(rd()); //<< Doesn't actually meet criteria for i.i.d.
    std::uniform_int_distribution<unsigned long> dist(low, limit - 1); //<< Given an engine that actually produces i.i.d. data, this would produce exactly what you're looking for

    for (int i = 0; i != 10; ++i) {
        unsigned long y = dist(rng);
        // y is known to be in the set {2^6..2^7-1} and assumed to be uniform (coin flip) over {low..low + (range-1)}.
        std::cout << y << std::endl;
    }
    return 0;
}
The problem with this is that, while the <random> distribution classes are accurate, the random number generators (presumably aside from std::random_device, but that's system-specific) are not designed to stand up to statistical tests of fitness as i.i.d. generators.
If you would like one that does, implement a CSPRNG (my go-to is Bob Jenkins' ISAAC) that has an interface meeting the requirements of the <random> class of generators (probably just covering the basic interface of std::random_device is good enough).
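A minimal sketch of what such an adapter might look like (the generator call inside is a placeholder, not actual ISAAC, and the name csprng_adapter is made up; the point is only that providing result_type, min(), max(), and operator() is enough to plug into the <random> distributions):
#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical adapter exposing a <random>-compatible interface around some
// external CSPRNG (ISAAC, a /dev/urandom reader, etc.).
struct csprng_adapter {
    using result_type = std::uint32_t;

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return UINT32_MAX; }

    result_type operator()() {
        return next_csprng_word(); // replace with the real CSPRNG output
    }

private:
    // Placeholder so the sketch compiles; NOT a CSPRNG by itself.
    std::uint32_t next_csprng_word() {
        static std::random_device rd;
        return rd();
    }
};

int main() {
    csprng_adapter gen;
    std::uniform_int_distribution<unsigned> dist(0, 255);
    for (int i = 0; i < 5; ++i)
        std::cout << dist(gen) << '\n';
}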
Testing for a statistically sound "no" or "we can't say no" on whether a set follows a specific model (and therefore whether Pr(x) is accurate and Shannon's entropy function is an accurate prediction) is a different matter entirely. Like I said, no generator in <random> meets these criteria (except maybe std::random_device). My advice is to research things like the central limit theorem, goodness-of-fit tests, birthday spacings, et cetera.
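For instance, a very rough goodness-of-fit sketch on a byte histogram might look like the following (an illustration only: it computes the chi-squared statistic against the uniform model; the helper name chi_squared_uniform is made up, and the cutoff for 255 degrees of freedom, roughly 293 at the 5% level, should be taken from a proper table or statistics library):
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Chi-squared statistic of a byte sample against the uniform model
// Pr(x) = 1/256. Large values suggest the sample does not fit the model;
// compare against a chi-squared table for 255 degrees of freedom.
double chi_squared_uniform(const std::vector<std::uint8_t>& sample)
{
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : sample)
        ++counts[b];

    const double expected = static_cast<double>(sample.size()) / 256.0;
    double chi2 = 0.0;
    for (std::size_t observed : counts) {
        const double d = static_cast<double>(observed) - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}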
To drive my point a bit more, under the assumptions of your question...
struct uniform_rng {
    unsigned long x;

    constexpr uniform_rng(unsigned long seed = 0) noexcept:
        x{ seed }
    { };

    unsigned long operator ()() noexcept {
        unsigned long y = this->x++;
        return y;
    }
};
... would absolutely meet your criteria of being uniform (or, as you say, "absolute chaos"). Pr(x) is most certainly 1/N, and the bits required to store any number of the set are -log2(1/N), i.e. the bit width of unsigned long (since N is 2 to the power of that bit width). However, it's not independently distributed. Because we know its properties, you can "store" its entire sequence by simply storing seed. Surprise: all PRNGs work this way. Therefore, the bits required to store the entire sequence of a PRNG are -log2(1/2^bitsForSeed). As your sample grows, the ratio of the bits required to store the generator to the bits of sample you are able to generate (i.e. the compression ratio) approaches a limit of 0.

I cannot comment yet, but I would like to start the discussion:
From communication/information theory, it would seem that you would require probabilistic shaping methods to achieve what you want. You should be able to feed the output of any distribution function through a shaping coder, which should then re-distribute the input to a specific target Shannon entropy.
Probabilistic constellation shaping has been successfully applied in fiber-optic communication: Wikipedia with some other links
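A crude sketch of the idea (not a real shaping coder, purely an illustration: it bisects the skew parameter of a geometric-style weight profile until the distribution's Shannon entropy reaches a requested target, then samples bytes from it with std::discrete_distribution; the helper names entropy_bits and geometric_profile and the target of 6.5 bits are made up):
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Entropy in bits of a normalized probability vector.
double entropy_bits(const std::vector<double>& p) {
    double h = 0.0;
    for (double x : p)
        if (x > 0.0) h -= x * std::log2(x);
    return h;
}

// Build 256 weights w_i = q^i (0 < q <= 1) and normalize them; q = 1 gives
// the uniform distribution (8 bits), smaller q gives lower entropy.
std::vector<double> geometric_profile(double q) {
    std::vector<double> p(256);
    double w = 1.0, sum = 0.0;
    for (double& x : p) { x = w; sum += w; w *= q; }
    for (double& x : p) x /= sum;
    return p;
}

int main() {
    const double target = 6.5;   // desired entropy in bits per byte
    double lo = 0.5, hi = 1.0;   // bisect the skew parameter q
    std::vector<double> p;
    for (int i = 0; i < 60; ++i) {
        double mid = 0.5 * (lo + hi);
        p = geometric_profile(mid);
        if (entropy_bits(p) < target) lo = mid; else hi = mid;
    }
    std::cout << "achieved entropy: " << entropy_bits(p) << " bits\n";

    std::mt19937 rng(std::random_device{}());
    std::discrete_distribution<int> dist(p.begin(), p.end());
    for (int i = 0; i < 10; ++i)
        std::cout << dist(rng) << ' ';   // byte values drawn from the shaped law
    std::cout << '\n';
}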

You are not clear about what you want to achieve, and there are several ways of lowering the Shannon entropy of your sequence:
Correlation between the bits, e.g. putting random_sequence through a simple filter.
Individual bits that are not fully random.
As an example, the code below makes the bytes less random:
std::vector<uint8_t> generate_random_sequence(size_t sequence_size,
                                              uint8_t cutoff = 10)
{
    std::vector<uint8_t> random_sequence;
    std::vector<uint8_t> other_sequence;
    std::random_device rnd_device;

    std::cout << "Random device entropy: " << rnd_device.entropy() << '\n';

    std::mt19937 mersenne_engine(rnd_device());
    std::uniform_int_distribution<unsigned> dist(0, 255);

    auto gen = std::bind(dist, mersenne_engine);
    random_sequence.resize(sequence_size);
    std::generate(random_sequence.begin(), random_sequence.end(), gen);

    other_sequence.resize(sequence_size);
    std::generate(other_sequence.begin(), other_sequence.end(), gen);

    for (size_t j = 0; j < sequence_size; ++j) {
        if (other_sequence[j] <= cutoff) random_sequence[j] = 0; // Or j or ...
    }

    return std::move(random_sequence);
}
I don't think this was the answer you were looking for - so you likely need to clarify the question more.

Related

Draining the entropy of std::random_device

I'm using std::random_device and would like to check for its remaining entropy. According to cppreference.com:
std::random_device::entropy
double entropy() const noexcept;
[...]
Return value
The value of the device entropy, or zero if not applicable.
Notes
This function is not fully implemented in some standard libraries. For example, LLVM libc++ always returns zero even though the device is non-deterministic. In comparison, Microsoft Visual C++ implementation always returns 32, and boost.random returns 10.
The entropy of the Linux kernel device /dev/urandom may be obtained using ioctl RNDGETENTCNT - that's what std::random_device::entropy() in GNU libstdc++ uses as of version 8.1
So under Linux and g++ >= 8.1, I should be good... but I'm not:
#include <random>
#include <iostream>

void drain_entropy(std::random_device& rd, std::size_t count = 1)
{
    while (count --> 0) {
        volatile const int discard = rd();
        (void) discard;
    }
}

int main()
{
    std::random_device rd;
    std::cout << "Entropy: " << rd.entropy() << '\n'; // Entropy: 32
    drain_entropy(rd, 1'000'000);
    std::cout << "Entropy: " << rd.entropy() << '\n'; // Entropy: 32
}
Live demo on Coliru (which runs under Linux, right?)
I'm expecting that generating numbers from the device drains its entropy. But it doesn't.
What's happening?
The library will not return an entropy value greater than the number of bits in its result type, which is 32 in this case.
See the libstdc++ code:
const int max = sizeof(result_type) * __CHAR_BIT__;
if (ent > max)
    ent = max;
The documentation you linked to explains this:
Obtains an estimate of the random number device entropy, which is a floating-point value between 0 and log2(max()+1) (which is equal to std::numeric_limits<unsigned int>::digits).
You can see how .entropy() is implemented here.
Basically, entropy() calls ioctl(fd, RNDGETENTCNT, &ent) and returns ent (after clamping it to the maximum number of bits in the target type, as required).
It just so happens that it didn't change after your drain_entropy call.
You can manually implement this method and see that it behaves the same.
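A sketch of such a manual check (Linux-specific, and only an illustration: it queries the kernel's estimate through the same RNDGETENTCNT ioctl mentioned above; the helper name kernel_entropy_estimate is made up):
#include <fcntl.h>
#include <linux/random.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <iostream>

// Query the kernel's entropy-pool estimate (in bits) via RNDGETENTCNT,
// the same ioctl libstdc++'s std::random_device::entropy() uses on Linux.
int kernel_entropy_estimate()
{
    int fd = ::open("/dev/urandom", O_RDONLY);
    if (fd < 0) return -1;
    int ent = -1;
    if (::ioctl(fd, RNDGETENTCNT, &ent) < 0) ent = -1;
    ::close(fd);
    return ent; // note: not clamped to 32 the way entropy() is
}

int main()
{
    std::cout << "Kernel entropy estimate: " << kernel_entropy_estimate()
              << " bits\n";
}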
Even if you remove the clamping, the entropy is barely affected (it could even increase).

Which machines support nondeterministic random_device?

I need to obtain data from different C++ random number generation algorithms, and for that purpose I created some programs. Some of them use pseudo-random number generators and others use random_device (nondeterministic random number generator). The following program belongs to the second group:
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
using namespace std;

const int N = 5000;
const int M = 1000000;
const int VALS = 2;
const int ESP = M / VALS;

int main() {
    for (int i = 0; i < N; ++i) {
        random_device rd;
        if (rd.entropy() == 0) {
            cout << "No support for nondeterministic RNG." << endl;
            break;
        } else {
            mt19937 gen(rd());
            uniform_int_distribution<int> distrib(0, 1);
            vector<int> hist(VALS, 0);
            for (int j = 0; j < M; ++j) ++hist[distrib(gen)];
            int Y = 0;
            for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
            cout << Y << endl;
        }
    }
}
As you can see in the code, I check for the entropy to be greater than 0. I do this because:
Unlike the other standard generators, this [random_device] is not meant to be an
engine that generates pseudo-random numbers, but a generator based on
stochastic processes to generate a sequence of uniformly distributed
random numbers. Although, certain library implementations may lack the
ability to produce such numbers and employ a random number engine to
generate pseudo-random values instead. In this case, entropy returns
zero. Source
Checking the value of the entropy allows me to abort the data collection if the resulting data is going to be pseudo-random (not nondeterministic). Please note that I assume that if rd.entropy() == 0 is true, then we are in pseudo-random mode.
Unfortunately, all my trials result in a file with no data because of entropy being 0. My question is: what can I do to my computer, or where can I find a machine that allows me to obtain the data?
The source you cite is misleading you. The standard says that
double entropy() const noexcept;
Returns: If the implementation employs a random number engine, returns 0.0. Otherwise, returns an entropy estimate for the random numbers returned by operator(), in the range min() to log2(max()+1).
And a better reference has some empirical observations
Notes
This function is not fully implemented in some standard libraries. For
example, LLVM libc++ always returns zero even though the device is
non-deterministic. In comparison, Microsoft Visual C++ implementation
always returns 32, and boost.random returns 10.
In practice, nearly all the main implementations (targeting general purpose computers) have non-deterministic std::random_devices. Your test has a very high false negative rate.

C++ Get random number from 0 to max long long integer

I have the following function:
typedef unsigned long long int UINT64;

UINT64 getRandom(const UINT64 &begin = 0, const UINT64 &end = 100) {
    return begin >= end ? 0 : begin + (UINT64) ((end - begin)*rand()/(double)RAND_MAX);
};
Whenever I call
getRandom(0, ULLONG_MAX);
or
getRandom(0, LLONG_MAX);
I always get the same value 562967133814800. How can I fix this problem?
What is rand()?
According to this the rand() function returns a value in the range [0,RAND_MAX].
What is RAND_MAX?
According to this, RAND_MAX is "an integral constant expression whose value is the maximum value returned by the rand function. This value is library-dependent, but is guaranteed to be at least 32767 on any standard library implementation."
Precision Is An Issue
You take rand()/(double)RAND_MAX, but you have perhaps only 32767 discrete values to work with. Thus, although you have big numbers, you don't really have more numbers. That could be an issue.
Seeding May Be An Issue
Also, you don't talk about how you are calling the function. Do you run the program once with LLONG_MAX and another time with ULLONG_MAX? In that case, the behaviour you are seeing is because you are implicitly using the same random seed each time. Put another way, each time you run the program it will generate the exact same sequence of random numbers.
How can I seed?
You can use the srand() function like so:
#include <stdlib.h> /* srand, rand */
#include <time.h>   /* time */

int main (){
    srand (time(NULL));
    //The rest of your program goes here
}
Now you will get a new sequence of random numbers each time you run your program.
Overflow Is An Issue
Consider this part ((end - begin)*rand()/(double)RAND_MAX).
What is (end-begin)? It is LLONG_MAX or ULLONG_MAX; these are, by definition, the largest possible values those data types can hold. Therefore, it would be bad to multiply them by anything. Yet you do! You multiply them by rand(), which is non-zero. This will cause an overflow. But we can fix that...
Order of Operations Is An Issue
You then divide them by RAND_MAX. I think you've got your order of operations wrong here. You really meant to say:
((end - begin) * (rand()/(double)RAND_MAX) )
Note the new parentheses! (rand()/(double)RAND_MAX)
Now you are multiplying an integer by a fraction, so you are guaranteed not to overflow. But that introduces a new problem...
Promotion Is An Issue
But there's an even deeper problem. You divide an int by a double. When you do that the int is promoted to a double. A double is a floating-point number which basically means that it sacrifices precision in order to have a big range. That's probably what's biting you. As you get to bigger and bigger numbers both your ullong and your llong end up getting cast to the same value. This could be especially true if you overflowed your data type first (see above).
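A quick way to see that precision loss (a tiny illustration, assuming the usual IEEE-754 double with a 53-bit significand; the offset of 1000 is arbitrary):
#include <climits>
#include <iostream>

int main() {
    // A double typically has 53 bits of significand, so it cannot
    // distinguish nearby 64-bit integers near the top of the range.
    double a = static_cast<double>(ULLONG_MAX);        // 2^64 - 1, rounded
    double b = static_cast<double>(ULLONG_MAX - 1000); // 1000 smaller, same rounding
    std::cout << std::boolalpha << (a == b) << '\n';   // prints "true"
}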
Uh oh
So, basically, everything about the PRNG you have presented is wrong.
Perhaps this is why John von Neumann said
Anyone who attempts to generate random numbers by deterministic means
is, of course, living in a state of sin.
And, sometimes, we pay for those sins.
How can I absolve myself?
C++11 provides some nice functionality. You can use it as follows
#include <iostream>
#include <random>
#include <limits>
int main(){
std::random_device rd; //Get a random seed from the OS entropy device, or whatever
std::mt19937_64 eng(rd()); //Use the 64-bit Mersenne Twister 19937 generator
//and seed it with entropy.
//Define the distribution, by default it goes from 0 to MAX(unsigned long long)
//or what have you.
std::uniform_int_distribution<unsigned long long> distr;
//Generate random numbers
for(int n=0; n<40; n++)
std::cout << distr(eng) << ' ';
std::cout << std::endl;
}
(Note that appropriately seeding the generator is difficult. This question addresses that.)
typedef unsigned long long int UINT64;

UINT64 getRandom(UINT64 const& min = 0, UINT64 const& max = 0)
{
    return (((UINT64)(unsigned int)rand() << 32) + (UINT64)(unsigned int)rand()) % (max - min) + min;
}
Note that unsigned long long is guaranteed by the standard to be at least 64 bits wide, but rand() may provide far fewer than 32 random bits per call (RAND_MAX can be as small as 32767), so the two shifted halves may not cover the full 64-bit range. You can use unsigned __int64 instead of unsigned long long, but keep in mind it's compiler-dependent and therefore available only in certain compilers.
unsigned __int64 getRandom(unsigned __int64 const& min, unsigned __int64 const& max)
{
    return (((unsigned __int64)(unsigned int)rand() << 32) + (unsigned __int64)(unsigned int)rand()) % (max - min) + min;
}
Use your own PRNG that meets your requirements rather than the one provided with rand that seems not to and was never guaranteed to.
Given that ULLONG_MAX and LLONG_MAX are both way bigger than the RAND_MAX value, you will certainly get "less precision than you want".
Other than that, there's a 50% chance that your value is below LLONG_MAX, as it is halfway through the range of 64-bit values.
I would suggest using the Mersenne Twister from C++11, which has a 64-bit variant:
http://www.cplusplus.com/reference/random/mt19937_64/
That should give you a value that fits in a 64-bit number.
If you "always get the same value", then it's because you haven't seeded the random number generator, using for example srand(time(0)) - you should normally only seed once, because this sets the "sequence". If the seed is very similar, e.g. you run the same program twice in short succession, you will still get the same sequence, because "time" only ticks once a second, and even then, doesn't change that much. There are various other ways to seed a random number, but for most purposes, time(0) is reasonably good.
You are overflowing the computation, in the expression
((end - begin)*rand()/(double)RAND_MAX)
you are telling the compiler to multiply (ULLONG_MAX - 0) * rand() and then divide by RAND_MAX; you should divide (end - begin) by RAND_MAX first, then multiply by rand().
// http://stackoverflow.com/questions/22883840/c-get-random-number-from-0-to-max-long-long-integer
#include <iostream>
#include <stdlib.h> /* srand, rand */
#include <limits.h>

using std::cout;
using std::endl;

typedef unsigned long long int UINT64;

UINT64 getRandom(const UINT64 &begin = 0, const UINT64 &end = 100) {
    //return begin >= end ? 0 : begin + (UINT64) ((end - begin)*rand()/(double)RAND_MAX);
    return begin >= end ? 0 : begin + (UINT64) rand()*((end - begin)/RAND_MAX);
};

int main( int argc, char *argv[] )
{
    cout << getRandom(0, ULLONG_MAX) << endl;
    cout << getRandom(0, ULLONG_MAX) << endl;
    cout << getRandom(0, ULLONG_MAX) << endl;
    return 0;
}
See it live in Coliru
union bigRand {
    uint64_t ll;
    uint32_t ii[2];
};

uint64_t rand64() {
    bigRand b;
    b.ii[0] = rand();
    b.ii[1] = rand();
    return b.ll;
}
I am not sure how portable it is. But you could easily modify it depending on how wide RAND_MAX is on the particular platform. As an upside, it is brutally simple. I mean the compiler will likely optimize it to be quite efficient, without extra arithmetic whatsoever. Just the cost of calling rand twice.
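A sketch of that kind of modification (only an illustration: it counts how many bits RAND_MAX actually provides and keeps calling rand() until 64 bits have been accumulated; the helper names rand_bits and rand64 are made up, and bias from a RAND_MAX that is not of the form 2^k - 1 is ignored):
#include <cstdint>
#include <cstdlib>
#include <iostream>

// Bit length of RAND_MAX; on common implementations RAND_MAX is 2^k - 1,
// so this is the number of usable bits per rand() call.
int rand_bits()
{
    int bits = 0;
    for (unsigned long long m = RAND_MAX; m != 0; m >>= 1)
        ++bits;
    return bits; // 15 when RAND_MAX == 32767, 31 when RAND_MAX == 2^31 - 1
}

// Accumulate rand() output until 64 bits are filled.
std::uint64_t rand64()
{
    static const int bits = rand_bits();
    std::uint64_t value = 0;
    for (int filled = 0; filled < 64; filled += bits)
        value = (value << bits) | (std::rand() & ((1ULL << bits) - 1));
    return value;
}

int main()
{
    std::srand(12345); // arbitrary seed for the sketch
    for (int i = 0; i < 4; ++i)
        std::cout << rand64() << '\n';
}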
The most reasonable solution would be to use C++11's <random>, mt19937_64 would do.
Alternatively you might try:
return ((double)rand() / ((double)RAND_MAX + 1.0)) * (end - begin + 1) + begin;
to produce numbers in more reasonable way. However note that just like your first attempt, this will still not be producing uniformly distributed numbers (although it might be good enough).
The term (end - begin)*rand() seems to produce an overflow. You can alleviate that problem by using (end - begin) * (rand()/(double)RAND_MAX). Using the second way, I get the following results:
15498727792227194880
7275080918072332288
14445630964995612672
14728618955737210880
with the following calls:
std::cout << getRandom(0, ULLONG_MAX) << std::endl;
std::cout << getRandom(0, ULLONG_MAX) << std::endl;
std::cout << getRandom(0, ULLONG_MAX) << std::endl;
std::cout << getRandom(0, ULLONG_MAX) << std::endl;

Random Engine Differences

The C++11 standard specifies a number of different engines for random number generation: linear_congruential_engine, mersenne_twister_engine, subtract_with_carry_engine and so on. Obviously, this is a large change from the old usage of std::rand.
Obviously, one of the major benefits of (at least some) of these engines is the massively increased period length (it's built into the name for std::mt19937).
However, the differences between the engines is less clear. What are the strengths and weaknesses of the different engines? When should one be used over the other? Is there a sensible default that should generally be preferred?
From the explanations below, a linear engine seems to be faster but less random, while the Mersenne Twister has higher complexity and better randomness. The subtract-with-carry random number engine is an improvement on the linear engine and is definitely more random. In the last reference, it is stated that the Mersenne Twister has higher complexity than the subtract-with-carry random number engine.
Linear congruential random number engine
A pseudo-random number generator engine that produces unsigned integer numbers.
This is the simplest generator engine in the standard library. Its state is a single integer value, with the following transition algorithm:
x = (ax+c) mod m
Where x is the current state value, a and c are their respective template parameters, and m is its respective template parameter if this is greater than 0, or numeric_limits<UIntType>::max() + 1 otherwise.
Its generation algorithm is a direct copy of the state value.
This makes it an extremely efficient generator in terms of processing and memory consumption, but producing numbers with varying degrees of serial correlation, depending on the specific parameters used.
The random numbers generated by linear_congruential_engine have a period of m.
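To make the recurrence concrete, here is a small hand-rolled sketch using the minstd_rand parameters (a = 48271, c = 0, m = 2^31 - 1); the seed 1729 is arbitrary, and the values should match the standard engine's output:
#include <cstdint>
#include <iostream>
#include <random>

int main()
{
    // x_{n+1} = (a * x_n + c) mod m with the minstd_rand parameters.
    const std::uint64_t a = 48271, c = 0, m = 2147483647; // m = 2^31 - 1
    std::uint64_t x = 1729;                               // seed

    std::minstd_rand reference(1729);                     // same parameters, same seed
    for (int i = 0; i < 5; ++i) {
        x = (a * x + c) % m;
        std::cout << x << " == " << reference() << '\n';  // the two columns should agree
    }
}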
Mersenne twister random number engine
A pseudo-random number generator engine that produces unsigned integer numbers in the closed interval [0,2^w-1].
The algorithm used by this engine is optimized to compute large series of numbers (such as in Monte Carlo experiments) with an almost uniform distribution in the range.
The engine has an internal state sequence of n integer elements, which is filled with a pseudo-random series generated on construction or by calling member function seed.
The internal state sequence becomes the source for n elements: When the state is advanced (for example, in order to produce a new random number), the engine alters the state sequence by twisting the current value using xor mask a on a mix of bits determined by parameter r that come from that value and from a value m elements away (see operator() for details).
The random numbers produced are tempered versions of these twisted values. The tempering is a sequence of shift and xor operations defined by parameters u, d, s, b, t, c and l applied on the selected state value (see operator()).
The random numbers generated by mersenne_twister_engine have a period of 2^(n*w-r)-1 (for mt19937, the Mersenne number 2^19937-1).
Subtract-with-carry random number engine
A pseudo-random number generator engine that produces unsigned integer numbers.
The algorithm used by this engine is a lagged fibonacci generator, with a state sequence of r integer elements, plus one carry value.
Lagged Fibonacci generators have a maximum period of (2^k - 1)*2^(M-1) if addition or subtraction is used. The initialization of LFGs is a very complex problem. The output of LFGs is very sensitive to initial conditions, and statistical defects may appear initially but also periodically in the output sequence unless extreme care is taken. Another potential problem with LFGs is that the mathematical theory behind them is incomplete, making it necessary to rely on statistical tests rather than theoretical performance.
And finally from the documentation of random:
The choice of which engine to use involves a number of tradeoffs: the linear congruential engine is moderately fast and has a very small storage requirement for state. The lagged Fibonacci generators are very fast even on processors without advanced arithmetic instruction sets, at the expense of greater state storage and sometimes less desirable spectral characteristics. The Mersenne Twister is slower and has greater state storage requirements but with the right parameters has the longest non-repeating sequence with the most desirable spectral characteristics (for a given definition of desirable).
I think that the point is that random generators have different properties, which can make them more suitable or not for a given problem.
The period length is one of the properties.
The quality of the random numbers can also be important.
The performance of the generator can also be an issue.
Depending on your need, you might take one generator or another one. E.g., if you need fast random numbers but do not really care for the quality, an LCG might be a good option. If you want better quality random numbers, the Mersenne Twister is probably a better option.
To help you make your choice, there are some standard tests and results (I definitely like the table on p. 29 of this paper).
EDIT: From the paper,
The LCG (LCG(***) in the paper) family are the fastest generators, but with the poorest quality.
The Mersenne Twister (MT19937) is a little bit slower, but yields better random numbers.
The subtract-with-carry generators (SWB(***), I think) are way slower, but can yield better randomness properties when well tuned.
As the other answers forget about ranlux, here is a small note by an AMD developer who recently ported it to OpenCL:
https://community.amd.com/thread/139236
RANLUX is also one of very few (the only one I know of, actually) PRNGs that has an underlying theory explaining why it generates "random" numbers, and why they are good. Indeed, if the theory is correct (and I don't know of anyone who has disputed it), RANLUX at the highest luxury level produces completely decorrelated numbers down to the last bit, with no long-range correlations as long as we stay well below the period (10^171). Most other generators can say very little about their quality (like Mersenne Twister, KISS, etc.); they must rely on passing statistical tests.
Physicists at CERN are fans of this PRNG. 'nuff said.
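For what it's worth, the standard library already ships ranlux at a high luxury level as std::ranlux24 and std::ranlux48; a quick usage sketch (seeding and range here are arbitrary):
#include <iostream>
#include <random>

int main()
{
    std::random_device rd;
    std::ranlux48 rng(rd()); // subtract-with-carry base engine plus a discard block
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (int i = 0; i < 5; ++i)
        std::cout << dist(rng) << '\n';
}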
Some of the information in these other answers conflicts with my findings. I've run tests on Windows 8.1 using Visual Studio 2013, and consistently I've found mersenne_twister_engine to be both higher quality and significantly faster than either linear_congruential_engine or subtract_with_carry_engine. This leads me to believe, when the information in the other answers is taken into account, that the specific implementation of an engine has a significant impact on performance.
This is of great surprise to nobody, I'm sure, but it's not mentioned in the other answers, where mersenne_twister_engine is said to be slower. I have no test results for other platforms and compilers, but with my configuration, mersenne_twister_engine is clearly the superior choice when considering period, quality, and speed performance. I have not profiled memory usage, so I cannot speak to the space requirement property.
Here's the code I'm using to test with (to make portable, you should only have to replace the windows.h QueryPerformanceXxx() API calls with an appropriate timing mechanism):
// compile with: cl.exe /EHsc
#include <random>
#include <iostream>
#include <windows.h>

using namespace std;

void test_lc(const int a, const int b, const int s) {
    /*
    typedef linear_congruential_engine<unsigned int, 48271, 0, 2147483647> minstd_rand;
    */
    minstd_rand gen(1729);
    uniform_int_distribution<> distr(a, b);
    for (int i = 0; i < s; ++i) {
        distr(gen);
    }
}

void test_mt(const int a, const int b, const int s) {
    /*
    typedef mersenne_twister_engine<unsigned int, 32, 624, 397,
        31, 0x9908b0df,
        11, 0xffffffff,
        7, 0x9d2c5680,
        15, 0xefc60000,
        18, 1812433253> mt19937;
    */
    mt19937 gen(1729);
    uniform_int_distribution<> distr(a, b);
    for (int i = 0; i < s; ++i) {
        distr(gen);
    }
}

void test_swc(const int a, const int b, const int s) {
    /*
    typedef subtract_with_carry_engine<unsigned int, 24, 10, 24> ranlux24_base;
    */
    ranlux24_base gen(1729);
    uniform_int_distribution<> distr(a, b);
    for (int i = 0; i < s; ++i) {
        distr(gen);
    }
}

int main()
{
    int a_dist = 0;
    int b_dist = 1000;
    int samples = 100000000;

    cout << "Testing with " << samples << " samples." << endl;

    LARGE_INTEGER ElapsedTime;
    double ElapsedSeconds = 0;
    LARGE_INTEGER Frequency;
    QueryPerformanceFrequency(&Frequency);
    double TickInterval = 1.0 / ((double) Frequency.QuadPart);

    LARGE_INTEGER StartingTime;
    LARGE_INTEGER EndingTime;

    QueryPerformanceCounter(&StartingTime);
    test_lc(a_dist, b_dist, samples);
    QueryPerformanceCounter(&EndingTime);
    ElapsedTime.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedSeconds = ElapsedTime.QuadPart * TickInterval;
    cout << "linear_congruential_engine time: " << ElapsedSeconds << endl;

    QueryPerformanceCounter(&StartingTime);
    test_mt(a_dist, b_dist, samples);
    QueryPerformanceCounter(&EndingTime);
    ElapsedTime.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedSeconds = ElapsedTime.QuadPart * TickInterval;
    cout << " mersenne_twister_engine time: " << ElapsedSeconds << endl;

    QueryPerformanceCounter(&StartingTime);
    test_swc(a_dist, b_dist, samples);
    QueryPerformanceCounter(&EndingTime);
    ElapsedTime.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedSeconds = ElapsedTime.QuadPart * TickInterval;
    cout << "subtract_with_carry_engine time: " << ElapsedSeconds << endl;
}
Output:
Testing with 100000000 samples.
linear_congruential_engine time: 10.0821
mersenne_twister_engine time: 6.11615
subtract_with_carry_engine time: 9.26676
I just saw this answer from Marnos and decided to test it myself. I used std::chrono::high_resolution_clock to time 100000 samples 100 times to produce an average. I measured everything in std::chrono::nanoseconds and ended up with different results:
std::minstd_rand had an average of 28991658 nanoseconds
std::mt19937 had an average of 29871710 nanoseconds
ranlux48_base had an average of 29281677 nanoseconds
This is on a Windows 7 machine. Compiler is Mingw-Builds 4.8.1 64bit. This is obviously using the C++11 flag and no optimisation flags.
When I turn on -O3 optimisations, std::minstd_rand and ranlux48_base actually run faster than what the implementation of high_resolution_clock can measure; however, std::mt19937 still takes 730045 nanoseconds, or roughly 3/4 of a millisecond.
So, as he said, it's implementation-specific, but at least in GCC the average times seem to stick to what the descriptions in the accepted answer say. The Mersenne Twister seems to benefit the least from optimizations, whereas the other two really just throw out the random numbers unbelievably fast once you factor in compiler optimizations.
As an aside, I'd been using the Mersenne Twister engine in my noise generation library (it doesn't precompute gradients), so I think I'll switch to one of the others to really see some speed improvements. In my case, the "true" randomness doesn't matter.
Code:
#include <iostream>
#include <iomanip>
#include <chrono>
#include <random>

using namespace std;
using namespace std::chrono;

int main()
{
    minstd_rand linearCongruentialEngine;
    mt19937 mersenneTwister;
    ranlux48_base subtractWithCarry;
    uniform_real_distribution<float> distro;

    int numSamples = 100000;
    int repeats = 100;

    long long int avgL = 0;
    long long int avgM = 0;
    long long int avgS = 0;

    cout << "results:" << endl;

    for(int j = 0; j < repeats; ++j)
    {
        cout << "start of sequence: " << j << endl;

        auto start = high_resolution_clock::now();
        for(int i = 0; i < numSamples; ++i)
            distro(linearCongruentialEngine);
        auto stop = high_resolution_clock::now();
        auto L = duration_cast<nanoseconds>(stop-start).count();
        avgL += L;
        cout << "Linear Congruential:\t" << L << endl;

        start = high_resolution_clock::now();
        for(int i = 0; i < numSamples; ++i)
            distro(mersenneTwister);
        stop = high_resolution_clock::now();
        auto M = duration_cast<nanoseconds>(stop-start).count();
        avgM += M;
        cout << "Mersenne Twister:\t" << M << endl;

        start = high_resolution_clock::now();
        for(int i = 0; i < numSamples; ++i)
            distro(subtractWithCarry);
        stop = high_resolution_clock::now();
        auto S = duration_cast<nanoseconds>(stop-start).count();
        avgS += S;
        cout << "Subtract With Carry:\t" << S << endl;
    }

    cout << setprecision(10) << "\naverage:\nLinear Congruential: " << (long double)(avgL/repeats)
         << "\nMersenne Twister: " << (long double)(avgM/repeats)
         << "\nSubtract with Carry: " << (long double)(avgS/repeats) << endl;
}
It's a trade-off, really. A PRNG like the Mersenne Twister is better because it has an extremely large period and other good statistical properties.
But a large period PRNG takes up more memory (for maintaining the internal state) and also takes more time for generating a random number (due to complex transitions and post processing).
Choose a PRNG depending on the needs of your application. When in doubt, use the Mersenne Twister; it's the default in many tools.
In general, the Mersenne Twister is the best (and fastest) RNG, but it requires some space (about 2.5 kilobytes). Which one suits your needs depends on how many times you need to instantiate the generator object. (If you need to instantiate it only once, or a few times, then MT is the one to use. If you need to instantiate it millions of times, then perhaps something smaller.)
Some people report that MT is slower than some of the others. According to my experiments, this depends a lot on your compiler optimization settings. Most importantly the -march=native setting may make a huge difference, depending on your host architecture.
I ran a small program to test the speed of different generators, and their sizes, and got this:
std::mt19937 (2504 bytes): 1.4714 s
std::mt19937_64 (2504 bytes): 1.50923 s
std::ranlux24 (120 bytes): 16.4865 s
std::ranlux48 (120 bytes): 57.7741 s
std::minstd_rand (4 bytes): 1.04819 s
std::minstd_rand0 (4 bytes): 1.33398 s
std::knuth_b (1032 bytes): 1.42746 s
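The sizes in that list can be reproduced directly with sizeof, and a rough, non-authoritative timing harness along the same lines could look like the sketch below; the seed, sample count, and distribution range are arbitrary, and the absolute numbers will of course depend on compiler flags and hardware:
#include <chrono>
#include <iostream>
#include <random>

// Time one engine drawing uniform ints and report its state size.
template <typename Engine>
void report(const char* name)
{
    Engine eng(12345);
    std::uniform_int_distribution<int> dist(0, 1000);

    auto start = std::chrono::steady_clock::now();
    long long sink = 0; // keep the loop from being optimized away entirely
    for (int i = 0; i < 10000000; ++i)
        sink += dist(eng);
    auto stop = std::chrono::steady_clock::now();

    std::cout << name << " (" << sizeof(Engine) << " bytes): "
              << std::chrono::duration<double>(stop - start).count()
              << " s (checksum " << sink << ")\n";
}

int main()
{
    report<std::mt19937>("std::mt19937");
    report<std::mt19937_64>("std::mt19937_64");
    report<std::ranlux24>("std::ranlux24");
    report<std::ranlux48>("std::ranlux48");
    report<std::minstd_rand>("std::minstd_rand");
    report<std::minstd_rand0>("std::minstd_rand0");
    report<std::knuth_b>("std::knuth_b");
}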

C/C++ algorithm to produce same pseudo-random number sequences from same seed on different platforms? [duplicate]

This question already has answers here:
Consistent pseudo-random numbers across platforms
(5 answers)
Closed 9 years ago.
The title says it all, I am looking for something preferably stand-alone because I don't want to add more libraries.
Performance should be good since I need it in a tight high-performance loop. I guess that will come at a cost of the degree of randomness.
Any particular pseudo-random number generation algorithm will behave like this. The problem with rand is that it's not specified how it is implemented. Different implementations will behave in different ways and even have varying qualities.
However, C++11 provides the new <random> standard library header that contains lots of great random number generation facilities. The random number engines defined within are well-defined and, given the same seed, will always produce the same set of numbers.
For example, a popular high-quality random number engine is std::mt19937, which is the Mersenne Twister algorithm configured in a specific way. The engine itself is fully specified, so given the same seed it produces the same raw sequence on every conforming implementation. Note, however, that the distribution classes are only specified by their statistical properties, so the exact real numbers between 0 and 1 printed below may differ between standard library implementations:
std::mt19937 engine(0); // Fixed seed of 0
std::uniform_real_distribution<> dist;

for (int i = 0; i < 100; i++) {
    std::cout << dist(engine) << std::endl;
}
Here's a Mersenne Twister
Here is another PRNG implementation in C.
You may find a collection of PRNGs here.
Here's the simple classic PRNG:
#include <iostream>
using namespace std;

unsigned int PRNG()
{
    // our initial starting seed is 5323
    static unsigned int nSeed = 5323;

    // Take the current seed and generate a new value from it
    // Due to our use of large constants and overflow, it would be
    // very hard for someone to predict what the next number is
    // going to be from the previous one.
    nSeed = (8253729 * nSeed + 2396403);

    // Take the seed and return a value between 0 and 32766
    return nSeed % 32767;
}

int main()
{
    // Print 100 random numbers
    for (int nCount=0; nCount < 100; ++nCount)
    {
        cout << PRNG() << "\t";

        // If we've printed 5 numbers, start a new column
        if ((nCount+1) % 5 == 0)
            cout << endl;
    }
}
}