Is there any literature available on the periodicity of the random number generator in gcc's g++ (if we don't re-seed the function)? I suppose I could run tests myself, but it would be better to have access to well-verified research.
Thanks in advance for your help.
// EDIT
I just wanted to add that I have searched quite a bit, with multiple engines, but I have not found anything specific. I have only read general comments about the periodicity being limited by the number of bits needed to represent the seed. (So I guess that given the fact that srand is usually called with time, the periodicity can be no more than 10^12 or so. But something more definite would be very helpful before I start implementing my algorithms.)
When searching in the rand(3) man page, I found this:
The versions of rand() and srand() in
the Linux C Library use the same
random number generator as random()
and srandom()
so I had a look at the random(3) man page, and here is your answer:
The period of this random number
generator is very large, approximately
16*((2**31)-1)
This can be quite useful for pedagogic purposes, since you want to develop your own PRNG. However, I would discourage you from using this PRNG when developing an application. You should prefer one of the Boost.Random implementations, as suggested by Neil Butterworth (MT19937 is a good default PRNG, sufficient for most applications).
Finally, if you intend to learn more about PRNGs, I would suggest reading these two scientific articles, which survey PRNGs well.
Practical distribution of random streams for stochastic High Performance Computing, David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010
Pseudorandom number generators, Pierre L'Ecuyer, in Encyclopedia of Quantitative Finance, 2008
The srand/rand functions are a bit broken. As you are using C++, I strongly recommend you use the boost random number library. It's a header-only library, so you don't need to build anything. There's an example of how to use it here.
I want to check whether my implementation of std::random_device
has non-zero entropy (i.e. is non-deterministic), using std::random_device::entropy() function. However, according
to cppreference.com
"This function is not fully implemented in some standard libraries.
For example, gcc and clang always return zero even though the device
is non-deterministic. In comparison, Visual C++ always returns 32,
and boost.random returns 10."
Is there any way of finding the real entropy? In particular, do modern
computers (MacBook Pro/iMac etc.) have a non-deterministic source of randomness, e.g. using heat dissipation monitors?
I recommend reading this article:
Myths about /dev/urandom
§ 26.5.6
A random_device uniform random number generator produces non-deterministic random numbers.
If implementation limitations prevent generating non-deterministic random numbers, the implementation may employ a random number engine.
So basically it will try to use the system's internal "true" random number generator: /dev/{u}random on Linux, or RtlGenRandom on Windows.
A different matter is whether you trust those sources of randomness, since they depend on internal noise or are closed implementations.
A further question is how you measure the quality of the entropy; as you know, that is one of the biggest problems in trying to find good random number generators.
One estimate could be extremely good, while another could report not-so-good entropy.
Entropy Estimation
In various science/engineering applications, such as independent
component analysis, image analysis, genetic analysis, speech
recognition, manifold learning, and time delay estimation it is useful
to estimate the differential entropy of a system or process, given
some observations.
As it says, you must rely on final observations, and those can be wrong.
If you think the internal RNG is not good enough, you can always buy a hardware device for that purpose. This list on Wikipedia names several vendors; you can look up reviews of them on the internet.
Performance
One point you must consider is the performance of your application when it uses real random number generators. One common technique is to seed a Mersenne Twister with a number obtained from /dev/random.
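As a rough illustration of that seeding technique, here is a minimal sketch. It assumes std::random_device is backed by the operating system's source (commonly /dev/urandom on Linux, though the standard does not guarantee this). The names make_seeded_engine and roll_die are hypothetical helpers, not part of any library.

```cpp
#include <random>

// Build a Mersenne Twister that is seeded once from the platform's
// random_device. Assumption: random_device is non-deterministic on this
// platform; the standard allows it to fall back to a deterministic engine.
inline std::mt19937 make_seeded_engine() {
    std::random_device rd;
    // seed_seq spreads several 32-bit words over the engine's large state,
    // instead of initialising it from a single 32-bit value.
    std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
    return std::mt19937(seq);
}

// Hypothetical helper: draw a value in [1, 6] from the fast seeded engine.
inline int roll_die(std::mt19937& engine) {
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}
```

The slow source is touched only while seeding; every later draw comes from the in-process engine, which is where the performance gain comes from.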
If the user can't access your system physically, you will need to balance reliability with availability: a system with security holes is as bad as one that doesn't work. In the end, you must have your important data encrypted.
Edit 1: As suggested, I have moved the article to the top of my answer; it is a good read. Thanks for the hint :-).
All the standard gives you is what you've already seen. You would need to know something about how a given standard library implements random_device in order to answer this question. For example, in Visual Studio 2013 Update 4, random_device forwards to rand_s which forwards to RtlGenRandom, which may actually be (always?) a cryptographically secure pseudorandom number generator depending on your Windows version and the hardware available.
If you don't trust the platform to provide a good source of entropy, then you should use your own cryptographically secure PRNG, such as one based on AES. That said, platform vendors have strong incentives for their random numbers to actually be random, and embedding the PRNG into your app means that the PRNG can't be updated as easily in the event it is found to be insecure. Only you can decide on that tradeoff for yourself :)
Entropy is just one measure of RNG quality (and true, exact entropy is impossible to measure). For a practical and reasonably-accurate measurement of your std::random_device's random number quality, consider using a standard randomness test suite such as TestU01, diehard, or its successor dieharder. These run a battery of statistical tests designed to stress your RNG, ensuring it produces statistically random data.
Note that statistical randomness by itself does not certify that the RNG is suitable for cryptographic applications.
Many modern computers have easily-accessible sources of hardware randomness, namely the analog-to-digital converters found in the audio input, camera, and various sensors. These exhibit low-level thermal or electrical noise which can be exploited to produce high-quality random data. However, no OS that I know of actually uses these sensors to supply their system random-number sources (such as /dev/[u]random), since the bitrate of such physical random number sources tends to be very low.
Instead, OS-provided random number sources tend to be seeded by hardware counters and events, such as page faults, device driver events, and other sources of unpredictability. In theory, these events might be fully predictable given the precise hardware state (since they aren't based on e.g. quantum or thermal noise), but in practice they are sufficiently unpredictable that they produce good random data.
Entropy as a scientific term is misused when describing random numbers. Complexity might be a better term. Entropy in physics is defined as the logarithm of the number of available quantum states (not useful in RNG), and entropy in information theory is defined by the Shannon entropy, but that is geared towards the other extreme - how to put as much information into a noisy bit stream, not how to minimize the information.
For example, the digits of Pi look random, but the actual entropy of the digits is zero once you know that they derive from Pi. Increasing "Entropy" in RNG is basically a question of making the source of the numbers as obscure as possible.
If I want to start building a PRNG in C++, what are the best building blocks for the job?
Are there any standardized and portable libraries with predictable behaviour (given a seed) that are pseudorandom?
When you say "portable" I assume you want the same sequence of random numbers given the same seed, no matter which platform they're compiled for. Pseudo-random number generators should provide the same sequence as long as they're based on the same algorithm. I think boost::random is your best bet; it's a good random number generator (better than rand in many cases) with predictable behavior across platforms.
C++11 offers a host of portable random-number generators. This was driven by the folks at Fermilab, who do heavy-duty simulations of subatomic particle interactions, often involving distributing work through a network to many computers.
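A minimal sketch of what that portability looks like in practice, assuming a conforming C++11 <random>. The function name raw_stream is hypothetical. Only the raw engine output is pinned down by the standard; the distributions (std::uniform_int_distribution, std::normal_distribution and so on) are not specified algorithmically, so their results may differ between standard libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `count` raw 32-bit outputs from std::mt19937 seeded with `seed`.
// The engine's algorithm and parameters are fixed by the standard, so the
// same seed should give the same raw values on any conforming platform.
inline std::vector<std::uint32_t> raw_stream(std::uint32_t seed, std::size_t count) {
    std::mt19937 engine(seed);
    std::vector<std::uint32_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(static_cast<std::uint32_t>(engine()));
    }
    return out;
}
```

The standard even gives a check value: the 10000th output of a default-constructed std::mt19937 (default seed 5489) is 4123659995.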
I have legacy C++ code that I wrote to generate uniform random numbers and a Gaussian distribution. It implements algorithms by Dr. George Marsaglia that are extremely fast. (I was using them to generate skazillions of samples for Monte Carlo high-dimensional integration.)
I think it would be a good idea to re-factor the generator and distribution to work with the new C++11 std::random scheme.
Can anyone point me to a tutorial or a good reference for std::random that includes the necessary info for how to extend it? Example code would be ideal.
UPDATE. Thanks for everyone's help. I have now written a drop-in replacement for the std::normal_distribution that ships with Visual C++ 2010. On my machine, the replacement is 26% faster when fed by the default engine. I am a little disappointed that the difference is not bigger, but hey, that's my problem. :-)
N3376 is the latest draft C++ standard (this is post C++11, but is an excellent snapshot of C++11).
Everything C++11-random is in: 26.5 Random number generation [rand]
26.5.1.4 Random number engine requirements [rand.req.eng] has all of the requirements your uniform random number generator would need to fulfill.
26.5.1.6 Random number distribution requirements [rand.req.dist] has all of the requirements your Gaussian distribution would need to fulfill.
26.5.8.5.1 Class template normal_distribution [rand.dist.norm.normal] is the section describing the std-defined Gaussian distribution.
The C++11 <random> is very STL-like in that it sets up requirements for random number generators (containers), and random distributions (algorithms), and then the client can mix and match the two. It's a really very cool design.
Sorry, I don't know of a good tutorial. The C++ standard is an excellent reference and a lousy tutorial. However you are obviously well educated in the domain of random numbers. So assuming you know a thing or two about C++, the C++ standard may not be too bad.
Open source implementations of <random> are available if you want to peruse their source (for an example). One example is libc++. All they ask is that you retain their copyright notices if you reuse any of their code.
Edit
You are uniquely qualified to write this tutorial. :-)
You can learn a lot by reading boost library sources, since many proposals in C++11 were adopted from boost.
Take a look at the interface of an example rng engine here:
http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine
I would start by implementing the min, max, seed, and operator() functions to see whether it passes as a valid engine for C++11.
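Here is a minimal sketch of such an engine. The algorithm is my own choice for illustration: Marsaglia's 32-bit xorshift (shifts 13, 17, 5), not necessarily one of the generators the question refers to. It provides only the subset that the standard distributions actually call (result_type, min(), max(), operator()) plus seed(). A fully conforming engine would also need discard(), comparison and stream operators, and seeding from a seed sequence.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Illustrative engine: Marsaglia's xorshift32.
class xorshift32_engine {
public:
    using result_type = std::uint32_t;

    explicit xorshift32_engine(result_type s = 2463534242u) { seed(s); }

    // The state must never be zero, or every later output would be zero.
    void seed(result_type s) { state_ = (s != 0u) ? s : 2463534242u; }

    // With a non-zero state, xorshift32 never produces zero.
    static constexpr result_type min() { return 1u; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

    result_type operator()() {
        state_ ^= state_ << 13;
        state_ ^= state_ >> 17;
        state_ ^= state_ << 5;
        return state_;
    }

private:
    result_type state_;
};
```

Because the interface matches what the library expects, the engine can be passed directly to a standard distribution, e.g. std::uniform_int_distribution<int> d(1, 6); followed by d(engine).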
So I'm new to C++ and am trying to learn some things. As such I am trying to make a Random Number Generator (RNG or PRNG if you will). I have basic knowledge of RNGs, like you have to start with a seed and then send the seed through the algorithm. What I'm stuck at is how people come up with said algorithms.
Here is the code I have to get the seed.
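#include <ctime>

// Returns the current time as a seed value.
unsigned int getSeed()
{
    time_t randSeed = time(NULL);
    // time_t may be wider than unsigned int, so make the truncation explicit.
    return static_cast<unsigned int>(randSeed);
}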
Now I know that there are prebuilt RNGs in C++, but I'm looking to learn, not just copy other people's work and try to figure it out.
So if anyone could lead me to where I could read or show me examples of how to come up with algorithms for this I would be greatly appreciative.
First, just to clarify, any algorithm you come up with will be a pseudo random number generator and not a true random number generator. Since you would be making an algorithm (i.e. writing a function, i.e. making a set of rules), the random number generator would have to eventually repeat itself or do something similar which would be non-random.
Examples of truly random number generators are ones that capture random noise from nature and digitize it. These include:
http://www.fourmilab.ch/hotbits/
http://www.random.org/
You can also buy physical equipment that generates white noise (or some other source of randomness) and digitally captures it:
http://www.lavarnd.org/
http://www.idquantique.com/true-random-number-generator/products-overview.html
http://www.araneus.fi/products-alea-eng.html
In terms of pseudo random number generators, the easiest ones to learn (and ones that an average lay person could probably make on their own) are the linear congruential generators. Unfortunately, these are also some of the worst PRNGs there are.
Some guidelines for determining what is a good PRNG include:
Periodicity (how many numbers are generated before the sequence repeats?)
Consecutive numbers (what is the probability that the same number will be repeated twice in a row?)
Uniformity (is it just as likely to pick numbers from a certain sub-range as from another sub-range?)
Difficulty in reverse engineering it (if it is close to truly random, then someone should not be able to figure out the next number it generates based on the last few numbers it generated)
Speed (how fast can I generate a new number? Does it take 5 or 500 arithmetic operations?)
I'm sure there are others I am missing
One of the more popular ones right now that is considered good in most applications (i.e. not cryptography) is the Mersenne Twister. As you can see from the link, it is a simple algorithm, perhaps only 30 lines of code. However, trying to come up with those 20 or 30 lines of code from scratch takes a lot of brainpower and study of PRNGs. Usually the most famous algorithms are designed by a professor or industry professional who has studied PRNGs for decades.
I hope you do study PRNGs and try to roll your own (try Knuth's Art of Computer Programming or Numerical Recipes as a starting place), but I just wanted to lay this all out so that at the end of the day (unless PRNGs will be your life's work) you know it's much better to just use something someone else has come up with. Also, along those lines, I'd like to point out that historically compilers, spreadsheets, etc. don't use what most mathematicians consider good PRNGs, so if you need a high-quality PRNG, don't use the standard library one in C++, Excel, .NET, Java, etc. until you have researched what they implement it with.
A linear congruential generator is commonly used and the Wiki article explains it pretty well.
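For readers who want to see one concretely, here is a minimal sketch of a linear congruential generator, x_{n+1} = (a * x_n + c) mod m. The constants are my choice for illustration: the "minimal standard" parameters a = 48271, c = 0, m = 2^31 - 1, which are also the ones C++11 uses for std::minstd_rand. The class name lcg is hypothetical.

```cpp
#include <cstdint>

// Illustrative linear congruential generator with the "minimal standard"
// parameters. The whole state is one integer, which is why LCGs are easy
// to write and also easy to predict.
class lcg {
public:
    explicit lcg(std::uint32_t seed = 1u)
        : state_(static_cast<std::uint32_t>(seed % modulus)) {
        // With c = 0, a zero state would stay zero forever.
        if (state_ == 0u) state_ = 1u;
    }

    std::uint32_t next() {
        // Use a 64-bit intermediate so the product cannot overflow.
        std::uint64_t product = static_cast<std::uint64_t>(state_) * multiplier + increment;
        state_ = static_cast<std::uint32_t>(product % modulus);
        return state_;
    }

private:
    static constexpr std::uint64_t multiplier = 48271u;
    static constexpr std::uint64_t increment = 0u;
    static constexpr std::uint64_t modulus = 2147483647u;  // 2^31 - 1
    std::uint32_t state_;
};
```

Seeded with 1, the first two outputs are 48271 and 182605794, which is simply 48271 and 48271 squared, reduced modulo 2^31 - 1.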
To quote John von Neumann:
Anyone who considers arithmetical
methods of producing random digits is
of course in a state of sin.
This is taken from Chapter 3 Random Numbers of Knuth's book "The Art of Computer Programming", which must be the most exhaustive overview of the subject available. And once you have read it, you will be exhausted. You will also know why you don't want to write your own random number generator.
The correct solution best fulfills the requirements and the requirements of every situation will be unique. This is probably the simplest way to go about it:
Create a large one dimensional array
populated with "real" random values.
"seed" your pseudo-random generator by
calculating the starting index with
system time.
Iterate through the array and return
the value for each call to your
function.
Wrap around when it reaches the end.
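The steps above can be sketched roughly as follows. The class name table_generator is hypothetical, and the caller is assumed to supply the table of "real" random values (for example, numbers downloaded from a hardware source). Note that the period of this generator is exactly the table length.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Table-driven generator: replays a fixed table of values, starting at an
// index derived from the seed and wrapping around at the end.
class table_generator {
public:
    table_generator(std::vector<std::uint32_t> table, std::uint64_t seed)
        : table_(std::move(table)), index_(0) {
        if (table_.empty()) {
            throw std::invalid_argument("table must not be empty");
        }
        // "Seed" the generator by choosing the starting index.
        index_ = static_cast<std::size_t>(seed % table_.size());
    }

    std::uint32_t next() {
        std::uint32_t value = table_[index_];
        index_ = (index_ + 1) % table_.size();  // wrap around at the end
        return value;
    }

private:
    std::vector<std::uint32_t> table_;
    std::size_t index_;
};
```

To seed from the system time as described, one could pass static_cast<std::uint64_t>(std::time(nullptr)) from <ctime> as the second constructor argument.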
I want to supply a number, and then receive a set of random numbers. However, I want those numbers to be the same regardless of which computer I run it on (assuming I supply the same seed).
Basically my question is: in C++, if I make use of rand(), but supply srand() with a user-defined seed rather than the current time, will I be able to generate the same random number stream on any computer?
There are dozens of PRNGs available as libraries. Pick one. I tend to use Mersenne Twister.
By using an externally supplied library, you bypass the risk of a weird or buggy implementation of your language's library rand(). As long as your platforms all conform to the same mathematical semantics, you'll get consistent results.
MT is a favorite of mine because I'm a physicist, and I use these things for Monte Carlo, where the guarantee of equal-distribution to high dimensions is important. But don't use MT as a cryptographic PRNG!
srand() & rand() are not part of the STL. They're actually part of the C runtime.
Yes, they will produce the same results as long as it's the same implementation of srand()/rand().
Depending on your needs, you might want to consider using Boost.Random. It provides several high-quality random number generators.
Assuming the implementations of rand() are the same, yes.
The easiest way to ensure this is to include a known rand() implementation with your program - either included in your project's source code or in the form of a library you can manage.
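As one possible way to do that, here is a sketch based on the sample rand()/srand() implementation printed in the C standard (multiplier 1103515245, increment 12345, 15 output bits). The namespace portable and the class name rand_state are hypothetical, chosen so that nothing clashes with <cstdlib>. This generator is weak; it is shown only because it is small and well known.

```cpp
#include <cstdint>

namespace portable {

// Largest value next() can return, mirroring the sample's RAND_MAX.
constexpr int rand_max = 32767;

// Generator modelled on the C standard's sample rand() implementation.
// Keeping the state in an object, rather than a global, also lets several
// independent streams coexist.
class rand_state {
public:
    explicit rand_state(std::uint32_t seed = 1u) : next_(seed) {}

    void seed(std::uint32_t s) { next_ = s; }

    int next() {
        // Unsigned arithmetic wraps modulo 2^32 on every platform.
        next_ = next_ * 1103515245u + 12345u;
        return static_cast<int>((next_ / 65536u) % 32768u);
    }

private:
    std::uint32_t next_;
};

}  // namespace portable
```

Because the code relies only on fixed-width unsigned arithmetic, the same seed should yield the same stream wherever the program is compiled.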
No, the ANSI C standard only specifies that rand() must produce a stream of random integers between 0 and RAND_MAX, which must be at least 32767 (source). This stream must be deterministic only in that, for a given implementation on a given machine, it must produce the same integer stream given the same seed.
You want a portable PRNG. Mersenne Twister (many implementations linked at the bottom) is pretty portable, as is Ben Pfaff's homegrown C99-compliant PRNG. Boost.Random should be fine too; as you're writing your code in C++, using Boost doesn't limit your choice of platforms much (although some "lesser" (i.e. non-compliant) compilers may have trouble with its heavy use of template metaprogramming). This is only really a problem for low-volume embedded platforms and perhaps novel research architectures, so if by "any computer" you mean "any x86/PPC/ARM/SPARC/Alpha/etc. platform that GCC targets", any of the above should do just fine.
Write your own pseudorandom number routine. There are a lot of algorithms documented on the internet, and they have a number of applications where rand isn't good enough (e.g. Perlin Noise).
Try these links for starters:
http://en.wikipedia.org/wiki/Linear_congruential_generator
http://en.wikipedia.org/wiki/Pseudorandom_number_generator
I believe if you supply srand with the same seed, you will get the same results. That's pretty much the definition of a seed in terms of pseudo random number generators.
Yes. For a given seed (starting value), the sequence of numbers that rand() returns will always be the same.
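A small sketch of that property, using a hypothetical helper named rand_sequence. It shows only that re-seeding replays the same stream within one implementation. As noted in other answers, the actual values may differ between C libraries.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Re-seed the C library generator and collect the next `count` values.
// Caveat: srand()/rand() share hidden global state, so other code that
// calls rand() in between would disturb the sequence.
inline std::vector<int> rand_sequence(unsigned int seed, std::size_t count) {
    std::srand(seed);
    std::vector<int> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(std::rand());
    }
    return out;
}
```

Calling rand_sequence(42, 10) twice in the same program should return two identical vectors.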