As we know, the Mersenne Twister is not crytographically secure:
Mersenne Twister is not cryptographically secure. (MT is based on a
linear recursion. Any pseudorandom number sequence generated by a
linear recursion is insecure, since from sufficiently long subsequencje
of the outputs, one can predict the rest of the outputs.)
But many sources, like Stephan T. Lavavej and even this website. The advice is almost always (verbatim) to use the Mersenne Twister like this:
auto engine = mt19937{random_device{}()};
They come in different flavors, like using std::seed_seq or complicated ways of manipulating std::tm, but this is the simplest approach.
Even though std::random_device is not always reliable:
std::random_device may be implemented in terms of an
implementation-defined pseudo-random number engine if a
non-deterministic source (e.g. a hardware device) is not available to
the implementation. In this case each std::random_device object may
generate the same number sequence.
The /dev/urandom vs /dev/random debate rages on.
But while the standard library provides a good collection of PRNGs, it doesn't seem to provide any CSPRNGs. I prefer to stick to the standard library rather than using POSIX, Linux-only headers, etc. Can the Mersenne Twister be manipulated to make it cryptographically secure?
Visual Studio guarantees that random_device is cryptographically secure and non-deterministic:
https://msdn.microsoft.com/en-us/library/bb982250.aspx
If you want something faster or cross platform, you could for example use GnuTLS: http://gnutls.org/manual/html_node/Random-number-generation.html
It provides random numbers of adjustable quality. GNUTLS_RND_RANDOM is what you want I think.
As several people already said, please forget about MT in cryptographic contexts.
Related
Usage of rand() is usually frowned upon despite using a seed via srand(). Why would that be the case? What better alternatives are available?
There are two parts to this story.
First, rand is a pseudorandom number generator. This means it depends on a seed. For a given seed it will always give the same sequence (assuming the same implementation). This makes it not suitable for certain applications where security is of a great concern. But this is not specific to rand. It's an issue with any pseudo-random generator. And there are most certainly a lot of classes of problems where a pseudo-random generator is acceptable. A true random generator has its own issues (efficiency, implementation, entropy) so for problems that are not security related most often a pseudo-random generator is used.
So you analyzed your problem and you conclude a pseudo-random generator is the solution. And here we arrive to the real troubles with the C random library (which includes rand and srand) that are specific to it and make it obsolete (a.k.a.: the reasons you should never use rand and the C random library).
One issue is that it has a global state (set by srand). This makes it impossible to use multiple random engines at the same time. It also greatly complicates multithreaded tasks.
The most visible problem of it is that it lacks a distribution engine: rand gives you a number in interval [0 RAND_MAX]. It is uniform in this interval, which means that each number in this interval has the same probability to appear. But most often you need a random number in a specific interval. Let's say [0, 1017]. A commonly (and naive) used formula is rand() % 1018. But the issue with this is that unless RAND_MAX is an exact multiple of 1018 you won't get an uniform distribution.
Another issue is the Quality of Implementation of rand. There are other answers here detailing this better than I could, so please read them.
In modern C++ you should definitely use the C++ library from <random> which comes with multiple random well-defined engines and various distributions for integer and floating point types.
None of the answers here explains the real reason of being rand() bad.
rand() is a pseudo-random number generator (PRNG), but this doesn't mean it must be bad. Actually, there are very good PRNGs, which are statistically hard or impossible to distinguish from true random numbers.
rand() is completely implementation defined, but historically it is implemented as a Linear Congruential Generator (LCG), which is usually a fast, but notoriously bad class of PRNGs. The lower bits of these generators have much lower statistical randomness than the higher bits and the generated numbers can produce visible lattice and/or planar structures (the best example of that is the famous RANDU PRNG). Some implementations try to reduce the lower bits problem by shifting the bits right by a pre-defined amount, however this kind of solution also reduces the range of the output.
Still, there are notable examples of excellent LCGs, like L'Ecuyer's 64 and 128 bits multiplicative linear congruential generators presented in Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure, Pierre L'Ecuyer, 1999.
The general rule of thumb is that don't trust rand(), use your own pseudo-random number generator which fits your needs and usage requirements.
What is bad about rand/srand is that rand—
Uses an unspecified algorithm for the sequence of numbers it generates, yet
allows that algorithm to be initialized with srand for repeatable "randomness".
These two points, taken together, hamper the ability of implementations to improve on rand's implementation (e.g., to use a cryptographic random number generator [RNG] or an otherwise "better" algorithm for producing pseudorandom numbers). For example, JavaScript's Math.random and FreeBSD's arc4random don't have this problem, since they don't allow applications to seed them for repeatable "randomness" — it's for exactly this reason that the V8 JavaScript engine was able to change its Math.random implementation to a variant of xorshift128+ while preserving backward compatibility. (On the other hand, letting applications supply additional data to supplement "randomness", as in BCryptGenRandom, is less problematic; even so, however, this is generally seen only in cryptographic RNGs.)
Also:
The fact that the algorithm and the seeding procedure for rand and srand are unspecified means that even reproducible "randomness" is not guaranteed between rand/srand implementations, between versions of the same standard library, between operating systems, etc.
If srand is not called before rand is, rand behaves similarly as though srand(1) were first called. In practice, this means that rand can only be implemented as a pseudorandom number generator (PRNG) rather than as a nondeterministic RNG, and that rand's PRNG algorithm can't differ in a given implementation whether the application calls srand or not.
EDIT (Jul. 8, 2020):
There is one more important thing that's bad about rand and srand. Nothing in the C standard for these functions specifies a particular distribution that the "pseudo-random numbers" delivered by rand have to follow, including the uniform distribution or even a distribution that approximates the uniform distribution. Contrast this with C++'s uniform_int_distribution and uniform_real_distribution classes, as well as the specific pseudorandom generator algorithms specified by C++, such as linear_congruential_engine and mt19937.
EDIT (begun Dec. 12, 2020):
Yet another bad thing about rand and srand: srand takes a seed that can only be as big as an unsigned. unsigned must be at least 16 bits and in most mainstream C implementations, unsigned is either 16 or 32 bits depending on the implementation's data model (notably not 64 bits even if the C implementation adopts a 64-bit data model). Thus, no more than 2^N different sequences of numbers can be selected this way (where N is the number of bits in an unsigned), even if the underlying algorithm implemented by rand can produce many more different sequences than that (say, 2^128 or even 2^19937 as in C++'s mt19937).
Firstly, srand() doesn't get a seed, it sets a seed. Seeding is part of the use of any pseudo random number generator (PRNG). When seeded the sequence of numbers that the PRNG produces from that seed is strictly deterministic because (most?) computers have no means to generate true random numbers. Changing your PRNG won't stop the sequence from being repeatable from the seed and, indeed, this is a good thing because the ability to produce the same sequence of pseudo-random numbers is often useful.
So if all PRNGs share this feature with rand() why is rand() considered bad? Well, it comes down to the "psuedo" part of pseudo-random. We know that a PRNG can't be truly random but we want it to behave as close to a true random number generator as possible, and there are various tests that can be applied to check how similar a PRNG sequence is to a true random sequence. Although its implementation is unspecified by the standard, rand() in every commonly used compiler uses a very old method of generation suited for very weak hardware, and the results it produces fair poorly on these tests. Since this time many better random number generators have been created and it is best to choose one suited to your needs rather than relying on the poor quality one likely to provided by rand().
Which is suitable for your purposes depends on what you are doing, for example you may need cryptographic quality, or multi-dimensional generation, but for many uses where you simply want things to be fairly uniformly random, fast generation, and money is not on the line based on the quality of the results you likely want the xoroshiro128+ generator. Alternatively you could use one of the methods in C++'s <random> header but the generators offered are not state of the art, and much better is now available, however, they're still good enough for most purposes and quite convenient.
If money is on the line (e.g. for card shuffling in an online casino, etc.), or you need cryptogaphic quality, you need to carefully investigation appropriate generators and ensure they exactly much your specific needs.
rand is usually -but not always-, for historical reasons, a very bad pseudo-random number generator (PRNG). How bad is it is implementation specific.
C++11 has nice, much better, PRNGs. Use its <random> standard header. See notably std::uniform_int_distribution here which has a nice example above std::mersenne_twister_engine.
PRNGs are a very tricky subject. I know nothing about them, but I trust the experts.
Let me add another reason that makes rand() totally not usable: The standard does not define any characteristic of random numbers it generates, neither distribution nor range.
Without definition of distribution we can't even wrap it to have what distribution we want.
Even further, theorically I can implement rand() by simply return 0, and anounce that RAND_MAX of my rand() is 0.
Or even worse, I can let least significant bit always be 0, which doesn't violate the standard. Image someone write code like if (rand()%2) ....
Pratically, rand() is implementation defined and the standards says:
There are no guarantees as to the quality of the random sequence produced and some implementations
are known to produce sequences with distressingly non-random low-order bits. Applications with
particular requirements should use a generator that is known to be sufficient for their needs
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf p36
If you use rand(), you will basically have the same result after generating your random number.
So even after using srand(), it will be easy to predict the number generated if someone can guess the seed you use. This is because the function rand() uses a specific algorithm to produce such numbers
With some time to waste, you can figure out how to predict numbers generated by the function, given the seed. All you need now is to guess the seed. Some people refer to the seed as the current time. So if can guess the time at which you run the application, I ll be able to predict the number
IT IS BAD TO USE RAND()!!!!
I see many people talk about security and std::random_device together.
For example, here slide 22.
According to cppreference, std::random_device :
std::random_device is a uniformly-distributed integer random number generator that produces non-deterministic random numbers.
It does not talk about security explicitly.
Is there any valid reference that explicitly mentions std::random_device is secure for cryptography?
No, because that's not what std::random_device is designed for; it's designed to generate random numbers, not to be secure.
In the context of security, randomness is something that is useful for key generation, but randomness is not something that is absolutely needed. For example, AES does not use any randomness, yet AES-256 is what is used to encrypt top secret information in the US.
One area where randomness and security cross, is when a random key is generated and used; if I can guess the seed and know the random protocol used, there's a good chance I can then use that same seed value to generate the same "random" value and thus the same key.
std::random_device will use a hardware module (like a hardware TPM) if one is available, otherwise it will use whatever the OS has as a RNG (like CryptGenRandom in Windows, or /dev/random in *nix systems), which might even be a PRNG (pseudo-random number generator), which might generate the same number depending on the random number algorithm used. As a side note: much like how the AES instruction set was incorporated into chipsets to speed up encryption and decryption, hardware RNG's help to give a larger entropy pool and faster random number generation as the algorithms are moved into hardware.
So if you are using std::random_device in any sort of cryptographic key generation, you'll need to be aware what random number generator is being used on the system being deployed to, otherwise you can have collisions and thus your encrypted system can be susceptible to duplicate key types of attack.
Hope that can help.
TL;DR: only use std::random_device to generate seeds for the defined PRNG's within this library. Otherwise use a cryptographic library such as Crypto++, Bothan, OpenSSL etc. to generate secure random numbers.
To have an idea why std::random_device is required it is important to see it in the light of the context for which it was defined.
std::random_device is part of a set of classes and methods that are used to generate deterministic/pseudo random number sequences fast. One example - also shown in the slides - is the Mersenne twister algorithm, which is certainly not cryptographically secure.
Now this is all very nice, but as the defined algorithms are all deterministic, this is arguably not what the users may be after: they want a fast random number generator that doesn't produce the same stream all of the time. Some kind of entropy source is required to seed the insecure PRNG. This is where std::random_device comes into action, it is used to seed the Mersenne twister (as shown in the slides referred to in the answer).
The slides show a speed difference of about 250 times for the Mersenne twister and the slow system provided non-deterministic random number generator. This clearly demonstrates why local, deterministic PRNG's can help to speed up random number generation.
Also note that local PRNG's won't slow down much when used from multiple threads. The system generator could be speedy when accessed by multiple threads, but this is certainly not a given. Sometimes system RNG's may even block or have latency or related issues.
As mentioned in the comments below the question, the contract for std::random_device is rather weak. It talks about using a "deterministic" generator if a system generator is not avialable. Of course, on most desktop / server configurations such a device (e.g. /dev/random or the non-blocking /dev/urandom device) is available. In that case std:random_device is rather likely to return a secure random number generator. However, you cannot rely on this to happen on all system configurations.
If you require a relatively fast secure random number generator I would recommend you use a cryptographic library such as OpenSSL or Crypto++ instead of using an insecure fast one the relatively slow system random generator. OpenSSL - for instance - will use the system random generator (as well as other entropy sources) to seed a more secure algorithm.
when I want to generate random numbers using std::random, which engine should I prefer? the std::default_random_engine or the std::mt19937? what are the differences?
For lightweight randomnes (e.g. games), you could certainly consider default_random_engine. But if your code depends heavily on quality of randomness (e.g. simulation software), you shouldn't use it, as it gives only minimalistic garantees:
It is the library implemention's selection of a generator that
provides at least acceptable engine behavior for relatively casual,
inexpert, and/or lightweight use.
The mt19937 32 bits mersene twister (or its 64 bit counterpart mt19937_64) is on the other side a well known algorithm that passes very well statistical randomness tests. So it's ideal for scientific applications.
However, you shall consider neither of them, if your randomn numbers are meant for security (e.g. cryptographic) purpose.
The question is currently having one close vote as primary opinion based. I would argue against that and say that std::default_random_engine is objectively a bad choice, since you don't know what you get and switching standard libraries can give you different results in the quality of the randomness you receive.
You should pick whatever random number generator gives you the kind of qualities you are looking for. If you have to pick between the two, go with std::mt19937 as it gives you predictable and defined behaviour.
They address different needs. The first is an implementation-defined alias for a certain generator whilst the latter specifically uses the Mersenne-Twister algorithm with a 32 bit seed.
If you don't have particular requirements, std::default_random_engine should be ok.
If i want to start building a PRNG in C++ what are the best bricks for the job?
are there any standardize and portable libraries with a predictable behaviour ( with a seed ) and pseudo randomic?
When you say "portable" I assume you want the same sequence of random numbers given the same seed, no matter which platform they're compiled for. Pseudo-random number generators should provide the same sequence as long as they're based on the same algorithm. I think boost::random is your best bet, it's a good random number generator (better than rand in many cases) with predictable behavior across platforms.
C++11 offers a host of portable random-number generators. This was driven by the folks at Fermilab, who do heavy-duty simulations of subatomic particle interactions, often involving distributing work through a network to many computers.
I want to supply a number, and then receive a set of random numbers. However, I want those numbers to be the same regardless of which computer I run it on (assuming I supply the same seed).
Basically my question is: in C++, if I make use of rand(), but supply srand() with a user-defined seed rather than the current time, will I be able to generate the same random number stream on any computer?
There are dozens of PRNGs available as libraries. Pick one. I tend to use Mersenne Twister.
By using an externally supplied library, you bypass the risk of a weird or buggy implementation of your language's library rand(). As long as your platforms all conform to the same mathematical semantics, you'll get consistent results.
MT is a favorite of mine because I'm a physicist, and I use these things for Monte Carlo, where the guarantee of equal-distribution to high dimensions is important. But don't use MT as a cryptographic PRNG!
srand() & rand() are not part of the STL. They're actually part of the C runtime.
Yes, they will produce the same results as long as it's the same implementation of srand()/rand().
Depending on your needs, you might want to consider using Boost.Random. It provides several high-quality random number generators.
Assuming the implementations of rand() are the same, yes.
The easiest way to ensure this is to include a known rand() implementation with your program - either included in your project's source code or in the form of a library you can manage.
No, the ANSI C standard only specifies that rand() must produce a stream of random integers between 0 and RAND_MAX, which must be at least 32767 (source). This stream must be deterministic only in that, for a given implementation on a given machine, it must produce the same integer stream given the same seed.
You want a portable PRNG. Mersenne Twister (many implementations linked at the bottom) is pretty portable, as is Ben Pfaff's homegrown C99-compliant PRNG. Boost.Random should be fine too; as you're writing your code in C++, using Boost doesn't limit your choice of platforms much (although some "lesser" (i.e. non-compliant) compilers may have trouble with its heavy use of template metaprogramming). This is only really a problem for low-volume embedded platforms and perhaps novel research architectures, so if by "any computer" you mean "any x86/PPC/ARM/SPARC/Alpha/etc. platform that GCC targets", any of the above should do just fine.
Write your own pseudorandom number routine. There are a lot of algorithms documented on the internet, and they have a number of applications where rand isn't good enough (e.g. Perlin Noise).
Try these links for starters:
http://en.wikipedia.org/wiki/Linear_congruential_generator
http://en.wikipedia.org/wiki/Pseudorandom_number_generator
I believe if you supply with srand with the same seed, you will get the same results. That's pretty much the definition of a seed in terms of pseudo random number generators.
Yes. For a given seed (starting value), the sequence of numbers that rand() returns will always be the same.