how to find the "true" entropy of std::random_device?

I want to check whether my implementation of std::random_device
has non-zero entropy (i.e. is non-deterministic), using std::random_device::entropy() function. However, according
to cppreference.com
"This function is not fully implemented in some standard libraries.
For example, gcc and clang always return zero even though the device
is non-deterministic. In comparison, Visual C++ always returns 32,
and boost.random returns 10."
Is there any way of finding the real entropy? In particular, do modern
computers (MacBook Pro, iMac, etc.) have a non-deterministic source of randomness, e.g. one based on heat dissipation monitors?

I recommend reading this article:
Myths about /dev/urandom
§ 26.5.6
A random_device uniform random number generator produces non-deterministic random numbers.
If implementation limitations prevent generating non-deterministic random numbers, the implementation may employ a random number engine.
So basically it will try to use the system's internal "true" random number generator: /dev/{u}random on Linux, or RtlGenRandom on Windows.
A different matter is whether you trust those sources of randomness, since they depend on internal noise or are closed implementations.
A further question is how you measure the quality of the entropy; as you know, that is one of the biggest problems in finding good random number generators.
One estimate could rate the entropy as extremely good while another rates it as not so good.
Entropy Estimation
In various science/engineering applications, such as independent
component analysis, image analysis, genetic analysis, speech
recognition, manifold learning, and time delay estimation it is useful
to estimate the differential entropy of a system or process, given
some observations.
As it says, you must rely on final observations, and those can be wrong.
If you think the internal RNG is not good enough, you can always buy a hardware device for that purpose. Wikipedia has a list of vendors, and you can find reviews of them on the internet.
Performance
One point you must consider is the performance of real random number generators within your application. One common technique is to seed a Mersenne twister with a number obtained from /dev/random.
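The seeding technique just described can be sketched with the standard library alone. This is only a minimal sketch: it assumes std::random_device is backed by the operating system's generator on your platform (the standard does not guarantee that), and the name make_seeded_engine and the choice of eight seed words are illustrative, not something the answer prescribes.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Builds a Mersenne twister whose seed material comes from std::random_device.
// Hypothetical helper: the name and the number of seed words are assumptions.
std::mt19937 make_seeded_engine() {
    std::random_device rd;                      // slow, possibly hardware/OS backed
    std::array<std::uint32_t, 8> seed_words{};
    for (auto& word : seed_words) {
        word = static_cast<std::uint32_t>(rd()); // draw a few words once, up front
    }
    std::seed_seq seq(seed_words.begin(), seed_words.end());
    return std::mt19937(seq);                   // fast PRNG used from here on
}
```

The device is consulted only once, so the cost of the "real" source is paid at start-up, and every later draw comes from the fast engine.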
If the user can't access your system physically, you will need to balance reliability with availability: a system with security holes is as bad as one that doesn't work. In the end, you must have your important data encrypted.
Edit 1: As suggested, I have moved the article to the top of my answer; it is a good read. Thanks for the hint :-).

All the standard gives you is what you've already seen. You would need to know something about how a given standard library implements random_device in order to answer this question. For example, in Visual Studio 2013 Update 4, random_device forwards to rand_s which forwards to RtlGenRandom, which may actually be (always?) a cryptographically secure pseudorandom number generator depending on your Windows version and the hardware available.
If you don't trust the platform to provide a good source of entropy, then you should use your own cryptographically secure PRNG, such as one based on AES. That said, platform vendors have strong incentives for their random numbers to actually be random, and embedding the PRNG into your app means that the PRNG can't be updated as easily in the event it is found to be insecure. Only you can decide on that tradeoff for yourself :)

Entropy is just one measure of RNG quality (and true, exact entropy is impossible to measure). For a practical and reasonably-accurate measurement of your std::random_device's random number quality, consider using a standard randomness test suite such as TestU01, diehard, or its successor dieharder. These run a battery of statistical tests designed to stress your RNG, ensuring it produces statistically random data.
Note that statistical randomness by itself does not certify that the RNG is suitable for cryptographic applications.
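Test suites of this kind are usually fed a large stream of raw bytes. As a rough sketch of how such a sample could be produced from std::random_device, the two helpers below (draw_words and write_raw, both hypothetical names) collect 32-bit words and write them unformatted to a stream. It assumes each call to the device yields 32 usable bits; how a particular suite reads the resulting file is left to that suite's own documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <random>
#include <vector>

// Draws `count` words from std::random_device.
// Assumption: each call to the device yields 32 usable bits.
std::vector<std::uint32_t> draw_words(std::size_t count) {
    std::random_device rd;
    std::vector<std::uint32_t> words;
    words.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        words.push_back(static_cast<std::uint32_t>(rd()));
    }
    return words;
}

// Writes the words as raw bytes (host byte order), e.g. to a std::ofstream
// opened with std::ios::binary.
void write_raw(std::ostream& out, const std::vector<std::uint32_t>& words) {
    for (std::uint32_t word : words) {
        out.write(reinterpret_cast<const char*>(&word), sizeof word);
    }
}
```

Statistical batteries typically want many megabytes of input, so in practice the drawing and writing would be done in chunks rather than in one vector.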
Many modern computers have easily-accessible sources of hardware randomness, namely the analog-to-digital converters found in the audio input, camera, and various sensors. These exhibit low-level thermal or electrical noise which can be exploited to produce high-quality random data. However, no OS that I know of actually uses these sensors to supply their system random-number sources (such as /dev/[u]random), since the bitrate of such physical random number sources tends to be very low.
Instead, OS-provided random number sources tend to be seeded by hardware counters and events, such as page faults, device driver events, and other sources of unpredictability. In theory, these events might be fully predictable given the precise hardware state (since they aren't based on e.g. quantum or thermal noise), but in practice they are sufficiently unpredictable that they produce good random data.

Entropy as a scientific term is misused when describing random numbers. Complexity might be a better term. Entropy in physics is defined as the logarithm of the number of available quantum states (not useful in RNG), and entropy in information theory is defined by the Shannon entropy, but that is geared towards the other extreme - how to put as much information into a noisy bit stream, not how to minimize the information.
For example, the digits of Pi look random, but the actual entropy of the digits is zero once you know that they derive from Pi. Increasing "Entropy" in RNG is basically a question of making the source of the numbers as obscure as possible.
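To make the distinction concrete, here is a minimal sketch of the usual frequency-based Shannon estimate, H = -sum(p_i * log2(p_i)), over byte values. The function name is hypothetical. It only looks at how often each byte occurs, so a fully predictable sequence such as the counter 0, 1, 2, ..., 255 still scores the maximum of 8 bits per byte, which is the same effect as with the digits of Pi.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Empirical Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0).
// It measures the byte histogram only, not predictability.
double shannon_entropy_bits(const std::vector<std::uint8_t>& data) {
    if (data.empty()) return 0.0;
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];

    const double n = static_cast<double>(data.size());
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;                 // 0 * log2(0) is taken as 0
        const double p = static_cast<double>(c) / n;
        h -= p * std::log2(p);
    }
    return h;
}
```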

Related

How to access Intel Random Number generator RdRand in C++? [duplicate]

I have seen that Intel seems to have included a new assembly function to get real random numbers obtained from hardware. The name of the instruction is RdRand, but only a small amount of details seem accessible on it on Internet: http://en.wikipedia.org/wiki/RdRand
My questions concerning this new instruction and its use in C++11 are the following:
Are the random numbers generated with RdRand really random? (each bit generated from uncorrelated white noise or quantum processes? )
Is it a special feature of Ivy Bridge processors and will Intel continue to implement this function in the next generation of cpu?
How to use it through C++11? Maybe with std::random_device but do compilers already call RdRand if the instruction is available?
How to check whether RdRand is really called when I compile a program?

I designed the random number generator that supplies the random numbers to the RdRand instruction. So for a change, I really know the answers.
1) The random numbers are generated from an SP800-90 AES-CTR DRBG compliant PRNG. The AES uses a 128 bit key, and so the numbers have multiplicative prediction resistance up to 128 bits and additive beyond 128.
However the PRNG is reseeded from a full entropy source frequently. For isolated RdRand instructions it will be freshly reseeded. For 8 threads on 4 cores pulling as fast as possible, it will be reseeded always more frequently than once per 14 RdRands.
The seeds come from a true random number generator. This involves a 2.5Gbps entropy source that is fed into a 3:1 compression ratio entropy extractor using AES-CBC-MAC.
So it is in effect a TRNG, but one that falls back to the properties of a cryptographically secure PRNG for short sequences when heavily loaded.
This is exactly the semantic difference between /dev/random and /dev/urandom on linux, only a lot faster.
The entropy is ultimately gathered from a quantum process, since that is the only fundamental random process we know of in nature. In the DRNG it is specifically the thermal noise in the gates of 4 transistors that drive the resolution state of a metastable latch, 2.5 billion times a second.
The entropy source and conditioner are intended to be SP800-90B and SP800-90C compliant, but those specs are still in draft form.
2) RdRand is a part of the standard Intel instruction set. It will be supported in all CPU products in the future.
3) You either need to use inline assembly or a library (like openssl) that does use RdRand. If you use a library, the library is implementing the inline assembler that you could implement directly. Intel gives code examples on their web site.
Someone else mentioned librdrand.a. I wrote that. It's pretty simple.
4) Just look for the RdRand opcodes in the binary.
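As an alternative to hand-written inline assembly, compilers expose the instruction through the _rdrand32_step intrinsic in <immintrin.h>. The sketch below is a hedged illustration, not Intel's sample code: it assumes GCC or Clang on x86 (it relies on <cpuid.h> and on the target("rdrnd") function attribute so that no extra compiler flag is needed), and the helper names cpu_has_rdrand and try_rdrand32 as well as the retry count are made up for the example. On other platforms it simply reports failure.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>      // __get_cpuid (GCC/Clang)
#include <immintrin.h>  // _rdrand32_step

// CPUID leaf 1, ECX bit 30 reports RDRAND support.
inline bool cpu_has_rdrand() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return (ecx & (1u << 30)) != 0;
}

// Tries to fetch one 32-bit value. RDRAND can transiently fail (carry flag
// clear), so the call is retried a few times before giving up.
__attribute__((target("rdrnd")))
inline bool try_rdrand32(std::uint32_t& out, int max_retries = 10) {
    if (!cpu_has_rdrand()) return false;
    unsigned int value = 0;
    for (int i = 0; i < max_retries; ++i) {
        if (_rdrand32_step(&value)) {   // returns 1 on success
            out = value;
            return true;
        }
    }
    return false;
}
#else
// Fallback for other compilers/architectures: no RDRAND available here.
inline bool cpu_has_rdrand() { return false; }
inline bool try_rdrand32(std::uint32_t&, int = 10) { return false; }
#endif
```

A caller checks the return value and falls back to another source (for instance std::random_device) when it is false.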

1) That certainly depends on your view of the determinism of the universe, so it is more of a philosophical question, but many people consider it to be random.
2) Only Intel will know, but since there was demand to add it, it's likely there will be demand to keep it.
3) std::random_device is not required to be hardware driven, and even if it is, it is not required to use rdrand. You can ask its double entropy() const noexcept member function whether it is hardware driven or not. Using rdrand for that is a QoI issue, but I would expect every sane implementation that has it available to do so (I have seen e.g. gcc doing it). If unsure, you can always check the assembly, but other means of hardware randomness should also be good enough (there is other dedicated hardware available).
4) See above: if you are interested in whether it's only hardware, use entropy(); if you are interested in rdrand, scan the generated machine code.

Since the PRISM and Snowden revelations, I would be very careful about using hardware random generators, or relying on one single library, in an application with security concerns. I prefer using a combination of independent open source cryptographic random generators. By combination, I mean for example:
Let ra, rb, rc be three independent cryptographic random generators, and r be the random value returned to the application.
Let sa, sb, sc be their seeds, and ta, tb, tc their reseed periods, e.g. reseed rb every tb draws.
By independent I mean: belonging as far as possible to independent libraries, and relying on different ciphers or algorithms.
Pseudo-code:
// init
seed std rand with time (at least millisec, preferably microsec)
sa = std rand xor time // of course, not the same time evaluation
// loop
sb = ra every tb
sc = rb every tc
r = rb xor rc
sa = rc every ta
Of course, every draw shall be used only once.
Probably two sources are enough:
// init
seed std rand with time (at least millisec, preferably microsec)
sa = std rand xor time // of course, not the same time evaluation
// loop
sb = ra every tb
sa = rb every ta
r = rb xor ra
Choose different values for ta, tb, tc. Their range depends on the strength of the random source you use.
EDIT: I have started the new library ABaDooRand for this purpose.
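The two-source variant of the pseudo-code above could look roughly like this in C++. This is a structural sketch only: the standard library has no cryptographic generators, so two std::mt19937 engines stand in for ra and rb, which defeats the point of using independent ciphers and must not be used for security. The class name and constructor parameters are hypothetical, the initial seeds are passed in rather than derived from the clock, and a period of 0 is taken to mean "never reseed".

```cpp
#include <cstdint>
#include <random>

// r = rb xor ra, with cross-reseeding every ta / tb draws.
// std::mt19937 is NOT cryptographic; it is a placeholder for ra and rb.
class XorCombined {
public:
    XorCombined(std::uint32_t seed_a, std::uint32_t seed_b,
                std::uint64_t period_a, std::uint64_t period_b)
        : ra_(seed_a), rb_(seed_b), ta_(period_a), tb_(period_b) {}

    std::uint32_t operator()() {
        ++draws_;
        if (tb_ != 0 && draws_ % tb_ == 0) rb_.seed(ra_());  // sb = ra every tb
        if (ta_ != 0 && draws_ % ta_ == 0) ra_.seed(rb_());  // sa = rb every ta
        return static_cast<std::uint32_t>(rb_() ^ ra_());    // r = rb xor ra
    }

private:
    std::mt19937 ra_;
    std::mt19937 rb_;
    std::uint64_t ta_;
    std::uint64_t tb_;
    std::uint64_t draws_ = 0;
};
```

Each value drawn for reseeding is consumed and never returned to the caller, which matches the rule that every draw is used only once.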

1) No, the numbers from RdRand are not truly random, since they come from a cryptographically-secure pseudorandom number generator. However, RdRand, RdSeed, and the Intel Secure Key technology are probably the closest to truly random you will find.
2) Yes, the feature is available in all Intel processors that appear in laptops, desktops, and servers starting with the Ivy Bridge processors you mention. These days, the features are also implemented in AMD chips.
3 and 4) The Intel software development guide is the place to look for these answers. There is an interesting discussion of how Intel Secure Key is applied to an astrophysical problem here (http://iopscience.iop.org/article/10.3847/1538-4357/aa7ede/meta;jsessionid=A9DA9DDB925E6522D058F3CEEC7D0B21.ip-10-40-2-120) and non-paywalled version here (https://arxiv.org/abs/1707.02212). This paper describes how the technology works, how to implement it, and describes its performance (Sections 2.2.1 and 5). Had to read it for a class.

I think they are "said to be" random...Since it's for encryption. I wouldn't worry too much about the quality of the random numbers.
I think Intel will keep doing it, as they always regard backward compatibility as important, even if this instruction may be useless in the future.
I am sorry I cannot answer this question because I don't use C++11.
You can try librdrand.a if you don't want to dig into assembly code. Intel provides the library as a free download on their website. I have tested it; it's pretty convenient and has an error-reporting mechanism (since the random number generator has a small probability of failing to generate a random number). So if you use this library, you only need to check the return value of the function in librdrand.
Please let me know if there is anything wrong in my reply. Thanks
Good luck
xiangpisaiMM

How should I choose parameters for a smaller-than-standard std::mersenne_twister_engine?

I need a C++11 random number generator which is "good enough" and whose state I can save and restore. I want the saved state to be significantly smaller than the 6.6 kB or so which this code produces:
std::mt19937 rng (1);
std::ofstream save ("save.txt");
save << rng;
std::mersenne_twister_engine has a large number of parameters. It's a bit scary.
For my purposes, a period on the order of billions is sufficient. I've heard of TinyMT, which may be appropriate, but I can't see how to implement it as a template specialization.
How should I choose the parameters? I suspect it will break badly if I merely reduce the "state size" parameter to a few words.
I would consider using a different engine entirely but, apart from tolerating a moderate period, I don't want to sacrifice the quality of statistical randomness. Artefacts such as those seen with linear congruential generators are unacceptable.

If you don't need a lot of numbers, any decent RNG with a 64-bit state will do. Off the top of my head, a very good generator would be XorShift64*: paper http://arxiv.org/abs/1402.6246, code https://github.com/Iwan-Zotow/xorshift64STAR
Another option is PCG, "Quadratisch. Praktisch. Gut.", paper and code at http://www.pcg-random.org/
They are both statistically better than MT, the only disadvantage being a small(er) period, but as far as I can see that is OK with you.
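For a sense of how small such a generator is, here is a minimal sketch of xorshift64* wrapped so that it can be used with the standard distributions. The shift triple (12, 25, 27) and the multiplier are the commonly published ones; please check them against the linked reference code before relying on them. The class name and the state()/constructor save-and-restore interface are my own assumptions, not part of either linked project.

```cpp
#include <cstdint>
#include <limits>

// xorshift64*: the whole state is one non-zero 64-bit word.
class XorShift64Star {
public:
    using result_type = std::uint64_t;

    // A zero state would be a fixed point, so it is replaced by 1.
    explicit XorShift64Star(std::uint64_t seed = 1)
        : state_(seed != 0 ? seed : 1) {}

    // Outputs are never 0: the state is non-zero and the multiplier is odd.
    static constexpr result_type min() { return 1; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    result_type operator()() {
        state_ ^= state_ >> 12;
        state_ ^= state_ << 25;
        state_ ^= state_ >> 27;
        return state_ * 0x2545F4914F6CDD1DULL;
    }

    // Saving is reading this value; restoring is constructing from it.
    std::uint64_t state() const { return state_; }

private:
    std::uint64_t state_;
};
```

The saved state is a single integer of at most 20 decimal digits, compared with the several kilobytes written for std::mt19937.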

There are many good generators with a small state: MRG32k3a, LFSR113, Chacha-8, Philox-32x4. Even Mixmax (with N=17) would be small by your standard (state of 17 doubles).
TinyMT is also a possibility, although Vigna has shown that some of the bits are not always good (not sure if the not so great lower bits really matters in practice).
I would be wary of xorshift based rngs, see the paper Again, random numbers fall mainly in the planes: xorshift128+ generators by Matsumoto for example. I am also dubious of PCG, if only for the colored table on frontpage of the website: it dumbs things down too much, does not present all the relevant generators, and is skewed towards PCG of course.

How to create the same random numbers on different computers with Armadillo?

I am using the Armadillo C++ library, which allows high-performance computation on matrices and vectors. This library has built-in functions to populate its objects with random numbers. I use it in the context of procedural random generation of an object. The object creation is random, but no matter how often I recreate the object, it remains the same as long as the seed remains the same.
The issue is that, although I can set the seed to a determined value, and thus recreate the same run on my machine... I lose the coherence of the randomness when going to a different computer. I come from the enchanted land of Matlab where I can specify the function used for the generation of pseudo-random numbers. So, this generation can be cross platform if one chooses the function well. But how do I specify the RNG function for Armadillo?
My research has led me to this source documentation, which "details" the process of random number generation:
http://arma.sourceforge.net/internal_docs_4300/a01181_source.html
http://arma.sourceforge.net/internal_docs_4300/a00087.html
But I have no clue what to do here: this code is much more advanced than what I can write. I would appreciate any help!
Thank you guys!
Remarks:
- I do not care how good the random function used is. I just want a fast cross-platform cross-architecture generator. Deterministic randomness is my goal anyway.
- In detail, in case it matters, the machines to consider are Intel processors, Windows or Mac, 32-bit or 64-bit.
- I have read several posts mentioning the use of seeds for randomness, but it seems that the problem here is the cross-platform context and the fact that the random generator is buried (to my untrained eyes at least) within Armadillo's code.

In C++98 / C++03 mode, Armadillo will internally use std::rand() for generating random numbers (there's more to it, but that's a good approximation of what's happening).
If you move from one operating system to the next (or across two versions of the same operating system), there is no guarantee that the system provided random number generator will be the same.
If you use Armadillo in C++11 mode, you can use any random number generator you like, with the help of the .imbue() function. Example:
std::mt19937 engine; // Mersenne twister random number engine with default parameters
std::uniform_real_distribution<double> distr(0.0, 1.0);
mat A(123,456);
A.imbue( [&]() { return distr(engine); } ); // fill with random numbers provided by the engine
The Mersenne twister random number engine is provided as standard functionality in C++11. The default parameters should be stable across compiler vendors and versions, and are independent of the operating system.
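One caveat worth keeping in mind: the C++ standard fixes the output sequence of std::mt19937 exactly, but it does not fix the algorithm used by distributions such as std::uniform_real_distribution, so the same seed may still yield different doubles under different standard libraries. A minimal sketch that sidesteps this by scaling the raw engine output directly is shown below; the helper names unit_double and reproducible_values are hypothetical, and a plain std::vector is used instead of an Armadillo matrix so the example has no dependencies.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Maps one raw 32-bit engine output to a double in [0, 1).
// Only the engine, whose sequence the standard specifies, is involved.
inline double unit_double(std::mt19937& engine) {
    return static_cast<double>(engine()) / 4294967296.0;  // divide by 2^32
}

// Produces n values that depend only on the seed, not on the platform's
// distribution implementation.
std::vector<double> reproducible_values(std::mt19937::result_type seed,
                                        std::size_t n) {
    std::mt19937 engine(seed);
    std::vector<double> values;
    values.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(unit_double(engine));
    }
    return values;
}
```

With Armadillo, the same idea would be a lambda that returns unit_double(engine), passed to .imbue() as in the example above.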

Is there a way to check if std::random_device is in fact random?

Quoting from cppreference:
std::random_device is a non-deterministic random number engine, although implementations are allowed to implement std::random_device using a pseudo-random number engine if there is no support for non-deterministic random number generation.
Is there a way to check whether current implementation uses PRNG instead of RNG (and then say exit with an error) and if not, why not?
Note that a little bit of googling shows that at least MinGW implements std::random_device in this way, and thus this is a real danger if std::random_device is to be used.
---edit---
Also, if the answer is no and someone could give some insight as to why there is no such function/trait/something I would be quite interested.

Is there a way to check whether current implementation uses PRNG instead of RNG (and then say exit with an error) and if not, why not?
There is a way: std::random_device::entropy will return 0.0 if it is implemented in terms of a random number engine (that is, it's deterministic).
From the standard:
double entropy() const noexcept;
Returns: If the implementation employs a random number engine, returns 0.0. Otherwise, returns an entropy estimate for the random numbers returned by operator(), in the range min() to log_2(max() + 1).
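A minimal sketch of such a check follows. The struct and function names are hypothetical. Bear in mind that some standard libraries are reported to return 0 from entropy() even when the device is non-deterministic, so a zero here is a reason to investigate the implementation rather than proof that a PRNG is in use.

```cpp
#include <random>

// What the implementation reports about its own random_device.
struct DeviceReport {
    double entropy;                // value of std::random_device::entropy()
    bool claims_nondeterministic;  // true when the reported entropy is > 0
};

DeviceReport inspect_random_device() {
    std::random_device rd;
    const double e = rd.entropy();
    return DeviceReport{e, e > 0.0};
}
```

A program that wants to refuse to run on a library reporting zero entropy can test claims_nondeterministic at start-up and exit with an error.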

There is no 100% safe way to determine real randomness for sure. With a black-box approach, the best you can do is to show evidence that it's not fully random:
First, you could verify that the distribution seems random by generating a lot of random numbers and computing statistics about their distribution (e.g. generate 1 million random numbers between 0 and 1000). If it appears that some numbers come out significantly more often than others, then obviously it's not really random.
Next, you can run a program several times that generates random numbers from the same initial seed. If you obtain the same sequence of random numbers, then it's definitely a PRNG and not real randomness. However, if you don't obtain the same sequence, it does not prove anything: the library could use some kind of auto-seed (using clock ticks or something else) to hide/improve the pseudo-randomness.
If your application depends heavily on randomness quality (e.g. cryptographic quality), you should consider some more tests, such as those recommended by NIST SP 800-22.
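The first check (counting how often each value comes out) can be sketched as a chi-square statistic over bucket counts. This is a simplified illustration with hypothetical function names, not one of the NIST tests: it can flag a badly skewed distribution, but a good score says nothing about whether the source is a PRNG.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Pearson chi-square statistic against a uniform expectation.
double chi_square(const std::vector<std::size_t>& counts) {
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    if (counts.empty() || total == 0) return 0.0;

    const double expected =
        static_cast<double>(total) / static_cast<double>(counts.size());
    double statistic = 0.0;
    for (std::size_t c : counts) {
        const double diff = static_cast<double>(c) - expected;
        statistic += diff * diff / expected;
    }
    return statistic;
}

// Draws `samples` values in [0, buckets - 1] from std::random_device and
// counts how often each one occurs. `buckets` must be greater than zero.
std::vector<std::size_t> sample_counts(std::size_t buckets, std::size_t samples) {
    std::random_device rd;
    std::uniform_int_distribution<std::size_t> pick(0, buckets - 1);
    std::vector<std::size_t> counts(buckets, 0);
    for (std::size_t i = 0; i < samples; ++i) ++counts[pick(rd)];
    return counts;
}
```

For a uniform source the statistic has an expected value of about buckets - 1, so a result far above that (for example several times larger) suggests that some values come out noticeably more often than others.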

Xarn stated above:
However, said pessimism also precludes this method from differentiating between RNG and PRNG based implementation, making it rather unhelpful. Also VC++ could be realistic, but to check that would probably require a lot of insider knowledge about Windows.
If you debug into the Windows implementation, then you will find that you end up in RtlGenRandom, which is one of the better sources of cryptographically random bytes. If you debug into the Linux implementation, then you should end up reading from /dev/urandom, which is also OK. The fact that they don't tell us that we're not using something awful, like rand, is annoying.
PS - you don't have to have internal Windows knowledge, you just need to attach the symbols to the debugger.
