Ran2 prng library compatible with C++ "random" distributions - c++

I've searched for it a lot, and even though it is referred to as a good PRNG in terms of quality (apart from its cycle length of less than 10^63), I didn't find anything.
There are a lot of strange looking prng out there, all of these in huge libraries compatible with "random" and C++11 standards; none of these has ran2.
The first question popping out of my mind is: why?
The second question is: is it possible? I really need to compare the results of that one with other prngs for the system I'm simulating (Ising Model), which is said to be a little infamous with prngs and behaves well with ran2.
I already have the library which implements it in a C++ way https://github.com/PrincetonUniversity/athena-public-version/blob/master/src/utils/ran2.cpp, but I hope to find a way to use it with uniform real distributions too.
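For reference, the <random> distributions only require a generator type that exposes result_type, min(), max() and operator(). Below is a minimal sketch of such a wrapper. The class name StandInEngine and the helper draw_unit are hypothetical, and the update step is a Park-Miller style LCG used purely as a placeholder (it is NOT ran2). The assumption is that the integer core of a ran2 implementation could be dropped into operator() with min() and max() adjusted to its output range.

```cpp
#include <cstdint>
#include <random>

// Hypothetical adapter: any type with this interface can drive <random> distributions.
// The update step is a Park-Miller style LCG used as a placeholder; it is NOT ran2.
class StandInEngine {
public:
    using result_type = std::uint32_t;

    // Smallest and largest values operator() can return.
    static constexpr result_type min() { return 1u; }
    static constexpr result_type max() { return 2147483646u; }

    explicit StandInEngine(result_type seed = 1u) : state_(seed % 2147483647u) {
        if (state_ == 0u) state_ = 1u;  // zero would be a fixed point of the LCG
    }

    result_type operator()() {
        state_ = static_cast<result_type>(
            (static_cast<std::uint64_t>(state_) * 48271u) % 2147483647u);
        return state_;
    }

private:
    result_type state_;
};

// Draws one double from a uniform real distribution over [0, 1) using the wrapper.
double draw_unit(StandInEngine& eng) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(eng);
}
```

With the interface in place, the same object can be passed to any other <random> distribution in the same way.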

Related

Is there a way to check if std::random_device is in fact random?

Quoting from cppreference:
std::random_device is a non-deterministic random number engine, although implementations are allowed to implement std::random_device using a pseudo-random number engine if there is no support for non-deterministic random number generation.
Is there a way to check whether current implementation uses PRNG instead of RNG (and then say exit with an error) and if not, why not?
Note that a little bit of googling shows that at least MinGW implements std::random_device in this way, so this is a real danger if std::random_device is to be used.
---edit---
Also, if the answer is no and someone could give some insight as to why there is no such function/trait/something I would be quite interested.
Is there a way to check whether current implementation uses PRNG instead of RNG (and then say exit with an error) and if not, why not?
There is a way: std::random_device::entropy will return 0.0 if it is implemented in terms of a random number engine (that is, it's deterministic).
From the standard:
double entropy() const noexcept;
Returns: If the implementation employs a random number engine, returns 0.0. Otherwise, returns an entropy estimate for the random numbers returned by operator(), in the range min() to log_2(max() + 1).
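A minimal sketch of that check follows; the function name reports_entropy is made up for illustration. Treat a zero result as "possibly deterministic" rather than as proof, since some standard library versions have reported 0.0 even when backed by a real entropy source.

```cpp
#include <random>

// Returns true when the implementation reports a non-zero entropy estimate.
// A false result only means the library does not claim any entropy; it is
// not proof that the underlying source is a PRNG.
bool reports_entropy() {
    std::random_device rd;
    return rd.entropy() > 0.0;
}
```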
There is no 100% safe way to determine real randomness. With a black-box approach, the best you can do is show evidence that the output is not fully random:
First, you could check that the distribution looks random by generating a lot of random numbers and computing statistics on their distribution (e.g. generate 1 million random numbers between 0 and 1000). If some numbers come out significantly more often than others, then it is obviously not really random.
Next, you can run a program that generates random numbers several times from the same initial seed. If you obtain the same sequence each time, then it is definitely a PRNG and not real randomness. However, if you don't obtain the same sequence, it does not prove anything: the library could use some kind of auto-seed (clock ticks or something else) to hide or improve the pseudo-randomness.
If your application depends heavily on randomness quality (e.g. cryptographic quality), you should consider further tests, such as those recommended by NIST SP 800-22.
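The frequency check described above could be sketched as a Pearson chi-square statistic over equally likely bins. The function name chi_square_uniform is hypothetical, and this is only the simplest such test, not a substitute for a full test suite.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Pearson chi-square statistic for `samples` draws over `bins` equally likely values.
template <class Generator>
double chi_square_uniform(Generator& gen, std::size_t bins, std::size_t samples) {
    std::vector<std::size_t> counts(bins, 0);
    std::uniform_int_distribution<std::size_t> dist(0, bins - 1);
    for (std::size_t i = 0; i < samples; ++i) {
        ++counts[dist(gen)];
    }
    const double expected = static_cast<double>(samples) / static_cast<double>(bins);
    double chi2 = 0.0;
    for (std::size_t c : counts) {
        const double d = static_cast<double>(c) - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

For a uniform source the statistic should land near bins - 1; a value far above that suggests some outcomes occur too often. A good value says nothing about whether the source is a PRNG or a true RNG.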
Xarn stated above:
However, said pessimism also precludes this method from differentiating between RNG and PRNG based implementation, making it rather unhelpful. Also VC++ could be realistic, but to check that would probably require a lot of insider knowledge about Windows.
If you debug into the Windows implementation, you will find that you end up in RtlGenRandom, which is one of the better sources of cryptographically random bytes. If you debug into the Linux implementation, you should end up reading from /dev/urandom, which is also OK. The fact that they don't tell us that we're not using something awful, like rand, is annoying.
PS - you don't have to have internal Windows knowledge, you just need to attach the symbols to the debugger.

Usefulness of `rand()` - or who should call `srand()`?

Background: I use rand(), std::rand(), std::random_shuffle() and other functions in my code for scientific calculations. To be able to reproduce my results, I always explicitly specify the random seed, and set it via srand(). That worked fine until recently, when I figured out that libxml2 would also call srand() lazily on its first usage - which was after my early srand() call.
I filed a bug report with libxml2 about its srand() call, but I got the answer:
Initialize libxml2 first then.
That's a perfectly legal call to be made from a library. You should
not expect that nobody else calls srand(), and the man page nowhere
states that using srand() multiple time should be avoided
This is actually my question now. If the general policy is that every lib can/should/will call srand(), and I can/might also call it here and there, I don't really see how that can be useful at all. Or how is rand() useful then?
That is why I thought, the general (unwritten) policy is that no lib should ever call srand() and the application should call it only once in the beginning. (Not taking multi-threading into account. I guess in that case, you anyway should use something different.)
I also tried to research a bit which other libraries actually call srand(), but I didn't find any. Are there any?
My current workaround is this ugly code:
{
// On the first call to xmlDictCreate,
// libxml2 will initialize some internal randomize system,
// which calls srand(time(NULL)).
// So, do that first call here now, so that we can use our
// own random seed.
xmlDictPtr p = xmlDictCreate();
xmlDictFree(p);
}
srand(my_own_seed);
Probably the only clean solution would be to not use that at all and only to use my own random generator (maybe via C++11 <random>). But that is not really the question. The question is, who should call srand(), and if everyone does it, how is rand() useful then?
Use the new <random> header instead. It allows for multiple engine instances, using different algorithms and more importantly for you, independent seeds.
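A minimal sketch of what that looks like, assuming std::mt19937 as the engine (the function name private_stream is made up for illustration). The engine state lives in a local object, so a call to srand() or rand() elsewhere in the process cannot disturb it.

```cpp
#include <random>
#include <vector>

// Draws `n` values from a private engine; nothing outside this function can reseed it.
std::vector<std::mt19937::result_type> private_stream(unsigned seed, int n) {
    std::mt19937 engine(seed);
    std::vector<std::mt19937::result_type> out;
    out.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        out.push_back(engine());
    }
    return out;
}
```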
[edit]
To answer the "useful" part, rand generates random numbers. That's what it's good for. If you need fine-grained control, including reproducibility, you should not only have a known seed but a known algorithm. srand at best gives you a fixed seed, so that's not a complete solution anyway.
Well, the obvious thing has been stated a few times by others, use the new C++11 generators. I'm restating it for a different reason, though.
You use the output for scientific calculations, and rand usually implements a rather poor generator (in the mean time, many mainstream implementations use MT19937 which apart from bad state recovery isn't so bad, but you have no guarantee for a particular algorithm, and at least one mainstream compiler still uses a really poor LCG).
Don't do scientific calculations with a poor generator. It doesn't really matter if you have things like hyperplanes in your random numbers if you do some silly game shooting little birds on your mobile phone, but it matters big time for scientific simulations. Don't ever use a bad generator. Don't.
Important note: std::random_shuffle (the version with two parameters) may actually call rand, which is a pitfall to be aware of if you're using that one, even if you otherwise use the new C++11 generators found in <random>.
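To avoid that pitfall, std::shuffle takes the engine explicitly. A small sketch, assuming std::mt19937 and a hypothetical helper name shuffled_indices:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns the values 0..n-1 in an order determined only by `seed`.
std::vector<int> shuffled_indices(int n, unsigned seed) {
    std::vector<int> v(static_cast<std::size_t>(n));
    std::iota(v.begin(), v.end(), 0);
    std::mt19937 engine(seed);                 // private engine, never touches rand()
    std::shuffle(v.begin(), v.end(), engine);
    return v;
}
```

The result is repeatable for a given standard library, but the exact permutation may differ between implementations because the standard does not pin down the algorithm std::shuffle uses.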
About the actual issue, calling srand twice (or even more often) is no problem. You can in principle call it as often as you want; all it does is change the seed, and consequently the pseudorandom sequence that follows. I'm wondering why an XML library would want to call it at all, but they're right in their response: it is not illegitimate for them to do it. But it also doesn't matter.
The only important thing to make sure is that either you don't care about getting any particular pseudorandom sequence (that is, any sequence will do, you're not interested in reproducing an exact sequence), or you are the last one to call srand, which will override any prior calls.
That said, implementing your own generator with good statistical properties and a sufficiently long period in 3-5 lines of code isn't all that hard either, with a little care. The main advantage (apart from speed) is that you control exactly where your state is and who modifies it.
It is unlikely that you will ever need periods much longer than 2^128 because of the sheer forbidding time it takes to actually consume that many numbers. A 3GHz computer consuming one number every cycle will run for about 10^21 years on a 2^128 period, so there's not much of an issue for humans with average lifespans. Even assuming that the supercomputer you run your simulation on is a trillion times faster, your great-great-great-grandchildren won't live to see the end of the period.
In that respect, periods like 2^19937, which current "state of the art" generators deliver, are really ridiculous; that's trying to improve the generator at the wrong end if you ask me (it's better to make sure they're statistically firm and that they recover quickly from a worst-case state, etc.). But of course, opinions may differ here.
This site lists a couple of fast generators with implementations. They're xorshift generators combined with an addition or multiplication step and a small (from 2 to 64 machine words) lag, which results in both fast and high quality generators (there's a test suite as well, and the site's author wrote a couple of papers on the subject, too). I'm using a modification of one of these (the 2-word 128-bit version ported to 64-bits, with shift triples modified accordingly) myself.
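To give an idea of how short such a generator can be, here is a sketch of a xorshift step followed by a multiplication, using the shift triple and multiplier commonly published for the xorshift64* variant. It is not the modified 128-bit generator mentioned above, and the struct name and the fallback seed constant are arbitrary choices for this example.

```cpp
#include <cstdint>

// xorshift64*: three xorshift steps on a 64-bit state, then a multiplication.
struct XorShift64Star {
    std::uint64_t state;  // must never be zero, otherwise the generator stays at zero

    explicit XorShift64Star(std::uint64_t seed)
        : state(seed != 0 ? seed : 0x9E3779B97F4A7C15ULL) {}  // arbitrary non-zero fallback

    std::uint64_t next() {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return state * 0x2545F4914F6CDD1DULL;
    }
};
```

All of the state is the single member, so it is obvious where it lives and who can modify it.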
This problem is being tackled in C++11's random number generation, i.e. you can create an instance of a class:
std::default_random_engine e1
which gives you full control over the random numbers generated from the object e1, and only those (as opposed to whatever libxml uses). The general rule of thumb would then be to use the new construct, since it lets you generate your random numbers independently.
Very good documentation
To address your concerns: I also think that it is bad practice to call srand() in a library like libxml. However, the deeper issue is that srand() and rand() are not designed for the context you are trying to use them in; they are enough when you just need some random numbers, as libxml does. When you need reproducibility and want to be sure that you are independent of others, the new <random> header is the way to go. So, to sum up, I don't think it's good practice on the library's side, but it's hard to blame them for doing it. Also, I can't imagine them changing it, as a billion other pieces of software probably depend on it.
The real answer here is that if you want to be sure that YOUR random number sequence isn't being altered by someone else's code, you need a random number context that is private to YOUR work. Note that calling srand is only one small part of this. For example, if you call some function in some other library that calls rand, it will also disrupt the sequence of YOUR random numbers.
In other words, if you want predictable behaviour from your code, based on random number generation, it needs to be completely separate from any other code that uses random numbers.
Others have suggested using the C++ 11 random number generation, which is one solution.
On Linux and other compatible libraries, you could also use rand_r, which takes a pointer to an unsigned int holding the seed that is used for that sequence. So if you initialize such a seed variable and then use it with all calls to rand_r, it will produce a unique sequence for YOUR code. This is of course still the same old rand generator, just with a separate seed. The main reason I mention this is that you could fairly easily do something like this:
int myrand()
{
static unsigned int myseed = ... some initialization of your choice ...;
return rand_r(&myseed);
}
and simply call myrand instead of std::rand (it should also be doable to work this into the std::random_shuffle overload that takes a random generator parameter).

How to generate good random seed for a random generator?

I certainly can't use the random generator for that. Currently I'm creating a CRC32 hash from unixtime()+microtime().
Are there any smarter methods than hashing time()+microtime() ?
I am not fully satisfied with the results though. I expected them to be more random, but I could see strong patterns until I added more calls to MicroTime(). That makes it a lot slower, so I'm looking for a better way of doing this.
This silly code generates the best output I could make so far, the calculations were necessary or I could see some patterns in the output:
starthash(crc32);
addtohash(crc32, MicroTime());
addtohash(crc32, time(NULL)); // 64bit
addtohash(crc32, MicroTime()/13.37f);
addtohash(crc32, (10.0f-MicroTime())*1337.0f);
addtohash(crc32, (11130.0f-MicroTime())/1313137.0f);
endhash(crc32);
MicroTime() returns microseconds elapsed from program start. I have overloaded the addtohash() to every possible type.
I would rather take non-library solutions, it's just ~10 lines of code probably anyways, I don't want to install huge library because of something I don't actually need that much, and I'm more interested in the code than just using it from a function call.
If in any doubt, get your seed from CryptGenRandom on Windows, or by reading from /dev/random or /dev/urandom on *NIX systems.
This might be overkill for your purposes, but unless it causes performance problems there's no point messing with low-entropy sources like the time.
It's unlikely to be underkill. And if you're writing code with a real need for high-quality secure random data, and didn't bother mentioning that in the question, well, you get what you deserve ;-)
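On a *NIX system, reading a seed from /dev/urandom needs no extra library. A minimal sketch follows; the function name os_seed is hypothetical, and the fallback to std::random_device is an addition for platforms without that device file.

```cpp
#include <cstdint>
#include <fstream>
#include <random>

// Reads 8 bytes from /dev/urandom; falls back to std::random_device if that fails.
std::uint64_t os_seed() {
    std::uint64_t seed = 0;
    std::ifstream urandom("/dev/urandom", std::ios::in | std::ios::binary);
    if (urandom && urandom.read(reinterpret_cast<char*>(&seed), sizeof seed)) {
        return seed;
    }
    // Fallback: quality depends on how the standard library implements random_device.
    std::random_device rd;
    const std::uint64_t hi = rd();
    const std::uint64_t lo = rd();
    return (hi << 32) | (lo & 0xFFFFFFFFULL);
}
```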
You can look into LFSRs (linear-feedback shift registers) and pseudorandom generators. An LFSR is usually a hardware solution, but you can easily implement your own software LFSR.
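As an illustration, a 16-bit Fibonacci LFSR in software can look like the sketch below. The tap positions (16, 14, 13, 11) are a commonly cited maximal-length choice, and the function names are made up for this example. Note that an LFSR still needs a non-zero starting value from somewhere, so on its own it does not solve the seeding problem.

```cpp
#include <cstdint>

// One step of a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11.
std::uint16_t lfsr16_step(std::uint16_t s) {
    const unsigned bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    return static_cast<std::uint16_t>((s >> 1) | (bit << 15));
}

// Counts the steps until the register returns to its starting value.
unsigned lfsr16_period(std::uint16_t start) {
    unsigned steps = 0;
    std::uint16_t s = start;
    do {
        s = lfsr16_step(s);
        ++steps;
    } while (s != start);
    return steps;
}
```

The all-zero state maps to itself, which is why the register must never be seeded with zero.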

What is the periodicity of the PRNG in GNU C Library?

Is there any literature available on the periodicity of the random number generator in gcc's g++ (if we don't re-seed the function)? I suppose I could run tests myself, but it would be better to have access to well-verified research.
Thanks in advance for your help.
// EDIT
I just wanted to add that I have searched quite a bit, with multiple engines, but I have not found anything specific. I have only read general comments about the periodicity being limited by the number of bits needed to represent the seed. (So I guess that given the fact that srand is usually called with time, the periodicity can be no more than 10^12 or so. But something more definite would be very helpful before I start implementing my algorithms.)
When searching in the rand(3) man page, I found this:
The versions of rand() and srand() in
the Linux C Library use the same
random number generator as random()
and srandom()
so I had a look at the random(3) man page, and here is your answer:
The period of this random number
generator is very large, approximately
16*((2**31)-1)
This can be quite useful for pedagogic purposes, since you want to develop your own PRNG. However, I would discourage you from using this PRNG when developing an application. You should prefer one of the Boost.Random implementations, as suggested by Neil Butterworth (MT19937 is a good default PRNG, sufficient for most applications).
Finally, if you intend to learn more about PRNGs, I would suggest reading these two scientific articles, which survey PRNGs well.
Practical distribution of random streams for stochastic High Performance Computing, David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010
Pseudorandom number generators, Pierre L'Ecuyer, in Encyclopedia of Quantitative Finance, 2008
The srand/rand functions are a bit broken. As you are using C++, I strongly recommend you use the boost random number library. It's a header-only library, so you don't need to build anything. There's an example of how to use it here.

Deterministic Random Number Streams in C++ STL

I want to supply a number, and then receive a set of random numbers. However, I want those numbers to be the same regardless of which computer I run it on (assuming I supply the same seed).
Basically my question is: in C++, if I make use of rand(), but supply srand() with a user-defined seed rather than the current time, will I be able to generate the same random number stream on any computer?
There are dozens of PRNGs available as libraries. Pick one. I tend to use Mersenne Twister.
By using an externally supplied library, you bypass the risk of a weird or buggy implementation of your language's library rand(). As long as your platforms all conform to the same mathematical semantics, you'll get consistent results.
MT is a favorite of mine because I'm a physicist, and I use these things for Monte Carlo, where the guarantee of equal-distribution to high dimensions is important. But don't use MT as a cryptographic PRNG!
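With C++11, std::mt19937 is one way to get such a stream without an external library, because the standard specifies its algorithm exactly. A minimal sketch, with a hypothetical function name portable_stream:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Returns the first `n` raw outputs of std::mt19937 for the given seed.
std::vector<std::uint32_t> portable_stream(std::uint32_t seed, std::size_t n) {
    std::mt19937 engine(seed);
    std::vector<std::uint32_t> out(n);
    for (auto& x : out) {
        x = static_cast<std::uint32_t>(engine());
    }
    return out;
}
```

The standard even fixes a check value: the 10000th output of a default-constructed std::mt19937 (seed 5489) is 4123659995. This guarantee covers only the raw engine output. Distributions such as std::uniform_int_distribution are not specified as tightly, so their results may differ between standard libraries.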
srand() & rand() are not part of the STL. They're actually part of the C runtime.
Yes, they will produce the same results as long as it's the same implementation of srand()/rand().
Depending on your needs, you might want to consider using Boost.Random. It provides several high-quality random number generators.
Assuming the implementations of rand() are the same, yes.
The easiest way to ensure this is to include a known rand() implementation with your program - either included in your project's source code or in the form of a library you can manage.
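As an illustration of shipping the implementation with the program, here is a sketch based on the example rand() given in the C standard, wrapped in a class so the state is private. The class name PortableRand is hypothetical. It only shows the idea; the statistical quality of this generator is poor.

```cpp
#include <cstdint>

// The example LCG from the C standard, with its state kept in the object.
class PortableRand {
public:
    explicit PortableRand(std::uint32_t seed = 1u) : next_(seed) {}

    // Returns a value in the range 0..32767.
    int operator()() {
        next_ = next_ * 1103515245u + 12345u;  // wraps modulo 2^32
        return static_cast<int>((next_ / 65536u) % 32768u);
    }

private:
    std::uint32_t next_;
};
```

Because the arithmetic is done on a fixed-width type, the same seed gives the same sequence on any platform that compiles it.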
No, the ANSI C standard only specifies that rand() must produce a stream of random integers between 0 and RAND_MAX, which must be at least 32767 (source). This stream must be deterministic only in that, for a given implementation on a given machine, it must produce the same integer stream given the same seed.
You want a portable PRNG. Mersenne Twister (many implementations linked at the bottom) is pretty portable, as is Ben Pfaff's homegrown C99-compliant PRNG. Boost.Random should be fine too; as you're writing your code in C++, using Boost doesn't limit your choice of platforms much (although some "lesser" (i.e. non-compliant) compilers may have trouble with its heavy use of template metaprogramming). This is only really a problem for low-volume embedded platforms and perhaps novel research architectures, so if by "any computer" you mean "any x86/PPC/ARM/SPARC/Alpha/etc. platform that GCC targets", any of the above should do just fine.
Write your own pseudorandom number routine. There are a lot of algorithms documented on the internet, and they have a number of applications where rand isn't good enough (e.g. Perlin Noise).
Try these links for starters:
http://en.wikipedia.org/wiki/Linear_congruential_generator
http://en.wikipedia.org/wiki/Pseudorandom_number_generator
I believe if you supply srand with the same seed, you will get the same results. That's pretty much the definition of a seed in terms of pseudorandom number generators.
Yes. For a given seed (starting value), the sequence of numbers that rand() returns will always be the same.