How to create the same random numbers on different computers with Armadillo?

I am using the Armadillo C++ library, which allows high-performance computation on matrices and vectors. This library has built-in functions to populate its objects with random numbers. I use it for procedural random generation of an object. The object creation is random, but no matter how often I recreate the object, it remains the same as long as the seed remains the same.
The issue is that, although I can set the seed to a fixed value and thus reproduce the same run on my machine, I lose that reproducibility when moving to a different computer. I come from the enchanted land of Matlab, where I can specify the function used to generate pseudo-random numbers, so the generation can be cross-platform if one chooses the function well. But how do I specify the RNG function for Armadillo?
My research has led me to this source documentation, which "details" the process of random number generation:
http://arma.sourceforge.net/internal_docs_4300/a01181_source.html
http://arma.sourceforge.net/internal_docs_4300/a00087.html
But I have no clue what to do here: this code is much more advanced than what I can write. I would appreciate any help!
Thank you guys!
Remarks:
- I do not care how good the random function used is. I just want a fast cross-platform cross-architecture generator. Deterministic randomness is my goal anyway.
- In detail, in case it matters, the machines to consider are Intel processors, Windows or Mac, 32-bit or 64-bit.
- I have read several posts mentioning the use of seeds for randomness, but the problem here seems to be the cross-platform context and the fact that the random generator is buried (to my untrained eyes at least) within Armadillo's code.

In C++98 / C++03 mode, Armadillo will internally use std::rand() for generating random numbers (there's more to it, but that's a good approximation of what's happening).
If you move from one operating system to the next (or across two versions of the same operating system), there is no guarantee that the system provided random number generator will be the same.
If you use Armadillo in C++11 mode, you can use any random number generator you like, with the help of the .imbue() function. Example:
std::mt19937 engine; // Mersenne twister random number engine with default parameters
std::uniform_real_distribution<double> distr(0.0, 1.0);
mat A(123,456);
A.imbue( [&]() { return distr(engine); } ); // fill with random numbers provided by the engine
The Mersenne twister random number engine is provided as standard functionality in C++11. The default parameters should be stable across compiler vendors and versions, and are independent of the operating system.
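One caveat, offered as a sketch rather than as Armadillo's own recommendation: the C++ standard fixes the output sequence of std::mt19937 exactly, but it does not fix the algorithm used inside std::uniform_real_distribution, so different standard libraries may turn the same engine output into different doubles. If bit-identical values across compilers matter, one option is to convert the raw 32-bit engine output by hand. The helper name portable_uniform01 below is hypothetical.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: maps one raw 32-bit draw to a double in [0, 1).
// Only the engine is used, and its sequence is fully specified by the standard.
inline double portable_uniform01(std::mt19937& engine)
{
    const std::uint32_t raw = static_cast<std::uint32_t>(engine());
    return raw / 4294967296.0;   // divide by 2^32; exact in double precision
}
```

It can be passed to .imbue() in the same way as the distribution, for example A.imbue( [&]() { return portable_uniform01(engine); } );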

Related

What is the PRNG behind C++'s rand() function? [duplicate]

Usage of rand() is usually frowned upon despite using a seed via srand(). Why would that be the case? What better alternatives are available?

There are two parts to this story.
First, rand is a pseudorandom number generator. This means it depends on a seed. For a given seed it will always give the same sequence (assuming the same implementation). This makes it not suitable for certain applications where security is of a great concern. But this is not specific to rand. It's an issue with any pseudo-random generator. And there are most certainly a lot of classes of problems where a pseudo-random generator is acceptable. A true random generator has its own issues (efficiency, implementation, entropy) so for problems that are not security related most often a pseudo-random generator is used.
So you analyzed your problem and concluded that a pseudo-random generator is the solution. Here we arrive at the real troubles with the C random library (which includes rand and srand), troubles that are specific to it and make it obsolete (a.k.a. the reasons you should never use rand and the C random library).
One issue is that it has a global state (set by srand). This makes it impossible to use multiple random engines at the same time. It also greatly complicates multithreaded tasks.
The most visible problem is that it lacks a distribution engine: rand gives you a number in the interval [0, RAND_MAX]. It is uniform in this interval, which means that each number in this interval has the same probability of appearing. But most often you need a random number in a specific interval, say [0, 1017]. A commonly used (and naive) formula is rand() % 1018. The issue is that unless RAND_MAX + 1 is an exact multiple of 1018, you won't get a uniform distribution.
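The bias can be made concrete without drawing any random numbers. The sketch below assumes RAND_MAX is 32767 (the minimum the C standard allows), so rand() has 32768 possible outputs, and simply counts how many of them land on each residue modulo 1018. The function name residue_counts is made up for illustration.

```cpp
#include <array>

// Counts how often each residue appears when every value in [0, rand_max]
// is reduced modulo 1018. A uniform mapping would give equal counts everywhere.
inline std::array<int, 1018> residue_counts(long long rand_max)
{
    std::array<int, 1018> counts{};          // zero-initialised
    for (long long r = 0; r <= rand_max; ++r)
        ++counts[static_cast<std::size_t>(r % 1018)];
    return counts;
}
```

With rand_max = 32767, residues 0 to 191 are each hit 33 times and residues 192 to 1017 only 32 times, because 32768 = 32 * 1018 + 192. std::uniform_int_distribution<int>(0, 1017) from <random> is designed to avoid this kind of bias.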
Another issue is the Quality of Implementation of rand. There are other answers here detailing this better than I could, so please read them.
In modern C++ you should definitely use the <random> header of the C++ standard library, which comes with multiple well-defined random engines and various distributions for integer and floating-point types.

None of the answers here explains the real reason rand() is bad.
rand() is a pseudo-random number generator (PRNG), but this doesn't mean it must be bad. Actually, there are very good PRNGs, which are statistically hard or impossible to distinguish from true random numbers.
rand() is completely implementation defined, but historically it is implemented as a Linear Congruential Generator (LCG), which is usually a fast, but notoriously bad class of PRNGs. The lower bits of these generators have much lower statistical randomness than the higher bits and the generated numbers can produce visible lattice and/or planar structures (the best example of that is the famous RANDU PRNG). Some implementations try to reduce the lower bits problem by shifting the bits right by a pre-defined amount, however this kind of solution also reduces the range of the output.
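The low-bit weakness is easy to see with a toy generator. The sketch below is a minimal LCG with modulus 2^32 that returns its raw state; the struct name TinyLcg and the choice of constants are illustrative and not taken from any particular rand() implementation.

```cpp
#include <cstdint>

// Minimal LCG with modulus 2^32 (the natural wrap-around of uint32_t).
// With an odd multiplier and an odd increment, the lowest bit of the state
// simply flips on every step, so it has period 2.
struct TinyLcg {
    std::uint32_t state;

    std::uint32_t next()
    {
        state = state * 1103515245u + 12345u;
        return state;                 // raw state, no bits discarded
    }
};
```

Starting from state 1, the expression next() & 1 yields 0, 1, 0, 1, and so on, which is why code such as x % 2 on the raw output of this kind of generator is a poor coin flip.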
Still, there are notable examples of excellent LCGs, like L'Ecuyer's 64 and 128 bits multiplicative linear congruential generators presented in Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure, Pierre L'Ecuyer, 1999.
The general rule of thumb is: don't trust rand(); use your own pseudo-random number generator that fits your needs and usage requirements.

What is bad about rand/srand is that rand:
- uses an unspecified algorithm for the sequence of numbers it generates, yet
- allows that algorithm to be initialized with srand for repeatable "randomness".
These two points, taken together, hamper the ability of implementations to improve on rand's implementation (e.g., to use a cryptographic random number generator [RNG] or an otherwise "better" algorithm for producing pseudorandom numbers). For example, JavaScript's Math.random and FreeBSD's arc4random don't have this problem, since they don't allow applications to seed them for repeatable "randomness" — it's for exactly this reason that the V8 JavaScript engine was able to change its Math.random implementation to a variant of xorshift128+ while preserving backward compatibility. (On the other hand, letting applications supply additional data to supplement "randomness", as in BCryptGenRandom, is less problematic; even so, however, this is generally seen only in cryptographic RNGs.)
Also:
The fact that the algorithm and the seeding procedure for rand and srand are unspecified means that even reproducible "randomness" is not guaranteed between rand/srand implementations, between versions of the same standard library, between operating systems, etc.
If srand is not called before rand is, rand behaves similarly as though srand(1) were first called. In practice, this means that rand can only be implemented as a pseudorandom number generator (PRNG) rather than as a nondeterministic RNG, and that rand's PRNG algorithm can't differ in a given implementation whether the application calls srand or not.
EDIT (Jul. 8, 2020):
There is one more important thing that's bad about rand and srand. Nothing in the C standard for these functions specifies a particular distribution that the "pseudo-random numbers" delivered by rand have to follow, including the uniform distribution or even a distribution that approximates the uniform distribution. Contrast this with C++'s uniform_int_distribution and uniform_real_distribution classes, as well as the specific pseudorandom generator algorithms specified by C++, such as linear_congruential_engine and mt19937.
EDIT (begun Dec. 12, 2020):
Yet another bad thing about rand and srand: srand takes a seed that can only be as big as an unsigned. unsigned must be at least 16 bits and in most mainstream C implementations, unsigned is either 16 or 32 bits depending on the implementation's data model (notably not 64 bits even if the C implementation adopts a 64-bit data model). Thus, no more than 2^N different sequences of numbers can be selected this way (where N is the number of bits in an unsigned), even if the underlying algorithm implemented by rand can produce many more different sequences than that (say, 2^128 or even 2^19937 as in C++'s mt19937).
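For contrast, the engines in <random> can be seeded from more than one word through std::seed_seq. The sketch below is only an illustration of that mechanism; the function name make_engine and the choice of four 32-bit words are assumptions.

```cpp
#include <cstdint>
#include <random>

// Seeds a Mersenne Twister from four 32-bit words (128 bits of seed material)
// instead of the single unsigned value that srand() accepts.
inline std::mt19937 make_engine(std::uint32_t a, std::uint32_t b,
                                std::uint32_t c, std::uint32_t d)
{
    std::seed_seq seq{a, b, c, d};   // expands the words into the engine state
    return std::mt19937(seq);
}
```

Two engines built from the same four words produce the same sequence, and changing any one word selects a different one.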

Firstly, srand() doesn't get a seed, it sets a seed. Seeding is part of the use of any pseudo random number generator (PRNG). When seeded the sequence of numbers that the PRNG produces from that seed is strictly deterministic because (most?) computers have no means to generate true random numbers. Changing your PRNG won't stop the sequence from being repeatable from the seed and, indeed, this is a good thing because the ability to produce the same sequence of pseudo-random numbers is often useful.
So if all PRNGs share this feature with rand(), why is rand() considered bad? Well, it comes down to the "pseudo" part of pseudo-random. We know that a PRNG can't be truly random, but we want it to behave as close to a true random number generator as possible, and there are various tests that can be applied to check how similar a PRNG sequence is to a true random sequence. Although its implementation is unspecified by the standard, rand() in every commonly used compiler uses a very old method of generation suited to very weak hardware, and the results it produces fare poorly on these tests. Since then many better random number generators have been created, and it is best to choose one suited to your needs rather than relying on the poor-quality one likely to be provided by rand().
Which is suitable for your purposes depends on what you are doing: for example, you may need cryptographic quality or multi-dimensional generation. But for many uses, where you simply want things to be fairly uniformly random, generation to be fast, and no money is on the line based on the quality of the results, you likely want the xoroshiro128+ generator. Alternatively, you could use one of the generators in C++'s <random> header. These are not state of the art, and much better is now available, but they are still good enough for most purposes and quite convenient.
If money is on the line (e.g. for card shuffling in an online casino), or you need cryptographic quality, you need to carefully investigate appropriate generators and ensure they exactly match your specific needs.
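For reference, xoroshiro128+ is small enough to carry in a project. The sketch below follows the public-domain reference code by Blackman and Vigna as I understand it (the 2018 constants 24, 16 and 37); the struct name is hypothetical, and the constants should be checked against the authors' published source before relying on them. It uses only 64-bit integer arithmetic, so the output does not depend on the platform.

```cpp
#include <cstdint>

// Sketch of xoroshiro128+; constants assumed from the 2018 reference code.
struct Xoroshiro128Plus {
    std::uint64_t s0, s1;   // state; must not be all zero

    static std::uint64_t rotl(std::uint64_t x, int k)
    {
        return (x << k) | (x >> (64 - k));   // k is always in 1..63 here
    }

    std::uint64_t next()
    {
        const std::uint64_t a = s0;
        std::uint64_t b = s1;
        const std::uint64_t result = a + b;  // output is the sum of the state words

        b ^= a;
        s0 = rotl(a, 24) ^ b ^ (b << 16);
        s1 = rotl(b, 37);
        return result;
    }
};
```

A state such as Xoroshiro128Plus g{1, 2} is enough for experimenting; for real use the two state words would normally be filled from a better-mixed seed.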

rand is usually (but not always), for historical reasons, a very bad pseudo-random number generator (PRNG). How bad it is depends on the implementation.
C++11 has nice, much better, PRNGs. Use its <random> standard header. See notably the documentation for std::uniform_int_distribution, which has a nice example built on std::mersenne_twister_engine.
PRNGs are a very tricky subject. I know nothing about them, but I trust the experts.

Let me add another reason that makes rand() totally unusable: the standard does not define any characteristic of the random numbers it generates, neither distribution nor range.
Without a defined distribution we can't even wrap it to get the distribution we want.
Even further, theoretically I can implement rand() by simply returning 0 and announcing that the RAND_MAX of my rand() is 0.
Or even worse, I can let the least significant bit always be 0, which doesn't violate the standard. Imagine someone writing code like if (rand()%2) ....
Practically, rand() is implementation-defined, and the standard says:
There are no guarantees as to the quality of the random sequence produced and some implementations
are known to produce sequences with distressingly non-random low-order bits. Applications with
particular requirements should use a generator that is known to be sufficient for their needs
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf p36

If you use rand(), you will basically get the same sequence of numbers every time you start from the same seed.
So even after using srand(), it will be easy to predict the numbers generated if someone can guess the seed you use. This is because rand() uses a specific algorithm to produce these numbers.
With some time to waste, you can figure out how to predict the numbers generated by the function, given the seed. All you need then is to guess the seed. Some people use the current time as the seed, so if I can guess the time at which you ran the application, I'll be able to predict the numbers.
IT IS BAD TO USE RAND()!!!!

Is there a way to check if std::random_device is in fact random?

Quoting from cppreference:
std::random_device is a non-deterministic random number engine, although implementations are allowed to implement std::random_device using a pseudo-random number engine if there is no support for non-deterministic random number generation.
Is there a way to check whether current implementation uses PRNG instead of RNG (and then say exit with an error) and if not, why not?
Note that a little bit of googling shows that at least MinGW implements std::random_device in this way, and thus this is a real danger if std::random_device is to be used.
---edit---
Also, if the answer is no and someone could give some insight as to why there is no such function/trait/something I would be quite interested.

Is there a way to check whether current implementation uses PRNG instead of RNG (and then say exit with an error) and if not, why not?
There is a way: std::random_device::entropy will return 0.0 if it is implemented in terms of a random number engine (that is, it's deterministic).
From the standard:
double entropy() const noexcept;
Returns: If the implementation employs a random number engine, returns 0.0. Otherwise, returns an entropy estimate for the random numbers returned by operator(), in the range min() to log_2(max() + 1).
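A minimal sketch of such a check follows; the function name reports_nondeterministic is made up. Treat the result as a hint only: as far as I know, some standard libraries return a fixed value from entropy() regardless of the source they actually use, so a 0.0 does not prove the device is a PRNG and a non-zero value does not prove it is not.

```cpp
#include <random>

// Returns true when the implementation reports a non-zero entropy estimate.
// false means "reported as a random number engine", per the wording quoted above.
inline bool reports_nondeterministic()
{
    std::random_device rd;
    return rd.entropy() > 0.0;
}
```

A program that insists on a non-deterministic source could call this at start-up and exit with an error when it returns false.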

There is no 100% safe way to determine real randomness for sure. With a black-box approach, the best you can do is to show evidence that it is not fully random:
First, you could check that the distribution looks random by generating a lot of random numbers and computing statistics on their distribution (e.g. generate 1 million random numbers between 0 and 1000). If some numbers come out significantly more often than others, then obviously it's not really random.
Next, you can run a program several times that generates random numbers from the same initial seed. If you obtain the same sequence of random numbers, then it's definitely a PRNG and not real randomness. However, if you don't obtain the same sequence, it does not prove anything: the library could use some kind of auto-seed (using clock ticks or something else) to hide/improve the pseudo-randomness.
If your application highly depends on randomness quality (e.g. cryptographic quality) you should consider some more tests, such as those recommended by NIST SP 800-22

Xarn stated above:
However, said pessimism also precludes this method from differentiating between RNG and PRNG based implementation, making it rather unhelpful. Also VC++ could be realistic, but to check that would probably require a lot of insider knowledge about Windows.
If you debug into the Windows implementation, then you will find that you end up in RtlGenRandom, which is one of the better sources of cryptographically random bytes. If you debug into the Linux implementation, then you should end up reading from /dev/urandom, which is also OK. The fact that they don't tell us that we're not using something awful, like rand, is annoying.
PS - you don't have to have internal Windows knowledge, you just need to attach the symbols to the debugger.

How to ensure unique seeds for the RNG on subsequent process launches?

Summary: I need a simple self-contained way to seed my RNG so that the seed is different every time the program is launched.
Details:
I often need to run the same program (which does calculations with random numbers, e.g. Monte Carlo simulation etc.) many times to have good statistics on the result. In this case it is important that the random number generator will have a different seed on each run.
I would like to have a simple, cross-platform solution for this that can be contained within the program itself. (I.e. I don't want to always go to the trouble of having a script that launches each instance of the program with a different seed parameter.)
Note that using time(0) as a seed is not a good solution because the timer resolution is bad: if several processes are launched in parallel, they are likely to get the same seed from time(0).
Requirements:
as simple as possible
cross platform (currently I need it to work on Windows & Linux, x86 & x64 only).
self contained: shouldn't rely on a special way of launching the program (passing the seed as a parameter from the launch script is too much trouble).
I'd like to wrap the whole thing into a small library that I can include in any new project with minimal effort and just do something like SeedMyRNG(getSeed());
EDIT:
Although my main question was about doing this in C (or C++), based on the pointers provided in the answer I found os.urandom() as a Python solution (which is also useful for me).
Related relevant question: How to use /dev/random or urandom in C?

"Cross-platform" is a subjective term. Do you mean "any platform" (you might encounter in the future) or "every platform" (on your list of supported platforms)? Here's a pragmatic approach that I usually take:
Check if you have /dev/urandom; if yes, seed from there.
On Windows, use CryptGenRandom().
If all else fails, seed from time().
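The first and last steps of that list can be sketched in portable C++ as follows. The name get_seed is hypothetical, and the Windows CryptGenRandom() branch is deliberately left out to keep the example free of platform headers; on Windows this sketch would therefore fall through to the clock.

```cpp
#include <chrono>
#include <cstdint>
#include <fstream>

// Hypothetical helper: read 8 bytes from /dev/urandom when it exists,
// otherwise fall back to a high-resolution clock reading.
inline std::uint64_t get_seed()
{
    std::uint64_t seed = 0;
    std::ifstream urandom("/dev/urandom", std::ios::in | std::ios::binary);
    if (urandom && urandom.read(reinterpret_cast<char*>(&seed), sizeof seed))
        return seed;

    // Fallback: finer-grained than time(0), but still guessable and not
    // guaranteed to differ between processes started at the same instant.
    const auto now = std::chrono::high_resolution_clock::now();
    return static_cast<std::uint64_t>(now.time_since_epoch().count());
}
```

The returned value can then be handed to whatever seeding function the chosen generator provides.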

You could use /dev/random on Linux and the crypto API on Windows. Write a small library to present a platform-independent interface and it should do exactly what you want.

Check out RandomLib
which is a C++ random number library with good support for seeds. In
particular
Random r;
r.Reseed();
causes r to be seeded with a vector of numbers (from a call to
RandomSeed::SeedVector()) which is almost certainly unique. This
includes the time, microseconds, pid, hostid, year.
Less optimally, you can also seed with RandomSeed::SeedWord() which
reads from /dev/urandom if possible. However, you will typically get a
seed collision after 2^16 runs with a single 32-bit word as your seed.
So, if your application is run many times, you are better off using the
bigger seed space offered by a vector.
Of course, this supposes that you are using a random number generator
that can make use of a vector seed. RandomLib offers MT19937 and
SFMT19937, which both use vector seeds.

Update on 2014-08-04:
Boost has a cross-platform implementation now, random_device. Here's an example for seeding a pseudo-random generator from boost using an unpredictable seed:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/random_device.hpp>
boost::random::mt11213b rng( (boost::random_device())() );

C++. Is it possible that a RNG gives different random variable in two different machines using the same seed?

I have this long and complex source code that uses an RNG with a fixed seed.
This code is a simulator, and the parameters of this simulator are the random values given by this RNG.
When I execute the code on the same machine, the output is the same no matter how many times I run it. But when I execute this code on two different machines and compare the outputs, they are different.
Is it possible that two different machines somehow give different output using the same random number generator and the same seed?
The compiler version, the libraries and the OS are the same.

It is certainly possible, as the RNG may be combining machine specific data with the seed, such as the network card address, to generate the random number. It is basically implementation specific.

As they do give different results it is obviously possible that they give different results. Easy-to-answer question, next!
Seriously: without knowing the source code to the RNG it’s hard to know whether you’re observing a bug or a feature. But it sounds like the RNG in question is using a second seed from somewhere else, e.g. the current time, or some hardware-dependent value like the network card’s MAC address.

If you need something that can be repeated from machine to machine, try the Boost Random Number Library.

If it's a pseudo random generator that uses nothing but the seed to produce a number sequence, then by definition they cannot be different. However, if the ones you're using are using something machine dependent to perturb the seed, or quite simply, a different algorithm, it's of course quite possible. Which implementation are you using, and if it's a standard library implementation, are they both the same version?

Yes. There are floating-point RNGs, for instance, whose results can depend on whether your CPU properly implements IEEE floats (not guaranteed in ISO C++). Also, effects such as spilling 80-bit doubles to memory can influence results.
There is also some possible confusion about the notion of a "seed". Some people define the seed as all input used to set the initial state of the RNG. Others restrict it to only the explicit input in code, and exclude implicit input from e.g. HW sources or /dev/random.

Perhaps it's a little/big endian problem, or the code detects the processor in some way. Easiest way to do this would be to use breakpoints or similar debug routines to watch the seeding routines and the RNG itself at work.

It depends greatly on which RNG you are using. Things like random(3) or the rand48(3) family are designed to return the same sequence when run with the same seed. Now, if the RNG you are using takes /dev/random output, all bets are off and results will be different.

Deterministic Random Number Streams in C++ STL

I want to supply a number, and then receive a set of random numbers. However, I want those numbers to be the same regardless of which computer I run it on (assuming I supply the same seed).
Basically my question is: in C++, if I make use of rand(), but supply srand() with a user-defined seed rather than the current time, will I be able to generate the same random number stream on any computer?

There are dozens of PRNGs available as libraries. Pick one. I tend to use Mersenne Twister.
By using an externally supplied library, you bypass the risk of a weird or buggy implementation of your language's library rand(). As long as your platforms all conform to the same mathematical semantics, you'll get consistent results.
MT is a favorite of mine because I'm a physicist, and I use these things for Monte Carlo, where the guarantee of equal-distribution to high dimensions is important. But don't use MT as a cryptographic PRNG!

srand() & rand() are not part of the STL. They're actually part of the C runtime.
Yes, they will produce the same results as long as it's the same implementation of srand()/rand().
Depending on your needs, you might want to consider using Boost.Random. It provides several high-quality random number generators.

Assuming the implementations of rand() are the same, yes.
The easiest way to ensure this is to include a known rand() implementation with your program - either included in your project's source code or in the form of a library you can manage.
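As one illustration of bundling a known implementation, the sketch below is modelled on the sample rand()/srand() printed in the C standard, rewritten with a fixed-width integer so that it behaves the same on every platform. The class name BundledRand is made up, and the statistical quality is poor; the only point is reproducibility.

```cpp
#include <cstdint>

// Portable stand-in for rand()/srand(), modelled on the sample implementation
// in the C standard. Each instance carries its own state (no global seed).
class BundledRand {
public:
    explicit BundledRand(std::uint32_t seed = 1) : next_(seed) {}

    // Returns a value in [0, 32767].
    int operator()()
    {
        next_ = next_ * 1103515245u + 12345u;            // wraps modulo 2^32
        return static_cast<int>((next_ / 65536u) % 32768u);
    }

private:
    std::uint32_t next_;
};
```

Because the state lives in the object, the same seed gives the same stream on any machine that compiles this code, independently of the C runtime's rand().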

No, the ANSI C standard only specifies that rand() must produce a stream of random integers between 0 and RAND_MAX, which must be at least 32767 (source). This stream must be deterministic only in that, for a given implementation on a given machine, it must produce the same integer stream given the same seed.
You want a portable PRNG. Mersenne Twister (many implementations linked at the bottom) is pretty portable, as is Ben Pfaff's homegrown C99-compliant PRNG. Boost.Random should be fine too; as you're writing your code in C++, using Boost doesn't limit your choice of platforms much (although some "lesser" (i.e. non-compliant) compilers may have trouble with its heavy use of template metaprogramming). This is only really a problem for low-volume embedded platforms and perhaps novel research architectures, so if by "any computer" you mean "any x86/PPC/ARM/SPARC/Alpha/etc. platform that GCC targets", any of the above should do just fine.

Write your own pseudorandom number routine. There are a lot of algorithms documented on the internet, and they have a number of applications where rand isn't good enough (e.g. Perlin Noise).
Try these links for starters:
http://en.wikipedia.org/wiki/Linear_congruential_generator
http://en.wikipedia.org/wiki/Pseudorandom_number_generator

I believe that if you supply srand with the same seed, you will get the same results. That's pretty much the definition of a seed in terms of pseudo-random number generators.

Yes. For a given seed (starting value), the sequence of numbers that rand() returns will always be the same.
