How to set seed dynamically for LLVM Random Number Generator? - llvm

According to the documentation for the LLVM random number generator:
The seed should be set by passing the -rng-seed= option.
So this is a static parameter?
Can I have something like C's srand(seed)? That is, can I use a runtime value to initialize a random number generator in LLVM?

AFAIK, there's no alternative: you need to use the -rng-seed option if you want to set the seed. The seed value is a global static in that compilation unit, so it will always be overridden by the command-line option.
You could emulate runtime seeding by drawing a seed from /dev/urandom and passing it through the command-line option above (especially if you wrap the pass invocation in a script), with something like this:
head -100 /dev/urandom | cksum | awk '{print $1}'
This will allow you to keep the flexibility of having the same pseudo-random sequence generated for debugging purposes, without having to recompile.

Related

How to properly seed a 64 bit random generator with time [duplicate]

I'm working on a program that runs Monte Carlo simulation; specifically, I'm using a Metropolis algorithm. The program needs to generate possibly billions of "random" numbers. I know that the Mersenne twister is very popular for Monte Carlo simulation, but I would like to make sure that I am seeding the generator in the best way possible.
Currently I'm computing a 32-bit seed using the following method:
mt19937_64 prng; //pseudo random number generator
unsigned long seed; //store seed so that every run can follow the same sequence
unsigned char seed_count; //to help keep seeds from repeating because of temporal proximity
unsigned long genSeed() {
    return ( static_cast<unsigned long>(time(NULL)) << 16 )
         | ( (static_cast<unsigned long>(clock()) & 0xFF) << 8 )
         | ( (static_cast<unsigned long>(seed_count++) & 0xFF) );
}
//...
seed = genSeed();
prng.seed(seed);
I have a feeling there are much better ways to ensure non-repeating new seeds, and I'm quite sure mt19937_64 can be seeded with more than 32 bits. Does anyone have any suggestions?
Use std::random_device to generate the seed. It'll provide non-deterministic random numbers, provided your implementation supports it. Otherwise it's allowed to use some other random number engine.
std::mt19937_64 prng;
seed = std::random_device{}();
prng.seed(seed);
operator() of std::random_device returns an unsigned int, so if your platform has 32-bit ints, and you want a 64-bit seed, you'll need to call it twice.
std::mt19937_64 prng;
std::random_device device;
seed = (static_cast<uint64_t>(device()) << 32) | device();
prng.seed(seed);
Another available option is using std::seed_seq to seed the PRNG. This allows the PRNG to call seed_seq::generate, which produces a non-biased sequence over the range 0 ≤ i < 2^32, with an output range large enough to fill its entire state.
std::mt19937_64 prng;
std::random_device device;
std::seed_seq seq{device(), device(), device(), device()};
prng.seed(seq);
I'm calling the random_device 4 times to create a 4 element initial sequence for seed_seq. However, I'm not sure what the best practice for this is, as far as length or source of elements in the initial sequence is concerned.
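As a rough sketch of the "fill the entire state" idea, the following seeds a std::mt19937_64 with as many 32-bit words as its state holds. The helper name make_seeded_engine is hypothetical, and the choice of state_size * 2 words (one 64-bit state word is two 32-bit words) is an assumption about what counts as "enough", not an established best practice. It also assumes std::random_device is a usable entropy source on the platform.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical helper: seed a 64-bit Mersenne Twister from random_device,
// providing one 32-bit word of input per 32 bits of engine state.
std::mt19937_64 make_seeded_engine() {
    constexpr std::size_t kWords = std::mt19937_64::state_size * 2;
    std::array<std::uint32_t, kWords> entropy;

    std::random_device device;
    // std::ref avoids copying the device; each call yields one unsigned int.
    std::generate(entropy.begin(), entropy.end(), std::ref(device));

    std::seed_seq seq(entropy.begin(), entropy.end());
    std::mt19937_64 engine(seq);
    return engine;
}
```

If reproducibility matters, the entropy array would also have to be logged, since a single stored integer can no longer reproduce the run.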
Let's recap (including the comments). We want to generate different seeds to get independent sequences of random numbers in each of the following cases:
1. The program is relaunched on the same machine later,
2. Two threads are launched on the same machine at the same time,
3. The program is launched on two different machines at the same time.
Case 1 is solved using the time since the epoch, case 2 with a global atomic counter, and case 3 with a platform-dependent ID (see How to obtain (almost) unique system identifier in a cross platform way?).
Now the point is what is the best way to combine them to get a uint_fast64_t (the seed type of std::mt19937_64)? I assume here that we do not know a priori the range of each parameter or that they are too big, so that we cannot just play with bit shifts getting a unique seed in a trivial way.
A std::seed_seq would be the easy way to go; however, its result type, uint_least32_t, is not our best choice.
A good 64-bit hasher is a much better choice. The standard library offers std::hash in the <functional> header; one possibility is to concatenate the three numbers above into a string and pass it to the hasher. The return type is size_t, which on 64-bit machines is very likely to match our requirements.
Collisions are unlikely but of course possible. If you want to be sure not to build up statistics that include a sequence more than once, you can simply store the seeds and discard the duplicated runs.
A std::random_device could also be used to generate the seeds (collisions may still happen; it is hard to say whether more or less often). However, since the implementation is library dependent and may fall back to a pseudo-random generator, it is mandatory to check the entropy of the device and avoid using a zero-entropy device for this purpose, as you would probably break the points above (especially point 3). Unfortunately, you can discover the entropy only when you take the program to the specific machine and test with the installed library.
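A minimal sketch of the hashing approach described above, assuming the machine identifier is already available as a string (how to obtain it is platform dependent and not shown). The names combine_seed and make_seed are hypothetical, and the ':' separator is an arbitrary choice for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>

// Deterministic part: join the three ingredients into one string and hash it.
std::uint64_t combine_seed(long long time_ticks, std::uint64_t counter,
                           const std::string& machine_id) {
    const std::string key = std::to_string(time_ticks) + ':' +
                            std::to_string(counter) + ':' + machine_id;
    // std::hash returns size_t; this is 64 bits wide only on 64-bit platforms.
    return static_cast<std::uint64_t>(std::hash<std::string>{}(key));
}

// Convenience wrapper: current time, a process-wide atomic counter, machine ID.
std::uint64_t make_seed(const std::string& machine_id) {
    static std::atomic<std::uint64_t> counter{0};
    const auto ticks =
        std::chrono::system_clock::now().time_since_epoch().count();
    return combine_seed(static_cast<long long>(ticks), counter.fetch_add(1),
                        machine_id);
}
```

Keeping the deterministic part separate makes it possible to log the three inputs and recompute the same seed later when a run has to be reproduced.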
As far as I can tell from your comments, it seems that what you are interested in is ensuring that if a process starts several of your simulations at exactly the same time, they will get different seeds.
The only significant problem I can see with your current approach is a race condition: if you are going to start multiple simulations simultaneously, it must be done from separate threads. If it is done from separate threads, you need to update seed_count in a thread-safe manner, or multiple simulations could end up with the same seed_count. You could simply make it an std::atomic<int> to solve that.
Beyond that, it just seems more complicated than it has to be. What do you gain by using two separate timers? You could do something as simple as this:
At program startup, grab the current system time (using a high-resolution timer) once, and store it.
Assign each simulation a unique ID. This could just be an integer initialized to 0 and incremented each time a simulation starts (without any race conditions, as mentioned above), effectively like your seed_count.
When seeding a simulation, just use the initially generated timestamp + the unique ID. If you do this, every simulation in the process is assured a unique seed.
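The three steps above could be sketched roughly as follows. The class name SeedSource is hypothetical, and the second constructor, which takes a fixed base value, is an addition for reproducing a run rather than something the answer itself proposes.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper: one timestamp taken at construction plus an atomic ID.
class SeedSource {
public:
    // Step 1: grab a high-resolution timestamp once.
    SeedSource()
        : SeedSource(static_cast<std::uint64_t>(
              std::chrono::high_resolution_clock::now()
                  .time_since_epoch()
                  .count())) {}

    // Fixed base, useful when a previous run has to be repeated.
    explicit SeedSource(std::uint64_t base) : base_(base) {}

    // Steps 2 and 3: hand out base + unique ID; fetch_add keeps the ID
    // race-free when several threads ask for a seed at the same time.
    std::uint64_t next() { return base_ + next_id_.fetch_add(1); }

private:
    const std::uint64_t base_;
    std::atomic<std::uint64_t> next_id_{0};
};
```

A single SeedSource would be created at startup and shared by all simulations, each of which calls next() once before constructing its engine.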
How about...
There is some main code that starts the threads, and there are copies of a function run in those threads, each copy with its own Mersenne Twister. Am I correct? If so, why not use another random generator in the main code? It would be seeded with a timestamp and would send its consecutive pseudo-random numbers to the function instances as their seeds.
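A small sketch of that idea, assuming std::mt19937_64 is used for both the master generator and the workers. The function name make_worker_engines is hypothetical; the master seed is passed in explicitly so that a run can be repeated.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: a master engine hands out one seed per worker engine.
std::vector<std::mt19937_64> make_worker_engines(std::uint64_t master_seed,
                                                 std::size_t worker_count) {
    std::mt19937_64 master(master_seed);
    std::vector<std::mt19937_64> workers;
    workers.reserve(worker_count);
    for (std::size_t i = 0; i < worker_count; ++i) {
        // Each consecutive output of the master becomes a worker's seed.
        workers.emplace_back(master());
    }
    return workers;
}
```

The engines are created in the main thread and then moved or copied into the threads, so no synchronization on the master generator is needed.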
From the comments I understand you want to run several instances of the algorithm, one instance per thread. And given that the seed for each instance will be generated pretty much at the same time, you want to ensure that these seeds are different. If that is indeed what you are trying to solve, then your genSeed function will not necessarily guarantee that.
In my opinion, what you need is a parallelisable random number generator (RNG). This means that you only need one RNG, which you instantiate with only one seed (which you can generate with your genSeed), and then the sequence of random numbers that would normally be generated in a sequential environment is split into X non-overlapping sequences, where X is the number of threads. There is a very good library that provides this type of RNG in C++ and follows the C++ standard for RNGs; it is called TRNG (http://numbercrunch.de/trng).
Here is a little more information. There are two ways you can achieve non-overlapping sequences per thread. Let's assume that the sequence of random numbers from a single RNG is r = {r(1), r(2), r(3), ...} and you have only two threads. If you know in advance how many random numbers you will need per thread, say M, you can give the first M of the r sequence to the first thread, i.e. {r(1), r(2), ..., r(M)}, and the second M to the second thread, i.e. {r(M+1), r(M+2), ..., r(2M)}. This technique is called block splitting, since you split the sequence into two consecutive blocks.
The second way is to create the sequence for the first thread as {r(1), r(3), r(5), ...} and for the second thread as {r(2), r(4), r(6), ...}, which has the advantage that you do not need to know in advance how many random numbers you will need per thread. This is called leapfrogging.
Note that both methods guarantee that the sequences per thread are indeed non-overlapping. The link I posted above has many examples and the library itself is extremely easy to use. I hope my post helps.
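To illustrate the two splitting schemes without depending on TRNG, here is a sketch built on std::mt19937_64 and its discard() member. This is not TRNG's API; the names block_split_engine and LeapfrogEngine are hypothetical. Skipping with discard() on the standard engine is expected to cost time proportional to the number of skipped values, so this is meant to show the indexing rather than to be an efficient implementation.

```cpp
#include <cstdint>
#include <random>

// Block splitting: thread k starts at position k * block_size of the sequence.
std::mt19937_64 block_split_engine(std::uint64_t seed,
                                   unsigned long long thread_index,
                                   unsigned long long block_size) {
    std::mt19937_64 engine(seed);
    engine.discard(thread_index * block_size);
    return engine;
}

// Leapfrogging: thread k takes elements k, k + n, k + 2n, ... for n threads.
class LeapfrogEngine {
public:
    // thread_count must be at least 1.
    LeapfrogEngine(std::uint64_t seed, unsigned long long thread_index,
                   unsigned long long thread_count)
        : engine_(seed), stride_(thread_count) {
        engine_.discard(thread_index);
    }

    std::uint64_t operator()() {
        const std::uint64_t value = engine_();
        engine_.discard(stride_ - 1);  // skip the other threads' elements
        return value;
    }

private:
    std::mt19937_64 engine_;
    unsigned long long stride_;
};
```

Every thread constructs its own object from the same seed and its own index, so the threads never share an engine.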
The POSIX function gettimeofday(2) gives the time with microsecond precision.
The Linux-specific system call gettid(2) returns the ID number of the current thread.
You should be able to combine the time in seconds since the epoch (which you are already using), the time in microseconds, and the thread ID to get a seed which is always unique on one machine.
If you also need it to be unique across multiple machines, you could consider also getting the hostname, the IP address, or the MAC address.
I would guess that 32 bits is probably enough, since there are over 4 billion unique seeds available. Unless you are running billions of processes, which doesn't seem likely, you should be alright without going to 64 bit seeds.

C++ RNG: how to get different rand generators on different processors?

How do I seed the random generator so that I have different number sequences on different processors?
My first attempt was using the processor's rank as seed.
Then I found out the hard way that srand(0) gives the same sequence as srand(1).
Currently, I'm doing this:
srand(time(NULL) + rank)
Is this an OK approach? Or is there a better way?
Thanks
Generate different seeds with rand() initialized with time(NULL), and pass those seeds to your processes/threads. And yes, use something other than rand().
Calling time() in a multi-threaded environment to seed an RNG is asking for trouble. The threads could all get the same time, they could get different ones, and it's hard to control. Seed the RNG with processor rank, as you were doing initially, but either use a decent RNG that behaves well with any seed, or if you must use rand(), simply fiddle with the rank to get a slightly better seed, e.g. rank * 5 + 123;
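One way to seed by rank with an engine that copes with small, similar seeds is to pass a shared base seed and the rank through std::seed_seq. This is a sketch, not the answerer's code: the function name make_rank_engine is hypothetical, and base_seed is assumed to be a value agreed on by all processes (for example a constant, or a value chosen once and broadcast).

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: derive a per-rank engine from a shared base seed.
std::mt19937 make_rank_engine(std::uint32_t base_seed, std::uint32_t rank) {
    // seed_seq mixes its inputs, so ranks 0 and 1 do not end up with
    // near-identical engine states.
    std::seed_seq seq{base_seed, rank};
    std::mt19937 engine(seq);
    return engine;
}
```

Because no clock is involved, rerunning with the same base seed and the same ranks reproduces the same sequences.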
If you are in a *nix environment, use /dev/random or /dev/urandom as your source of entropy. On Windows, call CryptGenRandom().

C++11: Random Numbers using C++11 Random Number Support Library, Generate Same Sequence Each Run

Question
Is it possible to seed the mt19937_64 engine in such a way that the same sequence is generated each time a program is run?
I presume this is possible, as there is a seed function. However I don't know if this will do what I want it to, or will work as I expect, generating the same sequence each time.
You can either set the seed using seed or call the constructor that takes a seed. In either case ensure you are passing some constant such as 27423.
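For illustration, a short sketch that constructs the engine with a constant seed and collects its first few outputs. The function name first_values is hypothetical; 27423 is just the constant from the answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: the first `count` raw outputs for a given seed.
std::vector<std::uint64_t> first_values(std::uint64_t seed, std::size_t count) {
    std::mt19937_64 engine(seed);  // equivalent to engine.seed(seed) later on
    std::vector<std::uint64_t> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(engine());
    }
    return values;
}
```

Calling first_values(27423, 5) twice returns the same five numbers. The raw engine output is specified by the standard, whereas the distribution classes may produce different values on different standard library implementations.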

Where to initialize random seed for use through multiple random modules?

So, every time I develop something big, with multiple modules coming together to build the final functionality, I run into the same question: where should I initialize the random seed if more than one module needs to use the random function?
If I have a certain class that needs random (e.g. class that initializes itself by sorting an input array with self-implemented quicksort, so I would need a random for the pivot choice), I usually have a private static bool isRandOn; variable, so before I start the random pivot choice, I check that variable and do srand(time(NULL)); if the random is not on yet.
If I have a ton of utility functions in a namespace, I do a very similar thing: I put such a variable in an anonymous namespace inside my utils library, and do the more-or-less same thing as with a class.
The problem I have is when combining those modules. By itself, I know each module will not set the seed more than once. But I want to be able to use any number of my modules together, and I want other people to be able to use one or more of my modules independently of the others...
So, what is the best way to handle multiple modules that need a random seed? Set the seed in each module? Not set the seed at all, but instead document the use of random and make the user initialize the seed if they want to use the module? Something else?
I would suggest using Boost.Random rather than relying on some global state shared at the program level.
Boost.Random has two concepts:
Engine: which generates random numbers
Distributions: which adapt the result from the engines to provide results fitted to a certain distribution (normal (Gaussian), Poisson, ...)
Each module may then have its own Engine, or indeed several of them: there is no specific reason for a given Engine to be shared between several different functions within the same module.
As a final word: whatever you do make sure you have a way to set the seed(s) deterministically for bug repro purposes. Bug repro may benefit from having multiple engines (isolation of the parts helps).
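A sketch of a module that owns its engine and lets the seed be set deterministically. It uses the standard <random> header rather than Boost.Random so that it compiles without extra dependencies; the standard facility has the same engine/distribution split. The class name PivotChooser is hypothetical and picks up the quicksort pivot example from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical module: owns its engine, so it touches no global RNG state.
class PivotChooser {
public:
    // By default the seed comes from random_device; pass a fixed value
    // to reproduce a bug.
    explicit PivotChooser(std::uint64_t seed = std::random_device{}())
        : engine_(seed) {}

    // Returns an index in the closed range [low, high].
    std::size_t pick(std::size_t low, std::size_t high) {
        std::uniform_int_distribution<std::size_t> dist(low, high);
        return dist(engine_);
    }

private:
    std::mt19937_64 engine_;  // the engine: produces raw random numbers
};
```

Two instances built with the same seed make the same choices, which is what makes a failing run repeatable.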
You can make a special "module" for random number generation, and use that from the other parts of your application. Then you only seed once when the random-number module is initialized.
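Such a random-number module could be as small as one accessor function. This is a sketch under assumptions: the name shared_engine is hypothetical, std::mt19937_64 stands in for whatever generator the application uses, and the engine itself is not protected against concurrent use from several threads.

```cpp
#include <random>

// Hypothetical accessor: the engine is created and seeded exactly once,
// on first use, no matter how many modules call this function.
std::mt19937_64& shared_engine() {
    static std::mt19937_64 engine{std::random_device{}()};
    return engine;
}
```

Modules draw numbers through shared_engine(), and a test can call shared_engine().seed(value) up front to get a repeatable sequence.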
@penelope gave a correct answer. Behind rand() there is an algorithm for generating a pseudo-random number sequence. Think of it as a function rand_func(prev_rand) that generates the next pseudo-random number from the previous one. The first time, you call srand(time(NULL)), which in these terms sets prev_rand to the seed, with time(NULL) assumed to be reasonably unpredictable. So you can safely call srand() (which sets prev_rand) multiple times.
The special issue is if you need predictable pseudo-random sequences, for example with srand(0). But that does not seem to be your case.
The best way I came up with to avoid always repeating the same initial random sequence is to do the following in each module where you call the random() function:
#include <stdbool.h>
#include <stdlib.h>

/* Global variable to remember if we already initialized the PRNG */
static bool seed_initialized = false;

/* Helper function to avoid having always the same sequence again and again */
static void
prng_init (unsigned int seed)
{
  if (!seed_initialized)
    {
      srandom (seed);
      seed_initialized = true;
    }
}
And, each time you use random() in a function you start the function with something like:
/* Initializing PRNG with a 'reasonably strong' random seed for our purpose */
prng_init (time (NULL) - getpid());
This way, you ensure that:
You will initialize your PRNG the first time you go through;
You will never initialize the random sequence more than once within the module.
Hope this helps!