Seeding SML/NJ's RNG on a Windows machine

How to seed SML/NJ's random number generator on a Windows machine?
The function Random.rand() takes a pair of integers and uses them to seed the random number generator. Based on my experience with other programming languages, I would expect there to be a relatively easy way to seed it from the system clock (something like srand(time(NULL)); in C). Unless I am overlooking something obvious, there doesn't seem to be any straightforward way, at least if you are using Windows.
The closest I can find to time(NULL) in SML is Posix.ProcEnv.time, which returns Unix epoch time. Unfortunately, the Posix structures are not part of the Windows download, and the Windows structure (which is) doesn't seem to include any direct analogue of time.
The Timer structure does have ways of determining elapsed real time. I could write a function which does about half a second of meaningless calculation, time how long it takes, and figure out a way to extract a couple of integers from that. But: 1) this is an awful lot of work for something which is trivial in most languages, 2) more importantly -- it seems likely to result in the same seed being reused a non-trivial percentage of the times.
Another idea I had is that if I could access the Windows environment variable "TIME" I could use that. The following prints the time to the repl:
OS.Process.system "TIME/T";
but doesn't give any programmatic access to the printed string.
OS.Process.getEnv "TIME";
sounds promising, but returns NONE.
If there really is no easy solution in SML/NJ -- are there options which work for some of the other implementations of SML such as Poly/ML?

The Basis Library's TIME signature has a function for returning the current time.
val now: unit -> t

@matt answered the question itself as far as getting the system clock reading in a portable way. To complement his answer, here is a seed function. One technical problem is that the number of elapsed seconds since 1970 is too large for an SML/NJ 31-bit int. I could use large ints of course, but a simple solution seemed to be to just reduce by 1.48 billion before converting to an int (and to use the fractional part of the time to get the second int seed parameter):
fun seed () =
let
val r = Time.toReal(Time.now()) - 1.48e9
val f = Real.realFloor(r)
val d = r - f
val i = Real.floor(f)
val j = Real.floor(1000.0*d)
in
Random.rand(i,j)
end;
There is almost definitely a more principled way to do this, but the above works:
- val s = seed ();
val s =
RND
{borrow=ref false,congx=ref 0wx4B7CD4CA,index=ref 0,
vals=[|0wx40E9888B,0wx6F1B97FD,0wx4011C479,0wx2012F528,0wx3CDC0237,
0wx7C36E91D,0wx5361B64D,0wx4B61A297,0wx61823821,0wx7C6CD6BD,
0wx1683CA4D,0wx670A75AF,...|]} : Random.rand
- Random.randRange(1,100) s;
val it = 35 : int
- val s = seed ();
val s =
RND
{borrow=ref false,congx=ref 0wx512EBCFC,index=ref 0,
vals=[|0wx456E115A,0wx27817499,0wx46A6BE48,0wx2C79BB3,0wx3FF47B4D,
0wx5B48FC93,0wx53C3647F,0wx32E40F5A,0wx157AB4C8,0wx16E750D,
0wx78BD3EA3,0wx7885CA23,...|]} : Random.rand
- Random.randRange(1,100) s;
val it = 73 : int
Not very exciting, but successive seedings separated by just a few seconds produced different outputs, as expected.

Related

Random Number Generation : same C++ code, two different behaviors

My colleague and I are working on a Monte Carlo project together, in C++. She uses Visual Studio, I use Xcode, we shared the code through git.
We are computing American option prices with a method that requires random number generation. We realized we were getting wrong results for a certain parameter K (the higher the parameter, the more wrong the answer), and my colleague found that changing the random source from the Mersenne Twister to rand() (though a poor generator) made the results good for the whole range of K.
But when I changed the source on my version of the code, it did nothing.
More puzzling for me, I created a new Xcode project, copied inside it her whole source and it still gives me wrong results (while she gets good ones). So it can't stem from the code itself.
I cleaned the project, relaunched Xcode, even restarted my computer (...), but nothing changes: our projects behave consistently but differently, with the same code behind them.
(EDIT: by differently but consistently, I don't mean that we don't have the same sequence of numbers. I mean that her Monte Carlo estimator converges toward 4. and mine towards 3.)
Do you have any idea of what the cause of this dual behavior could be ?
Here is the random generation code :
double loiuniforme() //uniform law
{
return (double)((float)rand() / (float)RAND_MAX);
}
vector<double> loinormale() //normal law
{
vector<double> loinormales(2, 0.);
double u1 = loiuniforme();
double v1 = loiuniforme();
loinormales[0] = sqrt(-2 * log(u1))*cos(2 * M_PI*v1);
loinormales[1] = sqrt(-2 * log(u1))*sin(2 * M_PI*v1);
return(loinormales);
}
EDIT : the MT RNG used before was :
double loiuniforme()
{
mt19937::result_type seed = clock();
auto real_rand = std::bind(std::uniform_real_distribution<double>(0,1), mt19937(seed));
return real_rand();
}

The C++ standard does not specify what algorithm is used by rand(). Whoever wrote the compiler is free to use whatever implementation they want, and there is no guarantee that it will behave the same on two different compilers, on two different architectures or even two different versions of the same compiler.

You should only create one generator and use that for every number.
mt19937::result_type seed = clock();
and
mt19937(seed)
create a new generator, with a new seed, every time you call the function.
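This ruins the randomness: clock() changes little (or not at all) between consecutive calls, so the freshly built generators start from equal or nearly equal seeds and their first outputs are strongly correlated.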
You can use static variables in the function, since these are initialised on the first call:
double loiuniforme()
{
static std::mt19937 generator(clock());
static std::uniform_real_distribution<double> distribution(0, 1);
return distribution(generator);
}
(When you're comparing results with your colleague, use the same hardcoded seed to verify that you are getting the same results.)
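A minimal sketch of the fixed-seed comparison suggested above. The helper name first_outputs and the seed value are made up for illustration. It compares the raw std::mt19937 outputs, whose sequence the C++ standard fixes for a given seed; values produced through a distribution object such as std::uniform_real_distribution may differ between standard library implementations, so they are a less reliable basis for comparing two machines.

```cpp
#include <cassert>   // only needed for the assert-based check in the usage note
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: the first `count` raw outputs of a Mersenne Twister
// constructed from a fixed seed.
std::vector<std::mt19937::result_type>
first_outputs(std::mt19937::result_type seed, std::size_t count)
{
    std::mt19937 generator(seed);            // one generator, seeded once
    std::vector<std::mt19937::result_type> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(generator());          // successive raw 32-bit outputs
    return out;
}
```

Both colleagues could print first_outputs(12345, 5) on their machines; with the same seed the lists should match, and assert(first_outputs(12345, 5) == first_outputs(12345, 5)) holds within one program.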

You need to seed the rand function with the same number on both computers. And even then I'm not sure that the underlying code across computers and operating systems will return the same value.
More importantly, if you want identical results, don't use a random function.

rand() not giving me a random number (even when srand() is used)

Okay I'm starting to lose my mind. All I want to do is generate a random number between 0 and 410, and according to this page, my code should do that. And since I want a random number and not a pseudo-random number, I'm using srand() as well, in the way that e.g. this thread told me to. But this isn't working. All I get is a number that depends on how long it has been since my last execution. If I e.g. execute it again as fast as I can, the number is usually 6 higher than the last number, and if I wait longer, it's higher still, etc. When it reaches 410 it goes back to 0 and begins all over again. What am I missing?
Edit: And oh, if I remove the srand(time(NULL)); line I just get the same number (41) every time I run the program. That's not even pseudo random, that's just a static number. Just copying the first line of code from the article I linked to above still gives me number 41 all the time. Am I the star in a sequel to "The Number 23", or have I missed something?
int main(void) {
srand(time(NULL));
int number = rand() % 410;
std::cout << number << std::endl;
system("pause");
}

That is what you get for using outdated random number generation.
rand produces a fixed sequence of numbers (which by itself is fine), and does that very, very badly.
You tell rand via srand where in the sequence to start. Since your "starting point" (called the seed, btw) depends on the number of seconds since 1.1.1970 0:00:00 UTC, your output is obviously time-dependent.
The correct way to do what you want to do is using the C++11 <random> library. In your concrete example, this would look somewhat like this:
std::mt19937 rng (std::random_device{}());
std::uniform_int_distribution<> dist (0, 409);
auto random_number = dist(rng);
For more information on the evils of rand and the advantages of <random> have a look at this.
As a last remark, seeding std::mt19937 like I did above is not quite optimal because the MT's state space is much larger than the 32 bit you get out of a single call to std::random_device{}(). This is not a problem for toy programs and your standard school assignments, but for reference: Here is my take at seeding the MT's entire state space, plus some helpful suggestions in the answers.
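For illustration, here is a minimal sketch of one common way to feed more than 32 bits into the generator: draw one word per state element from std::random_device and pass them through std::seed_seq. This is not necessarily what the linked answer does, and the function name make_fully_seeded_mt is made up. Note that std::seed_seq mixes its input, so this is an approximation of "seeding the entire state space" rather than a guarantee.

```cpp
#include <algorithm>
#include <array>
#include <cassert>   // only needed for the assert-based check in the usage note
#include <functional>
#include <random>

// Hypothetical helper: build a std::mt19937 whose seed sequence contains
// as many words as the generator has state elements (624).
std::mt19937 make_fully_seeded_mt()
{
    std::array<std::mt19937::result_type, std::mt19937::state_size> seed_data;
    std::random_device source;
    // One call to the random device per state word.
    std::generate(seed_data.begin(), seed_data.end(), std::ref(source));
    std::seed_seq sequence(seed_data.begin(), seed_data.end());
    return std::mt19937(sequence);
}
```

Usage is the same as before: std::mt19937 rng = make_fully_seeded_mt(); followed by std::uniform_int_distribution<> dist(0, 409); and dist(rng), which always yields a value v with assert(v >= 0 && v <= 409).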

From manual:
time() returns the time as the number of seconds since the Epoch,
1970-01-01 00:00:00 +0000 (UTC).
Which means that if you start your program twice both times at the same second you will initialize srand with same value and will get same state of PRNG.
And if you remove initialization via call to srand you will always get exactly same sequence of numbers from rand.
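Both statements can be checked with a small sketch. The helper name draws_after_seeding is made up. The C standard additionally specifies that calling rand before any call to srand behaves as if srand(1) had been called, which is why the unseeded program prints the same number on every run.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: reseed rand() and return the next `count` draws.
std::vector<int> draws_after_seeding(unsigned seed, int count)
{
    assert(count >= 0);
    std::srand(seed);                    // reset the generator state
    std::vector<int> out;
    for (int i = 0; i < count; ++i)
        out.push_back(std::rand());      // same seed, same sequence
    return out;
}
```

Calling draws_after_seeding twice with the same seed (for example the same time(NULL) value from two runs started within one second) returns identical vectors.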

I'm afraid you can't get truly random numbers there. Built-in functions are meant to provide just pseudo-random numbers. That holds for the combination of srand and rand too, because srand only chooses the starting point of the same deterministic algorithm. If you want to cook true random numbers, you must find a proper source of entropy, for example atmospheric noise, which is the approach of www.random.org.
The problem here consists in the seed used by the randomness algorithm: if it's a number provided by a machine, it can't be unpredictable. A normal solution for this is using external hardware.

Unfortunately you can't get a real random number from a computer without specific hardware (which is often too slow to be practical).
Therefore you need to make do with a pseudo generator. But you need to use them carefully.
The function rand is designed to return a number between 0 and RAND_MAX in a way that, broadly speaking, satisfies the statistical properties of a uniform distribution. At best you can expect the mean of the drawn numbers to be 0.5 * RAND_MAX and the variance to be RAND_MAX * RAND_MAX / 12.
Typically the implementation of rand is a linear congruential generator which basically means that the returned number is a function of the previous number. That can give surprisingly good results and allows you to seed the generator with a function srand.
But repeated use of srand ruins the statistical properties of the generator, which is what is happening to you: your use of srand is correlated with your system clock time. The behaviour you're observing is completely expected.
What you should do is to only make one call to srand and then draw a sequence of numbers using rand. You cannot easily do this in the way you've set things up. But there are alternatives; you could switch to a random number generator (say mersenne twister) which allows you to draw the (n)th term and you could pass the value of n as a command line argument.
As a final remark, I'd avoid using a modulus when drawing a number. This creates a statistical bias unless the number of values rand can return (RAND_MAX + 1) is a multiple of your modulus.
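One way to avoid the modulus bias is rejection sampling: throw away the few raw values at the top of the range that would make some remainders more likely than others. The following is a minimal sketch under that assumption; the function name unbiased_rand_below is made up, and it still inherits whatever weaknesses the underlying rand has.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: uniform integer in [0, n) built on rand(),
// for 0 < n <= RAND_MAX.
int unbiased_rand_below(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    // rand() can return RAND_MAX + 1 distinct values; compute in a wide
    // unsigned type so the + 1 cannot overflow.
    const unsigned long long range =
        static_cast<unsigned long long>(RAND_MAX) + 1ULL;
    // Largest multiple of n that fits in the range. Raw values at or above
    // it are rejected, so every remainder is equally likely.
    const unsigned long long limit = range - (range % n);
    unsigned long long r;
    do {
        r = static_cast<unsigned long long>(std::rand());
    } while (r >= limit);
    return static_cast<int>(r % n);
}
```

After a single std::srand(...) call at program start, unbiased_rand_below(410) returns values from 0 to 409.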

Try changing time(NULL) to time(0) (that will give you the current system time). If it doesn't work, you could try to convert time(0) into ms by doing time(0)*1000.

C++ uniform_int_distribution always returning min() on first invocation

In at least one implementation of the standard library, the first invocation of a std::uniform_int_distribution<> does not return a random value, but rather the distribution's min value. That is, given the code:
default_random_engine engine( any_seed() );
uniform_int_distribution< int > distribution( smaller, larger );
auto x = distribution( engine );
assert( x == smaller );
...x will in fact be smaller for any values of any_seed(), smaller, or larger.
To play along at home, you can try a code sample that demonstrates this problem in gcc 4.8.1.
I trust this is not correct behavior? If it is correct behavior, why would a random distribution return this clearly non-random value?

Explanation for the observed behavior
This is how uniform_int_distribution maps the random bits to numbers if the range of possible outcomes is smaller than the range of numbers the rng produces:
const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
__ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;
where __urange is larger - smaller and __urngrange is the difference between the maximum and the minimum value the rng can return. (Code from bits/uniform_int_dist.h in libstdc++ 6.1)
In our case, the rng default_random_engine is a minstd_rand0, which yields __scaling == 195225785 for the range [0,10] you tested with. Thus, if rng() < 195225785, the distribution will return 0.
The first number a minstd_rand0 returns is
(16807 * seed) % 2147483647
(where seed == 0 gets adjusted to 1 btw). We can thus see that the first value produced by a minstd_rand0 seeded with a number smaller than 11615 will yield 0 with the uniform_int_distribution< int > distribution( 0, 10 ); you used. (mod off-by-one-errors on my part. ;) )
You mentioned the problem going away for bigger seeds: As soon as the seeds get big enough to actually make the mod operation do something, we cannot simply assign a whole range of values to the same output by division, so the results will look better.
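The arithmetic above can be replayed in a small sketch. The function name map_like_quoted_code is made up, and the body only mirrors the quoted scaling logic for std::minstd_rand0; it is not the library's actual code path.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: apply the quoted scale-and-reject arithmetic to the
// next output(s) of a minstd_rand0 and return a value in [smaller, larger].
int map_like_quoted_code(std::minstd_rand0& engine, int smaller, int larger)
{
    assert(smaller <= larger);
    const std::uint64_t rng_min = std::minstd_rand0::min();            // 1
    const std::uint64_t rng_range = std::minstd_rand0::max() - rng_min; // 2147483645
    const std::uint64_t uerange =
        static_cast<std::uint64_t>(larger - smaller) + 1;
    const std::uint64_t scaling = rng_range / uerange;
    const std::uint64_t past = uerange * scaling;
    std::uint64_t ret;
    do {
        ret = engine() - rng_min;       // raw output shifted to start at 0
    } while (ret >= past);              // reject the incomplete last bucket
    return smaller + static_cast<int>(ret / scaling);
}
```

With smaller = 0 and larger = 10, scaling is 195225785. A generator seeded with 1 first returns 16807, and under this arithmetic every seed from 1 to 11615 maps to 0, while seed 11616 is the first to map to 1.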
Does that mean (libstdc++'s impl of) <random> is broken?
No. You introduced significant bias in what is supposed to be a random 32 bit seed by always choosing it small. That bias showing up in the results is not surprising or evil. For random seeds, even your minstd_rand0 will yield a fairly uniformly random first value. (Though the sequence of numbers after that will not be of great statistical quality.)
What can we do about this?
Case 1: You want random number of high statistical quality.
For that, you use a better rng like mt19937 and seed its entire state space. For the Mersenne Twister, that's 624 32-bit integers. (For reference, here is my attempt to do this properly with some helpful suggestions in the answer.)
Case 2: You really want to use those small seeds only.
We can still get decent results out of this. The problem is that pseudo random number generators commonly depend "somewhat continuously" on their seed. To get around this, we discard enough numbers to let the initially similar sequences of output diverge. So if your seed must be small, you can initialize your rng like this:
std::mt19937 rng(smallSeed);
rng.discard(700000);
It is vital that you use a good rng like the Mersenne Twister for this. I do not know of any method to get even decent values out of a poorly seeded minstd_rand0, for example see this train-wreck. Even if seeded properly, the statistical properties of a mt19937 are superior by far.
Concerns about the large state space or slow generation you sometimes hear about are usually of no concern outside the embedded world. According to boost and cacert.at, the MT is even way faster than minstd_rand0.
You still need to do the discard trick though, even if your results look good to the naked eye without. It takes less than a millisecond on my system, and you don't seed rngs very often, so there is no reason not to.
Note that I am not able to give you a sharp estimate for the number of discards we need. I took that value from this answer, which links this paper for a rationale. I don't have the time to work through that right now.

How to properly choose rng seed for parallel processes

I'm currently working on a C/C++ project where I'm using a random number generator (gsl or boost). The whole idea can be simplified to a non-trivial stochastic process which receives a seed and returns results. I'm computing averages over different realisations of the process.
So, the seed is important: the processes must be with different seeds or it will bias the averages.
So far, I'm using time(NULL) to give a seed. However, if two processes start at the same second, the seed is the same. That happens because I'm using parallelisation (using openMP).
So, my question is: how to implement a "seed giver" in C/C++ which gives independent seeds?
For instance, I thought of using the thread number (thread_num), seed = time(NULL)*thread_num. However, this means that the seeds are correlated: they are multiples of each other. Does that pose any problem for the "pseudo-randomness", or is it as good as sequential seeds?
The requirements are that it must work on both Mac OS (my pc) and a Linux distribution similar to CentOS (the cluster) (and naturally give independent realisations).

A commonly used scheme for this is to have a "master" RNG used to generate seeds for each process-specific RNG.
The advantage of such a scheme is that the whole computation is determined by only one seed, which you can record somewhere to be able to replay any simulation (this might be useful to debug nasty bugs).
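A minimal sketch of this scheme, assuming std::mt19937 for both the master and the per-process generators. The function name make_worker_generators is made up. Each worker here gets a single 32-bit seed from the master; this makes the run reproducible from one number but does not by itself prove the streams are statistically independent.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: one master generator hands out the seeds for
// `workers` process- or thread-specific generators.
std::vector<std::mt19937>
make_worker_generators(std::mt19937::result_type master_seed,
                       std::size_t workers)
{
    assert(workers > 0);
    std::mt19937 master(master_seed);    // the only seed you need to record
    std::vector<std::mt19937> generators;
    generators.reserve(workers);
    for (std::size_t i = 0; i < workers; ++i)
        generators.push_back(std::mt19937(master()));  // next master output seeds worker i
    return generators;
}
```

With OpenMP, thread t would use generators[omp_get_thread_num()]; replaying a simulation only requires calling make_worker_generators again with the recorded master seed.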

We ran into a similar problem on a Beowulf computing grid, the solution we used was to incorporate the pid of the process into the RNG seed, like so:
time(NULL)*thread_num*getpid()
Of course, you could just read from /dev/urandom or /dev/random into an integer.

When faced with this problem I often use seed_rng from Boost.Uuid. It uses time, clock and random data from /dev/urandom to calculate a seed. You can use it like
#include <boost/uuid/seed_rng.hpp>
#include <iostream>
int main() {
int seed = boost::uuids::detail::seed_rng()();
std::cout << seed << std::endl;
}
Note that seed_rng comes from a detail namespace, so it can go away without further notice. In that case writing your own implementation based on seed_rng shouldn't be too hard.

Mac OS is Unix too, so it probably has /dev/random. If so, that's the
best solution for obtaining the seeds. Otherwise, if the generator is
good, taking time( NULL ) once, and then incrementing it for the seed
of each generator, should give reasonably good results.

If you are on x86 and don't mind making the code non-portable then you could read the Time Stamp Counter (TSC) which is a 64-bit counter that increments at the CPU (max) clock rate (about 3 GHz) and use that as a seed.
#include <stdint.h>
static inline uint64_t rdtsc()
{
uint64_t tsc;
asm volatile
(
"rdtsc\n\t"
"shl\t$32,%%rdx\n\t" // rdx = TSC[ 63 : 32 ] : 0x00000000
"add\t%%rdx,%%rax\n\t" // rax = TSC[ 63 : 0 ]
: "=a" (tsc) : : "%rdx"
);
return tsc;
}

When you compare two infinite sequences produced by the same pseudo-random number generator with different seeds, you can see that they are the same sequence, delayed by some time tau. Usually this time scale is much bigger than your problem, which ensures that the two random walks are uncorrelated.
If your stochastic process is in a high dimensional phase space, I think that one good suggestion could be:
seed = MAXIMUM_INTEGER/NUMBER_OF_PARALLEL_RW*thread_num + time(NULL)
Notice that with this scheme you are not guaranteed that the time tau is big!
If you have some knowledge of your system's time scale, you can call your random number generator some number of times in order to generate seeds that are equidistant by some time interval.

Maybe you could try std::chrono high resolution clock from C++11:
Class std::chrono::high_resolution_clock represents the clock with the
smallest tick period available on the system. It may be an alias of
std::chrono::system_clock or std::chrono::steady_clock, or a third,
independent clock.
http://en.cppreference.com/w/cpp/chrono/high_resolution_clock
BUT tbh I'm not sure that there is anything wrong with srand(0), srand(1), srand(2)..., though my knowledge of rand is very, very basic. :/
For crazy safety consider this:
Note that all pseudo-random number generators described below are
CopyConstructible and Assignable. Copying or assigning a generator
will copy all its internal state, so the original and the copy will
generate the identical sequence of random numbers.
http://www.boost.org/doc/libs/1_51_0/doc/html/boost_random/reference.html#boost_random.reference.generators
Since most of the generators have crazy long cycles you could generate one, copy it as first generator, generate X numbers with original, copy it as second, generate X numbers with original, copy it as third...
If your users call their own generator less than X time they will not be overlapping.
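The copy-and-advance idea can be sketched as follows, assuming std::mt19937 and using its discard member to "generate X numbers" without storing them. The function name make_block_streams is made up. On common implementations discard steps through the numbers one at a time, so the setup cost grows with X times the number of streams.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: stream i is a copy of the original generator taken
// after i * block numbers have been consumed, so streams do not overlap
// as long as each one draws fewer than `block` numbers.
std::vector<std::mt19937>
make_block_streams(std::mt19937::result_type seed,
                   std::size_t streams,
                   unsigned long long block)
{
    assert(block > 0);
    std::mt19937 original(seed);
    std::vector<std::mt19937> result;
    result.reserve(streams);
    for (std::size_t i = 0; i < streams; ++i) {
        result.push_back(original);   // copying duplicates the full internal state
        original.discard(block);      // advance the original by `block` numbers
    }
    return result;
}
```

For example, make_block_streams(7, 3, 1000) yields three generators whose first outputs are the 1st, 1001st, and 2001st numbers of the sequence for seed 7.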

The way I understand your question, you have multiple processes using the same pseudo-random number generation algorithm, and you want each "stream" of random numbers (in each process) to be independent of each other. Am I correct ?
In that case, you are right in suspecting that giving different (correlated) seeds does not guarantee you anything unless the rng algorithm says so. You basically have two solutions:
Simple version
Use a single source of random numbers, with a single seed. Then feed random numbers in a round-robin fashion to each process.
This solution is slow but provides some guarantee that the numbers you give to your processes are OK.
You can do the same thing but generating all the random numbers you need at once, and then splitting this set into as many slices as you have processes.
Use a RNG designed for that
You can find in papers and on the web several algorithms specifically designed to provide independent streams of random numbers from a single initial state. They are complicated but most provide source code. The idea is generally to "split" the RNG space (values you can obtain from the initial state) into various chunks like above. They are just faster because the algorithm used makes it possible to compute easily what would be the state of the RNG if you skipped a given number of values.
These generators are generally called "parallel random number generators".
The most popular ones are probably these two:
RngStreams: http://statmath.wu.ac.at/software/RngStreams/
SPRNG: http://sprng.cs.fsu.edu/
Check their manuals to fully understand what they do, how they do it, and if it really is what you need.

generating a random number within range 0 to n where n can be > RAND_MAX

How can I generate a random number within range 0 to n where n can be > RAND_MAX in c,c++?
Thanks.

split the generation in two phases, then combine the resulting numbers.

Random numbers are a very specialized subject that, unless you are a maths junkie, is very easy to get wrong. So I would advise against building a random number from multiple sources; you should use a good library.
I would first look at boost::Random
If that is not sufficient, try the newsgroup sci.crypt.random-numbers
Ask the question there; they should be able to help.

suppose you want to generate a 64-bit random number, you could do this:
uint64_t n = 0;
for(int i = 0; i < 8; ++i) {
uint64_t x = generate_8bit_random_num();
n = (n << 8) | x; // shift the bits collected so far up by one byte, then append the new byte
}
Of course you could do it 16/32 bits at a time too, but this illustrates the concept.
How you generate that 8/16/32-bit random numbers is up to you. It could be as simple as rand() & 0xff or something better depending on how much you care about the randomness.

Assuming C++, have you tried looking at a decent random number library, like Boost.Random? Otherwise you may have to combine multiple random numbers.

If you're looking for a uniform distribution (or any distribution for that matter), you must take care that the statistical properties of the output are sufficient for your needs. If you can't use the output of a random number generator directly, you should be very careful trying to combine numbers to achieve your needs.
At a bare minimum you should make sure the distribution is appropriate. If you're looking for a uniform distribution of integers from 0 to M, and you have some uniform random number generator g() to produce outputs that are smaller than M, make sure you do not do one of the following:
add k outputs of g() together until they're large enough (the result is nonuniform)
take r = g() + (g() << 16), then compute r % M (if the range of r is not an even multiple of M, it will weight certain values in the range slightly more than others; the shift-left itself is questionable unless g() outputs a range between 0 and a power of 2 minus 1)
Beyond that, there is the potential for cross-correlation between terms of the sequence (random number generators are supposed to produce independent identically-distributed outputs).
Read The Art of Computer Programming vol. 2 (Knuth) and/or Numerical Recipes and ask questions until you feel confident.

If your implementation has an integer type large enough to hold the result you need, it's generally easier to get a decent distribution by simply using a generator that produces the required range than to try to combine outputs from the smaller generator.
Of course, in most cases, you can just download code for something like the Mersenne Twister or (if you need a cryptographic quality generator) Blum-Blum-Shub, and forget about writing your own.
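As a minimal sketch of "a generator that produces the required range", the C++11 standard library already ships a 64-bit Mersenne Twister, so no download is needed when C++11 is available. The function name random_up_to is made up; the distribution takes care of mapping the raw 64-bit outputs onto [0, n].

```cpp
#include <cassert>   // only needed for the assert-based check in the usage note
#include <cstdint>
#include <random>

// Hypothetical helper: uniform integer in [0, n], where n may be far
// larger than RAND_MAX.
std::uint64_t random_up_to(std::mt19937_64& generator, std::uint64_t n)
{
    std::uniform_int_distribution<std::uint64_t> dist(0, n);  // closed range
    return dist(generator);
}
```

Create the generator once, for example std::mt19937_64 gen(std::random_device{}());, and reuse it for every draw; random_up_to(gen, 10000000000ULL) then returns a value v for which assert(v <= 10000000000ULL) holds.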

Do x random numbers (from 0 to RAND_MAX) and add them together, where
x = n % RAND_MAX

Consider a random variable which can take on values {0, 1} with P(0) = P(1) = 0.5. If you want to generate random values between 0 to 2 by summing two independent draws, you will have P(0) = 0.25, P(1) = 0.5 and P(2) = 0.25.
Therefore, use an appropriate library unless you do not care at all about the PDF of the RNG.
See also Chapter 7 in Numerical Recipes. (This is a link to the older edition but that's the one I studied anyway ;-)
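The probabilities quoted above can be verified by enumerating the four equally likely outcomes of two independent draws. This is only a worked check of that example; the function name count_sums_of_two_bits is made up.

```cpp
#include <array>
#include <cassert>   // only needed for the assert-based check in the usage note

// counts[s] = number of pairs (a, b) with a, b in {0, 1} and a + b == s.
std::array<int, 3> count_sums_of_two_bits()
{
    std::array<int, 3> counts{};      // starts as {0, 0, 0}
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            ++counts[a + b];          // each pair has probability 1/4
    return counts;
}
```

The result is {1, 2, 1}: out of 4 outcomes, the sums 0, 1 and 2 occur with probability 0.25, 0.5 and 0.25, so the sum is not uniform.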

There are many ways to do this.
If you are OK with less granularity (higher chance of dupes), then something like (in pseudocode) rand() * n / RAND_MAX will work to spread the values across a larger range. The catch is that in your real code you'll need to avoid overflow, either via casting rand() or n to a large-enough type (e.g. 64-bit int if RAND_MAX is 0xFFFFFFFF) to hold the multiplication result without overflow, or use a multiply-then-divide API (like GNU's MulDiv64 or Win32's MulDiv) which is optimized for this scenario.
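A minimal sketch of the scaling formula with the overflow concern handled by widening to 64 bits. The function name scale_rand and the limit on n are assumptions made for the example. It takes the raw draw as a parameter so that the arithmetic is easy to check; the result can only take RAND_MAX + 1 distinct values, which is the loss of granularity mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: map a raw draw r in [0, RAND_MAX] onto [0, n].
// n is limited to 32 bits here so that r * n always fits in 64 bits.
std::uint64_t scale_rand(int r, std::uint64_t n)
{
    assert(r >= 0 && r <= RAND_MAX);
    assert(n <= 0xFFFFFFFFULL);
    // Multiply first, in 64-bit arithmetic, then divide.
    return static_cast<std::uint64_t>(r) * n
           / static_cast<std::uint64_t>(RAND_MAX);
}
```

Typical use would be scale_rand(std::rand(), n); the extremes are scale_rand(0, n) == 0 and scale_rand(RAND_MAX, n) == n.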
If you want granularity down to each integer, you can call rand() multiple times and append the results. Another answer suggests calling rand() for each 8-bit/16-bit/32-bit chunk depending on the size of RAND_MAX.
But, IMHO, the above ideas can rapidly get complicated, inaccurate, or both. Generating random numbers is a solved problem in other libraries, and it's probably much easier to borrow existing code (e.g. from Boost) than try to roll your own. Open source random number generation algorithm in C++? has answers with more links if you want something besides Boost.
[ EDIT: revising after having a busy day... meant to get back and clean up my quick answer this morning, but got pulled away and only getting back now. :-) ]
