srand seed consistency between physical machines - c++

I'm not quite sure how to phrase this question, but I couldn't find any others like it.
Say I have this code:
srand(1);
srand(SOME_DEFINED_CONST_INT);
If I run this executable on a number of different physical machines, is the sequence of rand() guaranteed to be consistent between them? i.e. if I get 1, 4, 6, 3, 4 on one machine, will I always get that same sequence on the others?
If yes, how can that be proven? Is it part of the standard?
If no, is there anything I could do to make it so?

No, the standard guarantees no such thing. However, the logic of generating the random numbers is inside the C standard library. So if you build the application with the same version of the library, the sequence should be the same. The second part of my answer is just a guess, but the standard definitely doesn't give any guarantees.

As Armen said, it's non-standard. However, if you look at the man page for srand() on Linux, you'll see something interesting:
POSIX 1003.1-2003 gives the following example of an implementation of
rand() and srand(), possibly useful when one needs the same sequence
on two different machines.
static unsigned long next = 1;
/* RAND_MAX assumed to be 32767 */
int myrand(void) {
    next = next * 1103515245 + 12345;
    return((unsigned)(next/65536) % 32768);
}
void mysrand(unsigned seed) {
    next = seed;
}

As Mat said, it is a good idea to implement the random number generator yourself, preferably in an object-oriented manner. As a nice side effect you can get thread safety, and possibly speed, in addition to consistency across platforms. Linear congruential generators (http://en.wikipedia.org/wiki/Linear_congruential_generator) or the Mersenne Twister (http://en.wikipedia.org/wiki/Mersenne_twister) will get you far.
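A minimal sketch of that idea, assuming you want the exact numbers of the POSIX sample above; the class and member names are made up for illustration. Storing the state in a std::uint32_t makes the wrap-around modulo 2^32 explicit, so the output no longer depends on how wide unsigned long is (the sample's result only depends on bits 16 to 30 of next, which are the same either way). Each object carries its own state, so threads that each use their own object don't interfere, and saving or restoring the state is just copying one integer. It is still a weak LCG; this only makes it portable.

```cpp
#include <cstdint>

// Hypothetical class: the POSIX sample rand()/srand() as an object with fixed-width state.
class PortableLcg {
public:
    explicit PortableLcg(std::uint32_t seed = 1) : state_(seed) {}

    void seed(std::uint32_t s) { state_ = s; }

    // Returns a value in [0, 32767], matching the sample with RAND_MAX == 32767.
    int next() {
        // uint32_t arithmetic wraps modulo 2^32 on every common platform.
        state_ = state_ * 1103515245u + 12345u;
        return static_cast<int>((state_ / 65536u) % 32768u);
    }

    // Saving and restoring the generator is just copying the state word.
    std::uint32_t state() const { return state_; }
    void set_state(std::uint32_t s) { state_ = s; }

private:
    std::uint32_t state_;
};
```

With seed 1, the first call to next() returns 16838, the same as the POSIX sample.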

I'll add that if you are working under Windows and you move your exe between machines, srand WILL generate the same numbers: the implementation of srand is implementor-specific, but you'll always use the runtime of the same implementor (if you compile with Microsoft C++, you'll use Microsoft's srand, and MS probably won't change its implementation of srand anytime soon). The same goes for Linux: your srand will always be glibc's, so unless glibc changes it, the numbers will be the same.

Related

Equivalent of srand() and rand() using post-C++11 std library

I have old code that predates C++11 and it uses rand() for generating random ints.
However, there is a shortcoming in rand(): you can't save and then restore the state of the generator, since there is no object I can save, nor can I extract the state.
Therefore, I want to refactor to use C++11's solution <random>.
However, I do not want a behaviour change - I hope to get exactly the sequence rand() gives me but with <random>.
Do you guys know whether this is achievable?
You can't even be sure that you get the same sequence if you use rand() on another compiler. And no, you can't get <random> to produce the same sequence as whoever's rand() it was you were using. (Thank goodness. rand() is notorious for being one of the worst pseudo-random number generators of all time.)
It is possible for you to restore the state of rand(), simply by using srand() to set the initial state and counting how many times you called rand(). You can later repeat that to bring rand() back to that same state.
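A sketch of that replay approach, with a hypothetical helper name. It assumes nothing else in the program calls rand() in between, and the cost grows linearly with the number of calls being replayed.

```cpp
#include <cstdlib>

// Hypothetical helper: puts rand() back into the state it was in after
// `calls` draws following std::srand(seed), by replaying those draws.
void restore_rand_state(unsigned seed, unsigned long calls) {
    std::srand(seed);
    for (unsigned long i = 0; i < calls; ++i) {
        std::rand();  // discard the values; only the internal state matters
    }
}
```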
But don't use rand()!
What you want is not possible. The C-style random number generator is implementation-defined. The C++ random engines are all very well specified as to their particular algorithms (except random_device, which varies due to potentially being a more "true" random generator). None of its engines are defined to have the same algorithm as rand.
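For the underlying save/restore problem, the <random> engines do expose their state: each one can be written to and read from a stream with << and >>, and can also simply be copied. A sketch with std::mt19937 (the helper names are made up); note that this gives you mt19937's sequence, not the one your old rand() produced.

```cpp
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: serializes the full engine state as text.
std::string save_state(const std::mt19937& eng) {
    std::ostringstream out;
    out << eng;  // operator<< is defined for the standard engines
    return out.str();
}

// Hypothetical helper: restores an engine from text produced by save_state.
void load_state(std::mt19937& eng, const std::string& text) {
    std::istringstream in(text);
    in >> eng;
}
```

The text form can be written to a file, so the state survives a restart of the program.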

Deterministic random numbers from STL [duplicate]

Inspired by this and similar questions, I want to learn how the mt19937 pseudo-random number generator in C++11 behaves when it is seeded with the same input on two separate machines.
In other words, say we have the following code:
std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dist{0, 10000};
int randNumber = dist(gen);
If we run this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
And in either case, why is this the case?
A further question:
Regardless of the seed, will this code keep generating random numbers indefinitely? For example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the numbers or in their uniformity?
The generator will generate the same values.
The distributions may not, at least with different compilers or library versions. The standard did not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.
Barring library/compiler changes, that will return the same values in the same sequence. But if you care, write your own distribution.
...
All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not cryptographically secure PRNGs.
So whether running for months is a problem depends on what exactly you would consider a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. But assume attackers can predict its future behaviour from past evidence.
Regardless of the seed, will this code keep generating random numbers indefinitely? For example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the numbers or in their uniformity?
The RNGs we deal with in standard C++ are pseudo-random number generators. By definition, such a generator is a purely computational device with multi-bit state (you can think of the state as a large bit vector) and three functions:
state seed2state(seed);
state next_state(state);
uint(32|64)_t state2output(state);
and that is it. Obviously, the state has finite size, 19937 bits in the case of MT19937, so the total number of states is 2^19937, and thus MT19937's next_state() function is periodic, with a maximum period of no more than 2^19937. This number is really HUGE, and most likely more than enough for a typical simulation.
But the output is at most 64 bits, so the output space is 2^64. This means that during a long run any particular output appears quite a few times. What matters is when not only some 64-bit number appears again, but also the number after it, and the one after that, and so on; that is when you know the RNG period has been reached.
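The finite-state argument can be seen on a toy scale. The sketch below uses a hypothetical linear congruential next_state with only 8 bits of state (a = 5, c = 1, m = 256, parameters chosen purely for illustration). With only 256 possible states the sequence must repeat within 256 steps; these parameters satisfy the Hull-Dobell conditions, so the period is exactly 256. MT19937 follows the same logic, just with a vastly larger state.

```cpp
#include <cstdint>

// Hypothetical toy next_state with 8 bits of state: s -> (5*s + 1) mod 256.
std::uint8_t toy_next_state(std::uint8_t s) {
    return static_cast<std::uint8_t>(5u * s + 1u);  // the cast reduces mod 256
}

// Counts steps until the state first returns to its starting value.
// Terminates because a = 5 is odd, so next_state is a bijection and every
// state lies on a cycle.
int toy_period(std::uint8_t start) {
    std::uint8_t s = start;
    int steps = 0;
    do {
        s = toy_next_state(s);
        ++steps;
    } while (s != start);
    return steps;
}
```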
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
Generators are defined rather strictly, and you'll get the same bit stream. For example, for MT19937 from the C++ standard (https://timsong-cpp.github.io/cppwp/rand):
class mersenne_twister_engine {
...
static constexpr result_type default_seed = 5489u;
...
and the function seed2state is described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6):
Effects: Constructs a mersenne_twister_engine object. Sets X_{-n} to value mod 2^w. Then, iteratively for i = -n, …, -1, sets X_i to ...
The function next_state is described as well, together with a test value for the 10000th invocation. The standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3):
using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;
Required behavior: The 10000th consecutive invocation of a default-constructed object
of type mt19937 shall produce the value 4123659995.
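That required value makes a cheap portability check. A small sketch (the function name is hypothetical) that should return 4123659995 on any conforming implementation:

```cpp
#include <cstdint>
#include <random>

// Returns the 10000th value produced by a default-constructed std::mt19937.
std::uint_fast32_t mt19937_10000th() {
    std::mt19937 gen;   // default seed, 5489u
    gen.discard(9999);  // skip the first 9999 outputs
    return gen();       // the 10000th invocation
}
```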
The big four compilers I used (GCC, Clang, VC++, Intel C++) all produced the same MT19937 output.
Distributions, on the other hand, are not specified that well, and therefore vary between compilers and libraries. If you need portable distributions, you either roll your own or use something from Boost or similar libraries.
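One way to roll your own, sketched under the assumption that you draw from std::mt19937 (whose output the standard pins down) and want integers in a closed range [lo, hi] whose width fits in 32 bits. It uses plain rejection sampling; the helper name is hypothetical, and it is not how any particular library implements uniform_int_distribution, so it will generally give different numbers from std::uniform_int_distribution. What it buys you is that the mapping is fixed by your own code.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: uniform integer in [lo, hi] from raw mt19937 output.
// Assumes lo <= hi and hi - lo < 2^32.
std::int64_t portable_uniform_int(std::mt19937& gen, std::int64_t lo, std::int64_t hi) {
    // Number of possible values, between 1 and 2^32.
    const std::uint64_t range = static_cast<std::uint64_t>(hi - lo) + 1;
    // Largest multiple of range that is <= 2^32; values at or above it are rejected
    // so that every residue modulo range is equally likely.
    const std::uint64_t limit = (std::uint64_t{1} << 32) / range * range;
    std::uint64_t x;
    do {
        // mt19937 produces 32-bit values; the mask is only defensive.
        x = static_cast<std::uint64_t>(gen()) & 0xFFFFFFFFu;
    } while (x >= limit);
    return lo + static_cast<std::int64_t>(x % range);
}
```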
Any pseudo RNG that takes a seed will give you the same sequence for the same seed every time, on every machine, as long as the same algorithm is used. This happens because the generator is just a (complex) mathematical function and has nothing actually random about it. Most of the time, when you want to randomize, you take the seed from the system clock, which constantly changes, so each run will be different.
It is useful to have the same sequence in computer games for example when you have a randomly generated world and want to generate the exact same one, or to avoid people cheating using save games in a game with random chances.
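A common compromise between the two uses is to seed from the clock by default but record the seed, so any run can be replayed exactly. A sketch with a hypothetical make_engine helper (C++17, for std::optional); printing to std::cerr is just one choice of where to record the seed.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <optional>
#include <random>

// Hypothetical helper: seeds from `requested` if given, otherwise from the
// system clock, and prints the seed so the same run can be repeated later.
std::mt19937 make_engine(std::optional<std::uint32_t> requested) {
    const std::uint32_t seed = requested
        ? *requested
        : static_cast<std::uint32_t>(
              std::chrono::system_clock::now().time_since_epoch().count());
    std::cerr << "seed = " << seed << '\n';
    return std::mt19937(seed);
}
```

Passing the printed seed back in (for example from a command-line argument) reproduces the same world or the same sequence of random events.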

C++ rand and srand gets different output on different machines

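I wanted to generate a random integer, so I used the C++ rand(void) and srand(int) functions:
#include <cstdlib>
#include <iostream>

int main() {
    std::srand(1);                          // fixed seed
    std::cout << std::rand() << std::endl;  // first value of this library's rand()
    return 0;
}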
OK, it suits my needs. Each time I execute it I get the same result, which I like!
But there is a problem. When I executed it on my computer I got 16807 as output. But when I executed it on another machine, I got 1804289383.
I know that rand() and srand(int) have a simple implementation similar to this:
static unsigned long int next = 1;
int rand(void) // RAND_MAX assumed to be 32767
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next/65536) % 32768;
}
void srand(unsigned int seed)
{
    next = seed;
}
So why? Is it possible that rand() has different implementations on multiple machines? What should I do?
I want to modify the other machine in such a way that I get 16807 from that machine too.
Please note that I love the rand implementation on my computer. Please show me a way that other machine gets same result with mine.
Thanks in advance.
Yes, rand() has different implementations; there's no requirement for them to be identical.
If you want consistent sequences across implementations and platforms, you can copy the sample implementation from the C standard section 7.20.2. Be sure to rename both rand and srand so they don't collide with the standard library's versions. You might need to adjust the code so the types have the same size and range across the implementations (e.g., use uint32_t from <stdint.h> rather than unsigned int).
EDIT: Given the new information from the comments, it looks like the requirements are different from what we thought (and I'm still not 100% clear on what they are).
You want to generate random numbers on two systems consistent with a stored file that you've generated on one system, but you're unable to transfer it to the other due to network issues (the file is about a gigabyte). (Burning it to a DVD, or splitting it and burning it to 2 CDs, isn't an option?)
Suggested solution:
Write a custom generator that generates consistent results on both systems (even if they're not the same results you got before). Once you've done that, use it to re-generate a new 1-gigabyte data file on both systems. The existing file becomes unnecessary, and you don't need to transfer huge amounts of data.
I think it's because int/unsigned int are different sizes on your two platforms. Are ints/unsigned ints the same number of bytes on both machines/OSes you're compiling on? What platforms/compilers are you using?
Assuming the same rand/srand implementation, you need to use datatypes of the same precision (or appropriate casting) to get the same result. If you have stdint.h on your platform, try to use it (so you can specify explicit sizes, e.g. uint32_t).
The C and C++ specifications do not define a particular implementation for rand or srand. They could be anything, as long as it is somewhat random. You cannot expect consistent output from different standard libraries.
The rand implementations can be different. If you need identical behavior on different machines, you need a random number generator that provides that. You can roll your own or use someone else's.
I am not sure whether the random generators in the C++0x library suffice. I think not. But reading the standardese there makes my head spin.
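For what it's worth, the engines in <random> (unlike rand()) have their algorithms fixed by the standard. As an illustration, 16807 happens to be the first output of std::minstd_rand0, the Park-Miller "minimal standard" generator x_{n+1} = 16807 * x_n mod (2^31 - 1), seeded with 1; whether the asker's rand() actually used this algorithm is only a guess. A sketch with a hypothetical function name:

```cpp
#include <cstdint>
#include <random>

// First output of std::minstd_rand0 for a given seed. The engine's algorithm
// is specified by the standard, so this should not vary between conforming libraries.
std::uint_fast32_t first_minstd0(std::uint_fast32_t seed) {
    std::minstd_rand0 gen(seed);
    return gen();
}
```

With seed 1 this returns 16807 (1 * 16807 mod (2^31 - 1)).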
Similarly, I'm not sure whether the Boost Random library suffices. But I think it's worth checking out. And there you have the source code, so at worst it can serve as basis for rolling your own.
Cheers & hth.,
Also, there are different pseudo-RNG algorithms (e.g. LCG vs. Mersenne Twister):
http://en.wikipedia.org/wiki/Random_number_generation
The C library on your first machine may use one, and the second machine may use another.

How to eliminate all sources of randomness so that program always gives identical answers?

I have C++ code that relies heavily on sampling (using rand()), but I want it to be reproducible. So in the beginning, I initialize srand() with a random seed and print that seed out. I want others to be able to run the same code again but initializing srand() with that same seed and get exactly the same answer as I did.
But under what circumstances is that guaranteed? I suppose that works only if the binaries are compiled with the same compiler on the same system? What are other factors that might make the answer differ from the one I got initially?
The solution is to use the same code in all cases: the Boost random number library is infinitely better than any C++ standard library implementation, and you can use the same code on all platforms. Take a look at this question for an example of its use and links to the library docs.
You're correct that the sequences might be different if compiled on different machines with different rand implementations. The best way to get around this is to write your own PRNG. The Linux man page for srand gives the following simple example (quoted from the POSIX standard):
POSIX.1-2001 gives the following
example of an implementation of rand()
and srand(), possibly useful when one
needs the same sequence on two
different machines.
static unsigned long next = 1;
/* RAND_MAX assumed to be 32767 */
int myrand(void) {
    next = next * 1103515245 + 12345;
    return((unsigned)(next/65536) % 32768);
}
void mysrand(unsigned seed) {
    next = seed;
}
To avoid this kind of problem, write your own implementation of rand()! I'm no expert on random-number generation algorithms, so I'll say no more than that...
Check out implementation of rand(), and use one of the random number generators from there - which ensures repeatability no matter what platform you run on.