Random engine state and multiple deterministic independent random sequences

The C++ TR1 random number facilities improve on the old C runtime library by keeping a separate state for each random engine, so different threads or independent random sequences no longer have to share one generator. The old library has a single global state, and this is usually bad.
However, when implementing an algorithm that requires deterministic random sequences, I find it annoying to have to pass the engine down to any method that should be drawing numbers from such a sequence. From a design perspective, the code that initializes the random seed doesn't need to know which methods down the stack are using random numbers. Yet those inner methods cannot initialize their own random engines, because:
they lack the knowledge to create a unique reproducible seed
memory requirements prevent keeping a separate state for the many downstream clients
To clarify, the downstream methods do not need to draw numbers from the same sequence as the main method, but they do need to be independent and reproducible in different runs.
Any idea on how to solve this conundrum elegantly?
EDIT
Some code to clarify the situation
#include <algorithm>
#include <array>
#include <random>
typedef std::mt19937 RandEng;
class PossibleRandomConsumer; // defined elsewhere; assumed to offer DoSomething(RandEng&)
class RandomProvider {
public:
    void foo() {
        std::uniform_int_distribution<> uni(0, 16);
        uni(eng); // using the random engine myself
        // [this] is needed so the lambda can reach the eng member
        std::for_each(children.begin(), children.end(), [this](PossibleRandomConsumer& child) {
            // may or may not need a random number. If it does, it has to be different from those of other children, and from other providers
            child.DoSomething(eng);
        });
    }
private:
    RandEng eng; // unique seed per RandomProvider
    std::array<PossibleRandomConsumer,10000> children; // lots of these...
};

Not an easy question without knowing some detail about your architecture. So this is just a try:
How about passing down references to an interface that knows how to provide random numbers? This interface may have just one function that returns the requested number of random values. In this way you can hide the implementation details from the downstream functions (methods), and passing references around is nearly free. You can even decide at the top level where those random numbers come from (a file, a random number generator, your room temperature, ...).
#include <vector>
class RandomNumberProvider {
public:
    typedef std::vector<int> RandomArray;
    virtual ~RandomNumberProvider() {}
    // not const: drawing numbers advances the provider's state
    virtual RandomArray Generate(unsigned numNumbers) = 0;
};
void Consumer(RandomNumberProvider& rndProvider) {
    RandomNumberProvider::RandomArray rndNumber;
    rndNumber = rndProvider.Generate(10); // get a sequence of 10 random numbers
    // ...
}
Something like this.
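One possible way to address the two constraints from the question (no knowledge of a unique seed, no room for per-child state) is to derive each child's engine on demand from the provider's seed and the child's index. The sketch below assumes C++11 and uses std::seed_seq to combine the two values; MakeChildEngine and DrawForChild are hypothetical names that do not appear in the question, and this is an illustration rather than a vetted design.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: builds a fresh engine for one consumer from the
// provider-level seed and the consumer's index. Nothing is stored per consumer.
inline std::mt19937 MakeChildEngine(std::uint32_t providerSeed, std::uint32_t childId) {
    std::seed_seq seq{providerSeed, childId};  // mixes both values into the initial state
    return std::mt19937(seq);
}

// Hypothetical consumer: it creates its engine only when it actually needs a number.
inline std::uint32_t DrawForChild(std::uint32_t providerSeed, std::uint32_t childId) {
    std::mt19937 eng = MakeChildEngine(providerSeed, childId);
    return static_cast<std::uint32_t>(eng());
}
```

The same (providerSeed, childId) pair gives the same sequence in every run, and the engine only lives as long as the call. Seeding a Mersenne Twister is not free, so a smaller engine may be a better fit if children draw only one or two numbers each.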

Related

How to derive pseudo-random values of a preset probability from a seed?

I have a bunch of classes that I'd like to instantiate using seeds. Such a class would only have one constructor; taking in a single argument, the seed. A very simple pseudo example could be:
class Person {
int age;
Person(uint32 seed){
age = deriveAgeFromSeed(seed);
}
}
If I instantiate a Person with a given seed, e.g. 123456789, it should evaluate to a Person with a specific age, e.g. 30. The same seed will always generate the same person (same age).
To achieve this specific example, I could use a regular random-number-generator and use my seed as its seed to generate a random number between e.g. 0-100 for age.
However, I may not want the age to be uniformly distributed. Maybe I'd want a 50% chance that the age is in the range of 30-40. I guess I could chain a bunch of "random" number operations with my logic, e.g. generating a number from 0 to 1 which would indicate which age range should be used, and then generating a new number to decide the specific age within that range. But this would be a very ugly chain of hard-coded logic, and very hard to adjust later.
I'd rather have a way to bundle an "option probability set" with the application, for instance an XML file that specifies the probabilities for every variable. The file could be loaded into memory at launch to avoid reading the same file every time a person is instantiated. Unrealistic example to give an idea of what I mean:
"Person":
"age":
0-30: 25%,
30-40: 50%,
40-100: 25%
The application would use this information to automagically set an age based on the seed, with these given "probability parameters".
Having such a file would drastically decrease the workload for future adjustments of parameters, and would even let me change parameters without having to rebuild the application. But is it viable?
In addition to this, there may be cases where a second parameter could be dependent on the first. An example could be enum Occupation, where certain occupations are more common for certain ages (e.g. 'fast food employee' being more common at younger ages and CEO more common the older they get).
This type of logic seems rather common for certain types of video games, e.g. RTS such as "Civilization", where the game seed would be used to create the map, place its resources and the player spawn locations. It appears to also be used for procedural sandbox games such as Minecraft, where there's a certain probability for biomes. The latter is probably more noise-based, but still, noise would only give an output between 0 and 1, and they somehow derive a certain biome probability from it.
(I will code in C++, but language doesn't matter for the question)
So
What is the optimal/best practice procedure to derive many values at preset probabilities from a single seed?
Can the probabilities be imported from an external file?
Can probabilities be dependent on each other?

1. It seems you are thinking about a PRNG with a state that you initialize and use exactly once to generate the age value. For this purpose a hash would also work well: it too produces a pseudo-random number with a uniform distribution over the given output space. If you need to generate a sequence of random numbers from a single seed, a random number engine is more useful.
2. XML, JSON and other formats can be read using 3rd party libraries in C++. However, if all you need to store is age ranges and probabilities, it can be better for portability and dependency management to implement the parser yourself.
3. If I am correct, the age distribution is occupation-dependent. Once you derive the correct distribution for the occupation (either explicitly defined in a settings file, as in point 2, or calculated from a formula), you can get the right age distribution from a single pseudo-random number generated by the hash function.
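As a rough illustration of the hash-based approach, the sketch below maps a seed to an age using a table of weighted ranges like the one in the question. Everything here is an assumption made for the example: the AgeRange struct, the Mix function (a SplitMix64-style mixing step chosen for brevity, not something the answer prescribes) and DeriveAgeFromSeed. In a real application the table would be filled from the external file.

```cpp
#include <cstdint>
#include <vector>

// One row of a hypothetical probability table: ages in [lo, hi) with a weight.
struct AgeRange {
    int lo;
    int hi;
    std::uint32_t weight;  // e.g. a percentage
};

// SplitMix64-style mixing step, used here as a simple 64-bit hash.
inline std::uint64_t Mix(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Assumes every range has hi > lo. Returns 0 for an empty or zero-weight table.
inline int DeriveAgeFromSeed(std::uint32_t seed, const std::vector<AgeRange>& table) {
    std::uint32_t total = 0;
    for (const AgeRange& r : table) total += r.weight;
    if (table.empty() || total == 0) return 0;

    const std::uint64_t h1 = Mix(seed);  // selects the range
    const std::uint64_t h2 = Mix(h1);    // selects the age inside the range

    std::uint32_t ticket = static_cast<std::uint32_t>(h1 % total);
    for (const AgeRange& r : table) {
        if (ticket < r.weight) {
            const std::uint64_t width = static_cast<std::uint64_t>(r.hi - r.lo);
            return r.lo + static_cast<int>(h2 % width);
        }
        ticket -= r.weight;  // move on to the next range
    }
    return table.back().lo;  // not reached when total > 0
}
```

With the table {0-30: 25, 30-40: 50, 40-100: 25}, a call such as DeriveAgeFromSeed(123456789, table) returns the same age every time. The modulo steps introduce a tiny bias, which is usually negligible for this kind of use. A dependent parameter (such as occupation) could be handled the same way by choosing its table based on the age already derived.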

Comparing different implementations with random seeds

I have two implementations of a program, one using lists and one using vectors, in order to compare their runtimes. The class functions in each implementation are different, since the list implementation allows more flexibility in code. They both also use random number generators.
I set both to have random seed 0 and ran them, but the results I get are not the same.
One question I have is, if both implementations call a function using a random seed, e.g.
boost::variate_generator<boost::mt19937&, boost::exponential_distribution<>> random_n(seed, boost::exponential_distribution<>()) ;
and one calls it more times than the other implementation, will that cause desynchronization with respect to random seeds?
To be more specific, the vector implementation simulates a Poisson Process on a continuous real segment, e.g. [0,1], whereas the list implementation simulates the PP on separate partitions: {[0,0.1], [0.1,0.2], [0.2,0.3], ..., [0.9, 1]} and then combines the results. Simulating a PP on the big partition could mean as few as 1 boost::exponential_distribution call, but simulating on the 10 partitions requires at least 10 boost::exponential_distribution calls, even if none of them end up being used (e.g. if they overshoot the partition).
Even though probabilistically, these methods should generate the same kind of results, would the seeds between the programs be de-synchronized? And if so, is there any way to resynchronize them without changing the implementation?
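To make the situation concrete, here is a small sketch of what "calling the generator more often" does to a seeded sequence. It uses the standard <random> equivalents rather than Boost, and DrawAfterSkipping is a hypothetical helper written only for this illustration.

```cpp
#include <random>
#include <vector>

// Draws `count` exponential variates after first consuming `skipped` extra ones.
inline std::vector<double> DrawAfterSkipping(unsigned seed, int skipped, int count) {
    std::mt19937 eng(seed);
    std::exponential_distribution<double> dist(1.0);
    for (int i = 0; i < skipped; ++i) dist(eng);  // draws that are thrown away
    std::vector<double> out;
    for (int i = 0; i < count; ++i) out.push_back(dist(eng));
    return out;
}
```

Two calls with the same seed and the same number of skipped draws return identical values. With one extra skipped draw the results are the same stream shifted by one position, so two programs that consume different numbers of variates stop lining up even though they share a seed.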

Correctly seeding random number generator (Mersenne twister) c++

Besides being a rubbish programmer, my jargon is not up to scratch. I am going to try my best to explain myself.
I have implemented a Mersenne twister random number generator using randomlib.
Admittedly I am not too familiar with how Visual 8 C++'s random number generator works, but I find I can seed it once with srand(time(NULL)) in main() and then safely use rand() in my other classes.
With the Mersenne twister that I have, one needs to create an object, and then seed that object.
#include <RandomLib/Random.hpp>
RandomLib::Random r; // create random number object
r.Reseed(); // seed with a "unique" seed
float d = r.FloatN(); // a random in [0,1] rounded to the nearest double
If I want to generate a random number in a class, how do I do this without having to define an object each time? I am just worried that if I use the computer clock I will end up using the same seed on each run (it only changes every second).
Am I explaining myself right?
Thanks in advance

The Random object is essentially state information that you need to preserve. You can use all the normal techniques: You could have it as a global variable or pass it around as a parameter. If a particular class needs random numbers you can keep a Random object as a class member to provide randomness for that class.
The C++ <random> library is similar in that it requires the construction of an object as the source of randomness/RNG state. This is a good design because it allows the program to control access to the state and, for example, guarantee good behavior with multiple threads. The C++ <random> library even includes the Mersenne Twister algorithm.
Here's an example that saves the RNG state as a class member (using std::mt19937 instead of Random):
#include <random> // for mt19937
#include <algorithm> // for std::shuffle
#include <vector>
struct Deck {
std::vector<Cards> m_cards;
std::mt19937 eng; // save RNG state as class member so we don't have to keep creating one
void shuffle() {
std::shuffle(std::begin(m_cards), std::end(m_cards), eng);
}
};
int main() {
Deck d;
d.shuffle();
d.shuffle(); // this reuses the RNG state as it was at the end of the first shuffle, no reseeding
}

The accepted answer does not actually seed its mt19937, see this Q&A for a more thorough and complete answer on how this might be achieved and why there is no single solution:
How to succinctly, portably, and thoroughly seed the mt19937 PRNG?
TL;DR:
The question is relating to RandomLib but I will answer by referring to the STL implementations due to <random> being more accessible 10 years on. The principles should apply to all mt19937 implementations however.
std::mt19937 and std::mt19937_64 have a default seed (default_seed, which is 5489u) that provides some state for the engine to work from. With the default seed, the engine produces the same values every time unless it is re-seeded.
std::mt19937 provides two methods to seed it, both via the seed() function.
The first overload accepts a parameter of type result_type (uint_fast32_t for std::mt19937 and uint_fast64_t for std::mt19937_64). Internally (at least in the MSVC implementation) this function uses the provided seed value to fill the engine's internal state through a series of bit-fiddling operations. Most quick-and-dirty examples use a std::random_device to provide this seed value, but because the standard allows random_device to be just another PRNG, it cannot be relied on in all circumstances. Apparently this is (or was) the case with the MinGW compiler on Windows.
The second overload accepts a seed sequence, such as std::seed_seq. The linked question has an example of how to create one of these.
Creating a seed_seq or a sufficiently random initial seed is a challenge and why the linked question is provided.
It is not recommended that you create a new Mersenne Twister PRNG every time you need one due to the seeding process being non-trivial. Instead, it is better to declare one once and hold onto it, either as a static, thread_local, global, or member of a class with a long lifetime.
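A minimal sketch of the points above, assuming C++11: the engine is seeded once through a std::seed_seq filled from std::random_device, then kept as a thread_local object and reused. MakeSeededEngine, ThreadEngine and RollDie are hypothetical names, the choice of 8 seed words is arbitrary, and the caveat about random_device quality still applies.

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <random>

// Fills a small buffer from std::random_device and seeds through std::seed_seq.
inline std::mt19937 MakeSeededEngine() {
    std::random_device rd;
    std::array<std::random_device::result_type, 8> seedData;  // 8 words: arbitrary choice
    std::generate(seedData.begin(), seedData.end(), std::ref(rd));
    std::seed_seq seq(seedData.begin(), seedData.end());
    return std::mt19937(seq);
}

// One engine per thread, seeded on first use and then reused.
inline std::mt19937& ThreadEngine() {
    thread_local std::mt19937 eng = MakeSeededEngine();
    return eng;
}

// Any class or function can draw numbers without creating its own engine.
inline int RollDie() {
    std::uniform_int_distribution<int> die(1, 6);
    return die(ThreadEngine());
}
```

Distributions are cheap to construct, so creating one per call as in RollDie is fine; it is the engine that should be long-lived.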

Random Number Generator: Should it be used as a singleton?

I use random numbers in several places and usually construct a random number generator whenever I need it. Currently I use the Marsaglia Xorshift algorithm seeding it with the current system time.
Now I have some doubts about this strategy:
If I use several generators the independence (randomness) of the numbers between the generators depends on the seed (same seed same number). Since I use the time (ns) as seed and since this time changes this works but I am wondering whether it would not be better to use only one singular generator and e.g. to make it available as a singleton. Would this increase the random number quality ?
Edit: Unfortunately c++11 is not an option yet
Edit: To be more specific: I am not suggesting that the singleton could increase the random number quality but the fact that only one generator is used and seeded. Otherwise I have to be sure that the seeds of the different generators are independent (random) from another.
Extreme example: I seed two generators with exactly the same number -> no randomness between them

Suppose you have several variables, each of which needs to be random, independent from the others, and will be regularly reassigned with a new random value from some random generator. This happens quite often with Monte Carlo analysis and games (although the rigor for games is much less than it is for Monte Carlo). If a perfect random number generator existed, it would be fine to use a single instantiation of it. Assign the nth pseudo-random number from the generator to variable x1, the next random number to variable x2, the next to x3, and so on, eventually coming back to variable x1 on the next cycle around. There's a problem here: far too many PRNGs fail the independence test when used this way, and some even fail randomness tests on individual sequences.
My approach is to use a single PRNG as a seed generator for a set of N instances of self-contained PRNGs. Each instance of these latter PRNGs feeds a single variable. By self-contained, I mean that the PRNG is an object, with state maintained in instance members rather than in static members or global variables. The seed generator doesn't even need to be from the same family as those other N PRNGs. It just needs to be reentrant in case multiple threads simultaneously try to use the seed generator. However, in my uses I find that it is best to set up the PRNGs before threading starts so as to guarantee repeatability. That's one run, one execution. Monte Carlo techniques typically need thousands of executions, maybe more, maybe a lot more. With Monte Carlo, repeatability is essential. So yet another random seed generator is needed. This one seeds the seed generator used to generate the N generators for the variables.
Repeatability is important, at least in the Monte Carlo world. Suppose run number 10234 of a long Monte Carlo simulation results in some massive failure. It would be nice to see what in the world happened. It might have been a statistical fluke, it might have been a problem. The problem is that in a typical MC setup, only the bare minimum of data are recorded, just enough for computing statistics. To see what happened in run number 10234, one needs to repeat that particular case but now record everything.
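The layering described above could look roughly like the following sketch, which assumes C++11 <random> (not available to the original poster) and uses invented names. Here the "generator that seeds the seed generator" is reduced to a master seed combined with the run number, which is a simplification of the scheme in the answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds one engine per variable for a given Monte Carlo run.
inline std::vector<std::mt19937> MakeRunEngines(std::uint32_t masterSeed,
                                                std::uint32_t runNumber,
                                                std::size_t numVariables) {
    // Level 1: master seed plus run number seed this run's seed generator.
    std::seed_seq runSeq{masterSeed, runNumber};
    std::minstd_rand seedGenerator(runSeq);  // deliberately from a different family

    // Level 2: the seed generator hands out one seed per variable.
    std::vector<std::mt19937> engines;
    engines.reserve(numVariables);
    for (std::size_t i = 0; i < numVariables; ++i) {
        engines.emplace_back(static_cast<std::uint32_t>(seedGenerator()));
    }
    return engines;
}
```

To replay run 10234 with full logging, calling MakeRunEngines with the same master seed and run number reproduces exactly the same engines, provided they are created before any threads start drawing from them.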

You should use the same instance of your random generator class whenever the clients are interrelated and the code needs "independent" random numbers.
You can use different objects of your random generator class when the clients do not depend on each other and it does not matter whether they receive the same numbers or not.
Note that for testing and debugging it is very useful to be able to create the same sequence of random numbers again. Therefore you should not "randomly seed" too much.

I don't think that it increases the randomness, but it does reduce the memory needed compared with creating an object every time you want to use the random generator. If this generator doesn't have any instance-specific settings you can make it a singleton.

Since I use the time (ns) as seed and since this time changes this works but I am wondering whether it would not be better to use only one singular generator and e.g. to make it available as a singleton.
This is a good example when the singleton is not an anti-pattern. You could also use some kind of inversion of control.
Would this increase the random number quality ?
No. The quality depends on the algorithm that generates the random numbers. How you use it is irrelevant (assuming it is used correctly).
To your edit: you could create some kind of container that holds objects of your RNG classes (or use existing containers). Something like this:
#include <vector>
struct Rng
{
    void SetSeed( const int seed );
    int GenerateNumber(); // not const: generating a number advances the state
    //...
};
// Rng must be defined before it is used as the vector's element type
std::vector< Rng > & RngSingleton()
{
    static std::vector< Rng > allRngs( 2 );
    return allRngs;
}
// ...
RngSingleton().at(0).SetSeed( 55 );
RngSingleton().at(1).SetSeed( 55 );
//...
const auto value1 = RngSingleton().at(0).GenerateNumber();
const auto value2 = RngSingleton().at(1).GenerateNumber();

Factory pattern to the rescue.
A client should never have to worry about the instantiation rules of its dependencies.
It allows for swapping creation methods. And the other way around, if you decide to use a different algorithm you can swap the generator class and the clients need no refactoring.
http://www.oodesign.com/factory-pattern.html
--EDIT
Added pseudocode (sorry, it's not c++, it's waaaaaay too long ago since I last worked in it)
interface PRNG{
function generateRandomNumber():Number;
}
interface Seeder{
function getSeed() : Number;
}
interface PRNGFactory{
function createPRNG():PRNG;
}
class MarsagliaPRNG implements PRNG{
constructor( seed : Number ){
//store seed
}
function generateRandomNumber() : Number{
//do your magic
}
}
class SingletonMarsagliaPRNGFactory implements PRNGFactory{
var seeder : Seeder;
static var prng : PRNG;
constructor( seeder : Seeder ){
//store the seeder so createPRNG() can use it
this.seeder = seeder;
}
function createPRNG() : PRNG{
return prng ||= new MarsagliaPRNG( seeder.getSeed() );
}
}
class TimeSeeder implements Seeder{
function getSeed():Number{
return now();
}
}
//usage:
seeder : Seeder = new TimeSeeder();
prngFactory : PRNGFactory = new SingletonMarsagliaPRNGFactory( seeder );
clientA.prng = prngFactory.createPRNG();
clientB.prng = prngFactory.createPRNG();
//both clients got the same instance.
The big advantage is now that if you want/need to change any of the implementation details, nothing has to change in the clients. You can change seeding method, RNG algorithm and the instantiation rule w/o having to touch any client anywhere.

using one random engine for multi distributions in c++11

I am using the new C++11 <random> header in my application, and in one class I need random numbers with different distributions in different methods. I just put a random engine, std::default_random_engine, in the class as a member, seed it in the class constructor with std::random_device, and use it for the different distributions in my methods. Is it OK to use the random engine this way, or should I declare a different engine for every distribution I use?

It's ok.
Reasons to not share the generator:
threading (standard RNG implementations are not thread safe)
determinism of random sequences:
If you wish to be able (for testing/bug hunting) to control the exact sequences generated, you will likely have less trouble if you isolate the RNGs used, especially when not all RNG consumption is deterministic.
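For reference, the setup described in the question might look like the sketch below. The class name Sampler, its methods, and the extra constructor taking a fixed seed (added so that sequences can be reproduced while testing) are assumptions for illustration only.

```cpp
#include <random>

// Hypothetical class: one engine member shared by several distributions.
class Sampler {
public:
    Sampler() : eng_(std::random_device{}()) {}      // non-deterministic seed
    explicit Sampler(unsigned seed) : eng_(seed) {}  // fixed seed for testing

    int RollDie() { return die_(eng_); }    // uniform integer in [1, 6]
    double Noise() { return noise_(eng_); } // normal with mean 0, stddev 1

private:
    std::default_random_engine eng_;  // the single source of randomness
    std::uniform_int_distribution<int> die_{1, 6};
    std::normal_distribution<double> noise_{0.0, 1.0};
};
```

Two Sampler objects built with the same fixed seed and called in the same order produce the same values, which is the kind of control over sequences mentioned above. If several threads need random numbers, each should get its own Sampler.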

You should be careful when using one pseudo random number generator for different random variables, because in doing so they become correlated.
Here is an example: If you want to simulate Brownian motion in two dimensions (e.g. x and y) you need randomness in both dimensions. If you take the random numbers from one generator (noise()) and assign them successively
while(simulating)
x = x + noise()
y = y + noise()
then the variables x and y become correlated, because the quality guarantees of pseudo-random number generators only apply if you take every single number generated, not only every second one as in this example. Here, the Brownian particles could perhaps move in the positive x and y directions with a higher probability than in the negative directions, and thus introduce an artificial drift.
For two further reasons to use different generators look at sehe's answer.

MosteM's answer isn't correct. It's correct to do this so long as you want the draws from the distributions to be independent. If for some reason you need exactly the same random input into draws of different distributions, then you may want different RNGs. If you want correlation between two random variables, it's better to build them starting from a common random variable using mathematical principles: e.g., if A, B are independent normal(0,1), then A and aA + sqrt(1-a**2)B are normal(0,1) with correlation a.
EDIT: I found a great resource on the C++11 random library which may be useful to you.
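The construction with A and aA + sqrt(1-a**2)B can be checked numerically. The sketch below draws both A and B from one shared engine and estimates the sample correlation; the function names are made up for this example, and the exact numbers depend on the standard library's normal_distribution implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Draws n pairs (A, a*A + sqrt(1 - a^2)*B), with A and B taken from ONE engine.
inline std::vector<std::pair<double, double> > CorrelatedNormals(std::mt19937& eng,
                                                                 double a,
                                                                 std::size_t n) {
    std::normal_distribution<double> normal(0.0, 1.0);
    const double b = std::sqrt(1.0 - a * a);
    std::vector<std::pair<double, double> > out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double A = normal(eng);
        const double B = normal(eng);
        out.emplace_back(A, a * A + b * B);
    }
    return out;
}

// Pearson sample correlation of the pairs.
inline double SampleCorrelation(const std::vector<std::pair<double, double> >& xy) {
    const double n = static_cast<double>(xy.size());
    double sx = 0.0, sy = 0.0;
    for (const auto& p : xy) { sx += p.first; sy += p.second; }
    const double mx = sx / n, my = sy / n;
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const auto& p : xy) {
        const double dx = p.first - mx, dy = p.second - my;
        sxx += dx * dx; syy += dy * dy; sxy += dx * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

With a = 0.6 and a few hundred thousand pairs, the estimated correlation should come out close to 0.6, which would not be the case if consecutive draws from the shared engine were noticeably dependent.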

There is no reason not to do it like this. Depending on which random generator you use, the period is quite huge (2^19937 - 1 in the case of the Mersenne Twister), so in most cases you won't even reach the end of one period during the execution of your program. And even if you did, it is not clear that reaching the period with all distributions sharing one generator is worse than having 3 generators each covering 1/3 of their period.
In my programs, I use one generator for each thread, and it works fine. I think that's the main reason they split up the generator and distributions in C++11, since if you weren't allowed to do this, there would be no benefit from having the generator and the distribution separate, if one needs one generator for each distribution anyway.
