Why is std::uniform_real_distribution better than rand() as the random number generator? Can someone give an example please?
First, it should be made clear that the proposed comparison is nonsensical.
uniform_real_distribution is not a random number generator. You cannot produce random numbers from a uniform_real_distribution without a random number generator that you pass to its operator(). uniform_real_distribution "shapes" the output of that random number generator into a uniform real distribution. You can plug various kinds of random number generators into a distribution.
I don't think this makes for a decent comparison, so I will be comparing the use of uniform_real_distribution with a C++11 random number generator against rand() instead.
Another obvious difference that makes the comparison even less useful is the fact that uniform_real_distribution is used to produce floating point numbers, while rand() produces integers.
That said, there are several reasons to prefer the new facilities.
rand() is global state, while when using the facilities from <random> there is no global state involved: you can have as many generators and distributions as you want and they are all independent from each other.
rand() has no specification about the quality of the sequence generated. The random number generators from C++11 are all well-specified, and so are the distributions. rand() implementations can be, and in practice have been, of very poor quality, and not very uniform.
rand() provides a random number within a predefined range. It is up to the programmer to adjust that range to the desired range. This is not a simple task. No, it is not enough to use % something. Doing this kind of adjustment in such a naive manner will most likely destroy whatever uniformity was there in the original sequence. uniform_real_distribution does this range adjustment for you, correctly.
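As a rough illustration of the range-adjustment point, here is a minimal sketch. The function names are made up for this example, and std::mt19937 is just one possible choice of engine. The last function only counts how many raw values land on a given remainder under % n, which is where the naive approach loses uniformity: with a raw range of 0..32767 and n = 10000, remainders 0..2767 are hit by 4 raw values each and the rest by only 3.

```cpp
#include <random>

// Returns a pseudo-random double from the range lo..hi, drawn from the engine.
// The engine is taken by reference so its state advances between calls.
double uniform_in_range(std::mt19937& engine, double lo, double hi) {
    std::uniform_real_distribution<double> dist(lo, hi);
    return dist(engine);
}

// Returns a pseudo-random integer in [lo, hi], both ends inclusive.
// The distribution does the range adjustment instead of `% something`.
int uniform_int_in_range(std::mt19937& engine, int lo, int hi) {
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(engine);
}

// Counts how many raw values in [0, raw_max] map to remainder r under `% n`.
// Assumes r < n. Unequal counts for different r mean `% n` is biased.
unsigned long raw_values_mapping_to(unsigned long raw_max,
                                    unsigned long n,
                                    unsigned long r) {
    const unsigned long total = raw_max + 1;
    return total / n + (r < total % n ? 1 : 0);
}
```

Typical use would be to construct one engine, for example std::mt19937 engine(std::random_device{}());, and pass it to these helpers whenever a value is needed.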
The real comparison is between rand and one of the random number engines provided by the C++11 standard library. std::uniform_real_distribution just distributes the output of an engine according to some parameters (for example, real values between 10 and 20). You could just as well make an engine that uses rand behind the scenes.
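To make the last remark concrete, here is a sketch of what such an engine could look like. RandEngine and real_between_10_and_20 are hypothetical names, not anything from the standard library; the adapter simply exposes std::rand() through the min()/max()/operator() interface that the distributions expect. It inherits all of rand()'s weaknesses (global state, unspecified quality), so it is an illustration of the engine/distribution split rather than a recommendation.

```cpp
#include <cstdlib>
#include <random>

// Hypothetical adapter that lets std::rand() be used as a <random> generator.
struct RandEngine {
    using result_type = unsigned int;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return RAND_MAX; }
    // Forwards to the global rand() state; nothing is stored in the object.
    result_type operator()() const {
        return static_cast<result_type>(std::rand());
    }
};

// The distribution shapes whatever the engine produces into the range 10..20.
double real_between_10_and_20(RandEngine& engine) {
    std::uniform_real_distribution<double> dist(10.0, 20.0);
    return dist(engine);
}
```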
Now the difference between the standard library random number engines and using plain old rand is in guarantee and flexibility. rand provides no guarantee for the quality of the random numbers - in fact, many implementations have shortcomings in their distribution and period. If you want some high quality random numbers, rand just won't do. However, the quality of the random number engines is defined by their algorithms. When you use std::mt19937, you know exactly what you're getting from this thoroughly tested and analysed algorithm. Different engines have different qualities that you may prefer (space efficiency, time efficiency, etc.) and are all configurable.
This is not to say you should use rand when you don't care too much. You might as well just start using the random number generation facilities from C++11 right away. There's no downside.
The reason is in the name of the class: std::uniform_real_distribution is specified to produce a uniform distribution of random numbers, whereas the uniformity of what rand() provides is not guaranteed.
The distribution for std::uniform_real_distribution is, of course, over a given half-open interval [a, b).
Essentially, when you ask for a random number between 1 and 10, std::uniform_real_distribution gives the same probability density to values near 5 as to values near 9 or anywhere else in the range, whereas if you call rand() many times, the probability of getting 5 may differ from that of getting 9.
According to : http://www.cplusplus.com/reference/cstdlib/rand/
In C, the generation algorithm used by rand is guaranteed to only be
advanced by calls to this function. In C++, this constraint is
relaxed, and a library implementation is allowed to advance the
generator on other circumstances (such as calls to elements of
<random>).
But then over here it says :
The function accesses and modifies internal state objects, which may
cause data races with concurrent calls to rand or srand.
Some libraries provide an alternative function that explicitly avoids
this kind of data race: rand_r (non-portable).
C++ library implementations are allowed to guarantee no data races for
calling this function.
Ideally I would like to have some kind of "instance" of rand, so that for that instance, and a given seed, I always generate the same sequence of numbers for calls to THAT instance . With the current versions it seems that in some platforms, calls by other functions to rand() (perhaps even on different threads), could affect the sequence of numbers generated in my thread by my code.
Is there an alternative, where I can hold on to some kind of "instance", where I am guaranteed to generate a particular sequence, given a seed, and where other calls to different "instances" do not affect it ?
EDIT: For clarity - my code is going to run on multiple different platforms (iOS, Android, Windows 8.1, Windows 10, Linux etc), and it isn't possible for me to test every implementation at present. I would just like to implement things based on what is guaranteed by the standard...
You can make use of std::uniform_int_distribution and std::mt19937 to keep a generator with your common seed (all from <random> library).
std::mt19937 gen(SEED);
std::uniform_int_distribution<> dis(MIN, MAX);
auto random_number = dis(gen);
Here, SEED is the seed number you want to specify. You can set another seed later with the .seed method too:
std::mt19937 gen{};
gen.seed(SEED);
If you need to generate one, you can use std::random_device for that:
std::random_device rd{};
std::mt19937 gen(rd());
The dis(MIN, MAX) part sets a range of min and max values this distribution can come up with, which means it will never generate a value bigger than MAX, or smaller than MIN.
Finally, you can use your generator with this distribution to generate your wanted random values like so: dis(gen). The distribution can take any generator, so if you want other distributions with the same sequence of random numbers, you may make a copy of gen, or use the same seed and construct two or more generators.
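A small sketch of the "instance" idea, assuming std::mt19937 as the engine; the alias Engine and the helper draw are names invented for this example. Each engine object carries its own state, so draws from one object do not disturb another. As far as I know, the raw output of the standard engines for a given seed is fixed by the standard, while the algorithms inside the distributions are left to the implementation, so values that go through a distribution may differ between platforms.

```cpp
#include <cstddef>
#include <random>
#include <vector>

using Engine = std::mt19937;

// Draws `count` raw values from the given engine instance.
// Only this instance's state advances; other engines are untouched.
std::vector<Engine::result_type> draw(Engine& engine, std::size_t count) {
    std::vector<Engine::result_type> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(engine());
    }
    return values;
}
```

Two engines constructed with the same seed, for example Engine a(1234) and Engine b(1234), produce the same values from draw no matter how often some third engine is used in between, and copying an engine (Engine c = a;) duplicates its current position in the sequence.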
Use random() (a POSIX function, not part of standard C or C++) instead of rand().
https://www.securecoding.cert.org/confluence/display/c/MSC30-C.+Do+not+use+the+rand%28%29+function+for+generating+pseudorandom+numbers
https://www.securecoding.cert.org/confluence/display/c/CON33-C.+Avoid+race+conditions+when+using+library+functions
I am using the new C++11 <random> header in my application, and in one class I need random numbers with different distributions in different methods. I put a random engine (std::default_random_engine) in the class as a member, seed it in the class constructor with std::random_device, and use it for the different distributions in my methods. Is it OK to use the random engine this way, or should I declare a separate engine for every distribution I use?
It's ok.
Reasons to not share the generator:
threading (standard RNG implementations are not thread safe)
determinism of random sequences:
If you wish to be able (for testing/bug hunting) to control the exact sequences generated, you will likely have less trouble if you isolate the RNGs used, especially when not all RNG consumption is deterministic.
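For reference, the setup described in the question could look roughly like the sketch below. The class name Sampler, its methods, and the particular distributions are all invented for illustration. The extra constructor taking an explicit seed is one way to get the reproducible sequences mentioned above when testing.

```cpp
#include <random>

// Hypothetical class: one engine member shared by several distributions.
class Sampler {
public:
    // Normal use: seed from std::random_device.
    Sampler() : engine_(std::random_device{}()) {}
    // Testing/bug hunting: fixed seed gives a repeatable sequence.
    explicit Sampler(unsigned seed) : engine_(seed) {}

    int roll_die() { return die_(engine_); }      // uniform integer in [1, 6]
    double noise() { return gauss_(engine_); }    // normal(0, 1)
    bool coin() { return coin_(engine_); }        // true with probability 0.5

private:
    std::default_random_engine engine_;
    std::uniform_int_distribution<int> die_{1, 6};
    std::normal_distribution<double> gauss_{0.0, 1.0};
    std::bernoulli_distribution coin_{0.5};
};
```

Because the engine is not thread safe, an object like this would need to stay on one thread or be protected by a lock.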
You should be careful when using one pseudo random number generator for different random variables, because in doing so they become correlated.
Here is an example: If you want to simulate Brownian motion in two dimensions (e.g. x and y) you need randomness in both dimensions. If you take the random numbers from one generator (noise()) and assign them successively
while(simulating)
x = x + noise()
y = y + noise()
then the variables x and y become correlated, because the algorithms of pseudo-random number generators only make statements about their quality if you take every single number generated, and not only every second one as in this example. Here, the Brownian particles might move in the positive x and y directions with a higher probability than in the negative directions, and thus introduce an artificial drift.
For two further reasons to use different generators look at sehe's answer.
MosteM's answer isn't correct. It's correct to do this so long as you want the draws from the distributions to be independent. If for some reason you need exactly the same random input into draws of different distributions, then you may want different RNGs. If you want correlation between two random variables, it's better to build them starting from a common random variable using mathematical principles: e.g., if A and B are independent normal(0,1), then A and aA + sqrt(1 - a^2)B are both normal(0,1) with correlation a.
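The construction at the end of this answer can be sketched as follows. The function name correlated_normals and the parameter name rho (the correlation called a above) are my own, and std::mt19937 is an assumed engine choice. Both draws deliberately come from the same engine.

```cpp
#include <cmath>
#include <random>
#include <utility>

// Returns (X, Y), two normal(0,1) variates with correlation rho, built from
// two independent draws A and B taken from one shared engine:
//   X = A,  Y = rho*A + sqrt(1 - rho^2)*B.
// Assumes -1 <= rho <= 1.
std::pair<double, double> correlated_normals(std::mt19937& engine, double rho) {
    std::normal_distribution<double> normal(0.0, 1.0);
    const double a = normal(engine);
    const double b = normal(engine);
    return {a, rho * a + std::sqrt(1.0 - rho * rho) * b};
}
```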
EDIT: I found a great resource on the C++11 random library which may be useful to you.
There is no reason not to do it like this. Depending on which random generator you use, the period is huge (2^19937 - 1 in the case of the Mersenne Twister), so in most cases you won't even reach the end of one period during the execution of your program. And even if you did, nothing says that reaching the period with all distributions sharing one generator is worse than having 3 generators each going through 1/3 of their period.
In my programs, I use one generator for each thread, and it works fine. I think that's the main reason they split up the generator and distributions in C++11, since if you weren't allowed to do this, there would be no benefit from having the generator and the distribution separate, if one needs one generator for each distribution anyway.
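One way to get "one generator per thread" is a thread_local engine behind an accessor, sketched below. This is only my guess at a convenient shape, not necessarily what the author of the answer does; thread_engine and roll are hypothetical names.

```cpp
#include <random>

// Returns this thread's own engine. It is created and seeded the first time
// the calling thread asks for it, and is never shared with other threads.
std::mt19937& thread_engine() {
    thread_local std::mt19937 engine(std::random_device{}());
    return engine;
}

// Any number of distributions can draw from the same per-thread engine.
int roll(int sides) {
    std::uniform_int_distribution<int> dist(1, sides);
    return dist(thread_engine());
}
```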
If a random generator function is not supplied to the random_shuffle algorithm in the standard library, will successive runs of the program produce the same random sequence if supplied with the same data?
For example, if
std::random_shuffle(filenames.begin(), filenames.end());
is performed on the same list of filenames from a directory in successive runs of the program, is the random sequence produced the same as that in the prior run?
If you use the same random generator, with the same seed, and the same starting
sequence, the results will be the same. A computer is, after all,
deterministic in its behavior (modulo threading issues and a few other
odds and ends).
If you do not specify a generator, the default generator is
implementation defined. Most implementations, I think, use
std::rand() (which can cause problems, particularly when the number of
elements in the sequence is larger than RAND_MAX). I would recommend
getting a generator with known quality, and using it.
If you don't correctly seed the generator which is being used (another
reason to not use the default, since how you seed it will depend on the
implementation), then you'll get what you get. In the case of
std::rand(), the default always uses the same seed. How you seed
depends on the generator used. What you use to seed it should vary
from one run to the other; for many applications, time(NULL) is
sufficient; on a Unix platform, I'd recommend reading however many bytes
it takes from /dev/random. Otherwise, hashing other information (IP
address of the machine, process id, etc.) can also improve things---it
means that two users starting the program at exactly the same second
will still get different sequences. (But this is really only relevant
if you're working in a networked environment.)
25.2.11 just says that the elements are shuffled with uniform distribution. It makes no guarantees as to which RNG is used behind the scenes (unless you pass one in) so you can't rely on any such behavior.
In order to guarantee the same shuffle outcome you'll need to provide your own RNG that provides those guarantees, but I suspect even then if you update your standard library the random_shuffle algorithm itself could change effects.
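A sketch of supplying your own generator, using std::shuffle (C++11) rather than std::random_shuffle, which was deprecated in C++14 and removed in C++17. The function name shuffled_copy and the choice of std::mt19937 are assumptions for this example. With a fixed seed the result repeats from run to run on the same standard library, with the caveat from above that the shuffle algorithm itself is not pinned down across library versions.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Returns a shuffled copy of `names`, using an engine seeded with `seed`.
// Same input, same seed, same standard library: same order.
std::vector<std::string> shuffled_copy(std::vector<std::string> names,
                                       std::uint32_t seed) {
    std::mt19937 engine(seed);
    std::shuffle(names.begin(), names.end(), engine);
    return names;
}
```

To get a different order on each run instead, the seed could come from std::random_device.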
The program may produce an identical result on every run. If this is a problem, you can pass a custom random number generator (which can be seeded from an external source) as the third argument to std::random_shuffle. Some people recommend calling srand(unsigned(time(NULL))); before random_shuffle, but the results are often implementation-defined (and unreliable).
I'm using the Mersenne twister algorithm to shuffle playing cards. Each time the deck needs to be shuffled I seed it with time(NULL) + deckCutCardNumber which is where the user chose to cut the deck. Would I get better results from only seeding it the first hand and continuing to generate them with the same seed or is this method more random?
Thanks
Only seed the PRNG once. The statistical properties of the generated sequence are only guaranteed after the seed. If you reseed every time, the resulting sequence may not have any predictable statistical properties.
For instance, consider a PRNG which always returns the seed value itself as the first number in the sequence, but which is perfectly uniform over its range. This constitutes a great PRNG, as long as you don't use the first number. However, if you reseed it before every use, say to an incrementing counter value, you have no randomness at all!
Assuming the user doesn't mess with the clock (or carefully reduce their cut number by exactly the time that has passed), they'll never see a repeated state of the PRNG anyway, so it doesn't make much difference what you do. You'll get a reasonable distribution out of the Mersenne Twister from any seed value[*], and at any feasible number of steps after re-seeding.
If you're keen to reseed, though, you could combine both approaches by seeding with the time, plus the user-chosen number, plus an output taken from the generator just before reseeding. That combines (part of, not all) the current state of the PRNG with the new seed data, so to some degree all of the past times and cut values (and number of uses of the PRNG) can affect the state, not just the most recent. Pouring more information into the seed value in this way could be considered "more random" than a seed involving less information and hence fewer plausible values.
The only thing about Mersenne Twister in particular is that if you can observe 600-odd outputs of it, then you can deduce its internal state and predict the rest of the output until it's reseeded. Then again, you probably wouldn't use MT for an application where that sort of thing matters: if you're relying on the reseed in any way then you should probably use a more secure PRNG to begin with. Clearly it doesn't matter for your application if the user can predict the values out of the PRNG, since the user knows the time just as well as you do. All of this tells you that it shouldn't matter how it's seeded, just so long as it isn't seeded with exactly the same value so that two games are identical. Hence it doesn't matter whether it's reseeded either.
[*] That's not strictly true, there are classes of weak seeds for MT. But as long as you take that into account when seeding (for instance, hash the seed before use so that bad values are unlikely to crop up by chance), you work around that.
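The combined reseeding idea could be sketched like this. std::seed_seq is used here as the "hash the seed before use" step; that is my substitution, not something the answer prescribes, and reseed_with_history and its parameter names are hypothetical.

```cpp
#include <cstdint>
#include <random>

// Reseeds `engine` from the time, the user's cut position, and one output of
// the engine's current state, so earlier history still influences the result.
void reseed_with_history(std::mt19937& engine,
                         std::uint32_t time_value,
                         std::uint32_t cut_position) {
    // Take one value from the old state before it is replaced.
    const auto carry = static_cast<std::uint32_t>(engine());
    // seed_seq scrambles its inputs before they fill the engine state.
    std::seed_seq mixed{time_value, cut_position, carry};
    engine.seed(mixed);
}
```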
It will be less random if you seed off of the user choice every time than if you only seed once. The reason being that the choice of cut will probably have a skewed distribution (maybe cutting at the 10th card is the most likely etc). If you want to continuously seed you should use something like the system time as the seed.
Yes, you would get better results when not seeding every time. That's the purpose of a (good) random number generator.
In this special case the first value would just increase by the time you waited between the shuffles, while a continuously applied RNG would give you numbers across its whole range.
It's neither more nor less random. It's not really random at all anyway, but you won't notice any difference if you reseed it every time or not.
However, I'd recommend against it because time returns a time_t, typically with one-second resolution, so if you call it twice in the same second, you'll get the same number, and hence the same numbers from the RNG. Then there's distribution and all that.
I would suggest initializing the PRNG for each shuffle for a completely different reason: It allows you to quantify the state of the deck using only the seed, which means you can provide the seed to the user, or log it, or whatever suits, and be able to easily recreate the hand as dealt at a later stage.
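A sketch of the "one seed per shuffle" approach, with cards represented as the integers 0..51 and deal_deck as a made-up name. Logging the seed is then enough to recreate the deck later, assuming the replay runs against the same standard library implementation. One consequence worth keeping in mind: a 32-bit seed can only ever select about 4 billion of the 52! possible orderings.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Builds an ordered deck 0..51 and shuffles it with a fresh engine seeded
// with `seed`. The seed alone identifies the resulting deck.
std::vector<int> deal_deck(std::uint32_t seed) {
    std::vector<int> deck(52);
    std::iota(deck.begin(), deck.end(), 0);
    std::mt19937 engine(seed);
    std::shuffle(deck.begin(), deck.end(), engine);
    return deck;
}
```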
You really should avoid seeding based on time, though - it's generally a better idea to use a source of randomness such as /dev/urandom instead.
Edit: Another argument for re-seeding occurs if you're worried about players guessing the internal state and therefore knowing what cards will be dealt in future. This is possible after observing 624 outputs from the Mersenne Twister (at least according to Wikipedia); this is only possible if you reuse the same PRNG. If this does matter, though, you certainly shouldn't be seeding based on time, and you should probably be using a cryptographically secure PRNG anyway.
Re-seeding the random number generator will not give you any higher quality random numbers than seeding it once (quite the contrary in many cases, depending on your seed values).
I am using the random number generator provided with the C++ standard library. How do we bias it so that it produces smaller random numbers with a greater probability than larger random numbers?
One simple way would be to take every random number generated in the range [0,1) and raise it to any power greater than 1, depending on how skewed you want the results to be.
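A minimal sketch of that idea, assuming std::mt19937 as the engine; the function name skewed_toward_zero is invented. For a uniform U on [0,1) and exponent k, the mean of U^k is 1/(k+1), so k = 2 pulls the average down from 1/2 to 1/3.

```cpp
#include <cmath>
#include <random>

// Draws a uniform value from [0, 1) and raises it to `exponent`.
// Exponents greater than 1 push the results toward 0.
double skewed_toward_zero(std::mt19937& engine, double exponent) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    return std::pow(unit(engine), exponent);
}
```

Multiplying the result by the desired maximum then gives a skewed value in that range.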
Well, in this case you probably want a certain probability distribution. You can generate any distribution from a uniform random number generator; the only question is what it should look like. Rejection sampling is a common way of generating distributions that are hard to describe otherwise, but in your case something simpler might suffice.
You can take a look at this article for many common distribution functions. Chi, Chi-Square and Exponential look like good candidates.
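Of the candidates mentioned, the exponential distribution is available directly in <random>. A small sketch, with draw_exponential as a hypothetical helper name and std::mt19937 as an assumed engine: small values are the most likely, and the mean is 1/rate.

```cpp
#include <random>

// Draws a non-negative value from an exponential distribution.
// Larger `rate` concentrates the results closer to 0 (mean is 1/rate).
double draw_exponential(std::mt19937& engine, double rate) {
    std::exponential_distribution<double> dist(rate);
    return dist(engine);
}
```

Note that the result has no upper bound, so values would need to be clamped or redrawn if a maximum is required.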
Use std::discrete_distribution to calculate random numbers with a skewed probability distribution. See example here:
http://www.cplusplus.com/reference/random/discrete_distribution/
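A short sketch of how std::discrete_distribution could be used for this. The weights 8, 4, 2, 1 and the function name small_biased_index are arbitrary choices for illustration: index i is returned with probability proportional to its weight, so 0 comes up about eight times as often as 3.

```cpp
#include <random>

// Returns an index in [0, 3]; smaller indices are more likely because
// they carry larger weights (8 : 4 : 2 : 1).
int small_biased_index(std::mt19937& engine) {
    std::discrete_distribution<int> dist{8.0, 4.0, 2.0, 1.0};
    return dist(engine);
}
```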