How to bias a random number generator - C++

I am using the random number generator provided with the C++ standard library. How do I bias it so that it produces smaller random numbers with greater probability than larger ones?

One simple way would be to take every random number generated in the range [0,1) and raise it to a power greater than 1, depending on how skewed you want the results to be.

Well, in this case you probably want a specific probability distribution. You can generate any distribution from a uniform random number generator; the question is only what it should look like. Rejection sampling is a common way of generating distributions that are hard to describe otherwise, but in your case something simpler might suffice.
You can take a look at this article for many common distribution functions. Chi, Chi-Square and Exponential look like good candidates.

Use std::discrete_distribution to calculate random numbers with a skewed probability distribution. See example here:
http://www.cplusplus.com/reference/random/discrete_distribution/

Related

C++ normal_distribution function for simulation application

I was wondering what kind of random number generator the normal_distribution function uses?
Is it fit for scientific simulation applications?
Regards
std::normal_distribution doesn't do any random number generation. It is a random number distribution. Random number distributions only map values returned by a random number engine to some kind of distribution. They don't do any generation themselves. So it is the random number engine that you care about.
One of the random number engines provided by the standard, the std::mersenne_twister_engine is a very high quality random number engine. You can use it to generate random numbers with a normal distribution like so:
#include <iostream>
#include <random>

std::random_device rd;
std::mt19937 gen(rd()); // Create and seed the generator
std::normal_distribution<> d(mean, deviation); // Create distribution (mean and deviation are doubles you supply)
std::cout << d(gen) << std::endl; // Generate a random number according to the distribution
Note that std::mt19937 is a typedef of std::mersenne_twister_engine.
The whole point of the <random> standard library is to separate distributions from random number generators. You supply a random number generator that generates uniform integers, and the distribution takes care of transforming that random, uniform integer sequence into a sample of the desired distribution.
Fortunately, the <random> library also contains a collection of random number generators. The Mersenne Twister (std::mt19937) in particular is a relatively good (i.e. fast and statistically high quality) one.
(You also need to provide a seed for the generator.)
I know the post is old; however, I hope my answer is beneficial. I use normal_distribution to generate Gaussian noise for a sensor, which is useful for simulating sensors. For example, say you have a sensor that gives you the position of a robot in 2D. Every time you move the robot, the sensor gives you some readings of its position. You can simulate this in OpenGL: track the position of the mouse and add Gaussian noise to its real position. In this case, you have a sensor that tracks the position of the mouse, but its readings have uncertainty due to the noise.

std::uniform_real_distribution and rand()

Why is std::uniform_real_distribution better than rand() as the random number generator? Can someone give an example please?
First, it should be made clear that the proposed comparison is nonsensical.
uniform_real_distribution is not a random number generator. You cannot produce random numbers from a uniform_real_distribution without a random number generator that you pass to its operator(). uniform_real_distribution "shapes" the output of that random number generator into a uniform real distribution. You can plug various kinds of random number generators into a distribution.
I don't think this makes for a decent comparison, so I will be comparing the use of uniform_real_distribution with a C++11 random number generator against rand() instead.
Another obvious difference that makes the comparison even less useful is the fact that uniform_real_distribution is used to produce floating point numbers, while rand() produces integers.
That said, there are several reasons to prefer the new facilities.
rand() is global state, while when using the facilities from <random> there is no global state involved: you can have as many generators and distributions as you want and they are all independent from each other.
rand() has no specification about the quality of the sequence generated. The random number generators from C++11 are all well-specified, and so are the distributions. rand() implementations can be, and in practice have been, of very poor quality, and not very uniform.
rand() provides a random number within a predefined range. It is up to the programmer to adjust that range to the desired range. This is not a simple task. No, it is not enough to use % something. Doing this kind of adjustment in such a naive manner will most likely destroy whatever uniformity was there in the original sequence. uniform_real_distribution does this range adjustment for you, correctly.
The real comparison is between rand and one of the random number engines provided by the C++11 standard library. std::uniform_real_distribution just distributes the output of an engine according to some parameters (for example, real values between 10 and 20). You could just as well make an engine that uses rand behind the scenes.
Now the difference between the standard library random number engines and using plain old rand is in guarantee and flexibility. rand provides no guarantee for the quality of the random numbers - in fact, many implementations have shortcomings in their distribution and period. If you want some high quality random numbers, rand just won't do. However, the quality of the random number engines is defined by their algorithms. When you use std::mt19937, you know exactly what you're getting from this thoroughly tested and analysed algorithm. Different engines have different qualities that you may prefer (space efficiency, time efficiency, etc.) and are all configurable.
This is not to say you should use rand when you don't care too much. You might as well just start using the random number generation facilities from C++11 right away. There's no downside.
The reason is actually in the name of the function: the uniformity of the distribution of random numbers is better with std::uniform_real_distribution than with the roughly uniform distribution that rand() provides.
The distribution for std::uniform_real_distribution is of course over a given interval [a,b).
Essentially, with std::uniform_real_distribution, when you ask for a random number between 1 and 10, the probability density of getting 5 is the same as that of getting 9 or any other possible value. If you instead call rand() several times and adjust its output to that range, the probability of getting 5 rather than 9 may differ.

How to create a vector containing a (artificially generated) Guassian (normal) distribution?

Suppose I have data (a daily stock chart is a good example, but it could be anything) in which I only know the range (high - low) that X units sold within, but not the exact price at which any given item sold. Assume for simplicity that the price range contains enough buckets (e.g. forty one-cent increments for a 40-cent range) to make such a distribution practical. How can I go about distributing those items to form a normal bell curve stored in a vector? It doesn't have to be perfect, just realistic.
My (very) naive thinking has been to assume that since random numbers should form a normal distribution, I can do something like use a binary RNG. If, for example, there are forty buckets, then if a '0' comes up 40 times the 0th bucket gets incremented, and if a '1' comes up 40 times in a row then the 39th bucket gets incremented. If '1' comes up 20 times, it lands in the middle of the vector. Do this for each item until X units have been accounted for. This may or may not be right, and in any case seems way more inefficient than necessary. I am looking for something more sensible.
This isn't homework, just a problem that has been bugging me and my statistics is not up to snuff. Most literature seems to be about analyzing the distribution after it already exists but not much about how to artificially create one.
I want to write this in c++ so pre-packaged solutions in R or matlab or whatnot are not too useful for me.
Thanks. I hope this made sense.
Most literature seems to be about analyzing the distribution after it already exists but not much about how to artificially create one.
There's tons of literature on how to create one. The Box–Muller transform, the Marsaglia polar method (a variant of Box–Muller), and the Ziggurat algorithm are three. (Google those terms.) Both Box–Muller methods are easy to implement.
Better yet, just use a random generator that already exists that implements one of these algorithms. Both boost and the new C++11 have such packages.
The algorithm that you describe relies on the Central Limit Theorem that says that a random variable defined as the sum of n random variables that belong to the same distribution tends to approach a normal distribution when n grows to infinity. Uniformly distributed pseudorandom variables that come from a computer PRNG make a special case of this general theorem.
To get a more efficient algorithm, you can view the probability density function as some sort of space warp that expands the real axis in the middle and shrinks it toward the ends.
Let F: R -> [0,1] be the cumulative distribution function of the normal distribution, invF its inverse, and x a random variable uniformly distributed on [0,1); then invF(x) will be a normally distributed random variable.
All you need to implement this is the ability to compute invF(x). Unfortunately this function cannot be expressed in terms of elementary functions. However, you can efficiently solve the equation x = F(y) for y using Newton's method.
What I have described is a simplified presentation of the Inverse transform method. It is a very general approach. There are specialized algorithms for sampling from the normal distribution that are more efficient. These are mentioned in the answer of David Hammen.

using one random engine for multi distributions in c++11

I am using the new C++11 <random> header in my application, and in one class I need random numbers with different distributions in different methods. I just put a random engine std::default_random_engine as a class member, seed it in the class constructor with std::random_device, and use it for the different distributions in my methods. Is it OK to use the random engine this way, or should I declare a different engine for every distribution I use?
It's ok.
Reasons to not share the generator:
threading (standard RNG implementations are not thread safe)
determinism of random sequences:
If you wish to be able (for testing/bug hunting) to control the exact sequences generated, you will likely have less trouble if you isolate the RNGs used, especially when not all RNG consumption is deterministic.
You should be careful when using one pseudo random number generator for different random variables, because in doing so they become correlated.
Here is an example: If you want to simulate Brownian motion in two dimensions (e.g. x and y) you need randomness in both dimensions. If you take the random numbers from one generator (noise()) and assign them successively
while (simulating)
{
    x = x + noise();
    y = y + noise();
}
then the variables x and y become correlated, because the quality guarantees of pseudorandom number generators only hold if you consume every number generated, not every second one as in this example. Here, the Brownian particles could move in the positive x and y directions with a higher probability than in the negative directions, introducing an artificial drift.
For two further reasons to use different generators look at sehe's answer.
MosteM's answer isn't correct. It is fine to do this as long as you want the draws from the distributions to be independent. If for some reason you need exactly the same random input for draws from different distributions, then you may want different RNGs. If you want correlation between two random variables, it's better to build them from a common random variable using mathematical principles: e.g., if A and B are independent normal(0,1), then A and a*A + sqrt(1 - a^2)*B are both normal(0,1) with correlation a.
EDIT: I found a great resource on the C++11 random library which may be useful to you.
There is no reason not to do it like this. Depending on which random generator you use, the period is huge (2^19937 - 1 in the case of the Mersenne Twister), so in most cases you won't even reach the end of one period during the execution of your program. And even so, exhausting the period with all distributions sharing one generator is no worse than having 3 generators each traverse 1/3 of their period.
In my programs, I use one generator per thread, and it works fine. I think that's one of the main reasons they split generators and distributions in C++11: if you weren't allowed to do this, there would be little benefit to having the generator and the distribution separate, since you would need one generator per distribution anyway.

Best way to generate a set of integers of size N, distributed like a normal distribution, given a mean and std. deviation

I'm looking for a way to generate a set of integers with a specified mean and std. deviation.
Using the random library, it is possible to generate a set of random doubles distributed in gaussian fashion, this would look something like this:
#include <tr1/random>

std::tr1::normal_distribution<double> normal(mean, stdDev);
std::tr1::ranlux64_base_01 eng;
eng.seed(1000);
for (int i = 0; i < N; i++)
{
    gaussiannums[i] = normal(eng);
}
However, for my application, I need integers instead of doubles. So my question is, how would you generate the equivalent of the above but for integers instead of doubles? One possible path to take is to convert the doubles into integers in some fashion, but I don't know enough about how the random library works to know whether this can be done in a fashion that really preserves the bell shape and the mean/std. deviation.
I should mention that the goal here is not so much randomness, as it is to get a set of integers of a specific size, with the correct mean and std. deviation.
Ideally I would also like to specify the minimum and maximum values that can be produced, but I have not found any way to do this even for doubles, so any suggestions on this are also welcome.
This isn't possible.
The Gaussian distribution is continuous; the set of integers is discrete.
The Gaussian pdf has unbounded support; if you specify a minimum and maximum, you'll also have a different distribution.
What are you really trying to do? Is it only the mean and standard deviation that count? Other distributions have a well-defined mean and standard deviation, including several discrete distributions.
For example, you could use a binomial distribution.
Solve the equations for mean and variance simultaneously to get p and n. Then generate samples from this distribution.
If n doesn't come out integer, you can use a multinomial distribution instead.
Although wikipedia describes methods for sampling from a binomial or multinomial distribution, they aren't particularly efficient. There's a method for efficiently generating samples from an arbitrary discrete distribution which you can use here.
In the comments, you clarified that you want a bell-shaped distribution with specific mean and standard deviation and bounded support. So we'll use the Gaussian as a starting point:
compute a gaussian CDF across the range of integers you're interested in
offset and scale it slightly to account for the missing tails (so it varies from 0 to 1)
store it in an array
To sample from this distribution:
generate uniform reals in the range [0:1]
use binary search to invert the CDF
As the truncation step will reduce the standard deviation slightly (and affect the mean also, if the minimum and maximum aren't equidistant from the chosen mean) you may have to tweak the Gaussian parameters slightly beforehand.