I've written a simulation in C++ that generates (1,000,000)^2 numbers from a specific probability distribution and then does something with them. So far I've used Exponential, Normal, Gamma, Uniform and Poisson distributions. Here is the code for one of them:
#include <boost/random.hpp>
...main...
srand(time(NULL)) ;
seed = rand();
boost::random::mt19937 igen(seed) ;
boost::random::variate_generator<boost::random::mt19937, boost::random::normal_distribution<> >
norm_dist(igen, boost::random::normal_distribution<>(mu,sigma)) ;
Now I need to run it for the Beta distribution. All of the distributions I've done so far took 10-15 hours. The Beta distribution is not in the boost/random package so I had to use the boost/math/distributions package. I found this page on StackOverflow which proposed a solution. Here it is (copy-pasted):
#include <boost/math/distributions.hpp>
using namespace boost::math;
double alpha, beta, randFromUnif;
//parameters and the random value on (0,1) you drew
beta_distribution<> dist(alpha, beta);
double randFromDist = quantile(dist, randFromUnif);
I replicated it and it worked. The run time estimates of my simulation are linear and accurately predictable. They say that this will run for 25 days. I see two possibilities:
1. the method proposed is inferior to the one I was using previously for other distributions
2. the Beta distribution is just much harder to generate random numbers from
Bare in mind that I have below minimal understanding of C++ coding, so the questions I'm asking may be silly. I can't wait for a month for this simulation to complete, so is there anything I can do to improve that? Perhaps use the initial method that I was using and modify it to work with the boost/math/distributions package? I don't even know if that's possible.
Another piece of information that may be useful is that the parameters are the same for all (1,000,000)^2 of the numbers that I need to generate. I'm saying this because the Beta distribution does have a nasty PDF and perhaps the knowledge that the parameters are fixed can somehow be used to simplify the process? Just a random guess.
The beta distribution is related to the gamma distribution. Let X be a random number drawn from Gamma(α,1) and Y from Gamma(β,1), where the first argument to the gamma distribution is the shape parameter. Then Z=X/(X+Y) has distribution Beta(α,β). With this transformation, it should only take twice as much time as your gamma distribution test.
Note: The above assumes the most common representation of the gamma distribution, Gamma(shape,scale). Be aware that different implementations of the gamma distribution random generator will vary with the meaning and order of the arguments.
If you want a distribution that is very Beta-like, but has a very simple closed-form inverse CDF, it's worth considering the Kumaraswamy distribution:
http://en.wikipedia.org/wiki/Kumaraswamy_distribution
It's used as an alternative to the Beta distribution when a large number of random samples are required quickly.
Try compiling with optimization. Using a flag -O3 will usually speed things up. See this post on optimisation flags or this overview for slightly more detail.
Related
When optimizing a SVGP with Poisson Likelihood for a big data set I see what I think are exploding gradients.
After a few epochs I see a spiky drop of the ELBO, which then very slowly recovers after getting rid of all progress made before.
Roughly 21 iterations correspond to an Epoch.
This spike (at least the second one) resulted in a complete shift of the parameters (for vectors of parameters I just plotted the norm to see changes):
How can I deal with that? My first approach would be to clip the gradient, but that seems to require digging around the gpflow code.
My Setup:
Training works via Natural Gradients for the variational parameters and ADAM for the rest, with a slowly (linearly) increasing schedule for the Natural Gradient Gamma.
The batch and inducing point sizes are as large as possible for my setup
(both 2^12, with the data set consisting of ~88k samples). I include 1e-5 jitter and initialize the inducing points with kmeans.
I use a combined kernel, consisting of a combination of RBF, Matern52, a periodic and a linear kernel on a total of 95 features (a lot of them due to a one-hot encoding), all learnable.
The lengthscales are transformed with gpflow.transforms.
with gpflow.defer_build():
k1 = Matern52(input_dim=len(kernel_idxs["coords"]), active_dims=kernel_idxs["coords"], ARD=False)
k2 = Periodic(input_dim=len(kernel_idxs["wday"]), active_dims=kernel_idxs["wday"])
k3 = Linear(input_dim=len(kernel_idxs["onehot"]), active_dims=kernel_idxs["onehot"], ARD=True)
k4 = RBF(input_dim=len(kernel_idxs["rest"]), active_dims=kernel_idxs["rest"], ARD=True)
#
k1.lengthscales.transform = gpflow.transforms.Exp()
k2.lengthscales.transform = gpflow.transforms.Exp()
k3.variance.transform = gpflow.transforms.Exp()
k4.lengthscales.transform = gpflow.transforms.Exp()
m = gpflow.models.SVGP(X, Y, k1 + k2 + k3 + k4, gpflow.likelihoods.Poisson(), Z,
mean_function=gpflow.mean_functions.Constant(c=np.ones(1)),
minibatch_size=MB_SIZE, name=NAME)
m.mean_function.set_trainable(False)
m.compile()
UPDATE: Using only ADAM
Following the suggestion by Mark, I switched to ADAM only,
which helped me get rid of that sudden explosion. However, I still initialized with an epoch of natgrad only, which seems to save a lot of time.
In addition, the variational parameters seem to change a lot less abrupt (in terms of their norm at least). I guess they'll converge way slower now, but at least it's stable.
Just to add to Mark's answer above, when using nat grads in non-conjugate models it can take a bit of tuning to get the best performance, and instability is potentially a problem. As Mark points out, the large steps that provide potentially faster convergence can also lead to the parameters ending up in in bad regions of the parameter space. When the variational approximation is good (i.e. the true and approximate posterior are close) then there is good reason to expect that the nat grad will perform well, but unfortunately there is no silver bullet in the general case. See https://arxiv.org/abs/1903.02984 for some intuition.
This is very interesting. Perhaps trying to not use natgrads is a good idea as well. Clipping gradients indeed seems like a hack that could work. And yes, this would require digging around in the GPflow code a bit. One tip that can help towards this, is by not using the GPflow optimisers directly. The model._likelihood_tensor contains the TF tensor that should be optimised. Perhaps with some manual TensorFlow magic, you can do the gradient clipping on here before running an optimiser.
In general, I think this sounds like you've stumbled on an actual research problem. Usually these large gradients have a good reason in the model, which can be addressed with careful thought. Is it variance in some monte carlo estimate? Is the objective function behaving badly?
Regarding why not using natural gradients helps. Natural gradients use the Fisher matrix as a preconditioner to perform second order optimisation. Doing so can result in quite aggressive moves in parameter space. In certain cases (when there are usable conjugacy relations) these aggressive moves can make optimisation much faster. This case, with the Poisson likelihood, is not one where there are conjugacy relations that will necessarily help optimisation. In fact, the Fisher preconditioner can often be detrimental, particularly when variational parameters are not near the optimum.
I am trying to use mersenne twister to generate samples from various distribution. I have one generator and it is used to generate all of them. Something strange (to me at least) happens here. On one hand calculating the correlation coefficient of the various samples gives me almost zero, which seems nice. But when I change a parameter of one distribution (which is used nowhere else), it somehow also changes the results I get in others. Specifically:
#include <boost/random.hpp>
using namespace boost; // boost random library for random generators
mt19937 generator(7687); // mersenne twister random number generator, seed = 7687
double normal_sample(double mu, double sigma)
// returns a sample from normal distribution with mean mu and variance sigma
{
normal_distribution<> norm_dist;
variate_generator<mt19937&, normal_distribution<> > norm_rnd(generator, norm_dist);
return(mu + sigma * norm_rnd());
}
double poisson_sample(double intensity)
// returns a number of points in a realization of a Poisson point process
{
poisson_distribution<> poiss_dist(intensity);
variate_generator<mt19937&, poisson_distribution<> > poiss_rnd(generator, poiss_dist);
return(poiss_rnd());
}
This is the code...the generator part, then I draw from those two distributions, changing the parameter called intensity. This changes not only the Poisson sample, but the normal one as well...actually, now that I think of it, it kind of makes sense, because my Poisson sample determines a number of points that are also randomly generated using the same generator...so then then depending on how many of them there are, I get something else, because the normal sample is generated using different numbers in the sequence. Is that correct?
If so, how would one go about changing that? Should I use multiple generators?
It probably means that depending on the parameters fewer or more random samples are extracted from the mersenne twister.
This logically implies that all other results are shifted, making all other outcomes different.
[...] it kind of makes sense, because my Poisson sample determines a number of points that are also randomly generated using the same generator...so then then depending on how many of them there are, I get something else, because the normal sample is generated using different numbers in the sequence. Is that correct?
Seems to me you got it figured out already, yes.
If you wanted repeatable PRNG, use separate PRNG states, i.e. different mersenne egnines.
I'm trying to write a Monte Carlo simulation. In my simulation I need to generate many random variates from a discrete probability distribution.
I do have a closed-form solution for the distribution and it has finite support; however, it is not a standard distribution. I am aware that I could draw a uniform[0,1) random variate and compare it to the CDF get a random variate from my distribution, but the parameters in the distributions are always changing. Using this method is too slow.
So I guess my question has two parts:
Is there a method/algorithm to quickly generate finite, discrete random variates without using the CDF?
Is there a Python module and/or a C++ library which already has this functionality?
Acceptance\Rejection:
Find a function that is always higher than the pdf. Generate 2 Random variates. The first one you scale to calculate the value, the second you use to decide whether to accept or reject the choice. Rinse and repeat until you accept a value.
Sorry I can't be more specific, but I haven't done it for a while..
Its a standard algorithm, but I'd personally implement it from scratch, so I'm not aware of any implementations.
Indeed acceptance/rejection is the way to go if you know analytically your pdf. Let's call it f(x). Find a pdf g(x) such that there exist a constant c, such that c.g(x) > f(x), and such that you know how to simulate a variable with pdf g(x) - For example, as you work with a distribution with a finite support, a uniform will do: g(x) = 1/(size of your domain) over the domain.
Then draw a couple (G, U) such that G is simulated with pdf g(x), and U is uniform on [0, c.g(G)]. Then, if U < f(G), accept U as your variable. Otherwise draw again. The U you will finally accept will have f as a pdf.
Note that the constant c determines the efficiency of the method. The smaller c, the most efficient you will be - basically you will need on average c drawings to get the right variable. Better get a function g simple enough (don't forget you need to draw variables using g as a pdf) but will the smallest possible c.
If acceptance rejection is also too inefficient you could also try some Markov Chain MC method, they generate a sequence of samples each one dependent on the previous one, so by skipping blocks of them one can subsample obtaining a more or less independent set. They only need the PDF, or even just a multiple of it. Usually they work with fixed distributions, but can also be adapted to slowly changing ones.
I need to generate quickly lots of random numbers from binomial distributions for dramatically different trial sizes (most, however, will be small). I was hoping not to have to code an algorithm by hand (see, e.g., this related discussion from November), because I'm a novice programmer and don't like reinventing wheels. It appears Boost does not supply a generator for binomially distributed variates, but TR1 and GSL do. Is there a good reason to choose one over the other, or is it better that I write something customized to my situation? I don't know if this makes sense, but I'll alternate between generating numbers from uniform distributions and binomial distributions throughout the program, and I'd like for them to share the same seed and to minimize overhead. I'd love some advice or examples for what I should be considering.
Boost 1.43 appears to support binomial distributions. You can use boost::variate_generator to connect your source of randomness to the type
of distribution you want to sample from.
So your code might look something like this (Disclaimer: not tested!):
boost::mt19937 rng; // produces randomness out of thin air
// see pseudo-random number generators
const int n = 20;
const double p = 0.5;
boost::binomial<> my_binomial(n,p); // binomial distribution with n=20, p=0.5
// see random number distributions
boost::variate_generator<boost::mt19937&, boost::binomial<> >
next_value(rng, my_binomial); // glues randomness with mapping
int x = next_value(); // simulate flipping a fair coin 20 times
You misunderstand the Boost model - you choose a random number generator type and then a distribution on which to spread the values the RNG produces over. There's a very simple example in this answer, which uses a uniform distribution, but other distributions use the same basic pattern - the generator and the distribution are completely decoupled.
How can I easily generate random numbers following a normal distribution in C or C++?
I don't want any use of Boost.
I know that Knuth talks about this at length but I don't have his books at hand right now.
There are many methods to generate Gaussian-distributed numbers from a regular RNG.
The Box-Muller transform is commonly used. It correctly produces values with a normal distribution. The math is easy. You generate two (uniform) random numbers, and by applying an formula to them, you get two normally distributed random numbers. Return one, and save the other for the next request for a random number.
C++11
C++11 offers std::normal_distribution, which is the way I would go today.
C or older C++
Here are some solutions in order of ascending complexity:
Add 12 uniform random numbers from 0 to 1 and subtract 6. This will match mean and standard deviation of a normal variable. An obvious drawback is that the range is limited to ±6 – unlike a true normal distribution.
The Box-Muller transform. This is listed above, and is relatively simple to implement. If you need very precise samples, however, be aware that the Box-Muller transform combined with some uniform generators suffers from an anomaly called Neave Effect1.
For best precision, I suggest drawing uniforms and applying the inverse cumulative normal distribution to arrive at normally distributed variates. Here is a very good algorithm for inverse cumulative normal distributions.
1. H. R. Neave, “On using the Box-Muller transformation with multiplicative congruential pseudorandom number generators,” Applied Statistics, 22, 92-97, 1973
A quick and easy method is just to sum a number of evenly distributed random numbers and take their average. See the Central Limit Theorem for a full explanation of why this works.
I created a C++ open source project for normally distributed random number generation benchmark.
It compares several algorithms, including
Central limit theorem method
Box-Muller transform
Marsaglia polar method
Ziggurat algorithm
Inverse transform sampling method.
cpp11random uses C++11 std::normal_distribution with std::minstd_rand (it is actually Box-Muller transform in clang).
The results of single-precision (float) version on iMac Corei5-3330S#2.70GHz , clang 6.1, 64-bit:
For correctness, the program verifies the mean, standard deviation, skewness and kurtosis of the samples. It was found that CLT method by summing 4, 8 or 16 uniform numbers do not have good kurtosis as the other methods.
Ziggurat algorithm has better performance than the others. However, it does not suitable for SIMD parallelism as it needs table lookup and branches. Box-Muller with SSE2/AVX instruction set is much faster (x1.79, x2.99) than non-SIMD version of ziggurat algorithm.
Therefore, I will suggest using Box-Muller for architecture with SIMD instruction sets, and may be ziggurat otherwise.
P.S. the benchmark uses a simplest LCG PRNG for generating uniform distributed random numbers. So it may not be sufficient for some applications. But the performance comparison should be fair because all implementations uses the same PRNG, so the benchmark mainly tests the performance of the transformation.
Here's a C++ example, based on some of the references. This is quick and dirty, you are better off not re-inventing and using the boost library.
#include "math.h" // for RAND, and rand
double sampleNormal() {
double u = ((double) rand() / (RAND_MAX)) * 2 - 1;
double v = ((double) rand() / (RAND_MAX)) * 2 - 1;
double r = u * u + v * v;
if (r == 0 || r > 1) return sampleNormal();
double c = sqrt(-2 * log(r) / r);
return u * c;
}
You can use a Q-Q plot to examine the results and see how well it approximates a real normal distribution (rank your samples 1..x, turn the ranks into proportions of total count of x ie. how many samples, get the z-values and plot them. An upwards straight line is the desired result).
Use std::tr1::normal_distribution.
The std::tr1 namespace is not a part of boost. It's the namespace that contains the library additions from the C++ Technical Report 1 and is available in up to date Microsoft compilers and gcc, independently of boost.
This is how you generate the samples on a modern C++ compiler.
#include <random>
...
std::mt19937 generator;
double mean = 0.0;
double stddev = 1.0;
std::normal_distribution<double> normal(mean, stddev);
cerr << "Normal: " << normal(generator) << endl;
You can use the GSL. Some complete examples are given to demonstrate how to use it.
Have a look on: http://www.cplusplus.com/reference/random/normal_distribution/. It's the simplest way to produce normal distributions.
If you're using C++11, you can use std::normal_distribution:
#include <random>
std::default_random_engine generator;
std::normal_distribution<double> distribution(/*mean=*/0.0, /*stddev=*/1.0);
double randomNumber = distribution(generator);
There are many other distributions you can use to transform the output of the random number engine.
I've followed the definition of the PDF given in http://www.mathworks.com/help/stats/normal-distribution.html and came up with this:
const double DBL_EPS_COMP = 1 - DBL_EPSILON; // DBL_EPSILON is defined in <limits.h>.
inline double RandU() {
return DBL_EPSILON + ((double) rand()/RAND_MAX);
}
inline double RandN2(double mu, double sigma) {
return mu + (rand()%2 ? -1.0 : 1.0)*sigma*pow(-log(DBL_EPS_COMP*RandU()), 0.5);
}
inline double RandN() {
return RandN2(0, 1.0);
}
It is maybe not the best approach, but it's quite simple.
The comp.lang.c FAQ list shares three different ways to easily generate random numbers with a Gaussian distribution.
You may take a look of it: http://c-faq.com/lib/gaussian.html
There exists various algorithms for the inverse cumulative normal distribution. The most popular in quantitative finance are tested on http://chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/
In my opinion, there is not much incentive to use something else than algorithm AS241 from Wichura: it is machine precision, reliable and fast. Bottlenecks are rarely in the Gaussian random number generation.
The top answer here advocates for Box-Müller, you should be aware that it has known deficiencies. I quote https://www.sciencedirect.com/science/article/pii/S0895717710005935:
in the literature, Box–Muller is sometimes regarded as slightly inferior, mainly for two reasons. First, if one applies the Box–Muller method to numbers from a bad linear congruential generator, the transformed numbers provide an extremely poor coverage of the space. Plots of transformed numbers with spiraling tails can be found in many books, most notably in the classic book of Ripley, who was probably the first to make this observation"
Box-Muller implementation:
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;
// return a uniformly distributed random number
double RandomGenerator()
{
return ( (double)(rand()) + 1. )/( (double)(RAND_MAX) + 1. );
}
// return a normally distributed random number
double normalRandom()
{
double y1=RandomGenerator();
double y2=RandomGenerator();
return cos(2*3.14*y2)*sqrt(-2.*log(y1));
}
int main(){
double sigma = 82.;
double Mi = 40.;
for(int i=0;i<100;i++){
double x = normalRandom()*sigma+Mi;
cout << " x = " << x << endl;
}
return 0;
}
1) Graphically intuitive way you can generate Gaussian random numbers is by using something similar to the Monte Carlo method. You would generate a random point in a box around the Gaussian curve using your pseudo-random number generator in C. You can calculate if that point is inside or underneath the Gaussian distribution using the equation of the distribution. If that point is inside the Gaussian distribution, then you have got your Gaussian random number as the x value of the point.
This method isn't perfect because technically the Gaussian curve goes on towards infinity, and you couldn't create a box that approaches infinity in the x dimension. But the Guassian curve approaches 0 in the y dimension pretty fast so I wouldn't worry about that. The constraint of the size of your variables in C may be more of a limiting factor to your accuracy.
2) Another way would be to use the Central Limit Theorem which states that when independent random variables are added, they form a normal distribution. Keeping this theorem in mind, you can approximate a Gaussian random number by adding a large amount of independent random variables.
These methods aren't the most practical, but that is to be expected when you don't want to use a preexisting library. Keep in mind this answer is coming from someone with little or no calculus or statistics experience.
Monte Carlo method
The most intuitive way to do this would be to use a monte carlo method. Take a suitable range -X, +X. Larger values of X will result in a more accurate normal distribution, but takes longer to converge.
a. Choose a random number z between -X to X.
b. Keep with a probability of N(z, mean, variance) where N is the gaussian distribution. Drop otherwise and go back to step (a).
Take a look at what I found.
This library uses the Ziggurat algorithm.
Computer is deterministic device. There is no randomness in calculation.
Moreover arithmetic device in CPU can evaluate summ over some finite set of integer numbers (performing evaluation in finite field) and finite set of real rational numbers. And also performed bitwise operations. Math take a deal with more great sets like [0.0, 1.0] with infinite number of points.
You can listen some wire inside of computer with some controller, but would it have uniform distributions? I don't know. But if assumed that it's signal is the the result of accumulate values huge amount of independent random variables then you will receive approximately normal distributed random variable (It was proved in Probability Theory)
There is exist algorithms called - pseudo random generator. As I feeled the purpose of pseudo random generator is to emulate randomness. And the criteria of goodnes is:
- the empirical distribution is converged (in some sense - pointwise, uniform, L2) to theoretical
- values that you receive from random generator are seemed to be idependent. Of course it's not true from 'real point of view', but we assume it's true.
One of the popular method - you can summ 12 i.r.v with uniform distributions....But to be honest during derivation Central Limit Theorem with helping of Fourier Transform, Taylor Series, it is neededed to have n->+inf assumptions couple times. So for example theoreticaly - Personally I don't undersand how people perform summ of 12 i.r.v. with uniform distribution.
I had probility theory in university. And particulary for me it is just a math question. In university I saw the following model:
double generateUniform(double a, double b)
{
return uniformGen.generateReal(a, b);
}
double generateRelei(double sigma)
{
return sigma * sqrt(-2 * log(1.0 - uniformGen.generateReal(0.0, 1.0 -kEps)));
}
double generateNorm(double m, double sigma)
{
double y2 = generateUniform(0.0, 2 * kPi);
double y1 = generateRelei(1.0);
double x1 = y1 * cos(y2);
return sigma*x1 + m;
}
Such way how todo it was just an example, I guess it exist another ways to implement it.
Provement that it is correct can be found in this book
"Moscow, BMSTU, 2004: XVI Probability Theory, Example 6.12, p.246-247" of Krishchenko Alexander Petrovich ISBN 5-7038-2485-0
Unfortunately I don't know about existence of translation of this book into English.