Generate Random Number Based on Beta Distribution using Boost - c++

I am trying to use Boost to generate random numbers according to the beta distribution using C++. I have seen many examples online for generating random numbers according to the distributions in random.hpp (e.g. this book). However, I cannot seem to translate them to use the beta distribution found in beta.hpp.
Thanks.

You'll first want to draw a random number uniformly from the range (0,1). Given any distribution, you can then plug that number into the distribution's "quantile function," and the result is as if a random value was drawn from the distribution. From here:
A general method to generate random numbers from an arbitrary distribution which has a cdf without jumps is to use the inverse function to the cdf: G(y)=F^{-1}(y). If u(1), ..., u(n) are random numbers from the uniform on (0,1) distribution then G(u(1)), ..., G(u(n)) is a random sample from the distribution with cdf F(x).
So how do we get a quantile function for a beta distribution? The documentation for beta.hpp is here. You should be able to use something like this:
#include <boost/math/distributions.hpp>
using namespace boost::math;
double alpha, beta, randFromUnif;
//parameters and the random value on (0,1) you drew
beta_distribution<> dist(alpha, beta);
double randFromDist = quantile(dist, randFromUnif);
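Putting the two steps together, here is a minimal self-contained sketch of this recipe; the seed and the Beta(2, 5) parameters are arbitrary choices for illustration:
#include <iostream>
#include <random>
#include <boost/math/distributions/beta.hpp>

int main()
{
    std::mt19937 rng(2018);                                  // any seeded uniform engine will do
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    boost::math::beta_distribution<> dist(2.0, 5.0);         // alpha = 2, beta = 5

    for (int i = 0; i < 5; ++i)
    {
        double u = unif(rng);                                // uniform draw on [0, 1)
        std::cout << boost::math::quantile(dist, u) << '\n'; // beta-distributed value
    }
    return 0;
}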

According to Boost's demos for the random number library, Random_demo.cpp and "Generating integers with different probabilities", you can use the variate_generator class to bind your random number generator to a distribution.
An example may look like this:
#include <iostream>
#include "boost/random.hpp"

int main(int argc, char *argv[])
{
    int seed = 2018;
    typedef boost::random::mt19937 RandomNumberGenerator;
    typedef boost::random::beta_distribution<> BetaDistribution;
    typedef boost::variate_generator<RandomNumberGenerator&, BetaDistribution> Generator;

    RandomNumberGenerator Rng(seed);
    BetaDistribution distribution(2, 5);
    Generator getRandomNumber(Rng, distribution);

    for (int idx = 0; idx < 1000; ++idx)
    {
        std::cout << getRandomNumber() << std::endl;
    }
    return 0;
}
However, the more recent documentation seems to recommend passing the generator directly to the distribution object. The result from the code below is identical.
#include <iostream>
#include "boost/random.hpp"

int main(int argc, char *argv[])
{
    int seed = 2018;
    typedef boost::random::mt19937 RandomNumberGenerator;
    typedef boost::random::beta_distribution<> BetaDistribution;

    RandomNumberGenerator Rng(seed);
    BetaDistribution distribution(2, 5);

    for (int idx = 0; idx < 1000; ++idx)
    {
        std::cout << distribution(Rng) << std::endl;
    }
    return 0;
}

Related

Setting GSL RNG seed correctly in Rcpp for model with repeat iterations

I am writing a stochastic, process-driven model of transmission of infection and of diagnostic testing to detect infection. The model requires repeat random samples across multiple time steps and iterations, so the faster it can run, the better. The parameters for the random samples can change at each time step. I first wrote my model in R, and then in CPP (via the great Rcpp package). In Rcpp, using the R-based random number generator, the model runs in about 7% of the time it took in R. I was advised that using GSL within CPP for random number generation would be faster still. In the CPP model, with GSL-based random sampling instead of R-based random sampling, I get only a marginal increase in speed. However, I am not sure that I am using the GSL-based random sampler correctly.
My questions are:
Is it correct to only do the seed setting procedure once for the GSL RNG based on the time of day and use this same construct for all of my random draws (as I have done in code below)? I confess I do not fully understand the seed setting procedure within CPP for GSL as I am new to both. I have compared the distributions produced using both R-based and GSL-based RNG and they are very similar, so hopefully this bit is OK.
I obtained the code for setting the GSL seed according to the time of day from this Stack Overflow post:
GSL Uniform Random Number Generator
I was expecting a greater increase in speed using the GSL RNG. Is there anything I can do to maximize the speed of the GSL RNG?
I am using a Windows machine and the RStudio interface. I am sourcing the CPP functions from R using the Rcpp package. All of the packages and programmes were recently reinstalled. Here is the session info:
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
For context, I am a veterinary epidemiologist with R experience, but only two months into learning CPP. This is my first stack exchange query. Thanks in advance for your time!
Here is an example of what I am trying to achieve written in CPP (using Rcpp in RStudio) and using the GSL based RNG. Please can somebody tell me if this is the correct way to set the GSL RNG seed? Is it OK to do the seed setting process just once at the top of the function?
// CPP code - function GSL RNG written using Rcpp on a CPP file in RStudio
// [[Rcpp::plugins(cpp11)]]
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_blas.h>
#include <iostream>
#include <gsl/gsl_math.h>
#include <sys/time.h>
#include <RcppGSL.h>
// [[Rcpp::depends(RcppGSL)]]

// [[Rcpp::export]]
Rcpp::NumericMatrix check_cpp_gsl_rng(int n_iters,
                                      int min_unif,
                                      int max_unif,
                                      double exp_rate,
                                      double bernoulli_prob)
{
    const gsl_rng_type * T;
    gsl_rng * r;
    gsl_rng_env_setup();

    struct timeval tv;                    // seed generation based on time of day
    gettimeofday(&tv, 0);
    unsigned long mySeed = tv.tv_sec + tv.tv_usec;

    T = gsl_rng_default;                  // generator setup
    r = gsl_rng_alloc(T);
    gsl_rng_set(r, mySeed);

    // matrix to collect outputs
    Rcpp::NumericMatrix Output_Mat(n_iters, 7);

    for (int i = 0; i < n_iters; i++)     // in the real model, parameters may change at each iteration
    {
        // random exponential draws
        Output_Mat(i, 0) = gsl_ran_exponential(r, (1 / exp_rate)); // exp 1
        Output_Mat(i, 1) = gsl_ran_exponential(r, (1 / exp_rate)); // exp 2

        // random uniform draws
        Output_Mat(i, 2) = gsl_ran_flat(r, min_unif, max_unif);    // unif 1
        Output_Mat(i, 3) = gsl_ran_flat(r, min_unif, max_unif);    // unif 2

        // random Bernoulli draws
        Output_Mat(i, 4) = gsl_ran_bernoulli(r, bernoulli_prob);   // Bernoulli 1
        Output_Mat(i, 5) = gsl_ran_bernoulli(r, bernoulli_prob);   // Bernoulli 2

        Output_Mat(i, 6) = i;                                      // record iteration number
    }

    gsl_rng_free(r); // free the RNG before returning; after the return it would never run
    return Output_Mat;
    // end of function
}
The plot below shows a comparison of run speeds of the random sampling function implemented in R only, CPP using the R RNG and CPP using the GSL RNG (as in code above) based on 100 comparisons of 1000 iterations using the "microbenchmark" package.
A package you may find useful is my RcppZiggurat (github). It revives the old but fast Ziggurat RNG for normal variates and times it. It uses several other Ziggurat implementations as benchmarks -- including one from the GSL.
First, we can use its code and infrastructure to set up a simple structure (see below). I first show that 'yes indeed' we can seed a GSL RNG:
> setseedGSL(42)
> rnormGSLZig(5)
[1] -0.811264 1.092556 -1.873074 -0.146400 -1.653703
> rnormGSLZig(5) # different
[1] -1.281593 0.893496 -0.545510 -0.337940 -1.258800
> setseedGSL(42)
> rnormGSLZig(5) # as before
[1] -0.811264 1.092556 -1.873074 -0.146400 -1.653703
>
Note that we need a global variable for an instance of a GSL RNG 'state'.
Second, we can show that Rcpp is actually faster than either the standard normal GSL generator or its Ziggurat implementation. Using Rcpp vectorised is even faster:
> library(microbenchmark)
> n <- 1e5
> res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
> res
Unit: microseconds
                 expr     min        lq     mean   median       uq      max neval cld
       rnormGSLZig(n) 996.580 1151.7065 1768.500 1355.053 1424.220 18597.82   100   b
     rnormGSLPlain(n) 996.316 1085.6820 1392.323 1358.696 1431.715  2929.05   100   b
          rcppLoop(n) 223.221  259.2395  641.715  518.706  573.899 13779.20   100  a
        rcppDirect(n)  46.224   67.2075  384.004  293.499  320.919 14883.86   100  a
>
The code is below; it is a pretty quick adaptation from my RcppZiggurat package. You can sourceCpp() it (if you have RcppGSL installed, which I used to 'easily' get the compile and link instructions for the GSL) and it will run the demo code shown above.
#include <Rcpp/Lighter>

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

// [[Rcpp::depends(RcppGSL)]]

class ZigguratGSL {
public:
    ZigguratGSL(uint32_t seed = 12345678) {
        gsl_rng_env_setup();
        r = gsl_rng_alloc(gsl_rng_default);
        gsl_rng_set(r, seed);
    }
    ~ZigguratGSL() {
        gsl_rng_free(r);
    }
    double normZig() {
        const double sigma = 1.0;
        return gsl_ran_gaussian_ziggurat(r, sigma);
    }
    double normPlain() {
        const double sigma = 1.0;
        return gsl_ran_gaussian_ziggurat(r, sigma); // note: as written this also calls the ziggurat routine
    }
    void setSeed(const uint32_t seed) {
        gsl_rng_set(r, seed);
    }
private:
    gsl_rng *r;
};

static ZigguratGSL gsl;

// [[Rcpp::export]]
void setseedGSL(const uint32_t s) {
    gsl.setSeed(s);
    return;
}

// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLZig(int n) {
    Rcpp::NumericVector x(n);
    for (int i = 0; i < n; i++) {
        x[i] = gsl.normZig();
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLPlain(int n) {
    Rcpp::NumericVector x(n);
    for (int i = 0; i < n; i++) {
        x[i] = gsl.normPlain();
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rcppLoop(int n) {
    Rcpp::NumericVector x(n);
    for (int i = 0; i < n; i++) {
        x[i] = R::rnorm(1.0, 0.0);
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rcppDirect(int n) {
    return Rcpp::rnorm(n, 1.0, 0.0);
}

/*** R
setseedGSL(42)
rnormGSLZig(5)
rnormGSLZig(5) # different
setseedGSL(42)
rnormGSLZig(5) # as before

library(microbenchmark)
n <- 1e5
res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
res
*/
PS We write it as Rcpp. Capital R, lowercase cpp.

Draw random numbers from Boost binomial distribution

Here is an example of drawing random numbers from a binomial distribution with std::binomial_distribution:
#include <random>

int main()
{
    std::mt19937 eng(14);
    std::binomial_distribution<size_t> dist(28, 0.2);
    size_t randomNumber = dist(eng);
    return 0;
}
I am failing to find a similar example for Boost. I went through this documentation, which explains how to compute the PDF, CDF and other quantities from a boost::math::binomial object, but it does not talk about sampling random numbers.
Should I write a binary search myself based on the CDF that boost::math::binomial will compute for me or can boost directly return random numbers?
Thanks to this link from @Bob__, here is a simple working example:
#include <random>
#include <boost/random.hpp>

int main()
{
    std::mt19937 eng;
    boost::random::binomial_distribution<int> dist(28, 0.2);
    int randomNumber = dist(eng);
    return 0;
}
For some reason, it would not compile with size_t, so I used int (see @Bob__'s comment below for more information).
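For completeness, the inverse-CDF route the question mentions is also possible without writing a binary search by hand: Boost.Math's discrete-quantile policies can round the quantile to an integer for you. The sketch below is only an illustration of that alternative, not the accepted approach above; the policy names come from the Boost.Math documentation on discrete quantiles, so check them against your Boost version.
#include <iostream>
#include <random>
#include <boost/math/distributions/binomial.hpp>

int main()
{
    using namespace boost::math;
    using namespace boost::math::policies;

    // Binomial distribution whose quantile() rounds up to an integer,
    // i.e. returns the smallest k with CDF(k) >= p.
    typedef binomial_distribution<double,
            policy<discrete_quantile<integer_round_up> > > binom_round_up;
    binom_round_up dist(28, 0.2);

    std::mt19937 eng(14);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    for (int i = 0; i < 5; ++i)
    {
        double u = unif(eng);                        // uniform draw on [0, 1)
        int k = static_cast<int>(quantile(dist, u)); // inverse-CDF sample
        std::cout << k << "\n";
    }
    return 0;
}
In practice boost::random::binomial_distribution (as above) is the simpler and faster choice; the quantile route is mainly useful if you already have uniform draws from somewhere else.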

create random number from a poisson dist. using MersenneTwister

Hi, I have a simulation I'm running in which I get random numbers from uniform and normal distributions easily:
#include <iostream>
#include "MersenneTwister.h"
using namespace std;

int main()
{
    MTRand mtrand1;
    double r1, r2;
    r1 = mtrand1.rand();     // from a uniform dist.
    r2 = mtrand1.randNorm(); // from a normal dist.
}
I would like to use this random number generator to obtain a random number from a Poisson distribution with mean 'A'.
Any idea how to implement this using the MersenneTwister code?
The code can be found here:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=11960 and it is widely used.
You can use the standard library
#include<random>
double mean = 3.1415926;
std::mt19937 mt{std::random_device{}()};
std::poisson_distribution<> pd{mean};
auto n = pd(mt); // get a number
Do note that seeding a Mersenne Twister with a single value from std::random_device, as above, is unlikely to be satisfactory for serious work: the engine's internal state is far larger than one 32-bit seed, and on some implementations std::random_device is not actually non-deterministic.
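If you want a more thoroughly seeded engine, one common approach (shown here as a sketch, not part of the original answer) is to fill a std::seed_seq from several std::random_device words and seed the Mersenne Twister from that:
#include <iostream>
#include <random>

int main()
{
    // Seed mt19937 from several std::random_device words via std::seed_seq
    // rather than from a single 32-bit value.
    std::random_device rd;
    std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
    std::mt19937 mt(seq);

    double mean = 3.1415926;              // the Poisson mean ('A' in the question)
    std::poisson_distribution<> pd{mean};
    std::cout << pd(mt) << '\n';          // one Poisson draw
    return 0;
}
Even this is only as good as std::random_device on your platform, so for reproducible simulations an explicitly chosen fixed seed is often preferable anyway.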

Adding Gaussian noise

I have a .arff file which contains a list of float numbers. I need to add Gaussian noise to every number, which in MATLAB would be:
m = m + k*randn(size(m))
where m is one of the numbers in the list and k is the standard deviation, with value 0.1. What is the C++ equivalent of randn()?
Could you please provide an example?
Use std::normal_distribution with an appropriate generator (std::default_random_engine will usually work). See http://en.cppreference.com/w/cpp/numeric/random for details on all of the random number generation facilities of the C++ standard library.
(live example)
#include <algorithm>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>

int main() {
    // Example data
    std::vector<double> data = {1., 2., 3., 4., 5., 6.};

    // Define random generator with Gaussian distribution
    const double mean = 0.0;
    const double stddev = 0.1;
    std::default_random_engine generator;
    std::normal_distribution<double> dist(mean, stddev);

    // Add Gaussian noise
    for (auto& x : data) {
        x = x + dist(generator);
    }

    // Output the result, for demonstration purposes
    std::copy(begin(data), end(data), std::ostream_iterator<double>(std::cout, " "));
    std::cout << "\n";
    return 0;
}
Output:
0.987803 1.89132 3.06843 3.89248 5.00333 6.07448
Further considerations
For decent statistical properties, you'll probably want to choose the std::mersenne_twister_engine generator (or, for convenience, the std::mt19937 predefined version), and seed it using std::random_device:
std::mt19937 generator(std::random_device{}());
[Note: Seeding from std::random_device is a good practice to get into; if you use the current time as a seed, you can end up with the same seed value across multiple generators (e.g. when initialising several generators in a very short period of time). std::random_device will obtain entropy from the system, if available.]
In order to avoid passing the generator to the distribution every time, you can do something like:
auto dist = std::bind(std::normal_distribution<double>{mean, stddev},
                      std::mt19937(std::random_device{}()));
which you can then use as follows:
double val = dist();
(Live example with these modifications)
The C++ standard now includes several distributions for random numbers.
You are looking for std::normal_distribution.
In the documentation you can also find a code sample:
#include <chrono>
#include <iostream>
#include <random>

int main()
{
    // construct a trivial random generator engine from a time-based seed:
    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
    std::default_random_engine generator(seed);
    std::normal_distribution<double> distribution(0.0, 1.0);
    std::cout << "some Normal-distributed(0.0,1.0) results:" << std::endl;
    for (int i = 0; i < 10; ++i)
        std::cout << distribution(generator) << std::endl;
}
The parameters given to the std::normal_distribution constructor are the mean (0.0) first, followed by the standard deviation (1.0).

Fast way to avoid modulo bias

I'm doing a shuffle, and it gets done very often on a small array. It could be anything from 1 to 10 elements.
I've tried the accepted answer in this question:
Is this C implementation of Fisher-Yates shuffle correct?
Unfortunately it's extremely slow.
I need a faster way of doing this that also avoids the modulo bias I'm seeing. Any suggestions?
EDIT:
Sorry, I should point out that it's not the shuffle that's slow, it's the method used to generate a random int in a range, i.e. rand_int(). I'm using a Mersenne Twister algorithm and RAND_MAX in my case is UINT_MAX to help out. This of course makes it slower when n is much smaller than RAND_MAX.
I've also found two implementations of a rand_int-type function.
static int rand_int(int n) {
    int limit = RAND_MAX - RAND_MAX % n;
    int rnd;

    do {
        rnd = rand();
    } while (rnd >= limit);
    return rnd % n;
}
The following is much, much faster. But does it avoid the modulo bias problem?
int rand_int(int limit) {
    int divisor = RAND_MAX / (limit);
    int retval;

    do {
        retval = rand() / divisor;
    } while (retval > limit);

    return retval;
}
Edit
To address the basic question of avoiding the modulo bias with rand(), see http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx.
In short, you can't get truly uniform results other than by skipping non-domain random numbers [1]; the article lists some formulae to get a smaller bias (e.g. int r = rand() / ( RAND_MAX / N + 1 )) without sacrificing more performance.
[1] See Java's implementation of Random.nextInt(int):
http://download.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextInt(int)
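To make the "skip non-domain numbers" idea concrete with the Mersenne Twister the edit above mentions, here is a small sketch; the function name rand_int_unbiased and the seed are illustrative choices, not from the original post:
#include <cstdint>
#include <iostream>
#include <random>

// Rejection sampling: discard raw 32-bit draws that fall in the final,
// partial bucket so that every value 0 .. n-1 is returned equally often.
// Assumes n >= 1.
std::uint32_t rand_int_unbiased(std::mt19937& rng, std::uint32_t n) {
    const std::uint32_t limit = UINT32_MAX - UINT32_MAX % n; // a multiple of n
    std::uint32_t r;
    do {
        r = rng();            // raw draw in [0, 2^32 - 1]
    } while (r >= limit);     // reject draws from the biased tail
    return r % n;
}

int main() {
    std::mt19937 rng(12345);
    for (int i = 0; i < 10; ++i)
        std::cout << rand_int_unbiased(rng, 7) << ' '; // unbiased values in [0, 7)
    std::cout << '\n';
    return 0;
}
The rejection rate is at most n / 2^32, so for small n the loop almost never repeats; each call costs essentially one engine draw plus one modulo.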
Using C++
You should be able to use std::random_shuffle (from the <algorithm> header).
If you must roll your own shuffle implementation, I suggest using the <random> library (TR1 / C++0x) or Boost.Random. It comes with a number of generators and distributions, with varying performance characteristics.
#include <random>

std::mt19937 rng(seed);                       // seed chosen by you
std::uniform_int_distribution<int> gen(0, N); // uniform on the closed range [0, N], unbiased
int r = gen(rng);
Refer to the boost documentation for a good overview of Boost Random generator and distribution characteristics:
http://www.boost.org/doc/libs/1_47_0/doc/html/boost_random/reference.html#boost_random.reference.generators
Here is a sample of doing std::random_shuffle using Boost Random, directly:
#include <algorithm>
#include <functional>
#include <vector>
#include <boost/random.hpp>

struct Rng
{
    Rng(boost::mt19937 &rng) : _rng(rng) {}

    // random_shuffle calls this with i and expects a value in [0, i)
    unsigned operator()(unsigned i)
    {
        boost::uniform_int<> dist(0, i - 1);
        return dist(_rng);
    }

private:
    boost::mt19937 &_rng;
};

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5};

    boost::mt19937 state;
    std::random_shuffle(v.begin(), v.end(), Rng(state));
    return 0;
}
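Note that std::random_shuffle was deprecated in C++14 and removed in C++17. With a modern compiler the equivalent (and simpler) call is std::shuffle, which takes the engine directly and avoids the functor wrapper entirely:
#include <algorithm>
#include <random>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5};
    std::mt19937 rng(12345);               // seeded Mersenne Twister engine
    std::shuffle(v.begin(), v.end(), rng); // unbiased shuffle, no modulo tricks needed
    return 0;
}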