Calculating probability using std::binomial_distribution - C++

Is it possible to calculate the probability of obtaining n successes in k trials using std::binomial_distribution? How?

Not really, no.
std::binomial_distribution is an adapter for random number generation rather than something that can provide a cumulative distribution function.
You can implement the probability function in a few lines of code yourself, or search for a good mathematics library. (Use a Pascal's-triangle approach as opposed to calculating large factorials; a sketch follows.)
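A sketch of that Pascal's-triangle idea (the helper name choose is made up here; it updates one row of the triangle in place, so no large factorial is ever formed):

#include <algorithm>
#include <vector>

double choose(int n, int k) {
    // Row-by-row Pascal's triangle: each entry is the sum of the two above it.
    std::vector<double> row(k + 1, 0.0);
    row[0] = 1.0;
    for (int i = 1; i <= n; ++i)
        for (int j = std::min(i, k); j > 0; --j)
            row[j] += row[j - 1];
    return row[k];  // C(n, k)
}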

To implement such a PMF, e.g. the binomial
PMF(p, n, k) = n! / (k! * (n-k)!) * p^k * (1-p)^(n-k),
a good trick is to compute its logarithm and then exponentiate it. For the log of the factorial you can use the log of the gamma function, std::lgamma. Along the lines of (not tested!):
#include <cmath>

double logChoose(int n, int k) {
    // log C(n, k) via log-gamma: std::lgamma(n + 1) == log(n!)
    return std::lgamma(double(n + 1)) - std::lgamma(double(k + 1)) - std::lgamma(double(n - k + 1));
}

double PMFBinomial(double p, int n, int k) {
    // Work in log space, exponentiate at the end; note that p = 0 and
    // p = 1 would send 0 into std::log here.
    double lgr = logChoose(n, k) + double(k) * std::log(p) + double(n - k) * std::log(1 - p);
    return std::exp(lgr);
}
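If you then need the cumulative probability, a minimal sketch (the name CDFBinomial is made up; it just sums the PMF above, which is fine for moderate n):

double CDFBinomial(double p, int n, int k) {
    // P(X <= k) = sum of P(X = i) for i = 0..k
    double sum = 0.0;
    for (int i = 0; i <= k; ++i)
        sum += PMFBinomial(p, n, i);
    return sum;
}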

Related

C++ Multinomial distribution

I'm trying to code a multinomial algorithm that will basically apply a binomial distribution to each value of an input vector, knowing the values of all previous ones. It is aimed here at generating a new population for multiple alleles, knowing the initial population.
To achieve this, I'm using a recursive algorithm. This is what my code looks like right now:
void RandomNumbers::multinomial(std::vector<unsigned int>& alleleNumbers) {
    /* In this function we need two different records of the size.
     * We need the size from the old populations, ( N - n1 - ... - nA ),
     * and we also need the size from the newly created population,
     * ( N - k1 - ... - kA ).
     * In order to achieve such a task, we'll use the integer "temp" to store
     * the value n1 before modifying it to k1, and so on.
     */
    double totalSize = 0;
    for (auto n : alleleNumbers) totalSize += n;
    double newTotalSize(totalSize);
    std::cout << newTotalSize;
    for (size_t i = 0; i < alleleNumbers.size(); ++i) {
        size_t temp = alleleNumbers[i];
        alleleNumbers[i] = binomial(newTotalSize, alleleNumbers[i] / totalSize);
        newTotalSize -= alleleNumbers[i];
        totalSize = temp;
    }
}
But I'm not sure at all about this, and I was wondering if there was an already existing multinomial algorithm of that kind...
Thank you very much.
You could try using the GNU Scientific Library's gsl_ran_multinomial function.
The function is called as:
gsl_ran_multinomial(const gsl_rng * r, size_t K, unsigned int N, const double p[], unsigned int n[])
where (n_1, n_2, ..., n_K) are nonnegative integers with n_1 + n_2 + ... + n_K = N, and (p_1, p_2, ..., p_K) is a probability distribution with p_1 + p_2 + ... + p_K = 1. If the array p[K] is not normalized then its entries will be treated as weights and normalized appropriately. The arrays n[] and p[] must both be of length K.
The function implements the conditional binomial method from C.S. Davis's "The computer generation of multinomial random variates" (Comput. Stat. Data Anal. 16, 1993), so you could also implement it yourself using that approach. Let me know if you need a copy of the paper.
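A minimal usage sketch, assuming GSL is installed (the category count and probabilities here are made up for illustration):

#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>
#include <cstdio>

int main() {
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);  // any GSL generator works

    const size_t K = 3;             // number of categories (alleles)
    const unsigned int N = 100;     // total population size
    double p[K] = {0.2, 0.3, 0.5};  // probabilities (or unnormalized weights)
    unsigned int n[K];              // output counts; they sum to N

    gsl_ran_multinomial(r, K, N, p, n);
    for (size_t k = 0; k < K; ++k)
        std::printf("n[%zu] = %u\n", k, n[k]);

    gsl_rng_free(r);
    return 0;
}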

Creating a curve given min, max, and length

I am given a desired min, max, and length of an array. How can I generate numbers that fit to a "normally distributed" / bell curve for this array, with the specified min and max?
i.e.
min: 0
max: 6
length = 7
result: [0,2,4,6,4,2,0]
I know that I can linearly interpolate between the min and max to get to the middle, then in the reverse direction back down to the min at the end of the array. However, is there a way to do this using a distribution, and then pull values from it?
i.e. I was thinking something like this
max - min = diff
diff / (length/2) = increment
[min + increment*index, ..., max, max - increment*index, ..., min ]
If your problem is really to generate an array with values from a triangle shape, then there's not much more to do than the simple loop you suggested. You may even write a function which returns f(k), e.g. like so:
double get_kth_value(double min, double max, int length, int k) {
    int mid = length / 2;
    if (k < mid) {
        return min + (max - min) * k / mid;
    } else {
        return min + (max - min) * (length - 1 - k) / mid;
    }
}
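For the example in the question (min 0, max 6, length 7), this yields exactly [0, 2, 4, 6, 4, 2, 0].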
When you say:
However, is there a way to do this using a distribution? then pull
values from it?
I wonder whether you're hinting at the fact that your problem is slightly different. The wording suggests that you want to do sampling according to a given distribution. That is, you want to compute y=f(x) for x a uniform random variable, and the probability that you get a given y is prescribed by some given distribution (bell, binomial, triangle, whatnot). Then it becomes more fun (albeit super classical).
The generic sledgehammer for that is inverse transform sampling: you compute the cumulative distribution function, invert it, and there you go. For the case you suggest of a triangle shape, it's easy enough. Basically you'll want something like
double t = 2*uniformly_random_double_in_01() - 1;  // t uniform in [-1, 1]
double y = breadth/2*(1 - sqrt(1 - fabs(t)))*(1 - 2*(t < 0));  // inverse CDF of a symmetric triangle
Pardon my laziness for not adjusting the bounds correctly; you need some of that stuff too, especially if you want integer values.
For the case of bell curves, there are various options:
If you're satisfied with a no-brain approach, you might try the Box-Muller transform and truncate the result (a sketch follows this list).
If you're trying to obtain something which is related to a binomial distribution, there are dedicated methods for that, too.
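A minimal Box-Muller sketch for the no-brain option (the 3-sigma scaling for mapping the bell onto [min, max] is an arbitrary choice here, and truncation is done by resampling):

#include <cmath>
#include <random>

// Box-Muller: two uniform draws -> one standard normal draw, then
// rescale to [min, max] and truncate the tails by rejection.
double truncated_bell(double min, double max, std::mt19937& gen) {
    const double pi = 3.14159265358979323846;
    std::uniform_real_distribution<double> u(std::nextafter(0.0, 1.0), 1.0);
    for (;;) {
        double z = std::sqrt(-2.0 * std::log(u(gen))) * std::cos(2.0 * pi * u(gen));
        double x = (min + max) / 2 + z * (max - min) / 6;  // +/-3 sigma spans the range
        if (min <= x && x <= max) return x;
    }
}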

Calculating sine and cosine in one shot

I have a scientific code that uses both sine and cosine of the same argument (I basically need the complex exponential of that argument). I was wondering if it were possible to do this faster than calling sine and cosine functions separately.
Also, I only need about 0.1% precision. So is there any way I can take the power series of the default trig functions and truncate it for speed?
One other thing I have in mind: is there any way to perform the remainder operation such that the result is always positive? In my own algorithm I used x = fmod(x, 2*pi); but then I would need to add 2*pi if x is negative (a smaller domain means I can use a shorter power series).
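(For reference, a minimal sketch of that wrap-once reduction; the helper name is made up:)

#include <cmath>

// Reduce x into [0, 2*pi): std::fmod can return a negative remainder,
// so wrap it up by one period when it does.
double mod_two_pi(double x) {
    const double two_pi = 6.283185307179586;
    double r = std::fmod(x, two_pi);
    return (r < 0.0) ? r + two_pi : r;
}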
EDIT: A LUT turned out to be the best approach for this; however, I am glad I learned about the other approximation techniques. I would also advise using an explicit midpoint approximation. This is what I ended up doing:
const int N = 10000;  // error ~3e-4 for N = 1000, ~3e-5 for 10 000, ~3e-6 for 100 000
double *cs = new double[N];
double *sn = new double[N];
for (int i = 0; i < N; i++) {
    double A = (i + 0.5) * 2 * pi / N;  // midpoint of the i-th interval
    cs[i] = cos(A);
    sn[i] = sin(A);
}
The following part approximates sincos(2*pi*(wc2 + t[j]*(cotp*t[j] - wc))) at the interval midpoints:
double A = wc2 + t[j]*(cotp*t[j] - wc);  // phase in units of full turns
int B = (int)(N * (A - floor(A)));       // fractional part -> table index
re += cs[B] * f[j];
im += sn[B] * f[j];
Another approach could have been using a Chebyshev decomposition. You can use the orthogonality property to find the coefficients. Optimized for the exponential, it looks like this:
double fastsin(double x) {
    // Reduce to [-pi, pi); this line can be improved, both inside this
    // function and before you input x into the function.
    x = x - floor(x/2/pi)*2*pi - pi;
    double x2 = x*x;
    // 7th-order Chebyshev approximation
    return (((0.00015025063885163012*x2 - 0.008034350857376128)*x2
             + 0.1659789684145034)*x2 - 0.9995812174943602)*x;
}
If you seek fast evaluation with good (but not high) accuracy from a power series, you should use an expansion in Chebyshev polynomials: tabulate the coefficients (you'll need VERY few for 0.1% accuracy) and evaluate the expansion with the recursion relations for these polynomials (it's really very easy).
References:
Tabulated coefficients: http://www.ams.org/mcom/1980-34-149/S0025-5718-1980-0551302-5/S0025-5718-1980-0551302-5.pdf
Evaluation of Chebyshev expansions: https://en.wikipedia.org/wiki/Chebyshev_polynomials
You'll need to (a) get the "reduced" argument into the range -pi/2..+pi/2 and then (b) handle the sign in your results when the argument actually should have been in the "other" half of the full elementary interval -pi..+pi. These aspects should not pose a major problem:
1. Determine (and "remember" as an integer 1 or -1) the sign of the original angle and proceed with the absolute value.
2. Use a modulo function to reduce to the interval 0..2*pi.
3. Determine (and "remember" as an integer 1 or -1) whether the angle is in the "second" half and, if so, subtract pi*3/2, otherwise subtract pi/2. Note: this effectively interchanges sine and cosine (apart from signs); take this into account in the final evaluation.
This completes the steps to get an angle in -pi/2..+pi/2. After evaluating sine and cosine with the Chebyshev expansions, apply the "flags" of steps 1 and 3 above to get the right signs in the values. A sketch of the reduction follows.
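A minimal sketch of steps 1-3 (not the answer's exact code; the flag conventions are one possible choice):

#include <cmath>

// Reduce x to y in [-pi/2, pi/2]; 'sign' records step 1, 'swapped'
// records step 3 (sine and cosine are interchanged when it is true).
void reduce_angle(double x, double& y, int& sign, bool& swapped) {
    const double pi = 3.14159265358979323846;
    sign = (x < 0) ? -1 : 1;              // step 1: remember the original sign
    x = std::fmod(std::fabs(x), 2*pi);    // step 2: reduce to [0, 2*pi)
    swapped = (x > pi);                   // step 3: "second" half?
    y = swapped ? x - 3*pi/2 : x - pi/2;  // y is now in [-pi/2, pi/2]
}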
Just create a lookup table. The following will let you look up the sine and cosine of any radian value between -2*pi and 2*pi.
// LOOK UP TABLE
var LUT_SIN_COS = [];
var N = 14400;
var HALF_N = N >> 1;
var STEP = 4 * Math.PI / N;
var INV_STEP = 1 / STEP;
// BUILD LUT
for (var i = 0, r = -2 * Math.PI; i < N; i++, r += STEP) {
    LUT_SIN_COS[2 * i] = Math.sin(r);
    LUT_SIN_COS[2 * i + 1] = Math.cos(r);
}
You index into the lookup table by:
var index = ((r * INV_STEP) + HALF_N) << 1;
var sin = LUT_SIN_COS[index];
var cos = LUT_SIN_COS[index + 1];
Here's a fiddle that displays the % error you can expect from different sized LUTS http://jsfiddle.net/77h6tvhj/
EDIT: Here's an ideone (C++) with a benchmark vs the float sin and cos: http://ideone.com/SGrFVG For whatever a benchmark on ideone.com is worth, the LUT is 5 times faster.
One way to go would be to learn how to implement the CORDIC algorithm. It is not difficult and pretty interesting intellectually. It gives you both the cosine and the sine. Wikipedia gives a MATLAB example that should be easy to adapt to C++.
Note that you can increase speed and reduce precision simply by lowering the parameter n.
About your second question, it has already been asked (in C). It seems that there is no simple way.
You can also calculate sine using a square root, given the angle and the cosine.
The example below assumes the angle ranges from 0 to 2*pi:
double c = cos(angle);
double s = sqrt(1.0 - c*c);  // |sin| from sin^2 + cos^2 = 1
if (angle > pi) s = -s;      // sine is negative on (pi, 2*pi)
For single-precision floats, Microsoft uses an 11-degree polynomial approximation for sine and a 10-degree one for cosine: XMScalarSinCos.
They also have a faster version, XMScalarSinCosEst, that uses lower-degree polynomials.
If you aren't on Windows, you'll find the same code + coefficients on geometrictools.com under the Boost license.
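A usage sketch, assuming the DirectXMath header is available:

#include <DirectXMath.h>

void example() {
    float s, c;
    DirectX::XMScalarSinCos(&s, &c, 1.0f);  // sine and cosine of 1 radian in one call
}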

Divide and conquer algorithm: find the minimum of a matrix

I'm starting to learn how to implement divide and conquer algorithms, but I'm having some serious trouble with this exercise.
I have written an algorithm which finds the minimum value in a given vector using the divide and conquer method:
int minimum(int v[], int inf, int sup)
{
    int med, m1, m2;
    if (inf == sup)
        return v[inf];
    med = (inf + sup) / 2;
    m1 = minimum(v, inf, med);
    m2 = minimum(v, med + 1, sup);
    if (m1 < m2)
        return m1;
    else
        return m2;
}
And it works. Now, I have to do the same exercise on a matrix, but I'm getting lost. Specifically, I have been told to do the following:
Let n = 2^k. Consider an n x n square matrix. Calculate its minimum value using a recursive function
double minmatrix(Matrix M)
{
    return minmatrix2(M, 0, 0, M.row);
}
(the Matrix type is given, and as you can imagine M.row gives the number of rows and columns of the matrix). Use an auxiliary function
double minmatrix2(Matrix M, int i, int j, int m)
This has to be done using a recursive divide and conquer algorithm.
So... I can't figure out a way of doing it. I have been given the suggestion of splitting the matrix into 4 parts each time (from (i,j) to (i+m/2, j+m/2); from (i+m/2, j) to (i+m, j+m/2); from (i, j+m/2) to (i+m/2, j+m); and from (i+m/2, j+m/2) to (i+m, j+m)) and trying to implement code that works in a similar way to what I wrote for the array, but I just can't seem to manage it. Any suggestions? Even if you don't want to post a complete answer, just give me some hints. I really want to understand this.
EDIT: All right, I've done it. I'm posting the code I used in case someone else has the same doubt.
double minmatrix2(Matrix M, int i, int j, int m)
{
    double a1, a2, a3, a4;  // double, not int: M[i][j] and the minima are doubles
    if (m == 1)
        return M[i][j];
    a1 = minmatrix2(M, i, j, m/2);
    a2 = minmatrix2(M, i + (m/2), j, m/2);
    a3 = minmatrix2(M, i, j + (m/2), m/2);
    a4 = minmatrix2(M, i + (m/2), j + (m/2), m/2);
    if (min(a1, a2) < min(a3, a4))
        return min(a1, a2);
    else
        return min(a3, a4);
}
(the function min is defined elsewhere)
Consider that a 2D matrix in C or C++ is often implemented as accessor functions on top of a 1D array. You already know how to do this for a 1D array, so the only difference is how you address the cells. If you do this, your performance will intrinsically be optimal because you will address neighboring cells together.
Alternatively, consider that a 2D matrix has two dimensions N and M. Just break it in half along the larger dimension repeatedly until the larger dimension is less than X, some reasonable value to stop and do the actual computation sequentially. This is not entirely optimal because you will have to "skip" over parts of the matrix as you address memory.
A final idea is to divide along the major dimension first, then the minor one. In C this means dividing by rows until you have single rows, then running your 1D array algorithm on each row. This produces roughly optimal performance; a sketch follows.
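A minimal sketch of that last idea, reusing the 1D minimum() from the question (the row-pointer layout int* v[] is an assumption for illustration):

// Divide over row ranges until a single row remains, then hand that
// row to the 1D divide-and-conquer minimum() defined above.
int minrows(int* v[], int row_lo, int row_hi, int ncols)
{
    if (row_lo == row_hi)
        return minimum(v[row_lo], 0, ncols - 1);
    int mid = (row_lo + row_hi) / 2;
    int m1 = minrows(v, row_lo, mid, ncols);
    int m2 = minrows(v, mid + 1, row_hi, ncols);
    return (m1 < m2) ? m1 : m2;
}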

Avoid numerical underflow when obtaining determinant of large matrix in Eigen

I have implemented an MCMC algorithm in C++ using the Eigen library. The main part of the algorithm is a loop in which first some matrix calculations are performed, after which the determinant of the resulting matrix is obtained and added to the output. E.g.:
MatrixXd delta0;
NumericVector out(3);
out[0] = 0;
out[1] = 0;
for (int i = 0; i < s; i++) {
    ...
    delta0 = V * (A.cast<double>() - (A + B).cast<double>() * theta.asDiagonal());
    ...
    double I = delta0.determinant();
    out[1] += I;
    out[2] += std::sqrt(I);
}
return out;
Now on certain matrices I unfortunately observe a numerical underflow, so that the determinant is output as zero (which it actually isn't).
How can I avoid this underflow?
One solution would be to obtain, instead of the determinant, the log of the determinant. However,
1. I do not know how to do this;
2. how could I then add up these logs?
Any help is greatly appreciated.
There are two main options that come to my mind:
The product of the eigenvalues of a square matrix is the determinant of this matrix, therefore a sum of logarithms of the eigenvalues is a logarithm of its determinant. Assume det(A) = a and det(B) = b for compact notation. After applying the aforementioned to the two matrices A and B, we end up with log(a) and log(b), and then the following is true:
log(a + b) = log(a) + log(1 + e^(log(b) - log(a)))
Yes, we get a logarithm of the sum. What would you do with it next? I don't know; it depends on what you need. If you have to remove the logarithm via e^log(a + b) = a + b, then you might be lucky that the value of a + b no longer underflows, but in some cases it can still underflow as well. (A sketch of this log-sum trick follows the second option.)
Perform clever preconditioning; there might be tons of options here, and you had better read about them from some trusted sources, as this is a serious topic. The simplest (and probably the cheapest) example of preconditioning for this particular problem could be to recall that det(c * A) = (c^n) * det(A), where A is an n-by-n matrix, to premultiply your matrix by some c, compute the determinant, and then divide it by c^n to get the actual one.
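A minimal sketch of the log-sum trick from the first option (std::log1p keeps the result accurate when the exponential is tiny):

#include <algorithm>
#include <cmath>

// log(a + b) from log(a) and log(b), without ever forming a or b;
// subtracting the larger log keeps the exponent non-positive.
double log_add(double log_a, double log_b) {
    double hi = std::max(log_a, log_b);
    double lo = std::min(log_a, log_b);
    return hi + std::log1p(std::exp(lo - hi));
}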
Update
I thought about one more option. If in the last stages of #1 or #2 you still experience underflow too frequently, then it might be a good idea to increase precision specifically for those last operations, for example by utilizing GNU MPFR.
You can use Householder elimination to get the QR decomposition of delta0. Then the determinant of the Q part is +/-1 (depending on whether you did an even or odd number of reflections) and the determinant of the R part is the product of its diagonal elements. Both of these are easy to compute without running into underflow hell, and you might not even care about the first.
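In Eigen this is conveniently packaged already; a minimal sketch, assuming an Eigen 3 version whose QR classes expose logAbsDeterminant():

#include <Eigen/Dense>

// log|det(m)| via Householder QR: Eigen sums log|r_ii| over R's diagonal
// internally, so the product of diagonal entries is never formed.
double log_abs_det(const Eigen::MatrixXd& m) {
    return Eigen::HouseholderQR<Eigen::MatrixXd>(m).logAbsDeterminant();
}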