I am trying to find a built-in CDF for the chi-square distribution.
Basically, I wish to have a CDF function like pchisq in R, where
chisquare(x, p, q) gives you the probability. Here x is the value at which the CDF is evaluated, p is the degrees of freedom, and q is the noncentrality parameter. I tried looking for some packages, but the libraries I found do not take the q parameter.
To answer your question: unfortunately, no, this does not exist in the C++ standard library as of C++20.
I needed to compute this CDF earlier today for a project and ended up just coding it up as a Riemann sum. For the special case k = 1, where the Riemann integral is hard to evaluate accurately because of the blow-up of the PDF at x = 0, I used the fact that the CDF is given by erf(sqrt(x/2)), and erf and sqrt do exist in C++ in the <cmath> header.
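In case it helps, here is a minimal sketch of that approach. This is not the original poster's code, and it handles only the central case (noncentrality q = 0); std::tgamma and std::erf are from <cmath>.

#include <cmath>

// PDF of the central chi-square distribution with k degrees of freedom
double chi2_pdf(double x, double k) {
    return std::pow(x, k / 2.0 - 1.0) * std::exp(-x / 2.0)
         / (std::pow(2.0, k / 2.0) * std::tgamma(k / 2.0));
}

// CDF approximated by a midpoint Riemann sum over [0, x]
double chi2_cdf(double x, double k, int n = 10000) {
    if (k == 1.0)                          // the PDF blows up at 0, so use the closed form
        return std::erf(std::sqrt(x / 2.0));
    double h = x / n, sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += chi2_pdf((i + 0.5) * h, k);
    return sum * h;
}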
With this function I can sample from a normal distribution. I was wondering how I could sample efficiently from a normal distribution restricted to a certain interval [a, b]. My trivial approach would be to sample from the normal distribution and keep the value if it falls in the interval, otherwise re-sample. However, this would probably discard many values before I get a suitable one.
I could also approximate the normal distribution using a triangular distribution, but I don't think this would be accurate enough.
I could also try to work with the cumulative distribution function, but that would probably be slow as well. Is there any efficient approach to the problem?
Thanks.
I'm assuming you know how to transform to and from the standard normal by shifting by μ and scaling by σ.
Option 1, as you said, is acceptance/rejection. Generate normals as usual and reject them if they fall outside the range [a, b]. It's not as inefficient as you might think. If p = P{a < Z < b}, the number of trials required follows a geometric distribution with parameter p, so the expected number of attempts until a value is accepted is 1/p.
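For what it's worth, a minimal sketch of Option 1 with the standard <random> header; mu, sigma and the interval [a, b] below are example values, not from the question.

#include <iostream>
#include <random>

int main() {
    const double mu = 0.0, sigma = 1.0, a = -1.0, b = 2.0;    // example values
    std::mt19937 gen{std::random_device{}()};
    std::normal_distribution<double> normal(mu, sigma);

    auto truncated = [&] {
        double x;
        do { x = normal(gen); } while (x < a || x > b);       // resample until inside [a, b]
        return x;
    };

    for (int i = 0; i < 5; ++i)
        std::cout << truncated() << '\n';
}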
Option 2 is to use an inverse Gaussian CDF (normal quantile function), such as the one in Boost. Calculate lo = Φ(a) and hi = Φ(b), the probabilities of your normal being below a and b, respectively. Then generate U distributed uniformly between lo and hi, crank the resulting U's through the inverse CDF, and rescale to get outcomes with the desired truncated distribution.
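And a minimal sketch of Option 2 using Boost.Math's normal distribution for Φ and its inverse; again, mu, sigma, a and b are example values rather than anything from the question.

#include <boost/math/distributions/normal.hpp>
#include <iostream>
#include <random>

int main() {
    const double mu = 0.0, sigma = 1.0, a = -1.0, b = 2.0;    // example values
    boost::math::normal_distribution<> unit(0.0, 1.0);

    // probabilities of the standardized normal falling below a and b
    double lo = boost::math::cdf(unit, (a - mu) / sigma);
    double hi = boost::math::cdf(unit, (b - mu) / sigma);

    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> uni(lo, hi);

    for (int i = 0; i < 5; ++i) {
        double z = boost::math::quantile(unit, uni(gen));     // inverse CDF of the uniform draw
        std::cout << mu + sigma * z << '\n';                  // rescale to N(mu, sigma); lies in [a, b]
    }
}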
The normal CDF is the integral of the density; for the standard normal, P(-1 < Z < 1) = 1/sqrt(2*PI) * integral from -1 to 1 of exp(-x*x/2) dx. You can approximate that integral numerically:
std::cout << "riemann_midpnt_sum = " << 1 / (sqrt(2*PI)) * riemann_mid_point_sum(fctn, -1, 1.0, 100) << '\n';
// where fctn is the function inside the integral
double fctn(double x) {
return exp(-(x*x)/2);
}
output: "riemann_midpnt_sum = 0.682698"
This calculates the standard normal probability from -1 to 1.
It uses a Riemann sum to approximate the integral. You can take the Riemann sum from here.
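The snippet relies on a riemann_mid_point_sum helper that isn't shown; its name and signature are inferred from the call site, so the following is only a guess at what such a midpoint-rule routine might look like.

// Note: the full snippet would also need <cmath>, <iostream> and a PI constant.
double riemann_mid_point_sum(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n, sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += f(a + (i + 0.5) * h);   // evaluate the integrand at each slice's midpoint
    return sum * h;
}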
You could have a look at the implementation of the normal distribution in your standard library (e.g., https://gcc.gnu.org/onlinedocs/gcc-4.6.3/libstdc++/api/a00277.html) and figure out a way to re-implement it with your constraint.
It might be tricky to understand the template-heavy library code, but if you really need speed then the trivial approach is not well suited, particularly if your interval is quite small.
I'm trying to plot a function which has a term with a generalized Laguerre polynomial in it. I know Mathematica can use LaguerreL[n, a, f(x,y)], but I'm not sure what the Python equivalent would be. I'm currently trying scipy.special.genlaguerre(n, a, f(x,y)), using x and y as numpy.arange arrays for the values across which I want to plot, but I keep getting the following error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
So I have two questions: 1) am I correct in using special.genlaguerre? (I think so)
2) how do I fix the truth value error that numpy is generating here?
Laguerre polynomials depend on only one variable, but you want to evaluate them at (x, y) positions. In your problem, the one variable you need might be the radius r from the origin.
scipy.special.genlaguerre indeed generates generalized Laguerre polynomials. It returns an np.poly1d object, which you can evaluate on a range of numbers:
import numpy as np
import scipy.special
rs = np.linspace(0, 10)
scipy.special.genlaguerre(1, 0)(rs)
In your question, you try to supply f(x,y) as the monic parameter of genlaguerre; since monic is expected to be a boolean, passing an array there is what triggers the truth-value error.
If you need to evaluate this polynomial to very high accuracy, read this:
Laguerre polynomials in python using scipy, lack of convergence?
Is there any C++ equivalent of randsample in Matlab? The function randsample in Matlab is as follows:
y = randsample(n,k,true,w)
which returns a weighted sample taken with replacement, using a vector of positive weights w, whose length is n. The probability that the integer i is selected for an entry of y is w(i)/sum(w).
For example:
R = randsample('ACGT', 10,true,[0.15 0.35 0.35 0.15])
will have an output
CTTCGTCGGG
If there is no existing C++ library with the same function as randsample in Matlab, how would one write the equivalent function in C++?
The Random library of Boost helps a lot with this kind of thing. It's not a direct equivalent, but at least it gives you a way to draw samples from a discrete distribution defined by weights.
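As an illustration, here is a minimal sketch reproducing the randsample('ACGT', 10, true, [0.15 0.35 0.35 0.15]) example. It uses std::discrete_distribution from C++11's <random>, which plays the same role as the weighted discrete distribution in Boost.Random.

#include <iostream>
#include <random>
#include <string>

int main() {
    const std::string alphabet = "ACGT";
    std::discrete_distribution<int> pick({0.15, 0.35, 0.35, 0.15});  // weights w
    std::mt19937 gen{std::random_device{}()};

    std::string sample;
    for (int i = 0; i < 10; ++i)
        sample += alphabet[pick(gen)];   // index i is drawn with probability w[i]/sum(w)
    std::cout << sample << '\n';
}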
I am trying to solve a two-point boundary value problem with odeint. My equation has the form
y'' + a*y' + b*y + c = 0
It is pretty trivial when I have initial conditions y(x_1) = y_1, y'(x_1) = y_2, but when the boundary conditions are y(x_1) = y_1, y(x_2) = y_2 I am lost. Does anybody know a way to deal with problems like this with odeint or another scientific library?
In this case you need a shooting method. odeint does not have such a method; it solves the initial value problem (IVP), which is your first case. I think this method is explained in Numerical Recipes, and you can use Boost.Odeint to do the stepping.
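A minimal sketch of such a shooting approach with Boost.Odeint follows; the coefficients a, b, c and the boundary data are example values, not from the question. Since the ODE is linear, y(xb) depends affinely on the unknown initial slope, so one secant step from two trial slopes already hits the target (up to integrator error).

#include <array>
#include <boost/numeric/odeint.hpp>
#include <iostream>

const double a = 1.0, b = 2.0, c = 0.5;              // ODE coefficients (example values)
const double xa = 0.0, xb = 1.0, ya = 0.0, yb = 1.0; // boundary data (example values)

using state_type = std::array<double, 2>;            // state = { y, y' }

void rhs(const state_type& s, state_type& dsdx, double /*x*/) {
    dsdx[0] = s[1];
    dsdx[1] = -a * s[1] - b * s[0] - c;              // y'' = -a*y' - b*y - c
}

// Integrate the IVP with initial slope 'slope' and return y(xb).
double shoot(double slope) {
    state_type s{ya, slope};
    boost::numeric::odeint::integrate(rhs, s, xa, xb, 1e-3);
    return s[0];
}

int main() {
    double s0 = 0.0, s1 = 1.0;                       // two trial slopes
    double f0 = shoot(s0), f1 = shoot(s1);
    double slope = s0 + (yb - f0) * (s1 - s0) / (f1 - f0);  // secant step
    std::cout << "initial slope = " << slope
              << ", y(xb) = " << shoot(slope) << '\n';
}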
An alternative and more efficient way to solve this type of problem is the finite difference or finite element method. For finite differences you can check Numerical Recipes; for finite elements I recommend the deal.II library.
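For comparison, a minimal finite-difference sketch for the same linear equation, again with example coefficients and boundary values, solving the resulting tridiagonal system with the Thomas algorithm:

#include <iostream>
#include <vector>

int main() {
    const double a = 1.0, b = 2.0, c = 0.5;              // ODE coefficients (example values)
    const double xa = 0.0, xb = 1.0, ya = 0.0, yb = 1.0; // boundary data (example values)
    const int n = 100;                                   // interior grid points
    const double h = (xb - xa) / (n + 1);

    // Central differences: (y[i-1] - 2*y[i] + y[i+1])/h^2 + a*(y[i+1] - y[i-1])/(2h) + b*y[i] + c = 0
    std::vector<double> lower(n, 1.0 / (h * h) - a / (2 * h));
    std::vector<double> diag (n, -2.0 / (h * h) + b);
    std::vector<double> upper(n, 1.0 / (h * h) + a / (2 * h));
    std::vector<double> rhs  (n, -c);
    rhs.front() -= lower.front() * ya;                   // fold the boundary values into the RHS
    rhs.back()  -= upper.back()  * yb;

    // Thomas algorithm: forward elimination, then back substitution.
    for (int i = 1; i < n; ++i) {
        double m = lower[i] / diag[i - 1];
        diag[i] -= m * upper[i - 1];
        rhs[i]  -= m * rhs[i - 1];
    }
    std::vector<double> y(n);
    y[n - 1] = rhs[n - 1] / diag[n - 1];
    for (int i = n - 2; i >= 0; --i)
        y[i] = (rhs[i] - upper[i] * y[i + 1]) / diag[i];

    std::cout << "y at the midpoint ~ " << y[n / 2] << '\n';
}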
Another approach is to use B-splines: assuming you know the initial point x0 and the final point xfinal of the integration interval, you can expand the solution y(x) in a B-spline basis defined over (x0, xfinal), i.e.
y(x)= \sum_{i=1}^n A_i*B_i(x),
where the A_i are constant coefficients to be determined and the B_i(x) are B-spline basis functions (well-defined polynomial functions that can be differentiated numerically). For scientific applications you can find an implementation of B-splines in GSL.
With this substitution the boundary value problem is reduced to a linear problem, since (using Einstein summation for repeated indices):
A_i*[ B_i''(x) + a*B_i'(x) + b*B_i(x)] + c =0
You can choose a set of collocation points x and build a linear system from the above equation. You can find information on this type of method in the review paper "Applications of B-splines in Atomic and Molecular Physics" by H. Bachau, E. Cormier, P. Decleva, J. E. Hansen and F. Martín:
http://iopscience.iop.org/0034-4885/64/12/205/
I do not know of any library that solves this problem directly, but there are several libraries for B-splines (I recommend GSL for your needs) that will allow you to form the linear system. See this Stack Overflow question:
Spline, B-Spline and NURBS C++ library
I'm trying to write a Monte Carlo simulation. In my simulation I need to generate many random variates from a discrete probability distribution.
I do have a closed-form solution for the distribution and it has finite support; however, it is not a standard distribution. I am aware that I could draw a uniform [0,1) random variate and compare it to the CDF to get a random variate from my distribution, but the parameters of the distribution are always changing, so this method is too slow.
So I guess my question has two parts:
Is there a method/algorithm to quickly generate finite, discrete random variates without using the CDF?
Is there a Python module and/or a C++ library which already has this functionality?
Acceptance/Rejection:
Find a function that is always higher than the pdf. Generate two random variates: scale the first one to pick a candidate value, and use the second to decide whether to accept or reject that candidate. Rinse and repeat until you accept a value.
Sorry I can't be more specific, but I haven't done it for a while.
It's a standard algorithm, but I'd personally implement it from scratch, so I'm not aware of any implementations.
Indeed, acceptance/rejection is the way to go if you know your pdf analytically. Let's call it f(x). Find a pdf g(x) such that there exists a constant c with c*g(x) > f(x) everywhere, and such that you know how to simulate a variable with pdf g(x). For example, since you work with a distribution with finite support, a uniform will do: g(x) = 1/(size of your domain) over the domain.
Then draw a pair (G, U) such that G is simulated with pdf g(x) and U is uniform on [0, c*g(G)]. If U < f(G), accept G as your variable; otherwise draw again. The G you finally accept will have f as its pdf.
Note that the constant c determines the efficiency of the method: the smaller c, the more efficient you will be; basically you will need on average c drawings to get one accepted variable. So pick a g that is simple enough (don't forget you need to draw variables using g as a pdf) but has the smallest possible c.
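A minimal sketch of this scheme for a small finite support with a uniform proposal; the pmf below is just a placeholder, not the asker's distribution.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const std::vector<double> f = {0.1, 0.4, 0.2, 0.3};            // target pmf (placeholder)
    const int K = static_cast<int>(f.size());
    const double g = 1.0 / K;                                      // uniform proposal pmf
    const double c = *std::max_element(f.begin(), f.end()) / g;    // ensures c*g >= f(x) everywhere

    std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> propose(0, K - 1);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    auto draw = [&] {
        for (;;) {
            int x = propose(gen);                  // G simulated with pdf g
            if (uni(gen) * c * g <= f[x])          // accept with probability f(G)/(c*g(G))
                return x;
        }
    };

    for (int i = 0; i < 10; ++i)
        std::cout << draw() << ' ';
    std::cout << '\n';
}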
If acceptance/rejection is also too inefficient, you could try a Markov Chain Monte Carlo (MCMC) method. These generate a sequence of samples, each one dependent on the previous one, so by keeping only every k-th sample you can obtain a more or less independent set. They only need the pdf, or even just a multiple of it. Usually they work with fixed distributions, but they can also be adapted to slowly changing ones.
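For completeness, a minimal sketch of a Metropolis chain on a finite support, using only unnormalised weights and a symmetric (uniform) proposal; the weights, burn-in and thinning values are placeholders.

#include <iostream>
#include <random>
#include <vector>

int main() {
    const std::vector<double> w = {1.0, 4.0, 2.0, 3.0};    // unnormalised weights (placeholder)
    const int K = static_cast<int>(w.size());

    std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> propose(0, K - 1);  // symmetric (uniform) proposal
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    int state = 0;
    const int burn_in = 1000, thin = 10, n_samples = 100;  // placeholder chain settings
    for (int step = 0; step < burn_in + n_samples * thin; ++step) {
        int cand = propose(gen);
        if (uni(gen) <= w[cand] / w[state])                // Metropolis acceptance ratio
            state = cand;
        if (step >= burn_in && (step - burn_in) % thin == 0)
            std::cout << state << ' ';                     // keep every thin-th sample after burn-in
    }
    std::cout << '\n';
}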