Matlab - Cumulative distribution function (CDF) - c++

I'm in the middle of translating code from Matlab to C++, and for important reasons I must obtain the cumulative distribution function of a normal distribution (in Matlab, 'norm') with mean 0 and variance 1.
The implementation in Matlab is something like this:
map.c = cdf( 'norm', map.c, 0,1 );
This is supposed to equalize the histogram of map.c.
The problem comes when translating it into C++: I don't get enough decimal places. I tried many typical CDF implementations, such as the C++ code I found here,
Cumulative Normal Distribution Function in C/C++, but I lost a significant number of decimals, so I tried the Boost implementation:
#include "boost/math/distributions.hpp"
boost::math::normal_distribution<> d(0,1);
but it still does not give the same results as Matlab's implementation (if anything, it seems to be even more precise!)
Does anyone know where I could find the original Matlab source for this computation, or how many decimal places I should consider?
Thanks in advance!

The Gaussian CDF is an interesting function. I do not know whether my answer will interest you, but it is likely to interest others who look up your question later, so here it is.
One can compute the CDF by integrating the Taylor series of the PDF term by term. This approach works well in the body of the Gaussian bell curve, but it fails numerically in the tails, where special-function techniques are needed. The best source I have read on this is N. N. Lebedev, Special Functions and Their Applications, Ch. 2, Dover, 1972.

Octave is an open-source Matlab clone. Here's the source code for Octave's implementation of normcdf: http://octave-nan.sourcearchive.com/documentation/1.0.6/normcdf_8m-source.html
It should be (almost) the same as Matlab's, if it helps you.

C and C++ provide long double, a floating-point type that can be more precise than double. You could try using it in your implementation. Check your compiler documentation to see whether it provides an even higher-precision floating-point type: GCC 4.3 and higher provide __float128, which has even more precision.

Related

high order bessel function computation with large variables

My work involves computing high-order Bessel functions at large argument values. In MATLAB this has been done without problems. However, to scale up the problem, I have turned to writing C++ code with MPI. Of course, the Bessel function values are generated by calling some library. To make the problem concrete, consider this very specific bug.
In MATLAB, suppose I wish to compute $J_{46341}(86840.0)$;
MATLAB gives me: besselj(46341,86840)=0.001309896212292
However, a simple test that calls
gsl_sf_bessel_Jn_e returns "ERROR: NaN"
I have checked that at order 46340 both MATLAB and GSL return the same answer, 0.00292895, within acceptable accuracy. One order higher, GSL produces the NaN error while MATLAB still returns an accurate numerical answer.
I did try using recurrence relations to generate the higher-order values from a not-so-small order, say from order 20000 and up; however, this only delays the NaN error without solving the problem.
Turning to other available software libraries, I tried NAG, but to my utter disappointment
nag_bessel_j_alpha (s18ekc) has the constraint abs(nl)<=101;
in other words, it can only compute up to order 101, which is clearly not enough for my study.
So, my question is fairly simple:
Is there a more reliable library approach to obtaining high-order Bessel function values for large x?
Asymptotically, the Bessel function approaches 0, so I can safely set values to zero once the tail approaches the underflow limit. However, the NaN problem seems to occur somewhere between the strongly oscillating region and the asymptotically decaying tail.
Problem solved. Thank you for the community work; your knowledge and contributions really amazed me!
Please see here,
how to call fortran routines from C++?
https://mathoverflow.net/questions/225121/computation-of-high-order-bessel-function-at-large-variable-value
MATLAB, R, Python and JuliaLang/openspecfun all build upon the original Fortran source code by Dr. Donald E. Amos (Sandia National Laboratories); cited papers:
D. E. Amos, "A subroutine package for Bessel functions of a complex argument and nonnegative order", Sandia National Laboratory Report, SAND85-1018, May 1985.
D. E. Amos, "A portable package for Bessel functions of a complex argument and nonnegative order", Trans. Math. Software, 1986.
It is now known as Amos's Algorithm 644 in the ACM collection.
http://dl.acm.org/citation.cfm?id=212078
http://dl.acm.org/citation.cfm?id=1268783
http://dl.acm.org/citation.cfm?id=98299
However, the source code hosted on netlib is not bug-free and is probably not up to date:
http://netlib.sandia.gov/master/index.html
http://netlib.sandia.gov/amos/
whereas the version adopted by openspecfun is solid:
https://github.com/JuliaLang/openspecfun

C++ armadillo not correctly solving poorly conditioned matrix

I have a relatively simple question regarding the linear solver built into Armadillo. I am a relative newcomer to C++ but have experience coding in other languages. I am solving a fluid flow problem by successive linearization, using the Armadillo function solve(A,b) to get the solution at each iteration.
The issue I am running into is that my matrix is very ill-conditioned. The determinant is on the order of 10^-20 and the condition number is 75000. I know these are terrible conditions, but it's what I've got. Does anyone know whether it is possible to use a precision beyond double (long double, perhaps) for my A matrix and in the solve function? I know that there are double matrix classes in Armadillo, but I haven't found any documentation for higher precision.
To approach this from another angle, I wrote some code in Mathematica and the LinearSolve worked very well and the program converged to the correct answer. My reasoning is that Mathematica variables have higher precision which can handle the higher levels of rounding error.
If anyone has any insight on this, please let me know. I know there are other ways to approach a poorly conditioned matrix (like preconditioning and pivoting), but my work is more in the physics than in the actual numerical solution so I'm trying to steer clear of that.
EDIT: I just limited the precision in the Mathematica version to 15 decimal places and the program still converges. This leads me to believe it is NOT a variable precision question but rather an issue with the method.
As you said, your work is "more in the physics": rather than trying to increase the accuracy, I would use the Moore-Penrose pseudo-inverse, which Armadillo provides through the function pinv. You should then experiment a bit with the tolerance parameter to set it to a reasonable level.
The geometric interpretation is as follows: bad condition numbers arise because the row/column vectors are (nearly) linearly dependent. In physics, such linear dependencies usually have an origin that at least needs to be interpreted. The pseudo-inverse first projects the matrix onto a lower-dimensional space in which the vectors are "less linearly dependent", by dropping all singular vectors whose singular values are smaller than the tolerance parameter. The resulting matrix has a better condition number, so its inverse can be constructed with fewer problems.

c++ numerical analysis Accurate data structure?

Using the double type, I wrote a cubic spline interpolation algorithm.
It seems to work, but there is a relative error of around 6% when very small values are computed.
Is double data type enough for accurate scientific numerical analysis?
Double has plenty of precision for most applications. Of course it is finite, but it's always possible to squander any amount of precision by using a bad algorithm. In fact, that should be your first suspect. Look hard at your code and see if you're doing something that lets rounding errors accumulate quicker than necessary, or risky things like subtracting values that are very close to each other.
Scientific numerical analysis is difficult to get right, which is why I leave it to the professionals. Have you considered using a numeric library instead of writing your own? Eigen is my current favorite here: http://eigen.tuxfamily.org/index.php?title=Main_Page
I always keep close at hand the latest edition of Numerical Recipes (nr.com), which has an excellent chapter on interpolation. NR has a restrictive license, but the authors know what they are doing and provide a succinct write-up of each numerical technique. Other libraries to look at include ATLAS and the GNU Scientific Library.
To answer your question: double should be more than enough for most scientific applications. I agree with the previous posters that it sounds like an algorithm problem. Have you considered posting the code for the algorithm you are using?
Whether double is enough for your needs depends on the type of numbers you are working with. As Henning suggests, it is probably best to take a look at the algorithms you are using and make sure they are numerically stable.
For starters, here's a good algorithm for addition: Kahan summation algorithm.
Double precision will be suitable for most problems, but a cubic spline will not work well if the polynomial or function oscillates quickly, repeats, or is of quite high dimension.
In that case it can be better to use Legendre polynomials, since they handle variants of exponentials.
By way of a simple example: if you use Euler's, the trapezoidal or Simpson's rule to integrate a 3rd-order polynomial, you won't need a huge sample rate to get the integral (area under the curve). However, if you apply these rules to an exponential function, the sample rate may need to increase greatly to avoid losing a lot of precision. Legendre polynomials can cater for this case much more readily.

evaluating multiple integrals

Is there any library to evaluate multidimensional integrals? I have at least 4 dimensions (in general many more than that), and the integrand is a combination of the variables, so I cannot separate them. Do you know of any library for numerical evaluation? I'm especially looking for either MATLAB or C++, but I will use anything that does the job.
Since you don't specify the kind of integrals or the actual dimensionality, I can only suggest that you take into account that
$\int_a^b \int_c^d f(x,y)\,dy\,dx = \int_a^b F(x)\,dx,$
where the function F(x) is defined as
$F(x) = \int_c^d f(x,y)\,dy,$
and use this fact to compute your integrals with the usual quadrature techniques. For example, you could use trapz or quad in MATLAB. However, if the dimensionality is truly high, you are better off using Monte Carlo algorithms.
First link off Google.
Seems pretty robust.
"Numerical Recipes In C" has a very nice chapter on numerical integration.
Maybe Gaussian quadrature can help you out.
Yes, there is TESTPACK, a C++ program that demonstrates the testing of a routine for multidimensional integration.

How are FFTs different from DFTs and how would one go about implementing them in C++?

After some studying, I created a small app that calculates DFTs (Discrete Fourier Transformations) from some input. It works well enough, but it is quite slow.
I read that FFTs (Fast Fourier Transformations) allow quicker calculations, but how are they different? And more importantly, how would I go about implementing them in C++?
If you don't need to implement the algorithm yourself, you could take a look at the Fastest Fourier Transform in the West (FFTW).
Even though it is written in C, it officially works with C++ (from the FAQ):
Question 2.9. Can I call FFTW from C++?
Most definitely. FFTW should compile and/or link under any C++ compiler. Moreover, it is likely that the C++ <complex> template class is bit-compatible with FFTW's complex-number format (see the FFTW manual for more details).
The FFT has O(n log n) complexity, compared with O(n^2) for the DFT.
There is a lot of literature on this, and I strongly advise you to check it first, because such a wide topic cannot be fully explained here.
http://en.wikipedia.org/wiki/Fast_Fourier_transform (check external links )
If you need a library, I advise using an existing one, for instance:
http://www.fftw.org/
This library has an efficient FFT implementation and is also used in proprietary software (MATLAB, for instance).
Steven Smith's book The Scientist and Engineer's Guide to Digital Signal Processing, specifically Chapter 8 on the DFT and Chapter 12 on the FFT, does a much better job of explaining the two transforms than I ever could.
By the way, the whole book is available for free (link above) and it's a very good introduction to signal processing.
Regarding the C++ code request, I've only used the Fastest Fourier Transform in the West (already cited by superexsl) or DSP libraries such as those from TI or Analog Devices.
The results of a correctly implemented DFT are essentially identical to the results of a correctly implemented FFT (they differ only by rounding errors). As others have pointed out here, the major difference is performance: the DFT takes O(n^2) operations, while the FFT takes O(n log n).
The best, most readable publication I have ever found (the one I still refer to) is The Fast Fourier Transform and Its Applications by E. Oran Brigham. The first few chapters provide a very thorough overview of the continuous and discrete forms of the Fourier transform. He then uses that to develop the fast version of the DFT based on the Cooley-Tukey algorithm, for the radix-2 case (n is a power of 2) and the mixed-radix case (though his treatment of the latter is somewhat shallower than that of the former).
The basic approach in the radix-2 algorithm is to perform a linear-time operation on the input X, recursively split the result in half, and perform a similar linear-time operation on the two halves. The mixed-radix case is similar, though you need to divide X into equal portions each time, so it helps if n doesn't have any large prime factors.
I've found this nice explanation with some algorithms described.
FastFourierTransform
About the implementation:
First, I'd make sure your implementation returns correct results (compare the output with MATLAB or Octave, which have built-in Fourier transforms).
Optimize when necessary, using profilers.
Don't use unnecessary for loops.