My work involves computing high-order Bessel functions at large argument values. Within MATLAB this has been done without problems. However, in order to scale up the problem, I have turned to writing C++ code with MPI, and the step that generates the Bessel function values is done by calling a library. To make the problem concrete, let me consider this very specific bug.
In MATLAB, suppose I wish to compute $J_{46341}(86840.0)$;
MATLAB gives me: besselj(46341,86840) = 0.001309896212292
However, a simple test program that calls
gsl_sf_bessel_Jn_e returns "ERROR: NaN".
I have checked that at order 46340 both MATLAB and GSL return the same answer, 0.00292895, to acceptable accuracy. One more order in GSL produces the NaN error, while MATLAB still returns an accurate numerical answer.
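For reference, a minimal test along the lines of what I ran looks like the sketch below (this assumes GSL is installed and the program is linked with -lgsl -lgslcblas):

#include <cstdio>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_sf_bessel.h>

int main() {
    gsl_set_error_handler_off();              // report failures via status codes instead of aborting
    gsl_sf_result res;
    for (int n = 46340; n <= 46341; ++n) {
        int status = gsl_sf_bessel_Jn_e(n, 86840.0, &res);
        if (status)
            std::printf("n = %d: ERROR: %s\n", n, gsl_strerror(status));
        else
            std::printf("n = %d: J_n(86840) = %.15g (est. err %.3g)\n", n, res.val, res.err);
    }
    return 0;
}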
I did try using the recurrence relation to generate higher-order values, starting from a not-so-small order, say 20000 and up; however, this only delays the NaN error without completely solving the problem.
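For concreteness, a sketch of what such an upward recurrence looks like (the seed order of 20000 is illustrative, and the two seed values still have to come from GSL):

#include <cstdio>
#include <gsl/gsl_sf_bessel.h>

int main() {
    const double x = 86840.0;
    int n = 20000;                              // illustrative seed order where GSL still works
    double jm1 = gsl_sf_bessel_Jn(n - 1, x);    // J_{n-1}(x)
    double j   = gsl_sf_bessel_Jn(n, x);        // J_n(x)
    for (; n < 46341; ++n) {
        double jp1 = (2.0 * n / x) * j - jm1;   // J_{n+1}(x) = (2n/x) J_n(x) - J_{n-1}(x)
        jm1 = j;
        j = jp1;
    }
    std::printf("J_%d(%.1f) ~ %.15g\n", n, x, j);
    return 0;
}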
Switching my attention to other available libraries, I tried NAG, but to my disappointment,
nag_bessel_j_alpha (s18ekc) has the constraint abs(nl) <= 101;
in other words, it can only compute up to order 101, which is far below the orders my study requires.
So, my question is fairly simple:
Is there a more reliable library approach to obtaining high-order Bessel function values for large x?
Asymptotically, the Bessel function approaches 0, and I can certainly set values to zero once the tail approaches the underflow limit. However, the NaN problem seems to occur somewhere between the strongly oscillating region and the asymptotically decaying tail.
Problem solved. Thank you for the community's work; I am really amazed by your knowledge and contributions!
Please see here:
How to call Fortran routines from C++?
https://mathoverflow.net/questions/225121/computation-of-high-order-bessel-function-at-large-variable-value
MATLAB, R, Python, and JuliaLang/openspecfun all build upon the original Fortran source code by Dr. Donald E. Amos (Sandia National Laboratories); the cited papers are:
D. E. Amos, "A subroutine package for Bessel functions of a complex argument and nonnegative order", Sandia National Laboratory Report SAND85-1018, May 1985.
D. E. Amos, "A portable package for Bessel functions of a complex argument and nonnegative order", ACM Trans. Math. Software, 1986.
This is now known as Amos's Algorithm 644, collected by the ACM.
http://dl.acm.org/citation.cfm?id=212078
http://dl.acm.org/citation.cfm?id=1268783
http://dl.acm.org/citation.cfm?id=98299
However, the source code hosted on Netlib is not bug-free and probably not up to date:
http://netlib.sandia.gov/master/index.html
http://netlib.sandia.gov/amos/
The version adopted by openspecfun, on the other hand, works solidly:
https://github.com/JuliaLang/openspecfun
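For illustration, a minimal sketch of calling Amos's ZBESJ directly from C++ might look like the following. This assumes gfortran-style name mangling (lowercase name with a trailing underscore) and that the Fortran sources (e.g. the copies shipped with openspecfun) are compiled and linked in; the argument list follows the SAND85-1018 documentation.

#include <cstdio>

// ZBESJ(ZR, ZI, FNU, KODE, N, CYR, CYI, NZ, IERR): Bessel J of complex argument
extern "C" void zbesj_(const double* zr, const double* zi, const double* fnu,
                       const int* kode, const int* n,
                       double* cyr, double* cyi, int* nz, int* ierr);

int main() {
    const double zr = 86840.0, zi = 0.0;  // argument z = 86840 + 0i
    const double fnu = 46341.0;           // order of the first sequence member
    const int kode = 1, n = 1;            // no exponential scaling, one member
    double cyr[1], cyi[1];
    int nz = 0, ierr = 0;
    zbesj_(&zr, &zi, &fnu, &kode, &n, cyr, cyi, &nz, &ierr);
    std::printf("J_%.0f(%.1f) = %.15g  (nz = %d, ierr = %d)\n", fnu, zr, cyr[0], nz, ierr);
    return 0;
}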
Related
I am trying to solve linear constraint satisfaction problems. So I grabbed the "GNU Linear Programming Kit," wrote my constraints, and let it loose on it with some simple objective function.
GLPK claimed to find a solution, but when I check it against the constraints, they are not satisfied: an expression that should be <= 0 is actually around 1e-10, i.e., slightly greater than 0.
I can probably live with the issue, by setting up my constraints to return the Chebyshev centre of the polyhedron, but I wonder if such discrepancies are to be expected with linear programming solvers, or I should report it as a bug for the GLPK folks.
All LP solvers use feasibility and other tolerances. These are needed because floating-point computations are not exact. You can tighten them a bit, but in general, it is better not to touch them.
So, you should expect solutions with the following properties:
variables may be slightly outside their bounds
constraints may be violated by a small amount
binary and integer variables may be slightly non-integer
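As an illustration, with GLPK's C API you can judge feasibility against an explicit tolerance rather than expecting the constraints to hold exactly; the helper below is only a sketch and assumes a problem object lp that has already been built and solved:

#include <algorithm>
#include <glpk.h>

// Largest amount by which any row (constraint) violates its bounds in the
// current solution; compare the result to a tolerance such as 1e-7.
double worst_row_violation(glp_prob* lp) {
    double worst = 0.0;
    for (int i = 1; i <= glp_get_num_rows(lp); ++i) {
        double val = glp_get_row_prim(lp, i);   // value of the constraint row
        double lb  = glp_get_row_lb(lp, i);     // -DBL_MAX if the row has no lower bound
        double ub  = glp_get_row_ub(lp, i);     // +DBL_MAX if the row has no upper bound
        worst = std::max({worst, lb - val, val - ub});
    }
    return worst;
}

With a check like worst_row_violation(lp) <= 1e-7, the ~1e-10 overshoot described in the question counts as feasible, which matches how the solver itself judges feasibility.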
I need to increase the computation speed of my MATLAB code. For this purpose I rewrote my program in C using the Intel IPP library for vector operations. And here I ran into a problem:
after some step of the main computation loop, the MATLAB program and my C program take different paths through the algorithm. This happens because the computations are not exactly equal and my program accumulates error relative to the MATLAB results. As a consequence, my program does not compute the gradient correctly and the whole optimization algorithm does not work well. So I gained computation speed but lost accuracy: at the 100th step MATLAB computes an optimization error of 0.004 while the C program computes 0.05, and this matters for my task.
I checked which functions introduce the error, and here is what I found: common operations (such as ippsAdd_64f_A53, ippsSub_64f_A53, ippsMul_64f_A53, ippsDiv_64f_A53 and the usual C operators +, -, *, /) match the MATLAB results and the summed error is zero, but the math.h hyperbolic functions give a summed error over an array of 75699 elements of about -3e-13 to -5e-13. The Intel functions ippsCosh_64f_A53 and others give a summed error of about -1e-14 to -5e-14.
Do you know a library that computes high-precision hyperbolic and exponential functions? Or maybe there are some compiler settings in Visual Studio 2012 that could help me?
All computations are done in the Ipp64f data type (double) in VS 2012 with Intel Parallel Studio XE 2013 installed.
P.S.: The summed error was computed in MATLAB. I saved the arrays from my C program to a level-4 MAT file, imported them into MATLAB, and summed the difference between the MATLAB array and the imported array, like sum(M_cosh - C_cosh);
Not an answer, more of an extended comment:
You write
I need to increase the computation speed of my MATLAB code
and ask
Do you know a library that computes high-precision hyperbolic and
exponential functions?
Yes, I know of several such libraries, but they implement floating-point numbers with more bits than current CPUs typically provide (mainly 32 and 64 bits) and implement the arithmetic on these numbers in software. For your purpose of increasing computation speed, such libraries are useless: their increased precision is bought explicitly at the cost of increased execution time. For many other users that is a reasonable trade-off.
I don't know of any widely used or well-regarded libraries that implement precision-preserving algorithms on machine numbers. There isn't space here to go into any detail, but for an introduction to the problem you could do worse than start reading about Kahan's summation algorithm.
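For reference, a minimal sketch of Kahan (compensated) summation in C++; it is only an illustration of the idea, not a drop-in replacement for the IPP pipeline above, and it must be built without -ffast-math, which would optimize the compensation away:

#include <cstddef>

// Compensated summation: c carries the low-order bits that a plain
// "sum += x[i]" would discard at each step.
double kahan_sum(const double* x, std::size_t n) {
    double sum = 0.0;
    double c = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double y = x[i] - c;     // apply the stored compensation to the next term
        double t = sum + y;      // the low-order bits of y are lost here
        c = (t - sum) - y;       // algebraically zero; numerically, the lost bits
        sum = t;
    }
    return sum;
}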
The Mathworks are somewhat coy about revealing what algorithms Matlab implements. However most of the computational kernels of Matlab are written in C (or C++, I believe) and compiled into libraries. Many of them are now multi-threaded too. If you are trying to write code to outperform Matlab you will have to write multi-threaded, high-performance numerical code.
It wouldn't surprise me at all to learn that the algorithms that Matlab implements do have precision-preserving capabilities. The Mathworks are, after all, trying to offer the market a tool which will solve a wide range of problems without the user having to consider low-level issues such as whether or not machine-precision is good enough for a particular combination of problem and dataset.
Finally, it doesn't surprise me that your first attempts were unsuccessful, though beating Matlab for speed is impressive. I look forward, sceptically, to being pleasantly surprised when you report success: code of your own that outperforms Matlab in time and produces satisfactory results.
I'm in the middle of a code translation from Matlab to C++, and for some important reasons I must obtain the cumulative distribution function of a normal distribution (in MATLAB, 'norm') with mean = 0 and variance = 1.
The implementation in Matlab is something like this:
map.c = cdf( 'norm', map.c, 0,1 );
which is supposed to be the equalization of the histogram from map.c.
The problem comes when translating it into C++, because of the limited precision I obtain. I tried many typical CDF implementations, such as the C++ code I found here:
Cumulative Normal Distribution Function in C/C++
but I still lost a significant number of decimals, so I tried the Boost implementation:
#include "boost/math/distributions.hpp"
boost::math::normal_distribution<> d(0,1);
but it is still not the same implementation as MATLAB's (if anything, it seems to be even more precise!).
Does anyone know where I could find the original MATLAB source for this computation, or how many decimals I should expect to match?
Thanks in advance!
The Gaussian CDF is an interesting function. I do not know whether my answer will interest you, but it is likely to interest others who look up your question later, so here it is.
One can compute the CDF by integrating the Taylor series of the PDF term by term. This approach works well in the body of the Gaussian bell curve, but it fails numerically in the tails, where special-function techniques are needed. The best source I have read for this is N. N. Lebedev's Special Functions and Their Applications, Ch. 2, Dover, 1972.
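As a concrete illustration of the special-function route, the standard normal CDF is commonly written in terms of the complementary error function, $\Phi(x) = \tfrac{1}{2}\,\mathrm{erfc}(-x/\sqrt{2})$, which keeps its accuracy in the lower tail; a minimal C++11 sketch:

#include <cmath>

// Standard normal CDF via erfc; accurate in the lower tail, unlike a
// term-by-term integration of the PDF's Taylor series.
double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}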
Octave is an open-source Matlab clone. Here's the source code for Octave's implementation of normcdf: http://octave-nan.sourcearchive.com/documentation/1.0.6/normcdf_8m-source.html
It should be (almost) the same as Matlab's, if it helps you.
C and C++ support long double for a more precise floating point type. You could try using that in your implementation. You can check your compiler documentation to see if it provides an even higher precision floating point type. GCC 4.3 and higher provide __float128 which has even more precision.
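For example, on reasonably recent GCC the same computation can be pushed to __float128 via libquadmath; a sketch (compile with g++ and link with -lquadmath):

#include <cstdio>
#include <quadmath.h>   // GCC's 128-bit float support

int main() {
    __float128 x = -8.0;                          // a point deep in the lower tail
    __float128 p = erfcq(-x / sqrtq(2.0)) * 0.5;  // Phi(x) = erfc(-x/sqrt(2)) / 2
    char buf[64];
    quadmath_snprintf(buf, sizeof buf, "%.30Qg", p);
    std::printf("Phi(-8) = %s\n", buf);
    return 0;
}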
Using the double type, I implemented a cubic spline interpolation algorithm.
The work appears to have been a success, but there is a relative error of around 6% when very small values are calculated.
Is the double data type enough for accurate scientific numerical analysis?
Double has plenty of precision for most applications. Of course it is finite, but it's always possible to squander any amount of precision by using a bad algorithm. In fact, that should be your first suspect. Look hard at your code and see if you're doing something that lets rounding errors accumulate quicker than necessary, or risky things like subtracting values that are very close to each other.
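A tiny illustration of that last point, catastrophic cancellation when subtracting nearly equal values (the constant is chosen only for demonstration):

#include <cmath>
#include <cstdio>

int main() {
    double x = 1e-8;
    double naive  = (1.0 - std::cos(x)) / (x * x);                    // cancels badly
    double stable = 2.0 * std::pow(std::sin(0.5 * x), 2) / (x * x);   // ~0.5, the true value
    std::printf("naive  = %.17g\nstable = %.17g\n", naive, stable);
    return 0;
}

The two expressions are algebraically identical, yet the first one loses essentially all of its significant digits for small x.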
Scientific numerical analysis is difficult to get right, which is why I leave it to the professionals. Have you considered using a numeric library instead of writing your own? Eigen is my current favorite here: http://eigen.tuxfamily.org/index.php?title=Main_Page
I always have close at hand the latest copy of Numerical Recipes (nr.com) which does have an excellent chapter on interpolation. NR has a restrictive license but the writers know what they are doing and provide a succinct writeup on each numerical technique. Other libraries to look at include: ATLAS and GNU Scientific Library.
To answer your question: double should be more than enough for most scientific applications. I agree with the previous posters that it sounds like an algorithm problem. Have you considered posting the code for the algorithm you are using?
Whether double is enough for your needs depends on the type of numbers you are working with. As Henning suggests, it is probably best to take a look at the algorithms you are using and make sure they are numerically stable.
For starters, here's a good algorithm for addition: Kahan summation algorithm.
Double precision will be suitable for most problems, but a cubic spline will not work well if the polynomial or function is rapidly oscillating or repeating, or is of quite high dimension.
In that case it can be better to use Legendre polynomials, since they handle exponential-like behaviour.
By way of a simple example: if you use the Euler, trapezoidal, or Simpson's rule on a 3rd-order polynomial, you won't need a huge sample rate to get the interpolant (the area under the curve). However, if you apply these to an exponential function, the sample rate may need to increase greatly to avoid losing a lot of precision. Legendre polynomials can cater for this case much more readily.
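As a sketch of that point, composite Simpson's rule integrates a cubic essentially exactly with a handful of panels, but a rapidly growing exponential needs far more panels for comparable accuracy (assumes C++11):

#include <functional>

// Composite Simpson's rule on [a, b] with n sub-intervals (n must be even).
double simpson(const std::function<double(double)>& f, double a, double b, int n) {
    const double h = (b - a) / n;
    double s = f(a) + f(b);
    for (int i = 1; i < n; ++i)
        s += (i % 2 ? 4.0 : 2.0) * f(a + i * h);
    return s * h / 3.0;
}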
I'm convinced that software testing is indeed very important, especially in science. However, over the last 6 years, I have never come across any scientific software project that was under regular testing (and most of them were not even version-controlled).
Now I'm wondering how you deal with software tests for scientific codes (numerical computations).
From my point of view, standard unit tests often miss the point, since there is no exact result, so using assert(a == b) might prove a bit difficult due to "normal" numerical errors.
So I'm looking forward to reading your thoughts about this.
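(For concreteness, what assert(a == b) usually becomes in numerical tests is a tolerance-based comparison; a minimal sketch with illustrative names and tolerances:)

#include <algorithm>
#include <cmath>

// True when a and b agree to a relative tolerance, with an absolute
// tolerance as a fallback for values near zero.
bool approx_equal(double a, double b, double rel_tol = 1e-12, double abs_tol = 1e-15) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::max(abs_tol, rel_tol * scale);
}

// In a test: assert(approx_equal(computed_value, reference_value));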
I am also in academia, and I have written quantum-mechanical simulation programs to be executed on our cluster. I made the same observation regarding testing, or even version control. My case was even worse: I am using a C++ library for my simulations, and the code I got from others was pure spaghetti code with no inheritance and not even functions.
I rewrote it and I also implemented some unit testing. You are correct that you have to deal with the numerical precision, which can be different depending on the architecture you are running on. Nevertheless, unit testing is possible, as long as you are taking these numerical rounding errors into account. Your result should not depend on the rounding of the numerical values, otherwise you would have a different problem with the robustness of your algorithm.
So, to conclude, I use unit testing for my scientific programs, and it really makes one more confident about the results, especially with regards to publishing the data in the end.
I've just been looking at a similar issue (Google "testing scientific software") and came up with a few papers that may be of interest. These cover both mundane coding errors and the bigger issue of knowing whether the result is even right (depth of the Earth's mantle?).
http://http.icsi.berkeley.edu/ftp/pub/speech/papers/wikipapers/cox_harris_testing_numerical_software.pdf
http://www.cs.ua.edu/~SECSE09/Presentations/09_Hook.pdf (broken link; new link is http://www.se4science.org/workshops/secse09/Presentations/09_Hook.pdf)
http://www.associationforsoftwaretesting.org/?dl_name=DianeKellyRebeccaSanders_TheChallengeOfTestingScientificSoftware_paper.pdf
I thought the idea of mutation testing described in 09_Hook.pdf (see also matmute.sourceforge.net) is particularly interesting as it mimics the simple mistakes we all make. The hardest part is to learn to use statistical analysis for confidence levels, rather than single pass code reviews (man or machine).
The problem is not new. I'm sure I have an original copy of "How accurate is scientific software?" by Hatton et al., Oct 1994, which even then showed how different implementations of the same theories (as algorithms) diverged rather rapidly. (It's also ref. 8 in the Kelly & Sanders paper.)
Update (Oct 2019): more recently, see "Testing Scientific Software: A Systematic Literature Review".
I'm also using cpptest for its TEST_ASSERT_DELTA. I'm writing high-performance numerical programs in computational electromagnetics and I've been happily using it in my C++ programs.
I typically go about testing scientific code the same way as I do with any other kind of code, with only a few retouches, namely:
I always test my numerical codes for cases that make no physical sense and make sure the computation actually stops before producing a result. I learned this the hard way: I had a function that computed some frequency responses and then supplied a matrix built from them as an argument to another function, which eventually gave its answer as a single vector. The matrix could have been any size depending on how many terminals the signal was applied to, but my function was not checking whether the matrix size was consistent with the number of terminals (2 terminals should have meant a 2 x 2 x n matrix); the code itself was written so as not to depend on that, since it just had to do some basic matrix operations regardless of size. Eventually, the results were perfectly plausible, well within the expected range, and in fact partially correct -- only half of the solution vector was garbled. It took me a while to figure out why. If your data looks correct, is assembled in a valid data structure, and has sensible numerical values (e.g. no NaNs or negative numbers of particles) but does not make physical sense, the function has to fail gracefully.
I always test the I/O routines even if they are just reading a bunch of comma-separated numbers from a test file. When you're writing code that does twisted math, it's always tempting to jump into debugging the part of the code that is so math-heavy that you need a caffeine jolt just to understand the symbols. Days later, you realize you are also adding the ASCII value of \n to your list of points.
When testing for a mathematical relation, I always test it "by the book", and I also learned this by example. I've seen code that was supposed to compare two vectors but only checked for equality of elements and did not check for equality of length.
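A sketch of what that "by the book" vector comparison looks like (names and tolerance are illustrative):

#include <cmath>
#include <cstddef>
#include <vector>

// Equal length is part of the contract; then every element must agree
// within a tolerance.
bool vectors_match(const std::vector<double>& a, const std::vector<double>& b,
                   double tol = 1e-12) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol)
            return false;
    return true;
}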
Please take a look at the answers to the SO question How to use TDD correctly to implement a numerical method?