C++ numerical analysis: accurate data type?

Using the double type, I implemented a cubic spline interpolation algorithm.
It appears to work, but there is a relative error of around 6% when very small values are computed.
Is the double data type accurate enough for scientific numerical analysis?

Double has plenty of precision for most applications. It is finite, of course, but it is always possible to squander any amount of precision by using a bad algorithm. In fact, that should be your first suspect: look hard at your code and see whether you are doing something that lets rounding errors accumulate more quickly than necessary, or something risky like subtracting values that are very close to each other.
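For example, here is a minimal, self-contained illustration (my own, not taken from the question's code) of that last point: the textbook quadratic formula subtracts two nearly equal numbers and loses most of its digits, while an algebraically equivalent rearrangement keeps them.

#include <cmath>
#include <cstdio>

int main() {
    // With b*b >> 4*a*c, -b + sqrt(b*b - 4*a*c) subtracts nearly equal values
    // and suffers catastrophic cancellation.
    double a = 1.0, b = 1.0e8, c = 1.0;
    double naive  = (-b + std::sqrt(b * b - 4 * a * c)) / (2 * a);
    // Multiplying by the conjugate avoids the subtraction entirely.
    double stable = (2 * c) / (-b - std::sqrt(b * b - 4 * a * c));
    std::printf("naive:  %.17g\n", naive);   // visibly wrong in the leading digits
    std::printf("stable: %.17g\n", stable);  // ~ -1e-8, accurate to full precision
    return 0;
}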

Scientific numerical analysis is difficult to get right, which is why I leave it to the professionals. Have you considered using a numeric library instead of writing your own? Eigen is my current favorite here: http://eigen.tuxfamily.org/index.php?title=Main_Page
I always keep the latest copy of Numerical Recipes (nr.com) close at hand; it has an excellent chapter on interpolation. NR has a restrictive license, but the authors know what they are doing and provide a succinct write-up of each numerical technique. Other libraries to look at include ATLAS and the GNU Scientific Library.
To answer your question: double should be more than enough for most scientific applications, and I agree with the previous posters that this sounds like an algorithm problem. Have you considered posting the code for the algorithm you are using?

Whether double is enough for your needs depends on the type of numbers you are working with. As Henning suggests, it is probably best to take a look at the algorithms you are using and make sure they are numerically stable.
For starters, here's a good algorithm for addition: the Kahan summation algorithm.
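A minimal implementation, for reference (the function name is mine):

#include <vector>

// Kahan (compensated) summation: a running correction term recovers the
// low-order bits that a plain sum would drop at each addition.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    double c = 0.0;                  // compensation for lost low-order bits
    for (double x : xs) {
        double y = x - c;            // apply the correction to the incoming term
        double t = sum + y;          // low-order bits of y are lost here...
        c = (t - sum) - y;           // ...and recovered algebraically here
        sum = t;
    }
    return sum;
}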

Double precision is suitable for most problems, but a cubic spline will not work well if the function is rapidly oscillating, highly repetitive, or of quite high dimension.
In that case it can be better to use Legendre polynomials, since they handle variants of exponentials.
By way of a simple example: if you use Euler's, the trapezoidal, or Simpson's rule to integrate a 3rd-order polynomial, you won't need a huge sample rate to get the area under the curve. However, if you apply the same rules to an exponential function, the sample rate may need to increase greatly to avoid losing a lot of precision. Legendre polynomials can cater for this case much more readily.
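To make the sample-rate point concrete, here is a small sketch (my own, not from the answer above) comparing composite Simpson's rule on a cubic and on a fast-growing exponential:

#include <cmath>
#include <cstdio>
#include <functional>

// Composite Simpson's rule over [a, b] with n subintervals (n must be even).
double simpson(const std::function<double(double)>& f, double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int i = 1; i < n; ++i)
        sum += f(a + i * h) * ((i % 2) ? 4.0 : 2.0);
    return sum * h / 3.0;
}

int main() {
    // Simpson's rule is exact for a cubic even with just two subintervals.
    std::printf("x^3 on [0,1], n=2:      %.17g (exact 0.25)\n",
                simpson([](double x) { return x * x * x; }, 0.0, 1.0, 2));
    // The rapidly growing exponential needs many more samples for comparable accuracy.
    const double exact = (std::exp(10.0) - 1.0) / 10.0;
    std::printf("e^(10x) on [0,1], n=2:  %.6g (exact %.6g)\n",
                simpson([](double x) { return std::exp(10.0 * x); }, 0.0, 1.0, 2), exact);
    std::printf("e^(10x) on [0,1], n=64: %.6g\n",
                simpson([](double x) { return std::exp(10.0 * x); }, 0.0, 1.0, 64));
    return 0;
}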

Related

What solution quality can we expect from a linear programming solver?

I am trying to solve linear constraint satisfaction problems. So I grabbed the "GNU Linear Programming Kit," wrote my constraints, and let it loose on it with some simple objective function.
GLPK claimed to find a solution, but when I check it against the constraints, they are not satisfied: an expression that should be <= 0 is actually around 1e-10, i.e. slightly greater than 0.
I can probably live with the issue by setting up my constraints to return the Chebyshev centre of the polyhedron, but I wonder whether such discrepancies are to be expected with linear programming solvers, or whether I should report it as a bug to the GLPK folks.
All LP solvers use feasibility and other tolerances. These are needed because floating-point computations are not exact. You can tighten them a bit, but in general, it is better not to touch them.
So, you should expect solutions with the following properties:
variables are slightly outside their bounds
constraints may be violated by a small amount
binary and integer variables are slightly non-integer
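A hedged illustration of the practical consequence: when verifying a solution yourself, test the constraints against a tolerance rather than exactly (the 1e-6 below is an arbitrary illustrative value, not GLPK's default):

// Treat an "expr <= bound" constraint as satisfied when the violation is
// within a small feasibility tolerance, mirroring what the solver itself does.
bool satisfied_leq(double expr, double bound, double feas_tol = 1e-6) {
    return expr <= bound + feas_tol;
}

// Example: an "expr <= 0" constraint violated by 1e-10 still counts as satisfied:
//   satisfied_leq(1e-10, 0.0)  -> true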

C++ armadillo not correctly solving poorly conditioned matrix

I have a relatively simple question regarding the linear solver built into Armadillo. I am a relative newcomer to C++ but have experience coding in other languages. I am solving a fluid flow problem by successive linearization, using the Armadillo function solve(A, b) to get the solution at each iteration.
The issue that I am running into is that my matrix is very ill-conditioned. The determinant is on the order of 10^-20 and the condition number is 75000. I know these are terrible conditions but it's what I've got. Does anyone know if it is possible to specify the precision in my A matrix and in the solve function to something beyond double (long double perhaps)? I know that there are double matrix classes in Armadillo but I haven't found any documentation for higher levels of precision.
To approach this from another angle, I wrote some code in Mathematica and the LinearSolve worked very well and the program converged to the correct answer. My reasoning is that Mathematica variables have higher precision which can handle the higher levels of rounding error.
If anyone has any insight on this, please let me know. I know there are other ways to approach a poorly conditioned matrix (like preconditioning and pivoting), but my work is more in the physics than in the actual numerical solution so I'm trying to steer clear of that.
EDIT: I just limited the precision in the Mathematica version to 15 decimal places and the program still converges. This leads me to believe it is NOT a variable precision question but rather an issue with the method.
As you said, your work is more in the physics: rather than trying to increase the accuracy, I would use the Moore-Penrose pseudo-inverse, which Armadillo provides through the function pinv. You should then experiment a bit with the tolerance parameter to set it to a reasonable level.
The geometrical interpretation is as follows: a bad condition number means the row/column vectors are nearly linearly dependent. In physics, such linear dependencies usually have an origin that at least deserves interpretation. The pseudo-inverse first projects the matrix onto a lower-dimensional space in which the vectors are "less linearly dependent", by dropping all singular vectors whose singular values are smaller than the tolerance parameter. The resulting matrix has a better condition number, so the standard inverse can be constructed with fewer problems.
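A minimal sketch of that suggestion with Armadillo (the tolerance value below is only a starting point to experiment with, not a recommendation):

#include <armadillo>

// Solve A*x = b for an ill-conditioned A by dropping singular values below tol.
arma::vec solve_via_pinv(const arma::mat& A, const arma::vec& b, double tol = 1e-10) {
    arma::mat A_pinv = arma::pinv(A, tol);   // Moore-Penrose pseudo-inverse via SVD
    return A_pinv * b;
}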

Matlab - Cumulative distribution function (CDF)

I'm in the middle of translating code from Matlab to C++, and for some important reasons I must obtain the cumulative distribution function of a normal distribution (in Matlab, 'norm') with mean = 0 and variance = 1.
The implementation in Matlab is something like this:
map.c = cdf( 'norm', map.c, 0,1 );
This is supposed to perform the histogram equalization of map.c.
The problem comes when translating it into C++, because of the precision I lose. I tried several typical CDF implementations, such as the C++ code I found here:
Cumulative Normal Distribution Function in C/C++, but I lost a significant number of decimals, so I tried the Boost implementation:
#include "boost/math/distributions.hpp"
boost::math::normal_distribution<> d(0,1);
but still it is not the same implementation as Matlab's (I guess it seems to be even more precise!)
Does anyone know where I could find the original Matlab source for this process, or how many decimal places I should expect to match?
Thanks in advance!
The Gaussian CDF is an interesting function. I do not know whether my answer will interest you, but it is likely to interest others who look up your question later, so here it is.
One can compute the CDF by integrating the Taylor series of the PDF term by term. This approach works well in the body of the Gaussian bell curve, but it fails numerically in the tails, where special-function techniques are needed. The best source I have read on this is N. N. Lebedev's Special Functions and Their Applications, Ch. 2, Dover, 1972.
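As a concrete point of comparison (this is not Matlab's own source), the standard normal CDF can be evaluated through the complementary error function, which C++11's std::erfc handles accurately even in the left tail:

#include <cmath>

// Standard normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2)).
// Using erfc directly, rather than 1 - erf, avoids cancellation for large negative x.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}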
Octave is an open-source Matlab clone. Here's the source code for Octave's implementation of normcdf: http://octave-nan.sourcearchive.com/documentation/1.0.6/normcdf_8m-source.html
It should be (almost) the same as Matlab's, if it helps you.
C and C++ support long double as a more precise floating-point type; you could try using that in your implementation. Check your compiler documentation to see whether it provides an even higher-precision floating-point type: GCC 4.3 and later provide __float128, which has even more precision.

C++ library for integer trigonometry, speed optimized with optional approximations?

I've reached the point in a project where it makes more sense to start building some support classes for vectors and miscellaneous trigonometry than to keep using ad-hoc functions. I expect there are many C++ libraries for this, but I don't want to sacrifice the speed and features I am used to.
Specifically, I want to be able to use integer angles, and I want to keep the blazing speed afforded by approximations like this:
static inline int32_t sin_approx(int32_t angle)
// Angle is -32768 to 32767: Return -32768 to 32767
{
    return (angle << 1) - ((angle * abs(angle)) >> 14);
}
So, before I needlessly roll my own: are there any really fast fixed-point libraries for C++, with template classes such as vectors where I can specify the width of the integer used, and with fast approximations like the one above, that I should look at?
I went down this path a few years ago when I had to convert some audio fingerprinting code from floating point to fixed point. The hard parts were the DCT (which used a large cosine table) and a high-precision logarithm. I found surprisingly little in the way of existing libraries. Since then, I have heard that the original Sony PlayStation (PS1) had no floating-point support, so development forums for it, if they still exist, might have what you are looking for.
Some people I have worked with have had good luck with the NewMat library, though it is geared toward linear algebra rather than trigonometry, and seems to focus on floating-point numbers. Still, its site leads to this list, which looks to be worth checking out. I also found spuc, a signal processing library that might be good for fixed-point support. And years ago I saw a signal processing template library (sptl) from Fraunhofer. I think it was proprietary, but may be available somehow.
All that being said, I think you are pretty close with what you have already. Since you have a sine function, you basically also have a cosine function, provided you transform the input appropriately (cos(x) == sin(x + pi/2)). Since the tangent is the quotient of the sine and cosine (tan(x) = sin(x) / cos(x)) you are basically there for the trigonometry.
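For instance, a cosine in the same fixed-point convention can be obtained by shifting the phase a quarter turn (16384 in this scale) and letting the angle wrap; this is only a sketch built on the sin_approx from the question:

#include <cstdint>
#include <cstdlib>

// The question's approximation, repeated here so the sketch is self-contained.
static inline int32_t sin_approx(int32_t angle)
// Angle is -32768 to 32767
{
    return (angle << 1) - ((angle * std::abs(angle)) >> 14);
}

// cos(x) = sin(x + pi/2): add a quarter turn (16384) and wrap back into
// [-32768, 32767]. The int16_t cast performs the modular wrap (well-defined
// since C++20; in practice a two's-complement wrap on earlier compilers too).
static inline int32_t cos_approx(int32_t angle)
{
    return sin_approx(static_cast<int16_t>(angle + 16384));
}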
Regarding vectors, don't the STL vector and valarray classes combined with STL algorithms get you pretty close, too? If not, there's always Boost's math libraries.
Sorry I can't point you to the silver bullet you are looking for, but what you are trying to do is rather uncommon these days. People who want precision usually go straight to floating point, which has decent performance on modern processors and lots of library support. Those who want speed on resource-constrained hardware usually don't need precision, aren't doing trigonometry in bulk, and probably aren't using C++ either. I think your best option is to roll your own. Try to think of it as applying the wheel design pattern in a new context, rather than reinventing it. :)

Least Squares Regression in C/C++

How would one go about implementing least squares regression for factor analysis in C/C++?
The gold standard for this is LAPACK. You want, in particular, xGELS.
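A hedged sketch of what a dgels call looks like through the LAPACKE C interface (assuming a row-major, overdetermined system with m >= n; check your local lapacke.h for the exact prototype):

#include <lapacke.h>
#include <vector>

// Least-squares solve of A*x ~= b using xGELS (QR factorization under the hood).
// A is m x n in row-major order, b has length m; LAPACK overwrites both,
// and on success the first n entries of b hold the solution x.
std::vector<double> lstsq(std::vector<double> A, std::vector<double> b,
                          lapack_int m, lapack_int n) {
    lapack_int info = LAPACKE_dgels(LAPACK_ROW_MAJOR, 'N', m, n, /*nrhs=*/1,
                                    A.data(), /*lda=*/n, b.data(), /*ldb=*/1);
    if (info != 0)
        return {};                   // info > 0 means A is rank deficient
    return std::vector<double>(b.begin(), b.begin() + n);
}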
When I've had to deal with large datasets and large parameter sets for non-linear parameter fitting I used a combination of RANSAC and Levenberg-Marquardt. I'm talking thousands of parameters with tens of thousands of data-points.
RANSAC is a robust algorithm for minimizing noise due to outliers by using a reduced data set. It is not strictly least squares, but it can be applied to many fitting methods.
Levenberg-Marquardt is an efficient way to solve non-linear least-squares numerically.
The convergence rate in most cases is between that of steepest-descent and Newton's method, without requiring the calculation of second derivatives. I've found it to be faster than Conjugate gradient in the cases I've examined.
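(For reference, in the usual Marquardt formulation each step solves (JᵀJ + λ·diag(JᵀJ))·δ = Jᵀr for the update δ, where J is the Jacobian of the residual vector r; λ is increased when a step fails and decreased when it succeeds, which is exactly what blends steepest descent with the Gauss-Newton step.)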
The way I did this was to set up RANSAC as an outer loop around the LM method. This is very robust, but slow. If you don't need the additional robustness you can just use LM.
Get ROOT and use TGraph::Fit() (or TGraphErrors::Fit())?
It is a big, heavy piece of software to install just for the fitter, though. It works for me because I already have it installed.
Or use GSL.
If you want to implement an optimization algorithm yourself, Levenberg-Marquardt seems to be quite difficult to implement. If really fast convergence is not needed, take a look at the Nelder-Mead simplex optimization algorithm; it can be implemented from scratch in a few hours.
http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
Have a look at
http://www.alglib.net/optimization/
They have C++ implementations for L-BFGS and Levenberg-Marquardt.
You only need to work out the first derivative of your objective function to use these two algorithms.
I've used TNT/JAMA for linear least-squares estimation. It's not very sophisticated but is fairly quick + easy.
Let's talk first about factor analysis, since most of the discussion above is about regression. Most of my experience is with software like SAS, Minitab, or SPSS that solves the factor analysis equations, so I have limited experience solving them directly. That said, the most common implementations do not use linear regression to solve the equations. According to this, the most common methods used are principal component analysis and principal factor analysis. In a text on applied multivariate analysis (Dallas Johnson), no fewer than seven methods are documented, each with its own pros and cons. I would strongly recommend finding an implementation that gives you factor scores rather than programming a solution from scratch.
The reason there are different methods is that you can choose exactly what you are trying to minimize. There is a pretty comprehensive discussion of the breadth of methods here.
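If you do end up experimenting anyway, here is a minimal sketch (my own, using the Eigen library recommended earlier in this thread) of the principal-component step that several of those factor-analysis methods start from: eigendecompose the correlation matrix and keep the leading components, scaled into loadings.

#include <Eigen/Dense>
#include <algorithm>
#include <cmath>

// Given a symmetric correlation (or covariance) matrix, return the loadings of
// the k leading principal components, scaled by the square roots of their
// eigenvalues as is conventional in factor analysis.
Eigen::MatrixXd principal_factors(const Eigen::MatrixXd& corr, int k) {
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(corr);   // eigenvalues in ascending order
    const int p = static_cast<int>(corr.rows());
    Eigen::MatrixXd loadings(p, k);
    for (int j = 0; j < k; ++j) {
        const int idx = p - 1 - j;                             // take the largest eigenvalues first
        const double lambda = es.eigenvalues()(idx);
        loadings.col(j) = es.eigenvectors().col(idx) * std::sqrt(std::max(lambda, 0.0));
    }
    return loadings;
}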