C++: Solve underdetermined linear system with lapack - c++

Let's put it simple: I have an under-determined linear system of equations
Ax = b
and I want to get one valid solution, no matter which one of the infinite solutions for the system. And I want to get it as efficiently as possible.
I have checked general LAPACK routines and it seems that they cannot handle the under-determined case. For example, dgesv(), whose documentation is found here, will return and integer larger than 1 in INFO if the factor U, from PLU factorization, is singular, and it will not solve the system if that is the case.
I have also checked some routines for Linear Least Squares problems (LLS) (documentation here), which does exactly solve my problem, just not as efficiently as I wished. If the LLS problems I provide is under-determined, the LLS routine will return the vector that minimizes
||Ax-b||
Which is a valid solution. However, it is calculated as the solution to an optimization problem, and I was wondering if there is a more efficient way of obtaining a valid solution for my under-determined problem.
A similar question was made here, but I believe that my question is more concrete than that: I am using LAPACK, and I want to solve an under-determined system of linear equations as efficiently as possible.

For an under-determined system of equations:
The correct approach is to use singular value decomposition (SVD). Lapack offers singular value decomposition in the form of dgesvd.
To perform the SVD you will have to homogenize your problem to turn it into a matrix problem of the form: My = 0. This is easy to do by introducing another degree of freedom (another variable). This will transform the vector x -> y and matrix A -> M. When performing the SVD on the matrix M, the smallest singular vector will be the solution to your under-determined least squares problem.
I would recommend using matlab or octave to experiment before wasting time coding anything up.

Related

batch CUDA solution of sparse banded Ax=b for various b's

I have a sparse banded matrix A and I'd like to (direct) solve Ax=b. I have about 500 vectors b, so I'd like to solve for the corresponding 500 x's.
I'm brand new to CUDA, so I'm a little confused as to what options I have available.
cuSOLVER has a batch direct solver cuSolverSP for sparse A_i x_i = b_i using QR here. (I'd be fine with LU too since A is decently conditioned.) However, as far as I can tell, I can't exploit the fact that all my A_i's are the same.
Would an alternative option be to first determine a sparse LU (QR) factorization on the CPU or GPU then perform in parallel the backsubstitution (respectively, backsub and matrix mult) on the GPU? If cusolverSp< t >csrlsvlu() is for one b_i, is there a standard way to batch perform this operation for multiple b_i's?
Finally, since I don't have intuition for this, should I expect a speedup on a GPU for either of these options, given the necessary overhead? x has length ~10000-100000. Thanks.
I'm currently working on something similar myself. I decided to basically wrap the conjugate gradient and level-0 incomplete cholesky preconditioned conjugate gradient solvers utility samples that came with the CUDA SDK into a small class.
You can find them in your CUDA_HOME directory under the path:
samples/7_CUDALibraries/conjugateGradient and /Developer/NVIDIA/CUDA-samples/7_CUDALibraries/conjugateGradientPrecond
Basically, you would load the matrix into the device memory once (and for ICCG, compute the corresponding conditioner / matrix analysis) then call the solve kernel with different b vectors.
I don't know what you anticipate your matrix band structure to look like, but if it is symmetric and either diagonally dominant (off diagonal bands along each row and column are opposite sign of diagonal and their sum is less than the diagonal entry) or positive definite (no eigenvectors with an eigenvalue of 0.) then CG and ICCG should be useful. Alternately, the various multigrid algorithms are another option if you are willing to go through coding them up.
If your matrix is only positive semi-definite (e.g. has at least one eigenvector with an eigenvalue of zero) you can still ofteb get away with using CG or ICCG as long as you ensure that:
1) The right hand side (b vectors) are orthogonal to the null space (null space meaning eigenvectors with an eigenvalue of zero).
2) The solution you obtain is orthogonal to the null space.
It is interesting to note that if you do have a non-trivial null space, then different numeric solvers can give you different answers for the same exact system. The solutions will end up differing by a linear combination of the null space... That problem has caused me many many man hours of debugging and frustration before I finally caught on, so its good to be aware of it.
Lastly, if your matrix has a Circulant Band structure you might consider using a fast fourier transform (FFT) based solver. FFT based numerical solvers can often yield superior performance in cases where they are applicable.
is there a standard way to batch perform this operation for multiple b_i's?
One option is to use the batched refactorization module in CUDA's cuSOLVER, but I am not sure if it is standard.
Batched refactorization module in cuSOLVER provides an efficient method to solve batches of linear systems with fixed left-hand side sparse matrix (or matrices with fixed sparsity pattern but varying coefficients) and varying right-hand sides, based on LU decomposition. Only some partially completed code snippets can be found in the official documentation (as of CUDA 10.1) that are related to it. A complete example can be found here.
If you don't mind going with an open-source library, you could also check out CUSP:
CUSP Quick Start Page
It has a fairly decent suite of solvers, including a few preconditioned methods:
CUSP Preconditioner Examples
The smoothed aggregation preconditioner (a variant of algebraic multigrid) seems to work very well as long as your GPU has enough onboard memory for it.

C++ armadillo not correctly solving poorly conditioned matrix

I have a relatively simple question regarding the linear solver built into Armadillo. I am a relative newcomer to C++ but have experience coding in other languages. I am solving a fluid flow problem by successive linearization, using the armadillo function Solve(A,b) to get the solution at each iteration.
The issue that I am running into is that my matrix is very ill-conditioned. The determinant is on the order of 10^-20 and the condition number is 75000. I know these are terrible conditions but it's what I've got. Does anyone know if it is possible to specify the precision in my A matrix and in the solve function to something beyond double (long double perhaps)? I know that there are double matrix classes in Armadillo but I haven't found any documentation for higher levels of precision.
To approach this from another angle, I wrote some code in Mathematica and the LinearSolve worked very well and the program converged to the correct answer. My reasoning is that Mathematica variables have higher precision which can handle the higher levels of rounding error.
If anyone has any insight on this, please let me know. I know there are other ways to approach a poorly conditioned matrix (like preconditioning and pivoting), but my work is more in the physics than in the actual numerical solution so I'm trying to steer clear of that.
EDIT: I just limited the precision in the Mathematica version to 15 decimal places and the program still converges. This leads me to believe it is NOT a variable precision question but rather an issue with the method.
As you said "your work is more in the physics": rather than trying to increase the accuracy, I would use the Moore-Penrose Pseudo-Inverse, which in Armadillo can be obtained by the function pinv. You should then experience a bit with the parameter tolerance to set it to a reasonable level.
The geometrical interpretation is as follows: bad condition numbers are due to the fact that the row/column-vectors are linearly dependent. In physics, such linearly dependencies usually have an origin which at least needs to be interpreted. The pseudoinverse first projects the matrix onto a lower dimensional space in which the vectors are "less linearly dependent" by dropping all singular vectors with singular values smaller than the parameter tolerance. The reulting matrix has a better condition number such that the standard inverse can be constructed with less problems.

What is a fast simple solver for a large Laplacian matrix?

I need to solve some large (N~1e6) Laplacian matrices that arise in the study of resistor networks. The rest of the network analysis is being handled with boost graph and I would like to stay in C++ if possible. I know there are lots and lots of C++ matrix libraries but no one seems to be a clear leader in speed or usability. Also, the many questions on the subject, here and elsewhere seem to rapidly devolve into laundry lists which are of limited utility. In an attempt to help myself and others, I will try to keep the question concise and answerable:
What is the best library that can effectively handle the following requirements?
Matrix type: Symmetric Diagonal Dominant/Laplacian
Size: Very large (N~1e6), no dynamic resizing needed
Sparsity: Extreme (maximum 5 nonzero terms per row/column)
Operations needed: Solve for x in A*x=b and mat/vec multiply
Language: C++ (C ok)
Priority: Speed and simplicity to code. I would really rather avoid having to learn a whole new framework for this one problem or have to manually write too much helper code.
Extra love to answers with a minimal working example...
If you want to write your own solver, in terms of simplicity, it's hard to beat Gauss-Seidel iteration. The update step is one line, and it can be parallelized easily. Successive over-relaxation (SOR) is only slightly more complicated and converges much faster.
Conjugate gradient is also straightforward to code, and should converge much faster than the other iterative methods. The important thing to note is that you don't need to form the full matrix A, just compute matrix-vector products A*b. Once that's working, you can improve the convergance rate again by adding a preconditioner like SSOR (Symmetric SOR).
Probably the fastest solution method that's reasonable to write yourself is a Fourier-based solver. It essentially involves taking an FFT of the right-hand side, multiplying each value by a function of its coordinate, and taking the inverse FFT. You can use an FFT library like FFTW, or roll your own.
A good reference for all of these is A First Course in the Numerical Analysis of Differential Equations by Arieh Iserles.
Eigen is quite nice to use and one of the fastest libraries I know:
http://eigen.tuxfamily.org/dox/group__TutorialSparse.html
There is a lot of related post, you could have look.
I would recommend C++ and Boost::ublas as used in UMFPACK and BOOST's uBLAS Sparse Matrix

Solving normal equation system in C++

I would like to solve the system of linear equations:
Ax = b
A is a n x m matrix (not square), b and x are both n x 1 vectors. Where A and b are known, n is from the order of 50-100 and m is about 2 (in other words, A could be maximum [100x2]).
I know the solution of x: $x = \inv(A^T A) A^T b$
I found few ways to solve it: uBLAS (Boost), Lapack, Eigen and etc. but i dont know how fast are the CPU computation time of 'x' using those packages. I also don't know if this numerically a fast why to solve 'x'
What is for my important is that the CPU computation time would be short as possible and good documentation since i am newbie.
After solving the normal equation Ax = b i would like to improve my approximation using regressive and maybe later applying Kalman Filter.
My question is which C++ library is the robuster and faster for the needs i describe above?
This is a least squares solution, because you have more unknowns than equations. If m is indeed equal to 2, that tells me that a simple linear least squares will be sufficient for you. The formulas can be written out in closed form. You don't need a library.
If m is in single digits, I'd still say that you can easily solve this using A(transpose)*A*X = A(transpose)*b. A simple LU decomposition to solve for the coefficients would be sufficient. It should be a much more straightforward problem than you're making it out to be.
uBlas is not optimized unless you use it with optimized BLAS bindings.
The following are optimized for multi-threading and SIMD:
Intel MKL. FORTRAN library with C interface. Not free but very good.
Eigen. True C++ library. Free and open source. Easy to use and good.
Atlas. FORTRAN and C. Free and open source. Not Windows friendly, but otherwise good.
Btw, I don't know exactly what are you doing, but as a rule normal equations are not a proper way to do linear regression. Unless your matrix is well conditioned, QR or SVD should be preferred.
If liscencing is not a problem, you might try the gnu scientific library
http://www.gnu.org/software/gsl/
It comes with a blas library that you can swap for an optimised library if you need to later (for example the intel, ATLAS, or ACML (AMD chip) library.
If you have access to MATLAB, I would recommend using its C libraries.
If you really need to specialize, you can approximate matrix inversion (to arbitrary precision) using the Skilling method. It uses order (N^2) operations only (rather than the order N^3 of usual matrix inversion - LU decomposition etc).
Its described in the thesis of Gibbs linked to here (around page 27):
http://www.inference.phy.cam.ac.uk/mng10/GP/thesis.ps.gz

Large matrix inversion methods

Hi I've been doing some research about matrix inversion (linear algebra) and I wanted to use C++ template programming for the algorithm , what i found out is that there are number of methods like: Gauss-Jordan Elimination or LU Decomposition and I found the function LU_factorize (c++ boost library)
I want to know if there are other methods , which one is better (advantages/disadvantages) , from a perspective of programmers or mathematicians ?
If there are no other faster methods is there already a (matrix) inversion function in the boost library ? , because i've searched alot and didn't find any.
As you mention, the standard approach is to perform a LU factorization and then solve for the identity. This can be implemented using the LAPACK library, for example, with dgetrf (factor) and dgetri (compute inverse). Most other linear algebra libraries have roughly equivalent functions.
There are some slower methods that degrade more gracefully when the matrix is singular or nearly singular, and are used for that reason. For example, the Moore-Penrose pseudoinverse is equal to the inverse if the matrix is invertible, and often useful even if the matrix is not invertible; it can be calculated using a Singular Value Decomposition.
I'd suggest you to take a look at Eigen source code.
Please Google or Wikipedia for the buzzwords below.
First, make sure you really want the inverse. Solving a system does not require inverting a matrix. Matrix inversion can be performed by solving n systems, with unit basis vectors as right hand sides. So I'll focus on solving systems, because it is usually what you want.
It depends on what "large" means. Methods based on decomposition must generally store the entire matrix. Once you have decomposed the matrix, you can solve for multiple right hand sides at once (and thus invert the matrix easily). I won't discuss here factorization methods, as you're likely to know them already.
Please note that when a matrix is large, its condition number is very likely to be close to zero, which means that the matrix is "numerically non-invertible". Remedy: Preconditionning. Check wikipedia for this. The article is well written.
If the matrix is large, you don't want to store it. If it has a lot of zeros, it is a sparse matrix. Either it has structure (eg. band diagonal, block matrix, ...), and you have specialized methods for solving systems involving such matrices, or it has not.
When you're faced with a sparse matrix with no obvious structure, or with a matrix you don't want to store, you must use iterative methods. They only involve matrix-vector multiplications, which don't require a particular form of storage: you can compute the coefficients when you need them, or store non-zero coefficients the way you want, etc.
The methods are:
For symmetric definite positive matrices: conjugate gradient method. In short, solving Ax = b amounts to minimize 1/2 x^T A x - x^T b.
Biconjugate gradient method for general matrices. Unstable though.
Minimum residual methods, or best, GMRES. Please check the wikipedia articles for details. You may want to experiment with the number of iterations before restarting the algorithm.
And finally, you can perform some sort of factorization with sparse matrices, with specially designed algorithms to minimize the number of non-zero elements to store.
depending on the how large the matrix actually is, you probably need to keep only a small subset of the columns in memory at any given time. This might require overriding the low-level write and read operations to the matrix elements, which i'm not sure if Eigen, an otherwise pretty decent library, will allow you to.
For These very narrow cases where the matrix is really big, There is StlXXL library designed for memory access to arrays that are mostly stored in disk
EDIT To be more precise, if you have a matrix that does not fix in the available RAM, the preferred approach is to do blockwise inversion. The matrix is split recursively until each matrix does fit in RAM (this is a tuning parameter of the algorithm of course). The tricky part here is to avoid starving the CPU of matrices to invert while they are pulled in and out of disk. This might require to investigate in appropiate parallel filesystems, since even with StlXXL, this is likely to be the main bottleneck. Although, let me repeat the mantra; Premature optimization is the root of all programming evil. This evil can only be banished with the cleansing ritual of Coding, Execute and Profile
You might want to use a C++ wrapper around LAPACK. The LAPACK is very mature code: well-tested, optimized, etc.
One such wrapper is the Intel Math Kernel Library.