Eigen's Conjugate Gradient vs SimplicialLLT for Poisson Equation - c++

I am using finite differences for a square 100x100 domain (with neumann bcs on all sides) in c++ using Eigen's sparse matrix functionality, and built in solvers to compute x in Ax=b.
I have tried the following solvers, but am getting very different time results to what I expect from reading the documentation in http://eigen.tuxfamily.org/dox-devel/group__TopicSparseSystems.html which gives typical timescales for different solvers. In particular, the documentation suggests that conjugate gradients should be one of the fastest ways of solving this system, giving a timescale of 0.239 seconds for solving Poisson SPD with a size bigger than my system. By contrast this documentation suggests that SimplicialLLT should take roughly 3X as long.
When I run each of the solvers I am obtaining the following:
- Conjugate gradients: 25 seconds
- LLT: 0.35 seconds
I was wondering whether anyone could help me understand why there is two orders of magnitude between these two solvers, and in particular, why CG seems to be being beaten by LLT, in contrast to the literature? Also, if anyone else has an idea how I could substantially speed up the solvers by using different methods from other packages, then suggestions are welcome!
I am implementing the solvers by the following:
//Conjugate gradients
ConjugateGradient<SparseMatrix<double> > cg;
cg.compute(A);
MatrixXd vGDNF = cg.solve(b);
//SimplicialLLT
SimplicialLLT<SparseMatrix<double>> solver;
MatrixXd vGDNF = solver.compute(A).solve(b);
Here A is that of a finite difference Laplacian, and b is a vector of point sources of the field.
Thanks!
Ben

On this documentation, Poisson_SPD corresponds to a 3D problem which is much harder for direct methods as Cholesky because there are much more fill-in in the factors. As the same documentation page says in the first table, for 2D Laplacian/Poisson problems as yours, SimplicialLLT is the recommended solver.

Related

C++ How do I solve very large system of sparse linear system

I am trying to solve a very large and sparse system of linear equations in C++. Currently, I am using BiCGSTAB from eigen. It works fine for small matrix, but it is taking just too much time for matrix of the size I need, which is 40804x40804 (It could be even larger in the future).
I have a very long script, but I simply used the following format:
SparseMatrix<double> sj(40804,40804);
VectorXd c_(40804), sf(40804);
sj.reserve(VectorXi::Constant(40804,36)); //This is a very good estimate of how many non zeros in each column
//...Fill in actual number in sj
sj.makeCompressed();
BiCGSTAB<SparseMatrix<double> > handler;
//...Fill in sj, only in the entries that have been initialized previously
handler.analyzePattern(sj)
handler.factorize(sj);
c_.setZero();
c_=handler.solve(sf);
This takes way too long! And yes, the solution does exist. Sparse function in matlab seems to handle this very well, but I need it in C++ in order to connect to a server.
I would really appreciate it you could help me!
You should consider use of one of the advanced sparse direct solvers: CHOLMOD
Sparse direct solvers are a fundamental tool in computational analysis, providing a very general method for obtaining high-quality results to almost any problem. CHOLMOD is a high performance library for sparse Cholesky factorization.
I guarantee that this package definetly will help you. Moreover CHOLMOD has supported GPU acceleration since 2012 with version 4.0.0 . In SuiteSparse-4.3.1 performance has been further improved, providing speedups of 3x or greater vs. the CPU for the sparse factorization operation.
If your matrices are the representations of graphs you can also consider METIS with combination of CHOLMOD. Which means you will be able to do partition/domainDecomposition in graphs then parallel solve with CHOLMOD.
SuiteSparse is a powerfull tool with the support of linear(KLU) and direct solvers.
Here are the GitHub link, UserGuide and SuiteSparse's home page

batch CUDA solution of sparse banded Ax=b for various b's

I have a sparse banded matrix A and I'd like to (direct) solve Ax=b. I have about 500 vectors b, so I'd like to solve for the corresponding 500 x's.
I'm brand new to CUDA, so I'm a little confused as to what options I have available.
cuSOLVER has a batch direct solver cuSolverSP for sparse A_i x_i = b_i using QR here. (I'd be fine with LU too since A is decently conditioned.) However, as far as I can tell, I can't exploit the fact that all my A_i's are the same.
Would an alternative option be to first determine a sparse LU (QR) factorization on the CPU or GPU then perform in parallel the backsubstitution (respectively, backsub and matrix mult) on the GPU? If cusolverSp< t >csrlsvlu() is for one b_i, is there a standard way to batch perform this operation for multiple b_i's?
Finally, since I don't have intuition for this, should I expect a speedup on a GPU for either of these options, given the necessary overhead? x has length ~10000-100000. Thanks.
I'm currently working on something similar myself. I decided to basically wrap the conjugate gradient and level-0 incomplete cholesky preconditioned conjugate gradient solvers utility samples that came with the CUDA SDK into a small class.
You can find them in your CUDA_HOME directory under the path:
samples/7_CUDALibraries/conjugateGradient and /Developer/NVIDIA/CUDA-samples/7_CUDALibraries/conjugateGradientPrecond
Basically, you would load the matrix into the device memory once (and for ICCG, compute the corresponding conditioner / matrix analysis) then call the solve kernel with different b vectors.
I don't know what you anticipate your matrix band structure to look like, but if it is symmetric and either diagonally dominant (off diagonal bands along each row and column are opposite sign of diagonal and their sum is less than the diagonal entry) or positive definite (no eigenvectors with an eigenvalue of 0.) then CG and ICCG should be useful. Alternately, the various multigrid algorithms are another option if you are willing to go through coding them up.
If your matrix is only positive semi-definite (e.g. has at least one eigenvector with an eigenvalue of zero) you can still ofteb get away with using CG or ICCG as long as you ensure that:
1) The right hand side (b vectors) are orthogonal to the null space (null space meaning eigenvectors with an eigenvalue of zero).
2) The solution you obtain is orthogonal to the null space.
It is interesting to note that if you do have a non-trivial null space, then different numeric solvers can give you different answers for the same exact system. The solutions will end up differing by a linear combination of the null space... That problem has caused me many many man hours of debugging and frustration before I finally caught on, so its good to be aware of it.
Lastly, if your matrix has a Circulant Band structure you might consider using a fast fourier transform (FFT) based solver. FFT based numerical solvers can often yield superior performance in cases where they are applicable.
is there a standard way to batch perform this operation for multiple b_i's?
One option is to use the batched refactorization module in CUDA's cuSOLVER, but I am not sure if it is standard.
Batched refactorization module in cuSOLVER provides an efficient method to solve batches of linear systems with fixed left-hand side sparse matrix (or matrices with fixed sparsity pattern but varying coefficients) and varying right-hand sides, based on LU decomposition. Only some partially completed code snippets can be found in the official documentation (as of CUDA 10.1) that are related to it. A complete example can be found here.
If you don't mind going with an open-source library, you could also check out CUSP:
CUSP Quick Start Page
It has a fairly decent suite of solvers, including a few preconditioned methods:
CUSP Preconditioner Examples
The smoothed aggregation preconditioner (a variant of algebraic multigrid) seems to work very well as long as your GPU has enough onboard memory for it.

C++ armadillo not correctly solving poorly conditioned matrix

I have a relatively simple question regarding the linear solver built into Armadillo. I am a relative newcomer to C++ but have experience coding in other languages. I am solving a fluid flow problem by successive linearization, using the armadillo function Solve(A,b) to get the solution at each iteration.
The issue that I am running into is that my matrix is very ill-conditioned. The determinant is on the order of 10^-20 and the condition number is 75000. I know these are terrible conditions but it's what I've got. Does anyone know if it is possible to specify the precision in my A matrix and in the solve function to something beyond double (long double perhaps)? I know that there are double matrix classes in Armadillo but I haven't found any documentation for higher levels of precision.
To approach this from another angle, I wrote some code in Mathematica and the LinearSolve worked very well and the program converged to the correct answer. My reasoning is that Mathematica variables have higher precision which can handle the higher levels of rounding error.
If anyone has any insight on this, please let me know. I know there are other ways to approach a poorly conditioned matrix (like preconditioning and pivoting), but my work is more in the physics than in the actual numerical solution so I'm trying to steer clear of that.
EDIT: I just limited the precision in the Mathematica version to 15 decimal places and the program still converges. This leads me to believe it is NOT a variable precision question but rather an issue with the method.
As you said "your work is more in the physics": rather than trying to increase the accuracy, I would use the Moore-Penrose Pseudo-Inverse, which in Armadillo can be obtained by the function pinv. You should then experience a bit with the parameter tolerance to set it to a reasonable level.
The geometrical interpretation is as follows: bad condition numbers are due to the fact that the row/column-vectors are linearly dependent. In physics, such linearly dependencies usually have an origin which at least needs to be interpreted. The pseudoinverse first projects the matrix onto a lower dimensional space in which the vectors are "less linearly dependent" by dropping all singular vectors with singular values smaller than the parameter tolerance. The reulting matrix has a better condition number such that the standard inverse can be constructed with less problems.

What is a fast simple solver for a large Laplacian matrix?

I need to solve some large (N~1e6) Laplacian matrices that arise in the study of resistor networks. The rest of the network analysis is being handled with boost graph and I would like to stay in C++ if possible. I know there are lots and lots of C++ matrix libraries but no one seems to be a clear leader in speed or usability. Also, the many questions on the subject, here and elsewhere seem to rapidly devolve into laundry lists which are of limited utility. In an attempt to help myself and others, I will try to keep the question concise and answerable:
What is the best library that can effectively handle the following requirements?
Matrix type: Symmetric Diagonal Dominant/Laplacian
Size: Very large (N~1e6), no dynamic resizing needed
Sparsity: Extreme (maximum 5 nonzero terms per row/column)
Operations needed: Solve for x in A*x=b and mat/vec multiply
Language: C++ (C ok)
Priority: Speed and simplicity to code. I would really rather avoid having to learn a whole new framework for this one problem or have to manually write too much helper code.
Extra love to answers with a minimal working example...
If you want to write your own solver, in terms of simplicity, it's hard to beat Gauss-Seidel iteration. The update step is one line, and it can be parallelized easily. Successive over-relaxation (SOR) is only slightly more complicated and converges much faster.
Conjugate gradient is also straightforward to code, and should converge much faster than the other iterative methods. The important thing to note is that you don't need to form the full matrix A, just compute matrix-vector products A*b. Once that's working, you can improve the convergance rate again by adding a preconditioner like SSOR (Symmetric SOR).
Probably the fastest solution method that's reasonable to write yourself is a Fourier-based solver. It essentially involves taking an FFT of the right-hand side, multiplying each value by a function of its coordinate, and taking the inverse FFT. You can use an FFT library like FFTW, or roll your own.
A good reference for all of these is A First Course in the Numerical Analysis of Differential Equations by Arieh Iserles.
Eigen is quite nice to use and one of the fastest libraries I know:
http://eigen.tuxfamily.org/dox/group__TutorialSparse.html
There is a lot of related post, you could have look.
I would recommend C++ and Boost::ublas as used in UMFPACK and BOOST's uBLAS Sparse Matrix

Matrix logarithm algorithm

is there any way to compute the matrix logarithm in OpenCV? I understand that it's not available as a library function, but, pointers to a good source (paper, textbook, etc) will be appreciated.
As a matter of fact, I'm in the process of programming the matrix logarithm in the Eigen library which is apparently used in some Willow Garage libraries; not sure about OpenCV. Higham's book (see answer by aix) is the best reference in my opinion and I'm implementing Algorithm 11.11 in his book. That is a rather complicated algorithm though.
Diagonalization (as in Alexandre's comment) is an easy-to-program method which works very well for symmetric positive definite matrices. It also works well for many general matrices. However, it is not accurate for matrices whose eigenvalues are close together, and it fails for matrices that are not diagonalizable.
If you want something more robust than diagonalization but less complicated than Higham's Algorithm 11.11 then I'd recommend to do a Schur decomposition followed by inverse scaling and squaring. This is Algorithm 11.10 in Higham's book, and described in the paper "Approximating the Logarithm of a Matrix to Specified Accuracy" (http://dx.doi.org/10.1137/S0895479899364015, preprint at http://eprints.ma.man.ac.uk/318/).
If you use OpenCV matrices, you can easily map them to Eigen3 matrices. See this post:
OpenCV CV::Mat and Eigen::Matrix
Then, Eigen3 library has a matrix logarithm function that you can use:
http://eigen.tuxfamily.org/dox/unsupported/group__MatrixFunctions__Module.html
it is under the unsupported module, but this is not an issue, this just means:
These modules are contributions from various users. They are provided
"as is", without any support.
-- http://eigen.tuxfamily.org/dox/unsupported/
Hope this is of help.