Eigen linear algebra solvers seem slow - c++

I want to solve a linear system Ax = b using the Eigen solvers.
In my case, A is a complex sparse matrix (26410 x 26410) and b is a real vector (26410 x 1).
I use a MEX file in MATLAB to map the sparse matrix A and the vector b to a format Eigen accepts. I am using the Eigen solvers in the hope that they will be faster than solving directly in MATLAB with x = A\b.
However, after trying LDLT, SparseLU, CG and BiCGSTAB, I found the results are not very satisfying:
LDLT takes 1.462s with norm(A*x - b)/norm(b) = 331;
SparseLU takes 37.994s with 1.5193e-4;
BiCGSTAB takes 95.217s with 4.5977e-4;
In contrast, directly using x = A\b in MATLAB takes 13.992s with an error norm of 2.606e-5.
I know it is somewhat inefficient and time consuming to map the sparse matrix A and the vector b from the MATLAB workspace to Eigen. But I am wondering whether these are the best results Eigen can give. Can anyone give me some pointers? Should I try some other linear solvers? Thanks a lot in advance! The main part of the code follows.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    // input vars
    // temp var
    size_t nrows;
    // output vars
    // double *x;
    // GetData
    /* check inputs
    ...*/
    // "mxArray2SCM" is a helper that maps the complex sparse matrix to Eigen
    SpCMat A = mxArray2SCM(prhs[0]);
    // SpMat A = mxArray2SM(prhs[0]);
    // "mxArray2ECV" is a helper that maps the real vector to Eigen
    Eigen::VectorXcd b = mxArray2ECV(prhs[1]);
    // Eigen::VectorXd b = mxArray2EV(prhs[1]);
    nrows = b.size();
    // Computation
    Eigen::VectorXcd x(nrows);
    // SparseLU<SparseMatrix<CD> > solver;
    BiCGSTAB<SparseMatrix<CD>, IncompleteLUT<CD> > BiCG;
    // BiCG.preconditioner().setDroptol(0.001);
    BiCG.compute(A);
    if (BiCG.info() != Success) {
        // decomposition failed
        return;
    }
    x = BiCG.solve(b);
    // Output results
    plhs[0] = ECV2mxArray(x);
}

Have you considered using PETSc for Krylov solvers, or SLEPc to compute eigenvalues?
Make sure you analyze the eigenspectrum before choosing a specific Krylov solver (CG works only for symmetric positive definite matrices, or Hermitian positive definite ones in the complex case).
PETSc has quite a few solvers that you can try out based on your eigenspectrum.
You can check Y. Saad's book on how these solvers work.
If your matrix is not symmetric positive definite, GMRES is a good option.
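To make the point about CG concrete, here is a minimal, library-free sketch of unpreconditioned conjugate gradient on a small dense real matrix, together with the relative residual norm(A*x - b)/norm(b) quoted in the question. The names Vec, Mat, matVec, dot, relativeResidual and conjugateGradient are illustrative only and are not part of Eigen or PETSc; dense storage is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major; for illustration only

// y = A * x
Vec matVec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Relative residual ||A x - b|| / ||b|| (b must be nonzero).
double relativeResidual(const Mat& A, const Vec& x, const Vec& b) {
    Vec r = matVec(A, x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= b[i];
    return std::sqrt(dot(r, r) / dot(b, b));
}

// Unpreconditioned CG. Assumes A is symmetric positive definite;
// for other matrices convergence is not guaranteed.
Vec conjugateGradient(const Mat& A, const Vec& b, double tol, int maxIter) {
    assert(A.size() == b.size());
    Vec x(b.size(), 0.0);
    Vec r = b;  // residual b - A*x for the initial guess x = 0
    Vec p = r;  // first search direction
    double rr = dot(r, r);
    const double stop = tol * tol * dot(b, b);
    for (int k = 0; k < maxIter && rr > stop; ++k) {
        const Vec Ap = matVec(A, p);
        const double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rrNew = dot(r, r);
        const double beta = rrNew / rr;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
    }
    return x;
}
```

For example, with A = [[4, 1], [1, 3]] and b = [1, 2], conjugateGradient(A, b, 1e-10, 100) returns approximately [1/11, 7/11]. Checking relativeResidual after every solve, whichever solver is used, is a cheap way to spot a method that does not suit the matrix.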

Related

Eigen c++ triangular from

I use C++14 and Eigen. For an n x n matrix A, how can I extract the Q and R matrices using the QR decomposition in Eigen? I tried to read the documentation but I am lost.
So far I have obtained only R:
HouseholderQR<MatrixXd> qr(A);
qr.compute(A);
MatrixXd R = qr.matrixQR().template triangularView<Upper>();
In any case, I just want to convert matrix A into a triangular matrix (efficiently, around O(n^3) I think) whose determinant equals the determinant of A, so I would accept any other method of doing this in Eigen (or in another linear algebra library; if you know some good ones, I am open to suggestions).
You can get Q and R as follows:
Eigen::MatrixXd Q = qr.householderQ();
Eigen::MatrixXd QR = qr.matrixQR();
The R matrix is in the upper triangular portion of matrix QR. You can compute the determinant of R as R.diagonal().prod(), which is equal in magnitude to A.determinant(). If you want to isolate the upper triangular portion you can do this:
Eigen::MatrixXd T = QR.triangularView<Eigen::Upper>();
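Since the QR route only matches the determinant in magnitude, another O(n^3) option is Gaussian elimination with partial pivoting, where each row swap flips the sign of the determinant. The following is a plain C++ sketch of that idea, not Eigen code; Mat, Triangular, toUpperTriangular and determinant are hypothetical names chosen for this example. In Eigen, the PartialPivLU class provides the same kind of factorization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // square, row-major

struct Triangular {
    Mat U;     // upper triangular result
    int sign;  // +1 or -1, so that det(A) = sign * product of diag(U)
};

// Reduce A to upper triangular form by row operations.
// Adding a multiple of one row to another keeps the determinant;
// each row swap flips its sign, which is tracked in `sign`.
Triangular toUpperTriangular(Mat A) {
    const std::size_t n = A.size();
    int sign = 1;
    for (std::size_t k = 0; k < n; ++k) {
        assert(A[k].size() == n);
        // Partial pivoting: pick the largest entry in column k.
        std::size_t piv = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i][k]) > std::fabs(A[piv][k])) piv = i;
        if (A[piv][k] == 0.0) continue;  // column already zero from row k down
        if (piv != k) {
            std::swap(A[piv], A[k]);
            sign = -sign;
        }
        for (std::size_t i = k + 1; i < n; ++i) {
            const double f = A[i][k] / A[k][k];
            A[i][k] = 0.0;
            for (std::size_t j = k + 1; j < n; ++j) A[i][j] -= f * A[k][j];
        }
    }
    return {std::move(A), sign};
}

double determinant(const Triangular& t) {
    double d = t.sign;
    for (std::size_t i = 0; i < t.U.size(); ++i) d *= t.U[i][i];
    return d;
}
```

If a single triangular matrix with exactly the same determinant is needed, one row of U can be multiplied by sign afterwards.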

Is there something like a sparse cube in Armadillo, or some way of using sparse matrices as slices in a cube?

I am using Armadillo's sparse matrices. But now I would like to use something like a "sparse cube", which does not exist in Armadillo. Writing sparse matrices into a cube with cube.slice(some_sparse_matrix) converts everything back to a dense cube.
I am using sparse matrices to multiply vectors with. For larger vectors/matrices the sparse variant is much faster. Now I have to sum up the products of several sparse matrices with several vectors.
Would a std::vector be a way?
In my experience it is faster to use Armadillo's functions (for example a subvector, arma::span() or arma::sum()) than to write loops myself. So I was wondering what the fastest way of doing this would be.
It's possible to approximate a sparse cube using the field class, like so.
arma::uword number_of_matrices = 10;
arma::uword number_of_rows = 5000;
arma::uword number_of_cols = 5000;
arma::field<arma::sp_mat> F(number_of_matrices);
F.for_each( [&](arma::sp_mat& X) { X.set_size(number_of_rows, number_of_cols); } );
F(0)(1,2) = 456.7; // write to element (1,2) in matrix 0
F(1)(2,3) = 567.8; // write to element (2,3) in matrix 1
F.print("F:"); // show all matrices
Your compiler must support at least C++11 for this to work.
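On the std::vector question: a plain container of sparse matrices works in the same way as the field above. The sketch below is library-independent and only illustrates the accumulation pattern (sum over k of A_k * v_k); Triplet, SparseMatrix, addProduct and sumOfProducts are hypothetical names, and the coordinate-list storage is an assumption made for brevity, not how Armadillo stores sp_mat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One nonzero entry of a sparse matrix (coordinate format).
struct Triplet {
    std::size_t row;
    std::size_t col;
    double value;
};

struct SparseMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<Triplet> entries;  // nonzeros only
};

// acc += A * v, touching only the stored nonzeros of A.
void addProduct(const SparseMatrix& A, const std::vector<double>& v,
                std::vector<double>& acc) {
    assert(v.size() == A.cols && acc.size() == A.rows);
    for (const Triplet& t : A.entries) acc[t.row] += t.value * v[t.col];
}

// Sum of A_k * v_k over all k; all matrices must have the same row count.
std::vector<double> sumOfProducts(
    const std::vector<SparseMatrix>& mats,
    const std::vector<std::vector<double>>& vecs) {
    assert(!mats.empty() && mats.size() == vecs.size());
    std::vector<double> acc(mats.front().rows, 0.0);
    for (std::size_t k = 0; k < mats.size(); ++k)
        addProduct(mats[k], vecs[k], acc);
    return acc;
}
```

Accumulating into one result vector avoids allocating a temporary for each product. With Armadillo the same loop can run over the elements of the field (or of a std::vector of arma::sp_mat) and add each product to a single arma::vec.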

Apply a large Matrix on QR decomposed

I have to convert MATLAB code to C++ with the Eigen library, but I have some problems with the QR decomposition. MATLAB has a function:
[Q,R]=qr(A,0); % A is m-by-n
It produces the economy-size decomposition. If m>n, only the first n columns of Q and the first n rows of R are computed. If m<=n, this is the same as [Q,R]=qr(A).
I have tried to compute it with Eigen. But A is 20000x1000, so the application always crashes at the QR decomposition, and I don't know how to produce the economy-size decomposition in Eigen or by other means.
How can I convert [Q,R]=qr(A,0) to C++/Eigen?
MatrixXd A(m,n);
HouseholderQR<MatrixXd> qr;
qr.compute(A);
temp= qr.matrixQR().triangularView<Upper>();
Q= qr.householderQ() * Eigen::MatrixXd::Identity(m, n);
R=temp.topRows(n);
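To illustrate the shapes involved in the economy-size decomposition (Q is m x n, R is n x n), here is a small library-free sketch. Note that it uses modified Gram-Schmidt, not the Householder reflections that Eigen's HouseholderQR uses, and it assumes m >= n and full column rank; Matrix, ThinQR, thinQR and reconstructionError are hypothetical names for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // Matrix[i][j], row-major

struct ThinQR {
    Matrix Q;  // m x n, orthonormal columns
    Matrix R;  // n x n, upper triangular
};

// Economy-size QR via modified Gram-Schmidt.
// Assumes m >= n and linearly independent columns.
ThinQR thinQR(const Matrix& A) {
    assert(!A.empty());
    const std::size_t m = A.size();
    const std::size_t n = A.front().size();
    assert(m >= n);
    ThinQR f{Matrix(m, std::vector<double>(n, 0.0)),
             Matrix(n, std::vector<double>(n, 0.0))};
    Matrix V = A;  // working copy; columns are orthogonalized in place
    for (std::size_t j = 0; j < n; ++j) {
        double norm = 0.0;
        for (std::size_t i = 0; i < m; ++i) norm += V[i][j] * V[i][j];
        norm = std::sqrt(norm);
        assert(norm > 0.0);  // fails if the columns are linearly dependent
        f.R[j][j] = norm;
        for (std::size_t i = 0; i < m; ++i) f.Q[i][j] = V[i][j] / norm;
        // Remove the q_j component from every remaining column.
        for (std::size_t k = j + 1; k < n; ++k) {
            double proj = 0.0;
            for (std::size_t i = 0; i < m; ++i) proj += f.Q[i][j] * V[i][k];
            f.R[j][k] = proj;
            for (std::size_t i = 0; i < m; ++i) V[i][k] -= proj * f.Q[i][j];
        }
    }
    return f;
}

// Largest absolute entry of A - Q*R, as a sanity check.
double reconstructionError(const Matrix& A, const ThinQR& f) {
    double worst = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[i].size(); ++j) {
            double qr = 0.0;
            for (std::size_t k = 0; k < f.R.size(); ++k)
                qr += f.Q[i][k] * f.R[k][j];
            worst = std::fmax(worst, std::fabs(A[i][j] - qr));
        }
    return worst;
}
```

The factors of a QR decomposition are unique only up to the signs of the columns of Q and rows of R, so individual entries may differ in sign from what MATLAB returns. For a 20000x1000 input the thin Q needs storage for 20000 x 1000 doubles, whereas a full Q would be 20000 x 20000.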

Solving system linear equation of small matrices via Cramer's rule has large numerical error

I observed that when I solve a system of linear equations of order N < 10 via Cramer's rule (quotient of two determinants), I get quite a large residual error compared to the LAPACK solution.
Here is an example:
float B00[36] __attribute__((aligned(16))) = {127.3611, -46.75962, 62.8739, -9.175959, 27.23792, 1.395347,
-46.75962, 841.5496, 406.2475, -119.3715, -33.60108, 6.269638,
62.8739, 406.2475, 1302.981, -542.8405, 95.03378, 42.77704,
-9.175959, -119.3715, -542.8405, 434.3342, 34.96918, -33.74546,
27.23792, -33.60108, 95.03378, 34.96918, 59.10199, -1.880791,
1.395347, 6.269638, 42.77704, -33.74546, -1.880791, 2.650853};
float c00[6] __attribute__((aligned(16))) = {-0.102149, -5.76615, -17.02828, 12.47396, 1.158018, -0.9571021};
Solving this with LAPACK (from Intel MKL) yields:
x = [-0.000314947
-0.000589154
-0.00587876
0.0184799
0.01738
-0.0170484]
and Cramer's rule (own implementation) yields:
x = [-0.000314933
-0.000798058
-0.00587888
0.0184808
0.017381
-0.0170508]
Note the difference in x[1].
I can guarantee that my determinant calculation is correct. Has anyone made a similar observation, or can anyone tell me something about this?
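One way to check how much of such a discrepancy comes from single-precision arithmetic is to run the same Cramer's rule code once in float and once in double and compare the results. The sketch below is templated on the scalar type for that purpose; it is not the implementation from the question, and determinant and cramerSolve are hypothetical names. The determinant is computed by Gaussian elimination with partial pivoting, which is an assumption about how one might implement it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Determinant by Gaussian elimination with partial pivoting.
// A is taken by value because it is overwritten during elimination.
template <typename T>
T determinant(std::vector<std::vector<T>> A) {
    const std::size_t n = A.size();
    T det = T(1);
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t piv = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::abs(A[i][k]) > std::abs(A[piv][k])) piv = i;
        if (A[piv][k] == T(0)) return T(0);  // singular matrix
        if (piv != k) {
            std::swap(A[piv], A[k]);
            det = -det;  // a row swap flips the sign
        }
        det *= A[k][k];
        for (std::size_t i = k + 1; i < n; ++i) {
            const T f = A[i][k] / A[k][k];
            for (std::size_t j = k + 1; j < n; ++j) A[i][j] -= f * A[k][j];
        }
    }
    return det;
}

// Cramer's rule: x[k] = det(A with column k replaced by b) / det(A).
template <typename T>
std::vector<T> cramerSolve(const std::vector<std::vector<T>>& A,
                           const std::vector<T>& b) {
    const std::size_t n = A.size();
    assert(b.size() == n);
    const T detA = determinant(A);
    assert(detA != T(0));
    std::vector<T> x(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::vector<std::vector<T>> Ak = A;
        for (std::size_t i = 0; i < n; ++i) Ak[i][k] = b[i];
        x[k] = determinant(Ak) / detA;
    }
    return x;
}
```

Calling cramerSolve with std::vector<std::vector<float>> and again with the same data converted to double shows whether the gap to the LAPACK result shrinks with the wider type.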

Diagonalization of a 2x2 self-adjoint (hermitian) matrix

Diagonalizing a 2x2 hermitian matrix is simple; it can be done analytically. However, when the eigenvalues and eigenvectors must be calculated more than 10^6 times, it is important to do it as efficiently as possible. In particular, if the off-diagonal elements can vanish, it is not possible to use one formula for the eigenvectors: an if-statement is necessary, which of course slows down the code. Thus, I thought that using Eigen, where it is stated that the diagonalization of 2x2 and 3x3 matrices is optimized, would still be a good choice:
Using
const std::complex<double> I ( 0.,1. );
inline double block_distr ( double W )
{
return (-W/2. + rand() * W/RAND_MAX);
}
a test-loop would be
...
SelfAdjointEigenSolver<Matrix<complex< double >, 2, 2> > ces;
Matrix<complex< double >, 2, 2> X;
for (int i = 0 ; i <iter_MAX; ++i) {
a00=block_distr(100.);
a11=block_distr(100.);
re_a01=block_distr(100.);
im_a01=block_distr(100.);
X(0,0)=a00;
X(1,0)=re_a01-I*im_a01;
//only the lower triangular part is referenced! X(0,1)=0.; <--- not necessary
X(1,1)=a11;
ces.compute(X,ComputeEigenvectors);
}
Writing the loop without Eigen, directly using the formulas for the eigenvalues and eigenvectors of a hermitian matrix and an if-statement to check whether the off-diagonal element is zero, is a factor of 5 faster. Am I not using Eigen properly, or is such an overhead normal? Are there other libraries that are optimized for small self-adjoint matrices?
By default, the iterative method is used. To use the analytical version for 2x2 and 3x3 matrices, you have to call the computeDirect function:
ces.computeDirect(X);
but it is unlikely to be faster than your implementation of the analytic formulas.
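For reference, the analytic formulas can be written as in the following sketch. It is not the asker's code and not what Eigen does internally; HermitianEig2 and eigHermitian2x2 are hypothetical names. The matrix is [[a, conj(b)], [b, d]] with real a and d, where b is the (1,0) entry as in the test loop above. Choosing the eigenvector formula by the sign of a - d keeps the vanishing off-diagonal case well defined, so the only special case left is a multiple of the identity.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using Complex = std::complex<double>;

struct HermitianEig2 {
    double lambdaMinus;  // smaller eigenvalue
    double lambdaPlus;   // larger eigenvalue
    Complex vMinus[2];   // unit eigenvector for lambdaMinus
    Complex vPlus[2];    // unit eigenvector for lambdaPlus
};

// Eigen-decomposition of the Hermitian matrix [[a, conj(b)], [b, d]].
HermitianEig2 eigHermitian2x2(double a, Complex b, double d) {
    const double m = 0.5 * (a + d);  // mean of the diagonal
    const double h = 0.5 * (a - d);  // half the diagonal difference
    const double r = std::sqrt(h * h + std::norm(b));  // norm(b) = |b|^2
    HermitianEig2 e;
    e.lambdaMinus = m - r;
    e.lambdaPlus = m + r;
    if (r == 0.0) {
        // Multiple of the identity: any orthonormal basis works.
        e.vMinus[0] = 1.0; e.vMinus[1] = 0.0;
        e.vPlus[0] = 0.0;  e.vPlus[1] = 1.0;
        return e;
    }
    // Two equivalent formulas for the lambdaPlus eigenvector; pick the one
    // whose real component is h + r or r - h, whichever avoids cancellation.
    Complex p, q;
    if (h >= 0.0) {
        p = h + r;
        q = b;
    } else {
        p = std::conj(b);
        q = r - h;
    }
    const double len = std::sqrt(std::norm(p) + std::norm(q));
    p /= len;
    q /= len;
    e.vPlus[0] = p;
    e.vPlus[1] = q;
    // The second eigenvector is orthogonal to the first.
    e.vMinus[0] = -std::conj(q);
    e.vMinus[1] = std::conj(p);
    return e;
}
```

For the matrix [[2, i], [-i, 2]], that is a = d = 2 and b = -i, the eigenvalues are 1 and 3. Timing a routine like this against ces.computeDirect(X) in the same loop would show how much overhead remains.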