Diagonalization of a 2x2 self-adjoint (Hermitian) matrix - C++

Diagonalizing a 2x2 Hermitian matrix is simple and can be done analytically. However, when the eigenvalues and eigenvectors have to be computed more than 10^6 times, it is important to do it as efficiently as possible. In particular, if the off-diagonal element can vanish, a single formula for the eigenvectors is not enough: an if-statement is necessary, which of course slows down the code. I therefore thought that using Eigen, whose documentation states that the diagonalization of 2x2 and 3x3 matrices is optimized, would still be a good choice:
using namespace Eigen;
using std::complex;

const std::complex<double> I ( 0., 1. );

// uniform random number in [-W/2, W/2]
inline double block_distr ( double W )
{
    return ( -W/2. + rand() * W/RAND_MAX );
}
A test loop would be:
...
SelfAdjointEigenSolver<Matrix<complex<double>, 2, 2> > ces;
Matrix<complex<double>, 2, 2> X;
for ( int i = 0; i < iter_MAX; ++i ) {
    a00    = block_distr ( 100. );
    a11    = block_distr ( 100. );
    re_a01 = block_distr ( 100. );
    im_a01 = block_distr ( 100. );
    X(0,0) = a00;
    X(1,0) = re_a01 - I*im_a01;
    // only the lower triangular part is referenced! X(0,1)=0.; <--- not necessary
    X(1,1) = a11;
    ces.compute ( X, ComputeEigenvectors );
}
Writing the loop without Eigen, using the formulas for the eigenvalues and eigenvectors of a Hermitian matrix directly together with an if-statement to check whether the off-diagonal element is zero, is a factor of 5 faster. Am I not using Eigen properly, or is such an overhead normal? Are there other libraries that are optimized for small self-adjoint matrices?
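For context, here is a minimal sketch of the kind of analytic formulas the question refers to, for the Hermitian matrix [[a00, a01], [conj(a01), a11]]; the function name and interface are purely illustrative and not the code that was actually benchmarked:
#include <cmath>
#include <complex>

// Analytic eigendecomposition of [[a00, a01], [conj(a01), a11]] with a00, a11 real.
void eig2x2_hermitian ( double a00, double a11, std::complex<double> a01,
                        double& lambda1, double& lambda2,
                        std::complex<double> v1[2], std::complex<double> v2[2] )
{
    const double mean = 0.5 * ( a00 + a11 );
    const double diff = 0.5 * ( a00 - a11 );
    const double root = std::sqrt ( diff*diff + std::norm ( a01 ) ); // std::norm(a01) = |a01|^2
    lambda1 = mean + root;
    lambda2 = mean - root;

    if ( std::norm ( a01 ) == 0. ) {
        // vanishing off-diagonal: the if-branch mentioned in the question
        v1[0] = 1.; v1[1] = 0.;
        v2[0] = 0.; v2[1] = 1.;
    } else {
        // (A - lambda*I) v = 0  gives  v = ( a01, lambda - a00 ), up to normalization
        v1[0] = a01; v1[1] = lambda1 - a00;
        v2[0] = a01; v2[1] = lambda2 - a00;
    }
}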

By default, the iterative method is used. To use the analytical version for 2x2 and 3x3 matrices, you have to call the computeDirect function:
ces.computeDirect(X);
but it is unlikely to be faster than your implementation of the analytic formulas.

Related

Matrix inverse calculation of upper triangular matrix gives error for large matrix dimensions

I have a recursive function to calculate the inverse of an upper triangular matrix. I have divided the matrix into Top, Bottom and Corner sections and then followed the methodology laid out in https://math.stackexchange.com/a/2333418. Here it is in pseudocode form:
// A diagram of the matrix structure
Matrix = [Top   Corner]
         [0     Bottom]

Matrix multiply_matrix(Matrix A, Matrix B){
    Simple code to multiply two matrices and return a Matrix
}

Matrix simple_inverse(Matrix A){
    Simple code to get the inverse of a 2x2 Matrix
}

Matrix inverse_matrix(Matrix A){
    // Creating an empty A_inv matrix of dimension equal to A
    Matrix A_inv;
    if(A.dimension == 2){
        A_inv = simple_inverse(A);
    }
    else{
        Top_inv = inverse_matrix(Top);
        (Code to check Top*Top_inv == Identity matrix)
        Bottom_inv = inverse_matrix(Bottom);
        (Code to check Bottom*Bottom_inv == Identity matrix)
        Corner_inv = multiply_matrix(Top_inv, Corner);
        Corner_inv = multiply_matrix(Corner_inv, Bottom_inv);
        Corner_inv = negate(Corner_inv); // Just a function for negation of the matrix elements
        // Code to copy Top_inv, Bottom_inv and Corner_inv to A_inv
        ...
        ...
    }
    return A_inv;
}

int main(){
    Matrix A = {An upper triangular matrix with random integers between 1 and 9};
    A_inv = inverse_matrix(A);
    test_matrix = multiply_matrix(A, A_inv);
    (Code to raise error if test_matrix != Identity matrix)
}
For simplicity I have implemented the code such that only matrices whose dimension is a power of 2 are supported.
My problem is that I have tested this code for matrix dimensions of 2, 4, 8, 16, 32 and 64. All of these pass all of the assertion checks shown in the code.
But for a matrix dimension of 128 the assertion in main() fails, and when I check, I observe that test_matrix is not the identity matrix: some off-diagonal elements are not equal to 0.
I am wondering what could be the reason for this:
I am using C++ std::vector<std::vector<double>> for the matrix representation.
Since the data type is double, the off-diagonal elements of test_matrix for the cases 2, 4, 8, ..., 64 do have some value, but it is very small, for example -9.58122e-14.
All my matrices at any recursion stage are square matrices.
I am performing checks that Top*Top_inv == Identity and Bottom*Bottom_inv == Identity.
Finally, for dimensions 2, 4, ..., 64 I generated random numbers (between 1 and 10) to create my upper triangular matrix. Since these cases passed, I guess my mathematical implementation is correct.
I feel like there is some aspect of the C++ double datatype that I am unaware of which could be causing the error. Otherwise the sudden failure when going from 64 to 128 doesn't make sense.
Could you please elaborate on how the matrix == identity operation is implemented?
My guess is that the problem comes down to the floating-point comparison.
Matrix inversion can be O(n^3) in the worst case. This means that, as the matrix size increases, the amount of computation involved also increases. Real numbers cannot be represented exactly, even when using 64-bit floating point; they are always an approximation.
For operations such as matrix inversion this can cause problems of numerical error propagation, due to the loss of precision in the accumulated multiply-add operations.
This has already been discussed on Stack Overflow: How should I do floating point comparison?
EDIT: Another thing to consider is whether the full matrix is actually invertible.
Perhaps the Top and/or Bottom matrices are invertible, but the full matrix (when composed with the Corner matrix) is not.
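A minimal sketch of a tolerance-based identity check along these lines; the function name is_identity, the base epsilon and the scaling of the tolerance with the matrix dimension are all assumptions that would need to be tuned for the application:
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns true if M equals the identity up to a tolerance that grows with the
// matrix dimension, since rounding errors accumulate with more operations.
bool is_identity(const Matrix& M, double base_eps = 1e-12)
{
    const std::size_t n = M.size();
    const double tol = base_eps * static_cast<double>(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const double expected = (i == j) ? 1.0 : 0.0;
            if (std::abs(M[i][j] - expected) > tol)
                return false;
        }
    return true;
}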

Is there something like a sparse cube in Armadillo, or some way of using sparse matrices as slices in a cube?

I am using Armadillo's sparse matrices, but now I would like to use something like a "sparse cube", which does not exist in Armadillo. Writing sparse matrices into a cube with cube.slice(some_sparse_matrix) converts everything back to a dense cube.
I am using the sparse matrices in order to multiply vectors with them; for larger vectors/matrices the sparse variant is much faster. Now I have to sum up the multiplications of several sparse matrices with several vectors.
Would a std::vector be a way to do this?
In my experience it is faster to use Armadillo's functions (for example a subvector, arma::span() or arma::sum()) than to write loops myself, so I was wondering what would be the fastest way of doing this.
It's possible to approximate a sparse cube using the field class, like so:
arma::uword number_of_matrices = 10;
arma::uword number_of_rows = 5000;
arma::uword number_of_cols = 5000;
arma::field<arma::sp_mat> F(number_of_matrices);
F.for_each( [&](arma::sp_mat& X) { X.set_size(number_of_rows, number_of_cols); } );
F(0)(1,2) = 456.7; // write to element (1,2) in matrix 0
F(1)(2,3) = 567.8; // write to element (2,3) in matrix 1
F.print("F:"); // show all matrices
Your compiler must support at least C++11 for this to work.
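For the use case in the question (summing up the products of several sparse matrices with several vectors), the field can then be used along the following lines; the container V, the vector sizes and the accumulation loop are an illustrative sketch, not part of Armadillo's API:
// assuming F from the snippet above, plus one dense vector per matrix
arma::field<arma::vec> V(number_of_matrices);
V.for_each( [&](arma::vec& v) { v.randu(number_of_cols); } );

arma::vec total(number_of_rows, arma::fill::zeros);
for (arma::uword k = 0; k < number_of_matrices; ++k)
    total += F(k) * V(k); // sparse matrix times dense vector yields a dense vector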

Matrix multiplication optimization

I am performing a series of matrix multiplications with fairly large matrices. Running through all of these operations takes a long time, and I need my program to do this in a large loop. I was wondering if anyone has any ideas to speed this up? I just started using Eigen, so I have very limited knowledge.
I was using ROOT/CERN's built-in TMatrix class, but the speed of the matrix operations is very poor. I set up some diagonal matrices using Eigen in the hope that it handles the multiplication operation in a more optimal way. It may, but I cannot really see a performance difference.
// setup matrices
int size = 8000;
Eigen::MatrixXf a(size*2,size);
// fill matrix a....
Eigen::MatrixXf r(2*size,2*size); // diagonal matrix of row sums of a
// fill matrix r
Eigen::MatrixXf c(size,size); // diagonal matrix of col sums of a
// fill matrix c
// transpose a in place
a.transposeInPlace();
Eigen::MatrixXf c_dia;
c_dia = c.diagonal().asDiagonal();
Eigen::MatrixXf r_dia;
r_dia = r.diagonal().asDiagonal();
// calc car
Eigen::MatrixXf car;
car = c_dia*a*r_dia;
You are doing way too much work here. If you have diagonal matrices, only store the diagonal (and use that directly in products). Once you store a diagonal matrix in a square matrix, the information about its structure is lost to Eigen.
Also, you don't need to store the transposed variant of a; just use a.transpose() inside a product (that is only a minor issue here ...)
// setup matrices
int size = 8000;
Eigen::MatrixXf a(size*2,size);
// fill matrix a....
a.setRandom();
Eigen::VectorXf r = a.rowwise().sum(); // diagonal matrix of row sums of a
Eigen::VectorXf c = a.colwise().sum(); // diagonal matrix of col sums of a
Eigen::MatrixXf car = c.asDiagonal() * a.transpose() * r.asDiagonal();
Finally, of course make sure to compile with optimization enabled, and enable vectorization if available (with gcc or clang compile with -O2 -march=native).

Fast matrix multiplication of XDX^T for D diagonal

Consider fast matrix multiplication of XDX^T, for X an n by m matrix and D an m by m diagonal matrix. Here m >> n (suppose n around 1000, m around 100000). In my application, X is a fixed matrix and the values of D can change at every iteration.
What would be a fast way to calculate this? At the moment I am just doing simple multiplication in C++.
EDIT: I should clarify my current procedure; it is not "simple multiplication". In particular, I am column-wise multiplying X by the square roots of the diagonal entries of D to get A := XD^{1/2}. Then I directly calculate A*t(A) (the multiplication of an n by m matrix with its transpose).
Thank you.
If you know that D is diagonal, then you can just do simple multiplication. Hopefully, you are not multiplying the zeros.
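A minimal Eigen sketch of what "not multiplying the zeros" can look like for XDX^T; the names X and d are assumptions, and the second variant additionally assumes the diagonal entries of D are non-negative (as in the questioner's D^{1/2} approach):
#include <Eigen/Dense>

// D is kept as a vector d and never formed as a full m x m matrix.
Eigen::MatrixXd xdxt(const Eigen::MatrixXd& X, const Eigen::VectorXd& d)
{
    return X * d.asDiagonal() * X.transpose();
}

// If all entries of d are non-negative, the symmetric product A*A^T with
// A = X*D^{1/2} can exploit symmetry and do roughly half the work:
Eigen::MatrixXd xdxt_sym(const Eigen::MatrixXd& X, const Eigen::VectorXd& d)
{
    Eigen::MatrixXd A = X * d.cwiseSqrt().asDiagonal();
    Eigen::MatrixXd S = Eigen::MatrixXd::Zero(X.rows(), X.rows());
    S.selfadjointView<Eigen::Lower>().rankUpdate(A); // S = A * A^T, lower part only
    return Eigen::MatrixXd(S.selfadjointView<Eigen::Lower>()); // expand to a full symmetric matrix
}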

Avoid numerical underflow when obtaining determinant of large matrix in Eigen

I have implemented an MCMC algorithm in C++ using the Eigen library. The main part of the algorithm is a loop in which first some matrix calculations are performed, after which the determinant of the resulting matrix is obtained and added to the output. E.g.:
MatrixXd delta0;
NumericVector out(3);
out[0] = 0;
out[1] = 0;
for (int i = 0; i < s; i++) {
    ...
    delta0 = V*(A.cast<double>()-(A+B).cast<double>()*theta.asDiagonal());
    ...
    I = delta0.determinant();
    out[1] += I;
    out[2] += std::sqrt(I);
}
return out;
Now on certain matrices I unfortunately observe a numerical underflow, so that the determinant is output as zero (which it actually isn't).
How can I avoid this underflow?
One solution would be to obtain, instead of the determinant, the log of the determinant. However,
I do not know how to do this;
how could I then add up these logs?
Any help is greatly appreciated.
There are two main options that come to my mind:
The product of the eigenvalues of a square matrix is the determinant of that matrix; therefore a sum of logarithms of the eigenvalues is the logarithm of the determinant. Assume det(A) = a and det(B) = b for compact notation. After applying this to two matrices A and B, we end up with log(a) and log(b), and then the following actually holds:
log(a + b) = log(a) + log(1 + e ^ (log(b) - log(a)))
Yes, we get a logarithm of the sum. What would you do with it next? I don't know; it depends on what you have to do. If you have to remove the logarithm via e ^ log(a + b) = a + b, then you might be lucky that the value of a + b does not underflow now, but in some cases it can still underflow as well. (A small sketch of this accumulation is given after this answer.)
Perform clever preconditioning; there might be tons of options here, and you had better read about them from trusted sources, as this is a serious topic. The simplest (and probably the cheapest) example of preconditioning for this particular problem is to recall that det(c * A) = (c ^ n) * det(A), where A is an n by n matrix: premultiply your matrix by some c, compute the determinant, and then divide it by c ^ n to get the actual one.
Update
I thought about one more option. If on the last stages of #1 or #2 you still experience underflow too frequently, then it might be a good idea to increase precision specifically for these last operations, for example, by utilizing GNU MPFR.
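A minimal sketch of the accumulation in option 1; it assumes the determinants are positive so that their logarithms exist, and add_in_log_space is just an illustrative helper name:
#include <cmath>
#include <utility>

// Keep the running sum of determinants in log space:
// log(a + b) = log(a) + log1p(exp(log(b) - log(a))), evaluated with the larger
// logarithm first so that the exponential cannot overflow.
double add_in_log_space(double log_sum, double log_term)
{
    if (log_term > log_sum) std::swap(log_sum, log_term);
    return log_sum + std::log1p(std::exp(log_term - log_sum));
}
The accumulation would start from the logarithm of zero, i.e. -std::numeric_limits<double>::infinity().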
You can use Householder elimination to get the QR decomposition of delta0. Then the determinant of the Q part is +/-1 (depending on whether you did an even or odd number of reflections), and the determinant of the R part is the product of its diagonal elements. Both of these are easy to compute without running into underflow hell, and you might not even care about the first.
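A sketch of this QR route in Eigen; taking logarithms of the diagonal of R avoids ever forming the underflowing product, and the sign is ignored here, which matches the "you might not even care" remark:
#include <Eigen/Dense>

// log|det(delta0)| via Householder QR: |det| = |det(R)| = product of |r_ii|,
// since |det(Q)| = 1 for an orthogonal Q.
double log_abs_det(const Eigen::MatrixXd& delta0)
{
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(delta0);
    // matrixQR() stores R in its upper triangle; sum the logs of its diagonal.
    return qr.matrixQR().diagonal().array().abs().log().sum();
}
Recent Eigen releases also expose logAbsDeterminant() directly on the QR decompositions, which does essentially the same thing.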