I have a question regarding Array operations in Eigen (essentially element-wise matrix operations).
Are such operations (+, -, *, /) parallelized in Eigen (when using OpenMP)? The documentation does not specify it (cf. here), but I would expect them to be, since parallelizing them seems straightforward.
Example:
#include <Eigen/Dense>
using namespace Eigen;

MatrixXd A = MatrixXd::Zero(100,100);
MatrixXd B = MatrixXd::Ones(100,100);
MatrixXd C = A.array() + B.array(); // element-wise addition
MatrixXd D = A.array() / B.array(); // element-wise division
It would be great if they were parallelized. I have a lot of these element-wise operations in my code, and it would be cumbersome to rewrite all of them with OpenMP by hand.
Thanks in advance
The Eigen web site lists the few cases that take advantage of multithreading.
Currently, the following algorithms can make use of multi-threading:
general dense matrix - matrix products
PartialPivLU
row-major-sparse * dense vector/matrix products
ConjugateGradient with Lower|Upper as the UpLo template parameter.
BiCGSTAB with a row-major sparse matrix format.
LeastSquaresConjugateGradient
Element-wise operations are not on this list. This does not exclude SIMD vectorization, though, so those operations will still be vectorized within a single thread.
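If you want multithreaded element-wise operations anyway, one option is to split the work yourself. Here is a minimal sketch, assuming C has already been sized to match A and B (the helper name parallelElementwiseSum is my own); it parallelizes over columns with OpenMP while keeping Eigen's SIMD vectorization within each column:

#include <Eigen/Dense>

// Hypothetical helper: each OpenMP thread handles a block of columns;
// within a column, Eigen still applies SIMD to the element-wise sum.
void parallelElementwiseSum(const Eigen::MatrixXd& A,
                            const Eigen::MatrixXd& B,
                            Eigen::MatrixXd& C) {
    #pragma omp parallel for
    for (Eigen::Index j = 0; j < A.cols(); ++j)
        C.col(j) = A.col(j) + B.col(j);
}

Compile with -fopenmp. For matrices as small as the 100x100 example above, the threading overhead may well outweigh any gain, so measure before adopting this.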
I'm writing a program using Eigen (an optimizer to be precise) that has a "hot loop" within which I'd like to avoid memory allocation whenever possible.
When using Eigen/Dense I've been able to organize things so that all my memory allocation is performed in the constructor (with judicious use of .noalias() and in-place LLT factorizations) by computing the sizes of all the workspaces I need in advance and preallocating.
I'd like to do something similar with the Eigen/Sparse variant of the code (to the extent that it is possible). The constructor will require the user to supply sparsity patterns for all the data, which I can use to determine the sizes and sparsity patterns of all the subsequent matrices I need, workspaces included. I need to perform the following kinds of operations:
Sparse matrix-vector products
Cholesky factorizations
Schur complements of the form E = H + A' * B * A, where B is a diagonal matrix and H, A are general sparse matrices.
My current understanding is as follows:
I can use x.noalias() = A*y (A sparse; x, y dense vectors) for matrix-vector products with no problems.
I can perform matrix addition using coefficient-wise tricks by padding with explicit zeros, as in e.g. How to avoid memory allocations in sparse expressions with Eigen. This is mostly for operations like B = A + s*I, where A is sparse and s*I is a scalar multiple of the identity: I can just make sure the main diagonal is included in the sparsity pattern of B and perform the addition in a loop (see the sketch after this list).
As of 2017, there is no way to avoid temporary allocation in sparse matrix products, e.g. C = A*B (https://forum.kde.org/viewtopic.php?f=74&t=140031&p=376933&hilit=sparse+memory#p376933), and I don't see any in-place functions for sparse matrices yet, so I'll have to bite the bullet on this one and accept the temporary creation.
For licensing reasons, I'm using an LDL' factorization package external to Eigen, which allows me to preallocate based on a symbolic analysis stage.
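For point 2, here is a minimal sketch of the diagonal-update loop, assuming B already holds A's values and that every diagonal entry is present in B's sparsity pattern (possibly as an explicitly stored zero); the helper name addScaledIdentityInPlace is my own:

#include <Eigen/Sparse>

// Adds s*I to B without any allocation, by updating only the stored
// diagonal entries of the sparse matrix.
void addScaledIdentityInPlace(Eigen::SparseMatrix<double>& B, double s) {
    for (int k = 0; k < B.outerSize(); ++k)
        for (Eigen::SparseMatrix<double>::InnerIterator it(B, k); it; ++it)
            if (it.row() == it.col())
                it.valueRef() += s;  // in-place update of an existing entry
}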
Can anyone suggest a fast way to organize the Schur computation E = H + A' * B * A? (exploiting that I know its sparsity structure in advance)
In the Eigen library, I know that there are visitors and reductions for the dense Eigen::Matrix class, which I can use to efficiently compute the 1-norm, inf-norm, etc., something like this:
Eigen::MatrixXd A;
...
A.colwise().lpNorm<1>().maxCoeff(); // 1-norm: max column sum
A.rowwise().lpNorm<1>().maxCoeff(); // inf-norm: max row sum
// etc.
Now I have sparse Eigen::SparseMatrix class. How can I efficiently compute these norms in this case?
You can compute the colwise/rowwise 1-norm using a product with a vector of ones:
(Eigen::RowVectorXd::Ones(A.rows()) * A.cwiseAbs()).maxCoeff(); // 1-norm: max column sum
(A.cwiseAbs() * Eigen::VectorXd::Ones(A.cols())).maxCoeff();    // inf-norm: max row sum
Check the generated assembly to see if this gets sufficiently optimized for your purpose. If not, or if you need other lpNorms, you may need to write two nested loops with sparse iterators.
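Such a loop might look like the following sketch (the helper name infNorm is my own; it assumes a standard column-major SparseMatrix<double>):

#include <cmath>
#include <Eigen/Sparse>

// Inf-norm (max absolute row sum) computed by iterating over the
// stored nonzeros only.
double infNorm(const Eigen::SparseMatrix<double>& A) {
    Eigen::VectorXd rowSums = Eigen::VectorXd::Zero(A.rows());
    for (int k = 0; k < A.outerSize(); ++k)
        for (Eigen::SparseMatrix<double>::InnerIterator it(A, k); it; ++it)
            rowSums(it.row()) += std::abs(it.value());
    return rowSums.maxCoeff();
}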
I am trying to convert some methods implemented with Eigen's dense matrix class (MatrixXd from <Eigen/Dense>) to methods using Eigen's sparse matrix class (SparseMatrix<double> from <Eigen/Sparse>).
Many methods can be converted directly by simply changing MatrixXd to SparseMatrix<double>. However, some cannot.
One problem I met is converting the following element-wise division into a sparse matrix method:
(beta.array() / beta.cwiseAbs().array()).sum()
Originally, beta is declared as MatrixXd beta. Now, if I declare beta as SparseMatrix<double> beta, there is no corresponding array() method to let me do the above.
How can I still perform element-wise operations with a sparse matrix?
Is there an efficient way to convert a dense matrix to a sparse matrix and vice versa?
This is not supported because, rigorously, you would be computing 0/0 for every zero entry. You can work around it if the matrix is in compressed mode; to be sure, call:
beta.makeCompressed();
then map the nonzeros as a dense array:
Map<ArrayXd> a(beta.valuePtr(), beta.nonZeros());
(a / a.abs()).sum();
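Put together, a minimal end-to-end sketch of this workaround (the function name signSum is my own):

#include <Eigen/Sparse>

// Sums the signs of the stored nonzeros of beta, operating on the raw
// value array since SparseMatrix has no array() method.
double signSum(Eigen::SparseMatrix<double>& beta) {
    beta.makeCompressed();  // ensure values are stored contiguously
    Eigen::Map<Eigen::ArrayXd> a(beta.valuePtr(), beta.nonZeros());
    return (a / a.abs()).sum();
}

As for converting between dense and sparse, Eigen provides dense.sparseView() for dense-to-sparse, and sparse.toDense() (or constructing a MatrixXd from the sparse matrix) for the other direction.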
I'm writing a program with Armadillo C++ (4.400.1).
I have a matrix that has to be sparse and complex, and I want to calculate its inverse. Since it is sparse, a pseudoinverse could do, but I can guarantee that the matrix has a full diagonal.
The API documentation of Armadillo mentions the method .i() for calculating the inverse of any matrix, but sp_cx_mat members do not have such a method, and the inv() and pinv() functions apparently cannot handle the sp_cx_mat type.
sp_cx_mat Y;
/*Fill Y ensuring that the diagonal is full*/
sp_cx_mat Z = Y.i();
or
sp_cx_mat Z = inv(Y);
Neither of them works.
I would like to know how to compute the inverse of matrices of sp_cx_mat type.
Sparse matrix support in Armadillo is not complete, and many of the factorizations/complex operations that are available for dense matrices are not available for sparse matrices. There are a number of reasons for this, the largest being that efficient complex operations, such as factorizations for sparse matrices, are still very much an open research field. So, there is no .i() function available for sp_cx_mat or other sp_mat types. Another reason is lack of time on the part of the sparse matrix developers (...which includes me).
Given that the inverse of a sparse matrix is generally going to be dense, you may simply be better off converting your sp_cx_mat into a cx_mat and then using the same inversion techniques that you normally would for dense matrices. Since you would be representing the result as a dense matrix anyway, it's a fair assumption that you have enough RAM to do that.
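A minimal sketch of that dense fallback, assuming the densified matrix fits in RAM (the sizes and values are made up for illustration):

#include <armadillo>
#include <complex>

int main() {
    // Build a small sparse complex matrix with a full diagonal.
    arma::sp_cx_mat Y(3, 3);
    Y(0, 0) = std::complex<double>(2.0,  1.0);
    Y(1, 1) = std::complex<double>(3.0, -1.0);
    Y(2, 2) = std::complex<double>(1.0,  0.5);

    arma::cx_mat Yd(Y);              // sparse -> dense conversion
    arma::cx_mat Z = arma::inv(Yd);  // dense inverse
    return 0;
}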
I am using the Armadillo C++ library for solving linear systems of medium/large dimensions (1000-5000 equations).
Since I have to solve several linear systems
A x = b
in which A is always the same and b changes, I would like to LU-factorize A only once and reuse the factorization with different b vectors. Unfortunately, I do not know how to perform this kind of operation in Armadillo.
What I did was just the LU factorization of the A matrix:
arma::mat A;
// ... fill the A matrix ...
arma::mat P,L,U;
arma::lu(L, U, P, A); // P*A = L*U, i.e. A = P.t()*L*U
But now I would like to use the matrices P, L and U to solve several linear systems with different b vectors.
Could you help me please?
Since A = P.t()*L*U (where the equality is only approximate due to rounding errors), solving for x in P.t()*L*U*x = b requires permuting the rows of b and then performing forward and back substitution:
x = solve(trimatu(U), solve(trimatl(L), P*b));
Due to the lack of a true triangular solver in Armadillo, and of a fast way to perform row permutations, this procedure will not be very efficient compared to a direct call to the relevant computational LAPACK subroutines.
The general advice is to avoid explicit LU decomposition in higher-level libraries like Armadillo:
if all the different b's are known at the same time, store them as columns in a rectangular matrix B and compute X = solve(A, B);
if the different b's are known one at a time, then precomputing AINV = A.i(); and computing x = AINV*b; will be more efficient if the number of right-hand-side vectors is big enough. See this answer to a similar question.
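Put together, a minimal sketch of the factor-once, solve-many pattern (the sizes and the number of right-hand sides are made up for illustration):

#include <armadillo>

int main() {
    const arma::uword n = 1000;
    // Diagonally dominant random system, well-conditioned by construction.
    arma::mat A = arma::randu<arma::mat>(n, n) + arma::eye(n, n) * double(n);
    arma::mat P, L, U;
    arma::lu(L, U, P, A);  // factorize once: A = P.t()*L*U
    for (int i = 0; i < 5; ++i) {
        arma::vec b = arma::randu<arma::vec>(n);
        // permute b, then forward and back substitution
        arma::vec x = arma::solve(arma::trimatl(L), P * b);
        x = arma::solve(arma::trimatu(U), x);
    }
    return 0;
}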