Preallocating memory for sparse matrix operations in Eigen (C++)

I'm writing a program using Eigen (an optimizer to be precise) that has a "hot loop" within which I'd like to avoid memory allocation whenever possible.
When using Eigen/Dense I've been able to organize things so that all my memory allocation is performed in the constructor (with judicious use of .noalias() and in-place LLT factorizations) by computing the sizes of all the workspaces I need in advance and preallocating.
I'd like to do something similar with the Eigen/Sparse variant of the code (to the extent that it is possible). The constructor will require the user to supply sparsity patterns for all the data, which I can use to determine the sizes and sparsity patterns for all the subsequent matrices I need, workspaces included. I need to perform the following kinds of operations:
Sparse matrix-vector products
Cholesky factorizations
Schur complements of the form E = H + A' * B * A where B is a diagonal matrix and H,A are general sparse matrices.
My current understanding is as follows:
I can use x.noalias() = A*y (A is sparse, x, y dense vectors) for matrix-vector products with no problems.
I can perform matrix addition using coefficient-wise tricks by padding with explicit zeros, as in e.g. How to avoid memory allocations in sparse expressions with Eigen. This is mostly for operations like B = A + s I where A is sparse and sI is a scalar multiple of the identity: I can just make sure the main diagonal is included in the sparsity pattern of B and perform the addition in a loop (see the sketch after this list).
As of 2017, there is no way to avoid temporary allocation in sparse matrix products, e.g. C = A*B (https://forum.kde.org/viewtopic.php?f=74&t=140031&p=376933&hilit=sparse+memory#p376933), and I don't see any in-place functions for sparse matrices yet, so I'll have to bite the bullet on this one and accept the temporary creation.
For licensing reasons, I'm using an LDL' factorization package external to Eigen, which allows me to preallocate based on a symbolic analysis stage.
Can anyone suggest a fast way to organize the Schur computation E = H + A' * B * A, exploiting that I know its sparsity structure in advance?
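For reference, a minimal sketch in Eigen of the operations above (the function names are illustrative, and the sparse products in the Schur expression still allocate temporaries internally, as discussed):

#include <Eigen/Dense>
#include <Eigen/Sparse>

using SpMat = Eigen::SparseMatrix<double>;

// 1) Allocation-free sparse matrix * dense vector product.
void matVec(const SpMat& A, const Eigen::VectorXd& y, Eigen::VectorXd& x) {
    x.noalias() = A * y;                    // x is preallocated, no temporary
}

// 2) B += s*I without reallocation, assuming B already holds a copy of A whose
//    sparsity pattern includes every main-diagonal entry (padded with explicit
//    zeros where A itself has none).
void addScaledIdentityInPlace(SpMat& B, double s) {
    for (int i = 0; i < B.rows(); ++i)
        B.coeffRef(i, i) += s;              // entry already exists, no insertion
}

// 3) Straightforward (temporary-allocating) evaluation of E = H + A' * B * A
//    with B diagonal; each sparse product below creates a temporary.
void schur(const SpMat& H, const SpMat& A, const Eigen::VectorXd& bdiag, SpMat& E) {
    SpMat BA   = bdiag.asDiagonal() * A;    // scale the rows of A by B's diagonal
    SpMat AtBA = SpMat(A.transpose()) * BA; // sparse-sparse product (allocates)
    E = H + AtBA;                           // sparse sum (allocates)
}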

Related

Constructing sparse tridiagonal matrix in Eigen

How do I construct a sparse tridiagonal matrix in Eigen? The matrix that I want to construct looks like this in Python:
alpha = 0.5j/dx**2
off_diag = alpha*np.ones(N-1)
A_fixed = sp.sparse.diags([-off_diag,(1/dt+2*alpha)*np.ones(N),-off_diag],[-1,0,1],format='csc')
How do I do it in C++ using the Eigen package? It looks like I need to use the 'triplet' as documented here, but are there easier ways to do this, considering that this should be a fairly common operation?
Another side question is whether I should use row-major or column-major. I want to solve the matrix equation Ax=b, where A is a tridiagonal matrix. When we do matrix-vector multiplication by hand, we usually multiply each row of the matrix by the column vector, so storing the matrix in row-major order seems to make more sense. But what about a computer? Which one is preferred if I want to solve Ax=b?
Thanks
The triplets are the designated method of setting up a sparse matrix.
You could go the even more straightforward way and use A.coeffRef(row, col) = val or A.insert(row, col) = val, i.e. fill the matrix element-by-element.
Since you have a tridiagonal system you know the number of non-zeros of the matrix beforehand and can reserve the space using A.reserve(Nnz).
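A triplet-based construction of the tridiagonal matrix from the question might look like this (a sketch; N, dt, dx and the scalar type are assumed from the Python snippet):

#include <complex>
#include <vector>
#include <Eigen/Sparse>

typedef std::complex<double> cd;
typedef Eigen::SparseMatrix<cd> SpMat;           // column-major by default

SpMat build_tridiagonal(int N, double dt, double dx) {
    const cd alpha = cd(0.0, 0.5) / (dx * dx);   // 0.5j/dx**2
    const cd diag  = cd(1.0 / dt, 0.0) + 2.0 * alpha;
    std::vector<Eigen::Triplet<cd>> trip;
    trip.reserve(3 * N - 2);                     // exact non-zero count
    for (int i = 0; i < N; ++i) {
        trip.emplace_back(i, i, diag);
        if (i + 1 < N) {
            trip.emplace_back(i, i + 1, -alpha); // super-diagonal
            trip.emplace_back(i + 1, i, -alpha); // sub-diagonal
        }
    }
    SpMat A(N, N);
    A.setFromTriplets(trip.begin(), trip.end());
    return A;
}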
A dumb way, which nevertheless works, is:
// CSRMat is assumed to be a row-major sparse matrix, e.g.
// typedef Eigen::SparseMatrix<double, Eigen::RowMajor> CSRMat;
const int N = 1000;
CSRMat U(N, N);
U.reserve(N - 1);                          // space for the N-1 super-diagonal entries
for (int j = 0; j < N - 1; ++j)
    U.insert(j, j + 1) = -1;               // super-diagonal
CSRMat D(N, N);
D.setIdentity();
D *= 2;                                    // main diagonal
CSRMat A = U + CSRMat(U.transpose()) + D;  // tridiagonal result
As to the solvers and preferred storage order, that is, as I recall, of minor importance. Whilst C(++) stores contiguous data in row-major format, it is up to the algorithm whether the data is accessed in an optimal way (row-by-row for row-major storage order). The correctness of an algorithm does not, as a rule, depend on the storage order of the data; its performance depends on how well the storage order matches the actual data access pattern.
If you intend to use Eigen's own solvers stick with its default choice (col-major). If you intend to interface with other libraries (e.g. ARPACK) choose the storage order the library prefers/requires.

Right function for computing a limited number of eigenvectors of a complex symmetric matrix in Armadillo

I am using the Armadillo library to manually port a piece of Matlab code. The Matlab code uses the eigs() function to find a small number (~3) of eigenvectors of a relatively large (200x200) covariance matrix R. The code looks like this:
[E,D] = eigs(R,3,"lm");
In Armadillo there are two functions, eigs_sym() and eigs_gen(); however, the former only supports real symmetric matrices and the latter requires ARPACK (I'm building the code for Android). Is there a reason eigs_sym() doesn't support complex matrices? Is there any other way to find the eigenvectors of a complex symmetric matrix?
The eigs_sym() and eigs_gen() functions (where the s in eigs stands for sparse) in Armadillo are for large sparse matrices. A "large" size in this context is roughly 5000x5000 or larger.
Your R matrix has a size of 200x200. This is very small by current standards. It would be much faster to simply use the dense eigendecomposition eig_sym() or eig_gen() functions to get all the eigenvalues / eigenvectors, followed by extracting a subset of them using submatrix operations like .tail_cols().
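For instance, a sketch of the dense route for a complex (non-Hermitian) R, keeping the three largest-magnitude eigenpairs to mimic eigs(R,3,"lm") (the function name and the sort_index step are illustrative additions):

#include <armadillo>

// Dense replacement for eigs(R,3,"lm") on a complex matrix: compute all
// eigenpairs, then keep the three with the largest-magnitude eigenvalues.
arma::cx_mat top3_eigvecs(const arma::cx_mat& R) {
    arma::cx_vec eigval;
    arma::cx_mat eigvec;
    arma::eig_gen(eigval, eigvec, R);                                // full decomposition
    arma::uvec order = arma::sort_index(arma::abs(eigval), "descend");
    return eigvec.cols(order.head(3));                               // largest |lambda| first
}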
Have you tested constructing a 400x400 real symmetric matrix by replacing each complex value, a+bi, by a 2x2 matrix [a,b;-b,a] (alternatively using a block variant of this)?
This should construct a real symmetric matrix that in some way corresponds to the complex one.
There will be a slow-down due to the larger size, and all eigenvalues will be duplicated (which may slow down the algorithm), but it seems straightforward to test.
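If one wanted to try that, a sketch of the block-wise embedding in Armadillo (the helper name is illustrative):

#include <armadillo>

// Embed a complex matrix C = A + iB into the 2Nx2N real matrix [A, B; -B, A],
// i.e. the a+bi -> [a, b; -b, a] mapping applied block-wise.
arma::mat real_embedding(const arma::cx_mat& C) {
    arma::mat A = arma::real(C);
    arma::mat B = arma::imag(C);
    return arma::join_cols(arma::join_rows(A,  B),
                           arma::join_rows(-B, A));
}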

Armadillo complex sparse matrix inverse

I'm writing a program with Armadillo C++ (4.400.1)
I have a matrix that has to be sparse and complex, and I want to calculate the inverse of such a matrix. Since it is sparse it could be the pseudoinverse, but I can guarantee that the matrix has a fully populated diagonal.
The API documentation of Armadillo mentions the method .i() for calculating the inverse of any matrix, but the sp_cx_mat class does not provide such a method, and the inv() and pinv() functions apparently cannot handle the sp_cx_mat type.
sp_cx_mat Y;
/*Fill Y ensuring that the diagonal is full*/
sp_cx_mat Z = Y.i();
or
sp_cx_mat Z = inv(Y);
Neither of them works.
I would like to know how to compute the inverse of matrices of sp_cx_mat type.
Sparse matrix support in Armadillo is not complete, and many of the factorizations/complex operations that are available for dense matrices are not available for sparse matrices. There are a number of reasons for this, the largest being that efficient complex operations such as factorizations for sparse matrices are still very much an open research field. So, there is no .i() function available for sp_cx_mat or other sp_mat types. Another reason for this is lack of time on the part of the sparse matrix developers (...which includes me).
Given that the inverse of a sparse matrix is generally going to be dense, you may simply be better off turning your sp_cx_mat into a cx_mat and then using the same inversion techniques that you normally would for dense matrices. Since you are planning to represent the result as a dense matrix anyway, it's a fair assumption that you have enough RAM to do that.
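In code, that conversion-then-inversion could look like this (a sketch; conv_to handles the sparse-to-dense conversion):

#include <armadillo>

arma::sp_cx_mat Y;
/* fill Y, ensuring that the diagonal is fully populated */
arma::cx_mat Yd = arma::conv_to<arma::cx_mat>::from(Y);  // sparse -> dense
arma::cx_mat Z  = arma::inv(Yd);                         // dense inverse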

Partitioned Matrix-Vector Multiplication

Given a very sparse nxn matrix A with nnz(A) non-zeros and a dense nxn matrix B, I would like to compute the matrix product AxB. Since n is very large, if carried out naively the dense matrix B cannot fit into memory. I have the following two options, but I am not sure which one is better. Could you give some suggestions? Thanks.
Option 1. I partition the matrix B into n column vectors [b1,b2,...,bn]. Then I can fit matrix A and any single vector bi into memory, and calculate A*b1, A*b2, ..., A*bn, respectively (see the sketch below).
Option 2. I partition the matrices A and B each into four n/2 x n/2 blocks, and then use block matrix-matrix multiplications to calculate A*B.
Which of the above choices is better? Can I say that Option 1 has higher performance in parallel computation?
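For reference, Option 1 could be sketched in Eigen roughly as follows (the load/save callbacks stand in for whatever out-of-core storage holds B and the result; they are hypothetical):

#include <functional>
#include <Eigen/Dense>
#include <Eigen/Sparse>

// Option 1: stream B column by column so that only A and one column of B are
// resident at any time.
void multiply_columnwise(const Eigen::SparseMatrix<double>& A, int n,
                         const std::function<void(int, Eigen::VectorXd&)>& load,
                         const std::function<void(int, const Eigen::VectorXd&)>& save) {
    Eigen::VectorXd bi(n), ci(n);
    for (int i = 0; i < n; ++i) {
        load(i, bi);               // fetch column i of B
        ci.noalias() = A * bi;     // sparse matrix * dense vector
        save(i, ci);               // store column i of A*B
    }
}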
See a discussion of both approaches, though for two dense matrices, in this document from the ScaLAPACK documentation. ScaLAPACK is one of the reference tools for distributed linear algebra.

Efficient solution of linear system Ax = b when only one of the constant terms changes

How does one solve a large system of linear equations efficiently when only a few of the constant terms change? For example:
I currently have the system Ax = b. I compute the inverse of A once, store it in a matrix, and each time any entry of b updates I perform a matrix-vector multiplication A^-1 b to recompute x.
This is inefficient, as only a couple of entries would have been updated in b. Are there more efficient ways of solving this system when A^-1 remains constant but specific known values change in b?
I use uBlas and Eigen, but I am not aware of solutions that would address this problem of selective recalculation. Thanks for any guidance.
Compute A^-1. If b_i is the i-th component of b, then examine d/db_i (A^-1 b) (the derivative of A^-1 b with respect to the i-th component of b) -- it equals a column of A^-1 (in particular, the i-th column). And derivatives of linear functions are constant over their domain. So if you have b and b', and they differ only in the i-th component, then A^-1 b - A^-1 b' = [d/db_i (A^-1 b)] * (b - b')_i. For multiple components, just add them up (as A^-1 is linear).
Or, in short, you can calculate A^-1 (b'-b) with some optimizations for input components that are zero (which, if only some components change, will be most of the components). A^-1 b' = A^-1 (b'-b) + A^-1 (b). And if you know that only some components will change, you can take a copy of the appropriate column of A^-1, then multiply it by the change in that component of b.
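A minimal sketch of that column-based update with Eigen (assuming, as in the question, that A^-1 has been computed and stored as a dense matrix; the function name is illustrative):

#include <Eigen/Dense>

// x was previously computed as x = Ainv * b. If only component i of b changes
// by delta, update x using the i-th column of Ainv instead of redoing the full
// matrix-vector product.
void update_solution(const Eigen::MatrixXd& Ainv, Eigen::VectorXd& x,
                     int i, double delta) {
    x.noalias() += Ainv.col(i) * delta;
}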
You can take advantage of the problem's linearity:
x0 = A^-1 * b0
x = A^-1 * b = x0 + A^-1 * db
where db is the difference vector between b and b0; it is mostly filled with zeros, so you can compress it into a sparse vector.
The Eigen lib has a lot of cool functions for sparse matrices (multiplication, inverse, ...).
Firstly, don't perform a matrix inversion, use a solver library instead. Secondly, pass your initial x to the library as a first guess.
The library will perform some kind of decomposition like LU, and use that to calculate x. If you choose an iterative solver, then it is already doing pretty much what you describe to home in on the solution; it will begin with a worse guess and generate a better one, and any good routine will take an initial guess to speed up the process. In many circumstances you have a good idea of the result anyway, so it makes sense to exploit that.
If the new b is near the old b, then the new x should be near the old x, and it will serve as a good initial guess.
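A sketch of such a warm-started iterative solve in Eigen (BiCGSTAB here, which handles general square systems; the setup is illustrative):

#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Re-solve A x = b_new starting from the previous solution x_old as the
// initial guess; when b_new is close to the old b, convergence is fast.
Eigen::VectorXd resolve(const Eigen::SparseMatrix<double>& A,
                        const Eigen::VectorXd& b_new,
                        const Eigen::VectorXd& x_old) {
    Eigen::BiCGSTAB<Eigen::SparseMatrix<double>> solver;
    solver.compute(A);
    return solver.solveWithGuess(b_new, x_old);
}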
First, don't compute the matrix inverse; rather, use the LU decomposition, or the QR decomposition (slower than LU but more stable). Such decompositions scale better than inversion performance-wise with the matrix size, and are usually more stable (especially QR).
There are ways to update the QR decomposition if A changes slightly (e.g. by a rank-one matrix), but if b is changed, you have to solve again with the new b -- you cannot escape this, and this is O(n^2).
However, if the right-hand side b only changes by a fixed element, i.e. b' = b + db with db known in advance, you can solve A dx = db once and for all, and then the solution x' of A x' = b' is x + dx.
If db is not known in advance but is always a linear combination of a few db_i vectors, you may solve A dx_i = db_i for each of them; but if you have many such db_i, you end up with an O(n^2) process per right-hand side (this in fact amounts to computing the inverse)...
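A sketch of the factor-once, solve-many pattern in Eigen (dense LU here; the struct and names are illustrative):

#include <Eigen/Dense>

// Factor A once; each subsequent right-hand side then costs only an O(n^2)
// pair of triangular solves, and no explicit inverse is ever formed.
struct ReusableSolver {
    Eigen::PartialPivLU<Eigen::MatrixXd> lu;
    explicit ReusableSolver(const Eigen::MatrixXd& A) : lu(A) {}
    Eigen::VectorXd solve(const Eigen::VectorXd& b) const { return lu.solve(b); }
};

// Usage: given a known change db, solve A dx = db once and reuse x' = x + dx.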