C++ Armadillo sparse matrix batch insertion

I am looking at batch insertion for sparse matrices in Armadillo, in the docs: "http://arma.sourceforge.net/docs.html#batch_constructors_sp_mat".
It defines form1 as:
form 1: sp_mat(rowind, colptr, values, n_rows, n_cols)
What does colptr hold? If I understand it correctly, should it contain the actual addresses of whatever columns we want to insert at? It seems strange to me that rowind holds indices while colptr holds pointers. Is there a reason for this?

Armadillo uses the standard Compressed Sparse Column (CSC) format for storing sparse matrix data. The format is also known as Compressed Column Storage (CCS) and Harwell-Boeing. The row indices and column pointers are explained on several sites:
Wikipedia: http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29
Netlib: http://netlib.org/linalg/html_templates/node92.html
http://www.cs.colostate.edu/~mroberts/toolbox/c++/sparseMatrix/sparse_matrix_compression.html
The CSC format is used for compatibility with existing sparse solvers, etc.

Related

Eigen::Map<Sparse> for COO-SpMV

Here is my question in short:
What is the correct code for generating a map for an (unsorted) COO matrix in Eigen (C++)?
The following code succeeds at generating a map A_map for a sparse matrix in compressed row storage (CRS/CSR) format, stored in a crs_structure A1. (I use METIS notation: m = rows, n = cols, nnz = number of nonzeros.)
Eigen::Map<Eigen::SparseMatrix< double,Eigen::RowMajor,myInt>> A_map(A1.m,A1.n,A1.nnz,A1.adj,A1.adjncy,A1.values,NULL );
I use the following code in an attempt to generate a map A_map for a sparse matrix in coordinate storage (COO) format, stored in a coo_structure A2. ptrI, ptrJ, ptrV are int64*, int64*, double*, giving the row and column coordinates of the values in ptrV.
Eigen::Map<Eigen::SparseMatrix< double,Eigen::RowMajor,myInt>> A_map(A2.m,A2.n,A2.nnz,A2.ptrI,A2.ptrJ,A2.ptrV,innerNonZerosPtr);
I need the map because I want to benchmark Eigen's sparse matrix vector product (matvec) against mine.
In general, none of the indices of A are sorted.
Otherwise, the CSR format could be created from the COO format in O(nnz), circumventing the issue.
That is not an option here because sorting the indices consumes far more time than computing the matvec.
Side note:
I have not understood what "innerNonZerosPtr" means; I could not find an explanation of it in the Eigen documentation.
Possibly, understanding its intent and its proper use in my scenario could solve my problem.
Cheers, and many thanks in advance for any help.
There is an example here: sparseTutorial
In the row-major case, the innerNnz vector stores the number of non-zero elements of each row.
If the matrix is in compressed form, innerNnz is not required (a null pointer can be passed).

sparse or dense storage of a matrix

I'm working with large sparse matrices that are not exactly very sparse, and I'm always wondering how much sparsity is required for sparse storage of a matrix to be beneficial. We know that the sparse representation of a reasonably dense matrix can be larger than the original. So is there a threshold for the density of a matrix below which it is better to store it as sparse? I know the answer to this question usually depends on the structure of the sparsity, etc., but are there any guidelines? For example, I have a very large matrix with density around 42%. Should I store this matrix as dense or sparse?
The scipy.coo_matrix format stores the matrix as 3 np.arrays: row and col are integer indices, and data has the same data type as the equivalent dense matrix. So it should be straightforward to calculate the memory it will take as a function of overall shape and sparsity (as well as the data type).
csr_matrix may be more compact. data and indices are the same as with coo, but indptr has a value for each row plus 1. I was thinking that indptr would be shorter than the others, but I just constructed a small matrix where it was longer. An empty row, for example, requires a value in indptr, but none in data or indices. The emphasis with this format is computational efficiency.
csc is similar, but works with columns. Again, you should be able to do the math to calculate its size.
A brief discussion of the memory advantages from MATLAB (which uses similar storage options):
http://www.mathworks.com/help/matlab/math/computational-advantages.html#brbrfxy
A background paper from the MATLAB designers:
http://www.mathworks.com/help/pdf_doc/otherdocs/simax.pdf
SPARSE MATRICES IN MATLAB: DESIGN AND IMPLEMENTATION

What is a fast matrix or two-dimensional array to store an adjacency matrix in C++?

I'm trying to infer a Markov chain of a process I can only simulate. The number of states/vertices that the final graph will contain is very large, but I do not know the number of vertices in advance.
Right now I have the following:
My simulation outputs a boost::dynamic_bitset containing 112 bits every timestep.
I use the bitset as a key in a Google Sparse Hash to map to an integer value that can be used as an index to the adjacency matrix I want to construct.
Now I need a good/fast matrix or two-dimensional array to store integers. It should:
Use the integer values I stored in the Google Sparse Hash as row/column numbers. (E.g. I want to access/change a stored integer by doing something like matrix(3,4) = 3.)
I do not know the number of rows or columns I will need in advance, so it should be able to just add rows and columns on the fly.
Most values will be 0, so it should probably be a sparse implementation of something.
The number of rows and columns will be very large, so it should be very fast.
Simple to use. I don't need a lot of mathematical operations, it should just be a fast and simple way to store and access integers.
I hope I put my question clear enough.
I'd recommend boost uBLAS sparse matrices: http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/matrix_sparse.htm. There are several different implementations of sparse matrix storage, so reading the documentation can help you choose the type that's right for your purpose. (TL;DR: sparse matrices offer either fast retrieval or fast insertion.)

multi-dimensional Sparse Matrix Compression

Can anybody suggest a good C++ library for storing a multi-dimensional sparse matrix that focuses on compressing the data in the matrix? The number of dimensions of the matrix will be huge (say, 80 dimensions). Any help is most welcome :).
EDIT:
The matrix is highly sparse, with density on the order of 0.0000001, i.e. 1×10⁻⁷.
In C# I have used key-value pairs or "dictionaries" to store sparsely populated arrays. I think for 80 dimensions you would have to construct a string-based key. Use a single function to create the key so that it all remains consistent: simply concatenate a comma-separated list of the dimension indices. In C++, the standard library's std::map or std::unordered_map can serve as the dictionary.

Handle a hierarchical sparse matrix in fortran

Introduction
I am developing a Fortran code solving an MHD problem with preconditioning of a linear operator. The sparse matrix to be inverted can be considered a matrix with the following hierarchical structure. The original matrix (say, A_1) is a band matrix of blocks. Each block of A_1 is a sparse matrix (say, A_2) of the same structure (i.e. a block banded matrix). Each block of A_2 is again a block banded matrix with the same sparsity structure, A_3. Each block of A_3 is, finally, a dense 5-by-5 matrix, A_4. I find this hierarchical representation very convenient for initializing the elements of the matrix.
Question
I wonder if there exists a (Fortran) library that can handle such a structure and convert it to one of the standard sparse matrix formats (CSR, CSC, BSR, ...), since Sparse BLAS or MKL Pardiso will be used to invert it. Let me stress that my intention is to use the hierarchical structure only to initialize the elements of the matrix. Of course, the hierarchical structure could be disregarded and the matrix hard-coded in the CSR format, but I find this too time-consuming to implement and test.
Comments
I don't expect a linear solver to use the hierarchical structure, although in S. Pissanetsky, "Sparse Matrix Technology", Academic Press, 1984, page 27 (available online here), such storage schemes are mentioned, namely the "hypermatrix" and "supersparse" schemes, which were used in Gaussian elimination. I have not found available implementations of these schemes yet.
The block compressed sparse row (BSR) format (supported by MKL) can handle two levels of the matrix, A_3 (sparse) + A_4 (dense), but not more.