Eigen::Map<Sparse> for COO-SpMV - c++

Here is my question in short:
What is the correct code for generating a map for an (unsorted) coo-matrix in Eigen (C++)?
The following code succeeds at generating a map A_map for a compressed row storage (crs/csr) format sparse matrix that is stored in a crs_structure A1. (I use METIS notation: m=rows, n=cols, nnz=#nonzeros.)
Eigen::Map<Eigen::SparseMatrix< double,Eigen::RowMajor,myInt>> A_map(A1.m,A1.n,A1.nnz,A1.adj,A1.adjncy,A1.values,NULL );
I use the following code in an attempt to generate a map A_map for a coordinate storage (coo) format sparse matrix that is stored in a coo_structure A2. Here ptrI, ptrJ, ptrV are int64*, int64*, double*, giving the row and column coordinates of the values in ptrV.
Eigen::Map<Eigen::SparseMatrix< double,Eigen::RowMajor,myInt>> A_map(A2.m,A2.n,A2.nnz,A2.ptrI,A2.ptrJ,A2.ptrV,innerNonZerosPtr);
I need the map because I want to benchmark Eigen's sparse matrix vector product (matvec) against mine.
In general, none of the indices of A are sorted.
Otherwise, the csr format could be created from the coo format in $\mathcal{O}(nnz)$, circumventing the issue.
That is not an option here because sorting the indices consumes far more time than computing the matvec.
Side note:
I have not understood what "innerNonZerosPtr" means; I could not find an actual explanation of it in the Eigen documentation.
Possibly, understanding its intention and purposeful use in my scenario could solve my problem.
Cheers, and many thanks in advance for any help.

There is an example here: sparseTutorial.
For the row-major case, the innerNnz vector stores the number of non-zero elements in each row.
If the matrix is in compressed form, innerNnz is not required.
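To make that concrete, here is a minimal sketch (assuming Eigen 3.3 or later; the array names outer, inner, vals, innerNnz are made up for illustration) of mapping existing row-major CSR arrays both with and without the innerNonZerosPtr argument:

#include <Eigen/Sparse>
#include <cstdint>
#include <iostream>

using SpMat = Eigen::SparseMatrix<double, Eigen::RowMajor, std::int64_t>;

int main()
{
    // 3x4 matrix with 5 non-zeros, already in compressed CSR form:
    // row 0: (0,1.0) (2,2.0)   row 1: (1,3.0)   row 2: (0,4.0) (3,5.0)
    std::int64_t outer[] = {0, 2, 3, 5};      // row pointers, m+1 entries
    std::int64_t inner[] = {0, 2, 1, 0, 3};   // column indices, nnz entries
    double       vals[]  = {1.0, 2.0, 3.0, 4.0, 5.0};

    // Compressed form: innerNonZerosPtr may be omitted (it defaults to null).
    Eigen::Map<SpMat> A(3, 4, 5, outer, inner, vals);

    // Non-compressed form: pass the per-row non-zero counts explicitly.
    std::int64_t innerNnz[] = {2, 1, 2};
    Eigen::Map<SpMat> A_nc(3, 4, 5, outer, inner, vals, innerNnz);

    // The mapped matrix can be used directly in Eigen's SpMV for benchmarking.
    Eigen::VectorXd x = Eigen::VectorXd::Ones(4);
    Eigen::VectorXd y = A * x;
    std::cout << y.transpose() << std::endl;  // 3 3 9
    return 0;
}

Note that in both variants the first index array is still a per-row outer pointer rather than a per-entry row index, so this sketch covers the CSR case from the question, not raw COO triplets.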

Related

Do iterators of SpMat<Type> in Armadillo only visit non-zero entries?

I was wondering how to loop through all the non-zero entries of a sp_umat (i.e., SpMat<unsigned int>) in Armadillo, and came across this related question (link). That post suggests using a const_iterator to retrieve the non-zero locations and values in an sp_mat. Can one assume that all iterators of sp_mat (and other related sparse matrix types in Armadillo; sp_umat in my case) visit only the non-zero entries? I was not able to get this sorted out from the documentation. Another related question also comes to mind: in general, does Armadillo support visiting any other locations in a sparse matrix at all by other means? Thanks very much for the help!
1) Yes, all iterators of sparse objects only iterate over nonzero locations. I'm sorry that isn't clear in the documentation; I'll see if that can be improved.
2) Yes, you can access any location in a sparse matrix with matrix(i, j) just like dense matrices. So in that sense the sparse and dense matrices are somewhat interchangeable.
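A short sketch of both points (the matrix contents and positions below are made up for illustration):

#include <armadillo>
#include <iostream>

int main()
{
    arma::sp_umat X(5, 5);
    X(1, 2) = 7;
    X(3, 0) = 4;

    // (1) iterators visit only the stored non-zero entries
    const arma::sp_umat& Xc = X;  // const ref so begin() yields a const_iterator
    for (arma::sp_umat::const_iterator it = Xc.begin(); it != Xc.end(); ++it)
    {
        std::cout << "value " << (*it)
                  << " at (" << it.row() << ", " << it.col() << ")\n";
    }

    // (2) element access works at any location; unstored entries read as 0
    std::cout << "X(0,0) = " << Xc(0, 0) << std::endl;
    return 0;
}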

C++ armadillo sparse matrix batch insertion

I am looking at batch insertion for sparse matrices in Armadillo, described in the docs: http://arma.sourceforge.net/docs.html#batch_constructors_sp_mat
It defines form1 as:
form 1: sp_mat(rowind, colptr, values, n_rows, n_cols)
What does colptr hold? If I understand correctly, should it hold the actual addresses of whatever columns we want to insert at?
It seems strange to me that rowind does not hold pointers but colptr does. Any reason for this?
Armadillo uses the standard Compressed Sparse Column (CSC) format for storing sparse matrix data. The format is also known as Compressed Column Storage (CCS) and Harwell-Boeing. The row indices and column pointers are explained on several sites:
Wikipedia: http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29
Netlib: http://netlib.org/linalg/html_templates/node92.html
http://www.cs.colostate.edu/~mroberts/toolbox/c++/sparseMatrix/sparse_matrix_compression.html
The CSC format is used for compatibility with existing sparse solvers, etc.
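For illustration, here is a rough sketch of form 1 with made-up data (assuming, per the docs, that rowind and colptr are uvec and values is a vec); colptr holds offsets into values/rowind marking where each column starts, not memory addresses:

#include <armadillo>

int main()
{
    // Target 3x3 matrix in CSC form:
    //   [ 1 0 4 ]
    //   [ 0 2 0 ]
    //   [ 0 3 5 ]
    arma::uvec rowind = {0, 1, 2, 0, 2};           // row index of each stored value
    arma::uvec colptr = {0, 1, 3, 5};              // column j occupies values[colptr[j] .. colptr[j+1])
    arma::vec  values = {1.0, 2.0, 3.0, 4.0, 5.0};

    arma::sp_mat X(rowind, colptr, values, 3, 3);  // form 1 batch constructor
    X.print("X:");
    return 0;
}

So rowind stores one row index per value, while colptr stores one cumulative offset per column (plus a final entry equal to the number of non-zeros), which is exactly the CSC layout described in the links above.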

What is a fast matrix or two-dimensional array to store an adjacency matrix in C++

I'm trying to infer a Markov chain of a process I can only simulate. The number of states/vertices that the final graph will contain is very large, but I do not know the number of vertices in advance.
Right now I have the following:
My simulation outputs a boost::dynamic_bitset containing 112 bits every timestep.
I use the bitset as a key in a Google Sparse Hash to map to an integer value that can be used as an index to the adjacency matrix I want to construct.
Now I need a good/fast matrix or two-dimensional array to store integers. It should:
Use the integer values I stored in the Google Sparse Hash as row/column numbers (e.g., I want to access/change a stored integer by doing something like matrix(3,4) = 3).
I do not know the number of rows or columns I will need in advance, so it should be able to just add rows and columns on the fly.
Most values will be 0, so it should probably be a sparse implementation of something.
The number of rows and columns will be very large, so it should be very fast.
Simple to use. I don't need a lot of mathematical operations; it should just be a fast and simple way to store and access integers.
I hope I put my question clear enough.
I'd recommend http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/matrix_sparse.htm -- boost UBLAS sparse matrices. There are several different implementations of sparse matrix storages, so reading the documentation can help you choose a type that's right for your purpose. (TLDR: sparse matrices have either fast retrieval or fast insertion.)
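As a rough sketch of what that could look like (mapped_matrix chosen here for cheap insertion; the sizes are placeholders, and the resize-with-preserve call assumes the usual uBLAS resize signature):

#include <boost/numeric/ublas/matrix_sparse.hpp>
#include <iostream>

int main()
{
    namespace ublas = boost::numeric::ublas;

    // mapped_matrix stores only the non-zero entries (a map underneath),
    // so a large logical size is cheap as long as the matrix stays sparse.
    ublas::mapped_matrix<int> m(1000, 1000);

    m(3, 4) = 3;                  // insertion and access via operator()
    m(999, 0) = 7;

    m.resize(5000, 5000, true);   // grow the logical size later, preserving entries

    const ublas::mapped_matrix<int>& cm = m;
    std::cout << cm(3, 4) << " " << cm(0, 0) << std::endl;  // prints "3 0"
    return 0;
}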

multi-dimensional Sparse Matrix Compression

Can anybody suggest a good C++ library for storing a multi-dimensional sparse matrix that focuses on compressing the data in the matrix? The number of dimensions of the matrix will be huge (say, 80 dimensions). Any help is most welcome :).
EDIT:
The matrix is highly sparse, with a non-zero density on the order of 0.0000001 (i.e., $1 \times 10^{-7}$).
In C# I have used key-value pairs ("dictionaries") to store sparsely populated arrays. I think for 80 dimensions you would have to construct a string-based key. Use a single function to create the key so it all remains consistent: simply concatenate a comma-separated list of the dimension indices. Unfortunately, I'm not aware of a good key-value/dictionary library for C++. Possibly the STL, if you have used it before, but I would not recommend it otherwise.
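A minimal C++ sketch of that idea, using std::unordered_map with a comma-separated string key (the helper name make_key is made up for illustration):

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Build one consistent string key from the coordinate list, as suggested above:
// a comma-separated concatenation of the indices.
std::string make_key(const std::vector<long>& coords)
{
    std::string key;
    for (std::size_t i = 0; i < coords.size(); ++i)
    {
        if (i > 0) key += ',';
        key += std::to_string(coords[i]);
    }
    return key;
}

int main()
{
    // Sparse 80-dimensional "matrix": only entries that are set consume memory.
    std::unordered_map<std::string, double> tensor;

    std::vector<long> coords(80, 0);
    coords[0] = 3;
    coords[79] = 17;

    tensor[make_key(coords)] = 2.5;                       // insert
    std::cout << tensor[make_key(coords)] << std::endl;   // look up -> 2.5
    return 0;
}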

What are the differences between the various boost ublas sparse vectors?

In boost::numeric::ublas, there are three sparse vector types.
I can see that the mapped_vector is essentially a std::map from index to value, which treats all not-found values as 0 (or whatever the common value is).
But the documentation is sparse (ha ha) on information about compressed_vector and coordinate_vector.
Is anyone able to clarify? I'm trying to figure out the algorithmic complexity of adding items to the various vectors, and also of dot products between two such vectors.
A very helpful answer offered that compressed_vector is very similar to compressed_matrix. But it seems that, for example, compressed row storage is only for storing matrices -- not just vectors.
I see that unbounded_array is the storage type, but I'm not quite sure what the specification is for that, either. If I create a compressed_vector with size 200,000,000, but with only 5 non-zero locations, is this less efficient in any way than creating a compressed_vector with size 10 and 5 non-zero locations?
Many thanks!
Replace "matrix" with "vector" and you have the answers:
http://www.guwi17.de/ublas/matrix_sparse_usage.html
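Putting that advice into code, here is a rough sketch of the three sparse vector types side by side (the size and indices are arbitrary; the complexity remarks in the comments follow the usual map/compressed/coordinate trade-offs described on that page):

#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/vector_sparse.hpp>
#include <iostream>

int main()
{
    namespace ublas = boost::numeric::ublas;

    // Huge logical size; only the stored non-zeros consume memory.
    const std::size_t n = 200000000;

    ublas::mapped_vector<double>     a(n);  // std::map-backed: logarithmic random insert
    ublas::compressed_vector<double> b(n);  // sorted index/value arrays: fast lookup, shifting on out-of-order insert
    ublas::coordinate_vector<double> c(n);  // unsorted entries, sorted lazily: cheap appends

    for (std::size_t i = 0; i < 5; ++i)
    {
        const std::size_t idx = i * 1000 + 7;  // arbitrary scattered positions
        a(idx) = 1.0;
        b(idx) = 1.0;
        c(idx) = 1.0;
    }

    std::cout << ublas::inner_prod(a, b) << std::endl;  // dot product of two sparse vectors: 5
    return 0;
}

As far as I can tell, the storage these containers allocate grows with the number of stored non-zeros rather than with the logical size, so a compressed_vector of size 200,000,000 with 5 non-zeros should not be meaningfully more expensive than one of size 10.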