Armadillo: efficient RAM sparse batch insertion - c++

I know that sparse matrix support in Armadillo is still preliminary.
I'm using the Armadillo library in my quantum systems research, and I'm having trouble constructing a sparse matrix in a memory-efficient way.
So far I have been using my own implementation of sparse matrices, but I want to use an optimized matrix class.
I'm filling the elements in batch mode:
umat loc(2,size);
cx_vec val(size);
// calculate loc and val
...
//
sp_cx_mat Hamiltonian(loc, val);
This copies the values from loc and val into the Hamiltonian constructor, so for a few seconds it requires 2x the RAM. I am computing huge matrices (the dimension is about 2**L, where L = 22, 24, ...), so I need the code to be well optimized for memory.
For comparison, matrix size: 705432x705432 - RAM and "filling time":
my implementation (COO format): time 7.95s, memory 317668kB
armadillo (CSC format): time 5.32s, memory 715000kB
Is it possible to deallocate parts of the loc and val vectors on the fly, element by element, to save memory?

The answer here is to use the other sparse matrix constructor, which takes the data directly in CSC format. You will need to modify your // calculate loc and val code so that it fills the following three arrays instead:
values (length equal to number of points)
row_indices (length equal to number of points)
col_ptrs (length equal to number of columns plus one)
The points should be arranged in column-major ordering in the values and row_indices vectors, and each entry of the col_ptrs vector contains the number of nonzero elements before the beginning of that column. That is, col_ptrs[0] will always contain 0, col_ptrs[1] will contain the number of nonzero elements in the first column, col_ptrs[2] will contain the number of nonzero elements in the first and second columns, and the last entry, col_ptrs[n_cols], will contain the number of nonzero elements in the matrix.
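As a rough illustration of the col_ptrs layout described above, the sketch below derives the column pointers from a list of column indices using only the standard library. The function name build_col_ptrs and the use of std::vector are assumptions made for this example, and it presumes the nonzeros are already sorted in column-major order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build CSC column pointers from the column index of every nonzero.
// Assumption: the nonzeros are already sorted in column-major order
// (by column, then by row), so values and row indices can be used as-is.
std::vector<std::size_t> build_col_ptrs(const std::vector<std::size_t>& cols,
                                        std::size_t n_cols)
{
    std::vector<std::size_t> col_ptrs(n_cols + 1, 0);

    // Count the nonzeros of each column, shifted by one slot.
    for (std::size_t c : cols) {
        assert(c < n_cols);
        ++col_ptrs[c + 1];
    }

    // Prefix sum: col_ptrs[j] becomes the number of nonzeros before column j.
    for (std::size_t j = 0; j < n_cols; ++j) {
        col_ptrs[j + 1] += col_ptrs[j];
    }
    return col_ptrs;
}
```

For the column indices {0, 0, 2, 3, 3, 3} and four columns this gives {0, 2, 2, 3, 6}. The resulting arrays would then be handed to the CSC batch constructor, which according to the Armadillo documentation has the form sp_cx_mat(rowind, colptr, values, n_rows, n_cols).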
For more documentation on this constructor, see the "Batch constructors" section of http://arma.sourceforge.net/docs.html#SpMat ; this is the fourth entry in that list.
If you cannot easily modify your calculation code to produce that format, you may be better off passing sort_locations = false to the constructor you are already using, if you are not doing so yet. As far as I recall from the documentation, this assumes the locations are already in column-major order.

Related

Eigen Sparse Vector : Find max coefficient

I am working with sparse vectors in Eigen, and I need an efficient way to compute the index of the max coefficient (or of the nth max coefficient).
My initial method uses Eigen::SparseVector::InnerIterator, but it does not compute the right value when the vector contains only zeros and negative values, because InnerIterator only iterates over the non-zero values.
How can I implement this so that zero values are taken into account?
To get the index of the largest non-zero element, you can use this function:
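Eigen::Index maxRow(Eigen::SparseVector<double> const & v)
{
    // Number of explicitly stored entries (assumed to be at least one here).
    Eigen::Index nnz = v.nonZeros();
    Eigen::Index rowIdx;
    // View the stored values as a dense vector and find the position of the largest one.
    double value = Eigen::VectorXd::Map(v.valuePtr(), nnz).maxCoeff(&rowIdx);
    // requires special handling if value <= 0.0
    // Translate the position in the value array into the actual row index.
    return v.innerIndexPtr()[rowIdx];
}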
If value <= 0 (and v.nonZeros() < v.size()), the maximum is one of the implicit zeros. To find its index, you can iterate through innerIndexPtr() until you find a gap between consecutive stored indices (or write something more sophisticated using std::lower_bound).
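The gap search can be sketched with the standard library alone. Here the stored indices are passed as a std::vector instead of the pointer returned by innerIndexPtr(), the function name first_implicit_zero is made up for this example, and the indices are assumed to be strictly increasing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the smallest index in [0, size) that is not stored, i.e. the
// position of the first implicit zero. Returns size if every index is stored.
// Assumption: `stored` is strictly increasing and holds indices in [0, size).
std::ptrdiff_t first_implicit_zero(const std::vector<std::ptrdiff_t>& stored,
                                   std::ptrdiff_t size)
{
    assert(static_cast<std::ptrdiff_t>(stored.size()) <= size);

    std::ptrdiff_t expected = 0;
    for (std::ptrdiff_t idx : stored) {
        if (idx != expected) {
            return expected;  // gap: `expected` is not stored, so it is a zero
        }
        ++expected;
    }
    // No gap among the stored indices: the first zero, if any, follows them.
    return expected;
}
```

This linear scan stops at the first gap, so it is cheap when the vector is sparse near the beginning. A binary search over the condition stored[i] == i would be the more sophisticated variant.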
For getting the nth largest element it depends on how large your n is relative to the vector size, how many non-zeros you have, if you can modify your SparseVector, etc.
In particular, if n is relatively large, consider partitioning your elements into positive and negative ones, then using std::nth_element on the correct half.
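A minimal sketch of that partitioning idea follows. It works on a plain std::vector of the stored values rather than on an Eigen type, returns the nth largest value instead of its index, and treats every position that is not stored as zero. The name nth_largest and the 1-based n are choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// nth largest value of a sparse vector with `size` entries, of which only
// `stored` are kept explicitly. n is 1-based: n == 1 asks for the maximum.
double nth_largest(const std::vector<double>& stored, std::size_t size, std::size_t n)
{
    assert(stored.size() <= size);
    assert(n >= 1 && n <= size);

    // Split the stored values by sign; explicit zeros count as zeros.
    std::vector<double> pos, neg;
    for (double x : stored) {
        if (x > 0.0) pos.push_back(x);
        else if (x < 0.0) neg.push_back(x);
    }
    const std::size_t zeros = size - pos.size() - neg.size();

    if (n <= pos.size()) {
        // The answer lies among the positive values.
        auto nth = pos.begin() + static_cast<std::ptrdiff_t>(n - 1);
        std::nth_element(pos.begin(), nth, pos.end(), std::greater<double>());
        return *nth;
    }
    n -= pos.size();
    if (n <= zeros) {
        return 0.0;  // the answer is one of the zeros
    }
    n -= zeros;
    // The answer lies among the negative values.
    auto nth = neg.begin() + static_cast<std::ptrdiff_t>(n - 1);
    std::nth_element(neg.begin(), nth, neg.end(), std::greater<double>());
    return *nth;
}
```

To recover the index as well, the same scheme could be applied to (value, index) pairs with a comparator on the value.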
Iterate over the index array (available through innerIndexPtr()) at the same time as the inner iterator.

(effectively) storing a polynomial dynamically

What I am trying to accomplish is to store a polynomial of unknown size using arrays.
What I have seen on the internet is an array where each cell contains a coefficient and the cell index is the degree, but that is not efficient: for a polynomial like 6x^14+x+5, the cells from 2 through 13 would all hold zeros. I've already looked at some solutions with vectors and linked lists, but is there any other way to tackle this problem efficiently, without using std::vector or std::list?
Unless there is a compelling reason to act otherwise (for example, this is a programming assignment where you are required to use C-style arrays), you should use a std::vector from the standard library. Libraries are there for a reason: to make your life easier. The overhead is probably insignificant in the context of your program.
You mention that storing a polynomial (such as 4*x^5 + x - 1) in an std::vector with the indices representing the power (such as [-1, 1, 0, 0, 0, 4]) is inefficient. This is true, but unless you are storing polynomials of degree greater than 1000, this waste is entirely insignificant. For "sparse" polynomials, of high degree but with few coefficients, you could consider using a vector of pairs, with the first value of each pair storing the power and the second value storing the coefficient.
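The vector-of-pairs layout can be sketched as follows. The names Term, int_pow and evaluate, and the choice of double coefficients, are assumptions made for this example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Each term is (power, coefficient); only nonzero terms are stored.
using Term = std::pair<unsigned, double>;

// x raised to the power p, computed by repeated squaring.
double int_pow(double x, unsigned p)
{
    double result = 1.0;
    while (p != 0) {
        if (p & 1u) result *= x;
        x *= x;
        p >>= 1;
    }
    return result;
}

// Evaluate the polynomial at x by summing its stored terms.
double evaluate(const std::vector<Term>& poly, double x)
{
    double sum = 0.0;
    for (const Term& t : poly) {
        assert(t.second != 0.0);  // the sparse form keeps nonzero terms only
        sum += t.second * int_pow(x, t.first);
    }
    return sum;
}
```

With this layout, 6x^14 + x + 5 takes three entries, {14, 6}, {1, 1} and {0, 5}, instead of fifteen array cells.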
A sparse polynomial can be represented with a map, where a zero element is represented by a nonexistent key. Here is an example of such a class:
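#include <map>
//example of sparse integer polynomial
class SparsePolynomial{
    std::map<int,int> coeff;   // degree -> nonzero coefficient
public:
    // Without "public:" the member functions of a class would be private.
    int& operator[](const int& degree);
    int get(int degree);
    void update(int degree, int val);
};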
Whenever you try to get or update the coefficient of an element, its existence in the map is checked. Every time the coefficient of an element is updated, it is checked whether the value is zero. Hence, the size of the map can always be minimal.
We can replace these two methods with operator[]. However, in that case, we would not be able to check for zero during an update operation, thus the storage would not be as efficient as using two separate methods for access and update.
int SparsePolynomial::get(int degree){
    // A missing key means the coefficient is zero.
    std::map<int,int>::iterator it = coeff.find(degree);
    if (it == coeff.end()){
        return 0;
    }else{
        return it->second;
    }
}
void SparsePolynomial::update(int degree, int val){
    if (val == 0){
        // Erase zero coefficients so that the map stays minimal.
        std::map<int,int>::iterator it = coeff.find(degree);
        if (it!=coeff.end()){
            coeff.erase(it);
        }
    }else{
        coeff[degree]=val;
    }
}
While this method gives us more efficient storage, it requires more time for access and update than a vector does. However, in the case of a sparse polynomial, the difference can be small. Given a std::map of size N, looking up an element takes O(log N) time. Suppose you have a sparse polynomial with degree d and N non-zero coefficients. If N is much smaller than d, then the access and update time would be small enough not to notice.

Quick access to a cell in 2D matrix which wraps around

I have a matrix which wraps around.
m_matrixOffset points to the first cell (0, 0) of the wrapped-around matrix. To access a cell we have the function GetCellInMatrix below. The wrap-around logic (in the while loops) is executed each time someone accesses a cell, which happens thousands of times per second. Is there any way to optimize this using a lookup table or some other approach? MAX_ROWS and MAX_COLS may not be powers of 2.
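struct Cell
{
    int rowId;
    int colId;
};

int matData[MAX_ROWS][MAX_COLS];

int GetCellInMatrix(const Cell& cellIndex)
{
    // m_matrixOffset is a Cell holding the position of logical cell (0, 0).
    Cell newCellIndex = { cellIndex.rowId + m_matrixOffset.rowId,
                          cellIndex.colId + m_matrixOffset.colId };
    // Valid indices are 0 .. MAX_ROWS - 1, so wrap with >= rather than >.
    while (newCellIndex.rowId >= MAX_ROWS)
    {
        newCellIndex.rowId -= MAX_ROWS;
    }
    while (newCellIndex.colId >= MAX_COLS)
    {
        newCellIndex.colId -= MAX_COLS;
    }
    return matData[newCellIndex.rowId][newCellIndex.colId];
}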
You might be interested in the concept of division with remainder, usually implemented as a % b for the remainder.
Thus
return matData[newCellIndex.rowId % MAX_ROWS][newCellIndex.colId % MAX_COLS];
does not need the while loops before it.
As per comment, the implied integer division in the remainder computation is too costly if done at each query. Assuming that m_matrixOffset is constant over a large number of queries, reduce its coordinates once using the remainder operations. Then the newCellIndex are less than twice the maximum, thus need only to be reduced at most once. Thus it is safe to replace while with if, sparing one comparison.
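The reduce-once idea can be sketched like this. The sizes, the helper names reduce_offset and wrap, and the restriction to non-negative offsets are assumptions made for this example (in C++ the % operator can return a negative result for negative operands).

```cpp
#include <cassert>

constexpr int MAX_ROWS = 5;  // hypothetical sizes, deliberately not powers of two
constexpr int MAX_COLS = 7;

struct Cell
{
    int rowId;
    int colId;
};

// Reduce the offset once, whenever it changes. Assumes a non-negative offset.
Cell reduce_offset(Cell offset)
{
    assert(offset.rowId >= 0 && offset.colId >= 0);
    return { offset.rowId % MAX_ROWS, offset.colId % MAX_COLS };
}

// Per-query wrap. Both arguments must already lie inside the matrix, so each
// sum is below twice the maximum and one conditional subtraction suffices.
Cell wrap(Cell cell, Cell reducedOffset)
{
    assert(cell.rowId >= 0 && cell.rowId < MAX_ROWS);
    assert(cell.colId >= 0 && cell.colId < MAX_COLS);
    assert(reducedOffset.rowId >= 0 && reducedOffset.rowId < MAX_ROWS);
    assert(reducedOffset.colId >= 0 && reducedOffset.colId < MAX_COLS);

    int r = cell.rowId + reducedOffset.rowId;
    int c = cell.colId + reducedOffset.colId;
    if (r >= MAX_ROWS) r -= MAX_ROWS;
    if (c >= MAX_COLS) c -= MAX_COLS;
    return { r, c };
}
```

The two remainder operations then run only when the offset changes, while every query costs two additions and two comparisons.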
If you can sacrifice memory for speed, then double the matrix dimensions and fill the excess entries with the repeated matrix elements. You have to make sure this pattern holds when updating the matrix.
Then, again assuming that both m_matrixOffset and CellIndex are inside the maxima for rows and columns, you can access the cell of the extended matrix without any further reduction. This would be a variant on the "lookup table" idea.
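A small sketch of the doubled-matrix variant, under the assumption that both the cell index and the offset are already inside the original bounds. The struct name WrappedMatrix, its members and the sizes are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t ROWS = 3;  // hypothetical logical sizes
constexpr std::size_t COLS = 4;

struct WrappedMatrix
{
    // Storage doubled in each dimension; every logical cell is kept four times.
    std::array<std::array<int, 2 * COLS>, 2 * ROWS> cells{};

    // Writes must update all four copies to keep the repeated pattern intact.
    void set(std::size_t r, std::size_t c, int v)
    {
        assert(r < ROWS && c < COLS);
        cells[r][c] = v;
        cells[r + ROWS][c] = v;
        cells[r][c + COLS] = v;
        cells[r + ROWS][c + COLS] = v;
    }

    // Reads need no wrap logic: r + offR < 2 * ROWS and c + offC < 2 * COLS.
    int get(std::size_t r, std::size_t c, std::size_t offR, std::size_t offC) const
    {
        assert(r < ROWS && c < COLS && offR < ROWS && offC < COLS);
        return cells[r + offR][c + offC];
    }
};
```

Reads become a plain two-dimensional lookup at the cost of four times the memory and four stores per write.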
Or use real lookup tables, but you then execute 3 array cell lookups like in
return matData[repeatedRowIndex[newCellIndex.rowId]][repeatedColIndex[newCellIndex.colId]];
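One way such an index table might be built is sketched below. The helper names make_wrap_table and wrap_index are hypothetical, and the table covers indices up to twice the dimension, which is enough when both the cell index and the offset are inside the matrix.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Table mapping every index in [0, 2 * N) to its wrapped value in [0, N).
template <std::size_t N>
std::array<std::size_t, 2 * N> make_wrap_table()
{
    std::array<std::size_t, 2 * N> table{};
    for (std::size_t i = 0; i < 2 * N; ++i) {
        table[i] = i % N;  // the remainder is computed once, at build time
    }
    return table;
}

// Look up a wrapped index; i must be below twice the dimension.
template <std::size_t M>
std::size_t wrap_index(const std::array<std::size_t, M>& table, std::size_t i)
{
    assert(i < M);
    return table[i];
}
```

Whether the extra memory accesses beat a compare-and-subtract depends on the hardware, so this is worth measuring rather than assuming.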
It depends if the wrap is small or large in relation to the matrix.
The most common case is that all you need is the nearest neighbour. So make the matrix N+2 by M+2 and duplicate the wrap. That makes reads fast but writes a bit fiddly (often a good trade-off).
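The duplicated border could look roughly like this. The struct HaloGrid, the row-major std::vector storage and the refresh_halo step are assumptions made for this example. Interior cells use indices 0 .. n-1 and 0 .. m-1, and the extra ring is addressed with -1 and n (or m).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HaloGrid
{
    int n, m;                // interior size: n rows, m columns
    std::vector<int> cells;  // (n + 2) x (m + 2) values, row-major

    HaloGrid(int rows, int cols)
        : n(rows), m(cols),
          cells(static_cast<std::size_t>(rows + 2) * static_cast<std::size_t>(cols + 2), 0) {}

    // r in [-1, n] and c in [-1, m]; the outermost ring is the halo.
    int& at(int r, int c)
    {
        assert(r >= -1 && r <= n && c >= -1 && c <= m);
        return cells[static_cast<std::size_t>((r + 1) * (m + 2) + (c + 1))];
    }

    // Copy the opposite edges into the halo. Call after writing edge cells.
    void refresh_halo()
    {
        for (int c = 0; c < m; ++c) {
            at(-1, c) = at(n - 1, c);
            at(n, c) = at(0, c);
        }
        // Including rows -1 and n here also fills the four corners.
        for (int r = -1; r <= n; ++r) {
            at(r, -1) = at(r, m - 1);
            at(r, m) = at(r, 0);
        }
    }
};
```

After refresh_halo(), a neighbour read such as at(r - 1, c + 1) needs no wrap test for any interior cell.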
If that's no good, specialise the functions. Work out which cells are edge cells and handle them specially. You must, of course, be able to do this more cheaply than simply hard-coding the logic into the access: that will hold if only one or two cells change every pass, but not if you generate a random list every pass.

Eigen conservativeResize strange behavior

I am using m.conservativeResize() to do the equivalent in Eigen of the reshape function in MATLAB. So let N = 3, and then...
static MatrixXd m(N*N,1);
and then I assign this matrix some values, and it looks like this:
1
1
0
1
0
1
0
1
1
and then try to reshape it...
m.conservativeResize(N,N);
So now the same values should be there, but now in N rows and N columns rather than N*N rows and one column.
However that's not what I get. The first column has the first three values in the column vector - OK so far - but then the remaining values just look like garbage values from uninitialized memory:
1 3.08116e-309 0.420085
1 -2.68156e+154 1.2461e-47
0 -2.68156e+154 0.634626
Any idea what I am doing wrong?
conservativeResize() doesn't "move" the elements around (in other words, it doesn't work like MATLAB's reshape, since it performs memory re-allocation even if the initial and final sizes are the same). From the documentation:
Resizes the matrix to rows x cols while leaving old values untouched.
...
Matrices are resized relative to the top-left element. In case values need to be appended to the matrix they will be uninitialized.
These statements seem a bit confusing. What it means is the following: think about the initial matrix as a rectangle, of size A x B. Then think about the resized matrix as another rectangle of size C x D. Then mentally overlap the two rectangles, making sure the top-left corner is common to both. The common elements of the intersection are the ones that are preserved by the conservativeResize. The rest just correspond to uninitialized memory.
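The overlapping-rectangles picture can be modelled with a small column-major matrix built on std::vector. This is not Eigen's implementation, only a sketch of the semantics described above. The names Mat, conservative_resize and reshape are made up, and new cells receive an explicit fill value here, whereas Eigen leaves them uninitialized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major matrix: element (i, j) lives at a[j * rows + i].
struct Mat
{
    std::size_t rows, cols;
    std::vector<double> a;
};

// Keep only the top-left overlap of the old and new shapes.
Mat conservative_resize(const Mat& in, std::size_t rows, std::size_t cols, double fill)
{
    Mat out{rows, cols, std::vector<double>(rows * cols, fill)};
    const std::size_t r = std::min(in.rows, rows);
    const std::size_t c = std::min(in.cols, cols);
    for (std::size_t j = 0; j < c; ++j) {
        for (std::size_t i = 0; i < r; ++i) {
            out.a[j * rows + i] = in.a[j * in.rows + i];
        }
    }
    return out;
}

// A reshape keeps the linear data and only reinterprets the dimensions.
Mat reshape(const Mat& in, std::size_t rows, std::size_t cols)
{
    assert(rows * cols == in.a.size());
    return Mat{rows, cols, in.a};
}
```

Applied to the 9 x 1 column from the question, conservative_resize to 3 x 3 keeps only the first three values (the overlap is 3 x 1), which matches the observed output, while reshape places all nine values column by column.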
In case you want a true reshaping, use resize() instead (making absolutely sure that A x B == C x D, otherwise reallocation takes place and all bets are off).

std::bad_alloc at transpose of Eigen::SparseMatrix

I'm trying to calculate the following:
A = X^t * X
I'm using the Eigen::SparseMatrix and get a std::bad_alloc error on the transpose() operation:
Eigen::SparseMatrix<double> trans = sp.transpose();
sp is also an Eigen::SparseMatrix, but it is very big. On one of the smaller datasets, the commands
std::cout << "Rows: " << sp.rows() << std::endl;
std::cout << "Cols: " << sp.cols() << std::endl;
give the following result:
Rows: 2061565968
Cols: 600
(I precompute the sizes of this matrix before I start to fill it)
Is there a limit on how many entries such a matrix can hold?
I'm using a 64bit Linux system with g++
Thanks in advance
Alex
The answer from ggael worked with a slight modification:
In the definition of the SparseMatrix one cannot omit the options, so the correct typedef is
typedef SparseMatrix<double, 0, std::ptrdiff_t> SpMat;
The 0 can also be exchanged for a 1: 0 means column-major and 1 means row-major.
Thank you for your help
By default Eigen::SparseMatrix uses int to store sizes and indices (for compactness). However, with that huge number of rows, you need to use 64-bit integers for both sp and sp.transpose():
typedef SparseMatrix<double, 0, std::ptrdiff_t> SpMat;
Note that you can directly write:
SpMat sp, sp2;
sp2 = sp.transpose() * sp;
even though sp.transpose() will have to be evaluated into a temporary anyway.
I think it is impossible to answer your question in its current state.
There are two things: the size of the matrix as a mathematical object, and its size understood as the memory it occupies. For dense matrices they are pretty much the same (linear dependence). But in the sparse case the memory footprint is tied not to the size of the matrix but to the number of non-zero elements.
So, technically, you have pretty much unlimited size constraints - equal to the Size type. However, you are, of course, still bound by memory when it comes to the number of (non-zero) elements.
You are obviously making a copy of the matrix. So you could try calculating the size of the data the matrix object needs to hold, and see if it fits within your memory.
This is not very trivial, but docs say that the storage is a list of non-zero elements. So a good estimate would probably be (2*sizeof(Index)+sizeof(Scalar))*sp.nonZeros() - for (x,y,value).
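For a compressed (CSC/CSR) layout, a slightly different back-of-the-envelope estimate can be sketched: one value and one inner index per non-zero, plus one outer pointer per column (or row) and one more. The function name compressed_bytes is made up, and the sketch ignores allocator overhead and any reserved but unused capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rough byte count of a compressed sparse matrix:
//   nnz values + nnz inner indices + (outer_size + 1) outer pointers.
// outer_size is the number of columns for column-major storage and the
// number of rows for row-major storage.
std::uint64_t compressed_bytes(std::uint64_t nnz,
                               std::uint64_t outer_size,
                               std::size_t scalar_size,
                               std::size_t index_size)
{
    assert(scalar_size > 0 && index_size > 0);
    return nnz * (scalar_size + index_size) + (outer_size + 1) * index_size;
}
```

Under this model, a column-major matrix with 2061565968 columns needs about 8.2 GB for the outer pointers alone with 4-byte indices, and about 16.5 GB with 8-byte indices, before a single non-zero is stored. That may be worth checking against the available RAM when the transposed result is stored column-major.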
You could also monitor RAM usage before calling the transpose, and see if it stays within the limit if you double it.
Note: The transposition is probably not the culprit there, but operator=. Maybe you can avoid making the copy.