Armadillo Sparse Matrix Size in Bytes - c++

I would like to assess how large Armadillo sparse matrices are. The question is related to this answer regarding dense matrices.
Consider the following example:
void some_function(unsigned int matrix_size) {
    arma::sp_mat x(matrix_size, matrix_size);
    // Steps entering some non-zero values
    std::cout << sizeof(x) << std::endl;
}
Unfortunately, as in the dense matrix case, sizeof does not return the size of the matrix itself, but rather the size of a pointer or some other small object. The size of the sparse matrix should not simply be the number of non-zero elements times the data type's size: Armadillo stores sparse matrices in a compressed format, so on top of the cell values there must also be vectors storing the cell indices. And I guess that the object also has a header storing information about the matrix.

There are three key properties:
n_rows
n_cols and
n_nonzero
The last value is the number of cells that actually hold a value, with 0 <= n_nonzero <= (n_rows*n_cols).
You can use these to compute the density (which .print also displays as a percentage), e.g.
[matrix size: 3x3; n_nonzero: 4; density: 44.44%]
(1, 0) 0.2505
(0, 1) 0.9467
(0, 2) 0.2513
(2, 2) 0.5206
I used these properties to implement sp_mat serialization before: How to serialize sparse matrix in Armadillo and use with mpi implementation of boost?
The actual number of bytes allocated will correlate roughly with n_nonzero, but you have to account for some overhead. In practice, the best way to measure actual allocations is with instrumented allocators or (the logical extension of that idea) memory profilers. See e.g. How to find the memory occupied by a boost::dynamic_bitset?

Related

efficiently updating inplace certain blocks of a large sparse matrix in Eigen?

Suppose that I have a large sparse matrix with the following pattern:
the number of nonzeros per column and their locations are fixed
only matrix block A and B will change and the rest of the matrix stays static; (blocks A and B themselves are also sparse with fixed nonzero locations)
As instructed in the documentation, I've initialized the above matrix by
reserving the exact number of nonzeros per column for the column major sparse matrix
inserting column by column
inserting from the smallest row index per column
In a later part of the program, it's natural to reuse the matrix and only update the A, B blocks in place. Possible ways are:
accessing existing entries via coeffRef, which involves a binary search, so it is not preferred here
iterating over the outer and inner dimensions as documented here
However, it seems a bit unnecessary to iterate over all nonzero entries since most of the sparse matrix stays the same.
Is it possible to update A, B in place without iterating over all nonzeros in the matrix?
From what I can tell, the InnerIterator can be used for this and runs in constant time.
Eigen::Index col = 1;
Eigen::Index offset_in_col = 1;
using SparseMatrixD = Eigen::SparseMatrix<double>;
SparseMatrixD mat = ...;
SparseMatrixD::InnerIterator i =
    SparseMatrixD::InnerIterator(mat, col) + offset_in_col;
assert(i.row() == 1);
assert(i.col() == 1);
assert(i.value() == C);
This should access the value C. All you need to know is how many nonzero elements are per column (or inner dimension in general). You don't need to know how many nonzero columns (outer dimensions) are stored because that array (SparseMatrix.outerIndexPtr()) has one entry per column.

Set sparsity pattern of Eigen::SparseMatrix without memory overhead

I need to set the sparsity pattern of an Eigen::SparseMatrix which I already know (I have unique sorted column indices and row offsets). Obviously this is possible via setFromTriplets, but unfortunately setFromTriplets requires a lot of additional memory (at least in my case).
I wrote a small example:
const long nRows = 5000000;
const long nCols = 100000;
const long nCols2Skip = 1000;
// It's quite big!
const long nTriplets2Reserve = nRows * (nCols / nCols2Skip) * 1.1;
Eigen::SparseMatrix<double, Eigen::RowMajor, long> mat(nRows, nCols);
std::vector<Eigen::Triplet<double, long>> triplets;
triplets.reserve(nTriplets2Reserve);
for(long row = 0; row < nRows; ++row){
    for(long col = 0; col < nCols; col += nCols2Skip){
        triplets.push_back(Eigen::Triplet<double, long>(row, col, 1));
    }
}
std::cout << "filling mat" << std::endl << std::flush;
mat.setFromTriplets(triplets.begin(), triplets.end());
std::cout << "Finished! nnz " << mat.nonZeros() << std::endl;
// Stupid way to check memory consumption
std::cin.get();
In my case this example consumes about 26Gb at peak (between the "filling mat" and "Finished" lines) and 18Gb after that (I checked everything via htop). The ~8Gb overhead is quite big for me (in my "real world" task the overhead is even bigger).
So I have two questions:
How can I fill the sparsity pattern of an Eigen::SparseMatrix with as little overhead as possible?
Why does setFromTriplets require so much memory?
Please let me know if my example is wrong.
My Eigen version is 3.3.2
PS Sorry for my English
EDIT:
It looks like inserting each triplet manually (with preallocation) is faster and requires less memory at peak. But I still want to know whether it is possible to set the sparsity pattern manually.
Ad 1: You can be even a bit more efficient than plain insert by using the internal functions startVec and insertBack, if you can guarantee that you insert elements in lexicographical order.
Ad 2: If you use setFromTriplets you need approximately twice the final size of the matrix (plus the size of your Triplet container), since the elements are first inserted into a transposed version of the matrix, which is then transposed into the final matrix in order to make sure that all inner vectors are sorted. If you know the structure of your matrix ahead of time, this is obviously quite a waste of memory, but it is intended to work on arbitrary input data.
In your example you have 5000000 * (100000 / 1000) = 5e8 elements. A Triplet requires 8+8+8 = 24 bytes (about 12Gb for the vector) and each element of the sparse matrix requires 8+8 = 16 bytes (one double for the value, one long for the inner index), i.e., about 8Gb per matrix. In total you require about 28Gb, which is about 26GiB.
Bonus:
If your matrix has some special structure which can be stored more efficiently, and you are willing to dig deeper into the Eigen internals, you may also consider implementing a new type inheriting from Eigen::SparseBase<> (but I don't recommend this unless memory/performance is very critical for you and you are willing to go through a lot of "sparsely" documented internal Eigen code ...). In that case, however, it is probably easier to think about what you intend to do with your matrix and implement only the special operations you need.

Fast data structure or algorithm to find mean of each pixel in a stack of images

I have a stack of images in which I want to calculate the mean of each pixel down the stack.
For example, let (x_n,y_n) be the value of pixel (x,y) in the nth image. Then the mean of pixel (x,y) over a stack of three images is:
mean-of-(x,y) = (1/3) * ((x_1,y_1) + (x_2,y_2) + (x_3,y_3))
My first thought was to load all pixel intensities from each image into a data structure with a single linear buffer like so:
|All pixels from image 1| All pixels from image 2| All pixels from image 3|
To find the sum of a pixel down the image stack, I perform a series of nested for loops like so:
for(int col=0; col<img_cols; col++)
{
    for(int row=0; row<img_rows; row++)
    {
        for(int img=0; img<num_of_images; img++)
        {
            sum_of_px += px_buffer[(img*img_rows*img_cols)+col*img_rows+row];
        }
    }
}
Basically, img*img_rows*img_cols gives the buffer offset of the first pixel of the nth image, and col*img_rows+row selects the (x,y) pixel within each image in the stack.
Is there a data structure or algorithm that will help me sum up pixel intensities down an image stack that is faster and more organized than my current implementation?
I am aiming for portability so I will not be using OpenCV and am using C++ on linux.
The problem with the nested loops in the question is that they're not very cache friendly: you go skipping through memory with a long stride, effectively rendering your data cache useless. You're going to spend a lot of time just accessing memory.
If you can spare the memory, you can create an extra image-sized buffer to accumulate totals for each pixel as you walk through all the pixels in all the images in memory order. Then you do a single pass through the buffer for the division.
Your accumulation buffer may need to use a larger type than you use for individual pixel values, since it has to accumulate many of them. If your pixel values are, say, 8-bit integers, then your accumulation buffer might need 32-bit integers or floats.
Usually, a stack of pixels
(x_1,y_1),...,(x_n,y_n)
is conditionally independent from a stack
(a_1,b_1),...,(a_n,b_n).
And even if they weren't (for a particular dataset), modeling their interactions is a complex task and would only give you an estimate of the mean. So if you want to compute the exact mean for each stack, you have no choice but to iterate through the three loops you supplied. Languages such as Matlab/Octave and libraries such as Theano (Python) or Torch7 (Lua) all parallelize these iterations. If you are using C++, what you are doing is well suited to CUDA or OpenMP. As for portability, I think OpenMP is the easier solution.
A portable, fast data structure specifically for the average calculation could be:
std::vector<std::vector<std::vector<sometype> > > VoVoV;
VoVoV.resize(img_cols);
int i, j;
for (i = 0; i < img_cols; ++i)
{
    VoVoV[i].resize(img_rows);
    for (j = 0; j < img_rows; ++j)
    {
        VoVoV[i][j].resize(num_of_images);
        // The values of all images at this pixel are stored contiguously,
        // and are therefore fast to access.
    }
}
VoVoV[col][row][img] = foo;
As a side note, 1/3 in your example will evaluate to 0 which is not what you want.
For fast summation/averaging you can now do:
sometype sum = 0;
std::vector<sometype>::iterator it = VoVoV[col][row].begin();
std::vector<sometype>::iterator it_end = VoVoV[col][row].end();
for ( ; it != it_end ; ++it)
    sum += *it;
sometype avg = sum / num_of_images; // or similar for integers; check for num_of_images==0
Basically, you should not rely on the compiler to optimize away the repeated calculation of the same offsets.

c++ use 1D Array with 2D Data

I do not use any matrix library, but instead plain std::vector for my matrix data.
To fill it with 2D data I use this code:
data[iy + dataPointsY * ix] = value;
I would like to know whether this is correct or whether it must be the other way around (ix first).
To my understanding FFTW needs 'row-major format'. Since I use it, the formula should follow the row-major format.
Assuming you want row-major format for FFTW, what you want is:
data[ix + iy * dataPointsX]
where the stride is the row length, i.e. the number of points in the x direction (dataPointsX, not dataPointsY).
The point of row-major is that when the combined index increases by 1, the corresponding row index stays the same (as long as you don't overflow into the next row).
double m[4][4];
double* mp = (double*)m;
mp[1+2*4] == m[2][1]; //true
mp[2+2*4] == m[2][2]; //true
mp[2+2*4] == m[3][1]; //false
In general, there's no "right" way to store a matrix. Row-major format is also called "C-style", while column-major is called "Fortran-style". The naming reflects the different multidimensional array indexing schemes of the two languages.

Accessing elements of a cv::Mat with at<float>(i, j). Is it (x,y) or (row,col)?

When we access specific elements of a cv::Mat structure, we can use mat.at(i,j) to access the element at position i,j. What is not immediately clear, however, is whether (i,j) refers to the x,y coordinate in the matrix or to the ith row and jth column.
OpenCV, like many other libraries, treats matrix access in row-major order. That means every access is defined as (row, column). Note that if you're working with x and y coordinates of an image, this becomes (y, x), if y is your vertical axis.
Most matrix libraries are the same in that regards, the access is (row, col) as well in, for example, Matlab or Eigen (a C++ matrix library).
Where these applications and libraries do differ however is how the data is actually stored in memory. OpenCV stores the data in row-major order in memory (i.e. the rows come first), while for example Matlab stores the data in column-major order in memory. But if you're just a user of these libraries, and accessing the data via a (row, col) accessor, you'll never actually see this difference in memory storage order.
So OpenCV handles this a bit strangely. OpenCV stores the Mat in row-major order, but addressing it through the method Mat::at() falsely suggests column-major order. I think the OpenCV documentation is misleading in this case. I had to write this test case to make sure for myself.
cv::Mat m(3,3,CV_32FC1,0.0f);
m.at<float>(1,0) = 2;
cout << m << endl;
So addressing is done with Mat::at(y,x) :
[0, 0, 0;
2, 0, 0;
0, 0, 0]
But raw pointer access reveals that it is actually stored row-major: the "2" is in the 4th position. If it were stored in column-major order, it would be in the 2nd position.
float* mp = &m.at<float>(0);
for(int i = 0; i < 9; i++)
    cout << mp[i] << " ";
0 0 0 2 0 0 0 0 0
As a side remark: Matlab stores and addresses a matrix in column major order. It might be annoying, but at least it is consistent.
OpenCV, like many other libraries, treats matrices (and images) in row-major order. That means every access is defined as (row, column).
Notable exceptions to this general rule are the Matlab and Eigen libraries.
From what I've read in the documentation, it's at(y, x) (i.e. row, col).
Since cv::Mat is actually a general matrix, with images being just a special case, it follows matrix indexing and therefore the row (y) comes before the column (x):
mat.at(i, j) = mat.at(row, col) = mat.at(y, x)