Submatrix view from indices in Eigen - c++

Is it possible in Eigen to do the equivalent of the following operation in Matlab?
A=rand(10,10);
indices = [2,5,6,8,9];
B=A(indices,indices)
I want a submatrix as a view on the original matrix, with given non-consecutive indices. Ideally this would be a shared-memory view of the original matrix; is that possible?
I've figured out a method that works, but it is not very fast, since it involves non-vectorized for loops:
Eigen::MatrixXi slice(const Eigen::MatrixXi &A, const std::set<int> &indices)
{
    int n = indices.size();
    Eigen::MatrixXi B;
    B.setZero(n, n);

    std::set<int>::const_iterator iInd1 = indices.begin();
    for (int i = 0; i < n; ++i)
    {
        std::set<int>::const_iterator iInd2 = indices.begin();
        for (int j = 0; j < n; ++j)
        {
            B(i, j) = A(*iInd1, *iInd2); // copy one coefficient at a time
            ++iInd2;
        }
        ++iInd1;
    }
    return B;
}
How can this be made faster?

Make your matrix traversal col-major, which is the default storage order in Eigen (a sketch is given after this list): http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html
Disable debug asserts by defining EIGEN_NO_DEBUG (see http://eigen.tuxfamily.org/dox/TopicPreprocessorDirectives.html), as the comment by Deepfreeze suggested.
It is very non-trivial to implement a vectorized version, since the elements are not contiguous in general. If you are up to it, take a look at the AVX2 gather instructions (provided you have a CPU with AVX2 support).
To implement a matrix view (what you called a shared-memory view) you'd need to implement a custom Eigen expression, which is not too hard if you are well versed in C++ and know the Eigen codebase. I can help you get started if you want.
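For reference, a minimal sketch of the first two suggestions combined (the function name is hypothetical; the types mirror the question):

#include <set>
#include <Eigen/Dense>

// Sketch: make the inner loop walk down a column so that writes to B
// (col-major by default) are sequential; define EIGEN_NO_DEBUG in release builds.
Eigen::MatrixXi sliceColMajor(const Eigen::MatrixXi &A, const std::set<int> &indices)
{
    const int n = static_cast<int>(indices.size());
    Eigen::MatrixXi B(n, n);
    int j = 0;
    for (std::set<int>::const_iterator itCol = indices.begin(); itCol != indices.end(); ++itCol, ++j)
    {
        int i = 0;
        for (std::set<int>::const_iterator itRow = indices.begin(); itRow != indices.end(); ++itRow, ++i)
            B(i, j) = A(*itRow, *itCol); // each column of B is filled contiguously
    }
    return B;
}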

Related

c++ Eigen: how to make code more elegant and faster when I must traverse each row of matrix

When I write algorithms for point clouds in Eigen, I use an Eigen::MatrixXd to represent the point cloud, with one point per column. In many cases I end up writing code like the function below:
Eigen::MatrixXd Distance(const Eigen::MatrixXd& base, const Eigen::MatrixXd& center)
{
    Eigen::MatrixXd disRes(base.cols(), center.cols());
    for (int i = 0; i < center.cols(); ++i)
        disRes.col(i) = (base.colwise() - center.col(i)).colwise().squaredNorm().transpose();
    return disRes; // plain return enables NRVO; std::move here would inhibit it
}
How can I make this code more elegant and faster when both inputs contain more than 1e5 points, without parallelism? A common rewrite is sketched below.
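One standard rewrite (an illustration, not from the original thread) expands ||b - c||^2 = ||b||^2 + ||c||^2 - 2 b^T c, so the bulk of the work becomes a single matrix product that Eigen vectorizes well. Note that rounding can make a few entries slightly negative:

#include <Eigen/Dense>

// Sketch: pairwise squared distances via one matrix product.
// Assumes one point per column: base is d x N, center is d x M.
Eigen::MatrixXd DistanceGemm(const Eigen::MatrixXd& base, const Eigen::MatrixXd& center)
{
    Eigen::VectorXd b2 = base.colwise().squaredNorm().transpose();   // N x 1
    Eigen::RowVectorXd c2 = center.colwise().squaredNorm();          // 1 x M
    Eigen::MatrixXd disRes = -2.0 * (base.transpose() * center);     // N x M, the expensive part
    disRes.colwise() += b2;  // entry (i,j) gains ||b_i||^2
    disRes.rowwise() += c2;  // entry (i,j) gains ||c_j||^2
    return disRes;
}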

Eigen: modify sparse matrix' triplet list, instead of using coeffRef

I am facing the problem of assembling an Eigen::SparseMatrix. In reality it concerns a finite element system matrix, assembled by looping over elements and integration points. Below I have made the problem more abstract.
I initialize the matrix by first constructing a list of triplets (as suggested in the Eigen documentation). I then perform the assembly in subsequent loops using coeffRef (see the example below). The question concerns the fact that coeffRef "performs a binary search", while I know exactly where each item is in the list of triplets (T below). More specifically:
Is it more efficient to modify the list of triplets to avoid coeffRef, at the cost of having to reinitialize the sparse matrix?
If one wants to modify a value in the list of triplets, is there something more elegant than
T[i] = Trip(T[i].row(),T[i].col(),T[i].value()+X);
I realize that the answer may largely depend on the bandwidth of the matrix (i.e. how costly the search is), but there might be generic things to say about this.
Example
#include <iostream>
#include <Eigen/Sparse>

typedef Eigen::SparseMatrix<double> SpMat;
typedef Eigen::Triplet<double> Trip;

int main(void)
{
    size_t N = 100;
    SpMat A(N, N);

    std::vector<Trip> T;
    T.reserve(3*N);

    for (size_t i = 0; i < N; ++i)
    {
        if (i == 0) T.push_back(Trip(i, i  , -1.0));
        else        T.push_back(Trip(i, i-1, -1.0));
        T.push_back(Trip(i, i, +2.0));
        if (i == N-1) T.push_back(Trip(i, 0  , -1.0));
        else          T.push_back(Trip(i, i+1, -1.0));
    }

    A.setFromTriplets(T.begin(), T.end());

    for (size_t i = 0; i < N; ++i)
        A.coeffRef(i, i) += static_cast<double>(i);

    return 0;
}
Compiled using e.g.:
clang++ -I/usr/local/include/eigen3 test.cpp
My guess is that, as long as the coefficients accessed by coeffRef already exist in the matrix, calling coeffRef should be faster than reconstructing the matrix from the triplet list.
You might also outsmart the binary search performed by coeffRef by directly accessing the underlying data structure with A.valuePtr()[A.outerIndexPtr()[i]+some_offset] += ..., assuming you can directly compute some_offset taking advantage of the known structure.
Finally, if you need to update all entries, you can also sequentially iterate over them using an InnerIterator it and update the entries with it.valueRef() += ....
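Applied to the diagonal update from the example, the InnerIterator variant looks like this (a sketch; it assumes every diagonal entry is already stored):

// Sketch: walk all stored entries once; no binary search per coefficient.
for (int k = 0; k < A.outerSize(); ++k)
    for (SpMat::InnerIterator it(A, k); it; ++it)
        if (it.row() == it.col())
            it.valueRef() += static_cast<double>(it.row());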

Using Eigen and C++ to do a colsum of massive matrix product

I am trying to compute colsum(N * P), where N is a sparse, 1M by 2500 matrix, and P is a dense 2500 by 1.5M matrix. I am using the Eigen C++ library with Intel's MKL library. The issue is that the matrix N*P can't actually exist in memory, it's way too big (~10 TB). My question is whether Eigen will be able to handle this computation through some combination of lazy evaluation and parallelism? It says here that Eigen won't make temporary matrices unnecessarily: http://eigen.tuxfamily.org/dox-devel/TopicLazyEvaluation.html
But does Eigen know to compute N * P in piecewise chunks that will actually fit in memory? I.e., it would have to do something like colsum(N * P_1) ++ colsum(N * P_2) ++ ... ++ colsum(N * P_n), where P is split into n different submatrices column-wise and "++" denotes concatenation.
I am working with 128 GB RAM.
I gave it a try but ended up with a bad malloc (I'm only running with 8 GB on Win8). I set up my main() and used a non-inline colsum function I wrote.
int main(int argc, char *argv[])
{
    Eigen::MatrixXd dense = Eigen::MatrixXd::Random(1000, 100000);
    Eigen::SparseMatrix<double> sparse(100000, 1000);
    typedef Eigen::Triplet<double> Trip;

    std::vector<Trip> trps(dense.rows());
    for (int i = 0; i < dense.rows(); i++)
    {
        trps[i] = Trip(20*i, i, 2.0);
    }
    sparse.setFromTriplets(trps.begin(), trps.end());

    Eigen::RowVectorXd res = colsum(sparse, dense);
    std::cout << res;
    std::cin >> argc; // pause before exiting
    return 0;
}
The attempt was simply:
__declspec(noinline) Eigen::RowVectorXd
colsum(const Eigen::SparseMatrix<double> &sparse, const Eigen::MatrixXd &dense)
{
    return (sparse * dense).colwise().sum(); // materializes the full product first
}
That had a bad malloc. So it looks like you have to split the computation up manually on your own (unless someone else has a better solution).
EDIT
I improved the function a bit, but I get the same bad malloc:
__declspec(noinline) Eigen::RowVectorXd
colsum(const Eigen::SparseMatrix<double> &sparse, const Eigen::MatrixXd &dense)
{
    return (sparse * dense).topRows(4).colwise().sum();
}
EDIT 2
Another option would be to make the sparse matrix dense and force a lazy evaluation; I don't think lazy evaluation works with a sparse matrix (oh well).
__declspec(noinline) Eigen::RowVectorXd
colsum(const Eigen::SparseMatrix<double> &sparse, const Eigen::MatrixXd &dense)
{
    Eigen::MatrixXd denseSparse(sparse); // densify the sparse factor
    return denseSparse.lazyProduct(dense).colwise().sum();
}
This doesn't give me the bad malloc, but computes a lot of pointless 0*x_i expressions.
To answer your question: especially when products are involved, Eigen often evaluates parts of an expression into temporaries. In some situations this could be optimized but is not implemented yet; in other cases it is essentially the most efficient way to do it.
However, in your case you can simply compute the colsum of N (a 1 x 2500 row vector) and multiply that by P.
Maybe future versions of Eigen will be able to make this kind of optimization themselves, but most of the time it is a good idea to make problem-specific optimizations yourself before letting the computer do the rest of the work.
Btw: I'm afraid .colwise() is not implemented for sparse matrices yet, so you must compute the colsum of N manually. If you are lazy, you can instead compute Eigen::RowVectorXd Nsum = Eigen::RowVectorXd::Ones(N.rows()) * N; (I have not checked it, but this might actually get optimized to near-optimal code with the most recent versions of Eigen).
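Putting that suggestion together (N and P named as in the question), a minimal sketch:

// Sketch: colsum(N * P) == colsum(N) * P, so the ~10 TB product is never formed.
Eigen::RowVectorXd Nsum = Eigen::RowVectorXd::Ones(N.rows()) * N; // 1 x 2500
Eigen::RowVectorXd result = Nsum * P;                             // 1 x 1.5M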

Most efficient option for building 3D structures using Eigen matrices

I need a 3D matrix/array structure in my code, and right now I'm relying on Eigen for both my matrices and vectors.
Currently I am creating the 3D structure using new:
MatrixXd* cube = new MatrixXd[60];
for (int i = 0; i < 60; i++) cube[i] = MatrixXd(60, 60);
and for accessing the values:
double val;
MatrixXd pos;
for (int i = 0; i < 60; i++){
    pos = cube[i]; // note: this copies a full 60x60 matrix every iteration
    for (int j = 0; j < 60; j++){
        for (int k = 0; k < 60; k++){
            val = pos(j, k);
            //...
        }
    }
}
However, this part of the code is currently very slow, which makes me believe that this might not be the most efficient approach. Are there any alternatives?
While it was not available when the question was asked, Eigen has been providing a Tensor module for a while now. It is still in an "unsupported" stage (meaning the API may change), but basic functionality should be mostly stable. The documentation is scattered here and here.
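A minimal usage sketch (assuming a reasonably recent Eigen checkout; Tensor API details may differ between versions):

#include <unsupported/Eigen/CXX11/Tensor>

// Sketch: one 60x60x60 tensor replaces the array of 60 matrices.
Eigen::Tensor<double, 3> cube(60, 60, 60);
cube.setZero();
cube(1, 2, 3) = 4.0; // direct (i, j, k) element access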
A solution I used is to form a fat matrix containing all the matrices you need stacked.
MatrixXd A(60*60,60);
and then access them with block operations
A0 = A.block<60,60>(0*60,0);
...
A5 = A.block<60,60>(5*60,0);
An alternative is to allocate one very large chunk of memory once, and map Eigen matrices onto it:
double* data = new double[60*60 * 60*60*60];
Map<MatrixXd> Mijk(data + 60*60*(i + 60*(j + 60*k)), 60, 60);
Here each 60x60 matrix occupies a contiguous 60*60 block, and (i, j, k) selects one of the 60x60x60 such matrices. At this stage you can use Mijk like a MatrixXd object. However, since it is not an actual MatrixXd type, if you want to pass it to a function, your function must either (a complete sketch follows this list):
be of the form foo(Map<MatrixXd> mat)
be a template function: template<typename Der> void foo(const MatrixBase<Der>& mat)
take a Ref<MatrixXd> object which can handle both Map<> and Matrix<> objects without being a template function and without copies. (doc)
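A self-contained sketch of the Map approach, reduced to the 60 slices the question actually needs (buffer size and names are illustrative):

#include <Eigen/Dense>
using Eigen::Map;
using Eigen::MatrixXd;

int main()
{
    // One contiguous buffer holding 60 column-major 60x60 matrices.
    double* data = new double[60 * 60 * 60];
    for (int k = 0; k < 60; ++k)
    {
        Map<MatrixXd> Mk(data + 60 * 60 * k, 60, 60); // a view, no copy
        Mk.setConstant(static_cast<double>(k));
    }
    delete[] data;
    return 0;
}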

Slow performance of sparse matrix using std::vector

I'm trying to implement the functionality of the MATLAB function sparse.
Insert a value in sparse matrix at a specific index such that:
If a value with same index is already present in the matrix, then the new and old values are added.
Else the new value is appended to the matrix.
The function addNode works correctly, but the problem is that it is extremely slow. I call this function in a loop about 100000 times, and the program takes more than 3 minutes to run, while MATLAB accomplishes the same task in a matter of seconds. Is there any way to optimize the code, or to use STL algorithms instead of my own function to achieve what I want?
Code:
struct SparseMatNode
{
    int x;
    int y;
    float value;
};

std::vector<SparseMatNode> SparseMatrix;

void addNode(int x, int y, float val)
{
    SparseMatNode n;
    n.x = x;
    n.y = y;
    n.value = val;

    bool alreadyPresent = false;
    int i = 0;
    for (i = 0; i < SparseMatrix.size(); i++)
    {
        if ((SparseMatrix[i].x == x) && (SparseMatrix[i].y == y))
        {
            alreadyPresent = true;
            break;
        }
    }

    if (alreadyPresent)
    {
        SparseMatrix[i].value += val;
        if (SparseMatrix[i].value == 0.0f)
            SparseMatrix.erase(SparseMatrix.begin() + i);
    }
    else
        SparseMatrix.push_back(n);
}
Sparse matrices aren't typically stored as a vector of triplets as you are attempting.
MATLAB (as well as many other libraries) uses a Compressed Sparse Column (CSC) data structure, which is very efficient for static matrices. The MATLAB function sparse also does not build the matrix one entry at a time (as you are attempting) - it takes an array of triplet entries and packs the whole sequence into a CSC matrix. If you are attempting to build a static sparse matrix this is the way to go.
If you want a dynamic sparse matrix object, that supports efficient insertion and deletion of entries, you could look at different structures - possibly a std::map of triplets, or an array of column lists - see here for more information on data formats.
Also, there are many good libraries. If you're wanting to do sparse matrix operations/factorisations etc - SuiteSparse is a good option, otherwise Eigen also has good sparse support.
Sparse matrices are usually stored in compressed sparse row (CSR) or compressed sparse column (CSC, also called Harwell-Boeing) format. MATLAB by default uses CSC, IIRC, while most sparse matrix packages tend to use CSR.
Anyway, if this is for production usage rather than a learning exercise, I'd recommend using a matrix package with support for sparse matrices. In the C++ world, my favourite is Eigen.
The first thing that stands out is that you are implementing the element search yourself: that's what std::find_if is for. So, instead of:
bool alreadyPresent = false;
int i = 0;
for (i = 0; i < SparseMatrix.size(); i++)
{
    if ((SparseMatrix[i].x == x) && (SparseMatrix[i].y == y))
    {
        alreadyPresent = true;
        break;
    }
}
You should write:
auto it = std::find_if(SparseMatrix.begin(), SparseMatrix.end(), comparer);
where comparer is a predicate, e.g. a lambda such as [x, y](const SparseMatNode& n) { return n.x == x && n.y == y; }, that checks whether a node carries the requested coordinates.
But the main improvement will come from using an appropriate container. Instead of std::vector, you will be much better off with an associative container: finding an element then costs O(log N) with std::map, or O(1) on average with std::unordered_map, instead of O(N). You may slightly modify your SparseMatNode type as follows:
typedef std::pair<int, int> Coords;
typedef std::pair<const Coords, float> SparseMatNode;
You may wrap these typedefs inside a class to provide a better interface, of course.
And then:
std::unordered_map<Coords, float> SparseMatrix;
(Note that std::pair has no std::hash specialization, so you must supply a hash for Coords; see the sketch after this answer.)
This way you can use:
auto it = SparseMatrix.find(std::make_pair(x, y));
to find elements much more efficiently.
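A minimal sketch of the missing hash (the packing scheme is just an illustration):

#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

typedef std::pair<int, int> Coords;

// Sketch: pack the two 32-bit coordinates into one 64-bit key and hash that.
struct CoordsHash
{
    std::size_t operator()(const Coords& c) const
    {
        std::uint64_t key = (static_cast<std::uint64_t>(static_cast<std::uint32_t>(c.first)) << 32)
                          | static_cast<std::uint32_t>(c.second);
        return std::hash<std::uint64_t>()(key);
    }
};

std::unordered_map<Coords, float, CoordsHash> SparseMatrix;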
Have you tried keeping your vector of sparse nodes sorted? A linear search becomes costly every time you add a node; with a sorted vector you can insert in place and always perform a binary search, as sketched below.
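A sketch of that approach (reusing SparseMatNode and SparseMatrix from the question; the comparator is an illustration):

#include <algorithm>
#include <tuple>

// Sketch: keep SparseMatrix sorted by (x, y); O(log N) lookup, O(N) insert.
void addNodeSorted(int x, int y, float val)
{
    const auto cmp = [](const SparseMatNode& a, const SparseMatNode& b)
    { return std::tie(a.x, a.y) < std::tie(b.x, b.y); };

    SparseMatNode n;
    n.x = x; n.y = y; n.value = val;

    auto it = std::lower_bound(SparseMatrix.begin(), SparseMatrix.end(), n, cmp);
    if (it != SparseMatrix.end() && it->x == x && it->y == y)
        it->value += val;            // existing entry: accumulate
    else
        SparseMatrix.insert(it, n);  // new entry: keep the vector sorted
}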
Because a sparse matrix may be huge and needs to be compressed, you may use std::unordered_map. I assume the matrix indices (x and y) are always positive.
#include <unordered_map>

const size_t MAX_X = 1000*1000*1000;
std::unordered_map<size_t, float> matrix;

void addNode(size_t x, size_t y, float val)
{
    size_t index = x + y*MAX_X;
    matrix[index] += val;    // this function can still be made faster
    if (matrix[index] == 0)  // by using the find()/insert() methods
        matrix.erase(index);
}
If std::unordered_map is not available on your system, you may try std::tr1::unordered_map or stdext::hash_map...
If you can use more memory, then use double instead of float; this will slightly improve your processing speed.