How to efficiently initialize a SparseVector in Eigen - c++

In the Eigen docs for filling a sparse matrix it is recommended to use the triplet filling method as it can be much more efficient than making calls to coeffRef, which involves a binary search.
For filling SparseVectors however, there is no clear recommendation on how to do it efficiently.
The suggested method in this SO answer uses coeffRef which means that a binary search is performed for every insertion.
Is there a recommended, efficient way to build sparse vectors? Should I try to create a single row SparseMatrix and then store that as a SparseVector?
My use case is reading in LibSVM files, in which there can be millions of very sparse features and billions of data points. I'm currently representing these as a std::vector<Eigen::SparseVector<float>>. Perhaps I should just use SparseMatrix instead?
Edit: One thing I've tried is this:
// for every data point in a batch do the following:
Eigen::SparseMatrix<float> features(1, num_features);
// copy the data over
typedef Eigen::Triplet<float> T;
std::vector<T> tripletList;
for (int j = 0; j < num_batch_instances; ++j) {
    tripletList.clear();  // each instance starts from an empty triplet list
    for (size_t i = batch.offset[j]; i < batch.offset[j + 1]; ++i) {
        uint32_t index = batch.index[i];
        float fvalue = batch.value[i];
        if (index < num_features) {
            tripletList.emplace_back(T(0, index, fvalue));
        }
    }
    features.setFromTriplets(tripletList.begin(), tripletList.end());
    samples->emplace_back(Eigen::SparseVector<float>(features));
}
This creates a SparseMatrix using the triplet list approach, then creates a SparseVector from that object. In my experiments with ~1.4M features and very high sparsity this is 2 orders of magnitude slower than using SparseVector and coeffRef, which I definitely did not expect.

Related

Vector dot product in Microsoft SEAL with CKKS

I am currently trying to implement matrix multiplication methods using the Microsoft SEAL library. I have created a vector<vector<double>> as input matrix and encoded it with CKKSEncoder. However the encoder packs an entire vector into a single Plaintext so I just have a vector<Plaintext> which makes me lose the 2D structure (and then of course I'll have a vector<Ciphertext> after encryption). Having a 1D vector allows me to access only the rows entirely but not the columns.
I managed to transpose the matrices before encoding. This allowed me to multiply component-wise the rows of the first matrix and columns (rows in transposed form) of the second matrix but I am unable to sum the elements of the resulting vector together since it's packed into a single Ciphertext. I just need to figure out how to make the vector dot product work in SEAL to perform matrix multiplication. Am I missing something or is my method wrong?
It has been suggested by KyoohyungHan in the issue: https://github.com/microsoft/SEAL/issues/138 that it is possible to solve the problem with rotations by rotating the output vector and summing it up repeatedly.
For example:
// my_output_vector is the Ciphertext output
vector<Ciphertext> rotations_output(my_output_vector.size());
for(int steps = 0; steps < my_output_vector.size(); steps++)
{
    evaluator.rotate_vector(my_output_vector, steps, galois_keys, rotations_output[steps]);
}
Ciphertext sum_output;
evaluator.add_many(rotations_output, sum_output);
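The rotate-and-sum idea can be illustrated without SEAL. The sketch below is a plain C++ simulation on a std::vector<double>, not SEAL code: rotate_left stands in for one cyclic slot rotation and the element-wise loop for one ciphertext addition. It assumes the slot count is a power of two and that unused slots hold zeros. With doubling steps, log2(n) rotations are enough, and every slot ends up holding the full sum. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-C++ stand-in for a cyclic slot rotation: out[i] = in[(i + steps) % n].
std::vector<double> rotate_left(const std::vector<double>& in, std::size_t steps) {
    const std::size_t n = in.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[(i + steps) % n];
    }
    return out;
}

// Rotate-and-add with steps 1, 2, 4, ...; n must be a power of two.
// After the loop every slot contains the sum of all original slots.
std::vector<double> sum_all_slots(std::vector<double> slots) {
    const std::size_t n = slots.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t step = 1; step < n; step *= 2) {
        const std::vector<double> rotated = rotate_left(slots, step);
        for (std::size_t i = 0; i < n; ++i) {
            slots[i] += rotated[i];
        }
    }
    return slots;
}
```

In an encrypted setting each loop iteration would presumably map to one rotation plus one addition on the same ciphertext, and the Galois keys would need to cover the power-of-two steps used.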
A vector of vectors is not the same as an array of arrays (a 2D array, or matrix).
While a one-dimensional vector<double>'s data() points to contiguous memory (e.g., you can memcpy it), each of the "subvectors" allocates its own, separate memory buffer. Therefore vector<vector<double>>.data() does not point to one contiguous block of doubles and cannot be used as a matrix.
In C++, a two-dimensional array array2D[W][H] is stored in memory identically to array[W*H]. Therefore both can be processed by the same routines (when it makes sense). Consider the following example:
#include <cstddef>
#include <iostream>

using std::cout;
using std::size_t;

void fill_array(double *array, size_t size, double value) {
    for (size_t i = 0; i < size; ++i) {
        array[i] = value;
    }
}

int main(int argc, char *argv[])
{
    constexpr size_t W = 10;
    constexpr size_t H = 5;
    double matrix[W][H];
    // using 2D array as 1D to fill all elements with 5.
    fill_array(&matrix[0][0], W * H, 5);
    for (const auto &row: matrix) {
        for (const auto v : row) {
            cout << v << '\t';
        }
        cout << '\n';
    }
    return 0;
}
In the above example, you can substitute double matrix[W][H]; with vector<double> matrix(W * H); and feed matrix.data() into fill_array(). However, you cannot do the same with a vector<vector<double>> made of W vectors of size H, because its rows are not stored contiguously.
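A minimal sketch of the flat-vector layout described above. The FlatMatrix wrapper is hypothetical and only illustrates the index arithmetic (element (i, j) lives at i * cols + j); fill_array is the same routine as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal wrapper: rows x cols doubles in one contiguous buffer.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    // Row-major indexing: element (i, j) is stored at offset i * cols + j.
    double& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Same 1D routine as above; it works on any contiguous block of doubles.
void fill_array(double* array, std::size_t size, double value) {
    for (std::size_t i = 0; i < size; ++i) {
        array[i] = value;
    }
}
```

Because the storage is a single buffer, calling fill_array(m.data.data(), m.data.size(), 5.0) fills the whole matrix, while m.at(i, j) still gives 2D access.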
P.S. There are plenty of C++ implementations of math vector and matrix. You can use one of those if you don't want to deal with C-style arrays.

Fast way to slice an Eigen SparseMatrix

In finite element analyses it is quite common to apply some prescribed condition(s) to a big sparse matrix and get a reduced one. This can be achieved easily in MATLAB, SciPy and Julia, for instance, in MATLAB
a=sprand(10000,10000,0.2); % create a random sparse matrix; 20% fill
tic; c=a(1:2:4000,2:3:5000); toc % slice the matrix to get a reduced one
Assuming that one has a similar sparse matrix in Eigen, what is the most efficient way to slice it? I don't care whether the result is a copy or a view, but the method needs to cope with non-contiguous slicing. That requirement makes the Eigen block operations useless in this regard.
I can think of two methodologies that I have tested:
1. Iterate over the columns and rows using for loops and assign the values to a second sparse matrix (I know this is a truly bad idea).
2. Create a dummy sparse matrix D of zeros and ones and pre- and post-multiply the actual matrix with it: D*A*D.transpose()
I always use setFromTriplets to create sparse matrices in Eigen, and I have been happy with the solvers and the assembly of sparse matrices. However, this slicing seems to be the bottleneck in my code at the moment.
The timing of MATLAB vs Eigen (using -O3 -DNDEBUG -march=native) is
MATLAB: 0.016 secs
EIGEN LOOP INDEXING: 193 secs
EIGEN PRE-POST MUL: 13.7 secs
The other approach, which I do not know how to go about, is to manipulate the compressed storage arrays directly, in the spirit of [I,J,V] triplets: outerIndexPtr, innerIndexPtr, valuePtr.
Here is a proof of concept code snippet
#include <random>
#include <vector>
#include <Eigen/Core>
#include <Eigen/Sparse>

template<typename T>
using spmatrix = Eigen::SparseMatrix<T,Eigen::RowMajor>;

spmatrix<double> sprand(int rows, int cols, double sparsity) {
    std::default_random_engine gen;
    std::uniform_real_distribution<double> dist(0.0,1.0);
    int sparsity_ = sparsity*100;
    typedef Eigen::Triplet<double> T;
    std::vector<T> tripletList; tripletList.reserve(rows*cols);
    int counter = 0;
    for(int i=0;i<rows;++i) {
        for(int j=0;j<cols;++j) {
            // keep every sparsity_-th entry
            if (counter % sparsity_ == 0) {
                auto v_ij=dist(gen);
                tripletList.push_back(T(i,j,v_ij));
            }
            counter++;
        }
    }
    spmatrix<double> mat(rows,cols);
    mat.setFromTriplets(tripletList.begin(), tripletList.end());
    return mat;
}

int main() {
    int m=1000,n=10000;
    auto a = sprand(n,n,0.05);
    auto b = sprand(m,n,0.1);
    spmatrix<double> c;
    // this is efficient but definitely not the right way to do this
    // c = b*a*b.transpose(); // uncomment to check, much slower than block operation
    c = a.block(0,0,1000,1000); // very fast, Faster than MATLAB (I believe this is just a view)
    return 0;
}
Any pointers in this direction would be useful.
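As a rough illustration of working directly on the compressed arrays, here is a sketch that uses a small hand-rolled CSR struct rather than Eigen itself, so every name in it (Csr, slice, col_map) is hypothetical. Its three arrays play the roles that outerIndexPtr(), innerIndexPtr() and valuePtr() play for a compressed row-major Eigen matrix. It assumes the selected column ids are strictly increasing, so each output row stays sorted. The cost is proportional to the number of columns plus the non-zeros in the selected rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR (row-major compressed) container.
struct Csr {
    int rows = 0;
    int cols = 0;
    std::vector<int> outer;      // size rows + 1: start of each row in inner/values
    std::vector<int> inner;      // column index of each stored value
    std::vector<double> values;  // stored values
};

// Returns A(row_ids, col_ids) as a new CSR matrix.
// ASSUMPTION: col_ids is strictly increasing and all ids are in range.
Csr slice(const Csr& a, const std::vector<int>& row_ids, const std::vector<int>& col_ids) {
    // Map old column -> new column, or -1 if the column is dropped.
    std::vector<int> col_map(static_cast<std::size_t>(a.cols), -1);
    for (std::size_t k = 0; k < col_ids.size(); ++k) {
        col_map[static_cast<std::size_t>(col_ids[k])] = static_cast<int>(k);
    }

    Csr out;
    out.rows = static_cast<int>(row_ids.size());
    out.cols = static_cast<int>(col_ids.size());
    out.outer.reserve(row_ids.size() + 1);
    out.outer.push_back(0);
    for (int r : row_ids) {
        assert(r >= 0 && r < a.rows);
        // Walk only the stored entries of the selected row.
        for (int p = a.outer[r]; p < a.outer[r + 1]; ++p) {
            const int c = col_map[static_cast<std::size_t>(a.inner[p])];
            if (c >= 0) {
                out.inner.push_back(c);
                out.values.push_back(a.values[p]);
            }
        }
        out.outer.push_back(static_cast<int>(out.inner.size()));
    }
    return out;
}
```

The same loop structure could presumably be driven from an Eigen matrix by reading its raw arrays after makeCompressed(), or by emitting triplets for setFromTriplets instead of filling the output arrays by hand.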

filling only half of Matrix using OpenMp in C++

I have a quite big matrix. I would like to fill half of the matrix in parallel.
m_matrix is a 2D std::vector (a vector of vectors). Any suggestions for a better container type are appreciated as well. The work done by _fill(i, j) is not heavy compared to the size of the matrix.
//i: row
//j: column
for (size_t i=1; i<num_row; ++i)
{
    for (size_t j=0; j<i; ++j)
    {
        m_matrix[i][j] = _fill(i, j);
    }
}
What would be a good OpenMP structure for that? I tried the dynamic scheduling strategy, but the run time actually increased compared to the sequential version.
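One possible structure, offered as a sketch and not as a measured answer: store the strict lower triangle in a single packed buffer and parallelize the outer loop over rows. Because row i has i entries, equal contiguous ranges of rows would be unbalanced, so the sketch hands out small chunks of rows round-robin with schedule(static, 8). The chunk size, the packed layout, and the names fill_value and fill_lower_triangle are all assumptions. fill_value is a stand-in for the question's _fill, whose body is not shown. Without OpenMP enabled the pragma is ignored and the loop runs sequentially.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the question's _fill(i, j); the real function is not shown.
inline double fill_value(std::size_t i, std::size_t j) {
    return static_cast<double>(i * 1000 + j);
}

// Packed strict lower triangle: row i (i >= 1) holds i entries and starts at
// offset i * (i - 1) / 2, so element (i, j) with j < i is at that offset + j.
std::vector<double> fill_lower_triangle(std::size_t num_row) {
    std::vector<double> packed(num_row * (num_row - 1) / 2);
    const long long n = static_cast<long long>(num_row);

    // Rows grow with i, so small chunks dealt out round-robin spread the
    // long rows across threads with very little scheduling overhead.
    #pragma omp parallel for schedule(static, 8)
    for (long long i = 1; i < n; ++i) {
        const std::size_t row = static_cast<std::size_t>(i);
        double* out = packed.data() + row * (row - 1) / 2;
        for (std::size_t j = 0; j < row; ++j) {
            out[j] = fill_value(row, j);
        }
    }
    return packed;
}
```

Each thread writes to contiguous, disjoint ranges of one buffer, which tends to be friendlier to the cache than a vector of separately allocated rows. Whether it beats the sequential loop still depends on how cheap _fill really is.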

C++ Matrix product: increased speed with little changes

I'm writing code to multiply a sparse matrix by a full matrix.
I've created two classes, SparseMatrix and Matrix, which store data as a vector of shared pointers to vectors. In the SparseMatrix case I store each item as an object called SparseMatrixItem with two attributes: position and value. In the Matrix case I simply store the value.
Both can be row-based or column-based, depending on the value of a bool attribute.
Now I'm trying to write an efficient version of the standard product between the two matrices. For simplicity, the first implementation only considers the case in which the first matrix is a row-based SparseMatrix and the second is a column-based Matrix. I wrote the code in the SparseMatrix class by overloading operator*.
Here is my implementation:
template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(),input2.getNumberOfColumns(),true);
    int numberOfRow=newMatrix.getNumberOfRows();
    int numberOfColumn=newMatrix.getNumberOfColumns();
    // outer loop over rows of the sparse matrix, inner loop over columns of input2
    for (int i=0; i<numberOfRow; i++) {
        vector<SparseMatrixItem<scalar>>& readRow(*horizontalVectorMatrix[i]);
        vector<scalar>& writeRow(*newMatrix.internalMatrix[i]);
        for (int j=0; j<numberOfColumn; j++) {
            vector<scalar>& readColumn1(*input2.internalMatrix[j]);
            writeRow[j]=fastScalarProduct(readRow, readColumn1);
        }
    }
    return newMatrix;
}
The strange thing I cannot figure out is that if I swap the order of the two loops, performance is dramatically better.
I tested it with two matrices, 6040x4000 and 4000x6040: the first implementation took nearly 30 seconds, while the second took only 12 seconds.
Here it is:
template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(),input2.getNumberOfColumns(),true);
    int numberOfRow=newMatrix.getNumberOfRows();
    int numberOfColumn=newMatrix.getNumberOfColumns();
    // outer loop over columns of input2, inner loop over rows of the sparse matrix
    for (int j=0; j<numberOfColumn; j++) {
        vector<scalar>& readColumn(*input2.internalMatrix[j]);
        vector<scalar>& writeColumn(*newMatrix.internalMatrix[j]);
        for (int i=0; i<numberOfRow; i++) {
            vector<SparseMatrixItem<scalar>>& readRow(*horizontalVectorMatrix[i]);
            writeColumn[i]=fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}
I post also the code of the function fastScalarProduct() that I use:
template <typename scalar>
scalar SparseVectorMatrix<scalar>::fastScalarProduct
    ( vector<SparseMatrixItem<scalar>> &vector1
    , const vector<scalar> &vector2
    ) {
    // accumulate in scalar, not int, so floating-point products are not truncated
    scalar totalSum=0;
    int position;
    auto sizeVector1=vector1.size();
    for (decltype(sizeVector1) i=0; i<sizeVector1; i++) {
        // stored positions are 1-based
        position=vector1[i].position-1;
        if (vector2[position]) {
            totalSum+=(vector1[i].value)*vector2[position];
        }
    }
    return totalSum;
}
I tried the same product in MATLAB and it takes only about 1.5 seconds. I think there are issues with cache memory, but since I'm new to this kind of problem I cannot figure out the real cause.
I'm also trying to write an efficient full matrix product, and I'm facing the same problems.
You are right in saying the "issue" is with cache memory. I suggest you read about locality of reference (http://en.wikipedia.org/wiki/Locality_of_reference), which explains why your program runs faster when the loop with the most iterations is inside the one with fewer iterations. Basically, arrays are linear data structures, and they make great use of spatial locality.
As for the time it took to run the algorithm in matlab vs C++, I suggest you read this post: Why is MATLAB so fast in matrix multiplication?

Eigen MatrixXd push back in c++

Eigen is a well-known matrix library in C++. I am having trouble finding a built-in function to simply push an item onto the end of a matrix. Currently I know that it can be done like this:
Eigen::MatrixXd matrix(10, 3);
long int count = 0;
long int topCount = 10;
for (int i = 0; i < listLength; ++i) {
    matrix(count, 0) = list[i].x;
    matrix(count, 1) = list[i].y;
    matrix(count, 2) = list[i].z;
    count++;
    // double the capacity when full, keeping the existing values
    if (count == topCount) {
        topCount *= 2;
        matrix.conservativeResize(topCount, 3);
    }
}
// shrink to the number of rows actually written
matrix.conservativeResize(count, 3);
And this will work (some of the syntax may be off). But it's pretty convoluted for such a simple thing. Is there already a built-in function?
There is no such function for Eigen matrices. The reason for this is such a function would either be very slow or use excessive memory.
For a push_back function to not be prohibitively expensive it must increase the matrix's capacity by some factor when it runs out of space as you have done. However when dealing with matrices, memory usage is often a concern so having a matrix's capacity be larger than necessary could be problematic.
If it instead increased the size by rows() or cols() each time the operation would be O(n*m). Doing this to fill an entire matrix would be O(n*n*m*m) which for even moderately sized matrices would be quite slow.
Additionally, in linear algebra matrix and vector sizes are nearly always constant and known beforehand. Often when resizing a matrix you don't care about the previous values in the matrix. This is why Eigen's resize function does not retain old values, unlike std::vector's resize.
The only case I can think of where you wouldn't know the matrix's size beforehand is when reading from a file. In this case I would either load the data first into a standard container such as std::vector using push_back and then copy it into an already sized matrix, or if memory is tight run through the file once to get the size and then a second time to copy the values.
There is no such function, however, you can build something like this yourself:
#include <cstddef>
#include <iostream>
#include <Eigen/Dense>

using Eigen::MatrixXd;
using Eigen::Vector3d;

template <typename DynamicEigenMatrix>
void push_back(DynamicEigenMatrix& m, Vector3d&& values, std::size_t row)
{
    // grow to fit the requested row, keeping existing values
    if(row >= static_cast<std::size_t>(m.rows())) {
        m.conservativeResize(row + 1, Eigen::NoChange);
    }
    m.row(row) = values;
}

int main()
{
    MatrixXd matrix(10, 3);
    for (std::size_t i = 0; i < 10; ++i) {
        push_back(matrix, Vector3d(1,2,3), i);
    }
    std::cout << matrix << "\n";
    return 0;
}
If this needs to perform too many resizes though, it's going to be horrendously slow.