Slow performance of sparse matrix using std::vector - c++

I'm trying to implement the functionality of the MATLAB function sparse.
Insert a value into a sparse matrix at a specific index such that:
If a value with the same index is already present in the matrix, the new and old values are added.
Otherwise the new value is appended to the matrix.
The function addNode works correctly, but the problem is that it is extremely slow. I call this function in a loop about 100000 times, and the program takes more than 3 minutes to run, while MATLAB accomplishes the same task in a matter of seconds. Is there any way to optimize the code, or to use STL algorithms instead of my own function, to achieve what I want?
Code:
struct SparseMatNode
{
    int x;
    int y;
    float value;
};

std::vector<SparseMatNode> SparseMatrix;

void addNode(int x, int y, float val)
{
    SparseMatNode n;
    n.x = x;
    n.y = y;
    n.value = val;

    // Linear scan for an existing entry with the same coordinates
    bool alreadyPresent = false;
    int i = 0;
    for (i = 0; i < SparseMatrix.size(); i++)
    {
        if ((SparseMatrix[i].x == x) && (SparseMatrix[i].y == y))
        {
            alreadyPresent = true;
            break;
        }
    }

    if (alreadyPresent)
    {
        SparseMatrix[i].value += val;
        if (SparseMatrix[i].value == 0.0f)  // drop entries that cancel to zero
            SparseMatrix.erase(SparseMatrix.begin() + i);
    }
    else
        SparseMatrix.push_back(n);
}

Sparse matrices aren't typically stored as a vector of triplets as you are attempting.
MATLAB (as well as many other libraries) uses a Compressed Sparse Column (CSC) data structure, which is very efficient for static matrices. The MATLAB function sparse also does not build the matrix one entry at a time (as you are attempting) - it takes an array of triplet entries and packs the whole sequence into a CSC matrix. If you are attempting to build a static sparse matrix this is the way to go.
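For reference, a minimal CSC layout in C++ might look like the following (a sketch of the format only, not MATLAB's actual internals):

#include <vector>

// Compressed Sparse Column: nonzeros stored column by column.
struct CSCMatrix {
    int rows, cols;
    std::vector<int>   colPtr;   // size cols+1; column j occupies [colPtr[j], colPtr[j+1])
    std::vector<int>   rowIdx;   // row index of each stored nonzero
    std::vector<float> values;   // nonzero values, in the same order as rowIdx
};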
If you want a dynamic sparse matrix object, that supports efficient insertion and deletion of entries, you could look at different structures - possibly a std::map of triplets, or an array of column lists - see here for more information on data formats.
Also, there are many good libraries. If you want to do sparse matrix operations/factorisations etc., SuiteSparse is a good option; Eigen also has good sparse support.

Sparse matrices are usually stored in compressed sparse row (CSR) or compressed sparse column (CSC, also called Harwell-Boeing) format. MATLAB by default uses CSC, IIRC, while most sparse matrix packages tend to use CSR.
Anyway, if this is for production usage rather than a learning exercise, I'd recommend using a matrix package with support for sparse matrices. In the C++ world, my favourite is Eigen.

The first thing that stands out is that you are implementing your own loop for finding an element: that's what std::find_if is for. So, instead of:
bool alreadyPresent = false;
int i = 0;
for (i = 0; i < SparseMatrix.size(); i++)
{
    if ((SparseMatrix[i].x == x) && (SparseMatrix[i].y == y))
    {
        alreadyPresent = true;
        break;
    }
}
You should write:
auto it = std::find_if(SparseMatrix.begin(), SparseMatrix.end(), comparer);
where comparer is a predicate that checks whether a SparseMatNode has the given coordinates, e.g. the lambda [x, y](const SparseMatNode& n) { return n.x == x && n.y == y; }.
But the main improvement will come from using the appropriate container. Instead of std::vector, you will be much better off using an associative container. This way, finding an element will have just O(log N) complexity instead of O(N). You may slightly modify your SparseMatNode type as follows:
typedef std::pair<int, int> Coords;
typedef std::pair<const Coords, float> SparseMatNode;
You may wrap these typedefs inside a class to provide a better interface, of course.
And then:
std::map<Coords, float> SparseMatrix;
(A std::unordered_map would give average O(1) lookups rather than O(log N), but the standard library provides no default hash for std::pair, so you would have to supply your own; std::map works out of the box using the pair's built-in lexicographic ordering.) This way you can use:
auto it = SparseMatrix.find(std::make_pair(x, y));
to find elements much more efficiently.
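Putting it together, a minimal sketch of addNode on top of std::map might look like this (it keeps the exact-zero erase behaviour of the original; the names reuse those from the question):

#include <map>
#include <utility>

std::map<Coords, float> SparseMatrix;   // replaces the std::vector

void addNode(int x, int y, float val)
{
    Coords key(x, y);
    auto it = SparseMatrix.find(key);   // O(log N) lookup
    if (it != SparseMatrix.end()) {
        it->second += val;              // entry exists: accumulate
        if (it->second == 0.0f)         // same exact-zero test as the original
            SparseMatrix.erase(it);
    } else {
        SparseMatrix.insert(std::make_pair(key, val));
    }
}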

Have you tried sorting your vector of sparse nodes? Performing a linear search becomes costly every time you add a node. With a sorted vector you could insert in place and always use binary search, as in the sketch below.
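A sketch of that idea (hypothetical: it assumes the SparseMatNode and SparseMatrix definitions from the question, orders the keys by x then y, and uses made-up names nodeLess and addNodeSorted):

#include <algorithm>

// defines the sort order of the vector: by x, then by y
bool nodeLess(const SparseMatNode& a, const SparseMatNode& b)
{
    return a.x != b.x ? a.x < b.x : a.y < b.y;
}

void addNodeSorted(int x, int y, float val)
{
    SparseMatNode n = {x, y, val};
    // O(log N) binary search for the insertion point
    auto it = std::lower_bound(SparseMatrix.begin(), SparseMatrix.end(), n, nodeLess);
    if (it != SparseMatrix.end() && it->x == x && it->y == y)
        it->value += val;            // entry exists: accumulate
    else
        SparseMatrix.insert(it, n);  // keeps the vector sorted (O(N) shift, but no scan)
}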

Because a sparse matrix may be huge and needs to be compressed, you may use std::unordered_map. I assume the matrix indices (x and y) are always positive.
#include <unordered_map>

const size_t MAX_X = 1000*1000*1000;
std::unordered_map<size_t, float> matrix;

void addNode(size_t x, size_t y, float val)
{
    size_t index = x + y*MAX_X;
    matrix[index] += val;        // this function can still be faster
    if (matrix[index] == 0)      // using find() / insert() methods
        matrix.erase(index);
}
If std::unordered_map is not available on your system, you may try std::tr1::unordered_map or stdext::hash_map...
If you can use more memory, then use double instead of float; this will improve your processing speed a bit.
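As the comments in the code above hint, the repeated operator[] lookups can be collapsed into a single find(); a sketch of that variant:

void addNode(size_t x, size_t y, float val)
{
    if (val == 0)                  // nothing to add; matches the original's net effect
        return;
    size_t index = x + y*MAX_X;
    auto it = matrix.find(index);  // one hash lookup instead of two or three
    if (it == matrix.end()) {
        matrix.insert(std::make_pair(index, val));
    } else {
        it->second += val;
        if (it->second == 0)
            matrix.erase(it);
    }
}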

Related

Eigen: modify sparse matrix' triplet list, instead of using coeffRef

I am facing the problem of assembling an Eigen::SparseMatrix. In reality it concerns a finite element system matrix, assembled by looping over elements and integration points. Below I have made the problem more abstract.
I initialize the matrix by first constructing a list of triplets (as suggested in the Eigen documentation). I then perform the assembly in concurrent loops using coeffRef (see example below). The question concerns the fact that coeffRef "performs a binary search", while I know exactly where each item is in the list of triplets (T below). More specifically:
Is it more efficient to modify the list of triplets to avoid coeffRef, at the cost of having to reinitialize the sparse matrix?
If one wants to modify a value in the list of triplets, is there something more elegant than
T[i] = Trip(T[i].row(),T[i].col(),T[i].value()+X);
I realize that the answer may largely depend on the bandwidth of the matrix (i.e. how costly the search is), but there might be generic things to say about this.
Example
#include <iostream>
#include <Eigen/Sparse>

typedef Eigen::SparseMatrix<double> SpMat;
typedef Eigen::Triplet<double> Trip;

int main(void)
{
    size_t N = 100;
    SpMat A(N,N);

    std::vector<Trip> T;
    T.reserve(3*N);

    for ( size_t i=0; i<N; ++i )
    {
        if ( i==0 ) T.push_back(Trip(i,i  ,-1.0));
        else        T.push_back(Trip(i,i-1,-1.0));
        T.push_back(Trip(i,i,+2.0));
        if ( i==N-1 ) T.push_back(Trip(i,0  ,-1.0));
        else          T.push_back(Trip(i,i+1,-1.0));
    }

    A.setFromTriplets(T.begin(),T.end());

    for ( size_t i=0; i<N; ++i )
        A.coeffRef(i,i) += static_cast<double>(i);

    return 0;
}
Compiled using e.g.:
clang++ -I/usr/local/include/eigen3 test.cpp
My guess is that, as long as the coefficients accessed by coeffRef already exist in the matrix, calling coeffRef should be faster than reconstructing the matrix from the triplet list.
You might also outsmart the binary search performed by coeffRef by directly accessing the underlying data structure with A.valuePtr()[A.outerIndexPtr()[i]+some_offset] += ..., assuming you can directly compute some_offset taking advantage of the known structure.
Finally, if you need to update all entries, you can also sequentially iterate over them using an InnerIterator it and update the entries with it.valueRef() += ....
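For example, the sequential update mentioned above might look like this (using the SpMat typedef from the question):

for (int k = 0; k < A.outerSize(); ++k)            // iterate over the outer dimension (columns)
    for (SpMat::InnerIterator it(A, k); it; ++it)  // stored nonzeros of column k
        it.valueRef() += 1.0;                      // update each entry in place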

How to remove the same row location from multiple vectors?

I have n number of vectors (single column), which are correlated. One of them (let's say the first of the n vectors) has a bunch of NaNs in it. I have used the erase - remove_if idiom to clear this vector of the rows which contain the NaNs. I want to remove the exact same row from all the other vectors either simultaneously or after the fact. This seems like it would be a common coding problem, but I can't find an example. I'm coding in C++, with OpenCV libraries.
Here is my code sample that doesn't work, which I think is kind of what Miki is suggesting:
vector<float> RemoveManyEs(vector<float> &V1, vector<float> &V2, vector<float> &V3)
{
    int length = V1.size();
    int n = 0;
    do
    {
        if (isnan(V1.at(n)))
        {
            V1.erase(V1.begin() + n);
            V2.erase(V2.begin() + n);
            V3.erase(V3.begin() + n);
        }
        n += 1;
    } while (n < length);
    return V1,V2,V3;
}
Consider whether or not you really need to delete the rows, or whether it would suffice to set them to zero. If fewer than 10% of the rows need to be deleted, then removing them would have a negligible effect on run-time (and could even make things take longer overall, given the overhead of reallocation).
If you are computing correlation matrices for example, then you'll get the same result by zeroing out all elements. This is generally a much simpler operation and will give you the result you need for most applications.
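A sketch of the zeroing approach (assuming, as in the question, that NaNs occur only in V1; ZeroNanRows is a hypothetical name):

#include <cmath>
#include <vector>

void ZeroNanRows(std::vector<float>& V1, std::vector<float>& V2, std::vector<float>& V3)
{
    for (std::size_t n = 0; n < V1.size(); ++n) {
        if (std::isnan(V1[n])) {   // NaNs only occur in V1
            V1[n] = 0.0f;
            V2[n] = 0.0f;          // zero the same row everywhere
            V3[n] = 0.0f;
        }
    }
}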

How to verify if a vector has a value at a certain index

In a "self-avoiding random walk" situation, I have a 2-dimensional vector with a configuration of step-coordinates. I want to be able to check if a certain site has been occupied, but the problem is that the axis can be zero, so checking if the fabs() of the coordinate is true (or that it has a value), won't work. Therefore, I've considered looping through the steps and checking if my coordinate equals another coordinate on all axis, and if it does, stepping back and trying again (a so-called depth-first approach).
Is there a more efficient way to do this? I've seen someone use a boolean array with all possible coordinates, like so:
bool occupied[nMax][nMax];      // true if lattice site is occupied
for (int y = -rMax; y <= rMax; y++)
    for (int x = -rMax; x <= rMax; x++)
        occupied[index(y)][index(x)] = false;
But, in my program the number of dimensions is unknown, so would an approach such as:
typedef std::vector<std::vector<long int>> WalkVec;
WalkVec walk(1, std::vector<long int>(dof,0));

siteVisited = false; counter = 0;
while (counter < (walkVec.back().size()-1))
{
    tdof = 1;
    while (tdof <= dimensions)
    {
        if (walkHist.back().at(tdof-1) == walkHist.at(counter).at(tdof-1) || walkHist.back().at(tdof-1) == 0)
        {
            siteVisited = true;
        }
        else
        {
            siteVisited = false;
            break;
        }
        tdof++;
    }
work, where dof is the number of dimensions? (The check for zero tests whether the position is the origin: three zero coordinates, or three visited coordinates on the same step, is the only way to make it true.)
Is there a more efficient way of doing it?
You can do this check in O(log n) or O(1) time using STL's set or unordered_set respectively. The unordered_set container requires you to write a custom hash function for your coordinates, while the set container only needs you to provide a comparison function. The set implementation is particularly easy, and logarithmic time should be fast enough:
#include <iostream>
#include <set>
#include <vector>
#include <cassert>

class Position {
public:
    Position(const std::vector<long int> &c)
        : m_coords(c) { }
    size_t dim() const { return m_coords.size(); }
    bool operator <(const Position &b) const {
        assert(b.dim() == dim());
        for (size_t i = 0; i < dim(); ++i) {
            if (m_coords[i] < b.m_coords[i])
                return true;
            if (m_coords[i] > b.m_coords[i])
                return false;
        }
        return false;
    }
private:
    std::vector<long int> m_coords;
};

int main(int argc, const char *argv[])
{
    std::set<Position> visited;
    std::vector<long int> coords(3, 0);
    visited.insert(Position(coords));   // mark the origin as visited
    while (true) {
        std::cout << "x, y, z: ";
        std::cin >> coords[0] >> coords[1] >> coords[2];
        Position candidate(coords);
        if (visited.find(candidate) != visited.end())
            std::cout << "Already visited!" << std::endl;
        else
            visited.insert(candidate);
    }
    return 0;
}
Of course, as iavr mentions, any of these approaches will require O(n) storage.
Edit: The basic idea here is very simple. The goal is to store all the visited locations in a way that allows you to quickly check if a particular location has been visited. Your solution had to scan through all the visited locations to do this check, which makes it O(n), where n is the number of visited locations. To do this faster, you need a way to rule out most of the visited locations so you don't have to compare against them at all.
You can understand my set-based solution by thinking of a binary search on a sorted array. First you come up with a way to compare (sort) the D-dimensional locations. That's what the Position class' < operator is doing. As iavr pointed out in the comments, this is basically just a lexicographic comparison. Then, when all the visited locations are sorted in this order, you can run a binary search to check if the candidate point has been visited: you recursively check if the candidate would be found in the upper or lower half of the list, eliminating half of the remaining list from comparison at each step. This halving of the search domain at each step gives you logarithmic complexity, O(log n).
The STL set container is just a nice data structure that keeps your elements in sorted order as you insert and remove them, ensuring insertion, removal, and queries are all fast. In case you're curious, the STL implementation I use uses a red-black tree to implement this data structure, but from your perspective this is irrelevant; all that matters is that, once you give it a way to compare elements (the < operator), inserting elements into the collection (set::insert) and asking if an element is in the collection (set::find) are O(log n). I check against the origin by just adding it to the visited set--no reason to treat it specially.
The unordered_set is a hash table, an asymptotically more efficient data structure (O(1)), but a harder one to use because you must write a good hash function. Also, for your application, going from O(n) to O(log n) should be plenty good enough.
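For the unordered_set route, here is a sketch of a hash for the coordinates (hashing the raw std::vector<long int> keys directly rather than the Position wrapper above; the polynomial combining scheme is just an illustrative choice, not a recommendation):

#include <unordered_set>
#include <vector>

struct CoordHash {
    size_t operator()(const std::vector<long int>& p) const {
        size_t h = 0;
        for (long int c : p)
            h = h * 31 + std::hash<long int>()(c);  // simple polynomial combine
        return h;
    }
};

// std::vector already provides operator==, so no custom equality is needed:
std::unordered_set<std::vector<long int>, CoordHash> visited;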
Your question concerns the algorithm rather than the use of the (C++) language, so here is a generic answer.
What you need is a data structure to store a set (of point coordinates) with an efficient operation to query whether a new point is in the set or not.
Explicitly storing the set as a boolean array provides constant-time query (fastest), but at space that is exponential in the number of dimensions.
An exhaustive search (your second option) provides queries that are linear in the set size (walk length), at a space that is also linear in the set size and independent of dimensionality.
The other two common options are tree structures and hash tables, e.g. available as std::set (typically using a red-black tree) and std::unordered_set (the latter only in C++11). A tree structure typically has logarithmic-time query, while a hash table query can be constant-time in practice, almost bringing you back to the complexity of a boolean array. But in both cases the space needed is again linear in the set size and independent of dimensionality.

Get number of elements greater than a number

I am trying to solve the following problem: Numbers are being inserted into a container. Each time a number is inserted I need to know how many elements are in the container that are greater than or equal to the current number being inserted. I believe both operations can be done in logarithmic complexity.
My question:
Are there standard containers in a C++ library that can solve the problem?
I know that std::multiset can insert elements in logarithmic time, but how can you query it? Or should I implement a data structure (e.x. a binary search tree) to solve it?
Great question. I do not think there is anything in the STL which would suit your needs (provided you MUST have logarithmic times). I think the best solution then, as aschepler says in the comments, is to implement an RB tree. You may have a look at the STL source code, particularly stl_tree.h, to see whether you could use bits of it.
Better still, look at Rank Tree in C++, which contains a link to an implementation:
http://code.google.com/p/options/downloads/list
You should use a multiset for logarithmic complexity, yes. But computing the distance is the problem: since set/map iterators are bidirectional, not random-access, std::distance has O(n) complexity on them:
multiset<int> my_set;
...
auto it = my_set.lower_bound(3);
size_t count_inserted = distance(it, my_set.end()); // this is definitely O(n)
my_set.insert(3);
Your complexity requirement is the complicated part. Here is a full analysis:
If you want O(log(n)) complexity for each insertion, you need a sorted structure such as a set. If you want the structure not to reallocate or move items when adding a new item, then computing the distance to the insertion point will be O(n). If you know the number of insertions in advance, you do not need logarithmic insertion time in a sorted container: you can insert all the items and then sort, which is just as much O(n log n) as n insertions into a set at O(log n) each.
The only alternative is to use a dedicated container like a weighted RB-tree. Depending on your problem this may be the solution, or complete overkill.
Use multiset and distance, and you are O(n log n) on insertion (yes, n insertions at log(n) insertion time each) and O(n^2) on the distance computations, though each individual distance computation is fast.
If you know the inserted data size (n) in advance: use a vector, fill it, sort it, then return your distances; you are O(n log n), and it is easy to code (see the sketch below).
If you do not know n in advance, and n is likely huge with memory-heavy items so that you cannot afford O(n log n) reallocation: then it is worth re-encoding or reusing some non-standard code; if you really have to meet these complexity expectations, use a dedicated container. Also consider using a database; you will probably have issues maintaining this in memory.
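A sketch of the fill-then-sort option (assuming all n values really are known up front; countGreaterOrEqual is a made-up helper name):

#include <algorithm>
#include <vector>

// 'values' must already be filled and sorted ascending (O(n log n), done once)
int countGreaterOrEqual(const std::vector<int>& values, int x)
{
    // binary search: O(log n) per query
    return values.end() - std::lower_bound(values.begin(), values.end(), x);
}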
Here's a quick way using Policy-Based Data Structures in C++:
There exists something called an Ordered Set, which lets you insert/remove elements in O(log N) time (and offers pretty much all the other functions that std::set does). It also gives two more features: find the Kth element, and find the rank of element X. The problem is that it doesn't allow duplicates :(
No worries though! We will map duplicates to a separate index/priority, and define a new structure (call it an Ordered Multiset)! I've attached my implementation below for reference.
Finally, every time you want to find the number of elements greater than some x, call the function upper_bound (the number of elements less than or equal to x) and subtract this number from the size of your Ordered Multiset!
Note: PBDS uses a lot of memory, so that is a constraint; if memory matters, I'd suggest using a Binary Search Tree or a Fenwick Tree instead.
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

struct ordered_multiset {   // multiset supporting duplicate values in the set
    int len = 0;
    const long long ADD = 1000010;       // spreads each value's copies over distinct keys
    const long long MAXVAL = 1000000010; // shift so negative values map to positive keys
    unordered_map<long long, long long> mp;     // value -> number of copies currently stored
    tree<long long, null_type, less<long long>, rb_tree_tag,
         tree_order_statistics_node_update> T;  // 64-bit keys: (value * ADD) overflows int
    ordered_multiset() { len = 0; T.clear(), mp.clear(); }
    inline void insert(long long x) {
        len++, x += MAXVAL;
        long long c = mp[x]++;
        T.insert((x * ADD) + c); }              // encode (value, copy index) as one key
    inline void erase(long long x) {
        x += MAXVAL;
        long long c = mp[x];
        if (c) {
            c--, mp[x]--, len--;
            T.erase((x * ADD) + c); } }
    inline long long kth(int k) {               // 1-based index, returns the
        if (k < 1 || k > len) return -1;        // k'th smallest element,
        auto it = T.find_by_order(--k);         // -1 if none exists
        return ((*it) / ADD) - MAXVAL; }
    inline int lower_bound(long long x) {       // count of values < x
        x += MAXVAL;
        return T.order_of_key(x * ADD); }
    inline int upper_bound(long long x) {       // count of values <= x
        x += MAXVAL;
        long long c = mp[x];
        return T.order_of_key((x * ADD) + c); }
    inline int size() { return len; }           // number of elements
};
Usage:
ordered_multiset s;
for (int i = 0; i < n; i++) {
    int x; cin >> x;
    s.insert(x);
    int ctr = s.size() - s.upper_bound(x);  // elements strictly greater than x
    cout << ctr << " ";
}
Input (n = 5) : 10 1 3 3 2
Output : 0 1 1 1 3
Time Complexity : O(log n) per query/insert
References : mochow13's GitHub
Sounds like a case for count_if - although I admit this doesn't solve it at logarithmic complexity; that would require a sorted type.
vector<int> v = { 1, 2, 3, 4, 5 };
int some_value = 3;
int count = count_if(v.begin(), v.end(), [some_value](int n) { return n >= some_value; });
Edit done to fix syntactic problems with lambda function
If the whole range of numbers is sufficiently small (on the order of a few million), this problem can be solved relatively easily using a Fenwick tree.
Although Fenwick trees are not part of the STL, they are both very easy to implement and time efficient. The time complexity is O(log N) for both updates and queries and the constant factors are low.
You mention in a comment on another question, that you needed this for a contest. Fenwick trees are very popular tools in competitive programming and are often useful.
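As a sketch (assuming the inserted values are positive and bounded by some MAXV; in a contest you would pick MAXV to fit the input constraints, or compress the values first):

#include <vector>

const int MAXV = 1000000;            // assumed upper bound on the values
std::vector<int> bit(MAXV + 1, 0);   // Fenwick tree over value counts
int total = 0;                       // number of elements inserted so far

void insert(int x) {                 // O(log MAXV)
    ++total;
    for (; x <= MAXV; x += x & (-x))
        ++bit[x];
}

int countLess(int x) {               // number of stored values < x, O(log MAXV)
    int s = 0;
    for (--x; x > 0; x -= x & (-x))
        s += bit[x];
    return s;
}

int countGreaterOrEqual(int x) { return total - countLess(x); }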

Sorting eigenvectors by their eigenvalues (associated sorting)

I have an unsorted vector of eigenvalues and a related matrix of eigenvectors. I'd like to sort the columns of the matrix with respect to the sorted set of eigenvalues. (e.g., if eigenvalue[3] moves to eigenvalue[2], I want column 3 of the eigenvector matrix to move over to column 2.)
I know I can sort the eigenvalues in O(N log N) via std::sort. Without rolling my own sorting algorithm, how do I make sure the matrix's columns (the associated eigenvectors) follow along with their eigenvalues as the latter are sorted?
Typically just create a structure something like this:
struct eigen {
    int value;
    double *vector;
    bool operator<(eigen const &other) const {
        return value < other.value;
    }
};
Alternatively, just put the eigenvalue/eigenvector into an std::pair -- though I'd prefer eigen.value and eigen.vector over something.first and something.second.
I've done this a number of times in different situations. Rather than sorting the array, just create a new array that has the sorted indices in it.
For example, you have a length-n array (vector) evals, and a 2d n x n array evects. Create a new array index that contains the values [0, n-1].
Then rather than accessing evals as evals[i], you access it as evals[index[i]], and instead of evects[i][j], you access evects[index[i]][j].
Now you write your sort routine to sort the index array rather than the evals array, so instead of index looking like {0, 1, 2, ... , n-1}, the value in the index array will be in increasing order of the values in the evals array.
So after sorting, if you do this:
for (int i = 0; i < n; ++i)
{
    cout << evals[index[i]] << endl;
}
you'll get a sorted list of evals.
This way you can sort anything that's associated with that evals array without actually moving memory around. This is important when n gets large: you don't want to be moving the columns of the evects matrix around.
Basically, the i'th smallest eval will be located at index[i], and that corresponds to the index[i]'th evect.
Edited to add. Here's a sort function that I've written to work with std::sort to do what I just said:
template <class DataType, class IndexType>
class SortIndicesInc
{
protected:
    DataType* mData;
public:
    SortIndicesInc(DataType* Data) : mData(Data) {}
    bool operator()(const IndexType& i, const IndexType& j) const
    {
        return mData[i] < mData[j];
    }
};
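For instance (assuming evals is an array of n doubles and index is the identity permutation described above), it might be used like this:

std::vector<int> index(n);
for (int i = 0; i < n; ++i) index[i] = i;      // identity permutation 0..n-1
std::sort(index.begin(), index.end(),
          SortIndicesInc<double, int>(evals)); // sorts the indices by evals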
The solution depends purely on the way you store your eigenvector matrix.
The best performance while sorting will be achieved if you can implement swap(evector1, evector2) so that it only rebinds the pointers and the real data is left unchanged.
This could be done using something like double* or probably something more complicated, depends on your matrix implementation.
If done this way, swap(...) wouldn't affect your sorting operation performance.
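A sketch of that idea (hypothetical; EigenPair is a made-up type that holds each eigenvector column through a pointer, so a swap only rebinds pointers):

#include <utility>

struct EigenPair {
    double  value;    // eigenvalue
    double* vector;   // points at the column's data (owned elsewhere)
};

// swapping rebinds the pointers; the column data itself never moves
void swap(EigenPair& a, EigenPair& b)
{
    std::swap(a.value, b.value);
    std::swap(a.vector, b.vector);
}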
The idea of conglomerating your vector and matrix is probably the best way to do it in C++. I am thinking about how I would do it in R and whether that can be translated to C++. In R it's very easy: simply evec <- evec[, order(eval)]. Unfortunately, I don't know of any built-in way to perform the order() operation in C++. Perhaps someone else does, in which case this could be done in a similar way.