How to speed up a C++ sparse matrix manipulation? - c++

I have a small script for manipulating a sparse matrix in C++. It works perfectly fine except that it takes too much time. Since I'm doing this manipulation over and over, it is critical to speed it up. I would appreciate any ideas. Thanks.
#include <stdio.h> /* printf, scanf, puts, NULL */
#include <stdlib.h> /* srand, rand */
#include <time.h> /* time */
#include <iostream> /* cout, fixed, scientific */
#include <string>
#include <cmath>
#include <vector>
#include <list>
#include <string>
#include <sstream> /* SJW 08/09/2010 */
#include <fstream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
using namespace Eigen;
using namespace std;
SparseMatrix<double> MatMaker (int n1, int n2, double prob)
{
MatrixXd A = (MatrixXd::Random(n1, n2) + MatrixXd::Ones(n1, n2))/2;
A = (A.array() > prob).select(0, A);
return A.sparseView();
}
////////////////This needs to be optimized/////////////////////
int SD_func(SparseMatrix<double> &W, VectorXd &STvec, SparseMatrix<double> &Wo, int tauR, int tauD)
{
W = W + 1/tauR*(Wo - W);
for (int k = 0; k < W.outerSize(); ++k)
for (SparseMatrix<double>::InnerIterator it(W, k); it; ++it)
W.coeffRef(it.row(),it.col()) = it.value() * (1-STvec(it.col())/tauD);
return 1;
}
int main ()
{
SparseMatrix<double> Wo = MatMaker(5000, 5000, 0.1);
SparseMatrix<double> W = MatMaker(5000, 5000, 0.1);
VectorXd STvec = VectorXd::Random(5000);
clock_t tsd1,tsd2;
float Timesd = 0.0;
tsd1 = clock();
///////////////////////////////// Any way to speed up this function???????
SD_func(W, STvec, Wo, 8000, 50);
//////////////////////////////// ??????????
tsd2 = clock();
Timesd += (tsd2 - tsd1);
cout<<"SD time: " << Timesd / CLOCKS_PER_SEC << " s" << endl;
return 0;
}

The most critical performance improvement (IMO) you can make is to not use W.coeffRef(it.row(),it.col()). It performs a binary search in W for the element each time. As you are already using SparseMatrix<double>::InnerIterator it(W, k); it is very simple to change your function to skip the binary search:
int SD_func_2(SparseMatrix<double> &W, VectorXd &STvec, SparseMatrix<double> &Wo, int tauR, int tauD)
{
W = W + (1.0/tauR)*(Wo - W); // 1.0, not 1: with int tauR, 1/tauR is integer division and evaluates to 0
double tauDInv = 1./tauD;
for (int k = 0; k < W.outerSize(); ++k)
for (SparseMatrix<double>::InnerIterator it(W, k); it; ++it)
it.valueRef() *= (1-STvec(it.col())*tauDInv);
return 1;
}
This results in a roughly 3x speedup. Note that I've incorporated @dshin's comment that multiplying is faster than dividing; however, about 90% of the performance improvement comes from removing the binary search and about 10% from multiplication vs. division.
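To illustrate why iterating over the stored entries avoids the search, here is a minimal sketch of a compressed-column layout. It is not Eigen's implementation; the struct `CscMatrix` and the function `scaleColumns` are hypothetical names used only for this example. The loop walks the value array directly, so no lookup by (row, col) is ever needed.

```cpp
#include <cstddef>
#include <vector>

// Minimal compressed-column (CSC) layout, for illustration only.
// This is NOT Eigen's implementation; all names here are hypothetical.
struct CscMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> colStart; // size cols + 1; column k occupies [colStart[k], colStart[k + 1])
    std::vector<std::size_t> rowIndex; // row of each stored entry
    std::vector<double> values;        // value of each stored entry
};

// Scale every stored entry by (1 - st[col] / tauD), touching each value once.
// The loop already knows where each entry is stored, so there is no search.
void scaleColumns(CscMatrix& w, const std::vector<double>& st, double tauD)
{
    const double tauDInv = 1.0 / tauD; // one division, then only multiplications
    for (std::size_t k = 0; k < w.cols; ++k) {
        const double factor = 1.0 - st[k] * tauDInv;
        for (std::size_t p = w.colStart[k]; p < w.colStart[k + 1]; ++p)
            w.values[p] *= factor;
    }
}
```

Because the factor depends only on the column, it can be computed once per column rather than once per entry.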

Related

Program not working for some reason, could someone pls help me fix it [closed]

Could someone please help me fix my program and explain why it's not working?
It's supposed to generate n points with 2 coordinates, both of which are random numbers. The values themselves are random but have to lie in the interval from 0 to some chosen value k. All the points have to be separated from each other by some radius, which is taken to be 1.
For some reason my program doesn't even start. When I run it, Windows just says that the program is not responding and is trying to diagnose the problem.
Please simplify your explanation as much as possible since I'm a complete beginner and probably won't understand otherwise. Thanks a bunch in advance.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cmath>
#include <fstream>
using namespace std;
int main()
{
int n=5;
int k=100;
vector<vector<double>> a(n, vector<double> (2));
srand(132);
//a[0][1]=k*((float(rand()))/RAND_MAX);
//a[0][0]=k*((float(rand()))/RAND_MAX);
for(int i=0; i<n;){
a[i][0]=k*((float(rand()))/RAND_MAX);
a[i][1]=k*((float(rand()))/RAND_MAX);
for (int j=0; j<n; j+=1){
if (sqrt(pow((a[i][1]-a[j][1]),2)+pow((a[i][0]-a[j][0]),2))<=1){
i=i;
break;}
else if(j==n-1){
cout << a[i][0] << " " << a[i][1] << endl;
i+=1;}
}}
return 0;
}
Your code lacks structure. That's why it is hard to understand, even for you, as you have now found out.
I think a good start would be to write a class for a point and two functions, one for random values and one for the distance between two points. Then everything, especially the nested loops, becomes much easier to read and debug.
Look at this:
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib> // rand, srand, RAND_MAX
using namespace std;
struct Point
{
Point() = default;
float x;
float y;
};
float scaled_random(int k)
{
return k*((float(rand()))/RAND_MAX);
}
float distance(const Point& a, const Point& b)
{
return sqrt(pow(a.y-b.y,2)+pow(a.x-b.x,2));
}
int main()
{
int n = 5;
int k = 100;
vector<Point> a(n);
srand(132);
for (int i=0; i<n; ) {
a[i].x = scaled_random(k);
a[i].y = scaled_random(k);
for (int j=0; j<n; j+=1) {
if (distance(a[i], a[j]) <= 1) {
i = i;
break;
} else if (j == n-1) {
cout << a[i].x << " " << a[i].y << endl;
i += 1;
}
}
}
return 0;
}
The issue is still the same, but it has now more structure, better formatting and superfluous includes removed.
Maybe you can see the problem yourself much better this way.
The first time through your code, i and j are both zero, so a[i][1] - a[j][1] and a[i][0] - a[j][0] are zero and the distance test succeeds. The loop then breaks without incrementing i and starts again with the same i, which results in an infinite loop.
Checking i != j fixes the problem:
if (i != j && sqrt(pow((a[i][1] - a[j][1]), 2) + pow((a[i][0] - a[j][0]), 2)) <= 1) {
Your code might be better structured as:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cmath>
#include <algorithm>
int main()
{
int n = 5;
int k = 100;
std::vector<std::vector<double>> a(n, std::vector<double>(2));
srand(132);
for (int i = 0; i < n; i++) {
auto end = a.begin() + i;
do
{
a[i][0] = k * ((float(rand())) / RAND_MAX);
a[i][1] = k * ((float(rand())) / RAND_MAX);
}
while (end != std::find_if(a.begin(), end, [&](const std::vector<double>& element)
{
return sqrt(pow((a[i][1] - element[1]), 2) + pow((a[i][0] - element[0]), 2)) <= 1;
}));
std::cout << a[i][0] << " " << a[i][1] << "\n";
}
return 0;
}
With this code, each new point is checked only against the points before i rather than against all of the values.
rand should be avoided in modern C++; see "Why is the use of rand() considered bad?"
As each element of your vector always holds exactly 2 values, it would be better to use std::pair or std::array.
pow can be quite an inefficient way to square a number. The sqrt can be avoided by comparing the squared distance with the squared radius instead.
Using the above points your code could become:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cmath>
#include <algorithm>
#include <array>
#include <random>
using point = std::array<double, 2>;
double distanceSquared(const point& a, const point& b)
{
auto d0 = a[0] - b[0];
auto d1 = a[1] - b[1];
return d0 * d0 + d1 * d1;
}
int main()
{
int n = 5;
int k = 100;
std::vector<point> a(n);
std::random_device rd;
std::mt19937_64 engine(rd());
std::uniform_real_distribution<double> dist(0, k);
for (int i = 0; i < n; i++) {
auto end = a.begin() + i;
do
{
a[i][0] = dist(engine);
a[i][1] = dist(engine);
}
while (end != std::find_if(a.begin(), end, [&](const point& element)
{
return distanceSquared(a[i], element) <= 1;
}));
std::cout << a[i][0] << " " << a[i][1] << "\n";
}
return 0;
}

Non-uniform FFT forward and backward test in 1D

I am learning to use a c++ library to perform non-uniform FFT (NUFFT). The library provides 3 types of NUFFT.
Type 1: forward transform from a non-uniform x grid to a uniform k-space grid.
Type 2: backward transform from a uniform k-space grid to a non-uniform x grid
Type 3: from non-uniform to non-uniform
I tested the library in 1D by performing NUFFT on a test function sin(x) from -pi to pi using Type1 NUFFT, transform it back using Type2 NUFFT, and compare the output with sin(x). At first, I tested it on a uniform x grid, which shows a very small error. The error unfortunately is very large when the test is done on a non-uniform x grid.
Two possibilities:
My implementation of NUFFT is incorrect, but the implementation is rather simple, so I doubt if this is the case.
The author mentions that Type2 is NOT the inverse of Type1, so I believe that might be the problem. Since I am not an expert in NUFFT, I wonder if there is an alternative way to perform a forward/backward test with NUFFT?
My purpose is to develop an FFT Poisson solver on an irregular mesh, so I need to perform the NUFFT forward and backward, and it is therefore important to overcome this problem. Besides using FINUFFT, any other suggestions are also welcome.
Thank you for reading.
The code is here for those who are interested.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex>
#include <fftw3.h>
#include <functional>
#include "finufft/src/finufft.h"
using namespace std;
int main()
{
double pi = 3.14159265359;
int N = 128*2;
int i;
double X[N];
double L = 2*pi;
double dx = L/(N);
nufft_opts opts; finufft_default_opts(&opts);
complex<double> R = complex<double>(1.0,0.0); // the real unit
complex<double> in1[N], out1[N], out2[N];
for(i = 0; i < N; i++) {
//X[i] = -(L/2) + i*dx ; // uniform grid
X[i] = -(L/2) + pow(double(i)/N,7.0)*L; //non-uniform grid
in1[i] = sin(X[i])*R ;}
int ier = finufft1d1(N,X,in1,-1,1e-10,N,out1,opts); // type-1 NUFFT
int ier2 = finufft1d2(N,X,out2,+1,1e-10,N,out1,opts); // type-2 NUFFT
// checking the error
double erl1 = 0.;
for ( i = 0; i < N; i++) {
erl1 += fabs( in1[i].real() - out2[i].real()/(N))*dx;
}
std::cout<< erl1 <<" " << ier << " "<< ier2<< std::endl ; // error
return 0;
}
As it turns out, the developers updated their page with an example that answers exactly my question: https://finufft.readthedocs.io/en/latest/examples.html#periodic-poisson-solve-on-non-cartesian-quadrature-grid. In brief, their NUFFT code is NOT well suited to a fully adaptive scheme, but I am still providing an answer and code here for completeness.
There are two ingredients missing in my code.
(1) I need to multiply the function, sin(x), by a weight before using the NUFFT. In their 2D example the weight comes from the determinant of the Jacobian, so for a 1D example it is simply the derivative of the nonuniform coordinate with respect to the uniform coordinate, dx/dksi.
(2) Nk must be smaller than N.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex>
#include <fftw3.h>
#include <functional>
#include "finufft/src/finufft.h"
using namespace std;
int main()
{
double pi = 3.14159265359;
int N = 128*2;
int Nk = 32; // smaller than N
int i;
double X[N], ksi[N]; // nonuniform grid and uniform reference grid
double L = 2*pi;
double dx = L/(N);
nufft_opts opts; finufft_default_opts(&opts);
complex<double> R = complex<double>(1.0,0.0); // the real unit
complex<double> in1[N], out1[N], out2[N];
for(i = 0; i < N; i++) {
ksi[i] = -(L/2) + i*dx ; //uniform grid
X[i] = -(L/2) + pow(double(i)/(N-1),6)*L; //nonuniform grid
}
double *dX = der(ksi,X,1); // your own derivative code, assumed here to return an array holding dX/dksi
for(i = 0; i < N; i++) {
in1[i] = sin(X[i]) * dX[i] * R ; // applying weight
}
int ier = finufft1d1(N,X,in1,-1,1e-10,Nk,out1,opts); // type-1 NUFFT
int ier2 = finufft1d2(N,X,out2,+1,1e-10,Nk,out1,opts); // type-2 NUFFT
// checking the error
double erl1 = 0.;
for ( i = 0; i < N; i++) {
erl1 += fabs( in1[i].real() - out2[i].real()/(N))*dx;
}
std::cout<< erl1 <<" " << ier << " "<< ier2<< std::endl ; // error
return 0;
}
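The `der` call above is left to the reader. As a minimal sketch of one way to get the weight dX/dksi, assuming the ksi grid is uniform with spacing h, central differences can be used in the interior and one-sided differences at the two ends. The name `gridDerivative` and its signature are hypothetical and are not part of FINUFFT.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the `der` helper: approximates dX/dksi on a
// uniform ksi grid with spacing h. Central differences are used in the
// interior and one-sided differences at the first and last point.
std::vector<double> gridDerivative(const std::vector<double>& x, double h)
{
    const std::size_t n = x.size();
    std::vector<double> dx(n, 0.0);
    if (n < 2) return dx; // not enough points to difference
    dx[0] = (x[1] - x[0]) / h;
    dx[n - 1] = (x[n - 1] - x[n - 2]) / h;
    for (std::size_t i = 1; i + 1 < n; ++i)
        dx[i] = (x[i + 1] - x[i - 1]) / (2.0 * h);
    return dx;
}
```

The one-sided end points are only first-order accurate, so a higher-order stencil may be preferable when the grid is strongly stretched near the boundaries.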

C++ equivalent of Python's scipy.sparse.rand

I need C++ equivalent of Python's scipy.sparse.rand function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.rand.html). I will input m, n, density parameters and rand() function will generate a matrix (preferably COO matrix) for me. How can I do it?
If you have boost on your hands it might be a good idea to use a combination of boost-ublas and boost-random to do the job.
For this you have to install the boost libraries beforehand.
You might optimize the code if you know beforehand the number of non-zero elements.
You can watch the code work at:
https://coliru.stacked-crooked.com/a/097ea92bb336c774
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include <boost/numeric/ublas/io.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_01.hpp>
int main() {
int rows{ 4 };
int cols{ 4 };
double density{ 0.25 };
boost::random::uniform_01<> dist;
boost::random::mt19937 gen;
boost::numeric::ublas::mapped_matrix<double> m(rows, cols, 3 * 3);
for (unsigned i = 0; i < m.size1(); ++i)
for (unsigned j = 0; j < m.size2(); ++j)
if (dist(gen) < density) m(i, j) = dist(gen); // fill an entry with probability `density`
std::cout << m << std::endl;
}
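If Boost is not available, the same idea can be sketched with the standard library alone and a coordinate (COO) layout, which is what the question prefers. The names `CooMatrix` and `randomCoo` are hypothetical. Note one difference from scipy.sparse.rand that this sketch does not try to hide: each entry is filled independently with probability `density`, so the number of non-zeros matches the requested density only on average.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Coordinate (COO) storage: one (row, col, value) triple per stored entry.
// Hypothetical type for illustration only.
struct CooMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> row, col;
    std::vector<double> val;
};

// Fill each position independently with probability `density`;
// stored values are drawn uniformly from [0, 1).
CooMatrix randomCoo(std::size_t rows, std::size_t cols, double density, unsigned seed)
{
    CooMatrix m;
    m.rows = rows;
    m.cols = cols;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            if (uniform(gen) < density) {
                m.row.push_back(i);
                m.col.push_back(j);
                m.val.push_back(uniform(gen));
            }
    return m;
}
```

This visits every position, so it costs O(rows * cols) time regardless of the density; for very large, very sparse matrices, drawing the non-zero positions directly would be cheaper.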

c++ - Fill a symmetric matrix using an array stored on the heap

I am trying to build a code where I have to declare a large array in the heap.
At the same time I will use the boost library to perform some matrix calculations (as can be seen in Fill a symmetric matrix using an array
).
My limitations here are two : I will deal with large arrays and matrices so I have to declare everything on the heap and I have to work with arrays and not with vectors.
However, I am facing a problem that is probably trivial for many people: when filling the matrix, the last element doesn't get filled in correctly. So although I expect to get
[3,3]((0,1,3),(1,2,4),(3,4,5))
the output of the code is
[3,3]((0,1,3),(1,2,4),(3,4,2.6681e-315))
I am compiling this code in ROOT6. I don't think it's related to that, I am just mentioning it for completeness.
A small sample of the code follows
#include <iterator>
#include <iostream>
#include <fstream>
#include </usr/include/boost/numeric/ublas/matrix.hpp>
#include </usr/include/boost/numeric/ublas/matrix_sparse.hpp>
#include </usr/include/boost/numeric/ublas/symmetric.hpp>
#include </usr/include/boost/numeric/ublas/io.hpp>
using namespace std;
int test_boost () {
using namespace boost::numeric::ublas;
symmetric_matrix<double, upper> m_sym1 (3, 3);
float* filler = new float[6];
for (int i = 0; i<6; ++i) filler[i] = i;
float const* in1 = filler;
for (size_t i = 0; i < m_sym1.size1(); ++ i)
for (size_t j = 0; j <= i && in1 != &filler[5]; ++ j)
m_sym1 (i, j) = *in1++;
delete[] filler;
std::cout << m_sym1 << std::endl;
return 0;
}
Any idea on how to solve that?
Arrays and pointers are not objects of class type, they don't have members. You already have a float *, it is filler.
float const* in1 = filler; // adding const is always allowed
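I've finally managed to solve it by changing &filler[5] to &filler[6]. With &filler[5] the loop stopped as soon as in1 pointed at the last element, so that element was never copied; &filler[6] is the one-past-the-end pointer, so all six values are written.
A version that works is shown below: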
#include <iterator>
#include <iostream>
#include <fstream>
#include </usr/include/boost/numeric/ublas/matrix.hpp>
#include </usr/include/boost/numeric/ublas/matrix_sparse.hpp>
#include </usr/include/boost/numeric/ublas/symmetric.hpp>
#include </usr/include/boost/numeric/ublas/io.hpp>
using namespace std;
int test_boost () {
using namespace boost::numeric::ublas;
symmetric_matrix<double, upper> m_sym1 (3, 3);
float* filler = new float[6];
for (int i = 0; i<6; ++i) filler[i] = i;
float const* in1 = filler;
for (size_t i = 0; i < m_sym1.size1(); ++ i)
for (size_t j = 0; j <= i && in1 != &filler[6]; ++ j)
m_sym1 (i, j) = *in1++;
delete[] filler;
std::cout << m_sym1 << std::endl;
return 0;
}
Running this code yields the following output
[3,3]((0,1,3),(1,2,4),(3,4,5))
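The same one-past-the-end idea can be sketched without Boost, using only raw heap arrays. The function `fillSymmetric` is a hypothetical helper, not part of uBLAS; it expands a row-wise packed lower triangle of n*(n+1)/2 values into a full n x n array supplied by the caller.

```cpp
#include <cstddef>

// Expand a row-wise packed lower triangle (n*(n+1)/2 floats) into a full
// symmetric n x n matrix stored row-major in `full` (n*n doubles).
// Hypothetical helper for illustration; both arrays are owned by the caller.
void fillSymmetric(const float* packed, std::size_t n, double* full)
{
    const float* in = packed;
    const float* const end = packed + n * (n + 1) / 2; // one past the last element
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i && in != end; ++j) {
            full[i * n + j] = *in; // lower triangle
            full[j * n + i] = *in; // mirrored upper triangle
            ++in;
        }
}
```

Computing the end pointer from n rather than hard-coding an index keeps the stop condition correct if the matrix size changes.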

Overwriting vector<vector<> > and segmentation fault

I'm trying to write a program in which at each step of a loop I create an adjacency list representing a graph that changes in time.
Here's the code:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <vector>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <algorithm>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/variate_generator.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/uniform_real.hpp>
#include <boost/random/exponential_distribution.hpp>
using namespace std;
using std::vector;
typedef boost::mt19937_64 ENG; // use Mersenne Twister 19937 as PRNG engine
typedef boost::uniform_int<> DIST_INT; // define uniform distribution of integers
typedef boost::uniform_real<> DIST_REAL; // define uniform distribution of reals on [0,1)
typedef boost::exponential_distribution<> DIST_EXP; // define exponential distribution
typedef boost::variate_generator<ENG,DIST_INT> VARIATE_INT;
typedef boost::variate_generator<ENG,DIST_REAL> VARIATE_REAL;
typedef boost::variate_generator<ENG,DIST_EXP> VARIATE_EXP;
int main()
{
const unsigned int random_seed = time(NULL);
// ======= initialize BOOST machines
ENG eng(random_seed);
DIST_INT dist_int;
DIST_REAL dist_rand(0,1);
DIST_EXP dist_exp;
VARIATE_INT randint(eng,dist_int); //random integer. use as: randint(N)
VARIATE_REAL rand(eng,dist_rand); //random float on [0,1[. use as: rand()
VARIATE_EXP randexp(eng,dist_exp); //random exponentially distributed float.
int N = 500, Tmax=200000, v, w;
float p = 0.2, s;
vector<vector<int> > contact_list;
for(int i = 0; i < 200000; i++)
{
contact_list.resize(N, vector<int>());
v = 1;
w = -1;
while(v < N)
{
s = rand();
w += 1 + int(log(1-s)/log(1-p));
while((w >= v) && (v < N))
{
w = w - v;
v += 1;
}
if (v < N)
{
contact_list[v].push_back(w);
contact_list[w].push_back(v);
}
}
}
}
However, at some point I get a segmentation fault. In fact, I think this may not be the correct way to overwrite a vector. I should also add that I would like to change N_nodes at each step. Any help is appreciated!
For the segmentation fault, you can use Valgrind to find out which operation in your code is writing to an invalid location.
You can also catch a segmentation fault with signal, but catching it is not good practice.
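On the question of overwriting the vector, one thing worth checking (offered as a sketch, not as a confirmed diagnosis of the crash): resize(N, ...) keeps the rows that already exist, so the edges from earlier steps stay in the list and it keeps growing. Using assign instead starts each step from empty rows and also handles a changing number of nodes. The helper names `resetContactList` and `addContact` are hypothetical; the bounds check is there only to help locate an index that goes out of range.

```cpp
#include <cstddef>
#include <vector>

using ContactList = std::vector<std::vector<int>>;

// Start a time step with n empty neighbour lists.
// assign() replaces the old contents; resize() would only add or drop rows.
void resetContactList(ContactList& contacts, std::size_t n)
{
    contacts.assign(n, std::vector<int>());
}

// Add an undirected edge, refusing indices outside [0, size).
// Returns false instead of writing out of bounds.
bool addContact(ContactList& contacts, int v, int w)
{
    const int n = static_cast<int>(contacts.size());
    if (v < 0 || w < 0 || v >= n || w >= n) return false;
    contacts[v].push_back(w);
    contacts[w].push_back(v);
    return true;
}
```

In the loop from the question, this would mean calling resetContactList(contact_list, N) at the top of each iteration and replacing the two push_back calls with addContact(contact_list, v, w).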