GSL LU decomposition and inversion for a float matrix

Because of memory limits, I need to use gsl_matrix_float instead of gsl_matrix, which stores data as double. However, gsl_linalg_LU_decomp and gsl_linalg_LU_invert only support gsl_matrix, and I could not find any other GSL routine that supports decomposition and inversion of float matrices.
Is there any way around this? Or can I only convert from float to double and then back? Thanks in advance!

The best you can probably do is, as you suggest, convert from float to double and back. Here is example code to perform the inversion (only the essential components are given - you have to fill in the blanks):
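#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
void matrix_invert(gsl_matrix_float *, gsl_matrix_float *, int);
int main()
{
    const int N = 3; /* placeholder size; use your own dimension */
    gsl_matrix_float *X = gsl_matrix_float_alloc(N, N);
    gsl_matrix_float *invX = gsl_matrix_float_alloc(N, N);
    /* ... fill X here ... */
    matrix_invert(X, invX, N); // invX = inv(X)
    gsl_matrix_float_free(X);
    gsl_matrix_float_free(invX);
    return 0;
}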
void matrix_invert(gsl_matrix_float *matrix, gsl_matrix_float *inverse, int N)
{
    int i = 0, j = 0, signum = 0;
    gsl_matrix *DM = gsl_matrix_alloc(N, N);   /* double copy of the input */
    gsl_matrix *DM_I = gsl_matrix_alloc(N, N); /* inverse, in double */
    gsl_permutation *p = gsl_permutation_alloc(N);
    /* float -> double */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            gsl_matrix_set(DM, i, j, gsl_matrix_float_get(matrix, i, j));
    gsl_linalg_LU_decomp(DM, p, &signum);
    gsl_linalg_LU_invert(DM, p, DM_I);
    /* double -> float */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            gsl_matrix_float_set(inverse, i, j, gsl_matrix_get(DM_I, i, j));
    gsl_permutation_free(p);
    gsl_matrix_free(DM);
    gsl_matrix_free(DM_I);
}
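If even the two temporary double matrices do not fit in memory, one option outside GSL is to invert the float data in place. The following is only a sketch of Gauss-Jordan elimination with partial pivoting, not a GSL routine: the function name invert_in_place and the row-major std::vector<float> storage are assumptions made for illustration, and the result carries float rather than double accuracy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of GSL: inverts the n x n row-major matrix
// stored in `a` in place. Returns false if a zero pivot is met, i.e. the
// matrix is singular to working precision.
bool invert_in_place(std::vector<float>& a, std::size_t n)
{
    assert(a.size() == n * n);
    std::vector<std::size_t> pivot_row(n);

    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: pick the row with the largest entry in column k.
        std::size_t p = k;
        float best = std::fabs(a[k * n + k]);
        for (std::size_t i = k + 1; i < n; ++i) {
            const float v = std::fabs(a[i * n + k]);
            if (v > best) { best = v; p = i; }
        }
        if (best == 0.0f) return false;
        pivot_row[k] = p;
        if (p != k)
            for (std::size_t j = 0; j < n; ++j)
                std::swap(a[k * n + j], a[p * n + j]);

        // Scale the pivot row; column k starts holding part of the inverse.
        const float d = 1.0f / a[k * n + k];
        a[k * n + k] = 1.0f;
        for (std::size_t j = 0; j < n; ++j) a[k * n + j] *= d;

        // Eliminate column k from every other row.
        for (std::size_t i = 0; i < n; ++i) {
            if (i == k) continue;
            const float f = a[i * n + k];
            a[i * n + k] = 0.0f;
            for (std::size_t j = 0; j < n; ++j) a[i * n + j] -= f * a[k * n + j];
        }
    }

    // Undo the row swaps by swapping the matching columns in reverse order.
    for (std::size_t k = n; k-- > 0;)
        if (pivot_row[k] != k)
            for (std::size_t i = 0; i < n; ++i)
                std::swap(a[i * n + k], a[i * n + pivot_row[k]]);
    return true;
}
```

The only extra storage is one index per row, so the peak footprint stays close to the single float matrix. With a gsl_matrix_float the same loops would go through gsl_matrix_float_get and gsl_matrix_float_set instead of indexing a vector.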

Related

Can you create and destroy pthreads inside operator overloading function

I am trying to overload the * operator to use for matrix multiplication. It must be multithreaded using pthreads. I have never done multithreading before, and I am really struggling. If possible, I would like to create and destroy the pthreads within the function (so the only place pthreads appear is within the function).
Here is my Matrix Class:
class Matrix
{
private:
int numRows_;
int numColumns_;
std::vector<double> data_;
public:
Matrix();
Matrix(Matrix const& matrix_objext);
Matrix(int numRows, int numColumns);
Matrix(int numRows, int numColumns, std::vector<double> data);
Matrix(std::string importFilePath);
~Matrix();
int get_numRows();
int get_numColumns();
std::vector<double> get_data();
void set_numRows(int);
void set_numColumns(int);
void set_data(std::vector<double>);
Matrix operator*(Matrix x);
Matrix operator*(int x);
Matrix operator+(Matrix x);
Matrix operator-(Matrix x);
Matrix operator^(int exp);
Matrix operator-();
Matrix operator=(Matrix x);
bool operator==(Matrix x);
Matrix MultiplySlow(Matrix x);
Matrix Transpose();
void ExportToFile(std::string);
std::string to_string();
};
This is the function I am struggling with:
Matrix Matrix::operator*(Matrix x){
pthread_t thread;
int rc;
rc = pthread_create(&thread, NULL, MultiplySlow, x)
}
The MultiplySlow function is not intended to be used only for this overloaded operator, so if it needs to be changed I would have to make a new multiplication function.
Here is the MultiplySlow function just in case:
Matrix Matrix::MultiplySlow(Matrix x){
Matrix mat;
mat.numColumns_=numColumns_;
mat.numRows_=numRows_;
mat.data_.resize(numRows_*numColumns_);
if(numColumns_ != x.numRows_){
std::cout << "The number of columns of the 1st matrix must equal the number of rows of the 2nd matrix." << std::endl;
}
for(int i=0; i<numRows_;i++){
for(int j=0; j<x.numColumns_;j++){
double sum =0.0;
for(int k=0; k<x.numRows_; k++){
sum = sum + data_[i * numColumns_ + k] * x.data_[k * x.numColumns_ +j];
}
mat.data_[i * mat.numColumns_ + j] = sum;
}
}
return mat;
}
Please help!!
There are a few problems here:
pthread_create() returns an int status code (0 on success, an error number on failure), not a Matrix.
The function run by the thread must take a void * argument and return a void *; a member function that takes and returns a Matrix does not fit that signature.
Spawning a single extra thread to do the whole multiplication adds overhead without adding any parallelism, so it hurts efficiency rather than helping. Why do you want to do this?
My suggestions are:
Look into C++ threads (std::thread).
Look into C++ async calls (std::async).
Think about how to divide the matrix multiplication into independent pieces, so that multithreading can actually improve efficiency.
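To make the last suggestion concrete, here is a minimal sketch that splits the result into blocks of rows and gives each block to its own std::thread. It uses a free function on row-major std::vector<double> data rather than the Matrix class from the question; the name multiply_threaded and its parameters are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical free function (not the Matrix class from the question):
// multiplies row-major a (n x m) by b (m x p) using up to num_threads threads.
std::vector<double> multiply_threaded(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      std::size_t n, std::size_t m, std::size_t p,
                                      std::size_t num_threads)
{
    assert(a.size() == n * m && b.size() == m * p && num_threads > 0);
    std::vector<double> c(n * p, 0.0);

    // Each worker fills rows [first, last) of c. No two workers touch the
    // same row of c, so no locking is needed.
    auto worker = [&](std::size_t first, std::size_t last) {
        for (std::size_t i = first; i < last; ++i)
            for (std::size_t j = 0; j < p; ++j) {
                double sum = 0.0;
                for (std::size_t k = 0; k < m; ++k)
                    sum += a[i * m + k] * b[k * p + j];
                c[i * p + j] = sum;
            }
    };

    const std::size_t chunk = (n + num_threads - 1) / num_threads; // rows per thread
    std::vector<std::thread> threads;
    for (std::size_t first = 0; first < n; first += chunk)
        threads.emplace_back(worker, first, std::min(first + chunk, n));
    for (auto& t : threads) t.join(); // all threads are created and joined here
    return c;
}
```

All thread handling stays inside the one function, which is what the question asks for, so an operator* could simply forward to something like it. On some toolchains the program has to be linked with -pthread.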

How to get the diagonal of a sparse matrix in cuSparse?

I have a sparse matrix in cuSparse and I want to extract its diagonal. The only way I have found is to copy it back to CPU memory as an Eigen SparseMatrix, use Eigen's .diagonal() there, and then copy the result back to the GPU. Obviously this is pretty inefficient, so I am wondering if there is a way to do it directly on the GPU. Please see the code below for reference:
void CuSparseTransposeToEigenSparse(
const int *d_row,
const int *d_col,
const double *d_val,
const int num_non0,
const int mat_row,
const int mat_col,
Eigen::SparseMatrix<double> &mat){
std::vector<int> outer(mat_col + 1);
std::vector<int> inner(num_non0);
std::vector<double> value(num_non0);
cudaMemcpy(
outer.data(), d_row, sizeof(int) * (mat_col + 1), cudaMemcpyDeviceToHost);
cudaMemcpy(
inner.data(), d_col, sizeof(int) * num_non0, cudaMemcpyDeviceToHost);
cudaMemcpy(
value.data(), d_val, sizeof(double) * num_non0, cudaMemcpyDeviceToHost);
Eigen::Map<Eigen::SparseMatrix<double>> mat_map(
mat_row, mat_col, num_non0, outer.data(), inner.data(), value.data());
mat = mat_map.eval();
}
int main(){
int *d_A_row;
int *d_A_col;
double *d_A_val;
int A_len;
int num_A_non0;
double *d_A_diag;
// these values are filled with some computation
// current solution
Eigen::SparseMatrix<double> A;
CuSparseTransposeToEigenSparse(
d_A_row, d_A_col, d_A_val, num_A_non0, A_len, A_len, A);
Eigen::VectorXd A_diag = A.diagonal();
cudaMemcpy(d_A_diag, A_diag.data(), sizeof(double) * A_len, cudaMemcpyHostToDevice);
// is there a way to fill in d_A_diag without copying back to CPU?
return 0;
}
In case anyone is interested, I figured it out for the case of a CSR matrix. The custom kernel, which uses one thread per row, looks like this:
__global__ static void GetDiagFromSparseMat(const int *A_row,
                                            const int *A_col,
                                            const double *A_val,
                                            const int A_len,
                                            double *A_diag){
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < A_len){
        // number of stored entries in row x
        const int num_non0_row = A_row[x + 1] - A_row[x];
        A_diag[x] = 0.0; // stays 0 if the diagonal entry is not stored
        for (int i = 0; i < num_non0_row; i++){
            if (A_col[i + A_row[x]] == x){
                A_diag[x] = A_val[i + A_row[x]];
                break;
            }
        }
    }
}
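A host-side reference of the same row scan can be handy for checking the kernel's output on small matrices. This is only a sketch: the function name csr_diagonal and the std::vector arguments are assumptions, with row_ptr, col_idx and val playing the roles of A_row, A_col and A_val.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: the same scan as the kernel, with one loop
// iteration per row instead of one GPU thread per row.
std::vector<double> csr_diagonal(const std::vector<int>& row_ptr,
                                 const std::vector<int>& col_idx,
                                 const std::vector<double>& val)
{
    assert(!row_ptr.empty() && col_idx.size() == val.size());
    const std::size_t n = row_ptr.size() - 1; // number of rows
    std::vector<double> diag(n, 0.0);         // 0 where no diagonal entry is stored
    for (std::size_t r = 0; r < n; ++r)
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            if (col_idx[k] == static_cast<int>(r)) {
                diag[r] = val[k];
                break;
            }
    return diag;
}
```

Copying the kernel's A_diag back once and comparing it against this function on a small test matrix is a cheap correctness check before trusting the GPU path.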

CUDA: Fill matrix with results of summation

I need to fill a matrix with values returned from the function below:
__device__ float calc(float *ar, int m, float sum, int i, int j)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < m)
{
ar[idx] = __powf(ar[idx], i + j);
atomicAdd(&sum, ar[idx]);
}
return sum;
}
The matrix is stored as a one-dimensional array and is filled by this kernel:
__global__ void createMatrix(float *A, float *arr, int size)
{
A[threadIdx.y*size + threadIdx.x] = /*some number*/;
}
In theory it should be something like this
__global__ void createMatrix(float *A, float *arr, int size)
{
float sum = 0;
A[threadIdx.y*size + threadIdx.x] = calc(arr, size, sum, threadIdx.x, threadIdx.y);
}
but it doesn't work that way: calc always returns 0. Is there any way I can fill the matrix from a __global__ function? Thanks in advance.
You're passing sum by value rather than by reference, so every atomicAdd() updates calc()'s own copy and has no effect on the zero-initialized sum in the kernel.
However, even if you passed it by reference, this would still be a poorly designed kernel. You don't need atomics when each thread has its own sum variable (which is the case here). Also, calc() adds only one value to each sum, while you seem to expect it to accumulate several.
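The per-thread sum idea can be sketched on the host as follows. This assumes the intended result is A[y*size + x] = sum over k of arr[k]^(x + y), which the question does not state explicitly; the function name fill_power_sums is likewise an assumption. Each (x, y) pair plays the role of one GPU thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed intent: A[y*size + x] = sum over k of arr[k]^(x + y).
// Every matrix element owns a private sum, so no atomics are involved.
std::vector<float> fill_power_sums(const std::vector<float>& arr, std::size_t size)
{
    assert(!arr.empty());
    std::vector<float> A(size * size);
    for (std::size_t y = 0; y < size; ++y)
        for (std::size_t x = 0; x < size; ++x) {
            float sum = 0.0f;  // private to this element
            for (float v : arr) // arr is only read, never overwritten
                sum += std::pow(v, static_cast<float>(x + y));
            A[y * size + x] = sum;
        }
    return A;
}
```

In a kernel, the two outer loops would be replaced by the thread indices and the inner loop would stay as it is. Note that the original calc() also writes the powers back into ar, which would change the input seen by other threads; the sketch avoids that by only reading arr.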

2D MemoryView to C Pointer Error (1D works, but 2D doesn't)

I was able to get pointers for 1D memoryviews using this StackOverflow question, but applying the same method to 2D memoryviews gives me a "Cannot assign type 'double *' to 'double **'" error.
cdef extern from "dgesvd.h" nogil:
void dgesvd(double **A, int m, int n, double *S, double **U, double **VT)
cdef:
double[:] S
double[:,:] A, U, VT
A = np.ascontiguousarray(np.zeros((N,N)))
S = np.zeros(N)
U = np.zeros(N)
VT = np.zeros(N)
dgesvd(&A[0,0], N, N, &S[0], &U[0], &VT[0])
EDIT: I got it to compile successfully by doing:
cdef:
double[:] S
double[:,:] A, U, VT
U = np.zeros((N,N))
VT = np.zeros((N,N))
A = np.zeros((N,N))
S = np.zeros(N)
A_p = <double *> malloc(sizeof(double) * N)
U_p = <double *> malloc(sizeof(double) * N)
VT_p = <double *> malloc(sizeof(double) * N)
for i in range(N):
A_p = &A[i, 0]
U_p = &U[i, 0]
VT_p = &VT[i, 0]
dgesvd(&A_p, N, N, &S[0], &U_p, &VT_p)
free(A_p)
free(U_p)
free(VT_p)
BUT I get a segfault when I try to run it, so I probably did this wrong.
Here are the contents of "dgesvd.h" (I did not write it, but I know it works):
/*
This file has my implementation of the LAPACK routine dgesdd for
C++. This program solves for the singular value decomposition of a
rectangular matrix A. The function call is of the form
void dgesdd(double **A, int m, int n, double *S, double *U, double *VT)
A: the m by n matrix that we are decomposing
m: the number of rows in A
n: the number of columns in A (generally, n<m)
S: a min(m,n) element array to hold the singular values of A
U: a [m, min(m,n)] element rectangular array to hold the right
singular vectors of A. These vectors will be the columns of U,
so that U[i][j] is the ith element of vector j.
VT: a [min(m,n), n] element rectangular array to hold the left
singular vectors of A. These vectors will be the rows of VT
(it is a transpose of the vector matrix), so that VT[i][j] is
the jth element of vector i.
Note that S, U, and VT must be initialized before calling this
routine, or there will be an error. Here is a quick sample piece of
code to perform this initialization; in many cases, it can be lifted
right from here into your program.
S = new double[minmn];
U = new double*[m]; for (int i=0; i<m; i++) U[i] = new double[minmn];
VT = new double*[minmn]; for (int i=0; i<minmn; i++) VT[i] = new double[n];
Scot Shaw
24 January 2000 */
void dgesvd(double **A, int m, int n, double *S, double **U, double **VT);
double *dgesvd_ctof(double **in, int rows, int cols);
void dgesvd_ftoc(double *in, double **out, int rows, int cols);
extern "C" void dgesvd_(char *jobu, char *jobvt, int *m, int *n,
double *a, int *lda, double *s, double *u,
int *ldu, double *vt, int *ldvt, double *work,
int *lwork, int *info);
void dgesvd(double **A, int m, int n, double *S, double **U, double **VT)
{
char jobu, jobvt;
int lda, ldu, ldvt, lwork, info;
double *a, *u, *vt, *work;
int minmn, maxmn;
jobu = 'S'; /* Specifies options for computing U.
A: all M columns of U are returned in array U;
S: the first min(m,n) columns of U (the left
singular vectors) are returned in the array U;
O: the first min(m,n) columns of U (the left
singular vectors) are overwritten on the array A;
N: no columns of U (no left singular vectors) are
computed. */
jobvt = 'S'; /* Specifies options for computing VT.
A: all N rows of V**T are returned in the array
VT;
S: the first min(m,n) rows of V**T (the right
singular vectors) are returned in the array VT;
O: the first min(m,n) rows of V**T (the right
singular vectors) are overwritten on the array A;
N: no rows of V**T (no right singular vectors) are
computed. */
lda = m; // The leading dimension of the matrix a.
a = dgesvd_ctof(A, lda, n); /* Convert the matrix A from double pointer
C form to single pointer Fortran form. */
ldu = m;
/* Since A is not a square matrix, we have to make some decisions
based on which dimension is shorter. */
if (m>=n) { minmn = n; maxmn = m; } else { minmn = m; maxmn = n; }
ldu = m; // Left singular vector matrix
u = new double[ldu*minmn];
ldvt = minmn; // Right singular vector matrix
vt = new double[ldvt*n];
lwork = 5*maxmn; // Set up the work array, larger than needed.
work = new double[lwork];
dgesvd_(&jobu, &jobvt, &m, &n, a, &lda, S, u,
&ldu, vt, &ldvt, work, &lwork, &info);
dgesvd_ftoc(u, U, ldu, minmn);
dgesvd_ftoc(vt, VT, ldvt, n);
delete[] a;
delete[] u;
delete[] vt;
delete[] work;
}
double* dgesvd_ctof(double **in, int rows, int cols)
{
double *out;
int i, j;
out = new double[rows*cols];
for (i=0; i<rows; i++) for (j=0; j<cols; j++) out[i+j*rows] = in[i][j];
return(out);
}
void dgesvd_ftoc(double *in, double **out, int rows, int cols)
{
int i, j;
for (i=0; i<rows; i++) for (j=0; j<cols; j++) out[i][j] = in[i+j*rows];
}
You don't want to be using the "pointer-to-pointer" form. All Cython/NumPy arrays are stored as a single contiguous array together with a few length parameters that allow 2D access. Your best option is probably to replicate the dgesvd wrapper in Cython (to allocate the working arrays, but without the ftoc or ctof conversions).
I've had a go, below, but it's untested so there may be bugs. It's more for the gist of what to do than to be copied outright.
import numpy as np
# Assumes dgesvd_ (the Fortran LAPACK routine) is declared in a cdef extern block.
def dgesvd(double[:, :] A):
    """All sizes implicit in A, returns a tuple of U S V"""
    # start by ensuring we have Fortran style ordering
    cdef double[::1, :] A_f = A.copy_fortran()
    # work out the sizes - it's possible I've got this the wrong way round!
    cdef int m = A.shape[0]
    cdef int n = A.shape[1]
    cdef char jobu = b'S'
    cdef char jobvt = b'S'
    cdef double[::1, :] U
    cdef double[::1, :] Vt
    cdef double[::1] S
    cdef double[::1] work
    cdef int minmn, maxmn
    cdef int info, lwork, lda, ldu, ldvt
    if m >= n:
        minmn = n
        maxmn = m
    else:
        minmn = m
        maxmn = n
    lda = m
    ldu = m
    # np.empty allocates an array of the given shape (np.array would not)
    U = np.empty((ldu, minmn), order='F')
    ldvt = minmn
    Vt = np.empty((ldvt, n), order='F')
    S = np.empty((minmn,))  # not absolutely sure - check this!
    lwork = 5 * maxmn
    work = np.empty((lwork,))
    dgesvd_(&jobu, &jobvt, &m, &n, &A_f[0, 0], &lda, &S[0], &U[0, 0],
            &ldu, &Vt[0, 0], &ldvt, &work[0], &lwork, &info)
    return np.asarray(U), np.asarray(S), np.asarray(Vt).T  # transpose Vt on the way out
The way you call dgesdd is not consistent with its prototype. Apart from that, this should work. See, for instance, this example, that performs the dgemm call from Cython in a similar way.
Also note that SciPy 0.16 will include a Cython API for BLAS/LAPACK, which will probably be the best approach in the future.
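If the double** wrapper has to be kept, what it expects is a table with one pointer per row. The attempt in the question overwrites the single pointer A_p on every loop iteration, so &A_p is a table with only one valid entry, and the later free() calls are applied to memory that malloc did not return. A minimal C++ sketch of the row-pointer table over one contiguous buffer follows; the names make_row_pointers and element are assumptions, and element only stands in for a double** consumer such as the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the double** view that a pointer-to-pointer API expects on top of
// one contiguous row-major buffer: rows[i] points at the start of row i.
std::vector<double*> make_row_pointers(std::vector<double>& buf,
                                       std::size_t nrows, std::size_t ncols)
{
    assert(buf.size() == nrows * ncols);
    std::vector<double*> rows(nrows);
    for (std::size_t i = 0; i < nrows; ++i)
        rows[i] = buf.data() + i * ncols;
    return rows;
}

// Hypothetical stand-in for a double** consumer such as the dgesvd wrapper.
double element(double** A, std::size_t i, std::size_t j)
{
    return A[i][j];
}
```

The table is passed as rows.data(). It does not own the data, so nothing in it is freed separately, and writes through rows[i][j] land directly in the contiguous buffer.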

How to allocate a matrix in C++?

For a vector in C++, I have
class Vec
{
public:
int len;
double * vdata;
Vec();
Vec(Vec const & v)
{
cout<<"Vec copy constructor\n";
len = v.len;
vdata=new double[len];
for (int i=0;i<len;i++) vdata[i]=v.vdata[i];
};
I would greatly appreciate it if you could help me write analogous code for a matrix. I am thinking of something like this:
class Mat
{
public:
int nrows;
int ncols;
double * mdata;
Mat();
Mat(Mat const & m)
{
cout<<"Mat copy constructor\n";
nrows = m.nrows;
ncols = m.ncols;
But I don't know how to code the memory allocation for a matrix using the idea that we first put all the elements into a 1D array (row1 row2 ... rown), then chop the array into rows, and then chop each row into columns. In particular, could you help me translate this idea into C++ analogous to the following:
vdata=new double[len];
for (int i=0;i<len;i++) vdata[i]=v.vdata[i];
};
I am thinking of something like this:
double *data=new double[nrows*ncols];
for (int i=0;i<nrows;i++)
{
for (int j=0;j<ncols,j++){data(i,j)=m.mdata[i][j]};
};
But I am not sure about this part:
data(i,j)=m.mdata[i][j]
Also, I am supposed to use a pure virtual element indexing method: the (i,j) element of a Mat object m will be retrieved by m(i,j). I have to provide both const and non-const versions of this indexing operator. Could you show me how to do this?
Thanks a lot.
Use a single one-dimensional array. You will notice that in practice it is generally much simpler to use a 1D array for such things.
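class Matrix
{
public:
    Matrix(unsigned int rows, unsigned int cols)
        : _rows(rows)
        , _cols(cols)
        , _size(_rows*_cols)
        , _components(new double[_size])
    {
        for(unsigned int i = 0; i < _size; ++i)
        {
            _components[i] = 0;
        }
    }
    // Deep copy, analogous to the Vec copy constructor in the question.
    Matrix(const Matrix& other)
        : _rows(other._rows)
        , _cols(other._cols)
        , _size(other._size)
        , _components(new double[_size])
    {
        for(unsigned int i = 0; i < _size; ++i)
        {
            _components[i] = other._components[i];
        }
    }
    // Assignment is left out of this example; deleting it avoids a double delete.
    Matrix& operator=(const Matrix&) = delete;
    ~Matrix()
    {
        delete[] _components;
    }
    // Non-const access, e.g. m(i, j) = value;
    double& operator()(unsigned int row, unsigned int col)
    {
        unsigned int index = row * _cols + col;
        return _components[index];
    }
    // Const access for read-only matrices.
    const double& operator()(unsigned int row, unsigned int col) const
    {
        unsigned int index = row * _cols + col;
        return _components[index];
    }
private:
    unsigned int _rows;
    unsigned int _cols;
    unsigned int _size;
    double* _components;
};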
However, if you want to actually use matrices and vectors, and not just implement them for learning, I would really advise you to use the Eigen library. It's free and open source and has great and easy-to-use vector and matrix classes.
While Eigen is great to use, if you want to look at source code of an existing implementation, it can be quite confusing for new programmers - it's very general and contains a lot of optimizations. A less complicated implementation of basic matrix and vector classes can be found in vmmlib.
You can also implement the matrix with a single std::vector of size nrows * ncols:
#include <vector>
class Mat {
public:
    Mat(int rows, int cols):
        nrows(rows),
        ncols(cols),
        elems(rows*cols, 0)
    {}
    Mat(const Mat &m):
        nrows(m.nrows),
        ncols(m.ncols),
        elems(m.elems.begin(), m.elems.end())
    {}
    // Row-major layout: element (i, j) lives at index ncols*i + j.
    double celem(int i, int j) const {
        return elems[ncols*i + j];
    }
    double *pelem(int i, int j) {
        return &elems[ncols*i + j];
    }
private:
    int nrows;
    int ncols;
    std::vector<double> elems;
};
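The question also asks for a pure virtual indexing method with const and non-const versions. One possible reading is sketched below: an abstract base declares both operator() overloads as pure virtual, and a concrete class backed by a std::vector implements them. The class names MatBase and DenseMat are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: both indexing operators are pure virtual.
class MatBase {
public:
    virtual ~MatBase() = default;
    virtual double& operator()(std::size_t i, std::size_t j) = 0;             // read/write
    virtual const double& operator()(std::size_t i, std::size_t j) const = 0; // read-only
};

class DenseMat : public MatBase {
public:
    DenseMat(std::size_t rows, std::size_t cols)
        : nrows_(rows), ncols_(cols), elems_(rows * cols, 0.0) {}

    // The implicit copy constructor deep-copies elems_, so none is written here.

    double& operator()(std::size_t i, std::size_t j) override {
        assert(i < nrows_ && j < ncols_);
        return elems_[i * ncols_ + j]; // row-major: row i starts at i * ncols_
    }
    const double& operator()(std::size_t i, std::size_t j) const override {
        assert(i < nrows_ && j < ncols_);
        return elems_[i * ncols_ + j];
    }

private:
    std::size_t nrows_, ncols_;
    std::vector<double> elems_;
};
```

With this, m(i, j) = 7.5 compiles for a non-const DenseMat, while a const DenseMat only allows reads. Because the storage is a std::vector, copying needs no hand-written new or delete.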