I am trying to determine the eigenvalues and eigenvectors of a sparse array in Eigen. Since I need to compute all the eigenvectors and eigenvalues, and I could not get this done using the unsupported ArpackSupport module working, I chose to convert the system to a dense matrix and compute the eigensystem using SelfAdjointEigenSolver (I know my matrix is real and has real eigenvalues). This works well until I have matrices of size 1024*1024 but then I start getting deviations from the expected results.
In the documentation of this module (https://eigen.tuxfamily.org/dox/classEigen_1_1SelfAdjointEigenSolver.html) from what I understood it is possible to change the number of max iterations:
const int m_maxIterations
static
Maximum number of iterations.
The algorithm terminates if it does not converge within m_maxIterations * n iterations, where n denotes the size of the matrix. This value is currently set to 30 (copied from LAPACK).
However, I do not understand how do you implement this, using their examples:
SelfAdjointEigenSolver<Matrix4f> es;
Matrix4f X = Matrix4f::Random(4,4);
Matrix4f A = X + X.transpose();
es.compute(A);
cout << "The eigenvalues of A are: " << es.eigenvalues().transpose() << endl;
es.compute(A + Matrix4f::Identity(4,4)); // re-use es to compute eigenvalues of A+I
cout << "The eigenvalues of A+I are: " << es.eigenvalues().transpose() << endl
How would you modify it in order to change the maximum number of iterations?
Additionally, will this solve my problem or should I try to find an alternative function or algorithm to solve the eigensystem?
My thanks in advance.
Increasing the number of iterations is unlikely to help. On the other hand, moving from float to double will help a lot!
If that does not help, please, be more specific on "deviations from the expected results".
m_maxIterations is a static const int variable, and as such it can be considered an intrinsic property of the type. Changing such a type property usually would be done via a specific template parameter. In this case, however, it is set to the constant number 30, so it's not possible.
Therefore, you're only choice is to change the value in the header file and recompile your program.
However, before doing that, I would try the Singular Value Decomposition. According to the homepage, its accuracy is "Excellent-Proven". Moreover, it can overcome problems due to numerically not completely symmetric matrices.
I solved the problem by writing the Jacobi algorithm adapted from the Book Numerical Recipes:
void ROTATy(MatrixXd &a, int i, int j, int k, int l, double s, double tau)
{
double g,h;
g=a(i,j);
h=a(k,l);
a(i,j)=g-s*(h+g*tau);
a(k,l)=h+s*(g-h*tau);
}
void jacoby(int n, MatrixXd &a, MatrixXd &v, VectorXd &d )
{
int j,iq,ip,i;
double tresh,theta,tau,t,sm,s,h,g,c;
VectorXd b(n);
VectorXd z(n);
v.setIdentity();
z.setZero();
for (ip=0;ip<n;ip++)
{
d(ip)=a(ip,ip);
b(ip)=d(ip);
}
for (i=0;i<50;i++)
{
sm=0.0;
for (ip=0;ip<n-1;ip++)
{
for (iq=ip+1;iq<n;iq++)
sm += fabs(a(ip,iq));
}
if (sm == 0.0) {
break;
}
if (i < 3)
tresh=0.2*sm/(n*n);
else
tresh=0.0;
for (ip=0;ip<n-1;ip++)
{
for (iq=ip+1;iq<n;iq++)
{
g=100.0*fabs(a(ip,iq));
if (i > 3 && (fabs(d(ip))+g) == fabs(d[ip]) && (fabs(d[iq])+g) == fabs(d[iq]))
a(ip,iq)=0.0;
else if (fabs(a(ip,iq)) > tresh)
{
h=d(iq)-d(ip);
if ((fabs(h)+g) == fabs(h))
{
t=(a(ip,iq))/h;
}
else
{
theta=0.5*h/(a(ip,iq));
t=1.0/(fabs(theta)+sqrt(1.0+theta*theta));
if (theta < 0.0)
{
t = -t;
}
c=1.0/sqrt(1+t*t);
s=t*c;
tau=s/(1.0+c);
h=t*a(ip,iq);
z(ip)=z(ip)-h;
z(iq)=z(iq)+h;
d(ip)=d(ip)- h;
d(iq)=d(iq) + h;
a(ip,iq)=0.0;
for (j=0;j<ip;j++)
ROTATy(a,j,ip,j,iq,s,tau);
for (j=ip+1;j<iq;j++)
ROTATy(a,ip,j,j,iq,s,tau);
for (j=iq+1;j<n;j++)
ROTATy(a,ip,j,iq,j,s,tau);
for (j=0;j<n;j++)
ROTATy(v,j,ip,j,iq,s,tau);
}
}
}
}
}
}
the function jacoby receives the size of of the square matrix n, the matrix we want to calculate the we want to solve (a) and a matrix that will receive the eigenvectors in each column and a vector that is going to receive the eigenvalues. It is a bit slower so I tried to parallelize it with OpenMp (see: Parallelization of Jacobi algorithm using eigen c++ using openmp) but for 4096x4096 sized matrices what I did not mean an improvement in computation time, unfortunately.
Related
Which algorithm you would recommend for fast solution of dense linear system of fixed dimension (N=9) (matrix is symmetric, positive-semidefinite)?
Gaussian elimination
LU decomposition
Cholesky decomposition
etc?
Types are 32 and 64 bits floating points.
Such systems will be solved millions of times, so algorithm should be rather fast with respect to dimension (n=9).
P.S. examples of robust C++ implementations for proposed algorithm are appreciated.
1) What do you mean by "solved million of times"? Same coefficient matrix with a million of different right hand terms, or a million of distinct matrices?
Million of distinct matrices.
2) Positive _semi_definite means that matrix can be singular (to machine precision). How would you like to deal with this case? Just raise an error, or try to return some sensible answer?
Raising error is OK.
The matrix being symmetric, positive-semidefinite, the Cholesky decomposition is strictly superior to the LU decomposition. (roughly twice faster than LU, whatever the size of the matrix. Source : "Numerical Linear Algebra" by Trefethen and Bau)
It is also de facto the standard for small dense matrices (source : I do a PhD in computational mathematics) Iterative methods are less efficient than direct methods unless the system becomes large enough (quick rule of thumb that means nothing, but that is always nice to have : on any modern computer, any matrix smaller than 100*100 is definitely a small matrix that needs direct methods, rather than iterative ones)
Now, I do not recommend to do it yourself. There are tons of good libraries out there that have been thoroughly tested. But if I have to recommend you one, it would be Eigen :
No installation required (header only library, so no library to link, only #include<>)
Robust and efficient (they have a lot of benchmarks on the main page, and the results are nice)
Easy to use and well documented
By the way, here in the documentation, you have the various pros and cons of their 7 direct linear solvers in a nice, concise table. It seems that in your case, LDLT (a variation of Cholesky) wins
Generally, one is best off using an existing library, rather than a roll-your-own approach, as there are many tedious details to attend to in pursuit of a fast, stable numerical implementation.
Here's a few to get you started:
Eigen library (my personal preference):
http://eigen.tuxfamily.org/dox/QuickRefPage.html#QuickRef_Headers
Armadillo:
http://arma.sourceforge.net/
Search around and you'll find plenty of others.
I would recommend LU decomposition, especially if "solved millions of times" really means "solved once and applied to millions of vectors". You'll create the LU decomposition, save it, and apply forward-back substitution against as many r.h.s. vectors as you wish.
It's more stable in the face of roundoff if you use pivoting.
LU for a symmetric semi-definite matrix does not make much sense: you destroy a nice property of your input data performing unnecessary operations.
Choice between LLT or LDLT really depends on the condition number of your matrices, and how you intend to treat edge cases. LDLT should be used only if you can prove a statistically significant improve in accuracy, or if robustness is of paramount importance to your app.
(Without a sample of your matrices it is hard to give sound advice, but I suspect that with such a small order N=9, pivoting the small diagonal terms toward the bottom part of D is really not necessary. So I would start with classical Cholesky and simply abort factorization if the diag terms become to small with respect to some sensibly chosen tolerance.)
Cholesky is pretty simple to code, and if you strive for a really fast code, it is better to implement it yourself.
Like others above, I recommend cholesky. I've found that the increased number of additions, subtractions and memory accesses means that LDLt is slower than cholesky.
There are in fact a number a number of variations on cholesky, and which one will be fastest depends on the representation you choose for your matrices. I generally use a fortran style representation, that is a matrix M is a double* M with M(i,j) being m[i+dim*j]; for this I reckon that an upper triangular cholesky is (a little) the fastest, that is one seeks upper triangular U with U'*U = M.
For fixed, small, dimension it is definitely worth considering writing a version that uses no loops. A relatively straightforward way to do this is to write a program to do it. As I recall, using a routine that deals with the general case as a template, it only took a morning to write a program that would write a specific fixed dimension version. The savings can be considerable. For example my general version takes 0.47 seconds to do a million 9x9 factorisations, while the loopless version takes 0.17 seconds -- these timings running single threaded on a 2.6GHz pc.
To show that this is not a major task, I've included the source of such a program below. It includes the general version of the factorisation as a comment. I've used this code in circumstances where the matrices are not close to singular, and I reckon it works ok there; however it may well be too crude for more delicate work.
/* ----------------------------------------------------------------
** to write fixed dimension ut cholesky routines
** ----------------------------------------------------------------
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <strings.h>
/* ----------------------------------------------------------------
*/
#if 0
static inline double vec_dot_1_1( int dim, const double* x, const double* y)
{
double d = 0.0;
while( --dim >= 0)
{ d += *x++ * *y++;
}
return d;
}
/* ----------------------------------------------------------------
** ut cholesky: solve U'*U = P for ut U in P (only ut of P accessed)
** ----------------------------------------------------------------
*/
int mat_ut_cholesky( int dim, double* P)
{
int i, j;
double d;
double* Ucoli;
for( Ucoli=P, i=0; i<dim; ++i, Ucoli+=dim)
{ /* U[i,i] = P[i,i] - Sum{ k<i | U[k,i]*U[k,i]} */
d = Ucoli[i] - vec_dot_1_1( i, Ucoli, Ucoli);
if ( d < 0.0)
{ return 0;
}
Ucoli[i] = sqrt( d);
d = 1.0/Ucoli[i];
for( j=i+1; j<dim; ++j)
{ /* U[i,j] = (P[i,j] - Sum{ k<i | U[k,i]*U[k,j]})/U[i,i] */
P[i+j*dim] = d*(P[i+j*dim] - vec_dot_1_1( i, Ucoli, P+j*dim));
}
}
return 1;
}
/* ----------------------------------------------------------------
*/
#endif
/* ----------------------------------------------------------------
**
** ----------------------------------------------------------------
*/
static void write_ut_inner_step( int dim, int i, int off)
{
int j, k, l;
printf( "\td = 1.0/P[%d];\n", i+off);
for( j=i+1; j<dim; ++j)
{ k = i+j*dim;
printf( "\tP[%d] = d * ", k);
if ( i)
{ printf( "(P[%d]", k);
printf( " - (P[%d]*P[%d]", off, j*dim);
for( l=1; l<i; ++l)
{ printf( " + P[%d]*P[%d]", l+off, l+j*dim);
}
printf( "));");
}
else
{ printf( "P[%d];", k);
}
printf( "\n");
}
}
static void write_dot( int n, int off)
{
int i;
printf( "P[%d]*P[%d]", off, off);
for( i=1; i<n; ++i)
{ printf( "+P[%d]*P[%d]", off+i, off+i);
}
}
static void write_ut_outer_step( int dim, int i, int off)
{
printf( "\td = P[%d]", off+i);
if ( i)
{ printf( " - (");
write_dot( i, off);
printf( ")");
}
printf( ";\n");
printf( "\tif ( d <= 0.0)\n");
printf( "\t{\treturn 0;\n");
printf( "\t}\n");
printf( "\tP[%d] = sqrt( d);\n", i+off);
if ( i < dim-1)
{ write_ut_inner_step( dim, i, off);
}
}
static void write_ut_chol( int dim)
{
int i;
int off=0;
printf( "int\tut_chol_%.2d( double* P)\n", dim);
printf( "{\n");
printf( "double\td;\n");
for( i=0; i<dim; ++i)
{ write_ut_outer_step( dim, i, off);
printf( "\n");
off += dim;
}
printf( "\treturn 1;\n");
printf( "}\n");
}
/* ----------------------------------------------------------------
*/
/* ----------------------------------------------------------------
**
** ----------------------------------------------------------------
*/
static int read_args( int* dim, int argc, char** argv)
{
while( argc)
{ if ( strcmp( *argv, "-h") == 0)
{ return 0;
}
else if (strcmp( *argv, "-d") == 0)
{ --argc; ++argv;
*dim = atoi( (--argc, *argv++));
}
else
{ break;
}
}
return 1;
}
int main( int argc, char** argv)
{
int dim = 9;
if( read_args( &dim, --argc, ++argv))
{ write_ut_chol( dim);
}
else
{ fprintf( stderr, "usage: wchol (-d dim)? -- writes to stdout\n");
}
return EXIT_SUCCESS;
}
/* ----------------------------------------------------------------
*/
Because of its ease of use, you can take Eigen solvers just for comparison. For specific use case a specific solver might be faster although another is supposed to be better. For that, you can measure runtimes for the each algorithm just for the selection. After that you can implement the desired option (or find an existing one that fits your needs the best).
I have a ~3000x3000 covariance-alike matrix on which I compute the eigenvalue-eigenvector decomposition (it's a OpenCV matrix, and I use cv::eigen() to get the job done).
However, I actually only need the, say, first 30 eigenvalues/vectors, I don't care about the rest. Theoretically, this should allow to speed up the computation significantly, right? I mean, that means it has 2970 eigenvectors less that need to be computed.
Which C++ library will allow me to do that? Please note that OpenCV's eigen() method does have the parameters for that, but the documentation says they are ignored, and I tested it myself, they are indeed ignored :D
UPDATE:
I managed to do it with ARPACK. I managed to compile it for windows, and even to use it. The results look promising, an illustration can be seen in this toy example:
#include "ardsmat.h"
#include "ardssym.h"
int n = 3; // Dimension of the problem.
double* EigVal = NULL; // Eigenvalues.
double* EigVec = NULL; // Eigenvectors stored sequentially.
int lowerHalfElementCount = (n*n+n) / 2;
//whole matrix:
/*
2 3 8
3 9 -7
8 -7 19
*/
double* lower = new double[lowerHalfElementCount]; //lower half of the matrix
//to be filled with COLUMN major (i.e. one column after the other, always starting from the diagonal element)
lower[0] = 2; lower[1] = 3; lower[2] = 8; lower[3] = 9; lower[4] = -7; lower[5] = 19;
//params: dimensions (i.e. width/height), array with values of the lower or upper half (sequentially, row major), 'L' or 'U' for upper or lower
ARdsSymMatrix<double> mat(n, lower, 'L');
// Defining the eigenvalue problem.
int noOfEigVecValues = 2;
//int maxIterations = 50000000;
//ARluSymStdEig<double> dprob(noOfEigVecValues, mat, "LM", 0, 0.5, maxIterations);
ARluSymStdEig<double> dprob(noOfEigVecValues, mat);
// Finding eigenvalues and eigenvectors.
int converged = dprob.EigenValVectors(EigVec, EigVal);
for (int eigValIdx = 0; eigValIdx < noOfEigVecValues; eigValIdx++) {
std::cout << "Eigenvalue: " << EigVal[eigValIdx] << "\nEigenvector: ";
for (int i = 0; i < n; i++) {
int idx = n*eigValIdx+i;
std::cout << EigVec[idx] << " ";
}
std::cout << std::endl;
}
The results are:
9.4298, 24.24059
for the eigenvalues, and
-0.523207, -0.83446237, -0.17299346
0.273269, -0.356554, 0.893416
for the 2 eigenvectors respectively (one eigenvector per row)
The code fails to find 3 eigenvectors (it can only find 1-2 in this case, an assert() makes sure of that, but well, that's not a problem).
In this article, Simon Funk shows a simple, effective way to estimate a singular value decomposition (SVD) of a very large matrix. In his case, the matrix is sparse, with dimensions: 17,000 x 500,000.
Now, looking here, describes how eigenvalue decomposition closely related to SVD. Thus, you might benefit from considering a modified version of Simon Funk's approach, especially if your matrix is sparse. Furthermore, your matrix is not only square but also symmetric (if that is what you mean by covariance-like), which likely leads to additional simplification.
... Just an idea :)
It seems that Spectra will do the job with good performances.
Here is an example from their documentation to compute the 3 first eigen values of a dense symmetric matrix M (likewise your covariance matrix):
#include <Eigen/Core>
#include <Spectra/SymEigsSolver.h>
// <Spectra/MatOp/DenseSymMatProd.h> is implicitly included
#include <iostream>
using namespace Spectra;
int main()
{
// We are going to calculate the eigenvalues of M
Eigen::MatrixXd A = Eigen::MatrixXd::Random(10, 10);
Eigen::MatrixXd M = A + A.transpose();
// Construct matrix operation object using the wrapper class DenseSymMatProd
DenseSymMatProd<double> op(M);
// Construct eigen solver object, requesting the largest three eigenvalues
SymEigsSolver< double, LARGEST_ALGE, DenseSymMatProd<double> > eigs(&op, 3, 6);
// Initialize and compute
eigs.init();
int nconv = eigs.compute();
// Retrieve results
Eigen::VectorXd evalues;
if(eigs.info() == SUCCESSFUL)
evalues = eigs.eigenvalues();
std::cout << "Eigenvalues found:\n" << evalues << std::endl;
return 0;
}
Is it actually possible to calculate the Matrix Exponential of a Complex Matrix in c / c++?
I've managed to take the product of two complex matrices using blas functions from the GNU Science Library. for matC = matA * matB:
gsl_blas_zgemm (CblasNoTrans, CblasNoTrans, GSL_COMPLEX_ONE, matA, matB, GSL_COMPLEX_ZERO, matC);
And I've managed to get the matrix exponential of a matrix by using the undocumented
gsl_linalg_exponential_ss(&m.matrix, &em.matrix, .01);
But this doesn't seems to accept complex arguments.
Is there anyway to do this? I used to think c++ was capable of anything. Now I think its outdated and cryptic...
Several options:
modify the gsl_linalg_exponential_ss code to accept complex matrices
write your complex NxN matrix as real 2N x 2N matrix
Diagonalize the matrix, take the exponential of the eigenvalues, and rotate the matrix back to the original basis
Using the complex matrix product that is available, implement the matrix exponential according to it's definition: exp(A) = sum_(n=0)^(n=infinity) A^n/(n!)
You have to check which methods are appropriate for your problem.
C++ is a general purpose language. As mentioned above, if you need specific functionality you have to find a library that can do it or implement it yourself. Alternatively you could use software like MatLab and Mathematica. If that's too expensive there are open source alternatives, e.g. Sage and Octave.
"I used to think c++ was capable of anything" - if a general-purpose language has built-in complex math in its core, then something is wrong with that language.
Fur such very specific tasks there is a well-accepted solution: libraries. Either write your own, or much better, use an already existing one.
I myself rarely need complex matrices in C++, I always used Matlab and similar tools for that. However, this http://www.mathtools.net/C_C__/Mathematics/index.html might be of interest to you if you know Matlab.
There are a couple other libraries which might be of help:
http://eigen.tuxfamily.org/index.php?title=Main_Page
http://math.nist.gov/lapack++/
I was also thinking to do the same, writing your complex NxN matrix as real 2N x 2N matrix is the best way to solve the problem, then use gsl_linalg_exponential_ss().
Suppose A=Ar+i*Ai, where A is the complex matrix and Ar and Ai are the real matrices. Then write the new matrix B=[Ar Ai ;-Ai Ar] (Here the matrix is written in matlab notation). Now calculate the exponential of B, that is eB=[eB1 eB2 ;eB3 eB4], then exponential of A is given by, eA=eB1+1i.*eB2
(summing the matrices eB1 and 1i.*eB2).
I have written a code to calculate the matrix exponential of the complex matrices with the gsl function, gsl_linalg_exponential_ss(&m.matrix, &em.matrix, .01);
Here you have the complete code, and the compilation results. I have checked the result with the Matlab and result agrees.
#include <stdio.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_complex.h>
#include <gsl/gsl_complex_math.h>
void my_gsl_complex_matrix_exponential(gsl_matrix_complex *eA, gsl_matrix_complex *A, int dimx)
{
int j,k=0;
gsl_complex temp;
gsl_matrix *matreal =gsl_matrix_alloc(2*dimx,2*dimx);
gsl_matrix *expmatreal =gsl_matrix_alloc(2*dimx,2*dimx);
//Converting the complex matrix into real one using A=[Areal, Aimag;-Aimag,Areal]
for (j = 0; j < dimx;j++)
for (k = 0; k < dimx;k++)
{
temp=gsl_matrix_complex_get(A,j,k);
gsl_matrix_set(matreal,j,k,GSL_REAL(temp));
gsl_matrix_set(matreal,dimx+j,dimx+k,GSL_REAL(temp));
gsl_matrix_set(matreal,j,dimx+k,GSL_IMAG(temp));
gsl_matrix_set(matreal,dimx+j,k,-GSL_IMAG(temp));
}
gsl_linalg_exponential_ss(matreal,expmatreal,.01);
double realp;
double imagp;
for (j = 0; j < dimx;j++)
for (k = 0; k < dimx;k++)
{
realp=gsl_matrix_get(expmatreal,j,k);
imagp=gsl_matrix_get(expmatreal,j,dimx+k);
gsl_matrix_complex_set(eA,j,k,gsl_complex_rect(realp,imagp));
}
gsl_matrix_free(matreal);
gsl_matrix_free(expmatreal);
}
int main()
{
int dimx=4;
int i, j ;
gsl_matrix_complex *A = gsl_matrix_complex_alloc (dimx, dimx);
gsl_matrix_complex *eA = gsl_matrix_complex_alloc (dimx, dimx);
for (i = 0; i < dimx;i++)
{
for (j = 0; j < dimx;j++)
{
gsl_matrix_complex_set(A,i,j,gsl_complex_rect(i+j,i-j));
if ((i-j)>=0)
printf("%d+%di ",i+j,i-j);
else
printf("%d%di ",i+j,i-j);
}
printf(";\n");
}
my_gsl_complex_matrix_exponential(eA,A,dimx);
printf("\n Printing the complex matrix exponential\n");
gsl_complex compnum;
for (i = 0; i < dimx;i++)
{
for (j = 0; j < dimx;j++)
{
compnum=gsl_matrix_complex_get(eA,i,j);
if (GSL_IMAG(compnum)>=0)
printf("%f+%fi\t ",GSL_REAL(compnum),GSL_IMAG(compnum));
else
printf("%f%fi\t ",GSL_REAL(compnum),GSL_IMAG(compnum));
}
printf("\n");
}
return(0);
}
I have just started trying C++ AMP and I decided to give it a shot with the current project I am working on. At some point, I have to build a distance matrix for the vectors I have and I have written the code below for this
unsigned int samplesize=samplelist.size();
unsigned int vs = samplelist.front().size();
vector<double> samplevec(samplesize*vs);
vector<double> distancevec(samplesize*samplesize,0);
it1=samplelist.begin();
for(int i=0 ; i<samplesize; ++i){
for(int j = 0 ; j<vs ; ++j){
samplevec[j + i*vs] = (*it1)[j];
}
++it1;
}
array_view<const double,2> samplearray(samplesize,vs,samplevec);
array_view<writeonly<double>,2> distances(samplesize,samplesize,distancevec);
parallel_for_each(distances.grid, [=](index<2> idx) restrict(direct3d){
double sqrsum=0;
double tempd=0;
for ( unsigned int i=0 ; i<vs ; ++i)
{
tempd = samplearray(idx.x,i) - samplearray(idx.y,i);
sqrsum += tempd*tempd;
}
distances[idx]=sqrsum;
}
However, as you can see, this does not take into account the symmetry property of distance matrices. When I calculate sqrsum of matrices i and j, I don't want to do the same calculation again when the order of the i and j are reversed. Is there any way to accomplish this? I came up with the following trick, but I don't know if this would bump up the performance significantly
for ( unsigned int i=0 ; i<vs ; ++i)
{
if(idx.x<=idx.y){
break;
}
tempd = samplearray(idx.x,i) - samplearray(idx.y,i);
sqrsum += tempd*tempd;
}
Can the if-condition do the job? Or do you think the if statement would hurt the performance unnecessarily? I couldn't came up with any alternative to it
BTW, I just noticed that the above written code does not work on my machine, whose gpu only supports single precision. Is there anything to do to get around that problem? Error message is as follows:
"runtime_exception: Concurrency;;parallel_for_each uses features unsupported by the selected accelerator.
ID3D11Device::CreateComputeShader: Shader uses double precision float ops which are not supported on the current device."
I think you can eliminate if-condition, if you would schedule only as many threads as you need, instead of scheduling entire rectangle that covers your output matrix. What you need is upper or lower triangle without diagonal, which you can calculate using arithmetic sequence.
The alternative would be to organize input data such that it is in two 1D vectors, each thread would read value from vector 1, then vector 2 and calculate distance and store it in one of the input vectors.
Finally, the error on double precision shows up, because the card you are using does not support double precision operations. Please check your card specification to confirm that. You can workaround it by switching to single precision type i.e. "float" in array_view template.
I am trying to calculate a determinant using the boost c++ libraries. I found the code for the function InvertMatrix() which I have copied below. Every time I calculate this inverse, I want the determinant as well. I have a good idea how to calculate, by multiplying down the diagonal of the U matrix from the LU decomposition. There is one problem, I am able to calculate the determinant properly, except for the sign. Depending on the pivoting I get the sign incorrect half of the time. Does anyone have a suggestion on how to get the sign right every time? Thanks in advance.
template<class T>
bool InvertMatrix(const ublas::matrix<T>& input, ublas::matrix<T>& inverse)
{
using namespace boost::numeric::ublas;
typedef permutation_matrix<std::size_t> pmatrix;
// create a working copy of the input
matrix<T> A(input);
// create a permutation matrix for the LU-factorization
pmatrix pm(A.size1());
// perform LU-factorization
int res = lu_factorize(A,pm);
if( res != 0 ) return false;
Here is where I inserted my best shot at calculating the determinant.
T determinant = 1;
for(int i = 0; i < A.size1(); i++)
{
determinant *= A(i,i);
}
End my portion of the code.
// create identity matrix of "inverse"
inverse.assign(ublas::identity_matrix<T>(A.size1()));
// backsubstitute to get the inverse
lu_substitute(A, pm, inverse);
return true;
}
The permutation matrix pm contains the information you'll need to determine the sign change: you'll want to multiply your determinant by the determinant of the permutation matrix.
Perusing the source file lu.hpp we find a function called swap_rows which tells how to apply a permutation matrix to a matrix. It's easily modified to yield the determinant of the permutation matrix (the sign of the permutation), given that each actual swap contributes a factor of -1:
template <typename size_type, typename A>
int determinant(const permutation_matrix<size_type,A>& pm)
{
int pm_sign=1;
size_type size=pm.size();
for (size_type i = 0; i < size; ++i)
if (i != pm(i))
pm_sign* = -1; // swap_rows would swap a pair of rows here, so we change sign
return pm_sign;
}
Another alternative would be to use the lu_factorize and lu_substitute methods which don't do any pivoting (consult the source, but basically drop the pm in the calls to lu_factorize and lu_substitute). That change would make your determinant calculation work as-is. Be careful, however: removing pivoting will make the algorithm less numerically stable.
According to http://qiangsong.wordpress.com/2011/07/16/lu-factorisation-in-ublas/:
Just change determinant *= A(i,i) to determinant *= (pm(i) == i ? 1 : -1) * A(i,i).
I tried this way and it works.
I know, that it's actually very similar to Managu's answer and the idea is the same, but I believe it is simpler (and "2 in 1" if used in InvertMatrix function).