Complex matrix exponential in C++

Is it actually possible to calculate the matrix exponential of a complex matrix in C/C++?
I've managed to take the product of two complex matrices using BLAS functions from the GNU Scientific Library. For matC = matA * matB:
gsl_blas_zgemm (CblasNoTrans, CblasNoTrans, GSL_COMPLEX_ONE, matA, matB, GSL_COMPLEX_ZERO, matC);
And I've managed to get the matrix exponential of a real matrix by using the undocumented
gsl_linalg_exponential_ss(&m.matrix, &em.matrix, .01);
But this doesn't seem to accept complex arguments.
Is there any way to do this? I used to think C++ was capable of anything. Now I think it's outdated and cryptic...

Several options:
modify the gsl_linalg_exponential_ss code to accept complex matrices
write your complex NxN matrix as a real 2N x 2N matrix
diagonalize the matrix, take the exponential of the eigenvalues, and rotate the matrix back to the original basis
using the complex matrix product that is available, implement the matrix exponential according to its definition: exp(A) = sum_{n=0}^{infinity} A^n / n! (see the sketch after this list)
You have to check which methods are appropriate for your problem.
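For the last option, here is a minimal sketch of a truncated Taylor series built only from gsl_blas_zgemm and basic GSL matrix routines (complex_expm_series is a made-up name; there is no scaling-and-squaring and the term count is fixed, so treat this as an illustration rather than a robust implementation):

#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_complex.h>
#include <gsl/gsl_complex_math.h>

void complex_expm_series(gsl_matrix_complex *eA, const gsl_matrix_complex *A,
                         int n, int terms)
{
    gsl_matrix_complex *term = gsl_matrix_complex_alloc(n, n); /* holds A^k / k! */
    gsl_matrix_complex *tmp  = gsl_matrix_complex_alloc(n, n);
    gsl_matrix_complex_set_identity(term);  /* k = 0 term */
    gsl_matrix_complex_set_identity(eA);    /* running sum, starts at I */
    for (int k = 1; k <= terms; k++)
    {
        /* tmp = term * A, then term = tmp / k, then sum += term */
        gsl_blas_zgemm(CblasNoTrans, CblasNoTrans, GSL_COMPLEX_ONE,
                       term, A, GSL_COMPLEX_ZERO, tmp);
        gsl_matrix_complex_memcpy(term, tmp);
        gsl_matrix_complex_scale(term, gsl_complex_rect(1.0 / k, 0.0));
        gsl_matrix_complex_add(eA, term);
    }
    gsl_matrix_complex_free(term);
    gsl_matrix_complex_free(tmp);
}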
C++ is a general purpose language. As mentioned above, if you need specific functionality you have to find a library that can do it or implement it yourself. Alternatively, you could use software like MATLAB or Mathematica. If those are too expensive, there are open-source alternatives, e.g. Sage and Octave.

"I used to think c++ was capable of anything" - if a general-purpose language has built-in complex math in its core, then something is wrong with that language.
Fur such very specific tasks there is a well-accepted solution: libraries. Either write your own, or much better, use an already existing one.
I myself rarely need complex matrices in C++, I always used Matlab and similar tools for that. However, this http://www.mathtools.net/C_C__/Mathematics/index.html might be of interest to you if you know Matlab.
There are a couple other libraries which might be of help:
http://eigen.tuxfamily.org/index.php?title=Main_Page
http://math.nist.gov/lapack++/

I was also thinking of doing the same thing; writing your complex NxN matrix as a real 2N x 2N matrix is the best way to solve the problem, and then use gsl_linalg_exponential_ss().
Suppose A = Ar + i*Ai, where A is the complex matrix and Ar and Ai are real matrices. Then build the matrix B = [Ar Ai; -Ai Ar] (written here in MATLAB notation). Now calculate the exponential of B, that is eB = [eB1 eB2; eB3 eB4]. The exponential of A is then given by eA = eB1 + 1i.*eB2
(summing the matrices eB1 and 1i.*eB2).

I have written code to calculate the matrix exponential of a complex matrix with the GSL function gsl_linalg_exponential_ss(&m.matrix, &em.matrix, .01);
Here is the complete code. I have checked the result against MATLAB and it agrees.
#include <stdio.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_complex.h>
#include <gsl/gsl_complex_math.h>
void my_gsl_complex_matrix_exponential(gsl_matrix_complex *eA, gsl_matrix_complex *A, int dimx)
{
    int j, k;
    gsl_complex temp;
    gsl_matrix *matreal = gsl_matrix_alloc(2*dimx, 2*dimx);
    gsl_matrix *expmatreal = gsl_matrix_alloc(2*dimx, 2*dimx);

    // Embed the complex matrix in a real one: B = [Areal, Aimag; -Aimag, Areal]
    for (j = 0; j < dimx; j++)
        for (k = 0; k < dimx; k++)
        {
            temp = gsl_matrix_complex_get(A, j, k);
            gsl_matrix_set(matreal, j, k, GSL_REAL(temp));
            gsl_matrix_set(matreal, dimx+j, dimx+k, GSL_REAL(temp));
            gsl_matrix_set(matreal, j, dimx+k, GSL_IMAG(temp));
            gsl_matrix_set(matreal, dimx+j, k, -GSL_IMAG(temp));
        }

    gsl_linalg_exponential_ss(matreal, expmatreal, .01);

    // Read exp(A) back out of the top block row: exp(A) = eB11 + i*eB12
    double realp, imagp;
    for (j = 0; j < dimx; j++)
        for (k = 0; k < dimx; k++)
        {
            realp = gsl_matrix_get(expmatreal, j, k);
            imagp = gsl_matrix_get(expmatreal, j, dimx+k);
            gsl_matrix_complex_set(eA, j, k, gsl_complex_rect(realp, imagp));
        }

    gsl_matrix_free(matreal);
    gsl_matrix_free(expmatreal);
}
int main()
{
    int dimx = 4;
    int i, j;
    gsl_matrix_complex *A  = gsl_matrix_complex_alloc(dimx, dimx);
    gsl_matrix_complex *eA = gsl_matrix_complex_alloc(dimx, dimx);

    // Fill A with the test values (i+j) + (i-j)i and print it
    for (i = 0; i < dimx; i++)
    {
        for (j = 0; j < dimx; j++)
        {
            gsl_matrix_complex_set(A, i, j, gsl_complex_rect(i+j, i-j));
            if ((i-j) >= 0)
                printf("%d+%di ", i+j, i-j);
            else
                printf("%d%di ", i+j, i-j);
        }
        printf(";\n");
    }

    my_gsl_complex_matrix_exponential(eA, A, dimx);

    printf("\n Printing the complex matrix exponential\n");
    gsl_complex compnum;
    for (i = 0; i < dimx; i++)
    {
        for (j = 0; j < dimx; j++)
        {
            compnum = gsl_matrix_complex_get(eA, i, j);
            if (GSL_IMAG(compnum) >= 0)
                printf("%f+%fi\t ", GSL_REAL(compnum), GSL_IMAG(compnum));
            else
                printf("%f%fi\t ", GSL_REAL(compnum), GSL_IMAG(compnum));
        }
        printf("\n");
    }

    gsl_matrix_complex_free(A);
    gsl_matrix_complex_free(eA);
    return 0;
}

Related

Matrix Multiplication using SIMD vectors in C++

I am currently reading an article on GitHub about performance optimisation using Clang's extended vector syntax. The author gives the following code snippet:
The templated code below implements the innermost loops that calculate a patch of size regA x regB in matrix C. The code loads regA scalars from matrix A and regB SIMD-width vectors from matrix B. The program uses Clang's extended vector syntax.
/// Compute a RAxRB block of C using a vectorized dot product, where RA is the
/// number of registers to load from matrix A, and RB is the number of registers
/// to load from matrix B.
template <unsigned regsA, unsigned regsB>
void matmul_dot_inner(int k, const float *a, int lda, const float *b, int ldb,
                      float *c, int ldc) {
  float8 csum[regsA][regsB] = {{0.0}};
  for (int p = 0; p < k; p++) {
    // Perform the DOT product.
    for (int bi = 0; bi < regsB; bi++) {
      float8 bb = LoadFloat8(&B(p, bi * 8));
      for (int ai = 0; ai < regsA; ai++) {
        float8 aa = BroadcastFloat8(A(ai, p));
        csum[ai][bi] += aa * bb;
      }
    }
  }
  // Accumulate the results into C.
  for (int ai = 0; ai < regsA; ai++) {
    for (int bi = 0; bi < regsB; bi++) {
      AdduFloat8(&C(ai, bi * 8), csum[ai][bi]);
    }
  }
}
The code, outlined below, confuses me the most. I read the full article and understood the logic behind using blocking and calculating a small patch, but I can't entirely understand what this bit means:
// Perform the DOT product.
for (int bi = 0; bi < regsB; bi++) {
  float8 bb = LoadFloat8(&B(p, bi * 8)); // the pointer to the range of values?
  for (int ai = 0; ai < regsA; ai++) {
    float8 aa = BroadcastFloat8(A(ai, p));
    csum[ai][bi] += aa * bb;
  }
}
Can anyone elaborate on what's going on here?
The article can be found here
The 2nd comment on the article links to https://github.com/pytorch/glow/blob/405e632ef138f1d49db9c3181182f7efd837bccc/lib/Backends/CPU/libjit/libjit_defs.h#L26 which defines the float8 type as
typedef float float8 __attribute__((ext_vector_type(8)));
(similar to how immintrin.h defines __m256). It also defines load / broadcast functions similar to _mm256_load_ps and _mm256_set1_ps. With that header, you should be able to compile the code in the article.
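For illustration, helper definitions along these lines should work (an assumed sketch, not copied from that header; clang's ext_vector_type vectors splat a scalar on a cast). The final comment shows the GNU-style equivalent typedef:

typedef float float8 __attribute__((ext_vector_type(8)));

// Load 8 consecutive floats (assumes the pointer is suitably aligned).
static inline float8 LoadFloat8(const float *p) {
  return *(const float8 *)p;
}

// Replicate one scalar across all 8 lanes.
static inline float8 BroadcastFloat8(float x) {
  return (float8)x;
}

// Accumulate a vector into 8 consecutive floats at p.
static inline void AdduFloat8(float *p, float8 v) {
  *(float8 *)p += v;
}

// GNU-style equivalent of the typedef (GCC/clang/ICC): 32 bytes = 8 floats.
// typedef float float8 __attribute__((vector_size(32)));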
See Clang's native vector documentation. GNU C native vector syntax is a nice way to get an overloaded * operator. I don't know what clang's ext_vector_type does that GCC/clang/ICC float __attribute__((vector_size(32))) (32-byte width) wouldn't.
The article could have added one small section to explain that, but it seems it was more focused on the performance details and wasn't really interested in explaining how to use the syntax.
Most of the discussion in the article is about how to manually vectorize matmul for cache efficiency with SIMD vectors. That part looks good from the quick skim I gave it.
You can do those things with any of several ways to write manually vectorized code: GNU C native vectors, clang's very similar "extended" vectors, or portable Intel intrinsics.

Fast way to slice an Eigen SparseMatrix

In finite element analyses it is quite common to apply some prescribed condition(s) to a big sparse matrix and get a reduced one. This can be achieved easily in MATLAB, SciPy and Julia; for instance, in MATLAB:
a=sprand(10000,10000,0.2); % create a random sparse matrix; 20% fill
tic; c=a(1:2:4000,2:3:5000); toc % slice the matrix to get a reduced one
Assuming that one has a similar sparse matrix in Eigen, what is the most efficient way to slice an Eigen matrix? I don't care about a copy or a view, but the methodology needs to be able to cope with non-contiguous slicing. The latter requirement makes the Eigen block operations useless in this regard.
I can think of two methodologies that I have tested:
Iterate over the rows and columns using for loops and assign the values to a second sparse matrix (I know this is a truly bad idea).
Create a dummy sparse matrix D with zeros and ones and pre- and post-multiply it with the actual matrix: D*A*D.transpose().
I always use setFromTriplets to create sparse matrices in Eigen and I have been happy with the solvers and the assembling of sparse matrices. However, it seems that this slicing is the bottleneck in my code at the moment.
The timing of MATLAB vs Eigen (using -O3 -DNDEBUG -march=native) is
MATLAB: 0.016 secs
EIGEN LOOP INDEXING: 193 secs
EIGEN PRE-POST MUL: 13.7 secs
The other methodology, which I do not know how to go about, is to directly manipulate the underlying [I,J,V] / compressed-storage arrays: outerIndexPtr, innerIndexPtr and valuePtr.
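For illustration, something along these lines is the kind of approach I mean (an untested sketch; slice and keep_col are hypothetical names; it visits only the stored entries of the kept rows via InnerIterator instead of testing every (i,j) pair):

#include <vector>
#include <Eigen/Sparse>

// keep_col maps an original column to its new index, or -1 if dropped.
Eigen::SparseMatrix<double,Eigen::RowMajor> slice(
    const Eigen::SparseMatrix<double,Eigen::RowMajor> &A,
    const std::vector<int> &rows, const std::vector<int> &keep_col, int ncols)
{
    typedef Eigen::Triplet<double> T;
    std::vector<T> tripletList;
    for (int r = 0; r < (int)rows.size(); ++r) {
        // iterate only the stored entries of the kept row
        for (Eigen::SparseMatrix<double,Eigen::RowMajor>::InnerIterator it(A, rows[r]); it; ++it) {
            if (keep_col[it.col()] >= 0)
                tripletList.push_back(T(r, keep_col[it.col()], it.value()));
        }
    }
    Eigen::SparseMatrix<double,Eigen::RowMajor> out(rows.size(), ncols);
    out.setFromTriplets(tripletList.begin(), tripletList.end());
    return out;
}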
And here is a proof-of-concept code snippet for setting up the test matrices:
#include <random>
#include <vector>
#include <Eigen/Core>
#include <Eigen/Sparse>

template<typename T>
using spmatrix = Eigen::SparseMatrix<T,Eigen::RowMajor>;

spmatrix<double> sprand(int rows, int cols, double sparsity) {
    std::default_random_engine gen;
    std::uniform_real_distribution<double> dist(0.0,1.0);
    int sparsity_ = sparsity*100;
    typedef Eigen::Triplet<double> T;
    std::vector<T> tripletList;
    tripletList.reserve(rows*cols);
    int counter = 0;
    // fills one entry in every sparsity_ slots, i.e. a density of 1/sparsity_
    for (int i=0; i<rows; ++i) {
        for (int j=0; j<cols; ++j) {
            if (counter % sparsity_ == 0) {
                auto v_ij = dist(gen);
                tripletList.push_back(T(i,j,v_ij));
            }
            counter++;
        }
    }
    spmatrix<double> mat(rows,cols);
    mat.setFromTriplets(tripletList.begin(), tripletList.end());
    return mat;
}

int main() {
    int m=1000, n=10000;
    auto a = sprand(n,n,0.05);
    auto b = sprand(m,n,0.1);
    spmatrix<double> c;
    // this is efficient but definitely not the right way to do this
    // c = b*a*b.transpose(); // uncomment to check, much slower than block operation
    c = a.block(0,0,1000,1000); // very fast, faster than MATLAB (I believe this is just a view)
    return 0;
}
So any pointers in this direction would be useful.

Increase precision in SelfAdjointEigenSolver in Eigen

I am trying to determine the eigenvalues and eigenvectors of a sparse array in Eigen. Since I need to compute all the eigenvectors and eigenvalues, and I could not get the unsupported ArpackSupport module working, I chose to convert the system to a dense matrix and compute the eigensystem using SelfAdjointEigenSolver (I know my matrix is real and has real eigenvalues). This works well until I have matrices of size 1024x1024, but then I start getting deviations from the expected results.
In the documentation of this class (https://eigen.tuxfamily.org/dox/classEigen_1_1SelfAdjointEigenSolver.html), from what I understood, it is possible to change the maximum number of iterations:
static const int m_maxIterations
Maximum number of iterations.
The algorithm terminates if it does not converge within m_maxIterations * n iterations, where n denotes the size of the matrix. This value is currently set to 30 (copied from LAPACK).
However, I do not understand how you would change this, using their example:
SelfAdjointEigenSolver<Matrix4f> es;
Matrix4f X = Matrix4f::Random(4,4);
Matrix4f A = X + X.transpose();
es.compute(A);
cout << "The eigenvalues of A are: " << es.eigenvalues().transpose() << endl;
es.compute(A + Matrix4f::Identity(4,4)); // re-use es to compute eigenvalues of A+I
cout << "The eigenvalues of A+I are: " << es.eigenvalues().transpose() << endl;
How would you modify it in order to change the maximum number of iterations?
Additionally, will this solve my problem or should I try to find an alternative function or algorithm to solve the eigensystem?
My thanks in advance.
Increasing the number of iterations is unlikely to help. On the other hand, moving from float to double will help a lot!
If that does not help, please be more specific about the "deviations from the expected results".
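For example, a minimal sketch of that float-to-double switch (dynamic-size MatrixXd in place of Matrix4f; the 1024 size just mirrors the question):

#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;

int main()
{
    MatrixXd X = MatrixXd::Random(1024, 1024);
    MatrixXd A = X + X.transpose();          // symmetric by construction
    SelfAdjointEigenSolver<MatrixXd> es(A);  // double precision throughout
    std::cout << es.eigenvalues().head(5).transpose() << std::endl;
    return 0;
}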
m_maxIterations is a static const int variable, and as such it can be considered an intrinsic property of the type. Changing such a type property would usually be done via a specific template parameter. In this case, however, it is set to the constant number 30, so that's not possible.
Therefore, your only choice is to change the value in the header file and recompile your program.
However, before doing that, I would try the singular value decomposition. According to the homepage, its accuracy is "Excellent-Proven". Moreover, it can overcome problems caused by matrices that are not numerically perfectly symmetric.
I solved the problem by writing a Jacobi algorithm adapted from the book Numerical Recipes:
#include <cmath>
#include <Eigen/Dense>
using Eigen::MatrixXd;
using Eigen::VectorXd;

// Apply a Jacobi rotation to elements (i,j) and (k,l) of a.
void ROTATy(MatrixXd &a, int i, int j, int k, int l, double s, double tau)
{
    double g, h;
    g = a(i,j);
    h = a(k,l);
    a(i,j) = g - s*(h + g*tau);
    a(k,l) = h + s*(g - h*tau);
}

void jacoby(int n, MatrixXd &a, MatrixXd &v, VectorXd &d)
{
    int j, iq, ip, i;
    double tresh, theta, tau, t, sm, s, h, g, c;
    VectorXd b(n);
    VectorXd z(n);

    v.setIdentity();
    z.setZero();
    for (ip = 0; ip < n; ip++)
    {
        d(ip) = a(ip,ip);
        b(ip) = d(ip);
    }
    for (i = 0; i < 50; i++)   // sweeps
    {
        // sum of off-diagonal magnitudes; zero means we have converged
        sm = 0.0;
        for (ip = 0; ip < n-1; ip++)
            for (iq = ip+1; iq < n; iq++)
                sm += fabs(a(ip,iq));
        if (sm == 0.0)
            break;
        if (i < 3)
            tresh = 0.2*sm/(n*n);
        else
            tresh = 0.0;
        for (ip = 0; ip < n-1; ip++)
        {
            for (iq = ip+1; iq < n; iq++)
            {
                g = 100.0*fabs(a(ip,iq));
                // after the first few sweeps, zero negligible off-diagonal elements
                if (i > 3 && (fabs(d(ip))+g) == fabs(d(ip)) && (fabs(d(iq))+g) == fabs(d(iq)))
                    a(ip,iq) = 0.0;
                else if (fabs(a(ip,iq)) > tresh)
                {
                    h = d(iq) - d(ip);
                    if ((fabs(h)+g) == fabs(h))
                        t = a(ip,iq)/h;
                    else
                    {
                        theta = 0.5*h/(a(ip,iq));
                        t = 1.0/(fabs(theta)+sqrt(1.0+theta*theta));
                        if (theta < 0.0)
                            t = -t;
                    }
                    // rotation parameters; this block runs for both branches above
                    c = 1.0/sqrt(1+t*t);
                    s = t*c;
                    tau = s/(1.0+c);
                    h = t*a(ip,iq);
                    z(ip) -= h;
                    z(iq) += h;
                    d(ip) -= h;
                    d(iq) += h;
                    a(ip,iq) = 0.0;
                    for (j = 0; j < ip; j++)
                        ROTATy(a, j, ip, j, iq, s, tau);
                    for (j = ip+1; j < iq; j++)
                        ROTATy(a, ip, j, j, iq, s, tau);
                    for (j = iq+1; j < n; j++)
                        ROTATy(a, ip, j, iq, j, s, tau);
                    for (j = 0; j < n; j++)
                        ROTATy(v, j, ip, j, iq, s, tau);
                }
            }
        }
        // end-of-sweep update from Numerical Recipes: refresh d from the
        // accumulated corrections and reset z
        for (ip = 0; ip < n; ip++)
        {
            b(ip) += z(ip);
            d(ip) = b(ip);
            z(ip) = 0.0;
        }
    }
}
The function jacoby receives the size n of the square matrix, the matrix a that we want to solve, a matrix v that will receive the eigenvectors in its columns, and a vector d that will receive the eigenvalues. It is a bit slow, so I tried to parallelize it with OpenMP (see: Parallelization of Jacobi algorithm using eigen c++ using openmp), but for 4096x4096 matrices that did not give me an improvement in computation time, unfortunately.
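For reference, a minimal usage sketch (assuming the jacoby function above; the routine destroys the upper triangle of a, so pass a copy if you still need it):

#include <Eigen/Dense>
using Eigen::MatrixXd;
using Eigen::VectorXd;

int main()
{
    int n = 1024;
    MatrixXd x = MatrixXd::Random(n, n);
    MatrixXd a = x + x.transpose();   // symmetric test matrix
    MatrixXd v(n, n);                 // receives eigenvectors (columns)
    VectorXd d(n);                    // receives eigenvalues
    jacoby(n, a, v, d);
    return 0;
}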

How can I calculate inverse of sparse matrix in Eigen library

I have a question about the Eigen library in C++. I want to calculate the inverse of a sparse matrix.
For a dense matrix in Eigen I can use the .inverse() operation.
But for a sparse matrix I cannot find an inverse operation anywhere. Does anyone know how to calculate the inverse of a sparse matrix?
You cannot do it directly, but you can always calculate it using one of the sparse solvers. The idea is to solve A*X=I, where I is the identity matrix. If there is a solution, X is your inverse matrix.
The Eigen documentation has a page about sparse solvers and how to use them, but the basic steps are as follows:
SolverClassName<SparseMatrix<double> > solver;  // e.g. SparseLU
solver.compute(A);
SparseMatrix<double> I(n,n);
I.setIdentity();
SparseMatrix<double> A_inv = solver.solve(I);  // avoid auto here: it would capture an Eigen expression, not the result
It's generally not meaningful to do this:
a sparse matrix does not necessarily have a sparse inverse.
That's why the method is not available.
A small extension on @Soheib's and @MatthiasB's answers: if you're using Eigen::SparseMatrix<float>, it's better to use SparseLU rather than SimplicialLLT or SimplicialLDLT; they produced wrong answers for me on float matrices.
Be warned that the inverse of a sparse matrix is not necessarily sparse, so if you're working with large matrices (which is likely, if you're using sparse representations) then this is going to be expensive. Think carefully about whether you really need the actual matrix inverse. If you're going to use the matrix inverse to solve a system of equations, then you don't need to actually compute the matrix inverse and multiply it out (use the method typically named solve and supply the right-hand side of the equation). If you need the inverse of the Fisher matrix for covariances, try to approximate it.
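For instance, a minimal sketch of the solve-instead-of-invert pattern (SparseLU chosen arbitrarily as an example, and solve_system is a made-up name; pick a solver per the Eigen sparse-solver table):

#include <Eigen/Sparse>

// Solve A x = b without ever forming inv(A).
Eigen::VectorXd solve_system(const Eigen::SparseMatrix<double> &A,
                             const Eigen::VectorXd &b)
{
    Eigen::SparseLU<Eigen::SparseMatrix<double> > solver;
    solver.compute(A);       // factorize once
    return solver.solve(b);  // reuse the factorization for each right-hand side
}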
Here you can find an example of the inverse of a sparse complex matrix.
I used the SimplicialLLT class;
you can find other classes below:
http://eigen.tuxfamily.org/dox-devel/group__TopicSparseSystems.html
That page can help you pick the proper class for your work (speed, accuracy and the dimensions of your matrix).
////////////////////// In His Name \\\\\\\\\\\\\\\\\\\\\\\\\\\
#include <iostream>
#include <complex>
#include <Eigen/Dense>
#include <Eigen/Sparse>
using namespace std;
using namespace Eigen;

int main()
{
    SparseMatrix< complex<float> > A(4,4);
    // fill the diagonal (each pass of the j loop overwrites A(i,i),
    // so the net effect is A(i,i) = i+3)
    for (int i=0; i<4; i++) {
        for (int j=0; j<4; j++) {
            A.coeffRef(i, i) = i+j;
        }
    }
    A.insert(2,1) = {2,1};
    A.insert(3,0) = {0,0};
    A.insert(3,1) = {2.5,1};
    A.insert(1,3) = {2.5,1};

    // SimplicialLLT assumes a self-adjoint matrix and by default reads
    // only the lower triangular part
    SimplicialLLT<SparseMatrix<complex<float> > > solverA;
    A.makeCompressed();
    solverA.compute(A);
    if (solverA.info() != Success) {
        cout << "Oh: Very bad" << endl;
    }

    // the identity must use the same scalar type as the solver
    SparseMatrix< complex<float> > eye(4,4);
    eye.setIdentity();
    SparseMatrix< complex<float> > inv_A = solverA.solve(eye);
    cout << "A:\n" << A << endl;
    cout << "inv_A\n" << inv_A << endl;
    return 0;
}

Controlling the index variables in C++ AMP

I have just started trying C++ AMP and I decided to give it a shot with the current project I am working on. At some point, I have to build a distance matrix for the vectors I have, and I have written the code below for this:
unsigned int samplesize = samplelist.size();
unsigned int vs = samplelist.front().size();
vector<double> samplevec(samplesize*vs);
vector<double> distancevec(samplesize*samplesize, 0);

// flatten the list of sample vectors into a contiguous buffer
auto it1 = samplelist.begin();
for (int i = 0; i < samplesize; ++i) {
    for (int j = 0; j < vs; ++j) {
        samplevec[j + i*vs] = (*it1)[j];
    }
    ++it1;
}

array_view<const double,2> samplearray(samplesize, vs, samplevec);
array_view<writeonly<double>,2> distances(samplesize, samplesize, distancevec);
parallel_for_each(distances.grid, [=](index<2> idx) restrict(direct3d) {
    double sqrsum = 0;
    double tempd = 0;
    for (unsigned int i = 0; i < vs; ++i)
    {
        tempd = samplearray(idx.x, i) - samplearray(idx.y, i);
        sqrsum += tempd*tempd;
    }
    distances[idx] = sqrsum;
});
However, as you can see, this does not take into account the symmetry of distance matrices. When I calculate sqrsum for the pair (i, j), I don't want to do the same calculation again when the order of i and j is reversed. Is there any way to accomplish this? I came up with the following trick, but I don't know if it would bump up the performance significantly:
for (unsigned int i = 0; i < vs; ++i)
{
    if (idx.x <= idx.y) {
        break;
    }
    tempd = samplearray(idx.x, i) - samplearray(idx.y, i);
    sqrsum += tempd*tempd;
}
Can the if-condition do the job? Or do you think the if statement would hurt the performance unnecessarily? I couldn't come up with any alternative to it.
BTW, I just noticed that the above code does not work on my machine, whose GPU only supports single precision. Is there anything I can do to get around that problem? The error message is as follows:
"runtime_exception: Concurrency::parallel_for_each uses features unsupported by the selected accelerator.
ID3D11Device::CreateComputeShader: Shader uses double precision float ops which are not supported on the current device."
I think you can eliminate the if-condition if you schedule only as many threads as you need, instead of scheduling the entire rectangle that covers your output matrix. What you need is the upper or lower triangle without the diagonal, and you can map a linear thread index to a (row, column) pair in that triangle using the formula for the sum of an arithmetic sequence.
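For example, a sketch of that index mapping (plain C++ with a made-up helper name; inside an AMP kernel you would use the restricted math functions instead of std::sqrt):

#include <cmath>

// Map a linear index k in [0, n*(n-1)/2) to the pair (i, j) with j < i,
// enumerating the strict lower triangle row by row.
void triangle_index(int k, int &i, int &j)
{
    i = (int)((1.0 + std::sqrt(1.0 + 8.0*k)) / 2.0);
    if (i*(i-1)/2 > k) --i;  // guard against floating-point rounding
    j = k - i*(i-1)/2;
}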
The alternative would be to organize the input data such that it is in two 1D vectors; each thread would read a value from vector 1, then vector 2, calculate the distance and store it in the output vector.
Finally, the error on double precision shows up because the card you are using does not support double precision operations. Please check your card's specification to confirm that. You can work around it by switching to a single precision type, i.e. "float", in the array_view template.
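For illustration, the single-precision version of the question's buffers might look like this (untested, mirroring the developer-preview API used in the question):

vector<float> samplevec(samplesize*vs);
vector<float> distancevec(samplesize*samplesize, 0.0f);
array_view<const float,2> samplearray(samplesize, vs, samplevec);
array_view<writeonly<float>,2> distances(samplesize, samplesize, distancevec);
// inside the kernel, use float for tempd and sqrsum as well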