Rewriting Matlab eig(A,B) (Generalized eigenvalues/eigenvectors) to C/C++

Does anyone have any idea how I can rewrite Matlab's eig(A,B), which computes generalized eigenvalues/eigenvectors, in C++? I've been struggling with this problem lately. So far:
Matlab definition of eig function I need:
[V,D] = eig(A,B) produces a diagonal matrix D of generalized
eigenvalues and a full matrix V whose columns are the corresponding
eigenvectors so that A*V = B*V*D.
So far I tried the Eigen library (http://eigen.tuxfamily.org/dox/classEigen_1_1GeneralizedSelfAdjointEigenSolver.html)
My implementation looks like this:
std::pair<Matrix4cd, Vector4d> eig(const Matrix4cd& A, const Matrix4cd& B)
{
    Eigen::GeneralizedSelfAdjointEigenSolver<Matrix4cd> solver(A, B);
    Matrix4cd V = solver.eigenvectors();
    Vector4d D = solver.eigenvalues();
    return std::make_pair(V, D);
}
But the first thing that comes to my mind is that I can't use Vector4cd, because .eigenvalues() doesn't return complex values where Matlab does. Furthermore, the results of .eigenvectors() and .eigenvalues() for the same matrices are not the same at all:
C++:
Matrix4cd x;
Matrix4cd y;
pair<Matrix4cd, Vector4d> result;
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        x.real()(i,j) = (double)(i+j+1+i*3);
        y.real()(i,j) = (double)(17 - (i+j+1+i*3));
        x.imag()(i,j) = (double)(i+j+1+i*3);
        y.imag()(i,j) = (double)(17 - (i+j+1+i*3));
    }
}
result = eig(x,y);
cout << result.first << endl << endl;
cout << result.second << endl << endl;
Matlab:
for i=1:1:4
    for j=1:1:4
        x(i,j) = complex((i-1)+(j-1)+1+((i-1)*3), (i-1)+(j-1)+1+((i-1)*3));
        y(i,j) = complex(17 - ((i-1)+(j-1)+1+((i-1)*3)), 17 - ((i-1)+(j-1)+1+((i-1)*3)));
    end
end
[A,B] = eig(x,y)
So I give eig the same 4x4 matrices, holding the values 1-16 ascending (x) and descending (y), but I receive different results. Furthermore, Eigen's method returns double from eigenvalues() while Matlab returns complex double. I also found out that there is another Eigen solver named GeneralizedEigenSolver. Its documentation (http://eigen.tuxfamily.org/dox/classEigen_1_1GeneralizedEigenSolver.html) says that it solves A*V = B*V*D, but to be honest I tried it and the results (matrix sizes) are not the same size as Matlab's, so I got quite lost about how it works (exemplary results are on the website I've linked). It also has only an .eigenvectors() method.
C++ results:
(-0.222268,-0.0108754) (0.0803437,-0.0254809) (0.0383264,-0.0233819) (0.0995482,0.00682079)
(-0.009275,-0.0182668) (-0.0395551,-0.0582127) (0.0550395,0.03434) (-0.034419,-0.0287563)
(-0.112716,-0.0621061) (-0.010788,0.10297) (-0.0820552,0.0294896) (-0.114596,-0.146384)
(0.28873,0.257988) (0.0166259,-0.0529934) (0.0351645,-0.0322988) (0.405394,0.424698)
-1.66983
-0.0733194
0.0386832
3.97933
Matlab results:
[A,B] = eig(x,y)
A =
Columns 1 through 3
-0.9100 + 0.0900i -0.5506 + 0.4494i 0.3614 + 0.3531i
0.7123 + 0.0734i 0.4928 - 0.2586i -0.5663 - 0.4337i
0.0899 - 0.4170i -0.1210 - 0.3087i 0.0484 - 0.1918i
0.1077 + 0.2535i 0.1787 + 0.1179i 0.1565 + 0.2724i
Column 4
-0.3237 - 0.3868i
0.2338 + 0.7662i
0.5036 - 0.3720i
-0.4136 - 0.0074i
B =
Columns 1 through 3
-1.0000 + 0.0000i 0.0000 + 0.0000i 0.0000 + 0.0000i
0.0000 + 0.0000i -1.0000 - 0.0000i 0.0000 + 0.0000i
0.0000 + 0.0000i 0.0000 + 0.0000i -4.5745 - 1.8929i
0.0000 + 0.0000i 0.0000 + 0.0000i 0.0000 + 0.0000i
Column 4
0.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
-0.3317 + 1.1948i
My second try was with Intel IPP, but it seems that it solves only A*V = V*D, and support told me that it's not supported anymore.
https://software.intel.com/en-us/node/505270 (list of constructors for Intel IPP)
I got a suggestion to move from Intel IPP to MKL. I did it and hit the wall again. I tried to check all the eigenproblem algorithms, but it seems that only A*V = V*D problems are solved. I was checking lapack95.lib. The list of algorithms used by this library is available here:
https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/index.htm#dsyev.htm
Somewhere on the web I found a topic on MathWorks where someone said they managed to partially solve my problem using MKL:
http://jp.mathworks.com/matlabcentral/answers/40050-generalized-eigenvalue-and-eigenvectors-differences-between-matlab-eig-a-b-and-mkl-lapack-dsygv
That person said they used the dsygv routine; dsygv is in fact LAPACK's driver for the real symmetric-definite generalized eigenproblem (its complex Hermitian counterpart is zhegv).
Does anyone have any other proposition/idea for how I can implement it? Or maybe point out my mistake? I'd appreciate it.
EDIT:
In the comments I received a hint that I was using the Eigen solver wrong: my A matrix wasn't self-adjoint and my B matrix wasn't positive-definite. I took matrices from the program I want to rewrite in C++ (from a random iteration) and checked whether they meet the requirements. They did:
Rj =
1.0e+02 *
Columns 1 through 3
0.1302 + 0.0000i -0.0153 + 0.0724i 0.0011 - 0.0042i
-0.0153 - 0.0724i 1.2041 + 0.0000i -0.0524 + 0.0377i
0.0011 + 0.0042i -0.0524 - 0.0377i 0.0477 + 0.0000i
-0.0080 - 0.0108i 0.0929 - 0.0115i -0.0055 + 0.0021i
Column 4
-0.0080 + 0.0108i
0.0929 + 0.0115i
-0.0055 - 0.0021i
0.0317 + 0.0000i
Rt =
Columns 1 through 3
4.8156 + 0.0000i -0.3397 + 1.3502i -0.2143 - 0.3593i
-0.3397 - 1.3502i 7.3635 + 0.0000i -0.5539 - 0.5176i
-0.2143 + 0.3593i -0.5539 + 0.5176i 1.7801 + 0.0000i
0.5241 + 0.9105i 0.9514 + 0.6572i -0.7302 + 0.3161i
Column 4
0.5241 - 0.9105i
0.9514 - 0.6572i
-0.7302 - 0.3161i
9.6022 + 0.0000i
As for Rj, which is now my A: it is self-adjoint, because Rj = Rj', i.e. Rj = ctranspose(Rj) (http://mathworld.wolfram.com/Self-AdjointMatrix.html).
As for Rt, which is now my B: it is positive-definite, which is checked with the method linked to me (http://www.mathworks.com/matlabcentral/answers/101132-how-do-i-determine-if-a-matrix-is-positive-definite-using-matlab). So
>> [~,p] = chol(Rt)
p =
0
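For completeness, the same two checks can be done on the C++ side with Eigen before calling the solver — a minimal sketch (the helper names are mine, not part of Eigen):

#include <Eigen/Dense>

using Eigen::Matrix4cd;

// Self-adjoint check: A must equal its conjugate transpose.
bool isSelfAdjoint(const Matrix4cd& A, double tol = 1e-12)
{
    return (A - A.adjoint()).norm() < tol;
}

// Positive-definite check, the analogue of Matlab's [~,p] = chol(Rt):
// the Cholesky factorization succeeds only for positive-definite matrices.
bool isPositiveDefinite(const Matrix4cd& B)
{
    Eigen::LLT<Matrix4cd> llt(B);
    return llt.info() == Eigen::Success;
}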
I rewrote the matrices manually in C++ and performed eig(A,B) again with matrices meeting the requirements:
Matrix4cd x;
Matrix4cd y;
pair<Matrix4cd, Vector4d> result;
x.real()(0,0) = 13.0163601949795;
x.real()(0,1) = -1.53172561296005;
x.real()(0,2) = 0.109594869350436;
x.real()(0,3) = -0.804231869422614;
x.real()(1,0) = -1.53172561296005;
x.real()(1,1) = 120.406645675346;
x.real()(1,2) = -5.23758765476463;
x.real()(1,3) = 9.28686785230169;
x.real()(2,0) = 0.109594869350436;
x.real()(2,1) = -5.23758765476463;
x.real()(2,2) = 4.76648319080400;
x.real()(2,3) = -0.552823839520508;
x.real()(3,0) = -0.804231869422614;
x.real()(3,1) = 9.28686785230169;
x.real()(3,2) = -0.552823839520508;
x.real()(3,3) = 3.16510496622613;
x.imag()(0,0) = -0.00000000000000;
x.imag()(0,1) = 7.23946944213164;
x.imag()(0,2) = 0.419181335323979;
x.imag()(0,3) = 1.08441894337449;
x.imag()(1,0) = -7.23946944213164;
x.imag()(1,1) = -0.00000000000000;
x.imag()(1,2) = 3.76849276970080;
x.imag()(1,3) = 1.14635625342266;
x.imag()(2,0) = 0.419181335323979;
x.imag()(2,1) = -3.76849276970080;
x.imag()(2,2) = -0.00000000000000;
x.imag()(2,3) = 0.205129702522089;
x.imag()(3,0) = -1.08441894337449;
x.imag()(3,1) = -1.14635625342266;
x.imag()(3,2) = 0.205129702522089;
x.imag()(3,3) = -0.00000000000000;
y.real()(0,0) = 4.81562784930907;
y.real()(0,1) = -0.339731222392148;
y.real()(0,2) = -0.214319720979258;
y.real()(0,3) = 0.524107127885349;
y.real()(1,0) = -0.339731222392148;
y.real()(1,1) = 7.36354235698375;
y.real()(1,2) = -0.553927983436786;
y.real()(1,3) = 0.951404408649307;
y.real()(2,0) = -0.214319720979258;
y.real()(2,1) = -0.553927983436786;
y.real()(2,2) = 1.78008768533745;
y.real()(2,3) = -0.730246631850385;
y.real()(3,0) = 0.524107127885349;
y.real()(3,1) = 0.951404408649307;
y.real()(3,2) = -0.730246631850385;
y.real()(3,3) = 9.60215057284395;
y.imag()(0,0) = -0.00000000000000;
y.imag()(0,1) = 1.35016928394966;
y.imag()(0,2) = -0.359262708214312;
y.imag()(0,3) = -0.910512495060186;
y.imag()(1,0) = -1.35016928394966;
y.imag()(1,1) = -0.00000000000000;
y.imag()(1,2) = -0.517616473138836;
y.imag()(1,3) = -0.657235460367660;
y.imag()(2,0) = 0.359262708214312;
y.imag()(2,1) = 0.517616473138836;
y.imag()(2,2) = -0.00000000000000;
y.imag()(2,3) = -0.316090662865005;
y.imag()(3,0) = 0.910512495060186;
y.imag()(3,1) = 0.657235460367660;
y.imag()(3,2) = 0.316090662865005;
y.imag()(3,3) = -0.00000000000000;
result = eig(x,y);
cout << result.first << endl << endl;
cout << result.second << endl << endl;
And the results of C++:
(0.0295948,0.00562174) (-0.253532,0.0138373) (-0.395087,-0.0139696) (-0.0918132,-0.0788735)
(-0.00994614,-0.0213973) (-0.0118322,-0.0445976) (0.00993512,0.0127006) (0.0590018,-0.387949)
(0.0139485,-0.00832193) (0.363694,-0.446652) (-0.319168,0.376483) (-0.234447,-0.0859585)
(0.173697,0.268015) (0.0279387,-0.0103741) (0.0273701,0.0937148) (-0.055169,0.0295393)
0.244233
2.24309
3.24152
18.664
Results of MATLAB:
>> [A,B] = eig(Rj,Rt)
A =
Columns 1 through 3
0.0208 - 0.0218i 0.2425 + 0.0753i -0.1242 + 0.3753i
-0.0234 - 0.0033i -0.0044 + 0.0459i 0.0150 - 0.0060i
0.0006 - 0.0162i -0.4964 + 0.2921i 0.2719 + 0.4119i
0.3194 + 0.0000i -0.0298 + 0.0000i 0.0976 + 0.0000i
Column 4
-0.0437 - 0.1129i
0.2351 - 0.3142i
-0.1661 - 0.1864i
-0.0626 + 0.0000i
B =
0.2442 0 0 0
0 2.2431 0 0
0 0 3.2415 0
0 0 0 18.6640
The eigenvalues are the same! Nice, but why are the eigenvectors not similar at all?

There is no problem here with Eigen.
In fact, for the second example run, Matlab and Eigen produced the very same result. Please remember from basic linear algebra that eigenvectors are determined up to an arbitrary scaling factor (i.e. if v is an eigenvector, the same holds for alpha*v, where alpha is a nonzero complex scalar).
It is quite common that different linear algebra libraries compute different eigenvectors, but this does not mean that one of the two codes is wrong: it simply means that they choose a different scaling of the eigenvectors.
EDIT
The main problem with exactly replicating the scaling chosen by Matlab is that eig(A,B) is a driver routine which, depending on the properties of A and B, may call different libraries/routines and apply extra steps like balancing the matrices and so on. By quickly inspecting your example, I would say that in this case Matlab is enforcing the following condition:
all(imag(V(end,:))==0) (the last component of each eigenvector is real)
but not imposing other constraints. This unfortunately means that the scaling is not unique, and probably depends on intermediate results of the generalized eigenvector algorithm used. In this case I'm not able to advise you on how to exactly replicate Matlab: knowledge of the internal workings of Matlab would be required.
As a general remark, in linear algebra one usually does not care too much about eigenvector scaling, since it is usually completely irrelevant for the problem being solved, when the eigenvectors are just used as intermediate results.
The only case in which the scaling has to be defined exactly is when you are going to give a graphic representation of the eigenvectors.
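If you do need a reproducible convention, you can impose one yourself after the solve. A minimal sketch with Eigen, mimicking the condition observed above (last component real; it assumes that component is nonzero, and is not Matlab's exact internal rule):

#include <Eigen/Dense>
#include <complex>

using Eigen::Matrix4cd;

// Rotate the phase of each eigenvector column so that its last
// component becomes real and non-negative. One possible convention;
// any nonzero complex multiple of a column is still an eigenvector.
Matrix4cd rescaleColumns(const Matrix4cd& V)
{
    Matrix4cd W = V;
    for (int c = 0; c < W.cols(); ++c)
    {
        std::complex<double> last = W(W.rows() - 1, c);
        if (std::abs(last) > 0.0)
            W.col(c) *= std::conj(last) / std::abs(last);
    }
    return W;
}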

The eigenvector scaling in Matlab seems to be based on normalizing them to 1.0 (i.e. the absolute value of the biggest term in each vector is 1.0). In the application I was using, it also returns the left eigenvector rather than the more commonly used right eigenvector. This could explain the differences between Matlab and the eigensolvers in LAPACK/MKL.

Related

Matrix multiplication with Python

I have a numerical analysis assignment and I need to find some coefficients by multiplying matrices. We were given an example in Mathcad, but now we have to do it in another programming language so I chose Python.
The problem is that I get different results when multiplying matrices in the respective environments. Here's the function in Python:
from numpy import *

def matrica(C, n):
    N = len(C) - 1
    m = N - n
    A = [[0] * (N + 1) for i in range(N + 1)]
    A[0][0] = 1
    for i in range(0, n + 1):
        A[i][i] = 1
    for j in range(1, m + 1):
        for i in range(0, N + 1):
            if i + j <= N:
                A[i+j][n+j] = A[i+j][n+j] - C[i]/2
                A[int(abs(i - j))][n+j] = A[int(abs(i - j))][n+j] - C[i]/2
    M = matrix(A)
    x = matrix([[x] for x in C])
    return [float(y) for y in M.I * x]
As you can see, I am using the numpy library. This function is consistent with its analog in Mathcad until the return statement, the part where the matrices are multiplied, to be more specific. One more observation: this function returns the correct matrix if N = 1.
I'm not sure I understand exactly what your code does. Could you explain a little more, like what math you're actually computing? But if you want a plain regular product, and you use a numpy.matrix, why don't you use the already-written matrix product?
a = numpy.matrix(...)
b = numpy.matrix(...)
p = a * b #matrix product

Program Help - Solving for e(n)

I've been wrestling with this issue for a week and I just need some guidance on the math part of it. If I could just understand the math behind it, I could piece together the functions to make it work. The assignment is:
Design and develop a C++ program for Calculating e(n) when delta <= 0.000001
e(n-1) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n-1)!
e(n) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n)!
delta = e(n) – e(n-1)
You do not have any input to the program. Your output should be something like this:
N = 2 e(1) = 2 e(2) = 2.5 delta = 0.5
N = 3 e(2) = 2.5 e(3) = 2.565 delta = 0.065
...
You must use recursive function calls.
My first issue is the math and the variables that would contain it:
the delta, e(n), and e(n-1) variables must be doubles
if e(n) = 1 + 1/1! = 2, then e(n-1) must equal 1, which means delta = 1 (that's my thinking anyway). I'm just not sure of the math behind the 0.5 delta the first time and the 0.065 in the second iteration.
Can someone point me in the right direction on this problem?
Thank you,
T
From the wikipedia link, you can see that e = sum_{n=0}^{inf} 1/n! = 1 + 1/1! + 1/2! + 1/3! + …
I will not explain the notion of limits here, but what this basically means is that, if we define a function e where e(n) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n)! (which is the function given in your problem), we are able to approximate the real value of the constant e.
The higher n is, the closer we get to e.
If you look closely at the function, you can see that each time, we add a term which is smaller than the previous one: 1 >= 1/1! >= 1/2! >= .... >= 1/(n)!
That basically means that every time we increase n we are getting closer to e, but we are slowing down along the way.
The real value of e is 2.71828...
In our first step e(1) = 1, we are 1.71828... too far from the real value
In the second step e(2) = 2, we are at 0.71828..., 1 distance closer
In the third step e(3) = 2.5, we are now at 0.21828..., 0.5 distance closer
As you can see, we are getting there, but the closer we get, the slower we move. Now let's say that at each step, we want to know how close we have moved compared to the previous value.
We then do simply e(n) - e(n-1). This is basically what the delta means.
At some point, we are moving so slowly that it no longer makes any sense to keep going. We are almost staying put. At this point, we decide that our approximation is close enough to e.
In your case, the problem defines the minimum progression speed as 0.000001.
Here is a solution:
delta = e(n) - e(n-1) = 1/n!
We need delta < 0.000001, i.e. n! > 1000000, which gives n >= 10, since 10! = 3628800.
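A minimal C++ sketch of one possible recursive solution (it follows the assignment's indexing, where e(1) = 2 and e(2) = 2.5; names are illustrative):

#include <iostream>

// Recursive factorial and recursive e(n) = 1 + 1/1! + ... + 1/n!.
double factorial(int n) { return n <= 1 ? 1.0 : n * factorial(n - 1); }
double e(int n)         { return n == 0 ? 1.0 : e(n - 1) + 1.0 / factorial(n); }

int main()
{
    int n = 1;
    double delta;
    do {
        ++n;
        delta = e(n) - e(n - 1); // equals 1/n!
        std::cout << "N = " << n << " e(" << n - 1 << ") = " << e(n - 1)
                  << " e(" << n << ") = " << e(n) << " delta = " << delta << "\n";
    } while (delta > 0.000001);
}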

Understanding DEL2 function in Matlab in order to code it in C++

In order to code the DEL2 Matlab function in C++ I need to understand the algorithm. I've managed to code the function for elements of the matrix that are not on the borders or the edges.
I've seen several topics about it and read the MATLAB code by typing "edit del2" or "type del2", but I don't understand the calculations that are made to obtain the borders and the edges.
Any help would be appreciated, thanks.
You want to approximate u'' knowing only the value of u on the right (or the left) of a point.
In order to have a second order approximation, you need 3 equations (basic taylor expansion):
u(i+1) = u(i) + h u' + (1/2) h^2 u'' + (1/6) h^3 u''' + O(h^4)
u(i+2) = u(i) + 2 h u' + (4/2) h^2 u'' + (8/6) h^3 u''' + O(h^4)
u(i+3) = u(i) + 3 h u' + (9/2) h^2 u'' + (27/6) h^3 u''' + O(h^4)
Solving for u'' gives (1):
h^2 u'' = -5 u(i+1) + 4 u(i+2) - u(i+3) + 2 u(i) +O(h^4)
To get the laplacian you need to replace the traditional formula with this one on the borders.
For example where "i = 0" you'll have:
del2(u) (i=0,j) = [-5 u(i+1,j) + 4 u(i+2,j) - u(i+3,j) + 2 u(i,j) + u(i,j+1) + u(i,j-1) - 2u(i,j) ]/h^2
EDIT clarifications:
The laplacian is the sum of the 2nd derivatives in the x and in the y directions. You can calculate the second derivative with the formula (2)
u'' = (u(i+1) + u(i-1) - 2u(i))/h^2
if you have both u(i+1) and u(i-1). If i=0 or i=imax, you can use the first formula I wrote to compute the derivatives (notice that due to the symmetry of the 2nd derivative, if i = imax you can just replace "i+k" with "i-k"). The same applies to the y (j) direction.
On the edges you can mix up the formulas (1) and (2):
del2(u) (i=imax,j) = [-5 u(i-1,j) + 4 u(i-2,j) - u(i-3,j) + 2 u(i,j) + u(i,j+1) + u(i,j-1) - 2u(i,j) ]/h^2
del2(u) (i,j=0) = [-5 u(i,j+1) + 4 u(i,j+2) - u(i,j+3) + 2 u(i,j) + u(i+1,j) + u(i-1,j) - 2u(i,j) ]/h^2
del2(u) (i,j=jmax) = [-5 u(i,j-1) + 4 u(i,j-2) - u(i,j-3) + 2 u(i,j) + u(i+1,j) + u(i-1,j) - 2u(i,j) ]/h^2
And on the corners you'll just use (1) two times for both directions.
del2(u) (i=0,j=0) = [-5 u(i,j+1) + 4 u(i,j+2) - u(i,j+3) + 2 u(i,j) + -5 u(i,j+1) + 4 u(i+2,j) - u(i+3,j) + 2 u(i,j)]/h^2
del2 is the 2nd-order discrete Laplacian, i.e. it approximates the Laplacian of a real continuous function given its values on a square Cartesian NxN grid, where the distance between two adjacent nodes is h.
h^2 is just a constant dimensional factor; you can get the Matlab implementation from these formulas by setting h^2 = 4.
For example, if you want to compute the real Laplacian of u(x,y) on the (0,L) x (0,L) square, what you do is write down the values of this function on an NxN Cartesian grid, i.e. you calculate u(0,0), u(L/(N-1),0), u(2L/(N-1),0) ... u((N-1)L/(N-1) = L, 0) ... u(0,L/(N-1)), u(L/(N-1),L/(N-1)) etc., and you put these N^2 values in a matrix A.
Then you'll have
ans = 4*del2(A)/h^2, where h = L/(N-1).
del2 will return the exact value of the continuous Laplacian if your starting function is linear or quadratic (x^2 + y^2 is fine, x^3 + y^3 is not). If the function is neither linear nor quadratic, the result will be more accurate the more points you use (i.e. in the limit h -> 0).
I hope this is clearer. Notice that I used 0-based indices for accessing the matrix (C/C++ array style), while Matlab uses 1-based indices.
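To make this concrete, here is a minimal C++ sketch of the interior formula (2) applied in both directions and of the i = 0 border mixing formulas (1) and (2); the other borders and corners follow by symmetry as described above. Per the remark above, setting h^2 = 4 reproduces Matlab's del2 scaling:

#include <vector>

// u is an N x N grid of function values, h is the grid spacing.
// Interior point: centered formula (2) in both directions.
double del2_interior(const std::vector<std::vector<double>>& u,
                     int i, int j, double h)
{
    return (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1] - 4.0*u[i][j]) / (h*h);
}

// Top border (i == 0, 0 < j < N-1): one-sided formula (1) in x,
// centered formula (2) in y.
double del2_top(const std::vector<std::vector<double>>& u, int j, double h)
{
    double d2x = -5.0*u[1][j] + 4.0*u[2][j] - u[3][j] + 2.0*u[0][j];
    double d2y = u[0][j+1] + u[0][j-1] - 2.0*u[0][j];
    return (d2x + d2y) / (h*h);
}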
DEL2 in MatLab represents the discrete Laplace operator; you can find some information about it here.
The main thing about the edges is that elements in the interior of the matrix have four neighbors, while elements on the edges and corners have three or two neighbors respectively. So you calculate the corners and edges the same way, but using fewer elements.
Here is a module I wrote in Fortran 90 that replicates the "del2()" operator in MATLAB implementing the above ideas. It only works for arrays that are at least 4x4 or larger. It works successfully when I run it, so I thought I would post it so that other people don't have to waste time making their own.
module del2_mod
implicit none
real, private :: pi
integer, private :: nr, nc, i, j, k
contains
! nr is number of rows in array, while nc is the number of columns in the array.
!!----------------------------------------------------------
subroutine del2(in, out)
real, dimension(:,:) :: in, out
real, dimension(size(in,1),size(in,2)) :: interior, left, right, top, bottom, ul_corner, br_corner
integer :: i, j
real :: h, ul, ur, bl, br
! Array extents (note: nr and nc were never assigned in the original posting)
nr = size(in, 1); nc = size(in, 2)
! Zero out internal arrays
out = 0.0; interior = 0.0; left = 0.0; right = 0.0; top = 0.0; bottom = 0.0; ul_corner = 0.0; br_corner = 0.0
h = 2.0
! Interior Points
do j=1,nc
do i=1,nr
! Interior Point Calculations
if( j>1 .and. j<nc .and. i>1 .and. i<nr )then
interior(i,j) = ((in(i-1,j) + in(i+1,j) + in(i,j-1) + in(i,j+1)) - 4*in(i,j) )/(h**2)
end if
! Boundary Conditions for Left and Right edges (skip the corner rows,
! which would index outside the array and are handled separately below)
if( i>1 .and. i<nr )then
left(i,1) = (-5.0*in(i,2) + 4.0*in(i,3) - in(i,4) + 2.0*in(i,1) + in(i+1,1) + in(i-1,1) - 2.0*in(i,1) )/(h**2)
right(i,nc) = (-5.0*in(i,nc-1) + 4.0*in(i,nc-2) - in(i,nc-3) + 2.0*in(i,nc) + in(i+1,nc) + in(i-1,nc) - 2.0*in(i,nc) )/(h**2)
end if
end do
! Boundary Conditions for Top and Bottom edges (skip the corner columns,
! which would index outside the array and are handled separately below)
if( j>1 .and. j<nc )then
top(1,j) = (-5.0*in(2,j) + 4.0*in(3,j) - in(4,j) + 2.0*in(1,j) + in(1,j+1) + in(1,j-1) - 2.0*in(1,j) )/(h**2)
bottom(nr,j) = (-5.0*in(nr-1,j) + 4.0*in(nr-2,j) - in(nr-3,j) + 2.0*in(nr,j) + in(nr,j+1) + in(nr,j-1) - 2.0*in(nr,j) )/(h**2)
end if
end do
out = interior + left + right + top + bottom
! Calculate BC for the corners
ul = (-5.0*in(1,2) + 4.0*in(1,3) - in(1,4) + 2.0*in(1,1) - 5.0*in(2,1) + 4.0*in(3,1) - in(4,1) + 2.0*in(1,1))/(h**2)
br = (-5.0*in(nr,nc-1) + 4.0*in(nr,nc-2) - in(nr,nc-3) + 2.0*in(nr,nc) - 5.0*in(nr-1,nc) + 4.0*in(nr-2,nc) - in(nr-3,nc) + 2.0*in(nr,nc))/(h**2)
bl = (-5.0*in(nr,2) + 4.0*in(nr,3) - in(nr,4) + 2.0*in(nr,1) - 5.0*in(nr-1,1) + 4.0*in(nr-2,1) - in(nr-3,1) + 2.0*in(nr,1))/(h**2)
ur = (-5.0*in(1,nc-1) + 4.0*in(1,nc-2) - in(1,nc-3) + 2.0*in(1,nc) - 5.0*in(2,nc) + 4.0*in(3,nc) - in(4,nc) + 2.0*in(1,nc))/(h**2)
! Apply BC for the corners
out(1,1)=ul
out(1,nc)=ur
out(nr,1)=bl
out(nr,nc)=br
end subroutine
end module
It's so hard! I wasted a few hours understanding and implementing it in Java.
Here is: https://gist.github.com/emersonmoretto/dec8f7125c032775da0d
Tested and compared to the original function DEL2 (Matlab)
I've found a typo in sbabbi's response:
del2(u) (i=0,j=0) = [-5 u(i,j+1) + 4 u(i,j+2) - u(i,j+3) + 2 u(i,j) + -5 u(i,j+1) + 4 u(i+2,j) - u(i+3,j) + 2 u(i,j)]/h^2
should be
del2(u) (i=0,j=0) = [-5 u(i,j+1) + 4 u(i,j+2) - u(i,j+3) + 2 u(i,j) + -5 u(i+1,j) + 4 u(i+2,j) - u(i+3,j) + 2 u(i,j)]/h^2

Implementation of the Discrete Fourier Transform - FFT

I am trying to do a project in sound processing and need to put the frequencies into another domain. Now, I have tried to implement an FFT; that didn't go well. I tried to understand the z-transform; that didn't go too well either. I read up and found DFTs a lot simpler to understand, especially the algorithm. So I coded the algorithm using examples, but I do not know or think the output is right. (I don't have Matlab on here, and cannot find any resources to test it.) I wondered if you guys knew if I was going in the right direction. Here is my code so far:
#include <iostream>
#include <complex>
#include <vector>

using namespace std;

const double PI = 3.141592;

vector< complex<double> > DFT(vector< complex<double> >& theData)
{
    // Define the size of the read-in vector
    const int S = theData.size();
    // Initialise a new vector of size S
    vector< complex<double> > out(S, 0);
    for (unsigned i = 0; i < S; i++)
    {
        out[i] = complex<double>(0.0, 0.0);
        for (unsigned j = 0; j < S; j++)
        {
            out[i] += theData[j] * polar<double>(1.0, -2 * PI * i * j / S);
        }
    }
    return out;
}

int main(int argc, char *argv[])
{
    vector< complex<double> > numbers;
    numbers.push_back(102023);
    numbers.push_back(102023);
    numbers.push_back(102023);
    numbers.push_back(102023);
    vector< complex<double> > testing = DFT(numbers);
    for (unsigned i = 0; i < testing.size(); i++)
    {
        cout << testing[i] << endl;
    }
}
The inputs are:
102023 102023
102023 102023
And the result:
(408092, 0)
(-0.0666812, -0.0666812)
(1.30764e-07, -0.133362)
(0.200044, -0.200043)
Any help or advice would be great, I'm not expecting a lot, but, anything would be great. Thank you :)
@Phorce is right here. I don't think there is any reason to reinvent the wheel. However, if you want to do this so that you understand the methodology and have the joy of coding it yourself, I can provide a FORTRAN FFT code that I developed some years ago. Of course this is not C++ and will require a translation; this should not be too difficult and should enable you to learn a lot in doing so...
Below is a Radix 4 based algorithm; this radix-4 FFT recursively partitions a DFT into four quarter-length DFTs of groups of every fourth time sample. The outputs of these shorter FFTs are reused to compute many outputs, thus greatly reducing the total computational cost. The radix-4 decimation-in-frequency FFT groups every fourth output sample into shorter-length DFTs to save computations. The radix-4 FFTs require only 75% as many complex multiplies as the radix-2 FFTs. See here for more information.
!+ FILE: RADIX4.FOR
! ===================================================================
! Description: Radix 4 is a discrete complex Fourier transform algorithm. It
! is to be supplied with two real arrays, one for the real parts of the function,
! one for the imaginary parts. It can also unscramble transformed arrays.
! Usage: calling FASTF(XREAL,XIMAG,ISIZE,ITYPE,IFAULT); we supply the
! following:
!
! XREAL - array containing real parts of transform sequence
! XIMAG - array containing imaginary parts of transform sequence
! ISIZE - size of transform (ISIZE = 4*2**M)
! ITYPE - +1 forward transform
! -1 reverse transform
! IFAULT - 1 if error
! - 0 otherwise
! ===================================================================
!
! Forward transform computes:
! X(k) = sum_{j=0}^{isize-1} x(j)*exp(-2ijk*pi/isize)
! Backward computes:
! x(j) = (1/isize) sum_{k=0}^{isize-1} X(k)*exp(2ijk*pi/isize)
!
! Forward followed by backwards will result in the original sequence!
!
! ===================================================================
SUBROUTINE FASTF(XREAL,XIMAG,ISIZE,ITYPE,IFAULT)
REAL*8 XREAL(*),XIMAG(*)
INTEGER MAX2,II,IPOW
PARAMETER (MAX2 = 20)
! Check for valid transform size up to 2**(max2):
IFAULT = 1
IF(ISIZE.LT.4) THEN
print*,'FFT: Error: Data array < 4 - Too small!'
return
ENDIF
II = 4
IPOW = 2
! Prepare mod 2:
1 IF((II-ISIZE).NE.0) THEN
II = II*2
IPOW = IPOW + 1
IF(IPOW.GT.MAX2) THEN
print*,'FFT: Error: FFT1!'
return
ENDIF
GOTO 1
ENDIF
! Check for correct type:
IF(IABS(ITYPE).NE.1) THEN
print*,'FFT: Error: Wrong type of transformation!'
return
ENDIF
! No entry errors - continue:
IFAULT = 0
! Call FASTG to perform the transformation:
CALL FASTG(XREAL,XIMAG,ISIZE,ITYPE)
! Due to the Radix 4 factorisation, results are not in the same order
! after transformation as they were when the data was submitted:
! We now call SCRAM to unscramble the results:
CALL SCRAM(XREAL,XIMAG,ISIZE,IPOW)
return
END
!-END: RADIX4.FOR
! ===============================================================
! Description: This is the radix 4 complex discrete fast Fourier
! transform without unscrambling. Suitable for convolutions or other
! applications that do not require unscrambling. Designed for use
! with FASTF.FOR.
!
SUBROUTINE FASTG(XREAL,XIMAG,N,ITYPE)
INTEGER N,IFACA,IFCAB,LITLA
INTEGER I0,I1,I2,I3
REAL*8 XREAL(*),XIMAG(*),BCOS,BSIN,CW1,CW2,CW3,PI
REAL*8 SW1,SW2,SW3,TEMPR,X1,X2,X3,XS0,XS1,XS2,XS3
REAL*8 Y1,Y2,Y3,YS0,YS1,YS2,YS3,Z,ZATAN,ZFLOAT,ZSIN
ZATAN(Z) = ATAN(Z)
ZFLOAT(K) = FLOAT(K) ! Real equivalent of K.
ZSIN(Z) = SIN(Z)
PI = (4.0)*ZATAN(1.0)
IFACA = N/4
! Forward transform:
IF(ITYPE.GT.0) THEN
GOTO 5
ENDIF
! If this is for an inverse transform - conjugate the data:
DO 4, K = 1,N
XIMAG(K) = -XIMAG(K)
4 CONTINUE
5 IFCAB = IFACA*4
! Perform appropriate transformations:
Z = PI/ZFLOAT(IFCAB)
BCOS = -2.0*ZSIN(Z)**2
BSIN = ZSIN(2.0*Z)
CW1 = 1.0
SW1 = 0.0
! This is the main body of radix 4 calculations:
DO 10, LITLA = 1,IFACA
DO 8, I0 = LITLA,N,IFCAB
I1 = I0 + IFACA
I2 = I1 + IFACA
I3 = I2 + IFACA
XS0 = XREAL(I0) + XREAL(I2)
XS1 = XREAL(I0) - XREAL(I2)
YS0 = XIMAG(I0) + XIMAG(I2)
YS1 = XIMAG(I0) - XIMAG(I2)
XS2 = XREAL(I1) + XREAL(I3)
XS3 = XREAL(I1) - XREAL(I3)
YS2 = XIMAG(I1) + XIMAG(I3)
YS3 = XIMAG(I1) - XIMAG(I3)
XREAL(I0) = XS0 + XS2
XIMAG(I0) = YS0 + YS2
X1 = XS1 + YS3
Y1 = YS1 - XS3
X2 = XS0 - XS2
Y2 = YS0 - YS2
X3 = XS1 - YS3
Y3 = YS1 + XS3
IF(LITLA.GT.1) THEN
GOTO 7
ENDIF
XREAL(I2) = X1
XIMAG(I2) = Y1
XREAL(I1) = X2
XIMAG(I1) = Y2
XREAL(I3) = X3
XIMAG(I3) = Y3
GOTO 8
! Now, if required, we multiply by twiddle factors:
7 XREAL(I2) = X1*CW1 + Y1*SW1
XIMAG(I2) = Y1*CW1 - X1*SW1
XREAL(I1) = X2*CW2 + Y2*SW2
XIMAG(I1) = Y2*CW2 - X2*SW2
XREAL(I3) = X3*CW3 + Y3*SW3
XIMAG(I3) = Y3*CW3 - X3*SW3
8 CONTINUE
IF(LITLA.EQ.IFACA) THEN
GOTO 10
ENDIF
! Calculate a new set of twiddle factors:
Z = CW1*BCOS - SW1*BSIN + CW1
SW1 = BCOS*SW1 + BSIN*CW1 + SW1
TEMPR = 1.5 - 0.5*(Z*Z + SW1*SW1)
CW1 = Z*TEMPR
SW1 = SW1*TEMPR
CW2 = CW1*CW1 - SW1*SW1
SW2 = 2.0*CW1*SW1
CW3 = CW1*CW2 - SW1*SW2
SW3 = CW1*SW2 + CW2*SW1
10 CONTINUE
IF(IFACA.LE.1) THEN
GOTO 14
ENDIF
! Set up transform split for next stage:
IFACA = IFACA/4
IF(IFACA.GT.0) THEN
GOTO 5
ENDIF
! This is the calculation of a radix-two stage:
DO 13, K = 1,N,2
TEMPR = XREAL(K) + XREAL(K + 1)
XREAL(K + 1) = XREAL(K) - XREAL(K + 1)
XREAL(K) = TEMPR
TEMPR = XIMAG(K) + XIMAG(K + 1)
XIMAG(K + 1) = XIMAG(K) - XIMAG(K + 1)
XIMAG(K) = TEMPR
13 CONTINUE
14 IF(ITYPE.GT.0) THEN
GOTO 17
ENDIF
! For the inverse case, conjugate and scale the transform:
Z = 1.0/ZFLOAT(N)
DO 16, K = 1,N
XIMAG(K) = -XIMAG(K)*Z
XREAL(K) = XREAL(K)*Z
16 CONTINUE
17 return
END
! ----------------------------------------------------------
!-END of subroutine FASTG.FOR.
! ----------------------------------------------------------
!+ FILE: SCRAM.FOR
! ==========================================================
! Description: Subroutine for unscrambling FFT data:
! ==========================================================
SUBROUTINE SCRAM(XREAL,XIMAG,N,IPOW)
INTEGER L(19),II,J1,J2,J3,J4,J5,J6,J7,J8,J9,J10,J11,J12
INTEGER J13,J14,J15,J16,J17,J18,J19,J20,ITOP,I
REAL*8 XREAL(*),XIMAG(*),TEMPR
EQUIVALENCE (L1,L(1)),(L2,L(2)),(L3,L(3)),(L4,L(4))
EQUIVALENCE (L5,L(5)),(L6,L(6)),(L7,L(7)),(L8,L(8))
EQUIVALENCE (L9,L(9)),(L10,L(10)),(L11,L(11)),(L12,L(12))
EQUIVALENCE (L13,L(13)),(L14,L(14)),(L15,L(15)),(L16,L(16))
EQUIVALENCE (L17,L(17)),(L18,L(18)),(L19,L(19))
II = 1
ITOP = 2**(IPOW - 1)
I = 20 - IPOW
DO 5, K = 1,I
L(K) = II
5 CONTINUE
L0 = II
I = I + 1
DO 6, K = I,19
II = II*2
L(K) = II
6 CONTINUE
II = 0
DO 9, J1 = 1,L1,L0
DO 9, J2 = J1,L2,L1
DO 9, J3 = J2,L3,L2
DO 9, J4 = J3,L4,L3
DO 9, J5 = J4,L5,L4
DO 9, J6 = J5,L6,L5
DO 9, J7 = J6,L7,L6
DO 9, J8 = J7,L8,L7
DO 9, J9 = J8,L9,L8
DO 9, J10 = J9,L10,L9
DO 9, J11 = J10,L11,L10
DO 9, J12 = J11,L12,L11
DO 9, J13 = J12,L13,L12
DO 9, J14 = J13,L14,L13
DO 9, J15 = J14,L15,L14
DO 9, J16 = J15,L16,L15
DO 9, J17 = J16,L17,L16
DO 9, J18 = J17,L18,L17
DO 9, J19 = J18,L19,L18
J20 = J19
DO 9, I = 1,2
II = II +1
IF(II.GE.J20) THEN
GOTO 8
ENDIF
! J20 is the bit reverse of II!
! Pairwise exchange:
TEMPR = XREAL(II)
XREAL(II) = XREAL(J20)
XREAL(J20) = TEMPR
TEMPR = XIMAG(II)
XIMAG(II) = XIMAG(J20)
XIMAG(J20) = TEMPR
8 J20 = J20 + ITOP
9 CONTINUE
return
END
! -------------------------------------------------------------------
!-END:
! -------------------------------------------------------------------
Going through this and understanding it will take time! I wrote this using a CalTech paper I found years ago; I cannot recall the reference, I'm afraid. Good luck.
I hope this helps.
Your code works.
I would give more digits for PI (3.1415926535898).
Also, depending on the normalization convention you want, you have to divide the output of the DFT summation by S, the DFT size.
Since the input series in your test is constant, the DFT output should have only one non-zero coefficient. And indeed all the output coefficients are very small relative to the first one.
But for a large input length, this is not an efficient way of implementing the DFT. If timing is a concern, look into the Fast Fourier Transform for faster methods to calculate the DFT.
Your code looks right to me. I'm not sure what you were expecting for output but, given that your input is a constant value, the DFT of a constant is a DC term in bin 0 and zeroes in the remaining bins (or a close equivalent, which you have).
You might try testing your code with a longer sequence containing some type of waveform like a sine wave or a square wave. In general, however, you should consider using something like FFTW in production code. It's been wrung out and highly optimized by many people for a long time. FFTs are optimized DFTs for special cases (e.g., lengths that are powers of 2).
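For instance, a forward complex DFT of the same four samples with FFTW could look roughly like this (a sketch, assuming fftw3 is installed; link with -lfftw3):

#include <fftw3.h>
#include <iostream>

int main()
{
    const int N = 4;
    fftw_complex* in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex* out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    for (int i = 0; i < N; ++i) { in[i][0] = 102023.0; in[i][1] = 0.0; }

    fftw_plan plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan); // out[0] should be (408092, 0), the rest ~0

    for (int i = 0; i < N; ++i)
        std::cout << "(" << out[i][0] << ", " << out[i][1] << ")\n";

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
}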
Your code looks okay. out[0] should represent the "DC" component of your input waveform. In your case, it is 4 times bigger than the input waveform, because your normalization coefficient is 1.
The other coefficients represent the amplitude and phase of your input waveform. For real input the coefficients are mirrored, i.e., out[i] == conj(out[N-i]). You can test this with the following code:
double frequency = 1; /* use other values like 2, 3, 4 etc. */
for (int i = 0; i < 16; i++)
    numbers.push_back(sin((double)i / 16 * frequency * 2 * PI));
For frequency = 1, this gives:
(6.53592e-07,0)
(6.53592e-07,-8)
(6.53592e-07,1.75661e-07)
(6.53591e-07,2.70728e-07)
(6.5359e-07,3.75466e-07)
(6.5359e-07,4.95006e-07)
(6.53588e-07,6.36767e-07)
(6.53587e-07,8.12183e-07)
(6.53584e-07,1.04006e-06)
(6.53581e-07,1.35364e-06)
(6.53576e-07,1.81691e-06)
(6.53568e-07,2.56792e-06)
(6.53553e-07,3.95615e-06)
(6.53519e-07,7.1238e-06)
(6.53402e-07,1.82855e-05)
(-8.30058e-05,7.99999)
which seems correct to me: negligible DC, amplitude 8 (= N/2 for a unit-amplitude sine with N = 16) for the 1st harmonic, negligible amplitudes for the other harmonics.
MoonKnight has already provided a radix-4 Decimation In Frequency Cooley-Tukey scheme in Fortran. Below I am providing a radix-2 Decimation In Frequency Cooley-Tukey scheme in Matlab.
The code is iterative and follows the standard radix-2 decimation-in-frequency butterfly diagram (the illustrating figure from the original post is omitted here).
A recursive approach is also possible.
As you will see, the implementation calculates also the number of performed multiplications and additions and compares it with the theoretical calculations reported in How many FLOPS for FFT?.
The code is obviously much slower than the highly optimized FFTW exploited by Matlab.
Note also that the twiddle factors omegaa^((2^(p - 1) * n)) can be calculated off-line and then restored from a lookup table, but this point is skipped in the code below.
For a Matlab implementation of an iterative radix-2 Decimation In Time Cooley-Tukey scheme, please see Implementing a Fast Fourier Transform for Option Pricing.
% --- Radix-2 Decimation In Frequency - Iterative approach
clear all
close all
clc

N = 32;
x = randn(1, N);
xoriginal = x;
xhat = zeros(1, N);
numStages = log2(N);
omegaa = exp(-1i * 2 * pi / N);
mulCount = 0;
sumCount = 0;

tic
M = N / 2;
for p = 1 : numStages;
    for index = 0 : (N / (2^(p - 1))) : (N - 1);
        for n = 0 : M - 1;
            a = x(n + index + 1) + x(n + index + M + 1);
            b = (x(n + index + 1) - x(n + index + M + 1)) .* omegaa^((2^(p - 1) * n));
            x(n + 1 + index) = a;
            x(n + M + 1 + index) = b;
            mulCount = mulCount + 4;
            sumCount = sumCount + 6;
        end;
    end;
    M = M / 2;
end
xhat = bitrevorder(x);
timeCooleyTukey = toc;

tic
xhatcheck = fft(xoriginal);
timeFFTW = toc;

rms = 100 * sqrt(sum(sum(abs(xhat - xhatcheck).^2)) / sum(sum(abs(xhat).^2)));

fprintf('Time Cooley-Tukey = %f; \t Time FFTW = %f\n\n', timeCooleyTukey, timeFFTW);
fprintf('Theoretical multiplications count \t = %i; \t Actual multiplications count \t = %i\n', ...
    2 * N * log2(N), mulCount);
fprintf('Theoretical additions count \t\t = %i; \t Actual additions count \t\t = %i\n\n', ...
    3 * N * log2(N), sumCount);
fprintf('Root mean square with FFTW implementation = %.10e\n', rms);
Your code is correct for obtaining the DFT.
The function you are testing is sin((double)i / points * frequency * 2 * PI), which corresponds to a sinusoid of amplitude 1, frequency 1 and sampling frequency Fs = number of points taken.
Operating on the obtained data (the coefficient table from the original answer is omitted here), you can see that the DFT coefficients are symmetric with respect to the coefficient at position N/2, so only the first N/2 provide information. The amplitude, obtained from the modulus of the real and imaginary parts, must be divided by N and multiplied by 2 to reconstruct the original one. The frequency of a coefficient is its index times Fs/N.
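In code, the reconstruction just described could look like this (a sketch; printSpectrum is an illustrative name, and 'out' is the vector returned by the DFT function from the question):

#include <vector>
#include <complex>
#include <iostream>

// Print frequency and reconstructed amplitude for each useful bin.
// For a real input of length N sampled at Fs, bin k (0 < k < N/2)
// represents frequency k*Fs/N with amplitude 2*|out[k]|/N.
void printSpectrum(const std::vector< std::complex<double> >& out, double Fs)
{
    const int N = (int)out.size();
    for (int k = 1; k < N / 2; ++k)
    {
        double amplitude = 2.0 * std::abs(out[k]) / N;
        double freq = k * Fs / N;
        std::cout << "bin " << k << ": f = " << freq
                  << ", amplitude = " << amplitude << "\n";
    }
}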
If we introduce two sinusoids, one of frequency 2 and amplitude 1.3, and another of frequency 3 and amplitude 1.7:
for (int i = 0; i < 16; i++)
{
    numbers.push_back(1.3 * sin((double)i / 16 * frequency1 * 2 * PI)
                    + 1.7 * sin((double)i / 16 * frequency2 * 2 * PI));
}
(The resulting coefficient table from the original answer is omitted here; the peaks appear at coefficients 2 and 3, mirrored at 13 and 14.)
Good luck.

Fast Method for computing 3x3 symmetric matrix spectral decomposition

I am working on a project where I'm basically performing PCA millions of times on sets of 20-100 points. Currently, we are using some legacy code that uses GNU GSL's linear algebra pack to do SVD on the covariance matrix. This works, but is very slow.
I was wondering if there are any simple methods to do eigen decompositions on a 3x3 symmetric matrix, so that I can just put it on the GPU and let it run in parallel.
Since the matrices themselves are so small, I wasn't sure what kind of algorithm to use, because it seems like most algorithms were designed for large matrices or data sets. There's also the choice of doing a straight SVD on the data set, but I'm not sure what would be the best option.
I have to admit, I'm not stellar at Linear Algebra, especially when considering algorithm advantages. Any help would be greatly appreciated.
(I'm working in C++ right now)
Using the characteristic polynomial works, but it tends to be somewhat numerically unstable (or at the very least inaccurate).
A standard algorithm to compute eigensystems for symmetric matrices is the QR method. For 3x3 matrices, a very slick implementation is possible by building the orthogonal transform out of rotations and representing them as a Quaternion. A (quite short!) implementation of this idea in C++, assuming you have a 3x3 matrix and a Quaternion class, can be found here. The algorithm should be fairly suitable for GPU implementation because it's iterative (and thus self-correcting), can make reasonably good use of fast low-dimensional vector math primitives when they're available and uses a fairly small number of vector registers (so it allows for more active threads).
Most methods are efficient for bigger matrices. For small ones the analytical method is the quickest and simplest, but in some cases it is inaccurate.
Joachim Kopp developed an optimized "hybrid" method for a 3x3 symmetric matrix, which relies on the analytical method but falls back to the QL algorithm.
Another solution for 3x3 symmetric matrices can be found here (symmetric tridiagonal QL algorithm).
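For reference, the analytical route mentioned above can be sketched as follows (eigenvalues only, via the standard trigonometric solution of the characteristic polynomial; the numerically delicate cases are exactly what Kopp's hybrid method guards against). The quaternion-based code below takes the iterative route instead:

#include <cmath>
#include <algorithm>

// Analytical eigenvalues of a symmetric 3x3 matrix A (row-major).
// On return, eig[0] >= eig[1] >= eig[2].
void eigenvalues3x3(const double A[3][3], double eig[3])
{
    const double PI = 3.14159265358979323846;
    double p1 = A[0][1]*A[0][1] + A[0][2]*A[0][2] + A[1][2]*A[1][2];
    if (p1 == 0.0) { // A is already diagonal
        eig[0] = A[0][0]; eig[1] = A[1][1]; eig[2] = A[2][2];
        if (eig[0] < eig[1]) std::swap(eig[0], eig[1]);
        if (eig[1] < eig[2]) std::swap(eig[1], eig[2]);
        if (eig[0] < eig[1]) std::swap(eig[0], eig[1]);
        return;
    }
    double q  = (A[0][0] + A[1][1] + A[2][2]) / 3.0; // trace/3
    double p2 = (A[0][0]-q)*(A[0][0]-q) + (A[1][1]-q)*(A[1][1]-q)
              + (A[2][2]-q)*(A[2][2]-q) + 2.0*p1;
    double p  = std::sqrt(p2 / 6.0);
    // B = (A - q*I)/p; r = det(B)/2, clamped against round-off.
    double b[3][3];
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            b[i][j] = (A[i][j] - (i == j ? q : 0.0)) / p;
    double r = (b[0][0]*(b[1][1]*b[2][2] - b[1][2]*b[2][1])
              - b[0][1]*(b[1][0]*b[2][2] - b[1][2]*b[2][0])
              + b[0][2]*(b[1][0]*b[2][1] - b[1][1]*b[2][0])) / 2.0;
    r = std::max(-1.0, std::min(1.0, r));
    double phi = std::acos(r) / 3.0;
    eig[0] = q + 2.0*p*std::cos(phi);
    eig[2] = q + 2.0*p*std::cos(phi + 2.0*PI/3.0);
    eig[1] = 3.0*q - eig[0] - eig[2]; // the trace is preserved
}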
// Slightly modified version of Stan Melax's code for 3x3 matrix diagonalization (Thanks Stan!)
// source: http://www.melax.com/diag.html?attredirects=0
void Diagonalize(const Real (&A)[3][3], Real (&Q)[3][3], Real (&D)[3][3])
{
// A must be a symmetric matrix.
// returns Q and D such that
// Diagonal matrix D = QT * A * Q; and A = Q*D*QT
const int maxsteps=24; // certainly won't need that many.
int k0, k1, k2;
Real o[3], m[3];
Real q [4] = {0.0,0.0,0.0,1.0};
Real jr[4];
Real sqw, sqx, sqy, sqz;
Real tmp1, tmp2, mq;
Real AQ[3][3];
Real thet, sgn, t, c;
for(int i=0;i < maxsteps;++i)
{
// quat to matrix
sqx = q[0]*q[0];
sqy = q[1]*q[1];
sqz = q[2]*q[2];
sqw = q[3]*q[3];
Q[0][0] = ( sqx - sqy - sqz + sqw);
Q[1][1] = (-sqx + sqy - sqz + sqw);
Q[2][2] = (-sqx - sqy + sqz + sqw);
tmp1 = q[0]*q[1];
tmp2 = q[2]*q[3];
Q[1][0] = 2.0 * (tmp1 + tmp2);
Q[0][1] = 2.0 * (tmp1 - tmp2);
tmp1 = q[0]*q[2];
tmp2 = q[1]*q[3];
Q[2][0] = 2.0 * (tmp1 - tmp2);
Q[0][2] = 2.0 * (tmp1 + tmp2);
tmp1 = q[1]*q[2];
tmp2 = q[0]*q[3];
Q[2][1] = 2.0 * (tmp1 + tmp2);
Q[1][2] = 2.0 * (tmp1 - tmp2);
// AQ = A * Q
AQ[0][0] = Q[0][0]*A[0][0]+Q[1][0]*A[0][1]+Q[2][0]*A[0][2];
AQ[0][1] = Q[0][1]*A[0][0]+Q[1][1]*A[0][1]+Q[2][1]*A[0][2];
AQ[0][2] = Q[0][2]*A[0][0]+Q[1][2]*A[0][1]+Q[2][2]*A[0][2];
AQ[1][0] = Q[0][0]*A[0][1]+Q[1][0]*A[1][1]+Q[2][0]*A[1][2];
AQ[1][1] = Q[0][1]*A[0][1]+Q[1][1]*A[1][1]+Q[2][1]*A[1][2];
AQ[1][2] = Q[0][2]*A[0][1]+Q[1][2]*A[1][1]+Q[2][2]*A[1][2];
AQ[2][0] = Q[0][0]*A[0][2]+Q[1][0]*A[1][2]+Q[2][0]*A[2][2];
AQ[2][1] = Q[0][1]*A[0][2]+Q[1][1]*A[1][2]+Q[2][1]*A[2][2];
AQ[2][2] = Q[0][2]*A[0][2]+Q[1][2]*A[1][2]+Q[2][2]*A[2][2];
// D = Qt * AQ
D[0][0] = AQ[0][0]*Q[0][0]+AQ[1][0]*Q[1][0]+AQ[2][0]*Q[2][0];
D[0][1] = AQ[0][0]*Q[0][1]+AQ[1][0]*Q[1][1]+AQ[2][0]*Q[2][1];
D[0][2] = AQ[0][0]*Q[0][2]+AQ[1][0]*Q[1][2]+AQ[2][0]*Q[2][2];
D[1][0] = AQ[0][1]*Q[0][0]+AQ[1][1]*Q[1][0]+AQ[2][1]*Q[2][0];
D[1][1] = AQ[0][1]*Q[0][1]+AQ[1][1]*Q[1][1]+AQ[2][1]*Q[2][1];
D[1][2] = AQ[0][1]*Q[0][2]+AQ[1][1]*Q[1][2]+AQ[2][1]*Q[2][2];
D[2][0] = AQ[0][2]*Q[0][0]+AQ[1][2]*Q[1][0]+AQ[2][2]*Q[2][0];
D[2][1] = AQ[0][2]*Q[0][1]+AQ[1][2]*Q[1][1]+AQ[2][2]*Q[2][1];
D[2][2] = AQ[0][2]*Q[0][2]+AQ[1][2]*Q[1][2]+AQ[2][2]*Q[2][2];
o[0] = D[1][2];
o[1] = D[0][2];
o[2] = D[0][1];
m[0] = fabs(o[0]);
m[1] = fabs(o[1]);
m[2] = fabs(o[2]);
k0 = (m[0] > m[1] && m[0] > m[2])?0: (m[1] > m[2])? 1 : 2; // index of largest element of offdiag
k1 = (k0+1)%3;
k2 = (k0+2)%3;
if (o[k0]==0.0)
{
break; // diagonal already
}
thet = (D[k2][k2]-D[k1][k1])/(2.0*o[k0]);
sgn = (thet > 0.0)?1.0:-1.0;
thet *= sgn; // make it positive
t = sgn /(thet +((thet < 1.E6)?sqrt(thet*thet+1.0):thet)) ; // sign(T)/(|T|+sqrt(T^2+1))
c = 1.0/sqrt(t*t+1.0); // c = 1/sqrt(t^2+1), t = s/c
if(c==1.0)
{
break; // no room for improvement - reached machine precision.
}
jr[0 ] = jr[1] = jr[2] = jr[3] = 0.0;
jr[k0] = sgn*sqrt((1.0-c)/2.0); // using 1/2 angle identity sin(a/2) = sqrt((1-cos(a))/2)
jr[k0] *= -1.0; // since our quat-to-matrix convention was for v*M instead of M*v
jr[3 ] = sqrt(1.0f - jr[k0] * jr[k0]);
if(jr[3]==1.0)
{
break; // reached limits of floating point precision
}
q[0] = (q[3]*jr[0] + q[0]*jr[3] + q[1]*jr[2] - q[2]*jr[1]);
q[1] = (q[3]*jr[1] - q[0]*jr[2] + q[1]*jr[3] + q[2]*jr[0]);
q[2] = (q[3]*jr[2] + q[0]*jr[1] - q[1]*jr[0] + q[2]*jr[3]);
q[3] = (q[3]*jr[3] - q[0]*jr[0] - q[1]*jr[1] - q[2]*jr[2]);
mq = sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
q[0] /= mq;
q[1] /= mq;
q[2] /= mq;
q[3] /= mq;
}
}
I am not stellar at linear algebra either, but since Murphy stated that "When you don't know what you're talking about, everything is possible", it is possible that the CULA pack might be relevant to your needs. They do SVD and eigenvalue decomposition.