I am developing a program that makes heavy use of the Armadillo library. I have version 10.8.2, linked against Intel oneAPI MKL 2022.0.2. At some point, I need to perform many sparse matrix times dense vector multiplications, with both operands defined as Armadillo structures. I have found this to be a probable bottleneck, and was curious whether replacing the Armadillo multiplication with the "bare bones" Sparse BLAS routines from MKL (mkl_sparse_d_mv) would speed things up. To do so, I need to convert Armadillo's SpMat into something that MKL understands. According to the Armadillo docs, sparse matrices are stored in CSC format, so I have tried mkl_sparse_d_create_csc. My attempt is below:
#include <iostream>
#include <armadillo>
#include "mkl.h"
int main()
{
arma::umat locations = {{0, 0, 1, 3, 2},{0, 1, 0, 2, 3}};
// arma::vec vals = {0.5, 2.5, 2.5, 4.5, 4.5};
arma::vec vals = {0.5, 2.5, 3.5, 4.5, 5.5};
arma::sp_mat X(locations, vals);
std::cout << "X = \n" << arma::mat(X) << std::endl;
arma::vec v = {1,1,1,1};
arma::vec v2;
v2.resize(4);
std::cout << "v = \n" << v << std::endl;
std::cout << "X * v = \n" << X * v << std::endl;
MKL_INT *cols_beg = static_cast<MKL_INT *>(mkl_malloc(X.n_cols * sizeof(MKL_INT), 64));
MKL_INT *cols_end = static_cast<MKL_INT *>(mkl_malloc(X.n_cols * sizeof(MKL_INT), 64));
MKL_INT *row_idx = static_cast<MKL_INT *>(mkl_malloc(X.n_nonzero * sizeof(MKL_INT), 64));
double *values = static_cast<double *>(mkl_malloc(X.n_nonzero * sizeof(double), 64));
for (MKL_INT i = 0; i < X.n_cols; i++)
{
cols_beg[i] = static_cast<MKL_INT>(X.col_ptrs[i]);
cols_end[i] = static_cast<MKL_INT>((--X.end_col(i)).pos());
std::cout << cols_beg[i] << " --- " << cols_end[i] << std::endl;
}
std::cout << std::endl;
for (MKL_INT i = 0; i < X.n_nonzero; i++)
{
row_idx[i] = static_cast<MKL_INT>(X.row_indices[i]);
values[i] = X.values[i];
std::cout << row_idx[i] << " --- " << values[i] << std::endl;
}
std::cout << std::endl;
sparse_matrix_t X_mkl = NULL;
sparse_status_t res = mkl_sparse_d_create_csc(&X_mkl, SPARSE_INDEX_BASE_ZERO,
X.n_rows, X.n_cols, cols_beg, cols_end, row_idx, values);
if(res == SPARSE_STATUS_SUCCESS) std::cout << "Constructed mkl representation of X" << std::endl;
matrix_descr dsc;
dsc.type = SPARSE_MATRIX_TYPE_GENERAL;
sparse_status_t stat = mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, X_mkl, dsc, v.memptr(), 0.0, v2.memptr());
std::cout << "Multiplication status = " << stat << std::endl;
if(stat == SPARSE_STATUS_SUCCESS)
{
std::cout << "Calculated X*v via mkl" << std::endl;
std::cout << v2;
}
mkl_free(cols_beg);
mkl_free(cols_end);
mkl_free(row_idx);
mkl_free(values);
mkl_sparse_destroy(X_mkl);
return 0;
}
I am compiling this code (with the help of the Link Line Advisor) using
icpc -g testing.cpp -o intel_testing.out -DARMA_ALLOW_FAKE_GCC -O3 -xhost -Wall -Wextra -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -DMKL_ILP64 -qmkl=parallel -larmadillo
on Pop!_OS 21.10.
It compiles and runs without any problems. The output is as follows:
X =
0.5000 2.5000 0 0
3.5000 0 0 0
0 0 0 5.5000
0 0 4.5000 0
v =
1.0000
1.0000
1.0000
1.0000
X * v =
3.0000
3.5000
5.5000
4.5000
0 --- 1
2 --- 2
3 --- 3
4 --- 4
0 --- 0.5
1 --- 3.5
0 --- 2.5
3 --- 4.5
2 --- 5.5
Constructed mkl representation of X
Multiplication status = 0
Calculated X*v via mkl
0.5000
0
0
0
As we can see, the result of Armadillo's multiplication is correct, whereas the one from MKL is wrong. My question is this: am I making a mistake somewhere, or is there something wrong with MKL? I suspect the former, of course, but after spending a considerable amount of time on it I cannot find anything. Any help would be much appreciated!
EDIT
As CJR and Vidyalatha_Intel suggested, I have changed the cols_end assignment to
cols_end[i] = static_cast<MKL_INT>((X.end_col(i)).pos());
The result is now
X =
0.5000 2.5000 0 0
3.5000 0 0 0
0 0 0 5.5000
0 0 4.5000 0
v =
1.0000
1.0000
1.0000
1.0000
X * v =
3.0000
3.5000
5.5000
4.5000
0 --- 2
2 --- 3
3 --- 4
4 --- 5
0 --- 0.5
1 --- 3.5
0 --- 2.5
3 --- 4.5
2 --- 5.5
Constructed mkl representation of X
Multiplication status = 0
Calculated X*v via mkl
4.0000
2.5000
0
0
cols_end is now indeed 2, 3, 4, 5 as suggested, but the result is still wrong.
Yes, the cols_end array is incorrect, as pointed out by CJR. Its entries should be 2, 3, 4, 5. Please see the documentation for this parameter of the function mkl_sparse_d_create_csc:
cols_end:
This array contains col indices, such that cols_end[i] - ind - 1 is the last index of col i in the arrays values and row_indx. ind takes 0 for zero-based indexing and 1 for one-based indexing.
https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/inspector-executor-sparse-blas-routines/matrix-manipulation-routines/mkl-sparse-create-csc.html
Change this line
cols_end[i] = static_cast<MKL_INT>((--X.end_col(i)).pos());
to
cols_end[i] = static_cast<MKL_INT>((X.end_col(i)).pos());
Now recompile and run the code. I have tested it, and it shows the correct results. [Image: results and compilation command]
In the first line, the number 5043 is the iteration I am currently at, right? And at the end of the same line is the time remaining to complete the training (~47 hours).
I would like to know what the other lines mean, especially the last one.
5043: 0.668292, 1.235671 avg loss, 0.001000 rate, 20.289504 seconds, 322752 images, 47.346673 hours left
Loaded: 14.566142 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000004, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000049, iou_loss = 0.000000, total_loss = 0.000049
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.812168, GIOU: 0.808716), Class: 0.938050, Obj: 0.072795, No Obj: 0.000732, .5R: 1.000000, .75R: 0.800000, count: 10, class_loss = 2.413296, iou_loss = 3.502501, total_loss = 5.915797
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.781551, GIOU: 0.775137), Class: 0.985000, Obj: 0.347750, No Obj: 0.016977, .5R: 1.000000, .75R: 0.694444, count: 36, class_loss = 6.508103, iou_loss = 1.942730, total_loss = 8.450833
total_bbox = 17899, rewritten_bbox = 0.050282 %
I have a problem with my ray tracer program. The image looks wrong. Here is the output image:
The barycentric-coordinate and intersection code is as follows:
bool CTriangle::Intersect(Calculus::CRay& ray, CIntersection* isect) const {
    // Möller–Trumbore intersection algorithm
    const Calculus::CPoint3<float>& p1 = v_points[0];
    const Calculus::CPoint3<float>& p2 = v_points[1];
    const Calculus::CPoint3<float>& p3 = v_points[2];
    Calculus::CVector3<float> e1 = p2 - p1;
    Calculus::CVector3<float> e2 = p3 - p1;
    Calculus::CVector3<float> s1 = Calculus::Math::Cross(ray.direction, e2);
    float determinant = Calculus::Math::Dot(s1, e1);
    if (determinant == 0.0f)
        return false;
    float inv_determinant = 1.0f / determinant;
    Calculus::CVector3<float> s = ray.origin - p1;
    float b1 = Calculus::Math::Dot(s, s1) * inv_determinant;
    if (b1 < 0.0f || b1 > 1.0f)
        return false;
    Calculus::CVector3<float> s2 = Calculus::Math::Cross(s, e1);
    float b2 = Calculus::Math::Dot(ray.direction, s2) * inv_determinant;
    if (b2 < 0.0f || b1 + b2 > 1.0f)
        return false;
    float b0 = 1 - b1 - b2;
    float thit = Calculus::Math::Dot(e2, s2) * inv_determinant;
    if (thit < ray.mint || thit > ray.maxt)
        return false;
    isect->p = ray(thit);
    isect->n = Calculus::Math::Normalize(
        Calculus::CVector3<float>(v_normals[0].x, v_normals[0].y, v_normals[0].z) * b0 +
        Calculus::CVector3<float>(v_normals[1].x, v_normals[1].y, v_normals[1].z) * b1 +
        Calculus::CVector3<float>(v_normals[2].x, v_normals[2].y, v_normals[2].z) * b2);
    isect->uv = v_uvs[0] * b0 + v_uvs[1] * b1 + v_uvs[2] * b2;
    isect->tHit = thit;
    isect->ray_epsilon = 1e-5f * thit;
    return true;
}
The texture I used in the ray tracer program (file type: bmp):
My obj file is as follows. The background shape consists of two triangles. Texture projection is applied only to the background shape:
v -24.1456 -11.1684 -26.2413
v 24.1455 -11.1684 -26.2413
v -24.1456 37.1227 -26.2413
v 24.1455 37.1227 -26.2413
# 4 vertices
vn 0.0000 0.0000 1.0000
vn 0.0000 0.0000 1.0000
vn 0.0000 0.0000 1.0000
vn 0.0000 0.0000 1.0000
vn 0.0000 0.0000 1.0000
vn 0.0000 0.0000 1.0000
# 6 vertex normals
vt 0.9995 0.0005 0.0000
vt 0.0005 0.0005 0.0000
vt 0.9995 0.9995 0.0000
vt 0.0005 0.9995 0.0000
# 4 texture coords
o back
g back
usemtl default
s 1
f 1/1/1 2/2/2 4/4/3
f 4/4/4 3/3/5 1/1/6
# 2 faces
Here is the interpolated uv draw call.
Here is the indexing code; I convert the OBJ indices so that they start from zero:
...
Calculus::CPoint3<unsigned short> p, t, n;
sscanf_s(token, "%hu/%hu/%hu %hu/%hu/%hu %hu/%hu/%hu",
&p.x, &t.x, &n.x, &p.y, &t.y, &n.y, &p.z, &t.z, &n.z);
pi.push_back(p);
ti.push_back(t);
ni.push_back(n);
…
index = ti[i].x - 1;
temp_t[0] = vt[index]; // first uv
index = ti[i].y - 1;
temp_t[1] = vt[index]; // second uv
index = ti[i].z - 1;
temp_t[2] = vt[index]; // third uv
I wonder where I'm making a mistake. Thank you.
isect->uv = v_uvs[0] * b1 + v_uvs[1] * b2;
This is not the correct parametric interpolation of vertex attributes:
The parameters b1, b2 are being applied to the wrong vertices
You are not taking the third vertex v_uvs[2] into account
Correct version:
isect->uv = v_uvs[0] * b0 + v_uvs[1] * b1 + v_uvs[2] * b2;
I am trying to diagonalize the matrix:
In my analysis, I set $\hbar = 1$. The code is:
MODULE FUNCTION_CONTAINER
IMPLICIT NONE
SAVE
INTEGER, PARAMETER :: DBL = SELECTED_REAL_KIND(P = 15,R = 200)
COMPLEX(KIND = DBL), PARAMETER :: IMU = (0.0D0, 1.0D0)
REAL(KIND = DBL), PARAMETER :: S = 1.0D0
INTEGER, PARAMETER :: TEMP1 = NINT((2.0D0 * S) + 1.0D0)
INTEGER, PARAMETER :: DIMJ = TEMP1
INTEGER, PARAMETER :: TEMP2 = TEMP1*TEMP1
INTEGER, PARAMETER :: DIMMAT = TEMP2
CONTAINS
INTEGER FUNCTION KRONDELTAR(K,L)
IMPLICIT NONE
REAL(KIND = DBL), INTENT(IN)::K,L
REAL(KIND = DBL) :: TEMP
TEMP = DABS(K - L)
IF (TEMP < 0.000001D0) THEN
KRONDELTAR = 1
ELSE
KRONDELTAR = 0
END IF
END FUNCTION KRONDELTAR
SUBROUTINE MATJplus(MATOUT)
IMPLICIT NONE
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ),INTENT(OUT)::MATOUT
INTEGER::K,L
REAL(KIND = DBL)::M,MP
DO K = 1,DIMJ
DO L = 1,DIMJ
MP = (S + 1.0D0) - L
M = (S + 1.0D0) - K
MATOUT(K,L) = DSQRT(S * (S + 1.0D0) - M * (M + 1.0D0)) * KRONDELTAR(MP,M + 1)
END DO
END DO
END SUBROUTINE MATJplus
SUBROUTINE MATJminus(MATOUT)
IMPLICIT NONE
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ),INTENT(OUT)::MATOUT
INTEGER::K,L
REAL(KIND = DBL)::MP,M
DO K = 1,DIMJ
DO L = 1,DIMJ
MP = (S + 1) - L
M = (S + 1) - K
MATOUT(K,L) = DSQRT(S* (S + 1.0D0) - M * (M - 1.0D0)) * KRONDELTAR(MP,M - 1)
END DO
END DO
END SUBROUTINE MATJminus
SUBROUTINE MATJy(MATOUT)
IMPLICIT NONE
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ),INTENT(OUT)::MATOUT
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ)::Jp,Jm
CALL MATJplus(Jp)
CALL MATJminus(Jm)
MATOUT = (Jp - Jm)/(2.0D0 * IMU)
END SUBROUTINE MATJy
SUBROUTINE DIAGONALIZEJy(EIGENSTATESJy,EIGENVALUESJY)
IMPLICIT NONE
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ),INTENT(OUT)::EIGENSTATESJy
REAL(KIND = DBL), DIMENSION(DIMJ),INTENT(OUT)::EIGENVALUESJY
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ)::JyTEMP,Jy
COMPLEX(KIND = DBL),DIMENSION(2*DIMJ)::D1
REAL(KIND = DBL),DIMENSION(3*DIMJ - 2)::D2
INTEGER::D3
CALL MATJy(Jy)
JyTEMP = Jy
CALL ZHEEV('V','U',DIMJ,JyTEMP,DIMJ,EIGENVALUESJy,D1,2*DIMJ,D2,D3)
EIGENSTATESJy = JyTEMP
END SUBROUTINE DIAGONALIZEJy
END MODULE FUNCTION_CONTAINER
PROGRAM TEST
USE FUNCTION_CONTAINER
IMPLICIT NONE
COMPLEX(KIND = DBL), DIMENSION(DIMJ,DIMJ) :: EIGENSTATESJy, MatrixJy
REAL(KIND = DBL), DIMENSION(DIMJ) :: EIGENVALUESJy
CALL DIAGONALIZEJy(EIGENSTATESJy,EIGENVALUESJY)
CALL MATJy(MatrixJy)
OPEN(1, FILE = 'EIGENVALUESJy.DAT')
OPEN(2, FILE = 'EIGENSTATESJyREAL.DAT')
OPEN(3,FILE = 'EIGENSTATESJyCOMPLEX.DAT')
WRITE (1,*) EIGENVALUESJy
WRITE (2,*) REAL(EIGENSTATESJy)
WRITE (3,*) AIMAG(EIGENSTATESJy)
CLOSE(1)
CLOSE(2)
CLOSE(3)
END PROGRAM TEST
Up to the subroutine DIAGONALIZEJy, I am simply constructing the matrix stated above. One can easily check that Fortran constructs it correctly by writing out the result of the subroutine MATJy. I transfer the data to Mathematica. The results are:
{{-1., -9.19403*10^-17, 1.}}
This is the list of eigenvalues. The list of eigenvectors is:
{{-0.5 + 0. I, 0. - 0.707107 I, 0.5 + 0. I}, {0.707107 + 0. I,
0. + 1.04083*10^-16 I, 0.707107 + 0. I}, {-0.5 + 0. I,
0. + 0.707107 I, 0.5 + 0. I}}
The first eigenvector corresponds to the first eigenvalue (at least that's what I get by printing the column vectors from EigenvectorsJy one by one).
Clearly, the result is wrong. See:
http://www.wolframalpha.com/widgets/view.jsp?id=9aa01caf50c9307e9dabe159c9068c41
I hope the link shows the results for the eigenvalue problem solved using a widget. The eigenvalues are correct, but all the eigenvectors are way off.
Also, when I run only the subroutine that diagonalizes the matrix inside my main program, which contains a whole host of other stuff, the results are:
{{0.885212, 0., -0.920222}}
and
{{0.0439691 + 0. I, -0.388918 + 0. I, 0.5 + 0. I}, {0.707107 + 0. I,
0. + 1.04083*10^-16 I, 0.707107 + 0. I}, {-0.5 + 0. I,
0. + 0.707107 I, 0.5 + 0. I}}
As you can see, the nonzero eigenvalues are a bit off, and so are the eigenvectors (which are still incorrect). Why is the main program giving a different result, perhaps exacerbating the error? And why am I getting wrong answers in the first place, in the minimal example above?
Edit: Apparently, the link doesn't show the results so here's a snippet:
In short, your Jy matrix in the code seems to be the complex conjugate of what is desired (i.e., the image posted in the Question), which results in the eigenvectors that are complex conjugate of the correct ones.
The above error seems to originate from the OP's assumption that list-directed output (as write(*,*) A) prints the matrix elements in the "row-major" order, while in fact they are printed in the "column-major" order (see the comments below). By noting this and correcting the program accordingly, I think the program will work as expected.
More specifically, adding the following utility routine to print a matrix
subroutine printmat( msg, mat )
implicit none
character(*), intent(in) :: msg
complex(DBL), intent(in) :: mat( dimJ, dimJ )
integer i1, i2
print *
print *, msg
do i1 = 1, dimJ
print "(3('(',f10.6,',',f10.6,' ) '))", ( mat( i1, i2 ), i2 = 1,dimJ )
enddo
end subroutine
and checking the value of Jp, Jm, Jy in the subroutine MATJy()
Jp:
( 0.000000, 0.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 0.000000 )
( 1.414214, 0.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 0.000000 )
( 0.000000, 0.000000 ) ( 1.414214, 0.000000 ) ( 0.000000, 0.000000 )
Jm:
( 0.000000, 0.000000 ) ( 1.414214, 0.000000 ) ( 0.000000, 0.000000 )
( 0.000000, 0.000000 ) ( 0.000000, 0.000000 ) ( 1.414214, 0.000000 )
( 0.000000, 0.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 0.000000 )
Jy * sqrt(2):
( 0.000000, 0.000000 ) ( 0.000000, 1.000000 ) ( 0.000000, 0.000000 )
( 0.000000, -1.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 1.000000 )
( 0.000000, 0.000000 ) ( 0.000000, -1.000000 ) ( 0.000000, 0.000000 )
eigenvaluesJy(1) = -1.000000
eigvec:
( -0.500000, 0.000000 )
( 0.000000, -0.707107 )
( 0.500000, 0.000000 )
eigenvaluesJy(2) = -0.000000
eigvec:
( 0.707107, 0.000000 )
( 0.000000, 0.000000 )
( 0.707107, 0.000000 )
eigenvaluesJy(3) = 1.000000
eigvec:
( -0.500000, 0.000000 )
( 0.000000, 0.707107 )
( 0.500000, 0.000000 )
we see that the above Jy matrix is the complex conjugate of the desired matrix (given as an image in the Question). The reason seems to be that the Jp and Jm matrices are given as the transpose of the correct ones (according to some pages like this and this). For example, if we change their index as
SUBROUTINE MATJplus(MATOUT)
IMPLICIT NONE
COMPLEX(KIND = DBL),DIMENSION(DIMJ,DIMJ),INTENT(OUT)::MATOUT
INTEGER::K,L
REAL(KIND = DBL)::M,MP
DO K = 1,DIMJ
DO L = 1,DIMJ
MP = (S + 1.0D0) - L !! 1, 0, -1 ("m_prime")
M = (S + 1.0D0) - K !! 1, 0, -1 ("m")
!>>> Here, we swap the indices K and L in the LHS
!! MATOUT(K,L) = DSQRT(S * (S + 1.0D0) - M * (M + 1.0D0)) * KRONDELTAR(MP, M + 1)
MATOUT(L,K) = DSQRT(S * (S + 1.0D0) - M * (M + 1.0D0)) * KRONDELTAR(MP, M + 1)
END DO
END DO
call printmat( "Jplus:", matout )
END SUBROUTINE
(and modifying MATJminus() similarly), we obtain the expected result:
Jp:
( 0.000000, 0.000000 ) ( 1.414214, 0.000000 ) ( 0.000000, 0.000000 )
( 0.000000, 0.000000 ) ( 0.000000, 0.000000 ) ( 1.414214, 0.000000 )
( 0.000000, 0.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 0.000000 )
Jm:
( 0.000000, 0.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 0.000000 )
( 1.414214, 0.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, 0.000000 )
( 0.000000, 0.000000 ) ( 1.414214, 0.000000 ) ( 0.000000, 0.000000 )
Jy * sqrt(2):
( 0.000000, 0.000000 ) ( 0.000000, -1.000000 ) ( 0.000000, 0.000000 )
( 0.000000, 1.000000 ) ( 0.000000, 0.000000 ) ( 0.000000, -1.000000 )
( 0.000000, 0.000000 ) ( 0.000000, 1.000000 ) ( 0.000000, 0.000000 )
eigenvaluesJy(1) = -1.000000
eigvec:
( -0.500000, 0.000000 )
( 0.000000, 0.707107 )
( 0.500000, 0.000000 )
eigenvaluesJy(2) = -0.000000
eigvec:
( 0.707107, 0.000000 )
( 0.000000, -0.000000 )
( 0.707107, 0.000000 )
eigenvaluesJy(3) = 1.000000
eigvec:
( -0.500000, 0.000000 )
( 0.000000, -0.707107 )
( 0.500000, 0.000000 )
For convenience, here are some matrices taken from the above pages (which can be compared directly with the above Jp, Jm, Jy):
I copied code for converting a 3D rotation matrix to a quaternion and back. The same code is used in jMonkey (I just rewrote it into my C++ class). However, it does not work properly (at least not as I would expect).
e.g. I made this test:
matrix (a,b,c):
a : 0.707107 0.000000 0.707107
b : 0.000000 -1.000000 0.000000
c : -0.707107 0.000000 0.707107
>>> ortonormality:
a.a b.b c.c 1.000000 1.000000 1.000000
a.b a.c b.c 0.000000 0.000000 0.000000
>>> matrix -> quat
quat: 0.000000 0.594604 0.000000 0.594604 norm(quat) 0.707107
>>> quat -> matrix
matrix (a,b,c):
a: 0.000000 0.000000 1.000000
b: 0.000000 1.000000 0.000000
c: -1.000000 0.000000 0.000000
I think the problem is in matrix -> quat, because I have used the quat -> matrix procedure before and it was working fine. It is also strange that a quaternion made from an orthonormal matrix does not have unit norm.
the matrix -> quat procedure
inline void fromMatrix( TYPE m00, TYPE m01, TYPE m02, TYPE m10, TYPE m11, TYPE m12, TYPE m20, TYPE m21, TYPE m22) {
// Use the Graphics Gems code, from
// ftp://ftp.cis.upenn.edu/pub/graphics/shoemake/quatut.ps.Z
TYPE t = m00 + m11 + m22;
// we protect the division by s by ensuring that s>=1
if (t >= 0) { // by w
TYPE s = sqrt(t + 1);
w = 0.5 * s;
s = 0.5 / s;
x = (m21 - m12) * s;
y = (m02 - m20) * s;
z = (m10 - m01) * s;
} else if ((m00 > m11) && (m00 > m22)) { // by x
TYPE s = sqrt(1 + m00 - m11 - m22);
x = s * 0.5;
s = 0.5 / s;
y = (m10 + m01) * s;
z = (m02 + m20) * s;
w = (m21 - m12) * s;
} else if (m11 > m22) { // by y
TYPE s = sqrt(1 + m11 - m00 - m22);
y = s * 0.5;
s = 0.5 / s;
x = (m10 + m01) * s;
z = (m21 + m12) * s;
w = (m02 - m20) * s;
} else { // by z
TYPE s = sqrt(1 + m22 - m00 - m11);
z = s * 0.5;
s = 0.5 / s;
x = (m02 + m20) * s;
y = (m21 + m12) * s;
w = (m10 - m01) * s;
}
}
the quat -> matrix procedure
inline void toMatrix( MAT& result) const {
TYPE r2 = w*w + x*x + y*y + z*z;
//TYPE s = (r2 > 0) ? 2d / r2 : 0;
TYPE s = 2 / r2;
// compute xs/ys/zs first to save 6 multiplications, since xs/ys/zs
// will be used 2-4 times each.
TYPE xs = x * s; TYPE ys = y * s; TYPE zs = z * s;
TYPE xx = x * xs; TYPE xy = x * ys; TYPE xz = x * zs;
TYPE xw = w * xs; TYPE yy = y * ys; TYPE yz = y * zs;
TYPE yw = w * ys; TYPE zz = z * zs; TYPE zw = w * zs;
// using s=2/norm (instead of 1/norm) saves 9 multiplications by 2 here
result.xx = 1 - (yy + zz);
result.xy = (xy - zw);
result.xz = (xz + yw);
result.yx = (xy + zw);
result.yy = 1 - (xx + zz);
result.yz = (yz - xw);
result.zx = (xz - yw);
result.zy = (yz + xw);
result.zz = 1 - (xx + yy);
};
Sorry for TYPE, VEC, MAT and QUAT; they are template parameters of the class and should be read as double, Vec3d, Mat3d, Quat3d or float, Vec3f, Mat3f, Quat3f.
EDIT:
I also checked whether I get the same behaviour with jMonkey directly (in case I introduced a bug in the Java to C++ conversion). And I do, using this code:
Matrix3f Min = new Matrix3f( 0.707107f, 0.000000f, 0.707107f, 0.000000f, -1.000000f, 0.000000f, -0.707107f, 0.000000f, 0.707107f );
Matrix3f Mout = new Matrix3f( );
Quaternion q = new Quaternion();
q.fromRotationMatrix(Min);
System.out.println( q.getX()+" "+q.getY()+" "+q.getZ()+" "+q.getW() );
q.toRotationMatrix(Mout);
System.out.println( Mout.get(0,0) +" "+Mout.get(0,1)+" "+Mout.get(0,2) );
System.out.println( Mout.get(1,0) +" "+Mout.get(1,1)+" "+Mout.get(1,2) );
System.out.println( Mout.get(2,0) +" "+Mout.get(2,1)+" "+Mout.get(2,2) );
Your matrix:
matrix (a,b,c):
a : 0.707107 0.000000 0.707107
b : 0.000000 -1.000000 0.000000
c : -0.707107 0.000000 0.707107
is orthogonal but it is not a rotation matrix. A rotation matrix has determinant 1; your matrix has determinant -1 and is thus an improper rotation.
I think your code is likely correct and the issue is in your data. Try it with a real rotation matrix.