Eigen use of diagonal matrix - c++

Using Eigen, I have a Matrix3Xd (3 rows, n columns). I would like to get the squared norm of all columns.
To be clearer, let's say I have
Matrix3Xd a =
1 3 2 1
2 1 1 4
I would like to get the squared norm of each column
squaredNorms =
5 10 5 17
I wanted to take advantage of matrix computation instead of going through a for loop doing the computation myself.
What I thought of was
squaredNorms = (A.transpose() * A).diagonal()
This works, but I am afraid of performance issues: A.transpose() * A will be an n x n matrix (potentially millions of elements), when I only need the diagonal.
Is Eigen clever enough to compute only the coefficients I need?
What would be the most efficient way to achieve the squaredNorm computation on each column?

The case of (A.transpose() * A).diagonal() is explicitly handled by Eigen to enforce lazy evaluation of the product expression nested in a diagonal-view. Therefore, only the n required diagonal coefficients will be computed.
That said, it's simpler to call A.colwise().squaredNorm(), as Eric noted.

This will do what you want.
squaredNorms = A.colwise().squaredNorm();
https://eigen.tuxfamily.org/dox/group__QuickRefPage.html
Eigen provides several reduction methods such as: minCoeff(), maxCoeff(), sum(), prod(), trace()*, norm()*, squaredNorm()*, all(), and any(). All reduction operations can be done matrix-wise, column-wise or row-wise.
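For completeness, here is a minimal, self-contained sketch using the matrix values from the question (the surrounding main() scaffolding is mine, not from the original post):

#include <Eigen/Dense>
#include <iostream>

int main()
{
    // 2x4 example matrix from the question; for a Matrix3Xd the call is identical.
    Eigen::MatrixXd A(2, 4);
    A << 1, 3, 2, 1,
         2, 1, 1, 4;

    // Row vector holding the squared norm of each column: 5 10 5 17
    Eigen::RowVectorXd squaredNorms = A.colwise().squaredNorm();

    // (A.transpose() * A).diagonal() yields the same values; as noted above,
    // Eigen only evaluates the diagonal coefficients of that product expression.
    std::cout << squaredNorms << std::endl;
    return 0;
}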

Related

How to remove the multiple of a function in Sympy

I am now making an ODE solver using SymPy. It can solve separable, linear, Bernoulli, second-order homogeneous, exact and non-exact equations, nth-order homogeneous DEs, Lagrange, Clairaut, and 2nd- and 3rd-order nonhomogeneous differential equations, and it can use undetermined coefficients in the form of exp(x), cos(x), sin(x), x**n, exp(x)cos(x).
But the code does not understand inputs like 2 * cos(x) or 3 * exp(x), i.e. when the function is multiplied by a number. Here is my question: how can I remove the coefficients? Is there a function or method for this? Thanks in advance.
There are lots of ways depending on exactly what you are doing, but e.g.:
In [124]: f = 2 * cos(x)
In [125]: f
Out[125]: 2⋅cos(x)
In [126]: c, m = f.as_coeff_Mul()
In [127]: c
Out[127]: 2
In [128]: m
Out[128]: cos(x)

cpp calculate 6x6 Covariance Matrix from two 1x3 arrays

Like the title says, I am attempting to calculate the covariance matrix for two 1x3 arrays and get one 6x6 std::array in C++. I need some guidance with my understanding; I have looked around and have not been able to find anything that answers my question clearly.
I have two arrays each with 3 elements.
Array1 holds location data (x, y, z) and Array2 holds velocity data, which we will call (A, B, C):
Array1 = {x,y,z}
Array2 = {A,B,C}
and I need to compute a covariance matrix from this into a 2D array[6][6].
I don't understand how I would get this.
I think my covariance formula is correct, but it would still give me just an array[3][3]:
cov = ( (Array1[n] - mean(Array1)) * (Array2[n] - mean(Array2)) ) / 3
(3 because it's the number of values in each array.)
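One common reading of this setup, sketched under assumptions: stack the two 3-element arrays into a single 6-element state (x, y, z, A, B, C) and average the outer product of its deviations over N samples. The function name covariance6x6, the use of N samples, and the population normalization (dividing by n) are assumptions on my part, not from the original post.

#include <array>
#include <cstddef>
#include <vector>

// Sketch: 6x6 covariance of the combined state s = (x, y, z, A, B, C),
// assuming N matching samples of position and velocity are available.
std::array<std::array<double, 6>, 6>
covariance6x6(const std::vector<std::array<double, 3>>& pos,
              const std::vector<std::array<double, 3>>& vel)
{
    const std::size_t n = pos.size();   // assumes pos.size() == vel.size() and n > 0

    std::array<double, 6> mean{};       // mean of the combined 6-element state
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < 3; ++i) {
            mean[i]     += pos[k][i] / n;
            mean[i + 3] += vel[k][i] / n;
        }

    std::array<std::array<double, 6>, 6> cov{};
    for (std::size_t k = 0; k < n; ++k) {
        std::array<double, 6> d;        // deviation of sample k from the mean
        for (std::size_t i = 0; i < 3; ++i) {
            d[i]     = pos[k][i] - mean[i];
            d[i + 3] = vel[k][i] - mean[i + 3];
        }
        for (std::size_t i = 0; i < 6; ++i)
            for (std::size_t j = 0; j < 6; ++j)
                cov[i][j] += d[i] * d[j] / n;   // divide by n - 1 instead for the sample covariance
    }
    return cov;
}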

vector * matrix product efficiency issue

Just as Z boson recommended, I am using a column-major matrix format in order to avoid having to use the dot product. I don't see a feasible way to avoid it when multiplying a vector with a matrix, though. The matrix multiplication trick requires efficient extraction of rows (or columns, if we transpose the product). To multiply a vector by a matrix, we therefore transpose:
(b * A)^T = A^T * b^T
A is a matrix, b a row vector, which, after being transposed, becomes a column vector. Its rows are just single scalars, and the vector * matrix product implementation becomes an inefficient series of dot products of the columns of the (non-transposed) matrix A with b. Is there a way to avoid performing these dot products? The only way I see that could do it would involve row extraction, which is inefficient with the column-major matrix format.
This can be understood from my original post on this (my first on SO): efficient-4x4-matrix-vector-multiplication-with-sse-horizontal-add-and-dot-prod. The rest of the discussion applies to 4x4 matrices.
Here are two methods to do matrix times vector (v = Mu, where v and u are column vectors):
method 1) v1 = dot(row1, u), v2 = dot(row2, u), v3 = dot(row3, u), v4 = dot(row4, u)
method 2) v = u1*col1 + u2*col2 + u3*col3 + u4*col4.
The first method is more familiar from math class, while the second is more efficient for a SIMD computer. The second method uses vectorized math (like NumPy), e.g.
u1*col1 = (u1*col1x, u1*col1y, u1*col1z, u1*col1w).
Now let's look at vector times matrix (v = uM where v and u are row vectors)
method 1) v1 = dot(col1, u), v2 = dot(col2, u), v3 = dot(col3, u), v4 = dot(col4, u)
method 2) v = u1*row1 + u2*row2 + u3*row3 + u4*row4.
Now the roles of columns and rows have swapped but method 2 is still the efficient method to use on a SIMD computer.
To do matrix times vector efficiently on a SIMD computer the matrix should be stored in column-major order. To do vector times matrix efficient on a SIMD computer the matrix should be stored in row-major order.
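As a concrete illustration of method 2 for matrix times vector with column-major storage, here is a small SSE sketch (function and variable names are mine; unaligned loads are used so no alignment assumption is needed):

#include <xmmintrin.h>  // SSE intrinsics

// v = u0*col0 + u1*col1 + u2*col2 + u3*col3 ("method 2" above).
// M is a 4x4 float matrix stored column-major as 16 consecutive floats.
void mat4_mul_vec4_colmajor(const float* M, const float* u, float* v)
{
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(M + 0), _mm_set1_ps(u[0]));               // u0 * col0
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(M + 4),  _mm_set1_ps(u[1])));    // + u1 * col1
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(M + 8),  _mm_set1_ps(u[2])));    // + u2 * col2
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(M + 12), _mm_set1_ps(u[3])));    // + u3 * col3
    _mm_storeu_ps(v, acc);                                                         // v = M * u
}

For vector times matrix with row-major storage, the exact same code works with the rows of M playing the role of the columns.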
As far as I understand, OpenGL uses column-major ordering and does matrix times vector, and DirectX uses row-major ordering and does vector times matrix.
If you have three matrix transformations that you do in order M1 first then M2 then M3 with matrix times vector you write it as
v = M3*M2*M1*u //u and v are column vectors - OpenGL form
With vector times matrix you write
v = u*M1*M2*M3 //u and v are row vectors - DirectX form
Neither form is better than the other in terms of efficiency. It's just a question of notation (and causing confusion which is useful when you have competition).
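A tiny numerical check of that equivalence (Eigen is used here purely for illustration; the discussion above is library-agnostic):

#include <Eigen/Dense>
#include <iostream>

int main()
{
    using namespace Eigen;
    Matrix3f M1 = Matrix3f::Random(), M2 = Matrix3f::Random(), M3 = Matrix3f::Random();
    Vector3f u  = Vector3f::Random();

    Vector3f    v_col = M3 * M2 * M1 * u;                   // column-vector ("OpenGL") form
    RowVector3f v_row = u.transpose() * M1.transpose()
                      * M2.transpose() * M3.transpose();    // row-vector ("DirectX") form

    // Both forms express the same transformation; the difference is ~0 up to rounding.
    std::cout << (v_col.transpose() - v_row).norm() << "\n";
    return 0;
}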
It's important to note that for matrix*matrix row-major versus column-major storage is irrelevant.
If you want to know why the vertical SIMD instructions are faster than the horizontal ones that's a separate question which should be asked but in short the horizontal ones really act in serial rather than parallel and are broken up into several micro-ops (which is why ironically dppd is faster than dpps).

Diagonalization of a 2x2 self-adjoint (Hermitian) matrix

Diagonalizing a 2x2 Hermitian matrix is simple; it can be done analytically. However, when it comes to calculating the eigenvalues and eigenvectors over 10^6 times, it is important to do it as efficiently as possible. Especially if the off-diagonal elements can vanish, it is not possible to use one formula for the eigenvectors: an if-statement is necessary, which of course slows down the code. Thus, I thought that using Eigen, where it's stated that the diagonalization of 2x2 and 3x3 matrices is optimized, would still be a good choice:
Using
const std::complex<double> I(0., 1.);

inline double block_distr(double W)
{
    return (-W/2. + rand() * W/RAND_MAX);
}
a test-loop would be
...
SelfAdjointEigenSolver<Matrix<complex<double>, 2, 2> > ces;
Matrix<complex<double>, 2, 2> X;
for (int i = 0; i < iter_MAX; ++i) {
    a00    = block_distr(100.);
    a11    = block_distr(100.);
    re_a01 = block_distr(100.);
    im_a01 = block_distr(100.);
    X(0,0) = a00;
    X(1,0) = re_a01 - I*im_a01;
    // only the lower triangular part is referenced! X(0,1) = 0.; <--- not necessary
    X(1,1) = a11;
    ces.compute(X, ComputeEigenvectors);
}
Writing the loop without Eigen, using the formulas for the eigenvalues and eigenvectors of a Hermitian matrix directly and an if-statement to check whether the off-diagonal element is zero, is a factor of 5 faster. Am I not using Eigen properly, or is such an overhead normal? Are there other libraries that are optimized for small self-adjoint matrices?
By default, the iterative method is used. To use the analytical version for the 2x2 and 3x3, you have to call the computeDirect function:
ces.computeDirect(X);
but it is unlikely to be faster than your implementation of the analytic formulas.
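For reference, a minimal sketch of calling the direct variant and reading back the results (the matrix entries below are placeholder values, not from the question):

#include <Eigen/Dense>
#include <complex>
#include <iostream>

int main()
{
    using namespace Eigen;
    const std::complex<double> I(0., 1.);

    Matrix<std::complex<double>, 2, 2> X;
    X(0,0) = 1.0;
    X(1,0) = 0.5 - 0.25*I;   // only the lower triangular part is referenced
    X(1,1) = -2.0;

    SelfAdjointEigenSolver<Matrix<std::complex<double>, 2, 2> > ces;
    ces.computeDirect(X, ComputeEigenvectors);  // direct variant suggested above

    std::cout << "eigenvalues:\n"  << ces.eigenvalues()  << "\n";
    std::cout << "eigenvectors:\n" << ces.eigenvectors() << "\n";
    return 0;
}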

An optimal way to calculate the subset of a vector of vectors

Okay, so I'm implementing an algorithm that calculates the determinant of a 3x3 matrix given by the following placements:
A = [0,0 0,1 0,2
1,0 1,1 1,2
2,0 2,1 2,2]
Currently, the algorithm is like so:
float a1 = A[0][0];
float calcula1 = (A[1][1] * A[2][2]) - (A[2][1] * A[1][2]);
Then we move over to the next column, so it would be:
float a2 = A[0][1];
float calcula2 = (A[1][0] * A[2][2]) - (A[2][0] * A[1][2]);
And so on, moving across one more column. Now, this is personally not very efficient, and I've already implemented a function that can calculate the determinant of a 2x2 matrix, which is basically what I'm doing for each of these calculations.
My question is therefore: is there an optimal way I can do this? I've thought about the idea of having a function that takes a template (X, Y) denoting the start and end positions of the particular block of the 3x3 matrix:
template<typename X, typename Y>
float det(std::vector<Vector> data)
{
//....
}
But I have no idea if this is the way to do it, or how I would access the different elements of the matrix with this approach.
You could hardcode the rule of Sarrus like so if you're exclusively dealing with 3 x 3 matrices.
float det_3_x_3(float** A) {
return A[0][0]*A[1][1]*A[2][2] + A[0][1]*A[1][2]*A[2][0]
+ A[0][2]*A[1][0]*A[2][1] - A[2][0]*A[1][1]*A[0][2]
- A[2][1]*A[1][2]*A[0][0] - A[2][2]*A[1][0]*A[0][1];
}
If you want to save 3 multiplications, you can go
float det_3_x_3(float** A) {
return A[0][0] * (A[1][1]*A[2][2] - A[2][1]*A[1][2])
+ A[0][1] * (A[1][2]*A[2][0] - A[2][2]*A[1][0])
+ A[0][2] * (A[1][0]*A[2][1] - A[2][0]*A[1][1]);
}
I expect this second function is pretty close to what you have already.
Since you need all those numbers to calculate the determinant and thus have to access each of them at least once, I doubt there's anything faster than this. Determinants aren't exactly pretty, computationally. Faster algorithms than the brute force approach (which the rule of Sarrus basically is) require you to transform the matrix first, and that'll eat more time for 3 x 3 matrices than just doing the above would. Hardcoding the Leibniz formula - which is all that the rule of Sarrus amounts to - is not pretty, but I expect it's the fastest way to go if you don't have to do any determinants for n > 3.