Matrix-vector product like multiplication in Eigen - c++

I am currently facing the following problem.
I have two matrices of type MatrixXf,
A:
0.5 0.5 0.5 0.5
0.694496 0.548501 0.680067 0.717111
0.362112 0.596561 0.292028 0.370271
0.56341 0.642395 0.467179 0.598476
and B
0.713072
0.705231
0.772228
0.767898
I want to multiply them like a matrix-vector product, scaling each row of A by the matching entry of B, to achieve:
0.5*0.713072 0.5*0.713072 0.5*0.713072 0.5*0.713072
0.694496*0.705231 0.548501*0.705231 0.680067*0.705231 0.717111*0.705231
0.362112*0.772228 0.596561*0.772228 0.292028*0.772228 0.370271*0.772228
0.56341*0.767898 0.642395*0.767898 0.467179*0.767898 0.598476*0.767898
Is there a way to do that in Eigen? How can I do it simply?
http://mathinsight.org/matrix_vector_multiplication

This has been asked many times. You want a row scaling:
MatrixXf A;
VectorXf B;
MatrixXf res = B.asDiagonal() * A;
or using broadcasting:
res = A.array().colwise() * B.array();

In short you want to do an element-wise product between each column of A and vector B.
There are at least two ways of accomplishing this:
iterate over each column of A and do an element-wise product with B (Eigen calls this a coefficient-wise product), or
replicate your B vector into a matrix with the same size as A and perform an element-wise product between A and the new matrix obtained from vector B.
Here's a quick and dirty example based on Eigen's cwiseProduct() and replicate() functions:
MatrixXf C = A.cwiseProduct( B.replicate<1,4>() ); // MatrixXf rather than auto, so the lazy expression is evaluated once
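To make the arithmetic behind these answers concrete, here is a minimal sketch in plain C++ without Eigen. The function name `scale_rows` and the `Matrix` alias are hypothetical and exist only for this illustration. The loop is meant to produce the same values as `B.asDiagonal() * A`, by multiplying every entry of row i by `B(i)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias for illustration: a row-major matrix of floats.
using Matrix = std::vector<std::vector<float>>;

// Multiply every entry of row i of `a` by `b[i]`, i.e. compute diag(b) * a.
Matrix scale_rows(const Matrix& a, const std::vector<float>& b) {
    assert(a.size() == b.size());  // one scale factor per row
    Matrix result = a;
    for (std::size_t i = 0; i < result.size(); ++i) {
        for (float& entry : result[i]) {
            entry *= b[i];
        }
    }
    return result;
}
```

Calling `scale_rows(A, B)` with the 4x4 matrix and the 4-entry vector from the question would give the table of products listed there, one scale factor per row.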

Related

Solving for Lx=b and Px=b when A=LLt

I am decomposing a sparse SPD matrix A using Eigen. It will be either an LLt or an LDLt (Cholesky) decomposition, so we can assume the matrix will be decomposed as A = P^-1 L D L^t P, where P is a permutation matrix, L is lower triangular, and D is diagonal (possibly the identity). If I do
SolverClassName<SparseMatrix<double> > solver;
solver.compute(A);
To solve Lx=b then is it efficient to do the following?
solver.matrixL().TriangularView<Lower>().solve(b)
Similarly, to solve Px=b then is it efficient to do the following?
solver.permutationPinv()*b
I would like to do this in order to compute b^t A^-1 b efficiently and stably.
Have a look how _solve_impl is implemented for SimplicialCholesky. Essentially, you can simply write:
Eigen::VectorXd x = solver.permutationP()*b; // P not Pinv!
solver.matrixL().solveInPlace(x); // matrixL is already a triangularView
// depending on LLt or LDLt use either:
double res_llt = x.squaredNorm();
double res_ldlt = x.dot(solver.vectorD().asDiagonal().inverse()*x);
Note that you need to multiply by P and not Pinv, since the inverse of
A = P^-1 L D L^t P is
P^-1 L^-t D^-1 L^-1 P
because the order of matrices reverses when taking the inverse of a product.
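The step from this inverse to the code above can be written out as a short derivation. This is a sketch that relies on one extra fact not stated in the answer: a permutation matrix is orthogonal, so P^-1 = P^t. The symbol x is the same vector the code computes with permutationP() and solveInPlace().

```latex
\begin{aligned}
A^{-1} &= P^{-1} L^{-T} D^{-1} L^{-1} P, \qquad P^{-1} = P^{T} \\
b^{T} A^{-1} b &= b^{T} P^{T} L^{-T} D^{-1} L^{-1} P\, b \\
               &= (L^{-1} P b)^{T} D^{-1} (L^{-1} P b) \\
               &= x^{T} D^{-1} x, \qquad x = L^{-1} P\, b
\end{aligned}
```

For the LLt case D is the identity, so the result reduces to the squared norm of x, which matches res_llt. For the LDLt case it matches res_ldlt.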

Matrix multiplication very slow in Eigen

I have implemented a Gauss-Newton optimization process which involves calculating the increment by solving a linearized system Hx = b. The H matrix is calculated by H = J.transpose() * W * J and b is calculated from b = J.transpose() * (W * e), where e is the error vector. The Jacobian here is an n-by-6 matrix, where n is in the thousands, and it stays unchanged across iterations. W is an n-by-n diagonal weight matrix which changes across iterations (some diagonal elements will be set to zero). However, I encountered a speed issue.
When I do not add the weight matrix W, namely H = J.transpose()*J and b = J.transpose()*e, my Gauss-Newton process can run very fast in 0.02 sec for 30 iterations. However when I add the W matrix which is defined outside the iteration loop, it becomes so slow (0.3~0.7 sec for 30 iterations) and I don't understand if it is my coding problem or it normally takes this long.
Everything here is an Eigen matrix or vector.
I defined my W matrix using the .asDiagonal() function in the Eigen library from a vector of inverse variances, then just used it in the calculation of H and b. Then it gets very slow. I would like some hints about the potential reasons for this huge slowdown.
EDIT:
There are only two matrices. Jacobian is definitely dense. Weight matrix is generated from a vector by the function vec.asDiagonal() which comes from the dense library so I assume it is also dense.
The code is really simple and the only difference that's causing the time change is the addition of the weight matrix. Here is a code snippet:
for (int iter=0; iter<max_iter; ++iter) {
// obtain error vector
error = ...
// calculate H and b - the fast one
Eigen::MatrixXf H = J.transpose() * J;
Eigen::VectorXf b = J.transpose() * error;
// calculate H and b - the slow one
Eigen::MatrixXf H = J.transpose() * weight_ * J;
Eigen::VectorXf b = J.transpose() * (weight_ * error);
// obtain delta and update state
del = H.ldlt().solve(b);
T <- T(del) // this is pseudo code, meaning update T with del
}
It is in a function in a class, and weight matrix now for debug purposes is defined as a class variable that can be accessed by the function and is defined before the function is called.
I guess that weight_ is declared as a dense MatrixXf? If so, then replace it with w.asDiagonal() everywhere you use weight_, or make the latter an alias of the asDiagonal expression:
auto weight = w.asDiagonal();
This way Eigen will know that weight is a diagonal matrix, and computations will be optimized as expected.
Because the weight matrix is diagonal, you can replace the matrix multiplication with a coefficient-wise multiplication like so:
MatrixXd m;
VectorXd w;
w.setLinSpaced(5, 2, 6);
m.setOnes(5,5);
std::cout << (m.array().rowwise() * w.array().transpose()).matrix() << "\n";
Likewise, the matrix vector product can be written as:
(w.array() * error.array()).matrix()
This avoids multiplying by the zero off-diagonal elements of the weight matrix. Without an MCVE for me to base this on, YMMV...
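The idea in both answers is that the n-by-n matrix W never has to exist as a dense object. The following is a minimal sketch in plain C++ without Eigen, with hypothetical names (`weighted_normal_equations`, `JacobianRow`, `Hessian`, `Gradient`), that accumulates H = J^T W J and b = J^T W e one Jacobian row at a time from the weight vector alone. It is not how Eigen implements the product internally; it only shows that the work is proportional to n rather than n squared.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kParams = 6;  // columns of J, as in the question
using JacobianRow = std::array<double, kParams>;
using Hessian = std::array<std::array<double, kParams>, kParams>;
using Gradient = std::array<double, kParams>;

// Accumulate H = J^T * diag(w) * J and b = J^T * diag(w) * e row by row,
// without ever building the dense n x n weight matrix.
void weighted_normal_equations(const std::vector<JacobianRow>& jacobian,
                               const std::vector<double>& weights,
                               const std::vector<double>& error,
                               Hessian& h, Gradient& b) {
    assert(jacobian.size() == weights.size());
    assert(jacobian.size() == error.size());
    h = Hessian{};   // zero-initialise the outputs
    b = Gradient{};
    for (std::size_t k = 0; k < jacobian.size(); ++k) {
        const double w = weights[k];
        if (w == 0.0) continue;  // zero-weight rows contribute nothing
        const JacobianRow& row = jacobian[k];
        for (std::size_t i = 0; i < kParams; ++i) {
            const double wi = w * row[i];
            b[i] += wi * error[k];
            for (std::size_t j = 0; j < kParams; ++j) {
                h[i][j] += wi * row[j];
            }
        }
    }
}
```

Each row costs a fixed 6x6 update, so the total cost grows linearly with n, whereas a product with a dense n-by-n W grows quadratically.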

sparse sparse product A^T*A optim in Eigen lib

When multiplying a matrix by its own transpose, as in
matA.transpose()*matA,
you don't have to compute the whole product, because the result matrix is symmetric; in my specific case it is always symmetric and square.
So it is enough to compute only, for example, the lower triangular part and copy the rest, because the product of the 2nd row with the 3rd column is the same as the product of the 3rd row with the 2nd column, and so on.
So my question is: is there a way to tell Eigen to compute only the lower part, and optionally to store only the lower triangular part of the product?
DATA = SparseMatrix<double>((SparseMatrix<double>(matA.transpose()) * matA).pruned()).toDense();
According to the documentation, you can evaluate the lower triangle of a matrix with:
m1.triangularView<Eigen::Lower>() = m2 + m3;
or in your case:
m1.triangularView<Eigen::Lower>() = matA.transpose()*matA;
(where it says "Writing to a specific triangular part: (only the referenced triangular part is evaluated)"). Otherwise, in the line you've written, Eigen will calculate the entire sparse matrix matA.transpose()*matA.
Regarding saving the resulting m1 matrix, it is the same as saving whatever type of matrix it is (Eigen::MatrixXt or Eigen::SparseMatrix<t>). If m1 is sparse, then it will be only half the size of a straightforward matA.transpose()*matA. If m1 is dense, then it will be the full square matrix.
https://eigen.tuxfamily.org/dox/classEigen_1_1SparseSelfAdjointView.html
The symmetric rank update is defined as:
B = B + alpha * A * A^T
where alpha is a scalar. In your case, you are doing A^T * A, so you should pass the transposed matrix instead. The resulting matrix will only store the upper or lower portion of the matrix, whichever you prefer. For example:
SparseMatrix<double> B;
B.selfadjointView<Lower>().rankUpdate(A.transpose());
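The saving the question asks about can be illustrated with a small dense sketch in plain C++. This is not Eigen's sparse implementation and not what rankUpdate does internally; the names `gram_from_lower` and `DenseMatrix` are hypothetical. It only shows the idea of computing the entries with i >= j and mirroring them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias for illustration: row-major dense matrix, rows x cols.
using DenseMatrix = std::vector<std::vector<double>>;

// Return the n x n product A^T * A, computing only the entries with i >= j
// and mirroring each one into the upper triangle.
DenseMatrix gram_from_lower(const DenseMatrix& a) {
    const std::size_t rows = a.size();
    const std::size_t n = rows == 0 ? 0 : a[0].size();
    for (const auto& row : a) {
        assert(row.size() == n);  // the input must be rectangular
    }
    DenseMatrix g(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < rows; ++k) {
                sum += a[k][i] * a[k][j];  // column i of A dotted with column j
            }
            g[i][j] = sum;
            g[j][i] = sum;  // symmetry: (A^T A)(j,i) equals (A^T A)(i,j)
        }
    }
    return g;
}
```

This performs roughly half of the dot products that a full product would need. If only the lower triangle is wanted, the mirroring line can be dropped.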

Multiplying matrices in Eigen c++ gives wrong dimensions

I'm having trouble understanding why I am getting a 10x10 matrix as a result from multiplying a 10x3 matrix with a 3x10 matrix using the Eigen library in c++.
By following the documentation at http://eigen.tuxfamily.org/dox-devel/group__TutorialMatrixArithmetic.html I came up with
const int NUM_OBSERVATIONS = 10;
const int NUM_DIMENSIONS = 3;
MatrixXf localspace(NUM_DIMENSIONS, NUM_OBSERVATIONS);
MatrixXf rotatedlocalspace(NUM_OBSERVATIONS, NUM_DIMENSIONS);
MatrixXf covariance(NUM_DIMENSIONS, NUM_DIMENSIONS);
covariance = (rotatedlocalspace * localspace) / (NUM_OBSERVATIONS - 1);
cout << covariance << endl;
Output gives a 10x10 matrix, when I am trying to obtain a 3x3 covariance matrix for each dimension (These are mean centered XYZ points). "localspace" and "rotatedlocalspace" are both filled with float values when covariance is calculated.
How do I get the correct covariance matrix?
Eigen is correct, as it reproduces basic math: if A is a matrix of dimension n x m and B has dimension m x k, then A*B has the dimension n x k.
Applied to your problem, if your matrix rotatedlocalspace is of dimension 10 x 3 and localspace has dimension 3 x 10, then rotatedlocalspace*localspace has dimension
(10 x 3) * (3 x 10) -> 10 x 10.
The scalar division you apply further doesn't change the dimension.
If you expect a different dimension, then try swapping the order of the factors in the matrix product. This way you will obtain a 3x3 matrix.
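The dimension rule can be checked with a small sketch in plain C++ without Eigen. The `multiply` function and the `MatrixF` alias are hypothetical helpers written for this illustration. The inner dimensions must agree, and the result takes the row count of the left factor and the column count of the right factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias for illustration: row-major matrix of floats.
using MatrixF = std::vector<std::vector<float>>;

// Plain triple-loop product: (n x m) * (m x k) -> (n x k).
MatrixF multiply(const MatrixF& a, const MatrixF& b) {
    assert(!a.empty() && !b.empty());
    assert(a[0].size() == b.size());  // inner dimensions must agree
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    const std::size_t k = b[0].size();
    MatrixF c(n, std::vector<float>(k, 0.0f));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            for (std::size_t p = 0; p < m; ++p) {
                c[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    return c;
}
```

With a 3x10 `localspace` and a 10x3 `rotatedlocalspace`, `multiply(rotatedlocalspace, localspace)` has 10 rows and 10 columns, while `multiply(localspace, rotatedlocalspace)` has 3 rows and 3 columns.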

vector * matrix product efficiency issue

Just as Z boson recommended, I am using a column-major matrix format in order to avoid having to use the dot product. I don't see a feasible way to avoid it when multiplying a vector with a matrix, though. The matrix multiplication trick requires efficient extraction of rows (or columns, if we transpose the product). To multiply a vector by a matrix, we therefore transpose:
(b * A)^T = A^T * b^T
A is a matrix, b a row vector, which, after being transposed, becomes a column vector. Its rows are just single scalars and the vector * matrix product implementation becomes an inefficient implementation of dot products of columns of (non-transposed) matrix A with b. Is there a way to avoid performing these dot products? The only way I see that could do it, would involve row extraction, which is inefficient with the column-major matrix format.
This can be understood from my original post on this (my first on SO), efficient-4x4-matrix-vector-multiplication-with-sse-horizontal-add-and-dot-prod. The rest of the discussion applies to 4x4 matrices.
Here are two methods to do matrix times vector (v = Mu, where v and u are column vectors):
method 1) v1 = dot(row1, u), v2 = dot(row2, u), v3 = dot(row3, u), v4 = dot(row4, u)
method 2) v = u1*col1 + u2*col2 + u3*col3 + u4*col4.
The first method is more familiar from math class while the second is more efficient for a SIMD computer. The second method uses vectorized math (like numpy) e.g.
u1*col1 = (u1*col1x, u1*col1y, u1*col1z, u1*col1w), with the scalar u1 broadcast to all four lanes.
Now let's look at vector times matrix (v = uM where v and u are row vectors)
method 1) v1 = dot(col1, u), v2 = dot(col2, u), v3 = dot(col3, u), v4 = dot(col4, u)
method 2) v = u1*row1 + u2*row2 + u3*row3 + u4*row4.
Now the roles of columns and rows have swapped but method 2 is still the efficient method to use on a SIMD computer.
To do matrix times vector efficiently on a SIMD computer, the matrix should be stored in column-major order. To do vector times matrix efficiently on a SIMD computer, the matrix should be stored in row-major order.
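The two methods for v = Mu can be written as scalar C++ to show that they compute the same vector and differ only in access pattern. This is a sketch with hypothetical names (`Vec4`, `Mat4`, `mat_vec_by_rows`, `mat_vec_by_cols`, `transpose4`). It uses no SIMD intrinsics, so it illustrates the data layout rather than the speed difference.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // four stored vectors: rows or columns, see below

// Method 1: rows[i] holds row i of M; each output element is a dot product.
Vec4 mat_vec_by_rows(const Mat4& rows, const Vec4& u) {
    Vec4 v{};
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 4; ++j) {
            v[i] += rows[i][j] * u[j];
        }
    }
    return v;
}

// Method 2: cols[j] holds column j of M; v = u1*col1 + u2*col2 + u3*col3 + u4*col4.
// The inner loop touches one contiguous stored vector, which maps to vertical SIMD.
Vec4 mat_vec_by_cols(const Mat4& cols, const Vec4& u) {
    Vec4 v{};
    for (std::size_t j = 0; j < 4; ++j) {
        for (std::size_t i = 0; i < 4; ++i) {
            v[i] += u[j] * cols[j][i];
        }
    }
    return v;
}

// Convert between the row-wise and column-wise storage of the same matrix.
Mat4 transpose4(const Mat4& m) {
    Mat4 t{};
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 4; ++j) {
            t[j][i] = m[i][j];
        }
    }
    return t;
}
```

For the same matrix, `mat_vec_by_rows(rows, u)` and `mat_vec_by_cols(transpose4(rows), u)` return the same vector. Vector times matrix swaps the roles, so the row-wise storage becomes the one whose inner loop is contiguous.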
As far as I understand OpenGL uses column major ordering and does matrix times vector and DirectX uses row-major ordering and does vector times matrix.
If you have three matrix transformations that you do in order M1 first then M2 then M3 with matrix times vector you write it as
v = M3*M2*M1*u //u and v are column vectors - OpenGL form
With vector times matrix you write
v = u*M1*M2*M3 //u and v are row vectors - DirectX form
Neither form is better than the other in terms of efficiency. It's just a question of notation (and causing confusion which is useful when you have competition).
It's important to note that for matrix*matrix row-major versus column-major storage is irrelevant.
If you want to know why the vertical SIMD instructions are faster than the horizontal ones, that's a separate question which should be asked on its own. In short, the horizontal ones really act in serial rather than in parallel and are broken up into several micro-ops (which is why, ironically, dppd is faster than dpps).