Matrix multiplication very slow in Eigen - c++

I have implemented a Gauss-Newton optimization process which involves calculating the increment by solving a linearized system Hx = b. The H matrx is calculated by H = J.transpose() * W * J and b is calculated from b = J.transpose() * (W * e) where e is the error vector. Jacobian here is a n-by-6 matrix where n is in thousands and stays unchanged across iterations and W is a n-by-n diagonal weight matrix which will change across iterations (some diagonal elements will be set to zero). However I encountered a speed issue.
When I do not add the weight matrix W, namely H = J.transpose()*J and b = J.transpose()*e, my Gauss-Newton process can run very fast in 0.02 sec for 30 iterations. However when I add the W matrix which is defined outside the iteration loop, it becomes so slow (0.3~0.7 sec for 30 iterations) and I don't understand if it is my coding problem or it normally takes this long.
Everything here are Eigen matrices and vectors.
I defined my W matrix using .asDiagonal() function in Eigen library from a vector of inverse variances. then just used it in the calculation for H ad b. Then it gets very slow. I wish to get some hints about the potential reasons for this huge slowdown.
EDIT:
There are only two matrices. Jacobian is definitely dense. Weight matrix is generated from a vector by the function vec.asDiagonal() which comes from the dense library so I assume it is also dense.
The code is really simple and the only difference that's causing the time change is the addition of the weight matrix. Here is a code snippet:
for (int iter=0; iter<max_iter; ++iter) {
// obtain error vector
error = ...
// calculate H and b - the fast one
Eigen::MatrixXf H = J.transpose() * J;
Eigen::VectorXf b = J.transpose() * error;
// calculate H and b - the slow one
Eigen::MatrixXf H = J.transpose() * weight_ * J;
Eigen::VectorXf b = J.transpose() * (weight_ * error);
// obtain delta and update state
del = H.ldlt().solve(b);
T <- T(del) // this is pseudo code, meaning update T with del
}
It is in a function in a class, and weight matrix now for debug purposes is defined as a class variable that can be accessed by the function and is defined before the function is called.

I guess that weight_ is declared as a dense MatrixXf? If so, then replace it by w.asDiagonal() everywhere you use weight_, or make the later an alias to the asDiagonal expression:
auto weight = w.asDiagonal();
This way Eigen will knows that weight is a diagonal matrix and computations will be optimized as expected.

Because the matrix multiplication is just the diagonal, you can change it to use coefficient wise multiplication like so:
MatrixXd m;
VectorXd w;
w.setLinSpaced(5, 2, 6);
m.setOnes(5,5);
std::cout << (m.array().rowwise() * w.array().transpose()).matrix() << "\n";
Likewise, the matrix vector product can be written as:
(w.array() * error.array()).matrix()
This avoids the zero elements in the matrix. Without an MCVE for me to base this on, YMMV...

Related

correlation of a fit from colPivHouseholderQr().solve

I have some C++ code that is getting a bunch of X,Y values and doing
a linear fit
Eigen::Matrix<float, Eigen::Dynamic, 2> DX;
Eigen::Matrix<float, Eigen::Dynamic, 1> DY;
For loop over the data values (edited a bit because my data source
is a bit more complicated than simple arrays):
{
DX(i,0) = x[i];
DX(i,1) = 1;
DY(i,0) = y[i];
}
then
Eigen::Vector2f Dsolution = DX.colPivHouseholderQr().solve(DY);
// linear solution is in Dsolution[0] and Dsolution[1]
I need the correlation coefficient from that calculation.
How do I obtain it?
Most Eigen stuff is about two floors above my head, so you may need to spell it out in an elementary way.
The fundamental issue is that I'm running this routine on multiple data sets
and I need some indication of the quality of data as regards to internal noise and variance.
Thanks!
I'm assuming you are looking to compute the R² coefficient of your least-square fitting.
Linear least squares
First, let's recap what you're doing. In your Dsolution vector are two coefficients (lets call them a and b, which are your estimated parameters for an affine model between your xs and your y s). This means that for each x[i] your model's estimate for the corresponding y[i] is estimated_y[i] = a * x[i] + b.
a and b are computed by minimizing the sum of the squares of the difference between the observed y[i] and their estimated value a*x[i] + b, also called the residuals. It turns out that you can simply do that by solving a linear problem, which is why you use Eigen's solve() to find them.
Computing R²
Now we want to compute R², which is an indicator of how "good" your fit is.
If we follow the definition from Wikipedia linked above, to compute R² you need to :
Compute the average of the observed values y_avg
Compute the total sum of squares i.e. the sum of the square differences between the observed values and their average (this is like the variance but you don't divide by the number of samples)
Compute the total sum of squared residuals by summing the square of differences between the predicted and observed value of each y
Then R² is 1 - (sum_residuals_squares / sum_squares)
Eigen code
Let's see how we can do this with Eigen :
float r_squared(const MatrixX2f& DX, const VectorXf& DY, const Vector2f& model)
{
// Compute average
const float y_avg = DY.mean();
// Compute total sum of squares
const int N = DX.rows();
const float sum_squares = (DY - (y_avg * VectorXf::Ones(N))).squaredNorm();
// Compute predicted values
const VectorXf estimated_DY = DX * model;
// Compute sum of residual squared
const float sum_residuals_square = (DY - estimated_DY).squaredNorm();
return 1 - (sum_residuals_square / sum_squares);
}
The trick used in both sum of squares's expression is to use the squared norm function, because the squared norm of a vector is the sum of squares of its components. We do it twice because we have two sum of squares to compute.
In the first case, we created a vector of size N full of ones that we multiply by y_avg, to get a vector whose elements are all y_avg. Then each element of DY minus that vector will be y[i] - y_avg, and we compute the square norm to get the total sum of squares.
In the second case, we first compute the predicted y's by using your linear model, and then compute the difference with the observed values, using the squared norm to compute the sum of squared differences.

Matrix multiplication optimization

I am performing a series of matrix multiplications with fairly large matrices. To run through all of these operations takes a long time, and I need my program to do this in a large loop. I was wondering if anyone has any ideas to speed this up? I just started using Eigen, so I have very limited knowledge.
I was using ROOT-cern's built in TMatrix class, but the speed for performing the matrix operations is very poor. I set up some diagonal matrices using Eigen with the hope that it handled the multiplication operation in a more optimal way. It may, but I cannot really see the performance difference.
// setup matrices
int size = 8000;
Eigen::MatrixXf a(size*2,size);
// fill matrix a....
Eigen::MatrixXf r(2*size,2*size); // diagonal matrix of row sums of a
// fill matrix r
Eigen::MatrixXf c(size,size); // diagonal matrix of col sums of a
// fill matrix c
// transpose a in place
a.transposeInPlace();
Eigen::MatrixXf c_dia;
c_dia = c.diagonal().asDiagonal();
Eigen::MatrixXf r_dia;
r_dia = r.diagonal().asDiagonal();
// calc car
Eigen::MatrixXf car;
car = c_dia*a*r_dia;
You are doing way too much work here. If you have diagonal matrices, only store the diagonal (and directly use that for products). Once you store a diagonal matrix in a square matrix, the information of the structure is lost to Eigen.
Also, you don't need to store the transposed variant of a, just use a.transpose() inside a product (that is only a minor issue here ...)
// setup matrices
int size = 8000;
Eigen::MatrixXf a(size*2,size);
// fill matrix a....
a.setRandom();
Eigen::VectorXf r = a.rowwise().sum(); // diagonal matrix of row sums of a
Eigen::VectorXf c = a.colwise().sum(); // diagonal matrix of col sums of a
Eigen::MatrixXf car = c.asDiagonal() * a.transpose() * r.asDiagonal();
Finally, of course make sure to compile with optimization enabled, and enable vectorization if available (with gcc or clang compile with -O2 -march=native).

Solving for Lx=b and Px=b when A=LLt

I am decomposing a sparse SPD matrix A using Eigen. It will either be a LLt or a LDLt deomposition (Cholesky), so we can assume the matrix will be decomposed as A = P-1 LDLt P where P is a permutation matrix, L is triangular lower and D diagonal (possibly identity). If I do
SolverClassName<SparseMatrix<double> > solver;
solver.compute(A);
To solve Lx=b then is it efficient to do the following?
solver.matrixL().TriangularView<Lower>().solve(b)
Similarly, to solve Px=b then is it efficient to do the following?
solver.permutationPinv()*b
I would like to do this in order to compute bt A-1 b efficiently and stably.
Have a look how _solve_impl is implemented for SimplicialCholesky. Essentially, you can simply write:
Eigen::VectorXd x = solver.permutationP()*b; // P not Pinv!
solver.matrixL().solveInPlace(x); // matrixL is already a triangularView
// depending on LLt or LDLt use either:
double res_llt = x.squaredNorm();
double res_ldlt = x.dot(solver.vectorD().asDiagonal().inverse()*x);
Note that you need to multiply by P and not Pinv, since the inverse of
A = P^-1 L D L^t P is
P^-1 L^-t D^-1 L^-1 P
because the order of matrices reverses when taking the inverse of a product.

Avoid numerical underflow when obtaining determinant of large matrix in Eigen

I have implemented a MCMC algorithm in C++ using the Eigen library. The main part of the algorithm is a loop in which first some some matrix calculations are performed after which the determinant of the resulting matrix is obtained and added to the output. E.g.:
MatrixXd delta0;
NumericVector out(3);
out[0] = 0;
out[1] = 0;
for (int i = 0; i < s; i++) {
...
delta0 = V*(A.cast<double>()-(A+B).cast<double>()*theta.asDiagonal());
...
I = delta0.determinant()
out[1] += I;
out[2] += std::sqrt(I);
}
return out;
Now on certain matrices I unfortunately observe a numerical underflow so that the determinant is outputted as zero (which it actually isn't).
How can I avoid this underflow?
One solution would be to obtain, instead of the determinant, the log of the determinant. However,
I do not know how to do this;
how could I then add up these logs?
Any help is greatly appreciated.
There are 2 main options that come to my mind:
The product of eigenvalues of square matrix is the determinant of this matrix, therefore a sum of logarithms of each eigenvalue is a logarithm of the determinant of this matrix. Assume det(A) = a and det(B) = b for compact notation. After applying aforementioned for 2 matrices A and B, we end up with log(a) and log(b), then actually the following is true:
log(a + b) = log(a) + log(1 + e ^ (log(b) - log(a)))
Yes, we get a logarithm of the sum. What would you do with it next? I don't know, depends on what you have to. If you have to remove logarithm by e ^ log(a + b) = a + b, then you might be lucky that the value of a + b does not underflow now, but in some cases it can still underflow as well.
Perform clever preconditioning; there might be tons of options here, and you better read about them from some trusted sources as this is a serious topic. The simplest (and probably the cheapest ever) example of preconditioning for this particular problem could be to recall that det(c * A) = (c ^ n) * det(A), where A is n by n matrix, and to premultiply your matrix with some c, compute the determinant, and then to divide it by c ^ n to get the actual one.
Update
I thought about one more option. If on the last stages of #1 or #2 you still experience underflow too frequently, then it might be a good idea to increase precision specifically for these last operations, for example, by utilizing GNU MPFR.
You can use Householder elimination to get the QR decomposition of delta0. Then the determinant of the Q part is +/-1 (depending on whether you did an even or odd number of reflections) and the determinant of the R part is the product of the diagonal elements. Both of these are easy to compute without running into underflow hell---and you might not even care about the first.

2D rigid body physics using runge kutta

Does anyone know any c++/opengl sourcecode demos for 2D rigid body physics using runge kutta?
I want to build a physics engine but I need some reference code to understand better how others have implemented this.
There are a lot of things you have to take care to do this nicely. I will focus on the integrator implementation and what I have found works good for me.
For all the degrees of freedom in your system implement a function to return the accelerations a as a function of time t, positions x and velocities v. This should operate on arrays or vectors of quantities and not just scalars.
a = accel(t,x,v);
After each RK step evaluate the acceleration to be ready for the next step. In the loop then do this:
{
// assume t,x[],v[], a[] are known
// step time t -> t+h and calc new values
float h2=h/2;
vec q1 = v + h2*a;
vec k1 = accel(t+h2, x+h2*v, q1);
vec q2 = v + h2*k1;
vec k2 = accel(t+h2, x+h2*q1, q2);
vec q3 = v + h*k2;
vec k3 = accel(t_h, x+h*q2, q3);
float h6 = h/6;
t = t + h;
x = x + h*(v+h6*(a+k1+k2));
v = v + h6*(a+2*k1+2*k2+k3);
a = accel(t,x,v);
}
Why? Well the standard RK method requires you to make a 2xN state vector, but the derivatives of the fist N elements are equal to the last N elements. If you split the problem up to two N state vectors and simplify a little you will arrive at the above scheme for 2nd order RK.
I have done this and the results are identical to commercial software for a plan system with N=6 degrees of freedom.