Best way to broadcast Armadillo matrix operations similar to Numpy - c++

Consider the matrices A and B, where A is a 4x5 matrix and B is a 1x5 matrix (or a row vector). If I try to do A + B in Numpy, its broadcasting capabilities will implicitly create a 4x5 matrix in which every row holds the values of B, and then perform normal matrix addition between those two matrices. That would be written in Armadillo like this:
mat A = randu<mat>(4,5);
mat B = randu<mat>(1,5);
A + B;
But this fails. I have looked at the documentation and couldn't find a built-in way to do broadcasting, so I want to know the best (fastest) way to do an operation similar to the above.
Of course, I could manually resize the smaller matrix to the size of the larger one, copying the single row of B into every row with a for loop, and then use Armadillo's overloaded + operator. But I'm hoping there is a more efficient method. Any help would be appreciated!

Expanding on the note from Claes Rolen: broadcasting for matrices in Armadillo is done with .each_col() and .each_row(); broadcasting for cubes is done with .each_slice().
#include <armadillo>
using namespace arma;

mat A(4, 5, fill::randu);
colvec V(4, fill::randu);
rowvec R(5, fill::randu);
mat X = A.each_col() + V; // or A.each_col() += V for in-place operation
mat Y = A.each_row() + R; // or A.each_row() += R for in-place operation
cube C(4, 5, 2, fill::randu);
cube D = C.each_slice() + A; // or C.each_slice() += A for in-place operation
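Applied to the original question's A + B, this becomes the following (a small sketch, assuming B is declared as a rowvec rather than a 1x5 mat so that .each_row() accepts it directly):
rowvec B(5, fill::randu);
mat Z = A.each_row() + B; // adds B to every row of A, like numpy's broadcast of A + B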

Related

Matrix multiplication optimization

I am performing a series of matrix multiplications with fairly large matrices. Running through all of these operations takes a long time, and my program needs to do it in a large loop. I was wondering if anyone has ideas for speeding this up? I just started using Eigen, so I have very limited knowledge.
I was using ROOT/CERN's built-in TMatrix class, but its speed for matrix operations is very poor. I set up some diagonal matrices using Eigen in the hope that it would handle the multiplication operation in a more optimal way. It may, but I cannot really see a performance difference.
// setup matrices
int size = 8000;
Eigen::MatrixXf a(size*2,size);
// fill matrix a....
Eigen::MatrixXf r(2*size,2*size); // diagonal matrix of row sums of a
// fill matrix r
Eigen::MatrixXf c(size,size); // diagonal matrix of col sums of a
// fill matrix c
// transpose a in place
a.transposeInPlace();
Eigen::MatrixXf c_dia;
c_dia = c.diagonal().asDiagonal();
Eigen::MatrixXf r_dia;
r_dia = r.diagonal().asDiagonal();
// calc car
Eigen::MatrixXf car;
car = c_dia*a*r_dia;
You are doing way too much work here. If you have diagonal matrices, store only the diagonal (and use that directly in products). Once you store a diagonal matrix in a full square matrix, the structural information is lost to Eigen.
Also, you don't need to store the transposed variant of a; just use a.transpose() inside a product (though that is only a minor issue here).
// setup matrices
int size = 8000;
Eigen::MatrixXf a(size*2,size);
// fill matrix a....
a.setRandom();
Eigen::VectorXf r = a.rowwise().sum(); // diagonal matrix of row sums of a
Eigen::VectorXf c = a.colwise().sum(); // diagonal matrix of col sums of a
Eigen::MatrixXf car = c.asDiagonal() * a.transpose() * r.asDiagonal();
Finally, make sure to compile with optimization enabled, and enable vectorization if available (with gcc or clang, compile with -O2 -march=native).
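To illustrate the point, here is a minimal, self-contained sketch (not from the original post; sizes shrunk from 8000 to 1000 so the dense variant finishes quickly) that times the dense products against the diagonal-aware ones and checks that the results match:
#include <Eigen/Dense>
#include <chrono>
#include <iostream>

int main() {
    const int n = 1000;
    Eigen::MatrixXf a = Eigen::MatrixXf::Random(2 * n, n);
    Eigen::VectorXf r = a.rowwise().sum();
    Eigen::VectorXf c = a.colwise().sum();

    auto t0 = std::chrono::steady_clock::now();
    Eigen::MatrixXf c_dense = c.asDiagonal();                  // diagonal stored as full matrix
    Eigen::MatrixXf r_dense = r.asDiagonal();
    Eigen::MatrixXf car1 = c_dense * a.transpose() * r_dense;  // two O(n^3) dense products
    auto t1 = std::chrono::steady_clock::now();
    Eigen::MatrixXf car2 = c.asDiagonal() * a.transpose() * r.asDiagonal(); // O(n^2) scalings
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "dense:    " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "diagonal: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
    std::cout << "max diff: " << (car1 - car2).cwiseAbs().maxCoeff() << "\n";
}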

How could I subtract a 1xN eigen matrix from a MxN matrix, like numpy does?

I could not subtract a 1xN matrix from an MxN matrix like I do in numpy.
I create a matrix like np.arange(9).reshape(3,3) with Eigen like this:
int buf[9];
for (int i{0}; i < 9; ++i) {
    buf[i] = i;
}
MatrixXi m = Map<MatrixXi>(buf, 3, 3);
Then I compute the mean along the row direction:
MatrixXi m2 = m.rowwise().mean();
I would like to broadcast m2 to a 3x3 matrix and subtract it from m. How can I do this?
There is no numpy-like implicit broadcasting in Eigen; what you can do is reuse the same rowwise/colwise pattern that you already used:
m.colwise() -= m2;
(See the Eigen tutorial on broadcasting.)
N.B.: m2 needs to be a vector, not a matrix. Also, the more dimensions are fixed at compile time, the better the compiler can generate efficient code.
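A minimal sketch of that fix, with m2 typed as a vector (assuming m is the MatrixXi built above; for these values the integer mean is exact):
Eigen::VectorXi m2 = m.rowwise().mean(); // column vector: one mean per row
m.colwise() -= m2;                       // subtract it from every column of m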
You need to use appropriate types for your values; a plain MatrixXi does not give Eigen the compile-time vector shape that broadcasting operations need. You also seem to have the bad habit of declaring your variables well before you initialise them. Don't.
This should work
#include <array>
#include <numeric>
#include <Eigen/Dense>
using namespace Eigen;

std::array<int, 9> buf;
std::iota(buf.begin(), buf.end(), 0);
auto m = Map<Matrix3i>(buf.data());
auto v = m.rowwise().mean();
auto result = m.colwise() - v;
While the .colwise() method already suggested should be preferred in this case, it is actually also possible to broadcast a vector to multiple columns using the replicate method.
m -= m2.replicate<1,3>();
// or
m -= m2.rowwise().replicate<3>();
If 3 is not known at compile time, you can write
m -= m2.rowwise().replicate(m.cols());
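Putting the pieces together, here is a self-contained sketch of the whole numpy-style demean, using the runtime replicate variant (the colwise() form from the first answer would work just as well):
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3i m;
    m << 0, 1, 2,
         3, 4, 5,
         6, 7, 8;                               // same values as np.arange(9).reshape(3,3)
    Eigen::Vector3i means = m.rowwise().mean(); // 1, 4, 7
    m -= means.rowwise().replicate(m.cols());   // broadcast the column vector to 3 columns
    std::cout << m << "\n";                     // every row becomes -1 0 1
}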

Regular Multiplication of different shaped Eigen Matrices

I have an Nx3 Eigen matrix.
I have an Nx1 Eigen matrix.
I'm trying to multiply each row of the Nx3 coefficient-wise by the corresponding scalar in the Nx1, so I can scale a bunch of 3D vectors.
I'm sure I'm overlooking something obvious but I can't get it to work.
#include <Eigen/Dense>
using namespace Eigen;

MatrixXf m(4, 3);
m << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12;
MatrixXf dots(4, 1);
dots << 2, 2, 2, 2;
I want the resulting matrix to be Nx3, like so:
2, 4, 6
8, 10, 12
14, 16, 18
20, 22, 24
You can use broadcasting through the array interface (on matrices, Eigen's colwise()/rowwise() broadcasting only supports addition and subtraction, so the multiplication has to go through .array()), taking dots.col(0) so that Eigen sees a compile-time vector:
m = m.array().colwise() * dots.col(0).array();
or observe that all you want to do is apply a non-uniform scaling:
m = dots.col(0).asDiagonal() * m;
Both expressions will generate similar code. If dots were declared as a VectorXf instead of a one-column MatrixXf, you could use it directly without .col(0).
Okay, so I got something working. I'm probably doing something wrong, but this worked for me, so I thought I would share. I wrote my first line of C++ a week ago, so I figure I deserve some grace. Anyone with a better solution is encouraged to post.
// coefficient-wise (not matrix) multiplication of an Nx3 by an Nx1, in place.
// For multiplying dot products by vectors.
void N3xNcoefIP(MatrixXf &A, MatrixXf &B) {
    A.array() *= B.replicate(1, A.cols()).array(); // A.cols(), not A.size(): size() returns rows*cols
}
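For what it's worth, a usage sketch of the fixed helper with the data from the question (the A.cols() fix is what makes the shapes line up):
#include <Eigen/Dense>
#include <iostream>
using Eigen::MatrixXf;

// the helper from above, with the A.cols() fix
void N3xNcoefIP(MatrixXf &A, MatrixXf &B) {
    A.array() *= B.replicate(1, A.cols()).array();
}

int main() {
    MatrixXf m(4, 3);
    m << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12;
    MatrixXf dots(4, 1);
    dots << 2, 2, 2, 2;
    N3xNcoefIP(m, dots);    // scales each row of m by the matching entry of dots
    std::cout << m << "\n"; // prints the 2..24 matrix from the question
}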

Matrix multiplication very slow in Eigen

I have implemented a Gauss-Newton optimization process which involves calculating the increment by solving a linearized system Hx = b. The H matrix is calculated by H = J.transpose() * W * J and b is calculated from b = J.transpose() * (W * e), where e is the error vector. The Jacobian J here is an n-by-6 matrix, where n is in the thousands; it stays unchanged across iterations, while W is an n-by-n diagonal weight matrix which changes across iterations (some diagonal elements will be set to zero). However, I have encountered a speed issue.
When I do not add the weight matrix W, namely H = J.transpose()*J and b = J.transpose()*e, my Gauss-Newton process runs very fast: 0.02 sec for 30 iterations. However, when I add the W matrix, which is defined outside the iteration loop, it becomes very slow (0.3~0.7 sec for 30 iterations), and I don't understand whether it is a problem with my code or whether it normally takes this long.
Everything here is an Eigen matrix or vector.
I defined my W matrix using the .asDiagonal() function from the Eigen library on a vector of inverse variances, then just used it in the calculation of H and b. Then it gets very slow. I would appreciate some hints about the potential reasons for this huge slowdown.
EDIT:
There are only two matrices. The Jacobian is definitely dense. The weight matrix is generated from a vector by the function vec.asDiagonal(), which comes from the dense library, so I assume it is also dense.
The code is really simple, and the only difference causing the time change is the addition of the weight matrix. Here is a code snippet:
for (int iter = 0; iter < max_iter; ++iter) {
    // obtain error vector
    error = ...
    // calculate H and b - the fast version (no weights)
    Eigen::MatrixXf H = J.transpose() * J;
    Eigen::VectorXf b = J.transpose() * error;
    // calculate H and b - the slow version (only one of the two variants is used at a time)
    Eigen::MatrixXf H = J.transpose() * weight_ * J;
    Eigen::VectorXf b = J.transpose() * (weight_ * error);
    // obtain delta and update state
    del = H.ldlt().solve(b);
    T <- T(del) // this is pseudocode, meaning: update T with del
}
It is in a member function of a class; for debugging purposes, the weight matrix is currently a class variable that the function can access, and it is set before the function is called.
I guess that weight_ is declared as a dense MatrixXf? If so, then replace it by w.asDiagonal() everywhere you use weight_, or make the latter an alias to the asDiagonal expression:
auto weight = w.asDiagonal();
This way Eigen knows that weight is a diagonal matrix, and the computations will be optimized as expected.
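Concretely, a sketch of the two weighted lines from the question's loop rewritten this way (J, w, and error are the names from the question and answer; w is the vector of inverse variances):
Eigen::MatrixXf H = J.transpose() * w.asDiagonal() * J;       // Eigen sees the diagonal structure
Eigen::VectorXf b = J.transpose() * (w.asDiagonal() * error); // diagonal * vector is O(n)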
Because the matrix multiplication involves only the diagonal, you can change it to use coefficient-wise multiplication, like so:
#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;

MatrixXd m;
VectorXd w;
w.setLinSpaced(5, 2, 6);
m.setOnes(5,5);
std::cout << (m.array().rowwise() * w.array().transpose()).matrix() << "\n";
Likewise, the matrix vector product can be written as:
(w.array() * error.array()).matrix()
This avoids all the multiplications by the zero off-diagonal elements. Without an MCVE (minimal complete verifiable example) to base this on, YMMV...

OpenCV - Directly copying matrix multiplication result to a subset of another matrix

I am trying to copy a matrix multiplication result directly into a subset of another matrix:
cv::Mat a,b,c;
//fill matrices a and b and set matrix c to correct size
cv::Mat ab=a*b;
ab.copyTo(c(cv::Rect(0,0,3,3)));
Isn't it possible to copy the result directly into matrix c, e.g. like this (I know this doesn't work):
(a*b).copyTo(c(cv::Rect(0,0,3,3)));
//or
c(cv::Rect(0,0,3,3)).setTo(a*b);
Wouldn't it be more efficient?
Try this:
cv::Mat subC = c(cv::Rect(0,0,3,3));
subC = a*b;
No copying here: as long as subC already has the same size and type as the result of a*b, the product is written straight into that ROI of c.
Or more succinctly:
c(cv::Rect(0,0,3,3)) = a*b;
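One caveat with both forms: OpenCV only writes through to c if the destination header already has the size and type of a*b; otherwise the assignment allocates fresh memory and c is left untouched. A minimal sketch (sizes are hypothetical) with c preallocated up front:
#include <opencv2/core.hpp>

cv::Mat a = cv::Mat::ones(3, 3, CV_32F);
cv::Mat b = cv::Mat::ones(3, 3, CV_32F);
cv::Mat c = cv::Mat::zeros(4, 4, CV_32F); // preallocated, same type as a*b
c(cv::Rect(0, 0, 3, 3)) = a * b;          // gemm writes straight into the ROI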