Element-wise multiplication between matrices in BLAS? - c++

I'm starting to use BLAS functions in C++ (specifically Intel MKL) to create faster versions of some of my old Matlab code.
It's been working out well so far, but I can't figure out how to perform element-wise multiplication on two matrices (A .* B in Matlab).
I know gemv does something similar between a matrix and a vector, so should I just break one of my matrices into vectors and call gemv repeatedly? I think this would work, but I feel like there should be something built in for this operation.

Use the Hadamard product. In MKL it's v?Mul. E.g., for doubles:
vdMul( n, a, b, y );
in Matlab notation it performs:
y[1:n] = a[1:n] .* b[1:n]
In your case you can treat the matrices as vectors, since their elements are stored contiguously in memory.
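For example, a minimal sketch in C++ (the helper name hadamard, and the assumption that A, B, and Y all share the same storage order, are mine):

#include <mkl.h>

// Element-wise product Y = A .* B of two m-by-k matrices stored
// contiguously; row- vs column-major does not matter as long as
// A, B, and Y all use the same layout.
void hadamard(MKL_INT m, MKL_INT k, const double* A, const double* B, double* Y)
{
    vdMul(m * k, A, B, Y);  // y[i] = a[i] * b[i] for i = 0..m*k-1
}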

Related

How to implement scalar raised to the power of a matrix in Eigen?

I have the following code in MATLAB that I wish to port to C++, ideally with the Eigen library:
N(:,i)=2.^L(:,i)+1;
Where L is a symmetric matrix, e.g. (1,2;2,1), whose diagonal elements are all one.
In Eigen (unsupported) I note there is a function to calculate the exponential of a matrix, but none to raise an arbitrary scalar to a matrix power.
http://eigen.tuxfamily.org/dox-devel/unsupported/group__MatrixFunctions__Module.html#matrixbase_exp
Is there something I am missing?
If you really wanted to raise an arbitrary scalar to a matrix power, you should use the identity a^x = exp(log(a)*x).
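For instance, a sketch using the unsupported MatrixFunctions module mentioned in the question (the function name scalarToMatrixPower is mine; it assumes a > 0):

#include <Eigen/Dense>
#include <unsupported/Eigen/MatrixFunctions>  // provides .exp() for matrices
#include <cmath>

// True matrix power via the identity a^X = exp(log(a) * X).
Eigen::MatrixXd scalarToMatrixPower(double a, const Eigen::MatrixXd& X)
{
    return (std::log(a) * X).exp();
}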
However, the Matlab .^ operator computes an element-wise power. If you want the same in Eigen, use the corresponding Array functionality:
N.col(i) = pow(2.0, L.col(i).array()) + 1.0;
Beware that Eigen starts indexing at 0, and Matlab starts at 1, so you may need to replace i by i-1.

Eigen equivalent to Octave/MATLAB mldivide for rectangular matrices

I'm using Eigen v3.2.7.
I have a medium-sized rectangular matrix X (170x17) and a column vector Y (170x1), and I'm trying to solve X\Y using Eigen. Octave solves this problem fine, but Eigen is returning incorrect values for these matrices (though not for smaller ones) - however, I suspect that it's how I'm using Eigen, rather than Eigen itself.
auto X = Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>{170, 17};
auto Y = Eigen::Matrix<T, Eigen::Dynamic, 1>{170};
// Assign their values...
const auto theta = X.colPivHouseholderQr().solve(Y).eval(); // Wrong!
According to the Eigen documentation, the ColPivHouseholderQR solver is for general matrices and pretty robust, but to make sure I've also tried the FullPivHouseholderQR. The results were identical.
Is there some special magic that Octave's mldivide does that I need to implement manually for Eigen?
Update
This spreadsheet has the two input matrices, plus Octave's and my result matrices.
Replacing auto doesn't make a difference, nor would I expect it to, because construction cannot be a lazy operation. I have to call .eval() on the solve result because the next thing I do with the result matrix is access the raw data (using .data()) in tail and head operations. The expression-template versions of the results of those block operations do not have a .data() member, so I have to force evaluation beforehand - in other words, theta is already the concrete type, not an expression template.
The result for (X*theta-Y).norm()/Y.norm() is:
2.5365e-007
And the result for (X.transpose()*X*theta-X.transpose()*Y).norm() / (X.transpose()*Y).norm() is:
2.80096e-007
As I'm currently using single precision float for my basic numerical type, that's pretty much zero for both.
According to your verifications, the solution you get is perfectly fine. If you want more accuracy, use double-precision floating point numbers. Note that MATLAB/Octave use double precision by default.
Moreover, it is also likely that your problem is not full rank, in which case it admits an infinite number of solutions. ColPivHouseholderQR picks one somewhat arbitrarily. On the other hand, mldivide will pick the minimal-norm one, which you can also obtain with Eigen::BDCSVD (Eigen 3.3) or the slower Eigen::JacobiSVD.
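A minimal sketch of that minimal-norm solve (assuming Eigen 3.3; BDCSVD needs the thin U and V factors for solve(), and the wrapper name solveMinNorm is mine):

#include <Eigen/Dense>

// Minimal-norm least-squares solution, like mldivide's behavior
// for rank-deficient problems.
Eigen::VectorXd solveMinNorm(const Eigen::MatrixXd& X, const Eigen::VectorXd& Y)
{
    return X.bdcSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(Y);
}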

Is there a `numpy.minimum` equivalent in GSL?

I'm working on porting a complex data analysis routine I "prototyped" in Python to C++. I used Numpy extensively throughout the Python code. I'm looking at employing the GSL in the C++ port since it implements all of the various numerical routines I require (whereas Armadillo, Eigen, etc. only have a subset of what I need, though their APIs are closer to what I am looking for).
Is there an equivalent to numpy.minimum in the GSL (i.e., element-wise minimum of two matrices)? This is just one example of the abstractions from Numpy that I am looking for. Do things like this simply have to be reimplemented manually when using the GSL? I note that the GSL provides for things like:
double gsl_matrix_min (const gsl_matrix * m)
But that simply provides the minimum value of the entire matrix. Element-wise comparisons aside, it doesn't even seem possible to report the minimum along a particular axis of a single matrix using the GSL. That surprises me.
Are my expectations misplaced?
You can implement an element-wise minimum easily in Armadillo, via the find() and .elem() functions:
mat A; A.randu(5,5);
mat B; B.randu(5,5);
umat indices = find(B < A);          // locations where B is smaller than A
mat C = A;
C.elem(indices) = B.elem(indices);   // overwrite those locations with B's values
For other functions that are not present in Armadillo, it might be possible to interface Armadillo matrices with GSL functions, through the .memptr() function.
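If you stay with the GSL, a hand-rolled version is only a few lines. A sketch (the function name gsl_matrix_element_min is mine, not part of the GSL; it assumes a, b, and out all have the same dimensions):

#include <gsl/gsl_matrix.h>
#include <algorithm>

// Element-wise minimum of two matrices, like numpy.minimum.
void gsl_matrix_element_min(const gsl_matrix* a, const gsl_matrix* b,
                            gsl_matrix* out)
{
    for (size_t i = 0; i < a->size1; ++i)
        for (size_t j = 0; j < a->size2; ++j)
            gsl_matrix_set(out, i, j,
                           std::min(gsl_matrix_get(a, i, j),
                                    gsl_matrix_get(b, i, j)));
}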

How to implement Matlab's mldivide (a.k.a. the backslash operator "\")

I'm currently trying to develop a small matrix-oriented math library (I'm using Eigen 3 for matrix data structures and operations), and I wanted to implement some handy Matlab functions, such as the widely used backslash operator (which is equivalent to mldivide) in order to compute the solution of linear systems (expressed in matrix form).
Is there any good detailed explanation of how this could be achieved? (I've already implemented the Moore-Penrose pseudoinverse pinv function with a classical SVD decomposition, but I've read somewhere that A\b isn't always pinv(A)*b; at least Matlab doesn't simply do that.)
Thanks
For x = A\b, the backslash operator encompasses a number of algorithms to handle different kinds of input matrices. So the matrix A is diagnosed and an execution path is selected according to its characteristics.
The following pseudo-code describes the selection process when A is a full (dense) matrix:
if size(A,1) == size(A,2)         % A is square
    if isequal(A,tril(A))         % A is lower triangular
        x = A \ b;                % This is a simple forward substitution on b
    elseif isequal(A,triu(A))     % A is upper triangular
        x = A \ b;                % This is a simple backward substitution on b
    else
        if isequal(A,A')          % A is symmetric
            [R,p] = chol(A);
            if (p == 0)           % A is symmetric positive definite
                x = R \ (R' \ b); % a forward and a backward substitution
                return
            end
        end
        [L,U,P] = lu(A);          % general, square A
        x = U \ (L \ (P*b));      % a forward and a backward substitution
    end
else                              % A is rectangular
    [Q,R] = qr(A);
    x = R \ (Q' * b);
end
For non-square matrices, QR decomposition is used. For square triangular matrices, it performs a simple forward/backward substitution. For square symmetric positive-definite matrices, Cholesky decomposition is used. Otherwise LU decomposition is used for general square matrices.
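If you only need a rough equivalent in Eigen, a much-simplified sketch of that dispatch might look like this (illustrative only; the real dispatcher also checks triangularity, symmetry, and positive definiteness, as in the pseudo-code above, and the function name backslash is mine):

#include <Eigen/Dense>

Eigen::VectorXd backslash(const Eigen::MatrixXd& A, const Eigen::VectorXd& b)
{
    if (A.rows() == A.cols())
        return A.partialPivLu().solve(b);     // square: LU with partial pivoting
    return A.colPivHouseholderQr().solve(b);  // rectangular: least squares via QR
}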
Update: MathWorks has updated the algorithm section in the doc page of mldivide with some nice flow charts. See here and here (full and sparse cases).
All of these algorithms have corresponding methods in LAPACK, and in fact it's probably what MATLAB is doing (note that recent versions of MATLAB ship with the optimized Intel MKL implementation).
The reason for having different methods is that MATLAB tries to use the most specific algorithm to solve the system of equations, one that takes advantage of the characteristics of the coefficient matrix (either because it would be faster or more numerically stable). So you could certainly use a general solver, but it won't be the most efficient.
In fact if you know what A is like beforehand, you could skip the extra testing process by calling linsolve and specifying the options directly.
If A is rectangular or singular, you could also use PINV to find a minimal-norm least-squares solution (implemented using SVD decomposition):
x = pinv(A)*b
All of the above applies to dense matrices; sparse matrices are a whole different story. Usually iterative solvers are used in such cases. I believe MATLAB uses UMFPACK and other related libraries from the SuiteSparse package for direct solvers.
When working with sparse matrices, you can turn on diagnostic information and see the tests performed and algorithms chosen using spparms:
spparms('spumoni',2)
x = A\b;
What's more, the backslash operator also works on gpuArrays, in which case it relies on cuBLAS and MAGMA to execute on the GPU.
It is also implemented for distributed arrays, which work in a distributed computing environment (the work is divided among a cluster of machines where each worker holds only part of the array, possibly because the entire matrix cannot be stored in memory all at once). The underlying implementation uses ScaLAPACK.
That's a pretty tall order if you want to implement all of that yourself :)

CUBLAS - matrix addition... how?

I am trying to use CUBLAS to sum two big matrices of unknown size. I need fully optimized code (if possible), so I chose not to rewrite the (simple) matrix addition code myself but to use CUBLAS, in particular the cublasSgemm function, which allows summing A and C (if B is the identity matrix): C = alpha*op(A)*op(B) + beta*C
The problem is: C and C++ store matrices in row-major format, while cublasSgemm is intended (for Fortran compatibility) to work in column-major format. You can specify whether A and B are to be transposed first, but you can NOT indicate that C should be transposed. So I'm unable to complete my matrix addition.
I can't transpose the C matrix by myself because the matrix is something like 20000x20000 maximum size.
Any idea on how to solve please?
cublas<t>geam was added in CUBLAS 5.0.
It computes the weighted sum of two optionally transposed matrices: C = alpha*op(A) + beta*op(B).
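For example, a sketch of C = A + B for m-by-n column-major matrices already on the device (the wrapper name matrixAdd is mine; error checking omitted):

#include <cublas_v2.h>

void matrixAdd(cublasHandle_t handle, int m, int n,
               const float* dA, const float* dB, float* dC)
{
    const float alpha = 1.0f, beta = 1.0f;
    // C = alpha*op(A) + beta*op(B), with no transposes here.
    cublasSgeam(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n,
                &alpha, dA, m,
                &beta,  dB, m,
                dC, m);
}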
If you're just adding the matrices, it doesn't actually matter. You give it alpha, Aij, beta, and Cij. It thinks you're giving it alpha, Aji, beta, and Cji, and gives you what it thinks is Cji = beta Cji + alpha Aji. But that's the correct Cij as far as you're concerned. My worry is when you start going to things which do matter -- like matrix products. There, there's likely no working around it.
But more to the point, you don't want to be using GEMM to do matrix addition -- you're doing a completely pointless matrix multiplication (which takes ~20,000^3 operations and many passes through memory) for an operation which should only require ~20,000^2 operations and a single pass! Treat the matrices as 20,000^2-long vectors and use saxpy.
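A sketch of the saxpy approach (the wrapper name matrixAddAxpy is mine; it accumulates B += A in place, treating the m-by-n matrices as flat vectors of m*n elements):

#include <cublas_v2.h>

void matrixAddAxpy(cublasHandle_t handle, int m, int n,
                   const float* dA, float* dB)
{
    const float alpha = 1.0f;
    cublasSaxpy(handle, m * n, &alpha, dA, 1, dB, 1);  // B = B + alpha*A
}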
Matrix multiplication is memory-bandwidth intensive, so there is a huge (factors of 10x or 100x) difference in performance between coding it yourself and using a tuned version. Ideally, you'd change the structures in your code to match the library. If you can't, in this case you can manage just by using linear algebra identities. The C-vs-Fortran ordering means that when you pass in A, CUBLAS "sees" A^T (A transpose). That's fine, and we can work around it: if what you want is C = A.B, pass in the matrices in the opposite order, B.A. Then the library sees (B^T . A^T) and calculates C^T = (A.B)^T; and when it passes back C^T, you get (in your ordering) C. Test it and see.
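A sketch of that ordering trick for row-major C = A*B, where A is m-by-k and B is k-by-n (the wrapper name gemmRowMajor is mine): asking CUBLAS for the column-major product B*A makes it write exactly the row-major C.

#include <cublas_v2.h>

void gemmRowMajor(cublasHandle_t handle, int m, int n, int k,
                  const float* dA, const float* dB, float* dC)
{
    const float alpha = 1.0f, beta = 0.0f;
    // CUBLAS sees our row-major B as a column-major n-by-k matrix and
    // our row-major A as a column-major k-by-m matrix, so B*A is an
    // n-by-m column-major result, i.e. C^T in its terms and C in ours.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, m, k,
                &alpha, dB, n,
                dA, k,
                &beta, dC, n);
}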