Templated function for sparse and dense matrices in RcppArmadillo - c++

I'm trying to define a templated function that can handle both sparse and dense matrix inputs using RcppArmadillo. I got the very simple case of sending a dense or sparse matrix to C++ and back to R to work like this:
library(inline); library(Rcpp); library(RcppArmadillo)
sourceCpp(code = "
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp ;
using namespace arma ;
template <typename T> T importexport_template(const T X) {
T ret = X ;
return ret ;
};
//[[Rcpp::export]]
SEXP importexport(SEXP X) {
return wrap( importexport_template(X) ) ;
}")
library(Matrix)
X <- diag(3)
X_sp <- as(X, "dgCMatrix")
importexport(X)
## [,1] [,2] [,3]
##[1,] 1 0 0
##[2,] 0 1 0
##[3,] 0 0 1
importexport(X_sp)
##3 x 3 sparse Matrix of class "dgCMatrix"
##
##[1,] 1 . .
##[2,] . 1 .
##[3,] . . 1
and I interpret that to mean that the templating basically works (i.e., a dense R matrix gets turned into an arma::mat, while a sparse R matrix gets turned into an arma::sp_mat object by the implicit calls to Rcpp::as, and the corresponding implicit Rcpp::wrap calls then do the right thing as well and return dense for dense and sparse for sparse).
The actual function I try to write needs multiple arguments of course, and that's where I fail -- doing something like:
sourceCpp(code ="
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp ;
using namespace arma ;
template <typename T> T scalarmult_template(const T X, double scale) {
T ret = X * scale;
return ret;
};
//[[Rcpp::export]]
SEXP scalarmult(SEXP X, double scale) {
return wrap(scalarmult_template(X, scale) ) ;
}")
fails because the compiler doesn't know how to resolve * at compile time for SEXPREC* const.
So I guess I need something like the switch-statement in this Rcpp Gallery snippet to properly dispatch to specific template functions, but I don't know how to write that for types that seem more complicated than INTSXP etc.
I think I know how to access the type I would need for such a switch statement, e.g.:
sourceCpp(code ="
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp ;
using namespace arma ;
//[[Rcpp::export]]
SEXP printtype(SEXP Xr) {
Rcpp::Rcout << TYPEOF(Xr) << std::endl ;
return R_NilValue;
}")
printtype(X)
##14
##NULL
printtype(X_sp)
##25
##NULL
but I don't understand how to proceed from there. What would a version of scalarmult_template that works for sparse and dense matrices look like?

Answering my own question based on @KevinUshey's comment. I do matrix multiplication for three cases: dense-dense, sparse-dense, and "indMatrix"-dense:
library(inline)
library(Rcpp)
library(RcppArmadillo)
library(Matrix)
library(rbenchmark)
sourceCpp(code="
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp ;
using namespace arma ;
arma::mat matmult_sp(const arma::sp_mat X, const arma::mat Y){
arma::mat ret = X * Y;
return ret;
};
arma::mat matmult_dense(const arma::mat X, const arma::mat Y){
arma::mat ret = X * Y;
return ret;
};
arma::mat matmult_ind(const SEXP Xr, const arma::mat Y){
// pre-multiplication with an index matrix is a permutation of Y's rows:
S4 X(Xr);
arma::uvec perm = X.slot("perm");
arma::mat ret = Y.rows(perm - 1);
return ret;
};
//[[Rcpp::export]]
arma::mat matmult_cpp(SEXP Xr, const arma::mat Y) {
if (Rf_isS4(Xr)) {
if(Rf_inherits(Xr, "dgCMatrix")) {
return matmult_sp(as<arma::sp_mat>(Xr), Y) ;
} ;
if(Rf_inherits(Xr, "indMatrix")) {
return matmult_ind(Xr, Y) ;
} ;
stop("unknown class of Xr") ;
} else {
return matmult_dense(as<arma::mat>(Xr), Y) ;
}
}")
n <- 10000
d <- 20
p <- 30
X <- matrix(rnorm(n*d), n, d)
X_sp <- as(diag(n)[,1:d], "dgCMatrix")
X_ind <- as(sample(1:d, n, rep=TRUE), "indMatrix")
Y <- matrix(1:(d*p), d, p)
matmult_cpp(as(X_ind, "ngTMatrix"), Y)
## Error: unknown class of Xr
all.equal(X%*%Y, matmult_cpp(X, Y))
## [1] TRUE
all.equal(as.vector(X_sp%*%Y),
as.vector(matmult_cpp(X_sp, Y)))
## [1] TRUE
all.equal(X_ind%*%Y, matmult_cpp(X_ind, Y))
## [1] TRUE
EDIT: This has been turned into an Rcpp Gallery post.
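The dispatch pattern used in the answer (inspect the run-time class once, then call a statically typed function) can be sketched without R at all. In the following minimal sketch, Dense, Sparse, Kind and AnyMatrix are hypothetical stand-ins for arma::mat, arma::sp_mat and the SEXP argument; they are not Rcpp or Armadillo types. Only scalarmult_template mirrors the template from the question.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for arma::mat and arma::sp_mat (not Armadillo types).
struct Dense  { std::vector<double> values; };
struct Sparse { std::map<std::size_t, double> nonzeros; };

// Each concrete type defines its own "matrix * scalar".
inline Dense operator*(Dense x, double s) {
    for (double& v : x.values) v *= s;
    return x;
}
inline Sparse operator*(Sparse x, double s) {
    for (auto& kv : x.nonzeros) kv.second *= s;
    return x;
}

// The template from the question: it compiles for any T that supports T * double.
template <typename T>
T scalarmult_template(const T& X, double scale) {
    T ret = X * scale;
    return ret;
}

// Stand-in for the SEXP argument: the concrete type is only known at run time.
enum class Kind { dense, sparse };
struct AnyMatrix {
    Kind   kind;
    Dense  dense;
    Sparse sparse;
};

// Dispatcher: look at the run-time tag once, then call the matching instantiation.
inline AnyMatrix scalarmult(const AnyMatrix& X, double scale) {
    AnyMatrix out{X.kind, Dense{}, Sparse{}};
    switch (X.kind) {
    case Kind::dense:
        out.dense = scalarmult_template(X.dense, scale);
        break;
    case Kind::sparse:
        out.sparse = scalarmult_template(X.sparse, scale);
        break;
    }
    return out;
}
```

In the Rcpp version the switch on Kind corresponds to the Rf_isS4 / Rf_inherits checks, and the two branches correspond to the as<arma::mat> and as<arma::sp_mat> conversions.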

Related

Eigenvector calculation in C++

How can I make a function in cpp in order to calculate the first "Q" eigenvectors of a matrix M?
I tried using this code, but failed.
#include <RcppArmadillo.h>
using namespace arma;
mat M;
int Q;
vec getEigen(M,Q) {
return eig_sym(M, Q);
}
The error message says:
"no matching function for call to "arma::col(arma::mat&, int&)"
Any idea? I am new at cpp and don't know what the message means.
Thanks
As noted in the comments, there is no function in Armadillo that returns a subset of the eigenvalues. However, one can combine .head() or .tail() with eig_sym() to extract a subset. In addition, it makes sense to use reverse(), since Armadillo returns eigenvalues in ascending order. For convenience I am using RcppArmadillo with Rcpp attributes here:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::vec getEigen(const arma::mat& M, int Q) {
return arma::reverse(arma::eig_sym(M).tail(Q));
}
/*** R
set.seed(42)
N <- 10
m <- matrix(rnorm(N * N), N, N)
m <- m + t(m)
getEigen(m, N/2)
*/
This is only for the eigenvalues, not the eigenvectors. Extracting the eigenvectors shouldn't be difficult, though.

Extract elements from a matrix based on the row and column indices with Armadillo

In R, I can extract matrix elements based on their indices as follows:
> m <- matrix(1:6, nrow = 3)
> m
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> row_index <- c(1, 2)
> col_index <- c(2, 2)
> m[cbind(row_index, col_index)]
[1] 4 5
Is there a native way to do this in Armadillo / RcppArmadillo? The best I could do is a custom function that uses the row and column indices to calculate the element index (see below). I'm mostly worried that the custom function won't perform as well.
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
NumericVector Rsubmatrix(arma::uvec rowInd, arma::uvec colInd, arma::mat m) {
arma::uvec ind = (colInd - 1) * m.n_rows + (rowInd - 1);
arma::vec ret = m.elem(ind);
return wrap(ret);
}
/*** R
Rsubmatrix(row_index, col_index, m)
*/
From the docs:
X.submat( vector_of_row_indices, vector_of_column_indices )
but that seems to only return matrix blocks. For non-simply-connected regions, I think your solution is the best, but you don't really need a function,
m.elem((colInd - 1) * m.n_rows + (rowInd - 1));
returns the vector without any problem. For clarity you could define a function to deal with the row+col to indices conversion,
inline arma::uvec arr2ind(arma::uvec c, arma::uvec r, int nrow)
{
return c * nrow + r;
}
// m.elem(arr2ind(colInd - 1, rowInd - 1, m.n_rows));
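The index arithmetic behind arr2ind can be checked in plain C++, without Armadillo. The sketch below assumes a matrix stored column by column in a std::vector (the layout R and Armadillo use) and zero-based indices; arr2ind_sketch and gather_elements are hypothetical helper names, not library functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: turn zero-based (row, col) pairs into column-major
// element indices, i.e. col * n_rows + row.
inline std::vector<std::size_t> arr2ind_sketch(const std::vector<std::size_t>& rows,
                                               const std::vector<std::size_t>& cols,
                                               std::size_t n_rows) {
    if (rows.size() != cols.size()) {
        throw std::invalid_argument("rows and cols must have the same length");
    }
    std::vector<std::size_t> idx(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        idx[i] = cols[i] * n_rows + rows[i];
    }
    return idx;
}

// Hypothetical helper: gather the addressed elements from column-major storage.
// at() throws std::out_of_range if an index falls outside the matrix.
inline std::vector<double> gather_elements(const std::vector<double>& m,
                                           const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t k : idx) {
        out.push_back(m.at(k));
    }
    return out;
}
```

For the 3 x 2 matrix from the question (values 1 to 6 stored column by column), the one-based pairs (1, 2) and (2, 2) become the zero-based pairs (0, 1) and (1, 1), which map to element indices 3 and 4, that is, the values 4 and 5.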
Let's try this...
In particular, you can subset by rowInd and colInd by writing your own loop that uses the (i,j) element accessor. Otherwise, the only other option is the solution that you proposed at the start of the question...
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// Optimized OP method
// [[Rcpp::export]]
arma::vec Rsubmatrix(const arma::mat& m, const arma::uvec& rowInd, const arma::uvec& colInd) {
return m.elem((colInd - 1) * m.n_rows + (rowInd - 1));
}
// Proposed Alternative
// [[Rcpp::export]]
arma::rowvec get_elements(const arma::mat& m, const arma::uvec& rowInd, const arma::uvec& colInd){
unsigned int n = rowInd.n_elem;
arma::rowvec out(n);
for(unsigned int i = 0; i < n; i++){
out(i) = m(rowInd[i]-1,colInd[i]-1);
}
return out;
}
Where:
m <- matrix(1:6, nrow = 3)
row_index <- c(1, 2)
col_index <- c(2, 2)
m[cbind(row_index, col_index)]
Gives:
[1] 4 5
And we have:
get_elements(m, row_index, col_index)
Giving:
[,1] [,2]
[1,] 4 5
Edit
Microbenchmark:
microbenchmark(Rsubmatrix(m, row_index, col_index), get_elements(m, row_index, col_index), times = 1e4)
Gives:
Unit: microseconds
expr min lq mean median uq max neval
Rsubmatrix(m, row_index, col_index) 2.836 3.111 4.129051 3.281 3.502 5016.652 10000
get_elements(m, row_index, col_index) 2.699 2.947 3.436844 3.115 3.335 716.742 10000
The two methods are close in timing. Note that the latter should be better, as it avoids two separate loops (one to calculate the indices, one to subset) and the additional temporary vector created to store the intermediate indices.
Edit
As of the Armadillo 7.200.0 release, the sub2ind() function can take matrix notation. It takes the matrix subscripts as a 2 x n matrix, where n denotes the number of elements to subset, and converts them into element indices.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::rowvec matrix_locs(arma::mat M, arma::umat locs) {
arma::uvec eids = sub2ind( size(M), locs ); // Obtain Element IDs
arma::vec v = M.elem( eids ); // Values of the Elements
return v.t(); // Transpose to mimic R
}
Calling in R:
cpp_locs <- locs - 1 # Shift indices from R to C++
(cpp_locs <- t(cpp_locs)) # Transpose matrix for 2 x n form
matrix_locs(M, cpp_locs) # Subset the matrix

Matrix multiplication in Rcpp

First of all, I am a novice user, so forgive my general ignorance. I am looking for a faster alternative to the %*% operator in R. Even though older posts suggest the use of RcppArmadillo, I have tried for 2 hours to make RcppArmadillo work without success. I always run into lexical issues that yield 'unexpected ...' errors. I have found the following function in Rcpp which I can make work:
library(Rcpp)
func <- '
NumericMatrix mmult( NumericMatrix m , NumericMatrix v, bool byrow=true )
{
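// compare with != ; "! m.nrow() == v.nrow()" negates m.nrow() first
if( m.nrow() != v.nrow() ) stop("Non-conformable arrays") ;
if( m.ncol() != v.ncol() ) stop("Non-conformable arrays") ;
// deep copy, so that the input matrix m is not modified
NumericMatrix out = clone(m) ;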
for (int i = 0; i < m.nrow(); i++)
{
for (int j = 0; j < m.ncol(); j++)
{
out(i,j)=m(i,j) * v(i,j) ;
}
}
return out ;
}
'
This function, however, performs element-wise multiplication and does not behave as %*%. Is there an easy way to modify the above code to achieve the intended result?
EDIT:
I have come up with a function using RcppEigen that seems to beat %*%:
etest <- cxxfunction(signature(tm="NumericMatrix",
tm2="NumericMatrix"),
plugin="RcppEigen",
body="
NumericMatrix tm22(tm2);
NumericMatrix tmm(tm);
const Eigen::Map<Eigen::MatrixXd> ttm(as<Eigen::Map<Eigen::MatrixXd> >(tmm));
const Eigen::Map<Eigen::MatrixXd> ttm2(as<Eigen::Map<Eigen::MatrixXd> >(tm22));
Eigen::MatrixXd prod = ttm*ttm2;
return(wrap(prod));
")
set.seed(123)
M1 <- matrix(sample(1e3),ncol=50)
M2 <- matrix(sample(1e3),nrow=50)
identical(etest(M1,M2), M1 %*% M2)
[1] TRUE
res <- microbenchmark(
+ etest(M1,M2),
+ M1 %*% M2,
+ times=10000L)
res
Unit: microseconds
expr min lq mean median uq max neval
etest(M1, M2) 5.709 6.61 7.414607 6.611 7.211 49.879 10000
M1 %*% M2 11.718 12.32 13.505272 12.621 13.221 58.592 10000
There are good reasons to rely on existing libraries / packages for standard tasks. The routines in the libraries are optimized, thoroughly tested, and a good means to keep the code compact, human-readable, and easy to maintain.
Therefore I think that using RcppArmadillo or RcppEigen should be preferable here. However, to answer your question, below is a possible Rcpp code to perform a matrix multiplication:
library(Rcpp)
cppFunction('NumericMatrix mmult(const NumericMatrix& m1, const NumericMatrix& m2){
if (m1.ncol() != m2.nrow()) stop ("Incompatible matrix dimensions");
NumericMatrix out(m1.nrow(),m2.ncol());
NumericVector rm1, cm2;
for (size_t i = 0; i < m1.nrow(); ++i) {
rm1 = m1(i,_);
for (size_t j = 0; j < m2.ncol(); ++j) {
cm2 = m2(_,j);
out(i,j) = std::inner_product(rm1.begin(), rm1.end(), cm2.begin(), 0.);
}
}
return out;
}')
Let's test it:
A <- matrix(c(1:6),ncol=2)
B <- matrix(c(0:7),nrow=2)
mmult(A,B)
# [,1] [,2] [,3] [,4]
#[1,] 4 14 24 34
#[2,] 5 19 33 47
#[3,] 6 24 42 60
identical(mmult(A,B), A %*% B)
#[1] TRUE
Hope this helps.
As benchmark tests show, the above Rcpp code is slower than R's built-in %*% operator. I assume that, while my Rcpp code can certainly be improved, it will be hard to beat the optimized code behind %*% in terms of performance:
library(microbenchmark)
set.seed(123)
M1 <- matrix(rnorm(1e4),ncol=100)
M2 <- matrix(rnorm(1e4),nrow=100)
identical(M1 %*% M2, mmult(M1,M2))
#[1] TRUE
res <- microbenchmark(
mmult(M1,M2),
M1 %*% M2,
times=1000L)
#> res
#Unit: microseconds
# expr min lq mean median uq max neval cld
# mmult(M1, M2) 1466.855 1484.8535 1584.9509 1494.0655 1517.5105 2699.643 1000 b
# M1 %*% M2 602.053 617.9685 687.6863 621.4335 633.7675 2774.954 1000 a
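One direction in which the hand-written loop could possibly be improved is memory access order. The sketch below is plain C++, not Rcpp; it assumes column-major storage (the layout R uses), and ColMajor and mmult_sketch are hypothetical names. The loops are ordered so that the innermost one walks down contiguous columns. It has not been benchmarked here, and a tuned BLAS will most likely still be faster.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Plain column-major matrix: element (i, j) lives at data[j * n_rows + i].
struct ColMajor {
    std::size_t n_rows;
    std::size_t n_cols;
    std::vector<double> data;
};

// Matrix product a %*% b with a (j, k, i) loop order: the innermost loop adds a
// scaled column of a to a column of the result, so both are read contiguously.
inline ColMajor mmult_sketch(const ColMajor& a, const ColMajor& b) {
    if (a.n_cols != b.n_rows) {
        throw std::invalid_argument("Incompatible matrix dimensions");
    }
    ColMajor out{a.n_rows, b.n_cols,
                 std::vector<double>(a.n_rows * b.n_cols, 0.0)};
    for (std::size_t j = 0; j < b.n_cols; ++j) {
        double* out_col = out.data.data() + j * out.n_rows;
        for (std::size_t k = 0; k < a.n_cols; ++k) {
            const double b_kj = b.data[j * b.n_rows + k];
            const double* a_col = a.data.data() + k * a.n_rows;
            for (std::size_t i = 0; i < a.n_rows; ++i) {
                out_col[i] += a_col[i] * b_kj;
            }
        }
    }
    return out;
}
```

With the small matrices A and B from the test above, the result holds the same twelve values as A %*% B, stored column by column.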
I would encourage you to try to work out your issues with RcppArmadillo. Using it is as simple as this example, which is also created by calling RcppArmadillo.package.skeleton():
// another simple example: outer product of a vector,
// returning a matrix
//
// [[Rcpp::export]]
arma::mat rcpparma_outerproduct(const arma::colvec & x) {
arma::mat m = x * x.t();
return m;
}
// and the inner product returns a scalar
//
// [[Rcpp::export]]
double rcpparma_innerproduct(const arma::colvec & x) {
double v = arma::as_scalar(x.t() * x);
return v;
}
There is actually more code in the example but this should give you an idea.
The following approach can also be used :
NumericMatrix mmult(NumericMatrix m, NumericMatrix v)
{
Environment base("package:base");
Function mat_Mult = base["%*%"];
return(mat_Mult(m, v));
}
With this approach, we use the operator %*% of R.

Trying to write a setdiff() function using RcppArmadillo gives compilation error

I'm trying to write a sort of analogue of R's setdiff() function in C++ using RcppArmadillo. My rather crude approach:
// [[Rcpp::export]]
arma::uvec my_setdiff(arma::uvec x, arma::uvec y){
// Coefficients of unsigned integer vector y form a subset of the coefficients of unsigned integer vector x.
// Returns set difference between the coefficients of x and those of y
int n2 = y.n_elem;
uword q1;
for (int j=0 ; j<n2 ; j++){
q1 = find(x==y[j]);
x.shed_row(q1);
}
return x;
}
fails at compilation time. The error reads:
fnsauxarma.cpp:622:29: error: no matching function for call to ‘arma::Col<double>::shed_row(const arma::mtOp<unsigned int, arma::mtOp<unsigned int, arma::Col<double>, arma::op_rel_eq>, arma::op_find>)’
I really have no idea what's going on, any help or comments would be greatly appreciated.
The problem is that arma::find returns a uvec, and the compiler doesn't know how to make the implicit conversion to arma::uword, as pointed out by @mtall. You can help the compiler out by using the templated arma::conv_to<T>::from() function. Also, I included another version of my_setdiff that returns an Rcpp::NumericVector because, although the first version returns the correct values, it's technically a matrix (i.e. it has dimensions), and I assume you would want this to be as compatible with R's setdiff as possible. This is accomplished by setting the dim attribute of the return vector to NULL, using R_NilValue and the attr() member function.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec my_setdiff(arma::uvec& x, const arma::uvec& y){
for (size_t j = 0; j < y.n_elem; j++) {
arma::uword q1 = arma::conv_to<arma::uword>::from(arma::find(x == y[j]));
x.shed_row(q1);
}
return x;
}
// [[Rcpp::export]]
Rcpp::NumericVector my_setdiff2(arma::uvec& x, const arma::uvec& y){
for (size_t j = 0; j < y.n_elem; j++) {
arma::uword q1 = arma::conv_to<arma::uword>::from(arma::find(x == y[j]));
x.shed_row(q1);
}
Rcpp::NumericVector x2 = Rcpp::wrap(x);
x2.attr("dim") = R_NilValue;
return x2;
}
/*** R
x <- 1:8
y <- 2:6
R> all.equal(setdiff(x,y), my_setdiff(x,y))
#[1] "Attributes: < target is NULL, current is list >" "target is numeric, current is matrix"
R> all.equal(setdiff(x,y), my_setdiff2(x,y))
#[1] TRUE
R> setdiff(x,y)
#[1] 1 7 8
R> my_setdiff(x,y)
# [,1]
# [1,] 1
# [2,] 7
# [3,] 8
R> my_setdiff2(x,y)
#[1] 1 7 8
*/
Edit:
For the sake of completeness, here is a more robust version of setdiff than the two implementations presented above:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::NumericVector arma_setdiff(arma::uvec& x, arma::uvec& y){
x = arma::unique(x);
y = arma::unique(y);
for (size_t j = 0; j < y.n_elem; j++) {
arma::uvec q1 = arma::find(x == y[j]);
if (!q1.empty()) {
x.shed_row(q1(0));
}
}
Rcpp::NumericVector x2 = Rcpp::wrap(x);
x2.attr("dim") = R_NilValue;
return x2;
}
/*** R
x <- 1:10
y <- 2:8
R> all.equal(setdiff(x,y), arma_setdiff(x,y))
#[1] TRUE
X <- 1:6
Y <- c(2,2,3)
R> all.equal(setdiff(X,Y), arma_setdiff(X,Y))
#[1] TRUE
*/
The previous versions would throw an error if you passed them vectors with non-unique elements, e.g.
R> my_setdiff2(X,Y)
error: conv_to(): given object doesn't have exactly one element
To solve the problem and more closely mirror R's setdiff, we just make x and y unique. Additionally, I switched out the arma::conv_to<>::from with q1(0) (where q1 is now a uvec instead of a uword), because uvec's are just a vector of uwords, and the explicit cast seemed a little inelegant.
I've used std::set_difference from the STL instead, converting back and forth from arma::uvec.
#include <RcppArmadillo.h>
#include <algorithm>
#include <iterator> // std::inserter
#include <vector>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec std_setdiff(arma::uvec& x, arma::uvec& y) {
std::vector<int> a = arma::conv_to< std::vector<int> >::from(arma::sort(x));
std::vector<int> b = arma::conv_to< std::vector<int> >::from(arma::sort(y));
std::vector<int> out;
std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
std::inserter(out, out.end()));
return arma::conv_to<arma::uvec>::from(out);
}
Edit: I thought a performance comparison might be in order. The difference becomes smaller when the relative sizes of the sets are in the opposite order.
a <- sample.int(350)
b <- sample.int(150)
microbenchmark::microbenchmark(std_setdiff(a, b), arma_setdiff(a, b))
> Unit: microseconds
> expr min lq mean median uq max neval cld
> std_setdiff(a, b) 11.548 14.7545 17.29930 17.107 19.245 36.779 100 a
> arma_setdiff(a, b) 60.727 65.0040 71.77804 66.714 72.702 138.133 100 b
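Two properties of std::set_difference are worth keeping in mind: both input ranges must be sorted, and repeated values are treated as a multiset, whereas R's setdiff returns unique values. A minimal plain C++ sketch that sorts and de-duplicates first is shown below; sorted_unique and setdiff_sketch are hypothetical names, and the result comes back sorted rather than in the original order of x.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: sorted copy of v with duplicates removed.
inline std::vector<unsigned> sorted_unique(std::vector<unsigned> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Values that occur in x but not in y, each reported once, in ascending order.
inline std::vector<unsigned> setdiff_sketch(const std::vector<unsigned>& x,
                                            const std::vector<unsigned>& y) {
    const std::vector<unsigned> a = sorted_unique(x);
    const std::vector<unsigned> b = sorted_unique(y);
    std::vector<unsigned> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

With x = 1:8 and y = 2:6 this yields 1, 7, 8, and with X = 1:6 and Y = c(2, 2, 3) it yields 1, 4, 5, 6, the same values that setdiff returns in R for these inputs.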
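The questioner might have already got the answer. However, the following template version may be more general. It is modeled on the setdiff function in MATLAB.
If P and Q are two sets, then their difference is given by P - Q or Q - P. If P = {1, 2, 3, 4} and Q = {4, 5, 6}, P - Q means the elements of P which are not in Q, i.e., in the above example P - Q = {1, 2, 3}. Note that the function below returns the elements of P - Q and of Q - P stacked together, and that it uses zero as a marker for removed elements, so genuine zeros in the inputs are dropped as well.
/* setdiff(t1, t2) is similar to the setdiff() function in MATLAB. It removes the common elements and
gives the uncommon elements in the vectors t1 and t2. T is expected to be an Armadillo column vector type. */
template <typename T>
T setdiff(T t1, T t2)
{
// n_elem gives the element count; arma::size() returns a size object, not an int
const arma::uword size_of_t1 = t1.n_elem;
const arma::uword size_of_t2 = t2.n_elem;
T Intersection_Elements;
arma::uvec iA, iB;
// iA and iB receive the positions of the common elements in t1 and t2
arma::intersect(Intersection_Elements, iA, iB, t1, t2);
for (arma::uword i = 0; i < iA.n_elem; i++)
{
t1(iA(i)) = 0;
}
for (arma::uword i = 0; i < iB.n_elem; i++)
{
t2(iB(i)) = 0;
}
// stack both vectors and keep only the entries that were not zeroed out
T t1_t2_vec(size_of_t1 + size_of_t2);
t1_t2_vec = arma::join_vert(t1, t2);
T DiffVec = arma::nonzeros(t1_t2_vec);
return DiffVec;
}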
Any suggestions for improving the performance of the algorithm are welcome.

Elementwise matrix multiplication: R versus Rcpp (How to speed this code up?)

I am new to C++ programming (using Rcpp for seamless integration into R), and I would appreciate some advice on how to speed up some calculations.
Consider the following example:
testmat <- matrix(1:9, nrow=3)
testvec <- 1:3
testmat*testvec
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 4 10 16
#[3,] 9 18 27
Here, R recycled testvec so that, loosely speaking, testvec "became" a matrix of the same dimensions as testmat for the purpose of this multiplication. Then the Hadamard product is returned. I wish to implement this behavior using Rcpp, that is, I want each element of the i-th row of the matrix testmat to be multiplied by the i-th element of the vector testvec. My benchmarks tell me that my implementations are extremely slow, and I would appreciate advice on how to speed this up. Here is my code:
First, using Eigen:
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
using namespace Rcpp;
using namespace Eigen;
// [[Rcpp::export]]
NumericMatrix E_matvecprod_elwise(NumericMatrix Xs, NumericVector ys){
Map<MatrixXd> X(as<Map<MatrixXd> >(Xs));
Map<VectorXd> y(as<Map<VectorXd> >(ys));
int k = X.cols();
int n = X.rows();
MatrixXd Y(n,k) ;
// here, I emulate R's recycling. I did not find an easier way of doing this. Any hint appreciated.
for(int i = 0; i < k; ++i) {
Y.col(i) = y;
}
MatrixXd out = X.cwiseProduct(Y);
return wrap(out);
}
Here is my implementation using Armadillo (adjusted to follow Dirk's example, see answer below):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::mat A_matvecprod_elwise(const arma::mat & X, const arma::vec & y){
int k = X.n_cols ;
arma::mat Y = repmat(y, 1, k) ; //
arma::mat out = X % Y;
return out;
}
Benchmarking these solutions using R, Eigen or Armadillo shows that both Eigen and Armadillo are about 2.5 times slower than R. Is there a way to speed these computations up, or to get at least as fast as R? Are there more elegant ways of setting this up? Any advice is appreciated and welcome. (I also encourage tangential remarks about programming style in general, as I am new to Rcpp / C++.)
Here are some reproducible benchmarks:
# for comparison, define R function:
R_matvecprod_elwise <- function(mat, vec) mat*vec
n <- 50000
k <- 50
X <- matrix(rnorm(n*k), nrow=n)
e <- rnorm(n)
benchmark(R_matvecprod_elwise(X, e), A_matvecprod_elwise(X, e), E_matvecprod_elwise(X, e),
columns = c("test", "replications", "elapsed", "relative"), order = "relative", replications = 1000)
This yields
test replications elapsed relative
1 R_matvecprod_elwise(X, e) 1000 10.89 1.000
2 A_matvecprod_elwise(X, e) 1000 26.87 2.467
3 E_matvecprod_elwise(X, e) 1000 27.73 2.546
As you can see, my Rcpp-solutions perform quite miserably. Any way to do it better?
If you want to speed up your calculations you will have to be a little careful about not making copies. This usually means sacrificing readability. Here is a version which makes no copies and modifies matrix X in place.
// [[Rcpp::export]]
NumericMatrix Rcpp_matvecprod_elwise(NumericMatrix & X, NumericVector & y){
unsigned int ncol = X.ncol();
unsigned int nrow = X.nrow();
int counter = 0;
for (unsigned int j=0; j<ncol; j++) {
for (unsigned int i=0; i<nrow; i++) {
X[counter++] *= y[i];
}
}
return X;
}
Here is what I get on my machine
> library(microbenchmark)
> microbenchmark(R=R_matvecprod_elwise(X, e), Arma=A_matvecprod_elwise(X, e), Rcpp=Rcpp_matvecprod_elwise(X, e))
Unit: milliseconds
expr min lq median uq max neval
R 8.262845 9.386214 10.542599 11.53498 12.77650 100
Arma 18.852685 19.872929 22.782958 26.35522 83.93213 100
Rcpp 6.391219 6.640780 6.940111 7.32773 7.72021 100
> all.equal(R_matvecprod_elwise(X, e), Rcpp_matvecprod_elwise(X, e))
[1] TRUE
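The no-copy idea above, together with the recycling rule described in the question, can be sketched in plain C++ without Rcpp. The sketch assumes the matrix is stored column by column in a std::vector, as R stores it; recycle_multiply_inplace is a hypothetical name. When the length of y equals the number of rows, the wrap-around multiplies every element of row i by y[i].

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Multiply column-major storage x element-wise by y, in place, restarting at
// the beginning of y whenever its end is reached (recycling).
inline void recycle_multiply_inplace(std::vector<double>& x,
                                     const std::vector<double>& y) {
    if (y.empty()) {
        throw std::invalid_argument("y must not be empty");
    }
    std::size_t j = 0;  // current position in y
    for (std::size_t k = 0; k < x.size(); ++k) {
        x[k] *= y[j];
        if (++j == y.size()) {
            j = 0;  // wrap around: this is the recycling step
        }
    }
}
```

Applied to the values 1 to 9 (the 3 x 3 testmat stored column by column) and y = 1, 2, 3, it produces 1, 4, 9, 4, 10, 18, 7, 16, 27, which is the column-by-column reading of the R result shown in the question.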
For starters, I'd write the Armadillo version (interface) as
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::mat A_matvecprod_elwise(const arma::mat & X, const arma::vec & y){
int k = X.n_cols ;
arma::mat Y = repmat(y, 1, k) ; //
arma::mat out = X % Y;
return out;
}
as you're doing an additional conversion in and out (though the wrap() gets added by the glue code). The const & is notional (as you learned via your last question, a SEXP is a pointer object that is lightweight to copy) but better style.
You didn't show your benchmark results so I can't comment on the effect of matrix size etc pp. I suspect you might get better answers on rcpp-devel than here. Your pick.
Edit: If you really want something cheap and fast, I would just do this:
// [[Rcpp::export]]
mat cheapHadamard(mat X, vec y) {
// should check the row dimension of X against the length of y here
for (unsigned int i=0; i<y.n_elem; i++) X.row(i) *= y(i);
return X;
}
which allocates no new memory and will hence be faster, and probably be competitive with R.
Test output:
R> cheapHadamard(testmat, testvec)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 4 10 16
[3,] 9 18 27
R>
My apologies for giving an essentially C answer to a C++ question, but as has been suggested the solution generally lies in the efficient BLAS implementation of things. Unfortunately, BLAS itself lacks a Hadamard multiply so you would have to implement your own.
Here is a pure Rcpp implementation that basically calls C code. If you want to make it proper C++, the worker function can be templated but for most applications using R that isn't a concern. Note that this also operates "in-place", which means that it modifies X without copying it.
// it may be necessary on your system to uncomment one of the following
//#define restrict __restrict__ // gcc/clang
//#define restrict __restrict // MS Visual Studio
//#define restrict // remove it completely
#include <Rcpp.h>
using namespace Rcpp;
#include <cstdlib>
using std::size_t;
void hadamardMultiplyMatrixByVectorInPlace(double* restrict x,
size_t numRows, size_t numCols,
const double* restrict y)
{
if (numRows == 0 || numCols == 0) return;
for (size_t col = 0; col < numCols; ++col) {
double* restrict x_col = x + col * numRows;
for (size_t row = 0; row < numRows; ++row) {
x_col[row] *= y[row];
}
}
}
// [[Rcpp::export]]
NumericMatrix C_matvecprod_elwise_inplace(NumericMatrix& X,
const NumericVector& y)
{
// do some dimension checking here
hadamardMultiplyMatrixByVectorInPlace(X.begin(), X.nrow(), X.ncol(),
y.begin());
return X;
}
Here is a version that makes a copy first. I don't know Rcpp well enough to do this natively and not incur a substantial performance hit. Creating and returning a NumericMatrix(numRows, numCols) on the stack causes the code to run about 30% slower.
#include <Rcpp.h>
using namespace Rcpp;
#include <cstdlib>
using std::size_t;
#include <R.h>
#include <Rdefines.h>
void hadamardMultiplyMatrixByVector(const double* restrict x,
size_t numRows, size_t numCols,
const double* restrict y,
double* restrict z)
{
if (numRows == 0 || numCols == 0) return;
for (size_t col = 0; col < numCols; ++col) {
const double* restrict x_col = x + col * numRows;
double* restrict z_col = z + col * numRows;
for (size_t row = 0; row < numRows; ++row) {
z_col[row] = x_col[row] * y[row];
}
}
}
// [[Rcpp::export]]
SEXP C_matvecprod_elwise(const NumericMatrix& X, const NumericVector& y)
{
size_t numRows = X.nrow();
size_t numCols = X.ncol();
// do some dimension checking here
SEXP Z = PROTECT(Rf_allocVector(REALSXP, (int) (numRows * numCols)));
SEXP dimsExpr = PROTECT(Rf_allocVector(INTSXP, 2));
int* dims = INTEGER(dimsExpr);
dims[0] = (int) numRows;
dims[1] = (int) numCols;
Rf_setAttrib(Z, R_DimSymbol, dimsExpr);
hadamardMultiplyMatrixByVector(X.begin(), X.nrow(), X.ncol(), y.begin(), REAL(Z));
UNPROTECT(2);
return Z;
}
If you're curious about the usage of restrict, it means that you as the programmer enter a contract with the compiler that different bits of memory do not overlap, allowing the compiler to make certain optimizations. The restrict keyword is part of C99 but not of standard C++; many C++ compilers provide it as an extension under a name such as __restrict__ or __restrict, which is what the #define lines above map it to.
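As a small self-contained illustration of the no-overlap contract, the sketch below selects a compiler-specific spelling through a macro. SKETCH_RESTRICT and multiply_into are hypothetical names introduced here, and the macro falls back to nothing on compilers that are not recognised.

```cpp
#include <cstddef>

// Pick the extension spelling the compiler understands (assumption: MSVC uses
// __restrict, GCC and Clang use __restrict__); otherwise drop the qualifier.
#if defined(_MSC_VER)
#define SKETCH_RESTRICT __restrict
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_RESTRICT __restrict__
#else
#define SKETCH_RESTRICT
#endif

// z[i] = x[i] * y[i]. The caller promises that z does not overlap x or y, which
// lets the compiler keep loaded values in registers and vectorise the loop.
inline void multiply_into(const double* SKETCH_RESTRICT x,
                          const double* SKETCH_RESTRICT y,
                          double* SKETCH_RESTRICT z,
                          std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        z[i] = x[i] * y[i];
    }
}
```

Passing overlapping buffers (for example z pointing into x) would break the promise and give undefined behaviour, so the qualifier should only be used where the caller can guarantee distinct memory.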
Some R code to benchmark:
require(rbenchmark)
n <- 50000
k <- 50
X <- matrix(rnorm(n*k), nrow=n)
e <- rnorm(n)
R_matvecprod_elwise <- function(mat, vec) mat*vec
all.equal(R_matvecprod_elwise(X, e), C_matvecprod_elwise(X, e))
X_dup <- X + 0
all.equal(R_matvecprod_elwise(X, e), C_matvecprod_elwise_inplace(X_dup, e))
benchmark(R_matvecprod_elwise(X, e),
C_matvecprod_elwise(X, e),
C_matvecprod_elwise_inplace(X, e),
columns = c("test", "replications", "elapsed", "relative"),
order = "relative", replications = 1000)
And the results:
test replications elapsed relative
3 C_matvecprod_elwise_inplace(X, e) 1000 3.317 1.000
2 C_matvecprod_elwise(X, e) 1000 7.174 2.163
1 R_matvecprod_elwise(X, e) 1000 10.670 3.217
Finally, the in-place version may actually be faster than this benchmark suggests, because the benchmark multiplies into the same matrix on every replication, and the resulting overflow mayhem can distort the timings.
Edit:
Removed the loop unrolling, as it provided no benefit and was otherwise distracting.