Multiplying complex matrices in R using C++

Suppose that A is a complex matrix. I am interested in computing the product A%*%Conj(t(A)) in R efficiently. As far as I understand, using C++ should speed things up significantly, so that is what I am trying to do.
I have the following code for real matrices that I can use in R.
library(Rcpp);
library(inline);
library(RcppEigen);
crossprodCpp <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::Lower;
const Map<MatrixXd> A(as<Map<MatrixXd> >(AA));
const int m(A.rows());
MatrixXd AAt(MatrixXd(m, m).setZero().selfadjointView<Lower>().rankUpdate(A));
return wrap(AAt);
'
fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen")
A<-matrix(rnorm(100^2),100)
all.equal(fcprd(A),tcrossprod(A))
fcprd(A) runs much faster on my laptop than tcrossprod(A). This is what I get for A<-matrix(rnorm(1000^2),1000):
microbenchmark::microbenchmark('tcrossprod(A)'=tcrossprod(A),'A%*%t(A)'=A%*%t(A),fcprd=fcprd(A))
Unit: milliseconds
expr min lq mean median uq max neval
tcrossprod(A) 428.06452 435.9700 468.9323 448.8168 504.2628 618.7681 100
A%*%t(A) 722.24053 736.6197 775.4814 767.7668 809.8356 903.8592 100
fcprd 95.04678 100.0733 111.5021 103.6616 107.2551 197.4479 100
However, this code only works for matrices with double precision entries. How could I modify this code so that it works for complex matrices?
I have a very limited knowledge of programming, but I am trying to learn.
Any help is much appreciated!

The Eigen library also supports complex entries via Eigen::MatrixXcd. So in principle it should work if you replace MatrixXd with MatrixXcd. However, this does not compile, probably because there is no as-function for complex matrices using Map (cf. https://github.com/RcppCore/RcppEigen/blob/master/inst/unitTests/runit.RcppEigen.R#L205). The as-functions are needed to convert between R data types and C++/Eigen data types (cf. http://dirk.eddelbuettel.com/code/rcpp/Rcpp-extending.pdf). If you do not use Map, then you can use this:
crossprodCpp <- '
using Eigen::MatrixXcd;
using Eigen::Lower;
const MatrixXcd A(as<MatrixXcd>(AA));
const int m(A.rows());
MatrixXcd AAt(MatrixXcd(m, m).setZero().selfadjointView<Lower>().rankUpdate(A));
return wrap(AAt);
'
fcprd <- inline::cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen")
N <- 100
A <- matrix(complex(real = rnorm(N * N), imaginary = rnorm(N * N)), N)
all.equal(fcprd(A), A %*% Conj(t(A)))
However, this is slower than the base R version in my tests:
N <- 1000
A <- matrix(complex(real = rnorm(N * N), imaginary = rnorm(N * N)), N)
all.equal(fcprd(A), A %*% Conj(t(A)))
#> [1] TRUE
microbenchmark::microbenchmark(base = A %*% Conj(t(A)), eigen = fcprd(A))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> base 111.6512 124.4490 145.7583 140.9199 160.3420 241.8986 100
#> eigen 453.6702 501.5419 535.0192 537.2925 564.8746 628.4999 100
Note that matrix multiplication in R is done via BLAS. However, the default BLAS implementation used by R is not very fast. One way to improve R's performance is to use an optimized BLAS library, cf. https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters.
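To check which BLAS/LAPACK libraries your R session is actually linked against (these fields are available in recent versions of R; the path in the comment is just an illustration):
si <- sessionInfo()
si$BLAS     # e.g. "/usr/lib/R/lib/libRblas.so" for the reference BLAS
si$LAPACK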
Alternatively you can use the BLAS function zherk if you have a full BLAS available. Very rough:
dyn.load("/usr/lib/libblas.so")
zherk <- function(a, uplo = 'u', trans = 'n') {
    n <- nrow(a)
    k <- ncol(a)
    c <- matrix(complex(real = 0, imaginary = 0), nrow = n, ncol = n)
    z <- .Fortran("zherk",
                  uplo = as.character(uplo),
                  trans = as.character(trans),
                  n = as.integer(n),
                  k = as.integer(k),
                  alpha = as.double(1),
                  a = as.complex(a),
                  lda = as.integer(n),
                  beta = as.double(0),
                  c = as.complex(c),
                  ldc = as.integer(n))
    matrix(z$c, nrow = n, ncol = n)
}
N <- 2
A <- matrix(complex(real = rnorm(N * N), imaginary = rnorm(N * N)), N)
zherk(A, uplo = "l") - A %*% Conj(t(A))
Note that this fills only the upper (or lower) triangular part but is quite fast (a helper for completing the full matrix follows the benchmark below):
microbenchmark::microbenchmark(base = A %*% Conj(t(A)), blas = zherk(A))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> base 112.5588 117.12531 146.10026 138.37565 167.6811 282.3564 100
#> blas 66.9541 70.12438 91.44617 82.74522 108.4979 188.3728 100
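If the full Hermitian matrix is needed, the missing triangle can be filled in afterwards at little extra cost. A small helper, assuming the default uplo = 'u' (so the upper triangle and the diagonal are valid):
zherk_full <- function(a) {
    h <- zherk(a)                                # upper triangle + diagonal filled
    h[lower.tri(h)] <- Conj(t(h))[lower.tri(h)]  # mirror the upper triangle with conjugation
    h
}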

Here is a way to bind an Eigen::Map<Eigen::MatrixXcd> object in Rcpp. The solution works in an R package setup, but I'm not sure about an easy way to put it together using the inline library.
First, you need to provide the following specialization in your inst/include/mylib.h so that this header gets included in the RcppExports.cpp:
#include <complex>
#include <Eigen/Core>
#include <Eigen/Dense>
#include <Rcpp.h>
namespace Rcpp {
namespace traits {
template<>
class Exporter<Eigen::Map<Eigen::Matrix<std::complex<double>, Eigen::Dynamic, Eigen::Dynamic> > > {
    using OUT = typename Eigen::Map<Eigen::Matrix<std::complex<double>, Eigen::Dynamic, Eigen::Dynamic> >;
    const static int RTYPE = ::Rcpp::traits::r_sexptype_traits<std::complex<double>>::rtype;
    Rcpp::Vector<RTYPE> vec;
    int d_ncol, d_nrow;
public:
    Exporter(SEXP x)
        : vec(x), d_ncol(1)
        , d_nrow(Rf_xlength(x)) {
        if (TYPEOF(x) != RTYPE)
            throw std::invalid_argument("Wrong R type for mapped matrix");
        if (::Rf_isMatrix(x)) {
            int* dims = INTEGER(::Rf_getAttrib(x, R_DimSymbol));
            d_nrow = dims[0];
            d_ncol = dims[1];
        }
    }
    OUT get() { return OUT(reinterpret_cast<std::complex<double>*>(vec.begin()), d_nrow, d_ncol); }
};
}}
The only difference from the unspecialized Exporter available in RcppEigenWrap.h is the reinterpret_cast on the last line. Since both std::complex and Rcomplex are compatible with the C99 complex type, they are supposed to have identical memory layouts regardless of the implementation.
Wrapping it up, you can now create your function as:
// [[Rcpp::export]]
Eigen::MatrixXcd selfadj_mult(const Eigen::Map<Eigen::MatrixXcd>& mat) {
    // A * A^H is Hermitian: the diagonal is real, but the off-diagonal
    // entries are complex, so return the full complex result
    Eigen::MatrixXcd result = mat * mat.adjoint();
    return result;
}
and then invoke the function in R as:
library(mylib)
library(microbenchmark)
N <- 1000
A <- matrix(complex(real = rnorm(N * N), imaginary = rnorm(N * N)), N)
microbenchmark::microbenchmark(
base = A %*% Conj(t(A))
, eigen = mylib::selfadj_mult(A)
, times = 100L
)
The code is compiled on centos7/gcc83 with -O3 -DNDEBUG -flto -march=generic. R has been built from source with the exact same compiler/flags (using the default BLAS binding). Results are:
Unit: seconds
expr min lq mean median uq max neval
base 2.9030320 2.9045865 2.9097162 2.9053835 2.9093232 2.9614318 100
eigen 1.1978697 1.2004888 1.2134219 1.2031046 1.2057647 1.3035751 100

Related

Rcpp fast statistical mode function with vector input of any type

I'm trying to build a super fast mode function for R to use for aggregating large categorical datasets. The function should take vector input of all supported R types and return the mode. I have read this post, this help page, and others, but I was not able to make the function accept all R data types. My code currently works for numeric vectors only, relying on Rcpp sugar wrapper functions:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int Mode(NumericVector x, bool narm = false)
{
    if (narm) x = x[!is_na(x)];
    NumericVector ux = unique(x);
    int y = ux[which_max(table(match(x, ux)))];
    return y;
}
In addition, I was wondering whether the 'narm' argument can be renamed 'na.rm' without causing errors. And of course, if there is a faster way to code a mode function in C++, I would be grateful to know about it.
In order to make the function work for any vector input, you could implement #JosephWood's algorithm for any data type you want to support and call it from a switch(TYPEOF(x)). But that would be lots of code duplication. Instead, it is better to make a generic function that can work on any Vector<RTYPE> argument. If we follow R's paradigm that everything is a vector and let the function also return a Vector<RTYPE>, then we can make use of RCPP_RETURN_VECTOR. Note that we need C++11 to be able to pass additional arguments to the function called by RCPP_RETURN_VECTOR. One tricky thing is that you need the storage type for Vector<RTYPE> in order to create a suitable std::unordered_map. Here Rcpp::traits::storage_type<RTYPE>::type comes to the rescue. However, std::unordered_map does not know how to deal with complex numbers from R. For simplicity, I am disabling this special case.
Putting it all together:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
#include <unordered_map>
template <int RTYPE>
Vector<RTYPE> fastModeImpl(Vector<RTYPE> x, bool narm){
    if (narm) x = x[!is_na(x)];
    int myMax = 1;
    Vector<RTYPE> myMode(1);
    // special case for factors == INTSXP with "class" and "levels" attribute
    if (x.hasAttribute("levels")){
        myMode.attr("class") = x.attr("class");
        myMode.attr("levels") = x.attr("levels");
    }
    std::unordered_map<typename Rcpp::traits::storage_type<RTYPE>::type, int> modeMap;
    modeMap.reserve(x.size());
    for (std::size_t i = 0, len = x.size(); i < len; ++i) {
        auto it = modeMap.find(x[i]);
        if (it != modeMap.end()) {
            ++(it->second);
            if (it->second > myMax) {
                myMax = it->second;
                myMode[0] = x[i];
            }
        } else {
            modeMap.insert({x[i], 1});
        }
    }
    return myMode;
}
template <>
Vector<CPLXSXP> fastModeImpl(Vector<CPLXSXP> x, bool narm) {
    stop("Not supported SEXP type!");
}
// [[Rcpp::export]]
SEXP fastMode( SEXP x, bool narm = false ){
    RCPP_RETURN_VECTOR(fastModeImpl, x, narm);
}
/*** R
set.seed(1234)
s <- sample(1e5, replace = TRUE)
fastMode(s)
fastMode(s + 0.1)
l <- sample(c(TRUE, FALSE), 11, replace = TRUE)
fastMode(l)
c <- sample(letters, 1e5, replace = TRUE)
fastMode(c)
f <- as.factor(c)
fastMode(f)
*/
Output:
> set.seed(1234)
> s <- sample(1e5, replace = TRUE)
> fastMode(s)
[1] 85433
> fastMode(s + 0.1)
[1] 85433.1
> l <- sample(c(TRUE, FALSE), 11, replace = TRUE)
> fastMode(l)
[1] TRUE
> c <- sample(letters, 1e5, replace = TRUE)
> fastMode(c)
[1] "z"
> f <- as.factor(c)
> fastMode(f)
[1] z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
As noted above, the used algorithm comes from Joseph Wood's answer, which has been explicitly dual-licensed under CC-BY-SA and GPL >= 2. I am following Joseph and hereby license the code in this answer under the GPL (version 2 or later) in addition to the implicit CC-BY-SA license.
In your Mode function, since you are mostly calling sugar wrapper functions, you won't see that much improvement over base R. In fact, simply writing a faithful base R translation, we have:
baseMode <- function(x, narm = FALSE) {
    if (narm) x <- x[!is.na(x)]
    ux <- unique(x)
    ux[which.max(table(match(x, ux)))]
}
And benchmarking, we have:
set.seed(1234)
s <- sample(1e5, replace = TRUE)
library(microbenchmark)
microbenchmark(Mode(s), baseMode(s), times = 10, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
Mode(s) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10
baseMode(s) 1.490765 1.645367 1.571132 1.616061 1.637181 1.448306 10
Typically, when we undertake the effort of writing our own compiled code, we would expect bigger gains. Simply wrapping these already efficient compiled functions in Rcpp isn't going to magically get you the gains you expect. In fact, on larger examples the base solution is faster. Observe:
set.seed(1234)
sBig <- sample(1e6, replace = TRUE)
system.time(Mode(sBig))
user system elapsed
1.410 0.036 1.450
system.time(baseMode(sBig))
user system elapsed
0.915 0.025 0.943
To address your question of writing a faster mode function, we can make use of std::unordered_map, which is very similar to table underneath the hood (i.e. they are both hash tables at their heart). Additionally, since you are returning a single integer, we can safely assume that we can replace NumericVector with IntegerVector and also that you are not concerned with returning every value that occurs the most.
The algorithm below can be modified to return the true mode, but I will leave that as an exercise (hint: you will need std::vector along with taking some sort of action when it->second == myMax; a rough sketch is given after the benchmarks below). N.B. you will also need to add // [[Rcpp::plugins(cpp11)]] at the top of your cpp file for std::unordered_map and auto.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
#include <unordered_map>
// [[Rcpp::export]]
int fastIntMode(IntegerVector x, bool narm = false) {
    if (narm) x = x[!is_na(x)];
    int myMax = 1;
    int myMode = 0;
    std::unordered_map<int, int> modeMap;
    modeMap.reserve(x.size());
    for (std::size_t i = 0, len = x.size(); i < len; ++i) {
        auto it = modeMap.find(x[i]);
        if (it != modeMap.end()) {
            ++(it->second);
            if (it->second > myMax) {
                myMax = it->second;
                myMode = x[i];
            }
        } else {
            modeMap.insert({x[i], 1});
        }
    }
    return myMode;
}
And the benchmarks:
microbenchmark(Mode(s), baseMode(s), fastIntMode(s), times = 15, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
Mode(s) 6.428343 6.268131 6.622914 6.134388 6.881746 7.78522 15
baseMode(s) 9.757491 9.404101 9.454857 9.169315 9.018938 10.16640 15
fastIntMode(s) 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 15
Now we are talking... about 6x faster than the original and 9x faster than base. They all return the same value:
fastIntMode(s)
##[1] 85433
baseMode(s)
##[1] 85433
Mode(s)
##[1] 85433
And for our larger example:
## base R returned in 0.943s
system.time(fastIntMode(sBig))
user system elapsed
0.217 0.006 0.224
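For completeness, here is one rough sketch of the exercise hinted at above: collect every value tied at the maximal count. Caveat: as written, values seen only once never enter the tie set, so an all-unique input yields an empty result.
library(Rcpp)
cppFunction(plugins = "cpp11",
            includes = c("#include <unordered_map>", "#include <vector>"),
            code = '
IntegerVector fastIntModes(IntegerVector x, bool narm = false) {
    if (narm) x = x[!is_na(x)];
    int myMax = 1;
    std::vector<int> modes;               // all values tied at the maximal count
    std::unordered_map<int, int> modeMap;
    modeMap.reserve(x.size());
    for (std::size_t i = 0, len = x.size(); i < len; ++i) {
        auto it = modeMap.find(x[i]);
        if (it != modeMap.end()) {
            ++(it->second);
            if (it->second > myMax) {     // new strict maximum:
                myMax = it->second;       // discard the old ties
                modes.clear();
                modes.push_back(x[i]);
            } else if (it->second == myMax) {
                modes.push_back(x[i]);    // tie with the current maximum
            }
        } else {
            modeMap.insert({x[i], 1});
        }
    }
    return wrap(modes);
}')
fastIntModes(c(1L, 2L, 2L, 3L, 3L))  # returns c(2L, 3L)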
In addition to the implicit CC-BY-SA license I hereby license the code in this answer under the GPL >= 2.
To follow up with some shameless self-promotion, I have now published a package collapse on CRAN which includes a full set of Fast Statistical Functions, amongst them the generic function fmode. The implementation is based on index hashing and is even faster than the solution above. fmode can be used to perform simple, grouped, and/or weighted mode calculations on vectors, matrices, data.frames, and dplyr grouped tibbles. The syntax is:
fmode(x, g = NULL, w = NULL, ...)
where x is a vector, matrix, data.frame or grouped_df, g is a grouping vector or list of grouping vectors, and w is a vector of weights. A compact solution to categorical and mixed aggregation problems is further provided by the function collap. The code
collap(data, ~ id1 + id2, FUN = fmean, catFUN = fmode)
aggregates the mixed-type data.frame data, applying fmean to numeric columns and fmode to categorical columns. More customized calls are also possible. Together with the Fast Statistical Functions, collap is just as fast as data.table on large numeric data, and categorical and weighted aggregations are significantly faster than anything that can presently be done with data.table.
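For instance, a small usage sketch (assuming the collapse package is installed; the vectors here are made up for the example):
library(collapse)
x <- c("a", "b", "b", "a", "b")
g <- c(1, 1, 1, 2, 2)
fmode(x)         # mode of the whole vector: "b"
fmode(x, g = g)  # one mode per group; group 2 is a tie, resolved by the package's default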

Armadillo - Norm of each small block in a long vector

I am using Armadillo in C++.
Suppose I have a long vector, say with 10 elements. I want to take the 2-norm of each block of 2 adjacent values, so in the end I will have 5 values.
In R I can convert that vector into a matrix and use apply, but I am not sure how to do it in Armadillo. I appreciate any help.
You just have to create a matrix from your vector and then loop through the columns.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec foo_Cpp(arma::vec x) {
    // Note that the dimension of x must be divisible by two.
    arma::mat X = arma::mat(x.memptr(), 2, x.n_elem/2);
    arma::uword n = X.n_cols;
    arma::vec norms = arma::vec(n);
    for (arma::uword i = 0; i < n; i++) {
        norms(i) = arma::norm(X.col(i), 2);
    }
    return norms;
}
/*** R
foo_R <- function(x) {
    X <- matrix(x, 2, length(x)/2)
    apply(X, 2, norm, type = "2")
}
x <- rnorm(1000)
all.equal(foo_R(x), c(foo_Cpp(x)))
microbenchmark::microbenchmark(foo_R(x), foo_Cpp(x))
*/
> all.equal(foo_R(x), c(foo_Cpp(x)))
[1] TRUE
> microbenchmark::microbenchmark(foo_R(x), foo_Cpp(x))
Unit: microseconds
expr min lq mean median uq max neval
foo_R(x) 17907.290 19640.24 21548.06789 20386.5815 21212.609 50780.584 100
foo_Cpp(x) 5.133 6.34 26.48266 19.4705 21.734 1191.124 100

Matrix multiplication in Rcpp

First of all, I am a novice user, so forgive my general ignorance. I am looking for a faster alternative to the %*% operator in R. Even though older posts suggest the use of RcppArmadillo, I have tried for 2 hours to make RcppArmadillo work without success. I always run into lexical issues that yield 'unexpected ...' errors. I have found the following function in Rcpp which I can make work:
library(Rcpp)
func <- '
NumericMatrix mmult( NumericMatrix m , NumericMatrix v, bool byrow=true )
{
    // dimension checks; note that "! m.nrow() == v.nrow()" would negate
    // m.nrow() first due to operator precedence, so use != instead
    if (m.nrow() != v.nrow()) stop("Non-conformable arrays");
    if (m.ncol() != v.ncol()) stop("Non-conformable arrays");
    NumericMatrix out(m);
    for (int i = 0; i < m.nrow(); i++)
    {
        for (int j = 0; j < m.ncol(); j++)
        {
            out(i,j) = m(i,j) * v(i,j);
        }
    }
    return out;
}
'
This function, however, performs element-wise multiplication and does not behave as %*%. Is there an easy way to modify the above code to achieve the intended result?
EDIT:
I have come up with a function using RcppEigen that seems to beat %*%:
library(inline)  # provides cxxfunction()
etest <- cxxfunction(signature(tm="NumericMatrix",
                               tm2="NumericMatrix"),
                     plugin="RcppEigen",
                     body="
NumericMatrix tm22(tm2);
NumericMatrix tmm(tm);
const Eigen::Map<Eigen::MatrixXd> ttm(as<Eigen::Map<Eigen::MatrixXd> >(tmm));
const Eigen::Map<Eigen::MatrixXd> ttm2(as<Eigen::Map<Eigen::MatrixXd> >(tm22));
Eigen::MatrixXd prod = ttm*ttm2;
return(wrap(prod));
")
set.seed(123)
M1 <- matrix(sample(1e3),ncol=50)
M2 <- matrix(sample(1e3),nrow=50)
identical(etest(M1,M2), M1 %*% M2)
[1] TRUE
res <- microbenchmark(
    etest(M1,M2),
    M1 %*% M2,
    times=10000L)
res
Unit: microseconds
expr min lq mean median uq max neval
etest(M1, M2) 5.709 6.61 7.414607 6.611 7.211 49.879 10000
M1 %*% M2 11.718 12.32 13.505272 12.621 13.221 58.592 10000
There are good reasons to rely on existing libraries / packages for standard tasks. The routines in the libraries are
- optimized
- thoroughly tested
- a good means to keep the code compact, human-readable, and easy to maintain.
Therefore I think that using RcppArmadillo or RcppEigen should be preferable here. However, to answer your question, below is a possible Rcpp code to perform a matrix multiplication:
library(Rcpp)
cppFunction('NumericMatrix mmult(const NumericMatrix& m1, const NumericMatrix& m2){
    if (m1.ncol() != m2.nrow()) stop ("Incompatible matrix dimensions");
    NumericMatrix out(m1.nrow(), m2.ncol());
    NumericVector rm1, cm2;
    for (int i = 0; i < m1.nrow(); ++i) {
        rm1 = m1(i, _);
        for (int j = 0; j < m2.ncol(); ++j) {
            cm2 = m2(_, j);
            out(i, j) = std::inner_product(rm1.begin(), rm1.end(), cm2.begin(), 0.);
        }
    }
    return out;
}')
Let's test it:
A <- matrix(c(1:6),ncol=2)
B <- matrix(c(0:7),nrow=2)
mmult(A,B)
# [,1] [,2] [,3] [,4]
#[1,] 4 14 24 34
#[2,] 5 19 33 47
#[3,] 6 24 42 60
identical(mmult(A,B), A %*% B)
#[1] TRUE
Hope this helps.
As benchmark tests show, the above Rcpp code is slower than R's built-in %*% operator. I assume that, while my Rcpp code can certainly be improved, it will be hard to beat the optimized code behind %*% in terms of performance:
library(microbenchmark)
set.seed(123)
M1 <- matrix(rnorm(1e4),ncol=100)
M2 <- matrix(rnorm(1e4),nrow=100)
identical(M1 %*% M2, mmult(M1,M2))
#[1] TRUE
res <- microbenchmark(
mmult(M1,M2),
M1 %*% M2,
times=1000L)
#> res
#Unit: microseconds
# expr min lq mean median uq max neval cld
# mmult(M1, M2) 1466.855 1484.8535 1584.9509 1494.0655 1517.5105 2699.643 1000 b
# M1 %*% M2 602.053 617.9685 687.6863 621.4335 633.7675 2774.954 1000 a
I would encourage you to try to work out your issues with RcppArmadillo. Using it is as simple as this example, also created by calling RcppArmadillo.package.skeleton():
// another simple example: outer product of a vector,
// returning a matrix
//
// [[Rcpp::export]]
arma::mat rcpparma_outerproduct(const arma::colvec & x) {
    arma::mat m = x * x.t();
    return m;
}
// and the inner product returns a scalar
//
// [[Rcpp::export]]
double rcpparma_innerproduct(const arma::colvec & x) {
    double v = arma::as_scalar(x.t() * x);
    return v;
}
There is actually more code in the example but this should give you an idea.
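If you want to try that route, generating the example package is a one-liner (a sketch; the package name is arbitrary):
library(RcppArmadillo)
RcppArmadillo.package.skeleton("mypkg")  # creates a buildable package containing these examples
# then build and install it, e.g. from a shell: R CMD INSTALL mypkg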
The following approach can also be used:
NumericMatrix mmult(NumericMatrix m, NumericMatrix v)
{
    Environment base("package:base");
    Function mat_Mult = base["%*%"];
    return mat_Mult(m, v);
}
With this approach, we use R's own %*% operator.
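As a usage sketch (assuming the function above is compiled, e.g. by wrapping it in Rcpp::cppFunction() or putting it in a sourceCpp() file with // [[Rcpp::export]]):
A <- matrix(1:6, ncol = 2)
B <- matrix(0:7, nrow = 2)
all.equal(mmult(A, B), A %*% B)  # TRUE; but each call dispatches back into the
                                 # R interpreter, so this buys convenience, not speed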

Inverse of sum of two matrices

I am trying to implement a code to compute the inverse of a sum of two matrices. My algorithm is recursive, and I need to use a for() loop. I tried to do it in R, but my code is very slow. Then I tried to do it using RcppArmadillo, but my code is still very slow. I think I am doing something wrong. Let me show my R code.
library(Matrix)  # for the Matrix() constructor used below
mySolveR <- function(A,B){
    ncol = dim(B)[1]
    ZERO.B <- Matrix(0, ncol = ncol, nrow = ncol)
    invCi <- A
    for(i in 1:ncol){
        ZERO.B[,i] <- B[,i]
        gi <- 1/(1 + sum(diag(ZERO.B %*% invCi)))
        invCi <- invCi - gi*(invCi %*% ZERO.B %*% invCi)
        ZERO.B[,i] <- 0
    }
    return(invCi)
}
And now my C++ code using RcppArmadillo.
src <- '
Rcpp::NumericMatrix Ac(A); // creates Rcpp matrix from SEXP
Rcpp::NumericMatrix Bc(B);
int n = Ac.nrow(), k = Ac.ncol();
arma::mat A(Ac.begin(), n, k, false); // reuses memory and avoids extra copy
arma::mat B(Bc.begin(), n, k, false);
arma::mat Z(n,k);
Z.zeros();
arma::mat invCi = A;
for( int i = 0 ; i < n ; i++){
    Z.col(i) = B.col(i);
    double gi = 1/(1 + trace(Z*invCi));
    invCi = invCi - gi*(invCi*Z*invCi);
    Z.zeros();
}
return wrap(invCi);'
I am using the inline package to compile my function.
mySolveCpp <- cxxfunction(signature(A = "numeric", B = "numeric"),
                          src, plugin="RcppArmadillo")
Now consider the following easy example,
A <- diag(5)
B <- matrix(c(1,-1,0,0,0, -1, 2, -1,0,0, 0,-1,2,-1,0,
0,0,-1,2,-1, 0,0,0,-1,1),5,5)
Using my function to compute the inverse of A + B
mySolveCpp(A,B)
mySolveR(A,B)
You can see my functions work well in this small example. But I would like to apply this algorithm to matrices around 15000 x 15000. In this case my R code does not work, and my C++ code is very slow, taking hours to compute the inverse. I would like to know if it is possible to improve my C++ code to deal with big matrices such as 15000 x 15000.
Best
Have you tried solve()?
A <- diag(5)
B <- matrix(c(1,-1,0,0,0, -1, 2, -1,0,0, 0,-1,2,-1,0,0,0,-1,2,-1, 0,0,0,-1,1),5,5)
solve(A+B)
For sparse Matrix objects:
library(Matrix)
As <- Matrix(A)
Bs <- Matrix(B)
solve(As + Bs)
5 x 5 Matrix of class "dsyMatrix"
[,1] [,2] [,3] [,4] [,5]
[1,] 0.61818182 0.23636364 0.09090909 0.03636364 0.01818182
[2,] 0.23636364 0.47272727 0.18181818 0.07272727 0.03636364
[3,] 0.09090909 0.18181818 0.45454545 0.18181818 0.09090909
[4,] 0.03636364 0.07272727 0.18181818 0.47272727 0.23636364
[5,] 0.01818182 0.03636364 0.09090909 0.23636364 0.61818182
I'm more comfortable with Eigen and can get some speed-up without changing the algorithm:
src2 <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Rcpp::as;
const Map<MatrixXd> A(as<Map<MatrixXd> >(AA));
const Map<MatrixXd> B(as<Map<MatrixXd> >(BB));
const int n = A.rows(), k = A.cols();
MatrixXd Z(n,k), C(n,k);
const MatrixXd Z0 = Z.setZero();
MatrixXd invCi = A;
double gi;
for( int i = 0 ; i < n ; i++){
    Z.col(i) = B.col(i);
    C = Z*invCi;
    gi = 1/(1 + C.trace());
    invCi -= gi*(invCi*C);
    Z = Z0;
}
return wrap(invCi);'
mySolveCpp2 <- cxxfunction(signature(AA = "matrix", BB = "matrix"),
                           src2, plugin="RcppEigen")
set.seed(42)
A <- matrix(rnorm(1e4), 1e2)
B <- matrix(rnorm(1e4), 1e2)
all.equal(
mySolveCpp(A,B),
mySolveCpp2(A,B))
#[1] TRUE
library(microbenchmark)
microbenchmark(mySolveCpp(A,B),
mySolveCpp2(A,B), times=10)
#Unit: milliseconds
# expr min lq median uq max neval
# mySolveCpp(A, B) 129.51222 129.62216 132.68336 136.67307 137.43591 10
# mySolveCpp2(A, B) 46.76913 47.26311 47.96435 50.12505 61.82288 10

Speeding up computation of Dice coefficient in C / Rcpp

I need to compute a similarity measure called the Dice coefficient over large matrices (600,000 x 500) of binary vectors in R. For speed I use C / Rcpp. The function runs great, but as I am not a computer scientist by background, I would like to know if it could run faster. This code is suitable for parallelisation, but I have no experience parallelising C code.
The Dice coefficient is a simple measure of similarity / dissimilarity (depending on how you take it). It is intended to compare asymmetric binary vectors, meaning one of the combinations (usually 0-0) is not important and agreement (1-1 pairs) has more weight than disagreement (1-0 or 0-1 pairs). Imagine the following contingency table:
      1   0
  1   a   b
  0   c   d
The Dice coefficient is: (2*a) / (2*a + b + c)
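A tiny worked example of the formula on two binary vectors:
u <- c(1, 1, 0, 1, 0)
v <- c(1, 0, 0, 1, 1)
a <- sum(u == 1 & v == 1)   # 1-1 pairs: 2
b <- sum(u == 1 & v == 0)   # 1-0 pairs: 1
c <- sum(u == 0 & v == 1)   # 0-1 pairs: 1
2 * a / (2 * a + b + c)     # 4/6 = 0.667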
Here is my Rcpp implementation:
library(Rcpp)
cppFunction('
NumericMatrix dice(NumericMatrix binaryMat){
    int nrows = binaryMat.nrow(), ncols = binaryMat.ncol();
    NumericMatrix results(ncols, ncols);
    for (int i = 0; i < ncols - 1; i++) {       // columns fixed
        for (int j = i + 1; j < ncols; j++) {   // columns moving
            double a = 0;
            double d = 0;
            for (int l = 0; l < nrows; l++) {
                if (binaryMat(l, i) > 0) {
                    if (binaryMat(l, j) > 0) {
                        a++;
                    }
                } else {
                    if (binaryMat(l, j) < 1) {
                        d++;
                    }
                }
            }
            // compute the Dice coefficient from a and d:
            // nrows - d = a + b + c, so bc = b + c
            double abc = nrows - d;
            double bc = abc - a;
            results(j, i) = (2 * a) / (2 * a + bc);
        }
    }
    return wrap(results);
}
')
And here is a running example:
x <- rbinom(1:200000, 1, 0.5)
X <- matrix(x, nrow = 200, ncol = 1000)
system.time(dice(X))
user system elapsed
0.814 0.000 0.814
The solution proposed by Roland was not entirely satisfying for my use case, so based on the source code from the arules package I implemented a much faster version. The code in arules relies on an algorithm from Leisch (2005) using the tcrossprod() function in R.
First, I wrote an Rcpp / RcppEigen version of crossprod that is 2-3 times faster. This is based on the example code in the RcppEigen vignette.
library(Rcpp)
library(RcppEigen)
library(inline)
crossprodCpp <- '
using Eigen::Map;
using Eigen::MatrixXi;
using Eigen::Lower;
const Map<MatrixXi> A(as<Map<MatrixXi> >(AA));
const int m(A.rows()), n(A.cols());
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint()));
return wrap(AtA);
'
fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen")
Then I wrote a small R function to compute the Dice coefficient.
diceR <- function(X){
    a <- fcprd(X)
    nx <- ncol(X)
    rsx <- colSums(X)
    c <- matrix(rsx, nrow = nx, ncol = nx) - a
    # b <- matrix(rsx, nrow = nx, ncol = nx, byrow = TRUE) - a
    b <- t(c)
    m <- (2 * a) / (2 * a + b + c)
    return(m)
}
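Before benchmarking, a quick sanity check that both implementations agree (dice() fills only the strictly lower triangle, so compare that part):
d1 <- dice(X)
d2 <- diceR(X)
all.equal(d1[lower.tri(d1)], d2[lower.tri(d2)])  # should be TRUE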
This new function is ~8 times faster than the old one and ~3 times faster than the one in arules.
m <- microbenchmark(dice(X), diceR(X), dissimilarity(t(X), method="dice"), times=100)
m
# Unit: milliseconds
# expr min lq median uq max neval
# dice(X) 791.34558 809.8396 812.19480 814.6735 910.1635 100
# diceR(X) 62.98642 76.5510 92.02528 159.2557 507.1662 100
# dissimilarity(t(X), method = "dice") 264.07997 342.0484 352.59870 357.4632 520.0492 100
I cannot run your function at work, but is the result the same as this?
library(arules)
plot(dissimilarity(X,method="dice"))
system.time(dissimilarity(X,method="dice"))
#user system elapsed
#0.04 0.00 0.04